diff --git a/recipes/quickstart/NotebookLlama/README.md b/recipes/quickstart/NotebookLlama/README.md
index ea7d827bea5e02719e2eaba057f83e1e04c4a287..70293c7f5a3f360ab9f4712d91df3b70115c80d8 100644
--- a/recipes/quickstart/NotebookLlama/README.md
+++ b/recipes/quickstart/NotebookLlama/README.md
@@ -23,6 +23,8 @@ Note 1: In Step 1, we prompt the 1B model to not modify the text or summarize it
 
 Note 2: For Step 2, you can also use `Llama-3.1-8B-Instruct` model, we recommend experimenting and trying if you see any differences. The 70B model was used here because it gave slightly more creative podcast transcripts for the tested examples.
 
+Note 3: For Step 4, please try to extend the approach with other models. The models used here were chosen because they performed best on a sample prompt; newer models might sound better. Please see [Notes](./TTS_Notes.md) for some of the sample tests.
+
 ### Detailed steps on running the notebook:
 
 Requirements: GPU server or an API provider for using 70B, 8B and 1B Llama models.
diff --git a/recipes/quickstart/NotebookLlama/Step-1 PDF-Pre-Processing-Logic.ipynb b/recipes/quickstart/NotebookLlama/Step-1 PDF-Pre-Processing-Logic.ipynb
index 107ce48220ff4064962f5239f1641c728836b699..e4bf71d3812d440e358ff8bcaa4a416c18f2f6ec 100644
--- a/recipes/quickstart/NotebookLlama/Step-1 PDF-Pre-Processing-Logic.ipynb
+++ b/recipes/quickstart/NotebookLlama/Step-1 PDF-Pre-Processing-Logic.ipynb
@@ -2696,6 +2696,16 @@
     "print(processed_text[-1000:])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "3d996ac5",
+   "metadata": {},
+   "source": [
+    "### Next Notebook: Transcript Writer\n",
+    "\n",
+    "Now that we have the pre-processed text ready, we can move on to converting it into a transcript in the next notebook."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
diff --git a/recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb b/recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb
index ea25940b23cbd13ca5b863d1bab4b04bf353b8a8..5f0679a4e9dfbaf69c7294ee70e6392f9d471881 100644
--- a/recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb
+++ b/recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb
@@ -302,6 +302,16 @@
     "    pickle.dump(save_string_pkl, file)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "dbae9411",
+   "metadata": {},
+   "source": [
+    "### Next Notebook: Transcript Re-writer\n",
+    "\n",
+    "We now have a working transcript, but we can try making it more dramatic and natural. In the next notebook, we will use the `Llama-3.1-8B-Instruct` model to do so."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
diff --git a/recipes/quickstart/NotebookLlama/Step-3-Re-Writer.ipynb b/recipes/quickstart/NotebookLlama/Step-3-Re-Writer.ipynb
index 035d2b1a60a783f9283fb054039793fa7158c1e3..f120bc4b3b3e9d9a81d7ebd241fa9b43a1b39348 100644
--- a/recipes/quickstart/NotebookLlama/Step-3-Re-Writer.ipynb
+++ b/recipes/quickstart/NotebookLlama/Step-3-Re-Writer.ipynb
@@ -253,6 +253,16 @@
     "    pickle.dump(save_string_pkl, file)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "2dccf336",
+   "metadata": {},
+   "source": [
+    "### Next Notebook: TTS Workflow\n",
+    "\n",
+    "Now that our transcript is ready, we can generate the audio in the next notebook."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
diff --git a/recipes/quickstart/NotebookLlama/Step-4-TTS-Workflow.ipynb b/recipes/quickstart/NotebookLlama/Step-4-TTS-Workflow.ipynb
index a55ec8e28617618107fc54532ec283e6a2938a84..fece59a448af45e6d7376985d70b2a2df5a928c3 100644
--- a/recipes/quickstart/NotebookLlama/Step-4-TTS-Workflow.ipynb
+++ b/recipes/quickstart/NotebookLlama/Step-4-TTS-Workflow.ipynb
@@ -11,7 +11,9 @@
    "\n",
    "In this notebook, we will learn how to generate Audio using both `suno/bark` and `parler-tts/parler-tts-mini-v1` models first. \n",
    "\n",
-   "After that, we will use the output from Notebook 3 to generate our complete podcast"
+   "After that, we will use the output from Notebook 3 to generate our complete podcast.\n",
+   "\n",
+   "Note: Please feel free to extend this notebook with newer models. The two above were chosen after some tests using a sample prompt."
   ]
  },
  {
@@ -117,11 +119,7 @@
    "id": "50b62df5-5ea3-4913-832a-da59f7cf8de2",
    "metadata": {},
    "source": [
-    "Generally in life, you set your device to \"cuda\" and are happy. \n",
-    "\n",
-    "However, sometimes you want to compensate for things and set it to `cuda:7` to tell the system but even more-so the world that you have 8 GPUS.\n",
-    "\n",
-    "Jokes aside please set `device = \"cuda\"` below if you're using a single GPU node."
+    "Please set `device = \"cuda\"` below if you're using a single GPU node."
   ]
  },
@@ -161,7 +159,7 @@
    ],
    "source": [
     "# Set up device\n",
-    "device = \"cuda:7\" if torch.cuda.is_available() else \"cpu\"\n",
+    "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
     "\n",
     "# Load model and tokenizer\n",
     "model = ParlerTTSForConditionalGeneration.from_pretrained(\"parler-tts/parler-tts-mini-v1\").to(device)\n",
@@ -639,6 +637,19 @@
     "    parameters=[\"-q:a\", \"0\"])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c7ce5836",
+   "metadata": {},
+   "source": [
+    "### Suggested Next Steps:\n",
+    "\n",
+    "- Experiment with the prompts: try tweaking the SYSTEM_PROMPT in each notebook\n",
+    "- Extend the workflow beyond two speakers\n",
+    "- Test other TTS models\n",
+    "- Experiment with speech-enhancement models as a Step 5."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
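The "extend the workflow beyond two speakers" suggestion in the Step-4 diff can be sketched as a routing table from speaker labels to per-speaker TTS callables. This is a hypothetical illustration, not code from the PR: `synthesize` and `tts_for_speaker` are made-up names, and the transcript is assumed to be a list of `(speaker, text)` tuples like the one the earlier notebooks pickle.

```python
def synthesize(transcript, tts_for_speaker):
    """Generate one audio segment per transcript line.

    transcript: list of (speaker, text) tuples.
    tts_for_speaker: dict mapping a speaker label to a callable that
    turns text into an audio segment (e.g. a bark or parler-tts
    wrapper). Supporting a third speaker then only means registering
    one more voice in the dict.
    """
    segments = []
    for speaker, text in transcript:
        if speaker not in tts_for_speaker:
            raise KeyError(f"no voice registered for speaker {speaker!r}")
        segments.append(tts_for_speaker[speaker](text))
    return segments
```

The resulting segments can then be concatenated into the final podcast the same way the two-speaker version does.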