diff --git a/recipes/quickstart/NotebookLlama/README.md b/recipes/quickstart/NotebookLlama/README.md
index a117cda27b0b3f7f5bc4aee3ce140d347eac349d..d0ec1bebec8d754e3c44dd397d9618210f79aa07 100644
--- a/recipes/quickstart/NotebookLlama/README.md
+++ b/recipes/quickstart/NotebookLlama/README.md
@@ -77,5 +77,4 @@ The speakers and the prompt for parler model were decided based on experimentati
 - https://colab.research.google.com/drive/1eJfA2XUa-mXwdMy7DoYKVYHI1iTd9Vkt?usp=sharing#scrollTo=NyYQ--3YksJY
 - https://replicate.com/suno-ai/bark?prediction=zh8j6yddxxrge0cjp9asgzd534
 - https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c
--
diff --git a/recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb b/recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb
index 6d24bc4a7bdd1fb5af57a839b49322a919b5c8f9..5c9388d06dfcca6a9c8a614c52762ac83aa0258a 100644
--- a/recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb
+++ b/recipes/quickstart/NotebookLlama/Step-2-Transcript-Writer.ipynb
@@ -1,5 +1,25 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "de42c49d",
+   "metadata": {},
+   "source": [
+    "## Notebook 2: Transcript Writer\n",
+    "\n",
+    "This notebook uses the `Llama-3.1-70B-Instruct` model to take the cleaned-up text from the previous notebook and convert it into a podcast transcript.\n",
+    "\n",
+    "`SYSTEM_PROMPT` is used to set the model's context, or profile, for the task at hand. Here we prompt it to act as a great podcast transcript writer to assist with our task."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e576ea9",
+   "metadata": {},
+   "source": [
+    "Experimentation with the `SYSTEM_PROMPT` below is encouraged; this version worked best for the few examples the flow was tested with:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -35,6 +55,16 @@
     "\"\"\""
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "549aaccb",
+   "metadata": {},
+   "source": [
+    "For those readers who want to flex their hardware budget, feel free to try the 405B model here.\n",
+    "\n",
+    "For our GPU-poor friends, testing with a smaller model is encouraged as well; the 8B model should work well out of the box for this example:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -45,6 +75,14 @@
     "MODEL = \"meta-llama/Llama-3.1-70B-Instruct\""
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "fadc7eda",
+   "metadata": {},
+   "source": [
+    "Import the necessary frameworks:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 3,
@@ -64,6 +102,16 @@
     "warnings.filterwarnings('ignore')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7865ff7e",
+   "metadata": {},
+   "source": [
+    "Read in the file generated earlier.\n",
+    "\n",
+    "The encoding details are there to avoid issues with the generic PDFs that might be ingested:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -99,6 +147,14 @@
     "    return None"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "66093561",
+   "metadata": {},
+   "source": [
+    "Since we defined the system role earlier, we can now pass the entire file as `INPUT_PROMPT` to the model and have it use that to generate the podcast transcript:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 5,
@@ -109,6 +165,16 @@
     "INPUT_PROMPT = read_file_to_string('./clean_extracted_text.txt')"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "9be8dd2c",
+   "metadata": {},
+   "source": [
+    "Hugging Face's `pipeline()` method makes our life easy when generating text from LLMs.\n",
+    "\n",
+    "We will set the `temperature` to 1 to encourage creativity and `max_new_tokens` to 8126:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,
@@ -158,6 +224,14 @@
     ")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "6349e7f3",
+   "metadata": {},
+   "source": [
+    "We can now save and verify the output generated by the model before moving on to the next notebook:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 7,
@@ -209,6 +283,14 @@
     "print(outputs[0][\"generated_text\"][-1]['content'])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "1e1414fe",
+   "metadata": {},
+   "source": [
+    "Let's save the output as a pickle file and continue on to Notebook 3:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 8,
@@ -226,7 +308,9 @@
    "id": "d9bab2f2-f539-435a-ae6a-3c9028489628",
    "metadata": {},
    "outputs": [],
-   "source": []
+   "source": [
+    "#fin"
+   ]
   }
  ],
 "metadata": {
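The new markdown cells describe reading the cleaned text with encoding safeguards (`read_file_to_string` on `./clean_extracted_text.txt`) and pickling the transcript for Notebook 3, but the diff elides the code cell bodies. A minimal sketch of what such helpers could look like: `read_file_to_string` and the input path come from the diff, while `save_transcript`, its default path, and the specific encoding-fallback order are assumptions for illustration, not taken from the notebook.

```python
import pickle


def read_file_to_string(path):
    """Read a text file, falling back across encodings.

    Text extracted from arbitrary PDFs is not always valid UTF-8,
    so try UTF-8 first and fall back to latin-1 (assumed fallback;
    latin-1 accepts any byte sequence, so it acts as a last resort).
    """
    for enc in ("utf-8", "latin-1"):
        try:
            with open(path, encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    return None


def save_transcript(text, path="./data.pkl"):
    """Persist the generated transcript as a pickle for the next notebook."""
    with open(path, "wb") as f:
        pickle.dump(text, f)
```

In the notebook flow, the string returned by `read_file_to_string('./clean_extracted_text.txt')` becomes `INPUT_PROMPT`, and the model's generated transcript is what gets pickled.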