{"cells":[{"cell_type":"markdown","source":[""],"metadata":{"id":"RJSnI0Xy-kCm"}},{"cell_type":"markdown","metadata":{"id":"LERqQn5v8-ak"},"source":["# **Getting to know Llama 2: Everything you need to start building**\n","Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects."]},{"cell_type":"markdown","source":["##**0 - Prerequisites**\n","* Basic understanding of Large Language Models\n","\n","* Basic understanding of Python"],"metadata":{"id":"ioVMNcTesSEk"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"ktEA7qXmwdUM"},"outputs":[],"source":["# presentation layer code\n","\n","import base64\n","from IPython.display import Image, display\n","import matplotlib.pyplot as plt\n","\n","def mm(graph):\n"," graphbytes = graph.encode(\"ascii\")\n"," base64_bytes = base64.b64encode(graphbytes)\n"," base64_string = base64_bytes.decode(\"ascii\")\n"," display(Image(url=\"https://mermaid.ink/img/\" + base64_string))\n","\n","def genai_app_arch():\n"," mm(\"\"\"\n"," flowchart TD\n"," A[Users] --> B(Applications e.g. mobile, web)\n"," B --> |Hosted API|C(Platforms e.g. Custom, HuggingFace, Replicate)\n"," B -- optional --> E(Frameworks e.g. LangChain)\n"," C-->|User Input|D[Llama 2]\n"," D-->|Model Output|C\n"," E --> C\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def rag_arch():\n"," mm(\"\"\"\n"," flowchart TD\n"," A[User Prompts] --> B(Frameworks e.g. LangChain)\n"," B <--> |Database, Docs, XLS|C[fa:fa-database External Data]\n"," B -->|API|D[Llama 2]\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def llama2_family():\n"," mm(\"\"\"\n"," graph LR;\n"," llama-2 --> llama-2-7b\n"," llama-2 --> llama-2-13b\n"," llama-2 --> llama-2-70b\n"," llama-2-7b --> llama-2-7b-chat\n"," llama-2-13b --> llama-2-13b-chat\n"," llama-2-70b --> llama-2-70b-chat\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def apps_and_llms():\n"," mm(\"\"\"\n"," graph LR;\n"," users --> apps\n"," apps --> frameworks\n"," frameworks --> platforms\n"," platforms --> Llama 2\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","import ipywidgets as widgets\n","from IPython.display import display, Markdown\n","\n","# Create a text widget\n","API_KEY = widgets.Password(\n"," value='',\n"," placeholder='',\n"," description='API_KEY:',\n"," disabled=False\n",")\n","\n","def md(t):\n"," display(Markdown(t))\n","\n","def bot_arch():\n"," mm(\"\"\"\n"," graph LR;\n"," user --> prompt\n"," prompt --> i_safety\n"," i_safety --> context\n"," context --> Llama_2\n"," Llama_2 --> output\n"," output --> o_safety\n"," i_safety --> memory\n"," o_safety --> memory\n"," memory --> context\n"," o_safety --> user\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def fine_tuned_arch():\n"," mm(\"\"\"\n"," graph LR;\n"," Custom_Dataset --> Pre-trained_Llama\n"," Pre-trained_Llama --> Fine-tuned_Llama\n"," Fine-tuned_Llama --> RLHF\n"," RLHF --> |Loss:Cross-Entropy|Fine-tuned_Llama\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def load_data_faiss_arch():\n"," mm(\"\"\"\n"," graph LR;\n"," documents --> textsplitter\n"," textsplitter --> embeddings\n"," embeddings --> vectorstore\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def mem_context():\n"," mm(\"\"\"\n"," graph LR\n"," context(text)\n"," user_prompt --> context\n"," instruction --> context\n"," examples --> context\n"," memory --> context\n"," context --> tokenizer\n"," tokenizer --> embeddings\n"," embeddings --> LLM\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n"]},{"cell_type":"markdown","source":["##**1 - Understanding Llama 2**"],"metadata":{"id":"i4Np_l_KtIno"}},{"cell_type":"markdown","metadata":{"id":"PGPSI3M5PGTi"},"source":["### **1.1 - What is Llama 2?**\n","\n","* State of the art (SOTA), Open Source LLM\n","* 7B, 13B, 70B\n","* Pretrained + Chat\n","* Choosing model: Size, Quality, Cost, Speed\n","* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n","\n","* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"OXRCC7wexZXd"},"outputs":[],"source":["llama2_family()"]},{"cell_type":"markdown","metadata":{"id":"aYeHVVh45bdT"},"source":["###**1.2 - Accessing Llama 2**\n","* Download + Self Host (on-premise)\n","* Hosted API Platform (e.g. Replicate)\n","\n","* Hosted Container Platform (e.g. Azure, AWS, GCP)\n","\n"]},{"cell_type":"markdown","source":["### **1.3 - Use Cases of Llama 2**\n","* Content Generation\n","* Chatbots\n","* Summarization\n","* Programming (e.g. Code Llama)\n","\n","* and many more..."],"metadata":{"id":"kBuSay8vtzL4"}},{"cell_type":"markdown","source":["##**2 - Using Llama 2**"],"metadata":{"id":"sd54g0OHuqBY"}},{"cell_type":"markdown","metadata":{"id":"h3YGMDJidHtH"},"source":["### **2.1 - Install dependencies**"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"VhN6hXwx7FCp"},"outputs":[],"source":["# Install dependencies and initialize\n","%pip install -qU \\\n"," replicate \\\n"," langchain \\\n"," sentence_transformers \\\n"," pdf2image \\\n"," pdfminer \\\n"," pdfminer.six \\\n"," unstructured \\\n"," faiss-gpu"]},{"cell_type":"code","source":["# model we will use throughout the notebook\n","llama2_13b = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\""],"metadata":{"id":"Z8Y8qjEjmg50"},"execution_count":null,"outputs":[]},{"cell_type":"code","execution_count":null,"metadata":{"id":"8hkWpqWD28ho"},"outputs":[],"source":["# We will use Replicate hosted cloud environment\n","# Obtain Replicate API key → https://replicate.com/account/api-tokens)\n","# Find the model to use → we will use [`llama-2-13b-chat`](https://replicate.com/lucataco/llama-2-13b-chat)\n","\n","# enter your replicate api token\n","from getpass import getpass\n","import os\n","\n","REPLICATE_API_TOKEN = getpass()\n","os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n","\n","# alternatively, you can also store the tokens in environment variables and load it here"]},{"cell_type":"code","source":["# we will use replicate's hosted api\n","import replicate\n","\n","# text completion with input prompt\n","def Completion(prompt):\n"," output = replicate.run(\n"," llama2_13b,\n"," input={\"prompt\": prompt, \"max_new_tokens\":1000}\n"," )\n"," return \"\".join(output)\n","\n","# chat completion with input prompt and system prompt\n","def ChatCompletion(prompt, system_prompt=None):\n"," output = replicate.run(\n"," llama2_13b,\n"," input={\"system_prompt\": system_prompt,\n"," \"prompt\": prompt,\n"," \"max_new_tokens\":1000}\n"," )\n"," return \"\".join(output)"],"metadata":{"id":"bVCHZmETk36v"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"5Jxq0pmf6L73"},"source":["### **2.2 - Basic completion**"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"H93zZBIk6tNU"},"outputs":[],"source":["output = Completion(prompt=\"The typical color of a llama is: \")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"StccjUDh6W0Q"},"source":["### **2.3 - System prompts**\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"VRnFogxd6rTc"},"outputs":[],"source":["output = ChatCompletion(\n"," prompt=\"The typical color of a llama is: \",\n"," system_prompt=\"respond with only one word\"\n"," )\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"Hp4GNa066pYy"},"source":["### **2.4 - Response formats**\n","* Can support different formatted outputs e.g. text, JSON, etc."]},{"cell_type":"code","source":["output = ChatCompletion(\n"," prompt=\"The typical color of a llama is: \",\n"," system_prompt=\"response in json format\"\n"," )\n","md(output)"],"metadata":{"id":"HTN79h4RptgQ"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"cWs_s9y-avIT"},"source":["## **3 - Gen AI Application Architecture**\n"]},{"cell_type":"code","source":["genai_app_arch()"],"metadata":{"id":"j9BGuI-9AOL5"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"6UlxBtbgys6j"},"source":["##4 - **Chatbot Architecture**\n","* User Prompts\n","* Input Safety\n","* Llama 2\n","* Output Safety\n","\n","* Memory & Context"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"tO5HnB56ys6t"},"outputs":[],"source":["bot_arch()"]},{"cell_type":"markdown","metadata":{"id":"r4DyTLD5ys6t"},"source":["### **4.1 - Chat conversation**\n","* LLMs are stateless\n","* Single Turn\n","\n","* Multi Turn (Memory)\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"EMM_egWMys6u"},"outputs":[],"source":["# example of single turn chat\n","prompt_chat = \"What is the average lifespan of a Llama?\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question in few words\")\n","md(output)"]},{"cell_type":"code","source":["# example without previous context. LLM's are stateless and cannot understand \"they\" without previous context\n","prompt_chat = \"What animal family are they?\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question in few words\")\n","md(output)"],"metadata":{"id":"sZ7uVKDYucgi"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Chat app requires us to send in previous context to LLM to get in valid responses. Below is an example of Multi-turn chat."],"metadata":{"id":"WQl3wmfbyBQ1"}},{"cell_type":"code","source":["# example of multi-turn chat, with storing previous context\n","prompt_chat = \"\"\"\n","User: What is the average lifespan of a Llama?\n","Assistant: Sure! The average lifespan of a llama is around 20-30 years.\n","User: What animal family are they?\n","\"\"\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question\")\n","md(output)"],"metadata":{"id":"t7SZe5fT3HG3"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"moXnmJ_xyD10"},"source":["### **4.2 - Prompt Engineering**\n","Prompt engineering refers to the science of designing effective prompts to get desired responses.\n"]},{"cell_type":"markdown","metadata":{"id":"t-v-FeZ4ztTB"},"source":["#### **4.2.1 - In-Context Learning (e.g. Zero-shot, Few-shot)**\n"," * In-context learning - specific method of prompt engineering where demonstration of task are provided as part of prompt.\n"," 1. Zero-shot learning - model is performing tasks without any\n","input examples.\n"," 2. Few or “N-Shot” Learning - model is performing and behaving based on input examples in user's prompt.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"6W71MFNZyRkQ"},"outputs":[],"source":["# Zero-shot example. To get positive/negative/neutral sentiment, we need to give examples in the prompt\n","prompt = '''\n","Classify: I saw a Gecko.\n","Sentiment: ?\n","'''\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"MCQRjf1Y1RYJ"},"outputs":[],"source":["# By giving examples to Llama, it understands the expected output format.\n","\n","prompt = '''\n","Classify: I love Llamas!\n","Sentiment: Positive\n","Classify: I dont like Snakes.\n","Sentiment: Negative\n","Classify: I saw a Gecko.\n","Sentiment:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"One word response\")\n","md(output)"]},{"cell_type":"code","source":["# another zero-shot learning\n","prompt = '''\n","QUESTION: Vicuna?\n","ANSWER:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"],"metadata":{"id":"8UmdlTmpDZxA"},"execution_count":null,"outputs":[]},{"cell_type":"code","execution_count":null,"metadata":{"id":"M_EcsUo1zqFD"},"outputs":[],"source":["# Another few-shot learning example with formatted prompt.\n","\n","prompt = '''\n","QUESTION: Llama?\n","ANSWER: Yes\n","QUESTION: Alpaca?\n","ANSWER: Yes\n","QUESTION: Rabbit?\n","ANSWER: No\n","QUESTION: Vicuna?\n","ANSWER:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"mbr124Y197xl"},"source":["#### **4.2.2 - Chain of Thought**\n","* \"chain of thought\" or a coherent sequence of ideas is crucial for generating meaningful and contextually relevant responses\n","\n","* Hallucination on word problems"]},{"cell_type":"code","source":["# Standard prompting\n","prompt = '''\n","Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does Llama have now?\n","'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"provide short answer\")\n","md(output)"],"metadata":{"id":"Xn8zmLBQzpgj"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# Chain-Of-Thought prompting\n","prompt = '''\n","Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does Llama have now?\n","Let's think step by step.\n","'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"provide short answer\")\n","md(output)"],"metadata":{"id":"lKNOj79o1Kwu"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"C7tDW-AH770Y"},"source":["### **4.3 - Retrieval Augmented Generation (RAG)**\n","* Prompt Eng Limitations (Knowledge cutoff & lack of specialized data)\n","\n","* Langchain\n","\n","Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.\n","\n","\n"]},{"cell_type":"code","source":["rag_arch()"],"metadata":{"id":"Fl1LPltpRQD9"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["#### **4.3.1 - LangChain**\n","LangChain is a framework that helps make it easier to implement RAG.\n"],"metadata":{"id":"JJaGMLl_4vYm"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"aoqU3KTcHTWN"},"outputs":[],"source":["# langchain setup\n","from langchain.llms import Replicate\n","# Use the Llama 2 model hosted on Replicate\n","# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n","# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens\n","# max_new_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n","llama_model = Replicate(\n"," model=llama2_13b,\n"," model_kwargs={\"temperature\": 0.75,\"top_p\": 1, \"max_new_tokens\":1000}\n",")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gAV2EkZqcruF"},"outputs":[],"source":["# Step 1: load the external data source. In our case, we will load Meta’s “Responsible Use Guide” pdf document.\n","from langchain.document_loaders import OnlinePDFLoader\n","loader = OnlinePDFLoader(\"https://ai.meta.com/static-resource/responsible-use-guide/\")\n","documents = loader.load()\n","\n","# Step 2: Get text splits from document\n","from langchain.text_splitter import RecursiveCharacterTextSplitter\n","text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)\n","all_splits = text_splitter.split_documents(documents)\n","\n","# Step 3: Use the embedding model\n","from langchain.vectorstores import FAISS\n","from langchain.embeddings import HuggingFaceEmbeddings\n","model_name = \"sentence-transformers/all-mpnet-base-v2\" # embedding model\n","model_kwargs = {\"device\": \"cpu\"}\n","embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)\n","\n","# Step 4: Use vector store to store embeddings\n","vectorstore = FAISS.from_documents(all_splits, embeddings)"]},{"cell_type":"markdown","metadata":{"id":"K2l8S5tBxlkc"},"source":["#### **4.3.2 - LangChain Q&A Retriever**\n","* ConversationalRetrievalChain\n","\n","* Query the Source documents\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"NmEhBe3Kiyre"},"outputs":[],"source":["# Query against your own data\n","from langchain.chains import ConversationalRetrievalChain\n","chain = ConversationalRetrievalChain.from_llm(llama_model, vectorstore.as_retriever(), return_source_documents=True)\n","\n","chat_history = []\n","query = \"How is Meta approaching open science in two short sentences?\"\n","result = chain({\"question\": query, \"chat_history\": chat_history})\n","md(result['answer'])"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"CelLHIvoy2Ke"},"outputs":[],"source":["# This time your previous question and answer will be included as a chat history which will enable the ability\n","# to ask follow up questions.\n","chat_history = [(query, result[\"answer\"])]\n","query = \"How is it benefiting the world?\"\n","result = chain({\"question\": query, \"chat_history\": chat_history})\n","md(result['answer'])"]},{"cell_type":"markdown","source":["## **5 - Fine-Tuning Models**\n","\n","* Limitatons of Prompt Eng and RAG\n","* Fine-Tuning Arch\n","* Types (PEFT, LoRA, QLoRA)\n","* Using PyTorch for Pre-Training & Fine-Tuning\n","\n","* Evals + Quality\n"],"metadata":{"id":"TEvefAWIJONx"}},{"cell_type":"code","source":["fine_tuned_arch()"],"metadata":{"id":"0a9CvJ8YcTzV"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"_8lcgdZa8onC"},"source":["## **6 - Responsible AI**\n","\n","* Power + Responsibility\n","* Hallucinations\n","* Input & Output Safety\n","* Red-teaming (simulating real-world cyber attackers)\n","* [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"pbqb006R-T_k"},"source":["##**7 - Conclusion**\n","* Active research on LLMs and Llama\n","* Leverage the power of Llama and its open community\n","* Safety and responsible use is paramount!\n","\n","* Call-To-Action\n"," * [Replicate Free Credits](https://replicate.fyi/connect2023) for Connect attendees!\n"," * This notebook is available through Llama Github recipes\n"," * Use Llama in your projects and give us feedback\n"]},{"cell_type":"markdown","source":["#### **Resources**\n","- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n","- [Github - LLama 2 Recipes](https://github.com/facebookresearch/llama-recipes)\n","- [Llama 2](https://ai.meta.com/llama/)\n","- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n","- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n","- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n","- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n","- [Replicate](https://replicate.com/meta/)\n","- [LangChain](https://www.langchain.com/)\n","\n"],"metadata":{"id":"gSz5dTMxp7xo"}},{"cell_type":"markdown","source":["#### **Authors & Contact**\n"," * asangani@meta.com, [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)\n"," * mohsena@meta.com, [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)\n"],"metadata":{"id":"V7aI6fhZp-KC"}}],"metadata":{"colab":{"provenance":[],"machine_shape":"hm","collapsed_sections":["ioVMNcTesSEk"],"toc_visible":true},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.6"}},"nbformat":4,"nbformat_minor":0}
{"cells":[{"cell_type":"markdown","metadata":{"id":"RJSnI0Xy-kCm"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"LERqQn5v8-ak"},"source":["# **Getting to know Llama 2: Everything you need to start building**\n","Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects."]},{"cell_type":"markdown","metadata":{"id":"ioVMNcTesSEk"},"source":["##**0 - Prerequisites**\n","* Basic understanding of Large Language Models\n","\n","* Basic understanding of Python"]},{"cell_type":"code","execution_count":5,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["Requirement already satisfied: matplotlib in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (3.8.0)\n","Requirement already satisfied: contourpy>=1.0.1 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (1.1.1)\n","Requirement already satisfied: cycler>=0.10 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (0.11.0)\n","Requirement already satisfied: fonttools>=4.22.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (4.42.1)\n","Requirement already satisfied: kiwisolver>=1.0.1 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (1.4.5)\n","Requirement already satisfied: numpy<2,>=1.21 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (1.25.2)\n","Requirement already satisfied: packaging>=20.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (23.1)\n","Requirement already satisfied: pillow>=6.2.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (9.3.0)\n","Requirement already satisfied: pyparsing>=2.3.1 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (3.1.1)\n","Requirement already satisfied: python-dateutil>=2.7 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (2.8.2)\n","Requirement already satisfied: six>=1.5 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)\n","Note: you may need to restart the kernel to use updated packages.\n","Collecting ipywidgets\n"," Obtaining dependency information for ipywidgets from https://files.pythonhosted.org/packages/4a/0e/57ed498fafbc60419a9332d872e929879ceba2d73cb11d284d7112472b3e/ipywidgets-8.1.1-py3-none-any.whl.metadata\n"," Downloading ipywidgets-8.1.1-py3-none-any.whl.metadata (2.4 kB)\n","Requirement already satisfied: comm>=0.1.3 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipywidgets) (0.1.4)\n","Requirement already satisfied: ipython>=6.1.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipywidgets) (8.15.0)\n","Requirement already satisfied: traitlets>=4.3.1 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipywidgets) (5.10.0)\n","Collecting widgetsnbextension~=4.0.9 (from ipywidgets)\n"," Obtaining dependency information for widgetsnbextension~=4.0.9 from https://files.pythonhosted.org/packages/29/03/107d96077c4befed191f7ad1a12c7b52a8f9d2778a5836d59f9855c105f6/widgetsnbextension-4.0.9-py3-none-any.whl.metadata\n"," Downloading widgetsnbextension-4.0.9-py3-none-any.whl.metadata (1.6 kB)\n","Collecting jupyterlab-widgets~=3.0.9 (from ipywidgets)\n"," Obtaining dependency information for jupyterlab-widgets~=3.0.9 from https://files.pythonhosted.org/packages/e8/05/0ebab152288693b5ec7b339aab857362947031143b282853b4c2dd4b5b40/jupyterlab_widgets-3.0.9-py3-none-any.whl.metadata\n"," Downloading jupyterlab_widgets-3.0.9-py3-none-any.whl.metadata (4.1 kB)\n","Requirement already satisfied: backcall in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (0.2.0)\n","Requirement already satisfied: decorator in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (5.1.1)\n","Requirement already satisfied: jedi>=0.16 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (0.19.0)\n","Requirement already satisfied: matplotlib-inline in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (0.1.6)\n","Requirement already satisfied: pickleshare in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (0.7.5)\n","Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (3.0.39)\n","Requirement already satisfied: pygments>=2.4.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (2.16.1)\n","Requirement already satisfied: stack-data in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (0.6.2)\n","Requirement already satisfied: exceptiongroup in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (1.1.3)\n","Requirement already satisfied: pexpect>4.3 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from ipython>=6.1.0->ipywidgets) (4.8.0)\n","Requirement already satisfied: parso<0.9.0,>=0.8.3 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from jedi>=0.16->ipython>=6.1.0->ipywidgets) (0.8.3)\n","Requirement already satisfied: ptyprocess>=0.5 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from pexpect>4.3->ipython>=6.1.0->ipywidgets) (0.7.0)\n","Requirement already satisfied: wcwidth in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython>=6.1.0->ipywidgets) (0.2.6)\n","Requirement already satisfied: executing>=1.2.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from stack-data->ipython>=6.1.0->ipywidgets) (1.2.0)\n","Requirement already satisfied: asttokens>=2.1.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from stack-data->ipython>=6.1.0->ipywidgets) (2.4.0)\n","Requirement already satisfied: pure-eval in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from stack-data->ipython>=6.1.0->ipywidgets) (0.2.2)\n","Requirement already satisfied: six>=1.12.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from asttokens>=2.1.0->stack-data->ipython>=6.1.0->ipywidgets) (1.16.0)\n","Downloading ipywidgets-8.1.1-py3-none-any.whl (139 kB)\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m139.4/139.4 kB\u001b[0m \u001b[31m2.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mta \u001b[36m0:00:01\u001b[0m\n","\u001b[?25hDownloading jupyterlab_widgets-3.0.9-py3-none-any.whl (214 kB)\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m214.9/214.9 kB\u001b[0m \u001b[31m5.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0mta \u001b[36m0:00:01\u001b[0m\n","\u001b[?25hDownloading widgetsnbextension-4.0.9-py3-none-any.whl (2.3 MB)\n","\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.3/2.3 MB\u001b[0m \u001b[31m24.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m\n","\u001b[?25hInstalling collected packages: widgetsnbextension, jupyterlab-widgets, ipywidgets\n","Successfully installed ipywidgets-8.1.1 jupyterlab-widgets-3.0.9 widgetsnbextension-4.0.9\n","Note: you may need to restart the kernel to use updated packages.\n"]}],"source":["%pip install matplotlib\n","%pip install ipywidgets"]},{"cell_type":"code","execution_count":6,"metadata":{"id":"ktEA7qXmwdUM"},"outputs":[],"source":["# presentation layer code\n","\n","import base64\n","from IPython.display import Image, display\n","import matplotlib.pyplot as plt\n","import ipywidgets as widgets\n","from IPython.display import display, Markdown\n","\n","\n","def mm(graph):\n"," graphbytes = graph.encode(\"ascii\")\n"," base64_bytes = base64.b64encode(graphbytes)\n"," base64_string = base64_bytes.decode(\"ascii\")\n"," display(Image(url=\"https://mermaid.ink/img/\" + base64_string))\n","\n","def genai_app_arch():\n"," mm(\"\"\"\n"," flowchart TD\n"," A[Users] --> B(Applications e.g. mobile, web)\n"," B --> |Hosted API|C(Platforms e.g. Custom, HuggingFace, Replicate)\n"," B -- optional --> E(Frameworks e.g. LangChain)\n"," C-->|User Input|D[Llama 2]\n"," D-->|Model Output|C\n"," E --> C\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def rag_arch():\n"," mm(\"\"\"\n"," flowchart TD\n"," A[User Prompts] --> B(Frameworks e.g. LangChain)\n"," B <--> |Database, Docs, XLS|C[fa:fa-database External Data]\n"," B -->|API|D[Llama 2]\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def llama2_family():\n"," mm(\"\"\"\n"," graph LR;\n"," llama-2 --> llama-2-7b\n"," llama-2 --> llama-2-13b\n"," llama-2 --> llama-2-70b\n"," llama-2-7b --> llama-2-7b-chat\n"," llama-2-13b --> llama-2-13b-chat\n"," llama-2-70b --> llama-2-70b-chat\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def apps_and_llms():\n"," mm(\"\"\"\n"," graph LR;\n"," users --> apps\n"," apps --> frameworks\n"," frameworks --> platforms\n"," platforms --> Llama 2\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","\n","\n","# Create a text widget\n","API_KEY = widgets.Password(\n"," value='',\n"," placeholder='',\n"," description='API_KEY:',\n"," disabled=False\n",")\n","\n","def md(t):\n"," display(Markdown(t))\n","\n","def bot_arch():\n"," mm(\"\"\"\n"," graph LR;\n"," user --> prompt\n"," prompt --> i_safety\n"," i_safety --> context\n"," context --> Llama_2\n"," Llama_2 --> output\n"," output --> o_safety\n"," i_safety --> memory\n"," o_safety --> memory\n"," memory --> context\n"," o_safety --> user\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def fine_tuned_arch():\n"," mm(\"\"\"\n"," graph LR;\n"," Custom_Dataset --> Pre-trained_Llama\n"," Pre-trained_Llama --> Fine-tuned_Llama\n"," Fine-tuned_Llama --> RLHF\n"," RLHF --> |Loss:Cross-Entropy|Fine-tuned_Llama\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def load_data_faiss_arch():\n"," mm(\"\"\"\n"," graph LR;\n"," documents --> textsplitter\n"," textsplitter --> embeddings\n"," embeddings --> vectorstore\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n","\n","def mem_context():\n"," mm(\"\"\"\n"," graph LR\n"," context(text)\n"," user_prompt --> context\n"," instruction --> context\n"," examples --> context\n"," memory --> context\n"," context --> tokenizer\n"," tokenizer --> embeddings\n"," embeddings --> LLM\n"," classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n"," \"\"\")\n"]},{"cell_type":"markdown","metadata":{"id":"i4Np_l_KtIno"},"source":["## **1 - Understanding Llama 2**"]},{"cell_type":"markdown","metadata":{"id":"PGPSI3M5PGTi"},"source":["### **1.1 - What is Llama 2?**\n","\n","* State of the art (SOTA), Open Source LLM\n","* 7B, 13B, 70B\n","* Pretrained + Chat\n","* Choosing model: Size, Quality, Cost, Speed\n","* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n","\n","* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)"]},{"cell_type":"code","execution_count":7,"metadata":{"id":"OXRCC7wexZXd"},"outputs":[{"data":{"text/html":["<img src=\"https://mermaid.ink/img/CiAgZ3JhcGggTFI7CiAgICAgIGxsYW1hLTIgLS0+IGxsYW1hLTItN2IKICAgICAgbGxhbWEtMiAtLT4gbGxhbWEtMi0xM2IKICAgICAgbGxhbWEtMiAtLT4gbGxhbWEtMi03MGIKICAgICAgbGxhbWEtMi03YiAtLT4gbGxhbWEtMi03Yi1jaGF0CiAgICAgIGxsYW1hLTItMTNiIC0tPiBsbGFtYS0yLTEzYi1jaGF0CiAgICAgIGxsYW1hLTItNzBiIC0tPiBsbGFtYS0yLTcwYi1jaGF0CiAgICAgIGNsYXNzRGVmIGRlZmF1bHQgZmlsbDojQ0NFNkZGLHN0cm9rZTojODRCQ0Y1LHRleHRDb2xvcjojMUMyQjMzLGZvbnRGYW1pbHk6dHJlYnVjaGV0IG1zOwogIA==\"/>"],"text/plain":["<IPython.core.display.Image object>"]},"metadata":{},"output_type":"display_data"}],"source":["llama2_family()"]},{"cell_type":"markdown","metadata":{"id":"aYeHVVh45bdT"},"source":["### **1.2 - Accessing Llama 2**\n","* Download + Self Host (on-premise)\n","* Hosted API Platform (e.g. Replicate)\n","\n","* Hosted Container Platform (e.g. Azure, AWS, GCP)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"kBuSay8vtzL4"},"source":["### **1.3 - Use Cases of Llama 2**\n","* Content Generation\n","* Chatbots\n","* Summarization\n","* Programming (e.g. Code Llama)\n","\n","* and many more..."]},{"cell_type":"markdown","metadata":{"id":"sd54g0OHuqBY"},"source":["## **2 - Using Llama 2**"]},{"cell_type":"markdown","metadata":{"id":"h3YGMDJidHtH"},"source":["### **2.1 - Install dependencies**"]},{"cell_type":"code","execution_count":8,"metadata":{"id":"VhN6hXwx7FCp"},"outputs":[{"name":"stdout","output_type":"stream","text":["Note: you may need to restart the kernel to use updated packages.\n"]}],"source":["# Install dependencies and initialize\n","%pip install -qU \\\n"," replicate \\\n"," langchain \\\n"," sentence_transformers \\\n"," pdf2image \\\n"," pdfminer \\\n"," pdfminer.six \\\n"," unstructured \\\n"," faiss-gpu"]},{"cell_type":"code","execution_count":9,"metadata":{"id":"Z8Y8qjEjmg50"},"outputs":[],"source":["# model we will use throughout the notebook\n","llama2_13b = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\""]},{"cell_type":"code","execution_count":13,"metadata":{"id":"8hkWpqWD28ho"},"outputs":[],"source":["# We will use Replicate hosted cloud environment\n","# Obtain Replicate API key → https://replicate.com/account/api-tokens)\n","# Find the model to use → we will use [`llama-2-13b-chat`](https://replicate.com/lucataco/llama-2-13b-chat)\n","\n","# enter your replicate api token\n","from getpass import getpass\n","import os\n","\n","REPLICATE_API_TOKEN = getpass()\n","REPLICATE_API_TOKEN = \"r8_BllHLsEknCEcXvgyektqHMN4BrzOHe83CGxHz\"\n","os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n","\n","# alternatively, you can also store the tokens in environment variables and load it here"]},{"cell_type":"code","execution_count":11,"metadata":{"id":"bVCHZmETk36v"},"outputs":[],"source":["# we will use replicate's hosted api\n","import replicate\n","\n","# text completion with input prompt\n","def Completion(prompt):\n"," output = replicate.run(\n"," llama2_13b,\n"," input={\"prompt\": prompt, \"max_new_tokens\":1000}\n"," )\n"," return \"\".join(output)\n","\n","# chat completion with input prompt and system prompt\n","def ChatCompletion(prompt, system_prompt=None):\n"," output = replicate.run(\n"," llama2_13b,\n"," input={\"system_prompt\": system_prompt,\n"," \"prompt\": prompt,\n"," \"max_new_tokens\":1000}\n"," )\n"," return \"\".join(output)"]},{"cell_type":"markdown","metadata":{"id":"5Jxq0pmf6L73"},"source":["### **2.2 - Basic completion**"]},{"cell_type":"code","execution_count":12,"metadata":{"id":"H93zZBIk6tNU"},"outputs":[{"ename":"ReplicateError","evalue":"Incorrect authentication token. Learn how to authenticate and get your API token here: https://replicate.com/docs/reference/http#authentication","output_type":"error","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mReplicateError\u001b[0m Traceback (most recent call last)","\u001b[1;32m/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb Cell 18\u001b[0m line \u001b[0;36m1\n\u001b[0;32m----> <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0'>1</a>\u001b[0m output \u001b[39m=\u001b[39m Completion(prompt\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mThe typical color of a llama is: \u001b[39;49m\u001b[39m\"\u001b[39;49m)\n\u001b[1;32m <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=1'>2</a>\u001b[0m md(output)\n","\u001b[1;32m/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb Cell 18\u001b[0m line \u001b[0;36m6\n\u001b[1;32m <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=4'>5</a>\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mCompletion\u001b[39m(prompt):\n\u001b[0;32m----> <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=5'>6</a>\u001b[0m output \u001b[39m=\u001b[39m replicate\u001b[39m.\u001b[39;49mrun(\n\u001b[1;32m <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=6'>7</a>\u001b[0m llama2_13b,\n\u001b[1;32m <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=7'>8</a>\u001b[0m \u001b[39minput\u001b[39;49m\u001b[39m=\u001b[39;49m{\u001b[39m\"\u001b[39;49m\u001b[39mprompt\u001b[39;49m\u001b[39m\"\u001b[39;49m: prompt, \u001b[39m\"\u001b[39;49m\u001b[39mmax_new_tokens\u001b[39;49m\u001b[39m\"\u001b[39;49m:\u001b[39m1000\u001b[39;49m}\n\u001b[1;32m <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=8'>9</a>\u001b[0m )\n\u001b[1;32m <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=9'>10</a>\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39m\"\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m.\u001b[39mjoin(output)\n","File \u001b[0;32m~/miniconda/envs/llama-package/lib/python3.10/site-packages/replicate/client.py:138\u001b[0m, in \u001b[0;36mClient.run\u001b[0;34m(self, model_version, **kwargs)\u001b[0m\n\u001b[1;32m 134\u001b[0m \u001b[39mraise\u001b[39;00m ReplicateError(\n\u001b[1;32m 135\u001b[0m \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mInvalid model_version: \u001b[39m\u001b[39m{\u001b[39;00mmodel_version\u001b[39m}\u001b[39;00m\u001b[39m. Expected format: owner/name:version\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 136\u001b[0m )\n\u001b[1;32m 137\u001b[0m model \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mmodels\u001b[39m.\u001b[39mget(m\u001b[39m.\u001b[39mgroup(\u001b[39m\"\u001b[39m\u001b[39mmodel\u001b[39m\u001b[39m\"\u001b[39m))\n\u001b[0;32m--> 138\u001b[0m version \u001b[39m=\u001b[39m model\u001b[39m.\u001b[39;49mversions\u001b[39m.\u001b[39;49mget(m\u001b[39m.\u001b[39;49mgroup(\u001b[39m\"\u001b[39;49m\u001b[39mversion\u001b[39;49m\u001b[39m\"\u001b[39;49m))\n\u001b[1;32m 139\u001b[0m prediction \u001b[39m=\u001b[39m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mpredictions\u001b[39m.\u001b[39mcreate(version\u001b[39m=\u001b[39mversion, \u001b[39m*\u001b[39m\u001b[39m*\u001b[39mkwargs)\n\u001b[1;32m 140\u001b[0m \u001b[39m# Return an iterator of the output\u001b[39;00m\n","File \u001b[0;32m~/miniconda/envs/llama-package/lib/python3.10/site-packages/replicate/version.py:89\u001b[0m, in \u001b[0;36mVersionCollection.get\u001b[0;34m(self, id)\u001b[0m\n\u001b[1;32m 80\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mget\u001b[39m(\u001b[39mself\u001b[39m, \u001b[39mid\u001b[39m: \u001b[39mstr\u001b[39m) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m Version:\n\u001b[1;32m 81\u001b[0m \u001b[39m \u001b[39m\u001b[39m\"\"\"\u001b[39;00m\n\u001b[1;32m 82\u001b[0m \u001b[39m Get a specific model version.\u001b[39;00m\n\u001b[1;32m 83\u001b[0m \n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 87\u001b[0m \u001b[39m The model version.\u001b[39;00m\n\u001b[1;32m 88\u001b[0m \u001b[39m \"\"\"\u001b[39;00m\n\u001b[0;32m---> 89\u001b[0m resp \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_client\u001b[39m.\u001b[39;49m_request(\n\u001b[1;32m 90\u001b[0m \u001b[39m\"\u001b[39;49m\u001b[39mGET\u001b[39;49m\u001b[39m\"\u001b[39;49m, \u001b[39mf\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39m/v1/models/\u001b[39;49m\u001b[39m{\u001b[39;49;00m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_model\u001b[39m.\u001b[39;49musername\u001b[39m}\u001b[39;49;00m\u001b[39m/\u001b[39;49m\u001b[39m{\u001b[39;49;00m\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_model\u001b[39m.\u001b[39;49mname\u001b[39m}\u001b[39;49;00m\u001b[39m/versions/\u001b[39;49m\u001b[39m{\u001b[39;49;00m\u001b[39mid\u001b[39;49m\u001b[39m}\u001b[39;49;00m\u001b[39m\"\u001b[39;49m\n\u001b[1;32m 91\u001b[0m )\n\u001b[1;32m 92\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mprepare_model(resp\u001b[39m.\u001b[39mjson())\n","File \u001b[0;32m~/miniconda/envs/llama-package/lib/python3.10/site-packages/replicate/client.py:80\u001b[0m, in \u001b[0;36mClient._request\u001b[0;34m(self, method, path, **kwargs)\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39m400\u001b[39m \u001b[39m<\u001b[39m\u001b[39m=\u001b[39m resp\u001b[39m.\u001b[39mstatus_code \u001b[39m<\u001b[39m \u001b[39m600\u001b[39m:\n\u001b[1;32m 79\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[0;32m---> 80\u001b[0m \u001b[39mraise\u001b[39;00m ReplicateError(resp\u001b[39m.\u001b[39mjson()[\u001b[39m\"\u001b[39m\u001b[39mdetail\u001b[39m\u001b[39m\"\u001b[39m])\n\u001b[1;32m 81\u001b[0m \u001b[39mexcept\u001b[39;00m (JSONDecodeError, \u001b[39mKeyError\u001b[39;00m):\n\u001b[1;32m 82\u001b[0m \u001b[39mpass\u001b[39;00m\n","\u001b[0;31mReplicateError\u001b[0m: Incorrect authentication token. Learn how to authenticate and get your API token here: https://replicate.com/docs/reference/http#authentication"]}],"source":["output = Completion(prompt=\"The typical color of a llama is: \")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"StccjUDh6W0Q"},"source":["### **2.3 - System prompts**\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"VRnFogxd6rTc"},"outputs":[],"source":["output = ChatCompletion(\n"," prompt=\"The typical color of a llama is: \",\n"," system_prompt=\"respond with only one word\"\n"," )\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"Hp4GNa066pYy"},"source":["### **2.4 - Response formats**\n","* Can support different formatted outputs e.g. text, JSON, etc."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"HTN79h4RptgQ"},"outputs":[],"source":["output = ChatCompletion(\n"," prompt=\"The typical color of a llama is: \",\n"," system_prompt=\"response in json format\"\n"," )\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"cWs_s9y-avIT"},"source":["## **3 - Gen AI Application Architecture**\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"j9BGuI-9AOL5"},"outputs":[],"source":["genai_app_arch()"]},{"cell_type":"markdown","metadata":{"id":"6UlxBtbgys6j"},"source":["##4 - **Chatbot Architecture**\n","* User Prompts\n","* Input Safety\n","* Llama 2\n","* Output Safety\n","\n","* Memory & Context"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"tO5HnB56ys6t"},"outputs":[],"source":["bot_arch()"]},{"cell_type":"markdown","metadata":{"id":"r4DyTLD5ys6t"},"source":["### **4.1 - Chat conversation**\n","* LLMs are stateless\n","* Single Turn\n","\n","* Multi Turn (Memory)\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"EMM_egWMys6u"},"outputs":[],"source":["# example of single turn chat\n","prompt_chat = \"What is the average lifespan of a Llama?\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question in few words\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"sZ7uVKDYucgi"},"outputs":[],"source":["# example without previous context. LLM's are stateless and cannot understand \"they\" without previous context\n","prompt_chat = \"What animal family are they?\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question in few words\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"WQl3wmfbyBQ1"},"source":["Chat app requires us to send in previous context to LLM to get in valid responses. Below is an example of Multi-turn chat."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"t7SZe5fT3HG3"},"outputs":[],"source":["# example of multi-turn chat, with storing previous context\n","prompt_chat = \"\"\"\n","User: What is the average lifespan of a Llama?\n","Assistant: Sure! The average lifespan of a llama is around 20-30 years.\n","User: What animal family are they?\n","\"\"\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"moXnmJ_xyD10"},"source":["### **4.2 - Prompt Engineering**\n","Prompt engineering refers to the science of designing effective prompts to get desired responses.\n"]},{"cell_type":"markdown","metadata":{"id":"t-v-FeZ4ztTB"},"source":["#### **4.2.1 - In-Context Learning (e.g. Zero-shot, Few-shot)**\n"," * In-context learning - specific method of prompt engineering where demonstration of task are provided as part of prompt.\n"," 1. Zero-shot learning - model is performing tasks without any\n","input examples.\n"," 2. Few or “N-Shot” Learning - model is performing and behaving based on input examples in user's prompt.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"6W71MFNZyRkQ"},"outputs":[],"source":["# Zero-shot example. To get positive/negative/neutral sentiment, we need to give examples in the prompt\n","prompt = '''\n","Classify: I saw a Gecko.\n","Sentiment: ?\n","'''\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"MCQRjf1Y1RYJ"},"outputs":[],"source":["# By giving examples to Llama, it understands the expected output format.\n","\n","prompt = '''\n","Classify: I love Llamas!\n","Sentiment: Positive\n","Classify: I dont like Snakes.\n","Sentiment: Negative\n","Classify: I saw a Gecko.\n","Sentiment:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"One word response\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"8UmdlTmpDZxA"},"outputs":[],"source":["# another zero-shot learning\n","prompt = '''\n","QUESTION: Vicuna?\n","ANSWER:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"M_EcsUo1zqFD"},"outputs":[],"source":["# Another few-shot learning example with formatted prompt.\n","\n","prompt = '''\n","QUESTION: Llama?\n","ANSWER: Yes\n","QUESTION: Alpaca?\n","ANSWER: Yes\n","QUESTION: Rabbit?\n","ANSWER: No\n","QUESTION: Vicuna?\n","ANSWER:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"mbr124Y197xl"},"source":["#### **4.2.2 - Chain of Thought**\n","* \"chain of thought\" or a coherent sequence of ideas is crucial for generating meaningful and contextually relevant responses\n","\n","* Hallucination on word problems"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Xn8zmLBQzpgj"},"outputs":[],"source":["# Standard prompting\n","prompt = '''\n","Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does Llama have now?\n","'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"provide short answer\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"lKNOj79o1Kwu"},"outputs":[],"source":["# Chain-Of-Thought prompting\n","prompt = '''\n","Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does Llama have now?\n","Let's think step by step.\n","'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"provide short answer\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"C7tDW-AH770Y"},"source":["### **4.3 - Retrieval Augmented Generation (RAG)**\n","* Prompt Eng Limitations (Knowledge cutoff & lack of specialized data)\n","\n","* Langchain\n","\n","Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.\n","\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Fl1LPltpRQD9"},"outputs":[],"source":["rag_arch()"]},{"cell_type":"markdown","metadata":{"id":"JJaGMLl_4vYm"},"source":["#### **4.3.1 - LangChain**\n","LangChain is a framework that helps make it easier to implement RAG.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"aoqU3KTcHTWN"},"outputs":[],"source":["# langchain setup\n","from langchain.llms import Replicate\n","# Use the Llama 2 model hosted on Replicate\n","# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n","# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens\n","# max_new_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens\n","llama_model = Replicate(\n"," model=llama2_13b,\n"," model_kwargs={\"temperature\": 0.75,\"top_p\": 1, \"max_new_tokens\":1000}\n",")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gAV2EkZqcruF"},"outputs":[],"source":["# Step 1: load the external data source. In our case, we will load Meta’s “Responsible Use Guide” pdf document.\n","from langchain.document_loaders import OnlinePDFLoader\n","loader = OnlinePDFLoader(\"https://ai.meta.com/static-resource/responsible-use-guide/\")\n","documents = loader.load()\n","\n","# Step 2: Get text splits from document\n","from langchain.text_splitter import RecursiveCharacterTextSplitter\n","text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)\n","all_splits = text_splitter.split_documents(documents)\n","\n","# Step 3: Use the embedding model\n","from langchain.vectorstores import FAISS\n","from langchain.embeddings import HuggingFaceEmbeddings\n","model_name = \"sentence-transformers/all-mpnet-base-v2\" # embedding model\n","model_kwargs = {\"device\": \"cpu\"}\n","embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)\n","\n","# Step 4: Use vector store to store embeddings\n","vectorstore = FAISS.from_documents(all_splits, embeddings)"]},{"cell_type":"markdown","metadata":{"id":"K2l8S5tBxlkc"},"source":["#### **4.3.2 - LangChain Q&A Retriever**\n","* ConversationalRetrievalChain\n","\n","* Query the Source documents\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"NmEhBe3Kiyre"},"outputs":[],"source":["# Query against your own data\n","from langchain.chains import ConversationalRetrievalChain\n","chain = ConversationalRetrievalChain.from_llm(llama_model, vectorstore.as_retriever(), return_source_documents=True)\n","\n","chat_history = []\n","query = \"How is Meta approaching open science in two short sentences?\"\n","result = chain({\"question\": query, \"chat_history\": chat_history})\n","md(result['answer'])"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"CelLHIvoy2Ke"},"outputs":[],"source":["# This time your previous question and answer will be included as a chat history which will enable the ability\n","# to ask follow up questions.\n","chat_history = [(query, result[\"answer\"])]\n","query = \"How is it benefiting the world?\"\n","result = chain({\"question\": query, \"chat_history\": chat_history})\n","md(result['answer'])"]},{"cell_type":"markdown","metadata":{"id":"TEvefAWIJONx"},"source":["## **5 - Fine-Tuning Models**\n","\n","* Limitatons of Prompt Eng and RAG\n","* Fine-Tuning Arch\n","* Types (PEFT, LoRA, QLoRA)\n","* Using PyTorch for Pre-Training & Fine-Tuning\n","\n","* Evals + Quality\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"0a9CvJ8YcTzV"},"outputs":[],"source":["fine_tuned_arch()"]},{"cell_type":"markdown","metadata":{"id":"_8lcgdZa8onC"},"source":["## **6 - Responsible AI**\n","\n","* Power + Responsibility\n","* Hallucinations\n","* Input & Output Safety\n","* Red-teaming (simulating real-world cyber attackers)\n","* [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"pbqb006R-T_k"},"source":["##**7 - Conclusion**\n","* Active research on LLMs and Llama\n","* Leverage the power of Llama and its open community\n","* Safety and responsible use is paramount!\n","\n","* Call-To-Action\n"," * [Replicate Free Credits](https://replicate.fyi/connect2023) for Connect attendees!\n"," * This notebook is available through Llama Github recipes\n"," * Use Llama in your projects and give us feedback\n"]},{"cell_type":"markdown","metadata":{"id":"gSz5dTMxp7xo"},"source":["#### **Resources**\n","- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n","- [Github - LLama 2 Recipes](https://github.com/facebookresearch/llama-recipes)\n","- [Llama 2](https://ai.meta.com/llama/)\n","- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n","- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n","- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n","- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n","- [Replicate](https://replicate.com/meta/)\n","- [LangChain](https://www.langchain.com/)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"V7aI6fhZp-KC"},"source":["#### **Authors & Contact**\n"," * asangani@meta.com, [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)\n"," * mohsena@meta.com, [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)\n"]}],"metadata":{"colab":{"collapsed_sections":["ioVMNcTesSEk"],"machine_shape":"hm","provenance":[],"toc_visible":true},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.13"}},"nbformat":4,"nbformat_minor":0}
# **Getting to know Llama 2: Everything you need to start building**
# **Getting to know Llama 2: Everything you need to start building**
Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects.
Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects.
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
##**0 - Prerequisites**
##**0 - Prerequisites**
* Basic understanding of Large Language Models
* Basic understanding of Large Language Models
* Basic understanding of Python
* Basic understanding of Python
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
%pip install matplotlib
%pip install ipywidgets
```
%% Output
Requirement already satisfied: matplotlib in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (3.8.0)
Requirement already satisfied: contourpy>=1.0.1 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (1.1.1)
Requirement already satisfied: cycler>=0.10 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (4.42.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (1.4.5)
Requirement already satisfied: numpy<2,>=1.21 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (1.25.2)
Requirement already satisfied: packaging>=20.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (9.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (3.1.1)
Requirement already satisfied: python-dateutil>=2.7 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from matplotlib) (2.8.2)
Requirement already satisfied: six>=1.5 in /data/home/hamidnazeri/miniconda/envs/llama-package/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
Collecting ipywidgets
Obtaining dependency information for ipywidgets from https://files.pythonhosted.org/packages/4a/0e/57ed498fafbc60419a9332d872e929879ceba2d73cb11d284d7112472b3e/ipywidgets-8.1.1-py3-none-any.whl.metadata
Obtaining dependency information for widgetsnbextension~=4.0.9 from https://files.pythonhosted.org/packages/29/03/107d96077c4befed191f7ad1a12c7b52a8f9d2778a5836d59f9855c105f6/widgetsnbextension-4.0.9-py3-none-any.whl.metadata
Obtaining dependency information for jupyterlab-widgets~=3.0.9 from https://files.pythonhosted.org/packages/e8/05/0ebab152288693b5ec7b339aab857362947031143b282853b4c2dd4b5b40/jupyterlab_widgets-3.0.9-py3-none-any.whl.metadata
/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb Cell 18 line 1
----> <a href='vscode-notebook-cell://ssh-remote%2Bpytorch/data/home/hamidnazeri/llama-package/llama-recipes/examples/Getting_to_know_Llama.ipynb#X22sdnNjb2RlLXJlbW90ZQ%3D%3D?line=0'>1</a> output = Completion(prompt="The typical color of a llama is: ")
ReplicateError: Incorrect authentication token. Learn how to authenticate and get your API token here: https://replicate.com/docs/reference/http#authentication
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
### **2.3 - System prompts**
### **2.3 - System prompts**
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
output = ChatCompletion(
output = ChatCompletion(
prompt="The typical color of a llama is: ",
prompt="The typical color of a llama is: ",
system_prompt="respond with only one word"
system_prompt="respond with only one word"
)
)
md(output)
md(output)
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
### **2.4 - Response formats**
### **2.4 - Response formats**
* Can support different formatted outputs e.g. text, JSON, etc.
* Can support different formatted outputs e.g. text, JSON, etc.
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
output = ChatCompletion(
output = ChatCompletion(
prompt="The typical color of a llama is: ",
prompt="The typical color of a llama is: ",
system_prompt="response in json format"
system_prompt="response in json format"
)
)
md(output)
md(output)
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
## **3 - Gen AI Application Architecture**
## **3 - Gen AI Application Architecture**
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
genai_app_arch()
genai_app_arch()
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
##4 - **Chatbot Architecture**
##4 - **Chatbot Architecture**
* User Prompts
* User Prompts
* Input Safety
* Input Safety
* Llama 2
* Llama 2
* Output Safety
* Output Safety
* Memory & Context
* Memory & Context
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
bot_arch()
bot_arch()
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
### **4.1 - Chat conversation**
### **4.1 - Chat conversation**
* LLMs are stateless
* LLMs are stateless
* Single Turn
* Single Turn
* Multi Turn (Memory)
* Multi Turn (Memory)
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
# example of single turn chat
# example of single turn chat
prompt_chat = "What is the average lifespan of a Llama?"
prompt_chat = "What is the average lifespan of a Llama?"
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question in few words")
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question in few words")
md(output)
md(output)
```
```
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
# example without previous context. LLM's are stateless and cannot understand "they" without previous context
# example without previous context. LLM's are stateless and cannot understand "they" without previous context
prompt_chat = "What animal family are they?"
prompt_chat = "What animal family are they?"
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question in few words")
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question in few words")
md(output)
md(output)
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
Chat app requires us to send in previous context to LLM to get in valid responses. Below is an example of Multi-turn chat.
Chat app requires us to send in previous context to LLM to get in valid responses. Below is an example of Multi-turn chat.
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
# example of multi-turn chat, with storing previous context
# example of multi-turn chat, with storing previous context
prompt_chat = """
prompt_chat = """
User: What is the average lifespan of a Llama?
User: What is the average lifespan of a Llama?
Assistant: Sure! The average lifespan of a llama is around 20-30 years.
Assistant: Sure! The average lifespan of a llama is around 20-30 years.
User: What animal family are they?
User: What animal family are they?
"""
"""
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question")
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question")
md(output)
md(output)
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
### **4.2 - Prompt Engineering**
### **4.2 - Prompt Engineering**
Prompt engineering refers to the science of designing effective prompts to get desired responses.
Prompt engineering refers to the science of designing effective prompts to get desired responses.
* Prompt Eng Limitations (Knowledge cutoff & lack of specialized data)
* Prompt Eng Limitations (Knowledge cutoff & lack of specialized data)
* Langchain
* Langchain
Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.
Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
rag_arch()
rag_arch()
```
```
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
#### **4.3.1 - LangChain**
#### **4.3.1 - LangChain**
LangChain is a framework that helps make it easier to implement RAG.
LangChain is a framework that helps make it easier to implement RAG.
%% Cell type:code id: tags:
%% Cell type:code id: tags:
```
```
# langchain setup
# langchain setup
from langchain.llms import Replicate
from langchain.llms import Replicate
# Use the Llama 2 model hosted on Replicate
# Use the Llama 2 model hosted on Replicate
# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value
# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value
# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens
# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens
# max_new_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens
# max_new_tokens: Maximum number of tokens to generate. A word is generally 2-3 tokens