diff --git a/examples/Getting_to_know_Llama.ipynb b/examples/Getting_to_know_Llama.ipynb
index 8944d3d1838483f5457ce1dcbbd840826d8132ab..c751021cb004bcf2ecdb56cc50e938698bf25ec3 100644
--- a/examples/Getting_to_know_Llama.ipynb
+++ b/examples/Getting_to_know_Llama.ipynb
@@ -1 +1 @@
-{"cells":[{"cell_type":"markdown","source":["![Meta---Logo@1x.jpg]()"],"metadata":{"id":"RJSnI0Xy-kCm"}},{"cell_type":"markdown","metadata":{"id":"LERqQn5v8-ak"},"source":["# **Getting to know Llama 2: Everything you need to start building**\n","Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All this is implemented with a starter code for you to take it and use it in your Llama 2 projects."]},{"cell_type":"markdown","source":["##**0 - Prerequisites**\n","* Basic understanding of Large Language Models\n","\n","* Basic understanding of Python"],"metadata":{"id":"ioVMNcTesSEk"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"ktEA7qXmwdUM"},"outputs":[],"source":["# presentation layer code\n","\n","import base64\n","from IPython.display import Image, display\n","import matplotlib.pyplot as plt\n","\n","def mm(graph):\n","  graphbytes = graph.encode(\"ascii\")\n","  base64_bytes = base64.b64encode(graphbytes)\n","  base64_string = base64_bytes.decode(\"ascii\")\n","  display(Image(url=\"https://mermaid.ink/img/\" + base64_string))\n","\n","def genai_app_arch():\n","  mm(\"\"\"\n","  flowchart TD\n","    A[Users] --> B(Applications e.g. mobile, web)\n","    B --> |Hosted API|C(Platforms e.g. Custom, HuggingFace, Replicate)\n","    B -- optional --> E(Frameworks e.g. LangChain)\n","    C-->|User Input|D[Llama 2]\n","    D-->|Model Output|C\n","    E --> C\n","    classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def rag_arch():\n","  mm(\"\"\"\n","  flowchart TD\n","    A[User Prompts] --> B(Frameworks e.g. 
LangChain)\n","    B <--> |Database, Docs, XLS|C[fa:fa-database External Data]\n","    B -->|API|D[Llama 2]\n","    classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def llama2_family():\n","  mm(\"\"\"\n","  graph LR;\n","      llama-2 --> llama-2-7b\n","      llama-2 --> llama-2-13b\n","      llama-2 --> llama-2-70b\n","      llama-2-7b --> llama-2-7b-chat\n","      llama-2-13b --> llama-2-13b-chat\n","      llama-2-70b --> llama-2-70b-chat\n","      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def apps_and_llms():\n","  mm(\"\"\"\n","  graph LR;\n","    users --> apps\n","    apps --> frameworks\n","    frameworks --> platforms\n","    platforms --> Llama 2\n","    classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","import ipywidgets as widgets\n","from IPython.display import display, Markdown\n","\n","# Create a text widget\n","API_KEY = widgets.Password(\n","    value='',\n","    placeholder='',\n","    description='API_KEY:',\n","    disabled=False\n",")\n","\n","def md(t):\n","  display(Markdown(t))\n","\n","def bot_arch():\n","  mm(\"\"\"\n","  graph LR;\n","  user --> prompt\n","  prompt --> i_safety\n","  i_safety --> context\n","  context --> Llama_2\n","  Llama_2 --> output\n","  output --> o_safety\n","  i_safety --> memory\n","  o_safety --> memory\n","  memory --> context\n","  o_safety --> user\n","  classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def fine_tuned_arch():\n","  mm(\"\"\"\n","  graph LR;\n","      Custom_Dataset --> Pre-trained_Llama\n","      Pre-trained_Llama --> Fine-tuned_Llama\n","      Fine-tuned_Llama --> RLHF\n","      RLHF --> |Loss:Cross-Entropy|Fine-tuned_Llama\n","      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  
\"\"\")\n","\n","def load_data_faiss_arch():\n","  mm(\"\"\"\n","  graph LR;\n","      documents --> textsplitter\n","      textsplitter --> embeddings\n","      embeddings --> vectorstore\n","      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def mem_context():\n","  mm(\"\"\"\n","      graph LR\n","      context(text)\n","      user_prompt --> context\n","      instruction --> context\n","      examples --> context\n","      memory --> context\n","      context --> tokenizer\n","      tokenizer --> embeddings\n","      embeddings --> LLM\n","      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n"]},{"cell_type":"markdown","source":["##**1 - Understanding Llama 2**"],"metadata":{"id":"i4Np_l_KtIno"}},{"cell_type":"markdown","metadata":{"id":"PGPSI3M5PGTi"},"source":["### **1.1 - What is Llama 2?**\n","\n","* State of the art (SOTA), Open Source LLM\n","* 7B, 13B, 70B\n","* Pretrained + Chat\n","* Choosing model: Size, Quality, Cost, Speed\n","* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n","\n","* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"OXRCC7wexZXd"},"outputs":[],"source":["llama2_family()"]},{"cell_type":"markdown","metadata":{"id":"aYeHVVh45bdT"},"source":["###**1.2 - Accessing Llama 2**\n","* Download + Self Host (on-premise)\n","* Hosted API Platform (e.g. Replicate)\n","\n","* Hosted Container Platform (e.g. Azure, AWS, GCP)\n","\n"]},{"cell_type":"markdown","source":["### **1.3 - Use Cases of Llama 2**\n","* Content Generation\n","* Chatbots\n","* Summarization\n","* Programming (e.g. 
Code Llama)\n","\n","* and many more..."],"metadata":{"id":"kBuSay8vtzL4"}},{"cell_type":"markdown","source":["##**2 - Using Llama 2**"],"metadata":{"id":"sd54g0OHuqBY"}},{"cell_type":"markdown","metadata":{"id":"h3YGMDJidHtH"},"source":["### **2.1 - Install dependencies**"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"VhN6hXwx7FCp"},"outputs":[],"source":["# Install dependencies and initialize\n","%pip install -qU \\\n","    replicate \\\n","    langchain \\\n","    sentence_transformers \\\n","    pdf2image \\\n","    pdfminer \\\n","    pdfminer.six \\\n","    unstructured \\\n","    faiss-gpu"]},{"cell_type":"code","source":["# model we will use throughout the notebook\n","llama2_13b = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\""],"metadata":{"id":"Z8Y8qjEjmg50"},"execution_count":null,"outputs":[]},{"cell_type":"code","execution_count":null,"metadata":{"id":"8hkWpqWD28ho"},"outputs":[],"source":["# We will use Replicate hosted cloud environment\n","# Obtain Replicate API key → https://replicate.com/account/api-tokens)\n","# Find the model to use → we will use [`llama-2-13b-chat`](https://replicate.com/lucataco/llama-2-13b-chat)\n","\n","# enter your replicate api token\n","from getpass import getpass\n","import os\n","\n","REPLICATE_API_TOKEN = getpass()\n","os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n","\n","# alternatively, you can also store the tokens in environment variables and load it here"]},{"cell_type":"code","source":["# we will use replicate's hosted api\n","import replicate\n","\n","# text completion with input prompt\n","def Completion(prompt):\n","  output = replicate.run(\n","      llama2_13b,\n","      input={\"prompt\": prompt, \"max_new_tokens\":1000}\n","  )\n","  return \"\".join(output)\n","\n","# chat completion with input prompt and system prompt\n","def ChatCompletion(prompt, system_prompt=None):\n","  output = replicate.run(\n","    llama2_13b,\n","    
input={\"system_prompt\": system_prompt,\n","            \"prompt\": prompt,\n","            \"max_new_tokens\":1000}\n","  )\n","  return \"\".join(output)"],"metadata":{"id":"bVCHZmETk36v"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"5Jxq0pmf6L73"},"source":["### **2.2 - Basic completion**"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"H93zZBIk6tNU"},"outputs":[],"source":["output = Completion(prompt=\"The typical color of a llama is: \")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"StccjUDh6W0Q"},"source":["### **2.3 - System prompts**\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"VRnFogxd6rTc"},"outputs":[],"source":["output = ChatCompletion(\n","    prompt=\"The typical color of a llama is: \",\n","    system_prompt=\"respond with only one word\"\n","  )\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"Hp4GNa066pYy"},"source":["### **2.4 - Response formats**\n","* Can support different formatted outputs e.g. 
text, JSON, etc."]},{"cell_type":"code","source":["output = ChatCompletion(\n","    prompt=\"The typical color of a llama is: \",\n","    system_prompt=\"response in json format\"\n","  )\n","md(output)"],"metadata":{"id":"HTN79h4RptgQ"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"cWs_s9y-avIT"},"source":["## **3 - Gen AI Application Architecture**\n"]},{"cell_type":"code","source":["genai_app_arch()"],"metadata":{"id":"j9BGuI-9AOL5"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"6UlxBtbgys6j"},"source":["##4 - **Chatbot Architecture**\n","* User Prompts\n","* Input Safety\n","* Llama 2\n","* Output Safety\n","\n","* Memory & Context"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"tO5HnB56ys6t"},"outputs":[],"source":["bot_arch()"]},{"cell_type":"markdown","metadata":{"id":"r4DyTLD5ys6t"},"source":["### **4.1 - Chat conversation**\n","* LLMs are stateless\n","* Single Turn\n","\n","* Multi Turn (Memory)\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"EMM_egWMys6u"},"outputs":[],"source":["# example of single turn chat\n","prompt_chat = \"What is the average lifespan of a Llama?\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question in few words\")\n","md(output)"]},{"cell_type":"code","source":["# example without previous context. LLM's are stateless and cannot understand \"they\" without previous context\n","prompt_chat = \"What animal family are they?\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question in few words\")\n","md(output)"],"metadata":{"id":"sZ7uVKDYucgi"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Chat app requires us to send in previous context to LLM to get in valid responses. 
Below is an example of Multi-turn chat."],"metadata":{"id":"WQl3wmfbyBQ1"}},{"cell_type":"code","source":["# example of multi-turn chat, with storing previous context\n","prompt_chat = \"\"\"\n","User: What is the average lifespan of a Llama?\n","Assistant: Sure! The average lifespan of a llama is around 20-30 years.\n","User: What animal family are they?\n","\"\"\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question\")\n","md(output)"],"metadata":{"id":"t7SZe5fT3HG3"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"moXnmJ_xyD10"},"source":["### **4.2 - Prompt Engineering**\n","Prompt engineering refers to the science of designing effective prompts to get desired responses.\n"]},{"cell_type":"markdown","metadata":{"id":"t-v-FeZ4ztTB"},"source":["#### **4.2.1 - In-Context Learning (e.g. Zero-shot, Few-shot)**\n"," * In-context learning - specific method of prompt engineering where demonstration of task are provided as part of prompt.\n","  1. Zero-shot learning - model is performing tasks without any\n","input examples.\n","  2. Few or “N-Shot” Learning - model is performing and behaving based on input examples in user's prompt.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"6W71MFNZyRkQ"},"outputs":[],"source":["# Zero-shot example. 
To get positive/negative/neutral sentiment, we need to give examples in the prompt\n","prompt = '''\n","Classify: I saw a Gecko.\n","Sentiment: ?\n","'''\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"MCQRjf1Y1RYJ"},"outputs":[],"source":["# By giving examples to Llama, it understands the expected output format.\n","\n","prompt = '''\n","Classify: I love Llamas!\n","Sentiment: Positive\n","Classify: I dont like Snakes.\n","Sentiment: Negative\n","Classify: I saw a Gecko.\n","Sentiment:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"One word response\")\n","md(output)"]},{"cell_type":"code","source":["# another zero-shot learning\n","prompt = '''\n","QUESTION: Vicuna?\n","ANSWER:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"],"metadata":{"id":"8UmdlTmpDZxA"},"execution_count":null,"outputs":[]},{"cell_type":"code","execution_count":null,"metadata":{"id":"M_EcsUo1zqFD"},"outputs":[],"source":["# Another few-shot learning example with formatted prompt.\n","\n","prompt = '''\n","QUESTION: Llama?\n","ANSWER: Yes\n","QUESTION: Alpaca?\n","ANSWER: Yes\n","QUESTION: Rabbit?\n","ANSWER: No\n","QUESTION: Vicuna?\n","ANSWER:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"mbr124Y197xl"},"source":["#### **4.2.2 - Chain of Thought**\n","* \"chain of thought\" or a coherent sequence of ideas is crucial for generating meaningful and contextually relevant responses\n","\n","* Hallucination on word problems"]},{"cell_type":"code","source":["# Standard prompting\n","prompt = '''\n","Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. 
How many tennis balls does Llama have now?\n","'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"provide short answer\")\n","md(output)"],"metadata":{"id":"Xn8zmLBQzpgj"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# Chain-Of-Thought prompting\n","prompt = '''\n","Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does Llama have now?\n","Let's think step by step.\n","'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"provide short answer\")\n","md(output)"],"metadata":{"id":"lKNOj79o1Kwu"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"C7tDW-AH770Y"},"source":["### **4.3 - Retrieval Augmented Generation (RAG)**\n","* Prompt Eng Limitations (Knowledge cutoff & lack of specialized data)\n","\n","* Langchain\n","\n","Retrieval Augmented Generation(RAG) allows us to retrieve snippets of information from external data sources and augment it to the user's prompt to get tailored responses from Llama 2.\n","\n","\n"]},{"cell_type":"code","source":["rag_arch()"],"metadata":{"id":"Fl1LPltpRQD9"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["#### **4.3.1 - LangChain**\n","LangChain is a framework that helps make it easier to implement RAG.\n"],"metadata":{"id":"JJaGMLl_4vYm"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"aoqU3KTcHTWN"},"outputs":[],"source":["# langchain setup\n","from langchain.llms import Replicate\n","# Use the Llama 2 model hosted on Replicate\n","# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n","# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens\n","# max_new_tokens: Maximum number of tokens to generate. 
A word is generally 2-3 tokens\n","llama_model = Replicate(\n","    model=llama2_13b,\n","    model_kwargs={\"temperature\": 0.75,\"top_p\": 1, \"max_new_tokens\":1000}\n",")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gAV2EkZqcruF"},"outputs":[],"source":["# Step 1: load the external data source. In our case, we will load Meta’s “Responsible Use Guide” pdf document.\n","from langchain.document_loaders import OnlinePDFLoader\n","loader = OnlinePDFLoader(\"https://ai.meta.com/static-resource/responsible-use-guide/\")\n","documents = loader.load()\n","\n","# Step 2: Get text splits from document\n","from langchain.text_splitter import RecursiveCharacterTextSplitter\n","text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)\n","all_splits = text_splitter.split_documents(documents)\n","\n","# Step 3: Use the embedding model\n","from langchain.vectorstores import FAISS\n","from langchain.embeddings import HuggingFaceEmbeddings\n","model_name = \"sentence-transformers/all-mpnet-base-v2\" # embedding model\n","model_kwargs = {\"device\": \"cpu\"}\n","embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)\n","\n","# Step 4: Use vector store to store embeddings\n","vectorstore = FAISS.from_documents(all_splits, embeddings)"]},{"cell_type":"markdown","metadata":{"id":"K2l8S5tBxlkc"},"source":["#### **4.3.2 - LangChain Q&A Retriever**\n","* ConversationalRetrievalChain\n","\n","* Query the Source documents\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"NmEhBe3Kiyre"},"outputs":[],"source":["# Query against your own data\n","from langchain.chains import ConversationalRetrievalChain\n","chain = ConversationalRetrievalChain.from_llm(llama_model, vectorstore.as_retriever(), return_source_documents=True)\n","\n","chat_history = []\n","query = \"How is Meta approaching open science in two short sentences?\"\n","result = chain({\"question\": query, \"chat_history\": 
chat_history})\n","md(result['answer'])"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"CelLHIvoy2Ke"},"outputs":[],"source":["# This time your previous question and answer will be included as a chat history which will enable the ability\n","# to ask follow up questions.\n","chat_history = [(query, result[\"answer\"])]\n","query = \"How is it benefiting the world?\"\n","result = chain({\"question\": query, \"chat_history\": chat_history})\n","md(result['answer'])"]},{"cell_type":"markdown","source":["## **5 - Fine-Tuning Models**\n","\n","* Limitatons of Prompt Eng and RAG\n","* Fine-Tuning Arch\n","* Types (PEFT, LoRA, QLoRA)\n","* Using PyTorch for Pre-Training & Fine-Tuning\n","\n","* Evals + Quality\n"],"metadata":{"id":"TEvefAWIJONx"}},{"cell_type":"code","source":["fine_tuned_arch()"],"metadata":{"id":"0a9CvJ8YcTzV"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"_8lcgdZa8onC"},"source":["## **6 - Responsible AI**\n","\n","* Power + Responsibility\n","* Hallucinations\n","* Input & Output Safety\n","* Red-teaming (simulating real-world cyber attackers)\n","* [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"pbqb006R-T_k"},"source":["##**7 - Conclusion**\n","* Active research on LLMs and Llama\n","* Leverage the power of Llama and its open community\n","* Safety and responsible use is paramount!\n","\n","* Call-To-Action\n","  * [Replicate Free Credits](https://replicate.fyi/connect2023) for Connect attendees!\n","  * This notebook is available through Llama Github recipes\n","  * Use Llama in your projects and give us feedback\n"]},{"cell_type":"markdown","source":["#### **Resources**\n","- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n","- [Github - LLama 2 Recipes](https://github.com/facebookresearch/llama-recipes)\n","- [Llama 2](https://ai.meta.com/llama/)\n","- [Research 
Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n","- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n","- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n","- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n","- [Replicate](https://replicate.com/meta/)\n","- [LangChain](https://www.langchain.com/)\n","\n"],"metadata":{"id":"gSz5dTMxp7xo"}},{"cell_type":"markdown","source":["#### **Authors & Contact**\n","  * asangani@meta.com, [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)\n","  * mohsena@meta.com, [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)\n"],"metadata":{"id":"V7aI6fhZp-KC"}}],"metadata":{"colab":{"provenance":[],"machine_shape":"hm","collapsed_sections":["ioVMNcTesSEk"],"toc_visible":true},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.6"}},"nbformat":4,"nbformat_minor":0}
\ No newline at end of file
+{"cells":[{"cell_type":"markdown","metadata":{"id":"RJSnI0Xy-kCm"},"source":["![Meta---Logo@1x.jpg]()"]},{"cell_type":"markdown","metadata":{"id":"LERqQn5v8-ak"},"source":["# **Getting to know Llama 2: Everything you need to start building**\n","Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), Fine-tuning and more. All of this is implemented as starter code that you can take and use in your own Llama 2 projects."]},{"cell_type":"markdown","metadata":{"id":"ioVMNcTesSEk"},"source":["## **0 - Prerequisites**\n","* Basic understanding of Large Language Models\n","\n","* Basic understanding of Python"]},{"cell_type":"code","execution_count":5,"metadata":{},"outputs":[],"source":["%pip install matplotlib\n","%pip install ipywidgets"]},{"cell_type":"code","execution_count":6,"metadata":{"id":"ktEA7qXmwdUM"},"outputs":[],"source":["# presentation layer code\n","\n","import base64\n","from IPython.display import Image, display\n","import matplotlib.pyplot as plt\n","import ipywidgets as widgets\n","from IPython.display import display, Markdown\n","\n","\n","def mm(graph):\n","  graphbytes = graph.encode(\"ascii\")\n","  base64_bytes = base64.b64encode(graphbytes)\n","  base64_string = base64_bytes.decode(\"ascii\")\n","  display(Image(url=\"https://mermaid.ink/img/\" + base64_string))\n","\n","def genai_app_arch():\n","  mm(\"\"\"\n","  flowchart TD\n","    A[Users] --> B(Applications e.g. mobile, web)\n","    B --> |Hosted API|C(Platforms e.g. Custom, HuggingFace, Replicate)\n","    B -- optional --> E(Frameworks e.g. LangChain)\n","    C-->|User Input|D[Llama 2]\n","    D-->|Model Output|C\n","    E --> C\n","    classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def rag_arch():\n","  mm(\"\"\"\n","  flowchart TD\n","    A[User Prompts] --> B(Frameworks e.g. 
LangChain)\n","    B <--> |Database, Docs, XLS|C[fa:fa-database External Data]\n","    B -->|API|D[Llama 2]\n","    classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def llama2_family():\n","  mm(\"\"\"\n","  graph LR;\n","      llama-2 --> llama-2-7b\n","      llama-2 --> llama-2-13b\n","      llama-2 --> llama-2-70b\n","      llama-2-7b --> llama-2-7b-chat\n","      llama-2-13b --> llama-2-13b-chat\n","      llama-2-70b --> llama-2-70b-chat\n","      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def apps_and_llms():\n","  mm(\"\"\"\n","  graph LR;\n","    users --> apps\n","    apps --> frameworks\n","    frameworks --> platforms\n","    platforms --> Llama 2\n","    classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","\n","\n","# Create a text widget\n","API_KEY = widgets.Password(\n","    value='',\n","    placeholder='',\n","    description='API_KEY:',\n","    disabled=False\n",")\n","\n","def md(t):\n","  display(Markdown(t))\n","\n","def bot_arch():\n","  mm(\"\"\"\n","  graph LR;\n","  user --> prompt\n","  prompt --> i_safety\n","  i_safety --> context\n","  context --> Llama_2\n","  Llama_2 --> output\n","  output --> o_safety\n","  i_safety --> memory\n","  o_safety --> memory\n","  memory --> context\n","  o_safety --> user\n","  classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def fine_tuned_arch():\n","  mm(\"\"\"\n","  graph LR;\n","      Custom_Dataset --> Pre-trained_Llama\n","      Pre-trained_Llama --> Fine-tuned_Llama\n","      Fine-tuned_Llama --> RLHF\n","      RLHF --> |Loss:Cross-Entropy|Fine-tuned_Llama\n","      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def load_data_faiss_arch():\n","  mm(\"\"\"\n","  graph LR;\n","      
documents --> textsplitter\n","      textsplitter --> embeddings\n","      embeddings --> vectorstore\n","      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n","\n","def mem_context():\n","  mm(\"\"\"\n","      graph LR\n","      context(text)\n","      user_prompt --> context\n","      instruction --> context\n","      examples --> context\n","      memory --> context\n","      context --> tokenizer\n","      tokenizer --> embeddings\n","      embeddings --> LLM\n","      classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;\n","  \"\"\")\n"]},{"cell_type":"markdown","metadata":{"id":"i4Np_l_KtIno"},"source":["## **1 - Understanding Llama 2**"]},{"cell_type":"markdown","metadata":{"id":"PGPSI3M5PGTi"},"source":["### **1.1 - What is Llama 2?**\n","\n","* State of the art (SOTA), Open Source LLM\n","* 7B, 13B, 70B\n","* Pretrained + Chat\n","* Choosing model: Size, Quality, Cost, Speed\n","* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n","\n","* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)"]},{"cell_type":"code","execution_count":7,"metadata":{"id":"OXRCC7wexZXd"},"outputs":[{"data":{"text/html":["<img src=\"https://mermaid.ink/img/CiAgZ3JhcGggTFI7CiAgICAgIGxsYW1hLTIgLS0+IGxsYW1hLTItN2IKICAgICAgbGxhbWEtMiAtLT4gbGxhbWEtMi0xM2IKICAgICAgbGxhbWEtMiAtLT4gbGxhbWEtMi03MGIKICAgICAgbGxhbWEtMi03YiAtLT4gbGxhbWEtMi03Yi1jaGF0CiAgICAgIGxsYW1hLTItMTNiIC0tPiBsbGFtYS0yLTEzYi1jaGF0CiAgICAgIGxsYW1hLTItNzBiIC0tPiBsbGFtYS0yLTcwYi1jaGF0CiAgICAgIGNsYXNzRGVmIGRlZmF1bHQgZmlsbDojQ0NFNkZGLHN0cm9rZTojODRCQ0Y1LHRleHRDb2xvcjojMUMyQjMzLGZvbnRGYW1pbHk6dHJlYnVjaGV0IG1zOwogIA==\"/>"],"text/plain":["<IPython.core.display.Image object>"]},"metadata":{},"output_type":"display_data"}],"source":["llama2_family()"]},{"cell_type":"markdown","metadata":{"id":"aYeHVVh45bdT"},"source":["### **1.2 - Accessing Llama 2**\n","* 
Download + Self Host (on-premise)\n","* Hosted API Platform (e.g. Replicate)\n","\n","* Hosted Container Platform (e.g. Azure, AWS, GCP)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"kBuSay8vtzL4"},"source":["### **1.3 - Use Cases of Llama 2**\n","* Content Generation\n","* Chatbots\n","* Summarization\n","* Programming (e.g. Code Llama)\n","\n","* and many more..."]},{"cell_type":"markdown","metadata":{"id":"sd54g0OHuqBY"},"source":["## **2 - Using Llama 2**"]},{"cell_type":"markdown","metadata":{"id":"h3YGMDJidHtH"},"source":["### **2.1 - Install dependencies**"]},{"cell_type":"code","execution_count":8,"metadata":{"id":"VhN6hXwx7FCp"},"outputs":[{"name":"stdout","output_type":"stream","text":["Note: you may need to restart the kernel to use updated packages.\n"]}],"source":["# Install dependencies and initialize\n","%pip install -qU \\\n","    replicate \\\n","    langchain \\\n","    sentence_transformers \\\n","    pdf2image \\\n","    pdfminer \\\n","    pdfminer.six \\\n","    unstructured \\\n","    faiss-gpu"]},{"cell_type":"code","execution_count":9,"metadata":{"id":"Z8Y8qjEjmg50"},"outputs":[],"source":["# model we will use throughout the notebook\n","llama2_13b = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\""]},{"cell_type":"code","execution_count":13,"metadata":{"id":"8hkWpqWD28ho"},"outputs":[],"source":["# We will use Replicate's hosted cloud environment\n","# Obtain a Replicate API token → https://replicate.com/account/api-tokens\n","# Find the model to use → we will use [`llama-2-13b-chat`](https://replicate.com/lucataco/llama-2-13b-chat)\n","\n","# enter your Replicate API token when prompted; never hardcode it in the notebook\n","from getpass import getpass\n","import os\n","\n","REPLICATE_API_TOKEN = getpass()\n","os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n","\n","# alternatively, you can also store the tokens in environment variables and load it 
here"]},{"cell_type":"code","execution_count":11,"metadata":{"id":"bVCHZmETk36v"},"outputs":[],"source":["# we will use replicate's hosted api\n","import replicate\n","\n","# text completion with input prompt\n","def Completion(prompt):\n","  output = replicate.run(\n","      llama2_13b,\n","      input={\"prompt\": prompt, \"max_new_tokens\":1000}\n","  )\n","  return \"\".join(output)\n","\n","# chat completion with input prompt and system prompt\n","def ChatCompletion(prompt, system_prompt=None):\n","  output = replicate.run(\n","    llama2_13b,\n","    input={\"system_prompt\": system_prompt,\n","            \"prompt\": prompt,\n","            \"max_new_tokens\":1000}\n","  )\n","  return \"\".join(output)"]},{"cell_type":"markdown","metadata":{"id":"5Jxq0pmf6L73"},"source":["### **2.2 - Basic completion**"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"H93zZBIk6tNU"},"outputs":[],"source":["output = Completion(prompt=\"The typical color of a llama is: \")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"StccjUDh6W0Q"},"source":["### **2.3 - System prompts**\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"VRnFogxd6rTc"},"outputs":[],"source":["output = ChatCompletion(\n","    prompt=\"The typical color of a llama is: \",\n","    system_prompt=\"respond with only one word\"\n","  )\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"Hp4GNa066pYy"},"source":["### **2.4 - Response formats**\n","* Can support different formatted outputs e.g. 
text, JSON, etc."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"HTN79h4RptgQ"},"outputs":[],"source":["output = ChatCompletion(\n","    prompt=\"The typical color of a llama is: \",\n","    system_prompt=\"response in json format\"\n","  )\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"cWs_s9y-avIT"},"source":["## **3 - Gen AI Application Architecture**\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"j9BGuI-9AOL5"},"outputs":[],"source":["genai_app_arch()"]},{"cell_type":"markdown","metadata":{"id":"6UlxBtbgys6j"},"source":["## **4 - Chatbot Architecture**\n","* User Prompts\n","* Input Safety\n","* Llama 2\n","* Output Safety\n","\n","* Memory & Context"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"tO5HnB56ys6t"},"outputs":[],"source":["bot_arch()"]},{"cell_type":"markdown","metadata":{"id":"r4DyTLD5ys6t"},"source":["### **4.1 - Chat conversation**\n","* LLMs are stateless\n","* Single Turn\n","\n","* Multi Turn (Memory)\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"EMM_egWMys6u"},"outputs":[],"source":["# example of single turn chat\n","prompt_chat = \"What is the average lifespan of a Llama?\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question in few words\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"sZ7uVKDYucgi"},"outputs":[],"source":["# example without previous context. LLMs are stateless and cannot resolve \"they\" without previous context\n","prompt_chat = \"What animal family are they?\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question in few words\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"WQl3wmfbyBQ1"},"source":["A chat app must send the previous context to the LLM with each request to get valid responses. 
Below is an example of multi-turn chat."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"t7SZe5fT3HG3"},"outputs":[],"source":["# example of multi-turn chat that passes the previous context in the prompt\n","prompt_chat = \"\"\"\n","User: What is the average lifespan of a Llama?\n","Assistant: Sure! The average lifespan of a llama is around 20-30 years.\n","User: What animal family are they?\n","\"\"\"\n","output = ChatCompletion(prompt=prompt_chat, system_prompt=\"answer the last question\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"moXnmJ_xyD10"},"source":["### **4.2 - Prompt Engineering**\n","Prompt engineering refers to the science of designing effective prompts to get desired responses.\n"]},{"cell_type":"markdown","metadata":{"id":"t-v-FeZ4ztTB"},"source":["#### **4.2.1 - In-Context Learning (e.g. Zero-shot, Few-shot)**\n"," * In-context learning - a specific method of prompt engineering where demonstrations of the task are provided as part of the prompt.\n","  1. Zero-shot learning - the model performs the task without any input examples.\n","  2. Few or “N-Shot” Learning - the model performs the task based on input examples in the user's prompt.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"6W71MFNZyRkQ"},"outputs":[],"source":["# Zero-shot example. 
To get positive/negative/neutral sentiment, we need to give examples in the prompt\n","prompt = '''\n","Classify: I saw a Gecko.\n","Sentiment: ?\n","'''\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"MCQRjf1Y1RYJ"},"outputs":[],"source":["# By giving examples to Llama, it understands the expected output format.\n","\n","prompt = '''\n","Classify: I love Llamas!\n","Sentiment: Positive\n","Classify: I dont like Snakes.\n","Sentiment: Negative\n","Classify: I saw a Gecko.\n","Sentiment:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"One word response\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"8UmdlTmpDZxA"},"outputs":[],"source":["# another zero-shot learning\n","prompt = '''\n","QUESTION: Vicuna?\n","ANSWER:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"M_EcsUo1zqFD"},"outputs":[],"source":["# Another few-shot learning example with formatted prompt.\n","\n","prompt = '''\n","QUESTION: Llama?\n","ANSWER: Yes\n","QUESTION: Alpaca?\n","ANSWER: Yes\n","QUESTION: Rabbit?\n","ANSWER: No\n","QUESTION: Vicuna?\n","ANSWER:'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"one word response\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"mbr124Y197xl"},"source":["#### **4.2.2 - Chain of Thought**\n","* \"chain of thought\" or a coherent sequence of ideas is crucial for generating meaningful and contextually relevant responses\n","\n","* Hallucination on word problems"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Xn8zmLBQzpgj"},"outputs":[],"source":["# Standard prompting\n","prompt = '''\n","Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. 
How many tennis balls does Llama have now?\n","'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"provide short answer\")\n","md(output)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"lKNOj79o1Kwu"},"outputs":[],"source":["# Chain-Of-Thought prompting\n","prompt = '''\n","Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does Llama have now?\n","Let's think step by step.\n","'''\n","\n","output = ChatCompletion(prompt, system_prompt=\"provide short answer\")\n","md(output)"]},{"cell_type":"markdown","metadata":{"id":"C7tDW-AH770Y"},"source":["### **4.3 - Retrieval Augmented Generation (RAG)**\n","* Prompt Eng Limitations (Knowledge cutoff & lack of specialized data)\n","\n","* LangChain\n","\n","Retrieval Augmented Generation (RAG) allows us to retrieve snippets of information from external data sources and add them to the user's prompt to get tailored responses from Llama 2.\n","\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Fl1LPltpRQD9"},"outputs":[],"source":["rag_arch()"]},{"cell_type":"markdown","metadata":{"id":"JJaGMLl_4vYm"},"source":["#### **4.3.1 - LangChain**\n","LangChain is a framework that makes it easier to implement RAG.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"aoqU3KTcHTWN"},"outputs":[],"source":["# langchain setup\n","from langchain.llms import Replicate\n","# Use the Llama 2 model hosted on Replicate\n","# Temperature: Adjusts randomness of outputs, greater than 1 is random and 0 is deterministic, 0.75 is a good starting value\n","# top_p: When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens\n","# max_new_tokens: Maximum number of tokens to generate. 
A word is generally 2-3 tokens\n","llama_model = Replicate(\n","    model=llama2_13b,\n","    model_kwargs={\"temperature\": 0.75,\"top_p\": 1, \"max_new_tokens\":1000}\n",")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gAV2EkZqcruF"},"outputs":[],"source":["# Step 1: load the external data source. In our case, we will load Meta’s “Responsible Use Guide” pdf document.\n","from langchain.document_loaders import OnlinePDFLoader\n","loader = OnlinePDFLoader(\"https://ai.meta.com/static-resource/responsible-use-guide/\")\n","documents = loader.load()\n","\n","# Step 2: Get text splits from document\n","from langchain.text_splitter import RecursiveCharacterTextSplitter\n","text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)\n","all_splits = text_splitter.split_documents(documents)\n","\n","# Step 3: Use the embedding model\n","from langchain.vectorstores import FAISS\n","from langchain.embeddings import HuggingFaceEmbeddings\n","model_name = \"sentence-transformers/all-mpnet-base-v2\" # embedding model\n","model_kwargs = {\"device\": \"cpu\"}\n","embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)\n","\n","# Step 4: Use vector store to store embeddings\n","vectorstore = FAISS.from_documents(all_splits, embeddings)"]},{"cell_type":"markdown","metadata":{"id":"K2l8S5tBxlkc"},"source":["#### **4.3.2 - LangChain Q&A Retriever**\n","* ConversationalRetrievalChain\n","\n","* Query the Source documents\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"NmEhBe3Kiyre"},"outputs":[],"source":["# Query against your own data\n","from langchain.chains import ConversationalRetrievalChain\n","chain = ConversationalRetrievalChain.from_llm(llama_model, vectorstore.as_retriever(), return_source_documents=True)\n","\n","chat_history = []\n","query = \"How is Meta approaching open science in two short sentences?\"\n","result = chain({\"question\": query, \"chat_history\": 
chat_history})\n","md(result['answer'])"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"CelLHIvoy2Ke"},"outputs":[],"source":["# This time your previous question and answer are included as chat history, which makes it possible\n","# to ask follow-up questions.\n","chat_history = [(query, result[\"answer\"])]\n","query = \"How is it benefiting the world?\"\n","result = chain({\"question\": query, \"chat_history\": chat_history})\n","md(result['answer'])"]},{"cell_type":"markdown","metadata":{"id":"TEvefAWIJONx"},"source":["## **5 - Fine-Tuning Models**\n","\n","* Limitations of Prompt Eng and RAG\n","* Fine-Tuning Arch\n","* Types (PEFT, LoRA, QLoRA)\n","* Using PyTorch for Pre-Training & Fine-Tuning\n","\n","* Evals + Quality\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"0a9CvJ8YcTzV"},"outputs":[],"source":["fine_tuned_arch()"]},{"cell_type":"markdown","metadata":{"id":"_8lcgdZa8onC"},"source":["## **6 - Responsible AI**\n","\n","* Power + Responsibility\n","* Hallucinations\n","* Input & Output Safety\n","* Red-teaming (simulating real-world cyber attackers)\n","* [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"pbqb006R-T_k"},"source":["## **7 - Conclusion**\n","* Active research on LLMs and Llama\n","* Leverage the power of Llama and its open community\n","* Safety and responsible use are paramount!\n","\n","* Call-To-Action\n","  * [Replicate Free Credits](https://replicate.fyi/connect2023) for Connect attendees!\n","  * This notebook is available through Llama GitHub recipes\n","  * Use Llama in your projects and give us feedback\n"]},{"cell_type":"markdown","metadata":{"id":"gSz5dTMxp7xo"},"source":["#### **Resources**\n","- [GitHub - Llama 2](https://github.com/facebookresearch/llama)\n","- [GitHub - Llama 2 Recipes](https://github.com/facebookresearch/llama-recipes)\n","- [Llama 2](https://ai.meta.com/llama/)\n","- [Research 
Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)\n","- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)\n","- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)\n","- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)\n","- [Replicate](https://replicate.com/meta/)\n","- [LangChain](https://www.langchain.com/)\n","\n"]},{"cell_type":"markdown","metadata":{"id":"V7aI6fhZp-KC"},"source":["#### **Authors & Contact**\n","  * asangani@meta.com, [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)\n","  * mohsena@meta.com, [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)\n"]}],"metadata":{"colab":{"collapsed_sections":["ioVMNcTesSEk"],"machine_shape":"hm","provenance":[],"toc_visible":true},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.13"}},"nbformat":4,"nbformat_minor":0}