%% Cell type:markdown id: tags:
# **Getting to know Llama 2: Everything you need to start building**
Our goal in this session is to provide a guided tour of Llama 2, including understanding different Llama 2 models, how and where to access them, Generative AI and Chatbot architectures, prompt engineering, RAG (Retrieval Augmented Generation), fine-tuning and more. All of this is implemented with starter code that you can take and use in your own Llama 2 projects.
%% Cell type:markdown id: tags:
## **0 - Prerequisites**
* Basic understanding of Large Language Models
* Basic understanding of Python
%% Cell type:code id: tags:
``` python
# presentation layer code

import base64
from IPython.display import Image, display
import matplotlib.pyplot as plt

def mm(graph):
    # render a Mermaid diagram by base64-encoding it for the mermaid.ink image service
    graphbytes = graph.encode("ascii")
    base64_bytes = base64.b64encode(graphbytes)
    base64_string = base64_bytes.decode("ascii")
    display(Image(url="https://mermaid.ink/img/" + base64_string))

def genai_app_arch():
    mm("""
    flowchart TD
        A[Users] --> B(Applications e.g. mobile, web)
        B --> |Hosted API|C(Platforms e.g. Custom, OctoAI, HuggingFace, Replicate)
        B -- optional --> E(Frameworks e.g. LangChain)
        C-->|User Input|D[Llama 2]
        D-->|Model Output|C
        E --> C
        classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
    """)

def rag_arch():
    mm("""
    flowchart TD
        A[User Prompts] --> B(Frameworks e.g. LangChain)
        B <--> |Database, Docs, XLS|C[fa:fa-database External Data]
        B -->|API|D[Llama 2]
        classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
    """)

def llama2_family():
    mm("""
    graph LR;
        llama-2 --> llama-2-7b
        llama-2 --> llama-2-13b
        llama-2 --> llama-2-70b
        llama-2-7b --> llama-2-7b-chat
        llama-2-13b --> llama-2-13b-chat
        llama-2-70b --> llama-2-70b-chat
        classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
    """)

def apps_and_llms():
    mm("""
    graph LR;
        users --> apps
        apps --> frameworks
        frameworks --> platforms
        platforms --> Llama_2
        classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
    """)

import ipywidgets as widgets
from IPython.display import display, Markdown

# Create a password widget for entering an API key without echoing it
API_KEY = widgets.Password(
    value='',
    placeholder='',
    description='API_KEY:',
    disabled=False
)

def md(t):
    # render a string as Markdown in the notebook
    display(Markdown(t))

def bot_arch():
    mm("""
    graph LR;
        user --> prompt
        prompt --> i_safety
        i_safety --> context
        context --> Llama_2
        Llama_2 --> output
        output --> o_safety
        i_safety --> memory
        o_safety --> memory
        memory --> context
        o_safety --> user
        classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
    """)

def fine_tuned_arch():
    mm("""
    graph LR;
        Custom_Dataset --> Pre-trained_Llama
        Pre-trained_Llama --> Fine-tuned_Llama
        Fine-tuned_Llama --> RLHF
        RLHF --> |Loss:Cross-Entropy|Fine-tuned_Llama
        classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
    """)

def load_data_faiss_arch():
    mm("""
    graph LR;
        documents --> textsplitter
        textsplitter --> embeddings
        embeddings --> vectorstore
        classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
    """)

def mem_context():
    mm("""
    graph LR
        context(text)
        user_prompt --> context
        instruction --> context
        examples --> context
        memory --> context
        context --> tokenizer
        tokenizer --> embeddings
        embeddings --> LLM
        classDef default fill:#CCE6FF,stroke:#84BCF5,textColor:#1C2B33,fontFamily:trebuchet ms;
    """)
```
%% Cell type:markdown id: tags:
## **1 - Understanding Llama 2**
%% Cell type:markdown id: tags:
### **1.1 - What is Llama 2?**
* State of the art (SOTA), Open Source LLM
* 7B, 13B, 70B
* Pretrained + Chat
* Choosing a model: Size, Quality, Cost, Speed
* [Research paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)
* [Responsible use guide](https://ai.meta.com/llama/responsible-use-guide/)
%% Cell type:code id: tags:
``` python
llama2_family()
```
%% Cell type:markdown id: tags:
### **1.2 - Accessing Llama 2**
* Download + Self Host (on-premise)
* Hosted API Platform (e.g. [OctoAI](https://octoai.cloud/), [Replicate](https://replicate.com/meta))
* Hosted Container Platform (e.g. [Azure](https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/introducing-llama-2-on-azure/ba-p/3881233), [AWS](https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/), [GCP](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/139))
%% Cell type:markdown id: tags:
### **1.3 - Use Cases of Llama 2**
* Content Generation
* Chatbots
* Summarization
* Programming (e.g. Code Llama)
* and many more...
%% Cell type:markdown id: tags:
## **2 - Using Llama 2**
In this notebook, we are going to access the [Llama 2 13B chat model](https://octoai.cloud/tools/text/chat?mode=demo&model=llama-2-13b-chat-fp16) through OctoAI's hosted API.
%% Cell type:markdown id: tags:
### **2.1 - Install dependencies**
%% Cell type:code id: tags:
``` python
# Install dependencies and initialize
%pip install -qU \
    octoai-sdk \
    langchain \
    sentence_transformers \
    pdf2image \
    pdfminer \
    pdfminer.six \
    unstructured \
    faiss-cpu \
    pillow-heif \
    opencv-python \
    unstructured-inference \
    pikepdf
```
%% Cell type:code id: tags:
``` python
# model on the OctoAI platform that we will use for inference:
# the Llama 2 13B chat model hosted on OctoAI
llama2_13b = "llama-2-13b-chat-fp16"
```
%% Cell type:code id: tags:
``` python
# We will use the OctoAI hosted cloud environment
# Obtain an OctoAI API key → https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token

# Enter your OctoAI API token when prompted
from getpass import getpass
import os

OCTOAI_API_TOKEN = getpass()
os.environ["OCTOAI_API_TOKEN"] = OCTOAI_API_TOKEN

# alternatively, you can store the token in an environment variable and load it here
```
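%% Cell type:markdown id: tags:
If you prefer the environment-variable route mentioned in the comment above, here is a minimal sketch. It assumes you exported `OCTOAI_API_TOKEN` in your shell before launching the notebook; otherwise it falls back to prompting.
%% Cell type:code id: tags:
``` python
import os
from getpass import getpass

# Use the token from the environment if present, otherwise prompt for it.
OCTOAI_API_TOKEN = os.environ.get("OCTOAI_API_TOKEN") or getpass("OCTOAI_API_TOKEN: ")
os.environ["OCTOAI_API_TOKEN"] = OCTOAI_API_TOKEN
```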
%% Cell type:code id: tags:
``` python
# we will use OctoAI's hosted API
from octoai.client import Client

client = Client(OCTOAI_API_TOKEN)

# text completion with input prompt
def Completion(prompt):
    output = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ],
        model=llama2_13b,
        max_tokens=1000
    )
    return output.choices[0].message.content

# chat completion with input prompt and optional system prompt
def ChatCompletion(prompt, system_prompt=None):
    messages = []
    # only include a system message when one is provided
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    output = client.chat.completions.create(
        messages=messages,
        model=llama2_13b,
        max_tokens=1000
    )
    return output.choices[0].message.content
```
%% Cell type:markdown id: tags:
### **2.2 - Basic completion**
%% Cell type:code id: tags:
``` python
output = Completion(prompt="The typical color of a llama is: ")
md(output)
```
%% Cell type:markdown id: tags:
### **2.3 - System prompts**
%% Cell type:code id: tags:
``` python
output = ChatCompletion(
    prompt="The typical color of a llama is: ",
    system_prompt="respond with only one word"
)
md(output)
```
%% Cell type:markdown id: tags:
### **2.4 - Response formats**
* Llama can produce differently formatted outputs, e.g. plain text, JSON, etc.
%% Cell type:code id: tags:
``` python
output = ChatCompletion(
    prompt="The typical color of a llama is: ",
    system_prompt="respond in JSON format"
)
md(output)
```
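%% Cell type:markdown id: tags:
Because the model returns JSON as plain text, you typically parse it before using it downstream. Below is a minimal, illustrative sketch; the `extract_json` helper is our own, not part of the OctoAI API, and it assumes the reply contains a single JSON object.
%% Cell type:code id: tags:
``` python
import json

def extract_json(text):
    # naive extraction: take the substring from the first '{' to the last '}'
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])

# parse the JSON-formatted reply from the previous cell into a Python dict
data = extract_json(output)
print(data)
```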
%% Cell type:markdown id: tags:
## **3 - Gen AI Application Architecture**
Here is the high-level tech stack/architecture of a Generative AI application.
%% Cell type:code id: tags:
``` python
genai_app_arch()
```
%% Cell type:markdown id: tags:
## **4 - Chatbot Architecture**
Here are the key components and the information flow in a chatbot.
* User Prompts
* Input Safety
* Llama 2
* Output Safety
* Memory & Context
%% Cell type:code id: tags:
``` python
bot_arch()
```
%% Cell type:markdown id: tags:
### **4.1 - Chat conversation**
* LLMs are stateless
* Single Turn
* Multi Turn (Memory)
%% Cell type:code id: tags:
``` python
# example of single-turn chat
prompt_chat = "What is the average lifespan of a Llama?"
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question in few words")
md(output)
```
%% Cell type:code id: tags:
``` python
# example without previous context: LLMs are stateless and cannot resolve "they" without the earlier conversation
prompt_chat = "What animal family are they?"
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question in few words")
md(output)
```
%% Cell type:markdown id: tags:
A chat application needs to send the previous context back to the LLM to get valid responses. Below is an example of multi-turn chat; a message-list alternative follows the code.
%% Cell type:code id: tags:
``` python
# example of multi-turn chat, with the previous context included in the prompt
prompt_chat = """
User: What is the average lifespan of a Llama?
Assistant: Sure! The average lifespan of a llama is around 20-30 years.
User: What animal family are they?
"""
output = ChatCompletion(prompt=prompt_chat, system_prompt="answer the last question")
md(output)
```
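%% Cell type:markdown id: tags:
Instead of flattening the history into one string, you can keep it as a list of role-tagged messages and replay it on each call. Here is a minimal sketch reusing the `client`, `llama2_13b` and `md` objects defined earlier; the `ChatSession` helper is our own illustration, not part of the OctoAI SDK.
%% Cell type:code id: tags:
``` python
# A small helper that accumulates role-tagged messages and replays them each turn.
class ChatSession:
    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt is not None:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        output = client.chat.completions.create(
            messages=self.messages,
            model=llama2_13b,
            max_tokens=1000
        )
        answer = output.choices[0].message.content
        # store the reply so the next turn has the full history
        self.messages.append({"role": "assistant", "content": answer})
        return answer

session = ChatSession(system_prompt="answer in few words")
md(session.ask("What is the average lifespan of a Llama?"))
md(session.ask("What animal family are they?"))
```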
%% Cell type:markdown id: tags:
### **4.2 - Prompt Engineering**
* Prompt engineering refers to the science of designing effective prompts to get desired responses
* Helps reduce hallucination
%% Cell type:markdown id: tags:
#### **4.2.1 - In-Context Learning (e.g. Zero-shot, Few-shot)**
* In-context learning is a method of prompt engineering where demonstrations of the task are provided as part of the prompt.
  1. Zero-shot learning - the model performs the task without any input examples.
  2. Few-shot (or "N-shot") learning - the model performs the task based on input examples provided in the user's prompt.
%% Cell type:code id: tags:
``` python
# Zero-shot example: no demonstrations are given; the next cell shows how examples pin down the output format
prompt = '''
Classify: I saw a Gecko.
Sentiment: ?
'''
output = ChatCompletion(prompt, system_prompt="one word response")
md(output)
```
%% Cell type:code id: tags:
``` python
# Few-shot example: the demonstrations show Llama the expected output format
prompt = '''
Classify: I love Llamas!
Sentiment: Positive
Classify: I don't like Snakes.
Sentiment: Negative
Classify: I saw a Gecko.
Sentiment:'''
output = ChatCompletion(prompt, system_prompt="One word response")
md(output)
```
%% Cell type:code id: tags:
``` python
# another zero-shot example
prompt = '''
QUESTION: Vicuna?
ANSWER:'''
output = ChatCompletion(prompt, system_prompt="one word response")
md(output)
```
%% Cell type:code id: tags:
``` python
# another few-shot example with a formatted prompt
prompt = '''
QUESTION: Llama?
ANSWER: Yes
QUESTION: Alpaca?
ANSWER: Yes
QUESTION: Rabbit?
ANSWER: No
QUESTION: Vicuna?
ANSWER:'''
output = ChatCompletion(prompt, system_prompt="one word response")
md(output)
```
%% Cell type:markdown id: tags:
#### **4.2.2 - Chain of Thought**
Chain-of-thought prompting elicits complex reasoning by asking the model to think step by step, which yields more accurate and contextually relevant responses.
%% Cell type:code id: tags:
``` python
# Standard prompting
prompt = '''
Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does Llama have now?
'''
output = ChatCompletion(prompt, system_prompt="provide short answer")
md(output)
```
%% Cell type:code id: tags:
``` python
# Chain-of-thought prompting
prompt = '''
Llama started with 5 tennis balls. It buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does Llama have now?
Let's think step by step.
'''
output = ChatCompletion(prompt, system_prompt="provide short answer")
md(output)
```
%% Cell type:markdown id: tags:
### **4.3 - Retrieval Augmented Generation (RAG)**
* Prompt engineering limitations - knowledge cutoff & lack of specialized data
* Retrieval Augmented Generation (RAG) allows us to retrieve snippets of information from external data sources and add them to the user's prompt to get tailored responses from Llama 2.
For our demo, we are going to download an external PDF file from a URL and query against the content in the PDF file to get contextually relevant information back with the help of Llama!
%% Cell type:code id: tags:
``` python
rag_arch()
```
%% Cell type:markdown id: tags:
#### **4.3.1 - LangChain**
LangChain is a framework that makes it easier to implement RAG.
%% Cell type:code id: tags:
``` python
# langchain setup
from langchain.llms.octoai_endpoint import OctoAIEndpoint

# Use the Llama 2 model hosted on OctoAI
# temperature: adjusts the randomness of outputs; values near 0 are close to deterministic, higher values are more random; 0.75 is a good starting value
# top_p: samples from the top p fraction of most likely tokens; lower it to ignore less likely tokens
# max_tokens: maximum number of tokens to generate; a word is generally 2-3 tokens
llama_model = OctoAIEndpoint(
    endpoint_url="https://text.octoai.run/v1/chat/completions",
    model_kwargs={
        "model": llama2_13b,
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful, respectful and honest assistant."
            }
        ],
        "max_tokens": 1000,
        "top_p": 1,
        "temperature": 0.75
    },
)
```
%% Cell type:code id: tags:
``` python
# Step 1: load the external data source. In our case, we will load Meta's "Responsible Use Guide" pdf document.
from langchain.document_loaders import OnlinePDFLoader
loader = OnlinePDFLoader("https://ai.meta.com/static-resource/responsible-use-guide/")
documents = loader.load()

# Step 2: Get text splits from the document
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)

# Step 3: Use the embedding model
from langchain.vectorstores import FAISS
from langchain.embeddings import OctoAIEmbeddings
embeddings = OctoAIEmbeddings(endpoint_url="https://text.octoai.run/v1/embeddings")

# Step 4: Use the vector store to store embeddings
vectorstore = FAISS.from_documents(all_splits, embeddings)
```
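%% Cell type:markdown id: tags:
Before wiring the vector store into a chain, you can sanity-check retrieval directly with a similarity search. A minimal sketch (the query string is just an example):
%% Cell type:code id: tags:
``` python
# fetch the chunks most similar to a test query and preview them
docs = vectorstore.similarity_search("How should developers deploy Llama responsibly?", k=3)
for i, doc in enumerate(docs):
    print(f"--- chunk {i} ---")
    print(doc.page_content[:200])  # first 200 characters of each retrieved chunk
```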
%% Cell type:markdown id: tags:
#### **4.3.2 - LangChain Q&A Retriever**
* ConversationalRetrievalChain
* Query the source documents
%% Cell type:code id: tags:
``` python
# Query against your own data
from langchain.chains import ConversationalRetrievalChain
chain = ConversationalRetrievalChain.from_llm(llama_model, vectorstore.as_retriever(), return_source_documents=True)

chat_history = []
query = "How is Meta approaching open science in two short sentences?"
result = chain.invoke({"question": query, "chat_history": chat_history})
md(result['answer'])
```
%% Cell type:code id: tags:
``` python
# This time your previous question and answer are included as chat history, which enables follow-up questions.
chat_history = [(query, result["answer"])]
query = "How is it benefiting the world?"
result = chain.invoke({"question": query, "chat_history": chat_history})
md(result['answer'])
```
%% Cell type:markdown id: tags:
## **5 - Fine-Tuning Models**
* Limitations of prompt engineering and RAG
* Fine-tuning architecture
* Types (PEFT, LoRA, QLoRA) - a LoRA sketch follows the diagram below
* Using PyTorch for pre-training & fine-tuning
* Evals + quality
%% Cell type:code id: tags:
``` python
fine_tuned_arch()
```
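%% Cell type:markdown id: tags:
As one concrete example of PEFT, LoRA trains only a small set of added low-rank weights on top of a frozen base model. Below is a minimal configuration sketch using Hugging Face's `peft` library, not the method used to produce the Llama 2 chat models: the model id and hyperparameters are illustrative assumptions, `pip install peft transformers` and approved access to the weights on Hugging Face are prerequisites, and you would still need a dataset and a training loop (e.g. `transformers.Trainer`).
%% Cell type:code id: tags:
``` python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# assumes you have been granted access to the Llama 2 weights on Hugging Face
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full model's parameters
```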
%% Cell type:markdown id: tags:
## **6 - Responsible AI**
* Power + Responsibility
* Hallucinations
* Input & Output Safety
* Red-teaming (simulating real-world cyber attackers)
* [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)
%% Cell type:markdown id: tags:
## **7 - Conclusion**
* Active research on LLMs and Llama
* Leverage the power of Llama and its open community
* Safety and responsible use are paramount!
* Call to action
  * [Replicate Free Credits](https://replicate.fyi/connect2023) for Connect attendees!
  * This notebook is available in the Llama recipes GitHub repo
  * Use Llama in your projects and give us feedback
%% Cell type:markdown id: tags:
#### **Resources**
- [GitHub - Llama 2](https://github.com/facebookresearch/llama)
- [GitHub - Llama 2 Recipes](https://github.com/facebookresearch/llama-recipes)
- [Llama 2](https://ai.meta.com/llama/)
- [Research Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/)
- [Model Card](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md)
- [Responsible Use Guide](https://ai.meta.com/llama/responsible-use-guide/)
- [Acceptable Use Policy](https://ai.meta.com/llama/use-policy/)
- [OctoAI](https://octoai.cloud/)
- [LangChain](https://www.langchain.com/)
%% Cell type:markdown id: tags:
#### **Authors & Contact**
* asangani@meta.com, [Amit Sangani | LinkedIn](https://www.linkedin.com/in/amitsangani/)
* mohsena@meta.com, [Mohsen Agsen | LinkedIn](https://www.linkedin.com/in/mohsen-agsen-62a9791/)
* Adapted to run on OctoAI by Thierry Moreau - tmoreau@octo.ai