diff --git a/demo_apps/HelloLlamaLocal.ipynb b/demo_apps/HelloLlamaLocal.ipynb
index bc15527d980edeb6e1d5c78d409ab1e946eea4cb..facaeb9b4903c8e0738096c80dd62e9840c8adf8 100644
--- a/demo_apps/HelloLlamaLocal.ipynb
+++ b/demo_apps/HelloLlamaLocal.ipynb
@@ -152,7 +152,7 @@
    "id": "73df46d9",
    "metadata": {},
    "source": [
-    "Next, initialize the langchain CallBackManager. This handles callbacks from Langchain and for this example we will use token-wise streaming so the answer gets generated token by token when Llama is answering your question."
+    "Next, initialize the LangChain `CallbackManager`. This handles callbacks from LangChain, and for this example we will use token-wise streaming so the answer gets generated token by token as Llama answers your question."
    ]
   },
   {
@@ -173,7 +173,10 @@
    "source": [
     "\n",
     "Set up the Llama 2 model. \n",
-    "Replace `<path-to-llama-gguf-file>` with the path either to your downloaded quantized model file [here](https://drive.google.com/file/d/1afPv3HOy73BE2MoYCgYJvBDeQNa9rZbj/view?usp=sharing), or to the ggml-model-q4_0.gguf file built with the following commands:\n",
+    "\n",
+    "Replace `<path-to-llama-gguf-file>` with the path either to your downloaded quantized model file [here](https://drive.google.com/file/d/1afPv3HOy73BE2MoYCgYJvBDeQNa9rZbj/view?usp=sharing), \n",
+    "\n",
+    "or to the `ggml-model-q4_0.gguf` file built with the following commands:\n",
     "\n",
     "```bash\n",
     "git clone https://github.com/ggerganov/llama.cpp\n",
@@ -181,6 +184,7 @@
     "python3 -m pip install -r requirements.txt\n",
     "python convert.py <path_to_your_downloaded_llama-2-13b_model>\n",
     "./quantize <path_to_your_downloaded_llama-2-13b_model>/ggml-model-f16.gguf <path_to_your_downloaded_llama-2-13b_model>/ggml-model-q4_0.gguf q4_0\n",
+    "\n",
     "```\n",
     "For more info see https://python.langchain.com/docs/integrations/llms/llamacpp"
    ]
@@ -209,6 +213,7 @@
    "metadata": {},
    "source": [
     "With the model set up, you are now ready to ask some questions. \n",
+    "\n",
     "Here is an example of the simplest way to ask the model some general questions."
    ]
   },
@@ -251,7 +256,8 @@
    "id": "545cb6aa",
    "metadata": {},
    "source": [
-    "Alternatively, you can sue LangChain's PromptTemplate for some flexibility in your prompts and questions.\n",
+    "Alternatively, you can use LangChain's `PromptTemplate` for some flexibility in your prompts and questions.\n",
+    "\n",
     "For more information on LangChain's prompt template visit this [link](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/)"
    ]
   },
@@ -367,7 +373,8 @@
    "id": "37f77909",
    "metadata": {},
    "source": [
-    "One way we can fix the hallucinations is to use RAG, to augment it with more recent or custom data that holds the info for it to answer correctly.\n",
+    "One way we can fix the hallucinations is to use RAG to augment the model with more recent or custom data that holds the information it needs to answer correctly.\n",
+    "\n",
     "First we load the Llama2 paper using LangChain's [PDF loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf)"
    ]
   },
@@ -417,7 +424,7 @@
     "For this example we will use [Chroma](https://python.langchain.com/docs/integrations/vectorstores/chroma) which is light-weight and in memory so it's easy to get started with.\n",
     "For other vector stores especially if you need to store a large amount of data - see https://python.langchain.com/docs/integrations/vectorstores\n",
     "\n",
-    "We will also import the HuggingFaceEmbeddings and RecursiveCharacterTextSplitter to assist in storing the documents."
+ "We will also import the `HuggingFaceEmbeddings` and `RecursiveCharacterTextSplitter` to assist in storing the documents." ] }, { @@ -443,7 +450,7 @@ "metadata": {}, "source": [ "\n", - "To store the documents, we will need to split them into chunks using [`RecursiveCharacterTextSplitter`](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) and create vector representations of these chunks using [`HuggingFaceEmbeddings`](https://www.google.com/search?q=langchain+hugging+face+embeddings&sca_esv=572890011&ei=ARUoZaH4LuumptQP48ah2Ac&oq=langchian+hugg&gs_lp=Egxnd3Mtd2l6LXNlcnAiDmxhbmdjaGlhbiBodWdnKgIIADIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCkjeHlC5Cli5D3ABeAGQAQCYAV6gAb4CqgEBNLgBAcgBAPgBAcICChAAGEcY1gQYsAPiAwQYACBBiAYBkAYI&sclient=gws-wiz-serp) to them before storing them into our vector database. \n" + "To store the documents, we will need to split them into chunks using [`RecursiveCharacterTextSplitter`](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter) and create vector representations of these chunks using [`HuggingFaceEmbeddings`](https://www.google.com/search?q=langchain+hugging+face+embeddings&sca_esv=572890011&ei=ARUoZaH4LuumptQP48ah2Ac&oq=langchian+hugg&gs_lp=Egxnd3Mtd2l6LXNlcnAiDmxhbmdjaGlhbiBodWdnKgIIADIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCjIHEAAYgAQYCkjeHlC5Cli5D3ABeAGQAQCYAV6gAb4CqgEBNLgBAcgBAPgBAcICChAAGEcY1gQYsAPiAwQYACBBiAYBkAYI&sclient=gws-wiz-serp) on them before storing them into our vector database. \n" ] }, { @@ -524,6 +531,7 @@ "metadata": {}, "source": [ "For each question, LangChain performs a semantic similarity search of it in the vector db, then passes the search results as the context to the model to answer the question.\n", + "\n", "It takes close to 2 minutes to return the result (but using other vector stores other than Chroma such as FAISS can take longer) because Llama2 is running on a local Mac. \n", "To get much faster results, you can use a cloud service with GPU used for inference - see HelloLlamaCloud for a demo." ]