Commit 6af36197 authored by Kai Wu

Merge branch 'main' into fsdp_lmm

parents 8715e044 82d40492
@@ -1483,3 +1483,4 @@ ttft
uv
8xL40S
xL
EDA
@@ -4,7 +4,7 @@ To run fine-tuning on multi-GPUs, we will make use of two packages:
1. [PEFT](https://huggingface.co/blog/peft) methods, in particular using the Hugging Face [PEFT](https://github.com/huggingface/peft) library.
2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html), which helps us parallelize the training over multiple GPUs. [More details](./LLM_finetuning.md).
With the combination of PEFT and FSDP, we can fine-tune a Meta Llama 8B model on multiple GPUs in one node.
For large models like the 405B, we need a multi-node setup for fine-tuning even if 4-bit quantization is enabled.
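As a rough illustration of how these two pieces fit together (this is not the repo's training script), the sketch below attaches a LoRA adapter with the PEFT library and then shards the model with FSDP across the GPUs of one node. The model id, LoRA hyperparameters, and target modules are placeholders, and the script is assumed to be launched with `torchrun`.

```python
# Minimal sketch (not the repo's training script): LoRA via PEFT, then FSDP sharding.
# Assumes a launch such as `torchrun --nproc_per_node=<num_gpus> this_script.py`.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

dist.init_process_group("nccl")  # torchrun provides rank/world-size env vars
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",          # placeholder model id
    torch_dtype=torch.bfloat16,
)
lora_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # illustrative choice of target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the adapter weights remain trainable
model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    use_orig_params=True,                   # lets frozen base weights and trainable LoRA weights coexist
)
# ...training loop, then save only the (small) adapter weights on rank 0...
```

A full training setup would additionally need a wrapping policy, mixed-precision settings, and checkpointing, which are omitted here.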
@@ -36,7 +36,7 @@ Expected results on XSUM (Rouge-2 score, the higher the better) from the above s
### One Demo on Streaming to "Infinite" Context Length
The following example demonstrates the generation process of "infinite" sequence length. We use MT-Bench data and generate the context sample-by-sample. The KV Cache will keep the KV pairs from the previous samples while maintaining a fixed size.
```
# run with full cache
...
```
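For intuition on how a KV cache can stay at a fixed size while streaming, the sketch below shows one simple, generic eviction policy: keep a handful of the earliest positions plus the most recent window and drop everything in between. This is purely illustrative; the function name, tensor layout, and budget are placeholders, and it is not necessarily the policy used by this recipe.

```python
# Illustrative fixed-size KV-cache eviction (placeholder policy, not this recipe's exact one).
import torch

def evict_kv(keys: torch.Tensor, values: torch.Tensor,
             max_size: int = 1024, num_keep_first: int = 4):
    """Keep the first `num_keep_first` positions plus the most recent ones,
    so the cache never exceeds `max_size` entries along the sequence dim."""
    seq_len = keys.size(-2)  # layout assumed: [batch, heads, seq_len, head_dim]
    if seq_len <= max_size:
        return keys, values
    recent = max_size - num_keep_first
    keep = torch.cat(
        [torch.arange(num_keep_first), torch.arange(seq_len - recent, seq_len)]
    )
    return keys[..., keep, :], values[..., keep, :]

# Example: a cache that has grown past the budget gets trimmed back to 1024 entries.
k = torch.randn(1, 8, 1500, 64)
v = torch.randn(1, 8, 1500, 64)
k, v = evict_kv(k, v)
print(k.shape)  # torch.Size([1, 8, 1024, 64])
```

The key point is only that the cache stays bounded no matter how many samples have been streamed through it.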
%% Cell type:markdown id: tags:
<a href="https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
%% Cell type:markdown id: tags:
This notebook ports the DeepLearning.AI short course [Building Agentic RAG with Llamaindex Lesson 1 Router Engine](https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/2/router-query-engine) to use Llama 3.
You should take the course before or after going through this notebook to gain a deeper understanding.
%% Cell type:code id: tags:
``` python
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-groq
```
%% Cell type:code id: tags:
``` python
import os
os.environ['GROQ_API_KEY'] = 'your_groq_api_key' # get a free key at https://console.groq.com/keys
```
%% Cell type:code id: tags:
``` python
# Download the MetaGPT paper, used as the example document for this notebook
!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf
```
%% Cell type:code id: tags:
``` python
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()
```
%% Cell type:code id: tags:
``` python
from llama_index.core.node_parser import SentenceSplitter
# Split the loaded documents into chunks (nodes) of up to 1024 tokens
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
```
%% Cell type:code id: tags:
``` python
from llama_index.llms.groq import Groq
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Use Llama 3 8B served by Groq as the LLM; the key is picked up from the GROQ_API_KEY env var set above
llm = Groq(model="llama3-8b-8192")
Settings.llm = llm
#llm.complete("Who wrote the book godfather").text
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
```
%% Cell type:code id: tags:
``` python
from llama_index.core import SummaryIndex, VectorStoreIndex
# Build two indexes over the same nodes: one for summarization, one for semantic retrieval
summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
```
%% Cell type:code id: tags:
``` python
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
```
%% Cell type:code id: tags:
``` python
from llama_index.core.tools import QueryEngineTool
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)
```
%% Cell type:code id: tags:
``` python
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
# The router asks the LLM to choose the summary tool or the vector tool for each query
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
```
%% Cell type:code id: tags:
``` python
# Allow nested event loops so the async query engines run inside the notebook
import nest_asyncio
nest_asyncio.apply()
```
%% Cell type:code id: tags:
``` python
response = query_engine.query("What is the summary of the document?")
print(str(response))
```
%% Cell type:code id: tags:
``` python
print(len(response.source_nodes))
```
%% Cell type:code id: tags:
``` python
response = query_engine.query(
    "How do agents share information with other agents? This is not a summarization question."
)
print(str(response))
```
%% Cell type:code id: tags:
``` python
def get_router_query_engine(file_path: str):
    """Get router query engine."""
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)
    summary_index = SummaryIndex(nodes)
    vector_index = VectorStoreIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    vector_query_engine = vector_index.as_query_engine()
    summary_tool = QueryEngineTool.from_defaults(
        query_engine=summary_query_engine,
        description=(
            "Useful for summarization questions related to MetaGPT"
        ),
    )
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_query_engine,
        description=(
            "Useful for retrieving specific context from the MetaGPT paper."
        ),
    )
    query_engine = RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=[
            summary_tool,
            vector_tool,
        ],
        verbose=True
    )
    return query_engine

query_engine = get_router_query_engine("metagpt.pdf")
```
%% Cell type:code id: tags:
``` python
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))
```
%% Cell type:code id: tags:
``` python
```