Commit 6af36197 authored by Kai Wu

Merge branch 'main' into fsdp_lmm

parents 8715e044 82d40492
@@ -1483,3 +1483,4 @@ ttft
uv
8xL40S
xL
EDA
@@ -4,7 +4,7 @@ To run fine-tuning on multi-GPUs, we will make use of two packages:
1. [PEFT](https://huggingface.co/blog/peft) methods, in particular using the Hugging Face [PEFT](https://github.com/huggingface/peft) library.
2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html), which helps us parallelize the training over multiple GPUs. [More details](./LLM_finetuning.md).
With the combination of PEFT and FSDP, we can fine-tune a Meta Llama 8B model on multiple GPUs in one node.
For large models like the 405B, we need a multi-node setup for fine-tuning even if 4-bit quantization is enabled.
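As a rough illustration of how these two pieces fit together (this is not the repo's training script), the sketch below attaches a LoRA adapter with the PEFT library and then shards the model with FSDP across the GPUs of one node. The model id, LoRA hyperparameters, and target modules are placeholders, and the script is assumed to be launched with `torchrun`.

```python
# Minimal sketch (not the repo's training script): LoRA via PEFT, then FSDP sharding.
# Assumes a launch such as `torchrun --nproc_per_node=<num_gpus> this_script.py`.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

dist.init_process_group("nccl")  # torchrun provides rank/world-size env vars
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",          # placeholder model id
    torch_dtype=torch.bfloat16,
)
lora_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # illustrative choice of target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the adapter weights remain trainable
model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    use_orig_params=True,                   # lets frozen base weights and trainable LoRA weights coexist
)
# ...training loop, then save only the (small) adapter weights on rank 0...
```

A full training setup would additionally need a wrapping policy, mixed-precision settings, and checkpointing, which are omitted here.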
@@ -36,7 +36,7 @@ Expected results on XSUM (Rouge-2 score, the higher the better) from the above s
### One Demo on Streaming to "Infinite" Context Length
The following example demonstrates the generation process of "infinite" sequence length. We use MT-Bench data and generate the context sample-by-sample. The KV Cache will keep the KV pairs from the previous samples while maintaining a fixed size.
```
# run with full cache
...
```
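For intuition on how a KV cache can stay at a fixed size while streaming, the sketch below shows one simple, generic eviction policy: keep a handful of the earliest positions plus the most recent window and drop everything in between. This is purely illustrative; the function name, tensor layout, and budget are placeholders, and it is not necessarily the policy used by this recipe.

```python
# Illustrative fixed-size KV-cache eviction (placeholder policy, not this recipe's exact one).
import torch

def evict_kv(keys: torch.Tensor, values: torch.Tensor,
             max_size: int = 1024, num_keep_first: int = 4):
    """Keep the first `num_keep_first` positions plus the most recent ones,
    so the cache never exceeds `max_size` entries along the sequence dim."""
    seq_len = keys.size(-2)  # layout assumed: [batch, heads, seq_len, head_dim]
    if seq_len <= max_size:
        return keys, values
    recent = max_size - num_keep_first
    keep = torch.cat(
        [torch.arange(num_keep_first), torch.arange(seq_len - recent, seq_len)]
    )
    return keys[..., keep, :], values[..., keep, :]

# Example: a cache that has grown past the budget gets trimmed back to 1024 entries.
k = torch.randn(1, 8, 1500, 64)
v = torch.randn(1, 8, 1500, 64)
k, v = evict_kv(k, v)
print(k.shape)  # torch.Size([1, 8, 1024, 64])
```

The key point is only that the cache stays bounded no matter how many samples have been streamed through it.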
%% Cell type:markdown id: tags:
<a href="https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
%% Cell type:markdown id: tags:
This notebook ports the DeepLearning.AI short course [Building Agentic RAG with Llamaindex Lesson 1 Router Engine](https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/2/router-query-engine) to use Llama 3.
You should take the course before or after going through this notebook to gain a deeper understanding.
%% Cell type:code id: tags:
``` python
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-groq
```
%% Cell type:code id: tags:
``` python
import os
os.environ['GROQ_API_KEY'] = 'your_groq_api_key' # get a free key at https://console.groq.com/keys
```
%% Cell type:code id: tags:
``` python
# Download the MetaGPT paper, used as the example document for this notebook
!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf
```
%% Cell type:code id: tags:
``` python
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()
```
%% Cell type:code id: tags:
``` python
from llama_index.core.node_parser import SentenceSplitter
# Split the loaded documents into chunks (nodes) of up to 1024 tokens
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
```
%% Cell type:code id: tags:
``` python
from llama_index.llms.groq import Groq
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Use Llama 3 8B served by Groq as the LLM; the key is picked up from the GROQ_API_KEY env var set above
llm = Groq(model="llama3-8b-8192")
Settings.llm = llm
#llm.complete("Who wrote the book godfather").text
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
```
%% Cell type:code id: tags:
``` python
from llama_index.core import SummaryIndex, VectorStoreIndex
# Build two indexes over the same nodes: one for summarization, one for semantic retrieval
summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
```
%% Cell type:code id: tags:
``` python
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
```
%% Cell type:code id: tags:
``` python
from llama_index.core.tools import QueryEngineTool
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)
```
%% Cell type:code id: tags:
``` python
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
# The router asks the LLM to choose the summary tool or the vector tool for each query
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
```
%% Cell type:code id: tags:
``` python
# Allow nested event loops so the async query engines run inside the notebook
import nest_asyncio
nest_asyncio.apply()
```
%% Cell type:code id: tags:
``` python
response = query_engine.query("What is the summary of the document?")
print(str(response))
```
%% Cell type:code id: tags:
``` python
print(len(response.source_nodes))
```
%% Cell type:code id: tags:
``` python
response = query_engine.query(
    "How do agents share information with other agents? This is not a summarization question."
)
print(str(response))
```
%% Cell type:code id: tags:
``` python
def get_router_query_engine(file_path: str):
    """Get router query engine."""
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)
    summary_index = SummaryIndex(nodes)
    vector_index = VectorStoreIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    vector_query_engine = vector_index.as_query_engine()
    summary_tool = QueryEngineTool.from_defaults(
        query_engine=summary_query_engine,
        description=(
            "Useful for summarization questions related to MetaGPT"
        ),
    )
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_query_engine,
        description=(
            "Useful for retrieving specific context from the MetaGPT paper."
        ),
    )
    query_engine = RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=[
            summary_tool,
            vector_tool,
        ],
        verbose=True
    )
    return query_engine

query_engine = get_router_query_engine("metagpt.pdf")
```
%% Cell type:code id: tags:
``` python
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))
```
%% Cell type:code id: tags:
``` python
```