diff --git a/.github/scripts/spellcheck_conf/wordlist.txt b/.github/scripts/spellcheck_conf/wordlist.txt
index 350e83106b599a89ff39a41d777507cb854e8bc6..f9ff571b50a336ee39f383580e051cf60475abbe 100644
--- a/.github/scripts/spellcheck_conf/wordlist.txt
+++ b/.github/scripts/spellcheck_conf/wordlist.txt
@@ -1483,3 +1483,4 @@ ttft
 uv
 8xL40S
 xL
+EDA
diff --git a/docs/multi_gpu.md b/docs/multi_gpu.md
index 3535422c145aa10c66a402d38c00db94ca56f678..820595dcf3bdd6169dba4ac56c1fb3209aeb5ee8 100644
--- a/docs/multi_gpu.md
+++ b/docs/multi_gpu.md
@@ -4,7 +4,7 @@
 To run fine-tuning on multi-GPUs, we will make use of two packages:
 
 1. [PEFT](https://huggingface.co/blog/peft) methods and in particular using the Hugging Face [PEFT](https://github.com/huggingface/peft)library.
-2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html) which helps us parallelize the training over multiple GPUs. [More details](LLM_finetuning.md/#2-full-partial-parameter-finetuning).
+2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html) which helps us parallelize the training over multiple GPUs. [More details](./LLM_finetuning.md).
 
 Given the combination of PEFT and FSDP, we would be able to fine tune a Meta Llama 8B model on multiple GPUs in one node.
 For big models like 405B we will need to fine-tune in a multi-node setup even if 4bit quantization is enabled.
diff --git a/recipes/experimental/long_context/H2O/README.md b/recipes/experimental/long_context/H2O/README.md
index 675e1ef68138e6014e03bccc017aa4254c6a4599..b73d8706a11235a95f2f1194dffbab91c58346b6 100644
--- a/recipes/experimental/long_context/H2O/README.md
+++ b/recipes/experimental/long_context/H2O/README.md
@@ -36,7 +36,7 @@ Expected results on XSUM (Rouge-2 score, the higher the better) from the above s
 
 ### One Demo on Streaming to "Infinite" Context Length
 
-The following example demonstrates the generation process of "infinite" sequence length. We use MT-Bench data and generate the context sample-by-sample. The KV Cache will keep the KV pairs from the previous samples while maintain a fixed size. Results can be found on [Demo](https://allenz.work/?p=11) (Video 1).
+The following example demonstrates the generation process of "infinite" sequence length. We use MT-Bench data and generate the context sample-by-sample. The KV Cache will keep the KV pairs from the previous samples while maintain a fixed size.
 
 ```
 # run with full cache
diff --git a/recipes/quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb b/recipes/quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb
index 433c6906cf0e8f3e5063fc691d9cb0dc62a64f6d..67eda87f7e680bff16a3119676b585224e16e898 100644
--- a/recipes/quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb
+++ b/recipes/quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<a href=\"https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/agents/dlai/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+    "<a href=\"https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
    ]
   },
   {