From bd210b105df1b09922ebb8bea9fb81878e3fe658 Mon Sep 17 00:00:00 2001 From: Sanyam Bhutani <sanyambhutani@meta.com> Date: Thu, 9 Jan 2025 17:41:07 -0800 Subject: [PATCH] Fix A LOT of links --- .../llamaindex/dlai_agentic_rag/README.md | 2 +- end-to-end-use-cases/RAFT-Chatbot/README.md | 4 ++-- end-to-end-use-cases/README.md | 2 +- .../benchmarks/inference/on_prem/README.md | 2 +- .../benchmarks/llm_eval_harness/meta_eval/README.md | 2 +- .../whatsapp_chatbot/whatsapp_llama3.md | 4 ++-- end-to-end-use-cases/multilingual/README.md | 2 +- getting-started/README.md | 4 ++-- getting-started/finetuning/README.md | 12 ++++++------ getting-started/finetuning/datasets/README.md | 8 ++++---- getting-started/finetuning/multigpu_finetuning.md | 8 ++++---- getting-started/finetuning/singlegpu_finetuning.md | 6 +++--- getting-started/inference/local_inference/README.md | 4 ++-- .../mobile_inference/android_inference/README.md | 2 +- src/docs/FAQ.md | 6 +++--- src/docs/multi_gpu.md | 4 ++-- 16 files changed, 36 insertions(+), 36 deletions(-) diff --git a/3p-integrations/llamaindex/dlai_agentic_rag/README.md b/3p-integrations/llamaindex/dlai_agentic_rag/README.md index deeee9a9..b61a6b77 100644 --- a/3p-integrations/llamaindex/dlai_agentic_rag/README.md +++ b/3p-integrations/llamaindex/dlai_agentic_rag/README.md @@ -2,7 +2,7 @@ The folder here contains the Llama 3 ported notebooks of the DLAI short course [Building Agentic RAG with Llamaindex](https://www.deeplearning.ai/short-courses/building-agentic-rag-with-llamaindex/). -1. [Building Agentic RAG with Llamaindex L1 Router Engine](../../../quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb) shows how to implement a simple agentic RAG, a router that will pick up one of several query tools (question answering or summarization) to execute a query on a single document. Note this notebook is located in the `quickstart` folder. +1. [Building Agentic RAG with Llamaindex L1 Router Engine](../../../end-to-end-use-cases/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb) shows how to implement a simple agentic RAG, a router that will pick up one of several query tools (question answering or summarization) to execute a query on a single document. Note this notebook is located in the `end-to-end-use-cases` folder. 2. [Building Agentic RAG with Llamaindex L2 Tool Calling](Building_Agentic_RAG_with_Llamaindex_L2_Tool_Calling.ipynb) shows how to use Llama 3 to not only pick a function to execute, but also infer an argument to pass through the function. 
diff --git a/end-to-end-use-cases/RAFT-Chatbot/README.md b/end-to-end-use-cases/RAFT-Chatbot/README.md index 50356d50..b500944a 100644 --- a/end-to-end-use-cases/RAFT-Chatbot/README.md +++ b/end-to-end-use-cases/RAFT-Chatbot/README.md @@ -124,7 +124,7 @@ export PATH_TO_RAFT_JSON=recipes/use_cases/end2end-recipes/raft/output/raft.json torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 1 --batch_size_training 1 --model_name meta-Llama/Meta-Llama-3-8B-Instruct --dist_checkpoint_root_folder $PATH_TO_ROOT_FOLDER --dist_checkpoint_folder fine-tuned --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/finetuning/datasets/raft_dataset.py" --use-wandb --run_validation True --custom_dataset.data_path $PATH_TO_RAFT_JSON ``` -For more details on multi-GPU fine-tuning, please refer to the [multigpu_finetuning.md](../../../quickstart/finetuning/multigpu_finetuning.md) in the finetuning recipe. +For more details on multi-GPU fine-tuning, please refer to the [multigpu_finetuning.md](../../getting-started/finetuning/multigpu_finetuning.md) in the finetuning recipe. Next, we need to convert the FSDP checkpoint to a HuggingFace checkpoint using the following command: @@ -132,7 +132,7 @@ Next, we need to convert the FSDP checkpoint to a HuggingFace checkpoint using t python src/llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path "$PATH_TO_ROOT_FOLDER/fine-tuned-meta-Llama/Meta-Llama-3-8B-Instruct" --consolidated_model_path "$PATH_TO_ROOT_FOLDER" ``` -For more details on FSDP to HuggingFace checkpoint conversion, please refer to the [readme](../../../quickstart/inference/local_inference/README.md) in the inference/local_inference recipe. +For more details on FSDP to HuggingFace checkpoint conversion, please refer to the [readme](../../getting-started/inference/local_inference/README.md) in the inference/local_inference recipe. ## Evaluation Steps Once we have the RAFT model, we need to evaluate its performance. In this tutorial, we'll not only use traditional evaluation methods (e.g., calculating exact match rate or ROUGE score) but also use LLM as a judge to score model-generated answers. diff --git a/end-to-end-use-cases/README.md b/end-to-end-use-cases/README.md index f5c66a0b..24b868c5 100644 --- a/end-to-end-use-cases/README.md +++ b/end-to-end-use-cases/README.md @@ -18,7 +18,7 @@ This demo app shows how to use LangChain and Llama 3 to let users ask questions ## [NotebookLlama](./NotebookLlama/): PDF to Podcast using Llama Models Workflow showcasing how to use multiple Llama models to go from any PDF to a Podcast and using open models to generate a multi-speaker podcast -## [live_data](live_data.ipynb): Ask Llama 3 about Live Data (using Replicate or [OctoAI](../3p_integrations/octoai/live_data.ipynb)) +## [live_data](live_data.ipynb): Ask Llama 3 about Live Data (using Replicate or [OctoAI](../3p-integrations/octoai/live_data.ipynb)) This demo app shows how to perform live data augmented generation tasks with Llama 3, [LlamaIndex](https://github.com/run-llama/llama_index), another leading open-source framework for building LLM apps, and the [Tavily](https://tavily.com) live search API. 
## [WhatsApp Chatbot](./customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md): Building a Llama 3 Enabled WhatsApp Chatbot diff --git a/end-to-end-use-cases/benchmarks/inference/on_prem/README.md b/end-to-end-use-cases/benchmarks/inference/on_prem/README.md index afffd6ee..f9d7c02f 100644 --- a/end-to-end-use-cases/benchmarks/inference/on_prem/README.md +++ b/end-to-end-use-cases/benchmarks/inference/on_prem/README.md @@ -7,7 +7,7 @@ We support benchmark on these serving framework: # vLLM - Getting Started -To get started, we first need to deploy containers on-prem as a API host. Follow the guidance [here](../../../../recipes/3p_integrations/llama_on_prem.md#setting-up-vllm-with-llama-3) to deploy vLLM on-prem. +To get started, we first need to deploy containers on-prem as an API host. Follow the guidance [here](../../../../3p-integrations/llama_on_prem.md#setting-up-vllm-with-llama-3) to deploy vLLM on-prem. Note that in common scenario which overall throughput is important, we suggest you prioritize deploying as many model replicas as possible to reach higher overall throughput and request-per-second (RPS), comparing to deploy one model container among multiple GPUs for model parallelism. Additionally, as deploying multiple model replicas, there is a need for a higher level wrapper to handle the load balancing which here has been simulated in the benchmark scripts. For example, we have an instance from Azure that has 8xA100 80G GPUs, and we want to deploy the Meta Llama 3 70B instruct model, which is around 140GB with FP16. So for deployment we can do: diff --git a/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md b/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md index 64319784..edf27bc6 100644 --- a/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md +++ b/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md @@ -104,7 +104,7 @@ lm_eval --model vllm --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct,te **NOTE**: As for `add_bos_token=True`, since our prompts in the evals dataset has already included all the special tokens required by instruct model, such as `<|start_header_id|>user<|end_header_id|>`, we will not use `--apply_chat_template` argument for instruct models anymore. However, we need to use `add_bos_token=True` flag to add the BOS_token back during VLLM inference, as the BOS_token is removed by default in [this PR](https://github.com/EleutherAI/lm-evaluation-harness/pull/1465). -**NOTE**: For `meta_math_hard` tasks, some of our internal math ground truth has been converted to scientific notation, e.g. `6\sqrt{7}` has been converted to `1.59e+1`, which will be later handled by our internal math evaluation functions. As the lm-evaluation-harness [math evaluation utils.py](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/math/utils.py) can not fully handle those conversion, we will use the original ground truth from the original dataset [lighteval/MATH-Hard](https://huggingface.co/datasets/lighteval/MATH-Hard) by joining the tables on the original input questions. The `get_math_data` function in the [prepare_meta_eval.py](./prepare_meta_eval.py) will handle this step and produce a local parquet dataset file. +**NOTE**: For `meta_math_hard` tasks, some of our internal math ground truth has been converted to scientific notation, e.g. `6\sqrt{7}` has been converted to `1.59e+1`, which will be later handled by our internal math evaluation functions. 
As the lm-evaluation-harness [math evaluation utils.py](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/math/utils.py) cannot fully handle those conversions, we will use the original ground truth from the original dataset [lighteval/MATH-Hard](https://www.oxen.ai/lighteval/MATH-Hard) by joining the tables on the original input questions. The `get_math_data` function in the [prepare_meta_eval.py](./prepare_meta_eval.py) will handle this step and produce a local parquet dataset file. Moreover, we have modified this [math_hard/utils.py](./meta_template/math_hard/utils.py) to address two issues: diff --git a/end-to-end-use-cases/customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md b/end-to-end-use-cases/customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md index 9b022785..8d1c136d 100644 --- a/end-to-end-use-cases/customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md +++ b/end-to-end-use-cases/customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md @@ -10,7 +10,7 @@ Businesses of all sizes can use the [WhatsApp Business API](https://developers.f The diagram below shows the components and overall data flow of the Llama 3 enabled WhatsApp chatbot demo we built, using Amazon EC2 instance as an example for running the web server. - + ## Getting Started with WhatsApp Business Cloud API @@ -25,7 +25,7 @@ For the last step, you need to further follow the [Sample Callback URL for Webho Now open the [Meta for Developers Apps](https://developers.facebook.com/apps/) page and select the WhatsApp business app and you should be able to copy the curl command (as shown in the App Dashboard - WhatsApp - API Setup - Step 2 below) and run the command on a Terminal to send a test message to your WhatsApp. - + Note down the "Temporary access token", "Phone number ID", and "a recipient phone number" in the API Setup page above, which will be used later. diff --git a/end-to-end-use-cases/multilingual/README.md b/end-to-end-use-cases/multilingual/README.md index e8a678b3..662f7c50 100644 --- a/end-to-end-use-cases/multilingual/README.md +++ b/end-to-end-use-cases/multilingual/README.md @@ -119,7 +119,7 @@ phase2_ds.save_to_disk("data/phase2") ``` ### Train -Finally, we can start finetuning Llama2 on these datasets by following the [finetuning recipes](../getting-started/finetuning/). Remember to pass the new tokenizer path as an argument to the script: `--tokenizer_name=./extended_tokenizer`. +Finally, we can start finetuning Llama2 on these datasets by following the [finetuning recipes](../../getting-started/finetuning/). Remember to pass the new tokenizer path as an argument to the script: `--tokenizer_name=./extended_tokenizer`. OpenHathi was trained on 64 A100 80GB GPUs. Here are the hyperparameters used and other training details: - maximum learning rate: 2e-4 diff --git a/getting-started/README.md b/getting-started/README.md index 523135c0..c09ac3d5 100644 --- a/getting-started/README.md +++ b/getting-started/README.md @@ -5,6 +5,6 @@ If you are new to developing with Meta Llama models, this is where you should st * The [Build_with_Llama 3.2](./build_with_Llama_3_2.ipynb) notebook showcases a comprehensive walkthrough of the new capabilities of Llama 3.2 models, including multimodal use cases, function/tool calling, Llama Stack, and Llama on edge. * The [Running_Llama_Anywhere](./Running_Llama3_Anywhere/) notebooks demonstrate how to run Llama inference across Linux, Mac and Windows platforms using the appropriate tooling. 
* The [Prompt_Engineering_with_Llama](./Prompt_Engineering_with_Llama_3.ipynb) notebook showcases the various ways to elicit appropriate outputs from Llama. Take this notebook for a spin to get a feel for how Llama responds to different inputs and generation parameters. -* The [inference](./inference/) folder contains scripts to deploy Llama for inference on server and mobile. See also [3p_integrations/vllm](../3p_integrations/vllm/) and [3p_integrations/tgi](../3p_integrations/tgi/) for hosting Llama on open-source model servers. +* The [inference](./inference/) folder contains scripts to deploy Llama for inference on server and mobile. See also [3p-integrations/vllm](../3p-integrations/vllm/) and [3p-integrations/tgi](../3p-integrations/tgi/) for hosting Llama on open-source model servers. * The [RAG](./RAG/) folder contains a simple Retrieval-Augmented Generation application using Llama. -* The [finetuning](./finetuning/) folder contains resources to help you finetune Llama on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in [finetuning.py](../../src/llama_recipes/finetuning.py) which supports these features: +* The [finetuning](./finetuning/) folder contains resources to help you finetune Llama on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in [finetuning.py](../src/llama_recipes/finetuning.py) which supports these features: diff --git a/getting-started/finetuning/README.md b/getting-started/finetuning/README.md index 46d58aa6..ca2b6757 100644 --- a/getting-started/finetuning/README.md +++ b/getting-started/finetuning/README.md @@ -6,7 +6,7 @@ This folder contains instructions to fine-tune Meta Llama 3 on a * [single-GPU setup](./singlegpu_finetuning.md) * [multi-GPU setup](./multigpu_finetuning.md) -using the canonical [finetuning script](../../../src/llama_recipes/finetuning.py) in the llama-recipes package. +using the canonical [finetuning script](../../src/llama_recipes/finetuning.py) in the llama-recipes package. If you are new to fine-tuning techniques, check out [an overview](./LLM_finetuning_overview.md). @@ -17,10 +17,10 @@ If you are new to fine-tuning techniques, check out [an overview](./LLM_finetuni ## How to configure finetuning settings? > [!TIP] -> All the setting defined in [config files](../../../src/llama_recipes/configs/) can be passed as args through CLI when running the script, there is no need to change from config files directly. +> All the settings defined in [config files](../../src/llama_recipes/configs/) can be passed as args through the CLI when running the script; there is no need to change the config files directly. -* [Training config file](../../../src/llama_recipes/configs/training.py) is the main config file that helps to specify the settings for our run and can be found in [configs folder](../../../src/llama_recipes/configs/) +* [Training config file](../../src/llama_recipes/configs/training.py) is the main config file that helps to specify the settings for our run and can be found in [configs folder](../../src/llama_recipes/configs/) It lets us specify the training settings for everything from `model_name` to `dataset_name`, `batch_size` and so on. 
Below is the list of supported settings: @@ -71,11 +71,11 @@ It lets us specify the training settings for everything from `model_name` to `da ``` -* [Datasets config file](../../../src/llama_recipes/configs/datasets.py) provides the available options for datasets. +* [Datasets config file](../../src/llama_recipes/configs/datasets.py) provides the available options for datasets. -* [peft config file](../../../src/llama_recipes/configs/peft.py) provides the supported PEFT methods and respective settings that can be modified. We currently support LoRA and Llama-Adapter. Please note that LoRA is the only technique which is supported in combination with FSDP. +* [peft config file](../../src/llama_recipes/configs/peft.py) provides the supported PEFT methods and respective settings that can be modified. We currently support LoRA and Llama-Adapter. Please note that LoRA is the only technique which is supported in combination with FSDP. -* [FSDP config file](../../../src/llama_recipes/configs/fsdp.py) provides FSDP settings such as: +* [FSDP config file](../../src/llama_recipes/configs/fsdp.py) provides FSDP settings such as: * `mixed_precision` boolean flag to specify using mixed precision, defaults to true. diff --git a/getting-started/finetuning/datasets/README.md b/getting-started/finetuning/datasets/README.md index 8795ca96..3543ee77 100644 --- a/getting-started/finetuning/datasets/README.md +++ b/getting-started/finetuning/datasets/README.md @@ -48,17 +48,17 @@ python -m llama_recipes.finetuning --dataset "custom_dataset" --custom_dataset.f This will call the function `get_foo` instead of `get_custom_dataset` when retrieving the dataset. ### Adding new dataset -Each dataset has a corresponding configuration (dataclass) in [configs/datasets.py](../../../../src/llama_recipes/configs/datasets.py) which contains the dataset name, training/validation split names, as well as optional parameters like datafiles etc. +Each dataset has a corresponding configuration (dataclass) in [configs/datasets.py](../../../src/llama_recipes/configs/datasets.py) which contains the dataset name, training/validation split names, as well as optional parameters like datafiles etc. -Additionally, there is a preprocessing function for each dataset in the [datasets](../../../../src/llama_recipes/datasets) folder. +Additionally, there is a preprocessing function for each dataset in the [datasets](../../../src/llama_recipes/datasets) folder. The returned data of the dataset needs to be consumable by the forward method of the fine-tuned model by calling ```model(**data)```. For CausalLM models this usually means that the data needs to be in the form of a dictionary with "input_ids", "attention_mask" and "labels" fields. To add a custom dataset the following steps need to be performed. -1. Create a dataset configuration after the schema described above. Examples can be found in [configs/datasets.py](../../../../src/llama_recipes/configs/datasets.py). +1. Create a dataset configuration after the schema described above. Examples can be found in [configs/datasets.py](../../../src/llama_recipes/configs/datasets.py). 2. Create a preprocessing routine which loads the data and returns a PyTorch style dataset. The signature for the preprocessing function needs to be (dataset_config, tokenizer, split_name) where split_name will be the string for train/validation split as defined in the dataclass. -3. 
Register the dataset name and preprocessing function by inserting it as key and value into the DATASET_PREPROC dictionary in [datasets/__init__.py](../../../../src/llama_recipes/datasets/__init__.py) +3. Register the dataset name and preprocessing function by inserting it as key and value into the DATASET_PREPROC dictionary in [datasets/__init__.py](../../../src/llama_recipes/datasets/__init__.py) 4. Set dataset field in training config to dataset name or use --dataset option of the `llama_recipes.finetuning` module or examples/finetuning.py training script. ## Application diff --git a/getting-started/finetuning/multigpu_finetuning.md b/getting-started/finetuning/multigpu_finetuning.md index 0dbf99b8..43a818d1 100644 --- a/getting-started/finetuning/multigpu_finetuning.md +++ b/getting-started/finetuning/multigpu_finetuning.md @@ -96,14 +96,14 @@ srun torchrun --nproc_per_node 8 --rdzv_id $RANDOM --rdzv_backend c10d --rdzv_e Do not forget to adjust the number of nodes, ntasks and gpus-per-task in the top. ## Running with different datasets -Currently 3 open source datasets are supported that can be found in [Datasets config file](../../../src/llama_recipes/configs/datasets.py). You can also use your custom dataset (more info [here](./datasets/README.md)). +Currently 3 open source datasets are supported that can be found in [Datasets config file](../../src/llama_recipes/configs/datasets.py). You can also use your custom dataset (more info [here](./datasets/README.md)). -* `grammar_dataset` : use this [notebook](../../../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking. +* `grammar_dataset` : use this [notebook](../../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking. * `alpaca_dataset` : to get this open source data please download the `alpaca.json` to `dataset` folder. ```bash -wget -P ../../../src/llama_recipes/datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json +wget -P ../../src/llama_recipes/datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json ``` * `samsum_dataset` @@ -132,7 +132,7 @@ In case you are dealing with slower interconnect network between nodes, to reduc HSDP (Hybrid sharding Data Parallel) helps to define a hybrid sharding strategy where you can have FSDP within `sharding_group_size` which can be the minimum number of GPUs you can fit your model and DDP between the replicas of the model specified by `replica_group_size`. -This will require to set the Sharding strategy in [fsdp config](../../../src/llama_recipes/configs/fsdp.py) to `ShardingStrategy.HYBRID_SHARD` and specify two additional settings, `sharding_group_size` and `replica_group_size` where former specifies the sharding group size, number of GPUs that you model can fit into to form a replica of a model and latter specifies the replica group size, which is world_size/sharding_group_size. +This requires setting the Sharding strategy in [fsdp config](../../src/llama_recipes/configs/fsdp.py) to `ShardingStrategy.HYBRID_SHARD` and specifying two additional settings, `sharding_group_size` and `replica_group_size`, where the former specifies the sharding group size (the number of GPUs your model can fit into to form one replica of the model) and the latter specifies the replica group size, which is world_size/sharding_group_size. 
```bash diff --git a/getting-started/finetuning/singlegpu_finetuning.md b/getting-started/finetuning/singlegpu_finetuning.md index 1b054be1..8ab3d8a9 100644 --- a/getting-started/finetuning/singlegpu_finetuning.md +++ b/getting-started/finetuning/singlegpu_finetuning.md @@ -1,7 +1,7 @@ # Fine-tuning with Single GPU This recipe steps you through how to finetune a Meta Llama 3 model on the text summarization task using the [samsum](https://huggingface.co/datasets/samsum) dataset on a single GPU. -These are the instructions for using the canonical [finetuning script](../../../src/llama_recipes/finetuning.py) in the llama-recipes package. +These are the instructions for using the canonical [finetuning script](../../src/llama_recipes/finetuning.py) in the llama-recipes package. ## Requirements @@ -35,13 +35,13 @@ The args used in the command above are: Currently 3 open source datasets are supported that can be found in [Datasets config file](../../../src/llama_recipes/configs/datasets.py). You can also use your custom dataset (more info [here](./datasets/README.md)). -* `grammar_dataset` : use this [notebook](../../../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking. +* `grammar_dataset` : use this [notebook](../../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking. * `alpaca_dataset` : to get this open source data please download the `alpaca.json` to `dataset` folder. ```bash -wget -P ../../../src/llama_recipes/datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json +wget -P ../../src/llama_recipes/datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json ``` * `samsum_dataset` diff --git a/getting-started/inference/local_inference/README.md b/getting-started/inference/local_inference/README.md index 8e27304a..a31ee952 100644 --- a/getting-started/inference/local_inference/README.md +++ b/getting-started/inference/local_inference/README.md @@ -105,7 +105,7 @@ python inference.py --model_name <training_config.output_dir> --peft_model <trai ## Inference with FSDP checkpoints -In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above. +In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above. **To convert the checkpoint use the following command**: This is helpful if you have fine-tuned your model using FSDP only as follows: @@ -130,4 +130,4 @@ python inference.py --model_name <training_config.output_dir> --prompt_file <tes ## Inference on large models like Meta Llama 405B The FP8 quantized variants of Meta Llama (i.e. meta-llama/Meta-Llama-3.1-405B-FP8 and meta-llama/Meta-Llama-3.1-405B-Instruct-FP8) can be executed on a single node with 8x80GB H100 using the scripts located in this folder. -To run the unquantized Meta Llama 405B variants (i.e. 
meta-llama/Meta-Llama-3.1-405B and Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-recipes inference script currently does not allow multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as showed in [this example](../../../3p_integrations/vllm/README.md). +To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-recipes inference script currently does not allow multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p-integrations/vllm/README.md). diff --git a/getting-started/inference/mobile_inference/android_inference/README.md b/getting-started/inference/mobile_inference/android_inference/README.md index 5a0ec16b..50ec467d 100644 --- a/getting-started/inference/mobile_inference/android_inference/README.md +++ b/getting-started/inference/mobile_inference/android_inference/README.md @@ -9,7 +9,7 @@ Machine Learning Compilation for Large Language Models (MLC LLM) is a high-perfo You can read more about MLC-LLM at the following [link](https://github.com/mlc-ai/mlc-llm). -MLC-LLM is also what powers the Llama3 inference APIs provided by [OctoAI](https://octo.ai/). You can use OctoAI for your Llama3 cloud-based inference needs by trying out the examples under the [following path](../../../../3p_integrations/octoai/). +MLC-LLM is also what powers the Llama3 inference APIs provided by [OctoAI](https://octo.ai/). You can use OctoAI for your Llama3 cloud-based inference needs by trying out the examples under the [following path](../../../../3p-integrations/octoai/). This tutorial was tested with the following setup: * MacBook Pro 16 inch from 2021 with Apple M1 Max and 32GB of RAM running Sonoma 14.3.1 diff --git a/src/docs/FAQ.md b/src/docs/FAQ.md index 6dc3fd91..db0d9e08 100644 --- a/src/docs/FAQ.md +++ b/src/docs/FAQ.md @@ -16,7 +16,7 @@ Here we discuss frequently asked questions that may occur and we found useful al 4. Can I add custom datasets? - Yes, you can find more information on how to do that [here](../recipes/quickstart/finetuning/datasets/README.md). + Yes, you can find more information on how to do that [here](../../getting-started/finetuning/datasets/README.md). 5. What are the hardware SKU requirements for deploying these models? @@ -36,13 +36,13 @@ Here we discuss frequently asked questions that may occur and we found useful al os.environ['PYTORCH_CUDA_ALLOC_CONF']='expandable_segments:True' ``` - We also added this enviroment variable in `setup_environ_flags` of the [train_utils.py](../src/llama_recipes/utils/train_utils.py), feel free to uncomment it if required. + We also added this environment variable in `setup_environ_flags` of the [train_utils.py](../llama_recipes/utils/train_utils.py), feel free to uncomment it if required. 8. Additional debugging flags? The environment variable `TORCH_DISTRIBUTED_DEBUG` can be used to trigger additional useful logging and collective synchronization checks to ensure all ranks are synchronized appropriately. `TORCH_DISTRIBUTED_DEBUG` can be set to either OFF (default), INFO, or DETAIL depending on the debugging level required. Please note that the most verbose option, DETAIL may impact the application performance and thus should only be used when debugging issues. 
- We also added this enviroment variable in `setup_environ_flags` of the [train_utils.py](../src/llama_recipes/utils/train_utils.py), feel free to uncomment it if required. + We also added this environment variable in `setup_environ_flags` of the [train_utils.py](../llama_recipes/utils/train_utils.py), feel free to uncomment it if required. 9. I am getting import errors when running inference. diff --git a/src/docs/multi_gpu.md b/src/docs/multi_gpu.md index aecafd93..6b34c42d 100644 --- a/src/docs/multi_gpu.md +++ b/src/docs/multi_gpu.md @@ -10,7 +10,7 @@ Given the combination of PEFT and FSDP, we would be able to fine tune a Meta Lla For big models like 405B we will need to fine-tune in a multi-node setup even if 4bit quantization is enabled. ## Requirements -To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`finetuning.py`](../recipes/quickstart/finetuning/finetuning.py) script with torchrun (See [README.md](../README.md) for details). +To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`finetuning.py`](../../getting-started/finetuning/finetuning.py) script with torchrun (See [README.md](../README.md) for details). ## How to run it @@ -117,7 +117,7 @@ torchrun --nnodes 1 --nproc_per_node 4 recipes/quickstart/finetuning/finetuning ## Where to configure settings? -* [Training config file](../llama_recipes/configs/training.py) is the main config file that helps to specify the settings for our run and can be found in [configs folder](../src/llama_recipes/configs/) +* [Training config file](../llama_recipes/configs/training.py) is the main config file that helps to specify the settings for our run and can be found in [configs folder](../llama_recipes/configs/) It lets us specify the training settings for everything from `model_name` to `dataset_name`, `batch_size` and so on. Below is the list of supported settings: -- GitLab