diff --git a/docs/multi_gpu.md b/docs/multi_gpu.md
index 3535422c145aa10c66a402d38c00db94ca56f678..820595dcf3bdd6169dba4ac56c1fb3209aeb5ee8 100644
--- a/docs/multi_gpu.md
+++ b/docs/multi_gpu.md
@@ -4,7 +4,7 @@
 To run fine-tuning on multi-GPUs, we will make use of two packages:
 
 1. [PEFT](https://huggingface.co/blog/peft) methods and in particular using the Hugging Face [PEFT](https://github.com/huggingface/peft)library.
 
-2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html) which helps us parallelize the training over multiple GPUs. [More details](LLM_finetuning.md/#2-full-partial-parameter-finetuning).
+2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html) which helps us parallelize the training over multiple GPUs. [More details](./LLM_finetuning.md).
 
 Given the combination of PEFT and FSDP, we would be able to fine tune a Meta Llama 8B model on multiple GPUs in one node. For big models like 405B we will need to fine-tune in a multi-node setup even if 4bit quantization is enabled.
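
The doc text in this hunk describes combining PEFT with FSDP for single-node, multi-GPU fine-tuning. As a rough sketch of how such a run is typically launched with `torchrun` — note that the script name, flag names, GPU count, and model identifier below are illustrative assumptions, not taken from this diff:

```shell
# Hypothetical launch sketch (script path and flags are assumptions,
# not part of the patched doc above).
# torchrun starts one worker process per GPU on a single node;
# the training script is assumed to wire up FSDP sharding and a
# LoRA-based PEFT method from these flags.
torchrun --nnodes 1 --nproc_per_node 4 \
  finetuning.py \
  --enable_fsdp \
  --use_peft --peft_method lora \
  --model_name meta-llama/Meta-Llama-3.1-8B
```

For the multi-node 405B case mentioned at the end of the hunk, the same pattern would use `--nnodes` greater than 1 plus a rendezvous endpoint shared across nodes.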