diff --git a/docs/single_gpu.md b/docs/single_gpu.md
index 4aec78103bd684591b2aad64b9168431b4f460e0..2131ee6ed86316bd472e5c86740ee437eda4ecf7 100644
--- a/docs/single_gpu.md
+++ b/docs/single_gpu.md
@@ -4,7 +4,7 @@
 To run fine-tuning on a single GPU, we will make use of two packages

 1- [PEFT](https://huggingface.co/blog/peft) methods and in specific using HuggingFace [PEFT](https://github.com/huggingface/peft)library.

-2- [bitandbytes](https://github.com/TimDettmers/bitsandbytes) int8 quantization.
+2- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) int8 quantization.

 Given combination of PEFT and Int8 quantization, we would be able to fine_tune a Llama 2 7B model on one consumer grade GPU such as A10.
diff --git a/src/llama_recipes/inference/hf_text_generation_inference/README.md b/src/llama_recipes/inference/hf_text_generation_inference/README.md
index caa71210daa674a702bfcb3a8fe2b667bde63d82..7db1e00e5c444f48c858b2d036eff2ea6113dd46 100644
--- a/src/llama_recipes/inference/hf_text_generation_inference/README.md
+++ b/src/llama_recipes/inference/hf_text_generation_inference/README.md
@@ -1,6 +1,6 @@
 # Serving a fine tuned Llama model with HuggingFace text-generation-inference server

-This document shows how to serve a fine tuned LLaMA mode with HuggingFace's text-generation-inference server. This option is currently only available for models that were trained using the LoRA method or without using the `--use_peft` argument.
+This document shows how to serve a fine tuned Llama model with HuggingFace's text-generation-inference server. This option is currently only available for models that were trained using the LoRA method or without using the `--use_peft` argument.

 ## Step 0: Merging the weights (Only required if LoRA method was used)
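
For context on the workflow `docs/single_gpu.md` describes, below is a minimal sketch of the PEFT + int8 setup: load the model in 8-bit via bitsandbytes, then attach a LoRA adapter so only a small fraction of weights is trained. The checkpoint name and the LoRA hyperparameters (`r`, `lora_alpha`, `target_modules`) are illustrative assumptions, not values taken from this diff.

```python
# Sketch of PEFT + int8 fine-tuning setup (assumed model id and LoRA settings).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # assumption: access to the gated base checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes int8
    device_map="auto",
)

# Prepare the quantized model for training (casts norms to fp32, enables input grads).
model = prepare_model_for_kbit_training(model)

# LoRA: train low-rank update matrices on the attention projections only.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters
```

With this setup the trainable state fits alongside the 8-bit base weights on a single consumer-grade GPU, which is the point the doc makes about the A10.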
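The "Step 0: Merging the weights" heading in the second file refers to folding a LoRA adapter back into the base model so it can be served as a plain checkpoint. A minimal sketch using PEFT's `merge_and_unload()` follows; the paths are placeholders, and the repo's own merge script may differ.

```python
# Sketch of merging LoRA weights into the base model (placeholder paths).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-2-7b-hf"  # assumption: base model used for training
peft_model_dir = "outputs/lora-adapter"     # assumption: --use_peft training output
merged_dir = "outputs/llama-2-7b-merged"    # where the standalone model is written

base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, peft_model_dir)

# merge_and_unload() adds the low-rank updates into the base weights and
# returns a regular transformers model with no PEFT wrappers left.
model = model.merge_and_unload()
model.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(base_model_id).save_pretrained(merged_dir)
```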