diff --git a/docs/single_gpu.md b/docs/single_gpu.md
index bcd2a3178a3e209ba391a14971ab8c464f9ada62..4aec78103bd684591b2aad64b9168431b4f460e0 100644
--- a/docs/single_gpu.md
+++ b/docs/single_gpu.md
@@ -4,7 +4,7 @@ To run fine-tuning on a single GPU, we will make use of two packages
 
 1- [PEFT](https://huggingface.co/blog/peft) methods and in specific using HuggingFace [PEFT](https://github.com/huggingface/peft)library.
 
-2- [BitandBytes](https://github.com/TimDettmers/bitsandbytes) int8 quantization.
+2- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) int8 quantization.
 
 Given combination of PEFT and Int8 quantization, we would be able to fine_tune a Llama 2 7B model on one consumer grade GPU such as A10.
 
@@ -15,7 +15,7 @@ To run the examples, make sure to install the llama-recipes package (See [README
 
 ## How to run it?
 
-Get access to a machine with one GPU or if using a multi-GPU macine please make sure to only make one of them visible using `export CUDA_VISIBLE_DEVICES=GPU:id` and run the following. It runs by default with `samsum_dataset` for summarization application.
+Get access to a machine with one GPU or if using a multi-GPU machine please make sure to only make one of them visible using `export CUDA_VISIBLE_DEVICES=GPU:id` and run the following. It runs by default with `samsum_dataset` for summarization application.
 
 
 ```bash
diff --git a/scripts/spellcheck_conf/wordlist.txt b/scripts/spellcheck_conf/wordlist.txt
index 27c7323cdbcfb06bfca606234c4d2ba31c577b9d..113f21661f557ea0a9b74887981f9ca6aa8839ca 100644
--- a/scripts/spellcheck_conf/wordlist.txt
+++ b/scripts/spellcheck_conf/wordlist.txt
@@ -1121,3 +1121,26 @@ summarization
 xA
 Sanitization
 tokenization
+hatchling
+setuptools
+BoolQ
+CausalLM
+Dyck
+GSM
+HellaSwag
+HumanEval
+MMLU
+NarrativeQA
+NaturalQuestions
+OpenbookQA
+PREPROC
+QuAC
+TruthfulQA
+WinoGender
+bAbI
+dataclass
+datafiles
+davinci
+GPU's
+HuggingFace's
+LoRA
\ No newline at end of file
diff --git a/src/llama_recipes/inference/hf_text_generation_inference/README.md b/src/llama_recipes/inference/hf_text_generation_inference/README.md
index 7a4f72c610b553cc7bac6edad3432ae5514b2936..d6c3ada0a1d402af52d9e5bc569d7a36c54eb912 100644
--- a/src/llama_recipes/inference/hf_text_generation_inference/README.md
+++ b/src/llama_recipes/inference/hf_text_generation_inference/README.md
@@ -1,4 +1,4 @@
-# Serving a fine tuned LLaMA model with HuggingFace text-generation-inference server
+# Serving a fine tuned Llama model with HuggingFace text-generation-inference server
 
 This document shows how to serve a fine tuned LLaMA mode with HuggingFace's text-generation-inference server. This option is currently only available for models that were trained using the LoRA method or without using the `--use_peft` argument.
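
For reference, the recipe the corrected `docs/single_gpu.md` hunk describes — int8 quantization via bitsandbytes combined with a PEFT method such as LoRA so that a Llama 2 7B model fits on one consumer-grade GPU — looks roughly like the sketch below. It assumes the Hugging Face `transformers` and `peft` APIs; the checkpoint id, LoRA hyperparameters, and target modules are illustrative assumptions, not values taken from llama-recipes.

```python
# Minimal sketch of a PEFT + int8 fine-tuning setup on a single GPU.
# Assumes transformers, peft, and bitsandbytes are installed; the model id
# and LoRA settings below are illustrative, not from llama-recipes.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical choice; a gated checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)

# bitsandbytes int8 quantization: weights are loaded in 8-bit, roughly
# halving memory versus fp16 so a 7B model fits on one GPU such as an A10.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",  # place all weights on the single visible GPU
)

# LoRA adapter: only the small injected low-rank matrices are trained;
# the quantized base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative choice of projections
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Before launching, restrict visibility to a single GPU as the doc describes, e.g. `export CUDA_VISIBLE_DEVICES=0` with the desired device id substituted for `0`.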