Commit 1d3780f3 authored by Matthias Reso's avatar Matthias Reso
Address spell check issues

parent 6c38cbeb
@@ -4,7 +4,7 @@ To run fine-tuning on a single GPU, we will make use of two packages
 1- [PEFT](https://huggingface.co/blog/peft) methods, specifically using the HuggingFace [PEFT](https://github.com/huggingface/peft) library.
-2- [BitandBytes](https://github.com/TimDettmers/bitsandbytes) int8 quantization.
+2- [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) int8 quantization.
 Given the combination of PEFT and int8 quantization, we are able to fine-tune a Llama 2 7B model on one consumer-grade GPU such as an A10.
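The README hunk above relies on LoRA (the PEFT method used by these recipes) to shrink the trainable parameter count enough to fit on a single GPU. As a minimal sketch of the low-rank update idea, not the library's actual implementation, the frozen weight `W` is augmented by a trainable product `B @ A` of rank `r`; all shapes and hyperparameters below are assumed example values:

```python
import numpy as np

# Minimal numpy sketch of the LoRA idea used by PEFT (illustrative only;
# the real library applies this inside transformer linear layers).
d, k, r = 512, 512, 8            # layer shape and LoRA rank (assumed values)
alpha = 16                       # LoRA scaling hyperparameter (assumed)

W = np.random.randn(d, k)        # frozen pretrained weight
A = np.random.randn(r, k) * 0.01 # trainable down-projection
B = np.zeros((d, r))             # trainable up-projection, zero-initialized

def lora_forward(x):
    # y = x @ (W + (alpha / r) * B @ A).T, without materializing the sum
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.3%}")  # → 3.125%
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained output exactly; only the small `A`/`B` matrices receive gradients, which is what makes single-GPU fine-tuning feasible alongside int8 quantization of the frozen base weights.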
@@ -15,7 +15,7 @@ To run the examples, make sure to install the llama-recipes package (See [README
 ## How to run it?
-Get access to a machine with one GPU or if using a multi-GPU macine please make sure to only make one of them visible using `export CUDA_VISIBLE_DEVICES=GPU:id` and run the following. It runs by default with `samsum_dataset` for summarization application.
+Get access to a machine with one GPU or, if using a multi-GPU machine, make sure only one of them is visible using `export CUDA_VISIBLE_DEVICES=GPU:id`, and run the following. By default it runs with the `samsum_dataset` for the summarization application.
 ```bash
......
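A detail worth knowing about the `CUDA_VISIBLE_DEVICES` step described above: the variable must be set before any CUDA library initializes, so export it in the shell beforehand or set it at the very top of the script. The sketch below illustrates this (the GPU index `"0"` is an assumed example):

```python
import os

# Illustrative sketch: restrict this process to a single GPU.
# CUDA_VISIBLE_DEVICES must be set before any CUDA library initializes,
# so do this before importing torch or other CUDA-backed frameworks.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # assumed example GPU index

# Frameworks launched after this point see only the listed device(s),
# renumbered starting from 0.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print(f"{len(visible)} GPU(s) visible to this process")  # → 1 GPU(s) visible to this process
```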
@@ -1121,3 +1121,26 @@ summarization
 xA
 Sanitization
 tokenization
+hatchling
+setuptools
+BoolQ
+CausalLM
+Dyck
+GSM
+HellaSwag
+HumanEval
+MMLU
+NarrativeQA
+NaturalQuestions
+OpenbookQA
+PREPROC
+QuAC
+TruthfulQA
+WinoGender
+bAbI
+dataclass
+datafiles
+davinci
+GPU's
+HuggingFace's
+LoRA
\ No newline at end of file
-# Serving a fine tuned LLaMA model with HuggingFace text-generation-inference server
+# Serving a fine tuned Llama model with HuggingFace text-generation-inference server
 This document shows how to serve a fine tuned Llama model with HuggingFace's text-generation-inference server. This option is currently only available for models that were trained using the LoRA method or without using the `--use_peft` argument.
......