diff --git a/3p-integrations/modal/many-llamas-human-eval/README.md b/3p-integrations/modal/many-llamas-human-eval/README.md
index 1c3c1b661918a415a20cac455a1e2f6250294100..342949e92c86f7a1e3b4c3bdae744e47a551f83a 100644
--- a/3p-integrations/modal/many-llamas-human-eval/README.md
+++ b/3p-integrations/modal/many-llamas-human-eval/README.md
@@ -12,7 +12,7 @@ This experiment built by the team at [Modal](https://modal.com), and is describe
 
 [Beat GPT-4o at Python by searching with 100 small Llamas](https://modal.com/blog/llama-human-eval)
 
-The experiment has since been upgraded to use the [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model, and runnable end-to-end using the Modal serverless platform.
+The experiment has since been upgraded to use the [Llama 3.2 3B Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) model, and run end-to-end using the Modal serverless platform.
 
 ## Run it yourself
 
@@ -55,7 +55,7 @@ This will execute:
 5. Generating graphs of pass@k and fail@k
 
 ### Results
-
+<!-- markdown-link-check-disable -->
 The resulting plots of the evals will be saved locally to:
 - `/tmp/plot-pass-k.jpeg`
 - `/tmp/plot-fail-k.jpeg`
@@ -69,3 +69,4 @@ You'll see that at 100 generations, the Llama model is able to perform on-par wi
 `/tmp/plot-fail-k.jpeg` shows fail@k across a log-scale, showing smooth scaling of this method.
 
 
+<!-- markdown-link-check-enable -->
diff --git a/README.md b/README.md
index 1087438a3feaa96c277a62bcf67af94a0ae3da72..c264d414524550fb4f079c7ef5c3a01a8bc328b5 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,4 @@
 # Llama Cookbook: The Official Guide to building with Llama Models
-<!-- markdown-link-check-disable -->
 
 > Note: We recently did a refactor of the repo, [archive-main](https://github.com/meta-llama/llama-recipes/tree/archive-main) is a snapshot branch from before the refactor
 
@@ -18,7 +17,6 @@ The examples cover the most popular community approaches, popular use-cases and
 > * [Multimodal Inference with Llama 3.2 Vision](./getting-started/inference/local_inference/README.md#multimodal-inference)
 > * [Inference on Llama Guard 1B + Multimodal inference on Llama Guard 11B-Vision](./end-to-end-use-cases/responsible_ai/llama_guard/llama_guard_text_and_vision_inference.ipynb)
 
-<!-- markdown-link-check-enable -->
 > [!NOTE]
 > Llama 3.2 follows the same prompt template as Llama 3.1, with a new special token `<|image|>` representing the input image for the multimodal models.
 >
diff --git a/UPDATES.md b/UPDATES.md
index f4dc5cef27ae8367c891be483a927ee6fa032d8c..0281eb3098bae6e63f5eabe7508c591f39c9d09c 100644
--- a/UPDATES.md
+++ b/UPDATES.md
@@ -1,5 +1,5 @@
 DIFFLOG:
-
+<!-- markdown-link-check-disable -->
 Nested Folders rename:
 - /recipes/3p_integrations -> /3p-integrations
 - /recipes/quickstart -> /getting-started
@@ -20,4 +20,4 @@ Removed folders:
 - /flagged (Empty folder)
 - /recipes/quickstart/Running_Llama3_Anywhere (Redundant code)
 - /recipes/quickstart/inference/codellama (deprecated model)
-
+<!-- markdown-link-check-enable -->
diff --git a/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md b/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md
index 96e0ae677e83d8543a767f54d5aa2d98db5597ae..64319784946651a6d79d87db8a08c97b18bbf818 100644
--- a/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md
+++ b/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md
@@ -50,7 +50,7 @@ Given the extensive number of tasks available (12 for pretrained models and 30 f
 - **Tasks for 3.2 pretrained models**: MMLU
 - **Tasks for 3.2 instruct models**: MMLU, GPQA
 
-These tasks are common evalutions, many of which overlap with the Hugging Face [Open LLM Leaderboard v2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+These tasks are common evaluations, many of which overlap with the Hugging Face [Open LLM Leaderboard v2](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
 
 Here, we aim to get the benchmark numbers on the aforementioned tasks using Hugging Face [leaderboard implementation](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/leaderboard). Please follow the instructions below to make necessary modifications to use our eval prompts and get more eval metrics.
 
diff --git a/end-to-end-use-cases/multilingual/README.md b/end-to-end-use-cases/multilingual/README.md
index 159db54b36f2ed8905f3444a697bd6fa4e43d724..e8a678b3f54d4b0f174ddb429f6997566cd6c97c 100644
--- a/end-to-end-use-cases/multilingual/README.md
+++ b/end-to-end-use-cases/multilingual/README.md
@@ -119,7 +119,7 @@ phase2_ds.save_to_disk("data/phase2")
 ```
 
 ### Train
-Finally, we can start finetuning Llama2 on these datasets by following the [finetuning recipes](../../quickstart/finetuning/). Remember to pass the new tokenizer path as an argument to the script: `--tokenizer_name=./extended_tokenizer`.
+Finally, we can start finetuning Llama2 on these datasets by following the [finetuning recipes](../getting-started/finetuning/). Remember to pass the new tokenizer path as an argument to the script: `--tokenizer_name=./extended_tokenizer`.
 
 OpenHathi was trained on 64 A100 80GB GPUs. Here are the hyperparameters used and other training details:
 - maximum learning rate: 2e-4
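For context on the pass@k and fail@k plots referenced in the many-llamas-human-eval README hunks above: these metrics are conventionally computed with the unbiased estimator from the HumanEval paper. The sketch below is illustrative only, not part of the diff, and the Modal example's own implementation may differ; it assumes `n` generations are sampled per problem and `c` of them pass the unit tests.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (HumanEval): probability that at least one
    of k samples drawn from n total generations (of which c pass) is correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a passing generation
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def fail_at_k(n: int, c: int, k: int) -> float:
    """Probability that none of k sampled generations passes."""
    return 1.0 - pass_at_k(n, c, k)

# Example: 100 generations for one problem, 7 of which pass the tests.
print(pass_at_k(n=100, c=7, k=1))    # 0.07
print(pass_at_k(n=100, c=7, k=100))  # 1.0 -- some passing sample is always drawn
```

Averaging `pass_at_k` (or `fail_at_k`) over all problems for k = 1..100 yields curves of the kind saved to `/tmp/plot-pass-k.jpeg` and `/tmp/plot-fail-k.jpeg` in that README.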