diff --git a/3p-integrations/llamaindex/dlai_agentic_rag/README.md b/3p-integrations/llamaindex/dlai_agentic_rag/README.md
index deeee9a9cdd1317c0f406ecfa410701305891719..b61a6b772275df7656c0418474a450619003f118 100644
--- a/3p-integrations/llamaindex/dlai_agentic_rag/README.md
+++ b/3p-integrations/llamaindex/dlai_agentic_rag/README.md
@@ -2,7 +2,7 @@
 
 The folder here containts the Llama 3 ported notebooks of the DLAI short course [Building Agentic RAG with Llamaindex](https://www.deeplearning.ai/short-courses/building-agentic-rag-with-llamaindex/).
 
-1. [Building Agentic RAG with Llamaindex L1 Router Engine](../../../quickstart/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb) shows how to implement a simple agentic RAG, a router that will pick up one of several query tools (question answering or summarization) to execute a query on a single document. Note this notebook is located in the `quickstart` folder.
+1. [Building Agentic RAG with Llamaindex L1 Router Engine](../../../end-to-end-use-cases/agents/DeepLearningai_Course_Notebooks/Building_Agentic_RAG_with_Llamaindex_L1_Router_Engine.ipynb) shows how to implement a simple agentic RAG, a router that will pick one of several query tools (question answering or summarization) to execute a query on a single document. Note this notebook is located in the `end-to-end-use-cases` folder.
 
 2. [Building Agentic RAG with Llamaindex L2 Tool Calling](Building_Agentic_RAG_with_Llamaindex_L2_Tool_Calling.ipynb) shows how to use Llama 3 to not only pick a function to execute, but also infer an argument to pass through the function.
 
diff --git a/end-to-end-use-cases/RAFT-Chatbot/README.md b/end-to-end-use-cases/RAFT-Chatbot/README.md
index 50356d50950bb92265abc54c0d056d9afa390f24..b500944a205197e3e6398e77048d575aa086ce84 100644
--- a/end-to-end-use-cases/RAFT-Chatbot/README.md
+++ b/end-to-end-use-cases/RAFT-Chatbot/README.md
@@ -124,7 +124,7 @@ export PATH_TO_RAFT_JSON=recipes/use_cases/end2end-recipes/raft/output/raft.json
 torchrun --nnodes 1 --nproc_per_node 4  recipes/quickstart/finetuning/finetuning.py --enable_fsdp --lr 1e-5 --context_length 8192 --num_epochs 1 --batch_size_training 1 --model_name meta-Llama/Meta-Llama-3-8B-Instruct --dist_checkpoint_root_folder $PATH_TO_ROOT_FOLDER --dist_checkpoint_folder fine-tuned  --use_fast_kernels --dataset "custom_dataset" --custom_dataset.test_split "test" --custom_dataset.file "recipes/finetuning/datasets/raft_dataset.py" --use-wandb  --run_validation True  --custom_dataset.data_path $PATH_TO_RAFT_JSON
 ```
 
-For more details on multi-GPU fine-tuning, please refer to the [multigpu_finetuning.md](../../../quickstart/finetuning/multigpu_finetuning.md) in the finetuning recipe.
+For more details on multi-GPU fine-tuning, please refer to the [multigpu_finetuning.md](../../getting-started/finetuning/multigpu_finetuning.md) in the finetuning recipe.
 
 Next, we need to convert the FSDP checkpoint to a HuggingFace checkpoint using the following command:
 
@@ -132,7 +132,7 @@ Next, we need to convert the FSDP checkpoint to a HuggingFace checkpoint using t
 python src/llama_recipes/inference/checkpoint_converter_fsdp_hf.py --fsdp_checkpoint_path  "$PATH_TO_ROOT_FOLDER/fine-tuned-meta-Llama/Meta-Llama-3-8B-Instruct" --consolidated_model_path "$PATH_TO_ROOT_FOLDER"
 ```
 
-For more details on FSDP to HuggingFace checkpoint conversion, please refer to the [readme](../../../quickstart/inference/local_inference/README.md) in the inference/local_inference recipe.
+For more details on FSDP to HuggingFace checkpoint conversion, please refer to the [readme](../../getting-started/inference/local_inference/README.md) in the inference/local_inference recipe.
 
 ## Evaluation Steps
 Once we have the RAFT model, we need to evaluate its performance. In this tutorial, we'll not only use traditional evaluation methods (e.g., calculating exact match rate or ROUGE score) but also use LLM as a judge to score model-generated answers.
diff --git a/end-to-end-use-cases/README.md b/end-to-end-use-cases/README.md
index f5c66a0b8e7a0275c52de36f7c03502b8ee5a3b2..24b868c5e21687a3f2026889eafe468543583b39 100644
--- a/end-to-end-use-cases/README.md
+++ b/end-to-end-use-cases/README.md
@@ -18,7 +18,7 @@ This demo app shows how to use LangChain and Llama 3 to let users ask questions
 ## [NotebookLlama](./NotebookLlama/): PDF to Podcast using Llama Models
 Workflow showcasing how to use multiple Llama models to go from any PDF to a Podcast and using open models to generate a multi-speaker podcast
 
-## [live_data](live_data.ipynb): Ask Llama 3 about Live Data (using Replicate or [OctoAI](../3p_integrations/octoai/live_data.ipynb))
+## [live_data](live_data.ipynb): Ask Llama 3 about Live Data (using Replicate or [OctoAI](../3p-integrations/octoai/live_data.ipynb))
 This demo app shows how to perform live data augmented generation tasks with Llama 3, [LlamaIndex](https://github.com/run-llama/llama_index), another leading open-source framework for building LLM apps, and the [Tavily](https://tavily.com) live search API.
 
 ## [WhatsApp Chatbot](./customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md): Building a Llama 3 Enabled WhatsApp Chatbot
diff --git a/end-to-end-use-cases/benchmarks/inference/on_prem/README.md b/end-to-end-use-cases/benchmarks/inference/on_prem/README.md
index afffd6ee5155f9641788128a3bacee23946ea6bb..f9d7c02fc441eadda21f448dec9a82a667ec14ee 100644
--- a/end-to-end-use-cases/benchmarks/inference/on_prem/README.md
+++ b/end-to-end-use-cases/benchmarks/inference/on_prem/README.md
@@ -7,7 +7,7 @@ We support benchmark on these serving framework:
 
 # vLLM - Getting Started
 
-To get started, we first need to deploy containers on-prem as a API host. Follow the guidance [here](../../../../recipes/3p_integrations/llama_on_prem.md#setting-up-vllm-with-llama-3) to deploy vLLM on-prem.
+To get started, we first need to deploy containers on-prem as an API host. Follow the guidance [here](../../../../3p-integrations/llama_on_prem.md#setting-up-vllm-with-llama-3) to deploy vLLM on-prem.
 
 Note that in common scenario which overall throughput is important, we suggest you prioritize deploying as many model replicas as possible to reach higher overall throughput and request-per-second (RPS), comparing to deploy one model container among multiple GPUs for model parallelism. Additionally, as deploying multiple model replicas, there is a need for a higher level wrapper to handle the load balancing which here has been simulated in the benchmark scripts.
 For example, we have an instance from Azure that has 8xA100 80G GPUs, and we want to deploy the Meta Llama 3 70B instruct model, which is around 140GB with FP16. So for deployment we can do:
diff --git a/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md b/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md
index 64319784946651a6d79d87db8a08c97b18bbf818..edf27bc6d0c4ab2eb57ba1d2eda0e2a12e18f665 100644
--- a/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md
+++ b/end-to-end-use-cases/benchmarks/llm_eval_harness/meta_eval/README.md
@@ -104,7 +104,7 @@ lm_eval --model vllm --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct,te
 
 **NOTE**: As for `add_bos_token=True`, since our prompts in the evals dataset has already included all the special tokens required by instruct model, such as `<|start_header_id|>user<|end_header_id|>`, we will not use `--apply_chat_template` argument for instruct models anymore. However, we need to use `add_bos_token=True` flag to add the BOS_token back during VLLM inference, as the BOS_token is removed by default in [this PR](https://github.com/EleutherAI/lm-evaluation-harness/pull/1465).
 
-**NOTE**: For `meta_math_hard` tasks, some of our internal math ground truth has been converted to scientific notation, e.g. `6\sqrt{7}` has been converted to `1.59e+1`, which will be later handled by our internal math evaluation functions. As the lm-evaluation-harness [math evaluation utils.py](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/math/utils.py) can not fully handle those conversion, we will use the original ground truth from the original dataset [lighteval/MATH-Hard](https://huggingface.co/datasets/lighteval/MATH-Hard) by joining the tables on the original input questions. The `get_math_data` function in the [prepare_meta_eval.py](./prepare_meta_eval.py) will handle this step and produce a local parquet dataset file.
+**NOTE**: For `meta_math_hard` tasks, some of our internal math ground truth has been converted to scientific notation, e.g. `6\sqrt{7}` has been converted to `1.59e+1`, which will later be handled by our internal math evaluation functions. As the lm-evaluation-harness [math evaluation utils.py](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/leaderboard/math/utils.py) cannot fully handle those conversions, we will use the ground truth from the original dataset [lighteval/MATH-Hard](https://www.oxen.ai/lighteval/MATH-Hard) by joining the tables on the original input questions. The `get_math_data` function in [prepare_meta_eval.py](./prepare_meta_eval.py) will handle this step and produce a local parquet dataset file.
 
 Moreover, we have modified this [math_hard/utils.py](./meta_template/math_hard/utils.py) to address two issues:
 
diff --git a/end-to-end-use-cases/customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md b/end-to-end-use-cases/customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md
index 9b022785b4e9455be9fc50faf7b0e7f912062c4f..8d1c136d16ee2daf63f0c0f2c3b78c1efe6917dc 100644
--- a/end-to-end-use-cases/customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md
+++ b/end-to-end-use-cases/customerservice_chatbots/whatsapp_chatbot/whatsapp_llama3.md
@@ -10,7 +10,7 @@ Businesses of all sizes can use the [WhatsApp Business API](https://developers.f
 
 The diagram below shows the components and overall data flow of the Llama 3 enabled WhatsApp chatbot demo we built, using Amazon EC2 instance as an example for running the web server.
 
-![](../../../../docs/img/whatsapp_llama_arch.jpg)
+![](../../../src/docs/img/whatsapp_llama_arch.jpg)
 
 ## Getting Started with WhatsApp Business Cloud API
 
@@ -25,7 +25,7 @@ For the last step, you need to further follow the [Sample Callback URL for Webho
 
 Now open the [Meta for Develops Apps](https://developers.facebook.com/apps/) page and select the WhatsApp business app and you should be able to copy the curl command (as shown in the App Dashboard - WhatsApp - API Setup - Step 2 below) and run the command on a Terminal to send a test message to your WhatsApp.
 
-![](../../../../docs/img/whatsapp_dashboard.jpg)
+![](../../../src/docs/img/whatsapp_dashboard.jpg)
 
 Note down the "Temporary access token", "Phone number ID", and "a recipient phone number" in the API Setup page above, which will be used later.
 
diff --git a/end-to-end-use-cases/multilingual/README.md b/end-to-end-use-cases/multilingual/README.md
index e8a678b3f54d4b0f174ddb429f6997566cd6c97c..662f7c50b9edb83ad786dc61b2f7392dfa443bb1 100644
--- a/end-to-end-use-cases/multilingual/README.md
+++ b/end-to-end-use-cases/multilingual/README.md
@@ -119,7 +119,7 @@ phase2_ds.save_to_disk("data/phase2")
 ```
 
 ### Train
-Finally, we can start finetuning Llama2 on these datasets by following the [finetuning recipes](../getting-started/finetuning/). Remember to pass the new tokenizer path as an argument to the script: `--tokenizer_name=./extended_tokenizer`.
+Finally, we can start finetuning Llama2 on these datasets by following the [finetuning recipes](../../getting-started/finetuning/). Remember to pass the new tokenizer path as an argument to the script: `--tokenizer_name=./extended_tokenizer`.
 
 OpenHathi was trained on 64 A100 80GB GPUs. Here are the hyperparameters used and other training details:
 - maximum learning rate: 2e-4
diff --git a/getting-started/README.md b/getting-started/README.md
index 523135c0bc93f38d13f503ac7f7d095370693c83..c09ac3d532bbb5fa1503d4e9d989e4291e46764c 100644
--- a/getting-started/README.md
+++ b/getting-started/README.md
@@ -5,6 +5,6 @@ If you are new to developing with Meta Llama models, this is where you should st
 * The [Build_with_Llama 3.2](./build_with_Llama_3_2.ipynb) notebook showcases a comprehensive walkthrough of the new capabilities of Llama 3.2 models, including multimodal use cases, function/tool calling, Llama Stack, and Llama on edge.
 * The [Running_Llama_Anywhere](./Running_Llama3_Anywhere/) notebooks demonstrate how to run Llama inference across Linux, Mac and Windows platforms using the appropriate tooling.
 * The [Prompt_Engineering_with_Llama](./Prompt_Engineering_with_Llama_3.ipynb) notebook showcases the various ways to elicit appropriate outputs from Llama. Take this notebook for a spin to get a feel for how Llama responds to different inputs and generation parameters.
-* The [inference](./inference/) folder contains scripts to deploy Llama for inference on server and mobile. See also [3p_integrations/vllm](../3p_integrations/vllm/) and [3p_integrations/tgi](../3p_integrations/tgi/) for hosting Llama on open-source model servers.
+* The [inference](./inference/) folder contains scripts to deploy Llama for inference on server and mobile. See also [3p-integrations/vllm](../3p-integrations/vllm/) and [3p-integrations/tgi](../3p-integrations/tgi/) for hosting Llama on open-source model servers.
 * The [RAG](./RAG/) folder contains a simple Retrieval-Augmented Generation application using Llama.
-* The [finetuning](./finetuning/) folder contains resources to help you finetune Llama on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in [finetuning.py](../../src/llama_recipes/finetuning.py) which supports these features:
+* The [finetuning](./finetuning/) folder contains resources to help you finetune Llama on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in [finetuning.py](../src/llama_recipes/finetuning.py) which supports these features:
diff --git a/getting-started/finetuning/README.md b/getting-started/finetuning/README.md
index 46d58aa6cfd58ae8387cefb9a3ba29d963556bce..ca2b67578547ccd0847085f692aea8dc1ab59dfa 100644
--- a/getting-started/finetuning/README.md
+++ b/getting-started/finetuning/README.md
@@ -6,7 +6,7 @@ This folder contains instructions to fine-tune Meta Llama 3 on a
 * [single-GPU setup](./singlegpu_finetuning.md)
 * [multi-GPU setup](./multigpu_finetuning.md)
 
-using the canonical [finetuning script](../../../src/llama_recipes/finetuning.py) in the llama-recipes package.
+using the canonical [finetuning script](../../src/llama_recipes/finetuning.py) in the llama-recipes package.
 
 If you are new to fine-tuning techniques, check out [an overview](./LLM_finetuning_overview.md).
 
@@ -17,10 +17,10 @@ If you are new to fine-tuning techniques, check out [an overview](./LLM_finetuni
 ## How to configure finetuning settings?
 
 > [!TIP]
-> All the setting defined in [config files](../../../src/llama_recipes/configs/) can be passed as args through CLI when running the script, there is no need to change from config files directly.
+> All the settings defined in the [config files](../../src/llama_recipes/configs/) can be passed as args through the CLI when running the script; there is no need to change the config files directly.
 
 
-* [Training config file](../../../src/llama_recipes/configs/training.py) is the main config file that helps to specify the settings for our run and can be found in [configs folder](../../../src/llama_recipes/configs/)
+* [Training config file](../../src/llama_recipes/configs/training.py) is the main config file that specifies the settings for our run; it can be found in the [configs folder](../../src/llama_recipes/configs/)
 
 It lets us specify the training settings for everything from `model_name` to `dataset_name`, `batch_size` and so on. Below is the list of supported settings:
 
@@ -71,11 +71,11 @@ It lets us specify the training settings for everything from `model_name` to `da
 
 ```
 
-* [Datasets config file](../../../src/llama_recipes/configs/datasets.py) provides the available options for datasets.
+* [Datasets config file](../../src/llama_recipes/configs/datasets.py) provides the available options for datasets.
 
-* [peft config file](../../../src/llama_recipes/configs/peft.py) provides the supported PEFT methods and respective settings that can be modified. We currently support LoRA and Llama-Adapter. Please note that LoRA is the only technique which is supported in combination with FSDP.
+* [peft config file](../../src/llama_recipes/configs/peft.py) provides the supported PEFT methods and respective settings that can be modified. We currently support LoRA and Llama-Adapter. Please note that LoRA is the only technique which is supported in combination with FSDP.
 
-* [FSDP config file](../../../src/llama_recipes/configs/fsdp.py) provides FSDP settings such as:
+* [FSDP config file](../../src/llama_recipes/configs/fsdp.py) provides FSDP settings such as:
 
     * `mixed_precision` boolean flag to specify using mixed precision, defatults to true.
 
diff --git a/getting-started/finetuning/datasets/README.md b/getting-started/finetuning/datasets/README.md
index 8795ca96ddd31372ff37537e763c96f9db1f23ff..3543ee776b4995a1b7766cff412d87fdead1f259 100644
--- a/getting-started/finetuning/datasets/README.md
+++ b/getting-started/finetuning/datasets/README.md
@@ -48,17 +48,17 @@ python -m llama_recipes.finetuning --dataset "custom_dataset" --custom_dataset.f
 This will call the function `get_foo` instead of `get_custom_dataset` when retrieving the dataset.
 
 ### Adding new dataset
-Each dataset has a corresponding configuration (dataclass) in [configs/datasets.py](../../../../src/llama_recipes/configs/datasets.py) which contains the dataset name, training/validation split names, as well as optional parameters like datafiles etc.
+Each dataset has a corresponding configuration (dataclass) in [configs/datasets.py](../../../src/llama_recipes/configs/datasets.py) which contains the dataset name, training/validation split names, as well as optional parameters like datafiles etc.
 
-Additionally, there is a preprocessing function for each dataset in the [datasets](../../../../src/llama_recipes/datasets) folder.
+Additionally, there is a preprocessing function for each dataset in the [datasets](../../../src/llama_recipes/datasets) folder.
 The returned data of the dataset needs to be consumable by the forward method of the fine-tuned model by calling ```model(**data)```.
 For CausalLM models this usually means that the data needs to be in the form of a dictionary with "input_ids", "attention_mask" and "labels" fields.
 
 To add a custom dataset the following steps need to be performed.
 
-1. Create a dataset configuration after the schema described above. Examples can be found in [configs/datasets.py](../../../../src/llama_recipes/configs/datasets.py).
+1. Create a dataset configuration after the schema described above. Examples can be found in [configs/datasets.py](../../../src/llama_recipes/configs/datasets.py).
 2. Create a preprocessing routine which loads the data and returns a PyTorch style dataset. The signature for the preprocessing function needs to be (dataset_config, tokenizer, split_name) where split_name will be the string for train/validation split as defined in the dataclass.
-3. Register the dataset name and preprocessing function by inserting it as key and value into the DATASET_PREPROC dictionary in [datasets/__init__.py](../../../../src/llama_recipes/datasets/__init__.py)
+3. Register the dataset name and preprocessing function by inserting it as key and value into the DATASET_PREPROC dictionary in [datasets/__init__.py](../../../src/llama_recipes/datasets/__init__.py)
 4. Set dataset field in training config to dataset name or use --dataset option of the `llama_recipes.finetuning` module or examples/finetuning.py training script.
 
 ## Application
diff --git a/getting-started/finetuning/multigpu_finetuning.md b/getting-started/finetuning/multigpu_finetuning.md
index 0dbf99b8f84a771cfc1a16d2aebb984853fc5893..43a818d18595b3226f9515841e2544dfbaefe60b 100644
--- a/getting-started/finetuning/multigpu_finetuning.md
+++ b/getting-started/finetuning/multigpu_finetuning.md
@@ -96,14 +96,14 @@ srun  torchrun --nproc_per_node 8 --rdzv_id $RANDOM --rdzv_backend c10d --rdzv_e
 Do not forget to adjust the number of nodes, ntasks and gpus-per-task in the top.
 
 ## Running with different datasets
-Currently 3 open source datasets are supported that can be found in [Datasets config file](../../../src/llama_recipes/configs/datasets.py). You can also use your custom dataset (more info [here](./datasets/README.md)).
+Currently 3 open source datasets are supported; they can be found in the [Datasets config file](../../src/llama_recipes/configs/datasets.py). You can also use your custom dataset (more info [here](./datasets/README.md)).
 
-* `grammar_dataset` : use this [notebook](../../../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking.
+* `grammar_dataset` : use this [notebook](../../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking.
 
 * `alpaca_dataset` : to get this open source data please download the `aplaca.json` to `dataset` folder.
 
 ```bash
-wget -P ../../../src/llama_recipes/datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
+wget -P ../../src/llama_recipes/datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
 ```
 
 * `samsum_dataset`
@@ -132,7 +132,7 @@ In case you are dealing with slower interconnect network between nodes, to reduc
 
 HSDP (Hybrid sharding Data Parallel) helps to define a hybrid sharding strategy where you can have FSDP within `sharding_group_size` which can be the minimum number of GPUs you can fit your model and DDP between the replicas of the model specified by `replica_group_size`.
 
-This will require to set the Sharding strategy in [fsdp config](../../../src/llama_recipes/configs/fsdp.py) to `ShardingStrategy.HYBRID_SHARD` and specify two additional settings, `sharding_group_size` and `replica_group_size` where former specifies the sharding group size, number of GPUs that you model can fit into to form a replica of a model and latter specifies the replica group size, which is world_size/sharding_group_size.
+This requires setting the sharding strategy in the [fsdp config](../../src/llama_recipes/configs/fsdp.py) to `ShardingStrategy.HYBRID_SHARD` and specifying two additional settings, `sharding_group_size` and `replica_group_size`, where the former is the sharding group size (the number of GPUs your model fits on, forming one replica of the model) and the latter is the replica group size, which is world_size/sharding_group_size.
 
 ```bash
 
diff --git a/getting-started/finetuning/singlegpu_finetuning.md b/getting-started/finetuning/singlegpu_finetuning.md
index 1b054be18d6a5bd61f89a720d49a41d86921e4a3..8ab3d8a9833431eb2fde85d54eea231c4c4bfd50 100644
--- a/getting-started/finetuning/singlegpu_finetuning.md
+++ b/getting-started/finetuning/singlegpu_finetuning.md
@@ -1,7 +1,7 @@
 # Fine-tuning with Single GPU
 This recipe steps you through how to finetune a Meta Llama 3 model on the text summarization task using the [samsum](https://huggingface.co/datasets/samsum) dataset on a single GPU.
 
-These are the instructions for using the canonical [finetuning script](../../../src/llama_recipes/finetuning.py) in the llama-recipes package.
+These are the instructions for using the canonical [finetuning script](../../src/llama_recipes/finetuning.py) in the llama-recipes package.
 
 
 ## Requirements
@@ -35,13 +35,13 @@ The args used in the command above are:
 
 Currently 3 open source datasets are supported that can be found in [Datasets config file](../../../src/llama_recipes/configs/datasets.py). You can also use your custom dataset (more info [here](./datasets/README.md)).
 
-* `grammar_dataset` : use this [notebook](../../../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking.
+* `grammar_dataset` : use this [notebook](../../src/llama_recipes/datasets/grammar_dataset/grammar_dataset_process.ipynb) to pull and process the Jfleg and C4 200M datasets for grammar checking.
 
 * `alpaca_dataset` : to get this open source data please download the `alpaca.json` to `dataset` folder.
 
 
 ```bash
-wget -P ../../../src/llama_recipes/datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
+wget -P ../../src/llama_recipes/datasets https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
 ```
 
 * `samsum_dataset`
diff --git a/getting-started/inference/local_inference/README.md b/getting-started/inference/local_inference/README.md
index 8e27304a257cd1a059f467688a2dbc2c2ccb7673..a31ee9522b5a487253fec4cbaf18b32c9adfcabf 100644
--- a/getting-started/inference/local_inference/README.md
+++ b/getting-started/inference/local_inference/README.md
@@ -105,7 +105,7 @@ python inference.py --model_name <training_config.output_dir> --peft_model <trai
 
 ## Inference with FSDP checkpoints
 
-In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above.
+In case you have fine-tuned your model with pure FSDP and saved the checkpoints with "SHARDED_STATE_DICT" as shown [here](../../../src/llama_recipes/configs/fsdp.py), you can use this converter script to convert the FSDP Sharded checkpoints into HuggingFace checkpoints. This enables you to use the inference script normally as mentioned above.
 **To convert the checkpoint use the following command**:
 
 This is helpful if you have fine-tuned you model using FSDP only as follows:
@@ -130,4 +130,4 @@ python inference.py --model_name <training_config.output_dir> --prompt_file <tes
 
 ## Inference on large models like Meta Llama 405B
 The FP8 quantized variants of Meta Llama (i.e. meta-llama/Meta-Llama-3.1-405B-FP8 and meta-llama/Meta-Llama-3.1-405B-Instruct-FP8) can be executed on a single node with 8x80GB H100 using the scripts located in this folder.
-To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-recipes inference script currently does not allow multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as showed in [this example](../../../3p_integrations/vllm/README.md).
+To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-recipes inference script currently does not allow multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p-integrations/vllm/README.md).
diff --git a/getting-started/inference/mobile_inference/android_inference/README.md b/getting-started/inference/mobile_inference/android_inference/README.md
index 5a0ec16b575550c6cba54ce3fa336f9b91e340bc..50ec467dcf75c6dda8a194480ed209812da10389 100644
--- a/getting-started/inference/mobile_inference/android_inference/README.md
+++ b/getting-started/inference/mobile_inference/android_inference/README.md
@@ -9,7 +9,7 @@ Machine Learning Compilation for Large Language Models (MLC LLM) is a high-perfo
 
 You can read more about MLC-LLM at the following [link](https://github.com/mlc-ai/mlc-llm).
 
-MLC-LLM is also what powers the Llama3 inference APIs provided by [OctoAI](https://octo.ai/). You can use OctoAI for your Llama3 cloud-based inference needs by trying out the examples under the [following path](../../../../3p_integrations/octoai/).
+MLC-LLM is also what powers the Llama3 inference APIs provided by [OctoAI](https://octo.ai/). You can use OctoAI for your Llama3 cloud-based inference needs by trying out the examples under the [following path](../../../../3p-integrations/octoai/).
 
 This tutorial was tested with the following setup:
 * MacBook Pro 16 inch from 2021 with Apple M1 Max and 32GB of RAM running Sonoma 14.3.1
diff --git a/src/docs/FAQ.md b/src/docs/FAQ.md
index 6dc3fd91b40acf42015b7c78ef7300470ea8a039..db0d9e08c7ccb29792d470916e755682e329dbca 100644
--- a/src/docs/FAQ.md
+++ b/src/docs/FAQ.md
@@ -16,7 +16,7 @@ Here we discuss frequently asked questions that may occur and we found useful al
 
 4. Can I add custom datasets?
 
-    Yes, you can find more information on how to do that [here](../recipes/quickstart/finetuning/datasets/README.md).
+    Yes, you can find more information on how to do that [here](../../getting-started/finetuning/datasets/README.md).
 
 5. What are the hardware SKU requirements for deploying these models?
 
@@ -36,13 +36,13 @@ Here we discuss frequently asked questions that may occur and we found useful al
     os.environ['PYTORCH_CUDA_ALLOC_CONF']='expandable_segments:True'
 
     ```
-    We also added this enviroment variable in `setup_environ_flags` of the [train_utils.py](../src/llama_recipes/utils/train_utils.py), feel free to uncomment it if required.
+    We also added this environment variable in `setup_environ_flags` of [train_utils.py](../llama_recipes/utils/train_utils.py); feel free to uncomment it if required.
 
 8. Additional debugging flags?
 
     The environment variable `TORCH_DISTRIBUTED_DEBUG` can be used to trigger additional useful logging and collective synchronization checks to ensure all ranks are synchronized appropriately. `TORCH_DISTRIBUTED_DEBUG` can be set to either OFF (default), INFO, or DETAIL depending on the debugging level required. Please note that the most verbose option, DETAIL may impact the application performance and thus should only be used when debugging issues.
 
-    We also added this enviroment variable in `setup_environ_flags` of the [train_utils.py](../src/llama_recipes/utils/train_utils.py), feel free to uncomment it if required.
+    We also added this environment variable in `setup_environ_flags` of [train_utils.py](../llama_recipes/utils/train_utils.py); feel free to uncomment it if required.
 
 9. I am getting import errors when running inference.
 
diff --git a/src/docs/multi_gpu.md b/src/docs/multi_gpu.md
index aecafd930e6e0ebec031fe81d2e60fdae38f81e7..6b34c42d604706cfeddc7f859695d2ff2b2bc533 100644
--- a/src/docs/multi_gpu.md
+++ b/src/docs/multi_gpu.md
@@ -10,7 +10,7 @@ Given the combination of PEFT and FSDP, we would be able to fine tune a Meta Lla
 For big models like 405B we will need to fine-tune in a multi-node setup even if 4bit quantization is enabled.
 
 ## Requirements
-To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`finetuning.py`](../recipes/quickstart/finetuning/finetuning.py) script with torchrun (See [README.md](../README.md) for details).
+To run the examples, make sure to install the llama-recipes package and clone the github repository in order to use the provided [`finetuning.py`](../../getting-started/finetuning/finetuning.py) script with torchrun (See [README.md](../README.md) for details).
 
 ## How to run it
 
@@ -117,7 +117,7 @@ torchrun --nnodes 1 --nproc_per_node 4  recipes/quickstart/finetuning/finetuning
 
 ## Where to configure settings?
 
-* [Training config file](../llama_recipes/configs/training.py) is the main config file that helps to specify the settings for our run and can be found in [configs folder](../src/llama_recipes/configs/)
+* [Training config file](../llama_recipes/configs/training.py) is the main config file that specifies the settings for our run; it can be found in the [configs folder](../llama_recipes/configs/)
 
 It lets us specify the training settings for everything from `model_name` to `dataset_name`, `batch_size` and so on. Below is the list of supported settings: