Commit 7f026ae8 authored by xiamengzhou

update

parent 33deb347
@@ -37,7 +37,7 @@ pip install -e .
 ## Data Preparation
-We follow the [open-instruct](https://github.com/allenai/open-instruct?tab=readme-ov-file#dataset-preparation) repo to prepare hour instruction tuning datasets. In our project, we utilize a combination of four training datasets: Flan v2, COT, Dolly, and Open Assistant. For the purposes of evaluation, we employ three additional datasets: MMLU, Tydiqa, and BBH. A processed version of these files will be made available [here] [TODO].
+We follow the [open-instruct](https://github.com/allenai/open-instruct?tab=readme-ov-file#dataset-preparation) repo to prepare hour instruction tuning datasets. In our project, we utilize a combination of four training datasets: Flan v2, COT, Dolly, and Open Assistant. For the purposes of evaluation, we employ three additional datasets: MMLU, Tydiqa, and BBH. A processed version of these files are available [here](https://huggingface.co/datasets/princeton-nlp/less_data).
 ## Data Selection Pipeline
......