From 7f026ae834e115da22a63f5494170a611a585555 Mon Sep 17 00:00:00 2001
From: xiamengzhou <296337231@qq.com>
Date: Mon, 5 Feb 2024 11:51:25 -0500
Subject: [PATCH] update

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index a222499..04601cb 100644
--- a/README.md
+++ b/README.md
@@ -37,7 +37,7 @@ pip install -e .
 
 ## Data Preparation
 
-We follow the [open-instruct](https://github.com/allenai/open-instruct?tab=readme-ov-file#dataset-preparation) repo to prepare hour instruction tuning datasets. In our project, we utilize a combination of four training datasets: Flan v2, COT, Dolly, and Open Assistant. For the purposes of evaluation, we employ three additional datasets: MMLU, Tydiqa, and BBH. A processed version of these files will be made available [here] [TODO].
+We follow the [open-instruct](https://github.com/allenai/open-instruct?tab=readme-ov-file#dataset-preparation) repo to prepare hour instruction tuning datasets. In our project, we utilize a combination of four training datasets: Flan v2, COT, Dolly, and Open Assistant. For the purposes of evaluation, we employ three additional datasets: MMLU, Tydiqa, and BBH. A processed version of these files are available [here](https://huggingface.co/datasets/princeton-nlp/less_data).
 
 ## Data Selection Pipeline
 
-- 
GitLab