Mini Squad V2 Dataset

(Arize AI logo)

This dataset was prepared in collaboration with Xander Song of Arize AI.

CLI Usage

You can download llamadatasets directly using llamaindex-cli, which is installed with the llama-index Python package:

llamaindex-cli download-llamadataset MiniSquadV2Dataset --download-dir ./data

You can then inspect the files at ./data. When you're ready to load the data into Python, you can use the following snippet:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.llama_dataset import LabelledRagDataset

rag_dataset = LabelledRagDataset.from_json("./data/rag_dataset.json")
documents = SimpleDirectoryReader(input_dir="./data/source_files").load_data()
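If you just want to peek at the question/answer pairs, rag_dataset.json can also be read with the standard library. The schema sketched below (an "examples" list whose entries carry "query", "reference_contexts", and "reference_answer" fields) is an assumption based on the LabelledRagDataset format, so verify it against your downloaded file before relying on it:

```python
import json

# Hypothetical minimal dict matching the assumed rag_dataset.json layout;
# real files are produced by the download and may contain extra fields.
sample = {
    "examples": [
        {
            "query": "What is covered in the source documents?",
            "reference_contexts": ["A short excerpt from a source file."],
            "reference_answer": "A placeholder answer.",
        }
    ]
}

def iter_qa_pairs(dataset: dict):
    """Yield (query, reference_answer) tuples from a rag_dataset-style dict."""
    for example in dataset.get("examples", []):
        yield example["query"], example["reference_answer"]

# Round-trip through JSON to mimic reading the file from disk.
loaded = json.loads(json.dumps(sample))
pairs = list(iter_qa_pairs(loaded))
print(pairs[0][0])
```

For the real file, replace the round-trip with `json.load(open("./data/rag_dataset.json"))`.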

Code Usage

You can also download the dataset to a directory, say ./data, directly in Python. From there, you can use the convenient RagEvaluatorPack llamapack to run your own LlamaIndex RAG pipeline with the llamadataset.

from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.core.llama_pack import download_llama_pack
from llama_index.core import VectorStoreIndex

# download and install dependencies for benchmark dataset
rag_dataset, documents = download_llama_dataset("MiniSquadV2Dataset", "./data")

# build basic RAG system
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine()

# evaluate using the RagEvaluatorPack
RagEvaluatorPack = download_llama_pack(
    "RagEvaluatorPack", "./rag_evaluator_pack"
)
rag_evaluator_pack = RagEvaluatorPack(
    rag_dataset=rag_dataset, query_engine=query_engine
)

############################################################################
# NOTE: If you have a lower-tier OpenAI API subscription, such as Usage    #
# Tier 1, you will need different batch_size and sleep_time_in_seconds     #
# values. For Usage Tier 1, batch_size=5 and sleep_time_in_seconds=15      #
# seemed to work well (as of December 2023).                               #
############################################################################

benchmark_df = await rag_evaluator_pack.arun(
    batch_size=20,  # number of OpenAI API calls to batch together
    sleep_time_in_seconds=1,  # seconds to sleep before making an API call
)
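Note that the `await` above assumes an async context such as a Jupyter notebook; in a plain script, wrap it in `asyncio.run(...)`. The batch_size / sleep_time_in_seconds knobs implement a simple rate-limiting pattern: issue calls in fixed-size batches and pause between batches. A minimal, library-free sketch of that pattern (the fake_api_call function and zero sleep are stand-ins for illustration, not the pack's internals):

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")

async def run_in_batches(
    factories: Iterable[Callable[[], Awaitable[T]]],
    batch_size: int,
    sleep_time_in_seconds: float,
) -> List[T]:
    """Run coroutine factories batch_size at a time, sleeping between batches."""
    pending = list(factories)
    results: List[T] = []
    for i in range(0, len(pending), batch_size):
        chunk = pending[i : i + batch_size]
        # Launch one batch of calls concurrently.
        results.extend(await asyncio.gather(*(make() for make in chunk)))
        if i + batch_size < len(pending):
            # Back off before the next batch to stay under rate limits.
            await asyncio.sleep(sleep_time_in_seconds)
    return results

async def main() -> List[int]:
    async def fake_api_call(n: int) -> int:  # stand-in for an OpenAI call
        return n * n

    factories = [lambda n=n: fake_api_call(n) for n in range(7)]
    return await run_in_batches(factories, batch_size=5, sleep_time_in_seconds=0)

results = asyncio.run(main())
print(results)  # squares of 0..6, computed in batches of 5 then 2
```

Smaller batches with longer sleeps trade throughput for staying under your tier's requests-per-minute limit, which is exactly the trade-off the note above describes.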

Original data citation

@article{2016arXiv160605250R,
       author = {{Rajpurkar}, Pranav and {Zhang}, Jian and {Lopyrev},
                 Konstantin and {Liang}, Percy},
        title = "{SQuAD: 100,000+ Questions for Machine Comprehension of Text}",
      journal = {arXiv e-prints},
         year = 2016,
          eid = {arXiv:1606.05250},
        pages = {arXiv:1606.05250},
archivePrefix = {arXiv},
       eprint = {1606.05250},
}