    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed
    mode="collapsed",  # sets default mode
    transformations=[
        SentenceSplitter(chunk_size=400, chunk_overlap=50)
    ],  # transformations applied for ingestion
)
```
%% Output
Generating embeddings for level 0.
Performing clustering for level 0.
Generating summaries for level 0 with 10 clusters.
Level 0 created summaries/clusters: 10
Generating embeddings for level 1.
Performing clustering for level 1.
Generating summaries for level 1 with 1 clusters.
Level 1 created summaries/clusters: 1
Generating embeddings for level 2.
Performing clustering for level 2.
Generating summaries for level 2 with 1 clusters.
Level 2 created summaries/clusters: 1
%% Cell type:markdown id: tags:
## Retrieval
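In `collapsed` mode, leaf chunks and summaries from every layer are searched together as one flat pool, and the overall top-k results are returned.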
%% Cell type:code id: tags:
``` python
nodes=raptor_pack.run("What baselines is raptor compared against?",mode="collapsed")
print(len(nodes))
print(nodes[0].text)
```
%% Output
2
Specifically, RAPTOR’s F-1 scores are at least 1.8% points higher than DPR and at least 5.3% points
higher than BM25.
Retriever GPT-3 F-1 Match GPT-4 F-1 Match UnifiedQA F-1 Match
Title + Abstract 25.2 22.2 17.5
BM25 46.6 50.2 26.4
DPR 51.3 53.0 32.1
RAPTOR 53.1 55.7 36.6
Table 4: Comparison of accuracies on the QuAL-
ITY dev dataset for two different language mod-
els (GPT-3, UnifiedQA 3B) using various retrieval
methods. RAPTOR outperforms the baselines of
BM25 and DPR by at least 2.0% in accuracy.
Model GPT-3 Acc. UnifiedQA Acc.
BM25 57.3 49.9
DPR 60.4 53.9
RAPTOR 62.4 56.6
Table 5: Results on F-1 Match scores of various
models on the QASPER dataset.
Model F-1 Match
LongT5 XL (Guo et al., 2022) 53.1
CoLT5 XL (Ainslie et al., 2023) 53.9
RAPTOR + GPT-4 55.7Comparison to State-of-the-art Systems
Building upon our controlled comparisons,
we examine RAPTOR’s performance relative
to other state-of-the-art models.
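%% Cell type:markdown id: tags:
In `tree_traversal` mode, retrieval instead starts from the top-level summaries and follows the matching clusters down through each layer, as the log output below shows.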
%% Cell type:code id: tags:
``` python
nodes = raptor_pack.run(
    "What baselines is raptor compared against?", mode="tree_traversal"
)
print(len(nodes))
print(nodes[0].text)
```
%% Output
Retrieved parent IDs from level 2: ['cc3b3f41-f4ca-4020-b11f-be7e0ce04c4f']
Retrieved 1 from parents at level 2.
Retrieved parent IDs from level 1: ['a4ca9426-a312-4a01-813a-c9b02aefc7e8']
Retrieved 2 from parents at level 1.
Retrieved parent IDs from level 0: ['63126782-2778-449f-99c0-1e8fd90caa36', 'd8f68d31-d878-41f1-aeb6-a7dde8ed5143']
Retrieved 4 from parents at level 0.
4
Specifically, RAPTOR’s F-1 scores are at least 1.8% points higher than DPR and at least 5.3% points
higher than BM25.
Retriever GPT-3 F-1 Match GPT-4 F-1 Match UnifiedQA F-1 Match
Title + Abstract 25.2 22.2 17.5
BM25 46.6 50.2 26.4
DPR 51.3 53.0 32.1
RAPTOR 53.1 55.7 36.6
Table 4: Comparison of accuracies on the QuAL-
ITY dev dataset for two different language mod-
els (GPT-3, UnifiedQA 3B) using various retrieval
methods. RAPTOR outperforms the baselines of
BM25 and DPR by at least 2.0% in accuracy.
Model GPT-3 Acc. UnifiedQA Acc.
BM25 57.3 49.9
DPR 60.4 53.9
RAPTOR 62.4 56.6
Table 5: Results on F-1 Match scores of various
models on the QASPER dataset.
Model F-1 Match
LongT5 XL (Guo et al., 2022) 53.1
CoLT5 XL (Ainslie et al., 2023) 53.9
RAPTOR + GPT-4 55.7Comparison to State-of-the-art Systems
Building upon our controlled comparisons,
we examine RAPTOR’s performance relative
to other state-of-the-art models.
%% Cell type:markdown id: tags:
## Loading
Since we saved to a vector store, we can reuse it by constructing a new retriever against the same store. (For local vector stores, there are also `persist` and `from_persist_dir` methods on the retriever.)
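
As a minimal sketch of local persistence, assuming the pack exposes its retriever as `raptor_pack.retriever` and that a local vector store is in use (the directory path is illustrative):

``` python
# minimal persistence sketch -- assumes a local vector store and that the
# pack exposes the underlying retriever as `raptor_pack.retriever`;
# "./raptor_persist" is an illustrative path
raptor_pack.retriever.persist("./raptor_persist")

# later, reload the retriever from the persisted directory
retriever = RaptorRetriever.from_persist_dir("./raptor_persist")
```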
%% Cell type:code id: tags:
``` python
from llama_index.packs.raptor import RaptorRetriever

retriever = RaptorRetriever(
    [],
    embed_model=OpenAIEmbedding(
        model="text-embedding-3-small"
    ),  # used for embedding clusters
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),  # used for generating summaries
    vector_store=vector_store,  # used for storage
    similarity_top_k=2,  # top k for each layer, or overall top-k for collapsed