Commit 58f098a1 authored by Cole Murray, committed by GitHub

Fix URL Truncation in MetadataReplacementDemo Notebook (#9393)

parent e743f235
%% Cell type:markdown id: tags:
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/node_postprocessor/MetadataReplacementDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
%% Cell type:markdown id: tags:
# Metadata Replacement + Node Sentence Window
In this notebook, we use the `SentenceWindowNodeParser` to parse documents into single sentences per node. Each node also contains a "window" with the sentences on either side of the node sentence.
Then, during retrieval, before passing the retrieved sentences to the LLM, the single sentences are replaced with a window containing the surrounding sentences using the `MetadataReplacementPostProcessor`.
This is most useful for large documents/indexes, as it helps to retrieve more fine-grained details.
By default, the sentence window is 5 sentences on either side of the original sentence.
In this case, chunk size settings are ignored in favor of the window settings.
%% Cell type:code id: tags:
``` python
%load_ext autoreload
%autoreload 2
```
%% Cell type:markdown id: tags:
## Setup
%% Cell type:markdown id: tags:
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%% Cell type:code id: tags:
``` python
!pip install llama-index
```
%% Cell type:code id: tags:
``` python
import os
import openai
```
%% Cell type:code id: tags:
``` python
os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]
```
%% Cell type:code id: tags:
``` python
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding, HuggingFaceEmbedding
from llama_index.node_parser import (
    SentenceWindowNodeParser,
)
from llama_index.text_splitter import SentenceSplitter

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# base node parser is a sentence splitter
text_splitter = SentenceSplitter()

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2", max_length=512
)

ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    # node_parser=node_parser,
)

# if you want to use OpenAIEmbedding, you should also increase the batch size,
# since it involves many more calls to the API
# ctx = ServiceContext.from_defaults(llm=llm, embed_model=OpenAIEmbedding(embed_batch_size=50), node_parser=node_parser)
```
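%% Cell type:markdown id: tags:
To make the window mechanics concrete, here is a minimal sketch (on a made-up toy string, reusing the `node_parser` defined above) showing that each resulting node holds a single sentence as its text, with the neighboring sentences stored under the `window` metadata key.
%% Cell type:code id: tags:
``` python
from llama_index import Document

# hypothetical toy text, purely to illustrate the parser's output shape
toy_text = "One. Two. Three. Four. Five. Six. Seven."
toy_nodes = node_parser.get_nodes_from_documents([Document(text=toy_text)])

# each node's text is a single sentence, e.g. "Four."
print(toy_nodes[3].text)
# the window holds up to 3 neighboring sentences on each side
print(toy_nodes[3].metadata["window"])
```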
%% Cell type:markdown id: tags:
## Load Data, Build the Index
In this section, we load data and build the vector index.
%% Cell type:markdown id: tags:
### Load Data
Here, we build an index using chapter 3 of the recent IPCC climate report.
%% Cell type:code id: tags:
``` python
!curl https://www.ipcc.ch/report/ar6/wg2/downloads/report/IPCC_AR6_WGII_Chapter03.pdf --output IPCC_AR6_WGII_Chapter03.pdf
```
%% Cell type:code id: tags:
``` python
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=["./IPCC_AR6_WGII_Chapter03.pdf"]
).load_data()
```
%% Cell type:markdown id: tags:
### Extract Nodes
We extract the set of nodes that will be stored in the VectorIndex. This includes both the nodes produced by the sentence window parser and the "base" nodes extracted using the standard parser.
%% Cell type:code id: tags:
``` python
nodes = node_parser.get_nodes_from_documents(documents)
```
%% Cell type:code id: tags:
``` python
base_nodes = text_splitter.get_nodes_from_documents(documents)
```
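%% Cell type:markdown id: tags:
As a quick sanity check, sentence-level parsing should produce far more (and far smaller) nodes than the default chunk-level splitting:
%% Cell type:code id: tags:
``` python
# sentence-level nodes vastly outnumber the chunk-level "base" nodes
print(f"sentence nodes: {len(nodes)}, base nodes: {len(base_nodes)}")
```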
%% Cell type:markdown id: tags:
### Build the Indexes
We build both the sentence index, as well as the "base" index (with default chunk sizes).
%% Cell type:code id: tags:
``` python
from llama_index import VectorStoreIndex

sentence_index = VectorStoreIndex(nodes, service_context=ctx)
```
%% Cell type:code id: tags:
``` python
base_index = VectorStoreIndex(base_nodes, service_context=ctx)
```
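%% Cell type:markdown id: tags:
Embedding this many nodes is the slow (and costly) part, so it can be worth persisting both indexes to disk rather than recomputing them on every run. A minimal sketch using the default local storage backend; the `persist_dir` paths here are illustrative:
%% Cell type:code id: tags:
``` python
# persist both indexes so the embeddings aren't recomputed on every run
sentence_index.storage_context.persist(persist_dir="./sentence_index")
base_index.storage_context.persist(persist_dir="./base_index")

# later, reload with:
# from llama_index import StorageContext, load_index_from_storage
# sentence_index = load_index_from_storage(
#     StorageContext.from_defaults(persist_dir="./sentence_index"),
#     service_context=ctx,
# )
```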
%% Cell type:markdown id: tags:
## Querying
### With MetadataReplacementPostProcessor
Here, we now use the `MetadataReplacementPostProcessor` to replace the sentence in each node with its surrounding context.
%% Cell type:code id: tags:
``` python
from llama_index.postprocessor import MetadataReplacementPostProcessor

query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
window_response = query_engine.query(
    "What are the concerns surrounding the AMOC?"
)
print(window_response)
```
%% Output
There is low confidence in the quantification of Atlantic Meridional Overturning Circulation (AMOC) changes in the 20th century due to low agreement in quantitative reconstructed and simulated trends. Additionally, direct observational records since the mid-2000s remain too short to determine the relative contributions of internal variability, natural forcing, and anthropogenic forcing to AMOC change. However, it is very likely that AMOC will decline for all SSP scenarios over the 21st century, but it will not involve an abrupt collapse before 2100.
%% Cell type:markdown id: tags:
We can also check the original sentence that was retrieved for each node, as well as the actual window of sentences that was sent to the LLM.
%% Cell type:code id: tags:
``` python
window = window_response.source_nodes[0].node.metadata["window"]
sentence = window_response.source_nodes[0].node.metadata["original_text"]

print(f"Window: {window}")
print("------------------")
print(f"Original Sentence: {sentence}")
```
%% Output %% Output
Window: Nevertheless, projected future annual cumulative upwelling wind Window: Nevertheless, projected future annual cumulative upwelling wind
changes at most locations and seasons remain within ±10–20% of changes at most locations and seasons remain within ±10–20% of
present-day values (medium confidence) (WGI AR6 Section 9.2.3.5; present-day values (medium confidence) (WGI AR6 Section 9.2.3.5;
Fox-Kemper et al., 2021). Fox-Kemper et al., 2021).
Continuous observation of the Atlantic meridional overturning Continuous observation of the Atlantic meridional overturning
circulation (AMOC) has improved the understanding of its variability circulation (AMOC) has improved the understanding of its variability
(Frajka-Williams et al., 2019), but there is low confidence in the (Frajka-Williams et al., 2019), but there is low confidence in the
quantification of AMOC changes in the 20th century because of low quantification of AMOC changes in the 20th century because of low
agreement in quantitative reconstructed and simulated trends (WGI agreement in quantitative reconstructed and simulated trends (WGI
AR6 Sections 2.3.3, 9.2.3.1; Fox-Kemper et al., 2021; Gulev et al., 2021). AR6 Sections 2.3.3, 9.2.3.1; Fox-Kemper et al., 2021; Gulev et al., 2021).
Direct observational records since the mid-2000s remain too short to Direct observational records since the mid-2000s remain too short to
determine the relative contributions of internal variability, natural determine the relative contributions of internal variability, natural
forcing and anthropogenic forcing to AMOC change (high confidence) forcing and anthropogenic forcing to AMOC change (high confidence)
(WGI AR6 Sections 2.3.3, 9.2.3.1; Fox-Kemper et al., 2021; Gulev et al., (WGI AR6 Sections 2.3.3, 9.2.3.1; Fox-Kemper et al., 2021; Gulev et al.,
2021). Over the 21st century, AMOC will very likely decline for all SSP 2021). Over the 21st century, AMOC will very likely decline for all SSP
scenarios but will not involve an abrupt collapse before 2100 (WGI scenarios but will not involve an abrupt collapse before 2100 (WGI
AR6 Sections 4.3.2, 9.2.3.1; Fox-Kemper et al., 2021; Lee et al., 2021). AR6 Sections 4.3.2, 9.2.3.1; Fox-Kemper et al., 2021; Lee et al., 2021).
3.2.2.4 Sea Ice Changes 3.2.2.4 Sea Ice Changes
Sea ice is a key driver of polar marine life, hosting unique ecosystems Sea ice is a key driver of polar marine life, hosting unique ecosystems
and affecting diverse marine organisms and food webs through its and affecting diverse marine organisms and food webs through its
impact on light penetration and supplies of nutrients and organic impact on light penetration and supplies of nutrients and organic
matter (Arrigo, 2014). Since the late 1970s, Arctic sea ice area has matter (Arrigo, 2014). Since the late 1970s, Arctic sea ice area has
decreased for all months, with an estimated decrease of 2 million km2 decreased for all months, with an estimated decrease of 2 million km2
(or 25%) for summer sea ice (averaged for August, September and (or 25%) for summer sea ice (averaged for August, September and
October) in 2010–2019 as compared with 1979–1988 (WGI AR6 October) in 2010–2019 as compared with 1979–1988 (WGI AR6
Section 9.3.1.1; Fox-Kemper et al., 2021). Section 9.3.1.1; Fox-Kemper et al., 2021).
------------------ ------------------
Original Sentence: Over the 21st century, AMOC will very likely decline for all SSP Original Sentence: Over the 21st century, AMOC will very likely decline for all SSP
scenarios but will not involve an abrupt collapse before 2100 (WGI scenarios but will not involve an abrupt collapse before 2100 (WGI
AR6 Sections 4.3.2, 9.2.3.1; Fox-Kemper et al., 2021; Lee et al., 2021). AR6 Sections 4.3.2, 9.2.3.1; Fox-Kemper et al., 2021; Lee et al., 2021).
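%% Cell type:markdown id: tags:
The same replacement can also be applied outside a query engine. A small sketch, reusing the sentence index from above: retrieve nodes directly, then run the postprocessor over them so each node's text becomes its window before any downstream use.
%% Cell type:code id: tags:
``` python
retriever = sentence_index.as_retriever(similarity_top_k=2)
retrieved = retriever.retrieve("What are the concerns surrounding the AMOC?")

postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
replaced = postproc.postprocess_nodes(retrieved)

# the node text is now the full window, not just the matched sentence
print(replaced[0].node.get_content()[:200])
```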
%% Cell type:markdown id: tags:
### Contrast with normal VectorStoreIndex
%% Cell type:code id: tags:
``` python
query_engine = base_index.as_query_engine(similarity_top_k=2)
vector_response = query_engine.query(
    "What are the concerns surrounding the AMOC?"
)
print(vector_response)
```
%% Output
The concerns surrounding the AMOC are not provided in the given context information.
%% Cell type:markdown id: tags:
Well, that didn't work. Let's bump up the top k! This will be slower and use more tokens compared to the sentence window index.
%% Cell type:code id: tags:
``` python
query_engine = base_index.as_query_engine(similarity_top_k=5)
vector_response = query_engine.query(
    "What are the concerns surrounding the AMOC?"
)
print(vector_response)
```
%% Output
There are concerns surrounding the AMOC (Atlantic Meridional Overturning Circulation). The context information mentions that the AMOC will decline over the 21st century, with high confidence but low confidence for quantitative projections.
%% Cell type:markdown id: tags:
## Analysis
So the `SentenceWindowNodeParser` + `MetadataReplacementPostProcessor` combo is the clear winner here. But why?
Embeddings at a sentence level seem to capture more fine-grained details, like the word `AMOC`.
We can also compare the retrieved chunks for each index!
%% Cell type:code id: tags:
``` python
for source_node in window_response.source_nodes:
    print(source_node.node.metadata["original_text"])
    print("--------")
```
%% Output
Over the 21st century, AMOC will very likely decline for all SSP
scenarios but will not involve an abrupt collapse before 2100 (WGI
AR6 Sections 4.3.2, 9.2.3.1; Fox-Kemper et al., 2021; Lee et al., 2021).
--------
Direct observational records since the mid-2000s remain too short to
determine the relative contributions of internal variability, natural
forcing and anthropogenic forcing to AMOC change (high confidence)
(WGI AR6 Sections 2.3.3, 9.2.3.1; Fox-Kemper et al., 2021; Gulev et al.,
2021).
--------
%% Cell type:markdown id: tags:
Here, we can see that the sentence window index easily retrieved two nodes that talk about AMOC. Remember, the embeddings are based purely on the original sentence here, but the LLM actually ends up reading the surrounding context as well!
%% Cell type:markdown id: tags:
Now, let's try to dissect why the naive vector index failed.
%% Cell type:code id: tags:
``` python
for node in vector_response.source_nodes:
    print("AMOC mentioned?", "AMOC" in node.node.text)
    print("--------")
```
%% Output
AMOC mentioned? False
--------
AMOC mentioned? False
--------
AMOC mentioned? True
--------
AMOC mentioned? False
--------
AMOC mentioned? False
--------
%% Cell type:markdown id: tags:
So source node at index [2] mentions AMOC, but what did this text actually look like?
%% Cell type:code id: tags:
``` python
print(vector_response.source_nodes[2].node.text)
```
%% Output
2021; Gulev et al.
2021)The AMOC will decline over the 21st century
(high confidence, but low confidence for
quantitative projections).4.3.2.3, 9.2.3 (Fox-Kemper
et al. 2021; Lee et al.
2021)
Sea ice
Arctic sea ice
changes‘Current Arctic sea ice coverage levels are the
lowest since at least 1850 for both annual mean
and late-summer values (high confidence).’2.3.2.1, 9.3.1 (Fox-Kemper
et al. 2021; Gulev et al.
2021)‘The Arctic will become practically ice-free in
September by the end of the 21st century under
SSP2-4.5, SSP3-7.0 and SSP5-8.5[…](high
confidence).’4.3.2.1, 9.3.1 (Fox-Kemper
et al. 2021; Lee et al.
2021)
Antarctic sea ice
changesThere is no global significant trend in
Antarctic sea ice area from 1979 to 2020 (high
confidence).2.3.2.1, 9.3.2 (Fox-Kemper
et al. 2021; Gulev et al.
2021)There is low confidence in model simulations of
future Antarctic sea ice.9.3.2 (Fox-Kemper et al.
2021)
Ocean chemistry
Changes in salinityThe ‘large-scale, near-surface salinity contrasts
have intensified since at least 1950 […]
(virtually certain).’2.3.3.2, 9.2.2.2
(Fox-Kemper et al. 2021;
Gulev et al. 2021)‘Fresh ocean regions will continue to get fresher
and salty ocean regions will continue to get
saltier in the 21st century (medium confidence).’9.2.2.2 (Fox-Kemper et al.
2021)
Ocean acidificationOcean surface pH has declined globally over the
past four decades (virtually certain).2.3.3.5, 5.3.2.2 (Canadell
et al. 2021; Gulev et al.
2021)Ocean surface pH will continue to decrease
‘through the 21st century, except for the
lower-emission scenarios SSP1-1.9 and SSP1-2.6
[…] (high confidence).’4.3.2.5, 4.5.2.2, 5.3.4.1
(Lee et al. 2021; Canadell
et al. 2021)
Ocean
deoxygenationDeoxygenation has occurred in most open
ocean regions since the mid-20th century (high
confidence).2.3.3.6, 5.3.3.2 (Canadell
et al. 2021; Gulev et al.
2021)Subsurface oxygen content ‘is projected to
transition to historically unprecedented condition
with decline over the 21st century (medium
confidence).’5.3.3.2 (Canadell et al.
2021)
Changes in nutrient
concentrationsNot assessed in WGI Not assessed in WGI
%% Cell type:markdown id: tags:
So AMOC is discussed, but sadly it is in the middle chunk. With LLMs, it is often observed that text in the middle of retrieved context is ignored or less useful. A recent paper, ["Lost in the Middle"](https://arxiv.org/abs/2307.03172), discusses this.
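%% Cell type:markdown id: tags:
One mitigation worth trying here is the `LongContextReorder` postprocessor, which reorders retrieved nodes so the highest-scoring ones sit at the edges of the context rather than the middle. A minimal sketch, assuming it is importable from `llama_index.postprocessor` alongside the other postprocessors used in this notebook:
%% Cell type:code id: tags:
``` python
from llama_index.postprocessor import LongContextReorder

reorder_engine = base_index.as_query_engine(
    similarity_top_k=5,
    # push the most relevant chunks toward the start and end of the prompt
    node_postprocessors=[LongContextReorder()],
)
print(reorder_engine.query("What are the concerns surrounding the AMOC?"))
```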
%% Cell type:markdown id: tags:
## [Optional] Evaluation
We more rigorously evaluate how well the sentence window retriever works compared to the base retriever.
We define/load an eval benchmark dataset and then run different evaluations over it.
**WARNING**: This can be *expensive*, especially with GPT-4. Use caution and tune the sample size to fit your budget.
%% Cell type:code id: tags:
``` python
from llama_index.evaluation import (
    DatasetGenerator,
    QueryResponseDataset,
)
from llama_index import ServiceContext
from llama_index.llms import OpenAI
import nest_asyncio
import random

nest_asyncio.apply()
```
%% Cell type:code id: tags:
``` python
len(base_nodes)
```
%% Output
428
%% Cell type:code id: tags:
``` python
num_nodes_eval = 30
# there are 428 nodes total. Take the first 200 to generate questions
# (the back half of the doc is all references)
sample_eval_nodes = random.sample(base_nodes[:200], num_nodes_eval)

# NOTE: run this if the dataset isn't already saved
eval_service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
# generate questions from the largest chunks (1024)
dataset_generator = DatasetGenerator(
    sample_eval_nodes,
    service_context=eval_service_context,
    show_progress=True,
    num_questions_per_chunk=2,
)
```
%% Cell type:code id: tags:
``` python
eval_dataset = await dataset_generator.agenerate_dataset_from_nodes()
```
%% Cell type:code id: tags:
``` python
eval_dataset.save_json("data/ipcc_eval_qr_dataset.json")
```
%% Cell type:code id: tags:
``` python
# optional: load the dataset from disk if it was already generated and saved
eval_dataset = QueryResponseDataset.from_json("data/ipcc_eval_qr_dataset.json")
```
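%% Cell type:markdown id: tags:
Before spending eval budget, it's worth spot-checking a generated pair. A quick look using the dataset's `qr_pairs` (question, reference-answer) tuples:
%% Cell type:code id: tags:
``` python
# print one generated question and its reference answer
sample_q, sample_ref = eval_dataset.qr_pairs[0]
print("Q:", sample_q)
print("Ref:", sample_ref)
```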
%% Cell type:markdown id: tags:
### Compare Results
%% Cell type:code id: tags:
``` python
import asyncio
import nest_asyncio

nest_asyncio.apply()
```
%% Cell type:code id: tags:
``` python
from llama_index.evaluation import (
    CorrectnessEvaluator,
    SemanticSimilarityEvaluator,
    RelevancyEvaluator,
    FaithfulnessEvaluator,
    PairwiseComparisonEvaluator,
)
from collections import defaultdict
import pandas as pd

# NOTE: can uncomment other evaluators
evaluator_c = CorrectnessEvaluator(service_context=eval_service_context)
evaluator_s = SemanticSimilarityEvaluator(service_context=eval_service_context)
evaluator_r = RelevancyEvaluator(service_context=eval_service_context)
evaluator_f = FaithfulnessEvaluator(service_context=eval_service_context)
# pairwise_evaluator = PairwiseComparisonEvaluator(service_context=eval_service_context)
```
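%% Cell type:markdown id: tags:
Each evaluator can also be called on a single example, which is handy for sanity-checking the grading before paying for a batch run. A minimal sketch with `CorrectnessEvaluator` (the response string here is made up for illustration; `score` is on a 1-5 scale and `passing` is a boolean):
%% Cell type:code id: tags:
``` python
# hypothetical single-example check of the grader
single_result = evaluator_c.evaluate(
    query="What are the concerns surrounding the AMOC?",
    response="The AMOC will very likely decline over the 21st century.",
    reference=str(window_response),
)
print(single_result.score, single_result.passing)
```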
%% Cell type:code id: tags:
``` python
from llama_index.evaluation.eval_utils import get_responses, get_results_df
from llama_index.evaluation import BatchEvalRunner

max_samples = 30

eval_qs = eval_dataset.questions
ref_response_strs = [r for (_, r) in eval_dataset.qr_pairs]

# set up the base query engine and sentence window query engine again
# base query engine
base_query_engine = base_index.as_query_engine(similarity_top_k=2)
# sentence window query engine
query_engine = sentence_index.as_query_engine(
    similarity_top_k=2,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
```
%% Cell type:code id: tags:
``` python
import numpy as np

base_pred_responses = get_responses(
    eval_qs[:max_samples], base_query_engine, show_progress=True
)
pred_responses = get_responses(
    eval_qs[:max_samples], query_engine, show_progress=True
)

pred_response_strs = [str(p) for p in pred_responses]
base_pred_response_strs = [str(p) for p in base_pred_responses]
```
%% Cell type:code id: tags:
``` python
evaluator_dict = {
    "correctness": evaluator_c,
    "faithfulness": evaluator_f,
    "relevancy": evaluator_r,
    "semantic_similarity": evaluator_s,
}
batch_runner = BatchEvalRunner(evaluator_dict, workers=2, show_progress=True)
```
%% Cell type:markdown id: tags:
Run evaluations over correctness, faithfulness, relevancy, and semantic similarity.
%% Cell type:code id: tags:
``` python
eval_results = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=pred_responses[:max_samples],
    reference=ref_response_strs[:max_samples],
)
```
%% Cell type:code id: tags:
``` python
base_eval_results = await batch_runner.aevaluate_responses(
    queries=eval_qs[:max_samples],
    responses=base_pred_responses[:max_samples],
    reference=ref_response_strs[:max_samples],
)
```
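%% Cell type:markdown id: tags:
Before building the summary table, the raw results can be inspected directly. `aevaluate_responses` returns a dict mapping each evaluator name to a list of `EvaluationResult` objects, so a mean score is a one-liner (a quick sketch, assuming the `score` field is populated for the chosen evaluators):
%% Cell type:code id: tags:
``` python
# mean correctness score: sentence window engine vs. base engine
print(np.mean([r.score for r in eval_results["correctness"]]))
print(np.mean([r.score for r in base_eval_results["correctness"]]))
```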
%% Cell type:code id: tags:
``` python
results_df = get_results_df(
    [eval_results, base_eval_results],
    ["Sentence Window Retriever", "Base Retriever"],
    ["correctness", "relevancy", "faithfulness", "semantic_similarity"],
)
display(results_df)
```
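%% Cell type:markdown id: tags:
The commented-out `PairwiseComparisonEvaluator` offers one more view: head-to-head LLM judgments between the two engines on the same question. A rough sketch on a single sample, assuming the evaluator's `aevaluate` accepts a `second_response` string as in recent llama_index releases (a score of 1.0 favors the first response, 0.0 the second):
%% Cell type:code id: tags:
``` python
pairwise_evaluator = PairwiseComparisonEvaluator(
    service_context=eval_service_context
)
# compare the sentence window answer against the base answer for one query
pairwise_result = await pairwise_evaluator.aevaluate(
    query=eval_qs[0],
    response=pred_response_strs[0],
    second_response=base_pred_response_strs[0],
)
print(pairwise_result.score)
```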