diff --git a/docs/getting_started/installation.md b/docs/getting_started/installation.md
index 03cb10c927354fbfffdb2c6a6c814ddac8ecb21a..42920ae389a60971af1f09102ddcab32d3339169 100644
--- a/docs/getting_started/installation.md
+++ b/docs/getting_started/installation.md
@@ -8,7 +8,7 @@ pip install llama-index
 ```

 ### Installation from Source
-Git clone this repository: `git clone git@github.com:jerryjliu/llama_index.git`. Then do:
+Git clone this repository: `git clone https://github.com/jerryjliu/llama_index.git`. Then do:

 - `pip install -e .` if you want to do an editable install (you can modify source files) of just the package itself.
 - `pip install -r requirements.txt` if you want to install optional dependencies + dependencies used for development (e.g. unit testing).
diff --git a/docs/guides/primer/usage_pattern.md b/docs/guides/primer/usage_pattern.md
index b6524bb64df2fff52d40cff8fe827d628f0c7a0e..69e23cf0f3109b6b51ebfb6ecabef057345cec37 100644
--- a/docs/guides/primer/usage_pattern.md
+++ b/docs/guides/primer/usage_pattern.md
@@ -61,6 +61,7 @@ node2 = Node(text="<text_chunk>", doc_id="<node_id>")

 # set relationships
 node1.relationships[DocumentRelationship.NEXT] = node2.get_doc_id()
 node2.relationships[DocumentRelationship.PREVIOUS] = node1.get_doc_id()
+nodes = [node1, node2]
 ```

@@ -183,6 +184,7 @@ For embedding-based indices, you can choose to pass in a custom embedding model.

 Creating an index, inserting to an index, and querying an index may use tokens. We can track
 token usage through the outputs of these operations. When running operations, the token usage will be printed.
+
 You can also fetch the token usage through `index.llm_predictor.last_token_usage`.

 See [Cost Predictor How-To](/docs/how_to/analysis/cost_analysis.md) for more details.
@@ -337,6 +339,9 @@ Right now, we support the following options:
   multiple prompts.
 - `tree_summarize`: Given a set of `Node` objects and the query, recursively construct a tree
   and return the root node as the response. Good for summarization purposes.
+- `no_text`: Only runs the retriever to fetch the nodes that would have been sent to the LLM,
+  without actually sending them. They can then be inspected by checking `response.source_nodes`.
+  The response object is covered in more detail in Section 5.
 - `accumulate`: Given a set of `Node` objects and the query, apply the query to each `Node` text
   chunk while accumulating the responses into an array. Returns a concatenated string of all
   responses. Good for when you need to run the same query separately against each text chunk.
@@ -357,6 +362,10 @@ response = query_engine.query("What did the author do growing up?")

 # tree summarize
 query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='tree_summarize')
 response = query_engine.query("What did the author do growing up?")
+
+# no text
+query_engine = RetrieverQueryEngine.from_args(retriever, response_mode='no_text')
+response = query_engine.query("What did the author do growing up?")
 ```

diff --git a/docs/how_to/customization/custom_llms.md b/docs/how_to/customization/custom_llms.md
index d8cfdd25b4b1e8469fa0b4b9d03a23b0cdcabbde..701ab35009880d99d8801cb699d647950533c010 100644
--- a/docs/how_to/customization/custom_llms.md
+++ b/docs/how_to/customization/custom_llms.md
@@ -136,7 +136,7 @@ response = query_engine.query("What did the author do after his time at Y Combin

 ## Example: Using a HuggingFace LLM

-LlamaIndex supports using LLMs from HuggingFace directly.
+LlamaIndex supports using LLMs from HuggingFace directly.
 Note that for a completely private experience, also setup a local embedding model (example [here](./embeddings.md#custom-embeddings)).

 ```python
 from llama_index.prompts.prompts import SimpleInputPrompt
@@ -181,7 +181,9 @@ Several example notebooks are also listed below:

 ## Example: Using a Custom LLM Model - Advanced

-To use a custom LLM model, you only need to implement the `LLM` class [from Langchain](https://langchain.readthedocs.io/en/latest/modules/llms/examples/custom_llm.html). You will be responsible for passing the text to the model and returning the newly generated tokens.
+To use a custom LLM model, you only need to implement the `LLM` class [from Langchain](https://python.langchain.com/en/latest/modules/models/llms/examples/custom_llm.html). You will be responsible for passing the text to the model and returning the newly generated tokens.
+
+Note that for a completely private experience, also setup a local embedding model (example [here](./embeddings.md#custom-embeddings)).

 Here is a small example using locally running facebook/OPT model and Huggingface's pipeline abstraction:

diff --git a/docs/how_to/index_structs/composability.md b/docs/how_to/index_structs/composability.md
index cc1c9dba6db2572cc3b8db8427880ca8a7240366..4077415c3e63ccb032dd6bf3c821d18cdb16cdf8 100644
--- a/docs/how_to/index_structs/composability.md
+++ b/docs/how_to/index_structs/composability.md
@@ -9,6 +9,8 @@ Composability allows you to to define lower-level indices for each document, and
 To see how this works, imagine you have 3 documents: `doc1`, `doc2`, and `doc3`.

 ```python
+from llama_index import SimpleDirectoryReader
+
 doc1 = SimpleDirectoryReader('data1').load_data()
 doc2 = SimpleDirectoryReader('data2').load_data()
 doc3 = SimpleDirectoryReader('data3').load_data()
@@ -16,12 +18,18 @@ doc3 = SimpleDirectoryReader('data3').load_data()

-Now let's define a tree index for each document. In Python, we have:
+Now let's define a tree index for each document. In order to persist the graph later, each index should share the same storage context.
+
+In Python, we have:

 ```python
-index1 = GPTTreeIndex.from_documents(doc1)
-index2 = GPTTreeIndex.from_documents(doc2)
-index3 = GPTTreeIndex.from_documents(doc3)
+from llama_index import GPTTreeIndex, StorageContext
+
+storage_context = StorageContext.from_defaults()
+
+index1 = GPTTreeIndex.from_documents(doc1, storage_context=storage_context)
+index2 = GPTTreeIndex.from_documents(doc2, storage_context=storage_context)
+index3 = GPTTreeIndex.from_documents(doc3, storage_context=storage_context)
 ```

@@ -61,6 +69,7 @@ graph = ComposableGraph.from_indices(
     GPTListIndex,
     [index1, index2, index3],
     index_summaries=[index1_summary, index2_summary, index3_summary],
+    storage_context=storage_context,
 )
 ```
@@ -94,12 +103,12 @@ response = query_engine.query("Where did the author grow up?")
 ```

 > Note that specifying custom retriever for index by id
-> might require you to inspect e.g., `index1.index_struct.index_id`.
+> might require you to inspect e.g., `index1.index_id`.
 > Alternatively, you can explicitly set it as follows:

 ```python
-index1.index_struct.index_id = "<index_id_1>"
-index2.index_struct.index_id = "<index_id_2>"
-index3.index_struct.index_id = "<index_id_3>"
+index1.set_index_id("<index_id_1>")
+index2.set_index_id("<index_id_2>")
+index3.set_index_id("<index_id_3>")
 ```

@@ -111,6 +120,26 @@ So within a node, instead of fetching the text, we would recursively query the s

 NOTE: You can stack indices as many times as you want, depending on the hierarchies of your knowledge base!

+### [Optional] Persisting the Graph
+
+The graph can also be persisted to storage, and then loaded again when needed. Note that you'll need to set the
+ID of the root index, or keep track of the default.
+
+```python
+# set the ID
+graph.root_index.set_index_id("my_id")
+
+# persist to storage
+graph.root_index.storage_context.persist(persist_dir="./storage")
+
+# load
+from llama_index import StorageContext, load_graph_from_storage
+
+storage_context = StorageContext.from_defaults(persist_dir="./storage")
+graph = load_graph_from_storage(storage_context, root_id="my_id")
+```
+
+
 We can take a look at a code example below as well. We first build two tree indices,
 one over the Wikipedia NYC page, and the other over Paul Graham's essay. We then define a keyword extractor index over the two tree indices.

 [Here is an example notebook](https://github.com/jerryjliu/llama_index/blob/main/docs/examples/composable_indices/ComposableIndices.ipynb).
diff --git a/docs/how_to/storage/save_load.md b/docs/how_to/storage/save_load.md
index 591396cdac56958990b71248b728cd18351b64e9..79b913043f4276986e61d1e17f4f9e74f88f5468 100644
--- a/docs/how_to/storage/save_load.md
+++ b/docs/how_to/storage/save_load.md
@@ -7,6 +7,8 @@ storage_context.persist(persist_dir="<persist_dir>")
 ```

 This will persist data to disk, under the specified `persist_dir` (or `./storage` by default).
+Multiple indices can be persisted to and loaded from the same directory, as long as you keep track of index IDs for loading.
+
 User can also configure alternative storage backends (e.g. `MongoDB`) that persist data by default.
 In this case, calling `storage_context.persist()` will do nothing.
@@ -28,16 +30,18 @@ We can then load specific indices from the `StorageContext` through some conveni
 from llama_index import load_index_from_storage, load_indices_from_storage, load_graph_from_storage

 # load a single index
-index = load_index_from_storage(storage_context, index_id="<index_id>") # need to specify index_id if it's ambiguous
-index = load_index_from_storage(storage_context) # don't need to specify index_id if there's only one index in storage context
+# need to specify index_id if multiple indices are persisted to the same directory
+index = load_index_from_storage(storage_context, index_id="<index_id>")
+
+# don't need to specify index_id if there's only one index in storage context
+index = load_index_from_storage(storage_context)

 # load multiple indices
 indices = load_indices_from_storage(storage_context) # loads all indices
-indices = load_indices_from_storage(storage_context, index_ids=<index_ids>) # loads specific indices
+indices = load_indices_from_storage(storage_context, index_ids=[index_id1, ...]) # loads specific indices

 # load composable graph
 graph = load_graph_from_storage(storage_context, root_id="<root_id>") # loads graph with the specified root_id
-
 ```

 Here's the full [API Reference on saving and loading](/reference/storage/indices_save_load.rst).
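To tie the storage-related changes above together, here is a minimal, illustrative sketch (not part of the patch) of persisting several indices to one directory and loading them back by ID. It only uses calls that appear in the diff above (`StorageContext.from_defaults`, `set_index_id`, `storage_context.persist`, `load_index_from_storage`, `load_indices_from_storage`); the data directory, persist directory, and index IDs are placeholder values.

```python
from llama_index import (
    GPTListIndex,
    GPTTreeIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
    load_indices_from_storage,
)

# build two indices that share a single storage context
storage_context = StorageContext.from_defaults()
documents = SimpleDirectoryReader("data").load_data()  # placeholder data directory
list_index = GPTListIndex.from_documents(documents, storage_context=storage_context)
tree_index = GPTTreeIndex.from_documents(documents, storage_context=storage_context)

# give each index an explicit ID so it can be loaded unambiguously later
list_index.set_index_id("my_list_index")
tree_index.set_index_id("my_tree_index")

# persist both indices to the same directory
storage_context.persist(persist_dir="./storage")

# later: rebuild the storage context and load by ID
storage_context = StorageContext.from_defaults(persist_dir="./storage")
list_index = load_index_from_storage(storage_context, index_id="my_list_index")
all_indices = load_indices_from_storage(storage_context)  # loads both indices
```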