%% Cell type:markdown id:0af3ec93 tags:
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/ChromaIndexDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
%% Cell type:markdown id:307804a3-c02b-4a57-ac0d-172c30ddc851 tags:
# Chroma
>[Chroma](https://docs.trychroma.com/getting-started) is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.
<a href="https://discord.gg/MMeYNTmh3x" target="_blank">
<img src="https://img.shields.io/discord/1073293645303795742" alt="Discord">
</a>&nbsp;&nbsp;
<a href="https://github.com/chroma-core/chroma/blob/master/LICENSE" target="_blank">
<img src="https://img.shields.io/static/v1?label=license&message=Apache 2.0&color=white" alt="License">
</a>&nbsp;&nbsp;
<img src="https://github.com/chroma-core/chroma/actions/workflows/chroma-integration-test.yml/badge.svg?branch=main" alt="Integration Tests">
- [Website](https://www.trychroma.com/)
- [Documentation](https://docs.trychroma.com/)
- [Twitter](https://twitter.com/trychroma)
- [Discord](https://discord.gg/MMeYNTmh3x)
Chroma is fully typed, fully tested, and fully documented.
Install Chroma with:
```sh
pip install chromadb
```
Chroma runs in various modes. See below for examples of each integrated with LlamaIndex.
- `in-memory` - in a Python script or Jupyter notebook
- `in-memory with persistence` - in a script or notebook, with save/load to disk
- `in a docker container` - as a server running on your local machine or in the cloud
Like any other database, you can:
- `.add`
- `.get`
- `.update`
- `.upsert`
- `.delete`
- `.peek`
- and `.query` runs the similarity search.
View the full docs at the [Collection reference](https://docs.trychroma.com/reference/Collection); a minimal usage sketch of these operations follows.
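As a quick orientation, here is a minimal sketch of those operations using the raw `chromadb` client; the collection name and documents are illustrative only:
```python
import chromadb

# in-memory client; persistent and client/server modes are shown later in this notebook
client = chromadb.EphemeralClient()
collection = client.create_collection("sketch")

# add two documents with explicit ids and metadata
collection.add(
    ids=["doc_0", "doc_1"],
    documents=["Chroma is a vector database.", "LlamaIndex builds indices over your data."],
    metadatas=[{"source": "sketch"}, {"source": "sketch"}],
)

# similarity search over the stored embeddings
results = collection.query(query_texts=["what is chroma?"], n_results=1)
print(results["ids"][0], results["documents"][0])

# update metadata in place, then remove a record
collection.update(ids=["doc_0"], metadatas=[{"source": "edited"}])
collection.delete(ids=["doc_1"])
print(collection.count())  # 1
```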
%% Cell type:markdown id:b5331b6b tags:
## Basic Example
In this basic example, we take the Paul Graham essay, split it into chunks, embed it using an open-source embedding model, load it into Chroma, and then query it.
%% Cell type:markdown id:54361467 tags:
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%% Cell type:code id:0ffe7d98 tags:
``` python
!pip install llama-index
```
%% Cell type:markdown id:f7010b1d-d1bb-4f08-9309-a328bb4ea396 tags:
#### Creating a Chroma Index
%% Cell type:code id:b3df0b97 tags:
``` python
# !pip install llama-index chromadb --quiet
# !pip install chromadb
# !pip install sentence-transformers
# !pip install pydantic==1.10.11
```
%% Cell type:code id:d48af8e1 tags:
``` python
# import
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.embeddings import HuggingFaceEmbedding
from IPython.display import Markdown, display
import chromadb
```
%% Cell type:code id:374a148b tags:
``` python
# set up OpenAI
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
import openai
openai.api_key = os.environ["OPENAI_API_KEY"]
```
%% Cell type:markdown id:7b9a55de tags:
Download Data
%% Cell type:code id:01f19bc6 tags:
``` python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
```
%% Cell type:code id:667f3cb3-ce18-48d5-b9aa-bfc1a1f0f0f6 tags:
``` python
# create client and a new collection
chroma_client = chromadb.EphemeralClient()
chroma_collection = chroma_client.create_collection("quickstart")
# define embedding function
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
# load documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
# set up ChromaVectorStore and load in data
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)
# Query Data
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
```
%% Output
/Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
/Users/loganmarkewich/llama_index/llama-index/lib/python3.9/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
<b>The author worked on writing and programming growing up. They wrote short stories and tried writing programs on an IBM 1401 computer. Later, they got a microcomputer and started programming more extensively.</b>
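%% Cell type:markdown tags:
If you want the raw retrieved chunks rather than a synthesized answer, the same index also exposes a retriever. This is a minimal sketch reusing the `index` built above; the `similarity_top_k` value is an arbitrary choice:
%% Cell type:code tags:
``` python
# retrieve the two most similar nodes instead of synthesizing an answer
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("What did the author do growing up?")
for n in nodes:
    print(n.score, n.node.text[:80])
```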
%% Cell type:markdown id:349de571 tags:
## Basic Example (including saving to disk)
Extending the previous example, if you want to save to disk, simply initialize the Chroma client with the directory where you want the data to be saved.
`Caution`: Chroma makes a best effort to automatically persist data to disk; however, multiple in-memory clients can stomp on each other's work. As a best practice, only have one client per path running at any given time.
%% Cell type:code id:9c3a56a5 tags:
``` python
# save to disk
db = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)
# load from disk
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=service_context,
)
# Query Data from the persisted index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
```
%% Output
<b>The author worked on writing and programming growing up. They wrote short stories and tried writing programs on an IBM 1401 computer. Later, they got a microcomputer and started programming games and a word processor.</b>
%% Cell type:markdown id:d596e475 tags:
## Basic Example (using the Docker Container)
You can also run the Chroma Server in a Docker container separately, create a Client to connect to it, and then pass that to LlamaIndex.
Here is how to clone, build, and run the Docker image:
```sh
git clone git@github.com:chroma-core/chroma.git
cd chroma
docker-compose up -d --build
```
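Once the server is up, `chromadb.HttpClient()` connects to it, targeting `localhost:8000` by default. Here is a minimal sketch with the host and port spelled out; the values shown are assumptions matching a default local Docker setup:
```python
import chromadb

# explicit host/port; assumed to match a default local docker-compose deployment
remote_db = chromadb.HttpClient(host="localhost", port=8000)
print(remote_db.heartbeat())  # returns a heartbeat value if the server is reachable
```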
%% Cell type:code id:d6c9bd64 tags:
``` python
# create the chroma client and add our data
import chromadb
remote_db = chromadb.HttpClient()
chroma_collection = remote_db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)
```
%% Cell type:code id:88e10c26 tags:
``` python
# Query Data from the Chroma Docker index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
```
%% Output
<b>
Growing up, the author wrote short stories, programmed on an IBM 1401, and wrote programs on a TRS-80 microcomputer. He also took painting classes at Harvard and worked as a de facto studio assistant for a painter. He also tried to start a company to put art galleries online, and wrote software to build online stores.</b>
%% Cell type:markdown id:0a0e79f7 tags:
## Update and Delete
While building toward a real application, you will want to go beyond adding data and also update and delete data.
Chroma has users provide `ids` to simplify the bookkeeping here. `ids` can be the name of the file, a combined hash like `filename_paragraphNumber`, etc.
Here is a basic example showing how to do various operations (a short `.upsert` sketch with deterministic ids follows the output below):
%% Cell type:code id:d9411826 tags:
``` python
doc_to_update = chroma_collection.get(limit=1)
doc_to_update["metadatas"][0] = {
**doc_to_update["metadatas"][0],
**{"author": "Paul Graham"},
}
chroma_collection.update(
ids=[doc_to_update["ids"][0]], metadatas=[doc_to_update["metadatas"][0]]
)
updated_doc = chroma_collection.get(limit=1)
print(updated_doc["metadatas"][0])
# delete the last document
print("count before", chroma_collection.count())
chroma_collection.delete(ids=[doc_to_update["ids"][0]])
print("count after", chroma_collection.count())
```
%% Output
{'_node_content': '{"id_": "be08c8bc-f43e-4a71-ba64-e525921a8319", "embedding": null, "metadata": {}, "excluded_embed_metadata_keys": [], "excluded_llm_metadata_keys": [], "relationships": {"1": {"node_id": "2cbecdbb-0840-48b2-8151-00119da0995b", "node_type": null, "metadata": {}, "hash": "4c702b4df575421e1d1af4b1fd50511b226e0c9863dbfffeccb8b689b8448f35"}, "3": {"node_id": "6a75604a-fa76-4193-8f52-c72a7b18b154", "node_type": null, "metadata": {}, "hash": "d6c408ee1fbca650fb669214e6f32ffe363b658201d31c204e85a72edb71772f"}}, "hash": "b4d0b960aa09e693f9dc0d50ef46a3d0bf5a8fb3ac9f3e4bcf438e326d17e0d8", "text": "", "start_char_idx": 0, "end_char_idx": 4050, "text_template": "{metadata_str}\\n\\n{content}", "metadata_template": "{key}: {value}", "metadata_seperator": "\\n"}', 'author': 'Paul Graham', 'doc_id': '2cbecdbb-0840-48b2-8151-00119da0995b', 'document_id': '2cbecdbb-0840-48b2-8151-00119da0995b', 'ref_doc_id': '2cbecdbb-0840-48b2-8151-00119da0995b'}
count before 20
count after 19
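%% Cell type:markdown tags:
`.upsert` was listed above but not demonstrated. Here is a minimal sketch on a throwaway in-memory collection (the names and documents are illustrative); with deterministic ids such as `filename_chunkNumber`, re-running the cell overwrites records instead of duplicating them:
%% Cell type:code tags:
``` python
import chromadb

# throwaway in-memory collection, embedded with Chroma's default embedding function
demo_client = chromadb.EphemeralClient()
demo = demo_client.get_or_create_collection("upsert_demo")

# deterministic ids make this idempotent: records are added on the first run
# and overwritten on every run after that
demo.upsert(
    ids=["essay.txt_0", "essay.txt_1"],
    documents=["first chunk of the essay", "second chunk of the essay"],
    metadatas=[{"source": "essay.txt"}, {"source": "essay.txt"}],
)
print(demo.count())  # 2, no matter how many times this cell runs
```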