Update zcp docs (#9698)

Unverified Commit 9e73564c authored 1 year ago by Jael Gu Committed by GitHub 1 year ago
--- a/docs/community/integrations/managed_indices.md
+++ b/docs/community/integrations/managed_indices.md
@@ -122,24 +122,26 @@ zcp_index = ZillizCloudPipelineIndex.from_document_url(
    url="https://publicdataset.zillizcloud.com/milvus_doc.md",
    cluster_id=os.getenv("ZILLIZ_CLUSTER_ID"),
    token=os.getenv("ZILLIZ_TOKEN"),
-    metadata={"version": "2.3"},  # optional
+    metadata={"version": "2.3"},
 )
-# Insert more docs into index, e.g. a Milvus v2.0 document
+# Insert more docs into index, e.g. a Milvus v2.2 document
 zcp_index.insert_doc_url(
-    url="https://milvus.io/docs/v2.0.x/delete_data.md",
+    url="https://publicdataset.zillizcloud.com/milvus_doc_22.md",
-    metadata={"version": "2.0"},
+    metadata={"version": "2.2"},
 )
 # Query index
 from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters
-query_engine_with_filters = zcp_index.as_query_engine(
+query_engine_milvus23 = zcp_index.as_query_engine(
    search_top_k=3,
    filters=MetadataFilters(
-        filters=[ExactMatchFilter(key="version", value="2.3")]
+        filters=[
-    ),  # optional, here we will only retrieve info of Milvus 2.3
+            ExactMatchFilter(key="version", value="2.3")
-    output_metadata=["version"],  # optional
+        ]  # version == "2.3"
+    ),
+    output_metadata=["version"],
 )
 ```

--- a/docs/examples/managed/zcpDemo.ipynb
+++ b/docs/examples/managed/zcpDemo.ipynb
@@ -81,13 +81,13 @@
    "    url=\"https://publicdataset.zillizcloud.com/milvus_doc.md\",  # a public or pre-signed url of a file stored on s3 or gcs\n",
    "    cluster_id=ZILLIZ_CLUSTER_ID,\n",
    "    token=ZILLIZ_TOKEN,\n",
-    "    metadata={\"version\": \"2.3\"},  # optional\n",
+    "    metadata={\"version\": \"2.3\"},\n",
    ")\n",
    "\n",
-    "# Insert more docs, e.g. a Milvus v2.0 document\n",
+    "# Insert more docs, e.g. a Milvus v2.2 document\n",
    "zcp_index.insert_doc_url(\n",
-    "    url=\"https://milvus.io/docs/v2.0.x/delete_data.md\",\n",
+    "    url=\"https://publicdataset.zillizcloud.com/milvus_doc_22.md\",\n",
-    "    metadata={\"version\": \"2.0\"},\n",
+    "    metadata={\"version\": \"2.2\"},\n",
    ")"
   ]
  },
@@ -96,8 +96,6 @@
   "id": "d16a498e",
   "metadata": {},
   "source": [
-    "- It is optional to add metadata for each document.\n",
-    "\n",
    "### From Local File\n",
    "\n",
    "Coming soon.\n",
@@ -114,11 +112,13 @@
   "source": [
    "## Working as Query Engine\n",
    "\n",
-    "A Zilliz Cloud Pipeline's Index can work as a Query Engine in Llama-Index.\n",
+    "A Zilliz Cloud Pipeline's Index can work as a Query Engine in LlamaIndex.\n",
    "It allows users to customize some parameters:\n",
    "- search_top_k: How many text nodes/chunks to retrieve. Optional, defaults to `DEFAULT_SIMILARITY_TOP_K` (2).\n",
    "- filters: Metadata filters. Optional, defaults to None.\n",
-    "- output_metadata: Which metadata fields to include in each retrieved text node. Optional, defaults to []."
+    "- output_metadata: Which metadata fields to include in each retrieved text node. Optional, defaults to [].\n",
+    "\n",
+    "Filters are optional. For example, to ask only about Milvus 2.3, set `version` to 2.3 in the filters."
   ]
  },
  {
@@ -139,12 +139,14 @@
    "\n",
    "from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters\n",
    "\n",
-    "query_engine_with_filters = zcp_index.as_query_engine(\n",
+    "query_engine_milvus23 = zcp_index.as_query_engine(\n",
    "    search_top_k=3,\n",
    "    filters=MetadataFilters(\n",
-    "        filters=[ExactMatchFilter(key=\"version\", value=\"2.3\")]\n",
+    "        filters=[\n",
-    "    ),  # optional, here we will only retrieve info of Milvus 2.3\n",
+    "            ExactMatchFilter(key=\"version\", value=\"2.3\")\n",
-    "    output_metadata=[\"version\"],  # optional\n",
+    "        ]  # version == \"2.3\"\n",
+    "    ),\n",
+    "    output_metadata=[\"version\"],\n",
    ")"
   ]
  },
@@ -153,7 +155,7 @@
   "id": "9803232e",
   "metadata": {},
   "source": [
-    "Then the query engine is ready for Semantic Search or Retrieval Augmented Generation:\n",
+    "Then the query engine is ready for Semantic Search or Retrieval Augmented Generation with Milvus 2.3 documents:\n",
    "\n",
    "- **Retrieve** (Semantic search powered by Zilliz Cloud Pipeline's Index):"
   ]
@@ -163,18 +165,19 @@
   "execution_count": null,
   "id": "8ab92af7",
   "metadata": {},
-   "outputs": [],
+   "outputs": [
-   "source": [
+    {
-    "question = \"Can users delete entities by complex boolean expressions?\"\n",
+     "name": "stdout",
-    "query_engine_with_filters.retrieve(question)"
+     "output_type": "stream",
-   ]
+     "text": [
-  },
+      "[NodeWithScore(node=TextNode(id_='446268394525283746', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='c3254bc65319b52914d6e68fbce69161fcf0e2998e4619287a8560258a2fe53d', text='Delete Entities\\nThis topic describes how to delete entities in Milvus.\\nMilvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.\\nDeleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\\nEntities deleted beyond the pre-specified span of time for Time Travel cannot be retrieved again.\\nFrequent deletion operations will impact the system performance.\\nBefore deleting entities by comlpex boolean expressions, make sure the collection has been loaded.\\nDeleting entities by complex boolean expressions is not an atomic operation. Therefore, if it fails halfway through, some data may still be deleted.\\nDeleting entities by complex boolean expressions is supported only when the consistency is set to Bounded. For details, see Consistency.\\nPrepare boolean expression\\nPrepare the boolean expression that filters the entities to delete.\\nMilvus supports deleting entities by primary key or complex boolean expressions. 
For more information on expression rules and supported operators, see Boolean Expression Rules.\\nSimple boolean expression\\nUse a simple expression to filter data with primary key values of 0 and 1:\\npython\\nexpr = \"book_id in [0,1]\"\\nComplex boolean expression', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.8668166995048523), NodeWithScore(node=TextNode(id_='446268394525283747', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='3ec8b3a992fb72d081145b7859c70453dc9d71be714c0f5f99ad2b2c2cb1f7ea', text='To filter entities that meet specific conditions, define complex boolean expressions.\\nFilter entities whose word_count is greater than or equal to 11000:\\npython\\nexpr = \"word_count >= 11000\"\\nFilter entities whose book_name is not Unknown:\\npython\\nexpr = \"book_name != Unknown\"\\nFilter entities whose primary key values are greater than 5 and word_count is smaller than or equal to 9999:\\npython\\nexpr = \"book_id > 5 && word_count <= 9999\"\\nDelete entities\\nDelete the entities with the boolean expression you created. Milvus returns the ID list of the deleted entities.\\npython\\nfrom pymilvus import Collection\\ncollection = Collection(\"book\")      # Get an existing collection.\\ncollection.delete(expr)\\nParameter   Description\\nexpr    Boolean expression that specifies the entities to delete.\\npartition_name (optional)   Name of the partition to delete entities from.\\nUpsert Entities\\nThis topic describes how to upsert entities in Milvus.\\nUpserting is a combination of insert and delete operations. 
In the context of a Milvus vector database, an upsert is a data-level operation that will overwrite an existing entity if a specified field already exists in a collection, and insert a new entity if the specified value doesn’t already exist.\\nThe following example upserts 3,000 rows of randomly generated data as the example data. When performing upsert operations, it\\'s important to note that the operation may compromise performance. This is because the operation involves deleting data during execution.\\nPrepare data', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.841397762298584), NodeWithScore(node=TextNode(id_='446268394525283749', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='42656e32ce6baa2897419b8bae612412db94f1d570ab1702f2ae6c5557f248a9', text='When data is upserted into Milvus it is updated and inserted into segments. Segments have to reach a certain size to be sealed and indexed. Unsealed segments will be searched brute force. In order to avoid this with any remainder data, it is best to call flush(). The flush() call will seal any remaining segments and send them for indexing. It is important to only call this method at the end of an upsert session. Calling it too often will cause fragmented data that will need to be cleaned later on.\\nLimits\\nUpdating primary key fields is not supported by upsert().\\nupsert() is not applicable and an error can occur if autoID is set to True for primary key fields.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.793336033821106)]\n"
-  {
+     ]
-   "cell_type": "markdown",
+    }
-   "id": "a503d6e0",
+   ],
-   "metadata": {},
   "source": [
-    "> The query engine with filters retrieves only text nodes with version 2.3."
+    "question = \"Can users delete entities by filtering non-primary fields?\"\n",
+    "retrieved_nodes = query_engine_milvus23.retrieve(question)\n",
+    "print(retrieved_nodes)"
   ]
  },
  {
@@ -190,9 +193,17 @@
   "execution_count": null,
   "id": "fc7b01b7",
   "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Yes, users can delete entities by filtering non-primary fields. Milvus supports deleting entities by complex boolean expressions, which can include conditions based on non-primary fields. Users can define complex boolean expressions to filter entities based on specific conditions and then delete those entities using the expression.\n"
+     ]
+    }
+   ],
   "source": [
-    "response = query_engine_with_filters.query(question)\n",
+    "response = query_engine_milvus23.query(question)\n",
    "print(response.response)"
   ]
  }

 %% Cell type:markdown id:adf7d63d tags:
 <a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/managed/vectaraDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 %% Cell type:markdown id:db0855d0 tags:
 # Managed Index with Zilliz Cloud Pipeline
 [Zilliz Cloud Pipelines](https://docs.zilliz.com/docs/pipelines) is a robust solution that efficiently transforms unstructured data into a vector database for effective semantic search.
 ## Setup
 1. Install llama-index
 %% Cell type:code id:6019e01a tags:
 ``` python
 # ! pip install llama-index
 ```
 %% Cell type:markdown id:3fc94b2f tags:
 2. Set your [OpenAI](https://platform.openai.com) and [Zilliz Cloud](https://cloud.zilliz.com/) credentials
 %% Cell type:code id:c2d1c538 tags:
 ``` python
 from getpass import getpass
 import os
 os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key:")
 ZILLIZ_CLUSTER_ID = getpass("Enter your Zilliz Cluster ID:")
 ZILLIZ_TOKEN = getpass("Enter your Zilliz Token:")
 ```
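The Markdown guide reads `ZILLIZ_CLUSTER_ID` and `ZILLIZ_TOKEN` from the environment, while this notebook prompts for them. A minimal sketch that combines the two is shown below; `read_secret` is a hypothetical helper, not part of LlamaIndex or the Zilliz SDK.

```python
import os
from getpass import getpass


def read_secret(env_name, prompt):
    """Hypothetical helper: use the environment variable if set, else prompt."""
    value = os.getenv(env_name)
    # An unset or empty variable falls back to an interactive prompt.
    return value if value else getpass(prompt)


# Possible usage (commented out so the sketch never prompts on import):
# ZILLIZ_CLUSTER_ID = read_secret("ZILLIZ_CLUSTER_ID", "Enter your Zilliz Cluster ID:")
# ZILLIZ_TOKEN = read_secret("ZILLIZ_TOKEN", "Enter your Zilliz Token:")
```

This keeps secrets out of the notebook source in both cases and avoids re-typing them when the variables are already exported.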
 %% Cell type:markdown id:5d3c5b5f tags:
 ## Indexing documents
 ### From Signed URL
 Zilliz Cloud Pipeline can ingest and automatically index a document from a public or pre-signed URL.
 %% Cell type:code id:97d5c934 tags:
 ``` python
 from llama_index.indices import ZillizCloudPipelineIndex
 zcp_index = ZillizCloudPipelineIndex.from_document_url(
    url="https://publicdataset.zillizcloud.com/milvus_doc.md",  # a public or pre-signed url of a file stored on s3 or gcs
    cluster_id=ZILLIZ_CLUSTER_ID,
    token=ZILLIZ_TOKEN,
-    metadata={"version": "2.3"},  # optional
+    metadata={"version": "2.3"},
 )
-# Insert more docs, e.g. a Milvus v2.0 document
+# Insert more docs, e.g. a Milvus v2.2 document
 zcp_index.insert_doc_url(
-    url="https://milvus.io/docs/v2.0.x/delete_data.md",
+    url="https://publicdataset.zillizcloud.com/milvus_doc_22.md",
-    metadata={"version": "2.0"},
+    metadata={"version": "2.2"},
 )
 ```
 %% Cell type:markdown id:d16a498e tags:
- It is optional to add metadata for each document.
 ### From Local File
 Coming soon.
 ### From Raw Text
 Coming soon.
 %% Cell type:markdown id:c94365ab tags:
 ## Working as Query Engine
-A Zilliz Cloud Pipeline's Index can work as a Query Engine in Llama-Index.
+A Zilliz Cloud Pipeline's Index can work as a Query Engine in LlamaIndex.
 It allows users to customize some parameters:
 - search_top_k: How many text nodes/chunks to retrieve. Optional, defaults to `DEFAULT_SIMILARITY_TOP_K` (2).
 - filters: Metadata filters. Optional, defaults to None.
 - output_metadata: Which metadata fields to include in each retrieved text node. Optional, defaults to [].
+Filters are optional. For example, to ask only about Milvus 2.3, set `version` to 2.3 in the filters.
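To make the three options concrete, here is a toy, in-memory sketch of what they mean conceptually. It is not how Zilliz Cloud Pipelines or LlamaIndex implement retrieval; `ToyChunk` and `toy_retrieve` are hypothetical names, the scores are made up, and filters are simplified to an exact-match dict.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChunk:
    """Hypothetical stand-in for an indexed text chunk."""
    text: str
    score: float
    metadata: dict = field(default_factory=dict)


def toy_retrieve(chunks, search_top_k=2, filters=None, output_metadata=None):
    """Mimic search_top_k, filters and output_metadata on a plain list."""
    filters = filters or {}
    output_metadata = output_metadata or []
    # filters: keep only chunks whose metadata matches every key exactly.
    matching = [
        c for c in chunks
        if all(c.metadata.get(k) == v for k, v in filters.items())
    ]
    # search_top_k: keep the highest-scoring chunks.
    top = sorted(matching, key=lambda c: c.score, reverse=True)[:search_top_k]
    # output_metadata: expose only the requested metadata fields.
    return [
        ToyChunk(
            c.text,
            c.score,
            {k: c.metadata[k] for k in output_metadata if k in c.metadata},
        )
        for c in top
    ]


chunks = [
    ToyChunk("delete by expression", 0.87, {"version": "2.3", "lang": "en"}),
    ToyChunk("delete by primary key", 0.91, {"version": "2.2", "lang": "en"}),
    ToyChunk("upsert entities", 0.84, {"version": "2.3", "lang": "en"}),
]
results = toy_retrieve(
    chunks,
    search_top_k=3,
    filters={"version": "2.3"},
    output_metadata=["version"],
)
for r in results:
    print(r.score, r.metadata, r.text)
```

With the version filter set, the higher-scoring 2.2 chunk is excluded even though `search_top_k=3` would otherwise have room for it, and only the `version` field is returned with each result.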
 %% Cell type:code id:dafda7a1 tags:
 ``` python
 # # Get index without ingestion:
 # from llama_index.indices import ZillizCloudPipelineIndex
 # zcp_index = ZillizCloudPipelineIndex(
 #         cluster_id=ZILLIZ_CLUSTER_ID,
 #         token=ZILLIZ_TOKEN,
 #         # collection_name='zcp_llamalection'
 #     )
 from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters
-query_engine_with_filters = zcp_index.as_query_engine(
+query_engine_milvus23 = zcp_index.as_query_engine(
    search_top_k=3,
    filters=MetadataFilters(
-        filters=[ExactMatchFilter(key="version", value="2.3")]
+        filters=[
-    ),  # optional, here we will only retrieve info of Milvus 2.3
+            ExactMatchFilter(key="version", value="2.3")
-    output_metadata=["version"],  # optional
+        ]  # version == "2.3"
+    ),
+    output_metadata=["version"],
 )
 ```
 %% Cell type:markdown id:9803232e tags:
-Then the query engine is ready for Semantic Search or Retrieval Augmented Generation:
+Then the query engine is ready for Semantic Search or Retrieval Augmented Generation with Milvus 2.3 documents:
 - **Retrieve** (Semantic search powered by Zilliz Cloud Pipeline's Index):
 %% Cell type:code id:8ab92af7 tags:
 ``` python
-question = "Can users delete entities by complex boolean expressions?"
+question = "Can users delete entities by filtering non-primary fields?"
-query_engine_with_filters.retrieve(question)
+retrieved_nodes = query_engine_milvus23.retrieve(question)
+print(retrieved_nodes)
 ```
-%% Cell type:markdown id:a503d6e0 tags:
+%% Output
-> The query engine with filters retrieves only text nodes with version 2.3.
+    [NodeWithScore(node=TextNode(id_='446268394525283746', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='c3254bc65319b52914d6e68fbce69161fcf0e2998e4619287a8560258a2fe53d', text='Delete Entities\nThis topic describes how to delete entities in Milvus.\nMilvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.\nDeleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\nEntities deleted beyond the pre-specified span of time for Time Travel cannot be retrieved again.\nFrequent deletion operations will impact the system performance.\nBefore deleting entities by comlpex boolean expressions, make sure the collection has been loaded.\nDeleting entities by complex boolean expressions is not an atomic operation. Therefore, if it fails halfway through, some data may still be deleted.\nDeleting entities by complex boolean expressions is supported only when the consistency is set to Bounded. For details, see Consistency.\nPrepare boolean expression\nPrepare the boolean expression that filters the entities to delete.\nMilvus supports deleting entities by primary key or complex boolean expressions. 
For more information on expression rules and supported operators, see Boolean Expression Rules.\nSimple boolean expression\nUse a simple expression to filter data with primary key values of 0 and 1:\npython\nexpr = "book_id in [0,1]"\nComplex boolean expression', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.8668166995048523), NodeWithScore(node=TextNode(id_='446268394525283747', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='3ec8b3a992fb72d081145b7859c70453dc9d71be714c0f5f99ad2b2c2cb1f7ea', text='To filter entities that meet specific conditions, define complex boolean expressions.\nFilter entities whose word_count is greater than or equal to 11000:\npython\nexpr = "word_count >= 11000"\nFilter entities whose book_name is not Unknown:\npython\nexpr = "book_name != Unknown"\nFilter entities whose primary key values are greater than 5 and word_count is smaller than or equal to 9999:\npython\nexpr = "book_id > 5 && word_count <= 9999"\nDelete entities\nDelete the entities with the boolean expression you created. Milvus returns the ID list of the deleted entities.\npython\nfrom pymilvus import Collection\ncollection = Collection("book")      # Get an existing collection.\ncollection.delete(expr)\nParameter   Description\nexpr    Boolean expression that specifies the entities to delete.\npartition_name (optional)   Name of the partition to delete entities from.\nUpsert Entities\nThis topic describes how to upsert entities in Milvus.\nUpserting is a combination of insert and delete operations. 
In the context of a Milvus vector database, an upsert is a data-level operation that will overwrite an existing entity if a specified field already exists in a collection, and insert a new entity if the specified value doesn’t already exist.\nThe following example upserts 3,000 rows of randomly generated data as the example data. When performing upsert operations, it\'s important to note that the operation may compromise performance. This is because the operation involves deleting data during execution.\nPrepare data', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.841397762298584), NodeWithScore(node=TextNode(id_='446268394525283749', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='42656e32ce6baa2897419b8bae612412db94f1d570ab1702f2ae6c5557f248a9', text='When data is upserted into Milvus it is updated and inserted into segments. Segments have to reach a certain size to be sealed and indexed. Unsealed segments will be searched brute force. In order to avoid this with any remainder data, it is best to call flush(). The flush() call will seal any remaining segments and send them for indexing. It is important to only call this method at the end of an upsert session. Calling it too often will cause fragmented data that will need to be cleaned later on.\nLimits\nUpdating primary key fields is not supported by upsert().\nupsert() is not applicable and an error can occur if autoID is set to True for primary key fields.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.793336033821106)]
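The raw list printed above is hard to scan. A small sketch for condensing it to one line per node follows; it relies only on the `score`, `node.metadata` and `node.text` attributes visible in the printed output. `summarize_nodes` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without a Zilliz cluster.

```python
from types import SimpleNamespace


def summarize_nodes(nodes_with_scores, max_chars=60):
    """Return one short line per retrieved node: score, metadata, text preview."""
    lines = []
    for item in nodes_with_scores:
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        # Collapse whitespace so each preview stays on a single line.
        preview = " ".join(item.node.text.split())[:max_chars]
        lines.append(f"{score} {item.node.metadata} {preview}")
    return lines


# Stand-ins shaped like the retrieved nodes shown above (text shortened).
demo_nodes = [
    SimpleNamespace(
        score=0.8668166995048523,
        node=SimpleNamespace(
            metadata={"version": "2.3"},
            text="Delete Entities\nThis topic describes how to delete entities in Milvus.",
        ),
    ),
    SimpleNamespace(
        score=0.841397762298584,
        node=SimpleNamespace(
            metadata={"version": "2.3"},
            text="To filter entities that meet specific conditions, define complex boolean expressions.",
        ),
    ),
]
for line in summarize_nodes(demo_nodes):
    print(line)
```

In the notebook, the same helper could be called as `summarize_nodes(retrieved_nodes)`, which also makes it easy to confirm that every returned node carries `version` 2.3.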
 %% Cell type:markdown id:e91c5896 tags:
 - **Query** (RAG powered by Zilliz Cloud Pipeline's Index & OpenAI's LLM):
 %% Cell type:code id:fc7b01b7 tags:
 ``` python
-response = query_engine_with_filters.query(question)
+response = query_engine_milvus23.query(question)
 print(response.response)
 ```
+%% Output
+    Yes, users can delete entities by filtering non-primary fields. Milvus supports deleting entities by complex boolean expressions, which can include conditions based on non-primary fields. Users can define complex boolean expressions to filter entities based on specific conditions and then delete those entities using the expression.