Commit 9e73564c authored by Jael Gu, committed by GitHub

Update zcp docs (#9698)

parent 943c2ab5
@@ -122,24 +122,26 @@ zcp_index = ZillizCloudPipelineIndex.from_document_url(
     url="https://publicdataset.zillizcloud.com/milvus_doc.md",
     cluster_id=os.getenv("ZILLIZ_CLUSTER_ID"),
     token=os.getenv("ZILLIZ_TOKEN"),
-    metadata={"version": "2.3"}, # optional
+    metadata={"version": "2.3"},
 )
-# Insert more docs into index, eg. a Milvus v2.0 document
+# Insert more docs into index, eg. a Milvus v2.2 document
 zcp_index.insert_doc_url(
-    url="https://milvus.io/docs/v2.0.x/delete_data.md",
-    metadata={"version": "2.0"},
+    url="https://publicdataset.zillizcloud.com/milvus_doc_22.md",
+    metadata={"version": "2.2"},
 )
 # Query index
 from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters
-query_engine_with_filters = zcp_index.as_query_engine(
+query_engine_milvus23 = zcp_index.as_query_engine(
     search_top_k=3,
     filters=MetadataFilters(
-        filters=[ExactMatchFilter(key="version", value="2.3")]
-    ), # optional, here we will only retrieve info of Milvus 2.3
-    output_metadata=["version"], # optional
+        filters=[
+            ExactMatchFilter(key="version", value="2.3")
+        ] # version == "2.3"
+    ),
+    output_metadata=["version"],
 )
 ```
......
 %% Cell type:markdown id:adf7d63d tags:
 <a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/managed/vectaraDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
 %% Cell type:markdown id:db0855d0 tags:
 # Managed Index with Zilliz Cloud Pipeline
 [Zilliz Cloud Pipelines](https://docs.zilliz.com/docs/pipelines) is a robust solution that efficiently transforms unstructured data into a vector database for effective semantic search.
 ## Setup
 1. Install llama-index
 %% Cell type:code id:6019e01a tags:
 ``` python
 # ! pip install llama-index
 ```
 %% Cell type:markdown id:3fc94b2f tags:
 2. Set your [OpenAI](https://platform.openai.com) & [Zilliz Cloud](https://cloud.zilliz.com/) accounts
 %% Cell type:code id:c2d1c538 tags:
 ``` python
 from getpass import getpass
 import os
 os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key:")
 ZILLIZ_CLUSTER_ID = getpass("Enter your Zilliz Cluster ID:")
 ZILLIZ_TOKEN = getpass("Enter your Zilliz Token:")
 ```
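For non-interactive runs, the same credentials could be read from environment variables first, prompting only when a variable is unset (the README hunk above already reads `ZILLIZ_CLUSTER_ID` and `ZILLIZ_TOKEN` via `os.getenv`). A minimal sketch, assuming a hypothetical helper named `get_secret` that is not part of LlamaIndex or Zilliz:

```python
import os
from getpass import getpass


def get_secret(env_name: str, prompt: str) -> str:
    """Return the environment variable if it is set and non-empty, else prompt for it.

    `get_secret` is a hypothetical helper for illustration only.
    """
    value = os.environ.get(env_name)
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input.
    return getpass(prompt)


# Example usage (left commented out because it may prompt interactively):
# ZILLIZ_CLUSTER_ID = get_secret("ZILLIZ_CLUSTER_ID", "Enter your Zilliz Cluster ID:")
# ZILLIZ_TOKEN = get_secret("ZILLIZ_TOKEN", "Enter your Zilliz Token:")
```

With this approach the notebook cell and the README snippet can share the same variable names, and secrets stay out of the notebook itself.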
 %% Cell type:markdown id:5d3c5b5f tags:
 ## Indexing documents
 ### From Signed URL
 Zilliz Cloud Pipeline is able to ingest & automatically index a document given a presigned url.
 %% Cell type:code id:97d5c934 tags:
 ``` python
 from llama_index.indices import ZillizCloudPipelineIndex
 zcp_index = ZillizCloudPipelineIndex.from_document_url(
     url="https://publicdataset.zillizcloud.com/milvus_doc.md", # a public or pre-signed url of a file stored on s3 or gcs
     cluster_id=ZILLIZ_CLUSTER_ID,
     token=ZILLIZ_TOKEN,
-    metadata={"version": "2.3"}, # optional
+    metadata={"version": "2.3"},
 )
-# Insert more docs, eg. a Milvus v2.0 document
+# Insert more docs, eg. a Milvus v2.2 document
 zcp_index.insert_doc_url(
-    url="https://milvus.io/docs/v2.0.x/delete_data.md",
-    metadata={"version": "2.0"},
+    url="https://publicdataset.zillizcloud.com/milvus_doc_22.md",
+    metadata={"version": "2.2"},
 )
 ```
 %% Cell type:markdown id:d16a498e tags:
+- It is optional to add metadata for each document.
 ### From Local File
 Coming soon.
 ### From Raw Text
 Coming soon.
 %% Cell type:markdown id:c94365ab tags:
 ## Working as Query Engine
-A Zilliz Cloud Pipeline's Index can work as a Query Engine in Llama-Index.
+A Zilliz Cloud Pipeline's Index can work as a Query Engine in LlamaIndex.
 It allows users to customize some parameters:
 - search_top_k: How many text nodes/chunks retrieved. Optional, defaults to `DEFAULT_SIMILARITY_TOP_K` (2).
 - filters: Metadata filters. Optional, defaults to None.
 - output_metadata: What metadata fields included in each retrieved text node. Optional, defaults to [].
+It is optional to apply filters. For example, if we want to ask about Milvus 2.3, then we can set version as 2.3 in filters.
 %% Cell type:code id:dafda7a1 tags:
 ``` python
 # # Get index without ingestion:
 # from llama_index.indices import ZillizCloudPipelineIndex
 # zcp_index = ZillizCloudPipelineIndex(
 #     cluster_id=ZILLIZ_CLUSTER_ID,
 #     token=ZILLIZ_TOKEN,
 #     # collection_name='zcp_llamalection'
 # )
 from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters
-query_engine_with_filters = zcp_index.as_query_engine(
+query_engine_milvus23 = zcp_index.as_query_engine(
     search_top_k=3,
     filters=MetadataFilters(
-        filters=[ExactMatchFilter(key="version", value="2.3")]
-    ), # optional, here we will only retrieve info of Milvus 2.3
-    output_metadata=["version"], # optional
+        filters=[
+            ExactMatchFilter(key="version", value="2.3")
+        ] # version == "2.3"
+    ),
+    output_metadata=["version"],
 )
 ```
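Conceptually, an exact-match metadata filter keeps only the chunks whose metadata value for a given key equals the requested value. The plain-Python sketch below illustrates that semantics on in-memory dictionaries. It is an illustration only and does not show how Zilliz Cloud Pipelines or LlamaIndex implement filtering; the function `exact_match` and the sample `chunks` are hypothetical.

```python
from typing import Any, Dict, List


def exact_match(chunks: List[Dict[str, Any]], key: str, value: Any) -> List[Dict[str, Any]]:
    """Keep only chunks whose metadata[key] equals value (hypothetical helper)."""
    return [c for c in chunks if c.get("metadata", {}).get(key) == value]


# Hypothetical sample data mirroring the two documents ingested above.
chunks = [
    {"text": "Delete entities (Milvus 2.3 doc)", "metadata": {"version": "2.3"}},
    {"text": "Delete entities (Milvus 2.2 doc)", "metadata": {"version": "2.2"}},
    {"text": "Chunk without metadata", "metadata": {}},
]

only_23 = exact_match(chunks, key="version", value="2.3")
for chunk in only_23:
    print(chunk["metadata"]["version"], chunk["text"])
```

In this sketch the comparison is type-sensitive: the string `"2.3"` does not match the float `2.3`, so it seems safest to use the same type at query time as was used when the metadata was attached.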
 %% Cell type:markdown id:9803232e tags:
-Then the query engine is ready for Semantic Search or Retrieval Augmented Generation:
+Then the query engine is ready for Semantic Search or Retrieval Augmented Generation with Milvus 2.3 documents:
 - **Retrieve** (Semantic search powered by Zilliz Cloud Pipeline's Index):
 %% Cell type:code id:8ab92af7 tags:
 ``` python
-question = "Can users delete entities by complex boolean expressions?"
-query_engine_with_filters.retrieve(question)
+question = "Can users delete entities by filtering non-primary fields?"
+retrieved_nodes = query_engine_milvus23.retrieve(question)
+print(retrieved_nodes)
 ```
-%% Cell type:markdown id:a503d6e0 tags:
-> The query engine with filters retrieves only text nodes with version 2.3.
+%% Output
+[NodeWithScore(node=TextNode(id_='446268394525283746', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='c3254bc65319b52914d6e68fbce69161fcf0e2998e4619287a8560258a2fe53d', text='Delete Entities\nThis topic describes how to delete entities in Milvus.\nMilvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.\nDeleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\nEntities deleted beyond the pre-specified span of time for Time Travel cannot be retrieved again.\nFrequent deletion operations will impact the system performance.\nBefore deleting entities by comlpex boolean expressions, make sure the collection has been loaded.\nDeleting entities by complex boolean expressions is not an atomic operation. Therefore, if it fails halfway through, some data may still be deleted.\nDeleting entities by complex boolean expressions is supported only when the consistency is set to Bounded. For details, see Consistency.\nPrepare boolean expression\nPrepare the boolean expression that filters the entities to delete.\nMilvus supports deleting entities by primary key or complex boolean expressions. 
For more information on expression rules and supported operators, see Boolean Expression Rules.\nSimple boolean expression\nUse a simple expression to filter data with primary key values of 0 and 1:\npython\nexpr = "book_id in [0,1]"\nComplex boolean expression', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.8668166995048523), NodeWithScore(node=TextNode(id_='446268394525283747', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='3ec8b3a992fb72d081145b7859c70453dc9d71be714c0f5f99ad2b2c2cb1f7ea', text='To filter entities that meet specific conditions, define complex boolean expressions.\nFilter entities whose word_count is greater than or equal to 11000:\npython\nexpr = "word_count >= 11000"\nFilter entities whose book_name is not Unknown:\npython\nexpr = "book_name != Unknown"\nFilter entities whose primary key values are greater than 5 and word_count is smaller than or equal to 9999:\npython\nexpr = "book_id > 5 && word_count <= 9999"\nDelete entities\nDelete the entities with the boolean expression you created. Milvus returns the ID list of the deleted entities.\npython\nfrom pymilvus import Collection\ncollection = Collection("book") # Get an existing collection.\ncollection.delete(expr)\nParameter Description\nexpr Boolean expression that specifies the entities to delete.\npartition_name (optional) Name of the partition to delete entities from.\nUpsert Entities\nThis topic describes how to upsert entities in Milvus.\nUpserting is a combination of insert and delete operations. 
In the context of a Milvus vector database, an upsert is a data-level operation that will overwrite an existing entity if a specified field already exists in a collection, and insert a new entity if the specified value doesn’t already exist.\nThe following example upserts 3,000 rows of randomly generated data as the example data. When performing upsert operations, it\'s important to note that the operation may compromise performance. This is because the operation involves deleting data during execution.\nPrepare data', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.841397762298584), NodeWithScore(node=TextNode(id_='446268394525283749', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='42656e32ce6baa2897419b8bae612412db94f1d570ab1702f2ae6c5557f248a9', text='When data is upserted into Milvus it is updated and inserted into segments. Segments have to reach a certain size to be sealed and indexed. Unsealed segments will be searched brute force. In order to avoid this with any remainder data, it is best to call flush(). The flush() call will seal any remaining segments and send them for indexing. It is important to only call this method at the end of an upsert session. Calling it too often will cause fragmented data that will need to be cleaned later on.\nLimits\nUpdating primary key fields is not supported by upsert().\nupsert() is not applicable and an error can occur if autoID is set to True for primary key fields.', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.793336033821106)]
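The raw `print(retrieved_nodes)` output above is hard to scan. Judging from that repr, each result exposes a `score` and a `node` carrying `text` and `metadata`, so a compact per-node summary could be built as sketched below. The helper `summarize_nodes` is hypothetical, and the stand-in objects are built with `SimpleNamespace` so the sketch runs without a Zilliz cluster; with a real index, `retrieved_nodes` would presumably be passed in instead.

```python
from types import SimpleNamespace


def summarize_nodes(nodes, max_chars=80):
    """Return (score, version, first line of text) for each retrieved node.

    Hypothetical helper; relies only on the `.score`, `.node.text` and
    `.node.metadata` attributes visible in the printed output above.
    """
    rows = []
    for item in nodes:
        first_line = item.node.text.split("\n", 1)[0][:max_chars]
        rows.append((item.score, item.node.metadata.get("version"), first_line))
    return rows


# Stand-in for one retrieved result, copied in shortened form from the output above.
fake_nodes = [
    SimpleNamespace(
        node=SimpleNamespace(
            text="Delete Entities\nThis topic describes how to delete entities in Milvus.",
            metadata={"version": "2.3"},
        ),
        score=0.8668166995048523,
    )
]

for score, version, title in summarize_nodes(fake_nodes):
    print(f"{score:.4f}  version={version}  {title}")
```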
 %% Cell type:markdown id:e91c5896 tags:
 - **Query** (RAG powered by Zilliz Cloud Pipeline's Index & OpenAI's LLM):
 %% Cell type:code id:fc7b01b7 tags:
 ``` python
-response = query_engine_with_filters.query(question)
+response = query_engine_milvus23.query(question)
 print(response.response)
 ```
+%% Output
+Yes, users can delete entities by filtering non-primary fields. Milvus supports deleting entities by complex boolean expressions, which can include conditions based on non-primary fields. Users can define complex boolean expressions to filter entities based on specific conditions and then delete those entities using the expression.
......