Commit 5178106c, authored by Ian, committed by GitHub

Minor improvement for TiDB Vector (#11766)

parent 74df7c01
%% Cell type:markdown id: tags:
# TiDB Vector Store
> [TiDB Cloud](https://tidbcloud.com/) is a comprehensive Database-as-a-Service (DBaaS) solution that provides dedicated and serverless options. TiDB Serverless is now integrating built-in vector search into the MySQL landscape. With this enhancement, you can develop AI applications on TiDB Serverless without adding a new database or additional technical stacks. Be among the first to experience it by joining the waitlist for the private beta at https://tidb.cloud/ai.
This notebook provides a detailed guide to using TiDB vector search in LlamaIndex.
%% Cell type:markdown id: tags:
## Setting up environments
%% Cell type:code id: tags:
``` python
%pip install llama-index-vector-stores-tidbvector
%pip install llama-index
```
%% Cell type:code id: tags:
``` python
import textwrap
import openai
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.tidbvector import TiDBVectorStore
```
%% Cell type:markdown id: tags:
Configure the OpenAI API key and the TiDB connection URL that the rest of the notebook needs.
%% Cell type:code id: tags:
``` python
# Here we use getpass so that secrets are not echoed or saved in the notebook
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
tidb_connection_url = getpass.getpass(
    "TiDB connection URL (format - mysql+pymysql://root@127.0.0.1:4000/test): "
)
```
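%% Cell type:markdown id: tags:
The prompt above expects a SQLAlchemy-style URL. As a minimal sketch, the components of such a URL can be inspected with the standard library before it is handed to the vector store. The helper name `describe_tidb_url` is hypothetical and not part of LlamaIndex or TiDB; it only illustrates the format shown in the prompt.
%% Cell type:code id: tags:
```python
from urllib.parse import urlsplit


def describe_tidb_url(url: str) -> dict:
    """Split a SQLAlchemy-style URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    # TiDB speaks the MySQL protocol, so a mysql-compatible scheme is expected.
    if not parts.scheme.startswith("mysql"):
        raise ValueError(f"expected a mysql-compatible scheme, got {parts.scheme!r}")
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# The example URL from the prompt text; the password is deliberately not reported.
info = describe_tidb_url("mysql+pymysql://root@127.0.0.1:4000/test")
print(info)
```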
%% Cell type:markdown id: tags:
Prepare the data used in this showcase.
%% Cell type:code id: tags:
``` python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
```
%% Cell type:code id: tags:
``` python
documents = SimpleDirectoryReader("./data/paul_graham").load_data()
print("Document ID:", documents[0].doc_id)
for document in documents:
    document.metadata = {"book": "paul_graham"}
```
%% Output
Document ID: d970e919-4469-414b-967e-24dd9b2eb014
%% Cell type:markdown id: tags:
## Create TiDB Vector Store
The code snippet below creates a table in TiDB that is optimized for vector search. The table takes its name from the value of `VECTOR_TABLE_NAME` (here `paul_graham_test`). After this code runs successfully, you can view and access that table directly in your TiDB database.
%% Cell type:code id: tags:
``` python
VECTOR_TABLE_NAME = "paul_graham_test"
tidbvec = TiDBVectorStore(
    connection_string=tidb_connection_url,
    table_name=VECTOR_TABLE_NAME,
    distance_strategy="cosine",
    vector_dimension=1536,
    drop_existing_table=False,
)
```
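%% Cell type:markdown id: tags:
The store above is configured with `distance_strategy="cosine"` and a fixed `vector_dimension`. As a rough illustration of what that strategy measures, the sketch below computes cosine distance (one minus cosine similarity) in plain Python. It is not how TiDB implements the search internally; the function `cosine_distance` and the tiny vectors are assumptions made for the example.
%% Cell type:code id: tags:
```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors (illustrative only)."""
    if len(a) != len(b):
        # Mirrors the idea behind vector_dimension: all vectors must share one size.
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


same = cosine_distance([1.0, 0.0], [2.0, 0.0])       # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # unrelated direction
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite direction
print(same, orthogonal, opposite)
```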
%% Cell type:markdown id: tags:
Build a vector index backed by the TiDB vector store. The query engine is created from this index in the next section.
%% Cell type:code id: tags:
``` python
storage_context = StorageContext.from_defaults(vector_store=tidbvec)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True
)
```
%% Output
/Users/ianz/Work/miniconda3/envs/llama_index/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 8.76it/s]
Generating embeddings: 100%|██████████| 21/21 [00:02<00:00, 8.22it/s]
%% Cell type:markdown id: tags:
## Semantic similarity search
This section focuses on vector search basics and on refining results with metadata filters. Note that the TiDB vector store supports only the default `VectorStoreQueryMode`.
%% Cell type:code id: tags:
``` python
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do?")
print(textwrap.fill(str(response), 100))
```
%% Output
The author worked on writing, programming, building microcomputers, giving talks at conferences,
publishing essays online, developing spam filters, painting, hosting dinner parties, and purchasing
a building for office use.
%% Cell type:markdown id: tags:
### Filter with metadata
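Perform searches with metadata filters to retrieve a specific number of nearest-neighbor results that match the applied filters. Every document here was tagged `book = "paul_graham"`, so the `!=` filter below matches nothing and the response is empty.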
%% Cell type:code id: tags:
``` python
from llama_index.core.vector_stores.types import MetadataFilter, MetadataFilters
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="book", value="paul_graham", operator="!="),
        ]
    ),
    similarity_top_k=2,
)
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
```
%% Output
Empty Response
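%% Cell type:markdown id: tags:
The empty response follows from the order of operations: candidates are restricted by the metadata filter, and only then are the nearest neighbors picked. The sketch below mimics that with in-memory rows. The row layout, the distances, and the helpers `apply_filter` and `top_k` are all hypothetical; the real filtering happens inside TiDB, and its handling of rows that lack the key may differ from this simplification.
%% Cell type:code id: tags:
```python
def apply_filter(rows, key, value, operator):
    """Keep rows whose metadata satisfies a single == or != condition (illustrative)."""
    if operator == "==":
        return [r for r in rows if r["metadata"].get(key) == value]
    if operator == "!=":
        return [r for r in rows if r["metadata"].get(key) != value]
    raise ValueError(f"unsupported operator in this sketch: {operator!r}")


def top_k(rows, k):
    """Return the k rows with the smallest distance to the query."""
    return sorted(rows, key=lambda r: r["distance"])[:k]


# Made-up rows: every chunk carries the same tag, as in the notebook.
rows = [
    {"id": "chunk-1", "metadata": {"book": "paul_graham"}, "distance": 0.31},
    {"id": "chunk-2", "metadata": {"book": "paul_graham"}, "distance": 0.12},
    {"id": "chunk-3", "metadata": {"book": "paul_graham"}, "distance": 0.27},
]

not_equal = top_k(apply_filter(rows, "book", "paul_graham", "!="), k=2)
equal = top_k(apply_filter(rows, "book", "paul_graham", "=="), k=2)
print(not_equal)                  # nothing survives the != filter
print([r["id"] for r in equal])   # the two closest tagged chunks
```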
%% Cell type:markdown id: tags:
Query again, this time with the `==` operator so that the tagged documents match.
%% Cell type:code id: tags:
``` python
from llama_index.core.vector_stores.types import MetadataFilter, MetadataFilters
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="book", value="paul_graham", operator="=="),
        ]
    ),
    similarity_top_k=2,
)
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
```
%% Output
The author learned programming on an IBM 1401 using an early version of Fortran in 9th grade, then
later transitioned to working with microcomputers like the TRS-80 and Apple II. Additionally, the
author studied philosophy in college but found it unfulfilling, leading to a switch to studying AI.
Later on, the author attended art school in both the US and Italy, where they observed a lack of
substantial teaching in the painting department.
%% Cell type:markdown id: tags:
## Delete documents
%% Cell type:code id: tags:
``` python
tidbvec.delete(documents[0].doc_id)
```
%% Cell type:markdown id: tags:
Check whether the document has been deleted.
%% Cell type:code id: tags:
``` python
query_engine = index.as_query_engine()
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))
```
%% Output
Empty Response
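%% Cell type:markdown id: tags:
The query comes back empty because the index was built from a single source document (the progress output shows one parsed document split into 21 embedded chunks), and deleting by document ID removes every chunk derived from it. The sketch below imitates that behavior with a plain list. The field names and the helper `delete_by_ref_doc_id` are assumptions for illustration, not the store's actual schema or code.
%% Cell type:code id: tags:
```python
def delete_by_ref_doc_id(nodes, ref_doc_id):
    """Return the nodes that remain after removing all chunks of one source document."""
    return [n for n in nodes if n["ref_doc_id"] != ref_doc_id]


# Hypothetical stand-in for the table: 21 chunks that all point at one document.
nodes = [{"node_id": f"node-{i}", "ref_doc_id": "doc-a"} for i in range(21)]

remaining = delete_by_ref_doc_id(nodes, "doc-a")
print(len(nodes), "chunks before,", len(remaining), "after")
```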
......@@ -31,13 +31,14 @@ license = "MIT"
name = "llama-index-vector-stores-tidbvector"
packages = [{include = "llama_index/"}]
readme = "README.md"
-version = "0.1.0"
+version = "0.1.1"
[tool.poetry.dependencies]
python = ">=3.8.1,<4.0"
llama-index-core = ">=0.10.1"
sqlalchemy = ">=1.4,<3"
tidb-vector = ">=0.0.3,<1.0.0"
pymysql = "^1.1.0"
[tool.poetry.group.dev.dependencies]
black = {extras = ["jupyter"], version = "<=23.9.1,>=23.7.0"}