">[DuckDB](https://duckdb.org/docs/api/python/overview) is a fast in-process analytical database. DuckDB is under an MIT license.\n",
"\n",
"In this notebook we are going to show how to use DuckDB as a Vector store to be used in LlamaIndex.\n",
"\n",
"Install DuckDB with:\n",
"\n",
"```sh\n",
"pip install duckdb\n",
"```\n",
"\n",
"Make sure to use the latest DuckDB version (>= 0.10.0).\n",
"\n",
"You can run DuckDB in different modes depending on persistence:\n",
"- `in-memory` is the default mode, where the database is created in memory, you can force this to be use by setting `database_name = \":memory:\"` when initializing the vector store.\n",
"- `persistence` is set by using a name for a database and setting a persistence directory `database_name = \"my_vector_store.duckdb\"` where the database is persisted in the default `persist_dir` or to the one you set it to.\n",
"\n",
"With the vector store created, you can:\n",
"- `.add` \n",
"- `.get` \n",
"- `.update`\n",
"- `.upsert`\n",
"- `.delete`\n",
"- `.peek`\n",
"- `.query` to run a search. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic example\n",
"\n",
"In this basic example, we take the Paul Graham essay, split it into chunks, embed it using an open-source embedding model, load it into `DuckDBVectorStore`, and then query it.\n",
"\n",
"For the embedding model we will use OpenAI. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
"<b>The author mentions that before college, they worked on two main things outside of school: writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer. They later got a microcomputer and started programming more extensively.</b>"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"query_engine = index.as_query_engine()\n",
"response = query_engine.query(\"What did the author do growing up?\")\n",
"display(Markdown(f\"<b>{response}</b>\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Persisting to disk example\n",
"\n",
"Extending the previous example, if you want to save to disk, simply initialize the DuckDBVectorStore by specifying a database name and persist directory."
"<b>The author mentions that before college, they worked on two main things outside of school: writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer. They later got a microcomputer and started programming more extensively.</b>"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Query Data\n",
"query_engine = index.as_query_engine()\n",
"response = query_engine.query(\"What did the author do growing up?\")\n",
"display(Markdown(f\"<b>{response}</b>\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Metadata filter example"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is possible to narrow down the search space by filter with metadata. Below is an example to show that in practice. "
"retriever.retrieve(\"What is inception about?\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "llama-index-dev",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
%% Cell type:markdown id: tags:
# DuckDB
>[DuckDB](https://duckdb.org/docs/api/python/overview) is a fast in-process analytical database. DuckDB is under an MIT license.
In this notebook we show how to use DuckDB as a vector store with LlamaIndex.
Install DuckDB with:
```sh
pip install duckdb
```
Make sure you are using DuckDB version 0.10.0 or later.
You can run DuckDB in different modes depending on persistence:
- `in-memory` is the default mode: the database is created in memory. You can force this mode by setting `database_name = ":memory:"` when initializing the vector store.
- `persistent` is enabled by giving the database a file name, e.g. `database_name = "my_vector_store.duckdb"`; the database is then persisted in the default `persist_dir` or in the one you set.
With the vector store created, you can:
- `.add`
- `.get`
- `.update`
- `.upsert`
- `.delete`
- `.peek`
- `.query` to run a search.
%% Cell type:markdown id: tags:
## Basic example
In this basic example, we take the Paul Graham essay, split it into chunks, embed it, load it into `DuckDBVectorStore`, and then query it.
For the embedding model we will use OpenAI.
%% Cell type:markdown id: tags:
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
%% Cell type:code id: tags:

```python
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
```
%% Output
<b>The author mentions that before college, they worked on two main things outside of school: writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer. They later got a microcomputer and started programming more extensively.</b>
%% Cell type:markdown id: tags:
## Persisting to disk example
Extending the previous example, if you want to save to disk, simply initialize the DuckDBVectorStore by specifying a database name and persist directory.
%% Cell type:code id: tags:

```python
# Query Data
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
display(Markdown(f"<b>{response}</b>"))
```
%% Output
<b>The author mentions that before college, they worked on two main things outside of school: writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer. They later got a microcomputer and started programming more extensively.</b>
%% Cell type:markdown id: tags:
## Metadata filter example
%% Cell type:markdown id: tags:
It is possible to narrow down the search space by filtering on metadata. Below is an example showing that in practice.