Unverified commit 446949e3 authored by Leonie, committed by GitHub

Add Auto-retriever tutorial for Weaviate (#11885)


* Add Auto-retriever tutorial for Weaviate

* Include review comments

* cr

---------

Co-authored-by: Leonie <leonie@Leonies-MBP-2.fritz.box>
Co-authored-by: Haotian Zhang <socool.king@gmail.com>
parent ddafdc18
%% Cell type:markdown id:0e81b124 tags:
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/WeaviateIndex_auto_retriever.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
%% Cell type:markdown id:307804a3-c02b-4a57-ac0d-172c30ddc851 tags:
# Auto-Retrieval from a Weaviate Vector Database
This guide shows how to perform **auto-retrieval** in LlamaIndex with [Weaviate](https://weaviate.io/).
The Weaviate vector database supports a set of [metadata filters](https://weaviate.io/developers/weaviate/search/filters) in addition to a query string for semantic search. Given a natural language query, we first use a Large Language Model (LLM) to infer a set of metadata filters as well as the right query string to pass to the vector database (either can also be blank). This overall query bundle is then executed against the vector database.
This allows for more dynamic, expressive forms of retrieval beyond top-k semantic search. The relevant context for a given query may only require filtering on a metadata tag, or require a joint combination of filtering + semantic search within the filtered set, or just raw semantic search.
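As a rough illustration, the sketch below shows the kind of structured query bundle the LLM might infer from "Tell me about Sports celebrities from the United States". It uses the `MetadataFilters` and `ExactMatchFilter` classes from `llama_index.core.vector_stores`; the variable names are purely illustrative.
%% Cell type:code tags:
``` python
# Illustrative sketch only: the kind of structured query bundle an LLM
# might infer from a natural language query.
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Semantic portion of the query, passed to vector search.
inferred_query_str = "Sports celebrities"

# Structured portion of the query, applied as metadata filters.
inferred_filters = MetadataFilters(
    filters=[
        ExactMatchFilter(key="category", value="Sports"),
        ExactMatchFilter(key="country", value="United States"),
    ]
)
```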
%% Cell type:markdown id:f7010b1d-d1bb-4f08-9309-a328bb4ea396 tags:
## Setup
We first set up our imports and create an empty Weaviate collection.
%% Cell type:markdown id:31faecfb tags:
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
%% Cell type:code id:13223201 tags:
``` python
%pip install llama-index-vector-stores-weaviate
```
%% Cell type:code id:53d3d6e7 tags:
``` python
!pip install llama-index weaviate-client
```
%% Cell type:code id:d48af8e1 tags:
``` python
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
```
%% Cell type:markdown id:198ea1cc tags:
We will be using GPT-4 for its reasoning capabilities to infer the metadata filters. Depending on your use case, `"gpt-3.5-turbo"` can work as well.
%% Cell type:code id:bf49ac18 tags:
``` python
# set up OpenAI
import os
import getpass
import openai
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
openai.api_key = os.environ["OPENAI_API_KEY"]
```
%% Cell type:code id:f2819b6c tags:
``` python
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings
Settings.llm = OpenAI(model="gpt-4")
Settings.embed_model = OpenAIEmbedding()
```
%% Output
INFO:numexpr.utils:Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.
%% Cell type:markdown id:a9d3d43c tags:
This Notebook uses Weaviate in [Embedded mode](https://weaviate.io/developers/weaviate/installation/embedded), which is supported on Linux and macOS.
If you prefer to try out Weaviate's fully managed service, [Weaviate Cloud Services (WCS)](https://weaviate.io/developers/weaviate/installation/weaviate-cloud-services), you can use the commented-out code in the cell below instead.
%% Cell type:code id:0ce3143d-198c-4dd2-8e5a-c5cdf94f017a tags:
``` python
import weaviate
from weaviate.embedded import EmbeddedOptions
# Connect to Weaviate client in embedded mode
client = weaviate.Client(embedded_options=EmbeddedOptions())
# Enable this code if you want to use Weaviate Cloud Services instead of Embedded mode.
"""
import weaviate
# cloud
resource_owner_config = weaviate.AuthClientPassword(
    username="",
    password="",
)
client = weaviate.Client(
    "https://test.weaviate.network",
    auth_client_secret=resource_owner_config,
)

# local
# client = weaviate.Client("http://localhost:8081")
"""
```
%% Output
embedded weaviate is already listening on port 8079
/opt/homebrew/lib/python3.11/site-packages/weaviate/warnings.py:121: DeprecationWarning: Dep005: You are using weaviate-client version 3.26.2. The latest version is 4.5.2.
Please consider upgrading to the latest version. See https://weaviate.io/developers/weaviate/client-libraries/python for details.
warnings.warn(
'\nimport weaviate\n\n# cloud\nresource_owner_config = weaviate.AuthClientPassword(\n username="",\n password="",\n)\nclient = weaviate.Client(\n "https://test.weaviate.network",\n auth_client_secret=resource_owner_config,\n)\n\n# local\n# client = weaviate.Client("http://localhost:8081")\n'
%% Cell type:markdown id:41aa106b-8261-4a01-97c6-1b037dffa1b4 tags:
## Defining Some Sample Data
We insert some sample nodes containing text chunks into the vector database. Note that each `TextNode` contains not only the text but also metadata fields, e.g. `category` and `country`. These metadata fields are converted/stored as such in the underlying vector database.
%% Cell type:code id:68cbd239-880e-41a3-98d8-dbb3fab55431 tags:
``` python
from llama_index.core.schema import TextNode
nodes = [
    TextNode(
        text=(
            "Michael Jordan is a retired professional basketball player,"
            " widely regarded as one of the greatest basketball players of all"
            " time."
        ),
        metadata={
            "category": "Sports",
            "country": "United States",
        },
    ),
    TextNode(
        text=(
            "Angelina Jolie is an American actress, filmmaker, and"
            " humanitarian. She has received numerous awards for her acting"
            " and is known for her philanthropic work."
        ),
        metadata={
            "category": "Entertainment",
            "country": "United States",
        },
    ),
    TextNode(
        text=(
            "Elon Musk is a business magnate, industrial designer, and"
            " engineer. He is the founder, CEO, and lead designer of SpaceX,"
            " Tesla, Inc., Neuralink, and The Boring Company."
        ),
        metadata={
            "category": "Business",
            "country": "United States",
        },
    ),
    TextNode(
        text=(
            "Rihanna is a Barbadian singer, actress, and businesswoman. She"
            " has achieved significant success in the music industry and is"
            " known for her versatile musical style."
        ),
        metadata={
            "category": "Music",
            "country": "Barbados",
        },
    ),
    TextNode(
        text=(
            "Cristiano Ronaldo is a Portuguese professional footballer who is"
            " considered one of the greatest football players of all time. He"
            " has won numerous awards and set multiple records during his"
            " career."
        ),
        metadata={
            "category": "Sports",
            "country": "Portugal",
        },
    ),
]
```
%% Cell type:markdown id:e8bd70be-57c7-49e2-990b-ad9a876710fb tags:
## Build Vector Index with Weaviate Vector Store
Here we load the data into the vector store. As mentioned above, both the text and metadata for each node are converted into corresponding representations in Weaviate. We can now run semantic queries as well as metadata filtering on this data in Weaviate.
%% Cell type:code id:ba1558b3 tags:
``` python
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.weaviate import WeaviateVectorStore
vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="LlamaIndex_filter"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```
%% Cell type:code id:35369eda tags:
``` python
index = VectorStoreIndex(nodes, storage_context=storage_context)
```
%% Output
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
%% Cell type:markdown id:c793dc45-5087-4dcb-b0d3-85b8e718539f tags:
## Define `VectorIndexAutoRetriever`
We define our core `VectorIndexAutoRetriever` module. The module takes in a `VectorStoreInfo`, which contains a structured description of the vector store collection and the metadata filters it supports. This information is then used in the auto-retrieval prompt, where the LLM infers the metadata filters.
%% Cell type:code id:bedbb693-725f-478f-be26-fa7180ea38b2 tags:
``` python
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo
vector_store_info = VectorStoreInfo(
    content_info="brief biography of celebrities",
    metadata_info=[
        MetadataInfo(
            name="category",
            type="str",
            description=(
                "Category of the celebrity, one of [Sports, Entertainment,"
                " Business, Music]"
            ),
        ),
        MetadataInfo(
            name="country",
            type="str",
            description=(
                "Country of the celebrity, one of [United States, Barbados,"
                " Portugal]"
            ),
        ),
    ],
)
retriever = VectorIndexAutoRetriever(
    index, vector_store_info=vector_store_info
)
```
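%% Cell type:markdown tags:
For comparison, the same kind of filtered retrieval can also be written by hand, skipping the LLM inference step. The sketch below is an illustrative alternative (not part of the original tutorial flow) using the standard `index.as_retriever` interface with explicit `MetadataFilters`.
%% Cell type:code tags:
``` python
# Hand-written equivalent of one inferred filter set, for comparison
# with what the auto-retriever infers automatically.
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

manual_retriever = index.as_retriever(
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="country", value="United States")]
    )
)
# manual_results = manual_retriever.retrieve("Tell me about celebrities")
```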
%% Cell type:markdown id:32808a60-7bab-4e9e-944c-cfe2ed0b0e2e tags:
## Running over some sample data
Let's try running retrieval over some sample queries. Note how the metadata filters are inferred; this enables more precise retrieval!
%% Cell type:code id:eeb18e9c tags:
``` python
response = retriever.retrieve("Tell me about celebrities from United States")
```
%% Output
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using query str: Tell me about celebrities
Using query str: Tell me about celebrities
INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using filters: [('country', '==', 'United States')]
Using filters: [('country', '==', 'United States')]
INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using top_k: 2
Using top_k: 2
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
%% Cell type:code id:ee543cf6 tags:
``` python
print(response[0])
```
%% Output
Node ID: a6ba3fa4-3ced-4281-9cf3-df9e73bb92d2
Text: Angelina Jolie is an American actress, filmmaker, and
humanitarian. She has received numerous awards for her acting and is
known for her philanthropic work.
Score: 0.790
%% Cell type:code id:51f00cde tags:
``` python
response = retriever.retrieve(
    "Tell me about Sports celebrities from United States"
)
```
%% Output
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using query str: Sports celebrities
Using query str: Sports celebrities
INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using filters: [('category', '==', 'Sports'), ('country', '==', 'United States')]
Using filters: [('category', '==', 'Sports'), ('country', '==', 'United States')]
INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using top_k: 2
Using top_k: 2
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
%% Cell type:code id:345d3ca3 tags:
``` python
print(response[0])
```
%% Output
Node ID: 5e689253-2e2c-440d-9f72-6f9513fd2c3a
Text: Michael Jordan is a retired professional basketball player,
widely regarded as one of the greatest basketball players of all time.
Score: 0.797
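%% Cell type:markdown tags:
If you want synthesized answers rather than raw retrieved nodes, the auto-retriever can be plugged into a query engine. A minimal sketch, assuming `RetrieverQueryEngine` from `llama_index.core.query_engine`:
%% Cell type:code tags:
``` python
# Wrap the auto-retriever in a query engine so the LLM synthesizes an
# answer from the retrieved nodes.
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query(
    "Tell me about Sports celebrities from United States"
)
print(response)
```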