From 83b05bb5e0f72bfa6c8241dbd651e1164a773c1d Mon Sep 17 00:00:00 2001 From: Stephen Witkowski <stephen.witkowski@66degrees.com> Date: Mon, 1 Apr 2024 09:26:44 -0400 Subject: [PATCH] Add GoogleEncoder example notebook --- docs/encoders/google.ipynb | 294 +++++++++++++++++++++++++++++++++++++ 1 file changed, 294 insertions(+) create mode 100644 docs/encoders/google.ipynb diff --git a/docs/encoders/google.ipynb b/docs/encoders/google.ipynb new file mode 100644 index 00000000..0dad529c --- /dev/null +++ b/docs/encoders/google.ipynb @@ -0,0 +1,294 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/encoders/huggingface.ipynb) [](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/encoders/huggingface.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Using GoogleEncoder" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Google's [Pathways Language Model](https://blog.research.google/2022/04/pathways-language-model-palm-scaling-to.html) (PaLM) is a dense decoder-only model that is trained on a large corpus of text data. The hidden states of the model can be used as embeddings for text data, and Google has released versions of those layers for public use. This notebook demonstrates how to use the GoogleEncoder with the Semantic Router." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Getting Started" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We start by installing semantic-router. Support for the new `GoogleEncoder` class was added in `semantic-router==0.0.X`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -qU semantic-router==0.0.X" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We start by defining a dictionary mapping routes to example phrases that should trigger those routes." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/Stephen.Witkowski/miniforge3/envs/semantic-router/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "from semantic_router import Route\n", + "\n", + "politics = Route(\n", + " name=\"politics\",\n", + " utterances=[\n", + " \"isn't politics the best thing ever\",\n", + " \"why don't you tell me about your political opinions\",\n", + " \"don't you just love the president\",\n", + " \"don't you just hate the president\",\n", + " \"they're going to destroy this country!\",\n", + " \"they will save the country!\",\n", + " ],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's define another for good measure:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "chitchat = Route(\n", + " name=\"chitchat\",\n", + " utterances=[\n", + " \"how's the weather today?\",\n", + " \"how are things going?\",\n", + " \"lovely weather today\",\n", + " \"the weather is horrendous\",\n", + " \"let's go to the chippy\",\n", + " ],\n", + ")\n", + "\n", + "routes = [politics, chitchat]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we initialize our embedding model. To do this with GoogleEncoder, you'll need to have an active Google Cloud Platform account and a project with the Embeddings API enabled. You can find more information on how to set up a development project in the [Google Cloud documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-text-embeddings)." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "from semantic_router.encoders import GoogleEncoder\n", + "\n", + "PROJECT_ID = \"your-project-id\"\n", + "LOCATION = \"us-central1\"\n", + "\n", + "encoder = GoogleEncoder(project_id=PROJECT_ID, location=LOCATION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we define the `RouteLayer`. When called, the route layer will consume text (a query) and output the category (`Route`) it belongs to — to initialize a `RouteLayer` we need our `encoder` model and a list of `routes`." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\u001b[32m2024-04-01 09:25:34 INFO semantic_router.utils.logger local\u001b[0m\n" + ] + } + ], + "source": [ + "from semantic_router.layer import RouteLayer\n", + "\n", + "rl = RouteLayer(encoder=encoder, routes=routes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can test it:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "chitchat 0.6414884458605304\n", + "politics 0.7184715572295559\n", + "politics 0.7439275034824486\n", + "politics 0.7782497192765216\n", + "politics 0.896539779971592\n" + ] + }, + { + "data": { + "text/plain": [ + "RouteChoice(name='politics', function_call=None, similarity_score=None)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rl(\"don't you love politics?\")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "politics 0.6707593528326423\n", + "chitchat 0.7714977687548606\n", + "chitchat 0.8433716231360381\n", + "chitchat 0.8563028552685018\n", + "chitchat 1.0000000000000004\n" + ] + }, + { + "data": { + "text/plain": [ + "RouteChoice(name='chitchat', function_call=None, similarity_score=None)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rl(\"how's the weather today?\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both are classified accurately, what if we send a query that is unrelated to our existing `Route` objects?" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "politics 0.5136569490418237\n", + "politics 0.5147603969017434\n", + "politics 0.5481785565189041\n", + "chitchat 0.5495495505326361\n", + "chitchat 0.5587930037610617\n" + ] + }, + { + "data": { + "text/plain": [ + "RouteChoice(name=None, function_call=None, similarity_score=None)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "rl(\"I'm interested in learning about llama 2\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this case, we return `None` because no matches were identified." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "decision-layer", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} -- GitLab