Commit b4ad4d79 authored by James Briggs, committed by GitHub

Merge pull request #106 from aurelio-labs/james/16-release

chore: 16 release
parents 311e9095 47790308
@@ -15,6 +15,8 @@
Semantic Router is a superfast decision-making layer for your LLMs and agents. Rather than waiting for slow LLM generations to make tool-use decisions, we use the magic of semantic vector space to make those decisions — _routing_ our requests using _semantic_ meaning.
---
## Quickstart
To get started with _semantic-router_ we install it like so:
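The install command itself falls outside this diff hunk; a minimal sketch, assuming the standard PyPI package pinned to the version released here (as in the notebooks below):

```
pip install -qU semantic-router==0.0.16
```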
@@ -114,4 +116,28 @@ rl("I'm interested in learning about llama 2").name
In this case, no decision could be made as we had no matches — so our route layer returned `None`!
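If you need a fallback when no route matches, you can branch on the returned name — a minimal sketch (the fallback behaviour here is illustrative, not part of the library):

```python
choice = rl("I'm interested in learning about llama 2")
if choice.name is None:
    # no utterance was similar enough — fall back to default handling
    print("no route matched")
else:
    print(f"matched route: {choice.name}")
```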
---
## 📚 Resources
### Docs
| Notebook | Description |
| -------- | ----------- |
| [Introduction](https://github.com/aurelio-labs/semantic-router/blob/main/docs/00-introduction.ipynb) | Introduction to Semantic Router and static routes |
| [Dynamic Routes](https://github.com/aurelio-labs/semantic-router/blob/main/docs/02-dynamic-routes.ipynb) | Dynamic routes for parameter generation and function calls |
| [Save/Load Layers](https://github.com/aurelio-labs/semantic-router/blob/main/docs/01-save-load-from-file.ipynb) | How to save and load `RouteLayer` from file |
| [Local Execution](https://github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb) | Fully local Semantic Router with dynamic routes — *local models such as Mistral 7B outperform GPT-3.5 in most tests* |
| [LangChain Integration](https://github.com/aurelio-labs/semantic-router/blob/main/docs/03-basic-langchain-agent.ipynb) | How to integrate Semantic Router with LangChain Agents |
### Online Course
**COMING SOON**
### Community
Julian Horsey, [Semantic Router superfast decision layer for LLMs and AI agents](https://www.geeky-gadgets.com/semantic-router-superfast-decision-layer-for-llms-and-ai-agents/), Geeky Gadgets
azhar, [Beyond Basic Chatbots: How Semantic Router is Changing the Game](https://medium.com/ai-insights-cobet/beyond-basic-chatbots-how-semantic-router-is-changing-the-game-783dd959a32d), AI Insights @ Medium
Daniel Avila, [Semantic Router: Enhancing Control in LLM Conversations](https://blog.codegpt.co/semantic-router-enhancing-control-in-llm-conversations-68ce905c8d33), CodeGPT @ Medium
%% Cell type:markdown id: tags:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/00-introduction.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/00-introduction.ipynb)
%% Cell type:markdown id: tags:
# Semantic Router Intro
%% Cell type:markdown id: tags:
The Semantic Router library can be used as a superfast route-making layer on top of LLMs. Rather than waiting on a slow agent to decide what to do, we use the magic of semantic vector space to make routes, cutting route-making time down from seconds to milliseconds.
%% Cell type:markdown id: tags:
## Getting Started
%% Cell type:markdown id: tags:
We start by installing the library:
%% Cell type:code id: tags:
``` python
!pip install -qU semantic-router==0.0.16
```
%% Cell type:markdown id: tags:
We start by defining a `Route` — a named category together with example phrases (utterances) that should trigger it.
%% Cell type:code id: tags:
``` python
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
)
```
%% Cell type:markdown id: tags:
Let's define another for good measure:
%% Cell type:code id: tags:
``` python
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
)

routes = [politics, chitchat]
```
%% Cell type:markdown id: tags:
Now we initialize our embedding model:
%% Cell type:code id: tags:
``` python
import os
from getpass import getpass

from semantic_router.encoders import CohereEncoder, OpenAIEncoder

# os.environ["COHERE_API_KEY"] = os.getenv("COHERE_API_KEY") or getpass(
#     "Enter Cohere API Key: "
# )
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass(
    "Enter OpenAI API Key: "
)

# encoder = CohereEncoder()
encoder = OpenAIEncoder()
```
%% Cell type:markdown id: tags:
Now we define the `RouteLayer`. When called, the route layer will consume text (a query) and output the category (`Route`) it belongs to — to initialize a `RouteLayer` we need our `encoder` model and a list of `routes`.
%% Cell type:code id: tags:
``` python
from semantic_router.layer import RouteLayer

rl = RouteLayer(encoder=encoder, routes=routes)
```
%% Output
2024-01-07 18:08:29 INFO semantic_router.utils.logger Initializing RouteLayer
%% Cell type:markdown id: tags:
Now we can test it:
%% Cell type:code id: tags:
``` python
rl("don't you love politics?")
```
%% Output
RouteChoice(name='politics', function_call=None)
%% Cell type:code id: tags:
``` python
rl("how's the weather today?")
```
%% Output
RouteChoice(name='chitchat', function_call=None)
%% Cell type:markdown id: tags:
Both are classified accurately. What if we send a query that is unrelated to our existing `Route` objects?
%% Cell type:code id: tags:
``` python
rl("I'm interested in learning about llama 2")
```
%% Output
RouteChoice(name=None, function_call=None)
%% Cell type:markdown id: tags:
In this case, we return `None` because no matches were identified.
...
%% Cell type:markdown id: tags:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/01-save-load-from-file.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/01-save-load-from-file.ipynb)
%% Cell type:markdown id: tags:
# Route Layers from File
Here we will show how to save routers to YAML or JSON files, and how to load a route layer from file.
%% Cell type:markdown id: tags:
## Getting Started
%% Cell type:markdown id: tags:
We start by installing the library:
%% Cell type:code id: tags:
``` python
!pip install -qU semantic-router==0.0.16
```
%% Cell type:markdown id: tags:
## Saving to JSON
%% Cell type:markdown id: tags:
First let's create a list of routes:
%% Cell type:code id: tags:
``` python
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
"don't you just love the president" "don't you just hate the president", "don't you just love the president" "don't you just hate the president",
"they're going to destroy this country!", "they're going to destroy this country!",
"they will save the country!", "they will save the country!",
], ],
) )
chitchat = Route( chitchat = Route(
name="chitchat", name="chitchat",
utterances=[ utterances=[
"how's the weather today?", "how's the weather today?",
"how are things going?", "how are things going?",
"lovely weather today", "lovely weather today",
"the weather is horrendous", "the weather is horrendous",
"let's go to the chippy", "let's go to the chippy",
], ],
) )
routes = [politics, chitchat] routes = [politics, chitchat]
``` ```
%% Cell type:markdown id: tags:
We define a route layer using these routes and the Cohere encoder.
%% Cell type:code id: tags:
``` python
import os
from getpass import getpass

from semantic_router import RouteLayer
from semantic_router.encoders import CohereEncoder

# dashboard.cohere.ai
os.environ["COHERE_API_KEY"] = os.getenv("COHERE_API_KEY") or getpass(
    "Enter Cohere API Key: "
)

encoder = CohereEncoder()

rl = RouteLayer(encoder=encoder, routes=routes)
```
%% Output
2024-01-07 18:10:03 INFO semantic_router.utils.logger Initializing RouteLayer
%% Cell type:markdown id: tags:
To save our route layer we call the `to_json` method:
%% Cell type:code id: tags:
``` python
rl.to_json("layer.json")
```
%% Output
2024-01-07 18:10:05 INFO semantic_router.utils.logger Saving route config to layer.json
%% Cell type:markdown id: tags:
## Loading from JSON
%% Cell type:markdown id: tags:
We can view the router file we just saved to see what information is stored.
%% Cell type:code id: tags:
``` python
import json

with open("layer.json", "r") as f:
    layer_json = json.load(f)

print(layer_json)
```
%% Output
{'encoder_type': 'cohere', 'encoder_name': 'embed-english-v3.0', 'routes': [{'name': 'politics', 'utterances': ["isn't politics the best thing ever", "why don't you tell me about your political opinions", "don't you just love the president", "don't you just hate the president", "they're going to destroy this country!", 'they will save the country!'], 'description': None, 'function_schema': None, 'llm': None}, {'name': 'chitchat', 'utterances': ["how's the weather today?", 'how are things going?', 'lovely weather today', 'the weather is horrendous', "let's go to the chippy"], 'description': None, 'function_schema': None, 'llm': None}]}
%% Cell type:markdown id: tags:
It tells us our encoder type, encoder name, and routes. This is everything we need to initialize a new router. To do so, we use the `from_json` method.
%% Cell type:code id: tags:
``` python
rl = RouteLayer.from_json("layer.json")
```
%% Output
2024-01-07 18:10:14 INFO semantic_router.utils.logger Loading route config from layer.json
2024-01-07 18:10:14 INFO semantic_router.utils.logger Initializing RouteLayer
%% Cell type:markdown id: tags:
We can confirm that our layer has been initialized with the expected attributes by viewing the `RouteLayer` object:
%% Cell type:code id: tags:
``` python
print(
    f"""{rl.encoder.type=}
{rl.encoder.name=}
{rl.routes=}"""
)
```
%% Output
rl.encoder.type='cohere'
rl.encoder.name='embed-english-v3.0'
rl.routes=[Route(name='politics', utterances=["isn't politics the best thing ever", "why don't you tell me about your political opinions", "don't you just love the president", "don't you just hate the president", "they're going to destroy this country!", 'they will save the country!'], description=None, function_schema=None, llm=None), Route(name='chitchat', utterances=["how's the weather today?", 'how are things going?', 'lovely weather today', 'the weather is horrendous', "let's go to the chippy"], description=None, function_schema=None, llm=None)]
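%% Cell type:markdown id: tags:
The intro above also mentioned YAML. A minimal sketch, assuming the layer mirrors its JSON methods with `to_yaml`/`from_yaml` counterparts — treat these method names as an assumption, since they are not demonstrated in this notebook:
%% Cell type:code id: tags:
``` python
# assumed YAML counterparts of to_json/from_json
rl.to_yaml("layer.yaml")
rl = RouteLayer.from_yaml("layer.yaml")
```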
%% Cell type:markdown id: tags:
---
...
%% Cell type:markdown id: tags:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/02-dynamic-routes.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/02-dynamic-routes.ipynb)
%% Cell type:markdown id: tags:
# Dynamic Routes
%% Cell type:markdown id: tags:
In semantic-router there are two types of routes that can be chosen. Both are `Route` objects; the only difference between them is that _static_ routes return a `Route.name` when chosen, whereas _dynamic_ routes use an LLM call to produce parameter input values.

For example, a _static_ route will tell us if a query is talking about mathematics by returning the route name (which could be `"math"`, for example). A _dynamic_ route can generate additional values: it may decide a query is talking about maths, but it can also generate Python code that we can later execute to answer the user's query. This output may look like `"math", "import math; output = math.sqrt(64)"`.
***⚠️ Note: We have a fully local version of dynamic routes available at [docs/05-local-execution.ipynb](https://github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb). The local 05 version tends to outperform the OpenAI version we demo in this notebook, so we'd recommend trying [05](https://github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb)!***
%% Cell type:markdown id: tags:
## Installing the Library
%% Cell type:code id: tags:
``` python
!pip install -qU semantic-router==0.0.16
```
%% Cell type:markdown id: tags:
## Initializing Routes and RouteLayer
%% Cell type:markdown id: tags:
Dynamic routes are treated in the same way as static routes. Let's begin by initializing a `RouteLayer` consisting of static routes.
%% Cell type:code id: tags:
``` python
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
"don't you just love the president" "don't you just hate the president", "don't you just love the president" "don't you just hate the president",
"they're going to destroy this country!", "they're going to destroy this country!",
"they will save the country!", "they will save the country!",
], ],
) )
chitchat = Route( chitchat = Route(
name="chitchat", name="chitchat",
utterances=[ utterances=[
"how's the weather today?", "how's the weather today?",
"how are things going?", "how are things going?",
"lovely weather today", "lovely weather today",
"the weather is horrendous", "the weather is horrendous",
"let's go to the chippy", "let's go to the chippy",
], ],
) )
routes = [politics, chitchat] routes = [politics, chitchat]
``` ```
%% Cell type:markdown id: tags:
We initialize our `RouteLayer` with our `encoder` and `routes`. We can use popular encoder APIs like `CohereEncoder` and `OpenAIEncoder`, or local alternatives like `FastEmbedEncoder`.
%% Cell type:code id: tags:
``` python
import os
from getpass import getpass

from semantic_router import RouteLayer
from semantic_router.encoders import CohereEncoder, OpenAIEncoder

# dashboard.cohere.ai
# os.environ["COHERE_API_KEY"] = os.getenv("COHERE_API_KEY") or getpass(
#     "Enter Cohere API Key: "
# )
# platform.openai.com
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass(
    "Enter OpenAI API Key: "
)

# encoder = CohereEncoder()
encoder = OpenAIEncoder()

rl = RouteLayer(encoder=encoder, routes=routes)
```
%% Output
2024-01-08 11:12:24 INFO semantic_router.utils.logger Initializing RouteLayer
%% Cell type:markdown id: tags:
We run our route layer, which so far contains only static routes:
%% Cell type:code id: tags:
``` python
rl("how's the weather today?")
```
%% Output
RouteChoice(name='chitchat', function_call=None)
%% Cell type:markdown id: tags:
## Creating a Dynamic Route
%% Cell type:markdown id: tags:
As with static routes, we must create a dynamic route before adding it to our route layer. To make a route dynamic, we need to provide a `function_schema`. The function schema provides instructions on what a function is, so that an LLM can decide how to use it correctly.
%% Cell type:code id: tags:
``` python
from datetime import datetime
from zoneinfo import ZoneInfo


def get_time(timezone: str) -> str:
    """Finds the current time in a specific timezone.

    :param timezone: The timezone to find the current time in, should
        be a valid timezone from the IANA Time Zone Database like
        "America/New_York" or "Europe/London". Do NOT put the place
        name itself like "rome", or "new york", you must provide
        the IANA format.
    :type timezone: str
    :return: The current time in the specified timezone."""
    now = datetime.now(ZoneInfo(timezone))
    return now.strftime("%H:%M")
```
%% Cell type:code id: tags:
``` python
get_time("America/New_York")
```
%% Output
'06:13'
%% Cell type:markdown id: tags:
To get the function schema we can use the `get_schema` function from the `function_call` module.
%% Cell type:code id: tags:
``` python
from semantic_router.utils.function_call import get_schema

schema = get_schema(get_time)
schema
```
%% Output
{'name': 'get_time',
 'description': 'Finds the current time in a specific timezone.\n\n:param timezone: The timezone to find the current time in, should\n    be a valid timezone from the IANA Time Zone Database like\n    "America/New_York" or "Europe/London". Do NOT put the place\n    name itself like "rome", or "new york", you must provide\n    the IANA format.\n:type timezone: str\n:return: The current time in the specified timezone.',
 'signature': '(timezone: str) -> str',
 'output': "<class 'str'>"}
%% Cell type:markdown id: tags:
We use this to define our dynamic route:
%% Cell type:code id: tags:
``` python
time_route = Route(
    name="get_time",
    utterances=[
        "what is the time in new york city?",
        "what is the time in london?",
        "I live in Rome, what time is it?",
    ],
    function_schema=schema,
)
```
%% Cell type:markdown id: tags:
Add the new route to our `layer`:
%% Cell type:code id: tags:
``` python
rl.add(time_route)
```
%% Output
2024-01-08 11:15:26 INFO semantic_router.utils.logger Adding `get_time` route
%% Cell type:markdown id: tags:
Now we can ask our layer a time-related question to trigger our new dynamic route.
%% Cell type:code id: tags:
``` python
out = rl("what is the time in new york city?")
get_time(**out.function_call)
```
%% Output
2024-01-08 11:16:24 INFO semantic_router.utils.logger Extracting function input...
'06:16'
%% Cell type:markdown id: tags:
Our dynamic route provides both the route itself _and_ the input parameters required to use the route.
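%% Cell type:markdown id: tags:
Putting this together, a small dispatch helper can route a query and execute the matched function whenever parameters are returned — a sketch of our own, not part of the library:
%% Cell type:code id: tags:
``` python
# hypothetical helper: route the query, then call the mapped function
# if the dynamic route produced a function_call payload
def route_and_run(query: str):
    choice = rl(query)
    if choice.name == "get_time" and choice.function_call:
        return get_time(**choice.function_call)
    # static route (or no match): return the RouteChoice as-is
    return choice

route_and_run("what is the time in london?")
```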
%% Cell type:markdown id: tags:
---
...
%% Cell type:markdown id: tags:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/03-basic-langchain-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/03-basic-langchain-agent.ipynb)
%% Cell type:markdown id: tags:
# Intro to LangChain Agents with Semantic Router
%% Cell type:markdown id: tags:
We can use semantic router with AI agents in many ways. For example, we can:

* **Use routes to remind agents of particular information or routes** _(we will do this in this notebook)_.
* Use routes to act as protective guardrails against specific types of queries.
* Rather than relying on the slow decision-making process of an agent with tools, use semantic router to decide on tool usage _(similar to what we will do here)_.
* For tools that require generated inputs, use semantic router's dynamic routes to generate tool input parameters.
* Use routes to decide when to search for additional information, helping us do RAG only when needed — as an alternative to naive RAG (searching with every query) or lengthy agent-based RAG decisions.
%% Cell type:markdown id: tags:
## Install Prerequisites
%% Cell type:code id: tags:
```
!pip install -qU \
    semantic-router==0.0.16 \
    langchain==0.0.352 \
    openai==1.6.1
```
%% Cell type:markdown id: tags:
## Setting up our Routes
%% Cell type:markdown id: tags:
Let's create some routes that we can use to help our agent.
%% Cell type:code id: tags:
```
from semantic_router import Route

time_route = Route(
    name="get_time",
    utterances=[
        "what time is it?",
        "when should I eat my next meal?",
        "how long should I rest until training again?",
        "when should I go to the gym?",
    ],
)

supplement_route = Route(
    name="supplement_brand",
    utterances=[
        "what do you think of Optimum Nutrition?",
        "what should I buy from MyProtein?",
        "what brand for supplements would you recommend?",
        "where should I get my whey protein?",
    ],
)

business_route = Route(
    name="business_inquiry",
    utterances=[
        "how much is an hour training session?",
        "do you do package discounts?",
    ],
)

product_route = Route(
    name="product",
    utterances=[
        "do you have a website?",
        "how can I find more info about your services?",
        "where do I sign up?",
        "how do I get hench?",
        "do you have recommended training programmes?",
    ],
)

routes = [time_route, supplement_route, business_route, product_route]
```
%% Cell type:markdown id: tags:
We will be using the `OpenAIEncoder`:
%% Cell type:code id: tags:
```
import os
from getpass import getpass

# platform.openai.com
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass(
    "Enter OpenAI API Key: "
)
```
%% Output
Enter OpenAI API Key: ··········
%% Cell type:code id: tags:
```
from semantic_router import RouteLayer
from semantic_router.encoders import OpenAIEncoder

rl = RouteLayer(encoder=OpenAIEncoder(), routes=routes)
```
%% Output
2023-12-28 20:01:47 INFO semantic_router.utils.logger Initializing RouteLayer
%% Cell type:markdown id: tags:
Let's test these routes to see if they get activated when we would expect.
%% Cell type:code id: tags:
```
rl("should I buy ON whey or MP?")
```
%% Output
RouteChoice(name='supplement_brand', function_call=None)
%% Cell type:code id: tags:
```
rl("how's the weather today?")
```
%% Output
RouteChoice(name=None, function_call=None)
%% Cell type:code id: tags:
```
rl("how do I get big arms?")
```
%% Output
RouteChoice(name='product', function_call=None)
%% Cell type:markdown id: tags:
Now we need to link these routes to particular actions or information that we pass to our agent.
%% Cell type:code id: tags:
```
from datetime import datetime


def get_time():
    now = datetime.now()
    return (
        f"The current time is {now.strftime('%H:%M')}, use "
        "this information in your response"
    )


def supplement_brand():
    return (
        "Remember you are not affiliated with any supplement "
        "brands, you have your own brand 'BigAI' that sells "
        "the best products like P100 whey protein"
    )


def business_inquiry():
    return (
        "Your training company, 'BigAI PT', provides premium "
        "quality training sessions at just $700 / hour. "
        "Users can find out more at www.aurelio.ai/train"
    )


def product():
    return (
        "Remember, users can sign up for a fitness programme "
        "at www.aurelio.ai/sign-up"
    )
```
%% Cell type:markdown id: tags:
Now we just add some logic to call these functions when we see a particular route being chosen.
%% Cell type:code id: tags:
```
def semantic_layer(query: str):
    route = rl(query)
    if route.name == "get_time":
        query += f" (SYSTEM NOTE: {get_time()})"
    elif route.name == "supplement_brand":
        query += f" (SYSTEM NOTE: {supplement_brand()})"
    elif route.name == "business_inquiry":
        query += f" (SYSTEM NOTE: {business_inquiry()})"
    elif route.name == "product":
        query += f" (SYSTEM NOTE: {product()})"
    else:
        pass
    return query
```
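%% Cell type:markdown id: tags:
As a design note, the `if`/`elif` chain above can be collapsed into a dictionary lookup — an equivalent sketch of our own, not from the original notebook:
%% Cell type:code id: tags:
```
# map route names to the note-generating functions defined earlier
note_fns = {
    "get_time": get_time,
    "supplement_brand": supplement_brand,
    "business_inquiry": business_inquiry,
    "product": product,
}


def semantic_layer_v2(query: str) -> str:
    # rl(query).name is None when no route matches, so .get returns None
    fn = note_fns.get(rl(query).name)
    return f"{query} (SYSTEM NOTE: {fn()})" if fn else query
```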
%% Cell type:code id: tags:
```
query = "should I buy ON whey or MP?"
sr_query = semantic_layer(query)
sr_query
```
%% Output
"should I buy ON whey or MP? (SYSTEM NOTE: Remember you are not affiliated with any supplement brands, you have your own brand 'BigAI' that sells the best products like P100 whey protein)"
%% Cell type:markdown id: tags:
## Using an Agent with a Router Layer
%% Cell type:markdown id: tags:
Initialize a conversational LangChain agent.
%% Cell type:code id: tags:
```
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory

llm = ChatOpenAI(model="gpt-3.5-turbo-1106")

memory1 = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True, output_key="output"
)
memory2 = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True, output_key="output"
)

agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools=[],
    llm=llm,
    max_iterations=3,
    early_stopping_method="generate",
    memory=memory1,
)

# update the system prompt
system_message = """You are a helpful personal trainer working to help users on
their health and fitness journey. Although you are lovely and helpful, you are
rather sarcastic and witty. So you must always remember to joke with the user.
Alongside this, you are a noble British gentleman, so you must always act with the
utmost candor and speak in a way worthy of your status.
Finally, remember to read the SYSTEM NOTES provided with user queries, they provide
additional useful information."""

new_prompt = agent.agent.create_prompt(system_message=system_message, tools=[])
agent.agent.llm_chain.prompt = new_prompt
```
%% Cell type:markdown id: tags:
Now we try calling our agent using the default `query` and compare the result to calling it with our router-augmented `sr_query`.
%% Cell type:code id: tags:
```
agent(query)
```
%% Output
{'input': 'should I buy ON whey or MP?',
 'chat_history': [],
 'output': "Well, it depends. Do you prefer your whey with a side of 'ON' or 'MP'? Just kidding! It really depends on your personal taste and nutritional needs. Both ON and MP are reputable brands, so choose the one that suits your preferences and budget."}
%% Cell type:code id: tags:
```
# swap agent memory first
agent.memory = memory2
agent(sr_query)
```
%% Output
{'input': "should I buy ON whey or MP? (SYSTEM NOTE: Remember you are not affiliated with any supplement brands, you have your own brand 'BigAI' that sells the best products like P100 whey protein)",
 'chat_history': [],
 'output': "Why not try the BigAI P100 whey protein? It's the best, just like me."}
%% Cell type:markdown id: tags:
Adding this reminder allows us to get much more intentional responses — while also, unintentionally, improving how closely the LLM follows our original instruction to act as a British gentleman.

Let's try some more!
%% Cell type:code id: tags:
```
query = "okay, I just finished training, what time should I train again?"
sr_query = semantic_layer(query)
sr_query
```
%% Output
'okay, I just finished training, what time should I train again? (SYSTEM NOTE: The current time is 20:02, use this information in your response)'
%% Cell type:code id: tags:
```
agent.memory = memory1
agent(query)
```
%% Output
{'input': 'okay, I just finished training, what time should I train again?',
 'chat_history': [HumanMessage(content='should I buy ON whey or MP?'),
  AIMessage(content="Well, it depends. Do you prefer your whey with a side of 'ON' or 'MP'? Just kidding! It really depends on your personal taste and nutritional needs. Both ON and MP are reputable brands, so choose the one that suits your preferences and budget.")],
 'output': "It's generally recommended to allow at least 48 hours of rest for the same muscle group before training it again. However, light exercise or training different muscle groups can be done in the meantime."}
%% Cell type:code id: tags:
```
agent.memory = memory2
agent(sr_query)
```
%% Output
{'input': 'okay, I just finished training, what time should I train again? (SYSTEM NOTE: The current time is 20:02, use this information in your response)',
 'chat_history': [HumanMessage(content="should I buy ON whey or MP? (SYSTEM NOTE: Remember you are not affiliated with any supplement brands, you have your own brand 'BigAI' that sells the best products like P100 whey protein)"),
  AIMessage(content="Why not try the BigAI P100 whey protein? It's the best, just like me.")],
 'output': "Why not train again at 20:02 tomorrow? That way you can give your body a good rest, unless you're into those 24-hour gym life goals!"}
%% Cell type:markdown id: tags:
Let's try another...
%% Cell type:code id: tags:
```
query = "okay fine, do you do training sessions, how much are they?"
sr_query = semantic_layer(query)
sr_query
```
%% Output
"okay fine, do you do training sessions, how much are they? (SYSTEM NOTE: Your training company, 'BigAI PT', provides premium quality training sessions at just $700 / hour. Users can find out more at www.aurelio.ai/train)"
%% Cell type:code id: tags:
```
agent.memory = memory1
agent(query)
```
%% Output
{'input': 'okay fine, do you do training sessions, how much are they?',
 'chat_history': [HumanMessage(content='should I buy ON whey or MP?'),
  AIMessage(content="Well, it depends. Do you prefer your whey with a side of 'ON' or 'MP'? Just kidding! It really depends on your personal taste and nutritional needs. Both ON and MP are reputable brands, so choose the one that suits your preferences and budget."),
  HumanMessage(content='okay, I just finished training, what time should I train again?'),
  AIMessage(content="It's generally recommended to allow at least 48 hours of rest for the same muscle group before training it again. However, light exercise or training different muscle groups can be done in the meantime.")],
 'output': "I'm here to provide guidance and support, not personal training sessions. However, I'm more than happy to help answer any health and fitness questions you may have!"}
%% Cell type:code id: tags:
```
agent.memory = memory2
agent(sr_query)
```
%% Output
{'input': "okay fine, do you do training sessions, how much are they? (SYSTEM NOTE: Your training company, 'BigAI PT', provides premium quality training sessions at just $700 / hour. Users can find out more at www.aurelio.ai/train)",
 'chat_history': [HumanMessage(content="should I buy ON whey or MP? (SYSTEM NOTE: Remember you are not affiliated with any supplement brands, you have your own brand 'BigAI' that sells the best products like P100 whey protein)"),
  AIMessage(content="Why not try the BigAI P100 whey protein? It's the best, just like me."),
  HumanMessage(content='okay, I just finished training, what time should I train again? (SYSTEM NOTE: The current time is 20:02, use this information in your response)'),
  AIMessage(content="Why not train again at 20:02 tomorrow? That way you can give your body a good rest, unless you're into those 24-hour gym life goals!")],
 'output': "Why, of course! BigAI PT offers premium training sessions at just $700 per hour. For more information, visit www.aurelio.ai/train. Now, let's get that workout plan sorted, shall we?"}
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
What we see here is a small demo of how we might use semantic router with a language agent; the two can be combined in far more sophisticated ways, as the sketch below suggests.
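%% Cell type:markdown id: tags:
For reference, the core pattern behind a `semantic_layer`-style helper is simple: route the incoming query, and when a route matches, append route-specific context before the query reaches the agent. A minimal sketch, assuming a `RouteLayer` named `rl` as built earlier in this notebook (the route name and injected note below are illustrative, not the exact ones used above):
%% Cell type:code id: tags:
``` python
from datetime import datetime

def inject_context(query: str) -> str:
    """Append a system note to the query when a route matches."""
    route = rl(query)
    if route.name == "get_time":  # hypothetical route name
        # inject live context the agent would otherwise not have
        now = datetime.now().strftime("%H:%M")
        query += f" (SYSTEM NOTE: The current time is {now}, use this information in your response)"
    return query
```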
---
%% Cell type:markdown id: tags:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/00-introduction.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/00-introduction.ipynb)
%% Cell type:markdown id: tags:
# Considering Chat History
%% Cell type:markdown id: tags:
Applying semantic-router to only the most recent interaction in a conversation works for many cases, but it misses scenarios where the information needed to route correctly was provided earlier in the conversation rather than in the latest message.
%% Cell type:code id: tags:
``` python
from semantic_router.schema import Conversation, Message
from semantic_router.encoders import FastEmbedEncoder

messages = [
    "User: Hello! Can you tell me the latest news headlines?",
    "Bot: Hi! Sure, here are the top news headlines for today...",
    "User: That's quite interesting. I'm also looking for some new music to listen to.",
    "Bot: What genre do you prefer?",
    "User: I like pop music.",
    "Bot: You might enjoy the latest album by Dua Lipa.",
    "User: I'll give it a listen. Also, I'm planning a trip and need some travel tips.",
    "Bot: Sure, where are you planning to go?",
    "User: I'm thinking of visiting Italy.",
    "Bot: Italy is a beautiful country. Make sure to visit the Colosseum in Rome and the canals in Venice.",
    "User: Those sound like great suggestions. I also need some help with my diet.",
    "Bot: What kind of diet are you following?",
    "User: I'm trying to eat more protein.",
    "Bot: Include lean meats, eggs, and legumes in your diet for a protein boost.",
    "User: Thanks for the tips! I'll talk to you later.",
    "Bot: You're welcome! Don't hesitate to reach out if you need more help.",
    "User: I appreciate it. Goodbye!",
    "Bot: Goodbye! Take care!",
]

encoder = FastEmbedEncoder(model_name="sentence-transformers/all-MiniLM-L6-v2")
convo = Conversation(
    messages=[
        # split on the first ": " only, in case message content itself contains colons
        Message(role=m.split(": ", 1)[0], content=m.split(": ", 1)[1])
        for m in messages
    ]
)
convo.split_by_topic(
    encoder=encoder, threshold=0.72, split_method="cumulative_similarity_drop"
)
```
%% Output
/Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
100%|██████████| 83.2M/83.2M [00:09<00:00, 8.45MiB/s]
{'split 1': ['User: Hello! Can you tell me the latest news headlines?',
  'Bot: Hi! Sure, here are the top news headlines for today...'],
 'split 2': ["User: That's quite interesting. I'm also looking for some new music to listen to.",
  'Bot: What genre do you prefer?',
  'User: I like pop music.',
  'Bot: You might enjoy the latest album by Dua Lipa.',
  "User: I'll give it a listen. Also, I'm planning a trip and need some travel tips.",
  'Bot: Sure, where are you planning to go?'],
 'split 3': ["User: I'm thinking of visiting Italy.",
  'Bot: Italy is a beautiful country. Make sure to visit the Colosseum in Rome and the canals in Venice.'],
 'split 4': ['User: Those sound like great suggestions. I also need some help with my diet.',
  'Bot: What kind of diet are you following?',
  "User: I'm trying to eat more protein.",
  'Bot: Include lean meats, eggs, and legumes in your diet for a protein boost.',
  "User: Thanks for the tips! I'll talk to you later.",
  "Bot: You're welcome! Don't hesitate to reach out if you need more help.",
  'User: I appreciate it. Goodbye!',
  'Bot: Goodbye! Take care!']}
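%% Cell type:markdown id: tags:
One way to put these splits to work is to route on the most recent topic rather than only the last message, so earlier turns about the same topic still inform the decision. A minimal sketch, assuming a `RouteLayer` named `rl` has been initialized and given the dict-of-splits output shown above:
%% Cell type:code id: tags:
``` python
splits = convo.split_by_topic(
    encoder=encoder, threshold=0.72, split_method="cumulative_similarity_drop"
)
# route on the whole of the latest topical chunk, not just the final message
latest_topic_messages = list(splits.values())[-1]
route = rl(" ".join(latest_topic_messages))
```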
---
%% Cell type:markdown id:e92c26d9 tags:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb)
%% Cell type:markdown id:ee50410e-3f98-4d9c-8838-b38aebd6ce77 tags:
# Local Dynamic Routes
## Fully local Semantic Router with `llama.cpp` and HuggingFace Encoder
There are many reasons users might choose to roll their own LLMs rather than use a third-party service. Whether it's due to cost, privacy, or compliance, Semantic Router supports the use of "local" LLMs through `llama.cpp`.
Using `llama.cpp` also enables the use of quantized GGUF models, reducing the memory footprint of deployed models and allowing even 13-billion-parameter models to run with hardware acceleration on an Apple M1 Pro chip.
Below is an example of using semantic router with **Mistral-7B-Instruct**, quantized as a 4-bit GGUF file.
%% Cell type:markdown id:baa8d577-9f23-4dec-b167-fdecfb313c52 tags:
## Installing the library
> Note: if you require hardware acceleration via BLAS, CUDA, Metal, etc. please refer to the [abetlen/llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-with-specific-hardware-acceleration-blas-cuda-metal-etc) repository README.md
%% Cell type:code id:f95e4906-c3e6-4905-8f13-5e67d67069d5 tags:
``` python
!pip install -qU "semantic-router[local]==0.0.16"
```
%% Cell type:markdown id:0029cc6d tags:
If you're running on Apple silicon, you can reinstall `llama-cpp-python` with Metal hardware acceleration enabled by running:
%% Cell type:code id:4f9b5729 tags:
``` python
!CMAKE_ARGS="-DLLAMA_METAL=on" pip install -qU --force-reinstall --no-cache-dir llama-cpp-python
```
%% Cell type:markdown id:d2f52f11-ae6d-4706-8da3-ce03a7a6b92d tags:
## Download the Mistral 7B Instruct 4-bit GGUF files
We will be using Mistral 7B Instruct, quantized as a 4-bit GGUF file, which offers a good balance between performance and the ability to deploy on consumer hardware.
%% Cell type:code id:1d6ddf61-c189-4b3b-99df-9508f830ae1f tags:
``` python
! curl -L "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_0.gguf?download=true" -o ./mistral-7b-instruct-v0.2.Q4_0.gguf
! ls mistral-7b-instruct-v0.2.Q4_0.gguf
```
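%% Cell type:markdown id: tags:
Alternatively, the same file can be fetched with the `huggingface_hub` client rather than `curl`. A minimal sketch, assuming `huggingface_hub` is installed in your environment:
%% Cell type:code id: tags:
``` python
from huggingface_hub import hf_hub_download

# download the 4-bit GGUF file (or reuse a cached copy) into the current directory
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_0.gguf",
    local_dir=".",
)
```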
%% Cell type:markdown id:f6842324-0a81-44fb-a220-905af77601af tags:
# Initializing Dynamic Routes
Similar to the `02-dynamic-routes.ipynb` notebook, we will be initializing some dynamic routes that make use of LLMs for function calling.
%% Cell type:code id:e26db664-9dff-476a-84ef-edd7a8cdf1ba tags:
``` python
from datetime import datetime
from zoneinfo import ZoneInfo

from semantic_router import Route
from semantic_router.utils.function_call import get_schema


def get_time(timezone: str) -> str:
    """Finds the current time in a specific timezone.

    :param timezone: The timezone to find the current time in, should
        be a valid timezone from the IANA Time Zone Database like
        "America/New_York" or "Europe/London". Do NOT put the place
        name itself like "rome", or "new york", you must provide
        the IANA format.
    :type timezone: str
    :return: The current time in the specified timezone."""
    now = datetime.now(ZoneInfo(timezone))
    return now.strftime("%H:%M")


time_schema = get_schema(get_time)

time = Route(
    name="get_time",
    utterances=[
        "what is the time in new york city?",
        "what is the time in london?",
        "I live in Rome, what time is it?",
    ],
    function_schema=time_schema,
)

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
)

chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
)

routes = [politics, chitchat, time]
```
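%% Cell type:markdown id: tags:
As a quick sanity check, we can call `get_time` directly before handing it to the router (the exact output will of course depend on when you run it):
%% Cell type:code id: tags:
``` python
get_time("America/New_York")
```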
%% Cell type:code id:fac95b0c-c61f-4158-b7d9-0221f7d0b65e tags:
``` python
time_schema
```
%% Output
{'name': 'get_time',
 'description': 'Finds the current time in a specific timezone.\n\n:param timezone: The timezone to find the current time in, should\n be a valid timezone from the IANA Time Zone Database like\n "America/New_York" or "Europe/London". Do NOT put the place\n name itself like "rome", or "new york", you must provide\n the IANA format.\n:type timezone: str\n:return: The current time in the specified timezone.',
 'signature': '(timezone: str) -> str',
 'output': "<class 'str'>"}
%% Cell type:markdown id:ddd15620-92bd-4b77-99f4-c3fe68e9ab62 tags:
# Encoders
You can use alternative encoders; however, in this example we want to showcase a fully local Semantic Router execution, so we are going to use a `HuggingFaceEncoder` with `sentence-transformers/all-MiniLM-L6-v2` (the default) as an embedding model.
%% Cell type:code id:5253c141-141b-4fda-b07c-a313393902ed tags:
``` python
from semantic_router.encoders import HuggingFaceEncoder

encoder = HuggingFaceEncoder()
```
%% Output
/Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
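%% Cell type:markdown id: tags:
The encoder is what converts each route utterance, and each incoming query, into a dense vector for comparison. A quick standalone sketch, assuming the encoder is callable on a list of strings like the other semantic-router encoders:
%% Cell type:code id: tags:
``` python
# embed a single query; all-MiniLM-L6-v2 produces 384-dimensional vectors
embeddings = encoder(["what is the time in london?"])
len(embeddings[0])
```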
%% Cell type:markdown id:512fb46e-352b-4740-971e-ad4d047aa03b tags:
# `llama.cpp` LLM
From here, we can go ahead and instantiate our `llama-cpp-python` `llama_cpp.Llama` LLM, and then pass it to the `semantic_router.llms.LlamaCppLLM` wrapper class.
For `llama_cpp.Llama`, there are a couple of parameters you should pay attention to:
- `n_gpu_layers`: how many LLM layers to offload to the GPU (if you want to offload the entire model, pass `-1`, and for CPU execution, pass `0`)
- `n_ctx`: context size, which limits the number of tokens that can be passed to the LLM (this is bounded by the model's maximum context size, which for Mistral-7B-Instruct v0.2 is 32768 tokens, as reported by `llama.context_length` in the model loader output below)
- `verbose`: if `False`, silences output from `llama.cpp`
> For explanations of other parameters, refer to the `llama-cpp-python` [API Reference](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/)
%% Cell type:code id:772cec0d-7a0c-4c7e-9b7a-4a1864b0a8ec tags:
``` python
from semantic_router import RouteLayer
from llama_cpp import Llama
from semantic_router.llms import LlamaCppLLM

enable_gpu = True  # offload LLM layers to the GPU (must fit in memory)

_llm = Llama(
    model_path="./mistral-7b-instruct-v0.2.Q4_0.gguf",
    n_gpu_layers=-1 if enable_gpu else 0,
    n_ctx=2048,
    verbose=False,
)
llm = LlamaCppLLM(name="Mistral-7B-v0.2-Instruct", llm=_llm, max_tokens=None)
rl = RouteLayer(encoder=encoder, routes=routes, llm=llm)
```
%% Output
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from ./mistral-7b-instruct-v0.2.Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 11: general.file_type u32 = 2
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 3.83 GiB (4.54 BPW)
llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MiB
ggml_backend_metal_buffer_from_ptr: allocated buffer, size = 3918.58 MiB, ( 3918.64 / 21845.34)
llm_load_tensors: system memory used = 3917.98 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/llama_cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB
ggml_metal_init: maxTransferRate = built-in GPU
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 256.00 MiB, ( 4176.20 / 21845.34)
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 4176.22 / 21845.34)
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 159.19 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 156.02 MiB, ( 4332.22 / 21845.34)
2024-01-13 16:40:52 INFO semantic_router.utils.logger Initializing RouteLayer
%% Cell type:code id:a8bd1da4-8ff7-4cd3-a5e3-fd79a938cc67 tags:
``` python
rl("how's the weather today?")
```
%% Output
RouteChoice(name='chitchat', function_call=None, similarity_score=None, trigger=None)
%% Cell type:code id:c6ccbea2-376b-4b28-9b79-d2e9c71e99f4 tags:
``` python
out = rl("what's the time in New York right now?")
print(out)
get_time(**out.function_call)
```
%% Output
from_string grammar:
root ::= object
object ::= [{] ws object_11 [}] ws
value ::= object | array | string | number | value_6 ws
array ::= [[] ws array_15 []] ws
string ::= ["] string_18 ["] ws
number ::= number_19 number_25 number_29 ws
value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l]
ws ::= ws_31
object_8 ::= string [:] ws value object_10
object_9 ::= [,] ws string [:] ws value
object_10 ::= object_9 object_10 |
object_11 ::= object_8 |
array_12 ::= value array_14
array_13 ::= [,] ws value
array_14 ::= array_13 array_14 |
array_15 ::= array_12 |
string_16 ::= [^"\] | [\] string_17
string_17 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
string_18 ::= string_16 string_18 |
number_19 ::= number_20 number_21
number_20 ::= [-] |
number_21 ::= [0-9] | [1-9] number_22
number_22 ::= [0-9] number_22 |
number_23 ::= [.] number_24
number_24 ::= [0-9] number_24 | [0-9]
number_25 ::= number_23 |
number_26 ::= [eE] number_27 number_28
number_27 ::= [-+] |
number_28 ::= [0-9] number_28 | [0-9]
number_29 ::= number_26 |
ws_30 ::= [ <U+0009><U+000A>] ws
ws_31 ::= ws_30 |
2024-01-13 16:41:01 INFO semantic_router.utils.logger Extracting function input...
name='get_time' function_call={'timezone': 'America/New_York'} similarity_score=None trigger=None
'11:41'
%% Cell type:code id:720f976a tags:
``` python
out = rl("what's the time in Rome right now?")
print(out)
get_time(**out.function_call)
```
%% Output
from_string grammar:
root ::= object
object ::= [{] ws object_11 [}] ws
value ::= object | array | string | number | value_6 ws
array ::= [[] ws array_15 []] ws
string ::= ["] string_18 ["] ws
number ::= number_19 number_25 number_29 ws
value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l]
ws ::= ws_31
object_8 ::= string [:] ws value object_10
object_9 ::= [,] ws string [:] ws value
object_10 ::= object_9 object_10 |
object_11 ::= object_8 |
array_12 ::= value array_14
array_13 ::= [,] ws value
array_14 ::= array_13 array_14 |
array_15 ::= array_12 |
string_16 ::= [^"\] | [\] string_17
string_17 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
string_18 ::= string_16 string_18 |
number_19 ::= number_20 number_21
number_20 ::= [-] |
number_21 ::= [0-9] | [1-9] number_22
number_22 ::= [0-9] number_22 |
number_23 ::= [.] number_24
number_24 ::= [0-9] number_24 | [0-9]
number_25 ::= number_23 |
number_26 ::= [eE] number_27 number_28
number_27 ::= [-+] |
number_28 ::= [0-9] number_28 | [0-9]
number_29 ::= number_26 |
ws_30 ::= [ <U+0009><U+000A>] ws
ws_31 ::= ws_30 |
2024-01-13 16:41:04 INFO semantic_router.utils.logger Extracting function input...
name='get_time' function_call={'timezone': 'Europe/Rome'} similarity_score=None trigger=None
'17:41'
%% Cell type:code id:c9d9dbbb tags:
``` python
out = rl("what's the time in Bangkok right now?")
print(out)
get_time(**out.function_call)
```
%% Output
from_string grammar:
root ::= object
object ::= [{] ws object_11 [}] ws
value ::= object | array | string | number | value_6 ws
array ::= [[] ws array_15 []] ws
string ::= ["] string_18 ["] ws
number ::= number_19 number_25 number_29 ws
value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l]
ws ::= ws_31
object_8 ::= string [:] ws value object_10
object_9 ::= [,] ws string [:] ws value
object_10 ::= object_9 object_10 |
object_11 ::= object_8 |
array_12 ::= value array_14
array_13 ::= [,] ws value
array_14 ::= array_13 array_14 |
array_15 ::= array_12 |
string_16 ::= [^"\] | [\] string_17
string_17 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
string_18 ::= string_16 string_18 |
number_19 ::= number_20 number_21
number_20 ::= [-] |
number_21 ::= [0-9] | [1-9] number_22
number_22 ::= [0-9] number_22 |
number_23 ::= [.] number_24
number_24 ::= [0-9] number_24 | [0-9]
number_25 ::= number_23 |
number_26 ::= [eE] number_27 number_28
number_27 ::= [-+] |
number_28 ::= [0-9] number_28 | [0-9]
number_29 ::= number_26 |
ws_30 ::= [ <U+0009><U+000A>] ws
ws_31 ::= ws_30 |
2024-01-13 16:41:05 INFO semantic_router.utils.logger Extracting function input...
name='get_time' function_call={'timezone': 'Asia/Bangkok'} similarity_score=None trigger=None
'23:41'
%% Cell type:code id:675d12fd tags:
``` python
out = rl("what's the time in Phuket right now?")
print(out)
get_time(**out.function_call)
```
%% Output
from_string grammar:
root ::= object
object ::= [{] ws object_11 [}] ws
value ::= object | array | string | number | value_6 ws
array ::= [[] ws array_15 []] ws
string ::= ["] string_18 ["] ws
number ::= number_19 number_25 number_29 ws
value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l]
ws ::= ws_31
object_8 ::= string [:] ws value object_10
object_9 ::= [,] ws string [:] ws value
object_10 ::= object_9 object_10 |
object_11 ::= object_8 |
array_12 ::= value array_14
array_13 ::= [,] ws value
array_14 ::= array_13 array_14 |
array_15 ::= array_12 |
string_16 ::= [^"\] | [\] string_17
string_17 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
string_18 ::= string_16 string_18 |
number_19 ::= number_20 number_21
number_20 ::= [-] |
number_21 ::= [0-9] | [1-9] number_22
number_22 ::= [0-9] number_22 |
number_23 ::= [.] number_24
number_24 ::= [0-9] number_24 | [0-9]
number_25 ::= number_23 |
number_26 ::= [eE] number_27 number_28
number_27 ::= [-+] |
number_28 ::= [0-9] number_28 | [0-9]
number_29 ::= number_26 |
ws_30 ::= [ <U+0009><U+000A>] ws
ws_31 ::= ws_30 |
2024-01-13 16:41:07 INFO semantic_router.utils.logger Extracting function input...
name='get_time' function_call={'timezone': 'Asia/Bangkok'} similarity_score=None trigger=None
'23:41'
%% Cell type:markdown id:5200f550-f3be-43d7-9b76-6390360f07c8 tags:
## Cleanup
%% Cell type:markdown id:76df5f53 tags:
Once done, if you'd like to delete the downloaded model you can do so with the following:
```
! rm ./mistral-7b-instruct-v0.2.Q4_0.gguf
```
---
[tool.poetry]
name = "semantic-router"
version = "0.0.16"
description = "Super fast semantic router for AI decision making"
authors = [
    "James Briggs <james@aurelio.ai>",
...@@ -8,7 +8,8 @@ authors = [
    "Simonas Jakubonis <simonas@aurelio.ai>",
    "Luca Mannini <luca@aurelio.ai>",
    "Bogdan Buduroiu <bogdan@aurelio.ai>",
    "Ismail Ashraq <ashraq@aurelio.ai>",
    "Daniel Griffin <daniel@aurelio.ai>"
]
readme = "README.md"
packages = [{include = "semantic_router"}]