fix link

a76bb805 · James Briggs · cfae77c8 · a76bb805
Unverified Commit a76bb805 authored 1 year ago by James Briggs
--- a/docs/06-threshold-optimization.ipynb
+++ b/docs/06-threshold-optimization.ipynb
@@ -4,7 +4,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/08-scaling-agent-tools.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/08-scaling-agent-tools.ipynb)"
+    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/06-threshold-optimization.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/06-threshold-optimization)"
   ]
  },
  {

 %% Cell type:markdown id: tags:

-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/08-scaling-agent-tools.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/08-scaling-agent-tools.ipynb)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/06-threshold-optimization.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/06-threshold-optimization)

 %% Cell type:markdown id: tags:

 # Route Threshold Optimization

 %% Cell type:markdown id: tags:

 Route score thresholds determine whether a route is chosen. If the score we compute for a given route is higher than its `Route.score_threshold`, the route passes; otherwise it does not, and _either_ another route is chosen or we return _no_ route.

 Given that this one `score_threshold` parameter can decide the choice of a route, it's important to get it right, but tuning it manually is incredibly inefficient. Instead, we can use the `fit` and `evaluate` methods of our `RouteLayer`. All we must do is pass a small number of _(utterance, target route)_ examples to these methods. With `fit` we will often see dramatically improved performance within seconds, and with `evaluate` we can measure that performance gain.
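
The threshold rule described above can be pictured with a small, library-free sketch. It assumes each route ends up with a single similarity score per query; the function name `choose_route` and the example scores are hypothetical, and the library's actual scoring may differ.

```python
from typing import Optional


def choose_route(scores: dict[str, float], thresholds: dict[str, float]) -> Optional[str]:
    """Return the best-scoring route whose score exceeds its own threshold."""
    # Keep only the routes whose score is strictly above their threshold.
    passing = {name: s for name, s in scores.items() if s > thresholds[name]}
    if not passing:
        return None  # no route passes, so no route is returned
    return max(passing, key=passing.get)


# Illustrative thresholds and scores (made up for this example).
thresholds = {"politics": 0.5, "chitchat": 0.5}
print(choose_route({"politics": 0.62, "chitchat": 0.31}, thresholds))  # politics
print(choose_route({"politics": 0.44, "chitchat": 0.31}, thresholds))  # None
```

Lowering the `politics` threshold to, say, `0.4` would let the second query through, which is exactly the kind of adjustment that is tedious to make by hand.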

 %% Cell type:code id: tags:

 ``` python
 !pip install -qU "semantic-router[local]==0.0.20"
 ```

 %% Cell type:markdown id: tags:

 ## Define RouteLayer

 As usual we will define our `RouteLayer`. The `RouteLayer` requires just `routes` and an `encoder`. If using dynamic routes you must also define an `llm` (or use the OpenAI default).

 We will start by defining four routes: _politics_, _chitchat_, _mathematics_, and _biology_.

 %% Cell type:code id: tags:

 ``` python
 from semantic_router import Route

 # we could use this as a guide for our chatbot to avoid political conversations
 politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
 )

 # this could be used as an indicator to our chatbot to switch to a more
 # conversational prompt
 chitchat = Route(
    name="chitchat",
    utterances=[
        "Did you watch the game last night?",
        "what's your favorite type of music?",
        "Have you read any good books lately?",
        "nice weather we're having",
        "Do you have any plans for the weekend?",
    ],
 )

 # we can use this to switch to an agent with more math tools, prompting, and LLMs
 mathematics = Route(
    name="mathematics",
    utterances=[
        "can you explain the concept of a derivative?",
        "What is the formula for the area of a triangle?",
        "how do you solve a system of linear equations?",
        "What is the concept of a prime number?",
        "Can you explain the Pythagorean theorem?",
    ],
 )

 # we can use this to switch to an agent with more biology knowledge
 biology = Route(
    name="biology",
    utterances=[
        "what is the process of osmosis?",
        "can you explain the structure of a cell?",
        "What is the role of RNA?",
        "What is genetic mutation?",
        "Can you explain the process of photosynthesis?",
    ],
 )

 # we place all of our routes together into a single list
 routes = [politics, chitchat, mathematics, biology]
 ```

 %% Output

    /Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
      from .autonotebook import tqdm as notebook_tqdm

 %% Cell type:markdown id: tags:

 For our encoder we will use the local `HuggingFaceEncoder`. Other popular encoders include `CohereEncoder`, `FastEmbedEncoder`, `OpenAIEncoder`, and `AzureOpenAIEncoder`.

 %% Cell type:code id: tags:

 ``` python
 from semantic_router.encoders import HuggingFaceEncoder

 encoder = HuggingFaceEncoder()
 ```

 %% Output

    /Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
      return self.fget.__get__(instance, owner)()

 %% Cell type:markdown id: tags:

 Now we initialize our `RouteLayer`.

 %% Cell type:code id: tags:

 ``` python
 from semantic_router.layer import RouteLayer

 rl = RouteLayer(encoder=encoder, routes=routes)
 ```

 %% Output

    2024-01-28 11:08:05 INFO semantic_router.utils.logger Initializing RouteLayer

 %% Cell type:markdown id: tags:

 By default, we should get reasonable performance:

 %% Cell type:code id: tags:

 ``` python
 for utterance in [
    "don't you love politics?",
    "how's the weather today?",
    "What's DNA?",
    "I'm interested in learning about llama 2",
 ]:
    print(f"{utterance} -> {rl(utterance).name}")
 ```

 %% Output

    don't you love politics? -> politics
    how's the weather today? -> chitchat
    What's DNA? -> biology
    I'm interested in learning about llama 2 -> None

 %% Cell type:markdown id: tags:

 We can evaluate the performance of our route layer using the `evaluate` method. All we need is to pass a list of utterances and target route labels:

 %% Cell type:code id: tags:

 ``` python
 test_data = [
    ("don't you love politics?", "politics"),
    ("how's the weather today?", "chitchat"),
    ("What's DNA?", "biology"),
    ("I'm interested in learning about llama 2", None),
 ]

 # unpack the test data
 X, y = zip(*test_data)

 # evaluate using the default thresholds
 accuracy = rl.evaluate(X=X, y=y)
 print(f"Accuracy: {accuracy*100:.2f}%")
 ```

 %% Output

    Accuracy: 100.00%

 %% Cell type:markdown id: tags:

 On this small subset we get perfect accuracy, but what if we try a larger, more robust dataset?

 _Hint: try using GPT-4 or another LLM to generate some examples for your own use-cases. The more accurate examples you provide, the better you can expect the routes to perform on your actual use-case._

 %% Cell type:code id: tags:

 ``` python
 test_data = [
    # politics
    ("What's your opinion on the current government?", "politics"),
    ("Who do you think will win the next election?", "politics"),
    ("What are your thoughts on the new policy?", "politics"),
    ("How do you feel about the political situation?", "politics"),
    ("Do you agree with the president's actions?", "politics"),
    ("What's your stance on the political debate?", "politics"),
    ("How do you see the future of our country?", "politics"),
    ("What do you think about the opposition party?", "politics"),
    ("Do you believe the government is doing enough?", "politics"),
    ("What's your opinion on the political scandal?", "politics"),
    ("Do you think the new law will make a difference?", "politics"),
    ("What are your thoughts on the political reform?", "politics"),
    ("Do you agree with the government's foreign policy?", "politics"),
    # chitchat
    ("What's the weather like?", "chitchat"),
    ("It's a beautiful day today.", "chitchat"),
    ("How's your day going?", "chitchat"),
    ("It's raining cats and dogs.", "chitchat"),
    ("Let's grab a coffee.", "chitchat"),
    ("What's up?", "chitchat"),
    ("It's a bit chilly today.", "chitchat"),
    ("How's it going?", "chitchat"),
    ("Nice weather we're having.", "chitchat"),
    ("It's a bit windy today.", "chitchat"),
    ("Let's go for a walk.", "chitchat"),
    ("How's your week been?", "chitchat"),
    ("It's quite sunny today.", "chitchat"),
    ("How are you feeling?", "chitchat"),
    ("It's a bit cloudy today.", "chitchat"),
    # mathematics
    ("What is the Pythagorean theorem?", "mathematics"),
    ("Can you solve this quadratic equation?", "mathematics"),
    ("What is the derivative of x squared?", "mathematics"),
    ("Explain the concept of integration.", "mathematics"),
    ("What is the area of a circle?", "mathematics"),
    ("How do you calculate the volume of a sphere?", "mathematics"),
    ("What is the difference between a vector and a scalar?", "mathematics"),
    ("Explain the concept of a matrix.", "mathematics"),
    ("What is the Fibonacci sequence?", "mathematics"),
    ("How do you calculate permutations?", "mathematics"),
    ("What is the concept of probability?", "mathematics"),
    ("Explain the binomial theorem.", "mathematics"),
    ("What is the difference between discrete and continuous data?", "mathematics"),
    ("What is a complex number?", "mathematics"),
    ("Explain the concept of limits.", "mathematics"),
    # biology
    ("What is photosynthesis?", "biology"),
    ("Explain the process of cell division.", "biology"),
    ("What is the function of mitochondria?", "biology"),
    ("What is DNA?", "biology"),
    ("What is the difference between prokaryotic and eukaryotic cells?", "biology"),
    ("What is an ecosystem?", "biology"),
    ("Explain the theory of evolution.", "biology"),
    ("What is a species?", "biology"),
    ("What is the role of enzymes?", "biology"),
    ("What is the circulatory system?", "biology"),
    ("Explain the process of respiration.", "biology"),
    ("What is a gene?", "biology"),
    ("What is the function of the nervous system?", "biology"),
    ("What is homeostasis?", "biology"),
    ("What is the difference between a virus and a bacteria?", "biology"),
    ("What is the role of the immune system?", "biology"),
    # add some None routes to prevent excessively small thresholds
    ("What is the capital of France?", None),
    ("how many people live in the US?", None),
    ("when is the best time to visit Bali?", None),
    ("how do I learn a language", None),
    ("tell me an interesting fact", None),
    ("what is the best programming language?", None),
    ("I'm interested in learning about llama 2", None),
 ]
 ```

 %% Cell type:code id: tags:

 ``` python
 # unpack the test data
 X, y = zip(*test_data)

 # evaluate using the default thresholds
 accuracy = rl.evaluate(X=X, y=y)
 print(f"Accuracy: {accuracy*100:.2f}%")
 ```

 %% Output

    Accuracy: 34.85%

 %% Cell type:markdown id: tags:

 Ouch, that's not so good! Fortunately, we can easily improve our performance here.
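
To put that number in perspective, the larger test set has 66 labelled examples, so 34.85% corresponds to 23 correct predictions. This assumes `evaluate` reports the plain fraction of exact matches, which the output is consistent with but which is not stated in this notebook. The arithmetic, as a quick check:

```python
# Number of examples per target label in the larger test set.
counts = {"politics": 13, "chitchat": 15, "mathematics": 15, "biology": 16, None: 7}
n_examples = sum(counts.values())

# Accuracy reported with the default thresholds.
reported = 0.3485

# Assumption: accuracy = correct predictions / total examples.
n_correct = round(reported * n_examples)
print(f"{n_correct}/{n_examples} = {n_correct / n_examples * 100:.2f}%")  # 23/66 = 34.85%
```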

 %% Cell type:markdown id: tags:

 ## Route Layer Optimization

 %% Cell type:markdown id: tags:

 Our optimization works by finding the best route thresholds for each `Route` in our `RouteLayer`. We can see the current, default thresholds by calling the `get_thresholds` method:

 %% Cell type:code id: tags:

 ``` python
 route_thresholds = rl.get_thresholds()
 print("Default route thresholds:", route_thresholds)
 ```

 %% Output

    Default route thresholds: {'politics': 0.5, 'chitchat': 0.5, 'mathematics': 0.5, 'biology': 0.5}

 %% Cell type:markdown id: tags:

 These are all preset route threshold values. Fortunately, it's very easy to optimize these — we simply call the `fit` method and provide our training utterances `X`, and target route labels `y`:

 %% Cell type:code id: tags:

 ``` python
 # Call the fit method
 rl.fit(X=X, y=y)
 ```

 %% Output

    huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
    To disable this warning, you can either:
    	- Avoid using `tokenizers` before the fork if possible
    	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
    100%|██████████| 500/500 [00:01<00:00, 379.54it/s, acc=0.91]

 %% Cell type:markdown id: tags:

 Let's see what our new thresholds look like:

 %% Cell type:code id: tags:

 ``` python
 route_thresholds = rl.get_thresholds()
 print("Updated route thresholds:", route_thresholds)
 ```

 %% Output

    Updated route thresholds: {'politics': 0.20202020202020204, 'chitchat': 0.5, 'mathematics': 0.14141414141414144, 'biology': 0.1806754412815019}

 %% Cell type:markdown id: tags:

 Most of these thresholds are very different from the defaults (only `chitchat` stayed at `0.5`). It's worth noting that _optimal_ values can vary greatly between encoders. For example, OpenAI's Ada 002 model, when used with our encoders, tends to produce much larger values in the `0.5` to `0.8` range.
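
To build intuition for how a threshold search can land on values like these, here is a minimal random-search sketch. It is not the library's implementation: the helper names, the uniform sampling over `[0, 1]`, and the toy similarity scores are all assumptions made for illustration.

```python
import random
from typing import Optional


def choose_route(scores: dict, thresholds: dict) -> Optional[str]:
    """Best-scoring route among those above their own threshold, else None."""
    passing = {name: s for name, s in scores.items() if s > thresholds[name]}
    return max(passing, key=passing.get) if passing else None


def accuracy(all_scores: list, labels: list, thresholds: dict) -> float:
    """Fraction of examples whose chosen route equals the target label."""
    hits = sum(choose_route(s, thresholds) == y for s, y in zip(all_scores, labels))
    return hits / len(labels)


def random_search(all_scores, labels, thresholds, max_iter=500, seed=0):
    """Sample random thresholds and keep the best-scoring set seen so far."""
    rng = random.Random(seed)
    best, best_acc = dict(thresholds), accuracy(all_scores, labels, thresholds)
    for _ in range(max_iter):
        candidate = {name: rng.uniform(0.0, 1.0) for name in best}
        acc = accuracy(all_scores, labels, candidate)
        if acc > best_acc:
            best, best_acc = candidate, acc
    return best, best_acc


# Made-up similarity scores for three utterances (not real encoder output).
toy_scores = [
    {"politics": 0.32, "chitchat": 0.10},
    {"politics": 0.08, "chitchat": 0.41},
    {"politics": 0.12, "chitchat": 0.09},
]
toy_labels = ["politics", "chitchat", None]
defaults = {"politics": 0.5, "chitchat": 0.5}

print("before:", accuracy(toy_scores, toy_labels, defaults))
best, best_acc = random_search(toy_scores, toy_labels, defaults)
print("after:", best_acc, best)
```

With the `0.5` defaults, every toy query falls below its threshold and only the `None` example is counted correct. The search keeps any sampled thresholds that are low enough to admit the true matches while still rejecting the off-topic query, which mirrors why the `None` examples in the training data matter.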

 After training we have a final performance of:

 %% Cell type:code id: tags:

 ``` python
 accuracy = rl.evaluate(X=X, y=y)
 print(f"Accuracy: {accuracy*100:.2f}%")
 ```

 %% Output

    Accuracy: 90.91%

 %% Cell type:markdown id: tags:

 That is _much_ better. If we want to optimize this further, we can focus on adding more utterances to our existing routes, analyzing _where_ exactly our failures are, and modifying our routes around those. This extended optimization process is much more manual, but with it we can continue optimizing routes to get even better performance.
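
One way to start that failure analysis is to list the misrouted utterances and count them per target route. The sketch below is a hypothetical helper, not part of the library. It takes any callable that maps an utterance to a route name, so in this notebook you could pass `lambda u: rl(u).name`; a stand-in router is used here so the example runs on its own.

```python
from collections import Counter


def find_failures(route_name, X, y):
    """Return (utterance, target, predicted) for every misrouted example."""
    failures = []
    for utterance, target in zip(X, y):
        predicted = route_name(utterance)
        if predicted != target:
            failures.append((utterance, target, predicted))
    return failures


# Stand-in router for demonstration only; with the notebook's objects,
# use `lambda u: rl(u).name` instead.
def toy_router(utterance):
    return "chitchat" if "weather" in utterance else None


X_demo = ["What's the weather like?", "What is DNA?", "tell me an interesting fact"]
y_demo = ["chitchat", "biology", None]

failures = find_failures(toy_router, X_demo, y_demo)
for utterance, target, predicted in failures:
    print(f"{utterance!r}: expected {target}, got {predicted}")

# Failures per target route hint at which routes need more utterances.
per_route = Counter(target for _, target, _ in failures)
print(per_route)
```

Routes that dominate the failure count are the natural candidates for additional or more varied utterances.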