Unverified commit 4de9294e, authored by James Briggs and committed by GitHub

Merge pull request #266 from aurelio-labs/anup/splitter-fix

fix: video splitter -> type error
parents 29762f08 dafd97de
%% Cell type:markdown id: tags:
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/00-introduction.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/00-introduction.ipynb)
%% Cell type:markdown id: tags:
# Semantic Router Intro
%% Cell type:markdown id: tags:
The Semantic Router library can be used as a fast routing layer on top of LLMs. Rather than waiting on a slow agent to decide what to do, we can use semantic vector space to choose a route directly, cutting routing time from seconds to milliseconds.
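
To make the idea concrete, here is a minimal sketch of routing by vector similarity. It is not the library's implementation: the three-dimensional vectors, the `route_for` helper, and the 0.8 threshold are all made up for illustration, standing in for real embeddings and the library's own scoring.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, not real encoder output).
route_vectors = {
    "politics": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "chitchat": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}

def route_for(query_vec, threshold=0.8):
    """Return the best-matching route name, or None if nothing is close enough."""
    best_name, best_score = None, threshold
    for name, vectors in route_vectors.items():
        # Score a route by its closest example vector.
        score = max(cosine(query_vec, v) for v in vectors)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

print(route_for([0.85, 0.15, 0.05]))  # close to the "politics" examples
print(route_for([0.0, 0.1, 0.95]))    # far from every example
```

Comparing a query against a handful of stored vectors is plain arithmetic, which is why this kind of lookup can be much faster than asking an LLM to pick a route.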
%% Cell type:markdown id: tags:
## Getting Started
%% Cell type:markdown id: tags:
We start by installing the library:
%% Cell type:code id: tags:
``` python
!pip install -qU semantic-router
```
%% Output
[notice] A new release of pip is available: 23.1.2 -> 24.0
[notice] To update, run: python.exe -m pip install --upgrade pip
%% Cell type:markdown id: tags:
Next, we define a `Route` object that pairs a route name with example utterances that should trigger it.
%% Cell type:code id: tags:
``` python
from semantic_router import Route
politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
        "don't you just hate the president",
        "they're going to destroy this country!",
        "they will save the country!",
    ],
)
```
%% Output
d:\Program_Installation\anaconda\envs\rag\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
%% Cell type:markdown id: tags:
Let's define another for good measure:
%% Cell type:code id: tags:
``` python
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
        "the weather is horrendous",
        "let's go to the chippy",
    ],
)
routes = [politics, chitchat]
```
%% Cell type:markdown id: tags:
Now we initialize our embedding model:
%% Cell type:code id: tags:
``` python
import os
from getpass import getpass
from semantic_router.encoders import CohereEncoder, OpenAIEncoder
# os.environ["COHERE_API_KEY"] = os.getenv("COHERE_API_KEY") or getpass(
#     "Enter Cohere API Key: "
# )
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass(
    "Enter OpenAI API Key: "
)
# encoder = CohereEncoder()
encoder = OpenAIEncoder()
```
%% Cell type:markdown id: tags:
Now we define the `RouteLayer`. When called, the route layer consumes text (a query) and outputs the category (`Route`) it belongs to. To initialize a `RouteLayer`, we need our `encoder` model and a list of `routes`.
%% Cell type:code id: tags:
``` python
from semantic_router.layer import RouteLayer
rl = RouteLayer(encoder=encoder, routes=routes)
```
%% Output
2024-04-19 18:34:06 INFO semantic_router.utils.logger local
2024-05-02 12:38:34 INFO semantic_router.utils.logger local
%% Cell type:markdown id: tags:
Now we can test it:
%% Cell type:code id: tags:
``` python
rl("don't you love politics?")
```
%% Output
RouteChoice(name='politics', function_call=None, similarity_score=None)
%% Cell type:code id: tags:
``` python
rl("how's the weather today?")
```
%% Output
RouteChoice(name='chitchat', function_call=None, similarity_score=None)
%% Cell type:markdown id: tags:
Both are classified correctly. What happens if we send a query that is unrelated to our existing `Route` objects?
%% Cell type:code id: tags:
``` python
rl("I'm interested in learning about llama 2")
```
%% Output
RouteChoice(name=None, function_call=None, similarity_score=None)
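%% Cell type:markdown id: tags:
When no route matches, the returned `RouteChoice` has `name=None`, so calling code needs a fallback path. The sketch below shows one way to dispatch on the result. `FakeChoice`, `handlers`, and `dispatch` are hypothetical stand-ins used so the example runs without an API key; only the `name` attribute mirrors the output above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class FakeChoice:
    """Hypothetical stand-in exposing only the `name` field seen in the output."""
    name: Optional[str] = None

# Hypothetical handlers keyed by route name.
handlers: Dict[str, Callable[[str], str]] = {
    "politics": lambda q: "I'd rather not discuss politics.",
    "chitchat": lambda q: "Happy to chat!",
}

def dispatch(choice: FakeChoice, query: str) -> str:
    """Run the handler for the matched route, or fall back when there is none."""
    handler = handlers.get(choice.name) if choice.name else None
    if handler is None:
        return f"No route matched; passing {query!r} to the default path."
    return handler(query)

print(dispatch(FakeChoice(name="chitchat"), "how's the weather today?"))
print(dispatch(FakeChoice(), "I'm interested in learning about llama 2"))
```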
%% Cell type:markdown id: tags:
We can also retrieve multiple routes along with their associated similarity scores:
%% Cell type:code id: tags:
``` python
rl.retrieve_multiple_routes("Hi! How are you doing in politics??")
```
%% Output
[RouteChoice(name='politics', function_call=None, similarity_score=0.8596186767854487),
RouteChoice(name='chitchat', function_call=None, similarity_score=0.8356239688161808)]
%% Cell type:code id: tags:
``` python
rl.retrieve_multiple_routes("I'm interested in learning about llama 2")
```
%% Output
[]
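%% Cell type:markdown id: tags:
The list returned above can be post-processed like any other Python list, for example to pick the highest-scoring route. This sketch uses a hypothetical `Choice` named tuple with the two scores shown in the output, rather than real `RouteChoice` objects, so it runs offline.

```python
from typing import List, NamedTuple, Optional

class Choice(NamedTuple):
    """Hypothetical stand-in for the `name` and `similarity_score` fields."""
    name: str
    similarity_score: float

def top_route(choices: List[Choice]) -> Optional[str]:
    """Return the name with the highest score, or None for an empty list."""
    if not choices:
        return None
    return max(choices, key=lambda c: c.similarity_score).name

# Scores copied from the output above.
choices = [
    Choice("politics", 0.8596186767854487),
    Choice("chitchat", 0.8356239688161808),
]
print(top_route(choices))
print(top_route([]))
```

An empty list, as returned for the unrelated query, maps to `None` here, which matches how the single-route call reports no match.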
......
......@@ -80,9 +80,7 @@ class OpenAIEncoder(BaseEncoder):
         if truncate:
             # check if any document exceeds token limit and truncate if so
             for i in range(len(docs)):
-                logger.info(f"Document {i+1} length: {len(docs[i])}")
                 docs[i] = self._truncate(docs[i])
-                logger.info(f"Document {i+1} trunc length: {len(docs[i])}")
         # Exponential backoff
         for j in range(1, 7):
......
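
The retry loop that follows the truncation step is elided in this hunk. As a rough illustration of the exponential backoff pattern the comment names, here is a generic sketch. `call_with_backoff`, the injected `sleep`, and the `2 ** attempt` delay are assumptions made for this example, not the encoder's actual code; only the `range(1, 7)` bound comes from the diff.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_backoff(
    fn: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    last_error = None
    for attempt in range(1, 7):  # same bound as the loop in the diff
        try:
            return fn()
        except Exception as err:  # a real client would catch a narrower error
            last_error = err
            sleep(2 ** attempt)  # assumed delay schedule: 2, 4, 8, ... seconds
    raise RuntimeError("all retries failed") from last_error

# Usage: fail twice, then succeed; record delays instead of actually sleeping.
calls = {"n": 0}
delays = []

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.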
 from enum import Enum
-from typing import List, Optional
+from typing import List, Optional, Union, Any
 from pydantic.v1 import BaseModel
......@@ -52,7 +51,7 @@ class Message(BaseModel):
 class DocumentSplit(BaseModel):
-    docs: List[str]
+    docs: List[Union[str, Any]]
     is_triggered: bool = False
     triggered_score: Optional[float] = None
     token_count: Optional[int] = None
......@@ -60,7 +59,7 @@ class DocumentSplit(BaseModel):
     @property
     def content(self) -> str:
-        return " ".join(self.docs)
+        return " ".join([doc if isinstance(doc, str) else "" for doc in self.docs])
 class Metric(Enum):
......
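
The effect of the `content` change can be reproduced with a small stand-alone sketch. It uses a plain dataclass named `SplitSketch` (hypothetical, in place of the pydantic `DocumentSplit` model) and an arbitrary `object()` as the non-string item, so it runs without the library installed.

```python
from dataclasses import dataclass
from typing import Any, List, Union

@dataclass
class SplitSketch:
    """Hypothetical stand-in for DocumentSplit, keeping only `docs` and `content`."""
    docs: List[Union[str, Any]]

    @property
    def content(self) -> str:
        # Same expression as the new line in the diff: non-strings become "".
        return " ".join([doc if isinstance(doc, str) else "" for doc in self.docs])

frame = object()  # stands in for a non-text item, such as a video frame
split = SplitSketch(docs=["hello", frame, "world"])

print(repr(split.content))

# The previous expression raises TypeError on the same input.
try:
    " ".join(split.docs)
except TypeError as err:
    print("old behaviour:", err)
```

Note that each non-string item contributes an empty string, so the joined text contains a double space where that item sat.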