"Topic splitters are implemented in `semantic-router/splitters`.\n",
"\n",
"They automatically group (cluster) a set of utterances into unlabelled topics.\n",
"\n",
"Splitters are also integrated with `Conversation` objects, so a conversation can be split by topic progressively as it evolves. This helps routing, because earlier messages in the current topic can provide useful context when determining a route. Passing all utterances from the latest topic, rather than only the last one, lets the correct route be chosen more reliably."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example: IT Support Dialogue"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we import the necessary classes and initialize the conversation with dialogue."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"for i, (topic_id, doc) in enumerate(all_topics):\n",
" print(f\"Topic {topic_id + 1}: - {doc}\")\n",
"print(\"\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that the last message says \"Yeah, v.3.11.2 is running when it crashes.\"\n",
"\n",
"On its own, this message might still be routed correctly by the semantic-router, particularly if the route is generic and intended for \"software\" and/or \"crashes\".\n",
"\n",
"However, as an illustrative example, suppose the routes were:\n",
"\n",
"Route A: \"Software Crashes - v3.11\"\n",
"\n",
"Route B: \"Computer Crashes - v3.11\"\n",
"\n",
"If only the last utterance were used, Route A would likely be chosen, because that utterance mentions only the software version. However, if every utterance from the last topic (Topic 4) were concatenated and sent to the semantic-router, the additional context (\"My computer keeps crashing\") would most likely result in Route B being chosen.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Topic Splitting After Topic Continuation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that topics can be continued even after `conversation.split_by_topic()` has already been run. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Add some new messages."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Define the IT support dialogue\n",
"messages = [\n",
" Message(role=\"user\", content=\"What do the system logs say, right before the crash?\"),\n",
" Message(role=\"user\", content=\"I'll check soon, but first let's talk refund.\"),\n",
" Message(role=\"bot\", content=\"Okay let me sort out a refund.\"),\n",
"]\n",
"\n",
"# Add messages to the conversation\n",
"conversation.add_new_messages(messages)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All Topics:\n",
"Topic 2: - user: Hi, there, please can you confirm your full name\n",
"Topic 2: - user: Hi, my name is John Doe.\n",
"Topic 3: - bot: Okay, how can I help you today?\n",
"Topic 4: - user: My computer keeps crashing\n",
"Topic 4: - bot: Okay, is our software running when the computer crashes.\n",
"Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.\n",
"Topic 4: - user: What do the system logs say, right before the crash?\n",
"Topic 5: - user: I'll check soon, but first let's talk refund.\n",
"Topic 5: - bot: Okay let me sort out a refund.\n",
"for i, (topic_id, doc) in enumerate(all_topics):\n",
" print(f\"Topic {topic_id + 1}: - {doc}\")\n",
"print(\"\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, we:\n",
"\n",
"1) Added the first six messages, as seen above, to the `Conversation`.\n",
"2) Ran the Topic Splitter.\n",
"3) Added the last three messages to the `Conversation`.\n",
"4) Ran the Topic Splitter again.\n",
"\n",
"Although \"user: Yeah, v.3.11.2 is running when it crashes.\" and \"user: What do the system logs say, right before the crash?\" were added separately, and although the splitter was run twice (once before the second of these utterances was added and once after), both were assigned to the same topic, `Topic 4`.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "semantic_splitter_1",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
%% Cell type:markdown id: tags:
# Split Conversations by Topic
%% Cell type:markdown id: tags:
Topic splitters are implemented in `semantic-router/splitters`.
They automatically group (cluster) a set of utterances into unlabelled topics.
Splitters are also integrated with `Conversation` objects, so a conversation can be split by topic progressively as it evolves. This helps routing, because earlier messages in the current topic can provide useful context when determining a route. Passing all utterances from the latest topic, rather than only the last one, lets the correct route be chosen more reliably.
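To make the idea of grouping utterances into unlabelled topics concrete, here is a minimal toy sketch. It is not the implementation in `semantic-router/splitters`: the function names (`tokenize`, `jaccard`, `toy_split`), the word-overlap similarity, and the `threshold` value are all assumptions chosen for illustration. It starts a new topic whenever an utterance shares too few words with the topic accumulated so far.
%% Cell type:code id: tags:
```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def toy_split(utterances, threshold=0.2):
    """Assign a topic id to each utterance, in order.

    Hypothetical stand-in for a real splitter: a new topic starts when the
    word overlap with the current topic drops below `threshold`.
    """
    topics = []
    topic_id = 0
    topic_words = set()
    for utterance in utterances:
        words = tokenize(utterance)
        if topic_words and jaccard(words, topic_words) < threshold:
            topic_id += 1
            topic_words = set()
        topic_words |= words
        topics.append((topic_id, utterance))
    return topics


utterances = [
    "my computer keeps crashing",
    "the computer keeps crashing daily",
    "please refund my order",
    "refund the order today",
]
result = toy_split(utterances)
print([topic_id for topic_id, _ in result])  # [0, 0, 1, 1]
```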
%% Cell type:markdown id: tags:
## Example: IT Support Dialogue
%% Cell type:markdown id: tags:
### Setup
%% Cell type:markdown id: tags:
First, we import the necessary classes and initialize the conversation with dialogue.
%% Cell type:code id: tags:
``` python
from semantic_router.text import Conversation
from semantic_router.schema import Message

# Initialize the Conversation
conversation = Conversation()

# Define the IT support dialogue
messages = [
    Message(role="user", content="Hi, there, please can you confirm your full name"),
    Message(role="user", content="Hi, my name is John Doe."),
    Message(role="bot", content="Okay, how can I help you today?"),
    Message(role="user", content="My computer keeps crashing"),
    Message(role="bot", content="Okay, is our software running when the computer crashes."),
    Message(role="user", content="Yeah, v.3.11.2 is running when it crashes."),
]

# Add messages to the conversation
conversation.add_new_messages(messages)
```
%% Output
Topic 2: - user: Hi, there, please can you confirm your full name
Topic 2: - user: Hi, my name is John Doe.
Topic 3: - bot: Okay, how can I help you today?
Topic 4: - user: My computer keeps crashing
Topic 4: - bot: Okay, is our software running when the computer crashes.
Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.
%% Cell type:markdown id: tags:
Notice that the last message says "Yeah, v.3.11.2 is running when it crashes."
On its own, this message might still be routed correctly by the semantic-router, particularly if the route is generic and intended for "software" and/or "crashes".
However, as an illustrative example, suppose the routes were:
Route A: "Software Crashes - v3.11"
Route B: "Computer Crashes - v3.11"
If only the last utterance were used, Route A would likely be chosen, because that utterance mentions only the software version. However, if every utterance from the last topic (Topic 4) were concatenated and sent to the semantic-router, the additional context ("My computer keeps crashing") would most likely result in Route B being chosen.
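The concatenation step described above can be sketched in plain Python. This assumes the splitter output is a list of `(topic_id, utterance)` tuples, as the printing loop in this notebook suggests; the helper name `latest_topic_text` is hypothetical, and the `all_topics` list below is copied by hand from the output above rather than produced by the library.
%% Cell type:code id: tags:
```python
def latest_topic_text(all_topics, sep=" "):
    """Concatenate the utterances that belong to the most recent topic."""
    if not all_topics:
        return ""
    last_id = all_topics[-1][0]
    return sep.join(doc for topic_id, doc in all_topics if topic_id == last_id)


# Hand-copied from the output above (raw ids are one less than the printed ones).
all_topics = [
    (1, "user: Hi, there, please can you confirm your full name"),
    (1, "user: Hi, my name is John Doe."),
    (2, "bot: Okay, how can I help you today?"),
    (3, "user: My computer keeps crashing"),
    (3, "bot: Okay, is our software running when the computer crashes."),
    (3, "user: Yeah, v.3.11.2 is running when it crashes."),
]

# This string, rather than only the last utterance, would be sent to the router.
query = latest_topic_text(all_topics)
print(query)
```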
%% Cell type:markdown id: tags:
### Topic Splitting After Topic Continuation
%% Cell type:markdown id: tags:
Note that topics can be continued even after `conversation.split_by_topic()` has already been run.
%% Cell type:markdown id: tags:
Add some new messages.
%% Cell type:code id: tags:
``` python
# Define the follow-up messages
messages = [
    Message(role="user", content="What do the system logs say, right before the crash?"),
    Message(role="user", content="I'll check soon, but first let's talk refund."),
    Message(role="bot", content="Okay let me sort out a refund."),
]

# Add messages to the conversation
conversation.add_new_messages(messages)
```
%% Output
Topic 2: - user: Hi, there, please can you confirm your full name
Topic 2: - user: Hi, my name is John Doe.
Topic 3: - bot: Okay, how can I help you today?
Topic 4: - user: My computer keeps crashing
Topic 4: - bot: Okay, is our software running when the computer crashes.
Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.
Topic 4: - user: What do the system logs say, right before the crash?
Topic 5: - user: I'll check soon, but first let's talk refund.
Topic 5: - bot: Okay let me sort out a refund.
%% Cell type:markdown id: tags:
As you can see, we:
1) Added the first six messages, as seen above, to the `Conversation`.
2) Ran the Topic Splitter.
3) Added the last three messages to the `Conversation`.
4) Ran the Topic Splitter again.
Although "user: Yeah, v.3.11.2 is running when it crashes." and "user: What do the system logs say, right before the crash?" were added separately, and although the splitter was run twice (once before the second of these utterances was added and once after), both were assigned to the same topic, `Topic 4`.
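A quick way to check this kind of continuity is to look up the topic id of each utterance in the splitter output. The sketch below again assumes a list of `(topic_id, utterance)` tuples; `topic_id_of` is a hypothetical helper, and `all_topics` is copied by hand from the second output above.
%% Cell type:code id: tags:
```python
def topic_id_of(all_topics, utterance):
    """Return the topic id of the first matching utterance, or None."""
    for topic_id, doc in all_topics:
        if doc == utterance:
            return topic_id
    return None


# Hand-copied from the second output above (raw ids are one less than printed).
all_topics = [
    (1, "user: Hi, there, please can you confirm your full name"),
    (1, "user: Hi, my name is John Doe."),
    (2, "bot: Okay, how can I help you today?"),
    (3, "user: My computer keeps crashing"),
    (3, "bot: Okay, is our software running when the computer crashes."),
    (3, "user: Yeah, v.3.11.2 is running when it crashes."),
    (3, "user: What do the system logs say, right before the crash?"),
    (4, "user: I'll check soon, but first let's talk refund."),
    (4, "bot: Okay let me sort out a refund."),
]

before = topic_id_of(all_topics, "user: Yeah, v.3.11.2 is running when it crashes.")
after = topic_id_of(
    all_topics, "user: What do the system logs say, right before the crash?"
)
same_topic = before is not None and before == after
print(same_topic)  # True
```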