diff --git a/docs/07-route-conversations-by-topic.ipynb b/docs/07-route-conversations-by-topic.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..fca110c7edfc1303da89761172bd661d6c9a848b --- /dev/null +++ b/docs/07-route-conversations-by-topic.ipynb @@ -0,0 +1,294 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Split Conversations by Topic" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Topic splitters are implemented in `semantic-router/splitters`.\n", + "\n", + "These allow a set of utterances to be automatically grouped (clustered) into unlabelled topics.\n", + "\n", + "Additionally, splitters have been integrated with `Conversation` objects, allowing conversations to be progressively split by topic as they evolve. This benefits routing, because earlier messages in a conversation topic might provide useful context when determining routes. Using all utterances in the latest topic, rather than only the most recent message, allows correct routes to be chosen more reliably." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example: IT Support Dialogue" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, we import the necessary classes and initialize the conversation with the dialogue." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "c:\\Users\\Siraj\\Documents\\Personal\\Work\\Aurelio\\20240130 2148 Semantic Topic Splitter (Siraj Local Repo)\\venvs\\semantic_splitter_1\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "from semantic_router.text import Conversation\n", + "from semantic_router.schema import Message\n", + "\n", + "# Initialize the Conversation\n", + "conversation = Conversation()\n", + "\n", + "# Define the IT support dialogue\n", + "messages = [\n", + " Message(role=\"user\", content=\"Hi, there, please can you confirm your full name\"),\n", + " Message(role=\"user\", content=\"Hi, my name is John Doe.\"),\n", + " Message(role=\"bot\", content=\"Okay, how can I help you today?\"),\n", + " Message(role=\"user\", content=\"My computer keeps crashing\"),\n", + " Message(role=\"bot\", content=\"Okay, is our software running when the computer crashes.\"),\n", + " Message(role=\"user\", content=\"Yeah, v.3.11.2 is running when it crashes.\"),\n", + "]\n", + "\n", + "# Add messages to the conversation\n", + "conversation.add_new_messages(messages)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize an Encoder" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "from semantic_router.encoders.cohere import CohereEncoder\n", + "\n", + "cohere_encoder = CohereEncoder(\n", + " name=\"embed-english-v3.0\", \n", + " cohere_api_key=\"YOUR_COHERE_API_KEY\", # placeholder: never commit a real API key\n", + " input_type=\"search_document\",\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Split Conversation by Topic" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All Topics:\n", + "Topic 2: - user: Hi, there, please can you confirm your full name\n", + "Topic 2: - user: Hi, my name is John Doe.\n", + "Topic 3: - bot: Okay, how can I help 
you today?\n", + "Topic 4: - user: My computer keeps crashing\n", + "Topic 4: - bot: Okay, is our software running when the computer crashes.\n", + "Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.\n", + "\n", + "\n" + ] + } + ], + "source": [ + "conversation.configure_splitter(\n", + " encoder=cohere_encoder, \n", + " threshold=0.5, \n", + " split_method=\"cumulative_similarity\"\n", + ")\n", + "\n", + "all_topics, new_topics = conversation.split_by_topic()\n", + "\n", + "# Display all topics\n", + "print(\"All Topics:\")\n", + "for topic_id, doc in all_topics:\n", + " print(f\"Topic {topic_id + 1}: - {doc}\")\n", + "print(\"\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that the last message says \"Yeah, v.3.11.2 is running when it crashes.\"\n", + "\n", + "This might be correctly routed by the semantic-router, particularly if the route is quite generic, intended for \"software\" and/or \"crashes\".\n", + "\n", + "However, as an illustrative example, what if the routes were:\n", + "\n", + "Route A: \"Software Crashes - v3.11\"\n", + "\n", + "Route B: \"Computer Crashes - v3.11\"\n", + "\n", + "If just the last utterance was used, then Route A would likely be chosen. However, if instead every utterance from the last topic (Topic 4), concatenated together, were sent to the semantic-router, then this important additional context (the topic opens with \"My computer keeps crashing\") would most likely result in Route B being chosen.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Topic Splitting After Topic Continuation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that topics can be continued even after `conversation.split_by_topic()` has already been run. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Add some new messages." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# Define the follow-up messages\n", + "messages = [\n", + " Message(role=\"user\", content=\"What do the system logs say, right before the crash?\"),\n", + " Message(role=\"user\", content=\"I'll check soon, but first let's talk refund.\"),\n", + " Message(role=\"bot\", content=\"Okay let me sort out a refund.\"),\n", + "]\n", + "\n", + "# Add messages to the conversation\n", + "conversation.add_new_messages(messages)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All Topics:\n", + "Topic 2: - user: Hi, there, please can you confirm your full name\n", + "Topic 2: - user: Hi, my name is John Doe.\n", + "Topic 3: - bot: Okay, how can I help you today?\n", + "Topic 4: - user: My computer keeps crashing\n", + "Topic 4: - bot: Okay, is our software running when the computer crashes.\n", + "Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.\n", + "Topic 4: - user: What do the system logs say, right before the crash?\n", + "Topic 5: - user: I'll check soon, but first let's talk refund.\n", + "Topic 5: - bot: Okay let me sort out a refund.\n", + "\n", + "\n" + ] + } + ], + "source": [ + "conversation.configure_splitter(\n", + " encoder=cohere_encoder, \n", + " threshold=0.5, \n", + " split_method=\"cumulative_similarity\"\n", + ")\n", + "\n", + "all_topics, new_topics = conversation.split_by_topic()\n", + "\n", + "# Display all topics\n", + "print(\"All Topics:\")\n", + "for topic_id, doc in all_topics:\n", + " print(f\"Topic {topic_id + 1}: - {doc}\")\n", + "print(\"\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see, we:\n", + "\n", + "1) Added the first six messages, as seen above, to the `Conversation`.\n", + "2) Ran the Topic Splitter.\n", + "3) Added the last three messages to 
the `Conversation`.\n", + "4) Ran the Topic Splitter again.\n", + "\n", + "Despite \"user: Yeah, v.3.11.2 is running when it crashes.\" and \"user: What do the system logs say, right before the crash?\" being added separately, and despite the splitter being run twice (once before \"user: What do the system logs say, right before the crash?\" was added, and once after), both utterances were assigned to the same topic, `Topic 4`.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "semantic_splitter_1", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
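
The notebook suggests routing on every utterance from the latest topic, concatenated together, rather than on the last utterance alone. A minimal sketch of that step follows. It assumes `all_topics` is a list of `(topic_id, utterance)` pairs, as the display loop in the notebook implies. `latest_topic_text` is a hypothetical helper written for illustration and is not part of semantic-router, and the sample data is hand-written rather than produced by the splitter.

```python
from typing import List, Tuple


def latest_topic_text(all_topics: List[Tuple[int, str]], sep: str = " ") -> str:
    """Join every utterance that shares the topic id of the final entry.

    Hypothetical helper: assumes each entry is a (topic_id, utterance) pair.
    """
    if not all_topics:
        return ""
    last_id = all_topics[-1][0]
    return sep.join(doc for topic_id, doc in all_topics if topic_id == last_id)


# Hand-written sample in the assumed (topic_id, utterance) shape.
sample_topics = [
    (1, "user: Hi, my name is John Doe."),
    (2, "bot: Okay, how can I help you today?"),
    (3, "user: My computer keeps crashing"),
    (3, "user: Yeah, v.3.11.2 is running when it crashes."),
]

query = latest_topic_text(sample_topics)
print(query)
```

The resulting `query` string, instead of only the final message, could then be passed to whatever routing call the application uses, so that "My computer keeps crashing" contributes context to the route decision.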