{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Split Conversations by Topic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Topics Splitters have been implemented in the code in `semantic-router/splitters`.\n", "\n", "These allow a set of utterances to be automatically grouped/clustered into (un-labelled) topics. \n", "\n", "Additionally, splitters have been integrated with `Conversation` objects allowing conversations to be progressively spit by topic as they evolve. This is beneficial to routing, as earlier messages in a conversation topic might provide useful context when determining routes. By using all utterances in the latest conversation this additional context allows for correct routes to be more reliably chosen." ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: IT Support Dialogue" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we import the necessary classes and initialize the conversation with dialogue." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "c:\\Users\\Siraj\\Documents\\Personal\\Work\\Aurelio\\20240130 2148 Semantic Topic Splitter (Siraj Local Repo)\\venvs\\semantic_splitter_1\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "from semantic_router.text import Conversation\n", "from semantic_router.schema import Message\n", "\n", "# Initialize the Conversation\n", "conversation = Conversation()\n", "\n", "# Define the IT support dialogue\n", "messages = [\n", " Message(role=\"user\", content=\"Hi, there, please can you confirm your full name\"),\n", " Message(role=\"user\", content=\"Hi, my name is John Doe.\"),\n", " Message(role=\"bot\", content=\"Okay, how can I help you today?\"),\n", " Message(role=\"user\", content=\"My computer keeps crashing\"),\n", " Message(role=\"bot\", content=\"Okay, is our software running when the computer crashes.\"),\n", " Message(role=\"user\", content=\"Yeah, v.3.11.2 is running when it crashes.\"),\n", "]\n", "\n", "# Add messages to the conversation\n", "conversation.add_new_messages(messages)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initialize an Encoder" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from semantic_router.encoders.cohere import CohereEncoder\n", "\n", "cohere_encoder = CohereEncoder(\n", " name=\"embed-english-v3.0\", \n", " cohere_api_key='',\n", " input_type=\"search_document\",\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Split Conversation by Topic" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All Topics:\n", "Topic 2: - user: Hi, there, please can you confirm your full name\n", "Topic 2: - user: Hi, my name is John Doe.\n", "Topic 3: - bot: Okay, how can I help you today?\n", "Topic 4: - user: My computer keeps crashing\n", "Topic 4: - bot: Okay, is our software running when the computer crashes.\n", "Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.\n", "\n", "\n" ] } ], "source": [ "conversation.configure_splitter(\n", " encoder=cohere_encoder, \n", " threshold=0.5, \n", " split_method=\"cumulative_similarity\"\n", ")\n", "\n", "all_topics, new_topics = conversation.split_by_topic()\n", "\n", "# Display all topics\n", "print(\"All Topics:\")\n", "for i, (topic_id, doc) in enumerate(all_topics):\n", " print(f\"Topic {topic_id + 1}: - {doc}\")\n", "print(\"\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the last message says \"Yeah, it crashes right after I start the software\".\n", "\n", "This might be correctly routed by the semantic-router, particularly if the route is quite generic, intended for \"software\" and/or \"crashes\".\n", "\n", "However, as an illustrative example, what if the routes were \n", "\n", "Route A: \"Sotware Crashes - v3.11\"\n", "\n", "Route B: \"Computer Crashes - v3.11\"\n", "\n", "If just the last utterance was used, then Route A would likely be chosen. However, if instead every utterance from the last topic (Topic 4), concatenated together, were sent to the semantic-router, then this important additional context would most likely result in Route A being chosen.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Topic Splitting After Topic Continuation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that topics can be continued even after `conversation.split_by_topic()` has already been run. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add some new messages." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Define the IT support dialogue\n", "messages = [\n", " Message(role=\"user\", content=\"What do the system logs say, right before the crash?\"),\n", " Message(role=\"user\", content=\"I'll check soon, but first let's talk refund.\"),\n", " Message(role=\"bot\", content=\"Okay let me sort out a refund.\"),\n", "]\n", "\n", "# Add messages to the conversation\n", "conversation.add_new_messages(messages)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All Topics:\n", "Topic 2: - user: Hi, there, please can you confirm your full name\n", "Topic 2: - user: Hi, my name is John Doe.\n", "Topic 3: - bot: Okay, how can I help you today?\n", "Topic 4: - user: My computer keeps crashing\n", "Topic 4: - bot: Okay, is our software running when the computer crashes.\n", "Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.\n", "Topic 4: - user: What do the system logs say, right before the crash?\n", "Topic 5: - user: I'll check soon, but first let's talk refund.\n", "Topic 5: - bot: Okay let me sort out a refund.\n", "\n", "\n" ] } ], "source": [ "conversation.configure_splitter(\n", " encoder=cohere_encoder, \n", " threshold=0.5, \n", " split_method=\"cumulative_similarity\"\n", ")\n", "\n", "all_topics, new_topics = conversation.split_by_topic()\n", "\n", "# Display all topics\n", "print(\"All Topics:\")\n", "for i, (topic_id, doc) in enumerate(all_topics):\n", " print(f\"Topic {topic_id + 1}: - {doc}\")\n", "print(\"\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, we:\n", "\n", "1) Added the first six messages, as seen above, to the `Conversation`.\n", "2) Ran the Topic Splitter.\n", "3) Added the last two messages to the `Conversation`.\n", "4) Ran the Topic Splitter again.\n", "\n", "Despite \"user: Yeah, v.3.11.2 is running when it crashes\" and \"user: What do the system logs say, right before the crash?\" being added and separately, and despite the conversation splitter being run twice (once before user: What do the system logs say, right before the crash?\" was added, and once after), both these utterances were successfully assigned the same Topic - `Topic 4`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "semantic_splitter_1", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 2 }