{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Split Conversations by Topic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Topic splitters are implemented in `semantic-router/splitters`.\n",
    "\n",
    "They automatically group (cluster) a set of utterances into unlabelled topics.\n",
    "\n",
    "Splitters are also integrated with `Conversation` objects, so a conversation can be progressively split by topic as it evolves. This benefits routing, because earlier messages in the same topic can provide useful context when choosing a route. Sending all utterances from the latest topic to the router, rather than only the last message, allows the correct route to be chosen more reliably."
   ]
  },
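  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To build intuition for what a similarity-based splitter does, here is a minimal, self-contained sketch. It is *not* the `semantic_router` implementation: `cosine` and `cumulative_split` are hypothetical helpers, and the toy 3-dimensional vectors stand in for real encoder embeddings. Each new utterance is compared with the current topic, and a new topic starts when the cosine similarity falls below `threshold`. As a simplifying assumption, the topic's cumulative context is approximated by the mean of its embeddings rather than, for example, by embedding the concatenated text.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def cosine(a, b):\n",
    "    # Cosine similarity between two equal-length, non-zero vectors\n",
    "    dot = sum(x * y for x, y in zip(a, b))\n",
    "    norm_a = math.sqrt(sum(x * x for x in a))\n",
    "    norm_b = math.sqrt(sum(y * y for y in b))\n",
    "    return dot / (norm_a * norm_b)\n",
    "\n",
    "\n",
    "def cumulative_split(embeddings, threshold=0.5):\n",
    "    # Hypothetical helper: returns one topic id per utterance, in order\n",
    "    if not embeddings:\n",
    "        return []\n",
    "    topic_ids = [0]\n",
    "    current = [embeddings[0]]  # embeddings belonging to the current topic\n",
    "    for vec in embeddings[1:]:\n",
    "        # Mean embedding of the current topic stands in for its cumulative context\n",
    "        centroid = [sum(col) / len(current) for col in zip(*current)]\n",
    "        if cosine(centroid, vec) < threshold:\n",
    "            topic_ids.append(topic_ids[-1] + 1)  # too dissimilar: start a new topic\n",
    "            current = [vec]\n",
    "        else:\n",
    "            topic_ids.append(topic_ids[-1])  # similar enough: same topic continues\n",
    "            current.append(vec)\n",
    "    return topic_ids\n",
    "\n",
    "\n",
    "# Toy embeddings: two greeting-like vectors followed by three crash-like vectors\n",
    "toy = [\n",
    "    [1.0, 0.1, 0.0],\n",
    "    [0.9, 0.2, 0.0],\n",
    "    [0.0, 1.0, 0.2],\n",
    "    [0.1, 0.9, 0.3],\n",
    "    [0.0, 0.8, 0.4],\n",
    "]\n",
    "print(cumulative_split(toy, threshold=0.5))  # [0, 0, 1, 1, 1]\n",
    "```\n",
    "\n",
    "Because a split happens whenever similarity drops below `threshold`, a higher `threshold` starts new topics more eagerly, while a lower one keeps more utterances together."
   ]
  },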
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: IT Support Dialogue"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we import the necessary classes, create a `Conversation`, and populate it with an IT support dialogue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from semantic_router.text import Conversation\n",
    "from semantic_router.schema import Message\n",
    "\n",
    "# Initialize the Conversation\n",
    "conversation = Conversation()\n",
    "\n",
    "# Define the IT support dialogue\n",
    "messages = [\n",
    "    Message(role=\"user\", content=\"Hi, there, please can you confirm your full name\"),\n",
    "    Message(role=\"user\", content=\"Hi, my name is John Doe.\"),\n",
    "    Message(role=\"bot\", content=\"Okay, how can I help you today?\"),\n",
    "    Message(role=\"user\", content=\"My computer keeps crashing\"),\n",
    "    Message(role=\"bot\", content=\"Okay, is our software running when the computer crashes.\"),\n",
    "    Message(role=\"user\", content=\"Yeah, v.3.11.2 is running when it crashes.\"),\n",
    "]\n",
    "\n",
    "# Add messages to the conversation\n",
    "conversation.add_new_messages(messages)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initialize an Encoder"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
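    "import os\n",
    "\n",
    "from semantic_router.encoders.cohere import CohereEncoder\n",
    "\n",
    "# Read the Cohere API key from the COHERE_API_KEY environment variable\n",
    "cohere_encoder = CohereEncoder(\n",
    "    name=\"embed-english-v3.0\",\n",
    "    cohere_api_key=os.getenv(\"COHERE_API_KEY\"),\n",
    "    input_type=\"search_document\",\n",
    ")"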
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Split Conversation by Topic"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All Topics:\n",
      "Topic 2: - user: Hi, there, please can you confirm your full name\n",
      "Topic 2: - user: Hi, my name is John Doe.\n",
      "Topic 3: - bot: Okay, how can I help you today?\n",
      "Topic 4: - user: My computer keeps crashing\n",
      "Topic 4: - bot: Okay, is our software running when the computer crashes.\n",
      "Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "conversation.configure_splitter(\n",
    "    encoder=cohere_encoder,\n",
    "    threshold=0.5,\n",
    "    split_method=\"cumulative_similarity\",\n",
    ")\n",
    "\n",
    "all_topics, new_topics = conversation.split_by_topic()\n",
    "\n",
    "# Display all topics (each entry is a (topic_id, doc) pair)\n",
    "print(\"All Topics:\")\n",
    "for topic_id, doc in all_topics:\n",
    "    print(f\"Topic {topic_id + 1}: - {doc}\")\n",
    "print(\"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the last message says \"Yeah, v.3.11.2 is running when it crashes.\"\n",
    "\n",
    "On its own, this message might still be routed correctly by the semantic router, particularly if the matching route is fairly generic, e.g. intended for \"software\" and/or \"crashes\".\n",
    "\n",
    "However, as an illustrative example, suppose the routes were:\n",
    "\n",
    "Route A: \"Software Crashes - v3.11\"\n",
    "\n",
    "Route B: \"Computer Crashes - v3.11\"\n",
    "\n",
    "If only the last utterance were used, Route A would likely be chosen, since that utterance mentions the software version but not the computer. If instead every utterance from the last topic (Topic 4) were concatenated and sent to the semantic router, the additional context (\"My computer keeps crashing\") would most likely result in Route B being chosen.\n"
   ]
  },
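  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The concatenation step can be sketched in plain Python. This assumes, based on the printed output, that `all_topics` is a list of `(topic_id, doc)` pairs where each `doc` is a `role: content` string. `last_topic_text` is a hypothetical helper, and the hard-coded `example_topics` list simply copies the output above so the snippet runs without an API key; in the notebook you would pass `all_topics` instead and send the resulting text to your router.\n",
    "\n",
    "```python\n",
    "def last_topic_text(topics, sep=' '):\n",
    "    # Hypothetical helper: join every doc from the most recent (highest-id) topic\n",
    "    if not topics:\n",
    "        return ''\n",
    "    last_id = max(topic_id for topic_id, _ in topics)\n",
    "    return sep.join(doc for topic_id, doc in topics if topic_id == last_id)\n",
    "\n",
    "\n",
    "# (topic_id, doc) pairs copied from the printed output above;\n",
    "# the printed topic numbers are topic_id + 1, so Topic 4 has topic_id 3\n",
    "example_topics = [\n",
    "    (1, 'user: Hi, there, please can you confirm your full name'),\n",
    "    (1, 'user: Hi, my name is John Doe.'),\n",
    "    (2, 'bot: Okay, how can I help you today?'),\n",
    "    (3, 'user: My computer keeps crashing'),\n",
    "    (3, 'bot: Okay, is our software running when the computer crashes.'),\n",
    "    (3, 'user: Yeah, v.3.11.2 is running when it crashes.'),\n",
    "]\n",
    "\n",
    "query = last_topic_text(example_topics)\n",
    "print(query)\n",
    "```\n",
    "\n",
    "Selecting by the highest topic id, rather than by position, keeps the helper correct even if the pairs are not strictly ordered."
   ]
  },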
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Topic Splitting After Topic Continuation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Topics can also be continued after `conversation.split_by_topic()` has already been run."
   ]
  },
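  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way continuation can work is sketched below, under the assumption that earlier topic assignments stay fixed and only the most recent topic may absorb new messages. This is an illustration rather than the `Conversation` internals: `continue_topics`, the mean-embedding comparison, and the toy vectors are hypothetical stand-ins for real encoder output.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "\n",
    "def cosine(a, b):\n",
    "    # Cosine similarity between two equal-length, non-zero vectors\n",
    "    dot = sum(x * y for x, y in zip(a, b))\n",
    "    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))\n",
    "\n",
    "\n",
    "def continue_topics(topic_ids, embeddings, new_embeddings, threshold=0.5):\n",
    "    # Hypothetical helper: extend existing topic ids with ids for new utterances.\n",
    "    # Assumes at least one message was already split; earlier topics stay fixed\n",
    "    # and only the latest topic can absorb new messages.\n",
    "    topic_ids = list(topic_ids)  # copy so the caller's list is not mutated\n",
    "    last_id = topic_ids[-1]\n",
    "    # Embeddings already assigned to the latest topic seed its running context\n",
    "    current = [e for t, e in zip(topic_ids, embeddings) if t == last_id]\n",
    "    for vec in new_embeddings:\n",
    "        centroid = [sum(col) / len(current) for col in zip(*current)]\n",
    "        if cosine(centroid, vec) < threshold:\n",
    "            last_id += 1  # too dissimilar: open a new topic\n",
    "            current = [vec]\n",
    "        else:\n",
    "            current.append(vec)  # similar enough: continue the latest topic\n",
    "        topic_ids.append(last_id)\n",
    "    return topic_ids\n",
    "\n",
    "\n",
    "# Toy state after a first split: topics [0, 0, 1, 1, 1]\n",
    "old_vecs = [\n",
    "    [1.0, 0.1, 0.0],\n",
    "    [0.9, 0.2, 0.0],\n",
    "    [0.0, 1.0, 0.2],\n",
    "    [0.1, 0.9, 0.3],\n",
    "    [0.0, 0.8, 0.4],\n",
    "]\n",
    "old_ids = [0, 0, 1, 1, 1]\n",
    "\n",
    "# Two new utterances: one close to the crash topic, one on a different subject\n",
    "new_vecs = [[0.1, 0.9, 0.2], [0.0, 0.1, 1.0]]\n",
    "print(continue_topics(old_ids, old_vecs, new_vecs))  # [0, 0, 1, 1, 1, 1, 2]\n",
    "```\n",
    "\n",
    "The first new vector joins the existing crash topic even though it arrived in a later batch, while the second opens a new topic."
   ]
  },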
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add some new messages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Continue the IT support dialogue\n",
    "messages = [\n",
    "    Message(role=\"user\", content=\"What do the system logs say, right before the crash?\"),\n",
    "    Message(role=\"user\", content=\"I'll check soon, but first let's talk refund.\"),\n",
    "    Message(role=\"bot\", content=\"Okay let me sort out a refund.\"),\n",
    "]\n",
    "\n",
    "# Add messages to the conversation\n",
    "conversation.add_new_messages(messages)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All Topics:\n",
      "Topic 2: - user: Hi, there, please can you confirm your full name\n",
      "Topic 2: - user: Hi, my name is John Doe.\n",
      "Topic 3: - bot: Okay, how can I help you today?\n",
      "Topic 4: - user: My computer keeps crashing\n",
      "Topic 4: - bot: Okay, is our software running when the computer crashes.\n",
      "Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.\n",
      "Topic 4: - user: What do the system logs say, right before the crash?\n",
      "Topic 5: - user: I'll check soon, but first let's talk refund.\n",
      "Topic 5: - bot: Okay let me sort out a refund.\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "conversation.configure_splitter(\n",
    "    encoder=cohere_encoder,\n",
    "    threshold=0.5,\n",
    "    split_method=\"cumulative_similarity\",\n",
    ")\n",
    "\n",
    "all_topics, new_topics = conversation.split_by_topic()\n",
    "\n",
    "# Display all topics (each entry is a (topic_id, doc) pair)\n",
    "print(\"All Topics:\")\n",
    "for topic_id, doc in all_topics:\n",
    "    print(f\"Topic {topic_id + 1}: - {doc}\")\n",
    "print(\"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, we:\n",
    "\n",
    "1) Added the first six messages, shown above, to the `Conversation`.\n",
    "2) Ran the topic splitter.\n",
    "3) Added the next three messages to the `Conversation`.\n",
    "4) Ran the topic splitter again.\n",
    "\n",
    "Although \"user: Yeah, v.3.11.2 is running when it crashes.\" and \"user: What do the system logs say, right before the crash?\" were added separately, and the splitter was run twice (once before the second of these messages was added and once after), both utterances were assigned to the same topic, `Topic 4`. The refund messages, which change the subject, start a new topic, `Topic 5`.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "semantic_splitter_1",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}