# Split Conversations by Topic

Topic splitters are implemented in `semantic-router/splitters`.

These allow a set of utterances to be automatically grouped/clustered into (un-labelled) topics. 
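To build intuition for how a similarity threshold can group utterances, here is a minimal sketch. It is not semantic-router's implementation: the `embed`, `cosine`, and `split_by_similarity` helpers are hypothetical, and the bag-of-words "embedding" is a toy stand-in for a real encoder. Each new utterance is compared against the accumulated text of the current topic, and a new topic is started when the similarity falls below the threshold.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_by_similarity(utterances, threshold):
    """Group consecutive utterances; start a new topic when similarity drops."""
    topics = []
    for utterance in utterances:
        if topics:
            # Compare against everything accumulated in the current topic.
            context = " ".join(topics[-1])
            if cosine(embed(context), embed(utterance)) >= threshold:
                topics[-1].append(utterance)
                continue
        topics.append([utterance])
    return topics


utterances = [
    "my computer keeps crashing",
    "the computer keeps crashing after login",
    "i would like a refund",
]
topics = split_by_similarity(utterances, threshold=0.3)
for index, group in enumerate(topics, start=1):
    print(f"Topic {index}: {group}")
```

With a real encoder the similarity scores would reflect meaning rather than shared words, so a suitable threshold would need to be tuned for that encoder.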

Additionally, splitters have been integrated with `Conversation` objects, allowing conversations to be progressively split by topic as they evolve. This is beneficial to routing, as earlier messages in a conversation topic might provide useful context when determining routes. Using all utterances in the latest conversation topic, rather than only the last utterance, supplies this additional context and allows correct routes to be chosen more reliably.

## Example: IT Support Dialogue

### Setup

First, we import the necessary classes and initialize the conversation with dialogue.

In [1]:
from semantic_router.text import Conversation
from semantic_router.schema import Message

# Initialize the Conversation
conversation = Conversation()

# Define the IT support dialogue
messages = [
    Message(role="user", content="Hi, there, please can you confirm your full name"),
    Message(role="user", content="Hi, my name is John Doe."),
    Message(role="bot", content="Okay, how can I help you today?"),
    Message(role="user", content="My computer keeps crashing"),
    Message(role="bot", content="Okay, is our software running when the computer crashes."),
    Message(role="user", content="Yeah, v.3.11.2 is running when it crashes."),
]

# Add messages to the conversation
conversation.add_new_messages(messages)


### Initialize an Encoder

In [2]:
from semantic_router.encoders.cohere import CohereEncoder

cohere_encoder = CohereEncoder(
    name="embed-english-v3.0",
    cohere_api_key="",  # insert your Cohere API key here
    input_type="search_document",
)

### Split Conversation by Topic

In [3]:
conversation.configure_splitter(
    encoder=cohere_encoder, 
    threshold=0.5, 
    split_method="cumulative_similarity"
)

all_topics, new_topics = conversation.split_by_topic()

# Display all topics
print("All Topics:")
for topic_id, doc in all_topics:
    print(f"Topic {topic_id + 1}: - {doc}")
print("\n")

All Topics:
Topic 2: - user: Hi, there, please can you confirm your full name
Topic 2: - user: Hi, my name is John Doe.
Topic 3: - bot: Okay, how can I help you today?
Topic 4: - user: My computer keeps crashing
Topic 4: - bot: Okay, is our software running when the computer crashes.
Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.




Notice that the last message says "Yeah, v.3.11.2 is running when it crashes."

This might be correctly routed by the semantic-router, particularly if the route is quite generic, intended for "software" and/or "crashes".

However, as an illustrative example, suppose the routes were:

Route A: "Software Crashes - v3.11"

Route B: "Computer Crashes - v3.11"

If only the last utterance were used, then Route A would likely be chosen. However, if every utterance from the last topic (Topic 4) were concatenated and sent to the semantic-router, this additional context (the user reported that their computer keeps crashing) would most likely result in Route B being chosen.
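The concatenation step can be sketched as follows. This assumes `all_topics` is a list of `(topic_id, doc)` pairs, as the print loop above suggests; the `last_topic_text` helper is hypothetical and not part of the library, and the raw topic IDs below are inferred from the displayed numbers minus one.

```python
def last_topic_text(all_topics):
    """Join every utterance that belongs to the most recent topic."""
    if not all_topics:
        return ""
    last_id = all_topics[-1][0]
    return " ".join(doc for topic_id, doc in all_topics if topic_id == last_id)


# Data mirroring the output above (assumed shape: (topic_id, doc) pairs).
all_topics = [
    (1, "user: Hi, there, please can you confirm your full name"),
    (1, "user: Hi, my name is John Doe."),
    (2, "bot: Okay, how can I help you today?"),
    (3, "user: My computer keeps crashing"),
    (3, "bot: Okay, is our software running when the computer crashes."),
    (3, "user: Yeah, v.3.11.2 is running when it crashes."),
]

query = last_topic_text(all_topics)
print(query)
```

The resulting string carries the "computer keeps crashing" context alongside the version number, and could be used as the routing query in place of the final utterance alone.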


### Topic Splitting After Topic Continuation

Note that topics can be continued even after `conversation.split_by_topic()` has already been run. 

Add some new messages.

In [4]:
# Define follow-up messages for the IT support dialogue
messages = [
    Message(role="user", content="What do the system logs say, right before the crash?"),
    Message(role="user", content="I'll check soon, but first let's talk refund."),
    Message(role="bot", content="Okay let me sort out a refund."),
]

# Add messages to the conversation
conversation.add_new_messages(messages)

In [5]:
conversation.configure_splitter(
    encoder=cohere_encoder, 
    threshold=0.5, 
    split_method="cumulative_similarity"
)

all_topics, new_topics = conversation.split_by_topic()

# Display all topics
print("All Topics:")
for topic_id, doc in all_topics:
    print(f"Topic {topic_id + 1}: - {doc}")
print("\n")

All Topics:
Topic 2: - user: Hi, there, please can you confirm your full name
Topic 2: - user: Hi, my name is John Doe.
Topic 3: - bot: Okay, how can I help you today?
Topic 4: - user: My computer keeps crashing
Topic 4: - bot: Okay, is our software running when the computer crashes.
Topic 4: - user: Yeah, v.3.11.2 is running when it crashes.
Topic 4: - user: What do the system logs say, right before the crash?
Topic 5: - user: I'll check soon, but first let's talk refund.
Topic 5: - bot: Okay let me sort out a refund.




As you can see, we:

1) Added the first six messages, as seen above, to the `Conversation`.
2) Ran the Topic Splitter.
3) Added the last three messages to the `Conversation`.
4) Ran the Topic Splitter again.

Despite "user: Yeah, v.3.11.2 is running when it crashes." and "user: What do the system logs say, right before the crash?" being added separately, and despite the conversation splitter being run twice (once before "user: What do the system logs say, right before the crash?" was added, and once after), both utterances were successfully assigned to the same topic, `Topic 4`.
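One way to check this programmatically is to group the `(topic_id, doc)` pairs by topic and confirm that both utterances land in the same group. This is a sketch only: `group_by_topic` is a hypothetical helper, and the data below mirrors the output above with raw topic IDs assumed to be the displayed numbers minus one.

```python
from collections import defaultdict


def group_by_topic(all_topics):
    """Collect utterances under their topic ID, preserving order."""
    grouped = defaultdict(list)
    for topic_id, doc in all_topics:
        grouped[topic_id].append(doc)
    return dict(grouped)


# Data mirroring the second output above (assumed shape: (topic_id, doc) pairs).
all_topics = [
    (1, "user: Hi, there, please can you confirm your full name"),
    (1, "user: Hi, my name is John Doe."),
    (2, "bot: Okay, how can I help you today?"),
    (3, "user: My computer keeps crashing"),
    (3, "bot: Okay, is our software running when the computer crashes."),
    (3, "user: Yeah, v.3.11.2 is running when it crashes."),
    (3, "user: What do the system logs say, right before the crash?"),
    (4, "user: I'll check soon, but first let's talk refund."),
    (4, "bot: Okay let me sort out a refund."),
]

grouped = group_by_topic(all_topics)
first_batch = "user: Yeah, v.3.11.2 is running when it crashes."
second_batch = "user: What do the system logs say, right before the crash?"

# True when both utterances share a topic, even though they were added separately.
same_topic = any(
    first_batch in docs and second_batch in docs for docs in grouped.values()
)
print(same_topic)
```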
