    Synchronizing the Route Layer with Indexes
    ==========================================
    
    
    The `RouteLayer` class is the main class in the semantic router package. It
    contains the routes and allows us to interact with the underlying index. Both
    the `RouteLayer` and the various index classes support synchronization
    strategies that allow us to synchronize the routes and utterances in the layer
    with the underlying index.
    
    This functionality becomes increasingly important when using the semantic
    router in a distributed environment, for example with one of the *remote
    instances* such as `PineconeIndex` or `QdrantIndex`. Choosing the right
    synchronization strategy for these remote indexes saves application time
    and reduces the risk of errors.
    
    Semantic router supports the following synchronization strategies:
    
    * `error`: Raise an error if local and remote are not synchronized.
    
    * `remote`: Take remote as the source of truth and update local to align.
    
    * `local`: Take local as the source of truth and update remote to align.
    
    
    * `merge-force-local`: Merge both local and remote, keeping local as the
      priority. Remote utterances are only merged into local *if* a matching route
      for the utterance is found in local; all other route-utterances are dropped.
      Where a route exists in both local and remote, but each contains different
      `function_schemas` or `metadata` information, the local version takes priority
      and local `function_schemas` and `metadata` are propagated to all remote
      utterances belonging to the given route.
    
    * `merge-force-remote`: Merge both local and remote, keeping remote as the
      priority. Local utterances are only merged into remote *if* a matching route
      for the utterance is found in remote; all other route-utterances are
      dropped. Where a route exists in both local and remote, but each contains
      different `function_schemas` or `metadata` information, the remote version
      takes priority and remote `function_schemas` and `metadata` are propagated to
      all local routes.
    
    * `merge`: Merge both local and remote, also merging local and remote
      utterances when a route with the same name is present both locally and
      remotely. If a route exists in both local and remote but contains different
      `function_schemas` or `metadata` information, the local version takes
      priority and local `function_schemas` and `metadata` are propagated to all
      remote routes.
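
    To make the differences between these strategies concrete, the sketch below
    models each side as a plain mapping from route name to a set of utterances
    and computes what both sides should hold after a sync. This is a simplified
    illustration, not the library's implementation: the `resolve` function and
    the `Routes` alias are hypothetical names, and `function_schemas` and
    `metadata` handling is left out.

```python
from typing import Dict, Set

# Hypothetical simplified model: route name -> set of utterances.
Routes = Dict[str, Set[str]]


def resolve(local: Routes, remote: Routes, mode: str) -> Routes:
    """Return the mapping both sides should hold after syncing with `mode`."""
    if mode == "error":
        if local != remote:
            raise ValueError("local and remote are not synchronized")
        return local
    if mode == "local":
        return {name: set(utts) for name, utts in local.items()}
    if mode == "remote":
        return {name: set(utts) for name, utts in remote.items()}
    if mode == "merge-force-local":
        # Keep only routes known locally; pull in remote utterances for them.
        return {name: utts | remote.get(name, set()) for name, utts in local.items()}
    if mode == "merge-force-remote":
        # Keep only routes known remotely; pull in local utterances for them.
        return {name: utts | local.get(name, set()) for name, utts in remote.items()}
    if mode == "merge":
        # Keep every route from either side and union their utterances.
        names = set(local) | set(remote)
        return {n: local.get(n, set()) | remote.get(n, set()) for n in names}
    raise ValueError(f"unknown sync mode: {mode}")


local = {"politics": {"a", "b"}, "chitchat": {"c"}}
remote = {"politics": {"b", "x"}, "sports": {"y"}}

for mode in ("local", "remote", "merge-force-local", "merge-force-remote", "merge"):
    print(mode, sorted(resolve(local, remote, mode)))
```

    In this toy example, `merge-force-local` drops the remote-only `sports`
    route, `merge-force-remote` drops the local-only `chitchat` route, and
    `merge` keeps all three routes.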
    
    
    
    There are two ways to specify the synchronization strategy. The first is to
    specify the strategy when initializing the `RouteLayer` object via the
    `auto_sync` parameter. The second is to trigger synchronization directly via
    the `RouteLayer.sync` method.
    
    ---
    
    Using the `auto_sync` parameter
    -------------------------------
    
    The `auto_sync` parameter is used to specify the synchronization strategy when
    initializing the `RouteLayer` object. Depending on the chosen strategy, the
    `RouteLayer` object will automatically synchronize with the defined index. As
    this happens on initialization, this will often increase the initialization
    time of the `RouteLayer` object.
    
    Let's see an example of `auto_sync` in action.
    
    
    .. code-block:: python

        import os

        # import paths may differ between semantic-router versions
        from semantic_router import Route
        from semantic_router.encoders import OpenAIEncoder
        from semantic_router.index.pinecone import PineconeIndex
        from semantic_router.layer import RouteLayer

        # assumption: API keys are provided through these environment variables
        openai_api_key = os.environ["OPENAI_API_KEY"]
        pinecone_api_key = os.environ["PINECONE_API_KEY"]

        # we could use this as a guide for our chatbot to avoid political conversations
        politics = Route(
            name="politics",
            utterances=[
                "isn't politics the best thing ever",
                "why don't you tell me about your political opinions",
                "don't you just love the president",
                "don't you just hate the president",
                "they're going to destroy this country!",
                "they will save the country!",
            ],
        )

        # this could be used as an indicator to our chatbot to switch to a more
        # conversational prompt
        chitchat = Route(
            name="chitchat",
            utterances=[
                "how's the weather today?",
                "how are things going?",
                "lovely weather today",
                "the weather is horrendous",
                "let's go to the chippy",
            ],
        )

        # we place both of our decisions together into a single list
        routes = [politics, chitchat]

        encoder = OpenAIEncoder(openai_api_key=openai_api_key)

        pc_index = PineconeIndex(
            api_key=pinecone_api_key,
            region="us-east-1",
            index_name="sync-example",
        )
        # before initializing the RouteLayer with auto_sync we should initialize
        # the index
        pc_index.index = pc_index._init_index(force_create=True)

        # now we can initialize the RouteLayer with local auto_sync
        rl = RouteLayer(
            encoder=encoder, routes=routes, index=pc_index,
            auto_sync="local"
        )
    
    Now we can run `rl.is_synced()` to confirm that our local and remote instances
    are synchronized.
    
    .. code-block:: python
    
        rl.is_synced()
    
    
    Checking for Synchronization
    ----------------------------
    
    
    To verify whether the local and remote instances are synchronized, you can use
    the `RouteLayer.is_synced` method. This method checks if the routes, utterances,
    and associated metadata in the local instance match those stored in the remote
    index.
    
    The `is_synced` method works in two steps. The first is our *fast* sync check.
    The fast check creates a hash of our local route layer which is constructed
    from:
    
    - `encoder_type` and `encoder_name`
    - `route` names
    - `route` utterances
    - `route` description
    - `route` function schemas (if any)
    - `route` llm (if any)
    - `route` score threshold
    - `route` metadata (if any)
    
    The fast check then compares this hash to the hash of the remote index. If
    the hashes match, we know that the local and remote instances are synchronized
    and we can return `True`. If the hashes do not match, we need to perform a
    *slow* sync check.
    
    The slow sync check works by creating a `LayerConfig` object from the remote
    index and then comparing this to our local `LayerConfig` object. If the two
    objects match, we know that the local and remote instances are synchronized and
    we can return `True`. If the two objects do not match, we must investigate and
    decide how to synchronize the two instances.
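
    The two-step check described above can be sketched in plain Python. This is
    a minimal illustration under stated assumptions, not the library's code:
    `config_hash` and `is_synced` are hypothetical helpers, the configuration is
    modelled as a JSON-serializable dictionary, and SHA-256 over sorted JSON is
    only one plausible way to build such a hash.

```python
import hashlib
import json
from typing import Any, Dict


def config_hash(config: Dict[str, Any]) -> str:
    """Hash a configuration deterministically (dictionary key order is ignored)."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_synced(local: Dict[str, Any], remote: Dict[str, Any], remote_hash: str) -> bool:
    # Fast check: compare the local hash with the hash stored for the remote index.
    if config_hash(local) == remote_hash:
        return True
    # Slow check: the stored hash may be stale, so compare the full configurations.
    return local == remote


# Illustrative configuration; field values are made up for the example.
local_config = {
    "encoder_type": "openai",
    "encoder_name": "example-encoder",
    "routes": [
        {"name": "chitchat", "utterances": ["how are things going?"], "score_threshold": 0.3},
    ],
}
remote_config = json.loads(json.dumps(local_config))  # deep copy of the local config

print(is_synced(local_config, remote_config, config_hash(remote_config)))

remote_config["routes"][0]["utterances"].append("lovely weather today")
print(is_synced(local_config, remote_config, config_hash(remote_config)))
```

    The first call succeeds on the fast path. After the remote gains an extra
    utterance, both the hash comparison and the full comparison fail, which is
    the point at which a synchronization strategy has to be chosen.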
    
    
    To quickly sync the local and remote instances we can use the `RouteLayer.sync`
    method. This method accepts the same strategies as the `auto_sync` parameter
    used when initializing the `RouteLayer` object. So, if we assume our local
    `RouteLayer` object contains the ground truth routes, we would use the `local`
    strategy to copy our local routes to the remote instance.
    
    .. code-block:: python
    
        rl.sync(sync_mode="local")
    
    After running the above code, we can check whether the local and remote
    instances are synchronized by rerunning `rl.is_synced()`, which should now
    return `True`.
    
    Investigating Synchronization Differences
    -----------------------------------------
    
    We often need to understand *why* our local and remote instances have become
    desynchronized. The first step in investigating and resolving synchronization
    differences is to see them. We can get a readable diff using the
    `RouteLayer.get_utterance_diff` method.
    
    .. code-block:: python
    
        diff = rl.get_utterance_diff()
    
    .. code-block:: python
    
        ["- politics: don't you just hate the president",
        "- politics: don't you just love the president",
        "- politics: isn't politics the best thing ever",
        '- politics: they will save the country!',
        "- politics: they're going to destroy this country!",
        "- politics: why don't you tell me about your political opinions",
        '+ chitchat: how\'s the weather today?',
        '+ chitchat: how are things going?',
        '+ chitchat: lovely weather today',
        '+ chitchat: the weather is horrendous',
        '+ chitchat: let\'s go to the chippy']
    
    
    The diff works by creating a list of all the routes in the remote index and
    then comparing these to the routes in our local instance. Any differences
    between the remote and local routes are shown in the above diff.
    
    Now, to resolve these differences we will need to initialize an `UtteranceDiff`
    object. This object will contain the differences between the remote and local
    utterances. We can then use this object to decide how to synchronize the two
    instances. To initialize the `UtteranceDiff` object we need to get our local
    and remote utterances.
    
    .. code-block:: python
    
        local_utterances = rl.to_config().to_utterances()
        remote_utterances = rl.index.get_utterances()
    
    We create an utterance diff object like so:
    
    .. code-block:: python
    
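        # import path may differ between semantic-router versions
        from semantic_router.schema import UtteranceDiff

        diff = UtteranceDiff.from_utterances(
            local_utterances=local_utterances, remote_utterances=remote_utterances
        )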
    
    `UtteranceDiff` objects include all diff information inside the `diff`
    attribute (which is a list of `Utterance` objects). Each `Utterance` object
    inside `UtteranceDiff.diff` now contains a populated `diff_tag` attribute,
    where:
    
    - `diff_tag='+'` indicates the utterance exists in the remote instance *only*.
    - `diff_tag='-'` indicates the utterance exists in the local instance *only*.
    - `diff_tag=' '` indicates the utterance exists in both the local and remote
      instances.
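
    The tagging rule above amounts to a set comparison over `(route, utterance)`
    pairs. The sketch below is a hedged illustration of that idea only:
    `TaggedUtterance`, `tag_utterances`, and `with_tag` are hypothetical names
    and do not mirror the actual `Utterance` or `UtteranceDiff` classes.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (route name, utterance)


@dataclass(frozen=True)
class TaggedUtterance:
    route: str
    utterance: str
    diff_tag: str


def tag_utterances(local: Set[Pair], remote: Set[Pair]) -> List[TaggedUtterance]:
    """Tag every pair as remote-only ('+'), local-only ('-'), or shared (' ')."""
    tagged = []
    for pair in sorted(local | remote):
        if pair in local and pair in remote:
            tag = " "
        elif pair in remote:
            tag = "+"
        else:
            tag = "-"
        tagged.append(TaggedUtterance(pair[0], pair[1], tag))
    return tagged


def with_tag(diff: List[TaggedUtterance], tag: str) -> List[TaggedUtterance]:
    """Filter tagged utterances by their diff tag."""
    return [u for u in diff if u.diff_tag == tag]


local = {
    ("politics", "they will save the country!"),
    ("chitchat", "how are things going?"),
}
remote = {
    ("chitchat", "how are things going?"),
    ("chitchat", "lovely weather today"),
}

diff = tag_utterances(local, remote)
for u in diff:
    print(f"{u.diff_tag} {u.route}: {u.utterance}")
```

    Here the `politics` utterance is tagged `-` because it exists only locally,
    `lovely weather today` is tagged `+` because it exists only remotely, and the
    shared `chitchat` utterance carries the blank tag.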
    
    After initializing an `UtteranceDiff` object we can get all utterances with
    each diff tag like so:
    
    .. code-block:: python
    
        # all utterances that exist only in remote
        diff.get_utterances(diff_tag='+')
    
        # all utterances that exist only in local
        diff.get_utterances(diff_tag='-')
    
        # all utterances that exist in both local and remote
        diff.get_utterances(diff_tag=' ')
    
    
    These can be investigated if needed. Once we're happy with our understanding
    of the differences, we can resolve them by running the
    `RouteLayer._execute_sync_strategy` method:
    
    .. code-block:: python
    
        rl._execute_sync_strategy(sync_mode="local")
    
    Once complete, we can confirm that our local and remote instances are
    synchronized by running `rl.is_synced()`:
    
    .. code-block:: python
    
        rl.is_synced()
    
    If the above returns `True` we are now synchronized!
    
    
    .. code-block::
    
                          .=                
                         :%%*               
                        -%%%%#              
                       =%%%%%%#.            
                      +%%%%%%%+             
                     *%%%%%%%=              
                   .#%%%%%%%-               
                  .#%%%%%%%: -%:            
                 :%%%%%%%#. =%%%=           
                -%%%%%%%#  *%%%%%+          
               =%%%%%%%*  -%%%%%%%*         
              .-------:    -%%%%%%%#        
        :*****************+ :%%%%%%%#.      
       -%%%%%%%%%%%%%%%%%%%* .#%%%%%%%:     
      =%%%%%%%%%%%%%%%%%%%%%#..#%%%%%%%-    
     +%%%%%%%%%%%%%%%%%%%%%%%#. *%%%%%%%=   
                                 +%%%%%%%+  
                                  =#######+