{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e92c26d9",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee50410e-3f98-4d9c-8838-b38aebd6ce77",
   "metadata": {},
   "source": [
    "# Local Dynamic Routes\n",
    "\n",
    "## Fully local Semantic Router with `llama.cpp` and HuggingFace Encoder\n",
    "\n",
    "There are many reasons users might choose to roll their own LLMs rather than use a third-party service. Whether it's due to cost, privacy or compliance, Semantic Router supports the use of \"local\" LLMs through `llama.cpp`.\n",
    "\n",
    "Using `llama.cpp` also enables the use of quantized GGUF models, reducing the memory footprint of deployed models, allowing even 13-billion parameter models to run with hardware acceleration on an Apple M1 Pro chip.\n",
    "\n",
    "Below is an example of using semantic router with **Mistral-7B-Instruct**, quantized i."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "baa8d577-9f23-4dec-b167-fdecfb313c52",
   "metadata": {},
   "source": [
    "## Installing the library\n",
    "\n",
    "> Note: if you require hardware acceleration via BLAS, CUDA, Metal, etc. please refer to the [abetlen/llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-with-specific-hardware-acceleration-blas-cuda-metal-etc) repository README.md"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f95e4906-c3e6-4905-8f13-5e67d67069d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU \"semantic-router[local]==0.0.20\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0029cc6d",
   "metadata": {},
   "source": [
    "If you're running on Apple silicon you can run the following to run with Metal hardware acceleration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4f9b5729",
   "metadata": {},
   "outputs": [],
   "source": [
    "!CMAKE_ARGS=\"-DLLAMA_METAL=on\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2f52f11-ae6d-4706-8da3-ce03a7a6b92d",
   "metadata": {},
   "source": [
    "## Download the Mistral 7B Instruct 4-bit GGUF files\n",
    "\n",
    "We will be using Mistral 7B Instruct, quantized as a 4-bit GGUF file, a good balance between performance and ability to deploy on consumer hardware"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d6ddf61-c189-4b3b-99df-9508f830ae1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "! curl -L \"https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_0.gguf?download=true\" -o ./mistral-7b-instruct-v0.2.Q4_0.gguf\n",
    "! ls mistral-7b-instruct-v0.2.Q4_0.gguf"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6842324-0a81-44fb-a220-905af77601af",
   "metadata": {},
   "source": [
    "# Initializing Dynamic Routes\n",
    "\n",
    "Similar to the `02-dynamic-routes.ipynb` notebook, we will be initializing some dynamic routes that make use of LLMs for function calling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "e26db664-9dff-476a-84ef-edd7a8cdf1ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "from zoneinfo import ZoneInfo\n",
    "\n",
    "from semantic_router import Route\n",
    "from semantic_router.utils.function_call import get_schema\n",
    "\n",
    "\n",
    "def get_time(timezone: str) -> str:\n",
    "    \"\"\"Finds the current time in a specific timezone.\n",
    "\n",
    "    :param timezone: The timezone to find the current time in, should\n",
    "        be a valid timezone from the IANA Time Zone Database like\n",
    "        \"America/New_York\" or \"Europe/London\". Do NOT put the place\n",
    "        name itself like \"rome\", or \"new york\", you must provide\n",
    "        the IANA format.\n",
    "    :type timezone: str\n",
    "    :return: The current time in the specified timezone.\"\"\"\n",
    "    now = datetime.now(ZoneInfo(timezone))\n",
    "    return now.strftime(\"%H:%M\")\n",
    "\n",
    "\n",
    "time_schema = get_schema(get_time)\n",
    "time_schema\n",
    "time = Route(\n",
    "    name=\"get_time\",\n",
    "    utterances=[\n",
    "        \"what is the time in new york city?\",\n",
    "        \"what is the time in london?\",\n",
    "        \"I live in Rome, what time is it?\",\n",
    "    ],\n",
    "    function_schema=time_schema,\n",
    ")\n",
    "\n",
    "politics = Route(\n",
    "    name=\"politics\",\n",
    "    utterances=[\n",
    "        \"isn't politics the best thing ever\",\n",
    "        \"why don't you tell me about your political opinions\",\n",
    "        \"don't you just love the president\" \"don't you just hate the president\",\n",
    "        \"they're going to destroy this country!\",\n",
    "        \"they will save the country!\",\n",
    "    ],\n",
    ")\n",
    "chitchat = Route(\n",
    "    name=\"chitchat\",\n",
    "    utterances=[\n",
    "        \"how's the weather today?\",\n",
    "        \"how are things going?\",\n",
    "        \"lovely weather today\",\n",
    "        \"the weather is horrendous\",\n",
    "        \"let's go to the chippy\",\n",
    "    ],\n",
    ")\n",
    "\n",
    "routes = [politics, chitchat, time]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "fac95b0c-c61f-4158-b7d9-0221f7d0b65e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'name': 'get_time',\n",
       " 'description': 'Finds the current time in a specific timezone.\\n\\n:param timezone: The timezone to find the current time in, should\\n    be a valid timezone from the IANA Time Zone Database like\\n    \"America/New_York\" or \"Europe/London\". Do NOT put the place\\n    name itself like \"rome\", or \"new york\", you must provide\\n    the IANA format.\\n:type timezone: str\\n:return: The current time in the specified timezone.',\n",
       " 'signature': '(timezone: str) -> str',\n",
       " 'output': \"<class 'str'>\"}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "time_schema"
   ]
  },
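  {
   "cell_type": "markdown",
   "id": "a1f4c2e8",
   "metadata": {},
   "source": [
    "Before wiring `get_time` into the route layer, we can sanity-check it by calling it directly with an IANA timezone string (a minimal check; the exact output depends on when you run it):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7d9e3f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# call the function directly to confirm it resolves the timezone and formats the time\n",
    "get_time(\"America/New_York\")"
   ]
  },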
  {
   "cell_type": "markdown",
   "id": "ddd15620-92bd-4b77-99f4-c3fe68e9ab62",
   "metadata": {},
   "source": [
    "# Encoders\n",
    "\n",
    "You can use alternative Encoders, however, in this example we want to showcase a fully-local Semantic Router execution, so we are going to use a `HuggingFaceEncoder` with `sentence-transformers/all-MiniLM-L6-v2` (the default) as an embedding model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5253c141-141b-4fda-b07c-a313393902ed",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "from semantic_router.encoders import HuggingFaceEncoder\n",
    "\n",
    "encoder = HuggingFaceEncoder()"
   ]
  },
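  {
   "cell_type": "markdown",
   "id": "c4e8a9d2",
   "metadata": {},
   "source": [
    "As a quick check that the encoder runs locally, we can embed a sample utterance. This is a minimal sketch, assuming the encoder is callable on a list of strings as semantic-router encoders are; `all-MiniLM-L6-v2` produces 384-dimensional vectors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5f1b6a3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# embed one utterance and inspect how many vectors and dimensions we get back\n",
    "embeddings = encoder([\"what is the time in new york city?\"])\n",
    "print(len(embeddings), len(embeddings[0]))  # expect: 1 384"
   ]
  },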
  {
   "cell_type": "markdown",
   "id": "512fb46e-352b-4740-971e-ad4d047aa03b",
   "metadata": {},
   "source": [
    "# `llama.cpp` LLM\n",
    "\n",
    "From here, we can go ahead and instantiate our `llama-cpp-python` `llama_cpp.Llama` LLM, and then pass it to the `semantic_router.llms.LlamaCppLLM` wrapper class.\n",
    "\n",
    "For `llama_cpp.Llama`, there are a couple of parameters you should pay attention to:\n",
    "\n",
    "- `n_gpu_layers`: how many LLM layers to offload to the GPU (if you want to offload the entire model, pass `-1`, and for CPU execution, pass `0`)\n",
    "- `n_ctx`: context size, limit the number of tokens that can be passed to the LLM (this is bounded by the model's internal maximum context size, in this case for Mistral-7B-Instruct, 8000 tokens)\n",
    "- `verbose`: if `False`, silences output from `llama.cpp`\n",
    "\n",
    "> For other parameter explanation, refer to the `llama-cpp-python` [API Reference](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "772cec0d-7a0c-4c7e-9b7a-4a1864b0a8ec",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from ./mistral-7b-instruct-v0.2.Q4_0.gguf (version GGUF V3 (latest))\n",
      "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
      "llama_model_loader: - kv   0:                       general.architecture str              = llama\n",
      "llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2\n",
      "llama_model_loader: - kv   2:                       llama.context_length u32              = 32768\n",
      "llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096\n",
      "llama_model_loader: - kv   4:                          llama.block_count u32              = 32\n",
      "llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336\n",
      "llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128\n",
      "llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32\n",
      "llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8\n",
      "llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010\n",
      "llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000.000000\n",
      "llama_model_loader: - kv  11:                          general.file_type u32              = 2\n",
      "llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama\n",
      "llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32000]   = [\"<unk>\", \"<s>\", \"</s>\", \"<0x00>\", \"<...\n",
      "llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...\n",
      "llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n",
      "llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1\n",
      "llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2\n",
      "llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0\n",
      "llama_model_loader: - kv  19:            tokenizer.ggml.padding_token_id u32              = 0\n",
      "llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true\n",
      "llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = false\n",
      "llama_model_loader: - kv  22:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...\n",
      "llama_model_loader: - kv  23:               general.quantization_version u32              = 2\n",
      "llama_model_loader: - type  f32:   65 tensors\n",
      "llama_model_loader: - type q4_0:  225 tensors\n",
      "llama_model_loader: - type q6_K:    1 tensors\n",
      "llm_load_vocab: special tokens definition check successful ( 259/32000 ).\n",
      "llm_load_print_meta: format           = GGUF V3 (latest)\n",
      "llm_load_print_meta: arch             = llama\n",
      "llm_load_print_meta: vocab type       = SPM\n",
      "llm_load_print_meta: n_vocab          = 32000\n",
      "llm_load_print_meta: n_merges         = 0\n",
      "llm_load_print_meta: n_ctx_train      = 32768\n",
      "llm_load_print_meta: n_embd           = 4096\n",
      "llm_load_print_meta: n_head           = 32\n",
      "llm_load_print_meta: n_head_kv        = 8\n",
      "llm_load_print_meta: n_layer          = 32\n",
      "llm_load_print_meta: n_rot            = 128\n",
      "llm_load_print_meta: n_embd_head_k    = 128\n",
      "llm_load_print_meta: n_embd_head_v    = 128\n",
      "llm_load_print_meta: n_gqa            = 4\n",
      "llm_load_print_meta: n_embd_k_gqa     = 1024\n",
      "llm_load_print_meta: n_embd_v_gqa     = 1024\n",
      "llm_load_print_meta: f_norm_eps       = 0.0e+00\n",
      "llm_load_print_meta: f_norm_rms_eps   = 1.0e-05\n",
      "llm_load_print_meta: f_clamp_kqv      = 0.0e+00\n",
      "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n",
      "llm_load_print_meta: n_ff             = 14336\n",
      "llm_load_print_meta: n_expert         = 0\n",
      "llm_load_print_meta: n_expert_used    = 0\n",
      "llm_load_print_meta: rope scaling     = linear\n",
      "llm_load_print_meta: freq_base_train  = 1000000.0\n",
      "llm_load_print_meta: freq_scale_train = 1\n",
      "llm_load_print_meta: n_yarn_orig_ctx  = 32768\n",
      "llm_load_print_meta: rope_finetuned   = unknown\n",
      "llm_load_print_meta: model type       = 7B\n",
      "llm_load_print_meta: model ftype      = Q4_0\n",
      "llm_load_print_meta: model params     = 7.24 B\n",
      "llm_load_print_meta: model size       = 3.83 GiB (4.54 BPW) \n",
      "llm_load_print_meta: general.name     = mistralai_mistral-7b-instruct-v0.2\n",
      "llm_load_print_meta: BOS token        = 1 '<s>'\n",
      "llm_load_print_meta: EOS token        = 2 '</s>'\n",
      "llm_load_print_meta: UNK token        = 0 '<unk>'\n",
      "llm_load_print_meta: PAD token        = 0 '<unk>'\n",
      "llm_load_print_meta: LF token         = 13 '<0x0A>'\n",
      "llm_load_tensors: ggml ctx size       =    0.11 MiB\n",
      "ggml_backend_metal_buffer_from_ptr: allocated buffer, size =  3918.58 MiB, ( 3918.64 / 21845.34)\n",
      "llm_load_tensors: system memory used  = 3917.98 MiB\n",
      "..................................................................................................\n",
      "llama_new_context_with_model: n_ctx      = 2048\n",
      "llama_new_context_with_model: freq_base  = 1000000.0\n",
      "llama_new_context_with_model: freq_scale = 1\n",
      "ggml_metal_init: allocating\n",
      "ggml_metal_init: found device: Apple M1 Max\n",
      "ggml_metal_init: picking default device: Apple M1 Max\n",
      "ggml_metal_init: default.metallib not found, loading from source\n",
      "ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil\n",
      "ggml_metal_init: loading '/Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/llama_cpp/ggml-metal.metal'\n",
      "ggml_metal_init: GPU name:   Apple M1 Max\n",
      "ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)\n",
      "ggml_metal_init: hasUnifiedMemory              = true\n",
      "ggml_metal_init: recommendedMaxWorkingSetSize  = 22906.50 MB\n",
      "ggml_metal_init: maxTransferRate               = built-in GPU\n",
      "ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   256.00 MiB, ( 4176.20 / 21845.34)\n",
      "llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB\n",
      "ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, ( 4176.22 / 21845.34)\n",
      "llama_build_graph: non-view tensors processed: 676/676\n",
      "llama_new_context_with_model: compute buffer total size = 159.19 MiB\n",
      "ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   156.02 MiB, ( 4332.22 / 21845.34)\n",
      "\u001b[32m2024-01-13 16:40:52 INFO semantic_router.utils.logger Initializing RouteLayer\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "from semantic_router import RouteLayer\n",
    "\n",
    "from llama_cpp import Llama\n",
    "from semantic_router.llms.llamacpp import LlamaCppLLM\n",
    "\n",
    "enable_gpu = True  # offload LLM layers to the GPU (must fit in memory)\n",
    "\n",
    "_llm = Llama(\n",
    "    model_path=\"./mistral-7b-instruct-v0.2.Q4_0.gguf\",\n",
    "    n_gpu_layers=-1 if enable_gpu else 0,\n",
    "    n_ctx=2048,\n",
    ")\n",
    "_llm.verbose=False\n",
    "llm = LlamaCppLLM(name=\"Mistral-7B-v0.2-Instruct\", llm=_llm, max_tokens=None)\n",
    "\n",
    "rl = RouteLayer(encoder=encoder, routes=routes, llm=llm)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a8bd1da4-8ff7-4cd3-a5e3-fd79a938cc67",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RouteChoice(name='chitchat', function_call=None, similarity_score=None, trigger=None)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rl(\"how's the weather today?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "c6ccbea2-376b-4b28-9b79-d2e9c71e99f4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "from_string grammar:\n",
      "root ::= object \n",
      "object ::= [{] ws object_11 [}] ws \n",
      "value ::= object | array | string | number | value_6 ws \n",
      "array ::= [[] ws array_15 []] ws \n",
      "string ::= [\"] string_18 [\"] ws \n",
      "number ::= number_19 number_25 number_29 ws \n",
      "value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] \n",
      "ws ::= ws_31 \n",
      "object_8 ::= string [:] ws value object_10 \n",
      "object_9 ::= [,] ws string [:] ws value \n",
      "object_10 ::= object_9 object_10 | \n",
      "object_11 ::= object_8 | \n",
      "array_12 ::= value array_14 \n",
      "array_13 ::= [,] ws value \n",
      "array_14 ::= array_13 array_14 | \n",
      "array_15 ::= array_12 | \n",
      "string_16 ::= [^\"\\] | [\\] string_17 \n",
      "string_17 ::= [\"\\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] \n",
      "string_18 ::= string_16 string_18 | \n",
      "number_19 ::= number_20 number_21 \n",
      "number_20 ::= [-] | \n",
      "number_21 ::= [0-9] | [1-9] number_22 \n",
      "number_22 ::= [0-9] number_22 | \n",
      "number_23 ::= [.] number_24 \n",
      "number_24 ::= [0-9] number_24 | [0-9] \n",
      "number_25 ::= number_23 | \n",
      "number_26 ::= [eE] number_27 number_28 \n",
      "number_27 ::= [-+] | \n",
      "number_28 ::= [0-9] number_28 | [0-9] \n",
      "number_29 ::= number_26 | \n",
      "ws_30 ::= [ <U+0009><U+000A>] ws \n",
      "ws_31 ::= ws_30 | \n",
      "\n",
      "\u001b[32m2024-01-13 16:41:01 INFO semantic_router.utils.logger Extracting function input...\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name='get_time' function_call={'timezone': 'America/New_York'} similarity_score=None trigger=None\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'11:41'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "out = rl(\"what's the time in New York right now?\")\n",
    "print(out)\n",
    "get_time(**out.function_call)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "720f976a",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "from_string grammar:\n",
      "root ::= object \n",
      "object ::= [{] ws object_11 [}] ws \n",
      "value ::= object | array | string | number | value_6 ws \n",
      "array ::= [[] ws array_15 []] ws \n",
      "string ::= [\"] string_18 [\"] ws \n",
      "number ::= number_19 number_25 number_29 ws \n",
      "value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] \n",
      "ws ::= ws_31 \n",
      "object_8 ::= string [:] ws value object_10 \n",
      "object_9 ::= [,] ws string [:] ws value \n",
      "object_10 ::= object_9 object_10 | \n",
      "object_11 ::= object_8 | \n",
      "array_12 ::= value array_14 \n",
      "array_13 ::= [,] ws value \n",
      "array_14 ::= array_13 array_14 | \n",
      "array_15 ::= array_12 | \n",
      "string_16 ::= [^\"\\] | [\\] string_17 \n",
      "string_17 ::= [\"\\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] \n",
      "string_18 ::= string_16 string_18 | \n",
      "number_19 ::= number_20 number_21 \n",
      "number_20 ::= [-] | \n",
      "number_21 ::= [0-9] | [1-9] number_22 \n",
      "number_22 ::= [0-9] number_22 | \n",
      "number_23 ::= [.] number_24 \n",
      "number_24 ::= [0-9] number_24 | [0-9] \n",
      "number_25 ::= number_23 | \n",
      "number_26 ::= [eE] number_27 number_28 \n",
      "number_27 ::= [-+] | \n",
      "number_28 ::= [0-9] number_28 | [0-9] \n",
      "number_29 ::= number_26 | \n",
      "ws_30 ::= [ <U+0009><U+000A>] ws \n",
      "ws_31 ::= ws_30 | \n",
      "\n",
      "\u001b[32m2024-01-13 16:41:04 INFO semantic_router.utils.logger Extracting function input...\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name='get_time' function_call={'timezone': 'Europe/Rome'} similarity_score=None trigger=None\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'17:41'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "out = rl(\"what's the time in Rome right now?\")\n",
    "print(out)\n",
    "get_time(**out.function_call)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c9d9dbbb",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "from_string grammar:\n",
      "root ::= object \n",
      "object ::= [{] ws object_11 [}] ws \n",
      "value ::= object | array | string | number | value_6 ws \n",
      "array ::= [[] ws array_15 []] ws \n",
      "string ::= [\"] string_18 [\"] ws \n",
      "number ::= number_19 number_25 number_29 ws \n",
      "value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] \n",
      "ws ::= ws_31 \n",
      "object_8 ::= string [:] ws value object_10 \n",
      "object_9 ::= [,] ws string [:] ws value \n",
      "object_10 ::= object_9 object_10 | \n",
      "object_11 ::= object_8 | \n",
      "array_12 ::= value array_14 \n",
      "array_13 ::= [,] ws value \n",
      "array_14 ::= array_13 array_14 | \n",
      "array_15 ::= array_12 | \n",
      "string_16 ::= [^\"\\] | [\\] string_17 \n",
      "string_17 ::= [\"\\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] \n",
      "string_18 ::= string_16 string_18 | \n",
      "number_19 ::= number_20 number_21 \n",
      "number_20 ::= [-] | \n",
      "number_21 ::= [0-9] | [1-9] number_22 \n",
      "number_22 ::= [0-9] number_22 | \n",
      "number_23 ::= [.] number_24 \n",
      "number_24 ::= [0-9] number_24 | [0-9] \n",
      "number_25 ::= number_23 | \n",
      "number_26 ::= [eE] number_27 number_28 \n",
      "number_27 ::= [-+] | \n",
      "number_28 ::= [0-9] number_28 | [0-9] \n",
      "number_29 ::= number_26 | \n",
      "ws_30 ::= [ <U+0009><U+000A>] ws \n",
      "ws_31 ::= ws_30 | \n",
      "\n",
      "\u001b[32m2024-01-13 16:41:05 INFO semantic_router.utils.logger Extracting function input...\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name='get_time' function_call={'timezone': 'Asia/Bangkok'} similarity_score=None trigger=None\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'23:41'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "out = rl(\"what's the time in Bangkok right now?\")\n",
    "print(out)\n",
    "get_time(**out.function_call)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "675d12fd",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "from_string grammar:\n",
      "root ::= object \n",
      "object ::= [{] ws object_11 [}] ws \n",
      "value ::= object | array | string | number | value_6 ws \n",
      "array ::= [[] ws array_15 []] ws \n",
      "string ::= [\"] string_18 [\"] ws \n",
      "number ::= number_19 number_25 number_29 ws \n",
      "value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] \n",
      "ws ::= ws_31 \n",
      "object_8 ::= string [:] ws value object_10 \n",
      "object_9 ::= [,] ws string [:] ws value \n",
      "object_10 ::= object_9 object_10 | \n",
      "object_11 ::= object_8 | \n",
      "array_12 ::= value array_14 \n",
      "array_13 ::= [,] ws value \n",
      "array_14 ::= array_13 array_14 | \n",
      "array_15 ::= array_12 | \n",
      "string_16 ::= [^\"\\] | [\\] string_17 \n",
      "string_17 ::= [\"\\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] \n",
      "string_18 ::= string_16 string_18 | \n",
      "number_19 ::= number_20 number_21 \n",
      "number_20 ::= [-] | \n",
      "number_21 ::= [0-9] | [1-9] number_22 \n",
      "number_22 ::= [0-9] number_22 | \n",
      "number_23 ::= [.] number_24 \n",
      "number_24 ::= [0-9] number_24 | [0-9] \n",
      "number_25 ::= number_23 | \n",
      "number_26 ::= [eE] number_27 number_28 \n",
      "number_27 ::= [-+] | \n",
      "number_28 ::= [0-9] number_28 | [0-9] \n",
      "number_29 ::= number_26 | \n",
      "ws_30 ::= [ <U+0009><U+000A>] ws \n",
      "ws_31 ::= ws_30 | \n",
      "\n",
      "\u001b[32m2024-01-13 16:41:07 INFO semantic_router.utils.logger Extracting function input...\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name='get_time' function_call={'timezone': 'Asia/Bangkok'} similarity_score=None trigger=None\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'23:41'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "out = rl(\"what's the time in Phuket right now?\")\n",
    "print(out)\n",
    "get_time(**out.function_call)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5200f550-f3be-43d7-9b76-6390360f07c8",
   "metadata": {},
   "source": [
    "## Cleanup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76df5f53",
   "metadata": {},
   "source": [
    "Once done, if you'd like to delete the downloaded model you can do so with the following:\n",
    "\n",
    "```\n",
    "! rm ./mistral-7b-instruct-v0.2.Q4_0.gguf\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}