{ "cells": [ { "cell_type": "markdown", "id": "e92c26d9", "metadata": {}, "source": [ "[](https://colab.research.google.com/github/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb) [](https://nbviewer.org/github/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb)" ] }, { "cell_type": "markdown", "id": "ee50410e-3f98-4d9c-8838-b38aebd6ce77", "metadata": {}, "source": [ "# Local Dynamic Routes\n", "\n", "## Fully local Semantic Router with `llama.cpp` and HuggingFace Encoder\n", "\n", "There are many reasons users might choose to roll their own LLMs rather than use a third-party service. Whether it's due to cost, privacy or compliance, Semantic Router supports the use of \"local\" LLMs through `llama.cpp`.\n", "\n", "Using `llama.cpp` also enables the use of quantized GGUF models, reducing the memory footprint of deployed models, allowing even 13-billion parameter models to run with hardware acceleration on an Apple M1 Pro chip.\n", "\n", "Below is an example of using semantic router with **Mistral-7B-Instruct**, quantized i." ] }, { "cell_type": "markdown", "id": "baa8d577-9f23-4dec-b167-fdecfb313c52", "metadata": {}, "source": [ "## Installing the library\n", "\n", "> Note: if you require hardware acceleration via BLAS, CUDA, Metal, etc. please refer to the [abetlen/llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-with-specific-hardware-acceleration-blas-cuda-metal-etc) repository README.md" ] }, { "cell_type": "code", "execution_count": null, "id": "f95e4906-c3e6-4905-8f13-5e67d67069d5", "metadata": {}, "outputs": [], "source": [ "!pip install -qU \"semantic-router[local]==0.0.20\"" ] }, { "cell_type": "markdown", "id": "0029cc6d", "metadata": {}, "source": [ "If you're running on Apple silicon you can run the following to run with Metal hardware acceleration:" ] }, { "cell_type": "code", "execution_count": 1, "id": "4f9b5729", "metadata": {}, "outputs": [], "source": [ "!CMAKE_ARGS=\"-DLLAMA_METAL=on\"" ] }, { "cell_type": "markdown", "id": "d2f52f11-ae6d-4706-8da3-ce03a7a6b92d", "metadata": {}, "source": [ "## Download the Mistral 7B Instruct 4-bit GGUF files\n", "\n", "We will be using Mistral 7B Instruct, quantized as a 4-bit GGUF file, a good balance between performance and ability to deploy on consumer hardware" ] }, { "cell_type": "code", "execution_count": null, "id": "1d6ddf61-c189-4b3b-99df-9508f830ae1f", "metadata": {}, "outputs": [], "source": [ "! curl -L \"https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_0.gguf?download=true\" -o ./mistral-7b-instruct-v0.2.Q4_0.gguf\n", "! 
ls mistral-7b-instruct-v0.2.Q4_0.gguf" ] }, { "cell_type": "markdown", "id": "f6842324-0a81-44fb-a220-905af77601af", "metadata": {}, "source": [ "## Initializing Dynamic Routes\n", "\n", "Similar to the `02-dynamic-routes.ipynb` notebook, we will initialize some dynamic routes that make use of LLMs for function calling." ] }, { "cell_type": "code", "execution_count": 2, "id": "e26db664-9dff-476a-84ef-edd7a8cdf1ba", "metadata": {}, "outputs": [], "source": [ "from datetime import datetime\n", "from zoneinfo import ZoneInfo\n", "\n", "from semantic_router import Route\n", "from semantic_router.utils.function_call import get_schema\n", "\n", "\n", "def get_time(timezone: str) -> str:\n", "    \"\"\"Finds the current time in a specific timezone.\n", "\n", "    :param timezone: The timezone to find the current time in, should\n", "        be a valid timezone from the IANA Time Zone Database like\n", "        \"America/New_York\" or \"Europe/London\". Do NOT put the place\n", "        name itself like \"rome\", or \"new york\", you must provide\n", "        the IANA format.\n", "    :type timezone: str\n", "    :return: The current time in the specified timezone.\"\"\"\n", "    now = datetime.now(ZoneInfo(timezone))\n", "    return now.strftime(\"%H:%M\")\n", "\n", "\n", "time_schema = get_schema(get_time)\n", "\n", "time = Route(\n", "    name=\"get_time\",\n", "    utterances=[\n", "        \"what is the time in new york city?\",\n", "        \"what is the time in london?\",\n", "        \"I live in Rome, what time is it?\",\n", "    ],\n", "    function_schema=time_schema,\n", ")\n", "\n", "politics = Route(\n", "    name=\"politics\",\n", "    utterances=[\n", "        \"isn't politics the best thing ever\",\n", "        \"why don't you tell me about your political opinions\",\n", "        \"don't you just love the president\",\n", "        \"don't you just hate the president\",\n", "        \"they're going to destroy this country!\",\n", "        \"they will save the country!\",\n", "    ],\n", ")\n", "\n", "chitchat = Route(\n", "    name=\"chitchat\",\n", "    utterances=[\n", "        \"how's the weather today?\",\n", "        \"how are things going?\",\n", "        \"lovely weather today\",\n", "        \"the weather is horrendous\",\n", "        \"let's go to the chippy\",\n", "    ],\n", ")\n", "\n", "routes = [politics, chitchat, time]" ] }, { "cell_type": "code", "execution_count": 3, "id": "fac95b0c-c61f-4158-b7d9-0221f7d0b65e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'name': 'get_time',\n", " 'description': 'Finds the current time in a specific timezone.\\n\\n:param timezone: The timezone to find the current time in, should\\n    be a valid timezone from the IANA Time Zone Database like\\n    \"America/New_York\" or \"Europe/London\". Do NOT put the place\\n    name itself like \"rome\", or \"new york\", you must provide\\n    the IANA format.\\n:type timezone: str\\n:return: The current time in the specified timezone.',\n", " 'signature': '(timezone: str) -> str',\n", " 'output': \"<class 'str'>\"}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "time_schema" ] }, { "cell_type": "markdown", "id": "ddd15620-92bd-4b77-99f4-c3fe68e9ab62", "metadata": {}, "source": [ "## Encoders\n", "\n", "You can use alternative encoders; however, in this example we want to showcase a fully local Semantic Router execution, so we are going to use a `HuggingFaceEncoder` with `sentence-transformers/all-MiniLM-L6-v2` (the default) as the embedding model."
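, "\n", "\nOnce the encoder is instantiated in the cell below, you can sanity-check it directly. This is a minimal sketch (the utterances and variable names here are illustrative, not part of the routing flow): encoders are callable on a list of texts and return one embedding per text.\n", "\n", "```python\n", "# embed two example utterances and inspect the result\n", "vectors = encoder([\"what is the time in london?\", \"lovely weather today\"])\n", "print(len(vectors), len(vectors[0]))  # expect 2 vectors of 384 dims for all-MiniLM-L6-v2\n", "```"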
] }, { "cell_type": "code", "execution_count": 4, "id": "5253c141-141b-4fda-b07c-a313393902ed", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "from semantic_router.encoders import HuggingFaceEncoder\n", "\n", "encoder = HuggingFaceEncoder()" ] }, { "cell_type": "markdown", "id": "512fb46e-352b-4740-971e-ad4d047aa03b", "metadata": {}, "source": [ "# `llama.cpp` LLM\n", "\n", "From here, we can go ahead and instantiate our `llama-cpp-python` `llama_cpp.Llama` LLM, and then pass it to the `semantic_router.llms.LlamaCppLLM` wrapper class.\n", "\n", "For `llama_cpp.Llama`, there are a couple of parameters you should pay attention to:\n", "\n", "- `n_gpu_layers`: how many LLM layers to offload to the GPU (if you want to offload the entire model, pass `-1`, and for CPU execution, pass `0`)\n", "- `n_ctx`: context size, limit the number of tokens that can be passed to the LLM (this is bounded by the model's internal maximum context size, in this case for Mistral-7B-Instruct, 8000 tokens)\n", "- `verbose`: if `False`, silences output from `llama.cpp`\n", "\n", "> For other parameter explanation, refer to the `llama-cpp-python` [API Reference](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/)" ] }, { "cell_type": "code", "execution_count": 5, "id": "772cec0d-7a0c-4c7e-9b7a-4a1864b0a8ec", "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from ./mistral-7b-instruct-v0.2.Q4_0.gguf (version GGUF V3 (latest))\n", "llama_model_loader: Dumping metadata keys/values. 
Note: KV overrides do not apply in this output.\n", "llama_model_loader: - kv 0: general.architecture str = llama\n", "llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.2\n", "llama_model_loader: - kv 2: llama.context_length u32 = 32768\n", "llama_model_loader: - kv 3: llama.embedding_length u32 = 4096\n", "llama_model_loader: - kv 4: llama.block_count u32 = 32\n", "llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336\n", "llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128\n", "llama_model_loader: - kv 7: llama.attention.head_count u32 = 32\n", "llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8\n", "llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010\n", "llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000\n", "llama_model_loader: - kv 11: general.file_type u32 = 2\n", "llama_model_loader: - kv 12: tokenizer.ggml.model str = llama\n", "llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = [\"<unk>\", \"<s>\", \"</s>\", \"<0x00>\", \"<...\n", "llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...\n", "llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n", "llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1\n", "llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2\n", "llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0\n", "llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0\n", "llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true\n", "llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false\n", "llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...\n", "llama_model_loader: - kv 23: general.quantization_version u32 = 2\n", "llama_model_loader: - type f32: 65 tensors\n", "llama_model_loader: - type q4_0: 225 tensors\n", "llama_model_loader: - type q6_K: 1 tensors\n", "llm_load_vocab: special tokens definition check successful ( 259/32000 ).\n", "llm_load_print_meta: format = GGUF V3 (latest)\n", "llm_load_print_meta: arch = llama\n", "llm_load_print_meta: vocab type = SPM\n", "llm_load_print_meta: n_vocab = 32000\n", "llm_load_print_meta: n_merges = 0\n", "llm_load_print_meta: n_ctx_train = 32768\n", "llm_load_print_meta: n_embd = 4096\n", "llm_load_print_meta: n_head = 32\n", "llm_load_print_meta: n_head_kv = 8\n", "llm_load_print_meta: n_layer = 32\n", "llm_load_print_meta: n_rot = 128\n", "llm_load_print_meta: n_embd_head_k = 128\n", "llm_load_print_meta: n_embd_head_v = 128\n", "llm_load_print_meta: n_gqa = 4\n", "llm_load_print_meta: n_embd_k_gqa = 1024\n", "llm_load_print_meta: n_embd_v_gqa = 1024\n", "llm_load_print_meta: f_norm_eps = 0.0e+00\n", "llm_load_print_meta: f_norm_rms_eps = 1.0e-05\n", "llm_load_print_meta: f_clamp_kqv = 0.0e+00\n", "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n", "llm_load_print_meta: n_ff = 14336\n", "llm_load_print_meta: n_expert = 0\n", "llm_load_print_meta: n_expert_used = 0\n", "llm_load_print_meta: rope scaling = linear\n", "llm_load_print_meta: freq_base_train = 1000000.0\n", "llm_load_print_meta: freq_scale_train = 1\n", "llm_load_print_meta: n_yarn_orig_ctx = 32768\n", "llm_load_print_meta: rope_finetuned = unknown\n", "llm_load_print_meta: model type = 7B\n", "llm_load_print_meta: model ftype = Q4_0\n", "llm_load_print_meta: model params = 7.24 B\n", 
"llm_load_print_meta: model size = 3.83 GiB (4.54 BPW) \n", "llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.2\n", "llm_load_print_meta: BOS token = 1 '<s>'\n", "llm_load_print_meta: EOS token = 2 '</s>'\n", "llm_load_print_meta: UNK token = 0 '<unk>'\n", "llm_load_print_meta: PAD token = 0 '<unk>'\n", "llm_load_print_meta: LF token = 13 '<0x0A>'\n", "llm_load_tensors: ggml ctx size = 0.11 MiB\n", "ggml_backend_metal_buffer_from_ptr: allocated buffer, size = 3918.58 MiB, ( 3918.64 / 21845.34)\n", "llm_load_tensors: system memory used = 3917.98 MiB\n", "..................................................................................................\n", "llama_new_context_with_model: n_ctx = 2048\n", "llama_new_context_with_model: freq_base = 1000000.0\n", "llama_new_context_with_model: freq_scale = 1\n", "ggml_metal_init: allocating\n", "ggml_metal_init: found device: Apple M1 Max\n", "ggml_metal_init: picking default device: Apple M1 Max\n", "ggml_metal_init: default.metallib not found, loading from source\n", "ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil\n", "ggml_metal_init: loading '/Users/jamesbriggs/opt/anaconda3/envs/decision-layer/lib/python3.11/site-packages/llama_cpp/ggml-metal.metal'\n", "ggml_metal_init: GPU name: Apple M1 Max\n", "ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)\n", "ggml_metal_init: hasUnifiedMemory = true\n", "ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB\n", "ggml_metal_init: maxTransferRate = built-in GPU\n", "ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 256.00 MiB, ( 4176.20 / 21845.34)\n", "llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB\n", "ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 0.02 MiB, ( 4176.22 / 21845.34)\n", "llama_build_graph: non-view tensors processed: 676/676\n", "llama_new_context_with_model: compute buffer total size = 159.19 MiB\n", "ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size = 156.02 MiB, ( 4332.22 / 21845.34)\n", "\u001b[32m2024-01-13 16:40:52 INFO semantic_router.utils.logger Initializing RouteLayer\u001b[0m\n" ] } ], "source": [ "from semantic_router import RouteLayer\n", "\n", "from llama_cpp import Llama\n", "from semantic_router.llms.llamacpp import LlamaCppLLM\n", "\n", "enable_gpu = True # offload LLM layers to the GPU (must fit in memory)\n", "\n", "_llm = Llama(\n", " model_path=\"./mistral-7b-instruct-v0.2.Q4_0.gguf\",\n", " n_gpu_layers=-1 if enable_gpu else 0,\n", " n_ctx=2048,\n", ")\n", "_llm.verbose=False\n", "llm = LlamaCppLLM(name=\"Mistral-7B-v0.2-Instruct\", llm=_llm, max_tokens=None)\n", "\n", "rl = RouteLayer(encoder=encoder, routes=routes, llm=llm)" ] }, { "cell_type": "code", "execution_count": 6, "id": "a8bd1da4-8ff7-4cd3-a5e3-fd79a938cc67", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RouteChoice(name='chitchat', function_call=None, similarity_score=None, trigger=None)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rl(\"how's the weather today?\")" ] }, { "cell_type": "code", "execution_count": 7, "id": "c6ccbea2-376b-4b28-9b79-d2e9c71e99f4", "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "from_string grammar:\n", "root ::= object \n", "object ::= [{] ws object_11 [}] ws \n", "value ::= object | array | string | number | value_6 ws \n", "array ::= [[] ws array_15 []] ws \n", "string ::= [\"] string_18 [\"] ws \n", "number ::= 
number_19 number_25 number_29 ws \n", "value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] \n", "ws ::= ws_31 \n", "object_8 ::= string [:] ws value object_10 \n", "object_9 ::= [,] ws string [:] ws value \n", "object_10 ::= object_9 object_10 | \n", "object_11 ::= object_8 | \n", "array_12 ::= value array_14 \n", "array_13 ::= [,] ws value \n", "array_14 ::= array_13 array_14 | \n", "array_15 ::= array_12 | \n", "string_16 ::= [^\"\\] | [\\] string_17 \n", "string_17 ::= [\"\\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] \n", "string_18 ::= string_16 string_18 | \n", "number_19 ::= number_20 number_21 \n", "number_20 ::= [-] | \n", "number_21 ::= [0-9] | [1-9] number_22 \n", "number_22 ::= [0-9] number_22 | \n", "number_23 ::= [.] number_24 \n", "number_24 ::= [0-9] number_24 | [0-9] \n", "number_25 ::= number_23 | \n", "number_26 ::= [eE] number_27 number_28 \n", "number_27 ::= [-+] | \n", "number_28 ::= [0-9] number_28 | [0-9] \n", "number_29 ::= number_26 | \n", "ws_30 ::= [ <U+0009><U+000A>] ws \n", "ws_31 ::= ws_30 | \n", "\n", "\u001b[32m2024-01-13 16:41:01 INFO semantic_router.utils.logger Extracting function input...\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "name='get_time' function_call={'timezone': 'America/New_York'} similarity_score=None trigger=None\n" ] }, { "data": { "text/plain": [ "'11:41'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "out = rl(\"what's the time in New York right now?\")\n", "print(out)\n", "get_time(**out.function_call)" ] }, { "cell_type": "code", "execution_count": 8, "id": "720f976a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "from_string grammar:\n", "root ::= object \n", "object ::= [{] ws object_11 [}] ws \n", "value ::= object | array | string | number | value_6 ws \n", "array ::= [[] ws array_15 []] ws \n", "string ::= [\"] string_18 [\"] ws \n", "number ::= number_19 number_25 number_29 ws \n", "value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] \n", "ws ::= ws_31 \n", "object_8 ::= string [:] ws value object_10 \n", "object_9 ::= [,] ws string [:] ws value \n", "object_10 ::= object_9 object_10 | \n", "object_11 ::= object_8 | \n", "array_12 ::= value array_14 \n", "array_13 ::= [,] ws value \n", "array_14 ::= array_13 array_14 | \n", "array_15 ::= array_12 | \n", "string_16 ::= [^\"\\] | [\\] string_17 \n", "string_17 ::= [\"\\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] \n", "string_18 ::= string_16 string_18 | \n", "number_19 ::= number_20 number_21 \n", "number_20 ::= [-] | \n", "number_21 ::= [0-9] | [1-9] number_22 \n", "number_22 ::= [0-9] number_22 | \n", "number_23 ::= [.] 
number_24 \n", "number_24 ::= [0-9] number_24 | [0-9] \n", "number_25 ::= number_23 | \n", "number_26 ::= [eE] number_27 number_28 \n", "number_27 ::= [-+] | \n", "number_28 ::= [0-9] number_28 | [0-9] \n", "number_29 ::= number_26 | \n", "ws_30 ::= [ <U+0009><U+000A>] ws \n", "ws_31 ::= ws_30 | \n", "\n", "\u001b[32m2024-01-13 16:41:04 INFO semantic_router.utils.logger Extracting function input...\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "name='get_time' function_call={'timezone': 'Europe/Rome'} similarity_score=None trigger=None\n" ] }, { "data": { "text/plain": [ "'17:41'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "out = rl(\"what's the time in Rome right now?\")\n", "print(out)\n", "get_time(**out.function_call)" ] }, { "cell_type": "code", "execution_count": 9, "id": "c9d9dbbb", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "from_string grammar:\n", "root ::= object \n", "object ::= [{] ws object_11 [}] ws \n", "value ::= object | array | string | number | value_6 ws \n", "array ::= [[] ws array_15 []] ws \n", "string ::= [\"] string_18 [\"] ws \n", "number ::= number_19 number_25 number_29 ws \n", "value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] \n", "ws ::= ws_31 \n", "object_8 ::= string [:] ws value object_10 \n", "object_9 ::= [,] ws string [:] ws value \n", "object_10 ::= object_9 object_10 | \n", "object_11 ::= object_8 | \n", "array_12 ::= value array_14 \n", "array_13 ::= [,] ws value \n", "array_14 ::= array_13 array_14 | \n", "array_15 ::= array_12 | \n", "string_16 ::= [^\"\\] | [\\] string_17 \n", "string_17 ::= [\"\\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] \n", "string_18 ::= string_16 string_18 | \n", "number_19 ::= number_20 number_21 \n", "number_20 ::= [-] | \n", "number_21 ::= [0-9] | [1-9] number_22 \n", "number_22 ::= [0-9] number_22 | \n", "number_23 ::= [.] 
number_24 \n", "number_24 ::= [0-9] number_24 | [0-9] \n", "number_25 ::= number_23 | \n", "number_26 ::= [eE] number_27 number_28 \n", "number_27 ::= [-+] | \n", "number_28 ::= [0-9] number_28 | [0-9] \n", "number_29 ::= number_26 | \n", "ws_30 ::= [ <U+0009><U+000A>] ws \n", "ws_31 ::= ws_30 | \n", "\n", "\u001b[32m2024-01-13 16:41:05 INFO semantic_router.utils.logger Extracting function input...\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "name='get_time' function_call={'timezone': 'Asia/Bangkok'} similarity_score=None trigger=None\n" ] }, { "data": { "text/plain": [ "'23:41'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "out = rl(\"what's the time in Bangkok right now?\")\n", "print(out)\n", "get_time(**out.function_call)" ] }, { "cell_type": "code", "execution_count": 10, "id": "675d12fd", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "from_string grammar:\n", "root ::= object \n", "object ::= [{] ws object_11 [}] ws \n", "value ::= object | array | string | number | value_6 ws \n", "array ::= [[] ws array_15 []] ws \n", "string ::= [\"] string_18 [\"] ws \n", "number ::= number_19 number_25 number_29 ws \n", "value_6 ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e] | [n] [u] [l] [l] \n", "ws ::= ws_31 \n", "object_8 ::= string [:] ws value object_10 \n", "object_9 ::= [,] ws string [:] ws value \n", "object_10 ::= object_9 object_10 | \n", "object_11 ::= object_8 | \n", "array_12 ::= value array_14 \n", "array_13 ::= [,] ws value \n", "array_14 ::= array_13 array_14 | \n", "array_15 ::= array_12 | \n", "string_16 ::= [^\"\\] | [\\] string_17 \n", "string_17 ::= [\"\\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] \n", "string_18 ::= string_16 string_18 | \n", "number_19 ::= number_20 number_21 \n", "number_20 ::= [-] | \n", "number_21 ::= [0-9] | [1-9] number_22 \n", "number_22 ::= [0-9] number_22 | \n", "number_23 ::= [.] number_24 \n", "number_24 ::= [0-9] number_24 | [0-9] \n", "number_25 ::= number_23 | \n", "number_26 ::= [eE] number_27 number_28 \n", "number_27 ::= [-+] | \n", "number_28 ::= [0-9] number_28 | [0-9] \n", "number_29 ::= number_26 | \n", "ws_30 ::= [ <U+0009><U+000A>] ws \n", "ws_31 ::= ws_30 | \n", "\n", "\u001b[32m2024-01-13 16:41:07 INFO semantic_router.utils.logger Extracting function input...\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "name='get_time' function_call={'timezone': 'Asia/Bangkok'} similarity_score=None trigger=None\n" ] }, { "data": { "text/plain": [ "'23:41'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "out = rl(\"what's the time in Phuket right now?\")\n", "print(out)\n", "get_time(**out.function_call)" ] }, { "cell_type": "markdown", "id": "5200f550-f3be-43d7-9b76-6390360f07c8", "metadata": {}, "source": [ "## Cleanup" ] }, { "cell_type": "markdown", "id": "76df5f53", "metadata": {}, "source": [ "Once done, if you'd like to delete the downloaded model you can do so with the following:\n", "\n", "```\n", "! rm ./mistral-7b-instruct-v0.2.Q4_0.gguf\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }