%% Cell type:markdown id: tags:
# Tool Calling 101:
Note: If you are looking for `3.2` Featherlight Model (1B and 3B) instructions, please see the respective notebook; this one covers the 3.1 models.
We briefly introduce the `3.2` models at the end.
Note: The new vision models behave the same as the `3.1` models when you are talking to them without an image.
This is part (1/2) in the tool calling series. This notebook covers the basics of what tool calling is and how to perform it with `Llama 3.1 models`.
Here's what you will learn in this notebook:
- Set up Groq to access the Llama 3.1 70B model
- Avoid common mistakes when performing tool-calling with Llama
- Understand prompt templates for tool calling
- Understand how the tool calls are handled under the hood
- 3.2 Model Tool Calling Format and Behaviour
In Part 2, we will learn how to build a system that can compare two papers for us.
%% Cell type:markdown id: tags:
## What is Tool Calling?
This approach was popularised by the [Gorilla](https://gorilla.cs.berkeley.edu) paper, which showed that Large Language Models can be fine-tuned on API examples to teach them to call an external API.
This is really cool because we can now use an LLM as the "brain" of a system and connect it to external systems to perform actions.
In simpler words, "Llama can order your pizza for you" :)
With the Llama 3.1 release, the models excel at tool calling and support `brave_search`, `wolfram_alpha` and `code_interpreter` out of the box.
However, first let's take a look at a common mistake.
%% Cell type:markdown id: tags:
#### Install and set up groq dependencies
- Install the `groq` package to access Llama model(s)
- Configure our client and authenticate with API Key(s). Note: PLEASE UPDATE YOUR KEY BELOW
%% Cell type:code id: tags:
``` python
#!pip3 install groq
%set_env GROQ_API_KEY=''
```
%% Cell type:code id: tags:
``` python
import os
from groq import Groq
# Create the Groq client
client = Groq(api_key='YOUR_API_KEY')
```
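%% Cell type:markdown id: tags:
If you prefer not to paste the key into a code cell, a minimal alternative sketch (assuming `GROQ_API_KEY` has been exported in your environment, e.g. via the `%set_env` line above without the quotes) is to read it from the environment:
%% Cell type:code id: tags:
``` python
import os
from groq import Groq

# Read the key from the environment instead of hard-coding it in the notebook
api_key = os.environ.get("GROQ_API_KEY")
if not api_key:
    raise ValueError("GROQ_API_KEY is not set - please export it before creating the client")
client = Groq(api_key=api_key)
```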
%% Cell type:markdown id: tags:
## Common Mistake of Tool-Calling: Incorrect Prompt Template
While Llama 3.1 works with tool-calling out of the box, a wrong prompt template can cause unexpected behaviour.
Sometimes, even superheroes need to be reminded of their powers.
Let's first try "forcing a prompt response from the model".
%% Cell type:markdown id: tags:
#### Note: Remember this is the WRONG template, please scroll to the next section to see the right approach if you are in a rushed copy-pasta sprint
This section will show you that the model will not use `brave_search` and `wolfram_alpha` out of the box unless the prompt template is set correctly.
Even if the model is asked to do so!
%% Cell type:code id: tags:
``` python
SYSTEM_PROMPT = """
Cutting Knowledge Date: December 2023
Today Date: 20 August 2024
You are a helpful assistant
"""
```
%% Cell type:code id: tags:
``` python
system_prompt = {}
chat_history = []
def model_chat(user_input: str, sys_prompt=SYSTEM_PROMPT, temperature: float = 0.7, max_tokens: int = 2048):
    chat_history = [
        {
            "role": "system",
            "content": sys_prompt
        }
    ]
    chat_history.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(model="llama-3.1-70b-versatile",
                                               messages=chat_history,
                                               max_tokens=max_tokens,
                                               temperature=temperature)
    chat_history.append({
        "role": "assistant",
        "content": response.choices[0].message.content
    })
    #print("Assistant:", response.choices[0].message.content)
    return response.choices[0].message.content
```
%% Cell type:markdown id: tags:
#### Asking the model about recent news
Since the prompt template is incorrect, it will answer from its knowledge-cutoff memory.
%% Cell type:code id: tags:
``` python
user_input = """
When is the next elden ring game coming out?
"""
print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
```
%% Output
Assistant: Unfortunately, I don't have information on a specific release date for the next Elden Ring game. However, I can tell you that there have been rumors and speculations about a potential sequel or DLC (Downloadable Content) for Elden Ring.
In June 2022, the game's director, Hidetaka Miyazaki, mentioned that FromSoftware, the developer of Elden Ring, was working on "multiple" new projects, but no official announcements have been made since then.
It's also worth noting that FromSoftware has a history of taking their time to develop new games, and the studio is known for its attention to detail and commitment to quality. So, even if there is a new Elden Ring game in development, it's likely that we won't see it anytime soon.
Keep an eye on official announcements from FromSoftware and Bandai Namco, the publisher of Elden Ring, for any updates on a potential sequel or new game in the series.
%% Cell type:markdown id: tags:
#### Asking the model about a Math problem
Again, the model answers from memory instead of making a tool call.
%% Cell type:code id: tags:
``` python
user_input = """
When is the square root of 23131231?
"""
print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
```
%% Output
Assistant: To find the square root of 23131231, I'll calculate it for you.
√23131231 ≈ 4813.61
%% Cell type:markdown id: tags:
#### Can we solve this using a reminder prompt?
%% Cell type:code id: tags:
``` python
user_input = """
When is the square root of 23131231?
Can you use a tool to solve the question?
"""
print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
```
%% Output
Assistant: I can use a mathematical tool to solve the question.
The square root of 23131231 is:
√23131231 ≈ 4810.51
%% Cell type:markdown id: tags:
Looks like we didn't get the wolfram_alpha call, let's try one more time with a stronger prompt:
%% Cell type:code id: tags:
``` python
user_input = """
When is the square root of 23131231?
Can you use a tool to solve the question?
Remember you have been trained on wolfram_alpha
"""
print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
```
%% Output
Assistant: I can use Wolfram Alpha to calculate the square root of 23131231.
According to Wolfram Alpha, the square root of 23131231 is:
√23131231 ≈ 4809.07
%% Cell type:markdown id: tags:
### Official Prompt Template
As you can see, the model doesn't perform tool-calling as expected above. This is because we are not following the recommended prompting format.
The Llama Stack is the go-to approach for using the Llama model family and building applications.
Let's first install the `llama_toolchain` Python package to have the Llama CLI available.
%% Cell type:code id: tags:
``` python
#!pip3 install llama-toolchain
```
%% Cell type:markdown id: tags:
#### Now we can learn about the various prompt formats available
When you run the cell below, you will see the available models; we can then check the details of model-specific prompts.
%% Cell type:code id: tags:
``` python
!llama model prompt-format
```
%% Output
Traceback (most recent call last):
  File "/opt/miniconda3/bin/llama", line 8, in <module>
    sys.exit(main())
    ^^^^^^
  File "/opt/miniconda3/lib/python3.12/site-packages/llama_toolchain/cli/llama.py", line 44, in main
    parser.run(args)
  File "/opt/miniconda3/lib/python3.12/site-packages/llama_toolchain/cli/llama.py", line 38, in run
    args.func(args)
  File "/opt/miniconda3/lib/python3.12/site-packages/llama_toolchain/cli/model/prompt_format.py", line 59, in _run_model_template_cmd
    raise argparse.ArgumentTypeError(
argparse.ArgumentTypeError: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
%% Cell type:code id: tags:
``` python
!llama model prompt-format -m Llama3.1-8B
```
%% Output
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Llama 3.1 - Prompt Formats ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

Tokens

Here is a list of special tokens that are supported by Llama 3.1:

 • <|begin_of_text|>: Specifies the start of the prompt
 • <|end_of_text|>: Model will cease to generate more tokens. This token is generated only by the
   base models.
 • <|finetune_right_pad_id|>: This token is used for padding text sequences to the same length in a
   batch.
%% Cell type:markdown id: tags:
## Tool Calling: Using the correct Prompt Template
With the `llama` CLI we have now seen the correct prompt format for the model.
%% Cell type:markdown id: tags:
If everything is set up correctly, the model should now wrap function calls with the `<|python_tag|>` token followed by the actual function call.
This allows you to manage your function calling logic accordingly.
Time to test the theory.
%% Cell type:code id: tags:
``` python
SYSTEM_PROMPT = """
Environment: iPython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 15 September 2024
"""
user_input = """
When is the next Elden ring game coming out?
"""
print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
```
%% Output
Assistant: <|python_tag|>brave_search.call(query="Elden Ring sequel release date")
%% Cell type:code id: tags:
``` python
user_input = """
What is the square root of 23131231?
"""
print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
```
%% Output
Assistant: <|python_tag|>wolfram_alpha.call(query="square root of 23131231")
%% Cell type:markdown id: tags:
### Using this knowledge in practice
A common misconception about tool calling is that the model can handle the tool call and get your output.
This is NOT TRUE: the actual tool call is something that you have to implement. With this knowledge, let's see how we can handle the tool call ourselves to answer our original question.
%% Cell type:code id: tags:
``` python
#!pip3 install brave-search
```
%% Cell type:code id: tags:
``` python
SYSTEM_PROMPT = """
Environment: iPython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 15 September 2024
"""
user_input = """
What is the square root of 23131231?
"""
print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
```
%% Output
Assistant: <|python_tag|>wolfram_alpha.call(query="square root of 23131231")
%% Cell type:code id: tags:
``` python
print(model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
output = model_chat(user_input, sys_prompt=SYSTEM_PROMPT)
```
%% Output
<|python_tag|>wolfram_alpha.call(query="square root of 23131231")
%% Cell type:code id: tags:
``` python
import re
# Extract the function name
fn_name = re.search(r'<\|python_tag\|>(\w+)\.', output).group(1)
# Extract the method
fn_call_method = re.search(r'\.(\w+)\(', output).group(1)
# Extract the arguments
fn_call_args = re.search(r'=\s*([^)]+)', output).group(1)
print(f"Function name: {fn_name}")
print(f"Method: {fn_call_method}")
print(f"Args: {fn_call_args}")
```
%% Output
Function name: wolfram_alpha
Method: call
Args: "square root of 23131231"
%% Cell type:markdown id: tags:
You can implement this in different ways but the idea is the same: the LLM emits an output with the `<|python_tag|>`, which should trigger your tool-calling mechanism.
This logic gets handled in your program, and the tool output is then passed back to the model so it can answer the user.
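%% Cell type:markdown id: tags:
To make that round trip concrete, here is a minimal sketch (not part of the original flow) that executes the parsed call and feeds the result back to the model. The `execute_tool_call` helper is hypothetical, and Python's `math.sqrt` stands in for a real Wolfram Alpha API call; since this version of `model_chat` starts a fresh history on every call, we restate the question and pass the tool output back as a plain user turn for simplicity.
%% Cell type:code id: tags:
``` python
import math

# Hypothetical executor: route the parsed tool name to a local implementation.
# math.sqrt is only a stand-in for calling the real wolfram_alpha API.
def execute_tool_call(fn_name: str, fn_call_args: str) -> str:
    query = fn_call_args.strip('"')
    if fn_name == "wolfram_alpha" and "square root of" in query:
        number = float(query.replace("square root of", "").strip())
        return str(math.sqrt(number))
    return f"No local implementation for tool: {fn_name}"

tool_result = execute_tool_call(fn_name, fn_call_args)

# Hand the tool output back to the model so it can phrase the final answer.
follow_up = model_chat(
    f"The user asked: {user_input.strip()}\n"
    f"The {fn_name} tool returned: {tool_result}\n"
    "Please answer the user's question using this result.",
    sys_prompt=SYSTEM_PROMPT,
)
print("Assistant:", follow_up)
```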
%% Cell type:markdown id: tags:
### Code interpreter
With the correct prompt template, the Llama model can output Python (as well as code in any other language that the model has been trained on).
%% Cell type:code id: tags:
``` python
user_input = """
If I can invest 400$ every month at 5% interest rate, how long would it take me to make a 100k$ in investments?
"""
print("Assistant:", model_chat(user_input, sys_prompt=SYSTEM_PROMPT))
```
%% Output
Assistant: <|python_tag|>import math
# Define the variables
monthly_investment = 400
interest_rate = 0.05
target_amount = 100000
# Calculate the number of months it would take to reach the target amount
months = 0
current_amount = 0
while current_amount < target_amount:
    current_amount += monthly_investment
    current_amount *= 1 + interest_rate / 12  # Compound interest
    months += 1
# Print the result
print(f"It would take {months} months, approximately {months / 12:.2f} years, to reach the target amount of ${target_amount:.2f}.")
%% Cell type:markdown id: tags:
Let's validate the output by running the code produced by the model:
%% Cell type:code id: tags:
``` python
# Define the variables
monthly_investment = 400
interest_rate = 0.05
target_amount = 100000
# Calculate the number of months it would take to reach the target amount
months = 0
current_amount = 0
while current_amount < target_amount:
    current_amount += monthly_investment
    current_amount *= 1 + interest_rate / 12  # Compound interest
    months += 1
# Print the result
print(f"It would take {months} months, approximately {months / 12:.2f} years, to reach the target amount of ${target_amount:.2f}.")
```
%% Output
It would take 172 months, approximately 14.33 years, to reach the target amount of $100000.00.
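%% Cell type:markdown id: tags:
As a quick sanity check (not part of the original notebook), the same answer can be obtained in closed form. The loop above deposits first and then applies one month of interest, which is an annuity-due, so the future value after n months is `FV = P * ((1+i)^n - 1) / i * (1+i)` with monthly rate `i`; solving for n and rounding up should reproduce the 172 months:
%% Cell type:code id: tags:
``` python
import math

monthly_investment = 400
annual_rate = 0.05
target_amount = 100_000

i = annual_rate / 12  # monthly rate, matching the loop above
# Solve FV = P * ((1+i)^n - 1) / i * (1+i) for n
n = math.log(1 + target_amount * i / (monthly_investment * (1 + i))) / math.log(1 + i)
print(f"Closed-form estimate: {math.ceil(n)} months (~{n / 12:.2f} years)")
```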
%% Cell type:markdown id: tags:
### 3.2 Models Custom Tool Prompt Format
%% Cell type:markdown id: tags:
Life is great because the Llama Team writes great docs for us, so we can conveniently copy-pasta examples from there :)
[Here](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2#-tool-calling-(1b/3b)-) are the docs for your reference that we will be using.
Exercise for the reader: Use `llama-toolchain` again to verify, like we did earlier, and then start the prompt engineering for the small Llamas.
%% Cell type:code id: tags:
``` python
function_definitions = """[
    {
        "name": "get_user_info",
        "description": "Retrieve details for a specific user by their unique identifier. Note that the provided function is in Python 3 syntax.",
        "parameters": {
            "type": "dict",
            "required": [
                "user_id"
            ],
            "properties": {
                "user_id": {
                    "type": "integer",
                    "description": "The unique identifier of the user. It is used to fetch the specific user details from the database."
                },
                "special": {
                    "type": "string",
                    "description": "Any special information or parameters that need to be considered while fetching user details.",
                    "default": "none"
                }
            }
        }
    }
]
"""
```
%% Cell type:code id: tags:
``` python
system_prompt = """You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the function can be used, point it out. If the given question lacks the parameters required by the function,
also point it out. You should only return the function call in tools call sections.
If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]\n
You SHOULD NOT include any other text in the response.
Here is a list of functions in JSON format that you can invoke.\n\n{functions}\n""".format(functions=function_definitions)
```
%% Cell type:code id: tags:
``` python
chat_history = []
def model_chat(user_input: str, sys_prompt=system_prompt, temperature: float = 0.7, max_tokens: int = 2048):
    chat_history = [
        {
            "role": "system",
            "content": sys_prompt
        }
    ]
    chat_history.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(model="llama-3.2-3b-preview",
                                               messages=chat_history,
                                               max_tokens=max_tokens,
                                               temperature=temperature)
    chat_history.append({
        "role": "assistant",
        "content": response.choices[0].message.content
    })
    #print("Assistant:", response.choices[0].message.content)
    return response.choices[0].message.content
```
%% Cell type:markdown id: tags:
Note: We are assuming the following structure for the dataset here:
- Name
- Email
- Age
- Color request
%% Cell type:code id: tags:
``` python
user_input = "Can you retrieve the details for the user with the ID 7890, who has black as their special request?"
print("Assistant:", model_chat(user_input, sys_prompt=system_prompt))
```
%% Output
Assistant: [get_user_info(user_id=7890, special='black')]
%% Cell type:markdown id: tags:
#### Dummy dataset to make sure our model stays happy :)
%% Cell type:code id: tags:
``` python
def get_user_info(user_id: int, special: str = "none") -> dict:
    # This is a mock database of users
    user_database = {
        7890: {"name": "Emma Davis", "email": "emma@example.com", "age": 31},
        1234: {"name": "Liam Wilson", "email": "liam@example.com", "age": 28},
        2345: {"name": "Olivia Chen", "email": "olivia@example.com", "age": 35},
        3456: {"name": "Noah Taylor", "email": "noah@example.com", "age": 42},
        4567: {"name": "Ava Martinez", "email": "ava@example.com", "age": 39},
        5678: {"name": "Ethan Brown", "email": "ethan@example.com", "age": 45},
        6789: {"name": "Sophia Kim", "email": "sophia@example.com", "age": 33},
        8901: {"name": "Mason Lee", "email": "mason@example.com", "age": 29},
        9012: {"name": "Isabella Garcia", "email": "isabella@example.com", "age": 37},
        1357: {"name": "James Johnson", "email": "james@example.com", "age": 41}
    }
    # Check if the user exists in our mock database
    if user_id in user_database:
        user_data = user_database[user_id]
        # Handle the 'special' parameter
        if special != "none":
            user_data["special_info"] = f"Special request: {special}"
        return user_data
    else:
        return {"error": "User not found"}
```
%% Cell type:code id: tags:
``` python
[get_user_info(user_id=7890, special='black')]
```
%% Output
[{'name': 'Emma Davis',
  'email': 'emma@example.com',
  'age': 31,
  'special_info': 'Special request: black'}]
%% Cell type:markdown id: tags:
### Handling Tool-Calling logic for the model
%% Cell type:markdown id: tags:
Hello Regex, my good old friend :)
With regex, we can write a simple way to handle tool calling and return either the model response or the tool call result.
%% Cell type:code id: tags:
``` python
import re
import json
# Assuming you have defined the get_user_info function and system_prompt above
chat_history = []

def process_response(response):
    function_call_pattern = r'\[(.*?)\((.*?)\)\]'
    function_calls = re.findall(function_call_pattern, response)
    if function_calls:
        processed_response = []
        for func_name, args_str in function_calls:
            args_dict = {}
            for arg in args_str.split(','):
                key, value = arg.split('=')
                key = key.strip()
                value = value.strip().strip("'")
                if value.isdigit():
                    value = int(value)
                args_dict[key] = value
            if func_name == 'get_user_info':
                result = get_user_info(**args_dict)
                processed_response.append(f"Function call result: {json.dumps(result, indent=2)}")
            else:
                processed_response.append(f"Unknown function: {func_name}")
        return "\n".join(processed_response)
    else:
        return response

def model_chat(user_input: str, sys_prompt=system_prompt, temperature: float = 0.7, max_tokens: int = 2048):
    global chat_history
    if not chat_history:
        chat_history = [
            {
                "role": "system",
                "content": sys_prompt
            }
        ]
    chat_history.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="llama-3.2-3b-preview",
        messages=chat_history,
        max_tokens=max_tokens,
        temperature=temperature
    )
    assistant_response = response.choices[0].message.content
    processed_response = process_response(assistant_response)
    chat_history.append({
        "role": "assistant",
        "content": assistant_response
    })
    return processed_response
```
%% Cell type:code id: tags:
``` python
user_input = "Can you retrieve the details for the user with the ID 7890, who has black as their special request?"
print("Assistant:", model_chat(user_input, sys_prompt=system_prompt))
```
%% Output
Assistant: Function call result: {
  "name": "Emma Davis",
  "email": "emma@example.com",
  "age": 31,
  "special_info": "Special request: black"
}
%% Cell type:code id: tags:
``` python
#fin
```
...
%% Cell type:markdown id: tags:
# Tool Calling 201: Llama to find Differences between two papers
The image below illustrates the demo in this notebook.
**Goal:** Use the `Meta-Llama-3.1-70b` model to find the differences between two papers
- Step 1: Take the user input query
- Step 2: Perform an internet search using the `tavily` API to fetch the arxiv ID(s) based on the user query
Note: `3.1` models support `brave_search`, but this notebook is also aimed at showcasing custom tools.
The above is important because the user query often differs from the paper name and arxiv ID; this will help us with the next step.
- Step 3: Use the web results to extract the arxiv ID(s) of the papers
We will use an 8b model here because who wants to deal with complex regex? That's the main use case of LLM(s), isn't it? :D
- Step 4: Use the `arxiv` API to download the PDF(s) of the papers in the user query (a rough sketch of steps 4 and 5 follows this list)
- Step 5: For ease, we will extract the first 80k words from each PDF and write them to a `.txt` file that we can summarise
- Step 6: Use instances of `Meta-Llama-3.1-8b` to summarise the two PDF(s)
- Step 7: Prompt the `70b` model to get the differences between the two papers being discussed
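%% Cell type:markdown id: tags:
Before defining the real pieces, here is a rough, illustrative sketch of what steps 4 and 5 could look like with the `arxiv` and `PyPDF2` packages. The helper name `process_arxiv_paper` and the 80k-word cap mirror the plan above, but this is only a sketch under those assumptions, not the notebook's final implementation.
%% Cell type:code id: tags:
``` python
import arxiv, PyPDF2

def process_arxiv_paper(arxiv_id: str, max_words: int = 80_000) -> str:
    """Download an arxiv paper by ID and save its first `max_words` words to a txt file."""
    # Step 4: fetch the paper metadata and download the PDF
    search = arxiv.Search(id_list=[arxiv_id])
    paper = next(arxiv.Client().results(search))
    pdf_path = paper.download_pdf(filename=f"{arxiv_id}.pdf")

    # Step 5: extract text page by page and keep only the first max_words words
    reader = PyPDF2.PdfReader(pdf_path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    words = text.split()[:max_words]

    txt_path = f"{arxiv_id}.txt"
    with open(txt_path, "w") as f:
        f.write(" ".join(words))
    return txt_path
```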
%% Cell type:markdown id: tags:
## Part 1: Defining the pieces
We will start by describing all the modules from the image above, to make sure our logic works.
In the second half of the notebook, we will write a simple function to take care of the function calling logic.
%% Cell type:markdown id: tags:
#### Install necessary libraries
%% Cell type:code id: tags:
``` python
#!pip3 install groq
#!pip3 install arxiv
#!pip3 install tavily-python
#!pip3 install llama-toolchain
#!pip3 install PyPDF2
```
%% Cell type:markdown id: tags:
#### Necessary imports
%% Cell type:markdown id: tags:
##### Note: PLEASE REPLACE API KEYS BELOW WITH YOUR REAL ONES
%% Cell type:code id: tags:
``` python
import os, arxiv, PyPDF2
from tavily import TavilyClient
from groq import Groq
# Create the Groq client
client = Groq(api_key='YOUR_API_KEY')
tavily_client = TavilyClient(api_key='YOUR_API_KEY')
```
%% Cell type:markdown id: tags:
#### Main LLM thread:
We will use a `MAIN_SYSTEM_PROMPT` and a `main_model_chat_history` to keep track of the discussion, since we are using 4 LLM instances alongside this one.
Note: if you paid attention and noticed that the SYSTEM_PROMPT here is different, thanks for reading closely! It's always a great idea to follow the official recommendations.
However, when it's a matter of writing complex regex, we can bend the rules slightly :D
Note: we will outline the functions here and define them as we go.
%% Cell type:code id: tags:
``` python
MAIN_SYSTEM_PROMPT = """
Environment: iPython
Cutting Knowledge Date: December 2023
Today Date: 15 September 2024
# Tool Instructions
- Always execute python code in messages that you share.
- When looking for real time information use relevant functions if available
You have access to the following functions:
Use the function 'query_for_two_papers' to: Get the internet query results for the arxiv ID of the two papers the user wants to compare
{
    "name": "query_for_two_papers",
    "description": "Internet search the arxiv ID of two papers that user wants to look up",
    "parameters": {
        "paper_1": {
            "param_type": "string",
            "description": "arxiv id of paper_name_1 from user query",
            "required": true
        },
        "paper_2": {
            "param_type": "string",
            "description": "arxiv id of paper_name_2 from user query",
            "required": true
        },
    }
}
Use the function 'get_arxiv_ids' to: Given a dict of websearch queries, use a LLM to return JUST the arxiv ID, which is otherwise harder to extract
{
    "name": "get_arxiv_ids",
    "description": "Use the dictionary returned from query_for_two_papers to ask a LLM to extract the arxiv IDs",
    "parameters": {
        "web_results": {
            "param_type": "dictionary",
            "description": "dictionary of search result for a query from the previous function",
            "required": true
        },
    }
}
Use the function 'process_arxiv_paper' to: Given the arxiv ID from the get_arxiv_ids function, download the paper and save a txt file of it that we can then use for summarising
{
    "name": "process_arxiv_paper",
    "description": "Use arxiv IDs extracted from earlier to be downloaded and saved to txt files",
    "parameters": {
        "arxiv_id": {
            "param_type": "string",
            "description": "arxiv ID of the paper that we want to download and save a txt file of",
            "required": true
        },
    }
}
Use the function 'summarize_text_file' to: Given the txt file name based on the arxiv IDs we are working with from earlier, get a summary of the paper being discussed
{
    "name": "summarize_text_file",
    "description": "Summarise the arxiv paper saved in the txt file",
    "parameters": {
        "file_name": {
            "param_type": "string",
            "description": "Filename to be used to get a summary of",
            "required": true
        },
    }
}
If you choose to call a function ONLY reply in the following format:
<{start_tag}={function_name}>{parameters}{end_tag}
where
start_tag => `<function`
parameters => a JSON dict with the function argument name as key and function argument value as value.
end_tag => `</function>`
Here is an example,
<function=example_function_name>{"example_name": "example_value"}</function>
Reminder:
- When the user is asking a question that requires your reasoning, DO NOT USE OR FORCE a function call
- Even if you remember the arxiv ID of papers from input, do not put that in the query_for_two_papers function call, pass the internet look up query
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- When returning a function call, don't add anything else to your response
"""
```
%% Cell type:code id: tags:
``` python
main_model_chat_history = [
    {
        "role": "system",
        "content": MAIN_SYSTEM_PROMPT
    }
]
```
%% Cell type:markdown id: tags:
#### Define the `model_chat` instance
We will be using this to handle all user input(s).
%% Cell type:code id: tags:
``` python
def model_chat(user_input: str, temperature: int = 0, max_tokens=2048):
    main_model_chat_history.append({"role": "user", "content": user_input})
    #print(chat_history)
    #print("User: ", user_input)
    response = client.chat.completions.create(model="llama-3.1-70b-versatile",
                                               messages=main_model_chat_history,
                                               max_tokens=max_tokens,
                                               temperature=temperature)
    main_model_chat_history.append({
        "role": "assistant",
        "content": response.choices[0].message.content
    })
    #print("Assistant:", response.choices[0].message.content)
    return response.choices[0].message.content
```
%% Cell type:code id: tags:
``` python
user_input = """
What are the differences between llama 3.1 and BERT?
"""
output = model_chat(user_input, temperature=1)
```
%% Cell type:code id: tags:
``` python
print(output)
```
%% Output
<function=query_for_two_papers>{"paper_1": "Llama", "paper_2": "BERT"}</function>
%% Cell type:markdown id: tags:
If you remember from `Tool_Calling_101.ipynb`, we need a way to extract and manage the tool call based on the response; the system prompt from earlier makes our lives easier when we do this later :)
First, let's validate the logic and define all the functions as we go:
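%% Cell type:markdown id: tags:
As a rough sketch of what that extraction could look like for the `<function=...>` format defined in `MAIN_SYSTEM_PROMPT` (the `parse_function_call` helper is illustrative and assumes the argument payload is valid JSON; the actual dispatcher comes later):
%% Cell type:code id: tags:
``` python
import re, json

def parse_function_call(response: str):
    """Return (function_name, args_dict) if the response is a <function=...> call, else None."""
    match = re.search(r"<function=(\w+)>(.*?)</function>", response, re.DOTALL)
    if not match:
        return None
    fn_name, payload = match.group(1), match.group(2)
    return fn_name, json.loads(payload)

# Example with the call the model produced above:
print(parse_function_call(output))
```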
%% Cell type:markdown id: tags:
#### Tavily API:
We will use the Tavily API to do a web query for the papers based on the model outputs.
%% Cell type:code id: tags:
``` python
def query_for_two_papers(paper_1: str, paper_2: str) -> list:
    return [tavily_client.search(f"arxiv id of {paper_1}"), tavily_client.search(f"arxiv id of {paper_2}")]
```
%% Cell type:code id: tags:
``` python
search_results = query_for_two_papers("llama 3.1", "BERT")
#search_results
```
%% Cell type:code id: tags:
``` python
user_input = f"""
Here are the search results for the first paper, extract the arxiv ID {search_results[0]}
"""
output = model_chat(user_input, temperature=1)
```
%% Cell type:code id: tags:
``` python
print(output)
```
%% Output
<function=get_arxiv_id>{"web_results": "{'query': 'arxiv id of llama 3.1', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'TheLlama3HerdofModels - arXiv.org', 'url': 'https://arxiv.org/pdf/2407.21783', 'content': 'arXiv:2407.21783v2 [cs.AI] 15 Aug 2024. Finetuned Multilingual Longcontext Tooluse Release ... The model architecture of Llama 3 is illustrated in Figure1. The development of our Llama 3 language modelscomprisestwomainstages:', 'score': 0.9955835, 'raw_content': None}, {'title': 'NousResearch/Meta-Llama-3.1-8B - Hugging Face', 'url': 'https://huggingface.co/NousResearch/Meta-Llama-3.1-8B', 'content': 'The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models in 8B, 70B and 405B sizes (text in/text out). The Llama 3.1 instruction tuned text only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available ...', 'score': 0.95379424, 'raw_content': None}, {'title': 'Introducing Llama 3.1: Our most capable models to date - Meta AI', 'url': 'https://ai.meta.com/blog/meta-llama-3-1/', 'content': 'Bringing open intelligence to all, our latest models expand context length to 128K, add support across eight languages, and include Llama 3.1 405B—the first frontier-level open source AI model. Llama 3.1 405B is in a class of its own, with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed source models.', 'score': 0.9003547, 'raw_content': None}, {'title': 'The Llama 3 Herd of Models | Research - AI at Meta', 'url': 'https://ai.meta.com/research/publications/the-llama-3-herd-of-models/', 'content': 'This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety.', 'score': 0.89460546, 'raw_content': None}, {'title': '[2407.21783] The Llama 3 Herd of Models - arXiv.org', 'url': 'https://arxiv.org/abs/2407.21783', 'content': 'Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive ...', 'score': 0.6841585, 'raw_content': None}], 'response_time': 2.09}"}</function>
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
user_input = f""" user_input = f"""
Here are the search results for the second paper now, extract the arxiv ID {search_results[1]} Here are the search results for the second paper now, extract the arxiv ID {search_results[1]}
""" """
output = model_chat(user_input, temperature=1) output = model_chat(user_input, temperature=1)
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
print(output) print(output)
``` ```
%% Output %% Output
<function=get_arxiv_id>{"web_results": "{'query': 'arxiv id of BERT', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': '[2103.11943] BERT: A Review of Applications in Natural Language ...', 'url': 'https://arxiv.org/abs/2103.11943', 'content': 'arXiv:2103.11943 (cs) [Submitted on 22 Mar 2021] BERT: A Review of Applications in Natural Language Processing and Understanding. M. V. Koroteev. In this review, we describe the application of one of the most popular deep learning-based language models - BERT. The paper describes the mechanism of operation of this model, the main areas of its ...', 'score': 0.99411184, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://aclanthology.org/N19-1423/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning ...', 'score': 0.9222025, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.', 'score': 0.87652874, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://arxiv.org/abs/1810.04805', 'content': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned ...', 'score': 0.66115755, 'raw_content': None}, {'title': 'A Primer in BERTology: What We Know About How BERT Works', 'url': 'https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT', 'content': 'The issue of model depth must be related to the information flow from the most task-specific layers closer to the classifier (Liu et al., 2019a), to the initial layers which appear to be the most task-invariant (Hao et al., 2019), and where the tokens resemble the input tokens the most (Brunner et al., 2020) For BERT, this has been achieved through experiments with loss functions (Sanh et al., 2019; Jiao et al., 2019), mimicking the activation patterns of individual portions of the teacher network (Sun et al., 2019a), and knowledge transfer at the pre-training (Turc et al., 2019; Jiao et al., 2019; Sun et al., 2020) or fine-tuning stage (Jiao et al., 2019). 
In particular, they were shown to rely on shallow heuristics in natural language inference (McCoy et al., 2019b; Zellers et al., 2019; Jin et al., 2020), reading comprehension (Si et al., 2019; Rogers et al., 2020; Sugawara et al., 2020; Yogatama et al., 2019), argument reasoning comprehension (Niven and Kao, 2019), and text classification (Jin et al., 2020). Several studies explored the possibilities of improving the fine-tuning of BERT:\\nTaking more layers into account: learning a complementary representation of the information in deep and output layers (Yang and Zhao, 2019), using a weighted combination of all layers instead of the final one (Su and Cheng, 2019; Kondratyuk and Straka, 2019), and layer dropout (Kondratyuk and Straka, 2019).\\n For BERT, Clark et al. (2019) observe that most heads in the same layer show similar self-attention patterns (perhaps related to the fact that the output of all self-attention heads in a layer is passed through the same MLP), which explains why Michel et al. (2019) were able to reduce most layers to a single head.\\n', 'score': 0.4248892, 'raw_content': None}], 'response_time': 2.16}"}</function> <function=get_arxiv_id>{"web_results": "{'query': 'arxiv id of BERT', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': '[2103.11943] BERT: A Review of Applications in Natural Language ...', 'url': 'https://arxiv.org/abs/2103.11943', 'content': 'arXiv:2103.11943 (cs) [Submitted on 22 Mar 2021] BERT: A Review of Applications in Natural Language Processing and Understanding. M. V. Koroteev. In this review, we describe the application of one of the most popular deep learning-based language models - BERT. The paper describes the mechanism of operation of this model, the main areas of its ...', 'score': 0.99411184, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://aclanthology.org/N19-1423/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning ...', 'score': 0.9222025, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.', 'score': 0.87652874, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://arxiv.org/abs/1810.04805', 'content': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. 
As a result, the pre-trained BERT model can be fine-tuned ...', 'score': 0.66115755, 'raw_content': None}, {'title': 'A Primer in BERTology: What We Know About How BERT Works', 'url': 'https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT', 'content': 'The issue of model depth must be related to the information flow from the most task-specific layers closer to the classifier (Liu et al., 2019a), to the initial layers which appear to be the most task-invariant (Hao et al., 2019), and where the tokens resemble the input tokens the most (Brunner et al., 2020) For BERT, this has been achieved through experiments with loss functions (Sanh et al., 2019; Jiao et al., 2019), mimicking the activation patterns of individual portions of the teacher network (Sun et al., 2019a), and knowledge transfer at the pre-training (Turc et al., 2019; Jiao et al., 2019; Sun et al., 2020) or fine-tuning stage (Jiao et al., 2019). In particular, they were shown to rely on shallow heuristics in natural language inference (McCoy et al., 2019b; Zellers et al., 2019; Jin et al., 2020), reading comprehension (Si et al., 2019; Rogers et al., 2020; Sugawara et al., 2020; Yogatama et al., 2019), argument reasoning comprehension (Niven and Kao, 2019), and text classification (Jin et al., 2020). Several studies explored the possibilities of improving the fine-tuning of BERT:\\nTaking more layers into account: learning a complementary representation of the information in deep and output layers (Yang and Zhao, 2019), using a weighted combination of all layers instead of the final one (Su and Cheng, 2019; Kondratyuk and Straka, 2019), and layer dropout (Kondratyuk and Straka, 2019).\\n For BERT, Clark et al. (2019) observe that most heads in the same layer show similar self-attention patterns (perhaps related to the fact that the output of all self-attention heads in a layer is passed through the same MLP), which explains why Michel et al. (2019) were able to reduce most layers to a single head.\\n', 'score': 0.4248892, 'raw_content': None}], 'response_time': 2.16}"}</function>
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
#### Extracting Arxiv IDs: #### Extracting Arxiv IDs:
By this point you will have gathered that the author is allergic to writing regex. To work around this, we will simply use an `8b` model instance to extract the `arxiv id` from the search results (an optional regex-based sketch follows the extraction output below):
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
def get_arxiv_ids(web_results: dict, temperature: int = 0, max_tokens=512): def get_arxiv_ids(web_results: dict, temperature: int = 0, max_tokens=512):
# Initialize chat history with a specific prompt to extract arXiv IDs # Initialize chat history with a specific prompt to extract arXiv IDs
    arxiv_id_chat_history = [{"role": "system", "content": "Given this input, return the arXiv ID of the paper. The input has the query and web results. DO NOT WRITE ANYTHING ELSE IN YOUR RESPONSE: ONLY THE ARXIV ID, ONCE. The web results will repeat it multiple times; return it a single time, and only where it is actually an arXiv ID."}, {"role": "user", "content": f"Here is the query and results: {web_results}"}]
# Call the model to process the input and extract arXiv IDs # Call the model to process the input and extract arXiv IDs
response = client.chat.completions.create( response = client.chat.completions.create(
model="llama-3.1-8b-instant", # Adjust the model as necessary model="llama-3.1-8b-instant", # Adjust the model as necessary
messages=arxiv_id_chat_history, messages=arxiv_id_chat_history,
max_tokens=max_tokens, max_tokens=max_tokens,
temperature=temperature temperature=temperature
) )
# Append the assistant's response to the chat history # Append the assistant's response to the chat history
arxiv_id_chat_history.append({ arxiv_id_chat_history.append({
"role": "assistant", "role": "assistant",
"content": response.choices[0].message.content "content": response.choices[0].message.content
}) })
# Return the extracted arXiv IDs # Return the extracted arXiv IDs
return response.choices[0].message.content return response.choices[0].message.content
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
print(get_arxiv_ids(search_results[0])) print(get_arxiv_ids(search_results[0]))
print(get_arxiv_ids(search_results[1])) print(get_arxiv_ids(search_results[1]))
``` ```
%% Output %% Output
2407.21783 2407.21783
2103.11943 2103.11943
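%% Cell type:markdown id: tags:
If you would rather not spend an extra LLM call on this step, a plain regex works just as well for the common `YYMM.NNNNN` ID format. The cell below is an optional sketch rather than part of the pipeline used in the rest of this notebook; `get_arxiv_id_regex` is a hypothetical helper that simply returns the first ID-shaped match in the search results.
%% Cell type:code id: tags:
``` python
import re

# Optional sketch: extract the first arXiv-style ID (e.g. 2407.21783) with a regex
# instead of calling the 8b model. This is a heuristic and assumes the ID appears
# somewhere in the search results in the usual YYMM.NNNNN form.
def get_arxiv_id_regex(web_results):
    match = re.search(r"\d{4}\.\d{4,5}", str(web_results))
    return match.group(0) if match else None

# Hypothetical usage, mirroring the cells above:
# print(get_arxiv_id_regex(search_results[0]))
```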
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
#### Downloading the papers and extracting details: #### Downloading the papers and extracting details:
The Llama 3.1 family of LLM(s) is more than capable of summarising raw text extracted from a PDF. However, we are still bound by the (generous) 128k-token context length, so to stay within it we will keep only the first 20,000 words of each paper.
The functions below handle downloading the PDF(s) and extracting their text:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
# Function to download PDF using arxiv library # Function to download PDF using arxiv library
def download_pdf(arxiv_id, filename): def download_pdf(arxiv_id, filename):
paper = next(arxiv.Client().results(arxiv.Search(id_list=[arxiv_id]))) paper = next(arxiv.Client().results(arxiv.Search(id_list=[arxiv_id])))
paper.download_pdf(filename=filename) paper.download_pdf(filename=filename)
# Function to convert PDF to text # Function to convert PDF to text
def pdf_to_text(filename): def pdf_to_text(filename):
with open(filename, "rb") as file: with open(filename, "rb") as file:
reader = PyPDF2.PdfReader(file) reader = PyPDF2.PdfReader(file)
text = "" text = ""
for page in reader.pages: for page in reader.pages:
if page.extract_text(): if page.extract_text():
text += page.extract_text() + " " text += page.extract_text() + " "
return text return text
# Function to truncate text to the first 20,000 words
def truncate_text(text, limit=20000): def truncate_text(text, limit=20000):
words = text.split() words = text.split()
truncated = ' '.join(words[:limit]) truncated = ' '.join(words[:limit])
return truncated return truncated
# Main function to process an arXiv ID # Main function to process an arXiv ID
def process_arxiv_paper(arxiv_id): def process_arxiv_paper(arxiv_id):
pdf_filename = f"{arxiv_id}.pdf" pdf_filename = f"{arxiv_id}.pdf"
txt_filename = f"{arxiv_id}.txt" txt_filename = f"{arxiv_id}.txt"
# Download PDF # Download PDF
download_pdf(arxiv_id, pdf_filename) download_pdf(arxiv_id, pdf_filename)
# Convert PDF to text # Convert PDF to text
text = pdf_to_text(pdf_filename) text = pdf_to_text(pdf_filename)
# Truncate text # Truncate text
truncated_text = truncate_text(text) truncated_text = truncate_text(text)
# Save to txt file # Save to txt file
with open(txt_filename, "w", encoding="utf-8") as file: with open(txt_filename, "w", encoding="utf-8") as file:
file.write(truncated_text) file.write(truncated_text)
print(f"Processed text saved to {txt_filename}") print(f"Processed text saved to {txt_filename}")
# Example usage # Example usage
arxiv_id = "2407.21783" arxiv_id = "2407.21783"
process_arxiv_paper(arxiv_id) process_arxiv_paper(arxiv_id)
arxiv_id = "2103.11943" arxiv_id = "2103.11943"
process_arxiv_paper(arxiv_id) process_arxiv_paper(arxiv_id)
``` ```
%% Output %% Output
Processed text saved to 2407.21783.txt Processed text saved to 2407.21783.txt
Processed text saved to 2103.11943.txt Processed text saved to 2103.11943.txt
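%% Cell type:markdown id: tags:
Before handing the truncated text to the model, it can be reassuring to sanity-check that it fits comfortably inside the context window. The cell below is only a rough heuristic (it assumes roughly 1.3 tokens per English word, which varies by tokenizer); `rough_token_estimate` is a hypothetical helper, not part of the original flow.
%% Cell type:code id: tags:
``` python
# Rough sanity check: estimate the token count of each truncated text file.
# Assumption: ~1.3 tokens per English word, which is only an approximation.
def rough_token_estimate(path, tokens_per_word=1.3):
    with open(path, encoding="utf-8") as file:
        n_words = len(file.read().split())
    return int(n_words * tokens_per_word)

for arxiv_id in ("2407.21783", "2103.11943"):
    estimate = rough_token_estimate(f"{arxiv_id}.txt")
    print(f"{arxiv_id}: ~{estimate} tokens (context limit is 128k)")
```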
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
#### Summarising logic: #### Summarising logic:
We can use an `8b` model instance to summarise our papers:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
SUMMARISER_PROMPT = """ SUMMARISER_PROMPT = """
Cutting Knowledge Date: December 2023 Cutting Knowledge Date: December 2023
Today Date: 15 September 2024 Today Date: 15 September 2024
You are an expert summariser of research papers. Below you will get the text of an arXiv paper as input; your job is to read it carefully and return a concise summary, followed by a few bullet points with the key takeaways.
""" """
def summarize_text_file(file_name: str, temperature: int = 0, max_tokens=2048): def summarize_text_file(file_name: str, temperature: int = 0, max_tokens=2048):
# Read the content of the file # Read the content of the file
with open(file_name, 'r') as file: with open(file_name, 'r') as file:
file_content = file.read() file_content = file.read()
# Initialize chat history # Initialize chat history
chat_history = [{"role": "system", "content": f"{SUMMARISER_PROMPT}"}, {"role": "user", "content": f"Text of the paper: {file_content}"}] chat_history = [{"role": "system", "content": f"{SUMMARISER_PROMPT}"}, {"role": "user", "content": f"Text of the paper: {file_content}"}]
# Generate a summary using the model # Generate a summary using the model
response = client.chat.completions.create( response = client.chat.completions.create(
model="llama-3.1-8b-instant", # You can change the model as needed model="llama-3.1-8b-instant", # You can change the model as needed
messages=chat_history, messages=chat_history,
max_tokens=max_tokens, max_tokens=max_tokens,
temperature=temperature temperature=temperature
) )
# Append the assistant's response to the chat history # Append the assistant's response to the chat history
chat_history.append({ chat_history.append({
"role": "assistant", "role": "assistant",
"content": response.choices[0].message.content "content": response.choices[0].message.content
}) })
# Return the summary # Return the summary
return response.choices[0].message.content return response.choices[0].message.content
``` ```
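%% Cell type:markdown id: tags:
Note that `SUMMARISER_PROMPT` hard-codes the `Today Date` line. If you prefer the prompt to stay current, one option is to build it dynamically, as in this optional sketch (it assumes you are happy to regenerate the system prompt on every run):
%% Cell type:code id: tags:
``` python
from datetime import date

# Optional variant: same summariser prompt, but with today's date filled in at runtime.
DYNAMIC_SUMMARISER_PROMPT = f"""
Cutting Knowledge Date: December 2023
Today Date: {date.today().strftime('%d %B %Y')}

You are an expert summariser of research papers. Below you will get the text of an arXiv paper as input; your job is to read it carefully and return a concise summary, followed by a few bullet points with the key takeaways.
"""
```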
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
paper_1_summary = summarize_text_file("2407.21783.txt") paper_1_summary = summarize_text_file("2407.21783.txt")
print(paper_1_summary) print(paper_1_summary)
``` ```
%% Output %% Output
Summary: Summary:
This paper introduces Llama 3, a new set of foundation models developed by Meta AI. The Llama 3 family consists of models with 8B, 70B, and 405B parameters, capable of handling tasks in multiple languages and modalities. The paper details the pre-training and post-training processes, infrastructure improvements, and evaluations across various benchmarks. Llama 3 demonstrates competitive performance compared to other leading language models, including GPT-4 and Claude 3.5 Sonnet, on a wide range of tasks. The paper also explores multimodal capabilities by integrating vision and speech components, although these are still under development and not ready for release. This paper introduces Llama 3, a new set of foundation models developed by Meta AI. The Llama 3 family consists of models with 8B, 70B, and 405B parameters, capable of handling tasks in multiple languages and modalities. The paper details the pre-training and post-training processes, infrastructure improvements, and evaluations across various benchmarks. Llama 3 demonstrates competitive performance compared to other leading language models, including GPT-4 and Claude 3.5 Sonnet, on a wide range of tasks. The paper also explores multimodal capabilities by integrating vision and speech components, although these are still under development and not ready for release.
Key takeaways: Key takeaways:
Llama 3 includes models with 8B, 70B, and 405B parameters, with the flagship 405B model trained on 15.6T tokens. Llama 3 includes models with 8B, 70B, and 405B parameters, with the flagship 405B model trained on 15.6T tokens.
The models excel in multilingual capabilities, coding, reasoning, and tool usage. The models excel in multilingual capabilities, coding, reasoning, and tool usage.
Llama 3 uses a dense Transformer architecture with minimal modifications, focusing on high-quality data and increased training scale. Llama 3 uses a dense Transformer architecture with minimal modifications, focusing on high-quality data and increased training scale.
The training process involved significant infrastructure improvements to handle large-scale distributed training. The training process involved significant infrastructure improvements to handle large-scale distributed training.
Post-training includes supervised fine-tuning, rejection sampling, and direct preference optimization to align the model with human preferences. Post-training includes supervised fine-tuning, rejection sampling, and direct preference optimization to align the model with human preferences.
Llama 3 demonstrates competitive performance on various benchmarks, including MMLU, coding tasks, and math reasoning. Llama 3 demonstrates competitive performance on various benchmarks, including MMLU, coding tasks, and math reasoning.
The paper presents experiments on integrating vision and speech capabilities using a compositional approach. The paper presents experiments on integrating vision and speech capabilities using a compositional approach.
Extensive safety measures were implemented, including pre-training data filtering, safety fine-tuning, and system-level protections. Extensive safety measures were implemented, including pre-training data filtering, safety fine-tuning, and system-level protections.
The authors are releasing the Llama 3 language models publicly to accelerate research and development in AI. The authors are releasing the Llama 3 language models publicly to accelerate research and development in AI.
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
paper_2_summary = summarize_text_file("2103.11943.txt") paper_2_summary = summarize_text_file("2103.11943.txt")
print(paper_2_summary) print(paper_2_summary)
``` ```
%% Output %% Output
BERT is a novel language representation model developed by researchers at Google AI. It stands for Bidirectional Encoder Representations from Transformers and introduces a new approach to pre-training deep bidirectional representations from unlabeled text. Unlike previous models that looked at text sequences either from left-to-right or combined left-to-right and right-to-left training, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. BERT is a novel language representation model developed by researchers at Google AI. It stands for Bidirectional Encoder Representations from Transformers and introduces a new approach to pre-training deep bidirectional representations from unlabeled text. Unlike previous models that looked at text sequences either from left-to-right or combined left-to-right and right-to-left training, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
The key innovation is the application of bidirectional training of Transformer, a popular attention model, to language modeling. This is achieved through two pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). In MLM, the model attempts to predict masked words in a sentence, allowing it to incorporate context from both directions. NSP trains the model to understand relationships between sentences. The key innovation is the application of bidirectional training of Transformer, a popular attention model, to language modeling. This is achieved through two pre-training tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). In MLM, the model attempts to predict masked words in a sentence, allowing it to incorporate context from both directions. NSP trains the model to understand relationships between sentences.
BERT significantly outperformed previous state-of-the-art models on a wide range of NLP tasks, including question answering, natural language inference, and others, without substantial task-specific architecture modifications. The researchers demonstrated the effectiveness of BERT by obtaining new state-of-the-art results on eleven natural language processing tasks. BERT significantly outperformed previous state-of-the-art models on a wide range of NLP tasks, including question answering, natural language inference, and others, without substantial task-specific architecture modifications. The researchers demonstrated the effectiveness of BERT by obtaining new state-of-the-art results on eleven natural language processing tasks.
Key Takeaways: Key Takeaways:
BERT introduces deep bidirectional representations, overcoming limitations of previous unidirectional or shallowly bidirectional models. BERT introduces deep bidirectional representations, overcoming limitations of previous unidirectional or shallowly bidirectional models.
The model uses "masked language modeling" (MLM) for bidirectional training of Transformer. The model uses "masked language modeling" (MLM) for bidirectional training of Transformer.
BERT is pre-trained on two tasks: masked language modeling and next sentence prediction. BERT is pre-trained on two tasks: masked language modeling and next sentence prediction.
It achieves state-of-the-art performance on 11 NLP tasks, including an improvement of 7.7% on the GLUE benchmark. It achieves state-of-the-art performance on 11 NLP tasks, including an improvement of 7.7% on the GLUE benchmark.
BERT's architecture allows for fine-tuning with just one additional output layer, making it versatile for various NLP tasks. BERT's architecture allows for fine-tuning with just one additional output layer, making it versatile for various NLP tasks.
The model demonstrates that deep bidirectional language representation improves language understanding compared to left-to-right or shallow bidirectional approaches. The model demonstrates that deep bidirectional language representation improves language understanding compared to left-to-right or shallow bidirectional approaches.
BERT's performance improves with larger model sizes, even on small-scale tasks. BERT's performance improves with larger model sizes, even on small-scale tasks.
The pre-training of BERT is computationally expensive but fine-tuning is relatively inexpensive. The pre-training of BERT is computationally expensive but fine-tuning is relatively inexpensive.
BERT can be used for both fine-tuning and as a feature-based approach, with competitive results in both scenarios. BERT can be used for both fine-tuning and as a feature-based approach, with competitive results in both scenarios.
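%% Cell type:markdown id: tags:
Since every summary costs an LLM call, you may also want to cache results on disk so re-running the notebook does not repeat the work. This is an optional sketch; the `summaries/` directory and the `cached_summary` helper are hypothetical additions, not part of the original flow.
%% Cell type:code id: tags:
``` python
import os

# Optional sketch: reuse a summary from disk if we have already generated it,
# otherwise call summarize_text_file (defined above) and cache the result.
def cached_summary(arxiv_id, cache_dir="summaries"):
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, f"{arxiv_id}.summary.txt")
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as file:
            return file.read()
    summary = summarize_text_file(f"{arxiv_id}.txt")
    with open(cache_path, "w", encoding="utf-8") as file:
        file.write(summary)
    return summary

# Hypothetical usage:
# paper_1_summary = cached_summary("2407.21783")
```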
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
user_input = f""" user_input = f"""
Here are the summaries of the two papers, look at them closely and tell me the differences of the papers: Paper 1 Summary {paper_1_summary} and Paper 2 Summary {paper_2_summary} Here are the summaries of the two papers, look at them closely and tell me the differences of the papers: Paper 1 Summary {paper_1_summary} and Paper 2 Summary {paper_2_summary}
""" """
output = model_chat(user_input, temperature=1) output = model_chat(user_input, temperature=1)
``` ```
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
print(output) print(output)
``` ```
%% Output %% Output
The two paper summaries are about different language models: Llama 3 and BERT. The two paper summaries are about different language models: Llama 3 and BERT.
The main differences are: The main differences are:
1. Model Type: Llama 3 is a set of foundation models developed by Meta AI, while BERT is a language representation model developed by researchers at Google AI. 1. Model Type: Llama 3 is a set of foundation models developed by Meta AI, while BERT is a language representation model developed by researchers at Google AI.
2. Model Architecture: Llama 3 uses a dense Transformer architecture, while BERT uses a bidirectional Transformer architecture. 2. Model Architecture: Llama 3 uses a dense Transformer architecture, while BERT uses a bidirectional Transformer architecture.
3. Training Process: Llama 3 involves significant infrastructure improvements to handle large-scale distributed training, while BERT uses pre-training tasks such as Masked Language Model (MLM) and Next Sentence Prediction (NSP). 3. Training Process: Llama 3 involves significant infrastructure improvements to handle large-scale distributed training, while BERT uses pre-training tasks such as Masked Language Model (MLM) and Next Sentence Prediction (NSP).
4. Multimodal Capabilities: Llama 3 explores multimodal capabilities by integrating vision and speech components, while BERT focuses on text-based language understanding. 4. Multimodal Capabilities: Llama 3 explores multimodal capabilities by integrating vision and speech components, while BERT focuses on text-based language understanding.
5. Performance: Both models demonstrate competitive performance on various benchmarks, but Llama 3 shows performance on tasks such as multilingual capabilities, coding, reasoning, and tool usage, while BERT excels on NLP tasks such as question answering and natural language inference. 5. Performance: Both models demonstrate competitive performance on various benchmarks, but Llama 3 shows performance on tasks such as multilingual capabilities, coding, reasoning, and tool usage, while BERT excels on NLP tasks such as question answering and natural language inference.
6. Release: Llama 3 is released publicly to accelerate research and development in AI, while BERT is released as a state-of-the-art model for NLP tasks. 6. Release: Llama 3 is released publicly to accelerate research and development in AI, while BERT is released as a state-of-the-art model for NLP tasks.
7. Model Size: Llama 3 has models with 8B, 70B, and 405B parameters, while BERT's model size is not specified in the summary. 7. Model Size: Llama 3 has models with 8B, 70B, and 405B parameters, while BERT's model size is not specified in the summary.
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
## Part 2: Handle the function calling logic: ## Part 2: Handle the function calling logic:
Now that we have validated an MVP, we can write a simple function to handle the tool-calling logic:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
def handle_llm_output(llm_output): def handle_llm_output(llm_output):
# Check if the output starts with "<function=" # Check if the output starts with "<function="
if llm_output.startswith("<function="): if llm_output.startswith("<function="):
return extract_details_and_call_function(llm_output) return extract_details_and_call_function(llm_output)
else: else:
# Output does not start with "<function=", return as is # Output does not start with "<function=", return as is
return llm_output return llm_output
def extract_details_and_call_function(input_string): def extract_details_and_call_function(input_string):
# Extract the function name and parameters # Extract the function name and parameters
prefix = "<function=" prefix = "<function="
suffix = "</function>" suffix = "</function>"
start = input_string.find(prefix) + len(prefix) start = input_string.find(prefix) + len(prefix)
end = input_string.find(suffix) end = input_string.find(suffix)
function_and_params = input_string[start:end] function_and_params = input_string[start:end]
# Split to get function name and parameters # Split to get function name and parameters
function_name, params_json = function_and_params.split(">{") function_name, params_json = function_and_params.split(">{")
function_name = function_name.strip() function_name = function_name.strip()
params_json = "{" + params_json params_json = "{" + params_json
# Convert parameters to dictionary # Convert parameters to dictionary
params = json.loads(params_json) params = json.loads(params_json)
# Call the function dynamically # Call the function dynamically
function_map = { function_map = {
"query_for_two_papers": query_for_two_papers, "query_for_two_papers": query_for_two_papers,
"get_arxiv_id": get_arxiv_ids, "get_arxiv_id": get_arxiv_ids,
"process_arxiv_paper": process_arxiv_paper, "process_arxiv_paper": process_arxiv_paper,
"summarise_text_file": summarize_text_file "summarise_text_file": summarize_text_file
} }
if function_name in function_map: if function_name in function_map:
result = function_map[function_name](**params) result = function_map[function_name](**params)
return result return result
else: else:
return "Function not found" return "Function not found"
# Testing usage # Testing usage
llm_outputs = [ llm_outputs = [
"<function=query_for_two_papers>{\"paper_1\": \"Llama 3.1\", \"paper_2\": \"BERT\"}</function>", "<function=query_for_two_papers>{\"paper_1\": \"Llama 3.1\", \"paper_2\": \"BERT\"}</function>",
"Llama 3.2 models are here too btw!" "Llama 3.2 models are here too btw!"
] ]
for output in llm_outputs: for output in llm_outputs:
result = handle_llm_output(output) result = handle_llm_output(output)
print(result) print(result)
``` ```
%% Output %% Output
[{'query': 'arxiv id of Llama 3.1', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'TheLlama3HerdofModels - arXiv.org', 'url': 'https://arxiv.org/pdf/2407.21783', 'content': 'arXiv:2407.21783v2 [cs.AI] 15 Aug 2024. Finetuned Multilingual Longcontext Tooluse Release ... The model architecture of Llama 3 is illustrated in Figure1. The development of our Llama 3 language modelscomprisestwomainstages:', 'score': 0.9961004, 'raw_content': None}, {'title': '[PDF] The Llama 3 Herd of Models - Semantic Scholar', 'url': 'https://www.semanticscholar.org/paper/The-Llama-3-Herd-of-Models-Dubey-Jauhri/6520557cc3bfd198f960cc8cb6151c3474321bd8', 'content': 'DOI: 10.48550/arXiv.2407.21783 Corpus ID: 271571434; The Llama 3 Herd of Models @article{Dubey2024TheL3, title={The Llama 3 Herd of Models}, author={Abhimanyu Dubey and Abhinav Jauhri and Abhinav Pandey and Abhishek Kadian and Ahmad Al-Dahle and Aiesha Letman and Akhil Mathur and Alan Schelten and Amy Yang and Angela Fan and Anirudh Goyal and Anthony Hartshorn and Aobo Yang and Archi Mitra and ...', 'score': 0.9943581, 'raw_content': None}, {'title': 'The Llama 3 Herd of Models | Research - AI at Meta', 'url': 'https://ai.meta.com/research/publications/the-llama-3-herd-of-models/', 'content': 'This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety.', 'score': 0.9320833, 'raw_content': None}, {'title': 'Introducing Llama 3.1: Our most capable models to date - Meta AI', 'url': 'https://ai.meta.com/blog/meta-llama-3-1/', 'content': 'Bringing open intelligence to all, our latest models expand context length to 128K, add support across eight languages, and include Llama 3.1 405B—the first frontier-level open source AI model. Llama 3.1 405B is in a class of its own, with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed source models.', 'score': 0.8467045, 'raw_content': None}, {'title': '[2407.21783] The Llama 3 Herd of Models - arXiv.org', 'url': 'https://arxiv.org/abs/2407.21783', 'content': 'Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive ...', 'score': 0.68257374, 'raw_content': None}], 'response_time': 1.7}, {'query': 'arxiv id of BERT', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': '[2103.11943] BERT: A Review of Applications in Natural Language ...', 'url': 'https://arxiv.org/abs/2103.11943', 'content': 'arXiv:2103.11943 (cs) [Submitted on 22 Mar 2021] BERT: A Review of Applications in Natural Language Processing and Understanding. M. V. Koroteev. In this review, we describe the application of one of the most popular deep learning-based language models - BERT. 
The paper describes the mechanism of operation of this model, the main areas of its ...', 'score': 0.99411184, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://aclanthology.org/N19-1423/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning ...', 'score': 0.9222025, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.', 'score': 0.87652874, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://arxiv.org/abs/1810.04805', 'content': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned ...', 'score': 0.66115755, 'raw_content': None}, {'title': 'A Primer in BERTology: What We Know About How BERT Works', 'url': 'https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT', 'content': 'The issue of model depth must be related to the information flow from the most task-specific layers closer to the classifier (Liu et al., 2019a), to the initial layers which appear to be the most task-invariant (Hao et al., 2019), and where the tokens resemble the input tokens the most (Brunner et al., 2020) For BERT, this has been achieved through experiments with loss functions (Sanh et al., 2019; Jiao et al., 2019), mimicking the activation patterns of individual portions of the teacher network (Sun et al., 2019a), and knowledge transfer at the pre-training (Turc et al., 2019; Jiao et al., 2019; Sun et al., 2020) or fine-tuning stage (Jiao et al., 2019). In particular, they were shown to rely on shallow heuristics in natural language inference (McCoy et al., 2019b; Zellers et al., 2019; Jin et al., 2020), reading comprehension (Si et al., 2019; Rogers et al., 2020; Sugawara et al., 2020; Yogatama et al., 2019), argument reasoning comprehension (Niven and Kao, 2019), and text classification (Jin et al., 2020). Several studies explored the possibilities of improving the fine-tuning of BERT:\nTaking more layers into account: learning a complementary representation of the information in deep and output layers (Yang and Zhao, 2019), using a weighted combination of all layers instead of the final one (Su and Cheng, 2019; Kondratyuk and Straka, 2019), and layer dropout (Kondratyuk and Straka, 2019).\n For BERT, Clark et al. 
(2019) observe that most heads in the same layer show similar self-attention patterns (perhaps related to the fact that the output of all self-attention heads in a layer is passed through the same MLP), which explains why Michel et al. (2019) were able to reduce most layers to a single head.\n', 'score': 0.4250085, 'raw_content': None}], 'response_time': 2.2}] [{'query': 'arxiv id of Llama 3.1', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': 'TheLlama3HerdofModels - arXiv.org', 'url': 'https://arxiv.org/pdf/2407.21783', 'content': 'arXiv:2407.21783v2 [cs.AI] 15 Aug 2024. Finetuned Multilingual Longcontext Tooluse Release ... The model architecture of Llama 3 is illustrated in Figure1. The development of our Llama 3 language modelscomprisestwomainstages:', 'score': 0.9961004, 'raw_content': None}, {'title': '[PDF] The Llama 3 Herd of Models - Semantic Scholar', 'url': 'https://www.semanticscholar.org/paper/The-Llama-3-Herd-of-Models-Dubey-Jauhri/6520557cc3bfd198f960cc8cb6151c3474321bd8', 'content': 'DOI: 10.48550/arXiv.2407.21783 Corpus ID: 271571434; The Llama 3 Herd of Models @article{Dubey2024TheL3, title={The Llama 3 Herd of Models}, author={Abhimanyu Dubey and Abhinav Jauhri and Abhinav Pandey and Abhishek Kadian and Ahmad Al-Dahle and Aiesha Letman and Akhil Mathur and Alan Schelten and Amy Yang and Angela Fan and Anirudh Goyal and Anthony Hartshorn and Aobo Yang and Archi Mitra and ...', 'score': 0.9943581, 'raw_content': None}, {'title': 'The Llama 3 Herd of Models | Research - AI at Meta', 'url': 'https://ai.meta.com/research/publications/the-llama-3-herd-of-models/', 'content': 'This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety.', 'score': 0.9320833, 'raw_content': None}, {'title': 'Introducing Llama 3.1: Our most capable models to date - Meta AI', 'url': 'https://ai.meta.com/blog/meta-llama-3-1/', 'content': 'Bringing open intelligence to all, our latest models expand context length to 128K, add support across eight languages, and include Llama 3.1 405B—the first frontier-level open source AI model. Llama 3.1 405B is in a class of its own, with unmatched flexibility, control, and state-of-the-art capabilities that rival the best closed source models.', 'score': 0.8467045, 'raw_content': None}, {'title': '[2407.21783] The Llama 3 Herd of Models - arXiv.org', 'url': 'https://arxiv.org/abs/2407.21783', 'content': 'Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive ...', 'score': 0.68257374, 'raw_content': None}], 'response_time': 1.7}, {'query': 'arxiv id of BERT', 'follow_up_questions': None, 'answer': None, 'images': [], 'results': [{'title': '[2103.11943] BERT: A Review of Applications in Natural Language ...', 'url': 'https://arxiv.org/abs/2103.11943', 'content': 'arXiv:2103.11943 (cs) [Submitted on 22 Mar 2021] BERT: A Review of Applications in Natural Language Processing and Understanding. M. V. Koroteev. 
In this review, we describe the application of one of the most popular deep learning-based language models - BERT. The paper describes the mechanism of operation of this model, the main areas of its ...', 'score': 0.99411184, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://aclanthology.org/N19-1423/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning ...', 'score': 0.9222025, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://research.google/pubs/bert-pre-training-of-deep-bidirectional-transformers-for-language-understanding/', 'content': 'Abstract. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.', 'score': 0.87652874, 'raw_content': None}, {'title': 'BERT: Pre-training of Deep Bidirectional Transformers for Language ...', 'url': 'https://arxiv.org/abs/1810.04805', 'content': 'We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned ...', 'score': 0.66115755, 'raw_content': None}, {'title': 'A Primer in BERTology: What We Know About How BERT Works', 'url': 'https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00349/96482/A-Primer-in-BERTology-What-We-Know-About-How-BERT', 'content': 'The issue of model depth must be related to the information flow from the most task-specific layers closer to the classifier (Liu et al., 2019a), to the initial layers which appear to be the most task-invariant (Hao et al., 2019), and where the tokens resemble the input tokens the most (Brunner et al., 2020) For BERT, this has been achieved through experiments with loss functions (Sanh et al., 2019; Jiao et al., 2019), mimicking the activation patterns of individual portions of the teacher network (Sun et al., 2019a), and knowledge transfer at the pre-training (Turc et al., 2019; Jiao et al., 2019; Sun et al., 2020) or fine-tuning stage (Jiao et al., 2019). In particular, they were shown to rely on shallow heuristics in natural language inference (McCoy et al., 2019b; Zellers et al., 2019; Jin et al., 2020), reading comprehension (Si et al., 2019; Rogers et al., 2020; Sugawara et al., 2020; Yogatama et al., 2019), argument reasoning comprehension (Niven and Kao, 2019), and text classification (Jin et al., 2020). 
Several studies explored the possibilities of improving the fine-tuning of BERT:\nTaking more layers into account: learning a complementary representation of the information in deep and output layers (Yang and Zhao, 2019), using a weighted combination of all layers instead of the final one (Su and Cheng, 2019; Kondratyuk and Straka, 2019), and layer dropout (Kondratyuk and Straka, 2019).\n For BERT, Clark et al. (2019) observe that most heads in the same layer show similar self-attention patterns (perhaps related to the fact that the output of all self-attention heads in a layer is passed through the same MLP), which explains why Michel et al. (2019) were able to reduce most layers to a single head.\n', 'score': 0.4250085, 'raw_content': None}], 'response_time': 2.2}]
This is a regular output without function call. This is a regular output without function call.
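%% Cell type:markdown id: tags:
To see how the pieces fit together end to end, here is a minimal dispatch-loop sketch: ask the model for a response, execute any `<function=...>` call it emits, and hand the tool output back for the final answer. `run_with_tools` is a hypothetical helper built on the `model_chat` and `handle_llm_output` functions defined above, and it assumes `model_chat` keeps appending to the shared chat history.
%% Cell type:code id: tags:
``` python
# A minimal sketch of the dispatch loop, assuming model_chat and handle_llm_output
# from the cells above are in scope.
def run_with_tools(user_query, temperature=0):
    # Step 1: let the model decide whether to emit a <function=...> tool call
    first_response = model_chat(user_query, temperature=temperature)

    # Step 2: execute the tool call, or pass plain text through unchanged
    tool_result = handle_llm_output(first_response)

    # Step 3: if a tool ran, hand its output back to the model for a final answer
    if first_response.startswith("<function="):
        follow_up = f"Here is the tool output, use it to answer the original question: {tool_result}"
        return model_chat(follow_up, temperature=temperature)
    return tool_result

# Hypothetical usage:
# print(run_with_tools("Find the arxiv papers for Llama 3.1 and BERT"))
```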
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
#fin #fin
``` ```