diff --git a/docs/.gitignore b/docs/.gitignore
index 25c292b4cd8faf5fa07de8a5a320e47bb5369712..9f8ca2991efdf87a351c1dd12b4545d3c0496bf1 100644
--- a/docs/.gitignore
+++ b/docs/.gitignore
@@ -2,3 +2,4 @@
 docs/api_reference/llama_deploy/*
 docs/module_guides/llama_deploy/*
 llama_deploy
+.mkdocs.tmp.yml
diff --git a/docs/docs/understanding/agent/agentworkflow.jpg b/docs/docs/understanding/agent/agentworkflow.jpg
new file mode 100644
index 0000000000000000000000000000000000000000..11e7f128fd5188cceb5dbbfd9c1025109b474e86
Binary files /dev/null and b/docs/docs/understanding/agent/agentworkflow.jpg differ
diff --git a/docs/docs/understanding/agent/human_in_the_loop.md b/docs/docs/understanding/agent/human_in_the_loop.md
new file mode 100644
index 0000000000000000000000000000000000000000..1a6f3d35327081c1c65669c760d63544250193b8
--- /dev/null
+++ b/docs/docs/understanding/agent/human_in_the_loop.md
@@ -0,0 +1,84 @@
+# Human in the loop
+
+You can also define tools that bring a human into the loop. This is useful for tasks that require human input, such as confirming a tool call or providing feedback.
+
+As we'll see in our [Workflows tutorial](../workflows/index.md), the way Workflows work under the hood of AgentWorkflow is by running steps that both emit and receive events. Here's a diagram of the steps (in blue) that make up an AgentWorkflow and the events (in green) that pass data between them. You'll recognize these events; they're the same ones we were handling in the output stream earlier.
+
+![A diagram of the AgentWorkflow steps and events](./agentworkflow.jpg)
+
+To get a human in the loop, we'll get our tool to emit an event that isn't received by any other step in the workflow. We'll then tell our tool to wait until it receives a specific "reply" event.
+
+We provide built-in `InputRequiredEvent` and `HumanResponseEvent` events for this purpose. If you want to capture different forms of human input, you can subclass these events to match your own preferences. Let's import them:
+
+```python
+from llama_index.core.workflow import (
+    InputRequiredEvent,
+    HumanResponseEvent,
+)
+```
+
+Next we'll create a tool that performs a hypothetical dangerous task. There are a couple of new things happening here:
+
+* We call `write_event_to_stream` with an `InputRequiredEvent`. This emits an event to the external stream to be captured. You can attach arbitrary data to the event, which we do in the form of a `user_name`.
+* We call `wait_for_event`, specifying that we want to wait for a `HumanResponseEvent` and that it must have the `user_name` set to "Laurie". You can see how this would be useful in a multi-user system where more than one incoming event might be involved.
+
+```python
+async def dangerous_task(ctx: Context) -> str:
+    """A dangerous task that requires human confirmation."""
+
+    # emit an event to the external stream to be captured
+    ctx.write_event_to_stream(
+        InputRequiredEvent(
+            prefix="Are you sure you want to proceed? ",
+            user_name="Laurie",
+        )
+    )
+
+    # wait until we see a HumanResponseEvent
+    response = await ctx.wait_for_event(
+        HumanResponseEvent, requirements={"user_name": "Laurie"}
+    )
+
+    # act on the input from the event
+    if response.response.strip().lower() == "yes":
+        return "Dangerous task completed successfully."
+    else:
+        return "Dangerous task aborted."
+``` + +We create our agent as usual, passing it the tool we just defined: + +```python +workflow = AgentWorkflow.from_tools_or_functions( + [dangerous_task], + llm=llm, + system_prompt="You are a helpful assistant that can perform dangerous tasks.", +) +``` + +Now we can run the workflow, handling the `InputRequiredEvent` just like any other streaming event, and responding with a `HumanResponseEvent` passed in using the `send_event` method: + +```python +handler = workflow.run(user_msg="I want to proceed with the dangerous task.") + +async for event in handler.stream_events(): + if isinstance(event, InputRequiredEvent): + # capture keyboard input + response = input(event.prefix) + # send our response back + handler.ctx.send_event( + HumanResponseEvent( + response=response, + user_name=event.user_name, + ) + ) + +response = await handler +print(str(response)) +``` + +As usual, you can see the [full code of this example](https://github.com/run-llama/python-agents-tutorial/blob/main/5_human_in_the_loop.py). + +You can do anything you want to capture the input; you could use a GUI, or audio input, or even get another, separate agent involved. If your input is going to take a while, or happen in another process, you might want to [serialize the context](./state.md) and save it to a database or file so that you can resume the workflow later. + +Speaking of getting other agents involved brings us to our next section, [multi-agent systems](./multi_agent.md). diff --git a/docs/docs/understanding/agent/index.md b/docs/docs/understanding/agent/index.md index 8c1f7fb7ff0e7fbf313be4ce8c5ac84717120106..8154c0f4bbdf3453d0819852df95db43d3c8a23b 100644 --- a/docs/docs/understanding/agent/index.md +++ b/docs/docs/understanding/agent/index.md @@ -1,8 +1,8 @@ -# Building a basic agent +# Building an agent In LlamaIndex, an agent is a semi-autonomous piece of software powered by an LLM that is given a task and executes a series of steps towards solving that task. It is given a set of tools, which can be anything from arbitrary functions up to full LlamaIndex query engines, and it selects the best available tool to complete each step. When each step is completed, the agent judges whether the task is now complete, in which case it returns a result to the user, or whether it needs to take another step, in which case it loops back to the start. -In LlamaIndex, you can either use our prepackaged agents/tools or [build your own agentic workflows from scratch](https://docs.llamaindex.ai/en/stable/understanding/workflows/), covered in the "Building Workflows" section. This section covers our prepackaged agents and tools. +In LlamaIndex, you can either [build your own agentic workflows from scratch](../understanding/workflows/index.md), covered in the "Building Workflows" section, or you can use our pre-built `AgentWorkflow` class. This tutorial covers building single and multi-agent systems using `AgentWorkflow`. Because `AgentWorkflow` is itself a Workflow, you'll learn lots of concepts that you can apply to building Workflows from scratch later.  @@ -20,20 +20,20 @@ poetry shell And then we'll install the LlamaIndex library and some other dependencies that will come in handy: ```bash -pip install llama-index python-dotenv +pip install llama-index-core llama-index-llms-openai python-dotenv ``` If any of this gives you trouble, check out our more detailed [installation guide](../getting_started/installation/). 
## OpenAI Key -Our agent will be powered by OpenAI's `GPT-3.5-Turbo` LLM, so you'll need an [API key](https://platform.openai.com/). Once you have your key, you can put it in a `.env` file in the root of your project: +Our agent will be powered by OpenAI's `gpt-4o-mini` LLM, so you'll need an [API key](https://platform.openai.com/). Once you have your key, you can put it in a `.env` file in the root of your project: ```bash OPENAI_API_KEY=sk-proj-xxxx ``` -If you don't want to use OpenAI, we'll show you how to use other models later. +If you don't want to use OpenAI, you can use [any other LLM](../using_llms/index.md) including local models. Agents require capable models, so smaller models may be less reliable. ## Bring in dependencies @@ -43,9 +43,9 @@ We'll start by importing the components of LlamaIndex we need, as well as loadin from dotenv import load_dotenv load_dotenv() -from llama_index.core.agent import ReActAgent + from llama_index.llms.openai import OpenAI -from llama_index.core.tools import FunctionTool +from llama_index.core.agent.workflow import AgentWorkflow ``` ## Create basic tools @@ -58,63 +58,66 @@ def multiply(a: float, b: float) -> float: return a * b -multiply_tool = FunctionTool.from_defaults(fn=multiply) - - def add(a: float, b: float) -> float: """Add two numbers and returns the sum""" return a + b - - -add_tool = FunctionTool.from_defaults(fn=add) ``` -As you can see, these are regular vanilla Python functions. The docstring comments provide metadata to the agent about what the tool does: if your LLM is having trouble figuring out which tool to use, these docstrings are what you should tweak first. - -After each function is defined we create `FunctionTool` objects from these functions, which wrap them in a way that the agent can understand. +As you can see, these are regular Python functions. When deciding what tool to use, your agent will use the tool's name, parameters, and docstring to determine what the tool does and whether it's appropriate for the task at hand. So it's important to make sure the docstrings are descriptive and helpful. It will also use the type hints to determine the expected parameters and return type. ## Initialize the LLM -`GPT-3.5-Turbo` is going to be doing the work today: +`gpt-4o-mini` is going to be doing the work today: ```python -llm = OpenAI(model="gpt-3.5-turbo", temperature=0) +llm = OpenAI(model="gpt-4o-mini") ``` -You could also pick another popular model accessible via API, such as those from [Mistral](../examples/llm/mistralai/), [Claude from Anthropic](../examples/llm/anthropic/) or [Gemini from Google](../examples/llm/gemini/). +You could also pick another popular model accessible via API, such as those from [Mistral](../../examples/llm/mistralai/), [Claude from Anthropic](../../examples/llm/anthropic/) or [Gemini from Google](../../examples/llm/gemini/). ## Initialize the agent -Now we create our agent. In this case, this is a [ReAct agent](https://klu.ai/glossary/react-agent-model), a relatively simple but powerful agent. We give it an array containing our two tools, the LLM we just created, and set `verbose=True` so we can see what's going on: +Now we create our agent. It needs an array of tools, an LLM, and a system prompt to tell it what kind of agent to be. Your system prompt would usually be more detailed than this! 
```python -agent = ReActAgent.from_tools([multiply_tool, add_tool], llm=llm, verbose=True) +workflow = AgentWorkflow.from_tools_or_functions( + [multiply, add], + llm=llm, + system_prompt="You are an agent that can perform basic mathematical operations using tools.", +) ``` +GPT-4o-mini is actually smart enough to not need tools to do such simple math, which is why we specified that it should use tools in the prompt. + ## Ask a question -We specify that it should use a tool, as this is pretty simple and GPT-3.5 doesn't really need this tool to get the answer. +Now we can ask the agent to do some math: ```python -response = agent.chat("What is 20+(2*4)? Use a tool to calculate every step.") +response = await workflow.run(user_msg="What is 20+(2*4)?") +print(response) +``` + +Note that this is asynchronous code. It will work in a notebook environment, but if you want to run it in regular Python you'll need to wrap it an asynchronous function, like this: + +```python +async def main(): + response = await workflow.run(user_msg="What is 20+(2*4)?") + print(response) + + +if __name__ == "__main__": + import asyncio + + asyncio.run(main()) ``` This should give you output similar to the following: ``` -Thought: The current language of the user is: English. I need to use a tool to help me answer the question. -Action: multiply -Action Input: {'a': 2, 'b': 4} -Observation: 8 -Thought: I need to add 20 to the result of the multiplication. -Action: add -Action Input: {'a': 20, 'b': 8} -Observation: 28 -Thought: I can answer without using any more tools. I'll use the user's language to answer -Answer: The result of 20 + (2 * 4) is 28. -The result of 20 + (2 * 4) is 28. +The result of (20 + (2 times 4)) is 28. ``` -As you can see, the agent picks the correct tools one after the other and combines the answers to give the final result. Check the [repo](https://github.com/run-llama/python-agents-tutorial/blob/main/1_basic_agent.py) to see what the final code should look like. +Check the [repo](https://github.com/run-llama/python-agents-tutorial/blob/main/1_basic_agent.py) to see what the final code should look like. -Congratulations! You've built the most basic kind of agent. Next you can find out how to use [local models](./local_models.md) or skip to [adding RAG to your agent](./rag_agent.md). +Congratulations! You've built the most basic kind of agent. Next let's learn how to use [pre-built tools](./tools.md). diff --git a/docs/docs/understanding/agent/llamaparse.md b/docs/docs/understanding/agent/llamaparse.md deleted file mode 100644 index 9a87dd1768aa384a4a3009d61c7ebfb0e6435889..0000000000000000000000000000000000000000 --- a/docs/docs/understanding/agent/llamaparse.md +++ /dev/null @@ -1,59 +0,0 @@ -# Enhancing with LlamaParse - -In the previous example we asked a very basic question of our document, about the total amount of the budget. Let's instead ask a more complicated question about a specific fact in the document: - -```python -documents = SimpleDirectoryReader("./data").load_data() -index = VectorStoreIndex.from_documents(documents) -query_engine = index.as_query_engine() - -response = query_engine.query( - "How much exactly was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget?" -) -print(response) -``` - -We unfortunately get an unhelpful answer: - -``` -The budget allocated funds to a new green investments tax credit, but the exact amount was not specified in the provided context information. 
-``` - -This is bad, because we happen to know the exact number is in the document! But the PDF is complicated, with tables and multi-column layout, and the LLM is missing the answer. Luckily, we can use LlamaParse to help us out. - -First, you need a LlamaCloud API key. You can [get one for free](https://cloud.llamaindex.ai/) by signing up for LlamaCloud. Then put it in your `.env` file just like your OpenAI key: - -```bash -LLAMA_CLOUD_API_KEY=llx-xxxxx -``` - -Now you're ready to use LlamaParse in your code. Let's bring it in as as import: - -```python -from llama_parse import LlamaParse -``` - -And let's put in a second attempt to parse and query the file (note that this uses `documents2`, `index2`, etc.) and see if we get a better answer to the exact same question: - -```python -documents2 = LlamaParse(result_type="markdown").load_data( - "./data/2023_canadian_budget.pdf" -) -index2 = VectorStoreIndex.from_documents(documents2) -query_engine2 = index2.as_query_engine() - -response2 = query_engine2.query( - "How much exactly was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget?" -) -print(response2) -``` - -We do! - -``` -$20 billion was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget. -``` - -You can always check [the repo](https://github.com/run-llama/python-agents-tutorial/blob/main/4_llamaparse.py) to what this code looks like. - -As you can see, parsing quality makes a big difference to what the LLM can understand, even for relatively simple questions. Next let's see how [memory](./memory.md) can help us with more complex questions. diff --git a/docs/docs/understanding/agent/local_models.md b/docs/docs/understanding/agent/local_models.md deleted file mode 100644 index 4b2d6332c6226c8a1ef662e0a627c426d8b2c0ee..0000000000000000000000000000000000000000 --- a/docs/docs/understanding/agent/local_models.md +++ /dev/null @@ -1,63 +0,0 @@ -# Agents with local models - -If you're happy using OpenAI or another remote model, you can skip this section, but many people are interested in using models they run themselves. The easiest way to do this is via the great work of our friends at [Ollama](https://ollama.com/), who provide a simple to use client that will download, install and run a [growing range of models](https://ollama.com/library) for you. - -## Install Ollama - -They provide a one-click installer for Mac, Linux and Windows on their [home page](https://ollama.com/). - -## Pick and run a model - -Since we're going to be doing agentic work, we'll need a very capable model, but the largest models are hard to run on a laptop. We think `mixtral 8x7b` is a good balance between power and resources, but `llama3` is another great option. You can run Mixtral by running - -```bash -ollama run mixtral:8x7b -``` - -The first time you run, it will also automatically download and install the model for you, which can take a while. - -## Switch to local agent - -To switch to Mixtral, you'll need to bring in the Ollama integration: - -```bash -pip install llama-index-llms-ollama -``` - -Then modify your dependencies to bring in Ollama instead of OpenAI: - -```python -from llama_index.llms.ollama import Ollama -``` - -And finally initialize Mixtral as your LLM instead: - -```python -llm = Ollama(model="mixtral:8x7b", request_timeout=120.0) -``` - -## Ask the question again - -```python -response = agent.chat("What is 20+(2*4)? 
Calculate step by step.") -``` - -The exact output looks different from OpenAI (it makes a mistake the first time it tries), but Mixtral gets the right answer: - -``` -Thought: The current language of the user is: English. The user wants to calculate the value of 20+(2*4). I need to break down this task into subtasks and use appropriate tools to solve each subtask. -Action: multiply -Action Input: {'a': 2, 'b': 4} -Observation: 8 -Thought: The user has calculated the multiplication part of the expression, which is (2*4), and got 8 as a result. Now I need to add this value to 20 by using the 'add' tool. -Action: add -Action Input: {'a': 20, 'b': 8} -Observation: 28 -Thought: The user has calculated the sum of 20+(2*4) and got 28 as a result. Now I can answer without using any more tools. -Answer: The solution to the expression 20+(2*4) is 28. -The solution to the expression 20+(2*4) is 28. -``` - -Check the [repo](https://github.com/run-llama/python-agents-tutorial/blob/main/2_local_agent.py) to see what this final code looks like. - -You can now continue the rest of the tutorial with a local model if you prefer. We'll keep using OpenAI as we move on to [adding RAG to your agent](./rag_agent.md). diff --git a/docs/docs/understanding/agent/memory.md b/docs/docs/understanding/agent/memory.md deleted file mode 100644 index bdfbeb720c9784aac369b677c942a5120965a285..0000000000000000000000000000000000000000 --- a/docs/docs/understanding/agent/memory.md +++ /dev/null @@ -1,56 +0,0 @@ -# Memory - -We've now made several additions and subtractions to our code. To make it clear what we're using, you can see [the current code for our agent](https://github.com/run-llama/python-agents-tutorial/blob/main/5_memory.py) in the repo. It's using OpenAI for the LLM and LlamaParse to enhance parsing. - -We've also added 3 questions in a row. Let's see how the agent handles them: - -```python -response = agent.chat( - "How much exactly was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget?" -) - -print(response) - -response = agent.chat( - "How much was allocated to a implement a means-tested dental care program in the 2023 Canadian federal budget?" -) - -print(response) - -response = agent.chat( - "How much was the total of those two allocations added together? Use a tool to answer any questions." -) - -print(response) -``` - -This is demonstrating a powerful feature of agents in LlamaIndex: memory. Let's see what the output looks like: - -``` -Started parsing the file under job_id cac11eca-45e0-4ea9-968a-25f1ac9b8f99 -Thought: The current language of the user is English. I need to use a tool to help me answer the question. -Action: canadian_budget_2023 -Action Input: {'input': 'How much was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget?'} -Observation: $20 billion was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget. -Thought: I can answer without using any more tools. I'll use the user's language to answer -Answer: $20 billion was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget. -$20 billion was allocated to a tax credit to promote investment in green technologies in the 2023 Canadian federal budget. -Thought: The current language of the user is: English. I need to use a tool to help me answer the question. 
-Action: canadian_budget_2023 -Action Input: {'input': 'How much was allocated to implement a means-tested dental care program in the 2023 Canadian federal budget?'} -Observation: $13 billion was allocated to implement a means-tested dental care program in the 2023 Canadian federal budget. -Thought: I can answer without using any more tools. I'll use the user's language to answer -Answer: $13 billion was allocated to implement a means-tested dental care program in the 2023 Canadian federal budget. -$13 billion was allocated to implement a means-tested dental care program in the 2023 Canadian federal budget. -Thought: The current language of the user is: English. I need to use a tool to help me answer the question. -Action: add -Action Input: {'a': 20, 'b': 13} -Observation: 33 -Thought: I can answer without using any more tools. I'll use the user's language to answer -Answer: The total of the allocations for the tax credit to promote investment in green technologies and the means-tested dental care program in the 2023 Canadian federal budget is $33 billion. -The total of the allocations for the tax credit to promote investment in green technologies and the means-tested dental care program in the 2023 Canadian federal budget is $33 billion. -``` - -The agent remembers that it already has the budget allocations from previous questions, and can answer a contextual question like "add those two allocations together" without needing to specify which allocations exactly. It even correctly uses the other addition tool to sum up the numbers. - -Having demonstrated how memory helps, let's [add some more complex tools](./tools.md) to our agent. diff --git a/docs/docs/understanding/agent/multi_agent.md b/docs/docs/understanding/agent/multi_agent.md new file mode 100644 index 0000000000000000000000000000000000000000..e77f7520d6d7cb560600b4c56c871e0eca8a8e4a --- /dev/null +++ b/docs/docs/understanding/agent/multi_agent.md @@ -0,0 +1,246 @@ +# Multi-agent systems with AgentWorkflow + +So far you've been using `AgentWorkflow` to create single agents. But `AgentWorkflow` is also designed to support multi-agent systems, where multiple agents collaborate to complete your task, handing off control to each other as needed. + +In this example, our system will have three agents: + +* A `ResearchAgent` that will search the web for information on the given topic. +* A `WriteAgent` that will write the report using the information found by the ResearchAgent. +* A `ReviewAgent` that will review the report and provide feedback. + +We will use `AgentWorkflow` to create a multi-agent system that will execute these agents in order. + +There are a lot of ways we could go about building a system to perform this task. In this example, we will use a few tools to help with the research and writing processes. + +* A `web_search` tool to search the web for information on the given topic (we'll use Tavily, as we did in previous examples) +* A `record_notes` tool which will save research found on the web to the state so that the other tools can use it (see [state management](./state.md) to remind yourself how this works) +* A `write_report` tool to write the report using the information found by the `ResearchAgent` +* A `review_report` tool to review the report and provide feedback. + +Utilizing the Context class, we can pass state between agents, and each agent will have access to the current state of the system. 
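
The snippets that follow reuse the setup from the earlier sections of this tutorial. If you're starting from a fresh file, here is a minimal sketch of the assumed imports and LLM; we use `gpt-4o-mini` as elsewhere in this tutorial, but any sufficiently capable model will work:

```python
# Assumed setup for the snippets below; adjust the model and API keys to your environment.
import os

from dotenv import load_dotenv

load_dotenv()  # expects OPENAI_API_KEY and TAVILY_API_KEY in your .env file

from llama_index.core.agent.workflow import (
    AgentOutput,
    AgentWorkflow,
    ToolCall,
    ToolCallResult,
)
from llama_index.core.workflow import Context
from llama_index.llms.openai import OpenAI
from llama_index.tools.tavily_research import TavilyToolSpec

llm = OpenAI(model="gpt-4o-mini")
```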
+ +We'll define our `web_search` tool simply by using the one we get from the `TavilyToolSpec`: + +```python +tavily_tool = TavilyToolSpec(api_key=os.getenv("TAVILY_API_KEY")) +search_web = tavily_tool.to_tool_list()[0] +``` + +Our `record_notes` tool will access the current state, add the notes to the state, and then return a message indicating that the notes have been recorded. + +```python +async def record_notes(ctx: Context, notes: str, notes_title: str) -> str: + """Useful for recording notes on a given topic.""" + current_state = await ctx.get("state") + if "research_notes" not in current_state: + current_state["research_notes"] = {} + current_state["research_notes"][notes_title] = notes + await ctx.set("state", current_state) + return "Notes recorded." +``` + +`write_report` and `review_report` will similarly be tools that access the state: + +```python +async def write_report(ctx: Context, report_content: str) -> str: + """Useful for writing a report on a given topic.""" + current_state = await ctx.get("state") + current_state["report_content"] = report_content + await ctx.set("state", current_state) + return "Report written." + + +async def review_report(ctx: Context, review: str) -> str: + """Useful for reviewing a report and providing feedback.""" + current_state = await ctx.get("state") + current_state["review"] = review + await ctx.set("state", current_state) + return "Report reviewed." +``` + +Now we're going to bring in a new class to create a stand-alone function-calling agent, the `FunctionAgent` (we also support a `ReactAgent`): + +```python +from llama_index.core.agent.workflow import FunctionAgent +``` + +Using it, we'll create the first of our agents, the `ResearchAgent` which will search the web for information using the `search_web` tool and use the `record_notes` tool to save those notes to the state for other agents to use. The key syntactical elements to note here are: +* The `name`, which is used to identify the agent to other agents, as we'll see shortly +* The `description`, which is used by other agents to decide who to hand off control to next +* The `system_prompt`, which defines the behavior of the agent +* `can_handoff_to` is an optional list of agent names that the agent can hand control to. By default, it will be able to hand control to any other agent. + +```python +research_agent = FunctionAgent( + name="ResearchAgent", + description="Useful for searching the web for information on a given topic and recording notes on the topic.", + system_prompt=( + "You are the ResearchAgent that can search the web for information on a given topic and record notes on the topic. " + "Once notes are recorded and you are satisfied, you should hand off control to the WriteAgent to write a report on the topic." + ), + llm=llm, + tools=[search_web, record_notes], + can_handoff_to=["WriteAgent"], +) +``` + +Our other two agents are defined similarly, with different tools and system prompts: + +```python +write_agent = FunctionAgent( + name="WriteAgent", + description="Useful for writing a report on a given topic.", + system_prompt=( + "You are the WriteAgent that can write a report on a given topic. " + "Your report should be in a markdown format. The content should be grounded in the research notes. " + "Once the report is written, you should get feedback at least once from the ReviewAgent." 
+ ), + llm=llm, + tools=[write_report], + can_handoff_to=["ReviewAgent", "ResearchAgent"], +) + +review_agent = FunctionAgent( + name="ReviewAgent", + description="Useful for reviewing a report and providing feedback.", + system_prompt=( + "You are the ReviewAgent that can review a report and provide feedback. " + "Your feedback should either approve the current report or request changes for the WriteAgent to implement." + ), + llm=llm, + tools=[review_report], + can_handoff_to=["WriteAgent"], +) +``` + +With our agents defined, we can now instantiate our `AgentWorkflow` directly to create a multi-agent system. We give it an array of our agents, and define which one should initially have control using `root_agent`. We can also define the initial value of the `state` variable, which as we've [seen previously](./state.md), is a dictionary that can be accessed by all agents. + +```python +agent_workflow = AgentWorkflow( + agents=[research_agent, write_agent, review_agent], + root_agent=research_agent.name, + initial_state={ + "research_notes": {}, + "report_content": "Not written yet.", + "review": "Review required.", + }, +) +``` + +Now we're ready to run our multi-agent system. We've added some event-handling [using streaming events](./streaming.md) to make it clearer what's happening under the hood: + +```python +handler = agent_workflow.run( + user_msg=""" + Write me a report on the history of the web. Briefly describe the history + of the world wide web, including the development of the internet and the + development of the web, including 21st century developments. +""" +) + +current_agent = None +current_tool_calls = "" +async for event in handler.stream_events(): + if ( + hasattr(event, "current_agent_name") + and event.current_agent_name != current_agent + ): + current_agent = event.current_agent_name + print(f"\n{'='*50}") + print(f"🤖 Agent: {current_agent}") + print(f"{'='*50}\n") + elif isinstance(event, AgentOutput): + if event.response.content: + print("📤 Output:", event.response.content) + if event.tool_calls: + print( + "🛠️ Planning to use tools:", + [call.tool_name for call in event.tool_calls], + ) + elif isinstance(event, ToolCallResult): + print(f"🔧 Tool Result ({event.tool_name}):") + print(f" Arguments: {event.tool_kwargs}") + print(f" Output: {event.tool_output}") + elif isinstance(event, ToolCall): + print(f"🔨 Calling Tool: {event.tool_name}") + print(f" With arguments: {event.tool_kwargs}") +``` + +This gives us some very verbose output, which we've truncated here for brevity: + +``` +================================================== +🤖 Agent: ResearchAgent +================================================== + +🛠️ Planning to use tools: ['search'] +🔨 Calling Tool: search + With arguments: {'query': 'history of the world wide web and internet development', 'max_results': 6} +🔧 Tool Result (search): + Arguments: {'query': 'history of the world wide web and internet development', 'max_results': 6} + Output: [Document(id_='2e977310-2994-4ea9-ade2-8da4533983e8', embedding=None, metadata={'url': 'https://www.scienceandmediamuseum.org.uk/objects-and-stories/short-history-internet'}, excluded_embed_metadata_keys=[], ... 
+🛠️ Planning to use tools: ['record_notes', 'record_notes'] +🔨 Calling Tool: record_notes + With arguments: {'notes': 'The World Wide Web (WWW) was created by Tim Berners-Lee...','notes_title': 'History of the World Wide Web and Internet Development'} +🔧 Tool Result (record_notes): + Arguments: {'notes': 'The World Wide Web (WWW) was created by Tim Berners-Lee...', 'notes_title': 'History of the World Wide Web and Internet Development'} + Output: Notes recorded. +🔨 Calling Tool: record_notes + With arguments: {'notes': "The internet's origins trace back to the 1950s....", 'notes_title': '21st Century Developments in Web Technology'} +🔧 Tool Result (record_notes): + Arguments: {'notes': "The internet's origins trace back to the 1950s... .", 'notes_title': '21st Century Developments in Web Technology'} + Output: Notes recorded. +🛠️ Planning to use tools: ['handoff'] +🔨 Calling Tool: handoff + With arguments: {'to_agent': 'WriteAgent', 'reason': 'I have recorded the necessary notes on the history of the web and its developments.'} +🔧 Tool Result (handoff): + Arguments: {'to_agent': 'WriteAgent', 'reason': 'I have recorded the necessary notes on the history of the web and its developments.'} + Output: Agent WriteAgent is now handling the request due to the following reason: I have recorded the necessary notes on the history of the web and its developments.. +Please continue with the current request. +``` + +You can see that `ResearchAgent` has found some notes and handed control to `WriteAgent`, which generates `report_content`: + +``` +================================================== +🤖 Agent: WriteAgent +================================================== + +🛠️ Planning to use tools: ['write_report'] +🔨 Calling Tool: write_report + With arguments: {'report_content': '# History of the World Wide Web...'} +🔧 Tool Result (write_report): + Arguments: {'report_content': '# History of the World Wide Web...'} + Output: Report written. +🛠️ Planning to use tools: ['handoff'] +🔨 Calling Tool: handoff + With arguments: {'to_agent': 'ReviewAgent', 'reason': 'The report on the history of the web has been completed and requires review.'} +🔧 Tool Result (handoff): + Arguments: {'to_agent': 'ReviewAgent', 'reason': 'The report on the history of the web has been completed and requires review.'} + Output: Agent ReviewAgent is now handling the request due to the following reason: The report on the history of the web has been completed and requires review.. +Please continue with the current request. +``` + +And finally control is passed to the `ReviewAgent` to review the report: + +``` +================================================== +🤖 Agent: ReviewAgent +================================================== + +🛠️ Planning to use tools: ['review_report'] +🔨 Calling Tool: review_report + With arguments: {'review': 'The report on the history of the web is well-structured ... Approval is granted.'} +🔧 Tool Result (review_report): + Arguments: {'review': 'The report on the history of the web is well-structured ... Approval is granted.'} + Output: Report reviewed. +📤 Output: The report on the history of the web has been reviewed and approved. It effectively covers the key developments from the inception of the internet to the 21st century, including significant contributions and advancements. If you need any further assistance or additional reports, feel free to ask! +``` + +You can see the [full code of this example](https://github.com/run-llama/python-agents-tutorial/blob/main/6_multi_agent.py). 
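
Because every agent records its work in the shared `state` dictionary, you can also pull the finished artifacts straight out of the context once the run completes. Here's a small sketch, assuming the `handler` from the run above is still in scope:

```python
# Wait for the multi-agent run to finish, then inspect the shared state.
final_output = await handler

state = await handler.ctx.get("state")
print(state["report_content"])  # the report written by the WriteAgent
print(state["review"])  # the feedback recorded by the ReviewAgent
```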
+ +As an extension of this example, you could create a system that takes the feedback from the `ReviewAgent` and passes it back to the `WriteAgent` to update the report. + +## Congratulations! + +You've covered all there is to know about building agents with `AgentWorkflow`. In the [Workflows tutorial](../workflows/index.md), you'll take many of the concepts you've learned here and apply them to building more precise, lower-level agentic systems. diff --git a/docs/docs/understanding/agent/multi_agents.md b/docs/docs/understanding/agent/multi_agents.md deleted file mode 100644 index 06988d6758b63ded0c9a603443b2bd15392db493..0000000000000000000000000000000000000000 --- a/docs/docs/understanding/agent/multi_agents.md +++ /dev/null @@ -1,301 +0,0 @@ -# Multi-Agent Workflows - -The `AgentWorkflow` uses Workflow Agents to allow you to create a system of one or more agents that can collaborate and hand off tasks to each other based on their specialized capabilities. This enables building complex agent systems where different agents handle different aspects of a task. - -!!! tip - The `AgentWorkflow` class is built on top of LlamaIndex `Workflows`. For more information on how workflows work, check out the [detailed guide](../../module_guides/workflow/index.md) or [introductory tutorial](../workflows/index.md). - -## Quick Start - -Here's a simple example of setting up a multi-agent workflow with a calculator agent and a retriever agent: - -```python -from llama_index.core.agent.workflow import ( - AgentWorkflow, - FunctionAgent, - ReActAgent, -) -from llama_index.core.tools import FunctionTool - - -# Define some tools -def add(a: int, b: int) -> int: - """Add two numbers.""" - return a + b - - -def subtract(a: int, b: int) -> int: - """Subtract two numbers.""" - return a - b - - -# Create agent configs -# NOTE: we can use FunctionAgent or ReActAgent here. -# FunctionAgent works for LLMs with a function calling API. -# ReActAgent works for any LLM. -calculator_agent = FunctionAgent( - name="calculator", - description="Performs basic arithmetic operations", - system_prompt="You are a calculator assistant.", - tools=[ - FunctionTool.from_defaults(fn=add), - FunctionTool.from_defaults(fn=subtract), - ], - llm=OpenAI(model="gpt-4"), -) - -retriever_agent = FunctionAgent( - name="retriever", - description="Manages data retrieval", - system_prompt="You are a retrieval assistant.", - llm=OpenAI(model="gpt-4"), -) - -# Create and run the workflow -workflow = AgentWorkflow( - agents=[calculator_agent, retriever_agent], root_agent="calculator" -) - -# Run the system -response = await workflow.run(user_msg="Can you add 5 and 3?") - -# Or stream the events -handler = workflow.run(user_msg="Can you add 5 and 3?") -async for event in handler.stream_events(): - if hasattr(event, "delta"): - print(event.delta, end="", flush=True) -``` - -## How It Works - -The AgentWorkflow manages a collection of agents, each with their own specialized capabilities. One agent must be designated as the root agent in the `AgentWorkflow` constructor. - -When a user message comes in, it's first routed to the root agent. Each agent can then: - -1. Handle the request directly using their tools -2. Hand off to another agent better suited for the task -3. Return a response to the user - -## Configuration Options - -### Agent Workflow Config - -Each agent holds a certain set of configuration options. Whether you use `FunctionAgent` or `ReActAgent`, the core options are the same. 
- -```python -FunctionAgent( - # Unique name for the agent (str) - name="name", - # Description of agent's capabilities (str) - description="description", - # System prompt for the agent (str) - system_prompt="system_prompt", - # Tools available to this agent (List[BaseTool]) - tools=[...], - # LLM to use for this agent. (BaseLLM) - llm=OpenAI(model="gpt-4"), - # List of agents this one can hand off to. Defaults to all agents. (List[str]) - can_handoff_to=[...], -) -``` - -### Workflow Options - -The AgentWorkflow constructor accepts: - -```python -AgentWorkflow( - # List of agent configs. (List[BaseWorkflowAgent]) - agents=[...], - # Root agent name. (str) - root_agent="root_agent", - # Initial state dict. (Optional[dict]) - initial_state=None, - # Custom prompt for handoffs. Should contain the `agent_info` string variable. (Optional[str]) - handoff_prompt=None, - # Custom prompt for state. Should contain the `state` and `msg` string variables. (Optional[str]) - state_prompt=None, - # Timeout for the workflow, in seconds. (Optional[float]) - timeout=None, -) -``` - -### State Management - -#### Initial Global State - -You can provide an initial state dict that will be available to all agents: - -```python -workflow = AgentWorkflow( - agents=[...], - root_agent="root_agent", - initial_state={"counter": 0}, - state_prompt="Current state: {state}. User message: {msg}", -) -``` - -The state is stored in the `state` key of the workflow context. It will be injected into the `state_prompt` which augments each new user message. - -The state can also be modified by tools by accessing the workflow context directly in the tool body. - -#### Persisting State Between Runs - -In order to persist state between runs, you can pass in the context from the previous run: - -```python -workflow = AgentWorkflow(...) 
- -# Run the workflow -handler = workflow.run(user_msg="Can you add 5 and 3?") -response = await handler - -# Pass in the context from the previous run -handler = workflow.run(ctx=handler.ctx, user_msg="Can you add 5 and 3?") -response = await handler -``` - -#### Serializing Context / State - -As with normal workflows, the context is serializable: - -```python -from llama_index.core.workflow import ( - Context, - JsonSerializer, - JsonPickleSerializer, -) - -# the default serializer is JsonSerializer for safety -ctx_dict = handler.ctx.to_dict(serializer=JsonSerializer()) - -# then you can rehydrate the context -ctx = Context.from_dict(ctx_dict, serializer=JsonSerializer()) -``` - -## Streaming Events - -The workflow emits various events during execution that you can stream: - -```python -async for event in workflow.run(...).stream_events(): - if isinstance(event, AgentInput): - print(event.input) - print(event.current_agent_name) - elif isinstance(event, AgentStream): - # Agent thinking/tool calling response stream - print(event.delta) - print(event.current_agent_name) - elif isinstance(event, AgentOutput): - print(event.response) - print(event.tool_calls) - print(event.raw) - print(event.current_agent_name) - elif isinstance(event, ToolCall): - # Tool being called - print(event.tool_name) - print(event.tool_kwargs) - elif isinstance(event, ToolCallResult): - # Result of tool call - print(event.tool_output) -``` - -## Accessing Context in Tools - -The `FunctionTool` allows tools to access the workflow context if the function has a `Context` type hint as the first parameter: - -```python -from llama_index.core.tools import FunctionTool - - -async def get_counter(ctx: Context) -> int: - """Get the current counter value.""" - return await ctx.get("counter", default=0) - - -counter_tool = FunctionToolWithContext.from_defaults( - async_fn=get_counter, description="Get the current counter value" -) -``` - -!!! tip - The `FunctionTool` requires the `ctx` parameter to be passed in explicitly when calling the tool. `AgentWorkflow` will automatically pass in the context for you. - -## Human in the Loop - -Using the context, you can implement a human in the loop pattern in your tools: - -```python -from llama_index.core.workflow import InputRequiredEvent, HumanResponseEvent - - -async def ask_for_confirmation(ctx: Context) -> bool: - """Ask the user for confirmation.""" - ctx.write_event_to_stream( - InputRequiredEvent(prefix="Please confirm", confirmation_id="1234") - ) - - result = await ctx.wait_for_event( - HumanResponseEvent, requirements={"confirmation_id": "1234"} - ) - return result.confirmation -``` - -When this function is called (i.e, when an agent calls this tool), it will block the workflow execution until the user sends the required confirmation event. - -```python -handler = workflow.run(user_msg="Can you add 5 and 3?") - -async for event in handler.stream_events(): - if isinstance(event, InputRequiredEvent): - print(event.confirmation_id) - handler.ctx.send_event( - HumanResponseEvent(response="True", confirmation_id="1234") - ) - ... -``` - -## A Detailed Look at the Workflow - -Now that we've covered the basics, let's take a look at how the workflow operates in more detail using an end-to-end example. In this example, assume we have an `AgentWorkflow` with two agents: `generate` and `review`. In this workflow, `generate` is the root agent, and responsible for generating content. The `review` agent is responsible for reviewing the generated content. 
- -When the user sends in a request, here's the actual sequence of events: - -1. The workflow initializes the context with: - - A memory buffer for chat history. - - The available agents - - The [initial state](#initial-global-state) dictionary - - The current agent (initially set to the root agent, `generate`) - -2. The user's message is processed: - - If [state exists](#initial-global-state), it's added to the user's message using the [state prompt](#agent-workflow-config) - - The message is added to memory - - The chat history is prepared for the current agent - -3. The current agent is set up: - - The agent's tools are gathered (including any retrieved tools) - - A special `handoff` tool is added if the agent can hand off to others - - The agent's system prompt is prepended to the chat history - - An `AgentInput` event is emitted just before the LLM is called - -4. The agent processes the input: - - The agent generates a response and/or makes tool calls. This generates both `AgentStream` events and an `AgentOutput` event - - If there are no tool calls, the agent finalizes its response and returns it - - If there are tool calls, each tool is executed and the results are processed. This will generate a `ToolCall` event and a `ToolCallResult` event for each tool call - -5. After tool execution: - - If any tool was marked as `return_direct=True`, its result becomes the final output - - If a handoff occurred (via the handoff tool), the workflow switches to the new agent. This will not be added to the chat history in order to maintain the conversation flow. - - Otherwise, the updated chat history is sent back to the current agent for another step - -This cycle continues until either: -- The current agent provides a response without tool calls -- A tool marked as `return_direct=True` is called (except for handoffs) -- The workflow times out (if a timeout was configured) - -## Examples - -We have a few notebook examples using the `AgentWorkflow` class: - -- [Agent Workflow Overview](../../examples/agent/agent_workflow_basic.ipynb) -- [Multi-Agent Research Report Workflow](../../examples/agent/agent_workflow_multi.ipynb) diff --git a/docs/docs/understanding/agent/state.md b/docs/docs/understanding/agent/state.md new file mode 100644 index 0000000000000000000000000000000000000000..17a7a6747ef77b041de2ed77f35d56456b1844a1 --- /dev/null +++ b/docs/docs/understanding/agent/state.md @@ -0,0 +1,144 @@ +# Maintaining state + +By default, the `AgentWorkflow` is stateless between runs. This means that the agent will not have any memory of previous runs. + +To maintain state, we need to keep track of the previous state. In LlamaIndex, Workflows have a `Context` class that can be used to maintain state within and between runs. Since the AgentWorkflow is just a pre-built Workflow, we can also use it now. + +```python +from llama_index.core.workflow import Context +``` + +To maintain state between runs, we'll create a new Context called ctx. We pass in our workflow to properly configure this Context object for the workflow that will use it. + +```python +ctx = Context(workflow) +``` + +With our configured Context, we can pass it to our first run. + +```python +response = await workflow.run(user_msg="Hi, my name is Laurie!", ctx=ctx) +print(response) +``` + +Which gives us: + +``` +Hello Laurie! How can I assist you today? 
+``` + +And now if we run the workflow again to ask a follow-up question, it will remember that information: + +```python +response2 = await workflow.run(user_msg="What's my name?", ctx=ctx) +print(response2) +``` + +Which gives us: + +``` +Your name is Laurie! +``` + +## Maintaining state over longer periods + +The Context is serializable, so it can be saved to a database, file, etc. and loaded back in later. + +The JsonSerializer is a simple serializer that uses `json.dumps` and `json.loads` to serialize and deserialize the context. + +The JsonPickleSerializer is a serializer that uses pickle to serialize and deserialize the context. If you have objects in your context that are not serializable, you can use this serializer. + +We bring in our serializers as any other import: + +```python +from llama_index.core.workflow import JsonPickleSerializer, JsonSerializer +``` + +We can then serialize our context to a dictionary and save it to a file: + +```python +ctx_dict = ctx.to_dict(serializer=JsonSerializer()) +``` + +We can deserialize it back into a Context object and ask questions just as before: + +```python +restored_ctx = Context.from_dict( + workflow, ctx_dict, serializer=JsonSerializer() +) + +response3 = await workflow.run(user_msg="What's my name?", ctx=restored_ctx) +``` + +You can see the [full code of this example](https://github.com/run-llama/python-agents-tutorial/blob/main/3_state.py). + +## Tools and state + +Tools can also be defined that have access to the workflow context. This means you can set and retrieve variables from the context and use them in the tool, or to pass information between tools. + +`AgentWorkflow` uses a context variable called `state` that is available to every agent. You can rely on information in `state` being available without explicitly having to pass it in. + +To access the Context, the Context parameter should be the first parameter of the tool, as we're doing here, in a tool that simply adds a name to the state: + +```python +async def set_name(ctx: Context, name: str) -> str: + state = await ctx.get("state") + state["name"] = name + await ctx.set("state", state) + return f"Name set to {name}" +``` + +We can now create an agent that uses this tool. You can optionally provide the initial state of the agent, which we'll do here: + +```python +workflow = AgentWorkflow.from_tools_or_functions( + [set_name], + llm=llm, + system_prompt="You are a helpful assistant that can set a name.", + initial_state={"name": "unset"}, +) +``` + +Now we can create a Context and ask the agent about the state: + +```python +ctx = Context(workflow) + +# check if it knows a name before setting it +response = await workflow.run(user_msg="What's my name?", ctx=ctx) +print(str(response)) +``` + +Which gives us: + +``` +Your name has been set to "unset." +``` + +Then we can explicitly set the name in a new run of the agent: + +```python +response2 = await workflow.run(user_msg="My name is Laurie", ctx=ctx) +print(str(response2)) +``` + +``` +Your name has been updated to "Laurie." +``` + +We could now ask the agent the name again, or we can access the value of the state directly: + +```python +state = await ctx.get("state") +print("Name as stored in state: ", state["name"]) +``` + +Which gives us: + +``` +Name as stored in state: Laurie +``` + +You can see the [full code of this example](https://github.com/run-llama/python-agents-tutorial/blob/main/3a_tools_and_state.py). + +Next we'll learn about [streaming output and events](./streaming.md). 
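
One last note on persistence: the serialized context is an ordinary dictionary, so it can go anywhere JSON can go. Here's a minimal sketch of saving it to disk and restoring it in a later session; the `context.json` filename is just an example, and `ctx` and `workflow` are the objects defined above:

```python
import json

from llama_index.core.workflow import Context, JsonSerializer

# Save the serialized context to disk at the end of a session
ctx_dict = ctx.to_dict(serializer=JsonSerializer())
with open("context.json", "w") as f:
    json.dump(ctx_dict, f)

# Later, in a new session, load it back and attach it to the same workflow
with open("context.json") as f:
    restored_ctx = Context.from_dict(
        workflow, json.load(f), serializer=JsonSerializer()
    )

response = await workflow.run(user_msg="What's my name?", ctx=restored_ctx)
print(response)
```
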
diff --git a/docs/docs/understanding/agent/streaming.md b/docs/docs/understanding/agent/streaming.md new file mode 100644 index 0000000000000000000000000000000000000000..a2725806366b40b70164ec3296f71c9c0eb1d9aa --- /dev/null +++ b/docs/docs/understanding/agent/streaming.md @@ -0,0 +1,76 @@ +# Streaming output and events + +In real-world use, agents can take a long time to run. Providing feedback to the user about the progress of the agent is critical, and streaming allows you to do that. + +`AgentWorkflow` provides a set of pre-built events that you can use to stream output to the user. Let's take a look at how that's done. + +First, we're going to introduce a new tool that takes some time to execute. In this case we'll use a web search tool called [Tavily](https://llamahub.ai/l/tools/llama-index-tools-tavily-research), which is available in LlamaHub. + +```bash +pip install llama-index-tools-tavily-research +``` + +It requires an API key, which we're going to set in our `.env` file as `TAVILY_API_KEY` and retrieve using the `os.getenv` method. Let's bring in our imports: + +```python +from llama_index.tools.tavily_research import TavilyToolSpec +import os +``` + +And initialize the tool: + +```python +tavily_tool = TavilyToolSpec(api_key=os.getenv("TAVILY_API_KEY")) +``` + +Now we'll create an agent using that tool and an LLM that we initialized just like we did previously. + +```python +workflow = AgentWorkflow.from_tools_or_functions( + tavily_tool.to_tool_list(), + llm=llm, + system_prompt="You're a helpful assistant that can search the web for information.", +) +``` + +In previous examples, we've used `await` on the `workflow.run` method to get the final response from the agent. However, if we don't await the response, we get an asynchronous iterator back that we can iterate over to get the events as they come in. This iterator will return all sorts of events. We'll start with an `AgentStream` event, which contains the "delta" (the most recent change) to the output as it comes in. We'll need to import that event type: + +```python +from llama_index.core.agent.workflow import AgentStream +``` + +And now we can run the workflow and look for events of that type to output: + +```python +handler = workflow.run(user_msg="What's the weather like in San Francisco?") + +async for event in handler.stream_events(): + if isinstance(event, AgentStream): + print(event.delta, end="", flush=True) +``` + +If you run this yourself, you will see the output arriving in chunks as the agent runs, returning something like this: + +``` +The current weather in San Francisco is as follows: + +- **Temperature**: 17.2°C (63°F) +- **Condition**: Sunny +- **Wind**: 6.3 mph (10.1 kph) from the NNW +- **Humidity**: 54% +- **Pressure**: 1021 mb (30.16 in) +- **Visibility**: 16 km (9 miles) + +For more details, you can check the full report [here](https://www.weatherapi.com/). +``` + +`AgentStream` is just one of many events that `AgentWorkflow` emits as it runs. The others are: + +* `AgentInput`: the full message object that begins the agent's execution +* `AgentOutput`: the response from the agent +* `ToolCall`: which tools were called and with what arguments +* `ToolCallResult`: the result of a tool call + +You can see us filtering for more of these events in the [full code of this example](https://github.com/run-llama/python-agents-tutorial/blob/main/4_streaming.py). + +Next you'll learn about how to get a [human in the loop](./human_in_the_loop.md) to provide feedback to your agents. 
diff --git a/docs/docs/understanding/agent/tools.md b/docs/docs/understanding/agent/tools.md index 48eeef24eaa808ab07adea1985c470b3591c81f1..ea48a4fca42631b22f51ba2096e4e3498fd6c05a 100644 --- a/docs/docs/understanding/agent/tools.md +++ b/docs/docs/understanding/agent/tools.md @@ -12,84 +12,45 @@ First we need to install the tool: pip install llama-index-tools-yahoo-finance ``` -Then we can set up our dependencies. This is exactly the same as our previous examples, except for the final import: +Our dependencies are the same as our previous example, we just need to add the Yahoo Finance tools: ```python -from dotenv import load_dotenv - -load_dotenv() -from llama_index.core.agent import ReActAgent -from llama_index.llms.openai import OpenAI -from llama_index.core.tools import FunctionTool -from llama_index.core import Settings from llama_index.tools.yahoo_finance import YahooFinanceToolSpec ``` -To show how custom tools and LlamaHub tools can work together, we'll include the code from our previous examples the defines a "multiple" tool. We'll also take this opportunity to set up the LLM: - -```python -# settings -Settings.llm = OpenAI(model="gpt-4o", temperature=0) - - -# function tools -def multiply(a: float, b: float) -> float: - """Multiply two numbers and returns the product""" - return a * b - - -multiply_tool = FunctionTool.from_defaults(fn=multiply) - - -def add(a: float, b: float) -> float: - """Add two numbers and returns the sum""" - return a + b - - -add_tool = FunctionTool.from_defaults(fn=add) -``` - -Now we'll do the new step, which is to fetch the array of tools: +To show how you can combine custom tools with LlamaHub tools, we're going to leave the `add` and `multiply` functions in place even though we don't need them here. We'll bring in our tools: ```python finance_tools = YahooFinanceToolSpec().to_tool_list() ``` -This is just a regular array, so we can use Python's `extend` method to add our own tools to the mix: +A tool list is just an array, so we can use Python's `extend` method to add our own tools to the mix: ```python -finance_tools.extend([multiply_tool, add_tool]) +finance_tools.extend([multiply, add]) ``` -Then we set up the agent as usual, and ask a question: +And we'll ask a different question than last time, necessitating the use of the new tools: ```python -agent = ReActAgent.from_tools(finance_tools, verbose=True) - -response = agent.chat("What is the current price of NVDA?") - -print(response) +async def main(): + response = await workflow.run( + user_msg="What's the current stock price of NVIDIA?" + ) + print(response) ``` -The response is very wordy, so we've truncated it: +We get this response: ``` -Thought: The current language of the user is English. I need to use a tool to help me answer the question. -Action: stock_basic_info -Action Input: {'ticker': 'NVDA'} -Observation: Info: -{'address1': '2788 San Tomas Expressway' -... -'currentPrice': 135.58 -...} -Thought: I have obtained the current price of NVDA from the stock basic info. -Answer: The current price of NVDA (NVIDIA Corporation) is $135.58. -The current price of NVDA (NVIDIA Corporation) is $135.58. +The current stock price of NVIDIA Corporation (NVDA) is $128.41. ``` -Perfect! As you can see, using existing tools is a snap. +(This is cheating a little bit, because our model already knew the ticker symbol for NVIDIA. 
If it were a less well-known corporation you would need to add a search tool like [Tavily](https://llamahub.ai/l/tools/llama-index-tools-tavily-research) to find the ticker symbol.) + +And that's it! You can now use any of the tools in LlamaHub in your agents. -As always, you can check [the repo](https://github.com/run-llama/python-agents-tutorial/blob/main/6_tools.py) to see this code all in one place. +As always, you can check [the repo](https://github.com/run-llama/python-agents-tutorial/blob/main/2_tools.py) to see this code all in one place. ## Building and contributing your own tools @@ -100,4 +61,4 @@ We love open source contributions of new tools! You can see an example of [what Once you've got a tool working, follow our [contributing guide](https://github.com/run-llama/llama_index/blob/main/CONTRIBUTING.md#2--contribute-a-pack-reader-tool-or-dataset-formerly-from-llama-hub) for instructions on correctly setting metadata and submitting a pull request. -Congratulations! You've completed our guide to building agents with LlamaIndex. We can't wait to see what use-cases you build! +Next we'll look at [how to maintain state](./state.md) in your agents. diff --git a/docs/docs/understanding/index.md b/docs/docs/understanding/index.md index e8675bd2cd85ef37ac29f419a8f18402d6d8e9bd..88b47574ab225424762c563fd957c8de10a3d3bb 100644 --- a/docs/docs/understanding/index.md +++ b/docs/docs/understanding/index.md @@ -1,37 +1,53 @@ # Building an LLM application -Welcome to the beginning of Understanding LlamaIndex. This is a series of short, bite-sized tutorials on every stage of building an LLM application to get you acquainted with how to use LlamaIndex before diving into more advanced and subtle strategies. If you're an experienced programmer new to LlamaIndex, this is the place to start. +Welcome to Understanding LlamaIndex. This is a series of short, bite-sized tutorials on every stage of building an agentic LLM application to get you acquainted with how to use LlamaIndex before diving into more advanced and subtle strategies. If you're an experienced programmer new to LlamaIndex, this is the place to start. -## Key steps in building an LLM application +## Key steps in building an agentic LLM application !!! tip - If you've already read our [high-level concepts](../getting_started/concepts.md) page you'll recognize several of these steps. + You might want to read our [high-level concepts](../getting_started/concepts.md) if these terms are unfamiliar. This tutorial has three main parts: **Building a RAG pipeline**, **Building an agent**, and **Building Workflows**, with some smaller sections before and after. Here's what to expect: - **[Using LLMs](./using_llms/using_llms.md)**: hit the ground running by getting started working with LLMs. We'll show you how to use any of our [dozens of supported LLMs](../module_guides/models/llms/modules/), whether via remote API calls or running locally on your machine. -- **Building a RAG pipeline**: Retrieval-Augmented Generation (RAG) is a key technique for getting your data into an LLM, and a component of more sophisticated agentic systems. We'll show you how to build a full-featured RAG pipeline that can answer questions about your data. This includes: +- **[Building agents](./agent/index.md)**: agents are LLM-powered knowledge workers that can interact with the world via a set of tools. Those tools can retrieve information (such as RAG, see below) or take action. 
This tutorial includes: - - **[Loading & Ingestion](./loading/loading.md)**: Getting your data from wherever it lives, whether that's unstructured text, PDFs, databases, or APIs to other applications. LlamaIndex has hundreds of connectors to every data source over at [LlamaHub](https://llamahub.ai/). + - **[Building a single agent](./agent/index.md)**: We show you how to build a simple agent that can interact with the world via a set of tools. - - **[Indexing and Embedding](./indexing/indexing.md)**: Once you've got your data there are an infinite number of ways to structure access to that data to ensure your applications is always working with the most relevant data. LlamaIndex has a huge number of these strategies built-in and can help you select the best ones. + - **[Using existing tools](./agent/tools.md)**: LlamaIndex provides a registry of pre-built agent tools at [LlamaHub](https://llamahub.ai/) that you can incorporate into your agents. - - **[Storing](./storing/storing.md)**: You will probably find it more efficient to store your data in indexed form, or pre-processed summaries provided by an LLM, often in a specialized database known as a `Vector Store` (see below). You can also store your indexes, metadata and more. + - **[Maintaining state](./agent/state.md)**: agents can maintain state, which is important for building more complex applications. - - **[Querying](./querying/querying.md)**: Every indexing strategy has a corresponding querying strategy and there are lots of ways to improve the relevance, speed and accuracy of what you retrieve and what the LLM does with it before returning it to you, including turning it into structured responses such as an API. + - **[Streaming output and events](./agent/streaming.md)**: providing visibility and feedback to the user is important, and streaming allows you to do that. + + - **[Human in the loop](./agent/human_in_the_loop.md)**: getting human feedback to your agent can be critical. + + - **[Multi-agent systems with AgentWorkflow](./agent/multi_agent.md)**: combining multiple agents to collaborate is a powerful technique for building more complex systems; this section shows you how to do so. + +- **[Workflows](./workflows/index.md)**: Workflows are a lower-level, event-driven abstraction for building agentic applications. They're the base layer you should be using to build any advanced agentic application. You can use the pre-built abstractions you learned above, or build agents completely from scratch. This tutorial covers: + + - **[Building a simple workflow](./workflows/index.md)**: a simple workflow that shows you how to use the `Workflow` class to build a basic agentic application. -- **Building an agent**: agents are LLM-powered knowledge workers that can interact with the world via a set of tools. Those tools can be RAG engines such as you learned how to build in the previous section, or any arbitrary code. This tutorial includes: + - **[Visualizing workflows](./workflows/visualizing_workflows.md)**: workflows can be visualized as a graph to help you understand the flow of control through your application. - - **[Building a basic agent](./agent/basic_agent.md)**: We show you how to build a simple agent that can interact with the world via a set of tools. + - **[Looping and branching](./workflows/looping_and_branching.md)**: these core control flow patterns are the building blocks of more complex workflows. 
- - **[Using local models with agents](./agent/local_models.md)**: Agents can be built to use local models, which can be important for performance or privacy reasons. + - **[Concurrent execution](./workflows/concurrent_execution.md)**: you can run steps in parallel to split up work efficiently. - - **[Adding RAG to an agent](./agent/rag_agent.md)**: The RAG pipelines you built in the previous tutorial can be used as a tool by an agent, giving your agent powerful information-retrieval capabilities. + - **[Streaming events](./workflows/streaming_events.md)**: your agents can emit user-facing events just like the agents you built above. - - **[Adding other tools](./agent/tools.md)**: Let's add more sophisticated tools to your agent, such as API integrations. + - **[Multi-agent systems from scratch](./workflows/multi_agent_system.md)**: you can build multi-agent systems from scratch using the techniques you've learned above. -- **Building Workflows**: Workflows are a low-level, event-driven abstraction for building agentic applications. They're the base layer you should be using to build any custom, advanced RAG/agent system. You can use the pre-built abstractions you learned above, or build completely agentic applications from scratch. [Get started here](./workflows/index.md). +- **[Adding RAG to your agents](./rag/index.md)**: Retrieval-Augmented Generation (RAG) is a key technique for getting your data to an LLM, and a component of more sophisticated agentic systems. We'll show you how to enhance your agents with a full-featured RAG pipeline that can answer questions about your data. This includes: + + - **[Loading & Ingestion](./loading/loading.md)**: Getting your data from wherever it lives, whether that's unstructured text, PDFs, databases, or APIs to other applications. LlamaIndex has hundreds of connectors to every data source over at [LlamaHub](https://llamahub.ai/). + + - **[Indexing and Embedding](./indexing/indexing.md)**: Once you've got your data, there are an infinite number of ways to structure access to that data to ensure your application is always working with the most relevant data. LlamaIndex has a huge number of these strategies built-in and can help you select the best ones. + + - **[Storing](./storing/storing.md)**: You will probably find it more efficient to store your data in indexed form, or pre-processed summaries provided by an LLM, often in a specialized database known as a `Vector Store` (see below). You can also store your indexes, metadata and more. + + - **[Querying](./querying/querying.md)**: Every indexing strategy has a corresponding querying strategy and there are lots of ways to improve the relevance, speed and accuracy of what you retrieve and what the LLM does with it before returning it to you, including turning it into structured responses such as an API. - **[Putting it all together](./putting_it_all_together/index.md)**: whether you are building question & answering, chatbots, an API, or an autonomous agent, we show you how to get your application into production. diff --git a/docs/docs/understanding/using_llms/using_llms.md b/docs/docs/understanding/using_llms/using_llms.md index 0b5bfb218c0fdb94f6ffed61aaf25ebf8dda72ea..efc848dad84103ec1fc3013aae609d4f280f8e3f 100644 --- a/docs/docs/understanding/using_llms/using_llms.md +++ b/docs/docs/understanding/using_llms/using_llms.md @@ -3,66 +3,75 @@ !!! tip For a list of our supported LLMs and a comparison of their functionality, check out our [LLM module guide](../../module_guides/models/llms.md).
-One of the first steps when building an LLM-based application is which LLM to use; you can also use more than one if you wish. +One of the first steps when building an LLM-based application is choosing which LLM to use; they have different strengths and price points, and you may wish to use more than one. -LLMs are used at multiple different stages of your workflow: +LlamaIndex provides a single interface to a large number of different LLMs. Using an LLM can be as simple as installing the appropriate integration: -- During **Indexing** you may use an LLM to determine the relevance of data (whether to index it at all) or you may use an LLM to summarize the raw data and index the summaries instead. -- During **Querying** LLMs can be used in two ways: - - During **Retrieval** (fetching data from your index) LLMs can be given an array of options (such as multiple different indices) and make decisions about where best to find the information you're looking for. An agentic LLM can also use _tools_ at this stage to query different data sources. - - During **Response Synthesis** (turning the retrieved data into an answer) an LLM can combine answers to multiple sub-queries into a single coherent answer, or it can transform data, such as from unstructured text to JSON or another programmatic output format. +```bash +pip install llama-index-llms-openai +``` -LlamaIndex provides a single interface to a large number of different LLMs, allowing you to pass in any LLM you choose to any stage of the flow. It can be as simple as this: +And then calling it in a one-liner: ```python from llama_index.llms.openai import OpenAI -response = OpenAI().complete("Paul Graham is ") +response = OpenAI().complete("William Shakespeare is ") print(response) ``` -Usually, you will instantiate an LLM and pass it to `Settings`, which you then pass to other stages of the flow, as in this example: +Note that this requires an API key called `OPENAI_API_KEY` in your environment; see the [starter tutorial](../../getting_started/starter_example.md) for more details. -```python -from llama_index.llms.openai import OpenAI -from llama_index.core import Settings -from llama_index.core import VectorStoreIndex, SimpleDirectoryReader +`complete` is also available as an async method, `acomplete`. -Settings.llm = OpenAI(temperature=0.2, model="gpt-4") +You can also get a streaming response by calling `stream_complete`, which returns a generator that yields tokens as they are produced: -documents = SimpleDirectoryReader("data").load_data() -index = VectorStoreIndex.from_documents( -    documents, -) ``` +handle = OpenAI().stream_complete("William Shakespeare is ") -In this case, you've instantiated OpenAI and customized it to use the `gpt-4` model instead of the default `gpt-3.5-turbo`, and also modified the `temperature`. The `VectorStoreIndex` will now use gpt-4 to answer questions when querying. +for token in handle: + print(token.delta, end="", flush=True) +``` -!!! tip - The `Settings` is a bundle of configuration data that you pass into different parts of LlamaIndex. You can [learn more about Settings](../../module_guides/supporting_modules/settings.md) and how to customize it. +`stream_complete` is also available as an async method, `astream_complete`.
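+If you're working in an async codebase, here's a minimal sketch of how you might combine the async variants, assuming the same `OpenAI` LLM and `OPENAI_API_KEY` environment variable as above:
+
+```python
+import asyncio
+
+from llama_index.llms.openai import OpenAI
+
+
+async def main():
+    llm = OpenAI()
+
+    # awaitable, non-streaming completion
+    response = await llm.acomplete("William Shakespeare is ")
+    print(response)
+
+    # awaitable streaming: astream_complete returns an async generator of tokens
+    handle = await llm.astream_complete("William Shakespeare is ")
+    async for token in handle:
+        print(token.delta, end="", flush=True)
+
+
+asyncio.run(main())
+```
+
+Note that `astream_complete` is itself awaited before you iterate over the tokens it yields.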
+ +## Chat interface + +The LLM class also implements a `chat` method, which allows you to have more sophisticated interactions: + +```python +from llama_index.llms.openai import OpenAI +from llama_index.core.llms import ChatMessage + +llm = OpenAI() +messages = [ + ChatMessage(role="system", content="You are a helpful assistant."), + ChatMessage(role="user", content="Tell me a joke."), +] +chat_response = llm.chat(messages) +``` + +## Specifying models + +Many LLM integrations provide more than one model. You can specify a model by passing the `model` parameter to the LLM constructor: + +```python +llm = OpenAI(model="gpt-4o-mini") +response = llm.complete("Who is Laurie Voss?") +print(response) +``` ## Available LLMs -We support integrations with OpenAI, Hugging Face, PaLM, and more. Check out our [module guide to LLMs](../../module_guides/models/llms.md) for a full list, including how to run a local model. +We support integrations with OpenAI, Anthropic, Mistral, DeepSeek, Hugging Face, and dozens more. Check out our [module guide to LLMs](../../module_guides/models/llms.md) for a full list, including how to run a local model. !!! tip - A general note on privacy and LLMs can be found on the [privacy page](./privacy.md). + A general note on privacy and LLM usage can be found on the [privacy page](./privacy.md). ### Using a local LLM -LlamaIndex doesn't just support hosted LLM APIs; you can also [run a local model such as Llama2 locally](https://replicate.com/blog/run-llama-locally). - -For example, if you have [Ollama](https://github.com/ollama/ollama) installed and running: +LlamaIndex doesn't just support hosted LLM APIs; you can also run a model such as Meta's Llama 3 locally. For example, if you have [Ollama](https://github.com/ollama/ollama) installed and running: ```python from llama_index.llms.ollama import Ollama -from llama_index.core import Settings -Settings.llm = Ollama(model="llama2", request_timeout=60.0) +llm = Ollama(model="llama3.3", request_timeout=60.0) ``` -See the [custom LLM's How-To](../../module_guides/models/llms/usage_custom.md) for more details. - -## Prompts - -By default LlamaIndex comes with a great set of built-in, battle-tested prompts that handle the tricky work of getting a specific LLM to correctly handle and format data. This is one of the biggest benefits of using LlamaIndex. If you want to, you can [customize the prompts](../../module_guides/models/prompts/index.md). +See the [custom LLM's How-To](../../module_guides/models/llms/usage_custom.md) for more details on using and configuring LLMs.
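+Because every LLM integration exposes this same interface, the local model can be swapped into the examples above. Here's a minimal sketch of how you might call it, assuming you've installed the integration with `pip install llama-index-llms-ollama` and pulled the model with `ollama pull llama3.3`:
+
+```python
+from llama_index.core.llms import ChatMessage
+from llama_index.llms.ollama import Ollama
+
+# a locally-served model exposes the same complete/chat methods as hosted LLMs
+llm = Ollama(model="llama3.3", request_timeout=60.0)
+
+print(llm.complete("William Shakespeare is "))
+
+messages = [
+    ChatMessage(role="system", content="You are a helpful assistant."),
+    ChatMessage(role="user", content="Tell me a joke."),
+]
+print(llm.chat(messages))
+```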
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index ac5fcd298a9a83fc55fedc9df4da910187a6261f..4b5f4e5af07226059a5f7b24b1afd1d4662256c9 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -34,23 +34,13 @@ nav: - Learn: - Building an LLM Application: ./understanding/index.md - Using LLMs: ./understanding/using_llms/using_llms.md - - Building a RAG pipeline: - - Introduction to RAG: ./understanding/rag/index.md - - Loading & Ingestion: - - ./understanding/loading/loading.md - - ./understanding/loading/llamahub.md - - ./understanding/loading/llamacloud.md - - Indexing & Embedding: ./understanding/indexing/indexing.md - - Storing: ./understanding/storing/storing.md - - Querying: ./understanding/querying/querying.md - - Building an agent: + - Building agents: - Building a basic agent: ./understanding/agent/index.md - - Agents with local models: ./understanding/agent/local_models.md - - Adding RAG to an agent: ./understanding/agent/rag_agent.md - - Enhancing with LlamaParse: ./understanding/agent/llamaparse.md - - Memory: ./understanding/agent/memory.md - - Adding other tools: ./understanding/agent/tools.md - - Multi-agent workflows: ./understanding/agent/multi_agents.md + - Using existing tools: ./understanding/agent/tools.md + - Maintaining state: ./understanding/agent/state.md + - Streaming output and events: ./understanding/agent/streaming.md + - Human in the loop: ./understanding/agent/human_in_the_loop.md + - Multi-agent workflows: ./understanding/agent/multi_agent.md - Building Workflows: - Introduction to workflows: ./understanding/workflows/index.md - A basic workflow: ./understanding/workflows/basic_flow.md @@ -62,6 +52,15 @@ nav: - Nested workflows: ./understanding/workflows/nested.md - Observability: ./understanding/workflows/observability.md - Unbound syntax: ./understanding/workflows/unbound_functions.md + - Building a RAG pipeline: + - Introduction to RAG: ./understanding/rag/index.md + - Loading & Ingestion: + - ./understanding/loading/loading.md + - ./understanding/loading/llamahub.md + - ./understanding/loading/llamacloud.md + - Indexing & Embedding: ./understanding/indexing/indexing.md + - Storing: ./understanding/storing/storing.md + - Querying: ./understanding/querying/querying.md - Structured Data Extraction: - Introduction: ./understanding/extraction/index.md - Using Structured LLMs: ./understanding/extraction/structured_llms.md