diff --git a/docs/docs/examples/agent/structured_planner.ipynb b/docs/docs/examples/agent/structured_planner.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..908916c7f64f8636b0d6a6e023924a205d90bae5
--- /dev/null
+++ b/docs/docs/examples/agent/structured_planner.ipynb
@@ -0,0 +1,942 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Structured Planning Agent\n",
+    "\n",
+    "A key pattern in agents is the ability to plan. ReAct, for example, uses a structured approach to decompose an input into a set of function calls and thoughts in order to reason about a final response.\n",
+    "\n",
+    "However, breaking down the initial input/task into several sub-tasks can make the ReAct loop (or other reasoning loops) easier to execute.\n",
+    "\n",
+    "The `StructuredPlannerAgent` in LlamaIndex wraps any agent worker (ReAct, Function Calling, Chain-of-Abstraction, etc.) and decomposes an initial input into several sub-tasks. Each sub-task is represented by an input, an expected outcome, and any dependent sub-tasks that should be completed first.\n",
+    "\n",
+    "This notebook walks through both the high-level and low-level usage of this agent.\n",
+    "\n",
+    "**NOTE:** This agent leverages both structured outputs and agentic reasoning. Because of this, we recommend a capable LLM (OpenAI, Anthropic, etc.); open-source LLMs may struggle to plan without prompt engineering or fine-tuning."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "To create plans, the agent needs a set of tools to plan over. Here, we use the classic Uber and Lyft 2021 10-K examples."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!mkdir -p 'data/10k/'\n",
+    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'\n",
+    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.core import Settings\n",
+    "from llama_index.llms.openai import OpenAI\n",
+    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
+    "\n",
+    "# Use OpenAI's gpt-4-turbo as the LLM\n",
+    "Settings.llm = OpenAI(\n",
+    "    model=\"gpt-4-turbo\",\n",
+    "    temperature=0.1,\n",
+    ")\n",
+    "Settings.embed_model = OpenAIEmbedding(model_name=\"text-embedding-3-small\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
+    "from llama_index.core.tools import QueryEngineTool\n",
+    "\n",
+    "# Load documents, create tools\n",
+    "lyft_documents = SimpleDirectoryReader(\n",
+    "    input_files=[\"./data/10k/lyft_2021.pdf\"]\n",
+    ").load_data()\n",
+    "uber_documents = SimpleDirectoryReader(\n",
+    "    input_files=[\"./data/10k/uber_2021.pdf\"]\n",
+    ").load_data()\n",
+    "\n",
+    "lyft_index = VectorStoreIndex.from_documents(lyft_documents)\n",
+    "uber_index = VectorStoreIndex.from_documents(uber_documents)\n",
+    "\n",
+    "lyft_tool = QueryEngineTool.from_defaults(\n",
+    "    lyft_index.as_query_engine(),\n",
+    "    name=\"lyft_2021\",\n",
+    "    description=\"Useful for asking questions about Lyft's 2021 10-K filing.\",\n",
+    ")\n",
+    "\n",
+    "uber_tool = QueryEngineTool.from_defaults(\n",
+    "    uber_index.as_query_engine(),\n",
+    "    name=\"uber_2021\",\n",
+    "    description=\"Useful for asking questions about Uber's 2021 10-K filing.\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## High-level API\n",
+    "\n",
+    "In this section, we cover the high-level API for creating and chatting with a structured planning agent.\n",
+    "\n",
+    "### Create the Agent"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.core.agent import (\n",
+    "    StructuredPlannerAgent,\n",
+    "    FunctionCallingAgentWorker,\n",
+    "    ReActAgentWorker,\n",
+    ")\n",
+    "\n",
+    "# create the function calling worker for reasoning\n",
+    "worker = FunctionCallingAgentWorker.from_tools(\n",
+    "    [lyft_tool, uber_tool], verbose=True\n",
+    ")\n",
+    "\n",
+    "# wrap the worker in the top-level planner\n",
+    "agent = StructuredPlannerAgent(\n",
+    "    worker, tools=[lyft_tool, uber_tool], verbose=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Give the agent a complex task"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import nest_asyncio\n",
+    "\n",
+    "nest_asyncio.apply()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "=== Initial plan ===\n",
+      "Extract Lyft Risk Factors:\n",
+      "Summarize the key risk factors from Lyft's 2021 10-K filing. -> A summary of the key risk factors for Lyft as outlined in their 2021 10-K filing.\n",
+      "deps: []\n",
+      "\n",
+      "\n",
+      "Extract Uber Risk Factors:\n",
+      "Summarize the key risk factors from Uber's 2021 10-K filing. -> A summary of the key risk factors for Uber as outlined in their 2021 10-K filing.\n",
+      "deps: []\n",
+      "\n",
+      "\n",
+      "Combine Risk Factor Summaries:\n",
+      "Combine the summaries of key risk factors for Lyft and Uber from their 2021 10-K filings. -> A comprehensive summary of the key risk factors for both Lyft and Uber as outlined in their respective 2021 10-K filings.\n",
+      "deps: ['Extract Lyft Risk Factors', 'Extract Uber Risk Factors']\n",
+      "\n",
+      "\n",
+      "> Running step 99c90044-9092-4e1a-828d-0a344fa1534f. Step input: Summarize the key risk factors from Lyft's 2021 10-K filing.\n",
+      "Added user message to memory: Summarize the key risk factors from Lyft's 2021 10-K filing.\n",
+      "> Running step a64c6c00-27f8-4e0e-ab6c-60e30bdbeba9. Step input: Summarize the key risk factors from Uber's 2021 10-K filing.\n",
+      "Added user message to memory: Summarize the key risk factors from Uber's 2021 10-K filing.\n",
+      "=== Calling Function ===\n",
+      "Calling function: lyft_2021 with args: {\"input\": \"key risk factors\"}\n",
+      "=== Calling Function ===\n",
+      "Calling function: uber_2021 with args: {\"input\": \"key risk factors\"}\n",
+      "=== Function Output ===\n",
+      "The key risk factors include market risks such as interest rate risk, investment risk, and foreign currency risk. Interest rate risk is associated with the company's refinanced term loan facilities and fixed rate notes, which are sensitive to changes in interest rates. Investment risk involves the preservation of capital and meeting liquidity requirements without significantly increasing risk, with exposure to changes in interest rates and the carrying values of investments in other companies. Foreign currency risk arises from international transactions in multiple currencies, which can affect revenue and operating results due to fluctuations in exchange rates. Additionally, the company faces risks from potential cyberattacks, which could harm its reputation, business, and operating results. These include threats from malware, ransomware, viruses, spamming, and phishing, which could compromise data security and the integrity of information technology systems.\n",
+      "> Running step 9960ce19-3ad1-4d10-b8ba-cca8ee1dd840. Step input: None\n",
+      "=== Function Output ===\n",
+      "Key risk factors for Lyft include:\n",
+      "\n",
+      "1. General economic factors such as the impact of the COVID-19 pandemic, natural disasters, economic downturns, public health crises, and political crises.\n",
+      "2. Operational factors including Lyft's limited operating history, challenges in achieving or maintaining profitability, competition in the industry, unpredictability of results, and uncertainty regarding the growth of the ridesharing market.\n",
+      "3. The company's ability to attract and retain qualified drivers and riders.\n",
+      "4. Issues related to insurance coverage and the adequacy of insurance reserves.\n",
+      "5. Challenges related to autonomous vehicle technology and the development of the autonomous vehicle industry.\n",
+      "6. The company's reputation, brand, and company culture.\n",
+      "7. Illegal or improper activity by users of the platform.\n",
+      "8. The accuracy of background checks on potential or current drivers.\n",
+      "9. Changes to pricing practices.\n",
+      "10. The growth and quality of Lyft's network of Light Vehicles.\n",
+      "11. The company's ability to manage growth.\n",
+      "12. Security or privacy breaches, or incidents, as well as defects, errors, or vulnerabilities in technology.\n",
+      "13. Reliance on third parties such as Amazon Web Services, vehicle rental partners, payment processors, and other service providers.\n",
+      "14. The operation of Lyft's Express Drive and Lyft Rentals programs and its delivery service platform.\n",
+      "15. The ability to effectively match riders in Shared and Shared Saver Rides offerings and manage up-front pricing methodology.\n",
+      "16. The development of new offerings on the platform and management of the complexities of such expansion.\n",
+      "> Running step 24e5f72f-c0d6-4bbe-af7a-4b4a86ca5fee. Step input: None\n",
+      "=== LLM Response ===\n",
+      "The key risk factors from Uber's 2021 10-K filing include:\n",
+      "\n",
+      "1. **Market Risks:**\n",
+      "   - **Interest Rate Risk:** Related to the company's refinanced term loan facilities and fixed rate notes, which are sensitive to changes in interest rates.\n",
+      "   - **Investment Risk:** Involves the preservation of capital and meeting liquidity requirements without significantly increasing risk. This includes exposure to changes in interest rates and the carrying values of investments in other companies.\n",
+      "   - **Foreign Currency Risk:** Arises from international transactions in multiple currencies, which can affect revenue and operating results due to fluctuations in exchange rates.\n",
+      "\n",
+      "2. **Cybersecurity Risks:**\n",
+      "   - Potential cyberattacks could harm Uber's reputation, business, and operating results. These include threats from malware, ransomware, viruses, spamming, and phishing, which could compromise data security and the integrity of information technology systems.\n",
+      "=== LLM Response ===\n",
+      "Lyft's 2021 10-K filing highlights several key risk factors that could impact the company's business:\n",
+      "\n",
+      "1. **Economic and External Factors**: Risks related to the COVID-19 pandemic, natural disasters, economic downturns, public health crises, and political crises.\n",
+      "2. **Operational Challenges**: Limited operating history, difficulty in achieving or maintaining profitability, intense competition, unpredictability of financial results, and uncertainty in the growth of the ridesharing market.\n",
+      "3. **Driver and Rider Base**: The necessity to attract and retain qualified drivers and riders.\n",
+      "4. **Insurance and Liability**: Issues concerning insurance coverage and the adequacy of insurance reserves.\n",
+      "5. **Autonomous Vehicles**: Challenges related to the development and integration of autonomous vehicle technology.\n",
+      "6. **Reputation and Brand**: The importance of maintaining a positive reputation, brand image, and company culture.\n",
+      "7. **User Conduct**: Risks from illegal or improper activity by users on the platform.\n",
+      "8. **Background Checks**: The accuracy and effectiveness of background checks on drivers.\n",
+      "9. **Pricing Practices**: Changes and management of pricing strategies.\n",
+      "10. **Network Growth**: Expansion and quality maintenance of Lyft's network of Light Vehicles.\n",
+      "11. **Growth Management**: The company's ability to manage and sustain growth.\n",
+      "12. **Security and Privacy**: Risks from security or privacy breaches, and technological defects or vulnerabilities.\n",
+      "13. **Third-Party Dependencies**: Reliance on third-party providers like Amazon Web Services, vehicle rental partners, and payment processors.\n",
+      "14. **Program Operations**: Management of Lyft's Express Drive and Lyft Rentals programs, and its delivery service platform.\n",
+      "15. **Service Offerings**: Effective matching of riders in Shared and Shared Saver Rides and managing up-front pricing methodology.\n",
+      "16. **Platform Expansion**: Development of new offerings on the platform and managing the complexities associated with such expansions.\n",
+      "=== Refined plan ===\n",
+      "Combine Risk Factor Summaries:\n",
+      "Combine the summaries of key risk factors for Lyft and Uber from their 2021 10-K filings. -> A comprehensive summary of the key risk factors for both Lyft and Uber as outlined in their respective 2021 10-K filings.\n",
+      "deps: ['Extract Lyft Risk Factors', 'Extract Uber Risk Factors']\n",
+      "\n",
+      "\n",
+      "Summarize Key Risk Factors:\n",
+      "Summarize the key risk factors for Lyft and Uber in their 2021 10-K filings. -> A detailed summary of the key risk factors for both Lyft and Uber, highlighting the main concerns and challenges as reported in their 2021 10-K filings.\n",
+      "deps: ['Combine Risk Factor Summaries']\n",
+      "\n",
+      "\n",
+      "> Running step a051042d-630a-4085-b7d3-605386dff2a4. Step input: Combine the summaries of key risk factors for Lyft and Uber from their 2021 10-K filings.\n",
+      "Added user message to memory: Combine the summaries of key risk factors for Lyft and Uber from their 2021 10-K filings.\n",
+      "=== LLM Response ===\n",
+      "The key risk factors from the 2021 10-K filings for Lyft and Uber highlight several overlapping and unique challenges faced by both companies:\n",
+      "\n",
+      "**Common Risk Factors:**\n",
+      "1. **Economic and Market Conditions**: Both companies are affected by general economic factors such as the impact of the COVID-19 pandemic, natural disasters, and economic downturns. Additionally, Uber specifically mentions foreign currency risk due to international operations.\n",
+      "2. **Operational Challenges**: Both companies face challenges in achieving or maintaining profitability, competition in the industry, and unpredictability of results.\n",
+      "3. **Regulatory and Legal Risks**: Although not explicitly mentioned in Lyft's summary, both companies operate in environments that are heavily regulated and subject to legal uncertainties.\n",
+      "4. **Technology and Cybersecurity**: Both companies emphasize the importance of managing security or privacy breaches and technological vulnerabilities. Uber specifically notes risks from cyberattacks such as malware and phishing.\n",
+      "5. **Dependence on Third Parties**: Both companies rely on third-party providers for critical services, such as Amazon Web Services for Lyft and various service providers for Uber.\n",
+      "\n",
+      "**Unique Risk Factors for Lyft:**\n",
+      "- **Driver and Rider Base**: Challenges related to attracting and retaining qualified drivers and riders.\n",
+      "- **Insurance and Liability**: Specific concerns about insurance coverage and adequacy of reserves.\n",
+      "- **Autonomous Vehicles**: Focus on the challenges related to autonomous vehicle technology.\n",
+      "- **User Conduct and Background Checks**: Issues related to illegal or improper activity by users and the accuracy of background checks.\n",
+      "- **Service Offerings and Platform Expansion**: Management of specific programs like Express Drive, Lyft Rentals, and complexities in expanding new offerings.\n",
+      "\n",
+      "**Unique Risk Factors for Uber:**\n",
+      "- **Interest Rate and Investment Risk**: Concerns related to refinanced term loan facilities, fixed rate notes, and investments in other companies.\n",
+      "- **Foreign Currency Risk**: Exposure to fluctuations in exchange rates affecting revenue and operating results.\n",
+      "\n",
+      "**Summary:**\n",
+      "Both Lyft and Uber face significant risks related to market conditions, operational challenges, technology, and dependencies on third parties. While Lyft focuses more on internal operations, driver relations, and service expansions, Uber highlights financial risks and cybersecurity threats. Both companies must navigate these risks to sustain and grow their operations in the competitive ridesharing and broader transportation markets.\n",
+      "=== Refined plan ===\n",
+      "Summarize Key Risk Factors:\n",
+      "Summarize the key risk factors for Lyft and Uber in their 2021 10-K filings. -> A detailed summary of the key risk factors for both Lyft and Uber, highlighting the main concerns and challenges as reported in their 2021 10-K filings.\n",
+      "deps: ['Combine Risk Factor Summaries']\n",
+      "\n",
+      "\n",
+      "Combine Risk Factor Summaries:\n",
+      "Combine the key risk factors from Lyft and Uber's 2021 10-K filings into a comprehensive summary. -> A combined summary of the key risk factors for both Lyft and Uber, highlighting both common and unique challenges faced by each company.\n",
+      "deps: ['Extract Lyft Risk Factors', 'Extract Uber Risk Factors']\n",
+      "\n",
+      "\n",
+      "Extract Lyft Risk Factors:\n",
+      "Extract the key risk factors from Lyft's 2021 10-K filing. -> A detailed list of key risk factors from Lyft's 2021 10-K filing.\n",
+      "deps: []\n",
+      "\n",
+      "\n",
+      "Extract Uber Risk Factors:\n",
+      "Extract the key risk factors from Uber's 2021 10-K filing. -> A detailed list of key risk factors from Uber's 2021 10-K filing.\n",
+      "deps: []\n",
+      "\n",
+      "\n",
+      "> Running step 5fc6020c-d8b0-401e-9218-248162d75be5. Step input: Summarize the key risk factors for Lyft and Uber in their 2021 10-K filings.\n",
+      "Added user message to memory: Summarize the key risk factors for Lyft and Uber in their 2021 10-K filings.\n",
+      "=== LLM Response ===\n",
+      "The 2021 10-K filings for Lyft and Uber reveal several key risk factors that both companies face, along with some unique challenges specific to each:\n",
+      "\n",
+      "**Common Risk Factors:**\n",
+      "1. **Economic and Market Conditions**: Both companies are impacted by broader economic factors such as the COVID-19 pandemic, natural disasters, and economic downturns, which can affect demand and operational stability.\n",
+      "2. **Operational Challenges**: Challenges in achieving or maintaining profitability, intense competition in the ridesharing industry, and the unpredictability of financial results are common concerns.\n",
+      "3. **Technology and Cybersecurity**: Both companies emphasize the importance of managing security or privacy breaches and technological vulnerabilities, with Uber specifically noting risks from cyberattacks such as malware and phishing.\n",
+      "4. **Dependence on Third Parties**: Both rely on third-party providers for critical services, such as Amazon Web Services for Lyft and various service providers for Uber, making them vulnerable to external disruptions.\n",
+      "\n",
+      "**Unique Risk Factors for Lyft:**\n",
+      "- **Driver and Rider Base**: The necessity to attract and retain qualified drivers and riders.\n",
+      "- **Insurance and Liability**: Specific concerns about insurance coverage and the adequacy of reserves.\n",
+      "- **Autonomous Vehicles**: Focus on the challenges related to autonomous vehicle technology.\n",
+      "- **User Conduct and Background Checks**: Issues related to illegal or improper activity by users and the accuracy of background checks.\n",
+      "- **Service Offerings and Platform Expansion**: Management of specific programs like Express Drive, Lyft Rentals, and complexities in expanding new offerings.\n",
+      "\n",
+      "**Unique Risk Factors for Uber:**\n",
+      "- **Interest Rate and Investment Risk**: Concerns related to refinanced term loan facilities, fixed rate notes, and investments in other companies.\n",
+      "- **Foreign Currency Risk**: Exposure to fluctuations in exchange rates affecting revenue and operating results.\n",
+      "\n",
+      "**Summary:**\n",
+      "Both Lyft and Uber navigate a complex landscape of economic, operational, and technological risks. While Lyft focuses more on internal operations, driver relations, and service expansions, Uber highlights financial risks and cybersecurity threats. These factors are crucial for understanding the challenges and strategic considerations of both companies in the competitive ridesharing and broader transportation markets.\n"
+     ]
+    }
+   ],
+   "source": [
+    "response = agent.chat(\n",
+    "    \"Summarize the key risk factors for Lyft and Uber in their 2021 10-K filings.\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "assistant: The 2021 10-K filings for Lyft and Uber reveal several key risk factors that both companies face, along with some unique challenges specific to each:\n",
+      "\n",
+      "**Common Risk Factors:**\n",
+      "1. **Economic and Market Conditions**: Both companies are impacted by broader economic factors such as the COVID-19 pandemic, natural disasters, and economic downturns, which can affect demand and operational stability.\n",
+      "2. **Operational Challenges**: Challenges in achieving or maintaining profitability, intense competition in the ridesharing industry, and the unpredictability of financial results are common concerns.\n",
+      "3. **Technology and Cybersecurity**: Both companies emphasize the importance of managing security or privacy breaches and technological vulnerabilities, with Uber specifically noting risks from cyberattacks such as malware and phishing.\n",
+      "4. **Dependence on Third Parties**: Both rely on third-party providers for critical services, such as Amazon Web Services for Lyft and various service providers for Uber, making them vulnerable to external disruptions.\n",
+      "\n",
+      "**Unique Risk Factors for Lyft:**\n",
+      "- **Driver and Rider Base**: The necessity to attract and retain qualified drivers and riders.\n",
+      "- **Insurance and Liability**: Specific concerns about insurance coverage and the adequacy of reserves.\n",
+      "- **Autonomous Vehicles**: Focus on the challenges related to autonomous vehicle technology.\n",
+      "- **User Conduct and Background Checks**: Issues related to illegal or improper activity by users and the accuracy of background checks.\n",
+      "- **Service Offerings and Platform Expansion**: Management of specific programs like Express Drive, Lyft Rentals, and complexities in expanding new offerings.\n",
+      "\n",
+      "**Unique Risk Factors for Uber:**\n",
+      "- **Interest Rate and Investment Risk**: Concerns related to refinanced term loan facilities, fixed rate notes, and investments in other companies.\n",
+      "- **Foreign Currency Risk**: Exposure to fluctuations in exchange rates affecting revenue and operating results.\n",
+      "\n",
+      "**Summary:**\n",
+      "Both Lyft and Uber navigate a complex landscape of economic, operational, and technological risks. While Lyft focuses more on internal operations, driver relations, and service expansions, Uber highlights financial risks and cybersecurity threats. These factors are crucial for understanding the challenges and strategic considerations of both companies in the competitive ridesharing and broader transportation markets.\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(str(response))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Low-level API [Advanced]\n",
+    "\n",
+    "In this section, we use the same agent, but expose the lower-level steps that are happening under the hood.\n",
+    "\n",
+    "This is useful when you want to expose the underlying plan and tasks to a human so they can be modified on the fly, or when debugging and running things step by step."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create the Agent"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from llama_index.core.agent import (\n",
+    "    StructuredPlannerAgent,\n",
+    "    FunctionCallingAgentWorker,\n",
+    "    ReActAgentWorker,\n",
+    ")\n",
+    "\n",
+    "# create the function calling worker for reasoning\n",
+    "worker = FunctionCallingAgentWorker.from_tools(\n",
+    "    [lyft_tool, uber_tool], verbose=True\n",
+    ")\n",
+    "\n",
+    "# wrap the worker in the top-level planner\n",
+    "agent = StructuredPlannerAgent(\n",
+    "    worker, tools=[lyft_tool, uber_tool], verbose=True\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Create the initial tasks and plan"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "=== Initial plan ===\n",
+      "Extract Lyft Risk Factors:\n",
+      "What are the key risk factors listed in Lyft's 2021 10-K filing? -> A summary of key risk factors for Lyft from its 2021 10-K filing.\n",
+      "deps: []\n",
+      "\n",
+      "\n",
+      "Extract Uber Risk Factors:\n",
+      "What are the key risk factors listed in Uber's 2021 10-K filing? -> A summary of key risk factors for Uber from its 2021 10-K filing.\n",
+      "deps: []\n",
+      "\n",
+      "\n",
+      "Summarize Risk Factors:\n",
+      "Combine the summaries of key risk factors for Lyft and Uber from their 2021 10-K filings into a comprehensive overview. -> A comprehensive summary of the key risk factors for both Lyft and Uber based on their 2021 10-K filings.\n",
+      "deps: ['Extract Lyft Risk Factors', 'Extract Uber Risk Factors']\n",
+      "\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "plan_id = agent.create_tasks(\n",
+    "    \"Summarize the key risk factors for Lyft and Uber in their 2021 10-K filings.\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Inspect the initial tasks and plan"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "===== Sub Task Extract Lyft Risk Factors =====\n",
+      "Expected output:  A summary of key risk factors for Lyft from its 2021 10-K filing.\n",
+      "Dependencies:  []\n",
+      "===== Sub Task Extract Uber Risk Factors =====\n",
+      "Expected output:  A summary of key risk factors for Uber from its 2021 10-K filing.\n",
+      "Dependencies:  []\n",
+      "===== Sub Task Summarize Risk Factors =====\n",
+      "Expected output:  A comprehensive summary of the key risk factors for both Lyft and Uber based on their 2021 10-K filings.\n",
+      "Dependencies:  ['Extract Lyft Risk Factors', 'Extract Uber Risk Factors']\n"
+     ]
+    }
+   ],
+   "source": [
+    "plan = agent.state.plan_dict[plan_id]\n",
+    "\n",
+    "for sub_task in plan.sub_tasks:\n",
+    "    print(f\"===== Sub Task {sub_task.name} =====\")\n",
+    "    print(\"Expected output: \", sub_task.expected_output)\n",
+    "    print(\"Dependencies: \", sub_task.dependencies)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Execute the first set of tasks\n",
+    "\n",
+    "Here, we execute the first set of tasks whose dependencies are already met."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "===== Sub Task Extract Lyft Risk Factors =====\n",
+      "Expected output:  A summary of key risk factors for Lyft from its 2021 10-K filing.\n",
+      "Dependencies:  []\n",
+      "===== Sub Task Extract Uber Risk Factors =====\n",
+      "Expected output:  A summary of key risk factors for Uber from its 2021 10-K filing.\n",
+      "Dependencies:  []\n",
+      "> Running step 45e01153-b625-4d90-869c-eb87da222c1b. Step input: What are the key risk factors listed in Lyft's 2021 10-K filing?\n",
+      "Added user message to memory: What are the key risk factors listed in Lyft's 2021 10-K filing?\n",
+      "=== Calling Function ===\n",
+      "Calling function: lyft_2021 with args: {\"input\": \"What are the key risk factors listed in the 2021 10-K filing?\"}\n",
+      "=== Function Output ===\n",
+      "The key risk factors listed in the 2021 10-K filing include:\n",
+      "\n",
+      "1. General economic factors such as the impact of the COVID-19 pandemic, natural disasters, economic downturns, public health crises, and political crises.\n",
+      "2. Operational factors including the company's limited operating history, competition in the industry, unpredictability of results, and the growth of the ridesharing market.\n",
+      "3. The company's ability to attract and retain qualified drivers and riders, manage insurance coverage and reserves, and handle claims through third-party insurance providers.\n",
+      "4. Challenges related to autonomous vehicle technology and the development of the autonomous vehicle industry.\n",
+      "5. The company's reputation, brand, and company culture, as well as the illegal or improper activity of users on its platform.\n",
+      "6. The accuracy of background checks on potential or current drivers and changes to pricing practices.\n",
+      "7. The growth and quality of the network of Light Vehicles, and the company's ability to manage its growth.\n",
+      "8. Security or privacy breaches, defects, errors, or vulnerabilities in technology, and system failures that could interrupt service availability.\n",
+      "9. Reliance on third parties such as Amazon Web Services, vehicle rental partners, payment processors, and other service providers.\n",
+      "10. The operation of programs like Express Drive and Lyft Rentals, and the delivery service platform.\n",
+      "11. The ability to effectively match riders in Shared and Shared Saver Rides offerings and manage up-front pricing methodology.\n",
+      "12. The development of new offerings on the platform and managing the complexities of such expansion.\n",
+      "\n",
+      "Additionally, there are risks related to regulatory and legal factors, financing and transactional risks, and governance risks related to the ownership of capital stock, including the classification status of drivers, compliance with privacy and data protection laws, intellectual property litigation, future capital requirements, and the dual class structure of common stock.\n",
+      "> Running step 4981f131-29bc-40c6-a926-ff019cedd452. Step input: None\n",
+      "=== LLM Response ===\n",
+      "The key risk factors listed in Lyft's 2021 10-K filing include:\n",
+      "\n",
+      "1. **General Economic Factors**: Impact of the COVID-19 pandemic, natural disasters, economic downturns, public health crises, and political crises.\n",
+      "2. **Operational Factors**: Limited operating history, competition in the industry, unpredictability of results, and the growth of the ridesharing market.\n",
+      "3. **Driver and Rider Management**: Ability to attract and retain qualified drivers and riders, manage insurance coverage and reserves, and handle claims through third-party insurance providers.\n",
+      "4. **Autonomous Vehicle Technology**: Challenges related to the development of the autonomous vehicle industry.\n",
+      "5. **Reputation and Culture**: Company's reputation, brand, and company culture, as well as the illegal or improper activity of users on its platform.\n",
+      "6. **Background Checks and Pricing**: Accuracy of background checks on potential or current drivers and changes to pricing practices.\n",
+      "7. **Network Growth and Management**: Growth and quality of the network of Light Vehicles, and the company's ability to manage its growth.\n",
+      "8. **Technology and Security**: Security or privacy breaches, defects, errors, or vulnerabilities in technology, and system failures that could interrupt service availability.\n",
+      "9. **Reliance on Third Parties**: Dependence on third parties such as Amazon Web Services, vehicle rental partners, payment processors, and other service providers.\n",
+      "10. **Program Operations**: Operation of programs like Express Drive and Lyft Rentals, and the delivery service platform.\n",
+      "11. **Ride Matching and Pricing**: Ability to effectively match riders in Shared and Shared Saver Rides offerings and manage up-front pricing methodology.\n",
+      "12. **Platform Expansion**: Development of new offerings on the platform and managing the complexities of such expansion.\n",
+      "\n",
+      "Additionally, there are risks related to regulatory and legal factors, financing and transactional risks, and governance risks related to the ownership of capital stock, including the classification status of drivers, compliance with privacy and data protection laws, intellectual property litigation, future capital requirements, and the dual class structure of common stock.\n",
+      "> Running step 72c79fd5-1392-44f2-b58d-40b3200059df. Step input: What are the key risk factors listed in Uber's 2021 10-K filing?\n",
+      "Added user message to memory: What are the key risk factors listed in Uber's 2021 10-K filing?\n",
+      "=== Calling Function ===\n",
+      "Calling function: uber_2021 with args: {\"input\": \"What are the key risk factors listed in the 2021 10-K filing?\"}\n",
+      "=== Function Output ===\n",
+      "The key risk factors listed in the 2021 10-K filing include:\n",
+      "\n",
+      "1. The adverse effects of the COVID-19 pandemic on the business.\n",
+      "2. The potential reclassification of drivers as employees or quasi-employees instead of independent contractors.\n",
+      "3. Intense competition in the mobility, delivery, and logistics industries, characterized by well-established and low-cost alternatives, low barriers to entry, low switching costs, and well-capitalized competitors.\n",
+      "4. The practice of lowering fares or service fees and offering significant driver incentives and consumer discounts to remain competitive, which could impact profitability.\n",
+      "5. Historical significant losses and the expectation of increased operating expenses, with uncertainty about achieving or maintaining profitability.\n",
+      "6. The necessity of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers to keep the platform appealing.\n",
+      "7. The importance of maintaining and enhancing the company's brand and reputation, especially following significant negative publicity in the past.\n",
+      "8. Challenges related to the company's historical workplace culture and the ongoing efforts to address these issues.\n",
+      "9. The need to optimize the organizational structure and effectively manage growth to ensure good financial performance and future prospects.\n",
+      "> Running step 64cd53fe-7eeb-43a5-9e37-a18294db5531. Step input: None\n",
+      "=== LLM Response ===\n",
+      "The key risk factors listed in Uber's 2021 10-K filing include:\n",
+      "\n",
+      "1. **COVID-19 Pandemic**: Adverse effects on the business due to the pandemic.\n",
+      "2. **Driver Reclassification**: Potential reclassification of drivers as employees or quasi-employees instead of independent contractors.\n",
+      "3. **Intense Competition**: Competition in the mobility, delivery, and logistics industries, characterized by well-established and low-cost alternatives, low barriers to entry, low switching costs, and well-capitalized competitors.\n",
+      "4. **Pricing and Incentives**: Practice of lowering fares or service fees and offering significant driver incentives and consumer discounts to remain competitive, which could impact profitability.\n",
+      "5. **Financial Losses**: Historical significant losses and the expectation of increased operating expenses, with uncertainty about achieving or maintaining profitability.\n",
+      "6. **Platform Dynamics**: Necessity of attracting and maintaining a critical mass of drivers, consumers, merchants, shippers, and carriers to keep the platform appealing.\n",
+      "7. **Brand and Reputation**: Importance of maintaining and enhancing the company's brand and reputation, especially following significant negative publicity in the past.\n",
+      "8. **Workplace Culture**: Challenges related to the company's historical workplace culture and the ongoing efforts to address these issues.\n",
+      "9. **Organizational Management**: Need to optimize the organizational structure and effectively manage growth to ensure good financial performance and future prospects.\n"
+     ]
+    }
+   ],
+   "source": [
+    "next_tasks = agent.state.get_next_sub_tasks(plan_id)\n",
+    "\n",
+    "for sub_task in next_tasks:\n",
+    "    print(f\"===== Sub Task {sub_task.name} =====\")\n",
+    "    print(\"Expected output: \", sub_task.expected_output)\n",
+    "    print(\"Dependencies: \", sub_task.dependencies)\n",
+    "\n",
+    "completed_tasks = []\n",
+    "for sub_task in next_tasks:\n",
+    "    response = agent.run_task(sub_task.name)\n",
+    "    completed_tasks.append((sub_task, response))\n",
+    "    agent.state.add_completed_sub_task(plan_id, sub_task)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If we wanted to, we could even execute each task in a step-wise fashion. It would look something like this:\n",
+    "\n",
+    "```python\n",
+    "# Step-wise execution per task\n",
+    "\n",
+    "for sub_task in next_tasks:\n",
+    "    # get the task from the state \n",
+    "    task = agent.state.get_task(sub_task.name)\n",
+    "\n",
+    "    # run initial reasoning step\n",
+    "    step_output = agent.run_step(task.task_id)\n",
+    "\n",
+    "    # loop until the last step is reached\n",
+    "    while not step_output.is_last:\n",
+    "        step_output = agent.run_step(task.task_id)\n",
+    "    \n",
+    "    # finalize the response and commit to memory\n",
+    "    agent.finalize_response(task.task_id, step_output=step_output)\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Check if we are done\n",
+    "\n",
+    "If there are no remaining tasks, then we can stop. Otherwise, we can refine the current plan and continue."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1\n"
+     ]
+    }
+   ],
+   "source": [
+    "next_tasks = agent.state.get_next_sub_tasks(plan_id)\n",
+    "print(len(next_tasks))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "===== Sub Task Summarize Risk Factors =====\n",
+      "Expected output:  A comprehensive summary of the key risk factors for both Lyft and Uber based on their 2021 10-K filings.\n",
+      "Dependencies:  ['Extract Lyft Risk Factors', 'Extract Uber Risk Factors']\n"
+     ]
+    }
+   ],
+   "source": [
+    "for sub_task in next_tasks:\n",
+    "    print(f\"===== Sub Task {sub_task.name} =====\")\n",
+    "    print(\"Expected output: \", sub_task.expected_output)\n",
+    "    print(\"Dependencies: \", sub_task.dependencies)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Refine the plan\n",
+    "\n",
+    "Since we have tasks remaining, let's refine our plan to make sure we are on track."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "=== Refined plan ===\n",
+      "Summarize Risk Factors:\n",
+      "Combine the summaries of key risk factors for Lyft and Uber from their 2021 10-K filings into a comprehensive overview. -> A comprehensive summary of the key risk factors for both Lyft and Uber based on their 2021 10-K filings.\n",
+      "deps: ['Extract Lyft Risk Factors', 'Extract Uber Risk Factors']\n",
+      "\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# refine the plan\n",
+    "agent.refine_plan(\n",
+    "    plan_id,\n",
+    "    \"Summarize the key risk factors for Lyft and Uber in their 2021 10-K filings.\",\n",
+    "    completed_tasks,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "===== Sub Task Summarize Risk Factors =====\n",
+      "Expected output:  A comprehensive summary of the key risk factors for both Lyft and Uber based on their 2021 10-K filings.\n",
+      "Dependencies:  ['Extract Lyft Risk Factors', 'Extract Uber Risk Factors']\n"
+     ]
+    }
+   ],
+   "source": [
+    "plan = agent.state.plan_dict[plan_id]\n",
+    "\n",
+    "for sub_task in plan.sub_tasks:\n",
+    "    print(f\"===== Sub Task {sub_task.name} =====\")\n",
+    "    print(\"Expected output: \", sub_task.expected_output)\n",
+    "    print(\"Dependencies: \", sub_task.dependencies)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Loop until done\n",
+    "\n",
+    "With our plan refined, we can repeat this process until we have no more tasks to run."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "> Running step e59c6ac0-8303-4f8f-acbc-4d9670362f1b. Step input: Combine the summaries of key risk factors for Lyft and Uber from their 2021 10-K filings into a comprehensive overview.\n",
+      "Added user message to memory: Combine the summaries of key risk factors for Lyft and Uber from their 2021 10-K filings into a comprehensive overview.\n",
+      "=== LLM Response ===\n",
+      "Here is a comprehensive overview of the key risk factors for both Lyft and Uber from their 2021 10-K filings:\n",
+      "\n",
+      "### Common Risk Factors\n",
+      "1. **COVID-19 Pandemic**: Both companies highlight the adverse effects of the pandemic on their operations and overall business environment.\n",
+      "2. **Driver Classification**: Risk associated with the potential reclassification of drivers as employees rather than independent contractors, impacting business models and cost structures.\n",
+      "3. **Intense Competition**: Both face intense competition in the mobility and delivery sectors, with challenges from well-capitalized competitors, low barriers to entry, and the necessity to offer competitive pricing and incentives.\n",
+      "4. **Brand and Reputation**: Importance of maintaining a strong brand and reputation, especially given past negative publicity and the potential for illegal or improper user activities on their platforms.\n",
+      "5. **Financial Sustainability**: Challenges related to achieving or maintaining profitability, managing significant historical losses, and handling increased operating expenses.\n",
+      "\n",
+      "### Unique to Lyft\n",
+      "1. **Operational Factors**: Lyft discusses its limited operating history and the unpredictability of its results within the rapidly growing ridesharing market.\n",
+      "2. **Autonomous Vehicles**: Specific challenges related to the development of autonomous vehicle technology.\n",
+      "3. **Technology and Security**: Emphasis on risks from security breaches, system failures, and technological vulnerabilities.\n",
+      "4. **Reliance on Third Parties**: Dependence on third parties like Amazon Web Services and vehicle rental partners.\n",
+      "5. **Program Operations**: Managing programs like Express Drive, Lyft Rentals, and delivery services.\n",
+      "\n",
+      "### Unique to Uber\n",
+      "1. **Global Scope**: Uber's broader global operations expose it to more diverse regulatory and market conditions.\n",
+      "2. **Workplace Culture**: Specific challenges related to historical workplace culture issues and ongoing efforts to address them.\n",
+      "3. **Organizational Management**: The need to optimize organizational structure and manage growth effectively to ensure financial performance and future prospects.\n",
+      "\n",
+      "### Regulatory and Legal Concerns\n",
+      "Both companies face significant regulatory and legal risks, including compliance with privacy and data protection laws, intellectual property litigation, and the impacts of changing regulations on operational flexibility and cost structures.\n",
+      "\n",
+      "This overview encapsulates the shared and distinct challenges faced by Lyft and Uber, reflecting their positions in the competitive and rapidly evolving sectors of mobility, delivery, and technology.\n"
+     ]
+    }
+   ],
+   "source": [
+    "import asyncio\n",
+    "\n",
+    "while True:\n",
+    "    # are we done?\n",
+    "    next_tasks = agent.state.get_next_sub_tasks(plan_id)\n",
+    "    if len(next_tasks) == 0:\n",
+    "        break\n",
+    "\n",
+    "    # run concurrently for better performance\n",
+    "    responses = await asyncio.gather(\n",
+    "        *[agent.arun_task(sub_task.name) for sub_task in next_tasks]\n",
+    "    )\n",
+    "    for sub_task, response in zip(next_tasks, responses):\n",
+    "        completed_tasks.append((sub_task, response))\n",
+    "        agent.state.add_completed_sub_task(plan_id, sub_task)\n",
+    "\n",
+    "    # are we done now?\n",
+    "    # LLMs have a tendency to add more tasks, so we end if there are no more tasks\n",
+    "    next_tasks = agent.state.get_next_sub_tasks(plan_id)\n",
+    "    if len(next_tasks) == 0:\n",
+    "        break\n",
+    "\n",
+    "    # refine the plan\n",
+    "    await agent.arefine_plan(\n",
+    "        plan_id,\n",
+    "        \"Summarize the key risk factors for Lyft and Uber in their 2021 10-K filings.\",\n",
+    "        completed_tasks,\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "By the end, the last response returned is our final response."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "assistant: Here is a comprehensive overview of the key risk factors for both Lyft and Uber from their 2021 10-K filings:\n",
+      "\n",
+      "### Common Risk Factors\n",
+      "1. **COVID-19 Pandemic**: Both companies highlight the adverse effects of the pandemic on their operations and overall business environment.\n",
+      "2. **Driver Classification**: Risk associated with the potential reclassification of drivers as employees rather than independent contractors, impacting business models and cost structures.\n",
+      "3. **Intense Competition**: Both face intense competition in the mobility and delivery sectors, with challenges from well-capitalized competitors, low barriers to entry, and the necessity to offer competitive pricing and incentives.\n",
+      "4. **Brand and Reputation**: Importance of maintaining a strong brand and reputation, especially given past negative publicity and the potential for illegal or improper user activities on their platforms.\n",
+      "5. **Financial Sustainability**: Challenges related to achieving or maintaining profitability, managing significant historical losses, and handling increased operating expenses.\n",
+      "\n",
+      "### Unique to Lyft\n",
+      "1. **Operational Factors**: Lyft discusses its limited operating history and the unpredictability of its results within the rapidly growing ridesharing market.\n",
+      "2. **Autonomous Vehicles**: Specific challenges related to the development of autonomous vehicle technology.\n",
+      "3. **Technology and Security**: Emphasis on risks from security breaches, system failures, and technological vulnerabilities.\n",
+      "4. **Reliance on Third Parties**: Dependence on third parties like Amazon Web Services and vehicle rental partners.\n",
+      "5. **Program Operations**: Managing programs like Express Drive, Lyft Rentals, and delivery services.\n",
+      "\n",
+      "### Unique to Uber\n",
+      "1. **Global Scope**: Uber's broader global operations expose it to more diverse regulatory and market conditions.\n",
+      "2. **Workplace Culture**: Specific challenges related to historical workplace culture issues and ongoing efforts to address them.\n",
+      "3. **Organizational Management**: The need to optimize organizational structure and manage growth effectively to ensure financial performance and future prospects.\n",
+      "\n",
+      "### Regulatory and Legal Concerns\n",
+      "Both companies face significant regulatory and legal risks, including compliance with privacy and data protection laws, intellectual property litigation, and the impacts of changing regulations on operational flexibility and cost structures.\n",
+      "\n",
+      "This overview encapsulates the shared and distinct challenges faced by Lyft and Uber, reflecting their positions in the competitive and rapidly evolving sectors of mobility, delivery, and technology.\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(str(responses[-1]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Changing Prompts\n",
+    "\n",
+    "The `StructuredPlanningAgent` has two key prompts:\n",
+    "1. The initial planning prompt\n",
+    "2. The plan refinement prompt\n",
+    "\n",
+    "Below, we show how to configure these prompts, using the defaults as an example."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "DEFAULT_INITIAL_PLAN_PROMPT = \"\"\"\\\n",
+    "Think step-by-step. Given a task and a set of tools, create a comprehensive, end-to-end plan to accomplish the task.\n",
+    "Keep in mind not every task needs to be decomposed into multiple sub-tasks if it is simple enough.\n",
+    "The plan should end with a sub-task that satisfies the overall task.\n",
+    "\n",
+    "The tools available are:\n",
+    "{tools_str}\n",
+    "\n",
+    "Overall Task: {task}\n",
+    "\"\"\"\n",
+    "\n",
+    "DEFAULT_PLAN_REFINE_PROMPT = \"\"\"\\\n",
+    "Think step-by-step. Given an overall task, a set of tools, and completed sub-tasks, update (if needed) the remaining sub-tasks so that the overall task can still be completed.\n",
+    "The plan should end with a sub-task that satisfies the overall task.\n",
+    "If the remaining sub-tasks are sufficient, you can skip this step.\n",
+    "\n",
+    "The tools available are:\n",
+    "{tools_str}\n",
+    "\n",
+    "Overall Task:\n",
+    "{task}\n",
+    "\n",
+    "Completed Sub-Tasks + Outputs:\n",
+    "{completed_outputs}\n",
+    "\n",
+    "Remaining Sub-Tasks:\n",
+    "{remaining_sub_tasks}\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "agent = StructuredPlannerAgent(\n",
+    "    worker,\n",
+    "    tools=[lyft_tool, uber_tool],\n",
+    "    initial_plan_prompt=DEFAULT_INITIAL_PLAN_PROMPT,\n",
+    "    plan_refine_prompt=DEFAULT_PLAN_REFINE_PROMPT,\n",
+    "    verbose=True,\n",
+    ")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
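The low-level loop in the notebook above (fetch the ready sub-tasks, run them, mark them completed, repeat) can be sketched without any LLM calls. This is a minimal illustration under stated assumptions, not the library's implementation: `SubTask` is a plain dataclass stand-in for the planner's model, and `next_sub_tasks`, `run_plan`, and the `run` callable are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubTask:
    """Plain stand-in for the planner's sub-task model (illustrative only)."""

    name: str
    dependencies: List[str] = field(default_factory=list)


def next_sub_tasks(plan: List[SubTask], completed: List[str]) -> List[SubTask]:
    """Return sub-tasks that are not done yet and whose dependencies are all done."""
    return [
        task
        for task in plan
        if task.name not in completed
        and all(dep in completed for dep in task.dependencies)
    ]


def run_plan(plan: List[SubTask], run: Callable[[SubTask], str]) -> Dict[str, str]:
    """Run sub-tasks in dependency order; `run` is a hypothetical executor."""
    completed: List[str] = []
    outputs: Dict[str, str] = {}
    while True:
        ready = next_sub_tasks(plan, completed)
        if not ready:
            # nothing left to run (or the rest can never become ready)
            break
        for task in ready:
            outputs[task.name] = run(task)
            completed.append(task.name)
    return outputs


plan = [
    SubTask("Extract Lyft Risk Factors"),
    SubTask("Extract Uber Risk Factors"),
    SubTask(
        "Summarize Risk Factors",
        ["Extract Lyft Risk Factors", "Extract Uber Risk Factors"],
    ),
]
outputs = run_plan(plan, run=lambda task: f"done: {task.name}")
print(list(outputs))
```

Note that the sketch omits the plan-refinement step, and a sub-task whose dependency name never appears among the completed names is silently left unrun, since it never becomes ready.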
diff --git a/docs/docs/module_guides/deploying/agents/modules.md b/docs/docs/module_guides/deploying/agents/modules.md
index 4640a44f8b26abcf7353ea4bcfa4c75d4816133e..16710ef6dd2afad2e42061d0d847f9f265ad2786 100644
--- a/docs/docs/module_guides/deploying/agents/modules.md
+++ b/docs/docs/module_guides/deploying/agents/modules.md
@@ -16,6 +16,7 @@ For more detailed guides on how to use specific tools, check out our [tools modu
 - [Multi Document Agents](../../../examples/agent/multi_document_agents.ipynb)
 - [Agent Builder](../../../examples/agent/agent_builder.ipynb)
 - [Parallel Function Calling](../../../examples/agent/openai_agent_parallel_function_calling.ipynb)
+- [Agent with Planning](../../../examples/agent/structured_planner.ipynb)
 
 ## [Beta] OpenAI Assistant Agent
 
@@ -49,4 +50,5 @@ For more detailed guides on how to use specific tools, check out our [tools modu
 
 - [Agent Runner](../../../examples/agent/agent_runner/agent_runner.ipynb)
 - [Agent Runner RAG](../../../examples/agent/agent_runner/agent_runner_rag.ipynb)
+- [Agent with Planning](../../../examples/agent/structured_planner.ipynb)
 - [Controllable Agent Runner](../../../examples/agent/agent_runner/agent_runner_rag_controllable.ipynb)
diff --git a/docs/docs/module_guides/deploying/agents/usage_pattern.md b/docs/docs/module_guides/deploying/agents/usage_pattern.md
index dd8cf706cf2be2f22932adaab60b34850243ee94..23fb73d4f86df5c3731c854c437fb2c4b93764ad 100644
--- a/docs/docs/module_guides/deploying/agents/usage_pattern.md
+++ b/docs/docs/module_guides/deploying/agents/usage_pattern.md
@@ -108,6 +108,27 @@ query_engine_tools = [
 outer_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
 ```
 
+## Agent With Planning
+
+Breaking down an initial task into easier-to-digest sub-tasks is a powerful pattern.
+
+LlamaIndex provides an agent planning module that does just this:
+
+```python
+# `tools` (a list of tools) and `llm` (an LLM instance) are assumed to be defined elsewhere
+from llama_index.core.agent import (
+    StructuredPlannerAgent,
+    FunctionCallingAgentWorker,
+)
+
+worker = FunctionCallingAgentWorker.from_tools(tools, llm=llm)
+agent = StructuredPlannerAgent(worker)
+```
+
+In general, this agent may take longer to respond than the basic `AgentRunner` class, but its outputs will often be more complete. Another tradeoff to consider is that planning often requires a very capable LLM. For context, `gpt-3.5-turbo` is sometimes flaky for planning, while `gpt-4-turbo` does much better.
+
+See more in the [complete guide](../../../examples/agent/structured_planner.ipynb).
+
 ## Lower-Level API
 
 The OpenAIAgent and ReActAgent are simple wrappers on top of an `AgentRunner` interacting with an `AgentWorker`.
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
index 5d6e90a8426e60065272586ab3407e34120b19b5..3bd8fbfeb5149bddd0e1713d9e7352a7b764a129 100644
--- a/docs/mkdocs.yml
+++ b/docs/mkdocs.yml
@@ -85,6 +85,7 @@ nav:
           - ./examples/agent/coa_agent.ipynb
           - ./examples/agent/lats_agent.ipynb
           - ./examples/agent/llm_compiler.ipynb
+          - ./examples/agent/structured_planner.ipynb
       - Callbacks:
           - ./examples/callbacks/HoneyHiveLlamaIndexTracer.ipynb
           - ./examples/callbacks/PromptLayerHandler.ipynb
diff --git a/llama-index-core/llama_index/core/agent/__init__.py b/llama-index-core/llama_index/core/agent/__init__.py
index c8e2a2d22861d06603e66d1e6af2bdc5507a6b1e..d64740583c15889d86b09e9f5cbff2c14c029911 100644
--- a/llama-index-core/llama_index/core/agent/__init__.py
+++ b/llama-index-core/llama_index/core/agent/__init__.py
@@ -7,6 +7,7 @@ from llama_index.core.agent.react.output_parser import ReActOutputParser
 from llama_index.core.agent.react.step import ReActAgentWorker
 from llama_index.core.agent.react_multimodal.step import MultimodalReActAgentWorker
 from llama_index.core.agent.runner.base import AgentRunner
+from llama_index.core.agent.runner.planner import StructuredPlannerAgent
 from llama_index.core.agent.runner.parallel import ParallelAgentRunner
 from llama_index.core.agent.types import Task
 from llama_index.core.chat_engine.types import AgentChatResponse
@@ -14,6 +15,7 @@ from llama_index.core.agent.function_calling.step import FunctionCallingAgentWor
 
 __all__ = [
     "AgentRunner",
+    "StructuredPlannerAgent",
     "ParallelAgentRunner",
     "ReActAgentWorker",
     "ReActAgent",
diff --git a/llama-index-core/llama_index/core/agent/runner/planner.py b/llama-index-core/llama_index/core/agent/runner/planner.py
new file mode 100644
index 0000000000000000000000000000000000000000..e2c2d2cbdfc25619683aa291301c591c843d5330
--- /dev/null
+++ b/llama-index-core/llama_index/core/agent/runner/planner.py
@@ -0,0 +1,561 @@
+import asyncio
+import uuid
+from typing import Any, Dict, List, Optional, Tuple, Union
+
+from llama_index.core.agent.runner.base import AgentRunner, AgentState
+from llama_index.core.agent.types import (
+    BaseAgentWorker,
+    TaskStepOutput,
+)
+from llama_index.core.bridge.pydantic import BaseModel, Field, ValidationError
+from llama_index.core.callbacks import CallbackManager
+from llama_index.core.chat_engine.types import (
+    AGENT_CHAT_RESPONSE_TYPE,
+    ChatResponseMode,
+)
+from llama_index.core.base.llms.types import ChatMessage
+from llama_index.core.llms.llm import LLM
+from llama_index.core.memory import ChatMemoryBuffer
+from llama_index.core.memory.types import BaseMemory
+from llama_index.core.objects.base import ObjectRetriever
+from llama_index.core.prompts import PromptTemplate
+from llama_index.core.settings import Settings
+from llama_index.core.tools.types import BaseTool
+from llama_index.core.instrumentation.events.agent import (
+    AgentChatWithStepStartEvent,
+    AgentChatWithStepEndEvent,
+)
+import llama_index.core.instrumentation as instrument
+
+dispatcher = instrument.get_dispatcher(__name__)
+
+
+class SubTask(BaseModel):
+    """A single sub-task in a plan."""
+
+    name: str = Field(..., description="The name of the sub-task.")
+    input: str = Field(..., description="The input prompt for the sub-task.")
+    expected_output: str = Field(
+        ..., description="The expected output of the sub-task."
+    )
+    dependencies: List[str] = Field(
+        ...,
+        description="The sub-task names that must be completed before this sub-task.",
+    )
+
+
+class Plan(BaseModel):
+    """A series of sub-tasks to accomplish an overall task."""
+
+    sub_tasks: List[SubTask] = Field(..., description="The sub-tasks in the plan.")
+
+
+class PlannerAgentState(AgentState):
+    """Agent state."""
+
+    plan_dict: Dict[str, Plan] = Field(
+        default_factory=dict, description="An id-plan lookup."
+    )
+    completed_sub_tasks: Dict[str, List[SubTask]] = Field(
+        default_factory=dict, description="A list of completed sub-tasks for each plan."
+    )
+
+    def get_completed_sub_tasks(self, plan_id: str) -> List[SubTask]:
+        return self.completed_sub_tasks.get(plan_id, [])
+
+    def add_completed_sub_task(self, plan_id: str, sub_task: SubTask) -> None:
+        if plan_id not in self.completed_sub_tasks:
+            self.completed_sub_tasks[plan_id] = []
+
+        self.completed_sub_tasks[plan_id].append(sub_task)
+
+    def get_next_sub_tasks(self, plan_id: str) -> List[SubTask]:
+        next_sub_tasks: List[SubTask] = []
+        plan = self.plan_dict[plan_id]
+
+        if plan_id not in self.completed_sub_tasks:
+            self.completed_sub_tasks[plan_id] = []
+
+        completed_sub_tasks = self.completed_sub_tasks[plan_id]
+        completed_sub_task_names = [sub_task.name for sub_task in completed_sub_tasks]
+
+        for sub_task in plan.sub_tasks:
+            dependencies_met = all(
+                dep in completed_sub_task_names for dep in sub_task.dependencies
+            )
+
+            if sub_task.name not in completed_sub_task_names and dependencies_met:
+                next_sub_tasks.append(sub_task)
+        return next_sub_tasks
+
+    def get_remaining_subtasks(self, plan_id: str) -> List[SubTask]:
+        remaining_subtasks = []
+        plan = self.plan_dict[plan_id]
+
+        if plan_id not in self.completed_sub_tasks:
+            self.completed_sub_tasks[plan_id] = []
+
+        completed_sub_tasks = self.completed_sub_tasks[plan_id]
+        completed_sub_task_names = [sub_task.name for sub_task in completed_sub_tasks]
+
+        for sub_task in plan.sub_tasks:
+            if sub_task.name not in completed_sub_task_names:
+                remaining_subtasks.append(sub_task)
+        return remaining_subtasks
+
+    def reset(self) -> None:
+        """Reset."""
+        self.task_dict = {}
+        self.completed_sub_tasks = {}
+        self.plan_dict = {}
+
+
+DEFAULT_INITIAL_PLAN_PROMPT = """\
+Think step-by-step. Given a task and a set of tools, create a comprehensive, end-to-end plan to accomplish the task.
+Keep in mind not every task needs to be decomposed into multiple sub-tasks if it is simple enough.
+The plan should end with a sub-task that can achieve the overall task.
+
+The tools available are:
+{tools_str}
+
+Overall Task: {task}
+"""
+
+DEFAULT_PLAN_REFINE_PROMPT = """\
+Think step-by-step. Given an overall task, a set of tools, and completed sub-tasks, update (if needed) the remaining sub-tasks so that the overall task can still be completed.
+The plan should end with a sub-task that can achieve and satisfy the overall task.
+If you do update the plan, only create new sub-tasks that will replace the remaining sub-tasks, do NOT repeat tasks that are already completed.
+If the remaining sub-tasks are enough to achieve the overall task, it is ok to skip this step, and instead explain why the plan is complete.
+
+The tools available are:
+{tools_str}
+
+Completed Sub-Tasks + Outputs:
+{completed_outputs}
+
+Remaining Sub-Tasks:
+{remaining_sub_tasks}
+
+Overall Task: {task}
+"""
+
+
+class StructuredPlannerAgent(AgentRunner):
+    """Structured Planner Agent runner.
+
+    Top-level agent orchestrator that can create tasks, run each step in a task,
+    or run a task end-to-end. Stores state and keeps track of tasks.
+
+    Args:
+        agent_worker (BaseAgentWorker): step executor
+        chat_history (Optional[List[ChatMessage]], optional): chat history. Defaults to None.
+        state (Optional[PlannerAgentState], optional): agent state. Defaults to None.
+        memory (Optional[BaseMemory], optional): memory. Defaults to None.
+        llm (Optional[LLM], optional): LLM. Defaults to None.
+        callback_manager (Optional[CallbackManager], optional): callback manager. Defaults to None.
+        init_task_state_kwargs (Optional[dict], optional): init task state kwargs. Defaults to None.
+
+    """
+
+    def __init__(
+        self,
+        agent_worker: BaseAgentWorker,
+        tools: Optional[List[BaseTool]] = None,
+        tool_retriever: Optional[ObjectRetriever[BaseTool]] = None,
+        chat_history: Optional[List[ChatMessage]] = None,
+        state: Optional[PlannerAgentState] = None,
+        memory: Optional[BaseMemory] = None,
+        llm: Optional[LLM] = None,
+        initial_plan_prompt: Union[str, PromptTemplate] = DEFAULT_INITIAL_PLAN_PROMPT,
+        plan_refine_prompt: Union[str, PromptTemplate] = DEFAULT_PLAN_REFINE_PROMPT,
+        callback_manager: Optional[CallbackManager] = None,
+        init_task_state_kwargs: Optional[dict] = None,
+        delete_task_on_finish: bool = False,
+        default_tool_choice: str = "auto",
+        verbose: bool = False,
+    ) -> None:
+        """Initialize."""
+        self.agent_worker = agent_worker
+        self.state = state or PlannerAgentState()
+        self.memory = memory or ChatMemoryBuffer.from_defaults(chat_history, llm=llm)
+        self.tools = tools
+        self.tool_retriever = tool_retriever
+        self.llm = llm or Settings.llm
+
+        if isinstance(initial_plan_prompt, str):
+            initial_plan_prompt = PromptTemplate(initial_plan_prompt)
+        self.initial_plan_prompt = initial_plan_prompt
+
+        if isinstance(plan_refine_prompt, str):
+            plan_refine_prompt = PromptTemplate(plan_refine_prompt)
+        self.plan_refine_prompt = plan_refine_prompt
+
+        # get and set callback manager
+        if callback_manager is not None:
+            self.agent_worker.set_callback_manager(callback_manager)
+            self.callback_manager = callback_manager
+        else:
+            # TODO: This is *temporary*
+            # Stopgap before having a callback on the BaseAgentWorker interface.
+            # Doing that requires a bit more refactoring to make sure existing code
+            # doesn't break.
+            if hasattr(self.agent_worker, "callback_manager"):
+                self.callback_manager = (
+                    self.agent_worker.callback_manager or CallbackManager()
+                )
+            else:
+                self.callback_manager = Settings.callback_manager
+        self.init_task_state_kwargs = init_task_state_kwargs or {}
+        self.delete_task_on_finish = delete_task_on_finish
+        self.default_tool_choice = default_tool_choice
+        self.verbose = verbose
+
+    def get_tools(self, input: str) -> List[BaseTool]:
+        """Get tools."""
+        if self.tools is not None:
+            return self.tools
+        if self.tool_retriever is not None:
+            return self.tool_retriever.retrieve(input)
+        raise ValueError("No tools provided or retriever set.")
+
+    def create_tasks(self, input: str, **kwargs: Any) -> str:
+        """Create a plan to execute a set of tasks."""
+        tools = self.get_tools(input)
+        tools_str = ""
+        for tool in tools:
+            tools_str += tool.metadata.name + ": " + tool.metadata.description + "\n"
+
+        try:
+            plan = self.llm.structured_predict(
+                Plan,
+                self.initial_plan_prompt,
+                tools_str=tools_str,
+                task=input,
+            )
+        except (ValueError, ValidationError):
+            # likely no complex plan predicted
+            # default to a single task plan
+            if self.verbose:
+                print("No complex plan predicted. Defaulting to a single task plan.")
+            plan = Plan(
+                sub_tasks=[
+                    SubTask(
+                        name="default", input=input, expected_output="", dependencies=[]
+                    )
+                ]
+            )
+
+        if self.verbose:
+            print("=== Initial plan ===")
+            for sub_task in plan.sub_tasks:
+                print(
+                    f"{sub_task.name}:\n{sub_task.input} -> {sub_task.expected_output}\ndeps: {sub_task.dependencies}\n\n"
+                )
+
+        plan_id = str(uuid.uuid4())
+        self.state.plan_dict[plan_id] = plan
+
+        for sub_task in plan.sub_tasks:
+            self.create_task(sub_task.input, task_id=sub_task.name)
+
+        return plan_id
+
+    async def acreate_tasks(self, input: str, **kwargs: Any) -> str:
+        """Create a plan to execute a set of tasks."""
+        tools = self.get_tools(input)
+        tools_str = ""
+        for tool in tools:
+            tools_str += tool.metadata.name + ": " + tool.metadata.description + "\n"
+
+        try:
+            plan = await self.llm.astructured_predict(
+                Plan,
+                self.initial_plan_prompt,
+                tools_str=tools_str,
+                task=input,
+            )
+        except (ValueError, ValidationError):
+            # likely no complex plan predicted
+            # default to a single task plan
+            if self.verbose:
+                print("No complex plan predicted. Defaulting to a single task plan.")
+            plan = Plan(
+                sub_tasks=[
+                    SubTask(
+                        name="default", input=input, expected_output="", dependencies=[]
+                    )
+                ]
+            )
+
+        if self.verbose:
+            print("=== Initial plan ===")
+            for sub_task in plan.sub_tasks:
+                print(
+                    f"{sub_task.name}:\n{sub_task.input} -> {sub_task.expected_output}\ndeps: {sub_task.dependencies}\n\n"
+                )
+
+        plan_id = str(uuid.uuid4())
+        self.state.plan_dict[plan_id] = plan
+
+        for sub_task in plan.sub_tasks:
+            self.create_task(sub_task.input, task_id=sub_task.name)
+
+        return plan_id
+
+    def get_refine_plan_prompt_kwargs(
+        self,
+        plan_id: str,
+        task: str,
+        completed_sub_task_pairs: List[Tuple[SubTask, AGENT_CHAT_RESPONSE_TYPE]],
+    ) -> dict:
+        """Get the keyword arguments used to format the refine plan prompt."""
+        # gather completed sub-tasks and response pairs
+        completed_outputs_str = ""
+        for sub_task, response in completed_sub_task_pairs:
+            task_str = f"{sub_task.name}:\n" f"\t{response!s}\n"
+            completed_outputs_str += task_str
+
+        # get a string for the remaining sub-tasks
+        remaining_sub_tasks = self.state.get_remaining_subtasks(plan_id)
+        remaining_sub_tasks_str = "" if len(remaining_sub_tasks) != 0 else "None"
+        for sub_task in remaining_sub_tasks:
+            task_str = (
+                f"SubTask(name='{sub_task.name}', "
+                f"input='{sub_task.input}', "
+                f"expected_output='{sub_task.expected_output}', "
+                f"dependencies='{sub_task.dependencies}')\n"
+            )
+            remaining_sub_tasks_str += task_str
+
+        # get the tools string
+        tools = self.get_tools(remaining_sub_tasks_str)
+        tools_str = ""
+        for tool in tools:
+            tools_str += tool.metadata.name + ": " + tool.metadata.description + "\n"
+
+        # return the kwargs
+        return {
+            "tools_str": tools_str.strip(),
+            "task": task.strip(),
+            "completed_outputs": completed_outputs_str.strip(),
+            "remaining_sub_tasks": remaining_sub_tasks_str.strip(),
+        }
+
+    def refine_plan(
+        self,
+        plan_id: str,
+        task: str,
+        completed_sub_task_pairs: List[Tuple[SubTask, AGENT_CHAT_RESPONSE_TYPE]],
+    ) -> None:
+        """Refine a plan."""
+        prompt_kwargs = self.get_refine_plan_prompt_kwargs(
+            plan_id, task, completed_sub_task_pairs
+        )
+
+        try:
+            new_plan = self.llm.structured_predict(
+                Plan, self.plan_refine_prompt, **prompt_kwargs
+            )
+
+            # delete any tasks from the previous plan
+            for sub_task in self.state.plan_dict[plan_id].sub_tasks:
+                self.delete_task(sub_task.name)
+
+            # update state with new plan
+            self.state.plan_dict[plan_id] = new_plan
+            for sub_task in new_plan.sub_tasks:
+                self.create_task(sub_task.input, task_id=sub_task.name)
+
+            if self.verbose:
+                print("=== Refined plan ===")
+                for sub_task in new_plan.sub_tasks:
+                    print(
+                        f"{sub_task.name}:\n{sub_task.input} -> {sub_task.expected_output}\ndeps: {sub_task.dependencies}\n\n"
+                    )
+        except (ValueError, ValidationError):
+            # likely no new plan predicted
+            return
+
+    async def arefine_plan(
+        self,
+        plan_id: str,
+        task: str,
+        completed_sub_task_pairs: List[Tuple[SubTask, AGENT_CHAT_RESPONSE_TYPE]],
+    ) -> None:
+        """Refine a plan."""
+        prompt_args = self.get_refine_plan_prompt_kwargs(
+            plan_id, task, completed_sub_task_pairs
+        )
+
+        try:
+            new_plan = await self.llm.astructured_predict(
+                Plan, self.plan_refine_prompt, **prompt_args
+            )
+
+            # delete any tasks from the previous plan
+            for sub_task in self.state.plan_dict[plan_id].sub_tasks:
+                self.delete_task(sub_task.name)
+
+            # update state with new plan
+            self.state.plan_dict[plan_id] = new_plan
+            for sub_task in new_plan.sub_tasks:
+                self.create_task(sub_task.input, task_id=sub_task.name)
+
+            if self.verbose:
+                print("=== Refined plan ===")
+                for sub_task in new_plan.sub_tasks:
+                    print(
+                        f"{sub_task.name}:\n{sub_task.input} -> {sub_task.expected_output}\ndeps: {sub_task.dependencies}\n\n"
+                    )
+
+        except (ValueError, ValidationError):
+            # likely no new plan predicted
+            return
+
+    def run_task(
+        self,
+        task_id: str,
+        mode: ChatResponseMode = ChatResponseMode.WAIT,
+        tool_choice: Union[str, dict] = "auto",
+    ) -> AGENT_CHAT_RESPONSE_TYPE:
+        """Run a task."""
+        while True:
+            # pass step queue in as argument, assume step executor is stateless
+            cur_step_output = self._run_step(
+                task_id, mode=mode, tool_choice=tool_choice
+            )
+
+            if cur_step_output.is_last:
+                result_output = cur_step_output
+                break
+
+            # ensure tool_choice does not cause endless loops
+            tool_choice = "auto"
+
+        return self.finalize_response(
+            task_id,
+            result_output,
+        )
+
+    async def arun_task(
+        self,
+        task_id: str,
+        mode: ChatResponseMode = ChatResponseMode.WAIT,
+        tool_choice: Union[str, dict] = "auto",
+    ) -> AGENT_CHAT_RESPONSE_TYPE:
+        """Run a task."""
+        while True:
+            # pass step queue in as argument, assume step executor is stateless
+            cur_step_output = await self._arun_step(
+                task_id, mode=mode, tool_choice=tool_choice
+            )
+
+            if cur_step_output.is_last:
+                result_output = cur_step_output
+                break
+
+            # ensure tool_choice does not cause endless loops
+            tool_choice = "auto"
+
+        return self.finalize_response(
+            task_id,
+            result_output,
+        )
+
+    @dispatcher.span
+    def _chat(
+        self,
+        message: str,
+        chat_history: Optional[List[ChatMessage]] = None,
+        tool_choice: Union[str, dict] = "auto",
+        mode: ChatResponseMode = ChatResponseMode.WAIT,
+    ) -> AGENT_CHAT_RESPONSE_TYPE:
+        """Chat with step executor."""
+        dispatch_event = dispatcher.get_dispatch_event()
+
+        if chat_history is not None:
+            self.memory.set(chat_history)
+
+        # create initial set of tasks
+        plan_id = self.create_tasks(message)
+
+        results = []
+        completed_pairs = []
+        dispatch_event(AgentChatWithStepStartEvent())
+        while True:
+            # EXIT CONDITION: check if all sub-tasks are completed
+            next_sub_tasks = self.state.get_next_sub_tasks(plan_id)
+            if len(next_sub_tasks) == 0:
+                break
+
+            jobs = [
+                self.arun_task(sub_task.name, mode=mode, tool_choice=tool_choice)
+                for sub_task in next_sub_tasks
+            ]
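+
+            # asyncio.run() needs a coroutine, but asyncio.gather() returns a future
+            async def _gather_jobs() -> list:
+                return await asyncio.gather(*jobs)
+
+            results = asyncio.run(_gather_jobs())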
+
+            # gather completed sub-tasks and response pairs
+            for sub_task, response in zip(next_sub_tasks, results):
+                completed_pairs.append((sub_task, response))
+                self.state.add_completed_sub_task(plan_id, sub_task)
+
+            # EXIT CONDITION: check if all sub-tasks are completed now
+            # LLMs have a tendency to add more tasks, so we end if there are no more tasks
+            # next_sub_tasks = self.state.get_next_sub_tasks(plan_id)
+            # if len(next_sub_tasks) == 0:
+            #    break
+
+            # refine the plan
+            self.refine_plan(plan_id, message, completed_pairs)
+
+        dispatch_event(AgentChatWithStepEndEvent())
+        return results[-1]
+
+    @dispatcher.span
+    async def _achat(
+        self,
+        message: str,
+        chat_history: Optional[List[ChatMessage]] = None,
+        tool_choice: Union[str, dict] = "auto",
+        mode: ChatResponseMode = ChatResponseMode.WAIT,
+    ) -> AGENT_CHAT_RESPONSE_TYPE:
+        """Chat with step executor."""
+        dispatch_event = dispatcher.get_dispatch_event()
+
+        if chat_history is not None:
+            self.memory.set(chat_history)
+
+        # create initial set of tasks
+        plan_id = await self.acreate_tasks(message)
+
+        results = []
+        completed_pairs = []
+        dispatch_event(AgentChatWithStepStartEvent())
+        while True:
+            # EXIT CONDITION: check if all sub-tasks are completed
+            next_sub_tasks = self.state.get_next_sub_tasks(plan_id)
+            if len(next_sub_tasks) == 0:
+                break
+
+            jobs = [
+                self.arun_task(sub_task.name, mode=mode, tool_choice=tool_choice)
+                for sub_task in next_sub_tasks
+            ]
+            results = await asyncio.gather(*jobs)
+
+            # gather completed sub-tasks and response pairs
+            for sub_task, response in zip(next_sub_tasks, results):
+                completed_pairs.append((sub_task, response))
+                self.state.add_completed_sub_task(plan_id, sub_task)
+
+            # EXIT CONDITION: check if all sub-tasks are completed now
+            # LLMs have a tendency to add more tasks, so we end if there are no more tasks
+            # next_sub_tasks = self.state.get_next_sub_tasks(plan_id)
+            # if len(next_sub_tasks) == 0:
+            #    break
+
+            # refine the plan
+            await self.arefine_plan(plan_id, message, completed_pairs)
+
+        dispatch_event(AgentChatWithStepEndEvent())
+        return results[-1]
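
The chat loop above runs every ready sub-task concurrently and collects the results in order. The following is a minimal sketch of that pattern when driven from synchronous code. It does not use LlamaIndex: `run_sub_task` and `run_ready_sub_tasks` are hypothetical stand-ins, and the sleep only simulates an awaited LLM call.

```python
import asyncio
from typing import List


async def run_sub_task(name: str) -> str:
    """Hypothetical stand-in for running one sub-task."""
    await asyncio.sleep(0)  # yield control, as a real awaited call would
    return f"{name}: done"


async def run_ready_sub_tasks(names: List[str]) -> List[str]:
    """Run all ready sub-tasks concurrently; results keep the input order."""
    return list(await asyncio.gather(*(run_sub_task(n) for n in names)))


# asyncio.run() expects a coroutine, so gather() is awaited inside one.
results = asyncio.run(run_ready_sub_tasks(["one", "two"]))
print(results)  # ['one: done', 'two: done']
```

Note that `asyncio.run()` raises a `RuntimeError` when an event loop is already running in the current thread, so in that situation the async entry point is the safer choice.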
diff --git a/llama-index-core/tests/agent/runner/test_planner.py b/llama-index-core/tests/agent/runner/test_planner.py
new file mode 100644
index 0000000000000000000000000000000000000000..886e07ad943923fa71dd17775edf836f96d3719f
--- /dev/null
+++ b/llama-index-core/tests/agent/runner/test_planner.py
@@ -0,0 +1,91 @@
+from typing import Any
+
+from llama_index.core.agent import ReActAgentWorker, StructuredPlannerAgent
+from llama_index.core.agent.runner.planner import Plan, SubTask
+from llama_index.core.llms.custom import CustomLLM
+from llama_index.core.llms import LLMMetadata, CompletionResponse, CompletionResponseGen
+from llama_index.core.tools import FunctionTool
+
+
+class MockLLM(CustomLLM):
+    @property
+    def metadata(self) -> LLMMetadata:
+        """LLM metadata.
+
+        Returns:
+            LLMMetadata: LLM metadata containing various information about the LLM.
+        """
+        return LLMMetadata()
+
+    def complete(
+        self, prompt: str, formatted: bool = False, **kwargs: Any
+    ) -> CompletionResponse:
+        if "CREATE A PLAN" in prompt:
+            text = Plan(
+                sub_tasks=[
+                    SubTask(
+                        name="one", input="one", expected_output="one", dependencies=[]
+                    ),
+                    SubTask(
+                        name="two", input="two", expected_output="two", dependencies=[]
+                    ),
+                    SubTask(
+                        name="three",
+                        input="three",
+                        expected_output="three",
+                        dependencies=["one", "two"],
+                    ),
+                ]
+            ).json()
+            return CompletionResponse(text=text)
+
+        # dummy response for react
+        return CompletionResponse(text="Final Answer: All done")
+
+    def stream_complete(
+        self, prompt: str, formatted: bool = False, **kwargs: Any
+    ) -> CompletionResponseGen:
+        raise NotImplementedError
+
+
+def dummy_function(a: int, b: int) -> int:
+    """A dummy function that adds two numbers together."""
+    return a + b
+
+
+def test_planner_agent() -> None:
+    dummy_tool = FunctionTool.from_defaults(fn=dummy_function)
+    dummy_llm = MockLLM()
+
+    worker = ReActAgentWorker.from_tools([dummy_tool], llm=dummy_llm)
+    agent = StructuredPlannerAgent(worker, tools=[dummy_tool], llm=dummy_llm)
+
+    # create a plan
+    plan_id = agent.create_tasks("CREATE A PLAN")
+    plan = agent.state.plan_dict[plan_id]
+    assert plan is not None
+    assert len(plan.sub_tasks) == 3
+    assert len(agent.state.get_completed_sub_tasks(plan_id)) == 0
+    assert len(agent.state.get_remaining_subtasks(plan_id)) == 3
+    assert len(agent.state.get_next_sub_tasks(plan_id)) == 2
+
+    next_tasks = agent.state.get_next_sub_tasks(plan_id)
+
+    completed_pairs = []
+    for task in next_tasks:
+        response = agent.run_task(task.name)
+        completed_pairs.append((task, response))
+        agent.state.add_completed_sub_task(plan_id, task)
+
+    assert len(agent.state.get_completed_sub_tasks(plan_id)) == 2
+
+    next_tasks = agent.state.get_next_sub_tasks(plan_id)
+    assert len(next_tasks) == 1
+
+    # will insert the original dummy plan again
+    agent.refine_plan(plan_id, "CREATE A PLAN", completed_pairs)
+
+    assert len(agent.state.plan_dict[plan_id].sub_tasks) == 3
+    assert len(agent.state.get_completed_sub_tasks(plan_id)) == 2
+    assert len(agent.state.get_remaining_subtasks(plan_id)) == 1
+    assert len(agent.state.get_next_sub_tasks(plan_id)) == 1
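
The test above expects two sub-tasks to be ready at first and the third to become ready only after both have completed. The following is a minimal, self-contained sketch of that dependency-based scheduling. It does not use LlamaIndex: `ToySubTask`, `next_sub_tasks`, and `run_plan` are hypothetical names, and the sketch assumes a sub-task is ready once all of its dependencies are complete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToySubTask:
    """Hypothetical stand-in for a planner sub-task."""

    name: str
    dependencies: List[str] = field(default_factory=list)


def next_sub_tasks(plan: List[ToySubTask], completed: List[str]) -> List[ToySubTask]:
    """Return unfinished sub-tasks whose dependencies have all completed."""
    return [
        task
        for task in plan
        if task.name not in completed
        and all(dep in completed for dep in task.dependencies)
    ]


def run_plan(plan: List[ToySubTask]) -> List[List[str]]:
    """Run the plan in waves; each wave holds sub-tasks that are ready together."""
    completed: List[str] = []
    waves: List[List[str]] = []
    while True:
        ready = next_sub_tasks(plan, completed)
        if not ready:
            break  # nothing left to run (or the rest can never become ready)
        waves.append([task.name for task in ready])
        completed.extend(task.name for task in ready)
    return waves


plan = [
    ToySubTask("one"),
    ToySubTask("two"),
    ToySubTask("three", dependencies=["one", "two"]),
]
print(run_plan(plan))  # [['one', 'two'], ['three']]
```

In this sketch, a sub-task whose dependency never completes (for example, a misspelled or circular dependency) is simply never run, because the loop stops as soon as no sub-task is ready.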
diff --git a/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py b/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py
index 4a0ee911f028ebd456332c2d2ba7b5369f5e6a44..dec4820bbd5278bcc6d11d13a9fb868a49a5afb5 100644
--- a/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py
+++ b/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py
@@ -57,7 +57,7 @@ AZURE_TURBO_MODELS: Dict[str, int] = {
     "gpt-35-turbo-16k": 16384,
     "gpt-35-turbo": 4096,
     # 0125 (2024) model (JSON mode)
-    "gpt-35-turbo-0125": 16385,
+    "gpt-35-turbo-0125": 16384,
     # 1106 model (JSON mode)
     "gpt-35-turbo-1106": 16384,
     # 0613 models (function calling):
@@ -67,15 +67,13 @@ AZURE_TURBO_MODELS: Dict[str, int] = {
 
 TURBO_MODELS: Dict[str, int] = {
     # stable model names:
-    #   resolves to gpt-3.5-turbo-0301 before 2023-06-27,
-    #   resolves to gpt-3.5-turbo-0613 until 2023-12-11,
-    #   resolves to gpt-3.5-turbo-1106 after
-    "gpt-3.5-turbo": 4096,
+    #   resolves to gpt-3.5-turbo-0125 as of 2024-04-29.
+    "gpt-3.5-turbo": 16384,
     # resolves to gpt-3.5-turbo-16k-0613 until 2023-12-11
     # resolves to gpt-3.5-turbo-1106 after
     "gpt-3.5-turbo-16k": 16384,
     # 0125 (2024) model (JSON mode)
-    "gpt-3.5-turbo-0125": 16385,
+    "gpt-3.5-turbo-0125": 16384,
     # 1106 model (JSON mode)
     "gpt-3.5-turbo-1106": 16384,
     # 0613 models (function calling):
diff --git a/llama-index-integrations/program/llama-index-program-openai/pyproject.toml b/llama-index-integrations/program/llama-index-program-openai/pyproject.toml
index e5a18ac3e195abbba7d793b25d04aca5a71cd17b..fc484f061ca41289a31c071482b1cf59e6f56aa7 100644
--- a/llama-index-integrations/program/llama-index-program-openai/pyproject.toml
+++ b/llama-index-integrations/program/llama-index-program-openai/pyproject.toml
@@ -31,7 +31,7 @@ version = "0.1.6"
 
 [tool.poetry.dependencies]
 python = ">=3.8.1,<4.0"
-llama-index-llms-openai = "^0.1.1"
+llama-index-llms-openai = ">=0.1.1"
 llama-index-core = "^0.10.1"
 llama-index-agent-openai = ">=0.1.1,<0.3.0"