From 11addc7a0a54a840f20bde5d352a074864489c86 Mon Sep 17 00:00:00 2001
From: sekyonda <127536312+sekyondaMeta@users.noreply.github.com>
Date: Mon, 16 Oct 2023 15:50:32 -0400
Subject: [PATCH] Update VideoSummary.ipynb

---
 demo_apps/VideoSummary.ipynb | 126 ++++++++++++++++++++++++++++++++---
 1 file changed, 115 insertions(+), 11 deletions(-)

diff --git a/demo_apps/VideoSummary.ipynb b/demo_apps/VideoSummary.ipynb
index edcab0b3..44592b3b 100644
--- a/demo_apps/VideoSummary.ipynb
+++ b/demo_apps/VideoSummary.ipynb
@@ -6,9 +6,24 @@
    "metadata": {},
    "source": [
     "## This demo app shows:\n",
-    "* how to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video;\n",
-    "* how to ask Llama to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method;\n",
-    "* how to bypass the limit of Llama's max input token size by using more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info."
+    "* How to use LangChain's YoutubeLoader to retrieve the captions of a YouTube video.\n",
+    "* How to ask Llama to summarize the content of the video (within Llama's input size limit) in a naive way using LangChain's stuff method.\n",
+    "* How to bypass the limit of Llama's max input token size using LangChain's more sophisticated map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c866f6be",
+   "metadata": {},
+   "source": [
+    "We start by installing the necessary packages:\n",
+    "- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) - an API to get the transcript/subtitles of a YouTube video.\n",
+    "- [langchain](https://python.langchain.com/docs/get_started/introduction) - provides the RAG tools used in this demo.\n",
+    "- [tiktoken](https://github.com/openai/tiktoken) - a BytePair Encoding tokenizer.\n",
+    "- [pytube](https://pytube.io/en/latest/) - a utility for downloading YouTube videos.\n",
+    "\n",
+    "**Note:** This example uses Replicate to host the Llama model. If you have not set up or used Replicate before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up Replicate before continuing with this example.\n",
+    "If you do not want to use Replicate, you will need to make some changes to this notebook as you go along."
    ]
   },
   {
@@ -21,6 +36,14 @@
     "!pip install langchain youtube-transcript-api tiktoken pytube"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "af3069b1",
+   "metadata": {},
+   "source": [
+    "Next we load the YouTube video transcript using the YoutubeLoader."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
@@ -69,6 +92,25 @@
     "len(docs[0].page_content), docs[0].page_content[:300]"
    ]
   },
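The cell that actually builds `docs` falls outside this patch's context lines, so for readers of the patch, here is a minimal sketch of how LangChain's YoutubeLoader (from the LangChain versions current in Oct 2023) is typically used to produce it. The URL is a placeholder, not the video the notebook uses:

```python
from langchain.document_loaders import YoutubeLoader

# Placeholder URL - the video the notebook summarizes is not shown in this hunk.
loader = YoutubeLoader.from_youtube_url(
    "https://www.youtube.com/watch?v=<video_id>",
    add_video_info=True,  # uses pytube, installed above
)
docs = loader.load()  # a list of Documents; docs[0].page_content holds the transcript
```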
See:\n", + "- [HelloLlamaCloud](HelloLlamaCloud.ipynb) for further information on how to run Llama using Replicate.\n", + "- [HelloLlamaLocal](HelloLlamaLocal.ipynb) for further information on how to run Llama locally." + ] + }, { "cell_type": "code", "execution_count": 4, @@ -76,7 +118,7 @@ "metadata": {}, "outputs": [ { - "name": "stdin", + "name": "stdout", "output_type": "stream", "text": [ " ········\n" @@ -92,6 +134,17 @@ "os.environ[\"REPLICATE_API_TOKEN\"] = REPLICATE_API_TOKEN\n" ] }, + { + "cell_type": "markdown", + "id": "6b911efd", + "metadata": {}, + "source": [ + "Next we call the Llama 2 model from Replicate. In this example we will use the llama 2 13b chat model. You can find more Llama 2 models by searching for them on the [Replicate model explore page](https://replicate.com/explore?query=llama).\n", + "You can add them here in the format: model_name/version\n", + "\n", + "If you using local Llama, just set llm accordingly - see the [HelloLlamaLocal notebook](HelloLlamaLocal.ipynb)" + ] + }, { "cell_type": "code", "execution_count": null, @@ -99,7 +152,7 @@ "metadata": {}, "outputs": [], "source": [ - "# set llm to be Llama2-13b model; if you use local Llama, just set llm accordingly - see the HelloLlamaLocal notebook\n", + "\n", "from langchain.llms import Replicate\n", "\n", "llama2_13b = \"meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d\"\n", @@ -109,6 +162,14 @@ ")" ] }, + { + "cell_type": "markdown", + "id": "8e3baa56", + "metadata": {}, + "source": [ + "Once everything is set up, we prompt Llama 2 to summarize the first 4000 characters of the transcript for us." + ] + }, { "cell_type": "code", "execution_count": 6, @@ -141,6 +202,14 @@ "print(summary)" ] }, + { + "cell_type": "markdown", + "id": "8b684b29", + "metadata": {}, + "source": [ + "Next we try to summarize all the content of the transcript and we should get a `RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens.`." + ] + }, { "cell_type": "code", "execution_count": 7, @@ -174,8 +243,16 @@ "# try to get a summary of the whole content\n", "text = docs[0].page_content\n", "summary = chain.run(text)\n", - "print(summary)\n", - "# and you'll get - RuntimeError: Your input is too long. Max input length is 4096 tokens, but you supplied 5597 tokens." + "print(summary)\n" + ] + }, + { + "cell_type": "markdown", + "id": "1ad1881a", + "metadata": {}, + "source": [ + "\n", + "Let's try some workarounds to see if we can summarize the entire transcript without running into the `RuntimeError`.\n" ] }, { @@ -260,6 +337,15 @@ "chain.run(docs)" ] }, + { + "cell_type": "markdown", + "id": "aecf6328", + "metadata": {}, + "source": [ + "\n", + "Since the transcript is bigger than the model can handle, we can split the transcript into chunks instead and use the [`refine`](https://python.langchain.com/docs/modules/chains/document/refine) `chain_type` iteratively to create an answer." + ] + }, { "cell_type": "code", "execution_count": 10, @@ -321,6 +407,14 @@ "chain.run(split_docs)" ] }, + { + "cell_type": "markdown", + "id": "c3976c92", + "metadata": {}, + "source": [ + "You can also use [`map_reduce`](https://python.langchain.com/docs/modules/chains/document/map_reduce) `chain_type` to implement a map reduce like architecture while summarizing the documents." 
   {
    "cell_type": "code",
    "execution_count": 14,
@@ -400,6 +494,15 @@
     "chain.run(split_docs)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "77d580de",
+   "metadata": {},
+   "source": [
+    "To investigate further, let's turn on LangChain's debug mode to get an idea of how many calls are made to the model and the details of the inputs and outputs.\n",
+    "We will then run our summary using the `stuff` and `refine` `chain_type`s and take a look at the output."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 15,
@@ -559,12 +662,13 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "id": "61ccd0fb-5cdb-43c4-afaf-05bc9f7cf959",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "\n",
+    "As you can see, `stuff` fails because it treats all the split documents as one and \"stuffs\" them into a single prompt, which is much larger than Llama 2 can handle, while `refine` iteratively runs over the documents, updating its answer as it goes."
+   ]
  }
 ],
 "metadata": {
-- 
GitLab
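For reference, the debug toggle that the 77d580de cell refers to was a module-level flag in the LangChain releases current at the time of this patch. A minimal sketch of the comparison run the cell describes, reusing `llm` and `split_docs` from the earlier cells:

```python
import langchain
from langchain.chains.summarize import load_summarize_chain

langchain.debug = True  # log every chain/LLM call with its full inputs and outputs

# With debug on, each prompt sent to Replicate is printed, which makes it easy to
# see why chain_type="stuff" blows past the 4096-token window while "refine" does not.
load_summarize_chain(llm, chain_type="stuff").run(split_docs)   # fails: prompt too long
load_summarize_chain(llm, chain_type="refine").run(split_docs)  # one call per chunk
```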