Unverified commit 336a88db authored by Jerry Liu, committed by GitHub

Add multi-modal use case section (#8823)

parent 9f8a08db
@@ -93,6 +93,7 @@ Associated projects
   use_cases/chatbots.md
   use_cases/agents.md
   use_cases/extraction.md
+  use_cases/multimodal.md

.. toctree::
   :maxdepth: 2

use_cases/multimodal.md (new file):
# Multi-modal
LlamaIndex offers capabilities for building not only language-based applications but also **multi-modal** applications that combine language and images.
## Types of Multi-modal Use Cases
This space is being actively explored, and some fascinating use cases are emerging.
### Multi-Modal RAG
All the core RAG concepts (indexing, retrieval, and synthesis) can be extended to the image setting; a minimal code sketch follows the list below.
- The input can be text or images.
- The stored knowledge base can consist of text and images.
- The inputs to response synthesis can be text and images.
- The final response can be text or images.
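
As an illustration, here is a minimal sketch of such a pipeline, assuming the `MultiModalVectorStoreIndex` and `OpenAIMultiModal` APIs available around the time of this change; the import paths, the `multi_modal_llm` keyword, and the `./mixed_data` folder are assumptions that may differ across LlamaIndex versions.

```python
from llama_index import SimpleDirectoryReader
from llama_index.indices.multi_modal.base import MultiModalVectorStoreIndex
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Load a folder that mixes text files and images; the reader yields
# both text Documents and ImageDocuments. "./mixed_data" is a placeholder.
documents = SimpleDirectoryReader("./mixed_data").load_data()

# Index text and images side by side (text embeddings for text nodes,
# CLIP embeddings for image nodes by default).
index = MultiModalVectorStoreIndex.from_documents(documents)

# Retrieve both text and image nodes, then synthesize an answer with a
# multi-modal LLM (GPT-4V here).
query_engine = index.as_query_engine(
    multi_modal_llm=OpenAIMultiModal(model="gpt-4-vision-preview")
)
print(query_engine.query("Describe the product shown in the images."))
```

The design point this sketch illustrates is that text and images are embedded into parallel vector stores, so a single query can retrieve across both modalities before synthesis.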
Check out our guides below:
```{toctree}
---
maxdepth: 1
---
/examples/multi_modal/gpt4v_multi_modal_retrieval.ipynb
[Old] Multi-modal retrieval with CLIP </examples/multi_modal/multi_modal_retrieval.ipynb>
```
### Retrieval-Augmented Image Captioning
Understanding an image often requires looking up information from a knowledge base. One useful flow is retrieval-augmented image captioning: first caption the image with a multi-modal model, then refine the caption by retrieving from a text corpus.
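
Below is a hedged sketch of this two-step flow, using `OpenAIMultiModal` (GPT-4V) as a stand-in for the LLaVA model in the guide below; the folder paths and prompts are illustrative assumptions.

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Step 1: draft a caption with a multi-modal model.
# "./figures" is a placeholder directory of image files.
mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview")
image_documents = SimpleDirectoryReader("./figures").load_data()
draft = mm_llm.complete(
    prompt="Describe this figure in detail.",
    image_documents=image_documents,
)

# Step 2: refine the draft caption against a text knowledge base,
# e.g. the report the figure was taken from ("./report_text" is a placeholder).
corpus = SimpleDirectoryReader("./report_text").load_data()
query_engine = VectorStoreIndex.from_documents(corpus).as_query_engine()
print(
    query_engine.query(
        f"Here is a draft caption of a figure: {draft}. "
        "Correct and expand it using the retrieved context."
    )
)
```
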
Check out our guides below:
```{toctree}
---
maxdepth: 1
---
/examples/multi_modal/llava_multi_modal_tesla_10q.ipynb
```