Commit 362a79cc authored by Ravi Theja, committed by GitHub

Add GPT4-V and Microsoft Table transformer experiments with PDF which has tables. (#9238)


* format

* cr

* cr

* cr

* Update experiment 4 with ocr

* Update documentation

---------

Co-authored-by: Haotian Zhang <socool.king@gmail.com>
parent 7b36961a
@@ -105,3 +105,25 @@ maxdepth: 1
---
/examples/multi_modal/ChromaMultiModalDemo.ipynb
```
### Multi-Modal RAG on PDFs with Tables using Microsoft `Table Transformer`
A common challenge in RAG (Retrieval-Augmented Generation) is handling PDFs that contain tables, because tables come in many layouts and are hard to parse reliably.
Microsoft's Table Transformer model offers a promising way to detect tables within images.
In this notebook, we demonstrate how to use the Table Transformer model together with GPT4-V to get better results on images that contain tables.
The notebook compares four options for extracting table information from PDFs:
1. Retrieving relevant images (PDF pages) and sending them to GPT4-V to respond to queries.
2. Treating every PDF page as an image and letting GPT4-V perform image reasoning on each page, building a text vector store index over the reasoning outputs, and then querying that `Image Reasoning Vector Store` for the answer.
3. Using Table Transformer to crop the table information from the retrieved images and then sending these cropped images to GPT4-V for query responses.
4. Applying OCR to the cropped table images and sending the extracted text to GPT-4 / GPT-3.5 to answer the query.
```{toctree}
---
maxdepth: 1
---
/examples/multi_modal/multi_modal_pdf_tables.ipynb
```
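
The cropping step in option 3 can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes a table detector (for example, Table Transformer) has already returned scored bounding boxes in pixel coordinates. The `Detection` class, the `table_crop_boxes` helper, the `min_score` threshold, and the `padding` value are all hypothetical names and defaults chosen for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; an assumed box convention.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """Hypothetical container for one detector output."""
    score: float
    box: Box


def table_crop_boxes(
    detections: List[Detection],
    page_size: Tuple[int, int],
    min_score: float = 0.9,   # assumed confidence threshold
    padding: int = 10,        # assumed margin around each table, in pixels
) -> List[Tuple[int, int, int, int]]:
    """Turn detections into integer crop boxes clamped to the page."""
    width, height = page_size
    crops = []
    for det in detections:
        # Skip low-confidence detections.
        if det.score < min_score:
            continue
        x0, y0, x1, y1 = det.box
        # Round outward, add padding, and clamp to the page bounds.
        left = max(0, math.floor(x0) - padding)
        top = max(0, math.floor(y0) - padding)
        right = min(width, math.ceil(x1) + padding)
        bottom = min(height, math.ceil(y1) + padding)
        # Keep only boxes with a positive area.
        if right > left and bottom > top:
            crops.append((left, top, right, bottom))
    return crops
```

Each returned tuple follows the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so the cropped table images could then be passed to GPT4-V (option 3) or to an OCR step (option 4).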