Skip to content
Snippets Groups Projects
Unverified Commit 4fc91a52 authored by Zirui Wang's avatar Zirui Wang Committed by GitHub
Browse files

Update README.md

parent 5a92816a
No related branches found
No related tags found
No related merge requests found
...@@ -10,7 +10,7 @@ This repository contains the code to evaluate models on CharXiv from the paper [ ...@@ -10,7 +10,7 @@ This repository contains the code to evaluate models on CharXiv from the paper [
https://github.com/princeton-nlp/CharXiv/assets/59942464/ab9b293b-8fd6-4735-b8b3-0079ee978b61 https://github.com/princeton-nlp/CharXiv/assets/59942464/ab9b293b-8fd6-4735-b8b3-0079ee978b61
## 📰 News ## 📰 News
**[07/26/2024]** 🚀 Upcoming this week: we'll be releasing scores for [GPT-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) as well as the largest and most capable open-weight VLM in our benchmark: [InternVL2 LLaMA-3 76B](https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B). Alongside scores, we find some [interesting patterns](https://x.com/zwcolin/status/1816948825036071196) in the trend of model improvement with respect to differnet chart understanding benchmarks on X. Stay tuned! **[07/26/2024]** 🚀 Upcoming this week: we'll be releasing scores for [GPT-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) as well as the largest and most capable open-weight VLM in our benchmark: [InternVL2 LLaMA-3 76B](https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B). Alongside scores, we find some [interesting patterns](https://x.com/zwcolin/status/1816948825036071196) in the trend of model improvement with respect to differnet chart understanding benchmarks on X.
**[07/24/2024]** 🚀 We released the [full evaluation pipeline](https://github.com/princeton-nlp/CharXiv) (i.e., v1.0). **[07/24/2024]** 🚀 We released the [full evaluation pipeline](https://github.com/princeton-nlp/CharXiv) (i.e., v1.0).
**[07/23/2024]** 🚀 We released our [evaluation results](https://huggingface.co/datasets/princeton-nlp/CharXiv/tree/main/existing_evaluations) on **all 34 MLLMs** that we have tested so far -- this includes all models' responses to CharXiv's challenging questions, scores graded by GPT-4o, as well as aggregated stats. **[07/23/2024]** 🚀 We released our [evaluation results](https://huggingface.co/datasets/princeton-nlp/CharXiv/tree/main/existing_evaluations) on **all 34 MLLMs** that we have tested so far -- this includes all models' responses to CharXiv's challenging questions, scores graded by GPT-4o, as well as aggregated stats.
**[07/14/2024]** 🚀 We further evaluated the latest [InternVL Chat V2.0 26B](https://huggingface.co/OpenGVLab/InternVL2-26B) and [Cambrian 34B models](https://huggingface.co/nyu-visionx/cambrian-34b) on CharXiv with some **State-of-the-Art results**. More analysis are [here](https://x.com/zwcolin/status/1812650435808792731). **[07/14/2024]** 🚀 We further evaluated the latest [InternVL Chat V2.0 26B](https://huggingface.co/OpenGVLab/InternVL2-26B) and [Cambrian 34B models](https://huggingface.co/nyu-visionx/cambrian-34b) on CharXiv with some **State-of-the-Art results**. More analysis are [here](https://x.com/zwcolin/status/1812650435808792731).
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment