Unverified commit 297d6d75, authored by Zirui Wang and committed by GitHub

Update README.md

parent 45c08082
......@@ -10,9 +10,9 @@ This repository contains the code to evaluate models on CharXiv from the paper [
https://github.com/princeton-nlp/CharXiv/assets/59942464/ab9b293b-8fd6-4735-b8b3-0079ee978b61
## 📰 News
**[12/25/2024]** 🚀 We updated the [leaderboard](https://charxiv.github.io/#leaderboard) with the latest models: o1, Qwen2-VL, Pixtral, InternVL 2.5, Llama 3.2 Vision, NVLM, Molmo, Llava OneVision, Phi 3.5, and more!
**[10/10/2024]** 🚀 CharXiv has been accepted at the **NeurIPS 2024 Datasets & Benchmarks Track** and at the NeurIPS 2024 Multimodal Algorithmic Reasoning Workshop as a **spotlight** paper.
**[07/26/2024]** 🚀 Upcoming this week: we'll be releasing scores for [GPT-4o-mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) as well as the largest and most capable open-weight VLM in our benchmark: [InternVL2 LLaMA-3 76B](https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B). Alongside the scores, we discuss on X some [interesting patterns](https://x.com/zwcolin/status/1816948825036071196) in the trend of model improvement across different chart understanding benchmarks.
**[07/24/2024]** 🚀 We released the [full evaluation pipeline](https://github.com/princeton-nlp/CharXiv) (i.e., v1.0).
**[07/23/2024]** 🚀 We released our [evaluation results](https://huggingface.co/datasets/princeton-nlp/CharXiv/tree/main/existing_evaluations) on **all 34 MLLMs** that we have tested so far -- this includes all models' responses to CharXiv's challenging questions, scores graded by GPT-4o, as well as aggregated stats.
**[07/14/2024]** 🚀 We further evaluated the latest [InternVL Chat V2.0 26B](https://huggingface.co/OpenGVLab/InternVL2-26B) and [Cambrian 34B models](https://huggingface.co/nyu-visionx/cambrian-34b) on CharXiv with some **State-of-the-Art results**. More analysis is [here](https://x.com/zwcolin/status/1812650435808792731).
......