benchmark
Projects with this topic
https://github.com/THUDM/LongBench [ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
https://github.com/THUDM/LongCite LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
https://github.com/princeton-nlp/SWE-bench [ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
https://www.swebench.com/
https://github.com/VictoriaMetrics/prometheus-benchmark Benchmark for Prometheus-compatible systems
https://github.com/cirruslabs/XcodeBenchmark-ForRunners XcodeBenchmark measures the compilation time of a large codebase on iMac, MacBook, and Mac Pro
https://github.com/go101/go-benchmarks Some benchmarks I wrote in writing the Go 101 book.
Repository for AI Model Benchmarking
https://github.com/princeton-nlp/CharXiv CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
https://github.com/VictoriaMetrics/billy Billy benchmarks for VictoriaMetrics