Projects with this topic
https://github.com/tensorzero/llmgym LLM Gym is a unified environment interface for developing and benchmarking LLM applications that learn from feedback. Think gym for LLM agents. As the space of benchmar…
https://github.com/princeton-nlp/SWE-bench [ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world GitHub Issues? (https://www.swebench.com/)
https://github.com/VictoriaMetrics/prometheus-benchmark Benchmark for Prometheus-compatible systems
https://github.com/cirruslabs/XcodeBenchmark-ForRunners XcodeBenchmark measures the compilation time of a large codebase on iMac, MacBook, and Mac Pro.
https://github.com/vk-en/fioplot-bs fioplot-bs is a utility that creates graphs and XLSX tables from the results of the FIO performance-testing utility.
https://github.com/distributedOne/diskbench A collection of fio tests for disk performance testing.
https://github.com/princeton-nlp/CharXiv CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
https://github.com/gin-gonic/FrameworkBenchmarks Source code for the framework benchmarking project.
https://github.com/gin-gonic/go-web-framework-benchmark ⚡ Go web framework benchmark
https://github.com/gin-gonic/go-http-routing-benchmark Go HTTP request router and web framework benchmark
Repository for AI Model Benchmarking
https://github.com/alacritty/vtebench Generate benchmarks for terminal emulators
https://github.com/THUDM/LongBench [ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
https://github.com/THUDM/LongCite LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
https://github.com/go101/go-benchmarks Some benchmarks I wrote while writing the Go 101 book.
https://github.com/VictoriaMetrics/billy Billy benchmarks for VictoriaMetrics