Projects with this topic
- https://github.com/tensorzero/llmgym - LLM Gym is a unified environment interface for developing and benchmarking LLM applications that learn from feedback. Think gym for LLM agents.
- https://github.com/princeton-nlp/SWE-bench - [ICLR 2024] SWE-bench: Can Language Models Resolve Real-world GitHub Issues? Website: https://www.swebench.com/ (see the dataset-loading sketch after this list)
- https://github.com/princeton-nlp/CharXiv - CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
- https://github.com/gin-gonic/FrameworkBenchmarks - Source code for the framework benchmarking project
- Repository for AI Model Benchmarking
- https://github.com/THUDM/LongBench - [ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding (see the loading sketch after this list)
- https://github.com/THUDM/LongCite - LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
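
For the SWE-bench entry above, a minimal loading sketch, assuming the public princeton-nlp/SWE-bench dataset on the Hugging Face Hub and the field names listed on its dataset card (instance_id, repo, problem_statement); the evaluation harness itself lives in the GitHub repo:

    from datasets import load_dataset

    # Sketch: pull SWE-bench task instances from the Hugging Face Hub.
    # Field names follow the dataset card; assumption, not verified here.
    swebench = load_dataset("princeton-nlp/SWE-bench", split="test")

    for task in swebench.select(range(3)):
        print(task["instance_id"], task["repo"])
        print(task["problem_statement"][:200])

Each instance pairs a real GitHub issue (problem_statement) with the repository and commit context needed to reproduce and test a fix.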
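
And for LongBench, a similar sketch, assuming the THUDM/LongBench hub dataset with per-task configs (e.g. "hotpotqa") and the context/input/answers fields from its dataset card; the dataset ships a loading script, so recent versions of the datasets library may need trust_remote_code=True:

    from datasets import load_dataset

    # Sketch: load one LongBench task split. The config name and field
    # names are assumptions taken from the dataset card.
    longbench = load_dataset("THUDM/LongBench", "hotpotqa",
                             split="test", trust_remote_code=True)

    sample = longbench[0]
    print(len(sample["context"]), "characters of context")
    print("Q:", sample["input"])
    print("Gold:", sample["answers"])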