lm-sys
Projects with this topic
https://github.com/lm-sys/arena-hard-auto Arena-Hard-Auto: An automatic LLM benchmark.
https://github.com/lm-sys/llm-decontaminator Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"