Projects with this topic
- https://github.com/langfuse/langfuse: 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, L…
- https://github.com/EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models (a minimal usage sketch follows this list).
- https://github.com/modelscope/evalscope: A streamlined and customizable framework for efficient large model evaluation and performance benchmarking.
- https://github.com/google/lmeval: LMEval, a large model evaluation framework.
- https://github.com/om-ai-lab/VL-CheckList: Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations (EMNLP 2022).
- https://github.com/om-ai-lab/OVDEval: A Comprehensive Evaluation Benchmark for Open-Vocabulary Detection (AAAI 2024).
- https://github.com/AkariAsai/OpenScholar_ExpertEval: The expert evaluation interface and data evaluation scripts for the OpenScholar project.
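
To give a sense of how frameworks like these are typically driven, here is a minimal sketch of a zero-shot run with lm-evaluation-harness. It assumes the `simple_evaluate` entry point shown in the project's README and an installed `lm-eval` package; exact argument names can vary between releases, so treat this as an illustration rather than a definitive invocation.

```python
# Minimal sketch: evaluating a HuggingFace model with lm-evaluation-harness.
# Assumes `pip install lm-eval` and the simple_evaluate entry point from the
# project's README; argument names may differ between releases.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # HuggingFace transformers backend
    model_args="pretrained=gpt2",  # any HF model id works here
    tasks=["hellaswag"],           # one of the harness's built-in tasks
    num_fewshot=0,                 # zero-shot; raise for few-shot evaluation
    batch_size=8,
)

# Per-task metrics such as accuracy and normalized accuracy.
print(results["results"]["hellaswag"])
```

The same pattern (pick a model backend, pick named tasks, collect a metrics dictionary) carries over to most of the evaluation frameworks listed above, even where the concrete APIs differ.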