Capability
Benchmark Suite Composition And Leaderboard Aggregation
15 artifacts provide this capability.
Want a personalized recommendation?
Find the best match →Top Matches
EleutherAI's evaluation framework — 200+ benchmarks, powers Open LLM Leaderboard.
Unique: Supports weighted aggregation of metrics across multiple tasks with hierarchical grouping. Leaderboard scores are computed with optional normalization, enabling fair comparison across models with different evaluation configurations.
vs others: Compared to manual leaderboard computation, the framework automates aggregation and ranking. Weighted aggregation enables custom benchmark suites tailored to specific evaluation goals.