Capability
Scenario Library Management And Extensibility
2 artifacts provide this capability.
Top Matches
HELM: Stanford's holistic LLM evaluation, covering 42 scenarios and 7 metrics, including fairness, bias, and toxicity.
Unique: Implements a pluggable scenario architecture in which each scenario is a self-contained module that defines its input/output format, its metrics, and optional prompt templates, so users can add custom scenarios without modifying core HELM code.
vs others: More extensible than monolithic benchmarks (e.g., MMLU) because users can implement their own scenarios; more modular than ad-hoc evaluation scripts because every scenario goes through the same interface and the same metric computation.
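Example: the pluggable-scenario idea described above can be illustrated with a small Python sketch. This is not HELM's actual API. Every name here (`Scenario`, `Example`, `register`, `SCENARIOS`, `evaluate`, `CapitalCities`) is hypothetical and chosen only for illustration. The sketch assumes a scenario supplies its examples, its metrics, and an optional prompt template, and that a shared runner computes metrics the same way for every scenario.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Type


@dataclass(frozen=True)
class Example:
    """One evaluation instance: a model input and its reference output."""
    prompt: str
    reference: str


class Scenario(ABC):
    """Hypothetical scenario interface (an assumption, not HELM's base class)."""
    name: str
    prompt_template: str = "{prompt}"  # optional; subclasses may override

    @abstractmethod
    def examples(self) -> List[Example]:
        """Return the instances to evaluate."""

    @abstractmethod
    def metrics(self) -> Dict[str, Callable[[str, str], float]]:
        """Map metric names to functions of (prediction, reference)."""


# Registry that lets new scenarios plug in without touching the runner.
SCENARIOS: Dict[str, Type[Scenario]] = {}


def register(cls: Type[Scenario]) -> Type[Scenario]:
    """Class decorator that adds a scenario to the registry under its name."""
    SCENARIOS[cls.name] = cls
    return cls


def exact_match(prediction: str, reference: str) -> float:
    """Score 1.0 when the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


@register
class CapitalCities(Scenario):
    """A custom scenario defined entirely outside the core runner."""
    name = "capital_cities"
    prompt_template = "Q: What is the capital of {prompt}?\nA:"

    def examples(self) -> List[Example]:
        return [Example("France", "Paris"), Example("Japan", "Tokyo")]

    def metrics(self) -> Dict[str, Callable[[str, str], float]]:
        return {"exact_match": exact_match}


def evaluate(scenario_name: str, model: Callable[[str], str]) -> Dict[str, float]:
    """Shared runner: same prompt formatting and metric averaging for all scenarios."""
    scenario = SCENARIOS[scenario_name]()
    examples = scenario.examples()
    metric_fns = scenario.metrics()
    totals = {metric: 0.0 for metric in metric_fns}
    for example in examples:
        prediction = model(scenario.prompt_template.format(prompt=example.prompt))
        for metric, fn in metric_fns.items():
            totals[metric] += fn(prediction, example.reference)
    return {metric: total / len(examples) for metric, total in totals.items()}


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM call; it only knows one capital."""
    return "Paris" if "France" in prompt else "Kyoto"


scores = evaluate("capital_cities", toy_model)
```

In this sketch, adding another scenario means writing one more decorated class; `evaluate` stays unchanged, which is the property the entry attributes to HELM's design.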