Capability
Adaptive Difficulty Scaling
20 artifacts provide this capability.
Top Matches
via “difficulty-stratified problem categorization and filtering”
10K coding problems across 3 difficulty levels with test suites.
Unique: Explicitly stratifies problems into three difficulty tiers, each large enough to evaluate on its own (3.6K, 5K, and 1.4K problems), so model performance can be measured per tier and its degradation tracked as difficulty rises, rather than treating all problems as equally hard.
vs others: Unlike HumanEval, which has no difficulty stratification, APPS lets researchers compare performance across tiers, which helps indicate whether a model's ability carries over to harder problems or mostly reflects pattern-matching on easier ones.
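The per-tier comparison described above can be illustrated with a minimal sketch. It assumes evaluation results are available as (tier, passed) pairs; the tier labels `tier_1` to `tier_3`, the function name `pass_rate_by_tier`, and the sample records are hypothetical and not taken from the dataset or its tooling.

```python
from collections import defaultdict


def pass_rate_by_tier(results):
    """Return {tier: fraction of problems passed} for every tier seen.

    `results` is an iterable of (tier, passed) pairs, where `passed`
    is True when a model's solution passed that problem's test suite.
    """
    passed_count = defaultdict(int)
    total_count = defaultdict(int)
    for tier, passed in results:
        total_count[tier] += 1
        passed_count[tier] += int(passed)
    return {tier: passed_count[tier] / total_count[tier] for tier in total_count}


# Hypothetical results, for illustration only (not real benchmark data).
sample = [
    ("tier_1", True), ("tier_1", True), ("tier_1", False), ("tier_1", True),
    ("tier_2", True), ("tier_2", False), ("tier_2", False), ("tier_2", False),
    ("tier_3", False), ("tier_3", False),
]

rates = pass_rate_by_tier(sample)
for tier in sorted(rates):
    print(f"{tier}: {rates[tier]:.2f}")
```

Reporting one rate per tier, instead of a single aggregate score, is what makes a drop in performance between tiers visible.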