Competition-Mathematics Problem Dataset Loading With Multi-Subject Stratification

1

MATH Benchmark | Benchmark | 63/100

via “competition-mathematics problem dataset loading with multi-subject stratification”

12.5K competition math problems at AMC/AIME/Olympiad level across 7 subjects; a standard math benchmark.

Unique: Draws problems exclusively from high-difficulty mathematics competitions (AMC, AIME, Olympiads) rather than generic math word problems, so evaluation targets reasoning-intensive problems that require multi-step derivations. The MATHDataset class implements subject-aware stratification, enabling fine-grained evaluation across mathematical domains.

vs others: More rigorous than generic math QA datasets (e.g., MathQA, SVAMP) because problems require genuine mathematical reasoning rather than simple arithmetic, making it the de facto standard for evaluating LLM mathematical capabilities in research.
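Subject-aware stratification can be illustrated with a minimal sketch that draws the same number of problems from every subject. This is not the MATHDataset class itself, whose API is not shown here; the `subject` and `level` field names, the toy records, and the `stratified_sample` helper are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Toy records standing in for loaded problems.
# The "subject" and "level" field names are assumptions, not a documented schema.
PROBLEMS = [
    {"id": i, "subject": subject, "level": 1 + i % 5}
    for i, subject in enumerate(
        ["Algebra"] * 6 + ["Geometry"] * 4 + ["Number Theory"] * 2
    )
]


def stratified_sample(problems, per_subject, seed=0):
    """Draw up to `per_subject` problems from every subject (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_subject = defaultdict(list)
    for problem in problems:
        by_subject[problem["subject"]].append(problem)

    sample = []
    for subject in sorted(by_subject):  # sorted for a deterministic order
        pool = by_subject[subject]
        # Small subjects contribute everything they have.
        sample.extend(rng.sample(pool, min(per_subject, len(pool))))
    return sample


subset = stratified_sample(PROBLEMS, per_subject=3)
counts = defaultdict(int)
for problem in subset:
    counts[problem["subject"]] += 1
print(dict(counts))  # {'Algebra': 3, 'Geometry': 3, 'Number Theory': 2}
```

Capping each subject at a fixed count keeps a large subject from dominating an evaluation subset. A proportional sample would instead scale each subject's draw by its share of the full set.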

2

MATH | Dataset | 56/100

via “subject-domain problem categorization and retrieval”

12.5K competition math problems across 7 subjects and 5 difficulty levels.

Unique: Problems are curated and tagged with subject metadata from their original competition context, ensuring accurate domain classification. The 7-subject taxonomy reflects the structure of actual mathematics competitions, making it meaningful for evaluating mathematical reasoning across recognized disciplines.

vs others: More granular than generic math benchmarks that treat all math problems uniformly; more reliable than automatic subject classification because tags are assigned by domain experts during curation, not inferred post-hoc; enables domain-specific analysis that generic benchmarks cannot support.
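As a rough sketch of what subject tags make possible, the snippet below indexes records by subject and difficulty level for retrieval, then reports accuracy per subject. The record fields (`subject`, `level`, `correct`), the sample values, and both helper functions are hypothetical and do not reflect the dataset's actual schema or tooling.

```python
from collections import defaultdict

# Hypothetical graded results; field names and values are illustrative only.
RESULTS = [
    {"subject": "Algebra", "level": 1, "correct": True},
    {"subject": "Algebra", "level": 5, "correct": False},
    {"subject": "Geometry", "level": 3, "correct": True},
    {"subject": "Geometry", "level": 3, "correct": True},
    {"subject": "Number Theory", "level": 5, "correct": False},
]


def build_index(records):
    """Map (subject, level) to the records carrying those tags."""
    index = defaultdict(list)
    for record in records:
        index[(record["subject"], record["level"])].append(record)
    return index


def accuracy_by_subject(records):
    """Return the fraction of correct answers for each subject."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        totals[record["subject"]] += 1
        hits[record["subject"]] += int(record["correct"])
    return {subject: hits[subject] / totals[subject] for subject in totals}


index = build_index(RESULTS)
geometry_level_3 = index[("Geometry", 3)]  # retrieval by subject and level
scores = accuracy_by_subject(RESULTS)
print(len(geometry_level_3), scores)
# 2 {'Algebra': 0.5, 'Geometry': 1.0, 'Number Theory': 0.0}
```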

3

mmlu | Dataset | 23/100

via “subject-stratified evaluation split generation”

Dataset by cais. 476,392 downloads.

Unique: Defines subject-stratified splits when the dataset is created rather than leaving the work to users, so every subject is represented in each split without custom sampling logic. The splits are part of the HuggingFace dataset schema, so no post-hoc processing is needed.

vs others: Prevents common evaluation mistakes (subject leakage, imbalanced splits) that plague ad-hoc dataset partitioning, while maintaining simplicity through pre-computed splits.
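For comparison, this is roughly what the ad-hoc alternative to pre-computed splits looks like: a minimal subject-stratified splitter. It does not reproduce how this dataset was built; the `stratified_split` helper, the split names, the fractions, and the toy records are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, fractions, key="subject", seed=0):
    """Split records so each split receives a share of every subject.

    `fractions` maps split name to share, e.g. {"train": 0.8, "test": 0.2}.
    The last split receives whatever remains after rounding.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    names = list(fractions)
    splits = {name: [] for name in names}
    for subject in sorted(groups):  # sorted for a deterministic order
        items = groups[subject][:]  # copy so the caller's list is untouched
        rng.shuffle(items)
        start = 0
        for name in names[:-1]:
            end = start + round(fractions[name] * len(items))
            splits[name].extend(items[start:end])
            start = end
        splits[names[-1]].extend(items[start:])
    return splits


# Toy records: 10 problems in one subject, 20 in another.
records = [
    {"id": i, "subject": subject}
    for i, subject in enumerate(["anatomy"] * 10 + ["astronomy"] * 20)
]
splits = stratified_split(
    records, {"train": 0.8, "validation": 0.1, "test": 0.1}
)
print({name: len(items) for name, items in splits.items()})
# {'train': 24, 'validation': 3, 'test': 3}
```

Splitting inside each subject group, rather than over the shuffled whole, is what keeps a small subject from landing entirely in one split.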
