via “multimodal mathematical reasoning evaluation across visual domains”
Visual mathematical reasoning benchmark.
Unique: Combines visual understanding with mathematical problem-solving. Its 6,141 examples are drawn from 28 existing multimodal datasets plus three newly created ones (IQTest, FunctionQA, PaperQA), with an explicit focus on compositional reasoning, where visual perception and mathematical logic must be applied jointly. Unlike single-domain benchmarks, MathVista spans geometry, statistics, and scientific figures, so it exposes how model performance differs across mathematical reasoning types.
vs others: Broader than domain-specific benchmarks (e.g., geometry-only or chart-only), and more demanding than general vision-language benchmarks: a correct answer requires both accurate visual interpretation and correct mathematical reasoning, not just image captioning or visual QA on non-mathematical content.
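Example: a minimal sketch of how per-category results might be tallied to see performance differences across reasoning types. The record keys ("category", "correct"), the function name, and the toy data are assumptions made for illustration; they are not MathVista's actual schema or evaluation code, and the numbers are not real results.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: accuracy} for a list of scored predictions.

    Each record is a dict with two hypothetical keys:
      "category": a reasoning-type label (e.g. "geometry")
      "correct":  True if the model's answer matched the reference
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        totals[record["category"]] += 1
        hits[record["category"]] += int(record["correct"])
    # Every category in `totals` has at least one record, so no division by zero.
    return {category: hits[category] / totals[category] for category in totals}


# Made-up records for illustration only.
toy_records = [
    {"category": "geometry", "correct": True},
    {"category": "geometry", "correct": False},
    {"category": "statistics", "correct": True},
    {"category": "scientific figures", "correct": False},
]

for category, accuracy in accuracy_by_category(toy_records).items():
    print(f"{category}: {accuracy:.2f}")
```

Reporting one accuracy per category, rather than a single overall score, is what makes gaps between reasoning types visible.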