via “expert-level multimodal reasoning evaluation across 30 college subjects”
Expert-level multimodal understanding across 30 subjects.
Unique: MMMU is the only benchmark combining (1) 11,500 questions across 30 college subjects and 183 subfields, (2) 30 heterogeneous visual modality types (including domain-specific visuals like chemical structures and music sheets), and (3) explicit sourcing from authentic college exams/textbooks/lectures rather than synthetic or crowdsourced data. This scale and diversity of real-world academic content distinguishes it from narrower benchmarks like MMVP or ScienceQA which focus on single domains or simpler visual reasoning.
vs others: MMMU covers 6x more disciplines and 3x more subjects than domain-specific benchmarks (e.g., MedQA for medicine only), and includes heterogeneous visual types (chemical structures, music sheets) absent from general-purpose multimodal benchmarks like LVLM-eHub, making it the most comprehensive test of expert-level multimodal reasoning across academic domains.