ARC
Benchmark · Free
Abstraction and Reasoning Corpus for general intelligence
Capabilities (2 decomposed)
abstract reasoning problem generation
Medium confidence: ARC poses visual reasoning problems that require abstract thinking and rule inference. Each problem is a grid-pattern puzzle designed to be solvable by humans yet challenging for AI systems. This structure tests the ability to deduce an underlying rule from a few visual examples, setting it apart from benchmarks that reward memorization or straightforward pattern lookup; the task file format is sketched after this capability entry.
The problem design targets abstract reasoning directly, distinguishing ARC from benchmarks that do not exercise visual rule inference.
More focused on abstract reasoning than standard datasets such as MNIST, which primarily test recognition rather than inference.
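For concreteness, a minimal loading sketch, assuming the JSON layout used by the public ARC release (one object per task with "train" and "test" lists of input/output grid pairs, where a grid is a list of rows of integers 0-9). The toy grids below are invented for illustration, not an actual ARC task.

```python
import json

# A tiny task in the ARC JSON format: "train" holds the example pairs the
# solver may study, "test" holds the inputs whose outputs must be predicted.
task = json.loads("""
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]}
  ],
  "test": [
    {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}
  ]
}
""")

# The solver sees every "train" pair, infers the rule relating input to
# output, and then predicts the "output" grid for each "test" input.
for i, pair in enumerate(task["train"]):
    h, w = len(pair["input"]), len(pair["input"][0])
    print(f"train pair {i}: {h}x{w} input -> "
          f"{len(pair['output'])}x{len(pair['output'][0])} output")
print("test inputs to solve:", len(task["test"]))
```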
evaluation metric formulation
Medium confidence: ARC defines a consistent framework for evaluating AI systems on its visual reasoning problems, with human solvability as the reference point for difficulty. Because every system is assessed on the same fixed tasks under the same criteria, results are comparable across models and methodologies; the usual scoring convention is sketched after this entry.
The evaluation metrics are specifically tailored to assess abstract reasoning capabilities, unlike generic metrics that may not reflect reasoning depth.
Offers more nuanced evaluation than generic accuracy metrics on conventional benchmarks, which may not fully capture reasoning ability.
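To make the criterion concrete, here is a minimal scoring sketch under the commonly used convention: a task counts as solved only when every predicted test grid matches the expected grid exactly, and the headline number is the fraction of tasks solved. The function names are illustrative, not part of any official harness.

```python
from typing import Dict, List

Grid = List[List[int]]  # rows of integer colour codes, 0-9

def task_solved(predicted: List[Grid], expected: List[Grid]) -> bool:
    """All-or-nothing: every test grid must match exactly (shape and cells)."""
    return len(predicted) == len(expected) and all(
        p == e for p, e in zip(predicted, expected)
    )

def benchmark_score(per_task: Dict[str, bool]) -> float:
    """Fraction of tasks solved across the whole problem set."""
    return sum(per_task.values()) / len(per_task) if per_task else 0.0

# Example: a model that solves one of two tasks scores 0.5.
print(benchmark_score({"task_a": True, "task_b": False}))  # 0.5
```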
Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.
Related Artifacts (sharing capabilities)
Artifacts that share capabilities with ARC, ranked by overlap. Discovered automatically through the match graph.
BIG-Bench Hard (BBH)
23 hardest BIG-Bench tasks where models initially failed.
o3
OpenAI's most powerful reasoning model for complex problems.
ARC (AI2 Reasoning Challenge)
7.8K science questions testing genuine reasoning, not just recall.
Build a Reasoning Model (From Scratch)
A guide to building a working reasoning model from the ground up, by Sebastian Raschka.
ARC-AGI
Abstract reasoning benchmark with $1M prize for AGI.
GSM8K
8.5K grade school math problems — multi-step reasoning, verifiable solutions, reasoning benchmark.
Best For
- ✓ researchers developing AI models for reasoning tasks
- ✓ developers creating AI systems that require advanced reasoning capabilities
- ✓ AI researchers looking to benchmark their models
- ✓ developers needing a standardized evaluation method for reasoning tasks
Known Limitations
- ⚠ Limited to 800 total problems, which may not cover all reasoning scenarios
- ⚠ Problems are specifically designed for visual reasoning, not applicable to other reasoning types
- ⚠ Evaluation metrics may not capture all nuances of reasoning
- ⚠ Dependent on the quality and diversity of the problem set
UnfragileRank
UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
About
ARC is a visual reasoning benchmark of 400 training and 400 test problems. Each problem is a grid-pattern puzzle: the solver must infer the underlying transformation rule from a few example pairs and apply it to new inputs. The tasks are designed to be solvable by humans but hard for AI systems, which makes the benchmark a useful indicator of general reasoning capability beyond memorization.
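As a toy illustration of that infer-then-apply loop (invented grids, not an actual ARC task, and a deliberately tiny hypothesis space), the sketch below keeps only the candidate transformations consistent with the training pair and applies the surviving rule to a test input.

```python
# Candidate whole-grid transformations to test against the example pair.
def flip_h(g): return [row[::-1] for row in g]       # mirror left-right
def flip_v(g): return g[::-1]                        # mirror top-bottom
def transpose(g): return [list(col) for col in zip(*g)]

CANDIDATES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

train_in  = [[1, 0, 0],
             [0, 2, 0]]
train_out = [[0, 0, 1],
             [0, 2, 0]]

# Keep only candidates consistent with the training example.
consistent = {name: fn for name, fn in CANDIDATES.items()
              if fn(train_in) == train_out}
print(list(consistent))                   # ['flip_h']

# Apply the surviving rule to an unseen test input.
test_in = [[3, 0], [0, 4]]
print(consistent["flip_h"](test_in))      # [[0, 3], [4, 0]]
```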