Anthropic: Claude Sonnet 4 (Model, 25/100) via “code generation and completion with swe-bench optimization”
Claude Sonnet 4 significantly enhances the capabilities of its predecessor, Sonnet 3.7, excelling in both coding and reasoning tasks with improved precision and controllability. Achieving state-of-the-art performance on SWE-bench (72.7%),...
Unique: Achieves 72.7% on SWE-bench (state-of-the-art) through specialized training on real GitHub repositories and software engineering tasks, with implicit structural reasoning that produces code respecting language-specific idioms and type constraints without explicit AST parsing
vs others: Outperforms GPT-4 Turbo and Claude 3.5 Sonnet on SWE-bench, with better handling of multi-file edits and complex refactoring scenarios due to improved reasoning about code dependencies