Billed at its May 2025 release as the world's best coding model, leading SWE-bench at 72.5% and Terminal-bench at 43.2%. Designed for sustained performance on complex, long-running tasks that require thousands of steps and several hours of continuous work.
- Sustained performance on complex tasks requiring thousands of steps
- Continuous operation for several hours on long-running agent workflows
- Leading performance on SWE-bench (72.5%) and Terminal-bench (43.2%) coding benchmarks
- Enhanced memory capabilities with improved file access and tacit knowledge building
- Frontier agent performance powering complex codebase understanding and autonomous development
Web Search
Search the web for current info.
Extended Thinking
Enhanced reasoning for complex tasks.
State-of-the-art coding model released September 2025, achieving 77.2% on SWE-bench Verified (82% with high compute). Sustains autonomous multi-step work for 30+ hours, with enhanced tool coordination, context management, and computer use capabilities. Described at release as the most aligned frontier model, with improved security against prompt injection.
Fast, cost-efficient model released October 2025, achieving 73.3% on SWE-bench Verified. Delivers performance comparable to Claude Sonnet 4 at one-third the cost and more than twice the speed. First Haiku model with extended thinking, computer use, and context awareness capabilities. ASL-2 safety classification with lower misalignment rates than larger models.
Premium model released November 2025, combining maximum intelligence with practical performance. Features a 200K-token context window with a 64K-token maximum output, extended thinking support, and priority tier access. Knowledge cutoff of March 2025, with training data extending through August 2025. Positioned as the best balance of intelligence and efficiency for complex reasoning tasks.
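The context-window and output figures above imply a simple budgeting rule, which can be sketched in a few lines. This is a minimal illustration, not an official API: the constants come from the description, the function names `max_input_tokens` and `fits` are hypothetical, and the sketch assumes that the reserved output counts against the same 200K window as the prompt.

```python
# Limits quoted in the description above. Treating output tokens as part of
# the context window is an assumption of this sketch.
CONTEXT_WINDOW_TOKENS = 200_000
MAX_OUTPUT_TOKENS = 64_000


def max_input_tokens(requested_output: int) -> int:
    """Return the input budget left after reserving space for the response."""
    if not 0 < requested_output <= MAX_OUTPUT_TOKENS:
        raise ValueError(
            f"requested_output must be between 1 and {MAX_OUTPUT_TOKENS}"
        )
    return CONTEXT_WINDOW_TOKENS - requested_output


def fits(prompt_tokens: int, requested_output: int) -> bool:
    """Check whether a prompt plus its reserved output fits in the window."""
    return 0 <= prompt_tokens <= max_input_tokens(requested_output)
```

Under these assumptions, reserving the full 64K output leaves 136,000 tokens for the prompt, so a long-running task may need to trim or summarize earlier context before each request.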