Claude Opus 4.6
Waqar's Verdict: Opus 4.5 Was a Beast. Then 4.6 Showed Up.
Anthropic's most capable model. Record-breaking benchmarks.
1M token context. Adaptive thinking. Built for the hardest problems.
Record-Breaking Benchmarks
Opus 4.6 leads every major frontier benchmark. On ARC AGI 2 — problems easy for humans, hard for AI — it nearly doubles its predecessor's score.

- ARC AGI 2 (68.8%): +83% improvement over Opus 4.5. Problems easy for humans, hard for AI.
- Terminal-Bench 2.0 (65.4%): Agentic coding evaluation. Real-world software engineering tasks.
- OSWorld (72.7%): Agentic computer use benchmark. GUI automation at scale.
- MRCR v2 (76% at 1M): Needle-in-a-haystack retrieval at 1M tokens. 4x improvement.
- GDPval-AA (~70% win rate vs GPT-5.2): Economically valuable knowledge work tasks across finance, legal, and enterprise domains.
- BigLaw Bench (90.2%): 40% perfect scores. 84% scoring above 0.8. Highest Claude score ever.
- Life Sciences (~2x): ~2x improvement over Opus 4.5 across life science disciplines.
What's New in Opus 4.6
Four major upgrades that change how you work with AI. Each designed for the hardest, most valuable tasks.
Adaptive Thinking
Claude now dynamically decides when and how deeply to reason. No more manual `budget_tokens`. Four effort levels — low, medium, high, max — let you balance intelligence, speed, and cost.
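As a sketch of what selecting an effort level could look like, the snippet below builds a request body with a hypothetical `effort` field; the field name, its position in the payload, and the validation are illustrative assumptions, not the confirmed API schema.

```python
# Sketch of a Messages API request body selecting an adaptive-thinking
# effort level. The "effort" field name and allowed values are assumptions
# based on the four levels described above, not a confirmed API schema.

EFFORT_LEVELS = ("low", "medium", "high", "max")

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a request body; raises on an unknown effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "effort": effort,  # replaces a manually tuned thinking budget
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Prove the claim step by step.", effort="max")
print(req["effort"])  # -> max
```

The point of the shape: where Extended Thinking required picking a token budget per call, a single categorical knob is the whole tuning surface.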
1M Token Context
Process entire codebases, legal documents, or research papers in a single prompt. 5x the previous 200K limit, with 76% accuracy on needle-in-a-haystack retrieval.
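Before sending a whole codebase in one prompt, it helps to sanity-check that it fits. This sketch uses the rough ~4-characters-per-token heuristic (an assumption, not a real tokenizer) to estimate whether a set of documents plus an output budget fits in the 1M window.

```python
# Rough feasibility check before sending a whole codebase in one prompt.
# The ~4 characters-per-token ratio is a common heuristic, not an exact
# tokenizer; use the API's token-counting support for real numbers.

CONTEXT_LIMIT = 1_000_000  # Opus 4.6 1M-token window (beta)

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(docs: list[str], reserve_output: int = 128_000) -> bool:
    """True if the concatenated docs plus an output budget fit in 1M tokens."""
    total = sum(estimate_tokens(d) for d in docs)
    return total + reserve_output <= CONTEXT_LIMIT

# ~750K estimated input tokens + 128K reserved output still fits:
print(fits_in_context(["x" * 3_000_000]))  # -> True
```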
Context Compaction
Automatic summarization of older conversational tokens. Long-running tasks no longer hit context limits — Claude compresses what it no longer needs in detail.
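The feature itself runs server-side; as a rough client-side analogue, the sketch below collapses all but the most recent turns into a placeholder summary stub. The `summarize` helper is hypothetical and would, in practice, ask the model for a real summary.

```python
# Client-side approximation of context compaction: keep the newest turns
# verbatim and collapse older ones into a short summary stub. This only
# shows the shape of the idea; the real feature compacts server-side.

def summarize(turns: list[dict]) -> str:
    # Placeholder: a real implementation would have the model summarize.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history: list[dict], keep_last: int = 4) -> list[dict]:
    """Compress all but the last `keep_last` messages into one summary."""
    if len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    stub = {"role": "user", "content": summarize(older)}
    return [stub] + recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
print(len(compact(history)))  # -> 5 (one stub + four recent turns)
```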
Agent Teams
Multiple AI agents work simultaneously on different aspects of a coding project, coordinating autonomously. Ship features faster with parallel agentic workflows.
Opus 4.5 vs Opus 4.6
A side-by-side look at what changed. Same price, dramatically more capability.
| Specification | Opus 4.5 (Nov 2025) | Opus 4.6 (Feb 2026) |
|---|---|---|
| Context Window | 200K tokens | 1M tokens (5x) |
| Max Output | 128K tokens | 128K tokens |
| Thinking Mode | Extended Thinking | Adaptive Thinking (new) |
| ARC AGI 2 | 37.6% | 68.8% (+83%) |
| Terminal-Bench 2.0 | 59.8% | 65.4% (+9.4%) |
| OSWorld | 66.3% | 72.7% (+9.7%) |
| BigLaw Bench | — | 90.2% (new) |
| MRCR v2 (1M) | — | 76% (new) |
| SWE-bench Verified | 80.9% | 80.8% |
| Life Sciences | Baseline | ~2x improvement |
| Agent Teams | No | Yes (new) |
| Context Compaction | No | Yes, beta (new) |
| Input Pricing | $5 / 1M tokens | $5 / 1M tokens |
| Output Pricing | $25 / 1M tokens | $25 / 1M tokens |
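At these rates, per-request cost is simple arithmetic. The sketch below prices a single request at the list rates above; if the premium 1M-context tier carries different rates, they are not reflected here.

```python
# Cost of a single request at Opus 4.6 list prices:
# $5 per million input tokens, $25 per million output tokens.

INPUT_PER_M = 5.00
OUTPUT_PER_M = 25.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at standard list rates."""
    return (input_tokens * INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000

# A full 200K-token context in, an 8K-token answer out:
print(f"${request_cost(200_000, 8_000):.2f}")  # -> $1.20
```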
More Capability. Same Price.
Dramatically improved performance with no price increase. Premium 1M context available for long-form tasks.
Standard
Up to 200K context
- 200K token context window
- 128K max output tokens
- Adaptive thinking included
- All standard features
1M Context
Beta — up to 1M tokens
- 1M token context window
- 128K max output tokens
- 76% MRCR v2 accuracy
- Context compaction (beta)
- Ideal for codebases & legal docs
Built for the Hardest Work
Enterprise Knowledge Work
190 Elo points above Opus 4.5 on GDPval-AA. Finance, legal analysis, and complex business reasoning at scale.
Agentic Coding
Highest Terminal-Bench 2.0 score in the industry. Build, debug, and ship production code with agent teams working in parallel.
Scientific Research
2x improvement in computational biology, structural biology, and organic chemistry. Process entire research papers in a single context.
Legal Analysis
90.2% on BigLaw Bench. 40% perfect scores. Review contracts, case law, and regulatory documents with unmatched precision.
Computer Use
72.7% on OSWorld — the best computer-using model available. Automate GUI workflows, test applications, and interact with desktop environments.
Deep Research
Industry-leading BrowseComp and DeepSearchQA scores. Multi-step agentic search for hard-to-find information across the web.
Start Building with Opus 4.6
Available now on claude.ai, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry.
Model ID: `claude-opus-4-6`
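A minimal first request, assuming an `ANTHROPIC_API_KEY` environment variable; the endpoint, headers, and version string follow Anthropic's public Messages API, and the model id is the one above.

```shell
# Minimal Messages API request via curl. Requires ANTHROPIC_API_KEY
# to be set in the environment.
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello, Opus 4.6"}]
  }'
```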