New Release February 5, 2026

Claude Opus 4.6

Waqar's Verdict: Opus 4.5 Was a Beast. Then 4.6 Showed Up.

Anthropic's most capable model. Record-breaking benchmarks.
1M token context. Adaptive thinking. Built for the hardest problems.

0% ARC AGI 2
0M Token Context
0% Terminal-Bench 2.0
0% OSWorld
Scroll to explore
Performance

Record-Breaking
Benchmarks

Opus 4.6 leads every major frontier benchmark. On ARC AGI 2 — problems easy for humans, hard for AI — it nearly doubles its predecessor.

Terminal-Bench 2.0 Industry Leading
Opus 4.6
65.4%
Opus 4.5
59.8%

Agentic coding evaluation. Real-world software engineering tasks.

OSWorld Best Computer Use
Opus 4.6
72.7%
Opus 4.5
66.3%

Agentic computer use benchmark. GUI automation at scale.

MRCR v2 1M Context
Opus 4.6
76%
Sonnet 4.5
18.5%

Needle-in-a-haystack retrieval at 1M tokens. 4x improvement.

GDPval-AA — Knowledge Work Enterprise Value
Opus 4.6
GPT-5.2
-144 Elo
Opus 4.5
-190 Elo

Economically valuable knowledge work tasks across finance, legal, and enterprise domains. ~70% win rate vs GPT-5.2.

BigLaw Bench Legal AI
90.2 % score

40% perfect scores. 84% scoring above 0.8. Highest Claude score ever.

Life Sciences 2x Better
2x
Computational Biology Structural Biology Organic Chemistry Phylogenetics

~2x improvement over Opus 4.5 across life science disciplines.

Capabilities

What's New in
Opus 4.6

Four major upgrades that change how you work with AI. Each designed for the hardest, most valuable tasks.

Adaptive Thinking

Claude now dynamically decides when and how deeply to reason. No more manual budget_tokens. Four effort levels — low, medium, high, max — let you balance intelligence, speed, and cost.

Low Fast responses
Medium Balanced
High Default
Max Full power

1M Token Context

Process entire codebases, legal documents, or research papers in a single prompt. 5x the previous 200K limit, with 76% accuracy on needle-in-a-haystack retrieval.

200K
1M tokens

Context Compaction

Automatic summarization of older conversational tokens. Long-running tasks no longer hit context limits — Claude compresses what it no longer needs in detail.

Agent Teams

Multiple AI agents work simultaneously on different aspects of a coding project, coordinating autonomously. Ship features faster with parallel agentic workflows.

Comparison

Opus 4.5 vs Opus 4.6

A side-by-side look at what changed. Same price, dramatically more capability.

Specification Opus 4.5 Nov 2025 Opus 4.6 Feb 2026
Context Window 200K tokens 1M tokens 5x
Max Output 128K tokens 128K tokens
Thinking Mode Extended Thinking Adaptive Thinking New
ARC AGI 2 37.6% 68.8% +83%
Terminal-Bench 2.0 59.8% 65.4% +9.4%
OSWorld 66.3% 72.7% +9.7%
BigLaw Bench 90.2% New
MRCR v2 (1M) 76% New
SWE-bench Verified 80.9% 80.8%
Life Sciences Baseline ~2x improvement 2x
Agent Teams No Yes New
Context Compaction No Yes (beta) New
Input Pricing $5 / 1M tokens $5 / 1M tokens
Output Pricing $25 / 1M tokens $25 / 1M tokens
Pricing

More Capability.
Same Price.

Dramatically improved performance with no price increase. Premium 1M context available for long-form tasks.

Standard

Up to 200K context

$5 / 1M input tokens
$25 / 1M output tokens
  • 200K token context window
  • 128K max output tokens
  • Adaptive thinking included
  • All standard features
Get Started
Use Cases

Built for the
Hardest Work

01

Enterprise Knowledge Work

190 Elo points above Opus 4.5 on GDPval-AA. Finance, legal analysis, and complex business reasoning at scale.

02

Agentic Coding

Highest Terminal-Bench 2.0 score in the industry. Build, debug, and ship production code with agent teams working in parallel.

03

Scientific Research

2x improvement in computational biology, structural biology, organic chemistry. Process entire research papers in a single context.

04

Legal Analysis

90.2% on BigLaw Bench. 40% perfect scores. Review contracts, case law, and regulatory documents with unmatched precision.

05

Computer Use

72.7% on OSWorld — the best computer-using model available. Automate GUI workflows, test applications, and interact with desktop environments.

06

Deep Research

Industry-leading BrowseComp and DeepSearchQA scores. Multi-step agentic search for hard-to-find information across the web.

Start Building with
Opus 4.6

Available now on claude.ai, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure Foundry.

claude-opus-4-6