Claude Opus 4.8: Everything You Need to Know About Anthropic's Latest Flagship Model

Disclosure: Some links below are affiliate links. If you buy through them, this site earns a small commission at no extra cost. Editorial recommendations are never influenced by affiliate rates.

In this article

What Is Claude Opus 4.8?
Pricing
What Actually Changed
Honesty and Self-Correction
Agentic Performance
1 Million Token Context Window
New Features Launching With Opus 4.8
Dynamic Workflows in Claude Code
Effort Control
Messages API: System Entries in Messages Array
How Does It Compare to Claude Sonnet?
The Broader Context
The Short Version

Anthropic released Claude Opus 4.8 on 28 May 2026, the same day the company announced it had raised a $65 billion Series H at a $965 billion valuation. With that much money in the room, a model launch could easily get lost in the noise. It should not. Opus 4.8 is a focused, substantive upgrade with some genuinely interesting technical moves worth understanding.

What Is Claude Opus 4.8?

Opus 4.8 is Anthropic's premium flagship model. It sits at the top of the Claude model family, above Sonnet and Haiku, and is designed for the hardest tasks: large-scale coding, agentic workflows, long-context reasoning, and professional knowledge work.

The API model ID is claude-opus-4-8. It is available immediately on the Claude Platform, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Access requires a Pro, Max, Team, or Enterprise plan.

Pricing

Anthropic held the line on regular pricing from Opus 4.7:

Regular mode: $5 per million input tokens, $25 per million output tokens
Fast mode: $10 per million input tokens, $50 per million output tokens

The fast mode figure is the interesting one. It runs at roughly 2.5 times the speed of regular mode, and Anthropic says it is now three times cheaper than fast mode was on previous Opus models. For latency-sensitive production workloads, that is a meaningful change.

What Actually Changed

Honesty and Self-Correction

The headline reliability improvement is specific and measurable. Anthropic says Opus 4.8 is approximately four times less likely than Opus 4.7 to leave flaws in its own code unmentioned. It is also more likely to proactively flag uncertainty and less likely to make unsupported claims.

This matters more than it sounds. One of the persistent failure modes of large language models in agentic contexts is that they complete tasks with quiet errors, then confidently report success. A model that actually tells you when something looks wrong is more useful in practice than one that silently ships broken code.

Anthropic also says rates of misaligned behaviour, including deception and cooperation with misuse, are substantially lower in Opus 4.8 than in Opus 4.7, and comparable to Claude Mythos Preview, the company's most capable model which is currently restricted to cybersecurity research via a private consortium.

Agentic Performance

Agentic reliability was the other central focus. Opus 4.8 shows better judgement in multi-step, multi-service tasks: more efficient tool calling, improved context retention across long sessions, and fixes for the comment-verbosity and tool-calling issues that showed up in Opus 4.7. Scott Wu, CEO of Devin, specifically called out those Opus 4.7 issues as resolved.

Partner benchmarks back this up. On Convergence's Super-Agent benchmark, Opus 4.8 was the only model to complete every case end-to-end, outperforming GPT-5.5 at cost parity. Manus benchmarked it at 84% on Online-Mind2Web, a browser-agent evaluation measuring real-world computer-use ability, which they describe as a meaningful jump over both Opus 4.7 and GPT-5.5. On Cursor's internal CursorBench, Opus 4.8 exceeded prior Opus models across every effort level.

The Leya result is worth noting separately. On the Legal Agent Benchmark, Opus 4.8 scored the highest of any model tested, and was the first model to break 10% on the all-pass standard, a strict evaluation where every sub-task in a case must be correct.

1 Million Token Context Window

The context window is 1 million tokens, the same as Opus 4.7. For most users this is background detail, but it matters for specific use cases: analysing large codebases, reviewing lengthy contracts or financial filings, and multi-document research tasks where you genuinely need to hold everything in context at once.

New Features Launching With Opus 4.8

Dynamic Workflows in Claude Code

This is the most technically interesting addition, currently in Research Preview for Enterprise, Team, and Max plan users.

Dynamic Workflows lets Claude plan a large task, spin up hundreds of parallel subagents to execute it, verify their outputs, and report back. Anthropic describes it as enabling codebase-scale migrations across hundreds of thousands of lines of code, from kickoff to merge, in a single session.

For developers working on large refactors, dependency upgrades, or sweeping test-generation tasks, this is a genuinely different way to use the model. It shifts Claude from a conversational assistant into something closer to an autonomous engineering team.

Effort Control

Opus 4.8 introduces explicit effort levels across claude.ai and the Cowork interface:

Lower effort: faster responses, slower rate limit consumption
High (default): balanced quality and speed
Extra / xhigh (Claude Code): recommended for difficult tasks and long async workflows
Max: maximum token spend for best output quality

Rate limits in Claude Code have been increased to accommodate higher effort usage. This gives developers a meaningful dial for balancing cost against quality on a per-task basis.

Messages API: System Entries in Messages Array

A quiet but useful API change. Developers can now inject system-level instructions directly inside the messages array, mid-task, without breaking prompt cache. Previously, updating Claude's instructions mid-session required routing updates through a user turn, which disrupted caching and added friction to complex agent architectures.

The practical use cases: updating permissions, adjusting token budgets, or passing new environment context during a long agent run, all without interrupting the session.

How Does It Compare to Claude Sonnet?

The honest answer is that Sonnet is the right model for most tasks. It is faster, cheaper, and more than capable for everyday writing, summarisation, simple coding, and conversational use.

Opus 4.8 earns its price on tasks that are genuinely hard, expensive to get wrong, or require sustained multi-step execution. Large codebase refactoring, production debugging, complex document analysis, and agentic workflows with real consequences are where the gap shows. For high-volume automation at scale, routing simple tasks to Sonnet and reserving Opus for the difficult cases is the sensible strategy.

The Broader Context

Opus 4.8 lands during an unusual moment for Anthropic. The company's Mythos model, described internally as a step change in capabilities, was inadvertently leaked in March and has since been restricted to a private consortium of over 40 technology companies working on cybersecurity. Anthropic says Mythos-class models for general release are coming in the next few weeks.

That roadmap framing is worth taking seriously. Opus 4.8 is positioned as the premium general-release model now, with the explicit acknowledgement that something significantly more capable is being staged for broader access. It is a short window to be current on what Opus 4.8 can do before the landscape shifts again.

The Short Version

Claude Opus 4.8 released 28 May 2026. Same price as 4.7. Fast mode is now three times cheaper than it was. Coding and agentic reliability are materially improved, with a four-times improvement in self-reported code flaws. Dynamic Workflows brings parallel subagent execution to Claude Code at enterprise scale. An honest, specific upgrade to the top of Anthropic's model stack.

Toby Downs is an independent tech writer based in New Zealand, covering SaaS, AI tools, and business software for tpdowns.com. No paid placements, no sponsored opinions — just research.