Claude 4.1 Opus Released: Enhanced Programming Capabilities and Future Improvements Ahead


On August 5, 2025, Anthropic officially launched the latest upgrade of its flagship AI model series, Claude 4.1 Opus. This release comes just three months after the previous model, Claude 4 Opus, and Anthropic claims that the new model has made significant improvements in programming, agentic tasks, and reasoning abilities.

The timing of this release is notable: it coincides with OpenAI’s launch of open-weight reasoning models, its first open model release since 2019, and the industry widely expects GPT-5 to debut later this month. In response to the upcoming competition, Anthropic’s Chief Product Officer, Mike Krieger, stated that this release reflects a shift in the company’s strategy. “In the rapidly evolving AI landscape, we should focus on existing products rather than only releasing truly significant upgrades,” Krieger told Bloomberg.

According to Anthropic’s official introduction, Claude 4.1 Opus is not a revolutionary generational leap but an important upgrade based on Claude 4. Its core improvements focus on three areas: programming capabilities in real-world scenarios, the ability to autonomously execute complex tasks, and enhanced logical reasoning. The new model is available to all paid Claude users and to subscribers of Claude Code (a vertical product focused on programming assistance). It is also accessible via Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI platform.

In terms of pricing, Claude 4.1 Opus maintains the same structure as its predecessor, with input tokens priced at $15 per million and output tokens at $75 per million, making it one of the most expensive AI models on the market.
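
To make the pricing concrete, here is a minimal sketch of how a per-request cost could be estimated from the two rates quoted above. The function name and the example token counts are hypothetical, chosen only for illustration; real bills may also depend on features not covered in this article.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 15.0
OUTPUT_USD_PER_MILLION = 75.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 4,000 output tokens.
# Input costs $0.30 and output costs $0.30, so this prints $0.60.
print(f"${estimate_cost_usd(20_000, 4_000):.2f}")
```

Because output tokens cost five times as much as input tokens, the 4,000 output tokens in this example cost as much as the 20,000 input tokens.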

The most significant update is its enhanced programming capability. Anthropic reported that Claude 4.1 Opus scored 74.5% on the software engineering benchmark SWE-bench Verified, up from 72.5% for the previous model, Opus 4, and ahead of OpenAI’s latest o3 model (69.1%) and Google’s Gemini 2.5 Pro (67.2%). On the Terminal-Bench programming test, the new model scored 43.3%, up from Opus 4’s 39.2% and well above OpenAI o3’s 30.2% and Gemini 2.5 Pro’s 25.3%.
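
The gaps behind these figures can be tabulated with a short sketch like the one below. It uses only the scores quoted above; the dictionary layout and the `gaps` helper are assumptions made for illustration, and the differences are in percentage points, not relative improvement.

```python
# Scores (percent) as reported in the article.
scores = {
    "SWE-bench Verified": {
        "Claude 4.1 Opus": 74.5, "Opus 4": 72.5, "o3": 69.1, "Gemini 2.5 Pro": 67.2,
    },
    "Terminal-Bench": {
        "Claude 4.1 Opus": 43.3, "Opus 4": 39.2, "o3": 30.2, "Gemini 2.5 Pro": 25.3,
    },
}


def gaps(benchmark: str, reference: str = "Claude 4.1 Opus") -> dict[str, float]:
    """Return the reference model's lead over every other model, in percentage points."""
    row = scores[benchmark]
    return {
        model: round(row[reference] - score, 1)
        for model, score in row.items()
        if model != reference
    }


for name in scores:
    print(name, gaps(name))
```

On these numbers, the lead over the predecessor is 2.0 points on SWE-bench Verified and 4.1 points on Terminal-Bench, while the lead over competitors is considerably wider on Terminal-Bench.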

Image: Benchmark results for Claude 4.1 Opus (Source: Anthropic)

GitHub noted that Claude 4.1 Opus shows “especially significant performance improvements” in complex tasks such as multi-file code refactoring. Japanese e-commerce giant Rakuten Group reported that the new model can accurately identify and fix issues in large codebases without introducing unnecessary changes or new errors, a precision critical for everyday debugging tasks.

The programming application Windsurf, acquired by Cognition, also provided positive feedback, reporting a one-standard-deviation improvement over Opus 4 on its internal junior developer benchmark, a leap roughly comparable to the upgrade from Sonnet 3.7 to Sonnet 4.

In terms of safety, Claude 4.1 Opus continues to operate under the ASL-3 (AI Safety Level 3) framework, the strictest safety standard applied by Anthropic to date. In harmlessness testing, the new model’s refusal rate for policy-violating requests improved from 97.27% for Opus 4 to 98.76%, demonstrating stronger safety controls.

However, in other general capability benchmarks, Claude 4.1 Opus’s advantages are not as pronounced as in programming. For instance, in the GPQA Diamond test assessing graduate-level reasoning abilities, its score (80.9%) remains on par with its predecessor but lags behind Gemini 2.5 Pro’s 86.4% and OpenAI o3’s 83.3%. In high school math competitions (AIME) and visual reasoning (MMMU) tests, it has shown mixed results against competitors, lacking absolute dominance. This suggests that the release of Claude 4.1 Opus is a highly focused upgrade with clear strategic goals, primarily aimed at strengthening its moat in the lucrative AI programming market.

Reports indicate that Anthropic’s annual recurring revenue (ARR) has jumped from $1 billion to nearly $5 billion in just seven months, driven largely by its established technological barriers and business ecosystem in the AI programming field. Besides API revenue, Anthropic is actively diversifying its products to build a more robust revenue structure. Its direct-to-developer Claude Code subscription service has performed strongly, with annualized revenue nearing $400 million after doubling in recent weeks.

Image: ARR comparison between OpenAI and Anthropic (Source: X)

This outstanding business performance also provides solid backing for the company’s ongoing massive financing efforts. Coinciding with this release, Anthropic is nearing the completion of a significant funding round. According to The Information, the company plans to raise up to $5 billion in a new round led by Iconiq Capital, potentially valuing it at $170 billion, nearly tripling its valuation from $61.5 billion in March of this year.

This would make Anthropic one of the most valuable unicorns globally, second only to OpenAI and SpaceX, and provide ample ammunition for its next phase of competition.

In its statement, Anthropic indicated plans to release “more substantial model improvements” in the coming weeks, hinting at larger technological advances to come and, in all likelihood, a direct strategic response to the impending GPT-5. The next major showdown in the AI field is fast approaching.
