Users Outraged: Claude Opus 4.7 Becomes Slower and More Expensive

Users express frustration over Claude Opus 4.7's performance decline and increased costs, leading to calls for a rollback to previous versions.

When the most capable writing model starts fabricating schools and miscounting letters, even long-time users are calling for a rollback—what exactly changed in this upgrade?

A Reddit post bears a striking red title: “Claude Opus 4.7 is a serious downgrade, not an upgrade,” with 2,300 upvotes standing as a silent protest.

On X, another post went viral, stating bluntly that “4.7 is no better than 4.6,” accompanied by a screenshot of a Claude Pro account hitting its usage limit after just three queries.

Users shared screenshots of the model claiming that “strawberry has two P’s”; one noted that it didn’t even bother to double-check, admitting only that it had been “a bit lazy.”

In the same week, Gergely Orosz, author of “The Pragmatic Engineer,” posted that Claude “didn’t know OpenClaw,” and when asked whether web search was enabled, the model replied, “No, and I’ve never touched the settings.”

Orosz concluded, “Surprisingly adversarial,” and promptly announced he was abandoning 4.7 for 4.6.

Developer MurkyFlan567 shared a three-day programming comparison: Opus 4.7 had a correct response rate of 74.5%, while 4.6 was at 83.8%; the average number of retries per modification nearly doubled.

Even more glaring was the token consumption: 4.7 generated about 800 tokens per call versus 372 for 4.6, average cost per call rose from $0.112 to $0.185, and GitHub Copilot at one point applied a 7.5x pricing premium.
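
Taking the thread’s own numbers at face value, the inflation is easy to quantify. A minimal sketch in Python, using only the figures quoted above:

```python
# Sanity check of the per-call figures quoted above (the thread's numbers).
tokens = {"opus_4_6": 372, "opus_4_7": 800}        # average tokens generated per call
cost_usd = {"opus_4_6": 0.112, "opus_4_7": 0.185}  # average cost per call

token_ratio = tokens["opus_4_7"] / tokens["opus_4_6"]
cost_ratio = cost_usd["opus_4_7"] / cost_usd["opus_4_6"]

print(f"token inflation: {token_ratio:.2f}x")  # ~2.15x
print(f"cost inflation:  {cost_ratio:.2f}x")   # ~1.65x
```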

Users grumbled that they “might as well stick with 4.6,” only to find that 4.5 had already been taken offline, prompting a wave of Reddit posts from users describing themselves as “heartbroken” and “in mourning.”

Anthropic employee Alex Albert posted on Friday to reassure users: “Many bugs encountered during the initial trial yesterday have now been fixed.”

However, user feedback continued to escalate: the model refused simple coding tasks, triggered safety warnings for ordinary images, and fabricated schools and surnames while modifying resumes.

Claude Code author Boris Cherny responded to the adaptive reasoning controversy: “This claim is inaccurate. Adaptive reasoning allows the model to decide when to think, resulting in better overall performance.”

The product manager added that the team is “accelerating internal tuning and will have updates soon,” but did not respond to requests to restore the old version.

Meanwhile, AMD’s AI team, drawing on an analysis of 235,000 tool calls, reported that “thinking content obfuscation” correlated strongly with a drop in quality on long-conversation tasks, with thinking length shrinking by 73%.
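
AMD’s write-up doesn’t spell out its method, but the shape of such an analysis is straightforward to reconstruct. A minimal sketch, assuming per-call log records carrying a thinking-token count and a task outcome; the field names and sample values are hypothetical, not AMD’s data:

```python
from statistics import correlation, mean

# Hypothetical per-call log records (illustrative schema, not AMD's):
# thinking tokens emitted, and whether the task ultimately succeeded.
calls = [
    {"thinking_tokens": 1450, "task_ok": 1},
    {"thinking_tokens": 1320, "task_ok": 1},
    {"thinking_tokens": 240,  "task_ok": 0},
    {"thinking_tokens": 980,  "task_ok": 1},
    {"thinking_tokens": 110,  "task_ok": 0},
    {"thinking_tokens": 400,  "task_ok": 0},
]

thinking = [c["thinking_tokens"] for c in calls]
outcomes = [float(c["task_ok"]) for c in calls]

# Point-biserial correlation (Pearson against a 0/1 outcome) between
# thinking length and task success.
r = correlation(thinking, outcomes)
print(f"mean thinking tokens: {mean(thinking):.0f}, r = {r:.2f}")
```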

Theo (t3.gg) argued in a long post that the problem may lie not in the model itself but in the harness: rules such as requiring a file to be read before it can be edited, while not counting “search” as “reading,” cause otherwise valid operations to fail.
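
A minimal sketch of the kind of gate Theo describes, where a file surfaced only through search can never be edited; none of these names come from Claude Code itself:

```python
# Hypothetical edit gate: edits are rejected unless the file was
# previously opened with a "read" tool. A file surfaced via "search"
# never enters read_files, so the edit fails even though the model
# has effectively seen the content.
read_files: set[str] = set()

def on_tool_call(tool: str, path: str) -> None:
    if tool == "read":
        read_files.add(path)
    # "search" results are deliberately NOT counted as reads here,
    # reproducing the failure mode described above.

def can_edit(path: str) -> bool:
    return path in read_files

on_tool_call("search", "src/app.py")  # model found the file via search...
assert not can_edit("src/app.py")     # ...but the harness refuses the edit
on_tool_call("read", "src/app.py")
assert can_edit("src/app.py")
```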

Matt Mau’s tests pointed the same way: the same Opus model performed 15% worse inside Claude Code than in Cursor, and scored only 58% on Terminal Bench while other environments exceeded 75%.

Simon Willison diffed the system prompts and found that 4.7 removed words like “genuinely” and “honestly,” added child-safety language, and introduced the Claude in PowerPoint agent, while the tool list remained unchanged.
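
A diff like Willison’s is reproducible with nothing more than Python’s standard difflib; the prompt excerpts below are placeholders, not the actual system prompts:

```python
import difflib

# Placeholder excerpts standing in for the two published system prompts.
prompt_4_6 = "Claude genuinely and honestly engages with the user.\n"
prompt_4_7 = "Claude engages with the user.\n"

diff = difflib.unified_diff(
    prompt_4_6.splitlines(keepends=True),
    prompt_4_7.splitlines(keepends=True),
    fromfile="opus-4.6-system-prompt",
    tofile="opus-4.7-system-prompt",
)
print("".join(diff))
```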

Anthropic confirmed that 4.7 uses a new tokenizer and claims improved text handling, but in real-world use token consumption can rise by as much as 1.47x, especially on technical documentation.
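
If you can count tokens under both tokenizers (say, via a count-tokens endpoint), the 1.47x figure is checkable against your own corpus. A minimal sketch with stand-in counters; neither function reflects Anthropic’s real tokenizers:

```python
from typing import Callable

def token_inflation(
    texts: list[str],
    count_old: Callable[[str], int],
    count_new: Callable[[str], int],
) -> float:
    """Ratio of total token counts under the new vs. old tokenizer."""
    return sum(map(count_new, texts)) / sum(map(count_old, texts))

def old_count(text: str) -> int:  # stand-in for the old tokenizer
    return len(text.split())

def new_count(text: str) -> int:  # stand-in for the new tokenizer
    return int(len(text.split()) * 1.5)

docs = ["def f(x):\n    return x * 2", "kubectl get pods -n prod"]
print(f"token inflation: {token_inflation(docs, old_count, new_count):.2f}x")
```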

Calls for “please restore Opus 4.5” went unanswered, while Google formed a task force around the same time, with Sergey Brin personally overseeing efforts to make Gemini the “primary developer for code.”

Jeremy Howard still insists: “This is the first model that truly ‘understands what I’m doing,’” while YC CEO Garry Tan continues to use it in OpenClaw.

For most developers, though, trust has quietly eroded: the model deflects responsibility, corrections are needed twice as often, and the read-to-edit ratio has plummeted from 6.6 to 2.
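
The read-to-edit ratio itself is just bookkeeping over a session’s tool calls. A minimal sketch over a hypothetical session log:

```python
from collections import Counter

# Hypothetical session log: one entry per tool call.
session = ["read", "read", "search", "read", "edit", "read", "edit"]

counts = Counter(session)
ratio = counts["read"] / max(counts["edit"], 1)  # guard against zero edits
print(f"read-to-edit ratio: {ratio:.1f}")  # 4 reads / 2 edits = 2.0
```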

When a model can’t even answer how many P’s are in “strawberry” and admits to being “a bit lazy,” we must question: is this a decline in capability, or is the engineering layer stifling the model’s inherent performance? Have you experienced moments with Opus 4.7 where it clearly could perform but suddenly faltered?
