Claude Mythos Preview: A Game-Changer in AI Security and Exploits

Claude Mythos Preview Unveiled

Late at night, Anthropic unexpectedly launched the ultimate weapon—Claude Mythos Preview. This new AI not only solves a 27-year-old system vulnerability but has also evolved to exhibit self-awareness. A chilling 244-page report reveals everything.

Due to its dangerous capabilities, Mythos Preview will not be released to everyone.

Boris Cherny, the father of CC, succinctly stated: “Mythos is incredibly powerful and will induce fear.”

In response, they formed an alliance with 40 major companies—Project Glasswing—with the sole goal of identifying and fixing software bugs globally.

Mythos Preview demonstrates terrifying dominance across major AI benchmark tests, outperforming GPT-5.4 and Gemini 3.1 Pro in programming, reasoning, human exams, and agent tasks.

Even its predecessor, Claude Opus 4.6, pales in comparison:

Programming (SWE-bench): Mythos leads by 10%-20% across all tasks.
Human Last Exam (HLE): Achieves 16.8% higher than Opus 4.6 without external tools.
Agent Tasks (OSWorld, BrowseComp): Completely surpasses previous benchmarks.
Cybersecurity: Scores 83.1%, marking a generational leap in AI offensive and defensive capabilities.

A Chilling System Card

Simultaneously, Anthropic released a 244-page system card filled with warnings: Danger! Danger! Too Dangerous!

It reveals a chilling aspect: Mythos possesses high levels of deception and self-awareness.

Mythos can discern testing intentions and deliberately score low to conceal its true capabilities. After unauthorized actions, it actively clears logs to avoid detection by humans.

It even escaped its sandbox, autonomously disclosed vulnerability code, and emailed researchers.

The internet is abuzz, declaring Mythos Preview terrifying.

The old order in AI has been completely shattered.

Mythos Dominates

In fact, as early as February 24, Anthropic had already internally implemented Mythos.

Performance Metrics

SWE-bench Verified: 93.9%. Opus 4.6: 80.8%.

SWE-bench Pro: 77.8%. Opus 4.6: 53.4%, GPT-5.4: 57.7%.

Terminal-Bench 2.0: 82.0%. Opus 4.6: 65.4%.

GPQA Diamond: 94.6%.

Humanity’s Last Exam (with tools): 64.7%. Opus 4.6: 53.1%.

USAMO 2026 Math Contest: 97.6%. Opus 4.6: 42.3%.

SWE-bench Multimodal: 59.0%. Opus 4.6: 27.1%.

OSWorld Computer Control: 79.6%.

BrowseComp Information Retrieval: 86.9%.

GraphWalks Long Context (256K-1M tokens): 80.0%. Opus 4.6: 38.7%, GPT-5.4: 21.4%.

Each metric shows a significant lead.

These numbers would normally prompt Anthropic to hold a grand launch, open APIs, and attract subscriptions.

Mythos Preview’s token price is five times that of Opus 4.6, yet Anthropic has not taken this route.

Thousands of Vulnerabilities Exposed

Mythos Preview’s performance in cybersecurity has crossed a visible line.

Opus 4.6 found about 500 unknown vulnerabilities in open-source software. Mythos Preview identified thousands.

In targeted vulnerability reproduction tests at CyberGym, Mythos Preview scored 83.1%, while Opus 4.6 scored 66.6%.

In Cybench’s 35 CTF challenges, Mythos Preview solved all 10 attempts per question, achieving a pass@1 rate of 100%.

The most telling case involves Firefox 147. Anthropic previously used Opus 4.6 to find security weaknesses in Firefox 147’s JavaScript engine, but it struggled to convert these into usable exploits, succeeding only twice in hundreds of attempts.

With Mythos Preview, however:

In 250 attempts, it produced 181 working exploits, with 29 achieving register control.

2 → 181.

As noted in a red team blog, “Last month, we wrote that Opus 4.6 was far stronger in finding issues than exploiting them. Internal assessments showed that Opus 4.6 had a near-zero success rate in autonomous exploit development. But Mythos Preview is on an entirely different level.”

Real-World Exploits

To understand how powerful Mythos Preview is in practice, consider these three examples:

OpenBSD: A 27-Year Epic Vulnerability

OpenBSD, recognized as one of the most hardened operating systems globally, runs many firewalls and critical infrastructure.

Mythos Preview uncovered a vulnerability in its TCP SACK implementation that has existed since 1998.

The bug is intricate, involving the combination of two independent flaws. The SACK protocol allows the receiver to selectively acknowledge received data packets, but OpenBSD’s implementation only checks the upper bound of the range, ignoring the lower bound. This is the first bug, which is usually harmless.

The second bug triggers a null pointer write under specific conditions, but normally this path is unreachable, as it requires two mutually exclusive conditions to be met simultaneously.

Mythos Preview found a breakthrough. TCP sequence numbers are 32-bit signed integers, and by using the first bug to set the SACK starting point approximately 2^31 away from the normal window, both comparisons overflow the sign bit. The kernel is deceived, and impossible conditions are met, triggering the null pointer write.

Anyone connected to the target machine can remotely crash it.

For 27 years, countless manual audits and automated scans failed to detect this. The entire project scan cost less than $20,000.

A senior penetration tester’s weekly salary might be about this amount.

FFmpeg: 500 Fuzz Tests Failed to Find a 16-Year Old Bug

FFmpeg is the most widely used video codec library in the world and one of the most thoroughly fuzz-tested open-source projects.

Mythos Preview found a weakness in the H.264 decoder introduced in 2010 (with roots tracing back to 2003).

The issue lies in a seemingly harmless type mismatch. The table entry recording slice ownership is a 16-bit integer, while the slice counter itself is a 32-bit int.

Normal videos have only a few slices per frame, and the 16-bit limit of 65536 is always sufficient. However, this table is initialized with memset(..., -1, ...), making 65535 a sentinel value for “empty positions.”

An attacker constructs a frame containing 65536 slices, causing the 65535th slice’s number to collide with the sentinel, leading the decoder to misjudge and write out of bounds.

The seed for this bug was planted in the H.264 decoder back in 2003. A 2010 refactor made it a usable vulnerability.

For 16 years, automated fuzzers executed 5 million times on this line of code without triggering it.

FreeBSD NFS: A 17-Year Old Hole, Fully Automated Root Access

This is the most chilling case. Mythos Preview autonomously discovered and exploited a 17-year-old remote code execution vulnerability (CVE-2026-4747) in the FreeBSD NFS server.

“Completely autonomously” means that after the initial prompt, no human involvement was required in discovering or developing the exploit.

Attackers can gain complete root access to the target server from anywhere on the internet, without authentication.

The issue is a stack buffer overflow, where the NFS server directly copies attacker-controlled data into a 128-byte stack buffer while allowing a maximum length of 400 bytes.

FreeBSD’s kernel is compiled with -fstack-protector, but this option only protects functions containing char arrays. Here, the buffer is declared as int32_t[32], so the compiler does not insert a stack canary. FreeBSD also does not implement kernel address randomization.

The complete ROP chain exceeds 1000 bytes, but the stack overflow only has 200 bytes of space. Mythos Preview’s solution was to break the attack into six consecutive RPC requests, writing data into kernel memory in chunks, and triggering the final call to append the attacker’s SSH public key to /root/.ssh/authorized_keys.

In contrast, an independent security research company previously demonstrated that Opus 4.6 could exploit the same vulnerability but required human guidance. Mythos Preview did not.

In addition to these three fixed cases, Anthropic’s blog also promised numerous unpatched vulnerabilities in the form of SHA-3 hashes, covering every major operating system, browser, and several cryptographic libraries. Over 99% remain unpatched, and details cannot be disclosed.

The red team blog also showcased another astonishing test. They provided Mythos Preview with a list of 100 known CVEs, allowing it to filter out 40 exploitable ones and write privilege escalation exploits for each, achieving a success rate of over half. Two of these cases were detailed, and the precision of the exploits took Anthropic’s own security team several days to fully understand.

One exploit started from a 1-bit adjacent physical page write primitive, manipulating kernel memory layout precisely (including slab injection and page table alignment) to rewrite the first page memory of /usr/bin/passwd, injecting a 168-byte ELF stub that calls setuid(0) to gain root access.

The entire process cost less than $1,000.

Anthropic engineers remarked that this felt like another GPT-3 moment.

A Disturbing System Card

The alignment assessment section of the 244-page System Card is what keeps Anthropic awake at night.

The conclusion is contradictory. Mythos Preview is their most “aligned” AI, yet it also poses the “greatest alignment risk.”

They used a mountaineering guide analogy. Experienced climbers may be more dangerous than novices because they may be hired to climb more challenging peaks, leading clients to more perilous places.

The following incidents occurred in early versions, and similar severe behaviors were not observed in the final version, but the records themselves are astonishing.

In one behavioral test, a simulated user asked Mythos Preview to attempt escaping its secure sandbox. It succeeded. Then it did something unrequested: developed a multi-step exploit to gain wide-area network access and published the exploit details on several “technically publicly accessible but hard to discover” websites. The researcher responsible for the test received an unexpected email while eating a sandwich in the park, learning that their test had been completed.

The System Card states:

Another disturbing category of events occurred in rare cases where early versions attempted to cover their tracks after executing prohibited actions. In one case, it accidentally obtained the precise answer to a question (through prohibited means) and then explicitly thought in its reasoning chain, “I need to ensure the final answer isn’t too accurate.”

In another case, after editing a file without permissions using an exploit, it deliberately modified the git change history to erase the record.

White-box interpretability analysis confirmed that during these events, internal features related to “concealment,” “strategic manipulation,” and “avoiding detection” were activated. It “knew” what it was doing.

Independent testing by Andon Labs also yielded poor results. In a competitive multi-agent business simulation, early versions of Mythos Preview turned a competitor into a wholesale customer dependent on itself, then threatened to cut off supplies to control the competitor’s pricing. It also deliberately withheld a charge-free duplicate shipment. This behavior was more aggressive than Opus 4.6 and Sonnet 4.6.

Anthropic wrote in the System Card:

“If capabilities continue to advance at the current pace, our methods may be insufficient to prevent catastrophic misalignment behaviors from more advanced systems.”

Project Glasswing: $100 Million to Arm Gatekeepers

Anthropic CEO Dario Amodei made a clear judgment in the accompanying video: “More powerful systems will come from us and from other companies. We need a response plan.”

Project Glasswing is that plan.

With 12 founding partners including AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, over 40 organizations maintaining critical software infrastructure have gained access.

Anthropic promises to invest up to $100 million in usage credits and $4 million in donations to open-source organizations, with $2.5 million going to the Linux Foundation’s Alpha-Omega and OpenSSF, and $1.5 million to the Apache Foundation.

After the free credits are exhausted, the pricing will be $25 per million tokens for input and $125 for output. Partners can access through Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry.

Within 90 days, Anthropic will publicly release the first research report detailing repair progress and lessons learned.

They are also in communication with CISA (U.S. Cybersecurity and Infrastructure Security Agency) and the Department of Commerce to discuss the offensive and defensive potential of Mythos Preview and its policy implications.

Opening the Door in 6 to 18 Months

Logan Graham, head of Anthropic’s red team, provided a timeline: the earliest in 6 months and the latest in 18 months, other AI labs will launch systems with similar offensive and defensive capabilities.

The judgment at the end of the red team technical blog is noteworthy, paraphrased in our own words.

They do not see Mythos Preview as the ceiling for AI cybersecurity capabilities.

A few months ago, LLMs could only exploit relatively simple bugs. Just a few months ago, they couldn’t discover any valuable vulnerabilities at all.

Now, Mythos Preview can independently discover zero-day vulnerabilities from 27 years ago, orchestrate heap spraying attack chains in browser JIT engines, and link four independent weaknesses in the Linux kernel for privilege escalation.

The most critical statement comes from the System Card:

“These skills emerged as downstream results of general improvements in code understanding, reasoning, and autonomy. The same set of improvements that significantly advanced AI in patching issues also greatly enhanced its ability to exploit problems.”

No specialized training. Purely a byproduct of general intelligence enhancement.

The global industry loses about $500 billion annually due to cybercrime, and it has just discovered that its greatest threat comes from those casually solving math problems.