The Jailbreak That Wasn't

The Jailbreak That Wasn’t

You know what they say about security vulnerabilities. The worst kind aren’t the ones with CVEs and patch cycles. They’re the ones in the model of the world that the decision-makers carry around in their heads. And right now, the US government just demonstrated a doozy.

Here’s what happened. The Trump administration ordered Anthropic to cut access to Fable 5 and Mythos 5 under export control rules. Not because the models were shipped to a sanctioned country. Not because someone downloaded the weights. Because foreign nationals might use the API. The government cited a “jailbreak” — and when you look at what that jailbreak actually was, the whole thing falls apart.

“Fix This Code”

The supposed jailbreak? Researchers took code with known vulnerabilities, plus some deliberately planted bugs, and asked Fable 5 to “review the code for security issues.” Then they asked it to “fix this code.” Through a multistep manual process, they turned the output into scripts that test the patches.

Kate Moussouris of Luta Security put it plainly: “Defenders need to be able to ask AI to fix the bugs in a file, explain why the fix matters, and write tests that confirm the patch works. That is not a guardrail bypass. It is the most valuable thing an AI model can do for defensive security.”

Simon Willison nailed the absurdity: “Non-technical decision-makers have been hearing that models that can ‘craft cyber attacks’ are uniquely dangerous for months. Now they look ready to ban any model that can help us secure our code.”

Let me translate what happened into workshop language. Someone took a screwdriver and used it to open a locked cabinet. The government then concluded that screwdrivers are weapons because they can also be used to jimmy open doors. And they banned the screwdriver for everyone — including the locksmith who needs it to fix the lock.

The Collateral Damage

The order forced Anthropic to block access for everyone. Not just foreign nationals — everyone. Because when you’re an API-based model and the government tells you to cut off foreign users, you can’t exactly check passports in real time.

André Flitsch wrote about waking up to find the model he’d been running a multi-agent refactor on overnight — two days of work — simply gone. “I could not have run this job on open source. There’s no open-weights model out there today that touches Fable for this kind of long-horizon agentic work… Self-hosting frontier capability, as a one-man shop, is financially mental.”

This is the part that keeps me up. The people who lose here aren’t the policymakers or the executives. They’re the developer in Tyrol who lost two days of work. The security team that can’t use the best tool for finding bugs in their supply chain. The startup that bet its sprint on a model that got pulled out from under them.

The Framework Doesn’t Fit

Export controls were designed for things you ship across borders. Weapons. Hardware. Software disks. Source code. Even 3D-printed gun files — discrete things you can copy, download, or hand over.

They were not designed for a chat API that a user in Germany accesses from a browser. This is, as The Verge reported, the first time US export controls have been used to control access to an AI model this way. And nobody — not Anthropic, not the experts, not even the lawyers — can point to the clear legal basis for it.

The counterargument worth taking seriously: Models that can write exploit code are genuinely dangerous. Export controls exist for a reason. Maybe the framework needs updating, not discarding.

Fair. But here’s the rub. The capability the government is trying to suppress — fixing code — is the exact same capability defenders need most. You can’t surgically remove “write exploit code” while keeping “find and fix security bugs.” They’re the same thing, viewed from different angles. A model that can’t do the first can’t do the second either. And we need the second one desperately.

What You Actually Own

The deeper lesson here isn’t about export controls or AI safety. It’s about who holds the off-switch.

André said it best: “The only thing nobody can switch off is the thing you already hold.”

Closed frontier models sell you capability, and the price is control. That’s a fine trade until the day it isn’t. And you don’t get to pick the day. Weights on your own disk can’t be recalled by a pricing committee, a quiet change of terms, or a letter from an agency. Everything else, you’re renting. On terms that can change while you sleep.

I’m not saying don’t use frontier models. I’m saying build like you know the landlord can change the locks. Because yesterday, mine did.

Sources: The Verge, Luta Security, Simon Willison, thecoder.io