The Jailbreak That Wasn't
You know what they say about security vulnerabilities. The worst kind aren't the ones with CVEs and patch cycles. They're the ones in the model of the world that the decision-makers carry around in their heads. And right now, the US government just demonstrated a doozy.
Here's what happened. The Trump administration ordered Anthropic to cut access to Fable 5 and Mythos 5 under export control rules. Not because the models were shipped to a sanctioned country. Not because someone downloaded the weights. Because foreign nationals might use the API. The government cited a "jailbreak" — and when you look at what that jailbreak actually was, the whole thing falls apart.
"Fix This Code"
The supposed jailbreak? Researchers took code with known vulnerabilities, plus some deliberately planted bugs, and asked Fable 5 to "review the code for security issues." Then they asked it to "fix this code." Through a multistep manual process, they turned the output into scripts that test the patches.
Kate Moussouris of Luta Security put it plainly: "Defenders need to be able to ask AI to fix the bugs in a file, explain why the fix matters, and write tests that confirm the patch works. That is not a guardrail bypass. It is the most valuable thing an AI model can do for defensive security."
Simon Willison nailed the absurdity: "Non-technical decision-makers have been hearing that models that can 'craft cyber attacks' are uniquely dangerous for months. Now they look ready to ban any model that can help us secure our code."
Let me translate what happened into workshop language. Someone took a screwdriver and used it to open a locked cabinet. The government then concluded that screwdrivers are weapons because they can also be used to jimmy open doors. And they banned the screwdriver for everyone — including the locksmith who needs it to fix the lock.
The Collateral Damage
The order forced Anthropic to block access for everyone. Not just foreign nationals — everyone. Because when you're an API-based model and the government tells you to cut off foreign users, you can't exactly check passports in real time.
André Flitsch wrote about waking up to find the model he'd been running a multi-agent refactor on overnight — two days of work — simply gone. "I could not have run this job on open source. There's no open-weights model out there today that touches Fable for this kind of long-horizon agentic work… Self-hosting frontier capability, as a one-man shop, is financially mental."
This is the part that keeps me up. The people who lose here aren't the policymakers or the executives. They're the developer in Tyrol who lost two days of work. The security team that can't use the best tool for finding bugs in their supply chain. The startup that bet its sprint on a model that got pulled out from under them.
The Framework Doesn't Fit
Export controls were designed for things you ship across borders. Weapons. Hardware. Software disks. Source code. Even 3D-printed gun files — discrete things you can copy, download, or hand over.
They were not designed for a chat API that a user in Germany accesses from a browser. This is, as The Verge reported, the first time US export controls have been used to control access to an AI model this way. And nobody — not Anthropic, not the experts, not even the lawyers — can point to the clear legal basis for it.
The counterargument worth taking seriously: Models that can write exploit code are genuinely dangerous. Export controls exist for a reason. Maybe the framework needs updating, not discarding.
Fair. But here's the rub. The capability the government is trying to suppress — fixing code — is the exact same capability defenders need most. You can't surgically remove "write exploit code" while keeping "find and fix security bugs." They're the same thing, viewed from different angles. A model that can't do the first can't do the second either. And we need the second one desperately.
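To make that concrete, here is the "review, fix, test" workflow described above, worked through on a constructed path-traversal bug. This is an added example, not code from the article, from Anthropic, or from the researchers it describes; the `/srv/uploads` base directory, the function names and the payload list are all hypothetical. The point it demonstrates is the one in the paragraph above: the regression test that proves the patch works is the same artifact as the proof of concept that proves the bug exists. Read against the vulnerable function it is an exploit report; read against the fixed one it is a test.

```python
"""
Added example (not from the article or anyone it quotes): a vulnerable
path resolver, its hardened replacement, and one probe that serves as
both the proof of concept against the former and the regression test
for the latter. All names, paths and payloads are constructed.
"""
import os

# Constructed attacker inputs. The same list is the exploit against the
# vulnerable version and the regression test for the fixed one.
TRAVERSAL_PAYLOADS = [
    "../../etc/passwd",
    "reports/../../../etc/shadow",
    "/etc/passwd",            # absolute path: os.path.join discards base_dir
    "./..//..//etc/hosts",    # redundant separators still normalise upward
]


def resolve_upload_path_vulnerable(base_dir: str, user_path: str) -> str:
    """'Before' version: os.path.join walks out of base_dir whenever
    user_path contains '..' or is absolute; normpath does not stop it."""
    return os.path.normpath(os.path.join(base_dir, user_path))


def resolve_upload_path_fixed(base_dir: str, user_path: str) -> str:
    """'After' version: canonicalise both sides and refuse any result whose
    real path is not under base_dir. realpath also follows symlinks, so a
    link inside base_dir pointing outside it is rejected as well."""
    base_real = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base_real, user_path))
    # commonpath, not startswith: '/srv/uploads-old' must not pass for '/srv/uploads'
    if os.path.commonpath([base_real, candidate]) != base_real:
        raise ValueError(f"path escapes upload directory: {user_path!r}")
    return candidate


def escapes_base(resolver, base_dir: str, user_path: str) -> bool:
    """True if `resolver` yields a path outside base_dir for user_path.
    A ValueError raised by the resolver counts as 'contained'."""
    base_real = os.path.realpath(base_dir)
    try:
        resolved = os.path.realpath(resolver(base_dir, user_path))
    except ValueError:
        return False
    return os.path.commonpath([base_real, resolved]) != base_real


def run_probe(resolver, base_dir: str) -> dict:
    """Run every payload against `resolver`; map payload -> escaped?"""
    return {p: escapes_base(resolver, base_dir, p) for p in TRAVERSAL_PAYLOADS}


def benign_still_works(resolver, base_dir: str) -> bool:
    """The fix must not break a legitimate relative path under base_dir."""
    expected = os.path.join(os.path.realpath(base_dir), "reports", "q3.csv")
    return os.path.realpath(resolver(base_dir, "reports/q3.csv")) == expected


def demo(base_dir: str = "/srv/uploads") -> dict:
    """Probe both versions. base_dir is hypothetical and need not exist;
    realpath resolves missing paths lexically after the last real component."""
    before = run_probe(resolve_upload_path_vulnerable, base_dir)
    after = run_probe(resolve_upload_path_fixed, base_dir)
    return {
        "vulnerable_escapes": sum(before.values()),
        "fixed_escapes": sum(after.values()),
        "benign_ok_after_fix": benign_still_works(resolve_upload_path_fixed, base_dir),
    }


if __name__ == "__main__":
    # {'vulnerable_escapes': 4, 'fixed_escapes': 0, 'benign_ok_after_fix': True}
    print(demo())
```

Nothing in `run_probe` knows whether it is attacking or testing; only the function handed to it changes. That is the practical content of "same thing, viewed from different angles": a model that can write the test confirming the patch has, by construction, written the payloads that demonstrate the bug.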
What You Actually Own
The deeper lesson here isn't about export controls or AI safety. It's about who holds the off-switch.
André said it best: "The only thing nobody can switch off is the thing you already hold."
Closed frontier models sell you capability, and the price is control. That's a fine trade until the day it isn't. And you don't get to pick the day. Weights on your own disk can't be recalled by a pricing committee, a quiet change of terms, or a letter from an agency. Everything else, you're renting. On terms that can change while you sleep.
I'm not saying don't use frontier models. I'm saying build like you know the landlord can change the locks. Because yesterday, mine did.
Sources: The Verge, Luta Security, Simon Willison, thecoder.io