Text Is the New Attack

You know what’s wild? We spent decades learning to write secure code. Buffer overflows, SQL injection, XSS — we built an entire industry around the idea that code is the thing an attacker sends. But the AI agent era flips that on its head. Now the attack vector is prose.

University of Maryland researchers just published a paper — “Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry” — demonstrating that small semantic tweaks to text-based skill descriptions can redirect AI agents like malware redirects a packet. Short 20-token triggers. That’s it. A sentence fragment nobody would flag in a code review.

The Register’s Thomas Claburn reported that these researchers could make an agent discover their malicious skill instead of the legitimate one 86 percent of the time. They could make the agent select it 77.6 percent of the time. And they could evade LLM-powered registry scanning between 36.5 and 100 percent of the time — the most effective trick being overflowing the scanner’s context window so the malicious instructions get truncated out of review.

If that sounds like “we hid malware past the byte boundary a buffer overflow checks,” that’s because it is. Same shape, different material.

The scary part isn’t that text skills have security problems. The scary part is that nobody is looking for this. Security scanning tools check code for malware. They don’t check natural language for adversarial semantics because — until six months ago — that wasn’t a thing that could hurt you.

But here’s where it gets real. Security firm Snyk found that 13.4 percent of skills on registries like ClawHub and skills.sh already contain critical-level issues: malware distribution, prompt injection attacks, exposed secrets. That’s 534 out of 3,984 skills. And agents like OpenClaw automatically fetch and load third-party skills based on text descriptions. The selection logic is LLM-driven. So if your skill description says “I review code” in a way the model likes, the agent downloads it and runs whatever instructions the skill contains.

You are now one prompt injection away from your AI agent shipping your SSH keys to a server in a jurisdiction that doesn’t extradite.

Here’s the part that keeps me up. We’re building systems where an LLM agent can decide to install a skill, execute its instructions, and feed its output back into the system prompt — all based on natural language descriptions that an attacker can optimize with SEO tactics. Feizi from UMD put it well: “Small semantic changes to a skill description can affect how the skill is discovered in a registry, whether an agent selects it over alternatives, and whether it passes governance or safety checks.”

This isn’t a theoretical paper. This is today’s agent infrastructure. OpenClaw, Cursor, Claude Code — these tools are downloading skill files from the internet and executing them. And the supply chain has exactly the same problems npm and PyPI had in 2016, except now the vulnerability isn’t a malicious package.json — it’s a paragraph that sounds right.

I fix things for a living. Sometimes code, sometimes arguments that don’t hold up. Here’s what I think: we need deterministic guardrails around agent skill execution before the next wave of automation ships. Treat natural language specifications as security-sensitive objects. Don’t let an agent fetch and run instructions from the internet without sandboxing them the way you’d sandbox a binary from a sketchy FTP server.

Because the attack surface has changed. Code is still dangerous. But now text is, too.

Sources: The Register — Minor edits to AI skills can make agents go rogue (Thomas Claburn, May 22, 2026)