Cloudflare Draws a Line in the Sand. The AI Crawlers Won’t Like Where.
Matthew Prince said something honest this week: most traffic on the internet is now non-human. And Cloudflare — the company sitting between roughly 20% of the web and everyone who wants to touch it — is finally acting like it.
Starting September 15, 2026, new Cloudflare customers will default to blocking any crawler that mixes search indexing with AI training or agent use. Existing customers with free accounts get the same default unless they opt out. The tool Cloudflare introduced last year called “Pay Per Crawl” is now “Pay Per Use” — site owners get paid when their content shows up in an AI chatbot’s answer, not just when a bot visits.
Here’s why this matters.
The Mixed-Use Crawler Is a Conflict of Interest
Google’s Googlebot does two things: it indexes your site for search results, and it feeds the AI features built into Search, such as AI Overviews and AI Mode. If you want to be in Google Search, you have to accept that your content also powers Google’s AI answers. There’s no opt-out for “index me but keep me out of the AI answers.”
Cloudflare’s argument is simple: those two functions should be separate. A crawler that indexes for search is doing you a favor, because it drives traffic. A crawler that scrapes your content to train a competing product is extracting value. When one bot does both, you can’t say yes to one without saying yes to the other.
Google does offer a robots.txt control called Google-Extended. It is a token rather than a separate crawler, and it lets you opt out of Gemini training without affecting traditional search. But try staying in Search while keeping your content out of AI Overviews or AI Mode. You can’t. The bundling is by design.
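For the curious, here is roughly what the Google-Extended control looks like in practice. This is a minimal sketch using Python's standard urllib.robotparser as a generic rule checker; the robots.txt contents and the example.com URL are made up for illustration. Google-Extended governs how already-crawled content may be used rather than identifying a bot that fetches pages, so the parser only shows how the rules read, not what Google does with them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: stay in Search, opt out of Gemini training.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder URL, not a real site configuration.
search_allowed = parser.can_fetch("Googlebot", "https://example.com/post")
training_allowed = parser.can_fetch("Google-Extended", "https://example.com/post")

print("Googlebot allowed:", search_allowed)
print("Google-Extended allowed:", training_allowed)
```

Note what is missing: there is no token in this file for “Search yes, AI answers no.”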
The Counterargument Nobody Wants to Say Out Loud
The strongest objection: Cloudflare is building a toll booth. “Pay Per Use” means Cloudflare intermediates payments between AI companies and website owners. Cloudflare takes its cut. This isn’t altruism. It’s a business model.
Fair. But here’s the thing: Cloudflare is also the one bearing the cost of all this crawler traffic. Those bots hit Cloudflare’s edge before they hit your server. Prince is telling us the majority of internet traffic is now non-human. That’s not hyperbole. Cloudflare sees more traffic than anyone except maybe Google itself, so it has the data.
The second objection: robots.txt already exists. Site owners who want to block crawlers can already do that. Why does Cloudflare need to make this the default?
Because robots.txt is an honor system. AI companies have been ignoring it, or finding creative interpretations. Starting in September, Cloudflare will enforce the block at the network level: not with a text file, but with actual infrastructure. That’s the difference between a “no trespassing” sign and a fence.
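The sign-versus-fence distinction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cloudflare's implementation: ExampleAIBot, the robots.txt contents, and the BLOCKED_AGENTS blocklist are all hypothetical, and a real edge would need stronger signals than a self-reported User-Agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking one AI crawler to stay away.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""

# Hypothetical server-side blocklist (lowercased agent names).
BLOCKED_AGENTS = {"exampleaibot"}


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """The sign: the check runs inside the crawler, and only if it chooses to run it."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def edge_status(user_agent: str) -> int:
    """The fence: the server refuses the request no matter what the crawler decided."""
    return 403 if user_agent.lower() in BLOCKED_AGENTS else 200


# A compliant bot consults the file and backs off on its own.
print(polite_crawler_may_fetch("ExampleAIBot", "https://example.com/a"))
# A non-compliant bot never calls that function, but still gets refused at the edge.
print(edge_status("ExampleAIBot"))
```

The first function lives in the crawler's code, so the site owner has no control over whether it runs. The second lives on the server side, which is the whole point of moving enforcement to the network.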
The third objection: Google could retaliate. If Cloudflare makes it harder for Googlebot to crawl, does your site rank lower? Cloudflare’s announcement specifically calls out that “the largest search engine has access to about 2X more information than leading AI companies.” They know exactly who they’re poking.
Why This Is the Right Call
The web has a sustainability problem. The economics of running a website (paying for hosting, writing content, keeping the lights on) were already thin. Scraping for AI training made them worse. When every piece of content you publish gets fed into a model that will eventually answer questions instead of sending people to your site, the incentive to create anything dries up.
Cloudflare isn’t saving the web. They’re protecting their own ecosystem, and they’re building a revenue stream while doing it. Those two things can both be true. But the default is shifting from “everything is public, everything can be scraped” to “you have to declare what you’re doing and pay if you’re extracting value.”
That’s a healthier starting point.
The web got built on open access and good faith. AI training turned good faith into a subsidy. Cloudflare just made that subsidy a little harder to collect.
Sources: Engadget, “Cloudflare will filter out web crawlers that serve AI companies,” via The Brutalist Report