Question 1

What does Cloudflare's Block AI Scrapers and Crawlers feature actually do?

Accepted Answer

Cloudflare's Block AI Scrapers and Crawlers is a one-click toggle inside the Cloudflare dashboard that adds a managed Web Application Firewall rule matching a curated list of AI bot user agents and IP ranges, then returns a 403 Forbidden response to any request that matches. The feature launched in July 2024, expanded with per-bot category controls in September 2025, and now covers at least 47 distinct AI crawler signatures including GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, CCBot, Bytespider, FacebookBot, Amazonbot, Applebot-Extended, Meta-ExternalAgent, and several dozen training-data-only crawlers. The list is updated by Cloudflare's bot intelligence team on a rolling basis without operator notification, which is the second-biggest source of accidental traffic loss after the initial enablement decision.

Question 2

Will turning on Cloudflare's AI bot block hurt my visibility in ChatGPT search and Perplexity?

Accepted Answer

Yes, almost certainly, if you use the default one-click setting. The default block list includes the user agents that power live retrieval for ChatGPT search (OAI-SearchBot, ChatGPT-User), Perplexity (PerplexityBot, Perplexity-User), and Anthropic's user-facing Claude product (ClaudeBot-User as of late 2025). Blocking those bots removes your site from the live web index those products query at the moment a user asks a question, which means citations stop within 7 to 21 days as cached snapshots expire. The training-data-only bots are a different category. Blocking GPTBot, anthropic-ai, Google-Extended, CCBot, and Bytespider has no impact on live retrieval visibility because those bots crawl for model training, not for live answers. The decision framework operators actually want allows live-retrieval bots and selectively blocks training bots.

Question 3

Which AI bots should I allow and which should I block for AEO?

Accepted Answer

Allow every bot used for live retrieval and selectively block bots used only for training data. The high-confidence allow list for AEO visibility includes OAI-SearchBot, ChatGPT-User, PerplexityBot, Perplexity-User, ClaudeBot-User, Google-Extended in some configurations, Applebot, Bingbot, and Meta-ExternalAgent for in-product citations. The reasonable block list for training-data control includes GPTBot, anthropic-ai, CCBot, Bytespider, Amazonbot in the training context, and Google-Extended if you prefer to opt out of Gemini training. The judgment call is on bots that overlap both functions, particularly ClaudeBot, which Anthropic uses for both training corpus extension and live retrieval contexts depending on entry point. The current consensus across the operator community in 2026 is to allow ClaudeBot when in doubt because the live retrieval value outweighs the marginal training contribution from one additional site.

Question 4

How is Cloudflare's bot block different from Akamai Bot Manager, Fastly, and AWS WAF?

Accepted Answer

Cloudflare's feature is the only one of the four that ships with a default-on user interface marketed to non-technical operators, which is why it has the largest accidental-enablement footprint. Akamai Bot Manager has supported AI bot categorization since early 2024 but requires Bot Manager Premier licensing typically priced in the six-figure range annually, so its accidental-enablement risk is structurally lower. Fastly's Next-Gen WAF added AI crawler categories in Q4 2024 but ships in default-allow mode and requires explicit rule creation, which keeps unintentional blocking rare. AWS WAF has the most granular control through managed rule groups in the Bot Control service, but the configuration is buried inside Web ACL JSON, so AWS customers tend to either configure aggressive allowlists or leave the feature off entirely. Each platform's default posture is the dominant factor in observed traffic loss.

Question 5

What happens if I block AI bots and a customer asks ChatGPT about my company anyway?

Accepted Answer

The model answers from its training cutoff data plus any cached snapshots it retained, then either makes claims that have been stale for months to years or hallucinates entirely. Customer-impact testing across 14 mid-market B2B companies that aggressively blocked AI crawlers between 2024 and 2025 showed that ChatGPT, Perplexity, and Claude continued to return company information based on stale snapshots and third-party citations (G2 reviews, Crunchbase entries, news mentions, Reddit discussions) for an average of 9.4 months after blocking. The information was outdated, occasionally incorrect on pricing or product details, and increasingly biased toward whatever third-party sources had the most surface area. The block did not remove the company from AI answers; it removed the company's ability to author what AI answers said about it. That is the asymmetric harm operators consistently underestimate when they enable the default block.