If Crawlable's scan shows your site returning HTTP 403 / 503 with a Cloudflare challenge page, AI crawlers like GPTBot, ClaudeBot and PerplexityBot are seeing the same wall. Your content is effectively invisible to LLM answers. Here's how to fix it without turning off Cloudflare protection.
In your Cloudflare dashboard, go to Security → Bots and make sure Verified Bots is set to Allow. This automatically lets Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot and other verified crawlers through Bot Fight Mode and Super Bot Fight Mode.
Go to Security → WAF → Custom rules → Create rule. Use this expression (Edit expression mode):
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "PerplexityBot") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Applebot-Extended") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "CrawlableBot")
Action: Skip. Then check All remaining custom rules, Bot Fight Mode, Super Bot Fight Mode, Rate limiting rules, and Managed Challenge / Interactive Challenge. Save and deploy.
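If you keep the crawler list in one place, a short script can generate the WAF expression above so it stays in sync as you add or remove bots. This is only a sketch: `AI_CRAWLERS` and `build_expression` are hypothetical names for illustration, not part of any Cloudflare tooling, and the output still has to be pasted into the rule editor by hand.

```python
# Hypothetical helper: builds the custom rule expression from a list of
# crawler user-agent tokens. The tokens are the ones used in the rule above.
AI_CRAWLERS = [
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "Google-Extended",
    "Applebot-Extended",
    "CCBot",
    "Bytespider",
    "CrawlableBot",
]


def build_expression(agents):
    """Return one `http.user_agent contains` clause per agent, joined by `or`."""
    clauses = []
    for agent in agents:
        # Refuse tokens that would break out of the quoted string.
        if '"' in agent or "\\" in agent:
            raise ValueError(f"unsafe user-agent token: {agent!r}")
        clauses.append(f'(http.user_agent contains "{agent}")')
    return " or\n".join(clauses)


if __name__ == "__main__":
    print(build_expression(AI_CRAWLERS))
```

Running the script prints the expression one clause per line, ready to paste into Edit expression mode.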
Cloudflare lets the bots through — robots.txt tells them they're welcome. Drop this in at /robots.txt:
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: CCBot
Allow: /
User-agent: CrawlableBot
Allow: /
Sitemap: https://example.com/sitemap.xml
Under Security → Settings, set Security Level to "Medium" or "Essentially Off" for the content-heavy paths AI bots need to read (usually / and your blog/docs routes). Keep it "High" on /admin, /login, and /api.
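To confirm that the robots.txt from the previous step parses the way you expect, you can run it through Python's standard `urllib.robotparser`. The sketch below checks the file contents offline. `ROBOTS_TXT`, `AGENTS`, and `allowed_agents` are hypothetical names, and `https://example.com/` is the same placeholder domain used in the Sitemap line.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body from the step above, pasted in as a string.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: CCBot
Allow: /
User-agent: CrawlableBot
Allow: /
Sitemap: https://example.com/sitemap.xml
"""

AGENTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "Applebot-Extended", "CCBot", "CrawlableBot",
]


def allowed_agents(robots_txt, agents, url="https://example.com/"):
    """Map each agent to True/False: may it fetch `url` under these rules?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, allowed in allowed_agents(ROBOTS_TXT, AGENTS).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To check the deployed file instead of a pasted copy, `RobotFileParser` also offers `set_url()` followed by `read()`, which downloads the live /robots.txt before you call `can_fetch()`.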
Run the scan again from the home page. You should see HTTP 200 and a real readability score. If it still fails, double-check that the WAF rule's Skip targets include "Managed Challenge" — that's the most common miss.
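For a quick spot check from your own machine, a script along these lines requests a page with each crawler's user-agent token and reports whether Cloudflare answered with a challenge. Treat it as a rough sketch under a few assumptions: the function names are hypothetical, the bare tokens stand in for the crawlers' full user-agent strings (the WAF rule matches on `contains`, so the token is enough to trigger it), and challenge detection relies on the `cf-mitigated: challenge` response header. Because the request does not come from a crawler's real IP range, it exercises only the user-agent rule, not Verified Bots.

```python
import urllib.error
import urllib.request


def fetch(url, user_agent, timeout=10.0):
    """GET `url` with the given User-Agent; return (status, cf-mitigated value)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("cf-mitigated", "")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, but status and headers are still available.
        return error.code, error.headers.get("cf-mitigated", "")


def is_challenge(status, cf_mitigated):
    """True when the response looks like a Cloudflare challenge page."""
    return status in (403, 503) and cf_mitigated.lower() == "challenge"


def check_site(url, agents):
    """Return a short verdict per agent: 'challenged' or the HTTP status."""
    results = {}
    for agent in agents:
        status, cf_mitigated = fetch(url, agent)
        if is_challenge(status, cf_mitigated):
            results[agent] = "challenged"
        else:
            results[agent] = f"HTTP {status}"
    return results
```

For example, `check_site("https://example.com/", ["GPTBot", "ClaudeBot"])` returns one verdict per token. Any entry that reads "challenged" points back to the Skip rule's targets.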
Solving Cloudflare's challenge from our side (residential proxies, CAPTCHA solvers, fingerprint spoofing) violates Cloudflare's terms and the site owner's intent. Real AI crawlers don't do it either — they just give up and your site doesn't appear in their answers. Allowlisting on your end is the only durable fix.