Which AI Bots to Allow in robots.txt

A practical guide to AI crawler user-agents — GPTBot, Google-Extended, PerplexityBot, ClaudeBot — and how to configure robots.txt without blocking AI visibility.

If your robots.txt blocks AI crawlers with Disallow: /, you are actively suppressing AI search visibility. AI engines cannot learn about your brand if they cannot crawl your content.

BrandCitation's GEO (generative engine optimization) audit awards 15 points when AI bots are not blocked, which makes this one of the highest-impact technical fixes.

AI crawler user-agents

These are the primary AI-related bots to know:

| User-agent | Operator | Purpose |
|------------|----------|---------|
| GPTBot | OpenAI | Crawling content for model training |
| ChatGPT-User | OpenAI | ChatGPT browsing sessions |
| Google-Extended | Google | Control token for Gemini and AI product training (not a separate crawler) |
| anthropic-ai | Anthropic | Claude training |
| Claude-Web | Anthropic | Claude web browsing |
| ClaudeBot | Anthropic | Claude crawling |
| PerplexityBot | Perplexity | Perplexity search indexing |
| YouBot | You.com | You.com AI search |
| cohere-ai | Cohere | Cohere model training |

What "blocked" means

In robots.txt, a block looks like:

User-agent: GPTBot
Disallow: /

Disallow: / tells the named crawler not to fetch any page on your site, and compliant crawlers will obey it. If you have this rule for AI bots, remove it unless you have a specific legal requirement.
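As a rough illustration of what a site-wide block looks like to a crawler, the sketch below scans robots.txt text for the group that applies to a given user-agent and reports whether it contains Disallow: / with no Allow rule. The helper names (parseGroups, isFullyBlocked) are hypothetical, and the matching is deliberately simplified: it ignores wildcards and path-length precedence, so treat it as a quick check rather than a full robots.txt parser.

```typescript
// Simplified robots.txt check (assumption: exact, case-insensitive token
// matching only; no wildcard or longest-path handling).
type Group = { agents: string[]; allow: string[]; disallow: string[] };

function parseGroups(robotsTxt: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      if (!current || !lastWasAgent) {
        current = { agents: [], allow: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if (field === "allow") {
      if (current && value !== "") current.allow.push(value);
      lastWasAgent = false;
    } else if (field === "disallow") {
      // An empty Disallow value means "nothing is disallowed", so skip it.
      if (current && value !== "") current.disallow.push(value);
      lastWasAgent = false;
    }
  }
  return groups;
}

function isFullyBlocked(robotsTxt: string, userAgent: string): boolean {
  const groups = parseGroups(robotsTxt);
  const name = userAgent.toLowerCase();
  // Prefer groups that name this crawler; otherwise fall back to "*".
  let matching = groups.filter((g) => g.agents.indexOf(name) !== -1);
  if (matching.length === 0) {
    matching = groups.filter((g) => g.agents.indexOf("*") !== -1);
  }
  const blocksRoot = matching.some((g) => g.disallow.indexOf("/") !== -1);
  const hasAllow = matching.some((g) => g.allow.length > 0);
  return blocksRoot && !hasAllow;
}

const sample = [
  "User-agent: GPTBot",
  "Disallow: /",
  "",
  "User-agent: *",
  "Disallow: /admin/",
].join("\n");

console.log(isFullyBlocked(sample, "GPTBot")); // true
console.log(isFullyBlocked(sample, "ClaudeBot")); // false
```

In this sample GPTBot has its own group with a site-wide block, while ClaudeBot falls back to the * group, which only blocks /admin/.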

Recommended robots.txt for GEO

Allow general crawlers and explicitly allow AI bots:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

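# A crawler follows only the most specific group that names it, so the
# private paths from the * group are repeated here.
User-agent: GPTBot
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: ClaudeBot
Allow: /
Disallow: /admin/
Disallow: /api/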

Sitemap: https://yourdomain.com/sitemap.xml

Adjust Disallow paths for your app's private routes (dashboard, admin, API).

Next.js example

BrandCitation uses Next.js robots.ts to generate robots.txt dynamically, allowing AI bots while blocking /dashboard/, /admin/, and /api/.
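A robots.ts along those lines might look like the sketch below. This is not BrandCitation's actual file. It assumes the Next.js App Router convention, where app/robots.ts default-exports a function that returns the rules. The local RobotsRule and RobotsFile types are stand-ins that mirror the shape of the MetadataRoute.Robots type from next so the snippet stands alone, and the bot list and sitemap URL are placeholders.

```typescript
// app/robots.ts (sketch). In a real Next.js project you would instead write:
//   import type { MetadataRoute } from "next";
// and annotate the return type as MetadataRoute.Robots.
type RobotsRule = {
  userAgent?: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};

type RobotsFile = {
  rules: RobotsRule | RobotsRule[];
  sitemap?: string | string[];
};

// Private routes named in the text; adjust to your app.
const PRIVATE_PATHS = ["/dashboard/", "/admin/", "/api/"];

// Placeholder selection of AI crawler tokens.
const AI_BOTS = [
  "GPTBot",
  "ChatGPT-User",
  "Google-Extended",
  "PerplexityBot",
  "ClaudeBot",
];

export default function robots(): RobotsFile {
  return {
    rules: [
      { userAgent: "*", allow: "/", disallow: PRIVATE_PATHS },
      // A group that names a bot replaces the "*" group for that bot,
      // so the private paths are listed again here.
      { userAgent: AI_BOTS, allow: "/", disallow: PRIVATE_PATHS },
    ],
    sitemap: "https://yourdomain.com/sitemap.xml",
  };
}
```

Keeping the private paths and the bot list in constants means both rule groups stay in sync when a route is added or a crawler token changes.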

Should you block AI crawlers?

Allow them if you want AI engines to mention and cite your brand.

Consider blocking if you have strict IP/licensing concerns about training data — but understand this directly reduces AI visibility.

Most commercial brands optimizing for discovery should allow AI crawlers.

Combine with llms.txt

Allowing crawlers is step one. Step two: add /llms.txt so crawlers know what your site is about.

Audit your configuration

Order a full audit to check robots.txt, llms.txt, sitemap, schema, and AI visibility scores together — or start with the GEO checklist.
