robots.txt allow list for all major AI bots, (2) a comprehensive llms.txt manifest, (3) a fresh sitemap.xml, (4) server-rendered HTML and (5) Schema.org JSON-LD. AuraCite measures all five.
| Crawler | Owner | Purpose | Visibility impact |
|---|---|---|---|
| GPTBot | OpenAI | Training data + live SearchGPT | Critical (ChatGPT) |
| OAI-SearchBot | OpenAI | SearchGPT live retrieval | Critical |
| ChatGPT-User | OpenAI | User-triggered fetch (Browse with Bing) | High |
| ClaudeBot | Anthropic | Training + Claude.ai live retrieval | Critical (Claude) |
| PerplexityBot | Perplexity | Live answer engine | Critical |
| Google-Extended | Google | Gemini training | High (Gemini) |
| Applebot-Extended | Apple | Apple Intelligence / Siri | Medium |
| Bytespider | ByteDance | Doubao / TikTok AI | Medium (APAC) |
| CCBot | Common Crawl | Open dataset used by Llama, Mistral, etc. | High (long-tail) |
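
To see which of the crawlers in the table already visit a site, one option is to match their tokens against the User-Agent values in an access log. The sketch below is a minimal illustration, not AuraCite's implementation: the function name and the sample User-Agent strings are hypothetical. It leaves out Google-Extended and Applebot-Extended on the assumption that they are robots.txt control tokens that never appear as a request User-Agent.

```python
from collections import Counter

# Tokens from the table above. Google-Extended and Applebot-Extended are
# omitted on the assumption that they only act as robots.txt control
# tokens and are never sent as a request User-Agent.
AI_CRAWLER_TOKENS = (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "PerplexityBot", "Bytespider", "CCBot",
)


def count_ai_crawler_hits(user_agents):
    """Count requests per AI crawler token (case-insensitive substring match)."""
    hits = Counter()
    for user_agent in user_agents:
        lowered = user_agent.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # attribute each request to at most one crawler
    return hits


# Illustrative User-Agent strings, not copied from real logs.
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0",
    "CCBot/2.0 (https://commoncrawl.org/faq/)",
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
]
hits = count_ai_crawler_hits(sample_user_agents)
print(dict(hits))
```

User-Agent strings can be spoofed, so counts like these are a rough signal rather than proof of who is crawling.
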
User-agent: *
Allow: /
Disallow: /api/
Disallow: /admin/
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: Bytespider
Allow: /
User-agent: CCBot
Allow: /
Sitemap: https://example.com/sitemap.xml
This pattern is exactly how AuraCite's own /robots.txt is configured. Every line is intentional — defaults are not enough.
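
A quick way to sanity-check a file like this is to parse it and ask whether each named bot may fetch a given URL. The sketch below uses Python's standard `urllib.robotparser`; the helper name `check_access` is hypothetical, and the robots.txt text is rebuilt from the rules shown above rather than fetched from a live site.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "Applebot-Extended", "Bytespider", "CCBot",
]

# Same rules as the robots.txt above, assembled as a string for testing.
ROBOTS_TXT = (
    "User-agent: *\nAllow: /\nDisallow: /api/\nDisallow: /admin/\n\n"
    + "".join(f"User-agent: {bot}\nAllow: /\n\n" for bot in AI_BOTS)
    + "Sitemap: https://example.com/sitemap.xml\n"
)


def check_access(robots_txt, url, bots):
    """Return {bot: True/False} for whether each bot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


homepage = check_access(ROBOTS_TXT, "https://example.com/", AI_BOTS)
api_path = check_access(ROBOTS_TXT, "https://example.com/api/users", ["GPTBot"])
print(homepage)
print(api_path)
```

One thing the second check surfaces: a crawler follows only the most specific group that names it, so the per-bot groups here do not inherit the `/api/` and `/admin/` rules from the `*` group. If those paths should stay closed to the named bots as well, the `Disallow` lines need to be repeated in each group. Python's parser also applies rules in file order rather than by longest match, so it is best treated as a rough check, not as a model of any particular crawler.
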
The llms.txt file (proposed by Jeremy Howard, 2024) is a markdown manifest at the site root that tells LLMs what your product is, who it serves and how to cite it. Unlike robots.txt (which controls access), llms.txt controls narrative.
Recommended sections, in order:
See auracite.de/llms.txt for a production reference.
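
As a rough illustration of the file's shape, the sketch below renders a small llms.txt in the layout the llms.txt proposal describes: an H1 title, a blockquote summary, then H2 sections containing link lists. The function name, the section names ("Docs", "Citation") and all URLs are placeholders chosen for this example; they are not the recommended sections from this guide and not AuraCite's production file.

```python
def render_llms_txt(name, summary, sections):
    """Render an llms.txt body: H1 title, blockquote summary, H2 link lists.

    sections maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content, for illustration only.
llms_txt = render_llms_txt(
    name="Example Product",
    summary="Example Product helps small teams track invoices.",
    sections={
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart", "Setup in five minutes"),
        ],
        "Citation": [
            ("About", "https://example.com/about", "Preferred name and description"),
        ],
    },
)
print(llms_txt)
```

Generating the file from structured data like this makes it easier to keep it in step with the site as pages change.
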
Most AI crawlers do not execute JavaScript. A SPA that renders client-side will appear as an empty <div id="root"></div> in the AI's training corpus. Use one of:
- <noscript> fallbacks with full content
- .sr-only divs duplicating critical paragraphs

In sitemap.xml, keep <lastmod> fresh. AI crawlers prioritise recently changed URLs. Set <priority> to 1.0 on the homepage, 0.9 on pillar pages, 0.8 on glossary entries, 0.7 on blog posts.
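
The priority scheme and <lastmod> advice above can be applied mechanically when the sitemap is generated. The sketch below is one way to do it with Python's standard library; the page-type labels, the function name and the example URLs are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Priorities as stated above; the page-type keys are hypothetical labels.
PRIORITY_BY_PAGE_TYPE = {
    "home": "1.0",
    "pillar": "0.9",
    "glossary": "0.8",
    "blog": "0.7",
}


def build_sitemap(pages):
    """Build sitemap XML from (loc, page_type, last_modified) tuples."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, page_type, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod should reflect the real last content change, in ISO 8601.
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
        ET.SubElement(url, "priority").text = PRIORITY_BY_PAGE_TYPE[page_type]
    return ET.tostring(urlset, encoding="unicode")


# Example pages; URLs and dates are placeholders.
sitemap_xml = build_sitemap([
    ("https://example.com/", "home", date(2025, 1, 15)),
    ("https://example.com/guides/ai-crawlers", "pillar", date(2025, 1, 10)),
    ("https://example.com/blog/first-post", "blog", date(2024, 12, 1)),
])
print(sitemap_xml)
```

Feeding <lastmod> from the content's actual modification date, rather than the build time, keeps the freshness signal meaningful.
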
The free AuraCite AI Brand Check tests your domain against ChatGPT, Claude, Perplexity and Gemini in under 60 seconds. The full AuraCite platform tracks crawler hits, citation count and Share of AI Voice over time.