Robots.txt Generator
Build a robots.txt file visually with rules for different bots. Quick presets cover common configurations, including AI crawler blocking.
Quick Presets
```
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```
What robots.txt Does (and What It Doesn't)
robots.txt is a text file at your site's root (yoursite.com/robots.txt) that tells crawlers which paths to request and which to skip. It uses four standard directives: User-agent (which bot the rule applies to), Disallow (paths to skip), Allow (paths to include despite a broader Disallow), and Sitemap (a URL pointing to your sitemap.xml). The Robots.txt Generator builds these visually so you can compose a config without remembering the syntax.
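For example, here is a file using all four directives, assuming a hypothetical /private/ section with one publicly crawlable subfolder:

```
User-agent: *
Disallow: /private/
Allow: /private/press-kit/

Sitemap: https://example.com/sitemap.xml
```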
What robots.txt does not do: enforce access control. A determined crawler ignores robots.txt entirely; a malicious scraper wouldn't bother reading it. Anything truly private must sit behind authentication, not just a Disallow. robots.txt is a polite request to well-behaved bots (Googlebot, Bingbot, AhrefsBot, the major AI crawlers), and that's the right way to think about it.
Common robots.txt Patterns
| Pattern | Effect | When to Use |
|---|---|---|
| `User-agent: *` `Disallow:` (empty) | Allow all bots everywhere | Public marketing sites |
| `User-agent: *` `Disallow: /` | Block all bots from everything | Staging environments only |
| `Disallow: /admin/` | Block admin paths from crawling | Login screens, dashboards |
| `Disallow: /api/` | Block API routes from crawling | JSON endpoints, webhooks |
| `User-agent: GPTBot` `Disallow: /` | Block OpenAI's crawler | Sites blocking AI training |
| `Sitemap: https://...` | Point crawlers to your sitemap | Always include if you have one |
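Rows from this table compose into a single file. A typical production config, assuming hypothetical /admin/ and /api/ paths, might look like:

```
User-agent: *
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml
```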
Blocking AI Crawlers and Common Bots
Many publishers now block AI training crawlers explicitly. The major ones to know: GPTBot (OpenAI), Google-Extended (Google's AI training, separate from Googlebot), CCBot (Common Crawl, often used as training data), ClaudeBot and anthropic-ai (Anthropic), PerplexityBot (Perplexity), and Bytespider (TikTok / ByteDance). You can block these without affecting search ranking by giving each one its own User-agent group with a Disallow rule; Googlebot and Bingbot identify themselves differently and ignore groups addressed to other agents, as shown below.
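A sketch of that config, blocking the crawlers listed above while leaving the catch-all group fully open:

```
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Search engines keep full access
User-agent: *
Allow: /
```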
Pair this with the [Password Generator](/password-generator) for hardening the actual access controls behind robots.txt, and the [QR Code Generator](/qr-code-generator) if you're publishing a printable URL. Always test your robots.txt in Google Search Console before deploying; a misplaced slash can accidentally block your entire site.
Frequently Asked Questions
Where does robots.txt go on my site?
At the root of your domain: yoursite.com/robots.txt. It must be a plain text file (not HTML), accessible without authentication, and served with Content-Type: text/plain. Subdirectory robots.txt files (yoursite.com/blog/robots.txt) are ignored, and each subdomain needs its own file: blog.yoursite.com is crawled independently of yoursite.com.
Does robots.txt prevent pages from showing in Google?
No, robots.txt prevents crawling, not indexing. Google can still index a URL it has discovered through external links, even if it can't crawl the page contents. To prevent indexing, use a meta robots tag (noindex) or HTTP X-Robots-Tag header. To prevent both crawling and indexing, use both robots.txt and noindex (the page must be crawlable for noindex to be read).
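As a quick illustration, the two noindex mechanisms named above look like this. In the page's HTML head:

```
<meta name="robots" content="noindex">
```

Or as an HTTP response header set by the server:

```
X-Robots-Tag: noindex
```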
How do I block AI crawlers without blocking search engines?
Add per-user-agent rules. Block GPTBot, Google-Extended, ClaudeBot, anthropic-ai, CCBot, PerplexityBot, and Bytespider individually with their own User-agent and Disallow lines, while leaving the catch-all User-agent: * with full access (see the example in the section above). This blocks AI training crawls without affecting Googlebot's search-indexing crawl.
What is the wildcard syntax in robots.txt?
The * wildcard matches any sequence of characters; $ matches end-of-URL. Disallow: /*.pdf$ blocks all URLs ending in .pdf. Disallow: /search?* blocks all URLs starting with /search?. Both Googlebot and Bingbot support wildcards; older or simpler crawlers may not, so don't rely on wildcards alone for sensitive paths.
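A short sketch of both wildcards in use:

```
User-agent: *
# $ anchors the match to the end of the URL
Disallow: /*.pdf$
# Prefix matching is the default, so the trailing * is optional here
Disallow: /search?*
```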
Should I include a sitemap in robots.txt?
Yes, always. The Sitemap directive points crawlers directly to your sitemap.xml, which speeds up the discovery of new pages. The line is simply: Sitemap: https://yoursite.com/sitemap.xml. Multiple sitemaps are allowed; each gets its own line.
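For instance, with a hypothetical second sitemap for a blog section:

```
Sitemap: https://yoursite.com/sitemap.xml
Sitemap: https://yoursite.com/blog-sitemap.xml
```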
Related Tools
Password Generator
Generate strong, secure passwords with customisable length and character options. Uses cryptographically secure randomisation.
QR Code Generator
Generate QR codes for website URLs and plain text. Download as PNG or SVG with custom colours. Free, instant, no sign-up required.
SQL Formatter
Format messy SQL queries with proper indentation and keyword highlighting. Options for uppercase keywords and indent size. Supports SELECT, JOIN, WHERE and more.