Robots.txt Generator

Build a robots.txt file visually with rules for different bots. Quick presets cover common configurations, including AI crawler blocking.

Quick Presets

robots.txt
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

What robots.txt Does (and What It Doesn't)

robots.txt is a text file at your site's root (yoursite.com/robots.txt) that tells crawlers which paths to request and which to skip. It uses four standard directives: User-agent (which bot the rule applies to), Disallow (paths to skip), Allow (paths to include despite a broader Disallow), and Sitemap (a URL pointing to your sitemap.xml). The Robots.txt Generator builds these visually so you can compose a config without remembering the syntax.
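As an illustration, a minimal file using all four directives might look like this (the paths and sitemap URL are placeholders):

User-agent: *
Disallow: /private/
Allow: /private/press-kit/

Sitemap: https://example.com/sitemap.xml

Every bot is asked to skip /private/ except the /private/press-kit/ subpath, and all crawlers are pointed at the sitemap.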

What robots.txt does not do: enforce access control. A determined crawler ignores robots.txt entirely; a malicious scraper wouldn't bother reading it. Anything truly private must sit behind authentication, not just a Disallow. robots.txt is a polite request to well-behaved bots (Googlebot, Bingbot, AhrefsBot, the major AI crawlers), and that's the right way to think about it.

Common robots.txt Patterns

| Pattern | Effect | When to Use |
| --- | --- | --- |
| User-agent: * Disallow: (empty) | Allow all bots everywhere | Public marketing sites |
| User-agent: * Disallow: / | Block all bots from everything | Staging environments only |
| Disallow: /admin/ | Block admin paths from crawling | Login screens, dashboards |
| Disallow: /api/ | Block API routes from crawling | JSON endpoints, webhooks |
| User-agent: GPTBot Disallow: / | Block OpenAI's crawler | Sites blocking AI training |
| Sitemap: https://... | Point crawlers to your sitemap | Always include if you have one |
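For example, a typical public site might combine the admin, API, and sitemap patterns from this table in one file (the paths and URL are placeholders):

User-agent: *
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml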

Blocking AI Crawlers and Common Bots

Many publishers now block AI training crawlers explicitly. The major ones to know: GPTBot (OpenAI), Google-Extended (Google's AI training, separate from Googlebot), CCBot (Common Crawl, often used as training data), ClaudeBot and anthropic-ai (Anthropic), PerplexityBot (Perplexity), Bytespider (TikTok / ByteDance). Block these without affecting search ranking by adding a per-user-agent Disallow rule, since Googlebot and Bingbot only obey the rule group that matches their own User-agent name (falling back to * when no specific group exists).
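A sketch of such a config, blocking the AI crawlers named above while leaving everything open for other bots (the Robots Exclusion Protocol lets several User-agent lines share one rule group):

User-agent: GPTBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /

If you'd rather not rely on grouped User-agent lines, repeating a separate User-agent and Disallow: / pair per bot has the same effect.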

Pair this with the [Password Generator](/password-generator) for hardening the actual access controls behind robots.txt, and the [QR Code Generator](/qr-code-generator) if you're publishing a printable URL. Always test your robots.txt with Google Search Console's robots.txt tester before deploying; a misplaced slash can accidentally block your entire site.

Frequently Asked Questions

Where does robots.txt go on my site?

At the root of your domain: yoursite.com/robots.txt. It must be a plain text file (not HTML), accessible without authentication, and served with Content-Type: text/plain. A robots.txt in a subdirectory (yoursite.com/blog/robots.txt) is ignored, and each subdomain needs its own file: blog.yoursite.com/robots.txt is separate from yoursite.com/robots.txt.

Does robots.txt prevent pages from showing in Google?

No, robots.txt prevents crawling, not indexing. Google can still index a URL it has discovered through external links, even if it can't crawl the page contents. To prevent indexing, use a meta robots tag (noindex) or an HTTP X-Robots-Tag header. Don't combine that with a robots.txt Disallow for the same page: the page must be crawlable for the noindex to be read, so blocking it in robots.txt hides the very signal you are relying on.
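For illustration, the two page-level mechanisms look like this, either a tag in the page's HTML head or a header sent with the response:

<meta name="robots" content="noindex">
X-Robots-Tag: noindex

The meta tag only works for HTML pages; the X-Robots-Tag header is set by the server and also covers non-HTML files such as PDFs.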

How do I block AI crawlers without blocking search engines?

Add per-user-agent rules. Block GPTBot, Google-Extended, ClaudeBot, anthropic-ai, CCBot, PerplexityBot, and Bytespider individually with their own User-agent and Disallow lines, while leaving the catch-all User-agent: * with full access. This blocks AI training without affecting Googlebot's search-indexing crawl.

What is the wildcard syntax in robots.txt?

The * wildcard matches any sequence of characters; $ matches end-of-URL. Disallow: /*.pdf$ blocks all URLs ending in .pdf. Disallow: /search?* blocks all URLs starting with /search?. Both Googlebot and Bingbot support wildcards; older or simpler crawlers may not, so don't rely on wildcards alone for sensitive paths.
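A sketch using both wildcards (the paths are placeholders):

User-agent: *
Disallow: /*.pdf$
Disallow: /search?*

Since robots.txt rules already match by prefix, the trailing * in /search?* is redundant but harmless; Disallow: /search? behaves the same for crawlers that follow the standard.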

Should I include a sitemap in robots.txt?

Yes, if you have one. The Sitemap directive points crawlers directly to your sitemap.xml, which speeds up the discovery of new pages. The line is simply: Sitemap: https://yoursite.com/sitemap.xml. Multiple sitemaps are allowed; each gets its own line.
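For example, a site that splits its sitemap would list each one (the URLs are placeholders):

Sitemap: https://yoursite.com/sitemap-pages.xml
Sitemap: https://yoursite.com/sitemap-posts.xml

Sitemap lines aren't tied to a User-agent group, so they can appear anywhere in the file.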
