Robots.txt Generator

Build a robots.txt file visually with rules for different bots. Quick presets cover common configurations, including AI crawler blocking.

Quick Presets

robots.txt
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

What robots.txt Does (and What It Doesn't)

robots.txt is a text file at your site's root (yoursite.com/robots.txt) that tells crawlers which paths to request and which to skip. It uses four standard directives: User-agent (which bot the rule applies to), Disallow (paths to skip), Allow (paths to include despite a broader Disallow), and Sitemap (a URL pointing to your sitemap.xml). The Robots.txt Generator builds these visually so you can compose a config without remembering the syntax.
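As an illustration, a minimal file using all four directives might look like this (the paths and sitemap URL are placeholders):

User-agent: *
Disallow: /private/
Allow: /private/press-kit/

Sitemap: https://example.com/sitemap.xml

Every bot is asked to skip /private/ except the /private/press-kit/ subpath, and all crawlers are pointed at the sitemap.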

What robots.txt does not do: enforce access control. A determined crawler ignores robots.txt entirely; a malicious scraper wouldn't bother reading it. Anything truly private must sit behind authentication, not just a Disallow. robots.txt is a polite request to well-behaved bots (Googlebot, Bingbot, AhrefsBot, the major AI crawlers), and that's the right way to think about it.

Common robots.txt Patterns

| Pattern | Effect | When to Use |
| --- | --- | --- |
| User-agent: * Disallow: (empty) | Allow all bots everywhere | Public marketing sites |
| User-agent: * Disallow: / | Block all bots from everything | Staging environments only |
| Disallow: /admin/ | Block admin paths from crawling | Login screens, dashboards |
| Disallow: /api/ | Block API routes from crawling | JSON endpoints, webhooks |
| User-agent: GPTBot Disallow: / | Block OpenAI's crawler | Sites blocking AI training |
| Sitemap: https://... | Point crawlers to your sitemap | Always include if you have one |
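For example, a typical public site might combine the admin, API, and sitemap patterns from this table in one file (the paths and URL are placeholders):

User-agent: *
Disallow: /admin/
Disallow: /api/

Sitemap: https://example.com/sitemap.xml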

Blocking AI Crawlers and Common Bots

Many publishers now block AI training crawlers explicitly. The major ones to know: GPTBot (OpenAI), Google-Extended (Google's AI training, separate from Googlebot), CCBot (Common Crawl, often used as training data), ClaudeBot and anthropic-ai (Anthropic), PerplexityBot (Perplexity), Bytespider (TikTok / ByteDance). Block these without affecting search ranking by adding a per-user-agent Disallow rule, since Googlebot and Bingbot only obey the rule group that matches their own User-agent name (falling back to * when no specific group exists).
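A sketch of such a config, blocking the AI crawlers named above while leaving everything open for other bots (the Robots Exclusion Protocol lets several User-agent lines share one rule group):

User-agent: GPTBot
User-agent: Google-Extended
User-agent: CCBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /

If you'd rather not rely on grouped User-agent lines, repeating a separate User-agent and Disallow: / pair per bot has the same effect.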

Pair this with the [Password Generator](/password-generator) for hardening the actual access controls behind robots.txt, and the [QR Code Generator](/qr-code-generator) if you're publishing a printable URL. Always test your robots.txt with Google Search Console's robots.txt tester before deploying; a misplaced slash can accidentally block your entire site.

Frequently Asked Questions

Where does robots.txt go on my site?

At the root of your domain: yoursite.com/robots.txt. It must be a plain text file (not HTML), accessible without authentication, and served with Content-Type: text/plain. A robots.txt in a subdirectory (yoursite.com/blog/robots.txt) is ignored, and each subdomain needs its own file: blog.yoursite.com/robots.txt is separate from yoursite.com/robots.txt.

Does robots.txt prevent pages from showing in Google?

No, robots.txt prevents crawling, not indexing. Google can still index a URL it has discovered through external links, even if it can't crawl the page contents. To prevent indexing, use a meta robots tag (noindex) or an HTTP X-Robots-Tag header. Don't combine that with a robots.txt Disallow for the same page: the page must be crawlable for the noindex to be read, so blocking it in robots.txt hides the very signal you are relying on.
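For illustration, the two page-level mechanisms look like this, either a tag in the page's HTML head or a header sent with the response:

<meta name="robots" content="noindex">
X-Robots-Tag: noindex

The meta tag only works for HTML pages; the X-Robots-Tag header is set by the server and also covers non-HTML files such as PDFs.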

How do I block AI crawlers without blocking search engines?

Add per-user-agent rules. Block GPTBot, Google-Extended, ClaudeBot, anthropic-ai, CCBot, PerplexityBot, and Bytespider individually with their own User-agent and Disallow lines, while leaving the catch-all User-agent: * with full access. This blocks AI training without affecting Googlebot's search-indexing crawl.

What is the wildcard syntax in robots.txt?

The * wildcard matches any sequence of characters; $ matches end-of-URL. Disallow: /*.pdf$ blocks all URLs ending in .pdf. Disallow: /search?* blocks all URLs starting with /search?. Both Googlebot and Bingbot support wildcards; older or simpler crawlers may not, so don't rely on wildcards alone for sensitive paths.
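A sketch using both wildcards (the paths are placeholders):

User-agent: *
Disallow: /*.pdf$
Disallow: /search?*

Since robots.txt rules already match by prefix, the trailing * in /search?* is redundant but harmless; Disallow: /search? behaves the same for crawlers that follow the standard.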

Should I include a sitemap in robots.txt?

Yes, if you have one. The Sitemap directive points crawlers directly to your sitemap.xml, which speeds up the discovery of new pages. The line is simply: Sitemap: https://yoursite.com/sitemap.xml. Multiple sitemaps are allowed; each gets its own line.
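For example, a site that splits its sitemap would list each one (the URLs are placeholders):

Sitemap: https://yoursite.com/sitemap-pages.xml
Sitemap: https://yoursite.com/sitemap-posts.xml

Sitemap lines aren't tied to a User-agent group, so they can appear anywhere in the file.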
