robots.txt Explained: How to Control What Google Crawls
The robots.txt file lives at the root of your domain and is usually the first file a well-behaved crawler requests. It tells crawlers which parts of your site they may or may not request. Used well, it protects crawl budget; used carelessly, it can hide pages you want ranked.
What robots.txt does
It allows or blocks crawling of URL paths, with rules grouped by user-agent (the crawler's name). It does not control indexing, and that distinction is critical. See the full robots.txt definition.
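Grouping by user-agent has a consequence that often surprises people: a crawler follows only the group that matches it, not that group plus the * group. In the sketch below, Googlebot is therefore allowed into /private/, because the * rules do not apply to it. It uses Python's standard urllib.robotparser to evaluate the file; example.com and both paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Two groups: one for Googlebot, one for every other crawler (placeholder paths).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /experiments/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each crawler is checked against the single group that matches its name.
for agent in ("Googlebot", "Bingbot"):
    for path in ("/experiments/a", "/private/x"):
        verdict = parser.can_fetch(agent, "https://example.com" + path)
        print(f"{agent:9} {path:15} {'allowed' if verdict else 'blocked'}")
```

If you want Googlebot to respect the /private/ rule too, repeat that Disallow line inside the Googlebot group.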
Crawling is not indexing
Blocking a page in robots.txt stops compliant crawlers from fetching it, but the URL can still appear in search results, usually without a description, if other sites link to it. To keep a page out of results, use a meta robots noindex tag, and do not block that page in robots.txt: if Google can't crawl the page, it never sees the noindex.
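The noindex-versus-Disallow conflict can be checked mechanically. The following is a minimal sketch using only Python's standard library (html.parser and urllib.robotparser); noindex_is_hidden and RobotsMetaFinder are hypothetical names, and the check assumes the noindex lives in a <meta name="robots"> tag. It does not look at the X-Robots-Tag HTTP header or crawler-specific tags such as name="googlebot".

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def noindex_is_hidden(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> bool:
    """Hypothetical check: True if the page asks for noindex but robots.txt blocks the crawl."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    wants_noindex = "noindex" in finder.directives

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = not parser.can_fetch(agent, url)
    return wants_noindex and blocked


robots = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(noindex_is_hidden(robots, "https://example.com/private/page", page))
print(noindex_is_hidden(robots, "https://example.com/public/page", page))
```

A True result means the fix is to remove the Disallow rule for that URL so the noindex can be read.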
Basic syntax
- User-agent: which crawler the rules apply to (* means all crawlers).
- Disallow: paths crawlers should not request.
- Allow: exceptions inside a disallowed path.
- Sitemap: the absolute URL of your XML sitemap.
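Here is a small file that uses all four directives, checked with Python's standard urllib.robotparser; example.com and the paths are placeholders. One caveat about the checker: urllib.robotparser applies the first matching rule in file order and does not implement the * and $ path wildcards, whereas Google applies the most specific (longest) matching rule. Listing the Allow exception before the broader Disallow makes both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Example file using all four directives (example.com is a placeholder domain).
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# /admin/help/ is carved out of the blocked /admin/ section by the Allow rule.
for path in ("/admin/settings", "/admin/help/faq", "/blog/post"):
    allowed = parser.can_fetch("MyCrawler", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")

# Sitemap lines sit outside any user-agent group; site_maps() needs Python 3.8+.
print("sitemaps:", parser.site_maps())
```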
Common mistakes
- No robots.txt at all — see missing robots.txt.
- Syntax errors — see robots.txt syntax errors.
- Blocking pages that should rank — see this fix.
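Two of these mistakes, syntax slips and blocked pages, can be caught with a short script. The sketch below is a hypothetical linter, lint_robots_txt: it flags lines without a colon, fields outside a small allow-list (the list is this sketch's assumption, not a standard), Allow or Disallow lines that appear before any User-agent line, and any URL you supply that the file blocks. A stray Noindex: line, for example, is flagged, since Google does not support noindex in robots.txt. It cannot detect a missing file; that takes an HTTP request to /robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Fields this sketch treats as valid (an assumption; engines differ in what they support).
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str, must_crawl: list[str], agent: str = "Googlebot") -> list[str]:
    """Hypothetical helper: return human-readable problems found in a robots.txt body."""
    problems: list[str] = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' in {raw.strip()!r}")
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unrecognized field {field!r}")
        elif field == "user-agent":
            seen_user_agent = True
        elif field in ("allow", "disallow") and not seen_user_agent:
            problems.append(f"line {number}: {field} appears before any User-agent line")

    # Check the URLs that should rank against the parsed rules.
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    for url in must_crawl:
        if not parser.can_fetch(agent, url):
            problems.append(f"{url} is blocked for {agent}")
    return problems


report = lint_robots_txt(
    "User-agent: *\nDisallow: /\nNoindex: /old-page/\n",
    must_crawl=["https://example.com/pricing"],
)
for problem in report:
    print(problem)
```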
Build a correct file
Avoid syntax slips with our free Robots.txt Generator — set rules and copy a valid file. To validate your live robots.txt and sitemap together, run a free atlookup audit.
FAQ
Where does robots.txt go?
At the domain root, reachable at https://yoursite.com/robots.txt.
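robots.txt rules apply per protocol, host, and port, so https://yoursite.com and https://shop.yoursite.com each need their own file, and a copy placed in a subdirectory is not consulted. A minimal sketch, using a hypothetical helper named robots_txt_url, that derives the governing robots.txt location from any page URL with Python's urllib.parse:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Hypothetical helper: the robots.txt URL for page_url's scheme, host, and port."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus optional port); replace path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yoursite.com/blog/post?id=3#top"))
print(robots_txt_url("https://shop.yoursite.com:8443/cart"))
```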
Does Google honor crawl-delay?
No — Google ignores it, but Bing and some others respect it.
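Crawl-delay sits inside a user-agent group like any other rule. The sketch below shows how a crawler that honors it might read the value, using crawl_delay() from Python's urllib.robotparser; the group for bingbot and the value 10 are illustrative. Crawlers that support the directive typically treat the number as seconds between requests, though interpretations vary.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""
# An empty Disallow value means everything may be crawled.

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the value from the matching group, or None if that group sets none.
print("bingbot:", parser.crawl_delay("bingbot"))
print("Googlebot:", parser.crawl_delay("Googlebot"))
```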