robots.txt Explained: Control What Google Crawls

The robots.txt file lives at the root of your domain and is the first thing most crawlers check. It tells them which parts of your site they may or may not request. Used well, it protects crawl budget; used carelessly, it can hide pages you want ranked.

What robots.txt does

The file grants or restricts crawling per user-agent: each group of rules names a crawler and lists the paths it may or may not request. See the full robots.txt definition. It does not control indexing, and that distinction is critical.

Crawling is not indexing

Blocking a page in robots.txt stops compliant crawlers from fetching it, but the URL can still be indexed and appear in search results if other sites link to it. To keep a page out of results, use a meta robots noindex tag and leave the page crawlable: if robots.txt blocks the page, Google never fetches it and so cannot see the noindex.
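
The conflict described above can be checked mechanically. The following is a minimal sketch using only Python's standard library; the function name noindex_can_take_effect, the class name MetaRobotsFinder, and the example.com URLs are hypothetical and chosen for illustration. It assumes the noindex is delivered through a meta robots tag in the HTML, not an HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsFinder(HTMLParser):
    """Records whether the page has <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name == "robots" and "noindex" in content:
            self.noindex = True


def noindex_can_take_effect(robots_txt, page_url, html, user_agent="Googlebot"):
    """True only if the page carries noindex AND the crawler is allowed to fetch it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    finder = MetaRobotsFinder()
    finder.feed(html)

    # A noindex tag on a blocked page is never read by the crawler.
    return finder.noindex and rules.can_fetch(user_agent, page_url)


# Hypothetical inputs: the page has noindex, but robots.txt blocks /private/.
page = '<html><head><meta name="robots" content="noindex"></head></html>'
blocking_rules = "User-agent: *\nDisallow: /private/\n"
print(noindex_can_take_effect(blocking_rules, "https://example.com/private/a.html", page))
```

A False result for a page that does carry noindex suggests that the robots.txt rule is hiding the tag from the crawler.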

Basic syntax

User-agent: — which crawler the rules apply to (* = all).
Disallow: — paths not to request.
Allow: — exceptions inside a disallowed path.
Sitemap: — the absolute URL of your XML sitemap.
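
As a rough illustration of how these four directives fit together, the sketch below feeds a small sample file to Python's standard urllib.robotparser module and asks which URLs a crawler may request. The example.com domain and the /admin/ paths are made up for the example. This parser applies the first matching rule in file order, whereas Google applies the most specific (longest) matching rule; the Allow line is placed before the Disallow line here so that both approaches give the same answer. The site_maps() call assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the domain and paths are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt):
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a crawler identifying as "Googlebot" may request each path.
for path in ("/blog/post", "/admin/settings", "/admin/public/help.html"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "allowed" if allowed else "blocked")

# List the sitemap URLs declared in the file.
print(parser.site_maps())
```

With this sample, /blog/post and /admin/public/help.html are reported as allowed and /admin/settings as blocked.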

Common mistakes

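No robots.txt at all: crawlers then assume the whole site may be crawled (see missing robots.txt).
Syntax errors: a misspelled directive or a missing colon can cause a rule to be ignored (see robots.txt syntax errors).
Blocking pages that should rank: an overly broad Disallow rule can shut crawlers out of URLs you want in search results.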

Build a correct file

Avoid syntax slips with our free Robots.txt Generator — set rules and copy a valid file. To validate your live robots.txt and sitemap together, run a free atlookup audit.

FAQ

Where does robots.txt go?

At the domain root, reachable at https://yoursite.com/robots.txt.
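
Because the file is looked up per host, a subdomain such as shop.yoursite.com needs its own robots.txt. A minimal sketch of deriving the robots.txt location from any page URL, using Python's standard urllib.parse module (the helper name robots_txt_url and the sample URLs are hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yoursite.com/blog/post?id=1"))
print(robots_txt_url("https://shop.yoursite.com/cart"))
```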

Does Google honor crawl-delay?

No. Google ignores the Crawl-delay directive, but Bing and some other crawlers respect it.
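
For reference, a Crawl-delay rule is written inside a user-agent group. The sketch below, which uses a made-up sample file, reads the value back with Python's standard urllib.robotparser. It only shows what the file declares; whether a given crawler honors the delay is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a 10-second delay requested from bingbot only.
SAMPLE_ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

delay_rules = RobotFileParser()
delay_rules.parse(SAMPLE_ROBOTS_TXT.splitlines())

# crawl_delay() returns the declared delay in seconds, or None if none applies.
print("bingbot:", delay_rules.crawl_delay("bingbot"))
print("Googlebot:", delay_rules.crawl_delay("Googlebot"))
```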