FREE
+55 XP
robots.txt Deep Dive
Googlebot arrives at a new site
"The very first thing I do at any site is read /robots.txt. No file? I crawl everything. File exists? I follow its rules. This isn't a request โ it's a protocol."
robots.txt is a plain text file at the site root (https://example.com/robots.txt) that controls which sections search bots can access.

robots.txt Directives
| Directive | What it does | Example |
|---|---|---|
| User-agent | Which bot the rule applies to | `User-agent: *` (all bots) |
| Disallow | Blocks crawling of a path | `Disallow: /admin/` |
| Allow | Permits a sub-path inside a Disallowed area | `Allow: /admin/public/` |
| Crawl-delay | Pause between bot requests in seconds (ignored by Googlebot) | `Crawl-delay: 2` |
| Sitemap | Points to the XML sitemap | `Sitemap: https://site.com/sitemap.xml` |
Real-World robots.txt Example

```
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /admin/assets/

User-agent: Googlebot
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
```
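You can test rules like these locally with Python's standard-library `urllib.robotparser`. One caveat to hedge: Python's parser applies the *first* matching rule in file order, while Google uses the *longest* matching rule, so in this sketch the more specific `Allow` line is placed before the `Disallow` it carves out of.

```python
from urllib.robotparser import RobotFileParser

# Rules adapted from the example above. Python's RobotFileParser is
# order-sensitive (first match wins), so Allow comes before Disallow here.
rules = """\
User-agent: *
Allow: /admin/assets/
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/"))         # False
print(rp.can_fetch("*", "https://example.com/admin/assets/"))  # True
print(rp.can_fetch("*", "https://example.com/products/"))      # True
```

Handy for a pre-launch sanity check, but always confirm against Google's own tester, since matching semantics differ.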
Wildcards: * and $

- `*` matches any sequence of characters: `Disallow: /*.pdf$` blocks all PDF files
- `$` anchors the end of the URL: `Disallow: /search$` blocks /search but not /search/results
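These two wildcards map directly onto regular expressions. A minimal sketch of the translation (the helper function name is ours, not part of any standard library):

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert a robots.txt path pattern to a compiled regex (illustrative helper)."""
    anchored = pattern.endswith("$")          # '$' anchors the end of the URL
    core = pattern[:-1] if anchored else pattern
    # Escape regex metacharacters, then restore '*' as "any character sequence".
    body = re.escape(core).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

print(bool(robots_pattern_to_regex("/*.pdf$").match("/files/report.pdf")))  # True
print(bool(robots_pattern_to_regex("/search$").match("/search")))           # True
print(bool(robots_pattern_to_regex("/search$").match("/search/results")))   # False
```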
Common Mistakes

| Mistake | Consequence |
|---|---|
| Blocking /static/ or /css/ with Disallow | Googlebot can't render the page, so it treats it as broken |
| Thinking Disallow = noindex | The page can still be indexed via external links |
| Forgetting to block parameterized URLs | Duplicates like ?sort=&page= waste crawl budget |
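For the last row, a typical fix looks like this (illustrative only: the parameter names `sort` and `page` are from the example above and must match your own site's URLs):

```
User-agent: *
# Block faceted/paginated duplicates (parameter names are site-specific)
Disallow: /*?sort=
Disallow: /*&page=
```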
Alex checks robots.txt in GSC
"Google Search Console โ Settings โ robots.txt Tester. Enter a URL and instantly see: blocked or not. An essential tool before any launch."
robots.txt checklist:
✅ File is accessible at /robots.txt
✅ CSS, JS and images are NOT blocked
✅ /admin/, /cart/, /checkout/ are blocked
✅ Sitemap directive points to the current XML
✅ Tested in GSC robots.txt Tester
✅ Disallow ≠ noindex: important content not hidden via robots
⚠️ Remember: Disallow blocks CRAWLING, not INDEXING. To remove a page from the index, use a noindex meta tag, but keep the page crawlable: otherwise the bot will never see the noindex.

🎮 Test yourself: which robots.txt directive blocks crawling?
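In practice, the noindex approach is a single tag in the page's `<head>` (standard robots meta tag; the page itself must not be Disallowed in robots.txt):

```html
<!-- The bot must be able to crawl this page to see the tag -->
<meta name="robots" content="noindex">
```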
Lesson Task
Test your knowledge and earn +20 XP