โš™๏ธ
Technical SEO
Lesson 15 of 22 ยท Crawl Control

robots.txt Deep Dive

๐Ÿ›ก๏ธ
Googlebot arrives at a new site
"The very first thing I do at any site is read /robots.txt. No file? I crawl everything. File exists? I follow its rules. This isn't a request โ€” it's a protocol."
๐Ÿ“Œ robots.txt โ€” a plain text file at the site root (https://example.com/robots.txt) that controls which sections search bots can access.

robots.txt Directives

Directive | What it does | Example
User-agent | Which bot the rules apply to | User-agent: * (all bots)
Disallow | Blocks crawling of a path | Disallow: /admin/
Allow | Permits a sub-path inside a Disallowed area | Allow: /admin/public/
Crawl-delay | Pause between bot requests in seconds (ignored by Googlebot) | Crawl-delay: 2
Sitemap | Points to the XML sitemap | Sitemap: https://site.com/sitemap.xml

Real-World robots.txt Example

User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /checkout/
Allow: /admin/assets/

User-agent: Googlebot
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
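You can sanity-check rules like these programmatically. A minimal sketch using Python's standard-library urllib.robotparser (note: it applies rules in file order rather than Google's longest-match precedence, so overlapping Allow/Disallow pairs may resolve differently; the Allow line is omitted here for that reason):

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the example robots.txt above, as a list of lines.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/

User-agent: Googlebot
Disallow: /staging/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Generic bots are kept out of /admin/ and /cart/ ...
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
# ... while Googlebot follows only its own group's rules.
print(rp.can_fetch("Googlebot", "https://example.com/staging/v2"))  # False
```

Because groups don't merge, a bot with its own User-agent group ignores the `*` group entirely; if Googlebot should also stay out of /admin/, repeat that Disallow inside its group.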

Wildcards: * and $

  • * โ€” matches any sequence of characters: Disallow: /*.pdf$ blocks all PDF files
  • $ โ€” end of URL: Disallow: /search$ blocks /search but not /search/results

Common Mistakes

Mistake | Consequence
Blocking /static/ or /css/ with Disallow | Googlebot can't render the page and treats it as broken
Thinking Disallow = noindex | The page can still be indexed via external links
Forgetting to block parameterized URLs | Duplicates like ?sort= and ?page= waste crawl budget
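For the last mistake in the table, wildcard patterns handle parameterized duplicates. An illustrative fragment (the parameter names are examples; audit first, since blanket-blocking parameters can also hide content you want crawled):

```
User-agent: *
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?page=
```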
๐Ÿง‘โ€๐Ÿ’ป
Alex checks robots.txt in GSC
"Google Search Console → Settings → robots.txt report shows whether the file was fetched and flags parse errors. To check a specific URL, run it through the URL Inspection tool: it tells you instantly whether crawling is blocked by robots.txt. Essential checks before any launch."
robots.txt checklist:

โ˜ File is accessible at /robots.txt
โ˜ CSS, JS and images are NOT blocked
โ˜ /admin/, /cart/, /checkout/ are blocked
โ˜ Sitemap directive points to the current XML
โ˜ Tested in GSC robots.txt Tester
โ˜ Disallow โ‰  noindex: important content not hidden via robots
โš ๏ธ Remember: Disallow blocks CRAWLING, not INDEXING. To remove a page from the index โ€” use a noindex meta tag, but keep the page crawlable: otherwise the bot won't see the noindex.
๐ŸŽฎ Test yourself: which robots.txt directive blocks crawling?
๐ŸŽฏ
Lesson Task
Test your knowledge and earn +20 XP
โ† Canonical Tags
Lesson 15 of 22
Go to Task โ†’