FREE
+50 XP
How Search Engines Find Pages
🤖
Googlebot: the search crawler
"Hi! I'm Googlebot. Every day I crawl billions of pages across the web. Want to know how I find your site and decide what makes it into search?"
💡 Crawling: the process of a search robot following links and downloading page content. Indexing: analyzing those pages and storing them in the search database.
How Does Googlebot Find Pages?
1. Links: follows links from already known pages
2. Sitemap: reads the XML sitemap
3. Submit URL: manual submission via Search Console
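The sitemap route is just an XML file listing the URLs you want crawled. A minimal sketch, where `example.com` and the dates are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want the bot to discover -->
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/products/shoes</loc>
    <lastmod>2024-01-10</lastmod>
  </url>
</urlset>
```

Upload the file to your site root and submit its URL in Search Console.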
Three Conditions for Page Indexing
| Condition | How to check |
|---|---|
| ✅ The page has inbound links | Ahrefs / GSC → Internal links report |
| ✅ No Disallow in robots.txt | Check robots.txt in GSC |
| ✅ No noindex tag on the page | Look for `<meta name="robots" content="noindex">` in the page source |
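The third check can be automated. A minimal sketch that scans a page's HTML for a robots meta tag containing `noindex` (a simplified regex check; a production audit would use a real HTML parser):

```python
import re

def has_noindex(html: str) -> bool:
    """Return True if the page has a <meta name="robots"> tag with 'noindex'."""
    # Look at each meta tag individually so attribute order doesn't matter.
    for tag in re.findall(r"<meta[^>]+>", html, flags=re.IGNORECASE):
        if re.search(r'name=["\']robots["\']', tag, re.IGNORECASE) and \
           re.search(r'content=["\'][^"\']*noindex', tag, re.IGNORECASE):
            return True
    return False

# has_noindex('<meta name="robots" content="noindex, nofollow">') -> True
# has_noindex('<meta name="robots" content="index, follow">')     -> False
```

Note that Google also honors an `X-Robots-Tag: noindex` HTTP header, which this HTML-only check would miss.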
Crawl Budget
Google cannot crawl your site without limit. Each site is allocated a crawl budget: the number of pages the bot will visit per session. For large sites, it's critical to spend that budget on the right pages.
How to avoid wasting crawl budget:
✅ Block filter/sort pages in robots.txt
✅ Fix 404 pages with 301 redirects
✅ Exclude duplicates via canonical tags
✅ Avoid infinite URLs with parameters
✅ Submit an XML sitemap in GSC
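The first tip, blocking filter and sort pages, can look like this in robots.txt (a sketch; the `sort` and `filter` parameter names are placeholders for whatever your site actually uses, and `example.com` is hypothetical):

```
User-agent: *
# Keep the bot out of parameter-generated filter/sort pages
Disallow: /*?sort=
Disallow: /*?filter=

Sitemap: https://example.com/sitemap.xml
```

Google supports the `*` wildcard in Disallow rules, so one line can cover every URL carrying that parameter.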
🧑‍💻
Alex audits a client's site
"Look โ you have 50,000 filter pages in the index. Googlebot burns its entire budget on them and never reaches the important product pages. That's why you have no traffic."
🎯 Remember: crawling ≠ indexing. The bot can visit a page and not add it to the index (if the content is weak or there's a noindex tag). Track both metrics in Google Search Console.
🎮 Test yourself: select the conditions required for a page to be indexed!
Lesson Task
Test your knowledge and earn +20 XP