Robots.txt
Definition
A text file that tells search engine crawlers which pages they can and can't access on your website. Important for controlling which parts of your site search engines crawl.
What is Robots.txt?
Robots.txt is a simple text file in your website's root folder (yoursite.com/robots.txt) that gives instructions to search engine crawlers about which parts of your site they can visit.
How It Works
When a crawler visits your site, it first checks robots.txt for instructions. You can tell it:
- Which pages/folders to avoid
- Which pages/folders to crawl
- Where your sitemap is located
Basic Syntax
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
- User-agent: Which crawler the rules apply to (* means all)
- Disallow: Pages or folders the crawler should avoid
- Allow: Pages to crawl (overrides a broader Disallow rule)
- Sitemap: Your sitemap location
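You can check how a crawler would interpret rules like these with Python's standard-library parser. A minimal sketch, using the sample rules above (yoursite.com is the placeholder domain from the example, not a real site):

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt from above, as a list of lines.
lines = """\
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(lines)

# can_fetch(user_agent, url) answers "may this crawler visit this URL?"
print(parser.can_fetch("*", "https://yoursite.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://yoursite.com/public/page.html"))   # True
print(parser.can_fetch("*", "https://yoursite.com/other/"))             # True: no rule matches
print(parser.site_maps())  # ['https://yoursite.com/sitemap.xml']
```

Note how /other/ is allowed by default: robots.txt only restricts what you explicitly disallow.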
Common Uses
Block Admin Areas
Disallow: /admin/
Disallow: /wp-admin/
Block Search Results
Disallow: /search/
Block Staging Sites
User-agent: *
Disallow: /
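A bare "Disallow: /" shuts out every compliant crawler from every path. A quick sketch confirming this with Python's standard-library parser (staging.yoursite.com is a hypothetical staging host):

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" under "User-agent: *" blocks the whole site for all crawlers.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

print(parser.can_fetch("Googlebot", "https://staging.yoursite.com/"))        # False
print(parser.can_fetch("Bingbot", "https://staging.yoursite.com/any/page"))  # False
```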
Important Warnings
Not Security
Robots.txt is a suggestion, not a lock. Well-behaved crawlers follow it, but malicious bots ignore it, and a blocked page can still appear in search results if other sites link to it. Don't use robots.txt to hide sensitive content; use authentication or a noindex directive instead.
Blocking Too Much
Accidentally blocking important pages is common. Always check that your robots.txt isn't blocking content you want indexed.
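One way to catch this early is a small pre-deploy check. A sketch using Python's standard urllib.robotparser (the rules, URLs, and find_blocked helper are hypothetical illustrations):

```python
from urllib.robotparser import RobotFileParser

def find_blocked(robots_lines, urls, user_agent="Googlebot"):
    """Return the URLs that the given crawler may not fetch under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Hypothetical file meant to block /search/, but which also blocks /products/.
robots = """\
User-agent: *
Disallow: /search/
Disallow: /products/
""".splitlines()

# Pages that must stay crawlable.
important = [
    "https://yoursite.com/",
    "https://yoursite.com/products/widget",
]

print(find_blocked(robots, important))  # ['https://yoursite.com/products/widget']
```

Running a check like this against your list of key pages whenever robots.txt changes turns a silent indexing problem into an immediate failure.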
Checking Your Robots.txt
Visit yoursite.com/robots.txt to see your current file. Google Search Console also has a robots.txt tester tool.