
Robots.txt

Definition

A text file that tells search engine crawlers which pages they can and can't crawl on your website. Important for controlling which parts of your site Google and other search engines visit.

What is Robots.txt?

Robots.txt is a simple text file in your website's root folder (yoursite.com/robots.txt) that gives instructions to search engine crawlers about which parts of your site they can visit.

How It Works

When a crawler visits your site, it first checks robots.txt for instructions. You can tell it:

  • Which pages/folders to avoid
  • Which pages/folders to crawl
  • Where your sitemap is located

Basic Syntax

User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
  • User-agent: Which crawler (* means all)
  • Disallow: Pages to avoid
  • Allow: Pages to crawl (a more specific Allow can override a broader Disallow, e.g. permitting one file inside a blocked folder)
  • Sitemap: Your sitemap location
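The directives above can be exercised with Python's standard-library urllib.robotparser; a minimal sketch parsing the example file (the yoursite.com URLs are the placeholders from the snippet above):

```python
from urllib import robotparser

# The example robots.txt from above, split into lines for the parser.
rules = """\
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Disallowed folder: a well-behaved crawler must skip this URL.
print(parser.can_fetch("*", "https://yoursite.com/private/page.html"))  # False
# Allowed folder: crawling is fine.
print(parser.can_fetch("*", "https://yoursite.com/public/page.html"))   # True
# Sitemap locations declared in the file (Python 3.8+).
print(parser.site_maps())  # ['https://yoursite.com/sitemap.xml']
```

This is the same check a polite crawler performs before requesting any page.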

Common Uses

Block Admin Areas

Disallow: /admin/
Disallow: /wp-admin/

Block Search Results

Disallow: /search/

Block Staging Sites

User-agent: *
Disallow: /
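Disallow: / matches every path, so the staging rule above tells compliant crawlers to skip the whole site. A quick check with Python's standard-library urllib.robotparser (staging.yoursite.com is a hypothetical staging host):

```python
from urllib import robotparser

staging_rules = """\
User-agent: *
Disallow: /
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(staging_rules)

# Every URL on the staging host is off-limits to compliant crawlers.
print(parser.can_fetch("*", "https://staging.yoursite.com/"))          # False
print(parser.can_fetch("*", "https://staging.yoursite.com/any/page"))  # False
```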

Important Warnings

Not Security

Robots.txt is a suggestion, not a lock. Malicious bots ignore it, and anyone can read the file to see exactly which paths you tried to hide. Don't use it to hide sensitive content; use authentication or a noindex directive instead.

Blocking Too Much

Accidentally blocking important pages is common. Always check that your robots.txt isn't blocking content you want indexed. Keep in mind that robots.txt controls crawling, not indexing: a blocked page can still appear in search results (without a description) if other sites link to it.

Checking Your Robots.txt

Visit yoursite.com/robots.txt to see your current file. Google Search Console also includes a robots.txt report showing how Google fetched and parsed your file.
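Because robots.txt always lives at the root of the scheme and host, its location can be derived from any page URL. A small sketch using Python's standard-library urllib.parse (robots_txt_url is a hypothetical helper name, not part of any library):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Derive the robots.txt location for the site hosting page_url.

    robots.txt sits at the root of the scheme + host, regardless
    of which page on the site you start from.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://yoursite.com/blog/post?id=7"))
# https://yoursite.com/robots.txt
```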

Want to Learn More?

Check out our in-depth guides on web design, SEO, and digital marketing.