robots.txt Generator

Build a custom robots.txt file with presets for popular platforms

All generation happens in your browser; no data is sent to a server

Last updated: March 2026


robots.txt Preview

User-agent: *
Allow: /

What is robots.txt?

The robots.txt file is a plain text file placed at the root of your website (e.g., example.com/robots.txt) that tells web crawlers which pages or sections of your site they should or should not access. It follows the Robots Exclusion Protocol, a convention first proposed in 1994 and formally standardized as RFC 9309 in 2022, which has become a foundational component of how search engines interact with websites.

When Googlebot, Bingbot, or any other web crawler visits your site, the very first thing it does is check for a robots.txt file. The directives in this file act as a guideline for crawler behavior. You can specify rules per user-agent (targeting specific crawlers), allow or disallow access to directories and files, set crawl delay intervals, and point crawlers to your XML sitemap. It is important to understand that robots.txt is advisory, not a security mechanism. Well-behaved crawlers like Googlebot respect these directives, but malicious bots may ignore them entirely.
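A small illustrative file shows how these pieces fit together (the paths and sitemap URL are made up for the example):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/public/

# Rules for one specific crawler
User-agent: Bingbot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```

When Allow and Disallow rules overlap, Google applies the most specific (longest) matching rule, so /admin/public/ remains crawlable here even though /admin/ is blocked.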

A properly configured robots.txt is essential for managing your site's crawl budget, preventing indexation of sensitive or low-value pages, and ensuring search engines focus their resources on your most important content. Misconfigurations in robots.txt are among the most common and damaging technical SEO errors, capable of accidentally blocking entire sections of a site from Google's index.

How to Use This Tool

Build a custom robots.txt file with an intuitive visual interface:

  1. Start with a preset - Click one of the quick preset buttons to load common configurations for WordPress, e-commerce sites, or standard setups. Presets provide a solid starting point that you can customize further.
  2. Configure user-agent rules - Each rule group targets a specific crawler (use "*" for all crawlers). Add Allow or Disallow directives for specific paths. For example, disallow /admin/ and /private/ while allowing everything else.
  3. Add multiple rule groups - Click "+ Add User-Agent" to create rules for specific crawlers. You might allow Googlebot full access while restricting other bots from resource-intensive pages like search results or filtered product listings.
  4. Set crawl delay - Optionally specify a delay in seconds between successive crawler requests. This protects servers with limited resources from aggressive crawling, though note that Google ignores Crawl-delay and manages its own crawl rate automatically.
  5. Add sitemap URLs - Point crawlers to your XML sitemaps. This is one of the most effective ways to ensure search engines discover all your important pages.
  6. Review and download - Check the live preview, then copy the output or download the robots.txt file and upload it to your website's root directory.
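Put together, the steps above might yield a file like the following (every path, bot name grouping, and URL here is illustrative):

```
User-agent: *
Disallow: /search/
Disallow: /private/

User-agent: Googlebot
Allow: /

User-agent: AhrefsBot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-posts.xml
```

Upload the finished file so it is served at /robots.txt on your domain root.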

Why robots.txt Matters for SEO

The robots.txt file is a critical lever for controlling how search engines interact with your website. Its impact on SEO performance extends across several key areas:

Crawl budget management: Google allocates a finite crawl budget to each website based on the site's size, health, and importance. Wasting crawl budget on low-value pages (admin panels, search result pages, staging environments, parameter-heavy URLs) means important pages get crawled less frequently. A well-configured robots.txt directs crawlers away from waste and toward your revenue-generating and content-rich pages.
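For instance, internal-search and parameter-heavy URLs can be excluded with wildcard patterns; the * and $ wildcards are supported by Google and Bing and are part of RFC 9309 (patterns below are illustrative):

```
User-agent: *
# Block internal search result pages
Disallow: /search
# Block URLs containing sort or filter query parameters
Disallow: /*?sort=
Disallow: /*?filter=
# Block URLs ending in .pdf ($ anchors the match to the end of the URL)
Disallow: /*.pdf$
```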

Preventing sensitive content indexation: robots.txt disallow directives do not prevent indexation on their own (Google can still index a URL it discovers through links even if crawling is blocked), but they reduce the likelihood of unwanted pages appearing in search results. For reliable noindex behavior, use the noindex meta tag or X-Robots-Tag HTTP header instead, and make sure the page is not blocked in robots.txt, because crawlers must be able to fetch the page to see the directive.
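The two noindex mechanisms look like this (illustrative snippets; the header can be set by any server, the nginx directive shown is just one way to do it):

```
# Option 1: meta tag in the page's HTML <head>
<meta name="robots" content="noindex">

# Option 2: HTTP response header, e.g. in an nginx config
add_header X-Robots-Tag "noindex";
```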

Server resource protection: Aggressive crawling can impact site performance, especially for dynamic sites with resource-intensive pages like faceted search or real-time data. By blocking crawlers from these heavy endpoints, you reduce server load and keep response times fast for real users, while preventing search engines from spending time on pages that provide no SEO value.

Sitemap discovery: Including your sitemap URL in robots.txt is recognized as a best practice by Google, Bing, and Yandex. It provides an additional discovery mechanism beyond Search Console sitemap submissions, ensuring new and updated content is found and indexed as quickly as possible.
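Sitemap directives take absolute URLs, sit outside any user-agent group, and can be repeated for multiple sitemaps (URLs below are illustrative):

```
Sitemap: https://example.com/sitemap-index.xml
Sitemap: https://example.com/news-sitemap.xml
```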

FAQ

Does robots.txt prevent pages from being indexed?

No. Robots.txt only controls crawling, not indexing. If Google discovers a URL through backlinks or sitemaps but cannot crawl it due to robots.txt, it may still index the URL with limited information (showing just the URL and possibly anchor text in search results). To prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header on the page itself, and keep that page crawlable in robots.txt so crawlers can actually see the directive.

Does Google respect the Crawl-delay directive?

No. Google does not support the Crawl-delay directive in robots.txt; Googlebot adjusts its crawl rate automatically based on how your server responds, and the legacy crawl rate limiter in Google Search Console has been retired. However, other crawlers like Bingbot, YandexBot, and some SEO tool bots do respect Crawl-delay. If your server has limited resources, setting a crawl delay for non-Google bots can reduce server load significantly.
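For example, to ask non-Google crawlers to wait five seconds between requests (the exact interpretation of the value varies slightly by crawler):

```
# Ignored by Googlebot; honored by Bingbot and YandexBot
User-agent: Bingbot
Crawl-delay: 5

User-agent: YandexBot
Crawl-delay: 5
```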

Where should I place the robots.txt file?

The robots.txt file must be located at the root of your domain: https://example.com/robots.txt. It must be accessible via HTTP/HTTPS and return a 200 status code. A robots.txt in a subdirectory (like /blog/robots.txt) is ignored by crawlers. If you use subdomains, each subdomain needs its own robots.txt file (e.g., blog.example.com/robots.txt).

What happens if robots.txt returns a 404 or 500 error?

If robots.txt returns a 404 (not found), crawlers assume there are no restrictions and crawl the entire site freely. If it returns a 5xx server error, Google treats it as a temporary failure and may limit crawling until the file becomes accessible again. A persistent 5xx error can significantly reduce your site's crawl rate, potentially delaying indexation of new content for days or weeks.