What is Robots.txt?
Robots.txt is a plain-text file stored in the root directory of a website (e.g. https://example.com/robots.txt) that tells well-behaved bots which URLs they may fetch and which they should avoid. Webmasters restrict areas such as staging directories with duplicate content by pairing User-agent rules with Disallow directives. Not every bot respects robots.txt, and the file itself is publicly readable, so it should not be relied on to protect sensitive files; disallowed URLs can still appear in search results if other pages link to them, making meta robots tags or authentication the right tool for truly private content. The real value of a well-configured robots.txt is preserving crawl budget, and using it alongside a detailed XML sitemap helps guide search engines toward a site's most important content.
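A well-behaved crawler consults robots.txt before requesting a URL. As a minimal sketch of that check, Python's standard-library urllib.robotparser can fetch and evaluate the file; the bot name MyCrawler and the example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (example.com is a placeholder domain).
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A polite crawler asks before fetching; "MyCrawler" is a hypothetical user agent.
for url in ("https://example.com/public/page.html",
            "https://example.com/private/draft.html"):
    if rp.can_fetch("MyCrawler", url):
        print(f"Allowed: {url}")
    else:
        print(f"Blocked: {url}")
```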
Common robots.txt rules include disallowing /private/ for all user agents, explicitly allowing Googlebot access to /public/, and setting a Crawl-delay for crawlers that support it (a non-standard directive that Googlebot ignores), as shown in the sketch below.
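Combined into a single file, those rules might look like the following sketch; the paths, the Bingbot entry, and the 10-second delay are illustrative:

```
# Block all crawlers from the staging area
User-agent: *
Disallow: /private/

# Explicitly allow Googlebot to crawl the public section
User-agent: Googlebot
Allow: /public/

# Ask crawlers that honor Crawl-delay (e.g. Bingbot) to pause between requests
User-agent: Bingbot
Crawl-delay: 10

# Point crawlers at the XML sitemap
Sitemap: https://example.com/sitemap.xml
```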
Related terms: crawl rules, user-agent, sitemap, meta robots