The Robots Exclusion Standard - also referred to as the 'Robots
Exclusion Protocol' or, simply, the 'Robots.txt Standard/Protocol'
- is a standard that enables websites to control access to their
content by web robots and crawlers. Why would websites want to limit
access to their content? One reason is that web crawlers can consume
considerable bandwidth and have been known to contribute to a website
exceeding its bandwidth allowance and becoming unavailable. Websites
may also wish to restrict access to certain sections of their site,
such as an image folder, so that search engines cannot place the
site's pictures into their image search databases.
The Robots Exclusion Standard was initiated by Martijn Koster,
who developed the first web search engine, ALIWEB. It has been
claimed that Koster suggested the creation of the Robots Exclusion
Standard after his server was made unavailable by a misbehaving,
'rogue' web crawler. Drawing on his experience developing an early
web crawler, Koster was able to present his Robots Exclusion Standard
proposal to CERN in 1994 - the institution where Tim Berners-Lee had
invented the World Wide Web in the early 1990s. The Robots Exclusion
Standard was adopted by the prominent search engines of 1994-1995,
primarily AltaVista, Yahoo!, Lycos and WebCrawler, and it continues
to be adhered to by prominent search engines such as Google, Bing,
Yahoo! and Yandex.
The Robots Exclusion Standard is referred to as the 'Robots.txt
Protocol' because it uses a file named robots.txt. Each subdomain,
port and protocol needs its own robots.txt file. The robots.txt
syntax is simple: it contains 'Allow' or 'Disallow' directives for
each web crawler, and each web crawler is identified by its
user-agent. The allow and disallow syntax is shown below:
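A minimal sketch of a robots.txt file follows; the /images/ directory
and the Googlebot rule are illustrative assumptions, not taken from
any particular site:

```
# Applies to every crawler whose user-agent is not matched
# by a more specific group below
User-agent: *
Disallow: /images/

# Googlebot is identified by its user-agent; an empty Disallow
# grants it access to the whole site
User-agent: Googlebot
Disallow:
```

Each group starts with a User-agent line naming the crawler it
applies to, followed by the Allow and Disallow rules for that crawler.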
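Such rules can also be checked programmatically. As a sketch, Python's
standard-library urllib.robotparser module evaluates allow/disallow
directives against a user-agent and a URL; the rules and example.com
URLs here are hypothetical:

```python
from urllib import robotparser

# Parse a hypothetical robots.txt that disallows the /images/
# section for every crawler (user-agent '*').
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /images/
""".splitlines())

# can_fetch() reports whether a given user-agent may crawl a URL.
print(rp.can_fetch("ExampleBot", "https://example.com/images/logo.png"))  # False
print(rp.can_fetch("ExampleBot", "https://example.com/index.html"))       # True
```

Well-behaved crawlers perform exactly this kind of check before
requesting a page.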