Of all the SEO tips, tricks, and tutorials available to you, probably the easiest to put into practice is the robots.txt file. This simple file gives search engine robots, or spiders, instructions on how to crawl your website, including which files and directories to stay out of and keep out of their databases.
In an earlier tutorial, Clint Dixon showed you how to write a robots.txt file and what information to include, such as the User-agent and Disallow directives that tell search engine spiders how to crawl your site. In this article, I want to build on what he showed you and explain why the robots.txt file matters to your SEO efforts, along with some of the consequences of not having one, or of having one that is written incorrectly.
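As a quick refresher on those two directives, here is a minimal sketch in Python that uses the standard library's urllib.robotparser module to read a small robots.txt and report what a spider may fetch. The file contents, the bot name "ExampleBot", and the example.com URLs are all made up for illustration; they are not taken from the earlier tutorial.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed spider name; it falls under the "*" record.
allowed_home = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
allowed_private = parser.can_fetch("ExampleBot", "http://www.example.com/private/notes.html")

print("index.html allowed:", allowed_home)
print("private/notes.html allowed:", allowed_private)
```

A well-behaved spider performs essentially this check before requesting a page, which is why a mistake in the Disallow lines can have such a large effect on what gets crawled.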
Behavior of Search Engines When Encountering Robots.txt
Search engines behave differently depending on whether or not they find a robots.txt file during a crawl. You only have to follow your web stats to know that robots.txt is one of the files most often requested by search engine spiders. Many spiders check the robots.txt file before ever performing a crawl, and some visit only to check for the presence of the file and the commands in it, then leave and come back another day. In the absence of this file, many search engine spiders will still crawl your website, but you will find that even more will simply leave without indexing. While this is often seen as one of the most extreme consequences of leaving out the robots.txt file, I will also show you consequences that I consider to be far worse.
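If your stats package does not break out robots.txt requests by spider, a rough sketch like the following Python snippet can do it from the raw access log. It assumes the common "combined" log format; the sample lines, IP addresses, and bot names are invented for illustration, and the function name robots_requests_by_agent is just a placeholder.

```python
import re
from collections import Counter

# Hypothetical access-log lines in "combined" format; every value is made up.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [01/Jan/2024:00:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '192.0.2.20 - - [01/Jan/2024:00:05:00 +0000] "GET /robots.txt HTTP/1.1" 404 0 "-" "OtherBot/2.1"',
    '192.0.2.10 - - [02/Jan/2024:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"',
]

# Pulls the request path, status code, and user agent out of one log line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_requests_by_agent(lines):
    """Count requests for /robots.txt, grouped by user-agent string."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("path") == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts


counts = robots_requests_by_agent(SAMPLE_LOG)
for agent, total in counts.most_common():
    print(agent, total)
```

Comparing these counts with each spider's requests for ordinary pages gives a rough idea of which spiders check the file and then leave, and which go on to crawl.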
Some of the major search engine spiders and robots show distinct behavior patterns after reading the robots.txt file, and you can track these in your stats. Sometimes, however, it is nice to have an outsider’s perspective on robot behavior to compare with what you may have noticed yourself. I view a lot of robots.txt files, and a lot of sites with and without them, so I’ve been able to identify a few behavior patterns that I would like to share with you.