We just wanted to update everyone on our LinkInTxt crawl that began Friday evening. So far so good, but we’re still going. We had quite a few big sites opt into the program and our lwbot is making his way through them as fast as he can.

We’ll let everyone know once the crawl is complete. Finishing it will also help us spot improvements we can make for the next crawl. In the meantime, be sure to use your robots.txt file to guide our lwbot through your site however you see fit. Here is part of the email we sent Friday on how to set up your robots.txt:

You can direct our lwbot in your robots.txt file with something like this (this example excludes both directories and individual files):

User-Agent: lwbot
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /admin/login.php
Disallow: /filename.html
Disallow: /html/secure.php

You can look up any reference on robots.txt formatting to tailor the file to your needs. Here is some good documentation:

http://www.robotstxt.org/wc/robots.html
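
If you want to sanity-check your rules before the crawl reaches your site, one option is to run them through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser` module with the sample rules from the email (the single-file rule is written here with a leading slash). The names `RULES` and `is_allowed` are made up for this illustration, and Python's parser is only one interpretation of the format; it is not necessarily how our lwbot reads your file.

```python
from urllib.robotparser import RobotFileParser

# Sample rules from the email above. RULES is a hypothetical name
# used only for this illustration.
RULES = """\
User-Agent: lwbot
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /admin/login.php
Disallow: /filename.html
Disallow: /html/secure.php
"""


def is_allowed(path, agent="lwbot", rules=RULES):
    """Return True if `agent` may fetch `path` under `rules`.

    This reflects how Python's standard parser reads the rules,
    which is an assumption about crawler behavior, not a guarantee.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/", "/images/logo.png", "/admin/login.php", "/blog/post.html"):
        status = "allowed" if is_allowed(path) else "blocked"
        print(f"{path}: {status}")
```

Note that `/admin/login.php` is already covered by the `/admin/` rule, so listing it separately is harmless but redundant. Rules under `User-Agent: lwbot` apply only to that bot; other crawlers are unaffected unless you add rules for them as well.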