Google News

Search engine robots check a special file in the root of each server called robots.txt, which is, as you may guess, a plain text file (not HTML). Robots.txt implements the Robots Exclusion Protocol, which lets the web site administrator define which parts of the site are off-limits to specific robot user agent names. For example, web administrators can disallow access to cgi, private and temporary directories because they do not want pages in those areas indexed.
The syntax of this file is obscure to most of us: it tells robots not to look at pages which have certain paths in their URLs. Each section names a user agent (robot) and lists the paths it may not follow. The original protocol has no way to allow a specific directory or to specify a kind of file, although some robots honor extensions such as an Allow line. Remember that robots may access any path in a URL which is not explicitly disallowed in this file: everything not forbidden is OK.
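To make the section layout concrete, here is a minimal sketch that checks a few URLs against a sample file using Python's standard urllib.robotparser module. The robots.txt text, the robot names ExampleBot and OtherBot, and the www.example.com URLs are all made up for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one section for a named robot, one for all others.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(text.splitlines())
    return rules


rules = load_rules(ROBOTS_TXT)
site = "http://www.example.com"  # placeholder host

# ExampleBot matches its own section, so /private/ is off-limits to it.
print(rules.can_fetch("ExampleBot", site + "/private/report.html"))
# Any other robot falls under the "*" section.
print(rules.can_fetch("OtherBot", site + "/cgi-bin/search"))
# Paths that no rule mentions stay open: everything not forbidden is OK.
print(rules.can_fetch("OtherBot", site + "/index.html"))
```

Note that a robot with its own section is judged by that section alone, so in this sample ExampleBot is not bound by the rules listed under "*".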
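The three most common items you will find in a robots.txt file are:
1. Allow (an extension to the original protocol, honored by some robots)
2. Disallow
3. The wildcard or asterisk, "*", which in a User-agent line stands for all robots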
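As a rough illustration of how these three items combine, the sketch below opens one subdirectory inside an otherwise disallowed area. It assumes a robot that honors the Allow extension; Python's standard urllib.robotparser does. The file contents, the robot name AnyBot, and the URLs are hypothetical. The Allow line is listed first so that parsers which apply the first matching rule see it before the broader Disallow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "*" addresses every robot, Allow carves an
# exception out of the disallowed /private/ area.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

site = "http://www.example.com"  # placeholder host

# Covered by the Allow line, so this page may be fetched.
press_ok = rules.can_fetch("AnyBot", site + "/private/press/release.html")
# Covered only by the Disallow line, so this page is off-limits.
notes_ok = rules.can_fetch("AnyBot", site + "/private/notes.html")

print(press_ok, notes_ok)
```

Robots that do not understand Allow may simply ignore that line and treat all of /private/ as forbidden, so it is safest not to rely on it for anything important.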

For more details about robots.txt, visit the following URL:
http://www.seo-news.com/

Saturday, March 25, 2006

Search Indexing Robots and Robots.txt
