The most common approach is to use a 'robots.txt' file in the root directory of your site.
This file is normally created with a simple text editor that does not insert any formatting, such as Microsoft's Notepad. It follows a simple format, with each record made up of lines of the form:
[field]:[value]
There are two field types: 'User-agent', which specifies the spider you wish the following rule(s) to apply to, and 'Disallow', which specifies content that spider is not allowed to retrieve. The * symbol acts as a wildcard in the User-agent field to match all robots (many crawlers also treat it as a wildcard in Disallow paths, although that is an extension to the original standard).
The following allows all robots to visit all files, because the wildcard "*" specifies all robots and the empty Disallow prohibits nothing:
User-agent: *
Disallow:

This one keeps all robots out:
User-agent: *
Disallow: /

The next one bars all robots from the cgi-bin and images directories:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

This one bans Roverdog from all files on the server:
User-agent: Roverdog
Disallow: /

This one keeps googlebot from getting at the cheese.htm file:
User-agent: googlebot
Disallow: /cheese.htm
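If you want to check how a crawler will interpret rules like these, Python's standard urllib.robotparser module can parse them for you. Here is a minimal sketch; the example.com URLs are placeholders, and the rules are fed in directly so no network access is needed:

import urllib.robotparser

# Feed the last example above straight into the parser.
rules = [
    "User-agent: googlebot",
    "Disallow: /cheese.htm",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# can_fetch(user_agent, url) returns True if the named robot may fetch the URL.
print(rp.can_fetch("googlebot", "https://www.example.com/cheese.htm"))  # False
print(rp.can_fetch("googlebot", "https://www.example.com/index.html"))  # True

In a live setup you would instead point the parser at your own file with set_url() and read().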
Why not visit my main site for more tips and hints?