Most internet search engines use programs known as spiders (also called crawlers or bots) to trawl the internet looking for links and sites to include in their index.
Since the early days of search engines on the internet (certainly since 1996), certain commands have been available to embed within web pages to modify how these spiders behave. It should be noted, however, that not all spiders obey the commands.
There follows a brief list of these commands and the effect they have on spiders:
<meta name="robots" content="index">
This is largely redundant, for all it does is tell the search engine spider to index the page, which it would do by default.
<meta name="robots" content="follow">
This differs from index in that it tells the spider to follow all the links within the page. Again, it is a bit redundant because that is what spiders do anyway!
<meta name="robots" content="noindex">
When not used for legitimate purposes, this tag can be dangerous because it can put you at risk of being penalized by most, if not all, search engines. This is because a noindex tag keeps a page out of the search results while spiders can still follow the links on it, so it can be used to hide pages full of links that you don't want visitors to see but that you do want search engines to follow. There are, however, some legitimate uses for the noindex command. For example, if you have a dynamic site and you've created static pages to replace some of your dynamic pages, which can make them easier for search engine spiders to access, you could put a noindex tag on the dynamic versions so that only the static pages are indexed.
<meta name="robots" content="nofollow">
This tag tells search engine spiders that it's OK to go ahead and index a page and list it, but that they shouldn't follow any of the links on the page. This can be useful if, for example, some partners have requested a link on your site that you feel obliged to give, but you want to hold on to as much PageRank as possible.
Now this is of course between you and your own personal god, but you could in effect have a partners page, add nofollow to its robots meta tag, and pass on none of your PageRank to the sites to which you are linking. The nofollow command in effect tells any search engine that obeys it that this is the end of the line.
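As a rough sketch, such a partners page might look something like the following. The page title and the partner addresses are made up for illustration, and how much weight any given search engine gives the tag is up to that search engine.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Our Partners</title>
  <!-- Ask spiders to index this page but not to follow any link on it -->
  <meta name="robots" content="nofollow">
</head>
<body>
  <h1>Our Partners</h1>
  <!-- Hypothetical partner links; visitors can still click them as normal -->
  <ul>
    <li><a href="https://partner-one.example/">Partner One</a></li>
    <li><a href="https://partner-two.example/">Partner Two</a></li>
  </ul>
</body>
</html>
```

The tag only speaks to spiders. Nothing changes for a human visitor, who sees and can use the links as on any other page.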
<meta name="robots" content="noindex,nofollow">
Obviously, noindex and nofollow are powerful tags - and in combination, they can make a page, and any pages that can only be reached through its links, invisible to nearly all search engines. This combination command tells search engine spiders, "Do not include this page in your index; do not follow any of the links on this page." This command has its beneficial uses. For example, it can be placed on pages on a site that have duplicate content for legitimate reasons. A website might have both a page for the United States and a page for England that cover the same product with exactly the same content. However, nearly all search engines would see this as duplicate content and could devalue both pages. So placing this command on one of them means that search engine spiders will pass over that copy and you won't be penalized.
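To illustrate the duplicate content case, here is a minimal sketch of the head sections of the two pages. The file names us.html and england.html and the product title are invented for the example.

```html
<!-- us.html (hypothetical file name): no robots tag, so spiders index and follow as usual -->
<head>
  <title>Widget - United States</title>
</head>

<!-- england.html (hypothetical file name): same content, kept out of the index -->
<head>
  <title>Widget - England</title>
  <meta name="robots" content="noindex,nofollow">
</head>
```

Only one of the two copies carries the tag. Putting it on both would take the product out of the search results altogether.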
<meta name="robots" content="noarchive">
Finally, almost all search engines today, including Google and Yahoo, offer a cached version of a page alongside its listing, which provides a snapshot of what the page looked like when the spider last visited. The noarchive tag is there for content that is time-sensitive and that you might not want search engines to cache for people to access later. For example, a business might run a one-time special at a ridiculously low price to drum up some business while things are slow. The business will want to be able to shut that sale down as soon as sales are back up to a solid level. However, it is conceivable that someone could click on the cached version of the business's site, see the old deal, and insist on getting it for themselves. By using the noarchive tag, you are telling search engine spiders, in effect, "This page is subject to frequent changes, and I don't want my visitors to have access to some of this content at a later time."
These commands are placed within the head section of web pages (if you click 'view source' in your browser, you should see examples of them).
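For anyone who has not looked at page source before, a bare-bones page with a robots tag in place might look like this. The title and text are placeholders, and noarchive is used only as an example value; any of the content values listed above would go in the same spot.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>January Special Offer</title>
  <!-- The robots tag sits inside the head, alongside the title -->
  <meta name="robots" content="noarchive">
</head>
<body>
  <!-- Placeholder content for a short-lived sale page -->
  <p>This week only: our one-time special offer.</p>
</body>
</html>
```

A robots tag placed in the body rather than the head may simply be ignored, so it is worth checking the source of the finished page to make sure it has ended up in the right place.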
Providing Computer Help and Support to business in and around Hastings, St Leonards, Battle and Bexhill, East Sussex. Also has a few snippets of random things.
Monday, January 21, 2008