You can use the robots meta tag to tell the web spiders used by search engines not to index parts of your web site, and not to follow the links they find there.
I think the first reaction of many webmasters to this topic is:
"What? Why should I care if the search engines index the lot? It can only mean more hits for me from somewhere, right?"
This view is understandable but may be mistaken on at least a couple of grounds.
- Some web sites are a lot bigger than they look, and spiders have to stop somewhere. If they didn't, they could literally go on forever, because page content is dynamic and, with parameters, so are page URLs. A site can actually be impossible to spider fully.
- Search engines may actually downrate your site on the basis of pages devoid of sensibly searchable content.
To take marsjupiter.com as a case study: we run some statistics for the distributed computing team, Team Ninja. The statistics cover hundreds of our own members and hundreds of competing teams, with acres of measurements on how our members and teams are doing across a number of DC projects.
The searchable interest of a lot of these pages is negligible, but the number of linked pages runs into the thousands.
Thus we think it a wise idea to tell the search engines not to index certain pages and not to follow links on those pages.
You do this with a meta tag of the form:
<META name="Robots" content="noindex,nofollow">
Valid values for content are:

| Value | Meaning |
| --- | --- |
| all | Same as index, follow. |
| none | Same as noindex, nofollow. |
| index, follow | Index the page, spider links. |
| noindex, nofollow | Don't index the page, don't spider links. |
| index, nofollow | Index the page, don't spider links. |
| noindex, follow | Don't index the page, spider links. |
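
To make the values above concrete, here is a minimal Python sketch of how a spider might turn the content string into two yes/no decisions. The function name `parse_robots_content` is made up for illustration, and the default of "index and follow" when nothing is stated is an assumption of this sketch; real search engines may interpret edge cases differently.

```python
from typing import Tuple


def parse_robots_content(content: str) -> Tuple[bool, bool]:
    """Return (index, follow) for a robots meta tag content value.

    Hypothetical helper for illustration only.
    """
    # Assumption: a page with no explicit instruction may be indexed and followed.
    index, follow = True, True
    for token in content.lower().split(","):
        token = token.strip()
        if token == "all":
            index, follow = True, True
        elif token == "none":
            index, follow = False, False
        elif token == "index":
            index = True
        elif token == "noindex":
            index = False
        elif token == "follow":
            follow = True
        elif token == "nofollow":
            follow = False
        # Unknown tokens are simply ignored in this sketch.
    return index, follow


if __name__ == "__main__":
    for value in ("all", "none", "index, nofollow", "noindex, follow"):
        print(value, "->", parse_robots_content(value))
```

The parsing is case-insensitive and tolerant of spaces after the comma, so "NOINDEX,NOFOLLOW" and "noindex, nofollow" are treated the same way.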
Alongside the robots meta tag, search engines can also be instructed via the robots.txt file.
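
As a rough illustration of the file-based approach, the sketch below uses Python's standard `urllib.robotparser` module to check URLs against a small set of rules. The rules themselves, the `/stats/` path, the `ExampleBot` user agent and the `is_allowed` helper are all assumptions made up for this example. They are not the actual rules used on marsjupiter.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /stats/ path is assumed for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical rules above let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("http://marsjupiter.com/index.html"))
    print(is_allowed("http://marsjupiter.com/stats/team.html"))
```

A file like this applies to whole paths at once, whereas the meta tag is set page by page, so the two can reasonably be used together.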