Image Alt Descriptions
February 4, 2002
Alt descriptions supply the text that shows in an image's space
before the graphic loads and, in some browsers, when the mouse
rolls over it. They've been sorely abused, often crammed with long
lists of keywords, so the spiders have wised up and tend to ignore
them or penalize obvious abuse.
Their proper use is to show visitors with text-only browsers (and
visually impaired visitors with talking browsers) what they're
missing. Using them as a method of presenting keywords is
spamming, and you can hardly complain if it gets you a ranking
penalty.
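One way to check a page against this advice is to scan its image tags for alt text that is missing or looks stuffed. The sketch below uses Python's standard html.parser module; the 15-word threshold and the names AltAuditor and audit_alt_text are assumptions made for illustration, not limits published by any search engine.

```python
from html.parser import HTMLParser

MAX_ALT_WORDS = 15  # assumed threshold for "suspiciously long", not an official limit


class AltAuditor(HTMLParser):
    """Collects <img> tags whose alt text is missing or suspiciously long."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (src, reason) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "(no src)")
        alt = attributes.get("alt")
        if alt is None:
            self.problems.append((src, "missing alt"))
        elif len(alt.split()) > MAX_ALT_WORDS:
            self.problems.append((src, "alt too long"))


def audit_alt_text(html):
    """Return the (src, reason) problems found in an HTML string."""
    auditor = AltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.problems
```

A short, descriptive alt such as "Company logo" passes; an image with no alt attribute, or one carrying a long run of keywords, is reported.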
Frames
Frames confuse most spiders. If you insist on using frames, then
make the most of your <noframes> tag and include a link
within it to a sitemap or contents page that lists your pages and
links to them directly, rather than linking to framesets. You can
still make the framesets appear when those links are followed
in a regular browser by using JavaScript, which the spiders will
ignore. It's a lot of work, but at least it should get you listed
in the search engines.
Robots.txt
This text file goes in your root directory and gives instructions
to spiders about which files and directories to ignore when
they're trawling your site. It can have other uses too, but many
of these are close to spamming techniques so won't be covered
here.
Here's a sample robots.txt file:
User-Agent: *
Disallow: /images/
Disallow: /bookmark*.html
Disallow: /cgi_bin/
Disallow: /status/
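This tells all spiders (first line) not to look inside the
directories called images, cgi_bin and status, and to ignore
files called bookmark1.html, bookmark2.html and so on. Note that
the * wildcard in a Disallow line is an extension that not every
spider supports. The original robots.txt convention matches by
prefix, so Disallow: /bookmark covers the same files for any
spider. Incidentally, the line breaks are important: each
directive must sit on its own line.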
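To see how a spider might read rules like these, you can feed them to the robots.txt parser in Python's standard library (urllib.robotparser). This is only a sketch: the agent name ExampleBot and the helper is_allowed are made up for illustration, and the bookmark rule is written as the plain prefix /bookmark because this parser matches by prefix and does not expand * wildcards.

```python
from urllib.robotparser import RobotFileParser

# The sample rules, with the bookmark rule written as a plain prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /bookmark
Disallow: /cgi_bin/
Disallow: /status/
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if the rules above let the given agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/images/logo.gif", "/bookmark1.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Real spiders differ in how strictly they follow the file, so treat the result as a sanity check on your rules rather than a guarantee.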
It's a good idea to include a robots.txt file on your site, even
if you don't have much to exclude. It helps keep spiders from
wasting their time poking around in your image directories. And
since spiders often give up on sites without fully indexing them
(especially new sites), it can help you get the more important
areas of your site indexed.
Directory Structure
Spiders find their way around your site by following your
internal links. They prioritize pages that are in the root
directory, then first-level directories, and if you're lucky (or
a very popular site) they may look at subdirectories beyond that,
but often they won't bother. That's why you find most
professional sites have a flat structure, with many pages in the
root directory and first-level subdirectories, rather than a deep
structure with many levels of subdirectories.
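If you have a list of your own page URLs, a few lines of Python can show which ones sit deeper than the first directory level. This is a rough sketch: the function names and the www.example.com URLs are illustrative, and it assumes the last path segment is a file name unless the path ends in a slash.

```python
from urllib.parse import urlsplit


def directory_depth(url):
    """Count the directories between the site root and the page.

    Assumes the final segment is a file name unless the path ends in "/".
    """
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    if segments and not path.endswith("/"):
        segments = segments[:-1]  # drop the file name
    return len(segments)


def deep_pages(urls, max_depth=1):
    """Return the URLs that sit below the given directory depth."""
    return [u for u in urls if directory_depth(u) > max_depth]


if __name__ == "__main__":
    pages = [
        "http://www.example.com/index.html",
        "http://www.example.com/products/widget.html",
        "http://www.example.com/a/b/c/page.html",
    ]
    print(deep_pages(pages))
```

Pages reported by deep_pages are candidates for moving closer to the root, or at least for linking from a root-level page.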
Dynamic Pages
Spiders generally have trouble with dynamically generated pages.
They're also wary of them, because a spider can get trapped
inside a dynamic page server and may even bring the server down.
For this reason spiders identify dynamic pages by the question
mark contained in their URLs, and usually avoid them. Some will
allow you to submit specific dynamic pages, but they still won't
follow the internal links within them.
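The question-mark test is simple enough to run over your own links, to see which ones a cautious spider would probably skip. The sketch below is an approximation of that behavior, not any particular spider's code; looks_dynamic and spider_friendly are hypothetical names.

```python
def looks_dynamic(url):
    """Return True if the URL contains a question mark before any #fragment."""
    without_fragment = url.split("#", 1)[0]
    return "?" in without_fragment


def spider_friendly(urls):
    """Keep only the links a question-mark-avoiding spider would follow."""
    return [u for u in urls if not looks_dynamic(u)]


if __name__ == "__main__":
    links = ["/about.html", "/catalog.cgi?id=42", "/faq.html#contact"]
    print(spider_friendly(links))
```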
One solution is to create static gateway pages that include
static links to other pages on your site. Make sure the link URLs
are inherently complete, not generated on the fly, that they
don't contain question marks, and that your server can translate
these static links to reach dynamic pages if it has to. Also make
sure there's plenty of text on the gateway page, that it isn't
purely made up of links, otherwise it may be ignored.
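The translation step might look something like the sketch below, which maps a static-looking path onto the query string a dynamic script expects. Everything specific here is an assumption for illustration: the /products/<category>/<id>.html scheme, the script name /catalog.cgi, and the parameter names category and item. In practice this job is often done by the web server's own URL rewriting rather than by application code.

```python
import re
from urllib.parse import urlencode

# Hypothetical static scheme: /products/<category>/<id>.html
STATIC_PATTERN = re.compile(
    r"^/products/(?P<category>[a-z0-9-]+)/(?P<item>\d+)\.html$"
)


def to_dynamic(static_path, script="/catalog.cgi"):
    """Translate a static-looking path to its dynamic URL, or None if no match."""
    match = STATIC_PATTERN.match(static_path)
    if match is None:
        return None
    query = urlencode([
        ("category", match.group("category")),
        ("item", match.group("item")),
    ])
    return script + "?" + query


if __name__ == "__main__":
    print(to_dynamic("/products/books/42.html"))
```

The spider only ever sees /products/books/42.html; the question mark appears on the server side, after the request has arrived.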
An alternative is to make technical alterations to your system so
the server can cope with a visit from a spider, and then replace
the question mark with a less obvious symbol such as a % sign.
There's no point in making this replacement if the server won't
be able to cope. The usual problem is that links to dynamic pages
are often generated dynamically themselves, and spiders can't
manage this. They request pages with incomplete URLs that are
missing query string elements. The server sends back a request
for more information to complete the URL, the spider can't
understand it, and the exchange turns into a dangerous loop. To
get around this you have to create a work-around for the
incomplete URL problem, and technically that's a demanding task.
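One possible shape for such a work-around, offered only as a sketch, is to have the server fill in sensible defaults for any missing query string elements instead of asking the visitor for more information. The parameter names and default values below are hypothetical, and a real system would need to decide what a safe default is for each one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical defaults for query parameters the dynamic page requires.
DEFAULTS = {"category": "all", "page": "1"}


def complete_url(url, defaults=DEFAULTS):
    """Return the URL with any missing required parameters filled in."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    for name, value in defaults.items():
        params.setdefault(name, value)  # keep values the request supplied
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    print(complete_url("/catalog.cgi?category=books"))
```

Because every request now resolves to a complete URL, the server never needs to send back the follow-up request that starts the loop.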
For more details on getting dynamic sites indexed, try
NetMechanic and Spider Food.