Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions



Image Alt Descriptions - Page 3

February 4, 2002

These create the text that shows in an image space before a graphic loads, and in some browsers when the mouse rolls over it. They've been sorely abused, often crammed with long lists of keywords, and again the spiders have wised up and tend to ignore them, or penalize obvious abuse.

Their proper use is to show visitors with text-only browsers (and visually impaired visitors using talking browsers) what they're missing. Using them as a method of presenting keywords is spamming, and you can hardly complain if it gets you a ranking penalty.
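If you want to check your own pages for missing or overstuffed alt descriptions, a short script can do the legwork. The following is a minimal sketch in Python, not a definitive tool: the names AltAudit and audit_alt_text are made up for this example, and the ten-word limit is an arbitrary assumption, not a threshold any search engine has published.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Collects <img> tags whose alt text is missing or suspiciously long."""

    def __init__(self, max_words=10):
        super().__init__()
        self.max_words = max_words  # assumed cut-off, adjust to taste
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src") or "(no src)"
        alt = attr_map.get("alt")
        if alt is None:
            # No alt attribute, or an alt attribute with no value at all
            self.problems.append((src, "missing alt"))
        elif len(alt.split()) > self.max_words:
            self.problems.append((src, "alt looks keyword-stuffed"))


def audit_alt_text(html, max_words=10):
    """Return a list of (src, problem) pairs for the images in html."""
    auditor = AltAudit(max_words)
    auditor.feed(html)
    auditor.close()
    return auditor.problems
```

An empty alt="" is deliberately left alone here, since it is a common way to mark purely decorative images.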

Frames

Frames confuse most spiders. If you insist on using frames, then make the most of your <noframes> tag and include a link within it to a sitemap or contents page that lists your pages and links to them directly, rather than linking to framesets. You can always force the framesets to appear when the links are followed in a regular browser by using JavaScript, which the spiders will ignore. It's a lot of work but at least it should get you listed in the search engines.
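To confirm that a frameset page really does offer spiders a way in, you can look for links inside its <noframes> section. This is a rough sketch using regular expressions rather than a full HTML parser, so treat it as a quick sanity check only; the function name noframes_links and the file names in the sample are invented for the example.

```python
import re

# Content between <noframes> and </noframes>, case-insensitive, across lines
NOFRAMES_RE = re.compile(
    r"<noframes\b[^>]*>(.*?)</noframes\s*>", re.IGNORECASE | re.DOTALL
)
# The href value of an <a> tag, quoted or unquoted
HREF_RE = re.compile(r"""<a\b[^>]*\bhref\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def noframes_links(html):
    """Return every link target found inside <noframes> sections of html."""
    links = []
    for block in NOFRAMES_RE.findall(html):
        links.extend(HREF_RE.findall(block))
    return links


sample = """
<frameset cols="20%,80%">
  <frame src="nav.html">
  <frame src="main.html">
  <noframes>
    <body><p>Browse the <a href="sitemap.html">site map</a>.</p></body>
  </noframes>
</frameset>
"""
print(noframes_links(sample))
```

An empty list means a spider that skips the frames has nothing to follow from that page.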

Robots.txt

This text file goes in your root directory and gives instructions to spiders about which files and directories to ignore when they're trawling your site. It can have other uses too, but many of these are close to spamming techniques so won't be covered here.

Here's a sample robots.txt file

User-Agent: *
Disallow: /images/
Disallow: /bookmark*.html
Disallow: /cgi_bin/
Disallow: /status/

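This tells all spiders (first line) not to look inside the directories called images, cgi_bin and status, and to ignore files called bookmark1.html, bookmark2.html and so on. Be aware that the * wildcard inside a Disallow path is an extension that not every spider honors. The original robots exclusion standard matches on path prefixes only, so Disallow: /bookmark is the more portable way to write that rule. Incidentally, the linebreaks are important: each directive goes on its own line, and a blank line separates one User-Agent record from the next.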
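You can test a robots.txt file before uploading it. The sketch below uses Python's standard urllib.robotparser module; the helper names build_parser and is_allowed and the spider name ExampleSpider are assumptions made for the example. The bookmark rule is written as a plain prefix because, as far as I'm aware, this parser matches path prefixes and does not expand * wildcards inside paths.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the sample file, with the bookmark rule as a plain prefix
ROBOTS_TXT = """\
User-Agent: *
Disallow: /images/
Disallow: /bookmark
Disallow: /cgi_bin/
Disallow: /status/
"""


def build_parser(text):
    """Parse robots.txt content supplied as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, path, agent="ExampleSpider"):
    """True if the named spider may fetch path under the parsed rules."""
    return parser.can_fetch(agent, path)


rules = build_parser(ROBOTS_TXT)
for path in ("/index.html", "/images/logo.gif", "/bookmark2.html"):
    print(path, is_allowed(rules, path))
```

Real spiders may interpret the file differently from this parser, so treat the result as a guide rather than a guarantee.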

It's a good idea to include a robots.txt file on your site, even if you don't have much to exclude. It helps prevent spiders wasting their time poking around in your image directories. And since spiders often tire and give up on sites without fully indexing them (especially new sites), it can help you get the more important areas of your site indexed.

Directory Structure

Spiders find their way around your site by following your internal links. They prioritize pages that are in the root directory, then first level directories, and if you're lucky (or a very popular site) they may look at subdirectories beyond that, but often they won't bother. That's why you find most professional sites have a flat structure, with many pages in the root directory and first-level subdirectories, rather than a deep structure with many levels of subdirectories.
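To see how flat or deep your own site is, you can count the directory levels in each page's URL. Here is a small Python sketch; the function names directory_depth and deep_pages are invented for the example, the default limit of one level simply mirrors the advice above, and the code assumes that a final path segment without a trailing slash is a file name.

```python
from urllib.parse import urlparse


def directory_depth(url):
    """Count the directories between the site root and the page itself."""
    path = urlparse(url).path
    segments = [part for part in path.split("/") if part]
    # Assume the last segment is a file unless the path ends with a slash
    if segments and not path.endswith("/"):
        segments = segments[:-1]
    return len(segments)


def deep_pages(urls, max_depth=1):
    """Return the URLs that sit deeper than max_depth directory levels."""
    return [url for url in urls if directory_depth(url) > max_depth]


pages = [
    "http://www.example.com/index.html",
    "http://www.example.com/products/widgets.html",
    "http://www.example.com/archive/2001/12/news.html",
]
print(deep_pages(pages))
```

Pages that show up in the result are the ones a spider is most likely to skip, so they are good candidates for moving up or for linking from a sitemap.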

Dynamic Pages

Spiders generally have trouble with these. Also they're a little frightened of them because they can get trapped inside a dynamic page server, and may even bring the server down. For this reason spiders identify dynamic pages by the question mark contained in their URLs, and usually avoid them. Some will allow you to submit specific dynamic pages, but they still won't follow the internal links within them.

One solution is to create static gateway pages that include static links to other pages on your site. Make sure the link URLs are inherently complete, not generated on the fly, that they don't contain question marks, and that your server can translate these static links to reach dynamic pages if it has to. Also make sure there's plenty of text on the gateway page, that it isn't purely made up of links, otherwise it may be ignored.
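The translation step might look something like this. It is only a sketch under stated assumptions: the /catalog/ URL layout, the catalog.asp script and its category and id parameters are all hypothetical, and in practice the mapping would usually live in your web server's configuration rather than in a standalone script.

```python
import re
from urllib.parse import urlencode

# Hypothetical layout: /catalog/<category>/<item id>.html
STATIC_PATTERN = re.compile(
    r"^/catalog/(?P<category>[a-z0-9-]+)/(?P<item_id>\d+)\.html$"
)


def to_static_url(category, item_id):
    """Build the question-mark-free link to put on a gateway page."""
    return f"/catalog/{category}/{item_id}.html"


def to_dynamic_url(static_path):
    """Translate a static-looking path to the dynamic URL behind it.

    Returns None when the path does not follow the assumed layout.
    """
    match = STATIC_PATTERN.match(static_path)
    if match is None:
        return None
    query = urlencode(
        {"category": match.group("category"), "id": match.group("item_id")}
    )
    return "/catalog.asp?" + query


link = to_static_url("books", 42)
print(link, "->", to_dynamic_url(link))
```

The spider only ever sees the static form, while the server quietly turns it back into the query string the dynamic page expects.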

An alternative is to make technical alterations to your system so the server can cope with a visit from a spider, and then replace the question mark with a less obvious symbol such as a % sign. There's no point in making this replacement if the server won't be able to cope. The usual problem is that links to dynamic pages are often created dynamically themselves, and spiders can't manage this. They request pages with incomplete URLs that are missing query string elements. The server sends back a request for more information to complete the URL, the spider can't understand it, and the exchange turns into a dangerous loop. To get around this you have to create a work-around for the incomplete URL problem, and technically that's a demanding task.

For more details on getting dynamic sites indexed, try NetMechanic and Spider Food.

Additional Links

Text - Page 2
Making Your Pages Easy for Search Engines to Index

