Web Developer's Virtual Library: Encyclopedia of Web Design Tutorials, Articles and Discussions




So What's the Problem?

June 26, 2000

Web publishers also find dealing with search engines to be a frustrating pursuit. Everybody wants their pages to be easy for the world to find, but getting your site listed can be tough. Search sites may take a long time to list your site, may never list it at all, and may drop it after a few months for no reason. If you resubmit often, as it is very tempting to do, you may even be branded a spamdexer and barred from a search site. And as for trying to get a good ranking, forget it! You have to keep up with all the arcane and ever-changing rules of a dozen different search engines, and adjust the keywords on your pages just so...all the while fighting against the very plausible theory that in fact none of this stuff matters, and the search sites assign rankings at random or by whim.

Spammers are to blame for a lot of the search sites' problems. If the invisible hand of the free market were to be left totally alone, every search site would quickly fill up with get-rich-quick, weight loss and porno links. In order to survive as a useful resource, every search site must wage a never-ending war against spamdexers and over-zealous promoters. This means that they have less time to devote to keeping their databases current and their search algorithms tuned. It also means that a lot of perfectly worthwhile sites get barred from, or never added to, many search sites. See Keeping the Search Engines Happy for more ruminations on the spam wars, and how to avoid being caught in the middle.

Spammers aside, many of the problems stem from the simple fact that the search engines are swamped. As with most everything to do with the Internet, runaway growth has resulted in massive backlogs. Currently the major search sites are taking two months or more to add new submissions. This overload leads to many problems. Overworked search engine boffins are not only way behind in adding new sites. They also don't have time to sort through the database for old and outdated links, to carry on the battle against spamdexers, and least of all to make needed improvements to the search engines themselves. All the major search sites have scads of outdated links, and links to pages of very minimal use, while well-known worthwhile sites are excluded.

Even allowing for system overload, spammers and sunspots, however, it often seems that the actual search algorithms used by search sites are pretty unintelligent. Obviously, machines can blindly count keywords but they can't make common-sense judgements as humans can. Of course, this doesn't explain why human-edited directories such as Yahoo are also full of junk.

Search engines and directories use keywords to determine how each page will be ranked in search results. They don't simply count the number of instances of a word on a page, but attempt to make the rankings better by assigning more weight to things like titles, subheadings, and so on. In reality, however, the prevalence of a certain keyword is not always in proportion to the relevance of a page. Consider the following two examples, each a hypothetical header to an article:

Adobe's Photoshop is really a great program. I love Adobe's Photoshop. I'm only eight years old, but let me tell you, Adobe's Photoshop is great! Photoshop! Photoshop!

Photoshop is a graphic editing package from Adobe. This article contains an in-depth review of the program, a twelve-part tutorial, and over a hundred links to other resources for users of this fine software package.

Now if you're looking for information about this fine graphics software, and type "Photoshop" into a search engine, which of these pages do you suppose is going to come up first? And which do you suppose is likely to be more useful to you? Ranking pages by keywords often produces ridiculous results, and has spawned whole new theaters of combat in the spam wars. Pages should instead be ranked by the amount of useful information they contain. Alas, search engines cannot do this.
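To make the point concrete, here is a minimal sketch of keyword-count ranking applied to the two headers above. It is not the algorithm of any real search engine: the function names and the weights (a title hit counting three times as much as a body hit) are assumptions made up for illustration.

```python
import re

# Hypothetical weights: a keyword in the title counts more than one in the body.
TITLE_WEIGHT = 3
BODY_WEIGHT = 1


def count_term(text, term):
    """Count case-insensitive whole-word occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def keyword_score(title, body, term):
    """Score a page purely by weighted keyword counts (no notion of usefulness)."""
    return TITLE_WEIGHT * count_term(title, term) + BODY_WEIGHT * count_term(body, term)


fan_page = (
    "Adobe's Photoshop is really a great program. I love Adobe's Photoshop. "
    "I'm only eight years old, but let me tell you, Adobe's Photoshop is great! "
    "Photoshop! Photoshop!"
)
review_page = (
    "Photoshop is a graphic editing package from Adobe. This article contains "
    "an in-depth review of the program, a twelve-part tutorial, and over a "
    "hundred links to other resources for users of this fine software package."
)

# Neither example has a separate title, so only the body is scored here.
fan_score = keyword_score("", fan_page, "photoshop")
review_score = keyword_score("", review_page, "photoshop")

print("fan page:", fan_score)
print("review page:", review_score)
```

Under these assumptions the eight-year-old's page outscores the in-depth review, because the scorer counts words and knows nothing about what the page actually offers.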

Search engines are also incapable of distinguishing between content and words that describe the content. For example, let's say we're searching for information about video equipment. Our search turns up forty thousand links that look something like this:

  • Video: Middle East Peace Talks
  • Video: Twisted Sister Interview
  • Video: Underwater Basket Weaving Demonstration

Do any of these sites have anything to do with video gear? Nope. They contain video clips, so they come up on a search for "video", although their subject matter has nothing to do with video equipment. The search engine doesn't realize that the word "video" on these sites refers to the type of content, rather than the content itself. In other words, it can't distinguish between medium and message, or to put it in more geekish terms, between the value of a variable and the name of a variable.
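The same failure can be sketched in a few lines. This is a deliberately naive matcher, not how any particular search site works, and the fourth page title is a hypothetical page about video gear added for contrast.

```python
pages = [
    "Video: Middle East Peace Talks",
    "Video: Twisted Sister Interview",
    "Video: Underwater Basket Weaving Demonstration",
    "Camcorder and Tripod Buying Guide",  # hypothetical page that is actually about video gear
]


def naive_search(pages, term):
    """Return every page whose text contains the term, ignoring case."""
    term = term.lower()
    return [page for page in pages if term in page.lower()]


hits = naive_search(pages, "video")
for hit in hits:
    print(hit)
```

The matcher returns the three pages that merely label their medium as "Video" and misses the one page about video equipment, since that page never uses the word.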




