So What's the Problem?
June 26, 2000
Web publishers also find dealing with search engines to be a
frustrating pursuit. Everybody wants their pages to be easy for the
world to find, but getting your site listed can be tough. Search sites
may take a long time to list your site, may never list it at all, and
may drop it after a few months for no reason. If you resubmit often,
as it is very tempting to do, you may even be branded a spamdexer and
barred from a search site. And as for trying to get a good ranking,
forget it! You have to keep up with all the arcane and ever-changing
rules of a dozen different search engines, and adjust the keywords on
your pages just so...all the while fighting against the very plausible
theory that in fact none of this stuff matters, and the search sites
assign rankings at random or by whim.
Spammers are to blame for a lot of the search sites' problems. If the
invisible hand of the free market were to be left totally alone, every
search site would quickly fill up with get-rich-quick, weight loss and
porno links. In order to survive as a useful resource, every search
site must wage a never-ending war against spamdexers and over-zealous
promoters. This means that they have less time to devote to keeping
their databases current and their search algorithms tuned. It also
means that a lot of perfectly worthwhile sites get barred from, or
never added to, many search sites. See "Keeping the Search Engines
Happy" for more ruminations on the spam wars, and how to avoid being
caught in the middle.
Spammers aside, many of the problems stem from the simple fact that
the search engines are swamped. Like most everything to do with the
Internet, runaway growth has resulted in massive backlogs. Currently
the major search sites are taking two months or more to add new
submissions. This overload leads to many problems. Overworked search
engine boffins are not only way behind in adding new sites; they also
don't have time to sort through the database for old and outdated
links, to carry on the battle against spamdexers, or, least of all, to
make needed improvements to the search engines themselves. All the
major search sites have scads of outdated links,
and links to pages of very minimal use, while well-known worthwhile
sites are excluded.
Even allowing for system overload, spammers and sunspots, however, it
often seems that the actual search algorithms used by search sites are
pretty unintelligent. Obviously, machines can blindly count keywords
but they can't make common-sense judgements as humans can. Of course,
this doesn't explain why human-edited directories such as Yahoo are
also full of junk.
Search engines and directories use keywords to determine how each page
will be ranked in search results. They don't simply count the number
of instances of a word on a page, but attempt to make the rankings
better by assigning more weight to things like titles, subheadings,
and so on. In reality, however, the prevalence of a certain keyword
is not always in proportion to the relevance of a page. Consider the
following two examples, each a hypothetical header to an article:
Adobe's Photoshop is really a great program. I love Adobe's Photoshop.
I'm only eight years old, but let me tell you, Adobe's Photoshop is
great! Photoshop! Photoshop!
Photoshop is a graphic editing package from Adobe. This article
contains an in-depth review of the program, a twelve-part tutorial,
and over a hundred links to other resources for users of this fine
software package.
Now if you're looking for information about this fine graphics
software, and type "Photoshop" into a search engine, which of these
pages do you suppose is going to come up first? Now, which do you
suppose is likely to be more useful to you? Ranking pages by keywords
often produces ridiculous results, and has spawned whole new theaters
of combat in the spam wars. Pages should instead be ranked by the
amount of useful information they contain. Alas, search engines
cannot do this.
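To make the point concrete, here is a toy sketch in Python of the kind of weighted keyword counting described above. It is not how any particular search engine works: the field names, the weights, and the keyword_score function are all invented for illustration. Both sample headers are scored for the word "Photoshop".

```python
import re

# Hypothetical weights: a keyword hit in a header counts for more than
# a hit in body text. Real engines do not publish their weights.
FIELD_WEIGHTS = {"header": 3, "body": 1}

def keyword_score(keyword, fields):
    """Sum the weighted whole-word occurrences of keyword in each field."""
    pattern = re.compile(r"\b" + re.escape(keyword.lower()) + r"\b")
    score = 0
    for name, text in fields.items():
        hits = len(pattern.findall(text.lower()))
        score += FIELD_WEIGHTS.get(name, 1) * hits
    return score

fan_page = {
    "header": "Adobe's Photoshop is really a great program. I love Adobe's "
              "Photoshop. I'm only eight years old, but let me tell you, "
              "Adobe's Photoshop is great! Photoshop! Photoshop!"
}
review_page = {
    "header": "Photoshop is a graphic editing package from Adobe. This "
              "article contains an in-depth review of the program, a "
              "twelve-part tutorial, and over a hundred links to other "
              "resources for users of this fine software package."
}

fan_score = keyword_score("Photoshop", fan_page)
review_score = keyword_score("Photoshop", review_page)
print(fan_score, review_score)
```

Under these made-up weights the eight-year-old's page scores five times higher than the review, simply because it repeats the word more often. Nothing in the score reflects how much the page has to offer.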
Search engines are also incapable of distinguishing between content and words
that describe the content. For example, let's say we're searching for
information about video equipment. Our search turns up forty thousand
links that look something like this:
- Video: Middle East Peace Talks
- Video: Twisted Sister Interview
- Video: Underwater Basket Weaving Demonstration
Do any of these sites have anything to do with video gear? Nope. They
contain video clips, so they come up on a search for "video", although
their content has nothing to do with video equipment. The search engine doesn't
realize that the word "video" on these sites refers to the type of
content, rather than the content itself. In other words, it can't
distinguish between medium and message, or to put it in more geekish
terms, between the value of a variable and the name of a variable.
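The same idea can be sketched in a few lines of Python. This is only an illustration: the "Medium: Subject" title convention and the helper functions are assumptions made for the example, not a description of any real search site. A naive match treats the label and the subject alike, while a match against the subject alone does not.

```python
titles = [
    "Video: Middle East Peace Talks",
    "Video: Twisted Sister Interview",
    "Video: Underwater Basket Weaving Demonstration",
]

def naive_matches(query, titles):
    """Return every title containing the query, wherever it appears."""
    return [t for t in titles if query.lower() in t.lower()]

def split_medium(title):
    """Split 'Medium: Subject' into (medium, subject).

    Assumes the hypothetical 'Medium: Subject' convention; a title
    without a colon is treated as all subject.
    """
    medium, sep, subject = title.partition(": ")
    return (medium, subject) if sep else ("", title)

def subject_matches(query, titles):
    """Match the query against the subject only, ignoring the medium label."""
    return [t for t in titles if query.lower() in split_medium(t)[1].lower()]

naive_hits = naive_matches("video", titles)      # matches the label
subject_hits = subject_matches("video", titles)  # looks at the subject only
print(len(naive_hits), len(subject_hits))
```

The naive search returns all three titles; the subject-only search returns none, which is the right answer for someone shopping for video gear. Of course, this only works here because the titles happen to mark the medium so neatly.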