| |
| How Searching the World Wide Web Works |
| |
|
When the World Wide Web opened up the Internet to the world and the general public,
searching for information with search engines became a normal part of our work,
entertainment, and study routines. Web surfing was born. However, one recent survey
reported that 84% of users were dissatisfied with their ability to find information
on the Web.
There are two things that allow us to find and retrieve information:
- Search engine programs to find the information on the WWW and build the database to store it
- Search engine software to allow us users to see and search a database and retrieve information from it
Web Search Retrieval Software - Search engines are Information Retrieval
(IR) systems. Information retrieval is a branch of computer science concerned
with letting users (you at your client machine) retrieve specific information
from large databases. Search engines are Web search tools that build a database
and then allow users to ask for information using a web page interface. The
search engine software then retrieves information about web pages that fit the
search criteria and returns a web page of results or "hits." Some allow
submissions by businesses and organizations that create websites. Often there
are no fees for a submission, small fees for a company to submit your site for
you to anywhere from 20 to hundreds of search engines, or fees charged by the
search engines for high placement in the results that are returned.
The user types a query or search expression into a small window on screen and
the search engine does its best to find Web pages (hits) and their URLs that
seem to match the query. Often a description of the web page is included with
the results. Each search engine searches its own database. No one search
engine comes close to categorizing the entire Web and no two search engines
have the same database of pages.
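To make the "build a database, then answer queries from it" idea concrete, here is a minimal sketch in Python. It is not how any particular search engine works; the three pages, their URLs, and the function names are made up for illustration, and the "database" is simply a table that maps each word to the pages containing it.

```python
from collections import defaultdict

# Hypothetical mini "Web": URL -> page text (made-up example data).
PAGES = {
    "http://example.com/otters": "Sea otters live in Monterey Bay",
    "http://example.com/bay": "Monterey Bay is on the California coast",
    "http://example.com/pets": "Discount pet food and chew toys",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the sorted URLs ("hits") that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "monterey bay"))
# ['http://example.com/bay', 'http://example.com/otters']
```

Notice that the search only ever looks in the index, never at the live pages. That is why two engines with different databases return different hits for the same query.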
| Elements of a good search interface |
| Indexed versus Search Zones - Some search engines let you search their
entire indexed database of sites, while others create search zones to help
the user narrow a site search to pertinent information and pages. |
| Another element of a well-planned search engine interface is to recognize
the user's information needs and provide search capability for your site
based upon those needs, including the method of indexing your site
and the choice of indexing software to load or purchase. Users will
do one or more of the following:
1. Known-item Search - a clearly defined search, likely providing a single,
correct answer.
2. Existence Search - the user knows what they want but not how to describe it.
3. Exploratory Search - users know how to phrase the question but are unsure
of what they will find. They are exploring and learning and will do multiple
searches.
4. Comprehensive Search - research; users want everything available on the
topic.
A last consideration is to build a good user interface for
the user search tool and for the returned results. The search tool
needs to be little more than a small window on the interface
web page, but often the interface includes tons of ads and
other info. Compare Yahoo with Google.
|
|
|
How well do search engines cover the Internet? - In 1997, the
top 11 search engines covered about 60% of the Internet. As
of 2004, coverage by the top 11 has dropped to about 42% of the
Internet.
Despite claims made by the search engines, no search engine comes close
to covering the entire Internet or all of its web pages. The Internet
is growing faster than search engines can find, index, or categorize it.
Additionally, much of the data that a user might want is "hidden" from the
search bots in databases. This is known as the Hidden Internet,
the Deep Web or the Invisible Web/Internet.
The
Invisible Web is comprised of information stored in
databases, according to Chris Sherman, Webmaster of About.com's Web
Search. Spiders and robots cannot enter these databases.
"It's
as if they've run smack into the entrance of a massive library
with securely bolted doors," Sherman said. "Spiders
can record the library's address, but can tell you nothing about
the books, magazines or other documents."
What
else makes up the Invisible Web?
* Non-HTML files (PDF files, etc.)
* Webbed databases
* Sites requiring registration or login
* Archives (newspapers and magazines, etc.)
* Dynamically created Web pages
* Interactive tools (calculators, etc.)
http://www.libraryspot.com/features/invisibleweb.htm
Search
Engine Coverage of the Internet
http://www.cabrillo.edu/~tsmalley/searchengines.html
Search
Engine Statistics
http://searchenginewatch.com
http://searchenginewatch.com/3632382
Search
Engine Ranks by Pages Indexed
http://searchenginewatch.com/reports/article/php/215481
Popularity
of Search Engines
http://www.silurian.com/sitepos/coverage.htm
Search Engines outside the U.S. - Although we call the web part of the Internet
the World Wide Web, our viewpoint is generally focused on the
U.S. and ignores the rest of the world. The search engines we use are
U.S.-based and focus on U.S. websites.
There are search engines based in other countries that focus on the websites
of those countries and their continents.
It's the World Wide Web. See more:
European
search engines
Search Engine
Colossus |
|
Queries may be in the form of:
- keyword searches
- specific words or phrases
- advanced searches using Boolean logic (Boolean operators such as AND, OR, NOT to search more specifically)
- fuzzy queries, where you type in full sentences (Ask Jeeves)
The databases are built by automated searching tools. Private Internet companies
like Yahoo! and Lycos have developed powerful software systems and
computer programs called spiders, crawlers, web robots, or just bots for
short, that search the Internet for web pages and enter them into
giant databases. The bots automatically:
- find new information and Web pages
- find updated information
- delete web pages that are no longer on the Web
- update the database
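The four jobs in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "Web" here is a small made-up dictionary rather than the live Internet, the host names are invented, and a page counts as "no longer on the Web" simply when the bot can no longer reach it from its starting page.

```python
def crawl(web, start_url, database):
    """Visit every page reachable from start_url and bring database up to date.

    web      -- stand-in for the live Web: URL -> (text, list of linked URLs)
    database -- the engine's stored copy: URL -> text (modified in place)
    Returns (added, updated, deleted) as three sorted lists of URLs.
    """
    seen, queue = set(), [start_url]
    added, updated = [], []
    while queue:
        url = queue.pop(0)
        if url in seen or url not in web:
            continue  # already visited, or a dead link
        seen.add(url)
        text, links = web[url]
        if url not in database:
            added.append(url)        # new page found
        elif database[url] != text:
            updated.append(url)      # page changed since the last visit
        database[url] = text         # store the current copy
        queue.extend(links)          # follow the page's links later
    # Assumption: anything stored but not reached this time is gone.
    deleted = [url for url in database if url not in seen]
    for url in deleted:
        del database[url]
    return sorted(added), sorted(updated), sorted(deleted)

# Made-up example data.
WEB = {
    "a.example": ("home page v2", ["b.example", "gone.example"]),
    "b.example": ("about us", ["a.example", "c.example"]),
    "c.example": ("new page", []),
}
DATABASE = {"a.example": "home page v1", "b.example": "about us", "old.example": "stale"}

RESULT = crawl(WEB, "a.example", DATABASE)
print(RESULT)
# (['c.example'], ['a.example'], ['old.example'])
```

A real bot also has to fetch pages over the network, wait politely between requests, and cope with pages it cannot read, none of which is shown here.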
The databases that are created by the bots and searchable by users come in four general
formats or categories:
- search engines - uncategorized databases that search large parts of the Web.
- subject lists, indexes and directories - databases sorted into categories
- meta-search engines - one search engine is used to search several other search engines, collate the results and return them in one organized report
- Other Web resources - whatis, Internic's whois, Cruzio's domain lookup, bigfoot, whowhere, Yahoo! People Search, Yahoo Yellow Pages, mapquest, Yellow Pages, Usenet newsgroups, and so on.
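The meta-search idea in the list above (search several engines, collate, return one report) might look roughly like this in Python. The two "engines" are stand-in functions that return fixed lists, and the scoring rule (a page earns 1/rank from each engine that lists it) is just one simple way to collate results, chosen here for illustration.

```python
def meta_search(query, engines):
    """Send the query to every engine and merge the ranked URL lists.

    engines -- list of functions; each takes a query and returns ranked URLs.
    A URL earns 1/rank from each engine that lists it; scores are summed.
    """
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical stand-in engines with made-up results.
def engine_a(query):
    return ["x.example", "y.example", "z.example"]

def engine_b(query):
    return ["y.example", "w.example"]

MERGED = meta_search("sea otters", [engine_a, engine_b])
print(MERGED)
# ['y.example', 'x.example', 'w.example', 'z.example']
```

The page that both engines list (y.example) rises to the top of the merged report, which is the main payoff of collating.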
A comparative search
engine chart for selected search engines
http://libwww.cabrillo.cc.ca.us/html/searchengchart.html
| A comparative chart of four search engines |
| Google - The largest. Particularly good at ranking of results. Automatically puts an AND between your terms so you don't have to, e.g., "cheap airline tickets" Europe. No truncation function. Try the Advanced search mode, where you can use date and domain filters. |
| AltaVista - Particularly powerful in the Advanced search mode, where you can do a Boolean search. AltaVista is the only search engine to offer the operator NEAR, which will retrieve words within 10 words of each other. For example, "cheap airline tickets" NEAR Barcelona. |
| AllTheWeb - A very large, very fast search engine. Simulate Boolean searches by using + signs to require words in the results list, e.g., +"cheap airline tickets" +Europe. Try the Advanced Search mode. No truncation. |
| HotBot - Easy to search for particular kinds of files. Takes Boolean search statements when you change the option box, e.g., "cheap airline" AND (Spain OR Austria). You can specify date and time filters, also file types, from the opening search page. Use Advanced Search for more options. Can truncate using *. |
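Two features in the chart, NEAR and truncation with *, are easier to picture with a small example. The Python below is a sketch of the general idea only, not the code any of these engines actually used; the function names are made up, and the 10-word window comes from the chart's description of NEAR.

```python
import re

def words_of(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def near(text, term_a, term_b, window=10):
    """True if term_a and term_b occur within `window` words of each other."""
    words = words_of(text)
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def truncation_match(text, pattern):
    """True if some word matches pattern; a trailing * matches any ending."""
    words = words_of(text)
    pattern = pattern.lower()
    if pattern.endswith("*"):
        return any(w.startswith(pattern[:-1]) for w in words)
    return pattern in words

print(near("Cheap tickets to sunny Barcelona!", "tickets", "Barcelona"))  # True
print(truncation_match("The starter motor", "start*"))                    # True
print(truncation_match("The starter motor", "start"))                     # False
```

The last two lines show why truncation matters: without the *, the engine looks for the exact word "start" and misses "starter."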
To get to a list
(with descriptions) of the Search Tools:
1. Be on the Cabrillo
College Library homepage
2. Click on Search the Internet
3. Click on Search Engines
The Big Question: How
will anyone know you're there? Well, they won't unless you shout it
out. It takes marketing, registering and promoting to get customers
to your Web site.
Visitors typically do one of the following:
- They hear about you from advertising, write down or memorize your URL and type it into
their browser's "Location box." Two years ago about 70% of all business eCommerce sites
were found through Internet searches; now, about 65% are found through traditional
advertising channels.
- They follow a link from another site or online ad to your site.
- A friend or acquaintance refers them.
- They "hit" on your site through a search.
The reality of
the Internet today is that your website is one of millions of
Websites. It is unrealistic and even foolish to expect customers
to find you through search engines and directories. You will need
to rely upon targeted marketing efforts. That means you must make
and then implement a marketing
plan that begins with identifying and targeting customers and
includes both Web and traditional marketing efforts to reach
those customers with informative and persuasive messages.
|
| Putting
Out the Welcome Mat and Inviting Customers In. |
| First things first: Configure your Web page so search engines
can find you, describe and categorize you and include you in directories
and indexes. |
- Page Titles - Make sure your page titles make sense. AltaVista,
for instance, puts the title at the top of your listing and uses
it for the link to your page. Search engines index all the words in the
title.
|
- Page Content - Many search engines use the first few
text lines in a page as a kind of "abstract" of the page. Focus
on using words that will be important to a customer "searching" the
Web.
|
- Meta Tags - keywords and descriptions.
Keywords are META tag values used by search engines to index and categorize
your page. Many search engines (spiders) limit the number of keywords
you can use. Carefully choose 10-15 keywords at most. Example: <meta
name="keywords" content="puppies, dog food, chew toys,
discount pet food, pet food">
Descriptions are META tag values used in directories along with the
link. Some search engines will use descriptions to index or categorize
your site. Example: <meta name="description" content="The
galaxy's best and cheapest pet chow.">
|
- Text Comments are used by some search engines that don't
use Meta Tags, and also by those that do use Meta Tags, as a description
for your site. Example: <!-- The
galaxy's best and cheapest pet chow. Keep your puppy healthy and
happy. Order online today. -->
|
- ALT Text in Images - some search engines index the ALT attributes
in your IMG tags to get a sense of a page and to index graphics.
|
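To see what a spider might pull out of a page configured as described above, here is a rough Python sketch using the standard library's html.parser module. The sample page is made up (its title, "Pet Chow Online," and the image file name are invented), and real search engines each decide for themselves which of these fields to use.

```python
from html.parser import HTMLParser

class PageSummary(HTMLParser):
    """Collect the title, META keywords/description, and IMG ALT text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self.description = ""
        self.alt_texts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "keywords":
                # Keywords are a comma-separated list.
                self.keywords = [k.strip() for k in content.split(",") if k.strip()]
            elif name == "description":
                self.description = content
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Made-up sample page for illustration.
PAGE = """<html><head>
<title>Pet Chow Online</title>
<meta name="keywords" content="puppies, dog food, chew toys">
<meta name="description" content="The galaxy's best and cheapest pet chow.">
</head><body><img src="puppy.gif" alt="Happy puppy"></body></html>"""

summary = PageSummary()
summary.feed(PAGE)
print(summary.title)      # Pet Chow Online
print(summary.keywords)   # ['puppies', 'dog food', 'chew toys']
print(summary.alt_texts)  # ['Happy puppy']
```

If any of these fields comes back empty for your own page, that is a hint that a search engine relying on it will have little to work with.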
| Conducting
Searches |
|
Every search engine
has slightly or greatly different rules for searches. Despite the
variations, three rules do apply across all the search engines:
- Use quotation marks (" ") to keep words in phrases together.
- Use a plus sign (+) in front of a term (no space) to require it in the search results.
In some search engines, now, the plus sign is not required -- but you're
not penalized for using it.
- Use a minus sign (-) in front of a term (no space) to disallow it in the search results.
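The three rules above can be sketched as a tiny query reader in Python. This is an illustration, not any engine's real parser. It assumes an engine that requires every term whether or not it has a plus sign (the "plus sign is not required" case), and it checks for terms with a plain substring test, which is cruder than real word matching (for example, "bay" would also match "bayou").

```python
import re

def parse_query(query):
    """Split a query into terms to include and terms to exclude.

    Quoted phrases stay together; a leading - marks a term to exclude.
    A leading + (or no sign at all) marks a term to include.
    """
    include, exclude = [], []
    # Each token: optional +/- sign, then a "quoted phrase" or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        term = (phrase or word).strip().lower()
        if not term:
            continue
        if sign == "-":
            exclude.append(term)
        else:
            include.append(term)
    return include, exclude

def matches(text, query):
    """True if text has every included term and none of the excluded ones."""
    include, exclude = parse_query(query)
    text = text.lower()
    return all(t in text for t in include) and not any(t in text for t in exclude)

print(parse_query('"Monterey Bay" +otters -zoo'))
# (['monterey bay', 'otters'], ['zoo'])
print(matches("Otters in the bay near Monterey", '"Monterey Bay"'))
# False
```

The second result is the point of the quotation marks: the page contains both words, but not together as a phrase, so it is not a hit.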
Try This!
You are asking for Web pages that have the words: Monterey
and bay and sea and otters.
Click on Google
Search. Type in the words and see what is returned. This is not a very
precise search...you will probably have retrieved about 25,300 resources.
Now, try varying the
search. Here's one idea. Type in
"Monterey Bay"
"sea otters"
and then click on
Google Search.
There's a big lesson
here: You use quotation marks to hold words in phrases together. This
reduces your results from 25,300 or so to 10,300 or so. In this second
search, you are specifying that the Web pages you retrieve should have
the phrase "Monterey Bay" and the phrase "sea otters."
Try some other variations
-- try making the word otter singular, for example.
Try searching for "otters in Monterey Bay." Each
time, note how literal the computer is.
|
| Advanced
searches |
|
Boolean Searches
Example: Buddhist AND Monastery AND "santa cruz"
http://www.cabrillo.edu/~tsmalley/Boolean.html
| Some Common Boolean Operators |
| AND | AND NOT |
| OR | ( ) = or? |
| NOT | truncation: start* finds anything that begins with start |
| BOTH | |
| ADJ - adjacent, like NEAR | NEAR - within 10 words |
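One way to think about the Boolean operators in the table is as operations on sets of pages. The Python sketch below uses a tiny made-up index (the page numbers mean nothing) to show how AND, OR, NOT and parentheses narrow or widen a result set. It illustrates the logic only, not a real search engine.

```python
# Hypothetical index: search term -> set of page numbers containing it.
INDEX = {
    "buddhist": {1, 2, 5},
    "monastery": {2, 3, 5},
    "santa cruz": {2, 4},
}
ALL_PAGES = {1, 2, 3, 4, 5}

def pages(term):
    """Look up the pages for a term (empty set if the term is unknown)."""
    return INDEX.get(term.lower(), set())

both = pages("Buddhist") & pages("Monastery")       # Buddhist AND Monastery
either = pages("Buddhist") | pages("Monastery")     # Buddhist OR Monastery
without = pages("Buddhist") - pages("Monastery")    # Buddhist AND NOT Monastery
not_buddhist = ALL_PAGES - pages("Buddhist")        # NOT Buddhist
# Parentheses group terms: (Buddhist OR Monastery) AND "santa cruz"
grouped = (pages("Buddhist") | pages("Monastery")) & pages("santa cruz")
# The example search: Buddhist AND Monastery AND "santa cruz"
all_three = pages("Buddhist") & pages("Monastery") & pages("santa cruz")

print(both)       # {2, 5}
print(either)     # {1, 2, 3, 5}
print(all_three)  # {2}
```

Each AND can only shrink the result set and each OR can only grow it, which is a handy rule of thumb when a search returns too many or too few hits.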
1. Go to Lycos
You've heard that
there is a Buddhist monastery somewhere in Santa Cruz County. Find it
using Boolean operators and describe how you did it.
How did you find it?
______________________________________________________________
Advanced search modes particular to some of the search engines.
1. Go to Google (remember, to get to a list of the search engines, it's
Cabrillo College Library -> Searching the Internet -> Search Engines).
Click on Advanced Search -- it's over to the right of the search box.
Search for Web pages that meet these criteria:
- You want information about the best bike trails in Santa Cruz
- You want Web pages updated within the last 3 months
- You want Web pages that only come from educational domains (i.e., have
.edu in their domain name). (You figure those college kids would know
more than others about great bike trails.)
What did you find?
______________________________________________________________
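Behind an Advanced Search form, date and domain filters amount to throwing away results that fail a test. Here is a minimal Python sketch of that idea. The result list, URLs, and dates are made up; "the last 3 months" is approximated as 90 days; and a real engine would apply such filters inside its own database rather than on a list like this.

```python
from datetime import date, timedelta
from urllib.parse import urlparse

def filter_results(results, domain_suffix, max_age_days, today):
    """Keep URLs whose host ends with domain_suffix and whose page was
    updated within max_age_days of today.

    results -- list of (url, last_updated) pairs, last_updated being a date
    """
    cutoff = today - timedelta(days=max_age_days)
    kept = []
    for url, updated in results:
        host = urlparse(url).hostname or ""
        if host.endswith(domain_suffix) and updated >= cutoff:
            kept.append(url)
    return kept

# Made-up example results.
RESULTS = [
    ("http://www.example.edu/trails.html", date(2004, 5, 1)),
    ("http://www.bikeshop.example.com/trails", date(2004, 5, 20)),
    ("http://library.example.edu/old.html", date(2003, 1, 15)),
    ("http://www.example.org.au/melbourne", date(2004, 4, 30)),
]
TODAY = date(2004, 6, 1)

print(filter_results(RESULTS, ".edu", 90, TODAY))
# ['http://www.example.edu/trails.html']
```

Changing the suffix to a country code such as ".au" restricts the same results to a single country's sites instead.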
2. Go to AllTheWeb. Click on ADVANCED SEARCH.
Search for Web pages that meet these criteria:
- You want information about the best places for surfing in Santa Cruz
- You want Web pages that include images
- You want Web pages updated after 1 January 2001
What did you find?
______________________________________________________________
|
| |
| World Search Engines |
|
Play around a bit
with the international search engines so you become familiar with them.
My guess is that these will get better and better in the near future.
Search Engine Colossus
European
Search Engines
Search Engines Worldwide
Country-Based Search Engines
|
|
Using international search engines
1. Search for Web pages that meet these criteria:
- You want information about the best bike trails in Melbourne (Australia)
- You want Web pages updated within the last 3 months
- You want Web pages that only come from Australia. The top level domain
code for Australia is .au.
Here's a list of all the top level domain and country codes.
What did you find?
______________________________________________________________
InfoSeek is now Go.com.
NorthernLights is gone!
|