|
When the World Wide Web opened the Internet to the general public,
searching for information with search engines became a normal part of
our work, entertainment and study routines. Web surfing was born.
However, one recent survey reported that 84% of users were dissatisfied
with their ability to find information on the Web.
Two things allow us to find and retrieve information:
- Search engine programs that find the information on the WWW and
build the database to store it
- Search engine software that allows users to view and search that
database and retrieve information from it
Web Search Retrieval Software - Search engines are Information
Retrieval (IR) systems. IR is a branch of computer science concerned
with letting users (you at your client machine) retrieve specific
information from large databases. Search engines are Web search tools
that build a database and then allow users to ask for information using
a web page interface. The search engine software then retrieves
information about web pages that fit the search criteria and returns
a web page of results or "hits." Some allow submissions by businesses
and organizations that create websites. Often there is no fee for a
submission. A company may charge a small fee to submit your site for
you to anywhere from 20 to hundreds of search engines, and some search
engines charge fees for high placement in the results that are returned.
The user types a query or search expression into a small window on
screen and the search engine does its best to find Web pages (hits)
and their URLs that seem to match the query. Often a description
of the web page is included with the results. Each search engine
searches its own database. No one search engine comes close to
categorizing the entire Web and no two search engines have the same
database of pages.
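The query-to-hits step described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny made-up database: the URLs, descriptions, and the function names `build_index` and `search` are hypothetical and do not describe how any real search engine is built.

```python
from collections import defaultdict

# Hypothetical pages standing in for one engine's own database
# (URL -> short description). A real engine stores far more.
PAGES = {
    "http://example.com/otters": "Sea otters of Monterey Bay",
    "http://example.com/surf": "Surfing spots in Santa Cruz",
    "http://example.com/bay": "Monterey Bay kayak tours",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose description contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, pages, query):
    """Return (url, description) hits that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    urls = set.intersection(*(index.get(w, set()) for w in words))
    return [(url, pages[url]) for url in sorted(urls)]

INDEX = build_index(PAGES)
for url, description in search(INDEX, PAGES, "monterey bay"):
    print(url, "-", description)
```

Because the hits come only from `PAGES`, a second engine with a different database would return different results for the same query.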
| Elements of a good search interface |
| Indexed versus Search Zones - Some search engines let you search their entire indexed database of sites, while others create search zones to help users narrow a site search to pertinent information and pages. |
| Another element of a well-planned search engine interface is to
recognize the users' information needs and provide search capability
for your site based upon those needs, including the method of
indexing your site and the choice of indexing software to load
or purchase. Users will do one or more of the following:
1. Search for a known item - a clearly defined search;
likely providing a single, correct answer.
2. Existence Search - the user knows what they want but not how
to describe it.
3. Exploratory Search - users know how to phrase the question but
are unsure of what they will find. They are exploring and learning
and will do multiple searches.
4. Comprehensive Search - research; users want everything available
on the topic
A last consideration is to build a good user interface for the
search tool and for the returned results. The search tool needs
little more than a small window on the web page, but often the
interface includes tons of ads and other information. Compare
Yahoo with Google.
|
|
|
How well do search engines cover the Internet? - In 1997, the
top 11 search engines covered about 60% of the Internet. As
of 2004, coverage by the top 11 had dropped to about 42% of the
Internet.
Despite claims made by the search engines, no search engine comes
close to covering the entire Internet or all its web pages. The
Internet is growing faster than search engines can find it and index
or categorize it.
Additionally,
much of the data that a user might want is "Hidden" from
the search bots in databases. This is known as the Hidden Internet,
the Deep Web or the Invisible Web/Internet.
The Invisible Web consists of information stored in databases,
according to Chris Sherman, Webmaster of About.com's Web Search.
Spiders and robots cannot enter these databases.
"It's as if they've run smack into the entrance of a massive library
with securely bolted doors," Sherman said. "Spiders can record the
library's address, but can tell you nothing about the books, magazines
or other documents."
What
else makes up the Invisible Web?
* Non-HTML files (PDF files, etc.)
* Webbed databases
* Sites requiring registration or login
* Archives (newspapers and magazines, etc.)
* Dynamically created Web pages
* Interactive tools (calculators, etc.)
http://www.libraryspot.com/features/invisibleweb.htm
Search Engine Coverage of the Internet
http://www.cabrillo.edu/~tsmalley/searchengines.html
Search Engine Statistics
http://searchenginewatch.com
http://searchenginewatch.com/3632382
Search Engine Ranks by Pages Indexed
http://searchenginewatch.com/reports/article/php/215481
Popularity of Search Engines
http://www.silurian.com/sitepos/coverage.htm
Search Engines outside the U.S. - Although we call the web part of
the Internet the World Wide Web, our viewpoint is generally focused on
the U.S. and ignores the rest of the world. The search engines we use
are U.S. based and focus on U.S. websites.
There are also search engines based in other countries that focus on
the websites of those countries and their continents.
It's the World Wide Web
European search engines
Search Engine Colossus
|
Queries may be in the form of:
- key word searches
- specific words or phrases
- advanced searches using Boolean logic (Boolean operators such as
AND, OR, NOT to search more specifically)
- fuzzy queries
- queries where you type in full sentences (Ask Jeeves)
The databases are built by automated searching tools. Private Internet
companies like Yahoo! and Lycos have developed powerful programs called
spiders, crawlers, web robots, or just bots for short, that search the
Internet for web pages and enter them into giant databases. The bots
automatically:
- find new information and Web pages
- find updated information
- delete web pages that are no longer on the Web
- update the database
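The four bot tasks listed above can be sketched as follows. This is a simplified illustration, not how any particular company's spider works: the "web" here is an in-memory dictionary rather than the real Internet, and the names `crawl` and `refresh` are hypothetical.

```python
from collections import deque

def crawl(web, seed):
    """Follow links breadth-first from seed; return {url: text} for pages found.

    `web` is a stand-in for the Internet: {url: (text, [linked urls])}.
    """
    found, queue = {}, deque([seed])
    while queue:
        url = queue.popleft()
        if url in found or url not in web:
            continue  # already visited, or the page is no longer on the Web
        text, links = web[url]
        found[url] = text
        queue.extend(links)
    return found

def refresh(database, found):
    """Update the database in place and report what the bot changed."""
    added = sorted(u for u in found if u not in database)
    updated = sorted(u for u in found if u in database and database[u] != found[u])
    deleted = sorted(u for u in database if u not in found)
    for url in deleted:
        del database[url]
    database.update(found)
    return {"added": added, "updated": updated, "deleted": deleted}

# Made-up example: page "c" is linked but missing, "old" has vanished.
web = {
    "a": ("home", ["b", "c"]),
    "b": ("news v2", ["a", "d"]),
    "d": ("new page", []),
}
database = {"a": "home", "b": "news v1", "old": "gone"}
report = refresh(database, crawl(web, "a"))
print(report)
```

Note that a page nobody links to is never reached by `crawl`, which is one reason a bot-built database never covers the whole Web.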
The databases that are created by the bots and searchable by users
come in four general formats or categories:
- search engines - uncategorized databases that search large parts
of the Web.
- subject lists, indexes and directories - databases sorted into
categories
- meta-search engines - one search engine is used to search several
other search engines, collate the results and return them in one
organized report
- Other Web resources - whatis, Internic's whois, Cruzio's domain
lookup, bigfoot, whowhere, Yahoo! People Search, Yahoo Yellow Pages,
mapquest, Yellow Pages, Usenet newsgroups, and so on.
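To make the meta-search idea from the list above concrete, here is a minimal sketch of one way results could be collated. The scoring rule (each engine gives a URL 1/rank points) is an assumption chosen for illustration, not the method used by any real meta-search engine, and `engine_a` and `engine_b` are made-up stand-ins for real engines.

```python
def meta_search(query, engines):
    """Send the query to every engine and merge the ranked URL lists.

    Each engine is a function that takes a query and returns URLs,
    best first. A URL earns 1/rank points from each engine that
    returns it; ties are broken alphabetically.
    """
    scores = {}
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))

# Two hypothetical engines with different databases.
def engine_a(query):
    return ["x.example", "y.example"]

def engine_b(query):
    return ["y.example", "z.example"]

print(meta_search("sea otters", [engine_a, engine_b]))
```

A page found by both engines rises to the top, and each page appears only once in the combined report.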
A comparative search
engine chart for selected search engines
http://libwww.cabrillo.cc.ca.us/html/searchengchart.html
| A comparative chart of four search engines |
| Google - The largest. Particularly good at ranking results. Automatically puts an AND between your terms so you don't have to, e.g., "cheap airline tickets" Europe. No truncation function. Try the Advanced search mode, where you can use date and domain filters. |
| AltaVista - Particularly powerful in the Advanced search mode, where you can do a Boolean search. AltaVista is the only search engine to offer the operator NEAR, which retrieves words within 10 words of each other. For example, "cheap airline tickets" NEAR Barcelona. |
| AllTheWeb - A very large, very fast search engine. Simulate Boolean searches by using + signs to require words in the results list, e.g., +"cheap airline tickets" +Europe. Try the Advanced Search mode. No truncation. |
| HotBot - Easy to search for particular kinds of files. Takes Boolean search statements when you change the option box, e.g., "cheap airline" AND (Spain OR Austria). You can specify date and time filters, also file types, from the opening search page. Use Advanced Search for more options. Can truncate using *. |
To get to a list
(with descriptions) of the Search Tools:
1. Be on the Cabrillo
College Library homepage
2. Click on Search the Internet
3. Click on Search Engines
The Big Question: How will anyone know you're there? Well, they won't
unless you shout it out. It takes marketing, registering and promoting
to get customers to your Web site.
Visitors typically do one of the following:
- They hear about you from advertising, write down or memorize your
URL and type it into their browser's "Location box." Two years ago
about 70% of all business eCommerce sites were found through Internet
searches; now, about 65% are found through traditional advertising
channels.
- They follow a link from another site or online ad to your site.
- A friend or acquaintance refers them.
- They "hit" on your site through a search.
The reality of
the Internet today is that your website is one of millions of
Websites. It is unrealistic and even foolish to expect customers
to find you through search engines and directories. You will need
to rely upon targeted marketing efforts. That means you must make
and then implement a marketing
plan that begins with identifying and targeting customers and
includes both Web and traditional marketing efforts to reach
those customers with informative and persuasive messages.
|
|
Every search engine has slightly or greatly different rules for
searches. Despite the variations, three rules apply across all the
search engines:
- Use quotation marks (" ") to keep words in phrases together
- Use a plus sign (+) in front of a term (no space) to require it in
the search results. In some search engines, now, the plus sign is not
required -- but you're not penalized for using it.
- Use a minus sign (-) in front of a term (no space) to disallow it in
the search results
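The three rules above amount to a small query syntax. As a rough sketch of how software might read such a query, the following Python splits it into required, excluded, and plain terms. The function name `parse_query` and the category names are illustrative choices, and real search engines each have their own parsing rules.

```python
import re

# An optional + or - sign, followed by either a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_query(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).strip().lower()
        if not term:
            continue  # skip empty quotes such as ""
        key = {"+": "required", "-": "excluded"}.get(sign, "optional")
        parsed[key].append(term)
    return parsed

print(parse_query('+"cheap airline tickets" +Europe -cruise'))
```

The quoted phrase stays together as one term, while the sign in front of each term (with no space) decides whether it is required or disallowed.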
Try This!
Go to Google and type in the words Monterey bay sea otters, then click
on Google Search. You are asking for Web pages that have the words
Monterey and bay and sea and otters. See what is returned. This is not
a very precise search...you will probably have retrieved about 25,300
resources.
Now, try varying the search. Here's one idea. Type in
"Monterey Bay" "sea otters"
and then click on Google Search.
There's a big lesson here: You use quotation marks to hold words in
phrases together. This reduces your results from 25,300 or so to 10,300
or so. In this second search, you are specifying that the Web pages you
retrieve should have the phrase "Monterey Bay" and the phrase
"sea otters."
Try some other variations -- try making the word otter singular, for
example.
Try searching for "otters in Monterey Bay." Each time, note how literal
the computer is.
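The difference between the two searches can be shown with a short sketch. It assumes a very simple matcher that compares whole lowercase words, which is stricter than most real engines, and the two sample pages are made up for the example.

```python
def contains_all_words(text, words):
    """True if every word appears somewhere on the page, in any order."""
    page_words = text.lower().split()
    return all(word.lower() in page_words for word in words)

def contains_phrase(text, phrase):
    """True only if the words of the phrase appear side by side, in order."""
    page_words = text.lower().split()
    target = phrase.lower().split()
    if not target:
        return False
    n = len(target)
    return any(page_words[i:i + n] == target
               for i in range(len(page_words) - n + 1))

page1 = "sea otters rest in the kelp of monterey bay"
page2 = "otters live by the bay in monterey near the sea"

# Both pages match the four separate words...
print(contains_all_words(page1, ["Monterey", "bay", "sea", "otters"]))
print(contains_all_words(page2, ["Monterey", "bay", "sea", "otters"]))
# ...but only page1 matches the quoted phrases.
print(contains_phrase(page1, "Monterey Bay"), contains_phrase(page2, "Monterey Bay"))
```

The matcher is just as literal as the exercise suggests: `contains_phrase(page1, "sea otter")` is false because the page says "otters," not "otter."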
|
|
Boolean Searches
Buddhist AND Monastery AND "santa cruz"
http://www.cabrillo.edu/~tsmalley/Boolean.html
| Some Common Boolean Operators |
| AND | AND NOT |
| OR | ( ) = or? |
| NOT | truncations: start* finds anything that begins with start |
| BOTH | |
| ADJ - adjacent, like near | NEAR - within 10 words |
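Several of the operators in the table can be imitated with ordinary Python, which may help show what each one asks for. This is a sketch under simple assumptions: a page is a plain string, words are compared in lowercase, and the helper names `has` and `near` are invented for the example. AND, OR and NOT map onto Python's own `and`, `or` and `not`.

```python
def words_of(text):
    """Lowercase the text and split it into words, trimming punctuation."""
    return [w.strip('.,;:!?"()') for w in text.lower().split()]

def has(text, term):
    """True if the term is on the page; a trailing * matches any ending."""
    term = term.lower()
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words_of(text))
    return term in words_of(text)

def near(text, a, b, distance=10):
    """True if words a and b occur within `distance` words of each other."""
    words = words_of(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

page = "A Buddhist monastery started offering retreats in the hills above Santa Cruz."

print(has(page, "Buddhist") and has(page, "monastery"))    # AND
print(has(page, "temple") or has(page, "monastery"))       # OR
print(has(page, "monastery") and not has(page, "temple"))  # AND NOT
print(has(page, "start*"))                                 # truncation
print(near(page, "monastery", "Cruz"))                     # NEAR, within 10 words
```

Here "monastery" and "Cruz" are nine words apart, so the NEAR test succeeds with the default distance of 10 but would fail with a smaller one.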
1. Go to Lycos
You've heard that
there is a Buddhist monastery somewhere in Santa Cruz County. Find
it using Boolean operators and describe how you did it.
How did you find
it?
______________________________________________________________
______________________________________________________________
______________________________________________________________
Advanced search modes particular to some of the search engines.
1. Go to Google (remember, to get to a list of the search engines,
it's Cabrillo College Library -> Searching the Internet -> Search
Engines).
Click on Advanced Search -- it's over to the right of the search box.
Search for Web pages that meet these criteria:
- You want information about the best bike trails in Santa Cruz
- You want Web pages updated within the last 3 months
- You want Web pages that only come from educational domains (i.e.,
have .edu in their domain name) (You figure those college kids would
know more than others about great bike trails)
What did you find?
______________________________________________________________
______________________________________________________________
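The date and domain criteria in this exercise work like filters applied to each candidate page. The sketch below shows the idea, assuming each page comes with a known last-updated date and treating "the last 3 months" as 90 days. The function name `passes_filters` and the sample URLs and dates are made up for illustration and are not part of any search engine's interface.

```python
from datetime import date, timedelta
from urllib.parse import urlparse

def passes_filters(url, last_updated, today, domain_suffix=".edu", max_age_days=90):
    """True if the page's host ends with the suffix and it was updated recently."""
    host = urlparse(url).hostname or ""
    recent = today - last_updated <= timedelta(days=max_age_days)
    return host.endswith(domain_suffix) and recent

today = date(2004, 6, 1)  # example date, chosen only for the demonstration
print(passes_filters("http://www.cabrillo.edu/trails.html", date(2004, 5, 1), today))
print(passes_filters("http://www.example.com/trails.html", date(2004, 5, 1), today))
print(passes_filters("http://www.cabrillo.edu/old.html", date(2003, 1, 1), today))
```

Only the first page passes both tests: the second is not from an educational domain and the third was updated too long ago.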
2. Go to AllTheWeb. Click on ADVANCED SEARCH.
Search for Web pages that meet these criteria:
- You want information about the best places for surfing in Santa Cruz
- You want Web pages that include images
- You want Web pages updated after 1 January 2001
What did you find?
______________________________________________________________
______________________________________________________________
|