INLS80 ASSIGNMENT 2

An Exercise in Web Searching

For questions or comments please Email Robb

INTRODUCTION

Search engines have had a significant impact on the way that college students retrieve ordinary, everyday information. The number of unanswered queries that arise in the average day for a college student is staggering. Who was in that movie with the guy and that girl in 8th grade? How many home runs did Barry Bonds hit in 2000? How old is Dolly Parton? And whatever happened to ALF? The list goes on and on. Even as late as my freshman year (1999), the answers to these questions remained unattainable, too large an effort to be worthwhile. Now, debates are settled and questions answered with "Google It," or "go Yahoo!." It seems that the right words typed into a search engine will give you the answers to any of life's quandaries.

My approach to comparing and contrasting search engines is to first compile the basics about each site and then query each engine on a random question that could come up in a debate with a competitive friend. The evaluation rests on the ability of each engine to produce accurate answers with the least effort on the part of the average college student. The three engines profiled are Google, Teoma, and Alta Vista.


SEARCH ENGINE PROFILES:

Description
Google: The King
Teoma: The Up-and-Comer
Alta Vista: The Old Man

How it Works
Google: Spider-Based Software that crawls the web and sends information back to indexing software. When a query is run, the database returns the indexed items ranked by relevance.
Teoma: Spider-Based Software, returns information in three categories: relevant web pages, suggestions to narrow search, and Resources
Alta Vista: Spider-Based Software that crawls the web and sends information back to indexing software. When a query is run, the database returns the indexed items ranked by relevance.

Size
Google: The biggest. Claims to have over 1.5 Billion websites indexed.
Teoma: Around 100 Million, a newcomer growing rapidly.
Alta Vista: Around 550 Million sites including multimedia resources.

Relevancy
Google: High quality, due to link analysis: link itself is included in scoring returns.
Teoma: Results returned in a more dynamic fashion than Google: rankings updated in real time. Also takes into consideration the "community" factor: number and relevancy of pages that are linked to the page.
Alta Vista: Quality.

Currency
Google: Updated about once a month.
Teoma: Rankings updated in real time.
Alta Vista: Updated about once a month.

Phrase Searching?
Google: Yes, use quotation marks.
Teoma: Yes, use quotation marks.
Alta Vista: Yes, use quotation marks.

Sub-Searching?
Google: Yes, at bottom of initial results page.
Teoma: No
Alta Vista: No

Results Ranking
Google: Based on number of pages that are linked to it.
Teoma: (See relevancy)
Alta Vista: Based on number of pages that are linked to it.

Advanced Search?
Google: Yes
Teoma: No
Alta Vista: Yes

Search Terms Highlighted?
Google: Yes
Teoma: Yes
Alta Vista: Yes
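
The profiles describe Google and Alta Vista as ranking results by the number of pages that link to a page. As a rough illustration of that idea only, here is a minimal Python sketch that counts inbound links in a tiny made-up web and orders candidate pages by that count. The page names and the LINKS data are hypothetical, invented for this example, and a real engine surely weighs far more than a raw link count.

```python
from collections import Counter

# Hypothetical toy web (invented for illustration):
# each page maps to the pages it links out to.
LINKS = {
    "fanpage": ["network", "episodes"],
    "episodes": ["network"],
    "forum": ["fanpage", "network"],
    "network": [],
}

def inbound_link_counts(links):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts

def rank_by_inbound_links(candidates, links):
    """Order candidate pages by inbound link count, highest first.
    Ties are broken alphabetically so the order is stable."""
    counts = inbound_link_counts(links)
    return sorted(candidates, key=lambda page: (-counts[page], page))

print(rank_by_inbound_links(["fanpage", "episodes", "forum"], LINKS))
# prints ['episodes', 'fanpage', 'forum']
```

In this toy data "forum" has no inbound links, so it sorts last even if its text matched the query just as well as the others.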

 


MISCELLANEOUS INFORMATION

Google
- Able to search PDF files on the web
- Cannot do complex Boolean searches
- Fully indexed as well as partially indexed sites
- Results page is top 10 results
- User is given option of "Feeling Lucky" search where the user is immediately taken to the top scoring result

Teoma
- No advanced searching
- No field entry options
- Ignores frequently occurring words
- Not case sensitive
- Results display top 10 results with the title, a three-line description, and the URL

Alta Vista
- Case sensitive
- After search the user is given the opportunity to extend the search, search eBay directly, search the yellow pages, or search a listing of travel, shopping, and other resources
- User is able to search images across the web, audio clips, movie images, and news sources
- Default Boolean options
- Search results page includes top 10 results, a title link, the URL, and a three-sentence description from the site

 


THE TEST

All of the above technical information is great for the systems analysis and technical people, but what does it mean for the common user, namely the mildly educated college student? The method for testing is to take a specific, somewhat obscure question of the kind that comes up as daily pop culture knowledge. The search terms are chosen from the perspective of a student who has not taken INLS 70 and wants a quick answer. The engine that returns the correct answer with the least effort will be declared the winner.

 

The Question: Whatever happened to ALF? When did it go off the air?

Search Parameters: ALF AND TV AND SHOW AND CANCELLED

Correct Answer: The show was cancelled in 1990 after four years on the air.

The reason is that the show was ridiculous.
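
The search string above joins four terms with AND, meaning a page should only come back if it contains every one of them. A minimal Python sketch of how such a query could be evaluated against a word index follows. The three page descriptions in PAGES are hypothetical, invented for this example, and the matching is a simplification (exact lowercase words, no stemming or ranking), not how any of the three engines is known to work.

```python
from collections import defaultdict

# Hypothetical page descriptions, invented for illustration only.
PAGES = {
    "page1": "ALF TV show cancelled after four seasons",
    "page2": "Interview with a TV show composer",
    "page3": "ALF fan page with episode guide",
}

def build_index(pages):
    """Map each lowercase word to the set of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def boolean_and(query, index):
    """Return the pages that contain every term in an AND query."""
    terms = [t.lower() for t in query.split() if t.upper() != "AND"]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # keep only pages with this term too
    return result

index = build_index(PAGES)
print(sorted(boolean_and("ALF AND TV AND SHOW AND CANCELLED", index)))
# prints ['page1']
```

Each extra AND term can only shrink the result set, which is one reason a four-term query returns hundreds of hits rather than millions.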


 

Google: 929 Results returned in 0.03 seconds

#1 Result: Site about Alf Clausen answering questions about the TV show. One of the last questions of the interview is "What Happened to ALF?" The answer given is that it was cancelled, though the year is not given.
The correct answer is obtained on the third link.
It would not take long for even the lowest skilled student to find the answer.
 

Teoma: 188 Results

#1 Result: Everything you could want to know about ALF at Stephen's ALF page. Here we learn that ALF was cancelled again in 2001 from its rerun slot on the Hallmark channel. In the Info section we learn all the background about ALF needed to get the right answer.
The remaining indexed links on the first page are completely irrelevant.
 

Alta Vista: 280 Results

#1 Result: Interestingly, we can get the answer from the description line of the first link, though it would be difficult to verify. The description says "ALF, NBC (1986-1990)." The website itself, however, is difficult to navigate and pretty useless. So, we get the answer easily if we trust the content of the search's explanatory line.
 

 


CONCLUSION

Teoma is the winner of this exercise. The advanced indexing that allows Teoma to rank according to level of expertise in the community of the topic searched ensures that the top scorers are closer to authorities on the subject. Alta Vista also returned the answer painlessly, though the link itself was an unprofessional site with little good information. Google would also get the job done, though not as effectively as Teoma.

What we can learn from this simple experiment is that the strengths, weaknesses, and goals of a search engine need to be considered when selecting your tool. With so much information at the fingertips of anyone with even a little bit of knowledge of how to access it, every question has an answer!

 
