
Search Engines: The Players and The Field

This document discusses search engines and how they work. It begins with an overview of the typical search process, including how search engines return ranked results and ads in response to queries. It then describes the major components of a search engine, including the web index, query engine, and crawling process. Specific search engines like Google, Yahoo, Ask Jeeves, and others are also mentioned. The document concludes with statistics about search queries, languages used, and the rate of web content changes.

Uploaded by

Parimita Sarma
Copyright
© Attribution Non-Commercial (BY-NC)

Search Engines: The players and the field

The mechanics of a typical search.

The search engine wars.


Statistics from search engine logs.

The architecture of a search engine.

The query engine.

Mechanics of a typical search

[Figure: annotated screenshot of a results page. Annotations: results and ads returned, ranked; category of first result; result for phrase query.]

Search on the Web


Corpus: the publicly accessible Web, static + dynamic.
Goal: retrieve high-quality results relevant to the user's need (not docs!).

Need
Informational: want to learn about something, e.g. Low hemoglobin
Navigational: want to go to that page, e.g. United Airlines
Transactional: want to do something (web-mediated)
    Access a service
    Downloads
    Shop
Gray areas
    Find a good hub
    Exploratory search: "see what's there"

Example queries: Tampere weather; Mars surface images; Nikon CoolPix; Car rental Finland; Abortion morality

Search Engines as Info Gatekeepers


Search engines are becoming the primary entry point for discovering web pages. Ranking of web pages influences which pages users will view. Exclusion of a site from search engines will cut off the site from its intended audience. The privacy policy of a search engine is important.

Introna & Nissenbaum: Defining the Web: The Politics of Search Engines
Hindman et al: Googlearchy: How a few Heavily-Linked Sites Dominate Politics on the Web

Search Engine Wars


The battle for domination of the web search space is heating up! The competition is good news for users! Crucial: advertising is combined with search results! What if one of the search engines manages to dominate the space?

Yahoo!

Synonymous with the dot-com boom, probably the best known brand on the web. Started off as a web directory service in 1994, acquired leading search engine technology in 2003.

Has very strong advertising and e-commerce partners

Lycos!

One of the pioneers of the field

Introduced innovations that inspired the creation of Google

Google

The verb "google" has become synonymous with searching for information on the web.
Has raised the bar on search quality.
Has been the most popular search engine in the last few years.
Had a very successful IPO in August 2004.
Is innovative and dynamic.
Has restored the glamour that CS lost in the dot-com bust.

Live Search (was: MSN Search)

Run by Microsoft, which is synonymous with PC software.
Remember its victory in the browser wars with Netscape.
Developed its own search engine technology only recently, officially launched in Feb. 2005.
May link web search into its next version of Windows.

Ask Jeeves

Specialises in natural language question answering. Search driven by Teoma.

Cuil

The latest kid on the block. Claims to have indexed 120B pages! So far, it does not rank!

Experiment with query syntax


Default is AND: computer chess is normally interpreted as computer AND chess, i.e. both keywords must be present in every hit.
+chess in a query means the user insists that chess be present in every hit.
computer OR chess means at least one of the keywords must be present in every hit.
"computer chess" (in quotes) means that the exact phrase computer chess must be present in every hit.
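The query syntax rules can be tried out without a real search engine. The following is a minimal sketch, not how any particular engine parses queries: the function names parse_query and matches are hypothetical, tokenisation is a plain word split, and OR is assumed to bind only to the terms immediately on either side of it.

```python
import re

def parse_query(query):
    """Split a query into quoted phrases and AND-ed groups of OR alternatives."""
    phrases = [" ".join(re.findall(r"\w+", p.lower()))
               for p in re.findall(r'"([^"]+)"', query)]
    tokens = re.sub(r'"[^"]+"', " ", query).split()
    groups = []          # every group must match; a group matches if any term does
    join_next = False    # True right after an OR keyword
    for tok in tokens:
        if tok == "OR":
            join_next = bool(groups)
            continue
        term = tok.lstrip("+").lower()   # +term is simply a required term here
        if not term:
            continue
        if join_next:
            groups[-1].add(term)
        else:
            groups.append({term})
        join_next = False
    return phrases, groups

def matches(query, text):
    """Return True if the text satisfies the query under the rules above."""
    words = re.findall(r"\w+", text.lower())
    padded = " " + " ".join(words) + " "
    phrases, groups = parse_query(query)
    word_set = set(words)
    return (all(f" {p} " in padded for p in phrases)
            and all(group & word_set for group in groups))

# Example documents (made up for illustration).
d1 = "Computer chess programs"
d2 = "Chess openings for beginners"
d3 = "The computer plays chess"
print(matches("computer chess", d2))      # default AND
print(matches("computer OR chess", d2))   # either keyword
print(matches('"computer chess"', d3))    # exact phrase
```

Real engines differ in details such as stemming, stop words and operator precedence, so the results of the same experiment on a live engine may not agree with this sketch.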

Statistics from search engine logs


Statistic                          AltaVista (1998)   AlltheWeb (2002)   Excite (2001)
Average terms per query            2.35               2.30               2.60
Average queries per session        2.02               2.80               2.30
Average result pages viewed        1.39               1.55               1.70
Usage of advanced search features  20.4%              1.0%               10.0%

The most popular search keywords


AltaVista (1998)   AlltheWeb (2002)   Excite (2001)
sex                free               free
applet             sex                sex
porno              download           pictures
mp3                software           new
chat               uk                 nude

Web search Users


Ill-defined queries
    Short length
    Imprecise terms
    Sub-optimal syntax (80% of queries without operator)
    Low effort in defining queries

Wide variance in
    Needs
    Expectations
    Knowledge
    Bandwidth

Specific behavior
    85% look over one result screen only, mostly above the fold
    78% of queries are not modified (1 query/session)
    Follow links: "the scent of information" ...

Query Distribution

Power law: few popular broad queries, many rare specific queries
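To get a feel for what a power law implies, the sketch below assumes a Zipf-like model in which the query at popularity rank r has frequency proportional to 1/r^s. The exponent s = 1 and the figure of 100,000 distinct queries are illustrative assumptions, not values from the slides or from any query log.

```python
def zipf_shares(num_queries, exponent=1.0):
    """Share of total query volume for each popularity rank 1..num_queries."""
    weights = [1.0 / rank ** exponent for rank in range(1, num_queries + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def head_share(shares, k):
    """Fraction of all query traffic covered by the k most popular queries."""
    return sum(shares[:k])

shares = zipf_shares(100_000)   # assumed number of distinct queries
print(f"top 100 queries cover {head_share(shares, 100):.1%} of traffic")
print(f"bottom half covers    {1 - head_share(shares, 50_000):.1%} of traffic")
```

Under these assumptions a small head of queries accounts for a large share of traffic, while the long tail of rare queries is individually negligible but collectively substantial.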

How far do people look for results?

(Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)

Architecture of a Search Engine


[Figure: search engine architecture. Components shown: the Web, Web spider, Indexer, Indexes and Ad indexes, Search, User. The user side shows an example result page for the query miele: "Results 1 - 10 of about 7,310,000 for miele. (0.12 seconds)", with organic results (www.miele.com, www.miele.co.uk, www.miele.de, www.miele.at) and sponsored links alongside.]

Rate of web content change


720K pages from 270 popular sites, sampled daily from Feb 17 to Jun 14, 1999 [Cho00]

Mathematically, what does this seem to be?

What does this suggest for crawling policy?
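One possible reading of the two questions, assumed here rather than stated in the slides, is that changes to a page arrive as a Poisson process, so the time until the next change is exponentially distributed. Under that assumption the chance that a page has changed since the last crawl is 1 - exp(-rate * t). The function names and the change rates below are hypothetical.

```python
import math

def prob_changed(rate_per_day, days_since_crawl):
    """P(page changed at least once) under an assumed Poisson change model."""
    return 1.0 - math.exp(-rate_per_day * days_since_crawl)

def revisit_interval(rate_per_day, max_stale_prob):
    """Longest wait (days) that keeps the change probability at or below the bound."""
    return -math.log(1.0 - max_stale_prob) / rate_per_day

# Illustrative rates only: a page changing about daily vs. about monthly.
for label, rate in [("daily-ish page", 1.0), ("monthly-ish page", 1.0 / 30)]:
    days = revisit_interval(rate, max_stale_prob=0.5)
    print(f"{label}: revisit within {days:.1f} days for a 50% staleness bound")
```

With a fixed staleness bound, pages that change faster need proportionally shorter revisit intervals. Whether that is the best use of a limited crawl budget is a separate policy question that this sketch does not settle.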

Diversity
Languages/Encodings

Hundreds of languages, W3C encodings: 55 (Jul01) [W3C01]
Home pages (1997): English 82%, Next 15: 13% [Babe97]
Google (mid 2001): English: 53%, JGCFSKRIP: 30%

Document & query topic


Popular Query Topics (from 1 million Google queries, Apr 2000)

Top-level topic   Share
Arts              14.6%
Computers         13.8%
Regional          10.3%
Society           8.7%
Adult             8%
Recreation        7.3%
Business          7.2%

Subtopic                   Share
Arts: Music                6.1%
Regional: North America    5.3%
Adult: Image Galleries     4.4%
Computers: Software        3.4%
Computers: Internet        3.2%
Business: Industries       2.3%
Regional: Europe           1.8%

Search Index - Inverted File


[Figure: inverted file example; one labelled field is Frequency.]

Also store position of word in web page (offset) and information on HTML structure.
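A toy version of such an inverted file can be written in a few lines. This is a minimal sketch under simple assumptions: pages are plain strings, words are split on non-word characters, and the HTML structure information mentioned above is left out. The name build_inverted_index and the sample pages are made up for illustration.

```python
import re
from collections import defaultdict

def build_inverted_index(pages):
    """Map each word to {page_id: [word offsets]}; frequency is the list length."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for offset, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(page_id, []).append(offset)
    return index

pages = {
    "p1": "computer chess and computer go",
    "p2": "chess openings",
}
index = build_inverted_index(pages)
for page_id, offsets in index["computer"].items():
    print(page_id, "frequency:", len(offsets), "offsets:", offsets)
```

Storing offsets rather than a bare count is what makes phrase queries possible: two words form a phrase on a page when their offsets differ by one.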

The query engine


The interface between the search index, the user and the web. Algorithmic details of commercial search engines are kept as trade secrets. First step is retrieval of potential results from the index. Second step is the ranking of the results based on their relevance to the query.
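The two steps can be illustrated with a deliberately simple stand-in. Since commercial ranking algorithms are secret, the sketch below swaps in a plain TF-IDF score as the relevance measure; that choice, the function names and the sample pages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def search(query, pages):
    """Step 1: retrieve pages containing every query term. Step 2: rank them."""
    terms = tokenize(query)
    if not terms:
        return []
    term_counts = {pid: Counter(tokenize(text)) for pid, text in pages.items()}
    # Step 1: retrieval of potential results (default AND semantics).
    candidates = [pid for pid, tf in term_counts.items()
                  if all(t in tf for t in terms)]
    # Step 2: ranking by a simple TF-IDF sum (an assumed scoring function).
    n = len(term_counts)
    doc_freq = {t: sum(1 for tf in term_counts.values() if t in tf) for t in terms}
    def score(pid):
        return sum(term_counts[pid][t] * math.log(1 + n / doc_freq[t])
                   for t in terms)
    return sorted(candidates, key=score, reverse=True)

pages = {
    "a": "chess chess computer",
    "b": "computer chess",
    "c": "weather in Tampere",
}
print(search("computer chess", pages))
```

Page "a" ranks above "b" here only because it repeats a query term. Real engines combine many more signals, such as link structure, which this sketch ignores.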

Portal User Interface

Crawling the Web

Mode of crawl: BFS.
Frequency of crawl: important.
robots.txt gives explicit directions on what not to crawl.
Parallel machines crawl all the time.
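A breadth-first crawl that honours robots.txt can be sketched without touching the network. The link graph, the robots.txt content and the agent name "toy-crawler" below are all made up; only urllib.robotparser from the Python standard library is real. A production crawler would also fetch pages over HTTP, keep a robots.txt per host, and run in parallel.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory web: page URL -> outgoing links.
WEB = {
    "http://example.org/": ["http://example.org/a", "http://example.org/private/x"],
    "http://example.org/a": ["http://example.org/b", "http://example.org/"],
    "http://example.org/b": [],
    "http://example.org/private/x": ["http://example.org/secret"],
}
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]

def bfs_crawl(seed, web, robots_lines, agent="toy-crawler"):
    """Visit pages in breadth-first order, skipping URLs robots.txt disallows."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    seen = {seed}
    queue = deque([seed])
    crawled = []
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue                      # never fetched, so its links stay unknown
        crawled.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return crawled

print(bfs_crawl("http://example.org/", WEB, ROBOTS_TXT))
```

Because the disallowed page is never fetched, the page it links to is never discovered either, which shows how robots.txt shapes what ends up in the index.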
