Search Engine - Wikipedia

Search engine

A search engine is a software system that provides hyperlinks to web pages and other relevant
information on the Web in response to a user's query. The user inputs a query within a web browser or
a mobile app, and the search results are often a list of hyperlinks, accompanied by textual
summaries and images. Users also have the option of limiting the search to a specific type of
results, such as images, videos, or news.

Some engines suggest queries when the user is typing in the search box.

For a search provider, its engine is part of a distributed computing system that can encompass many
data centers throughout the world. The speed and accuracy of an engine's response to a query are
based on a complex system of indexing that is continuously updated by automated web crawlers.
This can include data mining the files and databases stored on web servers, but some content is not
accessible to crawlers.

There have been many search engines since the dawn of the Web in the 1990s, but Google Search
became the dominant one in the 2000s and has remained so. It currently has a 91% global market
share.[1][2] The business of websites improving their visibility in search results, known as marketing
and optimization, has thus largely focused on Google.

History

Pre-1990s

In 1945, Vannevar Bush described an information retrieval system that would allow a user to access
a great expanse of information, all at a single desk.[3] He called it a memex. He described the system
in an article titled "As We May Think" that was published in The Atlantic Monthly.[4] The memex was
intended to give a user the capability to overcome the ever-increasing difficulty of locating
information in ever-growing centralized indices of scientific work. Vannevar Bush envisioned libraries
of research with connected annotations, which are similar to modern hyperlinks.[5] Link analysis
eventually became a crucial component of search engines through algorithms such as Hyper Search and
PageRank.[6][7]

Timeline (full list)

Year  Engine            Current status
1993  W3Catalog         Inactive
1993  ALIWEB            Inactive
1993  JumpStation       Inactive
1993  WWW Worm          Inactive
1994  WebCrawler        Active
1994  Go.com            Inactive, redirects to Disney
1994  Lycos             Active
1994  Infoseek          Inactive, redirects to Disney
1995  Yahoo! Search     Active, initially a search function for Yahoo! Directory
1995  Daum              Active
1995  Search.ch         Active
1995  Magellan          Inactive
1995  Excite            Active
1995  MetaCrawler       Active
1995  AltaVista         Inactive, acquired by Yahoo! in 2003, since 2013 redirects to Yahoo!
1996  RankDex           Inactive, incorporated into Baidu in 2000
1996  Dogpile           Active
1996  HotBot            Inactive (used Inktomi search technology)
1996  Ask Jeeves        Active (rebranded ask.com)
1997  AOL NetFind       Active (rebranded AOL Search since 1999)
1997  goo.ne.jp         Active
1997  Northern Light    Inactive
1997  Yandex            Active
1998  Google            Active
1998  Ixquick           Active as Startpage.com
1998  MSN Search        Active as Bing
1998  empas             Inactive (merged with NATE)
1999  AlltheWeb         Inactive (URL redirected to Yahoo!)
1999  GenieKnows        Inactive, rebranded Yellowee (was redirecting to justlocalbusiness.com)
1999  Naver             Active
1999  Teoma             Inactive (redirect to Ask.com)
2000  Baidu             Active
2000  Exalead           Inactive
2000  Gigablast         Inactive
2001  Kartoo            Inactive
2003  Info.com          Active
2004  A9.com            Inactive
2004  Clusty            Inactive (redirect to DuckDuckGo)
2004  Mojeek            Active
2004  Sogou             Active
2005  SearchMe          Inactive
2005  KidzSearch        Active, Google Search
2006  Soso              Inactive, merged with Sogou
2006  Quaero            Inactive
2006  Search.com        Active
2006  ChaCha            Inactive
2006  Ask.com           Active
2006  Live Search       Active as Bing, rebranded MSN Search
2007  wikiseek          Inactive
2007  Sproose           Inactive
2007  Wikia Search      Inactive
2007  Blackle.com       Active, Google Search
2008  Powerset          Inactive (redirects to Bing)
2008  Picollator        Inactive
2008  Viewzi            Inactive
2008  Boogami           Inactive
2008  LeapFish          Inactive
2008  Forestle          Inactive (redirects to Ecosia)
2008  DuckDuckGo        Active
2008  TinEye            Active
2009  Bing              Active, rebranded Live Search
2009  Yebol             Inactive
2009  Scout (Goby)      Active
2009  NATE              Active
2009  Ecosia            Active
2009  Startpage.com     Active, sister engine of Ixquick
2010  Blekko            Inactive, sold to IBM
2010  Cuil              Inactive
2010  Yandex (English)  Active
2010  Parsijoo          Active
2011  YaCy              Active, P2P
2012  Volunia           Inactive
2013  Qwant             Active
2014  Egerin            Active, Kurdish / Sorani
2014  Swisscows         Active
2014  Searx             Active
2015  Yooz              Inactive
2015  Cliqz             Inactive
2016  Kiddle            Active, Google Search
2017  Presearch         Active
2018  Kagi              Active
2020  Petal             Active
2021  Brave Search      Active
2021  Queye             Active
2021  You.com           Active

1990s: Birth of search engines

The first internet search engines predate the debut of the Web in December 1990: WHOIS user
search dates back to 1982,[8] and the Knowbot Information Service multi-network user search was
first implemented in 1989.[9] The first well documented search engine that searched content
files, namely FTP files, was Archie, which debuted on 10 September 1990.[10]

Prior to September 1993, the World Wide Web was entirely indexed by hand. There was a list of
webservers edited by Tim Berners-Lee and hosted on the CERN webserver. One snapshot of the list in
1992 remains,[11] but as more and more web servers went online the central list could no longer
keep up. On the NCSA site, new servers were announced under the title "What's New!".[12]

The first tool used for searching content (as opposed to users) on the Internet was Archie.[13]
The name stands for "archive" without the "v".[14] It was created by Alan Emtage,[14][15][16][17]
computer science student at McGill University in Montreal, Quebec, Canada. The program downloaded the
directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites,
creating a searchable database of file names; however, Archie Search Engine did not index the
contents of these sites since the amount of data was so limited it could be readily searched
manually.

The rise of Gopher (created in 1991 by Mark McCahill at the University of Minnesota) led to two
new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles
stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to
Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher
listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for
obtaining menu information from specific Gopher servers. While the name of the search engine
"Archie Search Engine" was not a reference to the Archie comic book series, "Veronica" and "Jughead"
are characters in the series, thus referencing their predecessor.

In the summer of 1993, no search engine existed for the web, though numerous specialized catalogs
were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts
that periodically mirrored these pages and rewrote them into a standard format. This formed the basis
for W3Catalog, the web's first primitive search engine, released on September 2, 1993.[18]

In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the
Perl-based World Wide Web Wanderer, and used it to generate an index called "Wandex". The purpose of
the Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The
web's second search engine Aliweb appeared in November 1993. Aliweb did not use a web robot,
but instead depended on being notified by website administrators of the existence at each site of an
index file in a particular format.

JumpStation (created in December 1993[19] by Jonathon Fletcher) used a web robot to find web
pages and to build its index, and used a web form as the interface to its query program. It was thus
the first WWW resource-discovery tool to combine the three essential features of a web search
engine (crawling, indexing, and searching) as described below. Because of the limited resources
available on the platform it ran on, its indexing and hence searching were limited to the titles and
headings found in the web pages the crawler encountered.

One of the first "all text" crawler-based search engines was WebCrawler, which came out in 1994.
Unlike its predecessors, it allowed users to search for any word in any web page, which has become
the standard for all major search engines since. It was also the search engine that was widely known
by the public. Also, in 1994, Lycos (which started at Carnegie Mellon University) was launched and
became a major commercial endeavor.

The first popular search engine on the Web was Yahoo! Search.[20] The first product from Yahoo!,
founded by Jerry Yang and David Filo in January 1994, was a Web directory called Yahoo! Directory.
In 1995, a search function was added, allowing users to search Yahoo! Directory.[21][22] It became
one of the most popular ways for people to find web pages of interest, but its search function
operated on its web directory, rather than its full-text copies of web pages.

Soon after, a number of search engines appeared and vied for popularity. These included Magellan,
Excite, Infoseek, Inktomi, Northern Light, and AltaVista. Information seekers could also browse the
directory instead of doing a keyword-based search.

In 1996, Robin Li developed the RankDex site-scoring algorithm for search engine results page
ranking[23][24][25] and received a US patent for the technology.[26] It was the first search engine that
used hyperlinks to measure the quality of websites it was indexing,[27] predating the very similar
algorithm patent filed by Google two years later in 1998.[28] Larry Page referenced Li's work in some
of his U.S. patents for PageRank.[29] Li later used his RankDex technology for the Baidu search
engine, which was founded by him in China and launched in 2000.

In 1996, Netscape was looking to give a single search engine an exclusive deal as the featured
search engine on Netscape's web browser. There was so much interest that instead, Netscape
struck deals with five of the major search engines: for $5 million a year, each search engine would be
in rotation on the Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos,
Infoseek, and Excite.[30][31]

Google adopted the idea of selling search terms in 1998 from a small search engine company named
goto.com. This move had a significant effect on the search engine business, which went from
struggling to one of the most profitable businesses on the Internet.

Search engines were also known as some of the brightest stars in the Internet investing frenzy that
occurred in the late 1990s.[32] Several companies entered the market spectacularly, receiving record
gains during their initial public offerings. Some have taken down their public search engine and are
marketing enterprise-only editions, such as Northern Light. Many search engine companies were
caught up in the dot-com bubble, a speculation-driven market boom that peaked in March 2000.

2000s–present: Post dot-com bubble

Around 2000, Google's search engine rose to prominence.[33] The company achieved better results
for many searches with an algorithm called PageRank, as was explained in the paper Anatomy of a
Search Engine written by Sergey Brin and Larry Page, the later founders of Google.[7] This iterative
algorithm ranks web pages based on the number and PageRank of other web sites and pages that
link there, on the premise that good or desirable pages are linked to more than others. Larry Page's
patent for PageRank cites Robin Li's earlier RankDex patent as an influence.[29][25] Google also
maintained a minimalist interface to its search engine. In contrast, many of its competitors
embedded a search engine in a web portal. In fact, the Google search engine became so popular
that spoof engines emerged such as Mystery Seeker.

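
The iterative, link-based scoring described above can be illustrated with a minimal Python sketch.
It is not Google's implementation: the function name pagerank, the damping factor of 0.85, the fixed
number of iterations, and the toy link graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score the pages of a small link graph.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: pages a, b and d all link to c.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(toy_web)
best = max(scores, key=scores.get)
```

In this toy graph, page c, which every other page links to, ends up with the highest score, and
page d, which nothing links to, keeps only the baseline score.
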
By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! acquired
Inktomi in 2002, and Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! switched to
Google's search engine until 2004, when it launched its own search engine based on the combined
technologies of its acquisitions.

Microsoft first launched MSN Search in the fall of 1998 using search results from Inktomi. In early
1999, the site began to display listings from Looksmart, blended with results from Inktomi. For a
short time in 1999, MSN Search used results from AltaVista instead. In 2004, Microsoft began a
transition to its own search technology, powered by its own web crawler (called msnbot).

Microsoft's rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo!
and Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing
technology.

As of 2019, active search engine crawlers include those of Google, Sogou, Baidu, Bing, Gigablast,
Mojeek, DuckDuckGo and Yandex.

Approach

A search engine maintains the following processes in near real time:[34]

1. Web crawling

2. Indexing

3. Searching[35]

Web search engines get their information by web crawling from site to site. The "spider" checks for
the standard filename robots.txt, addressed to it. The robots.txt file contains directives for search
spiders, telling them which pages to crawl and which pages not to crawl. After checking for robots.txt
and either finding it or not, the spider sends certain information back to be indexed depending on
many factors, such as the titles, page content, JavaScript, Cascading Style Sheets (CSS), headings,
or its metadata in HTML meta tags. After a certain number of pages crawled, amount of data
indexed, or time spent on the website, the spider stops crawling and moves on. "[N]o web crawler
may actually crawl the entire reachable web. Due to infinite websites, spider traps, spam, and other
exigencies of the real web, crawlers instead apply a crawl policy to determine when the crawling of
a site should be deemed sufficient. Some websites are crawled exhaustively, while others are
crawled only partially".[36]

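
The crawling behaviour described above (consult robots.txt, then stop after a page budget) can be
sketched in Python as follows. This is a simplified illustration rather than how any particular
engine crawls: the in-memory SITE dictionary stands in for real HTTP fetches, and the names crawl,
SITE, ROBOTS_TXT, the agent string "toy-spider" and the budget max_pages are hypothetical. Only
urllib.robotparser from the standard library is used to interpret the directives.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": URL -> outgoing links (no network access).
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/private/x"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": ["https://example.com/c"],
    "https://example.com/c": [],
    "https://example.com/private/x": [],
}

# Hypothetical robots.txt content for the site above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(start, robots_txt, max_pages=3, agent="toy-spider"):
    """Breadth-first crawl that honours robots.txt and a page budget."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    seen = {start}
    queue = deque([start])
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue  # The directives tell the spider not to crawl this page.
        fetched.append(url)  # A real spider would download and index here.
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


crawled = crawl("https://example.com/", ROBOTS_TXT)
```

With a budget of three pages, the sketch visits the start page, /a and /b, skips /private/x because
of the Disallow rule, and never reaches /c: the site is crawled only partially.
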
Indexing means associating words and other definable tokens found on web pages to their domain
names and HTML-based fields. The associations are made in a public database, made available for
web search queries. A query from a user can be a single word, multiple words or a sentence. The
index helps find information relating to the query as quickly as possible.[35] Some of the techniques
for indexing and caching are trade secrets, whereas web crawling is a straightforward process of
visiting all sites on a systematic basis.
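
As a rough illustration of such an association, the Python sketch below maps each token to the
pages and HTML-based fields in which it occurs. The page data, the two fields (title and body) and
the names tokenize, build_index and lookup are assumptions made for the example; production
indexes are far more elaborate.

```python
import re
from collections import defaultdict

# Hypothetical pages: URL -> HTML-based fields and their text.
PAGES = {
    "https://example.com/": {
        "title": "Example search",
        "body": "A search engine indexes web pages.",
    },
    "https://example.org/crawl": {
        "title": "Crawling",
        "body": "A crawler visits web pages.",
    },
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of (url, field) pairs it occurs in."""
    index = defaultdict(set)
    for url, fields in pages.items():
        for field, text in fields.items():
            for token in tokenize(text):
                index[token].add((url, field))
    return index


def lookup(index, query):
    """Return the URLs that contain every token of the query."""
    results = None
    for token in tokenize(query):
        urls = {url for url, _field in index.get(token, set())}
        results = urls if results is None else results & urls
    return results or set()


index = build_index(PAGES)
```

A query such as lookup(index, "search engine") is then answered from the index alone, without
rereading any page text.
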

Between visits by the spider, the cached version of the page (some or all the content needed to
render it) stored in the search engine working memory is quickly sent to an inquirer. If a visit is
overdue, the search engine can just act as a web proxy instead. In this case, the page may differ
from the search terms indexed.[35] The cached page holds the appearance of the version whose
words were previously indexed, so a cached version of a page can be useful to the website when
the actual page has been lost, but this problem is also considered a mild form of linkrot.

High-level architecture of a standard Web crawler

Typically when a user enters a query into a search engine it is a few keywords.[37] The index already
has the names of the sites containing the keywords, and these are instantly obtained from the index.
The real processing load is in generating the web pages that are the search results list: Every page in
the entire list must be weighted according to information in the indexes.[35] Then the top search
result item requires the lookup, reconstruction, and markup of the snippets showing the context of
the keywords matched. These are only part of the processing each search results web page
requires, and further pages (next to the top) require more of this post-processing.
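
The snippet step mentioned above can be pictured with a small Python sketch that cuts out the text
around the first match of a keyword. The function name snippet, the width parameter and the use of
"..." to mark truncation are illustrative assumptions; real engines also weight and mark up their
snippets in ways not shown here.

```python
def snippet(text, keyword, width=30):
    """Return the keyword with up to `width` characters of context per side.

    Returns None when the keyword does not occur in the text.
    """
    pos = text.lower().find(keyword.lower())
    if pos == -1:
        return None
    start = max(0, pos - width)
    end = min(len(text), pos + len(keyword) + width)
    # Mark the places where the original text was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


page_text = "Web search engines get their information by web crawling from site to site."
result = snippet(page_text, "crawling", width=10)
```
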

Beyond simple keyword lookups, search engines offer their own GUI- or command-driven operators
and search parameters to refine the search results. These provide the necessary controls for the
user engaged in the feedback loop users create by filtering and weighting while refining the search
results, given the initial pages of the first search results. For example, from 2007 the Google.com
search engine has allowed one to filter by date by clicking "Show search tools" in the leftmost
column of the initial search results page, and then selecting the desired date range.[38] It is also
possible to weight by date because each page has a modification time. Most search engines support
the use of the Boolean operators AND, OR and NOT to help end users refine the search query.
Boolean operators are for literal searches that allow the user to refine and extend the terms of the
search. The engine looks for the words or phrases exactly as entered. Some search engines provide
an advanced feature called proximity search, which allows users to define the distance between
keywords.[35] There is also concept-based searching where the research involves using statistical
analysis on pages containing the words or phrases you search for.
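
Boolean operators and proximity search can both be expressed over an index that records word
positions. The Python sketch below is one way to do this, under stated assumptions: the three toy
documents and the names build_positions, docs_with and near are hypothetical, AND, OR and NOT are
modelled as set intersection, union and difference, and distance is counted in words.

```python
import re
from collections import defaultdict

# Hypothetical documents used only for this example.
DOCS = {
    "d1": "web search engines crawl the web",
    "d2": "a search for engines of growth",
    "d3": "crawl policy for a web crawler",
}


def build_positions(docs):
    """Map each token to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for pos, token in enumerate(tokens):
            index[token].setdefault(doc_id, []).append(pos)
    return index


INDEX = build_positions(DOCS)


def docs_with(term):
    """Return the set of documents that contain the term."""
    return set(INDEX.get(term, {}))


def near(a, b, distance):
    """Return documents where a and b occur within `distance` words."""
    hits = set()
    for doc_id in docs_with(a) & docs_with(b):
        positions_a = INDEX[a][doc_id]
        positions_b = INDEX[b][doc_id]
        if any(abs(i - j) <= distance for i in positions_a for j in positions_b):
            hits.add(doc_id)
    return hits


both = docs_with("search") & docs_with("engines")    # search AND engines
either = docs_with("web") | docs_with("crawler")     # web OR crawler
without = docs_with("search") - docs_with("web")     # search AND NOT web
adjacent = near("search", "engines", 1)              # within one word
```

Here the AND query matches d1 and d2, while the proximity query with a distance of one word keeps
only d1, because in d2 the two words are separated by "for".
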

The usefulness of a search engine depends on the relevance of the result set it gives back. While
there may be millions of web pages that include a particular word or phrase, some pages may be
more relevant, popular, or authoritative than others. Most search engines employ methods to rank
the results to provide the "best" results first. How a search engine decides which pages are the best
matches, and what order the results should be shown in, varies widely from one engine to another.[35]
The methods also change over time as Internet usage changes and new techniques evolve. There
are two main types of search engine that have evolved: one is a system of predefined and
hierarchically ordered keywords that humans have programmed extensively. The other is a system
that generates an "inverted index" by analyzing texts it locates. The second form relies much more
heavily on the computer itself to do the bulk of the work.

Most Web search engines are commercial ventures supported by advertising revenue and thus some
of them allow advertisers to have their listings ranked higher in search results for a fee. Search
engines that do not accept money for their search results make money by running search related
ads alongside the regular search engine results. The search engines make money every time
someone clicks on one of these ads.[39]

Local search

Local search is the process that optimizes the efforts of local businesses. They focus on change to
make sure all searches are consistent. It is important because many people determine where they
plan to go and what to buy based on their searches.[40]

Market share

As of January 2022, Google is by far the world's most used search engine, with a market share of
90.6%, and the world's other most used search engines were Bing, Yahoo!, Baidu, Yandex, and
DuckDuckGo.[2]

Russia and East Asia

In Russia, Yandex has a market share of 62.6%, compared to Google's 28.3%. Yandex is also the
second most used search engine on smartphones in Asia and Europe.[41] In China, Baidu is the most
popular search engine.[42] South Korea's homegrown search portal, Naver, is used for 62.8% of online
searches in the country.[43] Yahoo! Japan and Yahoo! Taiwan are the most popular avenues for
Internet searches in Japan and Taiwan, respectively.[44] China is one of few countries where Google
is not in the top three web search engines for market share. Google was previously a top search
engine in China, but withdrew after a disagreement with the government over censorship and a
cyberattack. Bing, however, is among the top three web search engines there, with a market share
of 14.95%. Baidu is on top with 49.1% market share.[45]

Europe

Most countries' markets in the European Union are dominated by Google, except for the Czech
Republic, where Seznam is a strong competitor.[46]

The search engine Qwant is based in Paris, France, which is where most of its 50 million monthly
registered users come from.

Search engine bias

Although search engines are programmed to rank websites based on some combination of their
popularity and relevancy, empirical studies indicate various political, economic, and social biases in
the information they provide[47][48] and the underlying assumptions about the technology.[49] These
biases can be a direct result of economic and commercial processes (e.g., companies that advertise
with a search engine can also become more popular in its organic search results), and political
processes (e.g., the removal of search results to comply with local laws).[50] For example, Google will
not surface certain neo-Nazi websites in France and Germany, where Holocaust denial is illegal.

Biases can also be a result of social processes, as search engine algorithms are frequently designed
to exclude non-normative viewpoints in favor of more "popular" results.[51] Indexing algorithms of
major search engines skew towards coverage of U.S.-based sites, rather than websites from non-U.S.
countries.[48]

Google Bombing is one example of an attempt to manipulate search results for political, social or
commercial reasons.

Several scholars have studied the cultural changes triggered by search engines,[52] and the
representation of certain controversial topics in their results, such as terrorism in Ireland,[53] climate
change denial,[54] and conspiracy theories.[55]

Customized results and filter bubbles

Concern has been raised that search engines such as Google and Bing provide customized
results based on the user's activity history, leading to what has been termed echo chambers or filter
bubbles by Eli Pariser in 2011.[56] The argument is that search engines and social media platforms
use algorithms to selectively guess what information a user would like to see, based on information
about the user (such as location, past click behaviour and search history). As a result, websites tend
to show only information that agrees with the user's past viewpoint. According to Eli Pariser, users
get less exposure to conflicting viewpoints and are isolated intellectually in their own informational
bubble. Since this problem has been identified, competing search engines have emerged that seek
to avoid this problem by not tracking or "bubbling" users, such as DuckDuckGo. However, many
scholars have questioned Pariser's view, finding that there is little evidence for the filter
bubble.[57][58][59] On the contrary, a number of studies trying to verify the existence of filter bubbles
have found only minor levels of personalisation in search,[59] that most people encounter a range of
views when browsing online, and that Google News tends to promote mainstream established news
outlets.[60][58]

Religious search engines

The global growth of the Internet and electronic media in the Arab and Muslim World during the last
decade has encouraged Islamic adherents in the Middle East and Asian sub-continent to attempt
their own search engines, their own filtered search portals that would enable users to perform safe
searches. Going beyond the usual safe search filters, these Islamic web portals categorize websites
as either "halal" or "haram", based on interpretation of the "Law of Islam". ImHalal came online in
September 2011. Halalgoogling came online in July 2013. These use haram filters on the collections
from Google and Bing (and others).[61]

While lack of investment and slow pace in technologies in the Muslim World has hindered progress
and thwarted success of an Islamic search engine targeting Islamic adherents as its main
consumers, projects like Muxlim, a Muslim lifestyle site, did receive millions of dollars from investors
like Rite Internet Ventures, and it also faltered. Other religion-oriented search engines are Jewogle,
the Jewish version of Google,[62] and SeekFind.org, which is Christian. SeekFind filters sites that
attack or degrade their faith.[63]

Search engine submission

Web search engine submission is a process in which a webmaster submits a website directly to a
search engine. While search engine submission is sometimes presented as a way to promote a
website, it generally is not necessary because the major search engines use web crawlers that will
eventually find most web sites on the Internet without assistance. A webmaster can either submit
one web page at a time or submit the entire site using a sitemap, but it is normally only necessary to
submit the home page of a web site as search engines are able to crawl a well designed website.
There are two remaining reasons to submit a web site or web page to a search engine: to add an
entirely new web site without waiting for a search engine to discover it, and to have a web site's
record updated after a substantial redesign.

Some search engine submission software not only submits websites to multiple search engines, but
also adds links to websites from their own pages. This could appear helpful in increasing a website's
ranking, because external links are one of the most important factors determining a website's
ranking. However, John Mueller of Google has stated that this "can lead to a tremendous number of
unnatural links for your site" with a negative impact on site ranking.[64]

Comparison to social bookmarking

In comparison to search engines, a social bookmarking system has several advantages over
traditional automated resource location and classification software, such as search engine spiders.
All tag-based classification of Internet resources (such as web sites) is done by human beings, who
understand the content of the resource, as opposed to software, which algorithmically attempts to
determine the meaning and quality of a resource. Also, people can find and bookmark web pages
that have not yet been noticed or indexed by web spiders.[65] Additionally, a social bookmarking
system can rank a resource based on how many times it has been bookmarked by users, which may
be a more useful metric for end-users than systems that rank resources based on the number of
external links pointing to it. However, both types of ranking are vulnerable to fraud (see Gaming the
system), and both need technical countermeasures to try to deal with this.

Technology

Archie

The first Internet search engine was Archie, created in 1990[66] by Alan Emtage, a student at McGill
University in Montreal. The author originally wanted to call the program "archives", but had to shorten
it to comply with the Unix world standard of assigning programs and files short, cryptic names such
as grep, cat, troff, sed, awk, perl, and so on.

The primary method of storing and retrieving files was via the File Transfer Protocol (FTP). This was
(and still is) a system that specified a common way for computers to exchange files over the
Internet. It works like this: Some administrator decides that he wants to make files available from his
computer. He sets up a program on his computer, called an FTP server. When someone on the
Internet wants to retrieve a file from this computer, he or she connects to it via another program
called an FTP client. Any FTP client program can connect with any FTP server program as long as
the client and server programs both fully follow the specifications set forth in the FTP protocol.

Initially, anyone who wanted to share a file had to set up an FTP server in order to make the file
available to others. Later, "anonymous" FTP sites became repositories for files, allowing all users to
post and retrieve them.

Even with archive sites, many important files were still scattered on small FTP servers. These files
could be located only by the Internet equivalent of word of mouth: Somebody would post an e-mail
to a message list or a discussion forum announcing the availability of a file.

Archie changed all that. It combined a script-based data gatherer, which fetched site listings of
anonymous FTP files, with a regular expression matcher for retrieving file names matching a user
query. In other words, Archie's gatherer scoured FTP sites across the Internet and indexed all of
the files it found. Its regular expression matcher provided users with access to its database.[67]
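
The two parts described above, a gatherer that collects file listings and a regular expression
matcher that queries them, can be mimicked in a few lines of Python. This is only a sketch of the
idea, not Archie's actual code: the host names, file paths and the function names gather and search
are invented for the example, and the listings are held in memory instead of being fetched over FTP.

```python
import re

# Hypothetical directory listings, as a gatherer might have collected
# them from anonymous FTP sites (site -> file paths).
LISTINGS = {
    "ftp.example.edu": [
        "pub/gnu/emacs-18.59.tar.Z",
        "pub/README",
        "pub/tools/grep.tar.Z",
    ],
    "archive.example.org": [
        "mirrors/x11/xterm.tar.Z",
        "docs/emacs-faq.txt",
    ],
}


def gather(listings):
    """Flatten the per-site listings into one database of (site, path)."""
    return [(site, path) for site, paths in listings.items() for path in paths]


def search(database, pattern):
    """Return the entries whose file path matches the regular expression."""
    regex = re.compile(pattern)
    return [(site, path) for site, path in database if regex.search(path)]


database = gather(LISTINGS)
emacs_files = search(database, r"emacs")
archives = search(database, r"\.tar\.Z$")
```

Only file names are matched, never file contents, which mirrors the limitation noted for Archie
earlier in the article.
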

Veronica

In 1993, the University of Nevada System Computing Services group developed Veronica.[66] It was
created as a type of searching device similar to Archie but for Gopher files. Another Gopher search
service, called Jughead, appeared a little later, probably for the sole purpose of rounding out the
comic-strip triumvirate. Jughead is an acronym for Jonzy's Universal Gopher Hierarchy Excavation and
Display, although, like Veronica, it is probably safe to assume that the creator backed into the
acronym. Jughead's functionality was pretty much identical to Veronica's, although it appears to be a
little rougher around the edges.[67]

The Lone Wanderer

The World Wide Web Wanderer, developed by Matthew Gray in 1993,[68] was the first robot on the
Web and was designed to track the Web's growth. Initially, the Wanderer counted only Web servers,
but shortly after its introduction, it started to capture URLs as it went along. The database of
captured URLs became the Wandex, the first web database.

Matthew Gray's Wanderer created quite a controversy at the time, partially because early versions
of the software ran rampant through the Net and caused a noticeable netwide performance
degradation. This degradation occurred because the Wanderer would access the same page
hundreds of times a day. The Wanderer soon amended its ways, but the controversy over whether
robots were good or bad for the Internet remained.

In response to the Wanderer, Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB, in
October 1993. As the name implies, ALIWEB was the HTTP equivalent of Archie, and because of
this, it is still unique in many ways.

ALIWEB does not have a web-searching robot. Instead, webmasters of participating sites post their
own index information for each page they want listed. The advantage to this method is that users
get to describe their own site, and a robot does not run about eating up Net bandwidth. The
disadvantages of ALIWEB are more of a problem today. The primary disadvantage is that a special
indexing file must be submitted. Most users do not understand how to create such a file, and
therefore they do not submit their pages. This leads to a relatively small database, which means that
users are less likely to search ALIWEB than one of the large bot-based sites. This Catch-22 has
been somewhat offset by incorporating other databases into the ALIWEB search, but it still does
not have the mass appeal of search engines such as Yahoo! or Lycos.[67]

Excite

Excite, initially called Architext, was started by six Stanford undergraduates in February 1993. Their
idea was to use statistical analysis of word relationships in order to provide more efficient searches
through the large amount of information on the Internet. Their project was fully funded by mid-1993.
Once funding was secured, they released a version of their search software for webmasters to use
on their own web sites. At the time, the software was called Architext, but it now goes by the name
of Excite for Web Servers.[67]

Excite, which launched in 1995, was the first serious commercial search engine.[69] It was developed
at Stanford and was purchased for $6.5 billion by @Home. In 2001, Excite and @Home went bankrupt,
and InfoSpace bought Excite for $10 million.

Some of the first analysis of web searching was conducted on search logs from Excite.[70][71]

Yahoo!

In April 1994, two Stanford University Ph.D. candidates, David Filo and Jerry Yang, created some
pages that became rather popular. They called the collection of pages Yahoo! Their official
explanation for the name choice was that they considered themselves to be a pair of yahoos.

As the number of links grew and their pages began to receive thousands of hits a day, the team
created ways to better organize the data. In order to aid in data retrieval, Yahoo! (www.yahoo.com)
became a searchable directory. The search feature was a simple database search engine. Because
Yahoo! entries were entered and categorized manually, Yahoo! was not really classified as a search
engine. Instead, it was generally considered to be a searchable directory. Yahoo! has since
automated some aspects of the gathering and classification process, blurring the distinction
between engine and directory.

The Wanderer captured only URLs, which made it difficult to find things that were not explicitly
described by their URL. Because URLs are rather cryptic to begin with, this did not help the average
user. Searching Yahoo! or the Galaxy was much more effective because they contained additional
descriptive information about the indexed sites.

Lycos

At Carnegie Mellon University during July 1994, Michael Mauldin, on leave from CMU, developed the
Lycos search engine.

Types of web search engines

Search engines on the web are sites equipped with the facility to search content stored on other
sites. Search engines differ in the way they work, but they all perform three basic
tasks.[72]

1. Finding and selecting full or partial content based on the keywords provided.

2. Maintaining an index of the content and referencing the location where it was found.

3. Allowing users to look for words or combinations of words found in that index.

The process begins when a user enters a query statement into the system through the interface
provided.
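The three tasks above can be illustrated with a minimal sketch of an inverted index. The page data, the `build_index` and `search` names, and the choice to require every query word to match are all assumptions made for illustration; real engines use far more elaborate structures.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> page text (illustrative data only).
PAGES = {
    "http://example.org/a": "search engines index the web",
    "http://example.org/b": "crawlers visit web pages",
    "http://example.org/c": "a directory lists pages by hand",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "web pages")))
```

Note that the query never touches the page text directly: it is answered entirely from the index, which records where each word was found.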

Type                Example                Description

Conventional        Library catalog        Search by keyword, title, author, etc.
Text-based          Google, Bing, Yahoo!   Search by keywords. Limited search using queries in natural language.
Voice-based         Google, Bing, Yahoo!   Search by keywords. Limited search using queries in natural language.
Multimedia search   QBIC, WebSeek, SaFe    Search by visual appearance (shapes, colors, ...)
Q/A                 Stack Exchange, NSIR   Search in (restricted) natural language
Clustering systems  Vivisimo, Clusty
Research systems    Lemur, Nutch

There are basically three types of search engines: those that are powered by robots (called
crawlers, ants, or spiders), those that are powered by human submissions, and those that are a
hybrid of the two.

Crawler-based search engines are those that use automated software agents (called crawlers) that
visit a Web site, read the information on the actual site, read the site's meta tags, and also follow the
links that the site connects to, performing indexing on all linked Web sites as well. The crawler
returns all that information back to a central depository, where the data is indexed. The crawler will
periodically return to the sites to check for any information that has changed. The frequency with
which this happens is determined by the administrators of the search engine.
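As a rough sketch of the link-following behavior described above, the following breadth-first crawl walks a small site graph held in memory. The `WEB` data and the `crawl` function are hypothetical, and no network access is involved; a real crawler would also fetch pages over HTTP, parse them, and schedule revisits.

```python
from collections import deque

# Hypothetical site graph: URL -> (page text, outgoing links).
WEB = {
    "/home": ("welcome page", ["/about", "/news"]),
    "/about": ("about us", ["/home"]),
    "/news": ("latest news", ["/about", "/archive"]),
    "/archive": ("old news", []),
    "/orphan": ("never linked", []),
}

def crawl(seed, web):
    """Breadth-first crawl from seed; returns {url: text} for reachable pages."""
    repository = {}          # stands in for the central depository
    queue = deque([seed])
    seen = {seed}            # ensures each page is fetched only once per crawl
    while queue:
        url = queue.popleft()
        if url not in web:   # dead link: nothing to fetch
            continue
        text, links = web[url]
        repository[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return repository

print(list(crawl("/home", WEB)))
```

The `seen` set keeps the crawl from requesting the same page repeatedly, and a page that nothing links to (here `/orphan`) is never discovered.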

Human-powered search engines rely on humans to submit information that is subsequently indexed
and catalogued. Only information that is submitted is put into the index.

In both cases, when you query a search engine to locate information, you are actually searching
through the index that the search engine has created; you are not searching the Web itself.
These indices are giant databases of information that is collected, stored, and subsequently
searched. This explains why a search on a commercial search engine, such as Yahoo! or
Google, will sometimes return results that are, in fact, dead links. Since the search results are based on the
index, if the index has not been updated since a Web page became invalid, the search engine treats
the page as still an active link even though it no longer is. It will remain that way until the index is
updated.

So why will the same search on different search engines produce different results? Part of the
answer is that not all indices are exactly the same. It depends on
what the spiders find or what the humans submitted. More importantly, not every search engine
uses the same algorithm to search through the indices. The algorithm is what the search engines use
to determine the relevance of the information in the index to what the user is searching for.

One of the elements that a search engine algorithm scans for is the frequency and location of
keywords on a Web page. Those with higher frequency are typically considered more relevant. But
search engine technology is becoming sophisticated in its attempt to discourage what is known as
keyword stuffing, or spamdexing.
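A minimal sketch of scoring by keyword frequency and location might look like the following. The `keyword_score` function, the split into title and body, and the `title_weight` value are illustrative assumptions and do not describe any particular engine's formula.

```python
def keyword_score(title, body, keyword, title_weight=3.0):
    """Score a page by keyword frequency, counting title hits more heavily."""
    kw = keyword.lower()
    title_words = title.lower().split()
    body_words = body.lower().split()
    title_hits = title_words.count(kw)
    body_hits = body_words.count(kw)
    # Use relative frequency in the body so that long pages are not
    # favored merely for containing more words.
    body_freq = body_hits / len(body_words) if body_words else 0.0
    return title_weight * title_hits + body_freq

print(keyword_score("Gopher search", "gopher gopher menus", "gopher"))
```

Because the body term is a relative frequency, it cannot exceed 1.0 however often a keyword is repeated. That limits, but does not by itself prevent, the keyword stuffing mentioned above.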

Another common element that algorithms analyze is the way that pages link to other pages in the
Web. By analyzing how pages link to each other, an engine can both determine what a page is about
(if the keywords of the linked pages are similar to the keywords on the original page) and whether
that page is considered "important" and deserving of a boost in ranking. Just as the technology is
becoming increasingly sophisticated to ignore keyword stuffing, it is also becoming more savvy to
webmasters who build artificial links into their sites in order to build an artificial ranking.
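One well-known way to turn link structure into an importance score is an iterative calculation in the style of PageRank. The sketch below is a simplified version under stated assumptions: the `link_rank` name, the damping value of 0.85, the fixed iteration count, and the sample `LINKS` graph are all chosen for illustration, and the passage above does not say which algorithm any given engine uses.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively score pages by the scores of the pages linking to them."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
RANKS = link_rank(LINKS)
print(max(RANKS, key=RANKS.get))
```

In this sample graph, page `c` receives links from three pages and ends up with the highest score, while `b` and `d`, which nothing links to, share the lowest.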

Modern web search engines are highly intricate software systems that employ technology that has
evolved over the years. There are a number of sub-categories of search engine software that are
separately applicable to specific 'browsing' needs. These include web search engines (e.g. Google),
database or structured data search engines (e.g. Dieselpoint), and mixed search engines or
enterprise search. The more prevalent search engines, such as Google and Yahoo!, utilize hundreds of
thousands of computers to process trillions of web pages in order to return fairly well-aimed results.
Due to this high volume of queries and text processing, the software is required to run in a highly
distributed environment with a high degree of redundancy.

Another category of search engines is scientific search engines. These are search engines which
search scientific literature. The best known example is Google Scholar. Researchers are working on
improving search engine technology by enabling engines to understand the content elements of
articles, for example by extracting theoretical constructs or key research findings.[73]

See also

Comparison of web search engines

Filter bubble

Google effect

Information retrieval

Use of web search engines in libraries

List of search engines

Question answering

Search engine manipulation effect

Search engine privacy

Semantic Web

Spell checker

Web development tools

Web query

Wikipedia:Search engine test, for a tutorial on using search engines for researching Wikipedia
articles

References

1. "Search Engine Market Share Worldwide | StatCounter Global Stats" (http://gs.statcounter.com/search-engine-market-share) . StatCounter. Retrieved 19 February 2024.

2. "Search Engine Market Share Worldwide" (https://www.similarweb.com/engines/) . Similarweb Top search engines. Retrieved 19 February 2024.

3. Bush, Vannevar (1945-07-01). "As We May Think" (https://web.archive.org/web/20120822132632/http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/4/) . The Atlantic. Archived from the original (https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/) on 2012-08-22. Retrieved 2024-02-22.

4. "Search Engine History.com" (http://www.searchenginehistory.com/) . www.searchenginehistory.com. Retrieved 2020-07-02.

5. "Penn State WebAccess Secure Login" (https://web.archive.org/web/20220122194212/https://webacc
ess.psu.edu/?cosign-scripts.libraries.psu.edu&https%3A%2F%2Fscripts.libraries.psu.edu%2Fscripts%2F
ezproxyauth.php%3Furl=ezp.2aHR0cHM6Ly9pZWVleHBsb3JlLmllZWUub3JnL3N0YW1wL3N0YW1wLmpz
cD90cD0mYXJudW1iZXI9ODUwOTU1MA--) . webaccess.psu.edu. Archived from the original (https://we
baccess.psu.edu/?cosign-scripts.libraries.psu.edu&https%3A%2F%2Fscripts.libraries.psu.edu%2Fscript
s%2Fezproxyauth.php%3Furl=ezp.2aHR0cHM6Ly9pZWVleHBsb3JlLmllZWUub3JnL3N0YW1wL3N0YW1w
LmpzcD90cD0mYXJudW1iZXI9ODUwOTU1MA--) on 2022-01-22. Retrieved 2020-07-02.

6. Marchiori, Massimo (1997). "The Quest for Correct Information on the Web: Hyper Search Engines" (ht
tp://www.w3.org/People/Massimo/papers/WWW6/Overview.html) . Proceedings of the Sixth
International World Wide Web Conference (WWW6). Retrieved 2021-01-10.

7. Brin, Sergey; Page, Larry (1998). "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (http
s://web.archive.org/web/20170713070157/http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf) (PDF).
Proceedings of the Seventh International World Wide Web Conference (WWW7). Archived from the original
(http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf) (PDF) on 2017-07-13. Retrieved 2021-01-10.

8. Harrenstien, K.; White, V. (1982). "RFC 812 - NICNAME/WHOIS" (https://tools.ietf.org/html/rfc812) . IETF Datatracker. doi:10.17487/RFC0812 (https://doi.org/10.17487%2FRFC0812) .

9. "Knowbot programming: System support for mobile agents" (http://www.cnri.reston.va.us/home/koe/iwooos-full.html) . cnri.reston.va.us.

10. Deutsch, Peter (September 11, 1990). "[next] An Internet archive server server (was about Lisp)" (https://groups.google.com/forum/#!msg/comp.archives/LWVA50W8BKk/wyRbF_lDc6cJ) . groups.google.com. Retrieved 2017-12-29.

11. "World-Wide Web Servers" (http://www.w3.org/History/19921103-hypertext/hypertext/DataSources/WWW/Servers.html) . W3C. Retrieved 2012-05-14.

12. "What's New! February 1994" (http://home.mcom.com/home/whatsnew/whats_new_0294.html) . Mosaic Communications Corporation!. Retrieved 2012-05-14.

13. Search Engine Watch (September 2001). "Search Engines" (https://web.archive.org/web/20090413030108/http://www.internethistory.leidenuniv.nl/index.php3?c=7) . Internet History. Netherlands: Universiteit Leiden. Archived from the original (http://www.internethistory.leidenuniv.nl/index.php3?c=7) on 2009-04-13.

14. "Archie" (https://www.pcmag.com/encyclopedia/term/archie) . PCMag. Retrieved 2020-09-20.

15. Alexandra Samuel (21 February 2017). "Meet Alan Emtage, the Black Technologist Who Invented ARCHIE, the First Internet Search Engine" (https://daily.jstor.org/alan-emtage-first-internet-search-engine/) . ITHAKA. Retrieved 2020-09-20.

16. loop news barbados. "Alan Emtage- a Barbadian you should know" (https://web.archive.org/web/20200923065914/http://www.loopnewsbarbados.com/content/alan-emtage-barbadian-you-should-know) . loopnewsbarbados.com. Archived from the original (http://www.loopnewsbarbados.com/content/alan-emtage-barbadian-you-should-know) on 2020-09-23. Retrieved 2020-09-21.

17. Dino Grandoni, Alan Emtage (April 2013). "Alan Emtage: The Man Who Invented The World's First
Search Engine (But Didn't Patent It)" (https://www.huffingtonpost.co.uk/entry/alan-emtage-search-eng
ine_ n_ 2994090?ri18n=true&guccounter=1&guce_ referrer=aHR0cHM6Ly9jb25zZW50LnlhaG9vLmNvbS8
&guce_ referrer_ sig=AQAAABveQefuoczW_ 8_ bxwbOgluVTUPvIfv5s_ OP1jMgUJd8MCwKc148lvXb7HAHX
Y48P_ Be6wXMW0LKlLRfQzJNalLpuwnp7F6NpbyDC2BG10OveS2qtubkO0PhJ8-juP3M2a9K2ygbWuoUhO
CvO-1NA6-YQKA8BtdZEcsfUUI_ M-8S) . huffingtonpost.co.uk. Retrieved 2020-09-21.

18. Oscar Nierstrasz (2 September 1993). "Searchable Catalog of WWW Resources (experimental)" (http
s://groups.google.com/group/comp.infosystems.www/browse_ thread/thread/2176526a36dc8bd3/27
18fd17812937ac?hl=en&lnk=gst&q=Oscar+Nierstrasz#2718fd17812937ac) .

19. "Archive of NCSA what's new in December 1993 page" (https://web.archive.org/web/20010620073530/http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/old-whats-new/whats-new-1293.html) . 2001-06-20. Archived from the original (http://archive.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/old-whats-new/whats-new-1293.html) on 2001-06-20. Retrieved 2012-05-14.

20. "What is first mover?" (https://searchcio.techtarget.com/definition/first-mover) . SearchCIO. TechTarget. September 2005. Retrieved 5 September 2019.

21. Oppitz, Marcus; Tomsu, Peter (2017). Inventing the Cloud Century: How Cloudiness Keeps Changing Our
Life, Economy and Technology (https://books.google.com/books?id=vrEvDwAAQBAJ&pg=PA238) .
Springer. p. 238. ISBN 9783319611617.

22. "Yahoo! Search" (https://web.archive.org/web/19961128070718/http://www.yahoo.com/search.html) . Yahoo!. 28 November 1996. Archived from the original (https://www.yahoo.com/search.html) on 28 November 1996. Retrieved 5 September 2019.

23. Greenberg, Andy, "The Man Who's Beating Google" (https://www.forbes.com/forbes/2009/1005/technology-baidu-robin-li-man-whos-beating-google.html) , Forbes magazine, October 5, 2009

24. Yanhong Li, "Toward a Qualitative Search Engine", IEEE Internet Computing, vol. 2, no. 4, pp. 24–29,
July/Aug. 1998, doi:10.1109/4236.707687 (https://doi.org/10.1109%2F4236.707687)

25. "About: RankDex" (http://www.rankdex.com/about.html) , rankdex.com

26. USPTO, "Hypertext Document Retrieval System and Method" (https://patents.google.com/patent/US5920859) , US Patent number: 5920859, Inventor: Yanhong Li, Filing date: Feb 5, 1997, Issue date: Jul 6, 1999

27. "Baidu Vs Google: The Twins Of Search Compared" (https://fourweekmba.com/baidu-vs-google/) . FourWeekMBA. 18 September 2018. Retrieved 16 June 2019.

28. Altucher, James (March 18, 2011). "10 Unusual Things About Google" (https://www.forbes.com/sites/j
amesaltucher/2011/03/18/10-unusual-things-about-google-also-the-worst-vc-decision-i-ever-mad
e/) . Forbes. Retrieved 16 June 2019.

29. "Method for node ranking in a linked database" (https://patents.google.com/patent/US6285999) . Google Patents. Archived (https://web.archive.org/web/20151015185034/http://www.google.com/patents/US6285999) from the original on 15 October 2015. Retrieved 19 October 2015.

30. "Yahoo! And Netscape Ink International Distribution Deal" (https://web.archive.org/web/20131116112021/http://files.shareholder.com/downloads/YHOO/701084386x0x27155/9a3b5ed8-9e84-4cba-a1e5-77a3dc606566/YHOO_News_1997_7_8_General.pdf) (PDF). Archived from the original (http://files.shareholder.com/downloads/YHOO/701084386x0x27155/9a3b5ed8-9e84-4cba-a1e5-77a3dc606566/YHOO_News_1997_7_8_General.pdf) (PDF) on 2013-11-16. Retrieved 2009-08-12.

31. "Browser Deals Push Netscape Stock Up 7.8%" (https://articles.latimes.com/1996-04-01/business/fi-53780_1_netscape-home) . Los Angeles Times. 1 April 1996.

32. Gandal, Neil (2001). "The dynamics of competition in the internet search engine market" (http://www.e
scholarship.org/uc/item/0h17g08v) . International Journal of Industrial Organization. 19 (7): 1103–
1117. doi:10.1016/S0167-7187(01)00065-0 (https://doi.org/10.1016%2FS0167-7187%2801%2900065-
0) .

33. "Our history in depth" (https://web.archive.org/web/20121101210037/https://www.google.com/about/company/history/) . Archived from the original (https://www.google.com/about/company/history/) on November 1, 2012. Retrieved 2012-10-31.

34. "Definition – search engine" (https://www.techtarget.com/whatis/definition/search-engine) . Techtarget. Retrieved 1 June 2023.

35. Jawadekar, Waman S (2011), "8. Knowledge Management: Tools and Technology" (https://books.googl
e.com/books?id=XmGx4J9daUMC&pg=PA278) , Knowledge Management: Text & Cases, New Delhi: Tata
McGraw-Hill Education Private Ltd, p. 278, ISBN 978-0-07-07-0086-4, retrieved November 23, 2012

36. Dasgupta, Anirban; Ghosh, Arpita; Kumar, Ravi; Olston, Christopher; Pandey, Sandeep; and Tomkins,
Andrew. The Discoverability of the Web. http://www.arpitaghosh.com/papers/discoverability.pdf

37. Jansen, B. J., Spink, A., and Saracevic, T. 2000. Real life, real users, and real needs: A study and analysis
of user queries on the web. Information Processing & Management (https://faculty.ist.psu.edu/jjanse
n/academic/pubs/jansen_ real_ life_ real_ users_ and_ real_ needs.pdf) . 36(2), 207–227.

38. Chitu, Alex (August 30, 2007). "Easy Way to Find Recent Web Pages" (http://googlesystem.blogspot.co
m/2007/08/easy-way-to-find-recent-web-pages.html) . Google Operating System. Retrieved
22 February 2015.

39. "how search engine works?" (http://globalforumonline.com/detail/how-does-search-engine-works/) . GFO. Retrieved 26 June 2018.

40. "What Is Local SEO & Why Local Search Is Important" (https://www.searchenginejournal.com/local-se
o/what-is-local-seo-why-local-search-is-important/) . Search Engine Journal. Retrieved 2020-04-26.

41. "Live Internet - Site Statistics" (http://www.liveinternet.ru/stat/ru/searches.html?slice=ru;period=week) . Live Internet. Retrieved 2014-06-04.

42. Arthur, Charles (2014-06-03). "The Chinese technology companies poised to dominate the world" (http
s://www.theguardian.com/world/2014/jun/03/chinese-technology-companies-huawei-dominate-worl
d) . The Guardian. Retrieved 2014-06-04.

43. "How Naver Hurts Companies' Productivity" (https://blogs.wsj.com/korearealtime/2014/05/21/how-naver-hurts-companies-productivity/) . The Wall Street Journal. 2014-05-21. Retrieved 2014-06-04.

44. "Age of Internet Empires" (https://geography.oii.ox.ac.uk/age-of-internet-empires/) . Oxford Internet Institute. Retrieved 15 August 2019.

45. Waddell, Kaveh (2016-01-19). "Why Google Quit China—and Why It's Heading Back" (https://www.theatl
antic.com/technology/archive/2016/01/why-google-quit-china-and-why-its-heading-back/424482/) .
The Atlantic. Retrieved 2020-04-26.

46. Seznam Takes on Google in the Czech Republic (http://www.doz.com/search-engine/seznam-search-engine) . Doz.

47. Segev, El (2010). Google and the Digital Divide: The Biases of Online Knowledge (https://books.google.c
om/books?id=_ Y9wAgAAQBAJ&q=bias) , Oxford: Chandos Publishing.

48. Vaughan, Liwen; Mike Thelwall (2004). "Search engine coverage bias: evidence and possible causes".
Information Processing & Management. 40 (4): 693–707. CiteSeerX 10.1.1.65.5130 (https://citeseerx.ist.p
su.edu/viewdoc/summary?doi=10.1.1.65.5130) . doi:10.1016/S0306-4573(03)00063-3 (https://doi.org/
10.1016%2FS0306-4573%2803%2900063-3) . S2CID 18977861 (https://api.semanticscholar.org/Corpu
sID:18977861) .

49. Jansen, B. J. and Rieh, S. (2010) The Seventeen Theoretical Constructs of Information Searching and
Information Retrieval (https://faculty.ist.psu.edu/jjansen/academic/jansen_ theoretical_ constructs.pd
f) . Journal of the American Society for Information Sciences and Technology. 61(8), 1517–1534.

50. Berkman Center for Internet & Society (2002), "Replacement of Google with Alternative Search
Systems in China: Documentation and Screen Shots" (http://cyber.law.harvard.edu/filtering/china/goog
le-replacements/) , Harvard Law School.

51. Introna, Lucas; Helen Nissenbaum (2000). "Shaping the Web: Why the Politics of Search Engines
Matters". The Information Society. 16 (3): 169–185. CiteSeerX 10.1.1.24.8051 (https://citeseerx.ist.psu.e
du/viewdoc/summary?doi=10.1.1.24.8051) . doi:10.1080/01972240050133634 (https://doi.org/10.108
0%2F01972240050133634) . S2CID 2111039 (https://api.semanticscholar.org/CorpusID:2111039) .

52. Hillis, Ken; Petit, Michael; Jarrett, Kylie (2012-10-12). Google and the Culture of Search (https://archive.or
g/details/googlecultureofs0000hill) . Routledge. ISBN 9781136933066.

53. Reilly, P. (2008-01-01). " 'Googling' Terrorists: Are Northern Irish Terrorists Visible on Internet Search
Engines?". In Spink, Prof Dr Amanda; Zimmer, Michael (eds.). Web Search. Information Science and
Knowledge Management. Vol. 14. Springer Berlin Heidelberg. pp. 151–175. Bibcode:2008wsis.book..151R
(https://ui.adsabs.harvard.edu/abs/2008wsis.book..151R) . doi:10.1007/978-3-540-75829-7_ 10 (http
s://doi.org/10.1007%2F978-3-540-75829-7_ 10) . ISBN 978-3-540-75828-0. S2CID 84831583 (https://ap
i.semanticscholar.org/CorpusID:84831583) .

54. Hiroko Tabuchi, "How Climate Change Deniers Rise to the Top in Google Searches (https://www.nytim
es.com/2017/12/29/climate/google-search-climate-change.html) ", The New York Times, Dec. 29,
2017. Retrieved November 14, 2018.

55. Ballatore, A (2015). "Google chemtrails: A methodology to analyze topic representation in search
engines" (http://firstmonday.org/ojs/index.php/fm/article/view/5597) . First Monday. 20 (7).
doi:10.5210/fm.v20i7.5597 (https://doi.org/10.5210%2Ffm.v20i7.5597) .

56. Pariser, Eli (2011). The filter bubble : what the Internet is hiding from you (https://www.worldcat.org/oclc/
682892628) . New York: Penguin Press. ISBN 978-1-59420-300-8. OCLC 682892628 (https://www.world
cat.org/oclc/682892628) .

57. O'Hara, K. (2014-07-01). "In Worship of an Echo" (https://doi.org/10.1109%2FMIC.2014.71) . IEEE Internet Computing. 18 (4): 79–83. doi:10.1109/MIC.2014.71 (https://doi.org/10.1109%2FMIC.2014.71) . ISSN 1089-7801 (https://www.worldcat.org/issn/1089-7801) . S2CID 37860225 (https://api.semanticscholar.org/CorpusID:37860225) .

58. Bruns, Axel (2019-11-29). "Filter bubble" (https://policyreview.info/node/1426) . Internet Policy Review. 8
(4). doi:10.14763/2019.4.1426 (https://doi.org/10.14763%2F2019.4.1426) . hdl:10419/214088 (https://h
dl.handle.net/10419%2F214088) . ISSN 2197-6775 (https://www.worldcat.org/issn/2197-6775) .
S2CID 211483210 (https://api.semanticscholar.org/CorpusID:211483210) .

59. Haim, Mario; Graefe, Andreas; Brosius, Hans-Bernd (2018). "Burst of the Filter Bubble?" (https://doi.org/
10.1080%2F21670811.2017.1338145) . Digital Journalism. 6 (3): 330–343.
doi:10.1080/21670811.2017.1338145 (https://doi.org/10.1080%2F21670811.2017.1338145) .
ISSN 2167-0811 (https://www.worldcat.org/issn/2167-0811) . S2CID 168906316 (https://api.semantic
scholar.org/CorpusID:168906316) .

60. Nechushtai, Efrat; Lewis, Seth C. (2019). "What kind of news gatekeepers do we want machines to be?
Filter bubbles, fragmentation, and the normative dimensions of algorithmic recommendations" (http
s://linkinghub.elsevier.com/retrieve/pii/S0747563218303650) . Computers in Human Behavior. 90: 298–
307. doi:10.1016/j.chb.2018.07.043 (https://doi.org/10.1016%2Fj.chb.2018.07.043) . S2CID 53774351 (ht
tps://api.semanticscholar.org/CorpusID:53774351) .

61. "New Islam-approved search engine for Muslims" (https://web.archive.org/web/20130712023215/http://news.msn.com/science-technology/new-islam-approved-search-engine-for-muslims) . News.msn.com. Archived from the original (https://news.msn.com/science-technology/new-islam-approved-search-engine-for-muslims) on 2013-07-12. Retrieved 2013-07-11.

62. "Jewogle - FAQ" (https://web.archive.org/web/20190207015631/http://www.jewogle.com/faq/) .
Archived from the original (http://www.jewogle.com/faq/) on 2019-02-07. Retrieved 2019-02-06.

63. "Halalgoogling: Muslims Get Their Own "sin free" Google; Should Christians Have Christian Google? -
Christian Blog" (https://web.archive.org/web/20140913102516/http://allchristiannews.com/halalgoogli
ng-muslims-get-their-own-sin-free-google-should-christians-have-christian-google/) . Christian Blog.
2013-07-25. Archived from the original (http://allchristiannews.com/halalgoogling-muslims-get-their-o
wn-sin-free-google-should-christians-have-christian-google/) on 2014-09-13. Retrieved 2014-09-13.

64. Schwartz, Barry (2012-10-29). "Google: Search Engine Submission Services Can Be Harmful" (https://w
ww.seroundtable.com/search-engine-submission-google-15906.html) . Search Engine Roundtable.
Retrieved 2016-04-04.

65. Heymann, Paul; Koutrika, Georgia; Garcia-Molina, Hector (February 12, 2008). "Can Social Bookmarking
Improve Web Search?" (http://dbpubs.stanford.edu:8090/pub/2008-2) . First ACM International
Conference on Web Search and Data Mining. Retrieved 2008-03-12.

66. Priti Srinivas Sajja; Rajendra Akerkar (2012). Intelligent technologies for web applications (https://books.g
oogle.com/books?id=HqXxoWK7tucC&q=the+University+of+Nevada+System+Computing+Services+g
roup+developed+Veronica.&pg=PA87) . Boca Raton: CRC Press. p. 87. ISBN 978-1-4398-7162-1.
Retrieved 3 June 2014.

67. "A History of Search Engines" (http://www.wiley.com/legacy/compbooks/sonnenreich/history.html) . Wiley. Retrieved 1 June 2014.

68. Priti Srinivas Sajja; Rajendra Akerkar (2012). Intelligent technologies for web applications (https://books.g
oogle.com/books?id=HqXxoWK7tucC&q=the+University+of+Nevada+System+Computing+Services+g
roup+developed+Veronica.&pg=PA87) . Boca Raton: CRC Press. p. 86. ISBN 978-1-4398-7162-1.
Retrieved 3 June 2014.

69. "The Major Search Engines" (https://web.archive.org/web/20140605052335/http://www.pccua.edu/kholland/major_search_engines.htm) . 21 January 2014. Archived from the original (http://www.pccua.edu/kholland/major_search_engines.htm) on 5 June 2014. Retrieved 1 June 2014.

70. Jansen, B. J., Spink, A., Bateman, J., and Saracevic, T. 1998. Real life information retrieval: A study of user queries on the web (https://faculty.ist.psu.edu/jjansen/academic/jansen_sigir_forum.pdf) . SIGIR Forum, 32(1), 5–17.

71. Jansen, B. J., Spink, A., and Saracevic, T. 2000. Real life, real users, and real needs: A study and analysis of user queries on the web (https://faculty.ist.psu.edu/jjansen/academic/pubs/jansen_real_life_real_users_and_real_needs.pdf) . Information Processing & Management. 36(2), 207–227.

72. Priti Srinivas Sajja; Rajendra Akerkar (2012). Intelligent technologies for web applications (https://books.g
oogle.com/books?id=HqXxoWK7tucC&q=the+University+of+Nevada+System+Computing+Services+g
roup+developed+Veronica.&pg=PA87) . Boca Raton: CRC Press. p. 85. ISBN 978-1-4398-7162-1.
Retrieved 3 June 2014.

73. Li, Jingjing; Larsen, Kai; Abbasi, Ahmed (2020-12-01). "TheoryOn: A Design Framework and System for
Unlocking Behavioral Knowledge Through Ontology Learning" (https://misq.org/theoryon-a-design-fra
mework-and-system-for-unlocking-behavioral-knowledge-through-ontology-learning.html) . MIS
Quarterly. 44 (4): 1733–1772. doi:10.25300/MISQ/2020/15323 (https://doi.org/10.25300%2FMISQ%2F20
20%2F15323) . S2CID 219401379 (https://api.semanticscholar.org/CorpusID:219401379) .

Further reading

Steve Lawrence; C. Lee Giles (1999). "Accessibility of information on the web" (https://doi.org/10.1038%2F21987) . Nature. 400 (6740): 107–9. Bibcode:1999Natur.400..107L (https://ui.adsabs.harvard.edu/abs/1999Natur.400..107L) . doi:10.1038/21987 (https://doi.org/10.1038%2F21987) . PMID 10428673 (https://pubmed.ncbi.nlm.nih.gov/10428673) . S2CID 4347646 (https://api.semanticscholar.org/CorpusID:4347646) .

Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data (http://www.cs.uic.edu/~liub/WebMiningBook.html) . Springer, ISBN 3-540-37881-2

Bar-Ilan, J. (2004). The use of Web search engines in information science research. ARIST, 38,
231–288.

Levene, Mark (2005). An Introduction to Search Engines and Web Navigation. Pearson.

Hock, Randolph (2007). The Extreme Searcher's Handbook. ISBN 978-0-910965-76-7

Javed Mostafa (February 2005). "Seeking Better Web Searches". Scientific American. 292 (2): 66–73. Bibcode:2005SciAm.292b..66M (https://ui.adsabs.harvard.edu/abs/2005SciAm.292b..66M) . doi:10.1038/scientificamerican0205-66 (https://doi.org/10.1038%2Fscientificamerican0205-66) .

Ross, Nancy; Wolfram, Dietmar (2000). "End user searching on the Internet: An analysis of term pair topics submitted to the Excite search engine". Journal of the American Society for Information Science. 51 (10): 949–958. doi:10.1002/1097-4571(2000)51:10<949::AID-ASI70>3.0.CO;2-5 (https://doi.org/10.1002%2F1097-4571%282000%2951%3A10%3C949%3A%3AAID-ASI70%3E3.0.CO%3B2-5) .

Xie, M.; et al. (1998). "Quality dimensions of Internet search engines". Journal of Information Science. 24 (5): 365–372. doi:10.1177/016555159802400509 (https://doi.org/10.1177%2F016555159802400509) . S2CID 34686531 (https://api.semanticscholar.org/CorpusID:34686531) .

Information Retrieval: Implementing and Evaluating Search Engines (https://web.archive.org/web/20201005195805/http://www.ir.uwaterloo.ca/book/) . MIT Press. 2010. Archived from the original (http://www.ir.uwaterloo.ca/book/) on 2020-10-05. Retrieved 2010-08-07.

Yeo, ShinJoung. (2023) Behind the Search Box: Google and the Global Internet Industry (U of Illinois
Press, 2023) ISBN 10:0252087127 online (https://www.jstor.org/stable/10.5406/jj.4116455)

External links

Search Engines (https://curlie.org/Computers/Internet/Searching/Search_Engines/) at Curlie

Wikimedia Commons has media related to Internet search engines.

Wikiversity has learning resources about Search Engines.
