Web search engine
For a tutorial on using search engines for researching Wikipedia articles, see Wikipedia:Search engine test.
A web search engine is a software system that is designed to search for information on the World Wide Web. The search results are generally presented in a line of results often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
History
Notable web search engines and their current status:
Aliweb (inactive)
JumpStation (inactive)
Lycos (active)
Infoseek (inactive)
Daum (active)
Magellan (inactive)
Excite (active)
SAPO (active)
Yandex (active)
Google (active)
MSN Search (active as Bing)
Naver (active)
Teoma (active)
Vivisimo (inactive)
Exalead (active)
Gigablast (active)
Scroogle (inactive)
A9.com (inactive)
Sogou (active)
Ask.com (active)
GoodSearch (active)
SearchMe (inactive)
wikiseek (launched 2006; inactive)
Quaero (active)
ChaCha (active)
Sproose (inactive)
Picollator (inactive)
Viewzi (inactive)
Boogami (inactive)
LeapFish (inactive)
DuckDuckGo (active)
NATE (active)
Cuil (inactive)
During early development of the web, there was a list of webservers edited by Tim Berners-Lee and hosted on the CERN webserver. One historical snapshot of the list in 1992 remains,[1] but as more and more
webservers went online the central list could no longer keep up. On the NCSA site, new servers were
announced under the title "What's New!"[2]
The very first tool used for searching on the Internet was Archie.[3] The name stands for "archive" without
the "v". It was created in 1990 by Alan Emtage, Bill Heelan and J. Peter Deutsch, computer science
students at McGill University in Montreal. The program downloaded the directory listings of all the files
located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file
names; however, Archie did not index the contents of these sites since the amount of data was so limited it
could be readily searched manually.
The rise of Gopher (created in 1991 by Mark McCahill at the University of Minnesota) led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher
index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a
keyword search of most Gopher menu titles in the entire Gopher listings. Jughead
(Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information
from specific Gopher servers. While the name of the search engine "Archie" was not a reference to
the Archie comic book series, "Veronica" and "Jughead" are characters in the series, thus referencing their
predecessor.
In the summer of 1993, no search engine existed for the web, though numerous specialized catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically mirrored these pages and rewrote them into a standard format. This formed the basis for W3Catalog, the web's first primitive search engine, released on September 2, 1993.[4]
In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-
based World Wide Web Wanderer, and used it to generate an index called 'Wandex'. The purpose of the
Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The web's second
search engine Aliweb appeared in November 1993. Aliweb did not use a web robot, but instead depended
on being notified by website administrators of the existence at each site of an index file in a particular
format.
JumpStation (created in December 1993[5] by Jonathon Fletcher) used a web robot to find web pages and to
build its index, and used a web form as the interface to its query program. It was thus the first WWW
resource-discovery tool to combine the three essential features of a web search engine (crawling, indexing,
and searching) as described below. Because of the limited resources available on the platform it ran on, its
indexing and hence searching were limited to the titles and headings found in the web pages the crawler
encountered.
One of the first "all text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its
predecessors, it allowed users to search for any word in any webpage, which has become the standard for
all major search engines since. It was also the first one widely known by the public. Also in
1994, Lycos (which started at Carnegie Mellon University) was launched and became a major commercial
endeavor.
Soon after, many search engines appeared and vied for popularity. These
included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista. Yahoo! was among the most
popular ways for people to find web pages of interest, but its search function operated on its web directory,
rather than its full-text copies of web pages. Information seekers could also browse the directory instead of
doing a keyword-based search.
Google adopted the idea of selling search terms in 1998, from a small search engine company named goto.com. This move had a significant effect on the search engine business, which went from struggling to one of the most profitable businesses on the Internet.[6]
In 1996, Netscape was looking to give a single search engine an exclusive deal as the featured search
engine on Netscape's web browser. There was so much interest that instead Netscape struck deals with
five of the major search engines: for $5 million a year, each search engine would be in rotation on the
Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos, Infoseek, and Excite. [7][8]
Search engines were also known as some of the brightest stars in the Internet investing frenzy that
occurred in the late 1990s.[9]Several companies entered the market spectacularly, receiving record gains
during their initial public offerings. Some have taken down their public search engine, and are marketing
enterprise-only editions, such as Northern Light. Many search engine companies were caught up in the dot-
com bubble, a speculation-driven market boom that peaked in 1999 and ended in 2001.
Around 2000, Google's search engine rose to prominence.[10] The company achieved better results for many searches with an innovation called PageRank. This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link to them, on the premise that good or desirable pages are linked to more than others. Google also maintained a minimalist interface to its search engine. In contrast, many of its competitors embedded a search engine in a web portal. In fact, the Google search engine became so popular that spoof engines emerged such as Mystery Seeker.
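The PageRank idea can be illustrated with a short power-iteration sketch. This is a simplified illustration, not Google's production algorithm; the link graph and damping factor below are invented example values.

```python
# Simplified PageRank via power iteration (illustrative sketch only).
# `links` maps each page to the pages it links to; values are example data.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}           # start from a uniform distribution
    for _ in range(iterations):
        new_ranks = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                       # dangling page: spread its rank evenly
                share = damping * ranks[page] / n
                for p in pages:
                    new_ranks[p] += share
            else:
                share = damping * ranks[page] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks

print(pagerank(links))  # pages with more (and better-ranked) in-links score higher
```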
By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! acquired Inktomi
in 2002, and Overture(which owned AlltheWeb and AltaVista) in 2003. Yahoo! switched to Google's search
engine until 2004, when it launched its own search engine based on the combined technologies of its
acquisitions.
Microsoft first launched MSN Search in the fall of 1998 using search results from Inktomi. In early 1999 the site began to display listings from Looksmart, blended with results from Inktomi. For a short time in 1999, MSN Search used results from AltaVista instead. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler (called msnbot).
Microsoft's rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and
Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing technology.
In 2012, following the April 24 release of Google Drive, Google released the Beta version of Open
Drive (available as a Chrome app) to enable the search of files in the cloud. Open Drive has now been
rebranded as Cloud Kite. Cloud Kite is advertised as a "collective encyclopedia project based on Google
Drive public files and on the crowd sharing, crowd sourcing and crowd-solving principles". Cloud Kite will
also return search results from other cloud storage content services including Dropbox, SkyDrive, Evernote
and Box.[11]
How web search engines work
A search engine operates in the following order:
1. Web crawling
2. Indexing
3. Searching[12]
Web search engines work by storing information about many web pages, which they retrieve from the HTML markup of the pages. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated crawler that follows every link on the site. The site owner can exclude specific pages by using robots.txt.
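As a concrete illustration of this exclusion mechanism, the sketch below uses Python's standard urllib.robotparser to check whether a given URL may be fetched; the domain and crawler name are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name and site, used purely for illustration.
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()  # download and parse the site's robots.txt

url = "https://example.com/private/report.html"
if robots.can_fetch("ExampleBot", url):
    print("Allowed to crawl:", url)
else:
    print("Excluded by robots.txt:", url)
```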
The search engine then analyzes the contents of each page to determine how it should be indexed (for
example, words can be extracted from the titles, page content, headings, or special fields called meta tags).
Data about web pages are stored in an index database for use in later queries. A query from a user can be
a single word. The index helps find information relating to the query as quickly as possible. [12] Some search
engines, such as Google, store all or part of the source page (referred to as a cache) as well as information
about the web pages, whereas others, such as AltaVista, store every word of every page they find.[citation needed]
This cached page always holds the actual search text since it is the one that was actually indexed, so
it can be very useful when the content of the current page has been updated and the search terms are no
longer in it.[12] This problem might be considered a mild form of linkrot, and Google's handling of it
increases usability by satisfying user expectations that the search terms will be on the returned webpage.
This satisfies the principle of least astonishment, since the user normally expects that the search terms will
be on the returned pages. Increased search relevance makes these cached pages very useful as they may
contain data that may no longer be available elsewhere.[citation needed]
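The index described above is typically organized as an inverted index, mapping each word to the documents that contain it. A minimal, self-contained sketch over a toy document collection (hypothetical pages, simplified tokenization):

```python
from collections import defaultdict

# Toy document collection standing in for crawled pages.
documents = {
    "page1": "web search engines index the web",
    "page2": "an index maps words to pages",
    "page3": "crawlers retrieve pages for the index",
}

# Build an inverted index: word -> set of document ids containing it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        inverted_index[word].add(doc_id)

# A single-word query is answered by one dictionary lookup.
print(sorted(inverted_index["index"]))  # ['page1', 'page2', 'page3']
```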
When a user enters a query into a search engine (typically by using keywords), the engine examines
its index and provides a listing of best-matching web pages according to its criteria, usually with a short
summary containing the document's title and sometimes parts of the text. The index is built from the
information stored with the data and the method by which the information is indexed. [12] From 2007 the
Google.com search engine has allowed one to search by date by clicking 'Show search tools' in the leftmost
column of the initial search results page, and then selecting the desired date range. [citation needed] Most search
engines support the use of the boolean operators AND, OR and NOT to further specify the search query.
Boolean operators are for literal searches that allow the user to refine and extend the terms of the search.
The engine looks for the words or phrases exactly as entered. Some search engines provide an advanced
feature called proximity search, which allows users to define the distance between keywords.[12] There is also concept-based searching, where the research involves using statistical analysis on pages containing the words or phrases searched for. Natural language queries, in turn, allow the user to type a question in the same form one would ask it of a human; Ask.com is an example of such a site.[citation needed]
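The boolean operators mentioned above can be evaluated directly against an inverted index as set operations. A minimal sketch over a toy index (made-up page identifiers):

```python
# Evaluate simple boolean queries as set operations over an inverted index.
inverted_index = {
    "web":    {"page1", "page3"},
    "search": {"page1", "page2"},
    "index":  {"page1", "page2", "page3"},
}

def search(term):
    return inverted_index.get(term, set())

# web AND search -> intersection
print(search("web") & search("search"))      # {'page1'}
# web OR search  -> union
print(search("web") | search("search"))      # {'page1', 'page2', 'page3'}
# index NOT web  -> set difference
print(search("index") - search("web"))       # {'page2'}
```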
The usefulness of a search engine depends on the relevance of the result set it gives back. While there
may be millions of web pages that include a particular word or phrase, some pages may be more relevant,
popular, or authoritative than others. Most search engines employ methods to rank the results to provide
the "best" results first. How a search engine decides which pages are the best matches, and what order the
results should be shown in, varies widely from one engine to another.[12] The methods also change over
time as Internet usage changes and new techniques evolve. There are two main types of search engine that have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an "inverted index" by analyzing texts it locates. The latter form relies much more heavily on the computer itself to do the bulk of the work.
Most Web search engines are commercial ventures supported by advertising revenue and thus some of
them allow advertisers to have their listings ranked higher in search results for a fee. Search engines that
do not accept money for their search results make money by running search related ads alongside the
regular search engine results. The search engines make money every time someone clicks on one of these
ads.[13]
Market share
(Table: search engine market share, May 2011 and December 2010.[14])
Google's worldwide market share peaked at 86.3% in April 2010.[15] Yahoo!, Bing and other search engines
are more popular in the US than in Europe.
According to Hitwise, market share in the USA for October 2011 was Google 65.38%, Bing-powered (Bing and Yahoo!) 28.62%, and the remaining 66 search engines 6%. However, an Experian Hitwise report released in August 2011 gave the "success rate" of searches sampled in July. Over 80 percent of Yahoo! and Bing searches resulted in the users visiting a web site, while Google's rate was just under 68 percent.[16][17]
In the People's Republic of China, Baidu held a 61.6% market share for web search in July 2009.[18] In the Russian Federation, Yandex holds around 60% of the market share as of April 2012.[19] As of July 2013, Google held an 84% global and an 88% US market share for web search.[20] In South Korea, Naver (Hangul: 네이버) is a popular search portal, which has held a market share of over 70% since at least 2011,[21] continuing into 2013.[22]
Search engine bias
Although search engines are programmed to rank websites based on some combination of their popularity and relevancy, empirical studies indicate various political, economic, and social biases in the information they provide.[23][24] These biases can be a direct result of economic and commercial processes (e.g., companies that advertise with a search engine can also become more popular in its organic search results),[25] and of political processes (e.g., the removal of search results to comply with local laws).
Biases can also be a result of social processes, as search engine algorithms are frequently designed to exclude non-normative viewpoints in favor of more "popular" results.[26] Indexing algorithms of major search engines skew towards coverage of U.S.-based sites, rather than websites from non-U.S. countries.[24]
Google Bombing is one example of an attempt to manipulate search results for political, social or
commercial reasons.
Since this problem has been identified, competing search engines have emerged that seek to avoid this
problem by not tracking[31] or "bubbling"[32] users.
See also
Quora
True Knowledge
Wolfram Alpha
Enterprise search
Google effect
Metasearch engine
OpenSearch
Search directory
Selection-based search
Semantic Web
Social search
Spell checker
Web indexing
References
3. "Internet History - Search Engines" (from Search Engine Watch), Universiteit Leiden.
4. Oscar Nierstrasz (2 September 1993). "Searchable Catalog of WWW Resources (experimental)".
5. "Archive of NCSA what's new in December 1993 page". Web.archive.org. 2001-06-20.
6. http://www.udacity.com/view#Course/cs101/CourseRev/apr2012/Unit/616074/Nugget/671097
8. Browser Deals Push Netscape Stock Up 7.8%. Los Angeles Times. 1 April 1996.
9. Gandal, Neil (2001). "The dynamics of competition in the internet search engine ..." doi: ...7187(01)00065-0.
12. Jawadekar, Waman S (2011), "8. Knowledge Management: Tools and Technology", Knowledge Management: Text & Cases, New Delhi: Tata McGraw-Hill Education Private.
15. "Net Market share - Google". Marketshare.hitslink.com. Retrieved 2012-05-14.
16. "Google Remains Ahead of Bing, But Relevance Drops". August 12, 2011.
17. Experian Hitwise reports Bing-powered share of searches at 29 percent in October.
18. "Search Engine Market Share July 2009 | Rise to the Top Blog". Risetothetop.techwyse.com.
19. Pavliva, Halia (2012-04-02). "Yandex Internet Search Share Gains, Google Steady: ..." 2013.
22. "Naver's new format hits newspapers". Koreatimes.co.kr. 2012-05-24. Retrieved 2013-04-11.
23. Segev, El (2010). Google and the Digital Divide: The Biases of Online Knowledge. Oxford: Chandos Publishing.
24. Vaughan, Liwen; Mike Thelwall (2004). "Search engine coverage bias: evidence and ..." doi: ...4573(03)00063-3.
25. Berkman Center for Internet & Society (2002), "Replacement of Google with Alternative Search Systems in China: Documentation and Screen Shots", Harvard Law School.
26. Introna, Lucas; Helen Nissenbaum (2000). "Shaping the Web: Why the Politics of Search ..." Journal 16 (3). doi:10.1080/01972240050133634.
27. Parramore, Lynn (10 October 2010). "The Filter Bubble". The Atlantic. Retrieved 2011-04-20. "Since Dec. 4, 2009, Google has been personalized for everyone. So when I had two friends this spring Google "BP," one of them got a set of links that was about investment opportunities in BP. The other ..."
28. Weisberg, Jacob (10 June 2011). "Bubble Trouble: Is Web personalization turning us into ..."
29. Gross, Doug (May 19, 2011). "What the Internet is hiding from you". CNN. Retrieved 2011-08-15. "I had friends Google BP when the oil spill was happening. These are two women who were quite similar in a lot of ways. One got a lot of results about the environmental consequences of what was happening and the spill. The other one just got investment information and nothing about the spill at all."
30. Zhang, Yuan Cao; Séaghdha, Diarmuid Ó; Quercia, Daniele; Jambor, Tamas (February ...)
Further reading
For a more detailed history of early search engines, see Search Engine Birthdays (from Search Engine
Watch), Chris Sherman, September 2003.
Steve Lawrence; C. Lee Giles (1999). "Accessibility of information on the web". Nature 400 (6740):
107–9. doi:10.1038/21987. PMID 10428673.
Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer, ISBN 3-
540-37881-2
Bar-Ilan, J. (2004). The use of Web search engines in information science research. ARIST, 38, 231-
288.
Levene, Mark (2005). An Introduction to Search Engines and Web Navigation. Pearson.
Javed Mostafa (February 2005). "Seeking Better Web Searches". Scientific American Magazine.[dead link]
Ross, Nancy; Wolfram, Dietmar (2000). "End user searching on the Internet: An analysis of term pair
topics submitted to the Excite search engine". Journal of the American Society for Information
Science 51 (10): 949–958. doi:10.1002/1097-4571(2000)51:10<949::AID-ASI70>3.0.CO;2-5.
Xie, M. et al. (1998). "Quality dimensions of Internet search engines". Journal of Information
Science 24 (5): 365–372. doi:10.1177/016555159802400509.
Information Retrieval: Implementing and Evaluating Search Engines. MIT Press. 2010.
List of search engines
This is a list of articles about search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites that have a search facility for online databases.
By content/topic
General
Bing (multilingual)
Blekko (English)
DuckDuckGo (English)
Exalead (multilingual)
Gigablast (English)
Google (multilingual)
Sogou (Chinese)
Soso.com (Chinese)
Volunia (multilingual)
Yahoo! (multilingual)
Yandex (multilingual)
Youdao (Chinese)
P2P search engines
FAROO (English)
Metasearch engines
See also: Metasearch engine
Blingo (English)
DeeperWeb (English)
Dogpile (English)
Excite (English)
Harvester42
HotBot (English)
Info.com (English)
Mamma
Metacrawler (English)
Mobissimo (multilingual)
Otalo (English)
WebCrawler (English)
Geographically limited scope
Daum (Korea)
Guruji.com (India)
Leit.is (Iceland)
Miner.hu (Hungary)
Najdi.si (Slovenia)
Rambler (Russia)
Rediff (India)
Search.ch (Switzerland)
Walla! (Israel)
Yehey! (Philippines)
Semantic
See also: Semantic search
True Knowledge (now Evi): specialises in knowledge base and semantic search answer engine[1]
Browse.lt: fast and comfortable web search engine (web, video, images, news)
Accountancy
IFACnet
Business
Business.com
GlobalSpec
Justdial
Enterprise
See also: Enterprise search
Fashion Net
Food/Recipes
Job
Adzuna (UK)
Bixee.com (India)
CareerBuilder.com (USA)
Dice.com (USA)
Eluta.ca (Canada)
Hotjobs.com (USA)
Incruit (Korea)
Indeed.com (USA)
Glassdoor.com (USA)
LinkUp.com (USA)
Naukri.com (India)
Google Scholar
Manupatra
Quicklaw
WestLaw
Medical
Bing Health
Bioinformatic Harvester
GenieKnows
Healia
Healthline
PubGene
Searchmedica
WebMD
News
Bing News
Daylife
Google News
MagPortal
Newslookup
Topix.net
Trapit
Yahoo! News
People
Comfibook
Ex.plode.us
InfoSpace
PeekYou
Spock
Spokeo
Wink
Worldwide Helpers
Zabasearch.com
ZoomInfo
Real estate / property
Fizber.com
HotPads.com
Realtor.com
Redfin
Rightmove
Trulia
Zillow.com
Zoopla
Television
TV Genius
Video Games
Wazap (Japan)
By information type
Forum
Omgili
Blog
Amatomu
Bloglines
BlogScope
IceRocket
Regator
Technorati
Multimedia
See also: Multimedia search
Bing Videos
blinkx
FindSounds
Google Video
Munax's PlayAudioVideo
Picsearch
Pixsta
Podscope
ScienceStage
SeeqPod
Songza
TinEye
TV Genius
Veveo
Yahoo! Video
Source code
Koders
Krugle
BitTorrent
These search engines work across the BitTorrent protocol.
BTDigg
FlixFlux
Isohunt
Mininova
TorrentSpy
Torrentz
Cloud
Search engines listed below find various types of files that have been stored in the cloud and made publicly
available.
Open Drive
Email
Lookeen
TEK
Maps
Bing Maps
Géoportail
Google Maps
MapQuest
Nokia Maps
OpenStreetMap
WikiMapia
Yahoo! Maps
Price
Bing Shopping
Kelkoo
MySimon
PriceGrabber
PriceRunner
PriceSCAN
Pronto.com
Shopping.com
ShopWiki
SwoopThat.com
TheFind.com
Question and answer
Human answers
Answers.com
DeeperWeb
eHow
Quora
Uclue
wikiHow
Yahoo! Answers
Automatic answers
See also: Question answering
AskMeNow
BrainBoost
True Knowledge
Wolfram Alpha
Natural language
See also: Natural language search engine and Semantic search
Ask.com
hakia
Lexxe
By model
Privacy search engines
DuckDuckGo
Ixquick (StartPage)
Open source search engines
DataparkSearch
Gigablast
Grub
ht://Dig
Isearch
Lucene
mnoGoSearch
Namazu
Nutch
Recoll
Searchdaimon
Seeks
Sphinx
SWISH-E
Terrier Search Engine
Xapian
YaCy
Zettair
Semantic browsing engines
Hakia
Yebol
Social search engines
See also: Social search, Relevance feedback, and Human search engine
ChaCha Search
Delver
Eurekster
Mahalo.com
Rollyo
SearchTeam
Sproose
Trexy
Wink provides web search by analyzing user contributions such as bookmarks and feedback
Visual search engines
See also: Visual search engine
ChunkIt!
Grokker
Pixsta
PubGene
TinEye
Viewzi
Macroglossa
Search appliances
See also: Search appliance
Fabasoft
Searchdaimon
Thunderstone
Desktop search engines
See also: Desktop search
Autonomy IDOL Enterprise Desktop Search (Windows): proprietary, commercial.
Beagle (Linux): open source desktop search tool for Linux based on Lucene; unmaintained since 2009. A mix of the X11/MIT License and the Apache License.
Copernic Desktop Search (Windows): free for home use.
DocFetcher (cross-platform): open source desktop search tool for Windows and Linux, based on Apache Lucene. Eclipse Public License.
dtSearch Desktop (Windows): proprietary (30 day trial).
Everything (Windows): find files and folders by name instantly on NTFS volumes. Freeware.
GNOME Storage (Linux): open source desktop search tool for Unix/Linux. GPL.
Locate32 (Windows): graphical port of Unix's locate & updatedb. BSD License.[2]
Lookeen (Windows): Outlook search tool with integrated desktop search. Proprietary (14 day trial).
Meta Tracker (Linux, Unix): open source desktop search tool for Unix/Linux. GPL v2.[3]
Recoll (Linux, Unix): open source desktop search tool for Unix/Linux. GPL.[4]
Spotlight (Mac OS): found in Apple Mac OS X "Tiger" and later OS X releases. Proprietary.
Terrier Search Engine (Linux, Mac OS, Unix): desktop search for Windows, Mac OS X (Tiger), Unix/Linux. MPL.
Tropes Zoom (Windows): semantic search engine. Freeware and commercial.
Based on
Google
AOL Search
CompuServe Search
Groovle
MySpace Search
Mystery Seeker
Netscape
Ripple
Yahoo!
Ecocho
Forestle (an ecologically motivated site supporting sustainable rain forests - formerly based on Google)
GoodSearch
Rectifi
Bing
A9.com
Alexa Internet
Ciao!
Ms. Dewey
Yahoo! Search
Egerin
Ask.com
iWon
Lycos
Teoma
Defunct or acquired search engines
AlltheWeb
Btjunkie
Cuil
Google Answers
IBM STAIRS
Infoseek
Inktomi
Kartoo
Lotus Magellan
PubSub
RetrievalWare (acquired by Fast Search & Transfer and now owned by Microsoft)
Singingfish
Speechbot
Sphere
Tafiti
Yebol
WiseNut
Search aggregator
Search engine optimization
Metasearch engine
A metasearch engine is a search tool[1] that sends user requests to several other search engines and/or
databases and aggregates the results into a single list or displays them according to their source.
Metasearch engines enable users to enter search criteria once and access several search engines
simultaneously. Metasearch engines operate on the premise that the Web is too large for any one search
engine to index it all and that more comprehensive search results can be obtained by combining the results
from several search engines. This also may save the user from having to use multiple search engines
separately.
The term "metasearch" is frequently used to classify a set of commercial search engines, see the list of
Metasearch engine, but is also used to describe the paradigm of searching multiple data sources in real
time. The National Information Standards Organization (NISO) uses the terms Federated Search and
Metasearch interchangeably to describe this web search paradigm.
Operation
No two metasearch engines are alike.[2] Some search only the most popular search engines while others
also search lesser-known engines, newsgroups, and other databases. They also differ in how the results
are presented and the quantity of engines that are used. Some will list results according to search engine or
database. Others return results according to relevance, often concealing which search engine returned
which results. This benefits the user by eliminating duplicate hits and grouping the most relevant ones at the
top of the list.
Search engines frequently have different ways they expect requests submitted. For example, some search
engines allow the usage of the word "AND" while others require "+" and others require only a space to
combine words. The better metasearch engines try to synthesize requests appropriately when submitting
them[citation needed].
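A sketch of the kind of query translation a metasearch engine might perform when forwarding a request to engines with different syntaxes; the engine names and syntax rules below are illustrative assumptions, not documented APIs.

```python
# Translate a generic list of query terms into the syntax each
# underlying engine expects. The syntax rules here are illustrative only.
def translate_query(terms, syntax):
    if syntax == "AND":           # engines that accept the literal word AND
        return " AND ".join(terms)
    if syntax == "plus":          # engines that require a leading +
        return " ".join("+" + t for t in terms)
    if syntax == "space":         # engines where a space already means AND
        return " ".join(terms)
    raise ValueError("unknown syntax: " + syntax)

engines = {"engine_a": "AND", "engine_b": "plus", "engine_c": "space"}
terms = ["web", "search"]
for name, syntax in engines.items():
    print(name, "->", translate_query(terms, syntax))
```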
See also
Search aggregator
Federated search
Metabrowsing
Multisearch
Travel website
References
1. Sandy Berger's Great Age Guide to the Internet, by Sandy Berger. Que Publishing.
2. Manoj, M; Elizabeth, Jacob (October 2008). "Information retrieval on Internet using meta-search engines: A review". CSIR. pp. 739–746. Retrieved February 25, 2012.
External links
Guide to Meta-Search Engines by UC Berkeley libraries with recommendation not to use them for
serious research.
Meta-search: More heads better than one? Argument against Berkeley's negative recommendation
Federated search
Federated search is an information retrieval technology that allows the simultaneous search of multiple
searchable resources. A user makes a single query request which is distributed to the search
engines participating in the federation. The federated search then aggregates the results that are received
from the search engines for presentation to the user.
Purpose
Federated search came about to meet the need of searching multiple disparate content sources with one
query. This allows a user to search multiple databases at once in real time, arrange the results from the
various databases into a useful form and then present the results to the user.
Process
As described by Peter Jacso (2004[1]), federated searching consists of (1) transforming a query and
broadcasting it to a group of disparate databases or other web resources, with the appropriate syntax, (2)
merging the results collected from the databases, (3) presenting them in a succinct and unified format with
minimal duplication, and (4) providing a means, performed either automatically or by the portal user, to sort
the merged result set.
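The four steps above can be sketched as a small fan-out-and-merge routine; the source list, query function and scoring below are hypothetical placeholders rather than any particular product's API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real connectors to bibliographic databases,
# OPACs or web search engines participating in the federation.
def query_source(source, query):
    # In a real system this would translate the query into the source's
    # syntax, send it over the network and parse the response.
    return source["fake_results"]

def federated_search(sources, query):
    # (1) broadcast the query to every source in parallel
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda s: query_source(s, query), sources)
    # (2) merge the collected results, (3) removing duplicates by URL
    seen, merged = set(), []
    for results in result_lists:
        for item in results:
            if item["url"] not in seen:
                seen.add(item["url"])
                merged.append(item)
    # (4) sort the merged result set, here simply by a relevance score
    return sorted(merged, key=lambda item: item["score"], reverse=True)

sources = [
    {"name": "catalogue_a", "fake_results": [{"url": "http://a/1", "score": 0.9}]},
    {"name": "catalogue_b", "fake_results": [{"url": "http://a/1", "score": 0.7},
                                             {"url": "http://b/2", "score": 0.8}]},
]
print(federated_search(sources, "deep web"))
```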
Federated search portals, either commercial or open access, generally search public access bibliographic
databases, public access Web-based library catalogues (OPACs), Web-based search engines
like Google and/or open-access, government-operated or corporate data collections. These individual
information sources send back to the portal's interface a list of results from the search query. The user can
review this hit list. Some portals will merely screen scrape the actual database results and not directly allow
a user to enter the information source's application. More sophisticated ones will de-dupe the results list by
merging and removing duplicates. There are additional features available in many portals, but the basic
idea is the same: to improve the accuracy and relevance of individual searches as well as reduce the
amount of time required to search for resources.
This process allows federated search some key advantages when compared with existing crawler-based
search engines. Federated search need not place any requirements or burdens on owners of the individual
information sources, other than handling increased traffic. Federated searches are inherently as current as
the individual information sources, as they are searched in real time.
Implementation
One application of federated searching is the metasearch engine; however, this is not a complete solution
as many documents are not currently indexed. These documents are on what is known as the deep Web, or
invisible Web. Many more information sources are not yet stored in electronic form. Google Scholar is one
example of many projects trying to address this.
When the search vocabulary or data model of the search system is different from the data model of one or
more of the foreign target systems the query must be translated into each of the foreign target systems.
This can be done using simple data-element translation or may require semantic translation.
A challenge faced in the implementation of federated search engines is scalability, in other words, the
performance of the site as the number of information sources comprising the federated search engine
increase. One federated search engine that has begun to address this issue is WorldWideScience, hosted
by the U.S. Department of Energy's Office of Scientific and Technical Information. WorldWideScience [2] is
composed of more than 40 information sources, several of which are federated search portals themselves.
One such portal is Science.gov [3] which itself federates more than 30 information sources representing
most of the R&D output of the U.S. Federal government. Science.gov returns its highest ranked results to
WorldWideScience, which then merges and ranks these results with the search returned by the other
information sources that comprise WorldWideScience.[3] This approach of cascaded federated search enables a large number of information sources to be searched via a single query.
Another application Sesam running in both Norway and Sweden has been built on top of an open sourced
platform specialised for federated search solutions. Sesat,[4] an acronym for Sesam Search Application
Toolkit, is a platform that provides much of the framework and functionality required for handling parallel
and pipelined searches and displaying them elegantly in a user interface, allowing engineers to focus on the
index/database configuration tuning.
Challenges
When federated search is performed against secure data sources, the users' credentials must be passed on
to each underlying search engine, so that appropriate security is maintained. If the user has different login
credentials for different systems, there must be a means to map their login ID to each search engine's
security domain.[5]
Another challenge is mapping results list navigators into a common form. Suppose 3 real-estate sites are
searched, each provides a list of hyperlinked city names to click on, to see matches only in each city. Ideally
these facets would be combined into one set, but that presents additional technical challenges. [6] The
system also needs to understand "next page" links if it's going to allow the user to page through the
combined results.
Related links
Federated Search 101. Linoski, Alexis, Walczyk, Tine, Library Journal, Summer 2008 Net Connect,
Vol. 133[dead link] Note: this content has been moved here, but you will need a remote access account
through your local library to get the whole article.
Cox, Christopher N. Federated Search: Solution or Setback for Online Library Services. Binghamton,
NY: Haworth Information Press, 2007.Table of Contents
Federated Search Primer. Lederman, S., AltSearchEngines, January 2009[dead link] Note: This material
has been reposted here, on the blog of a commercial search engine company.
Milad Shokouhi and Luo Si, Federated Search, Foundations and Trends® in Information Retrieval: Vol.
5: No 1, pp 1-102., http://dx.doi.org/10.1561/1500000010
See also
Federated content
Metasearch engine
Funnelback
Search aggregator
Deep Web
References
1. Thoughts About Federated Searching. Jacsó, Péter, Information Today, Oct 2004, Vol. 21, Issue 9.
6. 20+ Differences Between Internet vs. Enterprise Search - part 1.
Federated content is digital media content that is designed to be self-managing to support reporting and rights management in a peer-to-peer (P2P) network, for example audio stored in a digital rights management (DRM) file format.
Search aggregator
A search aggregator is a type of metasearch engine which gathers results from multiple search engines
simultaneously, typically through RSS search results. It combines user specified search feeds
(parameterized RSS feeds which return search results) to give the user the same level of control over
content as a general aggregator.
Soon after the introduction of RSS, sites began publicising their search results in parameterized RSS feeds.
Search aggregators are an increasingly popular way to take advantage of the power of multiple search
engines with a flexibility not seen in traditional metasearch engines. To the end user, a search aggregator
may appear to be just a customizable search engine and the use of RSS may be completely hidden.
However, the presence of RSS is directly responsible for the existence of search aggregators and a critical
component in the behind-the-scenes technology.
History
The concept of search aggregation is a relatively recent phenomenon with the first ones becoming available
in 2006. In 2005 Amazon published the OpenSearch specification for making search results available in a
generic XML format. While many sites currently publish results in OpenSearch, many simply publish in
generic RSS format. However, while OpenSearch syndication allows for greater flexibility in the way Search
Aggregators display results, it is generally not required.
Functional overview
A search aggregator typically allows users to select specific search engines ad hoc to perform a specified
query. At the time the user enters the query into the Search Aggregator, it generates the required URL "on
the fly" by inserting the search query into the parameterized URL for the search feed. A parameterized URL
looks something like this:
http://news.google.com/news?hl=en&ned=us&q={SEARCH_TERMS}&ie=UTF-8&output=rss
In this case, the {SEARCH_TERMS} parameter would be replaced with the user requested search terms,
and the query would be sent to the host. The Search Aggregator would then parse the results and display
them in a user-friendly way.
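A minimal sketch of that substitution and retrieval step, using only the Python standard library. The feed URL is the parameterized example above; whether any given service still serves RSS at that address is not guaranteed, so the call at the end is left commented out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Parameterized search feed, as in the example URL above.
FEED_TEMPLATE = ("http://news.google.com/news?hl=en&ned=us"
                 "&q={SEARCH_TERMS}&ie=UTF-8&output=rss")

def run_search_feed(terms):
    # Insert the user's query into the parameterized URL "on the fly".
    url = FEED_TEMPLATE.replace("{SEARCH_TERMS}", urllib.parse.quote(terms))
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    # RSS 2.0 places result entries under channel/item.
    for item in tree.findall("./channel/item"):
        yield item.findtext("title"), item.findtext("link")

# for title, link in run_search_feed("web search engines"):
#     print(title, link)
```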
Advantages
This system has several advantages over traditional metasearch engines. Primarily, it allows the user
greater flexibility in deciding which engines should be used to perform the query. They also allow for easy addition of new engines to the user's personal collection (similar to the way a user adds a new news feed to a news aggregator).
Patents
Apple patent 6,847,959,[1] filed January 5, 2000, covers universal search aggregation. This resulted in the removal[2] of this feature from Samsung Android smartphones in July 2012.
See also
Aggregator
Metasearch engine
Federated search
References
1. Jump up^ "Patent US6847959 - Universal interface for retrieval of information in a computer system -
2. Jump up^ Florian Mueller (2012-02-15). "Last week's Apple-Samsung lawsuit involves eight patents, 17
products - bid for Nexus ban is based on only a subset". FOSS Patents. Retrieved 2012-08-16.
News aggregator
In computing, a news aggregator, also termed a feed aggregator, feed reader, news reader, RSS reader or simply aggregator, is client software or a web application which aggregates syndicated web content such as online newspapers, blogs, podcasts, and video blogs (vlogs) in one location for easy viewing.
Function
Visiting many separate websites frequently to find out if content on the site has been updated can take a
long time. Aggregation technology helps to consolidate many websites into one page that can show the new
or updated information from many sites. Aggregators reduce the time and effort needed to regularly check
websites for updates, creating a unique information space or personal newspaper. Once subscribed to a
feed, an aggregator is able to check for new content at user-determined intervals and retrieve the update.
The content is sometimes described as being pulled to the subscriber, as opposed to pushed with email or
IM. Unlike recipients of some pushed information, the aggregator user can easily unsubscribe from a feed.
Aggregation features are frequently built into web portal sites, in the web browsers themselves,
in email applications or in application software designed specifically for reading feeds.
The aggregator provides a consolidated view of the content in one browser display or desktop application.
Aggregators with podcasting capabilities can automatically download media files, such as MP3 recordings.
In some cases, these can be automatically loaded onto portable media players (like iPods) when they are
connected to the end-user's computer.
By 2011, so-called RSS-narrators appeared, which aggregated text-only news feeds, and converted them
into audio recordings for offline listening.
The syndicated content an aggregator will retrieve and interpret is usually supplied in the form of RSS or
other XML-formatted data, such as RDF/XML or Atom.
Types
The variety of software applications and components that are available to collect, format, translate, and
republish XML feeds is a testament to the flexibility of the format and has shown the usefulness
of presentation-independent data.
Examples of this sort of website are Google News, Drudge Report, Huffington Post,[1] Newslookup, Newsvine, World News (WN) Network, and Daily Beast, where aggregation is entirely automatic, using algorithms which carry out contextual analysis and group similar stories together, while other sites supplement automatically-aggregated stories with manually curated headlines and their own articles.[2]
News aggregation websites began with content selected and entered by humans, while automated
selection algorithms were eventually developed to fill the content from a range of either automatically
selected or manually added sources. Google News launched in 2002 using automated story selection, but
humans could add sources to its search engine, while the older Yahoo News, as of 2005, used a
combination of automated news crawlers and human editors.[3][4][5]
Web-based feed readers
Web-based feed readers allow users to find a web feed on the internet and add it to their feed reader.
Online feed readers include Bloglines, Feedly, Feedspot, Flipboard, DiggReader, News360, Google
Reader (discontinued July 1, 2013[6]), My Yahoo!, NewsBlur,[7][8] Netvibes. These are meant for personal use
and are hosted on remote servers. Because the application is available via the web, it can be accessed
anywhere by a user with an internet connection.
More advanced methods of aggregating feeds are provided via Ajax coding techniques
and XML components called web widgets. Ranging from full-fledged applications to small fragments
of source code that can be integrated into larger programs, they allow users to aggregate OPML files, email
services, documents, or feeds into one interface. Many customizable homepage and portal implementations
provide such functionality.
In addition to aggregator services mainly for individual use, there are web applications that can be used to
aggregate several blogs into one. One such variety—called planet sites—are used by online communities to
aggregate community blogs in a centralized location. They are named after the Planet aggregator, a server
application designed for this purpose.
Feed aggregation applications are installed on a PC, smartphone or tablet computer and designed to collect
news and interest feed subscriptions and group them together using a user-friendly interface. The graphical
user interface of such applications often closely resembles that of popular e-mail clients, using a three-
panel composition in which subscriptions are grouped in a frame on the left, and individual entries are
browsed, selected, and read in frames on the right. Some notable examples include Flipboard, Prismatic,
and CNN-owned Zite.[9][10]
Software aggregators can also take the form of news tickers which scroll feeds like ticker tape, alerters that
display updates in windows as they are refreshed, web browser macro tools or as smaller components
(sometimes called plugins or extensions), which can integrate feeds into the operating system or software
applications such as a web browser. Client applications include Mozilla Firefox, Microsoft Office
Outlook, iTunes, FeedDemon and many others.
Media aggregators
Media aggregators are sometimes referred to as podcatchers due to the popularity of the
term podcast used to refer to a web feed containing audio or video. Media aggregators are client software
or web-based applications which maintain subscriptions to feeds that contain audio or video media
enclosures. They can be used to automatically download media, playback the media within the application
interface, or synchronize media content with a portable media player.
Broadcatching
Several BitTorrent client software applications have added the ability to broadcatch torrents of distributed
multimedia through the aggregation of web feeds.
Feed filtering
One of the problems with news aggregators is that the volume of articles can sometimes be overwhelming,
especially when the user has many web feed subscriptions. As a solution, many feed readers allow users
to tag each feed with one or more keywords which can be used to sort and filter the available articles into
easily navigable categories. Another option is to import the user's Attention Profile to filter items based on
their relevance to the user's interests.
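A small sketch of the keyword-based feed filtering described above; the feed names, tags and entries are invented examples.

```python
# Filter aggregated entries by the keyword tags assigned to each feed.
feeds = {
    "example-tech-blog": {"tags": {"technology"},
                          "entries": ["New search engine launched"]},
    "example-cooking":   {"tags": {"food"},
                          "entries": ["Ten quick pasta recipes"]},
}

def entries_with_tag(feeds, wanted_tag):
    # Collect entries only from feeds tagged with the requested keyword.
    for name, feed in feeds.items():
        if wanted_tag in feed["tags"]:
            for entry in feed["entries"]:
                yield name, entry

print(list(entries_with_tag(feeds, "technology")))
```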
See also
Web feed
Metasearch engine
Lifestreaming
References
1. Luscombe, Belinda (2009-03-19). "Arianna Huffington: The Web's New Oracle". Time (Time Inc). Retrieved 2009-03-30. (subscription required). "The Huffington Post was to have three basic functions: blog, news aggregator with an attitude and place for premoderated comments."
2. "Google News and newspaper publishers: allies or enemies?". Editorsweblog.org. World ...
3. Hansell, Saul (24 September 2002). "All the news Google algorithms say is fit to print". The ...
4. Hill, Brad (24 October 2005). Google Search & Rescue For Dummies. John Wiley & Sons.
5. LiCalzi O'Connell, Pamela (29 January 2001). "New Economy; Yahoo Charts the Spread of the News by E-Mail, and What It Finds Out Is Itself Becoming News". New York Times.
6. Chitu, Alex. "No More Google Reader". Google Operating System. Blogger. Retrieved 14 March 2013.
7. "YC-Backed NewsBlur Takes Feed Reading Back To Its Basics". TechCrunch. July 30, 2012.
8. "Need A Google Reader Alternative? Meet Newsblur". Search Engine Land. March 14, 2013.
9. Cheredar, Tom (22 May 2013). "Zite's new iOS app update welcomes (but doesn't cater to) ..."
Funnelback
Type: Private
Industry: Software
Founded: 2006
Revenue: Unknown
Employees: 35
Website: www.funnelback.com
Funnelback is both an enterprise search engine and the name of the company selling the technology. Funnelback is used by many Australian universities and government organisations to search for information on their websites, intranets, file-shares and databases.
History
Funnelback was originally developed by the CSIRO as part of a research project, then called "P@noptic".
The initial design and development was headed up by Dr. David Hawking, a researcher on search
technologies. With its initial launch in 2001 as P@noptic, and with the research weight of CSIRO behind it, it
quickly attracted some high profile clients. With its rapid commercial success, it was spun off from the
CSIRO ICT Centre as its own company in February 2006. In July 2009 Funnelback was purchased
by Squiz.
Research
Despite being spun off from CSIRO, it still retains close research links with them and jointly publishes
papers. Research is currently underway in the areas of search types, subject-specific search, realistic
applications of metasearch, topic distillation, website searchability and search in support of e-commerce.
Technology
The core of the system is based around the Padre engine developed by the CSIRO to perform very fast
data look-ups.
It has the ability to search a range of formats in addition to HTML files. These include PDF, Word documents, Excel spreadsheets, images, and XML. It also has the ability to connect to, and index, databases. Adapters currently exist for MySQL and TRIM Context, and the design of the system allows users to create their own adapters should they not exist.
It works on Windows, Linux and Solaris, and there is also a hosted service available.
Awards
The Panoptic search engine has been widely recognised as a leader in its field. Panoptic has been
awarded:
Funnelback site
Deep Web
The Deep Web (also called the Deepnet,[1] Invisible Web,[2] or Hidden Web[3]) is World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines. It should not be confused
with the dark Internet, the computers that can no longer be reached via the Internet, or with
a Darknet distributed filesharing network, which could be classified as a smaller part of the Deep Web.
There is concern that the deep web can be used for serious criminal activity.[4]
Mike Bergman, founder of BrightPlanet and credited with coining the phrase,[5] said that searching on the
Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be
caught in the net, but there is a wealth of information that is deep and therefore missed. [6] Most of the Web's
information is buried far down on dynamically generated sites, and standard search engines do not find it.
Traditional search engines cannot "see" or retrieve content in the deep Web—those pages do not exist until
they are created dynamically as the result of a specific search. As of 2001, the deep Web was
several orders of magnitude larger than the surface Web.[7]
Size
Naming
Bergman, in a seminal paper on the deep Web published in the Journal of Electronic Publishing, mentioned that Jill Ellsworth used the term invisible Web in 1994 to refer to websites that were not registered with any search engine.[7] Bergman cited a January 1996 article by Frank Garcia:[10]
It would be a site that's possibly reasonably designed, but they didn't bother to register it with any of the
search engines. So, no one can find them! You're hidden. I call that the invisible Web.
Another early use of the term Invisible Web was by Bruce Mount and Matthew B. Koll of Personal Library
Software, in a description of the @1 deep Web tool found in a December 1996 press release. [11]
The first use of the specific term Deep Web, now generally accepted, occurred in the aforementioned 2001
Bergman study.[7]
Deep resources
Deep Web resources may be classified into one or more of the following categories:
Dynamic content: dynamic pages which are returned in response to a submitted query or accessed
only through a form, especially if open-domain input elements (such as text fields) are used; such fields
are hard to navigate without domain knowledge.
Unlinked content: pages which are not linked to by other pages, which may prevent Web
crawling programs from accessing the content. This content is referred to as pages without backlinks
(or inlinks).
Private Web: sites that require registration and login (password-protected resources).
Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP
addresses or previous navigation sequence).
Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma HTTP headers which prohibit search engines from browsing them and creating cached copies).[12]
Scripted content: pages that are only accessible through links produced by JavaScript as well as
content dynamically downloaded from Web servers via Flash or Ajax solutions.
Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file
formats not handled by search engines.
Accessing
To discover content on the Web, search engines use web crawlers that follow hyperlinks through known
protocol virtual port numbers. This technique is ideal for discovering resources on the surface Web but is
often ineffective at finding deep Web resources. For example, these crawlers do not attempt to find dynamic
pages that are the result of database queries due to the indeterminate number of queries that are possible.[5] It has been noted that this can be (partially) overcome by providing links to query results, but this could unintentionally inflate the popularity for a member of the deep Web.
In 2005, Yahoo! made a small part of the deep Web searchable by releasing Yahoo! Subscriptions.[citation needed] This search engine searches through a few subscription-only Web sites.[citation needed] Some subscription websites display their full content to search engine robots so they will show up in user searches, but then show users a login or subscription page when they click a link from the search engine results page.[citation needed]
DeepPeep, Intute, Deep Web Technologies, and Scirus are a few search engines that have accessed the
deep web. Intute ran out of funding and is now a temporary static archive as of July, 2011. [13] Scirus retired
near the end of January, 2013.[14]
Some so-called hidden services can be accessed only through an onion router such as Tor (anonymity
network) which allows the user to access .onion web pages.
Researchers have been exploring how the Deep Web can be crawled in an automatic fashion. In 2001, Sriram Raghavan and Hector Garcia-Molina (Stanford Computer Science Department, Stanford University)[15][16] presented an architectural model for a hidden-Web crawler that used key terms provided by users or collected from the query interfaces to query a Web form and crawl the Deep Web resources. Alexandros Ntoulas, Petros Zerfos, and Junghoo Cho of UCLA created a hidden-Web crawler that automatically generated meaningful queries to issue against search forms.[17] Several form query languages (e.g., DEQUEL[18]) have been proposed that, besides issuing a query, also allow structured data to be extracted from result pages. Another effort is DeepPeep, a project of the University of Utah sponsored by the National Science Foundation, which gathered hidden-Web sources (Web forms) in different domains based on novel focused crawler techniques.[19][20]
Commercial search engines have begun exploring alternative methods to crawl the deep Web. The Sitemap
Protocol (first developed and introduced by Google in 2005) and mod_oai are mechanisms that allow search
engines and other interested parties to discover deep Web resources on particular Web servers. Both
mechanisms allow Web servers to advertise the URLs that are accessible on them, thereby allowing
automatic discovery of resources that are not directly linked to the surface Web. Google's deep Web
surfacing system pre-computes submissions for each HTML form and adds the resulting HTML pages into
the Google search engine index. The surfaced results account for a thousand queries per second to deep
Web content.[21] In this system, the pre-computation of submissions is done using three algorithms:
1. selecting input values for text search inputs that accept keywords,
2. identifying inputs which accept only values of a specific type (e.g., date), and
3. selecting a small number of input combinations that generate URLs suitable for inclusion into the
Web search index.
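A rough sketch of how such form surfacing might be organized; the form fields, candidate values and selection heuristic here are invented examples rather than Google's actual system.

```python
from itertools import product

# Hypothetical description of an HTML search form on a deep-web site.
form = {
    "keywords": {"type": "text",   "candidates": ["hotels", "flights", "museums"]},
    "country":  {"type": "select", "candidates": ["us", "fr"]},
}

def surface_form(form, max_submissions=4):
    # 1. pick candidate values for free-text inputs, 2. use the enumerated
    #    values for typed inputs, 3. keep only a small number of combinations.
    fields = sorted(form)
    value_lists = [form[f]["candidates"] for f in fields]
    submissions = []
    for combo in product(*value_lists):
        submissions.append(dict(zip(fields, combo)))
        if len(submissions) >= max_submissions:
            break
    return submissions  # each dict would be submitted and the result page indexed

for query in surface_form(form):
    print(query)
```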
Classifying resources
Automatically determining if a Web resource is a member of the surface Web or the deep Web is difficult. If
a resource is indexed by a search engine, it is not necessarily a member of the surface Web, because the
resource could have been found using another method (e.g., the Sitemap Protocol, mod_oai, OAIster)
instead of traditional crawling. If a search engine provides a backlink for a resource, one may assume that
the resource is in the surface Web. Unfortunately, search engines do not always provide all backlinks to
resources. Furthermore, a resource may reside in the surface Web even though it has yet to be found by a
search engine.
Most of the work of classifying search results has been in categorizing the surface Web by topic. For
classification of deep Web resources, Ipeirotis et al.[22] presented an algorithm that classifies a deep Web
site into the category that generates the largest number of hits for some carefully selected, topically-focused
queries. Deep Web directories under development include OAIster at the University of Michigan, Intute at
the University of Manchester, Infomine[23] at the University of California at Riverside, and DirectSearch
(by Gary Price). This classification poses a challenge when searching the deep Web because two levels of categorization are required. The first level is to categorize sites into vertical topics (e.g., health, travel,
automobiles) and sub-topics according to the nature of the content underlying their databases.
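The probe-and-count idea can be sketched as follows. The category probes and the stubbed hit counter are hypothetical stand-ins; the published algorithm derives its probes from a trained document classifier and counts hits by querying the site's actual search form.

```python
# Illustrative probe-and-count classification: send topically focused probe queries
# to a deep Web site and assign the category whose probes return the most hits.
CATEGORY_PROBES = {
    "health": ["diabetes", "cardiology", "vaccine"],
    "travel": ["itinerary", "visa", "airfare"],
    "automobiles": ["sedan", "horsepower", "transmission"],
}

def classify_site(count_hits) -> str:
    """count_hits(query) -> int must submit the query to the site and return the hit count."""
    scores = {
        category: sum(count_hits(query) for query in probes)
        for category, probes in CATEGORY_PROBES.items()
    }
    return max(scores, key=scores.get)

# Example with a stubbed hit counter standing in for real form submissions:
fake_counts = {"diabetes": 120, "cardiology": 85, "vaccine": 40}
print(classify_site(lambda q: fake_counts.get(q, 0)))  # -> "health"
```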
The more difficult challenge is to categorize and map the information extracted from multiple deep Web
sources according to end-user needs. Deep Web search reports cannot display URLs like traditional search
reports. End users expect their search tools not only to find what they are looking for, but also to be intuitive and user-friendly. In order to be meaningful, the search reports have to offer some insight into the nature of the content that underlies the sources, or else the end user will be lost in a sea of URLs that do not indicate what content lies beneath them. The format in which search results are to be presented varies
widely by the particular topic of the search and the type of content being exposed. The challenge is to find
and map similar data elements from multiple disparate sources so that search results may be exposed in a
unified format on the search report irrespective of their source.
Future[edit]
The lines between search engine content and the deep Web have begun to blur, as search services start to
provide access to part or all of once-restricted content. An increasing amount of deep Web content is
opening up to free search as publishers and libraries make agreements with large search engines. In the
future, deep Web content may be defined less by opportunity for search than by access fees or other types
of authentication.[citation needed]
See also[edit]
Internet portal
Dark Internet
I2P
Freenet
Gopher protocol
References[edit]
1. Hamilton, Nigel. "The Mechanics of a Deep Net Metasearch Engine". CiteSeerX 10.1.1.90.5847.
2. Devine, Jane; Egger-Sider, Francine (July 2004). "Beyond Google: the invisible web in the academic library". The Journal of Academic Librarianship 30 (4): 265–269. Retrieved 2014-02-06.
3. Raghavan, Sriram; Garcia-Molina, Hector (11–14 September 2001). "Crawling the Hidden Web". 27th International Conference on Very Large Data Bases (Rome, Italy).
4. The Secret Web: Where Drugs, Porn and Murder Live Online.
5. Wright, Alex (2009-02-22). "Exploring a 'Deep Web' That Google Can't Grasp". The New York Times.
6. Bergman, Michael K (July 2000). The Deep Web: Surfacing Hidden Value. BrightPlanet LLC.
7. Bergman, Michael K (August 2001). "The Deep Web: Surfacing Hidden Value". The Journal of Electronic Publishing.
8. He, Bin; Patel, Mitesh; Zhang, Zhen; Chang, Kevin Chen-Chuan (May 2007). "Accessing the Deep Web". Communications of the ACM 50 (5): 94–101. doi:10.1145/1230819.1241670.
9. Shestakov, Denis (2011). "Sampling the National Deep Web" (PDF). Proceedings of the 22nd International Conference on Database and Expert Systems Applications (DEXA). Springer. pp. 331–340. Archived from the original on September 2, 2011. Retrieved 2011-10-06.
10. Garcia, Frank (January 1996). "Business and Marketing on the Internet". Masthead 15 (1).
11. @1 started with 5.7 terabytes of content, estimated to be 30 times the size of the nascent World Wide Web; PLS was acquired by AOL in 1998 and @1 was abandoned. "PLS introduces AT1, the first 'second generation' Internet search service" (Press release). Personal Library Software. December 1996.
12. "HTTP/1.1: Header Field Definitions (14.32 Pragma)". HTTP — Hypertext Transfer Protocol. World Wide Web Consortium. 1999. Retrieved 2009-02-24.
13. "Intute FAQ". Retrieved October 13, 2012.
15. Raghavan, Sriram; Garcia-Molina, Hector (2000). Crawling the Hidden Web (PDF). Stanford University.
16. Raghavan, Sriram; Garcia-Molina, Hector (2001). "Crawling the Hidden Web" (PDF). Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). pp. 129–138.
17. Ntoulas, Alexandros; Zerfos, Petros; Cho, Junghoo (2005). Downloading Hidden Web Content.
18. Shestakov, Denis; Bhowmick, Sourav S; Lim, Ee-Peng (2005). "DEQUE: Querying the Deep Web".
19. Barbosa, Luciano; Freire, Juliana (2007). An Adaptive Crawler for Locating Hidden-Web Entry Points.
20. Barbosa, Luciano; Freire, Juliana (2005). Searching for Hidden-Web Databases. WebDB.
21. Madhavan, Jayant; Ko, David; Kot, Łucja; Ganapathy, Vignesh; Rasmussen, Alex; Halevy, Alon (2008). Google's Deep-Web Crawl (PDF). VLDB Endowment, ACM. Retrieved 2009-04-17.
22. Ipeirotis, Panagiotis G.; Gravano, Luis; Sahami, Mehran (2001). "Probe, Count, and Classify: Categorizing Hidden-Web Databases" (PDF). Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data.
Further reading[edit]
Paganini, Pierluigi (May 2012), "What is the Deep Web? A first trip into the abyss", Security Affairs, Napoli, Italy: Security Affairs.
Paganini, Pierluigi (September 2012), "The good and the bad of the Deep Web", Security Affairs, Napoli, Italy: Security Affairs.
Paganini, Pierluigi (September 2013), "Traffic Correlation Attacks against Anonymity on Tor", Security Affairs, Napoli, Italy: Security Affairs.
Paganini, Pierluigi; Amores, Richard (September 2012), The Deep Dark Web, NY, USA: Amazon.
Paganini, Pierluigi (October 2012), "The Deep Web Part 1: Introduction to the Deep Web and how to wear clothes", Security Affairs.
Hamilton, Nigel (2003), The Mechanics of a Deep Net Metasearch Engine, 12th World Wide Web Conference.
He, Bin; Chang, Kevin Chen-Chuan (2003). "Statistical Schema Matching across Web Query Interfaces" (PDF). Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data.
Ipeirotis, Panagiotis G.; Gravano, Luis; Sahami, Mehran (2001). "Probe, Count, and Classify: Categorizing Hidden-Web Databases" (PDF). Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data.
King, John D.; Li, Yuefeng; Tao, Daniel; Nayak, Richi (November 2007). "Mining World Knowledge for Analysis of Search Engine Content" (PDF). Web Intelligence and Agent Systems: An International Journal.
McCown, Frank; Liu, Xiaoming; Nelson, Michael L; Zubair, Mohammad (March–April 2006). "Search Engine Coverage of the OAI-PMH Corpus" (PDF). IEEE Internet Computing 10 (2): 66–73. doi:10.1109/MIC.2006.41.
Price, Gary; Sherman, Chris (July 2001). The Invisible Web: Uncovering Information Sources Search Engines Can't See.
Shestakov, Denis (June 2008). Search Interfaces on the Web: Querying and Characterizing. TUCS Doctoral Dissertations.
Wright, Alex (March 2004), In Search of the Deep Web, Salon, archived from the original on 9 March 2007.
External links[edit]
Basu, Saikat (March 14, 2010), 10 Search Engines to Explore the Invisible Web, MakeUseOf.com.
Whoriskey, Peter (December 11, 2008), Firms Push for a More Searchable Federal Web, The
Washington Post, p. D01.
Metabrowsing
From Wikipedia, the free encyclopedia
Metabrowsing refers to approaches to browsing Web-based information that emerged in the late 1990s as
alternatives to the standard Web browser. According to LexisNexis the term "metabrowsing" began
appearing in mainstream media in March 2000.[1][2] Since then the meaning of "metabrowsing" has split into
a popular and a more scientific use of the term.
Contents
[hide]
1 Popular use
2 Scientific use
4 Technology
5 References
Popular use[edit]
Akin to metasearch, the popular use of the term "metabrowsing" describes ways of viewing Web-based information other than a single Web page at a time. "Simply put, metabrowsing is a tool or
service that enables the user to view more than a single Web page at a time inside a single display unit." [3]
According to Dr. Linda Gordon, Liberal Arts Professor at Nova Southeastern University, "metabrowsing is
transforming our understanding of the web, therefore, the vocabulary of this new perspective must
demonstrate the nature of the metamorphosis. The etymological root 'meta', from the Greek, means
'change' and 'transcendance', and thus we can understand the dynamics of metabrowsing as a view of the
web from a higher level. What is this higher level? To speak metaphorically, think of the limitations of street
signs for navigation: metabrowsing will become the GPS of the internet."[4]
Scientific use[edit]
There are several scientific papers that use the term to describe the browsing of "graphical representations"
of documents. In this context "metabrowsing" refers to a high-level way of browsing through information:
instead of browsing through document contents or document surrogates, the user browses through a
graphical representation of the documents and their relations to the domain.[5][6]
Quickbrowse was one of the first Web-based metabrowsing applications, enabling users to combine
multiple pages into one vertical, continuously scrollable page for faster viewing. Onepage.com and
Octopus.com offered more sophisticated systems for combining not just entire Web pages but bits and
pieces of different pages into a new "combo page". Octopus received more than $11.4 million in venture
capital funding.[7] Onepage received $25 million in venture capital funding.[8] Sybase acquired Onepage in 2002, changing the service from an end-user-oriented business model to an enterprise-driven concept. In the end, Onepage was terminated. Calltheshots.com was acquired by Akamai and then also disappeared,
as did Katiesoft and iHarvest.com.
Technology[edit]
Web-based metabrowsing services such as Quickbrowse, Octopus and Onepage differed in their
technological approach. Quickbrowse only allows the combination of complete Web pages. The service
retrieves the HTML of designated pages and then combines it into a new "combo page" server-side. This
"raw" approach does not work with all types of Web pages, especially cascading style sheets
whose HTML does not combine well. Quickbrowse also disables JavaScript components so as to avoid
problems that would arise from the combination of disparate and unrelated sources of JavaScript code.
Unwanted layout distortions may result when combining pages. Services like Octopus and Onepage, both
out of business, used a more sophisticated Java-driven approach that enabled users' browsers to retrieve
and combine bits and pieces from disparate Web sites client-side.
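A minimal sketch of the server-side "combo page" approach (not Quickbrowse's actual code) is shown below: fetch each page, strip script blocks so unrelated JavaScript cannot interfere, and concatenate the bodies.

```python
# Minimal sketch of a server-side "combo page": fetch each page, drop <script>
# blocks so unrelated JavaScript cannot collide, and concatenate the results.
# Illustrative only; as noted above, pages that depend heavily on CSS will not
# combine cleanly.
import re
import urllib.request

SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.DOTALL | re.IGNORECASE)

def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def combo_page(urls) -> str:
    parts = []
    for url in urls:
        html = SCRIPT_RE.sub("", fetch(url))   # disable JavaScript components
        parts.append(f"<div class='combo-item'><!-- {url} -->{html}</div>")
    return "<html><body>" + "<hr>".join(parts) + "</body></html>"
```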
References[edit]
1. Sinclair, Peter. "Here's the hot word to all cyberg(r)eeks". The New Zealand Herald, April 4,
2. Wu, Joy (October 26, 2000). "Quickbrowse speeds surfing the web by connecting pages". The
5. "Information retrieval by metabrowsing". Journal of the American Society for Information
Multisearch
From Wikipedia, the free encyclopedia
Multisearch is a multitasking search engine which combines search engine and metasearch engine characteristics with the additional capability of retrieving search result sets that were previously classified by users. It enables the user to gather results from its own search index as well as from one or more search engines, metasearch engines, databases or any other kind of information retrieval (IR) program. Multisearch is an emerging feature of automated search and information retrieval systems which combines the capabilities of computer search programs with results classification made by a human.
Multisearch is a way to take advantage of the power of multiple search engines with a flexibility not seen in
traditional metasearch engines. To the end user, a multisearch may appear to be just a customizable
search engine; however, its behind-the-scenes technology enables it to put a face to the search process and also to retrieve and display a result set that was classified by a human during a multisearch session and automatically included in the document index. There are additional features available in many search engines and metasearch engines, but the basic idea is the same: reducing the amount of time required to search for resources by improving the accuracy and relevance of individual searches as well as the ability to manage the results.
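As a rough illustration of this idea, a multisearch might merge automated engine results with a human-classified result set as sketched below; the engine callables and the classified index are hypothetical stand-ins, not a description of any particular product.

```python
# Illustrative multisearch merge: combine results from several automated engines
# with a human-classified result set kept in a local index.
def multisearch(query, engines, classified_index):
    """engines: dict of name -> callable(query) returning a list of URLs.
    classified_index: dict of query -> list of (URL, human-assigned label)."""
    results = []
    seen = set()
    # Previously classified results for this query are surfaced first.
    for url, label in classified_index.get(query, []):
        results.append({"url": url, "source": "classified", "label": label})
        seen.add(url)
    # Then de-duplicated results gathered from the automated engines.
    for name, search in engines.items():
        for url in search(query):
            if url not in seen:
                results.append({"url": url, "source": name, "label": None})
                seen.add(url)
    return results
```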
See also[edit]
Search aggregator
Metabrowsing
Search engine
Internet search engines
From Wikipedia, the free encyclopedia
General search engines that search for information on the Internet.
For more specific search engines, see other subcategories of Category:Searching.
Metasearch engine
Types
Collaborative search engine
Local search
Vertical search
Selection-based search
Social search
Document retrieval
Text mining
Web crawler
Multisearch
Federated search
Tools
Search aggregator
Index/Web indexing
Focused crawler
Spider trap
Web archiving
Voice search
Image search
Semantic search
Protocols
Z39.50
OpenSearch
Search engine
Online search