Web search engine
For a tutorial on using search engines for researching Wikipedia articles, see Wikipedia:Search engine test.
A web search engine is a software system that is designed to search for information on the World Wide Web. The search results are generally presented in a line of results often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
History
Notable web search engines and their current status:
Aliweb (inactive)
JumpStation (inactive)
Lycos (active)
Infoseek (inactive)
Daum (active)
Magellan (inactive)
Excite (active)
SAPO (active)
Yandex (active)
Google (active)
MSN Search (active as Bing)
Naver (active)
Teoma (active)
Vivisimo (inactive)
Exalead (active)
Gigablast (active)
Scroogle (inactive)
A9.com (inactive)
Sogou (active)
Ask.com (active)
GoodSearch (active)
SearchMe (inactive)
wikiseek (launched 2006; inactive)
Quaero (active)
ChaCha (active)
Sproose (inactive)
Picollator (inactive)
Viewzi (inactive)
Boogami (inactive)
LeapFish (inactive)
DuckDuckGo (active)
NATE (active)
Cuil (inactive)
During early development of the web, there was a list of webservers edited by Tim Berners-Lee and hosted on the CERN webserver. One historical snapshot of the list in 1992 remains,[1] but as more and more
webservers went online the central list could no longer keep up. On the NCSA site, new servers were
announced under the title "What's New!"[2]
The very first tool used for searching on the Internet was Archie.[3] The name stands for "archive" without
the "v". It was created in 1990 by Alan Emtage, Bill Heelan and J. Peter Deutsch, computer science
students at McGill University in Montreal. The program downloaded the directory listings of all the files
located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file
names; however, Archie did not index the contents of these sites since the amount of data was so limited it
could be readily searched manually.
The rise of Gopher (created in 1991 by Mark McCahill at the University of Minnesota) led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher
index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a
keyword search of most Gopher menu titles in the entire Gopher listings. Jughead
(Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information
from specific Gopher servers. While the name of the search engine "Archie" was not a reference to
the Archie comic book series, "Veronica" and "Jughead" are characters in the series, thus referencing their
predecessor.
In the summer of 1993, no search engine existed for the web, though numerous specialized catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically mirrored these pages and rewrote them into a standard format. This formed the basis for W3Catalog, the web's first primitive search engine, released on September 2, 1993.[4]
In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-
based World Wide Web Wanderer, and used it to generate an index called 'Wandex'. The purpose of the
Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The web's second
search engine Aliweb appeared in November 1993. Aliweb did not use a web robot, but instead depended
on being notified by website administrators of the existence at each site of an index file in a particular
format.
JumpStation (created in December 1993[5] by Jonathon Fletcher) used a web robot to find web pages and to
build its index, and used a web form as the interface to its query program. It was thus the first WWW
resource-discovery tool to combine the three essential features of a web search engine (crawling, indexing,
and searching) as described below. Because of the limited resources available on the platform it ran on, its
indexing and hence searching were limited to the titles and headings found in the web pages the crawler
encountered.
One of the first "all text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its
predecessors, it allowed users to search for any word in any webpage, which has become the standard for
all major search engines since. It was also the first one widely known by the public. Also in
1994, Lycos (which started at Carnegie Mellon University) was launched and became a major commercial
endeavor.
Soon after, many search engines appeared and vied for popularity. These
included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista. Yahoo! was among the most
popular ways for people to find web pages of interest, but its search function operated on its web directory,
rather than its full-text copies of web pages. Information seekers could also browse the directory instead of
doing a keyword-based search.
Google adopted the idea of selling search terms in 1998, from a small search engine company named goto.com. This move had a significant effect on the search engine business, which went from struggling to one of the most profitable businesses on the Internet.[6]
In 1996, Netscape was looking to give a single search engine an exclusive deal as the featured search
engine on Netscape's web browser. There was so much interest that instead Netscape struck deals with
five of the major search engines: for $5 million a year, each search engine would be in rotation on the
Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos, Infoseek, and Excite. [7][8]
Search engines were also known as some of the brightest stars in the Internet investing frenzy that
occurred in the late 1990s.[9]Several companies entered the market spectacularly, receiving record gains
during their initial public offerings. Some have taken down their public search engine, and are marketing
enterprise-only editions, such as Northern Light. Many search engine companies were caught up in the dot-
com bubble, a speculation-driven market boom that peaked in 1999 and ended in 2001.
Around 2000, Google's search engine rose to prominence.[10] The company achieved better results for many searches with an innovation called PageRank. This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link to them, on the premise that good or desirable pages are linked to more than others. Google also maintained a minimalist interface to its search engine. In contrast, many of its competitors embedded a search engine in a web portal. In fact, the Google search engine became so popular that spoof engines emerged such as Mystery Seeker.
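The PageRank idea can be illustrated with a short power-iteration sketch. This is a simplified illustration, not Google's production algorithm; the link graph and damping factor below are invented example values.

```python
# Simplified PageRank via power iteration (illustrative sketch only).
# `links` maps each page to the pages it links to; values are example data.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}           # start from a uniform distribution
    for _ in range(iterations):
        new_ranks = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:                       # dangling page: spread its rank evenly
                share = damping * ranks[page] / n
                for p in pages:
                    new_ranks[p] += share
            else:
                share = damping * ranks[page] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks

print(pagerank(links))  # pages with more (and better-ranked) in-links score higher
```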
By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! acquired Inktomi
in 2002, and Overture(which owned AlltheWeb and AltaVista) in 2003. Yahoo! switched to Google's search
engine until 2004, when it launched its own search engine based on the combined technologies of its
acquisitions.
Microsoft first launched MSN Search in the fall of 1998 using search results from Inktomi. In early 1999 the site began to display listings from Looksmart, blended with results from Inktomi. For a short time in 1999, MSN Search used results from AltaVista instead. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler (called msnbot).
Microsoft's rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and
Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing technology.
In 2012, following the April 24 release of Google Drive, Google released the Beta version of Open
Drive (available as a Chrome app) to enable the search of files in the cloud. Open Drive has now been
rebranded as Cloud Kite. Cloud Kite is advertised as a "collective encyclopedia project based on Google
Drive public files and on the crowd sharing, crowd sourcing and crowd-solving principles". Cloud Kite will
also return search results from other cloud storage content services including Dropbox, SkyDrive, Evernote
and Box.[11]
How web search engines work
A search engine operates in the following order:
1. Web crawling
2. Indexing
3. Searching[12]
Web search engines work by storing information about many web pages, which they retrieve from the HTML markup of the pages. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated crawler that follows every link on the site. The site owner can exclude specific pages by using robots.txt.
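As a concrete illustration of this exclusion mechanism, the sketch below uses Python's standard urllib.robotparser to check whether a given URL may be fetched; the domain and crawler name are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name and site, used purely for illustration.
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()  # download and parse the site's robots.txt

url = "https://example.com/private/report.html"
if robots.can_fetch("ExampleBot", url):
    print("Allowed to crawl:", url)
else:
    print("Excluded by robots.txt:", url)
```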
The search engine then analyzes the contents of each page to determine how it should be indexed (for
example, words can be extracted from the titles, page content, headings, or special fields called meta tags).
Data about web pages are stored in an index database for use in later queries. A query from a user can be
a single word. The index helps find information relating to the query as quickly as possible. [12] Some search
engines, such as Google, store all or part of the source page (referred to as a cache) as well as information
about the web pages, whereas others, such as AltaVista, store every word of every page they find.[citation needed]
This cached page always holds the actual search text since it is the one that was actually indexed, so
it can be very useful when the content of the current page has been updated and the search terms are no
longer in it.[12] This problem might be considered a mild form of linkrot, and Google's handling of it
increases usability by satisfying user expectations that the search terms will be on the returned webpage.
This satisfies the principle of least astonishment, since the user normally expects that the search terms will
be on the returned pages. Increased search relevance makes these cached pages very useful as they may
contain data that may no longer be available elsewhere.[citation needed]
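The index described above is typically organized as an inverted index, mapping each word to the documents that contain it. A minimal, self-contained sketch over a toy document collection (hypothetical pages, simplified tokenization):

```python
from collections import defaultdict

# Toy document collection standing in for crawled pages.
documents = {
    "page1": "web search engines index the web",
    "page2": "an index maps words to pages",
    "page3": "crawlers retrieve pages for the index",
}

# Build an inverted index: word -> set of document ids containing it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        inverted_index[word].add(doc_id)

# A single-word query is answered by one dictionary lookup.
print(sorted(inverted_index["index"]))  # ['page1', 'page2', 'page3']
```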
When a user enters a query into a search engine (typically by using keywords), the engine examines
its index and provides a listing of best-matching web pages according to its criteria, usually with a short
summary containing the document's title and sometimes parts of the text. The index is built from the
information stored with the data and the method by which the information is indexed. [12] From 2007 the
Google.com search engine has allowed one to search by date by clicking 'Show search tools' in the leftmost
column of the initial search results page, and then selecting the desired date range. [citation needed] Most search
engines support the use of the boolean operators AND, OR and NOT to further specify the search query.
Boolean operators are for literal searches that allow the user to refine and extend the terms of the search.
The engine looks for the words or phrases exactly as entered. Some search engines provide an advanced
feature called proximity search, which allows users to define the distance between keywords.[12] There is also concept-based searching, where the research involves using statistical analysis on pages containing the words or phrases searched for. Natural language queries, in turn, allow the user to type a question in the same form one would ask it of a human; Ask.com is an example of such a site.[citation needed]
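The boolean operators mentioned above can be evaluated directly against an inverted index as set operations. A minimal sketch over a toy index (made-up page identifiers):

```python
# Evaluate simple boolean queries as set operations over an inverted index.
inverted_index = {
    "web":    {"page1", "page3"},
    "search": {"page1", "page2"},
    "index":  {"page1", "page2", "page3"},
}

def search(term):
    return inverted_index.get(term, set())

# web AND search -> intersection
print(search("web") & search("search"))      # {'page1'}
# web OR search  -> union
print(search("web") | search("search"))      # {'page1', 'page2', 'page3'}
# index NOT web  -> set difference
print(search("index") - search("web"))       # {'page2'}
```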
The usefulness of a search engine depends on the relevance of the result set it gives back. While there
may be millions of web pages that include a particular word or phrase, some pages may be more relevant,
popular, or authoritative than others. Most search engines employ methods to rank the results to provide
the "best" results first. How a search engine decides which pages are the best matches, and what order the
results should be shown in, varies widely from one engine to another.[12] The methods also change over
time as Internet usage changes and new techniques evolve. There are two main types of search engine that have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an "inverted index" by analyzing texts it locates. The latter form relies much more heavily on the computer itself to do the bulk of the work.
Most Web search engines are commercial ventures supported by advertising revenue and thus some of
them allow advertisers to have their listings ranked higher in search results for a fee. Search engines that
do not accept money for their search results make money by running search related ads alongside the
regular search engine results. The search engines make money every time someone clicks on one of these
ads.[13]
Market share
(Table: search engine market share, May 2011 and December 2010.[14])
Google's worldwide market share peaked at 86.3% in April 2010.[15] Yahoo!, Bing and other search engines
are more popular in the US than in Europe.
According to Hitwise, market share in the USA for October 2011 was Google 65.38%, Bing-powered (Bing and Yahoo!) 28.62%, and the remaining 66 search engines 6%. However, an Experian Hitwise report released in August 2011 gave the "success rate" of searches sampled in July. Over 80 percent of Yahoo! and Bing searches resulted in the users visiting a web site, while Google's rate was just under 68 percent.[16][17]
In the People's Republic of China, Baidu held a 61.6% market share for web search in July 2009.[18] In the Russian Federation, Yandex holds around 60% of the market share as of April 2012.[19] As of July 2013, Google held an 84% global and an 88% US market share for web search.[20] In South Korea, Naver (Hangul: 네이버) is a popular search portal, which has held a market share of over 70% since at least 2011,[21] continuing into 2013.[22]
Search engine bias
Although search engines are programmed to rank websites based on some combination of their popularity and relevancy, empirical studies indicate various political, economic, and social biases in the information they provide.[23][24] These biases can be a direct result of economic and commercial processes (e.g., companies that advertise with a search engine can also become more popular in its organic search results),[25] and of political processes (e.g., the removal of search results to comply with local laws).
Biases can also be a result of social processes, as search engine algorithms are frequently designed to exclude non-normative viewpoints in favor of more "popular" results.[26] Indexing algorithms of major search engines skew towards coverage of U.S.-based sites, rather than websites from non-U.S. countries.[24]
Google Bombing is one example of an attempt to manipulate search results for political, social or
commercial reasons.
Since this problem has been identified, competing search engines have emerged that seek to avoid this
problem by not tracking[31] or "bubbling"[32] users.
See also
Quora
True Knowledge
Wolfram Alpha
Enterprise search
Google effect
Metasearch engine
OpenSearch
Search directory
Selection-based search
Semantic Web
Social search
Spell checker
Web indexing
References
3. "Internet History - Search Engines" (from Search Engine Watch), Universiteit Leiden.
4. Oscar Nierstrasz (2 September 1993). "Searchable Catalog of WWW Resources (experimental)".
5. "Archive of NCSA what's new in December 1993 page". Web.archive.org. 2001-06-20.
6. http://www.udacity.com/view#Course/cs101/CourseRev/apr2012/Unit/616074/Nugget/671097
8. Browser Deals Push Netscape Stock Up 7.8%. Los Angeles Times. 1 April 1996.
9. Gandal, Neil (2001). "The dynamics of competition in the internet search engine ..." doi: ...7187(01)00065-0.
12. Jawadekar, Waman S (2011), "8. Knowledge Management: Tools and Technology", Knowledge Management: Text & Cases, New Delhi: Tata McGraw-Hill Education Private.
15. "Net Market share - Google". Marketshare.hitslink.com. Retrieved 2012-05-14.
16. "Google Remains Ahead of Bing, But Relevance Drops". August 12, 2011.
17. Experian Hitwise reports Bing-powered share of searches at 29 percent in October.
18. "Search Engine Market Share July 2009 | Rise to the Top Blog". Risetothetop.techwyse.com.
19. Pavliva, Halia (2012-04-02). "Yandex Internet Search Share Gains, Google Steady: ..." 2013.
22. "Naver's new format hits newspapers". Koreatimes.co.kr. 2012-05-24. Retrieved 2013-04-11.
23. Segev, El (2010). Google and the Digital Divide: The Biases of Online Knowledge. Oxford: Chandos Publishing.
24. Vaughan, Liwen; Mike Thelwall (2004). "Search engine coverage bias: evidence and ..." doi: ...4573(03)00063-3.
25. Berkman Center for Internet & Society (2002), "Replacement of Google with Alternative Search Systems in China: Documentation and Screen Shots", Harvard Law School.
26. Introna, Lucas; Helen Nissenbaum (2000). "Shaping the Web: Why the Politics of Search ..." Journal 16 (3). doi:10.1080/01972240050133634.
27. Parramore, Lynn (10 October 2010). "The Filter Bubble". The Atlantic. Retrieved 2011-04-20. "Since Dec. 4, 2009, Google has been personalized for everyone. So when I had two friends this spring Google "BP," one of them got a set of links that was about investment opportunities in BP. The other ..."
28. Weisberg, Jacob (10 June 2011). "Bubble Trouble: Is Web personalization turning us into ..."
29. Gross, Doug (May 19, 2011). "What the Internet is hiding from you". CNN. Retrieved 2011-08-15. "I had friends Google BP when the oil spill was happening. These are two women who were quite similar in a lot of ways. One got a lot of results about the environmental consequences of what was happening and the spill. The other one just got investment information and nothing about the spill at all."
30. Zhang, Yuan Cao; Séaghdha, Diarmuid Ó; Quercia, Daniele; Jambor, Tamas (February ...)
Further reading
For a more detailed history of early search engines, see Search Engine Birthdays (from Search Engine
Watch), Chris Sherman, September 2003.
Steve Lawrence; C. Lee Giles (1999). "Accessibility of information on the web". Nature 400 (6740):
107–9. doi:10.1038/21987. PMID 10428673.
Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer, ISBN 3-
540-37881-2
Bar-Ilan, J. (2004). The use of Web search engines in information science research. ARIST, 38, 231-
288.
Levene, Mark (2005). An Introduction to Search Engines and Web Navigation. Pearson.
Javed Mostafa (February 2005). "Seeking Better Web Searches". Scientific American Magazine.[dead link]
Ross, Nancy; Wolfram, Dietmar (2000). "End user searching on the Internet: An analysis of term pair
topics submitted to the Excite search engine". Journal of the American Society for Information
Science 51 (10): 949–958. doi:10.1002/1097-4571(2000)51:10<949::AID-ASI70>3.0.CO;2-5.
Xie, M. et al. (1998). "Quality dimensions of Internet search engines". Journal of Information
Science 24 (5): 365–372. doi:10.1177/016555159802400509.
Information Retrieval: Implementing and Evaluating Search Engines. MIT Press. 2010.
List of search engines
This is a list of articles about search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites that have a search facility for online databases.
By content/topic
General
Bing (multilingual)
Blekko (English)
DuckDuckGo (English)
Exalead (multilingual)
Gigablast (English)
Google (multilingual)
Sogou (Chinese)
Soso.com (Chinese)
Volunia (multilingual)
Yahoo! (multilingual)
Yandex (multilingual)
Youdao (Chinese)
P2P search engines
FAROO (English)
Metasearch engines
See also: Metasearch engine
Blingo (English)
DeeperWeb (English)
Dogpile (English)
Excite (English)
Harvester42
HotBot (English)
Info.com (English)
Mamma
Metacrawler (English)
Mobissimo (multilingual)
Otalo (English)
WebCrawler (English)
Geographically limited scope
Daum (Korea)
Guruji.com (India)
Leit.is (Iceland)
Miner.hu (Hungary)
Najdi.si (Slovenia)
Rambler (Russia)
Rediff (India)
Search.ch (Switzerland)
Walla! (Israel)
Yehey! (Philippines)
Semantic
See also: Semantic search
True Knowledge (now Evi): specialises in knowledge base and semantic search answer engine[1]
Browse.lt: fast and comfortable web search engine (web, video, images, news)
Accountancy
IFACnet
Business
Business.com
GlobalSpec
Justdial
Enterprise
See also: Enterprise search
Fashion Net
Food/Recipes
Job
Adzuna (UK)
Bixee.com (India)
CareerBuilder.com (USA)
Dice.com (USA)
Eluta.ca (Canada)
Hotjobs.com (USA)
Incruit (Korea)
Indeed.com (USA)
Glassdoor.com (USA)
LinkUp.com (USA)
Naukri.com (India)
Google Scholar
Manupatra
Quicklaw
WestLaw
Medical
Bing Health
Bioinformatic Harvester
GenieKnows
Healia
Healthline
PubGene
Searchmedica
WebMD
News
Bing News
Daylife
Google News
MagPortal
Newslookup
Topix.net
Trapit
Yahoo! News
People
Comfibook
Ex.plode.us
InfoSpace
PeekYou
Spock
Spokeo
Wink
Worldwide Helpers
Zabasearch.com
ZoomInfo
Real estate / property
Fizber.com
HotPads.com
Realtor.com
Redfin
Rightmove
Trulia
Zillow.com
Zoopla
Television
TV Genius
Video Games
Wazap (Japan)
By information type
Forum
Omgili
Blog
Amatomu
Bloglines
BlogScope
IceRocket
Regator
Technorati
Multimedia
See also: Multimedia search
Bing Videos
blinkx
FindSounds
Google Video
Munax's PlayAudioVideo
Picsearch
Pixsta
Podscope
ScienceStage
SeeqPod
Songza
TinEye
TV Genius
Veveo
Yahoo! Video
Source code
Koders
Krugle
BitTorrent
These search engines work across the BitTorrent protocol.
BTDigg
FlixFlux
Isohunt
Mininova
TorrentSpy
Torrentz
Cloud
Search engines listed below find various types of files that have been stored in the cloud and made publicly
available.
Open Drive
Email
Lookeen
TEK
Maps
Bing Maps
Géoportail
Google Maps
MapQuest
Nokia Maps
OpenStreetMap
WikiMapia
Yahoo! Maps
Price
Bing Shopping
Kelkoo
MySimon
PriceGrabber
PriceRunner
PriceSCAN
Pronto.com
Shopping.com
ShopWiki
SwoopThat.com
TheFind.com
Question and answer
Human answers
Answers.com
DeeperWeb
eHow
Quora
Uclue
wikiHow
Yahoo! Answers
Automatic answers
See also: Question answering
AskMeNow
BrainBoost
True Knowledge
Wolfram Alpha
Natural language
See also: Natural language search engine and Semantic search
Ask.com
hakia
Lexxe
By model
Privacy search engines
DuckDuckGo
Ixquick (StartPage)
Open source search engines
DataparkSearch
Gigablast
Grub
ht://Dig
Isearch
Lucene
mnoGoSearch
Namazu
Nutch
Recoll
Searchdaimon
Seeks
Sphinx
SWISH-E
Terrier Search Engine
Xapian
YaCy
Zettair
Semantic browsing engines
Hakia
Yebol
Social search engines
See also: Social search, Relevance feedback, and Human search engine
ChaCha Search
Delver
Eurekster
Mahalo.com
Rollyo
SearchTeam
Sproose
Trexy
Wink provides web search by analyzing user contributions such as bookmarks and feedback
Visual search engines
See also: Visual search engine
ChunkIt!
Grokker
Pixsta
PubGene
TinEye
Viewzi
Macroglossa
Search appliances
See also: Search appliance
Fabasoft
Searchdaimon
Thunderstone
Desktop search engines
See also: Desktop search
Autonomy IDOL Enterprise Desktop Search (Windows): proprietary, commercial.
Beagle (Linux): open source desktop search tool for Linux based on Lucene; unmaintained since 2009. A mix of the X11/MIT License and the Apache License.
Copernic Desktop Search (Windows): free for home use.
DocFetcher (cross-platform): open source desktop search tool for Windows and Linux, based on Apache Lucene. Eclipse Public License.
dtSearch Desktop (Windows): proprietary (30 day trial).
Everything (Windows): find files and folders by name instantly on NTFS volumes. Freeware.
GNOME Storage (Linux): open source desktop search tool for Unix/Linux. GPL.
Locate32 (Windows): graphical port of Unix's locate & updatedb. BSD License.[2]
Lookeen (Windows): Outlook search tool with integrated desktop search. Proprietary (14 day trial).
Meta Tracker (Linux, Unix): open source desktop search tool for Unix/Linux. GPL v2.[3]
Recoll (Linux, Unix): open source desktop search tool for Unix/Linux. GPL.[4]
Spotlight (Mac OS): found in Apple Mac OS X "Tiger" and later OS X releases. Proprietary.
Terrier Search Engine (Linux, Mac OS, Unix): desktop search for Windows, Mac OS X (Tiger), Unix/Linux. MPL.
Tropes Zoom (Windows): semantic search engine. Freeware and commercial.
Based on
Google
AOL Search
CompuServe Search
Groovle
MySpace Search
Mystery Seeker
Netscape
Ripple
Yahoo!
Ecocho
Forestle (an ecologically motivated site supporting sustainable rain forests - formerly based on Google)
GoodSearch
Rectifi
Bing
A9.com
Alexa Internet
Ciao!
Ms. Dewey
Yahoo! Search
Egerin
Ask.com
iWon
Lycos
Teoma
Defunct or acquired search engines
AlltheWeb
Btjunkie
Cuil
Google Answers
IBM STAIRS
Infoseek
Inktomi
Kartoo
Lotus Magellan
PubSub
RetrievalWare (acquired by Fast Search & Transfer and now owned by Microsoft)
Singingfish
Speechbot
Sphere
Tafiti
Yebol
WiseNut
Search aggregator
Search engine optimization
Metasearch engine
A metasearch engine is a search tool[1] that sends user requests to several other search engines and/or
databases and aggregates the results into a single list or displays them according to their source.
Metasearch engines enable users to enter search criteria once and access several search engines
simultaneously. Metasearch engines operate on the premise that the Web is too large for any one search
engine to index it all and that more comprehensive search results can be obtained by combining the results
from several search engines. This also may save the user from having to use multiple search engines
separately.
The term "metasearch" is frequently used to classify a set of commercial search engines, see the list of
Metasearch engine, but is also used to describe the paradigm of searching multiple data sources in real
time. The National Information Standards Organization (NISO) uses the terms Federated Search and
Metasearch interchangeably to describe this web search paradigm.
Operation
No two metasearch engines are alike.[2] Some search only the most popular search engines while others
also search lesser-known engines, newsgroups, and other databases. They also differ in how the results
are presented and the quantity of engines that are used. Some will list results according to search engine or
database. Others return results according to relevance, often concealing which search engine returned
which results. This benefits the user by eliminating duplicate hits and grouping the most relevant ones at the
top of the list.
Search engines frequently have different ways they expect requests submitted. For example, some search
engines allow the usage of the word "AND" while others require "+" and others require only a space to
combine words. The better metasearch engines try to synthesize requests appropriately when submitting
them[citation needed].
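A sketch of the kind of query translation a metasearch engine might perform when forwarding a request to engines with different syntaxes; the engine names and syntax rules below are illustrative assumptions, not documented APIs.

```python
# Translate a generic list of query terms into the syntax each
# underlying engine expects. The syntax rules here are illustrative only.
def translate_query(terms, syntax):
    if syntax == "AND":           # engines that accept the literal word AND
        return " AND ".join(terms)
    if syntax == "plus":          # engines that require a leading +
        return " ".join("+" + t for t in terms)
    if syntax == "space":         # engines where a space already means AND
        return " ".join(terms)
    raise ValueError("unknown syntax: " + syntax)

engines = {"engine_a": "AND", "engine_b": "plus", "engine_c": "space"}
terms = ["web", "search"]
for name, syntax in engines.items():
    print(name, "->", translate_query(terms, syntax))
```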
See also
Search aggregator
Federated search
Metabrowsing
Multisearch
Travel website
References
1. Sandy Berger's Great Age Guide to the Internet, by Sandy Berger. Que Publishing.
2. Manoj, M; Elizabeth, Jacob (October 2008). "Information retrieval on Internet using meta-search engines: A review". CSIR. pp. 739–746. Retrieved February 25, 2012.
External links
Guide to Meta-Search Engines by UC Berkeley libraries with recommendation not to use them for
serious research.
Meta-search: More heads better than one? Argument against Berkeley's negative recommendation
Federated search
Federated search is an information retrieval technology that allows the simultaneous search of multiple
searchable resources. A user makes a single query request which is distributed to the search
engines participating in the federation. The federated search then aggregates the results that are received
from the search engines for presentation to the user.
Purpose
Federated search came about to meet the need of searching multiple disparate content sources with one
query. This allows a user to search multiple databases at once in real time, arrange the results from the
various databases into a useful form and then present the results to the user.
Process
As described by Peter Jacso (2004[1]), federated searching consists of (1) transforming a query and
broadcasting it to a group of disparate databases or other web resources, with the appropriate syntax, (2)
merging the results collected from the databases, (3) presenting them in a succinct and unified format with
minimal duplication, and (4) providing a means, performed either automatically or by the portal user, to sort
the merged result set.
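The four steps above can be sketched as a small fan-out-and-merge routine; the source list, query function and scoring below are hypothetical placeholders rather than any particular product's API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real connectors to bibliographic databases,
# OPACs or web search engines participating in the federation.
def query_source(source, query):
    # In a real system this would translate the query into the source's
    # syntax, send it over the network and parse the response.
    return source["fake_results"]

def federated_search(sources, query):
    # (1) broadcast the query to every source in parallel
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda s: query_source(s, query), sources)
    # (2) merge the collected results, (3) removing duplicates by URL
    seen, merged = set(), []
    for results in result_lists:
        for item in results:
            if item["url"] not in seen:
                seen.add(item["url"])
                merged.append(item)
    # (4) sort the merged result set, here simply by a relevance score
    return sorted(merged, key=lambda item: item["score"], reverse=True)

sources = [
    {"name": "catalogue_a", "fake_results": [{"url": "http://a/1", "score": 0.9}]},
    {"name": "catalogue_b", "fake_results": [{"url": "http://a/1", "score": 0.7},
                                             {"url": "http://b/2", "score": 0.8}]},
]
print(federated_search(sources, "deep web"))
```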
Federated search portals, either commercial or open access, generally search public access bibliographic
databases, public access Web-based library catalogues (OPACs), Web-based search engines
like Google and/or open-access, government-operated or corporate data collections. These individual
information sources send back to the portal's interface a list of results from the search query. The user can
review this hit list. Some portals will merely screen scrape the actual database results and not directly allow
a user to enter the information source's application. More sophisticated ones will de-dupe the results list by
merging and removing duplicates. There are additional features available in many portals, but the basic
idea is the same: to improve the accuracy and relevance of individual searches as well as reduce the
amount of time required to search for resources.
This process allows federated search some key advantages when compared with existing crawler-based
search engines. Federated search need not place any requirements or burdens on owners of the individual
information sources, other than handling increased traffic. Federated searches are inherently as current as
the individual information sources, as they are searched in real time.
Implementation
One application of federated searching is the metasearch engine; however, this is not a complete solution
as many documents are not currently indexed. These documents are on what is known as the deep Web, or
invisible Web. Many more information sources are not yet stored in electronic form. Google Scholar is one
example of many projects trying to address this.
When the search vocabulary or data model of the search system is different from the data model of one or
more of the foreign target systems the query must be translated into each of the foreign target systems.
This can be done using simple data-element translation or may require semantic translation.
A challenge faced in the implementation of federated search engines is scalability, in other words, the
performance of the site as the number of information sources comprising the federated search engine
increase. One federated search engine that has begun to address this issue is WorldWideScience, hosted
by the U.S. Department of Energy's Office of Scientific and Technical Information. WorldWideScience [2] is
composed of more than 40 information sources, several of which are federated search portals themselves.
One such portal is Science.gov [3] which itself federates more than 30 information sources representing
most of the R&D output of the U.S. Federal government. Science.gov returns its highest ranked results to
WorldWideScience, which then merges and ranks these results with the search returned by the other
information sources that comprise WorldWideScience.[3] This approach of cascaded federated search enables a large number of information sources to be searched via a single query.
Another application Sesam running in both Norway and Sweden has been built on top of an open sourced
platform specialised for federated search solutions. Sesat,[4] an acronym for Sesam Search Application
Toolkit, is a platform that provides much of the framework and functionality required for handling parallel
and pipelined searches and displaying them elegantly in a user interface, allowing engineers to focus on the
index/database configuration tuning.
Challenges
When federated search is performed against secure data sources, the users' credentials must be passed on
to each underlying search engine, so that appropriate security is maintained. If the user has different login
credentials for different systems, there must be a means to map their login ID to each search engine's
security domain.[5]
Another challenge is mapping results list navigators into a common form. Suppose 3 real-estate sites are
searched, each provides a list of hyperlinked city names to click on, to see matches only in each city. Ideally
these facets would be combined into one set, but that presents additional technical challenges. [6] The
system also needs to understand "next page" links if it's going to allow the user to page through the
combined results.
Related links
Federated Search 101. Linoski, Alexis, Walczyk, Tine, Library Journal, Summer 2008 Net Connect,
Vol. 133[dead link] Note: this content has been moved here, but you will need a remote access account
through your local library to get the whole article.
Cox, Christopher N. Federated Search: Solution or Setback for Online Library Services. Binghamton,
NY: Haworth Information Press, 2007.Table of Contents
Federated Search Primer. Lederman, S., AltSearchEngines, January 2009[dead link] Note: This material
has been reposted here, on the blog of a commercial search engine company.
Milad Shokouhi and Luo Si, Federated Search, Foundations and Trends® in Information Retrieval: Vol.
5: No 1, pp 1-102., http://dx.doi.org/10.1561/1500000010
See also
Federated content
Metasearch engine
Funnelback
Search aggregator
Deep Web
References
1. Thoughts About Federated Searching. Jacsó, Péter, Information Today, Oct 2004, Vol. 21, Issue 9.
6. 20+ Differences Between Internet vs. Enterprise Search - part 1.
Federated content is digital media content that is designed to be self-managing to support reporting and rights management in a peer-to-peer (P2P) network, for example audio stored in a digital rights management (DRM) file format.
Search aggregator
A search aggregator is a type of metasearch engine which gathers results from multiple search engines
simultaneously, typically through RSS search results. It combines user specified search feeds
(parameterized RSS feeds which return search results) to give the user the same level of control over
content as a general aggregator.
Soon after the introduction of RSS, sites began publicising their search results in parameterized RSS feeds.
Search aggregators are an increasingly popular way to take advantage of the power of multiple search
engines with a flexibility not seen in traditional metasearch engines. To the end user, a search aggregator
may appear to be just a customizable search engine and the use of RSS may be completely hidden.
However, the presence of RSS is directly responsible for the existence of search aggregators and a critical
component in the behind-the-scenes technology.
History
The concept of search aggregation is a relatively recent phenomenon with the first ones becoming available
in 2006. In 2005 Amazon published the OpenSearch specification for making search results available in a
generic XML format. While many sites currently publish results in OpenSearch, many simply publish in
generic RSS format. However, while OpenSearch syndication allows for greater flexibility in the way Search
Aggregators display results, it is generally not required.
Functional overview
A search aggregator typically allows users to select specific search engines ad hoc to perform a specified
query. At the time the user enters the query into the Search Aggregator, it generates the required URL "on
the fly" by inserting the search query into the parameterized URL for the search feed. A parameterized URL
looks something like this:
http://news.google.com/news?hl=en&ned=us&q={SEARCH_TERMS}&ie=UTF-8&output=rss
In this case, the {SEARCH_TERMS} parameter would be replaced with the user requested search terms,
and the query would be sent to the host. The Search Aggregator would then parse the results and display
them in a user-friendly way.
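A minimal sketch of that substitution and retrieval step, using only the Python standard library. The feed URL is the parameterized example above; whether any given service still serves RSS at that address is not guaranteed, so the call at the end is left commented out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Parameterized search feed, as in the example URL above.
FEED_TEMPLATE = ("http://news.google.com/news?hl=en&ned=us"
                 "&q={SEARCH_TERMS}&ie=UTF-8&output=rss")

def run_search_feed(terms):
    # Insert the user's query into the parameterized URL "on the fly".
    url = FEED_TEMPLATE.replace("{SEARCH_TERMS}", urllib.parse.quote(terms))
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    # RSS 2.0 places result entries under channel/item.
    for item in tree.findall("./channel/item"):
        yield item.findtext("title"), item.findtext("link")

# for title, link in run_search_feed("web search engines"):
#     print(title, link)
```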
Advantages
This system has several advantages over traditional metasearch engines. Primarily, it allows the user
greater flexibility in deciding which engines should be used to perform the query. They also allow for easy addition of new engines to the user's personal collection (similar to the way a user adds a new news feed to a news aggregator).
Patents
Apple patent 6,847,959,[1] filed January 5, 2000, covers universal search aggregation. This resulted in the removal[2] of this feature from Samsung Android smartphones in July 2012.
See also
Aggregator
Metasearch engine
Federated search
References
1. Jump up^ "Patent US6847959 - Universal interface for retrieval of information in a computer system -
2. Jump up^ Florian Mueller (2012-02-15). "Last week's Apple-Samsung lawsuit involves eight patents, 17
products - bid for Nexus ban is based on only a subset". FOSS Patents. Retrieved 2012-08-16.
News aggregator
In computing, a news aggregator, also termed a feed aggregator, feed reader, news reader, RSS reader or simply aggregator, is client software or a web application which aggregates syndicated web content such as online newspapers, blogs, podcasts, and video blogs (vlogs) in one location for easy viewing.
Function
Visiting many separate websites frequently to find out if content on the site has been updated can take a
long time. Aggregation technology helps to consolidate many websites into one page that can show the new
or updated information from many sites. Aggregators reduce the time and effort needed to regularly check
websites for updates, creating a unique information space or personal newspaper. Once subscribed to a
feed, an aggregator is able to check for new content at user-determined intervals and retrieve the update.
The content is sometimes described as being pulled to the subscriber, as opposed to pushed with email or
IM. Unlike recipients of some pushed information, the aggregator user can easily unsubscribe from a feed.
Aggregation features are frequently built into web portal sites, in the web browsers themselves,
in email applications or in application software designed specifically for reading feeds.
The aggregator provides a consolidated view of the content in one browser display or desktop application.
Aggregators with podcasting capabilities can automatically download media files, such as MP3 recordings.
In some cases, these can be automatically loaded onto portable media players (like iPods) when they are
connected to the end-user's computer.
By 2011, so-called RSS-narrators appeared, which aggregated text-only news feeds, and converted them
into audio recordings for offline listening.
The syndicated content an aggregator will retrieve and interpret is usually supplied in the form of RSS or
other XML-formatted data, such as RDF/XML or Atom.
Types
The variety of software applications and components that are available to collect, format, translate, and
republish XML feeds is a testament to the flexibility of the format and has shown the usefulness
of presentation-independent data.
Examples of this sort of website are Google News, Drudge Report, Huffington Post,[1] Newslookup, Newsvine, World News (WN) Network, and Daily Beast, where aggregation is entirely automatic, using algorithms which carry out contextual analysis and group similar stories together, while other sites supplement automatically-aggregated stories with manually curated headlines and their own articles.[2]
News aggregation websites began with content selected and entered by humans, while automated
selection algorithms were eventually developed to fill the content from a range of either automatically
selected or manually added sources. Google News launched in 2002 using automated story selection, but
humans could add sources to its search engine, while the older Yahoo News, as of 2005, used a
combination of automated news crawlers and human editors.[3][4][5]
Web-based feed readers
Web-based feed readers allow users to find a web feed on the internet and add it to their feed reader.
Online feed readers include Bloglines, Feedly, Feedspot, Flipboard, DiggReader, News360, Google
Reader (discontinued July 1, 2013[6]), My Yahoo!, NewsBlur,[7][8] Netvibes. These are meant for personal use
and are hosted on remote servers. Because the application is available via the web, it can be accessed
anywhere by a user with an internet connection.
More advanced methods of aggregating feeds are provided via Ajax coding techniques
and XML components called web widgets. Ranging from full-fledged applications to small fragments
of source code that can be integrated into larger programs, they allow users to aggregate OPML files, email
services, documents, or feeds into one interface. Many customizable homepage and portal implementations
provide such functionality.
In addition to aggregator services mainly for individual use, there are web applications that can be used to
aggregate several blogs into one. One such variety—called planet sites—are used by online communities to
aggregate community blogs in a centralized location. They are named after the Planet aggregator, a server
application designed for this purpose.
Feed aggregation applications are installed on a PC, smartphone or tablet computer and designed to collect
news and interest feed subscriptions and group them together using a user-friendly interface. The graphical
user interface of such applications often closely resembles that of popular e-mail clients, using a three-
panel composition in which subscriptions are grouped in a frame on the left, and individual entries are
browsed, selected, and read in frames on the right. Some notable examples include Flipboard, Prismatic,
and CNN-owned Zite.[9][10]
Software aggregators can also take the form of news tickers which scroll feeds like ticker tape, alerters that
display updates in windows as they are refreshed, web browser macro tools or as smaller components
(sometimes called plugins or extensions), which can integrate feeds into the operating system or software
applications such as a web browser. Client applications include Mozilla Firefox, Microsoft Office
Outlook, iTunes, FeedDemon and many others.
Media aggregators
Media aggregators are sometimes referred to as podcatchers due to the popularity of the
term podcast used to refer to a web feed containing audio or video. Media aggregators are client software
or web-based applications which maintain subscriptions to feeds that contain audio or video media
enclosures. They can be used to automatically download media, playback the media within the application
interface, or synchronize media content with a portable media player.
Broadcatching
Several BitTorrent client software applications have added the ability to broadcatch torrents of distributed
multimedia through the aggregation of web feeds.
Feed filtering
One of the problems with news aggregators is that the volume of articles can sometimes be overwhelming,
especially when the user has many web feed subscriptions. As a solution, many feed readers allow users
to tag each feed with one or more keywords which can be used to sort and filter the available articles into
easily navigable categories. Another option is to import the user's Attention Profile to filter items based on
their relevance to the user's interests.
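A small sketch of the keyword-based feed filtering described above; the feed names, tags and entries are invented examples.

```python
# Filter aggregated entries by the keyword tags assigned to each feed.
feeds = {
    "example-tech-blog": {"tags": {"technology"},
                          "entries": ["New search engine launched"]},
    "example-cooking":   {"tags": {"food"},
                          "entries": ["Ten quick pasta recipes"]},
}

def entries_with_tag(feeds, wanted_tag):
    # Collect entries only from feeds tagged with the requested keyword.
    for name, feed in feeds.items():
        if wanted_tag in feed["tags"]:
            for entry in feed["entries"]:
                yield name, entry

print(list(entries_with_tag(feeds, "technology")))
```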
See also
Web feed
Metasearch engine
Lifestreaming
References
1. Luscombe, Belinda (2009-03-19). "Arianna Huffington: The Web's New Oracle". Time (Time Inc). Retrieved 2009-03-30. (subscription required). "The Huffington Post was to have three basic functions: blog, news aggregator with an attitude and place for premoderated comments."
2. "Google News and newspaper publishers: allies or enemies?". Editorsweblog.org. World ...
3. Hansell, Saul (24 September 2002). "All the news Google algorithms say is fit to print". The ...
4. Hill, Brad (24 October 2005). Google Search & Rescue For Dummies. John Wiley & Sons.
5. LiCalzi O'Connell, Pamela (29 January 2001). "New Economy; Yahoo Charts the Spread of the News by E-Mail, and What It Finds Out Is Itself Becoming News". New York Times.
6. Chitu, Alex. "No More Google Reader". Google Operating System. Blogger. Retrieved 14 March 2013.
7. "YC-Backed NewsBlur Takes Feed Reading Back To Its Basics". TechCrunch. July 30, 2012.
8. "Need A Google Reader Alternative? Meet Newsblur". Search Engine Land. March 14, 2013.
9. Cheredar, Tom (22 May 2013). "Zite's new iOS app update welcomes (but doesn't cater to) ..."
Funnelback
Type: Private
Industry: Software
Founded: 2006
Revenue: Unknown
Employees: 35
Website: www.funnelback.com
Funnelback is both an enterprise search engine and the name of the company selling the technology. Funnelback is used by many Australian universities and government organisations to search for information on their websites, intranets, file-shares and databases.
History
Funnelback was originally developed by the CSIRO as part of a research project, then called "P@noptic".
The initial design and development was headed up by Dr. David Hawking, a researcher on search
technologies. With its initial launch in 2001 as P@noptic, and with the research weight of CSIRO behind it, it
quickly attracted some high profile clients. With its rapid commercial success, it was spun off from the
CSIRO ICT Centre as its own company in February 2006. In July 2009 Funnelback was purchased
by Squiz.
Research
Despite being spun off from CSIRO, it still retains close research links with them and jointly publishes
papers. Research is currently underway in the areas of search types, subject-specific search, realistic
applications of metasearch, topic distillation, website searchability and search in support of e-commerce.
Technology
The core of the system is based around the Padre engine developed by the CSIRO to perform very fast
data look-ups.
It has the ability to search a range of formats in addition to HTML files. These include PDF, Word documents, Excel spreadsheets, images, and XML. It also has the ability to connect to, and index, databases. Adapters currently exist for MySQL and TRIM Context, and the design of the system allows users to create their own adapters should they not exist.
It works on Windows, Linux and Solaris, and there is also a hosted service available.
Awards
The Panoptic search engine has been widely recognised as a leader in its field. Panoptic has been
awarded:
Funnelback site
Deep Web
The Deep Web (also called the Deepnet,[1] Invisible Web,[2] or Hidden Web[3]) is World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines. It should not be confused
with the dark Internet, the computers that can no longer be reached via the Internet, or with
a Darknet distributed filesharing network, which could be classified as a smaller part of the Deep Web.
There is concern that the deep web can be used for serious criminal activity.[4]
Mike Bergman, founder of BrightPlanet and credited with coining the phrase,[5] said that searching on the
Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be
caught in the net, but there is a wealth of information that is deep and therefore missed. [6] Most of the Web's
information is buried far down on dynamically generated sites, and standard search engines do not find it.
Traditional search engines cannot "see" or retrieve content in the deep Web—those pages do not exist until
they are created dynamically as the result of a specific search. As of 2001, the deep Web was
several orders of magnitude larger than the surface Web.[7]
Size
Naming
Bergman, in a seminal paper on the deep Web published in the Journal of Electronic Publishing, mentioned that Jill Ellsworth used the term invisible Web in 1994 to refer to websites that were not registered with any search engine.[7] Bergman cited a January 1996 article by Frank Garcia:[10]
It would be a site that's possibly reasonably designed, but they didn't bother to register it with any of the
search engines. So, no one can find them! You're hidden. I call that the invisible Web.
Another early use of the term Invisible Web was by Bruce Mount and Matthew B. Koll of Personal Library
Software, in a description of the @1 deep Web tool found in a December 1996 press release. [11]
The first use of the specific term Deep Web, now generally accepted, occurred in the aforementioned 2001
Bergman study.[7]
Deep resources
Deep Web resources may be classified into one or more of the following categories:
Dynamic content: dynamic pages which are returned in response to a submitted query or accessed
only through a form, especially if open-domain input elements (such as text fields) are used; such fields
are hard to navigate without domain knowledge.
Unlinked content: pages which are not linked to by other pages, which may prevent Web
crawling programs from accessing the content. This content is referred to as pages without backlinks
(or inlinks).
Private Web: sites that require registration and login (password-protected resources).
Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP
addresses or previous navigation sequence).
Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma HTTP headers which prohibit search engines from browsing them and creating cached copies).[12]
Scripted content: pages that are only accessible through links produced by JavaScript as well as
content dynamically downloaded from Web servers via Flash or Ajax solutions.
Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file
formats not handled by search engines.
Accessing
To discover content on the Web, search engines use web crawlers that follow hyperlinks through known
protocol virtual port numbers. This technique is ideal for discovering resources on the surface Web but is
often ineffective at finding deep Web resources. For example, these crawlers do not attempt to find dynamic
pages that are the result of database queries due to the indeterminate number of queries that are possible.[5] It has been noted that this can be (partially) overcome by providing links to query results, but this could unintentionally inflate the popularity for a member of the deep Web.
In 2005, Yahoo! made a small part of the deep Web searchable by releasing Yahoo! Subscriptions.[citation needed] This search engine searches through a few subscription-only Web sites.[citation needed] Some subscription websites display their full content to search engine robots so they will show up in user searches, but then show users a login or subscription page when they click a link from the search engine results page.[citation needed]
DeepPeep, Intute, Deep Web Technologies, and Scirus are a few search engines that have accessed the
deep web. Intute ran out of funding and is now a temporary static archive as of July, 2011. [13] Scirus retired
near the end of January, 2013.[14]
Some so-called hidden services can be accessed only through an onion router such as Tor (anonymity
network) which allows the user to access .onion web pages.
Researchers have been exploring how the Deep Web can be crawled in an automatic fashion. In 2001, Sriram Raghavan and Hector Garcia-Molina (Stanford Computer Science Department, Stanford University)[15][16] presented an architectural model for a hidden-Web crawler that used key terms provided by users or collected from the query interfaces to query a Web form and crawl the Deep Web resources. Alexandros Ntoulas, Petros Zerfos, and Junghoo Cho of UCLA created a hidden-Web crawler that automatically generated meaningful queries to issue against search forms.[17] Several form query languages (e.g., DEQUEL[18]) have been proposed that, besides issuing a query, also allow structured data to be extracted from result pages. Another effort is DeepPeep, a project of the University of Utah sponsored by the National Science Foundation, which gathered hidden-Web sources (Web forms) in different domains based on novel focused crawler techniques.[19][20]
Commercial search engines have begun exploring alternative methods to crawl the deep Web. The Sitemap
Protocol (first developed and introduced by Google in 2005) and mod_oai are mechanisms that allow search
engines and other interested parties to discover deep Web resources on particular Web servers. Both
mechanisms allow Web servers to advertise the URLs that are accessible on them, thereby allowing
automatic discovery of resources that are not directly linked to the surface Web. Google's deep Web
surfacing system pre-computes submissions for each HTML form and adds the resulting HTML pages into
the Google search engine index. The surfaced results account for a thousand queries per second to deep
Web content.[21] In this system, the pre-computation of submissions is done using three algorithms:
1. selecting input values for text search inputs that accept keywords,
2. identifying inputs which accept only values of a specific type (e.g., date), and
3. selecting a small number of input combinations that generate URLs suitable for inclusion into the
Web search index.
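A rough sketch of how such form surfacing might be organized; the form fields, candidate values and selection heuristic here are invented examples rather than Google's actual system.

```python
from itertools import product

# Hypothetical description of an HTML search form on a deep-web site.
form = {
    "keywords": {"type": "text",   "candidates": ["hotels", "flights", "museums"]},
    "country":  {"type": "select", "candidates": ["us", "fr"]},
}

def surface_form(form, max_submissions=4):
    # 1. pick candidate values for free-text inputs, 2. use the enumerated
    #    values for typed inputs, 3. keep only a small number of combinations.
    fields = sorted(form)
    value_lists = [form[f]["candidates"] for f in fields]
    submissions = []
    for combo in product(*value_lists):
        submissions.append(dict(zip(fields, combo)))
        if len(submissions) >= max_submissions:
            break
    return submissions  # each dict would be submitted and the result page indexed

for query in surface_form(form):
    print(query)
```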
Classifying resources
Automatically determining if a Web resource is a member of the surface Web or the deep Web is difficult. If
a resource is indexed by a search engine, it is not necessarily a member of the surface Web, because the
resource could have been found using another method (e.g., the Sitemap Protocol, mod_oai, OAIster)
instead of traditional crawling. If a search engine provides a backlink for a resource, one may assume that
the resource is in the surface Web. Unfortunately, search engines do not always provide all backlinks to
resources. Furthermore, a resource may reside in the surface Web even though it has yet to be found by a
search engine.
Most of the work of classifying search results has been in categorizing the surface Web by topic. For
classification of deep Web resources, Ipeirotis et al.[22] presented an algorithm that classifies a deep Web
site into the category that generates the largest number of hits for some carefully selected, topically-focused
queries. Deep Web directories under development include OAIster at the University of Michigan, Intute at
the University of Manchester, Infomine[23] at the University of California at Riverside, and DirectSearch
(by Gary Price). This classification poses a challenge when searching the deep Web because two levels of categorization are required. The first level is to categorize sites into vertical topics (e.g., health, travel,
automobiles) and sub-topics according to the nature of the content underlying their databases.
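The probe-and-count idea can be sketched as follows. The category probes and the stubbed hit counter are hypothetical stand-ins; the published algorithm derives its probes from a trained document classifier and counts hits by querying the site's actual search form.

```python
# Illustrative probe-and-count classification: send topically focused probe queries
# to a deep Web site and assign the category whose probes return the most hits.
CATEGORY_PROBES = {
    "health": ["diabetes", "cardiology", "vaccine"],
    "travel": ["itinerary", "visa", "airfare"],
    "automobiles": ["sedan", "horsepower", "transmission"],
}

def classify_site(count_hits) -> str:
    """count_hits(query) -> int must submit the query to the site and return the hit count."""
    scores = {
        category: sum(count_hits(query) for query in probes)
        for category, probes in CATEGORY_PROBES.items()
    }
    return max(scores, key=scores.get)

# Example with a stubbed hit counter standing in for real form submissions:
fake_counts = {"diabetes": 120, "cardiology": 85, "vaccine": 40}
print(classify_site(lambda q: fake_counts.get(q, 0)))  # -> "health"
```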
The more difficult challenge is to categorize and map the information extracted from multiple deep Web
sources according to end-user needs. Deep Web search reports cannot display URLs like traditional search
reports. End users expect their search tools not only to find what they are looking for, but also to be intuitive and user-friendly. In order to be meaningful, the search reports have to offer some insight into the nature of the content that underlies the sources, or else the end user will be lost in a sea of URLs that do not indicate what content lies beneath them. The format in which search results are to be presented varies
widely by the particular topic of the search and the type of content being exposed. The challenge is to find
and map similar data elements from multiple disparate sources so that search results may be exposed in a
unified format on the search report irrespective of their source.
Future[edit]
The lines between search engine content and the deep Web have begun to blur, as search services start to
provide access to part or all of once-restricted content. An increasing amount of deep Web content is
opening up to free search as publishers and libraries make agreements with large search engines. In the
future, deep Web content may be defined less by opportunity for search than by access fees or other types
of authentication.[citation needed]
See also[edit]
Internet portal
Dark Internet
I2P
Freenet
Gopher protocol
References[edit]
1. Hamilton, Nigel. "The Mechanics of a Deep Net Metasearch Engine". CiteSeerX 10.1.1.90.5847.
2. Devine, Jane; Egger-Sider, Francine (July 2004). "Beyond Google: the invisible web in the academic library". The Journal of Academic Librarianship 30 (4): 265–269. Retrieved 2014-02-06.
3. Raghavan, Sriram; Garcia-Molina, Hector (11–14 September 2001). "Crawling the Hidden Web". 27th International Conference on Very Large Data Bases (Rome, Italy).
4. The Secret Web: Where Drugs, Porn and Murder Live Online.
5. Wright, Alex (2009-02-22). "Exploring a 'Deep Web' That Google Can't Grasp". The New York Times.
6. Bergman, Michael K (July 2000). The Deep Web: Surfacing Hidden Value. BrightPlanet LLC.
7. Bergman, Michael K (August 2001). "The Deep Web: Surfacing Hidden Value". The Journal of Electronic Publishing.
8. He, Bin; Patel, Mitesh; Zhang, Zhen; Chang, Kevin Chen-Chuan (May 2007). "Accessing the Deep Web". Communications of the ACM 50 (5): 94–101. doi:10.1145/1230819.1241670.
9. Shestakov, Denis (2011). "Sampling the National Deep Web" (PDF). Proceedings of the 22nd International Conference on Database and Expert Systems Applications (DEXA). Springer. pp. 331–340. Archived from the original on September 2, 2011. Retrieved 2011-10-06.
10. Garcia, Frank (January 1996). "Business and Marketing on the Internet". Masthead 15 (1).
11. @1 started with 5.7 terabytes of content, estimated to be 30 times the size of the nascent World Wide Web; PLS was acquired by AOL in 1998 and @1 was abandoned. "PLS introduces AT1, the first 'second generation' Internet search service" (Press release). Personal Library Software. December 1996.
12. "HTTP/1.1: Header Field Definitions (14.32 Pragma)". HTTP — Hypertext Transfer Protocol. World Wide Web Consortium. 1999. Retrieved 2009-02-24.
13. "Intute FAQ". Retrieved October 13, 2012.
15. Raghavan, Sriram; Garcia-Molina, Hector (2000). Crawling the Hidden Web (PDF). Stanford University.
16. Raghavan, Sriram; Garcia-Molina, Hector (2001). "Crawling the Hidden Web" (PDF). Proceedings of the 27th International Conference on Very Large Data Bases (VLDB). pp. 129–138.
17. Ntoulas, Alexandros; Zerfos, Petros; Cho, Junghoo (2005). Downloading Hidden Web Content.
18. Shestakov, Denis; Bhowmick, Sourav S; Lim, Ee-Peng (2005). "DEQUE: Querying the Deep Web".
19. Barbosa, Luciano; Freire, Juliana (2007). An Adaptive Crawler for Locating Hidden-Web Entry Points.
20. Barbosa, Luciano; Freire, Juliana (2005). Searching for Hidden-Web Databases. WebDB.
21. Madhavan, Jayant; Ko, David; Kot, Łucja; Ganapathy, Vignesh; Rasmussen, Alex; Halevy, Alon (2008). Google's Deep-Web Crawl (PDF). VLDB Endowment, ACM. Retrieved 2009-04-17.
22. Ipeirotis, Panagiotis G.; Gravano, Luis; Sahami, Mehran (2001). "Probe, Count, and Classify: Categorizing Hidden-Web Databases" (PDF). Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data.
Further reading[edit]
Paganini, Pierluigi (May 2012), "What is the Deep Web? A first trip into the abyss", Security Affairs, Napoli, Italy: Security Affairs.
Paganini, Pierluigi (September 2012), "The good and the bad of the Deep Web", Security Affairs, Napoli, Italy: Security Affairs.
Paganini, Pierluigi (September 2013), "Traffic Correlation Attacks against Anonymity on Tor", Security Affairs, Napoli, Italy: Security Affairs.
Paganini, Pierluigi; Amores, Richard (September 2012), The Deep Dark Web, NY, USA: Amazon.
Paganini, Pierluigi (October 2012), "The Deep Web Part 1: Introduction to the Deep Web and how to wear clothes", Security Affairs.
Hamilton, Nigel (2003), The Mechanics of a Deep Net Metasearch Engine, 12th World Wide Web Conference.
He, Bin; Chang, Kevin Chen-Chuan (2003). "Statistical Schema Matching across Web Query Interfaces" (PDF). Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data.
Ipeirotis, Panagiotis G.; Gravano, Luis; Sahami, Mehran (2001). "Probe, Count, and Classify: Categorizing Hidden-Web Databases" (PDF). Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data.
King, John D.; Li, Yuefeng; Tao, Daniel; Nayak, Richi (November 2007). "Mining World Knowledge for Analysis of Search Engine Content" (PDF). Web Intelligence and Agent Systems: An International Journal.
McCown, Frank; Liu, Xiaoming; Nelson, Michael L; Zubair, Mohammad (March–April 2006). "Search Engine Coverage of the OAI-PMH Corpus" (PDF). IEEE Internet Computing 10 (2): 66–73. doi:10.1109/MIC.2006.41.
Price, Gary; Sherman, Chris (July 2001). The Invisible Web: Uncovering Information Sources Search Engines Can't See.
Shestakov, Denis (June 2008). Search Interfaces on the Web: Querying and Characterizing. TUCS Doctoral Dissertations.
Wright, Alex (March 2004), In Search of the Deep Web, Salon, archived from the original on 9 March 2007.
External links[edit]
Basu, Saikat (March 14, 2010), 10 Search Engines to Explore the Invisible Web, MakeUseOf.com.
Whoriskey, Peter (December 11, 2008), Firms Push for a More Searchable Federal Web, The
Washington Post, p. D01.
Metabrowsing
From Wikipedia, the free encyclopedia
Metabrowsing refers to approaches to browsing Web-based information that emerged in the late 1990s as
alternatives to the standard Web browser. According to LexisNexis the term "metabrowsing" began
appearing in mainstream media in March 2000.[1][2] Since then the meaning of "metabrowsing" has split into
a popular and a more scientific use of the term.
Contents
[hide]
1 Popular use
2 Scientific use
4 Technology
5 References
Popular use[edit]
Akin to metasearch, the popular use of the term "metabrowsing" describes ways of viewing Web-based information other than a single Web page at a time. "Simply put, metabrowsing is a tool or
service that enables the user to view more than a single Web page at a time inside a single display unit." [3]
According to Dr. Linda Gordon, Liberal Arts Professor at Nova Southeastern University, "metabrowsing is
transforming our understanding of the web, therefore, the vocabulary of this new perspective must
demonstrate the nature of the metamorphosis. The etymological root 'meta', from the Greek, means
'change' and 'transcendance', and thus we can understand the dynamics of metabrowsing as a view of the
web from a higher level. What is this higher level? To speak metaphorically, think of the limitations of street
signs for navigation: metabrowsing will become the GPS of the internet."[4]
Scientific use[edit]
There are several scientific papers that use the term to describe the browsing of "graphical representations"
of documents. In this context "metabrowsing" refers to a high-level way of browsing through information:
instead of browsing through document contents or document surrogates, the user browses through a
graphical representation of the documents and their relations to the domain.[5][6]
Quickbrowse was one of the first Web-based metabrowsing applications, enabling users to combine
multiple pages into one vertical, continuously scrollable page for faster viewing. Onepage.com and
Octopus.com offered more sophisticated systems for combining not just entire Web pages but bits and
pieces of different pages into a new "combo page". Octopus received more than $11.4 million in venture
capital funding.[7] Onepage received $25 million in venture capital funding.[8] Sybase acquired Onepage in 2002, changing the service from an end-user-oriented business model to an enterprise-driven concept. In the end, Onepage was terminated. Calltheshots.com was acquired by Akamai and then also disappeared,
as did Katiesoft and iHarvest.com.
Technology[edit]
Web-based metabrowsing services such as Quickbrowse, Octopus and Onepage differed in their
technological approach. Quickbrowse only allows the combination of complete Web pages. The service
retrieves the HTML of designated pages and then combines it into a new "combo page" server-side. This
"raw" approach does not work with all types of Web pages, especially cascading style sheets
whose HTML does not combine well. Quickbrowse also disables JavaScript components so as to avoid
problems that would arise from the combination of disparate and unrelated sources of JavaScript code.
Unwanted layout distortions may result when combining pages. Services like Octopus and Onepage, both
out of business, used a more sophisticated Java-driven approach that enabled users' browsers to retrieve
and combine bits and pieces from disparate Web sites client-side.
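A minimal sketch of the server-side "combo page" approach (not Quickbrowse's actual code) is shown below: fetch each page, strip script blocks so unrelated JavaScript cannot interfere, and concatenate the bodies.

```python
# Minimal sketch of a server-side "combo page": fetch each page, drop <script>
# blocks so unrelated JavaScript cannot collide, and concatenate the results.
# Illustrative only; as noted above, pages that depend heavily on CSS will not
# combine cleanly.
import re
import urllib.request

SCRIPT_RE = re.compile(r"<script\b.*?</script>", re.DOTALL | re.IGNORECASE)

def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def combo_page(urls) -> str:
    parts = []
    for url in urls:
        html = SCRIPT_RE.sub("", fetch(url))   # disable JavaScript components
        parts.append(f"<div class='combo-item'><!-- {url} -->{html}</div>")
    return "<html><body>" + "<hr>".join(parts) + "</body></html>"
```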
References[edit]
1. Sinclair, Peter. "Here's the hot word to all cyberg(r)eeks". The New Zealand Herald, April 4,
2. Wu, Joy (October 26, 2000). "Quickbrowse speeds surfing the web by connecting pages". The
5. "Information retrieval by metabrowsing". Journal of the American Society for Information
Multisearch
From Wikipedia, the free encyclopedia
Multisearch is a multitasking search engine which combines search engine and metasearch engine characteristics with the additional capability of retrieving search result sets that were previously classified by users. It enables the user to gather results from its own search index as well as from one or more search engines, metasearch engines, databases or any other kind of information retrieval (IR) program. Multisearch is an emerging feature of automated search and information retrieval systems which combines the capabilities of computer search programs with results classification made by a human.
Multisearch is a way to take advantage of the power of multiple search engines with a flexibility not seen in
traditional metasearch engines. To the end user, a multisearch may appear to be just a customizable
search engine; however, its behind-the-scenes technology enables it to put a face to the search process and also to retrieve and display a result set that was classified by a human during a multisearch session and automatically included in the document index. There are additional features available in many search engines and metasearch engines, but the basic idea is the same: reducing the amount of time required to search for resources by improving the accuracy and relevance of individual searches as well as the ability to manage the results.
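As a rough illustration of this idea, a multisearch might merge automated engine results with a human-classified result set as sketched below; the engine callables and the classified index are hypothetical stand-ins, not a description of any particular product.

```python
# Illustrative multisearch merge: combine results from several automated engines
# with a human-classified result set kept in a local index.
def multisearch(query, engines, classified_index):
    """engines: dict of name -> callable(query) returning a list of URLs.
    classified_index: dict of query -> list of (URL, human-assigned label)."""
    results = []
    seen = set()
    # Previously classified results for this query are surfaced first.
    for url, label in classified_index.get(query, []):
        results.append({"url": url, "source": "classified", "label": label})
        seen.add(url)
    # Then de-duplicated results gathered from the automated engines.
    for name, search in engines.items():
        for url in search(query):
            if url not in seen:
                results.append({"url": url, "source": name, "label": None})
                seen.add(url)
    return results
```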
See also[edit]
Search aggregator
Metabrowsing
Search engine
Internet search engines
From Wikipedia, the free encyclopedia
General search engines that search for information on the Internet.
For more specific search engines, see other subcategories of Category:Searching.
Metasearch engine
Types
Collaborative search engine
Local search
Vertical search
Selection-based search
Social search
Document retrieval
Text mining
Web crawler
Multisearch
Federated search
Tools
Search aggregator
Index/Web indexing
Focused crawler
Spider trap
Web archiving
Voice search
Image search
Semantic search
Protocols
Z39.50
OpenSearch
Search engine
Online search