Search Engine: A Project On
SEARCH ENGINE
Submitted by
PRIYANKA SHARMA (1552513023)
RISHU YADAV (1552513024)
SUNIL KUMAR (1552513042)
AMARNATH MAURYA (1552510043)
Under the guidance of
Mr. SUDHIR AGRAWAL
We hereby declare that the project entitled “Search Engine” submitted for the B.Tech (IT)
degree is our original work and the project has not formed the basis for the award of any
other degree, diploma, fellowship or any other similar titles.
This is to certify that Priyanka Sharma, Rishu Yadav, Sunil Kumar and Amarnath Maurya, students of B.Tech (IT)
of BUDDHA INSTITUTE OF TECHNOLOGY (Gorakhpur), have completed the project work
entitled “SEARCH ENGINE” in partial fulfillment of the requirements for the award of Bachelor of
Technology in INFORMATION TECHNOLOGY and Engineering affiliated to Dr. A.P.J. ABDUL KALAM
TECHNICAL UNIVERSITY (Lucknow) under the guidance of Mr. SUDHIR AGRAWAL during the academic
year 2018-19.
(DIRECTOR)
First and foremost, praise and thanks to the Lord, the Almighty, for His blessings
throughout the project, which enabled us to complete it successfully. We sincerely thank the respected
Head of Department, Mr. Shrawan Kumar Pandey, for his constant support and guidance throughout
the project. His dynamism, vision, sincerity and motivation have always inspired us deeply.
We would like to express our deep and sincere gratitude to the project guide Mr. Ranjeet Singh for
giving us the opportunity to do research and for providing invaluable guidance throughout the work.
Lastly, we would like to thank all the faculty members and staff of Buddha Institute of Technology for
their kindness and support.
This simple search engine project is implemented in Java using servlets, with an Oracle database or SQL
Server 2000. The main aim of this project is to develop a search engine that searches three
different search engines and displays the top twenty-five results that are most useful to users.
Today, search engines are used by everyone to find required information on the
web. Google is one of the most widely used search engines, followed by Yahoo and Bing. For
every search engine, the first five results are usually the most useful websites with correct information, so in
this project we collect the top five results from Google, Yahoo and Bing and display these most useful
web pages on the first page.
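The collect-and-display step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's servlet code: the class `TopResultsMerger`, the method `mergeTopResults` and the assumption that each engine's results arrive as an already-ranked list of URLs are all hypothetical. Results are interleaved rank by rank (every engine's first result, then every engine's second, and so on), and a URL returned by more than one engine is kept only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: merges the top results of several search engines.
final class TopResultsMerger {

    // Takes up to perEngine results from each engine's ranked URL list,
    // interleaving them rank by rank and dropping duplicate URLs.
    static List<String> mergeTopResults(List<List<String>> engineResults, int perEngine) {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order, ignores repeats
        for (int rank = 0; rank < perEngine; rank++) {
            for (List<String> results : engineResults) {
                if (rank < results.size()) {
                    merged.add(results.get(rank));
                }
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        List<String> google = List.of("a.com", "b.com", "c.com");
        List<String> yahoo = List.of("b.com", "d.com");
        List<String> bing = List.of("e.com", "a.com", "f.com");
        System.out.println(mergeTopResults(List.of(google, yahoo, bing), 5));
    }
}
```

With three engines and `perEngine` set to 5, at most fifteen URLs come back, and fewer when the engines overlap. Interleaving is only one possible ordering; the combined list could equally be re-ranked by another rule.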
ABSTRACT
Search engine optimization (SEO) is the process of affecting the visibility of a
website or a web page in a web search engine's unpaid results. ... So, if a page is optimized
for Google, it is optimized for most search engines.
CHAPTER 1 INTRODUCTION
3.1 Introduction
3.2 Structure
3.3 Iterative Model Application
3.4 Register on website search engine
6.4 CSS-Syntax
7.1 JavaScript
REFERENCES
S.No Topic
1.1 Crawling
1.2 Basic topics
1.3 Indexing
1.4 Storage
1.5 Data Flow
1.6 Login page
1.7 Index page
1.8 Topic entry
1.9 Google Search
1.10 Results
INTRODUCTION
Introduction:-
When we want to find something on the web, we turn to a search engine. Sites such as
Google, MSN and Yahoo! let you search for web sites that contain information pertinent to topics
of interest to you. Potential visitors looking for your site are going to do the same thing. This makes
it imperative that your site ranks high enough for important keywords that visitors can find it.
Knowing which keywords are important means knowing what visitors are looking for when they
find your site.
Search engines are the most common tool for promoting a web site. At the same time, the web creates
new challenges for information retrieval. The amount of information on the web is growing rapidly,
as is the number of new users inexperienced in the art of web research. People are likely
to surf the web using its link graph, often starting with high-quality human-maintained indices
such as Yahoo! or with search engines.
Human-maintained lists cover popular topics effectively but are subjective, expensive to build
and maintain, slow to improve, and cannot cover all esoteric topics. Automated search engines
that rely on keyword matching usually return too many low-quality matches.
To make matters worse, some advertisers attempt to gain people's attention by taking
measures meant to mislead automated search engines. Google's founders addressed many of the
problems of existing systems by building a large-scale search engine that makes especially heavy
use of the additional structure present in hypertext to provide much higher-quality search results.
They chose the name Google because it is a common spelling of googol, or 10^100, which fits
well with the goal of building very large-scale search engines.
Search engines use selected software programs to search their indexes for matching
keywords and phrases, presenting their findings to you in some kind of relevance ranking.
Although software programs may be similar, no two search engines are exactly the same in
terms of size, speed and content; no two search engines use exactly the same ranking schemes,
and not every search engine offers you exactly the same search options. Therefore, your search
is going to be different on every engine you use. The difference may not be a lot, but it could be
significant. Recent estimates put search engine overlap at approximately 60 percent and unique
content at around 40 percent.
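The "relevance ranking" mentioned above can be illustrated with a deliberately simple score: count how often the query's keywords occur in each page and sort pages by that count. Real engines combine far richer signals, so treat this only as a sketch; the term-count score, the class `KeywordRanker` and the in-memory map from URL to page text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical example: ranks pages by how often the query keywords occur in them.
final class KeywordRanker {

    // Splits text into lower-case words, skipping the empty token a leading symbol can produce.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    // Score = number of words in the page that match one of the query keywords.
    static int score(String pageText, List<String> keywords) {
        int count = 0;
        for (String word : words(pageText)) {
            if (keywords.contains(word)) {
                count++;
            }
        }
        return count;
    }

    // Returns URLs of pages matching at least one keyword, highest score first.
    static List<String> rank(Map<String, String> pages, String query) {
        List<String> keywords = words(query);
        List<String> urls = new ArrayList<>();
        for (String page : pages.keySet()) {
            if (score(pages.get(page), keywords) > 0) {
                urls.add(page);
            }
        }
        urls.sort(Comparator.comparingInt((String url) -> score(pages.get(url), keywords)).reversed());
        return urls;
    }
}
```

Because two engines would tune such scores differently, the same query can return different orderings, which is one reason results differ from engine to engine.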
Web search engines work by storing information about many web pages, which they retrieve
from the HTML itself. These pages are retrieved by a web crawler (sometimes also known as a
spider), an automated web browser that follows every link on the site. Exclusions can be
made by the use of robots.txt.
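A crawler honours robots.txt exclusions by reading the site's `User-agent` and `Disallow` lines before fetching pages. The sketch below is a deliberately simplified check, assuming that only the `User-agent: *` group matters and that every `Disallow` value is a plain path prefix; it ignores `Allow` lines, wildcards and crawler-specific groups, all of which real robots.txt files may use. The class `RobotsRules` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt reader: only "User-agent: *" groups and
// prefix-style Disallow rules are supported (no Allow lines, no wildcards).
final class RobotsRules {

    // Collects the Disallow path prefixes that apply to every crawler ("*").
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean inWildcardGroup = false;
        for (String rawLine : robotsTxt.split("\\R")) {
            String line = rawLine.split("#", 2)[0].trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inWildcardGroup = value.equals("*");
            } else if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                prefixes.add(value); // an empty Disallow value allows everything
            }
        }
        return prefixes;
    }

    // A path may be crawled unless it starts with one of the disallowed prefixes.
    static boolean isAllowed(String robotsTxt, String path) {
        for (String prefix : disallowedPrefixes(robotsTxt)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, with `Disallow: /private/` under `User-agent: *`, the path `/private/data.html` is skipped while `/index.html` is crawled.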
The contents of each page are then analyzed to determine how it should be indexed (for
example, words are extracted from the titles, headings, or special fields called meta tags). Data
about web pages are stored in an index database for use in later queries. A query can be a single
word. The purpose of an index is to allow information to be found as quickly as possible. Some
search engines, such as Google, store all or part of the source page (referred to as a cache) as
well as information about the web pages, whereas others, such as AltaVista, store every word of
every page they find.
This cached page always holds the actual search text since it is the one that was actually
indexed, so it can be very useful when the content of the current page has been updated and the
search terms are no longer in it.
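The index database described above is commonly organised as an inverted index: a map from each word to the pages that contain it, so a single-word query becomes one lookup instead of a scan over every page. The sketch below keeps the index in memory and assumes pages have already been fetched and reduced to plain text; the class `InvertedIndex` and its methods are hypothetical, and a production index would live in a database rather than a `TreeMap`.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory inverted index: word -> sorted set of URLs containing it.
final class InvertedIndex {
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Splits the page text into lower-case words and records the URL under each word.
    void addPage(String url, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
            }
        }
    }

    // Single-word query: one map lookup; unknown words give an empty set.
    Set<String> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }
}
```

Adding "Search engines index pages" under `a.html` and "A web crawler fetches pages" under `b.html` makes `lookup("pages")` return both URLs, while `lookup("crawler")` returns only `b.html`.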
Most search engines support the use of the Boolean operators AND, OR and NOT to further
specify the search query.
Boolean operators are for literal searches that allow the user to refine and extend the terms of
the search.
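Once each search term has been looked up in the index, the Boolean operators map directly onto set operations: AND is intersection, OR is union and NOT is difference. The sketch below shows only that mapping on sets of page identifiers; parsing a full query string with precedence and parentheses is left out, and the class `BooleanOps` is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helpers: Boolean search operators as set operations on page IDs.
final class BooleanOps {

    // AND: pages that contain both terms.
    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR: pages that contain either term.
    static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // NOT: pages in a that do not contain the excluded term.
    static Set<String> not(Set<String> a, Set<String> excluded) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(excluded);
        return result;
    }

    public static void main(String[] args) {
        Set<String> java = Set.of("p1", "p2", "p3");
        Set<String> servlet = Set.of("p2", "p3", "p4");
        Set<String> applet = Set.of("p3");
        // Query: java AND servlet NOT applet
        System.out.println(not(and(java, servlet), applet)); // prints [p2]
    }
}
```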
Crawler-based search engines have three major elements. First is the spider, also called the
crawler. The spider visits a web page, reads it, and then follows links to other pages within the
site. This is what it means when someone refers to a site being "spidered" or "crawled."
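The spider's visit, read and follow loop is essentially a breadth-first traversal of the link graph. To keep the sketch runnable without network access, the "web" here is an in-memory map from each page URL to the links found on it, standing in for fetching and parsing HTML; only links on the same host are followed, matching the "other pages within the site" behaviour described above. The class `SiteCrawler`, the `maxPages` limit and the map are all hypothetical.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical breadth-first spider over an in-memory link graph.
final class SiteCrawler {

    // Visits pages breadth-first, following only links on the start page's host.
    static List<String> crawl(String startUrl, Map<String, List<String>> linksByPage, int maxPages) {
        String host = URI.create(startUrl).getHost();
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(startUrl);
        seen.add(startUrl);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String page = frontier.poll();
            visited.add(page); // a real spider would fetch, parse and index the page here
            for (String link : linksByPage.getOrDefault(page, List.of())) {
                boolean sameSite = host != null && host.equals(URI.create(link).getHost());
                if (sameSite && seen.add(link)) { // seen.add is false for already-queued links
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

The `seen` set prevents the spider from looping forever when pages link back to each other, and `maxPages` bounds the crawl.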
There are two main types of search engine:
1. Crawler-based
2. Metasearch
Crawler-based search engines are what most of us are familiar with - mainly because that's
what Google and Bing are. These companies develop their own software that enables them to
build and maintain searchable databases of web pages (the engine), and to organize those pages
in the way that is most valuable and pertinent to the user.
They are called crawler-based because the software they produce crawls the web like a spider,
automatically updating and adding new pages to its search index as it goes.
You can think of these as the car (what you see and what you use) and the engine that
moves you to your destination. These are notoriously difficult and expensive to build from
scratch, and you have to be just a little bit crazy to start one! :)
Examples of crawler-based search engines include:
Google (USA)
Bing (USA)
Gigablast (USA)
Yandex (Russia)
Exalead (France)
Mojeek (UK)
If crawler-based search engines are the car, then you could think of meta search engines as the
caravans being towed behind. These search engines don't have the arduous task of developing
the required technology (the engine) and depend upon the crawlers to build their service on. In
many cases they bring in results from multiple search engines with the intention of delivering
better results. Further, they usually concentrate on front-end technologies such as user
experience and novel ways of displaying the information.
Incidentally, most search engines come under this category, with DuckDuckGo being perhaps
the best example; Ixquick and Unbubble are two others also worth checking out.
So if the end-user experience is ostensibly the same, why should you care about what type of
search engine you're using? First, let's start with Wikipedia's definition of the word 'meta' to
learn what meta search engines can't do:
Meta (from the Greek preposition μετά = "after", "beyond", "adjacent", "self")
Once again, the caravan analogy is apt. But more specifically, meta search engines can only use
the limited data accessed from the crawler engine to re-arrange the results. They don't have the
capacity to identify and discriminate between ranking factors [2].
Without being in the driver's seat, your experience is ultimately directed by the whim of
underlying competitors [3]. But also, the crawler search engine could decide to stop supplying
them with results at any time, maybe after seeing them as a threat or otherwise deciding not to
collaborate anymore. Possibly worse still, without an engine of their own the business model is
far easier to replicate by new entrants to the marketplace. So not only are they controlled by
other companies, but they are at more risk of being replaced by a new, shinier caravan!
Naturally we're biased, but impartiality is an attractive quality in business, so if you value user
experience above all else, then we absolutely suggest you try out a meta search engine. It all
comes down to what's important to you.
User experience is becoming far more important as expectations in design continue to increase.
Time saved by not developing and maintaining search technology or indexes of the web is time
that meta search engines can allocate to the look and feel of their website. (But indeed, this is
only an advantage against crawler engines with smaller teams.)
I'd like to finish up the article with a concern for the growing domination of just a select few
crawler-based search engines, none of which are from the United Kingdom.
There is perhaps no better example of a monopoly today than Google. Given that most
of their products are free to users, the opportunity for competition is narrowed, and worse
still, it masks the issue to people using their services.
Too much power economically has left behind it a dismal path [4]. And in business, without
consumer-choice, the powerful are incentivized to exploit their position [5]. More generally,
corporate monopolies threaten to reach a point where they become liabilities themselves, and
present dire scenarios for society at large [6].
As Nassim Taleb notes in his bestselling book, Antifragile, "Small is beautiful, but it is also
efficient" [7]. 'Small' owes itself to choice, and 'efficiency' becomes inefficiency once you become
too powerful. Having options, in a word, matters.
I hope this article explained the difference between crawler and Meta engines and why it
matters, but if not, please get in touch.
Metasearch engines are also not allowed to redistribute the results supplied by the
crawler engine; this prohibits their ability to supply a full search API to potential clients,
an area we believe will be important in the future of search and global knowledge
systems.
The "too big to fail" theory asserts that certain financial institutions are so large and so
interconnected that their failure would be disastrous to the economy, and they therefore
must be supported by government when they face difficulty -
http://en.wikipedia.org/wiki/Too_big_to_fail
Google Finance isn't the most popular finance site; according to comScore, Yahoo
Finance claims that title, and indeed comScore puts Google Finance in position #60 (as
of April 2010). Nonetheless, the three most prominent links all promote Google's
in-house finance service - http://www.benedelman.org/hardcoding/
Google is now "too big to fail" as indicated by the recent DOJ investigation which could
have resulted in a felony charge for their co-founder, and most certainly would have for
a smaller firm without $500m of liquid cash - http://www.seobook.com/too-big-to-fail.
A hybrid search engine (HSE) is a type of computer search engine that uses different types of
data with or without ontologies to produce the algorithmically generated results based on web
crawling. ... Hybrid search engines use a combination of both crawler-based results and
directory results. More and more search engines these days are moving to a hybrid-based
model.
A meta search engine (or aggregator) is a search tool that uses another search engine's data to
produce its own results from the Internet. Meta search engines take input from a user and
simultaneously send out queries to third party search engines for results. Sufficient data is gathered,
formatted by their ranks and presented to the users.
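Sending one query "simultaneously" to several engines can be sketched with a thread pool, as below. The `SearchEngine` interface is a hypothetical stand-in; real engines would be called over HTTP through whatever interface each one offers, which is not covered here. Each engine's answer is awaited for up to `timeoutMs`, and a failing or slow engine yields an empty list, so one bad source does not block the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical abstraction over one third-party search engine.
interface SearchEngine {
    List<String> search(String query) throws Exception;
}

final class MetaSearch {

    // Sends the same query to every engine in parallel; result i belongs to engine i.
    static List<List<String>> queryAll(List<SearchEngine> engines, String query, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, engines.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (SearchEngine engine : engines) {
                futures.add(pool.submit(() -> engine.search(query))); // runs as a Callable
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                try {
                    results.add(future.get(timeoutMs, TimeUnit.MILLISECONDS));
                } catch (ExecutionException | TimeoutException e) {
                    // A failed or slow engine contributes an empty list instead of aborting the search.
                    future.cancel(true);
                    results.add(List.of());
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The per-engine lists returned here are the raw material that the metasearch engine then merges and ranks.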
Metasearch engines have their own sets of unique problems. The websites indexed by different search
engines differ, which brings in irrelevant content. Problems such as spamming reduce result
accuracy [3]. The process of fusion aims to tackle this issue and improve the engineering of a
metasearch engine.
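Fusion is the step where the separate ranked lists are combined into one. The text does not say which fusion method is used; the sketch below assumes a Borda-count style scheme, in which a URL at position i (counting from 0) in a list of length n earns n - i points, and points are summed across engines so that pages ranked highly by several engines rise to the top. Duplicates disappear naturally because scores are keyed by URL. The class `BordaFusion` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Borda-count fusion of several engines' ranked result lists.
final class BordaFusion {

    // Combines ranked URL lists; a higher point total gives a higher final rank.
    static List<String> fuse(List<List<String>> rankedLists) {
        Map<String, Integer> points = new LinkedHashMap<>(); // first-seen order breaks ties
        for (List<String> list : rankedLists) {
            int n = list.size();
            for (int i = 0; i < n; i++) {
                points.merge(list.get(i), n - i, Integer::sum); // top result earns n points
            }
        }
        List<String> fused = new ArrayList<>(points.keySet());
        fused.sort(Comparator.comparing((String url) -> points.get(url)).reversed()); // stable sort
        return fused;
    }
}
```

For the lists [a, b, c], [b, a] and [b], the totals are a = 4, b = 5 and c = 1, so the fused order is b, a, c.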
Operation:-
A metasearch engine accepts a single search request from the user. This search request is then
passed on to other search engines' databases. A metasearch engine does not create a database of
web pages but generates a virtual database to integrate data from multiple sources. Since every
search engine is unique and has different algorithms for generating ranked data, duplicates will
also be generated. To remove duplicates, the metasearch engine processes this data and
applies its own algorithm. A revised list is produced as output for the user. When a metasearch
engine contacts other search engines, these search engines will respond in three ways:
• They may cooperate and provide complete access to their interface for the metasearch
engine, including private access to the index database, and inform the metasearch
engine of any changes made to the index database.
• They may behave in a non-cooperative manner, neither denying nor providing any special
access to their interfaces.
• They may be completely hostile, refusing the metasearch engine any access to their
database and, in serious circumstances, seeking legal remedies.
Searching is one of the most common actions on the Internet. As instruments for searching, search
engines are very popular and frequently used sites. This is why webmasters and
every ordinary Internet user must have a good knowledge of search engines and
searching.
Webmasters use major search engines for submitting their sites and for searching.
Ordinary users use major search engines primarily for searching, and sometimes for submitting
their homepages or small sites.
If you are a webmaster, you will also need some information while preparing your site for the web,
and you will also use search engines. Then, when you finish it, you must submit your URL
to many search engines. After that, you will check your URL's ranking on every search engine.
Every major search engine also offers hot news and much other interesting content. All of
this briefly explains why search engines are so popular.
As a novice user, you must learn how to use search engines to search the Internet.
There are two ways of searching: by entering a user query or by browsing
categories. If you have keywords or a phrase that best describes the topic you need, you should
use a user query. But if you need a topic and you don't have keywords or a phrase, you
should use categories.
If you use a user query, you should type the keyword or phrase into the search form and click "Search".
You will then get search results and can choose the URL that is best in your opinion. If
you use categories, you should click on the category that best describes the topic you need. You
will then get subcategories and should choose the subcategory that best describes the topic
you are after.
As a webmaster, you must submit your URL to all major search engines. This is the
way to promote your site. You could get many visits from major search engines if your URL has a
good ranking. We made a page with links to the submission forms of the major
search engines, so you will not waste valuable time searching for these forms: all of them are
on one page.
Search engines also face several problems:
o The web is growing much faster than any present-technology search engine can possibly
index (see distributed web crawling).
o Many web pages are updated frequently, which forces the search engine to revisit them
periodically.
o The queries one can make are currently limited to searching for keywords, which may
result in many false positives.
o Dynamically generated sites may be slow or difficult to index, or may result in
excessive results from a single site.
o Many dynamically generated sites are not indexable by search engines; this
phenomenon is known as the invisible web.
o Some search engines do not order the results by relevance, but rather according
to how much money the sites have paid them.
o Some sites use tricks to manipulate the search engine into displaying them as the first
result returned for some keywords. This can lead to some search results being
polluted, with more relevant links being pushed down in the result list.
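The need to revisit frequently updated pages, noted in the list above, is often handled by keeping pages in a schedule ordered by when each is next due. The sketch below assumes a fixed per-page revisit interval supplied by the caller (real crawlers estimate how often each page changes, which is not shown) and uses a priority queue so the most overdue page always comes out first. The class `RevisitScheduler` is hypothetical, and times are plain numbers (for example, minutes) to keep it simple.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical example: keeps pages ordered by when they are next due for a revisit.
final class RevisitScheduler {

    // One scheduled visit: the page, when it is next due, and its fixed revisit interval.
    private static final class Visit {
        final String url;
        final long dueAt;
        final long interval;

        Visit(String url, long dueAt, long interval) {
            this.url = url;
            this.dueAt = dueAt;
            this.interval = interval;
        }
    }

    private final PriorityQueue<Visit> queue =
            new PriorityQueue<Visit>(Comparator.comparingLong((Visit v) -> v.dueAt));

    void schedule(String url, long firstDueAt, long interval) {
        queue.add(new Visit(url, firstDueAt, interval));
    }

    // Returns every page due at or before 'now' (earliest first) and reschedules each one.
    List<String> pollDue(long now) {
        List<String> due = new ArrayList<>();
        List<Visit> rescheduled = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().dueAt <= now) {
            Visit v = queue.poll();
            due.add(v.url);
            rescheduled.add(new Visit(v.url, now + v.interval, v.interval));
        }
        // Re-added after the loop so a page with a short interval is not returned twice in one call.
        queue.addAll(rescheduled);
        return due;
    }
}
```

A news page scheduled every 10 units comes back far more often than an "about" page scheduled every 100, which is the trade-off the list item describes.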