
Search Engine: A Project On

This document describes a project on developing a search engine. It was submitted by 4 students - Priyanka Sharma, Rishu Yadav, Sunil Kumar, and Amarnath Maurya at Buddha Institute of Technology, Gorakhpur, India, in partial fulfillment of their Bachelor of Technology degree in Information Technology and Engineering. It was submitted in January-June 2019 under the guidance of their professor Mr. Sudhir Agrawal.


A Project on

SEARCH ENGINE
Submitted

In partial fulfillment of the requirements for the

Award of the degree of

BACHELOR OF TECHNOLOGY IN INFORMATION

TECHNOLOGY & ENGINEERING

Submitted by
PRIYANKA SHARMA (1552513023)
RISHU YADAV (1552513024)
SUNIL KUMAR (1552513042)
AMARNATH MAURYA (1552510043)
Under the guidance of
Mr. SUDHIR AGRAWAL

BUDDHA INSTITUTE OF TECHNOLOGY CL-1, SECTOR-7 GIDA

GORAKHPUR, UP (INDIA) - 273209

(AFFILIATED TO Dr. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY,

LUCKNOW, UP (INDIA))

JAN-JUNE, 2019

BUDDHA INSTITUTE OF TECHNOLOGY [525] Page 1


DECLARATION

We hereby declare that the project entitled “Search Engine” submitted for the B.Tech (IT)
degree is our original work and that the project has not formed the basis for the award of any
other degree, diploma, fellowship or other similar title.

Place: Gorakhpur

Date: 10/4/19

PRIYANKA SHARMA (1552513023)

RISHU YADAV (1552513024)

SUNIL KUMAR (1552513)

AMARNATH MAURYA (1552513043)


CERTIFICATE

This is to certify that Priyanka Sharma, Rishu Yadav, Sunil Kumar and Amarnath Maurya, students of
B.Tech (IT) at BUDDHA INSTITUTE OF TECHNOLOGY (Gorakhpur), have completed the project work
entitled “SEARCH ENGINE” in partial fulfillment of the requirements for the award of Bachelor of
Technology in INFORMATION TECHNOLOGY and Engineering, affiliated to Dr. A.P.J. ABDUL KALAM
TECHNICAL UNIVERSITY (Lucknow), under the guidance of Mr. SUDHIR AGRAWAL during the academic
year 2018-19.

Mr. SUDHIR AGRAWAL

(DIRECTOR)

Department of INFORMATION TECHNOLOGY & Engineering


ACKNOWLEDGEMENT

First and foremost, we offer praise and thanks to the Lord, the Almighty, for His blessings
throughout the project, from its presentation to its successful completion. Many thanks to the
respected Head of Department, Mr. Shrawan Kumar Pandey, for his constant support and guidance
throughout the project. His dynamism, vision, sincerity and motivation have always inspired us deeply.

We would like to express our deep and sincere gratitude to the project guide, Mr. Ranjeet Singh, for
giving us the opportunity to do this research and for providing invaluable guidance throughout the work.

Lastly, we thank all the faculty members and staff of Buddha Institute of Technology for
their kindness and support.

Priyanka Sharma (1552513023)

Rishu Yadav (1552513024)

Sunil Kumar (1552513041)

Amarnath Maurya (1452513006)


ABSTRACT

This simple search engine project is implemented in Java using servlets, with an Oracle database or
SQL Server 2000. The main aim of the project is to develop a search engine that searches three
different search engines and displays the top twenty-five results that are most useful to users.
Today, almost everyone uses search engines to find information on the web. Google is one of
the most widely used search engines, followed by Yahoo and Bing. For every search engine, the
first five results are usually the most useful websites with correct information. This project
therefore collects the top five results from Google, Yahoo and Bing and displays them on the
first page as the most useful web pages.
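The collection step described in this abstract can be sketched as follows. This is a minimal sketch, not the project's servlet code. It assumes each engine's results have already been fetched into an ordered list of URLs. The class name `TopResultsMerger`, the method `mergeTopResults` and the round-robin ordering are illustrative choices: the page gets the first result of each engine, then the second of each, and so on. A URL returned by more than one engine is shown only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from the original project.
class TopResultsMerger {

    /**
     * Takes the first `perEngine` URLs from each engine's ranked list and interleaves them
     * round-robin (1st result of every engine, then 2nd of every engine, ...).
     * A URL already on the page is skipped, so each URL appears once.
     */
    static List<String> mergeTopResults(List<List<String>> engineResults, int perEngine) {
        Set<String> merged = new LinkedHashSet<>(); // keeps insertion order, ignores duplicates
        for (int rank = 0; rank < perEngine; rank++) {
            for (List<String> results : engineResults) {
                if (rank < results.size()) {
                    merged.add(results.get(rank));
                }
            }
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Stand-in result lists; in the project these would come from Google, Yahoo and Bing.
        List<String> engineA = List.of("a.com", "b.com", "c.com");
        List<String> engineB = List.of("b.com", "d.com");
        List<String> engineC = List.of("e.com", "a.com", "f.com");
        System.out.println(mergeTopResults(List.of(engineA, engineB, engineC), 5));
        // prints [a.com, b.com, e.com, d.com, c.com, f.com]
    }
}
```

The per-engine count is a parameter, so the same method serves other page sizes.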

Search engine optimization (SEO) is the process of affecting the visibility of a website or a web
page in a web search engine's unpaid results. ... So, if a page is optimized for Google, it is
optimized for most other search engines.


TABLE OF CONTENTS

CHAPTER TOPIC PAGE NO.

CHAPTER 1 INTRODUCTION 1

CHAPTER 2 RESOURCE REQUIREMENTS 2

2.1 Hardware Requirement

2.2 Software Requirement

CHAPTER 3 DEVELOPMENT SYSTEM (3-5)

3.1 Introduction 3
3.2 Structure 4

3.3 Iterative Model Application 5

3.4 Register on website search engine 5

CHAPTER 4 PHP (6-11)

4.1 Meta search engine 6

4.2 Crawler-based search engine 7

4.3 Storing & Web content 8


CHAPTER 5 HTML (12-13)
5.1 Introduction 12

5.2 HTML Attribute 12

5.3 HTML Tags 13

CHAPTER 6 CSS (14-16)


6 Introduction 14

6.1 Advantages of CSS 15


6.2 Modules of CSS 15

6.3 CSS Version 16

6.4 CSS-Syntax 16

CHAPTER 7 INTRODUCTION TO JAVASCRIPT (17-19)

7.1 JavaScript 17

7.2 Advantage of JavaScript 18

7.3 JavaScript Development Tools 18

7.4 JavaScript HTML DOM 19

CHAPTER 8 DATA FLOW DIAGRAM (23-25)

9.1 0-Level DFD 23

9.2 1-Level DFD 24

9.3 2-Level DFD 25

CHAPTER 10 WEB PAGES & CODES (26-70)


10.1 Login Page (26-33)
10.2 Index Page (34-49)

CHAPTER 11 CONCLUSION (74-75)

REFERENCES


LIST OF FIGURES

S.No Topic

1.1 Crawling
1.2 Basic topics
1.3 Indexing
1.4 Storage
1.5 Data Flow
1.6 Login page
1.7 Index page
1.8 Topic entry
1.9 Google Search
1.10 Results


LIST OF TABLES


CHAPTER 1

INTRODUCTION

Introduction:-
When we want to find something on the web, we turn to a search engine. Sites such as
Google, MSN and Yahoo! let you search for websites that contain information pertinent to
topics of interest to you. Potential visitors looking for your site are going to do the same thing.
This makes it imperative that your site ranks high enough for important keywords that visitors
can find it. Knowing which keywords are important means knowing what visitors are looking
for when they find your site.

Search engines are the most common tool for promoting a website. The web also creates new
challenges for information retrieval: the amount of information on the web is growing rapidly,
as is the number of new users inexperienced in the art of web research. People are likely
to surf the web using its link graph, often starting with high-quality, human-maintained indices
such as Yahoo! or with search engines.
Human-maintained lists cover popular topics effectively but are subjective, expensive to build
and maintain, slow to improve, and cannot cover all esoteric topics. Automated search engines
that rely on keyword matching usually return too many low-quality matches.
To make matters worse, some advertisers attempt to gain people's attention by taking
measures meant to mislead automated search engines. Google's founders, Sergey Brin and
Lawrence Page, built a large-scale search engine that addresses many of the problems of existing
systems. It makes especially heavy use of the additional structure present in hypertext to provide
much higher-quality search results. They chose the name Google because it is a common spelling
of googol, or 10^100, which fits well with the goal of building very large-scale search engines.
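One important part of the "additional structure present in hypertext" is the link graph. A well-known way of exploiting it is PageRank, the link-analysis score described by Google's founders. Below is a minimal power-iteration sketch over a tiny in-memory graph. The class name, the damping factor of 0.85 (a commonly quoted value) and the fixed number of iterations are assumptions for illustration. A real engine combines link scores with many other signals.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of PageRank by power iteration.
class PageRankSketch {

    /**
     * links.get(i) holds the indices of the pages that page i links to.
     * Returns one score per page; the scores always sum to 1.
     */
    static double[] pageRank(List<int[]> links, double damping, int iterations) {
        int n = links.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n); // share from the "random jump"
            for (int page = 0; page < n; page++) {
                int[] out = links.get(page);
                if (out.length == 0) {
                    // A page with no outgoing links spreads its score over all pages.
                    for (int j = 0; j < n; j++) {
                        next[j] += damping * rank[page] / n;
                    }
                } else {
                    // Otherwise its score is split evenly among the pages it links to.
                    for (int target : out) {
                        next[target] += damping * rank[page] / out.length;
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Page 0 links to pages 1 and 2, page 1 links to 2, page 2 links back to 0.
        List<int[]> links = List.of(new int[] {1, 2}, new int[] {2}, new int[] {0});
        System.out.println(Arrays.toString(pageRank(links, 0.85, 50)));
    }
}
```

In the example graph, page 2 receives links from both other pages and ends up with the highest score (about 0.40). Page 0 follows at about 0.39, and page 1 is lowest at about 0.21.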

Search engines use software programs to search their indexes for matching keywords and
phrases, presenting their findings to you in some kind of relevance ranking. Although these
programs may be similar, no two search engines are exactly the same in terms of size, speed and
content. No two search engines use exactly the same ranking schemes, and not every search
engine offers exactly the same search options. Therefore, your results will differ on every engine
you use. The difference may not be large, but it could be significant. Recent estimates put search
engine overlap at approximately 60 percent and unique content at around 40 percent.
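Ranking schemes differ from engine to engine, but a common textbook baseline for relevance ranking is TF-IDF. A page scores higher when it contains the query words often (term frequency) and when those words are rare across the collection (inverse document frequency). The sketch below assumes simple lowercase tokenization and the plain score tf × log(N / df). It illustrates the idea only and is not the ranking used by any engine named above; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical TF-IDF ranker for illustration only.
class TfIdfRanker {

    // Lowercases the text and splits it on non-word characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns document indices ordered from most to least relevant to the query. */
    static List<Integer> rank(List<String> docs, String query) {
        int n = docs.size();
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>(); // term -> number of documents containing it
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            tokenized.add(tokens);
            for (String term : new HashSet<>(tokens)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        double[] scores = new double[n];
        for (String term : new HashSet<>(tokenize(query))) {
            int df = docFreq.getOrDefault(term, 0);
            if (df == 0) {
                continue; // the term occurs in no document
            }
            double idf = Math.log((double) n / df);
            for (int d = 0; d < n; d++) {
                long tf = tokenized.get(d).stream().filter(term::equals).count();
                scores[d] += tf * idf;
            }
        }
        List<Integer> order = new ArrayList<>();
        for (int d = 0; d < n; d++) {
            order.add(d);
        }
        // Highest score first; List.sort is stable, so ties keep their original order.
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Java servlet tutorial",
                "Search engine in Java",
                "Search engine ranking and search engine index");
        System.out.println(rank(docs, "search engine")); // prints [2, 1, 0]
    }
}
```

Document 2 uses both query words twice, so it outranks document 1, which uses each once. Document 0 contains neither word.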


CHAPTER 2

SEARCH ENGINE OPERATIONS

A search engine operates in the following order:-


1. Web crawling
2. Indexing
3. Searching

Web search engines work by storing information about many web pages, which they retrieve
from the HTML itself. These pages are retrieved by a web crawler (sometimes also known as a
spider), an automated web browser that follows every link on the site. Site owners can exclude
pages from crawling by using a robots.txt file.
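The crawling loop described above can be sketched as a breadth-first traversal. To keep the example self-contained and runnable offline, a hypothetical in-memory map from URL to outgoing links stands in for fetching pages over HTTP. Robots.txt exclusions are modelled as a plain list of disallowed path prefixes. Real robots.txt files also have per-user-agent groups, Allow rules and wildcards, which this sketch ignores.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical crawler sketch; the in-memory "web" replaces real HTTP fetching.
class SimpleCrawler {

    /**
     * Breadth-first crawl from `seed`. The `web` map stands in for fetching pages:
     * it maps each URL to the links found on that page. Pages whose path starts with a
     * disallowed prefix are never visited, as a robots.txt "Disallow" rule would require.
     */
    static List<String> crawl(Map<String, List<String>> web, String seed,
                              List<String> disallowedPrefixes, int maxPages) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // URLs already queued, so none is queued twice
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (isDisallowed(url, disallowedPrefixes)) {
                continue; // excluded by the (simplified) robots.txt rules
            }
            visited.add(url); // "fetch" the page
            for (String link : web.getOrDefault(url, List.of())) {
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    static boolean isDisallowed(String url, List<String> disallowedPrefixes) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = Map.of(
                "http://example.com/", List.of("http://example.com/about", "http://example.com/private/data"),
                "http://example.com/about", List.of("http://example.com/", "http://example.com/blog"),
                "http://example.com/blog", List.of());
        System.out.println(crawl(web, "http://example.com/", List.of("/private/"), 10));
        // prints [http://example.com/, http://example.com/about, http://example.com/blog]
    }
}
```

The `seen` set is what stops the crawler from looping forever on pages that link back to each other, such as the home and about pages in the example.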
The contents of each page are then analyzed to determine how it should be indexed (for
example, words are extracted from the titles, headings, or special fields called meta tags). Data
about web pages is stored in an index database for use in later queries. A query can be as short
as a single word. The purpose of an index is to allow information to be found as quickly as
possible. Some search engines, such as Google, store all or part of the source page (referred to
as a cache) as well as information about the web pages, whereas others, such as AltaVista, store
every word of every page they find.
This cached page always holds the actual search text, since it is the version that was actually
indexed. It can therefore be very useful when the content of the current page has been updated
and the search terms are no longer in it.
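The index database described above is commonly organised as an inverted index, which maps each word to the pages containing it. A one-word query then becomes a single lookup. The following minimal in-memory sketch assumes each page arrives as a URL, a title and body text. It extracts words by lowercasing and splitting on non-word characters, and it keeps a cached copy of the text alongside the index. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index with a page cache.
class InvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>(); // word -> URLs containing it
    private final Map<String, String> cache = new HashMap<>();         // URL -> stored copy of the page text

    /** Indexes every word of the page's title and body, and keeps a cached copy of the text. */
    void addPage(String url, String title, String body) {
        cache.put(url, title + "\n" + body);
        for (String word : (title + " " + body).toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new TreeSet<>()).add(url);
            }
        }
    }

    /** Returns the URLs of pages containing the word (an empty set if there are none). */
    Set<String> lookup(String word) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(word.toLowerCase(), Collections.emptySet()));
    }

    /** Returns the cached text of a page, or null if the page was never indexed. */
    String cachedCopy(String url) {
        return cache.get(url);
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.addPage("http://a.example/", "Java Servlets", "Servlets run on a web server.");
        index.addPage("http://b.example/", "Search Engine", "A web search engine uses an index.");
        System.out.println(index.lookup("web"));      // prints [http://a.example/, http://b.example/]
        System.out.println(index.lookup("servlets")); // prints [http://a.example/]
    }
}
```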
Most search engines support the Boolean operators AND, OR and NOT to further specify the
search query. Boolean operators are for literal searches and allow the user to refine and
extend the terms of the search.
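Over an inverted index, the three Boolean operators reduce to set operations on the lists of matching pages. AND is intersection, OR is union, and NOT (used as "a AND NOT b") is difference. The sketch assumes the page lists are already available as sets of page IDs; the sample postings and the method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical Boolean query helpers over sets of page IDs.
class BooleanQuery {

    /** AND: pages that contain both terms (set intersection). */
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** OR: pages that contain either term (set union). */
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    /** a AND NOT b: pages that contain the first term but not the second (set difference). */
    static Set<Integer> andNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical postings: word -> IDs of the pages containing it.
        Map<String, Set<Integer>> index = Map.of(
                "java", Set.of(1, 2, 3),
                "servlet", Set.of(2, 3),
                "oracle", Set.of(3, 4));
        System.out.println(and(index.get("java"), index.get("servlet")));   // prints [2, 3]
        System.out.println(or(index.get("java"), index.get("oracle")));     // prints [1, 2, 3, 4]
        System.out.println(andNot(index.get("java"), index.get("oracle"))); // prints [1, 2]
    }
}
```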


CHAPTER 3
CRAWLER-BASED SEARCH ENGINE

Crawler-based search engines have three major elements. First is the spider, also called the
crawler. The spider visits a web page, reads it, and then follows links to other pages within the
site. This is what it means when someone refers to a site being "spidered" or "crawled."

Generally speaking, you'll find two types of search engine:

1. Crawler-based
2. Metasearch

Crawler-based Search Engines

Crawler-based search engines are what most of us are familiar with, mainly because that's
what Google and Bing are. These companies develop their own software that enables them to
build and maintain searchable databases of web pages (the engine), and to organize those pages
in the way that is most valuable and pertinent to the user.

They are called crawler-based because their software crawls the web like a spider,
automatically updating and adding new pages to its search index as it goes.

You can think of these as both the car (what you see and what you use) and the engine that
moves you to your destination. They are notoriously difficult and expensive to build from
scratch, and you have to be just a little bit crazy to start one! :)

Notable/Web Scale Crawlers (English-language):

• Google (USA)
• Bing (USA)
• Gigablast (USA)
• Yandex (Russia)
• Exalead (France)
• Mojeek (UK)

And yes, Mojeek is a crawler-based search engine.

Metasearch Engines

If crawler-based search engines are the car, then you could think of metasearch engines as the
caravans being towed behind. These search engines don't have the arduous task of developing
the required technology (the engine) and depend upon the crawlers to build their service on. In
many cases they bring in results from multiple search engines with the intention of delivering
better results. Further, they usually concentrate on front-end technologies such as user
experience and novel ways of displaying the information.

Incidentally, most search engines come under this category, with DuckDuckGo being perhaps
the best example; Ixquick and Unsubtle are two others also worth checking out.

Crawler vs. Meta

So if the end-user experience is ostensibly the same, why should you care about what type of
search engine you're using? First, let's start with Wikipedia's definition of the word 'meta' to
learn about what metasearch engines can't do:

Meta (from the Greek preposition μετά): "after", "beyond", "adjacent", "self".

Once again, the caravan analogy is apt. More specifically, metasearch engines can only use
the limited data accessed from the crawler engines to rearrange the results. They don't have the
capacity to identify and discriminate between ranking factors [2].

Without being in the driver's seat, your experience is ultimately directed by the whim of
underlying competitors [3]. Moreover, the crawler search engine could decide to stop supplying
them with results at any time, perhaps after seeing them as a threat or otherwise deciding not to
collaborate anymore. Possibly worse still, without an engine of their own, their business model is
far easier for new entrants to the marketplace to replicate. So not only are they controlled by
other companies, but they are also at greater risk of being replaced by a new, shinier caravan!

The Focus of Metasearch

Naturally we're biased, but impartiality is an attractive quality in business, so if you value user
experience above all else, then we absolutely suggest you try out a metasearch engine. It all
comes down to what's important to you.

User experience is becoming far more important as expectations in design continue to increase.
Time saved by not developing and maintaining search technology or indexes of the web is time
that metasearch engines can allocate to the look and feel of their website. (Indeed, this is only an
advantage against crawler engines with smaller teams.)

The Importance of Competition

I'd like to finish up the article with a concern for the growing domination of just a select few
crawler-based search engines, none of which are from the United Kingdom.

There is perhaps no better example of a monopoly today than Google. Given that most
of its products are free to users, the opportunity for competition is narrowed and, worse
still, this masks the issue from the people using its services.

Too much power economically has left behind it a dismal path [4]. And in business, without
consumer-choice, the powerful are incentivized to exploit their position [5]. More generally,
corporate monopolies threaten to reach a point where they become liabilities themselves, and
present dire scenarios for society at large [6].

As Nassim Taleb notes in his bestselling book Antifragile, "Small is beautiful, but it is also
efficient" [7]. 'Small' owes itself to choice, and 'efficiency' becomes inefficiency once you become
too powerful. Having options, in a word, matters.

I hope this article explained the difference between crawler and Meta engines and why it
matters, but if not, please get in touch.

• Metasearch engines are also not allowed to redistribute the results supplied by the
crawler engine, which prevents them from supplying a full search API to potential clients,
an area we believe will be important in the future of search and global knowledge
systems.
• The "too big to fail" theory asserts that certain financial institutions are so large and so
interconnected that their failure would be disastrous to the economy, and they therefore
must be supported by government when they face difficulty -
http://en.wikipedia.org/wiki/Too_big_to_fail
• Google Finance isn't the most popular finance site; according to comScore, Yahoo
Finance claims that title, and indeed comScore puts Google Finance in position #60 (as
of April 2010). Nonetheless, the three most prominent links all promote Google's
in-house finance service - http://www.benedelman.org/hardcoding/
• Google is now "too big to fail", as indicated by the recent DOJ investigation which could
have resulted in a felony charge for their co-founder, and most certainly would have for
a smaller firm without $500m of liquid cash - http://www.seobook.com/too-big-to-fail.


CHAPTER 4
HYBRID SEARCH ENGINE

A hybrid search engine (HSE) is a type of computer search engine that uses different types of
data, with or without ontologies, to produce algorithmically generated results based on web
crawling. ... Hybrid search engines use a combination of both crawler-based results and
directory results. More and more search engines these days are moving to a hybrid-based
model.


Metasearch engine

A metasearch engine (or aggregator) is a search tool that uses other search engines' data to
produce its own results from the Internet. Metasearch engines take input from a user and
simultaneously send out queries to third-party search engines for results. Sufficient data is
gathered, formatted according to rank and presented to the user.
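The step of simultaneously sending the query to third-party engines can be sketched with Java's `CompletableFuture` (Java 9 or later, for `completeOnTimeout`). In this hypothetical sketch each engine is a plain `Function` from query to result list. A real client would wrap an HTTP call to the engine, which is not shown. One design choice is deliberate: an engine that fails or exceeds the timeout simply contributes an empty list, so one slow or unavailable engine does not hold up the whole results page.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical fan-out helper; the engines are stand-in functions, not real clients.
class MetaSearchFanOut {

    /**
     * Sends the same query to every engine at once and waits for all of them.
     * An engine that throws, or does not answer within the timeout, contributes an empty list.
     */
    static Map<String, List<String>> queryAll(Map<String, Function<String, List<String>>> engines,
                                              String query, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, engines.size()));
        try {
            Map<String, CompletableFuture<List<String>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Function<String, List<String>>> engine : engines.entrySet()) {
                CompletableFuture<List<String>> future = CompletableFuture
                        .supplyAsync(() -> engine.getValue().apply(query), pool)
                        .completeOnTimeout(List.of(), timeoutMillis, TimeUnit.MILLISECONDS)
                        .exceptionally(error -> List.of()); // a failing engine contributes nothing
                pending.put(engine.getKey(), future);
            }
            Map<String, List<String>> results = new LinkedHashMap<>();
            for (Map.Entry<String, CompletableFuture<List<String>>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().join());
            }
            return results;
        } finally {
            pool.shutdownNow(); // stop any engine call still running after a timeout
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for real engine clients.
        Map<String, Function<String, List<String>>> engines = new LinkedHashMap<>();
        engines.put("engineA", q -> List.of("a1?q=" + q, "a2?q=" + q));
        engines.put("engineB", q -> List.of("b1?q=" + q));
        engines.put("broken", q -> { throw new IllegalStateException("engine unavailable"); });
        System.out.println(queryAll(engines, "java", 2000));
        // prints {engineA=[a1?q=java, a2?q=java], engineB=[b1?q=java], broken=[]}
    }
}
```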

Metasearch engines have their own set of unique problems. Because each search engine stores a
different collection of websites, combining them draws in irrelevant content, and problems such
as spamming reduce result accuracy [3]. The process of fusion aims to tackle this issue and
improve the engineering of a metasearch engine.

Operation:-
A metasearch engine accepts a single search request from the user. This search request is then
passed on to other search engines' databases. A metasearch engine does not create a database of
web pages but generates a virtual database to integrate data from multiple sources. Since every
search engine is unique and has different algorithms for generating ranked data, the combined
results will also contain duplicates. To remove the duplicates, the metasearch engine processes
this data and applies its own algorithm. A revised list is produced as the output for the user.
When a metasearch engine contacts other search engines, these search engines will respond in
three ways:

• They may cooperate fully and provide complete access to their interface for the metasearch
engine, including private access to the index database, and inform the metasearch
engine of any changes made to the index database.
• The search engine can be completely hostile, refusing the metasearch engine any access
to its database and, in serious circumstances, taking legal action.
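The duplicate-removal and re-ranking steps described under Operation can be illustrated with reciprocal rank fusion (RRF). RRF is a simple published fusion method, substituted here for illustration; the text does not say which algorithm the project itself uses. Each URL scores the sum of 1 / (k + rank) over the engines that returned it. Pages that several engines rank highly therefore rise to the top, and every URL appears only once. The constant k = 60 is the value usually quoted for RRF and is an assumption here, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical result-fusion helper using reciprocal rank fusion.
class ReciprocalRankFusion {

    /**
     * Fuses several ranked URL lists into one. Each URL scores the sum of 1 / (k + rank)
     * over the lists that contain it (rank starts at 1), and each URL appears once in the output.
     */
    static List<String> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> score = new HashMap<>();
        for (List<String> list : rankedLists) {
            Set<String> seenInList = new HashSet<>();
            for (int i = 0; i < list.size(); i++) {
                String url = list.get(i);
                if (seenInList.add(url)) { // count only the first occurrence within one list
                    score.merge(url, 1.0 / (k + i + 1), Double::sum);
                }
            }
        }
        List<String> fused = new ArrayList<>(score.keySet());
        fused.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a)); // higher score first
            return byScore != 0 ? byScore : a.compareTo(b);           // ties: alphabetical order
        });
        return fused;
    }

    public static void main(String[] args) {
        List<String> engineA = List.of("a.com", "b.com", "c.com");
        List<String> engineB = List.of("b.com", "a.com", "d.com");
        List<String> engineC = List.of("b.com", "e.com");
        System.out.println(fuse(List.of(engineA, engineB, engineC), 60));
        // prints [b.com, a.com, e.com, c.com, d.com]
    }
}
```

In the example, b.com wins because all three engines return it near the top, while c.com and d.com tie and fall back to alphabetical order.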


CHAPTER 5
HOW TO USE SEARCH ENGINES

Searching is one of the most common activities on the Internet, and search engines, as the
instruments of searching, are among the most popular and frequently used sites. This is why
webmasters and ordinary Internet users alike need a good knowledge of search engines and
searching.

Webmasters use major search engines both to submit their sites and to search.
Ordinary users use major search engines primarily for searching, and sometimes for submitting
their homepages or small sites.
If you are a webmaster, you will also need some information while preparing your site for the
Web, so you will use search engines as well. Then, when you have finished the site, you must
submit its URL to many search engines. After that you will check your URL's ranking on every search engine...
Every major search engine also carries hot news and much other interesting content... All of
this briefly explains why search engines are so popular.

As a novice user, you must learn how to use search engines to search the Internet.
There are two ways of searching: by entering a query or by browsing categories. If you have
keywords or a phrase that best describes the theme you need, you should enter a query. But if
you need a theme and don't have keywords or a phrase, you should use categories.
If you enter a query, type the keyword or phrase into the search form and click "search".
You will then get search results, and you can choose the URL that seems best to you. If
you use categories, click on the category that best describes the theme you need. You
will then get subcategories, and you should choose the subcategory that best describes the
theme you are after.


Repeat this action until you find a group of URLs whose content is related to the theme you want.

As a webmaster, you must submit your URL to all major search engines. This is the way to promote
your site: you could get many visits from major search engines if your URL ranks well. We made a
page with URLs that take you to the submission forms of the major search engines, so you will not
lose valuable time searching for these forms; all of them are on one page.

How to Use Search Engines

• A word in a search will locate documents which definitely contain the word.
• A “-” before a word will exclude that word from the search.
• Placing words between quotation marks will search for the exact phrase between the quotes.
• Using “or” between words will return documents that contain either word.
• A “+” before each term of a search phrase will search for each term separately.
Example
• +BLACK+BLUE: The search results will contain documents which contain the word black and
the word blue.
• BLACK-BLUE: Those documents will be returned which contain the word black but not the
word blue.
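The operator rules above can be sketched as a small query parser and matcher. This is a simplified, hypothetical model. Plain words, "+" words and quoted phrases are all treated as required, and "-" words and phrases as excluded. A document matches when it contains every required term and no excluded one. The "or" operator is left out to keep the sketch short, and real engines apply these operators in their own ways.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for "+word", "-word" and "quoted phrase" operators.
class OperatorQuery {
    // An optional + or -, followed by either a "quoted phrase" or a single word.
    private static final Pattern TOKEN = Pattern.compile("([+-]?)(?:\"([^\"]*)\"|(\\w+))");

    final List<String> required = new ArrayList<>(); // plain words, +words and "phrases"
    final List<String> excluded = new ArrayList<>(); // -words and -"phrases"

    static OperatorQuery parse(String query) {
        OperatorQuery q = new OperatorQuery();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String term = normalize(m.group(2) != null ? m.group(2) : m.group(3));
            if (term.isEmpty()) {
                continue; // e.g. an empty pair of quotes
            }
            if (m.group(1).equals("-")) {
                q.excluded.add(term);
            } else {
                q.required.add(term);
            }
        }
        return q;
    }

    /** True if the document contains every required term and no excluded one. */
    boolean matches(String document) {
        String text = " " + normalize(document) + " "; // padding lets us match whole words only
        for (String term : required) {
            if (!text.contains(" " + term + " ")) {
                return false;
            }
        }
        for (String term : excluded) {
            if (text.contains(" " + term + " ")) {
                return false;
            }
        }
        return true;
    }

    // Lowercases the text and reduces it to words separated by single spaces.
    static String normalize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String doc = "Black and blue paint";
        System.out.println(parse("+BLACK+BLUE").matches(doc));    // prints true
        System.out.println(parse("BLACK-BLUE").matches(doc));     // prints false
        System.out.println(parse("\"blue paint\"").matches(doc)); // prints true
    }
}
```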


Challenges faced by Search Engines

• The web is growing much faster than any present-technology search engine can possibly
index.
• Many web pages are updated frequently, which forces the search engine to revisit them
periodically.
• The queries one can make are currently limited to searching for keywords, which may
result in many false positives.
• Dynamically generated sites may be slow or difficult to index, or may result in
excessive results from a single site.
• Many dynamically generated sites are not indexable by search engines; this
phenomenon is known as the invisible web.
• Some search engines do not order the results by relevance, but rather according
to how much money the sites have paid them.
• Some sites use tricks to manipulate the search engine into displaying them as the first
result returned for some keywords. This can lead to some search results being
polluted, with more relevant links being pushed down in the result list.
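The second challenge, revisiting frequently updated pages, is usually handled with a revisit schedule. The sketch below shows one simple, hypothetical policy. Every page has its own revisit interval, shorter for pages assumed to change often. A priority queue always hands the crawler the page whose next visit is due soonest. Time is measured in abstract units, and the intervals in the example are illustrative assumptions.

```java
import java.util.PriorityQueue;

// Hypothetical recrawl scheduler: each page is revisited at its own fixed interval.
class RecrawlScheduler {

    // One page in the schedule: when it is next due and how often it should be revisited.
    private static final class Entry {
        final String url;
        final long dueAt;
        final long interval;

        Entry(String url, long dueAt, long interval) {
            this.url = url;
            this.dueAt = dueAt;
            this.interval = interval;
        }
    }

    // Earliest due time first; equal times are ordered by URL so the schedule is deterministic.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
            (a, b) -> a.dueAt != b.dueAt ? Long.compare(a.dueAt, b.dueAt) : a.url.compareTo(b.url));

    /** Adds a page whose first visit is due at time `now`; `interval` must be positive. */
    void add(String url, long now, long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        queue.add(new Entry(url, now, interval));
    }

    /** Returns the next page due at or before `now` and reschedules it, or null if none is due. */
    String nextDue(long now) {
        Entry head = queue.peek();
        if (head == null || head.dueAt > now) {
            return null;
        }
        queue.poll();
        queue.add(new Entry(head.url, now + head.interval, head.interval));
        return head.url;
    }

    public static void main(String[] args) {
        RecrawlScheduler scheduler = new RecrawlScheduler();
        scheduler.add("http://example.com/news", 0, 1);  // assumed to change often
        scheduler.add("http://example.com/about", 0, 5); // assumed to change rarely
        for (long t = 0; t <= 5; t++) {
            for (String url = scheduler.nextDue(t); url != null; url = scheduler.nextDue(t)) {
                System.out.println("t=" + t + ": recrawl " + url);
            }
        }
    }
}
```

In the example, the news page is recrawled at every time step, while the about page is recrawled only at t=0 and t=5.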
