
Unit 1

Search Engine Optimisation

Structure
1.1 Introduction to Search Engines

1.2 History of Search Engines

1.3 Types of Search Engines

1.4 Categories of Search Engines

Summary
Key Words
Self-Assessment Questions
Answers to Check your Progress
Suggested Reading

Objectives
After going through this unit, you will be able to:

• Understand the basics of search engines

• Understand the history of search engines

• Learn about various approaches to search engines

• Know about the types of search engines

• Learn about search engine categories

1.1 INTRODUCTION TO SEARCH ENGINES


The function of a search engine is to retrieve requested information from the enormous collection of resources available on the internet. Search engines have become an essential everyday tool for finding information without knowing precisely where it is stored. Internet use has grown enormously in recent years, helped by easy-to-use search engines such as Google.

Search Engines
A search engine is an information retrieval program that discovers, crawls, converts and stores information so that it can be retrieved and managed in response to user queries.

In general, a search engine consists of four parts: a search interface, a crawler, an indexer and a database. The crawler navigates a collection of documents, deconstructs their text and assigns surrogates for storage in the search engine index. Online search engines may also store images and associate data and metadata with each document.

A web search engine is a website that helps a user discover information on the Internet. It accomplishes this by looking through other web pages for the text the user wants to find. Instead of the user having to visit each page in turn, the task can be accomplished with a web browser and a search engine.

To use a search engine, at least one keyword must be entered in the search box. Usually an on-screen button is provided, which is clicked to submit the search query. The search engine then looks for matches between the entered keyword(s) and its database of websites and words.

Once a search is submitted, the results appear on the screen. The web page that shows the results is called the search engine results page (SERP). The SERP is a list of web pages that match the keywords that were searched.

The SERP typically displays the name, a short description and a hyperlink for each matching web page. By clicking any of the links, the user can navigate to the corresponding website.

Search engines can be considered among the most advanced websites on the web. Each uses its own ranking code to order the web pages on a SERP; in general, the most popular or highest-quality pages appear near the top of the list.

When a user types words into a search engine, it looks for web pages containing those words. There may be thousands, or even millions, of such pages, so search engines assist users by ranking the pages according to what they estimate the user wants most.

1.2 HISTORY OF SEARCH ENGINES


Internet search engines themselves predate the debut of the Web in December 1990. The first well-documented search engine was Archie, launched on 10 September 1990, which searched the content files of FTP sites.

Prior to September 1993, the World Wide Web (WWW) was indexed entirely by hand. There was a list of web servers edited by Tim Berners-Lee and hosted on the CERN web server.

The Archie program downloaded the directory listings of all files located on public anonymous FTP (File Transfer Protocol) sites, creating a database searchable by file name. The limitation of the Archie search engine was that it did not index the contents of these sites, since the amount of data was so limited that it could readily be searched manually.

Gopher, created in 1991 by Mark McCahill at the University of Minnesota, led to two popular new search programs: Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) and Jughead (Jonzy's Universal Gopher Hierarchy Excavation and Display). Like Archie, Veronica and Jughead searched the file names and titles stored in Gopher index systems. Veronica provided a keyword search of most Gopher menu titles across the entire Gopher listings, while Jughead was a tool for obtaining menu information from particular Gopher servers.

As of mid-1993, no search engine yet existed for the web, although several specialised catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that periodically mirrored these pages and rewrote them into a standard format. This formed the basis for W3Catalog, the web's first primitive search engine, released on September 2, 1993.

In June 1993, Matthew Gray, then at MIT, produced the earliest web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called 'Wandex'. The main purpose of the Wanderer was to measure the size of the World Wide Web. In November 1993, Aliweb, the web's second search engine, appeared. Aliweb did not use a web robot; instead, it depended on site administrators notifying it of the existence at each site of an index file in a particular format.

NCSA's Mosaic was not the first web browser, but it was the first one to make a major splash. Released in November 1993, Mosaic version 1.0 included a variety of features such as bookmarks, icons, pictures and a more eye-catching interface, all of which made the software easy and attractive to use.

JumpStation, created in December 1993 by Jonathon Fletcher, used a web robot to locate web pages and build its index, and used a web form as the interface to its query program. It was the first resource-discovery tool of the World Wide Web to combine the three essential features of a web search engine: crawling, indexing and searching.

WebCrawler, which appeared in 1994, was one of the first crawler-based "all text" search engines. Unlike its predecessors, it let users search for any word on any web page, which has been the standard for all major search engines ever since. It was also the first search engine widely known to the public. Also in 1994, Lycos was started at Carnegie Mellon University and became a major commercial venture. Many search engines appeared afterwards and gained popularity, among them Excite, Magellan, Infoseek and Yahoo!. Yahoo! was among the most popular ways for people to find web pages of interest, but its search function operated on its web directory rather than on full-text copies of web pages; information seekers could browse the directory instead of performing a keyword-based search.

In 1996, Netscape was looking to give a single search engine an exclusive deal as the featured search engine in the Netscape web browser. There was so much interest that Netscape instead struck deals with five of the major search engines: for $5 million a year, each search engine would appear in rotation on the Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos, Infoseek, and Excite.

Google adopted the idea of selling search terms in 1998 from a small search engine company named goto.com. This move had a momentous effect on the search engine business, which went on to become one of the most profitable businesses on the internet.

A number of companies entered the market spectacularly, recording record gains during their initial public offerings. Some took down their public search engines and marketed enterprise-only editions, such as Northern Light. Many search engine companies were caught up in the dot-com bubble, a speculation-driven market boom that peaked in 1999 and ended in 2001.

Around 2000, Google's search engine rose to prominence. The company achieved better results for many searches with an innovation called PageRank, an iterative algorithm that ranks web pages based on the number and PageRank of the other websites and pages that link to them, on the premise that good or desirable pages are linked to more than others. Google also maintained a minimalist interface to its search engine, whereas many of its competitors embedded their search engines in web portals. As a result, the Google search engine became enormously popular.
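
The iterative character of PageRank is easy to see in a small sketch. The following minimal power-iteration version runs over a hypothetical four-page link graph; the damping factor of 0.85 and the iteration count are illustrative assumptions, not Google's production values.

```python
# A minimal PageRank sketch: power iteration over a tiny made-up
# link graph. Graph, damping factor and iteration count are
# illustrative, not Google's actual parameters.

links = {                      # page -> pages it links to
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start uniformly
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)  # split rank over outlinks
            for target in outlinks:
                new[target] += damping * share
        rank = new
    return rank

for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")               # "c" ranks highest here
```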

By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! later switched to Google's search engine and used it until 2004, when it launched its own search engine based on the combined technologies of its acquisitions.

Microsoft first launched MSN Search in the fall of 1998, using search results from Inktomi. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler, called msnbot.

Microsoft's rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalised a deal under which Yahoo! Search would be powered by Microsoft Bing technology.

1.3 TYPES OF SEARCH ENGINES


Search engines are generally categorised on the basis of how they work: crawler-based search engines, human-powered directories, hybrid search engines and other special search engines.

I) Crawler Based Search Engines

Crawler-based search engines use a crawler (also called a bot or spider) to crawl and index new content into the search database. The four basic steps every crawler-based search engine follows before showing any site in the search results are crawling, indexing, calculating relevancy and retrieving the results.

a) Crawling

Search engines crawl the whole web to fetch the web pages available on it. A piece of software called a crawler (or bot, or spider) performs the crawling. The crawling frequency depends on the search engine, and several days may pass between crawls; this is why search results sometimes show the content of old or deleted pages. The results show the latest content once the search engine crawls the site again.
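
As a rough illustration of this crawl loop, here is a minimal sketch using only the Python standard library. The seed URL is a placeholder, and real crawlers add politeness rules (robots.txt, rate limiting) and far more robust error handling than shown here.

```python
# A minimal crawler sketch: fetch a page, extract its links, and
# follow them breadth-first up to a page limit.

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=10):
    frontier, seen, pages = [seed], set(), {}
    while frontier and len(pages) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except (OSError, ValueError):
            continue                      # skip unreachable or bad URLs
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        # resolve relative links and add them to the frontier
        frontier.extend(urljoin(url, link) for link in parser.links)
    return pages

pages = crawl("https://example.com/")     # placeholder seed URL
print(len(pages), "pages fetched")
```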

b) Indexing

Indexing is the step that follows crawling. It is the process of identifying the words and phrases that best describe a page. The identified words are called keywords, and the page is assigned to them. When the crawler does not recognise the meaning of a page, the site may be ranked lower in the search results; when the crawlers pick up the right keywords, the page is assigned to those keywords and ranked higher in the search results.
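
The core data structure behind indexing is the inverted index, which maps each keyword to the pages containing it. Below is a minimal sketch over two hypothetical pages; real indexers also handle stemming, stop words and keyword weighting.

```python
# A minimal inverted-index sketch: map each word to the set of pages
# containing it. The sample pages are made up for illustration.

from collections import defaultdict

pages = {
    "page1": "search engines crawl and index the web",
    "page2": "an index maps keywords to pages",
}

def build_index(pages):
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)          # word -> pages containing it
    return index

index = build_index(pages)
print(sorted(index["index"]))             # ['page1', 'page2']
```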

c) Calculating Relevancy

The search engine compares the search string in the search request with the pages indexed in its database. Since more than one page may contain the search string, the search engine computes the relevancy of each page in its index to that string. Numerous algorithms are available for determining relevancy, and each assigns different relative weights to common factors such as keyword density, links or meta tags. This is why different search engines return different results pages for the identical search string. It is also well known that every major search engine periodically modifies its algorithms.
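
As a toy example of one of the factors named above, the sketch below scores two hypothetical documents by keyword density alone. Real engines combine many such weighted signals.

```python
# A minimal relevancy sketch: score documents by keyword density,
# i.e. the fraction of words matching the query term.

docs = {
    "page1": "search engines rank pages by relevancy to the query",
    "page2": "a crawler fetches pages so the engine can index pages",
}

def keyword_density(text, keyword):
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words)

query = "pages"
for url, text in docs.items():
    print(url, round(keyword_density(text, query), 3))
```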

d) Retrieving Results

Retrieving results is the last step. It essentially consists of displaying the matched pages in the browser, ordered from the most relevant to the least relevant site. The majority of popular search engines are crawler based and use the technology above to display search results. Examples of crawler-based search engines are Google, Yahoo! and Bing; other well-known ones are AOL, DuckDuckGo and Ask.
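
Putting the last two steps together, retrieval reduces to sorting pages by their relevancy scores and returning the top entries, as in this sketch over made-up scores.

```python
# A minimal retrieval sketch: sort hypothetical (page, score) pairs
# and return the top-k results, as a SERP would display them.

scores = {"page1": 0.42, "page2": 0.91, "page3": 0.17}   # made-up scores

def top_results(scores, k=2):
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [page for page, _ in ranked[:k]]

print(top_results(scores))   # ['page2', 'page1']
```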

II) Human Powered Directories

Human-powered directories, also called open directory systems, depend on human activity for their listings. Indexing in human-powered directories works as follows:

• The owner of a site submits a short description of the site to the directory, along with the category in which it is to be listed.

• The submitted site is manually evaluated and then either added to the suitable category or rejected for listing.

• The keywords entered in a search box are matched against the descriptions of the sites (a sketch of this matching appears below). Changes made to the content of the web pages are therefore not taken into account, as it is only the description that matters.

In general, a good site with good-quality content is more likely to be reviewed for free than a site with poor content. Examples of human-powered directories are DMOZ and the Yahoo! Directory.
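
Here is a minimal sketch of the description matching referred to above: query terms are compared against hand-written site descriptions, never against page content. The directory entries are hypothetical.

```python
# A minimal directory-search sketch: match query keywords against
# human-written site descriptions only.

directory = {
    "example.com": "tutorials on search engine optimisation",
    "news.example": "daily technology news and reviews",
}

def search_directory(directory, query):
    terms = set(query.lower().split())
    return [site for site, desc in directory.items()
            if terms & set(desc.lower().split())]

print(search_directory(directory, "optimisation tutorials"))
```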

III) Hybrid Search Engines

Hybrid search engines make use of both crawler-based and manual indexing for listing websites in search results. Most crawler-based search engines, such as Google, use crawlers as the primary mechanism and human-powered directories as a secondary mechanism. Since human-powered directories are becoming extinct, the hybrid engines are becoming more and more purely crawler based. However, manual filtering of search results still takes place to remove copied and spam websites. When a website is identified as spam, it is the website owner's duty to take the necessary corrective action and resubmit the website to the search engines. Specialists then manually evaluate the resubmitted website before including it in the search results again. In this way, even though crawlers manage the processes, manual control remains in place to observe and present the search results as expected.

IV) Other Types of Search Engines

Search engines may also be classified into a variety of other types depending on their use. Some search engines maintain different kinds of bots specifically to index videos, images, news, products and local listings. One example is the Google News page, which can be used to search only for news drawn from different newspapers.

Search engines such as Dogpile collect meta information about web pages from other search engines and directories to build their search results, and are therefore known as metasearch engines.
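
A metasearch engine's core job is merging ranked lists from other engines. The sketch below combines two hypothetical result lists with reciprocal rank fusion, a standard rank-fusion heuristic (not necessarily the one Dogpile uses).

```python
# A minimal metasearch sketch: fuse two ranked result lists with
# reciprocal rank fusion (score = sum of 1/(k + rank) over lists).

from collections import defaultdict

engine_a = ["page1", "page2", "page3"]    # made-up ranked results
engine_b = ["page2", "page4", "page1"]

def fuse(rankings, k=60):
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, page in enumerate(ranking, start=1):
            scores[page] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

print(fuse([engine_a, engine_b]))   # page2 and page1 rise to the top
```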

Swoogle is a semantic search engine that tries to present precise search results in a particular area by taking into account the contextual meaning of the search queries.

Check your Progress 1


Fill in the Blanks.

1. Human-powered directories are also called ______________.

2. ______________ make use of both crawler-based and manual indexing for listing websites in search results.

1.4 CATEGORIES OF SEARCH ENGINES


a) Web Search Engines

Search engines designed specifically to search web pages, images and documents were developed to facilitate searching through a large, nebulous mass of unstructured resources. They are engineered to follow a multi-stage process: crawling the endless accumulation of pages and documents to skim the figurative foam from their contents, indexing the key words and phrases in a semi-structured form, and finally resolving user queries to return the most relevant results, with links to the scanned pages or documents in the inventory.

b) Crawl

In the case of a wholly textual search, the first step in categorising web pages is to find an 'index item' that relates expressly to the 'search term'. In the past, search engines started with a small list of Uniform Resource Locators (URLs) called a seed list, fetched the content, and parsed those pages for links to related information, which in turn supplied new links. The process was highly cyclical and continued until enough pages had been found for the searcher's use. Nowadays, a continuous crawl method is employed, in contrast to incidental discovery based on a seed list. The crawl method is an extension of the aforementioned discovery method, except that there is no seed list, because the system never stops crawling.

Most search engines use sophisticated scheduling algorithms to decide when to revisit a particular page, in keeping with its importance. These algorithms range from a constant visit interval, with higher priority for more frequently changing pages, to an adaptive visit interval based on several criteria such as frequency of change, popularity and overall quality of the site. The speed of the web server running the page, as well as resource constraints such as the amount of hardware or bandwidth, also figure in.
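
An adaptive revisit schedule can be sketched with a priority queue of (next-visit time, URL) entries, where frequently changing pages come due sooner. The change rates and interval formula below are made-up illustrations.

```python
# A minimal adaptive revisit-scheduler sketch: pages with a higher
# change rate are re-crawled at shorter intervals.

import heapq

change_rate = {"news.example": 0.9, "docs.example": 0.1}   # 0..1, hypothetical

def next_interval(rate, base=24.0):
    # frequently changing pages get a shorter interval (in hours)
    return base * (1.0 - 0.8 * rate)

queue = [(0.0, url) for url in change_rate]
heapq.heapify(queue)

now = 0.0
for _ in range(6):                        # simulate six crawl slots
    due, url = heapq.heappop(queue)
    now = max(now, due)
    print(f"t={now:5.1f}h crawl {url}")
    heapq.heappush(queue, (now + next_interval(change_rate[url]), url))
```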

c) Link map

The pages discovered by web crawls are regularly distributed and fed into another computer that creates a complete map of the resources uncovered. The resulting cluster looks like a large graph, in which the different pages are represented as small nodes connected by the links between the pages. The mass of data is stored in numerous data structures that permit quick access to it by certain algorithms. These algorithms compute a popularity score for each page based on how many links point to it, which is how people can find any number of resources concerned with a given topic. Search engines generally distinguish between internal links and external links. Link map data structures usually also store the anchor text embedded in the links, since anchor text can often provide a very good summary of a web page's content.
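
A link map can be sketched as a pair of adjacency maps: outgoing links with their anchor text, plus a reverse map for counting in-links. The pages and anchors below are hypothetical.

```python
# A minimal link-map sketch: store outgoing links with anchor text
# and a reverse map, so in-link counts can serve as a popularity score.

from collections import defaultdict

outlinks = defaultdict(list)   # page -> [(target, anchor text)]
inlinks = defaultdict(set)     # page -> set of pages linking to it

def add_link(source, target, anchor):
    outlinks[source].append((target, anchor))
    inlinks[target].add(source)

add_link("a.example", "b.example", "search engine basics")
add_link("c.example", "b.example", "SEO tutorial")

# simple popularity score: number of in-links
for page, sources in inlinks.items():
    print(page, "in-links:", len(sources))
```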

d) Database Search Engines

Specialised search engines exist because searching for text-based content in databases presents a few special challenges. Databases can be slow when resolving complex queries. Crawling is not necessary for a database, because the data is already structured; however, it is frequently necessary to index the data in a more economised form to permit a speedier search.
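
For example, SQLite offers the FTS5 extension for exactly this kind of compact full-text index. The sketch below assumes FTS5 is compiled into your Python build (it usually is) and uses made-up rows.

```python
# A minimal database full-text-search sketch using SQLite's FTS5
# extension. FTS5 answers MATCH queries from its own index rather
# than scanning every row.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
con.execute("INSERT INTO docs VALUES ('SEO', 'search engine optimisation basics')")
con.execute("INSERT INTO docs VALUES ('Crawling', 'how crawlers fetch pages')")

for title, in con.execute("SELECT title FROM docs WHERE docs MATCH 'search'"):
    print(title)
```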

e) Mixed Search Engines

At times, a data search covers both database content and web pages or documents, and search engine technology has developed to respond to both sets of requirements. Most mixed search engines are very large web search engines, such as Google, that search through both structured and unstructured data sources. Documents are crawled and indexed in a separate index, while databases are indexed from a variety of sources. Search results are then generated for users by querying these multiple indices in parallel and combining the results according to "rules."
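
The parallel-query-and-merge idea can be sketched as follows, with two hypothetical in-memory indices and a deliberately simple merging rule (documents before database records).

```python
# A minimal mixed-search sketch: query a document index and a database
# index in parallel, then merge the hits under one simple rule.

from concurrent.futures import ThreadPoolExecutor

doc_index = {"seo": ["page1", "page2"]}          # made-up web index
db_index = {"seo": ["record7"]}                  # made-up database index

def query(index, term):
    return index.get(term, [])

def mixed_search(term):
    with ThreadPoolExecutor() as pool:
        docs = pool.submit(query, doc_index, term)
        rows = pool.submit(query, db_index, term)
        return docs.result() + rows.result()     # "rule": docs before records

print(mixed_search("seo"))   # ['page1', 'page2', 'record7']
```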

Check your Progress 2

State True or False.

1. Crawling is not necessary for a database, because the data is already structured.

Summary
• Swoogle is a semantic search engine that tries to present precise search results in a particular area by taking into account the contextual meaning of the search queries.

• At present, a continuous crawl technique is employed, in contrast to incidental discovery based on a seed list.

• A data search can include both database content and web pages or documents.

Keywords
• URL: Uniform Resource Locator. In the past, search engines started with a small list of URLs called a seed list. The seed list was used to fetch content, and the links on those pages were parsed for related information, which in turn supplied new links.

Self-Assessment Questions
1. Explain the categories of Search Engines.

2. Explain the types of Search Engines in detail.

Answers to Check your Progress


Check your Progress 1

Fill in the Blanks.

1. Human-powered directories are also called open directory systems.

2. Hybrid search engines make use of both crawler-based and manual indexing for listing websites in search results.

Check your Progress 2

State True or False.

1. True

Suggested Reading
1. Peter Kent, SEO for Dummies, 6th Edition, John Wiley & Sons.

2. Jason McDonald, SEO Toolbook: 2018 Directory of Free Search Engine Optimization
Tools, Kindle Edition.

3. W. Bruce Croft, Donald Metzler, Trevor Strohman, Search Engines: Information Retrieval in Practice, Pearson Education, Inc.

4. Aaron Matthew Wall, Search Engine Optimization book.
