
103 SM - All - in - One

This document provides an introduction to statistics for management. It discusses the importance and relevance of statistics in modern business environments and decision making. Statistics can be used to solve problems, support decisions, and reduce guesswork. The document also outlines the objectives, history, definitions, scope, applications, characteristics, functions, and limitations of statistics. It provides examples of how statistics are used in various business contexts like accounting, finance, marketing, production, and economics. Descriptive and inferential statistics are also introduced.

Uploaded by

Dilip Goliya

Statistics for Management Unit 1

Unit 1 Introduction to Statistics


Structure:
1.1 Introduction
Objectives
Relevance
Statistics in practice
Importance of statistics in modern business environment
1.2 History of Statistics
1.3 Definition of Statistics
1.4 Scope and Application of Statistics
1.5 Characteristics of Statistics
1.6 Functions of Statistics
1.7 Limitations of Statistics
1.8 Statistical Software
1.9 Summary
1.10 Glossary
1.11 Terminal Questions
1.12 Answers
1.13 Case Study

1.1 Introduction
Statistics plays an important role in almost every facet of human life. In a
business context, managers are required to justify decisions on the basis of
data. They need statistical models to support these decisions. Statistical skills
enable managers to collect, analyse and interpret data in order to make
suitable decisions.
Statistical concepts and statistical thinking enable them to:
• Solve problems in almost every domain
• Support their decisions
• Reduce guesswork
In this unit, you will study Statistics, which deals with gathering,
organising, presenting and analysing data.

Manipal University Jaipur Page No. 1



Objectives:
After studying this unit, you should be able to:
• describe the scope and applications of statistics
• explain the characteristics of statistics
• recognise the functions of statistics
• identify the limitations of statistics
• analyse statistical software

1.1.1 Relevance
Nature created variation, and variation is what gives the subject of
statistics its importance. Statistics exists only because data vary, be it the
height or weight of newborn babies, features such as the face, height or
weight of persons, the growth of companies or market prices. Truly, the capital
Greek letter Σ (sigma), used to indicate the total or sum of numbers, and the
small Greek letter σ (also pronounced sigma), used to measure deviation,
could be labelled the lifeblood of statisticians.
Although nature believes in variation, it believes in symmetrical variation,
such as the weight of newborn babies or the height of individuals, without
any bias. Examples of man-made asymmetrical variation, by contrast, are
educational qualification and household income. The study of Statistics
helps in examining variation in data to find patterns and draw
conclusions.
(Source: Adapted from T. N. Srivastava & Shailaja Rego (2008) Statistics for
Management 5th ed. TMH)

1.1.2 Statistics in Practice


Business Week
Business Week is the most popular business magazine in the world. With its
global presence, it circulates more than 1 million copies around the globe.
Along with feature articles on current topics, the magazine also contains
regular sections on Global Business, Economic Analysis, and Information of
Science & Technology.
Business Week issues provide a detailed report on a topic of current
interest. Often, the detailed report includes statistical facts and conclusions
that help the reader understand the business and economic information
easily. Moreover, the weekly Business Week provides statistics about the
state of the economic system, including production indices, stock prices,
market growth, mutual funds and interest rates.
Business Week also relies on statistical information to help manage its
own business. For example, an annual survey of subscribers helps the
company to learn about subscriber demographics, reading habits, likely
purchases, lifestyles, etc. The Business Week managers depend on the
statistical conclusions from the survey to provide better services to
subscribers and advertisers.
(Source: David R. Anderson, Dennis J. Sweeney & Thomas A. Williams 5th edition,
Thomson Business Information Pvt Ltd.)

1.1.3 Importance of statistics in modern business environment


Due to advanced communication networks, rapid changes in consumer
behaviour, varied expectations of a variety of consumers and new market
openings, modern managers have a difficult time in making quick and
appropriate decisions. Therefore, there is a need for them to depend more
upon quantitative techniques like mathematical models, statistics,
operations research and econometrics.
In this section, there are examples that illustrate some of the uses of
statistics in business and economics.
Accounting
Public accounting firms use statistical sampling procedures when
conducting audits for their clients.
Finance
Financial advisors use a variety of statistical information to guide their
investment recommendations.
Marketing
Electronic scanners at retail checkout counters are being used to collect
data for a variety of marketing research applications.
Production
Today’s emphasis is on quality, which is of utmost importance in
production. A variety of statistical quality control charts are used to monitor
the average output of a production process.


Economics
Economists are frequently asked to provide forecasts about the future of the
economy. They use a variety of statistical information in making such
forecasts. For example, in forecasting the inflation index, economists use
statistical information on indicators such as the producer price index, the
unemployment rate and manufacturing capacity utilisation.

Caselet 1
Mr. Ravi, the new General Manager of a manufacturing company, is
concerned about the dwindling profits of the company. The Marketing and
Production Managers identify the reason as the guarantee period given to
customers, since the product has to be replaced if it fails within the
guarantee period. This replacement lowers the company’s profits and also
causes loss of reputation. The General Manager wants to reduce the
percentage of failure of units within a year. This means that he should
take action to improve the life of the unit. After preliminary studies he
decides to:
i) Estimate the average life of the units and their variation.
ii) Take action to improve the life of the unit.
iii) Lower the replacement cost as much as possible.
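
The first step in the caselet, estimating the average life of the units and their variation, can be sketched in Python. This is only an illustration: the lifetimes below are hypothetical figures invented for the example, and the 12-month guarantee period is an assumption based on the caselet's reference to failures "within a year".

```python
import statistics

# Hypothetical lifetimes (in months) of ten tested units; illustrative only,
# not data from the caselet.
lifetimes = [10, 14, 9, 16, 13, 11, 15, 8, 12, 12]

mean_life = statistics.mean(lifetimes)   # average life of the units
sd_life = statistics.stdev(lifetimes)    # sample standard deviation (variation)

# Share of tested units that failed before an assumed 12-month guarantee period
guarantee_months = 12
failed_share = sum(1 for t in lifetimes if t < guarantee_months) / len(lifetimes)

print("Average life (months):", mean_life)
print("Standard deviation (months):", round(sd_life, 2))
print("Share failing within guarantee:", failed_share)
```

A manager could repeat the same calculation after each improvement to see whether the average life has risen and the share of early failures has fallen.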

As you can see, the General Manager is using Statistics to solve a problem
and to increase profits. Decision making is a key part of our day-to-day life.
Even when we wish to purchase a television, we want to know the price,
quality, durability, and maintainability of various brands and models before
buying one. In this scenario, data is collected and an optimum decision is
made. In other words, we are using Statistics.
Suppose a company wishes to introduce a new product. It has to collect
data on market potential, consumer preferences, availability of raw materials,
and feasibility of producing the product. Hence, data collection is the backbone
of any decision-making process.
Many organisations find themselves data-rich but poor at drawing
information out of the data. Therefore, it is important to develop the ability
to extract meaningful information from raw data, in order to make better
decisions. Statistics plays an important role in this respect.

Statistics is broadly categorised into two parts based on their functions,
namely, Descriptive Statistics and Inferential Statistics. Figure 1.1 illustrates
those two categories.

Statistics

Descriptive Statistics        Inferential Statistics
Collecting                    Making inference
Organising                    Hypothesis testing
Summarising                   Determining relationships
Presenting data               Making predictions

Fig. 1.1: Categories in Statistics

Descriptive Statistics: Descriptive Statistics is used to present the general
description of data which is summarised quantitatively. This is mostly useful
in clinical research, while communicating the results of experiments.

Caselet 2
In a firm, the Human Resources Manager (HR Manager) calculates the
average salary of employees of the production department. The statistical
data collected is related to the production department and does not give
any information about the other departments of the firm. Here, the HR
Manager is using descriptive statistics. In this example, the HR Manager
displays the summarised numerical data in the form of tables, charts, and
diagrams, which come under descriptive statistics.

Inferential Statistics
Inferential Statistics is used to make valid inferences from the data for
effective decision making among managers or professionals. Statistical
methods such as estimation, prediction and hypothesis testing come under
inferential statistics. The researchers make deductions or conclusions
regarding some characteristics of a population from the data that is collected
from a sample of that population.

Caselet 3
In a firm, the Human Resources Manager (HR Manager) uses the
average salary of employees of the production department, along with the
salary details of other departments, to estimate/project the average salary
of employees for all other departments in the firm. Here, the HR Manager
is using inferential statistics as the estimation of averages deals with
inferential statistics.
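
The difference between the two caselets can be sketched in Python. The salary figures are hypothetical, and the sketch assumes the sample was drawn at random from the whole firm, which the caselet does not state. The sample mean on its own is a descriptive summary; using it with a margin of error to say something about the firm-wide average is inferential.

```python
import math
import statistics

# Hypothetical monthly salaries (in thousands of rupees) for a random sample
# of 8 employees; illustrative only, not data from the caselet.
sample = [30, 34, 28, 40, 36, 32, 38, 34]

n = len(sample)
sample_mean = statistics.mean(sample)                # descriptive summary
std_error = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# Inferential step: a rough 95% interval for the firm-wide average salary.
# The normal critical value is used here as an approximation; for a sample
# this small a t-based value would give a somewhat wider interval.
z = statistics.NormalDist().inv_cdf(0.975)
lower = sample_mean - z * std_error
upper = sample_mean + z * std_error

print("Sample mean:", sample_mean)
print("Approximate 95% interval:", round(lower, 2), "to", round(upper, 2))
```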

Activity
Place the number of the appropriate definition next to the item it describes.
Items:
A. Statistic
B. Parameter
C. Discrete
E. Mutually exclusive
F. Zero
G. Continuous
H. Inferential statistic
Definitions:
1. Do not contain the same outcome
2. The use of sample statistics to draw conclusions concerning the population
3. A numerical characteristic of a sample
4. Only finite values can exist on the X axis
5. Sum of deviation around a mean
6. Measurement may assume any value associated with an uninterrupted scale
7. A numerical characteristic of a population
Solution
A. 3, B. 7, C. 4, E. 1, F. 5, G. 6, H. 2
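
Item F pairs with definition 5 because deviations taken about the arithmetic mean always add up to zero. A short derivation is given below; the notation is assumed here rather than taken from the text, with $x_i$ for the observations, $n$ for their number and $\bar{x}$ for their mean.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{where } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```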

Self Assessment Questions


1. In which of the following situations would you like to use Statistics?
a) Buying a house
b) Purchasing medicine prescribed by a doctor
c) Investing funds in several options
d) Attending relatives’ marriages


2. Out of the following, which one does not refer to a mass of data?
a) Banking Statistics
b) Mathematical Statistics
c) Agricultural Statistics
d) Income Statistics
3. Which of the following statements is most appropriate?
a) Nature believed in statistics
b) Nature created statistics
c) Nature believed in variation
d) Nature believed in symmetrical variation
4. Which of the following statements is true?
a) Statistics enlarges physical vision
b) Statistics helps in estimation
c) Statistics quantifies uncertainty
d) Statistics is of no use to humanity.
5. The origin of statistics can be traced to
a) State
b) Commerce
c) Economics
d) Industry

1.2 History of Statistics


This is a year-wise presentation of the history of statistics.
1654 – Pascal – mathematics of probability, in correspondence with
Fermat
1662 – William Petty and John Graunt – first demographic studies
1713 – Jakob Bernoulli – Ars Conjectandi
1733 – DeMoivre – Approximatio; law of error (similar to standard
deviation)
1763 – Rev. Bayes – An essay towards solving a problem in the
Doctrine of Chances, foundation for "Bayesian statistics"
1805 – A-M Legendre – least squares method
1809 – C. F. Gauss – Theoria Motus Corporum Coelestium
1812 – P. S. Laplace – Théorie analytique des probabilités
1834 – Statistical Society of London established
1853 – Adolphe Quetelet – organised first international statistics
conference; applied statistics to biology; described the bell-
shaped curve.
1877 – F. Galton – regression to the mean
1888 – F. Galton – correlation
1889 – F. Galton – Natural Inheritance
1900 – Karl Pearson – chi square; applied correlation to natural selection
1904 – Spearman – rank (non-parametric) correlation coefficient
1908 – "Student" (W. S. Gosset) – The probable error of the mean; the
t-test
1919 – R. A. Fisher – ANOVA; evolutionary biology
1930s – Jerzy Neyman and Egon Pearson (son of Karl Pearson) – type II
errors, power of a test, confidence intervals

1.3 Definition of Statistics


According to Seligman, “Statistics is a science which deals with the method
of collecting, classifying, presenting, comparing and interpreting the
numerical data to throw light on enquiry”.
According to Horace Secrist, Statistics may be defined as “an aggregate of
facts affected to a marked extent by multiplicity of causes, numerically
expressed, enumerated or estimated according to a reasonable standard of
accuracy, collected in a systematic manner for a predetermined purpose
and placed in relation to each other” [1]. This definition is both comprehensive
and exhaustive.
Prof. Boddington, on the other hand, defined Statistics as “The science of
estimates and probabilities” [2]. This definition is not complete.
According to Croxton and Cowden, “Statistics is the science of collection,
presentation, analysis and interpretation of numerical data from logical
analysis” [3].

([1] Source: Agarwal B L (2006) Basic Statistics 4th ed. Pg 1 New Age International Publishers)
([2] Source: Agarwal B L (2006) Basic Statistics 4th ed. Pg 2 New Age International Publishers)
([3] Source: Agarwal B L (2006) Basic Statistics 4th ed. Pg 2 New Age International Publishers)


Figure 1.2 depicts four different components of Statistics as per Croxton and
Cowden.

Collection of Data | Presentation of Data | Analysis of Data | Interpretation of Data

Fig. 1.2: Basic Components of Statistics According to Croxton and Cowden

1. Collection of data
Careful planning is required while collecting data. Two methods used for
collecting data are the census method and the sampling method. The
investigator has to take care in selecting an appropriate collection method.
In the census method, every unit or object of the population is included in
the investigation. For example, if we want to study the average annual income
of 500 families in a given area using the census method, we must study
the income of all 500 families in that area. When the population is large,
applying the census method would be difficult.
Sometimes a sample of units or objects is taken from the population to
describe the overall characteristics of that population. This method of
collecting data is called sampling. The sampling method is helpful when the
population is large or when the results are needed in a short time.
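
The contrast between the census method and the sampling method can be sketched in Python. The incomes are randomly generated, hypothetical values for 500 families, and the sample size of 50 is an arbitrary choice made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so that the sketch gives repeatable numbers

# Hypothetical annual incomes (in thousands) for a population of 500 families.
population = [random.randint(200, 900) for _ in range(500)]

# Census method: every family in the population is included.
census_mean = statistics.mean(population)

# Sampling method: only 50 families, chosen at random without replacement.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print("Census mean (all 500 families):", round(census_mean, 1))
print("Sample mean (50 families):", round(sample_mean, 1))
```

The sample mean will generally differ a little from the census mean. That gap is the price paid for studying 50 families instead of 500.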
2. Presentation of data
The collected data is usually condensed, summarised and presented in a
tabular, diagrammatic or graphic form for further analysis.
Tabulation is a systematic arrangement of classified data in rows and
columns. For the representation of data in diagrams, we use different types
of diagrams such as one-dimensional, two-dimensional and three-dimensional
diagrams.
• Line diagrams and bar diagrams are one-dimensional diagrams. (Refer to
figure 1.3 and figure 1.4 for the illustrations of line diagrams and bar
diagrams respectively)


Fig. 1.3: Line diagram
Fig. 1.4: Bar diagram

• Pie-charts are two-dimensional diagrams which are in the form of a
circle. In a pie-chart, total and component parts are shown in a circular
shape.

The pie-chart in figure 1.5 represents the sales figures of SPQ
company for the year 2008.

Fig. 1.5: Sales Figures of SPQ Company

• The pie-chart in figure 1.6 shows the monthly expenses of a family.
From the pie-chart, we can infer that Prasad’s family spent the most on
food and spent equal amounts on fuel and miscellaneous items.

Fig. 1.6: Pie-chart of Prasad’s Family Expenses

3. Analysis of data
The data presented has to be carefully analysed before any inference is drawn
from it. The analysis can take various forms, for example, measures of
central tendency, dispersion, correlation or regression.
Measures of central tendency identify the central value around which the
data cluster. When computed for a population, these measures are parameters;
when computed for a sample, they are statistics that serve as estimates of the
population parameters. The three most common measures of the centre of a
distribution are the mean, the median and the mode.
Measures of dispersion are used to quantify the spread of the distribution.
Range, interquartile range, mean deviation and standard deviation are four
measures of dispersion.
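
The measures named above can be computed with Python's standard `statistics` module, as in the sketch below. The data set is hypothetical. Note that the quartiles depend on the interpolation method chosen, so other textbooks or software packages may report a slightly different interquartile range for the same data.

```python
import statistics

# Hypothetical data set treated as a complete population; illustrative only.
data = [4, 8, 6, 5, 3, 8, 9, 5, 8, 4]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion
data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default 'exclusive' method
iqr = q3 - q1                                # interquartile range
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
std_dev = statistics.pstdev(data)            # population standard deviation

print("Mean:", mean, "Median:", median, "Mode:", mode)
print("Range:", data_range, "IQR:", iqr)
print("Mean deviation:", mean_deviation, "Standard deviation:", std_dev)
```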
4. Interpretation of data
The final step is to draw conclusions from the analysed data. Interpretation
requires a high degree of skill and experience.
Thus, Statistics contains the tools and techniques required for collection,
presentation, analysis and interpretation of data, and we can conclude that
the definition given by Croxton and Cowden is precise and comprehensive.

Self Assessment Questions


6. According to the definition of Statistics given by Croxton and Cowden,
what are the four components of Statistics?

7. ‘Statistics may be called the science of counting’ is the definition given
by
a) Croxton
b) A.L.Bowley
c) Boddington
d) Webster
8. In the olden days statistics was confined only to _______.

1.4 Scope and Application of Statistics


Statistical methods are applied to specific problems in various fields such as
Biology, Medicine, Agriculture, Commerce, Business, Economics, Industry,
Insurance, Sociology and Psychology.
In the field of medicine, statistical tools like t-tests are used to test the
efficacy of a new drug or medicine. In the field of economics, statistical
tools such as index numbers, estimation theory and time series analysis are
used in solving economic problems related to wages, price, production and
distribution of income. In the field of agriculture, an important statistical
technique, analysis of variance (ANOVA), is used in experiments
to test the significance of differences among sample means.
In Biology, Medicine and Agriculture, statistical methods are applied in the
following:
• Study of the growth of plants
• Movement of fish population in the ocean
• Migration pattern of birds
• Analysis of the effect of newly invented medicines
• Theories of heredity
• Estimation of yield of crop
• Study of the effect of fertilizers on yield
• Birth rate
• Death rate
• Population growth
• Growth of bacteria

Insurance companies decide on the insurance premiums based on the age
composition of the population and the mortality rates. Actuarial science is
used for the calculation of insurance premiums and dividends.
Statistics is a part of Economics, Commerce and Business. Statistical
analysis of the variations in price, demand and production is helpful to both
businessmen and economists. Cost of living index numbers help
governments in economic planning and fixation of wages. A government’s
administrative system is fully dependent on production statistics, income
statistics, labour statistics, and economic indices of cost and price. Economic
planning of any nation is entirely based on statistical facts. Cost of living
index numbers are also used to estimate the value of money. In business
activities, analysis of demand, price, production cost, and inventory costs
helps in decision making.
Management of limited resources and labour needs statistical methods to
maximise profit. Planned recruitment and distribution of staff, proper quality
control methods, a careful study of the demand for goods in the market
and balanced investment help the producer to extract maximum profit out of
minimum capital investment. In manufacturing industries, statistical quality
control techniques help in improving and controlling the quality of products
at a minimum cost. Hence, statistics is applied in every sphere of human
activity.

Self Assessment Question


9. Mention some other areas where there is a scope of applying statistics.

1.5 Characteristics of Statistics


There are several characteristics of Statistics. Not only does it deal with an
aggregate of facts, it is also affected by multiple causes. Statistics is
numerically expressed, is estimated with varying degrees of accuracy and
is collected in a systematic manner for pre-determined purposes. To enable
comparative and analytical studies, statistical facts need to be arranged in a
systematic, logical order. Let us look at each characteristic in detail.
1. Statistics deals with an aggregate of facts
A single figure cannot be analysed. For example, the fact ‘Mr Kiran is 170
cm tall’ cannot be statistically analysed. On the other hand, if we know the
heights of 60 students in a class, we can comment upon the average height
and variation.
2. Statistics gets affected to a great extent by multiplicity of causes
The Statistics of the yield of a crop is the result of several factors, such as
the fertility of soil, amount of rainfall, the quality of seed used, the quality
and quantity of fertilizer used.
3. Statistics are numerically expressed
Only numerical facts can be statistically analysed. Therefore, facts such as
‘price decreases with increasing production’ cannot be called statistics.
Qualitative (categorical) data, for example, the eye colour of a person or the
brand name of an automobile, cannot be called statistics either.
4. Statistics are enumerated or estimated with required degree of
accuracy
The facts have to be collected from the field or estimated (computed) with
the required degree of accuracy. The degree of accuracy differs depending
upon the purpose. For example, in measuring the length of screws, an
accuracy of up to a millimetre may be required, whereas while measuring
the heights of students in a class, an accuracy of up to a centimetre is
enough.
5. Statistics are collected in a systematic manner
The facts should be collected according to planned and scientific methods;
otherwise, they are likely to be wrong and misleading.
6. Statistics are collected for a pre-determined purpose
There must be a definite purpose for collecting facts. Otherwise,
indiscriminate data collection might take place which would lead to wrong
diagnosis.
7. Statistics are placed in relation to each other
The facts must be placed in such a way that a comparative and analytical
study becomes possible. Thus, only related facts which are arranged in a
logical order can be called Statistics. Statistical analysis cannot be used to
compare heterogeneous data.


Self Assessment Questions


10. Answer the following:
a) Should the same degree of accuracy be applied while measuring
the height of a mountain and the height of a person?
b) Does Statistics deal with qualitative data?
11. Categorise the following data as qualitative or quantitative data
a) The number of transactions occurring in an ATM per day
b) The popular brand name in cars is Maruthi

1.6 Functions of Statistics


Statistics is used for various purposes. It is used to simplify mass data and
to make comparisons easier. It is also used to bring out trends and
tendencies in the data, and the hidden relations between variables. All these
help in easy decision making. Let us look at each function of Statistics in
detail.
1. Statistics simplifies mass data
The use of statistical concepts helps in simplification of complex data. Using
statistical concepts, the managers can make decisions more easily. The
statistical methods help in reducing the complexity of the data and in the
understanding of any huge mass of data.
Solved Problem 1:
Fifty people were interviewed to rate a regional movie on a scale of 1 to
10, with 1 being the top rating and 10 being the worst. Table 1.1 shows
the ratings given by the 50 customers. Simplify the data.
Table 1.1: The Ratings (scale of 1 to 10) for a
Regional Movie Given by 50 Customers

1 5 7 6 8   7 5 3 4 7   1 2 5 8 7   4 7 4 2 4   9 8 7 2 5
4 5 7 9 8   7 8 9 6 7   2 3 2 8 7   6 3 5 7 6   3 9 5 4 8

The data in table 1.1 can be condensed by counting the frequency of each
rating and then preparing a frequency table, which is presented in table 1.1a.
In this example, the frequency table was prepared from the bulk data
consisting of 50 rating scores. The frequency table is in a condensed and
simple form. From table 1.1a, we can easily see that for the regional
movie, the most common rating was 7 (given by 11 customers).
Only two customers gave a rating of 1 for the movie, which means that only
two out of the 50 customers surveyed liked this movie the most.
Table 1.1a depicts the rating by customers using frequency and frequency
distribution.
Table 1.1a: Frequency Table

Rating   Frequency   Frequency Distribution
1        2           2/50 = 0.04
2        5           5/50 = 0.10
3        4           4/50 = 0.08
4        6           6/50 = 0.12
5        7           7/50 = 0.14
6        4           4/50 = 0.08
7        11          11/50 = 0.22
8        7           7/50 = 0.14
9        4           4/50 = 0.08
10       0           0/50 = 0
Total    50          1
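
The frequency table above can be reproduced with a few lines of Python. The sketch assumes that each digit in Table 1.1 is one customer's rating, a reading that is consistent with the table showing no rating of 10. The variable names are chosen for illustration.

```python
from collections import Counter

# The 50 ratings from Table 1.1, read as one rating per digit (an assumption).
raw = "15768 75347 12587 47424 98725 45798 78967 23287 63576 39548"
ratings = [int(ch) for ch in raw.replace(" ", "")]

counts = Counter(ratings)   # frequency of each rating
total = len(ratings)        # number of customers surveyed

print("Rating  Frequency  Relative frequency")
for rating in range(1, 11):
    freq = counts.get(rating, 0)   # a rating nobody gave counts as 0
    print(f"{rating:>6}  {freq:>9}  {freq / total:>18.2f}")
```

The third column is the share of customers giving each rating, which the table labels "Frequency Distribution".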
2. Statistics brings out trends and tendencies in the data
After data is collected, it is easy to analyse the trend and tendencies in the
data by using the various concepts of Statistics.
3. Statistics brings out the hidden relations between variables
Statistical analysis helps in drawing inferences on the data. Statistical
analysis brings out the hidden relations between variables.
4. Decision making becomes easier
With the proper application of Statistics and statistical software packages to
the collected data, managers can make effective decisions, which can
increase the profits of a business.
5. Statistics makes comparison easier
Without using statistical methods and concepts, collection of data and
comparison would be difficult. Statistics helps us to compare data collected
from various sources. Grand totals, measures of central tendency,
measures of dispersion, graphs and diagrams, and the coefficient of correlation
all provide ample scope for comparison.

Example 5
The curves in figure 1.7 and figure 1.8 show the
profits of CBA Company and ZYX Company respectively, for the ten years
from 1998 to 2008. The timeline in years is plotted on the X-Axis and the
profits are on the Y-Axis. From the graphs, we can compare the profits of
both the companies and conclude that the profits of CBA Company in the
year 2008 are higher than those of ZYX Company.
The profit curve in figure 1.7 shows that the profits of CBA
Company are increasing, whereas figure 1.8 shows that the profits of ZYX
Company are constant from the middle of the decade (1998-2008).

Fig. 1.7: Profits of CBA
Fig. 1.8: Profits of ZYX

Hence, visual representation of the numerical data helps to compare the
data with less effort and effective decisions can be made.

Self Assessment Question


12. The total sale of a product in Area A is 840 for 30 working days. The
total sale of the same product in Area B is 784 for 28 working days.
Should Statistics be applied to get an appropriate picture regarding the
comparison of sales?


1.7 Limitations of Statistics


Despite all its characteristics and functions, Statistics also has certain
limitations.
1. Statistics does not deal with qualitative data
Qualitative data deals with meanings while quantitative data deals with
numbers. Qualitative data describes properties or characteristics that are
used to identify things. Quantitative data describes data in terms of quantity
using the numerical figure accompanied by a measurement unit. Statistics
deals only with quantitative data.
Statistics deals with numerical data, which can be expressed in terms of
quantitative measurements. Qualitative phenomena like beauty and
intelligence cannot be expressed numerically, so statistical analysis
cannot be directly applied to them. However,
statistical techniques may be applied indirectly by first reducing the
qualitative data to accurate quantitative terms. For example, the intelligence
of a group of students can be studied on the basis of their marks in a
particular examination.
2. Statistics does not deal with individual facts
Statistical methods can be applied only to aggregates of facts, because
analysis and interpretation of data is highly difficult in the case of individual
facts.
3. Statistical inferences (conclusions) are not exact
Statistical inferences are true only on an average. They are probabilistic
statements. For example, inferences drawn from data consisting of the
heights of 200 male persons taken from a graduate school
may not hold true for any individual male person in particular.
4. Statistics can be misused and misinterpreted
Lack of sufficient knowledge of statistical science often leads to incorrect
conclusions. Therefore, proper care must be taken while selecting the
collection method and also in choosing appropriate statistical models.
Increasing misuse of Statistics has led to increasing distrust in Statistics.
5. Laypersons cannot handle Statistics properly
The field of Statistics is so vast that it needs experience as well as skill to
understand it effectively and apply the statistical concepts and models.
Hence, only statisticians can handle statistics properly.


1.8 Statistical Software


When the collected data is small, the analysis and interpretation can be
done without much difficulty. But when the amount of data is huge, the
process of analysis and interpretation becomes difficult. Therefore, there is a
need for statistical packages to make the calculations easier.
With the advent of computers, many statistical packages have been
developed which help scientific and technical researchers or statisticians
in getting accurate and useful information from the data. These
statistical packages help statisticians in summarising, presenting and
analysing huge amounts of data in a short time. Some such statistical
software applications are Minitab, SPSS, and EViews, which are described in
brief here.

Minitab
Minitab is a statistical software package that was designed especially for
the teaching of introductory statistics courses. It is an easy-to-use
statistical software package and is a vital and significant component of
such a course. This permits the student to focus on statistical concepts
and thinking, rather than computations or the learning of a statistical
package. The main aim of any introductory statistics course should
always be the why of statistics, rather than technical details that do
little to stimulate the majority of students and do little to reinforce the
key concepts. (Source: http://www.minitab.com)

SPSS (Statistical Package for Social Sciences)
SPSS Inc. technology encapsulates advanced mathematical and
statistical expertise to extract predictive knowledge that, when deployed
into existing processes, makes them adaptive to improve the outcome.
Predictive Analytics Software will help:
 Capture all the information you need about people's attitudes and opinions
 Predict the outcome of interactions before they occur
 Act on your insights by embedding analytic results into the business
processes
(Source: http://www.spss.com)


EViews
EViews is a statistical software tool, which offers academic researchers,
corporations, government agencies, and students the access to powerful
statistical, forecasting, and modelling tools through an innovative, easy-to-
use object-oriented interface.
EViews is the ideal package for anyone who works with time series, cross-
section, or longitudinal data. EViews offers an extensive array of powerful
features for data handling, statistics and econometric analysis, forecasting
and simulation, data presentation, and programming. EViews generates
forecasts or model simulations and produces high quality graphs and
tables. (Source: http://www.eviews.com/)

JMP Software
JMP is statistical discovery software. JMP helps you explore data, fit
models, discover patterns, and discover points that don’t fit patterns.
JMP is best for data analysis; JMP aims to present a graph with every
statistic.
Table 1.1b depicts the statistical techniques and their application.
Table 1.1b: Illustrative List of Statistical Techniques and Their Application

Statistical Technique                 Field                 Specification
Binomial Distribution                 Quality Assurance     Sampling Inspection
Correlation and Regression Analysis   Financial Marketing   Risk, Hedging of Investments, Cross-Market Analysis
Index Number                          Economics             Wholesale and Consumer Price Indices
Sampling                              Market Research       Consumer Survey
Normal Distribution                   Equity Research       EPS
Testing of Hypothesis                 Agriculture           Testing a Fertiliser
Rank Correlation                      Rankings              Rankings with multiple criteria
Weighted Average                      Finance               Sensex, NIFTY
Percentiles                           Education             Relative Ranking
(Source: TN Srivastava & Shailaja Rejo (2008) Statistics for Management,
5th ed. TMH)


Table 1.1c depicts a list of decision situations and the corresponding
statistical techniques.
Table 1.1c: Illustrative List of Decision Situation and
Corresponding Statistical Techniques

Area                Decision                            Statistical Techniques Applicable

Marketing           Assessment of Demand of Product,    Time Series, Correlation and
                    Customer Profiling and Market       Regression
                    Research

Retail Management   Identifying Customer Buying         Cluster Analysis, Correlation and
                    Behaviour and Patterns              Regression

Finance and         Evaluation of Investment,           Correlation Analysis and Regression
Banking             Derivatives and Predicting EPS      Analysis, Probability, Hypothesis,
                                                        Time Series

Insurance           Determining the Premium,            Probability, Hypothesis, Time Series,
                    Impact of Different Factors on      Correlation Analysis and Regression
                    Health and Life                     Analysis

Operations          Controlling and Improving           Statistical Quality Control, Six Sigma,
                    Production Process and Quality      Sampling Inspection

HRD                 Performance Appraisal and           Normal Distribution, Correlation
                    Reward System                       Analysis, Conjoint Analysis

1.9 Summary
Let us now summarise the key learnings of this unit:
 Decision making process becomes more efficient with the help of
Statistics. Statistics deals with an aggregate of facts.
 Statistics is applied in all fields of our activities. Statistical interpretation
requires skilled and experienced statisticians. Statistical data is
numerical data or quantitative data but not qualitative data.
 Statistics is broadly divided into Descriptive and Inferential Statistics.

 Descriptive Statistics gives the general description of quantitative data,
whereas inferential statistics deals with reaching valid conclusions about
the data in order to make effective judgment.
 The statistical software packages used by the interpreters or statisticians
are Minitab, SPSS, Microsoft Excel, EViews and others.

1.10 Glossary
Data: Data are the facts and figures that are collected, analysed and
interpreted.
Descriptive Statistics: Descriptive statistics are the tabular, graphical and
numerical methods used to summarise data.
Element: Elements are the entities on which data are collected.
Qualitative Data: Data that are labels or names used to identify an attribute
of each element.
Quantitative Data: Data that express quantity as a numerical figure
accompanied by a measurement unit.
Sample: Sample is a subset of the population.
Statistical Inference: This is the process of using data obtained from a
sample to make estimates about the characteristics of a population.
Statistics: Statistics is the art and science of collecting, analysing,
presenting and interpreting data.
Population: Population is the set of all elements of interest in a particular
study.

1.11 Terminal Questions


1. Mention the characteristics of Statistics.
2. Give the meaning of the word Statistics.
3. What are the limitations of Statistics?
4. What are the functions of Statistics?
5. What is the importance of Statistics in modern business environment?
6. Explain any two applications of Statistics.


1.12 Answers

Self Assessment Questions


1. c) Investing funds in several options
2. b) Mathematical statistics
3. c) Nature believed in variation
4. b) Statistics quantifies uncertainty
5. a) State
6. The four components of Statistics are collection, presentation, analysis
and interpretation of data.
7. b) A. L. Bowley
8. State affair
9. Industrial Quality control, Investment policies, to find market potential
for a product.
10. a) No
b) No
11. a) Quantitative data
b) Qualitative data
12. Yes

Terminal Questions
1. Refer to section 1.5
2. Refer to section 1.3
3. Refer to section 1.7
4. Refer to section 1.6
5. Refer to section 1.1.3
6. Refer to section 1.4

1.13 Case Study

The Manager of the customer service division of a consumer electronics
company was interested in determining whether customers who had
purchased a DVD player over the past 12 months were satisfied with their
products. Using the warranty cards submitted after purchases, the manager
was planning to survey these customers.
a. As a researcher in this case, how would you proceed with descriptive
statistics?
b. Should the Manager of the customer service division of the consumer
electronics company use inferential statistics? Justify your answer.
c. Describe the population and sample for this survey.
d. Develop three categorical and numerical questions that you feel would
be appropriate for the study.

References:
 Agarwal B. L., (2006) Basic Statistics, 4th Ed, New Age International
Publishers.
 Bowerman, B. L. & R. T. O'Connell, Applied Statistics: Improving Business
Processes, Irwin, 1996.
 David R. Anderson, Dennis J. Sweeney & Thomas A. Williams Thomson
Business Information Pvt Ltd. 5th Ed.
 Freedman D., R. Pisani and R. Purves, Statistics, 3rd Ed, W. W. Norton,
1997.
 Rand R. Wilcox, (2009) Basic Statistics – Understanding Conventional
Methods and Modern Insights, Oxford University Press.
 Richard I. Levin, David S. Rubin, (2008) Statistics for Management, 7th
Ed, PHI Learning Private Limited.
 Srivastava, T. N. & Shailaja Rejo (2008). Statistics for Management, 5th
Ed. TMH.
 Tanur, J. M., Statistics: A Guide to the Unknown, 4th Ed, Brooks/Cole,
2002.
 Tukey J. W., Exploratory Data Analysis, Addison-Wesley, 1977.

E-References:
 http://www.textbooksonline.tn.nic.in/Books/11/Stat-EM/Chapter-1.pdf.



Statistics for Management Unit 2

Unit 2 Statistical Survey


Structure:
2.1 Introduction
Objectives
Relevance
Statistics in practice
Definition of statistical survey
2.2 Stages of Statistical Survey
Planning a statistical survey
Execution of statistical survey
2.3 Basic Terms used in Statistical Survey
Units or individuals
Population or universe
Sample
Quantitative characteristic
Qualitative characteristic
Variable
2.4 Collection of Data
Primary data
Secondary data
Pilot survey
2.5 Measurement Scales
Qualitative (categorical) data
Quantitative (numerical) data
2.6 Scrutiny and Editing of Data
2.7 Summary
2.8 Glossary
2.9 Terminal Questions
2.10 Answers
2.11 Case Study

2.1 Introduction
In the previous unit, ‘Introduction to Statistics’, we were introduced to
the definition and functions of statistics. We also studied the broad divisions
of statistics. We now have an idea about the characteristics of statistics and
the limitations of statistics. In this unit, we will study statistical surveys
and the collection and analysis of numerical data.
When the population is large, it is hard to conduct a survey. In such
situations, a sample is drawn and studied to determine the characteristics of
the entire population. The primary purpose of conducting a sample survey is
to obtain certain information about the population.
We define the term ‘survey’ as a measurement procedure to gather people’s
opinions. Surveys differ from each other as their purpose, field of study,
scope, and the source of information differ. Surveys are used by companies
to assess the level of their customer satisfaction, to find out what products
their customers choose and to determine which section of the population is
buying their products. The following are some examples of activities, which
require collection and analysis of data in a systematic manner.
 Formulation of a theory such as “Tobacco Consumption Leads to
Cancer”
 Framing of policies according to the existing nature of a population
 Finding the relationship between characteristics of units in the
population
In other words, a search for knowledge by analysing numerical data is
known as Statistical Survey or Statistical Investigation.
Objectives:
After studying this unit, you should be able to:
 recall the definition of statistical survey
 describe the activities involved in planning a statistical survey
 recall the definition of terms used in statistics
 differentiate between sample and population
 differentiate between quantitative and qualitative characteristics
 describe various methods of data collection
 distinguish between primary and secondary data
 explain various measurement scales
2.1.1 Relevance
The relevance, timeliness and accuracy of data are the standard requirements
of any statistical study. The quality of the information and conclusions
derived from data depends on these characteristics. Their absence is captured
by the popular phrase “Garbage in, garbage out”, abbreviated as GIGO and
mostly used in the field of computer science. The phrase is equally important
in the context of statistical data: utmost care has to be taken to collect
the right data by the right process and from the right source.
2.1.2 Statistics in practice
Recent Country Assistance Strategies (CASs) for Kenya and Armenia provide
good examples of assessing statistical capacity and proposing appropriate
action. The Kenya CAS takes a comprehensive approach towards statistical
capacity building, based on the implementation of a national statistical
development strategy supported by IDA (International Development
Association) and a number of other development partners. The CAS for
Armenia finds that Armenia’s capacity for poverty monitoring and analysis is
reasonably good, as the National Statistical Service (NSS) has conducted
regular household surveys for a number of years. The CAS identifies steps
to further improve capacity, including ‘strengthening the linkages between
different household surveys’ and ‘improving questionnaires to reflect current
policies (for example, on social assistance) and to provide better information
(for example, on employment and earnings)’.
(Source: http://siteresources.worldbank.org)

2.1.3 Definition of Statistical Survey


A Statistical Survey is a scientific process of collection and analysis of
numerical data. Statistical surveys are used to collect information about
units in a population and involve asking questions of individuals. Surveys
of human populations are common in government, health, social science
and marketing sectors.

2.2 Stages of Statistical Survey


Statistical surveys involve two stages, namely Planning and Execution.
Figure 2.1 shows the two broad stages of Statistical Survey.

Fig. 2.1: Stages of Statistical Survey (diagram: Statistical Survey branches
into Planning and Execution)

2.2.1 Planning a Statistical Survey


The relevance and accuracy of data obtained in a survey depends upon the
care taken in planning. A properly planned investigation can lead to the best
results with least cost and time. Figure 2.2 gives the explanation of steps
involved in the planning stage.

Fig. 2.2: Steps Involved in Planning of a Statistical Survey


2.2.2 Execution of statistical survey


Controlled methods should be adopted at every stage of carrying out the
investigation to check the accuracy, coverage, methods of measurements,
analysis and interpretation.
The collected data should be edited, classified, tabulated and presented in
the form of diagrams and graphs. The data should be carefully and
systematically analysed and interpreted.
Self Assessment Questions
1. What are the main stages in a survey?
2. Training of investigators belongs to which stage?
3. Analysis of data is a part of the execution of survey. Is this correct?

2.3 Basic Terms Used in Statistical Survey


Statistics, being a specialised subject, uses a number of technical terms.
Knowledge and understanding of these terms is necessary to do any
statistical work. Some of the basic terms used in Statistical Survey are
stated here.
2.3.1 Units or Individuals
In a statistical survey, the objects on which the characteristics are measured
are referred to as Units or Individuals.
2.3.2 Population or Universe
The totality of all units or individuals in a survey is called Population or
Universe. If the number of objects in a population is finite, it is called a
finite population; otherwise, it is known as an infinite population.
The measure describing a characteristic of the population is known as a
parameter. In figure 2.3, the eight consumers together constitute the
entire population.

Key Statistics
A parameter is a measure of the characteristic of the population.
Population can have many parameters.
A Statistic is a measure of characteristic corresponding to the sample.
Sample can have many statistics.


Fig. 2.3: Population versus Sample

2.3.3 Sample
A sample is a part or a subset of the population. By studying the sample,
you can predict/comment on the characteristics of the entire population from
where the sample is taken. A measure that describes a characteristic
of a sample is known as a statistic.
If the population is large, it is hard to collect data corresponding to the entire
population. Hence, a part of the population is chosen to study the
characteristics of the entire population. The size of the sample can never be
as large as the size of the population. Proper care must be taken while
choosing the samples. In figure 2.3, a sample of three consumers is
drawn from the entire population of eight consumers.
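
The distinction between a parameter and a statistic can be illustrated with a short Python sketch. The spending figures below are hypothetical values assumed purely for illustration; only the sizes (a population of eight consumers and a sample of three) follow figure 2.3.

```python
from statistics import mean

# Hypothetical monthly spending (in rupees) of the eight consumers
# who make up the entire population. These values are assumptions.
population = {
    "C1": 1200, "C2": 950, "C3": 1800, "C4": 700,
    "C5": 1500, "C6": 1100, "C7": 1350, "C8": 1000,
}

# A sample of three consumers taken from the population
# (picked by hand here, only to keep the example deterministic).
sample_ids = ["C2", "C5", "C7"]
sample = {cid: population[cid] for cid in sample_ids}

# Parameter: a measure computed from every unit in the population.
population_mean = mean(list(population.values()))

# Statistic: the same measure computed from the sample only.
sample_mean = mean(list(sample.values()))

print("Population mean (parameter):", population_mean)
print("Sample mean (statistic):", round(sample_mean, 2))
```

The two means differ because the sample covers only part of the population; the sample mean is used as an estimate of the population mean.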

2.3.4 Quantitative Characteristic


A characteristic which is numerically measurable is called a Quantitative
Characteristic. Quantitative data is data expressing a certain quantity,
amount or range. Usually, there are measurement units associated with the
data, for example, the height of a person in centimetres.

2.3.5 Qualitative Characteristic


A characteristic which is not numerically measurable is called a Qualitative
Characteristic. Qualitative data is data describing the attributes or properties
that an object possesses.


Let us understand the basic terminologies of statistical survey with the help
of a Caselet.

Caselet
Consider the survey of the average number of children below 16 years in
a ward of a municipality. The number of houses in the ward is finite and
therefore, the population is finite. The objects are households. The
characteristic measured is number of children below 16 years in a
household. It is numerically measurable and hence quantitative. On the
other hand, in a survey to find the total number of blind people in a
locality, the characteristic ‘blindness’ is qualitative.

2.3.6 Variable
In a population, some characteristics remain the same for all units and some
others vary from unit to unit. A quantitative characteristic that varies from
unit to unit is called a variable. It is a measurable characteristic, for
example, age, height or income. A qualitative characteristic that varies from
unit to unit is called an attribute. It is a non-measurable characteristic, for
example, religion, nationality or occupation.

Fig. 2.4: Various Types of Variables

A variable that assumes only some specified values in a given range is
known as ‘discrete variable’. A variable that assumes all the values in the
range is known as ‘continuous variable’. For example, the number of
children per family and the number of petals in a flower are examples of
discrete variables. The height and weight of a person are examples of
continuous variables. Figure 2.4 shows the difference between Qualitative
and Quantitative Variable and also the difference between Discrete and
Continuous Variable.
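
The classification described above can be sketched in Python. The function name and the rule it applies are assumptions made for illustration: non-numeric values are treated as an attribute, whole-number values as a discrete variable, and values with fractional parts as a continuous variable. This is only a rough rule of thumb. A continuous characteristic recorded in whole units, such as height rounded to the nearest centimetre, would be wrongly reported as discrete.

```python
def classify_characteristic(values):
    """Classify observed values as an attribute, a discrete variable
    or a continuous variable, using a simple rule of thumb."""
    # Non-numeric observations (labels, names) indicate a qualitative characteristic.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if not numeric:
        return "attribute"
    # Only whole-number observations suggest a discrete variable.
    if all(float(v).is_integer() for v in values):
        return "discrete variable"
    # Any fractional observation suggests a continuous variable.
    return "continuous variable"


# Hypothetical observations for three characteristics.
print(classify_characteristic(["Indian", "Kenyan", "Armenian"]))  # nationality
print(classify_characteristic([0, 2, 1, 3]))                      # children per family
print(classify_characteristic([162.5, 170.2, 158.0]))             # height in cm
```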

Self Assessment Questions


4. Classify the following as finite or infinite population.
i) Production of a product in a factory for a day
ii) The set of rational numbers
iii) The weight of new born babies measured up to first decimal place
in a state during the first week of February 2008
5. Classify the following as an attribute or a variable.
i) Eye colour of human beings
ii) Number of pages in a book of various subjects
6. Classify the following as discrete or continuous variable
i) Number of shares sold each day in a stock market.
ii) Temperatures recorded every half hour at a regional meteorological
centre.
7. Statistics can best be considered as
i) both Art and Science
ii) Art
iii) Science
iv) neither Art nor Science
8. Data that possess numerical properties are known as
i) Quantitative data
ii) Qualitative data
iii) Primary data
iv) Parametric data


9. A tool of all science in research and making an intelligent judgement is


i) Statistics
ii) Collection
iii) Data
iv) Judgement

2.4 Collection of Data


Collection of data is the first and most important stage in any statistical
survey. The method for collection of data depends upon various factors
such as objective, scope, nature of investigation and availability of
resources. Direct personal interviews, third party agencies, and
questionnaires are some ways through which data is collected.
2.4.1 Primary data
Primary data is data collected by the investigator for the purpose of a
specific inquiry or study. Such data is original in character and is
generated by a survey conducted by individuals or a research institution
or any organisation.
For Example:
If a researcher is interested to know the impact of a noon-meal scheme for
school children, he/she has to undertake a survey and collect data on the
opinion of parents and children by asking relevant questions. Data collected
in this way is called primary data.
Data collected for the first time keeping in view the objective of the survey is
known as primary data. Interviews, questionnaires and telephone/mail
enquiries are all examples of methods of collecting primary data.
Primary data is likely to be more reliable. However, the cost of collecting
such data is much higher. Primary data is collected by either a census
method or a sampling method.

Key statistic
A census is the procedure of systematically acquiring and recording
information about the members of a given population
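
The difference between the census method and the sampling method can be sketched in Python. The household list, the sample size of 20 and the use of simple random sampling without replacement are all assumptions made for illustration; the text does not prescribe a particular way of drawing the sample.

```python
import random

# Hypothetical population: household numbers 1 to 200 in a ward.
households = list(range(1, 201))


def census(units):
    """Census method: every unit in the population is covered."""
    return list(units)


def simple_random_sample(units, size, seed=None):
    """Sampling method: draw `size` distinct units at random,
    without replacement. A seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(units, size)


covered_by_census = census(households)
covered_by_sample = simple_random_sample(households, size=20, seed=42)

print(len(covered_by_census), "households contacted in a census")
print(len(covered_by_sample), "households contacted in a sample")
```

A census contacts all 200 households, while the sample contacts only 20 of them, which is why sampling usually costs less time and money.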


Collection of primary data is done by one of the following methods, as
suitable:
1. Direct personal observation
2. Indirect oral interview
3. Information through agencies
4. Information through mailed questionnaires
5. Information through a schedule filled by investigators
Let us study each of them in detail.
1. Direct Personal Observation – In the direct personal observation
method, as illustrated in figure 2.5, the investigator collects data by having
direct contact with the units of investigation. The accuracy of data depends
upon the ability, training and attitude of the investigator.

Fig. 2.5: Direct personal observation

The direct personal observation method is suitable where,


 The scope of investigation is narrow
 Investigation is confidential and requires personal attention of the
investigator
 Accuracy of data is important
Table 2.1 shows the merits and demerits of a direct personal observation
method.
Table 2.1: Merits and Demerits of a Direct Personal Observation Method

Merits                                      Demerits
1. We get the original data which is        1. This method is not cost efficient.
   more accurate and reliable.
2. Satisfactory information can be          2. This method consumes more time.
   extracted by the investigator
   through indirect questions.
3. Data is homogeneous and comparable.      3. This method cannot be used when the
                                               scope of investigation is wide.
4. Additional information can be            4. Most of the data collected through
   gathered.                                   this method is confidential.

2. Indirect oral interview – Indirect oral interview is used when the area to
be covered is large. The investigator collects the data from a third party or a
witness or the head of an institution. This method is generally used by the
police department in cases related to enquiries on the cause of fires, thefts
or murders.
In this method, the investigator contacts witnesses or neighbours or friends or
some other third parties who are capable of supplying the necessary
information. Enquiry committees appointed by governments use this method
to get people’s views and every possible detail regarding the enquiry. This
method suits best when direct sources do not exist or cannot be relied upon
or would be unwilling to take part in the survey. Table 2.2 shows the merits
and demerits of indirect oral interview.
Table 2.2: Merits and Demerits of Indirect Oral Interview Method

Merits                                      Demerits
1. Economical in terms of time, cost        1. The degree of accuracy of the
   and man power                               information is less.
2. Confidential information can be
   collected
3. Information is likely to be unbiased
   and reliable

3. Collecting information through agencies – The method of collecting
information through local agencies or correspondents is generally adopted
by newspapers and television channels. Local agents are appointed in
different parts of the area under investigation. They send the desired
information at regular intervals. This method is illustrated in figure 2.6.
This method is used where the area to be covered is very large and periodic
information is required. However, one disadvantage of this method is that
the information is likely to be biased.


Fig. 2.6: Collecting Information through Agencies

4. Information collected through mailed questionnaires – Often,
information is collected through questionnaires. The questionnaires contain
questions pertaining to the investigation. They are sent to the
respondents with a covering letter soliciting cooperation from the
respondents (respondents are the people who respond to questions in the
questionnaire). The respondents are asked to give correct information and
to mail the questionnaire back. The objectives of the investigation are
explained in the covering letter along with the assurance to keep the
information confidential.

Questionnaire design
 Initial considerations
    Type of information required
    Type/nature of respondents
    Type and method by which survey is to be undertaken
 Question content
    Relevance of a question
    Clarity of a question
    Avoid ambiguous, leading, double-barrelled questions
    Ability and willingness of a respondent to answer the questions
 Question phrasing
    Style appropriate to target population
    Short, clear and unambiguous questions
    Avoid biased words and leading questions
    Avoid negative questions
    Discourage guessing
    Do not take anything for granted on the part of the respondents
 Types of questions
    Closed ended questions
       Dichotomous
       Multiple choice (4 to 5 options; neutral point)
       Likert scale (Agree or disagree)
       Semantic differential (scale connecting bipolar words)
       Importance scale (importance of some attribute)
       Rating scale (Excellent to poor)
    Open ended questions
       Completely unstructured
       Word association (first word that comes to mind …)
       Sentence completion
       Story completion
       Picture completion (filling balloons)
       Thematic Apperception Test (relate story to picture)
 Question sequence
    Logical order
    Avoid questions which suggest answers to later questions (bias)
 Questionnaire layout
    Good quality paper
    As short as possible (20-30 questions)
    Use lines, boxes, pictures, etc.
    Instructions kept to a minimum but user-friendly
    Purpose of survey explained at the beginning and guarantee of
    confidentiality
    What is to be done with the completed questionnaire?
 Pre-test, revision and final version of questionnaire
    Uncover faults
    Misprints
    Grammatical mistakes
    Relevance of questions
    Expected range of answers
Construction of a good questionnaire is an important contributing factor to
the success of a survey. When questionnaires are properly framed and
constructed, they become important tools by which statements can be made
about specific people or entire population.
This method is generally adopted by researchers and other official and non-
official agencies. This method is used to cover large areas of investigation. It
is more economical and free from investigator’s bias. However, it results in
many “non-response” situations. The respondent may be illiterate. The
respondent may also provide wrong information due to wrong interpretation
of questions.
If the questionnaire consists of invalid questions, or questions in incorrect
order, or questions in inappropriate format, or questions that are biased,
then the survey would be useless. An important method for checking and
making sure whether a questionnaire is accurately capturing the intended
information, is to pre-test it among a smaller subset of target respondents.
Success of this method of collection of data depends mainly on proper
drafting of the questionnaire. You have to keep the following points in mind
while preparing a questionnaire:
 The respondent should not take much time in completing the
questionnaire. It should be short.
 The questions asked should be well structured and unambiguous.
 The questions asked should be in a proper logical sequence.
 Questions should be unbiased. The questions in the questionnaire
should not intrude on the privacy of the respondents.
 The questionnaire should not require much writing work.
 Necessary instructions and a glossary should be given in the covering letter.
 Questions involving technical jargon and mathematical calculations
should be avoided.

 The completed questionnaire should be kept confidential and used only
for the purpose of the survey as mentioned in the investigation.
 There should not be any scope for misinterpretation in the questions.
There are different types of questions that can be used in the questionnaire.
A questionnaire can have contingency questions, matrix questions, closed
ended questions and open ended questions. Let’s have a look at each one
in detail.
 Contingency questions are questions that are answered only if the
respondent gives a particular response to a previous question. This
avoids asking people questions that do not apply to them.
 Matrix questions are questions which are placed one under the other,
forming a matrix. The response categories are placed on the top and a
list of questions are placed by the side. This is used to efficiently occupy
space and respondents’ time.
 Closed ended questions are those where the respondents’ answers
are limited to a fixed set of responses. Usually scales are closed ended.
There are various types of closed ended questions.

Dichotomous Question: A question that has two possible responses.
The responses could be Yes/No, True/False, Agree/Disagree.
For example:
Example 1
 Are you a science graduate? Yes [ ] No [ ]
 Did you watch a movie last night? Yes [ ] No [ ]

Multiple choice: Here the respondents have several options from
which to choose. For example:

Example 2
The sun rises in which direction?
East [ ] West [ ]
North [ ] South [ ]

Scaled questions: Here the responses are graded on a continuum (for
example, rating the appearance of a product on a scale from 1 to 10,
with 10 implying the most preferred appearance and 1 implying the
least preferred appearance). Scaled questions are mostly questions
related to attitudes. A Likert scale provides a number of attitude
statements. The respondent has to say how much they agree or
disagree with each one.

Example 3
Read the following statement and then indicate by a tick whether you
Strongly Agree, Agree, Disagree or Strongly Disagree with the
statement.
“Organised and prioritised tasks take less time to complete.”
1. Strongly Agree [ ]
2. Agree [ ]
3. Disagree [ ]
4. Strongly Disagree [ ]

 Open ended questions are those questions for which the respondent
provides their own answer without any fixed set of possible responses.
Examples of the types of open ended questions are:
Sentence completion: In these, respondents complete an incomplete
sentence.

Example 4
Complete the sentence below.
“I like the management courses offered by Manipal University
Jaipur because ...”
Story completion: In these, respondents complete an incomplete story.
Picture completion: In these, respondents fill in an empty conversation
balloon.
Thematic Apperception Test: In these, respondents explain a picture
or make up a story about what they think is happening in the picture.
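
As a minimal sketch of how responses to a Likert item such as Example 3 might be tallied once the questionnaires are returned, the Python snippet below counts the ticks in each category and expresses them as percentages. The list of responses is hypothetical, and the names LIKERT_LABELS and tally are illustrative assumptions rather than part of the course material.

```python
from collections import Counter

# Response codes taken from Example 3 (1 = Strongly Agree ... 4 = Strongly Disagree).
LIKERT_LABELS = {
    1: "Strongly Agree",
    2: "Agree",
    3: "Disagree",
    4: "Strongly Disagree",
}

# Hypothetical ticks from ten respondents; illustrative values only.
responses = [1, 2, 2, 3, 1, 2, 4, 2, 1, 3]


def tally(codes):
    """Return {label: (count, percentage)} for every response category."""
    counts = Counter(codes)
    total = len(codes)
    return {
        label: (counts.get(code, 0), 100.0 * counts.get(code, 0) / total)
        for code, label in LIKERT_LABELS.items()
    }


summary = tally(responses)
for label, (count, pct) in summary.items():
    print(f"{label:<18} {count:>2}  {pct:5.1f}%")
```

Because the answers are limited to a fixed set of codes, closed ended questions of this kind can be summarised directly, whereas open ended answers would first have to be read and coded by the investigator.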


Activity
Design a questionnaire to capture consumer responses to Facebook versus
Twitter on the Internet.
5. Information through schedule filled by investigator – Information can
be collected through schedules filled by investigators through personal
contact. In order to get reliable information, the investigator should be well
trained, tactful, unbiased and hard working.
A schedule is suitable for an extensive area of investigation through
investigator’s personal contact. The problem of non-response is minimised.
There is a difference between a schedule and a questionnaire. A schedule
is a form that the investigator fills personally, while surveying the units or
individuals from the sample (respondent). A questionnaire is a form sent
(usually mailed) by an investigator to respondents. The respondent has to fill
it and then send it back to the investigator.

Table 2.3: Advantages and Disadvantages of Information through schedule
filled by investigator method

Advantages                                 Disadvantages

1. The cost per respondent is likely to    1. Low response rate.
   be less, i.e., more people can be
   sampled.

2. More questions can be asked             2. Questionnaires may only be partly
   since the respondent is answering          answered.
   them at his/her own convenience.

3. The interviewer cannot influence        3. Misunderstandings may not be
   any respondent (all questionnaires         clarified.
   are the same).

4. Questions on sensitive issues may       4. There is no encouragement to think
   be asked and answered easily.              more deeply on the questions
                                              before answering them.

5. A large amount of response is
   obtained in a short period of time.


Example of a customer opinion questionnaire used by a local restaurant in
Bangalore:

Customer opinion questionnaire used by a local restaurant in Bangalore:
We are happy you stopped by the local restaurant and want to make sure
you will come back. So, if you have a little time, we will really appreciate it
if you could fill out this form. Your comments and suggestions are
extremely important to us. Thank you
Server Name _________________
                    Excellent   Good   Satisfactory   Unsatisfactory
Food Quality           [ ]       [ ]       [ ]             [ ]
Friendly Service       [ ]       [ ]       [ ]             [ ]
Prompt Service         [ ]       [ ]       [ ]             [ ]
Cleanliness            [ ]       [ ]       [ ]             [ ]
Management             [ ]       [ ]       [ ]             [ ]
Comments__________________________________________________
_________________________________________________________
What prompted your visit here? _________________________________
__________________________________________________________
Please drop it in the suggestion box at the entrance. Thank you

2.4.2 Secondary data


Any information that is used for the current investigation but was
originally collected and used by some other agency or person in a separate
investigation or survey is known as secondary data. It is available in
published or unpublished form.
In published form, secondary data is available in research papers,
newspapers, magazines, government publications, international publications
and websites. Secondary data is collected for different purposes. Therefore,
care should be exercised while using it.


The accuracy, reliability, objectives and scope of secondary data should be
examined thoroughly before use. Secondary data may be collected either by
census or by sampling methods.
Published sources
The various sources of published data are:
 Reports and official publications of international and national
organisations as well as central and state governments
 Publications of several local bodies such as municipal corporations and
district boards
 Financial and economic journals
 Annual reports of various companies
 Publications brought out by research agencies and research scholars
Some journals (both academic and non-academic) are published at
regular intervals (yearly, monthly or weekly), whereas other publications are
more ad hoc. The Internet is a powerful source of secondary data, which can be
accessed at any time for further analysis.
Unpublished sources
Not all statistical material is published.
Unpublished data, such as records maintained by various government and
private offices and studies made by research institutions and scholars, can
also be used where necessary.
Although the use of secondary data is economical in terms of expense, time
and manpower, the researcher must be careful in choosing such
data. Secondary data must possess the following characteristics:
1) Reliability of data: The reliability of secondary data can be tested
by investigating:
a) Who collected the data?
b) What were the sources of the data?
c) Were they collected by a proper method?
d) At what time were they collected?
e) What level of accuracy was desired? Was it achieved?
2) Suitability of the data: The data that is suitable for one enquiry may not
necessarily be suitable for another enquiry. Hence, if the available data
are found to be unsuitable, they should not be used by the researcher.


In this context, the researcher must carefully scrutinise the definitions of
the various terms and units of collection used when the data were originally
collected from the primary source. Similarly, the object, scope and
nature of the original enquiry must also be studied. If the researcher finds
differences in any of these, the data will be unsuitable for the present
enquiry and should not be used.
3) Adequacy of data: If the level of accuracy achieved in the data is found
inadequate for the purpose of present enquiry, they will be considered
as inadequate and should not be used by the researcher. The data will
also be considered inadequate, if they are related to an area which may
be either narrower or wider than the area of present enquiry.
With secondary data, people have to compromise between what they want
and what they are able to find.
The merits of secondary data are:
 Secondary data is cheaper to obtain. Many government publications are
relatively cheap and libraries stock large quantities of secondary data
produced by the government, by companies and by other organisations.
 Large quantities of secondary data can be accessed through the internet
and online databases.
 Most of the available secondary data has been collected over a period of
several years and therefore it can be used to plot trends.
 Secondary data is valuable to the government, business and research
areas. For governments, it helps in making decisions and in planning
future policies. In business and industry, in areas such as marketing
and sales, it is used to gauge general economic and social
conditions and to provide information on competitors. For research
organisations, it provides social, economic and industrial information.
The demerits of secondary data are:
 It is difficult to judge whether the secondary data is sufficiently accurate
and also reliable.
 It might be difficult to fit secondary data to the needs of the investigator.
 Secondary data might not be available for certain investigations. In such
situations, primary data has to be collected.


The differences between primary and secondary data are listed in
Table 2.4.
Table 2.4: Differences between Primary Data and Secondary Data

Primary Data                              Secondary Data

1. Data is original and thus more         1. Data may not be reliable.
   accurate and reliable.
2. Gathering data is expensive.           2. Gathering data is cheap.
3. Data is not easily accessible.         3. Data is easily accessible through
                                             the internet or other resources.
4. Most of the data is homogeneous.       4. Data is not homogeneous.
5. Collection of data requires more       5. Collection of data requires less time.
   time.
6. Extra precautionary measures           6. Data selection needs extra care.
   need not be taken.
7. Data gives detailed information.       7. Data may not be adequate.

Self Assessment Questions


10. State whether the following data are Primary or Secondary.
i) An official of the Census Board of India is preparing a report on
census of population based on the survey data that is collected by
the Census Board.
ii) An HR representative of a software company is deciding on the time
taken to perform a particular job on a project on the basis of random
observations collected by him.
iii) A neurologist is examining the relationship between cigarette
smoking and brain tumor based on the data published in a famous
neurology journal.
11. When the population under investigation is infinite, we should use
i) sample method
ii) census method
iii) neither census nor sample method
iv) both i) and ii)


2.4.3 Pilot survey


A pilot survey is a small trial survey undertaken before the main survey. It
gives a measure of the efficiency of the questionnaire and reduces
inconvenience and loss of information. It helps in introducing necessary
changes in the main survey.
When prior information about the nature of the population under study, or
about the operational and cost aspects of data collection and analysis, is
not available from earlier surveys, it is desirable to design and carry out a
pilot survey.
A pilot survey is preliminary research conducted before a complete survey
to test the effectiveness of the research design. It should be
completed before the final survey begins. By conducting the pilot survey, the
investigator learns of difficulties that might arise and that were not
known at the survey proposal stage.
Pilot surveys have several other advantages:
 Pilot surveys provide the investigator with many ideas, approaches and
clues that are not foreseen before conducting the pilot survey. Such
ideas and clues increase the chances of getting accurate findings in the
main survey.
 Pilot surveys help in making necessary alterations in the data collecting
methods. Hence investigators can analyse data in the main survey more
efficiently.
 Pilot surveys save a lot of time and provide enough data for the
investigator to decide whether to go ahead with the main survey or not.
Apart from advantages, pilot survey also has certain limitations that are
discussed here:
 Pilot surveys are not based on strong statistical foundation and are
based on very small sample sizes.
 There is a possibility that the investigator might make wrong predictions
or assumptions on the basis of pilot data.
 If data and results from pilot surveys are included in the main survey
then it might lead to incorrect decisions.
 If the pilot participants are included in the main survey, then data
obtained from these participants might result in corruption of main data.


2.5 Measurement Scales


Variables differ in how well they can be measured, that is, in how much
measurable information their measurement scale can provide. Every
measurement involves some measurement error, which
limits the amount of information that we can obtain.
Another factor that determines the amount of information that can be
provided by a variable is its type of measurement scale. Specifically,
variables are classified under two categories: qualitative and quantitative.
2.5.1 Qualitative (categorical) data
Qualitative data, also known as categorical data, cannot be measured on a
numerical scale (quantified). Examples of categorical variables are gender
(male or female) and size of T-shirt (XXS, XS, S, M, L, XL and XXL). Yet
these two variables differ in nature: the first is said to be nominal or purely
categorical, whereas the second is known as ordinal.
Nominal (purely categorical) data
Nominal variables allow for only qualitative classification. They can be
measured only in terms of whether the individual items belong to some
distinctively different categories; we cannot quantify or even rank
order these categories. For example, if two individuals differ in terms of a
certain variable (for example, they are of different race), we cannot say
which one has more of the quality represented by the variable. Typical
examples of nominal variables are gender, race, colour, city and marital
status.
Example 1
Marital status
1. Never married [ ]        4. Married/Cohabiting [ ]
2. Divorced [ ]             5. Separated [ ]
3. Widowed [ ]

Clearly, the numbers associated with the options above have no numerical
significance. The values cannot be compared in magnitude, and descriptive
statistics like the mean and standard deviation would make no sense if
calculated.
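
The point can be illustrated with a short sketch in Python. The ten coded responses below are hypothetical; only the category codes come from Example 1. Counting how often each category occurs (and reporting the most frequent one) is meaningful for nominal data, whereas the arithmetic mean of the codes is computed here only to show that it corresponds to no category at all.

```python
from collections import Counter

# Category codes from Example 1; the numbers are labels, not quantities.
MARITAL_STATUS = {
    1: "Never married",
    2: "Divorced",
    3: "Widowed",
    4: "Married/Cohabiting",
    5: "Separated",
}

# Hypothetical coded answers from ten respondents.
codes = [4, 1, 4, 2, 5, 4, 1, 3, 4, 1]

# Valid summary for nominal data: frequency of each category and the mode.
counts = Counter(MARITAL_STATUS[code] for code in codes)
mode_label, mode_count = counts.most_common(1)[0]

# Invalid summary: the mean of the labels changes if the categories are renumbered.
misleading_mean = sum(codes) / len(codes)

print(f"Most frequent category: {mode_label} ({mode_count} of {len(codes)})")
print(f"Mean of the codes: {misleading_mean} (not interpretable)")
```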


Ordinal data
Ordinal variables allow us to rank order the items we measure in terms of
which has less and which has more of the quality represented by the
variable; however, they do not allow us to say how much more. A typical
example of an ordinal variable is the socioeconomic status of families. For
example, we know that upper-middle is higher than middle but we cannot
say that it is, for example, 18% higher. The very distinction between
nominal, ordinal and interval scales is itself a good example of an
ordinal variable. For example, we can say that nominal measurement
provides less information than ordinal measurement, but we cannot say how
much less or how this difference compares to the difference between ordinal
and interval scales.

Example 2
Employee’s performance
1. Excellent [ ]            4. Poor [ ]
2. Good [ ]                 5. Very poor [ ]
3. Average [ ]

It can easily be deduced that ‘Excellent’ is better than ‘Poor’, that is, there is
a latent scale on which the various values can be compared.
Ordinal data can sometimes be treated as interval data for the sake of
statistical analysis, provided the assumption is well founded. In this case,
the values of the variable are mathematically considered to be ‘equidistant’
on its scale. The numbers associated with the values then acquire some
numerical significance, so that the mean, though not very convincingly, may
be statistically interpreted.
The variable ‘Employee’s performance’ in Example 2 can be regarded as
interval if we assume that the ‘distance’ between any pair of successive
values is equal (for example, the distance between ‘Excellent’ and ‘Good’ is
the same as that between ‘Average’ and ‘Poor’). In this case, if the average
performance score of 100 employees is calculated and found to be, say,
3.2, we may, with some caution, conclude that the overall
performance of employees lies just beyond ‘Average’ (which has been
assigned a value of 3) in the direction of ‘Poor’ (4).
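
A minimal sketch of this calculation in Python is given below. The frequency table of 100 ratings is hypothetical and was chosen so that the mean works out to 3.2; the codes are those of Example 2 (1 = Excellent, 5 = Very poor). The median is shown alongside the mean because it relies only on the ordering of the categories, while the mean is meaningful only under the equidistance assumption.

```python
from statistics import mean, median

# Codes from Example 2 (1 = Excellent ... 5 = Very poor).
PERFORMANCE_LABELS = {1: "Excellent", 2: "Good", 3: "Average", 4: "Poor", 5: "Very poor"}

# Hypothetical ratings of 100 employees: code -> number of employees.
rating_counts = {1: 5, 2: 20, 3: 35, 4: 30, 5: 10}

# Expand the frequency table into one score per employee.
scores = [code for code, count in rating_counts.items() for _ in range(count)]

# The median needs only the ordering, so it is always valid for ordinal data.
median_label = PERFORMANCE_LABELS[int(median(scores))]

# The mean is interpretable only if the codes are assumed to be equidistant.
mean_score = mean(scores)

print(f"Median rating: {median_label}")
print(f"Mean score under the equidistance assumption: {mean_score:.1f}")
```

With this coding, a mean of 3.2 falls between ‘Average’ (3) and ‘Poor’ (4), much closer to ‘Average’.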


However, it would be statistically dangerous to assume an interval scale for
the following example.

Example 3
Educational level
1. None [ ]                 5. Diploma [ ]
2. Primary [ ]              6. Degree [ ]
3. Vocational [ ]           7. Postgraduate [ ]
4. Secondary [ ]            8. Professional [ ]

It is clear that the ‘distance’ between ‘None’ and ‘Primary’ is not equal to that
between ‘Diploma’ and ‘Degree’.
2.5.2 Quantitative (numerical) data
Quantitative data can be easily measured on a numerical scale; variables
which can be quantified in terms of units are all quantitative. Examples of
quantitative variables are number of students per class and height
(measured in centimetres). Again, these two variables differ in their nature;
the first is said to be discrete whereas the second is continuous.
Discrete data
Discrete data occur as definite and separate values; a discrete variable
assumes values which are countable so that there are gaps between its
successive values. For example, when counting the number of children in a
class, we use the whole numbers 0, 1, 2, …, n.
Continuous data
Continuous data occur as the whole set of real numbers or a subset of it. In
other words, there are no gaps between successive values so that a
continuous variable assumes all the values (including all the decimals)
between given boundaries. Temperature is a good example of a continuous
variable: though thermometer readings are recorded to the nearest tenth of
a degree (Centigrade or Fahrenheit), temperature does not ‘jump’ from, for
example, 17.1 °C to 17.2 °C. It passes through all the real numbers between
these two values. Height, weight and speed are also continuous variables.
Continuous data can be measured on interval and ratio scales.


Interval scale
Interval variables allow us not only to rank order the items that are
measured, but also to quantify and compare the sizes of differences
between them. For example, temperature, as measured in degrees
Fahrenheit or Celsius, constitutes an interval scale. We can say that a
temperature of 40 degrees is higher than a temperature of 30 degrees, and
that an increase from 20 to 40 degrees is twice as much as an increase
from 30 to 40 degrees. However, interval scale variables do not have an
absolute zero. If the temperatures in Singapore and London are 30 °C and
15 °C respectively, we cannot say that it is twice as hot in Singapore as in
London. This is simply because it would not be the case if these
temperatures were measured in degrees Fahrenheit: 86 °F and 59 °F
respectively.
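
The arithmetic behind this example can be checked with a short Python sketch. It uses the standard conversion F = C × 9/5 + 32; the function and variable names are illustrative. The sketch shows that the ratio of two temperatures changes with the unit, while the ratio of two temperature differences does not.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


singapore_c, london_c = 30.0, 15.0
singapore_f = celsius_to_fahrenheit(singapore_c)
london_f = celsius_to_fahrenheit(london_c)

# Ratios of the readings themselves depend on the unit used.
ratio_c = singapore_c / london_c
ratio_f = singapore_f / london_f


def increase_ratio(convert):
    """Ratio of the 20-to-40 increase to the 30-to-40 increase in a given unit."""
    return (convert(40) - convert(20)) / (convert(40) - convert(30))


# Ratios of differences are the same in both units.
same_unit = increase_ratio(lambda c: c)
converted = increase_ratio(celsius_to_fahrenheit)

print(f"Celsius ratio: {ratio_c:.2f}, Fahrenheit ratio: {ratio_f:.2f}")
print(f"Ratio of increases: {same_unit:.1f} (Celsius), {converted:.1f} (Fahrenheit)")
```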
Ratio scale
Ratio variables are very similar to interval variables. In addition to all the
properties of interval variables, they feature an identifiable absolute zero
point, so they allow for statements such as ‘x is twice as much as y’.
Typical examples of ratio scales are measures of time or space. For
example, as the Kelvin temperature scale is a ratio scale, a temperature of
200 kelvin is not only higher than 100 kelvin, it is twice as high. Interval
scales do not have the ratio property. Most statistical data analysis
procedures do not distinguish between the interval and ratio properties of
the measurement scales. Height is also a ratio scale variable since, if a
person is twice as tall as another, he/she will remain so, irrespective of the
units used (centimetres, inches, etc.).
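
As a brief illustration of the ratio property, the following Python sketch converts two hypothetical heights from centimetres to inches (1 inch = 2.54 cm) and confirms that the ratio between them is unchanged. The heights and names used are assumptions made for the example.

```python
import math

CM_PER_INCH = 2.54


def cm_to_inches(cm):
    """Convert a length from centimetres to inches."""
    return cm / CM_PER_INCH


# Hypothetical heights: one person is twice as tall as the other.
taller_cm, shorter_cm = 180.0, 90.0

ratio_cm = taller_cm / shorter_cm
ratio_in = cm_to_inches(taller_cm) / cm_to_inches(shorter_cm)

# A change of unit only rescales a ratio variable, so zero stays at zero
# and ratios between values are preserved.
print(f"Ratio in centimetres: {ratio_cm:.1f}")
print(f"Ratio in inches:      {ratio_in:.1f}")
print("Ratios agree:", math.isclose(ratio_cm, ratio_in))
```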
Figure 2.7 depicts the various categories of measurement scales.

Fig. 2.7: Categories of Measurement Scales
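
The properties of the four scales described in this section can also be summarised in a small Python sketch. This is only one possible way of organising the information, and the class name Scale, its field names and the helper function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    name: str
    data_type: str         # "qualitative" or "quantitative"
    ordered: bool          # can the values be ranked?
    equal_intervals: bool  # are differences between values comparable?
    absolute_zero: bool    # is there a true zero, so that ratios are meaningful?


SCALES = [
    Scale("nominal", "qualitative", False, False, False),
    Scale("ordinal", "qualitative", True, False, False),
    Scale("interval", "quantitative", True, True, False),
    Scale("ratio", "quantitative", True, True, True),
]


def meaningful_statements(scale):
    """List the kinds of comparison that make sense on the given scale."""
    statements = ["A and B are in the same or in different categories"]
    if scale.ordered:
        statements.append("A ranks higher than B")
    if scale.equal_intervals:
        statements.append("A exceeds B by d units")
    if scale.absolute_zero:
        statements.append("A is k times B")
    return statements


for scale in SCALES:
    print(f"{scale.name} ({scale.data_type}): {meaningful_statements(scale)}")
```

Each scale in the list supports every comparison allowed by the scales before it, plus one more.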


2.6 Scrutiny and Editing of Data


Before the collected data is used, it should be checked for completeness,
accuracy and reliability. By complete, we mean that all the required
information should be available. Editing the data is important, and it is a
time-consuming process.
Data collected from various sources will be highly unorganised and
needs to be summarised and analysed for further study. There is a
possibility of losing valuable data during summarisation. Hence, proper
planning is required when editing any collected data. While
editing, it is important to have all the sources of the collected data to hand,
as well as the overall scope of the survey.
There are different steps involved in editing the collected data. The data
must be checked for:
Legibility: The data must be legible. If a response is not presented clearly,
the investigator has to rewrite it.
Completeness: An unanswered item on a questionnaire implies that
either the respondent did not answer it or the investigator did not
record the answer. If the investigator failed to make the entry, then the
investigator has to fill in the missing entry. If the entry is missing because
the respondent omitted it, then the investigator has to
survey that respondent again to gather the missing entry.
Consistency: The investigator has to examine each questionnaire for
inconsistency or inaccuracy in any statement. For example, the numerical
figures for characteristics such as income, height and weight may be
inconsistent. In such cases, it is the duty of the investigators concerned to
make the necessary corrections. The investigators also have to make sure
that the collected data is free from redundant responses and duplicate entries.
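
Where the responses have been entered into a computer, the completeness and consistency checks (though not legibility, which must be judged by eye) can be partly automated. The Python sketch below is one possible approach under stated assumptions: the field names, the plausible ranges and the sample records are all hypothetical, and a flagged entry still has to be resolved by the investigator.

```python
# Hypothetical field names and plausible ranges; adjust to the actual survey.
REQUIRED_FIELDS = ("respondent_id", "income", "height_cm", "weight_kg")
PLAUSIBLE_RANGES = {
    "income": (0, 10_000_000),
    "height_cm": (50, 250),
    "weight_kg": (2, 300),
}


def edit_checks(records):
    """Return a list of (record index, field, problem) tuples."""
    issues = []
    seen_ids = set()
    for index, record in enumerate(records):
        # Completeness: every required field must have been answered.
        for field in REQUIRED_FIELDS:
            if record.get(field) in (None, ""):
                issues.append((index, field, "missing"))
        # Consistency: numerical figures must lie within a plausible range.
        for field, (low, high) in PLAUSIBLE_RANGES.items():
            value = record.get(field)
            if isinstance(value, (int, float)) and not low <= value <= high:
                issues.append((index, field, "out of range"))
        # Duplicates: the same respondent should not be entered twice.
        respondent_id = record.get("respondent_id")
        if respondent_id is not None and respondent_id in seen_ids:
            issues.append((index, "respondent_id", "duplicate"))
        seen_ids.add(respondent_id)
    return issues


records = [
    {"respondent_id": 1, "income": 450000, "height_cm": 172, "weight_kg": 68},
    {"respondent_id": 2, "income": None, "height_cm": 165, "weight_kg": 59},
    {"respondent_id": 3, "income": 300000, "height_cm": 1720, "weight_kg": 70},
    {"respondent_id": 1, "income": 450000, "height_cm": 172, "weight_kg": 68},
]

issues = edit_checks(records)
for index, field, problem in issues:
    print(f"Record {index}: {field} is {problem}")
```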

Self Assessment Questions


12. State True or False:
i) Census conducted by Government of India is an example of primary
data.
ii) TV News Bulletins gather information on any event through their
agents.

iii) Schedules make respondents record their answers.
iv) A covering letter to the questionnaire brings confidence in
respondents.
v) Questions in questionnaire should be lengthy.
13. State whether each of the following variables is qualitative or
quantitative.
i) Age
ii) Gender
iii) Class Rank
iv) Number of people favouring the death penalty
14. State whether each of the following variables is qualitative or
quantitative and indicate the measurement scale that is appropriate for
each.
i) Annual sales
ii) Soft drink size (small, medium, large)
iii) Employee classification (GS1 through GS18)
iv) Earnings per share
v) Method of payment (cash, check, credit card)

2.7 Summary
Let us recapitulate the important concepts discussed in this unit:
 A statistical survey is a search for knowledge. There are two main
stages in any statistical survey - Planning and Execution.
 Planning a statistical survey encompasses the following issues:
i) The nature of a problem
ii) The objectives
iii) The scope
iv) Statistical units
v) The degree of accuracy
vi) The time period
vii) The source of information and
viii) The organisation
 The collected data should be edited, analysed and interpreted for
completeness, accuracy and consistency.


 Sample is a subset of the population. Sample can never be larger than


the population from which the sample was taken.
 Quantitative characteristic is a characteristic which is numerically
expressed; otherwise it is a qualitative characteristic.
 The quantitative characteristic that varies from unit to unit is called a
variable. The qualitative characteristic that varies from unit to unit is
called an attribute.
 There are two categories of data - Primary and Secondary data. Primary
data is collected directly from the respondents.
 Any information, that is used for the current investigation but is obtained
from some data, which has been collected and used by some other
agency or person in a separate investigation, or survey, is known as
secondary data. They are available in a published or unpublished form.
 The various methods of collecting Primary data are:
1. Direct personal observation
2. Indirect oral interview
3. Information through agencies
4. Information through mailed questionnaires
5. Information through schedule filled by investigators
 Questionnaires must be structured well and must not be ambiguous. A
covering letter must be included along with the questionnaire. A pilot
survey is beneficial when prior information about the survey
does not exist or when results from the survey are needed quickly.

2.8 Glossary
Interval scale: An interval scale is a scale of measurement where the
distance between any two adjacent units of measurement (or 'intervals') is
the same but the zero point is arbitrary.
Nominal data: A set of data is said to be nominal if the values/observations
belonging to it can be assigned a code in the form of a number where the
numbers are simply labels.
Ordinal data: A set of data is said to be ordinal if the values/observations
belonging to it can be ranked (put in order) or have a rating scale attached.
You can count and order, but not measure, ordinal data.
Population: The set of all elements of interest in a particular study.


Primary data: Data collected for the first time keeping in view the objective
of the survey.
Qualitative variables: A variable with qualitative data.
Quantitative variables: A variable with quantitative data.
Ratio scale: Ratio variables are very similar to interval variables; in addition
to all the properties of interval variables, they feature an identifiable absolute
zero point.
Sample: A subset of the population.
Secondary data: Any information used for the current
investigation that was collected by some other agency or person in a separate
investigation.
Statistical survey: A scientific process of collection and analysis of
numerical data.

2.9 Terminal Questions


1. What is a statistical survey?
2. Enumerate the factors which should be kept in mind for proper planning.
3. What do you understand by the unit of measurement? Explain with
examples.
4. Distinguish between:
a) Primary and secondary data
b) Direct and indirect investigation
c) Questionnaire and schedule

2.10 Answers

Self Assessment Questions


1. Planning and execution
2. Planning
3. Yes
4. i) Finite ii) Infinite iii) Finite
5. i) Attribute ii) Variable
6. i) Discrete ii) Continuous
7. i) both Art & Science


8. i) Quantitative data
9. i) Statistics
10. i) Primary data, ii) Primary data, iii) Secondary data
11. i) sample method
12. i) True ii) True iii) False iv) True v) False
13. i) Quantitative ii) Qualitative iii) Qualitative iv) Quantitative
14. i) Quantitative, Ratio, ii) Qualitative, Ordinal, iii) Qualitative, Ordinal,
iv) Quantitative, Ratio, v) Qualitative, Nominal

Terminal Questions
1. Refer section 2.1.3.
2. Refer section 2.2.1.
3. It refers to the unit of the population on which measurements are made,
for example, the height of employees in an office. Employees are
individuals or units. Height is the measurement made on them.
4. a) Data collected for the first time by the investigator is primary data.
Data collected by some other persons but used by the investigator
for his/her study is known as secondary data.
b) Direct investigations are carried out directly by the investigator.
Investigation conducted through mail questionnaire is called indirect
investigation.
c) Questionnaires contain simple questions and are filled by
respondents. Schedules also contain questions but responses are
recorded directly by the investigator.

2.11 Case Study


Case Study 1
A firm is interested in testing the advertising effectiveness of a new
television commercial. As part of the test, the commercial is shown on a
6:30 p.m. local news programme in Denver, Colorado. Two days later, a market
research firm conducts a telephone survey to obtain information on recall
rates (the percentage of viewers who recall seeing the commercial) and
impressions of the commercial.


Discussion Questions:
1. What is the population of the study?
2. What is the sample for this study?
3. Why would a sample be used in this situation? Explain.

Case Study 2
An AMC (Annual Maintenance Contract) company provides onsite IT
hardware support services to clients. At one client’s establishment,
the hardware comprises 500 personal computers (PCs) and ten servers
connected by a local area network. The AMC covers, inter alia, the
maintenance of the servers, the PCs and the network on a 24/7 basis.
The company has a team of 10 technical engineers and a coordinator
posted at the client’s establishment. The company is faced with the problem
of too many complaints about the promptness and quality of service. The
company wants to analyse the problem in order to arrive at an appropriate
solution.
Discussion Questions:
Design a questionnaire that would help the company in collecting relevant
data and initiate remedial action. The questionnaire may cover the following
aspects and also any other relevant issues.
 Technical competency
 Promptness
 Behavioural

Case Study 3
A telecom company wanted to understand the perception of consumers
about the value added services of mobile companies, with a view to adding
some new services in this segment. A consultant was hired, and a survey was
planned. The following questionnaire was designed by the consultant.
Questionnaire for Consumers
1) Demographic Profile
2) Name Sex: Male/ Female
3) Occupation: Employed / Self Employed/ Student/Housewife/Retired


4) Annual household Income (in Rs)
i) ≤ 2 lakhs ii) 2 – 5 lakhs iii) 5–10 lakhs iv) ≥ 10 lakhs
5) Usage Patterns
How long have you owned a cell phone? Month
6) Cellular Operator being used:
i) Vodafone
ii) Reliance
iii) Airtel
iv) BSNL
v) Others
7) What do you normally use the phone for? (You can choose more
than one option)
a) Keeping in touch with friends and family
b) Business
c) For emergencies
d) Others (please specify)
8) Your monthly expenditure on the mobile connection is around:
i) ≤ Rs 500
ii) Rs 500 – 1000
iii) Rs 1000 – 2000
iv) ≥ Rs 2000
9) The mobile charges are paid for by: Self /Company/Spouse or
Parent
10) Value Added Services
Kindly tick(√) those applicable

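Services                 Not Aware   Never Used   Used Occasionally   Used frequently

a) SMS                      [ ]          [ ]              [ ]                [ ]
b) Voice Mail               [ ]          [ ]              [ ]                [ ]
c) Messenger Services       [ ]          [ ]              [ ]                [ ]
d) Ringtones                [ ]          [ ]              [ ]                [ ]
e) GPRS                     [ ]          [ ]              [ ]                [ ]
f) MMS                      [ ]          [ ]              [ ]                [ ]
g) Roaming                  [ ]          [ ]              [ ]                [ ]
h) Internet                 [ ]          [ ]              [ ]                [ ]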


11) Reasons for not using some services:
i) Not Aware
ii) Too Expensive
iii) Complicated
iv) No Utility/Time
Discussion Question:
Suggest improvements in the questionnaire.
(T.N. Srivastava & Shailaja Rego (2008), Statistics for Management, 5th ed., TMH)




Statistics for Management Unit 3

Unit 3 Classification, Tabulation and Presentation of Data
Structure:
3.1 Introduction
Objectives
Relevance
Statistics in practice
3.2 Meaning of Classification
Functions of classification
Requisites of a good classification
Types of classification
Methods of classification
3.3 Tabulation
Basic difference between classification and tabulation
Parts of a table
Types of tables
3.4 Frequency and Frequency Distribution
Series of individual observation
Discrete frequency distribution
Continuous frequency distribution
Technical terms used in formulation of frequency distribution
Constructing a frequency distribution
Tally marks
Derived frequency distributions
Cumulative frequency distribution
Bivariate and Multivariate frequency distribution
3.5 Presentation of Data
Diagrammatic and graphic representation
General rules for drawing the diagrams
Types of diagrams
Choice or selection of diagram
3.6 Graphical Presentation
Histogram
Frequency polygon
Frequency curve
Ogives
3.7 Summary

3.8 Glossary
3.9 Terminal Questions
3.10 Answers
3.11 Case Study

3.1 Introduction
In the previous unit, Statistical Survey, we studied surveys and the
different methods of collecting and analysing numerical data. In this unit,
we will learn about the classification, tabulation and presentation of data.
We will see how collected data is simplified, and study some methods of
graphical summarisation that reveal the characteristics of the data.
Raw collected data is usually voluminous and hard to comprehend. Therefore,
it should be condensed and simplified for better understanding and
usefulness.
Classification is the first stage in simplification. It can be defined as a
systematic grouping of the units according to their common characteristics.
Each such group is called a class.
For example, in a survey of industrial workers of a particular industry,
workers can be classified as unskilled, semi-skilled and skilled, each of
which forms a class.
Objectives:
After studying this unit, you should be able to:
 describe the methods of classification
 identify the parts of a table
 describe the functions of tabulation
 calculate the frequency and frequency distribution for the data
 illustrate the numerical data as a graphical representation
3.1.1 Relevance
A picture is worth a thousand words. The same is true of graphs and charts,
which present data in a form that can be easily comprehended. Graphs and
charts can also reflect performance at a glance, so there is considerable
scope for making effective use of them in managerial functions.

3.1.2 Statistics in practice


Caselet: Colgate-Palmolive Company
The Colgate-Palmolive Company started as a small soap and candle shop
in New York City in 1806. Today, Colgate-Palmolive is a $9 billion company
whose products can be found in more than 200 countries and territories
around the world. While best known for its brand names of Colgate,
Palmolive, Ajax and Fab, the company also markets Mennen, Hill’s Science
Diet and Hill’s Prescription Diet products.
The Colgate-Palmolive Company uses statistics in its quality assurance
program for home laundry detergent products. One concern is customer
satisfaction with the quantity of detergent in a carton. Every carton in each
size category is filled with the same amount of detergent by weight, but the
volume of detergent is affected by the density of the detergent powder. For
instance, if the powder density is on the heavy side, a smaller volume of
detergent is needed to reach the carton’s specified weight. As a result, the
carton may appear to be underfilled when opened by the consumer.
To control the problem of heavy detergent powder, limits are placed on the
acceptable range of powder density. Statistical samples are taken
periodically, and the density of each powder sample is measured. Data
summaries are then provided for operating personnel, so that corrective
action can be taken if necessary to keep the density within the desired
quality specifications.
A frequency distribution for the densities of 150 samples taken over a
one-week period and a bar diagram are shown in the accompanying table and
figure. Density levels above 0.40 are unacceptably high. The frequency
distribution shows that the operation is meeting its quality guidelines, with
all of the densities less than or equal to 0.40. Managers viewing these
statistical summaries would be pleased with the quality of the detergent
production process.
Frequency distribution of density data
Table 3.1: Frequency distribution
Density Frequency
0.29 30
0.31 75
0.33 32
0.35 09
0.37 03
0.39 01
Total 150
[Figure: column chart of frequency (vertical axis, 0 to 80) against density (horizontal axis, 0.29 to 0.39)]
Fig. 3.1: Frequency versus Density

3.2 Meaning of Classification


Classification is a process of arranging things or data in groups or classes
according to their resemblances and affinities.
Definition of classification
According to Secrist, “Classification is the process of arranging data into
sequences and groups according to their common characteristics or
separating them into different but related parts”.
According to Stockton and Clark, “The process of grouping large number of
individual facts and observations, on the basis of similarity among the items
is called Classification”.
3.2.1 Functions of classification
Classification of data performs many functions which are as follows:
 It condenses the bulk data
 It simplifies the data and makes the data more comprehensible
 It facilitates comparison of characteristics
 It renders the data ready for any statistical analysis
3.2.2 Requisites of a good classification
A good classification should be:
 Unambiguous – It should not lead to any confusion.
 Exhaustive – Every unit should be allotted to one and only one class.

 Mutually exclusive – There should not be any overlapping.


 Flexible – It should be capable of adjusting to changing situations.
 Suitable – It should be suitable to the objectives of the survey.
 Stable – It should remain stable throughout the investigation.
 Homogeneous – There should be similar units in the same class.
 Revealing – It should bring out essential features of the collected data.
3.2.3 Types of classification
Types of classification refer to the class categories into which the data could
be sorted out and tabulated. These categories depend on the nature of data
and purpose for which data is being sought.
a) Geographical (i.e. on the basis of area or region wise)
b) Chronological (on the basis of temporal/historical, i.e. with respect to
time)
c) Qualitative (on the basis of character/attributes)
d) Quantitative (on the basis of magnitude)
a) Geographical classification – In geographical classification, the
classification is based on the geographical regions.
Example 1: Sales of the company (in million rupees) (region–wise)
Table 3.2 depicts the sales of a company in various regions.
Table 3.2: Region Wise Sales in Million Rupees

Region Sales
North 285
South 300
East 185
West 235

b) Chronological classification – If statistical data are classified
according to the time of their occurrence, the classification is called
chronological classification.
Example 2: Table 3.3 depicts the sales reported by a departmental store
from January to August.

Table 3.3: Sales of a Departmental Store
Month Sales (Rs. in lakhs)
January 22
February 26
March 32
April 25
May 27
June 29
July 30
August 30

c) Qualitative classification – In qualitative classification, the data are
classified according to the presence or absence of attributes in the given
units. Thus, the classification is based on some quality
characteristics/attributes.
Example 3: Sex, literacy, education, class grade, etc.
Further, it may be classified as:
 Simple classification
 Manifold classification
Simple classification – If the classification is done into only two classes
then, classification is known as simple classification.
Manifold classification – When more than one attribute is studied, it is
called manifold classification.
Example 4
 Classification of a population according to sex (male/female) is a
simple classification.
 A population may be classified as rural or urban, then further as male
or female, and still further as educated or uneducated. This type of
classification is called manifold classification.
d) Quantitative classification – In quantitative classification, the
classification is based on quantitative measurements of some
characteristic such as age, marks, income, production or sales. The
quantitative phenomenon studied is known as a variable, and hence this
classification is also called classification by variable.
Example 5: Table 3.4 depicts the marks obtained by students for a 50
marks test.
Table 3.4: Variable versus Frequency
Marks No. of Students
0 – 10 5
10 – 20 7
20 – 30 10
30 – 40 25
40 – 50 3
Total Students = 50

In this classification, the marks obtained by students are the variable and
the number of students in each class represents the frequency.
3.2.4 Methods of classification
There are three methods of classification. They are as follows:
a. One-way classification – Classification done according to a single
attribute or variable is known as one way classification.

Example 6
Figure 3.2 depicts the number of students who have secured more than
60% in various sub-modules of statistics. This can be classified using
the one-way classification method.

Fig. 3.2: One-way Classification

b. Two-way classification – Classification done according to two attributes
or variables is known as two-way classification.

Example 7
Figure 3.3 depicts the classification by gender of the students who have
secured more than 60% in the respective sub-modules of statistics. In
the sub-module titled ‘Basic Concepts’, ten students got more than 60%.
Of these ten students, four are males and six are females.

Fig. 3.3: Two-way Classification

c. Manifold classification – Classification done according to more than two
attributes or variables is known as manifold classification.

Example 8
Figure 3.4 depicts the classification of employees according to skill, sex
and education.

Fig. 3.4: Manifold classification example

Example 9:
Figure 3.5 depicts manifold classification of population.

Fig. 3.5: Manifold Classification

Example 10:
Table 3.5 below depicts the educational qualifications of hotel employees.
Table 3.5: Manifold Classification of Males and Females Based on
Qualification
Educational        Yes        No        Total
Qualification     M    F    M    F     M    F
MBA Degree       12    9    3    6    15   15
B.Sc. H and HA   15   15    0    0    15   15

Self Assessment Questions


1. Classification is a systematic __________ of the units according to
their ____________ __________.
2. Classification reduces _________ of the data.
3. Classification of data that are non-measurable is known as _______.

4. Classification done according to two attributes or variables is
_________.
5. Manifold classification involves more than _________ variables.
6. Data arranged according to time of occurrence is known as _______.
7. Geographical classification means classification of data according to:
i) Location ii) Time
iii) Attributes iv) Class intervals
8. Classification is a process of arranging the data into:
i) Different columns ii) Different rows
iii) Different rows and columns
iv) Groups of related facts in different classes
9. The data that can be classified on the basis of time is:
i) Geographical ii) Chronological
iii) Qualitative iv) Quantitative

3.3 Tabulation
Tabulation follows classification. It is a logical or systematic listing of related
data in rows and columns. The row of a table represents the horizontal
arrangement of data and column represents the vertical arrangement of
data. The presentation of data in tables should be simple, systematic and
unambiguous.
The objectives of tabulation are to:
 Simplify complex data
 Highlight important characteristics
 Present data in minimum space
 Facilitate comparison
 Bring out trends and tendencies
 Facilitate further analysis
3.3.1 Basic differences between Classification and Tabulation
Table 3.6 depicts a few differences between classification and tabulation.
Table 3.6: Differences between Classification and Tabulation
Classification                          Tabulation
It is the basis for tabulation          It is the basis for further analysis
It is the basis for simplification      It is the basis for presentation
Data is divided into groups and         Data is listed according to a logical
sub-groups on the basis of              sequence of related characteristics
similarities and dissimilarities

3.3.2 Parts of a Table


In this section, you will study the parts of a table, which will help you in
creating accurate tables with the data given. Table 3.7 and figure 3.6 depict
the parts of a table along with the explanation of each tab (tabs from
1 to 10).

[1] Table 3.7: [2] Percentage of P.G. Employees based on their Age and
Departments
[9] (Age in Years)
[5] Departments    [3] Age
                   [4] 20 – 40    [4] 40 and above
[6] Accounts       2.564          1.282
    Finance        2.564          1.795
    Personal       3.846          1.282          [7]
    Production     2.564          2.051
    Marketing      1.282          1.795
[8] Total          12.920         8.205
[10] Source: ………..

Fig. 3.6: Parts of a Table

Tab 1: Table number
The table number identifies the table for reference. When an analysis
contains many tables, table numbers help in identifying them.
Tab 2: Title
Title indicates the scope and the nature of contents in a concise form. In
other words, title of a table gives information about the data contained in the
body of the table. Title should not be lengthy.
Tab 3 and Tab 4: Captions
Captions are the headings and subheadings describing the data present in
the columns.
Tab 5 and Tab 6: Stubs
Stubs are the headings and subheadings of rows.
Tab 7: Body of the table
Body of the table contains numerical information.
Tab 8: Totals
The sub-totals for each separate classification and a general total for all
combined classes should be given at the bottom or right side of the figures
whose totals are taken. Ruling and spacing separate columns and rows.
However, totals are separated from main body by thick lines.
Tab 9: Head note
Head note is given below the title of the table to indicate the units of
measurement of the data and is enclosed in brackets.
Tab 10: Source note
Source note indicates the source from which the data is taken. The source
note for a table is placed at the bottom left-hand corner.
3.3.3 Types of tables
Tables are classified in three ways, on the basis of:
a. Purpose of investigation
b. The nature of presented figures
c. Construction
a. Purpose of investigation: Tables classified under this classification are
of two types. They are:
 General purpose table – General purpose table or reference table
facilitates easy reference to the collected data. They are formed without
specific objective, but can be used for any specific purpose. They
contain large mass of data. For example: census data

 Specific purpose table – A specific purpose table (also called a text
table or summary table) deals with specific problems. Such tables are
smaller in size and highlight the relationship between characteristics.
For example: cost of living indices.
b. The nature of presented figures
Tables classified under this type are of two types. They are:
 Primary table – A primary table contains data in the form in which it
was originally collected. Table 3.8 is a primary table.
 Derived table – The derived tables represent figures like totals,
averages, ratios, etc, which are derived from original data. Table 3.9 is a
derived table derived from table 3.8.
Table 3.8: Distribution of Employees According to Age and Educational Level
in Various Departments
(A = Under Graduate, B = Graduate, C = Post Graduate)
              Age 20 – 40       Age 40 and Above
Departments   A     B     C     A     B     C     Total
Accounts 10 40 10 10 15 5 90
Finance 10 30 10 12 14 7 83
Personal 15 25 10 10 14 5 79
Production 10 30 10 8 12 6 76
Marketing 5 25 10 0 15 7 62
Total 50 150 50 40 70 30 390

Table 3.9: Percentage of P.G. Employees’ Age Group According to Departments
Departments   Age 20 – 40   Age 40 and above
Accounts 24 21.429
Finance 20 23.571
Personal 20 20.714
Production 20 18.571
Marketing 16 15.714
Total 100 100

c. Construction
Different types of tables under this classification of tables are:
 Simple table – Simple table presents only one characteristic. Table 3.10
depicts a simple table.
 Complex table – Complex table presents two or more characteristics.
Table 3.11 depicts a complex table.
 Cross-classified table – In the cross-classified table, the entries are
classified in both directions. Table 3.12 depicts an example of a cross-
classified table.
Table 3.10: Defectives Produced by Batches
Batches No. of defectives
1 15
2 20
3 40
4 50
Table 3.11: Distribution of Defectives According to Batch and Nature of
Defects
Defects
Batch Major Minor
I 8 7
II 15 5
III 25 15
IV 32 18
Total 80 45
Table 3.12: Population of a City According to Age, Sex and Education During
2003 to 2005
                Educated                                  Not Educated
Years  Sex      Below 20 yrs  20 – 40  Above 40  Total    Below 20 yrs  20 – 40  Above 40  Total
2003   Male
       Female
2004   Male
       Female
2005   Male
       Female

Solved Problem 1
1.1 When the collected data is grouped with reference to time, we have:
a) Quantitative classification b) Qualitative classification
c) Geographical classification d) Chronological classification
Solution – Chronological classification
1.2 Most quantitative classifications are:
a) Chronological b) Geographical
c) Frequency distribution d) None of these
Solution – Frequency distribution
1.3 Caption stands for:
a) A numerical information b) The column headings
c) The row headings d) The table headings
Solution – The column headings
1.4 A simple table contains data on:
a) Two characteristics b) Several characteristics
c) One characteristic d) Three characteristics
Solution – One characteristic
1.5 The headings of the rows given in the first column of a table are called:
a) Stubs b) Captions
c) Titles d) Reference notes
Solution - Stubs
1.6 Geographical classification means, classification of data according to
_______.
Solution – Geographical regions
1.7 The data recorded according to standard of education like illiterate,
primary, secondary, graduate, technical, etc, will be known as _______
classification.
Solution – Qualitative
1.8 An arrangement of data into rows and columns is known as _______.
Solution -Tabulation
1.9 Tabulation follows ______.
Solution – Classification
1.10 In a manifold table we have data on _______.
Solution – More than two characteristics

Self Assessment Questions


10. State True or False
i. Tabulation presents the data in a minimum space.
ii. Tabulation is a process of analysis
iii. General purpose table deals with specific objectives.
iv. Derived tables deal with total, percentages, ratios, etc

3.4 Frequency and Frequency Distribution


The number of units associated with each value of the variable is called
frequency of that value. For example, if the variable takes the value 15 and
the value 15 occurs 3 times, then 3 is called the frequency of the value 15.
A systematic presentation of the values taken by variable together with
corresponding frequencies is called a frequency distribution of the variable.
According to Croxton and Cowden, “A frequency distribution is a statistical
table which shows the set of all distinct values of the variable arranged in
order of magnitude, either individually or in groups with their corresponding
frequencies”
Frequency distribution is a table used to organise the data. The left column
(called classes or groups) includes numerical intervals on a variable under
study. The right column contains the list of frequencies, or number of
occurrences of each class/group. Intervals are normally of equal size
covering the sample observations range.
It is simply a table in which the gathered data are grouped into classes and
the number of occurrences which fall in each class is recorded.
A frequency distribution can be classified as:
a. Series of individual observation
b. Discrete frequency distribution
c. Continuous frequency distribution
3.4.1 Series of individual observation
A series of individual observations is a series in which the items are
listed one after another, one entry per observation. For statistical
calculations, these observations can be arranged in either ascending or
descending order. Such an arrangement is called an array.

Example 11
Table 3.13: Data on marks obtained in statistics paper
Roll No. Marks obtained in statistics paper
1 83
2 80
3 75
4 92
5 65

The above data list is raw data. Presented in this form, the data reveals
little information. If the data is arranged in ascending or descending
order of magnitude, it is called arraying of data, and it gives a better
presentation.
3.4.2 Discrete frequency distribution
If the data series is presented indicating the exact measurement of its
units, it is called a discrete frequency distribution. A discrete variable is
one where the variates differ from each other by definite amounts.
Solved Problem 2
Assume that a survey has been made to know the number of post-
graduates in 10 families at random; the resulted raw data could be as
follows:
0, 1, 3, 1, 0, 2, 2, 2, 2, 4
Solution
This data can be classified into an ungrouped frequency distribution.
Table 3.14: Discrete Frequency Distribution
Number of post-graduates (X) Frequency (f)
0 2
1 2
2 4
3 1
4 1

The number of post-graduates becomes the variable X for which we can list
the frequency of occurrence f in a tabular form. Table 3.14 depicts a discrete
frequency distribution, where the variables have discrete numerical values.
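The tally behind Table 3.14 can be sketched in a few lines of Python. This is only an illustrative sketch: the names post_graduates and discrete_frequency are chosen here for the example and do not come from the text.

```python
from collections import Counter

# Raw data from Solved Problem 2: number of post-graduates in 10 families
post_graduates = [0, 1, 3, 1, 0, 2, 2, 2, 2, 4]


def discrete_frequency(values):
    """Return (value, frequency) pairs sorted by value (hypothetical helper)."""
    counts = Counter(values)          # counts how often each distinct value occurs
    return sorted(counts.items())     # order the rows by the value X


table = discrete_frequency(post_graduates)

print("X  f")
for x, f in table:
    print(f"{x}  {f}")
```

Each distinct value of X becomes one row, and the frequencies add up to the number of families surveyed.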

3.4.3 Continuous frequency distribution


A continuous data series is one where the measurements are only
approximations and are expressed in class intervals within certain limits.
In a continuous frequency distribution, the class intervals are
theoretically continuous from the start of the frequency distribution to
the end, without a break. According to Boddington, ‘the variable which can
take the intermediate value between the smallest and largest value in the
distribution is a continuous frequency distribution’.
Solved Problem 3
Convert the following data into continuous frequency distribution form:
Table 3.15 depicts the marks obtained by 20 students in exams for
50 marks.
Table 3.15: Marks Obtained by 20 Students

18 23 28 29 44 28 48 33 32 43
24 29 32 39 49 42 27 33 28 29

Table 3.16 depicts how the frequency distribution table can be formed by
grouping the marks into class width of 5.
Table 3.16: Continuous Frequency Distribution
Marks No. of students
0-5 0
5 – 10 0
10 – 15 0
15 – 20 1
20 – 25 2
25 – 30 7
30 – 35 4
35 – 40 1
40 – 45 3
45 – 50 2
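The grouping used for Table 3.16 can be sketched as follows. The function name exclusive_frequency and its parameters are assumptions made for this illustration; it treats every class as "lower limit included, upper limit excluded", which is the exclusive convention described later in this unit.

```python
# Marks of the 20 students from Table 3.15
marks = [18, 23, 28, 29, 44, 28, 48, 33, 32, 43,
         24, 29, 32, 39, 49, 42, 27, 33, 28, 29]


def exclusive_frequency(values, start, stop, width):
    """Count values in classes [lower, lower + width) between start and stop."""
    counts = {(lo, lo + width): 0 for lo in range(start, stop, width)}
    for v in values:
        if not start <= v < stop:
            raise ValueError(f"{v} lies outside the range {start} to {stop}")
        lo = start + ((v - start) // width) * width   # lower limit of v's class
        counts[(lo, lo + width)] += 1
    return counts


freq = exclusive_frequency(marks, start=0, stop=50, width=5)

print("Marks     No. of students")
for (lo, hi), f in freq.items():
    print(f"{lo:>2} - {hi:<2}   {f}")
```

Under this convention a mark equal to the final upper limit (50 here) would need one more class or an inclusive last class; the sketch simply rejects it.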
A continuous frequency distribution is divided into mutually exclusive
sub-ranges called classes. Classes have lower and upper limits known as
lower class limits and upper class limits respectively. The difference
between the upper class limit and the lower class limit is termed the class
width. The middle value of a class interval is called the mid-value of the
class. It is the average of the class limits.

3.4.4 Technical terms used in formulation of frequency distribution


a) Class limits: The class limits are the smallest and largest values in the
class.

Example 12
In the class 0 – 10, the lowest value is zero and the highest value is 10.
The two boundaries of the class are called the lower and upper limits of the
class. Class limits are also called class boundaries.
b) Class intervals: The difference between the upper and lower limit of the
class is known as class interval.

Example 13
In the class 0 – 10, the class interval is (10 – 0) = 10.

Example 14
If the marks of 60 students in a class vary between 40 and 100 and we
want to form 6 classes, the class interval is found from the formula
$i = \frac{L - S}{R}$
where
L = Largest value
S = Smallest value
R = the no. of classes
Here L = 100, S = 40 and R = 6, so
$i = \frac{L - S}{R} = \frac{100 - 40}{6} = \frac{60}{6} = 10$
Therefore, class intervals would be 40 – 50, 50 – 60, 60 – 70, 70 – 80,
80 – 90 and 90 – 100.
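As a rough check of Example 14, the formula i = (L – S)/R can be applied in code. The helper class_intervals below is a name invented for this sketch, not part of the text.

```python
def class_intervals(largest, smallest, n_classes):
    """Return the class interval i = (L - S) / R and the resulting classes."""
    width = (largest - smallest) / n_classes
    classes = [(smallest + k * width, smallest + (k + 1) * width)
               for k in range(n_classes)]
    return width, classes


# Marks vary between 40 and 100 and 6 classes are wanted (Example 14)
width, classes = class_intervals(largest=100, smallest=40, n_classes=6)

print("Class interval:", width)
for lower, upper in classes:
    print(f"{lower:g} - {upper:g}")
```

The upper limit of each class is the lower limit of the next one, so the classes produced this way are of the exclusive type.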

c) Methods of forming class-interval


i) Exclusive method (overlapping): In this method, the upper limit of one
class is the lower limit of the next class. This method maintains the
continuity of the data.
A student whose marks are in the range 20–29.9 will be included in the
20–30 class.
Table 3.17 depicts the number of students who have scored marks in the
range of 20-50.
Table 3.17: Marks versus Students
Marks No. of students
20 to less than 30 5
(20 or more but less than 30)
30 to less than 40 15
40 to less than 50 25
Total Students 45

Key Statistic
Class intervals are of two types; exclusive and inclusive. The class
interval that does not include upper class limit is called an exclusive type
of class interval. The class interval that includes the upper class limit is
called an inclusive type of class interval.
ii) Inclusive method (non-overlapping): The class interval that includes
the upper class limit is called an inclusive type of class interval.
Example 15
Table 3.18: Marks versus Students

Marks No. of students


20 – 29 5
30 – 39 15
40 – 49 25
A student whose marks are 29 is included in the 20–29 class interval, and a
student whose marks are 39 is included in the 30–39 class interval.
In table 3.19, the class ‘0 – 9’ includes the value ‘9’. In table 3.20, the class
‘0 – 10’ does not include the value ‘10’. If the value of ‘10’ occurs, it is
included in the class ‘10 – 20’.
Table 3.19: Inclusive Type of Class Interval

Marks Number of Students


0–9 15
10 – 19 20

Table 3.20: Exclusive Type of Class Interval

Marks Number of Students


0-10 15
10-20 20
20-30 28

d) Class frequency: The number of observations falling within a class
interval is called its class frequency.
Example 16: Suppose the class frequency of the class 90–100 is 5. This
represents 5 students who have scored between 90 and 100. If we add the
frequencies of all the individual classes, the total frequency represents
the total number of items studied.
e) Magnitude of class interval: The magnitude of the class interval depends
on the range and the number of classes. The range is the difference
between the largest and smallest values in the data series. A class interval
is generally taken as a multiple of 5, such as 5, 10, 15 or 20.
Sturges’ formula to find the number of classes is given as follows:
K = 1 + 3.322 log N
K = No. of classes
N = Total no. of observations (log N is its logarithm to base 10)
Solved problem 4
If the total number of observations is 100, find the number of classes.
The number of classes could be:
K = 1 + 3.322 log 100
K = 1 + 3.322 x 2
K = 1 + 6.644
K = 7.644 ≈ 8 (rounded up)
Note: Under this formula, number of classes cannot be less than 4 and not
greater than 20.
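The calculation in Solved Problem 4 can be reproduced with a short Python sketch. The function name sturges_classes is an assumption made here; the logarithm is taken to base 10, and the result is rounded up to a whole number of classes as in the solved problem.

```python
import math


def sturges_classes(n_observations):
    """Return K = 1 + 3.322 log10(N), both exact and rounded up."""
    k = 1 + 3.322 * math.log10(n_observations)
    return k, math.ceil(k)


k_exact, k_rounded = sturges_classes(100)
print(f"K = {k_exact:.3f}, rounded up to {k_rounded} classes")
```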
f) Class mid point or class marks: The mid value or central value of the
class is called the mid point.
$\text{Mid point of a class} = \frac{\text{lower limit of class} + \text{upper limit of class}}{2}$
Solved Problem 5
For the class 10–20; find the lower class limit, the upper class limit, the
width of the class and the mid value of the class.
Solution
For the class 10–20, the lower class limit and the upper class limit are 10
and 20 respectively. The width of the class is 20 – 10 = 10. The mid value of
the class is calculated as:
$\text{Mid value of the class} = \frac{10 + 20}{2} = 15$
Therefore, the mid value of the class is 15.
g) Sturges’ formula to find the size of class interval
$h = \frac{\text{Range}}{1 + 3.322 \log_{10} N}$
where h is the size of the class interval and N is the total number of
observations.

Solved Problem 6
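In a group of 20 workers, the highest wage is Rs. 175 and the lowest wage is
Rs. 42 per day. Find the size of the interval.
Solution
The number of classes is
$K = 1 + 3.322 \log_{10} N = 1 + 3.322 \log_{10} 20 = 1 + 3.322 \times 1.3010 = 5.3219 \approx 6$
Taking the denominator as this rounded-up number of classes, the size of the
class interval is
$h = \frac{\text{Range}}{1 + 3.322 \log_{10} N} = \frac{175 - 42}{1 + 3.322 \log_{10} 20} = \frac{133}{6} = 22.17 \approx 23$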
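The steps of Solved Problem 6 can be sketched in Python. The function sturges_interval_size is a hypothetical helper written for this illustration; it follows the solved problem in rounding the number of classes up before dividing the range, which is one possible reading of the formula.

```python
import math


def sturges_interval_size(highest, lowest, n_observations):
    """Return (K, h): classes by Sturges' rule and the class interval size."""
    k = math.ceil(1 + 3.322 * math.log10(n_observations))  # round K up first
    h = (highest - lowest) / k                              # range / K
    return k, h


# 20 workers, highest wage Rs. 175, lowest wage Rs. 42
k, h = sturges_interval_size(highest=175, lowest=42, n_observations=20)
print(f"K = {k}, h = {h:.2f}, rounded up to {math.ceil(h)}")
```

Rounding h up rather than down makes sure that K classes of that size cover the whole range.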

3.4.5 Constructing a frequency distribution


The following guidelines may be considered for the construction of
frequency distribution.
a) The classes should be clearly defined and each observation must
belong to one and only one class interval. Class intervals must be
all-inclusive and non-overlapping.
b) The number of classes should be neither too large nor too small.
Too few classes result in wider intervals and a loss of accuracy. Too
many classes result in complexity.
c) All intervals should be of the same width. This is preferred for easy
computations.
Range
The width of interval =
Number of classes
d) Open end classes should be avoided since it creates difficulty in
analysis and interpretation. (Open end class means either lower limit of
the first class or upper limit of the last class will not be specified)
e) Intervals should be continuous throughout the distribution. This is
important for continuous distribution.
f) The lower limits of the class intervals should be simple multiples of the
interval.

3.4.6 Tally marks


Tally marks are used to construct a frequency table. A tally mark is a small
vertical line drawn against a class as soon as we observe a value belonging
to that class. Every fifth tally mark is drawn across the previous four for
easy counting. Table 3.21 represents the marks secured in mathematics by
the students of a class.

Example 17
From Table 3.21, we can see that ten students got 90 marks in
mathematics, five students got 82 and five got 75.
Table 3.21: Marks Secured by Students in Mathematics
Marks secured in mathematics   Number of Students
90                             10
82                             5
75                             5

Solved Problem 7
The following problem will explain how raw data can be converted to
frequency distribution.

Marks of 100 students (out of 20) are given below:


Table 3.22: Marks of 100 students (out of 20)

5 14 10 16 8 15 1 14 9 6
11 3 8 12 6 4 11 17 7 10
18 10 15 9 8 14 8 5 15 4
10 13 4 18 2 6 10 7 13 8
16 7 14 11 9 4 11 9 3 7
1 8 10 5 13 7 15 8 19 16
6 17 11 15 6 3 18 12 9 4
14 11 9 4 14 12 8 7 19 10
15 8 19 11 7 16 10 3 6 14
10 19 3 20 8 11 20 14 9 19

Solution
Frequency table for the above data is as follows
Table 3.22a: Frequency table

Key Statistic
If the class interval does not prescribe lower limit for first class or upper
limit for the last class, then it is known as open-end class interval.

Solved Problem 8
In a survey, it was found that 64 families bought milk in the following
quantity in a particular month.

Table 3.23: Quantity of milk bought by 64 families in a particular month

16 22 9 22 12 39 19 14 23 6
24 16 18 17 20 25 28 18 10 24
20 21 10 7 18 28 24 20 14 23
25 34 22 5 33 23 26 29 13 36
11 26 11 37 30 13 8 15 22 21
32 21 31 17 16 23 12 9 15 27
17 21 19 7

Form a frequency distribution of inclusive type.


Solution: Here the total frequency N = 64.
Range = Largest value – Smallest value = 39 – 5 = 34
If the class width is 5, the number of classes = Range / Class width =
34/5 = 6.8 ≈ 7 (rounded up).
Hence, taking the magnitude of each class interval as 5, we get 7 class
intervals. Since the smallest value is 5, the first class interval by the
inclusive method is 5–9. The frequency distribution table is as follows:
Table 3.23a: Frequency distribution table
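The inclusive-type distribution described in this solution can be tallied from the raw data of Table 3.23 with a short Python sketch. The names milk_quantities and inclusive_frequency are assumptions made for this illustration; the classes start at the smallest value and each one includes both of its limits (5–9, 10–14, and so on).

```python
# Quantities of milk bought by the 64 families (Table 3.23)
milk_quantities = [
    16, 22, 9, 22, 12, 39, 19, 14, 23, 6,
    24, 16, 18, 17, 20, 25, 28, 18, 10, 24,
    20, 21, 10, 7, 18, 28, 24, 20, 14, 23,
    25, 34, 22, 5, 33, 23, 26, 29, 13, 36,
    11, 26, 11, 37, 30, 13, 8, 15, 22, 21,
    32, 21, 31, 17, 16, 23, 12, 9, 15, 27,
    17, 21, 19, 7,
]


def inclusive_frequency(values, width):
    """Count integer values in inclusive classes starting at the minimum."""
    lowest, highest = min(values), max(values)
    counts = {}
    lower = lowest
    while lower <= highest:                       # build classes 5-9, 10-14, ...
        counts[(lower, lower + width - 1)] = 0
        lower += width
    for v in values:
        lo = lowest + ((v - lowest) // width) * width
        counts[(lo, lo + width - 1)] += 1
    return counts


milk_freq = inclusive_frequency(milk_quantities, width=5)

print("Quantity  No. of families")
for (lo, hi), f in milk_freq.items():
    print(f"{lo:>2} - {hi:<2}   {f}")
print("Total     ", sum(milk_freq.values()))
```

The printed total should equal N = 64, which is a quick check that every family has been placed in exactly one class.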

3.4.7 Derived frequency distributions


From a given frequency distribution, we can form five derived frequency
distributions. They are:
i) Relative frequency distribution
If ‘f’ is the class frequency and ‘N’ is the total frequency, the relative
frequency distribution is formed by calculating f/N. Total of all the
values of relative frequency distribution will always be one.
ii) Percentage frequency distribution
The percentage frequency distribution is formed by multiplying
the ratio f/N by 100.
iii) Frequency density distribution
If “c” is the width of the class-interval and “f” is the frequency of the
class, then frequency density distribution is formed by calculating f/c.
iv) Less than cumulative frequency distribution
The less than cumulative frequency distribution is formed with a
number of observations which are less than a given value.
v) More than cumulative frequency distribution
The more than cumulative distribution is formed with a number of
observations, which are more than a given value.

3.4.8 Cumulative frequency distribution


Cumulative frequency distribution is a frequency distribution that indicates
directly the number of units that lie above or below the specified values of
the class intervals. When the interest of the investigator is on the number of
cases below the specified value, then the specified value represents the
upper limit of the class interval. It is known as ‘less than’ cumulative
frequency distribution. When the interest lies in finding the number of cases
above specified value, then this value is taken as lower limit of the specified
class interval. This is known as ‘more than’ cumulative frequency
distribution.
The cumulative frequency is simply the running total obtained by adding up
the consecutive class frequencies.


Solved Problem 9
In a country music band of 48 members, 22 play guitar, 12 play brass,
14 play piano. Create a tabular display of the frequency and Relative
frequency distribution for the type of instruments.
Solution
Table 3.24 depicts the frequency and relative frequency distribution for the
type of instruments in a country music band.
Table 3.24: Frequency Distribution of the Type of Instruments

Type of Instrument Frequency Relative Frequency Distribution


Guitar 22 22/48 = 0.4583
Brass 12 12/48 = 0.2500
Piano 14 14/48 = 0.2917
Total 48 1

Solved Problem 10
Table 3.25 depicts the frequency distribution of marks. Calculate the derived
frequency distributions, less than and more than cumulative frequency
distribution.
Table 3.25: Frequency Distribution of Marks

Marks No of students
0-20 15
20-40 20
40-60 28
60-80 22
80-100 15
Total 100

Solution: Table 3.25a depicts the derived frequency distributions. Table
3.25b depicts less than cumulative frequency distribution and table 3.25c
depicts more than cumulative frequency distribution.


Table 3.25a: Forms of Derived Frequency Distribution

Marks       Frequency   Relative freq.   Frequency     Percentage
            f           distribution     Density       Distribution
                        (f/N)            D = (f/c)     (f/N) x 100
0 – 20      15          0.15             0.75          15
20 – 40     20          0.20             1.00          20
40 – 60     28          0.28             1.40          28
60 – 80     22          0.22             1.10          22
80 – 100    15          0.15             0.75          15
Total       N = 100     1.00             –             100 %

Table 3.25b: Less than Cumulative Frequency Distribution

Marks       Frequency f   Less than Cumulative Frequency
0 – 20      15            15
20 – 40     20            15+20 = 35
40 – 60     28            35+28 = 63
60 – 80     22            63+22 = 85
80 – 100    15            85+15 = 100
Total       N = 100

Table 3.25c: More than Cumulative Frequency Distribution

Marks       Frequency f   More than Cumulative Frequency
0 – 20      15            100
20 – 40     20            100-15 = 85
40 – 60     28            85-20 = 65
60 – 80     22            65-28 = 37
80 – 100    15            37-22 = 15
Total       N = 100
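
The derived distributions in Tables 3.25a to 3.25c can be reproduced with a short Python sketch. The class limits and frequencies come from Table 3.25; the variable names are illustrative assumptions, not part of the original solution.

```python
# Class limits and frequencies from Table 3.25
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [15, 20, 28, 22, 15]

N = sum(freqs)  # total frequency

# Relative frequency f/N, percentage (f/N) x 100, and density f/c
relative = [f / N for f in freqs]
percentage = [100 * f / N for f in freqs]
density = [f / (hi - lo) for (lo, hi), f in zip(classes, freqs)]

# 'Less than' cumulative frequency: running total from the first class
less_than = []
running = 0
for f in freqs:
    running += f
    less_than.append(running)

# 'More than' cumulative frequency: observations at or above each lower limit
more_than = []
remaining = N
for f in freqs:
    more_than.append(remaining)
    remaining -= f

for row in zip(classes, freqs, relative, density, percentage, less_than, more_than):
    print(row)
```

The relative frequencies should add up to one and the last 'less than' value should equal N, which gives a simple check on the arithmetic.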


3.4.9 Bivariate and Multivariate frequency distribution


A frequency distribution of more than two variables is known as a multivariate
frequency distribution. If the number of variables is only two, then it is
called a bivariate frequency distribution. A bivariate frequency distribution
will have two marginal distributions and “m+n” conditional distributions. Here
‘m’ represents the number of rows and ‘n’ represents the number of columns.
In table 3.26, the numbers in the last column represent the marginal
distribution of age and the numbers in the last row represent the marginal
distribution of salary. Any other column represents a conditional
distribution of age for a given salary, and any other row represents a
conditional distribution of salary for a given age. There are 4 rows (m = 4)
and 3 columns (n = 3). We have 4+3=7 conditional distributions. Table 3.26a
depicts the conditional distribution of age for a given salary and table
3.26b depicts the conditional distribution of salary for a given age.
Table 3.26: Distribution of Age and Salary

Age in                  Salary/Month (Rs.)
years      9,000 – 12,000   12,000 – 15,000   15,000 – 18,000   Total
20 – 30    10               3                 0                 13
30 – 40    8                12                3                 23
40 – 50    6                15                10                31
50 – 60    0                3                 18                21
Total      24               33                31                88

Table 3.26a: Conditional Distribution of Age for Given Salary


Age Salary (9000 - 12000)
20-30 10
30-40 8
40-50 6
50-60 0
Total 24
Table 3.26b: Conditional Distribution of Salary for Given Age
Salary Age (40-50)
9000-12000 6
12000-15000 15
15000-18000 10
Total 31


Solved problem 11
Table 3.27 depicts the data related to the height and weight of 20 people.
Construct a bivariate frequency table with class intervals of height as 62-64,
64-66… and of weight as 115-125, 125-135…, and write down the marginal
distributions of X and Y.
Table 3.27: Height and Weight of 20 People
S.No. Height Weight S.No. Height Weight
1 70 170 11 70 163
2 65 135 12 67 139
3 65 136 13 63 122
4 64 137 14 68 134
5 69 148 15 67 140
6 63 121 16 69 132
7 65 117 17 65 120
8 70 128 18 68 148
9 71 143 19 67 129
10 62 129 20 67 152

Solution
Table 3.27a depicts the bivariate frequency table showing height and weight
of people.
Table 3.27a: Bivariate Frequency Table

                          Height (X)
Weight (Y)   62-64     64-66     66-68     68-70     70-72     Total
115-125      II (2)    II (2)                                  4
125-135      I (1)               I (1)     II (2)    I (1)     5
135-145                III (3)   II (2)              I (1)     6
145-155                          I (1)     II (2)              3
155-165                                              I (1)     1
165-175                                              I (1)     1
Total        3         5         4         4         4         20


Table 3.27b depicts the marginal distribution of height and weight.


Table 3.27b: Marginal Distribution of Height and Weight

Marginal distribution of      Marginal distribution of
height (X)                    weight (Y)
CI         Frequency          CI          Frequency
62-64      3                  115-125     4
64-66      5                  125-135     5
66-68      4                  135-145     6
68-70      4                  145-155     3
70-72      4                  155-165     1
Total      20                 165-175     1
                              Total       20
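
The bivariate table and its marginal distributions can be checked with a Python sketch. It assumes exclusive-type classes, in which the upper limit belongs to the next class (so a height of 64 falls in 64-66), which is the convention the solution appears to follow. The helper `class_of` is a hypothetical name introduced for this example.

```python
from collections import Counter

# (height, weight) pairs of the 20 people in Table 3.27
people = [
    (70, 170), (65, 135), (65, 136), (64, 137), (69, 148),
    (63, 121), (65, 117), (70, 128), (71, 143), (62, 129),
    (70, 163), (67, 139), (63, 122), (68, 134), (67, 140),
    (69, 132), (65, 120), (68, 148), (67, 129), (67, 152),
]

def class_of(value, start, width):
    """Return the (lower, upper) class containing value; upper limit excluded."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

# Joint frequency of each (weight class, height class) cell
joint = Counter(
    (class_of(w, 115, 10), class_of(h, 62, 2)) for h, w in people
)

# Marginal distributions: add the cell frequencies over the other variable
height_marginal = Counter()
weight_marginal = Counter()
for (w_class, h_class), f in joint.items():
    height_marginal[h_class] += f
    weight_marginal[w_class] += f

print("Marginal distribution of height:", sorted(height_marginal.items()))
print("Marginal distribution of weight:", sorted(weight_marginal.items()))
```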

Self Assessment Questions


11. i) If the data readings are 3, 4, 5, 6, 7, then it is called _________
variable. Height is generally __________ variable.
ii) There are ____________ derived frequency distributions for any
frequency distribution.
iii) Width of class-interval is given by the difference between
________ and ______.
iv) There are ________ marginal distributions for a distribution.
v) __________ formula is used to calculate the number of class-
intervals.
vi) The relative frequency distribution is obtained from frequency
distribution by calculating ___________.

3.5 Presentation of Data


Top management and the common man do not have time to go through mass
data and understand its nature. For them, diagrammatic and graphical
presentations are more intelligible, attractive and appealing.
Diagrammatic representations give a bird’s-eye view of the data. They
facilitate comparison of various aspects of the data and create lasting
impressions. However, they cannot be considered alternatives to
numerical data. Mathematical calculations are not possible with them, as
they do not give accurate values.


3.5.1 Diagrammatic and Graphic representation


The data collected can be presented graphically or pictorially for easy
understanding and quick interpretation. Diagrams and graphs give visual
indications of magnitudes, groupings, trends and patterns in the data. The
parameters can be presented more simply in a graphical manner. Diagrams
and graphs help in comparing the variables.
Diagrammatic presentation
A diagram is a visual form for presentation of statistical data. The term
diagram refers to various types of devices such as bars, circles, maps,
pictorials, cartograms, etc.
Importance of diagrams:
1. They are simple, attractive and easily understandable
2. They give quick information
3. They help to compare the variables
4. They are more suitable for illustrating discrete data
5. They leave a more lasting impression on the reader’s mind
Limitations of diagrams
a. Diagrams show only approximate values
b. Diagrams are not suitable for further analysis
c. Some diagrams (multidimensional ones) can be understood only by experts
d. Details cannot be provided fully
e. They are useful only for comparison
3.5.2 General rules for drawing the diagrams
i) Each diagram should have a suitable title, at the top or at the bottom,
indicating the theme that the diagram is intended to convey.
ii) The size of the diagram should emphasise the important characteristics
of the data.
iii) Appropriate proportion should be maintained between the length and
breadth of the diagram.
iv) A proper/suitable scale should be adopted for the diagram.
v) Selection of an appropriate diagram is important, as a wrong selection
may mislead the reader.
vi) Source of data should be mentioned at the bottom.
vii) Diagram should be simple and attractive.
viii) Diagram should be effective and not complex.

3.5.3 Types of diagrams


1. One dimensional diagrams (Line and Bar)
In a one dimensional diagram, only the length of the bars or lines is taken
into account and the widths of the bars are not considered. One dimensional
diagrams are classified mainly as follows:
a. Line diagram
b. Bar diagram
1. Vertical bar diagram
2. Horizontal bar diagram
3. Multiple (compound) bar diagram
4. Sub-divided (component) bar diagram
5. Percentage subdivided bar diagram
a. Line diagram –
This is the simplest type of one dimensional diagram. Lines are drawn with
heights corresponding to the size of the figures. The distances between the
lines are kept uniform. The limitation of this diagram is that it is not
attractive and it cannot convey more than one piece of information.

Solved Problem 12
Draw the line diagram for the following data
Table 3.28: Data for line diagram

Year                             2001   2002   2003   2004   2005   2006
No. of students passed in first
class with distinction           5      7      12     5      13     15


Solution
Figure 3.7 depicts line diagram.

[Line diagram: x-axis: Year (2001 to 2006); y-axis: No. of students passed in FCD; values 5, 7, 12, 5, 13 and 15]

Fig. 3.7: Line Diagram

Indication of diagram: The number of FCDs is highest in 2006 and lowest in
2001 and 2004.
b. Simple bar diagram
Simple bar diagram is drawn when items are to be compared with respect to
a single characteristic. A rectangular bar is constructed with height
proportional to the magnitude of the items. It can be drawn using a
horizontal or a vertical bar. In business and economics, it is a common
diagram.
Solved Problem 13
Table 3.29 depicts the data regarding the yield/acre of paddy in Karnataka
over the last five years. Represent the data in a bar diagram.
Table 3.29: Data Regarding the Yield Per Acre in Karnataka

Year     2001   2002   2003   2004   2005
Yield    20     22     25     27     30


Solution
Figure 3.8 is a simple bar diagram which depicts the yield of paddy in
Karnataka.

Fig. 3.8: Simple Bar Diagram Showing Yield of Paddy in Karnataka

1. Vertical bar diagram


Solved Problem 14
The annual expenses of maintaining various types of cars are given as
follows. Draw the vertical bar diagram. The annual expenses of
maintenance include (fuel + maintenance + repair + assistance +
insurance).
Table 3.30: Annual Expenses of Cars

Type of the Car Expense in Rs. / Year


Maruthi Udyog 47533
Hyundai 59230
Tata Motors 63270

(Source: 2005 TNS TCS Study)


(Published at: Vijaya Karnataka dated: 03.08.2006)


Solution
Figure 3.9 depicts the annual expenses of various cars in a vertical bar
diagram.

[Vertical bar diagram: x-axis: type of car; y-axis: expense in Rs. / year; Maruthi Udyog 47533, Hyundai 59230, Tata Motors 63270]

Fig. 3.9: Vertical Bar Diagram


(Source: 2005 TNS TCS Study)
(Published at: Vijaya Karnataka dated: 03.08.2006)

Indications of the diagram


• Annual expenses of the Maruthi Udyog car are comparatively lower than those
of the other brands depicted.
• High annual expenses of Tata Motors can be seen from the diagram.

2. Horizontal bar diagram


Solved Problem 15
Table 3.31 depicts the data of the world’s top 10 steel makers. Draw a
horizontal bar diagram.
Table 3.31: Production of Steel Makers in Million Tonnes

Steel maker       Prodn. in million tonnes
Arcelor Mittal    110
Nippon            32
POSCO             31
JFE               30
BAO Steel         24
US Steel          20
NUCOR             18
RIVA              18
Thyssen-krupp     17
Tangshan          16


Solution
Figure 3.10 depicts production of steel by top ten steel makers.

[Horizontal bar diagram: x-axis: Production of Steel (Million Tonnes), 0 to 120; y-axis: Top - 10 Steel Makers, from Arcelor Mittal (110) to Tangshan (16)]

Fig. 3.10: Production of Top Ten Steel Makers

(Source: ISSB Published by India Today)

3. Compound bar diagram (Multiple bar diagram)


Multiple bar diagrams are used to provide more information than simple bar
diagrams. A multiple bar diagram represents more than one phenomenon and
is highly useful for direct comparison. The bars are drawn side by side and
different colours, shades or hatches can be used to indicate each of the
variables. Multiple bar diagrams are drawn when we have two or more
sets of comparable values.
Solved Problem 16
Table 3.32 depicts the resale value of cars (Rs. 000). Draw the bar diagram
for the following data:
Table 3.32: Resale Value of Various Cars
Year (Model) 2003 2004 2005
Santro 208 252 248
Zen 240 278 274
Wagon R 261 296 302


Solution

[Multiple bar diagram: x-axis: Model of Car; y-axis: Value in Rs.; legend: Santro, Zen, Wagon R]

Fig. 3.11: Bar diagram for data in Table 3.32


(Source: True value used car purchase data)
(Published by: Vijaya Karnataka dated: 03.08.2006)

Solved Problem 17
Table 3.33 depicts the cost of manufacturing/unit and the revenue/unit from
2002-2005. Create a multiple bar diagram for this data.

Table 3.33: Cost and Revenue per Unit

Year           Cost of Manufacturing/Unit   Revenue/Unit
2002 – 2003    40                           70
2003 – 2004    45                           85
2004 – 2005    55                           90

Solution: The multiple bar diagram in figure 3.12 depicts the cost and
revenue per unit.


Fig. 3.12: Multiple Bar Diagram showing the Cost and Revenue per Unit

4. Component (sub-divided) bar diagram


Component bar diagrams are used when two or more characteristics are
observed in a unit. Each bar is proportionally subdivided.

5. Percentage Bar diagram


Percentage bars are drawn to represent items whose magnitude has two or
more components, and when a comparison of these components as
percentage is required. Here the components are expressed as percentages
of the corresponding totals. The totals are represented by bars of equal
width and height equal to 100 each. These bars are divided according to the
percentage components. The different subdivisions are shaded properly and
an index which describes the shades is provided.

Solved Problem 18
The following table gives the details of monthly expenditure of two families
A and B. Represent the data by percentage bar chart.


Table 3.34: Monthly expenditure of two families A and B

Item             Family A             Family B
                 (Income Rs. 5000)    (Income Rs. 8000)
Food             1400                 1600
House rent       1200                 1600
Fuel             700                  800
Miscellaneous    1000                 1600

Solution: Here, on adding, the total expenditure of family A comes to
Rs. 4300 and that of family B comes to Rs. 5600. But the incomes of the
families are Rs. 5000 and Rs. 8000 respectively. Thus, the balance amounts
of Rs. 700 and Rs. 2400 can be treated as savings. Expressing each item as a
percentage of the family's income, we get the following percentages.
Table 3.34a: Percentage of expenditures

                 Family A                        Family B
Item             Percentage of    Cumulative     Percentage of    Cumulative
                 expenditure      percentage     expenditure      percentage
Food             28               28             20               20
House rent       24               52             20               40
Fuel             14               66             10               50
Miscellaneous    20               86             20               70
Savings          14               100            30               100
Income           100                             100
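
The percentages and cumulative percentages in Table 3.34a can be worked out with a small Python sketch. It treats the unspent balance as savings, as the solution does; the function name `percentage_components` is hypothetical and chosen only for this illustration.

```python
# Monthly income and expenditure of the two families (Table 3.34)
income = {"A": 5000, "B": 8000}
expenditure = {
    "A": {"Food": 1400, "House rent": 1200, "Fuel": 700, "Miscellaneous": 1000},
    "B": {"Food": 1600, "House rent": 1600, "Fuel": 800, "Miscellaneous": 1600},
}

def percentage_components(items, total_income):
    """Return (item, percentage, cumulative percentage) rows, savings last."""
    components = dict(items)
    # The balance of income over expenditure is treated as savings
    components["Savings"] = total_income - sum(items.values())
    rows = []
    cumulative = 0.0
    for name, amount in components.items():
        pct = 100 * amount / total_income
        cumulative += pct
        rows.append((name, pct, cumulative))
    return rows

for family in ("A", "B"):
    print("Family", family)
    for name, pct, cum in percentage_components(expenditure[family], income[family]):
        print(f"  {name:<14} {pct:6.1f} {cum:6.1f}")
```

The cumulative percentages mark the heights at which each bar of height 100 is sub-divided.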

Fig. 3.13: Percentage bars representing expenditure of families


2. Two-dimensional diagram
In a two-dimensional diagram, both the breadth and the length of the diagram
are considered, i.e. the area of the diagram represents the data. The
important two-dimensional diagrams are:
• Rectangular diagram
• Square diagram
• Rectangular diagram: Rectangular diagrams are used to depict two or
more variables. This diagram helps in direct comparison. The area of the
rectangles is kept in proportion to the values. It may be of two types:
  • Percentage sub-divided rectangular diagram
  • Sub-divided rectangular diagram
In the former, the width of the rectangles is proportional to the values, the
various components of the values are converted into percentages and the
rectangles are divided according to them. In the latter case, rectangles are
used to show some related phenomenon like cost per unit, quality of
production, etc.
Solved Problem 19
Table 3.35 depicts the expenditure of items by family A and family B. Draw
the rectangle diagram.
Table 3.35: Expenditure of Items in Rupees
Item of Expenditure    Expenditure in Rs.
                       Family A    Family B
Provisional stores     1000        2000
Education              250         500
Electricity            300         700
House Rent             1500        2800
Vehicle Fuel           450         1000
Total                  3500        7000

Solution
Total expenditure will be taken as 100 and the expenditure on individual
items is expressed as a percentage of it. The widths of the two rectangles
are in proportion to the total expenses of the two families, i.e.
3500 : 7000 or 1 : 2. The heights of the rectangles are divided according to
the percentages of expenses. Table 3.35a depicts the expenditure on various
items on a monthly basis, by family A and family B.
Table 3.35a: Monthly Expenditure of Items

                       Monthly Expenditure
Item of Expenditure    Family A (Rs. 3500)     Family B (Rs. 7000)
                       Rs.     Percentage      Rs.     Percentage
Provisional stores     1000    28.57           2000    28.57
Education              250     7.14            500     7.14
Electricity            300     8.57            700     10
House Rent             1500    42.85           2800    40
Vehicle Fuel           450     12.85           1000    14.28
Total                  3500    100             7000    100

[Percentage sub-divided rectangular diagram: x-axis: Family (A, B); y-axis: % of Expenditure, 0 to 100; legend: Provisional Stores, Education, Electricity, House Rent, Vehicle Fuel]

Fig. 3.14: Diagram showing the expenditure

• Square diagram: To draw square diagrams, the square root is taken of
the values of the various items. A suitable scale may be used to depict
the diagram. Ratios are to be maintained to draw squares.


• Component pie diagram
It is drawn when data have magnitudes for two or more components. Circles
with the area proportional to magnitudes are drawn to represent the total
magnitude. Then circles are divided sector-wise according to the magnitude
of the components. If ‘T’ is the total magnitude and ‘R’ is the magnitude of a
component, then the angle ‘A’ at the centre (in degrees) is given by:
$$A = \frac{360 \times R}{T}$$
Solved Problem 20
Table 3.36 depicts the expenses of Prasad’s family and Krishna’s family.
Draw a pie-diagram for the data.
Table 3.36: Monthly Expenses of Two Families
Items    Monthly Expenses of Two Families
         Prasad’s Family    Krishna’s Family
Food     2000               4000
Rent     1000               1500
Fuel     500                1000
Misc     500                1500
Total    4000               8000

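Solution: Since the areas of the circles are proportional to the totals, the
radii of the circles should be in the ratio of their square roots:
$\sqrt{4000} : \sqrt{8000}$
= 63.245 : 89.44
= 1.26 : 1.79 (dividing by 50)

We draw two circles with radii 1.3 and 1.8 cms (where, 1 cm = 50 units).
Table 3.36a depicts the determined angles at the centre.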


Table 3.36a: Monthly Expenses Represented in Angles

Items    Monthly Expenses of Two Families (angle in degrees)
         Prasad’s Family    Krishna’s Family
Food     180                180
Rent     90                 67.5
Fuel     45                 45
Misc     45                 67.5
Total    360                360

Fig. 3.15: Pie-chart Showing Monthly Expenses of Prasad’s Family

Fig. 3.16: Pie-chart Showing Monthly Expenses of Krishna’s Family

Self Assessment Questions


12. i) Diagrams give an accurate value. (True/False)
ii) Pie diagram is drawn according to degree subtended at the centre
of a circle. (True/False)

iii) Simple bar diagram is drawn for multiple characteristics.
(True/False)
13. The graph plotted in the form of series of rectangles is
i) Frequency ii) Frequency polygon
iii) Pie iv) Histogram
14. The diagram which is used to show percentage breakdown is
i) A circle ii) A square
iii) A pie diagram iv) A rectangle
15. A line graph indicates
i) Comparison ii) Variation
iii) Range iv) All the above
16. Which of the following is not a type of bar chart?
i) Multiple ii) Percentages
iii) Subdivided iv) Ogive

3.5.4 Choice or selection of diagram


There are many methods to depict statistical data through diagrams. No
single diagram is suited for all purposes. The choice/selection of a diagram
to suit a given set of data requires skill, knowledge and experience.
Primarily, the choice depends upon the nature of the data and the purpose of
the presentation, and on the audience for whom it is meant. The nature of
the data will help in deciding between a one-dimensional, two-dimensional or
three-dimensional diagram.
The following points are to be kept in mind for the choice of diagram:
• To the common man who has less knowledge of statistics, cartograms
and pictograms are suited.
• To present the components apart from the magnitude of values, a sub-divided
bar diagram can be used.
• When a large number of components are to be shown, a pie diagram is
suitable.

3.6 Graphical Presentation


Graphical presentations are a visual form of presentation in which graphs
are drawn on a special type of paper known as graph paper.
Graphs are used mainly for frequency distributions. Some of the types of
graphs are:
i) Histogram
ii) Frequency polygon
iii) Frequency curve
iv) Ogives [cumulative frequency curves]
Advantages of graphic presentation
• It provides an attractive and impressive view
• It simplifies the complexity of data
• It helps in direct comparison
• It helps in further statistical analysis
• It is the simplest method of presentation of data
• It shows the trend and pattern of data
Table 3.37 depicts the differences between diagrams and graphs.
Table 3.37: Differences between Diagrams and Graphs

Diagram                                   Graph
1. Ordinary paper can be used             1. Graph paper is required
2. It is attractive and easily            2. It is not easily understandable
   understandable
3. It is appropriate and effective to     3. It creates problems
   measure more variables
4. It cannot be used for further          4. It can be used for further analysis
   analysis
5. It gives comparison                    5. It shows relationship between
                                             variables
6. Data are represented by bars and       6. Points and lines are used to
   rectangles                                represent data
3.6.1 Histogram
In this type of representation, the given data are plotted in the form of a
series of rectangles. Class intervals are marked along the x-axis and the
frequencies along the y-axis according to a suitable scale. Unlike the
bar chart, which is one-dimensional, a histogram is two-dimensional, in
which the length and width are both important. A histogram is constructed
from a frequency distribution of grouped data, where the height of each
rectangle is proportional to the respective frequency and the width
represents the class interval.
Solved Problem 21
Table 3.38 depicts the range of marks obtained by the number of students.
Construct a histogram for the following data.
Table 3.38: Marks Obtained by Students

Marks obtained No. of students (f)


15 – 25 5
25 – 35 3
35 – 45 7
45 – 55 5
55 – 65 3
65 – 75 7
Total 30

Figure 3.17 depicts a histogram for the marks obtained by students.

Fig. 3.17: Histogram for the Marks Obtained by students


Solved Problem 22
Table 3.39 depicts the distribution of age. Draw a histogram for this data.
Table 3.39: Distribution of Age

Age                 0-10   10-20   20-30   30-40   40-50
No. of people (f)   5      10      15      12      8

Solution: The figure 3.18 depicts the histogram for the distribution of age
data.

Fig. 3.18: Histogram for the Distribution of Age

To locate the mode from the histogram, we join the upper left corner of the
highest rectangle to the upper left corner of the adjacent rectangle on its
right, and the upper right corner of the highest rectangle to the upper right
corner of the adjacent rectangle on its left. From the intersecting point of
these lines we draw a perpendicular to the x-axis. The x-reading at that
point gives the mode of the distribution.
If the widths of the rectangles are not equal, then we make the areas of the
rectangles proportional to the frequencies and draw the histogram.
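
The graphical construction of the mode can be mimicked numerically by finding where the two joining lines cross. The sketch below is an illustration only: it assumes classes of equal width and a single highest rectangle that is higher than at least one of its neighbours, and the function name `graphical_mode` is hypothetical. The data are those of Table 3.39.

```python
# Age distribution from Table 3.39
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 10, 15, 12, 8]

def graphical_mode(classes, freqs):
    """x-coordinate where the two diagonals on the highest rectangle cross."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # highest rectangle
    lower, upper = classes[i]
    width = upper - lower
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # rectangle on the left
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # rectangle on the right
    # Line 1 joins (lower, f1) to (upper, f2); line 2 joins (lower, f0) to
    # (upper, f1). Writing x = lower + t * width, they meet where
    # f1 + (f2 - f1) * t = f0 + (f1 - f0) * t.
    t = (f1 - f0) / ((f1 - f0) + (f1 - f2))
    return lower + t * width

print("Mode read from the histogram:", graphical_mode(classes, freqs))
```

The crossing point always lies inside the modal class, nearer to the neighbouring class with the larger frequency.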
3.6.2 Frequency polygon
A frequency polygon is a line chart of a frequency distribution in which
either the values of discrete variables or the mid-points of class intervals
are plotted against the frequency, and those plotted points are joined
together by straight lines. Since the frequencies do not start at zero or
end at zero, this diagram as such would not touch the horizontal axis.
However, since the area under the entire curve is the same as that of a
histogram, which is 100%, the curve must be ‘enclosed’. The first mid-point
is joined with a ‘fictitious’ preceding mid-point whose frequency is zero,
so that the beginning of the curve touches the horizontal axis, and the last
mid-point is joined with a ‘fictitious’ succeeding mid-point, whose frequency
is also zero, so that the curve will end at the horizontal axis. This
enclosed diagram is known as a ‘frequency polygon’.
Solved Problem 23
Table 3.40 depicts the number of frequencies at which the marks are
obtained. Construct a frequency polygon for this data.
Table 3.40: Number of frequencies at which the marks are obtained

Marks CI    Frequency f    Mid-point
15 – 25     5              20
25 – 35     3              30
35 – 45     7              40
45 – 55     5              50
55 – 65     3              60
65 – 75     7              70

Solution
Figure 3.19 depicts a frequency polygon.

[Frequency polygon: x-axis: Mid point (x), 0 to 100; y-axis: Frequency, 0 to 10]

Fig. 3.19: Frequency Polygon


3.6.3 Frequency curve


First we draw a histogram for the given data. Then we join the mid-points of
the tops of the rectangles by a smooth curve. The total area under the
frequency curve represents the total frequency. It is the most useful form
of frequency distribution.
Solved Problem 24
Construct a frequency curve for the data represented in table 3.39.
Solution
Figure 3.20 depicts the frequency curve for the data in solved problem 22.

Fig. 3.20: Frequency Curve

3.6.4 Ogives
An ogive is obtained by drawing the graph of a cumulative frequency
distribution. Hence, ogives are also called cumulative frequency curves.
Since a cumulative frequency distribution can be of 'less than' or 'more
than' type, we have less than and more than types of ogives.
Less than Ogive – Variables are taken along x-axis and less than
cumulative frequencies are taken along y-axis. Less than cumulative
frequencies are plotted against the upper limit of class interval and joined by
a smooth-curve.
More than Ogive – More than cumulative frequencies are plotted against
lower limit of the class-interval and joined by a smooth-curve.


From the meeting point of these two ogives, if we draw a perpendicular line
to the x-axis, the point where it meets x-axis gives the median of distribution.
Solved Problem 25
Construct an Ogive curve for the data depicted in table 3.41.
Table 3.41: Data for Ogive Curve

Marks      No. of           Mid-point    Less than     More than
CI         frequencies f                 Cum. Freq.    Cum. Freq.
15 – 25    5                20           5             30
25 – 35    3                30           8             25
35 – 45    7                40           15            22
45 – 55    5                50           20            15
55 – 65    3                60           23            10
65 – 75    7                70           30            7

Figure 3.21 depicts a ‘less than’ ogive diagram.

['Less than' ogive: x-axis: Upper Boundary (CI); y-axis: Less than Cumulative Frequency]

Fig. 3.21: Less than Ogive Diagram


Figure 3.22 depicts a ‘more than’ ogive diagram.


['More than' ogive: x-axis: Lower Boundary (CI); y-axis: More than Cumulative Frequency]

Fig. 3.22: More than Ogive Diagram
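
The two ogives meet at the median because, at any value x, the 'less than' and 'more than' cumulative frequencies add up to N, so they are equal only where each is N/2. A numerical counterpart is to read off the x-value at which the 'less than' ogive reaches N/2. The Python sketch below does this for Table 3.41. It assumes the plotted points are joined by straight lines rather than a free-hand smooth curve, and the function name `median_from_ogive` is hypothetical.

```python
# Class limits and frequencies from Table 3.41
classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
freqs = [5, 3, 7, 5, 3, 7]
N = sum(freqs)

# Points of the 'less than' ogive: cumulative frequency against upper limits,
# starting from zero at the lower limit of the first class
less_than = [(classes[0][0], 0)]
running = 0
for (lo, hi), f in zip(classes, freqs):
    running += f
    less_than.append((hi, running))

def median_from_ogive(points, total):
    """Interpolate linearly for the x at which cumulative frequency = total/2."""
    half = total / 2
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= half <= c1:
            if c1 == c0:
                return x0
            return x0 + (half - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("half the total frequency is not reached")

print("Median read from the ogive:", median_from_ogive(less_than, N))
```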

Activity:
1. A friend of yours heard that you were taking statistics and has
presented you with the following table from which he wants you to
construct a histogram.
Table 3.42: Frequency table
Age Relative Frequency (%)
00-14 28.4
15-44 50.5
45+ 21.1
100.0

Discuss the problems involved in drawing a histogram.


2. Given the following data:


18, 13, 2, 20, 8, 10, 5, 10, 6, 9, 10, 20, 2
15, 16, 16, 13, 10, 17, 10, 3, 2, 15, 8, 5
a) Construct a frequency distribution using the Class limits
1-4, 5-8, 9-12, 13-16, 17-20.
b) Why is it useful to construct a frequency distribution and/or a
histogram of the sample data?
3. On a final examination, the following scores were earned:
5,6,7,7,10,15,16,16,17,17,22,22,22,25,26,28,31,33,35,37,40.
Use this data to answer the following questions:
(1) Construct a frequency table for this data, grouping the data into
9 class intervals.
(2) What are the exact or real limits of the lowest class interval?
4. Variable Y can only take on the following five values with relative
frequencies as indicated:
Table 3.43: Frequency table

Y Relative Frequency Cumulative Relative Frequency

0 5/100

1 25/100

2 30/100

3 25/100

4 15/100

a. Fill in the cumulative relative frequencies.


b. Using 5 classes with unit widths and identifying the midpoints of
the classes with the values of y, plot the relative frequency
histogram for the data.
c. Using your relative frequency histogram, indicate the positions of
the mean, median, and mode.
d. If relative frequency is given as the interpretation of probability,
then what is the probability that y is greater than or equal to 2?

5. If both a frequency diagram and a relative frequency diagram are drawn
for the following data:
Table 3.44
X f
15 3
17 5
19 4
23 3
29 1

We can state that: (Choose the right option)


a. Both will have the same values on the horizontal axis.
b. The frequency diagram will be less precise than the relative
frequency diagram.
c. One depicts data in many-value classes while the other depicts
single-value classes.
d. Both depict continuous variables.
6. A graphical presentation may accomplish ALL BUT which of the
following objectives?
a) Illustrate the amount of variation in the data.
b) Illustrate approximately where the mean is.
c) Allow comparison with similar data.
d) Will have the exact same shape regardless of what units are used
on the axis.
7. A frequency distribution provides which of the following information:
a) The value of the measurement and the number of individuals
with that value.
b) The value of the measurement and the percent of individuals
with that value.
c) The value of the measurement and the percent of individuals
with that value or a smaller one.
d) The value of measurement permits comparisons between data
sets of different sizes.


Activity Solution
1. Open ended interval, too few intervals to give meaningful results and
intervals are of unequal length.
2. a) Table 3.45: Frequency table
Class Frequency Relative Frequency
1-4 4 4/25
5-8 5 5/25
9-12 6 6/25
13-16 6 6/25
17-20 4 4/25
Totals 25 1

b) (i) Sometimes it is easier to get the mean and variance (group method -- by hand).
(ii) Visual check on calculation of mean and variance.
(iii) Easier for a lay person (specialist also) to see what values
are likely and what values are unlikely in a population.
3. 1) Table 3.46: Frequency Table

Score Midpoint Frequency


5-8 6.5 4
9-12 10.5 1
13-16 14.5 3
17-20 18.5 2
21-24 22.5 3
25-28 26.5 3
29-32 30.5 1
33-36 34.5 2
37-40 38.5 2

2) The real or exact limits of the lowest interval are 4.5 – 8.5.


4. a) Table 3.47: Frequency table


Y Cumulative relative frequency
0 05/100
1 30/100
2 60/100
3 85/100
4 100/100

b) Relative frequency histogram: five bars of unit width centred on Y = 0, 1, 2, 3 and 4 (horizontal axis: Y; vertical axis: relative frequency in per cent), with heights of 5, 25, 30, 25 and 15 respectively.
c) Median = 1.5 + (20/30)(1.0) = 2.17
Mode = 2
Mean = [0(5) + 1(25) + 2(30) + 3(25) + 4(15)]/100 = 220/100 = 2.2
d) Probability (Y >= 2) = (30/100) + (25/100) + (15/100) = 70/100 =
0.70 = 70%
5. a) Both will have the same values on the horizontal axis.
6. d) Will have the exact same shape regardless of what units are used
on the axis.
7. a) The value of the measurement and the number of individuals with
that value.
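The arithmetic in answer 4 can be checked with a short script. The following is a minimal sketch in Python, which is not part of the original material; the variable names are illustrative, and the median is obtained by the usual grouped-data interpolation, assuming unit-width classes centred on each value of Y.

```python
# Relative frequencies from Table 3.43, expressed per 100 observations
rel_freq = {0: 5, 1: 25, 2: 30, 3: 25, 4: 15}
total = sum(rel_freq.values())

# a) cumulative relative frequencies (per 100)
cumulative = {}
running = 0
for y in sorted(rel_freq):
    running += rel_freq[y]
    cumulative[y] = running

# c) mean and mode
mean = sum(y * f for y, f in rel_freq.items()) / total
mode = max(rel_freq, key=rel_freq.get)

# c) median by interpolation inside the class containing the 50% point;
# each class has unit width and runs from y - 0.5 to y + 0.5
half = total / 2
below = 0
median = None
for y in sorted(rel_freq):
    if below + rel_freq[y] >= half:
        median = (y - 0.5) + (half - below) / rel_freq[y]
        break
    below += rel_freq[y]

# d) P(Y >= 2) under the relative-frequency interpretation of probability
p_at_least_2 = sum(f for y, f in rel_freq.items() if y >= 2) / total

print(cumulative, mean, mode, round(median, 2), p_at_least_2)
```

The same loop works for any discrete distribution given as value and frequency pairs.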


3.7 Summary
Let us recapitulate the important concepts discussed in this unit:
• For better understanding and usefulness, collected data is classified systematically according to common characteristics. Classification simplifies the data, makes it more comprehensible and prepares it for statistical analysis.
• Classified data is tabulated in rows and columns for presentation, using various types of classification. A table should be simple and unambiguous so that it can be understood and interpreted easily.
• A frequency distribution is a special type of tabulation. It brings out the salient features of the distribution in a concise form.
• Data presented in diagrammatic or graphical form is more appealing and gives busy executives a rough idea of the situation.
• A graph is a visual representation of data in the form of line diagrams, pie charts, histograms, frequency polygons, frequency curves or ogives.
• In a pie chart, the segments of a circle represent the percentage contribution of the various components to the total. It brings out the relative importance of the components of the data.
• The graph of a cumulative frequency distribution is the ogive.

3.8 Glossary
Bar graph: A graphical device for depicting data that have been
summarised in a frequency distribution.
Bivariate distribution: If the number of variables is only two, then it is
called bivariate frequency distribution.
Cross tabulation: A tabular summary of data for two variables.
Frequency distribution: A tabular summary of data showing the number of observations (frequency) in each class.
Histogram: A graphical presentation of a frequency distribution.
Multivariate frequency distribution: Frequency distribution of more than
two variables is known as multivariate frequency distribution.
Ogive: A graph of a cumulative distribution.


Pie chart: A graphical device for depicting data summaries based on the subdivision of a circle into sectors that correspond to the relative frequency of each class.

3.9 Terminal Questions


1. Form frequency distribution for the following data regarding weight of 50
people.
Table 3.48: Data regarding weight of 50 people
50 72 61 64 72 62 61 56 75 55
52 71 54 64 71 64 59 59 70 54
60 60 57 57 66 68 60 62 68 54
62 65 58 64 65 60 60 67 58 56
70 62 60 68 64 62 59 69 52 58

2. A junior executive of XYZ Company has prepared the budget for a new division of the company. Table 3.49 depicts the budget data. The vice president of the company wants to see a summary of the budget in diagrammatic form. Prepare a pie diagram.
Table 3.49: Budget of XYZ Company

Category Rs. in Lakhs


Capital investment 140
Salary and wages 65
Raw material 100
Research and development expenses 15
Miscellaneous 40

3. ABC Ice Cream Company attempts to keep all of its ten flavours of ice
cream in stock at each of its stores. In-charge of stores operation
collects data on the daily amount of each flavour to the nearest half
gallon.
i. Is the flavour classification discrete or continuous? Open or closed?
ii. Data collected, is it qualitative or quantitative?
iii. Is the amount collected on each flavour discrete or continuous?


4. Table 3.50 depicts certain data. Construct histogram for this data.
Table 3.50: Frequency Table

Class 0-5 5-10 10-15 15-20 20-25 25-30


Frequency 4 6 10 5 3 1

5. An association of real estate sellers has collected data on a sample of 100 salespeople with respect to the monthly commission earned by them. Table 3.51 depicts the data. Construct an ogive and find:
i. What proportion of salespeople earn more than 25,000?
ii. What proportion earn between 15,000 and 25,000?
Table 3.51: Collected Data of 100 People with Respect to Commissions Earned
Earnings         No. of people
5000 or less     5
5000-10000       9
10000-15000      13
15000-20000      30
20000-25000      27
25000-30000      16

3.10 Answers

Self Assessment Questions


1. Grouping, common characteristics
2. Bulk
3. Attributes
4. Two-Way Classification
5. Two
6. Chronological classification
7. i) location
8. iv) groups of related facts in different classes
9. ii) chronological
10. i) True, ii) False, iii) False, iv) True, v) False
11. i) Discrete variable, Continuous variable
ii) Five
iii) Upper class limit and lower class limit


iv) Two
v) Sturges’
vi) f/N
12. i) False ii) True iii) False
13. iv) Histogram
14. iii) A Pie diagram
15. iv) All the above
16. iv) Ogive
Terminal Questions
1. Table 3.52 depicts the solution for terminal question 1.
Table 3.52: Frequency Distribution Table
Class Interval (upper limit excluded)    Frequency
50-55    6
55-60    11
60-65    18
65-70    8
70-75    6
75-80    1
Total    50
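A tally of this kind is easy to get wrong by hand, so it can be worth checking mechanically. The sketch below, written in Python as an illustration rather than as part of the original material, groups the 50 weights of Table 3.48 into classes of width 5 starting at 50, assuming the exclusive convention in which the upper limit belongs to the next class.

```python
from collections import Counter

# Weights of 50 people, copied row by row from Table 3.48
weights = [
    50, 72, 61, 64, 72, 62, 61, 56, 75, 55,
    52, 71, 54, 64, 71, 64, 59, 59, 70, 54,
    60, 60, 57, 57, 66, 68, 60, 62, 68, 54,
    62, 65, 58, 64, 65, 60, 60, 67, 58, 56,
    70, 62, 60, 68, 64, 62, 59, 69, 52, 58,
]

start, width = 50, 5  # first lower limit and class width (assumed)

# Class index k covers start + k*width up to, but excluding, start + (k+1)*width
counts = Counter((w - start) // width for w in weights)

for k in sorted(counts):
    lower = start + k * width
    print(f"{lower}-{lower + width}: {counts[k]}")
print("Total:", sum(counts.values()))
```

Changing `start` or `width` gives alternative groupings of the same data.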

2. Table 3.53 gives the data required to construct the pie chart (figure 3.23) for the budget data of XYZ Company. Each angle is (amount / total) × 360°; because the budget totals Rs. 360 lakhs, each angle in degrees is numerically equal to the amount in lakhs.
Table 3.53: Budget of XYZ Company

Category                             Angle subtended at the centre of the circle (degrees)
Capital investment                   140
Salary and wages                     65
Raw material                         100
Research and development expenses    15
Miscellaneous                        40
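The angles for a pie chart follow from the rule angle = (component / total) × 360°. A minimal Python sketch of that calculation for the budget data of Table 3.49 is given below; it is an illustration only, and the dictionary name is an assumption.

```python
# Budget of XYZ Company from Table 3.49 (Rs. in lakhs)
budget = {
    "Capital investment": 140,
    "Salary and wages": 65,
    "Raw material": 100,
    "Research and development expenses": 15,
    "Miscellaneous": 40,
}

total = sum(budget.values())

# Angle subtended at the centre by each category, in degrees
angles = {category: amount * 360 / total for category, amount in budget.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

Because the total happens to be 360, the angles coincide with the amounts; for any other total the same formula rescales them.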


Fig. 3.23: Pie-chart


3. i. Discrete and closed
ii. Flavour is qualitative. Volume is quantitative
iii. Continuous
4. Figure 3.24 depicts a histogram diagram for the data in terminal
question 4.

Fig. 3.24: Histogram

5. Figure 3.25 is the ogive curve for the data given in terminal
question 5.
i. 16% ii. 57%
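The two proportions can be read off the cumulative ("less than") frequencies that the ogive plots. The following Python sketch, offered as an illustration with assumed variable names, builds those cumulative frequencies from Table 3.51 and derives both answers.

```python
from itertools import accumulate

# Upper class limits and frequencies from Table 3.51
upper_limits = [5000, 10000, 15000, 20000, 25000, 30000]
people = [5, 9, 13, 30, 27, 16]

# "Less than" cumulative frequencies: the points plotted on the ogive
cumulative = list(accumulate(people))
less_than = dict(zip(upper_limits, cumulative))
n = cumulative[-1]

# i. proportion earning more than 25,000
more_than_25000 = (n - less_than[25000]) / n

# ii. proportion earning between 15,000 and 25,000
between_15000_and_25000 = (less_than[25000] - less_than[15000]) / n

print(more_than_25000, between_15000_and_25000)
```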


Fig. 3.25: Ogive Curve

3.11 Case Study

Case Study 1: Housing Complex


The welfare committee of a large housing complex wants to understand the
possibility of appointing private security guards at the entrance gate of the
complex for 24 hours duty. There are 810 Flats in the Housing Complex and
the owners were asked to vote for or against the proposal. The following
data was collected:

Should the guards be appointed?

Yes 194
No 121
Not Sure 73
No response 422

Discussion Questions:
a) Convert the data to percentages and construct
i) A bar chart
ii) A pie chart
Which of these charts do you prefer to use and why?


b) Eliminating the no response group, convert the remaining 388 responses to percentages and again construct bar and pie charts.
c) If you have been designated as a poll officer, based on your analysis of
the data, what would you like to suggest to the president of the welfare
committee?
Case Study 2
All insurance companies offering unit-linked insurance policies charge a certain amount of money to meet initial expenses. However, the expense ratio varies from company to company. The table below depicts the expense ratios, including allocation charges, for some companies and for various maturity periods.
Present these data with the help of suitable graphs to provide a comparative idea of the money charged for meeting initial expenses. Comment on the comparative assessment.

Bajaj ICICI
Tata
HDFC unit Allianz Kotak Safe Prulife SBI Unit Birla Sun
AIG
linked New Investment Time Plus II life
Invest
Endowment Unit Plan Super Regular Premier
Assurell
Gain Regular
1 year 6.4 4.8 7.4 3.7 5.4 5.6 3.5
10 years 2.7 2.9 3.6 2.4 3.5 2.9 2.2
15 year 1.8 2.4 2.8 2.1 3.0 2.3 1.9
20 years 1.5 2.2 2.4 1.9 2.8 2.0 1.7
25 years 1.3 2.1 2.2 1.8 2.6 1.9 1.6
30 years 1.2 2.0 2.1 1.7 2.6 1.8 1.5
(Source: Economic Times dt 23rd October 2006)



Statistics for Management Unit 4

Unit 4 Measures of Central Tendency and Dispersion
Structure:
4.1 Introduction
Objectives
Relevance
Objectives of Statistical Average
4.2 Requisites of a Good Average
4.3 Statistical Averages or Measure of Central tendency
Arithmetic Mean
Properties of Arithmetic Mean
Combined Mean
Weighted Arithmetic Mean
Merits and demerits of Arithmetic Mean
4.4 Geometric Mean
Geometric Mean for Individual Series
Geometric Mean for Discrete series
Geometric Mean for Continuous series
4.5 Harmonic Mean
Relationship between Arithmetic mean, Geometric mean and
Harmonic mean
4.6 Median
Median for Individual series
Median for Discrete series
Median for Continuous series
Merits and demerits of Median
4.7 Mode
Merits and demerits of Mode
4.8 Appropriate Situations for the Use of Various Averages
4.9 Positional Averages
Quartiles
Deciles
Percentiles
4.10 Dispersion
Range


Quartile deviation
Mean deviation
4.11 Standard Deviation
Properties of Standard Deviation
Combined Standard Deviation
4.12 Coefficient of Variation
4.13 Summary
4.14 Glossary
4.15 Terminal Questions
4.16 Answers
4.17 Case Study

4.1 Introduction
In the previous unit, we studied the classification of data and its representation in tables and graphs. In this unit, we will study the measures of central tendency and dispersion.
Graphical representation is a good way to present summarised data. However, graphs provide only an overview and are of limited use for further analysis. Hence, we use summary statistics such as averages to analyse the data. Mass data, which is collected, classified, tabulated and presented systematically, is analysed further to reduce it to a single representative figure. This figure lies in the central part of the range of all values and represents the entire data set. Hence, it is called a measure of central tendency.
In other words, the tendency of data to cluster around a centrally located figure is known as central tendency. A measure of central tendency, or average of the first order, describes the concentration of a large number of values around a particular value. It is a single value which represents all units.
Objectives:
After studying this unit, you should be able to:
• describe the concept of Average (Measures of Central tendency) and Measures of Dispersion
• explain Arithmetic Mean for discrete and continuous data
• explain Median and Mode of statistical data
• explain Quartiles, Deciles and Percentiles for statistical data
• explain Coefficient of Variation for statistical data

4.1.1 Relevance
Small Fry Design
Founded in 1997, Small Fry Design is a toy and accessories company that designs and imports products for infants. The company’s product line includes teddy bears, mobiles, musical toys, rattles and security blankets, and features high-quality soft toy designs with an emphasis on colour, texture and sound. The products are designed in the United States and manufactured in China.
Small Fry Design uses independent representatives to sell the products to
infant furnishing retailers, children’s accessory and apparel stores, gift
shops, upscale department stores, and major catalogue companies.
Currently, Small Fry Design products are distributed in more than 1000 retail
outlets throughout the United States.
Cash flow management is one of the most critical activities in the day-to-day
operation of this young company. Ensuring sufficient incoming cash to meet
both current and ongoing debt obligations can mean the difference between
business success and failure. A critical factor in cash flow management is
the analysis and control of accounts receivable. By measuring the average
age and dollar values of outstanding invoices, management can predict
cash availability and monitor changes in the status of accounts receivable.
The company has set the following goals: the average age for outstanding
invoices should not exceed 45 days and the dollar value of invoices more
than 60 days old should not exceed 5% of the dollar value of all accounts
receivable.
In a recent summary of accounts receivable status, the following descriptive
statistics were provided for the age of outstanding invoices.
Mean 40 days
Median 35 days
Mode 31 days
Interpretation of these statistics shows that the mean or average age of an invoice is 40 days. The median shows that half of the invoices have been outstanding for 35 days or more. The mode shows that the most common length of time an invoice has been outstanding is 31 days. The statistical summary also showed that only 3% of the dollar value of all accounts receivable was over 60 days old. Based on this statistical information, management was satisfied that accounts receivable and incoming cash flow were under control.
4.1.2 Objectives of statistical average
The statistical average, or simply the average, is a measure of the middle value of a data set. The objectives of a statistical average are to:
• Present mass data in a concise form: Mass data is condensed to make it readable and usable for further analysis. It is very difficult for the human mind to grasp a large body of numerical figures. A measure of average summarises such data in a single figure, which makes it easier to understand.
• Facilitate comparison: It is difficult to compare two different sets of mass data directly. However, we can compare them after computing the average of each data set. The same measure of average should be used for both sets; comparing the mean salary of one group of employees with the median salary of another, for example, leads to incorrect conclusions.
• Establish relationship between data sets: The average can be used to draw inferences about unknown relationships between data sets. Computing the averages of sample data sets is also helpful for estimating the average of the population.
• Provide basis for decision-making: In many fields, such as business, finance and insurance, managers compute averages and draw useful inferences or conclusions from them in order to take effective decisions.
Thus, we have studied the objectives of the statistical average.

4.2 Requisites of a Good Average


In this section, we will discuss the requisites of a good average. The following are the requisites of a good average:
• It should be simple to calculate and easy to understand.
• It should be based on all values.
• It should not be affected by extreme values.
• It should not be affected by sampling fluctuation.
• It should be rigidly defined, preferably by an algebraic formula, so that different persons obtain the same value for a given set of data.
An average that satisfies these requisites can be regarded as a good average.

4.3 Statistical Averages or Measure of Central tendency


In this section, we will discuss the statistical averages. The average of a distribution has been defined in various ways. Some of the important definitions are as follows:
According to Clark and Schkade, "An average is an attempt to find one single figure to describe the whole of figures".
According to Murray R. Spiegel, "Average is a value which is typical or representative of a set of data".
According to Croxton and Cowden, "An average is a single value within the range of the data that is used to represent all the values in the series. Since an average is somewhere within the range of data, it is sometimes called a measure of central value".
According to Simpson and Kafka, "A measure of central tendency is a typical value around which other figures congregate".
The important types of Statistical Averages (Measures of Central tendency)
are:
1. Arithmetic mean,
2. Geometric mean,
3. Harmonic mean.
4. Median
5. Mode
4.3.1 Arithmetic mean
The arithmetic mean is defined as the sum of all values divided by the number of values and is represented by $\bar{X}$. The arithmetic mean is also called the ‘average’. It is the most commonly used measure of central tendency. The arithmetic mean of a series is the value obtained by adding all the observations of the series and dividing this total by the number of observations.
There are two types of Arithmetic Mean:
a. Simple arithmetic Mean
b. Weighted arithmetic Mean
a. Simple arithmetic Mean
The arithmetic mean is sometimes referred to simply as the ‘mean’. For example: mean income, mean expenses, mean marks, etc.
Unlike other averages, the mean has to be computed by considering each and every observation in the series. Hence, the mean cannot be found by either inspection or observation of the items.
The simple arithmetic mean is equal to the sum of the values of the variable divided by the number of observations.
Individual Series: For direct observations of individual items, the arithmetic mean can be computed by the following two methods:
1. Direct method
2. Short cut method.
1. Direct Method
Let $X_i$ be the variable which takes the values $X_1, X_2, X_3, \ldots, X_n$ over $n$ observations. Then the arithmetic mean is given by:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}, \quad i = 1, 2, \ldots, n$$
Here $\bar{X}$ = Arithmetic Mean
$\sum_{i=1}^{n} X_i$ = Sum of the values of the observations of a series
$n$ = Number of observations
where $\Sigma$, the Greek capital letter sigma, is the summation symbol and denotes the sum of all $X_i$ values.
Steps of this method:
Step 1: Add all the values of the variable $X_i$ to obtain $\sum_{i=1}^{n} X_i$.
Step 2: Divide this total by the number of observations $n$. This gives the value of the arithmetic mean.
Solved Problem 1
Calculate the mean for following data. Marks obtained by 6 students are
given below:
20, 15, 23, 22, 25, 20

Solution
Mean marks
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{20 + 15 + 23 + 22 + 25 + 20}{6} = \frac{125}{6} = 20.83$$
Solved Problem 2
Find the arithmetic mean of 15, 17, 22, 21, 19, 26 and 20.
Solution
The arithmetic mean $\bar{X}$ is given by:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{15 + 17 + 22 + 21 + 19 + 26 + 20}{7} = \frac{140}{7} = 20$$
Therefore, the arithmetic mean is 20.
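The direct method translates into a one-line calculation. The sketch below is an illustrative Python version, not part of the original text; `arithmetic_mean` is a hypothetical helper name, and the standard-library function `statistics.fmean` is shown alongside it as a cross-check.

```python
from statistics import fmean

def arithmetic_mean(values):
    """Direct method: sum of the observations divided by their number."""
    return sum(values) / len(values)

marks = [20, 15, 23, 22, 25, 20]           # Solved Problem 1
numbers = [15, 17, 22, 21, 19, 26, 20]     # Solved Problem 2

print(round(arithmetic_mean(marks), 2))
print(arithmetic_mean(numbers))
print(fmean(numbers))  # library equivalent of the same calculation
```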
Solved Problem 3
The income of a departmental store over six months is depicted in table 4.1. Find the mean income of the departmental store.
Table 4.1: Six Months income of departmental store
Month Jan Feb Mar Apr May June
Income (Rs.) 25000 30000 45000 20000 25000 20000

Solution
Total income = $\sum X$ = 25000 + 30000 + 45000 + 20000 + 25000 + 20000 = 165000
$$\text{Mean income } \bar{X} = \frac{\sum X}{n} = \frac{165000}{6} = \text{Rs. } 27500$$
The above example shows that when the figures are large, computing the mean requires heavy arithmetic. To reduce the computation, one can use the short-cut method, which is illustrated as follows.
2. Shortcut method
When the number of observations is large, the arithmetic mean can be calculated using the short cut method. The following formula is used:
$$\bar{X} = A + \frac{\sum d}{N}$$
where $\sum d$ = sum of the deviations of each value from the assumed mean,
A = Assumed Mean, N = Total Number of Observations
Steps of this method are as follows:
Step 1: Assume any value as a mean, which is called the arbitrary average or assumed mean (A).
Step 2: Find the difference (deviation) of each value from the arbitrary average: $d = X - A$.
Step 3: Add all the deviations (differences) to get $\sum d$.
Step 4: Use the following equation to compute the mean value:
$$\bar{X} = A + \frac{\sum d}{N}$$
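The four steps can be sketched in Python as follows. This is an illustration only: the incomes are those of Table 4.1, and the assumed mean A = 25000 is an arbitrary choice made for this example, since any value of A gives the same result.

```python
# Monthly incomes from Table 4.1 (Rs.)
incomes = [25000, 30000, 45000, 20000, 25000, 20000]

# Step 1: choose an assumed mean (arbitrary; 25000 is picked for convenience)
A = 25000

# Step 2: deviations of each value from the assumed mean
deviations = [x - A for x in incomes]

# Step 3: sum of the deviations
sum_d = sum(deviations)

# Step 4: mean = A + (sum of deviations) / N
N = len(incomes)
mean = A + sum_d / N

print(deviations, sum_d, mean)
```

The deviations are much smaller than the original figures, which is what makes the method convenient for hand calculation.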
Discrete Series: Each value is multiplied by its frequency, the products are added, and this total is divided by the total frequency to obtain the arithmetic mean for a discrete series. This can be done by two methods:
1. Direct Method
2. Shortcut Method
1. Direct Method: When the direct method is used, the formula is
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N}$$
Steps of this method are as follows:
1. Multiply each value X by its frequency f and add the products to get $\sum fX$.
2. Add all the frequencies ($\sum f = N$).
3. Use the formula $\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N}$ to get the mean value.
2. Shortcut Method: When this method is used, the formula for calculating the arithmetic mean is
$$\bar{X} = A + \frac{\sum fd}{N}$$
where $d = X - A$, A = Assumed Mean, N = Total Number of Observations.
Solved Problem 4
Calculate the mean for following data depicted in table 4.2
Table 4.2: Frequency distribution

Value (X) 1 2 3 4 5
Frequency (f) 10 15 10 9 5

Solution
By direct method
Table 4.2a: Calculation of Mean using Direct Method

Value (X)    Frequency (f)    fX
1            10               10
2            15               30
3            10               30
4            9                36
5            5                25
Total        $\sum f$ = N = 49    $\sum fX$ = 131

$$\bar{X} = \frac{\sum fX}{N} = \frac{131}{49} = 2.67$$

By short-cut method
Let A = 3 (assumed mean = 3)
Table 4.2b: Calculation of Mean using Short-cut method

Value (X)    Frequency (f)    d = (X − A) = (X − 3)    fd
1            10               -2                       -20
2            15               -1                       -15
3            10               0                        0
4            9                1                        9
5            5                2                        10
Total        $\sum f$ = N = 49                         $\sum fd$ = -16

$$\bar{X} = A + \frac{\sum fd}{N} = 3 + \left(\frac{-16}{49}\right) = 2.67$$
Solved Problem 5
The data in table 4.3 depicts the number of students with respect to age. Calculate the arithmetic mean of the students' age.
Table 4.3: Number of Students with Respect to Age

Students' Age (X)         20    23    25    28    30
Number of Students (f)    3     5     10    6     1

Solution
The arithmetic mean $\bar{X}$ is given by:
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N}$$
$$\bar{X} = \frac{20 \times 3 + 23 \times 5 + 25 \times 10 + 28 \times 6 + 30 \times 1}{3 + 5 + 10 + 6 + 1} = \frac{623}{25} = 24.92$$
Therefore, the arithmetic mean $\bar{X}$ is 24.92.


Continuous series: In a continuous frequency distribution, the value of each individual item is not known. The mid-point of each class is therefore taken to represent all the items in that class. In a continuous series the mean can be calculated by any of the following methods.
1. Direct method
2. Short cut method
3. Step deviation method
1. Direct method
When the direct method is used, we apply the following formula:
$$\bar{X} = \frac{\sum fm}{N}$$
where m = mid-point of each class, f = frequency of each class, N = Total frequency
Steps of this method are as follows:
1. Find the mid value of each class. E.g.: For a class interval 20-30, the mid value is $\frac{20 + 30}{2} = \frac{50}{2} = 25$. The mid value is denoted by ‘m’.
2. Multiply the mid value ‘m’ by the frequency ‘f’ of each class and sum up to get $\sum fm$.
3. Use $\bar{X} = \frac{\sum fm}{N}$, where $N = \sum f$, to get the mean value.
Solved Problem 6
Compute the mean for following data.
Table 4.4: Frequency table
Age Group 0-10 10-20 20-30 30-40 40-50
No of persons 5 15 25 8 7

Solution
Table 4.4a: Calculation of Arithmetic Mean

Age group    No. of persons (f)    Mid point (m)    fm
0 – 10       5                     5                25
10 – 20      15                    15               225
20 – 30      25                    25               625
30 – 40      8                     35               280
40 – 50      7                     45               315
Total        $\sum f$ = N = 60                      $\sum fm$ = 1470

$$\bar{X} = \frac{\sum fm}{\sum f} = \frac{\sum fm}{N} = \frac{1470}{60} = 24.5$$
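For a continuous series the only extra step is replacing each class by its mid-point. The following Python sketch illustrates the direct method on the data of Table 4.4; it is not part of the original text, and the names are chosen for this example.

```python
# Class limits and frequencies from Table 4.4
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 15, 25, 8, 7]

# Mid value of each class: (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

N = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
mean = sum_fm / N

print(midpoints, N, sum_fm, mean)
```

The result is an approximation to the mean of the raw data, since every item in a class is treated as if it sat at the mid-point.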
2. Short cut method
When this method is used, the arithmetic mean is computed by applying the formula
$$\bar{X} = A + \frac{\sum fd}{N}$$
where A = Assumed mean, d = deviation of the mid value from the assumed mean, i.e., d = m − A, and N = Total frequency.
Steps of this method are described below:
1. Find the mid value ‘m’ of each class.
2. Assume any one of the mid values as the arbitrary average (A).
3. Find the deviation d = m − A of each mid value from A.
4. Multiply each deviation ‘d’ by the frequency ‘f’ of its class and sum up to get $\sum fd$.
5. Using the formula $\bar{X} = A + \frac{\sum fd}{N}$, find the mean value.
3. Step deviation method
In the case of a continuous series with class intervals of equal magnitude, the arithmetic mean is computed by applying the formula:
$$\bar{X} = A + \frac{\sum fd'}{N} \times i$$
where
$$d' = \frac{m - \text{Assumed Mean}}{\text{Width of Class Interval}} = \frac{m - A}{i}$$
m = mid value of the class,
i = common magnitude of the class intervals (width of Class Interval),
A = Assumed mean
Steps of the step deviation method:
1. Find the mid value ‘m’ of each class.
2. Select the arbitrary mean (assumed mean) ‘A’.
3. Find the deviation (d) of the mid value of each class from ‘A’.
4. Divide the deviations ‘d’ by the common magnitude of the class intervals ‘i’ to get d'.
5. Multiply d' of each class by the frequency ‘f’ to get fd' and sum up for all classes to get $\sum fd'$.
6. Using the formula $\bar{X} = A + \frac{\sum fd'}{N} \times i$, calculate the mean value.
Solved Problem 7
Find the mean age of following data.

Table 4.5: Frequency table


Age 0-10 10-20 20-30 30-40 40-50
No. of persons 5 15 25 8 7

Solution
Let A = 25 and i = 10.
Table 4.5a: Calculation of Arithmetic Mean

Age        No. of persons ‘f’    Mid value ‘m’    d' = (m − A)/10 = (m − 25)/10    fd'
0 – 10     5                     5                -2                               -10
10 – 20    15                    15               -1                               -15
20 – 30    25                    25               0                                0
30 – 40    8                     35               1                                8
40 – 50    7                     45               2                                14
Total      $\sum f$ = N = 60                                                       $\sum fd'$ = -3

$$\bar{X} = A + \frac{\sum fd'}{N} \times i = 25 + \frac{(-3)}{60} \times 10 = 25 - \frac{1}{2} = 24.5$$
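The step deviation calculation can be reproduced with a few lines of code. The Python sketch below is an illustration on the data of Table 4.5, with A = 25 and i = 10 as in the worked solution; the variable names are assumptions.

```python
# Mid values and frequencies from Table 4.5a
midpoints = [5, 15, 25, 35, 45]
freqs = [5, 15, 25, 8, 7]

A = 25  # assumed mean
i = 10  # common width of the class intervals

# Step deviations d' = (m - A) / i; exact integers here because the widths are equal
step_devs = [(m - A) // i for m in midpoints]

N = sum(freqs)
sum_fd = sum(f * d for f, d in zip(freqs, step_devs))

# mean = A + (sum of f * d') / N * i, multiplying before dividing to limit rounding
mean = A + sum_fd * i / N

print(step_devs, sum_fd, mean)
```

The result agrees with the direct method applied to the same table, as it must, because the step deviation formula is only a rescaling of the same sum.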

Calculation of simple arithmetic mean

Key statistic
For individual series, the arithmetic mean is given by:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}, \quad i = 1, 2, \ldots, n$$
n = number of observations

Key statistic
For discrete series, the arithmetic mean is given by:
$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N}$$
$\sum f$ = N = total frequency

Key statistic
For continuous series, the arithmetic mean is given by:
$$\bar{X} = A + \frac{\sum fd'}{N} \times i$$
$$d' = \frac{m - \text{Assumed Mean}}{\text{Width of Class Interval}} = \frac{m - A}{i}$$
m is the mid value of the class
A is the Assumed Mean
i is the common magnitude of the class intervals (width of the Class Interval)

4.3.2 Properties of Arithmetic Mean


You have studied how to calculate the arithmetic mean for grouped and ungrouped data. Let us study the properties of the arithmetic mean, which are helpful in understanding the concept. The properties of the arithmetic mean are:
i. The algebraic sum of the deviations of a set of values from their mean is always zero, that is,
$$\sum (X - \bar{X}) = 0$$
ii. The sum of the squared deviations of the individual items from the arithmetic mean is always a minimum. In other words, the sum of the squared deviations taken from any value other than the arithmetic mean will be higher:
$$\sum (X - \bar{X})^2 < \sum (X - A)^2$$
for any choice of A different from $\bar{X}$.
iii. The arithmetic mean is capable of further algebraic treatment. Suppose $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots, \bar{X}_n$ are the means of n sets of values containing $N_1, N_2, \ldots, N_n$ items respectively; then their combined arithmetic mean is given by:
$$\bar{X} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + \cdots + N_n \bar{X}_n}{N_1 + N_2 + \cdots + N_n}$$

4.3.3 Combined mean


The combined arithmetic mean can be computed if we know the mean and the number of items in each group of the data.
The following equation is used to compute the combined mean.
Let $\bar{X}_1$ and $\bar{X}_2$ be the means of the first and second groups of data, containing $N_1$ and $N_2$ items respectively. Then the combined mean is
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$
If there are 3 groups, then
$$\bar{X}_{123} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + N_3 \bar{X}_3}{N_1 + N_2 + N_3}$$

Solved problem 8
Find the mean for the entire group of workers from the following data depicted in table 4.6.
Table 4.6
                  Group – 1    Group – 2
Mean wages        75           60
No. of workers    1000         1500

Solution
Given data
$N_1 = 1000$, $N_2 = 1500$, $\bar{X}_1 = 75$ and $\bar{X}_2 = 60$
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} = \frac{1000 \times 75 + 1500 \times 60}{1000 + 1500} = \frac{165000}{2500} = \text{Rs. } 66$$
Solved Problem 9
If the average height of 30 men is 158 cm and the average height of another group of 40 men is 162 cm, find the average height of the combined group.
Solution
Given that,
$N_1 = 30$, $\bar{X}_1 = 158$, $N_2 = 40$, $\bar{X}_2 = 162$
$$\bar{X}_{12} = \frac{30 \times 158 + 40 \times 162}{30 + 40} = \frac{11220}{70} = 160.29 \text{ cm}$$
The average height of the combined group is 160.29 cm.
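The combined mean formula extends to any number of groups, which makes it a natural candidate for a small function. The following Python sketch is an illustration; `combined_mean` is a hypothetical helper name, and it is applied to the data of Solved Problems 8 and 9.

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(N_k * mean_k) / sum(N_k)."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

wages = combined_mean([1000, 1500], [75, 60])   # Solved Problem 8
height = combined_mean([30, 40], [158, 162])    # Solved Problem 9

print(wages, round(height, 2))
```

Note that the combined mean is itself a weighted mean of the group means, with the group sizes as weights.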
Solved Problem 10

Suppose $N_1 = 30$, $N_2 = 40$, $\bar{X}_2 = 162$ and $\bar{X}_{12} = 160.28$. Find $\bar{X}_1$.
Solution
On substituting the given values in the following equation, we get
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$
$$\frac{30 \bar{X}_1 + 40 \times 162}{30 + 40} = 160.28$$
$$30 \bar{X}_1 + 40 \times 162 = 160.28 \times 70$$
$$30 \bar{X}_1 = 160.28 \times 70 - 6480 = 11219.60 - 6480 = 4739.6$$
$$\bar{X}_1 = \frac{4739.6}{30} = 157.99$$
Solved Problem 11
The average weight of 100 screws in box ‘A’ is 10.4 gm. They are mixed with 150 screws from box ‘B’. The average weight of the mixed screws is 10.9 gm. Find the average weight of the screws of box ‘B’.
Solution
Given that,
$N_1 = 100$, $N_2 = 150$, $\bar{X}_1 = 10.4$ and $\bar{X}_{12} = 10.9$; $\bar{X}_2 = ?$
We know that
$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$
$$\frac{100 \times 10.4 + 150 \bar{X}_2}{100 + 150} = 10.9$$
$$1040 + 150 \bar{X}_2 = 10.9 \times 250 = 2725$$
$$150 \bar{X}_2 = 2725 - 1040 = 1685$$
$$\bar{X}_2 = \frac{1685}{150} = 11.23 \text{ gm}$$
Therefore, the average weight of the screws of box ‘B’ is 11.23 gm.
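Solving the combined mean formula for an unknown group mean gives $\bar{X}_2 = \frac{(N_1 + N_2)\bar{X}_{12} - N_1 \bar{X}_1}{N_2}$. The Python sketch below applies this rearrangement to Solved Problems 10 and 11; it is an illustration, and `missing_group_mean` is a hypothetical helper name.

```python
def missing_group_mean(n_known, mean_known, n_missing, combined):
    """Mean of the group whose mean is unknown, given the combined mean."""
    grand_total = combined * (n_known + n_missing)   # total of all items
    known_total = n_known * mean_known               # total of the known group
    return (grand_total - known_total) / n_missing

# Solved Problem 10: known group is the 40 men with mean 162
group_1 = missing_group_mean(40, 162, 30, 160.28)

# Solved Problem 11: known group is box A (100 screws, mean 10.4 gm)
box_b = missing_group_mean(100, 10.4, 150, 10.9)

print(round(group_1, 2), round(box_b, 2))
```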

Solved Problem 12
A clerk calculated the arithmetic mean of 50 values as 39.2. However, it was found that instead of taking two values as 25 and 32, he took them as 52 and 23. Find the corrected arithmetic mean.
Solution
Given that,
$N = 50$, $\bar{X} = 39.2$
Present total = $N \bar{X} = 50 \times 39.2 = 1960$
Corrected total = present total − wrong values + correct values
Corrected total = 1960 − 52 − 23 + 25 + 32 = 1942
$$\text{Corrected average} = \frac{1942}{50} = 38.84$$
The corrected arithmetic mean, therefore, is 38.84.
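The correction procedure (recover the total, swap the wrong values for the right ones, divide again) can be sketched in Python as follows. This is an illustration with a hypothetical helper name, applied to the figures of Solved Problem 12.

```python
def corrected_mean(n, reported_mean, wrong_values, correct_values):
    """Recompute a mean after replacing wrongly recorded observations."""
    reported_total = n * reported_mean
    corrected_total = reported_total - sum(wrong_values) + sum(correct_values)
    return corrected_total / n

# 50 values, reported mean 39.2; 52 and 23 were recorded instead of 25 and 32
result = corrected_mean(50, 39.2, wrong_values=[52, 23], correct_values=[25, 32])

print(round(result, 2))
```

The number of observations is unchanged here; if an observation had been omitted or counted twice, `n` would also need adjusting.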
4.3.4 Weighted arithmetic mean
The weighted mean is computed by considering the relative importance of each of the values to the total value. The arithmetic mean gives equal importance to all the items of a distribution. In certain cases, the relative importance of the items is not the same. To reflect this, weights may be assigned to the values depending on the case. Thus, a weight represents the relative importance of an item.
The weighted arithmetic mean is computed by the following equation.
Let $X_1, X_2, X_3, \ldots, X_n$ be the values of the variable and $W_1, W_2, W_3, \ldots, W_n$ be the respective weights assigned. Then the weighted mean $\bar{X}_w$ is given by the equation
$$\bar{X}_w = \frac{X_1 W_1 + X_2 W_2 + X_3 W_3 + \cdots + X_n W_n}{W_1 + W_2 + W_3 + \cdots + W_n} = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i} = \frac{\sum XW}{\sum W}$$


Solved Problem 13
Compute the simple and the weighted arithmetic mean for the data in table 4.7 and comment on them.
Table 4.7: Weighted Arithmetic Mean

| Designation | Monthly salary (Rs) X | Strength of cadre W | XW |
|---|---|---|---|
| General Manager | 25000 | 10 | 250000 |
| Managers | 19000 | 20 | 380000 |
| Supervisors | 14000 | 10 | 140000 |
| Office Assistant | 10000 | 50 | 500000 |
| Helpers | 8000 | 25 | 200000 |
| Total (N = 5) | ΣX = 76000 | ΣW = 115 | ΣXW = 1470000 |

a. Simple arithmetic mean $= \frac{\sum X}{N} = \frac{76000}{5}$ = Rs. 15200
b. Weighted arithmetic mean $= \frac{\sum XW}{\sum W} = \frac{1470000}{115}$ = Rs. 12782.6087
In this example, the simple arithmetic mean does not account for the number of
staff in each salary range. It gives equal importance to every designation, so the salaries of
the General Manager and Managers have inflated its value. The
weighted mean weights each salary by the strength of its cadre and is therefore more representative of what the staff earn.
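
The two calculations in Solved Problem 13 can be reproduced with a short Python sketch. The helper names `simple_mean` and `weighted_mean` are illustrative assumptions; the data are the salaries and cadre strengths from table 4.7.

```python
salaries = [25000, 19000, 14000, 10000, 8000]   # X: monthly salary (Rs)
strength = [10, 20, 10, 50, 25]                 # W: strength of cadre


def simple_mean(values):
    """Every value counts equally."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Each value counts in proportion to its weight: sum(XW) / sum(W)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


print(simple_mean(salaries))                          # 15200.0
print(round(weighted_mean(salaries, strength), 4))    # 12782.6087
```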
Solved Problem 14
Comment on the performance of students of two universities depicted in
table 4.8.
Table 4.8: Weighted Arithmetic Mean

| Course | Bombay: % of pass (X) | Bombay: No. of students in 000 (W) | Bombay: XW | Madras: % of pass (X) | Madras: No. of students in 000 (W) | Madras: XW |
|---|---|---|---|---|---|---|
| MBA | 71 | 3 | 213 | 81 | 5 | 405 |
| MCA | 83 | 2 | 166 | 76 | 3 | 228 |
| MA | 73 | 5 | 365 | 58 | 3 | 174 |
| M.Sc. | 75 | 2 | 150 | 76 | 1 | 76 |
| M.Com. | 70 | 2 | 140 | 81 | 2 | 162 |
| Total (Σ) | 372 | 14 | 1034 | 372 | 14 | 1045 |

Solution
a. Since ΣX is the same, the simple arithmetic average is the same for both universities:
$\frac{\sum X}{N} = \frac{372}{5} = 74.4$
b. Weighted mean for Bombay University $= \frac{\sum XW}{\sum W} = \frac{1034}{14} = 73.86$
c. Weighted mean for Madras University $= \frac{\sum XW}{\sum W} = \frac{1045}{14} = 74.64$
Comment: On the basis of the weighted mean, the performance of Madras University students is better than that of Bombay
University students.

Solved Problem 15
The data in table 4.9 shows the marks scored by the students of a
class in an examination. Calculate the mean of the marks scored by the
students.
Table 4.9: Marks Scored by Students

| Percentage marks | Less than 10 | Less than 20 | Less than 30 | Less than 40 | Less than 50 | Less than 60 | Less than 70 |
|---|---|---|---|---|---|---|---|
| Number of students | 4 | 16 | 20 | 65 | 85 | 97 | 100 |

Solution
In table 4.9, the values given in the row ‘number of students’ are
cumulative frequencies. We first have to convert them into a simple frequency
distribution by subtracting successive cumulative frequencies. The calculated values are depicted in table 4.9a.
Table 4.9a: Calculation of Arithmetic Mean

| Marks | Less than Cum. Freq | Frequency f | Mid Point X | d' = (X − 35)/10 | fd' |
|---|---|---|---|---|---|
| 0 – 10 | 4 | 4 | 5 | –3 | –12 |
| 10 – 20 | 16 | 12 | 15 | –2 | –24 |
| 20 – 30 | 20 | 4 | 25 | –1 | –4 |
| 30 – 40 | 65 | 45 | 35 | 0 | 0 |
| 40 – 50 | 85 | 20 | 45 | 1 | 20 |
| 50 – 60 | 97 | 12 | 55 | 2 | 24 |
| 60 – 70 | 100 | 3 | 65 | 3 | 9 |
| Total | | N = 100 | | | ∑fd' = 13 |

The mean $\bar{X}$ is given by:
$$\bar{X} = A + \frac{\sum fd'}{N} \times i$$
$$\bar{X} = 35 + \frac{13}{100} \times 10 = 36.3$$
Therefore, the mean score of the students is 36.3.
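
The conversion from cumulative frequencies to class frequencies, followed by the step-deviation calculation, can be sketched in Python as follows. The variable names are assumptions chosen for this illustration; the assumed mean A = 35 and class width i = 10 are the values used in Solved Problem 15.

```python
upper_limits = [10, 20, 30, 40, 50, 60, 70]     # "less than" limits from table 4.9
cumulative = [4, 16, 20, 65, 85, 97, 100]       # cumulative number of students
width = 10                                      # class width i
assumed_mean = 35                               # assumed mean A

# A class frequency is the difference between successive cumulative frequencies.
frequencies = [c - prev for c, prev in zip(cumulative, [0] + cumulative[:-1])]

# Mid point of each class and its step deviation d' = (X - A) / i.
midpoints = [upper - width / 2 for upper in upper_limits]
deviations = [(m - assumed_mean) / width for m in midpoints]

n = sum(frequencies)
sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
mean = assumed_mean + (sum_fd / n) * width

print(frequencies)
print(round(mean, 2))
```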
Solved Problem 16
Find the missing frequency for the distribution in table 4.10, given the mean
value as 129 and N = 80.
Table 4.10: Distribution Table

| Class Interval | 80-100 | 100-120 | 120-140 | 140-160 | 160-180 |
|---|---|---|---|---|---|
| Frequency | 8 | – | 26 | 14 | 10 |

Solution
Let the missing frequency be ‘f’. Then,
Table 4.10a: Frequency Distribution Table

| Class Interval | f | Mid Point X | d' = (X − 130)/20 | fd' |
|---|---|---|---|---|
| 80-100 | 8 | 90 | -2 | -16 |
| 100-120 | f | 110 | -1 | -f |
| 120-140 | 26 | 130 = A | 0 | 0 |
| 140-160 | 14 | 150 | 1 | 14 |
| 160-180 | 10 | 170 | 2 | 20 |
| Total | N = 80 | | | ∑fd' = 18 − f |

Since, in case of grouped data, the arithmetic mean is given by:
$$\bar{X} = A + \frac{\sum fd'}{N} \times i$$
that is,
$$129 = 130 + \frac{18 - f}{80} \times 20$$
$$-1 = \frac{360 - 20f}{80}$$
$$20f = 360 + 80$$
$$f = 22$$
Hence, the missing frequency is 22.

4.3.5 Merits and demerits of arithmetic mean


Table 4.11 displays the merits and demerits of the arithmetic mean.

Table 4.11: Merits and Demerits of Arithmetic Mean

| Merits | Demerits |
|---|---|
| It is simple to calculate and easy to understand. | It is affected by extreme values. |
| It is based on all values. | It cannot be determined for distributions with open-end class intervals. |
| It is rigidly defined. | It cannot be graphically located. |
| It is more stable. | Sometimes it is a value which is not in the series. |

Self Assessment Questions


1. State whether the following statements are ‘True’ or ‘False’.
i. For a given set of values if we add a constant 5 to every value, then
the arithmetic mean is affected.
ii. Arithmetic mean can be calculated for distribution with open-end
classes.
iii. Arithmetic mean is affected by extreme values.
iv. Arithmetic mean of 12, 16, 23, 25, 28, 32 is 22.
2. A single value within the range of the entire mass of data that is used to
represent the whole data is
i) Measures of Central tendency ii) Statistics
iii) Measures of Dispersion iv) Skewness


3. If X1, X2, X3, ..., Xn are a set of n values of a variate, then the
mean is given by
i) $N / \sum X_i$ ii) $\sum X_i / n$
iii) $N \sum X_i$ iv) $\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$
4. (a) Find the arithmetic mean of 68, 41, 75, 91, 53, 86, 59
i) 67.57 ii) 47.57
iii) 37.57 iv) 27.57
(b) The average computed by considering the relative importance of each
of values to the total value, is called
i) Arithmetic mean ii) Geometric mean
iii) Weighted arithmetic mean iv) Harmonic average.

4.4 Geometric Mean


In this section, we will discuss the geometric mean. The geometric mean
(GM) is the nth root of the product of the n quantities of a series. It is obtained by
multiplying the values of the items together and extracting the root of the
product corresponding to the number of items. Thus, the square root of the
product of two items and the cube root of the product of three items are
their geometric means.
The geometric mean (GM) of a series of “n” positive numbers is given by:
$$GM = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$$
The geometric mean is never larger than the arithmetic mean. If there are
zeroes or negative numbers in the series, the geometric mean cannot be
used. Logarithms can be used to find the geometric mean, which avoids handling very large
products and saves time.
In the field of business management, problems often arise relating to the
average percentage rate of change over a period of time. In such cases,
the arithmetic mean is not an appropriate average to employ, so we
use the geometric mean instead. GM is also highly useful in the construction of
index numbers. Table 4.12 displays the merits and demerits of the
Geometric Mean.


Table 4.12: Merits and Demerits of Geometric Mean

| Merits | Demerits |
|---|---|
| It is based on all the observations in the series. | It is not simple to understand. |
| It is rigidly defined. | It requires computational skill. |
| It is best suited for averages and ratios. | GM cannot be computed if any of the items is zero or negative. |
| It is less affected by extreme values. | It has restricted application. |
| It is useful for studying social and economic data. | |

4.4.1 Geometric Mean for Individual Series


The formula for calculating the Geometric Mean in case of individual series
is:
$$GM = \text{Antilog}\left[\frac{\sum \log X}{N}\right]$$
Solved Problem 17
Find the Geometric Mean of data 2, 4, 8.
Data: X1 = 2, X2 = 4, X3 = 8, n = 3
Solution
$$GM = \sqrt[n]{X_1 \times X_2 \times X_3} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$
Solved Problem 18
Find the Geometric Mean of data 2, 4, 8 using logarithms.
Data: X1 = 2, X2 = 4, X3 = 8, N = 3
Solution
Table 4.13: Calculation of Geometric Mean

| X | log X |
|---|---|
| 2 | 0.301 |
| 4 | 0.602 |
| 8 | 0.903 |
| Total | Σ log X = 1.806 |

$$GM = \text{Antilog}\left[\frac{\sum \log X}{N}\right] = \text{Antilog}\left[\frac{1.806}{3}\right] = \text{Antilog}[0.6020] = 3.9994$$
$$GM \approx 4$$
Solved Problem 19
Compared with the previous year, overhead (OH) expenses went up by 32%
in the year 2003, by 40% in the next year and by 50% in the
following year. Calculate the average rate of increase in overhead expenses.
Let 100% be the OH expenses at the base year.
Solution
Table 4.14: Calculation of Geometric Mean

| Year | OH Expenses X | log X |
|---|---|---|
| 2002 | Base year | – |
| 2003 | 132 | 2.121 |
| 2004 | 140 | 2.146 |
| 2005 | 150 | 2.176 |
| Total | | Σ log X = 6.443 |

$$GM = \text{Antilog}\left[\frac{\sum \log X}{N}\right] = \text{Antilog}\left[\frac{6.443}{3}\right] = \text{Antilog}[2.1477] = 140.49$$
Therefore, the average rate of increase in overhead expenses is 140.49 – 100 = 40.49% per year.


Solved Problem 20
The growth in bad-debt expense for Das Office Supplies Company, over the
last few years is as depicted in table 4.15. Calculate the average percentage
increase in bad-debt expense over this time period.
Table 4.15: Bad-debt Expense Growth for Das Office Supplies Company

| Year | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | 1998 |
|---|---|---|---|---|---|---|---|
| Expense Rate | 1.110 | 1.090 | 1.075 | 1.080 | 1.095 | 1.080 | 1.200 |

Solution
The Geometric Mean is given by:
$$GM = \sqrt[7]{(1.110)(1.090)(1.075)(1.080)(1.095)(1.080)(1.200)} = 1.10$$
Therefore, the average increase is (1.10 – 1) = 0.10, that is, about 10% per year.
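
The geometric mean of the growth factors in Solved Problem 20 can be checked with a short Python sketch. It computes the GM in two equivalent ways, as the nth root of the product and as the antilog of the mean logarithm. The function names are assumptions made for this illustration, and `math.prod` requires Python 3.8 or later.

```python
import math

growth_factors = [1.110, 1.090, 1.075, 1.080, 1.095, 1.080, 1.200]


def geometric_mean_root(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_logs(values):
    """Antilog of the mean of the base-10 logarithms."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log


gm = geometric_mean_root(growth_factors)
average_increase_percent = (gm - 1) * 100   # growth factor 1.10 means a 10% increase

print(round(gm, 2), round(average_increase_percent, 1))
```

Both functions give the same result up to rounding; the logarithmic form is the one used with log tables in the solved problems.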
4.4.2 Geometric Mean for discrete series
Geometric Mean for discrete series is given as:
$$GM = \text{Antilog}\left[\frac{\sum f \log X}{N}\right]$$
Solved Problem 21
Find the Geometric Mean for the data depicted in table 4.16
Table 4.16: Frequency Table

| Marks | 130 | 135 | 140 | 145 | 150 |
|---|---|---|---|---|---|
| No. of students | 3 | 4 | 6 | 6 | 3 |

Solution
Table 4.16a: Calculation of Geometric Mean

| Marks X | No. of students f | log X | f log X |
|---|---|---|---|
| 130 | 3 | 2.113 | 6.339 |
| 135 | 4 | 2.130 | 8.520 |
| 140 | 6 | 2.146 | 12.876 |
| 145 | 6 | 2.161 | 12.966 |
| 150 | 3 | 2.176 | 6.528 |
| Total | Σf = N = 22 | | Σ f log X = 47.23 |

$$GM = \text{Antilog}\left[\frac{\sum f \log X}{N}\right] = \text{Antilog}\left[\frac{47.23}{22}\right] = \text{Antilog}[2.1468]$$
GM = 140.222
Solved Problem 22
The share-price of a particular company was moving up and down. The data
depicted in table 4.17 consolidates its movement for past 6 months. Find the
appropriate average share-price.
Table 4.17: Frequency Table of Share Price

| Share Price | 110 | 115 | 118 | 119 | 120 |
|---|---|---|---|---|---|
| Frequency | 4 | 11 | 21 | 6 | 2 |

Solution
The calculations for the data in table 4.17 are shown in table 4.17a.
Table 4.17a: Calculation of Geometric Mean of Share Prices

| Share Price X | Frequency f | log X | f log X |
|---|---|---|---|
| 110 | 4 | 2.0414 | 8.1656 |
| 115 | 11 | 2.0607 | 22.6677 |
| 118 | 21 | 2.0719 | 43.5099 |
| 119 | 6 | 2.0755 | 12.4530 |
| 120 | 2 | 2.0792 | 4.1584 |
| Total | Σf = N = 44 | | Σ f log X = 90.9546 |

The geometric mean GM is calculated as:
$$GM = \text{Antilog}\left[\frac{\sum f \log X}{N}\right] = \text{Antilog}\left[\frac{90.9546}{44}\right] = \text{Antilog}[2.0672] = 116.70$$
The appropriate average share price is Rs. 116.70.


4.4.3 Geometric Mean for continuous series


Geometric Mean for continuous series is given as:
$$GM = \text{Antilog}\left[\frac{\sum f \log m}{N}\right]$$
Steps:
1. Find mid value m and take log of m for each mid value.
2. Multiply log m with frequency ‘f’ of each class to get f log m and sum up
to obtain Σ f log m.
3. Divide Σ f log m by N and take antilog to get GM.
Solved Problem 23
Find the G.M. of the following distribution
Table 4.18: Frequency Table

| Rank scored | 0-9 | 10-19 | 20-29 | 30-39 | 40-49 |
|---|---|---|---|---|---|
| No. of matches | 5 | 12 | 18 | 14 | 3 |

Solution: The given class intervals are of the inclusive type, so we first convert them into
exclusive intervals.
Table 4.18a: Calculation of Geometric Mean

| C.I | f | m | log m | f log m |
|---|---|---|---|---|
| -0.5-9.5 | 5 | 4.5 | 0.6532 | 3.2660 |
| 9.5-19.5 | 12 | 14.5 | 1.1614 | 13.9368 |
| 19.5-29.5 | 18 | 24.5 | 1.3892 | 25.0056 |
| 29.5-39.5 | 14 | 34.5 | 1.5378 | 21.5292 |
| 39.5-49.5 | 3 | 44.5 | 1.6484 | 4.9452 |
| Total | N = 52 | | | Σ f log m = 68.6828 |

$$GM = \text{Antilog}\left[\frac{\sum f \log m}{N}\right] = \text{Antilog}\left[\frac{68.6828}{52}\right] = \text{Antilog}[1.3208]$$
GM = 20.93
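
The three steps for a continuous series (find mid values, sum f log m, take the antilog) can be sketched in Python using the data of Solved Problem 23. The variable names are assumptions for this illustration; the conversion from inclusive to exclusive limits follows the half-gap adjustment used in table 4.18a.

```python
import math

# Inclusive class limits and frequencies from table 4.18.
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [5, 12, 18, 14, 3]

# Convert to exclusive limits: move each limit by half the gap between classes.
gap = classes[1][0] - classes[0][1]                       # 10 - 9 = 1
bounds = [(lo - gap / 2, hi + gap / 2) for lo, hi in classes]

# Step 1: mid value m of each class.
midpoints = [(lo + hi) / 2 for lo, hi in bounds]

# Step 2: sum of f * log m.
n = sum(frequencies)
sum_f_log_m = sum(f * math.log10(m) for f, m in zip(frequencies, midpoints))

# Step 3: divide by N and take the antilog.
gm = 10 ** (sum_f_log_m / n)

print(midpoints)
print(round(gm, 2))
```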


Key statistic
Whenever data deals with rates, ratios, growth rates, etc., the geometric
mean is the best measure.
The geometric mean is not defined if even one of the values is zero or
negative.

Key statistic
Suppose the values X1, X2, ..., Xn are assigned the weights W1, W2, ...,
Wn. Then their weighted average is given by:
$$\bar{X}_w = \frac{\sum XW}{\sum W}$$
and their weighted Geometric Mean is given by:
$$G_w = \text{Antilog}\left[\frac{\sum W \log X}{\sum W}\right]$$
where ‘W’ acts as the frequency.
4.5 Harmonic Mean
The harmonic mean is the total number of items divided by the sum of the reciprocals of
the values of the variable. It is a specialised average suited to problems involving
variables expressed as rates, such as rates per unit of time.
E.g.: Speed in km/hr, min/day, price/unit.

Key statistic
For individual series, the Harmonic Mean is given by:
$$H.M. = \frac{N}{\sum (1/X)}$$

Key statistic
For discrete series and continuous series, the Harmonic Mean is given
by:
$$H.M. = \frac{N}{\sum (f/X)}$$

Table 4.19 displays the merits and demerits of the Harmonic Mean.
Table 4.19: Merits and Demerits of Harmonic Mean

| Merits | Demerits |
|---|---|
| It is based on all observations. | It is not easy to compute. |
| It is rigidly defined. | It cannot be used when one of the items is zero. |
| It is suitable in case of series having wide dispersion. | It cannot represent distribution. |
| It is suitable for further mathematical treatment. | |

Solved Problem 24
Calculate the harmonic mean of 9.7, 9.8, 9.5, 9.4, 9.7
Solution
The Harmonic Mean (HM) is calculated as:
Table 4.19a: Calculation of Harmonic Mean

| X | 1/X |
|---|---|
| 9.7 | 0.1031 |
| 9.8 | 0.1020 |
| 9.5 | 0.1053 |
| 9.4 | 0.1064 |
| 9.7 | 0.1031 |
| Total | ∑1/X = 0.5199 |

$$H.M. = \frac{N}{\sum (1/X)} = \frac{5}{0.5199} = 9.6172$$
Therefore, the Harmonic Mean is 9.6172.
Solved Problem 25
A man travelled by car for 3 days. He covered 480 km each day. On the
first day he drove for 10 hrs at the rate of 48 KMPH, on the second day for
12 hrs at the rate of 40 KMPH and on the third day for 15 hrs at the rate of
32 KMPH. Compute the Harmonic mean and the Weighted mean of the speeds and compare
them.
Solution
Table 4.20: Calculation of Harmonic Mean

| X | 48 | 40 | 32 | Total |
|---|---|---|---|---|
| 1/X | 0.02083 | 0.02500 | 0.03125 | 0.07708 |

$$H.M. = \frac{N}{\sum (1/X)} = \frac{3}{0.07708} = 38.92$$
Data: 10 hrs @ 48 KMPH
12 hrs @ 40 KMPH
15 hrs @ 32 KMPH

Table 4.20a: Weighted Mean

| W | X | XW |
|---|---|---|
| 10 | 48 | 480 |
| 12 | 40 | 480 |
| 15 | 32 | 480 |
| ΣW = 37 | | ΣWX = 1440 |

$$\bar{X}_w = \frac{\sum XW}{\sum W} = \frac{1440}{37} = 38.92$$
The Harmonic mean and the Weighted mean (with the hours travelled as weights) are the same. This is because the same distance is covered each day, so both equal the total distance (1440 km) divided by the total time (37 hrs).
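
The agreement between the two averages in Solved Problem 25 can be verified with a small Python sketch. The variable names are assumptions for this illustration; the speeds and hours are those given in the problem.

```python
speeds = [48, 40, 32]    # KMPH on each of the three days
hours = [10, 12, 15]     # hours driven on each day (480 km per day)

# Harmonic mean of the speeds: N divided by the sum of reciprocals.
harmonic_mean = len(speeds) / sum(1 / s for s in speeds)

# Weighted arithmetic mean of the speeds, with hours driven as weights.
weighted_mean = sum(s * h for s, h in zip(speeds, hours)) / sum(hours)

print(round(harmonic_mean, 2), round(weighted_mean, 2))
```

Both values come to about 38.9 KMPH, which is the total distance divided by the total time.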
Solved Problem 26
Calculate the Harmonic Mean for the following data
Table 4.21: Frequency table

| Marks | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | 80-90 | 90-100 |
|---|---|---|---|---|---|---|---|
| Frequency | 15 | 13 | 8 | 6 | 15 | 7 | 6 |

Table 4.21a: Computation of H.M

| Marks | Frequency f | Mid Value X | f/X |
|---|---|---|---|
| 30-40 | 15 | 35 | 0.4286 |
| 40-50 | 13 | 45 | 0.2889 |
| 50-60 | 8 | 55 | 0.1455 |
| 60-70 | 6 | 65 | 0.0923 |
| 70-80 | 15 | 75 | 0.2000 |
| 80-90 | 7 | 85 | 0.0824 |
| 90-100 | 6 | 95 | 0.0632 |
| Total | N = 70 | | ∑f/X = 1.3009 |

$$\text{Harmonic mean} = \frac{N}{\sum (f/X)} = \frac{70}{1.3009} = 53.81$$
4.5.1 Relationship between Arithmetic mean, Geometric mean and
Harmonic mean
The relationship between Arithmetic mean, Geometric mean and Harmonic
mean can be summarised as follows:
1. If all the items of a variable are the same, the Arithmetic mean (AM),
Geometric mean (GM) and Harmonic mean (HM) are equal, i.e., AM = GM = HM.
2. If the items vary in size, the Arithmetic mean will be greater than the Geometric mean
and the Geometric mean will be greater than the Harmonic mean. This is
because the Geometric mean gives larger weight to
smaller items, and the Harmonic mean gives the largest weight to the smallest
items. Hence AM > GM > HM.
Thus, we have discussed the Arithmetic Mean, Geometric Mean and
Harmonic Mean.
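
The ordering of the three means can be illustrated with Python's standard `statistics` module (`geometric_mean` is available from Python 3.8). The helper name `three_means` and the sample values are assumptions chosen for this sketch.

```python
import statistics


def three_means(values):
    """Return (AM, GM, HM) for a list of positive values."""
    return (
        statistics.mean(values),
        statistics.geometric_mean(values),
        statistics.harmonic_mean(values),
    )


# Unequal items: AM > GM > HM.
am, gm, hm = three_means([2, 4, 8])
print(round(am, 2), round(gm, 2), round(hm, 2))

# Equal items: all three means coincide.
same_am, same_gm, same_hm = three_means([5, 5, 5])
print(round(same_am, 2), round(same_gm, 2), round(same_hm, 2))
```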

4.6 Median
In this section, we will discuss the median of a distribution. The median of a
distribution is that value of the variate which divides it into two equal parts.
In terms of the frequency curve, the ordinate drawn at the median divides the area
under the curve into two equal parts. The median is a positional average
because its value depends upon the position of an item and not on its
magnitude.
The median of a set of values is the middle-most value when
they are arranged in ascending or descending order of magnitude.
The median is denoted by ‘M’.
4.6.1 Median for Individual series
The formula used for calculating median for individual series is
Median = Size of $\left(\frac{N+1}{2}\right)^{th}$ item
where N = total number of items in the series
Steps for calculation:
1. Arrange the data in ascending or descending order
2. Locate the median by using the formula Size of $\left(\frac{N+1}{2}\right)^{th}$ item
3. The value or the size of the item is the Median
Odd number series
Solved Problem 27
Find the median value of the following set of values
22, 16, 18, 13, 15, 19, 17, 20, 23

Solution
Arranging in ascending order, we get:
13, 15, 16, 17, 18, 19, 20, 22, 23
we have, N = 9
Median = Size of $\left(\frac{N+1}{2}\right)^{th}$ item
Therefore, Median $= \left(\frac{9+1}{2}\right)^{th}$ item $= 5^{th}$ item
The median for the given set of values is 18.


Even number series


Solved Problem 28
Find the median value of the following set of values
45, 32, 31, 46, 40, 28, 27, 37, 36, 41, 47, 50
Solution
Arranging in ascending order, we get:
27, 28, 31, 32, 36, 37, 40, 41, 45, 46, 47, 50
we have, N = 12
Median = Size of $\left(\frac{N+1}{2}\right)^{th}$ item
Therefore, Median $= \left(\frac{12+1}{2}\right)^{th}$ item $= 6.5^{th}$ item
We have to take the average of the 6th and 7th items:
$$\text{Median} = \frac{37 + 40}{2} = 38.5$$
The median for the given set of values is 38.5.
Solved Problem 29
In a class of 15 students, 5 students failed a test. The marks of the 10
students who passed were 9, 6, 7, 8, 9, 6, 5, 4, 7, 8. Find the median
marks of the 15 students.

Solution
The marks of the 10 students who passed, when arranged in ascending order of
magnitude, are: 4, 5, 6, 6, 7, 7, 8, 8, 9, 9.

Since the five students who failed must have scored less than 4 marks,
their marks occupy the first five positions. Their exact values do not affect the median and are shown here as 0. The marks of the 15 students arranged in ascending order will be as
follows:
0, 0, 0, 0, 0, 4, 5, 6, 6, 7, 7, 8, 8, 9, 9.
Median = Size of $\left(\frac{N+1}{2}\right)^{th}$ item $= \left(\frac{15+1}{2}\right)^{th}$ item
Median = Size of 8th item in the arranged series.
Thus, Median = 6.
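
The positional rule for an individual series can be sketched in Python as follows. The function name `median_individual` is an assumption for this illustration; it follows the (N + 1)/2 rule and averages the two neighbouring items when the position is fractional, as in Solved Problems 27 to 29.

```python
def median_individual(values):
    """Median by the (N + 1) / 2 position rule for an individual series."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                # 1-based position of the median item
    lower = ordered[int(position) - 1]    # item at the whole-number position
    if position.is_integer():             # odd N: a single middle item
        return lower
    upper = ordered[int(position)]        # even N: average the two middle items
    return (lower + upper) / 2


odd_series = [22, 16, 18, 13, 15, 19, 17, 20, 23]
even_series = [45, 32, 31, 46, 40, 28, 27, 37, 36, 41, 47, 50]

print(median_individual(odd_series))    # 18
print(median_individual(even_series))   # 38.5
```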
4.6.2 Median for Discrete series
In a discrete series, the values are (already) in the form of an array and the
frequencies are recorded against each value. However, to determine the
size of the $\left(\frac{N+1}{2}\right)^{th}$ item, a separate column is to be prepared for cumulative
frequencies. The position of the median item is first located with reference to the
cumulative frequency which first covers that position. Then, the value against that
cumulative frequency is taken as the median.
The formula for calculating Median in Discrete series is:
Median = Size of $\left(\frac{N+1}{2}\right)^{th}$ item
Steps for calculation:
1. Arrange the data in ascending or descending order of size.
2. Obtain the total frequency, N = ∑f
3. Find the Cumulative Frequencies (in general, less than type)
4. Locate the Median by applying the formula:
Median = Size of $\left(\frac{N+1}{2}\right)^{th}$ item (where N = ∑f)
5. The value for which the cumulative frequency includes the $\left(\frac{N+1}{2}\right)^{th}$ item will
be taken as the Median


Solved Problem 30
Find the median value for the data depicted in table 4.22
Table 4.22: Frequency table

| X | 12 | 16 | 10 | 14 | 17 | 20 | 15 |
|---|---|---|---|---|---|---|---|
| f | 4 | 9 | 3 | 5 | 4 | 2 | 10 |

Solution
In this problem, we have, N = 37
Table 4.22a: Computation of Median

| X | f | Less than Cumulative frequency LCF |
|---|---|---|
| 10 | 3 | 3 |
| 12 | 4 | 7 |
| 14 | 5 | 12 |
| 15 | 10 | 22 |
| 16 | 9 | 31 |
| 17 | 4 | 35 |
| 20 | 2 | 37 |

Median = Size of $\left(\frac{N+1}{2}\right)^{th}$ item $= \left(\frac{37+1}{2}\right)^{th}$ item $= 19^{th}$ item
This item lies in the cumulative frequency (22) for the value 15.
Therefore, the Median is 15.
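
The steps for a discrete series can be sketched in Python with the data of Solved Problem 30. The variable names are assumptions for this illustration. As a simplification, the sketch returns the first value whose cumulative frequency covers the (N + 1)/2 position and does not average two values when N is even.

```python
from itertools import accumulate

values = [12, 16, 10, 14, 17, 20, 15]   # X from table 4.22
freqs = [4, 9, 3, 5, 4, 2, 10]          # f from table 4.22

# Step 1: arrange the values (with their frequencies) in ascending order.
pairs = sorted(zip(values, freqs))

# Steps 2 and 3: total frequency and less-than cumulative frequencies.
cumulative = list(accumulate(f for _, f in pairs))
n = cumulative[-1]

# Steps 4 and 5: first value whose cumulative frequency covers the median position.
position = (n + 1) / 2
median = next(x for (x, _), cf in zip(pairs, cumulative) if cf >= position)

print(cumulative)
print(position, median)
```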
4.6.3 Median for Continuous series
The procedure to get a median is different in continuous series. The class
intervals are already in the form of an array and the frequencies are recorded
against each class interval. To locate the median, we take the $\left(\frac{N}{2}\right)^{th}$
item, and the median class is located accordingly with reference to the cumulative
frequency which first covers that item. When the median class is located,
the median value is interpolated using the formula given below.
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f\right)$$
Where l = lower limit of the median class
h = class width,
f = frequency of median class
c.f = cumulative frequency of the class preceding the median class.

Key statistic
To solve problems on median:
i) Arrange the data in ascending order or descending order
ii) Make class-interval as exclusive type

Solved Problem 31
Find the median of the data in table 4.23
Table 4.23: Distribution of Weight Data

| Weight in Kg | 30-35 | 35-40 | 40-45 | 45-50 | 50-55 |
|---|---|---|---|---|---|
| Frequency | 10 | 15 | 40 | 27 | 8 |

Solution
As it is an exclusive type of interval, we organise the data as shown in
table 4.23a.
$$\frac{N}{2} = \frac{100}{2} = 50$$
Table 4.23a: Cumulative Frequency Table

| Weight in Kg | Frequency f | Less than Cumulative frequency LCF |
|---|---|---|
| 30-35 | 10 | 10 |
| 35-40 | 15 | 25 |
| 40-45 | 40 (f) | 65 |
| 45-50 | 27 | 92 |
| 50-55 | 8 | 100 |
| Total | N = 100 | |

The cumulative frequency just above 50 is 65 and hence 40 – 45 is the median class.
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f\right)$$
Where,
l = lower limit of median class = 40
c.f = cumulative frequency of class preceding the median class = 25
f = frequency of median class = 40
h = class width = 5
$$\text{Median} = 40 + \frac{5}{40}\left(\frac{100}{2} - 25\right) = 43.125$$
Hence, the median weight is 43.125 kg.
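
The interpolation formula for a continuous series can be sketched in Python. The function name `grouped_median` and its parameters are assumptions for this illustration; it expects exclusive classes of equal width given by their lower limits.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median = l + (h / f) * (N / 2 - c.f) for equal-width exclusive classes."""
    n = sum(frequencies)
    half = n / 2
    cumulative = 0                        # c.f of the classes already passed
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:        # first class whose LCF covers N / 2
            return lower + (width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("frequencies must not be empty")


# Solved Problem 31: weights in kg, classes 30-35, ..., 50-55.
weights_median = grouped_median([30, 35, 40, 45, 50], 5, [10, 15, 40, 27, 8])
print(weights_median)   # 43.125
```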
Solved Problem 32
Find the median for the data depicted in table 4.24. The marks obtained
by 50 students are as follows.
Table 4.24: Computation of Median

| CI | Frequency | Less than Cumulative frequency LCF |
|---|---|---|
| 10 – 15 | 6 | 6 |
| 15 – 20 | 18 | 24 |
| 20 – 25 | 9 (f) | 33 |
| 25 – 30 | 10 | 43 |
| 30 – 35 | 4 | 47 |
| 35 – 40 | 3 | 50 |
| Total | N = 50 | |

$$\frac{N}{2} = \frac{50}{2} = 25$$
The cumulative frequency just above 25 is 33 and hence, 20 – 25 is the median class.
l = 20
h = 25 – 20 = 5
f = 9
c.f = 24
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f\right)$$
$$\text{Median} = 20 + \frac{5}{9}(25 - 24) = 20 + \frac{5}{9}$$
Median = 20.56
Solved Problem 33
Find the missing frequency for the data depicted in table 4.25, given that its
median is 34.
Table 4.25: Frequency table

| Class interval | Frequency |
|---|---|
| 0 – 10 | 4 |
| 10 – 20 | 9 |
| 20 – 30 | - |
| 30 – 40 | 20 |
| 40 – 50 | 18 |
| 50 – 60 | 7 |
| 60 – 70 | 3 |

Solution
Since the median is 34, it falls in the class-interval 30-40. Let ‘f’ be the missing
frequency. Therefore, we have the data shown in table 4.25a.
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f\right)$$
Table 4.25a: Cumulative Frequency Distribution for Data

| Class interval | Frequency | Less than Cumulative frequency LCF |
|---|---|---|
| 0 – 10 | 4 | 4 |
| 10 – 20 | 9 | 13 |
| 20 – 30 | f | 13 + f |
| 30 – 40 | 20 (median class) | 33 + f |
| 40 – 50 | 18 | 51 + f |
| 50 – 60 | 7 | 58 + f |
| 60 – 70 | 3 | 61 + f |
| Total | N = 61 + f | |

$$34 = 30 + \frac{10}{20}\left[\frac{61 + f}{2} - (13 + f)\right]$$
$$34 = 30 + \frac{1}{2}\left[\frac{(61 + f) - 2(13 + f)}{2}\right]$$
$$34 = 30 + \frac{1}{2}\left[\frac{61 + f - 26 - 2f}{2}\right]$$
$$34 = 30 + \frac{35 - f}{4}$$
$$34 = \frac{120 + 35 - f}{4}$$
$$136 = 155 - f$$
$$f = 19$$
Therefore, the missing frequency is 19.

4.6.4 Merits and demerits of Median


Table 4.26 depicts the merits and demerits of the median.
Table 4.26: Merits and Demerits of Median

| Merits | Demerits |
|---|---|
| It can be easily understood and computed. | It is not based on all values. |
| It is not affected by extreme values. | It is not capable of further algebraic treatment. |
| It can be determined graphically (Ogives). | |

Key statistic
In case of continuous series, Median M is given by:
$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f\right)$$
Where
l = lower limit of the median class
h = class width,
f = frequency of median class
c.f = cumulative frequency of the class preceding the median class.

4.7 Mode
In this section, we will discuss the Mode. The mode is the value which occurs
with the maximum frequency. It is the most typical or common value, the one that
receives the highest frequency. It represents what is in fashion and is often used in
business. Thus, it corresponds to the value of the variable which occurs most
frequently. The modal class of a frequency distribution is the class with the
highest frequency. The mode is denoted by ‘z’.
The mode is the value of the variable which is repeated the greatest number of times
in the series. It is the usual, and not the casual, size of item in the series. It lies at
the position of greatest density.
E.g.: If we say the modal marks obtained by students in a class test is 42, it
means that the largest number of students have secured 42 marks.
If each observation occurs the same number of times, we say that there
is ‘no mode’. If two observations share the highest frequency, we
say that the distribution is ‘bi-modal’. If three or more observations share the highest
frequency, we say it is a ‘multi-modal’ case.
The modal value is most useful for business people. For example, shoe and
readymade garment manufacturers will like to know the modal size of the
people to plan their operations. For individual and discrete series, the mode is the
value corresponding to the highest frequency.

Key statistic
In case of continuous series, mode is given by:
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$
Where,
l = lower limit of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
i = width of the class interval
Solved problem 34
The following data relate to size of shoes. Find the mode.
6, 7, 6, 8, 9, 9, 9, 10, 8, 7, 7, 9, 10, 9, 9, 9, 8, 8, 11
Solution
Arranging the data in ascending order and counting how often each size occurs, we obtain
table 4.27.
Table 4.27: Frequency Table for Data

| Size | Frequency |
|---|---|
| 6 | 2 |
| 7 | 3 |
| 8 | 4 |
| 9 | 7 |
| 10 | 2 |
| 11 | 1 |

Therefore, the modal value is 9, which corresponds to the highest frequency 7.


Solved Problem 35
Praveen, an apartment builder, is interested in the plinth area that his customers
wish to have for their apartments. He collects the data and
depicts it in table 4.28. Find the modal plinth area.
Table 4.28: No. of Customers wishing to have Plinth Area

| Plinth Area (Sq ft) | No. of Customers |
|---|---|
| 600 – 800 | 4 |
| 800 – 1000 | 10 |
| 1000 – 1200 | 15 (f0) |
| 1200 – 1400 | 25 (f1) |
| 1400 – 1600 | 12 (f2) |
| 1600 – 1800 | 8 |
| Above 1800 | 2 |

Solution
We note that the intervals are of the exclusive type and the highest frequency is
25. Therefore, the corresponding interval is 1200-1400, which is called the
modal class.
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$
Where,
l = lower limit of the modal class = 1200
f1 = frequency of the modal class = 25
f0 = frequency of the class preceding the modal class = 15
f2 = frequency of the class succeeding the modal class = 12
i = width of the class interval = 200
Therefore, the mode is calculated as:
$$\text{Mode} = 1200 + \frac{25 - 15}{2 \times 25 - 15 - 12} \times 200 = 1200 + \frac{2000}{23} = 1286.96$$
Hence, the modal plinth area is 1286.96 square feet.
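
The mode formula for a continuous series can be sketched in Python. The function name `grouped_mode` is an assumption for this illustration. Two further assumptions are made: a missing neighbouring class (when the modal class is the first or last class) is treated as having frequency 0, and the open-ended class ‘Above 1800’ is simply listed with lower limit 1800, which does not affect the result here.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * i for equal-width classes."""
    k = frequencies.index(max(frequencies))          # position of the modal class
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0          # class preceding the modal class
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0   # class succeeding it
    return lower_limits[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Solved Problem 35: plinth area in sq ft.
plinth_mode = grouped_mode(
    [600, 800, 1000, 1200, 1400, 1600, 1800], 200, [4, 10, 15, 25, 12, 8, 2]
)
print(round(plinth_mode, 2))
```

The result is about 1287 square feet, in line with the worked solution.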
Solved Problem 36
Find the mode for data depicted in table 4.29
Table 4.29: Frequency table

| Marks (CI) | No. of students (f) |
|---|---|
| 1 – 10 | 3 |
| 11 – 20 | 16 |
| 21 – 30 | 26 |
| 31 – 40 | 31 |
| 41 – 50 | 16 |
| 51 – 60 | 8 |
| Total | Σf = N = 100 |

Solution
We will have to first convert the inclusive series into an exclusive series for
calculating the mode. To convert a discontinuous distribution to a continuous
distribution, subtract 0.5 from the lower limit and add 0.5 to the upper limit.
Table 4.29a: Computation of Mode

| Marks (CI) | Conversions | No. of students (f) |
|---|---|---|
| 1 – 10 | 0.5 – 10.5 | 3 |
| 11 – 20 | 10.5 – 20.5 | 16 |
| 21 – 30 | 20.5 – 30.5 | 26 |
| 31 – 40 | 30.5 – 40.5 | 31 (Max. Frequency) |
| 41 – 50 | 40.5 – 50.5 | 16 |
| 51 – 60 | 50.5 – 60.5 | 8 |
| Total | | Σf = N = 100 |

We identify the modal class as the class of maximum frequency,
i.e., 30.5 – 40.5
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$
Where,
l = lower limit of the modal class = 30.5
f1 = frequency of the modal class = 31
f0 = frequency of the class preceding the modal class = 26
f2 = frequency of the class succeeding the modal class = 16
i = width of the class interval = 10
Therefore, the mode is calculated as:
$$\text{Mode} = 30.5 + \frac{31 - 26}{2 \times 31 - 26 - 16} \times 10 = 33$$
Key statistic
The empirical relationship between Mean, Median and Mode:
Mean – Mode = 3 (Mean – Median)
which is same as, Mode = 3 Median – 2 Mean.

4.7.1 Merits and demerits of mode


Table 4.30 depicts the merits and demerits of the Mode.

Table 4.30: Merits and Demerits of Mode

| Merits | Demerits |
|---|---|
| In many cases it can be found by inspection. | It is not based on all values. |
| It is not affected by extreme values. | It is not capable of further mathematical treatment. |
| It can be calculated for distributions with open end classes. | It is much affected by sampling fluctuations. |
| It can be located graphically. | |
| It can be used for qualitative data. | |

Self Assessment Questions


5. State whether the following statements are ‘True’ or ‘False’.
i) Mode is based on all values
ii) Mode = 3 Median – Mean
iii) Geometric mean is used when we are interested in rate of growth
of any phenomena.
iv) Harmonic mean exists if one of the values is zero.
v) A.M < G.M < H.M for any two values ‘a’ and ‘b’.
vi) Arithmetic mean can be calculated accurately even when the
distribution has open-end class.
vii) Mode can be located graphically.
viii) Mode is used when data is on interval scale.
6. If the values of the variables are arranged in ascending order of
magnitude, the middle term is
i) mean ii) mode
iii) median iv) quartile
7. In a symmetrical distribution the mean, median and mode
i) differ ii) coincide
iii) mean-median = mode iv) differ by 0.5
8. The relation between mean, median and mode is given by
i) Mode= 3 Median-2 Mean ii) Mode=2 Mean-Median
iii) Mode= 3Median –Mean iv) Mode= Mean- Median


9. The harmonic mean of 30 and 20 is


i) 25 ii) 24
iii) 20 iv) 30
10. If assumed mean A = 32.5, i = 8, Σfd = -13 and Σf = 90, then
i) mean = 35.31 ii) mean=31.35
iii) mean = 33.15 iv) mean=35.35
11. In any distribution when the original items differ in size, the value of
Arithmetic mean (AM), Geometric mean (GM) and Harmonic mean
(HM) would also differ in the following order
i) AM>GM>HM ii) AM=GM=HM
iii) AM<HM<GM iv) AM.GM>HM

4.8 Appropriate Situations for the use of Various Averages


In this section, we will discuss the appropriate situations for using each of
the averages.
1. Arithmetic mean is used when:
a. In depth study of the variable is needed
b. The variable is continuous and additive in nature
c. The data are in the interval or ratio scale
d. When the distribution is symmetrical
2. Median is used when:
a. The variable is discrete
b. There exist abnormal values
c. The distribution is skewed
d. The extreme values are missing
e. The characteristics studied are qualitative
f. The data are on the ordinal scale
3. Mode is used when:
a. The variable is discrete
b. There exist abnormal values
c. The distribution is skewed
d. The extreme values are missing
e. The characteristics studied are qualitative
4. Geometric mean is used when:
a. The rate of growth, ratios and percentages are to be studied
b. The variable is of multiplicative nature

5. Harmonic mean is used when:


a. The study is related to speed, time
b. Average of rates which produce equal effects has to be found
Thus, we have discussed the appropriate situations for using each of the
averages.

4.9 Positional Averages


In this section, we will discuss the positional averages. Median is the mid-
value of series of data. It divides the distribution into two equal portions.
Similarly, we can divide a given distribution into four, ten or hundred or any
other number of equal portions.
4.9.1 Quartiles
A measure which divides an array into four equal parts is known as
a quartile. Each portion contains an equal number of items. The first, second
and third points of division are termed the first quartile (Q1), second quartile (Q2) and
third quartile (Q3). The first quartile is also known as the lower quartile, as 25%
of the observations of the distribution are below it and 75% of the observations of the
distribution are above it. The third quartile is known as the upper quartile, as
75% of the observations of the distribution are below it and 25% of the observations
of the distribution are above it.

Key statistic
Quartiles: When distribution is divided into four equal portions, then we
get first quartile (Q1), second quartile (Q2 = Median) and third quartile (Q3)
as the positional averages.
For individual series, Q1 and Q3 are given by:

$$Q_1 = \text{Size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item}$$

$$Q_3 = \text{Size of } \left(\frac{3(N+1)}{4}\right)^{\text{th}} \text{ item}$$

For discrete series, Q1 and Q3 are given by:

$$Q_1 = \text{Size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item}$$

$$Q_3 = \text{Size of } \left(\frac{3(N+1)}{4}\right)^{\text{th}} \text{ item}$$

For continuous distribution, Q1 and Q3 are given by:

$$Q_1 = l + \frac{N/4 - c.f.}{f} \times i$$

$$Q_3 = l + \frac{3N/4 - c.f.}{f} \times i$$

Where,
l = lower limit of the quartile class
i = class width
f = frequency of quartile class
N = total frequency
c.f. = cumulative frequency of class preceding the quartile class
Measures of quartiles
The quartile values are located on the principle similar to locating the
median value.
Table 4.31 depicts the procedure of locating quartiles.
Table 4.31: Procedure of locating quartiles
Measure | For individual and discrete series | For continuous series | Formula to be used for continuous series
Q1 | $\left(\frac{N+1}{4}\right)^{\text{th}}$ item | $\left(\frac{N}{4}\right)^{\text{th}}$ item | $Q_1 = l + \frac{N/4 - c.f.}{f} \times i$
Q2 | $\left(\frac{2(N+1)}{4}\right)^{\text{th}}$ item | $\left(\frac{2N}{4}\right)^{\text{th}}$ item | $Q_2 = l + \frac{2N/4 - c.f.}{f} \times i$
Q3 | $\left(\frac{3(N+1)}{4}\right)^{\text{th}}$ item | $\left(\frac{3N}{4}\right)^{\text{th}}$ item | $Q_3 = l + \frac{3N/4 - c.f.}{f} \times i$


Individual Series:
Solved Problem 37
Weekly sales of a product at 8 different shops are as follows. Calculate the
quartiles.
Sales in units: 309, 312, 305, 307, 310, 308, 308, 306
Solution
Arranging the data in ascending order:
Sales in units: 305, 306, 307, 308, 308, 309, 310, 312

$$Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}} \text{item} = \left(\frac{8+1}{4}\right)^{\text{th}} \text{item} = 2.25^{\text{th}} \text{ item}$$
= 2nd value + 0.25 (3rd value – 2nd value)
= 306 + 0.25 (307 – 306) = 306.25

$$Q_2 = \left(\frac{2(N+1)}{4}\right)^{\text{th}} \text{item} = \left(\frac{2(8+1)}{4}\right)^{\text{th}} \text{item} = 4.5^{\text{th}} \text{ item}$$
= 4th value + 0.5 (5th value – 4th value)
= 308 + 0.5 (308 – 308) = 308

$$Q_3 = \left(\frac{3(N+1)}{4}\right)^{\text{th}} \text{item} = \left(\frac{3(8+1)}{4}\right)^{\text{th}} \text{item} = 6.75^{\text{th}} \text{ item}$$
= 6th value + 0.75 (7th value – 6th value)
= 309 + 0.75 (310 – 309)
= 309 + 0.75 = 309.75
Therefore, Q1, Q2, and Q3 are 306.25, 308 and 309.75 respectively.
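The interpolation used in Solved Problem 37 can be sketched in Python. This is only an illustrative sketch, not part of the original text: the function names `positional_value` and `quartiles` are hypothetical, and the code assumes the (N + 1)/4 positional rule described above.

```python
def positional_value(data, position):
    """Value at a 1-based, possibly fractional, position in the sorted data."""
    ordered = sorted(data)
    whole = int(position)           # e.g. 2 for the 2.25th item
    fraction = position - whole     # e.g. 0.25 for the 2.25th item
    if whole < 1:                   # position before the first item
        return ordered[0]
    if whole >= len(ordered):       # position at or beyond the last item
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)


def quartiles(data):
    """Q1, Q2 and Q3 of an individual series using the k(N + 1)/4 rule."""
    n = len(data)
    return tuple(positional_value(data, k * (n + 1) / 4) for k in (1, 2, 3))


sales = [309, 312, 305, 307, 310, 308, 308, 306]
q1, q2, q3 = quartiles(sales)
print(q1, q2, q3)  # 306.25 308.0 309.75
```

Library routines often use other interpolation conventions, so their quartiles may differ slightly from the values obtained by this rule.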
Discrete Series:
Solved Problem 38
Locate the median and the quartiles from the data depicted in table 4.32.
Table 4.32: Frequency table

Size of shoes 4 4.5 5 5.5 6 6.5 7 7.5 8


Frequencies 20 36 44 50 80 30 30 16 14

Table 4.32a: Computation of Quartiles
X | f | Less than cumulative frequency (LCF)
4 | 20 | 20
4.5 | 36 | 56
5 | 44 | 100 ← Q1
5.5 | 50 | 150
6 | 80 | 230 ← Q2
6.5 | 30 | 260 ← Q3
7 | 30 | 290
7.5 | 16 | 306
8 | 14 | 320
N = Σf = 320

$$Q_1 = \text{Size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{item} = \text{Size of } \left(\frac{320+1}{4}\right)^{\text{th}} \text{item} = 80.25^{\text{th}} \text{ item}$$
Just above 80.25, the c.f. (Cumulative Frequency) is 100. Against 100 c.f.,
value is 5.
∴ Q1 = 5

$$Q_2 = \text{Size of } \left(\frac{2(N+1)}{4}\right)^{\text{th}} \text{item} = \text{Size of } \left(\frac{2(320+1)}{4}\right)^{\text{th}} \text{item} = 160.5^{\text{th}} \text{ item}$$
Just above 160.5, the c.f. (Cumulative Frequency) is 230. Against 230 c.f.,
value is 6.
∴ Q2 = 6 = Median

$$Q_3 = \text{Size of } \left(\frac{3(N+1)}{4}\right)^{\text{th}} \text{item} = \text{Size of } \left(\frac{3(320+1)}{4}\right)^{\text{th}} \text{item} = 240.75^{\text{th}} \text{ item}$$
Just above 240.75, the c.f. (Cumulative Frequency) is 260. Against 260
c.f., value is 6.5.
∴ Q3 = 6.5
4.9.2 Deciles
The deciles divide the arrayed set of variates into ten portions of equal
frequency and they are sometimes used to characterise the data for some
specific purpose. In this process, we get nine decile values. The fifth decile
is nothing but a median value. We can calculate other deciles by following
the procedure which is used in computing the quartiles.
Table 4.33: Formula to compute Deciles
Measure | For individual and discrete series | For continuous series | Formula to be used for continuous series
D1 | $\left(\frac{N+1}{10}\right)^{\text{th}}$ item | $\left(\frac{N}{10}\right)^{\text{th}}$ item | $D_1 = l + \frac{N/10 - c.f.}{f} \times i$
D9 | $\left(\frac{9(N+1)}{10}\right)^{\text{th}}$ item | $\left(\frac{9N}{10}\right)^{\text{th}}$ item | $D_9 = l + \frac{9N/10 - c.f.}{f} \times i$

The kth decile (k = 1, 2, ..., 9) is given by

$$D_k = l + \frac{kN/10 - c.f.}{f} \times i$$

Where, l = lower limit of the decile class, N = total frequency, f = frequency
of the decile class, c.f. = cumulative frequency of preceding class and
i = width of the class interval


Solved Problem 39
Find the 7th Decile for the data given below:
Table 4.34: Frequency table
Class interval | 13 – 18 | 18 – 20 | 20 – 21 | 21 – 22 | 22 – 23 | 23 – 25 | 25 – 30
Frequency | 22 | 27 | 51 | 42 | 32 | 16 | 10

Solution
Table 4.34a: Computation of 7th decile
Class interval | Frequency (f) | Less than cumulative frequency (LCF)
13 – 18 | 22 | 22
18 – 20 | 27 | 49
20 – 21 | 51 | 100
21 – 22 | 42 | 142
22 – 23 | 32 | 174
23 – 25 | 16 | 190
25 – 30 | 10 | 200
N = 200

The 7th decile is given by:

$$\left(\frac{7N}{10}\right)^{\text{th}} \text{item} = \left(\frac{7 \times 200}{10}\right)^{\text{th}} \text{item} = 140^{\text{th}} \text{ item}$$

D7 lies in the class 21 – 22

$$D_7 = l + \frac{7N/10 - c.f.}{f} \times i$$

$$D_7 = 21 + \frac{140 - 100}{42} \times 1 = 21.95$$
Therefore, the 7th decile is 21.95.
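The continuous-series formula applied in Solved Problem 39 can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `grouped_quantile` and its parameters are hypothetical, the classes are assumed to be listed in ascending order with no gaps, and `parts` is 4 for quartiles, 10 for deciles and 100 for percentiles.

```python
def grouped_quantile(lower_limits, upper_limits, freqs, k, parts):
    """k-th of `parts` quantiles for a continuous frequency distribution.

    Implements l + ((kN/parts - c.f.) / f) * i, where c.f. is the cumulative
    frequency of the classes preceding the quantile class.
    """
    total = sum(freqs)
    target = k * total / parts      # kN/parts, e.g. 7N/10 for D7
    cum_before = 0                  # c.f. of the preceding classes
    for low, high, f in zip(lower_limits, upper_limits, freqs):
        if cum_before + f >= target:
            return low + (target - cum_before) / f * (high - low)
        cum_before += f
    raise ValueError("target position lies beyond the last class")


lower = [13, 18, 20, 21, 22, 23, 25]
upper = [18, 20, 21, 22, 23, 25, 30]
freqs = [22, 27, 51, 42, 32, 16, 10]

d7 = grouped_quantile(lower, upper, freqs, k=7, parts=10)
print(round(d7, 2))  # 21.95
```

Because each class carries its own lower and upper limit, the sketch handles the unequal class widths of this table without further adjustment.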
4.9.3 Percentiles
Percentile values divide the distribution into 100 parts of equal frequency. In
this process, we get ninety-nine percentile values. The 25th, 50th and 75th
percentiles are nothing but first quartile, median and third quartile values,
respectively.

Table 4.35: Formula to compute Percentiles
Measure | For individual and discrete series | For continuous series | Formula to be used for continuous series
P1 | $\left(\frac{N+1}{100}\right)^{\text{th}}$ item | $\left(\frac{N}{100}\right)^{\text{th}}$ item | $P_1 = l + \frac{N/100 - c.f.}{f} \times i$
P25 | $\left(\frac{25(N+1)}{100}\right)^{\text{th}}$ item | $\left(\frac{25N}{100}\right)^{\text{th}}$ item | $P_{25} = l + \frac{25N/100 - c.f.}{f} \times i$
P99 | $\left(\frac{99(N+1)}{100}\right)^{\text{th}}$ item | $\left(\frac{99N}{100}\right)^{\text{th}}$ item | $P_{99} = l + \frac{99N/100 - c.f.}{f} \times i$

Notations are defined in the Decile definition.


Solved problem 40
From the following data, find the middle Decile and the two extreme
Percentiles.
Table 4.36: Frequency table
Size of the shoes | 8” | 6.5” | 7.5” | 7” | 5.5” | 6” | 4.5” | 4” | 5”
Frequency of sale | 7 | 8 | 10 | 15 | 40 | 25 | 22 | 18 | 10

Solution
Table 4.36a: Calculation of middle decile and 2 extreme percentiles
Size of the shoes in ascending order | Frequency (f) | Less than cumulative frequency (LCF)
4 | 18 | 18
4.5 | 22 | 40
5 | 10 | 50
5.5 | 40 | 90
6 | 25 | 115
6.5 | 8 | 123
7 | 15 | 138
7.5 | 10 | 148
8 | 7 | 155
N = 155

Middle Decile

$$D_5 = \left(\frac{5(N+1)}{10}\right)^{\text{th}} \text{item} = \left(\frac{5(155+1)}{10}\right)^{\text{th}} \text{item} = 78^{\text{th}} \text{ item}$$
Just above 78, the c.f. (Cumulative Frequency) is 90. Against 90 c.f., value is
5.5.
D5 = 5.5

Lower Percentile

$$P_1 = \left(\frac{N+1}{100}\right)^{\text{th}} \text{item} = \left(\frac{155+1}{100}\right)^{\text{th}} \text{item} = 1.56^{\text{th}} \text{ item}$$
Just above 1.56, the c.f. (Cumulative Frequency) is 18. Against 18 c.f., value
is 4.
P1 = 4

Upper Percentile

$$P_{99} = \left(\frac{99(N+1)}{100}\right)^{\text{th}} \text{item} = \left(\frac{99(155+1)}{100}\right)^{\text{th}} \text{item} = 154.44^{\text{th}} \text{ item}$$
Just above 154.44, the c.f. (Cumulative Frequency) is 155. Against 155 c.f.,
value is 8.
P99 = 8
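The cumulative-frequency lookup used for discrete series in Solved Problem 40 can be sketched in Python. This is an illustration only: the function name `discrete_quantile` is hypothetical, and the code assumes that the required value is the first one whose cumulative frequency reaches the k(N + 1)/parts position.

```python
from bisect import bisect_left
from itertools import accumulate


def discrete_quantile(values, freqs, k, parts):
    """k-th of `parts` positional values of a discrete frequency distribution."""
    pairs = sorted(zip(values, freqs))                  # ascending order of value
    cumulative = list(accumulate(f for _, f in pairs))  # less-than cumulative frequency
    position = k * (cumulative[-1] + 1) / parts         # k(N + 1)/parts
    # first cumulative frequency that reaches the position
    index = min(bisect_left(cumulative, position), len(pairs) - 1)
    return pairs[index][0]


sizes = [8, 6.5, 7.5, 7, 5.5, 6, 4.5, 4, 5]
sold = [7, 8, 10, 15, 40, 25, 22, 18, 10]

d5 = discrete_quantile(sizes, sold, k=5, parts=10)
p1 = discrete_quantile(sizes, sold, k=1, parts=100)
p99 = discrete_quantile(sizes, sold, k=99, parts=100)
print(d5, p1, p99)  # 5.5 4 8
```

The same function gives quartiles when `parts` is set to 4.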

Solved Problem 41
For the data provided below, find the 20th percentile.
Table 4.37: Frequency table
Class interval | 13 – 18 | 18 – 20 | 20 – 21 | 21 – 22 | 22 – 23 | 23 – 25 | 25 – 30
Frequency | 22 | 27 | 51 | 42 | 32 | 16 | 10

Solution
Table 4.37a: Computation of 20th percentile
Class interval | Frequency (f) | Less than cumulative frequency (LCF)
13 – 18 | 22 | 22
18 – 20 | 27 | 49
20 – 21 | 51 | 100
21 – 22 | 42 | 142
22 – 23 | 32 | 174
23 – 25 | 16 | 190
25 – 30 | 10 | 200
N = 200

The 20th percentile is given by:

$$\left(\frac{20N}{100}\right)^{\text{th}} \text{item} = \left(\frac{20 \times 200}{100}\right)^{\text{th}} \text{item} = 40^{\text{th}} \text{ item}$$

P20 lies in the class 18 – 20

$$P_{20} = l + \frac{20N/100 - c.f.}{f} \times i$$

$$P_{20} = 18 + \frac{40 - 22}{27} \times 2 = 18 + 1.33 = 19.33$$

Therefore, the 20th percentile is 19.33.

Self Assessment Questions


12. State whether the following statements are ‘True’ or ‘False’.
i) Quartiles are positional values.
ii) Quartiles help us to find percentage of readings below or above a
certain value.
iii) Q2 = P50 = D7 = Median
13. State whether the following statements are ‘True’ or ‘False’.
i) The cost of living index numbers calculated are based on weighted
averages.
ii) Many of the items which we use in our life can be assigned weights.


4.10 Dispersion
In this section, we will discuss dispersion.
Definition: A measure of dispersion may be defined as a statistic
signifying the extent of the scattering of items around a measure of central
tendency.
It describes another characteristic of a distribution. Consider the two
distributions of weights of a product produced by two machines, depicted in
table 4.38.
Table 4.38: Distribution of Weights of a Product
Machine A B
Sample size 1000 1000
Average weight 80 80
Minimum weight 20 40
Maximum weight 140 100

Machine ‘B’ produces products with weights much closer to the average
than Machine ‘A’. As a manufacturer or customer, we would choose
Machine ‘B’. In other words, we choose that machine whose spread is
smaller.
The property of deviations of values from the average is called Dispersion or
Variation. The degree of variation is found by the measures of variation.
They are as follows:
1. Range (R)
2. Quartile Deviation (Q.D)
3. Mean Deviation (M.D)
4. Standard Deviation (S.D)
They have units of measurement attached to them. Therefore, they are
known as absolute measures of variation. However, we may want to
compare two different distributions whose measurements are in terms of
kilograms and in terms of centimetres. Then, we use the following relative
measures that do not have any units attached to them. The relative
measures are as follows:
1. Coefficient of Range
2. Coefficient of Quartile Deviation
3. Coefficient of Mean Deviation
4. Coefficient of Variation

They are known as relative measures. In this unit, we study both measures
of variation and coefficients of variation simultaneously.
The prerequisites of a good measure of variation are as follows:
1. It should be easy to understand and simple to calculate.
2. It should be based on all values.
3. It should be rigidly defined.
4. It should not be affected by extreme values.
5. It should not be affected by sampling fluctuations.
6. It should be capable of further algebraic treatment.
4.10.1 Range
‘Range’ represents the differences between the values of the extremes.
The range of any sample is the difference between the highest and the
lowest values in the series.
The values in between two extremes are not taken into consideration. The
range is a simple indicator of the variability of a set of observations. It is
denoted by ‘R’. In a frequency distribution, the range is taken to be the
difference between the lower limit of the class at the lower extreme of the
distribution and the upper limit of the class at the upper extreme of the
distribution. Range can be computed using the following equations.
Range = Largest value – Smallest value = L – S

$$\text{Coefficient of Range} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Largest value} + \text{Smallest value}} = \frac{L - S}{L + S}$$

The table 4.39 depicts the merits and demerits of Range.


Table 4.39: Merits and Demerits of Range
Merits | Demerits
It is easily understood and simple to calculate. | It is affected by extreme values.
It is rigidly defined. | It is not based on all values. It uses extreme values only.

Range is used in the following cases:
• In statistical quality control
• When the study does not require deep analysis
• When data has no abnormal values


Solved Problem 42
Find the Range of the following series 26, 28, 28, 26, 28, 30, 27, 29, 26, 24
Solution
The range ‘R’ is calculated as follows:
R= Range = Largest value – Smallest value = L - S
R = 30 – 24 = 6
Therefore, the range is 6.
Solved Problem 43
Compute range and coefficient of range for the following discrete series of
data.
Table 4.40: Frequency table

X: 6 12 18 24 30 36 42
f: 20 130 16 14 20 15 40

Solution:

R = Range = Largest value – Smallest value = L – S = 42 – 6 = 36

$$\text{Coefficient of Range} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Largest value} + \text{Smallest value}} = \frac{L - S}{L + S} = \frac{36}{48} = 0.75$$
Solved Problem 44
Find the range for the continuous series of data depicted in table 4.41
Table 4.41: Frequency Table

Class Interval 0-5 5-10 10-15 15-20 20-25


Frequency 10 15 25 12 8

Solution
Range R is calculated as follows:
R = 25 – 0 = 25
Therefore, the range of the given continuous series is 25.


Solved problem 45
Compute the Range and also the co-efficient of Range of the given series
Table 4.42
Series – I: 9, 10, 15, 19, 21
Series – II: 1, 15, 24, 28, 29

Solution:
Table 4.42a: Computation of the Range and also the co-efficient of Range
Series – I: 9, 10, 15, 19, 21 | Series – II: 1, 15, 24, 28, 29
R = L – S = 21 – 9 = 12 | R = L – S = 29 – 1 = 28
$CR = \frac{L - S}{L + S} = \frac{12}{21 + 9} = \frac{12}{30} = 0.4$ | $CR = \frac{L - S}{L + S} = \frac{28}{30} = 0.933$

Solved Problem 46
Find Range and co-efficient of Range from following data and state which is
more dispersed and which is more uniform:
Table 4.43

A 10 11 12 13 14
B 40 41 42 43 44
C 100 101 102 103 104

Table 4.43a: Calculation of Range and co-efficient of Range
Series – I (A) | Series – II (B) | Series – III (C)
R = L – S = 14 – 10 = 4 | R = L – S = 44 – 40 = 4 | R = L – S = 104 – 100 = 4
$CR = \frac{L - S}{L + S} = \frac{4}{24} = 0.166$ | $CR = \frac{L - S}{L + S} = \frac{4}{84} = 0.0476$ | $CR = \frac{L - S}{L + S} = \frac{4}{204} = 0.0196$
Series III is less dispersed and more uniform
Series I is more dispersed and less uniform
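The comparison in Solved Problem 46 can be reproduced with a short Python sketch. The function names below are hypothetical and the code is only an illustration of the two formulas, Range = L – S and Coefficient of Range = (L – S)/(L + S).

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


series = {
    "A": [10, 11, 12, 13, 14],
    "B": [40, 41, 42, 43, 44],
    "C": [100, 101, 102, 103, 104],
}

for name, values in series.items():
    print(name, value_range(values), round(coefficient_of_range(values), 4))
```

All three series have the same range of 4, so only the coefficient of range separates them: the smaller the coefficient, the more uniform the series.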


Key statistic
Range is not defined if the class intervals are open.

4.10.2 Quartile deviation


Quartiles divide the total frequency into four equal parts. The lower quartile
Q1 refers to the value of the variate corresponding to the cumulative
frequency N/4.
Q2 corresponds to the value of the variate with cumulative frequency equal to
N/2.
The upper quartile Q3 refers to the value of the variate corresponding to the
cumulative frequency 3N/4.

$$\text{Quartile Deviation (QD)} = \frac{1}{2}(Q_3 - Q_1)$$

$$\text{Co-efficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Key statistic
1. Q3 – Q1 is called the inter-quartile range.
2. Q3 – Q1 spans the middle 50% of the readings. Q3 and Q1 are also known as
the upper and lower limits of the middle 50% of readings.
3. Quartile range is not capable of further algebraic treatment.

Solved Problem 47
Find the Quartile Deviation and the Co-efficient of Quartile Deviation, from
the marks of 12 students depicted in table 4.44.
Table 4.44
Sl. No 1 2 3 4 5 6 7 8 9 10 11 12
Marks 25 30 37 43 48 54 61 67 72 80 84 89

Solution

$$Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}} \text{item} = \left(\frac{12+1}{4}\right)^{\text{th}} \text{item} = 3.25^{\text{th}} \text{ item}$$
= 3rd item + 0.25 (4th item – 3rd item) = 37 + 0.25 (43 – 37)
Q1 = 38.5

$$Q_3 = \left(\frac{3(N+1)}{4}\right)^{\text{th}} \text{item} = \left(\frac{3(12+1)}{4}\right)^{\text{th}} \text{item} = 9.75^{\text{th}} \text{ item}$$
= 9th item + 0.75 (10th item – 9th item) = 72 + 0.75 (80 – 72)
Q3 = 78

$$\text{Quartile Deviation} = \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(78 - 38.5) = 19.75$$

$$\text{Co-efficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{78 - 38.5}{78 + 38.5} = 0.339$$
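The steps of Solved Problem 47 can be sketched in Python. This is a hedged illustration, not the text's own method: the helper names `item_at` and `quartile_deviation` are hypothetical, and `item_at` assumes the requested position lies strictly between the first and last items.

```python
def item_at(ordered, position):
    """Interpolated value at a 1-based fractional position in sorted data.

    Assumes 1 <= position < len(ordered).
    """
    whole = int(position)
    fraction = position - whole
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


def quartile_deviation(data):
    """Return Q1, Q3, the quartile deviation and its coefficient."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = item_at(ordered, (n + 1) / 4)
    q3 = item_at(ordered, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


marks = [25, 30, 37, 43, 48, 54, 61, 67, 72, 80, 84, 89]
q1, q3, qd, coefficient = quartile_deviation(marks)
print(q1, q3, qd, round(coefficient, 3))  # 38.5 78.0 19.75 0.339
```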
Solved Problem 48
Compute quartile deviation and its coefficient for the data depicted in the
table 4.45:
Table 4.45: Frequency table
X 58 59 60 61 62 63 64 65 65
f 15 20 32 35 33 22 20 10 8


Solution
Table 4.45a: Computation of Quartile deviation and its coefficient
X | f | Less than cumulative frequency (LCF)
58 | 15 | 15
59 | 20 | 35
60 | 32 | 67 ← Q1
61 | 35 | 102
62 | 33 | 135
63 | 22 | 157 ← Q3
64 | 20 | 177
65 | 10 | 187
65 | 8 | 195
N = 195

$$Q_1 = \left(\frac{N+1}{4}\right)^{\text{th}} \text{item} = \left(\frac{195+1}{4}\right)^{\text{th}} \text{item} = 49^{\text{th}} \text{ item}$$
Just above 49, the c.f. (Cumulative Frequency) is 67. Against 67 c.f., value is
60.
Q1 = 60

$$Q_3 = \left(\frac{3(N+1)}{4}\right)^{\text{th}} \text{item} = \left(\frac{3(195+1)}{4}\right)^{\text{th}} \text{item} = 147^{\text{th}} \text{ item}$$
Just above 147, the c.f. (Cumulative Frequency) is 157. Against 157 c.f.,
value is 63.
Q3 = 63

$$\text{Quartile Deviation} = \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(63 - 60) = 1.5$$

$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{63 - 60}{63 + 60} = \frac{3}{123} = 0.024$$


Solved Problem 49
Find Quartile Deviation and Coefficient of Quartile Deviation for the given
data and also compute middle quartile.
Table 4.46: Frequency table
Class interval | 1 – 10 | 11 – 20 | 21 – 30 | 31 – 40 | 41 – 50 | 51 – 60
Frequency | 3 | 16 | 26 | 31 | 16 | 8

Solution
Table 4.46a: Computation of Quartile Deviation and its coefficient
Class | f | Conversions | Less than cumulative frequency (LCF)
1 – 10 | 3 | 0.5 – 10.5 | 3
11 – 20 | 16 | 10.5 – 20.5 | 19
21 – 30 | 26 | 20.5 – 30.5 | 45 ← Q1 class
31 – 40 | 31 | 30.5 – 40.5 | 76 ← Q2 and Q3 class
41 – 50 | 16 | 40.5 – 50.5 | 92
51 – 60 | 8 | 50.5 – 60.5 | 100
N = 100

The lower quartile Q1 is given by
N/4 = 100/4 = 25
Cum. frequency just above 25 is 45, so Q1 lies in the class 20.5 – 30.5

$$Q_1 = l + \frac{N/4 - c.f.}{f} \times i = 20.5 + \frac{100/4 - 19}{26} \times 10$$
Q1 = 22.80

The middle quartile Q2 is given by
2N/4 = (2 × 100)/4 = 50
Cum. frequency just above 50 is 76, so Q2 lies in the class 30.5 – 40.5

$$Q_2 = l + \frac{2N/4 - c.f.}{f} \times i = 30.5 + \frac{2(100)/4 - 45}{31} \times 10$$
Q2 = 32.11

The third quartile Q3 is given by
3N/4 = (3 × 100)/4 = 75
Cum. frequency just above 75 is 76, so Q3 lies in the class 30.5 – 40.5

$$Q_3 = l + \frac{3N/4 - c.f.}{f} \times i = 30.5 + \frac{3(100)/4 - 45}{31} \times 10$$
Q3 = 40.17

$$\text{Quartile Deviation} = \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(40.17 - 22.80) = 8.685$$

$$\text{Co-efficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40.17 - 22.80}{40.17 + 22.80} = \frac{17.37}{62.97} = 0.276$$
The table 4.47 depicts the merits and demerits of quartile deviation.
Table 4.47: Merits and Demerits of Quartile Deviation

Merits | Demerits
It is easy to understand and to compute. | It is not based on all values.
It is rigidly defined. | It is affected by sampling fluctuations.
It is not affected by extreme values. | It is not capable of further algebraic treatment.


4.10.3 Mean deviation


Mean deviation is defined as the mean of absolute deviations of the values
from the central value.
For individual series, mean deviation from the mean is calculated as:

$$M.D.(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{N}$$

For discrete and continuous series, mean deviation from the mean is
calculated as:

$$M.D.(\bar{X}) = \frac{\sum f\,|X_i - \bar{X}|}{\sum f}$$

For individual series, mean deviation from the median (M) is calculated as:

$$M.D.(\text{Median}) = \frac{\sum |X_i - M|}{N}$$

For discrete and continuous series, mean deviation from the median is
calculated as:

$$M.D.(\text{Median}) = \frac{\sum f\,|X_i - M|}{\sum f}$$

In case of continuous series, ‘X’ represents the mid value of the class interval.
However, mean deviation from median is the least.
The corresponding relative measures are the coefficients of mean deviation.

$$\text{Coefficient of } M.D.(\bar{X}) = \frac{M.D.(\bar{X})}{\bar{X}}$$

$$\text{Coefficient of } M.D.(\text{Median}) = \frac{M.D.(\text{Median})}{\text{Median}}$$
Solved Problem 50
Compute Mean deviation and its coefficient from Mean and Median for the
data.
X: 21, 32, 38, 41,49, 54, 59, 66, 68


Table 4.48: Computation of Mean Deviation from Mean and Median
Xi | |Xi – X̄| = |Xi – 47.55| | |Xi – M| = |Xi – 49|
21 | 26.55 | 28
32 | 15.55 | 17
38 | 9.55 | 11
41 | 6.55 | 8
49 | 1.45 | 0
54 | 6.45 | 5
59 | 11.45 | 10
66 | 18.45 | 17
68 | 20.45 | 19
ΣX = 428 | Σ|Xi – X̄| = 116.45 | Σ|Xi – M| = 115

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{N} = \frac{\sum X}{N} = \frac{428}{9} = 47.55$$

$$M.D.(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{N} = \frac{116.45}{9} = 12.938$$

$$\text{Coefficient of } M.D.(\bar{X}) = \frac{M.D.(\bar{X})}{\bar{X}} = \frac{12.938}{47.55} = 0.272$$

$$\text{Median} = \text{Size of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{item} = \left(\frac{9+1}{2}\right)^{\text{th}} \text{item} = 5^{\text{th}} \text{ item}$$
Median = 49

$$M.D.(\text{Median}) = \frac{\sum |X_i - M|}{N} = \frac{115}{9} = 12.778$$

$$\text{Coefficient of } M.D.(\text{Median}) = \frac{M.D.(\text{Median})}{\text{Median}} = \frac{12.778}{49} = 0.2608$$
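The two mean deviations of Solved Problem 50 can be checked with a short Python sketch. The function name `mean_deviation` is hypothetical; `mean` and `median` come from Python's standard `statistics` module. The code works with the unrounded mean, so its intermediate figures differ marginally from the table, which uses 47.55.

```python
from statistics import mean, median


def mean_deviation(data, center):
    """Mean of the absolute deviations of the values from `center`."""
    return sum(abs(x - center) for x in data) / len(data)


values = [21, 32, 38, 41, 49, 54, 59, 66, 68]

md_mean = mean_deviation(values, mean(values))
md_median = mean_deviation(values, median(values))

print(round(md_mean, 3), round(md_mean / mean(values), 3))          # 12.938 0.272
print(round(md_median, 3), round(md_median / median(values), 4))    # 12.778 0.2608
```

The output also illustrates the remark above that the mean deviation taken about the median is the smaller of the two.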


Solved Problem 51
Compute Mean deviation and its coefficient from Mean and Median for the
data.
Table 4.49: Frequency table
Marks (X) 5 10 15 20 25
Students (f) 6 7 8 11 8

Table 4.49a: Mean Deviation about Mean
Xi | f | fX | |Xi – X̄| = |Xi – 16| | f|Xi – X̄|
5 | 6 | 30 | 11 | 66
10 | 7 | 70 | 6 | 42
15 | 8 | 120 | 1 | 8
20 | 11 | 220 | 4 | 44
25 | 8 | 200 | 9 | 72
N = 40 | ΣfX = 640 | Σf|Xi – X̄| = 232

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N} = \frac{640}{40} = 16$$

$$M.D.(\bar{X}) = \frac{\sum f\,|X_i - \bar{X}|}{\sum f} = \frac{232}{40} = 5.8 \text{ marks}$$

$$\text{Coefficient of } M.D.(\bar{X}) = \frac{M.D.(\bar{X})}{\bar{X}} = \frac{5.8}{16} = 0.363$$

Table 4.49b: Mean Deviation about Median
Xi | f | Cumulative frequency (LCF) | |Xi – M| = |Xi – 15| | f|Xi – M|
5 | 6 | 6 | 10 | 60
10 | 7 | 13 | 5 | 35
15 | 8 | 21 | 0 | 0
20 | 11 | 32 | 5 | 55
25 | 8 | 40 | 10 | 80
N = 40 | Σf|Xi – M| = 230

$$\text{Median} = \left(\frac{40+1}{2}\right)^{\text{th}} \text{item} = 20.5^{\text{th}} \text{ item}$$
The 20.5th item falls within the cumulative frequency 21, which corresponds
to the value 15.
Therefore, the Median is 15.

$$M.D.(\text{Median}) = \frac{\sum f\,|X_i - M|}{\sum f} = \frac{230}{40} = 5.75 \text{ marks}$$

$$\text{Coefficient of } M.D.(\text{Median}) = \frac{M.D.(\text{Median})}{\text{Median}} = \frac{5.75}{15} = 0.383$$
Solved Problem 52
Compute Mean Deviation about its Mode and its coefficient from the data
depicted in table 4.50
Table 4.50: Mean Deviation about Mode
X | f | |Xi – Mode| | f|Xi – Mode|
20 | 6 | 100 | 600
40 | 19 | 80 | 1520
60 | 40 | 60 | 2400
80 | 23 | 40 | 920
100 | 65 | 20 | 1300
120 ← Mode | 83 | 0 | 0
140 | 55 | 20 | 1100
160 | 20 | 40 | 800
180 | 9 | 60 | 540
Σf = 320 | Σf|Xi – Mode| = 9180

Solution
The highest frequency is 83 and hence Mode = 120

$$M.D.(\text{Mode}) = \frac{\sum f\,|X_i - \text{Mode}|}{\sum f} = \frac{9180}{320} = 28.68$$

$$\text{Coefficient of } M.D.(\text{Mode}) = \frac{M.D.(\text{Mode})}{\text{Mode}} = \frac{28.68}{120} = 0.239$$


Solved Problem 53
Find out the Mean Deviation from the data depicted below about its Median
and its coefficient.
Table 4.51: Frequency table
Size 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency 7 12 18 25 16 14 8

Table 4.51a: Mean deviation about Median
Size | Frequency f | Mid value Xi | Cumulative frequency (LCF) | |Xi – M| = |Xi – 35.2| | f|Xi – M|
0-10 | 7 | 5 | 7 | 30.2 | 211.4
10-20 | 12 | 15 | 19 | 20.2 | 242.4
20-30 | 18 | 25 | 37 | 10.2 | 183.6
30-40 | 25 | 35 | 62 | 0.2 | 5.0
40-50 | 16 | 45 | 78 | 9.8 | 156.8
50-60 | 14 | 55 | 92 | 19.8 | 277.2
60-70 | 8 | 65 | 100 | 29.8 | 238.4
N = 100 | Σf|Xi – M| = 1314.8

N/2 = 100/2 = 50
Cum. frequency just above 50 is 62 and hence, 30 – 40 is the median class.

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c.f.\right) = 30 + \frac{10}{25}\left(\frac{100}{2} - 37\right) = 35.2$$

$$M.D.(\text{Median}) = \frac{\sum f\,|X_i - M|}{\sum f} = \frac{1314.8}{100} = 13.148$$

$$\text{Coefficient of } M.D.(\text{Median}) = \frac{M.D.(\text{Median})}{\text{Median}} = \frac{13.148}{35.2} = 0.3735$$
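For a frequency distribution, the calculation in Solved Problem 53 can be sketched in Python. The function name `weighted_mean_deviation` is hypothetical, and the median is obtained from the grouped-data formula with the values read off table 4.51a (l = 30, h = 10, f = 25, c.f. = 37) rather than located programmatically.

```python
def weighted_mean_deviation(values, freqs, center):
    """Frequency-weighted mean of the absolute deviations from `center`."""
    total = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / total


mid_values = [5, 15, 25, 35, 45, 55, 65]    # mid values of the classes
freqs = [7, 12, 18, 25, 16, 14, 8]

# Median = l + (h / f) * (N/2 - c.f.) for the median class 30-40
median = 30 + (10 / 25) * (sum(freqs) / 2 - 37)

md = weighted_mean_deviation(mid_values, freqs, median)
print(round(median, 1), round(md, 3), round(md / median, 4))
```

Passing the mean or the mode as `center` gives the mean deviation about those averages instead.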


The table 4.52 depicts the merits and demerits of Mean Deviation.
Table 4.52: Merits and Demerits of Mean Deviation
Merits | Demerits
It is based on all values. | It is not capable of further algebraic treatment.
It is less affected by extreme values. | It does not take into account negative signs.
It is not affected much by sampling fluctuations. |

The Mean Deviation is used in the following cases:
• When sample size is small.
• In statistical analysis of certain economic, business and social
phenomena.

4.11 Standard Deviation


In this section, we will study the standard deviation. Standard deviation is
the positive square root of the arithmetic mean of the squared deviations of
the values from their mean. It is also called the mean square error deviation
or the root mean square deviation. It is a second-moment measure of
dispersion. Since the sum of squares of deviations is a minimum when the
deviations are taken from the mean, the deviations are taken only from the
mean (not from the median or the mode).
The standard deviation is the root mean square (RMS) average of all the
deviations from the mean. It is denoted by sigma (σ).
Individual series: There are two methods of calculating standard deviation
in an individual observation or series:
i) When deviations are taken from the actual mean: This method is used only
when the mean is a whole number.

$$\text{Variance} = \frac{\sum (X - \bar{X})^2}{N}$$

$$\text{Standard Deviation } (\sigma) = \sqrt{\text{Variance}}$$

ii) When deviations are taken from an assumed mean: When the arithmetic mean
is a fractional value, the method explained in (i) will be tedious and
time-consuming. Hence we use the following formula.

$$\text{Variance} = \frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2$$

$$\text{Standard Deviation } (\sigma) = \sqrt{\text{Variance}}$$

Where, d stands for the deviation from the assumed mean, d = X – A, A is the
assumed mean and N is the number of observations.

Discrete series:
i) Actual Mean Method:

$$\text{Variance} = \frac{\sum f (X - \bar{X})^2}{\sum f}$$

$$\text{Standard Deviation } (\sigma) = \sqrt{\text{Variance}}$$

ii) Assumed Mean Method:

$$\text{Variance} = \frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2$$

$$\text{Standard Deviation } (\sigma) = \sqrt{\text{Variance}}$$

Where d = X – A, A = assumed mean, Σf = N

Continuous series:
In a continuous series, mid-values of the class intervals are to be found out.
Here, ‘X’ is the mid value of the class interval.

$$\text{Variance} = \left[\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2\right] \times i^2$$

$$\text{Standard Deviation } (\sigma) = \sqrt{\text{Variance}}$$

Where d = (X – A)/i, A = assumed mean, Σf = N, i = class width

Key statistic
The square of standard deviation is called variance. It is denoted by σ².

4.11.1 Properties of standard deviation


The properties of standard deviation are as follows:
1. It is independent of origin but not independent of scale.
2. Standard deviation is always greater than or equal to zero.
3. It is the least of all root-mean-square deviations.
4. Suppose the mean of N1 values is X̄1 and that of N2 values is X̄2, and the
standard deviations of the N1 and N2 values are σ1 and σ2 respectively.
Then the combined standard deviation of both sets of values is given by:

$$\text{Variance} = \frac{N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)}{N_1 + N_2}$$

$$\text{Standard Deviation } (\sigma) = \sqrt{\text{Variance}}$$

Where, d1 = X̄1 – X̄12, d2 = X̄2 – X̄12, X̄12 being the combined mean of the N1
and N2 values.
4.11.2 Combined standard deviation
Suppose we have different samples of sizes N1, N2 having means X̄1, X̄2
and standard deviations σ1, σ2. Then the combined standard deviation can be
computed by using the following formula.

$$\sigma^2 (N_1 + N_2) = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)$$

d1 = X̄1 – X̄12, d2 = X̄2 – X̄12, X̄12 being the combined mean of the N1 and N2
values
The table 4.53 depicts the merits and demerits of Standard Deviation.
Table 4.53: Merits and Demerits of Standard Deviation

Merits | Demerits
It is rigidly defined. | It is difficult to understand.
It is based on all values. | It gives undue weightage for extreme values.
It is capable of further algebraic treatment. | It cannot be calculated for classes with open end interval.
It is not very much affected by sampling fluctuations. |


Solved Problem 54
Find the Standard Deviation of (Rs.) 7, 9, 16, 24, 26
Solution:
Table 4.54: Computation of Standard Deviation
Variate X (Rs.) | (X – X̄) = (X – 16.40) | (X – X̄)²
7 | -9.4 | 88.36
9 | -7.4 | 54.76
16 | -0.4 | 0.16
24 | 7.6 | 57.76
26 | 9.6 | 92.16
ΣX = 82 | | Σ(X – X̄)² = 293.20

$$\bar{X} = \frac{\sum X}{N} = \frac{82}{5} = \text{Rs. } 16.40$$

$$\text{Variance} = \frac{\sum (X - \bar{X})^2}{N} = \frac{293.20}{5} = 58.64$$

$$\text{Standard Deviation } (\sigma) = \sqrt{\text{Variance}} = \sqrt{58.64} = \text{Rs. } 7.66$$
Hence the standard deviation is Rs. 7.66
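The actual-mean method of Solved Problem 54 can be sketched in Python. The function name `variance_about_mean` is hypothetical. The divisor is N, as in the formulas above, which corresponds to the population standard deviation; `statistics.pstdev` from the standard library uses the same divisor and serves here only as a cross-check.

```python
from math import sqrt
from statistics import pstdev


def variance_about_mean(data):
    """Mean of the squared deviations from the arithmetic mean (divisor N)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


amounts = [7, 9, 16, 24, 26]

variance = variance_about_mean(amounts)
sd = sqrt(variance)
print(round(variance, 2), round(sd, 2))   # 58.64 7.66
print(round(pstdev(amounts), 2))          # 7.66
```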
Solved Problem 55
Find the Standard Deviation from the following:
Table 4.55: Frequency table
Weight (K.g) 44-46 46-48 48-50 50-52 52-54 Total
No. of persons 3 24 27 21 5 80

Solution
Table 4.55a: Computation of Standard Deviation
Weight (kg) | Mid point X | Frequency f | d = (X – 49)/2 | fd | fd²
44-46 | 45 | 3 | -2 | -6 | 12
46-48 | 47 | 24 | -1 | -24 | 24
48-50 | 49 | 27 | 0 | 0 | 0
50-52 | 51 | 21 | 1 | 21 | 21
52-54 | 53 | 5 | 2 | 10 | 20
Σf = 80 | Σfd = 1 | Σfd² = 77

$$\text{Variance} = \left[\frac{\sum f d^2}{\sum f} - \left(\frac{\sum f d}{\sum f}\right)^2\right] \times i^2 = \left[\frac{77}{80} - \left(\frac{1}{80}\right)^2\right] \times 2^2 = 3.8494$$

$$\text{Standard Deviation } (\sigma) = \sqrt{\text{Variance}} = \sqrt{3.8494} = 1.96 \text{ kg}$$

Hence the standard deviation is 1.96 kg
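The step-deviation calculation of Solved Problem 55 can be sketched in Python. The function name `step_deviation_sd` is hypothetical; the assumed mean A = 49 and class width i = 2 are the values used in table 4.55a.

```python
from math import sqrt


def step_deviation_sd(mid_points, freqs, assumed_mean, width):
    """Variance and standard deviation of a continuous series.

    Uses d = (X - A) / i and
    variance = [sum(f d^2)/N - (sum(f d)/N)^2] * i^2.
    """
    n = sum(freqs)
    steps = [(x - assumed_mean) / width for x in mid_points]
    sum_fd = sum(f * d for f, d in zip(freqs, steps))
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))
    variance = (sum_fd2 / n - (sum_fd / n) ** 2) * width ** 2
    return variance, sqrt(variance)


mid_points = [45, 47, 49, 51, 53]
freqs = [3, 24, 27, 21, 5]

variance, sd = step_deviation_sd(mid_points, freqs, assumed_mean=49, width=2)
print(round(variance, 4), round(sd, 2))   # 3.8494 1.96
```

The result does not depend on the choice of assumed mean; a convenient A only keeps the intermediate numbers small.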


Solved Problem 56
The average weight of 100 apples from area “A” is 150 gm with standard
deviation of 10 gm. Similarly the average weight of 200 apples from area “B”
is 200 gm with standard deviation of 15 gm. Find the combined standard
deviation.
Solution
Given that:
N1 = 100, N2 = 200, X̄1 = 150, X̄2 = 200
σ1 = 10, σ2 = 15

$$\text{Combined mean } \bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} = \frac{100 \times 150 + 200 \times 200}{100 + 200} = \frac{15000 + 40000}{300} = \frac{55000}{300} = 183.33 \text{ gm}$$

d1 = X̄1 – X̄12 = (150 – 183.33) = –33.33, so d1² = (–33.33)² = 1110.889
d2 = X̄2 – X̄12 = (200 – 183.33) = 16.67, so d2² = (16.67)² = 277.8889

$$\sigma^2 (N_1 + N_2) = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)$$

$$\sigma^2 (100 + 200) = 100\,(10^2 + 1110.889) + 200\,(15^2 + 277.8889)$$

$$300\,\sigma^2 = 121088.9 + 100577.78$$

$$\sigma^2 = 738.8889$$

$$\sigma = 27.18$$
Hence, the combined standard deviation is 27.18 gm.


Solved Problem 57
The means of two samples of sizes 50 and 100 are 54.1 and 50.3
respectively, and their standard deviations are 8 and 7 respectively. Obtain
the Standard Deviation for the combined group.
Solution
N1 = 50, X̄1 = 54.1, σ1 = 8
N2 = 100, X̄2 = 50.3, σ2 = 7

$$\text{Combined mean } \bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} = \frac{(50 \times 54.1) + (100 \times 50.3)}{50 + 100}$$

X̄12 = 51.56

d1 = X̄1 – X̄12 = (54.1 – 51.56) = 2.54, so d1² = 6.45
d2 = X̄2 – X̄12 = (50.3 – 51.56) = –1.26, so d2² = 1.587

$$\sigma^2 (N_1 + N_2) = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)$$

$$\sigma^2 (150) = 50\,(8^2 + 6.45) + 100\,(7^2 + 1.58)$$

$$150\,\sigma^2 = 8580.5$$

$$\sigma^2 = 57.2033$$

$$\sigma = 7.5632$$
Hence the standard deviation for the combined group is 7.5632.
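The combined mean and combined standard deviation can be sketched in Python. The function name `combine_groups` is hypothetical and the code is only an illustration of the formula σ²(N1 + N2) = N1(σ1² + d1²) + N2(σ2² + d2²). It carries the combined mean at full precision, so the last decimals can differ slightly from a hand calculation that rounds intermediate values.

```python
from math import sqrt


def combine_groups(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two groups."""
    total = n1 + n2
    combined_mean = (n1 * mean1 + n2 * mean2) / total
    d1 = mean1 - combined_mean      # deviation of group 1 mean from combined mean
    d2 = mean2 - combined_mean      # deviation of group 2 mean from combined mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / total
    return combined_mean, sqrt(variance)


combined_mean, combined_sd = combine_groups(50, 54.1, 8, 100, 50.3, 7)
print(round(combined_mean, 2), round(combined_sd, 2))   # 51.57 7.56
```

With the unrounded combined mean, the standard deviation comes to about 7.56, which agrees with the worked solution to two decimal places.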
Solved Problem 58
The mean wage is Rs. 75 per day, SD wage is Rs. 5 per day for a group of
1000 workers and the same is Rs. 60 and Rs. 4.5 for the other group of
1500 workers. Find mean and standard deviation for the entire group.
Solution
We have by data, X̄1 = 75, σ1 = 5, N1 = 1000
X̄2 = 60, σ2 = 4.5, N2 = 1500

Combined Mean:

$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} = \frac{1000 \times 75 + 1500 \times 60}{1000 + 1500} = 66$$

Combined Standard deviation:
d1 = X̄1 – X̄12 = (75 – 66) = 9, so d1² = 81
d2 = X̄2 – X̄12 = (60 – 66) = –6, so d2² = 36

$$(N_1 + N_2)\,\sigma^2 = N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)$$

$$(1000 + 1500)\,\sigma^2 = 1000\,(5^2 + 81) + 1500\,(4.5^2 + 36)$$

$$2500\,\sigma^2 = 190375$$

$$\sigma^2 = 76.15$$

$$\sigma = 8.73$$

4.12 Coefficient of Variation


In this section, we will discuss the coefficient of variation. When we want to
compare two different sets of values pertaining to different characteristics or
pertaining to same characteristic, then we use coefficient of variation (CV). It
is a relative measure expressed in percentage and is defined as:
$$\text{Coefficient of Variation (CV)} = \frac{\text{S.D.}}{\text{Mean}} \times 100$$

$$CV = \frac{\sigma}{\bar{X}} \times 100$$
It is used to compare the homogeneity or stability or uniformity or
consistency of two or more data sets. A low value of coefficient of variation
indicates a low degree of variation.
Solved Problem 59
From the prices of shares X and Y given in table 4.56, state which share is
more stable in its value.
Table 4.56: Prices of shares X and Y
X 55 54 52 53 56 58 52 50 51 49
Y 108 107 105 105 106 107 104 103 104 101

Solution:
Table 4.56a
Prices of shares (X) | (X – X̄) = (X – 53) | (X – X̄)² | Prices of shares (Y) | (Y – Ȳ) = (Y – 105) | (Y – Ȳ)²
55 | 2 | 4 | 108 | 3 | 9
54 | 1 | 1 | 107 | 2 | 4
52 | -1 | 1 | 105 | 0 | 0
53 | 0 | 0 | 105 | 0 | 0
56 | 3 | 9 | 106 | 1 | 1
58 | 5 | 25 | 107 | 2 | 4
52 | -1 | 1 | 104 | -1 | 1
50 | -3 | 9 | 103 | -2 | 4
51 | -2 | 4 | 104 | -1 | 1
49 | -4 | 16 | 101 | -4 | 16
ΣX = 530 | | Σ(X – X̄)² = 70 | ΣY = 1050 | | Σ(Y – Ȳ)² = 40

$$\bar{X} = \frac{\sum X}{N} = \frac{530}{10} = 53$$

$$\bar{Y} = \frac{\sum Y}{N} = \frac{1050}{10} = 105$$

$$\sigma_x = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{70}{10}} = 2.64$$

$$\sigma_y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{N}} = \sqrt{\frac{40}{10}} = 2$$

$$CV_x = \frac{\sigma_x}{\bar{X}} \times 100 = \frac{2.64}{53} \times 100 = 4.98\%$$

$$CV_y = \frac{\sigma_y}{\bar{Y}} \times 100 = \frac{2}{105} \times 100 = 1.90\%$$

Y shares are more stable in value than X shares since the coefficient of
variation of Y shares is lower than the coefficient of variation of X shares.
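The comparison of the two shares can be sketched in Python. The function name `coefficient_of_variation` is hypothetical; `mean` and `pstdev` (standard deviation with divisor N) come from the standard `statistics` module. Because the code keeps the standard deviation of X unrounded (about 2.646), it reports roughly 4.99% where the hand calculation with 2.64 gives 4.98%.

```python
from statistics import mean, pstdev


def coefficient_of_variation(data):
    """CV = (standard deviation / mean) x 100, using the divisor-N deviation."""
    return pstdev(data) / mean(data) * 100


share_x = [55, 54, 52, 53, 56, 58, 52, 50, 51, 49]
share_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coefficient_of_variation(share_x)
cv_y = coefficient_of_variation(share_y)
print(round(cv_x, 2), round(cv_y, 2))

more_stable = "Y" if cv_y < cv_x else "X"
print(more_stable)   # Y
```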


Solved Problem 60
Goals scored by two teams A and B in a football season are shown in
table 4.57. By calculating the coefficient of variation for each team, find
which team may be considered more consistent.
Table 4.57: Goals scored by 2 teams A and B
No. of goals     No. of matches
(X)              A-team (f)     B-team (f)
0                27             17
1                9              9
2                8              6
3                5              5
4                4              3
                 N = 53         N = 40

Solution
Using the data in table 4.57, we can compute the coefficient of variation
for each team and identify the more consistent one.
Table 4.57a: Computation of Coefficient of Variation
Team (A)     Team (B)     Team (A)      Team (B)
fX           fX           fX²           fX²
0            0            0             0
9            9            9             9
16           12           32            24
15           15           45            45
16           12           64            48
ΣfX = 56     ΣfX = 48     ΣfX² = 150    ΣfX² = 126

$\bar{X}_A = \dfrac{\sum fX}{N} = \dfrac{56}{53} = 1.056$
$\bar{X}_B = \dfrac{\sum fX}{N} = \dfrac{48}{40} = 1.2$
$\sigma_A^2 = \dfrac{\sum fX^2}{N} - (\bar{X}_A)^2 = \dfrac{150}{53} - (1.056)^2 = 1.715$
$\sigma_A = \sqrt{1.715} \approx 1.30$
$\sigma_B^2 = \dfrac{\sum fX^2}{N} - (\bar{X}_B)^2 = \dfrac{126}{40} - (1.2)^2 = 1.71$
$\sigma_B = \sqrt{1.71} \approx 1.30$
$CV_A = \dfrac{\sigma_A}{\bar{X}_A} \times 100 = \dfrac{1.30}{1.056} \times 100 = 123.1\%$
$CV_B = \dfrac{\sigma_B}{\bar{X}_B} \times 100 = \dfrac{1.30}{1.2} \times 100 = 108.33\%$

Since CV_B < CV_A, team B is the more consistent team.
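The same computation for a frequency distribution can be sketched in Python. The function name `cv_from_frequencies` is an illustrative choice, and the variance uses the shortcut ΣfX²/N minus the squared mean, as in the working above. Since the code keeps full precision instead of truncating intermediate values, its percentages differ slightly from the hand computation, but the ordering of the two teams is the same.

```python
import math

def cv_from_frequencies(values, freqs):
    """CV (in percent) of a discrete frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    # sigma^2 = (sum of f * X^2) / N - mean^2
    variance = sum(f * x * x for x, f in zip(values, freqs)) / n - mean**2
    return math.sqrt(variance) / mean * 100

goals = [0, 1, 2, 3, 4]
cv_a = cv_from_frequencies(goals, [27, 9, 8, 5, 4])  # team A
cv_b = cv_from_frequencies(goals, [17, 9, 6, 5, 3])  # team B

print(f"CV_A = {cv_a:.1f}%, CV_B = {cv_b:.1f}%")
print("More consistent team:", "B" if cv_b < cv_a else "A")
```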


Solved Problem 61
A student while computing the coefficient of variation obtained the mean and
Standard deviation values of 100 observations equal to 40 and 5.1
respectively. It was later discovered that he had wrongly copied an
observation as 50 instead of 40. Calculate the correct coefficient of
variation.
Solution
$\bar{X} = \dfrac{\sum X}{N}$, i.e. $40 = \dfrac{\sum X}{100}$
∴ $\sum X$ (Incorrect) $= 4000$
Now, Correct $\sum X = 4000 - 50 + 40 = 3990$
∴ Correct $\bar{X} = \dfrac{3990}{100} = 39.9$
Let us consider $\sigma^2 = \dfrac{\sum X^2}{N} - (\bar{X})^2$
$5.1^2 = \dfrac{\sum X^2}{100} - (40)^2$
i.e., $\dfrac{\sum X^2}{100} = 40^2 + 5.1^2 = 1626.01$
∴ $\sum X^2$ (Incorrect) $= 100 \times 1626.01 = 162601$
Now, Correct $\sum X^2 = 162601 - (50)^2 + (40)^2 = 161701$
∴ Correct $\sigma^2 = \dfrac{\text{Correct } \sum X^2}{N} - (\text{Correct } \bar{X})^2$
i.e., Correct $\sigma^2 = \dfrac{161701}{100} - (39.9)^2 = 25$, so $\sigma = 5$
Coefficient of variation $= \dfrac{\sigma}{\bar{X}} \times 100 = \dfrac{5}{39.9} \times 100 = 12.53\%$
Hence, the correct coefficient of variation = 12.53%
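The correction procedure used in this solution (recover ΣX and ΣX² from the reported mean and S.D., swap the wrong observation for the right one, then recompute) can be sketched as a small Python function. The name `corrected_mean_sd` and its parameters are hypothetical, chosen only for this illustration.

```python
import math

def corrected_mean_sd(n, mean, sd, wrong, right):
    """Correct a mean and population S.D. after one miscopied observation."""
    # Recover the (incorrect) totals, then swap the wrong value for the right one
    total = n * mean - wrong + right
    total_sq = n * (sd**2 + mean**2) - wrong**2 + right**2
    new_mean = total / n
    new_variance = total_sq / n - new_mean**2
    return new_mean, math.sqrt(new_variance)

m, s = corrected_mean_sd(100, 40, 5.1, wrong=50, right=40)
cv = s / m * 100
print(round(m, 2), round(s, 2), round(cv, 2))  # 39.9 5.0 12.53
```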

Self Assessment Questions


14. State whether the following statements are ‘True’ or ‘False’
i. Standard deviation is based on all the values.
ii. Standard deviation of a set of values is increased if every value of
the set is increased by a constant.
iii. Standard deviation can be calculated for distributions with open-end
classes.
iv. Coefficient of variation can be used to compare the variability of two
sets of data measuring the same characteristics.

Activity
1. The average rainfall of a city from Monday to Saturday is 0.3 inches.
Due to heavy rainfall on Sunday, the average rainfall for the week
increased to 0.5 inches. What was the rainfall on Sunday?
2. The average salary of male employees in a firm was Rs. 520 and that
of females was Rs. 420. The mean salary of all the employees as a
whole is Rs. 500. Find the percentage of male and female
employees.
3. For the frequency table given in table 4.58, find the missing
frequencies. The average number of accidents is 1.46 and N = 200.

Table 4.58: Frequency table


No. of accidents Frequency
0 46
1 ?
2 ?
3 25
4 10
5 5
4. The Arithmetic mean of two observations is 25 and their Geometric
mean is 15. Find the Harmonic mean.
5. The Geometric mean is 60 and Harmonic mean is 28.24. Find
Arithmetic mean for two observations.

Activity Solution
Solution 1
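Given: Mean rainfall, Mon to Sat = 0.3”
Mean rainfall for the whole week (Mon to Sun) = 0.5”
$\bar{X}_1 = \dfrac{\sum fX_1}{N}$, $0.3 = \dfrac{\sum fX_1}{6}$, so $\sum fX_1 = 1.8$
$\bar{X}_2 = \dfrac{\sum fX_2}{N}$, $0.5 = \dfrac{\sum fX_2}{7}$, so $\sum fX_2 = 3.5$
Rainfall on Sunday = ΣfX2 - ΣfX1
= 3.5 - 1.8
= 1.7”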
Solution 2
Given: $\bar{X}_1 = 520$, $\bar{X}_2 = 420$, $\bar{X}_{12} = 500$
N1 = No. of male employees, N2 = No. of female employees
$\bar{X}_{12} = \dfrac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$, i.e. $500 = \dfrac{(N_1 \times 520) + (N_2 \times 420)}{N_1 + N_2}$
500N1 + 500N2 = 520N1 + 420N2
500N2 - 420N2 = 520N1 - 500N1
80N2 = 20N1
N1 = 4N2
Let N1 + N2 = 100
4N2 + N2 = 100
5N2 = 100
N2 = 20% (Female)
N1 = 80% (Male)
Hence, 80% of the employees in the firm are male and 20% are female.
Solution 3
Table 4.59
No. of accidents (X)    Frequency (f)    fX
0                       46               0
1                       f1               f1
2                       f2               2f2
3                       25               75
4                       10               40
5                       5                25
                        N = 200          ΣfX = 140 + f1 + 2f2

$\bar{X} = \dfrac{\sum fX}{N}$
$1.46 = \dfrac{140 + f_1 + 2f_2}{200}$
292 = 140 + f1 + 2f2
∴ f1 + 2f2 = 152 ---- (1)
We know that N = Σf
200 = 86 + f1 + f2
f1 + f2 = 114 ---- (2)
Subtracting (2) from (1):
f2 = 38
Substituting f2 = 38 in (2): f1 + 38 = 114, so f1 = 114 - 38 = 76

Manipal University Jaipur Page No. 204


Statistics for Management Unit 4

Solution 4
Given:
AM = 25, GM = 15, HM = ?
$\text{AM} = \bar{X} = \dfrac{a + b}{2} = 25$, so $a + b = 50$
$\text{GM} = \sqrt{ab} = 15$, so $ab = (15)^2 = 225$
$\text{HM} = \dfrac{2}{\frac{1}{a} + \frac{1}{b}} = \dfrac{2ab}{a + b} = \dfrac{2 \times 225}{50}$
HM = 9
Solution 5
GM = 60, HM = 28.24, AM = ?
$\text{GM} = \sqrt{ab} = 60$, so $ab = 60^2 = 3600$
$\text{HM} = \dfrac{2ab}{a + b} = 28.24$, so $a + b = \dfrac{2 \times 3600}{28.24} = 254.95$
$\text{AM} = \bar{X} = \dfrac{a + b}{2} = \dfrac{254.95}{2} = 127.475$
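Solutions 4 and 5 both rest on the same identity for two positive observations: AM × HM = (a + b)/2 × 2ab/(a + b) = ab = GM². The Python sketch below uses that shortcut directly; the function names are illustrative and not part of the text. Dividing without intermediate rounding gives an AM of about 127.48 in Solution 5, marginally different from the 127.475 obtained after rounding a + b to 254.95.

```python
def harmonic_from(am, gm):
    """HM of two observations, from AM x HM = GM^2."""
    return gm**2 / am

def arithmetic_from(gm, hm):
    """AM of two observations, from AM x HM = GM^2."""
    return gm**2 / hm

print(harmonic_from(25, 15))                 # 9.0
print(round(arithmetic_from(60, 28.24), 2))  # 127.48
```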

4.13 Summary
Let us recapitulate the important concepts discussed in this unit:
• The measures of central tendency and measures of dispersion
summarise mass data in terms of its two important features. They are as
follows:
i. With respect to nature of data to cluster around a central value
ii. With respect to their spread from their central value
• Arithmetic mean is defined as the sum of all values divided by number of
values.
• Median of a set of values is the middle most value when the values are
arranged in the ascending order of magnitude.
• Mode is the value which has the highest frequency.
• The measures of Dispersion are as follows:
i. Range (R)
ii. Quartile Deviation (Q.D)
iii. Mean Deviation (M.D)
iv. Standard Deviation (S.D)
• Coefficient of variation is a relative measure expressed in percentage
and is defined as:
$\text{Coefficient of variation} = \dfrac{\text{S.D.}}{\text{Mean}} \times 100$

4.14 Glossary
Arithmetic mean: Sum of observations divided by number of observations.
Coefficient of variation: The ratio of standard deviation to the mean,
usually expressed in % form.
Quartiles: The values which divide an ordered array into four equal parts.
Geometric mean: The nth root of the product of ‘n’ observation.
Harmonic Mean: The reciprocal of the arithmetic mean of the reciprocals of
observations.
Inter-quartile range (IQR): The difference between the third quartile and
first quartile.
Mean or average deviation: Sum of the absolute deviations of the values
from their mean or median divided by number of values.
Median: The middle most value of a series when arranged in ascending or
descending order of magnitude.
Mode: The value which occurs with the maximum frequency, or around which
the observations tend to be most heavily concentrated.
Percentiles: Percentile values divide the distribution into 100 parts of equal
frequency.
Range: The difference between the maximum and the minimum values of
the observations.
Semi inter-quartile range: The difference between the third quartile and
first quartile divided by 2.

Standard deviation: The positive square root of the variance.
Third quartile: The third quartile is known as upper quartile as, 75% of
observations of the distribution are below it and 25% of observations of the
distribution are above it.
Variance: The sum of squares of deviations of observations from their mean
divided by number of observations.

4.15 Terminal Questions


1. In an office there are 32 employees. Their salaries per day, in Indian
rupees, are given in table 4.60. Find the mean salary per day.
Table 4.60: Salaries of 32 Employees
Salary / day (Rs.)   60   70   80   90   100   120
Employees            3    5    8    10   4     2

2. A survey of 128 smokers gave the results represented in table 4.61,
which is a frequency distribution of smokers’ daily expenses on
smoking. Find the mean expenses and standard deviation. Determine the
coefficient of variation.
Table 4.61: Survey Results of 128 Smokers
Expenditure (Rs.)   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60   60 - 70   70 - 80
No. of Smokers      23        44        35        12        9         3         2

3. For the distribution shown in table 4.62, find the median and mode.
Table 4.62: Distribution Data for Terminal Question 3
% Marks           0 - 10   10 - 20   20 - 30   30 - 40   40 - 50   50 - 60   60 - 70
No. of Students   4        9         19        20        18        7         8

4. Find the geometric mean of the distribution given in table 4.63.
Table 4.63: Distribution Data for Terminal Question 4
X   110   115   118   119   120
f   4     11    21    6     2

5. Find the harmonic mean of the distribution given in table 4.64.
Table 4.64: Distribution Data for Terminal Question 5
X   121   122   123   124   125
f   5     25    36    37    20

6. Given that, sum of upper and lower quartiles is 122 and their difference
is 23; find the quartile deviation of the series.
7. If Coefficient of Variation = 22 and S.D. = 4, find the mean.
8. The table 4.65 shows the distribution of age at the time of first delivery
of 65 women. Find mean deviation from mean and median.
Table 4.65: Distribution of Age at the Time of First Delivery of 65 Women

Age 18 – 22 22 – 26 26 – 30 30 – 34 34 – 38
Frequency 20 30 11 3 1

9. Read the data given below and find the combined mean, S.D. and
coefficient of variation.
$n_1 = 15$, $n_2 = 20$
$\bar{X}_1 = 40$, $\bar{X}_2 = 50$
$\sigma_1 = 3$, $\sigma_2 = 5$
10. Mean and Standard deviation of lengths of tails of 8 rats were found to
be 4.7 cm and 0.8 cm respectively. However, one reading was taken as
3.6 cm instead of 6.3 cm; find the corrected mean and standard
deviation.

4.16 Answers

Self Assessment Questions


1. i) True, ii) False, iii) True, iv) False
2. i) measures of central tendency
3. ii) ΣXi / n
4. (a) i) 67.57 (b). iii) weighted arithmetic mean
5. i) False, ii) False, iii) True, iv) False, v) False, vi) False, vii) True, viii)
True

6. iii) median
7. ii) coincide
8. i) Mode = 3 Median - 2 Mean
9. ii) 24
10. ii) mean=31.35
11. i) AM>GM>HM
12. i) True, ii) True, iii) False
13. i) True, ii) True
14. i) True, ii) False, iii) False, iv) True

Terminal Questions
1. Rs. 84.69
2. Mean = 31.64, Standard Deviation = 13.36, Coefficient of Variation = 42.225
3. Median = 35.25, Mode = 33.33
4. 116.7 cm
5. 123.33
6. 11.5
7. 18.18
8. 2.462
9. Combined Mean = 45.7
Combined S.D = 6.53,
Coefficient of Variation = 14.29
10. Corrected Mean = 5.0375 cm
Corrected S.D = 0.8336 cm

4.17 Case Study


Case Study 1: Consolidated Foods, Inc
Consolidated Foods, Inc. operates a chain of supermarkets in New Mexico,
Arizona and California. Table 4.66 shows a portion of the data on
dollar amounts and methods of payment for a sample of 16 customers.
The company’s managers requested the sample in order to learn about the
payment practices of the stores’ customers. In particular, managers were
interested in learning how a new credit card payment option was
related to customers’ purchase amounts.
Managerial report
Use the methods of descriptive statistics to summarise the sample data.
Provide summaries of the dollar purchase amounts for cash customers,
personal check customers and credit card customers separately. Your
report should contain the following summaries and discussion:
1. A comparison and interpretation of mean and median.
2. A comparison and interpretation of measures of variability such as range
and standard deviation.
3. The identification and interpretation of the five-number summary
for each method of payment.
Use the summary section of your report to discuss what you
have learnt about the method and amount of payments for customers of
Consolidated Foods.
Purchase amount and method of payment for a sample of 16 Consolidated
Foods’ customers are:
Table 4.66
Customer Amount ($) Method of Payment
1 28.58 Check
2 52.04 Check
3 7.41 Cash
4 11.71 Cash
5 43.79 Credit Card
6 48.95 Check
7 57.59 Check
8 27.60 Check
9 26.91 Credit Card
10 9.00 Cash
11 18.09 Cash
12 54.84 Check
13 41.10 Check
14 43.14 Check
15 3.31 Cash
16 69.77 Credit Card
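One possible starting point for the managerial report is sketched below in Python. It groups the 16 purchases in table 4.66 by method of payment and computes the summaries the report asks for. The helper name `summarise`, the use of the population standard deviation (the convention of this unit), and the "inclusive" quartile method are all assumptions made for illustration; other quartile conventions give slightly different five-number summaries.

```python
from statistics import mean, median, pstdev, quantiles

# (amount in $, method of payment) for the 16 customers in table 4.66
purchases = [
    (28.58, "Check"), (52.04, "Check"), (7.41, "Cash"), (11.71, "Cash"),
    (43.79, "Credit Card"), (48.95, "Check"), (57.59, "Check"), (27.60, "Check"),
    (26.91, "Credit Card"), (9.00, "Cash"), (18.09, "Cash"), (54.84, "Check"),
    (41.10, "Check"), (43.14, "Check"), (3.31, "Cash"), (69.77, "Credit Card"),
]

# Group the purchase amounts by method of payment
by_method = {}
for amount, method in purchases:
    by_method.setdefault(method, []).append(amount)

def summarise(amounts):
    """Descriptive summary of one group (quartile method is an assumption)."""
    q1, q2, q3 = quantiles(amounts, n=4, method="inclusive")
    return {
        "n": len(amounts),
        "mean": mean(amounts),
        "median": median(amounts),
        "range": max(amounts) - min(amounts),
        "sd": pstdev(amounts),
        "five_number": (min(amounts), q1, q2, q3, max(amounts)),
    }

for method, amounts in by_method.items():
    s = summarise(amounts)
    print(method, s["n"], round(s["mean"], 2), round(s["median"], 2),
          round(s["range"], 2), round(s["sd"], 2),
          tuple(round(v, 2) for v in s["five_number"]))
```

With only three credit card customers in the sample, the summaries for that group should be interpreted with caution.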

Case Study 2
All insurance companies offering unit-linked insurance policies charge a
certain amount of money to meet initial expenses. However, the
expense ratios vary from company to company. The
following table gives the expense ratios, including allocation charges, for
some companies and for various maturity periods.
HDFC unit Bajaj Tata AIG Kotak Safe ICICI SBI Unit Birla Sun
linked Allianz Invest Investment Prulife Plus II life
Endowment New Assurell Plan Time Regular Premier
Unit Super
Gain Regular
1 year 6.4 4.8 7.4 3.7 5.4 5.6 3.5
10 years 2.7 2.9 3.6 2.4 3.5 2.9 2.2
15 years 1.8 2.4 2.8 2.1 3.0 2.3 1.9
20 years 1.5 2.2 2.4 1.9 2.8 2.0 1.7
25 years 1.3 2.1 2.2 1.8 2.6 1.9 1.6
30 years 1.2 2.0 2.1 1.7 2.6 1.8 1.5

(Source: Economic Times dt 23rd October 2006)


Discuss the given data using measures of location and dispersion including
disparity measures.
Statistics for Management Unit 5

Unit 5 Theory of Probability


Structure:
5.1 Introduction
Objectives
Relevance
Statistics in practice
Definition of Probability
Basic terminology used in Probability Theory
Approaches to define Probability
5.2 Rules of Probability
Addition rule
Multiplication rule
5.3 Conditional Probability
5.4 Steps Involved in Solving Problems on Probability
5.5 Bayes’ Theorem
5.6 Random Variable
Mathematical expectation and variance of a Random Variable
5.7 Summary
5.8 Glossary
5.9 Terminal Questions
5.10 Answers
5.11 Case Study

5.1 Introduction
In the previous unit, you studied the measures of central tendency and the
measures of dispersion. In this unit, you will study how probability theory
is used to measure the uncertainty involved in our day-to-day lives.
Every human activity has an element of uncertainty, and uncertainty affects
the decision-making process. In daily life, you often use the word
‘probably’: probably it will rain today; probably the share price will go
up next week. Therefore, there is a need to handle uncertainty
systematically and scientifically.
Mathematicians and statisticians developed a separate field of mathematics
and named it ‘probability theory’. The theory of probability helps us to
make wiser decisions by understanding the degree of uncertainty.
Objectives:
After studying this unit, you should be able to:
• define Probability
• recognise the approaches to Probability
• apply the rules of probability for calculating different kinds of probabilities
• apply the Bayes’ probability theorem
• apply the concept of Random Variables to real life situations
5.1.1 Relevance
The partner of a consulting company joined as CEO of “Progressive
Enterprise”, which deals with manufacturing, imports and exports of
consumer electronics products. Soon after joining, he called a meeting of
the top-level executives from the head office, regional offices and
manufacturing units to deliberate on the next year’s prospects for the
company. He invited presentations from each of them about their areas of
operation and control. He felt that the presentations were hazy
and used many phrases like “quite likely”, “expected to”, “almost certain”,
“may not”, etc. After carefully listening to each of them, he announced that
another meeting would be convened after two weeks, and that in that meeting he
would expect each of them to express confidence in their statements in
quantitative terms like “90%”, as he believed that if one could not express
what one is talking about in numbers, one’s knowledge and understanding
was not adequate. He also hired the services of a consultant well versed in
the area of statistics to help the executives arrive at a quantitative
assessment of the uncertainties associated with their areas of operation. This
initiative taken by the CEO forced the executives to gather more information
and subject it to more sophisticated analysis. At the next meeting, the CEO
noted that the executives were pleased to find that they had become
much more informed and analytical in their approach.
(Source: TN Srivastava and Shailaja Rejo (2008) Statistics for Management 5th ed.,
TMH)

5.1.2 Statistics in practice
Morton International
Chicago
Morton International deals with various businesses like salt, household
products, rocket motors and specialty chemicals. Carstab Corporation, a
subsidiary of Morton International, deals specifically with chemicals and
offers a variety of chemicals designed to meet the unique specifications of its
customers. For one particular customer, Carstab produced an expensive
catalyst used in chemical processing. Some, but not all, of the lots produced
by Carstab met the customer’s specification for the product. Carstab’s
customer agreed to test each lot after receiving it and to determine whether
the catalyst would perform the desired function. Lots that did not
pass the customer’s test were returned to Carstab. Over a period
of time, Carstab found that the customer was accepting 60% of the lots and
returning 40%. In probability terms, each Carstab shipment to the customer
had a 60% probability of being accepted and a 40% probability of being
returned.
Neither Carstab nor its customer was pleased with the results. In an effort
to improve service, Carstab explored the possibility of duplicating the
customer’s test prior to shipment. However, the high cost of the special
testing equipment made this alternative infeasible. Carstab chemists then
proposed a new, relatively low-cost test designed to indicate whether a lot
would pass the customer’s test or not. What is the probability that a lot will
pass the customer’s test if it has passed the new Carstab test?
A sample of lots was produced and subjected to the new Carstab test. Only
lots that passed the new test were sent to the customer. Probability analysis
of the data indicated that if a lot passed the Carstab test, it had a 0.909
probability of passing the customer’s test and being accepted. Alternatively,
if a lot passed the Carstab test, it had only a 0.091 probability of being
returned. The probability analysis provided key supporting evidence for the
adoption and implementation of the new testing procedure at Carstab. The
new test resulted in an immediate improvement in customer service and a
substantial reduction in shipping and handling costs for returned lots.
(Source: David R Anderson, Dennis J Sweeney and Thomas A Williams, 5th ed.,
Thomson Business Information Pvt. Ltd.)

5.1.3 Definition of Probability


Probability is a numerical measure which indicates the chance of
occurrence of an event ‘A’. It is denoted by P(A). It is the ratio of the
number of outcomes favourable to the event ‘A’ (m) to the total number of
outcomes of the experiment (n). In other words:
$P(A) = \dfrac{m}{n}$
Where, ‘m’ is the number of favourable outcomes of an event ‘A’ and ‘n’ is
the total number of outcomes of the experiment.

Key statistic
The probability of event A [denoted P(A)], must lie within the interval from
0 to 1.

5.1.4 Basic terminology used in probability theory


a) Experiment
An operation that results in a definite outcome is called an experiment.
Tossing a coin is an experiment if it shows either head (H) or tail (T) on
falling. Since we anticipate an outcome of either H or T and nothing else,
tossing a coin which is likely to stand on its edge (figure 5.1) on a typical
surface is not an experiment. For a normal thin coin tossed on a hard
plane surface, the coin landing on its edge is an impossible event. In other
words, it is an event with probability zero.

Fig. 5.1: A Coin Standing on its Edge

b) Random experiment
When the outcome of an experiment cannot be predicted with certainty, then
it is called random experiment or stochastic experiment.
There are two types of experiments. They are –
(i) Deterministic experiment and (ii) Random experiment.
A deterministic experiment, when repeated under the same conditions,
results in the same outcome. It has a unique outcome.
Random experiment is an experiment which may not result in the same
outcome when repeated under the same conditions. It is an experiment
which does not have a unique outcome.

Example 1
The experiment of 'toss of a coin' is a random experiment. It is so
because when a coin is tossed the result may be 'Head' or it may be
'Tail'.
Example 2
The experiment of 'drawing a card randomly from a pack of playing
cards' is a random experiment. Here, the result of the draw may be any
one of the 52 cards.

c) Sample space
The set of all possible outcomes of a random experiment is the sample
space. The sample space is denoted by S. The outcomes of the random
experiment (elements of the sample space) are called sample points or
outcomes or cases.
A sample space with finite number of outcomes is a finite sample space. A
sample space with infinite number of outcomes is an infinite sample space.

Example 3
In tossing a coin, the outcomes are head and tail, denoted by ‘H’ and ‘T’
respectively. The sample space ‘S’ is given by: S = {H, T}.
In tossing two coins, the sample space ‘S’ is given by:
S = {HH, HT, TH, TT}
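For larger experiments, the sample space can be enumerated mechanically. The following minimal Python sketch lists the sample space for any number of coin tosses; the function name `coin_sample_space` is illustrative only.

```python
from itertools import product

def coin_sample_space(num_coins):
    """List every outcome of tossing `num_coins` coins, e.g. 'HT'."""
    return ["".join(outcome) for outcome in product("HT", repeat=num_coins)]

print(coin_sample_space(1))  # ['H', 'T']
print(coin_sample_space(2))  # ['HH', 'HT', 'TH', 'TT']
print(len(coin_sample_space(3)))  # 8
```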

Example 4
While throwing a die, the sample space is S = {1, 2, 3, 4, 5, 6}. This is a
finite sample space
Example 5
Consider the toss of a coin successively until a head is obtained. Let the
number of tosses be noted. Here, the sample space is S= {1, 2, 3, 4....}.
This is an infinite sample space

Key statistic
If the number of outcomes is finite then it is called as finite sample space,
otherwise it is called as an infinite sample space.

d) Event
An event is a subset of the sample space. Events are denoted by A, B, C, etc.
An event which does not contain any outcome is a null event (impossible
event). It is denoted by Φ. An event which has only one outcome is an
elementary event or simple event. An event which has more than one
outcome is a compound event. An event which contains all the outcomes is
equal to the sample space and is called a sure event or certain event.

Example 6
While throwing a die, A= {2, 4, 6} is an event. It is the event that the throw
results in an even number. Here, A is a compound event.
Example 7
While tossing two coins, A= {TT} is an event. It is the event that the toss
results in two tails. Here, A is a simple event.
The outcomes which belong to an event are said to be favourable to that
event. The event happens whenever the experiment results in a favourable
outcome. Otherwise, the event does not happen.
While throwing a die, the event A = {2, 4, 6} has three favourable
outcomes, namely, 2, 4 and 6. When the throw results in 2, 4 or 6, event A
occurs.

e) Equally likely events (equiprobable events)


Two or more events are equally likely if they have equal chance of
occurrence. That is, equally likely events are such that none of them have
greater chance of occurrence than the others.

Example 8
While tossing a fair coin, the outcomes 'Head' and 'Tail' are equally likely.
Example 9
While throwing a fair die, the events A={2,4,6}, B = {1,3, 5} and C={ 1,2, 3}
are equally likely.
A sample space is called an equiprobable space if the outcomes are
equally likely. For instance, the sample space S = {1, 2, 3, 4, 5, 6} of throw
of a fair die is equiprobable space because the six outcomes are equally
likely.
Example 10
In tossing an unbiased coin, getting head and tail are equally likely.

f) Mutually exclusive events (disjoint events)


Two or more events are mutually exclusive if only one of them can occur at
a time. That is, the occurrence of any of these events totally excludes the
occurrence of the other events. Mutually exclusive events cannot occur
together.
Example 11
While tossing a coin, the outcomes 'Head’ and 'Tail' are mutually
exclusive because when the coin is tossed once, the result cannot be
head as well as tail. While tossing a coin, if head falls, it prevents the
occurrence of tail and vice versa.
Example 12
While throwing a die, the events A = {2, 4, 6}, B= {3, 5} and C = {1} are
mutually exclusive.
If A is an event, A and A' are mutually exclusive. It should be noted that
intersection of mutually exclusive events is a null event.
Example 13
Consider the experiment of drawing a card, with the following events:
A: the card is a spade
B: the card is a heart
These two events are mutually exclusive as the card drawn cannot be
both a spade and a heart.
g) Exhaustive set of events
A set of events is exhaustive if one or the other of the events in the set
occurs whenever the experiment is conducted, i.e., the set of events
exhausts all the outcomes of the experiment.
The union of exhaustive events is equal to the sample space.
Equivalently, a set of events is exhaustive if each possible outcome of the
experiment belongs to at least one of the events in the set, so that the
sample points of the events together make up all the sample points of the
experiment.

Example 14
While throwing a die, the six outcomes together are exhaustive. But here,
if any one of these outcomes is left out, the remaining five outcomes are
not exhaustive.
Example 15
While throwing a die, events A = {2, 4, 6}, B = {3, 6} and C = {1, 5, 6}
together are exhaustive.

h) Complementation of an event
Let A be an event. Then, the complement of A is the event of non-occurrence of
A. It is the event constituted by the outcomes which are not favourable to A.
The complement of A is denoted by A' or Ā or A^c.
The complement of an event consists of those possible outcomes that do not
belong to the given event.
While throwing a die, if A = {2, 4, 6}, its complement is A' = {1, 3, 5}. Here, A
is the event that the throw results in an even number. A' is the event that the
throw does not result in an even number, i.e., A' is the event that the throw
results in an odd number.

i) Independent events
Two events are said to be independent of each other if the occurrence of
one does not affect the probability of occurrence of the other.
Example 16
Consider tossing of three fair coins as shown in figure 5.2. Then,
S = { HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
Let:
A be the event of getting three heads
B be the event of getting two heads
C be the event of getting one head
D be the event of not getting a head

Fig. 5.2: Tossing Three Fair Coins

Then, the outcomes for events A, B, C, and D are: A = {HHH}; B = {HHT,
HTH, THH}; C = {HTT, THT, TTH}; D = {TTT}. Events A, B, C and D are
mutually exclusive and exhaustive but not equally likely.
j) Union of events
Union of two or more events is the event of occurrence of at least one of
these events. Thus, union of two events A and B is the event of occurrence
of at least one of them. The union of A and B is denoted by A ∪ B or A+B or
(A or B).

Example 17
While tossing two coins simultaneously, let A = {HH} and B = {TT} be two
events. Then, their union is A ∪ B = {HH, TT}. Here, A is the event of
occurrence of two heads and B is the event of occurrence of two tails.
Example 18
While throwing a die, let A = {2, 4, 6}, B = {3, 6} and C = {4, 5, 6} be three
events. Then, their union is A ∪ B ∪ C = {2, 3, 4, 5, 6}.

k) Intersection of events
Intersection of two or more events is the event of simultaneous occurrence
of all these events. Thus, intersection of two events A and B is the event of
occurrence of both of them. The intersection of A and B is denoted by A∩B
or AB or (A and B).
Example 19
While tossing two coins, let A = {HH, TT}, B = {HH, HT, TH} be two
events. Then, their intersection is A∩B = {HH}.
Example 20
While throwing a die, let A = {2, 4, 6}, B = {3, 6} and C = {4, 5, 6} be three
events. Then, their intersection is A∩B∩C = {6}.
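Since events are subsets of the sample space, union, intersection and complement map directly onto set operations. The Python sketch below reproduces Examples 18 and 20 and the complement of A for a die throw; it is an illustration using built-in sets, with variable names taken from the examples.

```python
S = {1, 2, 3, 4, 5, 6}  # sample space for a throw of a die
A = {2, 4, 6}
B = {3, 6}
C = {4, 5, 6}

union = A | B | C          # at least one of A, B, C occurs
intersection = A & B & C   # A, B and C all occur
complement_of_A = S - A    # A does not occur

print(sorted(union))            # [2, 3, 4, 5, 6]
print(sorted(intersection))     # [6]
print(sorted(complement_of_A))  # [1, 3, 5]
# An event and its complement are mutually exclusive and exhaustive
print(A.isdisjoint(complement_of_A), A | complement_of_A == S)  # True True
```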

5.1.5 Approaches to define Probability


There are four approaches to define Probability. They are as follows:
1) Classical / mathematical / priori approach
2) Statistical / relative frequency / empirical / posteriori approach
3) Subjective approach
4) Axiomatic approach
1) Classical / mathematical / priori approach
Under this approach the probability of an event is known before conducting
the experiment. In this case, each of possible outcomes is associated with
equal probability of occurrence and number of outcomes favourable to the
concerned event is known.
Let a random experiment have n equally likely, mutually exclusive and
exhaustive outcomes. Let m of these outcomes be favourable to an event A.
Then, probability of A is:
$P(A) = \dfrac{\text{Number of favourable outcomes}}{\text{Total number of outcomes}} = \dfrac{m}{n}$
Where, ‘m’ is the number of favourable outcomes and ‘n’ is the total number of
outcomes of the experiment.
The following are some of the examples of events, where the probabilities of
those events are obtained by classical approach:

a) Getting a head in a toss of a coin


b) Drawing a king from a well shuffled pack of cards
c) Getting a ‘6’ in a throw of a die
This definition is applicable only when
(i) The outcomes are equally likely, mutually exclusive and exhaustive.
(ii) The number of outcomes n is finite.
Solved problem 1
Find the probability of ‘head’ in the toss of a fair coin.
Solution
The sample space is S = {H, T}. There are n = 2 equally likely, mutually
exclusive and exhaustive outcomes. One outcome, namely H is favourable
to the event A: toss results in head.
Thus, m = 1.
∴ P[head] = P(A) = m/n = ½
Solved Problem 2
Find the probability that a throw of an unbiased die results in (i) an ace
(number 1) (ii) an even number (iii) a multiple of 3.
Solution
The sample space is S = {1, 2, 3, 4, 5, 6}. There are n = 6 equally likely,
mutually exclusive and exhaustive outcomes. Let events A, B and C be —
A: throw results in an ace (number 1)
B: throw results in an even number
C: throw results in a multiple of 3
(i) Event A has one favourable outcome
∴P[ace] = P(A) = m/n = 1/6
(ii) Event B has 3 favourable outcomes, namely, 2, 4 and 6.
∴P [even number] = P(B) = m/n = 3/6= 1/2
(iii) Event C has 2 favourable outcomes, namely, 3 and 6
∴ P[multiple of 3] = P(C) = m/n = 2/6 = 1/3
Solved Problem 3
A bag contains 3 white, 4 red and 2 green balls. One ball is selected at
random from the bag. Find the probability that the selected ball is
(i) white (ii) non-white (iii) white or green


Solution
The bag has a total of 9 balls. Since the ball drawn can be any one of them,
there are 9 equally likely, mutually exclusive and exhaustive outcomes. Let
events A, B and C be
A: selected ball is white
B: selected ball is non-white
C: selected ball is white or green
(i) There are 3 white balls in the bag. Therefore, out of the 9 outcomes, 3
are favourable to event A.
∴P [white ball] = P (A) = 3/9 = 1/3
(ii) Event B is the complement of event A. Therefore,
∴ P[non-white ball] = P(B) = 1 - P(A) = 1 – 1/3 = 2/3
(iii) There are 3 white balls and 2 green balls in the bag. Therefore, out of
9 outcomes, 5 are either white or green.
∴ P[white ball or green ball] = P(C) = 5/9
Solved Problem 4
One card is drawn from a well-shuffled pack of playing cards. Find the
probability that the card drawn (i) is a heart (ii) is a king (iii) belongs to red
suit (iv) is a king or a queen (v) is a king or a heart.
Solution
A pack of playing cards has 52 cards. There are four suits, namely, spade,
club, heart and diamond (dice). In each suit, there are thirteen
denominations - ace (1), 2, 3,…, 10, jack (knave), queen and king.
A card selected at random may be any one of the 52 cards. Therefore, there
are 52 equally likely, mutually exclusive and exhaustive outcomes. Let
events A, B, C, D and E be —
A: selected card is a heart
B: selected card is a king
C: selected card belongs to a red suit
D: selected card is a king or a queen
E: selected card is a king or a heart
(i) There are 13 hearts in a pack. Therefore, 13 outcomes are favourable
to event A.
∴ P [Heart] = P(A) =13/52 = 1/4

(ii) There are 4 kings in a pack. Therefore, 4 outcomes are favourable to
event B.
∴P[King] = P(B)=4/52 =1/13
(iii) There are 13 hearts and 13 diamonds in a pack. Therefore, 26
outcomes are favourable to event C.
∴ P [Red card] = P(C) =26/52 = 1/2
(iv) There are 4 kings and 4 queens in a pack. Therefore, 8 outcomes are
favourable to event D.
∴ P[King or Queen] = P(D) = 8/52 = 2/13
(v) There are 4 kings and 13 hearts in a pack. Among these, one card is
heart-king. Therefore, (4+13-1) = 16 outcomes are favourable to
event E.
∴ P[King or Heart] = P(E) =16/52 = 4/13
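The counts in Solved Problem 4 can be checked by listing all 52 cards and applying P(A) = m/n directly. The following is a minimal Python sketch; the names DECK and classical_probability are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Illustrative names: the 4 suits and 13 denominations of a standard pack.
SUITS = ["spade", "club", "heart", "diamond"]
RANKS = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely outcomes


def classical_probability(event):
    """Return m/n, where m counts the cards for which event(card) is true."""
    m = sum(1 for card in DECK if event(card))
    return Fraction(m, len(DECK))


p_heart = classical_probability(lambda c: c[1] == "heart")
p_king = classical_probability(lambda c: c[0] == "king")
p_red = classical_probability(lambda c: c[1] in ("heart", "diamond"))
p_king_or_queen = classical_probability(lambda c: c[0] in ("king", "queen"))
p_king_or_heart = classical_probability(
    lambda c: c[0] == "king" or c[1] == "heart")

print(p_heart, p_king, p_red, p_king_or_queen, p_king_or_heart)
# 1/4 1/13 1/2 2/13 4/13
```

Because the king of hearts satisfies both conditions but is counted once, the last result agrees with the (4 + 13 - 1) = 16 favourable outcomes used in the solution.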
Solved Problem 5
A bag contains 8 tickets which are marked with the numbers 1, 2, 3,.. 8.
Find the probability that a ticket drawn at random from the bag is marked
with (i) an even number (ii) a multiple of 3.

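Solution
Since any one of the 8 tickets may be drawn, there are 8 equally likely,
mutually exclusive and exhaustive outcomes. Let events A and B be:
A: selected number is even
B: selected number is a multiple of 3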
(i) Four of the selections, namely, 2, 4, 6 and 8 are favourable to event A
∴ P [even number] = P(A) = 4/8 = 1/2
(ii) Two of the selections, namely, 3 and 6 are favourable to event B
∴ P[multiple of 3] = P(B) = 2/8 = 1/4
2) Statistical / relative frequency / empirical / a posteriori approach
Under this approach, the probability of an event is arrived at after
conducting an experiment. If we want to know the probability that a
particular household in an area will have two earning members, then we have
to gather data on the households in that area and then arrive at the
probability. The greater the number of households surveyed, the greater will
be the accuracy of the probability arrived at.
The probability of an event ‘A’ in this case is defined as:

P(A) = lim (n → ∞) m/n

where ‘n’ is the number of trials (observations) and ‘m’ is the number of
trials in which ‘A’ occurs.
In real life, it is often not possible to conduct such experiments because of
high cost, because the experiments are destructive, or because of the vast
area to be covered.
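The idea that the ratio m/n settles down as n grows can be illustrated with a simulated fair coin. This is only a sketch: the function name relative_frequency and the fixed seed are assumptions made for reproducibility, and a simulation illustrates the limit rather than proving it.

```python
import random


def relative_frequency(n_trials, seed=0):
    """Estimate P(head) as m/n from n_trials simulated tosses of a fair coin."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    m = sum(rng.random() < 0.5 for _ in range(n_trials))  # number of heads
    return m / n_trials


for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

For small n the estimate can be far from 0.5; for large n it is typically close to 0.5, which is the behaviour the definition describes.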
3) Subjective approach
Under this approach the investigator or researcher assigns probability to the
events either from his experience or from past records. It is more suitable
when the sample size is ten or less than ten. The investigator has full
knowledge about the characteristics of each and every individual. However,
there is a chance of personal bias being introduced in such probability.
4) Axiomatic approach
Let S be a sample space consisting of all events of a random experiment
and A ⊆ S. Then the probability of an event A is a set function satisfying
the following axioms:
i) Axiom of positivity: P(A) ≥ 0
ii) Axiom of certainty: P(S) = 1
iii) Axiom of addition: P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An),
where A1, A2, …, An is a sequence of disjoint events of the sample space.

Self Assessment Questions
1. To which approach do the following probability estimates belong:
i. Probability that India will win the game
ii. Probability that Mr. Ram will resign from the post
iii. Probability of drawing a red card
iv. Probability that you will go to America this year

5.2 Rules of Probability


In this section, we will discuss the rules of probability. Managers very often
come across situations where they have to take decisions about
implementing either course of action A or course of action B or course of
action C. Sometimes, they have to take decisions regarding the
implementation of both A and B.

Example 19
A sales manager may like to know the probability that he will exceed the
target for product A or product B. Sometimes, he would like to know the
probability that the sales of product A and B will exceed the target. The
first type of probability is answered by addition rule. The second type of
probability is answered by multiplication rule.
5.2.1 Addition rule
The addition rule of probability states that:
i) If ‘A’ and ‘B’ are any two events, then the probability of the occurrence
of either ‘A’ or ‘B’ is given by:
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
ii) If ‘A’ and ‘B’ are two mutually exclusive events, then the probability of
occurrence of either ‘A’ or ‘B’ is given by:
P(A ∪ B) = P(A) + P(B)
iii) If ‘A’, ‘B’ and ‘C’ are any three events, then the probability of occurrence
of either ‘A’ or ‘B’ or ‘C’ is given by:
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) – P(A ∩ B) – P(B ∩ C) – P(A ∩ C) + P(A ∩ B ∩ C)

In terms of Venn diagrams, figure 5.3 shows A ∪ B for two events ‘A’ and ‘B’
that can occur together (they overlap, so they are not mutually exclusive);
the probability of occurrence of either ‘A’ or ‘B’ then follows rule (i).
Figure 5.4 shows A ∪ B for two mutually exclusive events ‘A’ and ‘B’, for
which rule (ii) applies. Figure 5.5 shows A ∪ B ∪ C for three overlapping
events ‘A’, ‘B’ and ‘C’, for which rule (iii) applies.

Fig. 5.3: A ∪ B for two overlapping events A and B
Fig. 5.4: A ∪ B for two mutually exclusive events A and B
Fig. 5.5: A ∪ B ∪ C for three overlapping events A, B and C

iv) If A1, A2, A3, …, An are ‘n’ mutually exclusive and exhaustive events,
then the probability of occurrence of at least one of them is given by:
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)

5.2.2 Multiplication rule


If ‘A’ and ‘B’ are two independent events, then the probability of occurrence
of ‘A’ and ‘B’ is given by:
P(A ∩ B) = P(A) × P(B)
Solved Problem 6
i) Show that P(A) = 1 – P(A')
ii) Show that probability is a value between 0 and 1.
iii) Show that P(Ф) = 0 where Ф is null event.
Solution
(i) If A and A' are complementary events, A ∪ A' = S
By the axiom 2, P(S) = 1. And so, P(A ∪ A') = 1 .... Result 1
But A and A' are mutually exclusive events. Therefore, by the axiom 3,
P(A ∪ A') = P(A) + P(A') …Result 2
By the results 1 and 2, P(A) + P(A') = 1
That is, P(A) = 1 – P(A')
(ii) Let A be an event. Then, by the axiom 1,
P(A) ≥0 ....Result 1
If A' is the complementary event of A,
P(A') = 1 – P(A)
But, by axiom 1, P(A') ≥0
Therefore, 1 - P(A) ≥ 0
Hence, P(A) ≤ 1 …Result 2
By the results 1 and 2, 0 ≤ P(A) ≤ 1, i.e., the probability is a value between 0
and 1.
(iii) If A is an event and if Φ is a null event, A ∪ Φ = A
∴ P(A ∪ Φ) = P(A) ….. Result 1
But, A and Φ are mutually exclusive. Therefore,
P(A ∪ Φ) = P(A) + P(Φ) ….. Result 2
By the results 1 and 2, P(A) + P(Φ) = P(A)
That is, P(Φ) = P(A) – P(A) = 0
Solved Problem 7
Write down the sample space for each of the following random experiments.
(i) A coin is tossed three times and the result of each throw is noted,
(ii) A coin is tossed three times and the number of heads obtained is
noted,
(iii) A tetrahedron (a solid with four triangular surfaces) whose sides are
painted red, red, blue and green is thrown. The colour of the side
which touches the ground is noted.
(iv) Blood of husband and wife are tested and the blood group (whether
O, A, B or AB) in each case is identified.
(v) A person is randomly selected and his religion is noted.
Solution
(i) S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
(ii) S= {0, 1, 2, 3}
(iii) S = {red, blue, green}
(iv) S = {(O,O), (O, A), (O, B), (O, AB), (A, O), (A, A), (A, B), (A, AB), (B,
O), (B, A), (B, B), (B, AB), (AB, O), (AB,A), (AB,B), (AB,AB)}
(v) S = {Hindu, Christian, Muslim, Jain, Sikh, Jew, ....}
Solved problem 8
i. Given the equiprobable sample space S = {1, 2, 3, 4, 5, 6} and the
event A = {1, 3, 5}, find P (A)
ii. Given the sample space S = {1, 2, 3, 4, 5, 6} and the events A = {1, 3,
5} and B = {2, 4, 6}. If P(A) = 1/3 find P(B)
iii. If S = {E1, E2} is the sample space and if P(E1) = 0.3, find P(E2)
Solution
(i) Since the sample space is equiprobable, the mathematical definition
can be used for finding the probability.
∴ P(A) = Number of favourable outcomes / Total number of outcomes = 3/6 = 1/2
(ii) Here, A and B are complementary events.
∴ P(B) = 1 – P(A) = 1 – 1/3 = 2/3
(iii) Here, E1 and E2 are complementary events.
∴ P(E2) = 1 – P(E1) = 1 – 0.3 = 0.7

5.3 Conditional Probability


In this section, we will discuss the conditional probability. Sometimes, we
wish to know the probability that the price of a particular petroleum product
will rise, given that the finance minister has increased the petrol price. Such
probabilities are known as conditional probabilities.
Thus, the conditional probability of occurrence of an event ‘A’, given that
the event ‘B’ has already occurred, is denoted by P(A/B). Here, ‘A’ and ‘B’
are dependent events. Therefore, we have the following rules.
If ‘A’ and ‘B’ are dependent events, then the probability of occurrence of
‘A and B’ is given by:
P(A ∩ B) = P(A) × P(B/A) = P(B) × P(A/B)
It follows that:

P(A/B) = P(A ∩ B) / P(B)

P(B/A) = P(A ∩ B) / P(A)

For any bivariate distribution, there exist two marginal distributions and
‘m + n’ conditional distributions, where ‘m’ and ‘n’ are the numbers of
classifications/characteristics studied on the two variables.
Example 20
Consider the example of a librarian who analysed the type of visitors and
their choice of library section. The data is represented in table 5.1
Table 5.1: Bivariate Distribution
Type of visitors                          Sections
(level of education)   Newspaper   Magazine   Novel (story)   Subject   Total
Under Graduates            50         100          120           50      320
Graduates                  70          90           50          100      310
Post Graduates            100          60           30          150      340
Total                     220         250          200          300      970

We can get the following distributions:


i) The table 5.1a represents the distribution of level of education
irrespective of the sections. Therefore, it is called a marginal
distribution.
Table 5.1a: Marginal Distribution of Level of Education Irrespective of
their Sections
Type of Visitors    Total
Undergraduates       320
Graduates            310
Post graduates       340
Total                970

ii) The table 5.1b represents the distribution of people in sections
irrespective of their educational levels. It is another marginal
distribution. Thus, there are two marginal distributions for bivariate
data, the variables being sections and level of education.
Table 5.1b: Marginal Distribution of People Irrespective of their
Educational Levels
Sections   Newspaper   Magazine   Novels   Subjects   Total
Total         220         250       200       300      970

iii) The table 5.1c represents the distribution of people in sections given
that they are under graduates. Therefore, it is a conditional distribution.
Table 5.1c: Conditional Distribution
Level of education   Newspaper   Magazine   Novels   Subjects   Total
Under graduate           50         100       120        50      320

Thus, for any bivariate distribution having ‘m’ and ‘n’ classifications, there
exist two marginal distributions and ‘m + n’ conditional distributions. In this
case there are 3 + 4 = 7 conditional distributions.
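The marginal totals and a conditional distribution can be recomputed from the cell counts of table 5.1. The sketch below is illustrative only; the variable names are assumptions, and the conditional distribution is expressed as proportions (for example, P(Novel / Under Graduate) = 120/320) rather than as counts.

```python
from fractions import Fraction

# Cell counts from table 5.1 (rows: level of education, columns: sections).
sections = ["Newspaper", "Magazine", "Novel", "Subject"]
counts = {
    "Under Graduates": [50, 100, 120, 50],
    "Graduates": [70, 90, 50, 100],
    "Post Graduates": [100, 60, 30, 150],
}

# Marginal distribution of level of education (row totals).
row_totals = {level: sum(row) for level, row in counts.items()}
# Marginal distribution of sections (column totals).
column_totals = dict(zip(sections, (sum(col) for col in zip(*counts.values()))))
grand_total = sum(row_totals.values())

# Conditional distribution of sections, given that the visitor is an
# under graduate: each cell divided by the row total.
given_ug = {
    section: Fraction(count, row_totals["Under Graduates"])
    for section, count in zip(sections, counts["Under Graduates"])
}

print(row_totals)
print(column_totals)
print(grand_total)
print(given_ug["Novel"])  # 3/8
```

One conditional distribution exists for each of the 3 rows and each of the 4 columns, which gives the 3 + 4 = 7 conditional distributions mentioned above.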

5.4 Steps Involved in Solving Problems on Probability


In this section, we will discuss the steps involved in solving problems on
probability. The figure 5.6 gives the explanation of steps involved in solving
problems on probability.

Figure 5.6: Steps Involved in Solving Problems on Probability

Solved problem 9
Calculate nCr for the following values of ‘n’ and ‘r’:
i. n = 10 and r = 2
ii. n = 16 and r = 3
Solution
i. 10C2 = (10 × 9) / (1 × 2) = 45
ii. 16C3 = (16 × 15 × 14) / (1 × 2 × 3) = 560
Key statistic
nCr = nC(n–r)
nC0 = nCn = 1
0! = 1

Solved Problem 10
Calculate 16C13.
Solution
16C13 = 16C(16–13) = 16C3 = 560
The value of 16C13 is 560.
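These values can be cross-checked with Python's standard library, where math.comb(n, r) returns the number of ways of choosing r items from n (available from Python 3.8). This is a short verification sketch, not part of the original solution.

```python
from math import comb, factorial

print(comb(10, 2))   # 45
print(comb(16, 3))   # 560
print(comb(16, 13))  # 560, since nCr = nC(n-r)

# The identities listed under "Key statistic":
n = 16
print(comb(n, 0), comb(n, n), factorial(0))  # 1 1 1
```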
Solved Problem 11
Find the probability of getting a head when a coin is tossed.
Solution
Let ‘A’ be the event of getting a head.

S = {H, T} ⇒ n(S) = 2

n(A) = 1
∴ P(A) = n(A)/n(S) = 1/2

Therefore, the probability of getting a head when a coin is tossed is 0.5.


Solved Problem 12
What is the probability of getting two heads when 3 coins are tossed and
what is the probability of getting at least two heads?
Solution
i) Let ‘A’ be the event of getting two heads.
S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT} ⇒ n(S) = 8
A = {HHT, HTH, THH} ⇒ n(A) = 3
∴ P(A) = n(A)/n(S) = 3/8
Therefore, the probability of getting two heads when three coins are
tossed is 3/8.
ii) Let ‘A’ be the event of getting at least two heads.
A = {HHH, HHT, HTH, THH} ⇒ n(A) = 4
∴ P(A) = n(A)/n(S) = 4/8 = 1/2

Therefore, the probability of getting at least two heads when three coins are
tossed is 1/2.
Solved Problem 13
What is the probability of getting a sum of ‘nine’ when two dice are thrown?
Solution
Let ‘A’ be the event of getting a sum of ‘nine’.
S = { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
      (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
      (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
      (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
      (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
      (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }

n(S) = 6² = 36

A = {(6, 3), (3, 6), (4, 5), (5, 4)}

n(A) = 4
∴ P(A) = n(A)/n(S) = 4/36 = 1/9
Therefore, the probability of getting a sum of ‘nine’ when two dice are
thrown is 1/9.
Solved Problem 14
What is the probability of getting at least a sum of ‘nine’ when two dice are
thrown?
Solution
Let ‘A’ be the event of getting at least a sum of nine.

n(S) = 6² = 36
A is the union of the mutually exclusive events of getting a sum of 9 or 10
or 11 or 12.

A = {(6, 3), (3, 6), (5, 4), (4, 5), (6, 4), (4, 6), (5, 5), (6, 5), (5, 6), (6, 6)} ⇒ n(A) = 10
∴ P(A) = n(A)/n(S) = 10/36 = 5/18

Therefore, the probability of getting at least a sum of ‘nine’ when two dice
are thrown is 5/18.
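The two dice results can be verified by enumerating the 36 ordered outcomes instead of listing them by hand. The following is a minimal sketch; the helper name prob is an assumption introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die): 6 x 6 = 36 outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """n(A)/n(S) for the event described by the predicate event(a, b)."""
    n_a = sum(1 for a, b in outcomes if event(a, b))
    return Fraction(n_a, len(outcomes))


p_sum_nine = prob(lambda a, b: a + b == 9)
p_at_least_nine = prob(lambda a, b: a + b >= 9)
print(p_sum_nine, p_at_least_nine)  # 1/9 5/18
```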
Solved Problem 15
A number is selected at random from the numbers 1 to 30. What is the
probability that:
i. It is divisible by either 3 or 7
ii. It is divisible by 5 or 13
Solution
i) Let ‘A’ be the event of selecting a number divisible by 3. Let ‘B’ be the
event of selecting a number divisible by 7.
n(S) = 30C1 = 30
A = {3, 6, 9, 12, 15, 18, 21, 24, 27, 30}
n(A) = 10

B = {7, 14, 21, 28}
n(B) = 4

A ∩ B = {21} ⇒ n(A ∩ B) = 1

A and B are not mutually exclusive.

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
= 10/30 + 4/30 – 1/30 = 13/30
Therefore, the probability that a number is divisible by 3 or 7 is 13/30.
ii) Let ‘A’ be the event of selecting a number divisible by 5. Let ‘B’ be the
event of selecting a number divisible by 13.
n(S) = 30C1 = 30
A = {5, 10, 15, 20, 25, 30} ⇒ n(A) = 6
B = {13, 26} ⇒ n(B) = 2

A and B are mutually exclusive.

∴ P(A ∪ B) = P(A) + P(B)
= 6/30 + 2/30 = 8/30 = 4/15
Therefore, the probability that a number is divisible by 5 or 13 is 4/15.
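The addition rule in Solved Problem 15 can be checked with Python sets, where `|` is union and `&` is intersection. This is a sketch under the same assumption as the solution, namely that each of the numbers 1 to 30 is equally likely; the helper name p is illustrative.

```python
from fractions import Fraction

numbers = range(1, 31)


def p(event):
    """Classical probability of a set of numbers drawn from 1 to 30."""
    return Fraction(len(event), len(numbers))


# Part (i): divisible by 3 or 7 (not mutually exclusive, 21 is in both).
A = {n for n in numbers if n % 3 == 0}
B = {n for n in numbers if n % 7 == 0}
print(p(A | B), p(A) + p(B) - p(A & B))  # 13/30 13/30

# Part (ii): divisible by 5 or 13 (mutually exclusive, empty intersection).
C = {n for n in numbers if n % 5 == 0}
D = {n for n in numbers if n % 13 == 0}
print(p(C | D), p(C) + p(D))  # 4/15 4/15
```

In both parts the probability computed directly from the union equals the value given by the addition rule.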
Solved Problem 16
The Board of Directors of a company wants to form a five-member quality
management committee to monitor the quality of their products. The company
has 5 scientists, 4 engineers and 6 accountants. If the members are selected
at random, find the probability that the committee will contain 2 scientists,
1 engineer and 2 accountants.

Solution
Let ‘A’ be the event of selecting 2 scientists, 1 engineer and 2 accountants.
Then,

n(S) = 15C5 = (15 × 14 × 13 × 12 × 11) / (1 × 2 × 3 × 4 × 5) = 3003

n(A) = 5C2 × 4C1 × 6C2

n(A) = (5 × 4)/(1 × 2) × 4 × (6 × 5)/(1 × 2) = 10 × 4 × 15 = 600

∴ P(A) = n(A)/n(S) = 600/3003
Therefore, the probability that the committee will contain 2 scientists,
1 engineer and 2 accountants is 600/3003.
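The counting in Solved Problem 16 can be reproduced with math.comb. The sketch assumes, as the solution does, that all 15C5 possible five-member committees are equally likely.

```python
from fractions import Fraction
from math import comb

n_s = comb(15, 5)                             # all committees of 5 from 15 people
n_a = comb(5, 2) * comb(4, 1) * comb(6, 2)    # 2 scientists, 1 engineer, 2 accountants

p_a = Fraction(n_a, n_s)
print(n_s, n_a)  # 3003 600
print(p_a)       # 200/1001, the reduced form of 600/3003
```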

Solved Problem 17
The odds favouring the event of a person hitting a target are 3 to 5. The
odds against the event of another person hitting the target are 3 to 2. If each
of them fire once at the target, find the probability that:
i) Both of them hit the target
ii) At least one of them hit the target
Solution
i) Let ‘A’ be the event of the first person hitting the target. Odds of
3 to 5 in favour mean:
∴ P(A) = 3/(3 + 5) = 3/8 (1st ratio)
Let ‘B’ be the event of the second person hitting the target. Odds of
3 to 2 against mean:
∴ P(B) = 2/(3 + 2) = 2/5 (2nd ratio)
Both hitting the target means A ∩ B, and A and B are independent.
∴ P(A ∩ B) = P(A) × P(B) = 3/8 × 2/5 = 6/40 = 3/20
Therefore, the probability that both persons hit the target is 3/20.
ii) At least one of them hitting the target means A ∪ B, with P(A) = 3/8
and P(B) = 2/5 as above. Therefore,
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

P(A ∪ B) = 3/8 + 2/5 – 3/20 = (15 + 16 – 6)/40 = 25/40 = 5/8
Therefore, the probability that at least one of the persons hits the target
is 5/8.
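The conversion from odds to probabilities used in Solved Problem 17 can be written as two small helper functions. The function names below are hypothetical and serve only to make the two conventions explicit: odds of a to b in favour give a/(a + b), while odds of a to b against give b/(a + b).

```python
from fractions import Fraction


def prob_from_odds_in_favour(a, b):
    """Odds of a to b in favour of an event: P = a / (a + b)."""
    return Fraction(a, a + b)


def prob_from_odds_against(a, b):
    """Odds of a to b against an event: P = b / (a + b)."""
    return Fraction(b, a + b)


p_a = prob_from_odds_in_favour(3, 5)   # first person hits: 3/8
p_b = prob_from_odds_against(3, 2)     # second person hits: 2/5

p_both = p_a * p_b                     # multiplication rule, independent events
p_at_least_one = p_a + p_b - p_both    # addition rule
print(p_both, p_at_least_one)          # 3/20 5/8
```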

Solved Problem 18
The probabilities that drivers A, B and C will drive home safely after
consuming liquor are 2/5, 3/7 and 3/4, respectively. What is the probability
that all of them will drive home safely after consuming liquor?
Solution
Let ‘A’ be the event of driver ‘A’ driving safely after consuming liquor. Let ‘B’
be the event of driver ‘B’ driving safely after consuming liquor. Let ‘C’ be the
event of driver ‘C’ driving safely after consuming liquor.
Given P(A) = 2/5, P(B) = 3/7, P(C) = 3/4
The events A, B, and C are independent. Therefore,
P(A ∩ B ∩ C) = P(A) × P(B) × P(C)

P(A ∩ B ∩ C) = 2/5 × 3/7 × 3/4 = 18/140 = 9/70
Therefore, the probability that all the drivers will drive home safely after
consuming liquor is 9/70.

Solved Problem 19
The probabilities that ‘A’ and ‘B’ will tell the truth are 2/3 and 4/5
respectively. What is the probability that:
i) They agree with each other
ii) They contradict each other while giving a testimony in the court.
Solution
i) Let ‘A’ be the event of A telling the truth. Let ‘B’ be the event of B
telling the truth.
Given P(A) = 2/3 ⇒ P(Aᶜ) = 1 – P(A) = 1 – 2/3 = 1/3
P(B) = 4/5 ⇒ P(Bᶜ) = 1 – P(B) = 1 – 4/5 = 1/5
Both will agree if they both tell the truth or both lie, that is,
A ∩ B or Aᶜ ∩ Bᶜ
These two events are mutually exclusive, and the events A and B are
independent. Therefore,
P(A ∩ B) + P(Aᶜ ∩ Bᶜ) = P(A) × P(B) + P(Aᶜ) × P(Bᶜ)
= 2/3 × 4/5 + 1/3 × 1/5 = 9/15 = 3/5
Therefore, the probability that both A and B agree with each other is 3/5.
ii) They will contradict each other if A tells the truth and B lies, or B
tells the truth and A lies, that is,
A ∩ Bᶜ or Aᶜ ∩ B
These two events are mutually exclusive, and the events A and B are
independent. Therefore,
P(A ∩ Bᶜ) + P(Aᶜ ∩ B) = P(A) × P(Bᶜ) + P(Aᶜ) × P(B)
= 2/3 × 1/5 + 1/3 × 4/5 = 6/15 = 2/5
Therefore, the probability that A and B contradict each other is 2/5.
Solved Problem 20
A box contains five red and four blue similar shaped balls. Two balls are
drawn at random from the box. Find the probability that both of them are red
if:

i. the balls are drawn together


ii. the balls are drawn one after the other, with replacement
iii. the balls are drawn one after the other, without replacement
Solution
i) Let ‘A’ be the event that both balls are red when two balls are drawn
together.
n(S) = 9C2 = (9 × 8)/(1 × 2) = 36
n(A) = 5C2 = (5 × 4)/(1 × 2) = 10
∴ P(A) = n(A)/n(S) = 10/36 = 5/18
Therefore, the probability that both of them are red if the balls are
drawn together is 5/18.
ii) Let ‘A’ be the event of drawing a red ball in the first draw. Let ‘B’ be the
event of drawing a red ball in the second draw. Since the first ball is
replaced, the sample space does not change and the draws are independent.
The required probability is given by:

P(A ∩ B) = P(A) × P(B) = 5/9 × 5/9 = 25/81
Therefore, the probability that both of them are red if the balls are drawn
one after the other, with replacement, is 25/81.
iii) Let ‘A’ be the event of drawing a red ball in the first draw. Let ‘B’ be
the event of drawing a red ball in the second draw. Since the first ball is
not replaced, the sample space changes for the second draw. Therefore, the
required probability is given by:

P(A ∩ B) = P(A) × P(B/A)

= 5/9 × 4/8 = 20/72 = 5/18
Therefore, the probability that both of them are red if the balls are drawn
one after the other, without replacement, is 5/18.
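The three cases of Solved Problem 20 differ only in how the second draw depends on the first. The sketch below puts them side by side; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

red, blue = 5, 4
total = red + blue

# (i) Two balls drawn together: favourable pairs over all pairs.
p_together = Fraction(comb(red, 2), comb(total, 2))

# (ii) One after the other, with replacement: independent draws.
p_with_replacement = Fraction(red, total) * Fraction(red, total)

# (iii) One after the other, without replacement: P(A) x P(B/A).
p_without_replacement = Fraction(red, total) * Fraction(red - 1, total - 1)

print(p_together, p_with_replacement, p_without_replacement)
# 5/18 25/81 5/18
```

Drawing two balls together and drawing them one after the other without replacement give the same probability, as the solution shows.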
Solved Problem 21
Box I contains 5 red and 6 blue balls. Box II contains 6 red and 4 blue balls.
A ball is drawn at random from box I and is transferred to box II. Now from
box II a ball is drawn at random. What is the probability that it is red?

Solution
A ball drawn from box I and transferred to box II could be either red or blue.
Let ‘A’ be the event of drawing a red ball from box I. Let ‘B’ be the event of
drawing a blue ball from box I. Let ‘C’ be the event of drawing a red ball from
box II.
∴ The required events are A ∩ C or B ∩ C.

The events are mutually exclusive. Therefore,

P[(A ∩ C) ∪ (B ∩ C)] = P(A ∩ C) + P(B ∩ C) = P(A) × P(C/A) + P(B) × P(C/B)
= 5/11 × 7/11 + 6/11 × 6/11 = (35 + 36)/121 = 71/121
Therefore, the required probability is 71/121.
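The calculation in Solved Problem 21 weights each conditional probability by the probability of the colour transferred from box I. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction

box1 = {"red": 5, "blue": 6}
box2 = {"red": 6, "blue": 4}
n1 = sum(box1.values())            # 11 balls in box I
n2_after = sum(box2.values()) + 1  # 11 balls in box II after the transfer

# P(A) and P(B): colour of the ball transferred from box I.
p_transfer_red = Fraction(box1["red"], n1)
p_transfer_blue = Fraction(box1["blue"], n1)

# P(C/A) and P(C/B): red from box II, given the transferred colour.
p_red_given_red = Fraction(box2["red"] + 1, n2_after)
p_red_given_blue = Fraction(box2["red"], n2_after)

p_red = p_transfer_red * p_red_given_red + p_transfer_blue * p_red_given_blue
print(p_red)  # 71/121
```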
Solved Problem 22
The probabilities that component A and component B of a machine will fail
are 0.09 and 0.06 respectively. The machine will fail if any one of them fails.
Find the probability that it will fail?
Solution
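Let ‘A’ be the event that component A fails and ‘B’ be the event that
component B fails. Given that:

P(A) = 0.09, P(B) = 0.06

Assuming that the two components fail independently,

P(A ∩ B) = P(A) × P(B) = 0.09 × 0.06 = 0.0054

P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 0.09 + 0.06 – 0.0054 = 0.1446

Therefore, the probability that the machine will fail is 14.46%.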


Solved Problem 23
What is the probability of getting 53 Mondays in a leap year?
Solution
There are 366 days in a leap year. Hence, a leap year has 52 complete
weeks and 2 extra days. The 52 weeks contain 52 Mondays.
For one more Monday, the remaining 2 days must include a Monday. The
remaining 2 days can be any one of the following combinations:
1. Sunday and Monday
2. Monday and Tuesday
3. Tuesday and Wednesday
4. Wednesday and Thursday
5. Thursday and Friday
6. Friday and Saturday
7. Saturday and Sunday
∴ n(S) = 7 and n(A) = 2
∴ P(A) = n(A)/n(S) = 2/7
where, A is the event of getting 53 Mondays.
Therefore, the probability of getting 53 Mondays in a leap year is 2/7.
Self Assessment Questions
2. Find the probabilities in the following cases:
i. Getting an even number when a die is thrown
ii. Getting 53 Mondays in ordinary year
3. Given P(A) = 0.6, P(B) = 0.7, and P(A ∩ B) = 0.5. Find P(A ∪ B).

5.5 Bayes’ Theorem


In this section, we will discuss the Bayes’ theorem. Let A1, A2, A3, A4 be
mutually exclusive and exhaustive events of a random experiment. Let ‘B’
be a common event. The figure 5.7 is the representation of Bayes’ theorem
in Venn diagram.

Fig. 5.7: Bayes’ Theorem

The event ‘B’ is made up of four mutually exclusive events, A1 ∩ B, A2 ∩ B,
A3 ∩ B and A4 ∩ B. Therefore,

P(B) = P(A1 ∩ B) + P(A2 ∩ B) + P(A3 ∩ B) + P(A4 ∩ B)
     = Σ P(Ai ∩ B) …….(1) [by using the law of marginal probability]

We know that:

P(A1 ∩ B) = P(B) × P(A1/B) ..….. (2) [by the law of conditional
probability for dependent events]

          = P(A1) × P(B/A1) …………………… (3)

Consider:

P(A1/B) = P(A1 ∩ B) / P(B) …. from above equation (2)

P(A1/B) = P(A1) × P(B/A1) / Σ P(Ai ∩ B)
(by substituting (1) in the denominator and (3) in the numerator)
In general, the Bayes’ theorem states that if A1, A2, …, An are ‘n’
mutually exclusive and exhaustive events with prior probabilities
P(A1), P(A2), …, P(An) respectively, and ‘B’ is an event for which the
conditional probabilities of occurrence of B given A1, B given A2, …, B
given An are P(B/A1), P(B/A2), …, P(B/An) respectively, then the posterior
probability of occurrence of A1, given that ‘B’ has already occurred, is
given by:

P(A1/B) = P(A1) × P(B/A1) / [Σ (i = 1 to n) P(Ai) × P(B/Ai)]

Bayes’ probability is also a type of conditional probability. The table 5.2
displays the differences between Conditional probability and Bayes’
probability:
Table 5.2: Differences between Conditional Probability and Bayes’ Probability
Bayes’ Probability                         General Conditional Probability
1. Finds the probability of a population   1. Finds the probability of getting a
   value, given the sample value.             sample value, given the population
                                              value.
2. It is possible to incorporate the       2. It is not possible to do so.
   latest information.
3. It is possible to incorporate cost      3. It is not possible in this case.
   aspects.

Whenever there are two probabilities connected with an event, then we
have to apply Bayes’ approach to solve it.
Solved Problem 24
The probabilities that Mr. Aravind, Mr. Anand and Mr. Akil will become vice-
president of a company are 0.40, 0.35 and 0.25 respectively. The
probabilities that they will introduce a new product, if appointed, are 0.10,
0.15 and 0.20, respectively. Given that a new product was introduced, what
is the probability that Mr. Anand became vice-president?
Solution
Let us assume the following:
• Let ‘A1’ be the event that Mr. Aravind became vice-president
• Let ‘A2’ be the event that Mr. Anand became vice-president
• Let ‘A3’ be the event that Mr. Akil became vice-president
• Let ‘B’ be the event that a new product was introduced
We are given that:
P(A1) = 0.40, P(A2) = 0.35, P(A3) = 0.25

P(B/A1) = 0.10, P(B/A2) = 0.15, P(B/A3) = 0.20

The required probabilities are calculated and represented in the table 5.3.
Table 5.3: Required Probabilities for the Data in Solved Problem 24
Event   Prior         Conditional   Joint         Posterior
Ai      Probability   Probability   Probability   Probability
        P(Ai)         P(B/Ai)       P(Ai ∩ B)     P(Ai/B)
A1      0.40          0.10          0.0400        0.0400/0.1425 = 0.2807
A2      0.35          0.15          0.0525        0.0525/0.1425 = 0.3684
A3      0.25          0.20          0.0500        0.0500/0.1425 = 0.3509
Total   1.00          0.45          0.1425        1.0000

Therefore, the required probability is P(A2/B) = 0.3684.

That is, given that a new product was introduced, the probability that
Mr. Anand became the vice-president is 0.3684.
Solved problem 25
A factory has three machines M1, M2 and M3. They produce 4000, 10,000
and 6,000 products per day. From past records, it is known that M1, M2, and
M3 produce 5%, 4%, and 8% defectives. A product is selected at random
from the day’s production and is found to be defective. What is the
probability that it was not produced by machine M3?
Solution
Let us consider the following:
• Let ‘A1’ be the event that the product was produced by M1
• Let ‘A2’ be the event that the product was produced by M2
• Let ‘A3’ be the event that the product was produced by M3
• Let ‘B’ be the event that the product is defective
Then, we are given:

P(A1) = 4000/20000 = 0.2

P(A2) = 10000/20000 = 0.5

P(A3) = 6000/20000 = 0.3

P(B/A1) = 0.05 P(B/A2) = 0.04 P(B/A3) = 0.08
The above information is represented in table 5.4.
Table 5.4: Required Probabilities for the Data in Solved Problem 25
Event   Prior         Conditional   Joint          Posterior
Ai      Probability   Probability   Probability    Probability
        P(Ai)         P(B/Ai)       P(Ai ∩ B)      P(Ai/B)
A1      0.2           0.05          0.010          0.010/0.054 = 0.1852
A2      0.5           0.04          0.020          0.020/0.054 = 0.3704
A3      0.3           0.08          0.024          0.024/0.054 = 0.4444
Total   1.00                        P(B) = 0.054   1.0000

Required probability = 1 – P(A3/B) = 1 – 0.4444 = 0.5556

Hence, the required probability is 0.5556.
The probability that the product was not produced by Machine M3 is 0.5556.
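The columns of table 5.4 follow one fixed recipe: multiply each prior by its conditional probability to get the joint probability, add the joint probabilities to get P(B), and divide each joint probability by P(B). The following Python sketch applies that recipe to Solved Problem 25; the function name bayes_posteriors is an assumption introduced for illustration.

```python
from fractions import Fraction


def bayes_posteriors(priors, likelihoods):
    """Return P(Ai/B) for each i, given the lists P(Ai) and P(B/Ai)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(Ai ∩ B)
    p_b = sum(joints)                                      # P(B)
    return [j / p_b for j in joints]


# Solved Problem 25: shares of the day's production and defect rates.
priors = [Fraction(4000, 20000), Fraction(10000, 20000), Fraction(6000, 20000)]
likelihoods = [Fraction(5, 100), Fraction(4, 100), Fraction(8, 100)]

posteriors = bayes_posteriors(priors, likelihoods)
p_not_m3 = 1 - posteriors[2]

print([round(float(p), 4) for p in posteriors])  # [0.1852, 0.3704, 0.4444]
print(round(float(p_not_m3), 4))                 # 0.5556
```

The same function applied to the priors 0.40, 0.35, 0.25 and conditional probabilities 0.10, 0.15, 0.20 of Solved Problem 24 reproduces the posterior column of table 5.3.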

Self Assessment Questions


4. State whether the following statements are true or false:
i. Bayes’ probability estimates sample value
ii. Conditional probability can incorporate costs
iii. Bayes’ probability gives up to date information

5.6 Random Variable


In this section, we will discuss the random variable. A real-valued function
that associates the result or outcome of an experiment with a real number is
known as a random variable. In general, a random variable X may take any
value on the real line (i.e., any number between –∞ and ∞). We can
assign a probability that X would take a value in any specified interval such
as X ≤ a, a < X < b, a ≤ X < b, a < X ≤ b, a ≤ X ≤ b, a < X, X < b,
etc. A random variable may be discrete or continuous. It is discrete if the
set of possible values is either finite or countably infinite. It is continuous
if it is capable of taking all values in an interval.
By introducing the concept of a random variable, the events are now replaced
by appropriate subsets of real numbers, and the probability function in the
case of a discrete random variable satisfies the following conditions:
(i) P(Xi) = P(X = Xi) for all values of i
(ii) P(Xi) ≥ 0 for all values of i
(iii) Σ P(Xi) = 1

In the case of a continuous random variable, the probability is defined over
a range of values of the random variable, and the summation in (iii) is
replaced by integration over the entire range of the random variable (domain
of the probability distribution function).

If ‘X’ is a discrete random variable, then P(X) is known as the probability
mass function of ‘X’. If ‘X’ is a continuous random variable, then P(X) is
called the probability density function and is denoted by f(X).

Example 22
Let ‘X’ denote the number of heads obtained while tossing two fair coins.
Then, X is a random variable which takes the values 0, 1 and 2 with
respective probabilities ¼, ½ and ¼. Here, X is a discrete random variable.
Example 23
Let ‘X’ denote the number obtained while throwing a fair die. Then, ‘X’ is a
discrete random variable taking values 1, 2, 3, 4, 5 and 6 with probability 1/6
each.
Example 24
Let ‘X’ denote the weight of apples. Then, ‘X’ is a continuous random
variable.
For example, let us consider the tossing of three coins. Table 5.5
displays the probabilities of getting heads when three coins are tossed.
Table 5.5: Probabilities of Getting Heads when Three Coins are Tossed

No. of Heads (X)    P(X)
3                   ⅛
2                   ⅜
1                   ⅜
0                   ⅛
Total               1

For every Xi, we are able to assign a P(Xi) such that:

$\sum_i P(X_i) = 1$

The probabilities of the number of heads form a probability distribution. A
systematic representation of a random variable with its values and probabilities
is called the probability distribution of that random variable. The distribution
will have its mean and standard deviation.


5.6.1 Mathematical expectation and variance of a Random Variable


Mathematical expectation of a random variable is denoted by E(X) and is
defined as:
$E(X) = \sum_i X_i P(X_i)$

Its variance is given by:

$\mathrm{Var}(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2$

where $E(X^2) = \sum_i X_i^2 P(X_i)$.
Its standard deviation is:

$\mathrm{S.D}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{E(X^2) - [E(X)]^2}$

Solved Problem 26
A random variable takes the values -3, -2, 1, 0, 4, 6 with probabilities 1/12,
2/12, 3/12, 4/12, 1/12, 1/12 respectively. Find its mean or expected value
and variance?
Solution
Table 5.6 lists the values required to calculate the expectation and
variance for the data in Solved Problem 26.
Table 5.6: Required Values for Calculating Mean and Variance for the Data
Xi       P(Xi)    Xi P(Xi)    Xi² P(Xi)
-3       1/12     -3/12       9/12
-2       2/12     -4/12       8/12
1        3/12     3/12        3/12
0        4/12     0           0
4        1/12     4/12        16/12
6        1/12     6/12        36/12
Total    1        6/12        72/12 = 6

∴ $E(X) = \sum_i X_i P(X_i) = 6/12 = 1/2$

$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 6 - 1/4 = 23/4$

where $E(X^2) = \sum_i X_i^2 P(X_i) = 72/12 = 6$.

$\mathrm{S.D}(X) = \sqrt{23/4} \approx 2.4$



Hence, the mean, variance and standard deviation are 0.5, 5.75 and 2.4.
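The arithmetic in Solved Problem 26 can be checked with a short script. The following is only an illustrative sketch: the function names `expectation` and `variance` are chosen here for convenience and are not part of the text. Exact fractions are used so that the results can be compared directly with the table totals.

```python
from fractions import Fraction
from math import sqrt

# Values and probabilities from Solved Problem 26
values = [-3, -2, 1, 0, 4, 6]
probs = [Fraction(k, 12) for k in (1, 2, 3, 4, 1, 1)]

def expectation(values, probs):
    """E(X) = sum of Xi * P(Xi)."""
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mean = expectation(values, probs)
    second_moment = sum(x * x * p for x, p in zip(values, probs))
    return second_moment - mean ** 2

mean = expectation(values, probs)
var = variance(values, probs)
sd = sqrt(var)

print(mean, var, round(sd, 1))
```

The script reproduces the mean 1/2, the variance 23/4 and a standard deviation of about 2.4.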
Solved Problem 27
Mr. Arun and Mr. Bandari play a game. If Mr. Arun picks an even number
from 1 to 6, Mr. Bandari pays him an amount equal to double the number
picked. If Mr. Arun picks an odd number, he has to pay an amount
equal to double the number picked. What is Mr. Arun’s expectation?
Solution
Let Xi be Mr. Arun’s gain when a given number is picked and P(Xi) be its
probability; each of the six numbers is taken to be equally likely. The values
are indicated in table 5.7.
Table 5.7: Required Values for Calculating Mean and Variance for the Data

No.      Xi      P(Xi)    Xi P(Xi)
1        -2      1/6      -2/6
2        4       1/6      4/6
3        -6      1/6      -6/6
4        8       1/6      8/6
5        -10     1/6      -10/6
6        12      1/6      12/6
Total            1        6/6 = 1

∴ Expectation of Mr. Arun is $E(X) = \sum_i X_i P(X_i) = 1$.

Solved Problem 28
Table 5.8 displays the distribution of a random variable X. Find the
following probabilities:
i) $P(X_i \ge 3)$
ii) $P(X_i = 0)$
iii) $P(1 \le X_i \le 3)$
iv) $P(X_i \ge 4)$


Table 5.8: Distribution of a Random Variable X

Xi -3 -2 0 1 2 3 4 5
P(Xi) K 2K 2K 3K 3K 2K K K

Solution
Since Xi is a random variable, $\sum_i P(X_i) = 1$
⇒ K + 2K + 2K + 3K + 3K + 2K + K + K = 1
⇒ 15K = 1 ∴ K = 1/15
i) $P(X_i \ge 3) = P(X_i = 3) + P(X_i = 4) + P(X_i = 5)$
= 2K + K + K = 4K = 4 / 15
ii) $P(X_i = 0) = 2K = 2 / 15$
iii) $P(1 \le X_i \le 3) = P(X_i = 1) + P(X_i = 2) + P(X_i = 3)$
= 3K + 3K + 2K = 8K = 8 / 15
iv) $P(X_i \ge 4) = P(X_i = 4) + P(X_i = 5)$
= K + K = 2K = 2 / 15
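The steps of Solved Problem 28 (fix K from the condition that the probabilities sum to 1, then add the probabilities of the qualifying values) can be sketched in code. This is an illustration only; the helper `prob` and the variable names are assumptions introduced here, not notation from the text.

```python
from fractions import Fraction

# Table 5.8: values of X and their probabilities as multiples of K
values = [-3, -2, 0, 1, 2, 3, 4, 5]
weights = [1, 2, 2, 3, 3, 2, 1, 1]

# The probabilities must sum to 1, so K = 1 / (sum of the multiples)
K = Fraction(1, sum(weights))
pmf = {x: w * K for x, w in zip(values, weights)}

def prob(condition):
    """Add up P(X = x) over every value x that satisfies the condition."""
    return sum(p for x, p in pmf.items() if condition(x))

p_ge_3 = prob(lambda x: x >= 3)        # i)
p_eq_0 = prob(lambda x: x == 0)        # ii)
p_1_to_3 = prob(lambda x: 1 <= x <= 3) # iii)
p_ge_4 = prob(lambda x: x >= 4)        # iv)

print(K, p_ge_3, p_eq_0, p_1_to_3, p_ge_4)
```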
Solved Problem 29
Two fair coins are tossed once. Find the mathematical expectation of the
number of heads obtained.

Solution
Let Xi denote the number of heads obtained. Then, Xi is a random variable
which takes the values 0, 1 and 2 with respective probabilities ¼, ½ and ¼,
as shown in table 5.9.
Table 5.9
Xi       0    1    2
P(Xi)    ¼    ½    ¼

The mathematical expectation of the number of heads is

∴ $E(X) = \sum_i X_i P(X_i) = 0 \times \frac{1}{4} + 1 \times \frac{1}{2} + 2 \times \frac{1}{4} = 1$


Key Statistics
1. For a random variable Xi, the arithmetic mean is $E(X) = \sum_i X_i P(X_i)$

2. For a random variable Xi, the variance is

$\mathrm{Var}(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2$

where $E(X^2) = \sum_i X_i^2 P(X_i)$.
The standard deviation is the square root of the variance.
Solved Problem 30
A bag has 3 white and 4 red balls. Two balls are randomly drawn from the
bag. Find the expected number of white balls in the draw.
Solution
Let ‘Xi’ denote the number of white balls obtained in the draw. Then, Xi is a
random variable which takes the values 0, 1 and 2 with respective
probabilities:

$P(0) = P[\text{both red}] = \frac{{}^{4}C_{2}}{{}^{7}C_{2}} = \frac{6}{21} = \frac{2}{7}$

$P(1) = P[\text{one white and one red}] = \frac{{}^{3}C_{1} \times {}^{4}C_{1}}{{}^{7}C_{2}} = \frac{12}{21} = \frac{4}{7}$

$P(2) = P[\text{both white}] = \frac{{}^{3}C_{2}}{{}^{7}C_{2}} = \frac{3}{21} = \frac{1}{7}$

The probability distribution of X is:
Table 5.10
Xi       0      1      2
P(Xi)    2/7    4/7    1/7

∴ $E(X) = \sum_i X_i P(X_i) = 0 \times \frac{2}{7} + 1 \times \frac{4}{7} + 2 \times \frac{1}{7} = \frac{6}{7} \approx 1$ (approximately)

Thus, one white ball is expected in the draw.
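The combinatorial probabilities in Solved Problem 30 can be reproduced with Python's `math.comb`. This is a minimal sketch; the constant and function names (`WHITE`, `RED`, `DRAWN`, `p_white`) are illustrative choices made here.

```python
from fractions import Fraction
from math import comb

WHITE, RED, DRAWN = 3, 4, 2  # bag contents and number of balls drawn

def p_white(k):
    """P(exactly k white balls) when DRAWN balls are taken without replacement."""
    favourable = comb(WHITE, k) * comb(RED, DRAWN - k)
    total = comb(WHITE + RED, DRAWN)
    return Fraction(favourable, total)

pmf = {k: p_white(k) for k in range(DRAWN + 1)}
expected_white = sum(k * p for k, p in pmf.items())

print(pmf, expected_white)
```

The exact expectation is 6/7 (about 0.86), which the solution rounds to one white ball.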


Self Assessment Questions


5. Fill in the blanks:
i. For a random variable, $\sum P(X_i)$ = ___________.
ii. Expectation of a random variable is same as ________ of the
probability distribution of that variable.
iii. Var (X) = $E(X^2)$ - ___________.

5.7 Summary
Let us recapitulate the important concepts discussed in this unit:
• Probability plays an important role in the decision-making process.
Probability is a numerical measure which indicates the chance of
occurrence of an event ‘A’. It is denoted by P(A). It is the ratio of
the number of outcomes favourable to the event ‘A’ (m) to the total number
of outcomes of the experiment (n).
• When multiple events are involved in an experiment, the concerned
probabilities are calculated using the addition and multiplication rules of
probability.
• Bayes’ theorem relates the probability of the occurrence of an event
to the occurrence or non-occurrence of an associated event. This is an
important theorem, helpful for managers in business decisions.
• A random variable is not a variable; it is a function. It can be discrete or
continuous.

5.8 Glossary
Equally likely events (equiprobable events): Two or more events are
equally likely if they have equal chance of occurrence.
Event: An event is a subset of the sample space.
Exhaustive set of events: A set of events is exhaustive if one or the other
of the events in the set occurs whenever the experiment is conducted.
Experiment: An operation that results in a definite outcome is called an
experiment.
Mutually exclusive events (disjoint events): Two or more events are
mutually exclusive if only one of them can occur at a time.


Probability: It is a numerical measure which indicates the chance of
occurrence.
Random Experiment: When the outcome of an experiment cannot be
predicted with certainty, then it is called random experiment or stochastic
experiment.
Sample Space: The set of all possible outcomes of a random experiment is
the sample space.

5.9 Terminal Questions


1. Define independent events.
2. The probability of Mr. Sunil solving a problem is ¾. The probability of
Mr. Anish solving is ¼. What is the probability that a given problem will
be solved?
3. The probability that a contractor will get an electrical job is 0.8, he will
get a plumbing job is 0.6 and he will get both 0.48. What is the
probability that he gets at least one job? Are the probabilities of getting
electrical job and plumbing job independent?
4. A box contains 4 red and 5 blue similar rings. What is the probability of
selecting at random two rings:
i. having same colour
ii. having different colours
5. If $P(A \cap B) = 1/2$ and P(B) = 2/3, find P(A/B).
6. The probability that a company A will survive for 20 years is 0.6. The
probability that its sister concern will survive for 20 years is 0.8. What is
the probability that at least one of them will survive for 20 years?
7. A recently developed car has two important components A and B. The
probability of failure of A and B are 0.2 and 0.1. What is the probability
that the car will fail?
8. The probability that a football player will play on ordinary ground is 0.6
and on green turf is 0.4. The probability that he will get a knee injury
when playing on ordinary ground is 0.07 and that on green turf is 0.04.
Given that he got a knee injury, what is the probability that it was due
to play on ordinary ground?


Activity
Problem 1
The probability that a contractor will get a plumbing contract is 2/3 and
probability that he will not get an electrical contract is 5/9. If the probability
of getting at least one of these contracts is 4/5, what is the probability that
he will get both?
Problem 2
A can solve 90 percent of the problems given in a book and B can solve
70 percent. What is the probability that at least one of them will solve a
problem selected at random.
Problem 3
The probability that a trainee will remain with a company is 0.6. The
probability that an employee earns more than Rs.10,000 per year is 0.5. The
probability that an employee is a trainee who remained with the company or who
earns more than Rs.10,000 per year is 0.7. What is the probability that a
trainee earns more than Rs.10,000 per year, given that he is a trainee
who stayed with the company?
Problem 4
Suppose that one of the three men, a politician, a bureaucrat and an
educationist will be appointed as VC of the university. The probabilities of
their appointment are respectively 0.3, 0.25 and 0.45. The probability that
these people will promote research activities if they are appointed is 0.4,
0.7 and 0.8 respectively. What is the probability that research will be
promoted by the new VC.
Problem 5
A box contains 4 green and 6 white balls; another box contains 7 green
and 8 white balls. Two balls are transferred from box 1 to box 2 and then
a ball is drawn from box 2. What is the probability that it is white?
event A: transferred balls are green
event B: transferred balls are white
event C: Among transferred balls one green and 1 white
event D: selection of a white ball from box 2


5.10 Answers

Self Assessment Questions


1. i) Relative frequency
ii) Subjective
iii) Classical
iv) Subjective
2. i) 1/2 ii) 1/7
3. 0.8
4. i) False ii) False iii) True
5. i) 1 ii) Mean iii) $[E(X)]^2$

Terminal Questions
1. Refer section 5.1.4
2. 13/16
3. 0.92, Yes
4. i) 4/9, ii) 5/9
5. 3/4
6. 0.92
7. 0.28
8. 21/29
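
The answer to terminal question 8 follows from Bayes' theorem. As a hedged illustration (the dictionary names `prior` and `injury_given` are chosen here and do not appear in the text), the posterior probability can be computed as follows:

```python
from fractions import Fraction

# Prior probabilities of the playing surface (terminal question 8)
prior = {"ordinary": Fraction(6, 10), "turf": Fraction(4, 10)}
# Conditional probabilities of a knee injury on each surface
injury_given = {"ordinary": Fraction(7, 100), "turf": Fraction(4, 100)}

# Total probability of a knee injury
p_injury = sum(prior[g] * injury_given[g] for g in prior)

# Bayes' theorem: P(ordinary | injury)
posterior_ordinary = prior["ordinary"] * injury_given["ordinary"] / p_injury

print(p_injury, posterior_ordinary)
```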

Activity Solution
Solution 1
Let, A: Contractor gets a plumbing contract
B: Contractor gets an electrical contract
Then, P(A) = 2/3, $P(B^c) = 5/9$ and $P(A \cup B) = 4/5$
Therefore, $P(B) = 1 - P(B^c) = 4/9$
By the addition theorem we have,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
That is, $P(A \cap B) = P(A) + P(B) - P(A \cup B)$
Therefore, P [he gets both plumbing and electrical contracts] =
$P(A \cap B) = 2/3 + 4/9 - 4/5 = 14/45$


Solution 2
Event A: Student A solves the problem
Event B: Student B solves the problem
Assuming that A and B solve the problem independently,
P(at least one solves the problem) = 1 - P(none solves the problem)

$= 1 - P(A^c \cap B^c)$
$= 1 - P(A^c) \cdot P(B^c)$
$= 1 - (0.10)(0.30)$
$= 0.97$
Solution 3
Event A: a trainee will remain with the company
Event B: a trainee earns more than Rs. 10,000.
Given P(A) = 0.6, P(B) = 0.5, $P(A \cup B) = 0.7$
We need to find the probability that a trainee earns more than Rs.10,000 per
year given that he is a trainee who stayed with the company:
$P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{P(A) + P(B) - P(A \cup B)}{P(A)} = \frac{0.6 + 0.5 - 0.7}{0.6} = \frac{0.4}{0.6} = 0.67$

Solution 4
Event A: politician appointed as VC
Event B: bureaucrat appointed as VC
Event C: educationist appointed as VC
Event D: promotion of research activities
Probability that the research will be promoted by the new VC:
$= P(A \cap D) + P(B \cap D) + P(C \cap D)$
$= P(D/A) \cdot P(A) + P(D/B) \cdot P(B) + P(D/C) \cdot P(C)$
$= (0.3)(0.4) + (0.25)(0.7) + (0.45)(0.8) = 0.655$

Solution 5
Event A: transferred balls are green
Event B: transferred balls are white
Event C: among transferred balls one green and 1 white
Event D: selection of a white ball from box 2


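Probability that the ball is white is:
$= P(A \cap D) + P(B \cap D) + P(C \cap D)$
$= P(D/A) \cdot P(A) + P(D/B) \cdot P(B) + P(D/C) \cdot P(C)$
$= \frac{{}^{4}C_{2}}{{}^{10}C_{2}} \times \frac{8}{17} + \frac{{}^{6}C_{2}}{{}^{10}C_{2}} \times \frac{10}{17} + \frac{{}^{4}C_{1} \times {}^{6}C_{1}}{{}^{10}C_{2}} \times \frac{9}{17}$
$= 0.5412$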

5.11 Case Study


Silver Jubilee of Sun Enterprise
Sun enterprise has been consistently doing well for the last 25 years. As a
part of silver jubilee celebration, the new CEO of the company, Ms Rashmi,
wanted to announce certain welfare and incentive schemes. However,
before that, she wanted to have an idea of the level of satisfaction among
the employees with varying experience in the company. Since the time was
short, she asked the HRD department to conduct a quick survey and
present the findings within 3 days. Accordingly, the HRD department
conducted a survey of 100 randomly selected employees and found the
following results.
Table 5.11
                                   Experience (in years)
Level of Satisfaction    1 to 5    6 to 15    16 and more
Low                      11        07         04
Medium                   15        24         14
High                     08        17         18

As the head of human resource department, do the following:


(a) Present the findings of the survey, and answer the following questions:
(i) What is the probability that an employee selected at random will
have high level of satisfaction?
(ii) What is the probability that an employee with more than 16 years
of experience is highly satisfied?
(iii) What is the probability that an employee with medium level of
satisfaction will have experience from 6 to 15 years?
(b) Suggest a suitable format for collecting data to assess the level of
performance among the employees with varying experience.
Assuming hypothetical figures, analyse the same and present the findings to
the CEO.
(Source: T N Srivastava and Shailaja Rejo (2008) Statistics for Management, 5th
ed., TMH)

References:
• Agarwal, B. L. (2006), Basic Statistics, Fourth Edition, New Age
International Publishers.
• Anderson, David R., Sweeney, Dennis J. & Williams, Thomas A., 5th ed.,
Thomson Business Information Pvt. Ltd.
• Bowerman, B. L. & O'Connell, R. T. (1996), Applied Statistics: Improving
Business Processes, Irwin.
• Freedman, D., Pisani, R. & Purves, R. (1997), Statistics, 3rd ed., W. W.
Norton.
• Levin, Richard I. & Rubin, David S. (2008), Statistics for Management,
Seventh Edition, PHI Learning Private Limited.
• Srivastava, T. N. & Rejo, Shailaja (2008), Statistics for Management, 5th
ed., TMH.
• Tanur, J. M. (2002), Statistics: A Guide to the Unknown, 4th ed.,
Brooks/Cole.
• Tukey, J. W. (1977), Exploratory Data Analysis, Addison-Wesley.
• Wilcox, Rand R. (2009), Basic Statistics: Understanding Conventional
Methods and Modern Insights, Oxford University Press.

E-References:
• http://www.textbooksonline.tn.nic.in/Books/11/Stat-EM/Chapter-1.pdf


Unit 6 Theoretical Probability Distributions


Structure:
6.1 Introduction
Objectives
Relevance
Statistics in practice
Random variables
6.2 Probability Distributions
Discrete probability distributions
Continuous probability distributions
6.3 Bernoulli Distribution
Repetition of a Bernoulli experiment
Mean and Variance of Bernoulli distribution
6.4 Binomial Distribution
Assumptions for applying a Binomial distribution
Examples of Binomial variate
Case study on Binomial distribution
6.5 Poisson Distribution
Assumptions for applying the Poisson distribution
Real life examples of Poisson variate
Case study on Poisson distribution
6.6 Normal Distribution
Standard Normal variate
6.7 Summary
6.8 Glossary
6.9 Terminal Questions
6.10 Answers
6.11 Case Study

6.1 Introduction
In the previous unit, we studied the basic concepts of probability theory
and the application of probability rules in solving
problems related to real-life situations. We ended the previous unit with
an introduction to the concept of random variables. In this unit, we will discuss
the probability distributions of random variables, both discrete and
continuous. Before studying this unit, you should refresh the concept of
random variables, which was covered in the previous unit.
Individuals and corporations generate data that often resemble certain
theoretical distributions. Many characteristics of the theoretical
distributions have been derived mathematically, and we can make use of such
derived characteristics for a quick analysis of the observed distributions.
Examples of observed distributions are:
i. Number of male children in a family
ii. Number of defectives produced per production run
iii. Number of employees drawing salary in certain brackets
The theoretical distributions are formed under certain assumptions and are
classified into two types:
i. Discrete probability distributions
ii. Continuous probability distributions
Figure 6.1 depicts the two groups of theoretical distributions.

Fig. 6.1: Theoretical Distributions

Objectives:
After studying this unit, you should be able to:
• differentiate between Bernoulli process and Binomial distribution
• evaluate the probabilities using the Binomial distribution
• evaluate the probabilities using the Poisson distribution
• analyse the probabilities using the Normal distribution


6.1.1 Relevance
Good Health, a pharmaceutical firm, set up a plant to fill bottles with
100 ml of costly medicine. The production manager observed that the filling
machine did not fill the bottles with exactly the set volume; each filling
differed from 100 ml, though by a small amount, sometimes less and sometimes
more. While drug regulation stipulates a heavy fine if a bottle is found to
have less than 100 ml, the management is concerned with the wastage of
medicine that occurs when the filling is more than 100 ml.
The dilemma for the production manager is that if he tries to reduce
wastage, he might incur the risk of being fined by the regulatory authority. He
has to decide the level at which he should set the filling machine so that the
wastage and the risk of getting penalised are minimised. This is where the
statistician came into the picture, and the issue could be resolved with the
help of a statistical distribution to the satisfaction of the production manager.
(Source: Srivastava T. N. & Rejo, Shailaja (2008), Statistics for Management, 5th
ed.,TMH)

6.1.2 Statistics in practice


Citibank
Long Island, New York
Citibank, a division of Citigroup, makes available a wide range of financial
services, including checking and savings accounts, loans and mortgages,
insurance and investment services, within the framework of a unique
strategy for delivering those services called Citibanking. Citibanking entails a
consistent brand identity all over the world, consistent product offerings and
high-level customer service. Citibanking lets you manage your money
anytime, anywhere, any way you choose.
Each Citibanking Centre (CBC) operates as a waiting system
with randomly arriving customers seeking service at one of the ATMs. If all
ATMs are busy, the arriving customers wait in line. Periodic CBC capacity
studies are used to analyse customer waiting times and to determine
whether additional ATMs are needed.
Data collected by Citibank showed that random customer arrivals followed a
probability distribution known as the Poisson probability distribution. Using the
Poisson probability distribution, Citibank can compute probabilities for the
number of customers arriving at a CBC during any time period and make
decisions concerning the number of ATMs needed.
(Source: Anderson, David R., Sweeney, Dennis J., & Williams, Thomas A.,
5th ed., Thomson Business Information Pvt. Ltd.)

6.1.3 Random variables


We will recap the definition of a random variable discussed in the previous
unit.
A real-valued function that associates the result or outcome of an experiment
with a real number is known as a random variable. In general, a random
variable X may take any value on the real line (i.e., any number
between $-\infty$ and $+\infty$). We can assign a probability that X takes a value
in any specified interval, such as $-\infty < X \le a$, $a \le X \le b$, $a < X \le b$,
$a \le X < b$, $a < X < b$, $a \le X$, $X \le b$, etc.

In fact, the random variable is not exactly a variable but a function.

Discrete random variable
A random variable is discrete when the number of possible outcomes in a
random experiment is countable. For example, when a fair coin is tossed
once, the number of possible outcomes is two (either head or tail). When the
fair coin is tossed twice, the number of possible outcomes is four {HH, HT,
TT, TH}.
In the above two cases, the number of outcomes is finite. As the number
of values of the random variable is therefore finite, it is called a discrete random
variable.
Continuous random variable
A random variable is continuous if it is capable of taking all values in an
interval of real numbers. For example, the values of train timings for
departures and arrivals at a particular station are continuous random
variables. The measures of height and the intelligence quotient of the
people are also examples of continuous random variable.

6.2 Probability Distribution


In this section, we will discuss probability distributions. As random
variables may be discrete or continuous, the probability distributions associated with
them are also discrete or continuous. The listing of all the
possible outcomes of a random experiment along with their respective
probabilities is called the probability distribution.
6.2.1 Discrete probability distributions
A discrete probability distribution consists of all possible values of a discrete
random variable along with their corresponding probabilities. Binomial,
Bernoulli, Poisson are all examples of Discrete probability distributions. In
this unit you will study all the three distributions in detail.
6.2.2 Continuous probability distributions
In a continuous probability distribution, the variable under consideration
can assume any value within a given range. Hence, it is not possible to list all
its values. One example of a continuous probability distribution is the distribution
of a normal variable. In this unit, you will study the normal distribution in
detail.

6.3 Bernoulli Distributions


In this section, we will discuss the Bernoulli distributions. A random variable,
which assumes values ‘1’ and ‘0’ with probabilities ‘p’ and ‘q’, (where, q = 1-p)
is called Bernoulli variable. It has only one parameter ‘p’. For different
values of ‘p’ (0<p<1), we get different Bernoulli distributions. In these
distributions, ‘1’ represents the occurrence of success and ‘0’ represents the
occurrence of failure.
The Bernoulli distribution can be depicted in tabular format as follows:

X 1 0
P(X) p q

Note 1: Bernoulli distribution has one constant, namely, p. This constant p is
called the parameter of the Bernoulli distribution. Different values of p (where
0<p<1) give different Bernoulli distributions.
Note 2: Bernoulli distribution can also be written down as
$P(x) = p^x q^{1-x}$, x = 0, 1.
Note 3: Here, occurrence of the value 1 may be termed as “Success” and
the occurrence of the value 0 may be termed as “Failure”.
Therefore, P [Success] = p and P [Failure] = q = 1 - p


In other words, the distribution assumes that the outcome of an
experiment is of a dichotomous nature, that is, success/failure,
present/absent, defective/non-defective, yes/no, etc.

Example 1:
When a fair coin is tossed as shown in figure 6.2, the outcome is either
head or tail. The variable ‘X’ assumes ‘1’ or ‘0’.

Fig. 6.2: Flipping a Coin

Other examples of Bernoulli distribution, in which the two outcomes are often
not equally likely:
1. Success of medical treatment
2. Interviewed person is female
3. Student passes exam
6.3.1 Repetition of a Bernoulli experiment
Successive repetitions of an experiment are called Bernoulli trials if:
i) there are exactly two possible outcomes, such as success or failure
ii) the trials are independent
Let a Bernoulli experiment be repeated ‘n’ times under identical conditions.
Let Xi, for i = 1 to n, assume the values ‘1’ or ‘0’. Then Xi is a Bernoulli
variate with probability ‘p’.
Let X = X1 + X2 + … + Xn denote the number of successes in the ‘n’
repetitions. Then X follows the binomial distribution.
Example 2
Let a coin be tossed 3 times. Let Xi (i = 1, 2, 3) be a variate which takes
the values 1 and 0 according as the ith toss results in “Head” or “Tail”. Then,
X = X1 + X2 + X3 denotes “the number of heads” obtained in the 3 tosses.


Result:
If X1, X2, …Xn are independent and identically distributed Bernoulli variates
with common parameter p, their sum X = X1 + X2 + ….+ Xn is a Binomial
variate with parameters n and p.
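
This result can be checked numerically. The sketch below is an illustration under stated assumptions rather than a proof: the function names `bernoulli_sum_pmf` and `binomial_pmf` are introduced here, and the comparison uses the binomial formula $^nC_x\, p^x q^{n-x}$ given in section 6.4. The distribution of the sum is built up by adding one independent Bernoulli variate at a time.

```python
from math import comb, isclose

def bernoulli_sum_pmf(n, p):
    """PMF of X1 + ... + Xn, built by adding one Bernoulli(p) variate at a time."""
    pmf = [1.0]  # the sum of zero variates equals 0 with certainty
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            nxt[k] += prob * (1 - p)   # the new trial is a failure
            nxt[k + 1] += prob * p     # the new trial is a success
        pmf = nxt
    return pmf

def binomial_pmf(n, p):
    """Binomial probabilities nCx * p^x * q^(n-x) for x = 0..n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Example 2: three tosses of a fair coin
n, p = 3, 0.5
by_convolution = bernoulli_sum_pmf(n, p)
by_formula = binomial_pmf(n, p)

print(by_convolution)
print(by_formula)
```

For three tosses of a fair coin, both approaches give the probabilities ⅛, ⅜, ⅜ and ⅛ shown in table 5.5 of the previous unit.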
6.3.2 Mean & variance of Bernoulli distribution
Let ‘Xi’ be a Bernoulli variate with parameter p. Then, the probability distribution
of Xi is

Xi       1    0
P(Xi)    p    q

$E(X) = \sum X_i P(X_i) = 1 \times p + 0 \times q = p$
$E(X^2) = \sum X_i^2 P(X_i) = 1^2 \times p + 0^2 \times q = p$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1 - p) = pq$
Thus, mean of Bernoulli distribution is E(X) = p
Variance of Bernoulli distribution is Var(X) = p (1-p) = pq

Standard deviation of Bernoulli distribution is $\mathrm{S.D}(X) = \sqrt{pq}$

Solved Problem 1
In an interview conducted by a company, the probability that an interviewed
person is male is 2/3 and female is 1/3. Find the mean and variance of the
distribution.
Solution
Let ‘X’ denote the gender of the interviewed person. X takes the value 1 if the
interviewed person is male and the value 0 if the interviewed person is female,
with probabilities 2/3 and 1/3 respectively (i.e., p + q = 2/3 + 1/3 = 1). X follows
a Bernoulli distribution as shown in the following table:
X       1      0
P(X)    2/3    1/3

Mean of Bernoulli distribution is E(X) = p = 2/3 and
Variance is Var(X) = pq = 2/3 × 1/3 = 2/9.


Key statistic
The mean and variance of a Bernoulli distribution are ‘p’ and ‘pq’
respectively.

Self Assessment Questions


1. State whether the following statements are ‘True’ or ‘False’.
i) The sum of probabilities sometimes will be greater than 1.
ii) The amount of time you study for an exam is a discrete random
variable.
iii) The Bernoulli distribution has only one parameter ‘p’.

6.4 Binomial Distribution


In this section, we will discuss the Binomial distribution. When a Bernoulli
experiment is repeated ‘n’ times, it is called a binomial
process. Binomial distribution is a discrete probability distribution. A
probability distribution which has the following probability mass function
(p.m.f.) is called a Binomial distribution:

$P(X = x) = {}^{n}C_{x}\, q^{n-x} p^{x}$ where x = 0, 1, 2, …, n.

Here n and p are parameters; the variable X is discrete and is called a Binomial
variate.
Note 1: Binomial p.m.f. has two independent constants, namely, n and p.
These two constants are the parameters of binomial distribution
Note 2: A binomial distribution with parameter n and p is denoted by
B(n, p).
6.4.1 Assumptions for applying a Binomial Distribution
The following are assumptions under which a Binomial distribution can be
applied:
i) The outcome of an experiment should be of dichotomous nature.
In the Bernoulli process, there must be only two possible outcomes on
each trial, such as ‘success’ or ‘failure’, ‘yes’ or ‘no’, ‘defective’ or ‘not
defective’, ‘male’ or ‘female’, ‘pass’ or ‘fail’, ‘favourable’ or
‘unfavourable’, etc. In this experiment, the probability of success is
denoted by ‘p’ and probability of failure is denoted by ‘q’.
ii) The probability of success should remain the same across the
experiments.
Irrespective of the number of times the experiment is conducted, the
probability of success should be same for all the trials of the
experiment. For example, the probability of getting a head is always
0.5 irrespective of the number of times a fair coin is tossed.
iii) Experiments should be conducted under identical conditions.
There should not be any change in conditions while conducting
Binomial experiments. Any change in conditions only leads to incorrect
conclusions for the given experiment.
iv) Experiments should be statistically independent.
We can apply a Binomial distribution only when the events in an
experiment are statistically independent, which means occurrence of
one event does not affect the occurrence of other event.
In a manufacturing plant, the product part coming out of the production line
cannot be ‘defective’ and ‘not defective’ simultaneously. The product part
can be either ‘defective’ or ‘not defective’ but not both, at the same time.
6.4.2 Examples of Binomial variate
Some of the examples of Binomial variate are as follows:
1. Number of defective articles in a random sample of 6 articles drawn from
a manufactured lot
2. Number of seeds germinating among 10 seeds sown
3. Number of heads obtained in 3 tosses of a coin
4. Number of male children in a family of 5 children
5. Number of bombs hitting a bridge among 8 bombs which are dropped on
it
The mean and variance of the distribution are ‘np’ and ‘npq’ respectively,
where, ‘n’ and ‘p’ are its parameters. This distribution is a unimodal
distribution. For fixed ‘n’ or ‘p’, as ‘p’ or ‘n’ increases, the distribution shifts
from left to right.
There are three types of problems in calculating distribution. They are
depicted in table 6.1.


Table 6.1: Types of Problems in Calculating Distribution

Type i      Finding the probability of events
Type ii     Finding the expected values
Type iii    Finding the distribution if parameters are given

Type i: Finding the probability of events

Solved Problem 2
An unbiased coin is tossed six times. What is the probability that the tosses
will result in:
i) Exactly two heads
ii) At least five heads
iii) At most two heads
iv) Not greater than one head
v) Not less than five heads
vi) At least one head
Solution
Let ‘A’ be the event of getting a head. Given that:

$p = \frac{1}{2}, \quad q = \frac{1}{2}, \quad n = 6$

Therefore, by binomial distribution, $P(X = x) = {}^{6}C_{x} \left(\frac{1}{2}\right)^{6-x} \left(\frac{1}{2}\right)^{x}$


i) The probability that the tosses will result in exactly two heads is given
by:
$P(X = 2) = {}^{6}C_{2} \left(\frac{1}{2}\right)^{6-2} \left(\frac{1}{2}\right)^{2} = \frac{6 \times 5}{1 \times 2} \times \frac{1}{2^4} \times \frac{1}{2^2} = \frac{15}{64}$

Therefore, the probability that the tosses will result in exactly two
heads is 15/64.
ii) The probability that the tosses will result in at least five heads is given
by:
$P(X \ge 5) = P(X = 5) + P(X = 6) = {}^{6}C_{5} \left(\frac{1}{2}\right)^{6-5} \left(\frac{1}{2}\right)^{5} + {}^{6}C_{6} \left(\frac{1}{2}\right)^{6-6} \left(\frac{1}{2}\right)^{6}$
$= 6 \left(\frac{1}{2}\right)^{6} + \left(\frac{1}{2}\right)^{6} = \frac{7}{64}$

Therefore, the probability that the tosses will result in at least five
heads is 7/64.
iii) The probability that the tosses will result in at most two heads is given
by:
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$
$= \left(\frac{1}{2}\right)^{6} + {}^{6}C_{1} \left(\frac{1}{2}\right)^{6-1} \left(\frac{1}{2}\right)^{1} + {}^{6}C_{2} \left(\frac{1}{2}\right)^{6-2} \left(\frac{1}{2}\right)^{2}$
$= \frac{1}{64} + 6 \times \frac{1}{64} + \frac{6 \times 5}{1 \times 2} \times \frac{1}{64} = \frac{1 + 6 + 15}{64} = \frac{22}{64} = \frac{11}{32}$
Therefore, the probability that the tosses will result in at most two
heads is 11/32.
iv) The probability that the tosses will result in not greater than one head
is given by:
\[ P(X \le 1) = P(X = 0) + P(X = 1) = \frac{1}{64} + \frac{6}{64} = \frac{7}{64} \]
Therefore, the probability that the tosses will result in not greater than
one head is 7/64.
v) The probability that the tosses will result in not less than five heads is
given by:
\[ P(X \ge 5) = P(X = 5) + P(X = 6) = \frac{6}{2^{6}} + \frac{1}{2^{6}} = \frac{7}{64} \]
Therefore, the probability that the tosses will result in not less than five
heads is 7/64.


vi) The probability that the tosses will result in at least one head is given
by:
\[ P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \frac{1}{2^{6}} = 1 - \frac{1}{64} = \frac{63}{64} \]

Therefore, the probability that the tosses will result in at least one head
is 63/64.
The graph depicted in figure 6.3 illustrates the binomial distribution of
probability of ‘x’ number of heads occurring when a coin is tossed 6 times.

Fig. 6.3: Binomial Probability Distribution
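
The answers of Solved Problem 2 can be cross-checked with a short Python sketch. The helper name `binomial_pmf` is an assumption made for this illustration; it evaluates the binomial formula with exact fractions so that the results can be compared directly with 15/64, 7/64, 11/32 and 63/64.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variate with parameters n and p."""
    q = 1 - p
    return comb(n, x) * q ** (n - x) * p ** x


n, p = 6, Fraction(1, 2)          # six tosses of an unbiased coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

exactly_two = pmf[2]              # part (i)
at_least_five = pmf[5] + pmf[6]   # parts (ii) and (v)
at_most_two = sum(pmf[:3])        # part (iii)
at_most_one = sum(pmf[:2])        # part (iv)
at_least_one = 1 - pmf[0]         # part (vi)

print(exactly_two, at_least_five, at_most_two, at_most_one, at_least_one)
```

Working with `Fraction` rather than floating point keeps every probability exact, which makes the comparison with the hand calculation straightforward.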

Solved Problem 3
The probability that an employee will get an occupational disease is 20%. In
a firm having five employees, what is the probability that:
i) None of the employees get the disease
ii) Exactly two will get the disease
iii) More than four will contract the disease
Solution
Given that:
\[ p = \frac{20}{100} = 0.2 \]
\[ \therefore q = 1 - 0.2 = 0.8 \]
n = 5
Therefore, by binomial distribution, \( P(X = x) = {}^{5}C_{x} (0.8)^{5-x} (0.2)^{x} \)


i) The probability that none of the employees get the disease is given by:
\[ P(X = 0) = (0.8)^{5} = 0.3277 \]

Therefore, the probability that none of the employees get the disease
is 0.3277.
ii) The probability that exactly two employees will get the disease is given
by:
\[ P(X = 2) = {}^{5}C_{2} (0.8)^{3} (0.2)^{2} = (10)(0.512)(0.04) = 0.2048 \]

Therefore, the probability that exactly two employees will get the
disease is 0.2048.
iii) The probability that more than four employees will get the disease is
given by:
\[ P(X > 4) = P(X = 5) = (0.2)^{5} = 0.00032 \]

Therefore, the probability that more than four employees will get the
disease is 0.00032.
Solved Problem 4
The probability that a bomb dropped on a bridge will hit the bridge is 0.5.
Eight bombs are dropped on the bridge. The bridge will be destroyed if two
or more bombs hit it. Find the probability that:
i) All bombs hit the bridge
ii) The bridge is destroyed
Solution
Let the probability that the bomb will hit the bridge be p. Given that:
p = 0.5 and n = 8
∴ q = 1 − 0.5 = 0.5

Therefore by binomial distribution, \( P(X = x) = {}^{8}C_{x} (0.5)^{8-x} (0.5)^{x} \)

i) The probability that all the bombs hit the bridge is given by:
\[ P(X = 8) = (0.5)^{8} = \left(\frac{1}{2}\right)^{8} = \frac{1}{256} \]
Therefore, the probability that all the bombs hit the bridge is 1/256.

ii) Bridge is destroyed if two or more bombs fall on it. The required
probability is given by:
\[ P(X \ge 2) = 1 - [P(X = 0) + P(X = 1)] \]
\[ = 1 - \left[\left(\frac{1}{2}\right)^{8} + {}^{8}C_{1} \left(\frac{1}{2}\right)^{8}\right] = 1 - \left[\frac{1}{256} + \frac{8}{256}\right] = \frac{247}{256} \]

Therefore, the probability that the bridge is destroyed is 247/256.
Type ii: Finding the expected values

Solved Problem 5
A random sample of 5 sachets of coconut oil was examined and two were
found to be leaking. A wholesaler receives six hundred and twenty five
packets, each containing 5 sachets. Find the expected number of packets
that contain exactly one leaking sachet.
Solution
Given that:
n = 5, N = 625
Probability of leaking p is given by:
\[ p = \frac{2}{5} \]
\[ \therefore q = 1 - \frac{2}{5} = \frac{3}{5} \]
Therefore by binomial distribution, \( P(X = x) = {}^{5}C_{x} \left(\frac{3}{5}\right)^{5-x} \left(\frac{2}{5}\right)^{x} \)
\[ P(X = 1) = {}^{5}C_{1} \left(\frac{3}{5}\right)^{5-1} \left(\frac{2}{5}\right)^{1} = 5 \times \frac{81}{625} \times \frac{2}{5} = \frac{162}{625} \]
∴ The expected number of packets to contain exactly one leaking sachet is
given by:
\[ N \times P(X = 1) = 625 \times \frac{162}{625} = 162 \]
Hence, the expected number of packets to contain exactly one leaking
sachet is 162.
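
The expected-value calculation of Solved Problem 5 can be sketched in Python as follows. The variable names are assumptions chosen for this illustration; the arithmetic is the same N x P(X = 1) step used in the solution.

```python
from fractions import Fraction
from math import comb

n = 5                     # sachets per packet
p = Fraction(2, 5)        # estimated probability that a sachet leaks
packets = 625             # number of packets received (N)

# P(X = 1): exactly one leaking sachet in a packet of five
prob_one_leak = comb(n, 1) * (1 - p) ** (n - 1) * p

# Expected number of packets = N x P(X = 1)
expected_packets = packets * prob_one_leak

print(prob_one_leak, expected_packets)
```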
Type iii: Finding the distributions

Solved Problem 6
For a binomial distribution with n = 5 and p = 0.2.
Find:
i) P(X=3)
ii) P(X<4)
Solution
Given that:
n = 5, p = 0.2, and q = 1 − p = 0.8

Therefore by binomial distribution, \( P(X = x) = {}^{5}C_{x} (0.8)^{5-x} (0.2)^{x} \)

i) \( P(X = 3) = {}^{5}C_{3} (0.8)^{5-3} (0.2)^{3} = 10 \times (0.8)^{2} \times (0.2)^{3} = 0.0512 \)

ii) \( P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \)
\[ = {}^{5}C_{0} (0.8)^{5} (0.2)^{0} + {}^{5}C_{1} (0.8)^{4} (0.2)^{1} + {}^{5}C_{2} (0.8)^{3} (0.2)^{2} + {}^{5}C_{3} (0.8)^{2} (0.2)^{3} \]
\[ = (0.8)^{5} + 5 (0.8)^{4} (0.2) + 10 (0.8)^{3} (0.2)^{2} + 10 (0.8)^{2} (0.2)^{3} \]
= 0.32768 + 0.4096 + 0.2048 + 0.0512
= 0.99328
Therefore, the values for P(X=3) and P(X<4) are 0.0512 and 0.99328
respectively.
Solved Problem 7
Bring out the fallacy, if any, in the following statement on binomial
distribution.
The mean of a Binomial distribution is 4 and its variance is 5.


Solution
Given that:
The mean of a Binomial distribution is 4 and its variance is 5.
np = 4 (Mean) ............ (1)
npq = 5 (Variance) ....... (2)
Dividing (2) by (1):
\[ \frac{npq}{np} = \frac{5}{4} \quad \ldots (3) \]
∴ q = 5/4
Since q > 1, which is impossible for a probability, the statement: ‘The mean
of a binomial distribution is 4 and its variance is 5’, is wrong.
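
The consistency check used in Solved Problem 7 can be written as a small Python sketch. The function name `binomial_parameters` is hypothetical; it assumes a non-degenerate binomial distribution, divides the variance npq by the mean np to obtain q, and rejects any statement for which q does not lie strictly between 0 and 1.

```python
from fractions import Fraction


def binomial_parameters(mean, variance):
    """Recover (n, p) from np = mean and npq = variance, if consistent."""
    mean, variance = Fraction(mean), Fraction(variance)
    q = variance / mean                 # npq / np = q
    if not 0 < q < 1:
        raise ValueError(f"q = {q} is not a valid probability")
    p = 1 - q
    n = mean / p                        # np = mean, so n = mean / p
    return n, p


# A consistent pair: mean 6 and variance 1.5
print(binomial_parameters(6, Fraction(3, 2)))

# The statement of Solved Problem 7: mean 4 and variance 5
try:
    binomial_parameters(4, 5)
except ValueError as error:
    print("Inconsistent statement:", error)
```

The sketch does not check that the recovered ‘n’ is a whole number; that would be one more condition to verify before accepting a stated mean and variance.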
Solved Problem 8
The incidence of an occupational disease in an industry is such that the
workers have 25% chances of suffering from it. What is the probability that
out of 5 workers, at the most two contract that disease?
Solution
Let X: number of workers contracting the diseases among 5 workers
Then, X is a binomial variate with parameter n = 5
p = P [a worker contracts the disease] = 25/100 = 0.25
Therefore by binomial distribution,
\( P(X = x) = {}^{5}C_{x} (0.25)^{x} (0.75)^{5-x}, \quad x = 0, 1, 2, \ldots, 5 \)
The probability that at the most two workers contract the disease is
\[ P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) \]
\[ = {}^{5}C_{0} (0.25)^{0} (0.75)^{5} + {}^{5}C_{1} (0.25)^{1} (0.75)^{4} + {}^{5}C_{2} (0.25)^{2} (0.75)^{3} \]
= 0.2373 + 0.3955 + 0.2637
= 0.8965
Solved Problem 9
In a large consignment of electric lamps, 5% are defective. A random
sample of 8 lamps is taken for inspection. What is the probability that it has
one or more defectives?

Solution
Given n = 8, p = 5/100 = 0.05
X: number of defective lamps
Therefore by binomial distribution,
\( P(X = x) = {}^{8}C_{x} (0.05)^{x} (0.95)^{8-x}, \quad x = 0, 1, 2, \ldots, 8 \)
P [sample has one or more defectives] = 1 – P [no defectives]
= 1 – P(X = 0)
= 1 – \( {}^{8}C_{0} (0.05)^{0} (0.95)^{8} \)
= 1 – 0.6634 = 0.3366

Self Assessment Questions


2. State whether the following statements are ‘True’ or ‘False’.
i) Mean of binomial distribution is ‘npq’.
ii) ‘n’ and ‘p’ are the parameters of Binomial distribution.
iii) If the mean and variance of a Binomial distribution are 6 and 5,
then p = 1/6.
iv) Each trial in a binomial experiment has a different probability of
success ‘p’.
6.4.3 Case study on Binomial distribution

Case Study 1
Vinay is the operations manager of the books section of a large
department store. He has calculated that 0.4 is the probability that a
customer who is just browsing will buy something. Suppose that six
customers browse in the books section each hour. Vinay wants to
calculate the following probabilities.
What is the probability that:
i) Exactly four browsing customers will buy something during a
specified hour
ii) At least two browsing customers will buy something during a specified
hour
iii) None of the browsing customers will buy anything during a specified
hour


6.5 Poisson Distribution


In this section, we will discuss the Poisson distribution. It is a discrete
probability distribution that arises when a binomial experiment is conducted
with a very large number of trials and a small probability of success. If the
probability of success ‘p’ is small and the number of trials ‘n’ is large, the
binomial probabilities are hard to calculate. In such cases, the binomial
distribution is approximated by the Poisson distribution.

Key statistic
The probability distribution of a Poisson random variable ‘X’ is given by:
\[ P(X = x) = \frac{e^{-m} m^{x}}{x!} \]
where, x takes the values 0, 1, 2, …, ∞.
e = 2.71828, the base of natural logarithm
m = mean number of successes in the given time interval

The mean and the variance of the distribution are both ‘m’. Its standard
deviation is \( \sqrt{m} \), and ‘m’ is called the parameter of the Poisson distribution.

Key statistic
The mean of the Poisson distribution is also given by:
m = np
where, ‘p’ is the probability of success and ‘n’ is the number of trials.

It is a unimodal distribution. It is also known as the distribution of ‘rare
events’. It is the limiting form of binomial distribution as ‘n’ tends to infinity
and ‘p’ tends to zero such that ‘np’ remains equal to the constant ‘m’.
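
How close the Poisson approximation is to the exact binomial probabilities can be seen with a small Python sketch. The values n = 200 and p = 0.01 are illustrative assumptions, and the helper names `binomial_pmf` and `poisson_pmf` are chosen only for this example.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Exact binomial probability nCx * q^(n-x) * p^x."""
    return comb(n, x) * (1 - p) ** (n - x) * p ** x


def poisson_pmf(x, m):
    """Poisson probability e^(-m) * m^x / x!."""
    return exp(-m) * m ** x / factorial(x)


n, p = 200, 0.01      # illustrative values: large n, small p
m = n * p             # Poisson parameter m = np

for x in range(5):
    exact = binomial_pmf(x, n, p)
    approx = poisson_pmf(x, m)
    print(x, round(exact, 4), round(approx, 4))
```

For these values the two columns agree to about two decimal places, which is why the Poisson form is preferred when ‘n’ is large and ‘p’ is small.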
6.5.1 Assumptions for applying the Poisson distribution
Poisson distribution can be applied under the following assumptions:
i) The outcome of trial / experiment must be of dichotomous nature
ii) The probability of success must remain the same for trials
iii) The trials should be conducted under identical conditions


iv) The trials should be statistically independent


v) The probability of success should be very small and ‘n’ should be large
such that ‘np’ is a constant ‘m’.
Thus, we have discussed the assumptions for applying the Poisson
distribution.

6.5.2 Real life examples of Poisson variate


Some of the real life examples of Poisson Variate are as follows:
i) Number of accidents in any traffic circle
ii) Number of incoming telephone calls at an exchange per minute
iii) Number of radio-active particles emitted by substances
iv) Number of defects in a product
v) Number of micro-organisms developed during a period

Type i: Finding the probability of events

Solved Problem 10
Suppose two houses in a thousand catch fire in a year and there are 2000
houses in a village. What is the probability that:
i) None of the houses catches fire
ii) At least one house catches fire
iii) Not more than two houses catch fire
Solution
Given the probability of a house catching fire is:
\[ p = \frac{2}{1000} = 0.002, \quad n = 2000 \]
The probability function for the Poisson Distribution is
\[ P(X = x) = \frac{e^{-m} m^{x}}{x!}, \quad \text{where } x = 0, 1, 2, \ldots, \infty \]
∴ m = np = 2000 × 0.002 = 4
\[ \therefore P(X = x) = \frac{e^{-4} 4^{x}}{x!} \]

Therefore, the required probabilities are calculated as follows:


i) The probability that none catches fire is given by:
\[ P(X = 0) = \frac{e^{-m} m^{0}}{0!} = e^{-4} = 0.01832 \]
Therefore, the probability that none of the houses catches fire is
0.01832.
ii) The probability that at least one catches fire is given by:
\[ P(X \ge 1) = 1 - P(X = 0) = 1 - 0.01832 = 0.98168 \]
Therefore, the probability that at least one house catches fire is
0.98168.
iii) The probability that not more than 2 houses catch fire is given by:
\[ P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{e^{-4} 4^{0}}{0!} + \frac{e^{-4} 4^{1}}{1!} + \frac{e^{-4} 4^{2}}{2!} \]
\[ = 0.0183 + (0.0183 \times 4) + (0.0183 \times 8) = 0.2379 \]
Therefore, the probability that not more than 2 houses catch fire is
0.2379.
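
The three probabilities of Solved Problem 10 can be recomputed with a brief Python sketch. The helper name `poisson_pmf` is an assumption made for this illustration. Because the code uses the unrounded value of e^(-4) instead of 0.0183, the last answer comes out near 0.2381 rather than 0.2379.

```python
from math import exp, factorial


def poisson_pmf(x, m):
    """Poisson probability e^(-m) * m^x / x!."""
    return exp(-m) * m ** x / factorial(x)


n, p = 2000, 0.002
m = n * p                                   # m = np = 4

none_catch_fire = poisson_pmf(0, m)         # part (i)
at_least_one = 1 - none_catch_fire          # part (ii)
at_most_two = sum(poisson_pmf(x, m) for x in range(3))   # part (iii)

print(round(none_catch_fire, 5), round(at_least_one, 5), round(at_most_two, 4))
```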
Solved Problem 11
One percent of bulbs manufactured by a firm are expected to be defective.
A carton contains 200 bulbs. Find the probability that the carton contains 3
or more defective bulbs.
Solution
Given that:
The probability that a bulb is defective p = 1% = 0.01

n = 200 ∴ m = np = 200 × 0.01 = 2

The probability function for the Poisson Distribution is
\[ P(X = x) = \frac{e^{-m} m^{x}}{x!}, \quad \text{where } x = 0, 1, 2, \ldots, \infty \]
\[ \therefore P(X = x) = \frac{e^{-2} 2^{x}}{x!} \]
The probability that the carton contains 3 or more defective bulbs is given
by:
\[ P(X \ge 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)] \]
\[ = 1 - \left[\frac{e^{-2} 2^{0}}{0!} + \frac{e^{-2} 2^{1}}{1!} + \frac{e^{-2} 2^{2}}{2!}\right] = 1 - e^{-2} (1 + 2 + 2) \]
= 1 – 0.13534 × 5 = 1 – 0.6767 = 0.3233

Therefore, the probability that the carton contains 3 or more defective bulbs
is 0.3233.
Solved Problem 12
On an average, there are three mistakes on a page of a book. The book
contains 200 pages. What is the probability that a randomly selected page
has exactly one mistake?
Solution
The probability function for the Poisson Distribution is
\[ P(X = x) = \frac{e^{-m} m^{x}}{x!}, \quad \text{where } x = 0, 1, 2, \ldots, \infty \]
Given that m = 3, the required probability is calculated as:
\[ P(X = 1) = \frac{e^{-3} \, 3^{1}}{1!} = 0.04979 \times 3 = 0.14937 \]
Hence, the probability that a randomly selected page has exactly one
mistake is 0.14937

Type ii: Finding the expectations

Solved Problem 13
From the data given in solved problem 12, how many pages would you
expect to be free from mistakes?

Solution
Given that:
m = 3, N = 200
\[ P(X = 0) = e^{-3} = 0.04979 \]
∴ Expected number of pages to be free from mistakes is given by:
\[ N \times P(X = 0) = 200 \times 0.04979 = 9.958 \approx 10 \text{ pages} \]


Expected number of pages to be free from mistakes is approximately 10
pages.

Type iii: Finding the distributions

Solved Problem 14
If X is a Poisson variate such that P(X = 1) = P(X = 2), find P(X = 0).
Solution
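Let ‘m’ be the parameter of the distribution, and P(X = 1) = P(X = 2)
\[ \therefore \frac{e^{-m} m^{1}}{1!} = \frac{e^{-m} m^{2}}{2!} \]
\[ \therefore \frac{m}{1} = \frac{m^{2}}{2} \]
\[ \therefore 2m = m^{2} \;\Rightarrow\; m = 2 \quad (\text{since } m > 0) \]
\[ \therefore P(X = 0) = e^{-2} = 0.13534 \]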
Solved Problem 15
The following data relates to the number of mistakes in each page of a book
containing 400 pages.
Table 6.2: Data relating to the number of mistakes in each page of a book

No. of mistakes per page:     0     1     2     3     4    Total
No. of pages:               138   161    69    27     5     400


Fit a Poisson distribution to the data. Obtain the theoretical frequencies.

Solution
Let ‘X’ denote the number of mistakes per page. Then, X is a Poisson
variate. The parameter is:
\[ m = \bar{X} = \frac{\sum f x}{N} = \frac{138 \times 0 + 161 \times 1 + 69 \times 2 + 27 \times 3 + 5 \times 4}{400} = \frac{400}{400} = 1 \]
The probability function for the Poisson Distribution is
\[ P(X = x) = \frac{e^{-m} m^{x}}{x!}, \quad \text{where } x = 0, 1, 2, \ldots, \infty \]
\[ \therefore P(X = x) = \frac{e^{-1} 1^{x}}{x!} \]
Table 6.2a: Calculation of expected frequencies

No. of mistakes    Probability function            N      Frequency function
per page (x)       P[X = x]                               N x P[X = x]
0                  e^{-1} 1^0 / 0! = 0.3679        400    400 x 0.3679 = 147.16
1                  e^{-1} 1^1 / 1! = 0.3679        400    400 x 0.3679 = 147.16
2                  e^{-1} 1^2 / 2! = 0.1839        400    400 x 0.1839 = 73.56
3                  e^{-1} 1^3 / 3! = 0.0613        400    400 x 0.0613 = 24.52
4                  e^{-1} 1^4 / 4! = 0.0153        400    400 x 0.0153 = 6.12
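
The fitting procedure of Solved Problem 15 (estimate ‘m’ by the sample mean, then multiply each Poisson probability by N) can be sketched in Python. The dictionary name `observed` and the other variable names are assumptions made for this illustration; because the code does not round the probabilities to four places, its frequencies may differ from the table in the second decimal.

```python
from math import exp, factorial

# Observed frequencies: number of mistakes per page -> number of pages
observed = {0: 138, 1: 161, 2: 69, 3: 27, 4: 5}

N = sum(observed.values())                          # total pages
m = sum(x * f for x, f in observed.items()) / N     # sample mean = sum(fx) / N

# Theoretical (expected) frequency for each x: N * e^(-m) * m^x / x!
expected = {x: N * exp(-m) * m ** x / factorial(x) for x in observed}

for x in observed:
    print(x, observed[x], round(expected[x], 2))
```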

Solved Problem 16
The average number of telephone calls booked at an exchange between
10-00 A.M. and 10-10 A.M. is 4. Find the probability that on a randomly
selected day 2 or more calls are booked between 10-00 A.M. and
10-10 A.M. On how many days of a year would you expect booking of
2 or more calls during that time gap?
Solution
Let X: number of telephone calls booked at the exchange during 10-00 A.M.
to 10-10 A.M. The mean is m=4.

The probability function for the Poisson Distribution is
\[ P(X = x) = \frac{e^{-m} m^{x}}{x!}, \quad \text{where } x = 0, 1, 2, \ldots, \infty \]
\[ \therefore P(X = x) = \frac{e^{-4} 4^{x}}{x!} \]
P [2 or more calls] = 1 – P [less than 2 calls]
\[ = 1 - [P(X = 0) + P(X = 1)] = 1 - \left[\frac{e^{-4} 4^{0}}{0!} + \frac{e^{-4} 4^{1}}{1!}\right] \]
\[ = 1 - e^{-4} \left[\frac{4^{0}}{0!} + \frac{4^{1}}{1!}\right] \]
= 1 – 0.0183 [1 + 4]
= 1 – 0.0915 = 0.9085
A year has 365 days. Out of these N = 365 days, the number of days on
which there will be 2 or more calls is:
N x P[ 2 or more calls] = 365 x 0.9085 = 332 days

Solved Problem 17
2 percent of the fuses manufactured by a firm are expected to be defective.
Find the probability that a box containing 200 fuses contains
i) at least one defective fuse
ii) 3 or more defective fuses.


Solution
2 percent of the fuses are defective. Therefore, the probability that a fuse is
defective is p = 2/100 = 0.02, n = 200
Let ‘X’ denote the number of defective fuses in the box of 200 fuses.
Then, X is B (n = 200, p = 0.02) i.e., binomial with parameters n and p.
Here, p is very small and n is very large. Therefore, X can be treated as
Poisson variate with parameter m=np = 200 x 0.02 = 4.

The probability function for the Poisson Distribution is
\[ P(X = x) = \frac{e^{-4} 4^{x}}{x!}, \quad \text{where } x = 0, 1, 2, \ldots, \infty \]
P [box has defective fuses] = 1 – P [no defective fuses]
\[ = 1 - P(X = 0) = 1 - \frac{e^{-4} 4^{0}}{0!} \]
= 1 – 0.0183 = 0.9817
P [3 or more defective fuses] = 1 – P [less than 3 defective fuses]
\[ = 1 - [P(X = 0) + P(X = 1) + P(X = 2)] = 1 - \left[\frac{e^{-4} 4^{0}}{0!} + \frac{e^{-4} 4^{1}}{1!} + \frac{e^{-4} 4^{2}}{2!}\right] \]
\[ = 1 - e^{-4} [1 + 4 + 8] \]
= 1 – 0.0183 x 13
= 1 – 0.2379 = 0.7621
Solved Problem 18
The probability that a razor blade manufactured by a firm is defective is
1/500. Blades are supplied in packets of 5 each. In a lot of 10,000 packets,
how many packets would:
i) Be free from defective blades?
ii) Contain exactly one defective blade? (Given \( e^{-0.01} = 0.99 \))

Solution
Let ‘X’ be the number of defective blades in a packet of 5 blades.
Then, ‘X’ is B (n = 5, p = 1/500)
Since p is very small, X is treated as a Poisson variate with parameter
\[ m = np = 5 \times \frac{1}{500} = 0.01 \]
\[ P(X = x) = \frac{e^{-0.01} (0.01)^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots \]
i) \( P[\text{no defective blades}] = P(X = 0) = \frac{e^{-0.01} (0.01)^{0}}{0!} = 0.99 \)
The number of packets which will be free of defective blades is
N x P [no defective blades] = 10000 x 0.99 = 9900

ii) \( P[\text{one defective blade}] = P(X = 1) = \frac{e^{-0.01} (0.01)^{1}}{1!} = 0.0099 \)
The number of packets which will contain exactly one defective blade is
N x P [one defective blade] = 10000 x 0.0099 = 99.
Solved Problem 19
On an average, a typist makes three mistakes while typing one page. What is
the probability that a randomly observed page is free of mistakes? Among 200
pages, in how many pages would you expect mistakes?
Solution
Let X: number of mistakes in a page.
Then, X is a Poisson variate with parameter m=3.

The probability function for the Poisson Distribution is
\[ P(X = x) = \frac{e^{-3} 3^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots \]
\[ P[\text{Page is free of mistakes}] = P(X = 0) = \frac{e^{-3} 3^{0}}{0!} = e^{-3} = 0.0498 \]
P [Page has mistakes] = 1 – P [Page has no mistakes] = 1 – 0.0498 = 0.9502

Among 200 pages, the expected number of pages containing mistakes is
N x P [Page has mistakes] = 200 x 0.9502 = 190.04, that is, about 190 pages

Self Assessment Questions


3. State whether the following statements are ‘True’ or ‘False’
i) ‘X’ is a Poisson variate if p < 0.1 and n > 10.
ii) Poisson distribution is a unimodal distribution.

Activity 1
1. In a binomial distribution the mean is 6 and the variance is 1.5.
Find (i) P[X=2] and (ii) P[X≤2].
2. In a Poisson distribution P[X=2] = P[X=3]. Find P[X=4].

6.5.4 Case study on Poisson distribution

Case Study 2
Read the information and find the required probability.
On average, four pigeons hit the India Gate and are killed each week.
Ramesh, an official of archaeological survey of India, requested the
Central Government to provide funds to buy equipments to scare pigeons
away from the monument. The concerned official from the Central
Government replied that unless the probability of more than two birds
being killed in any week exceeds 0.7, funds cannot be allocated.
Calculate and find out if the Central Government allocates the funds.

6.6 Normal Distribution


So far in this unit, you have studied only the Discrete probability
distributions. Now, in this section, you will study about the Continuous
probability distributions. The Normal distribution is an important continuous
probability distribution.
A probability distribution which has the following probability density function
(p.d.f) is called Normal distribution.
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty, \; \sigma > 0, \; -\infty < \mu < \infty \]

Here, the variable X is continuous and it is called Normal variate.


Note 1: Normal distribution has two parameters namely, µ and σ.
(Here, π = 3.14 and e = 2.718)
Note 2: Normal distribution has Mean E(X) = µ, Variance V(X) = σ² and
S.D.(X) = σ.
Note 3: A normal variate with parameters µ and σ is denoted by N(µ, σ²)
Note 4: The normal probability density function (p.d.f.) can be written as:
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty, \; \sigma > 0, \; -\infty < \mu < \infty \]
The continuous random variables which can take all values in any given
interval such as the measure of heights, weights, temperatures, amount of
rainfall, etc. are all the examples of Normal random variables.
The following are some of the characteristics of Normal distribution:
1. Normal distribution is a Continuous probability distribution
2. Its probability density function is given by:
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty, \; \sigma > 0, \; -\infty < \mu < \infty \]
3. Its mean is µ and standard deviation is σ, where µ and σ are the
parameters of the distribution
4. It is a bell-shaped curve and is symmetric about its mean, as depicted
in figure 6.4

Fig. 6.4: Normal Distribution Curve

a. It is symmetrical (Non-skew). That is β1 = 0
b. The mean, median and mode are equal
5. The Mean divides the curve into two equal portions
6. Its quartile deviation, Q.D. = 2/3 σ (approximately)
7. Its mean deviation, M.D. = 4/5 σ (approximately)
8. The X-axis is an asymptote to the curve
[An asymptote is a straight line that the curve approaches ever more closely and meets only at infinity]
9. The points of inflexion occur at µ ± σ
10. It is a unimodal distribution
11. Mean, Median and Mode coincide
12. The area under normal curve within certain limits is depicted in
table 6.3. The graphical representation of the table 6.3 is depicted in
figure 6.5.
Table 6.3: Area under the Normal Curve for Various Values of ‘µ’ and ‘σ’
Limits          Area %
µ ± σ           68.27
µ ± 1.96σ       95
µ ± 2σ          95.4
µ ± 3σ          99.7

Fig. 6.5: Areas under the Normal Distribution Curve
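
The areas listed in table 6.3 can be checked numerically. The sketch below is only an illustration: it relies on the identity that, for a normal variate, the area within µ ± kσ equals erf(k/√2), where `erf` is the error function from Python's standard `math` module. The function name `area_within` is an assumption made for this example.

```python
from math import erf, sqrt


def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return erf(k / sqrt(2))


for k in (1, 1.96, 2, 3):
    print(f"mu ± {k} sigma: {100 * area_within(k):.2f}%")
```

The result does not depend on the particular values of µ and σ, which is why the table can be stated once for every normal distribution.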


Key statistic
The normal distribution is the limiting form of binomial distribution.

6.6.1 Standard normal variate


A Normal variate with mean µ = 0 and standard deviation σ = 1 is
called Standard normal variate. It is denoted by Z. Its probability density
function is given by:
\[ f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{z^{2}}{2}}, \quad -\infty < z < \infty \]
The graph of standard normal distribution is depicted in the figure 6.6.

Fig. 6.6: Standard Normal Distribution

The shaded area in figure 6.6 depicts the probability that the variate takes a
value between 0 and z. This area can be read from the table of areas under
standard normal curve. Corresponding to positive z, the area from 0 to z can
be read from this table.
Let ‘X’ be a normal variate with mean µ and standard deviation σ.
Then \( Z = \frac{X - \mu}{\sigma} \) is a standard normal variate.
Therefore, to find any probability regarding X, the standard normal variate
can be made use of.
Note:
1. The standard normal variate (SNV) is denoted by N (0,1).
2. The standard normal table values are given in annexure (Table 1)


Key statistic
Any Normal distribution can be converted into a Standard normal
distribution by the transformation:
\[ Z = \frac{X - \mu}{\sigma} \]
where ‘Z’ is called the Standard normal variate, which gives the number of
standard deviations between X and the mean of the distribution,
µ is the mean of the distribution,
σ is the standard deviation of the distribution, and
Z varies from −∞ to +∞.
The mean of its distribution is ‘0’ and standard deviation is ‘1’. Statisticians
have developed a Standard normal table. The table gives the probability that
the standard normal variate lies between ‘0’ and ‘z’. Therefore, to solve any
problem with a normal distribution, we convert it to the Standard normal
distribution by calculating ‘z’ and then refer to the table, which gives the area
under the normal curve between the mean and any value of the normally
distributed random variable.

Key statistic
The mean of Standard normal distribution is ‘0’ and the standard
deviation is ‘1’.

Solved Problem 20
The weight of Cocavito packs packed by the filling machine follows a normal
distribution with mean weight of 500 gm and standard deviation of 10 gm. A
pack is selected at random. What is the probability that:
i) The pack’s weight will exceed 515 gm?
ii) The pack’s weight will lie within 480 gm to 520 gm?
iii) The pack’s weight will be less than 480 gm or greater than 520 gm?
If 10,000 packs are supplied, how many packs will be rejected, given that
480 gm and 520 gm are the lower and upper limits for acceptance?
Solution
X is a normal variate with parameters µ = 500 and σ = 10
Therefore, \( Z = \frac{X - \mu}{\sigma} = \frac{X - 500}{10} \) is a standard normal variate.
i) The probability that the pack’s weight will exceed 515 gm is given by:
\[ P(X > 515) = P\left(\frac{X - 500}{10} > \frac{515 - 500}{10}\right) = P(Z > 1.5) \]
P(Z > 1.5) = Area from 1.5 to ∞
= [Area from 0 to ∞] – [Area from 0 to 1.5]
= 0.5 – 0.4332 = 0.0668

Fig. 6.7: Normal Curve

Therefore, the probability that the packs weight will exceed 515 gm is
0.0668.
ii) The probability that the pack’s weight lies between 480 gm and 520 gm, as
depicted in figure 6.8, is given by:
\[ P(480 < X < 520) = P\left(\frac{480 - 500}{10} < \frac{X - 500}{10} < \frac{520 - 500}{10}\right) = P(-2 < Z < 2) \]
= [Area from -2 to 0] + [Area from 0 to 2]
= [Area from 0 to 2] + [Area from 0 to 2]
= 0.4772 + 0.4772 = 0.9544
(since the normal distribution is symmetrical, the area from -2 to 0 is the
same as the area from 0 to 2)


Fig. 6.8: Normal Curve

Therefore the probability that the pack’s weight lie between 480 gm to 520
gm is 0.9544.
iii) The probability of acceptance is as found in (ii),
\[ P(480 < X < 520) = 0.9544 \]
Fig. 6.9: Normal Curve

If the weight lies outside these values then it will be rejected.
∴ The probability of rejection = 1 – 0.9544 = 0.0456
The number of packets that will be rejected is given by N x P.
N x P = 10000 x 0.0456 = 456
The number of packets that will be rejected is 456.
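
As a cross-check on Solved Problem 20, the same probabilities can be obtained without a printed table. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the variable names are chosen for this illustration. Because it does not round the tabulated areas to four places, the expected number of rejected packs comes out near 455 instead of 456.

```python
from statistics import NormalDist

weights = NormalDist(mu=500, sigma=10)      # pack weight in gm

p_over_515 = 1 - weights.cdf(515)                     # part (i)
p_accepted = weights.cdf(520) - weights.cdf(480)      # part (ii)
p_rejected = 1 - p_accepted                           # part (iii)

packs = 10_000
expected_rejections = packs * p_rejected

print(round(p_over_515, 4), round(p_accepted, 4), round(expected_rejections))
```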
Solved Problem 21
X is a Normal variate with mean 42 and standard deviation 4. Find the
probability that a value taken by X is
(i) less than 50 (ii) greater than 50
(iii) less than 40 (iv) greater than 40
(v) between 40 and 44 (vi) between 37 and 41

Solution
X is a normal variate with parameters, µ = 42 and σ = 4.

Therefore, \( Z = \frac{X - \mu}{\sigma} = \frac{X - 42}{4} \) is a Standard normal variate.

(i) \( P(X < 50) = P\left(\frac{X - 42}{4} < \frac{50 - 42}{4}\right) = P(Z < 2) \)

Fig. 6.10: Normal curve

P (Z< 2 )= Standard normal area from -∞ to 2


= [area from -∞ to 0] + [area from 0 to 2]
= 0.5 + 0.4772 (from the table)
= 0.9772.

(ii) \( P(X > 50) = P\left(\frac{X - 42}{4} > \frac{50 - 42}{4}\right) = P(Z > 2) \)

Fig. 6.11: Normal curve


P (Z>2) = area from 2 to ∞
= [area from 0 to ∞] - [area from 0 to 2]
= 0.5 - 0.4772 (from the table)
= 0.0228.

(iii) \( P(X < 40) = P\left(\frac{X - 42}{4} < \frac{40 - 42}{4}\right) = P[Z < -0.5] \)

Fig. 6.12: Normal curve

P [Z < -0.5] = area from -∞ to -0.5
= area from 0.5 to ∞
= [area from 0 to ∞] - [area from 0 to 0.5]
= 0.5 - 0.1915 = 0.3085

(iv) \( P(X > 40) = P\left(\frac{X - 42}{4} > \frac{40 - 42}{4}\right) = P[Z > -0.5] \)

Fig. 6.13: Normal curve

P[Z > -0.5] = area from -0.5 to ∞
= [area from (-0.5) to 0] + [area from 0 to ∞]
= [area from 0 to 0.5] + [area from 0 to ∞]
= 0.1915 + 0.5 = 0.6915

(v) \( P(40 < X < 44) = P\left(\frac{40 - 42}{4} < \frac{X - 42}{4} < \frac{44 - 42}{4}\right) = P[-0.5 < Z < 0.5] \)


Fig. 6.14: Normal curve

P[-0.5 < Z < 0.5] = area from -0.5 to 0.5


= [area from (-0.5) to 0] + [area from 0 to 0.5]
= [area from 0 to 0.5] + [area from 0 to 0.5]
= 0.1915 + 0.1915 = 0.3830

(vi) \( P(37 < X < 41) = P\left(\frac{37 - 42}{4} < \frac{X - 42}{4} < \frac{41 - 42}{4}\right) = P[-1.25 < Z < -0.25] \)

Fig. 6.15: Area from – 1.25 to – 0.25

P[-1.25 < Z <-0.25] = area from –1.25 to -0.25


= area from 0.25 to 1.25
= [area from 0 to 1.25] - [area from 0 to 0.25]
= 0.3944 – 0.0987 = 0.2957
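
The table look-ups used in Solved Problem 21 can be imitated in code. The sketch below is an illustration under the assumption that the area from 0 to z under the standard normal curve equals 0.5 × erf(z/√2); the function name `area_0_to_z` is hypothetical. It repeats part (vi) and, since it does not round the intermediate areas, gives about 0.2956 rather than 0.2957.

```python
from math import erf, sqrt


def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * erf(z / sqrt(2))


mu, sigma = 42, 4

# Part (vi): P(37 < X < 41)
z_low = (37 - mu) / sigma       # -1.25
z_high = (41 - mu) / sigma      # -0.25

# By symmetry, the area from -1.25 to -0.25 equals the area from 0.25 to 1.25
probability = area_0_to_z(abs(z_low)) - area_0_to_z(abs(z_high))

print(round(area_0_to_z(2), 4), round(area_0_to_z(0.5), 4), round(probability, 4))
```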
Solved Problem 22
Heights of students are normally distributed with mean 165 cm and standard
deviation 5 cm. Find the probability that the height of a student is
(i) greater than 177 cm and (ii) less than 162 cm.


Solution
Let ‘X’ denote the height of students. Then, X is a Normal variate with
parameters µ = 165 cm and σ = 5 cm.

\( Z = \frac{X - \mu}{\sigma} = \frac{X - 165}{5} \) is a Standard normal variate.
i) Probability that the student is more than 177 cm tall is
\[ P(X > 177) = P\left(\frac{X - 165}{5} > \frac{177 - 165}{5}\right) = P[Z > 2.4] \]

Fig. 6.16: Normal curve


P [Z > 2.4] = area from 2.4 to ∞
= [area from 0 to ∞] – [area from 0 to 2.4]
= 0.5 – 0.4918 = 0.0082
Probability that the student is more than 177 cm tall is 0.0082
ii) Probability that the student is less than 162 cm tall is
\[ P(X < 162) = P\left(\frac{X - 165}{5} < \frac{162 - 165}{5}\right) = P[Z < -0.6] \]
P[Z < -0.6] = area from -∞ to -0.6
= area from 0.6 to ∞
= [area from 0 to ∞] – [area from 0 to 0.6]
= 0.5 – 0.2258 = 0.2742


Fig 6.17: Normal curve

Probability that the student is less than 162 cm tall is 0.2742


Solved Problem 23
Mean life of electric bulbs manufactured by a firm is 1200 hours. The
standard deviation is 200 hours.
i) In a lot of 10,000 bulbs, how many bulbs are expected to have life
1050 hours or more?
ii) What is the percentage of bulbs which are expected to fail before 1050
hours of service?
Solution
Let ‘X’ denote the life of the bulbs. Then, X is a normal variate with
parameters µ = 1200 hrs and σ = 200 hrs
\( Z = \frac{X - \mu}{\sigma} = \frac{X - 1200}{200} \) is a Standard normal variate
(i) Probability that life of a bulb is 1050 hours or more is
\[ P(X \ge 1050) = P\left(\frac{X - 1200}{200} \ge \frac{1050 - 1200}{200}\right) = P[Z \ge -0.75] \]

Fig 6.18: Normal curve


P[ Z ≥ -0.75] = [area from -0.75 to 0] + [area from 0 to ∞]
= [area from 0 to 0.75 ] + [area from 0 to ∞]
=0.2734 + 0.5 = 0.7734
(since the normal distribution is symmetrical, the area from -0.75 to 0 is the
same as the area from 0 to 0.75)
In a lot of N = 10,000 bulbs, expected number of bulbs with life 1050
hours or more is N x P [X ≥ 1050] = 10,000 x 0.7734 = 7734
(ii) Probability that life of a bulb is 1050 hours or less is
\[ P(X \le 1050) = P\left(\frac{X - 1200}{200} \le \frac{1050 - 1200}{200}\right) = P(Z \le -0.75) \]

Fig 6.19: Normal curve

P (Z ≤ -0.75) = [ area from -∞ to -0.75]


= [ area from 0.75 to ∞]
= [area from 0 to ∞] – [ area from 0 to 0.75]
= 0.5 - 0.2734 = 0.2266
The percentage of bulbs with life less than 1050 hours is
100 x P(X ≤ 1050) = 100 x 0.2266 = 22.66%

Solved Problem 24
The mean and standard deviation of marks scored by a group of students in
an examination are 47 and 10 respectively. If only 20% of the students have
to be promoted, what should be the marks limit for promotion?
Solution
Let ‘X’ denote marks. Then, X is a Normal variate with parameters µ = 47
and σ = 10.
\( Z = \frac{X - \mu}{\sigma} = \frac{X - 47}{10} \) is a Standard normal variate.

Let ‘a’ be the marks above which if a student scores he would be promoted.
Then, since only 20% of the students have to be promoted, the probability of
a student getting promotion should be 20/100 = 0.2
Therefore,
\[ P(X \ge a) = 0.2 \]
\[ P\left(\frac{X - 47}{10} \ge \frac{a - 47}{10}\right) = 0.2 \]
\[ P\left(Z \ge \frac{a - 47}{10}\right) = 0.2 \]
Fig. 6.20: Normal curve

And so, P [Z ≥ z] = 0.2 where \( z = \frac{a - 47}{10} \)
That is, [area from z to ∞] = 0.2
That is, [area from 0 to z] = 0.3
From the table of areas, the value of z for which [area from 0 to z] = 0.3 is
z = 0.84. Therefore, z = 0.84.
And so,
\[ \frac{a - 47}{10} = 0.84 \]
a – 47 = 8.4
a = 55.4
Thus, the marks limit for promotion is a = 55.4
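
Solved Problem 24 reads the table in reverse: it starts from a probability and looks up the corresponding value of the variate. A minimal sketch of the same step, assuming Python 3.8 or later and its standard `statistics.NormalDist` class, is given below; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

marks = NormalDist(mu=47, sigma=10)

promoted_share = 0.20
# Marks limit 'a' such that P(X >= a) = 0.20, i.e. P(X <= a) = 0.80
cutoff = marks.inv_cdf(1 - promoted_share)

print(round(cutoff, 1))
```

The method `inv_cdf` takes the area to the left of the required point, so the promoted share of 0.20 has to be converted to 0.80 before the look-up.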

Self Assessment Questions


4. State whether the following statements are ‘True’ or ‘False’.
i) Quartile deviation of normal distribution is 4/5 σ.
ii) Mean and standard deviation of Standard normal distribution are
‘1’ and ‘0’.
iii) Mean, Median and Mode coincide in a Normal distribution.

6.7 Summary
Let us recapitulate the important concepts discussed in this unit:
• Quick analysis of observed data can be done if it is identified with a
theoretical distribution.
• The probabilities associated with the random variate of the distribution help
us to know the chances of occurrence of events within specified
values.
• Binomial distribution is applied when you run a finite series of
independent Bernoulli trials and the probability of success remains the
same for every trial. In each trial, ‘1’ represents the occurrence of
success and ‘0’ represents the occurrence of failure.
• Poisson distribution is a unimodal distribution with mean ‘m’ and
standard deviation $\sqrt{m}$. This distribution is the limiting form of the
binomial distribution as ‘n’ tends to infinity and ‘p’ tends to zero while
np = m remains finite.
• Normal distribution is a continuous probability distribution with
probability density function f(x) given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty,\ \sigma > 0,\ -\infty < \mu < \infty$$

• Any normal distribution can be converted into the standard normal
distribution with the transformation $Z = \frac{X - \mu}{\sigma}$, where ‘Z’ is
called the Standard normal variate.


6.8 Glossary
Bernoulli variate: A random variable, which assumes values ‘1’ and ‘0’ with
probabilities ‘p’ and ‘q’, (where, q = 1-p) is called Bernoulli variable.
Binomial distribution: A probability distribution which has the following
probability mass function (p.m.f) is called binomial distribution.

$$P(X = x) = {}^{n}C_{x}\, q^{n-x} p^{x}, \quad x = 0, 1, 2, \ldots, n$$

Continuous random variable: A random variable is continuous if it is
capable of taking all values in an interval of real numbers.
Discrete random variable: A Random variable is discrete when the
number of possible outcomes in a random experiment is countable.
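Normal distribution: A probability distribution which has the following
probability density function (p.d.f) is called Normal distribution.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty,\ \sigma > 0,\ -\infty < \mu < \infty$$

Poisson distribution: The Poisson distribution is obtained as the limiting form
of the Binomial distribution when the number of trials ‘n’ is very large and the
probability of success ‘p’ is very small, such that np = m remains finite.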
Probability distributions: The listing of all the probable outcomes in a
random experiment along with their respective probabilities is called the
Probability distribution.
Random variables: A real valued function that associates result or outcome
of an experiment with a real number is known as random variable.
Standard normal distribution: A normal variate with mean µ=0 and
standard deviation σ =1 is called Standard normal variate.

6.9 Terminal Questions


1. What are the assumptions under which binomial distribution is applied?
2. A shopkeeper notes that the probability that a customer will buy his
articles is 0.4. Six customers enter his shop in an hour. What is the
probability that:
i) At least one customer bought something?

ii) Exactly two bought something?


iii) None bought anything?
3. Find P(X = 2), given mean and standard deviation of the binomial
distribution are 4 and √3 respectively.

4. Give real life examples of Poisson variate.


5. If the first two terms of a Poisson distribution are 150 and 90,
find P(X = 0).
6. The average number of phone calls at a booth per hour is 2. What is
the probability that there will be exactly one call in an hour?
7. The probability that a firm’s product will succeed its competitor’s
product is 2/3. If in a month it has introduced 4 products, what is the
probability that:
i) Two products succeed the competitor’s product?
ii) All products succeed the competitor’s product?
8. Mean life of electric bulbs produced by a company is 1500 hours with
a standard deviation of 300 hours. Assuming that the life of bulbs
follow normal distribution, what is the probability that a randomly
selected bulb will:
i) Fail within 1200 hours?
ii) Survive between 1350 and 1650 hours?
iii) Survive beyond 1950 hours?
9. Write short notes on Normal distribution.
10. The height of students follows Normal distribution. 15% of them have
height less than 150 cm and 10 % have height above 180 cm. Find the
mean and standard deviation of the distribution?

6.10 Answers

Self Assessment Questions


1. i) False, ii) False, iii) True
2. i) False, ii) True, iii) True, iv) False


3. i) True, ii) True


4. i) False, ii) False, iii) True

Terminal Questions
1. Refer section 6.4.1.
2. i) 14896/15625 ii) 4860/15625 iii) 729/15625
3. 16C2 (0.75)^14 (0.25)^2 ≈ 0.1336
4. Refer section 6.5.2.
5. e^(-0.6) = 0.5488
6. 0.2707
7. i) 8/27 ii) 16/81
8. i) 0.1587 ii) 0.3830 iii) 0.0668
9. Refer section 6.6.
10. Mean ≈ 163.4 cm, S.D ≈ 12.9 cm

Case Study Solution


Case study 1
The required probabilities are:
i) 0.1382
ii) 0.5444
iii) 0.0467
Case study 2
Yes, the Central Government will allocate funds as the probability of more
than two birds being killed in any week is 0.73 which is greater than 0.7.

Activity Solution
Solution 1:
Let n and p be the parameters. Then,
Mean = np = 6
Variance = npq = 1.5

$$\frac{\text{Variance}}{\text{Mean}} = \frac{npq}{np} = \frac{1.5}{6} = \frac{1}{4}$$

Therefore, q = 1/4 and p = 3/4
Therefore, Mean = n × 3/4 = 6
That is, n = 24/3 = 8
Therefore, by binomial distribution,
P(X = x) = 8Cx (3/4)^x (1/4)^(8-x), x = 0, 1, 2, ..., 8
(i) P[X = 2] = 8C2 (3/4)^2 (1/4)^6 = 0.003845
(ii) P[X ≤ 2] = P(X = 0) + P(X = 1) + P(X = 2)
= 8C0 (3/4)^0 (1/4)^8 + 8C1 (3/4)^1 (1/4)^7 + 8C2 (3/4)^2 (1/4)^6
= 277/65536 = 0.004227
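
The arithmetic of Solution 1 can be verified with exact fractions. The sketch below assumes only the given mean (6) and variance (1.5); the helper `binomial_pmf` is an illustrative name, not part of the text.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variate with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

mean, variance = Fraction(6), Fraction(3, 2)

q = variance / mean      # npq / np = q
p = 1 - q
n = int(mean / p)        # np = mean, so n = mean / p

p_exactly_two = binomial_pmf(2, n, p)
p_at_most_two = sum(binomial_pmf(x, n, p) for x in range(3))

print(n, p, q)
print(p_exactly_two, float(p_exactly_two))
print(p_at_most_two, float(p_at_most_two))
```

Using `Fraction` keeps every step exact, so the final probability can be compared directly with 277/65536.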
Solution 2:
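Let ‘m’ be the parameter.
Here, P[X = 2] = P[X = 3]

$$\frac{e^{-m} m^{2}}{2!} = \frac{e^{-m} m^{3}}{3!}$$

Cancelling $e^{-m} m^{2}$ from both sides,

$$\frac{1}{2} = \frac{m}{6}$$

$$m = 3$$

The probability mass function of the Poisson distribution is

$$P(X = x) = \frac{e^{-3}\, 3^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

$$P[X = 4] = \frac{e^{-3}\, 3^{4}}{4!} = \frac{0.04979 \times 81}{24} = 0.1680$$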
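
As a numerical check, the sketch below solves the condition P(X = 2) = P(X = 3) directly: cancelling e^(-m) m^2 from both sides leaves 1/2! = m/3!, that is, m = 3!/2! = 3. The function name `poisson_pmf` is illustrative only.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """P(X = x) for a Poisson variate with mean m."""
    return exp(-m) * m**x / factorial(x)

# P(X = 2) = P(X = 3)  =>  1/2! = m/3!  =>  m = 3!/2!
m = factorial(3) / factorial(2)

p2 = poisson_pmf(2, m)
p3 = poisson_pmf(3, m)
p4 = poisson_pmf(4, m)

print(m, round(p2, 4), round(p3, 4), round(p4, 4))
```

The equality of `p2` and `p3` confirms that the value of m satisfies the given condition before P(X = 4) is computed.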

6.11 Case Study


Case study 1
Bestvision TV
The life time of picture tubes of ‘Bestvision’ TV is normally distributed with a
mean life of 10 years and a standard deviation of 2 years. If the
manufacturer guarantees free replacement of tubes that fail before 5 years,
what proportion of tubes sold will have to be replaced free of cost? If the
intention is not to replace more than 2% of the failed tubes, what period of
guarantee should be set? Given the cost of replacing a picture tube is Rs
5,000, and the cost for increasing the life of a tube by one year is Rs 1,000,
discuss the options of:
i) Increasing life by 1 year
ii) Increasing guarantee period by 1 year
iii) Revising the replacement policy so that not more than 1% of failed
tubes are replaced free of cost

Case study 2
Credit Cards
As an incentive for customers to spend more money on its credit card, a
bank has decided to award high spending customers with an offer of free
stay of 3 days at one of the holiday resorts in India. However, it doesn’t want
to give the offer to more than 1% of customers. If the mean spending per
customer is Rs. 20,000 with a standard deviation of Rs 5,000, what amount
of spending should the bank specify as a cut-off? However, at the end
of the first month, it was found that 5% of customers qualified for the offer.
What could have happened? Assuming that the standard deviation has not
changed, calculate the new mean spending per customer, and assuming the
mean has not changed, calculate the new standard deviation.
(Source: TN Srivastava & Shailaja Rejo (2008) Statistics for Management 5th
edition, TMH)



Statistics for Management Unit 7

Unit 7 Sampling and Sampling Distributions


Structure:
7.1 Introduction
Objectives
Relevance
Statistics in practice
7.2 Population and Sample
Universe or Population
Types of Population
Sample
7.3 Advantages of Sampling
7.4 Sampling Theory
Law of statistical regularity
Principle of inertia of large numbers
Principle of persistence of small numbers
Principle of validity
Principle of optimisation
7.5 Terms Used in Sampling Theory
7.6 Errors in Statistics
Measures of statistical errors
7.7 Types of Sampling
Probability sampling
Non-probability sampling
7.8 Determination of Sample Size
7.9 Central Limit Theorem
7.10 Summary
7.11 Glossary
7.12 Terminal Questions
7.13 Answers
7.14 Case Study

7.1 Introduction
In the previous unit, ‘Theoretical Probability Distributions’, you studied
both discrete and continuous random variables along with their
probability distributions. You studied the
Binomial, Poisson and Normal distributions, which were explained with the
help of solved problems.
In this unit, we will discuss statistical sampling and sampling
designs. You will study the laws of sampling theory and the different types
of sampling. We will end this unit with an important theorem called the
central limit theorem.
In different fields of human activity, the decision making process is based on
the observation of a few units which form a portion of the total population.
The process of studying only a portion of the population and making
decisions involves risk, the risk of making wrong decisions. This unit deals
with the various techniques of drawing samples from the population.
Evaluation of risk will be discussed in unit 9, ‘Testing of Hypothesis in case
of Large and Small Samples’.
When the sampling design is not done properly, the estimates or inferences
drawn from the sample can go wrong, and managerial decisions based
on the wrong conclusions may lead to loss of time, money and human
resources. This may badly affect the reputation of the organisation. Hence,
the risks involved in using an incorrect sampling design are of primary
concern to investigators.
Objectives:
After studying this unit, you should be able to:
• differentiate between a population and a sample
• define the laws of sampling theory
• identify the various sampling errors
• recognise the types of sampling available
• determine the sample size
• define the central limit theorem
7.1.1 Relevance
The new chairman of the Flourish bank continued with the existing system
of conducting quarterly meetings with a selected group of regional and
branch managers who were deemed to be an elite band by the previous
management. However, after a few meetings, he observed that the quality
of ideas at the meetings was limited because the same executives
participated in every meeting. He was also informed by his secretary that
there was general discontent among the other regional and branch
managers, since they felt that they were being denied the opportunity of
meeting the chairman and sharing their ideas for growth with him. Consequently,
he asked the head of the department for Management Information Systems to
classify the regions into three categories and the branches into four categories.
He then developed a system of selecting the participants for the meetings
from each category of regional and branch managers on a random basis, in
such a manner that no regional or branch manager was invited again until
all others had been invited to the meetings. Thus, with the help of random
sampling, coupled with other managerial and behavioural initiatives, the
chairman was able to create an atmosphere of involvement, trust and
commitment. This contributed significantly to the accelerated growth of the
bank.
(Source: TN Srivastava & Shailaja Rejo(2008) Statistics for Management 5th edition,
TMH.)

7.1.2 Statistics in practice


Mead Corporation
Dayton, Ohio
Mead Corporation is a diversified paper and forest products company. Over
16,000 employees work for the corporation. It has a presence
in 32 countries, with customers located in 98 countries. Mead holds a leading
position in paper production with an annual capacity of 1.8 million tonnes.
The company’s product lines include textbook paper, glossy
magazine paper, beverage packaging systems and office products. Mead’s
internal consulting group uses sampling to provide a variety of information
that enables the company to obtain significant productivity benefits and
remain competitive.
For example, Mead maintains large woodland holdings, which provide the trees
that serve as raw material for many of the company’s products.
Managers need reliable and accurate information about the
timberlands and forests to evaluate the company’s ability to meet its future
raw material needs. What is the present volume in the forests? What is the
past growth of the forests? What is the projected future growth of the
forests? With the answers to these essential questions, Mead’s managers
can develop plans for the future, including long-term planting and harvesting
schedules for the trees.
The sample data collected from the forests are entered into the company’s continuous forest
inventory (CFI) computer system. Reports from the CFI system include a
number of frequency distribution summaries containing statistics on types of
trees, present forest volume, past forest growth rate and projected future
forest growth and volume. Sampling and the associated statistical summaries of
the sample data provide the reports that are essential for the effective
management of Mead’s forests and timberlands.
(Source: David R Anderson, Dennis J Sweeney & Thomas A Williams 5th edition,
Thomson Business Information Pvt. Ltd.)

7.2 Population and Sample


In this section, we will discuss the concept of Population and of Sample.
7.2.1 Universe or Population
Statistical surveys or enquiries deal with studying various characteristics of
units belonging to a group. The group consisting of all the units is called the
Universe or Population. The figure 7.1 depicts the population.

Fig. 7.1: Illustration of Population

Example 1
In the statistical survey aimed at determining average per capita income
of the people in the city, all earning individuals in the city form the
population.


7.2.2 Types of Population


The figure 7.2 depicts the types of population along with the explanation.

Fig. 7.2: Types of Population

Note: Although many populations appear to be exceedingly large, no truly
infinite population of physical objects actually exists. However, given limited resources
and time, it is practically impossible to count, for example, the number of grains of sand
on a beach. Such populations are treated as infinite populations for our
study.
7.2.3 Sample
Sample is a finite subset of a population. A sample is drawn from a
population to estimate the characteristics of the population. Sampling is a
tool which enables us to draw conclusions about the characteristics of the
population. The figure 7.3 depicts the population and sample.


Fig. 7.3: Illustration of Population and Sample

7.3 Advantages of Sampling


In this section, we will discuss the advantages of Sampling. The advantages
of Sampling are:
• Maximum information about the population is obtained in a short time.
• It results in a considerable saving of time and labour.
• The organisation and administration of a sample survey are relatively
much easier.
• The results obtained are reliable, and it is always possible to attach a degree of
reliability to them.
• There is a possibility of obtaining detailed information. In other words,
there is greater scope.
• In the case of an infinite population, it is the only available method.
• If the units are destroyed or affected adversely in the course of
investigation, then the only method is sampling.
Thus, we have studied the advantages of sampling. Let us study the
sampling theory in the next section.

7.4 Sampling Theory


In this section, we will discuss the sampling theory. The sampling theory is
based on the following five important laws, which are depicted in figure 7.4:
1. Law of statistical regularity
2. Principle of inertia of large numbers
3. Principle of persistence of small numbers
4. Principle of validity
5. Principle of optimisation

Fig. 7.4: Laws of Sampling

1. Law of statistical regularity


The law of statistical regularity states that a group of units chosen at random
from a large group tends to possess the characteristics of that large group.
If a particular characteristic follows a certain distribution in the population,
then the same characteristic will tend to follow the same distribution in the sample.
2. Principle of inertia of large numbers
This principle states that “other things being equal, as the sample size
increases, the results tend to be more reliable and accurate”. Suppose that
the population mean is 25 units. If a sample of size 50 gives an average of
24.5 units, then a larger sample of size 100 can be expected to give an average
closer to 25, say 24.8 units. In other
words, the larger the sample size, the more accurate the result tends to be.
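
The principle can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the population of 10,000 normally distributed values with mean 25 and standard deviation 5, the seed and the number of repetitions are all hypothetical and are not taken from the text.

```python
import random
from statistics import mean, pstdev

random.seed(7)  # fixed seed so that the run is repeatable

# Hypothetical population whose mean is close to 25 units.
population = [random.gauss(25, 5) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=2000):
    """Standard deviation of the means of `repeats` random samples of size n."""
    means = [mean(random.sample(population, n)) for _ in range(repeats)]
    return pstdev(means)

spread_50 = spread_of_sample_means(50)
spread_100 = spread_of_sample_means(100)

print(round(spread_50, 3), round(spread_100, 3))
```

The sample means for n = 100 cluster more tightly around the population mean than those for n = 50, which is what the principle asserts. Any single larger sample may still happen to be less accurate than a smaller one; the statement holds on average.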
3. Principle of persistence of small numbers
If some of the units in a population possess markedly distinct
characteristics, then this will be reflected in the sample values also. For
example, if there are 300 blind persons in a population of 10,000 persons,
then a sample of one hundred will have more or less the same proportion of blind
persons in it.
4. Principle of validity
A sampling design is said to be valid if it enables us to obtain valid tests and
estimates of population parameters.
5. Principle of optimisation
This principle aims at obtaining a desired level of efficiency at minimum cost
or obtaining maximum possible efficiency with given level of cost.

7.5 Terms Used in Sampling Theory


In this section, we will discuss the terms that are used in the sampling
theory.
Parameter
Any statistical measure, like mean or median, calculated from population values is
known as a parameter of the population and is denoted by a Greek letter
(µ, σ and so on).
Statistic
Any statistical measure calculated from the sample is known as a statistic and is
denoted by a Roman letter (X̄, s and so on). A statistic is the sample counterpart of a
population parameter.
Sampling distribution
Sampling distribution consists of all the possible values of a statistic and
their respective probabilities for a given sample size.
Solved Problem 1
Consider the selection of two numbers from the given five numbers
(1, 2, 3, 4, 5). Find the possible combinations and their mean.
Solution
The possible combinations and their average are depicted in table 7.1a


Table 7.1a: Possible Combinations of given 5 numbers and their average

Combinations Numbers Selected Average


1 1,2 1.5
2 1,3 2
3 1,4 2.5
4 1,5 3
5 2,3 2.5
6 2,4 3
7 2,5 3.5
8 3,4 3.5
9 3,5 4
10 4,5 4.5

This gives the means of all possible samples of size 2. We form a distribution of the sample
mean, which is represented in table 7.1b.
Table 7.1b: Frequency Table for the Data

Mean (X̄)    Frequency (f)    fX̄
1.5          1                1.5
2            1                2.0
2.5          2                5.0
3            2                6.0
3.5          2                7.0
4            1                4.0
4.5          1                4.5
             N = 10           ΣfX̄ = 30

∴ Mean of the sampling distribution = ΣfX̄ / N = 30/10 = 3

Mean of the population is (1 + 2 + 3 + 4 + 5) / 5 = 3


We observe that the mean of sample mean is equal to population mean.
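
The enumeration in Solved Problem 1 can be reproduced in a few lines. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]

# Mean of every possible sample of size 2 drawn without replacement.
sample_means = [Fraction(sum(s), len(s)) for s in combinations(population, 2)]

# Sampling distribution of the mean: each value with its frequency.
distribution = Counter(sample_means)

mean_of_sample_means = sum(sample_means) / len(sample_means)
population_mean = Fraction(sum(population), len(population))

for value in sorted(distribution):
    print(float(value), distribution[value])
print(mean_of_sample_means, population_mean)
```

Both means come out equal to 3, which agrees with the observation that the mean of the sample means equals the population mean.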


7.6 Errors in Statistics


In this section, we will discuss the errors in statistics. The term ‘error’
denotes the difference between the true value of a population parameter and its
estimate provided by a sampling technique. Therefore, the term ‘error’ in
statistics is not used in its ordinary sense of a mistake. There are four types of errors
as depicted in the figure 7.5.

Fig. 7.5: Errors in Statistics

Let us understand about each of the error types and the factors causing
those errors.
1. Sampling errors
The sample results are bound to differ from population results, since a sample
is only a small portion of the population. This difference is also known as inherent error
and cannot be avoided. It is not worthwhile to try to eliminate such errors completely. These
errors may be due to the following factors:
• Faulty selection of sample
• Substitution of units to be studied
• Faulty demarcation of sampling units
• Error due to bias in estimation
However, the sampling errors follow random or chance variations and tend
to cancel each other out on averaging.
2. Non-sampling errors
Non-sampling errors are attributed to factors that can be controlled and
eliminated by suitable actions. They are due to the following factors:
• Faulty planning, faulty definitions
• Defective methods of interviewing
• Personal bias of investigator
• Lack of trained and qualified investigators
• Respondents’ failure to answer
• Improper coverage
• Compiling errors
• Publication errors
It is worthwhile to eliminate these errors.
3. Biased errors
Biased errors arise in both census and sampling methods. These errors
occur due to personal bias of the investigator and the instruments used for
measuring. They are also due to faulty collection of data, respondent’s bias
and bias due to non-response. Biased errors have a tendency to grow with
sample size. Therefore, they are also known as cumulative errors. The
magnitude of biased errors is directly proportional to the sample size.
4. Unbiased errors
Errors of over-estimation and under-estimation that tend to be equal in magnitude,
and therefore cancel each other, are known as unbiased errors. They are also known as
compensatory errors. They do not increase with sample size.
7.6.1 Measures of statistical errors

Key statistic
Absolute error is the difference between the true value, ‘t’, and the observed
value, ‘a’. Symbolically, Absolute Error ‘AE’ is represented as:
$$AE = t - a$$
It is independent of the magnitude of the actual value.

Key statistic
Relative error is the ratio of the absolute error to the observed value, ‘a’. It is
symbolically represented as:
$$RE = \frac{AE}{a} = \frac{t - a}{a}$$
It provides a degree of error for comparison purposes between different
sets of data.
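
The two measures can be sketched as follows. The functions follow the formulas as printed above, with the observed value ‘a’ in the denominator of the relative error; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
def absolute_error(t, a):
    """AE = t - a, where t is the true value and a is the observed value."""
    return t - a

def relative_error(t, a):
    """RE = AE / a = (t - a) / a, following the formula given in the text."""
    return absolute_error(t, a) / a

# Two hypothetical data sets with the same absolute error of 2 units.
ae_small, re_small = absolute_error(50, 48), relative_error(50, 48)
ae_large, re_large = absolute_error(5000, 4998), relative_error(5000, 4998)

print(ae_small, round(re_small, 4))
print(ae_large, round(re_large, 4))
```

Although the absolute error is the same in both cases, the relative error is far smaller for the larger measurement, which is why the relative error is the better basis for comparing different sets of data.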

Self Assessment Questions


1. State whether the following statements are True or False.
i) Population is aggregate of objects under study.


ii) Sampling method consume time and resources.


iii) Population is a subset of sample.
iv) An unbiased sample gives an accurate prediction of characteristics
of an entire population.
v) The standard deviation of sampling distribution of a statistic is
known as standard error of that statistic.
vi) Standard error is used as a reliability measure.
vii) Faulty selection of sample contributes to sampling error.
viii) Personal bias increases the non-sampling errors.
ix) Unbiased errors are cumulative in nature.

7.7 Types of Sampling


In this section, we will discuss the various types of sampling. By choosing a
sampling technique carefully, errors can be minimised. Let us take a look at
the different techniques available. The sampling techniques may be broadly
classified into the following categories:
i) Probability sampling
ii) Non-probability sampling

7.7.1 Probability sampling


Probability sampling provides a scientific technique of drawing samples from
the population. The samples are drawn according to a rule under
which each unit has a predetermined probability of being included in the
sample. The different ways of assigning probability are as follows:
i) each unit is assigned the same chance of being selected
ii) sampling units are assigned varying probabilities depending on
priorities
iii) units are assigned probabilities proportional to their size
We will discuss here some of the important probability sampling designs.
1. Simple random sampling
Under this technique, sample units are drawn in such a way that each and
every unit in the population has an equal and independent chance of being
included in the sample. If a sample unit is replaced before drawing the next
unit, then it is known as simple random sampling with replacement
[SRSWR]. If the sample unit is not replaced before drawing the next unit,
then it is called simple random sampling without replacement [SRSWOR]. In
the first case, the probability of drawing any given unit at each draw is 1/N, where N is the population size.
In the second case, each of the NCn possible samples of n units has the same
probability, 1/NCn, of being the sample drawn.
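
The difference between the two schemes can be sketched with Python's standard `random` module. The population of 100 numbered units, the sample size of 10 and the seed are hypothetical values chosen only for illustration.

```python
import random

def srswr(units, n, rng):
    """Simple random sampling WITH replacement: a unit may be drawn again."""
    return [rng.choice(units) for _ in range(n)]

def srswor(units, n, rng):
    """Simple random sampling WITHOUT replacement: every drawn unit is distinct."""
    return rng.sample(units, n)

rng = random.Random(42)          # fixed seed so that the run is repeatable
units = list(range(1, 101))      # hypothetical population numbered 1 to 100

sample_with_replacement = srswr(units, 10, rng)
sample_without_replacement = srswor(units, 10, rng)

print(sample_with_replacement)
print(sample_without_replacement)
```

A sample drawn with replacement may contain the same unit more than once, whereas a sample drawn without replacement never does.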
The selection of a simple random sample can be done in the following
ways:
• Lottery method – In the lottery method, we identify each and every unit with
a distinct number written on one of a set of identical cards. The cards are put in a
drum and thoroughly shuffled before each unit is drawn. The figure 7.6
depicts a lotto machine through which samples can be selected
randomly.

Fig. 7.6: Lotto Machine

• The use of table of random numbers – There are several random
number tables. They are Tippett’s random number table, Fisher and
Yates’ tables, Kendall and Babington Smith’s random tables, Rand
Corporation random numbers etc. The table 7.2 depicts a specimen of
Tippett’s random numbers.
Table 7.2: Tippett’s Random Number Table

2952 6641 3992 9792 7979 5911 3170 5624


4167 9524 1545 1396 7203 5356 1300 2693
2370 7483 3408 2762 3563 1089 6913 7691
0560 5246 1112 6107 6008 8126 4233 8776
2754 9143 1405 9025 7002 6111 8816 6446

Example 2: Suppose we want to select 10 units from a population of size
100. We number the population units from 00 to 99. Then we read the table two
digits at a time. Suppose we start with 41 (second row); then the other numbers
selected will be 67, 95, 24, 15, 45, 13, 96, 72, 03.


Example 3: If we want to select 10 students out of 30 students in a class,
then number the students from 00 to 29. Then, from the random number
table choose two-digit numbers. In the Table 7.2, we start from the third
row. The first number selected is 23, which lies between 00 and 29. So the
student numbered 23 is selected as the first unit of the sample. The second number is
70, but since it is greater than 29 we cannot choose that number. The bold numbers
in table 7.2 are the selected sample, that is, the numbers selected are 23, 08,
27, 10, 13, 05, 11, 12, 07, 08. Since 08 occurs twice, the repeat is skipped when
sampling without replacement and the next eligible number, 26, is taken
instead. The corresponding students constitute the
required sample.
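
The procedure of Examples 2 and 3 can be sketched as a short routine that reads Table 7.2 two digits at a time. The function name `select_from_table` and its parameters are illustrative only; the digits are those of Table 7.2.

```python
TABLE_7_2 = [
    "2952 6641 3992 9792 7979 5911 3170 5624",
    "4167 9524 1545 1396 7203 5356 1300 2693",
    "2370 7483 3408 2762 3563 1089 6913 7691",
    "0560 5246 1112 6107 6008 8126 4233 8776",
    "2754 9143 1405 9025 7002 6111 8816 6446",
]

def select_from_table(rows, start_row, population_size, sample_size,
                      skip_repeats=True):
    """Read two-digit numbers row by row, keeping those below population_size."""
    digits = "".join("".join(row.split()) for row in rows[start_row:])
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if number >= population_size:
            continue                      # outside the numbered population
        if skip_repeats and number in chosen:
            continue                      # already selected (without replacement)
        chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Example 2: 10 units out of 100, starting from the second row.
example_2 = select_from_table(TABLE_7_2, 1, 100, 10)
# Example 3: 10 students out of 30, starting from the third row.
example_3 = select_from_table(TABLE_7_2, 2, 30, 10)

print(example_2)   # [41, 67, 95, 24, 15, 45, 13, 96, 72, 3]
print(example_3)   # [23, 8, 27, 10, 13, 5, 11, 12, 7, 26]
```

With `skip_repeats=False` the routine keeps repeated numbers, so the second occurrence of 08 is retained in place of 26.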
2. Stratified random sampling
This sampling design is most appropriate if the population is heterogeneous
with respect to characteristic under study or the population distribution is
highly skewed.
We subdivide the population into several groups or strata such that:
i) Units within each stratum are more homogeneous
ii) Units between strata are heterogeneous
iii) Strata do not overlap, in other words, every unit of the population
belongs to one and only one stratum
The criteria used for stratification are geographical, sociological, age, sex,
income etc. The population of size ‘N’ is divided into ‘k’ relatively
homogeneous strata of sizes N1, N2, …, Nk such that ‘N1 + N2 + … + Nk = N’.
Then, we draw a simple random sample from each stratum, either
proportional to the size of the stratum or with an equal number of units from each stratum.
The table 7.3 displays the merits and demerits of stratified random
sampling.
Table 7.3: Merits and Demerits of Stratified Random Sampling
Merits:
1. Sample is more representative
2. Provides more efficient estimate
3. Administratively more convenient
4. Can be applied in situations where different degrees of accuracy are
desired for different segments of the population
Demerits:
1. Many times the stratification is not effective
2. Appropriate sample sizes are not drawn from each of the strata


Example 4
The items produced by factories located at three cities ‘X’, ‘Y’ and ‘Z’ are
200, 300 and 500, respectively. We wish to draw a sample of 20 items
under proportional stratified sampling. We number the unit from 0 to 999.
Then refer to random numbers table and select the numbers as depicted
in table 7.4.
Table 7.4: Stratified Random Sampling

27717 43584 85192 88977 29490 69714 94015 62874


32444 48277 13025 14338 54066 15423 47724 66733
74108 82228 888570 74015 80217 36292 98525 24335
24432 24896 62880

Proportions of samples to be selected are calculated as follows:

For Factory X: (200/1000) × 20 = 4
For Factory Y: (300/1000) × 20 = 6
For Factory Z: (500/1000) × 20 = 10
Total = 20
For the first factory sample units selected are 174, 192, 069, 156
For the second factory sample units selected are 287, 432, 444, 482,
302, 254
For the third factory sample units selected are 854, 772, 733, 741, 822,
853, 570, 802, 629, 525
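
The proportional allocation used in Example 4 can be sketched as follows. The function name `proportional_allocation` is illustrative; the rounding step is an added assumption for cases where the shares are not whole numbers, which does not arise in this example.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate sample_size across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # n_h = n * N_h / N; rounding is only a simple fallback for
    # shares that are not whole numbers.
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}

factories = {"X": 200, "Y": 300, "Z": 500}   # items produced at each factory
allocation = proportional_allocation(factories, 20)

print(allocation)                  # {'X': 4, 'Y': 6, 'Z': 10}
print(sum(allocation.values()))    # 20
```

A simple random sample of the allocated size is then drawn separately within each factory.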
3. Systematic sampling
This design is recommended if we have a complete list of sampling units
arranged in some systematic order such as geographical, chronological or
alphabetical order.
Suppose the population size is ‘N’. The population units are serially
numbered ‘1’ to ‘N’ in some systematic order and we wish to draw a sample
of ‘n’ units. Then we divide the units from ‘1’ to ‘N’ into ‘n’ groups such that each
group has ‘K’ units.
This implies ‘nK = N’ or ‘K = N/n’. From the first group, we select a unit at
random. Suppose the unit selected is the 6th unit; thereafter we select the units
numbered 6 + K, 6 + 2K and so on. If ‘K’ is 20, ‘n’ is 5 and ‘N’ is 100, then the units selected are 6, 26,
46, 66, 86.
The table 7.5 displays the merits and demerits of systematic sampling.
Table 7.5: Merits and Demerits of Systematic Sampling

Merits:
1. Very easy to operate.
2. It saves time and labour.
3. More efficient than simple random sampling if we have an up-to-date
frame.
Demerits:
1. In many cases we do not get an up-to-date list.
2. It gives biased results if a periodic feature exists in the data.

Example 3: Suppose there are 100 units in a population, serially numbered from
1 to 100, and we want to draw a sample of 5 units.
Then we have K = N/n = 100/5 = 20. Let the first number selected
randomly be 8. Then the selected units in serial numbers are 8, 28, 48,
68 and 88.
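
The selection rule can be sketched as a short function. The name `systematic_sample` is illustrative, and the sketch assumes that N is an exact multiple of n, as in the examples above.

```python
def systematic_sample(N, n, start):
    """Select every K-th unit, K = N / n, beginning with unit number `start`."""
    K = N // n                       # sampling interval
    if not 1 <= start <= K:
        raise ValueError("the random start must lie in the first group, 1..K")
    return [start + i * K for i in range(n)]

print(systematic_sample(100, 5, 8))   # [8, 28, 48, 68, 88]
print(systematic_sample(100, 5, 6))   # [6, 26, 46, 66, 86]
```

In practice the starting unit would itself be chosen at random from 1 to K, for instance with `random.randint(1, K)`.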
4. Cluster sampling
The total population is divided into recognisable sub-divisions, known as
clusters, such that within each cluster the units are heterogeneous and
between clusters they are homogeneous. The units are selected from each
cluster by suitable sampling techniques. The figure 7.7 depicts cluster
sampling, where each packet of candy forms a cluster.

Fig. 7.7: Cluster Sampling

5. Multi-stage sampling
The sampling process is carried out in several stages, with the units
selected at one stage being sub-divided and sampled again at the next stage.
It is as depicted in figure 7.8.

Fig. 7.8: Multistage Sampling

Example 5
We want to select 1000 colleges from the southern states. In the first
stage, we may select any three states. In the second stage, we may select
some districts in each selected state. In the third stage, we may select
the colleges in each selected district. We may adopt any sampling
technique at each stage.

Table 7.6 depicts the merits and demerits of multi-stage sampling.
Table 7.6: Merits and Demerits of Multi-Stage Sampling

Merits
1. Greater flexibility in this sampling method.
2. Existing divisions can be used.

Demerits
1. Estimates are less accurate.
2. Investigator should have knowledge of the entire population that will be
sampled.

7.7.2 Non-probability sampling


Depending upon the object of enquiry and other considerations a
predetermined number of sample units is selected purposely so that they
represent the true characteristics of the population.
A serious drawback of this sampling design is that it is highly subjective in
nature. The selection of sample units depends entirely upon the personal
convenience, biases, prejudices and beliefs of the investigator. This method
will be more successful if the investigator is thoroughly skilled and
experienced.
1. Judgment sampling
The choice of sample items depends exclusively on the judgment of the
investigator. The investigator’s experience and knowledge about the
population will help to select the sample units. It is the most suitable method
if the population size is small. Table 7.7 depicts the merits and demerits
of judgment sampling.
Table 7.7: Merits and Demerits of Judgment Sampling

Merits
1. Most useful for small populations.
2. Most useful to study some unknown traits of a population, some of whose
characteristics are known.
3. Helpful in solving day-to-day problems.

Demerits
1. It is not a scientific method.
2. It has a risk of the investigator’s bias being introduced.

2. Convenience sampling
The sample units are selected according to the convenience of the
investigator. It is also called “chunk” which refers to the fraction of the
population being investigated, which is selected neither by probability nor by
judgment.
Moreover, a list or framework should be available for the selection of the
sample. It is used to make pilot studies. However, there is a high chance of
bias being introduced.
3. Quota sampling
It is a type of judgment sampling. Under this design, quotas are set up
according to some specified characteristic such as age groups or income
groups. From each group a specified number of units are sampled
according to the quota allotted to the group. Within the group the selection
of sample units depends on personal judgment. It has a risk of personal
prejudice and bias entering the process. This method is often used in public
opinion studies.

Caselet
Read the information and answer the questions.
You have been given 5 boxes of biscuits. There are orange, brown and
yellow colour biscuits. You are asked to sample the biscuits. The target
population here is all of the biscuits and the sampling unit is the biscuit.
Answer the following questions.
i) How would you apply simple random sampling?
ii) How would you apply stratified sampling?
iii) How would you apply cluster sampling?

7.8 Determination of Sample Size


In this section, we will discuss the determination of sample size. Sample
size depends upon the size of the population, the resources available, the
degree of accuracy desired, homogeneity of the population, nature of study,
methods of sampling used and the nature of respondents. The following are
the formulae available to determine sample size, when the study is
concerned with population proportion or population mean.

Key statistic
The formula used for calculating the sample size when the research is
concerned with a population proportion and a finite population is obtained
from:

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}} \times \sqrt{\dfrac{N - n}{N - 1}}} \quad \text{(for finite population)}$$

where ‘N’ is the population size. Solving for ‘n’ gives:

$$n = \frac{z^{2} p q N}{e^{2}(N - 1) + z^{2} p q}$$

z = value corresponding to the degree of confidence desired
p = population proportion
$\hat{p}$ = sample proportion
e = acceptable error (the precision), that is, the admitted difference $\hat{p} - p$
q = 1 - p
n = sample size

Key statistic
The formula used for calculating the sample size when the research is
concerned with a population proportion and an infinite population is
obtained from:

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} \quad \text{(for infinite population)}$$

Solving for ‘n’ gives:

$$n = \frac{z^{2} p q}{e^{2}}$$

z = value corresponding to the degree of confidence desired
p = population proportion
$\hat{p}$ = sample proportion
e = acceptable error (the precision), that is, the admitted difference $\hat{p} - p$
q = 1 - p
n = sample size
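The two sample size formulas for a population proportion can be sketched as
a single Python function. The function name and the input values below
(p = 0.5, e = 0.05, z = 1.96, N = 1000) are illustrative assumptions and are
not taken from the text.

```python
def sample_size_proportion(z, p, e, N=None):
    """Sample size for estimating a proportion.

    z : value corresponding to the desired degree of confidence
    p : population proportion (q = 1 - p)
    e : acceptable error (the precision)
    N : population size; leave as None for an infinite population
    """
    q = 1 - p
    if N is None:
        # Infinite population: n = z^2 p q / e^2
        return z ** 2 * p * q / e ** 2
    # Finite population: n = z^2 p q N / (e^2 (N - 1) + z^2 p q)
    return z ** 2 * p * q * N / (e ** 2 * (N - 1) + z ** 2 * p * q)

n_infinite = sample_size_proportion(z=1.96, p=0.5, e=0.05)
n_finite = sample_size_proportion(z=1.96, p=0.5, e=0.05, N=1000)
print(round(n_infinite, 2), round(n_finite, 2))
```

The function returns the unrounded value so that the rounding rule stays
visible. For the same inputs, the finite population formula always gives a
smaller sample size than the infinite population formula.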

Key statistic
The formula used for calculating the sample size for an infinite
population, when the population mean and sample mean are given, is
obtained from:

$$z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}} \quad \text{(for infinite population)}$$

$$n = \frac{z^{2} \sigma^{2}}{e^{2}}$$

z = standard variate at a given confidence level
$\mu$ = population mean
$\bar{X}$ = sample mean
e = acceptable error (the precision)
$(\bar{X} - \mu) = e$ is the error we admit between the true value of the
parameter and the statistic (estimated value).
$\sigma$ = standard deviation of the population
n = sample size

Key statistic
The formula used for calculating the sample size for a finite population,
when the population mean and sample mean are given, is obtained from:

$$z = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N - n}{N - 1}}} \quad \text{(for finite population)}$$

$$n = \frac{z^{2} \sigma^{2} N}{(N - 1)e^{2} + z^{2} \sigma^{2}}$$

z = standard variate at a given confidence level
$\mu$ = population mean
$\bar{X}$ = sample mean
e = acceptable error (the precision)
$(\bar{X} - \mu) = e$ is the error we admit between the true value of the
parameter and the statistic (estimated value).
$\sigma$ = standard deviation of the population
n = sample size
N = size of the population
Solved Problem 2
The mean expenditure per customer at a tyre store is Rs. 85.00, with a
standard deviation of Rs. 9.00. If the mean expenditure of the sample is
Rs. 87, what is the required sample size? (z-value is 1.41)
Solution
Given $\mu = 85$, $\bar{X} = 87$, $\sigma = 9$, $z = 1.41$ and
$e = \bar{X} - \mu = 87 - 85 = 2$.
Then we have,

$$n = \frac{z^{2} \sigma^{2}}{e^{2}} = \frac{(1.41)^{2} \times 9^{2}}{2^{2}} = \frac{1.9881 \times 81}{4} = 40.26 \approx 40.$$

Hence the required sample size is 40.

Solved Problem 3
A production company has 350 hourly employees with an average age of 37.6
years and a standard deviation of 8.3. If the sample average is 40
years of age and the z-value is 2.07, calculate the required sample size.

Solution: Given $N = 350$, $\mu = 37.6$, $\bar{X} = 40$, $\sigma = 8.3$ and
$z = 2.07$,

$$e = \bar{X} - \mu = 40 - 37.6 = 2.4$$

Then the sample size is given by,

$$n = \frac{z^{2} \sigma^{2} N}{(N - 1)e^{2} + z^{2} \sigma^{2}} = \frac{(2.07)^{2} \times (8.3)^{2} \times 350}{(350 - 1) \times (2.4)^{2} + (2.07)^{2} \times (8.3)^{2}} = \frac{103315.4}{2305.4} = 44.8 \approx 45.$$

Hence the required sample size is 45.
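The arithmetic of Solved Problems 2 and 3 can be checked with a short Python
sketch of the two sample size formulas for the mean. The function name is an
assumption made for illustration; the inputs are those given in the two
problems.

```python
def sample_size_mean(z, sigma, e, N=None):
    """Sample size for estimating a mean.

    z     : standard variate at the given confidence level
    sigma : standard deviation of the population
    e     : acceptable error, e = sample mean - population mean
    N     : population size; leave as None for an infinite population
    """
    if N is None:
        # Infinite population: n = z^2 sigma^2 / e^2
        return z ** 2 * sigma ** 2 / e ** 2
    # Finite population: n = z^2 sigma^2 N / ((N - 1) e^2 + z^2 sigma^2)
    return z ** 2 * sigma ** 2 * N / ((N - 1) * e ** 2 + z ** 2 * sigma ** 2)

# Solved Problem 2: tyre store, infinite population
n_problem_2 = sample_size_mean(z=1.41, sigma=9, e=87 - 85)
# Solved Problem 3: 350 hourly employees, finite population
n_problem_3 = sample_size_mean(z=2.07, sigma=8.3, e=40 - 37.6, N=350)
print(round(n_problem_2, 2), round(n_problem_3, 2))
```

The function returns the unrounded value. The solved problems round it to
the nearest whole number, whereas rounding up would be the more cautious
choice because it never falls short of the computed requirement.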

7.9 Central Limit Theorem


In this section, we will discuss the central limit theorem. If X1, X2, …, Xn
is a random sample of size ‘n’ from any population with mean ‘μ’ and finite
variance ‘σ²’, then the sample mean ($\bar{X}$) is approximately normally
distributed with mean ‘μ’ and variance ‘σ²/n’, provided ‘n’ is sufficiently
large.
From the central limit theorem, we infer the following:
i) The mean of the sampling distribution of the mean will be equal to the
population mean
ii) The sampling distribution of the mean approaches normal distribution
as the sample size increases
iii) It permits us to use sample statistics to make inferences about the
population parameters irrespective of the shape of the frequency
distribution of the population
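The behaviour described by the central limit theorem can be illustrated with
a small simulation. The sketch below is only a demonstration under assumed
settings: it draws samples from a skewed (exponential) population with mean 1
and standard deviation 1, and the sample size, number of repetitions and seed
are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run can be repeated
n = 50                    # size of each sample
repetitions = 2000        # number of samples drawn

# Mean of each sample drawn from an exponential population (mean 1, sd 1).
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1.0 / n ** 0.5   # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.000)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_se:.3f})")
```

Although the population itself is far from normal, the mean of the sample
means should come out close to the population mean and their standard
deviation close to σ/√n, as points i) and ii) state.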
Self Assessment Questions
2. State whether the following statements are true ‘T’ or false ‘F’.
i) A sample in which units are selected by judgment is known as a
probability sample.
ii) Judgment sampling does not give representativeness of a sample.
iii) Large sample size always results in minimising the standard error.
iv) A sampling plan that divides the population into well-defined
groups from which random samples are drawn is known as cluster
sampling.
v) The principles of simple random sampling are the theoretical basis
for statistical inference.
vi) If the mean of a certain population is 20, it is likely that most of the
sample means will be 20.
vii) Any sampling distribution can be totally described by its mean and
standard deviation.
viii) The central limit theorem assures that the sampling distribution of
the mean approaches normal distribution as the sample size
increases.
ix) Stratified sampling is used when each group considered is more
homogeneous within itself and the groups are heterogeneous between
themselves.

7.10 Summary
Let us recapitulate the important concepts discussed in this unit:
• Statistical surveys or enquiries deal with studying various characteristics
of units belonging to a group. The group consisting of all the units is
called the Universe or Population.
• There are two methods of studying the characteristics of a population:
census and sampling.
• A sample is a finite subset of a population. A sample is drawn from a
population to estimate the characteristics of the population.
• There are two methods of sampling, namely probability sampling and
non-probability sampling.
• Probability sampling provides a scientific technique of drawing samples
from the population.
• In the non-probability sampling method, the selection of sample units
depends entirely upon the personal convenience, biases, prejudices and
beliefs of the investigator.

7.11 Glossary
Biased errors: Biased errors arise in both census and sampling method.
These errors occur due to personal bias of the investigator and the
instruments used for measuring.
Cluster sample: Cluster sample is the one in which the items in the
population are divided into various clusters, so that each cluster is the
representative of the population. A random sample of clusters is taken, and
the clusters selected are analysed.
Convenience sampling: The sample units are selected according to the
convenience of the investigator.
Judgment sampling: The choice of sample items depends exclusively on
the judgment of the investigator.
Non-probability sample: A sample in which items are chosen without
knowing their probability of selection.
Non-sampling errors: Non-sampling errors are attributed to factors that
can be controlled and eliminated by suitable actions.
Probability sampling: Probability sampling provides a scientific technique
of drawing samples from the population.
Sample mean: An unbiased estimate of the mean of the population from
which it was drawn.
Sampling distribution: Sampling distribution consists of all the possible
values of a statistic and their respective probabilities for a given sample
size.
Sampling error: Sampling error is the difference between the sample
statistic and the actual population parameter.
Simple random sample: A sample in which ‘n’ elements are selected from
a population in such a way that every set of ‘n’ elements in the population
has an equal probability of being selected.
Quota sampling: It is a type of judgment sampling. Under this design,
quotas are set up according to some specified characteristic such as age
groups or income groups.

Unbiased errors: The errors that are due to over-estimation and under-
estimation such that they are equal are known as unbiased errors.

7.12 Terminal Questions


1. Discuss the errors that arise in statistical survey.
2. Describe simple random sampling.
3. Describe systematic sampling.
4. What is quota sampling and when do we use it?
5. What are the basic principles on which sampling theory is based?
6. Explain the sampling distribution of a statistic and its standard
error.
7. The distribution of employees in three plants of a manufacturing unit is
depicted in table 7.8. Using random numbers discussed under topic
‘simple random sampling’, draw a random sample of size 15.
Table 7.8: Distribution of Employees in Three Manufacturing Plants

Plant A B C
Number of employees 100 200 200

8. The population proportion of tea drinkers is 0.6. Determine the sample
size such that the error between the actual and observed proportion will be
less than or equal to 0.05 with 95% confidence (Z = 1.96).
9. The standard error of the mean of the bursting strength of cardboards
produced by a company is 1.5 units. If the population standard deviation
is 50, find the sample size.

Activity
1. The process of obtaining information about an entire population by
examining only a part of it is known as
i) statistics ii) sampling
iii) survey iv) selection
2. Non-sampling errors include
i) bias ii) mistakes
iii) both bias & mistakes iv) none of these
3. The simplest way of increasing the accuracy of a sample is to
increase its
i) size ii) interviewer
iii) population iv) universe
4. The term error in statistics is
i) mistakes
ii) bias
iii) both bias & mistakes
iv) difference between the value of a statistic and that of the
corresponding parameter

7.13 Answers

Self Assessment Questions


1. i) True, ii) False, iii) False, iv) True, v) True, vi) True, vii) True, viii) True,
ix) False
2. i) False, ii) True, iii) True, iv) False, v) True, vi) False, vii) False, viii)
True, ix) True

Terminal Questions
1. Refer section 7.6
2. Refer section 7.7.1
3. Refer section 7.7.1
4. Refer section 7.7.2
5. Refer section 7.4
6. Refer section 7.5
7. Refer section 7.7.1


8. The sample size is approximately 369
9. The sample size is approximately 1111
Activity Solution
1. ii) sampling
2. iii) both bias & mistakes
3. i) size
4. iv) difference between the value of a statistic and that of the
corresponding parameter

7.14 Case study


Case Study 1 – Online Shopping
A survey was conducted to ascertain online purchase of various items. Out
of a total of 1,793 respondents, the percentage of respondents for various
categories of items was as follows:
Table: 7.9
Products % of Respondents
Books 51
Electronics 49
Railway 59
Accessories 76
Apparel 66
Gifts 45
Computers & Peripherals 43
Airline Tickets 49
Music 44
Movies 41

The above data was published in the October 02, 2007 issue of Business world,
based on the Internet and Online Association of India (IOAI) report 2006. It
was reported that the major online sites were Yahoo Mail, Hotmail and
Rediffmail. It was reported that during 2005-06, Indian consumers spent
Rs 1,280 crore, more than double the Rs 670 crore netted during 2004-05.
Design a survey to update the data as of now, giving details such as
websites, category of products and amount of purchase.
Case Study 2 – Customer Perception
The Alliance department store wants to have an idea of the level of
customer perception of its customers. The owner wants the exercise to be
done on a Saturday when 3000 customers visit the store. The store is
spread over three floors and has 15 cash counters. Discuss as to how a
survey can be conducted for assessing the overall level of customer
perception on a scale of 0 to 10. The management of the store feels that a
sample of 150 customers could be sufficient for the purpose.

References:
• Agarwal, B. L., (2006), Basic Statistics, Fourth Edition, New Age
International Publishers.
• Anderson, David R., Sweeney, Dennis J. & Williams, Thomas A., 5th
edition, Thomson Business Information Pvt. Ltd.
• Bowerman, B. L. & O’Connell, R. T., (1996), Applied Statistics:
Improving Business Processes, Irwin.
• Freedman, D., Pisani, R. & Purves, R., (1997), Statistics, 3rd edition,
W. W. Norton.
• Levin, Richard I. & Rubin, David S. (2008), Statistics for Management,
Seventh Edition, PHI Learning Private Limited.
• Srivastava, T. N., & Rejo, Shailaja (2008), Statistics for Management, 5th
edition, TMH.
• Tukey, J. W., (1997), Exploratory Data Analysis, Addison-Wesley.
• Wilcox, Rand R., (2009), Basic Statistics: Understanding Conventional
Methods and Modern Insights, Oxford University Press.

E-References:
• http://www.textbooksonline.tn.nic.in/Books/11/Stat-EM/Chapter-1.pdf

Statistics for Management Unit 8

Unit 8 Estimation
Structure:
8.1 Introduction
Objectives
Relevance
Statistics in practice
8.2 Reasons for Making Estimates
8.3 Making Statistical Inference
8.4 Types of Estimates
Point estimate
Interval estimate
8.5 Criteria of a Good Estimator
8.6 Point Estimator for Mean and Variance
8.7 Interval Estimates
Case study on calculating estimates
Making the interval estimate
8.8 Interval Estimates and Confidence Intervals
Interval estimates of the Mean
Interval estimates of the Proportion
Interval estimates using the Student’s ‘t’ distribution
8.9 Summary
8.10 Glossary
8.11 Terminal Questions
8.12 Answers
8.13 Case Study

8.1 Introduction
In the previous unit, ‘Sampling and Sampling Distributions’, you have
studied about sampling design and different theories of sampling. The
sampling errors in the sampling distributions are also studied. In this unit,
you will study about estimation and different types of estimation. You will
also study about calculation of confidence intervals of the population mean
when the standard deviation is unknown. Finally, you will study the methods
to calculate the sample size for estimating the parameter with certain level
of confidence for given measure of accuracy.

Everyone makes estimates. When you are ready to cross a street, you
estimate the speed of any car that is approaching, the distance between you
and that car, and your own speed. Having made these quick estimates, you
decide whether to wait, walk or run. With the knowledge of inferential
statistics, you can do the estimations about the population using the random
samples which are drawn from the population.
Objectives:
After studying this unit, you should be able to:
• describe the types of estimates
• distinguish between a point estimate and an interval estimate
• evaluate the confidence interval
• describe interval estimates and confidence intervals
• evaluate the sample size if the confidence interval and permissible error
are given
8.1.1 Relevance
The new general manager of Ever Bright Light Company, manufacturing
tube lights is concerned about the dwindling profits of the company. The
main reason is that the company provides a guarantee of 1 year of life and
undertakes to replace a tube light if it fails within 1 year. Since a good
number of tube lights are failing in less than a year and are being replaced
free of cost, they are lowering the company’s profitability and also causing
loss of reputation. The general manager intuitively feels that the guaranteed
life must be such that the percentage of tube lights failing within that period
is quite small; say 5% or 10%, so as to keep the cost of replacement low.
Since it may not be appropriate to reduce the guarantee, the only
alternative is to increase the life of the tube light. After careful
consideration, he outlines the following steps:
• Estimate the average life of tube lights, as well as the variation in their
lives.
• Take action to increase the life of the tube light with the help of improved
technology and better management of the production process.
• Test whether the actions taken have increased the life, and by how
much.
• Fix the price and guarantee period in such a way as to ensure an
adequate increase in profits.
(Source: T. N. Srivastava & Shailaja Rejo (2008), Statistics for Management,
5th ed., TMH)

The subject of statistical inference, as described in this chapter, could play
a useful role in these steps.
8.1.2 Statistics in practice
Dollar General Corporation
Dollar General Corporation was founded in 1939 as a dry goods wholesale
company. After World War II, the company began opening retail locations in
the rural south of Kentucky. Dollar General Corporation operates more than
4300 stores across the middle and south eastern United States.
Emphasising small store convenience, Dollar General markets health and
beauty aids, cleaning supplies, housewares, stationery, apparel, shoes and
domestic items at everyday low prices. Being in an inventory-intense
business with more than 20,000 products, Dollar General made the
decision to adopt the LIFO (last-in, first-out) method of inventory valuation.
This method matches current costs against current revenues, which
minimises the effect of radical price changes on profit and loss results. The
establishment of a LIFO index requires that the year-end inventory count for
each product be valued at the current year-end cost and the preceding
year-end cost. To avoid counting the inventory of every product in more than
4300 retail locations, a random sample of 800 products is selected from 100
retail locations and three warehouses. Physical inventories for the sampled
products are taken at the year end. Accounting personnel then provide the
current year’s costs and preceding year’s costs needed to construct the LIFO
index. For a recent year the LIFO index was 1.030. However, because this
index is a sample estimate of the population LIFO index, a statement about
the precision of the estimate was required. On the basis of the sample result
and a 95% confidence level, the margin of error was computed to be 0.006.
Thus the interval from 1.024 to 1.036 provides the 95% confidence interval
estimate of the population LIFO index. This precision was judged to be good.
(Source: David R Anderson, Dennis J Sweeney & Thomas A Williams, 5th edition,
Thomson Business Information Pvt. Ltd.)

8.2 Reasons for Making Estimates


In this section, we will discuss the reasons of making estimates. All
managers must make quick estimates. The outcome of these estimates can
affect their organisations. Credit managers estimate whether a purchaser
will eventually pay his bills.
Prospective home buyers make estimates concerning the behaviour of
interest rates in the mortgage market. All these people make estimates
without worrying about whether they are scientific or not, but with the hope
that the estimates bear a reasonable resemblance to the outcome.
Managers use estimates because, in all but the most trivial decisions, they
must make rational decisions without complete information and with a great
deal of uncertainty about what the future will bring. As educated citizens
and professionals, you will be able to make more useful estimates by
applying the techniques described in this unit and in the subsequent units.

8.3 Making Statistical Inference


In this section, we will discuss the technique of making statistical inferences.
Statistical inference is based on estimation and hypothesis testing. In both
estimation and hypothesis testing, we make inferences about characteristics
of populations from information contained in samples. Here, we infer
something about a population from information taken from a sample.
We try to estimate the population parameter with reasonable accuracy.
Calculating the parameter exactly, such as the exact proportion or the exact
mean, would be an impossible goal. Still, we will be able to make an
estimate, and implement some controls to avoid as much error as possible.

8.4 Types of Estimates


Now, let us discuss the types of estimates. The following are two types of
estimates of population parameter:
i) Point estimate
ii) Interval estimate

8.4.1 Point estimate


Point estimate is a single number that is used to estimate an unknown
population parameter. A point estimate is often insufficient, because it is
either right or wrong. We do not know how wrong it is. Therefore, a point
estimate is much more useful if it is accompanied by an estimate of the error
that might be involved.
8.4.2 Interval estimate
Interval estimate is a range of values used to estimate a population
parameter. It indicates the error in the following two ways:
i) by the extent of its range
ii) by the probability of the true population parameter lying within that range

8.5 Criteria of a Good Estimator


In this section, we will discuss the criteria of a good estimator.
1. Being unbiased
Being unbiased is a desirable property of a good estimator.
We can say that a statistic is an unbiased estimator if, on average, it tends
to assume values that are above the population parameter being estimated
as frequently, and to the same extent, as it tends to assume values that are
below the population parameter being estimated. In fact, it may be noted
that the sample mean is an unbiased estimator of the population mean.
2. Efficiency
Another desirable property of a good estimator is that it must be efficient.
Efficiency refers to the size of the standard error of the statistic. Let us
compare two statistics from a sample of the same size and try to decide
which one is the more efficient estimator. In this case, we would pick the
statistic that has the smaller standard error, as an efficient estimator.

Example 1
Suppose we choose a sample of a given size and must decide whether
to use the sample mean or the sample weighted mean to estimate the
population mean.
If we calculated the standard error of the sample mean and found it to be
1.05, and then calculated the standard error of the sample weighted
mean and found it to be 1.6, we would say that the sample mean is a
more efficient estimator of the population mean, because its standard
error is smaller.
It makes sense that an estimator with a smaller standard error (with less
variation) will have more chance of producing an estimate nearer to the
population parameter under consideration.
3. Consistency
A statistic is a consistent estimator of a population parameter, if the value of
the statistic comes very close to the value of population parameter as the
sample size increases. If an estimator is consistent, it becomes more
reliable with large samples.
4. Sufficiency
An estimator is sufficient if it makes so much use of the information in the
sample that no other estimator could extract from the sample any additional
information about the population parameter being estimated.

8.6 Point Estimator of Mean and Variance


Now, in this section, we will learn the point estimators of mean and
variance. By taking a sample of size ‘n’, we use the sample mean as a point
estimator of the population mean:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

We can use the sample variance ‘$s^2$’ to estimate the population
variance, where the sample variance ‘$s^2$’ is given by the formula:

$$s^{2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^{2}}{n - 1}$$

where ‘n’ is the sample size. In many cases, such as in the case of interval
estimation of the mean, we need to know the value of $\sigma$, the population
standard deviation. If $\sigma$ is not known, we use ‘s’ in its place and
proceed with the computations.

Solved Problem 1
Table 8.1 depicts the total income, in thousand rupees per year, of 10
randomly selected persons from a particular class of people.
Table 8.1: Total Income of Ten People
Income (in thousand Rs): 6.5, 7.6, 5.4, 12.7, 8.0, 5.5, 4.5, 9.0, 10.1, 6.8

On the basis of the data, find the mean income of a person in this class and
also find the sample standard deviation.
Solution
Let income be denoted by X, with n = 10.
Table 8.1a: Calculation of variation from the mean

Income $X_i$    $X_i - \bar{X} = X_i - 7.61$    $(X_i - \bar{X})^2 = (X_i - 7.61)^2$
6.5             -1.11                           1.2321
7.6             -0.01                           0.0001
5.4             -2.21                           4.8841
12.7            5.09                            25.9081
8.0             0.39                            0.1521
5.5             -2.11                           4.4521
4.5             -3.11                           9.6721
9.0             1.39                            1.9321
10.1            2.49                            6.2001
6.8             -0.81                           0.6561
$\sum X_i = 76.1$                               $\sum (X_i - \bar{X})^2 = 55.089$

The sample mean is given by the formula

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{76.1}{10} = 7.61$$

The point estimate of the population mean μ is $\bar{X}$.
Therefore, the average income of a person in this class is Rs 7.61 thousand
per year.
The sample variance is given by,

$$s^{2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^{2}}{n - 1} = \frac{55.089}{10 - 1} = 6.121.$$

The sample standard deviation is given by

$$s = \sqrt{s^{2}} = \sqrt{6.121} = 2.474.$$

The point estimate of the population standard deviation σ is s.
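The point estimates in this solved problem can be reproduced with Python's
standard `statistics` module, whose `variance` and `stdev` functions use the
same ‘n - 1’ divisor as the formula for the sample variance. The variable
names below are illustrative.

```python
import statistics

# Incomes (in thousand Rs per year) of the 10 sampled persons from Table 8.1.
incomes = [6.5, 7.6, 5.4, 12.7, 8.0, 5.5, 4.5, 9.0, 10.1, 6.8]

sample_mean = statistics.mean(incomes)          # point estimate of the population mean
sample_variance = statistics.variance(incomes)  # divides by n - 1
sample_sd = statistics.stdev(incomes)           # square root of the sample variance

print(f"mean = {sample_mean:.2f}, variance = {sample_variance:.3f}, sd = {sample_sd:.3f}")
```

Using `statistics.pvariance` instead would divide by ‘n’, which is the
population formula and not the estimator used in this section.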

Example 2
Table 8.2 depicts the results of a sample of 35 boxes which contain
bolts.
Table 8.2: Results of Samples of 35 Boxes of Bolts (Bolts per Box)
101 103 112 102 98 97 93
105 100 97 107 93 94 97
97 100 110 106 110 103 99
93 98 106 100 112 105 100
114 97 110 102 98 112 99

Consider table 8.2. We have taken a sample of 35 boxes of bolts
from a manufacturing line and have counted the bolts per box. We can
estimate the population mean, that is, the mean number of bolts per box, by
taking the mean for the 35 boxes we have sampled. This is calculated by
adding all the bolts and dividing by the number of boxes.

$$\bar{X} = \frac{\sum X}{n} = \frac{3570}{35} = 102$$

Thus, using the sample mean $\bar{X}$ as the estimator, we have a point
estimate of the population mean ‘µ’.

8.7 Interval Estimates


In this section, we will discuss the interval estimates. The purpose of
gathering samples is to learn more about a population. We can compute this
information from the sample data as either point estimates or interval
estimates.
Key statistic
An interval estimate describes a range of values within which a
population parameter is likely to lie.

If we select and plot a large number of sample means from a population, the
distribution of these means will approximate a normal curve. Furthermore,
the mean of the sample means will be the same as the population mean.
8.7.1 Case study on calculating estimates
Case Study
The marketing research director needs an estimate of the average life in
months, for car batteries manufactured by his company. We select a
random sample of 200 batteries with a mean life of 36 months. If we use
the point estimate of the sample mean ‘ X ’ as the best estimator of the
population mean ‘µ’, we would report that the mean life of the company’s
batteries is 36 months.
The director also asks for a statement about the uncertainty that is likely
to accompany this estimate, that is, a statement about the range within
which the unknown population mean is likely to lie. To provide such a
statement, we need to find the standard error of the mean. Our sample
size of 200 is large enough that we can apply the central limit theorem.
Suppose we have already estimated the standard deviation of the
population of the batteries and reported that it is 10 months.
Using this standard deviation of the population, we can calculate the
standard error of the mean in the case of a large population, by using the
formula

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

We find the standard error $\sigma_{\bar{x}} = 10 / \sqrt{200}$ to be 0.707 months.

(Cont. on topic ‘Making the interval estimate’)

8.7.2 Making the interval estimate

Case Study
(Cont. from topic ‘Interval Estimates’)

We can tell to the director that our estimate of the life of the company’s
batteries is 36 months, and the standard error that accompanies this
estimate is 0.707. In other words, the actual mean life for all the batteries
may lie somewhere in the interval estimate of 35.293 to 36.707 months.
This is helpful but insufficient information for the director.
Next, we need to calculate the chances that the actual life will lie in this
interval or in other intervals of different widths that we might choose,
for example $\pm 2\sigma_{\bar{x}}$ ($2 \times 0.707$), $\pm 3\sigma_{\bar{x}}$ ($3 \times 0.707$), etc.

The probability is 0.955 that the mean of a sample of size 200 will be
within ±2 standard errors of the population mean. It can be stated
differently: 95.5 percent of all the sample means are within ±2 standard
errors of the population mean ‘µ’. Equivalently, the population mean ‘µ’ will be located
within ±2 standard errors of the sample mean 95.5 percent of the
time.
Hence, we can now report to the director that the best estimate of the life
of the company’s batteries is 36 months, and we are 68.3 percent
confident that the life lies in the interval from 35.293 to 36.707 ($36 \pm 1\sigma_{\bar{x}}$).
Similarly, we are 95.5 percent confident that the life falls within the
interval of 34.586 to 37.414 ($36 \pm 2\sigma_{\bar{x}}$), and we are 99.7 percent confident
that battery life falls within the interval of 33.879 to 38.121 ($36 \pm 3\sigma_{\bar{x}}$).
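
The arithmetic of this case study can be reproduced with a short script. This is a minimal sketch using the figures given above (mean 36, standard deviation 10, sample size 200); the function names are illustrative choices, not part of the text.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for a large (infinite) population."""
    return sigma / math.sqrt(n)

def interval(mean, se, k):
    """Interval of +/- k standard errors around the sample mean."""
    return (mean - k * se, mean + k * se)

se = standard_error(10, 200)
print(f"standard error: {se:.3f} months")

# Intervals for 1, 2 and 3 standard errors (68.3%, 95.5%, 99.7% confidence)
for k in (1, 2, 3):
    low, high = interval(36, se, k)
    print(f"+/-{k} standard errors: {low:.3f} to {high:.3f} months")
```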

8.8 Interval Estimates and Confidence Intervals


In this section, we will discuss interval estimates as well as
confidence intervals. In finding interval estimates, we are not confined to
±1, 2 and 3 standard errors. For example, ±1.64 standard errors include
about 90 percent of the area under the curve, that is, 0.4495 of the area
on either side of the mean in a normal distribution. Similarly, ±2.58 standard
errors include about 99 percent of the area, or 49.51 percent on either side
of the mean. The probability associated with an interval estimate is called
the confidence level. The multipliers applied to the standard error in finding
different confidence intervals are drawn from the standard normal distribution
table (whenever the sample is large, that is, not less than 30) and they
correspond to the confidence level. We use Student’s t-distribution in the
case of small samples, which shall be discussed separately.
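
Instead of reading the multiplier from a printed table, it can be computed from the standard normal distribution. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper name `z_for_confidence` is an illustrative choice. Printed tables round the results, which is why the text quotes 1.64 and 2.58.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided multiplier: the central `level` of the area lies within +/- z."""
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}")
```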

Key statistic
The probability that we associate with an interval estimate is called the
confidence level.
This probability indicates how confident we are about the fact that the
interval estimate will include the population parameter. A higher probability
means more confidence. In estimation, the most commonly used confidence
levels are 90 percent, 95 percent, and 99 percent, but we are free to apply
any confidence level. The confidence interval is the range of the estimate
we are making.

Example 3
If we report that we are 90 percent confident that the mean of the
population of incomes of people in a certain community will lie between
Rs. 8,000 and Rs. 24,000, then the range Rs. 8,000 - Rs. 24,000 is our
confidence interval.
Often, however, we will express the confidence interval in standard errors
rather than in numerical values. Thus, we will often express confidence
intervals like this:
$\bar{X} + z\sigma_{\bar{x}}$ = upper limit of the confidence interval
$\bar{X} - z\sigma_{\bar{x}}$ = lower limit of the confidence interval

where $\sigma_{\bar{x}}$ is the standard error and ‘z’ is the standard normal variate
corresponding to the confidence level.


Thus, confidence limits are the upper and lower limits of the confidence
interval. In this case, $\bar{X} + z\sigma_{\bar{x}}$ is called the upper confidence limit (UCL)
and $\bar{X} - z\sigma_{\bar{x}}$ is the lower confidence limit (LCL).

8.8.1 Interval estimates of the Mean


As noted in the earlier case study, the standard error of the mean, in
the case of a large (infinite) population, is given by the formula

$\sigma_{\bar{x}} = \dfrac{\sigma_s}{\sqrt{n}}$

Further, the z-value corresponding to the chosen confidence level can be obtained
from the statistical table. Substituting these values, we calculate UCL and
LCL as:

$\bar{X} \pm z\,\dfrac{\sigma_s}{\sqrt{n}}$

If the samples are drawn from a finite population, then we use the finite
population multiplier to calculate the standard error. As discussed in the
previous unit, the standard error of the mean of a finite population can be
calculated as:

$\sigma_{\bar{x}} = \dfrac{\sigma_s}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}}$

This multiplier is applied when the sample size ‘n’ is greater than five percent of the population
size ‘N’, that is,

$\dfrac{n}{N} > 0.05$
Solved Problem 2
From a population of 100, a sample of 10 individuals is taken. From this
sample the mean is found to be 5.2 and the standard deviation to be 1.3.
Find the 95% confidence interval for μ.
Solution
Given $N = 100$, $n = 10$, $\bar{X} = 5.2$ and $\sigma_s = 1.3$.

First we have to find $\sigma_{\bar{x}}$. Since the population size is finite, $\sigma_{\bar{x}}$ is given by

$\sigma_{\bar{x}} = \dfrac{\sigma_s}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}} = \dfrac{1.3}{\sqrt{10}} \times \sqrt{\dfrac{100-10}{100-1}} = 0.4111 \times 0.9535 = 0.392$

At the 95% level of confidence, we know from the ‘z’ table that ‘z’ is 1.96.
UCL $= \bar{X} + z\sigma_{\bar{x}} = 5.2 + 1.96 \times 0.392 = 5.2 + 0.7683 = 5.9683$
LCL $= \bar{X} - z\sigma_{\bar{x}} = 5.2 - 1.96 \times 0.392 = 5.2 - 0.7683 = 4.4317$

Therefore, the 95% confidence interval for μ is
4.4317 ≤ μ ≤ 5.9683
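
The arithmetic of Solved Problem 2 is easy to re-check in code. The following is a minimal sketch built from the finite-population formula above and the given values (N = 100, n = 10, mean 5.2, standard deviation 1.3, z = 1.96); the function names are illustrative choices.

```python
import math

def finite_population_se(s, n, N):
    """Standard error of the mean with the finite population multiplier."""
    return (s / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

def confidence_interval(mean, se, z):
    """Lower and upper confidence limits: mean -/+ z standard errors."""
    return (mean - z * se, mean + z * se)

se = finite_population_se(1.3, 10, 100)
lcl, ucl = confidence_interval(5.2, se, 1.96)
print(f"standard error = {se:.4f}")
print(f"95% interval: {lcl:.4f} to {ucl:.4f}")
```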
8.8.2 Interval estimates of the Proportion
Statisticians often use a sample to estimate a proportion of occurrences in a
population. For example, by using a sampling procedure, the government
estimates the unemployment rate, that is, the proportion of unemployed people
in the country’s workforce.
We know that for a binomial distribution, the mean and the standard
deviation are:
Mean $\mu = np$
Standard deviation $\sigma = \sqrt{npq}$
where,
n = number of trials
p = probability of success
q = probability of failure $= 1 - p$
We can modify the formula for the standard deviation of the binomial
distribution, $\sqrt{npq}$, which measures the standard deviation in the number of
successes. To change the number of successes to the proportion of
successes, we divide $\sqrt{npq}$ by n and get $\sqrt{\dfrac{pq}{n}}$.
Therefore, the standard error of the proportion is given by:

$\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}}$ (large population case)

Using the above estimated standard error of proportion, we can work out the
confidence interval for the population proportion thus:

$p \pm z\sqrt{\dfrac{pq}{n}}$

For the standard error of proportion in the case of a finite population, we have:

$\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}} \times \sqrt{\dfrac{N-n}{N-1}}$ (finite population case)
Solved Problem 3
In a very large organisation, the director wanted to find out what proportion
of the employees prefer to provide their own retirement benefits in lieu of a
company-sponsored plan. A simple random sample of 75 employees was
taken. It was found that 40%, that is, 0.4 of them are interested in providing
their own retirement plans. The management requests that we use this
sample to find an interval about which they can be 99 percent confident that
it contains the true population proportion.
Solution
Here, n = 75,
p = 0.4,
q = 1- p = 1 – 0.4 = 0.6

Therefore, standard error of the proportion $= \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.4)(0.6)}{75}} = 0.057$
Confidence interval is given by

$p \pm z\sqrt{\dfrac{pq}{n}}$

At the 99% level of confidence, we know from the ‘z’ table that ‘z’ is 2.58.
UCL = 0.4 + 2.58 (0.057) = 0.547
LCL = 0.4 - 2.58 (0.057) = 0.253
Therefore, the interval estimate for the 99% level of confidence is
0.4 ± 2.58 (0.057), that is, from 0.253 to 0.547.
Hence, the proportion of the total population of employees who wish to
establish their own retirement plans lies between 0.253 and 0.547.
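
The same calculation can be sketched in code. The function names below are illustrative choices. Note that the script carries the unrounded standard error (about 0.0566) through the calculation, so its limits differ slightly in the third decimal place from the hand calculation, which rounds the standard error to 0.057 first.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion (large population case)."""
    return math.sqrt(p * (1 - p) / n)

def proportion_interval(p, n, z):
    """Lower and upper confidence limits for the population proportion."""
    se = proportion_se(p, n)
    return (p - z * se, p + z * se)

lcl, ucl = proportion_interval(0.4, 75, 2.58)
print(f"standard error = {proportion_se(0.4, 75):.4f}")
print(f"99% interval: {lcl:.3f} to {ucl:.3f}")
```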


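Table 8.3: Formulae Concerning Estimation

Estimating the population mean (μ) when we know $\sigma_p$:
  Infinite population: $\bar{X} \pm z\,\dfrac{\sigma_p}{\sqrt{n}}$
  Finite population: $\bar{X} \pm z\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \dfrac{\sigma_p}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}}$

Estimating the population mean (μ) when we do not know $\sigma_p$:
  Infinite population: $\bar{X} \pm z\,\dfrac{\sigma_s}{\sqrt{n}}$
  Finite population: $\bar{X} \pm z\,\sigma_{\bar{x}}$, where $\sigma_{\bar{x}} = \dfrac{\sigma_s}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}}$

Estimating the population proportion where the sample is large:
  Infinite population: $p \pm z\sqrt{\dfrac{pq}{n}}$
  Finite population: $p \pm z\sqrt{\dfrac{pq}{n}} \times \sqrt{\dfrac{N-n}{N-1}}$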
8.8.3 Interval estimates using the Student’s ‘t’ distribution
Till now, the sample sizes that we were examining were all larger than 30.
Note that in the earlier subsections, z-values corresponding to the confidence
level were chosen to calculate UCL and LCL on the assumption that the
sampling distribution fits the normal distribution because the sample size is
large. This is not always the case. This section explains how to handle
estimates where the normal distribution is not the appropriate sampling
distribution. In other words, we will discuss here how to
compute the confidence interval when the sample size is 30 or less. For
example, we may have data from only 10 weeks, or other samples of fewer than 30 observations.
The sampling distribution in this case may not fit the normal distribution. But,
fortunately, another distribution exists that is appropriate in these cases. It is
called the ‘t’ distribution. Early theoretical work on ‘t’ distributions was done
by a man named W. S. Gosset in the early 1900s. Gosset was employed by
the Guinness Brewery in Dublin, Ireland, which did not permit employees to
publish research findings under their own names. So, Gosset adopted the
pen name ‘Student’ and published under that name. Consequently, the ‘t’
distribution is commonly called Student’s ‘t’ distribution, or simply Student’s
distribution.


Conditions for usage


Statisticians often associate the ‘t’ distribution with small sample statistics,
because it is used when the sample size is 30 or less. This is misleading
because the size of the sample is only one of the conditions that lead us to
use the ‘t’ distribution. The second condition is that the population standard
deviation must be unknown. Furthermore, in using the t distribution, we
assume that the population is normal or approximately normal.
Degrees of freedom
The term degrees of freedom refers to the number of independent
observations for a source of variation minus the number of independent
parameters estimated in computing the variation. There is a different ‘t’
distribution for each of the possible degrees of freedom.

Key statistic
We can define degrees of freedom as the number of values that we can
freely choose. We will use degrees of freedom when we select a ‘t’
distribution to estimate a population mean, and we will use ‘n-1’ degrees
of freedom, where ‘n’ is the sample size.
For example, if we use a sample of 20 to estimate the mean of a population,
we will use 19 degrees of freedom in order to select the appropriate ‘t’
distribution. With two sample values, we have one degree of freedom
(2-1 = 1), and with seven sample values, we have six degrees of freedom
(7-1 = 6). In each of these two examples, then, we had ‘n-1’ degrees of
freedom; assuming ‘n’ is the sample size. Similarly, a sample of 23 would
give us 22 degrees of freedom.

Key statistic
In any estimation problem in which the sample size is 30 or less and the
standard deviation of the population is unknown and the underlying
population can be assumed to be normal or approximately normal, use
the ‘t’ distribution.

Using the ‘t’ distribution table


We will discuss here the comparison between the ‘t’ and ‘z’ tables. The
‘t’ distribution table differs in construction from the ‘z’ table or normal
distribution table. The ‘t’ table is more compact and shows areas and ‘t’
values for only a few percentages (10, 5, 2, and 1 percent). Because there
is a different ‘t’ distribution for each number of degrees of freedom, a more
complete table would be quite lengthy.
A second difference in the ‘t’ table is that it does not focus on the chance
that the population parameter being estimated will fall within our confidence
interval. Instead, it measures the chance that the population parameter we
are estimating will not be within our confidence interval (that is, it will lie
outside the confidence interval).
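Table 8.4: Formulae Concerning Estimation

Estimating the population mean (μ) when we do not know $\sigma_p$ and use $\sigma_s$, and the sample is small ($n \le 30$):
  Infinite population: $\bar{X} \pm t\,\dfrac{\sigma_s}{\sqrt{n}}$
  Finite population: $\bar{X} \pm t\,\dfrac{\sigma_s}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}}$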

Solved Problem 4
A random sample of 14 items is taken, producing a sample mean of 2.14
and a sample standard deviation of 1.29. Find the 99% confidence interval for the
population mean. (The ‘t’ table value is 3.012.)
Solution
Given $n = 14$, $\bar{X} = 2.14$ and $\sigma_s = 1.29$. Since the sample size is less than 30,
we use the ‘t’ distribution to compute the confidence interval.
The confidence interval for μ is given by

UCL $= \bar{X} + t\,\dfrac{\sigma_s}{\sqrt{n}}$ and LCL $= \bar{X} - t\,\dfrac{\sigma_s}{\sqrt{n}}$.

The table value for t at the 99% confidence level and n - 1 = 14 - 1 = 13 degrees of
freedom is 3.012. Therefore, we have

UCL $= 2.14 + 3.012 \times \dfrac{1.29}{\sqrt{14}} = 2.14 + 1.04 = 3.18$ and
LCL $= 2.14 - 3.012 \times \dfrac{1.29}{\sqrt{14}} = 2.14 - 1.04 = 1.10$

Therefore, the confidence interval for μ is 1.10 ≤ μ ≤ 3.18.
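
A small-sample interval can be sketched in the same way as the large-sample one, with the ‘t’ table value supplied by hand in place of ‘z’. The function name `t_interval` is an illustrative choice, and the value 3.012 is the table value quoted in the problem, not something the script computes.

```python
import math

def t_interval(mean, s, n, t_value):
    """Confidence limits for a small sample, given a 't' table value
    looked up for n - 1 degrees of freedom."""
    margin = t_value * s / math.sqrt(n)
    return (mean - margin, mean + margin)

degrees_of_freedom = 14 - 1          # n - 1
lcl, ucl = t_interval(2.14, 1.29, 14, 3.012)
print(f"df = {degrees_of_freedom}, interval: {lcl:.2f} to {ucl:.2f}")
```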

Self Assessment Questions


1. XY Pizza has developed quite a business in Bangalore by delivering
pizza orders promptly. It guarantees that its pizzas will be delivered in
30 minutes or less from the time the order was placed, and if the
delivery is late, the pizza is free. The time taken to deliver each
pizza order is recorded in the pizza time book
(PTB), and the delivery time for those pizzas that are delivered late is
recorded as 30 minutes in the PTB. A sample of 12 random entries
from the PTB is given in table 8.5.
Table 8.5: Twelve Random Entries of Pizza Delivery Time

15.3 29.5 30 10.1 30 19.6
10.8 12.2 14.8 30 22.1 18.3

i) Find the mean for the sample.
ii) From what population was this sample drawn?
iii) Can this sample be used to estimate the average time that it
takes for XY Pizza to deliver a pizza? Explain.
2. Madhu, a frugal student, wants to buy a used bike. After randomly
selecting 125 wanted advertisements, he found the average price of the
bike to be Rs. 3250 with a standard deviation of Rs. 615. Establish an
interval estimate for the average price of bike so that Madhu can be:
i) 68.3% certain that the population mean lies in this interval.
ii) 95.5% certain that the population mean lies in this interval.
3. Given the following confidence levels, express the lower and upper limits
of the confidence interval for these levels in terms of $\bar{X}$ and $\sigma_{\bar{x}}$ (use
the normal distribution tables).
i) 54 percent
ii) 75 percent
iii) 94 percent
iv) 98 percent


4. From a population of 540, a sample of 60 individuals is taken. From this
sample the mean is found to be 6.2 and the standard deviation to be
1.368.
i) Find the estimated standard error of the mean.
ii) Construct a 96 % confidence interval of the mean.
5. For the following sample sizes and confidence levels, find the
approximate ‘t’ values for constructing confidence intervals (use the ‘t’
table).
i) n = 28; 95%
ii) n = 8; 98%
iii) n = 13; 90%
iv) n = 25; 95%
Solved Problem 5
IMA Management University wants to conduct a survey of the annual
earning of its graduates in international placements. It knows from past
experience that the standard deviation of its population of students is
Rs. 1500. How large a sample size should be taken in order to estimate the
mean annual earnings of last year’s class within Rs. 500 at 95% level of
confidence?
Solution
From the given data, the estimate may vary by up to Rs. 500 on
either side of the population mean. That is,

$z\sigma_{\bar{x}} = 500$

At the 95% level of confidence, we know from the ‘z’ table that ‘z’ is 1.96.
Therefore,
$1.96\,\sigma_{\bar{x}} = 500$
$\sigma_{\bar{x}} = 500 / 1.96 = 255$

Now, if the standard error of the mean is 255, that leads us to:
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 255$

Since ‘σ’ is 1500, we can find ‘n’, that is:

$1500 / \sqrt{n} = 255$

Therefore,

$n = \left(\dfrac{1500}{255}\right)^2 = 34.6$

This implies that ‘n’ should be greater than 34.6, that is, at least 35, if the university wants to
estimate the mean annual earnings with the required precision.
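
The steps of Solved Problem 5 collapse into a single expression, $n = (z\sigma / E)^2$, where E is the allowed margin of Rs. 500. A minimal sketch follows; the function name is an illustrative choice, and rounding up reflects that a sample size must be a whole number.

```python
import math

def required_sample_size(sigma, margin, z):
    """Smallest whole sample size for which z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)

n = required_sample_size(sigma=1500, margin=500, z=1.96)
print(n)
```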

8.9 Summary
Let us recapitulate the important concepts discussed in this unit:
• The point estimates and interval estimates are the foundations for
inferential statistics in estimation and hypothesis testing.
• Point estimate is a single number that is used to estimate an unknown
population parameter.
• Interval estimate is a range of values used to estimate a population
parameter.
• If the sample size is 30 or less and the population standard deviation
is not known, we use the Student’s ‘t’ distribution for estimations.

8.10 Glossary
Confidence interval estimate: An interval constructed from a set of data to
estimate a parameter. It provides a range of values
around an estimate to show how precise the estimate is. The confidence
level associated with the interval, usually 90%, 95%, or 99%, is the
percentage of times in repeated sampling that the intervals will contain the
true value of the unknown parameter.
Degrees of freedom: The number of values in the final calculation of a
statistic that are free to vary. The term is frequently used in the organisation of
tables of statistical distributions used in significance tests, for
example, the t-distribution.
Estimation: The process of using a sample to estimate features of a
population.
Interval estimate: A range of values used to estimate a
population parameter.
Interval estimates of the proportion: Statisticians often use a sample to
estimate a proportion of occurrences in a population.
Point estimate: A single sample statistic used to estimate a population
parameter.
Statistical inference: Statistical inference is based on estimation and
hypothesis testing. In both, inferences are
made about the characteristics of populations from information contained in
samples.

8.11 Terminal Questions


1. XYZ bank is determining the number of tellers available during the
Friday lunch rush hour. The bank has collected data on the number of
people who entered the bank during the past three months, on Fridays
from 11 am to 1 pm. Using the data from table 8.6, find the point
estimates of the mean and standard deviation of the population from
which the sample was drawn.
Table 8.6: Number of People Entering XYZ Bank

242 275 289 306 342 385
279 245 269 305 294 328

2. From a population known to have a standard deviation of 1.4, a sample
of 60 individuals is taken. The mean of this sample is found to be 6.2.
i) Find the standard error of the mean.
ii) Establish an interval estimate around the sample mean using one
standard error of the mean.
3. On collecting a sample of 250 from a population with a known standard
deviation of 13.7, the mean is found to be 112.4.
i) Find a 95% confidence interval for the mean.
ii) Find a 99% confidence interval for the mean.

Activity:
1. Which of the following is not a desirable property of a point
estimator?
i) Consistency
ii) Efficiency
iii) Sufficiency
iv) Bias


2. Which of the following is most relevant for deriving a point estimate?


i) Sample size
ii) Confidence desired
iii) Variability in the population
iv) Population size.
3. Which of the following factors does not affect the width of a
confidence interval?
i) Sample size
ii) Confidence desired
iii) Variability in the population
iv) Population size.
4. A sampling distribution is the distribution of a ______________.
i) Parameter
ii) Mean
iii) Proportion
iv) Statistic
5. The standard deviation of a sample mean is called __________.
i) Sample error
ii) Standard deviation
iii) Standard error
iv) None of the above

8.12 Answers

Self Assessment Questions


1. i) For the given sample the mean is 20.225 minutes.
ii) The sample was drawn from the population of delivery times recorded
in the Pizza Time Book (PTB) of XY Pizza.
iii) No. Because delivery times over 30 minutes are recorded as 30, the
sample will underestimate the average delivery time.
2. The sample standard deviation, sample size and sample mean are given as:
$\sigma_s = 615$, $n = 125$, $\bar{X} = 3250$
and the standard error $\sigma_{\bar{x}}$ is calculated as:

$\sigma_{\bar{x}} = \dfrac{\sigma_s}{\sqrt{n}} = \dfrac{615}{\sqrt{125}} = 55.01$

i) To be 68.3% certain, $\bar{X} \pm 1\sigma_{\bar{x}} = 3250 \pm 55.01$, giving a range
between 3194.99 and 3305.01.
ii) To be 95.5% certain, $\bar{X} \pm 2\sigma_{\bar{x}} = 3250 \pm 110.02$, giving a range
between 3139.98 and 3360.02.
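3. The required lower and upper confidence limits are:
i) $\bar{X} \pm 0.74\,\sigma_{\bar{x}}$ ii) $\bar{X} \pm 1.15\,\sigma_{\bar{x}}$
iii) $\bar{X} \pm 1.88\,\sigma_{\bar{x}}$ iv) $\bar{X} \pm 2.33\,\sigma_{\bar{x}}$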
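4. i) Since $\dfrac{n}{N} > 0.05$, $\sigma_{\bar{x}} = \dfrac{\sigma_s}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}}$

$\sigma_{\bar{x}} = \dfrac{1.368}{\sqrt{60}} \times \sqrt{\dfrac{540-60}{540-1}} = 0.167$

ii) $\bar{X} \pm 2.05\,\sigma_{\bar{x}} = 6.2 \pm 2.05\,(0.167)$

Hence, the LCL and UCL are 5.86 and 6.54 respectively.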
5. i) 2.052
ii) 2.998
iii) 1.782
iv) 2.064

Terminal Questions
1. The mean and standard deviation are 296.583 and 40.751.
2. i) 0.181
ii) 6.019, 6.381
3. i) 112.4 ± 1.697
ii) 112.4 ± 2.234
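
The point estimates for Terminal Question 1 can be checked with Python's standard-library `statistics` module. This is a minimal sketch using the twelve values from table 8.6; `stdev` uses the n - 1 divisor, which is the usual point estimate of a population standard deviation from a sample. The variable names are illustrative choices.

```python
from statistics import mean, stdev

# Number of people entering the bank (table 8.6)
arrivals = [242, 275, 289, 306, 342, 385,
            279, 245, 269, 305, 294, 328]

x_bar = mean(arrivals)   # point estimate of the population mean
s = stdev(arrivals)      # point estimate of the population standard deviation
print(f"mean = {x_bar:.3f}, standard deviation = {s:.3f}")
```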
Activity Solution
1. iv) Bias
2. i) Sample size


3. iv) Population size


4. iv) Statistic
5. iii) Standard error

8.13 Case Study


Case Study 1 – Survey on MBA Students
A survey was conducted among 400 MBA 3rd year students at a
management institute to ascertain their most preferred criterion for
accepting an offer. The criteria were: salary package, job profile and brand
value of the company. Their responses are tabulated as follows. Do the
criteria depend on the area of specialisation?
Table 8.7: Response of students on criteria

Criteria                 Production  Marketing  HRM  Finance  Total
Salary Package           45          36         28   47       156
Job Profile              35          24         32   33       124
Brand value of company   20          40         40   20       120
Total                    100         100        100  100      400

Will the conclusion change if the data for marketing and finance are
interchanged?
Case Study 2 – Study Consumer Behaviour
An advertisement company is interested in studying consumers’
behaviour in the context of the purchase decision for jeans in the Lee market.
This company is aiming to be a major player in the Lee market, which is
characterised by intense competition. It would like to know in particular
whether the income level of consumers influences their choice of
brand. Currently there are four brands in the market. ‘A’ and ‘B’ are the
premium brands while ‘C’ and ‘D’ are the economy brands.
A stratified random sampling procedure was adopted to cover the entire
market using income as the basis of selection. The categories that were
used in classifying income level were: lower, middle, upper middle and high. A
sample of 700 consumers participated in this study. The data in
table 8.8 emerged from the study.


Table 8.8: Choice of Brand

Income         Brand A  Brand B  Brand C  Brand D  Total
Lower          30       20       50       70       170
Middle         25       30       40       35       130
Upper Middle   45       50       25       30       150
High           75       50       75       50       250
Total          175      150      190      185      700

Analyse the above data to test the independence of brand and income level.
Further, the marketing manager is in a dilemma about selecting the appropriate
colours for jeans. For this, he wishes to compare five different colours of
jeans. He is interested in knowing the most preferred colour. A random
sample of 500 consumers reveals the following observations.
Table 8.9: Preference of consumers on colour of Jeans

Jeans colour   Preference by consumer
White          80
Basic Blue     140
Green          70
Indigo         130
Black          80
Total          500

Does the consumer preference for jeans colours show any significant
difference?


Unit 9 Testing of Hypothesis in Case of Large and Small Samples
Structure:
9.1 Introduction
Objectives
Relevance
Assumptions
9.2 Testing Hypothesis
Null and Alternate hypothesis
Interpreting the level of significance
Hypotheses are accepted and not proved
9.3 Selecting a Significance Level
Preference of type I error
Preference of type II error
Determine appropriate distribution for the test of Mean
9.4 Two–tailed Tests and One–tailed Tests for Mean
Case study on Two–tailed and One-tailed tests
9.5 Classification of Test Statistics
Statistics used for testing of hypothesis
Test procedure
How to identify the right statistics for the test
9.6 Testing of Hypothesis in the Case of Small Samples
9.7 ‘t’ Distribution
Uses of ‘t’ test
9.8 Summary
9.9 Glossary
9.10 Terminal Questions
9.11 Answers
9.12 Case Study

9.1 Introduction
In the previous unit, Estimation, we studied the estimation of
parameters from samples and the methods of estimation. In this unit,
Testing of Hypothesis, we will study hypotheses and the testing of
hypotheses. Estimation is about estimating the parameters and finding
the confidence intervals. Hypothesis testing examines an opinion about the
population parameter that may or may not be in the confidence interval
derived from the sample. Hypothesis testing is helpful in decision making.
Before starting this unit, let’s refresh the concepts we have studied on
estimation.
Hypothesis testing begins with an assumption, called a hypothesis, that we
make about a population parameter, wherein we assume a certain value for
the population parameter. To test the validity of our assumption, we gather
sample data and determine the difference between the hypothesised value
and the actual value of the sample statistic. Then we judge whether the
difference is significant.
The smaller the difference, the greater the likelihood that our hypothesised
value for the parameter is correct. The larger the difference, the smaller the
likelihood that our hypothesised value for the parameter is correct.
Unfortunately, the difference between the hypothesised population
parameter and the actual statistic is more often neither so large that we
automatically reject our hypothesis, nor so small that we just as quickly
accept it. So in hypothesis testing, as in most significant real-life decisions,
clear-cut solutions are the exception, not the rule.
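
One common way to judge the size of such a difference is to express it in standard errors. The sketch below is illustrative only: it reuses the battery figures from Unit 8 (sample mean 36, standard deviation 10, sample size 200), while the hypothesised mean of 35 months and the function name `z_statistic` are assumptions made for this example.

```python
import math

def z_statistic(sample_mean, hypothesised_mean, sigma, n):
    """Difference between sample statistic and hypothesised value,
    measured in standard errors of the mean."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - hypothesised_mean) / standard_error

# Hypothesised mean of 35 months is an assumed value for illustration
z = z_statistic(sample_mean=36, hypothesised_mean=35, sigma=10, n=200)
print(f"difference = {z:.3f} standard errors")
```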
Objectives:
After studying this unit, you should be able to:
• describe the basic concepts of testing hypothesis
• describe the different test statistics available
• identify the test for a given problem
• identify the type of errors
9.1.1 Relevance
Caselet
You need to be objective
The government in a certain country says that radiation levels in the area
surrounding a nuclear power plant are well below levels considered harmful.
Three people in the area died of leukaemia. The local people immediately put
the blame on the radioactive fallout. Does the death of three people lead
us to assume that the government is wrong with its information, and to
hypothesise that radiation levels in the area are
abnormally high? Alternatively, do we accept that the deaths from leukaemia
are random and not related to the nuclear power facility? You should not
accept or reject a hypothesis about a population parameter, in this case the
radiation levels in the area surrounding the nuclear power plant, simply by
intuition. You need to be objective in decision making. For this situation, an
appropriate action would be to take samples of the incidence of leukaemia
cases over a reasonable period of time and use these to test the hypothesis.
The purpose of this unit is to find out how to use hypothesis testing to
determine if a claim is valid.
(Source: Derek L Waller published by Elsevier Inc Ed 2008)

9.1.2 Assumptions
Although hypothesis testing sounds like a formal statistical term
completely unrelated to business decision making, managers in fact
propose and test hypotheses all the time. For example, “if we drop the price
of this car model by Rs.1,500, we will sell 50,000 cars this year” is a
hypothesis. To test this hypothesis, total car sales till the end of the year
have to be counted.
Managerial hypotheses are based on intuition; the marketplace decides
whether the manager’s intuitions were correct. Hypothesis testing is about
making inferences about a population from only a small sample. The bottom
line in hypothesis testing is when we ask ourselves (and then decide)
whether a population like the one hypothesised would be likely to produce a sample like
the one we are looking at.

9.2 Testing Hypothesis


9.2.1 Null and Alternate hypothesis
In hypothesis testing, we must state the assumed or hypothesised value of
the population parameter before we begin sampling. The assumption we
wish to test is called the null hypothesis and is symbolised by ‘$H_0$’.

Example 1
We want to test the hypothesis that the population mean is equal to 500.
We would symbolise it as follows and read it as
‘The null hypothesis is that the population mean = 500’, which is written as

$H_0: \mu = 500$


The term ‘null hypothesis’ is derived from agricultural and medical
applications of statistics. In order to test the effectiveness of a new fertiliser
or drug, the tested hypothesis (the null hypothesis) was that it had no
effect, that is, there was no difference between treated and untreated
samples. If we use a hypothesised value of a population mean in a problem,
we would represent it symbolically as ‘$\mu_{H_0}$’. This is read as ‘the
hypothesised value of the population mean’.
If our sample results fail to support the null hypothesis, we must conclude
that something else is true. Whenever we reject the hypothesis, the
conclusion we do accept is called the alternative hypothesis and is
symbolised as ‘$H_1$’.
For the null hypothesis $H_0: \mu = 200$, we will consider three alternative
hypotheses:
$H_1: \mu \neq 200$ (population mean is not equal to 200)
$H_1: \mu > 200$ (population mean greater than 200)
$H_1: \mu < 200$ (population mean less than 200)
Example 2
If we want to test the success rate of a particular treatment, we make the null
hypothesis for success rate ‘p’ (for the test value of 0.99) as:
$H_0: p = 0.99$ and the alternative hypothesis is one among
$H_1: p \neq 0.99$
$H_1: p > 0.99$
$H_1: p < 0.99$
Example 3
If we want to test if the attribute of educational qualification has any
influence on the income of an individual, we make the null hypothesis as:
$H_0$: Educational qualification has no influence on the income of an
individual
and the alternative hypothesis is
$H_1$: Educational qualification has an influence on the income of the
individual

9.2.2 Interpreting the level of significance


The purpose of hypothesis testing is not to question the computed value of
the sample statistic but to make a judgment about the difference between
that sample statistic and a hypothesised value for population parameter.
The next step, after stating the null and alternative hypotheses, is to decide
what criterion to use for deciding whether to accept or reject the null
hypothesis. If we assume the hypothesis is correct, then the significance
level indicates the percentage of sample statistics that fall outside certain
limits. (In estimation, the confidence level indicates the percentage of
sample statistics that fall within the defined confidence limits.)
9.2.3 Hypotheses are accepted and not proved
Even if our sample statistic falls in the non-shaded (acceptance) region of
Figure 9.1, this does not prove that our null hypothesis (H0) is true; it
simply does not provide statistical evidence to reject it. Why? The only way
in which the hypothesis can be accepted with certainty is for us to know the
population parameter; unfortunately, this is not possible.
Therefore, whenever we say that we accept the null hypothesis, we actually
mean that there is not sufficient statistical evidence to reject it. Use of the
term ‘accept’, instead of ‘do not reject’, has become standard practice. It
means that when the sample data do not lead us to reject a null hypothesis,
we behave as if the hypothesis is true. Figure 9.1 depicts the non-shaded
region, which makes up 95 percent of the area under the curve.

Fig. 9.1: Acceptance and Rejection Region of Sample

9.3 Selecting a Significance Level


There is no single standard or universal level of significance for testing
hypotheses. In some instances, a 5% level of significance is used. In
published research papers, researchers often test hypotheses at
the 1 percent level of significance. It is possible to test a hypothesis
at any level of significance. However, remember that our choice of the
minimum standard for an acceptable probability, or the significance level, is
also the risk we assume of rejecting a null hypothesis when it is true.
The higher the significance level we use for testing a hypothesis, the higher
the probability of rejecting a null hypothesis when it is true. A 5% level of
significance implies that we are ready to reject a true hypothesis in 5% of
cases. If the significance level is high, then we would rarely accept the null
hypothesis when it is not true but, at the same time, would often reject it
when it is true.
When testing a hypothesis we come across four possible situations. Table
9.1 depicts the four possible situations.
Table 9.1: Possible Situations when Testing a Hypothesis
Test result says      Hypothesis is True      Hypothesis is False
Accept                Right decision          Type II error
Reject                Type I error            Right decision

The combinations are:
• If the null hypothesis is true, and the test result makes us accept it, then
we have made a right decision.
• If the null hypothesis is true, and the test result makes us reject it, then
we have made a wrong decision (Type I error). Its probability is also known
as the producer’s risk, denoted by α.
• If the null hypothesis is false, and the test result makes us accept it, then
we have made a wrong decision (Type II error). Its probability is known as
the consumer’s risk, denoted by β. 1 - β is called the power of the test.
• If the null hypothesis is false, and the test result makes us reject it, then
we have made a right decision.
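
The trade-off between the two error probabilities can be made concrete with a short calculation. The following is a minimal sketch, not part of the original unit: the function name and the numbers (hypothesised mean 500, a true mean of 520, σ = 50, n = 25) are illustrative assumptions, and the population is assumed normal with known standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_error_rates(mu0, mu_true, sigma, n, alpha):
    """Return (alpha, beta, power) for a two-tailed z-test of H0: mu = mu0.

    Assumes a normal population with known standard deviation sigma.
    mu_true is a hypothetical true mean under which H0 is false.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # boundary of the acceptance region
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # true mean measured in standard errors
    # beta = P(statistic falls in the acceptance region when H0 is false)
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return alpha, beta, 1 - beta


# Illustrative values only (not taken from the text)
for level in (0.01, 0.05, 0.10):
    a, b, power = two_tailed_error_rates(500, 520, 50, 25, level)
    print(f"alpha={a:.2f}  beta={b:.3f}  power={power:.3f}")
```

Under these assumptions, raising the significance level lowers β and raises the power, which is the behaviour described in the paragraph on high significance levels.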

9.3.1 Preference of type I error


For example, suppose that making a type I error (rejecting a null hypothesis
when it is true) involves the time and trouble of reworking a batch of
chemicals that should have been accepted. At the same time, making a type II
error (accepting a null hypothesis when it is false) means taking a chance
that an entire group of users of this chemical compound will be poisoned.
Obviously, the management of this company will prefer a type I error to a
type II error and, as a result, will set very high levels of significance in its
testing to get low β’s.
9.3.2 Preference of type II error
Suppose, on the other hand, that making a type I error involves
disassembling an entire engine at the factory, but making a type II error
involves relatively inexpensive warranty repairs by the dealers. In this case,
the manufacturer is more likely to prefer a type II error and will set lower
significance levels in its testing.
9.3.3 Determine appropriate distribution for the test of Mean
After deciding what level of significance to use, our next task in hypothesis
testing is to determine the appropriate probability distribution. We have a
choice between the normal distribution, and the ‘t’ distribution.
The rules for choosing the appropriate distribution are similar to those we
encountered in the unit on estimation. Later in this unit, we shall examine
the distributions appropriate for testing hypothesis about proportions.
Table 9.2 depicts when to use the normal and ‘t’ distributions in making tests
of means.

Table 9.2: Conditions for Using the Normal and ‘t’ Distributions in
Testing Hypothesis about Means
Sample size                           Population standard          Population standard
                                      deviation is known           deviation is not known
‘n’ is larger than 30                 Normal distribution,         Normal distribution,
                                      z-table                      z-table
‘n’ is 30 or less and we assume       Normal distribution,         ‘t’ distribution,
the population is normal or           z-table                      ‘t’ table
approximately so

One more rule has to be kept in mind, when testing the hypothesised values
of a mean. As in estimation, use the finite population multiplier whenever the
population is finite in size, sampling is done without replacement, and the
sample is more than five percent of the population.
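
The two rules above (Table 9.2 and the finite population multiplier) can be written as a pair of small helper functions. This is only a sketch: the function names are invented for illustration, the population is assumed to be approximately normal when n is 30 or less, and sampling is assumed to be without replacement.

```python
from math import sqrt


def choose_distribution(n, sigma_known):
    """Pick the distribution for a test of a mean, following Table 9.2.

    For n <= 30 the population is assumed normal or approximately so.
    """
    if n > 30 or sigma_known:
        return "normal"
    return "t"


def finite_population_multiplier(population_size, n):
    """Return sqrt((N - n) / (N - 1)) when the sample exceeds 5% of the
    population (sampling without replacement); otherwise return 1.0."""
    if n / population_size > 0.05:
        return sqrt((population_size - n) / (population_size - 1))
    return 1.0


print(choose_distribution(20, sigma_known=True))    # small sample, sigma known
print(choose_distribution(20, sigma_known=False))   # small sample, sigma unknown
print(finite_population_multiplier(10_000, 3_000))  # sample is 30% of the population
```

The standard error of the mean is multiplied by the value returned from `finite_population_multiplier` before the test statistic is computed.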

9.4 Two–tailed Tests and One–tailed Tests for Mean


A two-tailed test of a hypothesis will reject the null hypothesis, if the sample
mean is significantly higher than or lower than the hypothesised population
mean. Thus, in a two-tailed test, rejection region is split in two parts under
the distribution curve.
A two-tailed test is appropriate when:
The null hypothesis is $\mu = \mu_0$ (where $\mu_0$ is some specified value)
The alternative hypothesis is $\mu \neq \mu_0$.
A one-tailed test is appropriate in each of the following cases:
$H_1: \mu < \mu_0$
$H_1: \mu > \mu_0$

9.4.1 Case study on Two-tailed and One-tailed tests


Case Study
Let’s assume that a manufacturer of light bulbs wants to produce bulbs
with a mean life of:
$\mu = \mu_0 = 1000$ hours

If the lifetime is shorter, he/she will lose customers to competitors;
if the lifetime is longer, he/she will have a very high production cost
because the filaments will be excessively thick.
In order to see whether the production process is working properly, a
sample of the output is taken to test the hypothesis:
$H_0: \mu = 1000$
A two-tailed test is used because he/she does not want to deviate
significantly from 1,000 hours in either direction. The appropriate
alternative hypothesis is:
$H_1: \mu \neq 1000$
Therefore, the null hypothesis is rejected if the mean life of bulbs in the
sample is either too far above 1,000 hours or too far below 1,000 hours.

However, there are situations in which a two-tailed test is not appropriate,
and we must use a one-tailed test.

Case Study (contd.)


Consider the case of a wholesaler who buys light bulbs from the
manufacturer discussed earlier. The wholesaler buys bulbs in bulk and
does not want to accept a lot of bulbs unless their mean life is at least
1,000 hours. As each shipment arrives, the wholesaler tests a sample to
decide whether the shipment should be accepted. The wholesaler will
reject the shipment only if the sample indicates that the mean life is below
1,000 hours. If the bulbs are better than expected (with a mean life above
1,000 hours), the shipment will not be rejected because the longer life
comes at no extra cost.
So, the wholesaler’s hypotheses are:

$H_0: \mu = 1000$ hours and $H_1: \mu < 1000$ hours.

The wholesaler rejects ‘H0’ only if the mean life of the sampled bulbs is
significantly below 1,000 hours. Figure 9.2 depicts why this test is called
a left-tailed test (or a lower-tailed test).

Fig. 9.2: Left-tailed Test

In general, a left-tailed (lower-tailed) test is used if the hypotheses are
$H_0: \mu = \mu_0$ and $H_1: \mu < \mu_0$. In such a situation, sample evidence
with the sample mean significantly below the hypothesised population mean
leads us to reject the null hypothesis in favour of the alternative hypothesis.
Stated differently, the rejection region is in the lower tail (left tail) of the
distribution of the sample mean, and that is why we call this a lower-tailed
test.

A left-tailed test is one of two kinds of one-tailed tests. The other kind of
one-tailed test is a right-tailed test (or an upper-tailed test). An upper-tailed
test is used when the alternative hypothesis is $H_1: \mu > \mu_0$. Only values
of the sample mean that are significantly above the hypothesised population
mean will cause us to reject the null hypothesis in favour of the alternative
hypothesis. Figure 9.3 depicts an upper-tailed test where the rejection region
is in the upper tail of the distribution of the sample mean.

Fig. 9.3: Right-tailed Test
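
The position of the rejection region for each kind of test can be computed from the standard normal distribution instead of being read from a z-table. The sketch below uses Python's standard library; the function name and the labels "two", "lower" and "upper" are choices made for this illustration only.

```python
from statistics import NormalDist


def rejection_region(alpha, tail):
    """Return (lower_cutoff, upper_cutoff) of the z rejection region.

    tail is "two", "lower" or "upper"; None means that tail is not used.
    H0 is rejected when z < lower_cutoff or z > upper_cutoff.
    """
    std_normal = NormalDist()
    if tail == "two":
        cut = std_normal.inv_cdf(1 - alpha / 2)   # alpha is split between both tails
        return -cut, cut
    if tail == "lower":
        return std_normal.inv_cdf(alpha), None    # all of alpha in the left tail
    if tail == "upper":
        return None, std_normal.inv_cdf(1 - alpha)  # all of alpha in the right tail
    raise ValueError("tail must be 'two', 'lower' or 'upper'")


print(rejection_region(0.05, "two"))
print(rejection_region(0.01, "lower"))
print(rejection_region(0.02, "upper"))
```

Rounded to two decimals, these cutoffs agree with the usual table values of 1.96, -2.33 and 2.05.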

Tests for proportion and other parameters are similarly discussed; rejection
regions are similarly identified with reference to the given level of
significance and appropriate distribution.
In each example of hypothesis testing, when we accept a null hypothesis on
the basis of sample information, we are really saying that there is no
statistical evidence to reject it. We are not saying that the null hypothesis is
true. The only way to prove a null hypothesis is to know the exact value of
the population parameter or the population distribution and that is not
possible with just sampling. Thus, we accept the null hypothesis and behave
as if it is true simply because we can find no evidence to reject it.

Example 4
The hypothesis to be tested is $H_0: \mu = 100$, against the alternative
hypothesis $H_1: \mu \neq 100$, with sample size n = 20 and population standard
deviation σ = 2.5. Here the sample size is smaller than 30 but the population
standard deviation is given; hence the probability distribution used to test
the hypothesis is the normal distribution.

Example 5
The hypothesis to be tested is $H_0: \mu = 10$ against the alternative
hypothesis $H_1: \mu > 10$, with sample size n = 20; the population standard
deviation is not known. Here the sample size is smaller than 30 and the
population standard deviation is not given; hence the probability
distribution used to test the hypothesis is the ‘t’ distribution.

Self Assessment Questions


1. For the following cases, specify which probability distribution to use in
hypothesis testing:
i. $H_0: \mu = 27$, $H_1: \mu \neq 27$, $\bar{X} = 33$, sample $\sigma = 4$, n = 25
ii. $H_0: \mu = 98.6$, $H_1: \mu > 98.6$, $\bar{X} = 99.1$, $\sigma = 1.5$, n = 50
iii. $H_0: \mu = 3.5$, $H_1: \mu < 3.5$, $\bar{X} = 2.8$, sample $\sigma = 0.6$, n = 18
iv. $H_0: \mu = 57$, $H_1: \mu > 57$, $\bar{X} = 65$, sample $\sigma = 12$, n = 42

9.5 Classification of Test Statistics


9.5.1 Statistics used for testing of hypothesis
Table 9.3a: Statistics for Testing the Hypothesis on Proportion; Large Sample Case

Test 1. Test for specified proportion, infinite population
   Test statistic: $Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$
   Notes: $\hat{p}$ = sample proportion; $p$ = hypothesised value of population
   proportion; $q = 1 - p$; $n$ = sample size

Test 2. Test for specified proportion, finite population
   Test statistic: $Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}\,\sqrt{\dfrac{N - n}{N - 1}}}$
   Notes: $\hat{p}$ = sample proportion; $p$ = hypothesised value of population
   proportion; $q = 1 - p$; $n$ = sample size; $N$ = population size

Test 3. Test for difference in proportions of two samples, different populations
   Test statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}$
   Notes: $\hat{p}_1$ = first sample proportion; $\hat{p}_2$ = second sample
   proportion; $\hat{q}_1 = 1 - \hat{p}_1$; $\hat{q}_2 = 1 - \hat{p}_2$;
   $n_1$ = first sample size; $n_2$ = second sample size

Test 4. Test between proportions when populations are similar with respect to a given attribute
   Test statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{p_0 q_0 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$,
   where $p_0 = \dfrac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$ and $q_0 = 1 - p_0$
   Notes: $\hat{p}_1$ = first sample proportion; $\hat{p}_2$ = second sample
   proportion; $n_1$ = first sample size; $n_2$ = second sample size
Table 9.3b: Statistics for Testing the Hypothesis on Mean; Large Sample
Case

Test 5. Test for specified mean, infinite population, n > 30 and population variance known
   Test statistic: $Z = \dfrac{\bar{X} - \mu}{\sigma_p / \sqrt{n}}$
   Notes: $\mu$ = population mean; $\bar{X}$ = sample mean; $\sigma_p$ = population S.D.
   In case $\sigma_p$ is not known, we use $\sigma_s$ in its place, calculating
   $\sigma_s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}}$

Test 6. Test for specified mean, finite population, n > 30 and population variance known
   Test statistic: $Z = \dfrac{\bar{X} - \mu}{\dfrac{\sigma_p}{\sqrt{n}}\,\sqrt{\dfrac{N - n}{N - 1}}}$
   Notes: $\mu$ = population mean; $\bar{X}$ = sample mean; $\sigma_p$ = population S.D.;
   $N$ = population size.
   In case $\sigma_p$ is not known, we use $\sigma_s$ in its place, calculating
   $\sigma_s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}}$

Test 7. Test for difference in means, different populations, n > 30 and population variances known
   Test statistic: $Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_{p1}^2}{n_1} + \dfrac{\sigma_{p2}^2}{n_2}}}$
   Notes: $\bar{X}_1$ and $\bar{X}_2$ are the sample means for the first and second
   samples respectively; $n_1$ = first sample size; $n_2$ = second sample size.
   In case $\sigma_{p1}$ and $\sigma_{p2}$ are not known, we use $\sigma_{s1}$ and
   $\sigma_{s2}$ respectively in their places, calculating
   $\sigma_{s1} = \sqrt{\dfrac{\sum (X_{1i} - \bar{X}_1)^2}{n_1 - 1}}$ and
   $\sigma_{s2} = \sqrt{\dfrac{\sum (X_{2i} - \bar{X}_2)^2}{n_2 - 1}}$

Test 8. Test for difference in means, same population, n > 30 and population variance known
   Test statistic: $Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma_p^2 \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
   Notes: $\bar{X}_1$ and $\bar{X}_2$ are the sample means for the first and second
   samples respectively; $n_1$ = first sample size; $n_2$ = second sample size.
   In case $\sigma_p$ is not known, we use $\sigma_{s12}$ in its place, calculating
   $\sigma_{s12}^2 = \dfrac{n_1 (\sigma_1^2 + d_1^2) + n_2 (\sigma_2^2 + d_2^2)}{n_1 + n_2}$,
   where $d_1 = \bar{X}_1 - \bar{X}_{12}$, $d_2 = \bar{X}_2 - \bar{X}_{12}$ and
   $\bar{X}_{12} = \dfrac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$

9.5.2 Test procedure


Having calculated the appropriate z-statistic or t-statistic, we must identify
the rejection region with reference to the given level of significance in order
to reject or accept the null hypothesis. If the calculated statistic is in the
rejection region, we accept the alternative hypothesis against the null
hypothesis at that level of significance. Otherwise, we accept the null
hypothesis at the given level of significance. Table 9.4 depicts the rejection
region, normally denoted by ‘R’.
Table 9.4: Kinds of Tests

Kind of test        z-statistic              t-statistic
Two-tail test       R: |z| > |z_table|       R: |t| > |t_table|
Lower-tail test     R: z < z_table           R: t < t_table
Upper-tail test     R: z > z_table           R: t > t_table
(For a lower-tail test the tabulated value is negative.)

Figure 9.4 depicts the hypothesis testing procedure.

Fig. 9.4: Hypothesis Testing Procedure

9.5.3 How to identify the right statistics for the test


Figure 9.5 depicts the step by step procedure to identify the right statistics
for the test.

Fig. 9.5: Identification of Right Statistics for the Test

Self Assessment Questions


2. i) Null hypothesis states that there is a significant difference between
observed and hypothetical values. (True/False)
ii) 1% level of significance means we are ready to reject a true
hypothesis in 99% of cases. (True/False)
iii) If the null hypothesis is $H_0: \mu = \bar{X}$ or $H_0: p = p_s$ or
$H_0: \mu_1 = \mu_2$ or $H_0: p_1 = p_2$, then it is a two-tailed test. (True/False)
iv) If the calculated value of a statistic is not in the rejection region R,
then H0 is accepted. (True/False)
v) 1 - β is called power of the test. (True/False)
vi) If n1 = 300, n2 = 500, μ1 = 50, μ2 = 60, σ1 = 10, σ2 = 12 are results of
two samples taken from two cities A and B, then we test for the difference
between means under different populations. (True/False)
vii) If n < 30, then we do not apply z test unless, population S.D is
known. (True/False)

Solved problem 1
XYZ Press hypothesises that the average life of its latest web-offset press is
14,500 hours. They know the standard deviation of the press life is 2,100
hours. From a sample of 25 presses, the company finds a sample mean of
13,000 hours. At a significance level of 0.01, should the company conclude
that the average life of the presses is less than the hypothesised 14,500
hours?
Solution
The procedure is described here:
1. Null hypothesis $H_0: \mu = 14500$
Alternate hypothesis $H_1: \mu < 14500$ (one-tailed test)
2. Level of significance α = 0.01, so Ztab = -2.33 and R: z < -2.33
3. Test statistic:
$Z = \dfrac{\bar{X} - \mu}{\sigma_p / \sqrt{n}}$
4. Given μ = 14,500, $\bar{X}$ = 13,000, σp = 2,100, n = 25
Note: Although n < 30, the population standard deviation is given; therefore
a Z test is used.
$\dfrac{\sigma_p}{\sqrt{n}} = \dfrac{2100}{\sqrt{25}} = \dfrac{2100}{5} = 420$
$Z_{cal} = \dfrac{13000 - 14500}{420} = -3.57$
5. Conclusion: Since Zcal (-3.57) < Ztab (-2.33) and is in the rejection region,
H0 is rejected. In other words, we accept that the average life of the
press is significantly less than 14,500 hrs at 1% level of significance.
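
The arithmetic of solved problem 1 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name is an assumption made for illustration, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def lower_tailed_z_test(sample_mean, mu0, sigma, n, alpha):
    """One-sample, lower-tailed z-test of H0: mu = mu0 with known sigma.

    Returns (z_cal, z_tab, reject_h0).
    """
    standard_error = sigma / sqrt(n)
    z_cal = (sample_mean - mu0) / standard_error
    z_tab = NormalDist().inv_cdf(alpha)   # negative cutoff for the left tail
    return z_cal, z_tab, z_cal < z_tab


# Figures from solved problem 1
z_cal, z_tab, reject = lower_tailed_z_test(13_000, 14_500, 2_100, 25, 0.01)
print(f"z_cal = {z_cal:.2f}, z_tab = {z_tab:.2f}, reject H0: {reject}")
```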

Solved problem 2
Theatre owners in India know that a hit movie ran for an average of 84 days,
with a standard deviation of 10 days, in each city the movie was screened. A
particular movie distributor was interested in comparing the popularity of the
movie in his/her region with that of the population. The distributor randomly
chose 75 theatres in the region and found that the movie ran for an average
of 81.5 days.
1) State appropriate hypotheses for testing whether there was a significant
difference between theatres in the distributor’s region and the
population.
2) At 1% significance level, test this hypothesis.

Solution
The procedure is explained in the form of steps:
1. Null hypothesis $H_0: \mu = 84$
Alternate hypothesis $H_1: \mu \neq 84$ (two-tailed test)
2. Level of significance α = 0.01, so Ztab = 2.58 and R: |z| > 2.58
3. Test statistic:
$Z = \dfrac{\bar{X} - \mu}{\sigma_p / \sqrt{n}}$
4. Given μ = 84, $\bar{X}$ = 81.5, σp = 10, n = 75
Therefore, $\dfrac{\sigma_p}{\sqrt{n}} = \dfrac{10}{\sqrt{75}} = 1.1547$
$Z_{cal} = \dfrac{81.5 - 84}{1.1547} = -2.165$
5. Conclusion: Since |Zcal| (2.165) < Ztab (2.58), Zcal is not in the rejection
region, and H0 is accepted at 1% level of significance.

Solved Problem 3
A ketchup manufacturer is in the process of deciding whether to produce a
new extra spicy brand of ketchup. In a survey of 6000 households, the
company’s market research team found that, 355 households would buy the
extra spicy brand. A more extensive study carried out 2 years ago showed
that 5% of the households would buy the brand then. At 2% level of
significance, should the company conclude that there is an increased
interest in the extra spicy flavour?
Solution
The procedure is explained in the following steps:
1. Null hypothesis $H_0: p = 0.05$
Alternate hypothesis $H_1: p > 0.05$ (one-tailed test)
2. Level of significance α = 0.02, so Ztab = 2.05 and R: z > 2.05
3. Test statistic:
$Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$
4. Given p = 0.05, $\hat{p}$ = 355/6000 = 0.0592, n = 6000, q = 1 - p = 1 - 0.05 =
0.95
$Z_{cal} = \dfrac{0.0592 - 0.05}{\sqrt{\dfrac{0.05 \times 0.95}{6000}}} \approx 3.26$
5. Conclusion: Since Zcal (3.26) > Ztab (2.05), and is found in the rejection
region, H0 is rejected and it is accepted that there is an increase in the
proportion of the population having an interest in the new flavour.
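
The proportion test in solved problem 3 can be reproduced as follows. This is a sketch under the large-sample (normal approximation) assumption used in the text; the function name is illustrative. Computed without rounding the sample proportion, the statistic comes out near 3.26, well inside the rejection region.

```python
from math import sqrt
from statistics import NormalDist


def upper_tailed_proportion_test(successes, n, p0, alpha):
    """Large-sample, upper-tailed z-test of H0: p = p0.

    Returns (p_hat, z_cal, z_tab, reject_h0).
    """
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)   # uses the hypothesised p, not p_hat
    z_cal = (p_hat - p0) / standard_error
    z_tab = NormalDist().inv_cdf(1 - alpha)    # positive cutoff for the right tail
    return p_hat, z_cal, z_tab, z_cal > z_tab


# Figures from solved problem 3: 355 of 6000 households, p0 = 0.05, 2% level
p_hat, z_cal, z_tab, reject = upper_tailed_proportion_test(355, 6000, 0.05, 0.02)
print(f"p_hat = {p_hat:.4f}, z_cal = {z_cal:.2f}, z_tab = {z_tab:.2f}, reject H0: {reject}")
```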

Solved Problem 4
Microsoft estimated that out of 10,000 potential software buyers, 35% would
wait to purchase the new OS Windows Vista until an upgrade had been
released. After an advertising campaign to reassure the public was
released, Microsoft surveyed 3000 buyers and found 950 who were still
skeptical. At 5% level of significance, can the company conclude that the
proportion of skeptical people had decreased?
Solution
The procedure is explained in the following steps:
1. Null hypothesis $H_0: p = 0.35$
Alternate hypothesis $H_1: p < 0.35$ (one-tailed test)
2. Level of significance α = 0.05, so Ztab = -1.645 and R: z < -1.645
3. Test statistic:
$Z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}\,\sqrt{\dfrac{N - n}{N - 1}}}$
4. Given $\hat{p}$ = 950/3000 = 19/60 = 0.317, p = 0.35, q = 1 - p = 1 - 0.35 = 0.65,
N = 10,000, n = 3000
$\sqrt{\dfrac{pq}{n}}\,\sqrt{\dfrac{N - n}{N - 1}} = \sqrt{\dfrac{0.35 \times 0.65}{3000}}\,\sqrt{\dfrac{10000 - 3000}{10000 - 1}} = 0.0073$
$Z_{cal} = \dfrac{0.317 - 0.35}{0.0073} = -4.52$
5. Conclusion: Since Zcal (-4.52) < Ztab (-1.645) and is in the rejection
region, H0 is rejected. At 5% level of significance, we conclude that the
proportion of skeptical people has significantly decreased.
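
The effect of the finite population multiplier in solved problem 4 can be seen in a short sketch. The function name is an assumption for illustration. Because the code does not round the sample proportion or the standard error, it gives a statistic of about -4.57 rather than the -4.52 obtained above with rounded intermediate values; the conclusion is the same.

```python
from math import sqrt


def finite_population_proportion_z(successes, n, population_size, p0):
    """z statistic for H0: p = p0 when sampling without replacement
    from a finite population of the given size.

    Returns (p_hat, standard_error, z_cal).
    """
    p_hat = successes / n
    multiplier = sqrt((population_size - n) / (population_size - 1))
    standard_error = sqrt(p0 * (1 - p0) / n) * multiplier
    return p_hat, standard_error, (p_hat - p0) / standard_error


# Figures from solved problem 4: 950 of 3000 surveyed, N = 10,000, p0 = 0.35
p_hat, se, z_cal = finite_population_proportion_z(950, 3000, 10_000, 0.35)
Z_TAB = -1.645  # lower-tail cutoff at the 5% level, as used in the text
print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}, z_cal = {z_cal:.2f}, reject H0: {z_cal < Z_TAB}")
```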

Solved problem 5
A machine is designed to pack 200ml of a medicine with a standard
deviation of 5ml. A sample of 100 bottles when measured had a mean
content of 201.3ml. Test whether the machine is functioning properly (use
5% level of significance).

Solution
The procedure is explained in the following steps:
1. Null hypothesis $H_0: \mu = 200$
Alternate hypothesis $H_1: \mu \neq 200$ (two-tailed test)
2. Level of significance α = 0.05, so Ztab = 1.96 and R: |Z| > 1.96.
3. Test statistic:
$Z = \dfrac{\bar{X} - \mu}{\sigma_p / \sqrt{n}}$
4. Given μ = 200, $\bar{X}$ = 201.3, σp = 5, n = 100
$\dfrac{\sigma_p}{\sqrt{n}} = \dfrac{5}{\sqrt{100}} = 0.5$
$Z_{cal} = \dfrac{201.3 - 200}{0.5} = \dfrac{1.3}{0.5} = 2.60$
5. Conclusion: Since Zcal (2.60) > Ztab (1.96) and Zcal is in the rejection
region, H0 is rejected. Hence at 5% level of significance, we reject the null
hypothesis and conclude that the machine is not functioning properly.

9.6 Testing of Hypothesis in the Case of Small Samples


Till now, we have studied the testing of hypotheses when the sample size
is large, using the normal distribution. However, if the sample size is small,
then the distributions of the statistics are far from normal and hence the
normal test cannot be applied. Hence, to deal with small samples, tests of
significance known as exact sample tests have been developed. For all
practical purposes the sample is termed small if n ≤ 30.
The fundamental assumptions in all exact sample tests are:
• The parent population from which the sample is drawn is normally
distributed
• The sample or samples are drawn at random
• The observations are independent of each other

It should be noted that the methods and theory of small samples are
applicable to large samples, but the reverse is not true.

9.7 ‘t’ Distribution


The ‘t’ distribution was developed by W. S. Gosset, who published under the
pen name ‘Student’. Therefore, it is known as Student’s ‘t’ distribution. The
properties of the ‘t’ distribution are:
1. The ‘t’ distribution is a continuous probability distribution.
2. The ‘t’ statistic is defined as:
$t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$, where $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
3. The probability density function is given by:
$f(t) = C \left(1 + \dfrac{t^2}{\nu}\right)^{-(\nu + 1)/2}$
where,
C = constant required to make the area under the curve equal to unity.
ν = n - 1, the degrees of freedom.
4. The value of ‘t’ ranges from -∞ to +∞.
5. ‘ν’ is called the parameter of the distribution.
6. It is symmetrical about the mean.
7. Its mean is zero.
8. The variance of the distribution is greater than one.
9. It has larger areas at the tails compared to the normal distribution and
a lower height at the mean.
10. It tends to a normal distribution as n → ∞.
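
Several of these properties can be checked numerically. The sketch below is illustrative only: the text does not give a closed form for the constant C, so the code uses the standard expression C = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)), and the function names, the integration range and the step count are assumptions made for this example.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t, v):
    """Density of Student's t with v degrees of freedom."""
    # Normalising constant C (standard closed form, not stated in the text)
    c = gamma((v + 1) / 2) / (sqrt(v * pi) * gamma(v / 2))
    return c * (1 + t * t / v) ** (-(v + 1) / 2)


def area_under_curve(v, lower=-60.0, upper=60.0, steps=12_000):
    """Trapezoidal approximation of the area under t_pdf on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (t_pdf(lower, v) + t_pdf(upper, v))
    for i in range(1, steps):
        total += t_pdf(lower + i * width, v)
    return total * width


v = 9  # degrees of freedom for a sample of size 10
normal = NormalDist()
print("area under curve   :", round(area_under_curve(v), 4))
print("height at the mean :", t_pdf(0, v), "vs normal", normal.pdf(0))
print("height at t = 3    :", t_pdf(3, v), "vs normal", normal.pdf(3))
```

With ν = 9 the curve is lower than the normal curve at the mean and higher in the tails, in line with property 9.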
9.7.1 Uses of ‘t’ test
The ‘t’ test is used:
• To test a specified value
• To test the differences between values (independent samples)
• As a paired ‘t’-test (dependent samples)
• To construct confidence intervals for the estimates
Table 9.5 depicts the description of tests in the case of small samples where
the population standard deviation is not known.
Table 9.5: Description of Test in the Case of Small Samples

Test 1. Test for specified value, infinite population, with d.f. = n - 1, population variance unknown
   Test statistic: $t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$
   Notes: $\bar{X}$ is the sample mean; $\mu$ = hypothesised value of population
   mean; $s = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}}$

Test 2. Test for specified value, finite population, with d.f. = n - 1, population variance unknown
   Test statistic: $t = \dfrac{\bar{X} - \mu}{\dfrac{s}{\sqrt{n}}\,\sqrt{\dfrac{N - n}{N - 1}}}$
   Notes: $N$ = population size

Test 3. Test between values, independent samples, with d.f. = $n_1 + n_2 - 2$, population variances not known but assumed to be equal
   Test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
   or, equivalently,
   $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
   Notes: $\bar{X}_1$ = first sample mean; $\bar{X}_2$ = second sample mean;
   $n_1$ and $n_2$ are the sizes of the first and second samples respectively

Test 4. Paired ‘t’ test (dependent samples), with d.f. = n - 1
   Test statistic: $t = \dfrac{\bar{D}}{\sigma_{diff} / \sqrt{n}}$, where
   $\sigma_{diff} = \sqrt{\dfrac{\sum D_i^2 - n \bar{D}^2}{n - 1}}$
   Notes: $\bar{D}$ = mean of differences; $n$ = sample size

Solved problem 6
A random sample of 10 bags of fertilisers is found to have the following
weight (kg):
45, 49, 50, 49, 44, 52, 48, 45, 46, 45
Test at 5% level of significance whether the average packing weight can be
taken as 50 kg.
Solution: Table 9.6 depicts the frequency table for solved
problem 6.
Table 9.6: Frequency Table
$X_i$     $X_i - \bar{X} = X_i - 47.3$     $(X_i - \bar{X})^2 = (X_i - 47.3)^2$
45        -2.3                             5.29
49         1.7                             2.89
50         2.7                             7.29
49         1.7                             2.89
44        -3.3                            10.89
52         4.7                            22.09
48         0.7                             0.49
45        -2.3                             5.29
46        -1.3                             1.69
45        -2.3                             5.29
$\sum X_i = 473$                           $\sum (X_i - \bar{X})^2 = 64.1$

Sample mean is given by the formula
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{473}{10} = 47.3$
The sample variance is given by,
$s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \dfrac{64.1}{10 - 1} = 7.12$
Therefore, s = 2.6687

The steps are described as follows:
1. Null hypothesis $H_0: \mu = 50$
Alternate hypothesis $H_1: \mu \neq 50$ (two-tailed test)
2. Level of significance 5% and degrees of freedom (d.f.) = 9
ttab = 2.262 and the rejection region is R: |t| > 2.262
3. Test statistic:
$t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$
Given $\bar{X}$ = 47.3, μ = 50, s = 2.6687, $\dfrac{s}{\sqrt{n}} = \dfrac{2.6687}{\sqrt{10}} = 0.8439$
$t_{cal} = \dfrac{47.3 - 50.0}{0.8439} = -3.20$
4. Conclusion: Since |tcal| (3.20) > ttab (2.262), tcal is in the rejection
region and H0 is rejected. Therefore we conclude, at 5% level of significance,
that the mean weight of fertiliser bags is not 50 kg.
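
The calculation in solved problem 6 can be reproduced directly from the ten weights. This is a minimal sketch using the standard library only; the function name is illustrative, and the tabulated value 2.262 is taken from the text rather than computed, since the standard library has no ‘t’ table.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return (sample mean, sample S.D., t statistic) for H0: mu = mu0."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                        # divisor n - 1, as in the text
    t_cal = (x_bar - mu0) / (s / sqrt(n))
    return x_bar, s, t_cal


weights = [45, 49, 50, 49, 44, 52, 48, 45, 46, 45]  # data of solved problem 6
x_bar, s, t_cal = one_sample_t(weights, 50)

T_TAB = 2.262  # two-tailed 5% value for 9 d.f., as quoted in the text
print(f"mean = {x_bar:.1f}, s = {s:.4f}, t_cal = {t_cal:.2f}")
print("reject H0:", abs(t_cal) > T_TAB)
```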
Solved problem 7
Suppose that, in the above problem, the random sample of 10 was selected
out of 1000 bags packed in a day, and the readings were as given in
solved problem 6. Test whether the population average weight is 50 kg.
Solution
The steps are described as follows:
1. Null hypothesis $H_0: \mu = 50$
Alternate hypothesis $H_1: \mu \neq 50$ (two-tailed test)
2. Level of significance 5% and degrees of freedom (d.f.) = 9
ttab = 2.262 and the rejection region is R: |t| > 2.262.
3. Test statistic:
$t = \dfrac{\bar{X} - \mu}{\dfrac{s}{\sqrt{n}}\,\sqrt{\dfrac{N - n}{N - 1}}}$
Given n = 10, N = 1000, $\bar{X}$ = 47.3, $\dfrac{s}{\sqrt{n}}$ = 0.8439
$t_{cal} = \dfrac{47.3 - 50}{0.8439\,\sqrt{\dfrac{1000 - 10}{1000 - 1}}} = -3.2138$
4. Conclusion: Since |tcal| (3.2138) > ttab (2.262), tcal is in the rejection
region and thus H0 is rejected.

Solved Problem 8
Average tensile strength of nine samples of paper is found to be 15.8 units
and variance is 10.3. Can we say at 1% level of significance that it is a
random sample drawn from a population whose mean tensile strength is
17.5?
Solution
The steps are described as follows:
1. Null hypothesis $H_0: \mu = 17.5$
Alternate hypothesis $H_1: \mu \neq 17.5$ (two-tailed test)
2. Level of significance 1% and degrees of freedom (d.f.) = n - 1 = 9 - 1 = 8
ttab = 3.36 and R: |t| > 3.36
3. Test statistic:
$t = \dfrac{\bar{X} - \mu}{s / \sqrt{n}}$
Given $\bar{X}$ = 15.8, μ = 17.5, $s^2$ = 10.3, n = 9
$\dfrac{s}{\sqrt{n}} = \dfrac{3.2094}{\sqrt{9}} = 1.0698$
$t_{cal} = \dfrac{15.8 - 17.5}{1.0698} = -1.5891$
4. Conclusion: Since |tcal| (1.5891) < ttab (3.36), H0 is accepted.
Therefore, it can be considered as a random sample from this population at
1% level of significance.
Solved Problem 9
A sales manager wants to know whether a special promotional campaign is
a success. Table 9.7 depicts the data. Test at 5% level of significance,
whether it is a success?
Table 9.7: Sales Data Before and After the Campaign
Retail Outlets 1 2 3 4 5 6
Sales before campaign 50 48 31 42 28 53
Sales after campaign 56 55 30 45 29 58

Solution
Table 9.7a depicts the frequency table calculated for the sales data before
and after the campaign.
Table 9.7a: Frequency Table for the Sales Data Before and After the Campaign
Before ($X_i$)    After ($Y_i$)    $D_i = Y_i - X_i$ (After - Before)    $D_i^2$
50                56                6                                    36
48                55                7                                    49
31                30               -1                                     1
42                45                3                                     9
28                29                1                                     1
53                58                5                                    25
                                    $\sum D_i = 21$                      $\sum D_i^2 = 121$

Mean of differences: $\bar{D} = \dfrac{\sum D_i}{n} = \dfrac{21}{6} = 3.5$

$\sigma_{diff} = \sqrt{\dfrac{\sum D_i^2 - n \bar{D}^2}{n - 1}} = \sqrt{\dfrac{121 - 6 \times (3.5)^2}{6 - 1}} = 3.08$

The steps are described as follows:
1. Null hypothesis $H_0: \mu_D = 0$
Alternate hypothesis $H_1: \mu_D > 0$ (one-tailed test)
2. Level of significance 5% and d.f. = 5
ttab = 2.02 and R: t > 2.02.
3. Test statistic:
$t_{cal} = \dfrac{\bar{D}}{\sigma_{diff} / \sqrt{n}} = \dfrac{3.5}{3.08 / \sqrt{6}} = 2.78$
4. Conclusion: Since tcal (2.78) > ttab (2.02) and is in the rejection region,
H0 is rejected. So, we accept the alternative hypothesis that there is a
significant increase in sales after the campaign.
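
The paired ‘t’ test of solved problem 9 can be checked with the sketch below. It uses only the standard library; the function name is illustrative, and the tabulated value 2.02 is taken from the text rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (mean difference, S.D. of differences, t statistic)
    for the paired differences D_i = after_i - before_i."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    sd_diff = stdev(diffs)                 # divisor n - 1
    return d_bar, sd_diff, d_bar / (sd_diff / sqrt(n))


before = [50, 48, 31, 42, 28, 53]  # sales before the campaign (Table 9.7)
after = [56, 55, 30, 45, 29, 58]   # sales after the campaign (Table 9.7)
d_bar, sd_diff, t_cal = paired_t(before, after)

T_TAB = 2.02  # one-tailed 5% value for 5 d.f., as quoted in the text
print(f"D_bar = {d_bar}, sd_diff = {sd_diff:.2f}, t_cal = {t_cal:.2f}")
print("reject H0:", t_cal > T_TAB)
```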

Self Assessment Questions


3. i) ‘t’ distribution is __________ probability distribution.
ii) ‘t’ distribution’s parameter is __________.
iii) The mean and variance of the ‘t’ distribution are ________ and
________.

Activity:
1. A random sample of 200 tins of vanaspathi has a mean weight 4.97
kgs and a standard deviation of 0.2kgs. Test at 1% level of
significance, that the tins have 5 kgs. vanaspathi
2. A random sample of 100 rods drawn from a lot of rods has a mean
length 32.7cms. and a standard deviation of 1.3cms. Can it be
concluded that the lot has a mean of 32 cms?
Solution
1. $H_0: \mu = 5$ kg
$H_1: \mu \neq 5$ kg
Level of significance α = 0.01, so Ztab = 2.58 and R: |Z| > 2.58
Test statistic:
$Z = \dfrac{\bar{X} - \mu}{\sigma_s / \sqrt{n}}$
Given μ = 5, $\bar{X}$ = 4.97, σs = 0.2, n = 200
$Z_{cal} = \dfrac{4.97 - 5}{0.2 / \sqrt{200}} = -2.12$
Conclusion: Since |Zcal| (2.12) < Ztab (2.58), we accept H0 at 1% level of
significance and conclude that the tins have 5 kgs of vanaspathi.
2. $H_0: \mu = 32$
$H_1: \mu \neq 32$
Level of significance α = 0.05, so Ztab = 1.96 and R: |Z| > 1.96
Test statistic:
$Z = \dfrac{\bar{X} - \mu}{\sigma_s / \sqrt{n}}$
Given μ = 32, $\bar{X}$ = 32.7, σs = 1.3, n = 100
$Z_{cal} = \dfrac{32.7 - 32}{1.3 / \sqrt{100}} = 5.38$
Conclusion: Since Zcal (5.38) > Ztab (1.96), we reject H0 at 5% level of
significance and conclude that the lot does not have a mean of 32 cms.

9.8 Summary
Let us recapitulate the important concepts discussed in this unit:
• A hypothesis is an assumption about a population parameter; hypothesis
testing uses sample evidence to judge whether that assumed value may or
may not lie in the confidence interval derived from the sample.
• In hypothesis testing, we must state the assumed or hypothesised value
of the population parameter before we begin sampling. The assumption
we wish to test is called the null hypothesis and is symbolised by ‘H0’.
• If our sample results fail to support the null hypothesis, we must
conclude that something else is true. Whenever we reject the null
hypothesis, the conclusion we do accept is called the alternative
hypothesis and is symbolised as ‘H1’.
• If the null hypothesis is true, and the test result makes us accept it, then
we have made a right decision.
• If the null hypothesis is true, and the test result makes us reject it, then
we have made a wrong decision (Type I error). Its probability is also known
as the producer’s risk, denoted by α.
• If the null hypothesis is false, and the test result makes us accept it, then
we have made a wrong decision (Type II error). Its probability is known as
the consumer’s risk, denoted by β. 1 - β is called the power of the test.
• If the null hypothesis is false, and the test result makes us reject it, then
we have made a right decision.
• ‘t’ tests can be used for small samples (n ≤ 30) whose population standard
deviations are not known.

9.9 Glossary
Level of significance: The probability of rejecting the null hypothesis
when it is true (the probability of a Type I error), fixed before the test is
carried out, usually at 0.05 (5%) or 0.01 (1%). It is distinct from the
observed significance level, or P-value, which is the chance of getting a
sample like the one being analysed if the null hypothesis were true. If the
P-value is less than the chosen level of significance, getting such a sample
was highly unlikely under the null hypothesis, and the null hypothesis is
rejected in favour of the alternative.
Null distribution: The distribution of the test statistic assuming the null
hypothesis is true.

One-tailed test: A test in which the alternative hypothesis specifies that the
population parameter is strictly greater than, or strictly less than, a
specified value. The parameter is on "one side" of the null hypothesis value,
so H1 contains > or <.
P-value: The value that indicates how unusual a computed test statistic is
compared with what would be expected under the null hypothesis. A small
value indicates that the null hypothesis should be rejected at any
significance level above the calculated value. For example, if the P value
equals 0.0246, we would reject the null hypothesis at the 5% significance
level, but would not reject it at the 1% significance level.
Two-tailed test: The rejection region in a two-tailed test is split between the
two tails of the distribution.
Type I error: Rejecting a true null hypothesis. The probability of a type I
error is indicated by alpha (α).
Type II error: Not rejecting a false null hypothesis. The probability of a type
II error is indicated by beta (β).
Z test for a population mean: Tests a hypothesis pertaining to the
population mean by using a z-test statistic to evaluate the magnitude of
the difference between the sample mean and the hypothesised population mean.
Z test for a population proportion: Tests a hypothesis pertaining to the
population proportion by using a z-test statistic to evaluate the magnitude of
the difference between sample proportion and hypothesised population
proportion.

9.10 Terminal Questions


1. Twenty households out of 1000 were using Brand ‘A’ toothpaste. The
company increased the price of the brand. In a survey, they found that
only 12 households out of 1000 are using it now. Can we conclude at
5% level of significance that proportion of users has decreased?
2. A drill drills holes with standard deviation of depth 0.03cms. It is adjusted
to drill holes of depth 5.5cm. For 50 holes drilled, the mean depth is
5.503cm. Test at 5% level of significance whether the adjustment is
correct.

3. Out of 80 batteries produced by process I, three were found to be
defective. In another sample of 130 produced by process II, two were
found to be defective. Test whether the proportion of defectives in the two
processes differs, using the 1% level of significance.
4. Table 9.8 depicts the data related to the mean weight of a product from
two plants. Test whether there is a significant difference between the
means of the plants.
Table 9.8: Mean Weight of a Product
            Plant A   Plant B
Size        300       200
Mean        75.4      74.3
Variance    65.6      57.8
5. A machine is set to produce a particular characteristic with mean 21.3
and S.D. 0.4. A random sample of 625 observations has a mean of 21.33.
Test whether the sample mean differs significantly from the population mean.
6. Out of 10,000 pumpkins harvested, 1000 were randomly selected. 8%
were found to be rotten. The grower claims that only 7% are rotten. Is
this claim tenable? Test at the 5% level of significance.
7. A group of seven-week-old chickens reared on a high protein diet
weigh 12, 15, 11, 16, 14, 14 and 16 ounces. In another group, 5 chickens
received a low protein diet and weigh 8, 10, 14, 10 and 13 ounces. Test
whether there is a significant increase in weight due to the high protein
diet, using the 5% level of significance.
8. Table 9.9 depicts the strength test results of two yarns. Is there a
significant difference in the means? Test at the 5% level of significance.

Table 9.9: Strength Results of the Two Yarns
          Sample Size   Mean   Sample Variance
Type A    4             52     42
Type B    9             42     56

9. Table 9.10 depicts the results related to the memory capacity of 10
students before and after training. Test at the 5% level of significance
whether the training is effective.

Table 9.10: Memory Capacity of 10 Students

Roll No           1   2    3    4   5   6   7    8   9   10
Before Training   1   14   11   8   7   1   3    0   5   6
After Training    1   16   10   7   5   1   10   2   3   8

9.11 Answers

Self Assessment Questions


1. i) Normal distribution
ii) Normal distribution
iii) ‘t’ distribution
iv) Normal distribution
v) Normal distribution
2. i) False
ii) False
iii) True
iv) True
v) True
vi) True
vii) True
3. i) Continuous
ii) Degrees of freedom
iii) Zero, greater than one

Terminal Questions
1. Zcal = 1.9457, H0 accepted
2. Zcal = 0.71, H0 accepted
3. Zcal = 0.50, H0 accepted
4. Zcal = 1.54, H0 accepted
5. Zcal = (21.33 – 21.3) / (0.4 / √625) = 1.875, H0 accepted at the 5% level
6. Zcal = 1.30, H0 accepted
7. tcal = 2.397, H0 is rejected
8. tcal = 2.21, H0 is rejected
9. tcal = 1.365, H0 is rejected

9.12 Case Study


Automatic Teller Machines
Situation
Banks in Sri Lanka are closed for 3.5 days from Friday afternoon to Monday
morning. Banks therefore need a reasonable estimate of how much cash
should be available in their ATMs. BNP-Colombo estimates that, for its
branches in the southeast of Sri Lanka with a single ATM, the demand from
customers over this 3.5-day period is $2,200, with a population standard
deviation of $205. A random sample of the withdrawals from 36 of its
branches indicates a sample average withdrawal of $2,235.
Discussion Questions:
i) Using the concept of critical values, at the 5% significance level, does
this data indicate that the mean withdrawal from the machine is
different from $2,200?
ii) Re-examine question (i) using the p-value approach. Are your
conclusions the same? Explain your conclusions.
iii) What are the confidence limits at 5% significance? How do these
values corroborate your answers to questions (i) and (ii)?
iv) Using the concept of critical values, at the 1% significance level, does
this data indicate that the mean withdrawal from the machine is
different from $2,200?
v) Re-examine question (iv) using the p-value approach. Are your
conclusions the same? Explain your conclusions.
vi) What are the confidence limits at 1% significance? How do these
values corroborate your answers to questions (iv) and (v)?
vii) Here we have used the test for a difference. Why is the bank
interested in the difference rather than in a one-tail test, either left or
right hand?


Unit 10 Chi–Square Test


Structure:
10.1 Introduction
Objectives
Relevance
10.2 Chi-Square test
Characteristics of Chi-Square test
Steps in solving problems related to Chi-Square test
Conditions for applying the Chi-Square test
Restrictions in applying Chi-Square test
Practical applications of Chi-Square test
Uses of Chi-Square test
Degrees of freedom
Levels of significance
Interpretation of Chi-Square values
10.3 Applications of Chi-Square Test
Tests for independence of attributes
Test of goodness of fit
Test for comparing variance
10.4 Summary
10.5 Glossary
10.6 Terminal Questions
10.7 Answers
10.8 Case Study

10.1 Introduction
In the previous unit, Testing of Hypothesis, we discussed how to test
hypotheses concerning parameters such as the mean and the proportion, using
data from either one or two samples. We used one-sample tests to determine
whether a mean or a proportion was significantly different from a
hypothesised value. In the two-sample tests, we examined the difference
between either two means or two proportions, and we tried to learn whether
this difference was significant.
Suppose, however, that we have proportions from five populations instead of
only two. In such cases, the methods for comparing proportions described for
two-sample hypothesis tests do not apply; we must use the Chi-Square test
(χ² test). In this unit, we will discuss the Chi-Square tests, which enable
us to test whether more than two population proportions can be considered
equal. The Chi-Square test is a non-parametric test which can be applied to
categorical or qualitative data. It can be applied when we have few or no
assumptions about the population parameter.
Actually, Chi-Square tests allow us to do a lot more than just test for the
equality of several proportions. If we classify a population into several
categories with respect to two attributes (such as age and job performance),
we can then use a Chi-Square test to determine whether the two attributes
are independent of each other. So, Chi-Square tests can be applied on a
contingency table.
Objectives:
After studying this unit, you should be able to:
• describe the non-parametric method of testing hypotheses
• describe the characteristics of the Chi-Square test
• identify the conditions required for applying the Chi-Square test for a
given population distribution
• recognise the applications of the Chi-Square test
• describe the steps in solving problems related to the Chi-Square test
10.1.1 Relevance
Case-let
Women still earn less than men
On 27 February 2006 the Women and Work Commission (WWC) published
its report on the causes of the “gender pay gap”, or the difference between
men’s and women’s hourly pay. According to the report, British women
working full-time currently earn 17% less per hour than men. In February the
European Commission also brought out its own report on the pay gap across
the European Union. Its findings were similar in that, on an hourly basis,
women earn 15% less than men for the same work.
In the United States, the difference in median pay between men and women
is around 20%. According to the WWC report, the gender pay gap opens
early. Boys and girls study different subjects in school, and boys’ subjects
lead to more lucrative careers. They then work in different sorts of jobs. As a
result, average hourly pay for a woman at the start of her working life is only
91% of a man’s, even though nowadays she is probably better qualified.
How do we compile this type of statistical information? We can use Chi-
Square testing for more than one type of population.
(Source: Derek L Waller Published by Elsevier Inc Ed 2008).

10.2 Chi-Square test


The Chi-Square test is one of the most commonly used non-parametric tests
in statistical work. The Greek letter χ² is used to denote this test. χ²
describes the magnitude of the discrepancy between the observed and the
expected frequencies. The value of χ² is calculated as:

\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \dots + \frac{(O_n - E_n)^2}{E_n} \]

where O1, O2, O3, …, On are the observed frequencies and E1, E2, E3, …, En
are the corresponding expected or theoretical frequencies.
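The formula translates directly into a few lines of code. The sketch below is illustrative only: the function name `chi_square_statistic` is an assumption, and the figures are the absenteeism counts from Solved Problem 4 later in this unit, where every expected frequency is 60.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [66, 57, 54, 48, 75]   # absentees, Monday to Friday
expected = [60, 60, 60, 60, 60]   # uniform absenteeism over the week

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 4))
```

Each term is non-negative, so the statistic is zero only when every observed frequency equals its expected frequency, and it grows as the discrepancy grows.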

10.2.1 Characteristics of Chi-Square test


The following are the characteristics of a Chi-Square test (χ² test):
• The χ² test is based on frequencies and not on parameters
• It is a non-parametric test in which no rigid assumptions about the
parameters of the population are required
• The additive property also holds for the χ² test
• The χ² test is useful for testing hypotheses about the independence of
attributes
• The χ² test can be used in complex contingency tables
• The χ² test is very widely used for research purposes in behavioural and
social sciences, including business research
• While testing whether the observed frequencies of certain outcomes fit
the expected frequencies defined by a theoretical distribution, the χ²
value defined here follows the χ² distribution:

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

where, ‘Oi’ is the observed frequency and ‘Ei’ is the expected frequency.

Key Statistic
The observed frequencies are the frequencies obtained from the
observation, which are sample frequencies. The expected frequencies
are the calculated frequencies.
10.2.2 Steps in solving problems related to Chi-Square test
Figure 10.1 depicts the steps required for solving the problems related to
Chi-Square test.

Fig. 10.1: Procedural Steps in Solving Problems on Chi-Square Test

10.2.3 Conditions for applying the Chi-Square test


The following are the conditions for using the Chi-Square test:
1. The frequencies used in the Chi-Square test must be absolute and not
relative.
2. The total number of observations collected for this test must be large.
3. The observations which make up the sample of this test must be
independent of each other.
4. As the χ² test is based wholly on sample data, no assumption is made
concerning the population distribution. In other words, it is a
non-parametric test.
5. The χ² test is wholly dependent on the degrees of freedom. As the degrees
of freedom increase, the Chi-Square distribution curve becomes more
symmetrical.
6. The expected frequency of any item or cell must not be less than 5. If it
is, the frequencies of adjacent items or cells should be pooled together so
that the pooled frequency is not less than 5.
7. The data should be expressed in original units for convenience of
comparison, and the given distribution should not be replaced by relative
frequencies or proportions.
8. This test is used only for drawing inferences through tests of
hypothesis, so it cannot be used for estimating a parameter value.
10.2.4 Restrictions in applying Chi-Square test
The sample observations should be independently and normally distributed.
For this, either the parent population should be sufficiently large (for
example, greater than 50), or sampling should be done with replacement.
Constraints imposed upon the observations must be of a linear character, for
example:

\[ \sum O_i = \sum E_i \]

The χ² distribution is essentially a continuous distribution; however, its
character of continuity is maintained only when the individual frequencies of
the variate values remain greater than or equal to 5. So, in applying the χ²
test to the goodness of fit or to the dependency of variables in a
contingency table, the cell frequency should not be less than 5. In
practical problems we can combine a few values of small frequencies into
one to get a pooled frequency of at least 5.
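Pooling adjacent classes can be sketched as follows. This is one possible scheme under stated assumptions, not a prescribed procedure: classes are merged from left to right until the pooled expected frequency reaches 5, and any small remainder at the end is merged into the last pooled class. The function name `pool_small_cells` and the frequencies are hypothetical.

```python
def pool_small_cells(observed, expected, minimum=5):
    """Merge adjacent classes until each pooled expected frequency >= minimum."""
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0, 0.0
    if acc_exp > 0:
        # A small tail is left over: merge it into the last pooled class.
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


# Hypothetical frequencies: the first class and the last two are too small.
observed = [2, 6, 20, 30, 25, 12, 4, 1]
expected = [3.0, 9.0, 18.0, 28.0, 24.0, 12.0, 4.5, 1.5]

print(pool_small_cells(observed, expected))
```

After pooling, the degrees of freedom are counted from the number of classes that remain, not from the original number of classes.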

Key Statistic
The results of Chi-Square test cannot be accurate if the cell frequencies
in a contingency table are less than 5.
10.2.5 Practical applications of Chi-Square test
In inferential statistics, the Chi-Square test can also be applied to
discrete distributions. In using the Chi-Square test, we need no assumptions
regarding the shape of sampling distributions. The applications of the
Chi-Square test include testing:
• the significance of sample variances
• the goodness of fit of a theoretical distribution
• the independence of attributes in a contingency table, or whether the
observed results are consistent with the expected segregations in breeding
experiments in genetics
The first is a parametric test and the other two are non-parametric tests.
10.2.6 Uses of Chi-Square test
The χ² test is used broadly to:
• Test goodness of fit for one-way classification or for one variable only
• Test independence or interaction for more than one row or column in the
form of a contingency table concerning several attributes
• Test population variance ‘σ²’ through confidence intervals suggested by
the χ² test
10.2.7 Degrees of freedom
The number of degrees of freedom for ‘n’ observations is ‘n – k’ and is
usually denoted by ‘ν’, where ‘k’ is the number of independent linear
constraints imposed upon them.

Example 1
Suppose we are asked to write any four numbers. We are free to choose all
four. If a restriction is imposed that the sum of these numbers should be
50, then only three numbers can be chosen freely, because the fourth is
fixed by the sum. The degrees of freedom would now be 3.

If χ² is defined as the sum of the squares of ‘n’ independent standardised
normal variates, and ‘k’ independent linear relations are imposed upon them
(such as the estimation of some population parameter from the sample), then
‘n’ is replaced by ‘n – k’. If the deviations are measured from the sample
mean instead of the population mean, then ‘n’ is replaced by n – 1 = ν. This
is because one linear constraint has been imposed.
Key Statistic
The Chi-Square distribution has only one parameter, that is, the degrees
of freedom.

10.2.8 Levels of significance


Tables have been prepared for the values of ‘P’, the probability of
getting a value of χ² ≥ χ0², where χ0² is an observed value. From these
tables, we can find the value of ‘P’ corresponding to an observed value of χ²
and then proceed to test whether the difference between observed and
theoretical frequencies is significant or not. The smaller the value of ‘P’,
the greater the divergence between fact and theory, so small values lead us
to suspect the hypothesis. Not only do small values of ‘P’ lead us to suspect
the hypothesis, but a value of ‘P’ very near to unity may also lead to a
similar result. Thus, if P = 1, then χ² = 0, showing that there is perfect
agreement between fact and theory, and this is a very improbable event.
There are two conventional levels of significance:
• If P < 0.05, we say that the observed value of χ² is significant at the
5% level of significance.
• Similarly, if P < 0.01, the value is significant at the 1% level.

10.2.9 Interpretation of Chi-Square values


After ascertaining the χ² value, we compare it with the value in the χ²
table. The table comprises columns headed with symbols such as 0.05 for the
5% level of significance and 0.01 for the 1% level of significance. The
left-hand side indicates the degrees of freedom. If the calculated value of
χ² falls in the acceptance region, the null hypothesis ‘H0’ is accepted;
otherwise it is rejected. Figure 10.2 depicts the acceptance and
rejection regions of Chi-Square distribution.
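Instead of reading ‘P’ from a printed table, the right-tail probability can be computed directly. The sketch below is not the method of the text: it uses only the Python standard library and a known recurrence for the regularised upper incomplete gamma function Q(a, x), starting from the closed forms for 1 and 2 degrees of freedom. The function name `chi_square_p_value` is an assumption.

```python
import math


def chi_square_p_value(chi_sq, df):
    """P(chi-square with df degrees of freedom >= chi_sq), integer df >= 1."""
    if df < 1 or chi_sq < 0:
        raise ValueError("df must be >= 1 and chi_sq must be >= 0")
    x = chi_sq / 2.0
    if df % 2 == 0:
        a = 1.0
        p = math.exp(-x)                 # closed form for df = 2
    else:
        a = 0.5
        p = math.erfc(math.sqrt(x))      # closed form for df = 1
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < df / 2.0:
        p += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return p


# The table values used in this unit should give P close to 0.05.
for df, table_value in [(1, 3.84), (3, 7.81), (4, 9.49)]:
    print(df, table_value, round(chi_square_p_value(table_value, df), 3))
```

A calculated χ² is significant at the 5% level when the computed ‘P’ is below 0.05, which is the same decision as comparing the calculated value with the table value. For example, χ² = 3.6459 with 4 degrees of freedom gives P of about 0.46, far from significant.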

Fig. 10.2: Acceptance and Rejection Regions under Chi-Square Distribution



Key Statistic
The Chi-Square curve will be on the positive side of x-axis because the
Chi-Square values are always positive.

10.3 Applications of Chi-Square test


10.3.1 Tests for independence of attributes
In the test for independence, the null hypothesis is that the row and column
variables are independent of each other. We have studied earlier, that the
hypothesis testing is done under the assumption that the null hypothesis is
true.
The following are the properties of the test for independence:
• The data are the observed frequencies
• The data are arranged in the form of a contingency table
• The degrees of freedom ‘ν’ can be calculated as:
\[ \nu = (\text{Number of rows} - 1) \times (\text{Number of columns} - 1) \]
where ‘ν’ is the degrees of freedom
• The test for independence has a Chi-Square distribution and is always a
right-tail test.
• The expected value is computed by taking the row total, multiplying it
by the column total and dividing by the grand total. That is:
\[ E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}} \]
• The test statistic value does not change if the order of the rows or
columns is changed. The value also does not change if the rows and
columns are interchanged.
Solved Problem 1
Calculate the degrees of freedom for a contingency table with three rows
and two columns.
Solution – The degrees of freedom, denoted by ‘ν’, are calculated as:

\[ \nu = (\text{Number of rows} - 1) \times (\text{Number of columns} - 1) = (3 - 1) \times (2 - 1) = 2 \]

Hence, a contingency table with three rows and two columns has two
degrees of freedom.
Solved Problem 2
Table 10.1 depicts the production in three shifts and the number of defective
goods that turned out in three weeks. Test at the 5% level of significance
whether weeks and shifts are independent.
Table 10.1: Production of Defective Goods in Three Shifts
Shift    1st Week   2nd Week   3rd Week   Total
I        15         5          20         40
II       20         10         20         50
III      25         15         20         60
Total    60         30         60         150

Solution: Table 10.1a depicts the observed and expected values required
to calculate χ².
Table 10.1a: Observed and Expected Values
Observed     Expected Value Ei =                              (Oi – Ei)²   (Oi – Ei)² / Ei
Value Oi     (Row Total × Column Total) / Grand Total
15           (40 × 60) / 150 = 16                             1            0.0625
20           (50 × 60) / 150 = 20                             0            0.0000
25           (60 × 60) / 150 = 24                             1            0.0417
5            (40 × 30) / 150 = 8                              9            1.1250
10           (50 × 30) / 150 = 10                             0            0.0000
15           (60 × 30) / 150 = 12                             9            0.7500
20           (40 × 60) / 150 = 16                             16           1.0000
20           (50 × 60) / 150 = 20                             0            0.0000
20           (60 × 60) / 150 = 24                             16           0.6667
                                                              χ²cal = 3.6459

The steps to calculate χ² are described as follows:
1. Null hypothesis ‘H0’: The weeks and shifts are independent
Alternate hypothesis ‘H1’: The weeks and shifts are dependent
2. Level of significance is 5% and degrees of freedom
d.f. = (3 – 1) (3 – 1) = 4
⇒ χ²tab = 9.49
3. Test statistic:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
χ²cal = 3.6459
4. Conclusion: Since χ²cal (3.6459) < χ²tab (9.49), ‘H0’ is accepted. Hence,
the attributes ‘week’ and ‘shift’ are independent.

Solved Problem 3
Out of 1000 people surveyed, 600 belonged to urban areas and the rest to
rural areas. Among the 500 who visited other states, 400 belonged to urban
areas. Test at the 5% level of significance whether area and visiting other
states are dependent.
Solution: Table 10.2 depicts the information given in solved problem 3 in a
tabulated form.
Table 10.2: People Belonging to Urban and Rural Areas
Other States   Urban   Rural   Total
Visited        400     100     500
Not Visited    200     300     500
Total          600     400     1000

Table 10.2a depicts the observed and expected values for the calculation of χ².
Table 10.2a: Observed and Expected Values
Observed     Expected Value Ei =                              (Oi – Ei)²   (Oi – Ei)² / Ei
Value Oi     (Row Total × Column Total) / Grand Total
400          300                                              10000        33.33
200          300                                              10000        33.33
100          200                                              10000        50.00
300          200                                              10000        50.00
                                                              χ²cal = 166.66

The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis ‘H0’: Area and visit are independent.
Alternate hypothesis ‘H1’: They are dependent.
2. Level of significance is 5% and degrees of freedom
d.f. = (2 – 1) (2 – 1) = 1
⇒ χ²tab = 3.84
3. Test statistic:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
χ²cal = 166.66
4. Conclusion: Since χ²cal (166.66) > χ²tab (3.84), ‘H0’ is rejected. Hence, the
‘area’ and ‘visit’ are dependent.
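The expected-frequency rule and the two solved problems on independence can be reproduced with a short script. This is a minimal sketch: the function names `expected_frequencies` and `independence_test` are illustrative, and the table values 9.49 and 3.84 are still read from the χ² table by hand.

```python
def expected_frequencies(table):
    """E = (row total x column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_frequencies(table)
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df


shifts_by_week = [[15, 5, 20], [20, 10, 20], [25, 15, 20]]   # Table 10.1
visits_by_area = [[400, 100], [200, 300]]                    # Table 10.2

chi_sq_1, df_1 = independence_test(shifts_by_week)
chi_sq_2, df_2 = independence_test(visits_by_area)
print(round(chi_sq_1, 2), df_1, chi_sq_1 > 9.49)   # 3.65 4 False
print(round(chi_sq_2, 2), df_2, chi_sq_2 > 3.84)   # 166.67 1 True
```

The small differences from the hand-worked totals (3.6459 and 166.66) arise because the tables round each term before adding. Transposing either table leaves the statistic unchanged, which illustrates the last property listed for the test for independence.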
10.3.2 Test of goodness of fit
The test of goodness of fit of a statistical model measures how well the
model fits a set of observations. This test measures and summarises the
differences, if any, between the observed and expected values of the
considered statistical model. These test results help us to know whether
the samples are drawn from identical distributions or not. The degrees of
freedom are ‘n – 1’. When the hypothesised distribution is uniform, the
expected value of each class is equal to the average of the observed values.
Solved Problem 4
A personnel manager is interested in trying to determine whether
absenteeism is greater on one day of the week than on another day of the
week. The record for the past years is available. Table 10.3 depicts the
absenteeism for each working day over a week. Test whether absenteeism
is uniformly distributed over the week.
Table 10.3: Comparison of Data about Absenteeism

Day of Week           Monday   Tuesday   Wednesday   Thursday   Friday
Number of absentees   66       57        54          48         75

Solution: If the absenteeism is uniformly distributed over the week, then
the expected number of absentees per day is given by:

\[ E_i = \frac{66 + 57 + 54 + 48 + 75}{5} = 60 \]

Table 10.3a depicts the expected values required for the calculation of χ²
for the data related to problem 4.
Table 10.3a: Observed and Expected Values for Calculation of χ²

Observed Value Oi   Expected Value Ei   (Oi – Ei)²   (Oi – Ei)² / Ei
66                  60                  36           0.6000
57                  60                  9            0.1500
54                  60                  36           0.6000
48                  60                  144          2.4000
75                  60                  225          3.7500
                                                     χ²cal = 7.5000

The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis ‘H0’: The observed frequencies fit the uniform
distribution.
2. Alternate hypothesis ‘H1’: The observed frequencies do not fit the
uniform distribution.
3. Level of significance is 5% and degrees of freedom (d.f.) = (5 – 1) = 4
⇒ χ²tab = 9.49
4. Test statistic:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
χ²cal = 7.50
5. Conclusion: Since χ²cal (7.5) < χ²tab (9.49), ‘H0’ is accepted. In other
words, we conclude at the 5% level of significance that absenteeism is
uniformly distributed and is independent of the days of the week.

Solved Problem 5
According to a theory in Genetics, the proportion of beans of A, B, C and D
types in a generation should be 9:3:3:1. In an experiment with 1600 beans,
the frequencies of beans of A, B, C and D types were observed to be 882, 313,
287 and 118 respectively. Does the result support the theory?
Solution: The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis ‘H0’: The result supports the theory
Alternate hypothesis ‘H1’: The result does not support the theory
2. Level of significance is 5% and degrees of freedom (d.f.) = (4 – 1) = 3
⇒ χ²tab = 7.81
3. Test statistic:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

Table 10.4 depicts the observed and expected values for calculation of χ²
for solved problem 5.
Table 10.4: Observed and Expected Values for Calculation of χ²

Observed Value Oi   Expected Value Ei         (Oi – Ei)²   (Oi – Ei)² / Ei
882                 (1600 × 9) / 16 = 900     324          0.36
313                 (1600 × 3) / 16 = 300     169          0.56
287                 (1600 × 3) / 16 = 300     169          0.56
118                 (1600 × 1) / 16 = 100     324          3.24
                                                           χ²cal = 4.72

4. Conclusion: Since χ²cal (4.72) < χ²tab (7.81), ‘H0’ is accepted. Therefore,
the result supports the theory.
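When the theory specifies a ratio rather than frequencies, the expected values are obtained by sharing the total in that ratio. The sketch below reworks the bean data; the function names `expected_from_ratio` and `goodness_of_fit` are illustrative, and taking the degrees of freedom as the number of classes minus one assumes that no parameter was estimated from the sample.

```python
def expected_from_ratio(total, ratio):
    """Share `total` among the classes in proportion to `ratio`."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


def goodness_of_fit(observed, expected):
    """Return (chi-square statistic, degrees of freedom = classes - 1)."""
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_sq, len(observed) - 1


observed = [882, 313, 287, 118]                      # beans of types A, B, C, D
expected = expected_from_ratio(sum(observed), [9, 3, 3, 1])

chi_sq, df = goodness_of_fit(observed, expected)
print(expected)
print(round(chi_sq, 2), df, chi_sq > 7.81)
```

The unrounded statistic is about 4.73; the table above shows 4.72 because each term was rounded to two decimals before adding. The decision is the same in both cases, since the value is below 7.81.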

Solved problem 6
The following table gives the classification of 100 workers according to
gender and the nature of work. Test whether the nature of work is
independent of the gender of the worker.
Table 10.5
           Skilled   Unskilled   Total
Males      40        20          60
Females    10        30          40
Total      50        50          100

The steps for calculation of Chi-Square are described as follows:
1. Null hypothesis ‘H0’: The nature of work is independent of the gender
of the worker
Alternate hypothesis ‘H1’: The nature of work depends on the gender of
the worker
2. Level of significance is 5% and degrees of freedom (d.f.) =
(r – 1)(c – 1) = (2 – 1)(2 – 1) = 1
⇒ χ²tab = 3.84
3. Test statistic:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

Table 10.5a depicts the observed and expected values for calculation of χ²
for solved problem 6.
Table 10.5a: Observed and Expected Values for Calculation of χ²

Observed Value Oi   Expected Value Ei   (Oi – Ei)²   (Oi – Ei)² / Ei
40                  30                  100          3.333
10                  20                  100          5.000
20                  30                  100          3.333
30                  20                  100          5.000
                                                     χ²cal = 16.666

4. Conclusion: Since χ²cal (16.666) > χ²tab (3.84), ‘H0’ is rejected.
Therefore, the nature of work is not independent of the gender of the
worker.
10.3.3 Test for comparing variance
When we have to use χ² as a test of population variance, then
H0: σs² = σp² and HA: σs² ≠ σp²

\[ \chi^2 = \frac{\sigma_s^2}{\sigma_p^2} (n - 1) \]

where σs² = variance of the sample,
σp² = variance of the population,
(n – 1) = degrees of freedom, n being the number of items in the
sample.
Then, by comparing the calculated value with the table value of χ² for (n – 1)
degrees of freedom at a given level of significance, we may either accept or
reject the null hypothesis. If the calculated value of χ² is less than the
table value, the null hypothesis is accepted, but if the calculated value is
equal to or greater than the table value the hypothesis is rejected.
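The variance test can be sketched in the same way. The figures below are hypothetical and are not taken from the text: a sample of 10 items with variance 12 is tested against a hypothesised population variance of 8. The value 16.92 is the standard table value of χ² for 9 degrees of freedom at the 5% level, and the function name `variance_chi_square` is an assumption.

```python
def variance_chi_square(sample_variance, population_variance, n):
    """Return (chi-square = (n - 1) * s^2 / sigma^2, degrees of freedom n - 1)."""
    if n < 2 or population_variance <= 0:
        raise ValueError("need n >= 2 and a positive population variance")
    return (n - 1) * sample_variance / population_variance, n - 1


# Hypothetical data, for illustration only
chi_sq, df = variance_chi_square(sample_variance=12.0, population_variance=8.0, n=10)
table_value = 16.92   # chi-square table, 9 degrees of freedom, 5% level

print(chi_sq, df)
print("reject H0" if chi_sq >= table_value else "accept H0")
```

Here the calculated value 13.5 is below the table value, so under the decision rule stated above the null hypothesis would be accepted.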

Self Assessment Questions

1. The χ² test is a __________ test.
2. A table with 4 rows and 2 columns has the degrees of freedom of
_____________.
3. The χ² test is wholly based on _________ data.
4. If there are four rows and five columns in the classification for a χ²
test, then the number of degrees of freedom is equal to __________.
5. If the calculated χ² value is less than the tabulated χ² value, then the
null hypothesis is __________.

Activity

Objective Questions:
1. What is the appropriate test to use if you want to determine whether
there is evidence that the proportion of successes is higher in group 1
than in group 2 and we have obtained independent samples from the
two groups?
i) The Z test
ii) The Chi-Square test
iii) Both of the above
iv) None of the above
2. Which of the following values cannot occur in a Chi-Square
distribution?
i) 100.0
ii) 38.4
iii) 0.61
iv) -2.45
3. What test would you use to determine whether a set of observed
frequencies differ from their corresponding expected frequencies?
i) The t test for dependent samples
ii) The Chi-Square test
iii) The t test for independent samples
iv) The F test
4. When using the chi-square test for differences in two proportions with
a contingency table that has r rows and c columns, how many degrees
of freedom will the test statistic have?
i) n – 1
ii) n1 + n2 - 2
iii) (r - 1) x (c - 1)
iv) (r - 1) + (c – 1)
5. When testing for independence in a contingency table with 3 rows
and 4 columns, how many degrees of freedom will the test statistic
have?
i) 5
ii) 6
iii) 7
iv) 12

6. Which of the following is true about the Chi-Square distribution?


i) It is a skewed distribution
ii) Its shape depends on the number of degrees of freedom
iii) As the degrees of freedom increase, the Chi-Square distribution
becomes more symmetrical
iv) All of the above
7. What other name is used for a contingency table?
i) A cross-classification table
ii) An ANOVA table
iii) A histogram
iv) None of the above

Solutions to Objective Questions


1. i) The Z test
2. iv) -2.45
3. ii) The Chi-Square test
4. iii) (r - 1)x(c – 1)
5. ii) 6
6. iv) All of the above
7. i) A cross-classification table

10.4 Summary
Let us recapitulate the important concepts discussed in this unit:
• Chi-Square test is a non-parametric test. The important applications of
Chi-Square test are the tests for independence of attributes, the test of
goodness of fit and the test for specified variance.
• χ² describes the magnitude of discrepancy between the observed and the
expected frequencies. The value of χ² is calculated as:

\[ \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \dots + \frac{(O_n - E_n)^2}{E_n} \]

Where, O1, O2, O3, …, On are the observed frequencies and E1, E2,
E3, …, En are the corresponding expected or theoretical frequencies.

• An important criterion for applying the Chi-Square test is that the sample
size should be sufficiently large.
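
As a minimal sketch of the χ² formula recapitulated above, the following Python snippet sums (O − E)²/E over paired observed and expected frequencies. The function name `chi_square_statistic` and the die-rolling frequencies are illustrative assumptions, not data from this unit.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over paired frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, 10 rolls expected per face if it is fair.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))
```

The computed value would then be compared with the tabulated χ² value for n − 1 = 5 degrees of freedom at the chosen level of significance.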

10.5 Glossary
Chi-Square test: It is a non-parametric test, that is, a test that does not
require rigid assumptions about the parameters of the population.
Level of significance: The maximum probability of rejecting a true null
hypothesis (type I error) that is accepted for the test, usually a number
such as 0.05 (5%). It is compared with the P-value of the test, which is
the chance of getting a sample like the one being analysed if the null
hypothesis were true. A P-value smaller than the level of significance
implies that getting such a sample was highly unlikely, suggesting that the
null hypothesis is probably not true; the null hypothesis is then rejected
in favour of the alternative.

10.6 Terminal Questions


5. 400 items of each (material) were given treatment ‘x’ and ‘y’ to enhance
the strength of the material. 80 gained strength by treatment ‘x’ and 20
gained strength by treatment ‘y’. Does the gain in strength depend on
the treatment?
6. The demand for a particular spare part was found to vary from day to
day. Table 10.6 depicts the information obtained in a sample study.
Test the hypothesis that the number demanded depends upon the day.
Table 10.6: Spare Part Demand from Monday to Saturday

Days                 Mon    Tue    Wed    Thur   Fri    Sat
Quantity Demanded    1124   1125   1110   1120   1126   1115

7. In a survey of 200 boys, of whom 75 were intelligent, 40 had skilled
fathers, while 85 of the unintelligent boys had unskilled fathers. Can we
say on the basis of the information that skilled fathers had intelligent
boys?
8. The number of car accidents per month in a town was as follows: 6, 9, 4,
12, 8, 20, 14, 15, 2, and 10. Test the hypothesis that the number of
accidents is the same every month.

9. In a particular industry, the postgraduates, graduates and undergraduates
are in the ratio 2:3:5. A firm belonging to the industry had 400, 550 and
1050 postgraduates, graduates and undergraduates respectively on its
pay-roll. Do they follow the earlier observation about the industry?
10. Three hundred digits were chosen at random from a set of tables. The
frequencies of the digits were as follows:
Digits 0 1 2 3 4 5 6 7 8 9
Frequency 28 29 33 31 26 35 32 30 31 25
Using Chi-square test assess the hypothesis that the digits were
distributed in equal numbers in the table.

10.7 Answers

Self Assessment Questions


1. Non-parametric
2. 3
3. Sample
4. 12
5. Not Rejected

Terminal Questions

1. χ²cal = 41.142 H0 rejected
2. χ²cal = 0.179 H0 accepted
3. χ²cal = 8.888 H0 rejected
4. χ²cal = 26.6 H0 rejected
5. χ²cal = 6.6667 H0 rejected
6. χ²cal = 2.864 H0 accepted

10.8 Case Study


Automobile Preference
A market research firm in an Asian country made a survey to see if there
was any correlation between a person’s nationality and their preference in
the make of automobile they purchased. Table 10.7 depicts the sample
information obtained.


Table 10.7: Types of Automobile Purchased in Various Countries


Pakistan China India Srilanka Nepal
Maruti Suzuki 40 28 30 25 50
Opel 32 35 29 39 35
Lancer 24 40 27 28 29
Ford 40 20 40 26 40
Fiat 26 10 35 35 46

Discussion Questions:
i. Indicate the appropriate null and alternative hypotheses to test
whether the make of automobile purchased is dependent on an
individual’s nationality.
ii. Using the critical value approach of the Chi-Square test at a 1%
significance level, does it appear that there is a relationship between
automobile purchase and nationality?
iii. Verify the result of Question ii by using the p-value approach of the
Chi-Square test.
iv. What has to be the significance level for the test to be exactly on the
borderline between rejecting and not rejecting the independence of
nationality and automobile preference?
v. What is your comment about the results?



Statistics for Management Unit 11

Unit 11 F-Distribution and Analysis of Variance (ANOVA)
Structure:
11.1 Introduction
Objectives
Relevance
11.2 Analysis of Variance (ANOVA)
11.3 Assumptions for F-test
Objectives of ANOVA
Assumptions for study of ANOVA
11.4 Classification of ANOVA
One-way ANOVA
Two-way ANOVA
11.5 Summary
11.6 Glossary
11.7 Terminal Questions
11.8 Answers
11.9 Case Study

11.1 Introduction
In the previous unit, we dealt with Chi-square as a test of independence
and with other applications of the Chi-square test. We studied the
characteristics and properties of Chi-Square, how to find the Chi-Square
test results for a given sampling distribution, and how to calculate
Chi-Square values for either rejecting or not rejecting the null
hypothesis. In this unit, we will deal with Analysis of Variance (ANOVA),
assumptions for F-test, and classification of ANOVA.
In the previous unit, we studied that the Chi-Square test is used for
testing the differences between two sample proportions and for making
inferences about whether they are from the same population distribution or
not. When we want to compare the means of more than two populations, we
use the analysis of variance.

Analysis of Variance (ANOVA) will enable us to test for the significance of
the differences among more than two sample means. Using ANOVA, we will be
able to make inferences about whether our samples are drawn from
populations having the same mean or not.
Objectives:
After studying this unit, you should be able to:
• evaluate the mean differences between two or more populations using
analysis of variance
• explain the classification of analysis of variance
• describe the procedure for carrying out the two-way analysis of variance
• recognise the assumptions for applying the ANOVA technique
• interpret the result of F-test to reject or not reject the null hypothesis
framed on two or more population variances
11.1.1 Relevance
Mr. Harish, the CEO of a relatively new brand of television, was enjoying the
bliss of continued phenomenal growth in sales during the last two years.
However, to give further impetus to the sales, he planned to recruit about
200 field staff, who could provide guidance to the staff of retail outlets and
also the potential customers about the features of televisions, innovations
taking place etc., and thus bring out the cost-benefit aspect of the
company’s televisions. Mr. Harish intended to take fresh science graduates
and provide them four weeks’ training at one of the three institutions which
were equipped to provide such training. However, before awarding the
contract, he thought of comparing the effectiveness of training imparted by
the three institutions. Out of the first batch of 30 officers, he deputed 10
officers to each of the three institutions where they were given training in
three different modules, viz. technical, financial, and behavioural. After the
training, he hired the services of a reputed consulting organisation to
conduct a quantitative assessment of the training imparted to the field
officers. The consulting agency used ANOVA to evaluate the impact of
training in the three modules and also the three institutions. Such an
analysis helped Mr. Harish to award the training contract on an objective
basis, without any prejudice.
(Source: Srivastava, T. N., and Rejo, S. (2008) Statistics for Management
5th edition, TMH)


11.2 Analysis of Variance (ANOVA)


Analysis of Variance (ANOVA) is useful in such situations as comparing the
mileage achieved by five different brands of gasoline, testing which of four
different training methods produce the fastest learning record, or comparing
the first-year earnings of the graduates of half a dozen different business
schools. In each of these cases, we would compare the means of more than
two samples. Hence, in most of the fields, such as agriculture, medical,
finance, banking, insurance, education, etc., the concept of ANOVA is used.
In statistical terms, variance measures how widely a set of observations is
spread around its mean. When two or more sets of data are compared for any
practical purpose, the differences between their means are studied through
the techniques of ANOVA. With the analysis of variance technique, we can
test the null hypothesis against the alternative hypothesis.
Null hypothesis, ‘H0’: All population means are equal.
Alternate Hypothesis, ‘H1’: Not all population means are equal, that is, at
least one of the means differs.

Key Statistic
The technique of analysis of variance is referred to as ANOVA.
Initially, the technique was applied in the fields of Zoology and
Agriculture, but at a later stage, it was applied to other fields also. In
ANOVA, the extent of variation between two or more sets of data as well as
the factors contributing towards the variation are studied.
In fact, ANOVA is the classification and cross-classification of statistical data
with the view of testing whether the means of specific classification differ
significantly or whether they are homogeneous.
ANOVA is a method of splitting the total variation of data into constituent
parts which measure the different sources of variations.
The total variation is split up into the following two components:
• Variation within the subgroups of samples
• Variation between the subgroups of the samples
Hence, the total variation is the sum of the variation between the samples
and the variation within the samples. After obtaining the above two
variations, these are tested for their significance by the F-test, which is
also known as the variance ratio test.

The ‘F’ statistic is defined as $F = s_1^2 / s_2^2$, where $s_1^2 > s_2^2$.
It is used to test the differences between variances, that is, whether two
populations can be considered to have the same variance or not. As you have
studied in Unit 10, to test a specified variance we used the χ² test.

The sample variances $s_1^2$ and $s_2^2$ are calculated as:

\[ s_1^2 = \frac{1}{n_1 - 1} \sum (X - \bar{X})^2 \quad \text{and} \quad s_2^2 = \frac{1}{n_2 - 1} \sum (Y - \bar{Y})^2 \]

where,
• ‘n1’ is the size of the first sample
• ‘n2’ is the size of the second sample
• $\bar{X}$ and $\bar{Y}$ denote the sample means of the random variables
‘X’ and ‘Y’ respectively
The F-test is also known as the variance ratio test. It has two degrees of
freedom, one for the numerator of the ratio and another for the
denominator. They are represented by:
ν1 = n1 – 1 and ν2 = n2 – 1.

Where, ‘ν1’ and ‘ν2’ are degrees of freedom in numerator and denominator
respectively.
The degree of freedom for greater variance is represented as ν1 and for
smaller variance as ν2. By comparing the observed value of F with the
corresponding table value, we can infer whether the difference between the
variances of samples could have arisen due to sampling fluctuations.
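
The variance ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `variance_ratio` and the two small samples are hypothetical, and the table value of F still has to be looked up separately for the returned degrees of freedom.

```python
from statistics import variance  # sample variance, divides by n - 1


def variance_ratio(sample_x, sample_y):
    """Return (F, df_numerator, df_denominator), larger variance on top."""
    var_x = variance(sample_x)
    var_y = variance(sample_y)
    if var_x >= var_y:
        return var_x / var_y, len(sample_x) - 1, len(sample_y) - 1
    return var_y / var_x, len(sample_y) - 1, len(sample_x) - 1


# Hypothetical samples, used only to show the call.
f_value, df1, df2 = variance_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(round(f_value, 2), df1, df2)
```

Placing the greater variance in the numerator keeps F at or above 1, so only the upper tail of the F table is needed.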

11.3 Assumptions for F- test


The following are the assumptions for applying the F-test:
• The samples are simple random samples.
• The samples are independent of each other.
• The parent populations from which they are drawn are normally
distributed.
The assumption that all the populations should have a normal distribution
is rarely satisfied in practical cases. Hence, it can be considered as a
limitation.
Solved Problem 1
Table 11.1 depicts the time taken to do a job by method I and method II by
workers. Can we conclude that the variance of time distribution for method I
and method II are the same?
Table 11.1: Time Taken by Workers to Finish a Job by Two Different Methods

Method I 27 23 16 20 26 22
Method II 33 35 34 27 42 32 38

Solution:
Step 1: Null hypothesis ‘H0’: $\sigma_1^2 = \sigma_2^2$, that is, the
variances of the two methods are equal.
Alternate hypothesis ‘H1’: $\sigma_1^2 \neq \sigma_2^2$, that is, the
variances of the two methods are not equal.
Step 2: The table value of F at 5% level of significance and (6, 5) degrees
of freedom (df) is Ftab = 4.95. As the variance of sample 2 is greater,
ν1 = 7 – 1 = 6 and ν2 = 6 – 1 = 5.
Step 3: Test Statistics
Tables 11.1a and 11.1b depict the values required for the calculation of
the sample variances for the data given for the two different methods.
Table 11.1a: Required Values of Method I to Calculate Sample Variance
X       d = X – 22    d²
27      5             25
23      1             1
16      –6            36
20      –2            4
26      4             16
22      0             0
        Σd = 2        Σd² = 82

Table 11.1b: Required Values of Method II to Calculate Sample Variance
X       d = X – 35    d²
33      –2            4
35      0             0
34      –1            1
27      –8            64
42      7             49
32      –3            9
38      3             9
        Σd = –4       Σd² = 136

\[ s_1^2 = \frac{1}{n_1 - 1} \left[ \sum d^2 - \frac{(\sum d)^2}{n_1} \right] = \frac{1}{5} \left[ 82 - \frac{4}{6} \right] = 16.267 \]

\[ s_2^2 = \frac{1}{n_2 - 1} \left[ \sum d^2 - \frac{(\sum d)^2}{n_2} \right] = \frac{1}{6} \left[ 136 - \frac{16}{7} \right] = 22.286 \]

\[ F_{cal} = \frac{s_2^2}{s_1^2} = \frac{22.286}{16.267} = 1.37 \]

Step 4: Conclusion: Since Fcal (1.37) < Ftab (4.95), ‘H0’ is accepted. Hence,
there is no significant difference between the variances of the two
methods.
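
The arithmetic of Solved Problem 1 can be checked with a short Python sketch of the assumed-mean shortcut, s² = [Σd² − (Σd)²/n] / (n − 1) with d = X − A. The function name `variance_by_assumed_mean` is an illustrative assumption; the data and the table value 4.95 are taken from the problem.

```python
def variance_by_assumed_mean(values, assumed_mean):
    """Sample variance from deviations d = X - A about an assumed mean A."""
    n = len(values)
    deviations = [x - assumed_mean for x in values]
    sum_d = sum(deviations)
    sum_d2 = sum(d * d for d in deviations)
    return (sum_d2 - sum_d ** 2 / n) / (n - 1)


method_1 = [27, 23, 16, 20, 26, 22]      # Table 11.1, Method I
method_2 = [33, 35, 34, 27, 42, 32, 38]  # Table 11.1, Method II

s1_sq = variance_by_assumed_mean(method_1, 22)
s2_sq = variance_by_assumed_mean(method_2, 35)
f_cal = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)  # greater variance on top

F_TABLE = 4.95  # table value for (6, 5) df at 5%, as quoted in the text
print(round(s1_sq, 3), round(s2_sq, 3), round(f_cal, 2))
print("H0 accepted" if f_cal < F_TABLE else "H0 rejected")
```

Any assumed mean gives the same variance; 22 and 35 are simply the values used in Tables 11.1a and 11.1b.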

11.3.1 Objectives of ANOVA


The objectives of ANOVA are:
1. To obtain a measure of the total variation within the components
2. To find a measure of variation between or among the components.
Then, the significance of the difference between the variations in two
or more series may be measured
In other words, with the help of the technique of ANOVA, we can test the
hypothesis that the means of all the components constituting a population
are equal to the mean of the population or that the samples have come from
the same population.


Key Statistic
A table showing the source of variance, the sum of squares, degrees of
freedom, mean square (variance), and the formula for the F-ratio is
known as ANOVA table.

Computation of test statistics


The actual analysis of variance is carried out on the basis of the ratio
between the variances. The variance ratio is obtained by dividing the
variance between the samples by the variance within the samples. The ratio
forms the test statistic known as F-Statistic, that is,

\[ F\text{-Statistic} = \frac{\text{Variance between the samples}}{\text{Variance within the samples}} \]

Key Statistic
The means of samples will not be the same if the variation between the
samples is large when compared to the variance within each group.

11.3.2 Assumptions for study of ANOVA


The underlying assumptions for the study of ANOVA are:
i) Each of the samples is a simple random sample
ii) Population from which the samples are selected are normally
distributed
iii) Each of the samples is independent of the other samples
iv) Each of the populations has the same variance
v) The effects of the various components are additive

11.4 Classification of ANOVA


ANOVA is mainly carried on under the following two classifications:
i) One-way analysis of variance or One-way classification
ii) Two-way analysis of variance or Two-way classified data or manifold
classification
11.4.1 One-way ANOVA
Procedure for carrying out the One-way ANOVA
1. Compute the sum of all values ‘T’.
2. Find the correction factor:
\[ \text{Correction factor} = \frac{T^2}{N} \]
3. Find Total sum of squares:
\[ SST = \text{Sum of squares of all observations} - \frac{T^2}{N} \]
4. Sum of the Squares of Error between the columns (samples):
\[ SSC = \left[ \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} + \dots + \frac{(\sum X_k)^2}{n_k} \right] - \frac{T^2}{N} \]
where $\sum X_1, \sum X_2, \sum X_3, \dots$ are column totals and
$n_1, n_2, n_3, \dots$ are the numbers of observations in the columns.
5. Sum of the squares of the Error within columns (samples):
\[ SSE = SST - SSC \]
6. Variance between samples:
\[ MSC = \frac{SSC}{k - 1} \]
7. Variance within the samples:
\[ MSE = \frac{SSE}{n - k} \]
8. Test statistic: $F = \frac{MSC}{MSE}$; when MSE > MSC we take
$F = \frac{MSE}{MSC}$.
9. Decision: If the computed value of F > Table (critical) value of F for
degrees of freedom (k – 1, n – k) at α% (5% or 1%), then we reject H0 and
conclude that not all the population means are equal. Otherwise, we accept
H0 and conclude that the population means do not differ significantly.
Table 11.2 depicts the specimen of ANOVA table.
Table 11.2: ANOVA Table in one-way ANOVA
Source of Variation Sum of Squares Degree of Freedom Mean Square
Between Samples SSC k–1 MSC
Within Samples SSE n–k MSE
Total SST n-1


SST = Total Sum of the Squares


SSC = Sum of the Squares of Error between the columns (samples)
SSE = Sum of the squares of the Error within columns (samples)
MSC = Variance (Mean Square of Errors) between samples (columns)
MSE = Variance (Mean Square of Errors) within the samples (columns)
Solved Problem 2
The three samples below (Table 11.3) have been obtained from normal
populations with equal variance. Test the hypothesis that the sample means
are equal:
Table 11.3: Samples

8 7 12
10 5 9
7 10 13
14 9 12
11 9 14

The table value of F at 5% level of significance for ν1 = 2 and ν2 = 12 is 3.88.


Solution:
Let H0: There is no significant difference in the means of three samples
Table 11.3a: The three samples
X1 X2 X3
8 7 12
10 5 9
7 10 13
14 9 12
11 9 14
∑ X 1 = 50 ∑ X 2 = 40 ∑ X 3 = 60

T = Sum of all observations = 150

\[ \text{Correction factor} = \frac{T^2}{N} = \frac{150^2}{15} = 1500 \]

SST (Total Sum of the Squares) = Sum of squares of all observations − T²/N

\[ SST = (8^2 + 7^2 + 12^2 + 10^2 + \dots + 14^2) - 1500 = 1600 - 1500 = 100 \]

Sum of the Squares of Error between the columns (samples):

\[ SSC = \left[ \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} \right] - \frac{T^2}{N} = \frac{50^2}{5} + \frac{40^2}{5} + \frac{60^2}{5} - 1500 = 1540 - 1500 = 40 \]

Sum of the squares of the Error within columns (samples):
SSE = SST – SSC = 100 – 40 = 60
Variance between samples:
\[ MSC = \frac{SSC}{k - 1} = \frac{40}{3 - 1} = \frac{40}{2} = 20 \]
Variance within the samples:
\[ MSE = \frac{SSE}{n - k} = \frac{60}{15 - 3} = 5 \]
The degree of freedom = (k – 1, n – k) = (2, 12).
[k is the number of columns and n is the total number of observations]
Table 11.3b: ANOVA Table
Source of variation   Sum of squares   df   Mean square   F-value
Between               SSC = 40         2    MSC = 20      Fcal = 20/5 = 4
Within                SSE = 60         12   MSE = 5
Total                 SST = 100        14

The F table value for degrees of freedom (2, 12) [ν1 = 2, ν2 = 12] at 5%
level of significance is 3.88. Since the F table value is smaller than the
F calculated value, we reject the null hypothesis and conclude that the
sample means are not equal.
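
The one-way procedure and the figures of Solved Problem 2 can be reproduced with the following Python sketch. The function name `one_way_anova` and the dictionary layout are illustrative assumptions. The sketch always forms F = MSC/MSE and does not implement the convention of inverting the ratio when MSE > MSC; the table value 3.88 is the one quoted in the problem.

```python
def one_way_anova(samples):
    """Return the one-way ANOVA quantities for a list of samples (columns)."""
    values = [x for sample in samples for x in sample]
    n = len(values)               # total number of observations (N)
    k = len(samples)              # number of samples (columns)
    total = sum(values)           # step 1: T
    correction = total ** 2 / n   # step 2: T^2 / N
    sst = sum(x * x for x in values) - correction                  # step 3
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - correction  # step 4
    sse = sst - ssc               # step 5
    msc = ssc / (k - 1)           # step 6
    mse = sse / (n - k)           # step 7
    return {"SST": sst, "SSC": ssc, "SSE": sse,
            "MSC": msc, "MSE": mse, "F": msc / mse,  # step 8
            "df": (k - 1, n - k)}


# Columns X1, X2, X3 of Table 11.3a.
table_11_3 = [[8, 10, 7, 14, 11], [7, 5, 10, 9, 9], [12, 9, 13, 12, 14]]
result = one_way_anova(table_11_3)

F_TABLE = 3.88  # table value for (2, 12) df at 5%, as quoted in the text
print(result)
print("reject H0" if result["F"] > F_TABLE else "do not reject H0")
```

Because each column keeps its own length, the same function also works when the samples are of unequal size.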
Solved problem 3
The following table gives the yields on 15 sample fields under three varieties
of seeds (A, B, C). Test at 5% level of significance whether the average
yields of land under different varieties of seed show significant differences.


Table 11.4
A B C
20 18 25
21 20 28
23 17 22
16 25 28
20 15 32

Solution
Null hypothesis ‘H0’: The average yields of land under different varieties of
seed do not differ significantly.
T = Sum of all observations = 330

\[ \text{Correction factor} = \frac{T^2}{N} = \frac{330^2}{15} = 7260 \]

SST (Total Sum of the Squares) = Sum of squares of all observations − T²/N

\[ SST = (20^2 + 18^2 + 25^2 + 21^2 + \dots + 32^2) - 7260 = 7590 - 7260 = 330 \]

Sum of the Squares of Error between the columns (samples):

\[ SSC = \left[ \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} \right] - \frac{T^2}{N} = \frac{100^2}{5} + \frac{95^2}{5} + \frac{135^2}{5} - 7260 = 190 \]

Sum of the squares of the Error within columns (samples):
SSE = SST – SSC = 330 – 190 = 140
Variance between samples:
\[ MSC = \frac{SSC}{k - 1} = \frac{190}{3 - 1} = \frac{190}{2} = 95 \]
Variance within the samples:
\[ MSE = \frac{SSE}{n - k} = \frac{140}{15 - 3} = \frac{140}{12} = 11.67 \]


The degree of freedom = (k – 1, n – k) = (2, 12).
[k is the number of columns and n is the total number of observations]

Table 11.4b depicts the ANOVA table for the solved problem 3.
Table 11.4b: ANOVA Table for Solved Problem 3

Source of Variation   Sum of Squares   Degree of Freedom     Mean Square
Between Samples       SSC = 190        k – 1 = 3 – 1 = 2     MSC = 95
Within Samples        SSE = 140        n – k = 15 – 3 = 12   MSE = 11.67
Total                 SST = 330        n – 1 = 15 – 1 = 14

\[ F_{cal} = \frac{MSC}{MSE} = \frac{95}{11.67} = 8.14 \]

The table value of ‘F’, at 5% level of significance for (2, 12) [ν1 = 2, ν2 = 12]
degrees of freedom (df), is 3.88 which is less than the calculated value of ‘F’
i.e. 8.14. Therefore, the null hypothesis is rejected and we conclude that the
average yields of land under different varieties of seed show significant
differences.
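
The same sums of squares can also be obtained directly from deviations about the means, which makes the split of total variation into between-sample and within-sample parts visible. The sketch below applies this to the data of Table 11.4; the variable names are illustrative assumptions, and the deviation-based route is an alternative to the correction-factor shortcut used in the solution, not the method shown there.

```python
from statistics import mean

# Yields from Table 11.4, one list per variety of seed.
varieties = {
    "A": [20, 21, 23, 16, 20],
    "B": [18, 20, 17, 25, 15],
    "C": [25, 28, 22, 28, 32],
}

all_yields = [y for ys in varieties.values() for y in ys]
grand_mean = mean(all_yields)

# Total variation: every observation against the grand mean.
sst = sum((y - grand_mean) ** 2 for y in all_yields)
# Between samples: each sample mean against the grand mean, weighted by size.
ssc = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in varieties.values())
# Within samples: every observation against its own sample mean.
sse = sum((y - mean(ys)) ** 2 for ys in varieties.values() for y in ys)

k = len(varieties)
n = len(all_yields)
f_cal = (ssc / (k - 1)) / (sse / (n - k))
print(sst, ssc, sse, round(f_cal, 2))
```

The printed values agree with SST = 330, SSC = 190 and SSE = 140 in Table 11.4b, and SST equals SSC + SSE.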
11.4.2 Two-way ANOVA
In the two-way classification, data are classified on the basis of two factors.
For example, the agricultural output may be classified on the basis of
different varieties of seeds and also on the basis of different varieties of
fertilizers used.
Procedure for carrying out the Two-way ANOVA
1. a) Assume the means of all columns are equal. That is, the effects of all
factors in the first kind of treatment are equal.
\[ \alpha_1 = \alpha_2 = \alpha_3 = \dots = \alpha_c \]
b) Assume the means of all rows are equal. That is, the effects of all
factors in the second kind of treatment are equal.
\[ \beta_1 = \beta_2 = \beta_3 = \dots = \beta_r \]
2. Compute the sum of all values ‘T’.
3. Find Total sum of squares:
\[ SST = \text{Sum of squares of all observations} - \frac{T^2}{N} \]
4. For columns, SSC is calculated as:
\[ SSC = \left[ \frac{(\sum X_{1i})^2}{n_1} + \frac{(\sum X_{2i})^2}{n_2} + \frac{(\sum X_{3i})^2}{n_3} + \dots + \frac{(\sum X_{ci})^2}{n_c} \right] - \frac{T^2}{N} \]
where $\sum X_{1i}, \sum X_{2i}, \sum X_{3i}, \dots$ are Column totals.
5. For rows, SSR is calculated as:
\[ SSR = \left[ \frac{(\sum X_{j1})^2}{n_1} + \frac{(\sum X_{j2})^2}{n_2} + \frac{(\sum X_{j3})^2}{n_3} + \dots + \frac{(\sum X_{jr})^2}{n_r} \right] - \frac{T^2}{N} \]
where $\sum X_{j1}, \sum X_{j2}, \sum X_{j3}, \dots$ are Row totals.
6. SS residual or error:
\[ SSE = SST - SSC - SSR \]
7. Mean squares:
\[ MSC = \frac{SSC}{c - 1}; \quad MSR = \frac{SSR}{r - 1}; \quad MSE = \frac{SSE}{(c - 1)(r - 1)} \]
where, ‘c’ - number of columns and ‘r’ - number of rows.
8. Variance ratios:
\[ F_c = \frac{MSC}{MSE} \quad \text{and} \quad F_r = \frac{MSR}{MSE} \]
Degrees of freedom for Fc = {c-1, (c-1) (r-1)}
Degrees of freedom for Fr = {r-1, (c-1) (r-1)}
If MSE > MSC then we take $F_c = \frac{MSE}{MSC}$
If MSE > MSR then we take $F_r = \frac{MSE}{MSR}$
Fc is for column wise comparison
Fr is for row wise comparison
If Fc < table value of F then α1 = α2 = α3 = ……
If Fr < table value of F then β1 = β2 = β3 = ……

Table 11.5 depicts the ANOVA table for two-way ANOVA

Table 11.5: ANOVA Table for Two-way ANOVA

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F-Ratio
Between Columns       SSC              (c – 1)              MSC           Fc
Between Rows          SSR              (r – 1)              MSR           Fr
Residual              SSE              (c – 1) x (r – 1)    MSE
Total                 SST              (n – 1)

Solved Problem 4
Three varieties of crops ‘A’, ‘B’, and ‘C’ are tested in a randomised block
design with four replications. The yields are depicted in table 11.6. Test at
0.05 level of significance whether there is a difference between replications.
Test also whether the varieties differ significantly.
Table 11.6: Yields of Three Crops Tested with Four Replications

Variety    Replications
           1     2     3     4
A          6     4     8     6
B          7     6     6     9
C          8     5     10    9

Solution
The null hypothesis ‘H0’ is given as:
H0: There is no difference between the replications or the varieties
Table 11.6a depicts the totals of yields of three crops tested with four
replications.
Table 11.6a: Totals of Yields of Three Crops Tested with Four Replications

Variety    Replications              Total
           1     2     3     4
A          6     4     8     6       24
B          7     6     6     9       28
C          8     5     10    9       32
Total      21    15    24    24      84


N = 12, T = sum of all values = 6 + 7 + 8 + 4 + 6 + 5 + 8 + 6 + 10 + 6 + 9 + 9
= 84.

\[ \text{Correction factor} = \frac{T^2}{N} = \frac{84^2}{12} = 588 \]

Total sum of squares: SST = Sum of squares of all observations − T²/N

\[ SST = (6^2 + 7^2 + 8^2 + 4^2 + 6^2 + 5^2 + 8^2 + 6^2 + 10^2 + 6^2 + 9^2 + 9^2) - 588 = 624 - 588 = 36 \]

For columns, SSC is calculated as:

\[ SSC = \left[ \frac{21^2}{3} + \frac{15^2}{3} + \frac{24^2}{3} + \frac{24^2}{3} \right] - 588 = 606 - 588 = 18 \]

\[ MSC = \frac{SSC}{c - 1} = \frac{18}{4 - 1} = \frac{18}{3} = 6 \]

For rows, SSR is calculated as:

\[ SSR = \left[ \frac{24^2}{4} + \frac{28^2}{4} + \frac{32^2}{4} \right] - 588 = 596 - 588 = 8 \]

\[ MSR = \frac{SSR}{r - 1} = \frac{8}{3 - 1} = \frac{8}{2} = 4 \]

SS residual or error: SSE = SST – SSC – SSR = 36 – 18 – 8 = 10

\[ MSE = \frac{SSE}{(c - 1)(r - 1)} = \frac{10}{(4 - 1)(3 - 1)} = \frac{10}{6} = 1.667 \]


Table 11.6b depicts the ANOVA table for data of solved problem 4.

Table 11.6b: ANOVA Table for Solved Problem 4

Source of Variation   Sum of Squares   Degrees of Freedom       Mean Square   F-Ratio
Between Columns       SSC = 18         c – 1 = (4 – 1) = 3      MSC = 6       Fc = 6/1.667 = 3.6
Between Rows          SSR = 8          r – 1 = (3 – 1) = 2      MSR = 4       Fr = 4/1.667 = 2.4
Residual              SSE = 10         (c – 1) x (r – 1) = 6    MSE = 1.667
Total                 SST = 36         n – 1 = 12 – 1 = 11

Between columns:
Table value of ‘F’ = 4.757 at α = 0.05 and degrees of freedom (3, 6).
Calculated value of ‘F’ = 3.6
Calculated value of ‘F’ < Table value of ‘F’. Therefore, we accept the
hypothesis that there is no significant difference between replications.
Between rows:
Table value of ‘F’ = 5.143 at α = 0.05 and degrees of freedom (2, 6).
Calculated value of ‘F’ = 2.4
Calculated value of ‘F’ < Table value of ‘F’. Therefore, we accept the
hypothesis that there is no significant difference between the varieties.
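
The two-way procedure applied in Solved Problem 4 can be sketched in Python as below. The function name `two_way_anova` and the returned dictionary are illustrative assumptions; the sketch covers only the case used in this unit (one observation per cell) and forms Fc and Fr with MSE in the denominator, without the inversion used when MSE exceeds MSC or MSR.

```python
def two_way_anova(table):
    """Two-way ANOVA for a table given as rows, one observation per cell."""
    r = len(table)        # number of rows
    c = len(table[0])     # number of columns
    n = r * c
    total = sum(sum(row) for row in table)                      # T
    correction = total ** 2 / n                                 # T^2 / N
    sst = sum(x * x for row in table for x in row) - correction
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    row_totals = [sum(row) for row in table]
    ssc = sum(t * t for t in col_totals) / r - correction  # r values per column
    ssr = sum(t * t for t in row_totals) / c - correction  # c values per row
    sse = sst - ssc - ssr
    msc = ssc / (c - 1)
    msr = ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {"SST": sst, "SSC": ssc, "SSR": ssr, "SSE": sse,
            "MSC": msc, "MSR": msr, "MSE": mse,
            "Fc": msc / mse, "Fr": msr / mse}


# Table 11.6: rows are varieties A, B, C; columns are replications 1 to 4.
yields = [[6, 4, 8, 6], [7, 6, 6, 9], [8, 5, 10, 9]]
anova = two_way_anova(yields)
print({name: round(value, 3) for name, value in anova.items()})
```

The output can be compared line by line with Table 11.6b, including SSR = 8 for the rows.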
Solved Problem 5
A performance study was conducted by the Sales Manager of an NML
Manufacturing Company on four salesmen during three seasons and the
data is depicted in table 11.7.
i) Do the salesmen significantly differ in performance?
ii) Is there a significant difference between seasons?
Use 0.05 level of significance.
Table 11.7: Performance Study of Four Salesmen

Season     Salesmen
           Salesman-I   Salesman-II   Salesman-III   Salesman-IV
Summer     36           36            21             35
Rainy      28           29            31             32
Winter     26           28            29             29


Solution
The null hypothesis ‘H0’ is given as:
H0: There is no significant difference between salesmen or between
seasons.
To simplify the calculations, we may subtract some suitable number, for
example 30, from all the data without affecting the values of the variations.
The data coded is depicted in table 11.7a.
Table 11.7a

Season     Salesmen
           Salesman-I   Salesman-II   Salesman-III   Salesman-IV
Summer     6            6             –9             5
Rainy      –2           –1            1              2
Winter     –4           –2            –1             –1

N = 12, T = sum of all values = 0

\[ \text{Correction factor} = \frac{T^2}{N} = \frac{0^2}{12} = 0 \]

Total sum of squares: SST = Sum of squares of all observations − T²/N

\[ SST = 6^2 + 6^2 + (-9)^2 + 5^2 + (-2)^2 + (-1)^2 + 1^2 + 2^2 + (-4)^2 + (-2)^2 + (-1)^2 + (-1)^2 - 0 = 210 \]

SSC (between salesmen):

\[ SSC = \frac{0^2}{3} + \frac{3^2}{3} + \frac{(-9)^2}{3} + \frac{6^2}{3} - \frac{T^2}{N} = 0 + 3 + 27 + 12 - 0 = 42 \]

Degrees of freedom = (c – 1) = (4 – 1) = 3

\[ MSC = \frac{SSC}{c - 1} = \frac{42}{3} = 14 \]

SSR (between seasons):

\[ SSR = \frac{8^2}{4} + \frac{0^2}{4} + \frac{(-8)^2}{4} - \frac{T^2}{N} = 16 + 0 + 16 - 0 = 32 \]

Degrees of freedom = (r – 1) = (3 – 1) = 2

\[ MSR = \frac{SSR}{r - 1} = \frac{32}{2} = 16 \]

SS residual or error: SSE = SST – SSC – SSR = 210 – 42 – 32 = 136

\[ MSE = \frac{SSE}{(r - 1)(c - 1)} = \frac{136}{6} = 22.67 \]

Since MSE > MSC we take $F_c = \frac{MSE}{MSC}$, and since MSE > MSR we take
$F_r = \frac{MSE}{MSR}$.
Table 11.7b: ANOVA Table for Solved Problem 5

Source of variation          Sum of squares   Degrees of freedom   Mean squares   Variance ratio
Between columns (Salesmen)   SSC = 42         3                    MSC = 14       FC = 22.67/14 = 1.62
Between rows (Season)        SSR = 32         2                    MSR = 16       FR = 22.67/16 = 1.42
Error                        SSE = 136        6                    MSE = 22.67
Total                        210              11


For salesmen:
The calculated value of FC is 1.62. The table value of F for (6,3) df at 5%
level of significance is 8.94. Since the calculated value of F is less than the
table value, we accept the null hypothesis and conclude that the sales of
different salesmen do not differ significantly.
For seasons
The calculated value of FR is 1.42. The table value of F for (6,2) df at 5%
level of significance is 19.3. Since the calculated value of F is less than the
table value, we accept the null hypothesis and conclude that there is no
significant difference in the seasons so far as sales are concerned.
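
The statement in this solution that subtracting 30 from every value leaves the variations unchanged can be checked numerically. The sketch below computes the sums of squares for the original data of Table 11.7 and for the coded data, and then forms the inverted ratios used in the solution. The function name `sums_of_squares` is an illustrative assumption.

```python
def sums_of_squares(table):
    """Return (SST, SSC, SSR, SSE) for rows of equal length, one value per cell."""
    r, c = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    correction = total ** 2 / (r * c)
    sst = sum(x * x for row in table for x in row) - correction
    ssc = sum(sum(row[j] for row in table) ** 2 for j in range(c)) / r - correction
    ssr = sum(sum(row) ** 2 for row in table) / c - correction
    return sst, ssc, ssr, sst - ssc - ssr


# Table 11.7: rows are seasons, columns are Salesman-I to Salesman-IV.
raw = [[36, 36, 21, 35], [28, 29, 31, 32], [26, 28, 29, 29]]
coded = [[x - 30 for x in row] for row in raw]  # the coding used in Table 11.7a

sst, ssc, ssr, sse = sums_of_squares(coded)
print(sums_of_squares(raw))
print((sst, ssc, ssr, sse))

msc, msr, mse = ssc / 3, ssr / 2, sse / 6
# MSE exceeds both MSC and MSR here, so the ratios are inverted as in the text.
f_c, f_r = mse / msc, mse / msr
print(round(f_c, 2), round(f_r, 2))
```

Both calls give the same four sums of squares, because shifting every observation by a constant changes the totals and the correction factor by exactly offsetting amounts.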

Self Assessment Questions


1. State whether the following statements are ‘True’ or ‘False’
i) Analysis of variance is useful to test several means.
ii) Another tool applied to test several means is Z/t–test.
iii) F-ratio is always calculated with respect to mean square error.
iv) The F-distribution curve depends on the degrees of freedom.
v) In applying analysis of variance, the sample sizes must be equal.
vi) In one-way ANOVA, the null hypothesis always states that all the
population means are different.
vii) The F-statistic is the ratio of variance between the samples to the
variance within the samples.
2. If we take only one factor and investigate the difference amongst its
various categories having numerous possible values, we are said to use
i) Two-way ANOVA ii) One-way ANOVA
iii) Multi-way ANOVA iv) Four-way ANOVA
3. The sum of squares for variance between samples is 8 and the sum of
squares for variance within samples is 24, then the sum of squares for
total variance is
i) 16 ii) 32
iii) 48 iv) 8
4. A test used as a test of goodness of fit is
i) Chi-square test ii) Z-test
iii) t-test iv) u-test


5. A test used to compare the variances of two independent samples is
i) F-test ii) Z-test
iii) t-test iv) u-test

11.5 Summary
Let us recapitulate the important concepts discussed in this unit:
• ANOVA is a statistical technique used to compare three or more sample means by analysing their variances. It helps to judge whether the samples are drawn from populations having the same mean or not.
• ANOVA is classified into one-way ANOVA and two-way ANOVA.
• ANOVA is a parametric test, as it assumes normality of the population distributions and deals with means.
• The F-test is used for performing ANOVA. The F-test tests the equality of two variances, whereas ANOVA tests the equality of several means. The F-distribution has a pair of degrees of freedom.
• The assumptions for applying the F-test are that the random samples must be independent of each other and drawn from normally distributed populations.

11.6 Glossary
Analysis of Variance (ANOVA): The process of splitting the variation of a group of observations into assignable causes and setting up various significance tests.
One-way or one-factor ANOVA: When the source of variation in the observations is primarily due to one factor.
Post hoc test: A test carried out based on the result of the earlier test.
Two-way or two-factor ANOVA: When there are two factors as sources of variation in the observations.

11.7 Terminal Questions


1. Table 11.8 depicts the number of claims processed per day by a group of four employees of XYZ Insurance Company, observed for a number of days. Test the hypothesis that the employees’ mean claims per day are all the same. Use 5% level of significance.


Table 11.8: Claims Processed per Day of Four Employees of XYZ Insurance Company
Employee 1   15 17 14 12
Employee 2   12 10 13 17
Employee 3   11 14 13 15 12
Employee 4   13 12 12 14 10 9

2. Four makes of bulbs were tested for their length of life (in ‘000 hours)
and the data obtained is depicted in table 11.9. Test whether the length
of their life is significantly different.

Table 11.9: Four Different Makes of Bulbs with Their Length of Life
Make I Make II Make III Make IV
20 19 21 15
23 15 19 17
18 17 20 16
17 20 17 18
16 16

3. Table 11.10 depicts the data on production rate by five workmen on four
machines. Test whether the rate is significantly different due to workers
and machines.
Table 11.10: Production Rate of Five Workmen on Four Machines
Machines Workmen
I II III IV V
1 46 48 36 35 40
2 40 42 38 40 44
3 49 54 46 48 51
4 38 45 34 35 41

4. The percentage of sugar content of tobacco in two samples is depicted in table 11.11. Test whether their population variances are the same.

Table 11.11: Percentage of Sugar Content of Tobacco in Two Samples
Sample A   2.4 2.7 2.6 2.1 2.5
Sample B   2.7 3.0 2.8 3.1 2.2 3.6

5. Three students determine the moisture content of samples of a powder, each student taking a sample from each of 4 consignments. The results are given below:
Table 11.12
Students   Consignment
           I    II   III  IV
1          9    10   9    10
2          12   12   10   11
3          11   11   9    12

11.8 Answers

Self Assessment Questions


1. i) True ii) False iii) True iv) True v) False vi) False vii) True
2. ii) One-way ANOVA
3. ii) 32
4. i) Chi-square test
5. i) F test

Terminal Questions
1. Fcal = 1.47, not significant
2. Fcal = 1.67 not significant
3. Fcal = 8.20 for workmen
Fcal = 19.20 for machines
Both are significant
4. Fcal = 4.08, not significant
5. Fcal = 6.91 for students ( significant)
Fcal = 4.02 for consignments (not significant)


11.9 Case Study


The manager of a bank in Orissa is responsible for ATM operations in three areas, namely Bhubaneswar, Balasore, and Cuttack. When he took over the operations, he was faced with the problem of cash running out of the ATMs. So he collected data from all three areas to check whether the ATMs in all three areas need an equal amount of cash. He also wanted to know whether ATMs at different locations needed the same amount of cash. So he collected the following data about cash withdrawals during the last four months.
Table 11.9: Monthly withdrawals from ATMs
(Rs in lakhs)
Areas          Station   Market   Bank
Bhubaneswar    45        46       37
               35        33       43
               42        47       58
               46        49       17
Balasore       34        45       43
               45        56       45
               47        34       34
               48        35       47
Cuttack        56        45       34
               34        44       44
               45        47       54
               45        48       33

Help the manager decide whether he should place an equal amount of cash in all ATMs:
i) In all areas
ii) In all locations
iii) In all locations as well as areas.



Unit 12 Simple Correlation and Regression


Structure:
12.1 Introduction
Objectives
Relevance
12.2 Correlation
Causation and Correlation
Types of Correlation
12.3 Methods of Correlation
12.4 Measures of Correlation
Scatter diagram
Karl Pearson’s Correlation Coefficient
Properties of Karl Pearson’s Correlation Coefficient
Factors influencing the size of Correlation Coefficient
12.5 Probable Error
Conditions under which Probable Error is used
12.6 Spearman’s Rank Correlation Coefficient
12.7 Partial Correlation
12.8 Multiple Correlations
12.9 Regression
Regression analysis
Regression lines
Regression coefficient
12.10 Standard Error of Estimate
12.11 Multiple Regression Analysis
Applications of Multiple Regression
12.12 Summary
12.13 Glossary
12.14 Terminal Questions
12.15 Answers
12.16 Case Study


12.1 Introduction
In the previous unit, we dealt with analysis of variance (ANOVA),
assumptions for F-test, and classification of ANOVA. In this unit, we will
deal with correlation, methods of correlation, measures of correlation,
probable error, Spearman’s rank correlation coefficient, partial correlation,
multiple correlations, regression, standard error of estimate, multiple
regression analysis, and application of multiple regressions.
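Both correlation and regression are used to study relationships between variables. Correlation measures the strength of the relationship, while regression allows unknown values of one variable to be estimated from known values of another. These statistical tools are widely used to analyse variables in social science research.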
Objectives:
After studying this unit, you should be able to:
• define correlation and regression
• discuss the types and measures of correlation
• calculate Karl Pearson’s correlation coefficient
• calculate the coefficients of partial and multiple correlation
• apply the method of estimating unknown values from known values through regression equations
12.1.1 Relevance
The new CEO of a health care pharmaceutical company called a meeting of the heads of the various departments to discuss the future strategy of the company. While he expressed satisfaction over the growing sales of the company, he also emphasised the need to give a further boost to the sales and image of the company. The head of the R and D unit suggested investing more funds in the innovation of new products and the improvement of existing ones. He pointed out that R and D had made the most significant contribution to the sales of the company. The head of the Marketing department emphasised the importance of marketing strategy for boosting the sales of the company. He, therefore, wanted more funds to be made available for the purpose. The head of the HRD department suggested the need for more staff and also new training programmes for improving the sales significantly. The CEO agreed in principle with them but expected some analysis of quantitative facts and figures to evaluate the claims of the heads of departments before committing funds to the new strategies. The job was entrusted to a consultant, who analysed the data using statistical techniques in general, and correlation and regression analysis in particular, to assess the impact of R and D, Marketing and HRD initiatives in boosting the sales of the company. This helped the CEO to take appropriate decisions based on an analytical approach.
(Source: Srivastava, T. N. and Rejo, S. (2008), Statistics for Management, 5th edition, TMH)

12.2 Correlation
When two or more variables move in sympathy with one another, they are said to be correlated. If both variables move in the same direction, they are said to be positively correlated. If the variables move in opposite directions, they are said to be negatively correlated. If they move haphazardly, there is no correlation between them. Correlation analysis deals with the following:
• Measuring the relationship between variables.
• Testing the relationship for its significance.
• Giving a confidence interval for the population correlation measure.

Following are some of the definitions:

According to Croxton and Cowden, “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation”.

According to A. M. Tuttle, “Correlation is an analysis of the covariation between two or more variables.”

According to W. A. Neiswanger, “Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread, and suggest to him the paths through which stabilising forces may become effective.”

According to Tippett, “The effect of correlation is to reduce the range of uncertainty of our prediction.”


12.2.1 Causation and correlation


A correlation between two variables does not by itself show that one causes the other. The correlation may arise for the following reasons:
• Small sample size: correlation may be present in the sample and not in the population.
• A third factor: for example, the correlation between the yields of rice and tea may be due to a third factor, ‘rain’.
12.2.2 Types of correlation
The following are the three categories of correlation:
1. Positive or negative
2. Simple, partial, and multiple
3. Linear and non-linear
1. Positive and negative correlations: If both the variables (X and Y) vary in the same direction, i.e., if variable Y increases on average when variable X increases and decreases when variable X decreases, the correlation is known as positive correlation. For example, price and supply of a commodity. On the other hand, correlation is said to be negative or inverse if the variables deviate in opposite directions, i.e., if the increase (decrease) in the values of one variable results, on the average, in a corresponding decrease (increase) in the values of the other variable. For example, temperature and sale of woollen garments.
2. Simple, partial, and multiple correlations: In simple correlation, the relationship between two variables is studied. In partial and multiple correlations, three or more variables are studied. In multiple correlation, three or more variables are studied simultaneously. In partial correlation, more than two variables are involved, but the effect of one variable is kept constant and the relationship between the other two variables is studied. For example, let us suppose that we have three variables: number of hours studied (x), IQ (y), and marks obtained (z). In a multiple correlation, we study the correlation of z with the two variables x and y together. In contrast, when we study the relationship between x and z, keeping IQ constant at its average, it is said to be a study involving partial correlation.

3. Linear and non-linear correlation: Correlation depends upon the constancy of the ratio of change between the variables. In linear correlation, a unit change in one variable is accompanied by a constant amount of change in the other variable, so that the ratio of change is constant. For example, Y = aX + b. When such a relationship is plotted on a graph, the points lie on a straight line. The relationship between two variables is said to be non-linear or curvilinear if, corresponding to a unit change in one variable, the other variable does not change at a constant rate but at a fluctuating rate. When this is plotted on a graph, it will not be a straight line.

12.3 Methods of Correlation

[Figure: classification chart. Methods of correlation are divided into graphic methods (scatter diagram) and algebraic methods (covariance method, rank correlation method, concurrent deviation method).]

Fig. 12.1: Methods of correlation

12.4 Measures of Correlation


The following are three methods through which we understand the
measures of correlation:
i. Scatter diagram
ii. Karl Pearson’s correlation coefficient
iii. Spearman’s rank correlation coefficient
12.4.1 Scatter diagram
Ordered pairs of observed values are plotted on the XY plane as dots. Therefore, it is also known as a dot diagram. It is a diagrammatic representation of the relationship between the two variables.

Interpreting a scatter plot
If the dots lie exactly on a straight line that runs from bottom left to top right, then the variables are said to be perfectly positively correlated. Figure 12.2 depicts the scatter diagram for perfectly positively correlated variables.

Fig. 12.2: Perfect Positive Correlation

If the dots lie close to a straight line that runs from bottom left to top right, then the variables are said to be positively correlated. Figure 12.3 depicts the scatter diagram for positively correlated variables.

Fig. 12.3: Positive Correlation

If the dots lie exactly on a straight line that runs from top left to bottom right, then the variables are said to be perfectly or exactly negatively correlated. Figure 12.4 depicts the scatter diagram for perfectly negatively correlated variables.

Fig. 12.4: Perfect Negative Correlation

If the dots lie close to a straight line that runs from top left to bottom right, then the variables are said to be negatively correlated. Figure 12.5 depicts the scatter diagram for negatively correlated variables.

Fig. 12.5: Negative Correlation

If the dots lie scattered all over the graph paper, then the variables have zero correlation. Figure 12.6 depicts the scatter diagram of variables with zero correlation.

Fig. 12.6: Zero Correlation

A scatter diagram shows the direction in which the variables are related, but it does not give any quantitative measure for comparison between data sets.
12.4.2 Karl Pearson’s correlation coefficient
A mathematical method for measuring the intensity or the magnitude of the linear relationship between two variable series is the correlation coefficient. In order to study the “degree of variation” between the variables in a bivariate distribution, we can use the correlation coefficient.

Key Statistic
Karl Pearson’s correlation coefficient is defined as:
$$ r = \frac{\mathrm{Cov}(X, Y)}{\mathrm{S.D.}(X)\,\mathrm{S.D.}(Y)} $$
i) $$ r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad \text{(A)} $$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$,
$$ \sigma_x^2 = \frac{\sum (X - \bar{X})^2}{N} \quad \text{and} \quad \sigma_y^2 = \frac{\sum (Y - \bar{Y})^2}{N}, $$
‘N’ is the number of paired observations, and $\frac{\sum xy}{N}$ is called the covariance of ‘x’ and ‘y’.


12.4.3 Properties of Karl Pearson’s correlation coefficient


The following are the properties of Karl Pearson’s correlation coefficient:
• Its value always lies between –1 and +1
• It is not affected by change of origin or change of scale
• It is a relative measure. It does not have any unit attached to it
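
The first two properties can be illustrated numerically. The sketch below uses hypothetical height and weight figures (not taken from this unit) and a helper `pearson_r` written here for illustration; it assumes the change of scale uses a positive multiplier, since a negative multiplier would reverse the sign of r.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r as sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x and y
    measured as deviations from their respective means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired data, chosen only for illustration.
heights_cm = [150.0, 160.0, 165.0, 172.0, 180.0]
weights_kg = [52.0, 58.0, 63.0, 70.0, 74.0]

r_original = pearson_r(heights_cm, weights_kg)

# Change of origin (subtract a constant) and change of scale
# (divide or multiply by a positive constant).
heights_shifted = [(h - 150.0) / 2.54 for h in heights_cm]
weights_scaled = [w * 2.2046 for w in weights_kg]
r_transformed = pearson_r(heights_shifted, weights_scaled)

print(round(r_original, 6), round(r_transformed, 6))
print(-1.0 <= r_original <= 1.0)
```

The two printed coefficients agree, and the value stays inside the interval from –1 to +1.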

Key Statistic
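The other forms of Karl Pearson’s correlation coefficient formula are:
ii) $$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \qquad \text{(B)} $$
$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}} \qquad \text{(C)} $$
$$ r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{N \sum dx^2 - (\sum dx)^2}\,\sqrt{N \sum dy^2 - (\sum dy)^2}} \qquad \text{(D)} $$
where dx and dy are the deviations of X and Y from assumed means.

For all practical purposes, we can conveniently use form D; whenever summary information is given, choose the proper form from A to C.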
12.4.4 Factors influencing the size of correlation coefficient
The size of ‘r’ is very much dependent upon the variability of measured
values in the correlation sample. The greater the variability, the higher will
be the correlation, everything else being equal. The size of ‘r’ is altered
when researchers select extreme groups of subjects in order to compare
these groups with respect to certain behaviours. Selecting extreme groups
on one variable increases the size of ‘r’ over what would be obtained with
more random sampling.
Combining two groups which differ in their mean values on one of the
variables is not likely to faithfully represent the true situation as far as the
correlation is concerned.
Inclusion of an extreme case (and similarly dropping of an extreme case)
can lead to changes in the amount of correlation.


Process of calculating coefficient of correlation when Deviations are taken from Arithmetic Mean
• Calculate the means of the two series: X and Y.
• Take deviations in the two series from their respective means, indicated as x and y. The deviation should be taken in each case as the value of the individual item minus (–) the arithmetic mean.
• Square the deviations in both the series and obtain the sum of the respective squares of deviation. This would give ∑x² and ∑y².
• Take the product of the deviations, that is, ∑xy. This means individual deviations are to be multiplied by the corresponding deviations in the other series and then their sum is obtained.
• The values thus obtained in the preceding steps, ∑xy, ∑x² and ∑y², are to be used in the formula for correlation.

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} $$
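
The steps above can be traced in a few lines of code. This is a minimal sketch on hypothetical data (the values of `X` and `Y` are not from this unit), with one statement per step.

```python
import math

# Hypothetical paired observations used only to illustrate the steps.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

# Step 1: means of the two series.
mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

# Step 2: deviations from the respective means (item minus mean).
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]

# Step 3: sums of squared deviations.
sum_x2 = sum(d * d for d in x)
sum_y2 = sum(d * d for d in y)

# Step 4: sum of products of corresponding deviations.
sum_xy = sum(a * b for a, b in zip(x, y))

# Step 5: substitute into r = sum(xy) / sqrt(sum(x^2) * sum(y^2)).
r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(sum_xy, sum_x2, sum_y2, round(r, 4))
```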

Solved Problem 1
Find Karl Pearson’s correlation coefficient for the data depicted in table
12.1.
Table 12.1: Data Related to Solved Problem 1
X 20 16 12 8 4
Y 22 14 4 12 8

Solution:
Table 12.1a depicts the sums calculated for the data depicted in table 12.1.
Table 12.1a: Sums Related to Solved Problem 1
X          Y          X²           Y²           XY
20         22         400          484          440
16         14         256          196          224
12         4          144          16           48
8          12         64           144          96
4          8          16           64           32
∑X = 60    ∑Y = 60    ∑X² = 880    ∑Y² = 904    ∑XY = 840

Applying the formula for ‘r’ and substituting the respective values from the table, we get r as:
$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}} $$
$$ r = \frac{5(840) - (60)(60)}{\sqrt{[5(880) - (60)^2][5(904) - (60)^2]}} = \frac{600}{\sqrt{800 \times 920}} $$
$$ r = 0.70 $$
Hence, Karl Pearson’s correlation coefficient is 0.70.
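
As a cross-check of Solved Problem 1, the sketch below recomputes the column sums of Table 12.1a from the raw data of table 12.1 and substitutes them into form (C). Variable names are chosen here for illustration.

```python
import math

# Data from Table 12.1.
X = [20, 16, 12, 8, 4]
Y = [22, 14, 4, 12, 8]
n = len(X)

sum_x = sum(X)
sum_y = sum(Y)
sum_x2 = sum(v * v for v in X)
sum_y2 = sum(v * v for v in Y)
sum_xy = sum(a * b for a, b in zip(X, Y))

# Form (C): r = (N*SXY - SX*SY) / sqrt((N*SX2 - SX^2) * (N*SY2 - SY^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 2))
```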

Solved Problem 2
Calculate the correlation coefficient from the data depicted in table 12.2.
Table 12.2: Data Related to Solved Problem 2
X 50 60 58 47 49 33 65 43 46 68
Y 48 65 50 48 55 58 63 48 50 70

Solution:
Table 12.2a depicts the deviations of the data related to solved problem 2 from the assumed means 50 (for X) and 55 (for Y).
Table 12.2a: Deviations from Assumed Means for Solved Problem 2

X          dx = X – 50   dx²           Y          dy = Y – 55   dy²          dx dy
50         0             0             48         –7            49           0
60         +10           100           65         +10           100          +100
58         +8            64            50         –5            25           –40
47         –3            9             48         –7            49           +21
49         –1            1             55         0             0            0
33         –17           289           58         +3            9            –51
65         +15           225           63         +8            64           +120
43         –7            49            48         –7            49           +49
46         –4            16            50         –5            25           +20
68         +18           324           70         +15           225          +270
∑X = 519   ∑dx = 19      ∑dx² = 1077   ∑Y = 555   ∑dy = 5       ∑dy² = 595   ∑dx dy = 489

Using the formula for calculating ‘r’:
$$ r = \frac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{N \sum dx^2 - (\sum dx)^2}\,\sqrt{N \sum dy^2 - (\sum dy)^2}} $$
and substituting values, we get
$$ r = \frac{10 \times 489 - 19 \times 5}{\sqrt{10 \times 1077 - (19)^2}\,\sqrt{10 \times 595 - (5)^2}} = 0.611 $$
Therefore, Karl Pearson’s correlation coefficient is r = 0.611.
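
The assumed-mean calculation of Solved Problem 2 can be reproduced as below. The function name `r_from_assumed_means` is introduced only for this sketch; calling it with a different origin (here 0 and 0) also shows that the choice of assumed means does not change r.

```python
import math

# Data from Table 12.2.
X = [50, 60, 58, 47, 49, 33, 65, 43, 46, 68]
Y = [48, 65, 50, 48, 55, 58, 63, 48, 50, 70]


def r_from_assumed_means(xs, ys, a, b):
    """Form (D): deviations dx = X - a and dy = Y - b from assumed means a, b."""
    n = len(xs)
    dx = [v - a for v in xs]
    dy = [v - b for v in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dx2 = sum(d * d for d in dx)
    s_dy2 = sum(d * d for d in dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    num = n * s_dxdy - s_dx * s_dy
    den = math.sqrt(n * s_dx2 - s_dx ** 2) * math.sqrt(n * s_dy2 - s_dy ** 2)
    return num / den


r_50_55 = r_from_assumed_means(X, Y, 50, 55)   # assumed means used in the text
r_0_0 = r_from_assumed_means(X, Y, 0, 0)       # same data, origin left at zero

print(round(r_50_55, 3), round(r_0_0, 3))
```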

Solved Problem 3
In a bivariate data set on ‘x’ and ‘y’, variance of ‘x’ = 49, variance of ‘y’ = 9 and covariance Cov(x, y) = –17.5. Find the coefficient of correlation between ‘x’ and ‘y’.
Solution:
We know that:
$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} $$
Given $\mathrm{Cov}(x, y) = \frac{\sum xy}{N} = -17.5$, $\sigma_x = \sqrt{49} = 7$ and $\sigma_y = \sqrt{9} = 3$,
$$ r = \frac{-17.5}{7 \times 3} = -0.833 $$
Hence, there is a high negative correlation.

Solved Problem 4
Ten observations on weight (X) and height (Y) of a particular age group gave the following data:

∑X = 56, ∑Y = 138, ∑X² = 1357, ∑Y² = 2136, ∑XY = 836

Find ‘r’.

Solution:
We know that:
$$ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{N \sum X^2 - (\sum X)^2}\,\sqrt{N \sum Y^2 - (\sum Y)^2}} $$
Given N = 10, ∑X = 56, ∑Y = 138, ∑X² = 1357, ∑Y² = 2136 and ∑XY = 836,
$$ r = \frac{10 \times 836 - (56)(138)}{\sqrt{10 \times 1357 - (56)^2}\,\sqrt{10 \times 2136 - (138)^2}} = 0.1286 $$
Hence, Karl Pearson’s correlation coefficient is 0.1286.

12.5 Probable Error


It measures the extent to which the correlation coefficient is dependable. It is an old measure for testing the reliability of ‘r’. It is given by:
$$ \text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} $$
where ‘r’ is measured from a sample of size ‘n’.
Probable error is used to:
i) Interpret the value of ‘r’:
• If r < P.E., then it is not at all significant
• If r > 6 P.E., then ‘r’ is highly significant
• If P.E. < r < 6 P.E., we cannot say anything about the significance of ‘r’
ii) Construct confidence limits, r ± P.E., within which the correlation in the population ‘ρ’ is expected to lie.

If r is the observed correlation coefficient in a sample of n pairs of observations, then its standard error, usually denoted by S.E.(r), is given by
$$ \text{S.E.}(r) = \frac{1 - r^2}{\sqrt{n}}, \qquad \text{P.E.}(r) = 0.6745 \times \text{S.E.}(r) $$
The reason for taking the factor 0.6745 is that in a normal distribution 50% of the distribution lies in the range μ ± 0.6745σ.


12.5.1 Conditions under which probable error is used


The following are some conditions under which probable error (P. E.) is
used.
1. Samples should be drawn from a normal population
2. The value of ‘r’ must be determined from sample values
3. Samples must have been selected at random

Solved Problem 5
If r = 0.6 and n = 64, then:
a) Interpret ‘r’
b) Find the limits within which ‘ρ’ is supposed to lie
Solution:
$$ \text{P.E.} = 0.6745 \times \frac{1 - (0.6)^2}{\sqrt{64}} = 0.054 $$
a) $6 \times \text{P.E.} = 6 \times 0.054 = 0.324$
Since r = 0.6 > 6 P.E., ‘r’ is highly significant.

b) Limits for population ‘ρ’:
$$ \rho = 0.6 \pm 0.054 $$
Hence, the limits within which ‘ρ’ lies are 0.546 and 0.654.
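
The probable-error rules can be expressed as a small function. The sketch below is illustrative only: the names `probable_error` and `interpret` are introduced here, and comparing the absolute value of r with P.E. is an assumption made so that negative coefficients are handled as well.

```python
import math


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def interpret(r, n):
    """Apply the P.E. rules and return (P.E., verdict, (lower, upper))."""
    pe = probable_error(r, n)
    if abs(r) < pe:                 # assumption: |r| is compared, not r
        verdict = "not significant"
    elif abs(r) > 6 * pe:
        verdict = "highly significant"
    else:
        verdict = "no conclusion"
    return pe, verdict, (r - pe, r + pe)


pe, verdict, (lower, upper) = interpret(0.6, 64)   # Solved Problem 5
print(round(pe, 3), verdict, round(lower, 3), round(upper, 3))
```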

12.6 Spearman’s Rank Correlation Coefficient


Karl Pearson’s correlation coefficient assumes that:
i. Samples are drawn from a normal population
ii. The variables under study are affected by a large number of
independent causes so as to form a normal distribution
When we do not know the shape of the population distribution, or when the data is of a qualitative type, Spearman’s rank correlation coefficient is used to measure the relationship.


Key Statistic
Spearman’s rank correlation coefficient is defined as:
$$ \rho = 1 - \frac{6 \sum D^2}{N^3 - N} $$
where D is the difference between the ranks assigned to the variables and N is the number of observations.
The value of ‘ρ’ lies between ‘-1’ and ‘+1’ and its interpretation is the same as that of Karl Pearson’s correlation coefficient.
There are three types of problems. Table 12.3 depicts the types of problems involved in calculating the rank correlation coefficient.
Table 12.3: Types of Problems
Type i     Ranks are assigned
Type ii    Ranks are not assigned
Type iii   When ranks are repeated

Type i: Ranks are assigned: When ranks are already assigned, take the difference between the ranks of the variables and denote it by D. Then the rank correlation is computed using the formula

$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$

Solved Problem 6
In a singing competition, two judges assigned the ranks for seven
candidates which is depicted in table 12.4. Find Spearman’s rank correlation
coefficient.
Table 12.4: Ranks of Seven Candidates
Competitor 1 2 3 4 5 6 7
Judge I 5 6 4 3 2 7 1
Judge II 6 4 5 1 2 7 3

Solution:
Table 12.4a depicts the data of solved problem 6.


Table 12.4a: Data of Seven Candidates

Competitor   R1 (Judge 1)   R2 (Judge 2)   D = R1 – R2   D²
1            5              6              -1            1
2            6              4              2             4
3            4              5              -1            1
4            3              1              2             4
5            2              2              0             0
6            7              7              0             0
7            1              3              -2            4
N = 7                                                    ∑D² = 14

$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(14)}{7(7^2 - 1)} = 1 - \frac{6 \times 14}{7 \times 48} = 0.75 $$

Hence, Spearman’s rank correlation coefficient ρ is 0.75.
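
When the ranks are already given, the formula needs only the rank differences. The following sketch applies it to the two judges' ranks from table 12.4; the function name `spearman_from_ranks` is chosen here for illustration.

```python
def spearman_from_ranks(r1, r2):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1)), where D = R1 - R2."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks from Table 12.4.
judge_1 = [5, 6, 4, 3, 2, 7, 1]
judge_2 = [6, 4, 5, 1, 2, 7, 3]

rho = spearman_from_ranks(judge_1, judge_2)
print(rho)
```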


Type ii: Ranks are not assigned: When ranks are not given, we have to
assign the ranks to the variables either in ascending order or descending
order. Then use the same formula to compute the rank correlation.

Solved Problem 7
Find the rank difference coefficient of correlation (in case of no ties) for the data depicted in table 12.5.
Table 12.5: Scores of Students on Test I and Test II
(X = score on Test I, Y = score on Test II, R1 = rank on Test I, R2 = rank on Test II, D = difference between ranks, D² = difference squared)

Student   X    Y    R1   R2   D    D²
A         16   8    2    5    -3   9
B         14   14   3    3    0    0
C         18   12   1    4    -3   9
D         10   16   4    2    2    4
E         2    20   5    1    4    16
N = 5                              ∑D² = 38

Applying the formula, we get:
$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(38)}{5(5^2 - 1)} = 1 - 1.9 = -0.9 $$

The relationship between the scores on Test I and Test II is very high and inverse.

Solved Problem 8
Table 12.6 depicts the sales statistics of six sales representatives in two
different localities. Find whether there is a relationship between the buying
habits of the people in the localities.
Table 12.6: Sales Data of Six Representatives
Representative 1 2 3 4 5 6
Locality I 70 40 65 110 60 20
Locality II 70 30 80 100 90 20

Solution:
Table 12.6a depicts the ranks and rank differences used to calculate the correlation coefficient for the data in solved problem 8.
Table 12.6a: Calculating the Coefficient of Correlation
(R1 = rank of sales in Locality I, R2 = rank of sales in Locality II)

Representative   R1   R2   D = R1 – R2   D²
1                2    4    -2            4
2                5    5    0             0
3                3    3    0             0
4                1    1    0             0
5                4    2    2             4
6                6    6    0             0
N = 6                                    ∑D² = 8

$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(8)}{6(6^2 - 1)} = 1 - \frac{8}{35} = 0.7714 $$
Therefore, there is a high positive correlation between the buying habits of the people in the two localities.
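
When ranks are not given, they have to be derived from the raw values first. The sketch below ranks the sales figures of table 12.6 in descending order (largest value gets rank 1) and then applies the same formula. The helper `rank_descending` is written for this illustration and assumes there are no tied values.

```python
def rank_descending(values):
    """Rank 1 goes to the largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_from_ranks(r1, r2):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Sales data from Table 12.6.
locality_1 = [70, 40, 65, 110, 60, 20]
locality_2 = [70, 30, 80, 100, 90, 20]

ranks_1 = rank_descending(locality_1)
ranks_2 = rank_descending(locality_2)
rho = spearman_from_ranks(ranks_1, ranks_2)

print(ranks_1, ranks_2, round(rho, 4))
```

Ranking in ascending order instead would give the same coefficient, provided both series are ranked in the same direction.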


Type iii: When ranks are repeated

In the case of attributes, if there is a tie, i.e., if any two or more individuals are placed together in any classification with respect to an attribute, or if, in the case of variable data, there is more than one item with the same value in either or both the series, then Spearman’s formula for calculating the rank correlation coefficient breaks down. In this case the variables X (the ranks of individuals in characteristic A, the 1st series) and Y (the ranks of individuals in characteristic B, the 2nd series) do not take the values from 1 to n and consequently X ≠ Y, while in proving the formula we had assumed that X = Y.
While ranking the values for the computation of the coefficient of rank correlation, two or more values may be equal, and so a situation of ties may arise. In such a case, all those values which are equal are assigned the same average rank, and then the coefficient of rank correlation is found. Corresponding to every such repeated rank (which repeats m times), a factor (m³ – m)/12 is added to ∑D².
These common ranks are the arithmetic mean of the ranks which the items would have got if they were different from each other, and the next item gets the rank next to the ranks used in computing the common rank. For example, suppose an item is repeated twice at rank 4. Then the common rank to be assigned to each item is (4 + 5)/2, i.e., 4.5, which is the average of 4 and 5, the ranks which these observations would have assumed if they were different. The next item will be assigned the rank 6. If an item is repeated thrice at rank 7, then the common rank to be assigned to each value will be (7 + 8 + 9)/3, i.e., 8, which is the arithmetic mean of 7, 8, and 9, the ranks these observations would have got if they were different from each other. The next rank to be assigned will be 10.
If only a small proportion of the ranks are tied, this technique may be applied together with the formula. If a large proportion of ranks are tied, it is advisable to apply an adjustment or a correction factor as explained:
In the formula, add the factor m(m² – 1)/12 to ∑D², where m is the number of times an item is repeated. This correction factor is to be added for each repeated value in both the series.
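
The tie-handling rules described above can be sketched in code. The data in `scores` is hypothetical, and the function names (`average_ranks`, `tie_correction`, `spearman_with_ties`) are introduced here for illustration; the correction term implements the factor (m³ – m)/12 added to ∑D² for every repeated value in either series.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean of the
    ranks they would have occupied had they been different."""
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value that is repeated m times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_with_ties(xs, ys):
    """rho = 1 - 6 * [sum(D^2) + corrections] / (N * (N^2 - 1))."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))


# Hypothetical scores: the value 70 is repeated at rank 4.
scores = [90, 85, 80, 70, 70, 60]
print(average_ranks(scores))    # the two 70s share rank 4.5; the next item gets 6
print(tie_correction(scores))   # one value repeated twice: (8 - 2) / 12
```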


Solved Problem 9
Find the rank correlation coefficient for the data depicted in table 12.7.
Table 12.7: Scores of Student in Test I and Test II
Student A B C D E F G H I J
Score on Test I 20 30 22 28 32 40 20 16 14 18
Score on Test II 32 32 48 36 44 48 28 20 24 28

Solution:
Table 12.7a depicts the required data for calculating the correlation coefficient.
Table 12.7a: Ranks of Test I and Test II
(X = score on Test I, Y = score on Test II, R1 = rank on Test I, R2 = rank on Test II, D = difference between ranks, D² = difference squared)

Student   X    Y    R1    R2    D       D²
A         20   32   6.5   5.5   1.0     1.00
B         30   32   3     5.5   -2.5    6.25
C         22   48   5     1.5   3.5     12.25
D         28   36   4     4     0       0
E         32   44   2     3     -1.0    1.00
F         40   48   1     1.5   -0.5    0.25
G         20   28   6.5   7.5   -1.0    1.00
H         16   20   9     10    -1.0    1.00
I         14   24   10    9     1.0     1.00
J         18   28   8     7.5   0.5     0.25
N = 10                                  ∑D² = 24

$$ \rho = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \frac{1}{12}(m_4^3 - m_4)\right]}{N(N^2 - 1)} $$

where $m_i$ represents the number of times a rank is repeated. Here four values are each repeated twice (20 on Test I; 32, 48 and 28 on Test II), so each $m_i$ = 2.

$$ \rho = 1 - \frac{6\left[24 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{10(10^2 - 1)} $$

$$ \rho = 1 - \frac{6[24 + 0.5 + 0.5 + 0.5 + 0.5]}{10 \times 99} = 1 - \frac{156}{990} = 0.8424 $$


Activity:
Find the rank correlation from the following distribution
Cost    39 65 62 90 82 75 25 98 36 78
Sales   47 53 58 86 62 68 60 91 51 54

Activity Solution
Cost X   Sales Y   R1   R2   D    D²
39       47        8    10   -2   4
65       53        6    8    -2   4
62       58        7    6    1    1
90       86        2    2    0    0
82       62        3    4    -1   1
75       68        5    3    2    4
25       60        10   5    5    25
98       91        1    1    0    0
36       51        9    9    0    0
78       54        4    7    -3   9
                                  ∑D² = 48
$$ \rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 48}{10(10^2 - 1)} = 1 - \frac{288}{990} = 0.71 $$

12.7 Partial Correlation


Partial correlation is used in situations where three or more variables are
involved. For example, the three variables may be age, height, and weight. The
correlation between height and weight can be computed keeping age constant,
because age may be an important factor influencing the strength of the
relationship between height and weight. The effect of one variable is thus
partialled out of the correlation between the other two variables. This statistical
technique is known as Partial Correlation. The correlation between variables ‘x’
and ‘y’ is denoted by ‘rxy’. Further, the partial correlation between ‘x’ and ‘y’
keeping the variable ‘z’ constant is denoted by ‘rxy.z’.


Key Statistic
Partial correlation is denoted by the symbol r12.3. The correlation
between variables 1 and 2 keeping the 3rd variable constant is:

$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

where,
r12.3 = Partial correlation between variables 1 and 2 keeping 3rd constant
r12 = correlation between variables 1 and 2
r13 = correlation between variables 1 and 3
r23 = correlation between variables 2 and 3
Similarly,

$$r_{13.2} = \frac{r_{13} - r_{12}\, r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} \quad \text{and} \quad r_{23.1} = \frac{r_{23} - r_{12}\, r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}}$$

Solved problem 10
Given r12 = 0.8, r13 = 0.5 and r23 = 0.4, calculate all partial correlations.

Solution:
(i) The correlation between variables 1 and 2 keeping the 3rd constant is
given by:

$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.8 - 0.5 \times 0.4}{\sqrt{1 - 0.5^2}\,\sqrt{1 - 0.4^2}} = \frac{0.6}{0.794} = 0.756$$

(ii) The correlation between variables 1 and 3 keeping the 2nd constant is
given by:

$$r_{13.2} = \frac{r_{13} - r_{12}\, r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.5 - 0.8 \times 0.4}{\sqrt{1 - 0.8^2}\,\sqrt{1 - 0.4^2}} = \frac{0.18}{0.55} = 0.33$$

(iii) The correlation between variables 2 and 3 keeping the 1st constant is
given by:

$$r_{23.1} = \frac{r_{23} - r_{21}\, r_{13}}{\sqrt{1 - r_{21}^2}\,\sqrt{1 - r_{13}^2}} = \frac{0.4 - 0.8 \times 0.5}{\sqrt{1 - 0.8^2}\,\sqrt{1 - 0.5^2}} = 0$$
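
The three partial correlations of Solved Problem 10 can be checked with a short Python sketch. It is only an illustration: the function name `partial_corr` and its argument order are assumptions made here, not notation from the text.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b with the third variable c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


# Zero order coefficients given in Solved Problem 10
r12, r13, r23 = 0.8, 0.5, 0.4

r12_3 = partial_corr(r12, r13, r23)  # variables 1 and 2, keeping 3 constant
r13_2 = partial_corr(r13, r12, r23)  # variables 1 and 3, keeping 2 constant
r23_1 = partial_corr(r23, r12, r13)  # variables 2 and 3, keeping 1 constant

print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3))
```

A single function covers all three cases because the formula is symmetric: the first argument is always the correlation of interest and the other two are the correlations of each variable with the one held constant.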


Self Assessment Questions


Calculate the required correlation coefficients.
1. i. From the following data, calculate the correlation between variables 1
and 2 keeping the 3rd constant.
r12 = 0.7; r13 = 0.6 r23 = 0.4
ii. Calculate r23.1 and r13.2 from the following:
r12 = 0.60; r13 = 0.51; r23 = 0.40
iii. Given the zero order correlation coefficients, calculate the partial
correlation between variables 1 and 3 keeping the 2nd variable
constant. Interpret your result.
r12 = 0.8; r13 = 0.6; r23 = 0.5

12.8 Multiple Correlations


Multiple correlation involves three or more variables. The dependent variable
is denoted by X1 and the other variables are denoted by X2, X3, etc. Gupta S.P.
has expressed that “the coefficient of multiple linear correlation is represented
by R1 and it is common to add subscripts designating the variables involved”.
Thus R1.234 would represent the coefficient of multiple linear correlation
between X1 on the one hand and X2, X3, and X4 on the other. The subscript of
the dependent variable is always to the left of the point.
The coefficients of multiple correlation R1.23, R2.13, and R3.12 can be
expressed as:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}}$$

$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{13}^2}}$$

$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{12}^2}}$$

The coefficient of multiple correlation R1.23 is the same as R1.32.


A coefficient of multiple correlation is never negative: it lies between ‘0’ and
‘+1’. If the coefficient of multiple correlation is ‘1’, the correlation is perfect.
If it is ‘0’, there is no linear relationship between the variables. The coefficient
of multiple determination is obtained by squaring R1.23.
Alternative formula for computing R1.23 is:

$$R_{1.23} = \sqrt{r_{12}^2 + r_{13.2}^2\,(1 - r_{12}^2)} \quad \text{or} \quad R_{1.23}^2 = r_{12}^2 + r_{13.2}^2\,(1 - r_{12}^2)$$

Similarly, alternative formulas for R1.24 and R1.34 can be obtained.


Multiple correlation analysis measures the relationship between the given
variables. In this analysis, the degree of association is measured between
one variable (which is considered as the dependent variable) and a group of
other variables (which are considered as independent variables).

Solved Problem 11
The following are the zero order correlation coefficients.
r12 = 0.98; r13 = 0.44; r23 = 0.54

Calculate the multiple correlation coefficient treating the first variable as
dependent and the second and third variables as independent.

Solution:
The first variable is dependent. The second and third variables are
independent. Using the formula for the multiple correlation coefficient R1.23
we get:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\, r_{12}\, r_{13}\, r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{0.98^2 + 0.44^2 - 2(0.98)(0.44)(0.54)}{1 - 0.54^2}} = \sqrt{\frac{0.6883}{0.7084}} = 0.986$$

Hence the multiple correlation coefficient is 0.986.
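
As a cross-check, the result of Solved Problem 11 can be computed in Python both from the main formula and from the alternative formula based on the partial correlation r13.2. This is a sketch under stated assumptions: the function names `multiple_corr` and `partial_corr` are hypothetical helpers introduced here.

```python
import math


def multiple_corr(r_dep_a, r_dep_b, r_ab):
    """Multiple correlation of a dependent variable on two independents a, b."""
    r_squared = (r_dep_a**2 + r_dep_b**2
                 - 2 * r_dep_a * r_dep_b * r_ab) / (1 - r_ab**2)
    return math.sqrt(r_squared)


def partial_corr(r_ab, r_ac, r_bc):
    """Correlation between a and b with c held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


r12, r13, r23 = 0.98, 0.44, 0.54

# Main formula for R1.23
R1_23 = multiple_corr(r12, r13, r23)

# Alternative formula: R1.23^2 = r12^2 + r13.2^2 * (1 - r12^2)
r13_2 = partial_corr(r13, r12, r23)
R1_23_alt = math.sqrt(r12**2 + r13_2**2 * (1 - r12**2))

print(round(R1_23, 3), round(R1_23_alt, 3))
```

Both routes give the same value, which is a convenient way to catch an arithmetic slip when working by hand.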

Self Assessment Questions


2. State whether the following statements are ‘True’ or ‘False’.
i. Scatter diagram does not give us a quantitative measure of
correlation coefficient.
ii. Correlation estimates the value of one variable from the knowledge
of the other.
iii. Correlation coefficient is an absolute measure.


12.9 Regression
According to M. M. Blair, Regression is defined as, “the measure of the
average relationship between two or more variables in terms of the original
units of the data”.
Correlation analysis studies the strength of the relationship between two
variables ‘X’ and ‘Y’. Regression attempts to quantify the dependence of
one variable on the other. For example, if there are two variables ‘X’ and
‘Y’ and ‘Y’ depends on ‘X’, then the dependence is expressed in the form
of an equation.
12.9.1 Regression analysis
Regression analysis is used to estimate the values of the dependent
variable from the values of the independent variables. It also provides a
measure of the error involved in using the regression line as a basis for
estimation. The regression coefficient of Y on X is the coefficient of the
variable ‘X’ in the line of regression of Y on X. The regression coefficients
can be used to calculate the correlation coefficient: the square of the
correlation coefficient is the product of the two regression coefficients.
12.9.2 Regression lines
For a set of paired observations, there exist two regression lines. The line
drawn in such a way that the sum of the vertical deviations is zero and the sum
of their squares is minimum is called the regression line of ‘Y’ on ‘X’. It is used
to estimate ‘Y’ values for given ‘X’ values. The line drawn in such a way that
the sum of the horizontal deviations is zero and the sum of their squares is
minimum is called the regression line of ‘X’ on ‘Y’. It is used to estimate the ‘X’
values for given ‘Y’ values. The smaller the angle between these lines, the
higher is the correlation between the variables. The regression lines always
intersect at $(\bar{X}, \bar{Y})$.

The regression lines have the following equations:

i) The regression equation of ‘Y’ on ‘X’ is given by:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

ii) The regression equation of ‘X’ on ‘Y’ is given by:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

where,

$$b_{xy} = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{N\sum dy^2 - (\sum dy)^2} \quad \text{or} \quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

$$b_{yx} = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{N\sum dx^2 - (\sum dx)^2} \quad \text{or} \quad b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$

Here dx and dy are the deviations of the X and Y values from their assumed
(or actual) means. ‘byx’ and ‘bxy’ are called regression coefficients.


12.9.3 Regression coefficient
When a regression is linear, then the regression coefficient is given by the
slope of the regression line.
• The geometric mean of the regression coefficients gives the correlation
coefficient, with ‘r’ taking the common sign of the two coefficients:
$$r^2 = b_{yx}\, b_{xy}, \qquad r = \pm\sqrt{b_{yx}\, b_{xy}}$$
• The product of the regression coefficients never exceeds 1, that is,
$$b_{yx}\, b_{xy} \le 1$$
• If ‘byx’ is negative, then ‘bxy’ is also negative and ‘r’ is negative.
• They can also be expressed as:
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
• It is an absolute measure.
The differences between Correlation and Regression Coefficient are
depicted in table 12.8.
Table 12.8: Differences Between Correlation and Regression Coefficient
Correlation Coefficient | Regression Coefficient
The correlation coefficients, rxy = ryx | The regression coefficients, byx ≠ bxy
‘r’ lies between -1 and 1. | ‘byx’ can be greater than one, in which case ‘bxy’ must be less than one such that byx.bxy ≤ 1
It has no units attached to it. | It has units attached to it.
There exists nonsense correlation. | There is no such nonsense regression.
It is not based on cause and effect relationship. | It is based on cause and effect relationship.
It indirectly helps in estimation. | It is meant for estimation.

Solved Problem 12
Find regression equation from the data depicted in table 12.9. Then
calculate the correlation coefficient.
Table 12.9: Data of Ages of Husband and Wife
Age of Husband 18 19 20 21 22 23 24 25 26 27
Age of Wife 17 17 18 18 19 19 19 20 21 22

Solution:
Table 12.9a depicts the data required for calculation of correlation and
regression coefficients.
Table 12.9a: Data Required for Calculation of Correlation and Regression
Coefficients
Age of husband (X)   dx = X – 22   dx²   Age of wife (Y)   dy = Y – 19   dy²   dx dy
18                   -4            16    17                -2            4     8
19                   -3            9     17                -2            4     6
20                   -2            4     18                -1            1     2
21                   -1            1     18                -1            1     1
22                    0            0     19                 0            0     0
23                    1            1     19                 0            0     0
24                    2            4     19                 0            0     0
25                    3            9     20                 1            1     3
26                    4            16    21                 2            4     8
27                    5            25    22                 3            9     15
∑X = 225             ∑dx = 5       ∑dx² = 85   ∑Y = 190    ∑dy = 0       ∑dy² = 24   ∑dxdy = 43

$$\bar{X} = \frac{225}{10} = 22.5 \qquad \bar{Y} = \frac{190}{10} = 19$$

Regression equation of Y on X is:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

$$b_{yx} = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{N\sum dx^2 - (\sum dx)^2} = \frac{10 \times 43 - (5)(0)}{10 \times 85 - (5)^2} = \frac{430}{825} = 0.521$$

$$Y - 19 = 0.521\,(X - 22.5)$$

$$Y = 0.521X + 7.2775$$

Regression equation of X on Y is:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

$$b_{xy} = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{N\sum dy^2 - (\sum dy)^2} = \frac{10 \times 43 - (5)(0)}{10 \times 24 - (0)^2} = \frac{430}{240} = 1.792$$

$$X - 22.5 = 1.792\,(Y - 19)$$

$$X = 1.792Y - 11.548$$

$$r = \sqrt{b_{yx}\, b_{xy}} = \sqrt{0.521 \times 1.792} = 0.966$$

Hence, the Correlation Coefficient ‘r’ is 0.966.
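
The deviation method used in Solved Problem 12 can be sketched in Python as follows. The function name `regression_coefficients` and the assumed means 22 and 19 passed to it are illustrative choices matching Table 12.9a; any other assumed means would give the same coefficients.

```python
import math


def regression_coefficients(x, y, assumed_x, assumed_y):
    """Return (b_yx, b_xy) from deviations taken about assumed means."""
    n = len(x)
    dx = [v - assumed_x for v in x]
    dy = [v - assumed_y for v in y]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    b_yx = numerator / (n * sum_dx2 - sum_dx**2)
    b_xy = numerator / (n * sum_dy2 - sum_dy**2)
    return b_yx, b_xy


# Data from Table 12.9
husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]

b_yx, b_xy = regression_coefficients(husband, wife, 22, 19)

# r is the geometric mean of the coefficients and carries their common sign
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Intercepts follow from the fact that both lines pass through the means
mean_x = sum(husband) / len(husband)
mean_y = sum(wife) / len(wife)
a_yx = mean_y - b_yx * mean_x  # Y = a_yx + b_yx * X
a_xy = mean_x - b_xy * mean_y  # X = a_xy + b_xy * Y

print(round(b_yx, 3), round(b_xy, 3), round(r, 3))
```

Because the code keeps full precision, the intercepts differ slightly in the later decimals from the hand calculation, which rounds the coefficients to three places first.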
Solved Problem 13
Table 12.10 depicts the results that were worked out from scores in
statistics and mathematics in a certain examination.
Table 12.10: Scores in Statistics and Mathematics
Scores in Statistics Scores in Mathematics
X Y

Mean 40 48
Standard Deviation 10 15


Karl Pearson’s correlation coefficient between ‘X’ and ‘Y’ is = + 0.42. Find
the regression lines ‘X’ on ‘Y’ and ‘Y’ on ‘X’. Use the regression lines to find
the value of ‘Y’ when X = 50 and value of ‘X’ when Y = 30.
Solution:
Given the following data:

$$\bar{X} = 40; \quad \bar{Y} = 48; \quad \sigma_x = 10; \quad \sigma_y = 15; \quad r = 0.42$$

The regression line of ‘X’ on ‘Y’ is:

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.42 \times \frac{10}{15} = 0.28$$

$$X - 40 = 0.28\,(Y - 48)$$

$$X = 0.28Y + 26.56$$

The regression line of ‘Y’ on ‘X’ is:

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.42 \times \frac{15}{10} = 0.63$$

$$Y - 48 = 0.63\,(X - 40)$$

$$Y = 0.63X + 22.8$$

Therefore,
when Y = 30; X = 0.28(30) + 26.56; X = 34.96
when X = 50; Y = 0.63(50) + 22.8; Y = 54.3
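
When only the means, standard deviations and ‘r’ are known, as in Solved Problem 13, both regression lines follow directly from the formulas for byx and bxy. The sketch below illustrates this; the helper name `regression_line` is an assumption made for this example.

```python
def regression_line(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Return (slope, intercept) of the line estimating dep from ind."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind
    return slope, intercept


# Summary figures from Table 12.10
mean_x, mean_y = 40, 48
sd_x, sd_y = 10, 15
r = 0.42

b_yx, a_yx = regression_line(mean_y, mean_x, sd_y, sd_x, r)  # Y on X
b_xy, a_xy = regression_line(mean_x, mean_y, sd_x, sd_y, r)  # X on Y

y_when_x_50 = a_yx + b_yx * 50
x_when_y_30 = a_xy + b_xy * 30

print(round(y_when_x_50, 2), round(x_when_y_30, 2))
```

Note that each estimate uses its own line: Y is estimated from the line of Y on X, and X from the line of X on Y.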

12.10 Standard Error of Estimate


The standard error of estimate measures the accuracy of the estimated
figures in regression analysis. The smaller the standard error of estimate,
the closer the estimates provided by the regression equation are to the
actual values. If the standard error of estimate is zero, there is no variation
about the regression line and the correlation is perfect.
The standard error of estimate is the square root of the average of the
squared deviations between the actual values and the estimated values
based on the regression equation.


The standard error of estimate of X values from Y is:

$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{N}}$$

The standard error of estimate of Y values from X is:

$$S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{N}}$$

where Xc and Yc are the estimated values of the X and Y variables from the
lines of regression of X on Y and Y on X respectively.
The following simpler formulae are used for calculating Sxy and Syx:

$$S_{xy} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{N}}$$

$$S_{yx} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{N}}$$

where a and b are the constants of the corresponding regression equation
(X = a + bY for Sxy and Y = a + bX for Syx).
To make the standard error an unbiased estimate of the actual variability of
the X or Y values, we divide by (N – 2) instead of N:

$$S_{xy} = \sqrt{\frac{\sum (X - X_c)^2}{N - 2}}$$

$$S_{yx} = \sqrt{\frac{\sum (Y - Y_c)^2}{N - 2}}$$
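
The definition and the simpler formula for the standard error of estimate of Y from X can be compared numerically. The sketch below reuses the husband and wife ages of Table 12.9 purely as sample data; the variable names are assumptions made for this illustration, and the line Y = a + bX is fitted by least squares.

```python
import math

# Sample data: ages from Table 12.9 (X = husband, Y = wife)
x = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
y = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n

# Least squares line of Y on X: Y = a + b X
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

# Definition: root of the mean squared deviation of actual from estimated Y
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(residual_ss / n)

# Simpler formula using only the sums of Y^2, Y and XY
sum_y2 = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
s_yx_shortcut = math.sqrt((sum_y2 - a * sum(y) - b * sum_xy) / n)

# Version with N - 2 in the denominator
s_yx_unbiased = math.sqrt(residual_ss / (n - 2))

print(round(s_yx, 4), round(s_yx_shortcut, 4), round(s_yx_unbiased, 4))
```

The first two values agree, which shows that the simpler formula is an algebraic rearrangement of the definition; the third is slightly larger because of the smaller divisor.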

12.11 Multiple Regression Analysis


Multiple regression analysis is an extension of two variable regression
analysis. In this analysis, two or more independent variables are used to
estimate the values of a dependent variable, instead of one independent
variable.
Objectives of multiple regression analysis are:
• To derive an equation which provides estimates of the dependent
variable from values of the two or more independent variables
• To obtain the measure of the error involved in using the regression
equation as a basis of estimation
• To obtain a measure of the proportion of variance in the dependent
variable accounted for or explained by the independent variables
Multiple regression equation explains the average relationship between the
given variables and the relationship is used to estimate the dependent
variable. Regression equation refers to the equation for estimating a
dependent variable. Estimating dependent variable Y from the independent
variables X1, X2……, is known as regression equation of Y on X1, X2……….
Let the dependent variable be Y which depends on two independent
variables X1 and X2
The linear relationship among Y, X1 and X2 can be expressed in the form of
the regression equation of Y on X1 and X2 in the form:
Y= b0 + b1 X1 + b2 X2
Where b0 is referred to as intercept and b1 & b2 are known as regression
coefficients.
The values of b0, b1 & b2 can be determined by solving the normal equations:

$$\sum Y_i = N b_0 + b_1 \sum X_{1i} + b_2 \sum X_{2i}$$

$$\sum X_{1i} Y_i = b_0 \sum X_{1i} + b_1 \sum X_{1i}^2 + b_2 \sum X_{1i} X_{2i}$$

$$\sum X_{2i} Y_i = b_0 \sum X_{2i} + b_1 \sum X_{1i} X_{2i} + b_2 \sum X_{2i}^2$$

The values of b0, b1 & b2 are estimated with the help of the Principle of Least
Squares.
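
The three normal equations can be solved mechanically. The sketch below does so with Cramer's rule in plain Python; the data are hypothetical, constructed so that Y = 2 + 3X1 + X2 holds exactly, and the function names `det3` and `fit_two_predictors` are assumptions made for this illustration.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_two_predictors(x1, x2, y):
    """Solve the normal equations for Y = b0 + b1*X1 + b2*X2."""
    n = len(y)

    def sum_prod(u, v):
        return sum(p * q for p, q in zip(u, v))

    # Coefficient matrix and right-hand side of the three normal equations
    lhs = [[n,       sum(x1),          sum(x2)],
           [sum(x1), sum_prod(x1, x1), sum_prod(x1, x2)],
           [sum(x2), sum_prod(x1, x2), sum_prod(x2, x2)]]
    rhs = [sum(y), sum_prod(x1, y), sum_prod(x2, y)]

    d = det3(lhs)
    coefficients = []
    for col in range(3):
        replaced = [row[:] for row in lhs]
        for row in range(3):
            replaced[row][col] = rhs[row]  # Cramer's rule: swap in the RHS
        coefficients.append(det3(replaced) / d)
    return tuple(coefficients)


# Hypothetical data satisfying Y = 2 + 3*X1 + 1*X2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 + 3 * a + b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(b0, b1, b2)
```

With real data the fitted plane will not pass through every point, and the determinant `d` becomes zero when X1 and X2 are perfectly correlated, in which case the coefficients cannot be determined.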
12.11.1 Application of Multiple Regression
Multiple regression analysis can be applied, for example, to test how far factors
such as export elasticity, import elasticity, and structural change (contribution
of the manufacturing sector towards GDP) influence employment. Here,
employment is the dependent variable.
Similarly, researchers can use multiple regression in their own research work
wherever it is appropriate.

Self Assessment Questions


3. State whether the following statements are ‘True’ or ‘False’.
i. Correlation coefficient is a geometric mean between regression
coefficients.
ii. The regression lines always intersect at $(\bar{X}, \bar{Y})$.
iii. $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$
iv. The greater the angle between the regression lines, the lower is
the correlation coefficient.

12.12 Summary
Let us recapitulate the important concepts discussed in this unit:
 When two or more variables move in sympathy with the other, then they
are said to be correlated. If both variables move in the same direction,
then they are said to be positively correlated. If the variables move in the
opposite direction, then they are said to be negatively correlated. If they
move haphazardly, then there is no correlation between them.
 Regression helps us to study unknown variables with the help of known
variables. It also establishes a reliability measure for estimated values.
 Regression analysis helps to quantify the dependence of one variable
on the other. Some of the regression types are simple and multiple
regressions, linear and non linear regression.
 Regression analysis is useful in business and economic scenarios in the
decision making process.

12.13 Glossary
Correlation: When two or more variables move in sympathy with the other,
then they are said to be correlated.
Correlation coefficient: Critical statistic which indicates the direction and
intensity of a relationship between two continuous variables. Its values
range from -1 through 0 to +1. Significance can be determined via
statistical testing. Both parametric and nonparametric correlation coefficients
are possible.
Coefficient of variation: A relative measure of variation, expressed as a
percentage; useful in comparing the variability of data sets with different
units of measure.

Range: A nonresistant measure of variation in data, reported either as the
minimum and maximum values or as the difference between these two
values. The range in a large representative sample should approach the
domain of the variable of interest.
Regression: Technique to produce mathematical models of causal
relationships between and among variables. Regression models are used to
describe and potentially predict outcomes.

12.14 Terminal Questions


1. Table 12.11 depicts the marks obtained by 10 students in commerce
and statistics. Calculate the rank correlation.
Table 12.11: Marks of Students Obtained in Commerce and Statistics
Marks in Statistics 35 90 70 40 95 45 60 85 80 50
Marks in Commerce 45 70 65 30 90 40 50 75 85 60

2. Calculate Spearman’s rank correlation coefficient between the series A
and B depicted in table 12.12.
Table 12.12: Series Data of Terminal Question 2
Series A 57 59 62 63 64 65 55 58 57
Series B 113 117 126 126 130 129 111 116 112

3. For the data in table 12.13, obtain the two lines of regression and
estimate the blood pressure when the age is 50 yrs.
Table 12.13: Data for Terminal Question 3
Age in yrs (X) 56 42 72 39 63 47 52 49 40 42 68 60
B P (Y) 127 112 140 118 129 116 130 125 115 120 135 133

4. Table 12.14 depicts the results that were worked out from scores in
statistics and mathematics in a certain examination.
Table 12.14: Results of Scores in Statistics and Mathematics Examination
Scores in Statistics Scores in Mathematics
(X) (Y)
Mean 39.5 47.5
Standard Deviation 10.8 17.8


Karl Pearson’s correlation coefficient between X and Y = 0.42. Find both the
regression lines. Use these lines to estimate the value of Y when X = 50 and
the value of X when Y = 30.

12.15 Answers

Self Assessment Questions


1. i) Refer section 12.7
ii) Refer section 12.7
iii) Refer section 12.7
2. i) True ii) False iii) False
3. i) True ii) True iii) True iv) True

Terminal Questions
1. 0.903
2. 0.967
3. X = – 95 + 1.184Y
Y = 87.2 + 0.724X
4. X = 27.62 + 0.25Y
Y = 20.24 + 0.69X

12.16 Case Study


India is ranked 126th in the Human Development Index (HDI) among the 177
countries for which data is compiled, as per the report released during
November 2006 and published in Hindustan Times dated November 10,
2006. HDI depends on indicators such as life expectancy, literacy, and per
capita income. Use appropriate correlation and regression analysis to
prepare a report on the basis of the given data.


Table 12.15: Human Development Index Table
Country | HDI Rank | Life Expectancy | Adult Literacy Rate (% ages 15 & older) | School Enrolment (%) | GDP Per Capita | Human Poverty Index Rank | Population (2004, millions) | Urban Population (%)
Norway | 1 | 79.6 | NA | 100 | 38,454 | Nil | 4.6 | 77.3
Iceland | 2 | 80.9 | NA | 96 | 33,051 | Nil | 0.3 | 92.7
USA | 8 | 77.5 | NA | 93 | 39,676 | Nil | 295.4 | 80.5
Thailand | 74 | 70.3 | 92.6 | 74 | 8,090 | 19 | 63.7 | 32.0
China | 81 | 71.9 | 90.9 | 70 | 5,896 | 26 | 1,308 | 22.0
Srilanka | 93 | 74.3 | 90.7 | 63 | 4,390 | 38 | 20.6 | 15.2
India | 126 | 63.3 | 61.0 | 62 | 3,139 | 55 | 1,087.1 | 28.5

By calculating the rank correlation, find out which of the indicators, viz.
life expectancy, literacy, and GDP, affects the HDI to the maximum extent.
To what extent does the life expectancy in a nation depend on the percentage
of its urban population?
(Source: Srivastava, T. N. and Rejo, S. (2008) Statistics for Management, 5th
edition, TMH)

References
 Agarwal, B. L. (2006) Basic Statistics, 4th Edition, New Age International
Publishers.
 Bowerman, B. L. and Connel, R.T. O., (1996) Applied Statistics:
Improving Business Processes, Irwin.
 Levin, R. I., Rubin, D. S. (2008), Statistics for Management, 7th Edition,
PHI Learning Private Limited.
 Pisani, F. D. R., and Purves, R. (1997), Statistics, 3rd edition, W.W
Norton.
 Srivastava, T. N. and Rejo, S. (2008) Statistics for Management, 5th
edition, TMH.
 Tanur,J. M., (2002), Statistics: A Guide to the unknown, 4th
edition,Brooks/cole.


 Tukey, J.W, (1977), Exploratory Data Analysis, Addison–Wesley.


 Wilcox, R. R. (2009) Basic Statistics – Understanding Conventional
Methods and Modern Insights, Oxford University Press.

E-Reference
 http://www.textbooksonline.tn.nic.in/Books/11/Stat-EM/Chapter-1.pdf



Statistics for Management Unit 13

Unit 13 Business Forecasting


Structure:
13.1 Introduction
Objectives
13.2 Business Forecasting
Objectives of Forecasting in business
Prediction, Projection and Forecasting
Characteristics of Business Forecasting
Steps in forecasting
13.3 Methods of Business Forecasting
Business barometers
Time series analysis
Extrapolation
Regression analysis
Modern econometric methods
Exponential smoothing methods
13.4 Theories of Business Forecasting
Sequence or time-lag theory
Action and reaction theory
Economic rhythm theory
Specific historical analogy
Cross-cut analysis theory
13.5 Utility of Business Forecasting
Advantages of business forecasting
Limitations of business forecasting
13.6 Summary
13.7 Glossary
13.8 Terminal Questions
13.9 Answers
13.10 Case Study

13.1 Introduction
In the previous unit, we studied correlation and regression techniques, which
are used for investigating the relationship between two or more variables. In
this unit we will discuss business forecasting, the methods available in
forecasting, and the use of forecasting models in business improvement
processes.
Growing competition, rapidly changing circumstances and the trend towards
automation demand that decisions in business be based on a careful analysis
of data concerning the future course of events and not purely on guesses and
hunches. The future is unknown to us, and yet every day we are forced to
make decisions involving the future; therefore, there is uncertainty. Great risk
is associated with business affairs. All businessmen are forced to make
forecasts regarding business activities. Success in business depends upon
successful forecasts of business events. In recent times, considerable
research has been conducted in this field, and attempts are being made to
make forecasting as scientific as possible.
Business forecasting is not a new development. Every businessman must
forecast, even if the entire product is sold before production. Forecasting
has always been necessary. What is new is the attempt to put forecasting
on a scientific basis, that is, to forecast by reference to past history and
statistics rather than by pure intuition and guess-work.
One of the most important tasks before businessmen and economists these
days is to make estimates for the future. For example, a businessman is
interested in finding out his likely sales next year or, for long term planning, in
the next five or ten years, so that he can adjust his production accordingly and
avoid both inadequate production to meet the demand and unsold stocks.
Similarly, an economist is interested in estimating the likely population in the
coming years so that proper planning can be carried out with regard to jobs
for the people, food supply, etc. The first step in making estimates for the
future consists of gathering information from the past. In this connection we
usually deal with statistical data which is collected, observed or recorded at
successive intervals of time. Such data is generally referred to as a time
series. Thus, when we observe numerical data at different points of time, the
set of observations is known as a time series.
Objectives:
After studying this unit, you should be able to:
• describe the meaning of business forecasting
• distinguish between prediction, projection and forecast
• describe the forecasting methods available
• apply the forecasting theories in taking effective business decisions

13.2 Business Forecasting


Business forecasting refers to the analysis of past and present economic
conditions with the object of drawing inferences about probable future
business conditions. The process of making definite estimates of the future
course of events is referred to as forecasting, and the figures or statements
obtained from the process are known as ‘forecasts’. The future course of
events is rarely known. An organised system of forecasting helps to
anticipate the coming course of events. The following are two aspects of
scientific business forecasting:
1. Analysis of past economic conditions
For this purpose, the components of time series are to be studied. The
secular trend shows how the series has been moving in the past and what
its future course is likely to be over a long period of time. The cyclic
fluctuations would reveal whether the business activity is subjected to a
boom or depression. The seasonal fluctuations would indicate the seasonal
changes in the business activity.
2. Analysis of present economic conditions
The object of analysing present economic conditions is to study those
factors which affect the sequential changes expected on the basis of the
past conditions. Such factors are new inventions, changes in fashion,
changes in economic and political spheres, economic and monetary policies
of the government, war, etc. These factors may affect and alter the duration
of trade cycle. Therefore, it is essential to keep in mind the present
economic conditions since they have an important bearing on the probable
future tendency.
13.2.1 Objectives of forecasting in business
Forecasting is a part of human nature. Businessmen also need to look to
the future. Success in business depends on correct predictions. In fact, when
a man enters business, he automatically takes on the responsibility for
attempting to forecast the future.
To a very large extent, success or failure would depend upon the ability to
successfully forecast the future course of events. Without some element of
continuity between past, present and future, there would be little possibility
of successful prediction. History is not likely to repeat itself exactly, and we
would hardly expect economic conditions next year or over the next 10 years
to follow a clear cut pattern. Yet, past patterns prevail sufficiently to justify
using the past as a basis for predicting the future.
A businessman cannot afford to base his decisions on guesses. Forecasting
helps a businessman in reducing the areas of uncertainty that surround
management decision making with respect to costs, sales, production,
profits, capital investment, pricing, expansion of production, extension of
credit, development of markets, increase of inventories and curtailment of
loans. These decisions are to be based on present indications of future
conditions.
However, we know that it is impossible to forecast the future precisely.
Some range of error is always possible in a forecast. Statistical forecasting
methods use the mathematical theory of probability to measure the risk of
error in predictions.
13.2.2 Prediction, Projection and Forecasting
A great amount of confusion seems to have grown up in the use of the words
‘forecast’, ‘prediction’ and ‘projection’.

Key Statistic
A prediction is an estimate based solely on past data of the series under
investigation. It is purely a statistical extrapolation.
A projection is a prediction, where the extrapolated values are subject to
certain numerical assumptions.
A forecast is an estimate which relates the series in which we are
interested to external factors.

Forecasts are made by estimating the future values of the external factors by
means of prediction, projection or forecast, and from these values calculating
the estimate of the dependent variable.


13.2.3 Characteristics of Business Forecasting
• Based on past and present conditions
Business forecasting is based on the past and present economic conditions
of the business. To forecast the future, data, information and facts
concerning the past and present economic conditions of the business are
analysed.
• Based on mathematical and statistical methods
The process of forecasting includes the use of statistical and mathematical
methods. By using these methods, the trend likely to take place in future can
be forecast.
• Period
Forecasts can be made for the long term, short term, medium term or any
specific period.
• Estimation of future
Business forecasting estimates the probable economic conditions of the
future.
• Scope
Forecasting can be physical as well as financial.
13.2.4 Steps in forecasting
Forecasting of business fluctuations consists of the following steps:
1. Understanding why changes in the past have occurred
One of the basic principles of statistical forecasting is that the forecaster
should use past performance data. The current rate and changes in the rate
constitute the basis of forecasting. Once they are known, various
mathematical techniques can develop projections from them. If an attempt is
made to forecast business fluctuations without understanding why past
changes have taken place, the forecast will be purely mechanical.
Forecasts based solely upon the application of mathematical formulae are
subject to serious error.
2. Determining which phases of business activity must be measured
After understanding the reasons of occurrence of business fluctuations, it is
necessary to measure certain phases of business activity in order to predict
what changes will probably follow the present level of activity.


3. Selecting and compiling data to be used as measuring devices
There is an interdependent relationship between the selection of statistical
data and the determination of why business fluctuations occur. Statistical data
cannot be collected and analysed in an intelligent manner unless there is
sufficient understanding of business fluctuations. It is important that the
reasons for business fluctuations be stated in such a manner that it is possible
to secure data that is related to those reasons.
4. Analysing the data
Lastly, the data is analysed in the light of the understanding of why changes
occur. For example, if it is reasoned that a certain combination of forces
will result in a given change, the statistical part of the problem is to measure
these forces from the data available and to draw conclusions on the future
course of action. The methods of drawing conclusions may be called
forecasting techniques.

13.3 Methods of Business Forecasting


Almost all businessmen make forecasts about the conditions affecting their
business. In recent years, scientific methods of forecasting, which are based
on statistics, have been developed to handle the increasing variety of
managerial forecasting problems. Forecasting techniques vary from simple
expert guesses to complex analysis of mass data. Each technique has its
special use, and care must be taken to select the correct technique for a
particular situation.
Before applying a method of forecasting, the following questions should be
answered:
• What is the purpose of the forecast and how is it to be used?
• What are the dynamics and components of the system for which the
forecast will be made?
• How important is the past in estimating the future?
The following are the main methods of business forecasting.
1. Business barometers
2. Time series analysis
3. Extrapolation
4. Regression analysis
5. Modern econometric methods
6. Exponential smoothing method
13.3.1 Business Barometers
Business indices are constructed to study and analyse business activities,
and future conditions are predicted on their basis. As business indices are
indicators of future conditions, they are also known as ‘business
barometers’ or ‘economic barometers’. With the help of these business
barometers, the trend of fluctuations in business conditions is understood,
and a decision relating to the problem can be taken by forecasting.
A business barometer is constructed from series such as gross national
product, wholesale prices, consumer prices, industrial production, stock
prices, bank deposits etc. These quantities may be converted into relatives
on a certain base. The relatives so obtained may be weighted and their
average computed.
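The construction described above (relatives on a base, then a weighted average) can be sketched as follows. This is only an illustration: the component series, their figures and the weights are hypothetical, and the function name composite_index is an assumption made for this sketch, not something prescribed by the text.

```python
def composite_index(current, base, weights):
    """Weighted average of relatives, with the base period taken as 100."""
    # Step 1: convert each series into a relative on the chosen base.
    relatives = {name: 100.0 * current[name] / base[name] for name in current}
    # Step 2: weight the relatives and compute their average.
    total_weight = sum(weights[name] for name in current)
    return sum(weights[name] * relatives[name] for name in current) / total_weight

# Hypothetical figures for three component series (not taken from the text).
base = {"industrial_production": 200.0, "wholesale_prices": 50.0, "bank_deposits": 400.0}
current = {"industrial_production": 220.0, "wholesale_prices": 60.0, "bank_deposits": 420.0}
weights = {"industrial_production": 5, "wholesale_prices": 3, "bank_deposits": 2}

index = composite_index(current, base, weights)
print(round(index, 2))
```

Here the three relatives are 110, 120 and 105, so the weighted barometer stands at 112 against a base of 100. The choice of weights is a matter of judgement and strongly affects the result.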
There are three types of business barometers. They are barometers for:
1. General business activities
2. Specific business or industry
3. Individual business firm
1. Barometers relating to general business activities
Barometers relating to general business activities are also known as general
indices of business activity; they are weighted or composite indices of the
individual indices of business activities. With the help of a general index of
business activity, long term trends and cyclical fluctuations in the economic
activities of a country are measured. However, in some specific cases, the
long term trends can be different from general trends. These types of
indices help in the formation of a country’s economic policies.
2. Business barometers for specific business or industry
These barometers are used to supplement the general index of business
activity and are constructed to measure future variations in a specific
business or industry.


3. Business barometers concerning an individual business firm
This type of barometer is constructed to measure the expected variations in
a specific firm within an industry.
Table 13.1 depicts the merits and demerits of Business Barometers.
Table 13.1: Merits and Demerits of Business Barometers

Merits | Demerits
The business barometer method is scientific and reliable and is used by management for the purpose of various business decisions at different levels. | It is very difficult to construct indices of business activities.
The business barometer method helps in forecasting future trends of a business. | In most cases, business barometers provide inaccurate, incomplete and inconclusive forecasts because the index numbers are prepared on the basis of incorrect and inadequate data.
Business barometers are the indicators of future business trends and help to forecast the speed of fluctuations. | Business barometers are the indicators of past conditions and the forecasting based on these conditions may be erroneous.
This method helps to find solutions to various business problems such as development of market, capital investment, exploration of new consumer market etc. | Separate indices are calculated for an individual industry and firm, which are entirely different from general indices.

13.3.2 Time series analysis


Time series analysis is also used for business forecasting. Forecasting
through time series analysis is possible only when business data for several
years are available and reflect a definite trend and seasonal variation. By
time series analysis, the long term (secular) trend and the seasonal and
cyclical variations are ascertained, analysed and separated from the data of
various years.


Table 13.2 depicts the merits and demerits of time series analysis.
Table 13.2: Merits and Demerits of Time Series Analysis
Merits | Demerits
It is an easy method of forecasting. | This method is expensive, difficult and time consuming.
By this method a comparative study of variations can be made. | This method deals with past data only.
Reliable results of forecasting are obtained as this method is based on a mathematical model. | This method can only be used when the data for several years are available.

13.3.3 Extrapolation
Extrapolation is the simplest method of business forecasting. By
extrapolation, a businessman finds out the possible trend of demand for his
goods and the future price trends. The accuracy of extrapolation
depends on two factors:
• Knowledge about the fluctuations of the figures
• Knowledge about the course of events relating to the problem under
consideration
Thus, extrapolation is based on two assumptions:
1. There is no sudden jump in figures from one period to another
2. There is regularity in fluctuations and the rise and fall is uniform
In extrapolation, we assume that the variable will follow the established
pattern of growth. For the purpose of business forecasting, one needs to
determine accurately the appropriate trend curve and the values of its
parameters.
Some of these trend curves are explained below.
• Arithmetic trend
The straight line arithmetic trend assumes that growth will be a constant
amount each year.
• Semi-log trend
It assumes a constant percentage increase each year. As the annual
increment is constant in logarithm, this line will become a straight line when
drawn on semi-log paper.

• Modified exponential curve
The curve is given by:
$$y = ab^{x}$$
This relationship is referred to as an exponential function. It assumes that
each increment of growth will be a constant percent of the previous one.
• Logistic curve
This curve has both an upper asymptote and a lower asymptote. A curve of
this type is well suited to describe the growth of industries as they pass
through early periods of experimentation, followed by rapid growth as the
product is perfected and economies of scale make price reductions possible.
The equation of the curve is given by:
$$y = \frac{1}{ab^{x} + g} \quad \text{or} \quad ab^{x} + g = \frac{1}{y}$$
• Gompertz curve
It is given by:
$$y = ab^{c^{x}}$$
In the logarithmic form, it is given by:
$$\log y = \log a + c^{x} \log b$$
To decide which curve to use, it is helpful to obtain a scatter diagram of the
transformed variable.
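To get a feel for how these trend curves behave, the sketch below evaluates three of them at a few points. It assumes the forms y = a·b^x (exponential), y = 1/(a·b^x + g) (logistic) and y = a·b^(c^x) (Gompertz). All parameter values are hypothetical and were chosen only so that both S-shaped curves level off at 100.

```python
import math

def exponential(x, a, b):
    """Exponential trend y = a * b**x (constant percentage growth when b > 1)."""
    return a * b ** x

def logistic(x, a, b, g):
    """Logistic trend y = 1 / (a * b**x + g)."""
    return 1.0 / (a * b ** x + g)

def gompertz(x, a, b, c):
    """Gompertz trend y = a * b**(c**x)."""
    return a * b ** (c ** x)

# Hypothetical parameters; print period, exponential, logistic and Gompertz values.
for x in range(0, 21, 5):
    print(x,
          round(exponential(x, 10, 1.1), 2),
          round(logistic(x, 0.09, 0.5, 0.01), 2),
          round(gompertz(x, 100, 0.1, 0.5), 2))

# The logarithmic form of the Gompertz curve is linear in c**x:
# log y = log a + (c**x) * log b
x = 3
lhs = math.log(gompertz(x, 100, 0.1, 0.5))
rhs = math.log(100) + (0.5 ** x) * math.log(0.1)
print(abs(lhs - rhs) < 1e-9)
```

With 0 < b < 1, the logistic curve approaches 1/g as x grows. With 0 < b < 1 and 0 < c < 1, the Gompertz curve approaches a. The exponential curve has no upper limit, which is why it is rarely suitable for long-range extrapolation of a maturing product.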
Table 13.3 depicts the merits and demerits of extrapolation method.
Table 13.3: Merits and Demerits of Extrapolation Method

Merits | Demerits
This method is very useful to forecast the future demand and production. | This method can be used under its own assumptions only.
This method is widely used for the forecasting of business events. | This method is not simple but technical, because of its mathematical formulation.
We get pure and reliable results by this method, because it is a mathematical method. | The selection of the trend curve is very difficult.


13.3.4 Regression analysis


The regression approach offers many valuable contributions to the solution
of the forecasting problem. It is the means by which we select, from among
the many possible relationships between variables in a complex economy,
those which will be useful for forecasting.
A regression relationship may involve one predicted or dependent variable
and one independent variable (simple regression), or it may involve
relationships between the variable to be forecast and several independent
variables (multiple regression).
Statistical techniques to estimate the regression equations are often fairly
complex and time-consuming. However, there are many computer programs
now available that estimate simple and multiple regressions quickly.
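As a minimal sketch of simple regression used for forecasting, the code below fits a least-squares line with one independent variable and uses it to predict the dependent variable. The advertising and sales figures are hypothetical, and the function name fit_simple_regression is an assumption made for this illustration.

```python
def fit_simple_regression(xs, ys):
    """Least-squares line y = intercept + slope * x for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (independent) and sales (dependent).
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_regression(advertising, sales)
forecast = intercept + slope * 6  # predicted sales when advertising spend is 6
print(round(intercept, 2), round(slope, 2), round(forecast, 2))
```

For these figures the fitted line is approximately y = 2.2 + 0.6x. Multiple regression follows the same idea with several independent variables, but in practice it is estimated with statistical software rather than by hand.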
13.3.5 Modern econometric methods
Econometric techniques, which originated in the eighteenth century, have
recently gained popularity for forecasting. Econometrics refers to the
application of mathematical economic theories and statistical procedures to
economic data to verify economic theorems. Models take the form of a set
of simultaneous equations. The values of the constants in such equations
are supplied by a study of statistical time series, and a large number of
equations may be necessary to produce an adequate model.
Presently, short-term forecasting uses only statistical methods with little
qualitative information. However, in the years to come when most large
companies develop and refine econometric models of their major business,
this tool of forecasting will become more popular.
Table 13.4 depicts the merits and demerits of modern econometric methods.
Table 13.4: Merits and Demerits of Modern Econometric Methods

Merits | Demerits
Accurate and reliable results are obtained under this method. | This method is difficult and complicated.
It is a scientific method where computer technology is used. | This method can be used only when an adequate series of data is available.
This method explains in detail and in quantitative terms the way in which various aspects of the economy are interrelated. | It is very difficult to construct a growth model for every business activity.


13.3.6 Exponential smoothing method


This method is often regarded as the best method of business forecasting as
compared to other methods. Exponential smoothing is a special kind of
weighted average in which exponentially greater weights are assigned to
more recent observations. It is found extremely useful in short-term
forecasting of inventories and sales.
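A minimal sketch of simple exponential smoothing follows. It assumes the common update rule: new forecast = old forecast + alpha × (actual − old forecast), where alpha is the smoothing constant. It also assumes that the first forecast is set equal to the first actual value. The sales figures and the value alpha = 0.5 are hypothetical.

```python
def exponential_smoothing(actuals, alpha):
    """Return the forecast for each period plus one forecast for the next period.

    Assumption: the forecast for the first period equals the first actual value.
    """
    forecast = actuals[0]
    forecasts = [forecast]
    for actual in actuals:
        # Correct the previous forecast by a fraction alpha of its error.
        forecast = forecast + alpha * (actual - forecast)
        forecasts.append(forecast)
    return forecasts

# Hypothetical weekly sales and smoothing constant.
sales = [20, 24, 22, 26]
alpha = 0.5
forecasts = exponential_smoothing(sales, alpha)
next_period_forecast = forecasts[-1]
print(forecasts)

# Weights implied for the latest, second-latest and third-latest observations.
weights = [alpha * (1 - alpha) ** k for k in range(3)]
print(weights)
```

The implied weights fall geometrically as observations get older, which is why the latest data influence the forecast most. A larger alpha makes the forecast react faster to recent changes, while a smaller alpha gives a smoother series.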

Selection of different methods of forecasting


The selection of an appropriate forecasting method depends on many
factors, such as:
• Context of the forecast
• Relevance and availability of historical data
• Degree of accuracy desired
• Time period for which forecasts are required
• Cost benefit of the forecast to the company
• Time available for making the analysis
The forecaster should use a technique that makes the best use of
available data. Where a company wishes to forecast with reference to a
particular product, it must consider the stage of the product’s life cycle.

13.4 Theories of Business Forecasting


There are a few theories that are followed while making business forecasts.
Some of them are:
1. Sequence or time-lag theory
2. Action and reaction theory
3. Economic rhythm theory
4. Specific historical analogy
5. Cross-cut analysis theory
13.4.1 Sequence or time-lag theory
This is the most important theory of business forecasting. It is based on the
assumption that most business data have lead and lag relationships, that
is, changes in business are successive and not simultaneous. There is a
time-lag between different movements.


Example 1
When government makes use of deficit financing, it leads to inflationary
pressures; the purchasing power of people goes up. Therefore, the
wholesale prices and retail prices start rising. With the rise in retail prices,
the cost of living goes up and with it there is a demand for increased
wages. Thus, one factor, that is, more money in circulation, has affected
various fields of economic activity not simultaneously but successively.
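The lead and lag relationship in this example can be sketched numerically. The code below shifts one series against another and reports the lag at which the two are most strongly correlated. The two series (money_supply and retail_prices) are made-up figures, with retail prices constructed to follow money supply two periods later. The function names are assumptions made for this sketch.

```python
def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def best_lag(leading, lagging, max_lag):
    """Correlate leading[t] with lagging[t + lag] for lag = 0..max_lag."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = leading[:len(leading) - lag]
        ys = lagging[lag:]
        scores[lag] = correlation(xs, ys)
    return max(scores, key=scores.get), scores

# Hypothetical series: retail prices respond to money supply after two periods.
money_supply = [100, 104, 103, 110, 115, 113, 120, 128, 126, 135]
retail_prices = [99.0, 99.5] + [50 + 0.5 * m for m in money_supply[:-2]]

lag, scores = best_lag(money_supply, retail_prices, max_lag=3)
print(lag)
```

In real data the relationship is never exact, so the lag with the highest correlation is only a starting point for judgement, not proof of cause and effect.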
Table 13.5 depicts the merits and demerits of sequence or time-lag theory.
Table 13.5: Merits and Demerits of Sequence or Time-lag Theory
Merits | Demerits
This method is largely used for business forecasting. | This method studies only the action and not the reaction.
Though this theory is based on statistical techniques, it is easy to understand. | This method cannot be regarded as accurate because results obtained by statistical techniques can come close to the truth but are not exact.
The time-interval between two events can be ascertained. |
Government can use this technique for the purpose of economic stability of the economy by exercising control over possible losses. |

13.4.2 Action and reaction theory


This theory is based on the following two assumptions.
• Every action has a reaction
• Magnitude of the original action influences the reaction
For example, when the price of rice goes above a certain level in a certain
period, there is a likelihood that after some time it will go down below the
normal level. Thus, according to this theory, a certain level of business
activity is normal, and subnormal or abnormal conditions cannot remain so
for ever. Thus, we find four phases of a business cycle. They are:
1. Prosperity


2. Decline
3. Depression
4. Improvement
Table 13.6 depicts the merits and demerits of Action and Reaction theory.
Table 13.6: Merits and Demerits of Action and Reaction Theory
Merits | Demerits
This theory is better than other theories. | The determination of the normal level is very difficult.
By this theory more reliable results can be obtained because this theory gives attention to action and reaction of an event. | It is not necessary that reaction is equal to the action.

13.4.3 Economic Rhythm Theory


The basic assumption of this theory is that history repeats itself; hence it
assumes that all economic and business events behave in a rhythmic order.
According to this theory, the speed and time of all business cycles are more
or less the same and, by using statistical and mathematical methods, a trend
is obtained which represents a long term tendency of growth or decline. This
is done on the assumption that the trend line denotes the normal growth or
decline of business events.
Table 13.7 depicts the merits and demerits of economic rhythm theory.
Table 13.7: Merits and Demerits of Economic Rhythm Theory

Merits | Demerits
Forecasts are made on the basis of past conditions, hence they are more reliable. | Business events are not strictly periodic and prediction of the business cycle on the basis of statistical methods is not satisfactory.
This method is helpful in long-term forecasting. | Past conditions are given more weightage than the present conditions.


13.4.4 Specific historical analogy


‘History repeats itself’ is the main foundation of this theory. If conditions
are the same, whatever happened in the past under a set of circumstances is
likely to happen in future also. A time series relating to the data in question
is thoroughly scrutinised, and a period is selected in which conditions were
similar to those prevailing at the time of making the forecast. However, this
theory depends largely on past data. Table 13.8 depicts the merits and
demerits of specific historical analogy.
Table 13.8: Merits and Demerits of Specific Historical Analogy

Merits | Demerits
It is an easy method. | In this theory, forecasting is based on guess work, not on a scientific method, because the past and present conditions are rarely found to be similar.
As the future is forecast on the basis of past business conditions, the forecasting is more reliable. | It is very difficult to select a past period with the same business conditions as the present.

13.4.5 Cross-cut analysis theory


This theory proceeds on the analysis of the interplay of current economic
forces. In this method, the combined effects of various factors are not
studied; the effect of each factor is studied independently. Under this theory,
forecasting is made on the basis of analysis and interpretation of present
conditions, on the assumption that past events have no relevance to present
conditions. Table 13.9 depicts the merits and demerits of cross-cut analysis
theory.
Table 13.9: Merits and Demerits of Cross-cut Analysis Theory

Merits | Demerits
Present conditions are preferred to past conditions. | Independent analysis of individual facts is very difficult.
The effect of each factor is studied independently. | Past facts are equally important for the purpose of forecasting, but in this method no importance is given to past facts.
The forecast is nearer to accuracy as it is based on present conditions. | The forecasting made on the basis of this technique cannot be regarded as reliable.


13.5 Utility of Business Forecasting


Business forecasting occupies an important place in every field of the
economy. Business forecasting helps businessmen and industrialists to
form the policies and plans related to their activities. On the basis of
forecasting, businessmen can estimate the demand for the product, the price
of the product, the condition of the market and so on. Business decisions can
also be reviewed on the basis of business forecasting.
13.5.1 Advantages of business forecasting
• Helpful in increasing profit and reducing losses
Every business is carried out with the purpose of earning maximum profits.
So, by forecasting the future price of the product and its demand, the
businessman can predetermine the production cost, the volume of
production and the level of stock to be maintained. Thus, business
forecasting is regarded as the key to success in business.
• Helpful in taking management decisions
Business forecasting provides the basis for management decisions,
because in present times the management has to take decisions in an
atmosphere of uncertainty. Also, business forecasting explains the future
conditions and enables the management to select the best alternative.
• Useful to administration
On the basis of forecasting, the government can control the circulation of
money. It can also modify the economic, fiscal and monetary policies to
avoid adverse effects of trade cycles. So, with the help of forecasting, the
government can control the expected fluctuations in future.
• Basis for capital market
Business forecasting helps in estimating the requirement of capital, the
position of the stock exchange and the nature of investors.
• Useful in controlling the business cycles
Trade cycles cause various disturbances in business such as sudden
changes in the price level, increase in the risk of business, increase in
unemployment, etc. By adopting systematic business forecasting,
businessmen and government can handle and control the adverse effects of
trade cycles.

• Helpful in achieving the goals
Business forecasting helps to achieve business goals through proper
planning of business improvement activities.
• Facilitates control
By business forecasting, the tendency towards black marketing, speculation,
uneconomic activities and corruption can be controlled.
• Utility to society
With the help of business forecasting the entire society is also benefited
because the adverse effects of fluctuations in the conditions of business are
kept under control.
13.5.2 Limitations of business forecasting
Business forecasting cannot be accurate due to various limitations which
are mentioned below.
• Forecasting cannot be accurate, because it is largely based on future
events and there is no guarantee that they will happen.
• Business forecasting is generally made by using statistical and
mathematical methods. However, these methods cannot claim to make
an uncertain future a definite one.
• The underlying assumptions of business forecasting cannot be satisfied
simultaneously. In such a case, the results of forecasting will be
misleading.
• Forecasting cannot guarantee the elimination of errors and mistakes.
The managerial decision will be wrong if the forecasting is done in a
wrong way.
• Factors responsible for economic changes are often difficult to discover
and measure. Hence, business forecasting becomes an unnecessary
exercise.
• Business forecasting does not evaluate risks.
• Forecasting is made on the basis of past information and data and
relies on the assumption that economic events are repeated under the
same conditions. But there may be circumstances where these
conditions are not repeated.
• Forecasting is not a one-time exercise. In order to be effective, it
requires continuous attention.

Self Assessment Questions


State whether the following statements are ‘True’ or ‘False’.
1. Forecast is an estimate based solely on past data of the series under
investigation.
2. In time series analysis method a comparative study of variations can be
made.
3. In exponential smoothing, old observations are given increasing
exponential weightage.

Activity
1. Which of the following is not a forecasting model?
i) Trend method
ii) End-use method
iii) Correlation method
iv) Exponential method
2. The basic assumption in the linear trend method of forecasting is:
i) Rate of growth is constant from year to year
ii) Absolute growth is constant from year to year
iii) Rate of change is constant from year to year
iv) Change in absolute growth is constant from year to year
3. Which of the following methods is most suited for forecasting of
capital goods and machinery?
i) Trend
ii) Correlation & Regression
iii) End-use method
iv) Time series analysis
4. The answer to which of the following questions is not related to a
planning or budgeting exercise?
i) Where are we?
ii) How did we reach here?
iii) Why did we reach here?
iv) Where ought we to reach?
Solution
1. iv) Exponential method
2. ii) Absolute growth is constant from year to year
3. iii) End-use method
4. iv) Where ought we to reach?


13.6 Summary
Let us recapitulate the important concepts discussed in this unit:
• Business forecasting refers to the analysis of past and present economic
conditions with the object of drawing inferences about probable future
business conditions.
• To forecast the future, various data, information and facts concerning the
past and present economic condition of business are analysed.
• Business forecasting helps businessmen and industrialists to form
the policies and plans related to their activities. On the basis of
forecasting, businessmen can estimate the demand for the product, the
price of the product, the condition of the market and so on.
• The following are the main methods of business forecasting: Business
barometers, Time series analysis, Extrapolation, Regression analysis,
Modern econometric methods, Exponential smoothing method.
• There are a few theories that are followed while making business
forecasts. Some of them are: Sequence or time-lag theory, Action and
reaction theory, Economic rhythm theory, Specific historical analogy,
Cross-cut analysis theory.

13.7 Glossary
Arithmetic trend: The straight line arithmetic trend assumes that growth will
be a constant amount each year.
Estimation of future: The business forecasting is to forecast the future
regarding probable economic conditions.
Extrapolation: Extrapolation is the simplest method of business
forecasting. By extrapolation, a businessman finds out the possible trend of
demand for his goods and the future price trends.
Exponential smoothing: A type of moving average technique applied to
time series data for forecasting.
Period: The forecasting can be made for long term, short term, medium
term or any specific period.
Semi-log trend: It assumes a constant percentage increase each year. As
the annual increment is constant in logarithm, this line will become a straight
line when drawn on semi-log paper.

Smoothing constant: Weight attached to the difference between the actual
and forecast values in the exponential smoothing method.

13.8 Terminal Questions


1. What is business forecasting?
2. Explain the objectives of business forecasting.
3. Explain the steps involved in forecasting.
4. Explain the characteristics of business forecasting.
5. Differentiate between prediction, projection and forecasting.
6. Describe the limitations of business forecasting.
7. Explain the main methods of business forecasting.
8. Critically examine the important theories of business forecasting.

13.9 Answers

Self Assessment Questions


1. False
2. True
3. False
Terminal Questions
1. Refer section 13.2
2. Refer section 13.2.1
3. Refer section 13.2.4
4. Refer section 13.2.3
5. Refer section 13.2.2
6. Refer section 13.5.2
7. Refer section 13.3
8. Refer section 13.4

13.10 Case Study


The following table gives the weekly sales of black and colour cartridges by
a computer accessories store for the past 7 weeks.

Table 13.10: Weekly Sales of Cartridges


Week | Black cartridge sales | Colour cartridge sales
1 | 60 | 67
2 | 50 | 57
3 | 70 | 47
4 | 60 | 50
5 | 50 | 60
6 | 55 | 70
7 | 45 | 60
Prepare a weekly forecast for the next 5 weeks for both types of cartridges
and justify the method used.



Statistics for Management Unit 14

Unit 14 Time Series Analysis


Structure:
14.1 Introduction
Objectives
Relevance
14.2 Time Series Analysis
14.3 Utility of the Time Series
14.4 Components of Time Series
Long term trend or secular trend
Seasonal variations
Cyclic variations
Random variations
14.5 Methods of Measuring Trend
Free hand or graphic method
Semi averages method
Method of moving averages
Method of least squares
14.6 Mathematical Models for Time Series
Additive model
Multiplicative model
14.7 Editing of Time Series
14.8 Measurement of Seasonal Variation
Simple average method
Ratio to moving averages method
Chain or link relative method
Ratio to trend method
14.9 Forecasting Methods Using Time Series
Mean forecast
Naive forecast
Linear trend forecast
Non-linear trend forecast
Forecasting with exponential smoothing
14.10 Summary
14.11 Glossary
14.12 Terminal Questions
14.13 Answers
14.14 Case Study

14.1 Introduction
In the previous unit ‘Business Forecasting’, you have studied about the
ways of forecasting business events successfully. You also studied about
the different methods available for forecasting. In this unit, you will study
about the time series analysis and different components of time series. You
will also study about the forecasting methods using time series.
A time series is a set of numerical values of a given variable listed at
successive intervals of time, which means that, data regarding the variable
is listed in chronological order. Usually, the interval of time is taken as
uniform.
Yearly production of wheat in the country, hourly temperature of a city,
bimonthly electricity bills are all examples of time series. Almost all data like
industrial production, agricultural production, exports, imports, dairy
products can be arranged in chronological order.
Objectives:
After studying this unit, you should be able to
• analyse the time series
• describe different components of time series
• describe the forecasting methods
• apply time series analysis in business scenarios
14.1.1 Relevance
The 1990s brought a heightened awareness of, and an increased concern over,
pollution in various forms in the United States. Air pollution is one of the
main areas of environmental concern. The U.S. Environmental Protection
Agency (EPA) monitors the quality of air around the country. Some of the air
pollutants monitored include carbon monoxide emissions, nitrogen oxide
emissions, volatile organic compounds, sulphur dioxide emissions, etc. The
substances in these pollutants can cause cancer and respiratory problems.
If data is given for a 15-year period (1985-1999), then the question is
whether the air quality in the U.S. has been improving or deteriorating over
time.
Managerial and statistical questions:
1. Is it possible to forecast the emissions of carbon monoxide or nitrogen
oxides for the years 2004-2007, or for 2020, using the available data?
2. What techniques best forecast the emissions of carbon monoxide or
nitrogen oxides in the future?

These questions are best answered by time series analysis.

14.2 Time Series Analysis


Given a time series, we need to study the forces that influence the
variations in the time series and the behaviour of the phenomenon over the
given period of time. For example, consider the sales of TV sets (in
thousands) by a producing company. Table 14.1 depicts the sales data of TV
sets sold from 1995 to 2000.
Table 14.1: Sales Data of TV Sets Sold From 1995 to 2000
Year | 1995 | 1996 | 1997 | 1998 | 1999 | 2000
Number of TV sets sold (in thousands) | 12 | 14 | 16 | 12 | 10 | 18

Let us analyse the above data and identify some trends in the sales.
For example, the company would like to know why sales dropped in 1998
and 1999 and why they increased in 2000. In other words, the company
would like to analyse the various forces that affect the sales.
There can be changes in the values of the variable recorded at different
points of time due to various forces. Analysing the effect of all such forces
on the values of the variable is generally known as the analysis of time
series. Broadly, the following are the four types of changes in the values of
the variable:
i) Changes which generally occur due to general tendency of the data to
increase or decrease
ii) Changes which occur due to change in climate, weather conditions
and festivals
iii) Changes which occur due to booms and depressions
iv) Changes which occur due to some unpredictable forces like floods,
famines and earthquakes

14.3 Utility of the Time Series


The following are the possible uses of the time series:
i. The comparative study of behaviour of the variable over different
periods of time can be done. The variable may be export figures,
quantity of industrial production etc.


ii. Forecasting can be done using the time series. By studying the
variations and other behaviour of the variables over a sufficiently long
period of time, it may be possible to forecast the future behaviour of the
variables. However, such a forecast has meaning only if the period of
forecast is a normal period. For example, various five-year plans by the
government of India are formulated by studying the time series and
forecasting.
iii. Study of the time series helps in analysing the past behaviour of the
variables. This helps in identifying the various forces that affect their
behaviour.

14.4 Components of Time Series


The behaviour of a time series over periods of time is called the movement
of the time series. The time series is classified into the following four
components:
i) Long term trend or secular trend
ii) Seasonal variations
iii) Cyclic variations
iv) Random variations
i) Long term trend or secular trend
This refers to the smooth or regular long term growth or decline of the
series. This movement can be characterised by a trend curve. If this curve is
a straight line, then it is called a trend line. If the variable increases over a
long period of time, then it is called an upward trend. If the variable
decreases over a long period of time, then it is called a downward trend. If
the variable moves upward or downward along a straight line then the trend
is called a linear trend, otherwise it is called a non-linear trend.

ii) Seasonal variations


Variations in a time series that are periodic in nature and occur regularly
over short periods of time during a year are called seasonal variations.
These variations follow a regular pattern and can therefore be forecast.
The following are examples of seasonal variations in a time series.
i. The prices of vegetables drop down after rainy season or in winter
months and they go up during summer, every year.


ii. The prices of cooking oils reduce after the harvesting of oil seeds and
go up after some time.

iii) Cyclic variations


The long-term oscillations that represent a consistent rise and decline in the
values of the variable are called cyclic variations. Since these are long-term
oscillations in the time series, the period of oscillation is usually greater than
one year. The oscillations take place about a trend curve or a trend line. The
period of one cycle is the time-distance between two successive peaks or two
successive troughs.
iv) Random variations
Random variations are also called irregular movements. These are movements
that usually occur over brief periods of time, follow no pattern and are
unpredictable in nature. They do not have any regular period or time of
occurrence. Examples are the effects of national strikes, floods and
earthquakes. It is very difficult to study the behaviour of such movements
in a time series.

14.5 Methods of Measuring Trend


We will study the following methods of measuring the trend of a time series:
i. Free hand or graphic methods
ii. Semi averages method
iii. Moving average method
iv. Method of least squares
14.5.1 Free hand or graphic method
This is the simplest method of drawing a trend curve. We plot the values of
the variable against time on a graph paper and join these points. The trend
line is then fitted by inspecting the graph of the time series. Fitting a trend
line by this method is arbitrary. The trend line is drawn such that the
numbers of fluctuations on either side are approximately the same. The
trend line should be a smooth curve.
The free hand method has the following disadvantages:
i. It depends on individual judgement.
ii. It cannot be used for any predictions of trends as drawing the trend
curve is arbitrary


Solved problem 1
Find the trend with the help of free hand curve method for the data depicted
in table 14.2
Table 14.2: Production Data from 1991 to 2001
Year Production Data (in Lakh ton)
1991 15
1992 18
1993 16
1994 22
1995 19
1996 24
1997 20
1998 28
1999 22
2000 30
2001 26

Solution: Figure 14.1 depicts free hand curve of the production data versus
the time period. In the graph, we have taken production data values on
Y-axis and values of time on X-axis.

Fig. 14.1: Free Hand Curve for Solved Problem 1


14.5.2 Semi Averages method


The procedure for fitting a linear trend by the semi averages method
is as follows:
i. When the number of years is even, then the data of the time series is
divided into two equal parts. The total in each of the part is calculated
and then divided by the number of items to obtain arithmetic means of
the two parts. Each average is then centred in the period of time from
which it has been computed and plotted on the graph paper. A straight
line is drawn passing through these points. This is the required trend
line.
ii. When the number of years is odd, then the value of the middle year is
omitted to divide the time series into two equal parts. Then the
preceding procedure is followed.
Solved Problem 2
The figures given below show the export of sugar from India for the years
1971 to 1980. Determine the trend of the following by using semi averages
method.
Table 14.3: Export of Sugar from India from 1971 to 1980
Years Exports (in lakh tonnes)
1971 3.9
1972 1.3
1973 1.1
1974 4.4
1975 9.4
1976 9.6
1977 3.4
1978 2.5
1979 8.6
1980 2.9

The trend of the export of sugar from India can be found by the semi
averages method as follows:
Here the number of years is 10 i.e., it is even. The series is divided into two
halves consisting of the first five years and last five years.


The average of the first five years = (3.9 + 1.3 + 1.1 + 4.4 + 9.4) / 5 = 4.02
The average of the last five years = (9.6 + 3.4 + 2.5 + 8.6 + 2.9) / 5 = 5.4
Plot the data on a graph paper and join them by dotted lines as shown in the
previous example. Plot the average value of the first half series against the
year 1973 and the average of the last five years against 1978 on the same
graph. On joining these two points, we get the trend line.
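The two semi averages above can be checked with a short Python sketch. The function name `semi_average_trend` and the idea of reporting the slope of the joining line are illustrative assumptions; the text itself only asks for the two points to be plotted and joined.

```python
# Semi averages trend line for the sugar export data of Table 14.3.
years = list(range(1971, 1981))
exports = [3.9, 1.3, 1.1, 4.4, 9.4, 9.6, 3.4, 2.5, 8.6, 2.9]  # lakh tonnes


def semi_average_trend(years, values):
    """Return the two semi-average points and the slope of the line joining them."""
    n = len(values)
    half = n // 2  # with an odd number of years the middle year is left out
    first = slice(0, half)
    second = slice(n - half, n)
    # Each average is centred on the middle of the period it was computed from.
    x1 = sum(years[first]) / half
    y1 = sum(values[first]) / half
    x2 = sum(years[second]) / half
    y2 = sum(values[second]) / half
    slope = (y2 - y1) / (x2 - x1)  # change in the trend value per year
    return (x1, y1), (x2, y2), slope


(x1, y1), (x2, y2), slope = semi_average_trend(years, exports)
print(f"first half:  {y1:.2f} plotted against {x1:.0f}")
print(f"second half: {y2:.2f} plotted against {x2:.0f}")
print(f"change per year along the trend line: {slope:.3f}")
```

The printed points match the text: 4.02 against 1973 and 5.40 against 1978.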
The merits and demerits of semi averages method are depicted in
table 14.4.
Table 14.4: Merits and Demerits of Semi averages Method
Merits:
i. The semi averages method is simple.
ii. The trend line can be extended on either side in order to obtain
past or future estimates.
iii. This is an objective method, as anyone applying this method gets
the same trend line.
Demerits:
i. The semi averages method assumes a straight line relationship between
the plotted points, regardless of whether such a relationship exists or not.
ii. This method has the inbuilt limitation of the arithmetic mean. It is
not suitable when there are very low or very high extreme values.
iii. There is no assurance that the influence of cycles is eliminated.

14.5.3 Method of Moving Averages


Moving averages method is used for smoothing the time series. It
smoothens the fluctuations of the data.
When period of Moving Average is odd:
The procedure to determine the trend by this method is depicted in
figure 14.2.
By plotting these trend values (if desired) you can obtain the trend curve,
with the help of which, you can determine the increasing or decreasing
trend. If needed, you can also compute short-term fluctuations by
subtracting the trend values from the actual values.


Fig. 14.2: Procedure for Determining the Trend when Moving Average is Odd

Solved Problem 3
Calculate the 3 yearly Moving Averages of the data depicted in table 14.5.
Table 14.5: Production Data from 1988 to 1996
Year                        1988  1989  1990  1991  1992  1993  1994  1995  1996
Production (in Lakh ton)      21    22    23    25    24    22    25    27    26


Solution: Table 14.5a depicts the calculated values of 3 yearly averages.

Table 14.5a: Calculated Values of 3 Yearly Moving averages


Year    Production Y     3-yearly        3-yearly moving    Short term
        (in Lakh ton)    moving total    average Yc         fluctuation (Y - Yc)
1988    21               -               -                  -
1989    22               66              22.00              0.00
1990    23               70              23.33              -0.33
1991    25               72              24.00              1.00
1992    24               71              23.67              0.33
1993    22               71              23.67              -1.67
1994    25               74              24.67              0.33
1995    27               78              26.00              1.00
1996    26               -               -                  -
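The 3-yearly moving averages and short term fluctuations for the data of Table 14.5 can be reproduced with the following sketch. The function name `moving_average_odd` is an illustrative assumption; the logic is simply "average each value with its neighbours and place the result against the middle year".

```python
# 3-yearly moving averages for the production data of Table 14.5.
years = list(range(1988, 1997))
production = [21, 22, 23, 25, 24, 22, 25, 27, 26]


def moving_average_odd(values, period):
    """Moving average for an odd period, placed against the middle time point."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    trend = [None] * len(values)  # no trend value at either end of the series
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        trend[i] = sum(window) / period
    return trend


trend = moving_average_odd(production, 3)
# Short term fluctuation = actual value minus trend value (Y - Yc).
fluctuation = [None if yc is None else y - yc for y, yc in zip(production, trend)]

for year, y, yc, f in zip(years, production, trend, fluctuation):
    if yc is None:
        print(year, y, "-", "-")
    else:
        print(year, y, f"{yc:.2f}", f"{f:.2f}")
```

The first and last years have no trend value, which is the second demerit listed for this method.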

When period of moving averages is even:


When period of moving averages is even (such as four years), we compute
the moving averages by using the steps depicted in figure 14.3.


Fig. 14.3: Procedure for Determining the Trend when Moving Average is Even


Table 14.6 depicts the merits and demerits of the moving averages method.

Table 14.6: Merits and Demerits of Moving Averages Method

Merits:
i. This is a simple method.
ii. This method is objective in the sense that anybody working on a
problem with this method will get the same results.
iii. This method is used for determining seasonal, cyclic and irregular
variations besides the trend values.
iv. This method is flexible enough to add more figures to the data
because the entire calculations are not changed.
v. If the period of moving averages coincides with the period of cyclic
fluctuations in the data, such fluctuations are automatically eliminated.
Demerits:
i. There is no functional relationship between the values and time. Thus,
this method is not helpful in forecasting and predicting the values on the
basis of time.
ii. There are no trend values for some years in the beginning and some in
the end.
iii. In case of non-linear trend, the values obtained by this method are
biased in one or the other direction.
iv. The period selection of moving average is a difficult task. Hence, great
care has to be taken in period selection, particularly when there is no
business cycle during that time.

Solved problem 4
The following table gives the average monthly production
(in thousands) of new passenger cars in the period 1976-1985. Calculate
four yearly moving averages.
Table 14.6: Average Monthly Production of Cars
Year Average monthly production of new cars
1976 708
1977 767
1978 764
1979 702
1980 533
1981 521
1982 421
1983 562
1984 635
1985 667

Solution:
The following table depicts four yearly moving averages.
Table 14.6a: Four yearly moving averages

(1)     (2)                (3)             (4)                (5)             (6)
Year    Average monthly    4 yearly        4 yearly moving    2 item          Centered 4 yearly
        production of      moving total    average            moving total    moving average
        new cars                           (3) / 4                            (5) / 2
1976    708
1977    767
                           2941            735.25
1978    764                                                   1426.75         713.375
                           2766            691.5
1979    702                                                   1321.5          660.75
                           2520            630
1980    533                                                   1174.25         587.125
                           2177            544.25
1981    521                                                   1053.5          526.75
                           2037            509.25
1982    421                                                   1044            522
                           2139            534.75
1983    562                                                   1106            553
                           2285            571.25
1984    635
1985    667
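The centring step for an even period can be sketched in Python as follows. The function name `centred_moving_average` is an illustrative assumption; the calculation follows columns (3) to (6) of Table 14.6a.

```python
# Centred 4-yearly moving averages for the car production data (1976-1985).
years = list(range(1976, 1986))
production = [708, 767, 764, 702, 533, 521, 421, 562, 635, 667]


def centred_moving_average(values, period):
    """Moving average for an even period, centred on actual time points."""
    if period % 2 != 0:
        raise ValueError("this sketch handles even periods only")
    # Plain moving averages: each one falls midway between two time points.
    ma = [sum(values[i:i + period]) / period
          for i in range(len(values) - period + 1)]
    centred = [None] * len(values)
    # Averaging two adjacent moving averages centres the result on a time point.
    for i in range(len(ma) - 1):
        centred[i + period // 2] = (ma[i] + ma[i + 1]) / 2
    return centred


centred = centred_moving_average(production, 4)
for year, c in zip(years, centred):
    print(year, "-" if c is None else c)
```

Trend values are obtained only for 1978 to 1983; two years are lost at each end of the series.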

14.5.4 Method of least squares


In this method, the trend curve is determined by fitting a mathematical
equation. This method is more accurate and precise and can be used even
for forecasting. We can fit either a straight line or a parabolic curve from the
given data by this method.

Key Statistic
Let ‘Y’ be the actual values and ‘Yc’ be the computed values of ‘Y’ for
a given value of ‘X’.
Let ‘Y = a + bX’ be the straight line to be fitted for the trend. Finding the
values of ‘a’ and ‘b’ such that the sum of squares of the differences between
the actual and computed values of ‘Y’ is least, that is,
$\sum (Y - Y_c)^2$ is least,
where the condition
$\sum (Y - Y_c) = 0$ is satisfied,
is known as the method of least squares. The line obtained by this method is
known as the ‘line of best fit’.
For a given time series, to find a linear trend, the values of ‘a’ and ‘b’
are obtained by solving the normal equations:
$\sum Y = Na + b \sum X$
$\sum XY = a \sum X + b \sum X^2$
where N is the number of pairs for which data are given. Here ‘a’ is the
intercept of the line on the Y-axis and ‘b’ is the slope of the line. ‘b’ is also
known as the growth rate (if b > 0) or decline rate (if b < 0). ‘b’ gives the
change in the value of ‘Y’ per unit change in the value of ‘X’.
1. Direct method
The procedure to be followed is described below.
i) Convert the years into natural numbers (1, 2, 3, ...), denote them by ‘X’
and find $\sum X$.
ii) Find the squares of the ‘X’ values and obtain $\sum X^2$.
iii) Multiply the X-values by the corresponding Y-values and obtain $\sum XY$.
iv) Add the values of Y to obtain $\sum Y$.
v) Put these values in the two normal equations and solve for ‘a’ and ‘b’.
vi) Substitute these values of ‘a’ and ‘b’ in ‘Y = a + bX’ and then find trend
values for various values of ‘X’.
2. Short cut method
The calculations are simplified when the mid-point in time is taken as the
origin, so that $\sum X = 0$.

When $\sum X = 0$, the normal equations reduce to:

$\sum Y = Na$, therefore $a = \frac{\sum Y}{N}$

$\sum XY = b \sum X^2$, therefore $b = \frac{\sum XY}{\sum X^2}$


Solved problem 5
The production of pig iron and ferro alloys in thousand metric tons in India is
as given below. Find the trend line by the method of least squares.
Table 14.7: Production data
Year Production
1976 672
1977 824
1978 967
1979 1204
1980 1464
1981 1758
1982 2057
Solution:
The trend line can be fitted by using the method of least squares for the
given data.
Table 14.7a: Calculation for trend line

Year     Production Y    X = Year - 1979    XY            X²
1976     672             -3                 -2016         9
1977     824             -2                 -1648         4
1978     967             -1                 -967          1
1979     1204            0                  0             0
1980     1464            1                  1464          1
1981     1758            2                  3516          4
1982     2057            3                  6171          9
Total    ∑Y = 8946       ∑X = 0             ∑XY = 6520    ∑X² = 28

When $\sum X = 0$, the normal equations reduce to:

$\sum Y = Na$, therefore $a = \frac{\sum Y}{N} = \frac{8946}{7} = 1278$

$\sum XY = b \sum X^2$, therefore $b = \frac{\sum XY}{\sum X^2} = \frac{6520}{28} \approx 232.9$

The estimated equation of the trend line, with X = Year - 1979, is given by

$Y = a + bX = 1278 + 232.9X$
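The short cut method can be written as a few lines of Python. This is a minimal sketch for the pig iron data of Table 14.7; the function name `fit_linear_trend` is an assumption made for illustration. It takes the mean year as the origin so that the X-values sum to zero, which is what the short cut method requires.

```python
# Least squares trend line by the short cut method (origin at the mid-point).
years = list(range(1976, 1983))
production = [672, 824, 967, 1204, 1464, 1758, 2057]  # thousand metric tons


def fit_linear_trend(years, values):
    """Return (origin, a, b) for the trend line Y = a + b * (year - origin)."""
    n = len(years)
    origin = sum(years) / n                # mid-point in time, so that sum(X) = 0
    x = [year - origin for year in years]
    a = sum(values) / n                    # a = sum(Y) / N
    b = (sum(xi * yi for xi, yi in zip(x, values))
         / sum(xi * xi for xi in x))       # b = sum(XY) / sum(X^2)
    return origin, a, b


origin, a, b = fit_linear_trend(years, production)
trend = [a + b * (year - origin) for year in years]

print(f"Y = {a:.0f} + {b:.1f} X, with X = Year - {origin:.0f}")
for year, y, yc in zip(years, production, trend):
    print(year, y, round(yc, 1))
```

For an even number of years the origin falls between two years and the X-values become half-integers; the fitted line is the same, only the unit of X differs from the half-year convention some textbooks use.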


The merits and demerits of the method of least squares are displayed in
table 14.8.
Table 14.8: Merits and Demerits of Direct Method of Least Squares

Merits:
i. This method is a completely objective method.
ii. This method gives the trend values for the entire time period.
iii. This method can be used to forecast future trend because the trend
line establishes a functional relationship between the value and the time.
Demerits:
i. It requires many calculations and is tedious and complicated.
ii. If even a single item is added to the series, a new equation has to be
formed.
iii. Future forecasts made by this method are based only on trend values.
Seasonal, cyclical or irregular variations are ignored.

Non-linear trend
When the time series data do not conform to a linear trend, we fit a
non-linear trend. We do so by fitting a parabolic curve or another
non-linear curve by the method of least squares. For this we use an equation
of the form
$Y = a + bX + cX^2 + dX^3 + \dots + kX^n$
which is known as a polynomial of degree ‘n’ in ‘X’, k ≠ 0.
Let the parabolic curve be
$Y = a + bX + cX^2$
The values of a, b and c can be determined by solving the normal
equations:

$\sum Y = Na + b \sum X + c \sum X^2$
$\sum XY = a \sum X + b \sum X^2 + c \sum X^3$
$\sum X^2 Y = a \sum X^2 + b \sum X^3 + c \sum X^4$
If we shift the origin to a suitable point such that $\sum X = 0$ (for equally
spaced periods this also gives $\sum X^3 = 0$), then the normal equations
reduce to:

$\sum Y = Na + c \sum X^2$
$\sum XY = b \sum X^2$
$\sum X^2 Y = a \sum X^2 + c \sum X^4$
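The reduced normal equations for a parabolic trend can be solved directly. The sketch below is an illustration only: the text does not work a parabolic example, so the pig iron data of Table 14.7 are reused here as an assumption, and the function name `fit_parabolic_trend` is hypothetical.

```python
# Parabolic trend Y = a + bX + cX^2 with the origin at the mid-point,
# so that sum(X) = 0 and sum(X^3) = 0.
x = [-3, -2, -1, 0, 1, 2, 3]                       # X = Year - 1979
y = [672, 824, 967, 1204, 1464, 1758, 2057]        # data reused from Table 14.7


def fit_parabolic_trend(x, y):
    """Solve the reduced normal equations for a, b and c."""
    if sum(x) != 0 or sum(v ** 3 for v in x) != 0:
        raise ValueError("origin must be chosen so that sum(X) = sum(X^3) = 0")
    n = len(x)
    sx2 = sum(v ** 2 for v in x)
    sx4 = sum(v ** 4 for v in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))
    b = sxy / sx2                                   # from sum(XY) = b sum(X^2)
    # Remaining two equations in a and c, solved by Cramer's rule:
    #   sum(Y)     = N a         + c sum(X^2)
    #   sum(X^2 Y) = a sum(X^2)  + c sum(X^4)
    det = n * sx4 - sx2 * sx2
    a = (sy * sx4 - sx2 * sx2y) / det
    c = (n * sx2y - sx2 * sy) / det
    return a, b, c


a, b, c = fit_parabolic_trend(x, y)
print(f"Y = {a:.2f} + {b:.2f} X + {c:.2f} X^2")
```

Note that `b` is the same as in the linear fit, while `a` changes because part of the level is now carried by the `c` term.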

Self Assessment Questions


1. State ‘True’ or ‘False’
i) ‘The prices of cooking oils reduce after the harvesting of oil seeds
and go up after some time’ is an example of cyclic variations in a
time series.
ii) The effect of national strikes, floods, earthquakes are examples of
random variations in time series.

14.6 Mathematical Models for Time Series


The following two models are commonly used for the decomposition of a
time series into its components:
• Additive model
• Multiplicative model
Most time series relating to economic and business phenomena
conform to the multiplicative model. In practice, the additive model is rarely
used.
14.6.1 Additive Model

Key Statistic
The additive model assumes that the observed value is the sum of four
components of time series, that is,
Y=T+S+C+I
where,
Y = original data
T = trend value
S = seasonal component
C = cyclical component
I = irregular component
The additive model for decomposition of time series assumes that all the
four components of the time series operate independently of one another. It
also assumes that the behaviour of components is additive in character. It is
to be noted that only absolute values are added or deducted from the trend
value to arrive at the observed value.


14.6.2 Multiplicative model

Key Statistic
The multiplicative model assumes that the observed value is obtained by
multiplying the trend (T) by the rates of three other components, that is,
Y = T × S × C × I
where,
Y = original data
T = trend value
S = seasonal component
C = cyclical component
I = irregular component
The multiplicative model assumes that the components, although due to
different causes, are not necessarily independent and they can affect one
another. It also assumes that the behaviour of components is of
multiplicative character. It is to be noted that except the value of trend, all
the other values on the right hand side are rates or index numbers.
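The difference between the two models can be seen in a tiny numerical sketch. All component values below are hypothetical and chosen only for illustration: in the additive case S, C and I are absolute amounts, while in the multiplicative case they are rates around 1 (that is, index numbers divided by 100).

```python
# Composing an observed value from its components under both models.
def additive_value(t, s, c, i):
    """Y = T + S + C + I, with S, C and I in the same units as T."""
    return t + s + c + i


def multiplicative_value(t, s, c, i):
    """Y = T x S x C x I, with S, C and I expressed as rates."""
    return t * s * c * i


# Hypothetical components for a single period (not taken from the text).
y_add = additive_value(200, 20, -10, 5)
y_mul = multiplicative_value(200, 1.10, 0.95, 1.02)

# Removing the seasonal component: subtract it (additive) or divide by it
# (multiplicative).
deseasonalised_add = y_add - 20
deseasonalised_mul = y_mul / 1.10

print(y_add, deseasonalised_add)
print(round(y_mul, 2), round(deseasonalised_mul, 2))
```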

14.7 Editing of Time Series


It is necessary to make certain adjustments in the available data. Some
important adjustments are:
1. Time variation
When data is available on a monthly basis, the effect of time variation needs
to be adjusted because all months of the year do not have the same number
of days. This adjustment is done by dividing each monthly total by the number
of days in that month to obtain a daily average. The daily average is then
multiplied by 365 / 12, which is the average number of days in a month.
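A minimal sketch of this adjustment in Python is given below. The monthly totals are hypothetical, and the function name `adjust_for_month_length` is an assumption; the standard library function `calendar.monthrange` is used only to look up the number of days in a month.

```python
import calendar


def adjust_for_month_length(total, year, month):
    """Convert a monthly total to a month of average length (365 / 12 days)."""
    days_in_month = calendar.monthrange(year, month)[1]
    daily_average = total / days_in_month
    return daily_average * 365 / 12


# Hypothetical totals: both months run at 10 units per day, but the raw
# totals differ only because January has 31 days and February has 28.
january = adjust_for_month_length(310, 2023, 1)
february = adjust_for_month_length(280, 2023, 2)

print(round(january, 2), round(february, 2))
```

After the adjustment both months show the same value, so the difference in the raw totals is seen to be due to month length alone.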
2. Population changes
Adjustment for population change becomes necessary when a variable is
affected by change in population. When we study national income figures,
such adjustment is necessary. In this case, adjustment is to divide the
income by the number of persons concerned. Then, we can have per capita
income figures.


3. Price changes
Adjustment for price changes becomes necessary wherever we want to study
changes in real value. Current values are deflated by dividing them by the
ratio of current prices to base year prices.
4. Comparability
The data being analysed should be comparable in order to reach a valid
conclusion. The analysis of time series involves data relating to the past,
which must be homogeneous and comparable.
Therefore, efforts should be made to make the data as homogeneous and
comparable as possible.

14.8 Measurement of Seasonal Variation


In order to isolate and identify seasonal variations, we first eliminate the
effect of trend, cyclic variations and irregular fluctuations on the time series.
The main methods of measuring seasonal variations are:
• Simple average method
• Ratio to moving averages method
• Chain or link relative method
• Ratio to trend method
Now we will discuss separately each of the methods of measuring seasonal
variations.
14.8.1 Simple Average Method
In the simple average method, the steps followed are described below:
i) The time series is arranged by years and months or quarters.
ii) Totals of each month or quarter are obtained over all the years.
iii) The average for each month or quarter is obtained. The average may
be mean or median. In general, we take mean if not specified
otherwise.
iv) Taking the average of the monthly or quarterly averages as 100, the
seasonal index for each month or quarter is calculated by the following
formula:
Seasonal Index for a month (or quarter) =
$\frac{\text{Monthly (or quarterly) average for the month (or quarter)}}{\text{Average of monthly (or quarterly) averages}} \times 100$


Symbolically, the seasonal index for the first term is given by:
$I_1 = \frac{S_1}{\bar{S}} \times 100$
where, $S_1$ = Average of the first term
$\bar{S}$ = Average of all terms = $\frac{\sum S_j}{k}$, where j = 1, 2, 3, 4, ..., k
k = 12 for monthly data
k = 4 for quarterly data
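The steps of the simple average method can be sketched in Python. The quarterly figures used here are those of Table 14.11 (Terminal Question 9); the function name `simple_average_indices` is an illustrative assumption.

```python
# Simple average method applied to the quarterly data of Table 14.11.
data = {
    1995: [3.7, 4.1, 3.3, 3.5],
    1996: [3.7, 3.9, 3.6, 3.6],
    1997: [4.0, 4.1, 3.3, 3.1],
    1998: [3.3, 4.4, 4.0, 4.0],
}


def simple_average_indices(rows):
    """Seasonal index per season = season average / average of averages x 100."""
    k = len(rows[0])                       # number of seasons (4 for quarters)
    n_years = len(rows)
    # Steps ii and iii: total and mean of each quarter over all the years.
    season_avg = [sum(row[s] for row in rows) / n_years for s in range(k)]
    # Step iv: express each average as a percentage of the overall average.
    overall = sum(season_avg) / k
    return [avg / overall * 100 for avg in season_avg]


indices = simple_average_indices(list(data.values()))
print([round(v, 2) for v in indices])
```

The rounded indices agree with the answer quoted for Terminal Question 9: 98.66, 110.74, 95.30 and 95.30.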
The merits and demerits of the simple average method are depicted in
table 14.9.
Table 14.9: Merits and Demerits of Simple Average Method
Merits:
i. This method is the simplest one.
ii. This method is useful where no definite trend exists in the time series.
Demerits:
i. Most economic time series have trends and therefore, the seasonal index
computed by this method is really an index of trends and seasons.
ii. The simple averages method of isolating seasonal fluctuations in time
series is based on the assumption that the series contains only the
seasonal and irregular fluctuations.
iii. This method does not give a true reflection of the normal seasonal
variation. This is because it is obtained from the original data which is
affected by not only seasonal movements but also by the remaining three
components.
iv. The effects of cycles of the original data are not eliminated by the
process of averaging.

14.8.2 Ratio to moving averages method


Ratio to moving averages method is also known as percentage of moving
average method.
The steps involved in the computation of seasonal indices by this method
are described below:
i) The moving averages of the data are computed. If the data is monthly
then 12-monthly moving averages will be computed and if they are
quarterly then 4-quarterly moving averages will be computed. In both
the cases, time periods of moving averages are even. Hence, these
moving averages are to be centred.


ii) Under additive model, from each original value, the corresponding
moving average is deducted to find out short time fluctuations, which is
given as:
Y–T=S+C+I
iii) By preparing a separate table, monthly (or quarterly) short time
fluctuations are added for each month (or quarter) over all the years
and their average is obtained. These averages are known as seasonal
variations for each month or quarter.
iv) If we want to isolate / measure irregular variations, the mean of the
respective month or quarter is deducted from the short time
fluctuations.
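Steps i) to iii) under the additive model can be sketched as follows. As an assumption for illustration, the quarterly figures of the Activity at the end of section 14.9 (1994 to 1998) are used as input; the text does not work this method numerically, so no final figures are quoted here.

```python
# Seasonal variations from centred 4-quarterly moving averages (additive model).
values = [60, 80, 72, 68,        # 1994, quarters I-IV
          68, 104, 100, 88,      # 1995
          80, 116, 108, 96,      # 1996
          108, 152, 136, 124,    # 1997
          160, 184, 172, 164]    # 1998
k = 4                            # quarterly data


def centred_moving_average(values, period):
    """Centred moving average for an even period (None where undefined)."""
    ma = [sum(values[i:i + period]) / period
          for i in range(len(values) - period + 1)]
    centred = [None] * len(values)
    for i in range(len(ma) - 1):
        centred[i + period // 2] = (ma[i] + ma[i + 1]) / 2
    return centred


# Step i: centred moving averages estimate the trend T.
trend = centred_moving_average(values, k)
# Step ii: short time fluctuations, Y - T = S + C + I.
deviations = [None if t is None else y - t for y, t in zip(values, trend)]
# Step iii: average the fluctuations of each quarter over all the years.
seasonal = []
for q in range(k):
    d = [deviations[i] for i in range(q, len(values), k)
         if deviations[i] is not None]
    seasonal.append(sum(d) / len(d))

print([round(s, 2) for s in seasonal])
```

For example, the first centred average is (70 + 72) / 2 = 71 for the third quarter of 1994, so the short time fluctuation for that quarter is 72 - 71 = 1.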
14.8.3 Chain or link relative method
The steps involved in the chain or link relative method are described below.
i) Each quarterly or monthly value is divided by the preceding quarterly
or monthly value and the result is multiplied by 100. These
percentages are known as ‘Link Relatives’ of the seasonal values.
Thus:
Link Relative = $\frac{\text{Current Season Value}}{\text{Previous Season Value}} \times 100$
There will be no link relative corresponding to the first season of the
first year.
ii) The mean of the link relatives for each season is computed over all the
years. The median can also be taken instead of the mean.
iii) These average link relatives are converted into chain relatives. The
chain relative of the first season is taken as 100.
Chain Relative of the current season =
$\frac{\text{Average Link Relative of the current season} \times \text{Chain Relative of the previous season}}{100}$
iv) The second chain relative of the first season is computed on the basis
of the chain relative of the last season:
Second Chain Relative of the first quarter =
$\frac{\text{Average Link Relative of the first quarter} \times \text{Chain Relative of the last quarter}}{100}$


This chain relative may or may not be 100. If it differs from 100, the
difference is due to the secular trend. If it is 100, go to ‘step vi’; if it
is not 100, go to ‘step v’ and then go to ‘step vi’.
v) Compute the difference ‘d’ between the new chain relative of the first
quarter obtained in ‘step iv’ and the chain relative assumed as 100. ‘d’ is
divided by the number of seasons and the resulting figure is multiplied by
1, 2, 3 and the products are deducted respectively from the chain
relatives of the 2nd, 3rd and 4th quarters. These are called corrected
chain relatives.
vi) The seasonal indices are obtained when the corrected chain relatives
are expressed as percentages of their average.
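The six steps can be sketched in Python as follows. As an assumption for illustration, the quarterly data of Table 14.11 are used as input; the text gives no worked figures for this method, so none are quoted here. The function name `link_relative_indices` is hypothetical, and the mean (not the median) is used in step ii.

```python
# Chain or link relative method for quarterly data.
rows = [
    [3.7, 4.1, 3.3, 3.5],   # 1995
    [3.7, 3.9, 3.6, 3.6],   # 1996
    [4.0, 4.1, 3.3, 3.1],   # 1997
    [3.3, 4.4, 4.0, 4.0],   # 1998
]


def link_relative_indices(rows):
    k = len(rows[0])                                  # number of seasons
    flat = [v for row in rows for v in row]           # values in time order
    # Step i: link relatives; there is none for the very first value.
    links = [None] + [flat[t] / flat[t - 1] * 100 for t in range(1, len(flat))]
    # Step ii: mean link relative of each season over all the years.
    mean_links = []
    for s in range(k):
        vals = [links[t] for t in range(s, len(flat), k) if links[t] is not None]
        mean_links.append(sum(vals) / len(vals))
    # Step iii: chain relatives, the first season being taken as 100.
    chain = [100.0]
    for s in range(1, k):
        chain.append(mean_links[s] * chain[s - 1] / 100)
    # Step iv: second chain relative of the first season.
    second_first = mean_links[0] * chain[-1] / 100
    # Step v: spread the difference d evenly over the seasons.
    d = second_first - 100
    corrected = [chain[s] - s * d / k for s in range(k)]
    # Step vi: express the corrected chain relatives as % of their average.
    average = sum(corrected) / k
    return [c / average * 100 for c in corrected]


indices = link_relative_indices(rows)
print([round(v, 2) for v in indices])
```

By construction the indices average 100, so for quarterly data they add up to 400.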
14.8.4 Ratio to trend method
The following steps are considered to determine seasonal indices by this
method:
i) Determine the trend values by the method of least squares.
ii) To find the ratio to trend, divide the original data by the corresponding
trend values and multiply these ratios by 100, that is,
Ratio to Trend = $\frac{\text{Original Data}}{\text{Trend Value}} \times 100$
iii) Calculate the arithmetic mean of the trend ratios obtained in ‘step ii’
for each quarter (or month).
iv) Finally, convert these averages into seasonal indices. Add
all the averages obtained in ‘step iii’ and find their general average.
Seasonal indices are calculated by using the following formula:
Seasonal Index = $\frac{\text{Quarterly Average}}{\text{General Average}} \times 100$

14.9 Forecasting Methods Using Time Series


There are five forecasting methods using time series. They are:
1. Mean forecast
2. Naive forecast
3. Linear trend forecast
4. Non-linear trend forecast
5. Forecasting with exponential smoothing


14.9.1 Mean forecast


It is the simplest method of forecasting, in which, for the time period t, we
forecast the value of the series to be equal to the mean of the series, that is,

$Y_t = \bar{Y}$

In this method the trend and cyclic effects are not taken into account.
14.9.2 Naive forecast
In this method we forecast the value for the time period t to be equal to the
actual value observed in the previous period, that is, time period (t-1). This
is given as:
$Y_t = Y_{t-1}$
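Both forecasts are one-line calculations. The sketch below applies them to the TV sales data of Table 14.1 to forecast the next year (2001); the choice of this data set and the function names are assumptions made for illustration.

```python
# Mean forecast and naive forecast for the period after the last observation.
sales = [12, 14, 16, 12, 10, 18]   # TV sets sold (in thousands), 1995-2000


def mean_forecast(values):
    """Forecast = mean of all observed values."""
    return sum(values) / len(values)


def naive_forecast(values):
    """Forecast = actual value observed in the previous period."""
    return values[-1]


print(round(mean_forecast(sales), 2))   # mean of the six observed years
print(naive_forecast(sales))            # the value observed in 2000
```

The two forecasts differ noticeably here (about 13.67 against 18) because the mean forecast ignores the most recent rise, while the naive forecast relies on it entirely.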

14.9.3 Linear trend forecast


It is given by $Y_t = a + bX$, where X is found from the value of t, and a
and b are constants. This method is based on the least squares method, in
which a linear relationship is obtained between time and the response
value ‘Y’.

14.9.4 Non-linear trend forecast


In this method a non-linear relationship between the time and the response
value is found by the method of least squares. The value of the forecast
‘Yt’ for the time period ‘t’ is given as:

$Y_t = a + bX + cX^2$

where the X-value is calculated from the value of ‘t’, and ‘a’, ‘b’ and ‘c’
are constants.


14.9.5 Forecasting with exponential smoothing
Exponential smoothing is the forecasting method in which the observation
values are constantly updated and used to revise a forecast. As the
observations get older, they get exponentially decreasing weights.
Exponential smoothing is of many types, such as single, double, triple
exponential smoothing.
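Single exponential smoothing can be sketched as follows. The text does not state a formula, so the standard recursion is assumed here: new smoothed value = alpha × latest observation + (1 - alpha) × previous smoothed value. The smoothing constant `alpha = 0.5`, the initialisation with the first observation and the use of the Table 14.1 data are all assumptions made for illustration.

```python
# Single exponential smoothing (standard recursion, assumed; see lead-in).
sales = [12, 14, 16, 12, 10, 18]   # TV sets sold (in thousands), 1995-2000


def exponential_smoothing(values, alpha):
    """Return the smoothed series; its last element is the next-period forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [values[0]]         # assumed start: first observation
    for y in values[1:]:
        # Older observations receive weights that shrink by (1 - alpha) each step.
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


smoothed = exponential_smoothing(sales, alpha=0.5)
print(smoothed)
print("forecast for the next period:", smoothed[-1])
```

A larger `alpha` makes the forecast follow recent observations more closely; with `alpha = 1` the method reduces to the naive forecast.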


Self Assessment Questions


2. Fill in the following blanks.
i) A set of numerical values observed at regular intervals of time is called
_______.
ii) Long term movements in time series are called ______.
iii) Variations that occur within a year are known as _______.
iv) Semi averages method is used to measure _________.
v) Method of moving averages does not show any _____ relationship.

Activity
Find seasonal variations by the ratio to trend method from the data given
below:

Year I Quarter II Quarter III Quarter IV Quarter


1994 60 80 72 68
1995 68 104 100 88
1996 80 116 108 96
1997 108 152 136 124
1998 160 184 172 164

Activity Solution
Year     Yearly    Quarterly      X         X²          XY           Trend
         Total     Average Y                                         Values
1994     280       70             -2        4           -140         64
1995     360       90             -1        1           -90          88
1996     400       100            0         0           0            112
1997     520       130            1         1           130          136
1998     680       170            2         4           340          160
Total              ∑Y = 560       ∑X = 0    ∑X² = 10    ∑XY = 240

Fitting of Linear Trend: Y = a + b X


The values of the constants a and b in the equation of the straight line are
found as follows.
When $\sum X = 0$, the normal equations reduce to:
$\sum Y = Na$, therefore $a = \frac{\sum Y}{N} = \frac{560}{5} = 112$
$\sum XY = b \sum X^2$, therefore $b = \frac{\sum XY}{\sum X^2} = \frac{240}{10} = 24$
Therefore, the equation will be given by: Y = 112 + 24X


Therefore, the quarterly increment is 24/4 = 6.
Now the quarterly trend values can be calculated.
Consider the year 1994, for which the trend value is 64. This value
corresponds to the middle of the year, that is, midway between the second
and third quarters. Therefore the values for the second and third quarters
of 1994 are 64 - (6/2) = 61 and 64 + (6/2) = 67 respectively. The value for
the first quarter of 1994 is 61 - 6 = 55 and for the last quarter
67 + 6 = 73. The values for the other years are calculated similarly.
Year I Quarter II Quarter III Quarter IV Quarter
1994 55 61 67 73
1995 79 85 91 97
1996 103 109 115 121
1997 127 133 139 145
1998 151 157 163 169
The given values of the time series will now be expressed as percentages
of the corresponding trend values given above. Thus, for the first quarter
of 1994, the percentage is (60/55)*100=109.1, second quarter
(80/61)*100= 131.1
Actual Quarterly Values as % of Quarterly Trend Values
Year I Quarter II Quarter III Quarter IV Quarter
1994 109.1 131.1 107.5 93.1
1995 86.1 122.4 109.9 90.7
1996 77.7 106.4 93.9 79.3
1997 85 114.3 97.8 85.5
1998 106.0 117.1 105.5 97.0
Total 463.9 591.3 514.6 445.6
Average 92.78 118.26 102.92 89.12
Seasonal Index 92.0 117.4 102.1 88.4
Adjustment
Since the total of the quarterly averages is 403.08
(= 92.78 + 118.26 + 102.92 + 89.12) rather than 400, each average is
adjusted by multiplying it by (400/403.08). The final seasonal indices
obtained in this way are shown in the last row of the table.
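The whole Activity solution can be reproduced with a short Python sketch. Variable names are illustrative assumptions. The code keeps full precision throughout, so its results agree with the table only to within about 0.1, since the table works with percentages rounded to one decimal place.

```python
# Ratio to trend method for the Activity data (1994-1998).
data = {
    1994: [60, 80, 72, 68],
    1995: [68, 104, 100, 88],
    1996: [80, 116, 108, 96],
    1997: [108, 152, 136, 124],
    1998: [160, 184, 172, 164],
}
years = sorted(data)
n = len(years)

# Yearly trend fitted to the quarterly averages by the short cut method.
mid = sum(years) / n
x = [year - mid for year in years]
y = [sum(data[year]) / 4 for year in years]
a = sum(y) / n
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Quarterly trend values: the yearly value sits mid-year, so the four
# quarters are offset by -1.5, -0.5, +0.5 and +1.5 quarterly increments.
step = b / 4
trend = {year: [a + b * xi + (q - 1.5) * step for q in range(4)]
         for year, xi in zip(years, x)}

# Actual values as percentages of the trend values, averaged per quarter.
ratios = {year: [100 * v / t for v, t in zip(data[year], trend[year])]
          for year in years}
quarter_avg = [sum(ratios[year][q] for year in years) / n for q in range(4)]

# Adjust so that the four indices add up to 400.
general_avg = sum(quarter_avg) / 4
indices = [avg / general_avg * 100 for avg in quarter_avg]

print(a, b, trend[1994])
print([round(v, 1) for v in indices])
```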


14.10 Summary
Let us recapitulate the important concepts discussed in this unit:
• A time series is a set of numerical values of a given variable listed at
successive intervals of time.
• The time series is classified into the following four components: Long
term trend or secular trend, Seasonal variations, Cyclic variations and
Random variations.
• The methods of measuring the trend of a time series are: Free hand or
graphic methods, Semi averages method, Moving average method and
Method of least squares.
• The forecasting methods using time series are: Mean forecast, Naive
forecast, Linear trend forecast, Non-linear trend forecast and Forecasting
with exponential smoothing.

14.11 Glossary
Chain Relative: Ratio of seasonal index for the quarter to the seasonal
index of the previous quarter
Link Relative: Ratio of value of the variable for the quarter to the value of
the variable of the previous quarter.
Random: Changes in data value due to the factors other than trend and
seasonal.
Seasonal: Periodical changes in data values over regular intervals of time.
Time series: A set of observations recorded over a period of time.
Trend: The tendency in the data values either to increase or decrease.

14.12 Terminal Questions


1. What is meant by analysis of time series?
2. State the difference between seasonal variations and cyclical
fluctuations.
3. What is trend? State various methods of measuring it.
4. Explain the moving average method of measuring long term trend.
5. What are the components of time series? Bring out the significance of
moving average in analysing a time series and point out its limitations.


6. What is meant by secular trend? Discuss any two methods of isolating
trend values in a time series.
7. What is seasonal variation of a time series? Describe the various
methods you know to evaluate it and examine their relative merits.
8. Find a straight line trend to the following data and find trend value.
Table 14.10: Yearly Production Data
Year Production in 1000 (kg)
1990 80
1991 90
1992 92
1993 83
1994 94
1995 99
1996 92

9. Find seasonal values for the data in table 14.11.


Table 14.11: Data of Terminal Question 9
Year I Quarter II Quarter III Quarter IV Quarter
1995 3.7 4.1 3.3 3.5
1996 3.7 3.9 3.6 3.6
1997 4.0 4.1 3.3 3.1
1998 3.3 4.4 4.0 4.0

14.13 Answers

Self Assessment Questions


1. i) False
ii) True
2. i) Time series
ii) Secular trend
iii) Seasonal variations
iv) Trend
v) Functional


Terminal Questions
1. Refer section 14.2
2. Refer section 14.4.2 and section 14.4.3
3. Refer section 14.5
4. Refer section 14.5.3
5. Refer section 14.4 and section 14.5
6. Refer section 14.4
7. Refer section 14.8
8. Taking 1993 as the origin (X = 0) and X in years, the equation of the
straight line is: Y = 90 + 2X
The trend values for 1990 to 1996 are 84, 86, 88, 90, 92, 94, 96.
9. The seasonal values obtained are 98.66, 110.74, 95.30, 95.30.
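The answers to Terminal Questions 8 and 9 can be checked with a short script. The following is a minimal Python sketch, assuming the least squares method with the origin at the middle year (1993) for Question 8 and the method of simple averages for Question 9. The function names are illustrative and are not part of the unit.

```python
def fit_linear_trend(values):
    """Fit Y = a + bX by least squares, with X measured from the middle period."""
    n = len(values)
    xs = [i - (n - 1) / 2 for i in range(n)]  # -3..3 for seven years
    a = sum(values) / n  # intercept equals the mean because the X values sum to zero
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    return a, b, [a + b * x for x in xs]


def seasonal_indices(rows):
    """Simple averages method: each quarter's mean as a percentage of the overall mean."""
    quarter_means = [sum(col) / len(col) for col in zip(*rows)]
    grand_mean = sum(quarter_means) / len(quarter_means)
    return [round(m / grand_mean * 100, 2) for m in quarter_means]


production = [80, 90, 92, 83, 94, 99, 92]  # Table 14.10, 1990 to 1996
a, b, trend = fit_linear_trend(production)
print(a, b, trend)  # 90.0 2.0 [84.0, 86.0, 88.0, 90.0, 92.0, 94.0, 96.0]

quarterly = [
    [3.7, 4.1, 3.3, 3.5],  # 1995
    [3.7, 3.9, 3.6, 3.6],  # 1996
    [4.0, 4.1, 3.3, 3.1],  # 1997
    [3.3, 4.4, 4.0, 4.0],  # 1998
]
print(seasonal_indices(quarterly))  # [98.66, 110.74, 95.3, 95.3]
```

Both results agree with the answers listed above.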

14.14 Case Study


The number of people visiting a hotel’s webpage per month (also called cyber
hits) for the years 2006-07 to 2008-09 is given below. The hotel could
forecast the future hits for every month so that its EDS department can plan
the server space, processing staff, etc.

Table 14.12: The number of people visiting a hotel’s webpage per month
Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
2006-07 420 100 300 344 300 200 344 766 899 900 788 455
2007-08 620 399 345 455 677 355 766 500 799 800 880 555
2008-09 520 289 400 644 566 677 500 800 899 900 680 666

Using suitable methods, with justification, forecast the number of visitors
to the web page for each month of the year Jan 2010-2011.



Statistics for Management Unit 15

Unit 15 Index Number


Structure:
15.1 Introduction
Objectives
Relevance
Statistics in practice
15.2 Definition of an Index Number
Relative
Classification of Index Numbers
Base year and Current year
Chief characteristics of Index Numbers
Main steps in the construction of Index Numbers
15.3 Methods of Computation of Index Numbers
Unweighted Index Numbers
Weighted Index Numbers
15.4 Tests for Adequacy of Index Number
15.5 Cost of Living Index or Consumer Price Index
Utility of Consumer Price Index Numbers
Assumptions of Cost of living Index Numbers
Steps in construction of Cost of living Index Numbers
15.6 Methods of constructing Consumer Price Index
Aggregate expenditure method
Family budget method
15.7 Limitations of Index Numbers
15.8 Utility and Importance of Index Numbers
15.9 Summary
15.10 Glossary
15.11 Terminal Questions
15.12 Answers
15.13 Case Study

15.1 Introduction
In the previous unit, we studied the definition and components of time
series, along with different forecasting methods based on time series
analysis. In this unit, we will discuss the meaning and definition of index
numbers. We will also study the different kinds of index numbers and their
limitations.
Almost all values change over time, and we often wish to measure how much
they have changed over a given period. For example, we may want to know by
how much the prices of various essential household items have increased or
decreased, so that we can adjust the monthly budget. In all such situations,
an average measure is needed to compare such differences over a time period.
Index numbers are yardsticks for describing such differences. These
differences may have to do with the physical quantities of goods, the prices
of commodities, or such concepts as ‘efficiency’, ‘intelligence’ or ‘beauty’.
The comparison may be between periods of time, between places, between
categories, etc.
We can have index numbers comparing the cost of living at different times
or in different localities or countries. Index numbers are also used to
compare the physical volume of production in different years. However, we
will confine most of our attention to the construction of index numbers
measuring changes over time.
Objectives:
After studying this unit, you should be able to:
 represent a data set in an index number form
 describe how much the economic variables have changed over time
 describe three principal types of indices: price indices, quantity indices,
and value indices
 calculate various kinds of index numbers
15.1.1 Relevance
The CEO of Bestview television was quite happy with the excellent growth in
the company’s television sales. However, the employees’ union and the
officers’ association pressed him to increase the pay package every quarter
to compensate for changes in the cost of living, as reflected by the
wholesale and consumer price indices released by the Government of India.
After being apprised of how these indices are calculated, the CEO felt that
the all-India indices might not be relevant for his company, as all of its
operations were at one location near Mumbai. He therefore asked the HRD
department to coordinate with the management information system and evolve
an index that would be more sensitive to the expenditure pattern of his
officers and other staff. Accordingly, in consultation with the employees’
union and the officers’ association, two different cost of living indices
were evolved for the two categories of staff in the company. This can be
done by adopting the broad methodology of choosing items, deciding their
weights, recording prices, etc.
15.1.2 Statistics in practice
The US Department of Labor, through the Bureau of Labor Statistics,
compiles and distributes indexes and statistics that are indicators of
business and economic activity in the United States. For instance, the
bureau compiles and publishes the consumer price index, the producer price
index, and statistics on average hours and earnings of various groups of
workers. Perhaps the most widely quoted index produced by the Bureau of
Labor Statistics is the consumer price index (CPI), which is often used as a
measure of inflation. Another indication that inflationary pressures would
be low in the near future was given by the producer price index, which
measures price changes in wholesale markets and is often seen as a leading
indicator of changes in the consumer price index. The producer price index
fell by 2% during August. Federal Reserve policymakers had raised the
overnight bank lending rate six times since June 1999. However, with the
data showing that prices were not rising rapidly, interest rates were
expected to remain unchanged for the near future. Many economists and
analysts proclaimed that the consumer price index report confirmed that the
United States was in a noninflationary era. In addition to the CPI report,
they focused on the very little upward pressure on wages. The Labor
Department also reported that average weekly earnings adjusted for inflation
rose by only 1% in August after being unchanged in each of the prior two
months.
In this chapter we will discuss how various indices, such as the consumer
and producer price indices, are computed and how they should be interpreted.

15.2 Definition of an Index Number


An index number is a number which is used to measure the level of a
certain phenomenon as compared to the level of the same phenomenon at
some standard period. In other words, an index number is a device for
comparing the price, quantity or value of a group of articles in different
situations, for example, at a certain place or period of time with that of
another place or period of time.

Key statistic
An index number is a statistical measure which is designed to express
changes or differences in a variable or a group of related variables. It is
usually expressed in percentage form.

When a comparison is with respect to prices, it is called an index number of
prices; when it is with respect to physical quantities, it is called an index
number of quantities. Other index numbers are defined in a similar manner.
Index numbers are meant for comparing variations arising out of a difference
in situations, for example, a change of time or a change of place.
15.2.1 Relative
The value of a variable in a given year (or place) divided by the value of the
same variable in a specified year (or place) is called a relative. It is generally
expressed in percentage.
I. Price relative
The price of commodity in a given year expressed as a percentage of the
price of the same commodity in a specified year is called price relative.

Solved Problem 1:
The price of a commodity in India in 2001 was Rs. 95 per kg and in 2000 it
was Rs.80 per kg. Calculate the price relative for the year 2001.
Solution: The price relative for 2001 (using 2000 as base) is calculated as:
$$\text{Price relative for 2001} = \frac{95}{80} \times 100 = 118.75\%$$
Hence, the price relative for 2001 is 118.75%.
II. Production relative
Let us understand production relative with an example.
Solved Problem 2:
If the wheat production in India in 2002 was 5,82,000 metric tons and in
2004 it was 6,96,000 metric tons, calculate the production relative for
2004.
Solution: Let us take the production of 2002 as base.
$$\text{Production relative for 2004} = \frac{696000}{582000} \times 100 = 119.6\%$$
Hence, the production relative for 2004 is 119.6%.
III. Quantity relative
The quantity (q1) of a commodity consumed in a given year expressed as a
percentage of the quantity (q0) of the same commodity consumed in a
specified year is called quantity relative. Thus,
$$\text{Quantity relative} = \frac{q_1}{q_0} \times 100$$

IV. Value relative


If ‘p1’ and ‘q1’ are the price and quantity respectively of a commodity in a
given year, and ‘p0’ and ‘q0’ are the price and quantity respectively of the
same commodity in a specified year, then the value of the given year, ‘V1’,
and the value of the specified year, ‘V0’, are calculated as:
V1 = p1 q1
V0 = p0 q0
The value relative of the given year with respect to the specified year is
calculated as the ratio of ‘V1’ to ‘V0’, multiplied by 100. That is,
$$\text{Value relative} = \frac{V_1}{V_0} \times 100 = \frac{p_1 q_1}{p_0 q_0} \times 100$$

The overall change in price, production, quantity or value, etc., is
represented by these typical summaries, which are known as relatives.
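All four relatives share the same arithmetic, so they can be sketched with a single helper. The Python snippet below is a minimal illustration. It reuses the figures from Solved Problems 1 and 2, while the quantities used for the value relative are hypothetical numbers chosen only for demonstration. The function name `relative` is also an assumption of this sketch.

```python
def relative(current, base):
    """Value in the given year as a percentage of the value in the specified year."""
    return current / base * 100


price_relative = relative(95, 80)               # Solved Problem 1
production_relative = relative(696000, 582000)  # Solved Problem 2

# Value relative: the quantities below are hypothetical, for illustration only.
p0, q0 = 80, 10   # specified (base) year price and quantity
p1, q1 = 95, 12   # given (current) year price and quantity
value_relative = relative(p1 * q1, p0 * q0)

print(round(price_relative, 2))       # 118.75
print(round(production_relative, 1))  # 119.6
print(round(value_relative, 1))       # 142.5
```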
15.2.2 Classification of Index Numbers
There are various approaches for classification of index numbers. They are:
a. Based on Variables
Price index: When the variable is price.
Quantity index: When the variable is quantity.
Value index: When the variable is value.
Production index: When the variable is production.


b. Based on retail or wholesale prices


Cost of living index number: Where we use retail prices.
Wholesale price index number: Where we use wholesale prices.
c. Based on weights
Simple (unweighted) index number
Weighted index number
d. Based on number of commodities
When the number of commodities is more than one, then we obtain a
single (combined) index number. This can be done in four ways:
 Simple average of relatives
 Weighted average of relatives
 Simple aggregate
 Weighted aggregate
15.2.3 Base year and current year
In the computation of an index number we require two years (or places).
The given year whose values are to be compared is called the current year (or
current period), and the specified year whose values are taken as the
standard (for example, 100) is called the base year (or base period).

Example 1:
If the prices of 2005 are compared with the prices of 2004, then 2005 is
the current year and 2004 is the base year. The index number of 2005
based on 2004, is denoted by ‘Q01’ or ‘P01’, where subscript ‘0’ stands for
the year 2004, and subscript ‘1’ stands for the year 2005.

15.2.4 Chief Characteristics of Index Numbers


1. Expressed in numbers
Index numbers express relative changes, such as an increase in production or
a reduction in prices, in numerical form.
2. Expressed in percentage
Index numbers are expressed as percentages so as to show the extent of
relative change. The value of the base is taken as 100, but the percentage
sign (%) is not used.
3. Relative measure
Index numbers measure changes which are not capable of direct
measurement.


4. Specialised averages
An index number is a special type of average, in general a weighted average.
In a simple average, the data are homogeneous and have the same unit of
measurement, whereas an index number averages variables that have different
units of measurement.
5. Basis of comparison
Index numbers by their very nature are comparative. They compare
changes over time or between places or similar categories.
15.2.5 Main steps in the construction of index numbers
Many problems are encountered in constructing index numbers, and each must
be considered carefully. The main steps, together with the difficulties that
arise at each step, are discussed below.
1. Purpose of index number
The steps taken in the construction of index numbers generally depend on the
purpose of the index number. Hence, the purpose must be defined clearly and
precisely. For example, the purpose of a general wholesale price index
number is to measure the general price level. On the other hand, the purpose
of a consumer price index number is to give an idea of the effect of changes
in retail prices on the cost of living of different classes of people.
2. Selection of base period
The base period of an index number is the period of time against which the
comparisons are made. There are three types of base periods.
Fixed base (a single period)
Fixed base (an average of selected periods)
Chain base
While selecting the base, we have to decide whether to use a fixed base or a
chain base.
Fixed base (a single period): In a fixed base (a single period), the base
period must be a normal period, that is, a period free from all sorts of
abnormalities or random causes such as financial crises, floods, famines,
earthquakes, labour strikes, wars, etc. The base period should be a period
for which reliable figures are available. The base period should not be too
distant in the past.
Fixed base (an average of selected periods): When it is difficult to choose
just one single period as the normal, then a better choice is an average of
several periods.
Chain base: If the comparisons are required from year to year, a system of
chain base is used.
3. Selection of commodities
The following problems arise while selecting the commodities.
The first problem is deciding how many commodities to include, because it is
not feasible to include all of them. The purpose of the index number helps
in deciding the number of commodities.
The second problem is deciding which commodities to include. The
commodities must be selected carefully, so that each commodity meets the
following conditions:
 It should represent the real tastes, habits and the customs of the people.
 It should be of a standard quality and there must be no significant
variation in the quality.
 It must be easily recognisable.
 It should not be a non-tangible commodity such as personal service.
4. Selection of the representative prices
In the collection of price quotations we have to consider the following points:
 The method of quoting prices of the commodities
 The type of quotations - whether wholesale prices or retail prices
 The place from where the quotations are to be obtained
5. Assignment of Weights
The term ‘weight’ refers to the relative importance of the different
commodities included in the construction of index numbers. There are two
methods of assigning weights. They are:
Implicit method: In this method, several varieties of a certain type of
commodity under study are included, so that this type of commodity carries
more weight in the index. Such weights are called implicit weights.
Explicit method: In this method, the weights are laid down on the basis of
some outward evidence of the importance of the commodities. One of the
problems in selecting appropriate weights is deciding what this evidence
should be. Another problem with regard to the system of weighting is whether
the weights should be fixed or fluctuating.
6. Selection of the average
To find a composite index number, we can use any average, such as the
arithmetic mean, geometric mean, harmonic mean, median or mode. The choice
of average depends on the relative merits and demerits of the various
averages. The average may be weighted or unweighted.
7. Selection of suitable formula
There are various formulae for computing index numbers, so the selection of
a suitable formula also poses a problem. Each formula is suitable only in
particular situations.

15.3 Methods of Computation of Index Numbers


The various methods of constructing index numbers can be classified into
two groups. They are:
a. Unweighted index numbers
b. Weighted index numbers
In unweighted index numbers, each item is assumed to have the same weight,
whereas in weighted index numbers, weights are assigned to the various items
in accordance with their importance. Figure 15.1 illustrates the further
classification of methods of constructing index numbers.

Fig. 15.1: Methods of Constructing Index Numbers


Unweighted index numbers can be further divided into two categories. They
are:
I. Simple aggregative method
II. Simple average of relatives method
Weighted index numbers can also be further divided into two categories.
They are:
I. Weighted aggregative method
II. Weighted average of relatives method
15.3.1 Unweighted index numbers
i. Simple aggregative method
To construct a price index by the simple aggregative method, we proceed as
follows:
Add the prices of all commodities in the current year, that is, find $\sum P_1$.
Add the prices of all commodities in the base year, that is, find $\sum P_0$.
Divide the total of current year prices by the total of base year prices and
multiply the quotient by 100, that is,
$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$
where ‘P01’ is the simple price index number of the current year based on
the base year.

Solved Problem 3:
Find the simple aggregative price index from the data displayed in Table
15.1.
Table 15.1: Price of Commodities for the Years 2000 and 2004
Commodity   Unit           Price in Rs. per unit
                           2000    2004
A           One kilogram   10      15
B           One kilogram   40      30
C           One dozen      10      12
D           One litre      5       13
Total                      65      70
Solution: The price index number of 2004 is computed with 2000 as the base
year. Using the formula:
$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$
where $\sum P_1$ = total of prices in 2004 = 70 and $\sum P_0$ = total of prices in
2000 = 65. Therefore,
$$P_{01} = \frac{70}{65} \times 100 = 107.7$$
This implies that prices increased by 7.7% in the year 2004 as compared to
the year 2000.
Table 15.2 depicts the merits and demerits of the simple aggregative method.

Table 15.2: Merits and Demerits of Simple Aggregative Method

Merits:
1. This is the simplest method of constructing index numbers.
2. It is simple and easy to understand.
3. It requires simple calculations.

Demerits:
1. This method gives inappropriate results when the prices of different
   commodities are quoted in different units.
2. Since weights are not used, this method does not give any consideration
   to the relative importance of commodities.
3. The index number calculated by this method is unduly affected by high or
   low values.
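The calculation in Solved Problem 3 can be sketched in a few lines of Python. The second call illustrates the first demerit listed above: it assumes, purely for illustration, that commodity A is quoted per 100 grams instead of per kilogram, so its prices become 1 and 1.5. The function name is an assumption of this sketch.

```python
def simple_aggregative_index(base_prices, current_prices):
    """P01 = (sum of current year prices / sum of base year prices) * 100."""
    return sum(current_prices) / sum(base_prices) * 100


prices_2000 = [10, 40, 10, 5]   # commodities A, B, C, D from Table 15.1
prices_2004 = [15, 30, 12, 13]
index = simple_aggregative_index(prices_2000, prices_2004)
print(round(index, 1))  # 107.7

# Hypothetical change of unit: commodity A quoted per 100 g instead of per kg.
index_per_100g = simple_aggregative_index([1, 40, 10, 5], [1.5, 30, 12, 13])
print(round(index_per_100g, 1))  # 100.9
```

The same price movements give 107.7 in one case and 100.9 in the other, which is why this method fails the unit test.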

Self Assessment Question


1. Find the price index number using the simple aggregative method for the
data in Table 15.3.
Table 15.3: Price of the Commodities for Years 2001 and 2002
Commodity   Price in Rs. per quintal
            Base year 2001   Current year 2002
Wheat       80               100
Rice        120              250
Gram        100              150
Pulses      200              300

ii. Simple average of relatives method
To construct a price index by this method, we proceed as follows:
Obtain the price relative for each commodity, which is calculated as:
$$\text{Price relative for current year} = \frac{\text{Price of current year}}{\text{Price of base year}} \times 100, \quad \text{that is,} \quad R = \frac{P_1}{P_0} \times 100$$
Calculate the arithmetic mean or the geometric mean of the price relatives.
The result, denoted by ‘P01’, is given by:
When arithmetic mean is used:
$$P_{01} = \frac{1}{N} \sum \left( \frac{P_1}{P_0} \times 100 \right) = \frac{\sum R}{N}$$
When geometric mean is used:
$$P_{01} = \text{Antilog} \left( \frac{\sum \log R}{N} \right)$$
where N is the number of commodities.

Solved Problem 4:
The prices of three different commodities for 2002 and 2003 are displayed in
Table 15.4. The prices are per ton of the commodity. Taking the year 2002 as
base, calculate the price index by the simple average of relatives method,
using both the arithmetic mean and the geometric mean.

Table 15.4: Prices of Commodities for 2002 and 2003
Commodity       Corn   Wheat   Cocoa
Price in 2002   800    500     900
Price in 2003   880    480     940

Solution: Table 15.4a represents the calculated values for determining the
price index.

Table 15.4a: Calculated Values for Determining Price Index
Commodity   Price in 2002 (P0)   Price in 2003 (P1)   Price relative R = (P1/P0) × 100   log R
Corn        800                  880                  (880/800) × 100 = 110              2.04
Wheat       500                  480                  (480/500) × 100 = 96               1.98
Cocoa       900                  940                  (940/900) × 100 = 104.44           2.02
Total       ΣP0 = 2200           ΣP1 = 2300           ΣR = 310.44                        Σ log R = 6.04

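Simple average of relatives method using the arithmetic mean:
$$P_{01} = \frac{1}{N} \sum \left( \frac{P_1}{P_0} \times 100 \right) = \frac{\sum R}{N} = \frac{310.44}{3} = 103.48$$
Simple average of relatives method using the geometric mean:
$$P_{01} = \text{Antilog} \left( \frac{\sum \log R}{N} \right) = \text{Antilog} \left( \frac{6.04}{3} \right) = 103.11$$
Hence, the price indices obtained by the simple average of relatives method
using the arithmetic mean and the geometric mean are 103.48 and 103.11
respectively. The geometric mean figure is based on logarithms rounded to
two decimal places; with unrounded logarithms it works out to about 103.32.
Table 15.5 depicts the merits and demerits of the simple average of
relatives method.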
Table 15.5: Merits and Demerits of Simple Average of Relatives Method

Merits:
1. It is not affected by the units in which prices are quoted.
2. It is not affected by absolute values of prices, as prices are converted
   into price relatives.
3. It gives equal importance to all items and extreme items do not unduly
   affect the index number.
4. The index number calculated by this method satisfies the unit test.

Demerits:
1. As it is an unweighted average, the importance of all items is assumed to
   be the same.
2. The index number constructed by this method does not satisfy all the
   criteria laid down for an ideal index.
3. The index number is unduly influenced by high or low prices when the
   arithmetic mean is used.
4. More labour is involved if the geometric mean is used.
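The two averages in Solved Problem 4 can be reproduced with the following minimal Python sketch. It works with full-precision logarithms rather than two-decimal log tables, so the geometric mean comes out near 103.32, not the 103.11 obtained from rounded logs. The function names are assumptions of this sketch.

```python
import math


def price_relatives(base_prices, current_prices):
    """R = (P1 / P0) * 100 for each commodity."""
    return [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]


def arithmetic_mean_index(relatives):
    """P01 = (sum of R) / N."""
    return sum(relatives) / len(relatives)


def geometric_mean_index(relatives):
    """P01 = antilog of (sum of log R) / N, using base-10 logarithms."""
    mean_log = sum(math.log10(r) for r in relatives) / len(relatives)
    return 10 ** mean_log


prices_2002 = [800, 500, 900]  # corn, wheat, cocoa
prices_2003 = [880, 480, 940]
r = price_relatives(prices_2002, prices_2003)
am = arithmetic_mean_index(r)
gm = geometric_mean_index(r)
print(round(am, 2))  # 103.48
print(round(gm, 2))  # 103.32
```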


15.3.2 Weighted index numbers


To overcome the weakness of the simple or unweighted methods, we weight the
price of each commodity by a suitable factor; often it is the quantity or
the volume of the commodity sold during the base year. In other words, in
this method, appropriate weights are assigned to various commodities to
reflect their relative importance in the group. The weights can be
production figures, consumption figures or distribution figures.

Key Statistic
For the construction of a price index number, quantity weights are used.
If ‘w’ is the weight attached to a commodity, then the price index is given
by:
$$P_{01} = \frac{\sum P_1 w}{\sum P_0 w} \times 100$$

Weighted aggregative index number


In the weighted aggregative index numbers, the weights are assigned to
various items and the weighted aggregate of the prices are obtained.
Weights are assigned in various ways and the weighted aggregates are
used in different ways for the construction of index numbers.
Some of the important methods of constructing weighted aggregative index
numbers are described below.

1. Laspeyre’s price index


Laspeyre’s method is based on fixed weights: the base year’s quantities are
used as weights.
The formula given by Laspeyre is:
$$LP_{01} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100$$
where P1 = current year price, P0 = base year price and Q0 = base year
quantity, used as the weight.
This index number has an upward bias: when prices increase, consumers tend
to reduce their consumption of the higher priced goods, but the fixed base
year quantities do not reflect this shift. This index number is widely used
in practical work.
The quantity index number using Laspeyre’s formula is given by:
$$LQ_{01} = \frac{\sum Q_1 P_0}{\sum Q_0 P_0} \times 100$$

Solved Problem 5:
Compute Laspeyre’s price and quantity index numbers for the following data:

Table 15.6: Price and Quantity of Commodities in the Base Year and Current
Year
            Base year          Current year
Commodity   Price   Quantity   Price   Quantity
A           3       25         5       28
B           1       50         3       60
C           2       30         1       30
D           5       15         6       12

Solution: Base year price and quantity are denoted by P0 and Q0, and
current year price and quantity are denoted by P1 and Q1 respectively.

Table 15.6a
Commodity   P0   Q0   P1   Q1   P0Q0   P1Q0   P0Q1   P1Q1
A           3    25   5    28   75     125    84     140
B           1    50   3    60   50     150    60     180
C           2    30   1    30   60     30     60     30
D           5    15   6    12   75     90     60     72
Total                           260    395    264    422

Laspeyre’s price index number is given by
$$LP_{01} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100$$
Now substituting the values we get,
$$LP_{01} = \frac{395}{260} \times 100 = 151.92$$
Laspeyre’s quantity index number is given by
$$LQ_{01} = \frac{\sum Q_1 P_0}{\sum Q_0 P_0} \times 100 = \frac{264}{260} \times 100 = 101.54$$
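The Laspeyre calculations in Solved Problem 5 can be checked with a minimal Python sketch such as the one below. The function and variable names are assumptions of this sketch and do not come from the unit.

```python
def laspeyres_price_index(p0, q0, p1):
    """Sum(P1*Q0) / Sum(P0*Q0) * 100, weighting prices by base year quantities."""
    numerator = sum(p * q for p, q in zip(p1, q0))
    denominator = sum(p * q for p, q in zip(p0, q0))
    return numerator / denominator * 100


def laspeyres_quantity_index(p0, q0, q1):
    """Sum(Q1*P0) / Sum(Q0*P0) * 100, weighting quantities by base year prices."""
    numerator = sum(q * p for q, p in zip(q1, p0))
    denominator = sum(q * p for q, p in zip(q0, p0))
    return numerator / denominator * 100


# Data from Table 15.6 (commodities A, B, C, D)
p0, q0 = [3, 1, 2, 5], [25, 50, 30, 15]
p1, q1 = [5, 3, 1, 6], [28, 60, 30, 12]

lp = laspeyres_price_index(p0, q0, p1)
lq = laspeyres_quantity_index(p0, q0, q1)
print(round(lp, 2))  # 151.92
print(round(lq, 2))  # 101.54
```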

2. Paasche’s method
Paasche’s method is based on the current year’s quantities, which are used
as weights. Paasche’s price index is given by:
$$PP_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times 100$$
where P1 = current year price, P0 = base year price and Q1 = current year
quantity, taken as the weight.
This index number has a downward bias. This formula is not used frequently
in practice when the number of commodities is large.
The quantity index number using Paasche’s formula is given by:
$$PQ_{01} = \frac{\sum Q_1 P_1}{\sum Q_0 P_1} \times 100$$

Solved Problem 6: Compute Paasche’s price and quantity index numbers for
Solved Problem 5.

Solution
Paasche’s price index is given by:
$$PP_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_1} \times 100$$
From Table 15.6a, we have
$$PP_{01} = \frac{422}{264} \times 100 = 159.85$$
Paasche’s quantity index is given by:
$$PQ_{01} = \frac{\sum Q_1 P_1}{\sum Q_0 P_1} \times 100$$
From Table 15.6a, we have
$$PQ_{01} = \frac{422}{395} \times 100 = 106.84$$
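A matching sketch for Paasche’s formulas, using the same data as Solved Problem 5, is given below. As before, the function names are assumptions of this sketch. To two decimal places the price index rounds to 159.85.

```python
def paasche_price_index(p0, p1, q1):
    """Sum(P1*Q1) / Sum(P0*Q1) * 100, weighting prices by current year quantities."""
    numerator = sum(p * q for p, q in zip(p1, q1))
    denominator = sum(p * q for p, q in zip(p0, q1))
    return numerator / denominator * 100


def paasche_quantity_index(q0, q1, p1):
    """Sum(Q1*P1) / Sum(Q0*P1) * 100, weighting quantities by current year prices."""
    numerator = sum(q * p for q, p in zip(q1, p1))
    denominator = sum(q * p for q, p in zip(q0, p1))
    return numerator / denominator * 100


# Data from Table 15.6 (commodities A, B, C, D)
p0, q0 = [3, 1, 2, 5], [25, 50, 30, 15]
p1, q1 = [5, 3, 1, 6], [28, 60, 30, 12]

pp = paasche_price_index(p0, p1, q1)
pq = paasche_quantity_index(q0, q1, p1)
print(round(pp, 2))  # 159.85
print(round(pq, 2))  # 106.84
```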

3. Dorbish and Bowley’s method


This method is a combination of Laspeyre’s and Paasche’s methods. If we
find the arithmetic average of Laspeyre’s index and Paasche’s index, we get
the index suggested by Dorbish and Bowley. This index number takes into
account the base year’s as well as the current year’s weights.
Dorbish and Bowley’s price index is given by:
$$DP_{01} = \frac{LP_{01} + PP_{01}}{2} = \frac{\dfrac{\sum P_1 Q_0}{\sum P_0 Q_0} + \dfrac{\sum P_1 Q_1}{\sum P_0 Q_1}}{2} \times 100$$
where $LP_{01}$ is Laspeyre’s price index and $PP_{01}$ is Paasche’s price index.

4. Fisher’s ideal index number


This method is a combination of Laspeyre’s and Paasche’s methods. If we
find the geometric average of Laspeyre’s index and Paasche’s index, we get
the index suggested by Fisher. Fisher’s index number is given by
$$FP_{01} = \sqrt{LP_{01} \times PP_{01}} = \sqrt{\frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum P_1 Q_1}{\sum P_0 Q_1}} \times 100$$
where $LP_{01}$ is Laspeyre’s price index and $PP_{01}$ is Paasche’s price index.

Solved problem 7:
For Solved Problem 5, compute Dorbish and Bowley’s and Fisher’s index
numbers.
Solution: Dorbish and Bowley’s index number is given by
$$DP_{01} = \frac{\dfrac{\sum P_1 Q_0}{\sum P_0 Q_0} + \dfrac{\sum P_1 Q_1}{\sum P_0 Q_1}}{2} \times 100$$
From Table 15.6a we get,
$$DP_{01} = \frac{\dfrac{395}{260} + \dfrac{422}{264}}{2} \times 100 = 155.9$$
Fisher’s index number is given by
$$FP_{01} = \sqrt{\frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum P_1 Q_1}{\sum P_0 Q_1}} \times 100$$
Substituting the values from Table 15.6a we get,
$$FP_{01} = \sqrt{\frac{395}{260} \times \frac{422}{264}} \times 100 = 155.84$$
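Because both indices are averages of the Laspeyre and Paasche ratios, Solved Problem 7 reduces to a few lines once the four aggregates are available. The following Python sketch assumes the totals are computed directly from the data of Table 15.6. The helper name `aggregate` is illustrative.

```python
import math


def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


# Data from Table 15.6 (commodities A, B, C, D)
p0, q0 = [3, 1, 2, 5], [25, 50, 30, 15]
p1, q1 = [5, 3, 1, 6], [28, 60, 30, 12]

laspeyres_ratio = aggregate(p1, q0) / aggregate(p0, q0)  # 395 / 260
paasche_ratio = aggregate(p1, q1) / aggregate(p0, q1)    # 422 / 264

# Dorbish and Bowley: arithmetic mean of the two ratios
dorbish_bowley = (laspeyres_ratio + paasche_ratio) / 2 * 100
# Fisher: geometric mean of the two ratios
fisher = math.sqrt(laspeyres_ratio * paasche_ratio) * 100

print(round(dorbish_bowley, 1))  # 155.9
print(round(fisher, 2))          # 155.84
```

The geometric mean is never larger than the arithmetic mean, which is why Fisher’s figure is slightly below Dorbish and Bowley’s.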

Table 15.7 depicts the merits and demerits of Fisher’s index number.
Table 15.7: Merits and Demerits of Fisher’s Index Number

Merits:
1. It is free from bias, upward as well as downward.
2. This formula takes into account both current year and base year prices
   and quantities.
3. It satisfies both the ‘time reversal test’ and the ‘factor reversal
   test’. This is why it is called an ideal index number.

Demerits:
1. This formula is difficult to interpret.
2. It is not a practical index to compute because it is excessively
   laborious.
3. It requires the prices and quantities for both the base year and the
   current year.

15.3.3 Quantity index numbers


The quantity index numbers measure the average change in quantities and
enable us to compare changes in the physical quantity of goods produced or
sold. These index numbers can also be simple or weighted. Quantity index
numbers can be easily obtained from price index numbers just by
interchanging the P’s and Q’s in the formulae used for calculating the price
index numbers. The weighted average of relatives quantity index is given by:
$$\text{Quantity index} = \frac{\sum \left[ \left( \dfrac{Q_1}{Q_0} \times 100 \right) Q_n P_n \right]}{\sum Q_n P_n}$$
where,
‘Q1’ and ‘Q0’ are the quantities for the current and base period respectively
‘Pn’ and ‘Qn’ are the prices and quantities that determine the values that we
use for weights.
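The weighted average of relatives formula can be illustrated with the data of Solved Problem 5. The sketch below assumes, as one possible choice, that the weights are the base year values P0Q0. With that assumption the result coincides with Laspeyre’s quantity index. The function name is illustrative.

```python
def weighted_relatives_quantity_index(q0, q1, weights):
    """Sum of (Q1/Q0 * 100) * w divided by the sum of w, where w = Qn * Pn."""
    relatives = [b / a * 100 for a, b in zip(q0, q1)]
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)


# Data from Table 15.6 (commodities A, B, C, D)
p0, q0 = [3, 1, 2, 5], [25, 50, 30, 15]
q1 = [28, 60, 30, 12]

# Assumption of this sketch: weights are base year values P0 * Q0.
base_year_values = [p * q for p, q in zip(p0, q0)]  # [75, 50, 60, 75]
index = weighted_relatives_quantity_index(q0, q1, base_year_values)
print(round(index, 2))  # 101.54
```

The agreement is expected: with base year value weights, each term (Q1/Q0) × Q0P0 simplifies to Q1P0.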
15.3.4 Value index numbers
The value index numbers are easy to calculate. Value is the product of price
and quantity. A simple value index number is equal to the total value of the
current year divided by the total value of the base year, multiplied by 100.
The required formula is:
$$V = \frac{\sum P_1 Q_1}{\sum P_0 Q_0} \times 100$$
Equivalently, the simple value index number is given by:
$$V = \frac{\sum V_1}{\sum V_0} \times 100$$
where, V1 = value of the current year.
V0 = value of the base year.
Such index numbers need no separate weights, because the value already takes
into account both the price and the quantity. These index numbers are not
very popular because the situation revealed by prices and quantities is not
fully revealed by the values.

Solved problem 8:
For Solved Problem 5, compute the value index number.
Solution: The formula to compute the value index number is
$$V = \frac{\sum P_1 Q_1}{\sum P_0 Q_0} \times 100$$
From Table 15.6a we have
$$V = \frac{422}{260} \times 100 = 162.31$$
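The value index in Solved Problem 8 can be sketched in Python as follows, again using the data of Table 15.6. The function name is an assumption of this sketch.

```python
def value_index(p0, q0, p1, q1):
    """Sum(P1*Q1) / Sum(P0*Q0) * 100: current year value relative to base year value."""
    current_value = sum(p * q for p, q in zip(p1, q1))
    base_value = sum(p * q for p, q in zip(p0, q0))
    return current_value / base_value * 100


# Data from Table 15.6 (commodities A, B, C, D)
p0, q0 = [3, 1, 2, 5], [25, 50, 30, 15]
p1, q1 = [5, 3, 1, 6], [28, 60, 30, 12]

v = value_index(p0, q0, p1, q1)
print(round(v, 2))  # 162.31
```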
15.4 Tests for Adequacy of Index Number
1. Unit test
This test requires that the formula should be independent of the units in
which prices and quantities are quoted. Except for the simple aggregative
index, all the others satisfy this test.
2. Time reversal test
This test requires that the formula for calculating the index number should
give the same ratio between one period of comparison and the other, no
matter which of the two is taken as the base. Symbolically, with the indices
expressed as ratios (omitting the factor 100),
$$P_{01} \times P_{10} = 1$$
This test is satisfied by Fisher’s ideal index, the simple geometric mean of
price relatives, the weighted geometric mean of price relatives and the
Marshall-Edgeworth index number.
3. Factor reversal test
The formula should permit the interchange of price and quantity without
giving inconsistent results, that is, the product of the price index and the
quantity index should equal the value ratio:
$$P_{01} \times Q_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_0}$$
This test is satisfied by Fisher’s ideal index.
4. Circular test
It is an extension of the time reversal test. The test requires that if an
index is constructed for the year ‘a’ on base year ‘b’, and for the year ‘b’
on base year ‘c’, we should get the same result as if we calculated directly
for the year ‘a’ on base year ‘c’ without going through ‘b’.

Symbolically,
$$P_{01} \times P_{12} \times P_{20} = 1$$

It is satisfied by aggregative index numbers with fixed weights.
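The time reversal and factor reversal tests can be verified numerically. The following Python sketch uses the data of Solved Problem 5 and works with ratios (without the factor 100). It shows that Fisher’s index passes both tests while Laspeyre’s index fails the time reversal test. The function names are assumptions of this sketch.

```python
import math


def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p_base, q_base, p_cur):
    return aggregate(p_cur, q_base) / aggregate(p_base, q_base)


def paasche(p_base, p_cur, q_cur):
    return aggregate(p_cur, q_cur) / aggregate(p_base, q_cur)


def fisher(p_base, q_base, p_cur, q_cur):
    return math.sqrt(laspeyres(p_base, q_base, p_cur) * paasche(p_base, p_cur, q_cur))


# Data from Table 15.6 (commodities A, B, C, D)
p0, q0 = [3, 1, 2, 5], [25, 50, 30, 15]
p1, q1 = [5, 3, 1, 6], [28, 60, 30, 12]

# Time reversal test: P01 * P10 should equal 1 (swap the two periods for P10).
fisher_time = fisher(p0, q0, p1, q1) * fisher(p1, q1, p0, q0)
laspeyres_time = laspeyres(p0, q0, p1) * laspeyres(p1, q1, p0)
print(round(fisher_time, 4))     # 1.0
print(round(laspeyres_time, 4))  # 0.9504, so the test fails

# Factor reversal test: P01 * Q01 should equal Sum(P1*Q1) / Sum(P0*Q0).
# The quantity index is obtained by interchanging prices and quantities.
fisher_factor = fisher(p0, q0, p1, q1) * fisher(q0, p0, q1, p1)
value_ratio = aggregate(p1, q1) / aggregate(p0, q0)
print(round(fisher_factor, 4), round(value_ratio, 4))  # 1.6231 1.6231
```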


15.5 Cost of Living Index or Consumer Price Index
The ‘Cost of living index’, also known as ‘consumer price index’ or ‘Cost of
living price index’ is the country’s principal measure of price change. The
Consumer price index helps us in determining the effect of rise and fall in
prices on different classes of consumers living in different areas.


Key statistic
The cost of living price index measures the average change over time in
the prices paid by the ultimate consumers for a specified basket, that is,
specified quantities, of goods and services.

Different people consume different kinds of commodities, and the same
commodities in different proportions. The consumer price index therefore
helps us in determining the effect of a rise or fall in prices on different
classes of consumers living in different areas. The consumer price index
number is significant because demands for higher wages are based on the
cost of living index, and wages and salaries in most nations are adjusted
according to this index number.
The cost of living index does not measure the actual cost of living, nor the
fluctuations in the cost of living due to causes other than the change in
price level. Its object is to find out how much the consumers of a
particular class have to pay for a certain quantity of goods and services.
15.5.1 Utility of Consumer Price Index Numbers
The following are the uses of Consumer Price Index Numbers.
• It is used to measure changes in the purchasing power of currency and
  in real income.
• It helps the government in formulating wage policy, price policy, taxation
  and general economic policies.
• Market prices for particular kinds of goods and services are analysed
  with the help of the consumer price index.
• Salaries and wages are fixed on the basis of the consumer price index,
  so it is very helpful in revising wages and dearness allowance.
15.5.2 Assumptions of Cost of Living Index Numbers
Cost of living index numbers are based on the following assumptions.
• Similar needs
  The needs of the people for whom the index number is constructed are
  the same.
• Same goods
  The goods consumed in the base year and the current year remain
  unchanged.
• No change in quantity of goods
  The quantity of goods consumed is assumed to remain the same in the
  base year and the current year.
• Price quotations are same
  Prices at different places are assumed to be the same and not to
  change frequently.
• True on the average
  Cost of living index numbers are true only on the average.
• Representative goods
  The commodities included in the cost of living index number represent
  the consumption of the class of people concerned.
15.5.3 Steps in construction of cost of living index numbers
There are five steps involved in the construction of cost of living index numbers.
Step 1: Select the class of people
Step 2: Define scope of the index
Step 3: Conduct family budget inquiry
Step 4: Obtain price quotations
Step 5: Prepare a frame or list of persons

15.6 Methods of Constructing Consumer Price Index


There are two methods for constructing consumer price index number. They
are:
I. Aggregate expenditure method
II. Family budget method or method of weighted average of price
relatives.
I. Aggregate Expenditure Method
This method is based on Laspeyres’ method, where the base year quantities
are taken as weights (w = Q0).
$$P_{01} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100$$


II. Family budget method
Family budget method, or the method of weighted relatives, is the method
where the weights are the values (P0Q0) in the base year, often denoted by W.
$$P_{01} = \frac{\sum PW}{\sum W}, \quad \text{where } P = \frac{P_1}{P_0} \times 100 \text{ for each item}$$
and W = value weight, i.e., P0Q0.
Solved problem 9:
Calculate the cost of living index for the current year on the basis of the
base year from the following data, using
(a) Aggregate expenditure method and (b) Family budget method
Table 15.8: Data for problem 9
Article   Quantity consumed    Unit   Price in rupees
          in the base year            Base year   Current year
Wheat     8                    Kgs    16          17.2
Rice      4                    Kgs    12.2        13.5
Pulses    2                    Kgs    7.25        8.50
Milk      16                   Lts    2           2
Sugar     3                    Kgs    24          25

Solution:
Table 15.8a: Table for calculating cost of living index
Article   Quantity consumed     Price in rupees              P0Q0     P1Q0
          in the base year Q0   Base year P0   Current year P1
Wheat     8                     16             17.2          128      137.6
Rice      4                     12.2           13.5          48.8     54
Pulses    2                     7.25           8.50          14.5     17
Milk      16                    2              2             32       32
Sugar     3                     24             25            72       75
Total                                                        295.3    315.6

(a) Aggregate expenditure method
The formula for aggregate expenditure method is given by:
$$P_{01} = \frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100 = \frac{315.6}{295.3} \times 100 = 106.87$$
Therefore the cost of living index number is 106.87.


(b) Family budget method or the method of weighted relatives
Table 15.8b: Family budget method
Article   Quantity consumed     Price in rupees                  P = (P1/P0) × 100   W = P0Q0   PW
          in the base year Q0   Base year P0   Current year P1
Wheat     8                     16             17.2              107.5               128        13760
Rice      4                     12.2           13.5              110.66              48.8       5400.208
Pulses    2                     7.25           8.50              117.24              14.5       1699.98
Milk      16                    2              2                 100.0               32         3200
Sugar     3                     24             25                104.2               72         7502.4
Total     33                                                                         ∑W = 295.3 ∑PW = 31562.588

Formula for calculating the cost of living index is:
$$P_{01} = \frac{\sum PW}{\sum W} = \frac{31562.588}{295.3} = 106.88$$
Therefore the cost of living index number is 106.88. The small difference
from the 106.87 obtained in (a) arises only from rounding the price
relatives; with W = P0Q0 the two methods are algebraically identical.
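Both methods of solved problem 9 can be sketched in Python using the data of table 15.8. The function names are illustrative assumptions. The price relatives are not rounded here, so the two results agree exactly up to floating-point error.

```python
def aggregate_expenditure_index(p0, p1, q0):
    """Aggregate expenditure method: sum(P1*Q0) / sum(P0*Q0) * 100."""
    current_cost = sum(p * q for p, q in zip(p1, q0))
    base_cost = sum(p * q for p, q in zip(p0, q0))
    return current_cost / base_cost * 100


def family_budget_index(p0, p1, weights):
    """Family budget method: sum(P*W) / sum(W), with P = P1/P0 * 100."""
    relatives = [cur / base * 100 for base, cur in zip(p0, p1)]
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)


# Table 15.8, in the order wheat, rice, pulses, milk, sugar.
q0 = [8, 4, 2, 16, 3]
p0 = [16, 12.2, 7.25, 2, 24]
p1 = [17.2, 13.5, 8.50, 2, 25]

value_weights = [p * q for p, q in zip(p0, q0)]  # W = P0*Q0

index_a = aggregate_expenditure_index(p0, p1, q0)
index_b = family_budget_index(p0, p1, value_weights)
print(round(index_a, 2), round(index_b, 2))  # 106.87 106.87
```

The agreement follows from the algebra: when W = P0Q0, each term PW equals P1Q0 × 100, so ∑PW/∑W reduces to the aggregate expenditure formula.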
Solved Problem 10:
Calculate the cost of living index by using Family Budget Method for year
2011 with 2010 as base year from the following data:
Table 15.9: Table for calculating cost of living index
Item                Weights   Price in 2010   Price in 2011
Food                35        150             140
Rent                20        75              90
Clothing            10        25              30
Fuel and lighting   15        50              60
Miscellaneous       20        60              80


Solution
Table 15.9a: Family Budget Method
Item                W     Price in 2010 P0   Price in 2011 P1   P = (P1/P0) × 100   PW
Food                35    150                140                93.33               3266.55
Rent                20    75                 90                 120.00              2400.00
Clothing            10    25                 30                 120.00              1200.00
Fuel and lighting   15    50                 60                 120.00              1800.00
Miscellaneous       20    60                 80                 133.33              2666.60
Total               ∑W = 100                                                        ∑PW = 11333.15

Formula for calculating the cost of living index is:
$$P_{01} = \frac{\sum PW}{\sum W} = \frac{11333.15}{100} = 113.33$$
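When the weights are supplied directly, as in solved problem 10, the same calculation can be carried out in exact arithmetic. The sketch below is only an illustration: the function name `weighted_relatives_index` is an assumption, and Python's `fractions.Fraction` is used so that the price relatives are never rounded.

```python
from fractions import Fraction


def weighted_relatives_index(p0, p1, weights):
    """Weighted average of price relatives, sum(P*W) / sum(W), in exact arithmetic."""
    relatives = [Fraction(cur, base) * 100 for base, cur in zip(p0, p1)]
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)


# Table 15.9: food, rent, clothing, fuel and lighting, miscellaneous.
weights = [35, 20, 10, 15, 20]
p0 = [150, 75, 25, 50, 60]   # prices in 2010 (base year)
p1 = [140, 90, 30, 60, 80]   # prices in 2011 (current year)

index = weighted_relatives_index(p0, p1, weights)
print(index, round(float(index), 2))  # 340/3 113.33
```

The exact total ∑PW is 34000/3, or about 11333.33. The table's 11333.15 differs slightly because its price relatives are rounded to two decimals, but the index still rounds to 113.33.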

15.7 Limitations of Index Numbers


There is no doubt that the technique of index numbers is a very useful tool.
However, there are certain limitations of index numbers which should be
borne in mind.
The chief limitations are:
1. Index numbers are not perfect. They are approximate values.
2. Difficulties arise in the construction of index numbers from the selection
of the base year, the items and the average, and from changes in habits.
3. Sampling errors occur.
4. Index numbers can also be manipulated.
5. They have limited applications. An index number constructed for one
purpose cannot be used for other purposes.

Self Assessment Questions


2. The data in table 15.10 is related to workers in an industrial town.
Calculate consumer price index number by using family budget method.

Table 15.10: Price Index and Percentage Expenditures of Items
Item of Consumption   Price Index P   Percentage Expenditure
Food                  200             50
Clothing              175             10
Fuel & lighting       160             12
Housing               225             15
Miscellaneous         150             13

15.8 Utility and Importance of Index Numbers


The primary purpose of index numbers is to measure relative temporal or
cross-sectional changes in a variable or a group of related variables which
are not capable of being directly measured. The greatest purpose of index
numbers has been to measure and compare the changes in prices and
purchasing power of money which have received great attention from
economists for many years.
Today, index numbers are not used for measuring price changes alone.
Factors like wages, employment, production, trade, demand, supply,
business conditions, industrial activity, financial problems etc. are also
studied through this statistical device. Just as a barometer measures the
pressure of the atmosphere or of gases, index numbers measure the pressure
of economic behaviour. Thus, index numbers are called economic
barometers.
Main uses of Index Numbers:
• Comparative study is made possible
• Simplifies data
• Provides guidelines to economic policy and in formulating decisions
• Measures purchasing power of money
• Measures changes in cost of living
• National income calculations
• Reveals trends and tendencies
• Useful in deflating
• Universal utility


Activity
1. Calculate Fisher’s Ideal index and show that it satisfies the Time
and Factor Reversal Tests.
Table 15.11: Data for Activity
Commodities   Base Year                Current Year
              Price P0   Quantity Q0   Price P1   Quantity Q1
A             6.5        500           10.8       560
B             2.8        124           2.9        148
C             4.7        69            8.2        78
D             10.9       38            13.4       24
E             8.6        49            10.8       27
Activity Solution
Table 15.11a: Table for calculation of index numbers
Commodities   P0Q0     P1Q0     P0Q1     P1Q1
A             3250     5400     3640     6048
B             347.2    359.6    414.4    429.2
C             324.3    565.8    366.6    639.6
D             414.2    509.2    261.6    321.6
E             421.4    529.2    232.2    291.6
Total         4757.1   7363.8   4914.8   7730
Fisher’s Ideal Price Index Number (without multiplying by 100):
$$P_{01} = \sqrt{\frac{\sum P_1 Q_0}{\sum P_0 Q_0} \times \frac{\sum P_1 Q_1}{\sum P_0 Q_1}} = \sqrt{\frac{7363.8}{4757.1} \times \frac{7730}{4914.8}}$$
$$P_{10} = \sqrt{\frac{\sum P_0 Q_1}{\sum P_1 Q_1} \times \frac{\sum P_0 Q_0}{\sum P_1 Q_0}} = \sqrt{\frac{4914.8}{7730} \times \frac{4757.1}{7363.8}}$$
Time Reversal Test is satisfied if P01 × P10 = 1 (without multiplying by 100).
$$P_{01} \times P_{10} = \sqrt{\frac{7363.8}{4757.1} \times \frac{7730}{4914.8} \times \frac{4914.8}{7730} \times \frac{4757.1}{7363.8}} = \sqrt{1} = 1$$
Hence Time Reversal Test is satisfied.

Factor Reversal Test is satisfied if (without multiplying by 100)
$$P_{01} \times Q_{01} = \frac{\sum P_1 Q_1}{\sum P_0 Q_0}$$
$$Q_{01} = \sqrt{\frac{\sum Q_1 P_0}{\sum Q_0 P_0} \times \frac{\sum Q_1 P_1}{\sum Q_0 P_1}} = \sqrt{\frac{4914.8}{4757.1} \times \frac{7730}{7363.8}}$$
$$P_{01} \times Q_{01} = \sqrt{\frac{7363.8}{4757.1} \times \frac{7730}{4914.8} \times \frac{4914.8}{4757.1} \times \frac{7730}{7363.8}} = \sqrt{\frac{7730}{4757.1} \times \frac{7730}{4757.1}} = \frac{7730}{4757.1} = \frac{\sum P_1 Q_1}{\sum P_0 Q_0}$$
Hence Factor Reversal Test is satisfied.
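The activity can also be checked numerically. The following Python sketch uses the data of table 15.11; the helper names `cross_sum` and `fisher_ratio` are illustrative assumptions, and all indices are kept as ratios (not multiplied by 100) so that the two tests can be applied directly.

```python
import math


def cross_sum(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def fisher_ratio(p0, q0, p1, q1):
    """Fisher's ideal index as a ratio: sqrt(Laspeyres ratio * Paasche ratio)."""
    laspeyres = cross_sum(p1, q0) / cross_sum(p0, q0)
    paasche = cross_sum(p1, q1) / cross_sum(p0, q1)
    return math.sqrt(laspeyres * paasche)


# Table 15.11, commodities A to E.
p0 = [6.5, 2.8, 4.7, 10.9, 8.6]
q0 = [500, 124, 69, 38, 49]
p1 = [10.8, 2.9, 8.2, 13.4, 10.8]
q1 = [560, 148, 78, 24, 27]

p01 = fisher_ratio(p0, q0, p1, q1)  # price index, current year on base year
p10 = fisher_ratio(p1, q1, p0, q0)  # the two periods interchanged
q01 = fisher_ratio(q0, p0, q1, p1)  # prices and quantities interchanged

value_ratio = cross_sum(p1, q1) / cross_sum(p0, q0)

print(round(p01 * 100, 2))                      # Fisher's ideal price index
print(math.isclose(p01 * p10, 1.0))             # time reversal test
print(math.isclose(p01 * q01, value_ratio))     # factor reversal test
```

Reusing one function with its arguments swapped mirrors the definitions of the tests: time reversal interchanges the periods, and factor reversal interchanges prices and quantities.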

15.9 Summary
Let us recapitulate the important concepts discussed in this unit:
• An index number is a number which is used as a device for comparison
  between the price, quantity or value of a group of articles in different
  situations.
• In the computation of an index number, the given year whose values are
  to be compared is called the current year and the specified year whose
  values are taken as standard is called the base year.
• The various methods of constructing index numbers can be classified
  into two groups: unweighted index numbers and weighted index numbers.
• Unit test, time reversal test, factor reversal test and circular test are the
  tests for adequacy of index numbers.
• The cost of living price index numbers, also known as consumer price
  index numbers, are designed to measure the average change in the
  price paid by the ultimate consumers for specified quantities of goods
  and services over a period of time.
• There are two methods for constructing consumer price index numbers:
  the Aggregate expenditure method and the Family budget method, or
  method of weighted average of price relatives.

15.10 Glossary
Explicit method: In this method, the weights are laid down on the basis of
some outward evidence of the importance of the commodities.
Implicit method: In this method, a commodity is given weight by including
several varieties of it in the index. Such weights are called implicit weights.
Index number: An index number is a number which is used to measure the
level of a certain phenomenon as compared to the level of the same
phenomenon at some standard period.
Price relative: The price of a commodity in a given year expressed as a
percentage of the price of the same commodity in a specified year is called
a price relative.
Relative: The value of a variable in a given year (or place) divided by the
value of the same variable in a specified year (or place) is called a relative.
It is generally expressed as a percentage.

15.11 Terminal Questions


1. What is an index number? State its utility.
2. Discuss the problems of:
a. Selection of the base year
b. Selection of weights in the construction of index numbers
3. What are the characteristics of an index number?
4. Construct Fisher’s ideal index for the data depicted in table 15.12.

Table 15.12: Price of Commodities for the Years 1997 and 2005
Commodity   Base year 1997   Current year 2005
            Price   Qty      Price   Qty
A           16      110      25      132
B           5       220      5       264
C           10      132      15      165
D           25      66       30      55

5. Table 15.13 depicts the prices of commodities along with the weights
of the respective commodities. Calculate the index number for 2000 based
on the year 1995.
Table 15.13: Price of Commodities along with the Weights
Commodity   1995   2000    Weights
A           0.50   0.75    2
B           0.60   0.75    5
C           2.00   2.40    4
D           1.80   2.10    8
E           8.00   10.00   1

15.12 Answers

Self Assessment Questions


1. For the data in table 15.3, we can calculate the price index number of
2002 based on 2001 as:
$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$
where, ∑P1 = Total of prices in 2002 = 800
∑P0 = Total of prices in 2001 = 500
$$\text{Therefore, } P_{01} = \frac{800}{500} \times 100 = 160$$
This means that the price has increased by 60% in 2002 as compared to
2001.
2. The table 15.14 depicts the price of items along with the weighted price.

Table 15.14: Price of Items along with the Weighted Prices
Item              P     W (weight)   PW
Food              200   50           10000
Clothing          175   10           1750
Fuel & Lighting   160   12           1920
Housing           225   15           3375
Miscellaneous     150   13           1950
Total                   ∑W = 100     ∑PW = 18995

Consumer price index number by family budget method is given by:
$$P_{01} = \frac{\sum PW}{\sum W} = \frac{18995}{100} = 189.95$$
Hence, the consumer price index number by family budget method is
189.95.

Terminal Questions
1. Refer section 15.2, section 15.5.1, section 15.8
2. Refer section 15.2.5
3. Refer section 15.2.4
4. The Fisher ideal index is approximately 134.70
5. The required index number for the year 2000 is 123.3

15.13 Case Study


In Asian countries alcohol consumption rates are increasing among the
youth. A market research company wanted to conduct a survey related to
this topic. It collected the percentages of 15- and 16-year-olds who admitted
to being drunk four times or more in a 30-day period in 2003.
Table 15.15: Data based on Survey
Country      Percentage
India        25
China        26
Pakistan     20
Nepal        5
Sri Lanka    15
Bangladesh   4
Bhutan       29
Burma        10
Maldives     6
Iran         10
i. Using India as a base, develop a relative regional index for the
percentage of 15- and 16-year-olds who admitted to being drunk four
times or more in a 30-day period.
ii. Using the index for India developed in question i, how would you
describe the percentage of 15- and 16-year-olds who admitted to being
drunk four times or more in a 30-day period in Bhutan, Bangladesh and
Sri Lanka?
iii. Using Pakistan as the base, develop a relative regional index for the
percentage of 15- and 16-year-olds who admitted to being drunk four
times or more in a 30-day period.
iv. Using the index for Pakistan developed in question iii, how would you
describe the percentage of 15- and 16-year-olds who admitted to being
drunk four times or more in a 30-day period in Bhutan, Bangladesh and
Sri Lanka?
v. Using China as a base, develop a relative regional index for the
percentage of 15- and 16-year-olds who admitted to being drunk four
times or more in a 30-day period.
vi. Using the index for China developed in question v, how would you
describe the percentage of 15- and 16-year-olds who admitted to being
drunk four times or more in a 30-day period in Bhutan, Bangladesh and
Sri Lanka?
vii. Based on the data, what general conclusion can you draw?

