MBA Quantitative Techniques and Analytics 03
03 Measure of Variation
Names of Sub-Units
Different Measures of Dispersion, Range, Quartile and Interquartile Range, Standard Deviation and
Variance.
Overview
The unit begins by explaining the concept of Measures of Dispersion and Significance of Dispersion.
Further, it describes the Range and Standard Deviation. The unit explains the concept of Mean
Deviation and Quartile Deviation. It also discusses the Variance and Coefficient of Variation.
Learning Objectives
Learning Outcomes
At the end of this unit, you will be able to:
Assess the basis for absolute and relative measures of dispersion
Evaluate the range
Appraise the quartile deviation
Examine the standard deviation and mean deviation
Assess the coefficient of variation
3.1 INTRODUCTION
Dispersion means the deviation, difference or spread of values from their central value. In relation
to a statistical series, it means the deviations of the various items of the series from its central value.
According to A.L. Bowley, “Dispersion is the measure of variation of the items.” Measures of dispersion
are of two types, absolute and relative, which you will study in this unit. The concept of mean deviation,
which represents the extent to which values deviate from the mean, is also discussed in this unit.
According to Clark and Schkade, average deviation is the average amount of scatter of the items in a
distribution from either the mean or the median, ignoring the signs of the deviations. The average that
is taken of the scatter is an arithmetic mean, which accounts for the fact that this measure is often
called the mean deviation. Mean Deviation is used to measure variability across a data series.
Figure: Measures of Dispersion (Absolute and Relative)
UNIT 03: Measure of Variation JGI JAIN
DEEMED-TO-BE UNIVERSITY
3.3 RANGE
Range represents the difference between the highest value and the lowest value in a data series. It is
considered a rough measure of variability because it depends only on the two extreme values of the
data series. When the highest (H) and/or the lowest (L) data point in a data series changes, the range
also changes.
The formula used to calculate range is as follows:
Range = (Highest value of data series – Lowest value of data series)
Let us learn to calculate range with the help of the preceding example in which a group of 17 people
rated a book on a 5-point scale, where 1 is the lowest rating and 5 is the highest rating. The ratings
given by the 17 people are as follows:
2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4
Now, you want to calculate the range for the data series.
To do so, you need to find the highest and lowest values of the data series. In the present case,
Highest value of data series = 5
Lowest value of data series = 1.
Range = (5 – 1)
Range = 4
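The range calculation above can be reproduced with a short Python sketch. The variable names are illustrative only; the ratings are the 17 values listed in the example.

```python
# Ratings given by the 17 people in the example
ratings = [2, 5, 3, 4, 1, 5, 4, 3, 1, 2, 5, 4, 3, 2, 1, 5, 4]

highest = max(ratings)  # highest value of the data series
lowest = min(ratings)   # lowest value of the data series

# Range = highest value - lowest value
data_range = highest - lowest
print(highest, lowest, data_range)  # 5 1 4
```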
Suppose a data set has n items arranged in ascending order. Then the quartiles are given by:
Q1 = [(n+1)/4]th item
Q2 = [(n+1)/2]th item
Q3 = [3(n+1)/4]th item
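For grouped data, the formula for the rth quartile can be given by:
$$Q_r = l_1 + \frac{r\frac{N}{4} - C}{f}\,(l_2 - l_1)$$
where l1 and l2 are the lower and upper limits of the class containing the quartile, f is the frequency
of that class, C is the cumulative frequency of the class preceding it, and N is the total frequency.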
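As an illustration of the grouped-data quartile formula Q_r = l1 + ((rN/4 – C)/f)(l2 – l1), here is a minimal Python sketch. The frequency table is hypothetical (it does not come from this unit), and the function name `grouped_quartile` is an assumption made for this example.

```python
def grouped_quartile(classes, freqs, r):
    """Return the r-th quartile (r = 1, 2 or 3) of grouped data.

    classes: list of (lower_limit, upper_limit) pairs in ascending order
    freqs:   frequency of each class
    """
    total = sum(freqs)        # N, the total frequency
    target = r * total / 4    # r * N / 4
    cumulative = 0            # C, cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= target:
            # This class contains the quartile: apply the formula
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("r must be 1, 2 or 3 and the table must not be empty")

# Hypothetical frequency table (not taken from the unit)
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]

q1 = grouped_quartile(classes, freqs, 1)
q3 = grouped_quartile(classes, freqs, 3)
print(q1, q3)  # 16.25 35.0
```

For Q1 in this hypothetical table, N/4 = 10 falls in the class 10 to 20, so l1 = 10, l2 = 20, f = 8 and C = 5, giving 10 + (5/8)(10) = 16.25.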
The interquartile range (IQR), also called the midspread, is the difference between the upper and lower
quartiles of a given data set. It measures the spread of the middle half of the data. If Q1 is the first
quartile and Q3 is the third quartile, then the IQR formula is given by:
IQR = Q3 – Q1
Let us understand the quartile with the help of an example.
Solution: Here the numbers are arranged in ascending order and the number of items, n = 7
Lower quartile, Q1 = [(n+1)/4]th item
Q1 = (7+1)/4
= 2nd item = 6
Median, Q2 = [(n+1)/2]th item
Q2 = (7+1)/2
= 4th item = 8
Upper quartile, Q3 = [3(n+1)/4]th item
Q3 = 3(7+1)/4
= 6th item
= 23
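The standard deviation (SD) of a population and of a sample is given by:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$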
The coefficient of SD is calculated by dividing the SD by the mean of the series. It is a relative measure
of dispersion.
Let us understand the concepts of SD, the coefficient of SD, and the coefficient of variation with the help
of an example.
Suppose you want to calculate the standard deviation of the weights of five friends shown in the
preceding example. Table 1 shows the data used to calculate the standard deviation, the coefficient of
standard deviation, and the coefficient of variation:
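The following Python sketch shows how these three measures relate to one another. The five weights are hypothetical values chosen for illustration and are not the figures from Table 1. The sketch assumes the five friends are treated as a whole population (divisor N) and that the coefficient of variation is the coefficient of SD expressed as a percentage.

```python
import math

# Hypothetical weights in kg (assumed for illustration; not the Table 1 data)
weights = [50, 55, 60, 65, 70]

n = len(weights)
mean = sum(weights) / n

# Population standard deviation: square root of the mean squared deviation
squared_deviations = [(w - mean) ** 2 for w in weights]
sd = math.sqrt(sum(squared_deviations) / n)

# Coefficient of SD: a relative measure, SD divided by the mean
coefficient_of_sd = sd / mean

# Coefficient of variation: coefficient of SD expressed as a percentage
coefficient_of_variation = coefficient_of_sd * 100

print(round(mean, 2), round(sd, 2))        # 60.0 7.07
print(round(coefficient_of_sd, 4))         # 0.1179
print(round(coefficient_of_variation, 2))  # 11.79
```

Because the coefficient of SD and the coefficient of variation have no units, they can be compared across series measured on different scales.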
The mean deviation of the data values can be calculated using the following procedure.
Step 1: Find the mean value of the given data values.
Step 2: Subtract the mean value from each data value and take the absolute value of each difference
(that is, ignore the minus sign).
Step 3: Find the mean of the values obtained in Step 2.
In symbols:
$$\text{Mean deviation} = \frac{\sum |X - \mu|}{N}$$
where
∑ represents the addition of values
X represents each value in the data set
µ represents the mean of the data set
N represents the number of values in the data set
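The three steps of the mean deviation procedure can be sketched in Python as follows. The data values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical data values (assumed for illustration)
values = [2, 4, 6, 8, 10]

# Step 1: find the mean of the data values
mean = sum(values) / len(values)

# Step 2: subtract the mean from each value and ignore the sign
absolute_deviations = [abs(x - mean) for x in values]

# Step 3: find the mean of the absolute deviations
mean_deviation = sum(absolute_deviations) / len(values)

print(mean)                 # 6.0
print(absolute_deviations)  # [4.0, 2.0, 0.0, 2.0, 4.0]
print(mean_deviation)       # 2.4
```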
Solution:
Arrange the data in ascending order,
i.e., 13, 16, 23, 26, 26, 35, 35, 37
n = 8
Q1 = [(n+1)/4]th item
Q1 = (8+1)/4 = 9/4
= 2.25th term
Q1 = 2nd term + 0.25(3rd term – 2nd term)
Q1 = 16 + 0.25(23 – 16)
Q1 = 17.75
Similarly,
Q2 = [(n+1)/2]th item
Q2 = (8+1)/2 = 9/2
= 4.5th term
Q2 = 4th term + 0.5(5th term – 4th term)
Q2 = 26 + 0.5(26 – 26)
Q2 = 26
And,
Q3 = [3(n+1)/4]th item
Q3 = 3(8+1)/4 = 6.75th term
Q3 = 6th term + 0.75(7th term – 6th term)
Q3 = 35 + 0.75(35 – 35)
Q3 = 35
Q.D. = (Q3 – Q1)/2
= (35 – 17.75)/2
= 17.25/2
= 8.625 ≈ 8.63
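The quartile positions and the interpolation between neighbouring terms can be checked with a short Python sketch. The helper name `quartile` is an assumption made for this illustration; it applies the [r(n+1)/4]th item rule to the eight sorted values of this example and interpolates linearly when the position is fractional. It assumes the data set is large enough for every position to fall between 1 and n.

```python
def quartile(sorted_values, r):
    """r-th quartile using the [r(n+1)/4]th item rule with linear interpolation."""
    n = len(sorted_values)
    position = r * (n + 1) / 4      # 1-based position, may be fractional
    whole = int(position)           # term just below the position
    fraction = position - whole     # how far to move towards the next term
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return float(lower)
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)

data = sorted([13, 16, 23, 26, 26, 35, 35, 37])

q1 = quartile(data, 1)  # 2.25th term: 16 + 0.25 * (23 - 16)
q2 = quartile(data, 2)  # 4.5th term:  26 + 0.5 * (26 - 26)
q3 = quartile(data, 3)  # 6.75th term: 35 + 0.75 * (35 - 35)

iqr = q3 - q1                        # interquartile range
quartile_deviation = (q3 - q1) / 2   # half of the interquartile range

print(q1, q2, q3)               # 17.75 26.0 35.0
print(iqr, quartile_deviation)  # 17.25 8.625
```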
3.6 VARIANCE
The variance is a measure of variability. It is calculated by taking the average of squared deviations
from the mean.
Variance tells you the degree of spread in your data set. The more spread out the data, the larger the
variance is in relation to the mean. Variance is expressed in squared units (for example, meters squared).
Since these units differ from those of a typical value in the data set, it is harder to interpret the
variance intuitively. That is why the standard deviation is often preferred as the main measure of
variability. The formulas for calculating variance are as follows:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
σ² = population variance          s² = sample variance
xi = value of ith element         xi = value of ith element
µ = population mean               x̄ = sample mean
N = population size               n = sample size
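The difference between the population and sample formulas is only the divisor (N versus n – 1). The Python sketch below illustrates this with hypothetical data values and cross-checks the results against the standard library's `statistics.pvariance` and `statistics.variance` functions.

```python
import statistics

# Hypothetical data values (assumed for illustration)
values = [2, 4, 6, 8, 10]

n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean
sum_squared_deviations = sum((x - mean) ** 2 for x in values)

population_variance = sum_squared_deviations / n    # divide by N
sample_variance = sum_squared_deviations / (n - 1)  # divide by n - 1

print(population_variance, sample_variance)  # 8.0 10.0

# Cross-check with the standard library
print(statistics.pvariance(values), statistics.variance(values))
```

The sample variance is always the larger of the two for the same data, because the same sum of squared deviations is divided by a smaller number.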
Statistical dispersion means the extent to which numerical data are likely to vary about an average
value.
An absolute measure of dispersion contains the same unit as the original data set.
The relative measures of dispersion are used to compare the distribution of two or more data sets.
Measures of dispersion are also known as the averages of the ‘second order’.
Range represents the difference between the highest value and the lowest value in a data series.
The quartiles are values that divide a list of numbers into quarters.
The interquartile range (IQR) is the difference between the upper and lower quartile of a given data
set and is also called a midspread.
Standard Deviation is used to measure the scattering of values in a given dataset. The symbol used
to represent standard deviation is sigma (σ).
The mean deviation is defined as a statistical measure that is used to calculate the average deviation
from the mean value of the given data set.
Quartile deviation is defined as half of the distance between the third and the first quartile.
The variance is a measure of variability. It is calculated by taking the average of squared deviations
from the mean.
The coefficient of variation is particularly useful when you want to compare results from two
different surveys or tests that have different measures or values.
3.8 GLOSSARY
Statistical dispersion: The extent to which numerical data are likely to vary about an average value
Variance: A measure of variability
Coefficient of variation: A measure of relative variability. It is the ratio of the standard deviation to
the mean (average)
Case Objective
The case study explains the importance of quality standards.
TPR Inc. was a multi-cuisine restaurant based in India. It had several outlets in the major Indian cities.
The restaurant management wanted to find out if its various outlets were meeting the established
standards of quality and customer service. It hired a consultancy firm for the purpose.
The consultants collected a large amount of data with the help of questionnaires, interviews, and
observations in the restaurant’s outlets. Then, they carefully followed the data processing steps to
analyse it and retrieve relevant and meaningful information from it.
While processing the responses in the questionnaires, they found that quite a large number of
questions were left unanswered. Instead of ignoring such questions, they proceeded systematically.
Each questionnaire comprised a series of interval questions, closed-ended questions and open-ended
questions.
In the case of interval questions, they gave a mid-value to the unanswered questions. In case of open-
ended questions, they went back to the customers and requested them to fill in the answers.
After retrieving sufficient data from the questionnaires, they classified the collected data. To do so, they
combined customers’ responses from different cities and then sub-grouped them according to their
cities.
Next, they formed a table to analyse the relationship between customers’ satisfaction and the sales of
the company:
Calculating the Correlation between Customer Satisfaction and Sales of the Company
Number of Observations   Customer Satisfaction (Xi)   Sales of Company (Yi)   Xi²   Yi²   XiYi
1 4 5 16 25 20
2 6 6 36 36 36
3 7 6 49 36 42
4 8 4 64 16 32
5 9 6 81 36 54
6 10 9 100 81 90
7 8 10 64 100 80
8 7 2 49 4 14
9 1 3 1 9 3
10 2 4 4 16 8
11 9 9 81 81 81
12 8 8 64 64 64
13 7 9 49 81 63
14 10 11 100 121 110
15 6 5 36 25 30
16 9 12 81 144 108
17 8 15 64 225 120
18 10 12 100 144 120
19 9 16 81 256 144
20 8 20 64 400 160
21 10 20 100 400 200
22 4 6 16 36 24
23 5 8 25 64 40
24 10 14 100 196 140
25 10 19 100 361 190
Total 185 239 1525 2957 1973
The correlation between the customers’ satisfaction and the sales of the company is as follows:
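$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}$$
r = (25 × 1973 – 185 × 239) / √[(25 × 1525 – 185 × 185)(25 × 2957 – 239 × 239)]
r = 5110/8095.41
r ≈ 0.63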
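The column totals and the correlation coefficient can be verified with a short Python sketch. The two lists hold the 25 pairs of values from the table; the variable names are illustrative only.

```python
import math

# Customer satisfaction (Xi) and sales of the company (Yi), observations 1 to 25
x = [4, 6, 7, 8, 9, 10, 8, 7, 1, 2, 9, 8, 7, 10, 6, 9, 8, 10, 9, 8, 10, 4, 5, 10, 10]
y = [5, 6, 6, 4, 6, 9, 10, 2, 3, 4, 9, 8, 9, 11, 5, 12, 15, 12, 16, 20, 20, 6, 8, 14, 19]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (n*sum(XY) - sum(X)*sum(Y)) / sqrt((n*sum(X^2) - sum(X)^2) * (n*sum(Y^2) - sum(Y)^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 185 239 1525 2957 1973
print(numerator, round(denominator, 2))      # 5110 8095.41
print(round(r, 2))                           # 0.63
```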
Since the correlation coefficient is positive and closer to 1 than to 0, it indicates that the relationship
between the customers’ satisfaction and the sales is positive and fairly strong. Similarly, the consultants
studied the
relationship between different variables, such as quality of service and customer satisfaction, quality
of service and established standards, and so on. Finally, they concluded that the satisfaction level of the
restaurant’s customers was positive and strong. However, the restaurant’s service level was far behind
the established quality standards.
Questions
1. What are the different steps of data processing used in the case study?
(Hint: The consultants used all the steps of data processing, that is, first they extracted the relevant
data. Then, they classified and organised the information and studied the relationship between
variables.)
2. Which type of measure is used in analysing the table and what type of analysis is used?
(Hint: The measure of relationship is used to analyse the table.)
3. What was done to unanswered questions of the questionnaires filled by customers?
(Hint: Unanswered questions were not ignored and a systematic procedure was followed to retrieve
sufficient data.)
4. How was the data retrieved from the questionnaires collected and classified?
(Hint: The customers’ responses from different cities were combined and then sub grouped according
to their cities.)
5. How was the relationship between customers’ satisfaction and the sales of the company derived?
(Hint: By forming a table and calculating correlation between customers’ satisfaction and the sales
of the company)
The relative measures of dispersion are used to compare the distribution of two or more data sets.
Refer to Section Different Measures of Dispersion
3. Range represents the difference between the highest value and the lowest value in a data series. It is
considered a rough measure of variability because it depends only on the two extreme values of the
data series. Refer to Section Range
4. Standard Deviation is used to measure the scattering of values in a given dataset. The symbol used
to represent standard deviation is sigma (σ). The variance is a measure of variability. It is calculated
by taking the average of squared deviations from the mean. Refer to Section Standard Deviation
5. Quartiles are the values that divide a list of numerical data into four equal parts. There are three
quartiles, the first, second and third, represented by Q1, Q2 and Q3, respectively. The second
quartile (the median) marks the central point of the distribution. The first quartile is the middle of
the half of the data set that lies below the median, and the third quartile is the middle of the half
that lies above the median. Together, the quartiles depict the distribution or dispersion of the data
set. Refer to Section Quartile and Interquartile Range
https://www.youtube.com/watch?v=wDAd_QHKoOg
https://www.youtube.com/watch?v=sOb9b_AtwDg