Chapter 3 Measure of Variation Dhiraj (Becon 2025)
DHIRAJ GIRI
KATHMANDU UNIVERSITY
SCHOOL OF ARTS
2025
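Is rainfall at the three stations A, B, and C alike?
Annual Rainfall at Different Stations         Mean   Median   Mode
Station A: 50  50  50  50  50  50              50     50      50
Station B: 48  49  50  50  51  52              50     50      50
Station C: 42  46  50  50  54  58              50     50      50
All three stations have the same mean, median and mode (50), yet their observations are spread out very differently.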
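As a quick check of the table, the short Python sketch below (an illustration only, using the standard-library `statistics` module) confirms that the three stations share the same averages while their ranges differ.

```python
from statistics import mean, median, mode

# Annual rainfall readings for the three stations, as listed in the table
stations = {
    "A": [50, 50, 50, 50, 50, 50],
    "B": [48, 49, 50, 50, 51, 52],
    "C": [42, 46, 50, 50, 54, 58],
}

summary = {}
for name, values in stations.items():
    summary[name] = {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        # Range = largest observation - smallest observation
        "range": max(values) - min(values),
    }

for name, stats in summary.items():
    print(name, stats)
```

The averages are 50 for every station, while the ranges are 0, 4 and 16, which is exactly the information a measure of variation adds.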
Range
• Range = Largest observation − Smallest observation
• Sensitive to outliers
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5
Range = 5 − 1 = 4
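Quartiles
[Figure: ordered data from the lowest observation to the highest observation, divided into four equal parts by Q1, Q2 (= Md) and Q3]
Q1 = Lower quartile (First quartile) = Value of $\left(\frac{n+1}{4}\right)^{\text{th}}$ item
Q2 = Middle quartile (Second quartile) = Value of $2\left(\frac{n+1}{4}\right)^{\text{th}}$ item
Q3 = Upper quartile (Third quartile) = Value of $3\left(\frac{n+1}{4}\right)^{\text{th}}$ item
where n is the number of observations, arranged in ascending order.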
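The (n + 1)/4 rule gives a position that may be fractional (for example 2.25). The slides do not say how to handle such positions; the sketch below assumes the common convention of linear interpolation between the two neighbouring items, which is also what the default ("exclusive") method of Python's `statistics.quantiles` does. The function names are illustrative, and the sample reuses the data 10, 12, 14, 15, 17, 18, 18, 24 from the standard deviation example later in the chapter.

```python
import math

def value_at_position(sorted_values, pos):
    """Value of the pos-th item (1-indexed), interpolating linearly
    when pos falls between two items; clamped to the ends of the data."""
    n = len(sorted_values)
    pos = min(max(pos, 1), n)          # clamp to the first/last item
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return sorted_values[lower - 1]
    # Move frac of the way from the lower-th item to the next one
    return sorted_values[lower - 1] + frac * (sorted_values[lower] - sorted_values[lower - 1])

def quartile(data, i):
    """Q_i = value of the i(n + 1)/4-th item of the ordered data (i = 1, 2, 3)."""
    xs = sorted(data)
    return value_at_position(xs, i * (len(xs) + 1) / 4)

sample = [10, 12, 14, 15, 17, 18, 18, 24]
print([quartile(sample, i) for i in (1, 2, 3)])   # [12.5, 16.0, 18.0]
```

Here Q1 sits at position 2.25 (a quarter of the way from 12 to 14), Q2 at position 4.5 (the usual median of an even-sized set) and Q3 at position 6.75.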
Deciles
• Deciles are the nine points which divide the distribution
into ten equal parts.
• Thus there are nine deciles: D1, D2, D3, …, D8, D9.
[Figure: scale from 0% (lowest observation) to 100% (highest observation) in steps of 10%, with Md at 50%]
D1 = First decile = Value of $\left(\frac{n+1}{10}\right)^{\text{th}}$ item
D2 = Second decile = Value of $2\left(\frac{n+1}{10}\right)^{\text{th}}$ item
D9 = Ninth decile = Value of $9\left(\frac{n+1}{10}\right)^{\text{th}}$ item
Percentiles
• Percentiles are the 99 points which divide the distribution into one hundred equal parts. Thus there are ninety-nine percentiles: P1, P2, …, P98, P99.
[Figure: scale from 0% (lowest observation) through 50% (Md), 98% and 99% to 100% (highest observation)]
P1 = First percentile = Value of $\left(\frac{n+1}{100}\right)^{\text{th}}$ item
P2 = Second percentile = Value of $2\left(\frac{n+1}{100}\right)^{\text{th}}$ item
P99 = Ninety-ninth percentile = Value of $99\left(\frac{n+1}{100}\right)^{\text{th}}$ item
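Quartiles, deciles and percentiles all follow the same positional rule, the i(n + 1)/k-th item with k = 4, 10 or 100. A minimal Python sketch of that shared rule, again assuming linear interpolation for fractional positions (the slides do not specify this); the data 1, 2, …, 99 are hypothetical and chosen so that every position is a whole number.

```python
import math

def fractile(data, i, parts):
    """Value of the i(n + 1)/parts-th item of the ordered data.
    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles."""
    xs = sorted(data)
    n = len(xs)
    pos = min(max(i * (n + 1) / parts, 1), n)   # clamp to the first/last item
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return xs[lower - 1]
    # Linear interpolation between the lower-th and (lower + 1)-th items
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

data = list(range(1, 100))          # hypothetical data: 1, 2, ..., 99
print(fractile(data, 1, 10))        # D1  -> 10
print(fractile(data, 9, 10))        # D9  -> 90
print(fractile(data, 50, 100))      # P50 -> 50, the median
```

Because D5, P50 and Q2 all point at the same position (n + 1)/2, they coincide with the median.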
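How to compute quartiles, deciles and percentiles in the case of a grouped frequency distribution
$Q_i = L + \frac{\frac{iN}{4} - cf}{f} \times h \quad \text{for } i = 1, 2, 3$
$D_i = L + \frac{\frac{iN}{10} - cf}{f} \times h \quad \text{for } i = 1, 2, \ldots, 9$
$P_i = L + \frac{\frac{iN}{100} - cf}{f} \times h \quad \text{for } i = 1, 2, \ldots, 99$
where L = lower limit of the class containing the required quartile, decile or percentile, cf = cumulative frequency of the class preceding that class, f = frequency of that class, h = width of that class, and N = total frequency.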
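The interpolation formula can be sketched in Python as follows, assuming classes listed in ascending order with a common width h. The frequency table is hypothetical (not from the slides), and `grouped_quantile` is an illustrative name.

```python
def grouped_quantile(lower_limits, freqs, width, i, parts):
    """Q_i, D_i or P_i of a grouped distribution via
    L + ((i*N/parts - cf) / f) * h, assuming equal class widths h."""
    N = sum(freqs)
    target = i * N / parts          # iN/4, iN/10 or iN/100
    cf = 0                          # cumulative frequency before the current class
    for L, f in zip(lower_limits, freqs):
        if cf + f >= target:        # first class whose cumulative frequency reaches the target
            return L + (target - cf) / f * width
        cf += f
    raise ValueError("target position lies beyond the last class")

# Hypothetical distribution: classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_limits = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]
print(grouped_quantile(lower_limits, freqs, 10, 1, 4))            # Q1 = 16.25
print(round(grouped_quantile(lower_limits, freqs, 10, 2, 4), 4))  # median = 25.8333
print(grouped_quantile(lower_limits, freqs, 10, 9, 10))           # D9 = 42.0
```

For Q1, N/4 = 10 falls in the class 10-20 (cf = 5, f = 8), so Q1 = 10 + (10 − 5)/8 × 10 = 16.25.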
Quartile Deviation or Semi-Interquartile Range
• A measure of dispersion based on quartiles is known as
Quartile Deviation or Semi-Interquartile Range.
• Inter-Quartile Range = Q3 - Q1
• Half of the inter-quartile range is called Semi-Interquartile
Range, which is also known as Quartile Deviation (QD).
$QD = \frac{Q_3 - Q_1}{2}$
$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
$\text{Midhinge} = \frac{Q_1 + Q_3}{2}$
• It is not affected by extreme values.
• QD is based on the central 50% of the observations.
• It can be calculated for distributions having open-ended classes.
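A small Python sketch of the three quartile-based formulas. As an assumed illustration it takes Q1 = 12.5 and Q3 = 18, the quartiles of the sample 10, 12, 14, 15, 17, 18, 18, 24 under the (n + 1)/4 rule with linear interpolation.

```python
def quartile_measures(q1, q3):
    """Interquartile range, quartile deviation, its coefficient and the midhinge."""
    iqr = q3 - q1
    return {
        "IQR": iqr,
        "QD": iqr / 2,                      # semi-interquartile range
        "coeff_QD": (q3 - q1) / (q3 + q1),  # relative, unit-free measure
        "midhinge": (q1 + q3) / 2,
    }

print(quartile_measures(12.5, 18))
```

Since the coefficient of QD is a ratio of two quantities in the same units, it can be used to compare distributions measured in different units.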
The Interquartile Range (IQR)
• The IQR is Q3-Q1 and measures the spread in the middle
50% of the data
• The IQR is also called the midspread because it covers the
middle 50% of the data
• The IQR is a measure of variability that is not influenced by
outliers or extreme values
• Measures like Q1, Q3 and IQR that are not influenced by
outliers are called resistant measures
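To illustrate why the IQR is called resistant, the sketch below takes the 25 observations from the range example and then replaces the largest value 5 by 50 (an invented outlier). It uses `statistics.quantiles`, whose default "exclusive" method places quartiles at positions p(n + 1), matching the (n + 1)/4 rule.

```python
from statistics import quantiles

# The 25 observations from the range example
data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]

def range_and_iqr(values):
    # quantiles(..., n=4) returns [Q1, Q2, Q3]
    q1, _, q3 = quantiles(values, n=4)
    return max(values) - min(values), q3 - q1

print(range_and_iqr(data))               # (4, 1.5)
print(range_and_iqr(data[:-1] + [50]))   # (49, 1.5): the range jumps, the IQR does not move
```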
Mean Deviation (Average Deviation)
• Mean deviation or average deviation is the arithmetic mean
of the deviations of a given set of observations from their
average taking all the deviations as positive.
• Mean deviation shows the scatter of the observations around an average.
• The mean deviation is also known as average absolute
deviation.
• It shows the average amount by which the items differ from
the mean or the median.
• While calculating the mean deviation, we ignore the fact that some deviations are positive and some are negative (deviations from the mean would otherwise sum to zero). This is done because, when measuring dispersion, we are interested in the amount of variation and not its direction.
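For an individual data set
Mean deviation from mean $= \frac{1}{n}\sum |x - \bar{x}|$
Mean deviation from median $= \frac{1}{n}\sum |x - \text{Med}|$
Mean deviation from mode $= \frac{1}{n}\sum |x - \text{Mode}|$
For a frequency distribution
Mean deviation from mean $= \frac{1}{N}\sum f\,|x - \bar{x}|$
Mean deviation from median $= \frac{1}{N}\sum f\,|x - \text{Med}|$
Mean deviation from mode $= \frac{1}{N}\sum f\,|x - \text{Mode}|$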
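The formulas above translate directly into Python. This sketch (function names are illustrative) applies them to the station C rainfall from the opening table, first as individual data and then written as a frequency distribution.

```python
from statistics import mean, median, mode

def mean_deviation(values, centre):
    """(1/n) * sum |x - centre|: every deviation is taken as positive."""
    return sum(abs(x - centre) for x in values) / len(values)

def mean_deviation_freq(xs, fs, centre):
    """(1/N) * sum f|x - centre| for a frequency distribution."""
    N = sum(fs)
    return sum(f * abs(x - centre) for x, f in zip(xs, fs)) / N

station_c = [42, 46, 50, 50, 54, 58]                  # rainfall at station C
print(mean_deviation(station_c, mean(station_c)))     # about the mean: 4.0
print(mean_deviation(station_c, median(station_c)))   # about the median: 4.0
print(mean_deviation(station_c, mode(station_c)))     # about the mode: 4.0

# The same data written as values with frequencies
print(mean_deviation_freq([42, 46, 50, 54, 58], [1, 1, 2, 1, 1], 50))   # 4.0
```

All three results agree here only because mean, median and mode are all 50 for station C; for station B the same code gives 1.0, showing its smaller spread.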
Mean Deviation (Average Deviation)
Merits
i) It is rigidly defined
ii) It is based on all observations of the series.
iii) It is simple to understand and easy to calculate.
iv) It is less affected by extreme observations than the standard deviation.
Demerits
i) It ignores the signs of the deviations.
ii) It cannot be computed for distributions with open-end classes.
iii) It is not suitable for further mathematical treatment.
iv) It is affected by sampling fluctuations.
v) It does not give satisfactory results when deviations are taken from the mode and the mode is ill-defined.
Standard Deviation
• The standard deviation is the square root of the arithmetic mean of the squared deviations of a given set of observations from their mean, and is denoted by σ (sigma).
• It is free from the main defects of the other measures: it is rigidly defined, based on all observations and suitable for further algebraic treatment. It has the same units as the original data.
• The standard deviation is regarded as the best measure of dispersion and is the most widely used in practice.
• Thus, for a given set of observations, the formula for calculating the standard deviation is given by
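Population Standard Deviation
In the case of individual data
$\sigma = \sqrt{\frac{1}{N}\sum x^2 - \left(\frac{1}{N}\sum x\right)^2}$
Sample Standard Deviation
$S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$
Where, $\bar{X}$ = sample mean and n = sample size.
Sample data $X_i$: 10 12 14 15 17 18 18 24
n = 8, Mean $\bar{X} = 16$
$S = \sqrt{\frac{130}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
A measure of the "average" scatter around the mean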
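The worked example can be reproduced in Python as below. The sketch also evaluates the population (divisor N) shortcut formula on the same data, purely to show how the two divisors differ; `statistics.stdev` and `statistics.pstdev` are used only as cross-checks.

```python
import math
from statistics import stdev, pstdev

X = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(X)
x_bar = sum(X) / n                              # 16.0
ss = sum((x - x_bar) ** 2 for x in X)           # sum of squared deviations = 130.0
S = math.sqrt(ss / (n - 1))                     # sample SD, divisor n - 1
print(round(S, 4))                              # 4.3095
print(round(stdev(X), 4))                       # standard-library cross-check

# Population-style SD with the shortcut formula (divisor N)
sigma = math.sqrt(sum(x * x for x in X) / n - (sum(X) / n) ** 2)
print(round(sigma, 4), round(pstdev(X), 4))     # 4.0311 4.0311
```

Dividing by n − 1 instead of n makes S slightly larger than σ for the same data (4.3095 versus 4.0311 here).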
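Comparing Standard Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.567
All three data sets have the same mean, but data B is the least spread out and data C the most.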
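Combined Standard Deviations
$\sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}}$
Where,
$N_1, N_2$ = numbers of observations and $\sigma_1, \sigma_2$ = standard deviations of the two series
$d_1 = \bar{x}_1 - \bar{x}_{12}$
$d_2 = \bar{x}_2 - \bar{x}_{12}$
$\bar{x}_{12} = \frac{N_1\bar{x}_1 + N_2\bar{x}_2}{N_1 + N_2}$ = combined mean of the two series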
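A Python sketch of the combined standard deviation formula. It assumes σ1 and σ2 are computed with divisor N (population form), in which case the formula reproduces exactly the SD of the pooled data; the two series are hypothetical.

```python
import math
from statistics import mean, pstdev

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and SD of two series from their sizes, means and (divisor-N) SDs."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    var = (n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2)
    return combined_mean, math.sqrt(var)

# Hypothetical series; the second print pools the raw data directly as a check
a = [4, 6, 8, 10]
b = [10, 14, 18]
print(combined_sd(len(a), mean(a), pstdev(a), len(b), mean(b), pstdev(b)))
print(mean(a + b), pstdev(a + b))
```

The terms $N_1 d_1^2 + N_2 d_2^2$ account for the spread between the two group means, which the individual SDs alone would miss.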
Measures of Variation: Summary Characteristics
• The more the data are spread out, the greater the range,
variance, and standard deviation.
• The more the data are concentrated, the smaller the range,
variance, and standard deviation.
• If the values are all the same (no variation), all these
measures will be zero.
• None of these measures are ever negative.
The Coefficient of Variation
• Measures relative variation
• Expressed as a percentage (%)
• Shows variation relative to mean
• Can be used to compare the variability of two or more sets
of data measured in different units
$CV = \frac{S}{\bar{X}} \times 100\%$
Comparing Coefficients of Variation
• Stock A:
  • Mean price last year = $50
  • Standard deviation = $5
  $CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$
• Stock B:
  • Mean price last year = $100
  • Standard deviation = $5
  $CV_B = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its mean price.
Comparing Coefficients of Variation
• Stock A:
  • Mean price last year = $50
  • Standard deviation = $5
  $CV_A = \frac{S}{\bar{X}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$
• Stock C:
  • Mean price last year = $8
  • Standard deviation = $2
  $CV_C = \frac{S}{\bar{X}} \times 100\% = \frac{\$2}{\$8} \times 100\% = 25\%$
Stock C has a much smaller standard deviation but a much higher coefficient of variation.
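The three stock comparisons reduce to one line of arithmetic each. A minimal Python sketch (the function name is illustrative); multiplying by 100 before dividing keeps these particular results exact in floating point.

```python
def coefficient_of_variation(sd, mean):
    """CV = (S / mean) * 100, in percent."""
    return 100 * sd / mean

# (standard deviation, mean price last year), in dollars
stocks = {"A": (5, 50), "B": (5, 100), "C": (2, 8)}
for name, (sd, mean_price) in stocks.items():
    print(name, coefficient_of_variation(sd, mean_price), "%")   # A 10.0, B 5.0, C 25.0
```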
Shape of a Distribution
• Describes how data are distributed
• Two useful shape related statistics are:
• Skewness
▪ Measures the extent to which data values are not
symmetrical
• Kurtosis
▪ Kurtosis measures the peakedness of the curve of
the distribution—that is, how sharply the curve rises
approaching the center of the distribution
Shape of a Distribution (Skewness)
• Measures the extent to which data is not symmetrical
Shape of a Distribution
Kurtosis Measures How Sharply the Curve Rises
Approaching the Center of the Distribution
Five Number Summary and the Boxplot
• The five numbers that help describe the center, spread and shape of data are:
• Smallest observation
• First quartile (Q1)
• Median (Q2)
• Third quartile (Q3)
• Largest observation
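A short Python sketch of the five-number summary, applied to the 25 observations from the range example. It relies on `statistics.quantiles`, whose default method places quartiles at positions p(n + 1), consistent with the (n + 1)/4 rule.

```python
from statistics import quantiles

def five_number_summary(values):
    """(smallest, Q1, median, Q3, largest)."""
    q1, q2, q3 = quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]   # data from the range example
print(five_number_summary(data))   # (1, 1.0, 2.0, 2.5, 5)
```

In a boxplot, the box runs from Q1 to Q3 with a line at the median, and the whiskers extend towards the smallest and largest observations. The long right whisker here (from 2.5 up to 5) reflects the data's right skew.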
– Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Where,
μ = population mean
N = population size
Xi = ith value of the variable X
Numerical Descriptive Measures for a Population:
The Variance Sigma Squared
• Average of squared deviations of values from the mean.
– Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$
Where,
μ = population mean
N = population size
Xi = ith value of the variable X
Numerical Descriptive Measures for a Population: The
Standard Deviation Sigma
$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}$
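A minimal Python sketch of the population formulas (divisor N). As an assumption for illustration only, the six station C readings are treated as the whole population; `pvariance` and `pstdev` serve as standard-library cross-checks.

```python
import math
from statistics import pvariance, pstdev

def population_measures(X):
    N = len(X)
    mu = sum(X) / N                              # population mean
    var = sum((x - mu) ** 2 for x in X) / N      # sigma squared, divisor N
    return mu, var, math.sqrt(var)

X = [42, 46, 50, 50, 54, 58]                     # treated as a complete population
mu, var, sigma = population_measures(X)
print(mu, var, sigma)
print(pvariance(X), pstdev(X))                   # same values from the standard library
```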
Sample Statistics Versus Population Parameters
The Empirical Rule
• The empirical rule approximates the variation of data in a
bell-shaped distribution
• Approximately 68% of the data in a bell-shaped distribution lie within 1 standard deviation of the mean, i.e., within μ ± 1σ
Using the Empirical Rule
• Suppose that the variable Math SAT scores is bell-shaped with a mean of 500 and a standard deviation of 90. Then approximately 68% of all test takers scored between 410 and 590 (500 ± 90).
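A sketch of the SAT example in Python. It assumes, beyond the slide, that the scores follow an exact normal distribution (the empirical rule is an approximation for bell-shaped data), and it adds k = 2 and k = 3, for which the rule is commonly quoted as about 95% and 99.7%. `statistics.NormalDist` supplies the exact normal shares.

```python
from statistics import NormalDist

mu, sigma = 500, 90    # Math SAT example: bell-shaped, mean 500, SD 90

# Intervals mu ± k*sigma for k = 1, 2, 3
intervals = {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

# Share of an exact normal distribution falling inside each interval
scores = NormalDist(mu, sigma)
for k, (low, high) in intervals.items():
    share = scores.cdf(high) - scores.cdf(low)
    print(f"k = {k}: {low} to {high} contains about {share:.1%} of scores")
```

For a normal distribution the shares are roughly 68.3%, 95.4% and 99.7%, which is where the rounded percentages of the empirical rule come from.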
Algebraic Properties of Standard Deviation
Standard deviation has many algebraic properties that make it suitable for further algebraic treatment, which is why it is considered the best of the measures of dispersion. Some of its algebraic properties are listed below.
1. It is independent of change of origin. This means that the value of the standard deviation remains the same even if a constant is added to or subtracted from each item of a series.
Proof. Let there be 5 items in a series: 1, 2, 3, 4 and 5. The standard deviation of these items as such, and after a change in their origin, is computed as follows.
$\sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
Thus, in all cases of change of origin, the standard deviation remains the same.
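The computation behind this property can be sketched in Python with the same shortcut formula for σ. The constants 10 and −3 are arbitrary choices for illustration.

```python
import math

def sd(values):
    """sigma = sqrt(sum x^2 / n - (sum x / n)^2), the formula used in the proof."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)

series = [1, 2, 3, 4, 5]
for c in (0, 10, -3):                  # constant added to every item (0 = original series)
    shifted = [x + c for x in series]
    print(c, sd(shifted))              # sqrt(2) ≈ 1.4142 in every case
```

For the original series, Σx = 15 and Σx² = 55, so σ = √(55/5 − 3²) = √2; adding 10 to every item gives Σx = 65 and Σx² = 855, and again σ = √(171 − 169) = √2.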
2. It is not independent of change of scale. This means that if each item of the series is multiplied or divided by a constant, the standard deviation is multiplied or divided by the same constant (by its absolute value, if the constant is negative).
Proof. Let there be 5 items in a series: 5, 10, 15, 20 and 25. The standard deviation of these items as such, and after a change in their scale, is computed as follows.
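A Python sketch of the scale property for the series 5, 10, 15, 20, 25. The multipliers are arbitrary illustrations: 0.2 corresponds to dividing each item by 5, and −3 shows that a negative constant scales σ by its absolute value.

```python
import math

def sd(values):
    """Population standard deviation (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

series = [5, 10, 15, 20, 25]
base = sd(series)                          # sqrt(50) ≈ 7.0711
for k in (2, 0.2, -3):                     # each item multiplied by k
    scaled = [k * x for x in series]
    print(k, sd(scaled), abs(k) * base)    # the two values agree
```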
Measures of Skewness
Various measures of skewness (Sk) are :
(1) Sk = Mean − Median = M − Md, or Sk = Mean − Mode = M − Mo (Pearson's Measure)
(2) Sk = (Q3 − Md) − (Md − Q1) = Q3 + Q1 − 2Md (Bowley's Measure)
(3) Sk = (P90 − P50) − (P50 − P10) = P90 + P10 − 2P50 (Kelly's Measure)
These are absolute measures of skewness and are not of much practical utility, for the following reasons:
(i) Since the absolute measures of skewness involve the units of
measurement, they cannot be used for comparative study of the
two distributions measured in different units of measurement.
(ii) Even if the distributions have the same units of measurement, the absolute measures are not recommended, because we may come across different distributions which have more or less identical skewness (absolute measures) but which differ widely in their dispersion.
• Thus for comparing two or more distributions for skewness we
compute the relative measures of skewness, also commonly known
as coefficients of skewness which are pure numbers independent of
the units of measurement.
• Moreover, in a relative measure of skewness, the disturbing factor of
variation or dispersion is eliminated by dividing the absolute measure
of skewness by a suitable measure of dispersion.
• The following are the coefficients of skewness which are commonly
used.
Karl Pearson’s Coefficient of Skewness.
• This is given by the formula : Sk = (Mean – Mode)/ SD = (M – Mo)/σ
• But quite often, mode is ill-defined and is thus quite difficult to
locate. In such a situation, we use the following empirical relationship
between the mean, median and mode for a moderately
asymmetrical (skewed) distribution : Mo = 3Md – 2M
• Substituting in the above, we get Sk = [M − (3Md − 2M)]/σ = 3(M − Md)/σ
1. Theoretically, Karl Pearson’s Coefficient of Skewness lies between the limits
± 3, but these limits are rarely attained in practice
2. Skewness is zero if M = Mo = Md.
• In other words, for a symmetrical distribution mean, mode and median
coincide i.e., M = Md = Mo.
3. Sk > 0, if M > Md > Mo or if Mo < Md < M
• Thus, for a positively skewed distribution, the value of the mean is the
greatest of the three measures and the value of mode is the least of the
three measures.
• If the distribution is negatively skewed, then Sk < 0, i.e., M < Md < Mo (equivalently, Mo > Md > M).
In other words, for a negatively skewed distribution, of the three
measures of central tendency viz., mean, median and mode, the mode
has the maximum value and the mean has the least value.
4. While dispersion studies the degree of variation in the given distribution, skewness studies the direction of variation. Extreme variations towards the higher values of the variable give a positively skewed distribution, while in a negatively skewed distribution the extreme variations are towards the lower values of the variable.
5. In Pearson’s coefficient of skewness, the disturbing factor of
variation is eliminated by dividing the absolute measure of skewness
M – Mo by the measure of dispersion σ (standard deviation).
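Both forms of Karl Pearson's coefficient can be computed in a few lines of Python. The data are hypothetical, with a long tail towards higher values, and σ is assumed to use divisor N (`pstdev`); with divisor n − 1 the numbers change slightly but not their sign.

```python
from statistics import mean, median, mode, pstdev

def pearson_skewness(values):
    """Return (M - Mo)/sigma and the median-based form 3(M - Md)/sigma."""
    M, Md, Mo = mean(values), median(values), mode(values)
    sigma = pstdev(values)
    return (M - Mo) / sigma, 3 * (M - Md) / sigma

data = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]   # hypothetical right-skewed data
sk_mode, sk_median = pearson_skewness(data)
print(mean(data), median(data), mode(data))   # 5 4.0 3, so M > Md > Mo
print(round(sk_mode, 3), round(sk_median, 3))  # both positive
```

The two forms agree in sign but not in value, because the empirical relation Mo = 3Md − 2M holds only approximately for this data.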
Remarks
1. Bowley’s coefficient of skewness is also known as Quartile
coefficient of skewness and is especially useful in situations where
quartiles and median are used :
(i) When the mode is ill-defined and extreme observations are
present in the data.
(ii) When the distribution has open end classes or unequal class
intervals. In these situations, Pearson’s coefficient of skewness
cannot be used.
2. From Bowley's measure above, we observe that Sk = 0 if Q3 − Md = Md − Q1.
This implies that for a symmetrical distribution (Sk = 0), the median is equidistant from the upper and lower quartiles.
Moreover, skewness is positive if Q3 − Md > Md − Q1, and negative if Q3 − Md < Md − Q1.
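Bowley's absolute measure Q3 + Q1 − 2Md is usually made unit-free by dividing by Q3 − Q1, giving the quartile coefficient of skewness (Q3 + Q1 − 2Md)/(Q3 − Q1), which lies between −1 and +1. That standard form is not written out in these notes, so treat the sketch below as an assumption-based illustration; it computes quartiles with `statistics.quantiles`, whose default method uses the p(n + 1) positions.

```python
from statistics import quantiles

def bowley_skewness(values):
    """Quartile coefficient of skewness: (Q3 + Q1 - 2Md) / (Q3 - Q1)."""
    q1, md, q3 = quantiles(values, n=4)
    return (q3 + q1 - 2 * md) / (q3 - q1)

skewed = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]   # hypothetical right-skewed data
symmetric = [1, 2, 3, 4, 5, 6, 7]
print(bowley_skewness(skewed))     # positive: Q3 - Md > Md - Q1
print(bowley_skewness(symmetric))  # 0.0: median equidistant from the quartiles
```

For the skewed data Q1 = 3, Md = 4 and Q3 = 6.5, so the coefficient is (6.5 + 3 − 8)/(6.5 − 3) = 3/7.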
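1. Raw Moments (About the Origin): $\mu'_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k$
where $x_i$ is each data point, k is the order of the moment, and n is the number of data points.
2. Central Moments (About the Mean): These are calculated about the mean (μ) and are given by:
$\mu_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^k$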
1. First Moment:
The mean (μ), which represents the central location of the data.
2. Second Moment:
The variance (σ²), which measures the spread or dispersion of the
data. It is the average squared deviation from the mean.
3. Third Moment:
Related to skewness, which indicates the asymmetry of the
distribution. A positive skew suggests a longer tail on the right, while
a negative skew indicates a longer tail on the left.
4. Fourth Moment:
Related to kurtosis, which measures the "tailedness" or the peak of
the distribution. High kurtosis indicates heavy tails, while low
kurtosis suggests lighter tails.
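A minimal Python sketch of raw moments (about the origin, $\frac{1}{n}\sum x_i^k$) and central moments (about the mean, $\frac{1}{n}\sum (x_i - \mu)^k$). The function names and data are illustrative.

```python
def raw_moment(xs, k):
    """k-th raw moment about the origin: (1/n) * sum x_i**k."""
    return sum(x ** k for x in xs) / len(xs)

def central_moment(xs, k):
    """k-th central moment about the mean: (1/n) * sum (x_i - mu)**k."""
    mu = raw_moment(xs, 1)
    return sum((x - mu) ** k for x in xs) / len(xs)

xs = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]               # hypothetical data
print([raw_moment(xs, k) for k in (1, 2)])            # [5.0, 33.2]: mean and mean of squares
print([central_moment(xs, k) for k in (1, 2, 3, 4)])  # [0.0, 8.2, 31.8, 261.4]
```

The first central moment is always zero, and the second equals the second raw moment minus the square of the first (33.2 − 5² = 8.2), which is the shortcut formula for the variance.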
Importance of Moments
1. Central Tendency (1st Moment)
• Formula: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
Interpretation: The mean tells us where the center of the data lies.
2. Variability (2nd Moment)
• The second central moment is the variance, which measures the
spread or dispersion of the data around the mean.
• Formula: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
Interpretation:
• A larger variance indicates that the data points are spread out more
widely from the mean, while a smaller variance indicates they are
closer to the mean.
3. Skewness (3rd Moment)
• The third central moment describes the skewness of the dataset,
which measures the asymmetry of the distribution.
• Formula (Standardized Skewness): $\gamma_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^3}{\sigma^3}$
Interpretation:
• Positive Skewness: The right tail is longer, and the data is skewed to
the right.
• Negative Skewness: The left tail is longer, and the data is skewed to
the left.
• Zero Skewness: The distribution is symmetric.
4. Kurtosis (4th Moment)
• The fourth central moment relates to kurtosis, which measures the
"tailedness" or the peakiness of the distribution.
• Formula (Excess Kurtosis): $\gamma_2 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^4}{\sigma^4} - 3$
Interpretation:
• High Kurtosis (>0): Indicates heavy tails and sharp peaks
(leptokurtic distributions).
• Low Kurtosis (<0): Indicates light tails and flatter peaks
(platykurtic distributions).
• Normal Kurtosis (≈0): Indicates a mesokurtic distribution like
the normal distribution
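The standardized skewness and excess kurtosis can be computed from the central moments as sketched below. This uses the plain moment (divisor n) definitions; statistical packages often apply small-sample corrections, so their values may differ slightly. Both data sets are hypothetical.

```python
import math

def moment_shape(xs):
    """Standardized skewness mu3/sigma^3 and excess kurtosis mu4/sigma^4 - 3,
    using moments with divisor n."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    sigma = math.sqrt(m2)
    return m3 / sigma ** 3, m4 / m2 ** 2 - 3

skew, ex_kurt = moment_shape([2, 3, 3, 3, 4, 4, 5, 6, 8, 12])
print(round(skew, 3), round(ex_kurt, 3))   # positive skew (long right tail)
print(moment_shape([1, 2, 3, 4, 5]))       # zero skewness, negative excess kurtosis
```

The evenly spaced data 1, 2, 3, 4, 5 are symmetric (skewness 0) and flatter than a normal curve, giving an excess kurtosis of −1.3 (platykurtic).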
Summary of How Moments Describe a Dataset
1. 1st Moment (Mean): Central location.
2. 2nd Moment (Variance): Spread or variability.
3. 3rd Moment (Skewness): Asymmetry or direction of the tail.
4. 4th Moment (Kurtosis): Tailedness or shape of the peak.
– Examples: