Measures of Dispersion
The mode can be used when considering the popularity of given attributes e.g. the most popular
car in a given town.
A distribution can have more than one mode. When there are two modes, it is said to be bi-modal.
The mode is denoted Xm.
Example
What is the mode of the following data set? 2, 11, 25, 11, 2, 5, 17, 38, 25, 17, 25, 13
Xm = 25 (since it appears 3 times, more often than any other value)
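As an illustration, the mode of an ungrouped data set can be found by counting how often each value occurs. The sketch below uses Python, which the text itself does not use, and the function name modes is only illustrative. It returns every value that shares the highest frequency, so a bi-modal data set yields two values.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


data = [2, 11, 25, 11, 2, 5, 17, 38, 25, 17, 25, 13]
print(modes(data))             # [25]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bi-modal data set
```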
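For grouped data, the mode is estimated from the modal class (the class with the highest frequency):

$$X_m = L_m + \frac{d_1}{d_1 + d_2} \times i$$

Where Lm = The lower limit of the modal class.
d1 = The difference between the frequencies of the modal class and that of the class just before it.
d2 = The difference between the frequencies of the modal class and that of the class just after it.
i = The class width of the modal class.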
Example
Compute the modal mark in the business statistics class last semester
Mark        Frequency
30 – 40         3
40 – 50        10
50 – 60        19
60 – 70        31
70 – 80        11
Modal class: 60 – 70, with a frequency of 31.

$$X_m = 60 + \frac{(31 - 19)}{(31 - 19) + (31 - 11)} \times 10 = 60 + \frac{12}{32} \times 10$$

Xm = 63.75
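The grouped-mode calculation can be checked with a short Python sketch. The function name grouped_mode and the way classes are passed as (lower, upper) pairs are assumptions made for this illustration. A missing neighbouring class is treated as having a frequency of zero, which is also an assumption.

```python
def grouped_mode(classes, freqs):
    """Estimate the mode of grouped data by interpolating in the modal class.

    classes: list of (lower limit, upper limit) pairs
    freqs:   list of class frequencies, in the same order
    """
    k = max(range(len(freqs)), key=lambda idx: freqs[idx])  # modal class index
    lower, upper = classes[k]
    # Assumption: a class with no neighbour is compared against a frequency of 0.
    f_before = freqs[k - 1] if k > 0 else 0
    f_after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - f_before
    d2 = freqs[k] - f_after
    return lower + d1 / (d1 + d2) * (upper - lower)


classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [3, 10, 19, 31, 11]
print(grouped_mode(classes, freqs))  # 63.75
```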
The measures of central tendency give us values that may be considered typical of the
samples or populations from which they are computed.
Measures of dispersion enable us to know how far or how near observed values are spread
from the averages. They show the extent to which such values differ from the average value
(usually the mean). When observed values are close to the mean, we say there is low dispersion.
Dispersion is also known as spread, scatter or variation. Some of the most commonly used
measures of dispersion are the range, the variance and the standard deviation.
Range
It is the difference between the highest and the lowest values in a data set.
Therefore, R = Xmax – Xmin
The range is the simplest measure of dispersion because it only uses two values. It is most useful
in cases where there are erratic changes.
Example
What is the range in the following exchange rates of the shilling to the US dollar? 75, 74, 77, 68,
69, 70, 73, 74, 68.5, 75.5, 69, 78.5, 70
Range = max – min
= 78.5 – 68
= 10.5
Example
The following data shows salaries earned by the top management of Kabete International Ltd.
105,000; 2,000,000; 300,000; 250,000; 120,000; 350,000; 130,000
Range = Max – Min
= 2,000,000 – 105,000
= 1,895,000
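Both range examples can be verified with a few lines of Python. This is only a sketch and the name value_range is illustrative. The last line drops the largest salary to show how strongly a single extreme value can affect the range.

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


rates = [75, 74, 77, 68, 69, 70, 73, 74, 68.5, 75.5, 69, 78.5, 70]
salaries = [105_000, 2_000_000, 300_000, 250_000, 120_000, 350_000, 130_000]

print(value_range(rates))     # 10.5
print(value_range(salaries))  # 1895000

# Same salaries without the largest one: the range shrinks sharply.
without_largest = sorted(salaries)[:-1]
print(value_range(without_largest))  # 245000
```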
Weakness
Range depends on only two values. This means that it can be influenced by extreme values that
may be considered to be outliers.
It does not give any indication of how the values are spread within a distribution.
To overcome this weakness, we use the inter-quartile range (IR).
The inter-quartile range is the difference between the upper quartile and the lower quartile:
IR = Q3 – Q1
Q3 – The value below which ¾ (75%) of the observations lie and above which the
remaining ¼ (25%) of the observations lie.
Q1 – The value below which 25% of the observations lie and above which 75% lie.
Another way to describe variation in data is to determine the location of the values that divide
a set of observations into equal parts. These values include the median, quartiles, deciles and
percentiles.
Quartiles
They divide an ordered data set into four equal parts. The first quartile Q1 is the value below which
25% of the observations lie and Q3 is the value below which 75% of the observations lie. When
computing quartiles for grouped data, the first step is to locate the quartile class. The location of
the quartile is found as:

$$\text{Location of } Q_j = \frac{j(n + 1)}{4}$$

where Qj is the j-th quartile, j = 1, 2, 3, and n is the number of observations.
The quartile value is then found as:

$$Q_j = L_j + \frac{\frac{jn}{4} - cf}{f} \times i$$

Where Lj - the lower limit of the quartile class
n - the total number of observations
cf - cumulative frequency up to the class before the quartile class
f - the frequency of the quartile class
i - width of the quartile class
Example
Compute the values of Q1 and Q3 for the scores in the Business Statistics course last semester
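Marks       f     cf    Positions
30 – 40      3     3     0 – 2
40 – 50     10    13     3 – 12
50 – 60     19    32    13 – 31
60 – 70     31    63    32 – 62
70 – 80     11    74    63 – 73

Location of Q3 = (74 + 1) × 3/4 = 56.25, so Q3 lies in the 60 – 70 class.

$$Q_3 = 60 + \frac{\frac{3}{4}(74) - 32}{31} \times 10 = 67.58$$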
This means that 75% of the students scored less than 67.58 marks.
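The quartile interpolation can be sketched in Python as follows, using the frequencies from the marks table given earlier for the modal mark. The function name grouped_quantile is illustrative. One assumption to note: the sketch locates the quartile class with the same jn/4 target that the value formula uses, which can differ from the (n + 1)j/4 location rule when the target falls near a class boundary. It also computes Q1, for which no working is shown in the text.

```python
def grouped_quantile(classes, freqs, fraction):
    """Interpolate the value below which `fraction` of grouped observations lie."""
    n = sum(freqs)
    target = fraction * n          # e.g. 3/4 * 74 = 55.5 for Q3
    cf = 0                         # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf + f >= target:
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    raise ValueError("fraction must lie between 0 and 1")


classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [3, 10, 19, 31, 11]

q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
print(round(q1, 2), round(q3, 2))  # 52.89 67.58
print(round(q3 - q1, 2))           # inter-quartile range, about 14.69
```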
Percentiles
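Percentiles divide an ordered data set into 100 equal parts. For grouped data, the j-th percentile
is found as:

$$P_j = L_j + \frac{\frac{jn}{100} - cf}{f} \times i$$

Where Lj - the lower limit of the percentile class
cf - cumulative frequency up to the class just before the percentile class
f - frequency of the percentile class
i - width of the percentile class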
Example 2
Compute the following percentiles for the scores in the Business Statistics course last
semester.
P10, p25, p50, p75, p60, p50
P10: location = (74 + 1) × 10/100 = 75 × 0.1 = 7.5
We look for the class containing the 7.5th observation, which is the 40 – 50 class.

$$P_{10} = 40 + \frac{\frac{10 \times 74}{100} - 3}{10} \times 10 = 44.4$$
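The remaining percentiles follow the same pattern. The Python sketch below is one possible way to compute them for the marks distribution. The names lower_limits, cum and percentile are illustrative, and the percentile class is located by searching the cumulative frequencies for the target jn/100, which is an assumption about how to automate the look-up step.

```python
from bisect import bisect_left
from itertools import accumulate

lower_limits = [30, 40, 50, 60, 70]
width = 10
freqs = [3, 10, 19, 31, 11]
cum = list(accumulate(freqs))  # cumulative frequencies: [3, 13, 32, 63, 74]
n = cum[-1]


def percentile(j):
    """Interpolate the j-th percentile (0 <= j <= 100) of the grouped marks."""
    target = j * n / 100
    k = bisect_left(cum, target)           # first class whose cf reaches the target
    cf_before = cum[k - 1] if k > 0 else 0
    return lower_limits[k] + (target - cf_before) / freqs[k] * width


for j in (10, 25, 50, 60, 75):
    print(f"P{j} = {percentile(j):.2f}")
```

P10 comes out at 44.4 and P75 at 67.58, matching the worked values for P10 and Q3.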
The two most common measures of dispersion are the variance and the standard deviation.
A data set that is more variable will have a larger variance than one that is relatively homogeneous.
The variance is the sum of the squared deviations divided by the number of observations, i.e. the
average of the squares of the deviations of the individual values from their mean. For any set of
values, the sum of squared deviations from the mean is smaller than the sum of squared deviations
from any other point.
Population variance is denoted σ² (a parameter).
Sample variance is denoted S² (a statistic).
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

                      Population                  Sample
Mean                  µ = ∑x / N                  x̄ = ∑x / n
Size                  N                           n
Standard deviation    σ = √(∑(x − µ)² / N)        s = √(∑(x − x̄)² / (n − 1))
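For example, with a sum of squared deviations of 328 from 5 observations:

Population variance,
$$\sigma^2 = \frac{328}{5} = 65.6$$

Sample variance,
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{328}{4} = 82$$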
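The difference between dividing by N and dividing by n − 1 can be seen in a short Python sketch. The five data values below are hypothetical: they are not taken from the text and were chosen only because their squared deviations sum to 328. The results are cross-checked against Python's standard statistics module.

```python
from statistics import mean, pvariance, variance, pstdev, stdev

# Hypothetical data (not from the text); squared deviations sum to 328.
data = [10.0, 12.0, 20.0, 28.0, 30.0]

x_bar = mean(data)                             # 20.0
sum_sq = sum((x - x_bar) ** 2 for x in data)   # 328.0

pop_var = sum_sq / len(data)         # divide by N
samp_var = sum_sq / (len(data) - 1)  # divide by n - 1
print(pop_var, samp_var)             # 65.6 82.0

# The library functions should agree with the hand calculation.
print(pvariance(data), variance(data))
print(pstdev(data), stdev(data))     # square roots of the two variances
```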
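For grouped data, we only get an approximation (estimate) of the variance:

$$S^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{k} f_i - 1}$$

where Xi is the midpoint of the i-th class, fi is its frequency and k is the number of classes.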
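As a sketch of the grouped-data formula, the Python lines below apply it to the marks distribution used in the earlier examples. Taking each class midpoint as the representative value Xi is the usual convention and is assumed here. The variable names are illustrative.

```python
from math import sqrt

classes = [(30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [3, 10, 19, 31, 11]

# Assumption: every observation in a class is represented by the class midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in classes]
n = sum(freqs)

grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
sum_sq = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, midpoints))
grouped_var = sum_sq / (n - 1)

print(grouped_mean)                 # 60.0
print(round(grouped_var, 2))        # about 107.53
print(round(sqrt(grouped_var), 2))  # standard deviation, about 10.37
```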
The standard deviation is the square root of the variance. It is expressed in the same units as
the original data.
Population variance,
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample standard deviation,
$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Coefficient of variation
Coefficient of variation is useful when comparing the levels of variability in sets of data. It is a
relative measure of variability. It is especially useful when comparing sets that are not measured
in the same units e.g. in weights of people vs. income, or when comparing data with means that
are of different magnitudes, or risk of projects. The coefficient of variation is dimensionless (free
of units). It is generally expressed in percentage or in decimal form.
$$CV = \frac{s}{\bar{x}} \quad \text{or} \quad CV = \frac{s}{\bar{x}} \times 100\%$$
Example
Which of these 2 sets of data has greater variability?
        A                             B
x̄ = 150 kg                    x̄ = 0.85 cm
S = 30.5 kg                    S = 0.015 cm
CV = 30.5 / 150 = 0.203        CV = 0.015 / 0.85 = 0.018
Set A has greater variability than set B because its coefficient of variation is larger.
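The comparison can be reproduced in Python. This is a minimal sketch and the function name coefficient_of_variation is illustrative. Because the units cancel in the ratio, the two results can be compared directly even though one set is in kilograms and the other in centimetres.

```python
def coefficient_of_variation(std_dev, mean):
    """Relative variability: standard deviation as a fraction of the mean."""
    return std_dev / mean


cv_a = coefficient_of_variation(30.5, 150)    # set A, measured in kg
cv_b = coefficient_of_variation(0.015, 0.85)  # set B, measured in cm

print(f"A: {cv_a:.3f} ({cv_a:.1%})")  # A: 0.203 (20.3%)
print(f"B: {cv_b:.3f} ({cv_b:.1%})")  # B: 0.018 (1.8%)
print("A" if cv_a > cv_b else "B", "has the greater relative variability")
```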
Measures of normality/shape
A normal distribution is one whose data form a symmetrical bell curve. Measures of normality tell
us more about the way data are distributed e.g. figures A, B and C below appear to be symmetrical.
Distributions may have the same averages and measures of dispersion but have different shapes.
Measures of normality include the coefficient of skewness and the coefficient of kurtosis.
Skewness
Skewness describes the degree of symmetry in a distribution. When data are uni-modal and
symmetrical, the mean, mode and median will be almost the same value. In a skewed distribution,
we have higher frequencies occurring to one end of the distribution e.g.
[Figures A, B and C: example distribution curves]
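First coefficient of skewness,
$$SK_1 = \frac{\bar{X} - X_m}{S}$$

Second coefficient of skewness,
$$SK_2 = \frac{3(\bar{X} - X_{0.5})}{S}$$

Where X̄ = mean
Xm = mode
S = standard deviation
X0.5 = median
If SK1 or SK2 = 0, the distribution is symmetrical, as a normal distribution is.
If SK > 0, the distribution is positively skewed.
If SK < 0, the distribution is negatively skewed.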
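A minimal Python sketch of the two coefficients is given below. The input figures describe the marks distribution from the earlier examples: the mode (63.75) is the worked value, while the mean (60), median (61.61) and standard deviation (10.37) are assumptions obtained by applying the grouped-data formulas to class midpoints. The function names are illustrative.

```python
def sk1(mean, mode, std_dev):
    """First coefficient of skewness: (mean - mode) / standard deviation."""
    return (mean - mode) / std_dev


def sk2(mean, median, std_dev):
    """Second coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


def describe(sk):
    """Translate the sign of a skewness coefficient into words."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"


# Assumed summary figures for the marks distribution (see the note above).
mean, median, mode, std_dev = 60.0, 61.61, 63.75, 10.37

first = sk1(mean, mode, std_dev)
second = sk2(mean, median, std_dev)
print(round(first, 3), describe(first))
print(round(second, 3), describe(second))
```

Both coefficients are negative for these figures, which is consistent with the mean lying below the median and the mode.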
Kurtosis
Example
Leptokurtic (highly peaked)
Mesokurtic (normal distribution)
Platykurtic (flat-topped)
$$K = \frac{\frac{1}{2}(Q_3 - Q_1)}{P_{90} - P_{10}}$$
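To give the coefficient K a reference point, the Python sketch below evaluates it for a standard normal distribution using the standard library's NormalDist, and then for the marks distribution. The marks inputs are assumptions: Q1, Q3 and P10 come from the grouped-data interpolation shown earlier, and P90 (73.27) is obtained the same way but is not worked in the text. The function name is illustrative.

```python
from statistics import NormalDist


def percentile_kurtosis(q1, q3, p10, p90):
    """K = half the inter-quartile range divided by the P90 - P10 spread."""
    return 0.5 * (q3 - q1) / (p90 - p10)


# Benchmark: a standard normal (mesokurtic) distribution.
z = NormalDist()
k_normal = percentile_kurtosis(
    z.inv_cdf(0.25), z.inv_cdf(0.75), z.inv_cdf(0.10), z.inv_cdf(0.90)
)
print(round(k_normal, 3))  # 0.263

# Marks distribution, using assumed interpolated values (see the note above).
k_marks = percentile_kurtosis(q1=52.89, q3=67.58, p10=44.4, p90=73.27)
print(round(k_marks, 3))   # 0.254
```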
Empirical Rule
i. The empirical rule says that if a sample or population of measurement has a normal
distribution
ii. Approximately 68% of the observations lie within one standard deviation of the mean
iii. Approximately 95% of the observations lie within two standard deviations of the mean
iv. Approximately 99.7% of the observations lie within three standard deviations of the
mean.
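Diagram 1.1: Normal curve showing 68%, 95% and 99.7% of observations lying within one, two and three standard deviations of the mean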
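The three percentages in the empirical rule can be checked against the standard normal distribution. The sketch below uses NormalDist from Python's standard statistics module. The variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of observations within k standard deviations of the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The exact shares are about 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.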
Chapter Summary
Statistics is the art and science of getting information from data or numbers to help in decision
making.
The following are some characteristics of index numbers:
1. They are specialised averages used to obtain a typical measure, much like a measure of
central tendency. The items must be comparable and the unit of measurement must be the
same.
2. They measure the change in the level of a phenomenon.
3. They measure the effect of changes over a period of time.
Counting techniques may be classified into:
i. Probability trees
ii. Permutations
iii. Combinations
Chapter Quiz
1. Define Mean
2. Which of the following is the odd one out?
i. Mean
ii. Mode
iii. Median
iv. Range
3. What is the importance of Kurtosis?
4. ………… describes the degree of symmetry in a distribution. When data are uni-modal
and symmetrical, the mean, mode and median will be almost the same value.
5. List three counting techniques.
Required
a) Determine the semi interquartile range for the above data
b) Determine the minimum value for the top ten per cent (10%)
c) Determine the maximum value for the lower 40% of the retirees
3. The following information was obtained from an NGO which was giving small loans to some
small-scale business enterprises in 1996. The loans are in thousands of Kshs.
Required
Using the Pearsonian measure of skewness, calculate the coefficients of skewness and comment
briefly on the nature of the distribution of the loans.
4. a) Distinguish between discrete and continuous data.
b) What is dispersion and what is the formula for the standard deviation?
c) What is the measure of relative dispersion?