Continuation of Chapter 4
There are other measures of location that describe or locate the position of noncentral
pieces of data in relation to the entire set. These measures, called quantiles, are values
below which a specific fraction or percentage of the observations in a given set must fall.
The quantiles are as follows:
1. Quartiles - values that divide a set of arrayed observations into 4 equal parts denoted by
Q1, Q2 (also Median), and Q3; 25% of the data fall below Q1, 50% fall below Q2, and 75% fall
below Q3.
2. Deciles - values that divide a set of arrayed observations into 10 equal parts denoted by
D1, D2, ..., D9; 10% of the data fall below D1, 20% fall below D2, ..., and 90% fall below D9.
3. Percentiles - values that divide a set of arrayed observations into 100 equal parts denoted
by P1, P2, ..., P99; 1% of the data fall below P1, 2% fall below P2, ..., and 99% fall below P99.
Deciles and Percentiles
Calculation from Ungrouped Data
When looking for $D_i$ and $P_i$, we have the following steps:
1. Arrange the data in an array.
2. Evaluate:
a. For $D_i$ (ith decile): $\frac{iN}{10}$
b. For $P_i$ (ith percentile): $\frac{iN}{100}$
Illustrations: N = 60
a. D6: i = 6
$$\frac{iN}{10} = \frac{(6)(60)}{10} = 36$$
b. P75: i = 75
$$\frac{iN}{100} = \frac{(75)(60)}{100} = 45$$
If the product in (2) is a whole number, e.g., $\frac{iN}{10} = 36$, the ith quantile is the mean of
that ranked observation and the next higher observation. Thus:
$$D_i = \frac{X_{\frac{iN}{10}} + X_{\frac{iN}{10}+1}}{2} \qquad P_i = \frac{X_{\frac{iN}{100}} + X_{\frac{iN}{100}+1}}{2}$$
Illustration: N = 60
$$D_6 = \frac{X_{36} + X_{(36+1)}}{2} = \frac{X_{36} + X_{37}}{2}, \text{ mean of the 36th and 37th observations in the array}$$
If the product in (2) is not a whole number, e.g., $\frac{iN}{10} = 37.8$, the ith quantile is the
next higher observation.
Illustration: N = 63
Suppose we want D6 and $\frac{iN}{10} = 37.8$.
$D_6 = X_{38}$, the next higher observation after $X_{37}$
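The two rules above (whole-number product versus non-whole product) can be sketched in Python. This is only an illustration: the function name `ungrouped_quantile` and the two demo arrays are hypothetical, and integer arithmetic is used so that the whole-number test is exact.

```python
def ungrouped_quantile(array, i, parts):
    """i-th quantile of an ascending array (parts=10: deciles, parts=100: percentiles)."""
    n = len(array)
    product = i * n                      # numerator of iN/parts
    if product % parts == 0:             # iN/parts is a whole number
        k = product // parts
        # mean of the k-th and (k+1)-th observations (1-based positions)
        return (array[k - 1] + array[k]) / 2
    k = product // parts + 1             # otherwise take the next higher observation
    return array[k - 1]

# Hypothetical arrays whose values equal their positions, so results show the position used.
array_60 = list(range(1, 61))            # N = 60
array_63 = list(range(1, 64))            # N = 63

print(ungrouped_quantile(array_60, 6, 10))    # iN/10 = 36, mean of 36th and 37th
print(ungrouped_quantile(array_60, 75, 100))  # iN/100 = 45, mean of 45th and 46th
print(ungrouped_quantile(array_63, 6, 10))    # iN/10 = 37.8, 38th observation
```

Because each demo value equals its position in the array, the printed results make it easy to see which observations were selected.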
Example:
Suppose the arrayed scores of 48 students in a Math 213 long exam are
as follows: (N = 48)
1. Array is given.
$$\frac{iN}{10} = \frac{(9)(48)}{10} = 43.2$$
2. D9 = X44 = 97
Interpretations:
Quartiles
To find the quartiles:
3. The first quartile, Q1, is the median of all the values below the location of the
median of the whole data set.
4. The third quartile, Q3, is the median of all the values above the location of the
median of the whole set of data.
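The median-of-halves description in steps 3 and 4 can be sketched as follows. The function name `quartiles` and the sample arrays are hypothetical. The sketch also assumes that, when the number of observations is odd, the middle value itself belongs to neither half.

```python
from statistics import median

def quartiles(array):
    """Return (Q1, Q2, Q3) of an ascending array using the median-of-halves rule."""
    n = len(array)
    half = n // 2
    lower = array[:half]                                        # values below the median location
    upper = array[half:] if n % 2 == 0 else array[half + 1:]    # values above the median location
    return median(lower), median(array), median(upper)

# Hypothetical array of 48 values equal to their positions.
print(quartiles(list(range(1, 49))))
```

With 48 observations, each half holds 24 values, so Q1 and Q3 are each the mean of the 12th and 13th values within their half.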
Example: Using the same data set (N = 48).
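2. To find Q3, take the median of the n = 24 values above the location of the median, counting positions within that upper half:
$$Q_3 = \frac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2} = \frac{X_{12} + X_{13}}{2} = \frac{93 + 94}{2} = 93.5$$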
Calculation from Grouped Data
Although one can always determine quantiles from the original data, it may be
advantageous and less time-consuming to calculate these numbers from the
frequency distribution. The formulas available assume that the measurements
within a given class interval are uniformly distributed between the lower and
upper class boundaries. These formulas are just variations of the median
formula. Given that classes are arrayed from lowest to highest, we have:
$$Q_i = L_Q + C\left(\frac{\frac{iN}{4} - F}{f}\right)$$
$$D_i = L_D + C\left(\frac{\frac{iN}{10} - F}{f}\right)$$
$$P_i = L_P + C\left(\frac{\frac{iN}{100} - F}{f}\right)$$
where:
L - lower class boundary of the quantile class
F - less than cumulative frequency preceding the quantile class
f - frequency of the quantile class
C - class size
Example: Consider the following data set on the numbers of man-hours required by
a painting company to paint 100 houses of assorted size and condition.

Man-hours     Number of houses     < cf
0 – 19               4                4
20 – 39              5                9
40 – 59             13               22
60 – 79             17               39     (Q1 class, D3 class)
80 – 99             24               63
100 – 119           11               74
120 – 139           10               84
140 – 159            7               91
160 – 179            5               96
180 – 199            4              100     (P99 class)
N = 100

Sample calculations:
a. Q1 = ?
$$\frac{iN}{4} = \frac{(1)(100)}{4} = 25 \rightarrow \text{the 25th observation in the array is in the } Q_1 \text{ class}$$
$$Q_1 = 59.5 + 20\left(\frac{25 - 22}{17}\right) = 59.5 + 3.5294 = 63.03$$
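b. D3 = ?
$$\frac{iN}{10} = \frac{(3)(100)}{10} = 30 \rightarrow \text{the 30th observation in the array is in the } D_3 \text{ class}$$
$$D_3 = 59.5 + 20\left(\frac{30 - 22}{17}\right) = 59.5 + 9.4118 = 68.91$$
c. P99 = ?
$$\frac{iN}{100} = \frac{(99)(100)}{100} = 99 \rightarrow \text{the 99th observation in the array is in the } P_{99} \text{ class}$$
$$P_{99} = 179.5 + 20\left(\frac{99 - 96}{4}\right) = 179.5 + 15 = 194.5$$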
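The three sample calculations can be reproduced with a short Python sketch of the grouped-data formula. The function name `grouped_quantile` and the variable names are hypothetical. The sketch assumes integer class limits, so each lower class boundary is the lower limit minus 0.5.

```python
def grouped_quantile(lower_limits, freqs, class_size, i, parts):
    """i-th quantile from a frequency distribution (parts = 4, 10 or 100)."""
    n = sum(freqs)
    target = i * n / parts               # iN/4, iN/10 or iN/100
    cum = 0                              # F: less-than cumulative frequency so far
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= target:            # first class whose < cf reaches the target
            boundary = lower - 0.5       # L: lower class boundary (integer limits assumed)
            return boundary + class_size * (target - cum) / f
        cum += f
    raise ValueError("quantile position lies beyond the data")

# Man-hours distribution from the example
lower_limits = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
freqs = [4, 5, 13, 17, 24, 11, 10, 7, 5, 4]

q1 = grouped_quantile(lower_limits, freqs, 20, 1, 4)
d3 = grouped_quantile(lower_limits, freqs, 20, 3, 10)
p99 = grouped_quantile(lower_limits, freqs, 20, 99, 100)
print(round(q1, 2), round(d3, 2), round(p99, 2))
```

The rounded results agree with the hand calculations for Q1, D3 and P99 (63.03, 68.91 and 194.5).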
Note: To determine the percentile rank of a given score, $P_i$, in the distribution, the
following formula is used.
$$i = \frac{\frac{(P_i - L)f}{C} + F}{N} \times 100$$
Example: To find the percentile rank of 165 man-hours in the given data,
$$i = \frac{\frac{(165 - 159.5)(5)}{20} + 91}{100} \times 100 = 92.375 \cong 92.4\%$$
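A minimal Python sketch of the percentile-rank formula follows. The function name `percentile_rank` is hypothetical, and integer class limits are assumed so that each lower class boundary is the lower limit minus 0.5. The class containing the score supplies L and f, and the cumulative frequency of the classes before it supplies F.

```python
def percentile_rank(score, lower_limits, freqs, class_size):
    """Percentile rank of a score, interpolated within its class."""
    n = sum(freqs)
    cum = 0                                   # F: cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        boundary = lower - 0.5                # L: lower class boundary
        if boundary <= score < boundary + class_size:
            return 100 * ((score - boundary) * f / class_size + cum) / n
        cum += f
    raise ValueError("score lies outside the class boundaries")

lower_limits = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
freqs = [4, 5, 13, 17, 24, 11, 10, 7, 5, 4]

rank_165 = percentile_rank(165, lower_limits, freqs, 20)
print(rank_165)   # about 92.4 for 165 man-hours
```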
Common Measures of Variation
One of the most important characteristics of a set of data is that the values are
usually not all alike. The precise extent to which they are not alike, or vary
among themselves, is of basic importance in statistics.
RA = 5 - 5 = 0
RB = 7 - 3 = 4
RC = 6 - 4 = 2.
We note that a range of zero simply means that all the values in the data set are the same.
There is no variability in the values; that is, the variable under consideration is a constant for this
data set. Also, the larger the difference between the two extreme values, the larger the
range. Comparing the three data sets with respect to variability based on the range, we can
say that while data set A is perfectly homogeneous, data set B is the most heterogeneous.
Data set C ranks second to data set B in terms of variability.
Computation from grouped data:
1. R = UL - LL + 1
where:
UL = upper limit of the highest class
LL = lower limit of the lowest class
2. R = U – L
where:
U = upper class boundary of the highest class
L = lower class boundary of the lowest class
Example
Using the data used in the calculation of quantiles where the lowest
class was 0 - 19 and the highest class was 180 - 199.
Man-hours     Number of houses     < cf
0 – 19               4                4
20 – 39              5                9
40 – 59             13               22
60 – 79             17               39
80 – 99             24               63
100 – 119           11               74
120 – 139           10               84
140 – 159            7               91
160 – 179            5               96
180 – 199            4              100
N = 100

R = UL - LL + 1 = 199 - 0 + 1 = 200 or
R = U - L = 199.5 - (-0.5) = 200
Properties of the Range
1. The range is easy to calculate and easy to understand.
2. Its main shortcoming is that it tells us nothing about the dispersion of the data that fall
between the two extremes. Thus, it is a poor measure of variation particularly if the size of
the sample or population is large. Consider the following sets of data, both with a range of 12:
In set A, the mean and the median are both 8 but the numbers vary over the entire interval
from 3 to 15. In set B, the mean and the median are also 8, but most of the values are closer
to the center of the data. Thus, one should conclude that set A is more variable than set B,
even though the equal ranges alone would suggest that sets A and B are equally heterogeneous.
3. When the sample size is quite small, the range can be an adequate measure of
variation. It is used primarily when we are interested in getting a quick, though
perhaps not very accurate, picture of the variability of a set of data without going
through excessive calculations.
Example
$$A.D. = \frac{\sum_{i=1}^{n} |X_i - Md|}{n}$$
Average Deviation (Based on the Median)
Example: Find the average deviation of the following data representing the average
relative humidity at 1:30 p.m. in a certain city, for each month of the year.
71, 64, 53, 43, 37, 32, 28, 28, 31, 42, 59, 70
Solution:
First, find the median:
$$Md = \text{mean of the 6th and 7th values in the array} = \frac{42 + 43}{2} = 42.5$$
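The remaining step, averaging the absolute deviations from the median, can be carried out with a few lines of Python. This sketch uses the twelve humidity readings from the example and the standard-library `statistics.median`. The variable names are illustrative only.

```python
from statistics import median

humidity = [71, 64, 53, 43, 37, 32, 28, 28, 31, 42, 59, 70]

md = median(humidity)                                   # mean of the 6th and 7th values in array
abs_devs = [abs(x - md) for x in humidity]              # |X_i - Md| for each month
average_deviation = sum(abs_devs) / len(humidity)       # sum of |X_i - Md| divided by n

print(md, sum(abs_devs), average_deviation)
```

For these data the absolute deviations sum to 162, so the average deviation is 162/12 = 13.5 percentage points of relative humidity.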
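$$A.D. = \frac{\sum_{i=1}^{k} |X_i - Md|\, f_i}{n}$$
where $X_i$ is the class mark and $f_i$ the frequency of the ith class, Md is the median, and n is the total number of observations.
With n = 45 and $\sum |X_i - Md|\, f_i = 322.25$:
$$A.D. = \frac{322.25}{45} \cong 7.16$$
Thus, the values in the distribution deviate by 7.16, on average, from
the median of 57.25.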
1. The sum of the absolute deviations from the median is never greater
than the sum of the absolute deviations from the mean.
2. The main drawback of the average deviation is that due to the absolute
values it does not lend itself readily to further mathematical treatment.
Variance and Standard Deviation
$$\sigma_x^2 = \frac{\sum_{i=1}^{N} (X_i - \mu_x)^2}{N}, \text{ population variance of } X$$
$$s_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \text{ sample variance of } X$$
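$$\sigma_x^2 = \frac{\sum_{i=1}^{N} X_i^2}{N} - \mu_x^2 = \frac{N \sum X_i^2 - \left(\sum X_i\right)^2}{N^2}, \text{ population}$$
$$s_x^2 = \frac{n \sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}, \text{ sample}$$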
Examples:
6, 7, 7, 7, 8, 8, 8, 9, 10
$$s_x^2 = \frac{n \sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)} = \frac{9(556) - (70)^2}{9(8)} = \frac{104}{72} = 1.4444$$
Note: Using the sample variance key ($\sigma_{n-1}^2$) of a scientific calculator, one gets
the same answer.
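In place of a calculator key, the same check can be made in Python. The sketch below evaluates the computational formula directly and compares it with `statistics.variance` (divisor n - 1). It also shows `statistics.pvariance` (divisor N) for the case where the nine scores are treated as a whole population. The variable names are illustrative.

```python
from statistics import pvariance, variance

scores = [6, 7, 7, 7, 8, 8, 8, 9, 10]
n = len(scores)

sum_x = sum(scores)                        # sum of X_i   = 70
sum_x2 = sum(x * x for x in scores)        # sum of X_i^2 = 556

s2_formula = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))   # computational formula, sample
s2_library = variance(scores)                            # sample variance, divisor n - 1
sigma2_library = pvariance(scores)                       # population variance, divisor N

print(round(s2_formula, 4), round(s2_library, 4), round(sigma2_library, 4))
```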
For grouped data:
$$\sigma_G^2 = \frac{N \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2} = \frac{N \sum_{i=1}^{k} f_i d_i^2 - \left(\sum_{i=1}^{k} f_i d_i\right)^2}{N^2}\, C^2, \text{ population}$$
$$s_G^2 = \frac{n \sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n - 1)} = \frac{n \sum f_i d_i^2 - \left(\sum f_i d_i\right)^2}{n(n - 1)}\, C^2, \text{ sample}$$
where:
$X_i$ = class mark of ith class
$f_i$ = frequency of ith class
C = class size
$X_o$ = assumed mean
$d_i = \frac{X_i - X_o}{C}$, deviation measure
k = no. of classes
n = total no. of observations
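$$s^2 = \frac{n \sum X_i^2 f_i - \left(\sum X_i f_i\right)^2}{n(n - 1)} = \frac{18{,}750{,}400}{(100)(99)} = \frac{18{,}750{,}400}{9{,}900} = 1{,}893.9797$$
$$s^2 = \frac{n \sum f_i d_i^2 - \left(\sum f_i d_i\right)^2}{n(n - 1)}\, C^2 = \frac{18{,}750{,}400}{(100)(99)} = \frac{18{,}750{,}400}{9{,}900} = 1{,}893.9797$$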
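Both grouped-data formulas can be checked against the man-hours distribution with a short Python sketch. The function names are hypothetical, and the assumed mean `X_o = 89.5` (the class mark of the 80 – 99 class) is an arbitrary choice made here for illustration. Any class mark would give the same variance.

```python
class_marks = [9.5 + 20 * j for j in range(10)]      # 9.5, 29.5, ..., 189.5
freqs = [4, 5, 13, 17, 24, 11, 10, 7, 5, 4]
C = 20                                               # class size
n = sum(freqs)                                       # 100 houses

def direct_variance(marks, freqs):
    """Sample variance from class marks: [n*sum(f X^2) - (sum(f X))^2] / [n(n-1)]."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

def coded_variance(marks, freqs, class_size, assumed_mean):
    """Sample variance from coded deviations d_i = (X_i - X_o) / C."""
    n = sum(freqs)
    d = [(x - assumed_mean) / class_size for x in marks]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return (n * sum_fd2 - sum_fd ** 2) / (n * (n - 1)) * class_size ** 2

s2_direct = direct_variance(class_marks, freqs)
s2_coded = coded_variance(class_marks, freqs, C, assumed_mean=89.5)
print(round(s2_direct, 2), round(s2_coded, 2))
```

The two forms are algebraically identical, so both evaluate to about 1,893.98 squared man-hours.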
The variance of a set of data is an extremely important measure of variation and it
is used extensively in statistical work. By reason of squaring the deviations,
however, this variance is not in the same unit of measurement as the data
themselves and their mean. We obtain a number in squared units. That is, if the
original measurements were in years, the variance would be expressed in years
squared. To get a measure of variation expressed in the same units as the raw
data, as was the case for the range and average deviation, we take the square
root of the variance. Taking the square root compensates for the fact that we
averaged the squared deviations.
The standard deviation of a set of data is the positive square root of its variance.
1. $\sigma_x^2 = 1.2839498$
$\sigma_x = \sqrt{1.2839498} \cong 1.133$
Thus, on average, the scores differ from their mean by only about 1 point.
2. $s_x^2 = 1.4444$
$s_x = \sqrt{1.4444} \cong 1.2018$
Thus, on average, the scores differ from their mean by only about 1 point.
Now, the standard deviation of grouped data is calculated on the assumption
that all measurements belonging to a class are located at its class mark. The
error which is introduced by this assumption, and which is called a grouping
error, can be fairly large, particularly if the class size is large. A correction,
called Sheppard's correction, which compensates for this error but is applicable
only to "bell-shaped" distributions (hump-backed with flat tails), is given below.
$$\text{corrected variance} = \text{variance} - \frac{C^2}{12}, \quad C = \text{class size}$$
The corrected standard deviation is just the positive square root of the
corrected variance.
$$\text{Corrected variance} = 1{,}893.9797 - \frac{(20)^2}{12}$$
= 1,893.9797 - 33.3333
= 1,860.6464
Corrected standard deviation ≅ 43.1352
Thus, the mean difference of the observations from their mean is about
43.1 man-hours.
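The correction is a one-line adjustment, sketched below in Python with the figures from the example (grouped variance 1,893.9797 and class size 20). The function name `sheppard_corrected_variance` is hypothetical.

```python
import math

def sheppard_corrected_variance(variance, class_size):
    """Subtract C^2 / 12 from a grouped-data variance (Sheppard's correction)."""
    return variance - class_size ** 2 / 12

corrected = sheppard_corrected_variance(1893.9797, 20)
corrected_sd = math.sqrt(corrected)      # positive square root of the corrected variance

print(round(corrected, 4), round(corrected_sd, 4))
```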
Properties of the Variance
1. The variance can never be negative. (It is a squared value.) Like the range
and the average deviation, its minimum value is zero- absence of variability. A
large variance corresponds to a highly dispersed set of values.
$X_i + 5$: 6, 7, 8, 9, 10
$10X_i$: 10, 20, 30, 40, 50
Coefficient of Variation
$$CV = \frac{\sigma}{\mu} \times 100, \text{ population}$$
$$CV = \frac{s}{\bar{X}} \times 100, \text{ sample}$$
Examples:
Solution:
Using the coefficient of variation,
$$CV = \frac{s}{\bar{x}} \times 100 = \frac{0.10}{240} \times 100 = 0.0416666 \cong 0.04\%$$
This equals approximately four hundredths of one percent; thus,
the measurements can be taken to be extremely accurate
(extremely close to each other).
Solution:
Computing the coefficients of variation,
a. weights of 10 boxes
$$CV_w = \frac{s}{\bar{x}} \times 100 = \frac{9.64}{278} \times 100 = 3.4676258 \cong 3.47\%$$
b. prices of 10 boxes
$$CV_P = \frac{s}{\bar{x}} \times 100 = \frac{2.43}{34.83} \times 100 = 6.9767441 \cong 6.98\%$$
We can conclude that the weights are relatively more homogeneous than the
prices.

There are other measures of relative variation which may be defined in
terms of statistical measures other than the standard deviation and the mean.
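The comparison of weights and prices through the coefficient of variation can be sketched in Python as follows. The function name `coefficient_of_variation` is hypothetical. The standard deviations and means are the ones quoted in the solution above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_weights = coefficient_of_variation(9.64, 278)     # weights of 10 boxes
cv_prices = coefficient_of_variation(2.43, 34.83)    # prices of 10 boxes

print(round(cv_weights, 2), round(cv_prices, 2))
print("weights more homogeneous:", cv_weights < cv_prices)
```

Because the CV is unit-free, it allows a direct comparison between two variables measured on different scales, here weights and prices.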