
Continuation Chapter 4

There are several types of quantiles that describe the location of data values in a dataset:
1) Quartiles (Q1, Q2, Q3) divide data into 4 equal parts.
2) Deciles (D1-D9) divide data into 10 equal parts.
3) Percentiles (P1-P99) divide data into 100 equal parts.
Quantiles can be calculated directly from raw data by determining the values below which a certain percentage of observations fall, or estimated from a frequency distribution using class boundaries and frequencies.

Uploaded by

Jerald Retanal

Quantiles: Quartiles, Deciles, and Percentiles

There are other measures of location that describe or locate the position of noncentral
pieces of data in relation to the entire set. These measures, called quantiles, are values
below which a specific fraction or percentage of the observations in a given set must fall.
The quantiles are as follows:

1. Quartiles - values that divide a set of arrayed observations into 4 equal parts denoted by
Q1, Q2 (also Median), and Q3; 25% of the data fall below Q1, 50% fall below Q2, and 75% fall
below Q3.

2. Deciles - values that divide a set of arrayed observations into 10 equal parts denoted by
D1, D2, ..., D9; 10% of the data fall below D1, 20% fall below D2, ..., and 90% fall below D9.

3. Percentiles - values that divide a set of arrayed observations into 100 equal parts denoted
by P1, P2, ..., P99; 1% of the data fall below P1, 2% fall below P2, ..., and 99% fall below P99.

Deciles and Percentiles
Calculation from Ungrouped Data
When looking for 𝐷𝑖 and 𝑃𝑖 , we have the following steps:

1. Arrange the observations in increasing magnitude.

2. Evaluate:

a. For 𝐷𝑖 (ith decile): $\frac{iN}{10}$

b. For 𝑃𝑖 (ith percentile): $\frac{iN}{100}$

where N = total number of observations.

Illustrations: N = 60

a. D6: i = 6, so $\frac{iN}{10} = \frac{(6)(60)}{10} = 36$

b. P75: i = 75, so $\frac{iN}{100} = \frac{(75)(60)}{100} = 45$

If the product in (2) is a whole number (e.g., $\frac{iN}{10} = 36$), the ith quantile is the mean of
that ranked observation and the next higher observation. Thus:

$D_i = \frac{X_{(iN/10)} + X_{(iN/10 + 1)}}{2}, \qquad P_i = \frac{X_{(iN/100)} + X_{(iN/100 + 1)}}{2}$

Illustration: N = 60

$D_6 = \frac{X_{36} + X_{(36+1)}}{2} = \frac{X_{36} + X_{37}}{2}$, the mean of the 36th and 37th observations in the array.

If the product in (2) is not a whole number (e.g., $\frac{iN}{10} = 37.8$), the ith quantile is the
next higher observation.

Illustration: N = 63

Suppose we want D6, where $\frac{iN}{10} = \frac{(6)(63)}{10} = 37.8$.
Then D6 = X38, the next higher observation after X37.

Example:

Suppose the arrayed scores of 48 students in a Math 213 long exam are
as follows (N = 48):

85, 85, 86, 89, 90, 90, 90,
90, 90, 90, 90, 90, 91, 91,
91, 91, 91, 92, 92, 92, 92,
92, 92, 92, 92, 92, 92, 93,
93, 93, 93, 93, 93, 93, 93,
93, 94, 95, 95, 95, 95, 96,
96, 97, 97, 98, 99, 99

To find D9:

1. Array is given.

$\frac{iN}{10} = \frac{(9)(48)}{10} = 43.2$

2. D9 = X44 = 97

Interpretations:

1. Nine-tenths or 90% of the scores fall below 97. One-tenth or 10% of
the scores are above 97.

2. The top 10% scored 97 or better.
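
The ungrouped rule above can be sketched in Python as follows. This is only a minimal illustration of the slide's procedure (mean of two observations when iN/10 or iN/100 is a whole number, next higher observation otherwise); the function name `quantile_ungrouped` and its parameters are hypothetical, and statistical packages often use other interpolation rules that can give slightly different answers.

```python
def quantile_ungrouped(data, i, parts):
    """i-th quantile by the rule above; parts=10 for deciles, 100 for percentiles."""
    xs = sorted(data)                        # step 1: arrange in increasing magnitude
    whole, remainder = divmod(i * len(xs), parts)   # step 2: evaluate iN/parts
    if remainder == 0:
        # whole number: mean of that ranked observation and the next higher one
        return (xs[whole - 1] + xs[whole]) / 2
    # not a whole number: take the next higher observation (rank whole + 1)
    return xs[whole]


# The 48 arrayed Math 213 scores from the example, written as value * count.
scores = ([85] * 2 + [86] + [89] + [90] * 8 + [91] * 5 + [92] * 10 + [93] * 9
          + [94] + [95] * 4 + [96] * 2 + [97] * 2 + [98] + [99] * 2)

d9 = quantile_ungrouped(scores, 9, 10)
print(d9)  # 97, i.e. X44, since (9)(48)/10 = 43.2
```

Integer arithmetic (`divmod`) is used to decide whether iN/10 is a whole number, which avoids floating-point comparisons.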

Quartiles
To find the quartiles:

1. Arrange the observations in increasing magnitude.

2. Find the median in the array.

3. The first quartile, Q1, is the median of all the values below the location of the
median of the whole data set.

4. The third quartile, Q3, is the median of all the values above the location of the
median of the whole set of data.

Example: Using the same data set (N = 48).

1. To find Q1: (N is even)

a. $Md = \frac{X_{(N/2)} + X_{(N/2 + 1)}}{2} = \frac{X_{24} + X_{25}}{2} = \frac{92 + 92}{2} = 92$

b. Q1 = median of the observations below the location of the median (between the 24th and 25th positions)

= median of the first 24 observations (𝑛1 = 24 is even)

$Q_1 = \frac{X_{(n_1/2)} + X_{(n_1/2 + 1)}}{2} = \frac{X_{(24/2)} + X_{(24/2 + 1)}}{2} = \frac{X_{12} + X_{13}}{2} = \frac{90 + 91}{2} = 90.5$

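2. To find Q3:

Q3 = median of observations above the 24th position

= median of the remaining 24 observations (𝑛2 = 24 is even)

$Q_3 = \frac{X_{(n_2/2)} + X_{(n_2/2 + 1)}}{2} = \frac{X_{12} + X_{13}}{2} = \frac{93 + 94}{2} = 93.5$

Here $X_{12}$ and $X_{13}$ are counted within the upper 24 observations; they are the 36th and 37th
observations of the whole array.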
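
As a quick check of the quartile steps, here is a minimal Python sketch of the median-of-halves procedure. The helper name `quartiles_by_halves` is hypothetical, and the treatment of odd N (leaving the middle observation out of both halves) is an assumption, since the example only covers even N.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (Q1, Md, Q3) using the median of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the location of the median
    upper = xs[(n + 1) // 2:]     # values above the location of the median
    return median(lower), median(xs), median(upper)


# The 48 arrayed Math 213 scores from the example, written as value * count.
scores = ([85] * 2 + [86] + [89] + [90] * 8 + [91] * 5 + [92] * 10 + [93] * 9
          + [94] + [95] * 4 + [96] * 2 + [97] * 2 + [98] + [99] * 2)

q1, md, q3 = quartiles_by_halves(scores)
print(q1, md, q3)  # 90.5 92.0 93.5
```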

Calculation from Grouped Data
Although one can always determine quantiles from the original data, it may be
advantageous and less time-consuming to calculate these numbers from the
frequency distribution. The formulas available assume that the measurements
within a given class interval are uniformly distributed between the lower and
upper class boundaries. These formulas are just variations of the median
formula. Given that classes are arrayed from lowest to highest, we have:

$Q_i = L_Q + C\left(\frac{\frac{iN}{4} - F}{f}\right)$

$D_i = L_D + C\left(\frac{\frac{iN}{10} - F}{f}\right)$

$P_i = L_P + C\left(\frac{\frac{iN}{100} - F}{f}\right)$

where:
L - lower class boundary of the quantile class
F - less than cumulative frequency preceding the quantile class
f - frequency of the quantile class
C - class size

Example: Consider the following data set on the numbers of man-hours required by
a painting company to paint 100 houses of assorted size and condition.

Man-hours     Number of houses    < cf
0 – 19               4               4
20 – 39              5               9
40 – 59             13              22
60 – 79             17              39    (Q1 class, D3 class)
80 – 99             24              63
100 – 119           11              74
120 – 139           10              84
140 – 159            7              91
160 – 179            5              96
180 – 199            4             100    (P99 class)
N = 100

Sample calculations:

a. Q1 = ?

$\frac{iN}{4} = \frac{(1)(100)}{4} = 25$ → the 25th observation in the array is in the Q1 class

$Q_1 = 59.5 + 20\left(\frac{25 - 22}{17}\right) = 59.5 + 3.5294 = 63.03$
b. D3 = ?

$\frac{iN}{10} = \frac{(3)(100)}{10} = 30$ → the 30th observation in the array is in the D3 class

$D_3 = 59.5 + 20\left(\frac{30 - 22}{17}\right) = 59.5 + 9.4118 = 68.91$

c. P99 = ?

$\frac{iN}{100} = \frac{(99)(100)}{100} = 99$ → the 99th observation in the array is in the P99 class

$P_{99} = 179.5 + 20\left(\frac{99 - 96}{4}\right) = 179.5 + 15 = 194.5$
Note: To determine the percentile rank, i, of a given score, 𝑃𝑖, in the distribution, the
following formula is used.

$i = \left[\frac{(P_i - L)f/C + F}{N}\right] \times 100$

Example: To find the percentile rank of 165 man-hours in the given data (165 falls in the
class 160 – 179, so L = 159.5, f = 5, and F = 91),

$i = \left[\frac{(165 - 159.5)5/20 + 91}{100}\right] \times 100$

= 92.375

≅ 92.4%
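
The grouped-data quantile formulas and the percentile-rank formula can be sketched together in Python. This is a minimal illustration, not a general-purpose routine: the function names are hypothetical, and the code assumes integer class limits, so that each lower class boundary is the lower limit minus 0.5, as in the man-hours table.

```python
def grouped_quantile(lower_limits, freqs, class_size, i, parts):
    """i-th quantile, L + C(iN/parts - F)/f, with parts = 4, 10, or 100."""
    n = sum(freqs)
    target = i * n / parts            # iN/4, iN/10, or iN/100
    cum = 0                           # F: cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= target:         # the quantile class has been reached
            boundary = lower - 0.5    # assumed lower class boundary
            return boundary + class_size * (target - cum) / f
        cum += f
    raise ValueError("quantile position exceeds N")


def percentile_rank(value, lower_limits, freqs, class_size):
    """Percentile rank i = [((P - L) f / C + F) / N] x 100."""
    n = sum(freqs)
    cum = 0
    for lower, f in zip(lower_limits, freqs):
        boundary = lower - 0.5
        if boundary <= value < boundary + class_size:
            return 100 * ((value - boundary) * f / class_size + cum) / n
        cum += f
    raise ValueError("value lies outside the class boundaries")


# Man-hours distribution: classes 0-19, 20-39, ..., 180-199.
lower_limits = list(range(0, 200, 20))
freqs = [4, 5, 13, 17, 24, 11, 10, 7, 5, 4]

q1 = grouped_quantile(lower_limits, freqs, 20, 1, 4)
d3 = grouped_quantile(lower_limits, freqs, 20, 3, 10)
p99 = grouped_quantile(lower_limits, freqs, 20, 99, 100)
rank_165 = percentile_rank(165, lower_limits, freqs, 20)
print(round(q1, 2), round(d3, 2), p99, rank_165)  # 63.03 68.91 194.5 92.375
```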
Common Measures of Variation
One of the most important characteristics of a set of data is that the values are
usually not all alike. The precise extent to which they are not alike, or vary
among themselves, is of basic importance in statistics.

Measures of central tendency describe one important aspect of a set of data --
their middle or their "average" -- but they tell us nothing about this other basic
characteristic. We can have data sets having the same mean and yet they are
not identical data sets simply because of the different values the data sets
contain.

Data Set    Values           Mean
A           5, 5, 5, 5, 5    5
B           3, 4, 5, 6, 7    5
C           4, 4, 5, 6, 6    5

Hence, we require ways of measuring the extent to which data are dispersed, or
spread out, and the statistical measures which provide this information are
called measures of variation or dispersion.
Range
The range, R, of a set of numbers, is the difference between the largest and the smallest. It
can then be computed for data that are at least ordinal in scale (ordinal, interval, and ratio).

Example: Using data sets A, B, and C above, we find

RA = 5 - 5 = 0
RB = 7 - 3 = 4
RC = 6 - 4 = 2.

We note that a range of zero simply means that all the values in the data set are the same.
There is no variability in the values or the variable under consideration is a constant for this
data set. Also, the larger is the difference between the two extreme values, the larger is the
range. Comparing the three data sets with respect to variability based on the range, we can
say that while data set A is perfectly homogeneous, data set B is the most heterogeneous.
Data set C ranks second to data set B in terms of variability.
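
The comparison above is easy to reproduce. The short Python sketch below (the helper name `value_range` is hypothetical) computes the range of data sets A, B, and C.

```python
def value_range(values):
    """Range R: difference between the largest and the smallest value."""
    return max(values) - min(values)


data_sets = {
    "A": [5, 5, 5, 5, 5],
    "B": [3, 4, 5, 6, 7],
    "C": [4, 4, 5, 6, 6],
}
ranges = {name: value_range(values) for name, values in data_sets.items()}
print(ranges)  # {'A': 0, 'B': 4, 'C': 2}
```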
Computation from grouped data:

1. R = UL - LL + 1

where:
UL = upper limit of the highest class
LL = lower limit of the lowest class

2. R = U – L

where:
U = upper class boundary of the highest class
L = lower class boundary of the lowest class
Example
Using the data used in the calculation of quantiles where the lowest
class was 0 - 19 and the highest class was 180 - 199.

Man-hours     Number of houses    < cf
0 – 19               4               4
20 – 39              5               9
40 – 59             13              22
60 – 79             17              39
80 – 99             24              63
100 – 119           11              74
120 – 139           10              84
140 – 159            7              91
160 – 179            5              96
180 – 199            4             100
N = 100

R = UL - LL + 1 = 199 - 0 + 1 = 200 or
R = U - L = 199.5 - (-0.5) = 200
Properties of the Range
1. The range is easy to calculate and easy to understand.

2. Its main shortcoming is that it tells us nothing about the dispersion of the data that fall
between the two extremes. Thus, it is a poor measure of variation particularly if the size of
the sample or population is large. Consider the following sets of data, both with a range of 12:

Set A: 3, 4, 5, 6, 8, 9, 10, 12, 15


Set B: 3, 7, 7, 7, 8, 8, 8, 9, 15

In set A, the mean and the median are both 8 but the numbers vary over the entire interval
from 3 to 15. In set B, the mean and the median are also 8, but most of the values are closer
to the center of the data. Thus, one should conclude that set A is more variable than set B,
not that sets A and B are equally heterogeneous, as the range alone would suggest.

3. When the sample size is quite small, the range can be an adequate measure of
variation. It is used primarily when we are interested in getting a quick, though
perhaps not very accurate, picture of the variability of a set of data without going
through excessive calculations.

Example

It is used widely in industrial quality control, where it is necessary to keep
a close check on the quality of raw materials, semi-finished, and finished
products on the basis of many small samples taken at more or less regular
intervals of time.
Average Deviation (Based on the Median)

The average deviation is the average amount of scatter of the values in a
distribution from the median, ignoring the signs of the deviations. This is best used
when the median is the appropriate measure of central tendency (in the presence
of extreme values/skewed distributions).

Calculation from ungrouped data:

$A.D. = \frac{\sum_{i=1}^{n} |X_i - Md|}{n}$

Example: Find the average deviation of the following data representing the average
relative humidity at 1:30 p.m. in a certain city, for each month of the year.

71, 64, 53, 43, 37, 32, 28, 28, 31, 42, 59, 70

Solution:
First, find the median:
Md = mean of the 6th and 7th values in the array
$= \frac{42 + 43}{2} = 42.5$
To find the average deviation using the median,

Xi       |Xi – Md|
71         28.5
64         21.5
53         10.5
43          0.5
37          5.5
32         10.5
28         14.5
28         14.5
31         11.5
42          0.5
59         16.5
70         27.5
Total     162.0

$A.D. = \frac{\sum |X_i - 42.5|}{12} = \frac{162}{12} = 13.5$

By simply looking at the data, we can say that in this
city there are considerable fluctuations in relative
humidity from month to month. The average deviation
tells us more specifically that on the average, the
monthly figures deviated by 13.5 from the annual
median of 42.5.
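
The humidity computation can be reproduced with a few lines of Python. This is a minimal sketch; the function name `average_deviation` is hypothetical.

```python
from statistics import median


def average_deviation(values):
    """Average absolute deviation of the values from their median."""
    md = median(values)
    return sum(abs(x - md) for x in values) / len(values)


humidity = [71, 64, 53, 43, 37, 32, 28, 28, 31, 42, 59, 70]
print(median(humidity))             # 42.5
print(average_deviation(humidity))  # 13.5
```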
Calculation from grouped data:

$A.D. = \frac{\sum_{i=1}^{k} |X_i - Md| f_i}{n}$

where:

𝑋𝑖 = class mark of ith class
𝑓𝑖 = frequency of ith class
k = number of classes
n = total number of observations
Example: Using the data used in illustrating the calculation of the median for
grouped data (Md = 57.25).

Classes    fi     Xi     |Xi – 57.25|    |Xi – 57.25| fi
40-44       4     42        15.25            61
45-49       5     47        10.25            51.25
50-54       8     52         5.25            42
55-59      10     57         0.25             2.5
60-64       7     62         4.75            33.25
65-69       6     67         9.75            58.5
70-74       5     72        14.75            73.75
Total      45                               322.25

$A.D. = \frac{322.25}{45} \cong 7.16$

Thus, the values in the distribution deviated by 7.16, on the average, from
the median of 57.25.
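
The grouped version only adds the class frequencies as weights. The sketch below (hypothetical function name) takes the median of 57.25 as given rather than recomputing it from the frequency table.

```python
def grouped_average_deviation(class_marks, freqs, md):
    """Sum of |X_i - Md| * f_i over the classes, divided by n."""
    n = sum(freqs)
    return sum(abs(x - md) * f for x, f in zip(class_marks, freqs)) / n


class_marks = [42, 47, 52, 57, 62, 67, 72]   # marks of classes 40-44, ..., 70-74
freqs = [4, 5, 8, 10, 7, 6, 5]

ad = grouped_average_deviation(class_marks, freqs, 57.25)
print(round(ad, 2))  # 7.16
```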

Properties of the Average Deviation

1. The sum of the absolute deviations from the median will always be less
than or equal to the sum of the absolute deviations from the mean.

2. The main drawback of the average deviation is that due to the absolute
values it does not lend itself readily to further mathematical treatment.
Variance and Standard Deviation

The variance of a set of numbers is the mean of the squared deviations of
these numbers from their mean. The definitional formulas for the
population and sample follow for some variable X:

$\sigma_x^2 = \frac{\sum_{i=1}^{N} (X_i - \mu_x)^2}{N}$, population variance of X

$s_x^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, sample variance of X
For ungrouped data:

$\sigma_x^2 = \frac{\sum_{i=1}^{N} X_i^2}{N} - \mu_x^2 = \frac{N \sum X_i^2 - (\sum X_i)^2}{N^2}$, population

$s_x^2 = \frac{n \sum X_i^2 - (\sum X_i)^2}{n(n - 1)}$, sample

Examples:

1. Consider the scores on the first quiz of a small class:

6, 7, 7, 7, 8, 8, 8, 9, 10

Let Xi = score of the ith student; i = 1, 2, 3, ..., 9.

a. First, find the population mean score:

$\mu_x = \frac{\sum X_i}{N} = \frac{70}{9} = 7.7777777$ (keep unrounded)
b. Finding the variance of the scores:

Using the mean:

$\sigma_x^2 = \frac{\sum X_i^2}{N} - \mu_x^2 = \frac{556}{9} - (7.77...)^2 = 61.777777 - 60.493827 = 1.2839498 \approx 1.2839$

Using the totals only:

$\sigma_x^2 = \frac{N \sum X_i^2 - (\sum X_i)^2}{N^2} = \frac{9(556) - (70)^2}{9^2} = \frac{104}{81} = 1.2839506 \approx 1.2840$
2. Taking the scores in 1) as sample scores from a bigger class:

$s_x^2 = \frac{n \sum X_i^2 - (\sum X_i)^2}{n(n - 1)} = \frac{9(556) - (70)^2}{9(8)} = \frac{104}{72} \approx 1.4444$

Note: Using the sample variance key ($\sigma_{n-1}^2$) of a scientific calculator, one gets
the same answer.
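
In place of a calculator key, the two quiz-score variances can be checked in Python. The sketch below applies the computational formulas for ungrouped data and compares the results with `statistics.pvariance` and `statistics.variance` from the standard library; the variable names are illustrative.

```python
import math
import statistics

scores = [6, 7, 7, 7, 8, 8, 8, 9, 10]
n = len(scores)
sum_x = sum(scores)                   # 70
sum_x2 = sum(x * x for x in scores)   # 556

# Computational formulas for ungrouped data.
pop_var = (n * sum_x2 - sum_x ** 2) / n ** 2             # 104/81
samp_var = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))     # 104/72

print(round(pop_var, 4), round(samp_var, 4))   # 1.284 1.4444
print(math.isclose(pop_var, statistics.pvariance(scores)))   # True
print(math.isclose(samp_var, statistics.variance(scores)))   # True
```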
For grouped data:

$\sigma_G^2 = \frac{N \sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{N^2} = \left[\frac{N \sum_{i=1}^{k} f_i d_i^2 - \left(\sum_{i=1}^{k} f_i d_i\right)^2}{N^2}\right] C^2$, population

$s_G^2 = \frac{n \sum f_i X_i^2 - (\sum f_i X_i)^2}{n(n - 1)} = \left[\frac{n \sum f_i d_i^2 - (\sum f_i d_i)^2}{n(n - 1)}\right] C^2$, sample

where:
𝑋𝑖 = class mark of ith class
𝑓𝑖 = frequency of ith class
C = class size
𝑋𝑜 = assumed mean
$d_i = \frac{X_i - X_o}{C}$, deviation measure
k = no. of classes
n = total no. of observations
Example: Using the data used in the calculation of quantiles to calculate a
sample variance.

Let 𝑋𝑜 = 89.5, $d_i = \frac{X_i - 89.5}{20}$

Man-hours     fi     Xi       fiXi        fiXi²          di     fidi    fidi²
0 – 19         4      9.5      38.0          361.00      -4     -16      64
20 – 39        5     29.5     147.5        4,351.25      -3     -15      45
40 – 59       13     49.5     643.5       31,853.25      -2     -26      52
60 – 79       17     69.5   1,181.5       82,114.25      -1     -17      17
80 – 99       24     89.5   2,148.0      192,246.00       0       0       0
100 – 119     11    109.5   1,204.5      131,892.75       1      11      11
120 – 139     10    129.5   1,295.0      167,702.50       2      20      40
140 – 159      7    149.5   1,046.5      156,451.75       3      21      63
160 – 179      5    169.5     847.5      143,651.25       4      20      80
180 – 199      4    189.5     758.0      143,641.00       5      20     100
Total        100            9,310.0    1,054,265.00              18     472
Using the class marks directly:

$s^2 = \frac{n \sum X_i^2 f_i - (\sum X_i f_i)^2}{n(n - 1)} = \frac{100(1,054,265) - (9,310)^2}{(100)(99)} = \frac{105,426,500 - 86,676,100}{9,900} = \frac{18,750,400}{9,900} \approx 1,893.9797$

Using the deviation measure:

$s^2 = \left[\frac{n \sum f_i d_i^2 - (\sum f_i d_i)^2}{n(n - 1)}\right] C^2 = \left[\frac{100(472) - (18)^2}{(100)(99)}\right] (20)^2 = \left[\frac{47,200 - 324}{9,900}\right] 400 = \frac{18,750,400}{9,900} \approx 1,893.9797$
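
The two routes to the grouped sample variance (class marks directly, or coded deviations d_i scaled back by C²) can be compared in Python. This is a minimal sketch with a hypothetical helper name; it uses the man-hours frequencies and the assumed mean of 89.5 from the example.

```python
import math


def grouped_sample_variance(values, freqs):
    """[n * sum(f v^2) - (sum(f v))^2] / [n(n - 1)] for grouped values v."""
    n = sum(freqs)
    s1 = sum(f * v for v, f in zip(values, freqs))
    s2 = sum(f * v * v for v, f in zip(values, freqs))
    return (n * s2 - s1 ** 2) / (n * (n - 1))


class_marks = [9.5 + 20 * k for k in range(10)]   # 9.5, 29.5, ..., 189.5
freqs = [4, 5, 13, 17, 24, 11, 10, 7, 5, 4]

# Route 1: class marks used directly.
direct = grouped_sample_variance(class_marks, freqs)

# Route 2: coded deviations d_i = (X_i - X_o) / C, then multiply by C^2.
C, x_o = 20, 89.5
d = [(x - x_o) / C for x in class_marks]
coded = grouped_sample_variance(d, freqs) * C ** 2

print(round(direct, 2), round(coded, 2))  # 1893.98 1893.98
print(round(math.sqrt(direct), 2))        # 43.52
```

Both routes give 18,750,400 / 9,900; the coded route keeps the intermediate sums much smaller, which is the reason it is convenient for hand calculation.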
The variance of a set of data is an extremely important measure of variation and it
is used extensively in statistical work. By reason of squaring the deviations,
however, this variance is not in the same unit of measurement as the data
themselves and their mean. We obtain a number in squared units. That is, if the
original measurements were in years, the variance would be expressed in years
squared. To get a measure of variation expressed in the same units as the raw
data, as was the case for the range and average deviation, we take the square
root of the variance. Taking the square root compensates for the fact that we
averaged the squared deviations.
The standard deviation of a set of data is the positive square root of its variance.

$\sigma_x = \sqrt{\sigma_x^2}$, population standard deviation of X

$s_x = \sqrt{s_x^2}$, sample standard deviation of X

Examples: (Using the variances calculated earlier.)

1. $\sigma_x^2 = 1.2839498$

$\sigma_x = \sqrt{1.2839498} \cong 1.133$
Thus, the mean difference of the scores from their mean is about 1 point only.

2. $s_x^2 = 1.4444$

$s_x = \sqrt{1.4444} \cong 1.2018$
Thus, the mean difference of the scores from their mean is about 1 point only.
Now, the standard deviation of grouped data is calculated on the assumption
that all measurements belonging to a class are located at its class mark. The
error which is introduced by this assumption, and which is called a grouping
error, can be fairly large, particularly if the class size is large. A correction,
called Sheppard's correction, which compensates for this error but is applicable
only to "bell-shaped" distributions -- hump-backed with flat tails -- is given below.

corrected variance = variance − $\frac{C^2}{12}$, C = class size

The corrected standard deviation is just the positive square root of the
corrected variance.
Example: Using the variance for the data on man-hours.

Variance = 1,893.9797, using the deviation formula

Standard deviation ≅ 43.5199

Corrected variance = 1,893.9797 − $\frac{(20)^2}{12}$
= 1,893.9797 − 33.3333
= 1,860.6464

Corrected standard deviation ≅ 43.1352

Thus, the mean difference of the observations from their mean is about
43.1 man-hours.
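
Sheppard's correction is a one-line adjustment, sketched here in Python with a hypothetical function name. The input variance is the rounded value quoted in the example, so the printed figures agree with the slide only to about four decimal places.

```python
import math


def sheppard_corrected_variance(variance, class_size):
    """Corrected variance = variance - C^2 / 12."""
    return variance - class_size ** 2 / 12


variance = 1893.9797                      # grouped variance for the man-hours data
corrected = sheppard_corrected_variance(variance, 20)

print(round(math.sqrt(variance), 4))      # 43.5199
print(round(corrected, 4))                # 1860.6464
print(round(math.sqrt(corrected), 4))     # 43.1352
```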
Properties of the Variance

1. The variance can never be negative. (It is a squared value.) Like the range
and the average deviation, its minimum value is zero- absence of variability. A
large variance corresponds to a highly dispersed set of values.

2. If each observation of a set of data is transformed to a new set by the
addition (or subtraction) of a constant c, the variance of the original set of data
is the same as the variance of the new set.

Example:
𝑋𝑖        1    2    3    4    5
𝑋𝑖 + 5    6    7    8    9    10
3. If a set of data is transformed to a new set by multiplying (or dividing) each
observation by a constant c, the variance of the new set is the original variance
multiplied by (or divided by) c².

Example:
𝑋𝑖       1     2     3     4     5
10𝑋𝑖     10    20    30    40    50
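
Properties 2 and 3 can be verified numerically on the two small examples. The sketch below uses `statistics.pvariance` (population variance) from the Python standard library; the same relationships hold for the sample variance.

```python
from statistics import pvariance

x = [1, 2, 3, 4, 5]
shifted = [v + 5 for v in x]    # 6, 7, 8, 9, 10
scaled = [10 * v for v in x]    # 10, 20, 30, 40, 50

print(pvariance(x))         # 2
print(pvariance(shifted))   # 2   -> unchanged by adding a constant
print(pvariance(scaled))    # 200 -> original variance times 10^2
```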
Coefficient of Variation

The coefficient of variation, CV, expresses the standard deviation as a
percentage of the mean. The formulas for the population and sample are:

$CV = \frac{\sigma}{\mu} \times 100$, population

$CV = \frac{s}{\bar{X}} \times 100$, sample
It is a measure of relative variation expressed as a percent. It can then be
used to compare the variability of two or more data sets even when the
observations are expressed in different units of measurement. Instead of
having to compare the variability of prices in pesos, ages in years, and weights
in kilos, we can compare the respective CVs. The CV can also be used to compare
the relative dispersion of several data sets given in the same units of
measurement. For instance, it can be used to investigate the relative variability
of the prices of a number of commodities. If the means and units of
measurement are both the same for these data sets, then the standard
deviation can be used as a measure of relative variation.
Examples:

1. Five repeated measurements of the length of a room gave a mean of 240
inches with a standard deviation of 0.10 inch. Can you say that the
measurements are extremely accurate?

Solution:
Using the coefficient of variation,

$CV = \frac{s}{\bar{x}} \times 100 = \frac{0.10}{240} \times 100 = 0.0416666 \approx 0.04\%$

This equals approximately four hundredths of one percent; thus, the measurements
can be taken to be extremely accurate (extremely close to each other).
2. The weights of 10 boxes of a certain brand of cereal have a mean content of
278 grams with a standard deviation of 9.64 grams. If these boxes were
purchased at 10 different stores and the average price per box is 34.83 with a
standard deviation of 2.43, can you conclude that the weights are relatively
more homogeneous than the prices?

Solution:
Computing the coefficients of variation,

a. weights of 10 boxes

$CV_W = \frac{s}{\bar{x}} \times 100 = \frac{9.64}{278} \times 100 = 3.4676258 \cong 3.47\%$
b. prices of 10 boxes

$CV_P = \frac{s}{\bar{x}} \times 100 = \frac{2.43}{34.83} \times 100 = 6.9767441 \cong 6.98\%$

We can conclude that the weights are relatively more homogeneous than the
prices.

There are other measures of relative variation which may be defined in
terms of statistical measures other than the standard deviation and the mean.
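
The three coefficients of variation from the examples can be recomputed with a small Python helper. The function name is hypothetical, and the inputs are the means and standard deviations quoted in the examples.

```python
def coefficient_of_variation(std_dev, mean):
    """CV: the standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


cv_length = coefficient_of_variation(0.10, 240)     # room length, inches
cv_weight = coefficient_of_variation(9.64, 278)     # cereal box weight, grams
cv_price = coefficient_of_variation(2.43, 34.83)    # cereal box price

print(round(cv_length, 2), round(cv_weight, 2), round(cv_price, 2))  # 0.04 3.47 6.98
print(cv_weight < cv_price)  # True: the weights are relatively more homogeneous
```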
