P102 Lesson 4
Objectives:
A. MEASURES OF VARIABILITY
The term variability has much the same meaning in statistics as it has in everyday
language; to say that things are variable means that they are not all the same. In
statistics, our goal is to measure the amount of variability for a particular set of
scores, a distribution. In simple terms, if the scores in a distribution are all the same,
then there is no variability. If there are small differences between scores, then the
variability is small (homogeneous), and if there are large differences between scores,
then the variability is large (heterogeneous).
The figure below shows two distributions of familiar values for the population of
adult males: part (a) shows the distribution of men’s heights (in inches), and part (b)
shows the distribution of men’s weights (in pounds). Notice that the two distributions
differ in terms of central tendency. The mean height is 70 inches (5 feet, 10 inches) and
the mean weight is 170 pounds. In addition, notice that the distributions differ in terms
of variability. For example, most heights are clustered close together, within 5 or 6 inches
of the mean. On the other hand, weights are spread over a much wider range. In the
weight distribution it is not unusual to find individuals who are located more than 30
pounds away from the mean, and it would not be surprising to find two individuals whose
weights differ by more than 30 or 40 pounds.
The purpose of measuring variability is to obtain an objective measure of how
the scores are spread out in a distribution. In general, a good measure of variability
serves two purposes:
1. Variability describes the distribution. Specifically, it tells whether the scores are
clustered close together or are spread out over a large distance. Usually, variability
is defined in terms of distance. It tells how much distance to expect between one
score and another, or how much distance to expect between an individual score
and the mean. For example, we know that the heights for most adult males are
clustered close together, within 5 or 6 inches of the average. Although more
extreme heights exist, they are relatively rare.
2. Variability measures how well an individual score (or group of scores) represents
the entire distribution. This aspect of variability is very important for inferential
statistics, in which relatively small samples are used to answer questions about
populations. For example, suppose that you selected a sample of one person to
represent the entire population. Because most adult males have heights that are
within a few inches of the population average (the distances are small), there is
a very good chance that you would select someone whose height is within 6
inches of the population mean. On the other hand, the scores are much more
spread out (greater distances) in the distribution of weights. In this case, you
probably would not obtain someone whose weight was within 6 pounds of the
population mean. Thus, variability provides information about how much error to
expect if you are using a sample to represent a population.
Range
Range is the distance covered by the scores in a distribution, from the smallest score
to the largest score.
Formula:
Range = Highest score – Lowest score
Example:
Range = 28 – 17
Range = 11
The range is probably the most obvious way to describe how spread out the
scores are—simply find the distance between the maximum and the minimum scores. The
problem with using the range as a measure of variability is that it is completely
determined by the two extreme values and ignores the other scores in the distribution.
Thus, a distribution with one unusually large (or small) score will have a large range
even if the other scores are all clustered close together. Because the range does not
consider all the scores in the distribution, it often does not give an accurate description
of the variability for the entire distribution. For this reason, the range is considered to
be a crude and unreliable measure of variability.
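The sensitivity of the range to a single extreme score can be sketched in a few lines of Python. The scores below are hypothetical (they are not from the lesson) and were chosen only so that the smallest and largest values match the 17 and 28 used in the formula example; the helper name `score_range` is likewise an illustrative choice.

```python
# Hypothetical scores (illustrative only, not data from the lesson).
scores = [17, 19, 20, 21, 22, 28]

def score_range(values):
    """Return the distance between the largest and the smallest score."""
    return max(values) - min(values)

print(score_range(scores))         # 28 - 17 = 11

# Replace the top score with one unusually large value.
with_outlier = scores[:-1] + [60]
print(score_range(with_outlier))   # 60 - 17 = 43, although five scores are unchanged
```

Only one score changed, yet the range nearly quadrupled, which is why the range alone says little about how the bulk of the scores are spread.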
Standard Deviation
The standard deviation is the most commonly used and the most important
measure of variability. Standard deviation uses the mean of the distribution as a reference
point and measures variability by considering the distance between each score and the
mean. In simple terms, the standard deviation provides a measure of the standard, or
average, distance from the mean, and describes whether the scores are clustered closely
around the mean or are widely scattered.
A. RAW DATA
1. Raw Score Method
FORMULA:
$$ s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} $$
Where:
∑X = summation of scores
∑X² = summation of squared scores
Example:
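Scores on Quiz (X)    Squared score (X²)
14                    196
15                    225
10                    100
7                     49
8                     64
∑X = 54               ∑X² = 634
n = 5

$$ s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} $$
$$ s = \sqrt{\frac{634 - \frac{(54)^2}{5}}{5 - 1}} $$
$$ s = \sqrt{\frac{634 - 583.2}{4}} = \sqrt{\frac{50.8}{4}} $$
$$ s = \sqrt{12.7} $$
$$ s = 3.56 $$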
* The average distance of the scores from the mean is 3.56.
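The raw score computation above can be checked with a short Python sketch. It uses the five quiz scores from the example; the variable names are illustrative, and `statistics.stdev` from the standard library is included only as a cross-check because it also divides by n − 1.

```python
import math
import statistics

scores = [14, 15, 10, 7, 8]            # quiz scores from the example above

n = len(scores)
sum_x = sum(scores)                    # ∑X
sum_x2 = sum(x ** 2 for x in scores)   # ∑X²

# Raw score formula: s = sqrt((∑X² - (∑X)²/n) / (n - 1))
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(sum_x, sum_x2)                        # 54 634
print(round(s, 2))                          # 3.56
print(round(statistics.stdev(scores), 2))   # library cross-check, also 3.56
```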
2. Deviation Method
FORMULA:
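$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} $$
Where:
n = total number of scores
s = standard deviation
X̄ = mean of the scores
Example:
$$ s = \sqrt{\frac{50.8}{5 - 1}} $$
$$ s = \sqrt{12.7} $$
$$ s = 3.56 $$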
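The value 50.8 in the example is the sum of squared deviations, ∑(X − X̄)². A minimal Python sketch, assuming the same five quiz scores used in the Raw Score Method example, shows where it comes from; the variable names are illustrative.

```python
import math

scores = [14, 15, 10, 7, 8]             # quiz scores from the Raw Score Method example
n = len(scores)
mean = sum(scores) / n                   # X̄ = 54 / 5 = 10.8

# Deviation of each score from the mean, and its square.
deviations = [x - mean for x in scores]
squared = [d ** 2 for d in deviations]

# One row per score: X, (X - X̄), (X - X̄)²
for x, d, d2 in zip(scores, deviations, squared):
    print(f"{x:>3} {d:>6.1f} {d2:>7.2f}")

sum_sq = sum(squared)                    # ∑(X - X̄)²
s = math.sqrt(sum_sq / (n - 1))
print(round(sum_sq, 1), round(s, 2))     # 50.8 3.56
```

Both methods give the same result because ∑X² − (∑X)²/n is algebraically equal to ∑(X − X̄)².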
B. ORGANIZED DATA
1. Raw Score Method
FORMULA:
$$ s = \sqrt{\frac{n(\sum fX^2) - (\sum fX)^2}{n(n - 1)}} $$
Where:
n = total frequencies
s = standard deviation
∑fX² = summation of (X)(fX)
∑fX = summation of fX
Example:
Class Intervals   f     X (Midpoint of each class interval)   fX     fX² = (X)(fX)
15-17             4     16                                    64     1024
12-14             6     13                                    78     1014
9-11              8     10                                    80     800
6-8               12    7                                     84     588
3-5               10    4                                     40     160
                  ∑f = 40 (n)                                 ∑fX = 346   ∑fX² = 3586
Step 1: Compute for n by adding all frequencies
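Step 2: Compute for the midpoint of each class interval using the formula
$$ X = \frac{LL + UL}{2} $$
where LL is the lower limit and UL is the upper limit of the class interval.
Step 3: Multiply each class interval’s frequency by its corresponding midpoint
(fX)
Step 4: Multiply each fX by its midpoint again to obtain fX², that is, (X)(fX)
Step 5: Add the fX column and the fX² column, then substitute the sums into the formula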
$$ s = \sqrt{\frac{n(\sum fX^2) - (\sum fX)^2}{n(n - 1)}} $$
$$ s = \sqrt{\frac{40(3586) - (346)^2}{40(40 - 1)}} $$
$$ s = 3.90 $$
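The steps above can be sketched in Python as follows. The class limits and frequencies are taken from the example table; the tuple layout and variable names are illustrative assumptions, not part of the lesson.

```python
import math

# (lower limit, upper limit, frequency) for each class interval in the example
classes = [(15, 17, 4), (12, 14, 6), (9, 11, 8), (6, 8, 12), (3, 5, 10)]

n = sum(f for _, _, f in classes)                          # Step 1: ∑f
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]       # Step 2: X = (LL + UL) / 2
fx = [f * x for (_, _, f), x in zip(classes, midpoints)]   # Step 3: fX
fx2 = [x * v for x, v in zip(midpoints, fx)]               # (X)(fX) = fX²

sum_fx, sum_fx2 = sum(fx), sum(fx2)
s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(n, sum_fx, sum_fx2)   # 40 346.0 3586.0
print(round(s, 2))          # 3.9
```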
2. Deviation Method
FORMULA:
$$ s = i\sqrt{\frac{\sum fd^2}{n - 1} - \frac{(\sum fd)^2}{n(n - 1)}} $$
Where:
n = total frequencies
s = standard deviation
i = class width/class size
∑fd² = summation of (d)(fd)
∑fd = summation of fd
Example:
Class Intervals   f     d     fd     fd² = (d)(fd)
15-17             4     3     12     36
12-14             6     2     12     24
9-11              8     1     8      8
6-8               12    0     0      0
3-5               10    -1    -10    10
i = 3             ∑f = 40 (n)   ∑fd = 22   ∑fd² = 78
$$ s = i\sqrt{\frac{\sum fd^2}{n - 1} - \frac{(\sum fd)^2}{n(n - 1)}} $$
$$ s = 3\sqrt{\frac{78}{40 - 1} - \frac{(22)^2}{40(40 - 1)}} $$
$$ s = 3.90 $$
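A short Python sketch of the deviation method for organized data, using the f and d columns from the example table. In that table d counts how many classes an interval lies above or below the 6-8 class, which is coded 0; the variable names here are illustrative.

```python
import math

i = 3   # class width
# (class interval, frequency f, coded deviation d) from the example table
table = [("15-17", 4, 3), ("12-14", 6, 2), ("9-11", 8, 1), ("6-8", 12, 0), ("3-5", 10, -1)]

n = sum(f for _, f, _ in table)
sum_fd = sum(f * d for _, f, d in table)         # ∑fd
sum_fd2 = sum(f * d * d for _, f, d in table)    # ∑fd² = ∑(d)(fd)

s = i * math.sqrt(sum_fd2 / (n - 1) - sum_fd ** 2 / (n * (n - 1)))
print(n, sum_fd, sum_fd2, round(s, 2))           # 40 22 78 3.9
```

The result agrees with the Raw Score Method for the same table, which is expected: coding the classes as d only rescales and shifts the midpoints, and multiplying by i undoes the rescaling.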
The standard deviation has many important characteristics. First, the standard
deviation gives us a measure of dispersion relative to the mean. This differs from the
range, which gives us an absolute measure of the spread between the two most
extreme scores. Second, the standard deviation is sensitive to each score in the
distribution. If a score is moved closer to the mean, then the standard deviation will
become smaller. Conversely, if a score shifts away from the mean, then the standard
deviation will increase. Third, like the mean, the standard deviation is stable with
regard to sampling fluctuations. If samples were taken repeatedly from populations of
the type usually encountered in the behavioral sciences, the standard deviation of the
samples would vary much less from sample to sample than the range. This property is
one of the main reasons why the standard deviation is used so much more often than
the range for reporting variability.
Variance
The variance of a set of scores is just the square of the standard deviation or
the average squared distance from the mean.
FORMULA:
$$ s^2 = (s)^2 $$
Where: s² = variance
s = standard deviation
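As a quick illustration, the relationship between the two measures can be checked with Python's standard `statistics` module, here applied to the five quiz scores from the standard deviation example (an illustrative choice of data). Both functions use the n − 1 divisor, matching the formulas in this lesson.

```python
import statistics

scores = [14, 15, 10, 7, 8]               # quiz scores from the standard deviation example

s = statistics.stdev(scores)              # sample standard deviation (divides by n - 1)
variance = statistics.variance(scores)    # sample variance (divides by n - 1)

print(round(s, 2))          # 3.56
print(round(variance, 1))   # 12.7
print(round(s ** 2, 1))     # 12.7, the square of the standard deviation
```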
Example:
Children who viewed the violent cartoon displayed more aggressive responses
(M = 12.45, SD = 3.7) than those who viewed the control cartoon (M = 4.22,
SD = 1.04).
When reporting the descriptive measures for several groups, the findings may be
Problem: 10 students recorded their total number of dreams during the last 3 weeks.
7 8 8 7 3 1 6 9 3 8
B. MEASURES OF POSITION
Quartile
A distribution of test scores or data can be divided into four parts such that
25% of the test scores occur in each quarter. Thus, the first quartile cuts off the
lowest 25%, the second quartile cuts off the lowest 50%, and the third quartile cuts
off the lowest 75%. (Note that the second quartile is also the median.) The
quartiles/quartile points represent the dividing points between the four quarters in the
distribution. There are three of them, respectively labeled Q1, Q2, and Q3. To
differentiate, quartile refers to a specific point whereas quarter refers to an interval. An
individual score may, for example, fall at the third quartile or in the third quarter (but
not “in” the third quartile or “at” the third quarter).
Simply put, quartiles divide your data into quarters: the lowest quarter, two
middle quarters, and a highest quarter.
FORMULA:
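$$ Q_1 = L + i\left(\frac{\frac{N}{4} - cf}{fm}\right) $$
$$ Q_2 = L + i\left(\frac{\frac{2N}{4} - cf}{fm}\right) $$
$$ Q_3 = L + i\left(\frac{\frac{3N}{4} - cf}{fm}\right) $$
Where:
N = total frequencies
L = real lower limit of the quartile class
i = class size/class width
cf = cumulative frequency below the quartile class
fm = frequency of the quartile class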
Class Intervals   f     cf    rf     p      cp
15-17             4     40    0.10   10%    100%
12-14             6     36    0.15   15%    90%
9-11              8     30    0.20   20%    75%
6-8               12    22    0.30   30%    55%
3-5               10    10    0.25   25%    25%
i = 3             ∑f = 40 (N)   ∑rf = 1   ∑p = 100%
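The cf, rf, p, and cp columns of this table can be derived from the frequencies alone. The sketch below is a minimal Python illustration, assuming cf is accumulated from the lowest class upward (as the table values indicate); the names `rows` and `running` are illustrative.

```python
# Class intervals and frequencies from the table, listed from the lowest class up.
intervals = ["3-5", "6-8", "9-11", "12-14", "15-17"]
freqs = [10, 12, 8, 6, 4]

N = sum(freqs)
rows = []
running = 0
for label, f in zip(intervals, freqs):
    running += f          # cf: frequencies accumulated from the bottom
    rf = f / N            # relative frequency
    # (interval, f, cf, rf, p in percent, cp in percent)
    rows.append((label, f, running, rf, rf * 100, running / N * 100))

# Print from the highest class down, matching the table layout.
for label, f, cf, rf, p, cp in reversed(rows):
    print(f"{label:<6} {f:>3} {cf:>3} {rf:>5.2f} {p:>4.0f}% {cp:>4.0f}%")
```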
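QUARTILE 1
$$ Q_1 = L + i\left(\frac{\frac{N}{4} - cf}{fm}\right) $$
$$ Q_1 = 2.5 + 3\left(\frac{\frac{40}{4} - 0}{10}\right) $$
$$ Q_1 = 5.5 $$
* 25% of the scores fall below 5.5
* 25% of the test takers obtained a score below 5.5

QUARTILE 2
$$ Q_2 = L + i\left(\frac{\frac{2N}{4} - cf}{fm}\right) $$
$$ Q_2 = 5.5 + 3\left(\frac{\frac{2(40)}{4} - 10}{12}\right) $$
$$ Q_2 = 8 $$
* 50% of the scores fall below 8
* 50% of the test takers obtained a score below 8

QUARTILE 3
$$ Q_3 = L + i\left(\frac{\frac{3N}{4} - cf}{fm}\right) $$
$$ Q_3 = 8.5 + 3\left(\frac{\frac{3(40)}{4} - 22}{8}\right) $$
$$ Q_3 = 11.5 $$
* 75% of the scores fall below 11.5
* 75% of the test takers obtained a score below 11.5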
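The quartile computations above follow one pattern: find the class whose cumulative frequency first reaches kN/4, then interpolate inside it. A minimal Python sketch of that pattern is given below. The function name `grouped_point`, the constant `CLASSES`, and the rule "first class whose cf reaches the target" are assumptions made for illustration; real limits are taken as the lower limit minus 0.5, as in the worked values.

```python
# Each class: (lower limit, upper limit, frequency), lowest class first.
CLASSES = [(3, 5, 10), (6, 8, 12), (9, 11, 8), (12, 14, 6), (15, 17, 4)]

def grouped_point(k, parts, classes=CLASSES):
    """Return the k-th of `parts` division points for grouped data,
    using L + i * ((k*N/parts - cf) / fm)."""
    total = sum(f for _, _, f in classes)
    target = k * total / parts            # e.g. N/4, 2N/4, 3N/4 for quartiles
    cf = 0                                # cumulative frequency below the class
    for lower, upper, fm in classes:
        if cf + fm >= target:             # first class whose cf reaches the target
            real_lower = lower - 0.5      # real lower limit L
            width = upper - lower + 1     # class size i
            return real_lower + width * (target - cf) / fm
        cf += fm
    raise ValueError("k must not exceed parts")

for k in (1, 2, 3):
    print(f"Q{k} =", grouped_point(k, 4))   # Q1 = 5.5, Q2 = 8.0, Q3 = 11.5
```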
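MEDIAN
$$ Md = L + i\left(\frac{\frac{n}{2} - cf}{fm}\right) $$
$$ Md = 5.5 + 3\left(\frac{\frac{40}{2} - 10}{12}\right) $$
$$ Md = 8 $$

QUARTILE 2
$$ Q_2 = L + i\left(\frac{\frac{2N}{4} - cf}{fm}\right) $$
$$ Q_2 = 5.5 + 3\left(\frac{\frac{2(40)}{4} - 10}{12}\right) $$
$$ Q_2 = 8 $$

Md = Q2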
Decile
Deciles are similar to quartiles except that they use points that mark 10%
rather than 25% intervals. Thus, the top decile, or D9, is the point below which 90%
of the cases fall. The next decile (D8) marks the point below which 80% of the cases
fall, and so forth. Thus, deciles split the data into ten equal parts, with the first decile
cutting off the lowest 10%, the second decile cutting off the lowest 20%, and so on.
FORMULA:
$$ D_k = L + i\left(\frac{\frac{k(n)}{10} - cf}{fm}\right) $$
Where:
n = total frequencies
k = desired decile point/decile point to look for
L = real lower limit of the decile class
i = class size/class width
cf = cumulative frequency below the decile class
fm = frequency of the decile class
DECILE 3 = 6
* 30% of the scores fall below 6
* 30% of the students got a score below 6

DECILE 5 = 8
* 50% of the scores fall below 8
* 50% of the students got a score below 8

DECILE 7 = 10.75
* 70% of the scores fall below 10.75
* 70% of the students got a score below 10.75
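The substitutions behind the 30%, 50%, and 70% points (D3, D5, and D7) are not shown above. The Python sketch below fills them in under the assumption that the decile classes are read from the cf column of the frequency table used for the quartiles; the function name `decile` and its parameter names are illustrative.

```python
def decile(k, n, L, i, cf, fm):
    """D_k = L + i * ((k*n/10 - cf) / fm) for grouped data."""
    return L + i * ((k * n / 10 - cf) / fm)

n, i = 40, 3   # total frequencies and class width from the table

# D3: 3(40)/10 = 12 falls in class 6-8  (L = 5.5, cf below = 10, fm = 12)
d3 = decile(3, n, L=5.5, i=i, cf=10, fm=12)
# D5: 5(40)/10 = 20 falls in class 6-8  (same class as D3)
d5 = decile(5, n, L=5.5, i=i, cf=10, fm=12)
# D7: 7(40)/10 = 28 falls in class 9-11 (L = 8.5, cf below = 22, fm = 8)
d7 = decile(7, n, L=8.5, i=i, cf=22, fm=8)

print(round(d3, 2), round(d5, 2), round(d7, 2))   # 6.0 8.0 10.75
```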
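MEDIAN
$$ Md = L + i\left(\frac{\frac{n}{2} - cf}{fm}\right) $$
$$ Md = 5.5 + 3\left(\frac{\frac{40}{2} - 10}{12}\right) $$
$$ Md = 8 $$

QUARTILE 2
$$ Q_2 = L + i\left(\frac{\frac{2N}{4} - cf}{fm}\right) $$
$$ Q_2 = 5.5 + 3\left(\frac{\frac{2(40)}{4} - 10}{12}\right) $$
$$ Q_2 = 8 $$

DECILE 5
$$ D_5 = L + i\left(\frac{\frac{k(n)}{10} - cf}{fm}\right) $$
$$ D_5 = 5.5 + 3\left(\frac{\frac{5(40)}{10} - 10}{12}\right) $$
$$ D_5 = 8 $$

Md = Q2 = D5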
Percentile
It can be seen that a percentile is a ranking that conveys information about the
relative position of a score within a distribution of scores. More formally defined, a
percentile is an expression of the percentage of people whose score on a test or
measure falls below a particular raw score. Percentile is used extensively in education
to compare the performance of an individual to that of a reference group.
Percentile is a measure that divides the distribution into 100 equal parts. Further, it is
the value on the measurement scale below which a specified percentage of the scores
in the distribution fall. For example, if 80th percentile (P80) = 65, then 80% of all
examinees scored below 65.
FORMULA:
$$ P_k = L + i\left(\frac{\frac{k(n)}{100} - cf}{fm}\right) $$
Where:
n = total frequencies
k = desired percentile point/percentile point to look for
L = real lower limit of the percentile class
i = class size/class width
cf = cumulative frequency below the percentile class
fm = frequency of the percentile class
Example: Compute for Percentiles 15, 50, and 95
Class Intervals   f     cf    rf     p      cp
15-17             4     40    0.10   10%    100%
12-14             6     36    0.15   15%    90%
9-11              8     30    0.20   20%    75%
6-8               12    22    0.30   30%    55%
3-5               10    10    0.25   25%    25%
i = 3             ∑f = 40 (N)   ∑rf = 1   ∑p = 100%
PERCENTILE 15 = 4.3
* 15% of the scores fall below 4.3
* 15% of the examinees scored below 4.3

PERCENTILE 50 = 8
* 50% of the scores fall below 8
* 50% of the examinees scored below 8

PERCENTILE 95 = 16
* 95% of the scores fall below 16
* 95% of the examinees scored below 16
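The substitutions for the 15th and 95th percentiles are not written out above. A worked sketch, assuming the percentile class is the first class whose cf reaches k(n)/100 in the table:

```latex
% P15: 15(40)/100 = 6, which falls in class 3-5 (L = 2.5, cf = 0, fm = 10)
\[
P_{15} = 2.5 + 3\left(\frac{\frac{15(40)}{100} - 0}{10}\right)
       = 2.5 + 3(0.6) = 4.3
\]

% P95: 95(40)/100 = 38, which falls in class 15-17 (L = 14.5, cf = 36, fm = 4)
\[
P_{95} = 14.5 + 3\left(\frac{\frac{95(40)}{100} - 36}{4}\right)
       = 14.5 + 3(0.5) = 16
\]
```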
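MEDIAN
$$ Md = L + i\left(\frac{\frac{n}{2} - cf}{fm}\right) $$
$$ Md = 5.5 + 3\left(\frac{\frac{40}{2} - 10}{12}\right) $$
$$ Md = 8 $$

QUARTILE 2
$$ Q_2 = L + i\left(\frac{\frac{2N}{4} - cf}{fm}\right) $$
$$ Q_2 = 5.5 + 3\left(\frac{\frac{2(40)}{4} - 10}{12}\right) $$
$$ Q_2 = 8 $$

DECILE 5
$$ D_5 = L + i\left(\frac{\frac{k(n)}{10} - cf}{fm}\right) $$
$$ D_5 = 5.5 + 3\left(\frac{\frac{5(40)}{10} - 10}{12}\right) $$
$$ D_5 = 8 $$

PERCENTILE 50
$$ P_{50} = L + i\left(\frac{\frac{k(n)}{100} - cf}{fm}\right) $$
$$ P_{50} = 5.5 + 3\left(\frac{\frac{50(40)}{100} - 10}{12}\right) $$
$$ P_{50} = 8 $$

Md = Q2 = D5 = P50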
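The four measures agree because each formula looks for the same position in the distribution. Written out for this example:

```latex
\[
\frac{n}{2} = \frac{2N}{4} = \frac{5(n)}{10} = \frac{50(n)}{100}
\quad\Longrightarrow\quad
\frac{40}{2} = \frac{2(40)}{4} = \frac{5(40)}{10} = \frac{50(40)}{100} = 20
\]
```

Since the target position is 20 in every case, all four formulas select the same class (6-8) and use the same L = 5.5, cf = 10, and fm = 12, so the results must be identical.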
Problem: 10 students recorded their total number of dreams during the last 3 weeks.
7 8 8 7 3 1 6 9 3 8
Step 1: Enter the scores in SPSS, then click Analyze -> Descriptive Statistics -> Frequencies.
Step 2: This will bring up the Frequencies dialog box. You need to get the variable for
which you wish to calculate the percentile(s) into the box on the right. You can drag and
drop, or use the arrow button, as shown below.
Step 3: Once you’ve got your variable into the right column, hit the Statistics button. The
Frequencies: Statistics dialog will pop up. As you can see this allows you to choose
from a variety of measures.
Step 4: To add a percentile of your choice, select the Percentile(s) option, type the
percentile value into the textbox (where we’ve got 83), and then click the Add button.
You can repeat this process if you want SPSS to calculate additional percentiles.
Step 5: You’ll see above that we’ve also selected Quartiles (which will generate the
25th, 50th and 75th percentiles), and the Mean and Median. Once you’ve made your
selection, click the Continue button, and then click OK in the Frequencies dialog to
prompt SPSS to do the calculations.
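For readers without SPSS, the same descriptive measures can be sketched with Python's standard `statistics` module, using the ten scores from the problem. Percentile values for raw data depend on the interpolation rule; the `method="exclusive"` rule used here is an assumption chosen for illustration, and results from SPSS or other software may differ slightly for other data sets.

```python
import statistics

dreams = [7, 8, 8, 7, 3, 1, 6, 9, 3, 8]   # scores from the problem above

mean = statistics.mean(dreams)
median = statistics.median(dreams)
# n=4 requests the three quartile cut points (25th, 50th, 75th percentiles).
q1, q2, q3 = statistics.quantiles(dreams, n=4, method="exclusive")

print(mean, median)   # 6 7.0
print(q1, q2, q3)     # 3.0 7.0 8.0
```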