
Measures of Dispersion Tendency

The document discusses various measures used to describe the dispersion or variation of data values around the mean or median. It defines and provides examples of range, mean deviation, variance, standard deviation, interquartile range, box plots, and the coefficient of variation. These measures allow researchers to quantify and compare the spread or variability present in different data sets beyond just looking at measures of central tendency.

Uploaded by

ashraf helmy

Measures of dispersion tendency

• Central tendency reflects the average level of a quantitative variable.
• But knowing only the central tendency of a distribution is not enough; we should also describe the variation of the observations.
• A measure of location, such as the mean or the median, only describes the center of the data. It is valuable from that standpoint, but it does not tell us anything about the spread of the data.
• A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.

Dispersion tendency reflects the degree of variability of different measurements.

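1-Range
- Simplest measure of variation
- Difference between the largest and the smallest observations:
Range = largest value - smallest value
Example: for the cappuccino sales data used in the next section (20, 40, 50, 60, 80), the range is 80 - 20 = 60.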

2-Mean Deviation
MEAN DEVIATION: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.
- A shortcoming of the range is that it is based on only two values, the highest and the lowest; it does not take all of the values into consideration.
- The mean deviation does. It measures the mean amount by which the values in a population, or sample, vary from their mean.

$$ MD = \frac{\sum |x - \bar{x}|}{n} $$

where x is each observation, x̄ is the arithmetic mean, and n is the number of observations.

Example:
The number of cappuccinos sold at the Starbucks location in the Orange County Airport between
4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50, 60, and 80. Determine the mean
deviation for the number of cappuccinos sold.
Step 1: Compute the mean

$$ \bar{x} = \frac{\sum x}{n} = \frac{20 + 40 + 50 + 60 + 80}{5} = 50 $$
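Step 2: Subtract the mean (50) from each observation and take the absolute value of each difference: |20 - 50| = 30, |40 - 50| = 10, |50 - 50| = 0, |60 - 50| = 10, |80 - 50| = 30.
Step 3: Sum the absolute differences found in step 2, then divide by the number of observations: MD = (30 + 10 + 0 + 10 + 30) / 5 = 80 / 5 = 16. On average, the number of cappuccinos sold deviates from the mean by 16 per day.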
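The three steps above can be checked with a few lines of Python. This is only a minimal sketch; the function name `mean_deviation` is chosen here for illustration and does not come from the notes.

```python
# Cappuccino sales for the 5 sampled days (data from the example above).
sales = [20, 40, 50, 60, 80]


def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)                  # Step 1
    abs_devs = [abs(v - mean) for v in values]        # Step 2
    return sum(abs_devs) / len(values)                # Step 3


data_range = max(sales) - min(sales)   # largest minus smallest observation
md = mean_deviation(sales)

print("range =", data_range)
print("mean deviation =", md)
```

The range uses only the two extreme values, while the mean deviation uses all five observations.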

3- Variance and Standard Deviation

• VARIANCE: The arithmetic mean of the squared deviations from the mean.
• STANDARD DEVIATION: The square root of the variance.
- The variance and standard deviation are nonnegative and are zero only if all observations are the same.
- For populations whose values are near the mean, the variance and standard deviation will be small.
- For populations whose values are dispersed from the mean, the population variance and standard deviation will be large.
- The variance overcomes the weakness of the range by using all the values in the population.

USE OF STANDARD DEVIATION


Empirical Rule
For a bell-shaped distribution, approximately:
1. 68% of the observations lie within one standard deviation of the mean
2. 95% of the observations lie within two standard deviations of the mean
3. 99.7% of the observations lie within three standard deviations of the mean
Example
The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 years and a
standard deviation of 12 years. Determine the approximate percentage of people who are 16 to
64 years old.
Solution
From the given information, for this distribution,
x̄ = 40 years and s = 12 years
Each of the two points, 16 and 64, is 24 units away from the mean (64 - 40 = 24, 40 - 16 = 24), which is two standard deviations (2 × 12 = 24).
Because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the people in the sample are 16 to 64 years old.
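As a rough cross-check of the Empirical Rule figure, the sketch below assumes the ages follow an exactly normal distribution, which is stronger than the "bell-shaped" description in the example. It uses `NormalDist` from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Assumption: ages are exactly normal with mean 40 and standard deviation 12.
ages = NormalDist(mu=40, sigma=12)

# Share of the distribution between 16 and 64 (mean +/- 2 standard deviations).
share = ages.cdf(64) - ages.cdf(16)

print(f"share between 16 and 64: {share:.4f}")
```

Under that assumption the share is slightly above 95%, which is why the rule is stated as an approximation.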

Variance

The population variance of a set of N measurements x1, x2, … with arithmetic mean μ is the sum of the squared deviations from μ divided by N:

$$ \sigma^2 = \frac{\sum (X - \mu)^2}{N} $$

The sample variance of a set of n measurements x1, x2, … with arithmetic mean X̄ is the sum of the squared deviations from X̄ divided by n - 1, the degrees of freedom:

$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} $$

Short-cut Formulas for the Variance and Standard Deviation

Example:
The time between an electric light stimulus and a bar press to avoid a shock was noted for each of five
conditioned rats. Use the data below to compute the sample variance.
Shock avoidance times (seconds): 5,4,3,1,3
Solution
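The deviations and the squared deviations are shown below. The sample mean is 3.2.

X        X - X̄     (X - X̄)²
5         1.8       3.24
4         0.8       0.64
3        -0.2       0.04
1        -2.2       4.84
3        -0.2       0.04
Total     0.0       8.80

Using the total of the squared deviations column, we find the sample variance to be

$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{8.8}{4} = 2.2 $$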
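The same calculation can be sketched in Python, first by hand and then with the standard `statistics` module as a cross-check. The variable names are illustrative only.

```python
import statistics

# Shock avoidance times in seconds (data from the example above).
times = [5, 4, 3, 1, 3]

mean = statistics.mean(times)                      # sample mean
squared_devs = [(t - mean) ** 2 for t in times]    # squared deviations
s2 = sum(squared_devs) / (len(times) - 1)          # divide by n - 1
s = s2 ** 0.5                                      # sample standard deviation

print(f"mean = {mean}, s^2 = {s2:.2f}, s = {s:.4f}")

# Library cross-check: variance() also divides by n - 1,
# while pvariance() divides by n (population formula).
print(statistics.variance(times), statistics.pvariance(times))
```

Dividing by n instead of n - 1 gives the smaller population value, 8.8 / 5 = 1.76.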
Standard deviation
[definition]
Standard deviation is the positive square root of the variance.
The value of the standard deviation tells how closely the values of a data set are clustered around the mean.
[symbol]

Population standard deviation σ:

$$ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} $$

Sample standard deviation s:

$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} $$
Properties
- It is the most widely used measure of the variability of a quantitative variable, because it takes every observation into account.
- It is most appropriate when the data are approximately normally distributed (bell-shaped); interpretations such as the Empirical Rule rely on that shape.

Variance and Standard Deviation

• In general, a lower value of the standard deviation for a data set indicates that the values of that data set are spread over a relatively smaller range around the mean.
• In contrast, a large value of the standard deviation for a data set indicates that the values of that data set are spread over a relatively large range around the mean.
• The variance calculated for population data is denoted by σ² (read as sigma squared), and the variance calculated for sample data is denoted by s².
• The standard deviation calculated for population data is denoted by σ, and the standard deviation calculated for sample data is denoted by s.

MEASURES OF POSITION Quartiles and Interquartile Range


Quartiles
- Quartiles split the ranked data into 4 segments with an equal number of values per segment.

• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
• Q2 is the same as the median (50% are smaller, 50% are larger).
• Only 25% of the observations are greater than the third quartile, Q3.
Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1)
(the median position)
Third quartile position: Q3 = 0.75(n+1)
where n is the number of observed values
• Example: Find the first quartile
Sample Ranked Data: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values (12 and 13): Q1 = 12.5
Interquartile Range
The interquartile range is the distance between the third quartile Q3 (P75) and the first quartile Q1 (P25).
• This distance will include the middle 50 percent of the observations.
• Interquartile range = Q3 - Q1

Example
The following are the ages of nine employees of an insurance company:
47 28 39 51 33 37 59 24 33
a) Find the values of the three quartiles. Where does the age of 28 fall in relation to the
ages of the employees?
b) Find the interquartile range.
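Solution
a) Ranked data: 24 28 33 33 37 39 47 51 59 (n = 9)
Q1 is in the 0.25(9+1) = 2.5 position, so Q1 = (28 + 33) / 2 = 30.5
Q2 is in the 0.50(9+1) = 5 position, so Q2 = 37
Q3 is in the 0.75(9+1) = 7.5 position, so Q3 = (47 + 51) / 2 = 49
The age of 28 falls in the lowest 25% of the ages.

b) IQR = Interquartile range = Q3 - Q1
= 49 - 30.5 = 18.5 years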
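For a quick check, Python's standard `statistics.quantiles` function with `method="exclusive"` uses the same (n+1) position rule as the quartile formulas in these notes. The sketch below applies it to the ages from the example.

```python
from statistics import quantiles

# Ages of the nine employees (data from the example above).
ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(ages, n=4, method="exclusive")
iqr = q3 - q1

print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
print("IQR =", iqr)
```

The function ranks the data itself, so the ages can be passed in their original order.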
Box Plot
• Five specific values are used:
◦ Median, Q2
◦ First quartile, Q1
◦ Third quartile, Q3
◦ Minimum value in the data set
◦ Maximum value in the data set
• Inner Fences
◦ IQR = Q3 - Q1
◦ Lower inner fence = Q1 - 1.5 IQR
◦ Upper inner fence = Q3 + 1.5 IQR

Analyzing The Graph

- The data values found inside the box represent the middle half (50%) of the data.
- The line segment inside the box represents the median.

Example
The following data are the incomes (in thousands of dollars) for a sample of 12 households.
35 29 44 72 34 64 41 50 54 104 39 58
Construct a box-and-whisker plot for these data.

Solution
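Step 1. Rank the data, then find the median, the quartiles, and the interquartile range. Here Q1 and Q3 are taken as the medians of the lower and upper halves of the ranked data.
29 34 35 39 41 44 50 54 58 64 72 104
Median = (44 + 50) / 2 = 47
Q1 = (35 + 39) / 2 = 37
Q3 = (58 + 64) / 2 = 61
IQR = Q3 - Q1 = 61 - 37 = 24
Step 2. Compute the inner fences.
1.5 x IQR = 1.5 x 24 = 36
Lower inner fence = Q1 - 36 = 37 - 36 = 1
Upper inner fence = Q3 + 36 = 61 + 36 = 97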
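The two steps above can be sketched in Python. The quartiles are computed as medians of the lower and upper halves, as in this example. Treating values outside the inner fences as outliers, and ending the whiskers at the most extreme values inside the fences, is a common convention assumed here rather than something stated in the notes.

```python
from statistics import median

# Household incomes in thousands of dollars (data from the example above).
incomes = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]

ranked = sorted(incomes)
half = len(ranked) // 2

q1 = median(ranked[:half])     # median of the lower half
q2 = median(ranked)            # overall median
q3 = median(ranked[-half:])    # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Assumed convention: points beyond the inner fences are outliers.
outliers = [x for x in ranked if x < lower_fence or x > upper_fence]
inside = [x for x in ranked if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("Q1, median, Q3:", q1, q2, q3)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
print("whiskers:", whisker_low, whisker_high)
```

Only the income of 104 lies beyond the upper inner fence of 97, so under this convention it would be plotted as a separate point.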

Coefficient of Variation (C.V.)

- The variance and the standard deviation are useful as measures of variation of the values of a single variable for a single population (or sample).
- If we want to compare the variation of two variables, we cannot use the variance or the standard deviation because:
  1. The variables might have different units.
  2. The variables might have different means.
We need a measure of relative variation that does not depend on either the units or on how large the values are. This measure is the coefficient of variation (C.V.), the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:

$$ CV = \frac{s}{\bar{X}} \times 100\% $$

The relative variability in the 1st data set is larger than the relative variability in the 2nd data set if C.V.1 > C.V.2 (and vice versa).

Coefficient of Variation Usage

• Comparing the variability of measurements with different units, such as height (cm) and weight (kg).
• Comparing groups whose means are very different, one very small and the other very large, such as the weights of elephants and infants.

Example
A doctor measured the heights and weights of 50 people, with the following results:
Height: X̄ = 165 cm, S = 8.5 cm
Weight: X̄ = 64 kg, S = 7 kg

Which is more variable, height or weight?

Solution
Height: CV1 = (8.5 / 165) × 100% = 5.15%
Weight: CV2 = (7 / 64) × 100% = 10.9%
Since CV1 < CV2, the relative variability in the 2nd data set is larger than the relative variability in the 1st data set. So weight is relatively more variable than height.
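The comparison can be reproduced with a short Python sketch. The helper name `coefficient_of_variation` is illustrative and not part of the notes.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# Summary statistics from the example above.
cv_height = coefficient_of_variation(8.5, 165)   # both values in cm
cv_weight = coefficient_of_variation(7, 64)      # both values in kg

print(f"height CV = {cv_height:.2f}%")
print(f"weight CV = {cv_weight:.2f}%")
print("more variable:", "weight" if cv_weight > cv_height else "height")
```

Because the units cancel in the ratio, the two percentages can be compared directly even though one variable is in cm and the other in kg.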

Coefficient of Skewness

The coefficient of skewness, S, is a summary measure of skewness.

- If S < 0, the distribution is negatively skewed (skewed to the left).
- If S = 0, the distribution is symmetric (not skewed).
- If S > 0, the distribution is positively skewed (skewed to the right).
Skewness: Box and Whisker Plots, and Coefficient of Skewness

Example:
The following quartiles were obtained for the sample of 12 household incomes used in the box plot example:
Q1 = 37, Q2 = 47, Q3 = 61
Find: 1- Coefficient of Variation
2- Coefficient of Skewness
Solution

S > 0, the distribution is positively skewed (skewed to the right).
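The formula used in the original solution did not survive in this copy. As a hedged illustration only, the sketch below uses one common quartile-based measure, Bowley's coefficient of skewness, (Q3 + Q1 - 2·Q2) / (Q3 - Q1). Whether the notes intended this formula is an assumption, and the function name `bowley_skewness` is chosen here.

```python
def bowley_skewness(q1, q2, q3):
    """Bowley's quartile coefficient of skewness (assumed formula)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


# Quartiles from the example above.
skew = bowley_skewness(37, 47, 61)

print(f"skewness = {skew:.3f}")
print("positively skewed" if skew > 0 else "not positively skewed")
```

With these quartiles the median sits closer to Q1 than to Q3, so the coefficient is positive, consistent with the conclusion stated above.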

