
Why Study Dispersion?: Spread of The Data

This document discusses measures of dispersion and skewness in data. It explains that measures of dispersion like variance and standard deviation quantify how spread out values are from the average, and are important for comparing different data sets. It presents formulas for calculating dispersion statistics both from raw and grouped data. It also defines the coefficient of variation, a relative measure used to compare dispersions when data is in different units. Finally, it discusses how distributions can be symmetric, positively skewed, or negatively skewed based on the relationship between the mean, median and mode.

Uploaded by

Ashekin Mahadi
Copyright
© All Rights Reserved

MEASURES OF DISPERSION.

WHY STUDY DISPERSION?


An average, such as the mean or the median, only
locates the center of the data. It is valuable from
that standpoint, but an average does not tell us
anything about the spread of the data. A small value
for a measure of dispersion indicates that the data
are clustered closely, say, around the arithmetic
mean. The mean is therefore considered
representative of the data. Conversely, a large
measure of dispersion indicates that the mean is
not reliable.
A second reason for studying the dispersion in a
set of data is to compare the spread in two or
more distributions.
• METHODS OF STUDYING THE VARIATIONS:
• The Range
• The Average (Mean) Deviation
• The Variance and the Standard Deviation
• Interquartile Range.
• The mean deviation, the Variance and the
Standard deviation are all based on deviations
from the mean.
• Range:
• Range = Largest Value – smallest value
• The range is widely used in statistical process
control applications.
• A serious defect of the range is that it is based on
only two values, the largest and smallest; it does
not take into consideration all of the values.
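
The range and its main defect can be illustrated with a short Python sketch. The first list reuses the hourly wages from the ungrouped-data example in these notes. The second list, `clustered_wages`, is made-up data (an assumption for illustration only) with the same extremes but tighter values in between; the helper name `value_range` is also hypothetical.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

# Hourly wages from the ungrouped-data example in these notes.
wages = [12, 20, 16, 18, 19]

# Hypothetical data (assumed for illustration): same extremes, tighter middle.
clustered_wages = [12, 16, 16, 16, 20]

print(value_range(wages))            # 20 - 12 = 8
print(value_range(clustered_wages))  # 20 - 12 = 8
```

Both lists give the same range even though their interior values differ, which is the defect noted above: only the two extreme values enter the calculation.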
Variance and Standard Deviation

• Variance and Standard Deviation are also
based on the deviations from the mean.
• Definition:
• Variance – The arithmetic mean of the
squared deviations from the mean.
• Standard Deviation – The standard deviation
is the square root of the variance.
• When population data are used, the variance is
denoted by σ² and the standard deviation by σ.
• When sample data are used, the variance is
denoted by s² and the standard deviation by s.
Population Variance from Raw Data

• The formula for Population Variance is:

\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

• The same formula can be rewritten in a form that is
more convenient for calculation:

\[ \sigma^2 = \frac{\sum X^2 - \dfrac{(\sum X)^2}{N}}{N} \]
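
As a rough check that the definitional and computational forms of the population variance agree, here is a minimal Python sketch. The data set `population` is hypothetical (it does not come from these notes), and both function names are assumptions chosen for illustration.

```python
import math


def population_variance_definitional(values):
    """sigma^2 = sum((X - mu)^2) / N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_variance_computational(values):
    """sigma^2 = (sum(X^2) - (sum(X))^2 / N) / N."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / n


# Hypothetical population of eight values, assumed for illustration.
population = [2, 4, 4, 4, 5, 5, 7, 9]

var_def = population_variance_definitional(population)
var_comp = population_variance_computational(population)

print(var_def, var_comp)   # both forms give a variance of 4
print(math.sqrt(var_def))  # standard deviation of 2
```

The computational form needs only the running totals of X and X², which is why it is more convenient for hand calculation.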
Sample Variance from Raw Data (Page #115)

\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \]

\[ s^2 = \frac{\sum X^2 - \dfrac{(\sum X)^2}{n}}{n - 1} \]
• NOTE:
• If all the observations (items) in a sample
are very close to each other, the standard
deviation is close to zero. If the items are
well dispersed, the S.D. tends to be large.
A small S.D. means a high degree of
uniformity of the items of a series, a large
S.D. means just the opposite.
Variance and Standard Deviation from
Ungrouped Data [Page #114]
Example: Find the standard deviation from the hourly
wages of a sample of 5 workers working in a factory:
$12, 20, 16, 18, and 19.

Hourly Wage (X)    X²
$12                144
 20                400
 16                256
 18                324
 19                361
Total $85          1485

\[ s^2 = \frac{\sum X^2 - \dfrac{(\sum X)^2}{n}}{n - 1} = \frac{1485 - \dfrac{(85)^2}{5}}{5 - 1} = \frac{1485 - 1445}{4} = 10 \text{ (dollars)}^2 \]

\[ s = \sqrt{10} \approx \$3.16 \]
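
The hourly-wage example can be reproduced with a short Python sketch. It applies the computational formula for the sample variance by hand and then compares the result with `statistics.variance` and `statistics.stdev` from the Python standard library, both of which use the n - 1 divisor. The variable names are illustrative assumptions.

```python
import math
import statistics

# Hourly wages of the 5 sampled workers, from the example above.
wages = [12, 20, 16, 18, 19]

n = len(wages)
sum_x = sum(wages)                   # sum of X
sum_x2 = sum(x * x for x in wages)   # sum of X^2

# Computational formula: s^2 = (sum(X^2) - (sum(X))^2 / n) / (n - 1)
s2_manual = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s_manual = math.sqrt(s2_manual)

# Library versions (sample variance and sample standard deviation).
s2_library = statistics.variance(wages)
s_library = statistics.stdev(wages)

print(sum_x, sum_x2)       # 85 1485
print(s2_manual)           # 10.0
print(round(s_manual, 2))  # 3.16
```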
Variance and Standard Deviation from
Grouped Data

Selling price ($'000)   f    X      fX      X²        fX²
15 - 18                 8    16.5   132     272.25    2178
18 - 21                 23   19.5   448.5   380.25    8745.75
21 - 24                 17   22.5   382.5   506.25    8606.25
24 - 27                 18   25.5   459     650.25    11704.5
27 - 30                 8    28.5   228     812.25    6498
30 - 33                 4    31.5   126     992.25    3969
33 - 36                 2    34.5   69      1190.25   2380.5
Total                   80   -      1845    4803.75   44082
[CHART 3–7 (Page #117): A symmetrical, bell-shaped curve with a mean of 100 and a standard deviation of 10. Horizontal axis: 70, 80, 90, 100, 110, 120, 130. Marked areas: 68%, 95%, 99.7%.]
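Exercise: Page #84, Self-Review #58, 59, 60, 61, 62.

For the grouped selling-price data (n = 80, ΣfX = 1845, ΣfX² = 44082):

\[ s^2 = \frac{\sum fX^2 - \dfrac{(\sum fX)^2}{n}}{n - 1} = \frac{44082 - \dfrac{(1845)^2}{80}}{80 - 1} = 19.39 \text{ (\$ thousand)}^2 \]

\[ s = \sqrt{19.39} \approx 4.40 \text{ (\$ thousand)} \]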
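
The grouped-data calculation can be checked with a minimal Python sketch. It takes the class limits and frequencies from the selling-price table, uses each class midpoint as X, and applies the same computational formula with the n - 1 divisor. The variable names are illustrative assumptions, and the result is in thousands of dollars because the selling prices are quoted in $'000.

```python
import math

# Class limits ($'000) and frequencies from the selling-price table.
classes = [(15, 18), (18, 21), (21, 24), (24, 27), (27, 30), (30, 33), (33, 36)]
freqs = [8, 23, 17, 18, 8, 4, 2]

# Each class is represented by its midpoint X.
midpoints = [(low + high) / 2 for low, high in classes]

n = sum(freqs)                                             # total frequency
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))      # sum of fX
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints)) # sum of fX^2

# s^2 = (sum(fX^2) - (sum(fX))^2 / n) / (n - 1)
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(n, sum_fx, sum_fx2)  # 80 1845.0 44082.0
print(round(variance, 2))  # 19.39
print(round(std_dev, 2))   # 4.4
```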
Relative Dispersion

• A direct comparison of two or more measures of
dispersion (say, the standard deviation for a distribution
of annual incomes and the standard deviation of a
distribution of absenteeism for this same group of
employees) is impossible. Can we say that the standard
deviation of $1,200 for the income distribution is greater
than the standard deviation of 4.5 days for the
distribution of absenteeism? Obviously not, because we
cannot directly compare dollars and days absent from
work. In order to make a meaningful comparison of the
dispersion in incomes and absenteeism, we need to
convert each of these measures to a relative value, that
is, a percent.
• Karl Pearson (1857-1936), who contributed
significantly to the science of statistics,
developed a relative measure called the
coefficient of variation (CV). It is a very useful
measure when:
• [When to use (CV)]
• The data are in different units (such as dollars
and days absent).
• The data are in the same units, but the means
are far apart (such as the incomes of top
executives and the incomes of the unskilled
employees).
• COEFFICIENT OF VARIATION: The ratio
of the standard deviation to the arithmetic
mean, expressed as a percent.
• In terms of a formula for a sample:
• COEFFICIENT OF VARIATION

\[ CV = \frac{s}{\bar{X}} \times 100 \]
• EXAMPLE: A study of the amount of
bonus paid and the years of service of
employees resulted in these statistics: The
mean bonus paid was $200; the standard
deviation was $40. The mean number of
years of service was 20 years; the
standard deviation was 2 years. Compare
the relative dispersion in the two
distributions.
• Solution:
• The distributions are in different units (dollars
and years of service). Therefore, they are
converted to coefficients of variation.
• For the bonus paid: CV = ($40 / $200)(100) = 20 percent.
For the years of service: CV = (2 / 20)(100) = 10 percent.
• Interpreting, there is more dispersion relative to
the mean in the distribution of bonus paid
compared with the distribution of years of
service (because 20 percent > 10 percent).
• The same procedure is used when the data are in the
same units but the means are far apart. (See the
following example.)
• Example: The variation in the annual incomes of
executives is to be compared with the variation in
incomes of unskilled employees. For a sample of
executives, X̄ = $500,000 and s = $50,000. For a
sample of unskilled employees, X̄ = $32,000 and s =
$3,200. We are tempted to say that there is more
dispersion in the annual incomes of the executives
because $50,000 > $3,200. The means are so far apart,
however, that we need to convert the statistics to
coefficients of variation to make a meaningful
comparison of the variation in annual incomes.
• Solution: Done on the board.
Self-Review 4-4. Page: 113. Examples 19, 21.
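
Both coefficient-of-variation examples can be worked through with a small Python sketch. The function name `coefficient_of_variation` and the variable names are assumptions for illustration; the input figures are the ones given in the two examples.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / X-bar) * 100, expressed as a percent."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean


# Example 1: different units (dollars versus years of service).
cv_bonus = coefficient_of_variation(std_dev=40, mean=200)
cv_service = coefficient_of_variation(std_dev=2, mean=20)

# Example 2: same units, but means far apart.
cv_executives = coefficient_of_variation(std_dev=50_000, mean=500_000)
cv_unskilled = coefficient_of_variation(std_dev=3_200, mean=32_000)

print(cv_bonus, cv_service)         # 20.0 10.0
print(cv_executives, cv_unskilled)  # 10.0 10.0
```

In the second example the two coefficients come out equal, so relative to their means the two income distributions show the same dispersion even though $50,000 is much larger than $3,200 in absolute terms.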

• The Relative Positions of the Mean,
Median and Mode:
• A symmetric, bell-shaped (mound-shaped)
distribution has the same shape on
either side of the center. For a symmetric,
mound-shaped distribution, the mean, median
and mode are located at the center and are
always equal.
• Graph
SKEWNESS

Another characteristic of a set of data is the shape.
[Other characteristics were measures of central
location and measures of spread.] There are
four shapes commonly observed: symmetric,
positively skewed, negatively skewed, and
bimodal.
In a symmetric set of observations, the mean and
median are equal and data values are evenly
spread around these values. The data values
below the mean and median are a mirror image
of those above.
• A set of values is skewed to the right or
positively skewed if there is a single peak and
the values extend much further to the right of the
peak than to the left of the peak. In this case the
arithmetic mean is the largest of the three
measures. Why? Because the mean is
influenced more than the median or mode by a
few extremely high values. The median is
generally the next largest measure in a
positively skewed frequency distribution. The
mode is the smallest of the three measures.
• In a negatively skewed distribution there is a
single peak but the observations extend further
to the left, in the negative direction, than to the
right. In this case, the mean is the smallest of
the three measures. The mean is influenced by
a few extremely low observations. The median is
greater than the arithmetic mean, and the modal
value is the largest of the three measures.
• Graph
• Positively skewed distributions are more
common. Salaries often follow this pattern. Think
of the salaries of those employed in a small
company of about 100 people. The president
and a few top executives would have very large
salaries relative to the other workers and hence
the distribution of salaries would exhibit positive
skewness.
• A bimodal distribution will have two or more
peaks. This is often the case when the values
are from two or more populations.
• There are several formulas in the statistical
literature used to calculate skewness. The
simplest, developed by Professor Karl Pearson,
is based on the difference between the mean
and the median.
• Pearson's Coefficient of Skewness

\[ Sk = \frac{3(\text{Mean} - \text{Median})}{s} \]
• Using this relationship the coefficient of
skewness can range from -3 up to 3. A value
near -3, such as -2.57, indicates considerable
negative skewness. A value of 0, which will
occur when the mean and median are equal,
indicates the distribution is symmetrical and that
there is no skewness present.
• Example (Page #111): Following are the earnings per
share for a sample of 15 software
companies for the year 2019. The
earnings per share are arranged from smallest
to largest.
• $0.09 $0.13 $0.41 $0.51 $1.12
$1.20 $1.49 $3.18 $3.50 $6.36
$7.83 $8.92 $10.13 $12.99 $16.40
Find the coefficient of skewness and interpret.

\[ Sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017 \]

The coefficient is positive, so the earnings per share are positively skewed (to a moderate degree).
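
The skewness example can be recomputed from the raw earnings-per-share figures with a short Python sketch. It uses `statistics.mean`, `statistics.median`, and `statistics.stdev` (the sample standard deviation) from the standard library; the function name `pearson_skewness` is an assumption for illustration.

```python
import statistics


def pearson_skewness(values):
    """Pearson's coefficient: Sk = 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return 3 * (mean - median) / s


# Earnings per share for the 15 software companies in the example.
eps = [0.09, 0.13, 0.41, 0.51, 1.12,
       1.20, 1.49, 3.18, 3.50, 6.36,
       7.83, 8.92, 10.13, 12.99, 16.40]

print(round(statistics.mean(eps), 2))   # 4.95
print(statistics.median(eps))           # 3.18
print(round(statistics.stdev(eps), 2))  # 5.22
print(round(pearson_skewness(eps), 2))  # approximately 1.02
```

The mean exceeds the median here, which is the pattern described for positively skewed data.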
Empirical Rule (Online page #117)
For a symmetrical, bell-shaped frequency distribution,
approximately 68 percent of the observations will lie within plus
and minus one standard deviation of the mean;
about 95 percent of the observations will lie within plus and
minus two standard deviations of the mean;
and practically all (99.7 percent) will lie within plus and minus
three standard deviations of the mean.
These relationships are portrayed graphically in Chart 3–7 for a
bell-shaped distribution.
Example: A sample of the rental rates at University Park
Apartments approximates a symmetrical, bell-shaped
distribution. The sample mean is $500; the standard
deviation
is $20. Using the Empirical Rule, answer these questions:
1. About 68 percent of the monthly rentals are between what
two amounts?
2. About 95 percent of the monthly rentals are between what
two amounts?
3. Almost all of the monthly rentals are between what two
amounts?
Solution: Page #118.
1. About 68 percent are between $480 and $520, found by $500 ± 1($20).
2. About 95 percent are between $460 and $540, found by $500 ± 2($20).
3. Almost all (99.7 percent) are between $440 and $560, found by $500 ± 3($20).
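
The three intervals in the rental example follow directly from the mean plus and minus one, two, and three standard deviations. A minimal Python sketch, with the hypothetical helper name `empirical_rule_intervals`, is:

```python
def empirical_rule_intervals(mean, std_dev):
    """Return {k: (mean - k*s, mean + k*s)} for k = 1, 2, 3.

    Under the Empirical Rule these intervals hold roughly 68%, 95%,
    and 99.7% of the observations of a symmetrical, bell-shaped
    distribution.
    """
    return {k: (mean - k * std_dev, mean + k * std_dev) for k in (1, 2, 3)}


# Rental example: sample mean $500, standard deviation $20.
intervals = empirical_rule_intervals(mean=500, std_dev=20)

for k, (low, high) in intervals.items():
    print(k, low, high)
# 1 480 520
# 2 460 540
# 3 440 560
```

The percentages apply only when the distribution is approximately symmetrical and bell-shaped, as the example states for the rental rates.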
