
Numerical Descriptive Measures

Numerical descriptive measures provide important information about data sets beyond what is shown in graphs. The measures of central tendency (mean, median, and mode) identify the center or typical value of a data set. The mean is the average and is calculated by dividing the sum of all values by the total number of data points. The median is the middle value when the data are arranged in order. The mode is the most frequently occurring value in the data set. These measures help describe features of a distribution, such as its typical value and the relative positions of individual values.

Uploaded by Vishesh Dwivedi


NUMERICAL DESCRIPTIVE MEASURES

Graphs are one important component of statistics; however, it is also important to numerically describe the main characteristics of a data set. The numerical summary measures, such as the ones that identify the center and spread of a distribution, identify many important features of a distribution.

For example, we can prepare graphs based on family income data. However, if we want to know the income of a "typical" family (given by the center of the distribution), the spread of the distribution of incomes, or the relative position of a family with a particular income, the numerical summary measures can provide more detailed information.
MEASURES OF CENTRAL TENDENCY FOR UNGROUPED DATA
The measures discussed in this chapter include:

• Central Tendency (Mean, Median, Mode)
• Spread/Dispersion (Range, Standard Deviation)
• Position (Quartiles, Percentiles)

[Figure 3.1]
Mean
The mean for ungrouped data is obtained by dividing the sum of all values by the number of values in the data set. Thus,

Mean for population data: $\mu = \frac{\sum x}{N}$

Mean for sample data: $\bar{x} = \frac{\sum x}{n}$

where $\sum x$ is the sum of all values, N is the population size, n is the sample size, μ is the population mean, and $\bar{x}$ is the sample mean.
Example 3-1
Table 3.1 lists the total cash donations (rounded to millions of dollars) given by eight U.S. companies during the year 2016.

[Table 3.1 Cash Donations in 2016 by Eight U.S. Companies]

Find the mean of cash donations made by these eight companies.
Example 3-1: Solution

$\sum x = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 = 319 + 199 + 110 + 63 + 21 + 315 + 26 + 63 = 1116$

$\bar{x} = \frac{\sum x}{n} = \frac{1116}{8} = 139.5$ million dollars

Thus, these eight companies donated an average of $139.5 million in 2016 for charitable purposes.
Example 3-2
The following are the ages (in years) of all eight employees of a
small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.

Example 3-2: Solution
The population mean is

$\mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25$ years

Thus, the mean age of all eight employees of this company is 45.25 years, or 45 years and 3 months.

Reconsider Example 3-2. If we take a sample of 3 employees from this company and calculate the mean age of those 3 employees, this mean will be denoted by $\bar{x}$. Suppose the three values included in the sample are 32, 39, and 57. Then, the mean age for this sample is

$\bar{x} = (32 + 39 + 57)/3 = 42.67$ yrs

If we take a second sample of 3 employees of this company, the value of $\bar{x}$ will (most likely) be different. Suppose the second sample includes the values 53, 27, and 44. Then, the mean age for this sample is

$\bar{x} = (53 + 27 + 44)/3 = 41.33$ yrs

Thus, the value of the population mean is constant, whereas the value of the sample mean varies from sample to sample. The value of $\bar{x}$ for a particular sample depends on which values of the population are included in that sample.

A major shortcoming of the mean as a measure of central tendency is that it is very sensitive to outliers.
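A minimal Python sketch of the mean formula, checked against Examples 3-1 and 3-2 and the two samples of three employees. The helper name `mean` is our own; one function serves for both μ and $\bar{x}$, since the two formulas differ only in whether the data are treated as a population or a sample.

```python
def mean(values):
    """Return the sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Example 3-1: cash donations (millions of dollars) of eight companies
donations = [319, 199, 110, 63, 21, 315, 26, 63]
print(mean(donations))             # 139.5

# Example 3-2: ages of all eight employees, treated as a population (mu)
ages = [53, 32, 61, 27, 39, 44, 49, 57]
print(mean(ages))                  # 45.25

# Two samples of 3 employees give two different sample means (x-bar)
for sample in ([32, 39, 57], [53, 27, 44]):
    print(round(mean(sample), 2))  # 42.67, then 41.33
```

Because the two sample means differ from each other and from μ = 45.25, the output also illustrates that $\bar{x}$ varies from sample to sample.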
Example 3-3
[Table 3.2 Number of Homes Foreclosed in 2010]
Example 3-3
Note that the number of homes foreclosed in California is
very large compared to those in the other six states.
Hence, it is an outlier. Show how the inclusion of this outlier
affects the value of the mean.

Example 3-3: Solution
If we do not include the number of homes foreclosed in California (the outlier), the mean of the number of foreclosed homes in the six states is

Mean without the outlier $= \frac{49{,}723 + 20{,}352 + 10{,}824 + 40{,}911 + 18{,}038 + 61{,}848}{6} = \frac{201{,}696}{6} = 33{,}616$
Example 3-3: Solution
Now, to see the impact of the outlier on the value of the mean, we include the number of homes foreclosed in California and find the mean number of homes foreclosed in the seven states. This mean is

Mean with the outlier $= \frac{173{,}175 + 49{,}723 + 20{,}352 + 10{,}824 + 40{,}911 + 18{,}038 + 61{,}848}{7} = \frac{374{,}871}{7} = 53{,}553$

Thus, including the number of homes foreclosed in California increases the mean by about 60%, from 33,616 to 53,553.

Remember that the mean is not always the best measure of central tendency because it is heavily influenced by outliers.

Sometimes other measures of central tendency give a more accurate impression of a data set.

For example, when a data set has outliers, instead of using the mean, we can use the median as a measure of central tendency.
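The effect of the California outlier in Example 3-3 can be reproduced with the standard-library `statistics.mean`. This is a sketch with illustrative variable names; the exact increase works out to about 59.3%, which the text rounds to about 60%.

```python
from statistics import mean

# Example 3-3: number of homes foreclosed in 2010 (Table 3.2)
california = 173_175                        # the outlier
others = [49_723, 20_352, 10_824, 40_911, 18_038, 61_848]

mean_without = mean(others)                 # six states
mean_with = mean([california] + others)     # seven states
pct_increase = 100 * (mean_with - mean_without) / mean_without

print(f"mean without California: {mean_without:,.0f}")   # 33,616
print(f"mean with California:    {mean_with:,.0f}")      # 53,553
print(f"increase: {pct_increase:.1f}%")                  # 59.3%
```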
Median
The median is the value of the middle term in a data set that has been ranked in increasing order; that is, it divides a ranked data set into two equal parts.

The calculation of the median consists of the following two steps:

1. Rank the data set in increasing order.
2. Find the middle term. The value of this term is the median.

Note that if the number of observations in a data set is odd, the median is given by the value of the middle term in the ranked data.

However, if the number of observations is even, the median is given by the average of the values of the two middle terms.
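The two-step procedure, including the odd/even rule, can be sketched in Python as follows and checked against Examples 3-4 and 3-5. The function name `median` is our own; the standard-library `statistics.median` follows the same odd/even rule and could be used instead.

```python
def median(values):
    """Median by the two steps: rank the data, then take the middle term(s)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ranked = sorted(values)                      # Step 1: rank in increasing order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                               # odd n: the single middle term
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: average of the two middle terms


homes = [173_175, 49_723, 20_352, 10_824, 40_911, 18_038, 61_848]   # Example 3-4
ceo_pay = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0,
           28.2, 32.6, 32.9, 70.1, 76.1, 84.5]                      # Example 3-5

print(median(homes))               # 40911
print(round(median(ceo_pay), 2))   # 28.1
```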
Example 3-4
Refer to the data on the number of homes foreclosed in 7 states given in Table 3.2 of Example 3-3. Those values are listed below.

173,175 49,723 20,352 10,824 40,911 18,038 61,848

Find the median for these data.
Example 3-4: Solution
First, we rank the given data in increasing order as follows:
10,824 18,038 20,352 40,911 49,723 61,848 173,175

Because there are 7 values in this data set, the middle term is the fourth term, and the median is given by the value of the 4th term in the ranked data.

Thus, the median number of homes foreclosed in these seven states was 40,911 in 2010.
Example 3-5
Table 3.3 gives the total compensations (in millions of dollars) for the year 2010 of the 12 highest-paid CEOs of U.S. companies.

[Table 3.3 Total Compensations of 12 Highest-Paid CEOs for the Year 2010]

Find the median for these data.
Example 3-5: Solution
First, we rank the given total compensations of the 12 CEOs as follows:

21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5

There are 12 values in this data set. Because there is an even number of values in the data set, the median is given by the average of the two middle values.
Example 3-5: Solution
The two middle values are the sixth and seventh in the ranked data, and these two values are 28.0 and 28.2.

Median $= \frac{28.0 + 28.2}{2} = \frac{56.2}{2} = 28.1$ million dollars

Thus, the median for the 2010 compensations of these 12 CEOs is $28.1 million.
Median

The median gives the center of a histogram, with half the data values to the left of the median and half to the right of the median.

The advantage of using the median as a measure of central tendency is that it is not influenced by outliers.

Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.
Mode

In statistics, the mode represents the most common value in a data set.

The mode is the value that occurs with the highest frequency in a data set.
Example 3-6
 The following data give the speeds (in miles per hour) of
eight cars that were stopped on NH-95 for speeding
violations.

77 82 74 81 79 84 74 78

Find the mode.

Example 3-6: Solution
 In this data set, 74 occurs twice and each of the remaining
values occurs only once. Because 74 occurs with the highest
frequency, it is the mode. Therefore,

Mode = 74 miles per hour

Mode
A major shortcoming of the mode is that a data set may have no mode or more than one mode, whereas it will have only one mean and only one median.

• No mode: A data set in which each value occurs only once.
• Unimodal: A data set with only one mode.
• Bimodal: A data set with two modes.
• Multimodal: A data set with more than two modes.
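A sketch of the mode that returns every most-frequent value, so it can report no mode, one mode, or several, and that also works on qualitative data. It assumes the convention stated above that a data set in which each value occurs only once has no mode; `modes` and `modality` are hypothetical helpers, not library functions. (The standard-library `statistics.multimode` instead returns every value when all values are unique.) The data are those of Examples 3-6 through 3-10.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, in sorted order.

    A data set in which every value occurs only once has no mode,
    so an empty list is returned in that case.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


def modality(values):
    """Label a data set by how many modes it has."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")


speeds = [77, 82, 74, 81, 79, 84, 74, 78]                          # Example 3-6
incomes = [76_150, 95_750, 124_985, 87_490, 53_740]                # Example 3-7
commutes = [23, 36, 12, 23, 47, 32, 8, 12, 26, 31, 18, 28]         # Example 3-8
student_ages = [21, 19, 27, 22, 29, 19, 25, 21, 22, 30]            # Example 3-9
standings = ["senior", "sophomore", "senior", "junior", "senior"]  # Example 3-10

for data in (speeds, incomes, commutes, student_ages, standings):
    print(modes(data), modality(data))
# [74] unimodal
# [] no mode
# [12, 23] bimodal
# [19, 21, 22] multimodal
# ['senior'] unimodal
```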
Example 3-7 (Data set with no mode)
Last year’s incomes of five randomly selected families were $76,150, $95,750, $124,985, $87,490, and $53,740.

Find the mode.
Example 3-7: Solution
 Because each value in this data set occurs only once, this data
set contains no mode.

Example 3-8 (Data set with two modes)
A small company has 12 employees. Their commuting times
(rounded to the nearest minute) from home to work are 23,
36, 12, 23, 47, 32, 8, 12, 26, 31, 18, and 28, respectively.

Find the mode for these data.

Example 3-8: Solution
In the given data on the commuting times of the 12
employees, each of the values 12 and 23 occurs twice, and
each of the remaining values occurs only once. Therefore,
this data set has two modes: 12 and 23 minutes.
Example 3-9 (Data set with three modes)
The ages of 10 randomly selected students from a class are 21,
19, 27, 22, 29, 19, 25, 21, 22 and 30 years, respectively.

Find the mode.

Example 3-9: Solution
This data set has three modes: 19, 21 and 22. Each of these
three values occurs with a (highest) frequency of 2.

Mode
One advantage of the mode is that it can be calculated for both kinds of data (quantitative and qualitative), whereas the mean and median can be calculated only for quantitative data.
Example 3-10

The class standings of five students who are members of the student senate at a college are senior, sophomore, senior, junior, and senior, respectively. Find the mode.
Example 3-10: Solution
 Because senior occurs more frequently than the other
categories, it is the mode for this data set. We cannot
calculate the mean and median for this data set.

To sum up, we cannot say for sure which of the three measures of central tendency is best overall. Each of them may be better in different situations.

The mean is probably the most-used measure of central tendency, followed by the median.

The mean has the advantage that its calculation includes each value of the data set.

The median is a better measure when a data set includes outliers.

The mode is simple to locate, but it is not of much use in practical applications.

Relationships Among the Mean, Median, and Mode
[Figure 3.2 Mean, median, and mode for a symmetric histogram and frequency distribution curve.]

For a symmetric histogram and frequency distribution with one peak, the values of the mean, median, and mode are identical, and they lie at the center of the distribution.
Relationships Among the Mean, Median, and Mode
[Figure 3.3 Mean, median, and mode for a histogram and frequency distribution curve skewed to the right.]

For a histogram and a frequency distribution curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two. (Notice that the mode always occurs at the peak point.) The value of the mean is the largest in this case because it is sensitive to outliers that occur in the right tail. These outliers pull the mean to the right.
Relationships Among the Mean, Median, and Mode
[Figure 3.4 Mean, median, and mode for a histogram and frequency distribution curve skewed to the left.]

If a histogram and a frequency distribution curve are skewed to the left, the value of the mean is the smallest and that of the mode is the largest, with the value of the median lying between these two. In this case, the outliers in the left tail pull the mean to the left.

The measures of central tendency, such as the mean, median, and mode, do not reveal the whole picture of the distribution of a data set.

Two data sets with the same mean may have completely different spreads. The variation/spread among the values of observations for one data set may be much larger or smaller than for the other data set.

Consider the following two data sets on the ages (in years) of all workers working for each of two small companies.

Company 1: 47 38 35 40 36 45 39
Company 2: 70 33 18 52 27

The mean age of workers in both these companies is the same, 40 years.

If we do not know the ages of individual workers at these two companies and are told only that the mean age of the workers at both companies is the same, we may deduce that the workers at these two companies have a similar age distribution.

However, the variation in the workers’ ages for each of these two companies is very different.

If we look carefully, the ages of the workers at the second company have a much larger variation than the ages of the workers at the first company.

Thus, the mean, median, or mode by itself is usually not a sufficient measure to reveal the shape of the distribution of a data set.

We also need a measure that can provide some information about the variation among data values.

The measures that help us learn about the spread of a data set are called the measures of dispersion.

The measures of central tendency and dispersion taken together give a better picture of a data set than the measures of central tendency alone.
Measures of Dispersion for
Ungrouped Data
This section discusses the following topics:

• Range
• Variance and Standard Deviation
• Population Parameters and Sample Statistics
Range
Finding the Range for Ungrouped Data

The range is the simplest measure of dispersion.

Range = Largest value – Smallest value
Example 3-11

Table 3.4 gives the total areas in square miles of the four western South-Central states of the United States.

Find the range for this data set.

[Table 3.4]
Example 3-11: Solution

Range = Largest value – Smallest value
= 267,277 – 49,651
= 217,626 square miles

Thus, the total areas of these four states are spread over a range of 217,626 square miles.
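A short sketch of the range formula. To illustrate why a measure of spread is needed at all, it is applied to the two companies' worker ages listed earlier: both have a mean of 40 years, but their ranges differ sharply. The name `data_range` is our own, chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


company1 = [47, 38, 35, 40, 36, 45, 39]
company2 = [70, 33, 18, 52, 27]

for name, ages in (("Company 1", company1), ("Company 2", company2)):
    average = sum(ages) / len(ages)
    print(name, average, data_range(ages))
# Company 1 40.0 12
# Company 2 40.0 52
```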
Range
Disadvantages

The range, like the mean, has the disadvantage of being influenced by outliers. In Example 3-11, if the state of Texas with a total area of 267,277 square miles is dropped, the range decreases from 217,626 square miles to 20,252 square miles. Consequently, the range is not a good measure of dispersion to use for a data set that contains outliers.

Its calculation is based on two values only: the largest and the smallest. All other values in a data set are ignored when calculating the range. Thus, the range is not a very satisfactory measure of dispersion.
Variance and Standard Deviation

The standard deviation is the most-used measure of dispersion.

The value of the standard deviation tells how closely the values of a data set are clustered around the mean.

In general, a lower value of the standard deviation for a data set indicates that the values of that data set are spread over a relatively smaller range around the mean.

In contrast, a larger value of the standard deviation for a data set indicates that the values of that data set are spread over a relatively larger range around the mean.
Variance and Standard Deviation
The variance calculated for population data is denoted by σ², and the variance calculated for sample data is denoted by s².

The standard deviation calculated for population data is denoted by σ, and the standard deviation calculated for sample data is denoted by s.
Variance and Standard Deviation
Basic Formulas for the Variance and Standard Deviation for Ungrouped Data

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ and $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ and $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

where σ² is the population variance, s² is the sample variance, σ is the population standard deviation, and s is the sample standard deviation.
[Table 3.5 Mid-term scores of a sample of 4 students]
Variance and Standard Deviation
Short-cut Formulas for the Variance and Standard Deviation for Ungrouped Data

$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ and $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$ and $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$

where σ² is the population variance, s² is the sample variance, σ is the population standard deviation, and s is the sample standard deviation.
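A sketch comparing the basic and short-cut formulas on the Example 3-13 earnings, plus the short-cut formula applied directly to the totals reported in Example 3-12. The function names are hypothetical; `sample=True` switches the divisor from N to n - 1. The two formulas are algebraically identical, although with very large data values the short-cut version can lose precision in floating-point arithmetic.

```python
import math


def variance_basic(values, sample=False):
    """Basic formula: sum of squared deviations from the mean, over N or n - 1."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def variance_from_sums(sum_x, sum_x2, n, sample=False):
    """Short-cut formula from the totals: (sum_x2 - sum_x**2 / n) over N or n - 1."""
    return (sum_x2 - sum_x ** 2 / n) / (n - 1 if sample else n)


def variance_shortcut(values, sample=False):
    """Short-cut formula computed from the raw values."""
    return variance_from_sums(sum(values), sum(x * x for x in values), len(values), sample)


# Example 3-13: 2011 earnings (thousands of dollars) of all six employees -> population
earnings = [88.50, 108.40, 65.50, 52.50, 79.80, 54.60]
print(round(variance_basic(earnings), 2), round(variance_shortcut(earnings), 2))  # 388.9 388.9
print(round(math.sqrt(variance_basic(earnings)), 3))                                # 19.721

# Example 3-12 reports only the totals, which is all the short-cut formula needs (sample)
s2 = variance_from_sums(2_854, 1_746_098, 6, sample=True)
print(round(s2, 4), round(math.sqrt(s2), 2))   # 77709.0667 278.76

# A data set with no variation has zero variance
print(variance_basic([35, 35, 35, 35]))        # 0.0
```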
Example 3-12
Until about 2009, airline passengers were not charged for checked
baggage. Around 2009, however, many U.S. airlines started charging
a fee for bags. According to the Bureau of Transportation Statistics,
U.S. airlines collected more than $3 billion in baggage fee revenue in
2010. The following table lists the baggage fee revenues of 6 U.S.
airlines for the year 2010.

Find the variance and standard deviation for these data.

[Table: 2010 baggage fee revenues of 6 U.S. airlines]
Example 3-12: Solution
Let x denote the 2010 baggage fee revenue (in millions of dollars) of an airline. The values of Σx and Σx² are calculated in Table 3.6.

[Table 3.6]
Example 3-12: Solution
Step 1. Calculate Σx
The sum of the values in the first column of Table 3.6 gives Σx = 2,854.

Step 2. Find Σx²
Squaring each value gives the second column of Table 3.6, whose sum is Σx² = 1,746,098.
Example 3-12: Solution
Step 3. Determine the variance

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1{,}746{,}098 - \frac{(2{,}854)^2}{6}}{6 - 1} = \frac{1{,}746{,}098 - 1{,}357{,}552.667}{5} = 77{,}709.06666$
Example 3-12: Solution
Step 4. Obtain the standard deviation
The standard deviation is obtained by taking the (positive) square root of the variance:

$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{77{,}709.06666} = 278.7634601$ million dollars

Thus, the standard deviation of the 2010 baggage fee revenues of these six airlines is $278.76 million.

The variance and standard deviation are never negative. They are positive whenever the values in a data set differ, and if a data set has no variation, both are zero.

For example, if four persons in a group are the same age, say 35 years, then the four values in the data set are

35 35 35 35

If we calculate the variance and standard deviation for these data, their values are zero, because there is no variation in the values of this data set.
Example 3-13
Following are the 2011 earnings (in thousands of dollars)
before taxes for all 6 employees of a small company.

88.50 108.40 65.50 52.50 79.80 54.60

Calculate the variance and standard deviation for these data.

Example 3-13: Solution
Let x denote the 2011 earnings before taxes of an employee of this company. The values of Σx and Σx² are calculated in Table 3.7.

[Table 3.7]
Example 3-13: Solution

$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{35{,}978.51 - \frac{(449.30)^2}{6}}{6} = 388.90$

$\sigma = \sqrt{388.90} = 19.721$ thousand dollars, or about 19,721 dollars

Thus, the standard deviation of the 2011 earnings of all six employees of this company is $19,721.
Population Parameters and Sample Statistics
A numerical measure such as the mean, median, mode, range, variance, or standard deviation calculated for a population data set is called a population parameter, or simply a parameter.

Thus, μ and σ are population parameters.

A summary measure calculated for a sample data set is called a sample statistic, or simply a statistic.

Thus, $\bar{x}$ and s are sample statistics.
MEAN, VARIANCE AND STANDARD DEVIATION FOR GROUPED DATA
• Mean for Grouped Data
• Variance and Standard Deviation for Grouped Data
Mean for Grouped Data
Calculating Mean for Grouped Data

Mean for population data: $\mu = \frac{\sum mf}{N}$

Mean for sample data: $\bar{x} = \frac{\sum mf}{n}$

where m is the midpoint and f is the frequency of a class.
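A sketch of the grouped-data mean, Σmf divided by the total frequency. Because Tables 3.8 and 3.10 appear only as images, it is demonstrated on a small hypothetical frequency table rather than on the slides' data; the function name is our own.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(m * f) / sum(f)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("need one frequency per class midpoint")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the frequencies add up to zero")
    return sum(m * f for m, f in zip(midpoints, frequencies)) / total


# Hypothetical classes 0-10, 10-20, 20-30 (not a table from the slides)
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print(grouped_mean(midpoints, frequencies))   # 16.0
```

The same function gives μ or $\bar{x}$; only the interpretation (population or sample) changes.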
Example 3-14
Table 3.8 gives the frequency distribution of the daily
commuting times (in minutes) from home to work for all 25
employees of a company.

Calculate the mean of the daily commuting times.

[Table 3.8]

Example 3-14: Solution

$\mu = \frac{\sum mf}{N} = \frac{535}{25} = 21.40$ minutes

Thus, the employees of this company spend an average of 21.40 minutes a day commuting from home to work.
Example 3-15
Table 3.10 gives the frequency distribution of the number of
orders received each day during the past 50 days at the office
of a mail-order company.

Calculate the mean.

[Table 3.10]

Example 3-15: Solution

$\bar{x} = \frac{\sum mf}{n} = \frac{832}{50} = 16.64$ orders

Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
Variance and Standard Deviation for
Grouped Data
Basic Formulas for the Variance and Standard Deviation for Grouped Data

$\sigma^2 = \frac{\sum f (m - \mu)^2}{N}$ and $s^2 = \frac{\sum f (m - \bar{x})^2}{n - 1}$

where σ² is the population variance, s² is the sample variance, and m is the midpoint of a class. In either case, the standard deviation is obtained by taking the positive square root of the variance.
Variance and Standard Deviation for
Grouped Data
Short-Cut Formulas for the Variance and Standard Deviation for Grouped Data

$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N}$ and $s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1}$

where σ² is the population variance, s² is the sample variance, and m is the midpoint of a class.
Variance and Standard Deviation for
Grouped Data
Short-cut Formulas for the Variance and Standard Deviation for Grouped Data

The standard deviation is obtained by taking the positive square root of the variance.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$
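A sketch of the grouped short-cut formula. `grouped_variance_from_sums` takes the totals used in Examples 3-16 and 3-17, while `grouped_variance` builds those totals from a frequency table; the last line uses a small hypothetical table (midpoints 5, 15, 25 with frequencies 2, 5, 3). All names are our own.

```python
import math


def grouped_variance_from_sums(sum_mf, sum_m2f, n, sample=False):
    """Short-cut formula for grouped data: (sum_m2f - sum_mf**2 / n) over N or n - 1."""
    return (sum_m2f - sum_mf ** 2 / n) / (n - 1 if sample else n)


def grouped_variance(midpoints, frequencies, sample=False):
    """Build sum(mf) and sum(m^2 f) from a frequency table, then apply the short-cut formula."""
    n = sum(frequencies)
    sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))
    sum_m2f = sum(m * m * f for m, f in zip(midpoints, frequencies))
    return grouped_variance_from_sums(sum_mf, sum_m2f, n, sample)


# Example 3-16 (population, N = 25): sum(mf) = 535, sum(m^2 f) = 14,825
var_pop = grouped_variance_from_sums(535, 14_825, 25)
print(round(var_pop, 2), round(math.sqrt(var_pop), 2))          # 135.04 11.62

# Example 3-17 (sample, n = 50): sum(mf) = 832, sum(m^2 f) = 14,216
var_sample = grouped_variance_from_sums(832, 14_216, 50, sample=True)
print(round(var_sample, 4), round(math.sqrt(var_sample), 2))    # 7.582 2.75

# Hypothetical table (not from the slides)
print(grouped_variance([5, 15, 25], [2, 5, 3]))                 # 49.0
```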
Example 3-16
The following data, reproduced from Table 3.8 of Example 3-14,
give the frequency distribution of the daily commuting times (in
minutes) from home to work for all 25 employees of a company.

Calculate the variance and standard deviation.

Example 3-16: Solution

$\sigma^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{N}}{N} = \frac{14{,}825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04$

$\sigma = \sqrt{\sigma^2} = \sqrt{135.04} = 11.62$ minutes

Thus, the standard deviation of the daily commuting times for these employees is 11.62 minutes.
Example 3-17
The following data, reproduced from Table 3.10 of Example 3-15, give the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company.

Calculate the variance and standard deviation.

Example 3-17: Solution

$s^2 = \frac{\sum m^2 f - \frac{(\sum mf)^2}{n}}{n - 1} = \frac{14{,}216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$

$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$ orders

Thus, the standard deviation of the number of orders received at the office of this mail-order company during the past 50 days is 2.75 orders.
USE OF STANDARD DEVIATION
• Chebyshev’s Theorem
• Empirical Rule
Chebyshev’s Theorem

For any number k greater than 1, at least (1 – 1/k²) of the data values lie within k standard deviations of the mean.

• Applies to any distribution, regardless of shape.
• Places lower limits on the percentages of observations within a given number of standard deviations from the mean.
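Chebyshev’s Theorem
At least $\left(1 - \frac{1}{k^2}\right)$ of the elements of any distribution lie within k standard deviations of the mean.

k = 2: at least $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$ lie within 2 standard deviations of the mean
k = 3: at least $1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 89\%$ lie within 3 standard deviations of the mean
k = 4: at least $1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} \approx 94\%$ lie within 4 standard deviations of the mean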
[Figure 3.5 Chebyshev’s theorem.]

[Figure 3.6 Percentage of values within two standard deviations of the mean for Chebyshev’s theorem.]

[Figure 3.7 Percentage of values within three standard deviations of the mean for Chebyshev’s theorem.]
Example 3-18
The average systolic blood pressure for 4000 women who were screened for high blood pressure was found to be 187 mm Hg, with a standard deviation of 22 mm Hg. Using Chebyshev’s theorem, find the minimum percentage of women in this group who have a systolic blood pressure between 143 and 231 mm Hg.
Example 3-18: Solution
Let μ and σ be the mean and the standard deviation, respectively, of the systolic blood pressures of these women.
μ = 187 and σ = 22
Example 3-18: Solution
The value of k is obtained by dividing the distance between the mean and each point by the standard deviation. Each of the two points, 143 and 231, is 44 units away from the mean (187 – 143 = 231 – 187 = 44). Thus

k = 44/22 = 2

$1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = 1 - \frac{1}{4} = 1 - .25 = .75$, or 75%

Hence, according to Chebyshev's theorem, at least 75% of the women have systolic blood pressure between 143 and 231 mm Hg. This percentage is shown in Figure 3.8.
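A sketch of Chebyshev's bound, reproducing the 75%, 89%, and 94% figures and the k = 2 result of Example 3-18. The helper names are hypothetical; `chebyshev_for_interval` assumes the interval is centred on the mean, as it is in this example.

```python
import math


def chebyshev_min_fraction(k):
    """Chebyshev's lower bound 1 - 1/k**2 on the share of values within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def chebyshev_for_interval(mean, sd, lower, upper):
    """Return k and the bound for an interval that is symmetric about the mean."""
    if not math.isclose(mean - lower, upper - mean):
        raise ValueError("the interval must be centred on the mean")
    k = (upper - mean) / sd
    return k, chebyshev_min_fraction(k)


for k in (2, 3, 4):
    print(k, f"{chebyshev_min_fraction(k):.0%}")      # 2 75%, 3 89%, 4 94%

k, bound = chebyshev_for_interval(187, 22, 143, 231)  # Example 3-18
print(k, bound)                                       # 2.0 0.75
```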
[Figure 3.8 Percentage of women with systolic blood pressure between 143 and 231 mm Hg.]
Empirical Rule

• Applies only to bell-shaped distributions.
• Specifies approximate percentages of observations within a given number of standard deviations from the mean.
Empirical Rule
For a bell-shaped distribution, approximately

• 68% of the observations lie within 1 standard deviation of the mean
• 95% of the observations lie within 2 standard deviations of the mean
• 99.7% of the observations lie within 3 standard deviations of the mean

[Figure 3.9 Illustration of the empirical rule.]
Example 3-19
The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.
Example 3-19: Solution
From the given information, for this distribution, $\bar{x}$ = 40 and s = 12 years.

Each of the two points, 16 and 64, is 24 units away from the mean, which is 24/12 = 2 standard deviations.

Because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the people in the sample are 16 to 64 years old.
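A sketch of the empirical rule as a lookup of its three approximate percentages, applied to Example 3-19. The dictionary and function names are our own. Note the contrast with Chebyshev's theorem: for the same k = 2, Chebyshev guarantees only at least 75%, while the 95% figure relies on the distribution being bell-shaped.

```python
# Approximate shares stated by the empirical rule for bell-shaped distributions
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd, approximate share of observations)."""
    if k not in EMPIRICAL_RULE:
        raise ValueError("the rule is stated only for k = 1, 2, 3")
    return mean - k * sd, mean + k * sd, EMPIRICAL_RULE[k]


low, high, share = empirical_interval(40, 12, 2)   # Example 3-19
print(low, high, f"{share:.0%}")                   # 16 64 95%
```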
[Figure 3.10 Percentage of people who are 16 to 64 years old.]
MEASURES OF POSITION
A measure of position determines the position of a single value in relation to other values in a sample or a population data set.

• Quartiles
• Interquartile Range
• Percentiles
Quartiles and Interquartile Range
Quartiles are the summary measures that divide a ranked data set into four equal parts.

The second quartile is the same as the median of a data set.

The first quartile is the value of the middle term among the observations that are less than the median, and the third quartile is the value of the middle term among the observations that are greater than the median.
[Figure 3.11 Quartiles.]
Quartiles and Interquartile Range
Calculating Interquartile Range
The difference between the third and the first quartiles gives the interquartile range:

IQR = Interquartile range = Q3 – Q1
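A sketch of the quartile rule described above: Q2 is the median, and Q1 and Q3 are the medians of the terms below and above the median position (when the number of observations is odd, the middle term is left out of both halves). It reproduces the quartiles used in Examples 3-20, 3-21, and 3-24. The helper names are our own; library routines such as `statistics.quantiles` use interpolation rules that can give different values for the same data.

```python
def _median(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3): Q2 is the median, Q1 and Q3 are the medians of the
    terms below and above the median position (middle term excluded if n is odd)."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    lower = xs[: n // 2]
    upper = xs[n // 2 + n % 2 :]
    return _median(lower), _median(xs), _median(upper)


ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]           # Example 3-21
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3, q3 - q1)                            # 30.5 37 49.0 18.5

ceo_pay = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0,
           28.2, 32.6, 32.9, 70.1, 76.1, 84.5]        # Example 3-20, $ millions
q1, q2, q3 = quartiles(ceo_pay)
print(round(q1, 2), round(q2, 2), round(q3, 2), round(q3 - q1, 2))  # 24.05 28.1 51.5 27.45
```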
Example 3-20
Table 3.3 in Example 3-5 gave the total compensations (in
millions of dollars) for the year 2010 of the 12 highest-paid
CEOs of U.S. companies. That table is reproduced on the next
slide.

(a) Find the values of the three quartiles. Where does the total
compensation of Michael D. White (CEO of DirecTV) fall in
relation to these quartiles?

(b) Find the interquartile range.

[Table 3.3 Total Compensations of 12 Highest-Paid CEOs for the Year 2010]
Example 3-20: Solution
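(a) Ranking the 12 total compensations gives

21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5

Q1 = (22.9 + 25.2)/2 = 24.05, Q2 = (28.0 + 28.2)/2 = 28.1, and Q3 = (32.9 + 70.1)/2 = 51.5 (millions of dollars).

By looking at the position of $32.9 million (total compensation of Michael D. White, CEO of DirecTV), we can state that this value lies in the bottom 75% of the 2010 total compensations. This value falls between the second and third quartiles.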
Example 3-20: Solution
(b) The interquartile range is given by the difference between the values of the third and first quartiles. Thus

IQR = Interquartile range = Q3 – Q1
= 51.5 – 24.05 = $27.45 million
Example 3-21
The following are the ages (in years) of nine employees of an insurance company:

47 28 39 51 33 37 59 24 33

(a) Find the values of the three quartiles. Where does the age of 28 years fall in relation to the ages of the employees?

(b) Find the interquartile range.
Example 3-21: Solution
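(a) Ranking the ages gives

24 28 33 33 37 39 47 51 59

Q2 = 37 (the 5th term), Q1 = (28 + 33)/2 = 30.5 (the middle of the four ages below the median), and Q3 = (47 + 51)/2 = 49 years (the middle of the four ages above the median).

The age of 28 falls in the lowest 25% of the ages.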
Example 3-21: Solution
(b) The interquartile range is
IQR = Interquartile range = Q3 – Q1
= 49 – 30.5
= 18.5 years

Percentiles
Percentiles are the summary measures that divide a ranked data set into 100 equal parts.

Each (ranked) data set has 99 percentiles that divide it into 100 equal parts.
Percentiles and Percentile Rank
Calculating Percentiles
The (approximate) value of the kth percentile, denoted by $P_k$, is

$P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set}$

where k denotes the number of the percentile and n represents the sample size.
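A sketch of the approximate percentile rule. The text only shows 7.20 being taken as the 7th term, so rounding a fractional position to the nearest term (and keeping it between 1 and n) is our assumption; the function name is hypothetical.

```python
import math


def percentile_approx(values, k):
    """Approximate kth percentile: the value of the (kn/100)th term of the ranked data.

    Assumption: a fractional position is rounded to the nearest term and
    kept within 1..n.
    """
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    xs = sorted(values)
    n = len(xs)
    position = k * n / 100
    term = min(max(math.floor(position + 0.5), 1), n)   # 1-based term number
    return xs[term - 1]


ceo_pay = [21.6, 21.7, 22.9, 25.2, 26.5, 28.0,
           28.2, 32.6, 32.9, 70.1, 76.1, 84.5]   # Example 3-22, $ millions
print(percentile_approx(ceo_pay, 60))            # 28.2
```

Because the rule is approximate, `percentile_approx(ceo_pay, 50)` returns the 6th term, 28.0, rather than the median of 28.1.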
Example 3-22
 Refer to the data on total compensations (in millions of
dollars) for the year 2010 of the 12 highest-paid CEOs of U.S.
companies given in Example 3-20. Find the value of the 60th
percentile. Give a brief interpretation of the 60th percentile.

Example 3-22: Solution
The data arranged in increasing order are as follows:

21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5

The position of the 60th percentile is

$\frac{kn}{100} = \frac{(60)(12)}{100} = 7.20$th term $\approx$ 7th term
Example 3-22: Solution
The value of the 7.20th term can be approximated by the value of the 7th term in the ranked data. Therefore,

P60 = 60th percentile = 28.2 = $28.2 million

Thus, approximately 60% of these 12 CEOs had 2010 total compensations less than or equal to $28.2 million.
BOX-AND-WHISKER PLOT

A box-and-whisker plot gives a graphic presentation of data using five measures: the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.

A box-and-whisker plot can help us visualize the center, the spread, and the skewness of a data set.

It also helps detect outliers.

We can compare different distributions by making box-and-whisker plots for each of them.
Example 3-24
The following data are the incomes (in thousands of dollars) for a sample of 12 households.

75 69 84 112 74 104 81 90 94 144 79 98

Construct a box-and-whisker plot for these data.
Example 3-24: Solution
Step 1. First, rank the data in increasing order and calculate the values of the median, the first quartile, the third quartile, and the interquartile range. The ranked data are

69 74 75 79 81 84 90 94 98 104 112 144

Median = (84 + 90) / 2 = 87
Q1 = (75 + 79) / 2 = 77
Q3 = (98 + 104) / 2 = 101
IQR = Q3 – Q1 = 101 – 77 = 24
Example 3-24: Solution
Step 2. Find the points that are 1.5 × IQR below Q1 and 1.5 × IQR above Q3.

1.5 × IQR = 1.5 × 24 = 36
Lower inner fence = Q1 – 36 = 77 – 36 = 41
Upper inner fence = Q3 + 36 = 101 + 36 = 137
Example 3-24: Solution
Step 3. Determine the smallest and the largest values in the given data set within the two inner fences.

Smallest value within the two inner fences = 69
Largest value within the two inner fences = 112
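Example 3-24: Solution
Step 4. Draw a horizontal line and mark the income levels on it such that all the values in the given data set are covered. Above this line, draw a box with its left side at the first quartile (77) and its right side at the third quartile (101), and draw a vertical line inside the box at the median (87). The result of this step is shown in Figure 3.13.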
Example 3-24: Solution
Step 5. By drawing two lines, join the points of the smallest and the largest values within the two inner fences to the box. These values are 69 and 112 in this example. This completes the box-and-whisker plot, as shown in Figure 3.14.
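A sketch that computes the quantities the box-and-whisker steps need for Example 3-24: quartiles (by the same rule as the quartile slides), inner and outer fences, whisker ends, and flagged points. Treating values between an inner and an outer fence as suspected outliers and values beyond an outer fence as outliers follows the box-plot elements slide; the function and key names are our own.

```python
def _median(xs):
    """Median of an already-sorted list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2


def box_plot_summary(values):
    """Quartiles, fences, whisker ends and flagged points for a box-and-whisker plot."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = _median(xs[: n // 2])              # terms below the median position
    q2 = _median(xs)
    q3 = _median(xs[n // 2 + n % 2 :])      # terms above the median position
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    inside = [x for x in xs if inner[0] <= x <= inner[1]]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "inner_fences": inner, "outer_fences": outer,
        "whiskers": (inside[0], inside[-1]),
        "suspected_outliers": [x for x in xs
                               if outer[0] <= x < inner[0] or inner[1] < x <= outer[1]],
        "outliers": [x for x in xs if x < outer[0] or x > outer[1]],
    }


incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]   # Example 3-24
summary = box_plot_summary(incomes)
for key, value in summary.items():
    print(key, value)
```

For these incomes the whiskers end at 69 and 112, and 144 is flagged as a suspected outlier because it lies between the upper inner fence (137) and the upper outer fence (173).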
Box Plot
[Figure: Elements of a box plot. The box runs from Q1 to Q3 (the interquartile range) with the median inside; inner fences at Q1 – 1.5(IQR) and Q3 + 1.5(IQR); outer fences at Q1 – 3(IQR) and Q3 + 3(IQR); whiskers (X) end at the smallest and largest data points not beyond the inner fences; a point between an inner and an outer fence is a suspected outlier (*), and a point beyond an outer fence is an outlier (o).]

No ratings yet
Formula For Standard Error of The Mean: Standard Deviation / Sample Size
30 pages
Chapter-8 ODE PDF
100% (1)
Chapter-8 ODE PDF
56 pages
Orthogonal Trajectories: Xyc Fxyc
No ratings yet
Orthogonal Trajectories: Xyc Fxyc
14 pages
Experiment 111 Spherical Mirrors - Online
No ratings yet
Experiment 111 Spherical Mirrors - Online
5 pages
Fixed Point Iteration
100% (1)
Fixed Point Iteration
5 pages
De Word Problem Zill&Rainville
100% (1)
De Word Problem Zill&Rainville
34 pages
Poisson Distribution
No ratings yet
Poisson Distribution
13 pages
Caro PHYS101L (A12) Report 6
100% (1)
Caro PHYS101L (A12) Report 6
11 pages
Applications of Differentiation PDF
No ratings yet
Applications of Differentiation PDF
26 pages
DBA Maths
No ratings yet
DBA Maths
98 pages
STA 421 LNote
No ratings yet
STA 421 LNote
20 pages
3 Rectilinear Motion
No ratings yet
3 Rectilinear Motion
13 pages
FINAL PPT PR1!11!12 UNIT 1 LESSON 3 Qualitative and Quantitative Research
No ratings yet
FINAL PPT PR1!11!12 UNIT 1 LESSON 3 Qualitative and Quantitative Research
26 pages
EDA Notebook 3 Random Variables and Probability Distributions
No ratings yet
EDA Notebook 3 Random Variables and Probability Distributions
23 pages
Laboratory #1 - Dertermination of Individual's Pace Factor
No ratings yet
Laboratory #1 - Dertermination of Individual's Pace Factor
3 pages
Allpt
0% (1)
Allpt
26 pages
Test of Hypothesis
No ratings yet
Test of Hypothesis
3 pages
Combine Module
No ratings yet
Combine Module
97 pages
Work Sheet I For Accountants 2 PDF
No ratings yet
Work Sheet I For Accountants 2 PDF
4 pages
Espiritu Dianne Act10 Engdat1 Ebb3 PDF
No ratings yet
Espiritu Dianne Act10 Engdat1 Ebb3 PDF
7 pages
TOPIC 4 - Curve Fiiting and Interpolation
No ratings yet
TOPIC 4 - Curve Fiiting and Interpolation
23 pages
Single Maths B Probability & Statistics: Exercises & Solutions
No ratings yet
Single Maths B Probability & Statistics: Exercises & Solutions
18 pages
Discrete Random Variables and Probability Distributions: Presented By: Juanito S. Chan, PIE, ASEAN Engr. (AE 0490)
No ratings yet
Discrete Random Variables and Probability Distributions: Presented By: Juanito S. Chan, PIE, ASEAN Engr. (AE 0490)
53 pages
Probability and Statistics (IT302) 17 August 2020 Monday 09:45AM-10:15AM Class 6
No ratings yet
Probability and Statistics (IT302) 17 August 2020 Monday 09:45AM-10:15AM Class 6
23 pages
Body-Centered Cubic Problems
No ratings yet
Body-Centered Cubic Problems
8 pages
Statistics Chapter 10-12
No ratings yet
Statistics Chapter 10-12
11 pages
Poisson Distribution-1
No ratings yet
Poisson Distribution-1
7 pages
Mini
No ratings yet
Mini
28 pages
Differential Equations: Elementary Applications of Differential Equations of The First Order
No ratings yet
Differential Equations: Elementary Applications of Differential Equations of The First Order
34 pages
Module 2 - Equilibrium of Rigid Bodies
No ratings yet
Module 2 - Equilibrium of Rigid Bodies
21 pages
Review X DOT X Resultant of Forces
No ratings yet
Review X DOT X Resultant of Forces
44 pages
Hussain Et Al (2023)
No ratings yet
Hussain Et Al (2023)
17 pages
Kathmandu University Course: MATH 208, ENVE II/II Prepared By: Kiran Shrestha, Dr. Samir Shrestha Introduction To Statistical Quality Control
No ratings yet
Kathmandu University Course: MATH 208, ENVE II/II Prepared By: Kiran Shrestha, Dr. Samir Shrestha Introduction To Statistical Quality Control
8 pages
DLL Week 1.2 - Stat and Proba Q3
100% (1)
DLL Week 1.2 - Stat and Proba Q3
8 pages
M8. Eulers Method For ODE
No ratings yet
M8. Eulers Method For ODE
3 pages
Module 3 PDF
No ratings yet
Module 3 PDF
23 pages
Experiment 10 Kirchhoffs Law
No ratings yet
Experiment 10 Kirchhoffs Law
8 pages
Chapter 8 Random Variate Generation
No ratings yet
Chapter 8 Random Variate Generation
14 pages
Circle and Radius of Curvature PDF
No ratings yet
Circle and Radius of Curvature PDF
8 pages
Q2 Ans
No ratings yet
Q2 Ans
13 pages
Jaba Elisabeta
0% (1)
Jaba Elisabeta
7 pages
Phys114 Ps 1
No ratings yet
Phys114 Ps 1
11 pages
Secant Method
No ratings yet
Secant Method
3 pages
MATH 499 Homework 1
No ratings yet
MATH 499 Homework 1
5 pages
LR The Addition and Resolution of Vectors The Force Table
No ratings yet
LR The Addition and Resolution of Vectors The Force Table
6 pages
ProblemSet7 1
No ratings yet
ProblemSet7 1
7 pages
External Advertisement For Part Time Lecturers For Academic Year 2025 and 2026
No ratings yet
External Advertisement For Part Time Lecturers For Academic Year 2025 and 2026
36 pages
Annotated-Part20skittles 20project
No ratings yet
Annotated-Part20skittles 20project
2 pages
Question Bank For Fluid Mechanics Ii - 030410041152 - 1
100% (1)
Question Bank For Fluid Mechanics Ii - 030410041152 - 1
11 pages
Statistical Interpretation of Data - : Guide To
No ratings yet
Statistical Interpretation of Data - : Guide To
24 pages
Dissertation Using Multiple Regression
100% (3)
Dissertation Using Multiple Regression
8 pages
Nvidia Fundamentals of Deep Learning PPT 4
No ratings yet
Nvidia Fundamentals of Deep Learning PPT 4
19 pages
How To Analyze Data Using ANOVA in SPSS
No ratings yet
How To Analyze Data Using ANOVA in SPSS
8 pages
Reserch Project
No ratings yet
Reserch Project
43 pages
Introduction To Business Analytics - Lab Manual
No ratings yet
Introduction To Business Analytics - Lab Manual
8 pages
Factorial Dimensions of Employee Engagement in Public and Private Sector Banks
No ratings yet
Factorial Dimensions of Employee Engagement in Public and Private Sector Banks
5 pages
Youteaching in The New Normal: Effectiveness of Teacher-Made Youtube Video Lessons in Improving Students' Learning Performance On Random Variables
No ratings yet
Youteaching in The New Normal: Effectiveness of Teacher-Made Youtube Video Lessons in Improving Students' Learning Performance On Random Variables
16 pages
Psychologicaltesting (Practical)
No ratings yet
Psychologicaltesting (Practical)
2 pages
The Moderator-Baron
No ratings yet
The Moderator-Baron
19 pages
The Impact of CES Initiatives On Brand Equity of Southville International Schools and Colleges
No ratings yet
The Impact of CES Initiatives On Brand Equity of Southville International Schools and Colleges
47 pages
Seminar Handout
No ratings yet
Seminar Handout
7 pages
Clustering Today
No ratings yet
Clustering Today
52 pages
Examiners Report Pure Mathematics and Statistics
No ratings yet
Examiners Report Pure Mathematics and Statistics
25 pages
IFT Notes R05 Sampling and Estimation
No ratings yet
IFT Notes R05 Sampling and Estimation
16 pages
Local Media1949055759870428644
No ratings yet
Local Media1949055759870428644
6 pages
Globalization and Perceptions of Policy Maker Competence Evidence From France
No ratings yet
Globalization and Perceptions of Policy Maker Competence Evidence From France
14 pages
RCT+Appraisal+sheets 2005
No ratings yet
RCT+Appraisal+sheets 2005
3 pages
Camry Group Activity Bsba-Fm3a
No ratings yet
Camry Group Activity Bsba-Fm3a
12 pages
Second Project - Rics
No ratings yet
Second Project - Rics
6 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.