
Numerical Measures New

The document presents an overview of descriptive statistics, focusing on numerical measures such as location, variability, and distribution shape. It details various statistical measures including mean, median, mode, percentiles, and quartiles, with examples related to apartment rents. The content is structured to aid understanding of how to analyze and interpret data effectively.

Uploaded by XIAO LA
Copyright © All Rights Reserved

Statistics for Business and Economics
Anderson, Sweeney, Williams
Slides by John Loucks, St. Edward’s University

© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Descriptive Statistics: Numerical Measures
1. Measures of Location
2. Measures of Variability
3. Measures of Distribution Shape, Relative Location, and Detecting Outliers
4. Exploratory Data Analysis
5. Measures of Association Between Two Variables
6. The Weighted Mean and Working with Grouped Data

1. Measures of Location

• Mean
• Median
• Mode
• Percentiles
• Quartiles

If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.

1.1 Mean

• Perhaps the most important measure of location is the mean.
• The mean provides a measure of central location.
• The mean of a data set is the average of all the data values.
• The sample mean x̄ is the point estimator of the population mean μ.

Sample Mean x̄

$$\bar{x} = \frac{\sum x_i}{n}$$

where the numerator is the sum of the values of the n observations and n is the number of observations in the sample.

Population Mean μ

$$\mu = \frac{\sum x_i}{N}$$

where the numerator is the sum of the values of the N observations and N is the number of observations in the population.

Sample Mean

• Example: Apartment Rents

Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below.

445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

Sample Mean

• Example: Apartment Rents

$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$

445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
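As a cross-check of the arithmetic above, here is a minimal Python sketch. The list name `rents` and the helper `mean` are illustrative choices, not from the slides; the values are the same 70 rents written in ascending order, which does not change the mean.

```python
# The 70 monthly rents from the example (ascending order; order does not affect the mean).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

n = len(rents)         # number of observations in the sample
total = sum(rents)     # sum of the values
x_bar = mean(rents)    # sample mean
print(n, total, round(x_bar, 2))  # 70 34356 490.8
```

Applied to every value of a population, the same function gives the population mean μ; only the meaning of the divisor (n versus N) changes.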

1.2 Median

• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• Whenever a data set has extreme values, the median is the preferred measure of central location.
• The median is the measure of location most often reported for annual income and property value data.
• A few extremely large incomes or property values can inflate the mean.

Median

• For an odd number of observations:

26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)

The median is the middle value.

Median = 19

Median

• For an even number of observations:

26 18 27 12 14 27 30 19 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)

The median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
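Both cases fit in one small function. This is a sketch; the name `median` is ours. Python's standard library also offers `statistics.median`, which averages the two middle values when n is even, matching the rule above.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the middle two

odd_data = [26, 18, 27, 12, 14, 27, 19]
even_data = [26, 18, 27, 12, 14, 27, 30, 19]
print(median(odd_data))   # 19
print(median(even_data))  # 22.5
```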

Median

• Example: Apartment Rents

Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Trimmed Mean

• Another measure, sometimes used when extreme values are present, is the trimmed mean.
• It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
• For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
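A minimal sketch of the trimmed mean, applied to the apartment rents. The slides do not say what to do when the percentage does not correspond to a whole number of values; this sketch assumes the count removed from each end is rounded down. The function name `trimmed_mean` is hypothetical.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def trimmed_mean(values, proportion):
    """Mean after removing `proportion` of the sorted values from each end.

    Assumption (not stated on the slides): if proportion * n is not a whole
    number, the count removed from each end is rounded down.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

print(round(trimmed_mean(rents, 0.05), 2))  # 488.14
```

With 70 values, 5% is 3.5 values, so under this rounding rule three values are dropped from each end (425, 430, 430 and 600, 615, 615). The 64 remaining values average about 488.14, slightly below the untrimmed mean of 490.80.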

1.3 Mode

• The mode of a data set is the value that occurs with greatest frequency.
• The greatest frequency can occur at two or more different values.
• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.

Mode

• Example: Apartment Rents

450 occurred most frequently (7 times).
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
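Counting frequencies makes the mode, and the bimodal/multimodal cases, easy to check. A short sketch using `collections.Counter` from the standard library; the helper name `modes` is ours.

```python
from collections import Counter

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def modes(values):
    """Return (all values sharing the greatest frequency, that frequency)."""
    counts = Counter(values)
    top = max(counts.values())
    # More than one value in the list means the data are bimodal or multimodal.
    return sorted(v for v, c in counts.items() if c == top), top

print(modes(rents))  # ([450], 7)
```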

1.4 Percentiles

• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
• Admission test scores for colleges and universities are frequently reported in terms of percentiles.
• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

Percentiles

Arrange the data in ascending order.

Compute index i, the position of the pth percentile:

i = (p/100)n

If i is not an integer, round up. The pth percentile is the value in the ith position.

If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
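The three steps translate directly into code. This sketch follows the slides' rule with 1-based positions. Using `fractions.Fraction` so that the test "is i an integer?" is exact rather than subject to floating-point rounding is our implementation choice, and the sketch assumes 0 < p < 100. Note that other software (for example, spreadsheet percentile functions) may use different interpolation rules and give slightly different answers.

```python
import math
from fractions import Fraction

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(values, p):
    """pth percentile using the index rule i = (p/100)n (assumes 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)                  # step 1: ascending order
    i = Fraction(p) / 100 * len(ordered)      # step 2: exact index
    if i.denominator != 1:
        # i is not an integer: round up and take the value in position i
        return ordered[math.ceil(i) - 1]
    i = int(i)
    # i is an integer: average the values in positions i and i + 1
    return (ordered[i - 1] + ordered[i]) / 2

print(percentile(rents, 80), percentile(rents, 75))  # 542.0 525
```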

80th Percentile

• Example: Apartment Rents

i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.


80th Percentile

• Example: Apartment Rents

“At least 80% of the items take on a value of 542 or less.” (56/70 = .8 or 80%)
“At least 20% of the items take on a value of 542 or more.” (14/70 = .2 or 20%)
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

1.5 Quartiles

• Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile

Third Quartile

• Example: Apartment Rents

Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = 525 (the 53rd data value)
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
“At least 75% of the items take on a value of 525 or less.” (55/70 = 78.6%)
“At least 25% of the items take on a value of 525 or more.” (18/70 = 25.7%)
2. Measures of Variability

• It is often desirable to consider measures of variability (dispersion), as well as measures of location.
• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Location: Example

Measures of Variability

• 2.1 Range
• 2.2 Interquartile Range
• 2.3 Variance
• 2.4 Standard Deviation
• 2.5 Coefficient of Variation

2.1 Range

• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

Range

• Example: Apartment Rents

Range = largest value - smallest value
Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

2.2 Interquartile Range

• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

Interquartile Range

• Example: Apartment Rents

3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
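A short sketch that reproduces both the range and the interquartile range. The helper `percentile` is a compact, hypothetical restatement of the slides' index rule (i = (p/100)n, round up if fractional, otherwise average positions i and i + 1), repeated here so the block runs on its own.

```python
import math
from fractions import Fraction

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(values, p):
    """Slides' rule: round a fractional index up; otherwise average positions i and i + 1."""
    ordered = sorted(values)
    i = Fraction(p) / 100 * len(ordered)
    if i.denominator != 1:
        return ordered[math.ceil(i) - 1]
    return (ordered[int(i) - 1] + ordered[int(i)]) / 2

data_range = max(rents) - min(rents)                 # largest value - smallest value
q1, q3 = percentile(rents, 25), percentile(rents, 75)
iqr = q3 - q1                                        # range of the middle 50% of the data
print(data_range, q1, q3, iqr)                       # 190 445 525 80
```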


Measures of Variability: Example 1

2.3 Variance

The variance is a measure of variability that utilizes all the data.

The variance is useful in comparing the variability of two or more variables.

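Variance

The variance is the average of the squared differences between each data value and the mean (for a sample, the sum of the squared differences is divided by n - 1 rather than n).

The variance is computed as follows:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample}$$

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population}$$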

2.4 Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Standard Deviation

The standard deviation is computed as follows:

$$s = \sqrt{s^2} \quad \text{for a sample} \qquad \sigma = \sqrt{\sigma^2} \quad \text{for a population}$$

2.5 Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

The coefficient of variation is computed as follows:

$$\left(\frac{s}{\bar{x}} \times 100\right)\% \quad \text{for a sample} \qquad \left(\frac{\sigma}{\mu} \times 100\right)\% \quad \text{for a population}$$

Sample Variance, Standard Deviation, and Coefficient of Variation

• Example: Apartment Rents

• Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$

• Standard Deviation
$$s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74$$

• Coefficient of Variation
$$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$$

The standard deviation is about 11% of the mean.
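These three results can be checked with Python's standard `statistics` module, where `variance` and `stdev` use the sample divisor n - 1 (the population versions are `pvariance` and `pstdev`). A minimal sketch:

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)
s2 = statistics.variance(rents)  # sample variance: sum of squared deviations / (n - 1)
s = statistics.stdev(rents)      # sample standard deviation: square root of s2
cv = s / x_bar * 100             # coefficient of variation, in percent

print(round(s2, 2), round(s, 2), round(cv, 2))  # 2996.16 54.74 11.15
```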
Measures of Variability: Example 2

3. Measures of Distribution Shape, Relative Location, and Detecting Outliers
• 3.1 Distribution Shape
• 3.2 z-Scores
• 3.3 Chebyshev’s Theorem
• 3.4 Empirical Rule
• 3.5 Detecting Outliers

3.1 Distribution Shape: Skewness
• An important measure of the shape of a distribution is called skewness.
• The formula for the skewness of sample data is

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$$

• Skewness can be easily computed using statistical software.
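A direct translation of the skewness formula, sketched in Python with the standard `statistics` module supplying x̄ and s. The function name `sample_skewness` is ours. On the apartment rents it reproduces the value reported for this data set later in the section.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def sample_skewness(values):
    """n / ((n - 1)(n - 2)) * sum(((x_i - x_bar) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

print(round(sample_skewness(rents), 2))  # 0.92
```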

Distribution Shape: Skewness
• Symmetric (not skewed)
  • Skewness is zero.
  • Mean and median are equal.

[Figure: relative frequency histogram of a symmetric distribution, Skewness = 0]

Distribution Shape: Skewness
• Moderately Skewed Left
  • Skewness is negative.
  • Mean will usually be less than the median.

[Figure: relative frequency histogram of a moderately left-skewed distribution, Skewness = -.31]

Distribution Shape: Skewness
• Moderately Skewed Right
  • Skewness is positive.
  • Mean will usually be more than the median.

[Figure: relative frequency histogram of a moderately right-skewed distribution, Skewness = .31]

Distribution Shape: Skewness
• Highly Skewed Right
  • Skewness is positive (often above 1.0).
  • Mean will usually be more than the median.

[Figure: relative frequency histogram of a highly right-skewed distribution, Skewness = 1.25]

Distribution Shape: Skewness
• Example: Apartment Rents

Seventy efficiency apartments were randomly sampled in a college town. The monthly rent prices for the apartments are listed below in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Distribution Shape: Skewness
• Example: Apartment Rents

[Figure: relative frequency histogram of the apartment rents, Skewness = .92]

3.2 z-Scores

The z-score is often called the standardized value.

It denotes the number of standard deviations a data value xᵢ is from the mean.

$$z_i = \frac{x_i - \bar{x}}{s}$$

Excel’s STANDARDIZE function can be used to compute the z-score.

z-Scores

• An observation’s z-score is a measure of the relative location of the observation in a data set.
• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.

z-Scores

• Example: Apartment Rents

• z-Score of Smallest Value (425)
$$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$$

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
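The standardized values in the table can be reproduced with a few lines of Python, a sketch playing the same role as Excel's STANDARDIZE. The helper name `z_score` is ours; x̄ and s come from the standard `statistics` module.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

x_bar = statistics.mean(rents)   # 490.8
s = statistics.stdev(rents)      # about 54.74

def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean (negative below it)."""
    return (x - mean) / sd

z_values = [z_score(x, x_bar, s) for x in rents]
# rents is sorted, so the first and last entries are the smallest and largest values
print(round(z_values[0], 2), round(z_values[-1], 2))  # -1.2 2.27
```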

3.3 Chebyshev’s Theorem

At least (1 - 1/z²) of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.

Chebyshev’s theorem requires z > 1, but z need not be an integer.

Chebyshev’s Theorem

At least 75% of the data values must be within z = 2 standard deviations of the mean.

At least 89% of the data values must be within z = 3 standard deviations of the mean.

At least 94% of the data values must be within z = 4 standard deviations of the mean.

Chebyshev’s Theorem

• Example: Apartment Rents

Let z = 1.5 with x̄ = 490.80 and s = 54.74.

At least (1 - 1/(1.5)²) = 1 - 0.44 = 0.56, or 56%, of the rent values must be between

x̄ - z(s) = 490.80 - 1.5(54.74) = 409

and

x̄ + z(s) = 490.80 + 1.5(54.74) = 573

(Actually, 86% of the rent values are between 409 and 573.)

Important! The true proportions found within the indicated regions could be greater than what the theorem guarantees.
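The gap between the guaranteed 56% and the observed 86% is easy to verify. A minimal sketch; the helper name `chebyshev_bound` is ours.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def chebyshev_bound(z):
    """Minimum proportion of any data set within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2

x_bar = statistics.mean(rents)
s = statistics.stdev(rents)
z = 1.5
lower, upper = x_bar - z * s, x_bar + z * s
# Proportion of rents actually inside [lower, upper]
actual = sum(lower <= x <= upper for x in rents) / len(rents)
print(round(chebyshev_bound(z), 2), round(lower), round(upper), round(actual, 2))  # 0.56 409 573 0.86
```

For z = 2, 3, and 4 the same function returns 0.75, about 0.889, and 0.9375, matching the 75%, 89%, and 94% figures above.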

3.4 Empirical Rule

When the data are believed to approximate a bell-shaped distribution …

The empirical rule can be used to determine the approximate percentage of data values that are within a specified number of standard deviations of the mean.

The empirical rule is based on the normal distribution, which is covered in Chapter 6.

Empirical Rule

For data having a bell-shaped distribution:

68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.

95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.

99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
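These percentages can be checked against the normal distribution itself: the probability that a normal random variable lies within k standard deviations of its mean is erf(k/√2). A short sketch using the standard library's `math.erf`; computed this way the values are 68.27%, 95.45%, and 99.73%, which differ from the rounded figures above only in the second decimal place.

```python
import math

def normal_within(k):
    """Probability that a normal random variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{normal_within(k):.2%}")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```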

Empirical Rule

[Figure: normal curve over x, with 68.26% of the area between μ - 1σ and μ + 1σ, 95.44% between μ - 2σ and μ + 2σ, and 99.72% between μ - 3σ and μ + 3σ]

Empirical Rule

[Figure: the 68.26%, 95.44%, and 99.72% empirical-rule regions on a bell-shaped curve; horizontal axis labeled "Tel. bill"]

Chebyshev's Theorem vs Empirical Rule

3.5 Detecting Outliers

• An outlier is an unusually small or unusually large value in a data set.
• A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
• It might be:
  • an incorrectly recorded data value
  • a data value that was incorrectly included in the data set
  • a correctly recorded data value that belongs in the data set

Detecting Outliers

• Example: Apartment Rents

• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.

Standardized Values for Apartment Rents


-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
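The |z| > 3 screen takes only a few lines. This is a minimal sketch; the function name `z_outliers` and its `cutoff` parameter are hypothetical, with the default of 3 taken from the rule above.

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def z_outliers(values, cutoff=3.0):
    """Values whose z-score (sample mean and sample standard deviation) exceeds cutoff in absolute value."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs((x - x_bar) / s) > cutoff]

print(z_outliers(rents))  # []
```

Lowering the cutoff to 2 would flag only the two rents of 615 (z ≈ 2.27), which shows how strongly the conclusion depends on the chosen threshold.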

4. Exploratory Data Analysis

Exploratory data analysis procedures enable us to use simple arithmetic and easy-to-draw pictures to summarize data.

We simply sort the data values into ascending order and identify the five-number summary and then construct a box plot.

4.1 Five-Number Summary

1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value

Five-Number Summary

• Example: Apartment Rents

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
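The five numbers can be produced together. In this sketch the helpers `percentile` and `five_number_summary` are hypothetical names; `percentile` restates the slides' index rule so the block runs on its own.

```python
import math
from fractions import Fraction

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(values, p):
    """pth percentile by the slides' index rule (0 < p < 100 assumed)."""
    ordered = sorted(values)
    i = Fraction(p) / 100 * len(ordered)
    if i.denominator != 1:
        return ordered[math.ceil(i) - 1]          # fractional index: round up
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2      # integer index: average i and i + 1

def five_number_summary(values):
    """(smallest value, Q1, median, Q3, largest value)."""
    return (min(values), percentile(values, 25), percentile(values, 50),
            percentile(values, 75), max(values))

print(five_number_summary(rents))  # (425, 445, 475.0, 525, 615)
```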

4.2 Box Plot

A box plot is a graphical summary of data that is based on a five-number summary.

A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.

Box plots provide another way to identify outliers.

Box Plot
• Example: Apartment Rents
• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: box on a rent axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525]
Box Plot

• Limits are located (not drawn) using the interquartile range (IQR).
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol *.

Box Plot
• Example: Apartment Rents
• The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
• The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
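A sketch of the 1.5(IQR) limits, taking Q1 = 445 and Q3 = 525 from the slides. It also finds the smallest and largest values inside the limits, which are where the whiskers end. The helper name `box_plot_limits` is ours.

```python
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def box_plot_limits(q1, q3):
    """Lower and upper limits located 1.5 IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = box_plot_limits(445, 525)               # Q1 and Q3 from the slides
outliers = [x for x in rents if x < lower or x > upper]
inside = [x for x in rents if lower <= x <= upper]     # whiskers end at min/max of these
print(lower, upper, outliers, min(inside), max(inside))  # 325.0 645.0 [] 425 615
```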

Box Plot
• Example: Apartment Rents
• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Figure: box plot on a rent axis from 400 to 625; smallest value inside limits = 425, largest value inside limits = 615]
Box Plot

Box Plot

An excellent graphical technique for making comparisons among two or more groups.
5. Measures of Association Between Two Variables

Thus far we have examined numerical methods used to summarize the data for one variable at a time.

Often a manager or decision maker is interested in the relationship between two variables.

Two descriptive measures of the relationship between two variables are covariance and the correlation coefficient.

5.1 Covariance
• The covariance is a measure of the linear association between two variables.
• Positive values indicate a positive relationship.
• Negative values indicate a negative relationship.

Covariance
• The covariance is computed as follows:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples}$$

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations}$$
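The two formulas differ only in which means are subtracted and in the divisor (n - 1 for a sample, N for a population). A minimal Python sketch, assuming paired lists of equal length; the `population` flag is a hypothetical parameter chosen for illustration. For the sample case, Python 3.10+ also offers `statistics.covariance` as a cross-check.

```python
from statistics import fmean

def covariance(x, y, population=False):
    """Covariance of paired data: divide by N (population) or n - 1 (sample)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of products of deviations from the means
    total = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return total / (len(x) if population else len(x) - 1)

# Small illustrative data (hypothetical)
print(covariance([1, 2, 3], [2, 4, 6]))                   # 2.0
print(covariance([1, 2, 3], [2, 4, 6], population=True))  # about 1.3333
```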

5.2 Correlation Coefficient
• Correlation is a measure of linear association and not necessarily causation.
• Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

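Correlation Coefficient
• The correlation coefficient is computed as follows:

$$r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples}$$

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations}$$

where s_x and s_y (σ_x and σ_y) are the standard deviations of x and y.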
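The sample formula divides the sample covariance by the product of the two sample standard deviations. A minimal Python sketch; `sample_correlation` is an illustrative name, and the inputs are hypothetical points lying exactly on a line, so the results should land at the ends of the -1 to +1 range.

```python
from statistics import fmean, stdev

def sample_correlation(x, y):
    """Sample correlation coefficient r_xy = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    # Sample covariance (divisor n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # statistics.stdev is the sample standard deviation (divisor n - 1)
    return s_xy / (stdev(x) * stdev(y))

print(sample_correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # close to +1: rising line
print(sample_correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # close to -1: falling line
```

Because the n - 1 divisors in the numerator and denominator cancel, the population version gives the same value when applied to the same numbers.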

Correlation Coefficient
• The coefficient can take on values between -1 and +1, inclusive.
• Values near -1 indicate a strong negative linear relationship.
• Values near +1 indicate a strong positive linear relationship.
• The closer the correlation is to zero, the weaker the linear relationship.

Covariance and Correlation Coefficient
• Example: Golfing Study
  A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

  Average Driving     Average
  Distance (yds.)     18-Hole Score
  277.6               69
  259.5               71
  269.1               70
  267.0               70
  255.6               71
  272.9               69

Covariance and Correlation Coefficient
• Example: Golfing Study

              x        y      (xi - x̄)   (yi - ȳ)   (xi - x̄)(yi - ȳ)
              277.6    69      10.65      -1.0       -10.65
              259.5    71      -7.45       1.0        -7.45
              269.1    70       2.15       0.0         0.00
              267.0    70       0.05       0.0         0.00
              255.6    71     -11.35       1.0       -11.35
              272.9    69       5.95      -1.0        -5.95
  Average     266.95   70.0                   Total   -35.40
  Std. Dev.   8.2192   0.8944

Covariance and Correlation Coefficient
• Example: Golfing Study
  • Sample Covariance

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$

  • Sample Correlation Coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(0.8944)} = -0.9631$$
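The golfing calculations can be reproduced from the six data pairs listed above. A minimal Python sketch using only the standard library; in Python 3.10+, `statistics.correlation(distance, score)` should give the same r as a cross-check.

```python
from statistics import fmean, stdev

# Golfing Study data from the slides
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score

n = len(distance)
x_bar, y_bar = fmean(distance), fmean(score)
# Sample covariance: sum of deviation products divided by n - 1
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, score)) / (n - 1)
# Sample correlation: covariance scaled by the two sample standard deviations
r_xy = s_xy / (stdev(distance) * stdev(score))

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.1f}")  # x_bar = 266.95, y_bar = 70.0
print(f"s_xy = {s_xy:.2f}")                         # s_xy = -7.08
print(f"r_xy = {r_xy:.4f}")                         # r_xy = -0.9631
```

The strong negative r says that, in this small sample, longer drives go with lower (better) scores; as noted earlier, that is association, not proof of causation.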

6. The Weighted Mean and Working with Grouped Data
• 6.1 Weighted Mean
• 6.2 Mean for Grouped Data
• 6.3 Variance for Grouped Data
• 6.4 Standard Deviation for Grouped Data

6.1 Weighted Mean
• When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
• In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
• When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Weighted Mean

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
xi = value of observation i
wi = weight for observation i
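Applied to the GPA case, the grade points are the xi and the credit hours are the wi. A minimal Python sketch on a hypothetical transcript; the grade-point scale (A = 4, B = 3, C = 2) and the name `weighted_mean` are assumptions for illustration. In Python 3.11+, `statistics.fmean(values, weights)` computes the same quantity.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: grade points weighted by credit hours
grade_points = [4.0, 3.0, 2.0]  # A, B, C (assumed 4-point scale)
credit_hours = [3, 4, 3]
print(weighted_mean(grade_points, credit_hours))  # 3.0
```

An unweighted mean of the same grades would also be 3.0 here only by coincidence; shifting credit hours toward the A course raises the GPA above the plain average.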

Grouped Data
• The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.
• To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
• We compute a weighted mean of the class midpoints using the class frequencies as weights.
• Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

6.2 Mean for Grouped Data
• Sample Data

$$\bar{x} = \frac{\sum f_i M_i}{n}$$

• Population Data

$$\mu = \frac{\sum f_i M_i}{N}$$

where:
fi = frequency of class i
Mi = midpoint of class i
n = Σfi (sample size), N = Σfi (population size)

Sample Mean for Grouped Data
• Example: Apartment Rents
  The previously presented sample of apartment rents is shown here as grouped data in the form of a frequency distribution.

  Rent ($)    Frequency
  420-439      8
  440-459     17
  460-479     12
  480-499      8
  500-519      7
  520-539      4
  540-559      2
  560-579      4
  580-599      2
  600-619      6

Sample Mean for Grouped Data
• Example: Apartment Rents

  Rent ($)    fi    Mi       fiMi
  420-439      8    429.5    3436.0
  440-459     17    449.5    7641.5
  460-479     12    469.5    5634.0
  480-499      8    489.5    3916.0
  500-519      7    509.5    3566.5
  520-539      4    529.5    2118.0
  540-559      2    549.5    1099.0
  560-579      4    569.5    2278.0
  580-599      2    589.5    1179.0
  600-619      6    609.5    3657.0
  Total       70             34525.0

$$\bar{x} = \frac{34{,}525}{70} = 493.21$$

  This approximation differs by $2.41 from the actual sample mean of $490.80.
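The grouped-data mean is just the weighted mean of the class midpoints with the class frequencies as weights. A minimal Python sketch using the frequency distribution from the slides; encoding each class as a `(lower, upper, frequency)` tuple and taking the midpoint as (lower + upper) / 2 (which reproduces 429.5 for 420-439) are assumptions of this sketch.

```python
# Frequency distribution of apartment rents from the slides: (lower, upper, frequency)
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8), (500, 519, 7),
    (520, 539, 4), (540, 559, 2), (560, 579, 4), (580, 599, 2), (600, 619, 6),
]

def grouped_mean(classes):
    """Approximate mean: sum(f_i * M_i) / sum(f_i), with M_i the class midpoint."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n

print(round(grouped_mean(classes), 2))  # 493.21
```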

6.3 Variance for Grouped Data
• For sample data

$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$

• For population data

$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$

Sample Variance for Grouped Data
• Example: Apartment Rents

  Rent ($)   fi   Mi      Mi - x̄    (Mi - x̄)²   fi(Mi - x̄)²
  420-439     8   429.5   -63.71     4058.96     32471.71
  440-459    17   449.5   -43.71     1910.56     32479.59
  460-479    12   469.5   -23.71      562.16      6745.97
  480-499     8   489.5    -3.71       13.76       110.11
  500-519     7   509.5    16.29      265.36      1857.55
  520-539     4   529.5    36.29     1316.96      5267.86
  540-559     2   549.5    56.29     3168.56      6337.13
  560-579     4   569.5    76.29     5820.16     23280.66
  580-599     2   589.5    96.29     9271.76     18543.53
  600-619     6   609.5   116.29    13523.36     81140.18
  Total      70                                 208234.29

  (Deviations are computed from x̄ = 493.21.)
Sample Variance for Grouped Data
• Example: Apartment Rents
  • Sample Variance

$$s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89$$

  • Sample Standard Deviation

$$s = \sqrt{3{,}017.89} = 54.94$$

  This approximation differs by only $0.20 from the actual standard deviation of $54.74.
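The grouped variance and standard deviation can be checked the same way, with frequencies as weights on the squared deviations of the midpoints. A minimal Python sketch; the `(lower, upper, frequency)` encoding and the `population` flag are assumptions of this sketch. It uses the unrounded mean (493.2142...), so its intermediate sum is 208,234.2857 rather than the table's 208,234.29, but the rounded results agree.

```python
from math import sqrt

# Frequency distribution of apartment rents from the slides: (lower, upper, frequency)
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8), (500, 519, 7),
    (520, 539, 4), (540, 559, 2), (560, 579, 4), (580, 599, 2), (600, 619, 6),
]

def grouped_variance(classes, population=False):
    """Approximate variance from grouped data using class midpoints and frequencies."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]  # (M_i, f_i)
    mean = sum(f * m for m, f in mids) / n
    ss = sum(f * (m - mean) ** 2 for m, f in mids)  # sum of f_i (M_i - mean)^2
    return ss / (n if population else n - 1)

s2 = grouped_variance(classes)
print(f"s^2 = {s2:,.2f}, s = {sqrt(s2):.2f}")  # s^2 = 3,017.89, s = 54.94
```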

Your Task Today!
1. How would you explain the difference between correlation and covariance? List the key differences between covariance and correlation.
2. What is the difference between variance and covariance?

End of Learning Unit 3
Thank you!
