Cengage EBA 2e Chapter02
Chapter 2
© 2016 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a
password-protected website for classroom use.
Overview of Using Data:
Definitions and Goals
Overview of Using Data: Definitions and Goals
• Data: The facts and figures collected, analyzed, and summarized for
presentation and interpretation
Table 2.1: Data for Dow Jones Industrial Index Companies
Types of Data
Population and Sample Data
Quantitative and Categorical Data
Cross-Sectional and Time Series Data
Sources of Data
Types of Data
• Population: All elements of interest
• Sample: Subset of the population
• Random sampling: A sampling method to gather a representative sample of
the population data
• Quantitative data: Data on which numeric and arithmetic
operations, such as addition, subtraction, multiplication, and division,
can be performed
• Categorical data: Data on which arithmetic operations cannot be
performed
Types of Data
• Cross-sectional data: Data collected from several entities at the
same, or approximately the same, point in time
• Time series data: Data collected over several time periods
• Graphs of time series data are frequently found in business and economic
publications
• Graphs help analysts understand what happened in the past, identify trends
over time, and project future levels for the time series
Figure 2.1: Dow Jones Index Values Since 2005
Types of Data
Sources of Data
• Experimental study
• A variable of interest is first identified
• Then one or more other variables are identified and controlled or manipulated so that
data can be obtained about how they influence the variable of interest
Figure 2.2: Customer Opinion Questionnaire used by Chops City Grill Restaurant
Modifying Data in Excel
Sorting and Filtering Data in Excel
Conditional Formatting of Data in Excel
Table 2.2: 20 Top-Selling Automobiles in the United States in March 2011
Figure 2.3: Data for 20 Top-Selling Automobiles Entered into
Excel with Percent Change in Sales from 2010
Modifying Data in Excel
Sorting and Filtering Data in Excel
• To sort the automobiles by March 2010 sales:
• Step 1: Select cells A1:F21
• Step 2: Click the Data tab in the Ribbon
• Step 3: Click Sort in the Sort & Filter group
• Step 4: Select the check box for My data has headers
• Step 5: In the first Sort by dropdown menu, select Sales (March 2010)
• Step 6: In the Order dropdown menu, select Largest to Smallest
• Step 7: Click OK
Figure 2.4: Using Excel’s Sort Function to Sort the Top-Selling Automobiles Data
Figure 2.5: Top-Selling Automobiles Data Sorted by Sales in March 2010
Modifying Data in Excel
Sorting and Filtering Data in Excel
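• Using Excel’s Filter function to see the sales of models made by Toyota:
• Step 1: Select cells A1:F21
• Step 2: Click the Data tab in the Ribbon
• Step 3: Click Filter in the Sort & Filter group
• Step 4: Click the filter arrow in the header of the Manufacturer column
• Step 5: If all choices are checked, deselect them all by unchecking (Select
All); then select only the check box for Toyota
• Step 6: Click OK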
Figure 2.6: Top-Selling Automobiles Data Filtered to Show Only
Automobiles Manufactured by Toyota
Modifying Data in Excel
Conditional Formatting of Data in Excel
• Makes it easy to identify data that satisfy certain conditions in a data
set
• To identify the automobile models in Table 2.2 for which sales had
decreased from March 2010 to March 2011:
• Step 1: Starting with the original data shown in Figure 2.3, select cells F1:F21
• Step 2: Click on the Home tab in the Ribbon
• Step 3: Click Conditional Formatting in the Styles group
• Step 4: Select Highlight Cells Rules, and click Less Than from the dropdown
menu
• Step 5: Enter 0% in the Format cells that are LESS THAN: box
• Step 6: Click OK
Figure 2.7: Using Conditional Formatting in Excel to Highlight
Automobiles with Declining Sales from March 2010
Figure 2.8: Using Conditional Formatting in Excel to Generate Data
Bars for the Top-Selling Automobiles Data
Modifying Data in Excel
• The Quick Analysis button appears just outside the bottom-right corner of
a group of selected cells
• It provides shortcuts for Conditional Formatting, adding Data Bars, and other operations
Creating Distributions from Data
Frequency Distributions for Categorical Data
Relative Frequency and Percent Frequency Distributions
Frequency Distributions for Quantitative Data
Histograms
Cumulative Distributions
Creating Distributions from Data
Frequency Distributions for Categorical Data
• Frequency distribution: A summary of data that shows the number
(frequency) of observations in each of several nonoverlapping classes
• The classes are typically referred to as bins when dealing with distributions
Table 2.3: Data from a Sample of 50 Soft Drink Purchases
Table 2.4: Frequency Distribution of Soft Drink Purchases
Figure 2.10: Creating a Frequency Distribution for Soft
Drinks Data in Excel
Creating Distributions from Data
Relative Frequency and Percent Frequency Distributions
• Relative frequency distribution: A tabular summary of data
showing the relative frequency for each bin
• Relative frequency of a bin = (frequency of the bin) / n, where n is the number of observations
• Percent frequency distribution: Summarizes the percent frequency of
the data for each bin
• Percent frequency of a bin = relative frequency × 100
• A percent frequency distribution can be used to provide estimates of the
relative likelihoods of different values of a random variable
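The three distributions can be sketched in a few lines of Python. The purchase list below is hypothetical and far shorter than the 50 observations in Table 2.3; it only illustrates how the frequency, relative frequency, and percent frequency relate to one another.

```python
from collections import Counter

# Hypothetical purchases (illustrative only, not the data of Table 2.3).
purchases = ["Coca-Cola", "Pepsi", "Sprite", "Coca-Cola", "Diet Coke",
             "Pepsi", "Coca-Cola", "Dr. Pepper", "Coca-Cola", "Pepsi"]

n = len(purchases)

# Frequency: number of observations in each bin (here, each soft drink).
frequency = Counter(purchases)

# Relative frequency: share of all observations that fall in the bin.
relative = {drink: count / n for drink, count in frequency.items()}

# Percent frequency: relative frequency expressed as a percentage.
percent = {drink: 100 * share for drink, share in relative.items()}

for drink in frequency:
    print(f"{drink:<11} {frequency[drink]:>2} {relative[drink]:.2f} {percent[drink]:.0f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any such table.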
Table 2.5: Relative Frequency and Percent Frequency Distributions of Soft Drink
Purchases
Creating Distributions from Data
Frequency Distributions for Quantitative Data
• Three steps necessary to define the classes for a frequency
distribution with quantitative data:
1. Determine the number of nonoverlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.
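A minimal sketch of the three steps, assuming a handful of hypothetical audit times (not the values of Table 2.6). The width rule used here, range divided by the number of bins and rounded up, is a common rule of thumb, and the starting limit of 10 is an assumed convenient choice.

```python
import math

# Hypothetical audit times in days (illustrative values, not Table 2.6).
times = [12, 15, 14, 19, 22, 27, 18, 21, 33, 16]

# Step 1: choose the number of nonoverlapping bins.
num_bins = 5

# Step 2: approximate width = (largest - smallest) / number of bins, rounded up.
raw_width = (max(times) - min(times)) / num_bins   # (33 - 12) / 5 = 4.2
width = math.ceil(raw_width)                       # 5

# Step 3: choose bin limits so every value falls in exactly one bin.
lower = 10  # assumed starting limit at or below the smallest value
bins = [(lower + k * width, lower + (k + 1) * width - 1) for k in range(num_bins)]

# Count the observations in each bin (limits are inclusive for integer data).
freq = {b: sum(b[0] <= t <= b[1] for t in times) for b in bins}

for (low, high), count in freq.items():
    print(f"{low}-{high}: {count}")
```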
Creating Distributions from Data
Table 2.6: Year-End Audit Times (Days)
Table 2.7: Frequency, Relative Frequency, and Percent Frequency Distributions for the
Audit Time Data
Figure 2.11: Using Excel to Generate a Frequency Distribution for Audit Times Data
Creating Distributions from Data
Histogram
• A common graphical presentation of quantitative data
• Constructed by placing the variable of interest on the horizontal axis
and the selected frequency measure (absolute frequency, relative
frequency, or percent frequency) on the vertical axis.
• The frequency measure of each class is shown by drawing a rectangle
whose base is determined by the class limits on the horizontal axis and
whose height is the corresponding frequency measure.
Figure 2.12: Histogram for the Audit Time Data
Figure 2.13: Creating a Histogram for the Audit Time Data Using Data Analysis
Toolpak in Excel
Figure 2.14: Completed Histogram for the Audit Time Data Using Data Analysis
ToolPak in Excel
Creating Distributions from Data
• Histograms provide information about the shape, or form, of a
distribution
• Skewness: Lack of symmetry
• Skewness is an important characteristic of the shape of a distribution
Figure 2.15: Histograms Showing Distributions with Different Levels of Skewness
Creating Distributions from Data
Cumulative Distributions
• Cumulative frequency distribution: A variation of the frequency
distribution that provides another tabular summary of quantitative
data
• Uses the number of classes, class widths, and class limits developed for the
frequency distribution
• Shows the number of data items with values less than or equal to the upper
class limit of each class
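As a rough sketch, a cumulative distribution is a running total of the bin frequencies. The bin limits and counts below are hypothetical, not those of Table 2.8.

```python
from itertools import accumulate

# Hypothetical upper class limits and bin frequencies (illustrative only).
upper_limits = [14, 19, 24, 29, 34]
frequencies = [2, 4, 2, 1, 1]

n = sum(frequencies)

# Running total: number of items with values <= each upper class limit.
cumulative = list(accumulate(frequencies))

# The same totals as proportions and percentages of all observations.
cum_relative = [c / n for c in cumulative]
cum_percent = [100 * r for r in cum_relative]

for limit, c, pct in zip(upper_limits, cumulative, cum_percent):
    print(f"<= {limit}: {c} ({pct:.0f}%)")
```

The last cumulative frequency always equals the number of observations, and the last cumulative percent frequency is 100.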
Table 2.8: Cumulative Frequency, Cumulative Relative
Frequency, and Cumulative Percent Frequency
Distributions for the Audit Time Data
Measures of Location
Mean (Arithmetic Mean)
Median
Mode
Geometric Mean
Measures of Location
• Mean (Arithmetic Mean)
• Average value for a variable
• The sample mean is denoted by $\bar{x}$:
$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
• n = sample size
• $x_1$ = value of variable x for the first observation
• $x_2$ = value of variable x for the second observation
• $x_n$ = value of variable x for the nth observation
Table 2.9: Data on Home Sales in a Cincinnati, Ohio, Suburb
Computation of Sample Mean
Illustration: Computation of the mean home selling price for the sample
of 12 home sales (Table 2.9)
$\bar{x} = \frac{\sum x_i}{n} = \frac{2{,}639{,}250}{12} = 219{,}937.50$
Measures of Location
Median
• Value in the middle when the data are arranged in ascending order
• Middle value, for an odd number of observations
• Average of two middle values, for an even number of observations
Computation of Sample Median
Illustration: When the number of observations is odd
• Consider the class size data for a sample of five college classes:
46 54 42 46 32
• Arrange the class size data in ascending order
32 42 46 46 54
• Middlemost value in the data set = 46
• Median is 46
Computation of Sample Median
Illustration: When the number of observations is even
• Consider the data on home sales in a Cincinnati, Ohio, suburb (Table 2.9)
• Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500 208,000 254,000
254,000 257,500 298,000 456,250
• The two middle values are 199,500 and 208,000
• Median = average of the two middle values
$= \frac{199{,}500 + 208{,}000}{2} = 203{,}750$
Measures of Location
Mode
• Value that occurs most frequently in a data set
• Consider the class size data:
32 42 46 46 54
• Observe - 46 is the only value that occurs more than once
• Mode is 46
• Multimodal data - Data contain at least two modes
• Bimodal data - Data contain exactly two modes
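The three measures can be checked with Python's standard `statistics` module. This is only a sketch using the home sales and class size values listed in the illustrations; the variable names are arbitrary.

```python
import statistics

# Home selling prices from the median illustration (12 observations).
home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

# Class sizes for the sample of five college classes.
class_sizes = [46, 54, 42, 46, 32]

mean_price = statistics.mean(home_sales)        # sum of values / n
median_price = statistics.median(home_sales)    # average of the two middle values
modes_price = statistics.multimode(home_sales)  # every value tied for most frequent

median_size = statistics.median(class_sizes)    # middle value of the sorted data
mode_size = statistics.mode(class_sizes)        # single most frequent value

print(mean_price, median_price, modes_price)
print(median_size, mode_size)
```

Two prices (138,000 and 254,000) each occur twice in the home sales data, so `multimode` reports both; that data set is bimodal.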
Figure 2.16: Calculating the Mean, Median, and Modes for
the Home Sales Data using Excel
Measures of Location
Geometric Mean
• nth root of the product of n values
• Used in analyzing growth rates in financial data
• Sample geometric mean: $\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n}$
Table 2.10: Percentage Annual Returns and Growth
Factors for the Mutual Fund Data
• Illustration: Consider the percentage annual returns and growth factors for
the mutual fund data over the past 10 years
• We will determine the mean rate of growth for the fund over the 10-year
period
Computation of Geometric Mean
• Solution:
• Product of the growth factors:
(0.779)(1.287)(1.109)(1.049)(1.158)(1.055)(0.630)(1.265)(1.151)(1.021)
≈ 1.334
• Geometric mean of the growth factors:
$\bar{x}_g = \sqrt[10]{1.334} \approx 1.029$
• Conclude that annual returns grew at an average annual rate of
(1.029 – 1)100% or 2.9%
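The computation can be reproduced with a short Python sketch using the ten growth factors listed in the solution. `math.prod` and `statistics.geometric_mean` are standard-library functions (Python 3.8 or later); the variable names are arbitrary.

```python
import math
import statistics

# Growth factors for the 10 years of mutual fund returns.
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

n = len(growth_factors)

# Product of the growth factors, then its nth root.
product = math.prod(growth_factors)
geo_mean = product ** (1 / n)

# Average annual growth rate implied by the geometric mean, in percent.
avg_rate = (geo_mean - 1) * 100

# Cross-check against the library implementation.
library_value = statistics.geometric_mean(growth_factors)

print(f"product = {product:.3f}, geometric mean = {geo_mean:.3f}, rate = {avg_rate:.1f}%")
```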
Figure 2.17: Calculating the Geometric Mean for the Mutual Fund Data Using Excel
Measures of Variability
Range
Variance
Standard Deviation
Coefficient of Variation
Measures of Variability
Table 2.11: Annual Payouts for Two Different Investment Funds
Figure 2.18: Histograms for Payouts of Past 20 Years from Fund A and Fund B
Computation of Range
Range
• Found by subtracting the smallest value from the largest value in a data set
• Illustration: Consider the data on home sales in a Cincinnati, Ohio, suburb (Table 2.9)
• Largest home sales price: $456,250
• Smallest home sales price: $108,000
• Range = Largest value – Smallest value
= $456,250 – $108,000
= $348,250
• Drawback: Range is based on only two of the observations and thus is
highly influenced by extreme values
Measures of Variability
• Variance
• Measure of variability that utilizes all the data
• It is based on the deviation about the mean, which is the difference
between the value of each observation ($x_i$) and the mean
• The deviations about the mean are squared while computing the
variance
• Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
• Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and N is the population size
Table 2.12: Computation of Deviations and Squared Deviations
about the Mean for the Class Size Data
Figure 2.19: Calculating Variability Measures for the Home Sales Data in Excel
Measures of Variability
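• Standard Deviation
• Positive square root of the variance
• Measured in the same units as the original data
• For a sample, $s = \sqrt{s^2}$
• For a population, $\sigma = \sqrt{\sigma^2}$
• Coefficient of Variation
• Expresses the standard deviation as a percentage of the mean: $\left(\frac{\text{standard deviation}}{\text{mean}} \times 100\right)\%$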
Computation of Coefficient of Variation
• Illustration:
• Consider the class size data:
46 54 42 46 32
• Mean, $\bar{x}$ = 44
• Standard deviation, s = 8
• Coefficient of variation = $\left(\frac{8}{44} \times 100\right)\%$ = 18.2%
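The variability measures for the class size data can be verified with a short sketch based on Python's `statistics` module. `variance` and `stdev` use the sample (n – 1) divisor, while `pvariance` uses the population divisor; the variable names are arbitrary.

```python
import statistics

# Class sizes for the sample of five college classes.
class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)

# Range: largest value minus smallest value.
value_range = max(class_sizes) - min(class_sizes)

# Sample variance: sum of squared deviations about the mean / (n - 1).
squared_deviations = [(x - mean) ** 2 for x in class_sizes]
sample_variance = sum(squared_deviations) / (len(class_sizes) - 1)

# Sample standard deviation: positive square root of the sample variance.
sample_sd = statistics.stdev(class_sizes)

# Population variance, for comparison, divides by N instead of n - 1.
population_variance = statistics.pvariance(class_sizes)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = sample_sd / mean * 100

print(mean, value_range, sample_variance, sample_sd, round(cv, 1))
```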
Analyzing Distributions
Percentiles
Quartiles
z-Scores
Empirical Rule
Identifying Outliers
Box Plots
Analyzing Distributions
Percentiles
• Value of a variable at which a specified (approximate) percentage of
observations are below that value
• The pth percentile tells us the point in the data where:
• Approximately p percent of the observations have values less than the pth
percentile
• Approximately (100 – p) percent of the observations have values greater than
the pth percentile
Analyzing Distributions
• Steps to calculate the pth percentile:
• Arrange the data in ascending order (smallest to largest value)
• Compute k = (n + 1) × p, with p expressed as a decimal (for example, p = 0.85 for the 85th percentile)
• Divide k into its integer component, i, and its decimal component, d
• If d = 0, the pth percentile is the value in position k of the sorted data
• If d > 0, the percentile is between the values in positions i and i + 1 in the sorted data;
to find this percentile, we must interpolate between these two values:
i. Calculate the difference between the values in positions i and i + 1 in the sorted data
set; we define this difference between the two values as m
ii. Multiply this difference by d: t = m × d
iii. To find the pth percentile, add t to the value in position i of the sorted data
Analyzing Distributions
• Illustration
• To determine the 85th percentile for the home sales data in Table 2.9.
1. Arrange the data in ascending order
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
2. Compute k = (n + 1) × p = (12 + 1) × 0.85 = 11.05
3. Dividing 11.05 into the integer and decimal components gives us i = 11
and d = 0.05
4. Because d > 0, interpolate between the values in the 11th and 12th positions in the sorted
data
Analyzing Distributions
Illustration (contd.)
• To determine the 85th percentile for the home sales data in Table 2.9
• The value in the 11th position is 298,000
• The value in the 12th position is 456,250
m = 456,250 – 298,000 = 158,250
t = m × d = 158,250 × 0.05 = 7,912.5
85th percentile = 298,000 + 7,912.5 = 305,912.5
$305,912.50 represents the 85th percentile of the home sales data
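The percentile steps can be sketched as a small Python function. The function name `percentile` and the tolerance used to treat a tiny floating-point remainder as d = 0 are assumptions made for this example; p is passed as a decimal, as in the illustration.

```python
import math

def percentile(data, p):
    """Return the pth percentile (p as a decimal) using k = (n + 1) * p."""
    values = sorted(data)          # arrange in ascending order
    n = len(values)
    k = (n + 1) * p                # location of the percentile
    i = math.floor(k)              # integer component
    d = k - i                      # decimal component
    tol = 1e-9                     # assumed tolerance for floating-point noise
    if i < 1 or i > n or (i == n and d > tol):
        raise ValueError("percentile location falls outside the data")
    if d <= tol:
        return values[i - 1]       # value in position k (positions start at 1)
    m = values[i] - values[i - 1]  # difference between positions i + 1 and i
    t = m * d                      # share of the difference to add
    return values[i - 1] + t

home_sales = [108000, 138000, 138000, 142000, 186000, 199500,
              208000, 254000, 254000, 257500, 298000, 456250]

p85 = percentile(home_sales, 0.85)
print(round(p85, 2))
```

Because 13 × 0.85 is not represented exactly in binary floating point, the result is compared with 305,912.5 using a small tolerance rather than strict equality.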
Analyzing Distributions
• Quartiles
• When the data are divided into four equal parts:
• Each part contains approximately 25% of the observations
• Division points are referred to as quartiles
• $Q_1$ = first quartile, or 25th percentile
• $Q_2$ = second quartile, or 50th percentile (also the median)
• $Q_3$ = third quartile, or 75th percentile
Analyzing Distributions
• z-score
• Measures the relative location of a value in the data set
• Helps to determine how far a particular value is from the mean
relative to the data set’s standard deviation
• Also called the standardized value
• If $x_1, x_2, \ldots, x_n$ is a sample of n observations:
$z_i = \frac{x_i - \bar{x}}{s}$
• $z_i$ = z-score for $x_i$
• $\bar{x}$ = sample mean
• s = sample standard deviation
Table 2.13: z-Scores for the Class Size Data
• For class size data, $\bar{x} = 44$ and $s = 8$
• For observations with a value > mean, z-score > 0
• For observations with a value < mean, z-score < 0
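A minimal sketch of the z-score calculation follows. The five class sizes are assumed values, chosen only because they give a sample mean of 44 and a sample standard deviation of 8 as stated above; they are not taken from the table itself. The function name z_scores is hypothetical.

```python
import statistics

def z_scores(values):
    """Return (x_i - mean) / s for every observation, where s is the
    sample standard deviation (n - 1 in the denominator)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

# Assumed class sizes with mean 44 and s = 8 (illustrative only).
class_sizes = [46, 54, 42, 46, 32]
scores = z_scores(class_sizes)

for size, z in zip(class_sizes, scores):
    print(size, round(z, 2))
```

Values above the mean of 44 get a positive z-score and values below it get a negative one, in line with the two bullets above.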
Figure 2.20: Calculating z-Scores for the Home Sales Data in Excel
Analyzing Distributions
Empirical Rule
• For data having a bell-shaped distribution:
• Approximately 68% of the data values lie within 1 standard deviation of the mean
• Approximately 95% of the data values lie within 2 standard deviations of the mean
• Almost all the data values lie within 3 standard deviations of the mean
Identifying Outliers
• Outliers: Extreme values in a data set
• They can be identified using standardized values (z-scores)
• Any data value with a z-score less than –3 or greater than +3 is an outlier
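The z-score rule for outliers can be sketched as below. The data set is hypothetical and the function name find_outliers is an assumption made for illustration; the threshold of 3 comes from the rule above.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - mean) / s) > threshold]

# Hypothetical data: fifteen ordinary values and one extreme value.
data = [10, 12, 11, 13, 9, 10, 12, 11, 10, 13, 9, 11, 12, 10, 11, 50]
outliers = find_outliers(data)
print(outliers)
```

Note that in a very small sample no z-score can reach 3 in absolute value, so this rule is only informative when the data set has more than about ten observations.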
Analyzing Distributions
Box Plots
• Graphical summary of the distribution of data
• Developed from the quartiles for a data set
Figure 2.22: Box Plot for the Home Sales Data
Figure 2.23: Box Plots Comparing Home Sale Prices in Different Communities
Measures of Association
Between Two Variables
Scatter Charts
Covariance
Correlation Coefficient
Measures of Association Between Two Variables
• Scatter Charts: Useful graph for analyzing the relationship between
two variables
Table 2.14: Data for Bottled Water Sales at Queensland Amusement
Park for a Sample of 14 Summer Days
Figure 2.24: Chart Showing the Positive Linear Relation
Between Sales and High Temperatures
Table 2.15: Sample Covariance Calculations for Daily High Temperature and Bottled
Water Sales at Queensland Amusement Park
Figure 2.25: Calculating Covariance and Correlation Coefficient for
Bottled Water Sales Using Excel
Measures of Association Between Two Variables
• Correlation coefficient: Measures the strength of the linear relationship
between two variables
• Not affected by the units of measurement for x and y
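• Sample correlation coefficient denoted by $r_{xy}$
• $r_{xy} = \frac{s_{xy}}{s_x s_y}$
• $s_{xy}$ = sample covariance = $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
• $s_x$ = sample standard deviation of x = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
• $s_y$ = sample standard deviation of y = $\sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$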
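The definitions above can be sketched directly in Python. The x and y values are hypothetical and the function names are assumptions made for illustration. The last lines rescale x to show that the covariance changes with the units while the correlation coefficient does not.

```python
import math

def sample_covariance(x, y):
    """Sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_std(v):
    """Sample standard deviation: square root of the covariance of v with itself."""
    return math.sqrt(sample_covariance(v, v))

def sample_correlation(x, y):
    """Sample covariance divided by the product of the two standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)

# Express x in different units (for example, multiply by 100).
x_scaled = [100 * xi for xi in x]
s_xy_scaled = sample_covariance(x_scaled, y)
r_xy_scaled = sample_correlation(x_scaled, y)

print(round(s_xy, 4), round(r_xy, 4))
print(round(s_xy_scaled, 4), round(r_xy_scaled, 4))
```

Rescaling x multiplies the covariance by 100 but leaves the correlation coefficient unchanged, which is why the correlation coefficient is easier to compare across data sets.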
Interpretation of Correlation Coefficient
–1 ≤ r ≤ +1
r value Relationship between the x
and y variables
Figure 2.26: Scatter Diagrams and Associated Covariance Values
for Different Variable Relationships
Computation of Correlation Coefficient
•Illustration
• To determine the sample correlation coefficient for bottled water
sales at Queensland Amusement Park:
$r_{xy} = \frac{s_{xy}}{s_x s_y} = 0.93$
Figure 2.27: Example of Nonlinear Relationship Producing
a Correlation Coefficient Near Zero
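A small sketch can show how a strong nonlinear relationship can still give a correlation coefficient of zero. The data are hypothetical (y is exactly x squared on a range symmetric about zero) and are not the values plotted in the figure; the function name is an assumption.

```python
import math

def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data: y depends perfectly on x, but not in a linear way.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r_xy = sample_correlation(x, y)
print(r_xy)
```

The positive and negative products of deviations cancel, so the covariance and the correlation coefficient are zero even though y is fully determined by x. A correlation near zero therefore rules out a linear relationship only, not every relationship.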