Chapter 2 (Descriptive)

Uploaded by

haiiyenn0909
Chapter 2

DESCRIPTIVE
STATISTICS
POINTS TO HIGHLIGHT
• Overview of using data: Definitions and goals
• Types of data
• Modifying data in Excel
• Creating distributions from data
• Measures of location
• Measures of variability
• Analyzing distributions
• Measures of association between two variables
Overview of Using Data: Definitions and Goals
• Data: The facts and figures collected, analyzed, and summarized for presentation and interpretation
• Variable: A characteristic or a quantity of interest that can take on different values
• Observation: A set of values corresponding to a set of variables
• Variation: The difference in a variable measured over observations
• Random variable/uncertain variable: A quantity whose values are not known with certainty
Table 2.1: Data for Dow Jones
Industrial Index Companies

Types of Data
• Population and Sample Data
• Quantitative and Categorical Data
• Cross-Sectional and Time Series Data
• Sources of Data
Population vs. Sample Data
• Population: The whole group of all elements of interest
◦ In some cases, it is not feasible to collect data from the entire population
• Sample: A subset of the population
◦ Random sampling: A sampling method used to gather a representative sample of the population data
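A minimal sketch of simple random sampling, assuming the population is the 26 lowercase letters. The sample size of 9 and the seed are arbitrary choices made only for illustration.

```python
import random
import string

# Population: every element of interest (here, the letters a through z).
population = list(string.ascii_lowercase)

# A fixed seed makes the draw repeatable; the value 42 is arbitrary.
rng = random.Random(42)

# Simple random sample without replacement; k=9 is an assumed sample size.
sample = rng.sample(population, k=9)

print(sorted(sample))
```

Because every element has the same chance of being drawn, the sample tends to be representative of the population as the sample size grows.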

Population vs. Sample Data
[Figure: The population is the full set of elements, shown as the letters a through z; the sample is a subset drawn from it, shown as the letters b, c, g, i, n, o, r, u, and y.]
Quantitative vs. Categorical Data
• Quantitative data: Data on which numeric and arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed
◦ Example: Share Price, Volume (Data for Dow Jones Industrial Index Companies)
• Categorical data (qualitative data): Data on which arithmetic operations cannot be performed
◦ Example: Industry (Data for Dow Jones Industrial Index Companies)
Cross-Sectional vs. Time Series Data
• Cross-sectional data: Data collected from several entities at the same, or approximately the same, point in time
◦ Example: Data in Table 2.1 for Dow Jones Industrial Index Companies
• Time series data: Data collected over several time periods
◦ Example: Dow Jones Index Values Since 2005
Figure 2.1: Dow Jones Index
Values Since 2005

Sources of Data
• Experimental study
◦ A variable of interest is first identified
◦ Then one or more other variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest
• Nonexperimental study or observational study
◦ Investigators observe the characteristics of interest
◦ Investigators make no attempt to control the variables of interest
◦ A survey is perhaps the most common type of observational study
Sources of Data
• Experimental study
A researcher for a pharmaceutical company wants to determine whether aspirin reduces the incidence of heart attacks. The researcher selects a sample of men and women and divides it into two groups: one group takes aspirin regularly and the other does not. After 2 years, the researcher determines the proportion of people in each group who have suffered a heart attack. It is then possible to draw a conclusion about whether aspirin is effective in reducing the likelihood of heart attacks.
Sources of Data
• Observational study
A researcher for a pharmaceutical company wants to determine whether aspirin reduces the incidence of heart attacks. The researcher selects a sample of men and women and asks each whether he or she has taken aspirin regularly over the past 2 years. Each person is also asked whether he or she has suffered a heart attack over the same period. The proportions reporting heart attacks are then compared, and a conclusion can be drawn about whether aspirin is effective in reducing the likelihood of heart attacks.
Figure 2.2: Customer Opinion
Questionnaire used by Chops City
Grill Restaurant

Figure 2.3: Data for 20 Top-Selling
Automobiles Entered into Excel with
Percent Change in Sales from 2010

Modifying Data in Excel
• Sorting and Filtering Data in Excel
• Conditional Formatting of Data in Excel
Sorting and Filtering Data in
Excel
To sort the automobiles by March 2010 sales:
◦ Step 1: Select cells A1:F21
◦ Step 2: Click the Data tab in the Ribbon
◦ Step 3: Click Sort in the Sort & Filter group
◦ Step 4: Select the check box for My data has
headers
◦ Step 5: In the first Sort by dropdown menu,
select Sales (March 2010)
◦ Step 6: In the Order dropdown menu, select
Largest to Smallest
◦ Step 7: Click OK

Figure 2.4: Using Excel’s Sort
Function to Sort the Top-Selling
Automobiles Data

Figure 2.5: Top-Selling Automobiles Data Sorted by Sales in March 2010
Sorting and Filtering Data in Excel
• Using Excel’s Filter function to see the sales of models made by Toyota:
◦ Step 1: Select cells A1:F21
◦ Step 2: Click the Data tab in the Ribbon
◦ Step 3: Click Filter in the Sort & Filter group
◦ Step 4: Click on the Filter Arrow in column B, next to Manufacturer
◦ Step 5: If all choices are checked, you can easily deselect all choices by unchecking (Select All). Then select only the check box for Toyota.
◦ Step 6: Click OK
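For readers who want to see the same two operations outside Excel, here is a minimal Python sketch of a sort (largest to smallest) and a filter (one manufacturer only). The rows, field names, and sales figures are hypothetical placeholders, not the values from the top-selling automobiles data.

```python
# Hypothetical rows: model names and sales figures are placeholders only.
rows = [
    {"model": "Model A", "manufacturer": "Toyota", "sales_march_2010": 300},
    {"model": "Model B", "manufacturer": "Honda", "sales_march_2010": 500},
    {"model": "Model C", "manufacturer": "Toyota", "sales_march_2010": 400},
    {"model": "Model D", "manufacturer": "Ford", "sales_march_2010": 200},
]

# Sort: largest to smallest by March 2010 sales.
by_sales = sorted(rows, key=lambda r: r["sales_march_2010"], reverse=True)

# Filter: keep only the rows whose manufacturer is Toyota.
toyota_only = [r for r in rows if r["manufacturer"] == "Toyota"]

print([r["model"] for r in by_sales])
print([r["model"] for r in toyota_only])
```

Neither operation changes the original list, which mirrors the way an Excel filter hides rows without deleting them.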

Figure 2.6: Top Selling Automobiles
Data Filtered to Show Only
Automobiles Manufactured by Toyota

Conditional Formatting of Data in Excel
• Makes it easy to identify data that satisfy certain conditions in a data set
Conditional Formatting of Data in Excel
Example:
• To identify the automobile models in Table 2.2 for which sales had decreased from March 2010 to March 2011:
◦ Step 1: Starting with the original data shown in Figure 2.3, select cells F1:F21
◦ Step 2: Click on the Home tab in the Ribbon
◦ Step 3: Click Conditional Formatting in the Styles group
◦ Step 4: Select Highlight Cells Rules, and click Less Than from the dropdown menu
◦ Step 5: Enter 0% in the Format cells that are LESS THAN: box
◦ Step 6: Click OK
Figure 2.7: Using Conditional Formatting
in Excel to Highlight Automobiles with
Declining Sales from March 2010

Figure 2.8: Using Conditional
Formatting in Excel to Generate Data
Bars for the Top-Selling Automobiles
Data

Conditional Formatting of Data in Excel
• The Quick Analysis button appears just outside the bottom-right corner of a group of selected cells
• It provides shortcuts for Conditional Formatting, adding Data Bars, and other operations
Creating Distributions from Data
• Frequency Distributions for Categorical Data
• Relative Frequency and Percent Frequency Distributions
• Frequency Distributions for Quantitative Data
• Histograms
• Cumulative Distributions
Frequency Distributions for Categorical Data
◦ Frequency distribution: A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes
◦ These classes are typically referred to as bins when dealing with distributions
Table 2.3: Data from a Sample
of 50 Soft Drink Purchases

Table 2.4: Frequency Distribution of Soft Drink Purchases
• The frequency distribution summarizes information about the popularity of the five soft drinks:
◦ Coca-Cola is the leader
◦ Pepsi is second
◦ Diet Coke is third, and Sprite and Dr. Pepper are tied for fourth
Figure 2.10: Creating a Frequency
Distribution for Soft Drinks Data in
Excel

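Relative Frequency and Percent Frequency Distributions
◦ Relative frequency distribution: A tabular summary of data showing the relative frequency for each bin, that is, the frequency of the bin divided by the number of observations n
◦ Percent frequency distribution: A tabular summary of data showing the percent frequency for each bin, that is, the relative frequency multiplied by 100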
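The three distributions can be sketched in a few lines of Python. The list of purchases below is a hypothetical sample of 10, not the 50 purchases in Table 2.3, so the counts are for illustration only.

```python
from collections import Counter

# Hypothetical sample of 10 purchases (not the data in Table 2.3).
purchases = ["Coca-Cola", "Pepsi", "Coca-Cola", "Diet Coke", "Sprite",
             "Coca-Cola", "Pepsi", "Dr. Pepper", "Coca-Cola", "Pepsi"]

n = len(purchases)
frequency = Counter(purchases)                       # count per bin
relative = {k: v / n for k, v in frequency.items()}  # frequency / n
percent = {k: 100 * v for k, v in relative.items()}  # relative frequency * 100

for drink, count in frequency.most_common():
    print(f"{drink:<12}{count:>3}{relative[drink]:>6.2f}{percent[drink]:>6.0f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.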

Table 2.5: Relative Frequency and
Percent Frequency Distributions of
Soft Drink Purchases

Frequency Distributions for
Quantitative Data
Three steps are necessary to define the classes for a frequency distribution with quantitative data:
1. Determine the number of non-overlapping
bins.
2. Determine the width of each bin.
3. Determine the bin limits.
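The three steps can be sketched as follows for integer-valued data. The observations and the choice of 5 bins are assumptions made for illustration (they are not the audit times in Table 2.6), and the width rule shown is one reasonable choice rather than the only one.

```python
import math

# Hypothetical integer observations (not the audit times in Table 2.6).
data = [3, 4, 7, 8, 12, 13, 14, 15, 18, 27]

# Step 1: choose the number of nonoverlapping bins (5 is an assumption).
num_bins = 5

# Step 2: choose a width large enough that num_bins integer-wide bins
# cover every value from the smallest to the largest.
width = math.ceil((max(data) - min(data) + 1) / num_bins)

# Step 3: set the bin limits; each bin covers lower..upper inclusive.
start = min(data)
bins = [(start + i * width, start + (i + 1) * width - 1) for i in range(num_bins)]

# Count the observations that fall inside each bin.
frequency = {b: sum(b[0] <= x <= b[1] for x in data) for b in bins}

for (lo, hi), count in frequency.items():
    print(f"{lo}-{hi}: {count}")
```

In practice the width and the first lower limit are often rounded to convenient values so that the bins are easy to read.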

Creating Distributions from Data
Table 2.6: Year-End Audit Times (Days)
Table 2.7: Frequency, Relative Frequency, and Percent Frequency Distributions for the Audit Time Data
Figure 2.11: Using Excel to Generate
a Frequency Distribution for Audit
Times Data

Histogram
◦ A common graphical presentation of quantitative data
◦ Constructed by placing the variable of interest on the horizontal axis and the selected frequency measure (absolute frequency, relative frequency, or percent frequency) on the vertical axis
◦ The frequency measure of each class is shown by drawing a rectangle whose base is determined by the class limits on the horizontal axis and whose height is the corresponding frequency measure
Figure 2.12: Histogram for the
Audit Time Data

Figure 2.13: Creating a Histogram for the Audit Time Data Using Data Analysis ToolPak in Excel
Figure 2.14: Completed Histogram for the
Audit Time Data Using Data Analysis ToolPak in
Excel

Creating Distributions from Data
• Histograms provide information about the shape, or form, of a distribution
• Skewness: Lack of symmetry
• Skewness is an important characteristic of the shape of a distribution
Figure 2.15: Histograms Showing
Distributions with Different Levels of
Skewness

Cumulative Distributions
• Cumulative frequency distribution: Shows the number of data items with values less than or equal to the upper class limit of each class
◦ A variation of the frequency distribution that provides another tabular summary of quantitative data
Table 2.8: Cumulative Frequency,
Cumulative Relative Frequency, and
Cumulative Percent Frequency
Distributions for the Audit Time Data

Measures of Location
• Mean (Arithmetic Mean)
• Median
• Mode
• Geometric Mean
Measures of Location
Mean/Arithmetic Mean
◦ Average value for a variable
◦ The sample mean is denoted by \(\bar{x}\): \(\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}\)
◦ \(n\) = sample size
◦ \(x_1\) = value of variable x for the first observation
◦ \(x_2\) = value of variable x for the second observation
◦ \(x_n\) = value of variable x for the nth observation
Table 2.9: Data on Home Sales in
Cincinnati, Ohio, Suburb

Computation of Sample Mean
Illustration: Computation of the mean home selling price for the sample of 12 home sales
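Written out with the 12 selling prices from Table 2.9 (the individual values are listed in sorted order in the median illustration later in this chapter), the computation takes the following form. The total in the numerator is recomputed here from those listed values.

```latex
\bar{x} = \frac{\sum x_i}{n}
        = \frac{108{,}000 + 138{,}000 + \cdots + 456{,}250}{12}
        = \frac{2{,}639{,}250}{12}
        = 219{,}937.50
```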

Measures of Location
Median
◦ Value of the item in the middle when the
data are arranged in ascending order
◦ Value of middle item, for an odd number of
observations
◦ Average of values of two middle items, for
an even number of observations

Computation of Sample Median
• Illustration: When the number of observations is odd
◦ Consider the class size data for a sample of five college classes:
46 54 42 46 32
◦ Arrange the class size data in ascending order:
32 42 46 46 54
◦ Middlemost value in the data set = 46
◦ Median is 46
Computation of Sample Median
• Illustration: When the number of observations is even
◦ Consider the data on home sales in a Cincinnati, Ohio, suburb (Table 2.9)
◦ Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
◦ The middle two values are 199,500 and 208,000
◦ Median = average of the two middle values = \(\frac{199{,}500 + 208{,}000}{2} = 203{,}750\)
Measures of Location
Mode
◦ Value that occurs most frequently in a data set
◦ Consider the class size data:
32 42 46 46 54
◦ Observe - 46 is the only value that occurs
more than once
◦ Mode is 46
◦ Multimodal data - Data contain at least two
modes
◦ Bimodal data - Data contain exactly two
modes
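The three measures can be checked for the home sales data with Python's standard `statistics` module. This is a sketch using the 12 prices as listed in this chapter for Table 2.9; `multimode` requires Python 3.8 or later.

```python
import statistics

# Home selling prices as listed in this chapter for Table 2.9.
prices = [108000, 138000, 138000, 142000, 186000, 199500,
          208000, 254000, 254000, 257500, 298000, 456250]

mean = statistics.mean(prices)        # sum of the values divided by n
median = statistics.median(prices)    # average of the two middle values (n is even)
modes = statistics.multimode(prices)  # every value tied for the highest count

print(mean, median, modes)
```

The home sales data turn out to be bimodal: two prices each occur twice, and no price occurs more often than that.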

Figure 2.16: Calculating the Mean,
Median, and Modes for the Home
Sales Data using Excel

Measures of Location
Geometric Mean
◦ nth root of the product of n values
◦ Used in analyzing growth rates or rates of change
◦ Sample geometric mean: \(\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n}\)
Table 2.10: Percentage Annual Returns and Growth Factors for the Mutual Fund Data
• Illustration: Consider the percentage annual returns and growth factors for the mutual fund data over the past 10 years
• We will determine the mean rate of growth for the fund over the 10-year period
Computation of Geometric Mean
• Solution:
◦ Product of the growth factors:
(.779)(1.287)(1.109)(1.049)(1.158)(1.055)(.630)(1.265)(1.151)(1.021) = 1.334
◦ Geometric mean of the growth factors:
\(\bar{x}_g = \sqrt[10]{1.334} = 1.029\)
◦ Conclude that annual returns grew at an average annual rate of (1.029 – 1)100% or 2.9%
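The same computation can be reproduced in Python from the ten growth factors listed in the solution. This is a sketch; `math.prod` requires Python 3.8 or later, and the standard library also offers `statistics.geometric_mean` as an alternative.

```python
import math

# Growth factors for the 10 years, as listed in the solution above.
growth_factors = [0.779, 1.287, 1.109, 1.049, 1.158,
                  1.055, 0.630, 1.265, 1.151, 1.021]

product = math.prod(growth_factors)
geometric_mean = product ** (1 / len(growth_factors))  # nth root of the product
average_rate = (geometric_mean - 1) * 100               # average growth, percent per year

print(round(product, 3), round(geometric_mean, 3), round(average_rate, 1))
```

Note that the arithmetic mean of the yearly returns would overstate the growth actually achieved, which is why the geometric mean is used for rates of change.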

Figure 2.17: Calculating the
Geometric Mean for the Mutual
Fund Data Using Excel

Measures of Variability
• Range
• Variance
• Standard Deviation
• Coefficient of Variation
Measures of Variability
Table 2.11: Annual Payouts for Two Different Investment Funds
Figure 2.18: Histograms for Payouts of Past 20 Years from Fund A and Fund B
Computation of Range
Range
• Found by subtracting the smallest value from the largest value in a data set
• Illustration: Consider the data on home sales in a Cincinnati, Ohio, suburb
◦ Largest home sales price: $456,250
◦ Smallest home sales price: $108,000
◦ Range = Largest value – Smallest value
= $456,250 – $108,000
= $348,250
• Drawback: Range is based on only two of the observations and thus is highly influenced by extreme values
Measures of Variability
Variance
• Measure of variability that utilizes all the data
• It is based on the deviation about the mean, which is the difference between the value of each observation (\(x_i\)) and the mean
• The deviations about the mean are squared while computing the variance
◦ Sample variance: \(s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\)
◦ Population variance: \(\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}\)
Table 2.12: Computation of
Deviations and Squared Deviations
about the Mean for the Class Size
Data

Computation of Sample
Variance:
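For the class size data (46, 54, 42, 46, 32, with mean 44), the sample variance works out as follows. The squared deviations are recomputed here from the five values rather than copied from Table 2.12.

```latex
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}
    = \frac{(46-44)^2 + (54-44)^2 + (42-44)^2 + (46-44)^2 + (32-44)^2}{5 - 1}
    = \frac{4 + 100 + 4 + 4 + 144}{4}
    = \frac{256}{4}
    = 64
```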

Figure 2.19: Calculating Variability
Measures for the Home Sales Data
in Excel

Measures of Variability
• Standard Deviation
◦ Positive square root of the variance
◦ Measured in the same units as the original data
◦ For a sample: \(s = \sqrt{s^2}\)
◦ For a population: \(\sigma = \sqrt{\sigma^2}\)
• Coefficient of Variation
◦ Measures the standard deviation relative to the mean
◦ Expressed as a percentage: \(\left(\frac{\text{standard deviation}}{\text{mean}} \times 100\right)\%\)
Computation of Coefficient of Variation
• Illustration:
◦ Consider the class size data:
46 54 42 46 32
◦ Mean, \(\bar{x}\) = 44
◦ Standard deviation, s = 8
◦ Coefficient of variation = \(\left(\frac{8}{44} \times 100\right)\% = 18.2\%\)
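The variability measures for the class size data can be verified with a short Python sketch. It uses the sample versions of variance and standard deviation (dividing by n - 1) from the standard `statistics` module.

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)
data_range = max(class_sizes) - min(class_sizes)  # largest value minus smallest value
variance = statistics.variance(class_sizes)       # sample variance, divides by n - 1
std_dev = statistics.stdev(class_sizes)           # positive square root of the variance
cv = std_dev / mean * 100                         # standard deviation as a percent of the mean

print(mean, data_range, variance, std_dev, round(cv, 1))
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.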

Analyzing Distributions
• Percentiles
• Quartiles
• z-Scores
• Empirical Rule
• Identifying Outliers
• Box Plots
Analyzing Distributions
Percentiles
• Value of a variable at which a specified (approximate) percentage of observations are below that value
• The pth percentile tells us the point in the data where:
◦ Approximately p percent of the observations have values less than the pth percentile
◦ Approximately (100 – p) percent of the observations have values greater than the pth percentile
Analyzing Distributions
• Steps to calculate the pth percentile:
◦ Arrange the data in ascending order (smallest to largest value)
◦ Compute k = (n + 1) × p, with p written as a decimal (for example, 0.85 for the 85th percentile)
◦ Divide k into its integer component, i, and its decimal component, d
▪ If d = 0, the value in position k of the sorted data is the pth percentile
▪ If d > 0, the percentile is between the values in positions i and i + 1 in the sorted data; to find this percentile, we must interpolate between these two values:
i. Calculate the difference between the values in positions i and i + 1 in the sorted data set; we define this difference between the two values as m
ii. Multiply this difference by d: t = m × d
iii. To find the pth percentile, add t to the value in position i of the sorted data
Analyzing Distributions
• Illustration
◦ To determine the 85th percentile for the home sales data in Table 2.9:
1. Arrange the data in ascending order:
108,000 138,000 138,000 142,000 186,000 199,500
208,000 254,000 254,000 257,500 298,000 456,250
2. Compute k = (n + 1) × p = (12 + 1) × 0.85 = 11.05
3. Dividing 11.05 into the integer and decimal components gives us i = 11 and d = 0.05
Because d > 0, interpolate between the values in the 11th and 12th positions in the sorted data
Analyzing Distributions
Illustration (contd.)
• To determine the 85th percentile for the home sales data in Table 2.9:
◦ The value in the 11th position is 298,000
◦ The value in the 12th position is 456,250
m = 456,250 – 298,000 = 158,250
t = m × d = 158,250 × 0.05 = 7,912.5
85th percentile = 298,000 + 7,912.5 = 305,912.5
$305,912.50 represents the 85th percentile of the home sales data
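The interpolation procedure can be sketched as a small Python function. The function name `percentile` and the clamping at the two ends of the data are assumptions added for illustration; the slides do not say what to do when k falls outside positions 1 to n.

```python
def percentile(values, p):
    """Return the pth percentile (p between 0 and 100) using k = (n + 1) * p / 100."""
    data = sorted(values)       # arrange in ascending order
    n = len(data)
    k = (n + 1) * p / 100
    i = int(k)                  # integer component
    d = k - i                   # decimal component
    # Assumption: clamp to the smallest or largest value when k is outside 1..n.
    if i < 1:
        return data[0]
    if i >= n:
        return data[-1]
    lower = data[i - 1]         # value in position i (positions are 1-based)
    m = data[i] - lower         # difference to the value in position i + 1
    return lower + m * d        # when d == 0 this is simply the value in position i


prices = [108000, 138000, 138000, 142000, 186000, 199500,
          208000, 254000, 254000, 257500, 298000, 456250]

print(round(percentile(prices, 85), 2))
```

Other software may use a different positioning rule, so percentile values from different tools can differ slightly for small data sets.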

Analyzing Distributions
Quartiles
• When the data are divided into four equal parts:
◦ Each part contains approximately 25% of the observations
◦ Division points are referred to as quartiles
▪ \(Q_1\) = first quartile, or 25th percentile
▪ \(Q_2\) = second quartile, or 50th percentile (also the median)
▪ \(Q_3\) = third quartile, or 75th percentile
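As a sketch, the three quartiles of the home sales data can be obtained with `statistics.quantiles` (Python 3.8 or later). Its `"exclusive"` method places each cut point at position (n + 1) × p and interpolates, which for these data gives the same positions as the percentile steps described in this chapter.

```python
import statistics

# Home selling prices as listed in this chapter for Table 2.9.
prices = [108000, 138000, 138000, 142000, 186000, 199500,
          208000, 254000, 254000, 257500, 298000, 456250]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(prices, n=4, method="exclusive")

print(q1, q2, q3)
```

The second quartile equals the median computed earlier for the same data, as expected.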

Analyzing Distributions
z-score
• Measures the relative location of a value in the data set
• Helps to determine how far a particular value is from the mean relative to the data set’s standard deviation
• Also called the standardized value
• If \(x_1, x_2, \ldots, x_n\) is a sample of n observations:
\(z_i = \frac{x_i - \bar{x}}{s}\)
◦ \(z_i\) = z-score for \(x_i\)
◦ \(\bar{x}\) = sample mean
◦ s = sample standard deviation
Table 2.13: z-Scores for the Class Size Data
• For the class size data, \(\bar{x}\) = 44 and s = 8
◦ For observations with a value > mean, z-score > 0
◦ For observations with a value < mean, z-score < 0
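A short sketch of the z-score computation for the five class sizes, using the sample mean and sample standard deviation from the standard `statistics` module:

```python
import statistics

class_sizes = [46, 54, 42, 46, 32]

mean = statistics.mean(class_sizes)   # sample mean
s = statistics.stdev(class_sizes)     # sample standard deviation

# z-score: distance from the mean measured in standard deviations.
z_scores = [(x - mean) / s for x in class_sizes]

for x, z in zip(class_sizes, z_scores):
    print(x, round(z, 2))
```

Values above the mean of 44 get positive z-scores and values below it get negative ones, matching the two observations listed above.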

Figure 2.20: Calculating z-
Scores for the Home Sales
Data in Excel

Example: Which Is the Better Offer?
Suppose that two graduating seniors, one a marketing major and one an accounting major, are comparing job offers. The accounting major has an offer for $45,000 per year, and the marketing student has an offer for $42,000 per year. Summary information about the distribution of offers follows:
Accounting: mean = $46,000, standard deviation = $1,500
Marketing: mean = $42,500, standard deviation = $1,000
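Example: Which Is the Better Offer?
• Accounting: z-score = \(\frac{45{,}000 - 46{,}000}{1{,}500} = -0.67\). This offer is 0.67 standard deviations below the mean.
• Marketing: z-score = \(\frac{42{,}000 - 42{,}500}{1{,}000} = -0.5\). This offer is 0.5 standard deviations below the mean.
• Relative to the other offers in its own field, the marketing offer is therefore the stronger of the two, even though its dollar amount is lower.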
Analyzing Distributions
Empirical Rule
• For data having a bell-shaped distribution:
◦ Approximately 68% of the data values are within 1 standard deviation of the mean
◦ Approximately 95% of the data values are within 2 standard deviations of the mean
◦ Almost all the data values are within 3 standard deviations of the mean
Identifying Outliers
• Outliers: Extreme values in a data set
• They can be identified using standardized values (z-scores)
• Any data value with a z-score less than –3 or greater than +3 is an outlier
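The z-score rule for outliers can be sketched in Python and applied to the home sales prices listed in this chapter for Table 2.9. The threshold of 3 follows the rule stated above.

```python
import statistics

# Home selling prices as listed in this chapter for Table 2.9.
prices = [108000, 138000, 138000, 142000, 186000, 199500,
          208000, 254000, 254000, 257500, 298000, 456250]

mean = statistics.mean(prices)
s = statistics.stdev(prices)  # sample standard deviation

# Flag any value whose z-score is below -3 or above +3.
outliers = [x for x in prices if abs((x - mean) / s) > 3]

largest_z = (max(prices) - mean) / s
print(outliers, round(largest_z, 2))
```

Even the highest price, which looks extreme next to the others, has a z-score below 3, so this rule flags no outliers in the home sales data.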
Analyzing Distributions
Box Plots
• Graphical summary of the distribution of data
• Developed from the quartiles for a data set
Figure 2.22: Box Plot for the Home Sales Data
Figure 2.23: Box Plots Comparing Home
Sale Prices in Different Communities

Measures of Association Between Two Variables
• Scatter Charts
• Covariance
• Correlation Coefficient
Table 2.14: Data for Bottled Water
Sales at Queensland Amusement Park
for a Sample of 14 Summer Days

Figure 2.24: Scatter Chart Showing the Positive Linear Relation Between Sales and High Temperatures
Measures of Association Between Two Variables
• Scatter Charts:
◦ Useful graph for analyzing the relationship between two variables
◦ The scatter chart in Figure 2.24 also suggests that a straight line could be used as an approximation for the relationship between the two variables
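Measures of Association Between Two Variables
• Covariance: Descriptive measure of the linear association between two variables
◦ Sample covariance for a sample of size n with the observations \((x_1, y_1)\), \((x_2, y_2)\), and so on: \(s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}\)
◦ Population covariance: \(\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}\)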
Table 2.15: Sample Covariance Calculations for
Daily High Temperature and Bottled Water Sales
at Queensland Amusement Park

Figure 2.25: Calculating Covariance and
Correlation Coefficient for Bottled Water
Sales Using Excel

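Measures of Association Between Two Variables
• Correlation coefficient: Measures the strength of the linear relationship between two variables
◦ Not affected by the units of measurement for x and y
◦ Sample correlation coefficient, denoted by \(r_{xy}\):
▪ \(r_{xy} = \frac{s_{xy}}{s_x s_y}\)
▪ \(s_{xy}\) = sample covariance = \(\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}\)
▪ \(s_x\) = sample standard deviation of x = \(\sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}\)
▪ \(s_y\) = sample standard deviation of y = \(\sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}\)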
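The sample covariance and sample correlation coefficient can be sketched as two small Python functions. The function names and the five (x, y) pairs are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the bottled water data in Table 2.14.

```python
import math


def sample_covariance(x, y):
    """Sum of products of deviations about the means, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


# Hypothetical paired observations (not the data in Table 2.14).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(sample_covariance(x, y), round(sample_correlation(x, y), 4))
```

Rescaling x or y (for example, changing units) changes the covariance but leaves the correlation coefficient unchanged, which is why the correlation is easier to interpret.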

Interpretation of Correlation Coefficient
–1 ≤ r ≤ +1

r value    Relationship between the x and y variables
< 0        Negative linear
Near 0     No linear relationship
> 0        Positive linear
Figure 2.26: Scatter Diagrams and Associated Covariance Values for Different Variable Relationships
(a) Covariance positive: x and y are positively linearly related
(b) Covariance approximately 0: x and y are not linearly related
(c) Covariance negative: x and y are negatively linearly related
Computation of Correlation Coefficient
Illustration
• To determine the sample correlation coefficient for bottled water sales at Queensland Amusement Park:
\(r_{xy} = \frac{s_{xy}}{s_x s_y} = 0.93\)
• There is a very strong linear relationship between high temperature and sales
Figure 2.27: Example of Nonlinear
Relationship Producing a Correlation
Coefficient Near Zero
