CH 12
(Correlation and Regression)
What is Correlation Analysis?
Used to report the relationship between two variables
CORRELATION ANALYSIS a descriptive statistic that summarizes the
relationship between two variables with a single number
$$r = \frac{6672}{(15-1)(42.76)(12.89)} = 0.865$$
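The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming 6672 is the sum of cross-products of deviations and that 42.76 and 12.89 are the sample standard deviations of the two variables; the variable names are illustrative only.

```python
# Summary values taken from the worked example above.
n = 15             # number of paired observations
sum_cross = 6672   # assumed: sum of (x - x_bar) * (y - y_bar)
s_x = 42.76        # assumed: sample standard deviation of x
s_y = 12.89        # assumed: sample standard deviation of y

r = sum_cross / ((n - 1) * s_x * s_y)
print(round(r, 3))  # 0.865
```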
Correlation Coefficient Example
The Applewood Auto Group’s marketing department believes younger buyers
purchase vehicles on which lower profits are earned and older buyers purchase
vehicles on which higher profits are earned. They would like to use this information as
part of an upcoming advertising campaign to try to attract older buyers. Develop a
scatter diagram and then determine the correlation coefficient. Would this be a useful
advertising feature?
To illustrate, the same data are plotted in the three charts below
Least Squares Regression Line
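This is the equation of a line: $\hat{y} = a + bx$, where a is the y-intercept and b is the slope
The first step is to find the slope of the least squares regression line: $b = r \left( s_y / s_x \right)$
Next, find the intercept: $a = \bar{y} - b\bar{x}$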
So if a salesperson makes 100 calls, he or she can expect to sell 46.0432 copiers
Drawing the Regression Line
The least squares equation can be drawn on the scatter diagram. For example, the fifth
sales representative is Jeff Hall. He made 164 calls. His estimated number of copiers sold
is 62.7344. The point x = 164 and $\hat{y}$ = 62.7344 is located by moving to 164 on the x-axis
and then going vertically to 62.7344. The other points on the regression equation can be
determined by substituting a particular value of x into the regression equation and
calculating $\hat{y}$.
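The two predictions quoted above (46.0432 at 100 calls and 62.7344 at 164 calls) can be reproduced with a short sketch. It assumes the slope comes from the standard relation b = r(s_y / s_x), using the values from the correlation example. Because the sample means are not given in this section, the intercept is backed out from the quoted prediction at x = 100 rather than computed from the data. The function name `predict` is illustrative.

```python
r, s_x, s_y = 0.865, 42.76, 12.89   # values from the correlation example

b = round(r * s_y / s_x, 4)   # slope, rounded to four decimals: 0.2608
a = 46.0432 - b * 100         # intercept implied by the quoted prediction at x = 100

def predict(calls):
    """Estimated copiers sold for a given number of sales calls."""
    return a + b * calls

print(round(a, 4))             # 19.9632
print(round(predict(164), 4))  # 62.7344
```

Any other point on the line is obtained the same way, by passing a different number of calls to `predict`.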
Regression Equation Slope Test
For a regression equation, the slope is tested for
significance
We test the hypothesis that the slope of the line in the
population is 0
If we do not reject the null hypothesis, we cannot conclude
there is a linear relationship between the two variables
We begin with the hypothesis statements
H0: β = 0
H1: β ≠ 0
Regression Equation Slope Test Example
Recall the North American Copier Sales example. We identified the slope as b and it is
our estimate of the slope of the population, β. We conduct a hypothesis test.
Step 1: State the null and alternate hypotheses
H0: β ≤ 0
H1: β > 0
Step 2: Select the level of significance; we use .05
Step 3: Select the test statistic, t
Step 4: Formulate the decision rule: with n − 2 = 13 degrees of freedom, reject H0 if t > 1.771
Step 5: Make the decision: t = 6.205 exceeds 1.771, so reject H0
Step 6: Interpret: the number of sales calls is useful in estimating copier sales
In the regression output (columns: coefficient, se, t, p), b is .2606 and its standard error is .0420.
$$s_b = \frac{s_e}{\sqrt{(n-1) \times s_x^2}}$$
$$s_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}}$$
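The test statistic in Step 5 is the slope divided by its standard error. A minimal sketch using only the values reported above (the variable names are illustrative):

```python
b = 0.2606       # slope from the regression output
s_b = 0.0420     # standard error of the slope
t_crit = 1.771   # one-tailed critical value from Step 4 (alpha = .05)

t = b / s_b
reject_h0 = t > t_crit
print(round(t, 3), reject_h0)  # 6.205 True
```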
If the standard error of estimate is small, this indicates that the data are relatively
close to the regression line and the regression equation can be used. If it is large,
the data are widely scattered around the regression line and the regression
equation will not provide a precise estimate of y.
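To see how the standard error of estimate is computed from raw data, here is a small sketch. The copier data set is not reproduced in this section, so the example borrows the four workload/satisfaction pairs from the hypothesis-testing example later in the chapter. The helper names `fit_line` and `standard_error_of_estimate` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return the least squares intercept a and slope b for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = ss_xy / ss_x
    return y_bar - b * x_bar, b

def standard_error_of_estimate(xs, ys):
    """s_e = sqrt( sum of squared residuals / (n - 2) )."""
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))

workload = [2, 3, 4, 3]        # X, from the later Acme Co. example
satisfaction = [3, 2, 5, 5]    # Y, from the later Acme Co. example

s_e = standard_error_of_estimate(workload, satisfaction)
print(round(s_e, 3))  # 1.541
```

Satisfaction in this sample only ranges from 2 to 5, so a standard error of about 1.5 is large relative to that range. This fits the weak correlation found for the same data.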
Coefficient of Determination
COEFFICIENT OF DETERMINATION The proportion of the total variation in
the dependent variable Y that is explained, or accounted for, by the variation in
the independent variable X.
It ranges from 0 to 1.0
It is the square of the correlation coefficient
It is found from the following formula: $r^2 = \dfrac{SSR}{SS\ \text{total}} = 1 - \dfrac{SSE}{SS\ \text{total}}$
In the North American Copier Sales example, the correlation coefficient was
.865; just square that, $(.865)^2 = .748$; this is the coefficient of determination
This means 74.8% of the variation in the number of copiers sold is explained by
the variation in sales calls
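The squaring step is simple enough to verify directly. A minimal sketch, with illustrative variable names:

```python
r = 0.865            # correlation coefficient from the copier example
r_squared = r ** 2   # coefficient of determination

print(round(r_squared, 3))   # 0.748
print(f"{r_squared:.1%}")    # 74.8%
```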
Relationships among $r$, $r^2$, and $s_{y \cdot x}$
Recall the standard error of estimate measures how close
the actual values are to the regression line
When it is small, the two variables are closely related
The correlation coefficient measures the strength of the
linear association between two variables
When points on the scatter diagram are close to the
line, the correlation coefficient tends to be large
Therefore, the correlation coefficient and the standard
error of estimate are inversely related
As noted earlier, the coefficient of determination is the
correlation coefficient squared
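The inverse relationship can be made explicit. The sketch below assumes the usual notation, which these slides do not define: SSE is the sum of squared residuals and SS total is the total variation in y.

```latex
\begin{align*}
r^2 &= 1 - \frac{SSE}{SS_{\text{total}}}
  \quad\Longrightarrow\quad SSE = SS_{\text{total}}\,(1 - r^2) \\
s_{y \cdot x} &= \sqrt{\frac{SSE}{n-2}}
  = \sqrt{\frac{SS_{\text{total}}\,(1 - r^2)}{n-2}}
\end{align*}
```

For a fixed SS total and n, the standard error of estimate shrinks toward 0 as |r| approaches 1, and it is largest when r = 0.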
Hypothesis testing using correlations:
Example
Acme Co. is researching whether there is a linear
relationship between workload and employee
satisfaction. Workload and self-reported employee
satisfaction were recorded for 4 employees. Data are
provided below:
Workload Satisfaction
2 3
3 2
4 5
3 5
RQ: Is there a linear relationship between workload
and employee satisfaction?
Hypotheses:
H0: ρ = 0
H1: ρ ≠ 0
α = .05
t(2)crit = ± 4.303
Workload (X)   Satisfaction (Y)   X²         Y²         XY
2              3                  4          9          6
3              2                  9          4          6
4              5                  16         25         20
3              5                  9          25         15
ΣX = 12        ΣY = 15            ΣX² = 38   ΣY² = 63   ΣXY = 47
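$$r = \frac{47 - \frac{12 \times 15}{4}}{\sqrt{\left(38 - \frac{12^2}{4}\right)\left(63 - \frac{15^2}{4}\right)}} = .544331$$
$$t = \frac{.544331\sqrt{4-2}}{\sqrt{1 - .544331^2}} = .917663$$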
r(2) = .54, t(2) = .92, p > .05
Because |t| = .92 is less than 4.303, retain the null. The correlation
was not statistically significant. There is insufficient evidence of a
linear relationship between workload and employee satisfaction.
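The worked example can be reproduced from the raw data. This is a minimal sketch using the same computational formulas for r and t; the variable names are illustrative, and the critical value is the one stated above rather than looked up in code.

```python
import math

workload = [2, 3, 4, 3]        # X
satisfaction = [3, 2, 5, 5]    # Y
n = len(workload)

# Column totals from the computation table.
sum_x, sum_y = sum(workload), sum(satisfaction)
sum_x2 = sum(x * x for x in workload)
sum_y2 = sum(y * y for y in satisfaction)
sum_xy = sum(x * y for x, y in zip(workload, satisfaction))

# Correlation coefficient via the computational formula.
numerator = sum_xy - sum_x * sum_y / n
denominator = math.sqrt((sum_x2 - sum_x**2 / n) * (sum_y2 - sum_y**2 / n))
r = numerator / denominator

# t statistic with n - 2 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
t_crit = 4.303                 # two-tailed, alpha = .05, df = 2
significant = abs(t) > t_crit

print(round(r, 6), round(t, 6), significant)  # 0.544331 0.917663 False
```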
Learning check
Acme Co. is interested in whether there is a linear
relationship between customer satisfaction and number
of purchases. Customer satisfaction and number of
purchases were recorded for 5 customers. If there is a
linear relationship, calculate the predicted number of
purchases if customer satisfaction is 2.5. Data are provided
below:
Satisfaction # of Purchases
3 1
3 3
4 5
4 5
4 5
Learning check
RQ: Is there a linear relationship between satisfaction and
number of purchases?
Hypotheses:
H0: ρ = 0
H1: ρ ≠ 0
α = .05
t(3)crit = ±3.182
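One way to check your work on this exercise is the sketch below. It assumes the same t test as in the earlier example and an ordinary least squares line for the prediction. The variable names are illustrative, and the critical value is the one stated above.

```python
import math

satisfaction = [3, 3, 4, 4, 4]   # X
purchases = [1, 3, 5, 5, 5]      # Y
n = len(satisfaction)

x_bar = sum(satisfaction) / n
y_bar = sum(purchases) / n
ss_x = sum((x - x_bar) ** 2 for x in satisfaction)
ss_y = sum((y - y_bar) ** 2 for y in purchases)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(satisfaction, purchases))

# Correlation and its t statistic (df = n - 2 = 3).
r = ss_xy / math.sqrt(ss_x * ss_y)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
significant = abs(t) > 3.182

# Least squares line, then the prediction at satisfaction = 2.5.
b = ss_xy / ss_x
a = y_bar - b * x_bar
predicted = a + b * 2.5

print(round(r, 3), round(t, 3), significant, round(predicted, 2))
# 0.919 4.025 True 0.5
```

Note that a satisfaction score of 2.5 lies outside the observed range of 3 to 4, so the prediction is an extrapolation and should be read with caution.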