Chapter 12

Uploaded by

Osama Samha
Copyright
© © All Rights Reserved
Chapter 12

Correlation and Regression

Introduction
• In addition to hypothesis testing and confidence intervals, inferential statistics involves determining whether a relationship exists between two or more numerical or quantitative variables.

Introduction
• Correlation is a statistical method used to determine whether a linear relationship between variables exists.

• Regression is a statistical method used to describe the nature of the relationship between variables, that is, positive or negative, linear or nonlinear.

Introduction
• The purpose of this chapter is to answer these questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?

Introduction
1. Are two or more variables related?
2. If so, what is the strength of the relationship?

To answer these two questions, statisticians use the correlation coefficient, a numerical measure that indicates whether two or more variables are related and how strong the relationship between or among the variables is.

Introduction
3. What type of relationship exists?

There are two types of relationships: simple and multiple.

In a simple relationship, there are two variables: an independent variable (predictor variable) and a dependent variable (response variable).

In a multiple relationship, two or more independent variables are used to predict one dependent variable.

Introduction
4. What kind of predictions can be made from the relationship?

Predictions are made daily in all areas. Examples include weather forecasting, stock market analyses, sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate than others because the underlying relationships differ in strength: the stronger the relationship between variables, the more accurate the prediction.

10.1 Scatter Plots and Correlation
• A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.

Example 10-1: Car Rental Companies
Construct a scatter plot for the data shown for car rental companies in the United States for a recent year.

Step 1: Draw and label the x and y axes.
Step 2: Plot each point on the graph.

[Figure: scatter plot of the car rental data. Positive Relationship]
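
The two steps above can be sketched in text mode. This is only an illustration, not the plot from the slides: the helper name `text_scatter` and the grid size are assumptions, and the data are the car rental figures tabulated in Example 10-4. Each pair (x, y) is scaled onto a character grid, with x increasing to the right and y increasing upward.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a text-mode scatter plot of the (x, y) pairs as one string.

    Hypothetical helper for illustration; assumes xs and ys each contain
    at least two distinct values (otherwise the scaling divides by zero).
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Step 2: place each ordered pair on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is the top of the plot
    # Step 1: draw the axes (y axis on the left, x axis at the bottom).
    lines = ["|" + "".join(row) for row in grid]
    lines.append("+" + "-" * width)
    return "\n".join(lines)


cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x: cars (in 10,000s)
income = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]      # y: income (in billions)
plot = text_scatter(cars, income)
print(plot)
```

The points rise from lower left to upper right, which is the pattern of a positive relationship.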

Example 10-2: Absences/Final Grades
Construct a scatter plot for the data obtained in a study on the number of absences and the final grades of seven randomly selected students from a statistics class.

Step 1: Draw and label the x and y axes.
Step 2: Plot each point on the graph.

[Figure: scatter plot of absences versus final grades. Negative Relationship]

Example 10-3: Exercise/Milk Intake
Construct a scatter plot for the data obtained in a study on the number of hours that nine people exercise each week and the amount of milk (in ounces) each person consumes per week.

Step 1: Draw and label the x and y axes.
Step 2: Plot each point on the graph.

[Figure: scatter plot of exercise hours versus milk intake. Very Weak Relationship]

Correlation
• The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.
• There are several types of correlation coefficients. The one explained in this section is called the Pearson product moment correlation coefficient (PPMC).
• The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ.

Correlation
• The range of the correlation coefficient is from −1 to +1.
• If there is a strong positive linear relationship between the variables, the value of r will be close to +1.
• If there is a strong negative linear relationship between the variables, the value of r will be close to −1.


Correlation Coefficient
The formula for the correlation coefficient is

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

where n is the number of data pairs.
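
A minimal Python sketch of this formula follows. The function name `pearson_r` is an illustrative choice, not something defined in the slides; the data are the car rental figures used in this chapter's examples.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r from the sums formula."""
    n = len(xs)                                   # number of data pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x: cars (in 10,000s)
income = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]      # y: income (in billions)
r_cars = pearson_r(cars, income)
print(round(r_cars, 3))
```

The sketch assumes both variables actually vary; if all x values (or all y values) are equal, the denominator is zero and r is undefined.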

Example 10-4: Car Rental Companies
Compute the correlation coefficient for the data in Example 10–1.

Company   Cars x (in 10,000s)   Income y (in billions)   xy       x²        y²
A         63.0                  7.0                      441.00   3969.00   49.00
B         29.0                  3.9                      113.10    841.00   15.21
C         20.8                  2.1                       43.68    432.64    4.41
D         19.1                  2.8                       53.48    364.81    7.84
E         13.4                  1.4                       18.76    179.56    1.96
F          8.5                  1.5                       12.75     72.25    2.25
Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67

Example 10-4: Car Rental Companies
Compute the correlation coefficient for the data in Example 10–1.
Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

$$ r = \frac{(6)(682.77) - (153.8)(18.7)}{\sqrt{\left[(6)(5859.26) - (153.8)^2\right]\left[(6)(80.67) - (18.7)^2\right]}} $$

r = 0.982 (strong positive relationship)

Example 10-5: Absences/Final Grades
Compute the correlation coefficient for the data in Example 10–2.

Student   Number of absences, x   Final grade y (pct.)   xy     x²     y²
A          6                      82                     492     36    6,724
B          2                      86                     172      4    7,396
C         15                      43                     645    225    1,849
D          9                      74                     666     81    5,476
E         12                      58                     696    144    3,364
F          5                      90                     450     25    8,100
G          8                      78                     624     64    6,084
Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579, Σy² = 38,993

Example 10-5: Absences/Final Grades
Compute the correlation coefficient for the data in Example 10–2.
Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579, Σy² = 38,993, n = 7

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

$$ r = \frac{(7)(3745) - (57)(511)}{\sqrt{\left[(7)(579) - (57)^2\right]\left[(7)(38{,}993) - (511)^2\right]}} $$

r = −0.944 (strong negative relationship)
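
As a cross-check on this result, r can also be computed from deviations about the means, which is algebraically equivalent to the sums formula. This deviation form is not shown in the slides, and the function name `pearson_r_deviation` is hypothetical; the data are the seven students from Example 10-5.

```python
from math import sqrt


def pearson_r_deviation(xs, ys):
    """Pearson r from deviations about the sample means.

    Algebraically equivalent to the sums formula used in the slides.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


absences = [6, 2, 15, 9, 12, 5, 8]        # x: number of absences
grades = [82, 86, 43, 74, 58, 90, 78]     # y: final grade (pct.)
r_grades = pearson_r_deviation(absences, grades)
print(round(r_grades, 3))
```

The negative sign reflects that grades tend to fall as absences rise.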

Example 10-6: Exercise/Milk Intake
Compute the correlation coefficient for the data in Example 10–3.

Subject   Hours, x   Amount, y   xy     x²     y²
A          3         48          144      9    2,304
B          0          8            0      0       64
C          2         32           64      4    1,024
D          5         64          320     25    4,096
E          8         10           80     64      100
F          5         32          160     25    1,024
G         10         56          560    100    3,136
H          2         72          144      4    5,184
I          1         48           48      1    2,304
Σx = 36, Σy = 370, Σxy = 1,520, Σx² = 232, Σy² = 19,236

Example 10-6: Exercise/Milk Intake
Compute the correlation coefficient for the data in Example 10–3.
Σx = 36, Σy = 370, Σxy = 1520, Σx² = 232, Σy² = 19,236, n = 9

$$ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}} $$

$$ r = \frac{(9)(1520) - (36)(370)}{\sqrt{\left[(9)(232) - (36)^2\right]\left[(9)(19{,}236) - (370)^2\right]}} $$

r = 0.067 (very weak relationship)
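
To make the arithmetic behind this value easier to follow, the substitution can be evaluated step by step with n = 9 data pairs and the column sums from the table. Intermediate values are rounded.

$$ n(\sum xy) - (\sum x)(\sum y) = 13{,}680 - 13{,}320 = 360 $$

$$ n(\sum x^2) - (\sum x)^2 = 2088 - 1296 = 792 $$

$$ n(\sum y^2) - (\sum y)^2 = 173{,}124 - 136{,}900 = 36{,}224 $$

$$ r = \frac{360}{\sqrt{(792)(36{,}224)}} = \frac{360}{\sqrt{28{,}689{,}408}} \approx \frac{360}{5356.25} \approx 0.067 $$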

Hypothesis Testing
• In hypothesis testing, one of the following is true:
H0: ρ = 0   This null hypothesis means that there is no correlation between the x and y variables in the population.
H1: ρ ≠ 0   This alternative hypothesis means that there is a significant correlation between the variables in the population.

t Test for the Correlation Coefficient

$$ t = r\sqrt{\frac{n-2}{1-r^2}} $$

with degrees of freedom equal to n − 2.
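
A short sketch of this test value in Python follows. The function name `t_statistic` and its input checks are illustrative assumptions; the sample inputs r = 0.982 and n = 6 come from the car rental example.

```python
from math import sqrt


def t_statistic(r, n):
    """Test value t for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need at least 3 data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    return r * sqrt((n - 2) / (1 - r ** 2))


t_cars = t_statistic(0.982, 6)   # r and n from the car rental data
print(round(t_cars, 1))
```

Because t carries the sign of r, a negative correlation gives a negative test value, which is why the test is two-tailed.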

Example 10-7: Car Rental Companies
Test the significance of the correlation coefficient found in Example 10–4. Use α = 0.05 and r = 0.982.

Step 1: State the hypotheses.
H0: ρ = 0 and H1: ρ ≠ 0

Step 2: Find the critical value.
Since α = 0.05 and there are 6 – 2 = 4 degrees of freedom, the critical values obtained from the t table are ±2.776.

Example 10-7: Car Rental Companies
Step 3: Compute the test value.

$$ t = r\sqrt{\frac{n-2}{1-r^2}} = 0.982\sqrt{\frac{6-2}{1-(0.982)^2}} = 10.4 $$

Step 4: Make the decision.
Reject the null hypothesis, since the test value 10.4 is greater than the critical value 2.776.

Step 5: Summarize the results.
There is a significant relationship between the number of cars a rental agency owns and its annual income.

Example 10-8: Exercise/Milk Intake
Using Table I, test the significance of the correlation coefficient r = 0.067, from Example 10–6, at α = 0.01.

Step 1: State the hypotheses.
H0: ρ = 0 and H1: ρ ≠ 0

There are 9 – 2 = 7 degrees of freedom. The value in Table I when α = 0.01 is 0.798.
For a significant relationship, r must be greater than 0.798 or less than −0.798. Since r = 0.067, do not reject the null hypothesis.
Hence, there is not enough evidence to say that there is a significant linear relationship between the variables.
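
The decision rule above can be sketched in Python. The function names are hypothetical. The sketch also shows how a critical r relates to a critical t value by solving the t formula for r; the value 3.499 (two-tailed, α = 0.01, 7 degrees of freedom) is taken from a standard t table and is an assumption, since the slides do not quote it.

```python
from math import sqrt


def critical_r(t_critical, df):
    """Critical |r| implied by a two-tailed critical t with df = n - 2.

    Obtained by solving t = r * sqrt(df / (1 - r**2)) for r.
    """
    return t_critical / sqrt(t_critical ** 2 + df)


def is_significant(r, r_critical):
    """Two-tailed rule: significant when r lies beyond +/- r_critical."""
    return abs(r) > r_critical


# Assumed t-table value (not quoted in the slides): alpha = 0.01, df = 7.
r_crit = critical_r(3.499, 7)
print(round(r_crit, 3))                # close to the Table I value 0.798
print(is_significant(0.067, 0.798))    # exercise/milk data: not significant
```

The same rule applied to the car rental data (r = 0.982, critical t = 2.776 with 4 degrees of freedom) leads to rejecting the null hypothesis, in agreement with Example 10-7.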

10.2 Regression
• If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data's line of best fit.

Regression
• Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
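
The following sketch illustrates what "at a minimum" means. It compares the sum of squared vertical distances for the line y′ = 0.396 + 0.106x (the rounded coefficients from Example 10-9) with a few nearby lines. The function name and the sizes of the perturbations are arbitrary choices made for illustration.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x: cars (in 10,000s)
income = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]      # y: income (in billions)

a, b = 0.396, 0.106   # rounded intercept and slope from Example 10-9
best = sum_squared_errors(cars, income, a, b)

# Shift the intercept and slope slightly in every direction (arbitrary steps).
nearby = [
    sum_squared_errors(cars, income, a + da, b + db)
    for da in (-0.1, 0.1)
    for db in (-0.01, 0.01)
]
print(best, min(nearby))
```

Every perturbed line has a larger sum of squares than the regression line, which is the sense in which that line fits best.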

Regression Line y′ = a + bx

$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} $$

$$ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} $$

where
a = y′ intercept
b = the slope of the line.
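
These two formulas translate directly into code. The sketch below is one possible Python rendering; the function name `regression_line` is an assumption, and the car rental data are used only to check the result.

```python
def regression_line(xs, ys):
    """Return (a, b) for the line y' = a + b*x using the sums formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2    # shared by a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x: cars (in 10,000s)
income = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]      # y: income (in billions)
a, b = regression_line(cars, income)
print(round(a, 3), round(b, 3))
```

The denominator is zero when all x values are equal, in which case no regression line of this form exists.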

Example 10-9: Car Rental Companies
Find the equation of the regression line for the data in Example 10–4, and graph the line on the scatter plot.
Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6

$$ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(18.7)(5859.26) - (153.8)(682.77)}{6(5859.26) - (153.8)^2} = 0.396 $$

$$ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{6(682.77) - (153.8)(18.7)}{6(5859.26) - (153.8)^2} = 0.106 $$

$$ y' = a + bx \;\rightarrow\; y' = 0.396 + 0.106x $$
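
As a cross-check that is not part of the original slides, the intercept can also be obtained from the sample means, using the standard identity that the least-squares line passes through the point of means. With the slope before rounding, b ≈ 0.106125, and intermediate values rounded:

$$ \bar{x} = \frac{153.8}{6} \approx 25.6333, \qquad \bar{y} = \frac{18.7}{6} \approx 3.1167 $$

$$ a = \bar{y} - b\,\bar{x} \approx 3.1167 - (0.106125)(25.6333) \approx 3.1167 - 2.7203 \approx 0.396 $$

This agrees with the value of a found from the sums formula.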

Example 10-9: Car Rental Companies
Find two points to sketch the graph of the regression line.

Use any x values between 10 and 60. For example, let x equal 15 and 40. Substitute in the equation and find the corresponding y′ value.

$$ y' = 0.396 + 0.106(15) = 1.986 $$

$$ y' = 0.396 + 0.106(40) = 4.636 $$

Plot (15, 1.986) and (40, 4.636), and sketch the resulting line.

Example 10-9: Car Rental Companies
[Figure: scatter plot of the car rental data with the regression line y′ = 0.396 + 0.106x drawn through the points (15, 1.986) and (40, 4.636)]

Example 10-11: Car Rental Companies
Use the equation of the regression line to predict the income of a car rental agency that has 200,000 automobiles.

Since x is measured in units of 10,000 cars, x = 20 corresponds to 200,000 automobiles.

$$ y' = 0.396 + 0.106(20) = 2.516 $$

Hence, when a rental agency has 200,000 automobiles, its income will be approximately $2.516 billion.
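
The prediction step, including the unit conversion, can be sketched as follows. The constant and function names are hypothetical, and the coefficients are the rounded values from Example 10-9.

```python
A, B = 0.396, 0.106        # rounded intercept and slope from Example 10-9
CARS_PER_UNIT = 10_000     # x is measured in units of 10,000 cars


def predict_income_billions(num_cars):
    """Predicted annual income (in billions) for an agency with num_cars cars."""
    x = num_cars / CARS_PER_UNIT    # convert a raw car count to the x scale
    return A + B * x


prediction = predict_income_billions(200_000)
print(round(prediction, 3))
```

The x values in the data run from 8.5 to 63, so x = 20 lies inside the observed range; predictions for fleets far outside that range would be less trustworthy.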
