Chapter 10 Regression and Correlation 4

Uploaded by Amber Dela Cruz

Correlation and Regression Analysis
Correlation

Correlation refers to the departure of two random variables from independence.

The Pearson product-moment correlation (PPMC) is the most widely used statistic for measuring the degree of relationship between linearly related variables.

The correlation coefficient is defined as the covariance of the two variables divided by the product of their standard deviations.
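The definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the small data set are assumptions, not part of the slides, and the sample (n – 1) versions of the covariance and standard deviation are used.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def std_dev(values):
    """Sample standard deviation: square root of the covariance of a list with itself."""
    return sqrt(covariance(values, values))

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

# Hypothetical data, not taken from the slides.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = correlation(hours, score)
print(round(r, 2))
```

Because the same divisor appears in the covariance and in both standard deviations, using n instead of n – 1 throughout would give the same coefficient.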
Pearson product-moment correlation

Pearson’s product-moment correlation coefficient, or simply the correlation coefficient (Pearson’s r), is a measure of the strength of the linear association between two variables.

• Developed by Karl Pearson.

• The value of the correlation coefficient varies between +1 and –1.
Correlation Coefficient

[Figure: six scatter plots of Y Variable against X Variable. Panels: Perfect Positive Correlation (r = 1.00); Perfect Negative Correlation (r = -1.00); Positive Correlation (r = 0.80); Negative Correlation (r = -0.80); Zero Correlation (r = 0.00); Non-Linear Correlation (r = 0.00).]

Pearson product-moment correlation

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N(\sum X^2) - (\sum X)^2\right]\left[N(\sum Y^2) - (\sum Y)^2\right]}}$$

Test of Significance

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$$

df = N – 2
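As a rough sketch, the two formulas above translate directly into Python. The function names `pearson_r` and `t_statistic` and the five data points are illustrative assumptions; the arithmetic follows the raw-sum formula and the t formula as written.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the raw sums N, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def t_statistic(r, n):
    """t = r * sqrt(N - 2) / sqrt(1 - r^2), with N - 2 degrees of freedom.

    Not defined when r is exactly +1 or -1 (division by zero).
    """
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)          # 40 / 50 = 0.8
t = t_statistic(r, len(xs))    # compared against a t table with df = 3
print(round(r, 2), round(t, 2))
```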
Correlation Coefficient & Strength of Relationships

0.00 – no correlation, no relationship
±0.01 to ±0.20 – very slight correlation, almost negligible relationship
±0.21 to ±0.40 – slight correlation, definite but small relationship
±0.41 to ±0.70 – moderate correlation, substantial relationship
±0.71 to ±0.90 – high correlation, marked relationship
±0.91 to ±0.99 – very high correlation, very dependable relationship
±1.00 – perfect correlation, perfect relationship
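The scale above can be turned into a small lookup. This is a sketch under one assumption that the slides do not state: r is rounded to two decimal places before it is classified, so that values such as 0.205 fall into one of the listed bands. The function name `describe_strength` is hypothetical.

```python
def describe_strength(r):
    """Return the verbal label from the strength-of-relationship scale."""
    size = round(abs(r), 2)  # assumption: classify on two decimal places
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if size == 0.0:
        return "no correlation, no relationship"
    bands = [
        (0.20, "very slight correlation, almost negligible relationship"),
        (0.40, "slight correlation, definite but small relationship"),
        (0.70, "moderate correlation, substantial relationship"),
        (0.90, "high correlation, marked relationship"),
        (0.99, "very high correlation, very dependable relationship"),
    ]
    for upper, label in bands:
        if size <= upper:
            return label
    return "perfect correlation, perfect relationship"

print(describe_strength(0.93))
print(describe_strength(-0.80))
```

The sign of r is ignored here because the scale describes strength only; direction (positive or negative) is read from the sign separately.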
Assumptions

Subjects are randomly selected and independently assigned to groups.

Both populations are normally distributed.

Procedure for the Pearson Product-Moment Correlation Test

• Set up the hypotheses.
  H0: ρ = 0 (The correlation in the population is zero.)
  H1: ρ ≠ 0, ρ > 0, or ρ < 0 (The correlation in the population is different from, greater than, or less than zero.)
• Calculate the value of Pearson’s r.
• Calculate the t value.
• Make the statistical decision for hypothesis testing.
  If t computed ≤ t critical, do not reject H0.
  If t computed > t critical, reject H0.
Example 1: Pearson r

The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature and sales during the summer season. A random sample of 12 days is selected with the results given as follows:

Day                    1    2    3    4    5    6    7    8    9   10   11   12
Temperature (°F)      79   76   78   84   90   83   93   94   97   85   88   82
Total Sales (Units)  147  143  147  168  206  155  192  211  209  187  200  150

Plot the data on a scatter diagram. Does it appear there is a relationship between atmospheric temperature and sales? Compute the coefficient of correlation. Determine at the 0.05 significance level whether the correlation in the population is greater than zero.
Scatter Plot

[Figure: scatter plot of Sales (Y), axis from 130 to 220, against Temperature (X), axis from 70 to 100.]

Solution 1:
Step 1: State the hypotheses.
H0: ρ = 0
There is no correlation between atmospheric temperature and total sales of fruit shake.
H1: ρ > 0
There is a positive correlation between atmospheric temperature and total sales of fruit shake.
Step 2: The level of significance is α = 0.05.

Step 3: df = N – 2 = 12 – 2 = 10, and the one-tailed critical value of t is 1.812.

Step 4: Compute Pearson’s r.

Table

Day     X (temp)   Y (sales)      X²        Y²        XY
1          79        147        6,241    21,609    11,613
2          76        143        5,776    20,449    10,868
3          78        147        6,084    21,609    11,466
4          84        168        7,056    28,224    14,112
5          90        206        8,100    42,436    18,540
6          83        155        6,889    24,025    12,865
7          93        192        8,649    36,864    17,856
8          94        211        8,836    44,521    19,834
9          97        209        9,409    43,681    20,273
10         85        187        7,225    34,969    15,895
11         88        200        7,744    40,000    17,600
12         82        150        6,724    22,500    12,300
Total   1,029      2,115       88,733   380,887   183,222

ΣX = 1,029   ΣY = 2,115   ΣX² = 88,733   ΣY² = 380,887   ΣXY = 183,222
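Computation of Pearson’s r

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N(\sum X^2) - (\sum X)^2\right]\left[N(\sum Y^2) - (\sum Y)^2\right]}}$$

$$r = \frac{12(183{,}222) - (1{,}029)(2{,}115)}{\sqrt{\left[12(88{,}733) - (1{,}029)^2\right]\left[12(380{,}887) - (2{,}115)^2\right]}} = 0.93$$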

The atmospheric temperature and total sales show a very high positive correlation (a very dependable relationship); that is, an increase in atmospheric temperature is highly associated with an increase in total sales of fruit shake.
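The column totals and the value of r for Example 1 can be checked with a short script. This is a sketch for verification only; the variable names are illustrative, and the data are the twelve temperature and sales pairs given in the example.

```python
from math import sqrt

# Data from Example 1.
temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

n = len(temperature)
sum_x = sum(temperature)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in temperature)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(temperature, sales))

# Raw-sum formula for Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 1029 2115 88733 380887 183222
print(round(r, 2))                           # 0.93
```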
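Solution 1 (continued):
Step 5: Decision rule.

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}} = \frac{0.93\sqrt{12-2}}{\sqrt{1-(0.93)^2}} = \frac{0.93(3.16227766)}{\sqrt{1-0.8649}} = \frac{2.940918224}{0.367559519} = 8.00$$

Decision: Reject H0, because the computed t of 8.00 is greater than the critical value of +1.812.
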

Step 6: Conclusion.
We can conclude that there is evidence of a significant association between the atmospheric temperature and the total sales of fruit shake.
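The t computation for Example 1 can be reproduced as follows. This is a sketch: `t_statistic` is an illustrative name, the critical value 1.812 is the one quoted in the slides, and the numbers 22,329, 5,955 and 97,419 are intermediate values computed here from the table sums rather than figures shown in the slides. The second call shows that carrying the unrounded r gives a slightly smaller t, although the decision is the same.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r * sqrt(N - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

n = 12
t_critical = 1.812  # critical value quoted in the slides for df = 10

# Using r rounded to two decimals, as the slides do.
t_rounded = t_statistic(0.93, n)

# Using the unrounded r: 22329 is N*sum(XY) - sum(X)*sum(Y),
# 5955 is N*sum(X^2) - sum(X)^2, and 97419 is N*sum(Y^2) - sum(Y)^2.
r_exact = 22329 / sqrt(5955 * 97419)
t_exact = t_statistic(r_exact, n)

print(round(t_rounded, 2))     # 8.0
print(round(t_exact, 2))       # about 7.82
print(t_rounded > t_critical)  # True, so H0 is rejected
```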
Example 2: Spearman Rank

The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature and sales during the summer season. A random sample of 12 days is selected with the results given as follows:

Day                    1    2    3    4    5    6    7    8    9   10   11   12
Temperature (°F)      79   76   78   84   90   83   93   94   97   85   88   82
Total Sales (Units)  147  143  147  168  206  155  192  211  209  187  200  150

Plot the data on a scatter diagram. Does it appear there is a relationship between atmospheric temperature and sales? Compute the coefficient of correlation. Determine at the 0.05 significance level whether the correlation in the population is greater than zero.
Scatter Plot

[Figure: scatter plot of Rank of Y against Rank of X, both axes from 0 to 14.]

Solution 2:
Step 1: State the hypotheses.
H0: ρ = 0
There is no correlation between atmospheric temperature and total sales of fruit shake.
H1: ρ > 0
There is a positive correlation between atmospheric temperature and total sales of fruit shake.
Step 2: The level of significance is α = 0.05.

Step 3: df = N – 2 = 12 – 2 = 10, and the one-tailed critical value of t is 1.812.

Step 4: Compute Spearman’s ρ.


Table

Day      X      Y     RX     RY      D      D²
1       79    147     10    10.5   –0.5   0.25
2       76    143     12    12      0     0
3       78    147     11    10.5    0.5   0.25
4       84    168      7     7      0     0
5       90    206      4     3      1     1
6       83    155      8     8      0     0
7       93    192      3     5     –2     4
8       94    211      2     1      1     1
9       97    209      1     2     –1     1
10      85    187      6     6      0     0
11      88    200      5     4      1     1
12      82    150      9     9      0     0
Total                               0     ΣD² = 8.5
Computation of ρ

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$

$$\rho = 1 - \frac{6(8.5)}{12(12^2 - 1)} = 1 - \frac{51}{12(143)} = 1 - 0.03 = 0.97$$

The atmospheric temperature and total sales show a very high positive correlation (a very dependable relationship); that is, an increase in atmospheric temperature is highly associated with an increase in total sales of fruit shake.
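The ranking and the rank-correlation arithmetic for Example 2 can be sketched as below. The helper names are assumptions. Ranks run from 1 for the largest value, and tied values share the average of their ranks, which is how the two sales figures of 147 both receive 10.5 in the table. Note that the shortcut formula with ΣD² is exact only when there are no ties, so with this one tie the result is a close approximation.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; ties get the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for value in set(values):
        first = ordered.index(value) + 1   # best rank held by this value
        count = ordered.count(value)       # how many observations are tied
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def spearman_rho(xs, ys):
    """Shortcut formula: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(xs)
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Data from Example 2.
temperature = [79, 76, 78, 84, 90, 83, 93, 94, 97, 85, 88, 82]
sales = [147, 143, 147, 168, 206, 155, 192, 211, 209, 187, 200, 150]

sum_d2 = sum((rx - ry) ** 2
             for rx, ry in zip(average_ranks(temperature), average_ranks(sales)))
print(sum_d2)                                     # 8.5
print(round(spearman_rho(temperature, sales), 2)) # 0.97
```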
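Solution 2 (continued):

Step 5: Decision rule.

$$t = \frac{\rho\sqrt{N-2}}{\sqrt{1-\rho^2}} = \frac{0.97\sqrt{12-2}}{\sqrt{1-(0.97)^2}} = \frac{0.97(3.16227766)}{\sqrt{1-0.9409}} = \frac{3.06740933}{0.243104915} = 12.62$$

Decision: Reject H0, because the computed t of 12.62 is greater than the critical value of +1.812.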
Step 6: Conclusion.
We can conclude that there is evidence of a significant association between the atmospheric temperature and the total sales of fruit shake.
Simple Regression Equation
Regression analysis is a statistical tool used to model the dependence of a variable on one or more explanatory variables.
A simple linear regression is the least-squares estimator of a linear regression model with a single predictor (one independent variable).
The least-squares method determines a regression equation by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y.
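To make the minimizing idea concrete, the sketch below fits a line and then checks that nudging the intercept or slope only increases the sum of squared vertical distances. The data set and the function names are assumptions for illustration; the fit uses the deviation form of the least-squares solution, which is algebraically equivalent to the raw-sum form.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical distances between actual Y and predicted a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Intercept a and slope b that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

a, b = least_squares_line(xs, ys)           # a = 0.6, b = 0.8
best = sum_squared_errors(a, b, xs, ys)

# Any other line, for example one with a shifted intercept or slope, fits worse.
nudged = [sum_squared_errors(a + da, b + db, xs, ys)
          for da in (-0.5, 0.5) for db in (-0.2, 0.2)]
print(all(best < other for other in nudged))
```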
Assumptions of Linear Regression Equation

Linearity – The mean of each error component is zero.

Independence of Error Terms – The errors are independent of each other.

Normally Distributed Error Terms – Each error component (random variable) follows an approximate normal distribution.

Homoscedasticity – The variance of the error components is the same for each value of the independent variable.
Concept of Linear Regression
Linear regression is the simplest representation of the relationship in a two-variable model. The two variables are usually expressed in the form of an equation.
Examples:
1. The quantity demanded is related to price.
2. The quantity produced in a factory is related to the production cost.
3. The number of students in a given semester may be related to the tuition fee.
4. The expenditures of a household are related to income, and so on.

=> The simplest representation of the estimation of these relationships is given by the linear equation

Y = a + bX
Steps to obtain a′ and b′
1. Rewrite the paired values in vertical columns.
2. Get the sum of the values in each column.
3. Multiply each value of x by its corresponding y value to obtain the xy column.
4. Add the values in the xy column to get Σxy.
5. Square the x values to get the x² column.
6. Add the values in the x² column to obtain Σx².
7. Apply the formula:
b′ =
a′ =
8. Solve for the value of a′ by using either of the two equations.
Estimating the Coefficient

Predicted or fitted value of Y:

$$\hat{Y} = b_1 X + b_0$$

Slope of the regression line:

$$b_1 = \frac{N(\sum XY) - (\sum X)(\sum Y)}{N(\sum X^2) - (\sum X)^2}$$

Intercept of the regression line:

$$b_0 = \bar{Y} - b_1\bar{X}$$
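The slope and intercept formulas above can be written as one small function. This is a minimal sketch; `fit_line` and `predict` are hypothetical names and the four data points are made up so that the answer is easy to verify by hand.

```python
def fit_line(xs, ys):
    """Return (b0, b1) using the raw-sum slope formula and b0 = mean(Y) - b1 * mean(X)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1

def predict(b0, b1, x):
    """Fitted value: Y-hat = b1 * X + b0."""
    return b1 * x + b0

# Hypothetical data lying exactly on Y = 2X + 1.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)               # 1.0 2.0
print(predict(b0, b1, 10))  # 21.0
```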
Determine the linear regression equation Y = a + bX

Year     Cost (X)   Sales (Y)      X²        XY
1995       15         38.0        225       570
1996       30         53.3        900      1599
1997       16         60.0        256       960
1998       39         72.0       1521      2808
1999       20         40.0        400       800
2000       36         47.5       1296      1710
2001       45         82.0       2025      3690
2002       10         21.5        100       215
2003       13         25.0        169       325
Total     224        439.3      6,892    12,677


ANSWER:
Slope: b′ = 1.32
Intercept: a′ = 15.86
Y = a + bX
Y = 15.8629 + 1.32X (linear regression equation)
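The answer can be reproduced from the cost and sales columns. This is a verification sketch with illustrative variable names; it applies the raw-sum slope formula and the mean-based intercept formula to the nine yearly observations.

```python
# Data from the table: annual cost (X) and annual sales (Y), 1995 to 2003.
cost = [15, 30, 16, 39, 20, 36, 45, 10, 13]
sales = [38.0, 53.3, 60.0, 72.0, 40.0, 47.5, 82.0, 21.5, 25.0]

n = len(cost)
sum_x = sum(cost)                                  # 224
sum_y = sum(sales)                                 # 439.3
sum_x2 = sum(x * x for x in cost)                  # 6892
sum_xy = sum(x * y for x, y in zip(cost, sales))   # 12677

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

print(round(b, 2), round(a, 2))  # 1.32 15.86
```

The intercept of 15.8629 in the answer comes from using the unrounded slope; the slope itself is then reported to two decimals.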
Example # 2
Consider the linear equation obtained in Example # 1. Given y′ = 15.86 + 1.32X, where X and y′ are the annual cost and the expected annual sales, respectively, determine the expected sales when the cost is
a. P23.54
b. P16.20
ANSWER:

Y = 15.8629 + 1.32X (linear regression equation)

a. Y = 15.8629 + 1.32 (23.54)
Y = 46.94

b. Y = 15.8629 + 1.32 (16.20)
Y = 37.25
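The two substitutions can be checked with a one-line function. This is a sketch; `expected_sales` is a hypothetical name, and the default coefficients are the rounded values used in the answer (intercept 15.8629, slope 1.32).

```python
def expected_sales(cost, intercept=15.8629, slope=1.32):
    """Evaluate Y = intercept + slope * X for a given annual cost X."""
    return intercept + slope * cost

for cost in (23.54, 16.20):
    print(round(expected_sales(cost), 2))  # 46.94, then 37.25
```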
