
Production Planning and Control

This document discusses correlation and linear regression. It defines correlation as the relationship between two variables, with one being independent and the other dependent. A scatter plot can show if there is a positive, negative, or no linear correlation between the two variables. The correlation coefficient r measures the strength and direction of the linear relationship, ranging from -1 to 1. The document provides examples of calculating r from data and interpreting the results. It explains how to determine if a correlation is statistically significant using the critical values from tables based on the sample size and level of significance.

Uploaded by

Shanu Sha
Copyright © All Rights Reserved

A correlation is a relationship between two variables. The data
can be represented by the ordered pairs (x, y) where x is the
independent (or explanatory) variable, and y is the dependent
(or response) variable.

A scatter plot can be used to determine whether a linear (straight line) correlation exists between two variables.

Example:
x    1    2    3    4    5
y   –4   –2   –1    0    2

[Figure: scatter plot of the example data.]

[Figure: four scatter plots. Negative Linear Correlation: as x increases, y tends to decrease. Positive Linear Correlation: as x increases, y tends to increase. No Correlation. Nonlinear Correlation.]
The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The formula for r is

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

The range of the correlation coefficient is –1 to 1. If x and y have a strong positive linear correlation, r is close to 1. If x and y have a strong negative linear correlation, r is close to –1. If there is no linear correlation or a weak linear correlation, r is close to 0.
[Figure: four scatter plots. Strong negative correlation, r = –0.91; strong positive correlation, r = 0.88; weak positive correlation, r = 0.42; nonlinear correlation, r = 0.07.]

Calculating a Correlation Coefficient

1. Find the sum of the x-values. (Σx)
2. Find the sum of the y-values. (Σy)
3. Multiply each x-value by its corresponding y-value and find the sum. (Σxy)
4. Square each x-value and find the sum. (Σx²)
5. Square each y-value and find the sum. (Σy²)
6. Use these five sums to calculate the correlation coefficient.

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
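The six steps can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function name `correlation_coefficient` is our own, and it computes r directly from the five sums exactly as the steps describe.

```python
import math


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r computed from the five sums (steps 1-6)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    sum_x = sum(xs)                              # step 1: sum of x
    sum_y = sum(ys)                              # step 2: sum of y
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # step 3: sum of xy
    sum_x2 = sum(x * x for x in xs)              # step 4: sum of x squared
    sum_y2 = sum(y * y for y in ys)              # step 5: sum of y squared
    # step 6: combine the five sums
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


print(round(correlation_coefficient([1, 2, 3, 4, 5], [-3, -1, 0, 1, 2]), 3))  # 0.986
```

If every x-value (or every y-value) is identical, the denominator is zero and r is undefined; the sketch then raises `ZeroDivisionError`.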
Example:
Calculate the correlation coefficient r for the following data.

x    y    xy   x²   y²
1   –3   –3    1    9
2   –1   –2    4    1
3    0    0    9    0
4    1    4   16    1
5    2   10   25    4

Σx = 15   Σy = –1   Σxy = 9   Σx² = 55   Σy² = 15

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{5(9) - (15)(-1)}{\sqrt{5(55) - 15^2}\,\sqrt{5(15) - (-1)^2}} = \frac{60}{\sqrt{50}\,\sqrt{74}} \approx 0.986$$

There is a strong positive linear correlation between x and y.
Example:
The following data represents the number of hours 12
different students watched television during the weekend and
the scores of each student who took a test the following
Monday.
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50

Example continued:
a.) [Figure: scatter plot of Test score (y-axis) against Hours watching TV (x-axis).]

b.)
Hours, x        0     1     2     3     3     5     5     5     6     7     7    10
Test score, y  96    85    82    74    95    68    76    84    58    65    75    50
xy              0    85   164   222   285   340   380   420   348   455   525   500
x²              0     1     4     9     9    25    25    25    36    49    49   100
y²           9216  7225  6724  5476  9025  4624  5776  7056  3364  4225  5625  2500

Σx = 54   Σy = 908   Σxy = 3724   Σx² = 332   Σy² = 70836

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{12(3724) - (54)(908)}{\sqrt{12(332) - 54^2}\,\sqrt{12(70836) - 908^2}} \approx -0.831$$

There is a strong negative linear correlation. As the number of hours spent watching TV increases, the test scores tend to decrease.
Once the sample correlation coefficient r has been calculated, we
need to determine whether there is enough evidence to decide that
the population correlation coefficient ρ is significant at a specified
level of significance.

One way to determine this is to use Table 11 in Appendix B.

If |r| is greater than the critical value, there is enough evidence to decide that the correlation coefficient ρ is significant.

n     α = 0.05   α = 0.01
4     0.950      0.990
5     0.878      0.959
6     0.811      0.917
7     0.754      0.875

For a sample of size n = 6, ρ is significant at the 5% significance level if |r| > 0.811.
Using Table 11 for the Correlation Coefficient ρ

1. Determine the number of pairs of data in the sample. (Determine n.)
2. Specify the level of significance. (Identify α.)
3. Find the critical value. (Use Table 11 in Appendix B.)
4. Decide if the correlation is significant. (If |r| > critical value, the correlation is significant. Otherwise, there is not enough evidence to support that the correlation is significant.)
5. Interpret the decision in the context of the original claim.
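The table-lookup procedure can be sketched as below. The sketch hard-codes only the rows of Table 11 that appear on these slides (n = 4 to 7 and n = 10 to 13); a real implementation would need the full table. The names `TABLE_11` and `is_significant` are hypothetical.

```python
# Critical values of r copied from the Table 11 rows shown on these slides.
# Keys are sample sizes n; values map the significance level alpha to the critical value.
TABLE_11 = {
    4: {0.05: 0.950, 0.01: 0.990},
    5: {0.05: 0.878, 0.01: 0.959},
    6: {0.05: 0.811, 0.01: 0.917},
    7: {0.05: 0.754, 0.01: 0.875},
    10: {0.05: 0.632, 0.01: 0.765},
    11: {0.05: 0.602, 0.01: 0.735},
    12: {0.05: 0.576, 0.01: 0.708},
    13: {0.05: 0.553, 0.01: 0.684},
}


def is_significant(r, n, alpha):
    """Steps 1-4: look up the critical value for (n, alpha) and compare it with |r|."""
    try:
        critical_value = TABLE_11[n][alpha]  # step 3
    except KeyError:
        raise ValueError(f"no critical value stored for n={n}, alpha={alpha}") from None
    return abs(r) > critical_value  # step 4


print(is_significant(-0.831, 12, 0.01))  # True, because 0.831 > 0.708
```

Comparing |r| rather than r makes the same check work for negative correlations, as in the TV example.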
Example:
The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.

The correlation coefficient r ≈ –0.831.

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50

Is the correlation coefficient significant at α = 0.01?
Example continued:
r ≈ –0.831, n = 12, α = 0.01

Appendix B: Table 11
n     α = 0.05   α = 0.01
4     0.950      0.990
5     0.878      0.959
6     0.811      0.917
⋮
10    0.632      0.765
11    0.602      0.735
12    0.576      0.708
13    0.553      0.684

Because |r| ≈ 0.831 > 0.708, the population correlation is significant. There is enough evidence at the 1% level of significance to conclude that there is a significant linear correlation between the number of hours of television watched during the weekend and the scores of each student who took a test the following Monday.
A hypothesis test can also be used to determine whether the sample correlation coefficient r provides enough evidence to conclude that the population correlation coefficient ρ is significant at a specified level of significance.

A hypothesis test can be one-tailed or two-tailed.

Left-tailed test:
H0: ρ ≥ 0 (no significant negative correlation)
Ha: ρ < 0 (significant negative correlation)

Right-tailed test:
H0: ρ ≤ 0 (no significant positive correlation)
Ha: ρ > 0 (significant positive correlation)

Two-tailed test:
H0: ρ = 0 (no significant correlation)
Ha: ρ ≠ 0 (significant correlation)
The t-Test for the Correlation Coefficient
A t-test can be used to test whether the correlation between two variables is significant. The test statistic is r, and the standardized test statistic

$$t = \frac{r}{\sigma_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

follows a t-distribution with n – 2 degrees of freedom.

In this text, only two-tailed hypothesis tests for ρ are considered.
Using the t-Test for the Correlation Coefficient ρ

1. State the null and alternative hypotheses. (State H0 and Ha.)
2. Specify the level of significance. (Identify α.)
3. Identify the degrees of freedom. (d.f. = n – 2)
4. Determine the critical value(s) and rejection region(s). (Use Table 5 in Appendix B.)
5. Find the standardized test statistic.
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
6. Make a decision to reject or fail to reject the null hypothesis. (If t is in the rejection region, reject H0. Otherwise, fail to reject H0.)
7. Interpret the decision in the context of the original claim.
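Steps 3 to 6 can be sketched as below. Python's standard library has no t-distribution, so this minimal sketch takes the critical value t0 from Table 5 as an argument instead of computing it; the function name `t_test_correlation` is hypothetical. Only the two-tailed test used in this text is covered.

```python
import math


def t_test_correlation(r, n, t_critical):
    """Two-tailed t-test of H0: rho = 0, given the critical value t0 for d.f. = n - 2.

    Returns the standardized test statistic and whether H0 is rejected.
    """
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2                             # step 3: degrees of freedom
    t = r / math.sqrt((1 - r ** 2) / df)   # step 5: standardized test statistic
    reject_h0 = abs(t) > t_critical        # step 6: rejection regions t < -t0 or t > t0
    return t, reject_h0


# t0 = 3.169 for d.f. = 10 at alpha = 0.01 (two-tailed), as used in the TV example.
t, reject = t_test_correlation(-0.831, 12, 3.169)
print(round(t, 2), reject)  # -4.72 True
```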
Example:
The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.

The correlation coefficient r ≈ –0.831.

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50

Test the significance of this correlation coefficient at α = 0.01.
Example continued:
H0: ρ = 0 (no correlation)    Ha: ρ ≠ 0 (significant correlation)

The level of significance is α = 0.01.

Degrees of freedom are d.f. = 12 – 2 = 10.

The critical values are –t0 = –3.169 and t0 = 3.169.

The standardized test statistic is

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{-0.831}{\sqrt{\dfrac{1 - (-0.831)^2}{12 - 2}}} \approx -4.72$$

[Figure: t-distribution with rejection regions t < –3.169 and t > 3.169.]

The test statistic falls in the rejection region, so H0 is rejected.

At the 1% level of significance, there is enough evidence to conclude that there is a significant linear correlation between the number of hours of TV watched over the weekend and the test scores on Monday morning.
Measures of Regression and Prediction Intervals
To find the total variation, you must first calculate the total deviation, the explained deviation, and the unexplained deviation.

$$\text{Total deviation} = y_i - \bar{y}$$
$$\text{Explained deviation} = \hat{y}_i - \bar{y}$$
$$\text{Unexplained deviation} = y_i - \hat{y}_i$$

[Figure: a data point (xi, yi) and its predicted point (xi, ŷi) on the regression line, showing the total deviation yi – ȳ split into the explained deviation ŷi – ȳ and the unexplained deviation yi – ŷi.]
The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y.

$$\text{Total variation} = \sum (y_i - \bar{y})^2$$

The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y.

$$\text{Explained variation} = \sum (\hat{y}_i - \bar{y})^2$$

The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and each corresponding predicted y-value.

$$\text{Unexplained variation} = \sum (y_i - \hat{y}_i)^2$$

$$\text{Total variation} = \text{Explained variation} + \text{Unexplained variation}$$
The coefficient of determination r² is the ratio of the explained variation to the total variation. That is,

$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
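For the five-point data set used earlier (x = 1 to 5, y = –3, –1, 0, 1, 2), the regression line is ŷ = 1.2x – 3.8. The sketch below computes the three variations, confirms that total variation equals explained plus unexplained variation, and forms r². The helper name `variations` is hypothetical; note that the decomposition holds exactly only for the least-squares line, which 1.2x – 3.8 is for these data.

```python
def variations(xs, ys, m, b):
    """Total, explained and unexplained variation about the line y_hat = m*x + b."""
    y_bar = sum(ys) / len(ys)
    y_hat = [m * x + b for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return total, explained, unexplained


total, explained, unexplained = variations([1, 2, 3, 4, 5], [-3, -1, 0, 1, 2], 1.2, -3.8)
print(round(total, 2), round(explained, 2), round(unexplained, 2))  # 14.8 14.4 0.4
print(round(explained / total, 3))  # coefficient of determination, 0.973
```

The result agrees with squaring the r ≈ 0.986 found for the same data.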
Example:
The correlation coefficient for the data that represents the number of hours students watched television and the test scores of each student is r ≈ –0.831. Find the coefficient of determination.

$$r^2 = (-0.831)^2 \approx 0.691$$

About 69.1% of the variation in the test scores can be explained by the variation in the hours of TV watched. About 30.9% of the variation is unexplained.
When a ŷ-value is predicted from an x-value, the prediction is a point estimate. An interval can also be constructed.

The standard error of estimate se is the standard deviation of the observed yi-values about the predicted ŷ-value for a given xi-value. It is given by

$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

where n is the number of ordered pairs in the data set.

The closer the observed y-values are to the predicted y-values, the smaller the standard error of estimate will be.
Finding the Standard Error of Estimate

1. Make a table that includes the column headings shown. (xi, yi, ŷi, (yi – ŷi), (yi – ŷi)²)
2. Use the regression equation to calculate the predicted y-values. (ŷi = m xi + b)
3. Calculate the sum of the squares of the differences between each observed y-value and the corresponding predicted y-value. (Σ(yi – ŷi)²)
4. Find the standard error of estimate.
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$
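The four steps can be sketched as a single function. This is a minimal illustration with a hypothetical name, `standard_error_of_estimate`; it takes the slope m and intercept b of an already-known regression equation, just as the examples on these slides do.

```python
import math


def standard_error_of_estimate(xs, ys, m, b):
    """s_e = sqrt(sum((y_i - y_hat_i)^2) / (n - 2)) for the line y_hat = m*x + b."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three ordered pairs")
    # Steps 1-3: predicted values and the sum of squared residuals (unexplained variation).
    unexplained = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(unexplained / (n - 2))  # step 4


print(round(standard_error_of_estimate([1, 2, 3, 4, 5], [-3, -1, 0, 1, 2], 1.2, -3.8), 3))  # 0.365

hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]
print(round(standard_error_of_estimate(hours, scores, -4.07, 93.97), 2))  # 8.11
```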
Example:
The regression equation for the following data is ŷ = 1.2x – 3.8. Find the standard error of estimate.

xi   yi    ŷi     (yi – ŷi)²
1    –3    –2.6   0.16
2    –1    –1.4   0.16
3     0    –0.2   0.04
4     1     1     0
5     2     2.2   0.04
                  Σ = 0.4 (unexplained variation)

$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{0.4}{5 - 2}} \approx 0.365$$

The standard deviation of the observed y-values about the predicted y-value for a given x-value is about 0.365.
Example:
The regression equation for the data that represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday is ŷ = –4.07x + 93.97. Find the standard error of estimate.

Hours, xi         0       1       2       3       3       5
Test score, yi   96      85      82      74      95      68
ŷi               93.97   89.9    85.83   81.76   81.76   73.62
(yi – ŷi)²       4.12    24.01   14.67   60.22   175.3   31.58

Hours, xi         5       5       6       7       7       10
Test score, yi   76      84      58      65      75      50
ŷi               73.62   73.62   69.55   65.48   65.48   53.27
(yi – ŷi)²       5.66    107.74  133.4   0.23    90.63   10.69

Σ(yi – ŷi)² ≈ 658.25 (unexplained variation)

$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{658.25}{12 - 2}} \approx 8.11$$

The standard deviation of the student test scores for a specific number of hours of TV watched is about 8.11.
Two variables have a bivariate normal distribution if, for any fixed value of x, the corresponding values of y are normally distributed, and for any fixed value of y, the corresponding x-values are normally distributed.
A prediction interval can be constructed for the true value of y.

Given a linear regression equation ŷ = mx + b and x0, a specific value of x, a prediction interval for y is

ŷ – E < y < ŷ + E

where

$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

The point estimate is ŷ and the margin of error is E. The probability that the prediction interval contains y is the level of confidence c, assuming the estimation process is repeated a large number of times.
Construct a Prediction Interval for y for a Specific Value of x

1. Identify the number of ordered pairs in the data set n and the degrees of freedom. (d.f. = n – 2)
2. Use the regression equation and the given x-value to find the point estimate ŷ. (ŷ = m x0 + b)
3. Find the critical value tc that corresponds to the given level of confidence c. (Use Table 5 in Appendix B.)
4. Find the standard error of estimate se.
$$s_e = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$
5. Find the margin of error E.
$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}}$$
6. Find the left and right endpoints and form the prediction interval. (Left endpoint: ŷ – E; right endpoint: ŷ + E; interval: ŷ – E < y < ŷ + E)
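Steps 1 to 6 can be sketched as follows. As with the t-test, the critical value tc is passed in from Table 5 rather than computed, because Python's standard library has no t-distribution. `prediction_interval` and its parameter names are hypothetical. The inputs below are the rounded values from the TV example that follows (ŷ = –4.07x + 93.97, se ≈ 8.11, tc = 2.228 for 10 degrees of freedom at c = 0.95).

```python
import math


def prediction_interval(xs, m, b, s_e, t_c, x0):
    """Return (y_hat - E, y_hat + E) for the regression line y_hat = m*x + b at x0."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    y_hat = m * x0 + b  # step 2: point estimate
    # Step 5: margin of error.
    e = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2))
    return y_hat - e, y_hat + e  # step 6


hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
low, high = prediction_interval(hours, -4.07, 93.97, 8.11, 2.228, 4)
print(round(low, 2), round(high, 2))  # 58.86 96.52
```

The square-root factor grows as x0 moves away from x̄, so predictions far from the centre of the data get wider intervals.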
Example:
The following data represents the number of hours 12 different students watched television during the weekend and the scores of each student who took a test the following Monday.

Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50

ŷ = –4.07x + 93.97    se ≈ 8.11

Construct a 95% prediction interval for the test scores when 4 hours of TV are watched.
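Example continued:
Construct a 95% prediction interval for the test scores when the number of hours of TV watched is 4.

There are n – 2 = 12 – 2 = 10 degrees of freedom.

The point estimate is

ŷ = –4.07x + 93.97 = –4.07(4) + 93.97 = 77.69.

The critical value is tc = 2.228, and se ≈ 8.11. With n = 12, Σx = 54, Σx² = 332, and x̄ = 54/12 = 4.5, the margin of error is

$$E = t_c s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}} = (2.228)(8.11)\sqrt{1 + \frac{1}{12} + \frac{12(4 - 4.5)^2}{12(332) - 54^2}} \approx 18.83$$

ŷ – E < y < ŷ + E

77.69 – 18.83 = 58.86        77.69 + 18.83 = 96.52

You can be 95% confident that when a student watches 4 hours of TV over the weekend, the student's test grade will be between 58.86 and 96.52.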
Multiple Regression
1. ‘Best fit’ means the differences between the actual Y values and the predicted Y values are a minimum. But positive differences offset negative ones, so square the errors:

$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$$

2. Least squares (LS) minimizes the sum of the squared differences (errors), SSE.
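For a single independent variable, minimizing SSE has a closed-form solution for the slope m and intercept b of ŷ = mx + b. The sketch below uses the standard least-squares formulas m = (nΣxy – ΣxΣy)/(nΣx² – (Σx)²) and b = ȳ – m·x̄, which are not written out on these slides; the function name is hypothetical. With several independent variables the same idea applies, but, as noted below, technology is generally used to do the computation.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b minimizing SSE = sum((y - (m*x + b))**2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * sum_x / n  # b = y_bar - m * x_bar
    return m, b


m1, b1 = least_squares_line([1, 2, 3, 4, 5], [-3, -1, 0, 1, 2])
print(round(m1, 2), round(b1, 2))  # 1.2 -3.8

m2, b2 = least_squares_line([0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10],
                            [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50])
print(round(m2, 2), round(b2, 2))  # -4.07 93.97
```

Both results match the regression equations quoted in the examples (ŷ = 1.2x – 3.8 and ŷ = –4.07x + 93.97).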
In many instances, a better prediction can be found for a dependent (response) variable by using more than one independent (explanatory) variable. For example, a more accurate prediction of Monday's test grade from the previous section might be made by considering the number of other classes a student is taking as well as the student's previous knowledge of the test material.

A multiple regression equation has the form

$$\hat{y} = b + m_1x_1 + m_2x_2 + m_3x_3 + \cdots + m_kx_k$$

where x₁, x₂, x₃, …, xₖ are independent variables, b is the y-intercept, and y is the dependent variable.*

* Because the mathematics associated with this concept is complicated, technology is generally used to calculate the multiple regression equation.

After finding the equation of the multiple regression line, you can use the equation to predict y-values over the range of the data.
Example:
The following multiple regression equation can be used to predict the annual U.S. rice yield (in pounds).

ŷ = 859 + 5.76x₁ + 3.82x₂

where x₁ is the number of acres planted (in thousands), and x₂ is the number of acres harvested (in thousands). (Source: U.S. National Agricultural Statistics Service)

a.) Predict the annual rice yield when x₁ = 2758 and x₂ = 2714.
b.) Predict the annual rice yield when x₁ = 3581 and x₂ = 3021.

Example continued:
a.) ŷ = 859 + 5.76x₁ + 3.82x₂
= 859 + 5.76(2758) + 3.82(2714)
= 27,112.56
The predicted annual rice yield is 27,112.56 pounds.

b.) ŷ = 859 + 5.76x₁ + 3.82x₂
= 859 + 5.76(3581) + 3.82(3021)
= 33,025.78
The predicted annual rice yield is 33,025.78 pounds.
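Once the coefficients are known, prediction is just substitution into ŷ = b + m₁x₁ + … + mₖxₖ. A minimal sketch with a hypothetical `predict` helper, using the rice-yield equation ŷ = 859 + 5.76x₁ + 3.82x₂ from the example:

```python
def predict(intercept, coefficients, xs):
    """Evaluate y_hat = b + m1*x1 + m2*x2 + ... + mk*xk."""
    if len(coefficients) != len(xs):
        raise ValueError("need one x-value per coefficient")
    return intercept + sum(m * x for m, x in zip(coefficients, xs))


# b = 859 and [m1, m2] = [5.76, 3.82] from the rice-yield example.
print(round(predict(859, [5.76, 3.82], [2758, 2714]), 2))  # 27112.56
print(round(predict(859, [5.76, 3.82], [3581, 3021]), 2))  # 33025.78
```

As the slides note, such predictions are only meaningful over the range of the data used to fit the equation.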
