Production Planning and Control
A correlation is a relationship between two variables. The data
can be represented by the ordered pairs (x, y) where x is the
independent (or explanatory) variable, and y is the dependent
(or response) variable.
x     1     2     3     4     5
y    –4    –2    –1     0     2
[Figure: four scatter plots illustrating types of correlation.
  Negative Linear Correlation: as x increases, y tends to decrease.
  Positive Linear Correlation: as x increases, y tends to increase.
  No Correlation.
  Nonlinear Correlation.]
The correlation coefficient is a measure of the strength and the
direction of a linear relationship between two variables. The
symbol r represents the sample correlation coefficient. The
formula for r is
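\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}.
\]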
[Figure: four scatter plots with their sample correlation coefficients.
  r = –0.91: strong negative correlation
  r = 0.88: strong positive correlation
  r = 0.42: weak positive correlation
  r = 0.07: nonlinear correlation]
Calculating a Correlation Coefficient
In Words In Symbols
Example:
Calculate the correlation coefficient r for the following data.
  x     y    xy    x²    y²
  1    –3    –3     1     9
  2    –1    –2     4     1
  3     0     0     9     0
  4     1     4    16     1
  5     2    10    25     4
Σx = 15, Σy = –1, Σxy = 9, Σx² = 55, Σy² = 15
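\[
r = \frac{5(9) - (15)(-1)}{\sqrt{5(55) - 15^2}\,\sqrt{5(15) - (-1)^2}} = \frac{60}{\sqrt{50}\,\sqrt{74}} \approx 0.986
\]
There is a strong positive linear correlation between x and y.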
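The calculation above can be checked with a short script. This is a minimal sketch that applies the sum formula for r directly; the function name `correlation_coefficient` is an illustrative choice, not something taken from the slides.

```python
import math

def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r, computed with the sum formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero when all x-values (or all y-values) are equal;
    # r is undefined in that case and this sketch lets the division fail.
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Data from the worked example.
xs = [1, 2, 3, 4, 5]
ys = [-3, -1, 0, 1, 2]
r = correlation_coefficient(xs, ys)
print(round(r, 3))  # 0.986
```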
Example:
The following data represents the number of hours 12
different students watched television during the weekend and
the scores of each student who took a test the following
Monday.
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.
Hours, x 0 1 2 3 3 5 5 5 6 7 7 10
Test score, y 96 85 82 74 95 68 76 84 58 65 75 50
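\[
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
\]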
[Figure: scatter plot of the data, with hours watching TV on the x-axis and test score on the y-axis.]
Example continued:

Hours, x            0     1     2     3     3     5     5     5     6     7     7    10
Test score, y      96    85    82    74    95    68    76    84    58    65    75    50
xy                  0    85   164   222   285   340   380   420   348   455   525   500
x²                  0     1     4     9     9    25    25    25    36    49    49   100
y²               9216  7225  6724  5476  9025  4624  5776  7056  3364  4225  5625  2500
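As a worked check, the row sums can be substituted into the formula for r. The sums below were obtained by adding the table rows (they are not printed on the slide): Σx = 54, Σy = 908, Σxy = 3724, Σx² = 332, Σy² = 70836, with n = 12.

```latex
\[
r = \frac{12(3724) - (54)(908)}{\sqrt{12(332) - 54^2}\,\sqrt{12(70836) - 908^2}}
  = \frac{-4344}{\sqrt{1068}\,\sqrt{25568}}
  \approx -0.831
\]
```

The negative sign says that test scores tend to decrease as hours of television increase.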
  n    α = 0.05   α = 0.01
  4     0.950      0.990
  5     0.878      0.959
  6     0.811      0.917
  7     0.754      0.875

For a sample of size n = 6, ρ is significant at the 5% significance level if |r| > 0.811.
Finding the Correlation Coefficient ρ
In Words In Symbols
Example continued:
Appendix B: Table 11

r ≈ –0.831, n = 12, α = 0.01

  n    α = 0.05   α = 0.01
  4     0.950      0.990
  5     0.878      0.959
  6     0.811      0.917
 10     0.632      0.765
 11     0.602      0.735
 12     0.576      0.708
 13     0.553      0.684

For n = 12 and α = 0.01, ρ is significant if |r| > 0.708.
Because |r| ≈ 0.831 > 0.708, the population correlation is significant.
There is enough evidence at the 1% level of significance to conclude
that there is a significant linear correlation between the number of
hours of television watched during the weekend and the test scores
the following Monday.
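The table lookup can be expressed as a small sketch. The critical values are copied from the Table 11 excerpt shown in this section; the dictionary layout and the name `is_significant` are illustrative assumptions, and sample sizes not shown in the excerpt are deliberately rejected.

```python
# Critical values of |r| from the Table 11 excerpt (only the rows shown in the slides).
CRITICAL_VALUES = {
    4: {0.05: 0.950, 0.01: 0.990},
    5: {0.05: 0.878, 0.01: 0.959},
    6: {0.05: 0.811, 0.01: 0.917},
    7: {0.05: 0.754, 0.01: 0.875},
    10: {0.05: 0.632, 0.01: 0.765},
    11: {0.05: 0.602, 0.01: 0.735},
    12: {0.05: 0.576, 0.01: 0.708},
    13: {0.05: 0.553, 0.01: 0.684},
}

def is_significant(r, n, alpha):
    """Return True if |r| exceeds the tabulated critical value for n and alpha."""
    try:
        critical = CRITICAL_VALUES[n][alpha]
    except KeyError:
        raise ValueError(f"no critical value in the excerpt for n={n}, alpha={alpha}") from None
    return abs(r) > critical

# Television example: r is about -0.831 with n = 12, tested at the 1% level.
significant = is_significant(-0.831, 12, 0.01)
print(significant)  # True
```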
A hypothesis test can also be used to determine whether the
sample correlation coefficient r provides enough evidence to
conclude that the population correlation coefficient ρ is significant
at a specified level of significance.
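One common form of this test, assumed here because the step-by-step table did not survive in these slides, uses the standardized test statistic t = r / sqrt((1 - r²) / (n - 2)) with n - 2 degrees of freedom. The sketch below evaluates it for the television data. The critical value 3.169 (two-tailed, α = 0.01, 10 degrees of freedom) is taken from a standard t-table and is also an assumption, not a value from the slides.

```python
import math

def t_statistic(r, n):
    """Standardized test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r / math.sqrt((1 - r * r) / (n - 2))

# Television example: r is about -0.831 with n = 12.
t = t_statistic(-0.831, 12)

# Assumed two-tailed critical value for alpha = 0.01 and 10 degrees of freedom.
T_CRITICAL = 3.169
reject_null = abs(t) > T_CRITICAL
print(round(t, 2), reject_null)
```

Rejecting the null hypothesis here agrees with the table-based conclusion for the same data.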
In Words In Symbols
The total variation about a regression line is the
sum of the squares of the differences between the
y-value of each ordered pair and the mean of y.
\[
\text{Total variation} = \sum \left(y_i - \bar{y}\right)^2
\]
The explained variation is the sum of the squares
of the differences between each predicted y-value
and the mean of y.
\[
\text{Explained variation} = \sum \left(\hat{y}_i - \bar{y}\right)^2
\]
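The two sums can be computed directly for the five-point data set used earlier (x = 1 to 5, y = –3, –1, 0, 1, 2). This sketch assumes the predicted values come from the ordinary least-squares line, and it also computes the unexplained variation, the sum of squared differences between observed and predicted y-values, which is the standard third piece of the decomposition but is not defined in the lines above.

```python
xs = [1, 2, 3, 4, 5]
ys = [-3, -1, 0, 1, 2]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares line y-hat = slope * x + intercept (assumed fitting method).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x
predicted = [slope * x + intercept for x in xs]

# Sum of squared differences between each observed y and the mean of y.
total_variation = sum((y - mean_y) ** 2 for y in ys)
# Sum of squared differences between each predicted y and the mean of y.
explained_variation = sum((p - mean_y) ** 2 for p in predicted)
# Sum of squared differences between each observed y and its predicted y.
unexplained_variation = sum((y - p) ** 2 for y, p in zip(ys, predicted))

print(round(total_variation, 2), round(explained_variation, 2), round(unexplained_variation, 2))
```

For a least-squares line, the explained and unexplained variation add up to the total variation, which gives a quick consistency check on the arithmetic.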
The closer the observed y-values are to the predicted y-values, the
smaller the standard error of estimate will be.
Finding the Standard Error of Estimate
In Words In Symbols
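\[
s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}}
\]

 xi    yi     ŷi    (yi – ŷi)²
  1    –3   –2.6      0.16
  2    –1   –1.4      0.16
  3     0   –0.2      0.04
  4     1    1        0
  5     2    2.2      0.04
                 Σ =  0.4   (unexplained variation)

\[
s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}} = \sqrt{\frac{0.4}{5 - 2}} \approx 0.365
\]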
The standard error of estimate, that is, the standard deviation of the
observed y-values about the predicted ŷ-value for a given x-value, is
about 0.365.
Example:
The regression equation for the data that represents the
number of hours 12 different students watched television
during the weekend and the scores of each student who took
a test the following Monday is
ŷ = –4.07x + 93.97.
Find the standard error of estimate.
Hours, xi   Test score, yi     ŷi     (yi – ŷi)²
    0            96          93.97       4.12
    1            85          89.9       24.01
    2            82          85.83      14.67
    3            74          81.76      60.22
    3            95          81.76     175.3
    5            68          73.62      31.58
    5            76          73.62       5.66
    5            84          73.62     107.74
    6            58          69.55     133.4
    7            65          65.48       0.23
    7            75          65.48      90.63
   10            50          53.27      10.69
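Example continued:
\[
\sum \left(y_i - \hat{y}_i\right)^2 = 658.25 \quad \text{(unexplained variation)}
\]
\[
s_e = \sqrt{\frac{\sum \left(y_i - \hat{y}_i\right)^2}{n - 2}} = \sqrt{\frac{658.25}{12 - 2}} \approx 8.11
\]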
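The same result can be reproduced from the raw data and the regression equation ŷ = –4.07x + 93.97 given in the example. This is a minimal sketch; the function name `standard_error_of_estimate` is an illustrative choice.

```python
import math

def standard_error_of_estimate(xs, ys, slope, intercept):
    """s_e = sqrt(unexplained variation / (n - 2)) for the line y-hat = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n <= 2:
        raise ValueError("need equally long xs and ys with more than 2 points")
    # Unexplained variation: squared differences between observed and predicted y.
    unexplained = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(unexplained / (n - 2))

hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

s_e = standard_error_of_estimate(hours, scores, slope=-4.07, intercept=93.97)
print(round(s_e, 2))  # 8.11
```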
Construct a Prediction Interval for y for a Specific Value of x
In Words In Symbols
ŷ – E < y < ŷ + E
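The slides' formula for the margin of error E did not survive, so the sketch below assumes the usual textbook form E = t_c · s_e · sqrt(1 + 1/n + n(x0 – x̄)² / (nΣx² – (Σx)²)), where t_c has n – 2 degrees of freedom. The choice x0 = 4 hours and the 95% critical value t_c = 2.228 (10 degrees of freedom, from a standard t-table) are hypothetical inputs chosen for illustration, not values from the example.

```python
import math

def prediction_interval(xs, ys, x0, t_critical):
    """Return (lower, upper) bounds of the prediction interval for y at x = x0."""
    n = len(xs)
    if n != len(ys) or n <= 2:
        raise ValueError("need equally long xs and ys with more than 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    spread = n * sum_x2 - sum_x ** 2  # n * sum(x^2) - (sum x)^2

    # Least-squares regression line fitted to the data.
    slope = (n * sum_xy - sum_x * sum_y) / spread
    intercept = sum_y / n - slope * sum_x / n

    # Standard error of estimate.
    unexplained = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(unexplained / (n - 2))

    # Point estimate and margin of error (assumed textbook formula).
    y_hat = slope * x0 + intercept
    margin = t_critical * s_e * math.sqrt(1 + 1 / n + n * (x0 - sum_x / n) ** 2 / spread)
    return y_hat - margin, y_hat + margin

hours = [0, 1, 2, 3, 3, 5, 5, 5, 6, 7, 7, 10]
scores = [96, 85, 82, 74, 95, 68, 76, 84, 58, 65, 75, 50]

# Hypothetical query: a student who watched 4 hours, at an assumed 95% level.
lower, upper = prediction_interval(hours, scores, x0=4, t_critical=2.228)
print(round(lower, 1), round(upper, 1))
```

The interval is wide because s_e is large relative to the spread of the scores; it narrows somewhat as x0 approaches the mean number of hours.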
In many instances, a better prediction can be found
for a dependent (response) variable by using more
than one independent (explanatory) variable.
For example, a more accurate prediction of
Monday’s test grade from the previous section
might be made by considering the number of other
classes a student is taking as well as the student’s
previous knowledge of the test material.
A multiple regression equation has the form
ŷ = b + m₁x₁ + m₂x₂ + m₃x₃ + … + mₖxₖ
where x₁, x₂, x₃, …, xₖ are the independent
variables, b is the y-intercept, and y is the
dependent variable.
* Because the mathematics associated with this concept is complicated,
technology is generally used to calculate the multiple regression equation.
After finding the equation of the multiple
regression line, you can use the equation to
predict y-values over the range of the data.
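Once the coefficients are known, evaluating the equation is a single weighted sum. The sketch below shows that arithmetic only; the function name `predict` and every coefficient and input value in it are hypothetical and do not come from any example in these slides.

```python
def predict(intercept, slopes, xs):
    """Evaluate y-hat = b + m1*x1 + m2*x2 + ... + mk*xk."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one x-value per slope")
    return intercept + sum(m * x for m, x in zip(slopes, xs))

# Hypothetical coefficients and inputs, chosen only to show the arithmetic:
# y-hat = 1.0 + 2.0 * 3.0 + (-0.5) * 4.0
y_hat = predict(intercept=1.0, slopes=[2.0, -0.5], xs=[3.0, 4.0])
print(y_hat)  # 5.0
```

In practice each x-value should lie within the range of the data used to fit the equation, as the text notes.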
Example:
The following multiple regression equation can be
used to predict the annual U.S. rice yield (in pounds).