Correlation and Regression
By
Dr. Neeraj Anand
Email: nanand@ddn.upes.ac.in
Quantitative Techniques
Purpose: To provide a rational basis for
making decisions in the absence of
complete information.
It deals with three classical aspects of
science:
- Describing the behaviour of systems
- Analyzing the behaviour by
constructing appropriate models.
- Applying these models to predict
future behaviour.
O.R. function is a staff function.
CORRELATION
Definitions
Correlation
A method used to determine if a relationship between
variables exists
Correlation Coefficient
A statistic or parameter which measures the strength
and direction of a relationship between two variables. Its
value ranges between -1 and +1.
Dependent Variable
A variable in correlation or regression that cannot be
controlled; that is, it depends on the independent
variable.
Independent Variable
A variable in correlation or regression which can be
controlled, that is, it is independent of the other
variable.
Coefficient of Determination
The percentage of the variation in the dependent variable
that can be explained by the variation in the independent
variable in the regression model. It is expressed as r².
Scatter Plot
A plot of the data values on a coordinate system. The
independent variable is graphed along the x-axis and the
dependent variable along the y-axis.
Pearson's Correlation Coefficient
This is a measure of linear correlation. The
population parameter is denoted by the Greek
letter ‘rho’ (ρ) and the sample statistic is denoted
by the Roman letter ‘r’.
The Pearson Product Moment Correlation
Measures the extent to which one variable
[Figure: scatter plots with weight (kg) on the horizontal axis; panel labels: negative relationship, no relationship, positive relationship]
[Figure: scatter plot of height (cm) against age (weeks)]
[Figure: negative relationship, reliability against age of car]
[Figure: no relation]
Correlation Coefficient
If r = ±1: perfect correlation.
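How to compute the simple correlation coefficient (r)
$$ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} $$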
Example:
A sample of 6 children was selected, and their ages in
years and weights in kilograms were recorded as shown in
the following table. Find the correlation between age and
weight.
Serial no.   Age in years (x)   Weight in kg (y)   xy          x²          y²
1            7                  12                 84          49          144
2            6                  8                  48          36          64
3            8                  12                 96          64          144
4            5                  10                 50          25          100
5            6                  11                 66          36          121
6            9                  13                 117         81          169
Total        ∑x = 41            ∑y = 66            ∑xy = 461   ∑x² = 291   ∑y² = 742
$$ r = \frac{461 - \frac{41 \times 66}{6}}{\sqrt{\left(291 - \frac{(41)^2}{6}\right)\left(742 - \frac{(66)^2}{6}\right)}} $$
r = 0.759
strong direct correlation
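The arithmetic in this example can be checked with a short script. This is a minimal sketch in plain Python; the function name `pearson_r` and the variable names are illustrative choices, not part of the original slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r using the sum-based (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Age (years) and weight (kg) of the 6 children in the table
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

r_age_weight = pearson_r(age, weight)
print(round(r_age_weight, 4))
```

The result is roughly 0.7596, which agrees with the value of 0.759 quoted above once truncated to three decimals.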
EXAMPLE: Relationship between Anxiety and Test
Scores
Anxiety (X)   Test score (Y)   X²          Y²          XY
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129
Calculating Correlation Coefficient
r = - 0.94
$$ r_{x,y} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} $$
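The covariance form can be tried on the anxiety and test-score data. The sketch below assumes the sample covariance and sample standard deviations (divisor n − 1); that divisor is an assumption, since the slides do not state one, but it cancels in the ratio, so r is the same with divisor n. The helper names are illustrative.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divisor n - 1, an assumption; it cancels in r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def std_dev(values):
    """Sample standard deviation: square root of the variance Cov(v, v)."""
    return sqrt(covariance(values, values))

# Anxiety (X) and test score (Y) from the table
anxiety = [10, 8, 2, 1, 5, 6]
test_score = [2, 3, 9, 7, 6, 5]

r_anxiety = covariance(anxiety, test_score) / (std_dev(anxiety) * std_dev(test_score))
print(round(r_anxiety, 2))
```

Rounded to two decimals this gives −0.94, the value reported for the example.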
The test scores and work experience of 5
executives are given below. Determine the
coefficient of correlation.
X1 = test score, X2 = experience (years)

X1    X2
50    2
80    8
20    6
90    5
60    4

Using Excel: r = 0.204124
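The spreadsheet result can be reproduced by hand with deviations from the means. This is a minimal sketch; the variable names are illustrative and do not come from the slides.

```python
from math import sqrt

test_score = [50, 80, 20, 90, 60]   # X1
experience = [2, 8, 6, 5, 4]        # X2 (years)

n = len(test_score)
mean_1 = sum(test_score) / n
mean_2 = sum(experience) / n

# Deviations of each observation from its own mean
dev_1 = [x - mean_1 for x in test_score]
dev_2 = [x - mean_2 for x in experience]

sum_products = sum(a * b for a, b in zip(dev_1, dev_2))
sum_sq_1 = sum(a * a for a in dev_1)
sum_sq_2 = sum(b * b for b in dev_2)

r_exec = sum_products / sqrt(sum_sq_1 * sum_sq_2)
print(round(r_exec, 6))
```

Here the sum of products is 50 and the sums of squares are 3000 and 20, so r = 50 / √60000, which matches 0.204124 to six decimals.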
Properties of “r”
- r is always between -1 and +1 inclusive. -1 means
perfect negative linear correlation and +1 means
perfect positive linear correlation.
- r only measures the strength of a linear
relationship. There are other kinds of relationships
besides linear.
- r has the same sign as the slope of the regression
(best fit) line.
- r does not change if the independent (x) and
dependent (y) variables are interchanged.
- r does not change if the scale on either variable is
changed. You may add or subtract a constant, or
multiply or divide by a positive constant, on all the
x-values or y-values without changing the value of r.
Multiplying by a negative constant reverses the sign of r.
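The symmetry and scale properties can be checked numerically. The sketch below reuses the age and weight data from the earlier example; the rescaling constants (12, 3 and 2.2) are arbitrary illustrative values, and `pearson_r` is a hypothetical helper name.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

r = pearson_r(age, weight)
r_swapped = pearson_r(weight, age)                 # interchange x and y
r_rescaled = pearson_r([12 * a + 3 for a in age],  # arbitrary shift and positive scale
                       [w * 2.2 for w in weight])
r_negated = pearson_r([-a for a in age], weight)   # negative scale factor

print(isclose(r, r_swapped), isclose(r, r_rescaled), isclose(r_negated, -r))
```

Swapping the variables or applying a shift and a positive scale leaves r unchanged, while a negative scale factor flips its sign without changing its magnitude.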
Spearman’s Rank Correlation Coefficient
Nonparametric correlation between two
ordinal variables.
Rank correlation coefficient:
$$ r_s = 1 - \frac{6 \sum D^2}{N^3 - N} $$
where D (difference of ranks) = R1 − R2
and N = total number of observations.
Procedure
1. Rank the values of X from 1 to n where n
is the numbers of pairs of values of X and
Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute the value of d_i for each pair of
observations by subtracting the rank of Y_i
from the rank of X_i.
4. Square each d_i and compute ∑d_i², which
is the sum of the squared values.
5. Apply the following formula:
$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$
∑d_i² = 64
$$ r_s = 1 - \frac{6 \times 64}{7(48)} \approx -0.1 $$
Comment:
There is an indirect weak correlation
between level of education and income.
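The ranking procedure can be sketched in a few lines of Python. The original education and income data are not reproduced in these slides, so the values below are hypothetical, chosen only so that n = 7 and ∑d² = 64 as in the worked calculation. The sketch also assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Return (r_s, sum of squared rank differences) for untied data."""
    n = len(xs)
    d = [rx - ry for rx, ry in zip(ranks(xs), ranks(ys))]
    sum_d2 = sum(di * di for di in d)
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2

# Hypothetical data (NOT the slide's original observations)
x_hyp = [10, 20, 30, 40, 50, 60, 70]
y_hyp = [15, 40, 55, 30, 70, 5, 20]

rs, sum_d2 = spearman_rs(x_hyp, y_hyp)
print(sum_d2, round(rs, 3))
```

With these values the sum of squared differences is 64 and r_s = 1 − 384/336, about −0.143, which the example rounds to −0.1. Tied observations would need average ranks, which this sketch does not handle.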
Causation
If there is a significant linear correlation between
two variables, then one of five situations can be
true.
- There is a direct cause and effect relationship
- There is a reverse cause and effect relationship
- The relationship may be caused by a third
variable
- The relationship may be caused by complex
interactions of several variables
- The relationship may be coincidental
Regression
Regression
A method used to describe the relationship between
two variables.
Regression Line
The best fit line.
There is significant linear correlation. (That is, when we reject the null
hypothesis that rho=0 in a correlation hypothesis test.)
The coefficient of determination is the square of r.
i) Determine the relationship between the performance of the Inspectors and their test scores as
well as experience
ii) Determine the correlation between test score and experience of Inspectors.
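Hint: fit Yc = a + b1 X1 + b2 X2 using the normal equations
$$ \sum Y = Na + b_1 \sum X_1 + b_2 \sum X_2 $$
$$ \sum X_1 Y = a \sum X_1 + b_1 \sum X_1^2 + b_2 \sum X_1 X_2 $$
$$ \sum X_2 Y = a \sum X_2 + b_1 \sum X_1 X_2 + b_2 \sum X_2^2 $$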
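The three normal equations form a 3×3 linear system in a, b1 and b2. The sketch below builds that system from the sums and solves it by Gaussian elimination. The inspectors' data are not included in these slides, so the numbers are hypothetical: Y is generated exactly from Y = 1 + 2·X1 + 3·X2 so that the recovered coefficients are easy to verify. Function names are illustrative.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (m[i][n] - tail) / m[i][i]
    return solution

def fit_two_predictors(x1, x2, y):
    """Return (a, b1, b2) from the normal equations for Yc = a + b1*X1 + b2*X2."""
    def sp(u, v):  # sum of products
        return sum(p * q for p, q in zip(u, v))
    n = len(y)
    matrix = [
        [n,       sum(x1),    sum(x2)],
        [sum(x1), sp(x1, x1), sp(x1, x2)],
        [sum(x2), sp(x1, x2), sp(x2, x2)],
    ]
    rhs = [sum(y), sp(x1, y), sp(x2, y)]
    return tuple(solve_linear(matrix, rhs))

# Hypothetical data: y = 1 + 2*x1 + 3*x2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the hypothetical Y values lie exactly on the plane, the solver recovers a = 1, b1 = 2 and b2 = 3 up to rounding error; with real data the same system gives the least-squares estimates.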
Forecast using Excel

X     Y
40    2
45    1.8
50    2.3
52    2.5
60    3
65    3.4
68    3.1
72    4.2
80    4.8
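A forecast from these data can also be sketched outside a spreadsheet by fitting a least-squares line of Y on X. This is an illustration, not the slide's own procedure: it assumes a simple linear model, and the value x = 85 used for the forecast is a hypothetical input.

```python
xs = [40, 45, 50, 52, 60, 65, 68, 72, 80]
ys = [2, 1.8, 2.3, 2.5, 3, 3.4, 3.1, 4.2, 4.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept for Y = intercept + slope * X
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def forecast(x):
    """Predicted Y on the fitted line for a given X."""
    return intercept + slope * x

residuals = [y - forecast(x) for x, y in zip(xs, ys)]

# x = 85 is a hypothetical value chosen for illustration
print(round(slope, 4), round(intercept, 4), round(forecast(85), 2))
```

The fitted line passes through the point of means and its residuals sum to zero, which are quick checks that the fit is computed correctly. Forecasts far outside the observed range of X (40 to 80) should be treated with caution.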
THANK YOU