Correlation and Regression

This document discusses correlation and regression analysis. It defines key terms like correlation coefficient, dependent and independent variables, and scatter plots. It describes Pearson's correlation coefficient as a measure of the linear relationship between two variables. The correlation coefficient ranges from -1 to 1, where -1 is a perfect negative correlation, 0 is no correlation, and 1 is a perfect positive correlation. Examples are provided to demonstrate how to interpret scatter plots and calculate the correlation coefficient to determine the strength and direction of relationships between variables.

Uploaded by

Josh Jj
Copyright
© All Rights Reserved

Correlation & Regression

By
Dr. Neeraj Anand
Email: nanand@ddn.upes.ac.in
Quantitative Techniques
Purpose: To provide a rational basis for
making decisions in the absence of
complete information.
It deals with three classical aspects of
science:
- Describing the behaviour of systems
- Analyzing the behaviour by
constructing appropriate models.
- Applying these models to predict
future behaviour.
O.R. function is a staff function.
CORRELATION
Definitions
• Correlation
A method used to determine whether a relationship between variables exists.
• Correlation Coefficient
A statistic or parameter that measures the strength and direction of a relationship between two variables. Its value ranges between -1 and +1.
• Dependent Variable
A variable in correlation or regression that cannot be controlled; that is, it depends on the independent variable.
• Independent Variable
A variable in correlation or regression that can be controlled; that is, it is independent of the other variable.
• Coefficient of Determination
The percentage of the variation in the dependent variable that can be explained by the variation in the independent variable in the regression model. It is expressed as r².
• Scatter Plot
A plot of the data values on a coordinate system. The independent variable is graphed along the x-axis and the dependent variable along the y-axis.
Pearson's Correlation Coefficient
• This is a measure of linear correlation. The population parameter is denoted by the Greek letter ‘rho’ (ρ) and the sample statistic is denoted by the Roman letter ‘r’.
The Pearson Product Moment Correlation
• Measures the extent to which one variable covaries with another. The correlation standardizes the two variables when it computes the covariance. Hence, the correlation is a standardized covariance.
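The statement that the correlation is a standardized covariance can be checked numerically. The following is a minimal sketch in plain Python and is not part of the original slides; the data values and the helper names (`sample_std`, `z_scores`, `sample_cov`) are made up for illustration. It compares the covariance of the standardized variables with the covariance divided by the product of the standard deviations.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    # Sample standard deviation (divisor n - 1).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def z_scores(values):
    # Standardize: subtract the mean, divide by the standard deviation.
    m, s = mean(values), sample_std(values)
    return [(v - m) / s for v in values]


def sample_cov(xs, ys):
    # Sample covariance (divisor n - 1).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical data, chosen only to illustrate the identity.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r_from_z = sample_cov(z_scores(xs), z_scores(ys))
r_direct = sample_cov(xs, ys) / (sample_std(xs) * sample_std(ys))
print(round(r_from_z, 4), round(r_direct, 4))  # 0.8 0.8
```

Both routes give the same value because standardizing each variable divides the covariance by the two standard deviations.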
Types of Correlation
• POSITIVE
• NEGATIVE
• LINEAR
• PERFECT (Positive/Negative)
• NON-LINEAR
• SIMPLE
• PARTIAL
• NON-SENSE (Spurious)
• BIVARIATE
• MULTIPLE
Types of Correlation
• Bivariate correlations are correlations between two variables. Some bivariate correlations are non-directional and are called symmetric correlations. Other bivariate correlations are directional and are called asymmetric correlations.
• Multiple correlations are those between one variable and a set of variables.
There are multiple correlations that hold part of the set of variables constant:
Assumptions
1. Interval level data
2. Linearity (plot the relationship between the variables with a scattergram, or fit the functional curve formed by the relationship, to be sure of linearity).
3. Homoskedasticity or equal variances
4. Independence of observations
5. Representative sampling
Scatter diagram
• Rectangular coordinate
• Two quantitative variables
• One variable is called independent (X)
and the second is called dependent (Y)
• Points are not joined
• No frequency table
Example
[Figure: scatter diagram of weight and systolic blood pressure. x-axis: Wt (kg), 60 to 120; y-axis: SBP (mmHg), 80 to 220.]
Scatter plots
The pattern of data is indicative of the type of relationship between your two variables:
• positive relationship
• negative relationship
• no relationship
Positive relationship
[Figure: scatter plot of Height in CM (y-axis) against Age in Weeks (x-axis, 0 to 90).]
Negative relationship
[Figure: scatter plot of Reliability (y-axis) against Age of Car (x-axis).]
No relation
Correlation Coefficient
Statistic showing the degree of relation between two variables.
Simple Correlation coefficient (r)
• It is also called Pearson's correlation or product moment correlation coefficient.
• It measures the nature and strength of the association between two variables of the quantitative type.
The sign of r denotes the nature of the association, while the value of r denotes the strength of the association.
• If the sign is +ve, the relation is direct (an increase in one variable is associated with an increase in the other variable, and a decrease in one variable is associated with a decrease in the other variable).
• If the sign is -ve, the relationship is inverse or indirect (an increase in one variable is associated with a decrease in the other).
• The value of r ranges between (-1) and (+1).
• The value of r denotes the strength of the association, as illustrated by the following diagram.

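[Diagram: scale of r from -1 to +1. r = -1: perfect indirect correlation; -1 to -0.75: strong; -0.75 to -0.25: intermediate; -0.25 to 0: weak; r = 0: no relation; 0 to 0.25: weak; 0.25 to 0.75: intermediate; 0.75 to 1: strong; r = +1: perfect direct correlation.]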
If r = 0, there is no linear association or correlation between the two variables.

If 0 < |r| < 0.25: weak correlation.

If 0.25 ≤ |r| < 0.75: intermediate correlation.

If 0.75 ≤ |r| < 1: strong correlation.

If |r| = 1: perfect correlation.
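The cut-offs above can be written as a small lookup. This is an illustrative sketch only; the function name `describe_correlation` is hypothetical, and the thresholds are simply the ones listed on this slide, applied to the absolute value of r.

```python
def describe_correlation(r):
    """Label r using the slide's cut-offs (0.25 and 0.75 on |r|)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    # The sign gives the nature of the association.
    direction = "direct" if r > 0 else "indirect"
    # The magnitude gives the strength.
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.25:
        strength = "intermediate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


print(describe_correlation(-0.94))  # strong indirect correlation
```

These bands are a convention of this deck rather than a universal standard, so other texts may draw the lines elsewhere.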
How to compute the simple correlation coefficient (r)

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$
Example:
A sample of 6 children was selected, and data on their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.

Serial no.   Age (years)   Weight (kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13
These two variables are of the quantitative type. One variable (age) is called the independent variable and denoted X; the other (weight) is called the dependent variable and denoted Y. To find the relation between age and weight, compute the simple correlation coefficient using the following formula:

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}} $$
Serial no.   Age in years (x)   Weight in kg (y)   xy          x²          y²
1            7                  12                 84          49          144
2            6                  8                  48          36          64
3            8                  12                 96          64          144
4            5                  10                 50          25          100
5            6                  11                 66          36          121
6            9                  13                 117         81          169
Total        ∑x = 41            ∑y = 66            ∑xy = 461   ∑x² = 291   ∑y² = 742
$$ r = \frac{461 - \dfrac{41 \times 66}{6}}{\sqrt{\left(291 - \dfrac{(41)^2}{6}\right)\left(742 - \dfrac{(66)^2}{6}\right)}} = \frac{10}{\sqrt{10.83 \times 16}} $$

r ≈ 0.76
strong direct correlation
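The arithmetic of this example can be reproduced with a few lines of plain Python. This is a sketch rather than part of the original slides; the function name `pearson_r` is hypothetical, and it implements the computational formula given above on the age and weight data from the table.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r via the computational (sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Data from the worked example: six children.
age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

r = pearson_r(age, weight)
print(round(r, 2))  # 0.76
```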
EXAMPLE: Relationship between Anxiety and Test Scores
Anxiety (X)   Test score (Y)   X²          Y²          XY
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129
Calculating Correlation Coefficient

$$ r = \frac{(6)(129) - (32)(32)}{\sqrt{\left(6(230) - 32^2\right)\left(6(204) - 32^2\right)}} = \frac{774 - 1024}{\sqrt{(356)(200)}} = -0.94 $$

r = -0.94

Indirect strong correlation
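This example multiplies the numerator and denominator through by n, whereas the age and weight example divides by n inside each term. The sketch below, which is illustrative and not from the slides, evaluates both forms on the anxiety data to show that they agree; the variable names are arbitrary.

```python
from math import sqrt

# Data from the anxiety / test score table.
anxiety = [10, 8, 2, 1, 5, 6]
score = [2, 3, 9, 7, 6, 5]

n = len(anxiety)
sum_x, sum_y = sum(anxiety), sum(score)
sum_xy = sum(x * y for x, y in zip(anxiety, score))
sum_x2 = sum(x * x for x in anxiety)
sum_y2 = sum(y * y for y in score)

# Form used in this example: everything multiplied through by n.
r_scaled = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Form used in the age and weight example: divide by n inside each term.
r_divided = (sum_xy - sum_x * sum_y / n) / sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
)

print(round(r_scaled, 2), round(r_divided, 2))  # -0.94 -0.94
```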


Pearson Product Moment Correlation

consists of the covariance divided by the product of the standard deviations of the two variables.

a = (∑Y)/N    b = ∑xy / ∑x²

$$ r_{x,y} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} $$
Test scores and work experience of 5 executives are given below; determine the coefficient of correlation.
X1 = Test Score, X2 = Experience (years)

X1    X2
50    2
80    8
20    6
90    5
60    4

Using EXCEL: r = 0.204124
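The slide does not say which Excel feature produced this value (a function such as CORREL would be one option). As a cross-check, the sketch below computes the covariance divided by the product of the standard deviations in plain Python; the variable names are illustrative and the code is not part of the original deck.

```python
from math import sqrt

# Data from the table: test score (X1) and experience in years (X2).
test_score = [50, 80, 20, 90, 60]
experience = [2, 8, 6, 5, 4]

n = len(test_score)
mean_x1 = sum(test_score) / n
mean_x2 = sum(experience) / n

# Sample covariance and sample standard deviations (divisor n - 1).
cov = sum(
    (a - mean_x1) * (b - mean_x2) for a, b in zip(test_score, experience)
) / (n - 1)
s_x1 = sqrt(sum((a - mean_x1) ** 2 for a in test_score) / (n - 1))
s_x2 = sqrt(sum((b - mean_x2) ** 2 for b in experience) / (n - 1))

r = cov / (s_x1 * s_x2)
print(round(r, 6))  # 0.204124
```

The n - 1 divisors cancel in the ratio, so using population (divisor n) versions throughout would give the same r.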
Properties of “r”
• r is always between -1 and +1 inclusive. -1 means perfect negative linear correlation and +1 means perfect positive linear correlation.
• r only measures the strength of a linear relationship. There are other kinds of relationships besides linear.
• r has the same sign as the slope of the regression (best fit) line.
• r does not change if the independent (x) and dependent (y) variables are interchanged.
• r does not change if the scale on either variable is changed. You may add or subtract a value to/from all the x-values or y-values, or multiply or divide them by a positive value, without changing the value of r. (Multiplying or dividing by a negative value reverses the sign of r.)
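Some of these properties are easy to verify numerically. The following sketch is illustrative and not from the slides: it reuses the age and weight data from the earlier example, defines a hypothetical helper `pearson_r`, and checks symmetry and invariance under a change of scale and origin. It also shows one caveat: multiplying a variable by a negative constant flips the sign of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

r = pearson_r(age, weight)

# Interchanging x and y leaves r unchanged.
r_swapped = pearson_r(weight, age)

# Rescaling (positive factor) and shifting leaves r unchanged:
# age in months plus an arbitrary offset, weight in grams.
r_rescaled = pearson_r([12 * x + 3 for x in age], [1000 * y for y in weight])

# A negative multiplier reverses the sign of r.
r_negated = pearson_r([-x for x in age], weight)

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4), round(r_negated, 4))
```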
Spearman’s Rank Correlation Coefficient
Nonparametric correlation between two ordinal variables.
Rank correlation coefficient rs = 1 - [(6 ∑D²)/(N³ - N)]
where D (difference of ranks) = R1 - R2
N = total number of observations
Procedure
1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute the value of di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
4. Square each di and compute ∑di², which is the sum of the squared values.
5. Apply the following formula:

$$ r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

The value of rs denotes the magnitude and nature of association, giving the same interpretation as simple r.
Example
In a study of the relationship between level of education and income of 7 executives, the following data were obtained. Find the relationship between them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60
Answer:
Sample   (X)           (Y)   Rank X   Rank Y   di     di²
A        Preparatory   25    5        3        2      4
B        Primary       10    6        5.5      0.5    0.25
C        University    8     1.5      7        -5.5   30.25
D        Secondary     10    3.5      5.5      -2     4
E        Secondary     15    3.5      4        -0.5   0.25
F        Illiterate    50    7        2        5      25
G        University    60    1.5      1        0.5    0.25

∑di² = 64

$$ r_s = 1 - \frac{6 \times 64}{7(48)} \approx -0.14 $$

Comment:
There is an indirect weak correlation between level of education and income.
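The ranking procedure, including the shared ranks for ties, can be sketched in plain Python. This block is illustrative and not part of the slides. The numeric coding of education levels in `EDUCATION_ORDER` is an assumption (illiterate lowest, university highest) chosen to match the ranks in the answer table, and `average_ranks` is a hypothetical helper. The simple formula is applied as on the slide, although with tied ranks it is only an approximation.

```python
# Assumed ordering of education levels, lowest to highest.
EDUCATION_ORDER = {
    "illiterate": 0, "primary": 1, "preparatory": 2,
    "secondary": 3, "university": 4,
}

# Executives A to G, as in the example table.
education = ["preparatory", "primary", "university", "secondary",
             "secondary", "illiterate", "university"]
income = [25, 10, 8, 10, 15, 50, 60]


def average_ranks(values):
    """Rank so the largest value gets rank 1; ties share their mean rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # best rank within the tied group
        count = ordered.count(v)       # size of the tied group
        ranks.append(first + (count - 1) / 2)
    return ranks


rank_x = average_ranks([EDUCATION_ORDER[level] for level in education])
rank_y = average_ranks(income)

sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
n = len(income)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2, round(r_s, 2))  # 64.0 -0.14
```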
Causation
If there is a significant linear correlation between two variables, then one of five situations can be true.
• There is a direct cause and effect relationship
• There is a reverse cause and effect relationship
• The relationship may be caused by a third variable
• The relationship may be caused by complex interactions of several variables
• The relationship may be coincidental
Regression
• Regression
A method used to describe the relationship between two variables.
• Regression Line
- The best fit line.
- When there is significant linear correlation, we can use a line to estimate the value of the dependent variable for certain values of the independent variable.
When the regression equation should be used:
• There is significant linear correlation. (That is, when we reject the null hypothesis that rho = 0 in a correlation hypothesis test.)
• The value of the independent variable being used in the estimation is close to the original values, not much beyond their range. (That is, we should not use a regression equation obtained using x's between 10 and 20 to estimate y when x is 350.)
Regression (contd..)
• The regression equation should not be used with different populations. (That is, if x is the height of a male and y is the weight of a male, then you shouldn't use the regression equation to estimate the weight of a female.)
• The regression equation shouldn't be used to forecast values outside the time frame of the data. (That is, if the data are from the 1970s, the equation probably isn't valid for estimates in the 2000s.)
Regression Equation
• The regression equation is:
y' = a + bx
• ‘b’ is the slope of the regression line, ‘a’ is the y-intercept of the regression line. The regression line is sometimes called the "line of best fit" or the "best fit line".
• Since it "best fits" the data, it makes sense that the line passes through the means, that is, through the point (x̄, ȳ).
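A least-squares fit of y' = a + bx can be sketched as follows. This is illustrative code, not from the slides; `fit_line` is a hypothetical helper, and the data are the age and weight values from the earlier correlation example. It also checks that the fitted line passes through the point of means.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept: forces the line through the means
    return a, b


age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

a, b = fit_line(age, weight)
mean_age = sum(age) / len(age)
mean_weight = sum(weight) / len(weight)

# The prediction at the mean age equals the mean weight.
print(round(a, 3), round(b, 3), round(a + b * mean_age, 3))  # 4.692 0.923 11.0
```

The slope is positive, matching the positive sign of r found for the same data.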
Coefficient of Determination
The coefficient of determination is:
• the percent of the variation that can be explained by the regression equation
• the explained variation divided by the total variation
• the square of r
• Every sample has some variation in it. The total variation is made up of two parts: the part that can be explained by the regression equation and the part that can't be explained by the regression equation.
• The ratio of the explained variation to the total variation is a measure of how good the regression line is. If the regression line passed through every point on the scatter plot exactly, it would be able to explain all of the variation. The further the line is from the points, the less it is able to explain.
Total variation = unexplained variation (UV) + explained variation (EV)
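The decomposition of the total variation can be checked on a small data set. The sketch below is illustrative and not part of the slides; it fits a least-squares line to the age and weight data from the earlier example and compares the ratio of explained to total variation with the square of r. All variable names are arbitrary.

```python
from math import sqrt

age = [7, 6, 8, 5, 6, 9]
weight = [12, 8, 12, 10, 11, 13]

n = len(age)
mean_x, mean_y = sum(age) / n, sum(weight) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, weight))
s_xx = sum((x - mean_x) ** 2 for x in age)
s_yy = sum((y - mean_y) ** 2 for y in weight)

# Least-squares line y' = a + b*x and its predictions.
b = s_xy / s_xx
a = mean_y - b * mean_x
predicted = [a + b * x for x in age]

total_variation = sum((y - mean_y) ** 2 for y in weight)
explained_variation = sum((p - mean_y) ** 2 for p in predicted)
unexplained_variation = sum((y - p) ** 2 for y, p in zip(weight, predicted))

r = s_xy / sqrt(s_xx * s_yy)
r_squared = explained_variation / total_variation

print(round(r_squared, 2), round(r ** 2, 2))  # 0.58 0.58
```

Here roughly 58% of the variation in weight is explained by the line, and the remaining variation is left in the residuals.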
Applications
• Finance:
- Profit and sales revenue
- Return on stock and return of BSE/NIFTY
- Capital reserve and return on stock
• Marketing:
- Sales of any product and advertising budget
- Sales revenue and salary of executive
• Economics:
- Inflation and GDP
- Demand of any product and temperature
A Case on Multiple Regression
Modern Plastics Ltd. manufactures different types of plastic products. The company has taken the decision to train its supervisors. The company conducted an aptitude test for selection of supervisors. A random sample of five supervisors was selected who had a minimum of two years' experience. They were provided training for two weeks as Quality Control Inspectors, and after the completion of training their proficiency was measured by their output per shift. The output per shift (number of units) was recorded for the five supervisors as 20, 60, 30, 50 and 70 respectively.
Contd.
The values of test scores (X1) and experience (X2) in number of years are as under:

X1    X2
50    2
80    8
20    3
90    5
60    4

i) Determine the relationship between the performance of the Inspectors and their test scores as well as experience.

ii) Determine the correlation between test score and experience of Inspectors.

{Hint: Yc = a + b1 X1 + b2 X2

∑Y = Na + b1 ∑X1 + b2 ∑X2

∑X1Y = a ∑X1 + b1 ∑X1² + b2 ∑X1X2

∑X2Y = a ∑X2 + b1 ∑X1X2 + b2 ∑X2²}
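One way to work part (i) is to build the three normal equations from the hint and solve them for a, b1 and b2. The sketch below is illustrative, not the author's worked solution: it uses the case data (output per shift as Y, and the X1 and X2 values from the table above) and a small hypothetical Gaussian-elimination helper, `solve_linear_system`, written in plain Python.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into place.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


output = [20, 60, 30, 50, 70]       # Y: units per shift
test_score = [50, 80, 20, 90, 60]   # X1
experience = [2, 8, 3, 5, 4]        # X2
n = len(output)

# Coefficients of a, b1, b2 in the three normal equations from the hint.
normal_matrix = [
    [n, sum(test_score), sum(experience)],
    [sum(test_score), dot(test_score, test_score), dot(test_score, experience)],
    [sum(experience), dot(test_score, experience), dot(experience, experience)],
]
normal_rhs = [sum(output), dot(test_score, output), dot(experience, output)]

a, b1, b2 = solve_linear_system(normal_matrix, normal_rhs)
fitted = [a + b1 * x1 + b2 * x2 for x1, x2 in zip(test_score, experience)]
residuals = [y - f for y, f in zip(output, fitted)]
print(a, b1, b2)
```

With only five observations and three coefficients, the fitted values should be read as an exercise in the method rather than as a reliable model.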
FORECAST USING EXCEL
X     Y
40    2
45    1.8
50    2.3
52    2.5
60    3
65    3.4
68    3.1
72    4.2
80    4.8
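The slide does not show which Excel feature was used. Excel's FORECAST function fits a straight line by least squares and evaluates it at a new x, so a rough Python equivalent can be sketched as below. The function `forecast` and the value `x_new = 70` are hypothetical, chosen only for illustration; the argument order mirrors FORECAST(x, known_y's, known_x's).

```python
def forecast(x_new, known_ys, known_xs):
    """Predict y at x_new from a least-squares line through the data."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    s_xx = sum((x - mean_x) ** 2 for x in known_xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept + slope * x_new


# Data from the table above.
xs = [40, 45, 50, 52, 60, 65, 68, 72, 80]
ys = [2, 1.8, 2.3, 2.5, 3, 3.4, 3.1, 4.2, 4.8]

x_new = 70  # hypothetical value, inside the observed range of X

# The regression slides advise against estimating far outside the data range.
inside_range = min(xs) <= x_new <= max(xs)
predicted = forecast(x_new, ys, xs)
print(inside_range, round(predicted, 2))
```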
THANK YOU
