Correlation
Class 12
Tanvi
What is correlation?
Correlation is a statistical measure that expresses the extent to
which two variables are linearly related (meaning they change
together at a constant rate). It’s a common tool for describing
simple relationships without making a statement about cause and
effect.
Graphical Representation of Correlation – Scatter Plots
Try drawing three lines across the data and consider which is most appropriate.
We can tell straight away that A is not the right line: the data appear to have a positive linear relationship, but A has a negative gradient. B has the correct sign for its gradient, and it passes through three points! However, there are many more points above the line than below it, and we should try to make sure the line of best fit passes through the centre of all the points. This means that line C is the best fit for this data out of the three lines.
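A common way to make "passes through the centre of all the points" precise is the least-squares criterion: the best line is the one with the smallest sum of squared vertical distances (SSE) from the points. The sketch below assumes made-up data (the original scatter plot is not reproduced here) and two hypothetical lines standing in for A and B; line C is computed by least squares. The helper names `sse`, `least_squares` and `above_below` are illustrative, not from any library.

```python
# Hypothetical data standing in for the scatter plot (not the original points).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1]


def sse(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def above_below(xs, ys, a, b):
    """Count points strictly above and strictly below y = a + b*x."""
    above = sum(1 for x, y in zip(xs, ys) if y > a + b * x)
    below = sum(1 for x, y in zip(xs, ys) if y < a + b * x)
    return above, below


def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b   # the line passes through (x_bar, y_bar)


# Hypothetical candidates in the spirit of lines A, B and C.
candidates = {
    "A (negative gradient)": (10.0, -1.0),
    "B (right sign, too low)": (0.5, 0.9),
    "C (least squares)": least_squares(xs, ys),
}
for name, (a, b) in candidates.items():
    print(f"{name}: SSE = {sse(xs, ys, a, b):.3f}, "
          f"above/below = {above_below(xs, ys, a, b)}")
```

Line B has every point above it, which is exactly the imbalance described in the text, and the least-squares line gives the smallest SSE of the three.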
DESCRIBING THE TREND
FORM
• If the data roughly follows a linear trend line, we can say the relationship is linear. If the data more closely follows a parabolic curve, we would say the relationship is parabolic. If the scatterplot just looks like one big blob, and you can’t really see any relationship in the data, then we would say there’s no relationship or correlation at all.
PARABOLIC CORRELATION
[Figure: scatter plots titled "Moderate linear relationship" and "Strong linear relationship"]
DIRECTION
STRENGTH:
• If the data is clustered tightly around its regression line, we might say it shows a strong linear relationship. If the data is loosely clustered, we might say it shows a moderate linear relationship. A weak linear relationship would be data that is spread out but still noticeably in the form of a trend line or curve.
NO CORRELATION:
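Formulae: Correlation
$$\operatorname{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;\sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}$$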
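Both formulae can be checked on a small data set. This is a minimal sketch in plain Python, assuming hypothetical data; `covariance` and `pearson_r` are illustrative helper names, not library functions. Note that `covariance` divides by n as in the formula above, whereas Python's `statistics.covariance` computes the sample covariance, dividing by n − 1.

```python
import math


def covariance(xs, ys):
    """Cov(x, y) = sum((x - x_bar) * (y - y_bar)) / n, with divisor n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def pearson_r(xs, ys):
    """r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2)), from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys))   # 1.2
print(pearson_r(xs, ys))    # about 0.775, i.e. sqrt(0.6)
```

The same r is obtained as Cov(x, y) divided by the product of the standard deviations of x and y (each computed with divisor n).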
TOPIC: LINEAR REGRESSION
• The regression line is a trend line we use to model a linear trend that we see in a scatterplot. Keep in mind, however, that some data will show a relationship that isn’t necessarily linear. For example, the relationship might follow the curve of a parabola, in which case the regression curve would be parabolic in nature.
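To see the difference between a linear and a parabolic regression curve, one can fit both by least squares and compare their sums of squared errors. The sketch below assumes hypothetical data lying exactly on the parabola y = 1 + 2x − 0.5x². It solves the three normal equations of y = a + bx + cx² (the parabolic analogue of the straight-line normal equations) by Cramer's rule; the helper names are illustrative only.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares y = a + b*x + c*x**2 from its normal equations:
    sum(y)     = a*n        + b*sum(x)   + c*sum(x^2)
    sum(x*y)   = a*sum(x)   + b*sum(x^2) + c*sum(x^3)
    sum(x^2*y) = a*sum(x^2) + b*sum(x^3) + c*sum(x^4)
    """
    def s(k):  # sum of x^k; s(0) equals n
        return sum(x ** k for x in xs)

    def t(k):  # sum of x^k * y
        return sum(x ** k * y for x, y in zip(xs, ys))

    A = [[s(0), s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    rhs = [t(0), t(1), t(2)]
    d = det3(A)
    coeffs = []
    for col in range(3):          # Cramer's rule: replace one column by rhs
        m = [row[:] for row in A]
        for i in range(3):
            m[i][col] = rhs[i]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)          # (a, b, c)


def fit_line(xs, ys):
    """Least-squares straight line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def sse(xs, ys, predict):
    """Sum of squared vertical distances between the data and a fitted curve."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]   # hypothetical parabolic data

a, b, c = fit_parabola(xs, ys)
la, lb = fit_line(xs, ys)
print("parabola:", a, b, c, "SSE =", sse(xs, ys, lambda x: a + b * x + c * x ** 2))
print("line:    ", la, lb, "SSE =", sse(xs, ys, lambda x: la + lb * x))
```

The parabola recovers the coefficients with an SSE of essentially zero, while the best straight line leaves a large SSE, which is why a parabolic regression curve suits such data.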
What is regression?
The correlation coefficient indicates the direction of covariation and the closeness of the linear relation between two variables. If two variables are related, the mathematical equation expressing their relation is called the regression equation. The regression equation gives the value of the dependent variable corresponding to any specified value of the independent variable. The cause-and-effect relationship is measured in regression analysis; that is, regression analysis identifies which variable is the cause and which is the effect. However, a cause-and-effect relationship can be measured only if the variables are correlated.
The differences between correlation and regression:
1. Correlation: Correlation is the relationship between variables. It is expressed numerically.
   Regression: The average relation between the variables is given as an equation.
2. Correlation: Between two variables, neither is identified as the independent or the dependent variable.
   Regression: One of the variables is the independent variable and the other is the dependent variable in any particular context.
3. Correlation: Correlation does not reveal a cause-and-effect relation. One variable need not be the cause and the other the effect.
   Regression: The independent variable may be the cause and the dependent variable the effect.
4. Correlation: Spurious or nonsense correlation is possible.
   Regression: There is no such possibility. Regression is considered only when the variables are related.
5. Correlation: The correlation coefficient is a number between -1 and +1.
   Regression: The two regression coefficients have the same sign, + or -. One of them can be greater than 1 numerically, but they cannot both be greater than 1 numerically at the same time.
Line of best fit (Normal Equations)
y on x: $y = a + bx$
$$\sum y = an + b\sum x, \qquad \sum xy = a\sum x + b\sum x^{2}$$
x on y: $x = a + by$
$$\sum x = an + b\sum y, \qquad \sum xy = a\sum y + b\sum y^{2}$$
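Each pair of normal equations is a 2 × 2 linear system in a and b. As a minimal sketch, assuming hypothetical data, the system can be solved by Cramer's rule; the same helper gives the x-on-y line by swapping the roles of x and y. `solve_normal_equations` is an illustrative name, not a library function.

```python
def solve_normal_equations(us, vs):
    """Solve the normal equations for the line v = a + b*u:
        sum(v)   = a*n      + b*sum(u)
        sum(u*v) = a*sum(u) + b*sum(u^2)
    by Cramer's rule on this 2x2 system."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    su2 = sum(u * u for u in us)
    det = n * su2 - su * su       # zero only if all u values are equal
    a = (sv * su2 - su * suv) / det
    b = (n * suv - su * sv) / det
    return a, b


xs = [1, 2, 3, 4, 5]              # hypothetical data
ys = [2, 4, 5, 4, 5]

a_yx, b_yx = solve_normal_equations(xs, ys)   # y on x: y = a + b*x
a_xy, b_xy = solve_normal_equations(ys, xs)   # x on y: x = a + b*y
print(f"y on x: y = {a_yx:.2f} + {b_yx:.2f}x")   # y on x: y = 2.20 + 0.60x
print(f"x on y: x = {a_xy:.2f} + {b_xy:.2f}y")   # x on y: x = -1.00 + 1.00y

# Estimate y from a known x, and x from a known y.
print(a_yx + b_yx * 6)            # estimated y when x = 6 (about 5.8)
print(a_xy + b_xy * 3)            # estimated x when y = 3
```

Note that the two lines are different: the x-on-y line is not obtained by rearranging the y-on-x line, because each minimises errors in a different direction.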
▪ Regression line of $y$ on $x$: $x$ is the independent variable and $y$ is the dependent variable, i.e., if $x$ is known, $y$ can be found from $y - \bar{y} = b_{yx}(x - \bar{x})$.
▪ Regression line of $x$ on $y$: $y$ is the independent variable and $x$ is the dependent variable, i.e., if $y$ is known, $x$ can be found from $x - \bar{x} = b_{xy}(y - \bar{y})$.
▪ Regression coefficients: these are the slopes of the regression lines.
  • $b_{yx}$, $b_{xy}$ and $r$ [or $\rho(x, y)$] are of the same sign.
  • $|r|$ is the geometric mean of $b_{yx}$ and $b_{xy}$, i.e. $r^{2} = b_{yx} \cdot b_{xy}$.
  • The two regression lines intersect at $(\bar{x}, \bar{y})$.
  • The two regression lines coincide only when there is a perfect linear relationship between $x$ and $y$, i.e. iff $r = \pm 1$.
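These properties can be checked numerically. This is a minimal sketch, assuming hypothetical data chosen so that one regression coefficient exceeds 1 (as allowed in row 5 of the comparison above); `regression_summary` and `intersection` are illustrative helper names. It uses the deviation forms b_yx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b_xy = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)², which give the same slopes as the normal equations.

```python
import math


def regression_summary(xs, ys):
    """Return b_yx, b_xy, r and the means x_bar, y_bar for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b_yx = sxy / sxx              # slope of the regression line of y on x
    b_xy = sxy / syy              # slope of the regression line of x on y
    r = sxy / math.sqrt(sxx * syy)
    return b_yx, b_xy, r, x_bar, y_bar


def intersection(b_yx, b_xy, x_bar, y_bar):
    """Intersect y - y_bar = b_yx(x - x_bar) with x - x_bar = b_xy(y - y_bar).
    Only defined when the lines are distinct, i.e. b_yx * b_xy != 1."""
    a1 = y_bar - b_yx * x_bar     # y = a1 + b_yx * x
    a2 = x_bar - b_xy * y_bar     # x = a2 + b_xy * y
    x = (a2 + b_xy * a1) / (1 - b_yx * b_xy)
    return x, a1 + b_yx * x


xs = [1, 2, 3, 4, 5]
ys = [6, 12, 15, 12, 15]          # hypothetical data
b_yx, b_xy, r, x_bar, y_bar = regression_summary(xs, ys)
print(b_yx, b_xy, r, r ** 2, b_yx * b_xy)   # b_yx > 1 here, but b_xy < 1
print(intersection(b_yx, b_xy, x_bar, y_bar), (x_bar, y_bar))

# Perfect linear relationship y = 3 - 2x: r = -1 and the lines coincide.
perfect = [3 - 2 * x for x in xs]
p_yx, p_xy, p_r, _, _ = regression_summary(xs, perfect)
# Written as y in terms of x, the x-on-y line has slope 1 / b_xy.
print(p_r, p_yx, 1 / p_xy)
```

In the first data set b_yx = 1.8 exceeds 1 while b_xy is about 0.33, their product equals r², and the lines meet at the means; in the perfect case both lines have slope -2 and coincide.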