
Correlation

ISC class 12

Uploaded by

adhlakhaparnika

MATHEMATICS

Class 12

Tanvi
What is correlation?
Correlation is a statistical measure that expresses the extent to
which two variables are linearly related (meaning they change
together at a constant rate). It’s a common tool for describing
simple relationships without making a statement about cause and
effect.

How is correlation measured?


The sample correlation coefficient, r, quantifies the strength of the
relationship. Correlations are also tested for statistical
significance.
LINEAR CORRELATION:

Graphical Representation of Correlation – Scatter Plots

• A scatterplot, also called a scatter graph or scatter diagram, is a
plot of the data points in a set.
• It plots data that takes two variables into account at the same time.
Here are some examples of scatterplots:
SCATTERPLOTS ACCOUNT FOR THE VALUES OF TWO VARIABLES AT ONE TIME

Try drawing three lines across the data and consider
which is most appropriate.
We can tell straight away that A is not the right line. This data
appears to have a positive linear relationship, but A has a negative
gradient. B has the correct sign for its gradient, and it passes
through three points! However, there are many more points above the
line than below it, and we should try to make sure the line of best
fit passes through the centre of all the points. This means that
line C is the best fit for this data out of the three lines.
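The "centre of all the points" idea can be made concrete by counting how many points fall above and below each candidate line and by totalling the squared vertical distances. The sketch below is only an illustration: the original figure's values are not available, so the data and the three lines labelled A, B and C are hypothetical, chosen to mirror the discussion above.

```python
# Illustrative data only (not taken from the figure): a roughly increasing trend.
points = [(1, 2.1), (2, 2.9), (3, 4.2), (4, 4.8), (5, 6.1), (6, 6.9)]

# Hypothetical candidate lines y = m*x + c, named after the lines in the text.
lines = {
    "A": (-1.0, 8.0),  # negative gradient: wrong direction for this data
    "B": (1.0, 0.0),   # correct sign, but every point lies above it
    "C": (1.0, 1.0),   # runs through the middle of the points
}

def summarize(m, c, pts):
    """Return (points above, points below, sum of squared vertical residuals)."""
    residuals = [y - (m * x + c) for x, y in pts]
    above = sum(1 for e in residuals if e > 0)
    below = sum(1 for e in residuals if e < 0)
    sse = sum(e * e for e in residuals)
    return above, below, sse

results = {name: summarize(m, c, points) for name, (m, c) in lines.items()}
best = min(results, key=lambda name: results[name][2])

for name, (above, below, sse) in results.items():
    print(f"line {name}: {above} above, {below} below, SSE = {sse:.2f}")
print("smallest sum of squared residuals:", best)
```

A balanced above/below count alone is not enough (the hypothetical line A is balanced too), which is why the sum of squared residuals is used to pick between the candidates.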
DESCRIBING THE TREND

• Whenever you describe a relationship in the data, you should describe
  • the form (linear, parabolic, sinusoidal, etc.),
  • the direction (positive, negative),
  • the strength (strong, weak), and
  • the outliers (outliers, no outliers).

FORM
• If the data roughly follows a linear trend line, we can say the relationship is
linear. If the data more closely follows a parabolic curve, we would say the
relationship is parabolic. If the scatterplot just looks like one big blob, and
you can’t really see any relationship in the data, then we would say there’s
no relationship or correlation at all.

PARABOLIC CORRELATION

Moderate linear relationship Strong linear relationship

DIRECTION

POSITIVE LINEAR CORRELATION NEGATIVE LINEAR CORRELATION

STRENGTH
• If the data is clustered tightly around its regression line, we might say it
shows a strong linear relationship. If the data is loosely clustered, we might
say it shows a moderate linear relationship. A weak linear relationship would be
data that is spread out but still noticeably in the form of a trend line or curve.

NO CORRELATION:

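Formulae : Correlation

$$\operatorname{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$

$$r \text{ or } \rho(x, y) = \frac{\operatorname{Cov}(x, y)}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$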
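As a quick check that the definition form and the raw-sums form of r agree, here is a minimal Python sketch. The five data pairs are made up purely for illustration, and the variable names are not from the original notes.

```python
import math

# Illustrative data only (hypothetical values).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Definition form: Cov(x, y) / sqrt(Var(x) * Var(y)), each with divisor n.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
var_x = sum((a - mean_x) ** 2 for a in x) / n
var_y = sum((b - mean_y) ** 2 for b in y) / n
r_definition = cov_xy / math.sqrt(var_x * var_y)

# Raw-sums form: (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)
r_raw = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

print(round(r_definition, 4), round(r_raw, 4))
```

Both expressions give the same value because the factors of n cancel between numerator and denominator; the raw-sums form simply avoids computing the means first.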
TOPIC: LINEAR REGRESSION
• The regression line is a trend line we use to
model a linear trend that we see in a scatterplot.
Some data, however, will show a relationship
that isn’t linear. For example, the relationship
might follow the curve of a parabola, in which
case the regression curve would be parabolic in
nature.

What is regression?
The correlation coefficient indicates the direction of covariation and
the closeness of the linear relation between two variables. If two
variables are related, the mathematical equation of their relation
is the regression equation. A regression equation gives the value of
the dependent variable corresponding to any specified value of the
independent variable. Regression analysis models a presumed
cause-and-effect relationship: it specifies which variable is treated
as the cause (independent) and which as the effect (dependent).
However, such a relationship can be measured only if the variables
are correlated.
The differences between correlation and regression:

| S. No. | Correlation | Regression |
|---|---|---|
| 1 | Correlation is the relationship between variables. It is expressed numerically. | The average relation between the variables is given as an equation. |
| 2 | Between two variables, neither is identified as the independent or dependent variable. | One of the variables is the independent variable and the other is the dependent variable in any particular context. |
| 3 | Correlation does not reveal the cause-and-effect relation. One variable need not be the cause and the other the effect. | The independent variable may be the "cause" and the dependent variable the "effect". |
| 4 | There can be spurious or nonsense correlation. | There is no such possibility. Regression is considered only when the variables are related. |
| 5 | The correlation coefficient is a number between -1 and +1. | The two regression coefficients have the same sign, + or -. One of them can be greater than 1 numerically, but both cannot be greater than 1 numerically at the same time. |
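Point 5 says that the two regression coefficients cannot both exceed 1 numerically. A short derivation shows why, using the standard-deviation form of the coefficients given later in these notes and assuming $\sigma_x, \sigma_y > 0$:

$$b_{yx}\, b_{xy} = \left(r\,\frac{\sigma_y}{\sigma_x}\right)\left(r\,\frac{\sigma_x}{\sigma_y}\right) = r^2 \le 1$$

So if $|b_{yx}| > 1$, then $|b_{xy}| = \dfrac{r^2}{|b_{yx}|} \le \dfrac{1}{|b_{yx}|} < 1$, and the same argument applies with the roles swapped.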
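Line of best fit (Normal Equations)

Regression line of y on x: $y = a + bx$

$$\sum y = an + b\sum x \quad\text{and}\quad \sum xy = a\sum x + b\sum x^2$$

Regression line of x on y: $x = a + by$

$$\sum x = an + b\sum y \quad\text{and}\quad \sum xy = a\sum y + b\sum y^2$$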
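The normal equations are two linear equations in the two unknowns a and b, so they can be solved directly. The following is a minimal sketch, assuming made-up data; the function name `fit_line` is hypothetical and not part of the original notes. Swapping the arguments gives the line of x on y.

```python
def fit_line(indep, dep):
    """Solve the normal equations for dep = a + b*indep and return (a, b)."""
    n = len(indep)
    s_i, s_d = sum(indep), sum(dep)
    s_id = sum(i * d for i, d in zip(indep, dep))
    s_ii = sum(i * i for i in indep)
    # Normal equations:  s_d = a*n + b*s_i   and   s_id = a*s_i + b*s_ii
    det = n * s_ii - s_i * s_i  # zero only when all independent values are equal
    if det == 0:
        raise ValueError("all independent values are identical; slope undefined")
    b = (n * s_id - s_i * s_d) / det
    a = (s_d - b * s_i) / n
    return a, b

# Illustrative data only (hypothetical values).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(x, y)  # y on x: y = a_yx + b_yx * x
a_xy, b_xy = fit_line(y, x)  # x on y: x = a_xy + b_xy * y

# Estimate y for a given x using the line of y on x.
y_at_6 = a_yx + b_yx * 6
print(a_yx, b_yx, a_xy, b_xy, y_at_6)
```

Note that a and b take different values for the two lines, which is why the sketch keeps them in separately named variables.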
▪ Regression line of y on x: x is the independent variable and y is the dependent variable, i.e. if x is
known, y can be found from $y - \bar{y} = b_{yx}(x - \bar{x})$.
▪ Regression line of x on y: y is the independent variable and x is the dependent variable, i.e. if y is
known, x can be found from $x - \bar{x} = b_{xy}(y - \bar{y})$.
▪ Regression coefficients: these are the slopes of the regression lines.

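Regression coefficient of y on x ($b_{yx}$) and regression coefficient of x on y ($b_{xy}$)

▪ $\sigma_x$, $\sigma_y$ are the standard deviations of x and y respectively and r is the coefficient of correlation:

$$b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x} = \frac{\operatorname{cov}(x, y)}{\sigma_x^2}, \qquad b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y} = \frac{\operatorname{cov}(x, y)}{\sigma_y^2}$$

▪ If x, y are small numbers (when original values are used):

$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2}$$

▪ If $x - \bar{x}$, $y - \bar{y}$ are small whole numbers (no fractions):

$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$

▪ If x, y are large numbers, take $u = x - A$, $v = y - B$, where A, B are assumed means:

$$b_{yx} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - \left(\sum u\right)^2}, \qquad b_{xy} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - \left(\sum v\right)^2}$$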
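The assumed-mean shortcut works because subtracting a constant from every x or every y does not change the regression coefficients. A small sketch can confirm this; the data, the assumed means A and B, and the helper name `coefficient` are all hypothetical choices made for illustration.

```python
def coefficient(indep, dep):
    """Regression coefficient of dep on indep from raw sums:
    (n*S_id - S_i*S_d) / (n*S_ii - S_i^2)."""
    n = len(indep)
    s_i, s_d = sum(indep), sum(dep)
    s_id = sum(i * d for i, d in zip(indep, dep))
    s_ii = sum(i * i for i in indep)
    return (n * s_id - s_i * s_d) / (n * s_ii - s_i * s_i)

# Illustrative large-valued data (hypothetical).
x = [1001, 1002, 1003, 1004, 1005]
y = [502, 504, 505, 504, 505]

# Assumed means chosen by eye; any constants would work.
A, B = 1003, 504
u = [xi - A for xi in x]
v = [yi - B for yi in y]

b_yx_raw = coefficient(x, y)      # from the original values
b_yx_shifted = coefficient(u, v)  # from the shifted values u, v
b_xy_raw = coefficient(y, x)
b_xy_shifted = coefficient(v, u)

print(b_yx_raw, b_yx_shifted, b_xy_raw, b_xy_shifted)
```

The shifted sums are far smaller (here the sums of u and v are both zero), which is the point of the shortcut when working by hand.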
IMPORTANT properties of lines of regression

▪ $b_{yx}$, $b_{xy}$ and $r$ [or $\rho(x, y)$] are of the same sign.
▪ $|r|$ is the geometric mean of $b_{yx}$ and $b_{xy}$, i.e. $r^2 = b_{yx} \cdot b_{xy}$.
▪ The two regression lines intersect at $(\bar{x}, \bar{y})$.
▪ The two regression lines will coincide only when there is a
perfect linear relationship between x and y, i.e. iff $r = \pm 1$.
▪ If $r = 0$, then the lines of regression are parallel to the
coordinate axes.
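The first three properties can be checked numerically on a small data set. This is only a sketch with made-up data and hypothetical variable names; it solves the two regression lines as a pair of simultaneous equations (by Cramer's rule) to find where they meet.

```python
import math

# Illustrative data only (hypothetical values).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

b_yx = s_xy / s_xx
b_xy = s_xy / s_yy
r = s_xy / math.sqrt(s_xx * s_yy)

# Property: b_yx, b_xy and r share the same sign.
same_sign = (b_yx > 0) == (b_xy > 0) == (r > 0)

# Property: r^2 equals the product of the two regression coefficients.
product_gap = abs(r * r - b_yx * b_xy)

# Property: the lines meet at (mean_x, mean_y).
# Line of y on x:  -b_yx*X + Y = mean_y - b_yx*mean_x
# Line of x on y:   X - b_xy*Y = mean_x - b_xy*mean_y
c1 = mean_y - b_yx * mean_x
c2 = mean_x - b_xy * mean_y
det = b_yx * b_xy - 1  # equals r^2 - 1, nonzero unless r = +1 or -1
X = (c1 * (-b_xy) - c2) / det
Y = (-b_yx * c2 - c1) / det

print(same_sign, product_gap, (X, Y), (mean_x, mean_y))
```

When r = ±1 the determinant is zero, which matches the property that the two lines coincide and therefore have no single intersection point.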
TRY this problem
Happy problem solving!

Tanvi
