
Correlationandregression1 200905162711

Uploaded by

titelord3000
Copyright
© © All Rights Reserved
Correlation and Regression

Correlation

• Definition: The extent (degree) of the linear relationship between two variables is called correlation.
• Correlation analysis is a statistical tool that measures the closeness or strength of the relationship between the variables.
• In correlation, the two variables are interdependent (they co-vary), and no distinction is made between independent and dependent variables. E.g. birth weight and maternal height, drug intake and number of days taken to cure.
• Correlation analysis not only establishes a relationship but also quantifies it. It cannot, however, indicate a cause-and-effect relationship between the two variables.
Types of Correlation

On the basis of the nature of the relationship between the variables, correlation can be categorized as

1. Positive and negative correlation

2. Simple, partial and multiple correlation

3. Linear and non-linear correlation


Positive correlation

• This correlation is also called direct correlation.
• In this, an increase or decrease in the value of one variable is associated with an increase or decrease in the value of the other.
• In this, both variables move in the same direction.
• E.g. number of tillers and plant yield in wheat, plant yield and number of pods, number of days and height of the plant, etc.
[Figure: chart of "No. of days" (vertical axis 0 to 70) against categories 1 to 5]
Negative correlation

• In this, an increase in one variable is associated with a decrease in the other variable.
• Here the two variables move in opposite directions.
• E.g. supply and price of a commodity. If the supply of the commodity is high, the price falls, and if the commodity is scarce, the price goes up. Here there is a negative relationship between supply and price.
[Figure: chart of "Supply (Tonnes)" (vertical axis 0 to 125) against categories 1 to 5]
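The direction of a relationship can be checked numerically from the sign of the sum of products of deviations from the means: it is positive when the variables move together and negative when they move in opposite directions. The following is a minimal sketch; the function name and all data values are hypothetical and were invented for illustration.

```python
def sum_of_deviation_products(xs, ys):
    """Sum of (x - mean_x) * (y - mean_y) over paired observations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration only.
tillers = [4, 6, 8, 10, 12]
plant_yield = [11, 15, 22, 24, 31]      # rises with the number of tillers
supply = [25, 50, 75, 100, 125]
price = [90, 72, 60, 41, 30]            # falls as supply rises

direct = sum_of_deviation_products(tillers, plant_yield)
inverse = sum_of_deviation_products(supply, price)

print("tillers vs yield:", "positive" if direct > 0 else "negative")
print("supply vs price:", "positive" if inverse > 0 else "negative")
```

Only the sign is meaningful here; the size of this sum depends on the units of measurement, which is why a standardized coefficient is needed to compare strengths.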
Types of Correlations
• Depending on the number of variables, correlation is classified into simple, partial and multiple correlation.
• 1. Simple correlation:
• Only two variables are involved, and they are considered together.
• E.g. yield of wheat and the amount (dose) of fertilizer.
• 2. Partial correlation:
• Three or more variables are recognized, but the relationship between only two of them is studied while the influence of the others is held constant.
• E.g. the relationship between the yield of maize and the amount of fertilizer applied is studied, while the effects of other variables such as pesticides, type of soil and availability of water are kept out of consideration.
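One common way to quantify a partial correlation is the first-order partial correlation coefficient, which removes the linear influence of a third variable Z from the correlation between X and Y. The sketch below is an illustration only and is not taken from the slides; the function name and the three input correlations are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, holding Z constant."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when X or Y is perfectly correlated with Z")
    return (r_xy - r_xz * r_yz) / denominator

# Hypothetical simple correlations, invented for illustration:
# X = maize yield, Y = fertilizer dose, Z = water availability.
r_yield_fert = 0.8
r_yield_water = 0.5
r_fert_water = 0.5

r_partial = partial_correlation(r_yield_fert, r_yield_water, r_fert_water)
print(round(r_partial, 3))
```

With these made-up inputs the partial coefficient is smaller than the simple correlation of 0.8, because part of the apparent yield and fertilizer relationship is shared with water availability.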
Multiple correlations

• In this, three or more variables are studied simultaneously.
• Multiple correlation measures the relationship between a dependent variable and two or more independent variables.
• Partial and multiple correlation are mainly associated with multivariate analysis.
• E.g. relationship between agricultural production, rainfall and quantity of fertilizer used.
Linear correlation

• Linear and non-linear correlation: the difference between these two is based on the ratio of change between the variables under study.
• Linear correlation: the changes in the two variables bear a constant ratio to each other.
• E.g. X = 30, 60, 90
• Y = 10, 20, 30
[Figure: chart with axis label "X" (vertical axis 0 to 113), categories 1 to 3]
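For the example values X = 30, 60, 90 and Y = 10, 20, 30, the constant-ratio property can be checked directly by comparing the change in Y with the change in X between consecutive observations. This is a small sketch; the helper name `change_ratios` is hypothetical.

```python
from fractions import Fraction

def change_ratios(xs, ys):
    """Ratio of the change in Y to the change in X for each consecutive pair."""
    return [
        Fraction(ys[i + 1] - ys[i], xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]

x_values = [30, 60, 90]
y_values = [10, 20, 30]

ratios = change_ratios(x_values, y_values)
is_linear = len(set(ratios)) == 1   # one distinct ratio means a constant ratio

print(ratios, is_linear)
```

Exact fractions are used instead of floating-point division so that equal ratios compare as equal.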
Non-linear correlation

• The amount of change in one variable does not bear a constant ratio to the change in the other related variable.
• E.g. if the use of fertilizer is doubled, the yield of the maize crop would not be exactly doubled.
[Figure: chart of "No. of days" (vertical axis 0 to 70) against categories 1 to 5]
Measures of correlation
• There are several measures of correlation, but the following three are important:

1. Scatter diagram

2. Graph method

3. Correlation coefficient
Scatter diagram

• This is the simplest method for checking whether there is any relationship between two variables, by plotting their values on a chart or graph.
• It is a visual representation of two variables by points (dots) on a graph.
• In a scatter diagram one variable is taken on the X-axis and the other on the Y-axis, and each pair of observations is represented as a point.
• It is called a scatter diagram because it shows how the plotted points are scattered.
Scatter diagram

• A scatter diagram gives a general idea about the existence and type of correlation between two variables, but it does not give a numerical value of the correlation.
• Depending on the extent of the relationship between two variables, a scatter diagram shows perfect positive correlation, perfect negative correlation, no correlation, high positive correlation or high negative correlation.
[Figure: scatter diagram, Y-axis 0 to 38, X-axis 23 to 113]
Merits of Scatter diagram
• Merits:
1. It is a simple method to find out the nature of correlation between two variables.
2. It is not influenced by extreme values.
3. It is easy to understand.
• Demerits:
1. It does not give a numerical value of correlation, so it cannot give the exact degree of correlation between two variables.
2. It is a subjective method.
3. It cannot be applied to qualitative data.
4. It is only the first step in finding out the strength of a correlation.
Correlation coefficient

• The scatter diagram and the graph method give only a rough idea about the relationship between two variables; they do not give a numerical measure of correlation.
• The degree of relationship can be established by calculating Karl Pearson’s coefficient, which is denoted by ‘r’.
• Definition: The coefficient of correlation ‘r’ is a measure of the strength of the linear relationship between the two variables X and Y.
Correlation coefficient

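• $r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$

• where X and Y = the two variables (by convention X is treated as the independent and Y as the dependent variable)

• $X - \bar{X}$ = deviation of X from its arithmetic mean (AM)

• $Y - \bar{Y}$ = deviation of Y from its arithmetic mean

• If r > 0 the correlation is positive, and if r < 0 the correlation is negative.

• If r = 0 the variables are not linearly related.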
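Pearson's coefficient divides the sum of products of deviations by the square root of the product of the two sums of squared deviations. A minimal sketch of that calculation follows; the function name `pearson_r` and the data are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 3))   # sums are 6, 10 and 6, so r = 6 / sqrt(60), about 0.775
```

The explicit check for zero variation is a design choice of this sketch: the denominator would otherwise be zero and the coefficient is not defined in that case.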


Correlation coefficient

• The larger the numerical value of ‘r’, the closer the relationship between the variables.
• If r = 1, there is a perfect positive relationship.

• If r = -1, there is a perfect negative relationship.

• In general, for |r| > 0.8 we can say that there is high correlation.

• If |r| is between 0.3 and 0.8, there is considerable correlation.

• If |r| < 0.3, we can say that there is negligible correlation.
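These rule-of-thumb bands can be written as a small lookup. The sketch below assumes the thresholds apply to the magnitude of r, so that strong negative values also count as high, and it assigns the boundary values 0.3 and 0.8 to the middle band; both choices are assumptions of this illustration, as is the function name.

```python
def describe_strength(r):
    """Label a correlation coefficient using the 0.3 and 0.8 rule of thumb."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)          # assumption: strength is judged on magnitude
    if size > 0.8:
        return "high"
    if size >= 0.3:
        return "considerable"
    return "negligible"

for value in (0.92, -0.85, 0.5, -0.3, 0.1):
    print(value, describe_strength(value))
```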
Characteristics of correlation coefficient

The value of r ranges between (-1) and (+1):


• If there is no linear relationship at all between the two variables, then the value is zero.
• On the other hand, if the relationship is perfect, which means that all the points on the scatter diagram fall on a straight line, the value of r is +1 or -1, depending on the direction of the line.
• Other values of r show an intermediate degree of relationship between the two variables.
Characteristics of correlation coefficient

Sign of the coefficient can be positive or negative:


• It is positive when the slope of the line is positive, and negative when the slope of the line is negative.
• If the value of Y increases as the value of X increases, the sign will be positive. On the other hand, if the value of Y decreases as the value of X increases, the slope will be negative, and so the coefficient of correlation will be negative.
Merits of Correlation coefficient

1. It is a numerical measure of correlation.

2. It gives a single value which summarizes the extent of the linear relationship.

3. It also indicates the type of correlation.

4. It depends on all the observations, so it gives a true picture.
Demerits of Correlation coefficient
1. It cannot be computed for qualitative data such as flower colour, honesty, beauty, intelligence, etc.
2. It measures only linear relationships; it fails to measure non-linear relationships.
3. It is difficult to calculate.
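The second demerit can be seen with a small constructed example: when Y is exactly the square of X over a range symmetric about zero, Y is completely determined by X, yet Pearson's coefficient is zero. The data below are artificial and the function name is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient for paired observations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Artificial data: a perfect but non-linear (quadratic) relationship.
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]   # [4, 1, 0, 1, 4]

r_curved = pearson_r(x, y)
print(r_curved)   # the positive and negative products cancel, giving 0.0
```

A value of r near zero therefore rules out only a linear relationship, which is one reason to inspect a scatter diagram before relying on the coefficient.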
Applications of correlation
• In agriculture, genetics, physiology, medicine, etc. correlation is used as a tool of analysis.
Agriculture:
• Correlation is widely used as a tool of analysis in agricultural sciences.
• E.g. to estimate the role of various factors such as fertilizers, irrigation and soil fertility in crop yield.
Physiology:
• Using regression and correlation analysis, the relationship between germination time and soil temperature, alkalinity of river water and growth of fungi, etc. can be estimated.
Applications of Correlations

Genetics:
• Correlation analysis finds a lot of application in genetics.
• For instance, when the correlation coefficient ‘r’ = 0, it indicates that the genes concerned are located far apart on the same chromosome.
• When r = 1, it indicates that the genes are linked. Thus, correlation analysis is very important in gene mapping.
Types of Correlations

• Depending on the extent of the relationship between two variables, scatter diagrams show perfect positive correlation, perfect negative correlation, no correlation, high positive correlation and high negative correlation.

Perfect correlation:
• All the points lie on a straight line.
• As the value on the X-axis increases, the value on the Y-axis also increases, and vice versa.
• E.g. height and biomass.
Types of Correlations
Perfect negative correlation:
• In this all the points lie on a straight line.
• As the value on X-axis increases, the value on Y-axis decreases
proportionately
• e.g. Water temperature and amount of dissolved oxygen.
No correlation:
• In this, no line can be drawn that passes through most of the plotted points; the points are totally scattered.
• Hence there is no correlation between the variables on the X- and Y-axes.
Types of Correlations

High positive correlation:
In this, most of the plotted points lie on the line and the others lie close to it.
High negative correlation:
The diagram shows high negative correlation when the line slopes downward (it makes an angle of more than 90° with the X-axis) and most of the points lie either on the straight line or in its close vicinity.
Regression
• This term was first used by Sir Francis Galton to describe the laws of human inheritance.
• Regression describes the linear relationship in quantitative terms.
• It is used to make predictions about one variable based on our knowledge of the other.
• Regression is divided into two categories: simple regression and multiple regression.
• Simple regression is concerned with two variables, while multiple regression is concerned with more than two variables.
• Simple regression is further classified into linear and non-linear regression.
Regression
• A linear regression is one in which a constant change in the dependent variable (Y) can be expected for a given change in the independent variable (X), irrespective of the value of X.
• In studying how the yield of wheat varies with the amount of fertilizer applied, yield is the dependent variable (Y) and fertilizer level is the independent variable (X).
• The starting point in regression is to illustrate the relationship between the dependent variable (e.g. weight) and the independent variable (e.g. age) with a scatter diagram.
Regression analysis

• Regression analysis is widely used for prediction and forecasting.
• It is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships.
• In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.
Linear regression

• In statistics, linear regression includes any approach to modelling the relationship between a scalar variable y and one or more variables denoted X, such that the model depends linearly on the unknown parameters to be estimated from the data.
• Such a model is called a “linear model”.
• Linear regression has many practical applications.
• This is because models that depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters.
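For the simple case of one explanatory variable, a linear model and its usual least-squares estimates can be written as below. The symbols $\beta_0$, $\beta_1$, $\varepsilon_i$, $b_0$ and $b_1$ are notation assumed for this illustration and do not appear in the slides; $\bar{x}$ and $\bar{y}$ are the sample means.

```latex
\[
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
\]
\[
  b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
  \qquad
  b_0 = \bar{y} - b_1 \bar{x}
\]
```

The model is linear in the unknown parameters $\beta_0$ and $\beta_1$, which is why the estimates $b_0$ and $b_1$ have a closed form.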
Applications of linear regression

• Linear regression is widely used in biological, behavioural and social sciences to describe possible relationships between variables.
• It ranks as one of the most important tools used in these disciplines.

Prediction or forecasting:
• Linear regression can be used to fit a predictive model to an observed data set
of y and X values.
• After developing such a model, if an additional value of X is then given without
its accompanying value of y, the fitted model can be used to make a prediction of
the value of y.
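The fit-then-predict workflow can be sketched for a single X variable using ordinary least squares. The function names and the fertilizer and yield figures below are hypothetical and were invented for illustration; they are not data from the slides.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    if sum_xx == 0:
        raise ValueError("X must take at least two different values")
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_new):
    """Predicted y for a new x value under the fitted line."""
    return intercept + slope * x_new

# Hypothetical observations: fertilizer dose (kg/ha) and wheat yield (t/ha).
fertilizer = [0, 20, 40, 60, 80]
wheat_yield = [2.0, 2.6, 3.1, 3.7, 4.1]

intercept, slope = fit_line(fertilizer, wheat_yield)
predicted = predict(intercept, slope, 50)   # a dose that was not observed

print(round(intercept, 3), round(slope, 4), round(predicted, 3))
```

Predictions of this kind are most trustworthy for X values inside the observed range (here 0 to 80); extrapolating beyond it assumes the straight-line pattern continues.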
Applications of linear regression
Epidemiology:
• Early evidence relating tobacco smoking to mortality and morbidity came
from observational studies employing regression analysis.
• In order to reduce spurious correlations when analyzing observational data,
researchers usually include several variables in their regression models in
addition to the variable of primary interest.
• For example, suppose we have a regression model in which cigarette smoking is
the independent variable of interest, and the dependent variable is lifespan
measured in years.
Applications of linear regression

Environmental science:
• Linear regression finds application in a wide range of environmental science problems.
• In Canada, the Environmental Effects Monitoring
Program uses statistical analyses on fish and benthic
surveys to measure the effects of pulp mill or metal
mine effluent on the aquatic ecosystem.
