Correlationandregression1 200905162711
Correlation
Linear and non-linear correlation are distinguished based on the ratio of change between the variables under study.
• Linear correlation: the change in one variable bears a constant ratio to the change in the other.
• E.g. X = 30, 60, 90 and Y = 10, 20, 30.
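The constant-ratio idea can be checked numerically. The following is a minimal sketch using the X and Y values from the example; the helper name `change_ratios` is an assumption made for illustration.

```python
# Values taken from the example: X = 30, 60, 90 and Y = 10, 20, 30.
x = [30, 60, 90]
y = [10, 20, 30]

def change_ratios(xs, ys):
    """Ratio of the change in y to the change in x between consecutive points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]

ratios = change_ratios(x, y)
# The relationship is linear when every ratio is (numerically) the same.
is_linear = all(abs(r - ratios[0]) < 1e-12 for r in ratios)
print(ratios, is_linear)
```

Here each step of 30 in X goes with a step of 10 in Y, so every ratio is one third.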
Non-linear correlation
The amount of change in one variable doesn't have a constant ratio to the change in the other related variable.
• E.g. If the use of fertilizer is doubled, the yield of the maize crop would not be exactly doubled.
[Figure: chart with axis label "No. of days"]
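As a rough illustration of a non-constant ratio, the sketch below uses invented fertilizer and yield figures (they are hypothetical, not measured data) in which the dose doubles at each step but the yield rises by less each time. The helper name `change_ratios` is also an assumption.

```python
# Hypothetical figures, invented for illustration only (not measured data).
fertilizer = [1, 2, 4, 8]        # dose, doubled at each step
maize_yield = [20, 30, 38, 42]   # yield rises, but by less each time

def change_ratios(xs, ys):
    """Ratio of the change in y to the change in x between consecutive points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]

ratios = change_ratios(fertilizer, maize_yield)
# A non-linear relationship shows up as ratios that differ from step to step.
is_constant = all(abs(r - ratios[0]) < 1e-12 for r in ratios)
print(ratios, is_constant)
```

With these made-up numbers the ratio falls from step to step, which is the pattern the fertilizer example describes.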
Measures of correlation
• There are several measures of correlation, but the following three are the most important:
1. Scatter diagram
2. Graph method
3. Correlation coefficient
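The slides do not say which correlation coefficient is meant. Assuming it is Pearson's r, it could be computed from first principles roughly as follows. The function name `pearson_r` is an assumption, and the data are the X = 30, 60, 90 and Y = 10, 20, 30 values from the linear correlation example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([30, 60, 90], [10, 20, 30])
print(round(r, 6))
```

Because these points lie exactly on a rising straight line, r comes out as 1 (up to floating-point rounding).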
Scatter diagram
• Y= dependent variable
Genetics:
• Correlation analysis finds many applications in genetics.
• For instance, when the correlation coefficient r = 0, it indicates that the genes concerned are located far apart on the same chromosome.
• When r = 1, it indicates that the genes are linked. Thus, correlation analysis is very important in gene mapping.
Types of Correlations
Perfect positive correlation:
• All the points lie on a straight line.
• As the value on the X-axis increases, the value on the Y-axis also increases proportionately, and vice versa.
• E.g. height and biomass.
Types of Correlations
Perfect negative correlation:
• All the points lie on a straight line.
• As the value on the X-axis increases, the value on the Y-axis decreases proportionately.
• E.g. water temperature and amount of dissolved oxygen.
No correlation:
• No line can be drawn that passes through most of the plotted points; the points are totally scattered.
• Hence there is no correlation between the variables on the X- and Y-axes.
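The three cases can be told apart by the sign and size of the correlation coefficient. The sketch below assumes Pearson's r and uses small hypothetical data sets chosen only to illustrate each case. The names `pearson_r`, `y_positive`, `y_negative` and `y_scattered` are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
# Hypothetical values, chosen only to illustrate each case.
y_positive = [2, 4, 6, 8, 10]   # rises in step with x
y_negative = [10, 8, 6, 4, 2]   # falls in step with x
y_scattered = [4, 1, 3, 5, 2]   # no linear trend

results = {
    "perfect positive": pearson_r(x, y_positive),
    "perfect negative": pearson_r(x, y_negative),
    "no correlation": pearson_r(x, y_scattered),
}
for label, r in results.items():
    print(f"{label}: r = {r:+.2f}")
```

Real data rarely give exactly +1, -1 or 0. Values close to these extremes are read the same way.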
Regression
• Regression is used to make predictions about one variable based on our knowledge of the other.
• Regression is divided into two categories: simple regression and multiple regression.
• Simple regression concerns two variables, while multiple regression concerns more than two variables.
• Simple regression is further classified into linear and non-linear regression.
Regression
• A linear regression is one in which a given change in the independent variable (X) is expected to produce the same change in the dependent variable (Y), irrespective of the value of X.
• In studying how the yield of wheat varies with the amount of fertilizer applied, yield is the dependent variable (Y) and fertilizer level is the independent variable (X).
• The starting point in regression is to illustrate the relationship between the dependent variable (e.g. weight) and the independent variable (e.g. age) with a scatter diagram.
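A simple linear regression line Y = a + bX is commonly fitted by ordinary least squares. The slides do not name a fitting method, so treat this as one possible sketch. The fertilizer and yield figures are hypothetical, and the function name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line Y = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: expected change in Y per unit of X
    a = mean_y - b * mean_x    # intercept: expected Y when X is zero
    return a, b

# Hypothetical figures, invented for illustration only (not measured data).
fertilizer = [0, 20, 40, 60, 80]          # X: fertilizer level
wheat_yield = [2.0, 2.5, 3.1, 3.4, 4.0]   # Y: yield

a, b = fit_line(fertilizer, wheat_yield)
# In a linear model the expected change in Y for a 20-unit step in X
# is the same wherever the step starts.
step_low = (a + b * 20) - (a + b * 0)
step_high = (a + b * 80) - (a + b * 60)
print(a, b, step_low, step_high)
```

The two step values agree, which is the sense in which the expected change in Y does not depend on where along the X scale the change occurs.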
Regression analysis
Prediction or forecasting:
• Linear regression can be used to fit a predictive model to an observed data set
of y and X values.
• After developing such a model, if an additional value of X is then given without
its accompanying value of y, the fitted model can be used to make a prediction of
the value of y.
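The fit-then-predict workflow might look like the sketch below. It assumes an ordinary least-squares fit. The observed values are hypothetical, and the names `fit_line` and `predict` are assumptions.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x_new):
    """Predict y for a new x using a fitted (intercept, slope) pair."""
    intercept, slope = model
    return intercept + slope * x_new

# Hypothetical observed data set of X and y values.
observed_x = [1, 2, 3, 4]
observed_y = [3, 5, 7, 9]

model = fit_line(observed_x, observed_y)
# A new X arrives without its accompanying y; the model supplies an estimate.
y_hat = predict(model, 5)
print(model, y_hat)
```

Predictions for X values far outside the observed range rely on the linear trend continuing, which the data alone cannot confirm.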
Applications of linear regression
Epidemiology:
• Early evidence relating tobacco smoking to mortality and morbidity came
from observational studies employing regression analysis.
• In order to reduce spurious correlations when analyzing observational data,
researchers usually include several variables in their regression models in
addition to the variable of primary interest.
• For example, suppose we have a regression model in which cigarette smoking is
the independent variable of interest, and the dependent variable is lifespan
measured in years.
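To illustrate why extra variables are included, the sketch below fits lifespan on smoking alone and then on smoking plus one additional covariate. Everything here is an assumption made for illustration: the data are synthetic, generated exactly from lifespan = 70 - 0.4 * cigarettes + 1.0 * covariate, the covariate is an unnamed hypothetical variable, and the helper names are invented. The fit uses the least-squares normal equations.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def least_squares(columns, y):
    """Fit y = b0 + b1*col1 + b2*col2 + ... via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic data, invented for illustration only (not real study data).
cigarettes = [0, 5, 10, 15, 20, 25]   # independent variable of interest
covariate = [12, 8, 10, 5, 7, 2]      # hypothetical additional variable
lifespan = [82, 76, 76, 69, 69, 62]   # = 70 - 0.4*cigarettes + 1.0*covariate

simple = least_squares([cigarettes], lifespan)
adjusted = least_squares([cigarettes, covariate], lifespan)
print("smoking slope, smoking only:  ", simple[1])
print("smoking slope, with covariate:", adjusted[1])
```

In this synthetic setup the covariate is associated with both smoking and lifespan, so the smoking-only slope overstates the effect, while the adjusted model recovers the -0.4 used to generate the data.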
Applications of linear regression
Environmental science:
• Linear regression finds application in a wide range of environmental science problems.
• In Canada, the Environmental Effects Monitoring
Program uses statistical analyses on fish and benthic
surveys to measure the effects of pulp mill or metal
mine effluent on the aquatic ecosystem.