Correlation and Regression
Correlation shows the association or relationship between two variables, whereas regression estimates the value of one variable
from the value of the other.
Bivariate Data
Bivariate data are data collected for two variables at the same point of time.
A bivariate frequency distribution gives rise to marginal and conditional distributions.
For a p × q bivariate frequency distribution, the number of cells is pq.
Some cells may have zero frequency.
A p × q distribution has 2 marginal distributions and p + q conditional distributions.
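The counts above can be checked on a small table. The following is a minimal sketch, assuming a made-up 2 × 3 frequency table (the numbers and variable names are illustrative only, not data from these notes):

```python
# Hypothetical 2 x 3 bivariate frequency table:
# rows are classes of x, columns are classes of y.
table = [
    [4, 0, 6],
    [5, 3, 2],
]

p, q = len(table), len(table[0])
cells = p * q  # number of cells in a p x q table

# The two marginal distributions: totals over y (one per row) and over x (one per column).
marginal_x = [sum(row) for row in table]
marginal_y = [sum(col) for col in zip(*table)]

# Conditional distributions: one per row (y given a class of x)
# and one per column (x given a class of y), so p + q in total.
conditional_y_given_x = [list(row) for row in table]
conditional_x_given_y = [list(col) for col in zip(*table)]
n_conditional = len(conditional_y_given_x) + len(conditional_x_given_y)

print(cells, marginal_x, marginal_y, n_conditional)
```

Note that one cell in this table has zero frequency, which does not change any of the counts.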
Correlation
When a change in one variable is accompanied by a corresponding change in the other variable, either in the same direction or in
the opposite direction, the two variables are said to be associated or correlated. Correlation analysis aims at establishing a
relation between two variables and measuring the extent of that relation.
There are two types of correlation:
1) Positive correlation 2) Negative correlation
The value of the correlation coefficient lies between -1 and 1: -1 means perfect negative correlation, a value between -1 and 0
means negative correlation, 0 means no correlation, a value between 0 and 1 means positive correlation, and +1 means perfect
positive correlation.
Scatter diagram: this is a simple diagrammatic method of establishing correlation between a pair of variables, and it can be used
for both linear and non-linear (curvilinear) relationships.
Karl Pearson’s Product moment correlation coefficient
The coefficient of correlation is a unit-free measure.
The magnitude of the coefficient of correlation is unaffected by a change of origin or scale, but its sign depends on the signs of
the scale factors. If both variables are scaled by factors of the same sign, r remains the same; if the scale factors differ in
sign, the sign of r is reversed.
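The effect of a change of origin and scale can be written out explicitly. As a sketch, assume new variables u and v are formed from x and y using constants a, b, c and d (symbols introduced here only for illustration, with b and d non-zero):

```latex
u = \frac{x - a}{b}, \qquad v = \frac{y - c}{d}
\quad\Longrightarrow\quad
r_{uv} = \frac{b\,d}{|b|\,|d|}\; r_{xy}
```

The origins a and c drop out entirely. The factor bd/(|b||d|) is +1 when b and d have the same sign and -1 when they differ, so only the sign of r can change, never its magnitude.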
The coefficient of correlation always lies between -1 and 1, including both limiting values.
Spurious correlation is a correlation between two variables that have no causal relation.
The product moment correlation coefficient is used to find both the nature and the amount of correlation.
It may be defined as the ratio of the covariance between the two variables to the product of their standard deviations.
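$$r = r_{xy} = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}$$
where
$$\operatorname{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n} \quad \text{or} \quad \operatorname{Cov}(x, y) = \frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y}$$
and
$$\sigma_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}, \qquad \sigma_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n}}$$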
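The definition of r as covariance divided by the product of the standard deviations can be sketched in a few lines. This is a minimal illustration, assuming divide-by-n (population) moments; the function name `pearson_r` and the sample data are invented for the example:

```python
import math

def pearson_r(x, y):
    """Product moment correlation: Cov(x, y) / (sd_x * sd_y), all moments divided by n."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (sd_x * sd_y)

# Illustrative data (not from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
```

Here Cov(x, y) = 1.2, the variances are 2 and 1.2, and so r = 1.2 / sqrt(2.4), a fairly strong positive correlation.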
Spearman’s rank correlation:
To find the correlation between two attributes, we use
$$r_R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where $r_R$ denotes the rank correlation coefficient, which lies between -1 and 1 inclusive of these two values, n is the number of
individuals ranked, and d = Rx - Ry is the difference between the ranks Rx and Ry given to the same individual in the x and y
series. Ranks are given in descending order: rank 1 is given to the highest value, and so on.
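The ranking rule and the formula 1 - 6Σd²/(n(n² - 1)) can be tried out as follows. This is a rough sketch that assumes no tied values; the function names and the two series of marks are made up for illustration:

```python
def rank_descending(values):
    """Return ranks with 1 for the highest value. Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Rank correlation 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied data."""
    n = len(x)
    rx = rank_descending(x)
    ry = rank_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Illustrative marks given by two judges to five candidates.
x = [86, 60, 72, 95, 51]
y = [80, 65, 58, 90, 70]
ranks_x = rank_descending(x)  # [2, 4, 3, 1, 5]
ranks_y = rank_descending(y)  # [2, 4, 5, 1, 3]
rho = spearman_no_ties(x, y)
print(round(rho, 4))  # 0.6
```

The rank differences are 0, 0, -2, 0, 2, so Σd² = 8 and the coefficient is 1 - 48/120 = 0.6.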
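When two or more individuals receive the same rank (tied ranks), the formula is adjusted to
$$r_R = 1 - \frac{6\left[\sum d^2 + \sum \dfrac{m^3 - m}{12}\right]}{n(n^2 - 1)}$$
where m is the number of individuals sharing a tied rank. One term (m^3 - m)/12 is added for every group of tied ranks, in either
series.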
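The tie adjustment can be sketched as below. The code assumes the common convention that tied values share the average of the ranks they would otherwise occupy, which these notes do not state explicitly; function names and data are illustrative only:

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = highest; tied values share the average of the positions they occupy."""
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(x, y):
    """Rank correlation with the (m^3 - m) / 12 correction added to sum(d^2)."""
    n = len(x)
    rx = average_ranks(x)
    ry = average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

# Illustrative data: the value 20 occurs twice in x, so m = 2 for that group.
x = [10, 20, 20, 30]
y = [1, 2, 3, 4]
rho_tied = spearman_with_ties(x, y)
print(round(rho_tied, 4))  # 0.9
```

In this example the x ranks are 4, 2.5, 2.5, 1, giving Σd² = 0.5, and the single tie contributes (2³ - 2)/12 = 0.5, so the coefficient is 1 - 6(1)/60 = 0.9.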
Coefficient of concurrent deviations:
It is the quickest method of finding the correlation between two variables.
$$r_c = \pm \sqrt{\pm \frac{2c - m}{m}}$$
If (2c - m) > 0, we take the positive sign both inside and outside the radical sign, and if (2c - m) < 0, we take the negative
sign both inside and outside the radical sign.
c = number of concurrent deviations, i.e. the number of positive signs among the products of the deviations of x and y
m = n - 1, the number of pairs of deviations
n = number of observations
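The sign-counting procedure can be sketched as follows. Each value is compared with the one before it, and c counts the pairs where both series move in the same direction. Treating a "no change" step as non-concurrent is an assumption of this sketch, and the function name and data are invented for illustration:

```python
import math

def concurrent_deviation_coefficient(x, y):
    """r_c = +/- sqrt(+/- (2c - m) / m), with the same sign inside and outside the root."""
    def signs(series):
        # +1 for a rise, -1 for a fall, 0 for no change from the previous value.
        return [(b > a) - (b < a) for a, b in zip(series, series[1:])]

    sx, sy = signs(x), signs(y)
    m = len(sx)  # number of pairs of deviations, n - 1
    if m == 0:
        raise ValueError("need at least two observations")
    # c = pairs whose deviations have the same sign (positive product).
    c = sum(1 for a, b in zip(sx, sy) if a * b > 0)
    inner = (2 * c - m) / m
    sign = 1 if inner >= 0 else -1
    return sign * math.sqrt(sign * inner)

# Illustrative series of six observations.
x = [10, 12, 11, 15, 18, 17]
y = [5, 7, 6, 8, 7, 6]
rc = concurrent_deviation_coefficient(x, y)
print(round(rc, 4))  # 0.7746
```

Here m = 5 and the deviations agree in direction four times, so c = 4 and r_c = +sqrt(3/5).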
Regression
In regression analysis, we are concerned with estimating the value of one variable for a given value of another variable by
establishing a mathematical relationship between the two variables. The method used for deriving the regression equations is known
as the least squares method.
In the regression equation of y on x, y = a + bx, a and b are the regression parameters, estimated by the method of least squares;
b is written b_yx and is called the regression coefficient of y on x. The regression coefficient also represents the slope of the
regression line.
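A least squares fit of y on x can be sketched with the standard closed-form estimates b_yx = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b_yx x̄. The function name `fit_y_on_x` and the data are assumptions made for the example:

```python
def fit_y_on_x(x, y):
    """Return (a, b) for the least squares line y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b = sxy / sxx           # regression coefficient b_yx (slope)
    a = mean_y - b * mean_x  # intercept
    return a, b

# Illustrative data (not from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(x, y)
estimate_at_6 = a + b * 6  # estimated y for x = 6
print(round(a, 4), round(b, 4), round(estimate_at_6, 4))  # 2.2 0.6 5.8
```

The fitted line is y = 2.2 + 0.6x, so each unit increase in x raises the estimated y by 0.6.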
Probable error:
It is used to obtain limits for the correlation coefficient of the population. It is defined as
P.E. = 0.6745 × S.E. (the constant is often rounded to 0.674)
where S.E. is the standard error of the correlation coefficient,
$$\text{S.E.} = \frac{1 - r^2}{\sqrt{n}}$$
The limits of the population correlation coefficient are r ± P.E.
If r > 6 P.E., the presence of correlation is certain.
If r < P.E., there is no evidence of correlation.
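The probable error rules can be sketched as below, assuming the usual standard error S.E. = (1 - r²)/√n and the constant 0.6745. Comparing |r| rather than r against P.E., and the label for the in-between case, are choices of this sketch rather than something stated in the notes:

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    standard_error = (1 - r ** 2) / math.sqrt(n)
    return 0.6745 * standard_error

def interpret(r, n):
    """Apply the two rules: |r| > 6 P.E. and |r| < P.E."""
    pe = probable_error(r, n)
    if abs(r) > 6 * pe:
        return "correlation is certain"
    if abs(r) < pe:
        return "no evidence of correlation"
    return "inconclusive"

# Illustrative values: r = 0.8 from n = 25 pairs.
r, n = 0.8, 25
pe = probable_error(r, n)
limits = (r - pe, r + pe)
print(round(pe, 4), interpret(r, n))
```

With r = 0.8 and n = 25, S.E. = 0.36/5 = 0.072 and P.E. is about 0.0486, so r exceeds 6 P.E. and the limits are roughly 0.751 to 0.849.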