regression analysis
Primeasia University
Regression Analysis
Regression analysis is a statistical technique developed to study and measure the
statistical relationship between two or more variables, with a view to estimating or predicting the
value of the dependent variable for a known value of the independent variable.
Regression analysis is used to measure the cause-and-effect relationship between two variables, that is, how a change in one variable is associated with a change in the other.
Example
i. Income and expenditure of a class of people
ii. Fertilizer used and yield of various plots of land
iii. The price of a commodity and the quantity demanded
In regression analysis, there are two types of variables. They are known as
i. Dependent variable
ii. Independent Variable
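Dependent Variable
The variable whose value is influenced or is to be predicted is called the dependent variable. It is usually denoted by y.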
Independent Variable
The variable that influences the value of the dependent variable, or is used for prediction, is called the independent
variable. It is usually denoted by x. In regression analysis, the independent variable is also known as the
explanatory variable, predictor, regressor, control variable, or exogenous variable.
Example
i. If we want to know the expected weekly production of a company, then production is
the dependent variable, and the predictor (independent) variables could be capital,
the number of workers engaged, and the supply of raw materials.
ii. If we want to know the yield of a plot, then, within limits, the yield depends
on the kind and amount of fertilizer used. Hence the yield y is the
dependent variable and the fertilizer x is the independent variable.
Regression Model
If the dependent variable depends only on one explanatory/independent variable, then the
regression is called simple linear regression.
The simple linear regression model can be expressed by the following equation:
$y_i = \alpha + \beta x_i + \varepsilon_i$
If there is more than one explanatory variable, then the regression is called multiple linear
regression.
The multiple linear regression model can be expressed by the following equation:
$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$
Simple Linear Regression Model
In simple linear regression, the model used to describe the relationship between a single dependent
variable y and a single independent variable x is
$y_i = \alpha + \beta x_i + \varepsilon_i$
where,
$\alpha$ is the intercept
$\beta$ is the slope
$\varepsilon_i$ is the error term, which is normally and independently distributed with zero mean and variance
$\sigma^2$.
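To make the assumptions on the error term concrete, the following minimal sketch simulates observations from the model. The function name `simulate`, the parameter values for alpha, beta and sigma, and the seed are illustrative assumptions and are not taken from the notes.

```python
import random

def simulate(x_values, alpha, beta, sigma, seed=0):
    """Draw y_i = alpha + beta * x_i + eps_i with independent eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = [rng.gauss(0.0, sigma) for _ in x_values]
    ys = [alpha + beta * x + e for x, e in zip(x_values, errors)]
    return ys, errors

# Hypothetical parameter values chosen only for illustration.
xs = [i % 20 for i in range(10000)]
ys, errors = simulate(xs, alpha=5.0, beta=2.0, sigma=1.5)

mean_error = sum(errors) / len(errors)
print(f"mean of simulated errors: {mean_error:.4f}")  # should be close to 0
```

With many observations the average of the simulated errors settles near zero, which is what the zero-mean assumption describes.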
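The estimated regression line is
$\hat{y} = a + bx$
where,
a is the estimated value of $\alpha$, that is, the value of $\hat{y}$ when x = 0
b is the estimated value of $\beta$. It describes the average change in $\hat{y}$ for a one-unit change in the
independent variable x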
Error Term
The deviation of a particular observed value of y from its estimate $\hat{y}$ is called the error term or the random component. It is
written as $e_i = y_i - \hat{y}_i$.
Least Square Method
To find the best-fitting regression line, we have to find the values of a and b such that the error
sum of squares is minimized. This is done by the method of least squares, developed by Legendre.
$\min \sum (y_i - \hat{y}_i)^2$
where,
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
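For readers who want the intermediate step, a brief sketch of the standard derivation follows. The symbol S for the error sum of squares is introduced here for convenience and is not part of the original notation.

```latex
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

\frac{\partial S}{\partial a} = -2 \sum (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad \sum y = n a + b \sum x

\frac{\partial S}{\partial b} = -2 \sum x_i (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad \sum xy = a \sum x + b \sum x^2
```

These two normal equations are solved simultaneously for a and b, which yields closed-form estimates of the slope and the intercept.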
$b = \dfrac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$
and $a = \bar{y} - b\bar{x}$
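The formulas for b and a translate directly into a few lines of code. The sketch below is one possible implementation; the function name `fit_simple_regression` and the small data set are made up for illustration.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b) for y_hat = a + b*x using the sum formulas for b and a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = sum_x2 - sum_x ** 2 / n
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (sum_xy - sum_x * sum_y / n) / denom   # slope
    a = sum_y / n - b * sum_x / n              # intercept: y_bar - b * x_bar
    return a, b

# Made-up points lying exactly on the line y = 1 + 2x.
a, b = fit_simple_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Because the sample points lie exactly on a line, the function should recover that line's intercept and slope, which makes it an easy sanity check before using real data.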
Example: The following data relate to advertising expenditure and sales of a firm:
Advertising expenditure (x):  14  19  24  21  26  22  15  20  19
Sales in lakh Tk (y):         31  36  48  37  50  45  33  41  39
(i) Find the regression equation of sales on advertising expenditure and interpret a and b.
(iii) Find the estimated sales when the advertising expenditure is Taka 40 lakh.
Solution:
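The estimated regression equation has the form $\hat{y} = a + bx$.
Table for calculation of regression equation
Here, n = 9, $\sum x = 180$, $\sum y = 360$, $\sum xy = 7393$ and $\sum x^2 = 3720$, so
$\bar{x} = \frac{180}{9} = 20$
$\bar{y} = \frac{360}{9} = 40$
$b = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \dfrac{7393 - \frac{180 \times 360}{9}}{3720 - \frac{(180)^2}{9}} = \dfrac{7393 - 7200}{3720 - 3600} = \dfrac{193}{120} = 1.61$
$a = \bar{y} - b\bar{x} = 40 - 1.61 \times 20 = 7.8$
$\hat{y} = a + bx = 7.8 + 1.61x$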
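The calculation table itself can be rebuilt from the data. The following sketch prints the columns x, y, xy and x^2 with their totals and then repeats the arithmetic for b and a; the variable names are illustrative, and b is rounded to two decimals before a is computed, which is what the solution appears to do.

```python
# Data copied from the advertising example.
x = [14, 19, 24, 21, 26, 22, 15, 20, 19]
y = [31, 36, 48, 37, 50, 45, 33, 41, 39]
n = len(x)

# One row per observation: x, y, xy, x^2.
rows = [(xi, yi, xi * yi, xi * xi) for xi, yi in zip(x, y)]
print(f"{'x':>5}{'y':>5}{'xy':>7}{'x^2':>7}")
for row in rows:
    print(f"{row[0]:>5}{row[1]:>5}{row[2]:>7}{row[3]:>7}")

# Column totals.
sum_x, sum_y, sum_xy, sum_x2 = (sum(col) for col in zip(*rows))
print(f"{sum_x:>5}{sum_y:>5}{sum_xy:>7}{sum_x2:>7}  (totals)")

b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b_rounded = round(b, 2)                  # assumption: b is rounded before computing a
a = sum_y / n - b_rounded * sum_x / n    # a = y_bar - b * x_bar
print(f"b = {b_rounded}, a = {a:.1f}")
```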
Interpretation of a:
a = 7.8 means that, if there is no advertising expenditure, the estimated sales are 7.8 lakh Tk.
Interpretation of b:
b = 1.61 means that, for each one-lakh increase in advertising expenditure, sales increase by 1.61
lakh Tk on average.
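Part (iii) of the example is not worked out in the solution. Assuming advertising expenditure is measured in lakh Tk, as the interpretation of b suggests, substituting x = 40 into the fitted line gives a rough estimate:

```latex
\hat{y} = 7.8 + 1.61 \times 40 = 7.8 + 64.4 = 72.2
```

That is, estimated sales of about 72.2 lakh Tk. Since x = 40 lies well outside the observed range of 14 to 26, this figure is an extrapolation and should be read with caution.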
Coefficient of Determination
The coefficient of determination is the primary way we can measure the extent, or strength, of the
association that exists between two variables, X and Y. Statisticians interpret the coefficient of
determination by looking at the amount of the variation in Y that is explained by the regression
line. The coefficient of determination is defined by
$r^2 = 1 - \dfrac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$
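The phrase "variation in Y that is explained by the regression line" can be made precise with the usual sum-of-squares decomposition. The abbreviations SST, SSR and SSE are standard textbook names introduced here as an aid and do not appear in the original notes; the identity holds for a least-squares line fitted with an intercept.

```latex
\underbrace{\sum (y - \bar{y})^2}_{\text{SST}}
  = \underbrace{\sum (\hat{y} - \bar{y})^2}_{\text{SSR}}
  + \underbrace{\sum (y - \hat{y})^2}_{\text{SSE}}
\qquad\Longrightarrow\qquad
r^2 = 1 - \frac{\text{SSE}}{\text{SST}} = \frac{\text{SSR}}{\text{SST}}
```

Read this way, the coefficient of determination is the share of the total variation in y that the fitted line accounts for.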
Example: Calculate the coefficient of determination for the following data set
x:  12  30  15  24  14  18  28  26  19  27
y:  20  60  27  50  21  30  61  54  32  57
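Solution: We can use the following table to calculate the coefficient of determination:
Columns: $x_i$, $y_i$, $\hat{y} = -13.02 + 2.54x$, $(y - \hat{y})^2$, $(y - \bar{y})^2$
$\bar{y} = \dfrac{\sum y}{n} = \dfrac{412}{10} = 41.2$
$r^2 = 1 - \dfrac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} = 1 - \dfrac{56.097}{2505.6} = 1 - 0.0224 = 0.9776$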
Thus, we can conclude that the variation in the number of workers (the independent variable X)
explains 97.76 percent of the variation in the production of the Redwood Falls plant (the dependent
variable Y).
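The two sums used in this solution can be checked with a short script. This is only a sketch: the fitted line is taken as y_hat = -13.02 + 2.54x, with the signs following from a least-squares fit to these data, and the variable names are illustrative.

```python
x = [12, 30, 15, 24, 14, 18, 28, 26, 19, 27]
y = [20, 60, 27, 50, 21, 30, 61, 54, 32, 57]

a, b = -13.02, 2.54          # rounded coefficients of the fitted line (assumed signs)
y_bar = sum(y) / len(y)

y_hat = [a + b * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
r2 = 1 - sse / sst

print(f"y_bar = {y_bar:.1f}")
print(f"sum (y - y_hat)^2 = {sse:.3f}")
print(f"sum (y - y_bar)^2 = {sst:.1f}")
print(f"r^2 = {r2:.4f}")
```

Using the rounded coefficients matters here: an unrounded least-squares fit gives a slightly smaller residual sum, so the last digits of the intermediate figures depend on that rounding.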
N.B.: Suppose $r^2 = 0.93$; this implies that 93 percent of the variation in y is explained by the
variation in x.
$r = \sqrt{b \times d}$
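The notes do not define d. A common textbook reading, assumed here, is that b is the regression coefficient of y on x and d is the regression coefficient of x on y, so that r is their geometric mean and carries the sign of b. The sketch below checks this reading on the data of the preceding example; all variable names are hypothetical.

```python
import math

x = [12, 30, 15, 24, 14, 18, 28, 26, 19, 27]
y = [20, 60, 27, 50, 21, 30, 61, 54, 32, 57]
n = len(x)

# Corrected sums of squares and cross-products.
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
s_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n

b = s_xy / s_xx   # slope of y on x
d = s_xy / s_yy   # ASSUMPTION: d is the slope of x on y

# b and d always share the sign of s_xy, so b * d is non-negative.
r = math.copysign(math.sqrt(b * d), b)
r_direct = s_xy / math.sqrt(s_xx * s_yy)   # Pearson correlation, for comparison

print(f"b = {b:.4f}, d = {d:.4f}, r = {r:.4f}, direct r = {r_direct:.4f}")
```

Under this assumption the two values of r agree, and squaring r gives a coefficient of determination close to the one obtained in the example.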