
regression analysis

Regression analysis is a statistical method used to estimate the relationship between dependent and independent variables, allowing for predictions based on known values. It includes simple and multiple linear regression models, where the dependent variable is influenced by one or more independent variables. Key concepts include the regression equation, error term, least squares method, and the coefficient of determination, which measures the strength of the relationship between the variables.

Uploaded by

sadia

Department of Basic Science

Primeasia University

Regression Analysis
Regression analysis is a statistical technique developed to study and measure the statistical relationship among two or more variables, with a view to estimating or predicting the value of the dependent variable for a known value of the independent variable.

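Regression analysis measures how the dependent variable changes as the independent variable changes. This is often read as a cause-and-effect relationship, although a fitted regression by itself does not prove causation.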

Example
i. Income and expenditure of a class of people
ii. Fertilizer used and yield of various plots of land
iii. The price of a commodity and the quantity demanded

Some Terms Regarding Regression Analysis

In regression analysis, there are two types of variables:

i. Dependent variable
ii. Independent variable

Dependent Variable

The variable whose value is influenced or is to be predicted is called the dependent variable. It is usually denoted by y. The dependent variable is also known as the explained variable, predictand, regressand, response or endogenous variable.

Independent Variable

The variable that influences the value of the dependent variable, or that is used for prediction, is called the independent variable. It is usually denoted by x. In regression analysis, the independent variable is also known as the explanatory variable, predictor, regressor, control or exogenous variable.

Example

i. If we want to know the expected weekly production of a company, then production is the dependent variable, and the independent (predictor) variables could be capital, the number of workers employed and the supply of raw materials.
ii. The yield of a plot depends, within limits, on the kind and amount of fertilizer used. Hence the yield y is the dependent variable and the amount of fertilizer x is the independent variable.

Regression Model

A regression model is used to investigate the relationship between two or more variables and to estimate one variable based on the others.

Types of Regression Model:

There are two types of regression model:

i. Simple linear regression model
ii. Multiple linear regression model

Simple Linear Regression Model

If the dependent variable depends only on one explanatory/independent variable, then the
regression is called simple linear regression.

The simple linear regression model can be expressed by the following equation:

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

Here y is the dependent variable and x is the independent variable.

Multiple Linear Regression Model

If there is more than one explanatory variable, then the regression is called multiple linear regression.

The multiple linear regression model can be expressed by the following equation:

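$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$$

Here $x_{1i}, x_{2i}, \ldots, x_{ki}$ are the values of the k explanatory variables for the ith observation.

Simple Linear Regression Model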

In simple linear regression, the model used to describe the relationship between a single dependent
variable y and a single independent variable x is

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

where

$y_i$ is the dependent variable

$x_i$ is the independent variable

$\alpha$ and $\beta$ are the model parameters

$\alpha$ is the intercept

$\beta$ is the slope

$\varepsilon_i$ is the error term, which is normally and independently distributed with zero mean and variance $\sigma^2$.
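
To make the model concrete, the sketch below generates observations from $y_i = \alpha + \beta x_i + \varepsilon_i$ with independent normal errors. The function name `simulate_simple_regression` and the parameter values in the usage lines are illustrative assumptions, not values taken from these notes.

```python
import random


def simulate_simple_regression(xs, alpha, beta, sigma, seed=None):
    """Draw y_i = alpha + beta * x_i + e_i with e_i ~ Normal(0, sigma^2), independently."""
    rng = random.Random(seed)  # local generator, so a given seed reproduces the same draws
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameters, chosen only for illustration.
xs = [10, 15, 20, 25, 30]
ys = simulate_simple_regression(xs, alpha=5.0, beta=1.5, sigma=2.0, seed=42)
for x, y in zip(xs, ys):
    print(x, round(y, 2))
```

With `sigma=0` every point falls exactly on the line $\alpha + \beta x$; a larger `sigma` scatters the points more widely around it.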

Estimated Linear Regression Equation

An estimated linear regression equation can be written as

$$\hat{y} = a + bx$$

where

$\hat{y}$ is the estimated value of y

a is the estimated value of $\alpha$. It is the estimated value of y when x = 0.

b is the estimated value of $\beta$. It describes the average change in $\hat{y}$ for a one-unit change in the independent variable x.

x is the chosen value of the independent variable.

Error Term

The deviation of an observed value of y from its estimate $\hat{y}$ is called the error term or the random component (it is also known as the residual). For the ith observation it is written as $e_i = y_i - \hat{y}_i$.

Least Squares Method

To find the best-fitting regression line, we have to choose the values of a and b so that the error sum of squares is a minimum. This is done by the method of least squares, developed by Legendre:

$$\min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where

$y_i$ = observed value of the dependent variable for the ith observation

$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
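
The minimization can be written out explicitly. The derivation below is a standard calculus sketch added for illustration; the symbol $S$ for the error sum of squares is introduced here and does not appear elsewhere in these notes.

```latex
\begin{align}
S(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0 \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0 \\
\sum y &= n a + b \sum x \\
\sum xy &= a \sum x + b \sum x^2
\end{align}
```

Solving the last two equations (the normal equations) for a and b gives the slope and intercept of the least squares line.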

Slope of the estimated regression equation:

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$

and the intercept:

$$a = \bar{y} - b\bar{x}$$
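
The slope and intercept formulas translate directly into code. The following is a minimal Python sketch; the function name `fit_line` is an illustrative choice rather than something defined in these notes.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = sum_x2 - sum_x ** 2 / n          # sum(x^2) - (sum x)^2 / n
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (sum_xy - sum_x * sum_y / n) / denom  # slope
    a = sum_y / n - b * sum_x / n             # intercept: y_bar - b * x_bar
    return a, b


# Points lying exactly on y = 1 + 2x recover a = 1 and b = 2.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

The guard on `denom` covers the case where every x is the same, in which the formula would divide by zero.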

Example: The following data relate to the advertising expenditure and sales of a firm (both in lakh Tk):

Advertising expenditure (x)   14  19  24  21  26  22  15  20  19
Sales (y)                     31  36  48  37  50  45  33  41  39

(i) Find the regression equation of sales on advertising expenditure and interpret a and b.

(ii) Estimate the sales when the advertising expenditure is Tk 40 lakh.

Solution:

(i) The best-fitting regression equation of sales y on advertising expenditure x is

$$\hat{y} = a + bx$$

Table for calculation of the regression equation

x (advertising)   y (sales)   x²          y²            xy
14                31          196         961           434
19                36          361         1296          684
24                48          576         2304          1152
21                37          441         1369          777
26                50          676         2500          1300
22                45          484         2025          990
15                33          225         1089          495
20                41          400         1681          820
19                39          361         1521          741
Σx = 180          Σy = 360    Σx² = 3720  Σy² = 14746   Σxy = 7393

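Here,

$$\bar{x} = \frac{\sum x}{n} = \frac{180}{9} = 20$$

$$\bar{y} = \frac{\sum y}{n} = \frac{360}{9} = 40$$

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{7393 - \dfrac{180 \times 360}{9}}{3720 - \dfrac{(180)^2}{9}} = \frac{7393 - 7200}{3720 - 3600} = \frac{193}{120} \approx 1.61$$

$$a = \bar{y} - b\bar{x} = 40 - 1.61 \times 20 = 40 - 32.2 = 7.8$$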

Hence the required regression equation of sales on advertising expenditure is

$$\hat{y} = a + bx = 7.8 + 1.61x$$

Interpretation of a:

a = 7.8 means that, if there is no advertising expenditure, the estimated sales are Tk 7.8 lakh.

Interpretation of b:

b = 1.61 means that, for each additional Tk 1 lakh of advertising expenditure, sales increase by Tk 1.61 lakh on average.

(ii) When the advertising expenditure is x = 40, the estimated value of y is

$$\hat{y} = 7.8 + 1.61 \times 40 = 7.8 + 64.4 = 72.2$$

The estimated sales are about Tk 72.2 lakh when the advertising expenditure is Tk 40 lakh.
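
As a check on the arithmetic, the short Python sketch below recomputes the sums and coefficients for the advertising data. The variable names are illustrative. Because it carries b at full precision instead of rounding to 1.61, it should give values close to, but not identical with, the hand calculation: b ≈ 1.608, a ≈ 7.83, and a prediction of about 72.17 at x = 40.

```python
x = [14, 19, 24, 21, 26, 22, 15, 20, 19]  # advertising expenditure (lakh Tk)
y = [31, 36, 48, 37, 50, 45, 33, 41, 39]  # sales (lakh Tk)

n = len(x)
sum_x, sum_y = sum(x), sum(y)                   # 180 and 360
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 7393
sum_x2 = sum(xi ** 2 for xi in x)               # 3720

# Slope and intercept from the least squares formulas.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # 193 / 120
a = sum_y / n - b * sum_x / n

sales_at_40 = a + b * 40
print(f"b = {b:.4f}, a = {a:.4f}, predicted sales at x = 40: {sales_at_40:.2f}")
```

Note that x = 40 lies outside the observed range of 14 to 26, so this prediction is an extrapolation and should be read with more caution than a prediction inside that range.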

Coefficient of Determination

The coefficient of determination is the primary way we can measure the extent, or strength, of the
association that exists between two variables, X and Y. Statisticians interpret the coefficient of
determination by looking at the amount of the variation in Y that is explained by the regression
line. The coefficient of determination is defined by

$$r^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$$

Example: Calculate the coefficient of determination for the following data set

x   12  30  15  24  14  18  28  26  19  27
y   20  60  27  50  21  30  61  54  32  57

Solution: We can use the following table to calculate the coefficient of determination:

x      y      ŷ = −13.02 + 2.54x    (y − ŷ)²    (y − ȳ)²
12     20     17.46                 6.4516      449.44
30     60     63.18                 10.1124     353.44
15     27     25.08                 3.6864      201.64
24     50     47.94                 4.2436      77.44
14     21     22.54                 2.3716      408.04
18     30     32.7                  7.29        125.44
28     61     58.1                  8.41        392.04
26     54     53.02                 0.9604      163.84
19     32     35.24                 10.4976     84.64
27     57     55.56                 2.0736      249.64
213    412    (totals)              56.0972     2505.6

$$\bar{y} = \frac{\sum y}{n} = \frac{412}{10} = 41.2$$

$$r^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} = 1 - \frac{56.0972}{2505.6} = 1 - 0.0224 = 0.9776$$

Thus, we can conclude that the variation in the number of workers (the independent variable X) explains 97.76 percent of the variation in the production of the Redwood Falls plant (the dependent variable Y).

N.B.: Suppose $r^2 = 0.93$; this implies that 93 percent of the variation in y is explained by the variation in x.
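
The calculation in this example can be reproduced with a few lines of Python. This is a sketch only: the function name `r_squared` and the labels SSE and SST (for the unexplained and total sums of squares) are introduced here for illustration, and the fitted line is taken to have intercept −13.02 and slope 2.54, with the signs inferred from the tabulated ŷ values. It should print about 0.9776.

```python
def r_squared(ys, y_hats):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                # total variation about the mean
    return 1 - sse / sst


x = [12, 30, 15, 24, 14, 18, 28, 26, 19, 27]
y = [20, 60, 27, 50, 21, 30, 61, 54, 32, 57]

# Fitted values from the rounded regression line used in the table.
y_hat = [-13.02 + 2.54 * xi for xi in x]

r2 = r_squared(y, y_hat)
print(round(r2, 4))
```

A value near 1 means the points lie close to the fitted line; a value near 0 means the line explains little of the variation in y.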

Relationship between regression coefficient and correlation coefficient:

Suppose r is the correlation coefficient between x and y, b is the regression coefficient of y on x and d is the regression coefficient of x on y. Then

$$r = \pm\sqrt{b \times d}$$

where r takes the common sign of b and d.
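
This relationship can be checked numerically. The Python sketch below reuses the advertising and sales data from the earlier example, computes both regression coefficients, and compares the square root of their product with the correlation coefficient computed directly. The helper names `slope` and `pearson_r` are illustrative, and the sign of r is taken from the slope b. Both printed values should be about 0.947.

```python
import math


def slope(us, vs):
    """Least-squares slope of v on u."""
    n = len(us)
    s_uv = sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n
    s_uu = sum(u * u for u in us) - sum(us) ** 2 / n
    return s_uv / s_uu


def pearson_r(xs, ys):
    """Correlation coefficient computed directly from the sums of squares."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


x = [14, 19, 24, 21, 26, 22, 15, 20, 19]  # advertising expenditure
y = [31, 36, 48, 37, 50, 45, 33, 41, 39]  # sales

b = slope(x, y)  # regression coefficient of y on x
d = slope(y, x)  # regression coefficient of x on y

# b and d always share a sign, so their product is non-negative.
r_from_slopes = math.copysign(math.sqrt(b * d), b)
r_direct = pearson_r(x, y)
print(round(r_from_slopes, 4), round(r_direct, 4))
```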
