Regression

Uploaded by

Shaheen G
Copyright
© © All Rights Reserved
Simple Linear Regression

Statistics for research

What is simple linear regression?
• Regression is used to predict the expected value of one variable
when the value of another variable is given.
• Of the two variables, one is treated as the independent variable
and the other as the dependent variable.
• This relationship can be expressed as a linear equation in two
variables. Of the two variables, say X and Y, either one can be
treated as dependent on the other:
a) X depends on Y
b) Y depends on X

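Population Regression Model

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

• $Y$: dependent variable
• $\beta_0$: population y-intercept
• $\beta_1$: population slope coefficient
• $X$: independent variable
• $\varepsilon$: random error term, or residual
• $\beta_0 + \beta_1 X$ is the linear component; $\varepsilon$ is the random error component
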
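To make the two components concrete, the sketch below simulates observations from the population model. The parameter values (beta0 = 100, beta1 = 0.5, sigma = 5), the range used for X, and the function name `simulate_population` are illustrative assumptions, not values taken from these slides.

```python
import random

def simulate_population(n, beta0, beta1, sigma, seed=0):
    """Draw n (x, y) pairs from Y = beta0 + beta1*X + eps, with eps ~ Normal(0, sigma)."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    pairs = []
    for _ in range(n):
        x = rng.uniform(30, 60)        # assumed range for the independent variable
        eps = rng.gauss(0, sigma)      # random error component
        y = beta0 + beta1 * x + eps    # linear component plus random error
        pairs.append((x, y))
    return pairs

# Hypothetical parameter values, chosen only for illustration.
sample = simulate_population(n=10_000, beta0=100.0, beta1=0.5, sigma=5.0)

# Recover each error as y minus the linear component; the average should be near zero.
errors = [y - (100.0 + 0.5 * x) for x, y in sample]
mean_error = sum(errors) / len(errors)
print(round(mean_error, 3))
```

With a large simulated sample the average error comes out close to zero, which is what the random error term is meant to capture.
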
Linear Regression Assumptions
• Error values ($\varepsilon$) are statistically independent
• Error values are normally distributed for any given value of x
• The probability distribution of the errors has constant variance
• The underlying relationship between the x variable and the y
variable is linear

Estimated Regression Model

The sample regression line provides an estimate of the population regression line:

$$\hat{y}_i = b_0 + b_1 x_i$$

• $\hat{y}_i$: estimated (or predicted) y value
• $b_0$: estimate of the regression intercept
• $b_1$: estimate of the regression slope
• $x_i$: independent variable

The individual random error terms $e_i$ have a mean of zero.

Least squares estimators

$$b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$

$$b_0 = \bar{Y} - b_1\bar{X}$$

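The two estimator formulas translate directly into code. The following is a minimal sketch; the function name `least_squares_line` and the four-point check data are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1*x using the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    b0 = sum_y / n - b1 * sum_x / n      # b0 = Y-bar - b1 * X-bar
    return b0, b1

# Points that lie exactly on y = 1 + 2x, so the fit should recover that line.
b0, b1 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)   # prints: 1.0 2.0
```

Checking the function against points on a known line is a quick way to confirm that the numerator and denominator are wired up correctly before applying it to real data.
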
Example: The following table shows the ages [X] and systolic
blood pressures [Y] of eight people. Find the regression line.

No.      Age [X]    Blood Pressure [Y]
1        56         160
2        42         130
3        60         125
4        50         135
5        54         145
6        49         115
7        39         140
8        45         120
Total    395        1070
(n = 8)

Computation of Simple Linear Regression equation

X      Y       XY
56     160     8960
42     130     5460
60     125     7500
50     135     6750
54     145     7830
49     115     5635
39     140     5460
45     120     5400
395    1070    52995

Computation of Simple Linear Regression equation

X      Y       XY       X²
56     160     8960     3136
42     130     5460     1764
60     125     7500     3600
50     135     6750     2500
54     145     7830     2916
49     115     5635     2401
39     140     5460     1521
45     120     5400     2025
395    1070    52995    19863

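The column totals in the table can be reproduced in a few lines. This is a sketch using the eight data points from the example; the variable names are chosen here for illustration.

```python
ages = [56, 42, 60, 50, 54, 49, 39, 45]                 # X
pressures = [160, 130, 125, 135, 145, 115, 140, 120]    # Y

xy = [x * y for x, y in zip(ages, pressures)]   # XY column
x2 = [x * x for x in ages]                      # X^2 column

totals = {
    "n": len(ages),
    "sum_x": sum(ages),
    "sum_y": sum(pressures),
    "sum_xy": sum(xy),
    "sum_x2": sum(x2),
}
print(totals)
```
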
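Solution:

• $b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{8(52995) - (395)(1070)}{8(19863) - (395)^2} = \frac{1310}{2879} = 0.455019$

• $\bar{X} = \frac{\sum X}{n} = \frac{395}{8} = 49.375$ and $\bar{Y} = \frac{\sum Y}{n} = \frac{1070}{8} = 133.75$

• $b_0 = \bar{Y} - b_1\bar{X} = 133.75 - (0.455019)(49.375) = 111.2834$

• $\hat{Y} = b_0 + b_1 X$

• $\hat{Y} = 111.2834 + 0.455019X$
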
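As a check on the arithmetic, the slope and intercept can be computed exactly from the column totals with Python's `fractions` module and rounded only at the end. This is a sketch; the variable names are assumptions.

```python
from fractions import Fraction

# Column totals from the example table.
n, sum_x, sum_y, sum_xy, sum_x2 = 8, 395, 1070, 52995, 19863

# Exact slope and intercept as rational numbers (no rounding until the end).
b1 = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
b0 = Fraction(sum_y, n) - b1 * Fraction(sum_x, n)

print(b1)                                         # exact slope as a fraction
print(round(float(b1), 6), round(float(b0), 4))   # rounded for reporting
```

Keeping the slope exact until the last step avoids the small drift that appears when a rounded slope is reused in later calculations.
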
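Computation of Simple Linear Regression equation

Fitted values from $\hat{Y} = 111.2834 + 0.455019X$ and residuals $(Y - \hat{Y})$:

X      Y       Ŷ            Y − Ŷ
56     160     136.7634     23.2366
42     130     130.3934     -0.3934
60     125     138.5834     -13.5834
50     135     134.0334     0.9666
54     145     135.8534     9.1466
49     115     133.5784     -18.5784
39     140     129.0284     10.9716
45     120     131.7584     -11.7584
395    1070    1069.9922

The fitted values were computed with the slope rounded to 0.455, which is why they sum to 1069.9922 rather than exactly 1070.
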
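The fitted values and residuals can be recomputed as follows. This sketch compares the slope at six decimals (0.455019) with the slope rounded to 0.455; the rounded slope reproduces the tabulated total of 1069.9922. The helper name `fitted` is an assumption.

```python
ages = [56, 42, 60, 50, 54, 49, 39, 45]                 # X
pressures = [160, 130, 125, 135, 145, 115, 140, 120]    # Y
b0 = 111.2834

def fitted(b1):
    """Fitted values b0 + b1*x for every age, for a given slope b1."""
    return [b0 + b1 * x for x in ages]

full = fitted(0.455019)      # slope to six decimals
rounded = fitted(0.455)      # slope rounded to three decimals

# Residuals are observed minus fitted values.
residuals = [y - y_hat for y, y_hat in zip(pressures, full)]

print(round(sum(rounded), 4))     # total of fitted values with the rounded slope
print(round(sum(full), 4))        # total with the more precise slope
print(round(sum(residuals), 4))   # residuals sum to approximately zero
```

For a least squares line the residuals sum to zero, so a fitted-value total that differs from the total of Y is a sign of rounding in the coefficients.
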
Interpretation

• $\hat{Y} = 111.2834 + 0.455019X$

• The value $b_1 = 0.455019$ indicates that the average blood pressure (the dependent
variable Y) is expected to increase by 0.455019 for each one-unit increase in age
(the independent variable X).

• The value $b_0 = 111.2834$ is the expected blood pressure when X = 0, that is, at
age zero. This is far outside the observed ages (39 to 60), so it has no practical
meaning here. The interpretation of $b_0$ is not always meaningful.

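The meaning of the two coefficients can be seen by evaluating the fitted line. In this sketch the function name `predict_pressure` and the ages 50 and 51 are chosen for illustration.

```python
B0, B1 = 111.2834, 0.455019   # intercept and slope of the fitted line

def predict_pressure(age):
    """Predicted average blood pressure at a given age."""
    return B0 + B1 * age

at_50 = predict_pressure(50)
at_51 = predict_pressure(51)

# A one-unit increase in age changes the prediction by exactly the slope.
print(round(at_51 - at_50, 6))

# At age 0 the prediction equals the intercept, which lies far outside the data.
print(predict_pressure(0))
```
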
Standard error of estimate:
The observed values of (X, Y) do not all fall on the regression line but
scatter around it. The degree of scatter (or dispersion) of the observed values
about the regression line is measured by the standard deviation of regression,
also called the standard error of estimate of Y on X.
For sample data, we estimate it by $S_{y.x}$, which is defined as:

$$S_{y.x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

Standard error of estimate

$$S_{y.x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{1412.991}{8 - 2}} = 15.34596$$

Standard error of estimate

$$S_{y.x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{\sum Y^2 - b_0 \sum Y - b_1 \sum XY}{n - 2}}$$

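Both forms of the standard error can be checked numerically on the example data. The sketch below recomputes the coefficients at full precision, because the shortcut form is sensitive to rounding in $b_0$ and $b_1$; the variable names are assumptions.

```python
import math

ages = [56, 42, 60, 50, 54, 49, 39, 45]                 # X
pressures = [160, 130, 125, 135, 145, 115, 140, 120]    # Y
n = len(ages)

sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in pressures)

# Least squares coefficients at full floating-point precision.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Definition: sum of squared residuals.
sse_definition = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, pressures))

# Shortcut: sum(Y^2) - b0*sum(Y) - b1*sum(XY).
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy

s_yx = math.sqrt(sse_definition / (n - 2))
print(round(sse_definition, 3), round(sse_shortcut, 3), round(s_yx, 5))
```

The shortcut subtracts large, nearly equal numbers, so it should be used with unrounded coefficients; the definition form is more forgiving.
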
Solution

Fitted values from $\hat{Y} = 111.2834 + 0.455019X$, residuals, and squared residuals:

X      Y       Ŷ            Y − Ŷ        (Y − Ŷ)²
56     160     136.7634     23.2366      539.9396
42     130     130.3934     -0.3934      0.154764
60     125     138.5834     -13.5834     184.5088
50     135     134.0334     0.9666       0.934316
54     145     135.8534     9.1466       83.66029
49     115     133.5784     -18.5784     345.1569
39     140     129.0284     10.9716      120.376
45     120     131.7584     -11.7584     138.26
395    1070    1069.9922    ≈ 0          1412.991

Coefficient of determination or goodness of fit ($R^2$)
A commonly used measure of the goodness of fit of a linear model is $R^2$,
called the coefficient of determination.
The coefficient of determination tells us the proportion of variation in the
response variable that is explained by the independent variable.
The range of $R^2$ is $0 \le R^2 \le 1$.

Suppose $R^2$ is 83%.
This indicates that 83% of the variability in the response (dependent) variable is
explained by its linear relationship with the independent variable.
The remaining 17% of the variation is due to chance or to other factors.

$R^2$

• $R^2 = \frac{\text{Explained variation}}{\text{Total variation}} \times 100$

• Total variation $= \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n}$

• Unexplained variation $= \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$

• Explained variation = (Total variation) − (Unexplained variation)

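Goodness of fit ($R^2$)

$$R^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{1412.991}{1487.5} = 0.05009$$

$R^2 = 0.05009 \times 100 \approx 5\%$, so age explains only about 5% of the variation in blood pressure in this sample.
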
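The decomposition into total, unexplained, and explained variation can be verified on the example data. This is a sketch with variable names chosen for illustration; the coefficients are recomputed rather than copied so that the block stands alone.

```python
ages = [56, 42, 60, 50, 54, 49, 39, 45]                 # X
pressures = [160, 130, 125, 135, 145, 115, 140, 120]    # Y
n = len(ages)

sum_x, sum_y = sum(ages), sum(pressures)
sum_xy = sum(x * y for x, y in zip(ages, pressures))
sum_x2 = sum(x * x for x in ages)

# Least squares coefficients.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

mean_y = sum_y / n
total = sum((y - mean_y) ** 2 for y in pressures)                           # total variation
unexplained = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, pressures))
explained = total - unexplained

r_squared = explained / total
print(round(total, 1), round(unexplained, 3), round(r_squared, 5))
```

The ratio form (explained over total) and the form 1 minus unexplained over total give the same value, since explained variation is defined as the difference.
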
