
CHAPTER 13

SIMPLE LINEAR
REGRESSION

Prem Mann, Introductory Statistics, 7/E


Copyright © 2010 John Wiley & Sons. All rights reserved
 Descriptive statistics: summarize and describe data
 Inferential statistics: study relationships between variables
Steps to do regression analysis
1. Literature review: establish the economic theory.
2. Get the data, primary or secondary. Good data matter: garbage in, garbage out.
3. Design the model: choose Y and which Xs to include, based on the theory.
4. Run the regression.
5. Interpret the results.
6. Make recommendations, or forecasts if that was the aim of the paper/report.
Preconditions to undertake regression analysis
1. At least 30 observations.
2. The number of Xs must be smaller than the number of observations.
3. The Xs should be variables, not constants.
Garbage in, garbage out: poor data give poor, unreliable results.
Example: X = number of lectures attended. If all students attended all 12 lectures, the variable is constant and we should not include it.
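The preconditions above can be checked mechanically before running a regression. A minimal sketch (not from the textbook; variable names are hypothetical) that flags too few observations, too many predictors, and constant predictors such as the "everyone attended 12 lectures" case:

```python
# A minimal sketch of precondition checks before running a regression:
# enough observations, fewer predictors than observations, and no
# constant predictors. Data and column names below are hypothetical.

def check_preconditions(y, X):
    """X is a dict mapping predictor name -> list of values."""
    n = len(y)
    problems = []
    if n < 30:
        problems.append(f"only {n} observations (want at least 30)")
    if len(X) >= n:
        problems.append("number of predictors must be < number of observations")
    for name, values in X.items():
        if len(set(values)) == 1:  # constant column cannot explain variation in y
            problems.append(f"'{name}' is constant and should be dropped")
    return problems

issues = check_preconditions(
    y=[3.2, 3.8, 2.9],
    X={"lectures_attended": [12, 12, 12], "hours_studied": [2, 5, 1]},
)
```

Here the check reports both too few observations and the constant `lectures_attended` column.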
The purposes of regression analysis
1. Test the theory.
2. Forecast/predict.
3. Propose policies or interventions, e.g. propose a good practice from Iceland.
Opening Example

Example: micro data
We aim to study the factors that influence students' performance.
Y = dependent variable = average grade
Explanatory variables:
X1 = participation in lectures
X2 = time spent reading/studying
X3 = concentration/focus during lectures
X4 = participation in exercises
X5 = gender
X6 = performance in secondary school
X7 = gymnasium or vocational school
X8 = rural/urban
Example: macro-level data
GDP, unemployment
- The Phillips curve studies the relationship between the unemployment rate and the inflation rate:
Y = inflation
X = unemployment
- Okun's law studies the relationship between economic growth and unemployment:
Y = change in the unemployment rate
X = GDP growth
Y = a + bX is a simple regression: only one X.
Here X is a variable, not a time period.
Example: micro-level data
Y is the dependent variable; the Xs are independent (explanatory) variables.
Simple regression has only one X. This type of regression is rarely used in practice. Why? Because there is almost always more than one factor that influences Y.
Multiple regression has more than one X.
Example: Y is a student's average grade.
X1 = time spent studying = average number of hours of reading per day
X2 = participation in lectures
X3 = participation in exercises
X4 = active participation
X5 = success in secondary education
X6 = gymnasium/vocational school
Example 2: macro-level data
 Country- or region-level data are macro data.
The Phillips curve = association between the unemployment rate and the inflation rate.
Okun's law = impact of GDP growth on the change in the unemployment rate.
Y = unemployment rate
Many Xs
SIMPLE LINEAR REGRESSION
MODEL

 Simple Regression
 Linear Regression

Simple Regression
Definition
A regression model is a mathematical equation
that describes the relationship between two or
more variables. A simple regression model
includes only two variables: one independent and
one dependent. The dependent variable is the
one being explained, and the independent
variable is the one used to explain the variation in
the dependent variable.

Linear Regression

Definition
A (simple) regression model that gives a
straight-line relationship between two
variables is called a linear regression
model.

Figure 13.1 Relationship between food
expenditure and income. (a) Linear
relationship. (b) Nonlinear relationship.

Figure 13.2 Plotting a linear equation.

Figure 13.3 y-intercept and slope of a
line.

SIMPLE LINEAR REGRESSION
ANALYSIS

 Scatter Diagram
 Least Squares Line
 Interpretation of a and b
 Assumptions of the Regression Model

SIMPLE LINEAR REGRESSION
ANALYSIS

Definition
In the regression model y = A + Bx + ε, A
is called the y-intercept or constant term, B
is the slope, and ε is the random error term.
The dependent and independent variables
are y and x, respectively.

SIMPLE LINEAR REGRESSION
ANALYSIS

Definition
In the model ŷ = a + bx, a and b, which are
calculated using sample data, are called
the estimates of A and B, respectively.

Table 13.1 Incomes (in hundreds of
dollars) and Food Expenditures of
Seven Households

Consumption function
C = C̄ + cY, which has the same form as Y = a + bX
Y = income
C̄ = autonomous consumption (the intercept)
c = MPC, the marginal propensity to consume (the slope); c lies between 0 and 1

(Boyes)
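The consumption function can be illustrated with a short numeric sketch. The values below (autonomous consumption of 2, MPC of 0.8) are hypothetical, chosen only for illustration:

```python
# A minimal numeric sketch of the consumption function C = C_bar + c*Y.
# The parameter values are hypothetical, not from the textbook.

def consumption(income, autonomous=2.0, mpc=0.8):
    """Keynesian consumption function: C = C_bar + c*Y, with 0 < c < 1."""
    return autonomous + mpc * income

# Each extra unit of income raises consumption by the MPC (0.8 units here),
# just as each extra unit of X raises Y by the slope b.
c_low = consumption(10)   # 2 + 0.8*10 = 10.0
c_high = consumption(11)
```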
Scatter Diagram

Definition
A plot of paired observations is called a
scatter diagram.

Figure 13.4 Scatter diagram.

Figure 13.5 Scatter diagram and
straight lines.

Figure 13.6 Regression Line and
random errors.

Error Sum of Squares (SSE)
The error sum of squares, denoted by SSE, is

SSE = Σe² = Σ(y − ŷ)²

The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares line.

The Least Squares Line

For the least squares regression line ŷ = a + bx,

b = SSxy / SSxx  and  a = ȳ − b·x̄

The Least Squares Line

where

SSxy = Σxy − (Σx)(Σy)/n  and  SSxx = Σx² − (Σx)²/n

and SS stands for "sum of squares". The least squares regression line ŷ = a + bx is also called the regression of y on x.

Example 13-1

Find the least squares regression line for the data on incomes and food expenditures of the seven households given in Table 13.1. Use income as the independent variable and food expenditure as the dependent variable.

Table 13.2

Example 13-1: Solution

Σx = 386, Σy = 108
x̄ = Σx/n = 386/7 = 55.1429
ȳ = Σy/n = 108/7 = 15.4286

Example 13-1: Solution

SSxy = Σxy − (Σx)(Σy)/n = 6403 − (386)(108)/7 = 447.5714
SSxx = Σx² − (Σx)²/n = 23,058 − (386)²/7 = 1772.8571

Example 13-1: Solution

b = SSxy/SSxx = 447.5714/1772.8571 = .2525
a = ȳ − b·x̄ = 15.4286 − (.2525)(55.1429) = 1.5050

Thus, our estimated regression model is

ŷ = 1.5050 + .2525x

When income increases by one unit ($100), food expenditure increases by .2525 units ($25.25); that is, from every additional dollar of income, food expenditure increases by about 25 cents.
When income is zero, predicted food expenditure is 1.5050 units ($150.50). This intercept should be interpreted with caution: zero income lies outside the range of incomes observed in the sample.
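The calculation in Example 13-1 can be reproduced from the summary sums alone. A minimal Python sketch, using the sums reported above (n = 7, Σx = 386, Σy = 108, Σxy = 6403, Σx² = 23,058):

```python
# Least squares estimates for Example 13-1, computed from the summary sums
# for the seven households in Table 13.1.
n = 7
sum_x, sum_y = 386, 108      # sum of incomes, sum of food expenditures
sum_xy, sum_x2 = 6403, 23058

ss_xy = sum_xy - sum_x * sum_y / n   # SSxy, about 447.5714
ss_xx = sum_x2 - sum_x ** 2 / n      # SSxx, about 1772.8571

b = ss_xy / ss_xx              # slope: rounds to .2525
a = sum_y / n - b * sum_x / n  # intercept: about 1.507 (the text's 1.5050
                               # comes from using the rounded slope .2525)

print(f"yhat = {a:.4f} + {b:.4f} x")
```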
Figure 13.7 Error of prediction.

Interpretation of a and b
Interpretation of a
 Consider a household with zero income. Using the estimated regression line obtained in Example 13-1,
 ŷ = 1.5050 + .2525(0) = $1.5050 hundred
 Thus, we can state that a household with no income is expected to spend $150.50 per month on food.
 The regression line is valid only for values of x between 33 and 83.
Interpretation of a and b

Interpretation of b
 The value of b in the regression model gives the change in y (the dependent variable) due to a change of one unit in x (the independent variable).
 We can state that, on average, a $100 (or $1) increase in the income of a household will increase food expenditure by $25.25 (or $.2525).

Figure 13.8 Positive and negative
linear relationships between x and y.

Case Study 13-1 Regression of
Heights and Weights of NBA Players

COEFFICIENT OF DETERMINATION

Total Sum of Squares (SST)

The total sum of squares, denoted by SST, is calculated as

SST = Σy² − (Σy)²/n

Note that this is the same formula that we used to calculate SSyy.
Figure 13.15 Total errors.

Table 13.4

Figure 13.16 Errors of prediction
when regression model is used.

COEFFICIENT OF DETERMINATION
R² measures the goodness of fit: how well the model explains the data.
Suppose the literature tells us to include 8 Xs in the model, but the database we are using contains only 6 of those 8 variables.
R² ranges from 0 to 1; the closer to 1, the better the model fits.
If, for example, R² = 0.22, then 22% of the variation in food expenditure is explained by variation in income.
Such a model is not well fitted, because R² is less than 0.5.

COEFFICIENT OF DETERMINATION
Why might R² be low?
1. Omitted variables: we are missing relevant Xs.
2. Data entry errors.
3. We can never explain behavior with 100% accuracy.
COEFFICIENT OF DETERMINATION

Regression Sum of Squares (SSR)

The regression sum of squares, denoted by SSR, is

SSR = SST − SSE

COEFFICIENT OF DETERMINATION
Coefficient of Determination
The coefficient of determination, denoted by r², represents the proportion of SST that is explained by the use of the regression model. The computational formula for r² is

r² = b·SSxy / SSyy

and 0 ≤ r² ≤ 1.

Example 13-3
For the data of Table 13.1 on monthly
incomes and food expenditures of seven
households, calculate the coefficient of
determination.

Example 13-3: Solution
 From earlier calculations made in Examples 13-1 and 13-2,
 b = .2525, SSxy = 447.5714, SSyy = 125.7143

r² = b·SSxy / SSyy = (.2525)(447.5714)/125.7143 = .90
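The r² computation in Example 13-3 can be checked numerically; a small Python sketch using the sums already computed for this example:

```python
# Coefficient of determination for Example 13-3, from the reported sums.
b = 0.2525            # slope from Example 13-1
ss_xy = 447.5714
ss_yy = 125.7143

r_squared = b * ss_xy / ss_yy
print(f"r^2 = {r_squared:.2f}")   # about .90: 90% of the variation in food
                                  # expenditure is explained by income
```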

LINEAR CORRELATION

 Linear Correlation Coefficient


 Hypothesis Testing About the Linear
Correlation Coefficient

Linear Correlation Coefficient

Value of the Correlation Coefficient


The value of the correlation coefficient
always lies in the range of –1 to 1; that is,
-1 ≤ ρ ≤ 1 and -1 ≤ r ≤ 1

Figure 13.18 Linear correlation
between two variables.
(a) Perfect positive linear correlation, r = 1

Figure 13.18 Linear correlation
between two variables.
(b) Perfect negative linear correlation, r = -1

Figure 13.18 Linear correlation
between two variables.
(c) No linear correlation, r ≈ 0

Figure 13.19 Linear correlation
between variables.

Linear Correlation Coefficient

Definition
The simple linear correlation coefficient, denoted by r, measures the strength of the linear relationship between two variables for a sample and is calculated as

r = SSxy / √(SSxx · SSyy)

Example 13-6

Calculate the correlation coefficient for the example on incomes and food expenditures of the seven households.

Example 13-6: Solution

r = SSxy / √(SSxx · SSyy) = 447.5714 / √((1772.8571)(125.7143)) = .95
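The correlation in Example 13-6 can also be verified numerically from the sums of squares computed earlier for the income/food-expenditure data:

```python
# Correlation coefficient for Example 13-6, from the sums of squares
# computed earlier for the income/food-expenditure data.
import math

ss_xy = 447.5714
ss_xx = 1772.8571
ss_yy = 125.7143

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(f"r = {r:.2f}")   # about .95, a strong positive linear relationship
```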

REGRESSION ANALYSIS: A
COMPLETE EXAMPLE
Example 13-8
A random sample of eight drivers insured
with a company and having similar auto
insurance policies was selected. The
following table lists their driving experience
(in years) and monthly auto insurance
premiums.

Example 13-8

Example 13-8
a) Does the insurance premium depend on the
driving experience or does the driving experience
depend on the insurance premium? Do you
expect a positive or a negative relationship
between these two variables?

Example 13-8: Solution
a) Based on theory and intuition, we
expect the insurance premium to
depend on driving experience
 The insurance premium is a dependent
variable
 The driving experience is an independent
variable

Table 13.5

Example 13-8: Solution

b) x̄ = Σx/n = 90/8 = 11.25
ȳ = Σy/n = 474/8 = 59.25
SSxy = Σxy − (Σx)(Σy)/n = 4739 − (90)(474)/8 = −593.5000
SSxx = Σx² − (Σx)²/n = 1396 − (90)²/8 = 383.5000
SSyy = Σy² − (Σy)²/n = 29,642 − (474)²/8 = 1557.5000

Example 13-8: Solution
c)

b = SSxy/SSxx = −593.5000/383.5000 = −1.5476
a = ȳ − b·x̄ = 59.25 − (−1.5476)(11.25) = 76.6605

ŷ = 76.6605 − 1.5476x

Example 13-8: Solution

d) The value of a = 76.6605 gives the value


of ŷ for x = 0; that is, it gives the monthly
auto insurance premium for a driver with
no driving experience.
The value of b = -1.5476 indicates that,
on average, for every extra year of
driving experience, the monthly auto
insurance premium decreases by $1.55.

Figure 13.21 Scatter diagram and the
regression line.
e) The regression line slopes downward from
left to right.

Example 13-8: Solution

f)
r = SSxy / √(SSxx · SSyy) = −593.5000 / √((383.5000)(1557.5000)) = −.77

r² = b·SSxy / SSyy = (−1.5476)(−593.5000)/1557.5000 = .59

Example 13-8: Solution
f) The value of r = -0.77 indicates that the
driving experience and the monthly auto
insurance premium are negatively related.
The (linear) relationship is strong but not
very strong.
The value of r² = 0.59 states that 59% of
the total variation in insurance premiums is
explained by years of driving experience
and 41% is not.

Example 13-8: Solution

g) Using the estimated regression line, we find the predicted value of y for x = 10:

ŷ = 76.6605 − 1.5476(10) = $61.18

Thus, we expect the monthly auto insurance premium of a driver with 10 years of driving experience to be $61.18.
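All of the computations in Example 13-8 can be reproduced from the summary sums alone; a Python sketch using the sums reported in part (b) (n = 8, Σx = 90, Σy = 474, Σxy = 4739, Σx² = 1396, Σy² = 29,642):

```python
# Full regression workflow for Example 13-8, from the summary sums:
# slope, intercept, correlation, coefficient of determination, prediction.
import math

n = 8
sum_x, sum_y = 90, 474
sum_xy, sum_x2, sum_y2 = 4739, 1396, 29642

ss_xy = sum_xy - sum_x * sum_y / n        # -593.5
ss_xx = sum_x2 - sum_x ** 2 / n           # 383.5
ss_yy = sum_y2 - sum_y ** 2 / n           # 1557.5

b = ss_xy / ss_xx                         # slope: about -1.5476
a = sum_y / n - b * sum_x / n             # intercept: about 76.66

r = ss_xy / math.sqrt(ss_xx * ss_yy)      # about -.77
r_squared = b * ss_xy / ss_yy             # about .59

premium_10 = a + b * 10                   # predicted premium at 10 years
print(f"yhat(10) = {premium_10:.2f}")     # about 61.18
```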

CAUTIONS IN USING REGRESSION

 Extrapolation: The regression line estimated for the sample data is reliable only for the range of x values observed in the sample.

 Causality: The regression line does not prove causality between the two variables; that is, it does not show that a change in y is caused by a change in x.

