Chapter 6 Variable Selection and Model Building

This document discusses variable selection and model building. It introduces the variable selection problem of balancing including more variables for their information value while keeping the number of variables low to avoid increased variance. It describes several variable selection algorithms and evaluation criteria like R-squared. It also covers computational techniques for variable selection like all possible regressions and stepwise regression methods including forward selection, backward elimination, and stepwise regression.


Chapter 6 Variable Selection and Model Building
Department of Statistics
DDU
9.1 Introduction
9.1.1 The Model-Building Problem
• Ensure that the functional form of the model is correct and that the underlying assumptions are not violated.
• A pool of candidate regressors
• Variable selection problem
• Two conflicting objectives:
– Include as many regressors as possible: the information content in these factors can influence the predicted value ŷ.
– Include as few regressors as possible: the variance of the prediction increases as the number of regressors increases.
• What is the “best” regression equation?
• Several algorithms can be used for variable selection, but these procedures frequently specify different subsets of the candidate regressors as best.
• An idealized setting:
– The correct functional forms of the regressors are known.
– No outliers or influential observations.
• Residual analysis
• An iterative approach:
1. Apply a variable selection strategy.
2. Check the functional forms and look for outliers and influential observations.
• No variable selection procedure is guaranteed to produce the best regression equation for a given data set.

9.1.2 Consequences of Model Misspecification
• The full model, with the $K + 1$ candidate terms partitioned into $p$ terms to be retained (including the intercept) and $r = K + 1 - p$ candidates for deletion:
$y = X\beta + \varepsilon = X_p \beta_p + X_r \beta_r + \varepsilon$
• The subset model, obtained by deleting the $r$ regressors in $X_r$:
$y = X_p \beta_p + \varepsilon$
• The subset-model estimate $\hat{\beta}_p^* = (X_p' X_p)^{-1} X_p' y$ is in general biased for $\beta_p$ (unless $\beta_r = 0$ or $X_p' X_r = 0$), but its variance is no larger than that of the full-model estimate $\hat{\beta}_p$.
• Motivation for variable selection:
– Deleting variables from the model can improve the precision of the parameter estimates; the same is true of the variance of the predicted response.
– Deleting variables from the model introduces bias into the estimates of the retained coefficients.
– However, if the deleted variables have small effects, the MSE of the biased estimates will be less than the variance of the unbiased estimates (made precise below).
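This tradeoff can be stated precisely through the mean-squared-error decomposition MSE = variance + (bias)². A brief sketch of the standard results for the partitioned model above, following the usual textbook treatment:

```latex
% Bias: the subset estimator is biased unless beta_r = 0 or X_p'X_r = 0
\mathrm{E}(\hat{\beta}_p^{*}) = \beta_p + A\beta_r,
  \qquad A = (X_p'X_p)^{-1}X_p'X_r
% Variance: deleting variables never increases the variance
\operatorname{Var}(\hat{\beta}_p) - \operatorname{Var}(\hat{\beta}_p^{*}) \succeq 0
% MSE: the biased estimator wins when the deleted coefficients are small
\operatorname{MSE}(\hat{\beta}_p^{*}) \preceq \operatorname{Var}(\hat{\beta}_p)
  \quad \text{whenever } \operatorname{Var}(\hat{\beta}_r) - \beta_r\beta_r' \succeq 0
```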

9.1.3 Criteria for Evaluating Subset Regression Models
• Coefficient of multiple determination for a $p$-term subset model:
$R_p^2 = \dfrac{SS_R(p)}{SS_T} = 1 - \dfrac{SS_{Res}(p)}{SS_T}$
$R_p^2$ cannot decrease as regressors are added, so one looks for the point beyond which additional regressors give only a small increase.
– Aitkin (1974): an $R^2$-adequate subset is one whose regressors produce $R^2 > R_0^2$, where $R_0^2 = 1 - (1 - R_{K+1}^2)(1 + d_{\alpha,n,K})$ with $d_{\alpha,n,K} = K F_{\alpha,K,n-K-1} / (n - K - 1)$.

• Uses of regression and model evaluation criteria:
– Data description: minimize $SS_{Res}$ with as few regressors as possible.
– Prediction and estimation: minimize the mean squared error of prediction; use the PRESS statistic (see the sketch after this list).
– Parameter estimation: Chapter 10.
– Control: minimize the standard errors of the regression coefficients.
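PRESS is the sum of squared leave-one-out prediction errors, and it can be computed from a single fit using the identity $e_{(i)} = e_i / (1 - h_{ii})$. A minimal sketch in Python with NumPy (the function name is illustrative, not from the slides):

```python
import numpy as np

def press_statistic(X, y):
    """PRESS = sum of squared leave-one-out prediction errors.

    Uses the identity e_(i) = e_i / (1 - h_ii), where h_ii are the
    diagonal elements of the hat matrix H = X (X'X)^{-1} X'.
    X is assumed to already contain a column of ones for the intercept.
    """
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    residuals = y - H @ y                  # ordinary residuals e_i
    leverages = np.diag(H)                 # h_ii
    return np.sum((residuals / (1.0 - leverages)) ** 2)
```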

9.2 Computational Techniques for Variable Selection
9.2.1 All Possible Regressions
• Fit all possible regression equations, and then select the best one by some suitable criterion.
• Assume every model includes the intercept term.
• If there are K candidate regressors, there are 2^K total equations to be estimated and examined.

Example 9.1 The Hald Cement Data

• $R_p^2$ criterion: plot the maximum $R_p^2$ for each subset size against $p$ and choose the smallest subset beyond which additional regressors give only a negligible increase in $R_p^2$ (see the sketch below).

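A minimal all-possible-regressions sketch in Python with NumPy on the Hald cement data (13 mixtures, four ingredient regressors, response = heat evolved; the values below are the commonly published ones). It enumerates all 2^K − 1 non-empty subsets and prints R²_p for each:

```python
import numpy as np
from itertools import combinations

# Hald cement data: x1..x4 = ingredient proportions, y = heat evolved.
X_full = np.array([
    [ 7, 26,  6, 60], [ 1, 29, 15, 52], [11, 56,  8, 20],
    [11, 31,  8, 47], [ 7, 52,  6, 33], [11, 55,  9, 22],
    [ 3, 71, 17,  6], [ 1, 31, 22, 44], [ 2, 54, 18, 22],
    [21, 47,  4, 26], [ 1, 40, 23, 34], [11, 66,  9, 12],
    [10, 68,  8, 12]], dtype=float)
y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.8, 113.3, 109.4])

n, K = X_full.shape
ss_total = np.sum((y - y.mean()) ** 2)

results = []
for size in range(1, K + 1):
    for subset in combinations(range(K), size):
        # intercept is always included
        X = np.column_stack([np.ones(n), X_full[:, subset]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        ss_res = np.sum((y - X @ beta) ** 2)
        results.append((subset, 1.0 - ss_res / ss_total))

for subset, r2 in sorted(results, key=lambda t: -t[1]):
    names = ", ".join(f"x{j + 1}" for j in subset)
    print(f"{{{names}}}  R^2 = {r2:.4f}")
```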
9.2.2 Stepwise Regression Methods

• Three broad categories:
1. Forward selection
2. Backward elimination
3. Stepwise regression

Forward selection
– Start with no regressors in the model (intercept only).
– At each step, compute the partial F-statistic for each regressor not yet in the model, and add the regressor with the largest partial F-statistic if it exceeds F_IN.
– Stop when no remaining regressor has a partial F-statistic > F_IN.
Backward elimination
– Start with a model containing all K candidate regressors.
– Compute the partial F-statistic for each regressor, and drop the regressor with the smallest partial F-statistic if it falls below F_OUT (a sketch follows below).
– Stop when all remaining partial F-statistics exceed F_OUT.
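A minimal backward-elimination sketch in Python with NumPy, using the fact that the partial F-statistic of a single regressor equals the square of its t-statistic, F_j = t_j². The function name and the F_OUT default are illustrative:

```python
import numpy as np

def backward_elimination(X, y, f_out=4.0):
    """Drop regressors one at a time while the smallest partial F < F_OUT.

    X excludes the intercept column; the intercept is added internally.
    Returns the column indices of the retained regressors.
    """
    n = len(y)
    active = list(range(X.shape[1]))
    while active:
        Xa = np.column_stack([np.ones(n), X[:, active]])
        beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
        resid = y - Xa @ beta
        mse = resid @ resid / (n - Xa.shape[1])      # MS_Res
        cov = mse * np.linalg.inv(Xa.T @ Xa)         # Var(beta_hat)
        t = beta / np.sqrt(np.diag(cov))
        partial_f = t[1:] ** 2                       # skip the intercept
        worst = int(np.argmin(partial_f))
        if partial_f[worst] >= f_out:
            break                                    # every regressor clears F_OUT
        del active[worst]                            # drop the weakest regressor
    return active
```

For example, backward_elimination(X_full, y) runs the procedure on the Hald data defined earlier.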

Stepwise Regression
• A modification of forward selection.
• A regressor added at an earlier step may become redundant after later additions; such a variable should then be dropped from the model.
• Two cutoff values: F_IN and F_OUT (a combined sketch follows below).
• Usually F_IN > F_OUT is chosen: it is more difficult to add a regressor than to delete one.
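A compact stepwise sketch in Python with NumPy combining the forward and backward moves (names and the cutoff defaults are illustrative; as noted above, choosing F_IN ≥ F_OUT helps keep a just-added regressor from cycling straight back out):

```python
import numpy as np

def _partial_f(cols, y):
    """Partial F (= t^2) for each non-intercept column of the fitted model."""
    n = len(y)
    Xa = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    resid = y - Xa @ beta
    mse = resid @ resid / (n - Xa.shape[1])
    cov = mse * np.linalg.inv(Xa.T @ Xa)
    return (beta / np.sqrt(np.diag(cov)))[1:] ** 2

def stepwise(X, y, f_in=4.0, f_out=4.0):
    active, candidates = [], list(range(X.shape[1]))
    while True:
        changed = False
        # Forward step: score each candidate by its partial F on entry.
        scores = [_partial_f([X[:, k] for k in active] + [X[:, j]], y)[-1]
                  for j in candidates]
        if scores and max(scores) > f_in:
            j = candidates[int(np.argmax(scores))]
            active.append(j)
            candidates.remove(j)
            changed = True
        # Backward step: re-examine the regressors already in the model.
        if len(active) > 1:
            f = _partial_f([X[:, k] for k in active], y)
            worst = int(np.argmin(f))
            if f[worst] < f_out:
                candidates.append(active.pop(worst))
                changed = True
        if not changed:
            return active
```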

