Chapter 5 The Dummy Variable Trap (EC220)
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License. This license allows
the user to remix, tweak, and build upon the work even for commercial purposes, as long as the user
credits the author and licenses their new creations under the identical terms.
http://creativecommons.org/licenses/by-sa/3.0/
http://learningresources.lse.ac.uk/
$$ Y = \beta_1 + \beta_2 X_2 + \dots + \beta_k X_k + \delta_2 D_2 + \dots + \delta_s D_s + u $$
Suppose that you have a regression model with Y depending on a set of ordinary variables
X2, ..., Xk and a qualitative variable.
Suppose that the qualitative variable has s categories. We choose one of them as the
omitted category (without loss of generality, category 1) and define dummy variables D2, ...,
Ds for the rest.
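As a minimal sketch of this construction, the snippet below builds the dummy variables for a qualitative variable with four categories, omitting category 1. The eight category values and the helper name `make_dummies` are illustrative assumptions, not part of the original model.

```python
# Category of each of eight illustrative observations (labels 1..4).
categories = [4, 3, 1, 2, 2, 3, 1, 4]


def make_dummies(cats, omitted):
    """Return {category: 0/1 column} for every category except `omitted`."""
    levels = sorted(set(cats))
    return {
        level: [1 if c == level else 0 for c in cats]
        for level in levels
        if level != omitted
    }


# Category 1 is the omitted (reference) category, so only D2, D3, D4 are built.
dummies = make_dummies(categories, omitted=1)
for level, column in dummies.items():
    print(f"D{level}:", column)
```

Observations in the omitted category have all dummy variables equal to 0, so their intercept is the ordinary intercept of the model.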
$$ Y = \beta_1 + \beta_2 X_2 + \dots + \beta_k X_k + \delta_1 D_1 + \delta_2 D_2 + \dots + \delta_s D_s + u $$
What would happen if we did not drop the reference category, but instead defined a
dummy variable D1 for it and included it in the specification?
We would fall into the dummy variable trap. It would be impossible to fit the model as
specified.
We will start with an intuitive explanation. The coefficient of each dummy variable
represents the increase in the intercept relative to that for the basic category. But there is
no basic category for such a comparison.
β1 represents the fixed component of Y for the basic category. But again, there is no basic
category. Thus the model does not have any logical interpretation.
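One way to make this concrete is the following worked identity. It is a sketch, assuming the intercept is written β1 and the dummy coefficients δ1, ..., δs, with c an arbitrary constant introduced here for illustration. Every observation belongs to exactly one category, so D1 + ... + Ds = 1 in every observation, and therefore

$$ \beta_1 + \delta_1 D_1 + \dots + \delta_s D_s = (\beta_1 + c) + (\delta_1 - c) D_1 + \dots + (\delta_s - c) D_s $$

for any value of c. Two different sets of parameters give identical values of Y in every observation, so the data cannot distinguish between them.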
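$$ Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \delta_1 D_1 + \delta_2 D_2 + \dots + \delta_s D_s + u $$

Here the intercept is written explicitly as the coefficient of a variable X1 that is equal to 1 in
every observation. Example with four categories:

Observation  Category  X1  D1  D2  D3  D4
     1           4      1   0   0   0   1
     2           3      1   0   0   1   0
     3           1      1   1   0   0   0
     4           2      1   0   1   0   0
     5           2      1   0   1   0   0
     6           3      1   0   0   1   0
     7           1      1   1   0   0   0
     8           4      1   0   0   0   1

$$ \sum_{i=1}^{4} D_i = X_1 $$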
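The exact linear relationship can be checked numerically. The sketch below uses the eight observations tabulated above and computes the rank of the matrix whose columns are X1, D1, D2, D3 and D4. The helper `column_rank` is a hypothetical name written for this illustration, using exact fractions rather than any particular regression package.

```python
from fractions import Fraction


def column_rank(columns):
    """Rank of the matrix with the given columns, by exact Gaussian elimination."""
    # Transpose the list of columns into a list of rows of exact fractions.
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]
    rank = 0
    for col in range(len(columns)):
        # Look for a row at or below position `rank` with a non-zero entry here.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # this column adds nothing new
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


categories = [4, 3, 1, 2, 2, 3, 1, 4]
x1 = [1] * len(categories)
d = {j: [1 if c == j else 0 for c in categories] for j in range(1, 5)}

# The dummies sum to X1 in every observation.
dummies_sum_to_x1 = all(sum(d[j][i] for j in d) == x1[i] for i in range(len(x1)))
rank = column_rank([x1, d[1], d[2], d[3], d[4]])

print(dummies_sum_to_x1)  # True
print(rank)               # 4, although there are 5 columns
```

Five columns with rank 4 means one column is an exact linear combination of the others, which is why the five coefficients cannot be estimated separately.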
If you tried to run the regression anyway, the regression application should detect the
problem and do one of two things. It may simply refuse to run the regression.
Alternatively, it may run it, dropping one of the variables in the linear relationship,
effectively defining the omitted category by itself.
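The sketch below illustrates one way an application could choose which variable to drop. It scans the variables in order and keeps each one only if it is not an exact linear combination of those already kept. This rule and the names `column_rank` and `split_independent` are assumptions made for illustration; actual regression applications apply their own rules.

```python
from fractions import Fraction


def column_rank(columns):
    """Rank of the matrix with the given columns, by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]
    rank = 0
    for col in range(len(columns)):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def split_independent(named_columns):
    """Keep each column only if it is linearly independent of those already kept."""
    kept, dropped = [], []
    for name, col in named_columns:
        trial = [c for _, c in kept] + [col]
        if column_rank(trial) == len(trial):
            kept.append((name, col))
        else:
            dropped.append(name)
    return [name for name, _ in kept], dropped


categories = [4, 3, 1, 2, 2, 3, 1, 4]
columns = [("X1", [1] * len(categories))]
columns += [(f"D{j}", [1 if c == j else 0 for c in categories]) for j in range(1, 5)]

kept, dropped = split_independent(columns)
print(kept)     # ['X1', 'D1', 'D2', 'D3']
print(dropped)  # ['D4']
```

With this ordering, D4 is dropped and category 4 becomes the omitted category. A different ordering of the variables would lead to a different variable being dropped.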
$$ Y = \beta_2 X_2 + \dots + \beta_k X_k + \delta_1 D_1 + \delta_2 D_2 + \dots + \delta_s D_s + u $$
There is another way of avoiding the dummy variable trap. That is to drop the intercept (and
X1). There is no longer a problem because there is no longer an exact linear relationship
linking the variables.
The coefficients of the dummy variables are now the intercepts in the relationship for the
individual categories. For example, if the observation relates to category 2, D2 = 1 and all the
other dummy variables are equal to 0, and hence the relationship for that observation has intercept δ2.
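A small numerical sketch of this interpretation follows. It assumes the simplest case, with no ordinary variables X2, ..., Xk, and uses hypothetical values of Y invented for illustration. In that case the least squares coefficient of each dummy variable in the no-intercept regression is the mean of Y for that category. The helper names `dot` and `solve` are likewise hypothetical.

```python
from fractions import Fraction

categories = [4, 3, 1, 2, 2, 3, 1, 4]
y = [10, 7, 3, 5, 6, 8, 4, 11]  # hypothetical values of Y, for illustration only

# One dummy column per category and no intercept column.
levels = [1, 2, 3, 4]
cols = [[1 if c == j else 0 for c in categories] for j in levels]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def solve(a, b):
    """Solve the square non-singular system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[-1] for row in m]


# Normal equations (X'X) d = X'y for the regression of Y on D1..D4 only.
xtx = [[dot(ci, cj) for cj in cols] for ci in cols]
xty = [dot(ci, y) for ci in cols]
intercepts = dict(zip(levels, solve(xtx, xty)))

for j in levels:
    group = [v for v, c in zip(y, categories) if c == j]
    mean = Fraction(sum(group), len(group))
    print(f"category {j}: coefficient {float(intercepts[j])}, mean {float(mean)}")
```

The normal equations can be solved here because, without X1, the four dummy columns are linearly independent.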
11.07.25