Econometric Lec4
CHAPTER 9
Dummy Variable
Regression Models
Nature of a “dummy” variable:
(1) A variable that takes only the values “1” and “0”.
(2) A variable that usually indicates a dichotomy:
“presence” or “absence”, “yes” or “no”, etc.
(3) A variable that indicates a “quality” or an attribute,
such as “male” or “female”,
“black” or “white”,
“urban” or “non-urban”,
“before” or “after”,
“north” or “south”, “east” or “west”,
etc.
obs  Male=1 (dummy)  Female=1 (dummy)  Salary (K)  Years of teaching
1 1 0 23 1
2 0 1 19.5 1
3 1 0 24 2
4 0 1 21 2
5 1 0 25 3
6 0 1 22 3
7 1 0 26.5 4
8 0 1 23.1 4
9 0 1 25 5
10 1 0 28 5
11 1 0 29.5 6
12 0 1 26 6
13 0 1 27.5 7
14 1 0 31.5 7
15 0 1 29 6
16 1 0 22 5
17 0 1 19 2
18 1 0 18 2
19 0 1 21.7 5
20 0 1 18.5 2
21 1 0 21 4
22 1 0 20.5 4
23 0 1 17 1
24 0 1 17.5 1
25 1 0 21.2 5
Separate male sample
obs Starting salary, Y Years of teaching, X2
1 23 1
3 24 2
5 25 3
7 26.5 4
10 28 5
11 29.5 6
14 31.5 7
16 22 5
18 18 2
21 21 4
22 20.5 4
25 21.2 5
Separate female sample
obs Starting salary, Y Years of teaching, X2
2 19.5 1
4 21 2
6 22 3
8 23.1 4
9 25 5
12 26 6
13 27.5 7
15 29 6
17 19 2
19 21.7 5
20 18.5 2
23 17 1
24 17.5 1
6
Salary Y
35
^ ^ ^
Y = 1 + 2 X (male)
30
25
20 ^=^
Y ^ X
’1+ ’ 2
(female)
Male
15 Female
Linear (Male)
Linear (Female) X
10 teaching
0 1 2 3 4 5 6 7 8
years
Two separate models: Ym = 1 + 2 Xm + um (male)
Yf = ’1 + ’2 Xf + uf (female) 7
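Assuming $\beta'_2 = \beta_2$, the two groups have the same slope but a different constant
in the relationship between $Y_i$ and $X_i$.
1st model: $Y_i = \beta_1 + \beta^*_1 D_i + \beta_2 X_i + u_i$
$D_i = 1$ if male
$D_i = 0$ otherwise (female, the control category)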
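As a minimal sketch (not part of the original slides), the intercept-dummy model above can be fitted to the 25 salary observations from the data table. The helper name `ols` and the hand-written normal-equations solver are illustrative assumptions; any regression package would do the same job.

```python
# Salary data from the lecture table: (male dummy D, salary Y in thousands, years X)
data = [
    (1, 23.0, 1), (0, 19.5, 1), (1, 24.0, 2), (0, 21.0, 2), (1, 25.0, 3),
    (0, 22.0, 3), (1, 26.5, 4), (0, 23.1, 4), (0, 25.0, 5), (1, 28.0, 5),
    (1, 29.5, 6), (0, 26.0, 6), (0, 27.5, 7), (1, 31.5, 7), (0, 29.0, 6),
    (1, 22.0, 5), (0, 19.0, 2), (1, 18.0, 2), (0, 21.7, 5), (0, 18.5, 2),
    (1, 21.0, 4), (1, 20.5, 4), (0, 17.0, 1), (0, 17.5, 1), (1, 21.2, 5),
]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


rows = [[1.0, d, x] for d, _, x in data]  # columns: constant, D, X
y = [s for _, s, _ in data]
b1, b1_star, b2 = ols(rows, y)

print(f"female (control): Y = {b1:.3f} + {b2:.3f} X")
print(f"male:             Y = {b1 + b1_star:.3f} + {b2:.3f} X")
```

The two printed lines share the slope and differ only by the dummy coefficient, which is the intercept shift for the male group.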
[Figure: the same scatter plot of salary (Y) against years of teaching (X) with three fitted lines: the whole sample, $\hat{Y} = \hat{\beta}''_1 + \hat{\beta}''_2 X$; male, $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X$; and female, $\hat{Y} = \hat{\beta}'_1 + \hat{\beta}'_2 X$.]
Two separate models: $Y_m = \beta_1 + \beta_2 X_m + u_m$ (male)
$Y_f = \beta'_1 + \beta'_2 X_f + u_f$ (female)
obs  D1 (female)  D2 (male)  Annual salary, Y  Years of teaching, X
1 0 1 23 1
2 1 0 19.5 1
3 0 1 24 2
4 1 0 21 2
5 0 1 25 3
6 1 0 22 3
7 0 1 26.5 4
8 1 0 23.1 4
9 1 0 25 5
10 0 1 28 5
11 0 1 29.5 6
12 1 0 26 6
13 1 0 27.5 7
14 0 1 31.5 7
15 1 0 29 6
16 0 1 22 5
17 1 0 19 2
18 0 1 18 2
19 1 0 21.7 5
20 1 0 18.5 2
21 0 1 21 4
22 0 1 20.5 4
23 1 0 17 1
24 1 0 17.5 1
25 0 1 21.2 5
Note: D1 + D2 = 1, so D1 = 1 - D2.
Each dummy on its own identifies the two categories, but the sum of the two dummies
is always 1, so it cannot identify which observation is male or female.
Caution in the use of Dummy variables
(Dummy variable trap)
If we introduce two dummy variables in one model to
identify two categories of one qualitative variable such as
$Y_i = \beta_1 + \beta^*_1 D_{1i} + \beta^{**}_1 D_{2i} + \beta_2 X_i + u_i$
where $D_{1i} = 1$ if female
= 0 otherwise
and $D_{2i} = 1$ if male
= 0 otherwise
This model cannot be estimated because of
perfect collinearity between D1 and D2:
D1 = 1 - D2
or D2 = 1 - D1
or D1 + D2 = 1 (perfect collinearity)
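The trap can be illustrated with a short sketch, using the first six observations of the salary table and made-up coefficient values (the numbers in `coefs_a` and the shift `c` are purely hypothetical). Because D1 + D2 always equals the constant term, any amount can be moved between the intercept and the two dummy coefficients without changing a single fitted value, so the data cannot pin the coefficients down.

```python
# First six observations of the salary table: (D1 = female, D2 = male, X = years)
obs = [(0, 1, 1), (1, 0, 1), (0, 1, 2), (1, 0, 2), (0, 1, 3), (1, 0, 3)]


def fitted(coefs, obs):
    """Fitted values of Y = b1 + b1s*D1 + b1ss*D2 + b2*X for each observation."""
    b1, b1s, b1ss, b2 = coefs
    return [b1 + b1s * d1 + b1ss * d2 + b2 * x for d1, d2, x in obs]


# D1 + D2 equals the constant column (1) in every row: perfect collinearity.
collinear = all(d1 + d2 == 1 for d1, d2, _ in obs)

# Two different (hypothetical) coefficient vectors ...
coefs_a = (10.0, 2.0, 5.0, 1.5)
c = 3.0  # shift moved from the two dummies into the intercept
coefs_b = (10.0 + c, 2.0 - c, 5.0 - c, 1.5)

# ... produce exactly the same fitted values, so they cannot be told apart.
same = fitted(coefs_a, obs) == fitted(coefs_b, obs)
print(collinear, same)
```

Dropping either D1 or D2 removes the ambiguity, which is why only one dummy is kept for a two-category variable.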
Using two dummy variables to identify the two categories of one qualitative
variable in a single model falls into the
“trap of perfect multicollinearity”.
General rule: to avoid perfect multicollinearity, a qualitative variable
with m categories (for example, age split into groups 1, 10, 20, 30, 40, ..., m)
is represented by only m - 1 dummy variables: D1, D2, D3, D4, D5, ..., Dm-1.
2 When a category is assigned the value of zero, this
category is called a control category (or omitted group).
Male: $\hat{Y}_i = (\hat{\beta}_1 + \hat{\beta}_2) + \hat{\beta}_3 X_i$ when $D_{2i} = 1$
Female: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_3 X_i$ when $D_{2i} = 0$
To test whether the relationship differs between the two categories,
compare:
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_3 X_i$
$\hat{Y}_i = (\hat{\beta}_1 + \hat{\beta}_2) + \hat{\beta}_3 X_i$
If the t-statistic of $\hat{\beta}_2$ is significant, the two categories differ
in the constant term.
The same $\hat{\beta}_3$ means the two categories have the same slope.
To let the slope differ as well:
$\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 D_i + \hat{\beta}_3 X_i + \hat{\beta}_4 D_i X_i$
The term $\hat{\beta}_2 D_i$ tests the difference in intercept: check its t-statistic.
The term $\hat{\beta}_4 D_i X_i$ tests whether the two categories differ in slope: check its t-statistic.
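Separate examples for female and male:
Female dummy (D1: female = 1, others = 0): for the female group, $\hat{Y}_i = (\hat{\beta}_1 + \hat{\beta}_2) + \hat{\beta}_3 X_i$
Male dummy (D2: male = 1, others = 0): for the male group, $\hat{Y}_i = (\hat{\beta}_1 + \hat{\beta}_2) + \hat{\beta}_3 X_i$
Without a dummy: $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i = 17.095 + 1.608 X_i$
With D1 (female = 1), male is the control category:
Male: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_3 X_m = 18.689 + 1.373 X_m$
With D2 (male = 1):
Male: $\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_2) + (\hat{\beta}_3 + \hat{\beta}_4) X = 18.689 + 1.373 X$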
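As a hedged check on the figures above, the sketch below fits simple regressions to the salary table as printed in these notes. Under that assumption, the whole-sample fit appears to reproduce the 17.095 + 1.608 X line and the male-only fit the 18.689 + 1.373 X line. The function name `simple_ols` is illustrative.

```python
# Salary data from the lecture table: (male dummy, salary Y in thousands, years X)
data = [
    (1, 23.0, 1), (0, 19.5, 1), (1, 24.0, 2), (0, 21.0, 2), (1, 25.0, 3),
    (0, 22.0, 3), (1, 26.5, 4), (0, 23.1, 4), (0, 25.0, 5), (1, 28.0, 5),
    (1, 29.5, 6), (0, 26.0, 6), (0, 27.5, 7), (1, 31.5, 7), (0, 29.0, 6),
    (1, 22.0, 5), (0, 19.0, 2), (1, 18.0, 2), (0, 21.7, 5), (0, 18.5, 2),
    (1, 21.0, 4), (1, 20.5, 4), (0, 17.0, 1), (0, 17.5, 1), (1, 21.2, 5),
]


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_group(group):
    """Fit Y on X for a list of (dummy, salary, years) rows."""
    return simple_ols([x for _, _, x in group], [s for _, s, _ in group])


pooled = fit_group(data)
male = fit_group([row for row in data if row[0] == 1])
female = fit_group([row for row in data if row[0] == 0])

for name, (a, b) in [("pooled", pooled), ("male", male), ("female", female)]:
    print(f"{name:7s} Y = {a:.3f} + {b:.3f} X")
```

In a model with a dummy and its interaction with X, the combined coefficients for each group equal that group's separate regression, which is why the male line is the same under either choice of dummy.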
2
One qualitative variable with more than two categories
Health care (Y) = $\beta_1 + \beta_2 D_2 + \beta_3 D_3 + \beta_4$ Income (X) + u
D2 = 1 if high school education
= 0 otherwise
D3 = 1 if college education
= 0 otherwise
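[Figure: health care (Y) against income (X); the line for college education (D3 = 1) is $\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_3) + \hat{\beta}_4 X$, drawn relative to the base intercept $\hat{\beta}_1$.]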
D2 = 1 if high school, = 0 otherwise
D3 = 1 if college, = 0 otherwise
=========================================
obs Y X D2 D3
=========================================
1 6.000000 40.00000 0.000000 1.000000
2 3.900000 31.00000 1.000000 0.000000
3 1.800000 18.00000 0.000000 0.000000
4 1.900000 19.00000 0.000000 0.000000
5 7.200000 47.00000 0.000000 1.000000
6 3.300000 27.00000 1.000000 0.000000
7 3.100000 26.00000 1.000000 0.000000
8 1.700000 17.00000 0.000000 0.000000
9 6.400000 43.00000 0.000000 1.000000
10 7.900000 49.00000 0.000000 1.000000
11 1.500000 15.00000 0.000000 0.000000
12 3.100000 25.00000 1.000000 0.000000
13 3.600000 29.00000 1.000000 0.000000
14 2.000000 20.00000 0.000000 0.000000
15 6.200000 41.00000 0.000000 1.000000
=========================================
Measuring the estimated results of the different groups:
Less than high school: $\hat{Y}_i = -1.2859 + 0.1722 X_i$
High school: $\hat{Y}_i = (-1.2859 - 0.068) + 0.1722 X_i$
$= -1.3539 + 0.1722 X_i$ if the t-value of D2 is statistically significant
$= -1.2859 + 0.1722 X_i$ if the t-test is not statistically significant
College: $\hat{Y}_i = (-1.2859 + 0.447) + 0.1722 X_i$
$= -0.8389 + 0.1722 X_i$ if the t-value of D3 is statistically significant
$= -1.2859 + 0.1722 X_i$ if the t-test is not statistically significant
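The coefficients quoted above can be reproduced, as a sketch, from the 15 observations in the data table. The helper `ols` (a small normal-equations solver) is an illustrative assumption rather than the software used for the slides.

```python
# Health care data from the lecture table: (Y, X = income, D2 = high school, D3 = college)
data = [
    (6.0, 40, 0, 1), (3.9, 31, 1, 0), (1.8, 18, 0, 0), (1.9, 19, 0, 0),
    (7.2, 47, 0, 1), (3.3, 27, 1, 0), (3.1, 26, 1, 0), (1.7, 17, 0, 0),
    (6.4, 43, 0, 1), (7.9, 49, 0, 1), (1.5, 15, 0, 0), (3.1, 25, 1, 0),
    (3.6, 29, 1, 0), (2.0, 20, 0, 0), (6.2, 41, 0, 1),
]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


rows = [[1.0, d2, d3, x] for _, x, d2, d3 in data]  # constant, D2, D3, income
y = [yi for yi, _, _, _ in data]
b1, b2, b3, b4 = ols(rows, y)

print(f"less than high school: Y = {b1:.4f} + {b4:.4f} X")
print(f"high school:           Y = {b1 + b2:.4f} + {b4:.4f} X")
print(f"college:               Y = {b1 + b3:.4f} + {b4:.4f} X")
```

Whether the high school and college intercepts are treated as different from the base intercept still depends on the t-tests of D2 and D3, which this sketch does not compute.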
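One qualitative variable with many categories:
Example: an estimated model of medical care expenditure for three
different age groups
$Y_i = \beta_1 + \beta_2 D_1 + \beta_3 D_2 + \beta_4 X_i + u_i$
(t-values are reported for the coefficients of D1 and D2)
D1 = 1 if 25 < age < 55
= 0 otherwise
D2 = 1 if age > 55
= 0 otherwise
[Diagram: age axis starting at 0 with break points at 25 and 55; A1 = 1 marks the range from 25 to 55 and A2 = 1 the range above 55.]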
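The estimated models are:
age below 25: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_4 X$
25 < age < 55: $\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_2) + \hat{\beta}_4 X$
age > 55: $\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_3) + \hat{\beta}_4 X$
$H_0: \beta_2 = 0,\ \beta_3 = 0$
$H_1: \beta_2 \neq 0,\ \beta_3 \neq 0$
Compare the t-statistics $t_1^*$ and $t_2^*$ with the critical value $t_c(\alpha/2,\ n-k)$.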
In a scatter diagram:
[Figure: Y against X with three parallel fitted lines, one for each age group (age < 25, 25 < age < 55, age > 55), sharing the slope $\hat{\beta}_4$ and differing only in intercept.]
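One qualitative variable with many categories:
Example: an estimated model of medical care expenditure for four
different age groups
$Y = \beta_1 + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_3 + \beta_5 X + u$
The estimated models are:
age ≤ 15: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_5 X$
15 < age ≤ 35: $\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_4) + \hat{\beta}_5 X$
35 < age ≤ 55: $\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_3) + \hat{\beta}_5 X$
age > 55: $\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_2) + \hat{\beta}_5 X$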
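The slides do not spell out how D1, D2 and D3 are defined for the four age groups. The sketch below assumes the coding implied by the estimated models (D1 for age > 55, D2 for 35 < age ≤ 55, D3 for 15 < age ≤ 35, with age ≤ 15 as the control category); the function name `age_dummies` is hypothetical.

```python
def age_dummies(age):
    """Return (D1, D2, D3) for the four-group age model.

    Assumed coding, inferred from the estimated models rather than stated
    in the slides:
      D1 = 1 if age > 55
      D2 = 1 if 35 < age <= 55
      D3 = 1 if 15 < age <= 35
    Age <= 15 is the control category, so all three dummies are 0.
    """
    d1 = 1 if age > 55 else 0
    d2 = 1 if 35 < age <= 55 else 0
    d3 = 1 if 15 < age <= 35 else 0
    return d1, d2, d3


for age in (10, 20, 40, 60):
    print(age, age_dummies(age))
```

Four categories are coded with three dummies, following the m - 1 rule, and at most one dummy is switched on for any observation.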
Two qualitative variables
Salary (Y) = $\beta_1 + \beta_2 D_1 + \beta_3 D_2 + \beta_4 X + u$
or $Y = \beta_1 + \beta_2 D_1 + \beta_3 D_2 + \beta_4 X + \beta_5 D_1 X + \beta_6 D_2 X + u$
Sex: D1 = 1 if male
= 0 otherwise
Race: D2 = 1 if white
= 0 otherwise
(1) Mean salary for a “non-white” female teacher:
$\hat{Y} = \hat{\beta}_1 + \hat{\beta}_4 X$, that is, D1 = 0, D2 = 0
(2) Mean salary for a “non-white” male teacher:
$\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_2) + (\hat{\beta}_4 + \hat{\beta}_5) X$, that is, D1 = 1, D2 = 0
(3) Mean salary for a “white” female teacher:
$\hat{Y} = (\hat{\beta}_1 + \hat{\beta}_3) + (\hat{\beta}_4 + \hat{\beta}_6) X$, that is, D1 = 0, D2 = 1
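Different types of dummy regression:
1. $Y = \beta_1 + \beta_2 X + \beta'_1 D + \beta'_2 D \cdot X$
$H_0: \beta'_1 = 0$ and $\beta'_2 = 0$
where D = 1 if 1970-1981
= 0 otherwise (1982-1995)
2. $Y = \beta_1 + \beta_2 X + \beta'_1 D + \beta'_2 D \cdot X$
$H_0: \beta'_1 = 0$
3. $Y = \beta_1 + \beta_2 X + \beta'_1 D + \beta'_2 D \cdot X$
$H_0: \beta'_2 = 0$
4. $Y = \beta_1 + \beta_2 X + \beta'_1 D + \beta'_2 D \cdot X$
$H_0: \beta'_1 \neq 0$ and $\beta'_2 \neq 0$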
(1970-1981): $Y_t = A_1 + A_2 X_t + u_{1t}$
(1982-1995): $Y_t = B_1 + B_2 X_t + u_{2t}$
[Figure: four panels of Y against X showing how the two regressions can be related:
Identical regressions: $A_1 = B_1$, $A_2 = B_2$
Parallel regressions: $A_1 \neq B_1$, $A_2 = B_2$
Concurrent regressions: same intercept, different slopes
Dissimilar regressions: different intercepts and different slopes]
Interactive effects between the two qualitative variables
Sex: D1 = 1 if female
= 0 otherwise
Education: D2 = 1 if college graduate
= 0 otherwise
Interaction effect:
Spending (Y) = $\beta_1 + \beta'_1 D_1 + \beta''_1 D_2 + \beta'''_1 D_1 D_2 + \beta_2$ income (X) + u
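[Figure: gas spending (Y) against miles run (X), with used cars plotted as “o” and new cars as “*”; the used-car line is $\hat{Y} = \hat{\beta}_1 + (\hat{\beta}_2 + \hat{\beta}'_2) X$ and the new-car line is $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X$, both starting from the same intercept $\hat{\beta}_1$.]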
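Let the slope be $\beta_2 + \beta'_2 D$, where D = 1 if used car
= 0 otherwise
Now in one model:
$Y_i = \beta_1 + (\beta_2 + \beta'_2 D) X_i + u_i$
$= \beta_1 + \beta_2 X_i + \beta'_2 D \cdot X_i + u_i$ (multiplicative dummy variable)
$= \beta_1 + \beta_2 X_i + \beta'_2 Z_i + u_i$, where $Z_i = D_i X_i$
(ii) Use a t-test on $\hat{\beta}'_2$ in $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{\beta}'_2 Z_i$
$H_0: \beta'_2 = 0$
$H_1: \beta'_2 > 0$ (a used car uses more gasoline per mile)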
obs Yi Xi Di (Di Xi) = Zi
1 210 100 0 0
2 250 110 1 110
3 340 150 1 150
4 305 120 1 120
…...
$\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{\beta}'_2 Z_i$
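Building the multiplicative dummy is a single element-wise product. A minimal sketch using only the four rows shown in the table (the remaining rows are elided in the source and are not filled in here):

```python
# The four observations shown in the table: (Y, X, D)
rows = [(210, 100, 0), (250, 110, 1), (340, 150, 1), (305, 120, 1)]

# Multiplicative (slope) dummy: Z = D * X, so Z equals X for used cars and 0 otherwise
z = [d * x for _, x, d in rows]

for (y, x, d), zi in zip(rows, z):
    print(y, x, d, zi)
```

The column `z` is then entered in the regression as an ordinary regressor next to X.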
Shifts in both intercept and slope
$E = \beta_1 + \beta_2 T + u$
E: electricity consumption
T: temperature
To capture the effect of seasonal factors:
$E = \beta_1 + \beta'_1 D_1 + \beta''_1 D_2 + \beta'''_1 D_3 + \beta_2 T + u$
where D1 = 1 if winter
= 0 otherwise
D2 = 1 if spring
= 0 otherwise
D3 = 1 if summer
= 0 otherwise
Quarters: Q1 = spring, Q2 = summer, Q3 = fall, Q4 = winter
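Measure the basic difference between the four seasonal results:
Fall: $\hat{E} = \hat{\beta}_1 + \hat{\beta}_2 T$
Winter: $\hat{E} = (\hat{\beta}_1 + \hat{\beta}'_1) + \hat{\beta}_2 T$
Spring: $\hat{E} = (\hat{\beta}_1 + \hat{\beta}''_1) + \hat{\beta}_2 T$
Summer: $\hat{E} = (\hat{\beta}_1 + \hat{\beta}'''_1) + \hat{\beta}_2 T$
[Figure: E against T with four parallel lines (summer, spring, winter, fall), sharing the slope $\hat{\beta}_2$ and separated by the intercept shifts $\hat{\beta}'_1$, $\hat{\beta}''_1$ and $\hat{\beta}'''_1$ from the fall intercept $\hat{\beta}_1$.]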
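Also consider the slope in different seasons, using multiplicative dummies $Z_j = D_j T$ (up to $Z_3 = D_3 T$):
$E = \beta_1 + \beta'_1 D_1 + \beta''_1 D_2 + \beta'''_1 D_3 + \beta_2 T + \beta'_2 Z_1 + \beta''_2 Z_2 + \beta'''_2 Z_3 + u$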
Measure the four seasonal results:
Fall: $\hat{E} = \hat{\beta}_1 + \hat{\beta}_2 T$
Winter: $\hat{E} = (\hat{\beta}_1 + \hat{\beta}'_1) + (\hat{\beta}_2 + \hat{\beta}'_2) T$
Spring: $\hat{E} = (\hat{\beta}_1 + \hat{\beta}''_1) + (\hat{\beta}_2 + \hat{\beta}''_2) T$
Summer: $\hat{E} = (\hat{\beta}_1 + \hat{\beta}'''_1) + (\hat{\beta}_2 + \hat{\beta}'''_2) T$
[Figure: E against T with four lines (summer, spring, winter, fall) that differ in both intercept and slope.]
A quarterly effect is handled in the same way as a seasonal effect:
D1 = 1 if 1st quarter
= 0 otherwise
D2 = 1 if 2nd quarter
= 0 otherwise
D3 = 1 if 3rd quarter
= 0 otherwise
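A sketch of how one row of regressors could be assembled from a quarter number, with the 4th quarter as the omitted category and a continuous regressor such as temperature T. The function name `design_row` and the column order are illustrative assumptions.

```python
def design_row(quarter, temperature):
    """Regressors for one observation: constant, D1-D3, T, then D1*T, D2*T, D3*T.

    The 4th quarter is the omitted (control) category, so its row has
    all three dummies and all three interaction terms equal to zero.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    dummies = [1 if quarter == q else 0 for q in (1, 2, 3)]
    interactions = [d * temperature for d in dummies]
    return [1] + dummies + [temperature] + interactions


print(design_row(2, 25.0))  # 2nd quarter: D2 and D2*T are switched on
print(design_row(4, 10.0))  # 4th quarter: control category
```

Dropping the three interaction columns gives the intercept-only seasonal model; keeping them lets each quarter have its own slope as well.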
Using the dummy variable to identify a structural change
[Timeline: 1960, 1974, 1989]
Basic model:
$Y_t = \beta_1 + \beta_2 X_t + u_t$
Dummy regression:
$Y_t = \beta_1 + \beta'_1 D + \beta_2 X_t + \beta'_2 D X_t + u_t$
The Chow test on the unemployment rate-capacity utilization rate relationship
(t-values in parentheses; R2 is the adjusted R-squared)
Dependent var.   Constant   CAPt      R2      F       RSS            n
unempl_t         30.0       -0.293    0.761   93.6    17.15 (RSSR)   30
                 (12.1)     (9.7)
unempl_t         19.64      -0.175    0.59    19.7    4.69 (RSS1)    14
                 (5.9)      (4.4)
unempl_t         30.63      -0.296    0.871   102.1   3.29 (RSS2)    16
                 (13.1)     (10.1)
Dummy regression:
$\widehat{unempl}_t = 19.6 + 11.0 D_t - 0.175\, CAP_t - 0.121\, (D_t \cdot CAP_t)$
(6.7) (2.7) (5.0) (2.5)
Adjusted R2 = 0.88, SEE = 0.554, F = 72.2, n = 30
Prior to 1974: $\hat{Y} = (\hat{\beta}_1 + \hat{\beta}'_1 D) + (\hat{\beta}_2 + \hat{\beta}'_2 D) X$
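The residual sums of squares in the table are enough to compute the Chow statistic by hand. The sketch below assumes the usual form F = [(RSSR - RSS1 - RSS2) / k] / [(RSS1 + RSS2) / (n1 + n2 - 2k)], with k = 2 parameters (constant and CAP) in each regression; the function name `chow_f` is illustrative.

```python
def chow_f(rss_r, rss_1, rss_2, n1, n2, k):
    """Chow test statistic and its degrees of freedom.

    rss_r        : RSS of the pooled (restricted) regression
    rss_1, rss_2 : RSS of the two sub-period regressions
    n1, n2       : number of observations in each sub-period
    k            : number of estimated parameters per regression
    """
    rss_ur = rss_1 + rss_2          # unrestricted RSS
    df2 = n1 + n2 - 2 * k           # denominator degrees of freedom
    f = ((rss_r - rss_ur) / k) / (rss_ur / df2)
    return f, k, df2


# Values taken from the table: RSSR = 17.15, RSS1 = 4.69 (n = 14), RSS2 = 3.29 (n = 16)
f_stat, df1, df2 = chow_f(17.15, 4.69, 3.29, 14, 16, 2)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

A value of this size is far above the 5% critical value of F(2, 26), which is roughly 3.4, so the hypothesis of no structural change would be rejected; this agrees with the significant t-values on the two dummy terms in the dummy regression.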
GENR DUMMY = 1 (sample 1970 - 1980)
GENR DUMMY = 0 (sample 1981 - 1991)
=================================================
obs SAVINGS INCOME DUMMY D*INCOME
=================================================
1970 57.50000 831.0000 1.000000 831.0000
1971 65.40000 893.5000 1.000000 893.5000
1972 59.70000 980.5000 1.000000 980.5000
1973 86.10000 1098.700 1.000000 1098.700
1974 93.40000 1205.700 1.000000 1205.700
1975 100.3000 1307.300 1.000000 1307.300
1976 93.00000 1446.300 1.000000 1446.300
1977 87.90000 1601.300 1.000000 1601.300
1978 107.8000 1807.900 1.000000 1807.900
1979 123.3000 2033.100 1.000000 2033.100
1980 153.8000 2265.400 1.000000 2265.400
1981 191.8000 2534.700 0.000000 0.000000
1982 199.5000 2690.900 0.000000 0.000000
1983 168.7000 2862.500 0.000000 0.000000
1984 222.0000 3154.600 0.000000 0.000000
1985 189.3000 3379.800 0.000000 0.000000
1986 187.5000 3590.400 0.000000 0.000000
1987 142.0000 3802.000 0.000000 0.000000
1988 155.7000 4075.900 0.000000 0.000000
1989 175.6000 4664.200 0.000000 0.000000
1990 175.6000 4664.200 0.000000 0.000000
1991 199.6000 4828.300 0.000000 0.000000
=================================================
Savings = $\beta_1 + \beta_2$ Income $+ \beta'_1 D + \beta'_2 D \cdot$ Income $+ u$
D = 1 for 1970-1980
= 0 for 1981-1991
Estimated regressions (t-values in parentheses):
1970 - 1991:
Savings = 57.63 + 0.031 Income
(3.86) (5.95)
1970 - 1980:
Savings = 14.61 + 0.056 Income
(1.40) (7.93)
1981 - 1991:
Savings = 217.81 - 0.010 Income
(6.16) (-1.08)
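The link between the sub-period regressions and the single dummy regression can be sketched as follows. The code fits each period separately on the table as printed and then derives the coefficients the dummy model would give, using the identity that, with D = 1 for 1970-1980, the base coefficients are those of 1981-1991 and the dummy terms are the differences. On the printed data the 1970-1980 fit appears to match the slide; the 1981-1991 figures depend on the table as printed, in which the 1989 and 1990 rows are identical, so they may differ from the slide.

```python
# (year, savings, income) from the table
data = [
    (1970, 57.5, 831.0), (1971, 65.4, 893.5), (1972, 59.7, 980.5),
    (1973, 86.1, 1098.7), (1974, 93.4, 1205.7), (1975, 100.3, 1307.3),
    (1976, 93.0, 1446.3), (1977, 87.9, 1601.3), (1978, 107.8, 1807.9),
    (1979, 123.3, 2033.1), (1980, 153.8, 2265.4), (1981, 191.8, 2534.7),
    (1982, 199.5, 2690.9), (1983, 168.7, 2862.5), (1984, 222.0, 3154.6),
    (1985, 189.3, 3379.8), (1986, 187.5, 3590.4), (1987, 142.0, 3802.0),
    (1988, 155.7, 4075.9), (1989, 175.6, 4664.2), (1990, 175.6, 4664.2),
    (1991, 199.6, 4828.3),
]


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


early = [(s, inc) for yr, s, inc in data if yr <= 1980]  # D = 1
late = [(s, inc) for yr, s, inc in data if yr >= 1981]   # D = 0

a_early, b_early = simple_ols([i for _, i in early], [s for s, _ in early])
a_late, b_late = simple_ols([i for _, i in late], [s for s, _ in late])

# Coefficients implied for Savings = b1 + b2*Income + b1d*D + b2d*D*Income
beta1, beta2 = a_late, b_late            # base period, D = 0
beta1_d = a_early - a_late               # intercept shift when D = 1
beta2_d = b_early - b_late               # slope shift when D = 1

print(f"1970-1980: Savings = {a_early:.2f} + {b_early:.3f} Income")
print(f"1981-1991: Savings = {a_late:.2f} + {b_late:.3f} Income")
print(f"dummy terms: {beta1_d:.2f} D + {beta2_d:.3f} D*Income")
```

Testing the two dummy terms jointly is the dummy-variable counterpart of the Chow test.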
Considering only the difference in intercept:
LS // Dependent Variable is SAVINGS
Date: 03/02/99 Time: 22:23
Sample: 1946 1963
Number of observations: 18
=====================================================
Variable Coefficient Std. Error t-Statistic. Prob.
=====================================================
C -1.250957 0.364879 -3.428419 0.0037
DUMMY 0.091857 0.181244 0.506816 0.6197
INCOME 0.125655 0.017837 7.044517 0.0000
=====================================================
R-squared 0.919909 Mean dependent var 0.773333
Adjusted R-squared 0.909230 S.D. dependent var 0.642806
S.E. of regression 0.193665 Akaike info criterion -3.132238
Sum squared resid 0.562593 Schwarz criterion -2.983843
Log likelihood 5.649250 F-statistic 86.14326
Durbin-Watson stat 0.976197 Prob(F-statistic) 0.000000
=====================================================
$\widehat{Savings} = -1.250 + 0.091\, Dummy + 0.125\, Income$
(-3.42) (0.506) (7.04)
Do both the intercept and the slope change?
LS // Dependent Variable is SAVINGS
Date: 03/02/99 Time: 22:23
Sample: 1946 1963
Number of observations: 18
=====================================================
Variable Coefficient Std. Error t-Statistic. Prob.
=====================================================
C -1.750172 0.331888 -5.273377 0.0001
DUMMY 1.483923 0.470362 3.154852 0.0070
INCOME 0.150450 0.016286 9.238172 0.0000
DINCOME -0.103422 0.033260 -3.109471 0.0077
=====================================================
R-squared 0.952626 Mean dependent var 0.773333
Adjusted R-squared 0.942475 S.D. dependent var 0.642806
S.E. of regression 0.154173 Akaike info criterion -3.546228
Sum squared resid 0.332771 Schwarz criterion -3.348367
Log likelihood 10.37516 F-statistic 93.84109
Durbin-Watson stat 1.468099 Prob(F-statistic) 0.000000
=====================================================
$\widehat{Savings} = -1.750 + 1.483\, D + 0.150\, Income - 0.103\, (Income \cdot D)$
(-5.273) (3.154) (9.238) (-3.109)
THE END