Asymp2 Analogy 2006-04-05 Mms
James J. Heckman
University of Chicago
Econ 312
This draft, April 5, 2006
Consider four methods:

3. Method of Moments

The model:

    Y_t = X_t β + U_t

Under standard assumptions:
1. U_t ~ N(0, σ_u²), U_t i.i.d.;
2. X_t non-stochastic; and
3. Σ_t X_t X_t′ / T of full rank.
1 Analogy Principle: The Large Sample Version

This is originally due to Karl Pearson or Goldberger.
The main points of this principle are set out in §1.1. The
conditions for consistency of the estimator are discussed in
§1.2. In §1.3, an abstract example with a finite parameter set
illustrates the application of the analog principle and the role of
regularity conditions in ensuring consistency of the estimator.
1.1 Outline of the Analog Principle

The ideas constituting the analog principle can be grouped under four steps:

3. Analog in sample. In the sample, construct an ‘analog’ to Q, namely Q_T(Y, X, θ), which has the following properties:

    (a) (∀θ ∈ Θ) Q_T(Y, X, θ) → Q(Y, X, θ) a.s. uniformly in θ, where T is the sample size; and

    (b) Q_T(Y, X, θ) mimics in the sample the properties of Q(Y, X, θ)¹ in the population.

¹ Hereafter, we shall suppress the dependence of the criterion function Q and the sample analog Q_T on the data (Y, X) for notational convenience.
1.2 Consistency and Regularity

Definition 1 (Consistency) We say that an estimator θ̂_T is a consistent estimator of parameter θ_0 if

    θ̂_T → θ_0 in probability.
2. Uniform convergence of the analog function. We need the sample analog Q_T(θ) to converge ‘nicely’. In particular, we need the convergence of the sample analog criterion function to the criterion function, (∀θ ∈ Θ) Q_T(θ) → Q(θ), to be uniform.
The first condition ensures that θ_0 can be identified. If there is more than one value of θ that causes Q to have property P, then we cannot be sure that only one value of θ will cause Q_T to assume property P in the sample. In this case we may not be able to determine what θ̂_T estimates.
1.3 An Abstract Example

The intuition underlying the analog principle method and the role of the regularity conditions is illustrated in the simple (non-stochastic) example below.
• Select θ̂_T to maximize Q_T(θ) for each T. Then, if convergence is ‘OK’ (we get that under the regularity conditions), the sample maximizer converges to the maximizer in the population; i.e. we have θ̂_T → θ_0 as T → ∞.
Now suppose Θ = {1, 2, 3} is a finite set, so that θ̂ assumes only one of three values. Then under regularity condition 1 (q.v. §1.2), Q(θ) is maximized at one of these.

Further, by construction we have

    (∀θ ∈ Θ = {1, 2, 3}) Q_T(θ) → Q(θ).

This estimator must work well (i.e. be consistent for θ_0) for ‘big enough’ T, because Q_T → Q for each θ. Why? We loosely sketch a proof of the consistency of θ̂ for θ_0 below.
Say θ_0 = 2, so that Q(θ) is maximized at θ = 2: Q(2) > Q(1) and Q(2) > Q(3).

Now suppose that even for very large T, Q_T(θ) is not maximized at 2 but, say, at 1. Then Q_T(1) ≥ Q_T(2). But Q_T(1) → Q(1) and Q_T(2) → Q(2), so in the limit Q(1) ≥ Q(2), contradicting Q(2) > Q(1). Hence for all T large enough, Q_T must be maximized at θ = 2, and θ̂_T = 2 = θ_0.
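The sketch above can be simulated. In the snippet below the specific values of Q and the form of the sampling noise are illustrative assumptions, not from the notes: we take Θ = {1, 2, 3}, a population criterion Q maximized at θ_0 = 2, and a stylized sample analog Q_T(θ) = Q(θ) + O_p(T^(-1/2)).

```python
import random

# Illustrative simulation (assumed numbers, not from the notes): a finite
# parameter set Theta = {1, 2, 3}, a population criterion Q with a unique
# maximum at theta_0 = 2, and a noisy sample analog of Q.
Q = {1: 0.5, 2: 1.0, 3: 0.3}  # population criterion; unique max at theta = 2

def Q_T(theta, T, rng):
    # sample analog: the noise shrinks as T grows, so Q_T -> Q pointwise
    return Q[theta] + rng.gauss(0, 1) / T ** 0.5

rng = random.Random(0)
estimates = {T: max([1, 2, 3], key=lambda th: Q_T(th, T, rng))
             for T in (10, 1_000, 100_000)}
print(estimates)  # for large T the maximizer of Q_T settles at theta_0 = 2
```

For small T the noise can flip the ranking of Q_T(1) and Q_T(2), but once T is large the gap Q(2) − Q(1) dominates the O_p(T^(-1/2)) noise, exactly as in the contradiction argument above.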
2 Overview of Convergence Concepts for Non-stochastic Functions

Most estimators involve forming functions of the available data. These functions of the data can be viewed as sequences of functions, with the index being the number of data points available.
In this section, we examine certain basic concepts concerning convergence of sequences of functions. The key idea here is that of uniform convergence; we shall broadly motivate why this notion of convergence is required.
2.1 Pointwise Convergence

Let F_n : S → ℝ be a sequence of real-valued functions. For each x ∈ S form the sequence {F_n(x)}_{n=1}^∞. Let B ⊆ S be the set of points for which F_n(x) converges; on B, define the pointwise limit F(x) = lim_{n→∞} F_n(x).
Inheritance requires uniform convergence. Uniform convergence is usually a sufficient condition for the properties of the F_n to be shared by the limit function. Some of the properties we will be interested in are the following:

1. If each F_n is continuous, is the limit function F continuous?
2. Can the order of iterated limits be interchanged?
3. Can limits and integration be interchanged?

The answer to all three questions is: No, not in general. The pointwise convergence of F_n(x) to F(x) is generally not sufficient to guarantee that these properties hold. This is demonstrated by the examples which follow.
Example 1

Consider F_n(x) = xⁿ (0 ≤ x ≤ 1). Here, F_n(x) is continuous for every n, but the limit function is

    F(x) = 0 if 0 ≤ x < 1,
           1 if x = 1.

This is discontinuous at x = 1.
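A quick numeric check of Example 1 makes the failure of uniformity concrete: for any fixed x < 1 the value xⁿ dies out, yet for every n the supremum of |F_n − F| over [0, 1) stays near 1.

```python
# Numeric check of Example 1: F_n(x) = x**n on [0, 1] converges pointwise
# to 0 on [0, 1), yet sup |F_n - F| stays near 1 for every n.
def F_n(n, x):
    return x ** n

# pointwise: for any fixed x < 1, F_n(x) -> 0
assert F_n(1000, 0.5) < 1e-12
# sup over a fine grid of [0, 1): still close to 1 even for n = 1000
sup_dev = max(F_n(1000, k / 10_000) for k in range(10_000))
print(sup_dev)  # near 1, so the convergence is not uniform
```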
Example 2

Consider

    F_n(x) = x / (x + n)   (x ∈ ℝ).

Then the limit function is F(x) = lim_{n→∞} F_n(x) = 0 for each fixed x. This implies that lim_{x→∞} lim_{n→∞} F_n(x) = 0.

But we have lim_{x→∞} F_n(x) = 1 for every fixed n. This implies that

    lim_{n→∞} lim_{x→∞} F_n(x) = 1 ≠ lim_{x→∞} lim_{n→∞} F_n(x) = 0.
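The disagreement of the two iterated limits in Example 2 can be checked numerically by pushing each argument to a large value in turn:

```python
# Numeric check of Example 2: F_n(x) = x / (x + n); the two iterated
# limits disagree, so their order cannot be interchanged.
def F_n(n, x):
    return x / (x + n)

assert F_n(10 ** 9, 5.0) < 1e-8          # n -> infinity first: limit 0
assert abs(F_n(3, 10 ** 12) - 1) < 1e-8  # x -> infinity first: limit 1
print("iterated limits differ: 0 vs 1")
```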
Example 3

Consider F_n(x) = n x (1 − x²)ⁿ (0 ≤ x ≤ 1). Then the limit function is²

    F(x) = lim_{n→∞} F_n(x) = 0

for each fixed x. This implies that ∫₀¹ F(x) dx = 0. But we also get

    lim_{n→∞} ∫₀¹ F_n(x) dx = lim_{n→∞} n / (2n + 2) = 1/2.

² See Rudin, Chapter 3, for concepts relating to convergence of sequences.
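A numeric check of Example 3: a simple Riemann sum reproduces the closed form n/(2n + 2), confirming that the integrals tend to 1/2 even though the pointwise limit integrates to 0.

```python
# Numeric check of Example 3: the limit of the integrals of
# F_n(x) = n*x*(1 - x**2)**n is 1/2, while the integral of the
# pointwise limit function is 0.
def F_n(n, x):
    return n * x * (1 - x ** 2) ** n

def integral(n, pts=200_000):
    # midpoint Riemann sum over [0, 1]
    return sum(F_n(n, (k + 0.5) / pts) for k in range(pts)) / pts

closed_form = 1000 / (2 * 1000 + 2)      # n/(2n+2) at n = 1000
assert abs(integral(1000) - closed_form) < 1e-3
print(closed_form)  # -> 1/2 as n grows
```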
2.2 Uniform Convergence

A sequence of functions {F_n} converges uniformly to F on B if for every ε > 0 there exists an N such that n ≥ N implies |F_n(x) − F(x)| ≤ ε for all x ∈ B; equivalently, sup_{x∈B} |F_n(x) − F(x)| → 0.

Note that the convergence displayed in the examples in §2.1 did not satisfy this notion of ‘uniform’ convergence.³

³ See Rudin, Chapter 7. Theorems 7.12, 7.11, and 7.16 respectively ensure that the three properties listed in §2.1 hold.
3 Applications of the Analog Principle

3.1 Moment Principle

In this case, the criterion function Q is an equation connecting population moments and θ_0:

    Q = Q(population moments of [y, x], θ) = 0 at θ = θ_0.   (1)
When forming the estimator ‘analog’ in the sample, we can consider two cases.

3.1.1 Case A

Here we solve for θ_0 in the population equation, obtaining

    θ_0 = θ_0(population moments of [y, x]).

Given regularity condition 1 (q.v. §1.2), we get θ̂_T → θ_0 in probability. Note that we do not need regularity condition 2.
Example 4 (OLS)

1. The model.

    Y_t = X_t β_0 + U_t,   E(U_t) = 0,   Var(U_t) = σ_u²,   E(X_t′ U_t) = 0,
    E(X_t′ X_t) = Σ_xx positive definite,   E(X_t′ Y_t) = Σ_xy.

2. Criterion function. Since E(X_t′ U_t) = 0, and

    X_t′ Y_t = X_t′ X_t β_0 + X_t′ U_t,

we get

    Q:  Σ_xy = Σ_xx β_0 + 0
    ⇒  β_0 = Σ_xx⁻¹ Σ_xy.
3. Analog in sample.

    β̂_T = (Σ_t X_t′ X_t / T)⁻¹ (Σ_t X_t′ Y_t / T).

Now, the r.h.s. → Σ_xx⁻¹ Σ_xy. Thus β̂_T → β_0 in probability.
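Case A can be sketched in a short Monte Carlo exercise. The design below (two standard-normal regressors, σ_u = 1, a particular β_0) is an illustrative assumption, not from the notes:

```python
import numpy as np

# Monte Carlo sketch of Case A (assumed design, not from the notes):
# simulate Y_t = X_t' beta_0 + U_t and form the sample-moment analog of
# beta_0 = Sigma_xx^{-1} Sigma_xy.
rng = np.random.default_rng(0)
T = 100_000
beta0 = np.array([1.0, -2.0])
X = rng.normal(size=(T, 2))                  # stand-in for the regressors
U = rng.normal(size=T)
Y = X @ beta0 + U

Sxx_hat = X.T @ X / T                        # sample analog of Sigma_xx
Sxy_hat = X.T @ Y / T                        # sample analog of Sigma_xy
beta_hat = np.linalg.solve(Sxx_hat, Sxy_hat)
print(beta_hat)                              # close to beta_0 for large T
```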
3.1.2 Case B

Here we form the criterion function, form the sample analog for the criterion function, and then solve for the estimator.
3. Analog in sample. Here we define a sample analog of U:

    Û = y − x β̂.

This mimics U in the criterion function, so that we can form the sample analog of the criterion function by substituting Û for U in the expression for Q above:

    Q_T = (1/T) Σ_t x_t (Y_t − X_t β̂) = 0.
Remarks.
3.2 Extremum Principle

We saw in §1.3 that under the extremum principle, property P provides that Q(θ) achieves a maximum or minimum at θ = θ_0 in the population.
The OLS estimator can be seen as a special case of the NLS estimator, and can be viewed as an extremum estimator.
Example 1 (OLS as an extremum principle)

1. The model. We assume the true model

    Y_t = X_t β_0 + U_t
    ⇒ Y_t = X_t β + X_t (β_0 − β) + U_t
    ⇒ (Y_t − X_t β)′ (Y_t − X_t β) = (β_0 − β)′ X_t′ X_t (β_0 − β)
                                     + 2 U_t′ X_t (β_0 − β) + U_t′ U_t.

2. Criterion function.

    Q = E[(Y_t − X_t β)′ (Y_t − X_t β)]

From the model assumptions, we have E(X_t′ U_t) = 0. Thus

    Q = (β_0 − β)′ Σ_xx (β_0 − β) + σ_u².
So Q is minimized (with respect to β) at β = β_0. Here, then, Q is an example of the extremum principle: it is minimized when β = β_0 (the true parameter vector).

3. Analog in sample.

    Q_T = (1/T) Σ_{t=1}^T (Y_t − X_t β)′ (Y_t − X_t β).
We can show that this analog satisfies the key requirement of the analog principle:

    plim Q_T = plim (1/T) Σ_{t=1}^T (Y_t − X_t β)′ (Y_t − X_t β)
             = (β_0 − β)′ Σ_xx (β_0 − β) + σ_u² = Q.
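This extremum view can be checked numerically. The scalar design below (β_0 = 0.7, standard-normal X and U) is an illustrative assumption, not from the notes: the sample criterion Q_T(b) is smallest at the OLS estimate, which in turn is close to β_0 for large T.

```python
import random

# Illustrative check (assumed scalar model, not from the notes): the
# sample criterion Q_T(b) = (1/T) * sum (Y_t - X_t b)^2 is minimized at
# the OLS estimate, and the OLS estimate is close to beta_0 for large T.
rng = random.Random(1)
T, beta0 = 50_000, 0.7
X = [rng.gauss(0, 1) for _ in range(T)]
Y = [x * beta0 + rng.gauss(0, 1) for x in X]

def Q_T(b):
    return sum((y - x * b) ** 2 for x, y in zip(X, Y)) / T

b_ols = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
# Q_T is a quadratic in b with minimum exactly at the OLS estimate
assert Q_T(b_ols) < Q_T(b_ols - 0.1) and Q_T(b_ols) < Q_T(b_ols + 0.1)
print(b_ols)  # close to beta_0 = 0.7 for large T
```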
Example 2 (NLS as an extremum principle)

1. The model. Y_t = g(X_t; θ_0) + U_t, with (∀θ) U_t ⊥⊥ g(X_t; θ).
2. Criterion function. Choose the criterion function as

    Q = E[(Y_t − g(X_t; θ))²] = E[(g(X_t; θ_0) − g(X_t; θ))²] + σ_u².

3. Analog in sample.

    Q_T(θ) := (1/T) Σ_{t=1}^T (Y_t − g(X_t; θ))².
4. The estimator. We construct the NLS estimator as

    θ̂_T = argmin_θ Q_T(θ).

Then

    Q_T(θ_0) → Q(θ_0)  and  (∀θ ∈ Θ) Q_T(θ) → Q(θ)  ⇒  θ̂_T → θ_0.
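A minimal NLS sketch: the regression function g(x; θ) = exp(θx) and the design below are illustrative assumptions, not from the notes. We approximate the compact parameter set Θ = [0, 1] by a grid and minimize Q_T over it.

```python
import numpy as np

# Minimal NLS sketch with an assumed regression function (not from the
# notes): g(x; theta) = exp(theta * x). We minimize the sample criterion
# Q_T over a grid approximating the compact parameter set Theta = [0, 1].
rng = np.random.default_rng(2)
T, theta0 = 20_000, 0.5
x = rng.uniform(0.0, 1.0, size=T)
y = np.exp(theta0 * x) + rng.normal(scale=0.1, size=T)

theta_grid = np.linspace(0.0, 1.0, 201)
Q_T = np.array([np.mean((y - np.exp(th * x)) ** 2) for th in theta_grid])
theta_hat = float(theta_grid[np.argmin(Q_T)])
print(theta_hat)  # close to theta_0 = 0.5
```

The grid search is the ‘directly by a grid search’ route mentioned for MLE below; with a smooth g one would instead solve the FOC numerically.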
Remark. The NLS estimator could also be derived as a moment estimator, just as in the OLS example (q.v. §3.1).

1. The model. Same as the non-linear model above. We have U_t ⊥⊥ g(X_t; θ).

2. Criterion function. Q = E[U_t · g(X_t; θ)] = 0. Note that this is only one implication of ⊥⊥. We may now write

    Y_t − g(X_t; θ) = U_t
    ⇒ Q = E[(Y_t − g(X_t; θ)) · g(X_t; θ)] = 0.

3. Analog in sample.

    Q_T := (1/T) Σ_{t=1}^T (Y_t − g(X_t; θ)) · g(X_t; θ).

4. The estimator. Find θ̂ which sets Q_T = 0 (or as close to zero as possible).
4 Maximum Likelihood

The maximum likelihood estimation (MLE) method is an example of the extremum principle. In this section, we look at the ideas underlying MLE and examine the regularity conditions and convergence notions in more detail for this estimator.
4.1 The Model

Suppose that the joint density of the data is

    f(y_t, x_t; θ_0) = f(y_t | x_t; θ_0) · f(x_t).

Assume that x_t is ‘exogenous’, i.e. the density of x_t is uninformative about θ_0. Also assume random sampling. We arrive at the likelihood function

    L = ∏_{t=1}^T f(y_t, x_t; θ_0).
4.2 Criterion Function

In the population, define the criterion function as

    Q = E_{θ_0}[ln f(y_t, x_t; θ)]
      = ∫ ln f(y_t, x_t; θ) · f(y_t, x_t; θ_0) dy_t dx_t.
Claim. The criterion function is maximized at θ = θ_0.

Proof.

    E_{θ_0}[ f(y_t, x_t; θ) / f(y_t, x_t; θ_0) ] = 1

because

    ∫ [f(y_t, x_t; θ) / f(y_t, x_t; θ_0)] · f(y_t, x_t; θ_0) dy_t dx_t = 1.
Applying Jensen’s inequality, concavity of the ln function implies that

    E[ln(x)] ≤ ln(E[x])
    ⇒ E_{θ_0}[ ln( f(y_t, x_t; θ) / f(y_t, x_t; θ_0) ) ] ≤ 0
    ⇒ (∀θ) E_{θ_0}[ln f(y_t, x_t; θ)] ≤ E_{θ_0}[ln f(y_t, x_t; θ_0)].
4.3 Analog in Sample

Construct a sample analog Q_T of the criterion function as

    Q_T := (1/T) Σ_{t=1}^T ln f(y_t, x_t; θ).
Local form of the principle. In the local form, we use the FOC and SOC to arrive at θ̂ ∈ argmax_{θ∈Θ} Q_T(θ). Recall that we have the criterion function

    Q(θ) = ∫ ln f(y; θ) · f(y; θ_0) dy = E_{θ_0}[ln f(y; θ)].
Accordingly, we require for the sample analog:

    (1/T) Σ_{t=1}^T ∂ ln f(y_t; θ) / ∂θ = 0,

    (1/T) Σ_{t=1}^T ∂² ln f(y_t; θ) / ∂θ ∂θ′ negative definite.

Either way (e.g. directly by a grid search or using the FOC/SOCs), we have the same basic idea: for each T, we pick θ̂_T such that Q_T(θ̂_T) ≥ Q_T(θ) for all θ ∈ Θ.
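The local (FOC/SOC) form can be illustrated with an assumed model, not from the notes: y_t ~ N(μ, 1), for which ln f(y_t; μ) = −(y_t − μ)²/2 + const, the sample FOC is (1/T) Σ (y_t − μ) = 0, and the second derivative is −1 < 0, so the SOC holds automatically.

```python
import random

# MLE sketch for an assumed model (not from the notes): y_t ~ N(mu, 1).
# The sample FOC (1/T) * sum (y_t - mu) = 0 is solved by the sample mean,
# and the second derivative of the log-likelihood is -1 < 0 (SOC holds).
rng = random.Random(3)
T, mu0 = 10_000, 2.0
y = [rng.gauss(mu0, 1) for _ in range(T)]

def score(mu):
    # (1/T) * sum of d/dmu ln f(y_t; mu) for the N(mu, 1) density
    return sum(yt - mu for yt in y) / T

mu_hat = sum(y) / T          # solves the FOC score(mu) = 0 exactly
assert abs(score(mu_hat)) < 1e-9
print(mu_hat)                # close to mu_0 = 2 for large T
```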
5 Some Concepts and Definitions for Random (Stochastic) Functions

In §5.1 we define random functions and examine some fundamental properties of such functions. In §5.2 we define convergence concepts for sequences of random functions.
5.1 Random Functions and Some Properties

Definition 2 (Random Function) Let (Ω, A, P) be a probability space and let Θ ⊆ ℝᵏ. A real function φ(θ) = φ(θ, ω) on Θ × Ω is called a random function on Θ if

    (∀t ∈ ℝ¹)(∀θ ∈ Θ)  {ω ∈ Ω : φ(θ, ω) < t} ∈ A.
Proposition. If φ(θ, x) is a continuous real-valued function on Θ × ℝⁿ, where Θ is compact, then
Proposition. If for almost all values of x ∈ X, g(x, θ) is continuous with respect to θ at the point θ_0, and if for all θ in a neighborhood of θ_0 we have

    |g(x, θ)| < G₁(x) < ∞,

then

    lim_{θ→θ_0} ∫ g(x, θ) dF(x) = ∫ g(x, θ_0) dF(x),

i.e.

    lim_{θ→θ_0} E[g(x, θ)] = E[g(x, θ_0)].
Proposition. If for almost all values of x ∈ X and for a fixed value of θ,

    (a) ∂g(x, θ)/∂θ exists (in a neighborhood of θ), and

    (b) | (g(x, θ + h) − g(x, θ)) / h | < G₂(x),

then ∂/∂θ E[g(x, θ)] = E[∂g(x, θ)/∂θ].
5.2 Convergence Concepts for Random Functions

In Part I (asymptotic theory) we defined convergence concepts for random variables. Here we define analogous concepts for random functions.

Pointwise (in θ) almost sure convergence of F_n to F requires that, for every fixed θ, the set S_θ ⊆ Ω on which |F_n(θ, ω) − F(θ, ω)| ≥ ε for some n ≥ n_0(θ, ε) has no probability.
The union S = ∪_{θ∈Θ} S_θ may have a non-negligible probability even though each individual set S_θ has negligible probability. We avoid this by the following definition.
Definition 5 (Convergence in Probability) Let F_n(θ) and F(θ) be random functions on Θ. Then F_n(θ) → F(θ) in probability uniformly in θ on Θ if

    lim_{n→∞} P{ sup_{θ∈Θ} |F_n(θ) − F(θ)| > ε } = 0.
Theorem 1 (Strong Uniform Law of Large Numbers) Let {x_n} be a sequence of i.i.d. random k × 1 vectors. Let F(x, θ) be a continuous real function on ℝᵏ × Θ, where Θ is compact (closed and bounded, so every open cover has a finite subcover). Then

    (1/T) Σ_{j=1}^T F(x_j, θ) → ∫ F(x, θ) dG(x)   a.s. uniformly in θ.
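The uniform LLN can be illustrated numerically with assumed ingredients, not from the notes: F(x, θ) = cos(θx) with x ~ N(0, 1), for which the population limit is E[cos(θx)] = exp(−θ²/2), the real part of the normal characteristic function.

```python
import numpy as np

# Numeric illustration of the uniform LLN under assumed ingredients (not
# from the notes): F(x, theta) = cos(theta * x) with x ~ N(0, 1), so the
# population limit is E[cos(theta * x)] = exp(-theta**2 / 2).
rng = np.random.default_rng(4)
theta_grid = np.linspace(0.0, 2.0, 41)       # compact Theta = [0, 2]

def sup_dev(T):
    # sup over the theta grid of |sample mean - population mean|
    x = rng.normal(size=T)
    emp = np.array([np.cos(th * x).mean() for th in theta_grid])
    pop = np.exp(-theta_grid ** 2 / 2)
    return float(np.max(np.abs(emp - pop)))

print(sup_dev(100), sup_dev(100_000))  # the sup deviation shrinks as T grows
```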
This is a type of LLN, and can be modified for the non-i.i.d. case. In that case, each x_j has its own distribution F_j, and we would require that

    (a) (1/T) Σ_{j=1}^T F_j → G in distribution, and

    (b) (1/T) Σ_{j=1}^T sup E(|ψ(x_j)|^{1+δ}) < ∞ for some δ > 0.

Then

    (1/T) Σ_{j=1}^T F(x_j, θ) → ∫ F(x, θ) dG(x)   a.s. uniformly in θ.
References

1. Amemiya, Advanced Econometrics, 1985, Chapter 3.

6. Stokey, Lucas, and Prescott, Recursive Methods in Economic Dynamics, 1989, Chapter 3.