Asymp2 Analogy 2006-04-05 Mms
James J. Heckman
University of Chicago
Econ 312
This draft, April 5, 2006
Consider four methods:

3. Method of Moments

The model:

    Y_t = X_t β + U_t

Under standard assumptions:
1. U_t ~ N(0, σ_u²), U_t i.i.d.;
2. X_t non-stochastic; and
3. Σ_t X_t X_t′ / T of full rank.
1 Analogy Principle: The Large Sample Version

This is originally due to Karl Pearson or Goldberger.
The main points of this principle are set out in §1.1. The
conditions for consistency of the estimator are discussed in
§1.2. In §1.3, an abstract example with a finite parameter set
illustrates the application of the analog principle and the role of
regularity conditions in ensuring consistency of the estimator.
1.1 Outline of the Analog Principle

The ideas constituting the analog principle can be grouped under four steps:

3. Analog in sample. In the sample, construct an ‘analog’ to Q, namely Q_T(Y, X, θ), which has the following properties:

    (a) (∀θ ∈ Θ) Q_T(Y, X, θ) → Q(Y, X, θ) a.s. uniformly in θ, where T is the sample size; and

    (b) Q_T(Y, X, θ) mimics in the sample the properties of Q(Y, X, θ)¹ in the population.

¹ Hereafter, we shall suppress the dependence of the criterion function Q and the sample analog Q_T on the data (Y, X) for notational convenience.
1.2 Consistency and Regularity

Definition 1 (Consistency) We say that an estimator θ̂_T is a consistent estimator of parameter θ_0 if

    θ̂_T → θ_0 in probability.
2. Uniform convergence of the analog function. We need the sample analog Q_T(θ) to converge ‘nicely’. In particular, we need the convergence of the sample analog criterion function to the criterion function, (∀θ ∈ Θ) Q_T(θ) → Q(θ), to be uniform.
The first condition ensures that θ_0 can be identified. If there is more than one value of θ that causes Q to have property P, then we cannot be sure that only one value of θ will cause Q_T to assume property P in the sample. In this case we may not be able to determine what θ̂_T estimates.
1.3 An Abstract Example

The intuition underlying the analog principle method and the role of the regularity conditions is illustrated in the simple (non-stochastic) example below.
• Select θ̂_T to maximize Q_T(θ) for each T. Then, if convergence is ‘OK’ (we get that under the regularity conditions), the sample maximizer converges to the maximizer in the population; i.e. we have θ̂_T → θ_0 as T → ∞.
Now suppose Θ = {1, 2, 3} is a finite set, so that θ̂ assumes only one of three values. Then under regularity condition 1 (q.v. §1.2), Q(θ) is maximized at one of these.

Further, by construction we have

    (∀θ ∈ Θ = {1, 2, 3}) Q_T(θ) → Q(θ).

This estimator must work well (i.e. be consistent for θ_0) for ‘big enough’ T, because Q_T → Q for each θ. Why? We loosely sketch a proof of the consistency of θ̂ for θ_0 below.
Say θ_0 = 2, so that Q(θ) is maximized at θ = 2: Q(2) > Q(1) and Q(2) > Q(3).

Now suppose that even for very large T, Q_T(θ) is not maximized at 2 but, say, at 1. Then Q_T(1) ≥ Q_T(2). But Q_T(1) → Q(1) and Q_T(2) → Q(2), so in the limit Q(1) ≥ Q(2), contradicting Q(2) > Q(1). Hence for all T large enough, Q_T must be maximized at θ = 2, and θ̂_T = 2 = θ_0.
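The sketch above can be simulated. In the snippet below the specific values of Q and the form of the sampling noise are illustrative assumptions, not from the notes: we take Θ = {1, 2, 3}, a population criterion Q maximized at θ_0 = 2, and a stylized sample analog Q_T(θ) = Q(θ) + O_p(T^(-1/2)).

```python
import random

# Illustrative simulation (assumed numbers, not from the notes): a finite
# parameter set Theta = {1, 2, 3}, a population criterion Q with a unique
# maximum at theta_0 = 2, and a noisy sample analog of Q.
Q = {1: 0.5, 2: 1.0, 3: 0.3}  # population criterion; unique max at theta = 2

def Q_T(theta, T, rng):
    # sample analog: the noise shrinks as T grows, so Q_T -> Q pointwise
    return Q[theta] + rng.gauss(0, 1) / T ** 0.5

rng = random.Random(0)
estimates = {T: max([1, 2, 3], key=lambda th: Q_T(th, T, rng))
             for T in (10, 1_000, 100_000)}
print(estimates)  # for large T the maximizer of Q_T settles at theta_0 = 2
```

For small T the noise can flip the ranking of Q_T(1) and Q_T(2), but once T is large the gap Q(2) − Q(1) dominates the O_p(T^(-1/2)) noise, exactly as in the contradiction argument above.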
2 Overview of Convergence Concepts for Non-stochastic Functions

Most estimators involve forming functions of the available data. These functions of the data can be viewed as sequences of functions, with the index being the number of data points available.
In this section, we examine certain basic concepts concerning convergence of sequences of functions. The key idea here is that of uniform convergence; we shall broadly motivate why this notion of convergence is required.
2.1 Pointwise Convergence

Let F_n : S → ℝ be a sequence of real-valued functions. For each x ∈ S form the sequence {F_n(x)}_{n=1}^∞. Let B ⊆ S be the set of points for which F_n(x) converges; on B, define the pointwise limit F(x) = lim_{n→∞} F_n(x).
Inheritance requires uniform convergence. Uniform convergence is usually a sufficient condition for the properties of the F_n to be shared by the limit function. Some of the properties we will be interested in are the following:

1. If each F_n is continuous, is the limit function F continuous?
2. Can the order of iterated limits be interchanged?
3. Can limits and integration be interchanged?

The answer to all three questions is: No, not in general. The pointwise convergence of F_n(x) to F(x) is generally not sufficient to guarantee that these properties hold. This is demonstrated by the examples which follow.
Example 1

Consider F_n(x) = xⁿ (0 ≤ x ≤ 1). Here, F_n(x) is continuous for every n, but the limit function is

    F(x) = 0 if 0 ≤ x < 1,
           1 if x = 1.

This is discontinuous at x = 1.
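A quick numeric check of Example 1 makes the failure of uniformity concrete: for any fixed x < 1 the value xⁿ dies out, yet for every n the supremum of |F_n − F| over [0, 1) stays near 1.

```python
# Numeric check of Example 1: F_n(x) = x**n on [0, 1] converges pointwise
# to 0 on [0, 1), yet sup |F_n - F| stays near 1 for every n.
def F_n(n, x):
    return x ** n

# pointwise: for any fixed x < 1, F_n(x) -> 0
assert F_n(1000, 0.5) < 1e-12
# sup over a fine grid of [0, 1): still close to 1 even for n = 1000
sup_dev = max(F_n(1000, k / 10_000) for k in range(10_000))
print(sup_dev)  # near 1, so the convergence is not uniform
```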
Example 2

Consider

    F_n(x) = x / (x + n)   (x ∈ ℝ).

Then the limit function is F(x) = lim_{n→∞} F_n(x) = 0 for each fixed x. This implies that lim_{x→∞} lim_{n→∞} F_n(x) = 0.

But we have lim_{x→∞} F_n(x) = 1 for every fixed n. This implies that

    lim_{n→∞} lim_{x→∞} F_n(x) = 1 ≠ lim_{x→∞} lim_{n→∞} F_n(x) = 0.
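The disagreement of the two iterated limits in Example 2 can be checked numerically by pushing each argument to a large value in turn:

```python
# Numeric check of Example 2: F_n(x) = x / (x + n); the two iterated
# limits disagree, so their order cannot be interchanged.
def F_n(n, x):
    return x / (x + n)

assert F_n(10 ** 9, 5.0) < 1e-8          # n -> infinity first: limit 0
assert abs(F_n(3, 10 ** 12) - 1) < 1e-8  # x -> infinity first: limit 1
print("iterated limits differ: 0 vs 1")
```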
Example 3

Consider F_n(x) = n x (1 − x²)ⁿ (0 ≤ x ≤ 1). Then the limit function is²

    F(x) = lim_{n→∞} F_n(x) = 0

for each fixed x. This implies that ∫₀¹ F(x) dx = 0. But we also get

    lim_{n→∞} ∫₀¹ F_n(x) dx = lim_{n→∞} n / (2n + 2) = 1/2.

² See Rudin, Chapter 3, for concepts relating to convergence of sequences.
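A numeric check of Example 3: a simple Riemann sum reproduces the closed form n/(2n + 2), confirming that the integrals tend to 1/2 even though the pointwise limit integrates to 0.

```python
# Numeric check of Example 3: the limit of the integrals of
# F_n(x) = n*x*(1 - x**2)**n is 1/2, while the integral of the
# pointwise limit function is 0.
def F_n(n, x):
    return n * x * (1 - x ** 2) ** n

def integral(n, pts=200_000):
    # midpoint Riemann sum over [0, 1]
    return sum(F_n(n, (k + 0.5) / pts) for k in range(pts)) / pts

closed_form = 1000 / (2 * 1000 + 2)      # n/(2n+2) at n = 1000
assert abs(integral(1000) - closed_form) < 1e-3
print(closed_form)  # -> 1/2 as n grows
```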
2.2 Uniform Convergence

A sequence of functions {F_n} converges uniformly to F on B if for every ε > 0 there exists an N such that n ≥ N implies |F_n(x) − F(x)| ≤ ε for all x ∈ B; equivalently, sup_{x∈B} |F_n(x) − F(x)| → 0.

Note that the convergence displayed in the examples in §2.1 did not satisfy this notion of ‘uniform’ convergence.³

³ See Rudin, Chapter 7. Theorems 7.12, 7.11, and 7.16 respectively ensure that the three properties listed in §2.1 hold.
3 Applications of the Analog Principle

3.1 Moment Principle

In this case, the criterion function Q is an equation connecting population moments and θ_0:

    Q = Q(population moments of [y, x], θ) = 0 at θ = θ_0.   (1)
When forming the estimator ‘analog’ in the sample, we can consider two cases.

3.1.1 Case A

Here we solve for θ_0 in the population equation, obtaining

    θ_0 = θ_0(population moments of [y, x]).

Given regularity condition 1 (q.v. §1.2), we get θ̂_T → θ_0 in probability. Note that we do not need regularity condition 2.
Example 4 (OLS)

1. The model.

    Y_t = X_t β_0 + U_t,   E(U_t) = 0,   Var(U_t) = σ_u²,   E(X_t′ U_t) = 0,
    E(X_t′ X_t) = Σ_xx positive definite,   E(X_t′ Y_t) = Σ_xy.

2. Criterion function. Since E(X_t′ U_t) = 0, and

    X_t′ Y_t = X_t′ X_t β_0 + X_t′ U_t,

we get

    Q:  Σ_xy = Σ_xx β_0 + 0
    ⇒  β_0 = Σ_xx⁻¹ Σ_xy.
3. Analog in sample.

    β̂_T = (Σ_t X_t′ X_t / T)⁻¹ (Σ_t X_t′ Y_t / T).

Now, the r.h.s. → Σ_xx⁻¹ Σ_xy. Thus β̂_T → β_0 in probability.
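Case A can be sketched in a short Monte Carlo exercise. The design below (two standard-normal regressors, σ_u = 1, a particular β_0) is an illustrative assumption, not from the notes:

```python
import numpy as np

# Monte Carlo sketch of Case A (assumed design, not from the notes):
# simulate Y_t = X_t' beta_0 + U_t and form the sample-moment analog of
# beta_0 = Sigma_xx^{-1} Sigma_xy.
rng = np.random.default_rng(0)
T = 100_000
beta0 = np.array([1.0, -2.0])
X = rng.normal(size=(T, 2))                  # stand-in for the regressors
U = rng.normal(size=T)
Y = X @ beta0 + U

Sxx_hat = X.T @ X / T                        # sample analog of Sigma_xx
Sxy_hat = X.T @ Y / T                        # sample analog of Sigma_xy
beta_hat = np.linalg.solve(Sxx_hat, Sxy_hat)
print(beta_hat)                              # close to beta_0 for large T
```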
3.1.2 Case B

Here we form the criterion function, form the sample analog for the criterion function, and then solve for the estimator.
3. Analog in sample. Here we define a sample analog of U:

    Û = y − x β̂.

This mimics U in the criterion function, so that we can form the sample analog of the criterion function by substituting Û for U in the expression for Q above:

    Q_T = (1/T) Σ_t x_t (Y_t − X_t β̂) = 0.
Remarks.
3.2 Extremum Principle

We saw in §1.3 that under the extremum principle, property P provides that Q(θ) achieves a maximum or minimum at θ = θ_0 in the population.
The OLS estimator can be seen as a special case of the NLS estimator, and can be viewed as an extremum estimator.
Example 1 (OLS as an extremum principle)

1. The model. We assume the true model

    Y_t = X_t β_0 + U_t
    ⇒ Y_t = X_t β + X_t (β_0 − β) + U_t
    ⇒ (Y_t − X_t β)′ (Y_t − X_t β) = (β_0 − β)′ X_t′ X_t (β_0 − β)
                                     + 2 U_t′ X_t (β_0 − β) + U_t′ U_t.

2. Criterion function.

    Q = E[(Y_t − X_t β)′ (Y_t − X_t β)]

From the model assumptions, we have E(X_t′ U_t) = 0. Thus

    Q = (β_0 − β)′ Σ_xx (β_0 − β) + σ_u².
So Q is minimized (with respect to β) at β = β_0. Here, then, Q is an example of the extremum principle: it is minimized when β = β_0 (the true parameter vector).

3. Analog in sample.

    Q_T = (1/T) Σ_{t=1}^T (Y_t − X_t β)′ (Y_t − X_t β).
We can show that this analog satisfies the key requirement of the analog principle:

    plim Q_T = plim (1/T) Σ_{t=1}^T (Y_t − X_t β)′ (Y_t − X_t β)
             = (β_0 − β)′ Σ_xx (β_0 − β) + σ_u² = Q.
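This extremum view can be checked numerically. The scalar design below (β_0 = 0.7, standard-normal X and U) is an illustrative assumption, not from the notes: the sample criterion Q_T(b) is smallest at the OLS estimate, which in turn is close to β_0 for large T.

```python
import random

# Illustrative check (assumed scalar model, not from the notes): the
# sample criterion Q_T(b) = (1/T) * sum (Y_t - X_t b)^2 is minimized at
# the OLS estimate, and the OLS estimate is close to beta_0 for large T.
rng = random.Random(1)
T, beta0 = 50_000, 0.7
X = [rng.gauss(0, 1) for _ in range(T)]
Y = [x * beta0 + rng.gauss(0, 1) for x in X]

def Q_T(b):
    return sum((y - x * b) ** 2 for x, y in zip(X, Y)) / T

b_ols = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
# Q_T is a quadratic in b with minimum exactly at the OLS estimate
assert Q_T(b_ols) < Q_T(b_ols - 0.1) and Q_T(b_ols) < Q_T(b_ols + 0.1)
print(b_ols)  # close to beta_0 = 0.7 for large T
```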
Example 2 (NLS as an extremum principle)

1. The model. Y_t = g(X_t; θ_0) + U_t, with (∀θ) U_t ⊥⊥ g(X_t; θ).
2. Criterion function. Choose the criterion function as

    Q = E[(Y_t − g(X_t; θ))²] = E[(g(X_t; θ_0) − g(X_t; θ))²] + σ_u².

3. Analog in sample.

    Q_T(θ) := (1/T) Σ_{t=1}^T (Y_t − g(X_t; θ))².
4. The estimator. We construct the NLS estimator as

    θ̂_T = argmin_θ Q_T(θ).

Then

    Q_T(θ_0) → Q(θ_0)  and  (∀θ ∈ Θ) Q_T(θ) → Q(θ)  ⇒  θ̂_T → θ_0.
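A minimal NLS sketch: the regression function g(x; θ) = exp(θx) and the design below are illustrative assumptions, not from the notes. We approximate the compact parameter set Θ = [0, 1] by a grid and minimize Q_T over it.

```python
import numpy as np

# Minimal NLS sketch with an assumed regression function (not from the
# notes): g(x; theta) = exp(theta * x). We minimize the sample criterion
# Q_T over a grid approximating the compact parameter set Theta = [0, 1].
rng = np.random.default_rng(2)
T, theta0 = 20_000, 0.5
x = rng.uniform(0.0, 1.0, size=T)
y = np.exp(theta0 * x) + rng.normal(scale=0.1, size=T)

theta_grid = np.linspace(0.0, 1.0, 201)
Q_T = np.array([np.mean((y - np.exp(th * x)) ** 2) for th in theta_grid])
theta_hat = float(theta_grid[np.argmin(Q_T)])
print(theta_hat)  # close to theta_0 = 0.5
```

The grid search is the ‘directly by a grid search’ route mentioned for MLE below; with a smooth g one would instead solve the FOC numerically.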
Remark. The NLS estimator could also be derived as a moment estimator, just as in the OLS example (q.v. §3.1).

1. The model. Same as the non-linear model above. We have U_t ⊥⊥ g(X_t; θ).

2. Criterion function. Q = E[U_t · g(X_t; θ)] = 0. Note that this is only one implication of ⊥⊥. We may now write

    Y_t − g(X_t; θ) = U_t
    ⇒ Q = E[(Y_t − g(X_t; θ)) · g(X_t; θ)] = 0.

3. Analog in sample.

    Q_T := (1/T) Σ_{t=1}^T (Y_t − g(X_t; θ)) · g(X_t; θ).

4. The estimator. Find θ̂ which sets Q_T = 0 (or as close to zero as possible).
4 Maximum Likelihood

The maximum likelihood estimation (MLE) method is an example of the extremum principle. In this section, we look at the ideas underlying MLE and examine the regularity conditions and convergence notions in more detail for this estimator.
4.1 The Model

Suppose that the joint density of the data is

    f(y_t, x_t; θ_0) = f(y_t | x_t; θ_0) · f(x_t).

Assume that x_t is ‘exogenous’, i.e. the density of x_t is uninformative about θ_0. Also assume random sampling. We arrive at the likelihood function

    L = ∏_{t=1}^T f(y_t, x_t; θ_0).
4.2 Criterion Function

In the population, define the criterion function as

    Q = E_{θ_0}[ln f(y_t, x_t; θ)]
      = ∫ ln f(y_t, x_t; θ) · f(y_t, x_t; θ_0) dy_t dx_t.
Claim. The criterion function is maximized at θ = θ_0.

Proof.

    E_{θ_0}[ f(y_t, x_t; θ) / f(y_t, x_t; θ_0) ] = 1

because

    ∫ [f(y_t, x_t; θ) / f(y_t, x_t; θ_0)] · f(y_t, x_t; θ_0) dy_t dx_t = 1.
Applying Jensen’s inequality, concavity of the ln function implies that

    E[ln(x)] ≤ ln(E[x])
    ⇒ E_{θ_0}[ ln( f(y_t, x_t; θ) / f(y_t, x_t; θ_0) ) ] ≤ 0
    ⇒ (∀θ) E_{θ_0}[ln f(y_t, x_t; θ)] ≤ E_{θ_0}[ln f(y_t, x_t; θ_0)].
4.3 Analog in Sample

Construct a sample analog Q_T of the criterion function as

    Q_T := (1/T) Σ_{t=1}^T ln f(y_t, x_t; θ).
Local form of the principle. In the local form, we use the FOC and SOC to arrive at θ̂ ∈ argmax_{θ∈Θ} Q_T(θ). Recall that we have the criterion function

    Q(θ) = ∫ ln f(y; θ) · f(y; θ_0) dy = E_{θ_0}[ln f(y; θ)].
Accordingly, we require for the sample analog:

    (1/T) Σ_{t=1}^T ∂ ln f(y_t; θ) / ∂θ = 0,

    (1/T) Σ_{t=1}^T ∂² ln f(y_t; θ) / ∂θ ∂θ′ negative definite.

Either way (e.g. directly by a grid search or using the FOC/SOCs), we have the same basic idea: for each T, we pick θ̂_T such that Q_T(θ̂_T) ≥ Q_T(θ) for all θ ∈ Θ.
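The local (FOC/SOC) form can be illustrated with an assumed model, not from the notes: y_t ~ N(μ, 1), for which ln f(y_t; μ) = −(y_t − μ)²/2 + const, the sample FOC is (1/T) Σ (y_t − μ) = 0, and the second derivative is −1 < 0, so the SOC holds automatically.

```python
import random

# MLE sketch for an assumed model (not from the notes): y_t ~ N(mu, 1).
# The sample FOC (1/T) * sum (y_t - mu) = 0 is solved by the sample mean,
# and the second derivative of the log-likelihood is -1 < 0 (SOC holds).
rng = random.Random(3)
T, mu0 = 10_000, 2.0
y = [rng.gauss(mu0, 1) for _ in range(T)]

def score(mu):
    # (1/T) * sum of d/dmu ln f(y_t; mu) for the N(mu, 1) density
    return sum(yt - mu for yt in y) / T

mu_hat = sum(y) / T          # solves the FOC score(mu) = 0 exactly
assert abs(score(mu_hat)) < 1e-9
print(mu_hat)                # close to mu_0 = 2 for large T
```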
5 Some Concepts and Definitions for Random (Stochastic) Functions

In §5.1 we define random functions and examine some fundamental properties of such functions. In §5.2 we define convergence concepts for sequences of random functions.
5.1 Random Functions and Some Properties

Definition 2 (Random Function) Let (Ω, A, P) be a probability space and let Θ ⊆ ℝᵏ. A real function φ(θ) = φ(θ, ω) on Θ × Ω is called a random function on Θ if

    (∀t ∈ ℝ¹)(∀θ ∈ Θ)  {ω ∈ Ω : φ(θ, ω) < t} ∈ A.
Proposition. If φ(θ, x) is a continuous real-valued function on Θ × ℝⁿ, where Θ is compact, then
Proposition. If for almost all values of x ∈ X, g(x, θ) is continuous with respect to θ at the point θ_0, and if for all θ in a neighborhood of θ_0 we have

    |g(x, θ)| < G₁(x) < ∞,

then

    lim_{θ→θ_0} ∫ g(x, θ) dF(x) = ∫ g(x, θ_0) dF(x),

i.e.

    lim_{θ→θ_0} E[g(x, θ)] = E[g(x, θ_0)].
Proposition. If for almost all values of x ∈ X and for a fixed value of θ,

    (a) ∂g(x, θ)/∂θ exists (in a neighborhood of θ), and

    (b) | (g(x, θ + h) − g(x, θ)) / h | < G₂(x),

then ∂/∂θ E[g(x, θ)] = E[∂g(x, θ)/∂θ].
5.2 Convergence Concepts for Random Functions

In Part I (asymptotic theory) we defined convergence concepts for random variables. Here we define analogous concepts for random functions.

Pointwise (in θ) almost sure convergence of F_n to F requires that, for every fixed θ, the set S_θ ⊆ Ω on which |F_n(θ, ω) − F(θ, ω)| ≥ ε for some n ≥ n_0(θ, ε) has no probability.
The union S = ∪_{θ∈Θ} S_θ may have a non-negligible probability even though each individual set S_θ has negligible probability. We avoid this by the following definition.
Definition 5 (Convergence in Probability) Let F_n(θ) and F(θ) be random functions on Θ. Then F_n(θ) → F(θ) in probability uniformly in θ on Θ if

    lim_{n→∞} P{ sup_{θ∈Θ} |F_n(θ) − F(θ)| > ε } = 0.
Theorem 1 (Strong Uniform Law of Large Numbers) Let {x_n} be a sequence of i.i.d. random k × 1 vectors. Let F(x, θ) be a continuous real function on ℝᵏ × Θ, where Θ is compact (closed and bounded, so every open cover has a finite subcover). Then

    (1/T) Σ_{j=1}^T F(x_j, θ) → ∫ F(x, θ) dG(x)   a.s. uniformly in θ.
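The uniform LLN can be illustrated numerically with assumed ingredients, not from the notes: F(x, θ) = cos(θx) with x ~ N(0, 1), for which the population limit is E[cos(θx)] = exp(−θ²/2), the real part of the normal characteristic function.

```python
import numpy as np

# Numeric illustration of the uniform LLN under assumed ingredients (not
# from the notes): F(x, theta) = cos(theta * x) with x ~ N(0, 1), so the
# population limit is E[cos(theta * x)] = exp(-theta**2 / 2).
rng = np.random.default_rng(4)
theta_grid = np.linspace(0.0, 2.0, 41)       # compact Theta = [0, 2]

def sup_dev(T):
    # sup over the theta grid of |sample mean - population mean|
    x = rng.normal(size=T)
    emp = np.array([np.cos(th * x).mean() for th in theta_grid])
    pop = np.exp(-theta_grid ** 2 / 2)
    return float(np.max(np.abs(emp - pop)))

print(sup_dev(100), sup_dev(100_000))  # the sup deviation shrinks as T grows
```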
This is a type of LLN, and can be modified for the non-i.i.d. case. In that case, each x_j has its own distribution F_j, and we would require that

    (a) (1/T) Σ_{j=1}^T F_j → G in distribution, and

    (b) (1/T) Σ_{j=1}^T sup E(|ψ(x_j)|^{1+δ}) < ∞ for some δ > 0.

Then

    (1/T) Σ_{j=1}^T F(x_j, θ) → ∫ F(x, θ) dG(x)   a.s. uniformly in θ.
References

1. Amemiya, Advanced Econometrics, 1985, Chapter 3.

6. Stokey, Lucas, and Prescott, Recursive Methods in Economic Dynamics, 1989, Chapter 3.