
2.160 System Identification, Estimation, and Learning


Lecture Notes No. 18
April 26, 2006

13 Asymptotic Distribution of Parameter Estimates

13.1 Overview

If convergence is guaranteed, then θˆN → θ * .


But, how quickly does the estimate θˆN approach the limit θ*? How many data points are needed? → Asymptotic Variance Analysis

[Figure: distribution of θˆN around θ* versus iteration/data number. The variance is large for small N and small for large N; how quickly does the variance reduce?]

The main points to be obtained in this chapter: the variance analysis will reveal that


a) the estimate converges to θ* at a rate proportional to $1/\sqrt{N}$;
b) the distribution converges to a Gaussian distribution, N(0, Q);
c) Cov θˆN depends on the parameter sensitivity of the predictor, $\partial \hat{y} / \partial \theta$.

The identified model parameter θˆN together with cov θˆN thus carries a "quality tag": a confidence interval.

13.2 Central Limit Theorems.

The mathematical tool needed for asymptotic variance analysis is the central limit theorem. The following is a quick review of the theory.

Consider two independent random variables, X and Y, with PDFs $f_X(x)$ and $f_Y(y)$. Define another random variable Z as the sum of X and Y:

$$ Z = X + Y $$

Let us obtain the PDF of Z. Integrating the joint density over the strip $z \le x + y \le z + \Delta z$ of the x-y plane,

$$ \mathrm{Prob}(z \le Z \le z + \Delta z) = \iint_{\Delta XY} f_X(x)\, f_Y(y)\, dx\, dy = \left[ \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx \right] \Delta z = f_Z(z)\, \Delta z $$

That is, $f_Z$ is the convolution of $f_X$ and $f_Y$.

Example

$f_X(x)$ and $f_Y(y)$ have the same uniform distribution: height 1/2 on the interval [-1, 1]. Combining (convolving) the two distributions, we obtain the distribution of Z = X + Y: a triangular PDF of height 1/2 supported on [-2, 2].

[Figure: the two uniform PDFs on [-1, 1] and the triangular PDF $f_{Z = X + Y}(z)$ on [-2, 2].]

Further, consider W = X + Y + V, where $f_V(v)$ has the same rectangular PDF as X and Y. The resulting PDF $f_W(w)$, supported on [-3, 3], is getting close to a Gaussian distribution.

[Figure: the bell-shaped PDF $f_W(w)$ on [-3, 3].]
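A quick numerical illustration of this tendency (a minimal Python sketch, not part of the original notes): histogram sums of independent uniform random variables and watch the shape approach a bell curve as more terms are added.

```python
import numpy as np
import matplotlib.pyplot as plt

# Minimal sketch (assumed illustration, not from the notes): the empirical density of a
# sum of independent uniform random variables on [-1, 1] approaches a Gaussian shape.
rng = np.random.default_rng(0)
n_samples = 200_000

for n_terms in (1, 2, 3, 10):
    s = rng.uniform(-1.0, 1.0, size=(n_samples, n_terms)).sum(axis=1)
    plt.hist(s, bins=100, density=True, histtype="step", label=f"sum of {n_terms} terms")

plt.legend()
plt.xlabel("value of the sum")
plt.ylabel("empirical density")
plt.show()
```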

In general, the PDF of the random variable $\sum_{i=1}^{N} X_i$ approaches a Gaussian distribution, regardless of the PDF of each $X_i$, as N gets larger. More rigorously, the following central limit theorem has been proven.

A Central Limit Theorem of Independent Random Variables

Let $X_t$, $t = 0, 1, \ldots$, be a d-dimensional random variable with

Mean: $m = E(X_t)$
Covariance: $Q = E\left[ (X_t - m)(X_t - m)^T \right]$ (1)

Consider the normalized sum of $X_t - m$ given by

$$ Y_N = \frac{1}{\sqrt{N}} \sum_{t=1}^{N} (X_t - m) $$ (2)

Then, as N tends to infinity, the distribution of $Y_N$ converges to the Gaussian distribution given by the PDF

$$ f_Y(y) = \frac{1}{(2\pi)^{d/2} \sqrt{\det Q}} \exp\left\{ -\frac{1}{2} y^T Q^{-1} y \right\} $$ (3)

where $y = \lim_{N \to \infty} \frac{1}{\sqrt{N}} \sum_{t=1}^{N} (X_t - m)$.

13.3 Distribution of Estimate θˆN

Applying the central limit theorem, we can obtain the distribution of the estimate θˆN as N tends to infinity. Let θˆN be an estimate based on the prediction error method (PEM):

$$ \hat{\theta}_N = \arg\min_{\theta \in D_M} V_N(\theta, Z^N) $$ (4)

$$ V_N(\theta, Z^N) = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{2} \varepsilon^2(t, \theta) $$ (5)

For simplicity, we first assume that the predictor $\hat{y}(t \mid \theta)$ is given by a linear regression:

$$ \hat{y}(t \mid \theta) = \varphi^T(t)\, \theta $$ (6)

and that the parameter vector of the true system, θ0, is contained in the model set: θ0 ∈ DM.

The actual data are generated by

$$ y(t) = \varphi^T(t)\, \theta_0 + e_0(t) $$ (7)

where

$$ E[e_0(t)\, e_0(s)] = \begin{cases} \lambda_0 & t = s \\ 0 & t \ne s \end{cases} $$

Since θˆN minimizes $V_N(\theta, Z^N)$,

$$ V_N'(\hat{\theta}_N, Z^N) = \frac{d}{d\theta} V_N(\theta, Z^N) \Big|_{\theta = \hat{\theta}_N} = 0, \qquad V_N' \in \mathbb{R}^{d \times 1} $$ (8)

Using the mean value theorem, $V_N'$ can be expressed as

$$ V_N'(\hat{\theta}_N, Z^N) = V_N'(\theta_0, Z^N) + V_N''(\xi_N, Z^N)\, (\hat{\theta}_N - \theta_0) $$ (9)

where $\xi_N$ is a parameter vector somewhere between $\theta_0$ and $\hat{\theta}_N$.

Assuming that $V_N''(\xi_N, Z^N) = \frac{d}{d\theta} V_N'$ is non-singular and using (8) in (9),

$$ \hat{\theta}_N - \theta_0 = -\left[ V_N''(\xi_N, Z^N) \right]^{-1} V_N'(\theta_0, Z^N) $$ (10)

To obtain the distribution of θˆN − θ 0 , let us first examine V ' N (θ0 , Z N ) as N tends to
infinity.

$$ V_N'(\theta_0, Z^N) = \frac{1}{N} \sum_{t=1}^{N} \varepsilon(t, \theta_0)\, \frac{d\varepsilon}{d\theta} \Big|_{\theta = \theta_0} $$ (11)

Recall $\varepsilon(t, \theta) = y(t) - \hat{y}(t \mid \theta)$ and (6):

$$ \frac{d\varepsilon}{d\theta} \Big|_{\theta_0} = -\frac{d}{d\theta} \hat{y}(t \mid \theta) \Big|_{\theta_0} = -\varphi^T(t), $$ (12)

and

$$ \varepsilon(t, \theta_0) = \varphi^T(t)\, \theta_0 + e_0(t) - \varphi^T(t)\, \theta_0 = e_0(t) $$

Therefore, (11) reduces to

$$ -V_N'(\theta_0, Z^N) = \frac{1}{N} \sum_{t=1}^{N} \varphi(t)\, e_0(t) $$ (13)

Let us treat $\varphi(t)\, e_0(t) \equiv X_t$ as a random variable. Its mean is zero, since

$$ m = E[\varphi(t)\, e_0(t)] = E[\varphi(t)]\, E[e_0(t)] = 0 $$ (14)

The covariance is

$$ \mathrm{cov}(X_t X_s^T) = E\left[ (X_t - m)(X_s - m)^T \right] = E\left[ \varphi(t)\, e_0(t)\, e_0(s)\, \varphi^T(s) \right] = 0 \quad \text{for } t \ne s $$ (15)

$$ \mathrm{cov}(X_t X_t^T) = E\left[ e_0^2(t) \right] E\left[ \varphi(t)\, \varphi^T(t) \right] = \lambda_0 R $$ (16)

Note that X1, X2, …, XN are independent, since the e0(t) are independent.

Consider

$$ Y_N = \frac{1}{\sqrt{N}} \sum_{t=1}^{N} (X_t - m) = \frac{1}{\sqrt{N}} \sum_{t=1}^{N} \varphi(t)\, e_0(t) $$

and apply the central limit theorem. The distribution of $Y_N$, i.e. $-\sqrt{N}\, V_N'(\theta_0, Z^N)$, converges to a Gaussian distribution as N tends to infinity:

$$ Y_N = -\sqrt{N}\, V_N'(\theta_0, Z^N) \sim N(0, \lambda_0 R) $$ (17)

Next, compute $V_N''(\xi_N, Z^N)$:

$$ V_N''(\xi_N, Z^N) = \frac{d}{d\theta} V_N'(\theta, Z^N) \Big|_{\theta = \xi_N} = \frac{d}{d\theta} \frac{1}{N} \sum_{t=1}^{N} \varepsilon(t, \theta)\, \frac{d\varepsilon}{d\theta} \Big|_{\theta = \xi_N} = \frac{1}{N} \sum_{t=1}^{N} \left\{ \frac{d\varepsilon}{d\theta} \left( \frac{d\varepsilon}{d\theta} \right)^T + \varepsilon(t, \theta)\, \frac{d^2 \varepsilon}{d\theta^2} \right\} \Big|_{\theta = \xi_N} = \frac{1}{N} \sum_{t=1}^{N} \varphi(t)\, \varphi^T(t) $$ (18)

(For the linear regression, $d^2\varepsilon / d\theta^2 = 0$.) Therefore, under the ergodicity assumption,

$$ V_N''(\xi_N, Z^N) = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} \varphi(t)\, \varphi^T(t) = R $$ (19)

From (10), (17), and (19), the distribution of $\sqrt{N}\, (\hat{\theta}_N - \theta_0)$ converges to the Gaussian distribution given by

$$ \sqrt{N}\, (\hat{\theta}_N - \theta_0) \sim N(0, Q) \quad \text{as } N \to \infty $$ (20)

where

$$ Q = R^{-1} (\lambda_0 R)\, R^{-1} = \lambda_0 R^{-1} $$ (21)

Note that when a coordinate transformation y = Ax is performed, the covariance matrix C associated with a multivariate Gaussian distribution is transformed to $A C A^T$. This is used in (21), with $A = R^{-1}$ and $C = \lambda_0 R$.
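As a sanity check on (20) and (21), here is a minimal Monte Carlo sketch in Python (an assumed illustration, not part of the original notes): it simulates the linear regression (6), (7) with white Gaussian regressors, so that R = I, and compares the sample covariance of √N(θˆN − θ0) with Q = λ0 R⁻¹.

```python
import numpy as np

# Minimal Monte Carlo sketch (assumed setup, not from the notes): for
# y(t) = phi(t)^T theta0 + e0(t), check that sqrt(N)*(theta_hat_N - theta0)
# has covariance close to Q = lam0 * R^{-1} from Eq. (21).
rng = np.random.default_rng(0)
theta0 = np.array([1.0, -0.5])
lam0, N, n_runs = 0.2, 2000, 500

scaled_errors = []
for _ in range(n_runs):
    Phi = rng.normal(size=(N, 2))                       # regressors phi(t); here R = E[phi phi^T] = I
    y = Phi @ theta0 + rng.normal(scale=np.sqrt(lam0), size=N)
    theta_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]  # PEM reduces to least squares here
    scaled_errors.append(np.sqrt(N) * (theta_hat - theta0))

scaled_errors = np.array(scaled_errors)
print(np.cov(scaled_errors.T))   # empirical covariance of sqrt(N)*(theta_hat - theta0)
print(lam0 * np.eye(2))          # theoretical Q = lam0 * R^{-1}
```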

[Figure: the distribution of $\sqrt{N}\,(\hat{\theta}_N - \theta_0)$ approaches N(0, Q) for large N; correspondingly, the distribution of $(\hat{\theta}_N - \theta_0)$ itself becomes narrower as N grows.]

Remarks

1) Eq. (20) shows that the standard deviation of θˆN − θ0 decreases at the rate of $1/\sqrt{N}$ for large N. See the figure above. Note that $\mathrm{cov}\, \hat{\theta}_N = \frac{1}{N} Q$.

2) The above result is for a very restrictive case. A similar result can be obtained for general cases under mild assumptions.
   • The true system (7) does not have to be assumed. Instead, $\theta^* = \arg\min V(\theta)$ must be contained in DM.
   • The linear regression (6) can be extended to a general predictor whose model parameter θ is determined by the prediction error method (4), (5).

The extended result for the distribution of the estimate is summarized in the following theorem, i.e. Theorem 9.1 of Ljung's textbook.

Theorem 1  Consider the estimate θˆN determined by (4) and (5). Assume that the model structure is linear and uniformly stable and that the data set $Z^\infty$ satisfies the quasi-stationarity and ergodicity requirements. Assume also that θˆN converges with probability 1 to a unique parameter vector θ* contained in DM:

$$ \hat{\theta}_N \to \theta^* \in D_M \quad \text{w.p. } 1 \text{ as } N \to \infty $$ (22)

and that

$$ V_N''(\theta^*) > 0 \quad \text{(positive definite)} $$ (23)

and that the sample average

$$ V_N'(\theta^*) = \frac{1}{N} \sum_{t=1}^{N} \left( \frac{d}{d\theta} \hat{y}(t \mid \theta) \right) \varepsilon(t, \theta) \Big|_{\theta = \theta^*} $$ (24)

converges with probability 1, as N tends to infinity, to the ensemble mean

$$ m_t = E\left[ \left( \frac{d}{d\theta} \hat{y}(t \mid \theta) \right) \varepsilon(t, \theta) \Big|_{\theta = \theta^*} \right] $$ (25)

Then, the distribution of $\sqrt{N}\, (\hat{\theta}_N - \theta^*)$ converges to the Gaussian distribution given by

$$ \sqrt{N}\, (\hat{\theta}_N - \theta^*) \sim N(0, P_\theta) $$ (26)

where $P_\theta$ is given by

$$ P_\theta = \left[ V_N''(\theta^*) \right]^{-1} Q \left[ V_N''(\theta^*) \right]^{-1} $$ (27)

$$ Q = \lim_{N \to \infty} N \cdot E\left[ V_N'(\theta^*) \left( V_N'(\theta^*) \right)^T \right] $$ (28)
The proof is quite complicated, since the random variables $\left( \frac{d}{d\theta} \hat{y}(t \mid \theta) \right) \varepsilon(t, \theta) \big|_{\theta^*}$ are not independent. Therefore, the standard central limit theorem is not applicable.

Appendix 9A, on p. 309 of Ljung's textbook, outlines the proof. Since the model structure is assumed to be stable uniformly in θ, $X_t$ and $X_s$ become independent as t and s grow far apart. Because of this property, the sum $\frac{1}{\sqrt{N}} \sum_{t=1}^{N} (X_t - m_t)$ still converges to a Gaussian distribution.

13.4 Expression for the Asymptotic Variance.

As stated formally in Theorem 1, the distribution of $\sqrt{N}\, (\hat{\theta}_N - \theta^*)$ converges to a Gaussian distribution for a broad class of system identification problems. This implies that the covariance of θˆN asymptotically behaves as

$$ \mathrm{Cov}\, \hat{\theta}_N \sim \frac{1}{N} P_\theta $$ (29)
N

This is called the asymptotic covariance matrix.

The asymptotic variance depends not only on

(a) the number of samples (data set size) N, but also on

(b) the parameter sensitivity of the predictor,

$$ \psi(t, \theta^*) = \frac{d}{d\theta} \hat{y}(t \mid \theta) \Big|_{\theta^*} = -\frac{d}{d\theta} \varepsilon(t, \theta) \Big|_{\theta^*}, \ \text{and} $$ (30)

(c) the noise variance λ0.

Let us compute the covariance once again for the general case. From (5) and (30),


( )
N N

∑ ε (t ,θ )ψ (t ,θ )
d 1 1

VN θ , Z N =
N

t =1
ε (t ,θ )

=−
N t =1
(31)

Unlike the linear regression, the sensitivity ψ (t , θ ) is a function of θ ,

⎛ dε dψ ⎞
d2

(
V θ,Z N = −
2 N
1
N
) ∑ ⎜⎝ dθ ψ + ε ⎟
dθ ⎠
(32)
1 N
⎛ d2 ⎞
= ∑ ⎜⎜ψ (t , θ )ψ T (t , θ ) − ε (t , θ ) 2 yˆ (t θ ) ⎟⎟
t =1 ⎝ dθ
N ⎠

When the true system is contained in the model structure, θ0 ∈ DM, and θ0 is unique,

$$ \varepsilon(t, \theta_0) = e_0(t) $$ (33)

From (28), (31), and (33),

$$ Q = \lim_{N \to \infty} \frac{N}{N^2} \sum_{t=1}^{N} \sum_{s=1}^{N} E\left[ e_0(t)\, \psi(t, \theta_0)\, \psi^T(s, \theta_0)\, e_0(s) \right] = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} \lambda_0\, E\left[ \psi(t, \theta_0)\, \psi^T(t, \theta_0) \right] = \lambda_0\, E\left[ \psi(t, \theta_0)\, \psi^T(t, \theta_0) \right] $$ (34)

Also, from (32),

$$ V''(\theta_0) = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} E\left[ \psi(t, \theta_0)\, \psi^T(t, \theta_0) - \varepsilon(t, \theta_0)\, \frac{d^2}{d\theta^2} \hat{y}(t \mid \theta) \Big|_{\theta_0} \right] = E\left[ \psi(t, \theta_0)\, \psi^T(t, \theta_0) \right] - E\left[ e_0(t)\, \frac{d^2}{d\theta^2} \hat{y}(t \mid \theta) \Big|_{\theta_0} \right] $$ (35)
Note that $\frac{d^2}{d\theta^2} \hat{y}$ depends on $Z^{t-1}$, not on $Z^t$. Since $e_0(t)$ and $\frac{d^2}{d\theta^2} \hat{y}$ are therefore independent, the second term vanishes. Substituting (34) and (35) into (29),

$$ \mathrm{Cov}\, \hat{\theta}_N \sim \frac{1}{N} P_\theta = \frac{\lambda_0}{N} \left[ E\left( \psi(t, \theta_0)\, \psi^T(t, \theta_0) \right) \right]^{-1} $$ (36)

The asymptotic variance is therefore a) inversely proportional to the number of samples, b) proportional to the noise variance, and c) inversely related to the parameter sensitivity. The more a parameter affects the prediction, the smaller the variance becomes.
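As a concrete illustration (a worked example added here, not in the original notes), consider the scalar predictor ŷ(t|θ) = θ u(t−1) with a white input of variance σu² and noise variance λ0. Then ψ(t, θ0) = u(t−1), and (36) gives

$$ \mathrm{Cov}\, \hat{\theta}_N \sim \frac{\lambda_0}{N} \left[ E\, u^2(t-1) \right]^{-1} = \frac{\lambda_0}{N \sigma_u^2}, $$

so doubling the data length or the input power halves the variance, while doubling the noise variance doubles it.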

Since θ0 is not known, the asymptotic variance cannot be determined. In practice, however, an empirical estimate, like the following formula, works well for large N:
$$ \hat{P}_N = \hat{\lambda}_N \left[ \frac{1}{N} \sum_{t=1}^{N} \psi(t, \hat{\theta}_N)\, \psi^T(t, \hat{\theta}_N) \right]^{-1} $$ (37)

$$ \hat{\lambda}_N = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^2(t, \hat{\theta}_N) $$ (38)


If one computes $\hat{P}_N$ during an experiment, it indicates how many data samples are needed to assure the desired model accuracy.
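In the linear-regression case the sensitivity ψ(t, θˆN) is simply the regressor φ(t), so (37) and (38) can be evaluated directly from the data. The Python sketch below (an assumed illustration, not part of the original notes; the data and dimensions are made up) returns θˆN, λ̂N, and the estimated covariance P̂N/N.

```python
import numpy as np

# Minimal sketch (assumed linear-regression setup, not from the notes) of Eqs. (37)-(38):
# the sensitivity psi(t, theta_hat) equals the regressor phi(t), so P_N is computable
# directly from the residuals and the regressors.
def empirical_asymptotic_cov(Phi, y):
    """Return theta_hat, lam_hat (Eq. 38), and the estimated covariance P_N / N (Eqs. 37, 29)."""
    N = len(y)
    theta_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]
    eps = y - Phi @ theta_hat                    # prediction errors eps(t, theta_hat)
    lam_hat = np.mean(eps ** 2)                  # Eq. (38)
    R_hat = (Phi.T @ Phi) / N                    # (1/N) * sum of psi psi^T
    P_N = lam_hat * np.linalg.inv(R_hat)         # Eq. (37)
    return theta_hat, lam_hat, P_N / N           # Cov(theta_hat) ~ P_N / N, Eq. (29)

# Usage with synthetic data:
rng = np.random.default_rng(1)
Phi = rng.normal(size=(5000, 2))
y = Phi @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=5000)
theta_hat, lam_hat, cov_hat = empirical_asymptotic_cov(Phi, y)
print(theta_hat, lam_hat)
print(np.sqrt(np.diag(cov_hat)))  # standard errors, e.g. for +/- 1.96 sigma confidence intervals
```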

13.5 Frequency-Domain Expressions for the Asymptotic Variance.

The asymptotic variance has a different expression in the frequency domain, which we will find useful for variance analysis and experiment design.

Let the transfer function G(q, θ) and the noise model H(q, θ) be consolidated into a 1×2 matrix:

$$ T(q, \theta) = [\, G(q, \theta),\ H(q, \theta) \,] $$ (39)

The gradient of T, that is, the sensitivity of T to θ, is

$$ T'(q, \theta) = \frac{d}{d\theta} T(q, \theta) = [\, G'(q, \theta),\ H'(q, \theta) \,] $$ (40)

For the predictor, we have already defined W(q, θ) and Z(t) such that

$$ \hat{y}(t \mid \theta) = W_u(q)\, u(t) + W_y(q)\, y(t) = \left[\, W_u \ \ W_y \,\right] \begin{bmatrix} u \\ y \end{bmatrix} = W Z(t) $$

Therefore the predictor sensitivity ψ(t, θ) is given by

$$ \psi(t, \theta) = \frac{d}{d\theta} \hat{y}(t \mid \theta) = \left[\, W_u' \ \ W_y' \,\right] Z(t) $$ (41)

$W_u'$ and $W_y'$ are computed as

$$ W_u': \quad \frac{d}{d\theta} W_u(z, \theta) = \frac{d}{d\theta} \left[ H^{-1}(z, \theta)\, G(z, \theta) \right] = \frac{H G' - H' G}{H^2(z, \theta)} $$ (42)

$$ W_y': \quad \frac{d}{d\theta} W_y(z, \theta) = \frac{d}{d\theta} \left[ 1 - H^{-1}(z, \theta) \right] = \frac{H'(z, \theta)}{H^2(z, \theta)} $$

Substituting these back into ψ(t, θ),

$$ \psi(t, \theta) = \frac{1}{H^2(q, \theta)} \left[\, H G' - H' G,\ \ H' \,\right] Z(t) = \frac{1}{H^2(q, \theta)} \left[\, G',\ H' \,\right] \begin{bmatrix} H & 0 \\ -G & 1 \end{bmatrix} \begin{bmatrix} u(t) \\ y(t) \end{bmatrix} = \frac{1}{H(q, \theta)}\, T'(q, \theta) \begin{bmatrix} u(t) \\ -H^{-1} G u + H^{-1} y \end{bmatrix} $$ (43)

At θ = θ0 (the true system), note that ε(t, θ0) = e0(t) and

$$ -H^{-1}(q, \theta_0)\, G(q, \theta_0)\, u(t) + H^{-1}(q, \theta_0)\, y(t) = e_0(t) $$

$$ \therefore \quad \psi(t, \theta_0) = H^{-1}(q, \theta_0)\, T'(q, \theta_0)\, x_0(t) $$ (44)

where $x_0(t) = [\, u(t) \ \ e_0(t) \,]^T$.

Let $\Phi_{x_0}(\omega)$ be the spectrum matrix of $x_0(t)$:

$$ \Phi_{x_0}(\omega) = \begin{bmatrix} \Phi_u(\omega) & \Phi_{u e_0}(\omega) \\ \Phi_{u e_0}(-\omega) & \Phi_{e_0}(\omega) \end{bmatrix}, \qquad \Phi_{e_0}(\omega) = \lambda_0, \quad \Phi_{u e_0}(\omega) = 0 \ \text{for open loop} $$ (45)

Using the familiar formula $R_s(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_s(\omega)\, d\omega$,

$$ E\left[ \psi(t, \theta_0)\, \psi^T(t, \theta_0) \right] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left| H(e^{i\omega}, \theta_0) \right|^{-2} T'(e^{i\omega}, \theta_0)\, \Phi_{x_0}(\omega)\, T'^{\,T}(e^{-i\omega}, \theta_0)\, d\omega $$ (46)

For the noise spectrum,

$$ \Phi_v(\omega) = \lambda_0 \left| H(e^{i\omega}, \theta_0) \right|^2 $$ (47)

Using these in (36),

$$ \mathrm{Cov}\, \hat{\theta}_N \sim \frac{1}{N} \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{\Phi_v(\omega)}\, T'(e^{i\omega}, \theta_0)\, \Phi_{x_0}(\omega)\, T'^{\,T}(e^{-i\omega}, \theta_0)\, d\omega \right]^{-1} $$

This is the asymptotic variance in the frequency domain.
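To make this expression concrete, the following Python sketch (an assumed example, not part of the original notes) evaluates the frequency-domain integral numerically for an FIR model G(q, θ) = θ1 q⁻¹ + θ2 q⁻² with H(q, θ) = 1, driven in open loop by an AR(1) input, and checks that it matches the time-domain formula (36).

```python
import numpy as np

# Minimal sketch (assumed example, not from the notes): frequency-domain evaluation of
# Cov(theta_hat_N) for G(q,theta) = th1*q^-1 + th2*q^-2, H = 1, with AR(1) input
# u(t) = a*u(t-1) + w(t), Var(w) = sigma_w2, in open loop.
a, sigma_w2, lam0, N = 0.5, 1.0, 0.1, 1000

w = np.linspace(-np.pi, np.pi, 20_000, endpoint=False)      # frequency grid
Phi_u = sigma_w2 / np.abs(1.0 - a * np.exp(-1j * w)) ** 2   # input spectrum
Phi_v = lam0 * np.ones_like(w)                              # noise spectrum, |H|^2 = 1

# T'(e^{iw}) has rows [dG/dth1, dH/dth1] = [e^{-iw}, 0] and [dG/dth2, dH/dth2] = [e^{-2iw}, 0];
# only the G-part contributes since H' = 0 and Phi_{u e0} = 0 (open loop).
E = np.stack([np.exp(-1j * w), np.exp(-2j * w)])            # shape (2, n_freq)

# Integrand of the bracketed term: (1/Phi_v) * T' * Phi_x0 * T'^H, reduced to the u-part.
integrand = (Phi_u / Phi_v) * E[:, None, :] * np.conj(E)[None, :, :]   # shape (2, 2, n_freq)
bracket = integrand.mean(axis=-1).real                      # (1/2pi) * integral over [-pi, pi)
cov_freq = np.linalg.inv(bracket) / N

# Time-domain counterpart, Eq. (36): psi(t) = [u(t-1), u(t-2)]^T, so E[psi psi^T] is the
# input autocovariance matrix with r_u(0) = sigma_w2/(1-a^2) and r_u(1) = a*r_u(0).
r0 = sigma_w2 / (1.0 - a ** 2)
R = np.array([[r0, a * r0], [a * r0, r0]])
cov_time = (lam0 / N) * np.linalg.inv(R)

print(cov_freq)
print(cov_time)   # the two matrices should agree closely
```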

