Lecture 10
Linear Quadratic Stochastic Control with Partial State Observation
Linear stochastic system
xt+1 = Axt + But + wt, t = 0, . . . , N − 1
yt = Cxt + vt, t = 0, . . . , N
• x0 ∼ N (0, X), wt ∼ N (0, W ), vt ∼ N (0, V ), all independent
• ut = φt(Yt), t = 0, . . . , N − 1
– Yt = (y0, . . . , yt) is the output history at time t
– φt : Rp(t+1) → Rm is the control policy at time t
• closed-loop system is
xt+1 = Axt + Bφt(Yt) + wt, yt = Cxt + vt
• objective:
J = E( Σ_{t=0}^{N−1} (xtT Qxt + utT Rut) + xNT QxN )
with Q ≥ 0, R > 0
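As a rough illustration (not part of the lecture), a minimal NumPy sketch of simulating the closed-loop system under an output-feedback policy and estimating J by Monte Carlo; the matrices A, B, C and the policy φ below are placeholder assumptions, not the lecture's:

```python
# Sketch: Monte Carlo estimate of J under a fixed output-feedback policy.
# A, B, C and phi are illustrative assumptions (not from the lecture).
import numpy as np

rng = np.random.default_rng(0)
n, m, p, N = 2, 1, 1, 50
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
X, W, V = np.eye(n), 0.5 * np.eye(n), 0.5 * np.eye(p)
Q, R = np.eye(n), np.eye(m)

def phi(Y):
    """Placeholder policy u_t = phi_t(Y_t): static gain on latest output."""
    return -0.5 * Y[-1]

def rollout():
    x = rng.multivariate_normal(np.zeros(n), X)       # x_0 ~ N(0, X)
    Y, J = [], 0.0
    for t in range(N):
        y = C @ x + rng.multivariate_normal(np.zeros(p), V)
        Y.append(y)
        u = phi(Y)                                    # u_t = phi_t(Y_t)
        J += x @ Q @ x + u @ R @ u                    # stage cost
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
    return J + x @ Q @ x                              # terminal cost

print(np.mean([rollout() for _ in range(2000)]))      # estimate of J
```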
• define
– x̂t = E(xt|Yt) (current state estimate)
– Σt = E(xt − x̂t)(xt − x̂t)T (current state estimate covariance)
– Σt+1|t = AΣtAT + W (next state estimate covariance)
• using
x̂t+1 = Ax̂t + But + Lt+1et+1,
with innovation et+1 = yt+1 − C(Ax̂t + But) ∼ N (0, CΣt+1|tC T + V ), independent of Yt, and Kalman gain Lt+1 = Σt+1|tC T (CΣt+1|tC T + V )−1, we get
J⋆ = Σ_{t=0}^{N} Tr(QΣt) + Σ_{t=0}^{N} Tr(Pt(Σt|t−1 − Σt))
using Σ0|−1 = X (here Pt is the Riccati recursion from stochastic LQR, with PN = Q)
which simplifies to
J⋆ = Jlqr + Jest
where
Jlqr = Tr(P0X) + Σ_{t=1}^{N} Tr(PtW ),
Jest = Tr((Q − P0)Σ0) + Σ_{t=1}^{N} (Tr((Q − Pt)Σt) + Tr(PtAΣt−1AT ))
– Jlqr is the stochastic LQR cost, i.e., the optimal objective if you knew the state
– Jest is the cost of not knowing (i.e., having to estimate) the state
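To make the recursions concrete, here is a hedged NumPy sketch that runs the backward Riccati recursion for Pt, the forward covariance recursions for Σt|t−1 and Σt, evaluates J⋆ by the trace formula above, and checks J⋆ = Jlqr + Jest numerically. A, B, C are illustrative placeholders (this excerpt does not give them); Q, R, X, W, V match the example below.

```python
# Sketch: evaluate J*, J_lqr, J_est via the trace formulas and verify
# J* = J_lqr + J_est.  A, B, C are illustrative assumptions.
import numpy as np

n, m, p, N = 2, 1, 1, 50
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
X, W, V = np.eye(n), 0.5 * np.eye(n), 0.5 * np.eye(p)
Q, R = np.eye(n), np.eye(m)

# backward Riccati recursion: P_N = Q, then P_t from P_{t+1}
P = [None] * (N + 1)
P[N] = Q
for t in range(N - 1, -1, -1):
    Pn = P[t + 1]
    K = np.linalg.solve(R + B.T @ Pn @ B, B.T @ Pn @ A)
    P[t] = Q + A.T @ Pn @ A - A.T @ Pn @ B @ K

# forward covariances: Sig_pred[t] = Sigma_{t|t-1}, Sig[t] = Sigma_t
Sig_pred, Sig = [None] * (N + 1), [None] * (N + 1)
Sig_pred[0] = X                                 # Sigma_{0|-1} = X
for t in range(N + 1):
    S = Sig_pred[t]
    Sig[t] = S - S @ C.T @ np.linalg.solve(C @ S @ C.T + V, C @ S)
    if t < N:
        Sig_pred[t + 1] = A @ Sig[t] @ A.T + W  # Sigma_{t+1|t}

tr = np.trace
J_star = (sum(tr(Q @ Sig[t]) for t in range(N + 1))
          + sum(tr(P[t] @ (Sig_pred[t] - Sig[t])) for t in range(N + 1)))
J_lqr = tr(P[0] @ X) + sum(tr(P[t] @ W) for t in range(1, N + 1))
J_est = (tr((Q - P[0]) @ Sig[0])
         + sum(tr((Q - P[t]) @ Sig[t]) + tr(P[t] @ A @ Sig[t - 1] @ A.T)
               for t in range(1, N + 1)))
assert np.isclose(J_star, J_lqr + J_est)        # the decomposition holds
print(J_star, J_lqr, J_est)
```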
Infinite horizon LQG
• average stage cost objective:
J = lim_{N→∞} (1/N) E Σ_{t=0}^{N−1} (xtT Qxt + utT Rut)
• in steady state, P and Σ̃ satisfy the AREs
P = Q + AT P A − AT P B(R + B T P B)−1B T P A,
Σ̃ = AΣ̃AT + W − AΣ̃C T (C Σ̃C T + V )−1C Σ̃AT
(Σ̃ is the steady-state value of Σt|t−1)
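A minimal sketch of computing the steady-state P and Σ̃ with SciPy's solve_discrete_are, using the standard control/estimation duality (A, B, Q, R) ↔ (AT, CT, W, V); the A, B, C below are illustrative assumptions:

```python
# Sketch: steady-state P and Sigma-tilde from the two AREs.
# A, B, C are illustrative assumptions (not from the lecture).
import numpy as np
from scipy.linalg import solve_discrete_are

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)
W, V = 0.5 * np.eye(2), 0.5 * np.eye(1)

P = solve_discrete_are(A, B, Q, R)          # control ARE
Sig = solve_discrete_are(A.T, C.T, W, V)    # estimation ARE (dual problem)

# verify both fixed-point equations from the slide
resP = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
    R + B.T @ P @ B, B.T @ P @ A) - P
resS = A @ Sig @ A.T + W - A @ Sig @ C.T @ np.linalg.solve(
    C @ Sig @ C.T + V, C @ Sig @ A.T) - Sig
print(np.abs(resP).max(), np.abs(resS).max())   # both ~ 0
```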
Example
• Q = I, R = I, X = I, W = 0.5I, V = 0.5I
• we compare LQG with the case where the state is known (stochastic LQR)
[Figure: sample closed-loop trajectories of (xt)1 (top) and (ut)1 (bottom) for t = 0, . . . , 50]
[Figure: histograms of the realized cost, J⋆ for LQG (top) and J⋆lqr for stochastic LQR (bottom)]
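A hedged sketch of how such a comparison could be generated: a steady-state LQG controller (Kalman filter plus certainty-equivalent LQR gain) against full-state-feedback stochastic LQR, with the example's Q, R, X, W, V and placeholder A, B, C:

```python
# Sketch: Monte Carlo comparison of LQG (output feedback via a Kalman
# filter) and stochastic LQR (full state knowledge).  A, B, C are
# illustrative assumptions; steady-state gains are used for simplicity.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Q, R, X = np.eye(2), np.eye(1), np.eye(2)
W, V = 0.5 * np.eye(2), 0.5 * np.eye(1)

P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # steady-state LQR gain
Sig = solve_discrete_are(A.T, C.T, W, V)            # steady-state Sigma_{t|t-1}
L = Sig @ C.T @ np.linalg.inv(C @ Sig @ C.T + V)    # steady-state Kalman gain

def run(N=50, full_state=False):
    x = rng.multivariate_normal(np.zeros(2), X)     # x_0 ~ N(0, X)
    xhat, J = np.zeros(2), 0.0                      # prior mean of x_0
    for _ in range(N):
        if full_state:
            u = K @ x                               # stochastic LQR: u_t = K x_t
        else:
            y = C @ x + rng.multivariate_normal(np.zeros(1), V)
            xhat = xhat + L @ (y - C @ xhat)        # measurement update
            u = K @ xhat                            # LQG: u_t = K xhat_t
        J += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W)
        if not full_state:
            xhat = A @ xhat + B @ u                 # time update
    return J + x @ Q @ x

lqg = np.mean([run() for _ in range(500)])
lqr = np.mean([run(full_state=True) for _ in range(500)])
print(f"LQG: {lqg:.1f}   stochastic LQR: {lqr:.1f}")  # LQG cost is higher
```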