
EE363 Winter 2008-09

Lecture 10
Linear Quadratic Stochastic Control with
Partial State Observation

• partially observed linear-quadratic stochastic control problem

• estimation-control separation principle

• solution via dynamic programming

Linear stochastic system

• linear dynamical system, over finite time horizon:

x_{t+1} = A x_t + B u_t + w_t,   t = 0, . . . , N − 1

with state x_t, input u_t, and process noise w_t

• linear noise corrupted observations:

y_t = C x_t + v_t,   t = 0, . . . , N

y_t is output, v_t is measurement noise

• x_0 ∼ N(0, X), w_t ∼ N(0, W), v_t ∼ N(0, V), all independent
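As a concrete illustration, here is a minimal NumPy sketch of simulating one trajectory of this system (the dimensions, seed, and zero input are arbitrary choices for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, N = 5, 2, 3, 50                      # illustrative dimensions/horizon
A = rng.standard_normal((n, n)) / np.sqrt(n)  # keeps eigenvalue moduli near 1
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
X, W, V = np.eye(n), 0.5 * np.eye(n), 0.5 * np.eye(p)

x = rng.multivariate_normal(np.zeros(n), X)   # x_0 ~ N(0, X)
for t in range(N):
    y = C @ x + rng.multivariate_normal(np.zeros(p), V)  # y_t = C x_t + v_t
    u = np.zeros(m)                           # placeholder input for now
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
```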



Causal output feedback control policies

• causal feedback policies:
– input must be function of past and present outputs
– roughly speaking: current state x_t is not known

• u_t = φ_t(Y_t), t = 0, . . . , N − 1
– Y_t = (y_0, . . . , y_t) is output history at time t
– φ_t : R^{p(t+1)} → R^m called the control policy at time t

• closed-loop system is

x_{t+1} = A x_t + B φ_t(Y_t) + w_t,   y_t = C x_t + v_t

• x_0, . . . , x_N, y_0, . . . , y_N, u_0, . . . , u_{N−1} are all random



Stochastic control with partial observations

• objective:

J = E( Σ_{t=0}^{N−1} ( x_t^T Q x_t + u_t^T R u_t ) + x_N^T Q x_N )

with Q ≥ 0, R > 0

• partially observed linear quadratic stochastic control problem (a.k.a. LQG problem): choose output feedback policies φ_0, . . . , φ_{N−1} to minimize J
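For a given set of policies, J can be estimated by Monte Carlo simulation; a sketch (the function name and interface are mine, not the slides'):

```python
import numpy as np

def lqg_cost_mc(A, B, C, Q, R, X, W, V, policies, N, n_runs=1000, seed=0):
    """Monte Carlo estimate of J; policies[t] maps the output
    history [y_0, ..., y_t] to the input u_t."""
    rng = np.random.default_rng(seed)
    n, p = A.shape[0], C.shape[0]
    total = 0.0
    for _ in range(n_runs):
        x = rng.multivariate_normal(np.zeros(n), X)
        Y = []
        for t in range(N):
            Y.append(C @ x + rng.multivariate_normal(np.zeros(p), V))
            u = policies[t](Y)
            total += x @ Q @ x + u @ R @ u     # stage cost
            x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
        total += x @ Q @ x                     # terminal cost x_N^T Q x_N
    return total / n_runs
```

For example, `policies = [lambda Y: np.zeros(B.shape[1])] * N` estimates the cost of applying zero input.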



Solution

• optimal policies are φ_t(Y_t) = K_t E(x_t | Y_t)
– K_t is optimal feedback gain matrix for associated LQR problem
– E(x_t | Y_t) is the MMSE estimate of x_t given measurements Y_t (can be computed using Kalman filter)

• called separation principle: optimal policy consists of
– estimating state via MMSE (ignoring the control problem)
– using estimated state as if it were the actual state, for purposes of control



LQR control gain computation

• define P_N = Q, and for t = N, . . . , 1,

P_{t−1} = A^T P_t A + Q − A^T P_t B (R + B^T P_t B)^{−1} B^T P_t A

• set K_t = −(R + B^T P_{t+1} B)^{−1} B^T P_{t+1} A, t = 0, . . . , N − 1

• K_t does not depend on data C, X, W, V
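The backward recursion translates directly into code; a minimal sketch, assuming NumPy (the function name is mine):

```python
import numpy as np

def lqr_gains(A, B, Q, R, N):
    """Backward Riccati recursion from the slides:
    returns P_0, ..., P_N and K_0, ..., K_{N-1}."""
    P = [None] * (N + 1)
    P[N] = Q
    for t in range(N, 0, -1):                  # compute P_{t-1} from P_t
        G = np.linalg.solve(R + B.T @ P[t] @ B, B.T @ P[t] @ A)
        P[t - 1] = A.T @ P[t] @ A + Q - A.T @ P[t] @ B @ G
    K = [-np.linalg.solve(R + B.T @ P[t + 1] @ B, B.T @ P[t + 1] @ A)
         for t in range(N)]
    return P, K
```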



Kalman filter current state estimate

• define
– x̂_t = E(x_t | Y_t) (current state estimate)
– Σ_t = E( (x_t − x̂_t)(x_t − x̂_t)^T ) (current state estimate covariance)
– Σ_{t+1|t} = A Σ_t A^T + W (next state estimate covariance)

• start with Σ_{0|−1} = X; for t = 0, . . . , N,

Σ_t = Σ_{t|t−1} − Σ_{t|t−1} C^T (C Σ_{t|t−1} C^T + V)^{−1} C Σ_{t|t−1},

Σ_{t+1|t} = A Σ_t A^T + W

• define L_t = Σ_{t|t−1} C^T (C Σ_{t|t−1} C^T + V)^{−1}, t = 0, . . . , N



• set x̂_0 = L_0 y_0; for t = 0, . . . , N − 1,

x̂_{t+1} = A x̂_t + B u_t + L_{t+1} e_{t+1},   e_{t+1} = y_{t+1} − C(A x̂_t + B u_t)

– e_{t+1} is next output prediction error
– e_{t+1} ∼ N(0, C Σ_{t+1|t} C^T + V), independent of Y_t

• Kalman filter gains L_t do not depend on data B, Q, R
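The covariance recursions and the estimator update can also be coded directly; a sketch, assuming NumPy (function names are mine):

```python
import numpy as np

def kalman_gains(A, C, X, W, V, N):
    """Returns Sigma_t, Sigma_{t|t-1}, and L_t for t = 0, ..., N."""
    Sig_pred = X                               # Sigma_{0|-1} = X
    Sigs, Sig_preds, L = [], [], []
    for t in range(N + 1):
        S = C @ Sig_pred @ C.T + V             # innovation covariance
        L.append(Sig_pred @ C.T @ np.linalg.inv(S))
        Sig = Sig_pred - Sig_pred @ C.T @ np.linalg.solve(S, C @ Sig_pred)
        Sigs.append(Sig)
        Sig_preds.append(Sig_pred)
        Sig_pred = A @ Sig @ A.T + W           # Sigma_{t+1|t}
    return Sigs, Sig_preds, L

def kf_step(xhat, u, y_next, A, B, C, L_next):
    """x-hat_{t+1} from x-hat_t, u_t, and the new output y_{t+1}."""
    pred = A @ xhat + B @ u                    # predicted next state
    return pred + L_next @ (y_next - C @ pred)
```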



Solution via dynamic programming

• let V_t(Y_t) be optimal value of LQG problem, from t on, conditioned on the output history Y_t:

V_t(Y_t) = min_{φ_t, . . . , φ_{N−1}} E( Σ_{τ=t}^{N−1} ( x_τ^T Q x_τ + u_τ^T R u_τ ) + x_N^T Q x_N | Y_t )

• we'll show that V_t is a quadratic function plus a constant, in fact,

V_t(Y_t) = x̂_t^T P_t x̂_t + q_t,   t = 0, . . . , N,

where P_t is the LQR cost-to-go matrix (x̂_t is a linear function of Y_t)



• we have

V_N(Y_N) = E(x_N^T Q x_N | Y_N) = x̂_N^T Q x̂_N + Tr(Q Σ_N)

(using x_N | Y_N ∼ N(x̂_N, Σ_N)) so P_N = Q, q_N = Tr(Q Σ_N)

• dynamic programming (DP) equation is

V_t(Y_t) = min_{u_t} E( x_t^T Q x_t + u_t^T R u_t + V_{t+1}(Y_{t+1}) | Y_t )

(and argmin, which is a function of Y_t, is optimal input)

• with V_{t+1}(Y_{t+1}) = x̂_{t+1}^T P_{t+1} x̂_{t+1} + q_{t+1}, DP equation becomes

V_t(Y_t) = min_{u_t} E( x_t^T Q x_t + u_t^T R u_t + x̂_{t+1}^T P_{t+1} x̂_{t+1} + q_{t+1} | Y_t )
         = E(x_t^T Q x_t | Y_t) + q_{t+1} + min_{u_t} ( u_t^T R u_t + E(x̂_{t+1}^T P_{t+1} x̂_{t+1} | Y_t) )



• using x_t | Y_t ∼ N(x̂_t, Σ_t), the first term is

E(x_t^T Q x_t | Y_t) = x̂_t^T Q x̂_t + Tr(Q Σ_t)

• using
x̂_{t+1} = A x̂_t + B u_t + L_{t+1} e_{t+1},
with e_{t+1} ∼ N(0, C Σ_{t+1|t} C^T + V), independent of Y_t, we get

E(x̂_{t+1}^T P_{t+1} x̂_{t+1} | Y_t) = x̂_t^T A^T P_{t+1} A x̂_t + u_t^T B^T P_{t+1} B u_t + 2 x̂_t^T A^T P_{t+1} B u_t
                                    + Tr( (L_{t+1}^T P_{t+1} L_{t+1}) (C Σ_{t+1|t} C^T + V) )

• using L_{t+1} = Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + V)^{−1}, last term becomes

Tr( P_{t+1} Σ_{t+1|t} C^T (C Σ_{t+1|t} C^T + V)^{−1} C Σ_{t+1|t} ) = Tr( P_{t+1} (Σ_{t+1|t} − Σ_{t+1}) )



• combining all terms we get

V_t(Y_t) = x̂_t^T (Q + A^T P_{t+1} A) x̂_t + q_{t+1} + Tr(Q Σ_t) + Tr( P_{t+1} (Σ_{t+1|t} − Σ_{t+1}) )
           + min_{u_t} ( u_t^T (R + B^T P_{t+1} B) u_t + 2 x̂_t^T A^T P_{t+1} B u_t )

• minimization same as in deterministic LQR problem (completing the square; spelled out below)

• thus optimal policy is φ_t^⋆(Y_t) = K_t x̂_t, with

K_t = −(R + B^T P_{t+1} B)^{−1} B^T P_{t+1} A

• plugging in optimal u_t we get V_t(Y_t) = x̂_t^T P_t x̂_t + q_t, where

P_t = A^T P_{t+1} A + Q − A^T P_{t+1} B (R + B^T P_{t+1} B)^{−1} B^T P_{t+1} A

q_t = q_{t+1} + Tr(Q Σ_t) + Tr( P_{t+1} (Σ_{t+1|t} − Σ_{t+1}) )

• recursion for P_t is exactly the same as for deterministic LQR
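The completing-the-square step referred to above, written out (the shorthand S and b is introduced here, not in the slides): with S = R + B^T P_{t+1} B and b = B^T P_{t+1} A x̂_t,

u_t^T S u_t + 2 x̂_t^T A^T P_{t+1} B u_t = (u_t + S^{−1} b)^T S (u_t + S^{−1} b) − b^T S^{−1} b

and S is positive definite (since R > 0), so the minimum value is −b^T S^{−1} b, attained at u_t^⋆ = −S^{−1} b = K_t x̂_t; substituting gives the −A^T P_{t+1} B (R + B^T P_{t+1} B)^{−1} B^T P_{t+1} A term in the P_t recursion.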



Optimal objective

• optimal LQG cost is

J^⋆ = E V_0(y_0) = q_0 + E( x̂_0^T P_0 x̂_0 ) = q_0 + Tr( P_0 (X − Σ_0) )

using x̂_0 ∼ N(0, X − Σ_0)

• using q_N = Tr(Q Σ_N) and

q_t = q_{t+1} + Tr(Q Σ_t) + Tr( P_{t+1} (Σ_{t+1|t} − Σ_{t+1}) )

we get

J^⋆ = Σ_{t=0}^{N} Tr(Q Σ_t) + Σ_{t=0}^{N} Tr( P_t (Σ_{t|t−1} − Σ_t) )

using Σ_{0|−1} = X



• we can write this as

J^⋆ = Σ_{t=0}^{N} Tr(Q Σ_t) + Σ_{t=1}^{N} Tr( P_t (A Σ_{t−1} A^T + W − Σ_t) ) + Tr( P_0 (X − Σ_0) )

which simplifies to

J^⋆ = J_lqr + J_est

where

J_lqr = Tr(P_0 X) + Σ_{t=1}^{N} Tr(P_t W),

J_est = Tr((Q − P_0) Σ_0) + Σ_{t=1}^{N} ( Tr((Q − P_t) Σ_t) + Tr(P_t A Σ_{t−1} A^T) )

– J_lqr is the stochastic LQR cost, i.e., the optimal objective if you knew the state
– J_est is the cost of not knowing (i.e., estimating) the state
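These formulas for J^⋆, J_lqr, and J_est can be evaluated numerically; a sketch reusing lqr_gains and kalman_gains from the earlier sketches (the function name is mine):

```python
import numpy as np

def lqg_objective(A, B, C, Q, R, X, W, V, N):
    """Optimal finite-horizon LQG cost and its J_lqr / J_est split,
    using lqr_gains and kalman_gains defined above."""
    P, _ = lqr_gains(A, B, Q, R, N)
    Sig, Sig_pred, _ = kalman_gains(A, C, X, W, V, N)
    # J* = sum Tr(Q Sigma_t) + sum Tr(P_t (Sigma_{t|t-1} - Sigma_t))
    J = (sum(np.trace(Q @ Sig[t]) for t in range(N + 1))
         + sum(np.trace(P[t] @ (Sig_pred[t] - Sig[t])) for t in range(N + 1)))
    J_lqr = np.trace(P[0] @ X) + sum(np.trace(P[t] @ W) for t in range(1, N + 1))
    return J, J_lqr, J - J_lqr   # J - J_lqr equals the J_est expression above
```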



• when state measurements are exact (C = I, V = 0), we have Σ_t = 0, so we get

J^⋆ = J_lqr = Tr(P_0 X) + Σ_{t=1}^{N} Tr(P_t W)



Infinite horizon LQG

• choose policies to minimize infinite horizon average stage cost

J = lim_{N→∞} (1/N) E Σ_{t=0}^{N−1} ( x_t^T Q x_t + u_t^T R u_t )

• optimal average stage cost is

J^⋆ = Tr(Q Σ) + Tr( P (Σ̃ − Σ) )

where P and Σ̃ are PSD solutions of AREs

P = Q + A^T P A − A^T P B (R + B^T P B)^{−1} B^T P A,
Σ̃ = A Σ̃ A^T + W − A Σ̃ C^T (C Σ̃ C^T + V)^{−1} C Σ̃ A^T

and Σ = Σ̃ − Σ̃ C^T (C Σ̃ C^T + V)^{−1} C Σ̃



• optimal average stage cost doesn’t depend on X

• (an) optimal policy is

u_t = K x̂_t,   x̂_{t+1} = A x̂_t + B u_t + L( y_{t+1} − C(A x̂_t + B u_t) )

where

K = −(R + B^T P B)^{−1} B^T P A,   L = Σ̃ C^T (C Σ̃ C^T + V)^{−1}

• K is steady-state LQR feedback gain

• L is steady-state Kalman filter gain
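In code, both AREs can be solved with SciPy's solve_discrete_are; the filter ARE is the control ARE applied to the pair (A^T, C^T) with weights (W, V). A sketch (the wrapper function name is mine):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqg_steady_state(A, B, C, Q, R, W, V):
    """Steady-state gains K, L and optimal average stage cost J*."""
    P = solve_discrete_are(A, B, Q, R)               # control ARE for P
    Sig_pred = solve_discrete_are(A.T, C.T, W, V)    # filter ARE for Sigma-tilde
    L = Sig_pred @ C.T @ np.linalg.inv(C @ Sig_pred @ C.T + V)
    Sig = Sig_pred - L @ C @ Sig_pred                # Sigma from Sigma-tilde
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    J = np.trace(Q @ Sig) + np.trace(P @ (Sig_pred - Sig))
    return K, L, J
```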



Example

• system with n = 5 states, m = 2 inputs, p = 3 outputs; infinite horizon

• A, B, C chosen randomly; A scaled so max_i |λ_i(A)| = 1

• Q = I, R = I, X = I, W = 0.5I, V = 0.5I

• we compare LQG with the case where state is known (stochastic LQR)
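The exact random instance used in the slides isn't reproducible, but a system of this form can be generated as follows (arbitrary seed; lqg_steady_state is the sketch above):

```python
import numpy as np

rng = np.random.default_rng(1)                  # arbitrary seed
n, m, p = 5, 2, 3
A = rng.standard_normal((n, n))
A /= np.max(np.abs(np.linalg.eigvals(A)))       # scale so max_i |lambda_i(A)| = 1
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
Q, R = np.eye(n), np.eye(m)
X, W, V = np.eye(n), 0.5 * np.eye(n), 0.5 * np.eye(p)

K, L, Jstar = lqg_steady_state(A, B, C, Q, R, W, V)
```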



Sample trajectories

sample traces of (x_t)_1 and (u_t)_1 in steady state

[two stacked plots: (x_t)_1 (top, range −2 to 2) and (u_t)_1 (bottom, range −1 to 1) versus t = 0 to 50; blue: LQG, red: stochastic LQR]



Cost histogram

histogram of stage costs for 5000 steps in steady state

[two histograms of stage cost (range 0 to 30, counts up to 500): top panel labeled J^⋆ (LQG), bottom panel labeled J_lqr (stochastic LQR)]
