ECE586BH Lecture1

This document summarizes a lecture on the interplay between feedback control and machine learning. It discusses how control theory concepts like robustness, safety, and model-based design relate to machine learning areas like large datasets, performance optimization, and data-driven training. The lecture outlines how tools from robust control like integral quadratic constraints can provide a unified framework for analyzing the robustness of neural networks and stochastic learning algorithms as dynamical systems. It also discusses how nonconvex optimization challenges in control design can be addressed using ideas from machine learning.


ECE586BH: Interplay between Control and Machine Learning

Bin Hu
ECE, University of Illinois Urbana-Champaign

Lecture 1, Fall 2023


Feedback Control Machine learning

• dynamical systems • statistics/optimization


• robustness • large-scale (big data)
• safety-critical • performance-driven
• model-based design • train using data
• CDC/ACC/ECC • NeurIPS/ICML/ICLR

Artificial Intelligence Revolution

Safety-critical applications!
Flight Control Certification

Ref: J. Renfrow, S. Liebler, and J. Denham. “F-14 Flight Control Law Design, Verification, and Validation Using Computer Aided Engineering Tools,” 1996.
Feedback Control Machine learning

Unified and automated tools for a repeatable and trustworthy design process for next-generation intelligent systems

Example: Robustness is crucial!
• Deep learning: Small adversarial perturbations can fool the classifier!

• Optimization: The oracle can be inexact! xk+1 = xk − α(∇f(xk) + ek)


• Decision and control: Model uncertainty and sim-to-real gap matter!
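A quick sketch (my own toy example, not from the slides) of the inexact-oracle update xk+1 = xk − α(∇f(xk) + ek): on a strongly convex quadratic with constants m and L, gradient descent with step α = 1/L still contracts under relative gradient error ‖ek‖ = δ‖∇f(xk)‖ whenever δ < m/L, since the worst-case per-step factor 1 − m/L + δ stays below 1.

```python
import numpy as np

rng = np.random.default_rng(0)
m, L = 1.0, 10.0                   # strong convexity / smoothness constants
H = np.diag(np.linspace(m, L, 5))  # f(x) = 0.5 * x^T H x, minimizer x* = 0

alpha, delta = 1.0 / L, 0.05       # delta < m/L keeps 1 - m/L + delta < 1
x = rng.standard_normal(5)
for _ in range(300):
    g = H @ x                                # exact gradient
    u = rng.standard_normal(5)
    u /= np.linalg.norm(u)
    e = delta * np.linalg.norm(g) * u        # oracle error with ||e|| = delta*||g||
    x = x - alpha * (g + e)
print(np.linalg.norm(x))                     # still converges to x* = 0
```

With a larger δ (comparable to m/L) the worst-case factor exceeds 1 and convergence can be lost, which is exactly why robustness to an inexact oracle matters.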

Control for Learning
Control theory addresses unified analysis and design of dynamical systems.

• LTI systems: ξk+1 = Aξk + Buk, with stability condition AᵀPA − P ≺ 0
• Markov jump systems: ξk+1 = Aik ξk + Bik uk, with condition Σᵢ₌₁ⁿ pij AᵢᵀPᵢAᵢ ≺ Pj
• Lur’e systems: ξk+1 = Aξk + Bφ(Cξk), with condition [AᵀPA − P, AᵀPB; BᵀPA, BᵀPB] ≺ M

Pros: Unified testing conditions when problem parameters are changed.


Cons: For control, we only need to solve the conditions numerically.
Control for learning: Algorithms and networks treated as control systems
• Neural networks as generalized Lur’e systems
• Stochastic learning algorithms as generalized Lur’e systems

Key message: Robustness can be addressed in a unified manner!


Learning for Control
Control theory addresses unified analysis and design of dynamical systems.

• LTI systems: ξk+1 = Aξk + Buk, with stability condition AᵀPA − P ≺ 0
• MJLS: ξk+1 = Aik ξk + Bik uk, with condition Σᵢ₌₁ⁿ pij AᵢᵀPᵢAᵢ ≺ Pj
• Lur’e systems: ξk+1 = Aξk + Bφ(Cξk), with condition [AᵀPA − P, AᵀPB; BᵀPA, BᵀPB] ≺ M

Many control design methods rely on convex conditions (BGFB1994).


What about problems that cannot be formulated as convex optimization?
• Direct policy search (e.g. min J(K)) is nonconvex!

Learning for control: Tailoring nonconvex learning theory to push robust control
theory beyond the convex regime
Outline

• Control for Learning


• Control methods for certifiably robust neural networks
• A control perspective on stochastic learning algorithms

• Learning for Control


• Global convergence of direct policy search on robust control

Robust Control Theory
[Block diagram: uncertain element ∆ in feedback with plant P; interconnection signals v and w]

1. Approximate the true system as “a linear system + a perturbation”


2. ∆ can be a troublesome element: nonlinearity, uncertainty, or delays
3. Rich control literature including standard textbooks
• Zhou, Doyle, Glover, “Robust and optimal control,” 1996
4. Many tools: small gain, passivity, dissipativity, Zames-Falb multipliers, etc
5. The integral quadratic constraint (IQC) framework [Megretski, Rantzer
(TAC1997)] provides a unified analysis for “LTI P + troublesome ∆”
6. Recently, IQC analysis has been extended to more general P
7. Typically, stability is tested via an SDP condition
Quadratic Constraints from Robust Control
• Lur’e system: ξk+1 = Aξk + B∆(Cξk ).
• EX: Gradient method (A = I, B = −αI, C = I, and ∆ = ∇f )
• Question: How to prove that the above Lur’e system converges? We are
looking at the following set of coupled sequences {ξk , wk , vk }

{(ξ, w, v) : ξk+1 = Aξk + Bwk , vk = Cξk } ∩ {(ξ, w, v) : wk = ∆(vk )}


• Key idea: Quadratic constraints! Replace the troublesome nonlinear
element ∆ with the following quadratic constraint:

  {(v, w) : wk = ∆(vk)} ⊂ {(v, w) : [vk; wk]ᵀ M [vk; wk] ≤ 0},

where M is constructed from the properties of ∆.
• If we can show that any sequence from the set below converges,

  {(ξ, w, v) : ξk+1 = Aξk + Bwk, vk = Cξk, [vk; wk]ᵀ M [vk; wk] ≤ 0},

then we are done.
Quadratic Constraints from Robust Control
Now we are analyzing sequences from the following set:

  {(ξ, w, v) : ξk+1 = Aξk + Bwk, vk = Cξk, [vk; wk]ᵀ M [vk; wk] ≤ 0}

Theorem
If there exists a positive definite matrix P and 0 < ρ < 1 s.t.

  [AᵀPA − ρ²P, AᵀPB; BᵀPA, BᵀPB] ≼ [C, 0; 0, I]ᵀ M [C, 0; 0, I]

then ξk+1ᵀPξk+1 ≤ ρ² ξkᵀPξk and limk→∞ ξk = 0.

Proof: multiply the matrix inequality by [ξk; wk]ᵀ on the left and [ξk; wk] on the right:

  ξk+1ᵀPξk+1 − ρ² ξkᵀPξk = [ξk; wk]ᵀ [AᵀPA − ρ²P, AᵀPB; BᵀPA, BᵀPB] [ξk; wk]
                         ≤ [ξk; wk]ᵀ [C, 0; 0, I]ᵀ M [C, 0; 0, I] [ξk; wk]
                         = [vk; wk]ᵀ M [vk; wk] ≤ 0

This condition is a semidefinite program (SDP) problem!
Illustrative Example: Gradient Descent Method
• Rewrite the gradient method xk+1 = xk − α∇f(xk) as ξk+1 = ξk − αwk with
ξk := xk − x⋆ and wk := ∇f(xk)
• If f is L-smooth and m-strongly convex, then by co-coercivity:

  [x − x⋆; ∇f(x)]ᵀ M [x − x⋆; ∇f(x)] ≤ 0,
  where M = [2mLI, −(m+L)I; −(m+L)I, 2I]

• We have A = I, B = −αI, C = I, and the SDP

  [AᵀPA − ρ²P, AᵀPB; BᵀPA, BᵀPB] ≼ [C, 0; 0, I]ᵀ M [C, 0; 0, I]

• With P = pI, this leads to

  ([(1 − ρ²)p, −αp; −αp, α²p] + [−2mL, m+L; m+L, −2]) ⊗ I ≼ 0

• Choose (α, ρ, p) to be (1/L, 1 − m/L, L²) or (2/(L+m), (L−m)/(L+m), (L+m)²/2) to
recover standard rates, i.e. ‖xk+1 − x⋆‖ ≤ (1 − m/L)‖xk − x⋆‖
• For this proof, is strong convexity really needed? No! A regularity condition suffices!
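The certificate above is easy to check numerically. The following sketch (my own, scalar version of the slide's SDP) verifies that the 2×2 matrix is negative semidefinite for the first choice of (α, ρ, p), and cross-checks the certified rate on a strongly convex quadratic.

```python
import numpy as np

# Scalar version of the certificate: with A = 1, B = -alpha, C = 1, P = p,
#   [ (1-rho^2)p, -alpha*p ; -alpha*p, alpha^2*p ] + [ -2mL, m+L ; m+L, -2 ] <= 0
# certifies ||x_{k+1} - x*|| <= rho * ||x_k - x*||.
m, L = 1.0, 10.0
alpha, rho, p = 1.0 / L, 1.0 - m / L, L**2     # the first choice on the slide

lmi = (np.array([[(1 - rho**2) * p, -alpha * p],
                 [-alpha * p,        alpha**2 * p]])
       + np.array([[-2 * m * L, m + L],
                   [m + L,      -2.0]]))
print(np.linalg.eigvalsh(lmi).max() <= 1e-9)   # True: the LMI holds

# Cross-check the certified rate on f(x) = 0.5 x^T H x with m*I <= H <= L*I
H = np.diag(np.linspace(m, L, 6))              # minimizer x* = 0
x = np.ones(6)
for _ in range(50):
    x_next = x - alpha * (H @ x)
    assert np.linalg.norm(x_next) <= rho * np.linalg.norm(x) + 1e-12
    x = x_next
```

For these parameters the LMI matrix works out to [[−m², m], [m, −1]], which is rank-one and negative semidefinite, so the bound is tight.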
Illustrative Example: Gradient Descent Method
• We have shown ‖xk+1 − x⋆‖ ≤ (1 − m/L)‖xk − x⋆‖
• Is it a contraction, i.e. ‖xk+1 − x′k+1‖ ≤ (1 − m/L)‖xk − x′k‖?
• Rewrite (xk+1 − x′k+1) = (xk − x′k) − α(∇f(xk) − ∇f(x′k)), i.e. ξk+1 = ξk − αwk
with ξk := xk − x′k and wk := ∇f(xk) − ∇f(x′k)
• If f is L-smooth and m-strongly convex, then by co-coercivity:

  [x − x′; ∇f(x) − ∇f(x′)]ᵀ M [x − x′; ∇f(x) − ∇f(x′)] ≤ 0,
  where M = [2mLI, −(m+L)I; −(m+L)I, 2I]

• We have A = I, B = −αI, C = I, and the same SDP

  ([(1 − ρ²)p, −αp; −αp, α²p] + [−2mL, m+L; m+L, −2]) ⊗ I ≼ 0

• Choose (α, ρ, p) to be (1/L, 1 − m/L, L²) or (2/(L+m), (L−m)/(L+m), (L+m)²/2) to
give the contraction result!
• For this proof, is strong convexity really needed? Yes!
Outline

• Control for Learning


• Control methods for certifiably robust neural networks
• A control perspective on stochastic learning algorithms

• Learning for Control


• Global convergence of direct policy search on robust control

Deep Learning for Classification
Deep learning has revolutionized the fields of AI and computer vision!
• Classification maps an input space X ⊂ Rd to a label space Y := {1, . . . , H}.
• Predict labels from image pixels
• Neural network classifier function f := (f1 , . . . , fH ) : X → RH such that
the predicted label for an input x is arg maxj fj (x).
• Input-label (x, y) is correctly classified if arg maxj fj (x) = y.

Deep Learning Models

Deep learning models: f (x) = xD+1 and x0 = x


• Feedforward: xk+1 = σ(Wk xk + bk ) for k = 0, 1, · · · , D
• Residual network: xk+1 = xk − σ(Wk xk + bk ) for k = 0, 1, · · · , D
• Many other structures: transformers, etc
Deep learning models are expressive and generalize well, achieving state-of-the-art
results in computer vision and natural language processing. However, ...
Adversarial Attacks and Robustness

• Even for correctly classified points (i.e. arg maxj fj(x) = y), one may find ‖τ‖ ≤ ε s.t.
arg maxj fj(x + τ) ≠ y (a small perturbation leads to a wrong prediction)
• Small perturbations can fool modern deep learning models!
• How can we deploy deep learning models in safety-critical applications?
• Certified robustness: A classifier f is certifiably robust at radius ε ≥ 0 at
point x with label y if for all τ such that ‖τ‖ ≤ ε: arg maxj fj(x + τ) = y

1-Lipschitz Networks for Certified Robustness
• Tsuzuku, Sato, Sugiyama (NeurIPS2018): Let f be L-Lipschitz. If we have

  Mf(x) := max(0, fy(x) − max_{y′≠y} fy′(x)) > √2 Lε

  then we have for every τ such that ‖τ‖₂ ≤ ε: arg maxj fj(x + τ) = y

• Perturbations smaller than Mf(x)/(√2 L) cannot deceive f at datapoint x!
• If each layer of a network is 1-Lipschitz, the entire network is 1-Lipschitz.
• For each data point, we test whether Mf(x) > √2 ε, and then count the
percentage of data points guaranteed to be guarded against perturbations
smaller than ε (this is the certified accuracy at that ε).
• We need to train a Lipschitz neural network with good prediction margins!
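As a sketch of how this certification is counted in practice (my own toy numbers; the score vectors below are made up for illustration), the certified radius of a correctly classified point is its margin divided by √2 L, and certified accuracy is the fraction of points whose radius exceeds ε:

```python
import numpy as np

# Margin-based certification for an L-Lipschitz classifier: a point with
# margin M_f(x) > sqrt(2)*L*eps is certifiably robust at radius eps, so the
# certified radius is M_f(x) / (sqrt(2)*L).
def certified_radius(scores, label, lipschitz):
    scores = np.asarray(scores, dtype=float)
    runner_up = np.max(np.delete(scores, label))
    margin = max(0.0, scores[label] - runner_up)
    return margin / (np.sqrt(2.0) * lipschitz)

def certified_accuracy(score_list, labels, lipschitz, eps):
    # fraction of points that are correctly classified AND certified at radius eps
    hits = sum(
        1
        for scores, y in zip(score_list, labels)
        if int(np.argmax(scores)) == y and certified_radius(scores, y, lipschitz) > eps
    )
    return hits / len(labels)

scores = [[3.0, 1.0, 0.5], [0.2, 0.1, 0.0], [1.0, 2.0, 0.9]]
labels = [0, 0, 1]
print(certified_radius(scores[0], 0, lipschitz=1.0))     # sqrt(2) ~ 1.414
print(certified_accuracy(scores, labels, 1.0, eps=0.1))  # 2/3: middle point's radius ~ 0.071
```

This makes the role of the prediction margin concrete: for a fixed 1-Lipschitz network, certified accuracy is entirely determined by the distribution of margins.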
Previous approaches:
• Spectral normalization (MKKY2018): xk+1 = σ((1/‖Wk‖₂) Wk xk + bk)
• Orthogonality (TK2021, SF2021): xk+1 = σ(Wk xk + bk) with WkᵀWk = I
• Convex potential layer (MDAA2022): xk+1 = xk − (2/‖Wk‖₂²) Wk σ(Wkᵀ xk + bk)
• AOL (PL2022): xk+1 = σ(Wk diag(Σj |WkᵀWk|ij)^(−1/2) xk + bk)
My Focus: Principles for 1-Lipschitz Networks
Theorem (AHDAH2023)
If there exists a nonsingular diagonal Tk s.t. WkᵀWk ≼ Tk, then we have
1. The layer xk+1 = σ(Wk Tk^(−1/2) xk + bk) is 1-Lipschitz for any 1-Lipschitz σ.
2. The layer xk+1 = xk − 2Wk Tk^(−1) σ(Wkᵀ xk + bk) is 1-Lipschitz if σ is ReLU.

Proof of Statement 1:

  ‖xk+1 − x′k+1‖² ≤ ‖Wk Tk^(−1/2)(xk − x′k)‖²
                  = (xk − x′k)ᵀ Tk^(−1/2) WkᵀWk Tk^(−1/2) (xk − x′k) ≤ ‖xk − x′k‖²

The second statement can be proved using the quadratic constraint argument.

A Unification of Existing 1-Lipschitz Neural Networks
• Spectral normalization: Statement 1 with Tk = ‖Wk‖₂² I
• Orthogonal weights: Statement 1 with Tk = I and WkᵀWk = I
• CPL: Statement 2 with Tk = ‖Wk‖₂² I
• AOL: Statement 1 with Tk = diag(Σⱼ₌₁ⁿ |WkᵀWk|ij)
• Control theory (SLL): Tk = diag(Σⱼ₌₁ⁿ |WkᵀWk|ij qj/qi)
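A minimal numerical sketch (my own, taking σ to be ReLU) of Statement 1 with the AOL choice Tk = diag(Σj |WkᵀWk|ij): here Tk − WkᵀWk is diagonally dominant with nonnegative diagonal, hence positive semidefinite, so WkᵀWk ≼ Tk holds for any weight matrix and the layer is 1-Lipschitz by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 8))
b = rng.standard_normal(8)

WtW = W.T @ W
T_diag = np.abs(WtW).sum(axis=1)             # AOL scaling: T = diag(sum_j |W^T W|_ij)
T_inv_sqrt = np.diag(1.0 / np.sqrt(T_diag))

def layer(x):
    # x -> sigma(W T^{-1/2} x + b); ReLU is 1-Lipschitz
    return np.maximum(W @ T_inv_sqrt @ x + b, 0.0)

# W^T W <= T (diagonal dominance of T - W^T W)
assert np.linalg.eigvalsh(np.diag(T_diag) - WtW).min() >= -1e-9

# Empirical 1-Lipschitz check on random input pairs
for _ in range(100):
    x, y = rng.standard_normal(8), rng.standard_normal(8)
    assert np.linalg.norm(layer(x) - layer(y)) <= np.linalg.norm(x - y) + 1e-9
print("1-Lipschitz check passed")
```

Swapping in Tk = ‖W‖₂² I or Tk = I (with orthogonal W) recovers the other Statement 1 rows of the unification table.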
Experimental Results
4 versions of the SDP-based Lipschitz Layer network (SLL): S, M, L, XL

CIFAR100 results (natural accuracy and provable accuracy at radius ε):

Model        | Natural | ε = 36/255 | ε = 72/255 | ε = 108/255 | ε = 1
Cayley Large |  43.3   |   29.2     |   18.8     |    11.0     |  -
SOC 20       |  48.3   |   34.4     |   22.7     |    14.2     |  -
SOC+ 20      |  47.8   |   34.8     |   23.7     |    15.8     |  -
CPL XL       |  47.8   |   33.4     |   20.9     |    12.6     |  -
AOL Large    |  43.7   |   33.7     |   26.3     |    20.7     |  7.8
SLL Small    |  45.8   |   34.7     |   26.5     |    20.4     |  7.2
SLL Medium   |  46.5   |   35.6     |   27.3     |    21.1     |  7.7
SLL Large    |  46.9   |   36.2     |   27.9     |    21.6     |  7.9
SLL X-Large  |  47.6   |   36.5     |   28.2     |    21.8     |  8.2

• Competitive results over CIFAR100 and TinyImageNet


• Many extensions: Lipschitz deep equilibrium models, neural ODEs, etc
Quadratic Constraints for Lipschitz Networks
• Residual network: xk+1 = xk − Gk σ(Wkᵀ xk + bk) for k = 0, 1, · · · , D.
• 1-Lipschitz layer: How to enforce ‖xk+1 − x′k+1‖ ≤ ‖xk − x′k‖?
• Rewrite (xk+1 − x′k+1) = (xk − x′k) − Gk(σ(Wkᵀxk + bk) − σ(Wkᵀx′k + bk)), i.e.
ξk+1 = ξk − Gk wk with ξk := xk − x′k and wk := σ(Wkᵀxk + bk) − σ(Wkᵀx′k + bk)
• We will use the properties of σ to construct Mk such that we only need to
look at the following set with Ak = I and Bk = −Gk:

  {(ξ, w) : ξk+1 = Ak ξk + Bk wk, [ξk; wk]ᵀ Mk [ξk; wk] ≤ 0}

• Then we can ensure ‖ξk+1‖ ≤ ‖ξk‖ by enforcing an SDP for this set:

  [AkᵀPAk − P, AkᵀPBk; BkᵀPAk, BkᵀPBk] ≼ Mk with P = I
  =⇒ [ξk; wk]ᵀ [AkᵀAk − I, AkᵀBk; BkᵀAk, BkᵀBk] [ξk; wk] ≤ 0,

  where the left-hand side equals ‖ξk+1‖² − ‖ξk‖² = ‖xk+1 − x′k+1‖² − ‖xk − x′k‖²

Quadratic Constraints for Lipschitz Networks
• Since σ is slope-restricted on [0, 1], the following scalar incremental
quadratic constraint holds with m = 0 and L = 1:

  [a − a′; σ(a) − σ(a′)]ᵀ [2mL, −(m+L); −(m+L), 2] [a − a′; σ(a) − σ(a′)] ≤ 0,
  i.e. the middle matrix is [0, −1; −1, 2]

• The vector version: for any diagonal Γk ≽ 0, we have

  [vk − v′k; σ(vk) − σ(v′k)]ᵀ Xk [vk − v′k; σ(vk) − σ(v′k)] ≤ 0,
  where Xk := [0, −Γk; −Γk, 2Γk]

• Choosing vk = Wkᵀxk + bk and v′k = Wkᵀx′k + bk, we have

  [Wkᵀ(xk − x′k); σ(Wkᵀxk + bk) − σ(Wkᵀx′k + bk)]ᵀ Xk
  [Wkᵀ(xk − x′k); σ(Wkᵀxk + bk) − σ(Wkᵀx′k + bk)] ≤ 0
Quadratic Constraints for Lipschitz Networks
• We get [ξk; wk]ᵀ Mk [ξk; wk] ≤ 0 with

  Mk = [Wkᵀ, 0; 0, I]ᵀ [0, −Γk; −Γk, 2Γk] [Wkᵀ, 0; 0, I] = [0, −WkΓk; −ΓkWkᵀ, 2Γk]

Theorem
If there exists a diagonal Γk ≽ 0 such that

  [0, −Gk; −Gkᵀ, GkᵀGk] ≼ [0, −WkΓk; −ΓkWkᵀ, 2Γk]

then the residual network xk+1 = xk − Gk σ(Wkᵀxk + bk) is 1-Lipschitz.

• Analytical solution: Gk = WkΓk and ΓkWkᵀWkΓk ≼ 2Γk.
• Suppose Γk is nonsingular, and set Tk = 2Γk^(−1). Then the residual network
xk+1 = xk − 2Wk Tk^(−1) σ(Wkᵀxk + bk) is 1-Lipschitz as long as WkᵀWk ≼ Tk
• Ref: Araujo, Havens, Delattre, Allauzen, H. A unifying algebraic
perspective on Lipschitz neural networks, ICLR, 2023. (Spotlight)
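A minimal sketch (my own) of the residual layer above, instantiated with the CPL-style choice Tk = ‖Wk‖₂² I, which satisfies WkᵀWk ≼ Tk; the 1-Lipschitz property with ReLU σ is then checked empirically on random input pairs.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((10, 6))      # input dimension 10, hidden dimension 6
b = rng.standard_normal(6)
t = np.linalg.norm(W, 2) ** 2         # T = t*I satisfies W^T W <= T

def layer(x):
    # residual layer x -> x - 2 W T^{-1} sigma(W^T x + b), with ReLU sigma
    return x - (2.0 / t) * W @ np.maximum(W.T @ x + b, 0.0)

# Empirical 1-Lipschitz check on random input pairs
for _ in range(100):
    x, y = rng.standard_normal(10), rng.standard_normal(10)
    assert np.linalg.norm(layer(x) - layer(y)) <= np.linalg.norm(x - y) + 1e-9
print("residual layer passed the 1-Lipschitz check")
```

Replacing t*I with any diagonal Tk satisfying WᵀW ≼ Tk (e.g. the weighted AOL scaling of SLL) keeps the guarantee while being less conservative than the spectral norm.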
Outline

• Control for Learning


• Control Methods on Certifiably Robust Neural Networks
• A Control Perspective on Stochastic Learning Algorithms

• Learning for Control


• Global convergence of direct policy search on robust control

History: Computer-Assisted Proofs in Optimization
In the past ten years, much progress has been made in leveraging SDPs to assist
the convergence rate analysis of optimization methods.
• Drori and Teboulle (MP2014): numerical worst-case bounds via the
performance estimation problem (PEP) formulation
• Lessard, Recht, Packard (SIOPT2016): numerical linear rate bounds using
integral quadratic constraints (IQCs) from robust control theory
• Taylor, Hendrickx, Glineur (MP2017): interpolation conditions for PEPs
• H., Lessard (ICML2017): first SDP-based analytical proof for Nesterov’s
accelerated rate
• H., Seiler, Rantzer (COLT2017): first paper on SDP-based convergence
proofs for stochastic optimization using jump system theory and IQCs
• Van Scoy, Freeman, and Lynch (LCSS2017): first paper on control-oriented
design of accelerated methods: triple momentum
Taken further by different groups
• inexact gradient methods, proximal gradient methods, conditional gradient
methods, operator splitting methods, mirror descent methods, distributed
gradient methods, monotone inclusion problems
Stochastic Methods for Machine Learning
• Many learning tasks (regression/classification) lead to the finite-sum ERM problem

  min_{x∈Rp} (1/n) Σᵢ₌₁ⁿ fi(x)

  where fi(x) = li(x) + λR(x) (li is the loss, and R avoids over-fitting).
• Stochastic gradient descent (SGD): xk+1 = xk − α∇fik(xk)
• Inexact oracle: xk+1 = xk − α(∇fik(xk) + ek) where ‖ek‖ ≤ δ‖∇fik(xk)‖
(the angle θ between (ek + ∇fik(xk)) and ∇fik(xk) satisfies |sin(θ)| ≤ δ)
• Algorithm change: SAG (SRF2017) vs. SAGA (DBL2014)

  SAG:  xk+1 = xk − α((∇fik(xk) − yikᵏ)/n + (1/n) Σᵢ₌₁ⁿ yiᵏ)
  SAGA: xk+1 = xk − α(∇fik(xk) − yikᵏ + (1/n) Σᵢ₌₁ⁿ yiᵏ)

  where yiᵏ⁺¹ := ∇fi(xk) if i = ik, and yiᵏ⁺¹ := yiᵏ otherwise
• Markov assumption: In reinforcement learning, {ik} can be Markovian
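The SAGA update above can be sketched in a few lines (my own toy least-squares instance, with step size and iteration count chosen ad hoc), maintaining the gradient table yiᵏ:

```python
import numpy as np

# SAGA on the finite-sum least-squares problem f_i(x) = 0.5*(a_i^T x - b_i)^2
rng = np.random.default_rng(3)
n, p = 50, 4
A = rng.standard_normal((n, p))
b_vec = rng.standard_normal(n)
x_star, *_ = np.linalg.lstsq(A, b_vec, rcond=None)   # minimizer of (1/n) sum_i f_i

def grad_i(x, i):
    return A[i] * (A[i] @ x - b_vec[i])

alpha = 0.005
x = np.zeros(p)
y = np.array([grad_i(x, i) for i in range(n)])       # gradient table y_i^k
for _ in range(10000):
    i = rng.integers(n)
    g = grad_i(x, i)
    x = x - alpha * (g - y[i] + y.mean(axis=0))      # SAGA step
    y[i] = g                                         # refresh the stored gradient
print(np.linalg.norm(x - x_star))                    # small on this strongly convex instance
```

Unlike plain SGD with a constant step, the stored gradients make the update's variance vanish at x⋆, so the iterates converge to the minimizer rather than to a noise ball.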
My Focus: Unified Analysis of Stochastic Methods
Assumption
• fi smooth, f satisfies the restricted secant inequality (RSI)
• ik is IID or Markovian
• the oracle is exact or inexact
• many other possibilities

Method
• SGD
• SAGA-like methods
• Temporal difference learning

Bound
• E‖xk − x⋆‖² ≤ c²ρᵏ + O(α)
• E‖xk − x⋆‖² ≤ c²ρᵏ
• other forms

How can we automate the rate analysis of stochastic learning algorithms? Can
numerical semidefinite programs support the search for analytical proofs?

  assumption + method =⇒ bound
My Focus: Stochastic Methods for Learning
In the deterministic setting, we just need to show that the trajectories generated
by optimization methods belong to the following set:

  {(ξ, w, v) : ξk+1 = Aξk + Bwk, vk = Cξk, [vk; wk]ᵀ Mj [vk; wk] ≤ Λj, j ∈ Π}

What to do for stochastic optimization (e.g. xk+1 = xk − α∇fik(xk) where
ik ∈ {1, · · · , n} is sampled)?
• Stochastic quadratic constraints: Show that the trajectories generated by
stochastic optimization methods belong to the following set:

  {(ξ, w, v) : ξk+1 = Aξk + Bwk, vk = Cξk, E [vk; wk]ᵀ Mj [vk; wk] ≤ Λj, j ∈ Π}

• Jump system approach: Show that the trajectories generated by stochastic
optimization methods belong to the following set:

  {(ξ, w, v) : ξk+1 = Aik ξk + Bik wk, vk = Cik ξk, [vk; wk]ᵀ Mj [vk; wk] ≤ Λj, j ∈ Π}

  where Aik ∈ {A1, · · · , An}, Bik ∈ {B1, · · · , Bn}, and Cik ∈ {C1, · · · , Cn}
Stochastic Quadratic Constraints
Suppose we can show that the trajectories generated by stochastic optimization
methods belong to the following set:

  {(ξ, w, v) : ξk+1 = Aξk + Bwk, vk = Cξk, E [vk; wk]ᵀ Mj [vk; wk] ≤ Λj, j ∈ Π}

Theorem
If there exists a positive definite matrix P, non-negative λj, and 0 < ρ < 1 s.t.

  [AᵀPA − ρ²P, AᵀPB; BᵀPA, BᵀPB] ≼ Σ_{j∈Π} λj [C, 0; 0, I]ᵀ Mj [C, 0; 0, I]

then E ξk+1ᵀPξk+1 ≤ ρ² E ξkᵀPξk + Σ_{j∈Π} λjΛj.

Proof sketch: multiplying the matrix inequality by [ξk; wk]ᵀ and [ξk; wk] gives

  ξk+1ᵀPξk+1 − ρ² ξkᵀPξk ≤ Σ_{j∈Π} λj [vk; wk]ᵀ Mj [vk; wk]

Then take expectations and apply the expected quadratic constraints!
Main Result: Analysis of Biased SGD
• Consider xk+1 = xk − α(∇fik(xk) + ek) with ‖ek‖² ≤ δ²‖∇fik(xk)‖² + c²
• If c = 0, the bound means the angle θ between (ek + ∇fik(xk)) and
∇fik(xk) satisfies |sin(θ)| ≤ δ
• Rewrite as (xk+1 − x⋆) = (xk − x⋆) + [−αI, −αI][∇fik(xk); ek], i.e.
ξk+1 = ξk + Bwk with ξk := xk − x⋆, wk := [∇fik(xk); ek], and B := [−αI, −αI]
• Assume the restricted secant inequality ∇f(x)ᵀ(x − x⋆) ≥ m‖x − x⋆‖²
• Assume each fi is L-smooth, i.e. ‖∇fi(x) − ∇fi(x⋆)‖ ≤ L‖x − x⋆‖
• 1st QC: E [xk − x⋆; ∇fik(xk); ek]ᵀ M1 [xk − x⋆; ∇fik(xk); ek] ≤ Λ1 := 0,
  where M1 = [2mI, −I, 0; −I, 0, 0; 0, 0, 0]
• 2nd QC: E [xk − x⋆; ∇fik(xk); ek]ᵀ M2 [xk − x⋆; ∇fik(xk); ek] ≤ Λ2 := (2/n) Σᵢ₌₁ⁿ ‖∇fi(x⋆)‖²,
  where M2 = [−2L²I, 0, 0; 0, I, 0; 0, 0, 0]
Main Result: Analysis of Biased SGD
• We can rewrite ‖ek‖² ≤ δ²‖∇fik(xk)‖² + c² as the 3rd QC:

  E [xk − x⋆; ∇fik(xk); ek]ᵀ M3 [xk − x⋆; ∇fik(xk); ek] ≤ Λ3 := c²,
  where M3 = [0, 0, 0; 0, −δ²I, 0; 0, 0, I]

• We have A = I, B = [−αI, −αI], C = I, and the SDP

  [AᵀPA − ρ²P, AᵀPB; BᵀPA, BᵀPB] ≼ Σⱼ₌₁³ λj [C, 0; 0, I]ᵀ Mj [C, 0; 0, I]

• Biased SGD satisfies E‖xk+1 − x⋆‖² ≤ ρ² E‖xk − x⋆‖² + λ2Λ2 + λ3c² if

  ([1 − ρ², −α, −α; −α, α² + δ²λ3, α²; −α, α², α² − λ3]
   + λ1 [−2m, 1, 0; 1, 0, 0; 0, 0, 0] + λ2 [2L², 0, 0; 0, −1, 0; 0, 0, 0]) ⊗ I ≼ 0
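The qualitative conclusion can be sanity-checked by simulation. The sketch below (my own toy instance, with fi(x) = ½‖x − ai‖² so that m = L = 1 and x⋆ is the mean of the ai) runs biased SGD with a relative error of level δ and observes geometric decay of ‖xk − x⋆‖² down to a small neighborhood of x⋆ whose size depends on α and δ:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 3
a = rng.standard_normal((n, p))
x_star = a.mean(axis=0)

alpha, delta = 0.01, 0.2          # m^2 - 2*delta^2*L^2 = 0.92 > 0 here
x = x_star + 5.0 * np.ones(p)
errs = []
for _ in range(20000):
    i = rng.integers(n)
    g = x - a[i]                                  # grad f_i(x)
    u = rng.standard_normal(p)
    u /= np.linalg.norm(u)
    e = delta * np.linalg.norm(g) * u             # relative bias ||e|| = delta*||g||
    x = x - alpha * (g + e)
    errs.append(float(np.sum((x - x_star) ** 2)))
# ||x_k - x*||^2 decays geometrically, then hovers near x*
print(errs[0], np.mean(errs[-1000:]))
```

Note the residual error does not vanish even for tiny α when δ > 0, since the relative error persists at x⋆ where the individual gradients ∇fi(x⋆) are nonzero.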
Main Result: Analysis of Biased SGD
• Given E‖x0 − x⋆‖² ≤ U0, set Uk+1 = min ρ²Uk + λ2Λ2 + λ3c², where the minimum
is over (ρ, λ1, λ2, λ3) subject to

  ([1 − ρ², −α, −α; −α, α² + δ²λ3, α²; −α, α², α² − λ3]
   + λ1 [−2m, 1, 0; 1, 0, 0; 0, 0, 0] + λ2 [2L², 0, 0; 0, −1, 0; 0, 0, 0]) ⊗ I ≼ 0

Then we have E‖xk − x⋆‖² ≤ Uk. This leads to a sequential SDP problem.
• This problem has an exact solution:

  Uk+1 = (α√(c² + δ²Λ2 + 2L²δ²Uk) + √((1 − 2mα + 2L²α²)Uk + Λ2α²))²

• limk→∞ Uk = (c² + δ²Λ2)/(m² − 2δ²L²) + m(c²(2L² − m²) + (1 − δ²)Λ2m²)/(m² − 2δ²L²)² · α + O(α²)
• Rate = 1 − (m² − 2δ²L²)/m · α + O(α²)
• For different assumptions, modify (Mj, Λj)!
• H., Seiler, and Lessard. Analysis of biased stochastic gradient descent using
sequential semidefinite programs. Mathematical Programming, 2021
• Syed, Dall’Anese, and H. Bounds for the tracking error and dynamic regret of
inexact online optimization methods: A unified analysis via sequential SDPs.
Jump System Approach
The jump system approach leads to conditions of the form

  (1/n) Σᵢ₌₁ⁿ [AᵢᵀPAᵢ − ρ²P, AᵢᵀPBᵢ; BᵢᵀPAᵢ, BᵢᵀPBᵢ] ≼ Σ_{j∈Π} λj [C, 0; 0, I]ᵀ Mj [C, 0; 0, I]

Pros:
• General enough to handle many algorithms: H., Seiler, Rantzer (COLT2017), e.g.
(with eik the ik-th standard basis vector and e the all-ones vector)

  SAGA: Ãik = [In − eik eikᵀ, 0̃; −(α/n)(e − n eik)ᵀ, 1], B̃ik = [eik eikᵀ; −α eikᵀ]
  SAG:  Ãik = [In − eik eikᵀ, 0̃; −(α/n)(e − eik)ᵀ, 1],  B̃ik = [eik eikᵀ; −(α/n) eikᵀ]

• General enough to handle Markov {ik}: Syed and H. (NeurIPS2019), Guo
and H. (ACC2022a, 2022b)
Cons:
• The SDPs are much bigger than the ones obtained from stochastic quadratic
constraints, and we have to exploit SDP structure for simplifications
Control for Learning: Summary

• Iterative learning algorithms and neural network layers can be thought of as feedback control systems.

• The quadratic constraint approach from control theory can be leveraged to formulate SDP conditions for machine learning research.

• Different from the typical practice in control, we now want to obtain analytical solutions of the SDPs!

Outline

• Control for Learning


• Control methods on certifiably robust neural networks
• A control perspective on stochastic learning algorithms

• Learning for Control


• Global convergence of direct policy search on robust control

