
SIAM J. MATRIX ANAL. APPL.
Vol. 30, No. 2, pp. 609–638
© 2008 Society for Industrial and Applied Mathematics

H2 MODEL REDUCTION FOR LARGE-SCALE LINEAR DYNAMICAL SYSTEMS∗

S. GUGERCIN†, A. C. ANTOULAS‡, AND C. BEATTIE†

Abstract. The optimal H2 model reduction problem is of great importance in the area of dynamical systems and simulation. In the literature, two independent frameworks have evolved, focusing either on solution of Lyapunov equations on the one hand or on interpolation of transfer functions on the other, without any apparent connection between the two approaches. In this paper, we develop a new unifying framework for the optimal H2 approximation problem using best approximation properties in the underlying Hilbert space. This new framework leads to a new set of local optimality conditions taking the form of a structured orthogonality condition. We show that the existing Lyapunov- and interpolation-based conditions are each equivalent to our conditions and so are equivalent to each other. Also, we provide a new elementary proof of the interpolation-based condition that clarifies the importance of the mirror images of the reduced system poles. Based on the interpolation framework, we describe an iteratively corrected rational Krylov algorithm for H2 model reduction. The formulation is based on finding a reduced order model that satisfies interpolation-based first-order necessary conditions for H2 optimality and results in a method that is numerically effective and suited for large-scale problems. We illustrate the performance of the method with a variety of numerical experiments and comparisons with existing methods.

Key words. model reduction, rational Krylov, H2 approximation

AMS subject classifications. 34C20, 41A05, 49K15, 49M05, 93A15, 93C05, 93C15

DOI. 10.1137/060666123

1. Introduction. Given a dynamical system described by a set of first-order


differential equations, the model reduction problem seeks to replace this original set
of equations with a (much) smaller set of such equations so that the behavior of both
systems is similar in an appropriately defined sense. Such situations arise frequently
when physical systems need to be simulated or controlled; the greater the level of detail
that is required, the greater the number of resulting equations. In large-scale settings,
computations become infeasible due to limitations on computational resources as well
as growing inaccuracy due to numerical ill-conditioning. In all these cases the number
of equations involved may range from a few hundred to a few million. Examples
of large-scale systems abound, ranging from the design of VLSI (very large scale
integration) chips to the simulation and control of MEMS (microelectromechanical
system) devices. For an overview of model reduction for large-scale dynamical systems
we refer to the book [2]. See also [23] for a recent collection of large-scale benchmark
problems.
In this paper, we consider single input/single output (SISO) linear dynamical
systems represented as

(1.1)    G :  ẋ(t) = A x(t) + b u(t),   y(t) = c^T x(t),    or    G(s) = c^T (sI − A)^{-1} b,
∗ Received by the editors July 26, 2006; accepted for publication (in revised form) by P. Benner
February 25, 2008; published electronically June 6, 2008.
http://www.siam.org/journals/simax/30-2/66612.html
† Department of Mathematics, Virginia Tech, Blacksburg, VA (gugercin@vt.edu, beattie@vt.edu).

The work of these authors was supported in part by the NSF through grants DMS-050597 and
DMS-0513542, and by the AFOSR through grant FA9550-05-1-0449.
‡ Department of Electrical and Computer Engineering, Rice University, Houston, TX (aca@ece.rice.edu). The work of this author was supported in part by the NSF through grants CCR-0306503 and ACI-0325081.

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.



where A ∈ R^{n×n}, b, c ∈ R^n; we define x(t) ∈ R^n, u(t) ∈ R, y(t) ∈ R as the state, input, and output, respectively, of the system. (We comment on extensions to the multiple input/multiple output (MIMO) case in section 3.2.1, but will confine our analysis and examples to the SISO case.)

G(s) is the transfer function of the system: if û(s) and ŷ(s) denote the Laplace transforms of the input u(t) and output y(t), respectively, then ŷ(s) = G(s)û(s). With a standard abuse of notation, we will denote both the system and its transfer function by G. The "dimension of G" is taken to be the dimension of the underlying state space, dim G = n in this case. It will always be assumed that the system, G, is stable, that is, that the eigenvalues of A have strictly negative real parts.
The model reduction process will yield another system,

(1.2)    G_r :  ẋ_r(t) = A_r x_r(t) + b_r u(t),   y_r(t) = c_r^T x_r(t),    or    G_r(s) = c_r^T (sI − A_r)^{-1} b_r,

having (much) smaller dimension r ≪ n, with A_r ∈ R^{r×r} and b_r, c_r ∈ R^r.


We want y_r(t) ≈ y(t) over a large class of inputs u(t). Different measures of approximation and different choices of input classes will lead to different model reduction goals. Suppose one wants to ensure that max_{t>0} |y(t) − y_r(t)| is small uniformly over all inputs, u(t), having bounded "energy," that is, ∫_0^∞ |u(t)|² dt ≤ 1. Observe first that ŷ(s) − ŷ_r(s) = [G(s) − G_r(s)] û(s) and then

max_{t>0} |y(t) − y_r(t)| = max_{t>0} | (1/2π) ∫_{−∞}^{∞} (ŷ(ıω) − ŷ_r(ıω)) e^{ıωt} dω |

  ≤ (1/2π) ∫_{−∞}^{∞} |ŷ(ıω) − ŷ_r(ıω)| dω = (1/2π) ∫_{−∞}^{∞} |G(ıω) − G_r(ıω)| |û(ıω)| dω

  ≤ [ (1/2π) ∫_{−∞}^{∞} |G(ıω) − G_r(ıω)|² dω ]^{1/2} [ (1/2π) ∫_{−∞}^{∞} |û(ıω)|² dω ]^{1/2}

  = [ (1/2π) ∫_{−∞}^{∞} |G(ıω) − G_r(ıω)|² dω ]^{1/2} [ ∫_0^∞ |u(t)|² dt ]^{1/2}

  ≤ [ (1/2π) ∫_{−∞}^{+∞} |G(ıω) − G_r(ıω)|² dω ]^{1/2}  =:  ||G − G_r||_{H2}.

We seek a reduced order dynamical system, G_r, such that
(i) ||G − G_r||_{H2}, the "H2 error," is as small as possible;
(ii) critical system properties for G (such as stability) exist also in G_r; and
(iii) the computation of G_r (i.e., the computation of A_r, b_r, and c_r) is both efficient and numerically stable.
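The chain of inequalities above can be checked numerically. The following sketch is not from the paper: the random test system, the crude truncation-based G_r, and the particular unit-energy input are all arbitrary choices made for illustration. It computes the H2 error norm with the Gramian formula that appears later as (2.15) and compares it against the peak output mismatch:

```python
import numpy as np
from scipy import signal
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, r = 30, 6

# A stable full-order SISO system G = (A, b, c); a symmetric negative definite
# A keeps every principal submatrix stable as well.
M = rng.standard_normal((n, n))
A = -(M @ M.T) / n - np.eye(n)
b = rng.standard_normal((n, 1))
c = rng.standard_normal((n, 1))

# A crude reduced model G_r: plain state truncation (for illustration only).
Ar, br, cr = A[:r, :r], b[:r], c[:r]

# H2 norm of the error system E = G - G_r via the Gramian formula
# ||E||_H2^2 = c_e^T P c_e with A_e P + P A_e^T + b_e b_e^T = 0 (cf. (2.15)).
Ae = np.block([[A, np.zeros((n, r))], [np.zeros((r, n)), Ar]])
be = np.vstack([b, br])
ce = np.vstack([c, -cr])
P = solve_continuous_lyapunov(Ae, -be @ be.T)
h2_err = np.sqrt((ce.T @ P @ ce).item())

# A unit-energy input u(t) = sqrt(2) e^{-t}, so that int_0^inf u(t)^2 dt = 1.
t = np.linspace(0.0, 40.0, 4001)
u = np.sqrt(2.0) * np.exp(-t)
_, y, _ = signal.lsim((A, b, c.T, np.zeros((1, 1))), u, t)
_, yr, _ = signal.lsim((Ar, br, cr.T, np.zeros((1, 1))), u, t)

peak_mismatch = float(np.max(np.abs(y - yr)))
print(peak_mismatch, h2_err)
```

As the derivation predicts, the peak output mismatch stays below the H2 error norm for this (and any) unit-energy input.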
The problem of finding reduced order models that yield a small H2 error has
been the object of many investigations; see, for instance, [6, 37, 34, 9, 21, 26, 22, 36,
25, 13] and the references therein. Finding a global minimizer of G − Gr H2 is a
hard task, so the goal in making G − Gr H2 “as small as possible” becomes, as for
many optimization problems, identification of reduced order models, Gr , that satisfy
first-order necessary conditions for local optimality. There is a wide variety of such
conditions that may be derived, yet their interconnections are generally unclear. Most
methods that can identify reduced order models satisfying such first-order necessary
conditions will require dense matrix operations, typically the solution of a sequence
of matrix Lyapunov equations, a task which becomes computationally intractable




rapidly as the dimension increases. Such methods are unsuitable even for medium-
scale problems. In section 2, we review the moment matching problem for model
reduction, its connection with rational Krylov methods (which are very useful for
large-scale problems), and basic features of the H2 norm and inner product.
We offer in section 3 what appears to be a new set of first-order necessary condi-
tions for local optimality of a reduced order model comprising in effect a structured
orthogonality condition. We also show its equivalence with two other H2 optimality
conditions that have been previously known (thus showing them all to be equivalent).
An iterative algorithm designed to force optimality with respect to a computationally tractable set of conditions is described in section 4; by the established equivalence, the method thereby enforces the other optimality conditions as well.
It is based on computationally effective use of rational Krylov subspaces and so is
suitable for systems whose dimension n is of the order of many thousands of state
variables. Numerical examples are presented in section 5.
2. Background.
2.1. Model reduction by moment matching. Given the system (1.1), reduc-
tion by moment matching consists in finding a system (1.2) so that Gr (s) interpolates
the values of G(s), and perhaps also derivative values as well, at selected interpolation
points (also called shifts) σk in the complex plane. For our purposes, simple Hermite
interpolation suffices, so our problem is to find Ar , br , and cr so that

G_r(σ_k) = G(σ_k)  and  G_r'(σ_k) = G'(σ_k)  for k = 1, . . . , r

or, equivalently,

c^T (σ_k I − A)^{-1} b = c_r^T (σ_k I − A_r)^{-1} b_r   and   c^T (σ_k I − A)^{-2} b = c_r^T (σ_k I − A_r)^{-2} b_r

for k = 1, . . . , r. The quantity c^T (σ_k I − A)^{-(j+1)} b is called the jth moment of G(s) at σ_k. Moment matching for finite σ ∈ C becomes rational interpolation; see, for
example, [3]. Importantly, these problems can be solved in a recursive and numerically
effective way by means of rational Lanczos/Arnoldi procedures.
To see this we first consider reduced order models that are constructed by Galerkin
approximation: Let Vr and Wr be given r-dimensional subspaces of Rn that are
generic in the sense that Vr ∩ Wr⊥ = {0}. Then for any input u(t) the reduced order
output yr (t) is defined by

(2.1)  Find v(t) ∈ V_r such that  v̇(t) − A v(t) − b u(t) ⊥ W_r  for all t;  then  y_r(t) := c^T v(t).

Denote by Ran(M) the range of a matrix M. Let Vr ∈ Rn×r and Wr ∈ Rn×r be


matrices defined so that Vr = Ran(Vr ) and Wr = Ran(Wr ). Then the assumption
Vr ∩Wr⊥ = {0} is equivalent to WrT Vr being nonsingular. The Galerkin approximation
(2.1) can be interpreted as v(t) = Vr xr (t) with xr (t) ∈ Rr for each t and

WrT (Vr ẋr (t) − AVr xr (t) − b u(t)) = 0

leading then to the reduced order model (1.2) with

(2.2) Ar = (WrT Vr )−1 WrT AVr , br = (WrT Vr )−1 WrT b, and cTr = cT Vr .

Evidently the choice of Vr and Wr determines the quality of the reduced order model.




Rational interpolation by projection was first proposed by Skelton et al. in [11, 38,
39]. Grimme [17] showed how one can obtain the required projection using the rational
Krylov method of Ruhe [33]. Krylov-based methods are able to match moments
without ever computing them explicitly. This is important since the computation of
moments is in general ill-conditioned. This is a fundamental motivation behind the
Krylov-based methods [12].
In Lemma 2.1 and Corollary 2.2 below, we present new short proofs of rational
interpolation by Krylov projection that are substantially simpler than those found in
the original works [17, 11, 38, 39].
Lemma 2.1. Suppose σ ∈ C is not an eigenvalue of either A or A_r.

(2.3)  If (σI − A)^{-1} b ∈ V_r, then G_r(σ) = G(σ).
(2.4)  If (σ̄I − A^T)^{-1} c ∈ W_r, then G_r(σ) = G(σ).
(2.5)  If both (σI − A)^{-1} b ∈ V_r and (σ̄I − A^T)^{-1} c ∈ W_r, then G_r(σ) = G(σ) and G_r'(σ) = G'(σ).

Proof. Define N_r(z) = V_r (zI − A_r)^{-1} (W_r^T V_r)^{-1} W_r^T (zI − A) and Ñ_r(z) = (zI − A) N_r(z) (zI − A)^{-1}. Both N_r(z) and Ñ_r(z) are analytic matrix-valued functions in a neighborhood of z = σ. One may directly verify that N_r²(z) = N_r(z) and Ñ_r²(z) = Ñ_r(z), that V_r = Ran N_r(z) = Ker(I − N_r(z)), and that W_r^⊥ = Ker Ñ_r(z) = Ran(I − Ñ_r(z)) for all z in a neighborhood of σ. Then

G(z) − G_r(z) = [(zI − A^T)^{-1} c]^T (I − Ñ_r(z)) (zI − A) (I − N_r(z)) (zI − A)^{-1} b.

Evaluating at z = σ leads to (2.3) and (2.4). Evaluating at z = σ + ε and observing that ((σ + ε)I − A)^{-1} = (σI − A)^{-1} − ε(σI − A)^{-2} + O(ε²) yields

G(σ + ε) − G_r(σ + ε) = O(ε²),

which gives (2.5) as a consequence.


Corollary 2.2. Consider the system G defined by A, b, c, a set of distinct shifts {σ_k}_{k=1}^r that is closed under conjugation (i.e., shifts are either real or occur in conjugate pairs), and subspaces spanned by the columns of V_r and W_r with

(2.6)  Ran(V_r) = span{ (σ_1 I − A)^{-1} b, . . . , (σ_r I − A)^{-1} b }  and
(2.7)  Ran(W_r) = span{ (σ_1 I − A^T)^{-1} c, . . . , (σ_r I − A^T)^{-1} c }.

Then V_r and W_r can be chosen to be real matrices, and the reduced order system G_r defined by A_r = (W_r^T V_r)^{-1} W_r^T A V_r, b_r = (W_r^T V_r)^{-1} W_r^T b, c_r^T = c^T V_r is itself real and matches the first two moments of G(s) at each of the interpolation points σ_k, i.e., G(σ_k) = G_r(σ_k) and G'(σ_k) = G_r'(σ_k) for k = 1, . . . , r.
For Krylov-based model reduction, one chooses interpolation points and then con-
structs Vr and Wr satisfying (2.6) and (2.7), respectively. Note that, in a numerical
implementation, one does not actually compute (σi I − A)−1 , but instead computes
a (potentially sparse) factorization (one for each interpolation point σi ), uses it to
solve a system of equations having b as a right-hand side, and uses its transpose to
solve a system of equations having c as a right-hand side. The interpolation points
are chosen so as to minimize the deviation of Gr from G in a sense that is detailed in
the next section. Unlike Gramian-based model reduction methods such as balanced
truncation (see section 2.2 below and [2]), Krylov-based model reduction requires only




matrix-vector multiplications and some sparse linear solvers, and can be iteratively
implemented; hence it is computationally effective; for details, see also [15, 16].
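The construction of Corollary 2.2 amounts to a few lines of linear algebra. The sketch below is an assumed setup, not from the paper: a random stable test system, dense solves standing in for the sparse factorizations just described, and arbitrary real shifts (hence trivially closed under conjugation). It forms V_r and W_r as in (2.6)–(2.7), projects as in (2.2), and checks the Hermite interpolation property:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 40, 4
M = rng.standard_normal((n, n))
A = -(M @ M.T) / n - np.eye(n)     # stable test matrix
b = rng.standard_normal(n)
c = rng.standard_normal(n)
I = np.eye(n)

def G(s):
    return c @ np.linalg.solve(s * I - A, b)

# Real shifts; one solve with b and one transposed solve with c per shift,
# as in (2.6) and (2.7).
sigma = np.array([0.5, 1.0, 2.0, 4.0])
Vr = np.column_stack([np.linalg.solve(s * I - A, b) for s in sigma])
Wr = np.column_stack([np.linalg.solve(s * I - A.T, c) for s in sigma])

# Petrov-Galerkin projection (2.2).
E = Wr.T @ Vr
Ar = np.linalg.solve(E, Wr.T @ A @ Vr)
br = np.linalg.solve(E, Wr.T @ b)
cr = Vr.T @ c

def Gr(s):
    return cr @ np.linalg.solve(s * np.eye(r) - Ar, br)

# Hermite interpolation: values and first derivatives (checked here with
# central differences) of G and G_r agree at every shift.
h = 1e-5
val_err = max(abs(G(s) - Gr(s)) / abs(G(s)) for s in sigma)
der_err = max(abs((G(s + h) - G(s - h)) - (Gr(s + h) - Gr(s - h))) / (2 * h)
              for s in sigma)
print(val_err, der_err)
```

Note that the moments themselves are never formed; matching is a by-product of the projection, which is the numerical point made above.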
2.2. Model reduction by balanced truncation. One of the most common


model reduction techniques is balanced truncation [28, 27]. In this case, the modeling
subspaces Vr and Wr depend on the solutions to the two Lyapunov equations

(2.8)  A P + P A^T + b b^T = 0,   A^T Q + Q A + c c^T = 0.

P and Q are called the reachability and observability Gramians, respectively. Under
the assumption that A is stable, both P and Q are positive semidefinite matrices.
Square roots of the eigenvalues of the product PQ are the singular values of the
Hankel operator associated with G(s) and are called the Hankel singular values of
G(s), denoted by ηi (G).
Let P = UU^T and Q = LL^T. Let U^T L = ZSY^T be the singular value decomposition with S = diag(η_1, η_2, . . . , η_n). Let S_r = diag(η_1, η_2, . . . , η_r), r < n. Construct

(2.9)  W_r = L Y_r S_r^{-1/2}  and  V_r = U Z_r S_r^{-1/2},

where Z_r and Y_r denote the leading r columns of left singular vectors, Z, and right singular vectors, Y, respectively. The rth-order reduced order model via balanced truncation, G_r(s), is obtained by reducing G(s) using W_r and V_r from (2.9).
Another important dynamical systems norm (besides the H2 norm) is the H∞ norm, defined as ||G||_{H∞} := sup_{ω∈R} |G(ıω)|. The reduced order system G_r(s) obtained by balanced truncation is asymptotically stable, and the H∞ norm of the error system satisfies ||G − G_r||_{H∞} ≤ 2(η_{r+1} + · · · + η_n).
The value of having, for reduced order models, guaranteed stability and an explicit
error bound is widely recognized, though it is achieved at potentially considerable
cost. As described above, balanced truncation requires the solution of two Lyapunov
equations of order n, which is a formidable task in large-scale settings. For more
details and background on balanced truncation, see section III.7 of [2].
2.3. The H2 norm. H2 will denote the set of functions, g(z), that are analytic for z in the open right half plane, Re(z) > 0, and such that for each fixed Re(z) = x > 0, g(x + ıy) is square integrable as a function of y ∈ (−∞, ∞) in such a way that

sup_{x>0} ∫_{−∞}^{∞} |g(x + ıy)|² dy < ∞.

H2 is a Hilbert space and holds our interest because transfer functions associated with
stable SISO finite-dimensional dynamical systems are elements of H2 . Indeed, if G(s)
and H(s) are transfer functions associated with real stable SISO dynamical systems,
then the H2 inner product can be defined as

(2.10)  ⟨G, H⟩_{H2} := (1/2π) ∫_{−∞}^{∞} \overline{G(ıω)} H(ıω) dω = (1/2π) ∫_{−∞}^{∞} G(−ıω)H(ıω) dω,

with a norm defined as

(2.11)  ||G||_{H2} := [ (1/2π) ∫_{−∞}^{+∞} |G(ıω)|² dω ]^{1/2}.

Notice in particular that if G(s) and H(s) represent real dynamical systems, then ⟨G, H⟩_{H2} = ⟨H, G⟩_{H2} and ⟨G, H⟩_{H2} must be real.




There are two alternate characterizations of this inner product that make it far
more computationally accessible.
Lemma 2.3. Suppose A ∈ R^{n×n} and B ∈ R^{m×m} are stable and, given b, c ∈ R^n and b̂, ĉ ∈ R^m, define associated transfer functions

G(s) = c^T (sI − A)^{-1} b  and  H(s) = ĉ^T (sI − B)^{-1} b̂.

The inner product ⟨G, H⟩_{H2} is associated with solutions to Sylvester equations as follows:

(2.12)  If P solves A P + P B^T + b b̂^T = 0, then ⟨G, H⟩_{H2} = c^T P ĉ.
(2.13)  If Q solves Q A + B^T Q + ĉ c^T = 0, then ⟨G, H⟩_{H2} = b̂^T Q b.
(2.14)  If R solves A R + R B + b ĉ^T = 0, then ⟨G, H⟩_{H2} = c^T R b̂.

Note that if A = B, b = b̂, and c = ĉ, then P is the "reachability Gramian" of G(s), Q is the "observability Gramian" of G(s), and R is the "cross Gramian" of G(s); and

(2.15)  ||G||²_{H2} = c^T P c = b^T Q b = c^T R b.

Gramians play a prominent role in the analysis of linear dynamical systems; refer
to [2] for more information.
Proof. We detail the proof of (2.12); proofs of (2.13) and (2.14) are similar. Since A and B are stable, the solution, P, to the Sylvester equation of (2.12) exists and is unique. For any ω ∈ R, rearrange this equation to obtain in sequence

(−ıωI − A) P + P (ıωI − B^T) − b b̂^T = 0,

(−ıωI − A)^{-1} P + P (ıωI − B^T)^{-1} = (−ıωI − A)^{-1} b b̂^T (ıωI − B^T)^{-1},

c^T (−ıωI − A)^{-1} P ĉ + c^T P (ıωI − B^T)^{-1} ĉ = G(−ıω)H(ıω),

and finally

c^T [ ∫_{−L}^{L} (−ıωI − A)^{-1} dω ] P ĉ + c^T P [ ∫_{−L}^{L} (ıωI − B^T)^{-1} dω ] ĉ = ∫_{−L}^{L} G(−ıω)H(ıω) dω.

Taking L → ∞ and using Lemma A.1 in the appendix leads to

∫_{−∞}^{∞} G(−ıω)H(ıω) dω = c^T [ P.V. ∫_{−∞}^{∞} (−ıωI − A)^{-1} dω ] P ĉ + c^T P [ P.V. ∫_{−∞}^{∞} (ıωI − B^T)^{-1} dω ] ĉ = 2π c^T P ĉ.
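Lemma 2.3 is easy to sanity-check numerically. The sketch below uses random stable A and B of different sizes (the hatted vectors b̂, ĉ are rendered as bh, ch); it solves the three Sylvester equations (2.12)–(2.14) with scipy.linalg.solve_sylvester, which solves AX + XB = Q, and compares all three values of ⟨G, H⟩ with a direct quadrature of the defining integral (2.10):

```python
import numpy as np
from scipy.linalg import solve_sylvester
from scipy.integrate import quad

rng = np.random.default_rng(3)
n, m = 7, 5
MA = rng.standard_normal((n, n)); A = -(MA @ MA.T) / n - np.eye(n)
MB = rng.standard_normal((m, m)); B = -(MB @ MB.T) / m - np.eye(m)
b, c = rng.standard_normal(n), rng.standard_normal(n)
bh, ch = rng.standard_normal(m), rng.standard_normal(m)   # b-hat, c-hat

# (2.12):  AP + PB^T + b bh^T = 0   gives  <G,H> = c^T P ch.
P = solve_sylvester(A, B.T, -np.outer(b, bh))
ip_P = c @ P @ ch

# (2.13):  QA + B^T Q + ch c^T = 0  gives  <G,H> = bh^T Q b.
Q = solve_sylvester(B.T, A, -np.outer(ch, c))
ip_Q = bh @ Q @ b

# (2.14):  AR + RB + b ch^T = 0     gives  <G,H> = c^T R bh.
R = solve_sylvester(A, B, -np.outer(b, ch))
ip_R = c @ R @ bh

# Direct evaluation of (2.10); by conjugate symmetry of the integrand for real
# systems, (1/2pi) int_{-inf}^{inf} = (1/pi) int_0^inf Re(...).
def integrand(w):
    Gw = c @ np.linalg.solve(-1j * w * np.eye(n) - A, b)     # G(-iw)
    Hw = ch @ np.linalg.solve(1j * w * np.eye(m) - B, bh)    # H(iw)
    return (Gw * Hw).real / np.pi

ip_quad, _ = quad(integrand, 0.0, np.inf, limit=200)
print(ip_P, ip_Q, ip_R, ip_quad)
```

All four values agree, which is what makes the Sylvester characterizations "computationally accessible": a dense linear-algebra solve replaces an improper integral.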

Recently, Antoulas [2] obtained a new expression for ||G||_{H2} based on the poles and residues of the transfer function G(s) that complements the widely known alternative expression (2.15). We provide a compact derivation of this expression and the associated H2 inner product.




If f(s) is a meromorphic function with a pole at λ, denote the residue of f(s) at λ by res[f(s), λ]. Thus, if λ is a simple pole of f(s), then res[f(s), λ] = lim_{s→λ} (s − λ)f(s), and if λ is a double pole of f(s), then res[f(s), λ] = lim_{s→λ} (d/ds)[(s − λ)² f(s)].


Lemma 2.4. Suppose that G(s) has poles at λ_1, λ_2, . . . , λ_n and H(s) has poles at μ_1, μ_2, . . . , μ_m, both sets contained in the open left half plane. Then

(2.16)  ⟨G, H⟩_{H2} = Σ_{k=1}^{m} res[G(−s)H(s), μ_k] = Σ_{k=1}^{n} res[H(−s)G(s), λ_k].

In particular,
• if μ_k is a simple pole of H(s), then res[G(−s)H(s), μ_k] = G(−μ_k) res[H(s), μ_k];
• if μ_k is a double pole of H(s), then res[G(−s)H(s), μ_k] = G(−μ_k) res[H(s), μ_k] − G'(−μ_k) · h_0(μ_k), where h_0(μ_k) = lim_{s→μ_k} (s − μ_k)² H(s).
Proof. Notice that the function G(−s)H(s) has singularities at μ_1, μ_2, . . . , μ_m and −λ_1, −λ_2, . . . , −λ_n. For any R > 0, define the semicircular contour in the left half plane:

Γ_R = { z | z = ıω with ω ∈ [−R, R] } ∪ { z | z = R e^{ıθ} with θ ∈ [π/2, 3π/2] }.

Γ_R bounds a region that for sufficiently large R contains all the system poles of H(s) and so, by the residue theorem,

⟨G, H⟩_{H2} = (1/2π) ∫_{−∞}^{+∞} G(−ıω)H(ıω) dω = lim_{R→∞} (1/2πı) ∮_{Γ_R} G(−s)H(s) ds = Σ_{k=1}^{m} res[G(−s)H(s), μ_k].

Evidently, if μ_k is a simple pole of H(s), it is also a simple pole of G(−s)H(s) and

res[G(−s)H(s), μ_k] = lim_{s→μ_k} (s − μ_k)G(−s)H(s) = G(−μ_k) lim_{s→μ_k} (s − μ_k)H(s).

If μ_k is a double pole of H(s), then it is also a double pole of G(−s)H(s) and

res[G(−s)H(s), μ_k] = lim_{s→μ_k} (d/ds)[(s − μ_k)² G(−s)H(s)]
  = lim_{s→μ_k} { G(−s) (d/ds)[(s − μ_k)² H(s)] − G'(−s)(s − μ_k)² H(s) }
  = G(−μ_k) lim_{s→μ_k} (d/ds)[(s − μ_k)² H(s)] − G'(−μ_k) lim_{s→μ_k} (s − μ_k)² H(s).

Lemma 2.4 immediately yields the expression for ||G||_{H2} given by Antoulas [2, p. 145] based on the poles and residues of the transfer function G(s).

Corollary 2.5. If G(s) has simple poles at λ_1, λ_2, . . . , λ_n, then

||G||_{H2} = [ Σ_{k=1}^{n} res[G(s), λ_k] G(−λ_k) ]^{1/2}.
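Corollary 2.5 can be checked against the Gramian characterization (2.15). In the sketch below, an arbitrary symmetric stable test matrix is assumed (so the poles are simple and real), and the residues come from the eigendecomposition A = XΛX^{-1}, under which G(s) = Σ_k φ_k/(s − λ_k) with φ_k = (c^T x_k)(X^{-1}b)_k:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(4)
n = 8
M = rng.standard_normal((n, n))
A = -(M @ M.T) / n - np.eye(n)     # symmetric: simple real poles (generically)
b = rng.standard_normal(n)
c = rng.standard_normal(n)
I = np.eye(n)

def G(s):
    return c @ np.linalg.solve(s * I - A, b)

# Poles lam_k and residues phi_k = res[G(s), lam_k] from A = X Lam X^{-1}.
lam, X = np.linalg.eig(A)
phi = (c @ X) * np.linalg.solve(X, b)

# Corollary 2.5:  ||G||_H2 = ( sum_k res[G, lam_k] * G(-lam_k) )^{1/2}.
h2_residue = np.sqrt(np.sum(phi * np.array([G(-l) for l in lam])).real)

# Gramian formula (2.15):  ||G||_H2^2 = c^T P c,  with  AP + PA^T + bb^T = 0.
P = solve_continuous_lyapunov(A, -np.outer(b, b))
h2_gramian = np.sqrt(c @ P @ c)
print(h2_residue, h2_gramian)
```

The residue formula needs only the poles of G and point evaluations of G at their mirror images, which is exactly the structure exploited by the interpolation framework of section 3.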




3. Optimal H2 model reduction. In this section, we investigate three frameworks of necessary conditions for H2 optimality. The first utilizes the inner product structure of H2 and leads to what could be thought of as a geometric condition for optimality. This appears to be a new characterization of H2 optimality for reduced order models. The remaining two frameworks, interpolation-based [26] and Lyapunov-based [36, 22], are easily derived from the first framework and in this way can be seen to be equivalent to one another, a fact that is not a priori evident. This equivalence proves that solving the optimal H2 problem in the Krylov framework is equivalent to solving it in the Lyapunov framework, which leads to the proposed Krylov-based method for H2 model reduction in section 4.
Given G, a stable SISO finite-dimensional dynamical system as described in (1.1), we seek a stable reduced order system G_r of order r as described in (1.2), which is the best stable rth-order dynamical system approximating G with respect to the H2 norm:

(3.1)  ||G − G_r||_{H2} = min_{dim(G̃_r) = r, G̃_r stable} ||G − G̃_r||_{H2}.

Many researchers have worked on problem (3.1), the optimal H2 model reduction problem. See [37, 34, 9, 21, 26, 22, 36, 25] and the references therein.
3.1. Structured orthogonality optimality conditions. The set of all stable rth-order dynamical systems does not constitute a subspace of H2, so the best rth-order H2 approximation is not so easy to characterize, the Hilbert space structure of H2 notwithstanding. This observation does suggest the following narrower though simpler result.

Theorem 3.1. Let {μ_1, μ_2, . . . , μ_r} ⊂ C be distinct points in the open left half plane and define M(μ) to be the set of all proper rational functions that have simple poles exactly at μ_1, μ_2, . . . , μ_r. Then
• H ∈ M(μ) implies that H is the transfer function of a stable dynamical system with dim(H) = r;
• M(μ) is an r-dimensional subspace of H2;
• G_r ∈ M(μ) solves

(3.2)  ||G − G_r||_{H2} = min_{G̃_r ∈ M(μ)} ||G − G̃_r||_{H2}

if and only if

(3.3)  ⟨G − G_r, H⟩_{H2} = 0  for all H ∈ M(μ).

Furthermore the solution, G_r, to (3.2) exists and is unique.

Proof. The key observation is that M(μ) is a closed subspace of H2. Then the equivalence of (3.2) and (3.3) follows from the classic projection theorem in Hilbert space (cf. [32]).
One consequence of Theorem 3.1 is that if G_r(s) interpolates a real system G(s) at the mirror images of its own poles (i.e., at the poles of G_r(s) reflected across the imaginary axis), then G_r(s) is guaranteed to be an optimal approximation of G(s) relative to the H2 norm among all reduced order systems having the same reduced system poles {μ_i}_{i=1}^r. An analogous result for optimal rational approximants to analytic functions on the unit disk can be found in [14]. The set of stable rth-order dynamical systems is not convex, and so the original problem (3.1) allows for



multiple minimizers. Indeed there may be "local minimizers" that do not solve (3.1). A reduced order system, G_r, is a local minimizer for (3.1) if, for all ε > 0 sufficiently small,

(3.4)  ||G − G_r||_{H2} ≤ ||G − G̃_r^{(ε)}||_{H2}

for all stable dynamical systems G̃_r^{(ε)} with dim(G̃_r^{(ε)}) = r and ||G_r − G̃_r^{(ε)}||_{H2} ≤ Cε, with C being a constant that may depend on the particular family G̃_r^{(ε)} considered.
As a practical matter, the global minimizers that solve (3.1) are difficult to obtain
with certainty; current approaches favor seeking reduced order models that satisfy a
local (first-order) necessary condition for optimality. Even though such strategies do
not guarantee global minimizers, they often produce effective reduced order models
nonetheless. In this spirit, we give necessary conditions for optimality for the reduced
order system, Gr , that appear as structured orthogonality conditions similar to (3.3).
Theorem 3.2. If G_r is a local minimizer of (3.1) as described in (3.4) and G_r has simple poles, then

(3.5)  ⟨G − G_r, G_r · H_1 + H_2⟩_{H2} = 0

for all real dynamical systems H_1 and H_2 having the same poles with the same multiplicities as G_r. (G_r · H_1 here denotes pointwise multiplication of scalar functions.)

Proof. Theorem 3.1 implies (3.5) with H_1 = 0, so it suffices to show that the hypotheses imply that ⟨G − G_r, G_r · H⟩_{H2} = 0 for all real dynamical systems H having the same poles with the same multiplicities as G_r.

Suppose that {G̃_r^{(ε)}}_{ε>0} is a family of real stable dynamical systems with dim(G̃_r^{(ε)}) = r and ||G_r − G̃_r^{(ε)}||_{H2} < Cε for some constant C > 0. Then for all ε > 0 sufficiently small,

||G − G_r||²_{H2} ≤ ||G − G̃_r^{(ε)}||²_{H2} = ||(G − G_r) + (G_r − G̃_r^{(ε)})||²_{H2}
  = ||G − G_r||²_{H2} + 2⟨G − G_r, G_r − G̃_r^{(ε)}⟩_{H2} + ||G_r − G̃_r^{(ε)}||²_{H2}.

This in turn implies for all ε > 0 sufficiently small that

(3.6)  0 ≤ 2⟨G − G_r, G_r − G̃_r^{(ε)}⟩_{H2} + ||G_r − G̃_r^{(ε)}||²_{H2}.

By considering a few different "directions of approach" of G̃_r^{(ε)} to G_r as ε → 0, (3.6) will lead to a few different necessary conditions for G_r to be a locally optimal reduced order model. Denote the poles of G_r as μ_1, μ_2, . . . , μ_r and suppose they are ordered so that the first m_R are real and the next m_C are in the upper half plane. Write μ_i = α_i + ıβ_i. Any real rational function having the same poles as G_r(s) can be written as

H(s) = Σ_{i=1}^{m_R} γ_i / (s − μ_i) + Σ_{i=m_R+1}^{m_R+m_C} [ρ_i (s − α_i) + τ_i] / [(s − α_i)² + β_i²],

with arbitrary real-valued choices for γ_i, ρ_i, and τ_i. Now suppose that μ is a real pole of G_r and that

(3.7)  ⟨G − G_r, G_r(s)/(s − μ)⟩_{H2} ≠ 0.




Write G_r(s) = p_{r−1}(s) / [(s − μ) q_{r−1}(s)] for real polynomials p_{r−1}, q_{r−1} ∈ P_{r−1} and define

G̃_r^{(ε)}(s) = p_{r−1}(s) / ([s − μ − (±ε)] q_{r−1}(s)),

where the sign of ±ε is chosen to match that of ⟨G − G_r, G_r(s)/(s − μ)⟩_{H2}. Then we have

G̃_r^{(ε)}(s) = G_r(s) ± ε p_{r−1}(s) / [(s − μ)² q_{r−1}(s)] + O(ε²),

which leads to G_r(s) − G̃_r^{(ε)}(s) = ∓ε G_r(s)/(s − μ) + O(ε²) and

(3.8)  ⟨G − G_r, G_r − G̃_r^{(ε)}⟩_{H2} = −ε |⟨G − G_r, G_r(s)/(s − μ)⟩_{H2}| + O(ε²).

Then (3.6) implies that |⟨G − G_r, G_r(s)/(s − μ)⟩_{H2}| ≤ Cε for some constant C and all ε > 0 sufficiently small, forcing this inner product to vanish, which contradicts (3.7). Hence ⟨G − G_r, G_r(s)/(s − μ)⟩_{H2} = 0 for every real pole μ of G_r.
Now suppose that μ = α + ıβ is a pole of G_r with a nontrivial imaginary part, β ≠ 0, and so is one of a conjugate pair of poles of G_r. It remains to show that

(3.9)  ⟨G − G_r, G_r(s)/[(s − α)² + β²]⟩_{H2} = 0  and  ⟨G − G_r, (s − α)G_r(s)/[(s − α)² + β²]⟩_{H2} = 0.

Write G_r(s) = p_{r−1}(s) / ([(s − α)² + β²] q_{r−2}(s)) for some choice of real polynomials p_{r−1} ∈ P_{r−1} and q_{r−2} ∈ P_{r−2}. Arguments exactly analogous to the previous case lead to these remaining assertions. In particular,

to show ⟨G − G_r, G_r(s)/[(s − α)² + β²]⟩_{H2} = 0, consider G̃_r^{(ε)}(s) = p_{r−1}(s) / ([(s − α)² + β² − (±ε)] q_{r−2}(s));

to show ⟨G − G_r, (s − α)G_r(s)/[(s − α)² + β²]⟩_{H2} = 0, consider G̃_r^{(ε)}(s) = p_{r−1}(s) / ([(s − α − (±ε))² + β²] q_{r−2}(s)).

The conclusion follows then by observing that if G_r is a locally optimal H2 reduced order model, then

⟨G − G_r, G_r · H_1 + H_2⟩_{H2} = Σ_{i=1}^{m_R} γ_i ⟨G − G_r, G_r(s)/(s − μ_i)⟩_{H2}
  + Σ_{i=m_R+1}^{m_R+m_C} ρ_i ⟨G − G_r, (s − α_i)G_r(s)/[(s − α_i)² + β_i²]⟩_{H2}
  + Σ_{i=m_R+1}^{m_R+m_C} τ_i ⟨G − G_r, G_r(s)/[(s − α_i)² + β_i²]⟩_{H2}
  + ⟨G − G_r, H_2(s)⟩_{H2} = 0.



Theorem 3.2 describes new necessary conditions for the H2 approximation problem as structured orthogonality conditions. This new formulation amounts to a unifying framework for the optimal H2 problem. Indeed, as we show in sections 3.2 and 3.3, two other known optimality frameworks, namely, the interpolatory [26] and Lyapunov-based conditions [36, 22], can be directly obtained from our new conditions by using an appropriate form for the H2 inner product. The interpolatory framework uses the residue formulation of the H2 inner product as in (2.16); the Lyapunov framework uses the Sylvester equation formulation of the H2 norm as in (2.12).
3.2. Interpolation-based optimality conditions. Corollary 2.5 immediately
yields an observation regarding the H2 norm of the error system, which serves as a
main motivation for the interpolation framework of the optimal H2 problem.
Proposition 3.3. Given the full-order model G(s) and a reduced order model
Gr (s), let λi and λ i be the poles of G(s) and Gr (s), respectively, and suppose that
the poles of Gr (s) are distinct. Let φi and φ j denote the residues of the transfer
functions G(s) and Gr (s) at their poles λi and λ i , respectively: φi = res[G(s), λi ] for

i = 1, . . . , n and φj = res[Gr (s), λj ] for j = 1, . . . , r. The H2 norm of the error system
is given by

2

n
G − Gr H2 = res[(G(−s) − Gr (−s)) (G(s) − Gr (s)) , λi ]
i=1

r
+ j ]
res[(G(−s) − Gr (−s)) (G(s) − Gr (s)) , λ
j=1

n   r  
(3.10) = φi G(−λi ) − Gr (−λi ) − j ) − Gr (−λ
φ j G(−λ j ) .
i=1 j=1

The H2 error expression (3.10) is valid for any reduced order model regardless of the underlying reduction technique and generalizes a result of [20, 18] to the most general setting.
Proposition 3.3 has the system-theoretic interpretation that the H2 error is due to the mismatch between the transfer functions G(s) and Gr(s) at the mirror images of the full-order poles λi and the reduced order poles λ̂j. This expression reveals that, for good H2 performance, Gr(s) should approximate G(s) well at −λi and −λ̂j. Note that λ̂j is not known a priori. Therefore, to minimize the H2 error, Gugercin and Antoulas [20] proposed choosing σi = −λi(A), where the λi(A) are those system poles having large residues φi. They illustrated that this selection of interpolation points works quite well; see [18, 20]. However, as (3.10) shows, there is a second component of the H2 error, due to the mismatch at −λ̂j. Indeed, as we show below, interpolation at −λ̂i is the more important requirement for model reduction and is a necessary condition for optimal H2 model reduction; i.e., σi = −λ̂i is the optimal shift selection.
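The two-sided structure of the error expression (3.10) is easy to verify numerically. The following sketch (an illustration added here, not part of the original text; it assumes NumPy/SciPy and uses random diagonal SISO test systems so that the residues φi = ci·bi can be read off directly) compares (3.10) against the Gramian-based value ‖Gerr‖²_{H2} = c_err^T P_err c_err:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)

def random_siso(n):
    # Diagonal A with distinct stable poles, so res[G, lam_i] = c_i * b_i.
    A = np.diag(-rng.uniform(0.5, 5.0, n))
    return A, rng.standard_normal(n), rng.standard_normal(n)

def tf(A, b, c, s):
    return c @ np.linalg.solve(s * np.eye(A.shape[0]) - A, b)

A, b, c = random_siso(6)      # "full" model
Ar, br, cr = random_siso(2)   # an arbitrary (non-optimal) reduced model

# Gramian route: ||G - Gr||^2 = cerr^T Perr cerr with
# Aerr Perr + Perr Aerr^T + berr berr^T = 0
Aerr = np.block([[A, np.zeros((6, 2))], [np.zeros((2, 6)), Ar]])
berr = np.concatenate([b, br])
cerr = np.concatenate([c, -cr])
Perr = solve_continuous_lyapunov(Aerr, -np.outer(berr, berr))
h2sq_gram = cerr @ Perr @ cerr

# Pole-residue route, expression (3.10)
err = lambda s: tf(A, b, c, s) - tf(Ar, br, cr, s)
h2sq_res = sum(ci * bi * err(-li) for li, bi, ci in zip(np.diag(A), b, c)) \
         - sum(ci * bi * err(-li) for li, bi, ci in zip(np.diag(Ar), br, cr))

assert abs(h2sq_gram - h2sq_res) < 1e-8 * abs(h2sq_gram)
```

Because both routes compute the same quantity exactly, agreement holds to machine precision for any (not necessarily optimal) reduced model.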
Theorem 3.4. Given a stable SISO system G(s) = c^T(sI − A)^{−1}b, let Gr(s) = cr^T(sI − Ar)^{−1}br be a local minimizer of dimension r for the optimal H2 model reduction problem (3.1), and suppose that Gr(s) has simple poles at λ̂i, i = 1, …, r. Then Gr(s) interpolates both G(s) and its first derivative at −λ̂i, i = 1, …, r:

(3.11) \[ G_r(-\hat\lambda_i) = G(-\hat\lambda_i) \quad\text{and}\quad G_r'(-\hat\lambda_i) = G'(-\hat\lambda_i) \quad\text{for } i = 1, \ldots, r. \]
Proof. From (3.5), consider first the case where H1 = 0 and H2 is an arbitrary transfer function with simple poles at λ̂i, i = 1, …, r. Denote φ̂i = res[H2(s), λ̂i]. Then (2.16)
leads to

\[
\langle G - G_r,\, H_2 \rangle_{H_2} = \sum_{i=1}^{r} \mathrm{res}\bigl[(G(-s) - G_r(-s))\,H_2(s),\, \hat\lambda_i\bigr] = \sum_{i=1}^{r} \hat\phi_i \bigl( G(-\hat\lambda_i) - G_r(-\hat\lambda_i) \bigr) = 0.
\]

Since this is true for arbitrary choices of φ̂i, we have G(−λ̂i) = Gr(−λ̂i). Now consider the case H2 = 0 and H1 an arbitrary transfer function with simple poles at λ̂i, i = 1, …, r. Then Gr(s)H1(s) has double poles at λ̂i, i = 1, …, r, and since G(−λ̂i) = Gr(−λ̂i) we have

\[
\langle G - G_r,\, G_r \cdot H_1 \rangle_{H_2} = \sum_{i=1}^{r} \mathrm{res}\bigl[(G(-s) - G_r(-s))\,G_r(s)H_1(s),\, \hat\lambda_i\bigr] = -\sum_{i=1}^{r} \hat\phi_i\, \mathrm{res}[G_r, \hat\lambda_i]\,\bigl( G'(-\hat\lambda_i) - G_r'(-\hat\lambda_i) \bigr) = 0,
\]

where we have calculated

\[
\lim_{s\to\hat\lambda_i} (s - \hat\lambda_i)^2\, G_r(s)\cdot H_1(s) = \mathrm{res}[H_1(s), \hat\lambda_i]\cdot \mathrm{res}[G_r(s), \hat\lambda_i] = \hat\phi_i\, \mathrm{res}[G_r, \hat\lambda_i].
\]

We refer to the first-order conditions (3.11) as the Meier–Luenberger conditions, recognizing the work of [26], although we have here obtained them directly from the newly derived structured orthogonality conditions (3.5).
In Theorem 3.4 we assume that the reduced order poles (eigenvalues of Ar) are simple; analogous results for the case that Gr has a higher order pole are straightforward and correspond to interpolation conditions on higher derivatives at the mirror images of the reduced order poles.
3.2.1. Multiple input/multiple output systems. Many of these considerations extend naturally to the multiple input/multiple output (MIMO) setting:

(3.12) \[ G:\quad \dot{x}(t) = Ax(t) + Bu(t), \quad y(t) = Cx(t), \qquad\text{or}\qquad G(s) = C(sI - A)^{-1}B, \]

where the state vector x(t) ∈ R^n as before, but now the system has an input vector u(t) ∈ R^m and an output vector y(t) ∈ R^p, so that B ∈ R^{n×m} and C ∈ R^{p×n} for some m, p ≥ 1. The transfer function G(s) in (3.12) becomes matrix valued. A reduced order system analogous to (1.2) is sought with the same number of inputs m and outputs p, but with lower state space dimension r ≪ n. If Vr ∈ R^{n×r} and Wr ∈ R^{n×r} are such that Wr^T Vr is nonsingular, we can define a (matrix-valued) reduced order transfer function Gr(s) = Cr(sI − Ar)^{−1}Br with

\[ A_r = (W_r^T V_r)^{-1} W_r^T A V_r, \qquad B_r = (W_r^T V_r)^{-1} W_r^T B, \qquad\text{and}\qquad C_r = C V_r. \]
In order to assess "closeness" of MIMO systems, there is a natural extension of the Hilbert space H2 to p × m matrix-valued functions. In particular, if G(s) and H(s) are p × m matrix-valued transfer functions associated with real stable MIMO dynamical systems, then the associated H2 inner product is

(3.13) \[ \langle G, H \rangle_{H_2} := \frac{1}{2\pi}\int_{-\infty}^{\infty} \mathrm{tr}\bigl(G(\imath\omega)\,\overline{H(\imath\omega)}^{\,T}\bigr)\, d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty} \mathrm{tr}\bigl(G(-\imath\omega)\,H^T(\imath\omega)\bigr)\, d\omega,
\]

where tr(M) denotes the trace of the matrix M. The H2 norm is then

(3.14) \[ \|G\|_{H_2} := \left( \frac{1}{2\pi}\int_{-\infty}^{+\infty} \|G(\imath\omega)\|_F^2\, d\omega \right)^{1/2}, \]

where \( \|F\|_F := \bigl(\sum_{ij} |F_{ij}|^2\bigr)^{1/2} \) denotes the usual Frobenius matrix norm. As before, if G(s) and H(s) represent real dynamical systems, then ⟨G, H⟩_{H2} = ⟨H, G⟩_{H2} and ⟨G, H⟩_{H2} is real.
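As a quick consistency check of definition (3.14) (an added illustration, not from the paper; it assumes NumPy/SciPy and a randomly generated stable MIMO model), one can compare the frequency-domain definition of ‖G‖_{H2} with the standard Gramian characterization ‖G‖²_{H2} = tr(C P C^T), where P solves AP + PA^T + BB^T = 0:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov
from scipy.integrate import quad

rng = np.random.default_rng(1)
n, m, p = 5, 2, 3
A = np.diag(-rng.uniform(1.0, 6.0, n))   # diagonal, hence stable by construction
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

# Gramian route: ||G||_H2^2 = tr(C P C^T), where A P + P A^T + B B^T = 0
P = solve_continuous_lyapunov(A, -B @ B.T)
h2sq_gram = np.trace(C @ P @ C.T)

# Frequency-domain route, definition (3.14)
def integrand(w):
    G = C @ np.linalg.solve(1j * w * np.eye(n) - A, B)
    return np.linalg.norm(G, 'fro') ** 2 / (2 * np.pi)

h2sq_freq = quad(integrand, -np.inf, np.inf, limit=500)[0]
assert abs(h2sq_gram - h2sq_freq) < 1e-5 * h2sq_gram
```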
Necessary conditions for H2 optimality built on structured orthogonality, paralleling the results of section 3.1, can be derived in this setting as well. In particular, the residue form for the inner product is a straightforward analogue of Lemma 2.4 and leads naturally to interpolation conditions. If F(s) is a matrix-valued meromorphic function with a pole at λ, then F(s) has a Laurent expansion (with matrix coefficients), and its residue, res[F(s), λ], is the coefficient matrix associated with the expansion term (s − λ)^{−1}. For example, suppose that F(s) has the realization F(s) = C(sI − A)^{−1}B. If λ is a simple pole of F(s), then we can assume that λ is a simple eigenvalue of A associated with a rank-1 spectral projector Eλ; then

\[ (sI - A)^{-1} = \frac{1}{s-\lambda}\,E_\lambda + D(s), \]

where D(s) is analytic at s = λ, and res[F(s), λ] = lim_{s→λ}(s − λ)F(s) = C Eλ B. If λ is a double pole, then we can assume that λ is a double eigenvalue of A associated with a rank-2 spectral projector Eλ and a rank-1 nilpotent matrix Nλ such that AEλ = λEλ + Nλ. Then

\[ (sI - A)^{-1} = \frac{1}{(s-\lambda)^2}\,N_\lambda + \frac{1}{s-\lambda}\,E_\lambda + D(s), \]

where D(s) is analytic at s = λ, and so res[F(s), λ] = lim_{s→λ} d/ds[(s − λ)² F(s)] = C Eλ B.

Lemma 3.5. Suppose that G(s) has poles at λ1, λ2, …, λn and H(s) has poles at λ̂1, λ̂2, …, λ̂ñ, with both sets contained in the open left half plane. Then

(3.15) \[ \langle G, H \rangle_{H_2} = \sum_{k=1}^{\tilde{n}} \mathrm{tr}\left( \mathrm{res}\bigl[G(-s)H^T(s),\, \hat\lambda_k\bigr] \right). \]

In particular, suppose H(s) has a realization H(s) = C̃(sI − Ã)^{−1}B̃.
• If λ̂k is a simple pole of H(s), and λ̂k is associated with left and right eigenvectors ỹk and x̃k of Ã, respectively,

\[ \tilde{A}\tilde{x}_k = \hat\lambda_k \tilde{x}_k, \qquad \tilde{y}_k^* \tilde{A} = \hat\lambda_k \tilde{y}_k^*, \qquad\text{and}\qquad \tilde{y}_k^* \tilde{x}_k = 1, \]

then

\[ \mathrm{tr}\left(\mathrm{res}\bigl[G(-s)H^T(s),\, \hat\lambda_k\bigr]\right) = \tilde{c}_k^T\, G(-\hat\lambda_k)\, \tilde{b}_k, \]

where b̃k^T = ỹk^* B̃ and c̃k = C̃ x̃k.
• If λ̂k is a double pole of H(s), and λ̂k is associated with left and right eigenvectors ỹk and x̃k of Ã and generalized eigenvectors z̃k and w̃k, respectively,

\[ \tilde{A}\tilde{x}_k = \hat\lambda_k \tilde{x}_k, \qquad \tilde{A}\tilde{w}_k = \hat\lambda_k \tilde{w}_k + \tilde{x}_k, \qquad \tilde{y}_k^* \tilde{A} = \hat\lambda_k \tilde{y}_k^*, \qquad \tilde{z}_k^* \tilde{A} = \hat\lambda_k \tilde{z}_k^* + \tilde{y}_k^*, \]
\[ \text{and}\qquad \tilde{y}_k^* \tilde{x}_k = 0, \qquad \tilde{z}_k^* \tilde{w}_k = 0, \qquad \tilde{z}_k^* \tilde{x}_k = \tilde{y}_k^* \tilde{w}_k = 1, \]

then

\[ \mathrm{tr}\left(\mathrm{res}\bigl[G(-s)H^T(s),\, \hat\lambda_k\bigr]\right) = \tilde{d}_k^T\, G(-\hat\lambda_k)\, \tilde{b}_k + \tilde{c}_k^T\, G(-\hat\lambda_k)\, \tilde{e}_k - \tilde{c}_k^T\, G'(-\hat\lambda_k)\, \tilde{b}_k, \]

where b̃k and c̃k are as above, and ẽk^T = z̃k^* B̃ and d̃k = C̃ w̃k.

Now assume that Gr is an optimal reduced order model minimizing ‖G − Gr‖_{H2} in the sense described in (3.1), and suppose further that Gr has simple poles λ̂i. Take H(s) = Gr in (3.5), so that Gr(s) = Σ_k (s − λ̂k)^{−1} c̃k b̃k^T and the residue of Gr(s) at λ̂k is matrix valued and rank one: res[Gr(s), λ̂k] = c̃k b̃k^T. An analysis paralleling what we have carried out above yields analogous error expressions (see also [2]) and first-order necessary conditions for the MIMO optimal H2 reduction problem:

(3.16) \[ G(-\hat\lambda_k)\tilde{b}_k = G_r(-\hat\lambda_k)\tilde{b}_k, \qquad \tilde{c}_k^T G(-\hat\lambda_k) = \tilde{c}_k^T G_r(-\hat\lambda_k), \qquad\text{and}\qquad \tilde{c}_k^T G'(-\hat\lambda_k)\tilde{b}_k = \tilde{c}_k^T G_r'(-\hat\lambda_k)\tilde{b}_k \]

for k = 1, …, r. The SISO (m = p = 1) conditions are replaced in the MIMO case by left tangential, right tangential, as well as bitangential interpolation conditions. From the discussion of section 2.1, if Ran(Vr) contains (λ̂kI + A)^{−1}Bb̃k and Ran(Wr) contains (λ̂kI + A)^{−T}C^Tc̃k for each k = 1, 2, …, r, then the H2 optimality conditions given above hold. First-order interpolatory MIMO conditions have been obtained recently in other independent works as well; see [24, 35].
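The mechanism behind these conditions, tangential interpolation by oblique projection, can be sketched numerically (an added illustration, not from the paper; the names `bt`/`ct` for the tangent directions and the random test data are ad hoc choices). Placing (σkI − A)^{−1}Bb̃k in Ran(Vr) and (σkI − A^T)^{−1}C^Tc̃k in Ran(Wr) enforces left, right, and bitangential matching at the points σk:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p, r = 7, 2, 2, 3
A = np.diag(-rng.uniform(1.0, 8.0, n))       # stable full-order model
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
sigma = rng.uniform(0.5, 4.0, r)             # interpolation points
bt = rng.standard_normal((m, r))             # right tangent directions b_k
ct = rng.standard_normal((p, r))             # left tangent directions c_k
I = np.eye(n)

V = np.column_stack([np.linalg.solve(s * I - A, B @ bt[:, k]) for k, s in enumerate(sigma)])
W = np.column_stack([np.linalg.solve(s * I - A.T, C.T @ ct[:, k]) for k, s in enumerate(sigma)])

M = W.T @ V                                  # oblique (Petrov-Galerkin) projection
Ar = np.linalg.solve(M, W.T @ A @ V)
Br = np.linalg.solve(M, W.T @ B)
Cr = C @ V

G  = lambda s: C  @ np.linalg.solve(s * I - A, B)
Gr = lambda s: Cr @ np.linalg.solve(s * np.eye(r) - Ar, Br)

h = 1e-6
for k, s in enumerate(sigma):
    assert np.allclose(G(s) @ bt[:, k], Gr(s) @ bt[:, k], atol=1e-8)   # right tangential
    assert np.allclose(ct[:, k] @ G(s), ct[:, k] @ Gr(s), atol=1e-8)   # left tangential
    dG  = (G(s + h) - G(s - h)) / (2 * h)    # central differences for the derivative
    dGr = (Gr(s + h) - Gr(s - h)) / (2 * h)
    assert abs(ct[:, k] @ (dG - dGr) @ bt[:, k]) < 1e-4                # bitangential (Hermite)
```

At an H2 optimum the points σk would be the mirrored reduced poles −λ̂k; the projection mechanics are identical.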
3.2.2. The discrete-time case. An nth-order SISO discrete-time dynamical system is defined by a set of difference equations

(3.17) \[ G:\quad x(t + 1) = Ax(t) + b\,u(t), \quad y(t) = c^T x(t), \qquad\text{or}\qquad G(z) = c^T(zI - A)^{-1}b, \]

where t ∈ Z and A ∈ R^{n×n}, b, c ∈ R^n. G(z) is the transfer function of the system, so that if û(z) and ŷ(z) denote the z-transforms of u(t) and y(t), respectively, then ŷ(z) = G(z)û(z). In this case, stability of G means that |λi(A)| < 1 for i = 1, …, n. Also, the h2 norm is defined as

\[ \|G\|_{h_2}^2 = \frac{1}{2\pi}\int_0^{2\pi} |G(e^{\imath\theta})|^2\, d\theta. \]

Model reduction for discrete-time systems is defined similarly. In this setting, the interpolatory (necessary) conditions for h2 optimality of the rth-order reduced model Gr(z) = cr^T(zI − Ar)^{−1}br become

\[ G(1/\hat\lambda_i) = G_r(1/\hat\lambda_i) \quad\text{and}\quad G'(1/\hat\lambda_i) = G_r'(1/\hat\lambda_i) \quad\text{for } i = 1, \ldots, r, \]

where λ̂i denotes the ith eigenvalue of Ar. This is a special case of results for discrete-time MIMO systems formulated previously in [10].
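For the discrete-time h2 norm used above, the defining unit-circle integral agrees with a discrete Lyapunov (Gramian) characterization, ‖G‖²_{h2} = c^T P c with APA^T − P + bb^T = 0. A small NumPy/SciPy check (added here as an illustration with a random stable test system):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov
from scipy.integrate import quad

rng = np.random.default_rng(4)
n = 5
A = np.diag(rng.uniform(-0.9, 0.9, n))   # |lambda_i(A)| < 1: stable in discrete time
b = rng.standard_normal(n)
c = rng.standard_normal(n)

# Gramian route: ||G||_h2^2 = c^T P c with  A P A^T - P + b b^T = 0
P = solve_discrete_lyapunov(A, np.outer(b, b))
h2sq_gram = c @ P @ c

# direct quadrature of (1/2pi) int_0^{2pi} |G(e^{i theta})|^2 d theta
G = lambda z: c @ np.linalg.solve(z * np.eye(n) - A, b)
h2sq_quad = quad(lambda t: abs(G(np.exp(1j * t))) ** 2 / (2 * np.pi), 0, 2 * np.pi, limit=200)[0]

assert abs(h2sq_gram - h2sq_quad) < 1e-6 * abs(h2sq_gram)
```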
3.3. Lyapunov-based H2 optimality conditions. In this section we briefly review the Lyapunov framework for the first-order H2 optimality conditions and present its connection to our structured orthogonality framework.
Given a stable SISO system G(s) = c^T(sI − A)^{−1}b, let Gr(s) = cr^T(sI − Ar)^{−1}br be a local minimizer of dimension r for the optimal H2 model reduction problem (3.1), and suppose that Gr(s) has simple poles at λ̂i, i = 1, …, r.
It is convenient to define the error system

(3.18) \[ G_{err}(s) := G(s) - G_r(s) = c_{err}^T (sI - A_{err})^{-1} b_{err}, \]
(3.19) \[ \text{with}\quad A_{err} = \begin{bmatrix} A & 0\\ 0 & A_r \end{bmatrix}, \qquad b_{err} = \begin{bmatrix} b\\ b_r \end{bmatrix}, \qquad c_{err}^T = [\, c^T \;\; -c_r^T \,]. \]

Let Perr and Qerr be the Gramians for the error system Gerr(s); i.e., Perr and Qerr solve

(3.20) \[ A_{err} P_{err} + P_{err} A_{err}^T + b_{err} b_{err}^T = 0, \]
(3.21) \[ Q_{err} A_{err} + A_{err}^T Q_{err} + c_{err} c_{err}^T = 0. \]

Partition Perr and Qerr:

(3.22) \[ P_{err} = \begin{bmatrix} P_{11} & P_{12}\\ P_{12}^T & P_{22} \end{bmatrix}, \qquad Q_{err} = \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{12}^T & Q_{22} \end{bmatrix}, \]

where P11, Q11 ∈ R^{n×n} and P22, Q22 ∈ R^{r×r}. Wilson [36] showed that the reduced order model Gr(s) = cr^T(sI − Ar)^{−1}br can be defined in terms of a Galerkin framework as well by taking

(3.23) \[ V_r = P_{12} P_{22}^{-1} \qquad\text{and}\qquad W_r = -Q_{12} Q_{22}^{-1}, \]

and the resulting reduced order model satisfies the first-order conditions of the optimal H2 problem. It was also shown in [36] that Wr^T Vr = I. The next result states the Lyapunov-based Wilson conditions for H2 optimality and shows their equivalence to our structured orthogonality framework.
Theorem 3.6. The Wilson conditions for H2 optimality,

(3.24) \[ P_{12}^T Q_{12} + P_{22} Q_{22} = 0, \]
(3.25) \[ Q_{12}^T b + Q_{22} b_r = 0, \]
(3.26) \[ c_r^T P_{22} - c^T P_{12} = 0, \]

are equivalent to the structured orthogonality conditions of Theorem 3.2.


Proof. From (3.5), consider first the case where H1 = 0 and H2 is an arbitrary transfer function with simple poles at λ̂i, i = 1, …, r. Write H2(s) = c̃^T(sI − Ar)^{−1}b̃, where b̃ and c̃ can vary arbitrarily. Then from (2.12), if, for any b̃ ≠ 0, [P̃1; P̃2] solves

(3.27) \[ \begin{bmatrix} A & 0\\ 0 & A_r \end{bmatrix}\begin{bmatrix} \tilde{P}_1\\ \tilde{P}_2 \end{bmatrix} + \begin{bmatrix} \tilde{P}_1\\ \tilde{P}_2 \end{bmatrix} A_r^T + \begin{bmatrix} b\\ b_r \end{bmatrix}\tilde{b}^T = 0, \]

we have for arbitrary c̃

\[ \langle G - G_r, H_2 \rangle_{H_2} = [\, c^T,\ -c_r^T \,]\begin{bmatrix} \tilde{P}_1\\ \tilde{P}_2 \end{bmatrix}\tilde{c} = 0. \]

Notice that P̃1 and P̃2 are independent of c̃, so for each choice of b̃ we must have

\[ c^T \tilde{P}_1 - c_r^T \tilde{P}_2 = 0. \]

For b̃ = br, one may check directly that P̃1 = P12 and P̃2 = P22 in the Perr that solves (3.20) in Wilson's conditions.
Likewise, from (2.13), for each choice of c̃, if [Q̃1, Q̃2] solves

(3.28) \[ [\, \tilde{Q}_1,\ \tilde{Q}_2 \,]\begin{bmatrix} A & 0\\ 0 & A_r \end{bmatrix} + A_r^T [\, \tilde{Q}_1,\ \tilde{Q}_2 \,] + \tilde{c}\,[\, c^T,\ -c_r^T \,] = 0, \]

then we have for every b̃

\[ \langle G - G_r, H_2 \rangle_{H_2} = \tilde{b}^T [\, \tilde{Q}_1,\ \tilde{Q}_2 \,]\begin{bmatrix} b\\ b_r \end{bmatrix} = 0. \]

Similarly to the first case, [Q̃1, Q̃2] is independent of b̃, so for each choice of c̃ we must have

\[ \tilde{Q}_1 b + \tilde{Q}_2 b_r = 0, \]

and for the particular case c̃ = −cr, one may check directly that Q̃1 = Q12^T and Q̃2 = Q22 in the Qerr that solves (3.21) in Wilson's conditions. The structured orthogonality condition ⟨G − Gr, H⟩_{H2} = 0, taken over all systems H(s) with the same poles as Gr, thus leads directly to the Wilson conditions (3.25) and (3.26).
The additional orthogonality condition ⟨G − Gr, Gr · H⟩_{H2} = 0, taken over all H(s) with the same poles as Gr, will yield the remaining Wilson condition (3.24). Observe that

\[ G_r(s)H(s) = c_r^T(sI - A_r)^{-1}b_r\; \tilde{c}^T(sI - A_r)^{-1}\tilde{b} = [\, c_r^T,\ 0 \,]\left( sI_{2r} - \begin{bmatrix} A_r & b_r\tilde{c}^T\\ 0 & A_r \end{bmatrix} \right)^{-1}\begin{bmatrix} 0\\ \tilde{b} \end{bmatrix}. \]

Referring to (2.12), the condition ⟨G − Gr, Gr · H⟩_{H2} = 0 leads to a Sylvester equation,

\[ \begin{bmatrix} A & 0\\ 0 & A_r \end{bmatrix}\begin{bmatrix} \tilde{W}_1 & \tilde{P}_1\\ \tilde{W}_2 & \tilde{P}_2 \end{bmatrix} + \begin{bmatrix} \tilde{W}_1 & \tilde{P}_1\\ \tilde{W}_2 & \tilde{P}_2 \end{bmatrix}\begin{bmatrix} A_r^T & 0\\ \tilde{c}b_r^T & A_r^T \end{bmatrix} + \begin{bmatrix} b\\ b_r \end{bmatrix}[\, 0,\ \tilde{b}^T \,] = 0, \]

where the use of P̃1 and P̃2 is intended to indicate that they solve (3.27) as well. Then

\[ \langle G - G_r,\, G_r \cdot H \rangle_{H_2} = [\, c^T,\ -c_r^T \,]\begin{bmatrix} \tilde{W}_1 & \tilde{P}_1\\ \tilde{W}_2 & \tilde{P}_2 \end{bmatrix}\begin{bmatrix} c_r\\ 0 \end{bmatrix} = 0. \]
P 0

Alternatively, from (2.13),


(3.29)
      
1 Q
Q 2 A 0 ATr 0 1
Q 2
Q cr
+ 1 2 + [cT , −cTr ] = 0
Y1 Y2 0 Ar
cbTr ATr Y Y 0

1 and Q
(Q 2 here also solve (3.28)) and
  
1
Q 2
Q b

G − Gr , Gr · H2 H2 T ]
= [0, b = 0.
1
Y 2
Y br

and since Y
Since this last equality is true for all b, 1 and Y
2 are independent of b,

we see that Y1 b + Y2 br = 0. We know already that Q1 b + Q2 br = 0, so
    
Q 1 Q 2 b 0
1 Y 2 = .
Y br 0

Define

\[ \begin{bmatrix} \tilde{R}_1\\ \tilde{R}_2 \end{bmatrix} = \begin{bmatrix} \tilde{Q}_1 & \tilde{Q}_2\\ \tilde{Y}_1 & \tilde{Y}_2 \end{bmatrix}\begin{bmatrix} \tilde{P}_1\\ \tilde{P}_2 \end{bmatrix}. \]

We will show that R̃1 = 0. Premultiply (3.27) by [Q̃1 Q̃2; Ỹ1 Ỹ2], postmultiply (3.29) by [P̃1; P̃2], and subtract the resulting equations to get

\[ \tilde{R}_1 A_r^T - A_r^T \tilde{R}_1 = 0 \qquad\text{and}\qquad \tilde{R}_2 A_r^T - A_r^T \tilde{R}_2 = \tilde{c}\, b_r^T \tilde{R}_1. \]

The first equation asserts that R̃1 commutes with Ar^T, and since Ar^T has distinct eigenvalues, R̃1 must have the same eigenvectors as Ar^T. Let ỹi, x̃i be left and right eigenvectors of Ar associated with λ̂i (respectively, right and left eigenvectors of Ar^T): Ar x̃i = λ̂i x̃i and ỹi^T Ar = λ̂i ỹi^T. Then R̃1 ỹi = di ỹi. Now premultiply the second equation by x̃i^T and postmultiply by ỹi to find

\[ \tilde{x}_i^T\bigl( \tilde{R}_2 A_r^T - A_r^T \tilde{R}_2 \bigr)\tilde{y}_i = \tilde{x}_i^T \tilde{c}\, b_r^T \tilde{R}_1 \tilde{y}_i, \]
\[ \tilde{x}_i^T \tilde{R}_2 \tilde{y}_i\, \hat\lambda_i - \hat\lambda_i\, \tilde{x}_i^T \tilde{R}_2 \tilde{y}_i = \tilde{x}_i^T \tilde{c}\, b_r^T \tilde{R}_1 \tilde{y}_i, \]
\[ 0 = \tilde{x}_i^T \tilde{c}\;\, b_r^T \tilde{y}_i\; d_i. \]

Either di = 0, or one of x̃i^T c̃ and br^T ỹi must vanish, which would then imply that either dim H < r or dim Gr < r. Thus di = 0 for all i = 1, …, r, and R̃1 = 0, which proves the final Wilson condition (3.24).
The converse is omitted here since it follows in a straightforward way by reversing
the preceding arguments.
Hyland and Bernstein [22] offered conditions that are equivalent to the Wilson conditions. Suppose Gr(s), defined by Ar, br, and cr^T, solves the optimal H2 problem. Then there exist nonnegative definite matrices P, Q ∈ R^{n×n} and two n × r matrices Fr and Yr such that

(3.30) \[ PQ = F_r M Y_r^T, \qquad Y_r^T F_r = I_r, \]

where M is similar to a positive definite matrix. Then Gr(s) is given by Ar, br, and cr^T with Ar = Yr^T A Fr, br = Yr^T b, and cr^T = c^T Fr such that, with the skew projection Π = Fr Yr^T, the following conditions are satisfied:

(3.31) \[ \mathrm{rank}(P) = \mathrm{rank}(Q) = \mathrm{rank}(PQ), \]
(3.32) \[ \Pi\bigl( AP + PA^T + bb^T \bigr) = 0, \]
(3.33) \[ \bigl( A^T Q + QA + cc^T \bigr)\Pi = 0. \]
Note that in both [36] and [22], the first-order necessary conditions are given
in terms of (coupled) Lyapunov equations. Both [36] and [22] proposed iterative
algorithms to obtain a reduced order model satisfying these Lyapunov-based first-
order conditions. However, the main drawback in each case is that both approaches
require solving two large-scale Lyapunov equations at each step of the algorithm. [40]
discusses computational issues related to solving associated linearized problems within
each step.
Theorems 3.4 and 3.6 show the equivalence between the structured orthogonal-
ity conditions and Lyapunov- and interpolation-based conditions for H2 optimality,
respectively. To complete the discussion, we formally state the equivalence between
the Lyapunov and interpolation frameworks.
Lemma 3.7 (equivalence of Lyapunov and interpolation frameworks). The first-
order necessary conditions of both [22] as given in (3.31)–(3.33) and [36] as given
in (3.23) are equivalent to those of [26] as given in (3.11). That is, the Lyapunov-
based first-order conditions [36, 22] for the optimal H2 problem are equivalent to the
interpolation-based Meier–Luenberger conditions.
We note that the connection between the Lyapunov and interpolation frameworks
has not been observed in the literature before. This result shows that solving the op-
timal H2 problem in the Krylov framework is equivalent to solving it in the Lyapunov
framework. This leads to the Krylov-based method proposed in the next section.

4. Iterated interpolation. We propose an effective numerical algorithm that produces a reduced order model Gr(s) satisfying the interpolation-based first-order necessary conditions (3.11). The effectiveness of the proposed algorithm results from the fact that we use rational Krylov steps to construct a Gr(s) that meets the first-order conditions (3.11). No Lyapunov solvers or dense matrix decompositions are needed. Therefore, the method is well suited for large-scale systems where n ≫ 1000.
Several approaches have been proposed in the literature to compute reduced order
models that satisfy some form of first-order necessary conditions; see [37, 34, 9, 21,
26, 22, 36, 25]. However, these approaches do not seem to be suitable for large-scale
problems. The ones based on Lyapunov-based conditions, e.g., [36, 22, 34, 37], require
solving a couple of Lyapunov equations at each step of the iteration. To our knowledge,
the only methods that depend on interpolation-based necessary conditions have been
proposed in [25] and [26]. The authors work directly with the transfer functions of
G(s) and Gr (s); make an iteration on the denominator [25] or poles and residues
[26] of Gr (s); and explicitly compute G(s), Gr (s), and their derivatives at certain
points in the complex plane. However, working with the transfer function, its values,
and its derivative values explicitly is not desirable in large-scale settings. Indeed, one
will most likely be given a state space representation of G(s) rather than the transfer
function. And trying to compute the coefficients of the transfer function can be highly
ill-conditioned. These approaches are similar to [30, 31], where interpolation is done
by explicit usage of transfer functions. On the other hand, our approach, which is
detailed below, is based on the connection between interpolation and effective rational
Krylov iteration, and is therefore numerically effective and stable.
Let σ denote the set of interpolation points {σ1, …, σr}; use these interpolation points to construct a reduced order model Gr(s) that interpolates both G(s) and G′(s) at {σ1, …, σr}; and let λ(σ) = {λ̂1, …, λ̂r} denote the resulting reduced order poles of Gr(s), so that λ(σ) is a function from C^r to C^r. Define the function g(σ) = λ(σ) + σ. Note that g(σ): C^r → C^r. Aside from issues related to the ordering of the reduced order poles, g(σ) = 0 yields λ(σ) = −σ; i.e., the reduced order poles λ(σ) are the mirror images of the interpolation points σ. Hence, g(σ) = 0 is equivalent to (3.11) and is a necessary condition for H2 optimality of the reduced order model Gr(s). Thus one can formulate a search for optimal H2 reduced order systems by considering the root-finding problem g(σ) = 0. Many plausible approaches to this problem originate with Newton's method, which appears as

(4.1) \[ \sigma^{(k+1)} = \sigma^{(k)} - (I + J)^{-1}\left( \sigma^{(k)} + \lambda\bigl(\sigma^{(k)}\bigr) \right). \]

In (4.1), J is the usual r × r Jacobian of λ(σ) with respect to σ: for J = [J_{i,j}], J_{i,j} = ∂λ̂i/∂σj for i, j = 1, …, r. How to compute J will be clarified in section 4.3.

4.1. Proposed algorithm. We seek a reduced order transfer function Gr(s) that interpolates G(s) at the mirror images of the poles of Gr(s) by solving the equivalent root-finding problem, say by a variant of (4.1). It is often the case that, in the neighborhood of an H2 optimal shift set, the entries of the Jacobian matrix become small, and simply setting J = 0 might serve as a relaxed iteration strategy. This leads to a successive substitution framework: σi ← −λi(Ar); successive interpolation steps using a rational Krylov method are applied so that, at the (i + 1)st step, the interpolation points are chosen as the mirror images of the Ritz values from the ith step. Despite its simplicity, this appears to be a very effective strategy in many circumstances.

Here is a sketch of the proposed algorithm.

Algorithm 4.1. An iterative rational Krylov algorithm (IRKA).
1. Make an initial selection of σi, for i = 1, …, r, that is closed under conjugation, and fix a convergence tolerance tol.
2. Choose Vr and Wr so that Ran(Vr) = span{(σ1I − A)^{−1}b, …, (σrI − A)^{−1}b}, Ran(Wr) = span{(σ1I − A^T)^{−1}c, …, (σrI − A^T)^{−1}c}, and Wr^T Vr = I.
3. while (relative change in {σi} > tol)
   (a) Ar = Wr^T A Vr,
   (b) Assign σi ← −λi(Ar) for i = 1, …, r,
   (c) Update Vr and Wr so that Ran(Vr) = span{(σ1I − A)^{−1}b, …, (σrI − A)^{−1}b}, Ran(Wr) = span{(σ1I − A^T)^{−1}c, …, (σrI − A^T)^{−1}c}, and Wr^T Vr = I.
4. Ar = Wr^T A Vr, br = Wr^T b, cr^T = c^T Vr.

Upon convergence, the first-order necessary conditions (3.11) for H2 optimality will be satisfied. Notice that step 3(b) could be replaced with some variant of a Newton step (4.1).
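A compact realization of Algorithm 4.1 might look as follows (an illustrative sketch, not the authors' implementation: it uses dense NumPy solves, a fixed initial shift set, and a crude complex-sort heuristic in place of a careful pole-to-shift ordering):

```python
import numpy as np

def irka(A, b, c, r, sigma0, tol=1e-8, maxit=500):
    """Sketch of Algorithm 4.1 (successive substitution; no Newton step)."""
    n = A.shape[0]
    I = np.eye(n)
    sigma = np.sort_complex(np.asarray(sigma0, dtype=complex))
    for _ in range(maxit):
        V = np.column_stack([np.linalg.solve(s * I - A, b) for s in sigma])
        W = np.column_stack([np.linalg.solve(s * I - A.T, c) for s in sigma])
        Ar = np.linalg.solve(W.T @ V, W.T @ A @ V)           # oblique projection
        sigma_new = np.sort_complex(-np.linalg.eigvals(Ar))  # step 3(b)
        done = np.max(np.abs(sigma_new - sigma)) < tol * np.max(np.abs(sigma))
        sigma = sigma_new
        if done:
            break
    # final projection at the converged shifts
    V = np.column_stack([np.linalg.solve(s * I - A, b) for s in sigma])
    W = np.column_stack([np.linalg.solve(s * I - A.T, c) for s in sigma])
    M = W.T @ V
    return (np.linalg.solve(M, W.T @ A @ V),   # Ar
            np.linalg.solve(M, W.T @ b),       # br
            V.T @ c,                           # cr  (cr^T = c^T Vr)
            sigma)

# toy example: symmetric full model of order 6, reduced to r = 2
A = np.diag(np.arange(-1.0, -7.0, -1.0))
b = c = np.ones(6)
Ar, br, cr, sigma = irka(A, b, c, 2, sigma0=[1.0, 3.0])

# at convergence the shifts are the mirrored reduced poles ...
assert np.max(np.abs(np.sort_complex(-np.linalg.eigvals(Ar)) - sigma)) < 1e-6
# ... and Gr matches G at those shifts (Hermite interpolation at sigma_i)
G  = lambda s: c @ np.linalg.solve(s * np.eye(6) - A, b)
Gr = lambda s: cr @ np.linalg.solve(s * np.eye(2) - Ar, br)
assert all(abs(G(s) - Gr(s)) < 1e-8 * abs(G(s)) for s in sigma)
```

On this small symmetric test system the iteration converges in a handful of steps; the interpolation property at the final shifts holds by construction of the rational Krylov spaces, while the mirror condition σi = −λ̂i is what the fixed-point iteration delivers.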
We have implemented the above algorithm and applied it to many different large-
scale systems. In each of our numerical examples, the algorithm worked very effec-
tively: It has always converged after a small number of steps and resulted in stable
reduced systems. For those standard test problems we tried where a global optimum
is known, Algorithm 4.1 converged to this global optimum.
It should be noted that the solution is obtained via Krylov projection methods
only and its computation is suitable for large-scale systems. To our knowledge, this
is the first numerically effective approach for the optimal H2 reduction problem.
We know that the reduced model Gr (s) resulting from the above algorithm will
satisfy the first-order optimality conditions. Moreover, from Theorem 3.1 this reduced
order model is globally optimal in the following sense.
Corollary 4.1. Let Gr (s) be the reduced model resulting from Algorithm 4.1.
Then Gr (s) is the optimal approximation of G(s) with respect to the H2 norm among
all reduced order systems having the same reduced system poles as Gr (s).
Therefore Algorithm 4.1 generates a reduced model, Gr (s), which is the optimal
solution for a restricted H2 problem.
4.2. Initial shift selection. For the proposed algorithm, the final reduced
model can depend on the initial shift selection. Nonetheless for most of the cases,
a random initial shift selection resulted in satisfactory reduced models. For small-
order benchmark examples taken from [22, 25, 37, 34], the algorithm converged to the
global minimizer. For larger problems, the results were as good as those obtained by
balanced truncation. Therefore, while staying within a numerically effective Krylov
projection framework, we have been able to produce results close to or better than
those obtained by balanced truncation (which requires the solution of two large-scale
Lyapunov equations).
We outline some initialization strategies that can be expected to improve the
results. Recall that at convergence, interpolation points are mirror images of the
eigenvalues of Ar . The eigenvalues of Ar might be expected to approximate the
eigenvalues of A. Hence, at convergence, interpolation points will lie in the mirror
spectrum of A. Therefore, one could choose initial shifts randomly distributed within
a region containing the mirror image of the numerical range of A. The boundary of
the numerical range can be estimated by computing the eigenvalues of A with the
smallest and largest real and imaginary parts using numerically effective tools such
as the implicitly restarted Arnoldi (IRA) algorithm.


The starting point for another initialization strategy is the H2 expression pre-
sented in Proposition 3.3. Based on this expression, it is appropriate to initiate the
proposed algorithm with σi = −λi (A), where λi (A) are the poles with big residues,
Downloaded 07/01/12 to 198.82.177.50. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

φi for i = 1, . . . , r. The main disadvantage of this approach is that it requires a modal


state space decomposition for G(s), which will be numerically expensive for large-scale
problems. However, there might be some applications where the original state space
representation is in the modal form and φi might be directly read from the entries of
the matrices b and cT .
Unstable reduced order models are not acceptable candidates for optimal H2
reduction. Nonetheless stability of a reduced model is not guaranteed a priori and
might depend on the initial shift selection. We have observed that if one avoids
making extremely unrealistic initial shift selections, stability will be preserved. In our
simulations we have never generated an unstable system when the initial shift selection
was not drastically different from the mirror spectrum of A, but otherwise random.
We were able to produce an unstable reduced order system; however, this occurred
for a case where the real parts of the eigenvalues of A were between −1.5668 × 10−1
and −2.0621 × 10−3 , yet we chose initial shifts bigger than 50. We believe that with a
good starting point, stability will not be an issue. These considerations are illustrated
for many numerical examples in section 5.
Remark 4.1. Based on the first-order conditions (3.16) discussed in section 3.2.1 for MIMO systems G(s) = C(sI − A)^{−1}B, one can extend IRKA to the MIMO case by replacing (σiI − A)^{−1}b with (σiI − A)^{−1}Bb̃i and (σiI − A^T)^{−1}c with (σiI − A)^{−T}C^Tc̃i in Algorithm 4.1, where b̃i and c̃i are as defined in section 3.2.1.
Remark 4.2. In the discrete-time case described in (3.17) above, the root-finding problem becomes g(σ) = Σλ(σ) − e, where e^T = [1, 1, …, 1] and Σ = diag(σ). Therefore, for discrete-time systems, step 3(b) of Algorithm 4.1 becomes σi ← 1/λi(Ar) for i = 1, …, r. Moreover, the associated Newton step is

\[ \sigma^{(k+1)} = \sigma^{(k)} - \bigl(I + \Lambda^{-1}\Sigma J\bigr)^{-1}\left( \sigma^{(k)} - \Lambda^{-1}e \right), \]

where Λ = diag(λ).
4.3. A Newton framework for IRKA. As discussed above, Algorithm 4.1
uses the successive substitution framework by simply setting J = 0 in the Newton
step (4.1). The Newton framework for IRKA can be easily obtained by replacing step
3(b) of Algorithm 4.1 with the Newton step (4.1). The only point to clarify for the
Newton framework is the computation of the Jacobian, which measures the sensitivity
of the reduced system poles with respect to shifts.
Given A ∈ Rn×n and b, c ∈ Rn , suppose that σi , i = 1, . . . , r, are r distinct
points in C, none of which are eigenvalues of A, and define the complex r-tuple
σ = [σ1 , σ2 , . . . , σr ]T ∈ Cr together with related matrices:

(4.2) \[ V_r(\sigma) = \left[\, (\sigma_1 I - A)^{-1}b \;\; (\sigma_2 I - A)^{-1}b \;\; \ldots \;\; (\sigma_r I - A)^{-1}b \,\right] \in \mathbb{C}^{n\times r} \]

and

(4.3) \[ W_r^T(\sigma) = \begin{bmatrix} c^T(\sigma_1 I - A)^{-1}\\ c^T(\sigma_2 I - A)^{-1}\\ \vdots\\ c^T(\sigma_r I - A)^{-1} \end{bmatrix} \in \mathbb{C}^{r\times n}.
\]


We normally suppress the dependence on σ and write Vr (σ) = Vr and Wr (σ) = Wr .


Hence, the reduced order system matrix Ar is given by Ar = (WrT Vr )−1 WrT AVr ,
where (WrT Vr )−1 Wr plays the role of Wr in Algorithm 4.1. Let λ i , for i = 1, . . . , r,
Downloaded 07/01/12 to 198.82.177.50. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

denote the eigenvalues of Ar . Hence, the Jacobian computation amounts to computing


i
∂λ
J(i, j) = ∂σ j
. The following result shows how to compute the Jacobian for the Newton
formulation of the IRKA method proposed here.
Lemma 4.2. Let x̃i be an eigenvector of Ar = (Wr^T Vr)^{−1} Wr^T A Vr associated with λ̂i, normalized so that x̃i^T Wr^T Vr x̃i = 1. Then Wr^T A Vr x̃i = λ̂i Wr^T Vr x̃i and

(4.4) \[ \frac{\partial \hat\lambda_i}{\partial \sigma_j} = \tilde{x}_i^T\, \partial_j\bigl(W_r^T\bigr)\bigl( A - \hat\lambda_i I \bigr)V_r\, \tilde{x}_i + \tilde{x}_i^T\, W_r^T\bigl( A - \hat\lambda_i I \bigr)\partial_j\bigl(V_r\bigr)\, \tilde{x}_i, \]

where ∂j Wr^T = ∂Wr^T/∂σj = −ej c^T(σjI − A)^{−2} and ∂j Vr = ∂Vr/∂σj = −(σjI − A)^{−2} b ej^T.
Proof. With Vr(σ) = Vr and Wr(σ) = Wr defined as in (4.2) and (4.3), both Wr^T A Vr and Wr^T Vr are complex symmetric matrices. Write λ̂ for λ̂i and x̃ for x̃i, so that

(4.5) \[ \text{(a)}\;\; W_r^T A V_r \tilde{x} = \hat\lambda\, W_r^T V_r \tilde{x} \qquad\text{and}\qquad \text{(b)}\;\; \tilde{x}^T W_r^T A V_r = \hat\lambda\, \tilde{x}^T W_r^T V_r. \]

Equation (4.5b) is obtained by transposition of (4.5a); x̃^T Wr^T Vr is a left eigenvector for Ar associated with λ̂i. Differentiate (4.5a) with respect to σj, premultiply with x̃^T, and simplify using (4.5b):

\[ \tilde{x}^T\, \partial_j\bigl(W_r^T\bigr)\bigl( A V_r - \hat\lambda\, V_r \bigr)\tilde{x} + \tilde{x}^T\, W_r^T\bigl( A - \hat\lambda I \bigr)\partial_j\bigl(V_r\bigr)\, \tilde{x} = \frac{\partial\hat\lambda}{\partial\sigma_j}\, \tilde{x}^T W_r^T V_r \tilde{x}, \]

with ∂j Wr^T = −ej c^T(σjI − A)^{−2} and ∂j Vr = −(σjI − A)^{−2} b ej^T as in the statement. Since x̃^T Wr^T Vr x̃ = 1, this completes the proof.
5. Numerical examples. We first compare our approach with the earlier ap-
proaches [22, 25, 37] on low-order benchmark examples presented in those papers.
We show that in each case we attain the minimum, the main difference being that
we achieve this minimum in a numerically efficient way. For each low-order model,
comparisons are made using data taken from the original sources [22, 25, 37]. We
then test our method in large-scale settings.
5.1. Low-order models and comparisons. Consider the following four models:
• FOM-1: Example 6.1 in [22]. A state space representation of FOM-1 is given by

\[ A = \begin{bmatrix} 0 & 0 & 0 & -150\\ 1 & 0 & 0 & -245\\ 0 & 1 & 0 & -113\\ 0 & 0 & 1 & -19 \end{bmatrix}, \qquad b = \begin{bmatrix} 4\\ 1\\ 0\\ 0 \end{bmatrix}, \qquad c = \begin{bmatrix} 0\\ 0\\ 0\\ 1 \end{bmatrix}. \]

We reduce the order to r = 3, 2, 1 using the proposed successive rational Krylov algorithm, denoted by IRKA, and compare our results with the gradient flow method of [37], denoted by GFM; the orthogonal projection method of [22], denoted by OPM; and the balanced truncation method, denoted by BTM.
• FOM-2: Example in [25]. The transfer function of FOM-2 is given by

\[ G(s) = \frac{2s^6 + 11.5s^5 + 57.75s^4 + 178.625s^3 + 345.5s^2 + 323.625s + 94.5}{s^7 + 10s^6 + 46s^5 + 130s^4 + 239s^3 + 280s^2 + 194s + 60}. \]


We reduce the order to r = 6, 5, 4, 3 using IRKA and compare our results with
GFM, OPM, BTM, and the method proposed in [25], denoted by LMPV.
• FOM-3: Example 1 in [34]. The transfer function of FOM-3 is given by

\[ G(s) = \frac{s^2 + 15s + 50}{s^4 + 5s^3 + 33s^2 + 79s + 50}. \]
We reduce the order to r = 3, 2, 1 using IRKA and compare our results with GFM, OPM, BTM, and the method proposed in [34], denoted by SMM.
• FOM-4: Example 2 in [34]. The transfer function of FOM-4 is given by

\[ G(s) = \frac{10000s + 5000}{s^2 + 5000s + 25}. \]

We reduce the order to r = 1 using IRKA and compare our results with GFM, OPM, BTM, and SMM.
For all these cases, the resulting relative H2 errors ‖G(s) − Gr(s)‖_{H2} / ‖G(s)‖_{H2} are tabulated in Table 5.1 below, which clearly illustrates that the proposed method is the only one that attains the minimum in each case. More importantly, the proposed method achieves this value in a numerically efficient way, staying within the Krylov projection framework. No Lyapunov solvers or dense matrix decompositions are needed.

Table 5.1
Comparison.

Model    r    IRKA            GFM             OPM             BTM             LMPV            SMM
FOM-1    1    4.2683 × 10−1   4.2709 × 10−1   4.2683 × 10−1   4.3212 × 10−1   —               —
FOM-1    2    3.9290 × 10−2   3.9299 × 10−2   3.9290 × 10−2   3.9378 × 10−2   —               —
FOM-1    3    1.3047 × 10−3   1.3107 × 10−3   1.3047 × 10−3   1.3107 × 10−3   —               —
FOM-2    3    1.171 × 10−1    1.171 × 10−1    Divergent       2.384 × 10−1    1.171 × 10−1    —
FOM-2    4    8.199 × 10−3    8.199 × 10−3    8.199 × 10−3    8.226 × 10−3    8.199 × 10−3    —
FOM-2    5    2.132 × 10−3    2.132 × 10−3    Divergent       2.452 × 10−3    2.132 × 10−3    —
FOM-2    6    5.817 × 10−5    5.817 × 10−5    5.817 × 10−5    5.822 × 10−5    2.864 × 10−4    —
FOM-3    1    4.818 × 10−1    4.818 × 10−1    4.818 × 10−1    4.848 × 10−1    —               4.818 × 10−1
FOM-3    2    2.443 × 10−1    2.443 × 10−1    Divergent       3.332 × 10−1    —               2.443 × 10−1
FOM-3    3    5.74 × 10−2     5.98 × 10−2     5.74 × 10−2     5.99 × 10−2     —               5.74 × 10−2
FOM-4    1    9.85 × 10−2     9.85 × 10−2     9.85 × 10−2     9.949 × 10−1    —               9.985 × 10−2



H2 MODEL REDUCTION 631

only arithmetic operations involved are LU decompositions and linear system solves.
Moreover, our method does not require starting from an initial balanced realization,
as suggested in [37] and [22]. In all these simulations, we have chosen a random initial
shift selection, and the algorithm converged in a small number of steps.
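The tabulated errors can be spot-checked without Lyapunov solves using the pole–residue formula for the H2 norm: for a stable SISO system H(s) = Σ_k φ_k/(s − λ_k) with simple poles, ‖H‖²_{H2} = Σ_k φ_k H(−λ_k). The sketch below does this for the FOM-1, r = 1 entry; the factored form of G was computed from the FOM-1 state-space data above, the optimal shift σ = 0.4952 is the value quoted in section 5.4, and the reduced residue is then fixed by the interpolation condition Gr(σ) = G(σ).

```python
import math

# Pole-residue spot-check of the FOM-1, r = 1 entry of Table 5.1.
# For stable SISO H(s) = sum_k phi_k/(s - lam_k) with simple poles,
# ||H||_{H2}^2 = sum_k phi_k * H(-lam_k).

poles_G = [-1.0, -3.0, -5.0, -10.0]          # eigenvalues of A for FOM-1

def G(s):                                    # G(s) = (s+4)/prod(s - poles)
    prod = 1.0
    for p in poles_G:
        prod *= s - p
    return (s + 4.0) / prod

sigma = 0.4952                               # optimal shift from section 5.4
phi_r = 2.0 * sigma * G(sigma)               # from Gr(sigma) = phi_r/(2 sigma) = G(sigma)

def Gr(s):
    return phi_r / (s + sigma)

def Err(s):                                  # transfer function of the error system
    return G(s) - Gr(s)

def residue(p, ps, num):
    """Residue of num(s)/prod(s - q) at the simple pole p."""
    prod = 1.0
    for q in ps:
        if q != p:
            prod *= p - q
    return num(p) / prod

res_G = [residue(p, poles_G, lambda p: p + 4.0) for p in poles_G]
norm_G = math.sqrt(sum(r * G(-p) for r, p in zip(res_G, poles_G)))

poles_E = poles_G + [-sigma]                 # error system: poles of G plus -sigma
res_E = res_G + [-phi_r]
norm_E = math.sqrt(sum(r * Err(-p) for r, p in zip(res_E, poles_E)))

rel_err = norm_E / norm_G
print(rel_err)    # about 0.4268, matching Table 5.1
```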
To illustrate the evolution of the H2 error throughout the iteration, consider the
model FOM-2 with r = 3. The proposed method yields the following third-order
optimal reduced model:
      G3(s) = (2.155s^2 + 3.343s + 33.8) / (s^3 + 7.457s^2 + 10.51s + 17.57).
Poles of G3(s) are λ̂1 = −6.2217 and λ̂2,3 = −6.1774 × 10−1 ± ı1.5628, and it can be
shown that G3(s) interpolates the first two moments of G(s) at −λ̂i for i = 1, 2, 3.
Hence, the first-order interpolation conditions are satisfied. This also means that if
we start Algorithm 4.1 with the mirror images of these Ritz values, the algorithm
converges at the first step. However, we will try four random, but bad, initial selec-
tions. In other words, we start away from the optimal solution. We test the following
four selections: S1 = {−1.01, − 2.01, − 30000}, S2 = {0, 10, 3}, S3 = {1, 10, 3},
and S4 = {0.01, 20, 10000}. With selection S1 , we have initiated the algorithm with
some negative shifts close to system poles, and consequently with a relative H2 error
bigger than 1. However, in all four cases including S1 , the algorithm converged in 5
steps to the same reduced model. The results are depicted in Figure 5.1.
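This interpolation property can be checked numerically from the printed data alone. The sketch below evaluates G (FOM-2) and the converged G3 at the mirror images of the printed poles of G3; since all coefficients above are rounded to a few digits, agreement is limited by that rounding rather than exact.

```python
# Rounding-level check that the converged G3 matches G (FOM-2) at the
# mirror images of its own poles.  All data are the rounded coefficients
# printed above, so agreement is only up to that rounding.

def polyval(coeffs, s):
    """Horner evaluation; coefficients given highest degree first."""
    val = 0.0
    for a in coeffs:
        val = val * s + a
    return val

num_G = [2, 11.5, 57.75, 178.625, 345.5, 323.625, 94.5]
den_G = [1, 10, 46, 130, 239, 280, 194, 60]
num_3 = [2.155, 3.343, 33.8]
den_3 = [1, 7.457, 10.51, 17.57]

def G(s):
    return polyval(num_G, s) / polyval(den_G, s)

def G3(s):
    return polyval(num_3, s) / polyval(den_3, s)

# mirror images of the printed poles of G3
points = [6.2217, 0.61774 + 1.5628j, 0.61774 - 1.5628j]
errors = [abs(G(s) - G3(s)) / abs(G(s)) for s in points]
print(errors)   # all small, at the level of the printed rounding
```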

[Figure 5.1: relative H2 error vs. iteration count k for the four initial shift selections
S1, S2, S3, and S4; with S1 the iteration starts with a relative H2 error larger than 1,
and all four runs converge within 5 steps.]

Fig. 5.1. H2 norm of the error system vs. the number of iterations.

Before testing the proposed method in large-scale settings, we investigate FOM-4
further. As pointed out in [34], since r = 1, the optimal H2 problem can be
formulated as a function of the reduced system pole alone. It was shown in [34] that
there are two local minima: (i) one corresponding to a reduced pole at −0.0052,
with reduced-order model Gl1(s) = 1.0313/(s + 0.0052) and a relative error of
0.9949, and (ii) one corresponding to a reduced pole at −4998, with reduced model
Gg1(s) = 9999/(s + 4998) and a relative error of 0.0985. It follows that the latter,
i.e., Gg1(s), is the global minimum. The first-order balanced truncation for FOM-4
can be easily computed as Gb1(s) = 1.0308/(s + 0.0052). Therefore, it is highly
likely that if one starts from a balanced realization, the algorithm would converge
to the local minimum Gl1(s).
This was indeed the case as reported in [34]. SMM converged to the local minimum
for all starting poles bigger than −0.47. On the other hand, SMM converged to the


global minimum when it was started with an initial pole smaller than −0.47. We have
observed exactly the same situation in our simulations. When we start from an initial
shift selection smaller than 0.48, IRKA converged to the local minimum. However,
when we start with any initial shift bigger than 0.48, the algorithm converged to the
global minimum in at most 3 steps. Therefore, for this example we were not able to
avoid the local minimum if we started from a bad shift. These observations perfectly
agree with the discussion of section 4.2. Note that the transfer function of FOM-4
can be written as
      G(s) = (10000s + 5000)/(s^2 + 5000s + 25) = 0.99/(s + 0.0050) + 9999/(s + 5000).
The pole at −5000 is the one corresponding to the large residue of 9999. Therefore, a
good initial shift is 5000. If we start the proposed algorithm with an initial shift
at or close to 5000, the algorithm converges to the global minimum.
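The partial-fraction data above follow directly from the quadratic formula and the simple-pole residue formula N(p)/D'(p); a quick check:

```python
import math

# Poles and residues of FOM-4, G(s) = (10000 s + 5000)/(s^2 + 5000 s + 25),
# confirming the decomposition displayed above: the fast pole near -5000
# carries the dominant residue (about 9999).

disc = math.sqrt(5000.0 ** 2 - 4.0 * 25.0)
p_slow = (-5000.0 + disc) / 2.0      # about -0.0050
p_fast = (-5000.0 - disc) / 2.0      # about -5000

def residue(p):
    # residue N(p)/D'(p) of a simple pole of N(s)/D(s)
    return (10000.0 * p + 5000.0) / (2.0 * p + 5000.0)

print(p_slow, residue(p_slow))   # residue about 0.99
print(p_fast, residue(p_fast))   # residue about 9999
```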
5.2. CD player example. The original model describes the dynamics between
a lens actuator and the radial arm position in a portable CD player. The model has
120 states, i.e., n = 120, with a single input and a single output. As illustrated in [4],
the Hankel singular values of this model do not decay rapidly and hence the model is
relatively hard to reduce. Moreover, even though the Krylov-based methods resulted
in good local behavior, they are observed to yield large H∞ and H2 errors compared
to balanced truncation.
We compare the performance of the proposed method, Algorithm 4.1, with that of
balanced truncation. Balanced truncation is well known to lead to small H∞ and H2
error norms; see [4, 19]. This is due mainly to global information available through
the two system Gramians, the reachability and observability Gramians, which are
each solutions of a different Lyapunov equation. We reduce the order to r, with r
varying from 2 to 40; and for each r value, we compare the H2 error norms due
to balanced truncation and due to Algorithm 4.1. For the proposed algorithm, two
different selections have been tried for the initial shifts: (1) mirror images of the
eigenvalues corresponding to large residues, and (2) a random selection with real
parts in the interval [10^−1, 10^3] and the imaginary parts in the interval [1, 10^5]. To
make this selection, we looked at the poles of G(s) having the maximum/minimum
real and imaginary parts. The results showing the relative H2 error for each r are
depicted in Figure 5.2. The figure reveals that both selection strategies work quite
well. Indeed, the random initial selection behaves better than the residue-based
selection and outperforms balanced truncation for almost all the r values except r =
2, 24, 36. However, even for these r values, the resulting H2 error is not far from
the one due to balanced truncation. For the range r ∈ [12, 22], the random selection
clearly outperforms the balanced truncation. We would like to emphasize that these
results were obtained by a random shift selection and staying in the numerically
effective Krylov projection framework without requiring any solutions to large-scale
Lyapunov equations. This is the main difference our proposed algorithm has with
existing methods and what makes it numerically effective in large-scale settings.
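One concrete way to realize the random strategy just described is to draw the real and imaginary parts log-uniformly from the stated intervals and close the set under conjugation, so that the projection matrices can be kept real. The log-uniform sampling is our own illustrative choice; the paper specifies only the intervals.

```python
import random

# Illustrative random initial-shift generator: real parts log-uniform in
# [1e-1, 1e3], imaginary parts log-uniform in [1, 1e5], set closed under
# conjugation.  The sampling scheme is our own choice, not the paper's.

def random_shifts(r, seed=0):
    rng = random.Random(seed)
    shifts = []
    while len(shifts) < r:
        re = 10.0 ** rng.uniform(-1.0, 3.0)
        if r - len(shifts) >= 2:
            im = 10.0 ** rng.uniform(0.0, 5.0)
            shifts.extend([complex(re, im), complex(re, -im)])
        else:
            shifts.append(complex(re, 0.0))   # odd leftover: one real shift
    return shifts

shifts = random_shifts(8)
print(shifts)
```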
To examine convergence behavior, we reduce the order to r = 8 and r = 10 using
Algorithm 4.1. At each step of the iteration, we compute the H2 error due to the
current estimate and plot this error versus the iteration index. The results are shown
in Figure 5.3. The figure illustrates two important properties for both cases r = 8
and r = 10: (1) At each step of the iteration, the H2 norm of the error is reduced.
(2) The algorithm converges after 3 steps. The resulting reduced models are stable
for both cases.


[Figure 5.2: relative H2 error vs. reduced order r for balanced truncation (BT), the
large-residue shift selection (Large φi), and the random shift selection (Random).]

Fig. 5.2. Relative H2 norm of the error system vs. r.

[Figure 5.3: H2 error of the current iterate vs. iteration index k for r = 8 and r = 10;
the error decreases at every step and settles after 3 steps.]

Fig. 5.3. H2 norm of the error system vs. the number of iterations.

5.3. A semidiscretized heat transfer problem for optimal cooling of steel
profiles. This problem arises during a cooling process in a rolling mill when
different steps in the production process require different temperatures of the raw
material. To achieve high throughput, one seeks to reduce the temperature as fast
as possible to the required level before entering the next production phase. This
is realized by spraying cooling fluids on the surface and must be controlled so that
material properties, such as durability or porosity, stay within given quality standards.
The problem is modeled as boundary control of a two-dimensional heat equation. A
finite element discretization using two steps of mesh refinement with maximum mesh
width of 1.382 × 10−2 results in a system of the form
      Eẋ(t) = Ax(t) + Bu(t),   y(t) = Cx(t),

with state dimension n = 20209, i.e., A, E ∈ R^{20209×20209}, B ∈ R^{20209×7}, C ∈
R^{6×20209}. Note that in this case E ≠ I, but the algorithm works with the obvious


modifications. For details regarding the modeling, discretization, optimal control
design, and model reduction for this example, see [29, 7, 8]. We consider the full-order
SISO system that associates the sixth input of this system with the second output.
We apply our algorithm and reduce the order to r = 6. Amplitude Bode plots of
G(s) and Gr (s) are shown in Figure 5.4. The output response of Gr (s) is virtually
indistinguishable from G(s) in the frequency range considered. IRKA converged in
7 iteration steps in this case, although some interpolation points converged in the
first 2–3 steps. The relative H∞ error obtained with our sixth order system was
7.85 × 10−3 . Note that in order to apply balanced truncation in this example, one
would need to solve two generalized Lyapunov equations (since E ≠ I) of order 20209.
This presents a severe computational challenge, though there have been interesting
approaches to addressing it (e.g., [5]).
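The "obvious modifications" for E ≠ I amount to building V = (σE − A)⁻¹b and W = (σE − A)⁻ᵀcᵀ and projecting all four matrices (E, A, b, c). The 3 × 3 data below are an illustrative toy, not the heat-transfer model; the useful point it demonstrates is that, by construction, the projected model matches G at the shift for any σ, which the final check verifies.

```python
# Toy sketch of one rational Krylov projection step with a mass matrix
# E != I.  The 3x3 data are illustrative only.  The projection matches
# G(sigma) exactly for ANY shift sigma, by construction.

def solve(M0, rhs):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(M0)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M0)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

E = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
A = [[-1.0, 0.5, 0.0], [0.0, -2.0, 1.0], [0.1, 0.0, -3.0]]
b = [1.0, 0.0, 1.0]
c = [0.0, 1.0, 0.0]
n = 3

def G(s):
    S = [[s * E[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    return sum(ci * xi for ci, xi in zip(c, solve(S, b)))

sigma = 1.0
S = [[sigma * E[i][j] - A[i][j] for j in range(n)] for i in range(n)]
St = [[S[j][i] for j in range(n)] for i in range(n)]
v = solve(S, b)                  # V = (sigma E - A)^{-1} b
w = solve(St, c)                 # W = (sigma E - A)^{-T} c^T

# project all four matrices: Er = W^T E V, Ar = W^T A V, br = W^T b, cr = c V
er = sum(w[i] * sum(E[i][j] * v[j] for j in range(n)) for i in range(n))
ar = sum(w[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
br = sum(wi * bi for wi, bi in zip(w, b))
cr = sum(ci * vi for ci, vi in zip(c, v))

def Gr(s):
    return cr * br / (s * er - ar)

print(abs(G(sigma) - Gr(sigma)))   # zero up to rounding: interpolation at the shift
```

The next shifts would come from the generalized eigenvalues of (Ar, Er); for r = 1 this is simply −ar/er.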

[Figure 5.4: amplitude Bode plots |H(ıω)| of the full-order and reduced models over
the frequency range 10^−8 to 10^2 rad/sec; the two curves are virtually
indistinguishable.]

Fig. 5.4. Amplitude Bode plots of H(s) and Hr (s).

5.4. Successive substitution vs. Newton framework. In this section, we
present two low-order examples to show the effect of the Newton formulation for
IRKA.
The first example is FOM-1 from section 5.1. For this example, for reduction
to r = 1, the optimal shift is σ = 0.4952. We initiate both iterations, successive
substitution and Newton frameworks, away from this optimal value with an initial
selection σ0 = 10^4. Figure 5.5 illustrates how each process converges. As the figure
shows, even though it takes almost 15 iterations with oscillations for the successive
substitution framework to converge, the Newton formulation reaches the optimal shift
in 4 steps.
The second example in this section is a third-order model with a transfer function
      G(s) = (−s^2 + (7/4)s + 5/4) / (s^3 + 2s^2 + (17/16)s + 15/32).
One can exactly compute the optimal H2 reduced model for r = 1 as
      Gr(s) = 0.97197 / (s + 0.2727272)


[Figure 5.5: shift evolution vs. iteration count for FOM-1 with σ0 = 10^4; the
successive substitution framework (top) oscillates for about 15 iterations before
converging, while the Newton iteration framework (bottom) reaches the optimal
shift in 4 steps.]

Fig. 5.5. Comparison for FOM-1.

[Figure 5.6: shift evolution for the third-order model; the successive substitution
framework, started at σ0 = 0.27 near the optimal shift, fails to converge (top),
while the Newton framework started at σ0 = 2000 converges (bottom).]

Fig. 5.6. Comparison for the random third-order model.

and easily show that this reduced model interpolates G(s) and its derivative at σ =
0.2727272. We initiate Algorithm 4.1 with σ0 = 0.27, very close to the optimal shift.
We initiate the Newton framework at σ0 = 2000, far away from the optimal solution.
Convergence behavior of both frameworks is depicted in Figure 5.6. The figure shows
that for this example, the successive substitution framework is divergent and indeed



∂λ/∂σ ≈ 1.3728 > 1, consistent with divergence of the fixed-point iteration. On the
other hand, the Newton framework is able to converge to the optimal solution in a
small number of steps.
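This contrast can be reproduced in a few lines for the third-order example. For r = 1, a short calculation with V = (σI − A)⁻¹b and W = (σI − A)⁻ᵀcᵀ shows that the projected reduced pole is λ(σ) = σ + G(σ)/G′(σ); successive substitution iterates σ ← −λ(σ), while the Newton framework solves F(σ) = σ + λ(σ) = 0. The finite-difference derivative below is our own simplification of the paper's Newton variant, not the authors' implementation.

```python
# Scalar reproduction of the two frameworks for r = 1 on the third-order
# model above.  For r = 1 the projected reduced pole is
# lambda(sigma) = sigma + G(sigma)/G'(sigma).

def G(s):
    return (-s * s + 1.75 * s + 1.25) / (s ** 3 + 2 * s ** 2 + 1.0625 * s + 0.46875)

def dG(s):
    # exact quotient-rule derivative of G
    N, Np = -s * s + 1.75 * s + 1.25, -2.0 * s + 1.75
    D, Dp = s ** 3 + 2 * s ** 2 + 1.0625 * s + 0.46875, 3 * s ** 2 + 4 * s + 1.0625
    return (Np * D - N * Dp) / (D * D)

def lam(sigma):
    return sigma + G(sigma) / dG(sigma)

sigma_ss = 0.27                      # successive substitution, started near the optimum
for _ in range(8):
    sigma_ss = -lam(sigma_ss)        # oscillates away from 0.2727272

def F(sigma):                        # fixed-point condition as a root problem
    return sigma + lam(sigma)

sigma_nt = 2000.0                    # Newton, started far from the optimum
for _ in range(60):
    h = 1e-7 * max(1.0, abs(sigma_nt))
    dF = (F(sigma_nt + h) - F(sigma_nt - h)) / (2.0 * h)
    sigma_nt -= F(sigma_nt) / dF

print(sigma_ss, sigma_nt)   # Newton lands near 0.27273; successive substitution does not
```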
6. Conclusions. We have developed an interpolation-based rational Krylov
algorithm that iteratively corrects interpolation locations until first-order H2 optimality
conditions are satisfied. The resulting method proves numerically effective and well
suited for large-scale problems. A new derivation of the interpolation-based necessary
conditions is presented and shown to be equivalent to two other common frameworks
for H2 optimality.
Appendix.
Lemma A.1. For any stable matrix M,
      P.V. ∫_{−∞}^{+∞} (ıω − M)^{−1} dω := lim_{L→∞} ∫_{−L}^{L} (ıω − M)^{−1} dω = πI.

Proof. Observe that for any L > 0,

      ∫_{−L}^{L} (ıω − M)^{−1} dω = ∫_{−L}^{L} (−ıω − M)(ω²I + M²)^{−1} dω = ∫_{−L}^{L} (−M)(ω²I + M²)^{−1} dω,

since the odd term −ıω(ω²I + M²)^{−1} integrates to zero over the symmetric interval.

Fix a contour Γ contained in the open left half plane so that the interior of Γ
contains all eigenvalues of M. Then

      −M(ω²I + M²)^{−1} = (1/2πı) ∮_Γ (−z/(ω² + z²)) (zI − M)^{−1} dz.

For any fixed value z in the left half plane,

      P.V. ∫_{−∞}^{+∞} dω/(ıω − z) = lim_{L→∞} ∫_{−L}^{L} (−z/(ω² + z²)) dω = π.

Thus,

      lim_{L→∞} ∫_{−L}^{L} (−M)(ω²I + M²)^{−1} dω = (1/2πı) ∮_Γ ( lim_{L→∞} ∫_{−L}^{L} −z/(ω² + z²) dω ) (zI − M)^{−1} dz
                                                  = π · (1/2πı) ∮_Γ (zI − M)^{−1} dz = πI.
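The scalar identity used in the proof can also be checked numerically: for Re z < 0, the symmetric trapezoid sum below realizes the principal value (the imaginary, odd part cancels by symmetry), with a truncation error of order 1/L. The choice z = −1 and the discretization are of course only an illustration.

```python
import math

# Numerical illustration of the scalar identity in the proof of Lemma A.1:
# for Re z < 0, P.V. int dw/(i w - z) = pi.  Symmetric truncation at +-L
# realizes the principal value; only the real part -z/(w^2 + z^2) survives.

z = -1.0
L, steps = 1000.0, 200000
h = 2.0 * L / steps
total = 0.0
for k in range(steps + 1):
    w = -L + k * h
    f = -z / (w * w + z * z)                  # real part of 1/(i w - z)
    total += (0.5 if k in (0, steps) else 1.0) * f * h   # trapezoid weight

print(total, math.pi)   # total = 2*atan(L), within O(1/L) of pi
```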

REFERENCES

[1] A.C. Antoulas, Recursive modeling of discrete-time time series, in Linear Algebra for Control
Theory, P. Van Dooren and B. W. Wyman, eds., IMA Vol. Math. Appl. 62, Springer-Verlag,
New York, 1993, pp. 1–20.
[2] A.C. Antoulas, Approximation of Large-Scale Dynamical Systems, Adv. Des. Control 6,
SIAM, Philadelphia, 2005.
[3] A.C. Antoulas and J.C. Willems, A behavioral approach to linear exact modeling, IEEE
Trans. Automat. Control, 38 (1993), pp. 1776–1802.
[4] A.C. Antoulas, D.C. Sorensen, and S. Gugercin, A survey of model reduction methods
for large scale systems, in Structured Matrices in Mathematics, Computer Science, and
Engineering, I (Boulder, CO, 1999), Contemp. Math. 280, AMS, Providence, RI, 2001,
pp. 193–219.
[5] J. Badia, P. Benner, R. Mayo, E.S. Quintana-Ortı́, G. Quintana-Ortı́, and J. Saak,
Parallel order reduction via balanced truncation for optimal cooling of steel profiles, in
Proceedings of the 11th International European Conference on Parallel Processing, EuroPar
2005, Lisbon, J. C. Cunha and P. D. Medeiros, eds., Lecture Notes in Comput. Sci. 3648,
Springer-Verlag, Berlin, 2005, pp. 857–866.


[6] L. Baratchart, M. Cardelli, and M. Olivi, Identification and rational L2 approximation:
A gradient algorithm, Automat., 27 (1991), pp. 413–418.
[7] P. Benner, Solving large-scale control problems, IEEE Control Systems Mag., 24 (2004),
pp. 44–59.
[8] P. Benner and J. Saak, Efficient numerical solution of the LQR-problem for the heat equation,
Proc. Appl. Math. Mech., 4 (2004), pp. 648–649.
[9] A.E. Bryson and A. Carrier, Second-order algorithm for optimal model order reduction, J.
Guidance Control Dynam., 13 (1990), pp. 887–892.
[10] A. Bunse-Gerstner, D. Kubalinska, G. Vossen, and D. Wilczek, H2 -optimal Model Re-
duction for Large Scale Discrete Dynamical MIMO Systems, ZeTeM Technical Report 07-
04, University of Bremen, 2007; available online from http://www.math.uni-bremen.de/
zetem/reports/reports-liste.html#reports2007.
[11] C. De Villemagne and R. Skelton, Model reduction using a projection formulation, Internat.
J. Control, 40 (1987), pp. 2141–2169.
[12] P. Feldman and R.W. Freund, Efficient linear circuit analysis by Padé approximation via a
Lanczos method, IEEE Trans. Computer-Aided Design, 14 (1995), pp. 639–649.
[13] P. Fulcheri and M. Olivi, Matrix rational H2 approximation: A gradient algorithm based
on Schur analysis, SIAM J. Control Optim., 36 (1998), pp. 2103–2127.
[14] D. Gaier, Lectures on Complex Approximation, Birkhäuser, Boston, 1987.
[15] K. Gallivan, E. Grimme, and P. Van Dooren, A rational Lanczos algorithm for model
reduction, Numer. Algorithms, 2 (1996), pp. 33–63.
[16] K. Gallivan, P. Van Dooren, and E. Grimme, On some recent developments in projection-
based model reduction, in ENUMATH 97 (Heidelberg), World Scientific, River Edge, NJ,
1998, pp. 98–113.
[17] E.J. Grimme, Krylov Projection Methods for Model Reduction, Ph.D. thesis, University of
Illinois, Urbana-Champaign, Urbana, IL, 1997.
[18] S. Gugercin, Projection Methods for Model Reduction of Large-Scale Dynamical Systems,
Ph.D. thesis, Rice University, Houston, TX, 2002.
[19] S. Gugercin and A.C. Antoulas, A comparative study of 7 model reduction algorithms, in
Proceedings of the 39th IEEE Conference on Decision and Control, Sydney, 2000, pp. 2367–
2372.
[20] S. Gugercin and A.C. Antoulas, An H2 error expression for the Lanczos procedure, in
Proceedings of the 42nd IEEE Conference on Decision and Control, 2003, pp. 1869–1872.
[21] Y. Halevi, Frequency weighted model reduction via optimal projection, in Proceedings of the
29th IEEE Conference on Decision and Control, 1990, pp. 2906–2911.
[22] D.C. Hyland and D.S. Bernstein, The optimal projection equations for model reduction
and the relationships among the methods of Wilson, Skelton, and Moore, IEEE Trans.
Automat. Control, 30 (1985), pp. 1201–1211.
[23] J.G. Korvink and E.B. Rudyni, Oberwolfach benchmark collection, in Dimension Reduction of
Large-Scale Systems, P. Benner, G. Golub, V. Mehrmann, and D. Sorensen, eds., Lecture
Notes in Comput. Sci. Engrg. 45, Springer-Verlag, New York, 2005.
[24] D. Kubalinska, A. Bunse-Gerstner, G. Vossen, and D. Wilczek, H2 -optimal interpolation
based model reduction for large-scale systems, in Proceedings of the 16th International
Conference on System Science, Wroclaw, Poland, 2007.
[25] A. Lepschy, G.A. Mian, G. Pinato, and U. Viaro, Rational L2 approximation: A non-
gradient algorithm, in Proceedings of the 30th IEEE Conference on Decision and Control,
1991, pp. 2321–2323.
[26] L. Meier and D.G. Luenberger, Approximation of linear constant systems, IEEE Trans.
Automat. Control, 12 (1967), pp. 585–588.
[27] B.C. Moore, Principal component analysis in linear system: Controllability, observability and
model reduction, IEEE Trans. Automat. Control, 26 (1981), pp. 17–32.
[28] C.T. Mullis and R.A. Roberts, Synthesis of minimum roundoff noise fixed point digital
filters, IEEE Trans. Circuits Systems, CAS-23 (1976), pp. 551–562.
[29] T. Penzl, Algorithms for model reduction of large dynamical systems, Linear Algebra Appl.,
415 (2006), pp. 322–343.
[30] L.T. Pillage and R.A. Rohrer, Asymptotic waveform evaluation for timing analysis, IEEE
Trans. Computer-Aided Design, 9 (1990), pp. 352–366.
[31] V. Raghavan, R.A. Rohrer, L.T. Pillage, J.Y. Lee, J.E. Bracken, and M.M. Alaybeyi,
AWE inspired, in Proceedings of the IEEE Custom Integrated Circuits Conference, 1993,
pp. 18.1.1–18.1.8.
[32] F. Riesz and B. Sz.-Nagy, Functional Analysis, Ungar, New York, 1955.
[33] A. Ruhe, Rational Krylov algorithms for nonsymmetric eigenvalue problems II: Matrix pairs,
Linear Algebra Appl., 197 (1994), pp. 283–295.


[34] J.T. Spanos, M.H. Milman, and D.L. Mingori, A new algorithm for L2 optimal model re-
duction, Automat., 28 (1992), pp. 897–909.
[35] P. Van Dooren, K.A. Gallivan, and P.-A. Absil, H2 optimal model reduction of MIMO
systems, Appl. Math. Lett., to appear; doi:10.1016/j.aml.2007.09.015.


[36] D.A. Wilson, Optimum solution of model reduction problem, Proc. IEE-D, 117 (1970),
pp. 1161–1165.
[37] W.-Y. Yan and J. Lam, An approximate approach to H2 optimal model reduction, IEEE Trans.
Automat. Control, 44 (1999), pp. 1341–1358.
[38] A. Yousouff and R.E. Skelton, Covariance equivalent realizations with applications to model
reduction of large-scale systems, in Control and Dynamic Systems, Vol. 22, C. T. Leondes,
ed., Academic Press, New York, 1985, pp. 273–348.
[39] A. Yousouff, D.A. Wagie, and R.E. Skelton, Linear system approximation via covariance
equivalent realizations, J. Math. Anal. Appl., 196 (1985), pp. 91–115.
[40] D. Zigic, L. Watson, and C. Beattie, Contragredient transformations applied to the optimal
projection equations, Linear Algebra Appl., 188/189 (1993), pp. 665–676.
