
MULTISCALE MODEL. SIMUL.                                  © 2010 Society for Industrial and Applied Mathematics
Vol. 8, No. 4, pp. 1154–1177

ON THE APPROXIMATION QUALITY OF MARKOV STATE MODELS∗

MARCO SARICH†, FRANK NOɆ, AND CHRISTOF SCHÜTTE†
Abstract. We consider a continuous-time Markov process on a large continuous or discrete state


space. The process is assumed to have strong enough ergodicity properties and to exhibit a number
of metastable sets. Markov state models (MSMs) are designed to represent the effective dynamics of
such a process by a Markov chain that jumps between the metastable sets with the transition rates
of the original process. MSMs have been used for a number of applications, including molecular
dynamics, for more than a decade. Their approximation quality, however, has not yet been fully
understood. In particular, it would be desirable to have a sharp error bound for the difference in
propagation of probability densities between the MSM and the original process on long timescales.
Here, we provide such a bound for a rather general class of Markov processes ranging from diffusions
in energy landscapes to Markov jump processes on large discrete spaces. Furthermore, we discuss
how this result provides formal support or shows the limitations of algorithmic strategies that have
been found to be useful for the construction of MSMs. Our findings are illustrated by numerical
experiments.

Key words. Markov state model, biomolecular dynamics conformations, metastable sets, effec-
tive dynamics, transfer operator, spectral gap, lag time, diffusive dynamics

AMS subject classifications. Primary, 60J99; Secondary, 62M05

DOI. 10.1137/090764049

1. Introduction. We consider a continuous-time Markov process (Xt )t∈R on


some state space E which may be continuous or discrete but large. We assume that
the process is sufficiently ergodic in E (see Remark 2.1 in section 2.2) such that there
is a unique invariant measure μ. The associated semigroup of transfer operators is
denoted (Tt )t∈R ; the transfer operators describe how the process propagates functions
in the state space; e.g., if v0 is a function at time t = 0, then Tt v0 is the function that
results from the time-t transport of v0 via the underlying dynamics.
Markov state models (MSMs) have been considered for processes that have meta-
stable dynamics [1, 2, 3, 4], especially in molecular dynamics. Recently the interest
in MSMs has drastically increased since it has been demonstrated that MSMs can
be constructed even for very high-dimensional systems [2] and have been especially
useful for modeling the interesting slow dynamics of biomolecules [5, 6, 7, 8, 9, 10] and
materials [11] (there under the name “kinetic Monte Carlo”). Their approximation
quality on large timescales has been rigorously studied, e.g., for Brownian or Glauber
dynamics and Ising models in the limit of vanishing smallness parameters (noise in-
tensity, temperature) where the analysis can be based on large deviation estimates
and variational principles [12, 13] and/or potential theory and capacities [14, 15].
In these cases, the effective dynamics is governed by some MSM with exponentially
small transition probabilities and its states label the different attractors of the under-
lying, unperturbed dynamical systems. Other approaches have tried to understand
the multidimensional setting for complex dynamical systems by generalizing Kramers'
∗ Received by the editors July 6, 2009; accepted for publication (in revised form) March 15,
2010; published electronically May 28, 2010. This work was supported by the DFG research center
Matheon “Mathematics for key technologies” in Berlin.
http://www.siam.org/journals/mms/8-4/76404.html
† Freie Universität Berlin, Institut für Mathematik II, Arnimallee 2-6, 14195 Berlin, Germany

(sarich@math.fu-berlin.de, noe@math.fu-berlin.de, schuette@math.fu-berlin.de).



approach, e.g., by discussing asymptotic expansions based on the Wentzel–Kramers–


Brillouin approximation in semiclassical quantum dynamics, matched asymptotics, or
similar techniques; see, e.g., [16, 17]. Another rigorous approach to the construction

of an MSM involves the exploitation of spectral properties. The relation between


dominant eigenvalues, exit times and rates, and metastable sets has been studied by
asymptotic expansions in certain smallness parameters as well as by functional an-
alytic means without any relation to smallness parameters [18, 19, 3, 4, 1]. In real
applications with high-dimensional state spaces, asymptotic expansions are based on
assumptions that typically cannot be checked and often enough are not satisfied, in-
volve quantities that cannot be computed, and/or are rather specific for a certain class
of processes. Even if a smallness parameter can be defined, we typically cannot check
whether we are in the asymptotic regime such that the theoretical results cannot be
used for error estimates. A general approach for assessing the approximation quality
of an MSM is still missing. Here, we will pursue this by following the functional
analytic approach found in [20, 19, 3].
In order to explain the construction of an MSM, we first fix a lag time τ > 0 and
consider some finite subdivision A1 , . . . , An of E, such that

(1)   ⋃_{j=1}^{n} Aj = E,   Ai ∩ Aj = ∅ ∀ i ≠ j,

with “nice” sets Aj (e.g., with a Lipschitz boundary). We introduce the discrete
process (X̂k )k∈N on the finite state space Ê = {1, . . . , n} by setting

(2) X̂k = i ⇔ Xkτ ∈ Ai .

(X̂k ) describes the snapshot dynamics of the continuous process (Xt ) with lag time τ
between the sets A1 , . . . , An . This process (X̂k ) is generally not Markovian, i.e.,

(3)   P[X̂k+1 = j | X̂k = ik , X̂k−1 = ik−1 , . . . , X̂0 = i0 ] ≠ P[X̂k+1 = j | X̂k = ik ].

However, MSMs attempt to approximate this process via a discrete Markov process
(X̃k )k∈N on Ê = {1, . . . , n} defined by the transition matrix P with entries

(4) Pτ (i, j) = Pμ [X̂1 = j|X̂0 = i] = Pμ [Xτ ∈ Aj |X0 ∈ Ai ].

While the long-term dynamical behavior of the original process (Xkτ)k∈N is governed
by Tkτ = Tτ^k for k ∈ N, the long-term dynamics of the MSM process (X̃k)k∈N is
governed by Pτ^k. Thus, for assessing the approximation quality of the MSM compared
to the original process, we have to study the error

(5)   E(k) = dist(Tτ^k , Pτ^k),

where dist denotes an appropriate metric measuring the difference between the op-
erators. We will see that under strong enough ergodicity conditions on the original
Markov chain (see Remark 2.1) we have E(k) ≤ 2ρ^k for some 0 < ρ < 1. However, we
are interested in how E(k) depends on the lag time τ and the sets A1 , . . . , An such
that the error E can be kept below a user-defined threshold.
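
In practice, the entries (4) are usually estimated from simulation data. As a rough
illustration (an added sketch, not part of the original paper), the following Python
snippet builds such a transition matrix from a trajectory that has already been assigned
to the sets A1 , . . . , An; the simple row-normalized count estimator shown here is only
one possible choice and ignores the statistical issues discussed in section 4.4.

```python
# Minimal sketch: estimate P_tau(i, j) = P[X_{(k+1)tau} in A_j | X_{k tau} in A_i]
# from a trajectory already discretized into set indices 0..n-1 at time resolution tau.
import numpy as np

def estimate_transition_matrix(dtraj, n_sets, lag=1):
    """Count transitions at the given lag and row-normalize."""
    counts = np.zeros((n_sets, n_sets))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1.0
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0.0] = 1.0          # avoid division by zero for unvisited sets
    return counts / rows

# hypothetical usage: dtraj = np.array([0, 0, 1, 2, 2, 1, ...])
# P = estimate_transition_matrix(dtraj, n_sets=3, lag=5)
```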
The remainder of the article is organized as follows. In section 2 we introduce the
setting and give the general definition of MSM transfer operators. Then, in section 3
we compare the densities of the random variables (X̂k ) to the densities of the MSM

Copyright © by SIAM. Unauthorized reproduction of this article is prohibited.


1156 MARCO SARICH, FRANK NOÉ, AND CHRISTOF SCHÜTTE

process (X̃k ) and see how the approximation quality of these densities depends on
the choice of the state space discretization A1 , . . . , An and the lag time τ . Section 4
extends our findings to some algorithmic strategies for the construction of MSMs

that are discussed in the literature. Finally, the results are illustrated in numerical
examples in section 5.
2. The MSM transfer operator.
2.1. Setting. We consider the transfer operators Tt as operators on L²μ =
{v : E → R : ∫ v² dμ < ∞} with scalar product ⟨v, w⟩ = ∫ vw dμ. In the following
‖·‖ will denote the associated norm, ‖v‖² = ⟨v, v⟩, on L²μ and the corresponding
operator norm ‖B‖ = max_{‖v‖=1} ‖Bv‖ of an operator B : L²μ → L²μ.
In L²μ, Tτ has the general form

(6)   ∫_C Tτ v(y) μ(dy) = ∫_E P[Xτ ∈ C | X0 = x] v(x) μ(dx)   ∀ measurable C ⊂ E

such that Tτ 1 = 1, where 1(x) = 1 for all x ∈ E. In the following we set T := Tτ.
Please note that if the transition function is absolutely continuous, i.e., if for all
measurable sets C

P[Xτ ∈ C | X0 = x] = ∫_C p(τ, x, y) μ(dy),

then the above definition has the much simpler form

Tτ v(y) = ∫_E p(τ, x, y) v(x) μ(dx).

2.2. Assumptions on the original process. Now let us assume that T has
m real eigenvalues λ1 , . . . , λm ∈ R,

(7) λ0 = 1 > λ1 ≥ λ2 ≥ · · · ≥ λm ,

with an orthonormal system of eigenvectors (uj)j=1,...,m , i.e.,

(8)   T uj = λj uj ,   ⟨ui , uj⟩ = { 1, i = j;  0, i ≠ j },

and u0 = 1. Furthermore, we assume that the remainder of the spectrum of T lies


within a ball Br (0) ⊂ C with radius r < λm . In order to keep track of the dependence
of the eigenvalues on the lag time τ , we introduce the associated rates

(9) λj = exp(−Λj τ ), r = exp(−Rτ ), r/λ1 = exp(−τ (R − Λ1 )) = exp(−τ Δ),

of which the spectral gap Δ > 0 will play an essential role later. We should empha-
size that the notion “spectral gap” is usually used differently. It usually designates
a situation in which an entire interval of the real axis does not contain any eigen-
values, whereas the intervals above and below show a significantly denser population
of eigenvalues. Despite the obvious difference in our case, we will adopt the name
spectral gap for Δ since it plays a role in finding upper bounds similar to that of the
usual spectral gaps.
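
As a small illustration of the bookkeeping in (9) (an added sketch, not from the paper),
the following Python snippet converts eigenvalues of Tτ and a bound r on the remaining
spectrum into the implied rates and the spectral gap Δ; the numerical values in the usage
example are those reported later in section 5.1.

```python
# Sketch of the rate bookkeeping in (9): given dominant eigenvalues of T_tau and a
# bound r on the remaining spectrum, recover the implied rates and the gap Delta.
import numpy as np

def rates_and_gap(eigvals, r, tau):
    eigvals = np.asarray(eigvals, dtype=float)
    Lambda = -np.log(eigvals) / tau       # lambda_j = exp(-Lambda_j * tau)
    R = -np.log(r) / tau                  # r = exp(-R * tau)
    Delta = R - Lambda[0]                 # eta(tau) = r / lambda_1 = exp(-tau * Delta)
    return Lambda, R, Delta

# example values from section 5.1 (tau = 0.1, lambda_1 = 0.9801, r = 0.1947):
Lambda, R, Delta = rates_and_gap([0.9801], 0.1947, 0.1)
# Lambda[0] ~ 0.201, R ~ 16.36, Delta ~ 16.16, matching the reported rates.
```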


Based on the above assumptions we can write

(10)   T v = T Πv + T Π⊥ v = Σ_{j=0}^{m} λj ⟨v, uj⟩ uj + T Π⊥ v,

where Π is the orthogonal projection onto U = span{u0 , u1 , . . . , um },

(11)   Πv = Σ_{j=0}^{m} ⟨v, uj⟩ uj ,

and Π⊥ = Id − Π is the projection error with

(12)   ‖T Π⊥‖ ≤ r < λm ,   spec(T) \ {1, λ1 , . . . , λm } ⊂ Br(0) ⊂ C.

Furthermore, we assume that the subspace U and the remaining subspace do not mix
under the action of T :

(13) ΠT Π⊥ = Π⊥ T Π = 0,

and therefore the dynamics can be studied by considering the dynamics of both sub-
spaces separately:

(14) T k = (T Π)k + (T Π⊥ )k ∀k ≥ 0,

where the operator T Π is self-adjoint because of (8). Note that Π⊥ T Π = 0 in (13)


is always true, but ΠT Π⊥ = 0 is an assumption. Reversible processes definitely have
this property (see Remark 2.1), so the theory can be applied to most processes in
molecular dynamics applications. Nevertheless it is not completely clear which other
classes of processes might match the condition (13).
In addition we also define the orthogonal projection Π0 as

(15)   Π0 v := ⟨v, u0⟩ u0 = ⟨v, 1⟩ 1.

According to the above we have the asymptotic convergence rate ‖T^k − Π0‖ = λ1^k for
all k ∈ N.
Remark 2.1. The assumptions (7), (8), (12), and (13) are definitely satisfied if
T is sufficiently ergodic and is self-adjoint (T is self-adjoint if the underlying original
Markov process (Xt ) is reversible). But it may also be sufficient if, e.g., (Xt ) is suf-
ficiently ergodic and has a dominant self-adjoint part, as is the case for second-order
Langevin dynamics with not too large friction [21] or for thermostatted Hamiltonian
molecular dynamics or stochastically perturbed Hamiltonian systems [3, 22]. Re-
versible or not, the property of being “sufficiently ergodic” seems to be central in any
case. We will now give sufficient conditions in technical terms for a reversible process.
These results and their generalizations to nonreversible cases can be found in [23, 3].
• A reversible and μ-irreducible process (Xt ) is sufficiently ergodic if one of the
following scenarios holds:
(i) (Xt ) is V -ergodic or geometrically ergodic; see [3].
(ii) The stochastic transition function p(t, x, ·) = pa(t, x, ·) + ps(t, x, ·) associ-
ated with (Xt), where pa denotes the absolutely continuous part and ps
the singular part, satisfies the following two conditions: (a) pa ∈ L^r_{μ×μ}
for some 2 < r < ∞, and (b) Sv(y) = ∫ v(x) pa(t, x, y) μ(dx) satisfies
‖S‖_{2,μ} > 0.


The above conditions guarantee mainly that the essential spectrum of T is


contained in some circle with radius strictly smaller than 1.
• There are many processes for which these conditions can be shown to be

valid; an example is a diffusion process in a smooth energy landscape V
with V(x) → ∞ for |x| → ∞ fast enough; in this case the spectrum is known
to be discrete and real-valued. Comparable results (discrete and real-valued
dominant spectrum) can be found in [21] for second-order Langevin dynamics
with not too large friction.
2.3. MSM projection. Let χA denote the characteristic function of set A. We
define the orthogonal projection Q onto the n-dimensional space of step functions
Dn = span{χA1 , . . . , χAn }, i.e.,

(16)   Qv = Σ_{i=1}^{n} (⟨v, χAi⟩ / μ(Ai)) χAi = Σ_{i=1}^{n} ⟨v, φi⟩ φi ,

with orthonormal basis (φi)i=1,...,n of Dn ,

(17)   φi = χAi / √(μ(Ai)) .

That is, the orthogonal projection Q keeps the measure on the sets A1 , . . . , An , but on
each of the sets the ensemble will be redistributed according to the invariant measure
and the detailed information about the distribution inside of a set Ai is lost.
Since the sets A1 , . . . , An form a full partition of E, we have

(18) Q1 = 1,

which implies

(19) QΠ0 = Π0 Q = Π0 .
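
For a concrete picture of Q (an illustrative sketch, not the authors' code), the following
Python snippet evaluates (16) on a weighted grid that stands in for L²μ; the grid values,
the weights, and the set assignment are assumed inputs.

```python
# Minimal sketch of the projection Q in (16) on a weighted grid: L^2_mu is
# approximated by grid values v[k] with weights mu[k] (mu[k] > 0, summing to 1),
# and "sets" assigns each grid point to one of the partition sets A_1,...,A_n.
# Qv is piecewise constant: on each A_i it equals the mu-weighted average of v.
import numpy as np

def project_Q(v, mu, sets, n_sets):
    v, mu, sets = map(np.asarray, (v, mu, sets))
    Qv = np.empty_like(v, dtype=float)
    for i in range(n_sets):
        mask = (sets == i)
        Qv[mask] = np.dot(v[mask], mu[mask]) / mu[mask].sum()   # <v, chi_Ai> / mu(A_i)
    return Qv

# The projection error of an eigenvector u_j (used below as delta_j) is then
# delta_j = sqrt(np.dot((u_j - project_Q(u_j, mu, sets, n))**2, mu)).
```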

2.4. MSM transfer operator. Now consider the projection of our transfer
operator T onto Dn :

(20) P = QT Q : L2μ → Dn ⊂ L2μ .

In order to understand the nature of P , let us consider it as an operator on a finite-


dimensional space, P : Dn → Dn . Here, as a linear operator on a finite-dimensional
space, it has a matrix representation with respect to some basis. Let us take the basis
(ψi )i=1,...,n of probability densities given by
χ Ai
(21) ψi = .
μ(Ai )

By using the definition of T we get

(22)   P ψi = QT Qψi = QT ψi = Σ_{j=1}^{n} (⟨T ψi , χAj⟩ / μ(Aj)) χAj = Σ_{j=1}^{n} (⟨T χAi , χAj⟩ / μ(Ai)) ψj
             = Σ_{j=1}^{n} ( (1/μ(Ai)) ∫_{Ai} P[Xτ ∈ Aj | X0 = x] μ(dx) ) · ψj


such that
n

(23) P ψi = Pμ [Xτ ∈ Aj |X0 ∈ Ai ] · ψj .
Downloaded 12/07/12 to 138.87.11.21. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

j=1

So the transition matrix of the MSM Markov chain defined in (4) is identical with the
matrix representation for the projected transfer operator P ; therefore P is the MSM
transfer operator. P inherits the ergodicity properties and the invariant measure of
T , as the following lemma shows (the proof can be found in section 6.1).
Lemma 2.2. For every k ∈ N we have
(24) P k − Π0 ≤ (T Q)k − Π0 ≤ λk1 .
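
The following toy computation (an added illustration with a hypothetical 4-state
reversible chain, not an example from the paper) builds P according to (23) by averaging
a fine transition matrix over a partition and checks a consequence of Lemma 2.2 for
reversible chains: the dominant nontrivial MSM eigenvalue cannot exceed λ1 of the
original operator.

```python
# Toy check consistent with (23) and Lemma 2.2: coarse-grain a small reversible
# row-stochastic matrix T_fine over a partition via
# P(i,j) = (1/mu(A_i)) * sum_{x in A_i} mu(x) * P[X_tau in A_j | X_0 = x],
# then verify that the second eigenvalue of P does not exceed lambda_1 of T_fine
# (which, for a reversible chain, the bound ||P^k - Pi_0|| <= lambda_1^k forces).
import numpy as np

# hypothetical 4-state reversible chain with two metastable pairs {0,1} and {2,3}
T_fine = np.array([[0.90, 0.08, 0.01, 0.01],
                   [0.08, 0.90, 0.01, 0.01],
                   [0.01, 0.01, 0.90, 0.08],
                   [0.01, 0.01, 0.08, 0.90]])
mu = np.array([0.25, 0.25, 0.25, 0.25])        # uniform stationary measure here
sets = np.array([0, 0, 1, 1])                   # partition A_1 = {0,1}, A_2 = {2,3}

n = sets.max() + 1
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        xi, yj = (sets == i), (sets == j)
        P[i, j] = mu[xi] @ T_fine[np.ix_(xi, yj)].sum(axis=1) / mu[xi].sum()

lam1_fine = np.sort(np.linalg.eigvals(T_fine).real)[-2]
lam1_msm = np.sort(np.linalg.eigvals(P).real)[-2]
assert lam1_msm <= lam1_fine + 1e-10            # MSM eigenvalue underestimates lambda_1
```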
3. Approximation quality of coarse-grained transfer operators.
3.1. Approximation error E. The approximation quality of the MSM Markov
chain can be characterized by comparing the operators P k and T k for k ∈ N restricted
to Dn :
(25)   E(k) = ‖QT^k Q − P^k‖ = ‖QT^k Q − Q(T Q)^k‖.

Lemma 2.2 immediately implies that this error decays exponentially,

(26)   E(k) = ‖QT^k Q − P^k‖ ≤ ‖QT^k Q − Π0‖ + ‖P^k − Π0‖ ≤ ‖Q(T^k − Π0)Q‖ + ‖P^k − Π0‖ ≤ 2λ1^k ,
independent of the choice of the sets A1 , . . . , An . Since we want to understand how the
choice of the sets and other parameters like the lag time τ influence the approximation
quality, we have to analyze the prefactor in much more detail.
3.2. Main result: An upper bound on E. The following theorem contains
the main result of this article.
Theorem 3.1. Let T = Tτ be a transfer operator for lag time τ > 0 with
properties as described above, in particular (7), (8), (12), and (13). Let the disjoint
sets A1 , . . . , An form a full partition and define
(27)   ‖Q⊥ uj‖ =: δj ≤ 1 ∀j,   δ := max_{j=1,...,m} δj ,

where Q⊥ = Id − Q denotes the projection onto the orthogonal complement of Dn in
L²μ and the maximal projection error δ depends on the chosen partitioning A1 , . . . , An .
Furthermore, set

(28)   η(τ) := r/λ1 = exp(−τ Δ) < 1, with Δ > 0.

Then the error (25) is bounded from above by

(29)   E(k) ≤ min{ 2 ; C(δ, η(τ), k) } · λ1^k ,

with a leading constant of the following form:

(30)   C(δ, η, k) = (mδ + η) [ Csets(δ, k) + Cspec(η, k) ],
(31)   Csets(δ, k) = m^{1/2} (k − 1) δ,
(32)   Cspec(η, k) = η/(1 − η) · (1 − η^{k−1}).
The proof can be found in section 6.2.
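
For reference (an added sketch, not from the paper), the bound of Theorem 3.1 can be
evaluated directly; the function below transcribes (29)–(32), and the commented call uses
the double-well values reported in section 5.1 purely as a usage example.

```python
# Direct transcription of the bound (29)-(32) from Theorem 3.1:
# E(k) <= min(2, C(delta, eta, k)) * lambda_1^k  with
# C = (m*delta + eta) * (sqrt(m)*(k-1)*delta + eta/(1-eta)*(1 - eta**(k-1))).
import numpy as np

def msm_error_bound(k, lam1, delta, m, Delta, tau):
    eta = np.exp(-tau * Delta)                         # eta(tau) = r / lambda_1
    C_sets = np.sqrt(m) * (k - 1) * delta              # (31)
    C_spec = eta / (1.0 - eta) * (1.0 - eta**(k - 1))  # (32)
    C = (m * delta + eta) * (C_sets + C_spec)          # (30)
    return np.minimum(2.0, C) * lam1**k                # (29)

# hypothetical usage with the double-well values of section 5.1:
# msm_error_bound(k=10, lam1=0.9801, delta=0.05, m=1, Delta=16.162, tau=0.1)
```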


3.3. Interpretation and observations. The theorem shows that the overall
error can be made arbitrarily small by making the factor [Csets (δ, k) + Cspec (η, k)]
small. In order to understand the role of these two terms, consider for now k ≥ 2 to

be fixed. The following can then be observed:


1. The prefactor Csets depends on the choice of the sets A1 , . . . , An only. It can
be made smaller than any tolerance by choosing the sets appropriately and
the number of sets n large enough.
2. The prefactor Cspec is independent of the set definition and depends on the
spectral gap Δ and the lag time τ only. While the spectral gap is given by
the problem, the lag time may be chosen, and thus Cspec can also be made
smaller than any tolerance by choosing τ large enough. However, the factor
Cspec will grow unboundedly for τ → 0 and k → ∞, suggesting that using a
large enough lag time is essential to obtain an MSM with good approximation
quality, even if the sets are well approximated.
If we are interested in a certain timescale, say T , then we have to set k − 1 = T /τ .
Then the prefactor gets the form

(33)   C(δ, η, T/τ) = (mδ + η) [ m^{1/2} (T/τ) δ + η/(1 − η) · F ],   F = 1 − exp(−T Δ),
in which the numbers m and T have been chosen by the user and the spectral gap Δ
is determined by the (m + 1)st eigenvalue and thus, for given m, is a constant of the
system. The error thus depends on the choice of the sets (via δ) and the lag time τ
(and with it η = exp(−τ Δ)). In the case where one is interested in the slowest process
in the system, the time of interest may be directly related to the second eigenvalue
via T ∝ 1/Λ1 . For example, one may choose the half-life time of the decay of the
slowest process, T = log 2/Λ1.
Vanishing noise diffusion. There are quite a few articles concerned with showing
that a diffusion process in a multiwell landscape for vanishing noise can be approx-
imated by a Markov jump process between the basins of the wells [4]. There the
one-dimensional continuous stochastic process (Xt )t∈R is governed by the stochastic
differential equation

(34)   dXt = −∇V(Xt) dt + √(2ε) dBt ,

where Bt denotes standard Brownian motion, ε the small noise intensity, and V
the potential, as, e.g., illustrated in Figure 1.
The associated process satisfies our assumptions on the transfer operator. In
particular, the transfer operator is self-adjoint such that the spectrum is real-valued
and we have Λ1 ∝ exp(−ΔV/ε). As ε → 0, we have Δ = O(1) while the timescale
of interest increases exponentially: T ∝ exp(ΔV/ε). Taking m = 1, it is indeed
found that the projection error δ, with sets A1 , A2 chosen as shown in Figure 1, also
decreases exponentially with ε: As shown in [17], there is a 0 < v < ΔV such that
δ ∝ exp(−v/ε). When choosing the lag time τ such that it decreases exponentially
with ε → 0, τ ∝ exp(−ξ/ε) with ξ > max(ΔV − 2v, 0), the upper bound E on the
approximation error decreases exponentially with ε:

(35)   E(T) ∝ exp((ΔV − 2v − ξ)/ε),   with ΔV − 2v − ξ < 0.

Note that the fraction T/τ still grows exponentially with the vanishing noise; i.e.,
the time resolution of the MSM improves drastically compared to the timescale of
interest. If v > ΔV/2, τ can even be on the order of 1/ε while still retaining an
exponentially vanishing error.
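
To make the setting concrete (an illustration added here, with an assumed stand-in
potential V(x) = (x² − 1)², not necessarily the exact potential of Figure 1), a diffusion
of type (34) can be simulated with the Euler–Maruyama scheme and discretized into the
snapshot dynamics (2):

```python
# Illustrative sketch: simulate dX_t = -V'(X_t) dt + sqrt(2*eps) dB_t with
# Euler-Maruyama (assumed double well V(x) = (x^2 - 1)^2) and record which set
# A_1 = (-inf, 0], A_2 = (0, inf) the path visits at multiples of the lag time tau.
import numpy as np

def simulate_dtraj(eps=0.05, dt=1e-3, tau=0.1, n_lags=10_000, x0=-1.0, seed=0):
    rng = np.random.default_rng(seed)
    grad_V = lambda x: 4.0 * x * (x**2 - 1.0)     # V(x) = (x^2 - 1)^2  (assumed)
    steps_per_lag = int(round(tau / dt))
    x, dtraj = x0, []
    for _ in range(n_lags):
        for _ in range(steps_per_lag):
            x += -grad_V(x) * dt + np.sqrt(2.0 * eps * dt) * rng.standard_normal()
        dtraj.append(0 if x <= 0.0 else 1)        # snapshot dynamics (2)
    return np.array(dtraj)

# The resulting dtraj can be fed to the count estimator sketched in the introduction.
```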


[Figure omitted: plot of the potential V(x) on x ∈ [−1.5, 1.5] with barrier height ΔV and the two sets A1, A2.]

Fig. 1. The double well potential V.

Bounding the error. Consider that the desired quality of the MSM approximation
is defined by the user via some tolerable error bound tol at some timescale T . This
requirement is met by satisfying E(T ) ≤ tol. A rational procedure for guaranteeing
this can be outlined as follows:
1. Define the timescale of interest, T .
2. Define the number of eigenfunctions, m, that we are seeking to approximate
well.
3. Compute the spectral gap, Δ, which is ideally given by the (m + 1)st eigen-
value. This will be directly accessible only for simple systems. For more
complex systems, it may, however, be possible to bound Δ from below and
thus guarantee that E(k) will remain an upper bound. In practical cases
involving statistical uncertainty (e.g., molecular dynamics), one may only be
able to estimate a probability distribution of the eigenvalues and thus, for
any given dataset, be able to estimate an almost certain lower bound for Δ.
4. Set the desired lag time τ depending on how much time resolution is desired
compared to the timescale of interest.
5. Solve

(36)   tol = (mδ + η) [ m^{1/2} (T/τ) δ + η/(1 − η) · F ],   with η = exp(−τ Δ) and F = 1 − exp(−T Δ),

for δ and adapt the choice of the sets A1 , . . . , An and their number n to the
requirement δ = δ(τ ) that results from (36). We will illustrate this in an
explicit example in section 5.
For practical applications, δ can also only be estimated, and an approach to do
this based on two differently fine discretizations is outlined in section 4.3.
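
Step 5 can be automated; the following sketch (an addition, not the authors'
implementation) solves (36) for δ by bisection, exploiting that the left-hand side is
increasing in δ.

```python
# Sketch of step 5 of the procedure: solve (36) for the admissible projection
# error delta, given tol, the timescale T, the lag time tau, m and Delta.
import numpy as np

def required_delta(tol, T, tau, m, Delta, tol_bisect=1e-12):
    eta = np.exp(-tau * Delta)
    F = 1.0 - np.exp(-T * Delta)
    lhs = lambda d: (m * d + eta) * (np.sqrt(m) * (T / tau) * d + eta / (1.0 - eta) * F)
    if lhs(0.0) > tol:
        return None          # even delta = 0 violates tol: increase tau (or change m)
    lo, hi = 0.0, 1.0
    while hi - lo > tol_bisect:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if lhs(mid) <= tol else (lo, mid)
    return lo

# e.g. required_delta(tol=0.1, T=np.log(2)/0.201, tau=0.3, m=1, Delta=16.162)
# reproduces qualitatively the delta(tau) requirement plotted later in Figure 10.
```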
Metastability. The handbook [3] gives the following theorem, in which smallness
of the projection error δ is related to the metastability of a subdivision A0 , . . . , Am of
the state space.
Theorem 3.2. Let T be a self-adjoint transfer operator with lag time τ and
properties as described above, in particular (7), (8), (12), and (13). The metastability
of an arbitrary decomposition A0 , . . . , Am of the state space is bounded from below and


above by

(37)   1 + (1 − δ1²)λ1 + · · · + (1 − δm²)λm + c ≤ Σ_{i=0}^{m} Pμ[Xτ ∈ Ai | X0 ∈ Ai] ≤ 1 + λ1 + · · · + λm ,

where, as above, δj = ‖Q⊥ uj‖, and c = −r (δ1² + · · · + δm²).
This result tells us that the minimization of δ with m + 1 sets (for m + 1 eigen-
values) corresponds to identifying the subdivision with maximum joint metastability.
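
As a small companion to Theorem 3.2 (an added sketch, not from the paper), the two
sides of (37) can be evaluated directly from estimated eigenvalues, projection errors, and
the bound r; the joint metastability itself is just the trace of the MSM transition matrix.

```python
# Sketch of the two sides of (37): given the dominant eigenvalues, the projection
# errors delta_j of the eigenvectors, and the bound r on the remaining spectrum,
# compare the lower/upper bounds with the measured joint metastability
# sum_i P_mu[X_tau in A_i | X_0 in A_i] (the trace of the MSM transition matrix).
import numpy as np

def metastability_bounds(lambdas, deltas, r):
    lambdas, deltas = np.asarray(lambdas), np.asarray(deltas)
    c = -r * np.sum(deltas**2)
    lower = 1.0 + np.sum((1.0 - deltas**2) * lambdas) + c
    upper = 1.0 + np.sum(lambdas)
    return lower, upper

# With P estimated as in the introduction, the quantity bounded in (37) is
# simply np.trace(P); Theorem 3.2 guarantees lower <= np.trace(P) <= upper.
```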
3.4. Generalization of the projection Q. Theorem 3.1 can easily be gener-
alized. First, we can easily see that we do not need to assume that the dynamical
process (Xt ) under consideration is time-continuous. Our results still hold if T = Tτ
denotes the transfer operator for τ steps of some time-discrete process and if the lag
time is no longer a continuous but a discrete variable.
It is not required that Q be the projection onto some space spanned by indicator
functions. Q can be any orthogonal projection onto some linear subspace D of L2μ
with the property Q1 = 1, i.e., 1 ∈ D.
Theorem 3.3. Let T = Tτ be a transfer operator for lag time τ > 0 with
properties as described above, in particular (7), (8), (12), and (13). Let Q denote the
projection onto some linear subspace D of L2μ and define

(38)   ‖Q⊥ uj‖ =: δj ≤ 1 ∀j,   δ := max_{j=1,...,m} δj ,

where Q⊥ = Id − Q. Define the projected transfer operator P = QT Q. Furthermore,
set η(τ) = exp(−τ Δ) < 1 with Δ > 0. Then the error E(k) = ‖QT^k Q − P^k‖ is
bounded from above by

(39)   E(k) ≤ min{ 2 ; C(δ, η(τ), k) } · λ1^k ,

where C(δ, η, k) is as in Theorem 3.1.


Note that in this general case P does not necessarily have the interpretation of a
transition matrix. For this, consider the following two examples:
1. Let us decompose the state space into two components, E = Ex × Ey , such
that every z ∈ E can be written z = (x, y) with x ∈ Ex , y ∈ Ey . We consider
the (infinite-dimensional) subspace

(40) D = {g ∈ L2μ : g(x, ·) is constant on Ey ∀x ∈ Ex },

and

(41) Qv(x) = v(x, y)μx (dy),
Ey
 
where μx (C) = C μ(x, dy)/ Ey μ(x, dy) is the marginal of the invariant mea-
sure for fixed x. Associated averaged transfer operators are considered in the
context of so-called hybrid Monte Carlo methods; see [3, 2].
2. Consider the same projection Q as in (41), with Ex = span{u0 , . . . , um },
i.e., the projection onto the m-dimensional slow subspace of the dynamics.
For this case, the MSM error E(k) is 0 for all k, showing that a Markovian
formulation of the dynamics in the slow subspace is in principle possible.
In practical applications, however, it is often desired to obtain equations of


motion in a subspace spanned by some slow intrinsic degrees of freedom of


the system, which usually do not lie perfectly in span{u0 , . . . , um }, such that
there is still a finite projection error remaining.
Downloaded 12/07/12 to 138.87.11.21. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

3. Instead of the indicator functions we may choose mollified indicator functions
fj ≥ 0, j = 1, . . . , n. Then D = span{f1 , . . . , fn }, but the ansatz functions
may no longer be orthogonal such that we have to consider the mass matrix
M with entries Mij = ⟨fi , fj⟩. Then,

(42)   Qv = Σ_{i,j=1}^{n} (M^{−1})ij ⟨fj , v⟩ fi .

Such mollified or fuzzy MSMs have been considered, e.g., in [24] (see sec-
tion 4.1).
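
On a weighted grid the generalized projection (42) can be written down directly; the
following sketch (added here, with the grid representation of L²μ assumed as in the
earlier snippets) builds the mass matrix and applies Q for arbitrary nonnegative ansatz
functions.

```python
# Sketch of the generalized projection (42) for non-orthogonal ansatz functions
# f_1,...,f_n on a weighted grid with weights mu:
# Qv = sum_{i,j} (M^{-1})_{ij} <f_j, v> f_i with mass matrix M_{ij} = <f_i, f_j>.
import numpy as np

def project_fuzzy(v, mu, F):
    """F has shape (n, len(v)); row i holds the values of the ansatz function f_i."""
    M = F @ (F * mu).T                 # M_ij = <f_i, f_j>_mu
    b = F @ (mu * v)                   # b_j  = <f_j, v>_mu
    coeff = np.linalg.solve(M, b)      # (M^{-1} b)_i
    return coeff @ F                   # Qv as a function on the grid

# For indicator ansatz functions F[i] = chi_{A_i}, M is diagonal and this reduces
# to the step-function projection (16).
```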
4. Algorithmic considerations.
4.1. Almost exact fuzzy MSM. Let us return to the last example of general-
ization in section 3.4. There, the MSM subspace spanned by the indicator function of
sets has been replaced by another finite-dimensional subspace, D = span{f1 , . . . , fn },
with ansatz functions that no longer are indicator functions. Now, assume that we
can design (almost everywhere) nonnegative ansatz functions by linear combination
of the eigenvectors u0 , . . . , um , i.e.,

(43)   fj = Σ_{k=0}^{m} ajk uk ,   with scalars ajk for j = 0, . . . , m.

Then, by construction, we have that D = span{f0 , . . . , fm } = U and 1 ∈ D such that


Theorem 3.3 applies with projection error δ = 0. This results in a fuzzy MSM with
transfer operator P = QT Q with Q according to (42); obviously P can no longer be
interpreted as a transition matrix between sets in the state space. In addition to δ = 0
we also have Π = Q in the proof of Theorem 3.1. This implies immediately that

(44)   E(k) = ‖QT^k Q − P^k‖ = 0   ∀ k = 1, 2, 3, . . . ,

showing that the MSM is exact. M. Weber et al. have developed such an MSM variant
and discussed its applicability and interpretation [24]; in particular they show how
to optimally compute the coefficients aij for nonnegative fuzzy membership functions
[25]. A warning seems appropriate: The exactness requires having the eigenvectors of
T exactly. This is something that cannot be assumed in practice: The eigenvectors
result from numerical computations and will be affected by statistical and numerical
errors. Thus, any practical implementation of this strategy will also have to consider
the actual approximation quality of the MSM depending on the δ induced by the
numerical approximation of the fi and on the lag time τ .
4.2. MSM based on projections of the original dynamics. In practice,
MSMs are often constructed not by considering arbitrary sets Aj ⊂ E but only
sets which result from discretization of the subspace of a certain set of “essential”
coordinates θ : E → θ(E) ⊂ E. For example, in molecular systems, one usually
ignores the solvent coordinates [8, 26, 6] and may even further consider only a subset
of solute coordinates such as torsion angles [26, 6]. The projection of the original
process on the essential subspace, θ(Xt )t∈R , will then in general be far from being
Markovian. However, this does not concern our result or the construction of MSMs


in general. Let E(A) = {x ∈ E : θ(x) ∈ A} denote the cylinder set that belongs to
a subset A of the essential subspace θ(E). Thus, any subdivision A1 , . . . , An of θ(E)
will induce a subdivision E(A1 ), . . . , E(An ) of the full state space E. Thus, the above

results are fully valid if applied to these subdivisions. So the question of whether
an MSM based on a definition of states in a subspace θ has a good approximation
quality boils down to the question of whether the projection error δ can be kept below
a certain accuracy threshold based on cylinder sets. A necessary condition for this
possibility is that the eigenvectors u0 , . . . , um of the original transfer operator in full
state space E are almost constant along the fibers E(ϑ) = {x ∈ E : θ(x) = ϑ}.
In other words, the approximation quality of the MSM can be good if the variables
ignored by the projection onto θ are sufficiently “fast.” More precisely, whenever
the dynamics along these fibers is rapidly mixing on some timescale ε that is much
smaller than the mixing times orthogonal to the fibers (order 1 or larger), one can
show by multiscale analysis that (in the limit ε → 0) the eigenvectors of the full
transfer operator are constant up to a scale that vanishes with ε; cf. [27].
4.3. Comparing coarser and finer MSMs. In practical cases, the eigenvec-
tors and eigenvalues of T are not directly available. Because of that, many articles in
the MSM literature consider a fine subdivision of the state space and the associated
MSM first and construct the final, coarse MSM based on the eigenvectors and eigen-
values of the fine MSM [2, 28, 8, 5, 6]. In order to analyze this procedure based on
the above estimate of the MSM error, let us consider a case with two subdivisions of
very different fineness:
A1^(1) , . . . , AN^(1) : fine subdivision of E,
A1^(2) , . . . , An^(2) : coarse subdivision of E, n ≪ N;

consider the associated projections

Q^(1) : fine grid orthogonal projection,
Q^(2) : coarse grid orthogonal projection,
Q^(12) : fine to coarse orthogonal projection,

and the induced MSM transfer operators

P^(1) = Q^(1) T Q^(1) : transfer operator of the fine MSM process,
P^(2) = Q^(2) T Q^(2) : transfer operator of the coarse MSM process.
Obviously the projection error of the eigenvectors of T can be quite different.
Lemma 4.1. The fine and coarse projection errors,

(45)   δ^(1) = max_j ‖Q^(1),⊥ uj‖,   δ^(2) = max_j ‖Q^(2),⊥ uj‖,

and the errors of the fine and coarse MSMs satisfy the estimates

(46)   δ^(2) − δ^(1) ≤ δ^(12) ,
       (δ^(2))² − (δ^(1))² = (δ^(12))² ,
       E2(k) − E1(k) ≤ ‖Q^(12) (P^(1))^k Q^(12) − (P^(2))^k‖.
The proof of this lemma is given in section 6.3.
This lemma makes explicit that coarse graining the state definition via Q(12) will
always increase the MSM error, as long as statistical effects are not considered.
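
The Pythagorean relation in (46) is easy to check numerically when the coarse sets are
unions of fine sets; the following sketch (an added illustration with arbitrary hypothetical
grid data) does so using the grid-based projection from section 2.3.

```python
# Numerical illustration of the second relation in (46) for a single vector u:
# a fine partition (sets_fine), a coarse partition obtained by lumping fine sets
# (sets_coarse), and the weighted-grid projection from section 2.3.
import numpy as np

def proj_err(u, mu, sets, n):
    Qu = np.empty_like(u)
    for i in range(n):
        m = (sets == i)
        Qu[m] = np.dot(u[m], mu[m]) / mu[m].sum()
    return np.sqrt(np.dot((u - Qu)**2, mu)), Qu

# hypothetical 1D grid with 100 points, arbitrary weights and test vector
rng = np.random.default_rng(1)
mu = rng.random(100); mu /= mu.sum()
u = np.sin(np.linspace(0, np.pi, 100))
sets_fine = np.repeat(np.arange(10), 10)       # 10 fine sets
sets_coarse = sets_fine // 5                   # lumped into 2 coarse sets

d1, Qu1 = proj_err(u, mu, sets_fine, 10)       # delta^(1)-type error of u
d2, _ = proj_err(u, mu, sets_coarse, 2)        # delta^(2)-type error of u
d12, _ = proj_err(Qu1, mu, sets_coarse, 2)     # delta^(12)-type error (fine -> coarse)
assert abs(d2**2 - d1**2 - d12**2) < 1e-10     # (delta^(2))^2 = (delta^(1))^2 + (delta^(12))^2
```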


4.4. Statistical and total error. When considering an MSM for biomolecular
systems the MSM transfer operator P and its matrix representation (Pij ) are normally
not known exactly. Instead, only statistical approximations P̃ij of its entries

(47) P̃ij ≈ Pμ (Xτ ∈ Aj |X0 ∈ Ai )

are available. Thus, the total error of an MSM compared to the original dynamics is
not E(k) = ‖QT^k Q − P^k‖ but

(48)   Ẽ(k) = ‖QT^k Q − P̃^k‖ ≤ E(k) + ‖P^k − P̃^k‖.

There are several approaches to estimation of the statistical error ‖P^k − P̃^k‖ [29,
30, 31]. So it seems that combining these approaches with the results presented herein
should give control of the total error Ẽ. However, a warning seems appropriate:
Being able to bound E via Theorem 3.1 requires knowledge of P instead of P̃ such
that additional research will be needed to be able to bound the total error.

5. Numerical examples.

5.1. Double well potential. The results and concepts above will first be illus-
trated on a one-dimensional diffusion in a double well potential. In contrast to the
diffusive dynamics considered in section 3.3, this example does not rely on vanishing
noise approximations but considers the process dXt = −∇V (Xt )dt + σdBt with some
σ > 0. The potential V and its unique invariant measure are shown in Figure 2.

[Figure omitted: plots of the potential V(x) and the invariant measure μ on x ∈ [−1.5, 1.5].]

Fig. 2. Top panel: the potential V. Bottom panel: the associated invariant measure.

This process satisfies all necessary assumptions, and by resolving only the slowest
process (m = 1), the following spectral values are obtained:

Λ1 = 0.201, R = 16.363, Δ = 16.162.


The eigenvector u1 is given in the middle panel of Figure 3. It is seen that it is almost
constant on the two wells of the potential and changes sign close to where the saddle
point of the potential is located.

Projection error δ. Let us first choose the lag time τ = 0.1. Then λ1 = 0.9801
and r = 0.1947. Figure 3 shows the values of the projection error δ for n = 2 and
sets of the form A1 = (−∞; x] and A2 = (x; ∞) depending on the position of the
dividing surface, x. One can see that it is optimal for the boundary between the two
sets to lie close to the saddle point of the potential, where the second eigenvector is
strongly varying.

[Figure omitted: three stacked panels showing the potential V, the second eigenvector u1, and the projection error δ as a function of the dividing point x.]

Fig. 3. Upper panel: potential V. Middle panel: eigenvector u1. Lower panel: projection error
δ for different sets A1 = (−∞; x] and A2 = (x; ∞) plotted against x.

Next, let us study the effect of different discretizations/partitioning of the state


space on δ. First, A1 , . . . , An are chosen as a uniform discretization of the interval
[−1.5, 1.5], the case n = 5 being shown in Figure 4. For the uniform discretization
the projection error δ does not monotonically decrease with increasing n, as shown in
Figure 5. This means that our discrete approximation of the transfer operator can get
even worse while uniformly refining the grid. Therefore using a uniform discretization
should be avoided. Next, a simple adaptive refinement strategy is considered: For
the case n = 2, the dividing surface is placed so as to minimize the δ error (see
Figure 3). For n = 3, another dividing surface is introduced at a point that minimizes
the resulting δ-error, and so on. See Figure 4 (bottom panel) for the result for n = 5.
While this strategy does not yield an optimal discretization for n > 2, it guarantees
that the error will decrease monotonically with increasing n, as shown in Figure 5.
Figure 4 (bottom panel) shows that the refinement is concentrated on the transition
region between the minima of the potential, since most of the projection error is made
in this region resulting from the strong variation of the eigenvector.
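
The adaptive strategy described above can be summarized algorithmically; the following
sketch (an illustration, not the authors' implementation) greedily inserts dividing points
on a grid so as to minimize the projection error δ of given dominant eigenvectors.

```python
# Greedy adaptive refinement sketch: starting from one set, repeatedly insert the
# dividing point (among the grid points) that minimizes the resulting projection
# error delta = max_j ||Q_perp u_j|| of the dominant eigenvectors on a weighted grid.
import numpy as np

def delta_of_partition(U, mu, cuts):
    """U: (m, N) eigenvectors on the grid; cuts: sorted interior cut indices."""
    edges = [0, *cuts, U.shape[1]]
    deltas = []
    for u in U:
        Qu = np.empty_like(u)
        for a, b in zip(edges[:-1], edges[1:]):
            Qu[a:b] = np.dot(u[a:b], mu[a:b]) / mu[a:b].sum()
        deltas.append(np.sqrt(np.dot((u - Qu)**2, mu)))
    return max(deltas)

def greedy_refine(U, mu, n_sets):
    cuts = []
    for _ in range(n_sets - 1):
        candidates = [c for c in range(1, U.shape[1]) if c not in cuts]
        best = min(candidates, key=lambda c: delta_of_partition(U, mu, sorted(cuts + [c])))
        cuts = sorted(cuts + [best])
    return cuts   # interior grid indices of the adaptive dividing surfaces
```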
Effect of the lag time. Next, let us study the effect of different lag times τ . For
this, we fix the choice of the sets to n = 3 adaptive sets. Figures 6 and 8 show the
bound on the MSM approximation error E(k) from Theorem 3.1 compared to the
exact approximation error E(k) computed via extensive direct numerical simulation.
Upon increasing the lag time from τ = 0.1 to τ = 0.5 the bound from Theorem 3.1
becomes much sharper; see Figure 7. The bottom panel of Figure 7 additionally shows
that the exponential decay of both the real error E(t) and the upper bound B(t) does
not hide some strong discrepancy between E(t) and B(t) for growing t. Furthermore,
Figure 8 shows that the approximation quality of the MSM becomes significantly


[Figure omitted: stepfunction approximation Qu1 of u1 on x ∈ [−1.5, 1.5] for two discretizations with n = 5 sets.]

Fig. 4. Galerkin approximation Qu1 of the second eigenvector. Left panel: uniform grid with
n = 5 sets. Right panel: adaptive grid with n = 5 sets.

[Figure omitted: projection error δ versus the number of sets n (n = 2, . . . , 12) for uniform and adaptive discretizations.]

Fig. 5. Approximation error δ against the number of sets n for uniform and adaptive dis-
cretizations.

better when the lag time is increased. Finally, Figure 9 compares exact errors and
bounds for n = 3 sets with uniform and adaptive grids with lag time τ = 0.5 exhibiting
a dramatic advantage of the adaptive over the uniform discretization for longer lag
times.
Number of sets necessary to yield a given error and lag time. Let us briefly come
back to the question of how to build an MSM if the maximum acceptable approxima-
tion error tol is given. In the present case, Δ is known explicitly. Thus, as explained in
section 3.3, for a given lag time τ the value of δ that is required for E(T ) = tol = 0.1
can be computed (Figure 10, solid line). Next, we consider the adaptive discretization
with n = 2, 3, 4, . . . and compute their δ-error (boxes in Figure 10). This shows that
the required error tolerance of 0.1 can be obtained with different n-τ pairs, e.g., using
n = 2 with τ ≈ 0.3 or n = 5 with τ ≈ 0.15.


[Figure omitted: exact error and bound from Theorem 3.1 as functions of time t for τ = 0.1.]

Fig. 6. Bound and exact error E for τ = 0.1 on the adaptive grid with n = 3 adaptive sets.

[Figure omitted: bound and exact error as functions of time t for τ = 0.5, and the quotient of exact error to bound.]

Fig. 7. Left panel: bound B(t) from Theorem 3.1 and the exact error E(t) for τ = 0.5 on the
adaptive grid with n = 3. Right panel: the quotient E(t)/B(t).

5.2. Double well potential with diffusive transition region. Let us now
consider a one-dimensional diffusion in a different potential with two wells that are
connected by an extended transition region with substructure: dXt = −∇V (Xt )dt +
σdBt with σ = 0.8. The potential V and its unique invariant measure are shown in
Figure 11. We observe that the transition region between the two main wells now con-
tains four smaller wells that will each have their own, less pronounced metastability.
When considering the semigroup of transfer operators associated with this dynamics
we find the dominant eigenvectors as shown in Figure 12. The eigenvectors are all al-
most constant on the two main wells but are nonconstant in the transition region. The
dominant eigenvalues take the following values (in the form of lag time-independent


[Figure omitted: exact approximation error E(t) over time for lag times τ = 0.1 and τ = 0.5.]

Fig. 8. Exact error E for different lag times (τ = 0.1 and 0.5) on the adaptive grid with n = 3.

[Figure omitted: exact errors and bounds over time for uniform and adaptive grids.]

Fig. 9. Exact error and bound for uniform and adaptive grids; n = 3, τ = 0.5.

rates as introduced above):


Λ0 Λ1 Λ2 Λ3 Λ4 Λ5 Λ6 Λ7
+0.0000 −0.0115 −0.0784 −0.2347 −0.4640 −0.7017 −2.9652 −3.2861
The main metastability has a corresponding timescale |1/Λ1 | ≈ 87 related to the
transitions from one of the main wells to the other. In addition, there are four minor
metastable timescales related to the switches between the main wells and the four
additional small wells.
Adaptive subdivisions and projection error. Let us first fix m = 2 and lag time
τ = 0.5 and study how the decay of the projection error depends on the number n of
sets in the respective optimal adaptive subdivision. To this end we first observe that
adaptive subdivisions will have to decompose the transition regions finer and finer;
see Figure 13 for an example for n = 20. The decay of the projection error δ with n
is shown in Figure 14. Figure 14 also includes the comparison of the decay of δ with
n and the decay of the total propagation error of the underlying MSMs. We observe
that the two curves decay in a similar fashion, as suggested by our error bound E on
the propagation error.
The role of m and lag time τ . There is a trade-off between the projection error
δ and the spectral part of the error that can be modulated by varying the number


[Figure omitted: required projection error δ as a function of the lag time τ for prescribed error tol = 0.1, with realizable values of δ for n = 2, 3, 4, 5 adaptive sets marked.]

Fig. 10. Dependence of the requirement for δ on τ for prescribed error tol = 0.1. The boxes
indicate some values of δ that can be realized by choosing n adaptive boxes.

[Figure omitted: plots of the potential V with extended transition region and the associated invariant measure μ on x ∈ [−2, 7].]

Fig. 11. Top panel: the potential V with extended transition region. Bottom panel: the associ-
ated invariant measure for σ = 0.8.

of resolved eigenfunctions, m. When increasing m, more eigenvectors are taken into


account and the minimal projection error that can be obtained with a fixed number of
stepfunctions, n, will increase. On the other hand, the spectral part of the error will
decrease as growing m increases the spectral gap Δ. This means that increasing m
and thus Δ will allow decreasing the lag time τ without changing the spectral error.
In order to understand how strongly the projection error δ depends on m, we show
the dependence of δ on m and on the number n of adaptively chosen sets in Figure 15.
We observe that the increase of δ with m for fixed n is significant but not dramatic. Let us
next assume that we want to guarantee that the total propagation error will be below
0.1 for all times k. Then, a certain choice of m and n fixes Δ and δ and allows the
minimal lag time τ∗ to be computed that is required to guarantee maxk E(k) ≤ 0.1.
Figure 16 shows how τ∗ depends on m and n. It is observed that m = 2 yields the


[Figure omitted: the invariant measure μ and the eigenvectors u1, u2, u3, u4 on x ∈ [−2, 7].]

Fig. 12. Invariant measure and eigenvectors uj , j = 1, . . . , 4, for Brownian motion in the
potential V with the extended transition region from Figure 11 for σ = 0.8.

[Figure omitted: the potential V, the eigenvectors u1, u2, and their stepfunction approximations Qu1, Qu2 for n = 20 adaptive sets.]

Fig. 13. Potential and eigenvectors uj , j = 1, 2, and their stepfunction approximation Quj for
n = 20 adaptive sets. The resulting projection error is δ = 0.052.

best results; that is, for given n the lag time can be chosen smallest with m = 2 in
comparison to m = 1 and m = 5 (and other values of m not shown in the last figure).
6. Proofs.
6.1. Proof of Lemma 2.2. Proof. Because of Π0 Q = QΠ0 = Π0 and ‖T −
Π0‖ = λ1 we have for k = 1

(49)   ‖T Q − Π0‖ = ‖(T − Π0)Q‖ ≤ λ1 .

Since, furthermore, T Π0 = Π0 and T Π is self-adjoint, we find for arbitrary v ∈ L²μ

(50)   Π0 T v = ⟨T v, 1⟩ 1 = ⟨T Πv, 1⟩ 1 + ⟨T Π⊥ v, Π1⟩ 1
             = ⟨v, T Π 1⟩ 1 + ⟨ΠT Π⊥ v, 1⟩ 1 = ⟨v, 1⟩ 1 = Π0 v,

where the second-to-last identity follows from (13). Therefore

(51)   Π0 T = T Π0 = Π0 .


[Figure omitted: decay of δ (m = 2) and of the maximal propagation error with the number n of sets.]

Fig. 14. Decay of δ and of the maximal propagation error maxk ‖QT^k Q − P^k‖ with the number
n of sets in the optimal adaptive subdivision for m = 2.

[Figure omitted: decay of δ with the number n of sets for m = 1, 2, 5.]

Fig. 15. Decay of δ with the number n of sets in the optimal adaptive subdivision for m = 1, 2, 5.

From this and QΠ0 = Π0 Q = Π0 , it follows that (T Q − Π0)^k = (T Q)^k − Π0 and thus
with (49)

(52)   ‖P^k − Π0‖ = ‖Q(T Q)^k − QΠ0‖ ≤ ‖(T Q)^k − Π0‖ = ‖(T Q − Π0)^k‖ ≤ ‖T Q − Π0‖^k ≤ λ1^k ,

which was the assertion.
6.2. Proof of Theorem 3.1. First we observe that the error in (25) at time k
consists of the k − 1 projection errors that are propagated until time k is reached, as
direct calculation shows that

(53)   QT^k Q − QP^k Q = Σ_{i=1}^{k−1} QT^i Q⊥ (T Q)^{k−i} .

By this expression we can estimate the approximation error E by observing that it
consists of two different parts. Because of Q⊥ Q⊥ = Q⊥ we have

(54)   ‖QT^k Q − QP^k Q‖ ≤ Σ_{i=1}^{k−1} ‖QT^i Q⊥‖ ‖Q⊥ (T Q)^{k−i}‖.


[Figure omitted: minimal required lag time τ∗ versus the number n of sets, comparing m = 1 with m = 2 and m = 2 with m = 5.]

Fig. 16. Comparison of the minimal lag time τ∗ that is required to achieve maxk E(k) ≤ 0.1
depending on the number n of sets in the optimal adaptive subdivision. Top panel: m = 1 compared
to m = 2. Bottom panel: m = 2 compared to m = 5.

The first term ‖QT^i Q⊥‖ describes the propagation of the projection error in i steps,
and the second term ‖Q⊥ (T Q)^{k−i}‖ measures how large a projection error can be in
the (k − i)th iteration of applying operator P . So the ith summand explains the effect
of the propagation of the error that is made in the (k − i)th iteration.
We will estimate the overall error by looking at both parts of the error separately.
Let us prepare for this with the following lemma.
Lemma 6.1. For the first part of the error we have the upper bound

(55)   ‖QT^k Q⊥‖ ≤ √m λ1^k δ + r^k .

Proof. Let v be arbitrary with ‖v‖ = 1. Because u0 = 1 and Q⊥ u0 = 0,

(56)   (T Π)^k Q⊥ v = T^k ΠQ⊥ v = Σ_{j=1}^{m} λj^k ⟨Q⊥ uj , v⟩ uj ,

which leads to

(57)   ‖(T Π)^k Q⊥ v‖² = Σ_{j=1}^{m} λj^{2k} ⟨Q⊥ uj , v⟩²  ≤  m λ1^{2k} δ²     [by (27)]

and therefore

(58)   ‖Q(T Π)^k Q⊥‖ ≤ √m λ1^k δ.

Now we can estimate

(59)   ‖QT^k Q⊥‖ ≤ ‖Q(T Π)^k Q⊥‖ + ‖Q(T Π⊥)^k Q⊥‖            [by (14)]
                 ≤ √m λ1^k δ + ‖T Π⊥‖^k ≤ √m λ1^k δ + r^k .   [by (58) and (12)]


Now we can prove Theorem 3.1.
Proof. First recall that the first argument 2 in the minimum taken in (29) comes
from (26). Moreover, recall (54), that is,

(60)   ‖QT^k Q − QP^k Q‖ ≤ Σ_{i=1}^{k−1} ‖QT^i Q⊥‖ ‖Q⊥ (T Q)^{k−i}‖.

Because Q⊥ Π0 = 0, we can write

(61)   ‖Q⊥ (T Q)^{k−i}‖ = ‖Q⊥ T Q (T Q)^{k−i−1}‖ = ‖Q⊥ T Q ((T Q)^{k−i−1} − Π0)‖.

Moreover,

(62)   ‖Q⊥ T Q‖ ≤ ‖Q⊥ T ΠQ‖ + ‖Q⊥ T Π⊥ Q‖ ≤ ‖Q⊥ T ΠQ‖ + r

and, for v with ‖v‖ = 1,

(63)   ‖Q⊥ T ΠQv‖² = Σ_{i,j=1}^{m} ⟨Qv, ui⟩ ⟨Qv, uj⟩ λi λj ⟨Q⊥ ui , Q⊥ uj⟩ ≤ m² λ1² δ² ‖v‖².

We use Lemma 2.2 to get

(64)   ‖Q⊥ T Q ((T Q)^{k−i−1} − Π0)‖ ≤ (mλ1 δ + r) λ1^{k−i−1} .

Inserting (64) and Lemma 6.1 into (54) yields

(65)   E(k) = ‖QT^k Q − Q(T Q)^k‖ ≤ (mλ1 δ + r) Σ_{i=1}^{k−1} (√m λ1^i δ + r^i) λ1^{k−i−1} .

Now we have

(66)   Σ_{i=1}^{k−1} (√m λ1^i δ + r^i) λ1^{k−i−1} = √m δ (k − 1) λ1^{k−1} + λ1^{k−1} Σ_{i=1}^{k−1} η^i

and

(67)   Σ_{i=1}^{k−1} η^i = (1 − η^k)/(1 − η) − 1 = (η − η^k)/(1 − η) = η/(1 − η) · (1 − η^{k−1}).

6.3. Proof of Lemma 4.1. Remember that Q^(1), Q^(2) denote the projections
from L²μ to the fine and coarse stepfunction spaces, while Q^(12) denotes the projection
from fine to coarse.
Proof. We first find easily that Q^(2) = Q^(12) Q^(1) and Q^(2),⊥ = Q^(1),⊥ + Q^(12),⊥ Q^(1).
Setting δj^(12) = ‖Q^(12),⊥ Q^(1) uj‖ this implies

(68)   (δj^(2))² = ‖Q^(2),⊥ uj‖² = ⟨(Q^(1),⊥ + Q^(12),⊥ Q^(1)) uj , (Q^(1),⊥ + Q^(12),⊥ Q^(1)) uj⟩
               = ‖Q^(1),⊥ uj‖² + ‖Q^(12),⊥ Q^(1) uj‖² + 2 ⟨Q^(12),⊥ Q^(1) uj , Q^(1),⊥ uj⟩
               = (δj^(1))² + (δj^(12))²,

where the last identity follows from the fact that Q^(1),⊥ Q^(12) = 0 on Range(Q^(1)),
which implies Q^(1),⊥ Q^(12),⊥ = Q^(1),⊥. This identity implies the assertions concerning
the δ-estimates. With respect to the estimate on the error it suffices to observe that

(69)   E2(k) = ‖Q^(2) T^k Q^(2) − (P^(2))^k‖ = ‖Q^(12) Q^(1) T^k Q^(12) Q^(1) − (P^(2))^k‖
             = ‖Q^(12) Q^(1) T^k Q^(1) Q^(12) Q^(1) − (P^(2))^k‖
             ≤ ‖Q^(12) Q^(1) T^k Q^(1) Q^(12) Q^(1) − Q^(12) (P^(1))^k Q^(12) Q^(1)‖
               + ‖Q^(12) (P^(1))^k Q^(12) Q^(1) − (P^(2))^k Q^(1)‖
             ≤ E1(k) + ‖Q^(12) (P^(1))^k Q^(12) − (P^(2))^k‖.
7. Conclusion. We have presented a rigorous upper bound to the error in the
propagation of functions in the state space between an ergodic dynamical process and
an MSM based on a complete partitioning of the state space.
We have demonstrated that this error depends mainly on two components: (1) the
projection error associated with how well the discretization of the state space can
approximate the variation of the dominant eigenfunctions of the original process, and
(2) the lag time chosen in the transition matrix of the MSM. In particular, for fixed,
large enough lag time, we have seen that an increasingly fine discretization of the
state space decreases the error independently of whether or not the original process
coarse grained to the subdivision still satisfies the Markov property. This observation
justifies the algorithmic strategy of constructing MSMs by finely partitioning the state
space using clustering algorithms that has been employed by several researchers in
the field [6, 32, 33].
Our results also provide formal support to the practical experience that grouping
discretization elements into metastable sets is a good strategy when a small MSM
is desired [6, 24, 8, 1], but it will nevertheless increase the approximation error of
the MSM compared to the fine discretization for a given lag time τ . A given error
tolerance then again can be met by further increasing τ to reduce the spectral part of
the error. On the other hand, if a metastable subdivision does not yield a sufficiently
low error for a desirable choice of τ , further reduction of the projection error by
introducing additional refinements of the subdivision is needed. We have seen that
such refinements will best be placed within the transition regions. In contrast to the
assumption that has been made in many applications of an MSM, this shows that in
order to improve the quality of an MSM one needs to go to a partitioning of the state
space that is not metastable.
Additional algorithmic questions like the possible advantage of fuzzy memberships
and the use of subdivisions based only on a subset of state space variables have
also been discussed, as have the influence of the statistical error and possible strategies
for bounding the total approximation error of the MSM. Note that
the requirement of improving the quality of the MSM by fine subdivisions of the
transition regions calls for adaptive simulation methods, as the statistical sampling is
typically worst in the transition regions.
Summarizing, on the one hand our analysis puts the questions concerning the
approximation quality of the MSM onto solid ground and the functional form of the
error immediately suggests the development of adaptive algorithms for obtaining a
discretization of the state space such that the MSM approximation is guaranteed
to stay below a user-defined tolerance. On the other hand, we see that there are
situations, such as potentials with large diffusive transition regions, where the full
partitioning of the state space might require using too many states to be practically


useful and new algorithmic strategies will be required for efficient construction of high
quality MSMs.

REFERENCES

[1] P. Deuflhard, W. Huisinga, A. Fischer, and Ch. Schuette, Identification of almost in-
variant aggregates in reversible nearly uncoupled Markov chains, Linear Algebra Appl.,
315 (2000), pp. 39–59.
[2] Ch. Schuette, A. Fischer, W. Huisinga, and P. Deuflhard, A direct approach to conforma-
tional dynamics based on hybrid Monte Carlo, J. Comput. Phys., 151 (1999), pp. 146–168.
[3] Ch. Schuette and W. Huisinga, Biomolecular conformations can be identified as metastable
sets of molecular dynamics, in Handbook of Numerical Analysis, Elsevier, Amsterdam,
2003, pp. 699–744.
[4] A. Bovier, M. Eckhoff, V. Gayrard, and M. Klein, Metastability and low lying spectra in
reversible Markov chains, Comm. Math. Phys., 228 (2002), pp. 219–255.
[5] F. Noé and S. Fischer, Transition networks for modeling the kinetics of conformational
change in macromolecules, Curr. Opin. Struct. Biol., 18 (2008), pp. 154–162.
[6] F. Noé, I. Horenko, Ch. Schuette, and J. Smith, Hierarchical analysis of conformational
dynamics in biomolecules: Transition networks of metastable states, J. Chem. Phys., 126
(2007), 155102.
[7] F. Noé, C. Schuette, L. Reich, and T. Weikl, Constructing the equilibrium ensemble of
folding pathways from short off-equilibrium simulations, Proc. Natl. Acad. Sci. USA, 106
(2009), pp. 19011–19016.
[8] J. Chodera, N. Singhal, V. S. Pande, K. Dill, and W. Swope, Automatic discovery of
metastable states for the construction of Markov models of macromolecular conformational
dynamics, J. Chem. Phys., 126 (2007), 155101.
[9] N. V. Buchete and G. Hummer, Coarse master equations for peptide folding dynamics, J.
Phys. Chem. B, 112 (2008), pp. 6057–6069.
[10] A. C. Pan and B. Roux, Building Markov state models along pathways to determine free
energies and rates of transitions, J. Chem. Phys., 129 (2008), 064107.
[11] A. Voter, Introduction to the kinetic Monte Carlo method, in Radiation Effects in Solids,
Springer, NATO Publishing Unit, Dordrecht, The Netherlands, 2007, pp. 1–23.
[12] M. Freidlin and A. D. Wentzell, Random Perturbations of Dynamical Systems, Springer,
New York, 1998.
[13] W. E and E. Vanden Eijnden, Metastability, conformation dynamics, and transition pathways
in complex systems, in Multiscale Modelling and Simulation, Springer, Berlin, 2004, pp.
35–68.
[14] A. Bovier, M. Eckhoff, V. Gayrard, and M. Klein, Metastability in reversible diffusion
processes. I. Sharp asymptotics for capacities and exit times, J. Eur. Math. Soc. (JEMS),
6 (2004), pp. 399–424.
[15] A. Bovier, V. Gayrard, and M. Klein, Metastability in reversible diffusion processes. II. Pre-
cise asymptotics for small eigenvalues, J. Eur. Math. Soc. (JEMS), 7 (2005), pp. 69–99.
[16] R. S. Maier and D. L. Stein, Limiting exit location distributions in the stochastic exit problem,
SIAM J. Appl. Math., 57 (1997), pp. 752–790.
[17] I. Pavlyukevich, Stochastic Resonance, Ph.D. thesis, HU Berlin, Berlin, 2002.
[18] W. Huisinga, S. Meyn, and Ch. Schuette, Phase transitions and metastability for Markovian
and molecular systems, Ann. Appl. Probab., 14 (2004), pp. 419–458.
[19] E. B. Davies, Spectral properties of metastable Markov semigroups, J. Funct. Anal., 52 (1983),
pp. 315–329.
[20] Ch. Schuette, Conformational Dynamics: Modelling, Theory, Algorithm, and Applications
to Biomolecules, Habilitation thesis, Fachbereich Mathematik und Informatik, FU Berlin,
Berlin, 1998.
[21] F. Herau, M. Hitrik, and J. Sjoestrand, Tunnel effect for Kramers-Fokker-Planck type
operators: Return to equilibrium and applications, Int. Math. Res. Not. IMRN, no. 15
(2008).
[22] P. Deuflhard, M. Dellnitz, O. Junge, and Ch. Schuette, Computation of essential molec-
ular dynamics by subdivision techniques, in Computational Molecular Dynamics Chal-
lenges, Methods, Ideas, Lect. Notes Comput. Sci. Eng. 4, Springer, Berlin, 1998, pp. 98–
115.
[23] W. Huisinga, Metastability of Markovian Systems: A Transfer Operator Based Approach in
Application to Molecular Dynamics, Ph.D. thesis, Fachbereich Mathematik und Informatik,


FU Berlin, Berlin, 2001.


[24] M. Weber, Meshless Methods in Conformation Dynamics, Ph.D. thesis, FU Berlin, Berlin,
2006.

[25] P. Deuflhard and M. Weber, Robust Perron cluster analysis in conformation dynamics,
Linear Algebra Appl., 398 (2005), pp. 161–184.
[26] N.-V. Buchete and G. Hummer, Peptide folding kinetics from replica exchange molecular
dynamics, Phys. Rev. E (3), 77 (2008), 030902.
[27] C. Schütte, J. Walter, C. Hartmann, and W. Huisinga, An averaging principle for fast
degrees of freedom exhibiting long-term correlations, Multiscale Model. Simul., 2 (2004),
pp. 501–526.
[28] N. Singhal Hinrichs and V. S. Pande, Bayesian metrics for the comparison of Markovian
state models for molecular dynamics simulations, in Proceedings of the International Con-
ference on Research in Computational Molecular Biology, 2007, submitted.
[29] S. Roeblitz, Statistical Error Estimation and Grid-Free Hierarchical Refinement in Confor-
mation Dynamics, Ph.D. thesis, FU Berlin, Berlin, 2008.
[30] N. Singhal and V. S. Pande, Error analysis in Markovian state models for protein folding,
J. Chem. Phys., 123 (2005), 204909.
[31] F. Noé, Probability distributions of molecular observables computed from Markov models, J.
Chem. Phys., 128 (2008), 244103.
[32] F. Rao and A. Caflisch, The protein folding network, J. Mol. Biol., 342 (2004), pp. 299–306.
[33] S. V. Krivov and M. Karplus, Hidden complexity of free energy surfaces for peptide (protein)
folding, Proc. Natl. Acad. Sci. USA, 101 (2004), pp. 14766–14770.

