
Dynamics of stochastic approximation algorithms


SÉMINAIRE DE PROBABILITÉS (STRASBOURG)

MICHEL BENAÏM
Dynamics of stochastic approximation algorithms
Séminaire de probabilités (Strasbourg), tome 33 (1999), p. 1-68
<http://www.numdam.org/item?id=SPS_1999__33__1_0>

© Springer-Verlag, Berlin Heidelberg New York, 1999, all rights reserved.

Access to the archives of the Séminaire de probabilités (Strasbourg)
(http://portail.mathdoc.fr/SemProba/) implies agreement with the general
conditions of use (http://www.numdam.org/conditions). Any commercial use or
systematic printing constitutes a criminal offence. Any copy or printing of
this file must contain this copyright notice.

Article digitized as part of the programme
Numérisation de documents anciens mathématiques
http://www.numdam.org/
Dynamics of Stochastic Approximation
Algorithms
Michel Benaïm

Abstract
These notes were written for a D.E.A course given at Ecole Normale
Supérieure de Cachan during the 1996-97 and 1997-98 academic years and
at University Toulouse III during the 1997-98 academic year. Their aim
is to introduce the reader to the dynamical system aspects of the theory
of stochastic approximations.
Contents
1 Introduction
1.1 Outline of contents

2 Some Examples
2.1 Stochastic Gradients and Learning Processes
2.2 Polya's Urns and Reinforced Random Walks
2.3 Stochastic Fictitious Play in Game Theory

3 Asymptotic Pseudotrajectories
3.1 Characterization of Asymptotic Pseudotrajectories

4 Asymptotic Pseudotrajectories and Stochastic Approximation Processes
4.1 Notation and Preliminary Result
4.2 Robbins-Monro Algorithms
4.3 Continuous Time Processes

5 Limit Sets of Asymptotic Pseudotrajectories
5.1 Chain Recurrence and Attractors
5.2 The Limit Set Theorem

6 Dynamics of Asymptotic Pseudotrajectories
6.1 Simple Flows, Cyclic Orbit Chains
6.2 Lyapounov Functions and Stochastic Gradients
6.3 Attractors
6.4 Planar Systems

7 Convergence with positive probability toward an attractor
7.1 Attainable Sets
7.2 Examples
7.3 Stabilization

8 Shadowing Properties
8.1 λ-Pseudotrajectories
8.2 Expansion Rate and Shadowing
8.3 Properties of the Expansion Rate

9 Nonconvergence to Unstable points, Periodic orbits and Normally Hyperbolic Sets
9.1 Proof of Theorem 9.1

10 Weak Asymptotic Pseudotrajectories
10.1 Stochastic Approximation Processes with Slow Decreasing Step-Size

1 Introduction
Stochastic approximation algorithms are discrete time stochastic processes whose
general form can be written as

$$x_{n+1} - x_n = \gamma_{n+1} V_{n+1} \tag{1}$$

where $x_n$ takes its values in some Euclidean space, $V_{n+1}$ is a random variable
and $\gamma_n > 0$ is a "small" step-size.
Typically $x_n$ represents the parameter of a system which is adapted over time
and $V_{n+1} = f(x_n, \xi_{n+1})$. At each time step the system receives new information
$\xi_{n+1}$ that causes $x_n$ to be updated according to a rule or algorithm characterized
by the function $f$. Depending on the context, $f$ can be a function designed by a
user so that some goal (estimation, identification, ...) is achieved, or a model
of adaptive behavior.
The theory of stochastic approximations was born in the early 50s through
the works of Robbins and Monro (1951) and Kiefer and Wolfowitz (1952) and has
been extensively used in problems of signal processing, adaptive control (Ljung,
1986; Ljung and Soderstrom, 1983; Kushner and Yin, 1997) and recursive es-
timation (Nevelson and Khaminski, 1974). With the renewed and increased
interest in the learning paradigm for artificial and natural systems, the theory
has found new challenging applications in a variety of domains such as neural
networks (White, 1992; Fort and Pages, 1994) or game theory (Fudenberg and
Levine, 1998).
To analyse the long term behavior of (1), it is often convenient to rewrite the
noise term as

$$V_{n+1} = F(x_n) + U_{n+1} \tag{2}$$

where $F : \mathbb{R}^m \to \mathbb{R}^m$ is a deterministic vector field obtained by suitable
averaging. The examples given in Section 2 will illustrate this procedure. A natural
approach to the asymptotic behavior of the sequences $\{x_n\}$ is then to consider
them as approximations to solutions of the ordinary differential equation (ODE)

$$\frac{dx}{dt} = F(x). \tag{3}$$

One can think of (1) as a kind of Cauchy-Euler approximation scheme for
numerically solving (3) with step size $\gamma_n$. It is natural to expect that, owing to
the fact that $\gamma_n$ is small, the noise washes out and that the asymptotic behavior
of $\{x_n\}$ is closely related to the asymptotic behavior of the ODE. This method,
called the ODE method, was introduced by Ljung (1977) and extensively studied
thereafter. It has inspired a number of important works, such as the book by
Kushner and Clark (1978), numerous articles by Kushner and coworkers, and
more recently the books by Benveniste, Metivier and Priouret (1990), Duflo
(1996) and Kushner and Yin (1997).
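
To make the Cauchy-Euler picture concrete, here is a minimal Python sketch of
recursion (1)-(2). The vector field $F(x) = -x$, the step sizes $\gamma_n = 1/n$, the
Gaussian noise and the function name `stochastic_approximation` are illustrative
assumptions, not taken from these notes.

```python
import numpy as np

def stochastic_approximation(F, x0, n_steps, noise_std=1.0, seed=0):
    """Iterate x_{n+1} = x_n + gamma_{n+1} * (F(x_n) + U_{n+1}), gamma_n = 1/n."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for n in range(n_steps):
        gamma = 1.0 / (n + 1)                             # step size gamma_{n+1}
        noise = noise_std * rng.standard_normal(x.shape)  # zero-mean noise U_{n+1}
        x = x + gamma * (F(x) + noise)
    return x

# Assumed vector field F(x) = -x: the ODE dx/dt = -x has 0 as a globally
# attracting equilibrium, so the iterates are expected to end up near 0.
x_final = stochastic_approximation(lambda x: -x, x0=[5.0, -3.0], n_steps=20000)
print(x_final)
```

With this particular choice the iterate after $n$ steps is simply the average of
the first $n$ noise terms, so the noise is visibly averaged out.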


However, until recently, most works in this direction have assumed the
simplest dynamics for F (for example that F is the negative of the gradient of a
cost function), and little attention has been paid to dynamical system issues.

The aim of this set of notes is to show how dynamical system ideas can be
fully integrated with probabilistic techniques to provide a rigorous foundation
to the ODE method beyond gradients or other dynamically simple systems.
However it is not intended to be a comprehensive presentation of the theory of
stochastic approximations. It is principally focused on the almost sure dynamics
of stochastic approximation processes with decreasing step sizes. Questions of
weak convergence, large deviation, or rate of convergence, are not considered
here. The assumptions on the "noise" process are chosen for simplicity and
clarity of the presentation.
These notes are partially based on a DEA course given at Ecole Normale
Superieure de Cachan during the 1996-1997 and 1997-1998 academic years and
at University Paul Sabatier during the 1997-1998 academic year. I would like to
especially thank Robert Azencott for asking me to teach this course and Michel
Ledoux for inviting me to write these notes for Le Séminaire de Probabilités.
An important part of the material presented here results from a collabora-
tion with Morris W. Hirsch and it is a pleasure to acknowledge the fundamental
influence of Moe on this work. I have also greatly benefited from numerous dis-
cussions with Marie Duflo over the last months which have notably influenced
the presentation of these notes. Finally I would like to thank Odile Brandiere,
Philippe Carmona, Laurent Miclo, Gilles Pages and Sebastian Schreiber for
valuable insights and information.
Although most of the material presented here has already been published,
some results appear here for the first time and several points have been improved.

1.1 Outline of Contents


These notes are organized as follows.
Section 2 presents simple motivating examples of stochastic approximation
processes.
Section 3 introduces the notion of asymptotic pseudotrajectories for a
semiflow. This is a purely deterministic notion due to Benaim and Hirsch (1996) which
arises in many dynamical settings but turns out to be very well suited to
stochastic approximations.
In section 4 classical results on stochastic approximations are (re)formulated
in the language of asymptotic pseudotrajectories. Attention is restricted to the
classical situation where $\{U_n\}$ (see (2)) is a sequence of martingale differences.
It is shown that, under suitable conditions, the continuous time process obtained
by a convenient interpolation of $\{x_n\}$ is almost surely an asymptotic
pseudotrajectory of the semiflow induced by the associated ODE. This section owes much
to the ideas and techniques developed by Kushner and his co-workers (Kushner
and Clark, 1978; Kushner and Yin, 1997), Metivier and Priouret (1987) (see also
Benveniste, Metivier and Priouret (1990)) and Duflo (1990, 1996, 1997). The
case of certain diffusions and jump processes is also considered.
Section 5 characterizes the limit sets of asymptotic pseudotrajectories. It begins
with a comprehensive introduction to chain-recurrence and chain-transitivity.
Several properties of chain transitive sets are formulated. The main result of the
section establishes that limit sets of precompact asymptotic pseudotrajectories
are internally chain-transitive. This theorem was originally proved in (Benaim,
1996) but I have chosen to present here the proof of Benaim and Hirsch (1996).
I find this proof conceptually attractive and it is somehow more directly related
to the original ideas of Kushner and Clark (1978).
Section 6 applies the abstract results of section 5 in various situations. It
is shown how assumptions on the deterministic dynamics can help to identify
the possible limit sets of stochastic approximation processes with a great deal of
generality. This section generalizes and unifies many of the results which appear
in the literature on stochastic approximation.
Section 7 establishes simple sufficient conditions ensuring that a given attrac-
tor of the ODE has a positive probability to host the limit set of the stochastic
approximation process. It also provides lower bound estimates of this probabil-
ity. This section is based on unpublished works by Duflo (1997) and myself.
Section 8 considers the question of shadowing. The main result of the sec-
tion asserts that when the step size of the algorithm goes to zero at a suitable
rate (depending on the expansion rate of the ODE) trajectories of (1) are al-
most surely asymptotic to forward trajectories of (3). This section represents
a synthesis of the works of Hirsch (1994), Benaim (1996), Benaim and Hirsch
(1996) and Duflo (1996) on the question of shadowing. Several properties and
estimates of the expansion rate due to Hirsch (1994) and Schreiber (1997) are
presented. In particular, Schreiber’s ergodic characterization of the expansion
rate is proved.
Section 9 pursues the qualitative analysis of section 7. The focus is on the
behavior of stochastic approximation processes near "unstable" sets. The cen-
terpiece of this section is a theorem which shows that stochastic approximation
processes have zero probability to converge toward certain repelling sets includ-
ing linearly unstable equilibria and periodic orbits as well as normally hyper-
bolic manifolds. For unstable equilibria this problem has often been considered
in the literature but, to my knowledge, only the works by Pemantle (1990) and
Brandiere and Duflo (1996) are fully satisfactory. I have chosen here to follow
Pemantle's arguments. The geometric part contains new ideas which make it
possible to cover the general case of normally hyperbolic manifolds, but the
probability part owes much to Pemantle.

Section 10 introduces the notion of a stochastic process being a weak asymptotic
pseudotrajectory for a semiflow and analyzes properties of its empirical
occupation measures. This is motivated by the fact that any stochastic approximation
process with decreasing step size is a weak asymptotic pseudotrajectory
of the associated ODE regardless of the rate at which $\gamma_n \to 0$.

2 Some Examples
2.1 Stochastic Gradients and Learning Processes
Let $\{\xi_i\}_{i \ge 1}$, $\xi_i \in E$, be a sequence of independent identically distributed random
inputs to a system and let $x_n \in \mathbb{R}^m$ denote a parameter to be updated, $n \ge 0$.
We suppose the updating to be defined by a given map $f : \mathbb{R}^m \times E \to \mathbb{R}^m$ and
the following stochastic algorithm:

$$x_{n+1} - x_n = \gamma_{n+1} f(x_n, \xi_{n+1}). \tag{4}$$

Let $\mu$ be the common probability law of the $\xi_n$. Introduce the average vector
field

$$F(x) = \int_E f(x, \xi)\, \mu(d\xi)$$

and set

$$U_{n+1} = f(x_n, \xi_{n+1}) - F(x_n).$$

It is clear that this algorithm has the form given by [(1), (2)]. Such processes are
classical models of adaptive algorithms.
A situation often encountered in "machine learning" or "neural networks" is
the following: Let $I$ and $O$ be Euclidean spaces and $M : \mathbb{R}^m \times I \to O$ a smooth
function representing a system (e.g. a neural network). Given an input $y \in I$ and
a parameter $x$ the system produces the output $M(x, y)$.

Let $\{\xi_n\} = \{(y_n, o_n)\}$ be a sequence of i.i.d. random variables representing
the training set of $M$. Usually the law $\mu$ of $\xi_n$ is unknown but many samples
of $\xi_n$ are available. The goal of learning is to adapt the parameter $x$ so that
the output $M(x, y_n)$ gives a good approximation of the desired output $o_n$. Let
$e : O \times O \to \mathbb{R}_+$ be a smooth error function. For example $e(o, o') = \|o - o'\|^2$.

Then a basic training procedure for $M$ is given by (4) where

$$f(x, \xi) = -\frac{\partial}{\partial x}\, e(M(x, y), o) \quad \text{and} \quad \xi = (y, o).$$

Assuming that derivation and expectation commute, the associated ODE is
the gradient ODE given by

$$F(x) = -\nabla C(x) \quad \text{with} \quad C(x) = \int e(M(x, y), o)\, \mu(dy, do).$$
2.2 Polya’s Urns and Reinforced Random Walks


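The unit m-simplex $\Delta^m \subset \mathbb{R}^{m+1}$ is the set

$$\Delta^m = \{v \in \mathbb{R}^{m+1} : v_i \ge 0, \ \sum_i v_i = 1\}.$$

We consider $\Delta^m$ as a differentiable manifold, identifying its tangent space at any
point with the linear subspace

$$E^m = \{v \in \mathbb{R}^{m+1} : \sum_i v_i = 0\}.$$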

An urn initially (i.e., at time $n = 0$) contains $n_0 > 0$ balls of colors $1, \ldots, m + 1$.
At each time step a new ball is added to the urn and its color is randomly chosen
as follows:
Let $x_{n,i}$ be the proportion of balls having color $i$ at time $n$ and denote by
$x_n \in \Delta^m$ the vector of proportions $x_n = (x_{n,1}, \ldots, x_{n,m+1})$. The color of the
ball added at time $n + 1$ is chosen to be $i$ with probability $f_i(x_n)$, where the $f_i$'s
are the coordinates of a function $f : \Delta^m \to \Delta^m$.
Such processes, known as generalized Polya urns, have been considered by
Hill, Lane and Sudderth (1980) for m = 1; Arthur, Ermol’ev and Kaniovskii
(1983); Pemantle (1990). Arthur (1988) used this kind of model to describe
competing technologies in economics.
An urn model is determined by the initial urn composition $(x_0, n_0)$ and the
urn function $f : \Delta^m \to \Delta^m$. We assume that the initial composition $(x_0, n_0)$ is
fixed once and for all. The $\sigma$-field $\mathcal{F}_n$ is the field generated by the random variables
$x_0, \ldots, x_n$. One easily verifies that the equation

$$x_{n+1} - x_n = \frac{1}{n_0 + n + 1}\,\bigl(-x_n + f(x_n) + U_{n+1}\bigr) \tag{5}$$

defines random variables $\{U_n\}$ that satisfy $E(U_{n+1} \mid \mathcal{F}_n) = 0$. We can
identify the affine space $\{v \in \mathbb{R}^{m+1} : \sum_i v_i = 1\}$ with $E^m$ by parallel translation, and
also with $\mathbb{R}^m$ by any convenient affine isometry. Under the latter identification,
we see that process (5) has exactly the form of [(1), (2)], taking $F$ to be any map
which equals $-\mathrm{Id} + f$ on $\Delta^m$, and setting $\gamma_n = \frac{1}{n_0 + n}$.

Observe that $f$ being arbitrary, the dynamics of $F = -\mathrm{Id} + f$ can be
arbitrarily complicated.
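
The urn recursion (5) is easy to simulate. The Python sketch below is only an
illustration: the function name `simulate_urn`, the particular urn function
$f(v) = (v + p)/2$ and the target vector `p` are assumptions chosen so that
the ODE $dx/dt = -x + f(x)$ has the single equilibrium $p$.

```python
import numpy as np

def simulate_urn(f, x0, n0, n_steps, seed=0):
    """Generalized Polya urn: add one ball per step, color i with probability f_i(x_n)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)            # proportions x_0 in the simplex
    for n in range(n_steps):
        color = rng.choice(len(x), p=f(x))     # color of the ball added at time n + 1
        e = np.zeros_like(x)
        e[color] = 1.0
        # Exact update of the proportions; equivalent to (5) with U_{n+1} = e - f(x_n).
        x = x + (e - x) / (n0 + n + 1)
    return x

p = np.array([0.2, 0.3, 0.5])                  # assumed target proportions
x_urn = simulate_urn(lambda v: 0.5 * v + 0.5 * p,
                     x0=[1 / 3, 1 / 3, 1 / 3], n0=3, n_steps=20000)
print(x_urn)
```

Replacing the assumed urn function by another map of the simplex into itself
changes the vector field $F = -\mathrm{Id} + f$ and hence the possible limiting behavior.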

The next example is a generalization of urn processes which I call Generalized
Vertex Reinforced Random Walks after Diaconis and Pemantle. These are
non-Markovian discrete time stochastic processes living on a finite state space for
which the transition probabilities at each step are influenced by the proportion
of time each state has been visited.
Let $M_{m+1}(\mathbb{R})$ denote the space of real $(m + 1) \times (m + 1)$ matrices and let
$M : \Delta^m \to M_{m+1}(\mathbb{R})$ be a smooth map such that for all $v \in \Delta^m$, $M(v) = \{M_{i,j}(v)\}$
is a Markov transition matrix. Given a point $x_0 \in \mathrm{Int}(\Delta^m)$, a vertex
$y \in \{1, \ldots, m + 1\}$ and a positive integer $n_0$, consider a stochastic process
$\{(Y_n, (S^1(n), \ldots, S^{m+1}(n)))\}_{n \ge 0}$ defined on $\{1, \ldots, m + 1\} \times \mathbb{R}_+^{m+1}$ by

• $S^i(0) = n_0 x_0^i$, $Y_0 = y$.

• $S^i(n) = S^i(0) + \sum_{k=1}^{n} \mathbf{1}_{\{Y_k = i\}}$, $n \ge 0$.

• $P(Y_{n+1} = j \mid \mathcal{F}_n) = M_{Y_n, j}(x_n)$

where $\mathcal{F}_n$ denotes the $\sigma$-field generated by $\{Y_j : 0 \le j \le n\}$ and
$x_n = \frac{S(n)}{n + n_0}$ is the empirical occupation measure of $\{Y_n\}$.

Suppose that for each $v \in \Delta^m$ the Markov chain $M(v)$ is indecomposable
(i.e. has a unique recurrence class). Then by a standard result of Markov chain
theory, $M(v)$ has a unique invariant probability measure $f(v) \in \Delta^m$. As for
Polya's urns, equation (5) defines a sequence of random variables $\{U_n\}$. Here
the $\{U_n\}$ are no longer martingale differences but the ODE governing the long
term behavior of $\{x_n\}$ is still given by the vector field $F(x) = -x + f(x)$ (see
Benaim, 1997).
The original idea of these processes is due to Diaconis who introduced the
process defined by

$$M_{i,j}(v) = \frac{R_{i,j} v_j}{\sum_k R_{i,k} v_k}$$

with $R_{i,j} > 0$. For this process, called a Vertex Reinforced Random Walk, the
probability of transition to site $j$ increases each time $j$ is visited. The long term
behavior of $\{x_n\}$ has been analyzed by Pemantle (1992) for $R_{i,j} = R_{j,i}$ and by
Benaim (1997) in the non-symmetric case. With a non-symmetric $R$ the ODE
may have nonconvergent dynamics and the behavior of the process becomes
highly complicated (Benaim, 1997).

2.3 Stochastic Fictitious Play in Game Theory


Our last example is an adaptive learning process introduced by Fudenberg and
Kreps ( 1993) for repeated games of incomplete information called stochastic fic-
titious play. It belongs to a flourishing literature which develops the explanation
that equilibria in games may arise as the result of learning rather than from
rationalistic analysis. For more details and economics motivation we refer the
reader to the recent book by Fudenberg and Levine (1998).

For notational convenience we restrict attention to a two-player,
two-strategy game. The players are labeled $i = 1, 2$ and the set of strategies is
denoted $\{0, 1\}$.
Let $\{\xi_n\}$ be a sequence of identically distributed random variables
describing the states of nature. The payoff to player $i$ at time $n$ is a function
$U^i(\cdot, \xi_n) : \{0, 1\}^2 \to \mathbb{R}$. We extend $U^i(\cdot, \xi_n)$ to a function $U^i(\cdot, \xi_n) : [0, 1]^2 \to \mathbb{R}$
defined by

$$U^i(x^1, x^2, \xi_n) = x^1\bigl[x^2 U^i(1, 1, \xi_n) + (1 - x^2) U^i(1, 0, \xi_n)\bigr] + (1 - x^1)\bigl[x^2 U^i(0, 1, \xi_n) + (1 - x^2) U^i(0, 0, \xi_n)\bigr].$$

Consider now the repeated play of the game. At round $n$ player $i$ chooses an
action $s_n^i \in \{0, 1\}$ independently of the other player. As a result of these choices
player $i$ receives the payoff $U^i(s_n^1, s_n^2, \xi_n)$. The basic assumption is that $U^i(\cdot, \xi_n)$
is known to player $i$ at time $n$ but the strategy chosen by her opponent is not.
At the end of the round, both players observe the strategies played.
Fictitious play produces the following adaptive process: At time $n + 1$ player 1
(respectively 2), knowing her own payoff function $U^1(\cdot, \xi_{n+1})$ and the strategies
played by her opponent up to time $n$, computes and plays the action which
maximizes her expected payoff under the assumption that her opponent will
play an action whose probability distribution is given by the historical frequency of
past plays. That is, for player 1,

$$s_{n+1}^1 = \operatorname*{Argmax}_{s \in \{0, 1\}} U^1(s, x_n^2, \xi_{n+1})$$

where

$$x_n^i = \frac{1}{n} \sum_{k=1}^{n} s_k^i.$$

A simple computation shows that the vector of empirical frequencies
$x_n = (x_n^1, x_n^2)$ satisfies a recursion of type [(1), (2)] with $\gamma_n = \frac{1}{n}$,
$E(U_{n+1} \mid \mathcal{F}_n) = 0$,
and $F$ is the vector field given by

$$F(x^1, x^2) = \bigl(-x^1 + h^1(x^2), \ -x^2 + h^2(x^1)\bigr) \tag{6}$$

where

$$h^1(x^2) = P\bigl(U^1(1, x^2, \xi) > U^1(0, x^2, \xi)\bigr),$$

$$h^2(x^1) = P\bigl(U^2(x^1, 1, \xi) > U^2(x^1, 0, \xi)\bigr).$$
The mathematical analysis of stochastic fictitious play has been recently con-
ducted by (Benaim and Hirsch, 1994) and (Kaniovski and Young, 1995). We
will give in section 6.4 (see Example 6.16 ) a simple argument ensuring the
convergence of the process.

3 Asymptotic Pseudotrajectories
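A semiflow $\Phi$ on a metric space $(M, d)$ is a continuous map

$$\Phi : \mathbb{R}_+ \times M \to M, \qquad (t, x) \mapsto \Phi(t, x) = \Phi_t(x)$$

such that

$$\Phi_0 = \mathrm{Identity}, \qquad \Phi_{t+s} = \Phi_t \circ \Phi_s$$

for all $(t, s) \in \mathbb{R}_+ \times \mathbb{R}_+$. Replacing $\mathbb{R}_+$ by $\mathbb{R}$ defines a flow.

A continuous function $X : \mathbb{R}_+ \to M$ is an asymptotic pseudotrajectory for $\Phi$
if

$$\lim_{t \to \infty} \sup_{0 \le h \le T} d\bigl(X(t + h), \Phi_h(X(t))\bigr) = 0$$

for any $T > 0$. Thus for each fixed $T > 0$, the curve

$$[0, T] \to M : h \mapsto X(t + h)$$

shadows the $\Phi$-trajectory of the point $X(t)$ over the interval $[0, T]$ with arbitrary
accuracy for sufficiently large $t$. By abuse of language we call $X$ precompact if
its image has compact closure in $M$.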
The notion of asymptotic pseudotrajectories has been introduced in Benaim
and Hirsch ( 1996) and is particularly useful for analyzing the long term behavior
of stochastic approximation processes.
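
The defining limit can be checked numerically on a toy case. In the Python
sketch below the flow $\Phi_h(x) = x e^{-h}$ of $dx/dt = -x$, the curve
$X(t) = 1/(1 + t)$ and the helper name `pseudotrajectory_gap` are all assumptions
chosen for illustration; the supremum over $h$ is only approximated on a grid.

```python
import numpy as np

def pseudotrajectory_gap(X, flow, t, T, n_grid=200):
    """Approximate sup over 0 <= h <= T of |X(t + h) - Phi_h(X(t))| on a grid of h."""
    hs = np.linspace(0.0, T, n_grid)
    return max(abs(X(t + h) - flow(h, X(t))) for h in hs)

flow = lambda h, x: x * np.exp(-h)   # flow of dx/dt = -x (assumed example)
X = lambda t: 1.0 / (1.0 + t)        # a curve that is not a trajectory of this flow

gaps = [pseudotrajectory_gap(X, flow, t, T=5.0) for t in (1.0, 10.0, 100.0, 1000.0)]
print(gaps)
```

The computed gaps shrink as $t$ grows, which is what the definition asks of an
asymptotic pseudotrajectory, even though $X$ never coincides with a true
trajectory of the flow.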

3.1 Characterization of Asymptotic Pseudotrajectories


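Let $C^0(\mathbb{R}, M)$ denote the space of continuous M-valued functions $\mathbb{R} \to M$
endowed with the topology of uniform convergence on compact intervals. If
$X : \mathbb{R}_+ \to M$ is a continuous function, we consider $X$ as an element of $C^0(\mathbb{R}, M)$
by setting $X(t) = X(0)$ for $t < 0$. The space $C^0(\mathbb{R}, M)$ is metrizable. Indeed, a
distance is given by: for all $f, g \in C^0(\mathbb{R}, M)$,

$$d(f, g) = \sum_{k \in \mathbb{N}} \frac{1}{2^k} \min\bigl(1, d_k(f, g)\bigr)$$

where $d_k(f, g) = \sup_{t \in [-k, k]} d(f(t), g(t))$.

The translation flow $\Theta : C^0(\mathbb{R}, M) \times \mathbb{R} \to C^0(\mathbb{R}, M)$ is the flow defined by:

$$\Theta^t(X)(s) = X(t + s).$$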
Let $\Phi$ be a flow or a semiflow on $M$. For each $p \in M$, the trajectory
$\Phi^p : t \mapsto \Phi_t(p)$ is an element of $C^0(\mathbb{R}, M)$ (with the convention that
$\Phi^p(t) = p$ if $t < 0$ and $\Phi$ is only a semiflow). The set of all such $\Phi^p$ defines a
subspace $S_\Phi \subset C^0(\mathbb{R}, M)$.

It is easy to see that the map $H : M \to S_\Phi$ defined by $H(p) = \Phi^p$ is a
homeomorphism which conjugates $\Theta|S_\Phi$ ($\Theta$ restricted to $S_\Phi$) and $\Phi$. That is

$$\Theta^t|S_\Phi \circ H = H \circ \Phi_t$$

where $t \in \mathbb{R}$ if $\Phi$ is a flow and $t \ge 0$ if $\Phi$ is a semiflow. This makes $S_\Phi$ a closed
set invariant under $\Theta$. Define the retraction $\hat{\Phi} : C^0(\mathbb{R}, M) \to S_\Phi$ as

$$\hat{\Phi}(X) = H(X(0)) = \Phi^{X(0)}.$$

Lemma 3.1 A continuous function $X : \mathbb{R}_+ \to M$ is an asymptotic
pseudotrajectory of $\Phi$ if and only if:

$$\lim_{t \to \infty} d\bigl(\Theta^t(X), \hat{\Phi} \circ \Theta^t(X)\bigr) = 0.$$

Proof Follows from definitions. QED

Roughly speaking, this means that an asymptotic pseudotrajectory of $\Phi$ is a
point of $C^0(\mathbb{R}, M)$ whose forward trajectory under $\Theta$ is attracted by $S_\Phi$. We
also have the following result:
Theorem 3.2 Let $X : \mathbb{R}_+ \to M$ be a continuous function whose image has
compact closure in $M$. Consider the following assertions

(i) $X$ is an asymptotic pseudotrajectory of $\Phi$

(ii) $X$ is uniformly continuous and every limit point (1) of $\{\Theta^t(X)\}$ is in $S_\Phi$ (i.e.
a fixed point of $\hat{\Phi}$).

(iii) The sequence $\{\Theta^t(X)\}_{t \ge 0}$ is relatively compact in $C^0(\mathbb{R}, M)$.

Then (i) and (ii) are equivalent and imply (iii).

(1) By a limit point of $\{\Theta^t(X)\}$ we mean the limit in $C^0(\mathbb{R}, M)$ of a convergent
sequence $\Theta^{t_k}(X)$, $t_k \to \infty$.

Proof Suppose that assertion (i) holds. Let $K$ denote the closure of
$\{X(t) : t \ge 0\}$. Let $\epsilon > 0$. By continuity of the flow and compactness of $K$ there exists
$a > 0$ such that $d(\Phi_s(x), x) < \epsilon/2$ for all $|s| \le a$ uniformly in $x \in K$. Therefore
$d(\Phi_s(X(t)), X(t)) < \epsilon/2$ for all $t \ge 0$, $|s| \le a$.
Since $X$ is an asymptotic pseudotrajectory of $\Phi$, there exists $t_0 > 0$ such
that $d(\Phi_s(X(t)), X(t + s)) < \epsilon/2$ for all $t > t_0$, $|s| \le a$. It follows that
$d(X(t + s), X(t)) < \epsilon$ for all $t > t_0$, $|s| \le a$. This proves uniform continuity of $X$. On the
other hand, Lemma 3.1 shows that any limit point of $\{\Theta^t(X)\}$ is a fixed point
of $\hat{\Phi}$. This proves that (i) implies (ii).
Suppose now that (ii) holds. Since $\{X(t) : t \ge 0\}$ is relatively compact and
$X$ is uniformly continuous, $\{\Theta^t(X)\}_{t \ge 0}$ is equicontinuous and for each $s \ge 0$,
$\{\Theta^t(X)(s)\}_{t \ge 0}$ is relatively compact in $M$. Hence by the Ascoli Theorem (see
e.g. Munkres 1975, Theorem 6.1), $\{\Theta^t(X)\}_{t \ge 0}$ is relatively compact in $C^0(\mathbb{R}, M)$.
Therefore $\lim_{t \to \infty} d(\Theta^t(X), \hat{\Phi}(\Theta^t(X))) = 0$, which
by Lemma 3.1 implies (i).
The above discussion also shows that (ii) implies (iii). QED
Remark 3.3 Let $D(\mathbb{R}, M)$ be the space of functions which are right continuous
and have left-hand limits (cad lag functions). The definition of asymptotic
pseudotrajectories can be extended to elements of $D(\mathbb{R}, M)$. Since the convergence
of a sequence $\{f_n\} \subset D(\mathbb{R}, M)$ toward a continuous function $f$ is equivalent to the
uniform convergence of $\{f_n\}$ toward $f$ on compact intervals, Lemma 3.1 continues
to hold and Theorem 3.2 remains valid provided that we replace the statement
that $X$ is uniformly continuous by the weaker statement:
$\forall \epsilon > 0$ there exists $a > 0$ such that

$$\limsup_{t \to \infty} \sup_{|s| \le a} d\bigl(X(t + s), X(t)\bigr) \le \epsilon.$$

4 Asymptotic Pseudotrajectories and Stochastic


Approximation Processes
4.1 Notation and Preliminary Result
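Let $F : \mathbb{R}^m \to \mathbb{R}^m$ be a continuous map. Consider here a discrete time process
$\{x_n\}_{n \in \mathbb{N}}$ living in $\mathbb{R}^m$ (an algorithm) whose general form can be written as

$$x_{n+1} - x_n = \gamma_{n+1}\bigl(F(x_n) + U_{n+1}\bigr) \tag{7}$$

where

• $\{\gamma_n\}_{n \ge 1}$ is a given sequence of nonnegative numbers such that

$$\sum_k \gamma_k = \infty, \qquad \lim_{n \to \infty} \gamma_n = 0.$$

• $U_n \in \mathbb{R}^m$ are (deterministic or random) perturbations.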


Formula (7) can be considered to be a perturbed version of a variable step-size
Cauchy-Euler approximation scheme for numerically solving $dx/dt = F(x)$:

$$y_{k+1} - y_k = \gamma_{k+1} F(y_k).$$
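It is thus natural to compare the behavior of a sample path $\{x_n\}$ with trajectories
of the flow induced by the vector field $F$. To this end we set

$$\tau_0 = 0 \quad \text{and} \quad \tau_n = \sum_{i=1}^{n} \gamma_i \ \text{ for } n \ge 1,$$

and define the continuous time affine and piecewise constant interpolated
processes $X, \bar{X} : \mathbb{R}_+ \to \mathbb{R}^m$ by

$$X(\tau_n + s) = x_n + s\, \frac{x_{n+1} - x_n}{\tau_{n+1} - \tau_n}, \quad \text{and} \quad \bar{X}(\tau_n + s) = x_n$$

for all $n \in \mathbb{N}$ and $0 \le s < \gamma_{n+1}$. The "inverse" of $n \mapsto \tau_n$ is the map $m : \mathbb{R}_+ \to \mathbb{N}$
defined by

$$m(t) = \sup\{k \ge 0 : t \ge \tau_k\}. \tag{8}$$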
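Let $\bar{U}, \bar{\gamma}$ denote the continuous time processes on $\mathbb{R}_+$ defined by

$$\bar{U}(\tau_n + s) = U_{n+1}, \qquad \bar{\gamma}(\tau_n + s) = \gamma_{n+1}$$

for all $n \in \mathbb{N}$, $0 \le s < \gamma_{n+1}$. Using this notation (7) can be rewritten as

$$X(t) - X(0) = \int_0^t \bigl[F(\bar{X}(s)) + \bar{U}(s)\bigr]\, ds. \tag{9}$$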
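
The time change $\tau_n$, its "inverse" $m(t)$ and the affine interpolation can be
sketched in a few lines of Python. The step sizes $\gamma_n = 1/n$, the truncation to
1000 steps and the helper names `m` and `interpolate` are assumptions made for
this illustration; `interpolate` handles a scalar sequence only.

```python
import numpy as np

gammas = 1.0 / np.arange(1, 1001)                   # gamma_1, ..., gamma_1000
taus = np.concatenate(([0.0], np.cumsum(gammas)))   # tau_0 = 0, tau_n = gamma_1 + ... + gamma_n

def m(t):
    """m(t) = sup{k >= 0 : t >= tau_k}, for 0 <= t < taus[-1]."""
    # searchsorted(..., side="right") counts the tau_k that are <= t.
    return int(np.searchsorted(taus, t, side="right") - 1)

def interpolate(xs, t):
    """Affine interpolation X(t) of scalar points xs[n] placed at times taus[n]."""
    return float(np.interp(t, taus[: len(xs)], xs))

print(m(1.4), interpolate([0.0, 2.0, 3.0], 1.25))
```

Here $\tau_1 = 1$ and $\tau_2 = 1.5$, so a time such as $t = 1.4$ falls in the interval
on which the piecewise constant process $\bar{X}$ equals $x_1$.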
The vector field F is said to be globally integrable if it has unique integral
curves. For instance a bounded locally Lipschitz vector field is always globally
integrable. We then have
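Proposition 4.1 Let $F$ be a continuous globally integrable vector field. Assume
that

A1 For all $T > 0$

$$\lim_{n \to \infty} \sup\Bigl\{ \Bigl\| \sum_{i=n}^{k-1} \gamma_{i+1} U_{i+1} \Bigr\| : k = n + 1, \ldots, m(\tau_n + T) \Bigr\} = 0$$

or equivalently

$$\lim_{t \to \infty} \Delta(t, T) = 0$$

with

$$\Delta(t, T) = \sup_{0 \le h \le T} \Bigl\| \int_t^{t+h} \bar{U}(s)\, ds \Bigr\| \tag{10}$$

A2 $\sup_n \|x_n\| < \infty$, or

A2' $F$ is Lipschitz and bounded on a neighborhood of $\{x_n : n \ge 0\}$.

Then the interpolated process $X$ is an asymptotic pseudotrajectory of the flow $\Phi$
induced by $F$. Furthermore, under assumption A2', for $t \ge 0$ large enough we
have the estimate

$$\sup_{0 \le h \le T} \|X(t + h) - \Phi_h(X(t))\| \le C(T) \Bigl[ \Delta(t - 1, T + 1) + \sup_{t \le s \le t + T} \bar{\gamma}(s) \Bigr] \tag{11}$$

where $C(T)$ is a constant depending only on $T$ and $F$.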
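Proof By continuity of $F$ and assumption A2 there exists $K > 0$ such that
$\|F(\bar{X}(t))\| \le K$ for all $t \ge 0$. Thus (9) and A1 imply

$$\limsup_{t \to \infty} \sup_{0 \le h \le T} \|X(t + h) - X(t)\| \le KT.$$

Hence $X$ is uniformly continuous.

On the other hand, a simple computation shows that

$$\Theta^t(X) = L_F(\Theta^t(X)) + A_t + B_t \tag{12}$$

where $L_F : C^0(\mathbb{R}, \mathbb{R}^m) \to C^0(\mathbb{R}, \mathbb{R}^m)$ is the continuous function defined as

$$L_F(X)(s) = X(0) + \int_0^s F(X(u))\, du$$

and

$$A_t(s) = \int_t^{t+s} \bigl[F(\bar{X}(u)) - F(X(u))\bigr]\, du, \qquad B_t(s) = \int_t^{t+s} \bar{U}(u)\, du.$$

By assumption A1, $\lim_{t \to \infty} B_t = 0$ in $C^0(\mathbb{R}, \mathbb{R}^m)$.

For any $T > 0$ and $t \le u \le t + T$, (9) implies

$$\|X(u) - \bar{X}(u)\| = \Bigl\| \int_{\tau_{m(u)}}^{u} \bigl[F(\bar{X}(s)) + \bar{U}(s)\bigr]\, ds \Bigr\| \le K \bar{\gamma}(u) + \Bigl\| \int_{\tau_{m(u)}}^{u} \bar{U}(s)\, ds \Bigr\|.$$

For $t$ large enough $\bar{\gamma}(u) < 1$, therefore

$$\Bigl\| \int_{\tau_{m(u)}}^{u} \bar{U}(s)\, ds \Bigr\| \le \Bigl\| \int_{t-1}^{\tau_{m(u)}} \bar{U}(s)\, ds \Bigr\| + \Bigl\| \int_{t-1}^{u} \bar{U}(s)\, ds \Bigr\| \le 2\Delta(t - 1, T + 1).$$

Thus

$$\sup_{t \le u \le t + T} \|X(u) - \bar{X}(u)\| \le 2\Delta(t - 1, T + 1) + K \sup_{t \le u \le t + T} \bar{\gamma}(u).$$

Under assumption A2, $F$ is uniformly continuous on a neighborhood of $\{x_n\}$,
therefore $\lim_{t \to \infty} A_t = 0$ in $C^0(\mathbb{R}, \mathbb{R}^m)$.

Let $X^*$ denote a limit point of $\{\Theta^t(X)\}$. Then

$$X^* = L_F(X^*).$$

By uniqueness of integral curves, this implies

$$X^* = \hat{\Phi}(X^*).$$

Therefore Theorem 3.2 shows that $X$ is an asymptotic pseudotrajectory of $\Phi$.
To prove the estimate in case $F$ is Lipschitz with Lipschitz constant $L$, observe
that for $0 \le s \le T$

$$\|A_t(s)\| \le L T \Bigl[ 2\Delta(t - 1, T + 1) + K \sup_{t \le u \le t + T} \bar{\gamma}(u) \Bigr],$$

$$\|B_t(s)\| \le \Delta(t, T) \le 2\Delta(t - 1, T + 1),$$

and by equation (12)

$$\|X(t + s) - \Phi_s(X(t))\| \le L \int_0^s \|X(t + u) - \Phi_u(X(t))\|\, du + \|A_t(s)\| + \|B_t(s)\|.$$

Then use Gronwall's inequality. QED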

4.2 Robbins-Monro Algorithms


In applying Proposition 4.1 to stochastic approximation algorithms one
usually tries to verify assumption A1 by use of maximal inequalities and martingale
techniques.
To illustrate this idea let us consider here the simplest case of stochastic
approximation algorithms.
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_n\}_{n \ge 0}$ a nondecreasing sequence
of sub-$\sigma$-algebras of $\mathcal{F}$. We say that a stochastic process $\{x_n\}$ given by (7)
satisfies the Robbins-Monro or Martingale difference Noise (Kushner and Yin,
1997) condition if

(i) $\{\gamma_n\}$ is a deterministic sequence.

(ii) $\{U_n\}$ is adapted: $U_n$ is measurable with respect to $\mathcal{F}_n$ for each $n \ge 0$.

(iii) $E(U_{n+1} \mid \mathcal{F}_n) = 0$.

The next proposition is a particular case of a general theorem due to Metivier
and Priouret (1987). The proof contains several inequalities that will be used
later.

Proposition 4.2 Let $\{x_n\}$ given by (7) be a Robbins-Monro algorithm. Suppose
that for some $q \ge 2$

$$\sup_n E\bigl(\|U_n\|^q\bigr) < \infty$$

and

$$\sum_n \gamma_n^{1 + q/2} < \infty.$$

Then assumption A1 of Proposition 4.1 holds with probability 1.

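Proof For any $T > 0$ Burkholder's inequality (see e.g. Stroock, 1993) implies

$$E\Bigl\{ \sup_{n < k \le m(\tau_n + T)} \Bigl\| \sum_{i=n}^{k-1} \gamma_{i+1} U_{i+1} \Bigr\|^q \Bigr\} \le C_q\, E\Bigl\{ \Bigl[ \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1}^2 \|U_{i+1}\|^2 \Bigr]^{q/2} \Bigr\} \tag{13}$$

for some universal constant $C_q > 0$.

To go further we need the following inequality, valid for nonnegative $a_i$, $\beta_i$,
$u > 1$ and $0 < \rho < 1$:

$$\Bigl( \sum_i a_i \beta_i \Bigr)^u \le \Bigl( \sum_i a_i^{\rho u/(u-1)} \Bigr)^{u-1} \sum_i a_i^{(1 - \rho) u} \beta_i^u. \tag{14}$$

The proof of (14) is a consequence of the familiar Hölder inequality

$$\sum_i x_i y_i \le \Bigl( \sum_i x_i^u \Bigr)^{1/u} \Bigl( \sum_i y_i^{u/(u-1)} \Bigr)^{(u-1)/u}$$

obtained with $x_i = a_i^{1 - \rho} \beta_i$ and $y_i = a_i^{\rho}$.

Suppose $q > 2$. We now apply (14) with $u = q/2$, $\rho = (q - 2)/2q$, $a_i = \gamma_{i+1}^2$
and $\beta_i = \|U_{i+1}\|^2$. Hence (13) yields

$$E\Bigl\{ \sup_{n < k \le m(\tau_n + T)} \Bigl\| \sum_{i=n}^{k-1} \gamma_{i+1} U_{i+1} \Bigr\|^q \Bigr\} \le C_q\, E\Bigl\{ \Bigl( \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1} \Bigr)^{q/2 - 1} \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1}^{1 + q/2} \|U_{i+1}\|^q \Bigr\}$$

$$\le C_q\, T^{q/2 - 1} \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1}^{1 + q/2}\, E\bigl(\|U_{i+1}\|^q\bigr) \tag{15}$$

$$\le C(q, T) \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1}^{1 + q/2} \le C(q, T) \int_{\tau_n}^{\tau_n + T} \bar{\gamma}^{q/2}(s)\, ds$$

for some constant $C(q, T) > 0$.
From the preceding inequality we get that

$$E\bigl(\Delta(t, T)^q\bigr) \le C(q, T) \int_t^{t + T} \bar{\gamma}^{q/2}(s)\, ds \tag{16}$$

where $\Delta(t, T)$ is as in (10). If $q = 2$ inequality (16) follows directly from (13).

Hence for $q \ge 2$

$$\sum_{k \ge 0} E\bigl(\Delta(kT, T)^q\bigr) \le C(q, T) \int_0^\infty \bar{\gamma}^{q/2}(s)\, ds < \infty. \tag{17}$$

By the Borel-Cantelli Lemma this proves that

$$\lim_{k \to \infty} \Delta(kT, T) = 0$$

with probability one. On the other hand for $kT \le t < (k + 1)T$

$$\Delta(t, T) \le 2\Delta(kT, T) + \Delta((k + 1)T, T).$$

Hence assumption A1 is satisfied. QED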
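
As a quick illustration of the step-size hypothesis of Proposition 4.2, consider
polynomial step sizes $\gamma_n = n^{-a}$. This family is an assumption chosen for
illustration and is not singled out in the notes; the computation only uses the
convergence criterion for series of powers of $n$.

```latex
\gamma_n = n^{-a}, \quad a > 0:
\qquad
\sum_n \gamma_n^{1+q/2} = \sum_n n^{-a(1+q/2)} < \infty
\iff a\Bigl(1 + \frac{q}{2}\Bigr) > 1
\iff a > \frac{2}{q+2},
\qquad
\sum_n \gamma_n = \infty \iff a \le 1 .
```

For $q = 2$ this leaves the range $1/2 < a \le 1$, and a larger moment exponent $q$
allows more slowly decreasing step sizes.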
Remark 4.3 Suppose that $\{\gamma_n\}$ is a sequence of random variables such that
$\gamma_n$ is $\mathcal{F}_n$ measurable. Then the conclusion of Proposition 4.2 remains valid
provided that we strengthen the assumption on $\{U_n\}$ to

$$\sup_n E\bigl(\|U_{n+1}\|^q \mid \mathcal{F}_n\bigr) \le C$$

for some deterministic constant $C < \infty$, and replace the assumption on $\{\gamma_n\}$ by

$$E\Bigl( \sum_n \gamma_n^{1 + q/2} \Bigr) < \infty.$$

The sequence $\{U_n\}$ is said to be subgaussian if there exists a positive number
$\Gamma$ such that for all $\theta \in \mathbb{R}^m$

$$E\bigl( \exp(\langle \theta, U_{n+1} \rangle) \mid \mathcal{F}_n \bigr) \le \exp\Bigl( \frac{\Gamma}{2} \|\theta\|^2 \Bigr).$$

This is for instance the case if $\{U_n\}$ is bounded by $\sqrt{\Gamma}$.

The following result follows from Duflo (1997) (see also Kushner and Yin
(1997) or Benaim and Hirsch (1996)).
Proposition 4.4 Let $\{x_n\}$ given by (7) be a Robbins-Monro algorithm. Suppose
$\{U_n\}$ is subgaussian and $\{\gamma_n\}$ is a deterministic sequence such that

$$\sum_n e^{-c/\gamma_n} < \infty$$

for each $c > 0$. Then assumption A1 of Proposition 4.1 is satisfied with
probability 1. Therefore if A2 or A2' holds almost surely and $F$ has unique integral
curves, the interpolated process $X$ is almost surely an asymptotic pseudotrajectory
of the flow induced by $F$.

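Proof Let $\theta \in \mathbb{R}^m$ and

$$Z_n(\theta) = \exp\Bigl[ \sum_{i=1}^{n} \langle \theta, \gamma_i U_i \rangle - \frac{\Gamma}{2} \|\theta\|^2 \sum_{i=1}^{n} \gamma_i^2 \Bigr].$$

By the assumption on $\{U_n\}$, $\{Z_n(\theta)\}$ is a supermartingale. Thus for any
$\beta > 0$

$$P\Bigl( \sup_{n < k \le m(\tau_n + T)} \Bigl\langle \theta, \sum_{i=n}^{k-1} \gamma_{i+1} U_{i+1} \Bigr\rangle \ge \beta \Bigr)
\le P\Bigl( \sup_{n < k \le m(\tau_n + T)} Z_k(\theta) \ge Z_n(\theta) \exp\Bigl( \beta - \frac{\Gamma}{2} \|\theta\|^2 \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1}^2 \Bigr) \Bigr)$$

$$\le \exp\Bigl( \frac{\Gamma}{2} \|\theta\|^2 \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1}^2 - \beta \Bigr).$$

Let $e_1, \ldots, e_m$ be the canonical basis of $\mathbb{R}^m$, $\alpha > 0$ and
$e \in \{e_1, \ldots, e_m\} \cup \{-e_1, \ldots, -e_m\}$.
Set

$$R = \frac{\alpha}{\Gamma \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1}^2},$$

$\beta = R\alpha$ and $\theta = Re$. Then

$$P\Bigl( \sup_{n < k \le m(\tau_n + T)} \Bigl\langle e, \sum_{i=n}^{k-1} \gamma_{i+1} U_{i+1} \Bigr\rangle \ge \alpha \Bigr)
= P\Bigl( \sup_{n < k \le m(\tau_n + T)} \Bigl\langle \theta, \sum_{i=n}^{k-1} \gamma_{i+1} U_{i+1} \Bigr\rangle \ge \beta \Bigr)
\le \exp\Bigl( \frac{-\alpha^2}{2 \Gamma \sum_{i=n}^{m(\tau_n + T) - 1} \gamma_{i+1}^2} \Bigr).$$

It follows that

$$P\bigl(\Delta(t, T) \ge \alpha\bigr) \le C \exp\Bigl( \frac{-\alpha^2}{C' \int_t^{t+T} \bar{\gamma}(s)\, ds} \Bigr) \le C \exp\Bigl( \frac{-\alpha^2}{C' T \bar{\gamma}(u)} \Bigr) \tag{18}$$

for some $t \le u \le t + T$ and some positive constants $C$, $C'$ depending on $m$ (the
dimension of $\mathbb{R}^m$) and $\Gamma$. The end of the proof is now exactly as in Proposition
4.2. QED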
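
To see what the summability condition of Proposition 4.4 allows, it may help to
test it on two families of step sizes. Both families below, and the constant
$A > 0$, are assumptions chosen for illustration.

```latex
\gamma_n = n^{-a},\ a > 0:
\quad \sum_n e^{-c/\gamma_n} = \sum_n e^{-c\,n^{a}} < \infty \ \text{ for every } c > 0;
\qquad
\gamma_n = \frac{A}{\log n}:
\quad e^{-c/\gamma_n} = n^{-c/A},\ \text{ so } \sum_n e^{-c/\gamma_n} = \infty \ \text{ when } c \le A .
```

More generally, if $\gamma_n \log n \to 0$ then for each $c > 0$ one has $c/\gamma_n \ge 2 \log n$
for $n$ large, so the terms are eventually bounded by $n^{-2}$ and the condition
holds. Step sizes of order $1/\log n$ are thus the borderline case.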
Remark 4.5 Propositions 4.2 and 4.4 assume a Robbins-Monro type algorithm.
However it is not hard to verify that the conclusions of these propositions
continue to hold if $\{x_n\}$ satisfies the more general recursion

$$x_{n+1} - x_n = \gamma_{n+1}\bigl(F(x_n) + U_{n+1} + b_{n+1}\bigr)$$

where $\{U_n\}$ is a martingale difference noise and $\lim_{n \to \infty} b_n = 0$ almost surely.

4.3 Continuous Time Processes


The technique used in the proof of Proposition 4.4 can be easily adapted to analyse
a class of continuous time stochastic processes which includes certain diffusion
and jump processes.


Let E : 11~+ -~ be a continuous non-increasing function. Consider the
families of operators and {Lt} acting on C2 functions f:
]Rm 4- R according to the formulas

Lt f(x) -

==!
8xa
+
f (t) ai,j(x) ~2fax;~xi~ax~xi (x)
2
=,j J
(19)

Ljt(x) = 1 ~(t)Rm
(f(x + ~(t)v) -

f(x)) x(dv) (20)


and
Lc = Li + Li (21)
where

(i) $G$ is a bounded continuous vector field on $\mathbb{R}^m$.

(ii) $a = (a_{ij})$ is an $m \times m$ matrix-valued continuous bounded function such that $a(x)$ is symmetric and nonnegative definite for each $x \in \mathbb{R}^m$.

(iii) $\{\mu_x\}_{x \in \mathbb{R}^m}$ is a family of positive measures on $\mathbb{R}^m$ such that

• $x \mapsto \mu_x(A)$ is measurable for each Borel set $A \subset \mathbb{R}^m$.

• The support of $\mu_x$ is contained in a compact set independent of $x$.

Under these assumptions there exists a nonhomogeneous Markov process $X = \{X(t) : t \ge 0\}$ with sample paths in $D(\mathbb{R}_+, \mathbb{R}^m)$ (the space of càdlàg functions) and initial condition $X(0) = x_0 \in \mathbb{R}^m$ which solves the martingale problem for $\{L_t\}$ (Ethier and Kurtz, 1975; Stroock and Varadhan 1997). That is, for each $C^\infty$ function $f : \mathbb{R}^m \to \mathbb{R}$ with compact support,

$$f(X(t)) - \int_0^t L_s f(X(s))\,ds$$

is a martingale with respect to $\mathcal{F}_t = \sigma\{X(s) : s \le t\}$.
Define the vector field

$$F(x) = G(x) + \int_{\mathbb{R}^m} v\,\mu_x(dv). \qquad (22)$$

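Proposition 4.6 Suppose that

(i) $F$ is a continuous globally integrable vector field,

(ii) $\int_0^\infty \exp\left(-c/\varepsilon(t)\right)dt < \infty$ for all $c > 0$,

(iii) $P(\sup_{t \ge 0}\|X(t)\| < \infty) = 1$ or $F$ is Lipschitz.

Then $X$ is almost surely an asymptotic pseudotrajectory of the flow induced by $F$. Furthermore when $F$ is Lipschitz we have the estimate: there exist constants $C, C(T) > 0$ such that for all $\alpha > 0$

$$P\left(\sup_{0 \le h \le T}\|X(t+h) - \Phi_h(X(t))\| \ge \alpha\right) \le C\exp\left(\frac{-\alpha^2}{C(T)\varepsilon(t)}\right).$$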

Proof Set

$$\Delta(t,T) = \sup_{0 \le h \le T}\left\|X(t+h) - X(t) - \int_t^{t+h} F(X(s))\,ds\right\|.$$

Let $f(x) = \exp\langle\theta, x - x_0\rangle$ and $\tau_n = \inf\{t \ge 0 : X([0,t]) \cap B(0,n)^c \ne \emptyset\}$ where $B(0,n) = \{x \in \mathbb{R}^m : \|x\| < n\}$. Since the measure $\mu_x$ has uniformly bounded support there exists $r > 0$ such that $f(X(t \wedge \tau_n)) = f_n(X(t \wedge \tau_n))$ where $f_n$ is a $C^\infty$ function with compact support which equals $f$ on $B(0, n + r)$. Since $f_n(X(t)) - \int_0^t L_s f_n(X(s))\,ds$ is a martingale and $\tau_n$ a stopping time, $f(X(t \wedge \tau_n)) - \int_0^{t\wedge\tau_n} L_s f(X(s))\,ds$ is a martingale. Hence

$$f(X(t \wedge \tau_n))\exp\left[-\int_0^{t\wedge\tau_n}\frac{L_s f(X(s))}{f(X(s))}\,ds\right]$$

is a martingale (Ethier and Kurtz, 1975).
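Let $g(u) = e^u - u - 1$. Then

$$\frac{L_s f(x)}{f(x)} = \langle\theta, F(x)\rangle + \frac{\varepsilon(s)}{2}\sum_{i,j} a_{ij}(x)\theta_i\theta_j + \frac{1}{\varepsilon(s)}\int_{\mathbb{R}^m} g\left(\varepsilon(s)\langle\theta, v\rangle\right)\mu_x(dv).$$

Now using the facts that $g(u) \le g(|u|)$, $g$ is non-decreasing on $\mathbb{R}_+$, $g(u) = u^2/2 + o(u^2)$, and the boundedness assumptions on $a$, $\mu_x$ and the support of $\mu_x$, it is not hard to verify that there exist a constant $\Gamma > 0$ and $t_0 > 0$ such that for $s \ge t_0$

$$\frac{L_s f(x)}{f(x)} \le \langle\theta, F(x)\rangle + \frac{\Gamma}{2}\varepsilon(s)\|\theta\|^2.$$

Therefore

$$Z_t(\theta) = \exp\left[\Big\langle\theta, X(t\wedge\tau_n) - x_0 - \int_0^{t\wedge\tau_n} F(X(s))\,ds\Big\rangle - \frac{\Gamma}{2}\|\theta\|^2\int_0^{t\wedge\tau_n}\varepsilon(s)\,ds\right]$$

is a supermartingale for $t \ge t_0$. As in Proposition 4.4 we obtain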

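$$P\left(\sup_{0 \le h \le T}\left\|X((t+h)\wedge\tau_n) - X(t\wedge\tau_n) - \int_{t\wedge\tau_n}^{(t+h)\wedge\tau_n} F(X(s))\,ds\right\| \ge \alpha\right) \le C\exp\left(\frac{-\alpha^2}{C'T\varepsilon(t)}\right)$$

for $t \ge t_0$ and by Fatou's lemma we conclude that

$$P(\Delta(t,T) \ge \alpha) \le C\exp\left(\frac{-\alpha^2}{C'T\varepsilon(t)}\right).$$

The rest of the proof is now exactly as in Proposition 4.4. Details are left to the reader. QED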

5 Limit Sets of Asymptotic Pseudotrajectories

5.1 Chain Recurrence and Attractors
In this section we introduce some basic terminology and a few results from (topological) dynamics that will be useful to understand the behavior of asymptotic pseudotrajectories and stochastic approximation processes. In particular we introduce the notion of chain recurrence and emphasize its relation with the notion of attractors (thanks to Moe Hirsch who taught me the importance of this relation). The material of this section is fairly standard to dynamicists and can be found in numerous places. However since the students for whom these notes have been written (as well as the typical reader of the Séminaire) may not be familiar with these notions we have tried to give a self-contained and comprehensive presentation.
The main and original reference for this section is Conley (1978). The books by Shub (1987) and Robinson (1995) also contain most of the material here.

Basic Notions of Recurrence

Let $\Phi$ be a flow or semiflow on the metric space $(M, d)$. We let $\mathbb{T} = \mathbb{R}_+$ if $\Phi$ is a semiflow and $\mathbb{T} = \mathbb{R}$ if $\Phi$ is a flow.
A subset $A \subset M$ is said to be positively invariant if $\Phi_t(A) \subset A$ for all $t \ge 0$. It is said to be invariant if $\Phi_t(A) = A$ for all $t \in \mathbb{T}$.
A point $p \in M$ is an equilibrium if $\Phi_t(p) = p$ for all $t$. When $M$ is a manifold and $\Phi$ is the flow induced by a vector field $F$, equilibria coincide with zeros of $F$. A point $p \in M$ is a periodic point of period $T > 0$ if $\Phi_T(p) = p$ and $\Phi_t(p) \ne p$ for $0 < t < T$.
The forward orbit of $x \in M$ is the set $\gamma^+(x) = \{\Phi_t(x) : t \ge 0\}$ and the orbit of $x$ is $\gamma(x) = \{\Phi_t(x) : t \in \mathbb{T}\}$. A point $p \in M$ is an omega limit point of $x$ if $p = \lim_{k\to\infty}\Phi_{t_k}(x)$ for some sequence $t_k \to \infty$. The omega limit set of $x$, denoted $\omega(x)$, is the set of omega limit points of $x$. If $\gamma^+(x)$ has compact closure, $\omega(x)$ is a compact connected invariant set (this is a good warm up exercise for the reader unfamiliar with these notions) and $\overline{\gamma^+(x)} = \gamma^+(x) \cup \omega(x)$.
If $\Phi$ is a flow the alpha limit set of $x$, denoted $\alpha(x)$, is defined as the omega limit set of $x$ for the reversed flow $\Psi = \{\Psi_t\}$ with $\Psi_t = \Phi_{-t}$.
Further we set $\mathrm{Eq}(\Phi)$ the set of equilibria, $\mathrm{Per}(\Phi)$ the closure of the set of periodic orbits, $\mathcal{L}_+(\Phi) = \overline{\bigcup_{x \in M}\omega(x)}$, $\mathcal{L}_-(\Phi) = \overline{\bigcup_{x \in M}\alpha(x)}$ and $\mathcal{L}(\Phi) = \mathcal{L}_+(\Phi) \cup \mathcal{L}_-(\Phi)$.
Chain Recurrence and Attractors

Equilibria, periodic and omega limit points are clearly "recurrent" points. In general, we may say that a point is recurrent if it somehow returns near where it was under time evolution.
A notion of recurrence related to slightly perturbed orbits and well suited to analyse stochastic approximation processes is the notion of chain recurrence introduced by Bowen (1975) and Conley (1978).
Let $\delta > 0$, $T > 0$. A $(\delta, T)$-pseudo-orbit from $a \in M$ to $b \in M$ is a finite sequence of partial trajectories

$$\{\Phi_t(y_i) : 0 \le t \le t_i\}; \quad i = 0, \ldots, k-1; \quad t_i \ge T$$

such that

$$d(y_0, a) < \delta, \qquad d(\Phi_{t_j}(y_j), y_{j+1}) < \delta, \quad j = 0, \ldots, k-1; \qquad y_k = b.$$

We write $(\Phi : a \hookrightarrow_{\delta,T} b)$ (or simply $a \hookrightarrow_{\delta,T} b$ when there is no confusion on $\Phi$) if there exists a $(\delta, T)$-pseudo-orbit from $a$ to $b$. We write $a \hookrightarrow b$ if $a \hookrightarrow_{\delta,T} b$ for every $\delta > 0$, $T > 0$. If $a \hookrightarrow a$ then $a$ is a chain recurrent point. If every point of $M$ is chain recurrent then $\Phi$ is a chain recurrent semiflow (or flow).
If $a \hookrightarrow b$ for all $a, b \in M$ we say the flow $\Phi$ is chain transitive.
We denote by $R(\Phi)$ the set of chain recurrent points for $\Phi$. It is easy to verify (again a good warm up exercise) that $R(\Phi)$ is a closed, positively invariant set and that

$$\mathrm{Eq}(\Phi) \subset \mathrm{Per}(\Phi) \subset \mathcal{L}(\Phi) \subset R(\Phi).$$

We will see below that $R(\Phi)$ is always invariant when it is compact (Theorem 5.5).
Let $A \subset M$ be a nonempty invariant set. $\Phi$ is called chain recurrent on $A$ if every point $p \in A$ is a chain recurrent point for $\Phi|A$, the restriction of $\Phi$ to $A$. In other words, $A = R(\Phi|A)$.
A compact invariant set on which $\Phi$ is chain recurrent (or chain transitive) is called an internally chain recurrent (or internally chain transitive) set.

Example 5.1 Consider the flow on the unit circle $S^1 = \mathbb{R}/2\pi\mathbb{Z}$ induced by the differential equation

$$\frac{d\theta}{dt} = f(\theta)$$

where $f$ is a $2\pi$-periodic smooth nonnegative function such that $f^{-1}(0) = \{k\pi : k \in \mathbb{Z}\}$.

Figure 1: $\dot\theta = f(\theta)$

We have

$$\mathrm{Eq}(\Phi) = \{0, \pi\} = \mathcal{L}_+(\Phi) = \mathcal{L}(\Phi)$$

and

$$R(\Phi) = S^1.$$

Internally chain recurrent sets are $\{0\}$, $\{\pi\}$ and $S^1$. Remark that the set $X = [0, \pi]$ is a compact invariant set consisting of chain recurrent points. However, $X$ is not internally chain recurrent.
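The distinction between chain recurrence in $S^1$ and chain recurrence inside a smaller invariant set can be explored numerically. The sketch below is an illustration under several assumptions: it takes $f(\theta) = \sin^2\theta$ (one admissible choice of $f$), uses the closed form $\cot\theta(t) = \cot\theta(0) - t$ for the flow, fixes every $t_i = T$, and searches for a $(\delta, T)$-pseudo-orbit on a finite grid; the function names are hypothetical. A successful search exhibits a pseudo-orbit for the given $(\delta, T)$ only, and a failed grid search is merely suggestive.

```python
import math
from collections import deque

TWO_PI = 2.0 * math.pi

def flow(theta, t):
    """Time-t map of d(theta)/dt = sin(theta)^2 on R/2*pi*Z (closed form)."""
    theta %= TWO_PI
    s = math.sin(theta)
    if abs(s) < 1e-12:                      # equilibria 0 and pi
        return theta
    base = 0.0 if theta < math.pi else math.pi
    cot = math.cos(theta) / s - t           # cot(theta(t)) = cot(theta(0)) - t
    return base + math.atan2(1.0, cot)      # inverse cotangent, value in (0, pi)

def dist(a, b):
    """Distance on the circle."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def chain_recurrent(a, delta, T, allowed=lambda th: True, n_grid=1000):
    """Grid search for a (delta, T)-pseudo-orbit from a to a with all t_i = T.

    Only grid points satisfying `allowed` may be used as the points y_i.
    """
    grid = [TWO_PI * i / n_grid for i in range(n_grid)]
    nodes = [i for i in range(n_grid) if allowed(grid[i])]
    start = [i for i in nodes if dist(grid[i], a) < delta]   # candidates for y_0
    seen, queue = set(start), deque(start)
    while queue:
        i = queue.popleft()
        z = flow(grid[i], T)                # end point of the partial trajectory
        if dist(z, a) < delta:              # we may close the chain with y_k = a
            return True
        for j in nodes:                     # admissible next points y_{i+1}
            if j not in seen and dist(grid[j], z) < delta:
                seen.add(j)
                queue.append(j)
    return False

a = math.pi / 2
on_circle = chain_recurrent(a, 0.1, 3.0)
on_arc = chain_recurrent(a, 0.1, 3.0, allowed=lambda th: th <= math.pi + 1e-9)
print(on_circle, on_arc)
```

On the full circle the search finds a chain that jumps past both equilibria and returns to $\pi/2$. When the points are confined to the arc from $0$ to $\pi$, every chain is pushed toward $\pi$ and cannot come back.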

A subset $A \subset M$ is an attractor for $\Phi$ provided:

(i) $A$ is nonempty, compact and invariant ($\Phi_t(A) = A$); and

(ii) $A$ has a neighborhood $W \subset M$ such that $d(\Phi_t(x), A) \to 0$ as $t \to \infty$ uniformly in $x \in W$.

The neighborhood $W$ is usually called a fundamental neighborhood of $A$. The basin of $A$ is the positively invariant open set comprising all points $x$ such that $d(\Phi_t(x), A) \to 0$ as $t \to \infty$. If $A \ne M$ then $A$ is called a proper attractor. A global attractor is an attractor whose basin is all the space $M$. An equilibrium (= stationary point) which is an attractor is called asymptotically stable.
The following Lemma due to Conley (1978) is quite useful.

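Lemma 5.2 Let $U \subset M$ be an open set with compact closure. Suppose that $\Phi_T(\bar U) \subset U$ for some $T > 0$. Then there exists an attractor $A \subset U$ whose basin contains $\bar U$.

Proof By compactness of $\Phi_T(\bar U)$ there exists an open set $V$ such that $\Phi_T(\bar U) \subset V \subset \bar V \subset U$. By continuity of the flow there exists $\epsilon > 0$ such that $\Phi_t(\bar U) \subset V$ for $T - \epsilon \le t \le T + \epsilon$. Let $t_0 = T(T+1)/\epsilon$. For $t \ge t_0$ write $t = k(T + r/k)$ with $k \in \mathbb{N}$ and $0 \le r/k < \epsilon$. Therefore for all $x \in \bar U$, $\Phi_t(x) = \Phi_{T + r/k} \circ \cdots \circ \Phi_{T+r/k}(x) \in V$.
Then $A_t = \overline{\bigcup_{s \ge t}\Phi_s(\bar U)} \subset \bar V \subset U$ for $t \ge t_0$, and $A = \bigcap_{t \ge 0} A_t \subset U$. It is now easy to verify that $A$ is an attractor. Details are left to the reader. QED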

The following proposition originally due to Bowen (1975) makes precise the relation between the different notions we have introduced.

Proposition 5.3 Let $\Lambda \subset M$. The following assertions are equivalent

(i) $\Lambda$ is internally chain-transitive
(ii) $\Lambda$ is connected and internally chain-recurrent
(iii) $\Lambda$ is a compact invariant set and $\Phi|\Lambda$ admits no proper attractor.

Proof (i) $\Rightarrow$ (ii) is easy and left to the reader. (ii) $\Rightarrow$ (iii). Let $A \subset \Lambda$ be a nonempty attractor. To prove that $A = \Lambda$ it suffices to show that $A$ is open and closed in $\Lambda$. Let $W$ be an open (in $\Lambda$) fundamental neighborhood of $A$. We claim that $W = A$. Suppose to the contrary that there exists $p \in W \setminus A$. Let $U_\delta = \{x \in \Lambda : d(x, A) < \delta\}$. Choose $\delta$ small enough so that $U_\delta \subset U_{2\delta} \subset W$ and $p \in W \setminus U_{2\delta}$. For $T$ large enough and $t \ge T$, $\Phi_t(W) \subset U_\delta$. Therefore it is impossible to have $p \hookrightarrow_{\delta,T} p$. A contradiction.
(iii) $\Rightarrow$ (i). Let $x \in \Lambda$, $\delta > 0$, $T > 0$ and $V = \{y \in \Lambda : (\Phi|\Lambda : x \hookrightarrow_{\delta,T} y)\}$. The set $V$ is open (by definition) and satisfies $\Phi_T(\bar V) \subset V$. It then follows from Lemma 5.2 that $V$ contains an attractor but since there are no proper attractors $V = \Lambda$. Since this is true for all $x \in \Lambda$, $\delta > 0$ and $T > 0$ it follows that $\Lambda$ is internally chain transitive. QED
Corollary 5.4 If an internally chain transitive set $K$ meets the basin of an attractor $A$, it is contained in $A$.

Proof By compactness, $K \cap A$ is nonempty, hence an attractor for $\Phi|K$. Since $\Phi|K$ has no proper attractors, being chain transitive, it follows that $K \subset A$. QED
The following theorem was proved by Conley (1978) for flows but the proof given here is adapted from a proof given by Robinson (1977) for diffeomorphisms.
Theorem 5.5 If $M$ is compact then $R(\Phi)$ is internally chain recurrent.

Proof First observe that $R(\Phi)$ is obviously a compact subset of $M$. Let $p \in R(\Phi)$. For $n \in \mathbb{N}$ and $T > 0$ there exist points $p = p^n_0, p^n_1, \ldots, p^n_{k_n} = p$ in $M$ and times $t^n_0, \ldots, t^n_{k_n-1}$ with $t^n_i \ge T$ such that $d(\Phi_{t^n_i}(p^n_i), p^n_{i+1}) < 1/n$ for $i = 0, \ldots, k_n - 1$. Further we can always assume (by adding points to the sequence) that $t^n_i \le 2T$, and since $C_n = \{p^n_0, \ldots, p^n_{k_n}\}$ is a compact subset of $M$ we can also assume (by replacing $\{C_n\}$ by a subsequence) that $\{C_n\}$ converges toward some compact set $C$ for the Hausdorff topology.² By construction $C \subset R(\Phi)$ and $p \in C$. Fix $\epsilon > 0$. By uniform continuity of $\Phi : [0, 2T] \times M \to M$ there exists $0 < \delta < \epsilon/3$ so that $d(a, b) \le \delta$ implies $d(\Phi_t(a), \Phi_t(b)) \le \epsilon/3$ for all $0 \le t \le 2T$. Now for $n$ large enough $C \subset U_\delta(C_n)$ and $C_n \subset U_\delta(C)$. Therefore there exist points $q^n_i \in C$ (with $q^n_0 = q^n_{k_n} = p$) such that $d(p^n_i, q^n_i) < \delta$. It follows that

$$d(\Phi_{t^n_i}(q^n_i), q^n_{i+1}) \le \epsilon.$$

Thus we have constructed an $(\epsilon, T)$ pseudo orbit from $p$ to $p$ which lies entirely in $C \subset R(\Phi)$. To conclude the proof it remains to show that $R(\Phi)$ is invariant. It is clearly positively invariant. Let $p$ and $C_n$ be as above. By extracting convergent subsequences from $\{t^n_{k_n-1}\}$ and $\{p^n_{k_n-1}\}$ we obtain points $\tau \in [T, 2T]$ and $p^* \in R(\Phi)$ such that $\Phi_\tau(p^*) = p$. Hence $p \in \Phi_t(R(\Phi))$ for all $0 \le t \le T$ and since $p$ is arbitrary $R(\Phi) \subset \Phi_t(R(\Phi))$ for all $0 \le t \le T$. By the semiflow property this implies $R(\Phi) \subset \Phi_t(R(\Phi))$ for all $t \ge 0$. QED

² If $A$ and $B$ are closed subsets of $M$ the Hausdorff distance $D(A, B)$ is defined as $D(A, B) = \inf\{\epsilon > 0 : A \subset U_\epsilon(B) \text{ and } B \subset U_\epsilon(A)\}$. This distance makes the space of closed subsets of $M$ a compact space (see e.g. Munkres, exercise 7, page 279).
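Corollary 5.6 Let $x \in M$ ($M$ not necessarily compact). If $\overline{\gamma^+(x)}$ is compact then $\omega(x)$ is internally chain transitive.
Proof Let $\mathcal{T} = [0, 1] \times \overline{\gamma^+(x)}$ and $\Psi$ the semiflow on $\mathcal{T}$ defined by $\Psi_t(u, y) = (e^{-t}u, \Phi_t(y))$. Clearly $\{0\} \times \omega(x)$ is a global attractor for $\Psi$ and points of $\{0\} \times \omega(x)$ are chain recurrent for $\Psi$. Therefore $R(\Psi) = \{0\} \times \omega(x)$. By Theorem 5.5, $R(\Psi|R(\Psi)) = R(\Psi)$. This implies $R(\Phi|\omega(x)) = \omega(x)$, and $\omega(x)$ being connected it is internally chain transitive by Proposition 5.3. QED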

5.2 The Limit Set Theorem

Let $X : \mathbb{R}_+ \to M$ be an asymptotic pseudotrajectory of a semiflow $\Phi$. The limit set $L(X)$ of $X$, defined in analogy to the omega limit set of a trajectory, is the set of limits of convergent sequences $X(t_k)$, $t_k \to \infty$. That is

$$L(X) = \bigcap_{t \ge 0}\overline{X([t, \infty))}.$$

Theorem 5.7

(i) Let $X$ be a precompact asymptotic pseudotrajectory of $\Phi$. Then $L(X)$ is internally chain transitive.
(ii) Let $L \subset M$ be an internally chain transitive set, and assume $M$ is locally path connected. Then there exists an asymptotic pseudotrajectory $X$ such that $L(X) = L$.

Proof We only give the proof of (i). We refer the reader to Benaim and Hirsch (1996) for a proof of (ii) and further results. Since $\{X(t) : t \ge 0\}$ is relatively compact, Theorem 3.2 shows that $\{\Theta^t(X) : t \ge 0\}$ is relatively compact in $C^0(\mathbb{R}, M)$ and $\lim_{t\to\infty} d(\Theta^t(X), S_\Phi) = 0$. Therefore by Corollary 5.6 the omega limit set of $X$ for $\Theta$, denoted by $\omega_\Theta(X)$, is internally chain transitive for the semiflow $\Theta|S_\Phi$.
The homeomorphism $H : M \to S_\Phi$ defined by $H(x)(t) = \Phi_t(x)$ conjugates $\Theta^t|S_\Phi$ and $\Phi_t$:

$$\Theta^t \circ H = H \circ \Phi_t$$

where $t \ge 0$ for a semiflow $\Phi$, and $t \in \mathbb{R}$ for a flow. Since the property of being chain transitive is (obviously) preserved by conjugacy it suffices to verify that

$$H(L(X)) = \omega_\Theta(X)$$

to prove assertion (i). Let $p \in L(X)$. Then $p = \lim_{t_k\to\infty} X(t_k)$. By relative compactness of $\{\Theta^t(X)\}_{t \ge 0}$ we can always suppose that $\Theta^{t_k}(X)$ converges toward some point $Y \in C^0(\mathbb{R}, M)$. By Lemma 3.1, $Y = \hat\Phi(Y) = H(Y(0)) = H(p)$.

This shows that $H(L(X)) \subset \omega_\Theta(X)$. The proof of the converse inclusion is similar. QED

Remark 5.8 Our proof of Theorem 5.7 follows Benaim and Hirsch (1996). It has the nice interpretation that the limit set $L(X)$ can be seen as an omega limit set for an extension of the flow to some larger space. A more direct proof in the spirit of Theorem 5.5 can be found in Benaim (1996) (see also Duflo 1996).

6 Dynamics of Asymptotic Pseudotrajectories

Theorem 5.7 and its applications in later sections show the importance of understanding the dynamics and topology of internally chain recurrent sets (which in most dynamical settings are the same as limit sets of asymptotic pseudotrajectories). Many of the results which appear in the literature on stochastic approximation can be easily deduced (and generalized) from properties of chain recurrent sets. While there is no general structure theory for internally chain recurrent sets, much can be said about many common situations. Several useful results are presented in this section. The main sources of this section are the papers by Benaim (1996) and Benaim and Hirsch (1996), but some results have been improved. In particular we give an elementary proof of the convergence of stochastic gradient algorithms with possibly infinitely many equilibria. Several results by Fort and Pages (1996) are similar to those of this section.
We continue to assume that $X : \mathbb{R}_+ \to M$ is an asymptotic pseudotrajectory for a flow or semiflow $\Phi$ in a metric space $M$. Remark that we do not a priori assume that $X$ is precompact.

6.1 Simple Flows, Cyclic Orbit Chains

A flow on $M$ is called simple if it has only a finite set of alpha and omega limit points (necessarily consisting of equilibria). This property is inherited by the restriction of $\Phi$ to invariant sets.
A subset $\Gamma \subset M$ is an orbit chain for $\Phi$ provided that for some natural number $k \ge 2$, $\Gamma$ can be expressed as the union

$$\Gamma = \{e_1, \ldots, e_k\} \cup \gamma_1 \cup \cdots \cup \gamma_{k-1}$$

of equilibria $\{e_1, \ldots, e_k\}$ and nonsingular orbits $\gamma_1, \ldots, \gamma_{k-1}$ connecting them: this means that $\gamma_i$ has alpha limit set $\{e_i\}$ and omega limit set $\{e_{i+1}\}$. Neither the equilibria nor the orbits of the orbit chain are required to be distinct. If $e_1 = e_k$, $\Gamma$ is called a cyclic orbit chain. A homoclinic loop is an example of a cyclic orbit chain.
Concerning cyclic orbit chains, Benaim and Hirsch (1995a, Theorem 3.1) noted the following useful consequence of the important Akin-Nitecki-Shub Lemma (Akin 1993).
Proposition 6.1 Let $L \subset M$ be an internally chain recurrent set. If $\Phi|L$ is a simple flow, then every non-stationary point of $L$ belongs to a cyclic orbit chain in $L$.

From Theorem 5.7 we thus get:

Corollary 6.2 Assume that $X$ is precompact and $\Phi|L(X)$ is a simple flow. Then every point of $L(X)$ is an equilibrium or belongs to a cyclic orbit chain in $L(X)$.
Corollary 6.3 Assume $L \subset M$ is an internally chain recurrent set such that

$$\mathcal{L}(\Phi|L) \subset A = \bigcup_{j=1}^{n} A_j$$

where $A_1, \ldots, A_n$ are compact invariant subsets of $L$. Then for every point $p \in L$ either $p \in A$ or there exists a finite sequence $x_1, \ldots, x_k \in L \setminus A$ and indices $i_1, \ldots, i_k$ such that

(i) $\{i_1, \ldots, i_{k-1}\} \subset \{1, \ldots, n\}$ and $i_k = i_1$,

(ii) $\alpha(x_l) \subset A_{i_l}$ and $\omega(x_l) \subset A_{i_{l+1}}$ for $l = 1, \ldots, k-1$.

In particular if there is no cycle among the $A_j$ then $L \subset A$.

Proof Let $\hat L$ be the topological quotient space obtained by collapsing each $A_i$ to a point. Let $\pi$ denote the quotient map $\pi : L \to \hat L$. We claim that $\hat L$ is metrizable. By the Urysohn metrization Theorem, it suffices to verify that $\hat L$ is a regular space with a countable basis.
We first construct a countable basis $\{U_n(\hat x)\}_{n \ge 1}$ at each point $\hat x = \pi(x) \in \hat L$ as follows. If $x \notin A$ choose $0 < d_x < d(x, A)$ and set $U_n(\hat x) = \pi(B(x, d_x/n))$. Let $0 < \epsilon < \inf_{i \ne j} d(A_i, A_j)$. For $x \in A_i$ set $U_n(\hat x) = \pi(\{y \in L : d(y, A_i) < \epsilon/n\})$.
Using this basis it is immediate to verify that $\hat L$ is Hausdorff, and since it is compact (by continuity of $\pi$) it is a regular space. Now let $\{\hat x_k\}_{k \ge 1}$ be a countable dense set in $\hat L$. The family $\{U_n(\hat x_k)\}_{n, k \ge 1}$ is a countable basis of $\hat L$.
The flow $\Phi$ induces a flow $\hat\Phi$ on $\hat L$ defined by $\hat\Phi_t \circ \pi = \pi \circ \Phi_t$ which has simple dynamics, and the $\pi(A_j)$ as equilibria.
Let $x \in L$. It is clear, by definition of chain recurrence and uniform continuity of $\pi$, that $\pi(x)$ is chain recurrent for $\hat\Phi$. Hence $\hat L$ is internally chain recurrent and the result follows from Proposition 6.1. QED

6.2 Lyapounov Functions and Stochastic Gradients

Let $A \subset M$ be a compact invariant set of the semiflow $\Phi$. A continuous function $V : M \to \mathbb{R}$ is called a Lyapounov function for $A$ if the function $t \in \mathbb{R}_+ \mapsto V(\Phi_t(x))$ is constant for $x \in A$ and strictly decreasing for $x \in M \setminus A$. If $A$ equals the equilibria set $\mathrm{Eq}(\Phi)$, $V$ is called a strict Lyapounov function and $\Phi$ a gradientlike system.
Proposition 6.4 Let $A \subset M$ be a compact invariant set and $V : M \to \mathbb{R}$ a Lyapounov function for $A$. Assume that $V(A) \subset \mathbb{R}$ has empty interior. Then every internally chain transitive set $L$ is contained in $A$ and $V|L$ is constant.
Proof Let $L \subset M$ be an internally chain transitive set. Let $v^* = \inf\{V(x) : x \in L\}$. We claim that $L \cap A \ne \emptyset$ and

$$v^* = \inf\{V(x) : x \in L \cap A\}.$$

Let $x \in L$. The function $t \mapsto V(\Phi_t(x))$ being non-increasing and bounded, the limit $V_\infty(x) = \lim_{t\to\infty} V(\Phi_t(x))$ exists. Therefore $V(p) = V_\infty(x)$ for all $p \in \omega(x)$. By invariance of $\omega(x)$, $V$ is constant along trajectories in $\omega(x)$. Hence $\omega(x) \subset A$. This proves the claim.
By continuity of $V$ and compactness of $L \cap A$, $v^* \in V(L \cap A)$. Since $V(A)$ has empty interior there exists a sequence $\{v_n\}_{n \ge 1}$, $v_n \in \mathbb{R} \setminus V(A)$, decreasing to $v^*$. For $n \ge 1$ let $L_n = \{x \in L : V(x) < v_n\}$. Because $V$ is a Lyapounov function for $A$, $\Phi_t(\bar L_n) \subset L_n$ for any $t > 0$. Hence by Lemma 5.2 and Proposition 5.3, $L = L_n$. Then $L = \bigcap_{n \ge 1} L_n = \{x \in L : V(x) = v^*\}$. This implies $L \subset A$ and $V(L) = \{v^*\}$. QED
Remark 6.5 The following example shows that the assumption that $V(A)$ has empty interior is essential in Proposition 6.4.
Consider the flow on the unit circle $S^1 = \mathbb{R}/2\pi\mathbb{Z}$ induced by the differential equation $\frac{d\theta}{dt} = f(\theta)$ where $f$ is a $2\pi$-periodic smooth nonnegative function such that

$$f^{-1}(0) = \bigcup_{k \in 2\mathbb{Z}} [k\pi, (k+1)\pi].$$

Then $S^1$ is clearly internally chain transitive. However any $2\pi$-periodic smooth nonnegative function $V : S^1 \to \mathbb{R}$ strictly increasing on $[0, \pi)$ and strictly decreasing on $(\pi, 2\pi)$ is a strict Lyapounov function.

Corollary 6.6 Assume that $X$ is precompact, $\Phi$ admits a strict Lyapounov function, and that there are countably many equilibria in $L(X)$. Then $X(t)$ converges to an equilibrium as $t \to \infty$.

The following corollary is particularly useful in applications since it provides a general convergence result for stochastic gradient algorithms.
Corollary 6.7 Assume $M$ is a smooth $C^r$ Riemannian manifold of dimension $m \ge 1$, $V : M \to \mathbb{R}$ a $C^r$ map and $F$ the gradient vector field

$$F(x) = -\nabla V(x).$$

Assume

(i) $F$ induces a global flow $\Phi$,
(ii) $X$ is a precompact asymptotic pseudotrajectory of $\Phi$,
(iii) $r \ge m$.

Then $L(X)$ consists of equilibria and $V(X(t))$ converges as $t \to \infty$.

Proof Let $A = \mathrm{Eq}(\Phi)$. By Sard's theorem (Hirsch, 1976; chapter 3) $V(A)$ has Lebesgue measure zero in $\mathbb{R}$ and the result follows from Proposition 6.4 applied with the strict Lyapounov function $V$. QED
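The convergence statement for stochastic gradient algorithms can be illustrated on a one-dimensional example. The following is a minimal sketch under assumptions that are not part of the notes: the double-well potential $V(x) = (x^2 - 1)^2/4$, the step sizes $\gamma_n = 1/(n + 10)$, uniform noise on $[-1, 1]$ and the helper names are all hypothetical choices. The equilibria of $F = -V'$ are $-1$, $0$ and $1$, so the iterates are expected to settle near one of them (in practice one of the two minima) while $V(x_n)$ settles near the corresponding critical value.

```python
import random

def V(x):
    return (x * x - 1.0) ** 2 / 4.0      # assumed double-well potential

def grad_V(x):
    return x ** 3 - x                    # V'(x)

def stochastic_gradient(x0, n_steps, seed=1):
    """Iterate x_{n+1} = x_n + gamma_{n+1} (-V'(x_n) + U_{n+1})."""
    rng = random.Random(seed)
    x = x0
    for n in range(n_steps):
        gamma = 1.0 / (n + 10)           # decreasing step sizes
        noise = rng.uniform(-1.0, 1.0)   # bounded, centered noise
        x = x + gamma * (-grad_V(x) + noise)
    return x

x_final = stochastic_gradient(x0=2.0, n_steps=50_000)
print(x_final, V(x_final))
```

Here the set of equilibria is finite, so the hypothesis that $V$ of the equilibria set has empty interior holds trivially; the interest of Corollary 6.7 lies in cases where the equilibria form a continuum.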

6.3 Attractors
Let $X : \mathbb{R}_+ \to M$ be an asymptotic pseudotrajectory of $\Phi$. For any $T > 0$ define

$$d_X(T) = \sup_{k \in \mathbb{N}} d(\Phi_T(X(kT)), X(kT + T)). \qquad (23)$$

If a point $x \in M$ belongs to the basin of attraction of an attractor $A \subset M$ then $\Phi_t(x) \to A$ as $t \to \infty$. The next lemma shows that the same is true for an asymptotic pseudotrajectory $X$ provided that $d_X(T)$ is small enough and $M$ is locally compact. This simple lemma will appear to be very useful in the next section.

Lemma 6.8 Assume $M$ is locally compact. Let $A \subset M$ be an attractor with basin $B(A)$ and let $K \subset B(A)$ be a nonempty compact set. There exist numbers $T > 0$, $\delta > 0$ depending only on $K$ such that:
If $X$ is an asymptotic pseudotrajectory with $X(0) \in K$ and $d_X(T) < \delta$, then $L(X) \subset A$.
Proof Choose an open set $W$ with compact closure such that $A \cup K \subset W \subset \bar W \subset B(A)$ and choose $\delta > 0$ such that $U_{2\delta}(A)$ (the $2\delta$ neighborhood of $A$) is contained in $W$. Since $A$ is an attractor there exists $T > 0$ such that $\Phi_T(W) \subset U_\delta(A)$. Now, if $X(0) \in K$ and $d_X(T) < \delta$ we have $\Phi_T(X(0)) \in U_\delta(A)$ and $d(X(T), \Phi_T(X(0))) < \delta$. Thus $X(T) \in U_{2\delta}(A) \subset W$. By induction it follows that $X(kT) \in W$ for all $k \in \mathbb{N}$. Thus, by compactness, $L(X) \cap \bar W \ne \emptyset$ and $L(X)$ is compact as a subset of $\overline{\Phi([0, T] \times W)}$. Since points in $L(X) \cap \bar W$ are attracted by $A$ and $L(X)$ is invariant, $L(X) \cap A \ne \emptyset$. The conclusion now follows from Proposition 5.3 and Theorem 5.7. QED
Below we assume that $M$ is locally compact.

Theorem 6.9 Let $e$ be an asymptotically stable equilibrium with basin of attraction $W$ and $K \subset W$ a compact set. If $X(t_k) \in K$ for some sequence $t_k \to \infty$, then $\lim_{t\to\infty} X(t) = e$.

In the context of stochastic approximations this result was proved by Kushner and Clark (1978). It is an easy consequence of Theorem 5.7 because the only chain recurrent point in the basin of $e$ is $e$.
More generally we have:

Theorem 6.10 Let $A$ be an attractor with basin $W$ and $K \subset W$ a compact set. If $X(t_k) \in K$ for some sequence $t_k \to \infty$, then $L(X) \subset A$.
Proof Follows from Theorem 5.7 and Lemma 6.8. QED
Corollary 6.11 Suppose $M$ is noncompact but locally compact and that $\Phi$ is dissipative, meaning that there exists a global attractor for $\Phi$. Let $M \cup \{\infty\}$ denote the one-point compactification of $M$. Then either $L(X)$ is an internally chain transitive subset of $M$ or $\lim_{t\to\infty} X(t) = \infty$.

When applied to stochastic approximation processes such as those described in section 4 (Propositions 4.2 and 4.4) under the assumption that $F$ is bounded and Lipschitz, Corollary 6.11 implies that with probability one either $X(t) \to \infty$ or $L(X)$ is internally chain transitive for the flow induced by $F$.

6.4 Planar Systems

The following result of Benaim and Hirsch (1994) goes far towards describing the dynamics of internally chain recurrent sets for planar flows with isolated equilibria:
Theorem 6.12 Assume $\Phi$ is a flow defined on $\mathbb{R}^2$ with isolated equilibria. Let $L$ be an internally chain recurrent set. Then for any $p \in L$ one of the following holds:

(i) $p$ is an equilibrium.
(ii) $p$ is periodic (i.e. $\Phi_T(p) = p$ for some $T > 0$).
(iii) There exists a cyclic orbit chain $\Gamma \subset L$ which contains $p$.

Notice that this rules out trajectories in $L$ which spiral toward a periodic orbit, or even toward a cyclic orbit chain.
In view of Theorem 5.7 we obtain:

Corollary 6.13 Let $\Phi$ be a flow in $\mathbb{R}^2$ with isolated equilibria. If $X$ is a bounded asymptotic pseudotrajectory of $\Phi$ then $L(X)$ is a connected union of equilibria, periodic orbits and cyclic orbit chains of $\Phi$.
The following corollary can be seen as a Poincaré-Bendixson result for asymptotic pseudotrajectories:

Corollary 6.14 Let $\Phi$ be a flow defined on $\mathbb{R}^2$, $K \subset \mathbb{R}^2$ a compact subset without equilibria, $X$ an asymptotic pseudotrajectory of $\Phi$. If there exists $T > 0$ such that $X(t) \in K$ for $t \ge T$, then $L(X)$ is either a periodic orbit or a cylinder of periodic orbits.
Of course if $X(t)$ is an actual trajectory of $\Phi$, the Poincaré-Bendixson theorem precludes a cylinder of periodic orbits. But this can easily occur for an asymptotic pseudotrajectory.
The next result extends Dulac's criterion for convergence in planar flows having negative divergence:

Theorem 6.15 Let $\Phi$ be a flow in an open set in the plane, and assume that $\Phi_t$ decreases area for $t > 0$. Then:
(a) $L(X)$ is a connected set of equilibria which is nowhere dense and which does not separate the plane.
(b) If $\Phi$ has at most countably many stationary points, then $L(X)$ consists of a single stationary point.

Proof The proof is contained in that of Theorem 1.6 of (Benaim and Hirsch 1994); here is a sketch. The assumption that $\Phi$ decreases area implies that no invariant continuum can separate the plane. A generalization of the Poincaré-Bendixson theorem (Hirsch and Pugh, 1988) shows that an internally chain recurrent continuum (such as $L(X)$) which does not separate the plane consists entirely of stationary points. Simple topological arguments complete the proof. QED

Example 6.16 Consider the learning process described in section 2.3. Assume
that the probability law of ~n is such that functions h2 are smooth. Then
the divergence of the vector field (6) at every point $(x_1, x_2)$ is

$$\mathrm{Trace}(DF(x_1, x_2)) = -2.$$

This implies that $\Phi_t$ decreases area for $t > 0$. Since the interpolated process of $\{x_n\}$ is almost surely an asymptotic pseudotrajectory of $\Phi$ (use Proposition 4.4), the results of Theorem 6.15 apply almost surely to the limit set of the sequence $\{x_n\}$.
For more details and examples of nonconvergence with more than two players see (Benaim and Hirsch, 1994; Fudenberg and Levine, 1998).
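The constant divergence can be checked numerically. The sketch below rests on an assumption that is not visible in this part of the notes: it supposes that the vector field (6) has the form $F(x_1, x_2) = (-x_1 + h^1(x_2), -x_2 + h^2(x_1))$. The two smooth functions `h1` and `h2` are hypothetical stand-ins. Under that assumption each diagonal entry of $DF$ equals $-1$ whatever the functions are, and a finite difference estimate of the trace returns $-2$.

```python
import math

def h1(u):
    return 1.0 / (1.0 + math.exp(-3.0 * u))    # hypothetical smooth function

def h2(u):
    return 0.5 * (1.0 + math.tanh(u - 0.5))    # hypothetical smooth function

def F(x1, x2):
    # Assumed form of the vector field (6); not confirmed by this section.
    return (-x1 + h1(x2), -x2 + h2(x1))

def divergence(field, x1, x2, eps=1e-5):
    """Central finite-difference estimate of Trace(DF) at (x1, x2)."""
    d1 = (field(x1 + eps, x2)[0] - field(x1 - eps, x2)[0]) / (2.0 * eps)
    d2 = (field(x1, x2 + eps)[1] - field(x1, x2 - eps)[1]) / (2.0 * eps)
    return d1 + d2

div = divergence(F, 0.3, 0.7)
print(div)
```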

7 Convergence with positive probability toward an attractor

Throughout this section $X$ is a continuous time stochastic process defined on some probability space $(\Omega, \mathcal{F}, P)$ with continuous (or càdlàg) paths taking values in $M$.
We suppose that $X(\cdot)$ is adapted to a non-decreasing family of sub-$\sigma$-algebras $\{\mathcal{F}_t : t \ge 0\}$ and that for all $\delta > 0$ and $T > 0$

$$P\left(\sup_{s \ge t}\Big[\sup_{0 \le h \le T} d(X(s+h), \Phi_h(X(s)))\Big] \ge \delta \,\Big|\, \mathcal{F}_t\right) \le w(t, \delta, T) \qquad (24)$$

for some function $w : \mathbb{R}_+^3 \to \mathbb{R}_+$ such that

$$w(t, \delta, T) \downarrow 0 \quad \text{as } t \to \infty.$$

A sufficient condition for (24) is that

$$P\left(\sup_{0 \le h \le T} d(X(t+h), \Phi_h(X(t))) \ge \delta \,\Big|\, \mathcal{F}_t\right) \le \int_t^{t+T} r(s, \delta, T)\,ds \qquad (25)$$

for some function $r : \mathbb{R}_+^3 \to \mathbb{R}_+$ such that

$$\int_0^\infty r(t, \delta, T)\,dt < \infty.$$

This last condition is satisfied by most examples of stochastic approximation processes (see section 4 and section 7.2 below).
Our goal is to give simple conditions ensuring that $X$ converges with positive probability toward a given attractor. We develop here some ideas which originally appeared in Benaim (1997) and Duflo (1997).

7.1 Attainable Sets

A point $p \in M$ is said to be attainable by $X$ if for each $t > 0$ and every open neighborhood $U$ of $p$

$$P(\exists s \ge t : X(s) \in U) > 0.$$

Lemma 7.1 The set $\mathrm{Att}(X)$ of attainable points by $X$ is closed, positively invariant under $\Phi$ and contains almost surely $L(X)$.

Proof A point $p$ lies in $M \setminus \mathrm{Att}(X)$ if there exists a neighborhood $U$ of $p$ and $t \ge 0$ such that

$$P(\forall s \ge t : X(s) \notin U) = 1.$$

Hence it is clear that $M \setminus \mathrm{Att}(X)$ is an open set almost surely disjoint from $L(X)$.
It remains to prove that given any $p \in \mathrm{Att}(X)$ and $T > 0$, $\Phi_T(p) \in \mathrm{Att}(X)$. Fix $\epsilon > 0$. By continuity of $\Phi_T$ there exists $\alpha > 0$ such that $\Phi_T(B_\alpha(p)) \subset B_{\epsilon/2}(\Phi_T(p))$. Since $p \in \mathrm{Att}(X)$ and $X$ is continuous there exists a sequence $\{s_k\}$ of rational numbers with $s_k \to \infty$ such that $P(X(s_k) \in B_\alpha(p)) > 0$. Choose $k$ large enough so that $P(d(\Phi_T(X(s_k)), X(s_k + T)) \le \epsilon/2 \mid \mathcal{F}_{s_k}) \ge 1/2$. Hence $P(X(s_k + T) \in B_\epsilon(\Phi_T(p))) > 0$. This proves that $\Phi_T(p) \in \mathrm{Att}(X)$. QED
Example 7.2 Let X be the interpolated process associated to the urn process
described in section 2.2. Suppose that the urn function f maps ~m into
Then it is not hard to verify that every point of ~m is attainable.
The proof is left to the reader.

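Theorem 7.3 Suppose $M$ is locally compact. Let $A \subset M$ be an attractor for $\Phi$ with basin of attraction $B(A)$. If $\mathrm{Att}(X) \cap B(A) \ne \emptyset$ then

$$P(L(X) \subset A) > 0.$$

Furthermore, if $U \subset M$ is an open set relatively compact with $\bar U \subset B(A)$ there exist numbers $T, \delta > 0$ (depending on $U$) so that

$$P(L(X) \subset A) \ge (1 - w(t, \delta, T))\,P(\exists s \ge t : X(s) \in U).$$

Proof Let $U$ be an open set such that $K = \bar U$ is a compact subset of $B(A)$. To the compact set $K$ we can associate the numbers $T > 0$, $\delta > 0$ given by Lemma 6.8.
Let $t > 0$ be sufficiently large so that $w(t, \delta, T) < 1$. For $n \in \mathbb{N}$ and $k \in \mathbb{N}$ set $t_n(k) = k/2^n$ and let

$$\tau_n = \inf\{t_n(k) : X(t_n(k)) \in U \text{ and } t_n(k) \ge t\}.$$

By Lemma 6.8

$$\{\tau_n < \infty\} \cap \Big\{\sup_{s \ge \tau_n} d(X(s+T), \Phi_T(X(s))) < \delta\Big\} \subset \{L(X) \subset A\}.$$

Hence

$$P(L(X) \subset A) \ge \sum_{k \ge [2^n t]+1} E\Big[P\Big(\sup_{s \ge t_n(k)} d(X(s+T), \Phi_T(X(s))) < \delta \,\Big|\, \mathcal{F}_{t_n(k)}\Big)\mathbf{1}_{\{\tau_n = t_n(k)\}}\Big]$$

$$\ge \sum_{k \ge [2^n t]+1} (1 - w(t_n(k), \delta, T))\,P(\tau_n = t_n(k)) \ge (1 - w(t, \delta, T))\,P(\tau_n < \infty).$$

Since $\lim_{n\to\infty} P(\tau_n < \infty) = P(\exists s \ge t : X(s) \in U)$ we obtain

$$P(L(X) \subset A) \ge (1 - w(t, \delta, T))\,P(\exists s \ge t : X(s) \in U).$$

Now, to prove that $P(L(X) \subset A) > 0$ it suffices to choose for $U$ a neighborhood of a point $p \in \mathrm{Att}(X) \cap B(A)$. QED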

7.2 Examples
Proposition 7.4 Let $F : \mathbb{R}^m \to \mathbb{R}^m$ be a Lipschitz vector field. Consider the diffusion process

$$dX = F(X)\,dt + \sqrt{\varepsilon(t)}\,dB_t$$

where $\varepsilon$ is a positive decreasing function such that for all $c > 0$

$$\int_0^\infty \exp\left(-\frac{c}{\varepsilon(t)}\right)dt < \infty.$$

Then

(i) For each attractor $A$ of $F$ the event

$$\Omega_A = \{\lim_{t\to\infty} d(X(t), A) = 0\} = \{L(X) \subset A\}$$

has positive probability and for each open set $U$ relatively compact with $\bar U \subset B(A)$

$$P(\Omega_A) \ge P(\exists s \ge t : X(s) \in U)\left(1 - \int_t^\infty C\exp\left(\frac{-\delta^2}{C(T)\varepsilon(s)}\right)ds\right)$$

with $\delta$ and $T$ given by Lemma 6.8, and where $C, C(T)$ are positive constants (depending on $F$).
(ii) On $\Omega_A$, $L(X)$ is almost surely internally chain transitive.
(iii) If $F$ is a dissipative vector field with global attractor $A$

$$P(\Omega_A) = 1 - P(\lim_{t\to\infty}\|X(t)\| = \infty) > 0.$$

Proof (i) follows from the fact that the law of $X(t)$ has positive density with respect to the Lebesgue measure. Hence $\mathrm{Att}(X) = \mathbb{R}^m$ and Theorem 7.3 applies. The lower bound for $P(\Omega_A)$ follows from Theorem 7.3 combined with Proposition 4.6, (iii). Statement (iii) follows from Theorem 7.3 and Corollary 6.11. QED
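A diffusion with vanishing noise of this kind can be simulated with an Euler-Maruyama scheme. The sketch below is illustrative only and rests on several assumptions: the noise coefficient is taken to be $\sqrt{\varepsilon(t)}$, the drift is $F(x) = -x$ (whose flow has the global attractor $\{0\}$), $\varepsilon(t) = 1/(1+t)$ (which satisfies the integrability condition for every $c > 0$), and the time step and function name are arbitrary. The simulated path is expected to end close to the attractor.

```python
import math
import random

def simulate(x0, t_end, dt=0.01, seed=2):
    """Euler-Maruyama scheme for dX = -X dt + sqrt(eps(t)) dB, eps(t) = 1/(1+t)."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        eps = 1.0 / (1.0 + t)                       # assumed noise intensity
        x += -x * dt + math.sqrt(eps * dt) * rng.gauss(0.0, 1.0)
        t += dt
    return x

x_end = simulate(x0=3.0, t_end=100.0)
print(x_end)
```

For this linear example the variance of $X(t)$ behaves like $\varepsilon(t)/2$ for large $t$, so the typical distance to the attractor at time $100$ is below $0.1$.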

Similarly we have
Proposition 7.5 Let $F : \mathbb{R}^m \to \mathbb{R}^m$ be a Lipschitz bounded vector field. Consider a Robbins-Monro algorithm (7) satisfying the assumptions of Proposition 4.2 or 4.4. Then
(i) For each attractor $A \subset \mathbb{R}^m$ whose basin has nonempty intersection with $\mathrm{Att}(X)$ the event

$$\Omega_A = \{\lim_{t\to\infty} d(X(t), A) = 0\} = \{L(X) \subset A\}$$

has positive probability and for each open set $U$ relatively compact such that $\bar U \subset B(A)$

$$P(\Omega_A) \ge P(\exists s \ge t : X(s) \in U)\left(1 - \int_t^\infty r(\delta, T, s)\,ds\right)$$

where

$$r(\delta, T, s) = C\exp\left(\frac{-\delta^2}{C(T)\bar\gamma(s)}\right)$$

if $\{U_n\}$ is subgaussian (Proposition 4.4) and

$$r(\delta, T, s) = C(T, q)\left(\frac{\sqrt{\bar\gamma(s)}}{\delta}\right)^q$$

under the weaker assumptions given by Proposition 4.2, with $\delta$ and $T$ given by Lemma 6.8. Here $C$, $C(T)$, $C(T, q)$ denote positive constants.
(ii) On $\Omega_A$, $L(X)$ is almost surely internally chain transitive.
(iii) If $F$ is a dissipative vector field with global attractor $A$

$$P(\Omega_A) = 1 - P(\lim_{t\to\infty}\|X(t)\| = \infty) > 0.$$

7.3 Stabilization
Most of the results given in the preceding sections assume a precompact asymptotic pseudotrajectory $X$ for a semiflow $\Phi$. Actually when $X$ is not precompact the long term behavior of $X$ usually presents little interest (see Corollary 6.11).

For stochastic approximation processes there are several stability conditions which ensure that the paths of the process are almost surely bounded. Such conditions can be found in numerous places such as Nevelson and Khasminskii (1976), Benveniste et al (1990), Delyon (1996), Duflo (1996, 1997), Fort and Pages (1996), Kushner and Yin (1997), to name just a few. We present here a theorem due to Kushner and Yin (1997, Theorem 4.3).

Theorem 7.6 Let

$$x_{n+1} - x_n = \gamma_{n+1}\left(F(x_n) + U_{n+1}\right)$$

be a Robbins-Monro algorithm (section 4.2). Suppose that there exists a $C^2$ function $V : \mathbb{R}^m \to \mathbb{R}_+$ with bounded second derivatives and a nonnegative function $k : \mathbb{R}^m \to \mathbb{R}_+$ such that:

(i) $\langle\nabla V(x), F(x)\rangle \le -k(x)$.

(ii) $\lim_{\|x\|\to\infty} V(x) = +\infty$.

(iii) There are positive constants K, R such that


+ Kk(xn)
when
E(n
+ 00.

(iv) $E(k(x_n)) < \infty$ if $E(V(x_n)) < \infty$, and $E(V(x_0)) < \infty$.

Then $\limsup_{n\to\infty}\|x_n\| < \infty$ with probability one.


Proof A second order Taylor expansion and boundedness of the second deriva-
tive of V implies the existence of some constant Ki > 0 such that

+ + .

The hypotheses then imply that E(V(xn)) oo for all n. Let

Wn - +
s>n

and Vn =
V(xn) + Wn Vn is nonnegative and
E(Vn+1 - +

Since $\gamma_n \to 0$ there exists $n_0 \ge 0$ such that $E(V_{n+1} - V_n \mid \mathcal{F}_n) \le 0$ for $n \ge n_0$. Since $V_n \ge 0$ and $E(V_{n_0}) < \infty$ the supermartingale convergence theorem implies that $\{V_n\}$ converges with probability one toward some nonnegative $L^1$ random variable $V$. Since assumption (iii) implies that $W_n \to 0$ with probability one, $V(x_n) \to V$ with probability one. By assumption (ii) we then must have $\limsup_{n\to\infty}\|x_n\| < \infty$. QED

8 Shadowing Properties
In this section we consider the following question:
Given a stochastic approximation process such as (7) (or more generally an asymptotic pseudotrajectory for a flow $\Phi$) does there exist a point $x$ such that the omega limit set of the trajectory $\{\Phi_t(x) : t \ge 0\}$ is $L(X)$?
The answer is generally negative and $L(X)$ can be an arbitrary chain transitive set. However it is useful to understand what kind of conditions ensure a positive answer to this question. A case of particular interest in applications is given by the following problem: Assume that each $\Phi$-trajectory converges toward an equilibrium. Does $X$ also converge toward an equilibrium?
The material presented in this section is based on the works of Hirsch (1994), Benaim (1996), Benaim and Hirsch (1996), Duflo (1996) and Schreiber (1997).
We begin with an illustrative example borrowed from Benaim (1996) and Duflo (1996).
Example 8.1 Consider the Robbins Monro algorithm given in polar coordi-
nates (p, 8) by the system

Pn+1- Pn = +

=
+
where $\{\xi_n\}$ is a sequence of i.i.d. random variables uniformly distributed on $[-1, 1]$ and $\{\gamma_n\}$ satisfies the condition of Proposition 4.4. The function $h$ is a
smooth function such that h(u) and -3 h(u) -4 for
~c > 4, g(p) = and ~yn 1/4 for all n. These choices ensure that the
algorithm is well defined (i.e. $\rho_0 > 0$ implies $\rho_n > 0$ for all $n \ge 0$).
We suppose given $\rho_0 > 0$. It is then not hard to verify that there exist some constants $0 < k(\rho_0) \le K(\rho_0)$ such that $k(\rho_0) \le \rho_n \le K(\rho_0)$ for all $n \ge 0$.
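Let $F : \mathbb{R}^2 \to \mathbb{R}^2$ be the vector field defined by

$$F(x, y) = \left(x h(x^2 + y^2) - y^3,\; y h(x^2 + y^2) + x y^2\right). \qquad (26)$$

Then $X_n = (x_n, y_n) = (\rho_n \cos(\theta_n), \rho_n \sin(\theta_n))$ satisfies a recursion of the form

$$X_{n+1} - X_n = \gamma_{n+1}\left(F(X_n) + U_{n+1}\right) + O(\gamma_{n+1}^2)$$

where $\{U_n\}$ is a sequence of bounded random variables such that $E(U_{n+1} \mid \mathcal{F}_n) = 0$.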

Figure 2: The phase portrait of $F(x, y) = (x h(x^2 + y^2) - y^3,\; y h(x^2 + y^2) + x y^2)$.

Let $\Phi$ be the flow induced by $F$ (see figure 2). Equilibria of $\Phi$ are the points $a = (-1, 0)$, $b = (0, 0)$, $c = (1, 0)$, and every trajectory of $\Phi$ converges toward one of these equilibria. Internally chain transitive sets are the equilibria $\{a\}$, $\{b\}$, $\{c\}$ and the unit circle $S^1 = \{\rho = 1\}$ which is a cyclic orbit chain.
Since $\{(x_n, y_n)\}$ lives in some compact set disjoint from the origin, Theorem 5.7 combined with Proposition 4.4 and Remark 4.5 imply that the limit set of $\{(x_n, y_n)\}$ is almost surely one of the sets $\{a\}$, $\{c\}$ or $S^1$. We claim that if $\sum_n \gamma_n^2 = \infty$ then this limit set is almost surely $S^1$. Suppose on the contrary that $\{(x_n, y_n)\}$ converges toward one of the points $a$ or $c$. Then $\lim_{n \to \infty} d(\theta_n, \pi\mathbb{Z}) = 0$ and since $\theta_{n+1} - \theta_n = O(\gamma_{n+1})$ the sequence $\{\theta_n\}$ must converge. On the other hand by the law of iterated logarithm for martingales $\limsup_{n \to \infty} \sum_{i=1}^n \gamma_i \xi_i = \infty$. Thus

$$\limsup_{n \to \infty} \theta_n \ge \theta_0 + \limsup_{n \to \infty} \sum_{i=1}^n \gamma_i \xi_i = \infty.$$

A contradiction.
This example shows that the limiting behavior of a stochastic approximation process can be quite different from the limiting behavior of the associated ODE.
We will show later (see Example 8.16) that $\{(x_n, y_n)\}$ actually converges toward one of the points $a$ or $c$ provided that $\gamma_n$ goes to zero "fast enough".
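
The role played by the size of the step sizes in this example can be made concrete by looking at the variance of the martingale $\sum_i \gamma_i \xi_i$. The sketch below is only an illustration: the two step-size sequences and the helper name `noise_variance` are assumptions, and the code computes a variance rather than simulating the algorithm itself.

```python
def noise_variance(gamma, n):
    """Variance of sum_{i=1}^n gamma(i) * xi_i for i.i.d. xi_i uniform on [-1, 1].

    Var(xi_i) = 1/3, so the variance equals (1/3) * sum_{i<=n} gamma(i)^2.
    """
    return sum(gamma(i) ** 2 for i in range(1, n + 1)) / 3.0

slow = lambda i: i ** -0.5   # squares are not summable (sum grows like log n)
fast = lambda i: 1.0 / i     # squares are summable (sum tends to pi^2 / 6)

for n in (10**2, 10**4, 10**6):
    print(n, noise_variance(slow, n), noise_variance(fast, n))
```

For the slowly decreasing sequence the variance keeps growing, which is the regime where the accumulated noise can push $\theta_n$ around the circle forever; for the fast sequence the variance stays bounded and the accumulated noise converges.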

8.1 $\lambda$-Pseudotrajectories
Let $X$ denote an asymptotic pseudotrajectory for a semiflow $\Phi$ on the metric space $M$. For $T > 0$ let

$$e(X, T) = \limsup_{t \to \infty} \frac{1}{t} \log\Big(\sup_{0 \le h \le T} d(X(t + h), \Phi_h(X(t)))\Big)$$

and define the asymptotic error rate of $X$ to be

$$e(X) = \sup_{T > 0} e(X, T).$$

If $e(X) \le \lambda < 0$ we call $X$ a $\lambda$-pseudotrajectory of $\Phi$.


The maps $(\Phi_t)$ are said to be Lipschitz, locally uniformly in $t > 0$ if for each $T > 0$ there exists $L(T) > 0$ such that

$$d(\Phi_h(x), \Phi_h(y)) \le L(T)\, d(x, y)$$

for all $0 \le h \le T$, $x, y \in M$.

Lemma 8.2 If the $(\Phi_t)$ are Lipschitz, locally uniformly in $t > 0$ then $e(X, T) = e(X)$ for all $T > 0$.

Proof Let $T' > T > 0$. It is clear from the definition that $e(X, T) \le e(X, T')$. Conversely, write $T' = kT + r$ with $k \in \mathbb{N}$ and $0 \le r < T$ and set $D(X, R, t) = \sup_{0 \le h \le R} d(\Phi_h(X(t)), X(t + h))$. Then

$$D(X, T', t) \le D(X, (k+1)T, t) \le \sup\big(D(X, kT, t),\; L(T) D(X, kT, t) + D(X, T, t + kT)\big).$$

Thus

$$e(X, T') \le e(X, (k+1)T) \le \sup(e(X, T), e(X, kT)).$$

Therefore $e(X, T') \le e(X, (k+1)T) \le e(X, T)$. QED

Main Examples
Our main example of $\lambda$-pseudotrajectories is given by stochastic approximation processes whose step sizes go to zero at a "fast" rate:

Proposition 8.3 Let $(x_n)$ given by (7) be a Robbins-Monro algorithm. Suppose

$$l(\gamma) = \limsup_{n \to \infty} \frac{\log(\gamma_n)}{\tau_n} < 0.$$

Assume that $F$ is a Lipschitz bounded vector field and that $(U_n)$ satisfies the assumptions of Proposition 4.2. Then $X$ (the interpolated process) is almost surely a $\frac{l(\gamma)}{2}$-pseudotrajectory of $\Phi$ (the flow induced by $F$).
Proof Set A = I(y) and let 0 e -A. For t large enough y(t)
Therefore, with A(t, T) given by (10) and k ~ N large enough, equation (16)
implies
E(A(kT, TC(q, T) exp ~~~~ ~ .

Let a > 0 be such that ?+ a 0. By Markov inequality

P(A(kT> T) ~e-kT03B1) ~ TC(q> T) +


03B + ~ 2) . °

By Borel Cantelli Lemma this implies that

lim sup # i°g ( A ( kT> T) ) -T03B1



almost surely. Since $\alpha$ can be chosen arbitrarily close to $-\lambda/2$ this proves that

$$\limsup_{k \to \infty} \frac{1}{kT} \log(\Delta(kT, T)) \le \lambda/2$$

almost surely. Since $\Delta(t, T) \le 2\Delta(kT, T) + \Delta((k+1)T, T)$ we get that

almost surely and we conclude the proof by using inequality (11). QED
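Remark 8.4 If $\gamma_n = f(n)$ for some positive decreasing function $f$ with $\int_1^\infty f(s)\,ds = \infty$, then

$$l(\gamma) = \limsup_{x \to \infty} \frac{\log(f(x))}{\int_1^x f(s)\,ds}.$$

For example, if

$$\gamma_n = \frac{A}{n^\alpha \log(n)^\beta}$$

then $l(\gamma) = 0$ for $0 < \alpha < 1$ and $\beta \ge 0$, $l(\gamma) = -1/A$ for $\alpha = 1$ and $\beta = 0$, and $l(\gamma) = -\infty$ for $\alpha = 1$ and $0 < \beta \le 1$.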
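
The values quoted in this remark can be checked numerically. The sketch below assumes that $\tau_n$ denotes the partial sum $\gamma_1 + \dots + \gamma_n$ (the discrete analogue of $\int_1^x f(s)\,ds$); the function name `rate` and the sample parameters are chosen only for the illustration.

```python
import math

def rate(gamma, n):
    """Return log(gamma(n)) / tau_n with tau_n = gamma(1) + ... + gamma(n)."""
    tau = sum(gamma(i) for i in range(1, n + 1))
    return math.log(gamma(n)) / tau

A = 2.0
for n in (10**3, 10**5):
    # gamma_n = A / n should drift toward -1/A; gamma_n = n^(-1/2) toward 0
    print(n, rate(lambda i: A / i, n), rate(lambda i: i ** -0.5, n))
```

The convergence toward $-1/A$ is only logarithmic in $n$, so the printed values approach the limit slowly; the point of the sketch is the sign and the order of magnitude, not a precise estimate.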

Similar to Proposition 8.3 is the next proposition whose proof is left to the reader:

Proposition 8.5 Let $X$ be the continuous time Markov process associated to the generator (21) and let

$$\lambda = \limsup_{t \to \infty} \frac{\log(\varepsilon(t))}{2t}.$$

If the vector field $F$ given by (22) is Lipschitz continuous then $X$ is almost surely a $\lambda$-pseudotrajectory of the flow induced by $F$.

Consider now the following situation. Suppose that $X$ is a $\lambda$-pseudotrajectory whose limit set is contained in some compact positively invariant set $K$. Let $Y(t) \in K$ denote a point nearest to $X(t)$. It is not true in general that $Y$ is a $\lambda$-pseudotrajectory for $\Phi|K$ but, for reasons that will be made clear later, it may be useful to know when this is true. The end of this section is devoted to this question.
Let $K \subset M$ be a compact positively invariant set for $\Phi$ and $B \subset M$ a set containing $K$. We say that $K$ attracts $B$ exponentially at rate $\alpha < 0$ if there exists $C > 0$ such that

$$d(\Phi_t(x), K) \le C e^{\alpha t} d(x, K)$$

for all $x \in B$ and $t \ge 0$.

Example 8.6 Suppose $\Phi$ is a $C^1$ flow on $\mathbb{R}^m$ and $\Gamma \subset \mathbb{R}^m$ is a periodic orbit of period $T > 0$. For any $p \in \Gamma$ let $\lambda_1, \ldots, \lambda_m$ be the eigenvalues of $D\Phi_T(p)$ (counted with their multiplicities). The $\lambda_i$ are called the characteristic (or Floquet) multipliers. They are independent of $p$ and unity is always a Floquet multiplier (see Hartman (1964)).
Let $\alpha < 0$. If 1 has multiplicity 1 and the remaining $m - 1$ Floquet multipliers are strictly inside the complex disk of center 0 and radius $e^{\alpha}$ then there exists a neighborhood $B$ of $\Gamma$ such that $\Gamma$ attracts exponentially $B$ at rate $\alpha/T$. In this case $\Gamma$ is called an attracting hyperbolic periodic orbit.

Lemma 8.7 Let $\alpha < 0$, $\lambda < 0$ and $\beta = \sup(\alpha, \lambda)$. Suppose $K \subset B$ attracts exponentially $B$ at rate $\alpha$. Let $X$ be a $\lambda$-pseudotrajectory for $\Phi$ such that $X(t) \in B$ for all $t \ge 0$ and let $Y(t) \in K$ be a point nearest to $X(t)$. Then

(i)
$$\limsup_{t \to \infty} \frac{1}{t} \log d(X(t), Y(t)) \le \beta.$$

(ii) If the $\{\Phi_t\}$ are Lipschitz, locally uniformly in $t > 0$ then $Y$ is a $\beta$-pseudotrajectory for $\Phi|K$.
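Proof Choose $0 < \epsilon < -\beta$ and choose $T > 0$ large enough such that

$$d(\Phi_T(x), K) \le e^{(\alpha + \epsilon)T} d(x, K)$$

for all $x \in B$. Thus there exists $t_0$ such that for $t \ge t_0$

$$d(X(t+T), K) \le d(X(t+T), \Phi_T(X(t))) + d(\Phi_T(X(t)), K) \le e^{(\lambda + \epsilon)t} + e^{(\alpha + \epsilon)T} d(X(t), K).$$

Let $v_k = d(X(kT), K)$, $\rho = e^{(\beta + \epsilon)T}$ and $k_0 = [t_0/T] + 1$. Then $v_{k+1} \le \rho^k + \rho v_k$ for $k \ge k_0$. Hence

$$v_{k_0 + m} \le \rho^m (m \rho^{k_0 - 1} + v_{k_0})$$

for $m \ge 1$. It follows that

$$\limsup_{k \to \infty} \frac{\log(v_k)}{kT} \le \beta + \epsilon.$$

Also for $kT \le t < (k+1)T$ and $k \ge k_0$

$$d(X(t), K) \le d(X(t), \Phi_{t - kT}(X(kT))) + d(\Phi_{t - kT}(X(kT)), K) \le e^{(\lambda + \epsilon)kT} + C v_k.$$

Thus

$$\limsup_{t \to \infty} \frac{\log(d(X(t), K))}{t} \le \beta + \epsilon$$

and since $\epsilon$ is arbitrary we get the desired result.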


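To obtain (ii) observe that

$$d(Y(t+h), \Phi_h(Y(t))) \le d(\Phi_h(Y(t)), \Phi_h(X(t))) + d(\Phi_h(X(t)), X(t+h)) + d(X(t+h), Y(t+h)).$$

Then for $t$ large enough and $T > 0$

$$\sup_{0 \le h \le T} d(Y(t+h), \Phi_h(Y(t))) \le L(T)\, d(X(t), Y(t)) + \sup_{0 \le h \le T} d(\Phi_h(X(t)), X(t+h)) + \sup_{0 \le h \le T} d(X(t+h), Y(t+h)).$$

QED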

8.2 Expansion Rate and Shadowing


From now on we assume that $M$ is a Riemannian manifold and $\Phi$ a $C^1$ flow on $M$. The norm of a tangent vector $v$ in the Riemannian metric is denoted by $\|v\|$. In our applications $M$ will be a submanifold of $\mathbb{R}^m$ positively invariant under the flow generated by a smooth vector field.
Let $K \subset M$ denote a compact positively invariant set. The expansion constant of $\Phi_t$ at $K$ (Hirsch, 1994) is the number

$$EC(\Phi_t, K) = \inf_{x \in K} m(D\Phi_t(x))$$

where

$$m(D\Phi_t(x)) = \inf\{\|D\Phi_t(x)v\| : \|v\| = 1\}$$

denotes the minimal norm of $D\Phi_t(x)$. Observe that since $\Phi$ is a flow then

$$m(D\Phi_t(x)) = \|D\Phi_{-t}(\Phi_t(x))\|^{-1}.$$

The expansion rate of $\Phi$ at $K$ is defined as

$$\mathcal{E}(\Phi, K) = \lim_{t \to \infty} \frac{1}{t} \log(EC(\Phi_t, K))$$

where the limit exists by a standard subadditivity argument whose verification is left to the reader.

Remark 8.8 It is important to understand that the expansion rate of $\Phi$ at $K$ depends on the dynamics of $\Phi$ in $M$ and not only in $K$. As a simple example illustrating this point, consider a smooth flow in $\mathbb{R}^m$ having a non-stationary periodic orbit $\Gamma$ of period $T > 0$. Then it is not hard to see that $\mathcal{E}(\Phi, \Gamma)$ equals the smallest real part of the Floquet exponents of $\Gamma$ divided by $T$ (this easily follows from Theorem 8.12). If we now set $M = \Gamma$ and $\Psi = \Phi|\Gamma$ then $\mathcal{E}(\Psi, \Gamma) = 0$.

We now state a shadowing result due to Benaim and Hirsch (1996) whose proof is an (easy) adaptation of Hirsch's shadowing theorem (Hirsch, 1994).

Theorem 8.9 Let $K \subset M$ be a compact positively invariant set. Let $X$ be a $\lambda$-pseudotrajectory for $\Phi$. Suppose

(a) $L(X) \subset K$.

(b) $\lambda < \min\{0, \mathcal{E}(\Phi, K)\}$.

Then

(i) There exists $r > 0$ and $x \in M$ such that

$$\limsup_{t \to \infty} \frac{1}{t} \log d(X(t), \Phi_{t+r}(x)) \le \lambda.$$

(ii) Let $x$ be as in (i). Suppose $l > 0$, $y \in M$ are such that

$$\limsup_{t \to \infty} \frac{1}{t} \log d(X(t), \Phi_{t+l}(y)) \le \lambda.$$

Then $x$ and $y$ are on the same orbit.


Proof Since $\lambda < \mathcal{E}(\Phi, K)$ we can choose $T > 0$ large enough so that

$$\min_{x \in K} m(D\Phi_T(x)) > e^{\lambda T}.$$

Set $f = \Phi_T$, $y_k = X(kT)$ and fix $\nu$ such that $e^{\lambda T} < \nu < \min_{x \in K} m(Df(x))$. Thus for $k$ large enough

$$d(y_{k+1}, f(y_k)) \le \nu^k. \qquad (27)$$

By continuity of $Df$ and compactness of $K$ there exists a neighborhood $U$ of $K$ such that

$$\min_{x \in U} m(Df(x)) = \rho > \nu. \qquad (28)$$
Claim: There exists a neighborhood N c U of I and p* > 0 such that

B(f(z), P» C f(B(z, P))


for all z e Nand p p* .

Proof of the claim: Choose a neighborhood U’ of K, and r > 0, small enough


such that for all y e U’ and d(y, z) there exists C1
r a curve yy z :[0, 1] - U
with the properties:

(i) ’ly,z (o) " Y>


(ii) ’yy,z ([0, 1]) c U,
(iii) *
d(y, Z).
Set N f~ ~ (U’) n U and p*
= =
§. Let z e N, p p* and d(z, f(z) ) pp.. Then

d(f-1 (z), x) d(f-1 (z) , f-1 (f(x)))


*
~
/ o i ~Df-1 l’ff0>,z S))’f)z>,z
~P /~ l li)x> z (S) l lds ~> d(f(z) , z)
o

=
P.

This proves the claim.



Since L(X) C E N for k large enough. Let v ~ ~c}. We


claim that for k large enough
C °
(29)
Indeed let z E B(yk ~k). Then (27) implies that for k large enough
f (yk-i)l ~ ykJ + yk) ~ ~k + vk
Since 1/ 03B4 , 03B4k + 03BDk-1 ~ 03B4k-1 for k > Thus for k large enough,
say k > m,
.

z E C f (B(yk-i,
where the last inclusion follows from the claim. This proves (29). .

Set Bk = B(Yk, , 6k ) . For n > m estimate (29) implies that

i>0

is a nonempty compact set. Also for z E Qn, This proves


statement of the Theorem.
(i)
Since c5 ~ the claim shows that the diameter of goes to zero as

i - oo. This implies that Qn = (zn) all n > m where zn =

This implies (ii)..


QED
Corollary 8.10 Let $\Phi$ be a semiflow on $M$, $A \subset M$ a positively invariant submanifold of $M$ and $K \subset A$ a compact positively invariant set. Let $X$ be a $\lambda$-pseudotrajectory for $\Phi$. Suppose that

(a) $L(X) \subset K$.

(b) There is a neighborhood of $K$ (in $M$) which is attracted exponentially at rate $\alpha < 0$ by $K$.

(c) There is a $C^1$ flow $\Psi$ on $A$ such that $\Phi_t|A = \Psi_t$ for all $t \ge 0$.

(d) $\beta = \sup(\alpha, \lambda) < \min\{0, \mathcal{E}(\Psi, K)\}$.

Then there exist $r > 0$ and $x \in A$ such that

$$\limsup_{t \to \infty} \frac{1}{t} \log d(X(t), \Phi_{t+r}(x)) \le \beta.$$

Proof Let $Y(t) \in K$ be a point nearest to $X(t)$. By Lemma 8.7 (ii), $Y$ is a $\beta$-pseudotrajectory for $\Psi$, and the result follows from Theorem 8.9 combined with estimate (i) of Lemma 8.7. QED

Example 8.11 As an illustration of Corollary 8.10 consider the diffusion on $\mathbb{R}^m$

dX =
F(X)dt +

where $F$ and $\varepsilon$ are as in Proposition 7.4.

Let $\Gamma \subset \mathbb{R}^m$ be an attracting hyperbolic periodic orbit of period $T > 0$ (see Example 8.6) such that the multipliers distinct from unity have moduli $< e^{\alpha}$ for some $\alpha < 0$.
According to Proposition 7.4 the event $\Omega_\Gamma = \{L(X) \subset \Gamma\}$ has positive probability. If we furthermore assume that

then Corollary 8.10 applied with $A = K = \Gamma$ and Proposition 8.5 imply that for almost every $\omega \in \Omega_\Gamma$ there exists $x(\omega) \in \Gamma$ such that

$$\limsup_{t \to \infty} \frac{1}{t} \log(d(X(t), \Phi_t(x(\omega)))) \le \sup(\lambda, \alpha/T).$$

8.3 Properties of the Expansion Rate


This section presents several useful estimates of the expansion rate. The key result is an ergodic characterization of the expansion rate due to Schreiber (1997). We continue to assume that $\Phi$ is a $C^1$ flow on a Riemannian manifold $M$. Since we are concerned with the behavior of $\Phi$ restricted to a compact positively invariant set we furthermore assume without loss of generality that $M$ is compact.
In order to present Schreiber's result we need to introduce a few notions of ergodic theory. We let $P(M)$ denote the space of Borel probability measures on $M$ with the topology of weak convergence (the one a functional analyst would call weak*). A Borel probability measure $\mu \in P(M)$ is said to be $\Phi$-invariant or invariant under $\Phi$ if for every Borel set $A \subset M$ and every $t \in \mathbb{R}$

$$\mu(A) = \mu(\Phi_t(A)).$$

We let $\mathcal{M}(\Phi) \subset P(M)$ denote the set of invariant measures. It is a nonempty compact convex subset of $P(M)$ (Mañé, 1987, chapter 1). A measure $\mu \in \mathcal{M}(\Phi)$ is said to be $\Phi$-ergodic if every $\Phi$-invariant set has measure 0 or 1.

The minimal center of attraction of $\Phi$ is the set

$$MC(\Phi) = \overline{\bigcup_{\mu \in \mathcal{M}(\Phi)} \mathrm{supp}(\mu)}.$$

The Birkhoff center of $\Phi$ is the set

$$BC(\Phi) = \overline{\{x \in M : x \in \omega(x)\}}.$$

By the Poincaré recurrence theorem (Mañé, 1987, chapter 1)

$$MC(\Phi) \subset BC(\Phi). \qquad (30)$$

Let $\mu \in \mathcal{M}(\Phi)$. By the celebrated Oseledec's Theorem (see e.g. Mañé 1987, chapter 11) there exists a Borel set $R \subset M$ of full measure ($\mu(R) = 1$) such that for all $x \in R$, there exist numbers $\lambda_1(x) < \ldots < \lambda_{r(x)}(x)$ and a decomposition of $T_xM$ into

$$T_xM = E_1(x) \oplus \ldots \oplus E_{r(x)}(x)$$

such that for all $v \in E_i(x)$

$$\lim_{t \to \infty} \frac{1}{t} \log \|D\Phi_t(x)v\| = \lambda_i(x).$$

The maps $x \to r(x), \lambda_i(x), E_i(x)$ are measurable.
When $\mu$ is ergodic there is an integer $r(\mu)$ and real numbers $\lambda_i(\mu)$, $i = 1, \ldots, r(\mu)$ such that $r(x) = r(\mu)$ and $\lambda_i(x) = \lambda_i(\mu)$ for $\mu$ almost every $x$. The $\lambda_i$ are called the Lyapounov exponents of $\mu$.

Schreiber (1997) proves the following result:

Theorem 8.12 Let $K \subset M$ be a compact positively invariant set. Then

$$\mathcal{E}(\Phi, K) = \inf_{\mu} \lambda_1(\mu)$$

where the infimum is taken over all ergodic measures with support in $K$.

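Proof Let $f = \Phi_1$ and let $X = \{(x, v) : x \in K, v \in T_xM, \|v\| = 1\}$. By compactness of $X$ and definition of $\mathcal{E}(\Phi, K)$ there exists a sequence $(x_n, v_n) \in X$ such that

$$\lim_{n \to \infty} \frac{1}{n} \log \|Df^n(x_n)v_n\| = \lim_{n \to \infty} \frac{1}{n} \log EC(\Phi_n, K) = \mathcal{E}(\Phi, K).$$

Define a map $G : X \to X$ by

$$G(x, v) = \Big(f(x), \frac{Df(x)v}{\|Df(x)v\|}\Big).$$

Let $\{\theta_n\}$ be the sequence of probability measures defined on $X$ by

$$\theta_n = \frac{1}{n} \sum_{i=0}^{n-1} \delta_{G^i(x_n, v_n)}.$$

By compactness of $X$ we can always suppose (by replacing $\theta_n$ by some subsequence if necessary) that the sequence $\theta_n$ converges weakly toward some probability measure $\theta$. Continuity of $G$ easily implies that $\theta$ is $G$-invariant.
Let $h : X \to \mathbb{R}$ be the map defined by $h(x, v) = \log(\|Df(x)v\|)$. By the chain rule we get

$$\sum_{i=0}^{n-1} h(G^i(x, v)) = \log(\|Df^n(x)v\|).$$

Therefore

$$\int_X h\,d\theta = \lim_{n \to \infty} \int_X h\,d\theta_n = \lim_{n \to \infty} \frac{1}{n} \log(\|Df^n(x_n)v_n\|) = \mathcal{E}(\Phi, K).$$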

Now, by the ergodic decomposition theorem (see Mañé, 1987 chapter 6 Theorem 6.4)

$$\int_X h\,d\theta = \int_X \Big(\int_X h\,d\theta_{(x,v)}\Big) d\theta(x, v)$$

where $\theta_{(x,v)}$ are $G$-invariant ergodic probability measures. It follows that for each $\epsilon > 0$ there exists an ergodic $G$-invariant measure $\nu = \theta_{(x,v)}$ for some $(x, v)$, such that $\int_X h\,d\nu \le \mathcal{E}(\Phi, K) + \epsilon$. Birkhoff's ergodic theorem implies the existence of a Borel set $X' \subset X$ such that $\nu(X') = 1$ and

$$\lim_{n \to \infty} \frac{1}{n} \log(\|Df^n(x)v\|) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} h(G^i(x, v)) = \int_X h\,d\nu \le \mathcal{E}(\Phi, K) + \epsilon \qquad (31)$$

for all $(x, v) \in X'$. Let $\mu$ be the marginal probability measure defined on $K$ by

$$\mu(B) = \nu((x, v) \in X : x \in B).$$

Clearly $\mu$ is $f$-invariant and ergodic. On the other hand by Oseledec's Theorem for $\mu$ almost all $x \in K$ and all $v \in T_xM$ the limit $\lim_{n \to \infty} \frac{1}{n} \log(\|Df^n(x)v\|)$ exists and satisfies

$$\lambda_1(\mu) \le \lim_{n \to \infty} \frac{1}{n} \log(\|Df^n(x)v\|). \qquad (32)$$

Therefore from (31) and (32) we deduce

$$\inf_{\mu\ \text{ergodic}} \lambda_1(\mu) \le \mathcal{E}(\Phi, K).$$

To prove the converse inequality let $\mu$ be any ergodic measure. By Oseledec's Theorem for $\mu$ almost all $x \in K$ and all $v \in E_1(x)$:

$$\lambda_1(\mu) = \lim_{n \to \infty} \frac{1}{n} \log(\|Df^n(x)v\|) \ge \mathcal{E}(\Phi, K).$$

QED

Corollary 8.13

$$\mathcal{E}(\Phi, K) = \mathcal{E}(\Phi, MC(\Phi|K)) = \mathcal{E}(\Phi, BC(\Phi|K)).$$

Proof The first equality follows from Theorem 8.12 and the second from Poincaré recurrence theorem (equation 30). QED

Theorem 8.12 and Corollary 8.13 can be used to estimate the expansion rate. Here are two such estimates.

Corollary 8.14 Assume that $M = \mathbb{R}^m$, $\Phi$ is generated by a smooth vector field $F$ and $BC(\Phi|K) \subset Eq(\Phi)$ (the equilibria set). Then

$$\mathcal{E}(\Phi, K) = \inf\{\lambda_1(p) : p \in Eq(\Phi) \cap K\}$$

where $\lambda_1(p)$ denotes the smallest real part of the eigenvalues of the jacobian matrix $DF(p)$.
Proof Under the assumption that $BC(\Phi|K) \subset Eq(\Phi)$ every ergodic measure with support in $K$ has to be a Dirac measure at an equilibrium point. Let $\delta_p$ be such a measure. Then $\lambda_1(\delta_p) = \lambda_1(p)$ and the result follows from Theorem 8.12. QED

The following result is based on Hirsch (1994).


Corollary 8.15 Assume $M = \mathbb{R}^m$ and $\Phi$ is generated by a smooth vector field $F$. Let $\beta(x)$ equal the smallest eigenvalue of the matrix $\frac{1}{2}(DF(x) + DF(x)^T)$ where $T$ denotes matrix transpose. Then

$$\mathcal{E}(\Phi, K) \ge \inf_{x \in MC(\Phi|K)} \beta(x).$$

Proof Let y E By invariance of $,(2/) E for


all t E R. The variational equation along orbits of the reversed time flow ~_t
gives
=

Therefore for every nonzero vector v E and t ~ 0 we have,

with $\beta = \inf_{x \in MC(\Phi|K)} \beta(x)$. Therefore

for all y E and t > 0. To conclude set y = ~t(x) for x E


and the estimate follows from the definition of £(~, K) combined with Corollary
8.13. QED
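
The two estimates can be compared on a linear example. The sketch below assumes a planar linear vector field $F(x) = Mx$ with the origin as its only equilibrium, and reads $\beta$ as the smallest eigenvalue of the symmetric part $(M + M^T)/2$; the matrix entries and the helper names `eig2`, `smallest_real_part` and `beta` are hypothetical choices for the illustration.

```python
import cmath

def eig2(a, b, c, d):
    """Eigenvalues of the 2x2 matrix [[a, b], [c, d]] (possibly complex)."""
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr - root) / 2.0, (tr + root) / 2.0

def smallest_real_part(a, b, c, d):
    """Smallest real part among the eigenvalues (quantity of Corollary 8.14)."""
    return min(z.real for z in eig2(a, b, c, d))

def beta(a, b, c, d):
    """Smallest eigenvalue of the symmetric part (M + M^T) / 2 (Corollary 8.15)."""
    s = (b + c) / 2.0
    return smallest_real_part(a, s, s, d)

M = (-1.0, 3.0, 0.0, -2.0)   # hypothetical Jacobian DF(p) at an equilibrium p
print(smallest_real_part(*M), beta(*M))
```

For this non-normal matrix the symmetric-part bound is strictly smaller than the smallest real part of the spectrum, which illustrates that the estimate of Corollary 8.15 is a lower bound that is easy to compute but not sharp in general.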

Example 8.16 Let $(x_n, y_n) \in \mathbb{R}^2$ be the Robbins Monro algorithm described in Example 8.1. It is convenient here to express the dynamics of the vector field (26) in polar coordinates. That is

$$\frac{d\rho}{dt} = \rho h(\rho^2), \qquad \frac{d\theta}{dt} = (\rho \sin\theta)^2.$$
Let BE = ~ ~ 1- E} . For E « 1 and (p, 9) E BE
~~~ ~~~ _2(1 " P)P(1- P ) _ 2(1 " P) (2 E)(1- e)
i = -

Thus $|1 - \rho(t)| \le e^{-(2-\epsilon)(1-\epsilon)t}|1 - \rho|$. This shows that $S^1$ attracts $B_\epsilon$ at rate $-2 + O(\epsilon)$.
To compute $\mathcal{E}(\Phi|S^1, S^1)$ we use Corollary 8.14. The dynamics on $S^1$ being given by $\frac{d\theta}{dt} = \sin^2(\theta)$, 0 is the eigenvalue of the linearized ODE at equilibria points $0$ and $\pi$. Thus $\mathcal{E}(\Phi|S^1, S^1) = 0$. Suppose now that

$$l(\gamma) = \limsup_{n \to \infty} \frac{\log(\gamma_n)}{\tau_n} < 0.$$

Then Corollary 8.10 and Proposition 8.3 imply that $\{(x_n, y_n)\}$ converges almost surely toward one of the points $a$ or $c$ of Figure 2.
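
The two time scales of this example (fast attraction to the circle, slow drift along it) can be seen by integrating the polar ODE numerically. This is a sketch under an explicit assumption: the text only constrains $h$, and the code takes $h(u) = 1 - u$, which makes the unit circle attracting. The function name `flow`, the initial condition and the step size are likewise illustrative.

```python
import math

def flow(rho, theta, t_end, dt=0.01):
    """RK4 integration of d(rho)/dt = rho*h(rho^2), d(theta)/dt = (rho*sin(theta))^2.

    h(u) = 1 - u is an assumption made for this sketch, not the text's h.
    """
    def field(r, th):
        return r * (1.0 - r * r), (r * math.sin(th)) ** 2

    for _ in range(int(round(t_end / dt))):
        k1 = field(rho, theta)
        k2 = field(rho + 0.5 * dt * k1[0], theta + 0.5 * dt * k1[1])
        k3 = field(rho + 0.5 * dt * k2[0], theta + 0.5 * dt * k2[1])
        k4 = field(rho + dt * k3[0], theta + dt * k3[1])
        rho += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        theta += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return rho, theta

for t_end in (10.0, 100.0, 200.0):
    rho, theta = flow(0.5, 1.0, t_end)
    # distance to the circle, and remaining angle to the equilibrium theta = pi
    print(t_end, abs(1.0 - rho), math.pi - theta)
```

The distance to the circle collapses exponentially, while the angular distance to the equilibrium at $\theta = \pi$ roughly halves when the time doubles. This algebraic decay reflects the zero eigenvalue of the linearized dynamics on $S^1$.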

9 Nonconvergence to Unstable points, Periodic orbits and Normally Hyperbolic Sets
Let $\{x_n\}$ be given by (7) and $F$ a smooth vector field. Let $p \in \mathbb{R}^m$ be an equilibrium of $F$; that is $F(p) = 0$. As usual, if all eigenvalues of $DF(p)$ have nonzero real parts, $p$ is called hyperbolic. If all eigenvalues of $DF(p)$ have negative real parts, $p$ is linearly stable. If some eigenvalue has positive real part, $p$ is linearly unstable.
Suppose $p$ is a hyperbolic equilibrium of $F$ which is linearly unstable. Then the set of initial values whose forward trajectories converge to $p$ - the stable manifold $W_s(p)$ of $p$ - is the image of an injective $C^1$ immersion $\mathbb{R}^k \to \mathbb{R}^m$ where $0 \le k < m$. Consequently $W_s(p)$ has measure 0 in $\mathbb{R}^m$. This suggests that for the stochastic process (7), convergence of sample paths to $p$ is a null event, provided the noise has sufficiently large components in the unstable directions at $p$. Such a result has been proved recently by Pemantle (1990) and Brandiere and Duflo (1996) (under different sets of assumptions), provided the vector field $F$ is $C^2$ and the gain sequence is well behaved.
The ideas of Pemantle have been used in Benaim and Hirsch (1995b) to tackle the case of hyperbolic unstable periodic orbits. This section is an extension of these results which covers a larger class of repelling sets. Recent works by Brandiere (1996, 1997) address similar questions and prove the nonconvergence of stochastic approximation processes toward certain types of repelling sets which are not considered here.

Throughout this section we assume given

• A smooth vector field $F : \mathbb{R}^m \to \mathbb{R}^m$ generating a flow $\Phi$.

• A smooth $(m - d)$ dimensional (embedded) submanifold $S \subset \mathbb{R}^m$ where $d \in \{1, \ldots, m\}$.

• A nonempty compact set $\Gamma \subset S$ invariant under $\Phi$.

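We assume that $S$ is locally invariant, meaning that there exists a neighborhood $U$ of $\Gamma$ (in $\mathbb{R}^m$) and a positive time $t_0$ such that

$$\Phi_t(U \cap S) \subset S$$

for all $|t| \le t_0$.

We further assume that for every point $p \in \Gamma$,

$$\mathbb{R}^m = T_pS \oplus E^u_p$$

where

(i) $p \to E^u_p$ is a continuous map from $\Gamma$ into the Grassman manifold $G(d, m)$ of $d$ planes in $\mathbb{R}^m$.

(ii) $D\Phi_t(p)E^u_p = E^u_{\Phi_t(p)}$ for all $t \in \mathbb{R}$, $p \in \Gamma$.

(iii) There exist $\lambda > 0$ and $C > 0$ such that for all $p \in \Gamma$, $w \in E^u_p$ and $t \ge 0$

$$\|D\Phi_t(p)w\| \ge C e^{\lambda t}\|w\|.$$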
Examples
Linearly Unstable Equilibria: Suppose $\Gamma = \{p\}$ where $p \in \mathbb{R}^m$ is a linearly unstable equilibrium of $F$. Then $\mathbb{R}^m = E^s_p \oplus E^c_p \oplus E^u_p$ where $E^s_p$, $E^c_p$ and $E^u_p$ are the generalized eigenspaces of $DF(p)$ corresponding to eigenvalues with real parts $< 0$, equal to 0 and $> 0$. Because $p$ is linearly unstable, the dimension of $E^u_p$ is at least 1. Using stable manifold theory (see e.g. Shub (1987) or Robinson (1995)) there exists a locally invariant manifold $S$ tangent to $E^s_p \oplus E^c_p$ - the center stable manifold of $p$ - which is $C^k$ when $F$ is $C^k$.

Since $D\Phi_t(p) = e^{tDF(p)}$ there exist $\lambda > 0$ and $C > 0$ such that $\|D\Phi_t(p)w\| \ge C e^{\lambda t}\|w\|$ for all $w \in E^u_p$.
Linearly Unstable Periodic Orbits: Let $\Gamma \subset \mathbb{R}^m$ be a periodic orbit. $\Gamma$ is said to be hyperbolic if unity is a multiplier with multiplicity one and the $m - 1$ other multipliers have moduli different from 1. $\Gamma$ is said to be linearly unstable if some multiplier has modulus strictly greater than 1.
Suppose $\Gamma$ is a hyperbolic linearly unstable periodic orbit for the $C^k$ vector field $F$. By hyperbolicity (see for example Shub 1987) there exist positive constants $C$, $\lambda$ and a decomposition of $T_\Gamma\mathbb{R}^m$ as the direct sum of three vector bundles:

$$T_\Gamma\mathbb{R}^m = E^s(\Gamma) \oplus E^c(\Gamma) \oplus E^u(\Gamma)$$

which is invariant under $T\Phi$, and such that for all $p \in \Gamma$, $t \ge 0$:

(I ~ (33)
il D~_t (p) I Ep >

$$E^c_p = \mathrm{span}(F(p))$$

where $p \times E^u_p$ denotes the fibre of $E^u(\Gamma)$ over $p$, and similar notation applies to $E^s_p$ and $E^c_p$.
Because $\Gamma$ is linearly unstable, the dimension of $E^u_p$ is at least 1.
Each $E^u_p$ is a linear subspace of $\mathbb{R}^m$. The map $p \mapsto E^u_p$ is a continuous map from $\Gamma$ into the Grassmann manifold of linear subspaces of the appropriate dimension (it is actually $C^k$ due to the fact that $T\Phi_t$ maps $p \times E^u_p$ to $\Phi_t(p) \times E^u_{\Phi_t(p)}$, and that $T\Phi$ is a $C^k$ flow).
For $p \in \Gamma$ and sufficiently small $\epsilon > 0$, the local stable manifold of $p$ is defined to be the set:
_
’(~ f, and ~ _ ~}.
Using stable manifold theory, we take $\epsilon$ small enough so that $W_\epsilon(p)$ is a $C^k$ (embedded) submanifold whose tangent space at $p$ is $T_pW_\epsilon(p) = E^s_p$.

We set $S = \bigcup_{p \in \Gamma} W_\epsilon(p)$. This is the local stable manifold of $\Gamma$. It follows from the above that $S$ is a $C^k$ locally invariant submanifold, with $T_pS = E^s_p \oplus E^c_p$.

Let $0 < \alpha \le 1$. We call a map $C^{1+\alpha}$ if it is $C^1$ and its derivative is $\alpha$-Hölder. Note that $C^{1+1}$ is weaker than $C^2$. A $C^{1+\alpha}$ manifold is a manifold whose transition functions may be chosen $C^{1+\alpha}$.
Theorem 9.1 Let $\{x_n\}$ given by (7) be a Robbins Monro algorithm (section 4.2). Assume:
(i) There exists $K > 0$ such that $\|U_n\| \le K$ for all $n \ge 0$.
(ii) as in Proposition ~,~.1~~.

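(iii) There exists a neighborhood $N(\Gamma)$ of $\Gamma$ and $b > 0$ such that for all unit vector $v \in \mathbb{R}^m$

$$E(\langle U_{n+1}, v \rangle^+ \mid \mathcal{F}_n) \ge b\,\mathbf{1}_{\{x_n \in N(\Gamma)\}}.$$

(iv) There exists $0 < \alpha \le 1$ such that:

(a) $F$ and $S$ are $C^{1+\alpha}$,

(b)
$$\lim_{n \to \infty} \frac{\gamma_{n+1}^{\alpha}}{\sqrt{\lim_{m \to \infty} \sum_{i=n+1}^{m} \gamma_i^2}} = 0.$$

Then

$$P\Big(\lim_{n \to \infty} d(x_n, \Gamma) = 0\Big) = 0.$$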

Remark 9.2 If ~ yn ~ with Q > 1/2, 0 A -


B then condition (iv), (b)
of Theorem 9.1 is fulfilled provided that

03B1 > 203B2-1 2.


If = oo condition (iv) of Theorem 9.1 is always satisfied for a > 0.

9.1 Proof of Theorem 9.1


The proof of this result relies, on one hand, upon the construction of a suitable Lyapounov function, and on the other hand on probabilistic estimates due to Pemantle (1990).

Construction of a suitable Lyapounov function


The construction given here is very similar to the construction given in Benaim and Hirsch (1995b), but instead of defining the Lyapounov function as the distance to $S$ in the unstable direction for an adapted Riemann metric obtained by time averaging (as in Benaim and Hirsch (1995b)), we consider the usual distance and we then define the Lyapounov function by averaging over time. It appears that this construction leads to much easier estimates and allows us to handle the fact that the splitting $T_\Gamma\mathbb{R}^m = T_\Gamma S \oplus E^u$ is only continuous.

Step 1 The first step of the construction is to replace the continuous invariant splitting $T_\Gamma\mathbb{R}^m = T_\Gamma S \oplus E^u$ by a smooth (noninvariant) splitting $T_\Gamma S \oplus \tilde{E}^u$ close enough to the first one to control the expansion of $D\Phi_t$ along fibers of $\tilde{E}^u$.
Choose $T > 0$ large enough so that for $f = \Phi_T$, $p \in \Gamma$, and $w \in E^u_p$:

$$\|Df(p)w\| \ge 5\|w\|.$$
By Whitney embedding theorem (Hirsch, 1976, chapter 1) we can embed
G(d, m) into RD for some D ~ N large enough so that we can see p ~ E; as
a map from F - Thus by Tietze extension theorem (Munkres, 1975) we
can extend this map to a continuous map from IR m into IR D. Let fl denote a C°°
retraction from a neighborhood of G(d, m) C IRD onto G(d, m) whose existence
follows from a classical result in differential topology (Hirsch, 1976, chapter 4).
By composing the extension of p ~ E; with g we obtain a continuous map
defined on a neighborhood N of F, taking values in G(d, m) and which extends
p -~ E;.To shorten notation, we keep the notation Ep E G(d, m) to
denote this new map.
By replacing N by a smaller neighborhood if necessary, we can further assume

that N is compact, N C U, f (N) C U and

~~ I >_ 4 I I w II
for all
Now, by standard approximation procedure, we can approximate
a

E; E G(d, m) by a C°° function from N into Then, by composing with fl,


we obtain a C°° map
in the C°
EP
E G(d, m) which can be chosen arbitrary
close to Ep topology.
For p E N, let
Pp : TpS ~ E~ ~ Tp S,

and let
p : TpS ~ Ep ~ Tp S,
u+v--~u.
Fix E > 0 small enough so for all pEN, f(p) ~+4) 1 and f(p)I ( I I Pp I I+
~.+ 1 ) ) 1 (this choice will be clarified in the next lemma). From now on,
we will assume that the map Eup
E G(d, m) is chosen such that for all
p~N~S:
.

(1) =
TpS ® Eup
( ii ) The projector Pp TpS ® Eup ~ TpS, satisfies
:

~Pp-Pp~ ~ ~.

Let Eu{(p, v) E S n N x
= v E
Ep }.
Since S is C1+a, Eu is a vector
bundle over S n N. Let H -~ 1~m be the map defined by
H(p, v) = p + v.
It is easy to see that the tangent map of H at a point
(p, 0) is invertible. The
inverse function theorem implies that H is a local diffeomorphism at each
point of the zero section of E. Since H maps the zero section to S n N by the
diffeomorphism (p, o) - p, it follows that H restricts to a diffeomorphism
H : N~ --~ No between open neighborhoods No or the zero section and No ~ N
of S n N. We now define the maps

11: s,

for H-1 (x) =


(p, v) and

x ~

Observe that for any p E No n S

DII(p) =
Pp.
Step 2 The second step consists in the construction of a Lyapounov function which is zero on $S$ and increases exponentially along trajectories outside $S$. This function (see Proposition 9.5) is obtained from $V$ by some averaging procedure.

Lemma 9.3 There exists a neighborhood of r, Nl ~ No, and p > 1 such that
for all x ~ N1
>

To prove this Lemma we use the following estimates


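Lemma 9.4 Let $P, \tilde{P} : \mathbb{R}^m \to \mathbb{R}^m$ be two projectors and $A : \mathbb{R}^m \to \mathbb{R}^m$ a linear map. Assume there exist $\epsilon, \alpha > 0$ such that

(i) $\|P - \tilde{P}\| \le \epsilon$,

(ii) $\|Au\| \ge \alpha\|u\|$ for all $u \in \mathrm{Ker}\,P$.

Then

(i) $\|Av\| \ge (\alpha(1 - \epsilon) - \epsilon\|A\|)\|v\|$ for all $v \in \mathrm{Ker}\,\tilde{P}$.

(ii) $\|PA(Id - P) - \tilde{P}A(Id - \tilde{P})\| \le \epsilon\|A\|(1 + \|P\| + \|\tilde{P}\|)$.

Proof Let $v \in \mathrm{Ker}\,\tilde{P}$ be a unit vector. Write $v = (v - Pv) + (P - \tilde{P})v$. Thus

$$\|Av\| \ge \alpha\|v - Pv\| - \epsilon\|A\| \ge \alpha(1 - \epsilon) - \epsilon\|A\|.$$

This proves (i) while (ii) follows from

$$PA(Id - P) - \tilde{P}A(Id - \tilde{P}) = (P - \tilde{P})A(Id - P) + \tilde{P}A(\tilde{P} - P),$$

whose norm is at most $\epsilon\|A\|(\|Id - P\| + \|\tilde{P}\|)$.

QED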
Proof of Lemma 9.3
let x E No n f-1(No) and set p =
II(x).
? =

- .

Lemma 9.4 (i) applied to D f (p), Pp, Pp and our choice for E imply
-P)~~ ~ =
3V(x)
Also, by Lemma 9.4 (ii)
IIPI(n)Df(p)(Id-Pa)-Pf(p)D.f(p)(Id-Pr)II1 _ 1.

Since for p E r, Pp) = 0 the preceding inequality implies the


-

existence of a neighborhood of r, Nl C No such that for x E Nl and p = II(x) :


-

Pp)~~
1. This implies

~D03A0(f(p))Df(p)(x p)~ ~x - -

p~ =
V(x).
It follows that V(f(x)) > 3V(x) -

V(x) ~-o(V(x)). Replacing Nl by a smaller


neighborhood gives
V(f(x)) > pV(x)
for some p > 1. QED
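Recall that a map $\eta : \mathbb{R}^m \to \mathbb{R}$ is said to have a right derivative at point $x$ if for all $h \in \mathbb{R}^m$ the limit

$$D\eta(x).h = \lim_{t \to 0^+} \frac{\eta(x + th) - \eta(x)}{t}$$

exists. If $\eta$ is differentiable at $x$, then $D\eta(x).h = \langle \nabla\eta(x), h \rangle$ where $\nabla\eta(x) \in \mathbb{R}^m$ is the usual gradient.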

Proposition 9.5 There exists a compact neighborhood of r, N(r) C NI and


real numbers I > 0, Q > 0 such that the map ~: N(r) ~ R given by

~(x) =

l0 V( 3A6-t(x) dt
enjoys the following properties:
(i) r~ is Cr on N(f) B S.
(ii) For all x E N(r) n S, r~ admits a right derivative I~m -~ which
is Lipschitz, convex and positively homogeneous.

(iii) If r > 1 + a for some 0 a 1 there exists k > 0 and a neighborhood


U C I~m of 0 such that for all x EN(r) and v E U
+ v) > +

(iv) There exists ci > 0 such that for all x E N(r) B S


~~

and for alI x E N(r) n S and v E ~gm

clllv - .

(v) For all x E N(r) n S, u E TxS and v E Rm

+ v) =

(vi) For all x E N(r)


> ~

Proof Notation: Given A > 0 we let NA C Nl denote a compact neighborhood


of r such that for all ~t~I A, C Ni and we let C(A) > 0 denote the
Lipschitz constant of the map t, x -~ restricted to [-A, A] x NA.
Remark that for ~t ~ A and x E NA we have:

= -

I_ ~
(34)
We first fix 1 > 2T and assume that N(r) C Nt. We will see below (in proving
(vi)) how to choose l.
(i) is obvious.
(ii) follows from the fact that II is Cl, and .c -~ admits a right derivative
at the origin of Rm given as
Before passing to the proof of (iii) let us compute For x E Nt let

Gt(x) =
~_t(x) -

B(t, z) - -
(Id -

and for x let


x-03A0(x) ~x-03A0(x)~.
b(x) =
It is easy to verify that

J 0(B(t, x )h, (35)

for x ~ Nl B S and
=

J0 (36)
for x ~ Nl ~ S.
(iii) If r ~ 1 + a, ~ and II are C~ with a Holder derivatives. Hence there
exits k > 0 such that

+ IIGt(x) + B(t, ~~Gc(~)~~ -


If x E Nl n S then = 0 and the result follows from (36).
If x S, convexity of the norm implies

and the result follows from (35).


(iv). Claim : There exists co > 0 such that ) > co for all 0 t
I, p n S and unit vector b E EP .

Proof of the claim: Suppose the contrary. Then by compactness of {(t, p, v) :


0 t I, p E Nl n S, b E =
1} there exists 0 t 1, p E Ni n
Ep,
S and a unit vector b E such that B(t,p)b = 0. Therefore D~_t(p)b E
Ep
Ker(Id - DII(~_t(p)) = Thus b E =
TpS. But
this is impossible because 6 is a unit vector in and =
TpS ~ Eup. This E;
proves the claim.

~_t(x) - =

and for any h E with =


1,

(B(t, ~_t(z)-1I(~_~(x))) _
(B(t, °

Thus, if we set h =
b(x) we get

where = = 0. Since, according to the claim, ,

>

Co this implies
(B(t, > co/2

for small enough. Formulae (35) implies that ci with


ci =
lco /2. Hence ci.
Suppose now x E JV(r) n S. It follows from (iii) that B(t, x)v B(t, x)(v - =

for all v E Now the claim together with (36) imply that _

Dn(x)vi I.
(vi) For x E N~ and 0 t 1 we can write t = kT + r for kEN and
0 r T. Thus, by Lemma 9.3 and equation (34),

v(~t(x)) _
>_ PkV (~r(~)) >_ >

where C 1( T=
For s > 0,
PC1(T) and a =

~(03A6s(x))-~(x)=-ll-s V(03A6-t(x) dt+0-sV(03A6-t(x) dt ~ -V(x)e-aleas C1(T)3


+ s0V(03A6-t(x) dt.

It follows that

lim ~(03A6s(x))-~(x) s ~ V(x)(1-e-al C1(T))


It then suffices to choose I
of I we get that
large enough so that 1-
- > 0. With this choice

D~(x).F(x)~03B2~(x)

with ,Q > 0. QED


=1C I (1 C T )
Probabilistic Estimates
The following lemma adapted from Pemantle (1992, Lemma 5.5) is the probabilistic key of the proof of Theorem 9.1.

Lemma 9.6 Let {Sn } be a nonnegative stochastic process, Sn = So +


where Xn is .~n measumble. Let be a sequence of positive numbers such
00

that ~ ~"
n
oo and let an =
L
s=n.f.1
~. .

Assume there exist a sequence 0 fn = o( a" , constants al > 0, a2 > 0


and an integer No such that for all n > lVo:

(i~ I=

(it )
(iii)
(iv)
Then ~ =
0) = 0.

This lemma is stated and proved in (Pemantle, 1992) for $\gamma_n = 1/n$ and $\epsilon_n = 0$, but the proof adapts without difficulty to the present situation.
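As a quick numerical aid for the scale appearing in Lemma 9.6, the sketch below evaluates the tail quantity for the step-size $\gamma_n = 1/n$. It assumes that $\alpha_n$ is the square root of the tail sum $\sum_{i>n} \gamma_i^2$; the function names and the truncation level `horizon` are illustrative choices, not part of the text. For this step-size the tail sum behaves like $1/n$, so $\alpha_n \sqrt{n}$ should be close to 1.

```python
import math


def tail_alpha(gamma, n, horizon=200_000):
    """Approximate alpha_n = sqrt(sum_{i > n} gamma(i)^2).

    The infinite tail is truncated at `horizon` (an assumption made only
    so that the sum is computable).
    """
    return math.sqrt(sum(gamma(i) ** 2 for i in range(n + 1, horizon + 1)))


def gamma(i):
    """Step-size gamma_i = 1/i, the case treated by Pemantle (1992)."""
    return 1.0 / i


# alpha_n * sqrt(n) should approach 1 as n grows.
ratios = {n: tail_alpha(gamma, n) * math.sqrt(n) for n in (10, 100, 1000)}
for n, ratio in ratios.items():
    print(n, round(ratio, 4))
```

Under this reading, conditions (i) to (iv) compare the increments $X_n$ with a quantity of order $n^{-1/2}$ when $\gamma_n = 1/n$.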
Proof Assume without loss of generality that No = 0, | Xn |~ b103B1n and
~n ~
where
2(6i+62)ai.
Given n ~ N let T be the stopping time defined as

T =
5’,~
Claim:
~

ai
(37)
k

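Proof of (37): By assumption (iii), the process $Z_k = S_k^2 - a_1 \sum_{i=0}^{k} \gamma_i^2$ is a submartingale. Therefore $\{Z_{k \wedge T}\}$ is a submartingale and for all $m \ge n$, $E(Z_{m \wedge T} \mid \mathcal{F}_n) \ge Z_n$. Hence

$$E(S_{m \wedge T}^2 - S_n^2 \mid \mathcal{F}_n) \ \ge\ a_1\, E\Big(\sum_{i=n+1}^{m \wedge T} \gamma_i^2 \,\Big|\, \mathcal{F}_n\Big) \ \ge\ a_1 \Big(\sum_{i=n+1}^{m} \gamma_i^2\Big)\, P(T > m \mid \mathcal{F}_n).$$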

On the other hand

by definition ofT and condition (z). It follows that

> m|Fn) ~ 2i.


2(b1 + b2)03B1n a103A mi=n+103B

Letting m 2014~ oo proves the claim.


Now, let u be the stopping time defined as

cr =
5’, .

Claim : Let En be the event En =


{5’n ~ B/B2~}. Then
(38)
Proof of ~~.’ The process is a submartingale. Indeed

where the last term is nonnegative by condition (ii). Therefore by Doob's decomposition Lemma there exist a martingale $\{M_i\}_{i \ge n}$ and a previsible process $\{I_i\}_{i \ge n}$ such that $S_{i \wedge \sigma} = M_i + I_i$, $I_n = 0$ and $I_{i+1} \ge I_i$. The fact that $S_{i \wedge \sigma} \ge M_i$
implies
P(r =
oo~) ~ P(V. ~ : M, ~ ~~/62~!~).
Thus

P(~ =
oo)~)l~ ~ M, - -~~/~’~~)lE. (39)
Our next goal is to estimate the right hand term of (39). Set Af~ =
At, 2014

M~.
For ~ n ::
t-i

= j=~~ E((M,+i - (40)

where we have used the fact that

E((M,+i - =
F((~ - (~ - /,)~ ~ 02~+1
by condition (t~). Therefore for s > 0, ~ ~ n and > 0

P( inf M,’ -~!~) ~ P( inf (M,’-) sup )M;-) ~


-~-).~) ~ ~( n«m

~ E(M~)~)+~ -
(S+~ (s+~22
where the last two inequalities follow from Doob’s inequality combined with (40).
With s =
-v~2~n and =
20142014"- we get that

M; -s|Fn) ~ 4a2 4a2 + b2 °

Thus
P(~i ~ n : Mi - Mn ~ -1 2b203B1n|Fn) ~ 1 - 4a2 4a2 + b2.

This proves (38).


We can now finish the proof of the Lemma. Let $G$ denote the event that $\{S_n\}$ does not converge to zero. By definition of $T$ and inequality (38):

E(1G|Fi)1T=i = E(1G|Fi)1Ei1T=i ~ b2 4a2 + b21Ei1T=i = b2 4a2 + b21T=i


for all ~ n. Therefore

t>n t>n

~ b2 4a2 + b2P(T ~|Fn) ~ b2 4a2 + b2(1- 2(b1 + b2) a1) > 0

where the last inequality follows from (37). Since $\lim_{n \to \infty} E(1_G \mid \mathcal{F}_n) = 1_G$ almost surely, this proves that $1_G = 1$ almost surely. QED

If $\sum_n \gamma_n^2 = \infty$ we use the next lemma:

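Lemma 9.7 Let $\{S_n\}$ be a stochastic process, $S_n = S_0 + \sum_{i=1}^{n} X_i$, where $X_n$ is $\mathcal{F}_n$ measurable and $|X_n| \le C$. Let $\{\gamma_n\}$ be such that $\sum_i \gamma_i^2 = \infty$. Assume there exist $a_1 > 0$ and some integer $N_0$ such that for all $n \ge N_0$

$$E(S_{n+1}^2 - S_n^2 \mid \mathcal{F}_n) \ge a_1 \gamma_{n+1}^2.$$

Then $P(\lim_{n \to \infty} S_n = 0) = 0$.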
Proof. As already noticed, $Z_n = S_n^2 - \sum_{i=0}^{n} a_1 \gamma_i^2$ is a submartingale.
Suppose $P(\lim_{n \to \infty} S_n = 0) > 0$. Then for all $\epsilon > 0$ there exists $N > N_0$ such that $P(\{\forall n \ge N : |S_n| \le \epsilon\}) > 0$.


Assume ! E and define the stopping time T = inf { k > N;| Sk|> ~}.

The sequence $\{Z_{n \wedge T}\}$ is a submartingale and we have $Z_{n \wedge T} \le (\epsilon + C)^2$. It follows from the submartingale convergence theorem that $\{Z_{n \wedge T}\}$ converges almost surely. Thus $\sum_{i=0}^{n \wedge T} \gamma_i^2$ is almost surely bounded. This implies $T < \infty$ almost surely. QED

We now prove Theorem 9.1.


Let $N \in \mathbb{N}$. Assume $x_N \in N(r)$ where $N(r)$ is the neighborhood given by Proposition 9.5. Let $T$ be the stopping time defined by

$$T = \inf\{k \ge N : x_k \notin N(r)\}.$$

We prove Theorem 9.1 by showing that $P(T < \infty) = 1$.
Without loss of generality we assume $N = 0$. (The proof is the same for any $N$.)
Define two sequences of random variables and {Sn } as follows:

Xn+l = +
n

So -
Sn =
So +
~=1

The process $\{S_n\}$ is clearly nonnegative. Notice that if $T = \infty$ then $X_{n+1} = \eta(x_{n+1}) - \eta(x_n)$ and $S_n$ telescopes into $S_n = \eta(x_n)$. This will be used at the end of the proof.
We now suppose that $\sum_n \gamma_n^2 < \infty$ and verify that hypotheses (i) to (iv) of Lemma 9.6 are satisfied.

Conditions (i) and (iv). By Lipschitz continuity and the boundedness


of the sequences {F(xn)}, {Un} we have = = .

Condition (ii). Let k’ = + K) where k is given by Proposition 9.5,


(iii), =
suP{F(x); x ~ N(r)} and I~ is the uniform bound of the Un. If
n T, using Proposition 9.5, (ii), (iii), (v) and (vi) we have
.
(41)
Thus

kr~’n+1 ) .

By convexity of the right derivative of ~ (Proposition 9.5, (it)) and the condi-
tional Jensen inequality we have

= 0.

Thus
(42)
If n > T,Xn+i = so

0 (43) .

Putting (42) and (43) together and letting En =


proves condition ( ii ) of
Lemma 9.6.
For Condition (iii) of Lemma 9.6, we observe that

E(~+i - =

If $S_n > \epsilon_n$, the right hand term is nonnegative by condition (ii), previously proved. If $S_n \le \epsilon_n$, (42) and (43) imply

-
Thus

E(’Sn+1 -
Therefore, to prove condition (iii) of Lemma 9.6, it suffices to show that

>

for some 61 > 0 and n large enough. From (41) we deduce

o (44)
Using Proposition 9.5, (iv) and assumption (iii) of Theorem 9.1 we see that

1{n~T} {xnS}(E D~(xn)U +1) |Fn)-c1b) > 0. (45)


If xn E S, choose a unit vector vn E Ker(Id - We have

Un+1, vn >_ Un+1 - D03A0(xn)Un+1, vn > . .



Let A denotes the event A = {n T} n {xn E 5’}. By using Proposition 9.5


(iv), the Cauchy-Schwarz inequality and assumption (iii) of Theorem 9.1 we
obtain

c1b1A. (46)
Putting (44), (45), (46) together and (43) give

On the other hand by the Jensen inequality. It


follows that for bl > 0 and n large enough, as is desired.
Conditions (i) through (iv) of Lemma 9.6 being satisfied, the probability is zero that $\{S_n\}$ converges to zero, according to Lemma 9.6. If $\sum_n \gamma_n^2 = \infty$, the proof given here also shows that the conditions of Lemma 9.7 are satisfied.
Now suppose $T = \infty$. Then $\eta(x_n) = S_n$ and $\{x_n\}$ remains in $N(r)$. Therefore (by Theorem 5.7) $L(\{x_n\})$ (the limit set of $\{x_n\}$) is a nonempty compact invariant subset, so that for all $y \in L(\{x_n\})$ and $t \in \mathbb{R}$, $\Phi_t(y) \in N(r)$. By condition (vi) of Proposition 9.5 this implies that $\eta(\Phi_t(y)) \ge e^{\beta t}\eta(y)$ for all $t > 0$, forcing $\eta(y)$ to be zero. Thus $L(\{x_n\}) \subset S$. This implies $S_n = \eta(x_n) \to 0$.

Since $P(S_n \to 0) = 0$, $T$ is almost surely finite. QED
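The nonconvergence statement of Theorem 9.1 can be illustrated on a one-dimensional toy model. The sketch below is only an illustration under stated assumptions: the vector field $F(x) = x - x^3$ (unstable equilibrium at 0, stable equilibria at $\pm 1$), the noise $U_n = \pm 1$ with equal probability, the step-size $\gamma_n = 1/n$ and all function names are choices made here, not taken from the text. Starting exactly at the unstable equilibrium, most sample paths are expected to end near $\pm 1$.

```python
import random


def run_path(seed, steps=20_000):
    """Simulate x_{n+1} = x_n + gamma_{n+1} (F(x_n) + U_{n+1}).

    Assumptions for this sketch: F(x) = x - x**3, gamma_n = 1/n,
    and U_n uniform on {-1, +1}.
    """
    rng = random.Random(seed)
    x = 0.0  # start exactly at the unstable equilibrium
    for n in range(1, steps + 1):
        noise = rng.choice((-1.0, 1.0))
        x += (x - x ** 3 + noise) / n
    return x


finals = [run_path(seed) for seed in range(20)]
# Count the paths that end close to one of the stable equilibria +1 or -1.
near_stable = sum(abs(abs(x) - 1.0) < 0.2 for x in finals)
print(near_stable, "of", len(finals), "paths end near +1 or -1")
```

The noise here has a nonvanishing component in the unstable direction, which is the kind of excitation that the hypotheses of the theorem require; a finite simulation can of course only suggest the almost sure statement.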

10 Weak Asymptotic Pseudotrajectories


In the previous sections we have been mainly concerned with the asymptotic behavior of stochastic approximation processes with "fast" decreasing step-sizes, typically

$$\gamma_n = o\!\left(\frac{1}{\log(n)}\right)$$

(Proposition 4.4) or

$$\gamma_n = O(n^{-\alpha}), \quad \alpha \le 1$$

(Proposition 4.2).
If the step-sizes go to zero at a slower rate we cannot expect to characterize precisely the limit sets of the process.4 However it is always possible to describe the "ergodic" or statistical behavior of the process in terms of the corresponding behavior for the associated deterministic system. This is the goal of this section, which is mainly based on Benaïm and Schreiber (1997). It is worth mentioning that Fort and Pages (1997), in a recent paper, largely generalize the results of this section and address several interesting questions which are not considered here.

4 For instance, with a step-size of the order of $1/\log(n)$ it is easy to construct examples for which the process never converges even though the chain recurrent set of the ODE consists of isolated equilibria.
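Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_t : t \ge 0\}$ a nondecreasing family of sub-$\sigma$-algebras. Let $(M, d)$ be a separable metric space equipped with its Borel $\sigma$-algebra.
A process

$$X : \mathbb{R}_+ \times \Omega \to M, \quad (t, \omega) \mapsto X(t, \omega)$$

is said to be a weak asymptotic pseudotrajectory of the semiflow $\Phi$ if

(i) It is progressively measurable: $X|_{[0,T] \times \Omega}$ is $\mathcal{B}_{[0,T]} \times \mathcal{F}_T$ measurable for all $T > 0$, where $\mathcal{B}_{[0,T]}$ denotes the Borel $\sigma$-field over $[0, T]$.

(ii)
$$\lim_{t \to \infty} P\Big\{\sup_{0 \le h \le T} d\big(X(t+h), \Phi_h(X(t))\big) \ge \alpha \,\Big|\, \mathcal{F}_t\Big\} = 0$$

almost surely for each $\alpha > 0$ and $T > 0$.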

Recall (see section 8.3) that $\mathcal{P}(M)$ denotes the space of Borel probability measures on $M$ with the topology of weak convergence and $\mathcal{M}(\Phi)$ $(\subset \mathcal{P}(M))$ denotes the set of $\Phi$-invariant measures.
Let $\mu_t(\omega)$ denote the (random) occupation measure of the process:

$$\mu_t(\omega) = \frac{1}{t}\int_0^t \delta_{X(s,\omega)}\,ds$$

and let $\mathcal{M}(X, \omega)$ denote the set of weak limit points of $\{\mu_t(\omega)\}$. The set $\mathcal{M}(X, \omega)$ is a (possibly empty) subset of $\mathcal{P}(M)$. However if $\{\mu_t(\omega)\}$ is tight (for example if $\{X(t, \omega) : t \ge 0\}$ is precompact) then by the Prohorov theorem $\mathcal{M}(X, \omega)$ is a nonempty compact subset of $\mathcal{P}(M)$.
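For a piecewise constant path, the occupation measure of a set reduces to a weighted fraction of holding times. The following minimal sketch makes this concrete; the function name `occupation_fraction` and its argument layout are hypothetical, and the path is assumed to take the value `states[k]` during a time interval of length `holding_times[k]`.

```python
def occupation_fraction(states, holding_times, in_set):
    """Fraction of elapsed time that a piecewise constant path spends in a set.

    states[k] is the value held during an interval of length
    holding_times[k]; in_set is the indicator of the set A.
    This computes mu_t(A) with t = sum(holding_times).
    """
    total_time = sum(holding_times)
    time_in_set = sum(h for x, h in zip(states, holding_times) if in_set(x))
    return time_in_set / total_time


# The path sits at 0 for 1 time unit, at 1 for 2 units, then at 0 for 1 unit.
fraction = occupation_fraction([0.0, 1.0, 0.0], [1.0, 2.0, 1.0], lambda x: x > 0.5)
print(fraction)
```

Weak limit points of these empirical measures as the elapsed time grows are the objects the section studies.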

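Theorem 10.1 Let $X$ be a weak asymptotic pseudotrajectory of $\Phi$. There exists a set $\tilde{\Omega} \subset \Omega$ of full measure ($P(\tilde{\Omega}) = 1$) such that for all $\omega \in \tilde{\Omega}$

$$\mathcal{M}(X, \omega) \subset \mathcal{M}(\Phi).$$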
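Proof
Let $f : M \to [0,1]$ be a uniformly continuous function and $T > 0$. For $n \ge 1$ set

$$U_n(f,T) = \int_{(n-1)T}^{nT} f(X(s))\,ds,$$

$$M_n(f,T) = \sum_{i=1}^{n} \frac{1}{i}\big[U_i(f,T) - E(U_i(f,T) \mid \mathcal{F}_{(i-1)T})\big],$$

and

$$N_n(f,T) = \sum_{i=2}^{n+1} \frac{1}{i}\big[E(U_i(f,T) \mid \mathcal{F}_{(i-1)T}) - E(U_i(f,T) \mid \mathcal{F}_{(i-2)T})\big].$$

The processes $\{M_n(f,T)\}_{n \ge 1}$ and $\{N_n(f,T)\}_{n \ge 1}$ are martingales with respect to the filtration $\{\mathcal{F}_{nT} : n \ge 1\}$. Since

$$\sup_n E(M_n(f,T)^2) \le 4T^2 \sum_i \frac{1}{i^2},$$

Doob's convergence theorem implies that $\{M_n(f,T)\}_{n \ge 1}$ converges almost surely. Hence, by the Kronecker lemma,

$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} \big[U_i(f,T) - E(U_i(f,T) \mid \mathcal{F}_{(i-1)T})\big] = 0 \quad (47)$$

almost surely. Similar reasoning with $\{N_n(f,T)\}$ leads to

$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=2}^{n+1} \big[E(U_i(f,T) \mid \mathcal{F}_{(i-1)T}) - E(U_i(f,T) \mid \mathcal{F}_{(i-2)T})\big] = 0 \quad (48)$$

almost surely. Since $|U_i(f,T)| \le T$, (47) implies

$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=2}^{n+1} \big[U_i(f,T) - E(U_i(f,T) \mid \mathcal{F}_{(i-1)T})\big] = 0 \quad (49)$$

and by adding (48) and (49) we obtain

$$\lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} \big[U_{i+1}(f,T) - E(U_{i+1}(f,T) \mid \mathcal{F}_{(i-1)T})\big] = 0 \quad (50)$$

almost surely.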
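The passage from a convergent weighted martingale to a vanishing Cesàro average is the Kronecker lemma: if $\sum_i a_i / i$ converges then $\frac{1}{n}\sum_{i \le n} a_i \to 0$. The sketch below checks this numerically on a stand-in sequence; the i.i.d. signs $a_i = \pm 1$ replace the centered increments of the proof, and the function name is hypothetical.

```python
import random


def kronecker_demo(n=100_000, seed=0):
    """Return (sum_{i<=n} a_i / i, (1/n) sum_{i<=n} a_i) for random signs a_i.

    The signs are centered and bounded, so the weighted sums form an
    L2-bounded martingale, as in the argument above.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    plain_sum = 0.0
    for i in range(1, n + 1):
        a = rng.choice((-1.0, 1.0))
        weighted_sum += a / i
        plain_sum += a
    return weighted_sum, plain_sum / n


weighted, average = kronecker_demo()
print("weighted series:", weighted)
print("Cesaro average:", average)
```

The weighted series settles at a finite random value while the plain average is of order $n^{-1/2}$, which is the behavior used to obtain the averaged limits in the proof.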
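We claim that

$$\lim_{i \to \infty} E\big(U_{i+1}(f,T) - U_i(f \circ \Phi_T, T) \mid \mathcal{F}_{(i-1)T}\big) = 0 \quad (51)$$

almost surely. Let $\epsilon > 0$. By uniform continuity of $f$ there exists $\alpha > 0$ such that $d(x, y) \le \alpha$ implies $|f(x) - f(y)| \le \epsilon$. Hence

$$\big|E\big(U_{i+1}(f,T) - U_i(f \circ \Phi_T, T) \mid \mathcal{F}_{(i-1)T}\big)\big| = \Big|E\Big(\int_{(i-1)T}^{iT} \big[f(X(s+T)) - f(\Phi_T(X(s)))\big]\,ds \,\Big|\, \mathcal{F}_{(i-1)T}\Big)\Big|$$

$$\le 2T\, P\Big\{\sup_{(i-1)T \le s \le iT} d\big(X(s+T), \Phi_T(X(s))\big) > \alpha \,\Big|\, \mathcal{F}_{(i-1)T}\Big\} + T\epsilon.$$

Since $X$ is a weak asymptotic pseudotrajectory of $\Phi$, the first term on the right of the inequality goes to zero almost surely as $i \to \infty$, and since $\epsilon$ is arbitrary, this proves the claim.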
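Now, write

$$U_{i+1}(f,T) - U_i(f \circ \Phi_T, T) = \big[U_{i+1}(f,T) - E(U_{i+1}(f,T) \mid \mathcal{F}_{(i-1)T})\big] + E\big(U_{i+1}(f,T) - U_i(f \circ \Phi_T, T) \mid \mathcal{F}_{(i-1)T}\big) + \big[E(U_i(f \circ \Phi_T, T) \mid \mathcal{F}_{(i-1)T}) - U_i(f \circ \Phi_T, T)\big].$$

Then use equations (50), (51), and equation (47) with $f \circ \Phi_T$ in lieu of $f$. It follows that there exists a set $\Omega(f,T) \subset \Omega$ of full measure such that for all $\omega \in \Omega(f,T)$

$$\lim_{n \to \infty} \Big[\frac{1}{n}\sum_{i=1}^{n} U_{i+1}(f,T) - \frac{1}{n}\sum_{i=1}^{n} U_i(f \circ \Phi_T, T)\Big] = 0. \quad (52)$$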

Since $(M, d)$ is a separable metric space it admits a metric $\tilde{d}$ inducing the same topology as $d$ such that
(a)
(b) There exists a countable set $H = \{f_k\}_{k \in \mathbb{N}}$ of uniformly continuous functions $f_k : (M, d) \to [0,1]$ such that the topology of $\mathcal{P}(M)$ is induced by the metric

$$R(\mu, \nu) = \sum_{k} \frac{1}{2^k}\Big|\int_M f_k\,d\mu - \int_M f_k\,d\nu\Big|.$$

Statement (a) follows for example from the construction given in Lemma 3.1.4 of Stroock (1993) while (b) follows from Theorem 3.1.5 of Stroock (1993).
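Let

$$\tilde{\Omega} = \bigcap_{k \in \mathbb{N},\, T \in \mathbb{Q}_+} \Omega(f_k, T).$$

Given $\omega \in \tilde{\Omega}$ and $\mu \in \mathcal{M}(X, \omega)$ there exists a sequence $t_j \to \infty$ (depending on $\omega$) such that $\{\mu_{t_j}(\omega)\}$ converges weakly toward $\mu$.
Let $n_j = [t_j / T]$ denote the integer part of $t_j / T$. Then

$$\int_M f\,d\mu = \lim_{j \to \infty} \frac{1}{n_j T}\sum_{i=0}^{n_j - 1}\int_{iT}^{(i+1)T} f(X(s, \omega))\,ds \quad (53)$$

for any continuous and bounded function $f : M \to \mathbb{R}$. Hence by combining (53) and (52) we get that

$$\int_M (f_k \circ \Phi_T)(x)\,\mu(dx) = \int_M f_k(x)\,\mu(dx)$$

for all $f_k \in H$ and $T \in \mathbb{Q}_+$. This proves that $\mu$ is $\Phi_T$ invariant for all $T \in \mathbb{Q}_+$. By continuity of $\Phi$ this implies that $\mu$ is $\Phi_T$ invariant for all $T > 0$, and since $\Phi$ is a semiflow, $\mu$ is $\Phi$ invariant. QED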

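Given $\mu \in \mathcal{P}(M)$ let $\mathrm{supp}(\mu)$ denote the support of $\mu$. Given a weak asymptotic pseudotrajectory $X$ of $\Phi$ and $\omega \in \Omega$ we define the minimal center of attraction of $\{X(t, \omega) : t \ge 0\}$ as the (random) set

$$\mathrm{supp}(X, \omega) = \overline{\bigcup_{\mu \in \mathcal{M}(X, \omega)} \mathrm{supp}(\mu)}.$$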

Corollary 10.2 Given a Borel set A C M define


(A) = t (w) (A) .
lim inf .

Suppose that for P-almost every w tight. Then for P-almost


every w

(i) w)) =1 and for any other closed set A C M such that r (w)(:-1) _
1 it follows that w) C A.
(ii) ____________

C =
{x E M : x E

Proof The proof of part (i) is an easy consequence of Theorem 10.1, and (ii) follows from Theorem 10.1 and the Poincaré recurrence theorem (equation (30)). QED

This last corollary has the interpretation that the fraction of time spent by a weak asymptotic pseudotrajectory in an arbitrary neighborhood of $BC(\Phi)$ goes to one with probability one.

10.1 Stochastic Approximation Processes with Slow Decreasing Step-Size
Consider a Robbins-Monro algorithm as described in section 4. Recall that $\bar{X} : \mathbb{R}_+ \to \mathbb{R}^m$ denotes the piecewise constant interpolated process given by $\bar{X}(t) = x_n$ for $\tau_n \le t < \tau_{n+1}$. Set $\mathcal{F}_t = \mathcal{F}_n$ for $\tau_n \le t < \tau_{n+1}$.

Proposition 10.3 Let $\{x_n\}$ given by (7) be a Robbins-Monro algorithm. Assume

(i) $F$ is Lipschitz on a neighborhood of $\{x_n : n \ge 0\}$,

(ii) $x_0$ is $\mathcal{F}_0$ measurable,

(iii) $\lim_{R \to \infty} \big\{\sup_n E\big(\|U_{n+1}\| 1_{\{\|U_{n+1}\| \ge R\}} \mid \mathcal{F}_n\big)\big\} = 0$,

(iv) $\lim_{n \to \infty} \gamma_n = 0$.

Then $\bar{X}$ is a weak asymptotic pseudotrajectory of $\Phi$. Hence $\bar{X}$ and $X$ satisfy the conclusions of Theorem 10.1 and Corollary 10.2.
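To see what the statistical description means with a slowly decreasing step-size, the sketch below simulates a scalar Robbins-Monro recursion and measures the fraction of interpolated time spent near the equilibrium of the mean ODE. Everything specific is an assumption of this sketch: the field $F(x) = -x$ (whose only equilibrium is 0), the noise $U_n = \pm 1$, the step-size $\gamma_{n+1} = 1/\log(n+3)$, the radius 0.75 and the function name.

```python
import math
import random


def slow_step_occupation(steps=100_000, radius=0.75, seed=0):
    """Fraction of interpolated time spent in {|x| < radius}.

    Assumed recursion: x_{n+1} = x_n + gamma_{n+1} (-x_n + U_{n+1}),
    gamma_{n+1} = 1 / log(n + 3), U uniform on {-1, +1}.
    The piecewise constant interpolation holds x_n for a time gamma_{n+1}.
    """
    rng = random.Random(seed)
    x = 0.0
    time_inside = 0.0
    total_time = 0.0
    for n in range(steps):
        step = 1.0 / math.log(n + 3)
        total_time += step
        if abs(x) < radius:
            time_inside += step
        noise = rng.choice((-1.0, 1.0))
        x += step * (-x + noise)
    return time_inside / total_time


fraction_near_zero = slow_step_occupation()
print("fraction of time within 0.75 of the equilibrium:", fraction_near_zero)
```

The iterates keep fluctuating at a scale that shrinks only logarithmically, yet the time-weighted fraction spent near 0 is already close to one, in line with the interpretation of Corollary 10.2.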

Proof Given R > 0 let

(R) =

and
=

Then
k

P( sup
i~n

P(( sup > ( sup


i=n
k

>_
i=n

~ 4 03B12C2R2 03B32i+1 + 4 03B1E(03Bi+1E(~U{>R}|Fi)n


where the first term on the right side of this inequality follows from inequality (16) obtained with $q = 2$ and the second term is an obvious estimate based on the Markov inequality. Let $\epsilon > 0$. Assumption (iii) implies the existence of $R$ large
enough so that
1

Hence

lim sup P( sup -E.


0
(54)
n-+oo
i=n
Inequality (54) combined with the estimate (11) proves the result. QED

References
Akin, E. (1993). The General Topology of Dynamical Systems. American Mathematical Society, Providence.
Arthur, B., Ermol'ev, Y., and Kaniovskii, Y. (1983). A generalized urn problem and its applications. Cybernetics, 19:61-71.
Arthur, B. M. (1988). Self-reinforcing mechanisms in economics. In Anderson, P. W., Arrow, K. J., and Pines, D., editors, The Economy as an Evolving Complex System, SFI Studies in the Sciences of Complexity. Addison-Wesley.
Benaïm, M. (1996). A dynamical systems approach to stochastic approximations. SIAM Journal on Control and Optimization, 34:141-176.
Benaïm, M. (1997). Vertex reinforced random walks and a conjecture of Pemantle. The Annals of Probability, 25:361-392.
Benaïm, M. and Hirsch, M. W. (1994). Learning processes, mixed equilibria and dynamical systems arising from repeated games. Submitted.
Benaïm, M. and Hirsch, M. W. (1995a). Chain recurrence in surface flows. Discrete and Continuous Dynamical Systems, 1(1):1-16.
Benaïm, M. and Hirsch, M. W. (1995b). Dynamics of Morse-Smale urn processes. Ergodic Theory and Dynamical Systems, 15:1005-1030.
Benaïm, M. and Hirsch, M. W. (1996). Asymptotic pseudotrajectories and chain recurrent flows, with applications. J. Dynam. Differential Equations, 8:141-176.
Benaïm, M. and Schreiber, S. J. (1997). Weak asymptotic pseudotrajectories for semiflows: Ergodic properties. Preprint.
Benveniste, A., Métivier, M., and Priouret, P. (1990). Stochastic Approximation and Adaptive Algorithms. Springer-Verlag, Berlin and New York.
Bowen, R. (1975). Omega limit sets of Axiom A diffeomorphisms. J. Diff. Eq., 18:333-339.
Brandière, O. (1996). Autour des pièges des algorithmes stochastiques. Thèse de Doctorat, Université de Marne-la-Vallée.
Brandière, O. (1997). Some pathological traps for stochastic approximation. SIAM Journal on Control and Optimization. To appear.
Brandière, O. and Duflo, M. (1996). Les algorithmes stochastiques contournent-ils les pièges ? Annales de l'IHP, 32:395-427.
Conley, C. C. (1978). Isolated Invariant Sets and the Morse Index. CBMS Regional Conference Series in Mathematics. American Mathematical Society, Providence.
Delyon, B. (1996). General convergence results on stochastic approximation. IEEE Trans. on Automatic Control, 41:1245-1255.
Duflo, M. (1990). Méthodes Récursives Aléatoires. Masson. English translation: Random Iterative Models, Springer-Verlag, 1997.
Duflo, M. (1996). Algorithmes Stochastiques. Mathématiques et Applications. Springer-Verlag.
Duflo, M. (1997). Cibles atteignables avec une probabilité positive d'après M. Benaïm. Unpublished manuscript.
Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. John Wiley and Sons, Inc.
Fort, J. C. and Pages, G. (1994). Réseaux de neurones: des méthodes connexionnistes d'apprentissage. Matapli, 37:31-48.
Fort, J. C. and Pages, G. (1996). Convergence of stochastic algorithms: From Kushner-Clark theorem to the Lyapounov functional method. Adv. Appl. Prob., 28:1072-1094.
Fort, J. C. and Pages, G. (1997). Stochastic algorithm with non constant step: a.s. weak convergence of empirical measures. Preprint.
Fudenberg, D. and Kreps, D. (1993). Learning mixed equilibria. Games and Econom. Behav., 5:320-367.
Fudenberg, D. and Levine, D. (1998). Theory of Learning in Games. MIT Press, Cambridge, MA. In press.
Hartman, P. (1964). Ordinary Differential Equations. Wiley, New York.
Hill, B. M., Lane, D., and Sudderth, W. (1980). A strong law for some generalized urn processes. Annals of Probability, 8:214-226.
Hirsch, M. W. (1976). Differential Topology. Springer-Verlag, Berlin, New York, Heidelberg.
Hirsch, M. W. (1994). Asymptotic phase, shadowing and reaction-diffusion systems. In Differential Equations, Dynamical Systems and Control Science, volume 152 of Lecture Notes in Pure and Applied Mathematics, pages 87-99. Marcel Dekker, New York.
Hirsch, M. W. and Pugh, C. C. (1988). Cohomology of chain recurrent sets. Ergodic Theory and Dynamical Systems, 8:73-80.
Kaniovski, Y. and Young, H. (1995). Learning dynamics in games with stochastic perturbations. Games and Econom. Behav., 11:330-363.
Kiefer, J. and Wolfowitz, J. (1952). Stochastic estimation of the maximum of a regression function. Ann. Math. Statist., 23:462-466.
Kushner, H. J. and Clark, D. S. (1978). Stochastic Approximation for Constrained and Unconstrained Systems. Springer-Verlag, Berlin and New York.
Kushner, H. J. and Yin, G. G. (1997). Stochastic Approximation Algorithms and Applications. Springer-Verlag, New York.
Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Trans. Automat. Control, AC-22:551-575.
Ljung, L. (1986). System Identification: Theory for the User. Prentice Hall, Englewood Cliffs, NJ.
Ljung, L. and Söderström, T. (1983). Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA.
Mañé, R. (1987). Ergodic Theory and Differentiable Dynamics. Springer-Verlag, New York.
Métivier, M. and Priouret, P. (1987). Théorèmes de convergence presque sure pour une classe d'algorithmes stochastiques à pas décroissant. Probability Theory and Related Fields, 74:403-428.
Munkres, J. R. (1975). Topology: A First Course. Prentice Hall.
Nevelson, M. B. and Khasminskii, R. Z. (1976). Stochastic Approximation and Recursive Estimation. Translation of Math. Monographs. American Mathematical Society, Providence.
Pemantle, R. (1990). Nonconvergence to unstable points in urn models and stochastic approximations. Annals of Probability, 18:698-712.
Pemantle, R. (1992). Vertex reinforced random walk. Probability Theory and Related Fields, 92:117-136.
Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist., 22:400-407.
Robinson, C. (1977). Stability theorems and hyperbolicity in dynamical systems. Rocky Mountain Journal of Mathematics, 7:425-434.
Robinson, C. (1995). Introduction to the Theory of Dynamical Systems. Studies in Advanced Mathematics. CRC Press, Boca Raton.
Schreiber, S. J. (1997). Expansion rates and Lyapunov exponents. Discrete and Conts. Dynam. Sys., 3:433-438.
Shub, M. (1987). Global Stability of Dynamical Systems. Springer-Verlag, Berlin, New York, Heidelberg.
Stroock, D. W. (1993). Probability Theory. An analytic view. Cambridge University Press.
White, H. (1992). Artificial Neural Networks: Approximation and Learning Theory. Blackwell, Cambridge, Massachusetts.
