Peter Friz and Martin Hairer

A Course on Rough Paths

With an introduction to regularity structures

June 2014

Errata (last update: April 2015)

Springer
To Waltraud and Rudolf Friz

and

To Xue-Mei
Preface

Since its original development in the mid-nineties by Terry Lyons, culminating in
the landmark paper [Lyo98], the theory of rough paths has grown into a mature and
widely applicable mathematical theory, and there are by now several monographs
dedicated to the subject, notably Lyons–Qian [LQ02], Lyons et al. [LCL07] and
Friz–Victoir [FV10b]. So why do we believe that there is room for yet another book
on this matter? Our reasons for writing this book are twofold.
First, the theory of rough paths has gathered the reputation of being difficult to
access for “mainstream” probabilists because it relies on some non-trivial algebraic
and / or geometric machinery. It is true that if one wishes to apply it to signals
of arbitrary roughness, the general theory relies on several objects (in particular
on the Hopf-algebraic properties of the free tensor algebra and the free nilpotent
group embedded in it) that are unfamiliar to most probabilists. However, in our
opinion, some of the most interesting applications of the theory arise in the context
of stochastic differential equations, where the driving signal is Brownian motion. In
this case, the theory simplifies dramatically and essentially no non-trivial algebraic
or geometric objects are required at all. This simplification is certainly not novel.
Indeed, early notes by Lyons, and then of Davie and Gubinelli, all took place in
this simpler setting (which allows one to incorporate Brownian motion and Lévy’s area).
However, it does appear to us that all these ideas can nowadays be put together in
unprecedented simplicity, and we made a conscious choice to restrict ourselves to
this simpler case throughout most of this book.
The second and main raison d’être of this book is that the scope of the theory
has expanded dramatically over the past few years and that, in this process, the
point of view has slightly shifted from the one presented in the aforementioned
monographs. While Lyons’ theory was built on the integration of 1-forms, Gubinelli
gave a natural extension to the integration of so-called “controlled rough paths”. As a
benefit, differential equations driven by rough paths can now be solved by fixed point
arguments in linear Banach spaces which contain a sufficiently accurate (second
order) local description of the solution.
This shift in perspective has first enabled the use of rough paths to provide solution
theories for a number of classically ill-posed stochastic partial differential equations
with one-dimensional spatial variables, including equations of Burgers type and
the KPZ equation. More recently, the perspective which emphasises linear spaces
containing sufficiently accurate local descriptions modelled on some (rough) input
spurred the development of the theory of “regularity structures”, which allows one to
give consistent interpretations for a number of ill-posed equations, also in higher
dimensions. It can be viewed as an extension of the theory of controlled rough paths,
although its formulation is somewhat different. In the last chapters of this book, we
give a short and rather informal (i.e. very few proofs) introduction to that theory,
which in particular also sheds new light on some of the definitions of the theory of
rough paths.
This book does not have the ambition to provide an exhaustive description of the
theory of rough paths, but rather to complement the existing literature on the subject.
As a consequence, there are a number of aspects that we chose not to touch, or to do
so only barely. One omission is the study of rough paths of arbitrarily low regularity:
we do provide hints at the general theory at the end of several chapters, but these are
self-contained and can be skipped without impacting the understanding of the rest
of the book. Another serious omission concerns the systematic study of signatures,
that is the collection of all iterated integrals over a fixed interval associated to a
sufficiently regular path, providing an intriguing nonlinear characterisation.
We have used several parts of this book for lectures and mini-courses. In particular,
over the last years, the material on rough paths was given repeatedly by the first
author at TU Berlin (Chapters 1-12, in the form of a 4h/week, full semester lecture for
an audience of beginning graduate students in stochastics) and in some mini-courses
(Vienna, Columbia, Rennes, Toulouse; e.g. Chapters 1-5 with a selection of further
topics). The material of Chapters 13-15 originates in a number of mini-courses by
the second author (Bonn, ETHZ, Toulouse, Columbia, XVII Brazilian School of
Probability, 44th St. Flour School of Probability, etc.). The “KPZ and rough paths”
summer school in Rennes (2013) was a particularly good opportunity to try out much
of the material here in joint mini-course form – we are very grateful to the organisers
for their efforts. Chapters 13-15 are, arguably, a little harder to present in a classroom.
Jointly with Paul Gassiat, the first author gave this material as a full lecture at TU
Berlin (with examples classes run by Joscha Diehl, and more background material
on Schwartz distributions, Hölder spaces and wavelet theory than what is found
in this book); we also started to consistently use colours on our handouts. We felt
the resulting improvement in readability was significant enough to try it out also
in the present book and take the opportunity to thank Jörg Sixt from Springer for
making this possible, aside from his professional assistance concerning all other
aspects of this book project. We are very grateful for all the feedback we received
from participants at all these courses. Furthermore, we would like to thank Bruce
Driver, Paul Gassiat, Massimiliano Gubinelli, Terry Lyons, Etienne Pardoux, Jeremy
Quastel and Hendrik Weber for many interesting discussions on how to present this
material. In addition, Khalil Chouk, Joscha Diehl and Sebastian Riedel kindly offered
to partially proofread the final manuscript.
Finally, we would like to acknowledge financial support: PKF was supported by
the European Research Council under the European Union’s Seventh Framework
Programme (FP7/2007-2013) / ERC grant agreement nr. 258237 and DFG, SPP 1324.
MH was supported by the Leverhulme trust through a leadership award and by the
Royal Society through a Wolfson research award.

Berlin and Coventry, Peter K. Friz

June 2014 Martin Hairer
Contents

1 Introduction
    1.1 Controlled differential equations
    1.2 Analogies with other branches of mathematics
    1.3 Regularity structures
    1.4 Frequently used notations
    1.5 Rough path theory works in infinite dimensions

2 The space of rough paths
    2.1 Basic definitions
    2.2 The space of geometric rough paths
    2.3 Rough paths as Lie-group valued paths
    2.4 Geometric rough paths of low regularity
    2.5 Exercises
    2.6 Comments

3 Brownian motion as a rough path
    3.1 Kolmogorov criterion for rough paths
    3.2 Itô Brownian motion
    3.3 Stratonovich Brownian motion
    3.4 Brownian motion in a magnetic field
    3.5 Cubature on Wiener Space
    3.6 Scaling limits of random walks
    3.7 Exercises
    3.8 Comments

4 Integration against rough paths
    4.1 Introduction
    4.2 Integration of 1-forms
    4.3 Integration of controlled rough paths
    4.4 Stability I: rough integration
    4.5 Controlled rough paths of lower regularity
    4.6 Exercises
    4.7 Comments

5 Stochastic integration and Itô’s formula
    5.1 Itô integration
    5.2 Stratonovich integration
    5.3 Itô’s formula and Föllmer
    5.4 Backward integration
    5.5 Exercises
    5.6 Comments

6 Doob–Meyer type decomposition for rough paths
    6.1 Motivation from stochastic analysis
    6.2 Uniqueness of the Gubinelli derivative and Doob–Meyer
    6.3 Brownian motion is truly rough
    6.4 A deterministic Norris’ lemma
    6.5 Brownian motion is Hölder rough
    6.6 Exercises
    6.7 Comments

7 Operations on controlled rough paths
    7.1 Relation between rough paths and controlled rough paths
    7.2 Lifting of regular paths
    7.3 Composition with regular functions
    7.4 Stability II: Regular functions of controlled rough paths
    7.5 Itô’s formula revisited
    7.6 Controlled rough paths of low regularity
    7.7 Exercises

8 Solutions to rough differential equations
    8.1 Introduction
    8.2 Review of the Young case: a priori estimates
    8.3 Review of the Young case: Picard iteration
    8.4 Rough differential equations: a priori estimates
    8.5 Rough differential equations
    8.6 Stability III: Continuity of the Itô–Lyons map
    8.7 Davie’s definition and numerical schemes
    8.8 Lyons’ original definition
    8.9 Stability IV: Flows
    8.10 Exercises
    8.11 Comments

9 Stochastic differential equations
    9.1 Itô and Stratonovich equations
    9.2 The Wong–Zakai theorem
    9.3 Support theorem and large deviations
    9.4 Exercises
    9.5 Comments

10 Gaussian rough paths
    10.1 A simple criterion for Hölder regularity
    10.2 Stochastic integration and variation regularity of the covariance
    10.3 Fractional Brownian motion and beyond
    10.4 Exercises
    10.5 Comments

11 Cameron–Martin regularity and applications
    11.1 Complementary Young regularity
    11.2 Concentration of measure
        11.2.1 Borell’s inequality
        11.2.2 Fernique theorem for Gaussian rough paths
        11.2.3 Integrability of rough integrals and related topics
    11.3 Malliavin calculus for rough differential equations
        11.3.1 Bouleau–Hirsch criterion and Hörmander’s theorem
        11.3.2 Calculus of variations for ODEs and RDEs
        11.3.3 Hörmander’s theorem for Gaussian RDEs
    11.4 Exercises
    11.5 Comments

12 Stochastic partial differential equations
    12.1 Rough partial differential equations
        12.1.1 Linear theory: Feynman–Kac
        12.1.2 Nonlinear theory: flow transformation method
        12.1.3 Rough viscosity solutions
    12.2 Stochastic heat equation as a rough path
        12.2.1 The linear stochastic heat equation
    12.3 Exercises
    12.4 Comments

13 Introduction to regularity structures
    13.1 Introduction
    13.2 Definition of a regularity structure and first examples
        13.2.1 The canonical polynomial structure
        13.2.2 The rough path structure
    13.3 Definition of a model and first examples
        13.3.1 The canonical polynomial model
        13.3.2 The rough path model
    13.4 Wavelets and the reconstruction theorem
    13.5 Exercises
    13.6 Comments

14 Operations on modelled distributions
    14.1 Differentiation
    14.2 Products and composition by regular functions
    14.3 Schauder estimates and admissible models
    14.4 Exercises

15 Application to the KPZ equation
    15.1 Formulation of the main result
    15.2 Construction of the associated regularity structure
    15.3 The structure group
    15.4 Canonical lifts of regular functions
    15.5 Renormalisation of the KPZ equation
        15.5.1 The renormalisation group
        15.5.2 The renormalised equations
        15.5.3 Convergence of the renormalised models
    15.6 The KPZ equation and rough paths

References

Index

Chapter 1
Introduction

Abstract We give a short overview of the scopes of both the theory of rough paths
and the theory of regularity structures. The main ideas are introduced and we point
out some analogies with other branches of mathematics.

1.1 Controlled differential equations

Differential equations are omnipresent in modern pure and applied mathematics;
many “pure” disciplines in fact originate in attempts to analyse differential equations
from various application areas. Classical ordinary differential equations (ODEs) are
of the form $\dot Y_t = f(Y_t, t)$; an important sub-class is given by controlled ODEs
of the form
$$\dot Y_t = f_0(Y_t) + f(Y_t)\,\dot X_t, \tag{1.1}$$
where $X$ models the input (taking values in $\mathbf{R}^d$, say), and $Y$ is the output
(in $\mathbf{R}^e$, say) of some system modelled by nonlinear functions $f_0$ and $f$, and by
the initial state $Y_0$. The need for a non-smooth theory arises naturally when the system
is subject to white noise, which can be understood as the scaling limit as $h \to 0$ of
the discrete evolution equation
$$Y_{i+1} = Y_i + h f_0(Y_i) + \sqrt{h}\, f(Y_i)\,\xi_{i+1}, \tag{1.2}$$
where the $(\xi_i)$ are i.i.d. standard Gaussian random variables. Based on martingale
theory, Itô’s stochastic differential equations (SDEs) have provided a rigorous and
extremely useful mathematical framework for all this. And yet, stability is lost in the
passage to continuous time: while it is trivial to solve (1.2) for a fixed realisation of
$\xi_i(\omega)$ (after all, $(\xi_1, \ldots, \xi_T; Y_0) \mapsto Y_i$ is surely a continuous map), the
continuity of the solution as a function of the driving noise is lost in the limit.
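As a minimal sketch of the discrete evolution (1.2), the following Python snippet iterates the recursion for a fixed realisation of the noise and checks that a tiny perturbation of the noise moves the output only slightly. The coefficients `f0` and `f`, the step size and the helper name `euler_scheme` are hypothetical choices made only for illustration; they are not taken from the text.

```python
import math
import random


def euler_scheme(f0, f, y0, xis, h):
    """Iterate Y_{i+1} = Y_i + h*f0(Y_i) + sqrt(h)*f(Y_i)*xi_{i+1}, as in (1.2)."""
    ys = [y0]
    sqrt_h = math.sqrt(h)
    for xi in xis:
        y = ys[-1]
        ys.append(y + h * f0(y) + sqrt_h * f(y) * xi)
    return ys


# Hypothetical coefficients, chosen only for illustration.
def f0(y):
    return -y


def f(y):
    return math.cos(y)


rng = random.Random(0)
h = 0.01
xis = [rng.gauss(0.0, 1.0) for _ in range(100)]
path = euler_scheme(f0, f, 1.0, xis, h)

# For fixed h the map (xi_1, ..., xi_T; Y_0) -> Y_i is continuous:
# perturbing every xi_i by 1e-10 changes the output only marginally.
xis_perturbed = [xi + 1e-10 for xi in xis]
path_perturbed = euler_scheme(f0, f, 1.0, xis_perturbed, h)
max_gap = max(abs(a - b) for a, b in zip(path, path_perturbed))
print(max_gap)
```

The point made in the text is that this stability is not uniform as $h \to 0$; the sketch only exhibits the easy discrete-time half of that statement.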
Taking $\dot X = \xi$ to be white noise in time (which amounts to saying that $X$ is a
Brownian motion, say $B$), the solution map $S : B \mapsto Y$ to (1.1), known as the Itô map,
is a measurable map which in general lacks continuity, whatever norm one uses to
equip the space of realisations of $B$.¹ Actually, one can show the following negative
result (see [Lyo91, LCL07] as well as Exercise 5.21 below):
Proposition 1.1. There exists no separable Banach space $\mathcal{B} \subset \mathcal{C}([0, 1])$ with the
following properties:
1. Sample paths of Brownian motion lie in $\mathcal{B}$ almost surely.
2. The map $(f, g) \mapsto \int_0^\cdot f(t)\,\dot g(t)\,dt$ defined on smooth functions extends to a
continuous map from $\mathcal{B} \times \mathcal{B}$ into the space of continuous functions on $[0, 1]$.
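The proposition rules out every suitable Banach space at once, which no finite computation can show. As a hedged illustration of the simplest instance only (the supremum norm), the sketch below uses the paths $X^n_t = n^{-1}(\cos(2\pi n^2 t), \sin(2\pi n^2 t))$, an example chosen here and not taken from the text: they tend to zero uniformly while $\int_0^1 X^{n,1}\,\dot X^{n,2}\,dt$ stays close to $\pi$. The function names are hypothetical.

```python
import math


def sampled_circle_path(n, steps):
    """Sample X^n_t = (cos(2 pi n^2 t), sin(2 pi n^2 t)) / n on a uniform grid of [0, 1]."""
    pts = []
    for k in range(steps + 1):
        angle = 2.0 * math.pi * n * n * k / steps
        pts.append((math.cos(angle) / n, math.sin(angle) / n))
    return pts


def iterated_integral(pts):
    """Integral of x^1 dx^2 along the piecewise-linear path through pts (exact per segment)."""
    total = 0.0
    for (a1, a2), (b1, b2) in zip(pts, pts[1:]):
        total += 0.5 * (a1 + b1) * (b2 - a2)
    return total


for n in (1, 2, 4, 8):
    pts = sampled_circle_path(n, steps=200 * n * n)
    sup_norm = max(math.hypot(x, y) for x, y in pts)
    # sup_norm shrinks like 1/n, the iterated integral does not shrink at all.
    print(n, sup_norm, iterated_integral(pts))
```

So the map in item 2 is not continuous for the uniform topology; the content of the proposition is that no other Banach norm carrying Brownian sample paths repairs this.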
Since, for any two distinct indices $i$ and $j$, the map
$$B \mapsto \int_0^\cdot B^i(t)\,\dot B^j(t)\,dt, \tag{1.3}$$
is itself the solution of one of the simplest possible differential equations driven by
$B$ (take $Y \in \mathbf{R}^2$ solving $\dot Y^1 = \dot B^i$ and $\dot Y^2 = Y^1 \dot B^j$), this shows that it takes very
little for $S$ to lack continuity. In this sense, solving SDEs is an analytically ill-posed
task! On the other hand, there are well-known probabilistic well-posedness results
for SDEs of the form²
$$dY_t = f_0(Y_t)\,dt + f(Y_t) \circ dB_t, \tag{1.4}$$
(see e.g. [INY78, Thm 4.1]), which imply for instance

Theorem 1.2. Let $\xi_\varepsilon = \delta_\varepsilon * \xi$ denote the regularisation of white noise in time with a
compactly supported smooth mollifier $\delta_\varepsilon$. Denote by $Y^\varepsilon$ the solutions to (1.1) driven
by $\dot X = \xi_\varepsilon$. Then $Y^\varepsilon$ converges in probability (uniformly on compact sets). The
limiting process does not depend on the choice of mollifier $\delta_\varepsilon$, and in fact is the
Stratonovich solution to (1.4).
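To make the statement concrete in the simplest scalar case, here is a small sketch under stated assumptions: $f_0 = 0$ and $f(y) = y$ (a choice made here, not in the text), and the smooth approximation is the piecewise-linear interpolation of a sampled Brownian path rather than a mollification. Along each linear piece the ODE is solved exactly, which gives $\exp(B_1)$, the Stratonovich solution of $dY = Y \circ dB$; the discrete scheme (1.2) on the same increments instead tracks the Itô solution $\exp(B_1 - \tfrac12)$.

```python
import math
import random

rng = random.Random(1)
N = 10_000
h = 1.0 / N
increments = [rng.gauss(0.0, math.sqrt(h)) for _ in range(N)]
B1 = sum(increments)
quad_var = sum(d * d for d in increments)  # discrete quadratic variation, close to 1

# ODE dY/dt = Y * dX/dt driven by the piecewise-linear interpolation of B:
# on each linear segment the exact solution multiplies Y by exp(increment).
y_smooth = 1.0
for d in increments:
    y_smooth *= math.exp(d)

# Discrete scheme (1.2) with f0 = 0 and f(y) = y; sqrt(h) * xi_i is the increment d.
y_euler = 1.0
for d in increments:
    y_euler *= 1.0 + d

print(y_smooth, math.exp(B1))        # agree up to rounding: Stratonovich solution
print(y_euler, math.exp(B1 - 0.5))   # agree approximately: Ito solution
```

The gap between the two limits is exactly the factor $\exp(-\tfrac12 \sum_i (\Delta B_i)^2)$ up to higher-order terms, which is why the choice of approximation matters.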

There are many variations on such “Wong–Zakai” results, another popular choice
being $\xi_\varepsilon = \dot B^{(\varepsilon)}$ where $B^{(\varepsilon)}$ is a piecewise linear approximation (of mesh size
$\sim \varepsilon$) to Brownian motion. However, as a consequence of the aforementioned lack of
continuity of the Itô map, there are also reasonable approximations to white noise for
which the above convergence fails. (We shall see an explicit example in Section 3.4.)

¹ This lack of regularity is the raison d’être for Malliavin calculus, a Sobolev type theory of
$\mathcal{C}([0, T])$ equipped with Wiener measure, the law of Brownian motion.
² For the purpose of this introduction, all coefficients are assumed to be sufficiently nice.

Perhaps rather surprisingly, it turns out that well-posedness is restored via the
iterated integrals (1.3) which are in fact the only data that is missing to turn $S$ into
a continuous map. The role of (1.3) was already appreciated in [INY78, Thm 4.1]
and related works in the seventies, but statements at the time were probabilistic
in nature, such as Theorem 1.2 above. Rough path analysis, introduced by Terry
Lyons in the seminal article [Lyo98] and by now presented in several monographs
[LQ02, LCL07, FV10b], provides the following remarkable insight: Itô’s solution
map can be factorised into a measurable “universal” map $\Psi$ and a “nice” solution
map $\hat S$ as
$$B(\omega) \overset{\Psi}{\longmapsto} (B, \mathbb{B})(\omega) \overset{\hat S}{\longmapsto} Y(\omega). \tag{1.5}$$
The map $\Psi$ is universal in the sense that it depends neither on the initial condition, nor
on the vector fields driving the stochastic differential equation, but merely consists
of enhancing Brownian motion with iterated integrals of the form
$$\mathbb{B}^{i,j}(s, t) = \int_s^t \big(B^i(r) - B^i(s)\big)\,dB^j(r). \tag{1.6}$$
At this stage, the choice of stochastic integration in (1.6) (e.g. Itô or Stratonovich)
does matter and probabilistic techniques are required for the construction of $\Psi$.
Indeed, the map $\Psi$ is only measurable and usually requires the use of some sort
of stochastic integration theory (or some equivalent construction, see for example
Section 10 below for a general construction in a Gaussian, non-semimartingale
context).
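The dependence of (1.6) on the choice of integral can be seen already at the level of discrete sums. The sketch below, for a sampled two-dimensional Brownian path, compares left-point sums (an Itô-type discretisation) with midpoint sums (a Stratonovich-type discretisation). The discretisations and the name `iterated_sums` are assumptions made for illustration, not constructions from the text.

```python
import math
import random


def iterated_sums(increments):
    """Return (ito, strat): 2x2 discrete versions of (1.6) over the whole sampled interval.

    ito   uses left-point sums  sum_k (B_k - B_0)_i * (dB_k)_j,
    strat uses midpoint sums    sum_k (B_k - B_0 + dB_k / 2)_i * (dB_k)_j.
    """
    ito = [[0.0, 0.0], [0.0, 0.0]]
    strat = [[0.0, 0.0], [0.0, 0.0]]
    pos = [0.0, 0.0]  # running value of B_k - B_0
    for d in increments:
        for i in range(2):
            for j in range(2):
                ito[i][j] += pos[i] * d[j]
                strat[i][j] += (pos[i] + 0.5 * d[i]) * d[j]
        pos = [pos[0] + d[0], pos[1] + d[1]]
    return ito, strat


rng = random.Random(2)
N = 10_000
h = 1.0 / N
incs = [(rng.gauss(0.0, math.sqrt(h)), rng.gauss(0.0, math.sqrt(h))) for _ in range(N)]
ito, strat = iterated_sums(incs)

# The two conventions differ by half the discrete quadratic (co)variation,
# which is close to (t - s) / 2 on the diagonal and close to 0 off the diagonal.
gap = [[strat[i][j] - ito[i][j] for j in range(2)] for i in range(2)]

# The antisymmetric part (a discrete Levy area) is the same for both conventions.
levy_ito = 0.5 * (ito[0][1] - ito[1][0])
levy_strat = 0.5 * (strat[0][1] - strat[1][0])
print(gap, levy_ito, levy_strat)
```

In this discretised picture the two enhancements share their antisymmetric part and differ only in the symmetric part, by a quantity of order $t - s$.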
The map $\hat S$, on the other hand, is the solution map to a rough differential
equation (RDE). Also known as the Itô–Lyons map and discussed in Section 8.1, it is
purely deterministic and only makes use of analytical constructions. More precisely,
it allows input signals to be arbitrary rough paths which, as discussed in Chapter 2,
are objects (thought of as enhanced paths) of the form $(X, \mathbb{X})$, defined via certain
algebraic properties (which mimic the interplay between a path and its iterated
integrals) and certain analytical, Hölder-type regularity conditions. In Chapter 3 these
conditions will be seen to hold true a.s. for $(B, \mathbb{B})$; a typical realisation is thus called
a Brownian rough path.
The Itô–Lyons map turns out, cf. Section 8.6, to be “nice” in the sense that it is a
continuous map of both its initial condition and the driving noise $(X, \mathbb{X})$, provided
that the dependency on the latter is measured in a suitable “rough path” metric. In
other words, rough path analysis allows for a pathwise solution theory for SDEs, i.e.
for a fixed realisation of the Brownian rough path. The solution map $\hat S$ is however
a much richer object than the original Itô map, since its construction is completely
independent of the choice of stochastic integral and even of the knowledge that the
driving path is Brownian. For example, if we denote by $\Psi^I$ (resp. $\Psi^S$) the maps
$B \mapsto (B, \mathbb{B})$ obtained by Itô (resp. Stratonovich) integration, then we have the almost
sure identities
$$S^I = \hat S \circ \Psi^I, \qquad S^S = \hat S \circ \Psi^S,$$
where $S^I$ (resp. $S^S$) denotes the solution to (1.4) interpreted in the Itô (resp.
Stratonovich) sense. Returning to Theorem 1.2, we see that the convergence there
is really a deterministic consequence of the probabilistic question whether or not
$\Psi^S(B^\varepsilon) \to \Psi^S(B)$ in probability and rough path topology, with $\dot B^\varepsilon = \xi_\varepsilon$. This
can be shown to hold in the case of mollifier, piecewise linear, and many other
approximations.
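The factorisation can be mimicked numerically: one deterministic scheme, fed with two different enhancements of the same sampled Brownian path, produces approximations of the Itô and of the Stratonovich solution respectively. The sketch below assumes the scalar linear case $f_0 = 0$, $f(y) = y$ and a second-order step of the form $Y \leftarrow Y + f(Y)X_{s,t} + f'(Y)f(Y)\mathbb{X}_{s,t}$; this step is stated here as an assumption in the spirit of the schemes of Section 8.7, and the function name is hypothetical.

```python
import math
import random


def second_order_scheme(y0, increments, second_level):
    """Step Y <- Y + f(Y) * X + f'(Y) f(Y) * XX for the linear choice f(y) = y."""
    y = y0
    for x, xx in zip(increments, second_level):
        y = y + y * x + y * xx
    return y


rng = random.Random(3)
N = 10_000
h = 1.0 / N
dB = [rng.gauss(0.0, math.sqrt(h)) for _ in range(N)]
B1 = sum(dB)

# Two enhancements of the same sampled path (scalar case of (1.6) on each grid cell).
strat_lift = [0.5 * d * d for d in dB]        # Stratonovich: B_{s,t}^2 / 2
ito_lift = [0.5 * (d * d - h) for d in dB]    # Ito: (B_{s,t}^2 - (t - s)) / 2

# The same deterministic map applied to both enhanced paths.
y_strat = second_order_scheme(1.0, dB, strat_lift)
y_ito = second_order_scheme(1.0, dB, ito_lift)

print(y_strat, math.exp(B1))        # close to the Stratonovich solution exp(B_1)
print(y_ito, math.exp(B1 - 0.5))    # close to the Ito solution exp(B_1 - 1/2)
```

Nothing in `second_order_scheme` knows which stochastic integral was used; all of that information sits in the second-level input, as in the identities $S^I = \hat S \circ \Psi^I$ and $S^S = \hat S \circ \Psi^S$.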
So how is this Itô–Lyons map $\hat S$ built? In order to solve (1.1), we need to be able
to make sense of the expression
$$\int_0^t f(Y_s)\,dX_s, \tag{1.7}$$
where $Y$ is itself the as yet unknown solution. Here is where the usual pathwise
approach breaks down: as we have seen in Proposition 1.1 it is in general impossible,
even in the simplest cases, to find a Banach space of functions containing Brownian
sample paths and in which (1.7) makes sense. Actually, if we measure regularity
in terms of Hölder exponents, then (1.7) makes sense as a limit of Riemann sums
for $X$ and $Y$ that are arbitrary $\alpha$-Hölder continuous functions if and only if $\alpha > \frac12$.
The keyword here is arbitrary: in our case the function $Y$ is anything but arbitrary!
Actually, since the function $Y$ solves (1.1), one would expect the small-scale fluctuations
of $Y$ to look exactly like the small-scale fluctuations of $X$ in the sense that one
would expect that
$$Y_{s,t} = f(Y_s)\,X_{s,t} + R_{s,t},$$
where, for any path $F$ with values in a linear space, we set $F_{s,t} = F_t - F_s$, and
where $R_{s,t}$ is some remainder that one would expect to be “of higher order”.
Suppose now that $X$ is a “rough path”, which is to say that it has been “enhanced”
with a two-parameter function $\mathbb{X}$ which should be interpreted as giving the values for
$$\mathbb{X}^{i,j}(s, t) = \int_s^t X^i_{s,r}\,dX^j_r. \tag{1.8}$$
Note here that this identity should be read in the reverse order from what one may be
used to: it is the right hand side that is defined by the left hand side and not the other
way around! The idea here is that if $X$ is too rough, then we do not a priori know
how to define the integral of $X$ against itself, so we simply postulate its values. Of
course, $\mathbb{X}$ cannot just be anything, but should satisfy a number of natural algebraic
identities and analytical bounds, see Chapter 2 below.
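The precise identities are deferred to Chapter 2. As a hedged preview, the sketch below assumes that they include the relation commonly known as Chen's relation, $\mathbb{X}_{s,t} = \mathbb{X}_{s,u} + \mathbb{X}_{u,t} + X_{s,u} \otimes X_{u,t}$, and, for limits of smooth paths, the symmetry condition $\mathbb{X}^{i,j}_{s,t} + \mathbb{X}^{j,i}_{s,t} = X^i_{s,t} X^j_{s,t}$. Both are checked numerically for the honest iterated integrals (1.8) of a piecewise-linear path; the test path and the helper `lift` are hypothetical.

```python
import math


def lift(path, s, t):
    """First- and second-level increments of a sampled R^2 path between grid indices s <= t.

    The second level is the iterated integral (1.8) of the piecewise-linear
    interpolation, which is computed exactly by the midpoint rule on each segment.
    """
    dim = 2
    x = [path[t][i] - path[s][i] for i in range(dim)]
    xx = [[0.0] * dim for _ in range(dim)]
    for k in range(s, t):
        d = [path[k + 1][i] - path[k][i] for i in range(dim)]
        pos = [path[k][i] - path[s][i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                xx[i][j] += (pos[i] + 0.5 * d[i]) * d[j]
    return x, xx


# A hypothetical smooth test path, sampled on a grid of 41 points.
path = [(math.cos(0.1 * k), math.sin(0.2 * k) + 0.01 * k) for k in range(41)]
s, u, t = 0, 15, 40
x_su, xx_su = lift(path, s, u)
x_ut, xx_ut = lift(path, u, t)
x_st, xx_st = lift(path, s, t)

# Chen-type relation: XX_{s,t} = XX_{s,u} + XX_{u,t} + X_{s,u} (x) X_{u,t}.
chen_error = max(
    abs(xx_st[i][j] - (xx_su[i][j] + xx_ut[i][j] + x_su[i] * x_ut[j]))
    for i in range(2) for j in range(2)
)
# Symmetry condition for smooth paths: XX^{ij} + XX^{ji} = X^i X^j.
sym_error = max(
    abs(xx_st[i][j] + xx_st[j][i] - x_st[i] * x_st[j])
    for i in range(2) for j in range(2)
)
print(chen_error, sym_error)
```

Both errors vanish up to rounding, which is the sense in which such identities "mimic the interplay between a path and its iterated integrals"; for a genuinely rough $X$ they are imposed as axioms on the postulated $\mathbb{X}$.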
Anyway, assuming that we are provided with the data $(X, \mathbb{X})$, then we know how
to give meaning to the integral of components of $X$ against other components of $X$:
this is precisely what $\mathbb{X}$ encodes. Intuitively, this suggests that if we similarly encode
the fact that $Y$ “looks like $X$ at small scales”, then one should be able to extend
the definition of (1.7) to a large enough class of integrands to include solutions to
(1.1), even when $\alpha < \frac12$. One of the achievements of rough path theory is to make
this intuition precise. Indeed, in the framework of rough integration sketched here
and made precise in Chapter 4, the barrier $\alpha = \frac12$ can be lowered to $\alpha = \frac13$. In
principle, this can be lowered further by further enhancing $X$ with iterated integrals
of higher-order, but we decided to focus on the first non-trivial case for the sake of
simplicity and because it already covers the most important case when $X$ is given
by a Brownian motion, or a stochastic process with properties similar to those of
Brownian motion. We do however indicate very briefly in Sections 2.4, 4.5 and 7.6
how the theory can be modified to cover the case $\alpha \le \frac13$, at least in the “geometric”
case when $X$ is a limit of smooth paths.
The simplest way for $Y$ to “look like $X$” is when $Y = G(X)$ for some sufficiently
regular function $G$. Despite what one might guess, it turns out that this particular
class of functions $Y$ is already sufficiently rich so that knowing how to define
integrals of the form $\int_0^t G(X_s)\,dX_s$ for (non-gradient) functions $G$ allows one to give a
meaning to equations of the type (1.1), which is the approach originally developed
in [Lyo98]. More recently, Gubinelli realised in [Gub04] that, in order to be able to
give a meaning to $\int_0^t Y_s\,dX_s$ given the data $(X, \mathbb{X})$, it is sufficient that $Y$ admits a
“derivative” $Y'$ such that
$$Y_{s,t} = Y'_s X_{s,t} + R_{s,t},$$
with a remainder satisfying $R_{s,t} = O(|t - s|^{2\alpha})$. This extension of the original theory
turns out to be quite convenient, especially when applying it to problems other than
the resolution of evolution equations of the type (1.1).
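How the pair $(Y, Y')$ enters the integral is made precise in Chapter 4. As a sketch under that assumption, the code below uses compensated Riemann sums $\sum \big(Y_s X_{s,t} + Y'_s \mathbb{X}_{s,t}\big)$ for a sampled scalar Brownian path with the Stratonovich-type second level $\mathbb{X}_{s,t} = \tfrac12 X_{s,t}^2$, and takes $Y = G(X)$ with $G(x) = x^2$, $Y' = 2X$. These choices and the function name `riemann_sums` are illustrative assumptions.

```python
import math
import random


def riemann_sums(Y, Yprime, dX, XX):
    """Return (plain, compensated) sums for the integral of Y against X on a grid.

    plain       = sum_k Y_k * X_{k,k+1}
    compensated = sum_k (Y_k * X_{k,k+1} + Y'_k * XX_{k,k+1})
    """
    plain = sum(y * dx for y, dx in zip(Y, dX))
    compensated = plain + sum(yp * xx for yp, xx in zip(Yprime, XX))
    return plain, compensated


rng = random.Random(4)
N = 10_000
h = 1.0 / N
dB = [rng.gauss(0.0, math.sqrt(h)) for _ in range(N)]
B = [0.0]
for d in dB:
    B.append(B[-1] + d)

# Y = G(B) with G(x) = x^2, so the candidate derivative is Y' = G'(B) = 2B.
Y = [b * b for b in B[:-1]]
Yprime = [2.0 * b for b in B[:-1]]
XX = [0.5 * d * d for d in dB]  # scalar Stratonovich-type second level

plain, compensated = riemann_sums(Y, Yprime, dB, XX)
target = B[-1] ** 3 / 3.0  # value predicted by the ordinary chain rule
print(plain, compensated, target)
```

With this second level the compensated sum reproduces the chain-rule value up to a term of order $\sum |B_{k,k+1}|^3$, while the plain left-point sum differs from it by $\sum_k B_k\,(B_{k,k+1})^2$, which does not vanish as the mesh is refined.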
An intriguing question is to what extent rough path theory, essentially a theory
of controlled ordinary differential equations, can be extended to partial differential
equations. In the case of finite-dimensional noise, and very loosely stated, one has
for instance a statement of the following type. (See [CF09, CFO11, FO14, GT10,
Tei11, DGT12] as well as Section 12.1 below.)

Theorem 1.3. Classes of SPDEs of the form du = F [u] dt + H[u] ◦ dB, with
second and first order differential operators F and H, respectively, and driven
by finite-dimensional noise, with the Zakai equation from filtering and stochastic
Hamilton–Jacobi–Bellman (HJB) equations as examples, can be solved pathwise, i.e.
for a fixed realisation of the Brownian rough path. As in the SDE case, the SPDE
solution map factorises as S S = Ŝ ◦ Ψ S where Ŝ, the solution map to a rough partial
differential equation (RPDE) is continuous in the rough path topology.

As a consequence, if ξε = δε ∗ ξ denotes the regularisation of white noise in

time with a compactly supported smooth mollifier δε that is scaled by ε, and if uε
denotes the random PDE solutions driven by ξε dt (instead of ◦dB) then uε converges
in probability. The limiting process does not depend on the choice of mollifier δε ,
and is viewed as the Stratonovich SPDE solution. The same conclusion holds whenever
Ψ S (B ε ) → Ψ S (B) in probability and rough path topology.
The case of SPDEs driven by infinite-dimensional noise poses entirely different
problems. Already the stochastic heat equation in space dimension one has not
enough spatial regularity for additional nonlinearities of the type $g(u)\,\partial_x u$ (which
arises in applications from path sampling [Hai11b, HW13]) or $(\partial_x u)^2$ (the
Kardar–Parisi–Zhang equation) to be well-defined. In space dimension one, “spatial” rough
paths indexed by x, rather than t, have proved useful here and the quest to handle
dimension larger than one led to the general theory of regularity structures, see
Section 1.3 below.
Rather than trying to survey all applications to date of rough paths to stochastics,
let us say that the past few years have seen an explosion of results made possible by
the use of rough paths theory. New stimulus to the field was given by its use in rather
diverse mathematical fields, including for example quantum field theory [GL09],
nonlinear PDEs [Gub12], Malliavin calculus [CFV09], non-Markovian Hörmander
and ergodic theory [CF10, HP13, CHLT12] and the analysis of chaotic behaviour in
fast-slow systems [KM14].

In view of these developments, we believe that it is an opportune time to try to

summarise some of the main results of the theory in a way that is as elementary as
possible, yet sufficiently precise to provide a technical working knowledge of the
theory. We therefore include elementary but essentially complete proofs of several
of the main results, including the continuity and definition of the Itô–Lyons map,
the lifting of a class of Gaussian processes to the space of rough paths, etc. In
contrast to the available textbook literature [LQ02, LCL07, FV10b], we emphasize
Gubinelli’s view on rough integration [Gub04, Gub10] which allows one to linearise
many considerations and to simplify the exposition. That said, the resulting theory
of rough differential equations is (immediately) seen to be equivalent to Davie’s
definition [Dav08] and, generally, we have tried to give a good idea of what other
perspectives one can take on what amounts to essentially the same objects.

1.2 Analogies with other branches of mathematics

As we have just seen, the main idea of the theory of rough paths is to “enhance”
a path $X$ with some additional data $\mathbb{X}$, namely the integral of $X$ against itself, in
order to restore continuity of the Itô map. The general idea of building a larger
object containing additional information in order to restore the continuity of some
nonlinear transformation is of course very old and there are several other theories
that have a similar “flavour” to the theory of rough paths, one of them being the
theory of Young measures (see for example the notes [Bal00]) where the value of
a function is replaced by a probability measure, thus allowing to describe limits of
highly oscillatory functions.
Nevertheless, when first confronted with some of the notions just outlined, the
first reaction of the reader might be that simply postulating the values for the right
hand side of (1.8) makes no sense. Indeed, if X is smooth, then we “know” that there
is only one “reasonable” choice for the integral $\mathbb{X}$ of $X$ against itself, and this is the
Riemann integral. How could this be replaced by something else and how can one
expect to still get a consistent theory with a natural interpretation? These questions
will of course be fully answered in these notes.
For the moment, let us draw an analogy with a very well established branch of
geometric measure theory, namely the theory of varifolds [Alm66, LY02].
Varifolds arise as natural extensions of submanifolds in the context of certain
variational problems. We are not going into details here, but loosely speaking a
$k$-dimensional varifold in $\mathbb{R}^n$ is a (Radon) measure $v$ on $\mathbb{R}^n \times G(k, n)$, where
$G(k, n)$ denotes the space of all $k$-dimensional subspaces of $\mathbb{R}^n$. Here, one should
interpret $G(k, n)$ as the space of all possible tangent spaces at any given point for
a $k$-dimensional submanifold of $\mathbb{R}^n$. The projection of $v$ onto $\mathbb{R}^n$ should then be
interpreted as a generalisation of the natural “surface measure” of a submanifold,
while the conditional (probability) measure on G(k, n) induced at almost every point
by disintegration should be interpreted as selecting a (possibly random) tangent
space at each point. Why is this a reasonable extension of the notion of submanifold?

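Consider the following sequence $M_\varepsilon$ of one-dimensional submanifolds of $\mathbb{R}^2$:

[Figure: the curves $M_\varepsilon$, annotated with the scale $\varepsilon$, and their limit $M$ ($M_\varepsilon \Rightarrow M$).]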

It is intuitively clear that, as ε → 0, this converges to a circle, but the right half has
twice as much “weight” as the left half so that, if we were to describe the limit M
simply as a manifold, we would have lost some information about the convergence of
the surface measures in the process. More dramatically, there are situations where one
has a sequence of smooth manifolds such that the limit is again a smooth manifold,
but with a limiting “tangent space” which has nothing to do with the actual tangent
space of the limit! Indeed, consider the sequence of one-dimensional submanifolds
of $\mathbb{R}^2$ given by

[Figure: the sequence of curves, annotated with the scale $\varepsilon^2$.]

This time, the limit is a piece of straight line, which is in principle a perfectly nice
smooth submanifold, but the limiting tangent space is deterministic and makes a 45°
angle with the canonical tangent space associated to the limit.
The situation here is philosophically very similar to that of the theory of rough
paths: a subset M ⊂ Rn may be sufficiently “rough” so that there is no way of
canonically associating to it either a k-dimensional Riemannian volume element,
or a k-dimensional tangent space, so we simply postulate them. The two examples
given above show that even in situations where M is a nice smooth manifold, it
still makes sense to associate to it a volume element and / or tangent space that are
different from the ones that one would construct canonically. A similar situation
arises in the theory of rough paths. Indeed, it may so happen that X is actually
given by a smooth function. Even so, this does not automatically mean that the right
hand side of (1.8) is given by the usual Riemann integral of X against itself. An
explicit example illustrating this fact is given in Exercise 2.17 below. Similarly to
the examples of “non-canonical” varifolds given above, “non-canonical” rough paths
can also be constructed as limits of ordinary smooth paths (with the second-order
term $\mathbb{X}$ defined by (1.8) where the integral is the usual Riemann integral), provided
that one takes limits in a suitably weak topology.

1.3 Regularity structures

Very recently, a new theory of “regularity structures” was introduced [Hai14c], uni-
fying various flavours of the theory of rough paths (including Gubinelli’s controlled
rough paths [Gub04], as well as his branched rough paths [Gub10]), as well as the
usual Taylor expansions. While it has its roots in the theory of rough paths, the main
advantage of this new theory is that it is no longer tied to the one-dimensionality of
the time parameter, which makes it also suitable for the description of solutions to
stochastic partial differential equations, rather than just stochastic ordinary differen-
tial equations.
The main achievement of the theory of regularity structures is that it allows to
give a (pathwise!) meaning to ill-posed stochastic PDEs that arise naturally when
trying to describe the macroscopic behaviour of models from statistical mechanics
near criticality. One example of such an equation is the KPZ equation arising as a
natural model for one-dimensional interface motion [KPZ86, BG97, Hai13]:

$$\partial_t h = \partial_x^2 h + (\partial_x h)^2 - C + \xi . \tag{1.9}$$

The problem with this equation is that, if anything, one has $(\partial_x h)^2 = +\infty$ (a
consequence of the roughness of $(1 + 1)$-dimensional space-time white noise) and
one would have to compensate with $C = +\infty$. It has become customary to define the
solution of the KPZ equation as the logarithm of the (multiplicative) stochastic heat
equation $\partial_t u = \partial_x^2 u + u\xi$, essentially ignoring the (infinite) Itô-correction term.³
The so-constructed solutions are called Hopf–Cole solutions and, to cite J. Quastel
[Qua11],
The evidence for the Hopf–Cole solutions is now overwhelming. Whatever the physicists
mean by KPZ, it is them.

It should be emphasised that previous to [Hai13], to be discussed in Chapter 15, no
direct mathematical meaning had been given to the actual KPZ equation.
Another example is the dynamical $\Phi^4_3$ model arising for example in the stochastic
quantisation of Euclidean quantum field theory [PW81, JLM85, AR91, DPD03,
Hai14c], as well as a universal model for phase coexistence near the critical point
[GLP99]:
$$\partial_t \Phi = \Delta\Phi + C\Phi - \Phi^3 + \xi . \tag{1.10}$$
Here, $\xi$ denotes $(3 + 1)$-dimensional space-time white noise. In contrast to the KPZ
equation where the Hopf–Cole solution is a Hölder continuous random field, here
$\Phi$ is at best a random Schwartz distribution, making the term $\Phi^3$ ill-defined. Again,
one formally needs to set $C = \infty$ to create suitable cancellations and so, again, the
stochastic partial differential equation (1.10) has no “naïve” mathematical meaning.
Loosely speaking, the type of well-posedness results that can be proven with the
help of the theory of regularity structures can be formulated as follows.

³ This requires one of course to know that solutions to $\partial_t u = \partial_x^2 u + u\xi$ stay strictly positive with
probability one, provided $u_0 > 0$ a.s., but this turns out to be the case.

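Theorem 1.4. Consider KPZ and $\Phi^4_3$ on a bounded square spatial domain with
periodic boundary conditions. Let $\xi_\varepsilon = \delta_\varepsilon * \xi$ denote the regularisation of space-time
white noise with a compactly supported smooth mollifier $\delta_\varepsilon$ that is scaled by $\varepsilon$ in
the spatial direction(s) and by $\varepsilon^2$ in the time direction. Denote by $h_\varepsilon$ and $\Phi_\varepsilon$ the
solutions to

$$\partial_t h_\varepsilon = \partial_x^2 h_\varepsilon + (\partial_x h_\varepsilon)^2 - C_\varepsilon + \xi_\varepsilon ,$$
$$\partial_t \Phi_\varepsilon = \Delta\Phi_\varepsilon + \tilde C_\varepsilon \Phi_\varepsilon - \Phi_\varepsilon^3 + \xi_\varepsilon .$$

Then, there exist choices of constants $C_\varepsilon$ and $\tilde C_\varepsilon$ diverging as $\varepsilon \to 0$, as well as
processes $h$ and $\Phi$ such that $h_\varepsilon \to h$ and $\Phi_\varepsilon \to \Phi$ in probability. Furthermore, while
the constants $C_\varepsilon$ and $\tilde C_\varepsilon$ do depend crucially on the choice of mollifiers $\delta_\varepsilon$, the limits
$h$ and $\Phi$ do not depend on them.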
In the case of the KPZ equation, the topology in which one obtains convergence is
that of convergence in probability in a suitable space of space-time Hölder continuous
functions. Let us also emphasise that in this case the resulting renormalised solutions
coincide indeed with the Hopf–Cole solutions.
In the case of the dynamical $\Phi^4_3$ model, convergence takes place instead in some
space of space-time distributions. One caveat that also has to be dealt with in the
latter case is that the limiting process Φ may in principle explode in finite time for
some instances of the driving noise. (Although this is of course not expected.)
The penultimate sections of this book give a short and mostly self-contained
introduction to the theory of regularity structures and the last section shows how it
can be used to provide a robust solution theory for the KPZ equation. The material in
these sections differs significantly in presentation from the remainder of the book.
Indeed, since a detailed and rigorous exposition of this material would require an
entire book by itself (see the rather lengthy articles [Hai13] and [Hai14c]), we made
a conscious decision to keep the exposition mostly at an intuitive level. We therefore
omit virtually all proofs (with the notable exception of the proof of the reconstruction
theorem, Theorem 13.12, which is the fundamental result on which the theory builds)
and instead give short glimpses of the main ideas involved.

1.4 Frequently used notations

We shall deal with paths with values in, as well as maps between, Banach spaces
V, W . It will be important to consider tensor products of such Banach spaces. Assume
at first that $V, W$ are finite-dimensional, $V \cong \mathbb{R}^m$, $W \cong \mathbb{R}^n$. In this case the
tensor product $V \otimes W$ can be identified with the matrix space $\mathbb{R}^{m\times n}$. Indeed, if
(ei : 1 ≤ i ≤ m) [resp. (fj : 1 ≤ j ≤ n)] is a basis of V [resp. W ], then
(ei ⊗ fj : 1 ≤ i ≤ m, 1 ≤ j ≤ n) is a basis of V ⊗ W . If (ei ) and (fj ) are
orthonormal bases it is natural to define a Euclidean structure on V ⊗ W by declaring
the (ei ⊗ fj ) to be orthonormal. This induces a norm on V ⊗ W which is compatible
in the sense |v ⊗w| ≤ |v|·|w| ∀v ∈ V, w ∈ W . When applied to V ⊗V we also have
the (permutation) invariance property, |u ⊗ v| = |v ⊗ u| ∀u, v ∈ V . A well-known

and useful feature of tensor product spaces is their ability to linearise bilinear maps,⁴

$$L(V \times \bar V, W) \cong L(V \otimes \bar V, W).$$
In coordinates, this identification is almost trivial: any $A \in L(V \times \bar V, W)$, i.e. any
bilinear map from $V \times \bar V$ into $W$, can be expressed in terms of a 3-tensor $(A^j_{i,k})$ such
that $A$ maps $v = v^i e_i$, $\bar v = \bar v^k e_k$ into $v^i \bar v^k A^j_{i,k} f_j \in W$. The same 3-tensor gives
rise to $\bar A \in L(V \otimes \bar V, W)$. Indeed, any $M = M^{i,k} (e_i \otimes e_k) \in V \otimes \bar V$ is mapped
linearly into $M^{i,k} A^j_{i,k} f_j \in W$. (A brief discussion of how these things are adapted in
an infinite-dimensional Banach setting is given in the following subsection.)
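The coordinate description above can be checked directly. The following is a small sketch in plain Python, in which the 3-tensor `A`, the vectors `v`, `w` and the dimensions are hypothetical values chosen only for illustration; `A[j][i][k]` plays the role of $A^j_{i,k}$.

```python
def bilinear(A, v, w):
    """A(v, w)^j = sum over i, k of v^i w^k A[j][i][k]."""
    return [sum(v[i] * w[k] * Aj[i][k]
                for i in range(len(v)) for k in range(len(w)))
            for Aj in A]

def tensor(v, w):
    """Coordinates M^{i,k} = v^i w^k of the elementary tensor v (x) w."""
    return [[vi * wk for wk in w] for vi in v]

def linearised(A, M):
    """The linear map on the tensor product built from the same 3-tensor."""
    return [sum(M[i][k] * Aj[i][k]
                for i in range(len(M)) for k in range(len(M[0])))
            for Aj in A]

# Hypothetical data: V = R^2, V-bar = R^3, W = R^2 (integers, so arithmetic is exact).
A = [[[1, 0, 2], [0, -1, 3]],
     [[4, 1, 0], [2, 2, -5]]]
v, w = [3, -1], [2, 0, 7]
```

Here `bilinear(A, v, w)` and `linearised(A, tensor(v, w))` agree, which is the identity $\bar A(v \otimes \bar v) = A(v, \bar v)$ in coordinates.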
It will also be important to consider nonlinear maps between Banach spaces.
Generically, we write $C_b^n$ for the space of bounded continuous functions, say $F : V \to W$,
with up to $n$ bounded, continuous derivatives in Fréchet sense, i.e. such that

$$\|F\|_{C_b^n} \equiv \|F\|_\infty + \|DF\|_\infty + \ldots + \|D^n F\|_\infty < \infty$$

whenever $F \in C_b^n$; recall $DF(v) \in L(V, W)$, $D^2F(v) \in L(V, L(V, W)) \cong L(V \times V, W)$
and so on.
The notation $C^\alpha$, for $\alpha \le 1$, is reserved for paths, such as $X, Y, \ldots$ defined on
$[0, T]$, with values in some Banach space, Hölder continuous of exponent $\alpha$ (short:
$\alpha$-Hölder). For $X \in C^\alpha$, the usual $\alpha$-Hölder semi-norm is given by

$$\|X\|_\alpha \overset{\text{def}}{=} \sup_{s,t\in[0,T]} \frac{|X_{s,t}|}{|t - s|^\alpha} < \infty ,$$

where we define the path increment $X_{s,t} \overset{\text{def}}{=} X_t - X_s$ (and also use the convention
$0/0 \overset{\text{def}}{=} 0$). As is well known, $C^\alpha$ is a Banach space when equipped with the norm
$X \mapsto |X_0| + \|X\|_\alpha$. When working with paths starting at the origin, the term $|X_0|$
can be omitted, i.e. we can work directly with $\|\cdot\|_\alpha$. The same is true if we
are only interested in the α-Hölder distance between two paths started at the same
point $\xi \in V$. Often we shall work with partitions or dissections of $[0, T]$; since
every dissection $D = \{0 = t_0 < t_1 < \cdots < t_n = T\} \subset [0, T]$ can be thought of as
a partition of $[0, T]$ into (essentially) disjoint intervals, $\mathcal{P} = \{[t_{i-1}, t_i] : i = 1, \ldots, n\}$,
and vice-versa, we shall use whatever is (notationally) more convenient. We recall
that $\lim_{|\mathcal{P}|\to 0}$, typically defined via nets, means convergence along any sequence $(\mathcal{P}_k)$
with mesh $|\mathcal{P}_k| \to 0$, with identical limit along each such sequence. Here, the mesh
$|\mathcal{P}|$ of a partition $\mathcal{P}$ is the length of its largest element, i.e.
$|\mathcal{P}| = \sup_{k\in\{1,\ldots,n\}} |t_k - t_{k-1}|$ if $\mathcal{P}$ is as above.
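To make the Hölder semi-norm and the mesh concrete, here is a minimal sketch that evaluates both on a finite grid. It only gives a lower bound for the true supremum, and the test path $t \mapsto \sqrt{t}$ and the grid are assumptions chosen for illustration.

```python
import math

def holder_seminorm(X, alpha, grid):
    """Discrete alpha-Hoelder semi-norm: sup over grid pairs s < t of |X_{s,t}| / |t - s|^alpha."""
    return max(abs(X(t) - X(s)) / (t - s) ** alpha
               for s in grid for t in grid if s < t)

def mesh(partition):
    """Length of the largest interval [t_{k-1}, t_k] of the partition."""
    return max(t - s for s, t in zip(partition, partition[1:]))

grid = [k / 100 for k in range(101)]          # uniform dissection of [0, 1]
half = holder_seminorm(math.sqrt, 0.5, grid)  # attained at s = 0
too_rough = holder_seminorm(math.sqrt, 0.75, grid)
```

For this path the value for $\alpha = \frac12$ is (up to rounding) equal to 1, whereas the value for $\alpha = \frac34$ grows without bound as the grid is refined, reflecting the fact that $\sqrt{\cdot}$ is $\frac12$-Hölder but not $\frac34$-Hölder on $[0, 1]$.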
We will frequently deal with functions $\Xi$ mapping $(s, t) \in [0, T]^2$ continuously
into some Banach space and which enjoy some sort of “on-diagonal” $\alpha$-Hölder
regularity. More precisely, we write $\Xi(s, t) \overset{\text{def}}{=} \Xi_{s,t} \in C_2^\alpha$ if there exists a constant
$C$ such that $|\Xi_{s,t}| \le C|t - s|^\alpha$ for all $(s, t) \in [0, T]^2$. The smallest such constant is
then given by
$$\|\Xi\|_\alpha \overset{\text{def}}{=} \sup_{s,t\in[0,T]} \frac{|\Xi_{s,t}|}{|t - s|^\alpha} .$$

⁴ This will arise naturally, with $\bar V = V$, when pairing the second Fréchet derivatives (of some
$F : V \to W$) with second iterated integrals with values in $V \otimes V$.

In particular, if $X$ is a function defined on $[0, T]$ that is $\alpha$-Hölder continuous in the
usual sense, then its increments $(s, t) \mapsto X_{s,t}$ belong to $C_2^\alpha$. For any such (non-trivial)
path increment, one has necessarily $\alpha \le 1$, for otherwise $\dot X = 0$ and then
$X_{s,t} = \int_s^t \dot X \equiv 0$. In general, however, one has non-trivial elements $\Xi \in C_2^\alpha$ also
for $\alpha > 1$ and indeed this is a crucial property whenever $\Xi_{s,t}$ represents some error
term, since, in this case, whenever $\mathcal{P}$ is a partition of the interval $[0, T]$, one has
$\sum_{[s,t]\in\mathcal{P}} |\Xi_{s,t}| \le C T |\mathcal{P}|^{\alpha-1}$, which goes to $0$ with the mesh of $\mathcal{P}$.
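The last bound follows from $|\Xi_{s,t}| \le C|t - s|^\alpha \le C|\mathcal{P}|^{\alpha-1}|t - s|$ and summing the interval lengths. A quick numerical sketch, assuming the hypothetical error term $\Xi_{s,t} = |t - s|^{3/2}$ and uniform partitions of $[0, 1]$:

```python
def error_sum(xi, partition):
    """Sum of |Xi_{s,t}| over the intervals [s, t] of the partition."""
    return sum(abs(xi(s, t)) for s, t in zip(partition, partition[1:]))

alpha, T, C = 1.5, 1.0, 1.0
xi = lambda s, t: C * abs(t - s) ** alpha   # hypothetical element of C_2^alpha with alpha > 1

def uniform_partition(n):
    return [T * k / n for k in range(n + 1)]

# Total error and the bound C * T * mesh^(alpha - 1) for finer and finer partitions.
sums = {n: error_sum(xi, uniform_partition(n)) for n in (10, 100, 1000)}
bounds = {n: C * T * (T / n) ** (alpha - 1) for n in sums}
```

For this choice the bound is saturated: both quantities behave like $n^{-1/2}$, so the accumulated error vanishes as the mesh tends to zero.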
As usual, we will use the notation A = O(x) if there exists a constant C such
that the bound |A| ≤ Cx holds for every x ≤ 1 (or every x ≥ 1, depending on the
context). Similarly, we write A = o(x) if the constant C can be made arbitrarily
small as x → 0 (or as x → ∞, depending on the context). We will also occasionally
write C for a generic constant that only depends on the data of the problem under
consideration and which can change value from one line to the other without further
notice.
Finally, let us note that the symbols $\mathscr{C}^\alpha$, $\mathscr{D}_X^{\alpha}$ etc. refer to spaces of rough paths and
controlled rough paths, respectively. (Both are introduced in detail in the relevant
sections below.)

1.5 Rough path theory works in infinite dimensions

Unless explicitly otherwise stated, all rough path results in this book are valid (with
no complications in the arguments!) in a general Banach setting. Linear (or bilinear)
maps are now assumed to be continuous and we still use L(...) for the class of
such maps. What is a little more involved is the (classical) construction of a tensor
product as Banach space: one completes the algebraic tensor product, $V \otimes_a \bar V$, under
a compatible tensor norm upon which the resulting space $V \otimes \bar V$ depends. What one
would like, as above, is

$$L(V \times \bar V, W) \cong L(V \otimes \bar V, W)$$

so that for every $A \in L(V \times \bar V, W)$ there exists a unique $\bar A \in L(V \otimes \bar V, W)$
satisfying $\bar A(v \otimes \bar v) = A(v, \bar v)$, and such that $A \leftrightarrow \bar A$ is an isometric isomorphism
between the Banach spaces $L(V \times \bar V, W)$ and $L(V \otimes \bar V, W)$. This is known to be
true (e.g. [Rya02, Thm 2.9]) when $V \otimes \bar V = V \otimes_{\mathrm{proj}} \bar V$, i.e. the closure of $V \otimes_a \bar V$
under the so-called projective tensor norm.
In fact, a continuous embedding (or “canonical injection”) of the form

$$L(V, L(V, W)) \hookrightarrow L(V \otimes V, W)$$

will be enough for our purposes. For the rest of this text we shall thus make the
standing assumption that $V \otimes V$ has been equipped with a compatible tensor norm
that has this property. In many situations of interest the space $V$ is just a copy of $\mathbb{R}^m$
and then this is trivially true. In the existing literature, such aspects are discussed in
[LCL07, p19-20], [LQ02, p28,111].
Chapter 2
The space of rough paths

Abstract We define the space of (Hölder continuous) rough paths, as well as the
subspace of “geometric” rough paths which preserve the usual rules of calculus. The
latter can be interpreted in a natural way as paths with values in a certain nilpotent
Lie group. At the end of the chapter, we give a short discussion showing how these
definitions should be generalized to treat paths of arbitrarily low regularity.

2.1 Basic definitions

In this section, we give a practical definition of the space of Hölder continuous

rough paths. Our choice of Hölder spaces is chiefly motivated by our hope that most
readers will already be familiar with the classical Hölder spaces from real analysis.
We could in the sequel have replaced “α-Hölder continuous” by “finite p-variation”
for p = 1/α in many statements. This choice would also have been quite natural,
due to the fact that one of our primary goals will be to give meaning to integrals
of the form $\int f(X)\,dX$ or solutions to controlled differential equations of the form
$dY = f(Y)\,dX$ for rough paths $X$. The value of such an integral / solution does not
depend on the parametrisation of X, which dovetails nicely with the fact that the
p-variation of a function is also independent of its parametrisation. This motivated its
choice in the original development of the theory. In some other applications however
(like the solution theory to rough stochastic partial differential equations developed
in [Hai11b, HW13, Hai13] and more generally the theory of regularity structures
[Hai14c]), parametrisation-independence is lost and the choice of Hölder norms is
more natural.
A rough path on an interval $[0, T]$ with values in a Banach space $V$ then consists
of a continuous function $X : [0, T] \to V$, as well as a continuous “second order
process” $\mathbb{X} : [0, T]^2 \to V \otimes V$, subject to certain algebraic and analytical conditions.
Regarding the former, the behaviour of iterated integrals, such as (2.2) below, suggests
imposing the algebraic relation (“Chen’s relation”),

$$\mathbb{X}_{s,t} - \mathbb{X}_{s,u} - \mathbb{X}_{u,t} = X_{s,u} \otimes X_{u,t} , \tag{2.1}$$

which we assume to hold for every triple of times $(s, u, t)$. Since $X_{t,t} = 0$, it
immediately follows (take $s = u = t$) that we also have $\mathbb{X}_{t,t} = 0$ for every $t$. As
already mentioned in the introduction, one should think of $\mathbb{X}$ as postulating the value
of the quantity
$$\int_s^t X_{s,r} \otimes dX_r \overset{\text{def}}{=} \mathbb{X}_{s,t} , \tag{2.2}$$
where we take the right hand side as a definition for the left hand side. (And not
the other way around!) We insist (cf. Exercise 2.7 below) that as a consequence
of (2.1), knowledge of the path $t \mapsto (X_{0,t}, \mathbb{X}_{0,t})$ already determines the entire
second order process $\mathbb{X}$. In this sense, the pair $(X, \mathbb{X})$ is indeed a path, and not
some two-parameter object, although it is often more convenient to consider it
as one. If $X$ is a smooth function and we read (2.2) from right to left, then it is
straightforward to verify (see Exercise 2.6 below) that the relation (2.1) does indeed
hold. Furthermore, one can convince oneself that if $f \mapsto \int f\,dX$ denotes any form
of “integration” which is linear in $f$, has the property that $\int_s^t dX_r = X_{s,t}$, and is
such that $\int_s^t f(r)\,dX_r + \int_t^u f(r)\,dX_r = \int_s^u f(r)\,dX_r$ for any admissible integrand
$f$, and if we use such a notion of “integral” to define $\mathbb{X}$ via (2.2), then (2.1) does
automatically hold. This makes it a very natural postulate in our setting.
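The claim that (2.1) holds when (2.2) is read from right to left can be tested on a piecewise-linear path, for which the Riemann integral is available in closed form: on a linear segment with increment $D$, starting at offset $a = X_{s,t_k}$, the contribution to $\int X_{s,r} \otimes dX_r$ is $a \otimes D + \frac12 D \otimes D$. The sketch below uses plain Python lists; the vertices `pts` are hypothetical and chosen only for illustration.

```python
def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]

def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * x for x in row] for row in A]

def increment(pts, i, j):
    """X_{s,t} = X_t - X_s for s = t_i, t = t_j."""
    return [b - a for a, b in zip(pts[i], pts[j])]

def second_level(pts, i, j):
    """Integral of X_{s,r} (x) dX_r over [t_i, t_j] for the piecewise-linear path through pts."""
    d = len(pts[0])
    total = [[0.0] * d for _ in range(d)]
    for k in range(i, j):
        a = increment(pts, i, k)       # offset X_{s,t_k} at the start of the segment
        D = increment(pts, k, k + 1)   # increment over the segment
        total = mat_add(total, mat_add(outer(a, D), mat_scale(0.5, outer(D, D))))
    return total

def chen_defect(pts, i, m, j):
    """Left hand side minus right hand side of (2.1) at (s, u, t) = (t_i, t_m, t_j)."""
    lhs = mat_add(second_level(pts, i, j),
                  mat_scale(-1.0, mat_add(second_level(pts, i, m),
                                          second_level(pts, m, j))))
    rhs = outer(increment(pts, i, m), increment(pts, m, j))
    return mat_add(lhs, mat_scale(-1.0, rhs))

# Hypothetical piecewise-linear path in R^2.
pts = [(0, 0), (1, 2), (3, 1), (2, -1), (4, 0)]
```

Evaluating `chen_defect(pts, i, m, j)` for any indices $i \le m \le j$ returns the zero matrix, as predicted by Chen's relation.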
Note that the algebraic relations (2.1) are by themselves not sufficient to determine
$\mathbb{X}$ as a function of $X$. Indeed, for any $V \otimes V$-valued function $F$, the substitution
$\mathbb{X}_{s,t} \mapsto \mathbb{X}_{s,t} + F_t - F_s$ leaves the left hand side of (2.1) invariant. We will see later
on how one should interpret such a substitution. It remains to discuss what are the
natural analytical conditions one should impose for $\mathbb{X}$. We are going to assume that
the path $X$ itself is $\alpha$-Hölder continuous, so that $|X_{s,t}| \lesssim |t - s|^\alpha$. The archetype of
an $\alpha$-Hölder continuous function is one which is self-similar with index $\alpha$, so that
$X_{\lambda s,\lambda t} \sim \lambda^\alpha X_{s,t}$.
(We intentionally do not give any mathematical definition of self-similarity here,
just think of $\sim$ as having the vague meaning of “looks like”.) Given (2.2), it is then
very natural to expect $\mathbb{X}$ to also be self-similar, but with $\mathbb{X}_{\lambda s,\lambda t} \sim \lambda^{2\alpha} \mathbb{X}_{s,t}$. This
discussion motivates the following definition of our basic spaces of rough paths.

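Definition 2.1. For $\alpha \in (\frac13, \frac12]$, define the space of $\alpha$-Hölder rough paths (over $V$),
in symbols $\mathscr{C}^\alpha([0, T], V)$, as those pairs $(X, \mathbb{X}) =: \mathbf{X}$ such that

$$\|X\|_\alpha \overset{\text{def}}{=} \sup_{s\ne t\in[0,T]} \frac{|X_{s,t}|}{|t - s|^\alpha} < \infty , \qquad \|\mathbb{X}\|_{2\alpha} \overset{\text{def}}{=} \sup_{s\ne t\in[0,T]} \frac{|\mathbb{X}_{s,t}|}{|t - s|^{2\alpha}} < \infty , \tag{2.3}$$

and such that the algebraic constraint (2.1) is satisfied.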

Remark 2.2. Given an arbitrary path $X \in C^\alpha$ with values in some Banach space $V$ it
is far from obvious that this path can indeed be lifted to a rough path $(X, \mathbb{X}) \in \mathscr{C}^\alpha$.
The Lyons–Victoir extension theorem [LV07] asserts that this can always be done
provided $\alpha \in (\frac13, \frac12)$, with an infinite-dimensional counterexample given in the case
$\alpha = 1/2$. When $\dim V < \infty$, there is no such restriction, see Proposition 13.23
below. In typical applications to stochastic processes, a “canonical” lift is constructed
via probability and one does not rely on the extension theorem.

If one ignores the nonlinear constraint (2.1), there is a natural way to think of
$(X, \mathbb{X})$ as an element in the Banach space $C^\alpha \oplus C_2^{2\alpha}$ of such maps with (semi-)norm
$\|X\|_\alpha + \|\mathbb{X}\|_{2\alpha}$. However, taking into account (2.1) we see that $\mathscr{C}^\alpha$ is not a linear
space, although it is a closed subset of the aforementioned Banach space. We will
need (some sort of) a norm and metric on $\mathscr{C}^\alpha$. The induced “natural” norm on $\mathscr{C}^\alpha$
given by $\|X\|_\alpha + \|\mathbb{X}\|_{2\alpha}$ fails to respect the structure of (2.1) which is homogeneous
with respect to a natural dilatation on $\mathscr{C}^\alpha$, given by $(X, \mathbb{X}) \mapsto (\lambda X, \lambda^2 \mathbb{X})$. This
suggests introducing the $\alpha$-Hölder (homogeneous) rough path norm
$$|\!|\!|\mathbf{X}|\!|\!|_\alpha \overset{\text{def}}{=} \|X\|_\alpha + \sqrt{\|\mathbb{X}\|_{2\alpha}} , \tag{2.4}$$

which, although not a norm in the usual sense of normed linear spaces, is the adequate
concept for the rough path $\mathbf{X} = (X, \mathbb{X})$.
Note also that the quantities defined in (2.3) are merely seminorms since they
vanish for constants. Most importantly, (2.3) leads to a notion of rough path metric
(and then rough path topology).
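The homogeneity of (2.4) under dilatation can be seen on the level of the two seminorms alone. In the following sketch the numerical values of $\|X\|_\alpha$ and $\|\mathbb{X}\|_{2\alpha}$ are hypothetical and only serve to illustrate the scaling for $\lambda > 0$.

```python
import math

def homogeneous_norm(x_alpha, xx_2alpha):
    """|||X|||_alpha = ||X||_alpha + sqrt(||XX||_{2 alpha}), as in (2.4)."""
    return x_alpha + math.sqrt(xx_2alpha)

def dilate(lam, x_alpha, xx_2alpha):
    """Effect of (X, XX) -> (lam X, lam^2 XX) on the two seminorms in (2.3), for lam > 0."""
    return lam * x_alpha, lam ** 2 * xx_2alpha

# Hypothetical seminorm values, for illustration only.
x_alpha, xx_2alpha, lam = 2.0, 9.0, 3.0
plain = homogeneous_norm(x_alpha, xx_2alpha)                 # 2 + 3
scaled = homogeneous_norm(*dilate(lam, x_alpha, xx_2alpha))  # 6 + 9
inhomogeneous = sum(dilate(lam, x_alpha, xx_2alpha))         # 6 + 81
```

The homogeneous norm scales linearly, `scaled == lam * plain`, whereas the sum $\|X\|_\alpha + \|\mathbb{X}\|_{2\alpha}$ does not.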

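Definition 2.3. Given rough paths $\mathbf{X}, \mathbf{Y} \in \mathscr{C}^\alpha([0, T], V)$, we define the (inhomogeneous)
$\alpha$-Hölder rough path metric¹

$$\varrho_\alpha(\mathbf{X}, \mathbf{Y}) := \sup_{s\ne t\in[0,T]} \frac{|X_{s,t} - Y_{s,t}|}{|t - s|^\alpha} + \sup_{s\ne t\in[0,T]} \frac{|\mathbb{X}_{s,t} - \mathbb{Y}_{s,t}|}{|t - s|^{2\alpha}} .$$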

Perhaps the cheapest way to show convergence with respect to this rough path
metric is based on interpolation: in essence, it is enough to establish pointwise
convergence in conjunction with uniform “rough path” bounds of the form (2.3); see
Exercise 2.9. Let us also note that $\mathscr{C}^\alpha([0, T], V)$ thus becomes a complete metric
space; the reader is asked to work out the details in Exercise 2.11.
We conclude this part with two important remarks. First, we can ask ourselves up
to which point the relations (2.1) are already sufficient to determine $\mathbb{X}$. Assume that
we can associate to a given function $X$ two different second order processes $\mathbb{X}$ and
$\bar{\mathbb{X}}$, and set $G_{s,t} = \mathbb{X}_{s,t} - \bar{\mathbb{X}}_{s,t}$. It then follows immediately from (2.1) that

$$G_{s,t} = G_{u,t} + G_{s,u} ,$$

so that in particular $G_{s,t} = G_{0,t} - G_{0,s}$. Since, conversely, we already noted that
setting $\bar{\mathbb{X}}_{s,t} = \mathbb{X}_{s,t} + F_t - F_s$ for an arbitrary continuous function $F$ does not
change the left hand side of (2.1), we conclude that $\mathbb{X}$ is in general determined
only up to the increments of some function $F \in C^{2\alpha}(V \otimes V)$. The choice of $F$
does usually matter and there is in general no obvious canonical choice. However,
there are important examples where such a canonical choice exists and we will see
in Section 10 below that such examples are provided by a large class of Gaussian
processes that in particular include Brownian motion, and more generally fractional
Brownian motion for every Hurst parameter $H > \frac14$.

¹ As was already emphasised, $\mathscr{C}^\alpha$ is not a linear space but is naturally embedded in the normed
space of maps $X, \mathbb{X}$; the definition of $\varrho_\alpha$ makes use of this. While this may not appear intrinsic (the
situation is somewhat similar to using the (restricted) Euclidean metric on $\mathbb{R}^3$ on the 2-sphere), the
ultimate justification is that the Itô map will turn out to be locally Lipschitz continuous in $\varrho_\alpha$.

The second remark is that this construction can possibly be useful only if $\alpha \le \frac12$.
Indeed, if $\alpha > \frac12$, then a canonical choice of $\mathbb{X}$ is given by reading (2.2) from
right to left and interpreting the left hand side by a simple Young integral [You36].
Furthermore, it is clear in this case that $\mathbb{X}$ must be unique, since any additional
increment should be $2\alpha$-Hölder continuous by (2.3), which is of course only possible
if $\alpha \le \frac12$. Let us stress once more however that this is not to say that $\mathbb{X}$ is uniquely
determined by $X$ if the latter is smooth, when it is interpreted as an element of $\mathscr{C}^\alpha$ for
some $\alpha \le \frac12$. Indeed, if $\alpha \le \frac12$, $F$ is any $2\alpha$-Hölder continuous function with values
in $V \otimes V$ and $\mathbb{X}_{s,t} = F_t - F_s$, then the path $(0, \mathbb{X})$ is a perfectly “legal” element of
$\mathscr{C}^\alpha$, even though one cannot get any smoother than the function $0$. The impact of
perturbing $\mathbb{X}$ by some $F \in C^{2\alpha}$ in the context of integration is considered in Example
4.13 below. In Section 5, we shall use this for a (rough-path) understanding of how
exactly Itô and Stratonovich integrals differ.

2.2 The space of geometric rough paths

While (2.1) does capture the most basic (additivity) property that one expects any
decent theory of integration to respect, it does not imply any form of integration by
parts / chain rule. Now, if one looks for a first order calculus setting, such as is valid
in the context of smooth paths or the Stratonovich stochastic calculus, then for any
pair $e_i^*, e_j^*$ of elements in $V^*$, writing $X_t^i = e_i^*(X_t)$ and $\mathbb{X}^{ij}_{s,t} = (e_i^* \otimes e_j^*)(\mathbb{X}_{s,t})$, one
would expect to have the identity
$$\begin{aligned}
\mathbb{X}^{ij}_{s,t} + \mathbb{X}^{ji}_{s,t} \;\text{“=”}\;& \int_s^t X^i_{s,r}\,dX^j_r + \int_s^t X^j_{s,r}\,dX^i_r \\
={}& \int_s^t d(X^i X^j)_r - X^i_s X^j_{s,t} - X^j_s X^i_{s,t} \\
={}& (X^i X^j)_{s,t} - X^i_s X^j_{s,t} - X^j_s X^i_{s,t} \\
={}& X^i_{s,t} X^j_{s,t} ,
\end{aligned}$$
so that the symmetric part of $\mathbb{X}$ is determined by $X$. In other words, for all times $s, t$
we have the “first order calculus” condition
$$\operatorname{Sym}(\mathbb{X}_{s,t}) = \frac12 X_{s,t} \otimes X_{s,t} . \tag{2.5}$$
However, if we take $X$ to be an $n$-dimensional Brownian path and define $\mathbb{X}$ by Itô
integration, then (2.1) still holds, but (2.5) certainly does not.
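A discrete analogue makes the failure of (2.5) for Itô-type integrals explicit. In the sketch below a simple random walk with steps $\pm 1$ stands in for Brownian motion (an assumption made for illustration only), and the second level is defined by left-point sums. A purely algebraic identity then gives $2\operatorname{Sym}(\mathbb{X}) = X \otimes X - \sum_k \Delta_k \otimes \Delta_k$, where $\Delta_k$ are the steps, so that the discrete quadratic variation is exactly the defect in (2.5).

```python
import random

def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]

def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def ito_sum(steps):
    """Left-point sum of X_{0,t_k} (x) (X_{t_{k+1}} - X_{t_k}) for a walk with the given steps."""
    d = len(steps[0])
    pos = [0] * d
    total = [[0] * d for _ in range(d)]
    for step in steps:
        total = mat_add(total, outer(pos, step))
        pos = [p + s for p, s in zip(pos, step)]
    return pos, total

random.seed(0)
n, d = 1000, 2
steps = [[random.choice((-1, 1)) for _ in range(d)] for _ in range(n)]
X, XX = ito_sum(steps)

# Twice the symmetric part of XX, and twice the right hand side of (2.5).
sym2 = [[XX[i][j] + XX[j][i] for j in range(d)] for i in range(d)]
first_order2 = outer(X, X)
# Discrete quadratic variation: sum over k of step_k (x) step_k.
qv = [[sum(s[i] * s[j] for s in steps) for j in range(d)] for i in range(d)]
```

Since each step has entries $\pm 1$, the diagonal of `qv` equals the number of steps `n`, which does not vanish under refinement; this mirrors the Itô correction.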

There are two natural ways to define a set of “geometric” rough paths for which
(2.5) holds. On the one hand, we can define a subspace $\mathscr{C}_g^\alpha \subset \mathscr{C}^\alpha$ by stipulating that
$(X, \mathbb{X}) \in \mathscr{C}_g^\alpha$ if and only if $(X, \mathbb{X}) \in \mathscr{C}^\alpha$ and (2.5) holds for every $s, t$. Note that
$\mathscr{C}_g^\alpha$ is a closed subset of $\mathscr{C}^\alpha$. On the other hand, we have already seen that every
smooth path can be lifted canonically to an element of $\mathscr{C}^\alpha$ by reading the definition
(2.2) from right to left. This choice of $\mathbb{X}$ then obviously satisfies (2.5) and we can
define $\mathscr{C}_g^{0,\alpha}$ as the closure of lifts of smooth paths in $\mathscr{C}^\alpha$. We leave it as an exercise
to the reader to see that smooth paths in the definition of $\mathscr{C}_g^{0,\alpha}$ may be replaced by
piecewise smooth paths or (piecewise) $C^1$ paths without changing the resulting space
of geometric rough paths; see also Exercise 2.12.
One has the obvious inclusion $\mathscr{C}_g^{0,\alpha} \subset \mathscr{C}_g^\alpha$, which turns out to be strict [FV06a].
The situation is similar to the classical situation of the set of $\alpha$-Hölder continuous
functions being strictly larger than the closure of smooth functions under the
$\alpha$-Hölder norm. (Or the set of bounded measurable functions being strictly larger than
$C$, the closure of smooth functions under the supremum norm.) Also similar to the
case of classical Hölder spaces, one has the converse inclusion $\mathscr{C}_g^\beta \subset \mathscr{C}_g^{0,\alpha}$ whenever
$\beta > \alpha$, see Exercise 2.14. Let us finally mention that non-geometric rough paths
can always be embedded in a space of geometric rough paths at the expense of
adding new components; this is made precise in Exercise 2.14 and was systematically
explored in [HK12].

2.3 Rough paths as Lie-group valued paths

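We now present a very fruitful interpretation of rough paths, at least in finite dimensions,
say $V = \mathbb{R}^d$. To this end, consider $X : [0, T] \to \mathbb{R}^d$, $\mathbb{X} : [0, T]^2 \to \mathbb{R}^d \otimes \mathbb{R}^d$
subject to (2.1) and define (with $X_{s,t} = X_t - X_s$ as usual)

$$\mathbf{X}_{s,t} := (1, X_{s,t}, \mathbb{X}_{s,t}) \in \mathbb{R} \oplus \mathbb{R}^d \oplus (\mathbb{R}^d \otimes \mathbb{R}^d) \overset{\text{def}}{=} T^{(2)}(\mathbb{R}^d) . \tag{2.6}$$

The space $T^{(2)}(\mathbb{R}^d)$ has an obvious (“component-wise”) vector space structure. More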
interestingly, for our purposes, it is a non-commutative algebra with unit element
(1, 0, 0) under

(a, b, c) ⊗ (a0 , b0 , c0 ) = (aa0 , ab0 + a0 b, ac0 + a0 c + b ⊗ b0 ) ,

def

also known as truncated tensor algebra. This multiplicative structure is very well adapted to our needs since (2.1), combined with the obvious identity $X_{s,t} = X_{s,u} + X_{u,t}$, means precisely that (again, called “Chen’s relation”)

$$\mathbf{X}_{s,t} = \mathbf{X}_{s,u} \otimes \mathbf{X}_{u,t}.$$

Set $T_a^{(2)}(\mathbb{R}^d) = \{(a, b, c) : b \in \mathbb{R}^d, c \in \mathbb{R}^d \otimes \mathbb{R}^d\}$. As suggested in (2.6), the (affine) subspace $T_1^{(2)}(\mathbb{R}^d)$ will play a special role for us. We remark that each of its elements has an explicit inverse given by

$$(1, b, c) \otimes (1, -b, -c + b \otimes b) = (1, -b, -c + b \otimes b) \otimes (1, b, c) = (1, 0, 0), \tag{2.7}$$

so that $T_1^{(2)}(\mathbb{R}^d)$ is a Lie group. It follows that $\mathbf{X}_{s,t} = \mathbf{X}_{0,s}^{-1} \otimes \mathbf{X}_{0,t}$ are the natural increments of the group-valued path $t \mapsto \mathbf{X}_{0,t}$.
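The algebra just described is easy to experiment with. The following is a minimal sketch, not taken from the text: elements of the truncated tensor algebra are stored as triples (scalar, list, nested list), and the names `outer`, `tmul`, `tinv` and `lift_segment` are our own illustrative choices. It checks the inverse formula (2.7) and Chen's relation on a hypothetical path in R^2 made of two straight segments, each lifted by the iterated integral of a straight line.

```python
# Elements of T^(2)(R^d) are triples (a, b, c): a scalar, b a list of length d,
# c a d x d nested list standing in for R^d (x) R^d.

def outer(u, v):
    """Tensor product u (x) v of two vectors, as a d x d matrix."""
    return [[ui * vj for vj in v] for ui in u]

def tmul(x, y):
    """Truncated tensor product (a, b, c) (x) (a', b', c')."""
    a, b, c = x
    a2, b2, c2 = y
    d = len(b)
    bb = outer(b, b2)
    level1 = [a * b2[i] + a2 * b[i] for i in range(d)]
    level2 = [[a * c2[i][j] + a2 * c[i][j] + bb[i][j] for j in range(d)]
              for i in range(d)]
    return (a * a2, level1, level2)

def tinv(x):
    """Inverse of (1, b, c) as in (2.7): (1, -b, -c + b (x) b)."""
    a, b, c = x
    assert a == 1
    d = len(b)
    bb = outer(b, b)
    return (1, [-bi for bi in b],
            [[-c[i][j] + bb[i][j] for j in range(d)] for i in range(d)])

def lift_segment(v):
    """Lift (1, v, v (x) v / 2) of a straight segment with increment v."""
    return (1, list(v), [[0.5 * e for e in row] for row in outer(v, v)])

# Hypothetical example: two straight segments in R^2.
x_su = lift_segment([1.0, 2.0])
x_ut = lift_segment([3.0, -1.0])
x_st = tmul(x_su, x_ut)               # Chen's relation defines the lift over [s, t]
recovered = tmul(tinv(x_su), x_st)    # should give back the increment over [u, t]
```

Here `recovered` coincides with `x_ut`, in line with the statement that increments are obtained by multiplying with the inverse from the left.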
Identifying $1, b, c$ with elements $(1, 0, 0), (0, b, 0), (0, 0, c) \in T^{(2)}(\mathbb{R}^d)$, we may write $(1, b, c) = 1 + b + c$. Computations using “formal power series” are then possible by considering the standard basis $\{e_i : 1 \le i \le d\} \subset \mathbb{R}^d$ as non-commutative variables. The usual power series $(1 + x)^{-1} = 1 - x + x^2 - \dots$ then leads to

$$(1 + b + c)^{-1} = 1 - (b + c) + (b + c) \otimes (b + c) = 1 - b - c + b \otimes b,$$

and confirms the inverse of $1 + b + c$ given in (2.7). The usual power series also suggest

$$\log(1 + b + c) \overset{\text{def}}{=} b + c - \tfrac{1}{2}\, b \otimes b, \qquad \exp(b + c) \overset{\text{def}}{=} 1 + b + c + \tfrac{1}{2}\, b \otimes b \tag{2.8}$$

and effectively allow us to identify $T_0^{(2)}(\mathbb{R}^d) \cong \mathbb{R}^d \oplus \mathbb{R}^{d \times d}$, with $T_1^{(2)}(\mathbb{R}^d) = \exp(\mathbb{R}^d \oplus \mathbb{R}^{d \times d})$. A Lie algebra structure is defined on $T_0^{(2)}(\mathbb{R}^d)$ by

$$[b + c, b' + c'] = b \otimes b' - b' \otimes b,$$

which is nothing but the commutator associated to the non-commutative product $\otimes$. Denote by $\mathfrak{g}^{(2)} \subset T_0^{(2)}(\mathbb{R}^d)$ the sub-algebra generated by elements of the form $(0, b, 0)$. One can check that, as a Lie algebra, $\mathfrak{g}^{(2)} = \mathbb{R}^d \oplus \mathfrak{so}(d)$, i.e. the linear span of $(e_i : 1 \le i \le d)$ and $(e_{ij} : 1 \le i < j \le d)$, where $e_{ij} = [e_i, e_j]$. The Lie bracket of $e_{ij}$ with any other element in $\mathfrak{g}^{(2)}$ vanishes. Since $\mathfrak{g}^{(2)}$ is closed under the operation $[\cdot, \cdot]$, its image under the exponential map, $G^{(2)}(\mathbb{R}^d) := \exp(\mathfrak{g}^{(2)})$, is a Lie subgroup of $T_1^{(2)}(\mathbb{R}^d)$.
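To make the exponential and logarithm concrete, here is a small sketch under the same ad hoc representation as before (pairs and triples of Python lists; the function names are ours, not the book's). It verifies that log inverts exp, and that exponentiating an element whose second level is antisymmetric yields a group element with symmetric part equal to half the tensor square of its first level.

```python
def outer(u, v):
    """Tensor product u (x) v of two vectors, as a d x d matrix."""
    return [[ui * vj for vj in v] for ui in u]

def exp2(b, c):
    """exp(b + c) = 1 + b + c + (b (x) b) / 2, as in (2.8)."""
    d = len(b)
    bb = outer(b, b)
    return (1, list(b),
            [[c[i][j] + 0.5 * bb[i][j] for j in range(d)] for i in range(d)])

def log2(x):
    """log(1 + b + c) = b + c - (b (x) b) / 2; returns the pair (b, c)."""
    _, b, c = x
    d = len(b)
    bb = outer(b, b)
    return (list(b),
            [[c[i][j] - 0.5 * bb[i][j] for j in range(d)] for i in range(d)])

def bracket(b, b2):
    """Lie bracket [b, b'] = b (x) b' - b' (x) b of two level-1 elements."""
    d = len(b)
    return [[b[i] * b2[j] - b2[i] * b[j] for j in range(d)] for i in range(d)]

def sym(c):
    """Symmetric part of a d x d matrix."""
    d = len(c)
    return [[0.5 * (c[i][j] + c[j][i]) for j in range(d)] for i in range(d)]

# Hypothetical element of the Lie algebra: c is a multiple of [e_1, e_2].
b = [2.0, -1.0]
c = [[0.0, 3.0], [-3.0, 0.0]]
x = exp2(b, c)
e12 = bracket([1, 0], [0, 1])
```

In this example `sym(x[2])` equals half of `outer(b, b)`, which is the algebraic constraint characterising the group.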

We call G(2) (Rd ) the step-2 nilpotent Lie group (with d generators). The algebraic
constraint (2.5) then translates precisely to the statement that the path $t \mapsto \mathbf{X}_{0,t}$ (and then the increments $\mathbf{X}_{s,t}$) takes values in $G^{(2)}(\mathbb{R}^d)$.
Without going into too much detail here, $G^{(2)}(\mathbb{R}^d)$ admits a natural homogeneous “Carnot-Carathéodory norm” $\|\cdot\|_C$ with the property, for $x = \exp(b + c)$,

$$\|x\|_C \asymp |b| + |c|^{1/2}, \tag{2.9}$$

where $\asymp$ indicates Lipschitz equivalence (with constants that may depend on the dimension d). A left-invariant metric $d_C$, known as the Carnot-Carathéodory metric, is induced by $\|\cdot\|_C$ so that

$$d_C(\mathbf{X}_s, \mathbf{X}_t) = \|\mathbf{X}_{s,t}\|_C \asymp |X_{s,t}| + |\mathbb{X}_{s,t}|^{1/2}. \tag{2.10}$$

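As a matter of fact, defining the “truncated signature” of a smooth path $\gamma : [0, 1] \to \mathbb{R}^d$ by

$$G^{(2)}(\mathbb{R}^d) \ni S^{(2)}(\gamma) = \Big(1, \int_0^1 d\gamma(t), \int_0^1 \int_0^t d\gamma(s) \otimes d\gamma(t)\Big),$$

we have the identity

$$\|x\|_C \overset{\text{def}}{=} \inf\Big\{ \int_0^1 |\dot{\gamma}(t)|\, dt \; : \; \gamma \in C^1([0, 1], \mathbb{R}^d), \; S^{(2)}(\gamma) = x \Big\}.$$

Using the homogeneous rough path norm introduced in (2.4), taking into account (2.3), we thus have

$$|||\mathbf{X}|||_{\alpha;[0,T]} \asymp \sup_{s,t \in [0,T]} \frac{d_C(\mathbf{X}_s, \mathbf{X}_t)}{|t - s|^{\alpha}},$$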

and in particular the following appealing characterisation of geometric rough paths.

Proposition 2.4. Let $\alpha \in (\tfrac{1}{3}, \tfrac{1}{2}]$. The following two statements are equivalent:

1. One has $(X, \mathbb{X}) \in \mathscr{C}_g^{\alpha}$, i.e. it satisfies (2.1), (2.3) and (2.5).
2. The path $t \mapsto \mathbf{X}_t = 1 + X_{0,t} + \mathbb{X}_{0,t}$ takes values in $G^{(2)}(\mathbb{R}^d)$ and is α-Hölder continuous with respect to the distance $d_C$.

Without going into full detail, the above proposition, combined with the geodesic
nature of the space G(2) (Rd ), shows that geometric rough paths are essentially limits
of smooth paths (“geodesic approximations” in the terminology of [FV10b]) in the
rough path metric.

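Proposition 2.5. Let $\beta \in (\tfrac{1}{3}, \tfrac{1}{2}]$. For every $(X, \mathbb{X}) \in \mathscr{C}_g^{\beta}([0, T], \mathbb{R}^d)$, there exists a sequence of smooth paths $X^n : [0, T] \to \mathbb{R}^d$ such that

$$(X^n, \mathbb{X}^n) \overset{\text{def}}{=} \Big(X^n, \int_0^{\cdot} X^n_{0,t} \otimes dX^n_t\Big) \to (X, \mathbb{X}) \quad \text{uniformly on } [0, T]$$

with uniform rough path bounds $\sup_n \|X^n\|_{\beta} + \|\mathbb{X}^n\|_{2\beta} < \infty$. By interpolation, convergence holds in α-Hölder rough path metric for any $\alpha \in (\tfrac{1}{3}, \beta)$, namely

$$\lim_{n \to \infty} \varrho_{\alpha}\big((X^n, \mathbb{X}^n), (X, \mathbb{X})\big) = 0.$$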

2.4 Geometric rough paths of low regularity

The interpretation given above gives a strong hint on how to construct geometric rough paths with α-Hölder regularity for $\alpha \le \tfrac{1}{3}$: setting $p = \lfloor 1/\alpha \rfloor$, one defines the p-step truncated tensor algebra $T^{(p)}(\mathbb{R}^d)$ by

$$T^{(p)}(\mathbb{R}^d) \overset{\text{def}}{=} \mathbb{R} \oplus \bigoplus_{n=1}^{p} (\mathbb{R}^d)^{\otimes n}.$$

We can construct a Lie group $G^{(p)}(\mathbb{R}^d) \subset T^{(p)}(\mathbb{R}^d)$ as before, by setting $G^{(p)} = \exp(\mathfrak{g}^{(p)})$, where $\mathfrak{g}^{(p)} \subset T_0^{(p)}(\mathbb{R}^d)$ is the Lie algebra generated by elements of the form $(0, b, 0, \dots, 0)$. Again, one can construct a “homogeneous Carnot-Carathéodory metric” on $G^{(p)}$, with a property similar to (2.9), but with the contribution coming from the kth level scaling like $|\cdot|^{1/k}$.
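For readers who like to experiment, the p-step construction can be sketched in a few lines. The representation below (a dictionary mapping words, i.e. tuples of coordinate indices, to coefficients) and the names `exp_level1`, `tmul` and `signature` are our own choices for illustration, not notation from the text. The sketch computes the step-p signature of a piecewise-linear path as a product of truncated exponentials of its increments.

```python
from math import factorial
from itertools import product

def exp_level1(b, p):
    """Truncated exponential of a level-1 element b in T^(p)(R^d):
    the coefficient of the word (i1, ..., ik) is b[i1] * ... * b[ik] / k!."""
    d = len(b)
    out = {}
    for k in range(p + 1):
        for word in product(range(d), repeat=k):
            coeff = 1.0
            for i in word:
                coeff *= b[i]
            out[word] = coeff / factorial(k)
    return out

def tmul(x, y, p):
    """Product in T^(p)(R^d): concatenate words, drop those longer than p."""
    out = {}
    for u, cu in x.items():
        for v, cv in y.items():
            if len(u) + len(v) <= p:
                out[u + v] = out.get(u + v, 0.0) + cu * cv
    return out

def signature(increments, p):
    """Step-p signature of the piecewise-linear path with given increments."""
    d = len(increments[0])
    sig = exp_level1([0.0] * d, p)        # unit element of the algebra
    for v in increments:
        sig = tmul(sig, exp_level1(v, p), p)
    return sig

# Hypothetical path in R^2: one unit right, one unit up, one unit left.
sig = signature([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], p=3)
```

For this path the two mixed second-level entries are 1 and -1, so they sum to the product of the first-level entries, as the geometric (first order calculus) constraint requires.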
A geometric α-Hölder rough path for arbitrary $\alpha \in (0, \tfrac{1}{2}]$ is then given by a function $t \mapsto \mathbf{X}_t \in G^{(p)}(\mathbb{R}^d)$ with $p = \lfloor 1/\alpha \rfloor$, which is α-Hölder continuous with respect to the corresponding distance $d_C$. It is actually also possible to extend this construction to the non-geometric setting. This is algebraically somewhat more involved and requires keeping track of more than just the “iterated integrals” of the
rough path X, see [Gub10]. Again, as in Exercise 2.14, it is possible to embed spaces
of non-geometric rough paths of low regularity into a suitable space of geometric
rough paths. This construction however is also much more involved in the case of
very low regularities and can be found in [HK12].

2.5 Exercises

Exercise 2.6. Let $X$ be a smooth $V$-valued path and let $\mathbb{X}$ be given by the left hand side of (2.2), namely

$$\mathbb{X}_{s,t} = \int_s^t X_{s,r} \otimes \dot{X}_r \, dr.$$

a) Show that $\mathbb{X}$ does indeed satisfy Chen’s relation (2.1).
b) Consider the collection of all iterated integrals over [s, t], viewed as element in the tensor algebra over V, say

$$\mathbf{X}_{s,t} := \Big(1, X_{s,t}, \mathbb{X}_{s,t}, \int_{s < u_1 < u_2 < u_3 < t} dX_{u_1} \otimes dX_{u_2} \otimes dX_{u_3}, \dots\Big) \in T((V)), \tag{2.11}$$

and show that the following general form of Chen’s relation holds,

$$\mathbf{X}_{s,t} = \mathbf{X}_{s,u} \otimes \mathbf{X}_{u,t}.$$

Hint: It suffices to consider the projection of $\mathbf{X}_{s,t}$ to $V^{\otimes n}$, for an arbitrary integer n, given by the n-fold integral of $dX_{u_1} \otimes \cdots \otimes dX_{u_n}$ over the simplex $\{s < u_1 < \cdots < u_n < t\}$.

Exercise 2.7. It is common to define $\mathbb{X}$ on $\Delta_{0,T} := \{(s, t) : 0 \le s \le t \le T\}$ rather than $[0, T]^2$. There is no difference however: if $\mathbb{X}_{s,t}$ is only defined for s ≤ t, show that the relation (2.1) already determines the values of $\mathbb{X}_{s,t}$ for s > t and give an explicit formula. In fact, show that knowledge of the path $t \mapsto (X_{0,t}, \mathbb{X}_{0,t})$ already determines the entire second order process $\mathbb{X}$. In this sense $(X, \mathbb{X})$ is indeed a path, and not some two-parameter object.

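Exercise 2.8. Consider $s \equiv \tau_0 < \tau_1 < \cdots < \tau_N \equiv t$. Show that (2.1) implies

$$\mathbb{X}_{s,t} = \sum_{0 \le i < N} \mathbb{X}_{\tau_i, \tau_{i+1}} + \sum_{0 \le j < i < N} X_{\tau_j, \tau_{j+1}} \otimes X_{\tau_i, \tau_{i+1}} = \sum_{i=0}^{N-1} \big( \mathbb{X}_{\tau_i, \tau_{i+1}} + X_{s, \tau_i} \otimes X_{\tau_i, \tau_{i+1}} \big). \tag{2.12}$$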
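As a sanity check (not a solution), identity (2.12) can be tested numerically on a path whose iterated integrals are known in closed form. The sketch below uses the hypothetical smooth path X_t = (t, t^2) in R^2; the function names and the choice of partition are ours.

```python
def increment(s, t):
    """X_{s,t} for the smooth example path X_t = (t, t^2)."""
    return [t - s, t * t - s * s]

def second_level(s, t):
    """Closed-form iterated integrals of X_{s,r} (x) dX_r over [s, t]
    for X_t = (t, t^2)."""
    x12 = 2.0 * ((t**3 - s**3) / 3.0 - s * (t**2 - s**2) / 2.0)
    x21 = (t**3 - s**3) / 3.0 - s**2 * (t - s)
    return [[0.5 * (t - s) ** 2, x12], [x21, 0.5 * (t**2 - s**2) ** 2]]

def chen_sum(partition):
    """Right-hand side of (2.12) along s = tau_0 < ... < tau_N = t."""
    s = partition[0]
    total = [[0.0, 0.0], [0.0, 0.0]]
    for lo, hi in zip(partition, partition[1:]):
        xx = second_level(lo, hi)
        left = increment(s, lo)       # X_{s, tau_i}
        step = increment(lo, hi)      # X_{tau_i, tau_{i+1}}
        for i in range(2):
            for j in range(2):
                total[i][j] += xx[i][j] + left[i] * step[j]
    return total

partition = [0.2, 0.5, 0.9, 1.3, 2.0]
lhs = second_level(partition[0], partition[-1])
rhs = chen_sum(partition)
max_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Up to floating point error, `lhs` and `rhs` agree for any choice of partition.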

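Exercise 2.9 (Interpolation). Assume that $\mathbf{X}^n \in \mathscr{C}^{\beta}$, for 1/3 < α < β, with uniform bounds

$$\sup_n \|X^n\|_{\beta} < \infty \quad \text{and} \quad \sup_n \|\mathbb{X}^n\|_{2\beta} < \infty$$

and uniform convergence $X^n_{s,t} \to X_{s,t}$ and $\mathbb{X}^n_{s,t} \to \mathbb{X}_{s,t}$, i.e. uniformly over s, t ∈ [0, T]. Show that this implies $\mathbf{X} \in \mathscr{C}^{\beta}$ and

$$\varrho_{\alpha}(\mathbf{X}^n, \mathbf{X}) \to 0.$$

Show furthermore that the assumption of uniform convergence can be weakened to pointwise convergence:

$$\forall t \in [0, T] : \quad X^n_{0,t} \to X_{0,t} \quad \text{and} \quad \mathbb{X}^n_{0,t} \to \mathbb{X}_{0,t}.$$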

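Solution 2.10. Using the uniform bounds and pointwise convergence, there exists C such that uniformly in s, t

$$|X_{s,t}| = \lim_n |X^n_{s,t}| \le C|t - s|^{\beta}, \qquad |\mathbb{X}_{s,t}| = \lim_n |\mathbb{X}^n_{s,t}| \le C|t - s|^{2\beta}.$$

It readily follows that $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^{\beta}$. In combination with the assumed uniform convergence, there exists $\varepsilon_n \to 0$, such that, uniformly in s, t,

$$|X^n_{s,t} - X_{s,t}| \le \varepsilon_n, \qquad |X^n_{s,t} - X_{s,t}| \le 2C|t - s|^{\beta},$$
$$|\mathbb{X}^n_{s,t} - \mathbb{X}_{s,t}| \le \varepsilon_n, \qquad |\mathbb{X}^n_{s,t} - \mathbb{X}_{s,t}| \le 2C|t - s|^{2\beta}.$$

By geometric interpolation ($a \wedge b \le a^{1-\theta} b^{\theta}$ when a, b > 0 and 0 < θ < 1) with θ = α/β we have

$$|X^n_{s,t} - X_{s,t}| \lesssim \varepsilon_n^{1 - \alpha/\beta} |t - s|^{\alpha}, \qquad |\mathbb{X}^n_{s,t} - \mathbb{X}_{s,t}| \lesssim \varepsilon_n^{1 - \alpha/\beta} |t - s|^{2\alpha},$$

and the desired $\varrho_{\alpha}$-convergence follows.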
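The interpolation step can be illustrated numerically. The following sketch (all names and parameter values are our own, chosen only for illustration) compares the worst case of min(ε, 2C h^β) / h^α over a grid of h in (0, 1] with the interpolated bound ε^(1-θ) (2C)^θ for θ = α/β.

```python
def interpolated_bound(eps, C, alpha, beta):
    """Bound on |D| / h^alpha implied by |D| <= eps and |D| <= 2 C h^beta,
    via min(a, b) <= a^(1 - theta) * b^theta with theta = alpha / beta."""
    theta = alpha / beta
    return eps ** (1.0 - theta) * (2.0 * C) ** theta

def worst_ratio(eps, C, alpha, beta, grid=1000):
    """Maximum over grid points h in (0, 1] of min(eps, 2 C h^beta) / h^alpha."""
    return max(min(eps, 2.0 * C * h ** beta) / h ** alpha
               for h in (k / grid for k in range(1, grid + 1)))

alpha, beta, C = 0.4, 0.5, 1.0
eps_values = (1e-1, 1e-2, 1e-3)
bounds = [interpolated_bound(eps, C, alpha, beta) for eps in eps_values]
ratios = [worst_ratio(eps, C, alpha, beta) for eps in eps_values]
```

The entries of `ratios` never exceed the corresponding entries of `bounds`, and the bounds decrease to zero with ε, which is the mechanism behind the convergence in the weaker α-Hölder metric.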

It remains to weaken the assumption to pointwise convergence. By Chen’s relation, pointwise convergence of $\mathbf{X}^n_{0,t}$ for all t actually implies pointwise convergence of $\mathbf{X}^n_{s,t}$ for all s, t. We claim that, thanks to the uniform Hölder bounds, this implies uniform convergence. Indeed, given ε > 0, pick a (finite) dissection D of [0, T] with small enough mesh so that $C|D|^{\beta} < \varepsilon/8$. Given s, t ∈ [0, T] write $\hat{s}, \hat{t}$ for the nearest points in D and note that

$$|X_{s,t} - X^n_{s,t}| \le |X_{\hat{s},\hat{t}} - X^n_{\hat{s},\hat{t}}| + |X_{s,\hat{s}}| + |X^n_{s,\hat{s}}| + |X_{t,\hat{t}}| + |X^n_{t,\hat{t}}| \le |X_{\hat{s},\hat{t}} - X^n_{\hat{s},\hat{t}}| + \varepsilon/2.$$

By picking n large enough, $|X_{\hat{s},\hat{t}} - X^n_{\hat{s},\hat{t}}|$ can also be bounded by ε/2, uniformly over the (finitely many!) points in D, so that $X^n \to X$ uniformly. Although the second level is handled similarly, the non-additivity of $(s, t) \mapsto \mathbb{X}_{s,t}$ requires some extra care, cf. (2.1). For simplicity of notation only, we assume $s < \hat{s} < t = \hat{t}$ so that

$$|\mathbb{X}_{s,t} - \mathbb{X}^n_{s,t}| \le |\mathbb{X}_{s,\hat{s}} - \mathbb{X}^n_{s,\hat{s}}| + |\mathbb{X}_{\hat{s},t} - \mathbb{X}^n_{\hat{s},t}| + |X_{s,\hat{s}} \otimes X_{\hat{s},t} - X^n_{s,\hat{s}} \otimes X^n_{\hat{s},t}|.$$

It remains to write the last summand as $|X_{s,\hat{s}} \otimes (X_{\hat{s},t} - X^n_{\hat{s},t}) - (X^n_{s,\hat{s}} - X_{s,\hat{s}}) \otimes X^n_{\hat{s},t}|$ and to repeat the same reasoning as in the first level.
Exercise 2.11. Check that C α ([0, T ], V ) is a complete metric space under the metric
|X0 − Y0 | + %α (X, Y).
Assuming that dim V ≥ 1 to avoid trivialities, show that C α ([0, T ], V ) is not
separable. Hint: Reduce to the case of scalar Hölder paths on [0, 1]; non-separability
of such spaces is well known.
Exercise 2.12. a) Define the space of geometric (α-Hölder) rough paths

Cg0,α ([0, T ], V ) ⊂ C α ([0, T ], V )

as the $\varrho_{\alpha}$-closure of smooth paths (enhanced with their iterated Riemann integrals)
in C α ([0, T ], V ). Assuming that V is separable, show that Cg0,α ([0, T ], V ) is also
separable.
b) Show that for every geometric 1/2-Hölder rough path, $\mathbf{X} \in \mathscr{C}_g^{0,1/2}$, $\mathbb{X}$ is necessarily the iterated Riemann-Stieltjes integral of the underlying path $X \in \mathcal{C}^{0,1/2}$. (Attention, this does not mean that for every $X \in \mathcal{C}^{0,1/2}$ the iterated Riemann-Stieltjes integral exists! A counterexample is found in [FV10b, Ex.9.14 (iii)].)
Solution 2.13. Let $Q$ be a countable, dense subset of $V$ and consider the space $\Lambda_n$ of paths which are piecewise linear between level-n dyadic rationals $D_n := \{kT/2^n : 0 \le k \le 2^n\}$, and, at level-n dyadic points, take values in $Q$. Clearly $\Lambda = \bigcup_n \Lambda_n$ is countable, for each $\Lambda_n$ is in one-to-one correspondence with the $(2^n + 1)$-fold Cartesian product of $Q$. It is easy to see that each smooth $X$ is the limit in $C^1$ of some sequence $(X^n) \subset \Lambda$. Indeed, one can take $X^n$ to be the piecewise linear dyadic approximation, modified such that $X^n|_{D_n}$ takes values in $Q$ and such that $|(X^n - X)|_{D_n}| < 1/n$. By continuity of the map $X \in C^1 \mapsto (X, \int X \otimes dX) \in \mathscr{C}^{\alpha}$ in the respective topologies (we could even take α = 1), we have more than enough to assert that every lifted smooth path, $(X, \int X \otimes dX)$, is the $\varrho_{\alpha}$-limit of lifted paths in $\Lambda$. It is then easy to see that every $\varrho_{\alpha}$-limit point of lifted smooth paths is also the $\varrho_{\alpha}$-limit of lifted paths in $\Lambda$.
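
Turning to the second part of the question, it is not hard to see that

$$\mathscr{C}_g^{0,\alpha} \subset \Big\{ \mathbf{X} \in \mathscr{C}^{\alpha} : \sup_{s,t : |t-s| < \varepsilon} \frac{|X_{s,t}|}{|t - s|^{\alpha}} \to 0, \; \sup_{s,t : |t-s| < \varepsilon} \frac{|\mathbb{X}_{s,t}|}{|t - s|^{2\alpha}} \to 0 \text{ as } \varepsilon \to 0 \Big\}.$$

Consider now the case α = 1/2 and a dissection $\{s = \tau_0 < \tau_1 < \cdots < \tau_N = t\}$ with mesh ≤ ε. It follows from Chen’s relation (2.1) that

$$\Big| \mathbb{X}_{s,t} - \sum_{0 \le i < N} X_{s,\tau_i} \otimes X_{\tau_i,\tau_{i+1}} \Big| = \Big| \sum_{0 \le i < N} \mathbb{X}_{\tau_i,\tau_{i+1}} \Big| \le C(\varepsilon) \sum_{0 \le i < N} |\tau_{i+1} - \tau_i|^{2\alpha} \le T C(\varepsilon),$$

where 2α = 1 and, by the inclusion above, $C(\varepsilon) \to 0$ as ε → 0. It follows that $\mathbb{X}_{s,t}$ is the limit of the above Riemann-Stieltjes sum.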

Exercise 2.14. One can also consider “non-geometric” separable subspaces of C α .

Consider 1/3 < α < 1/2 (in view of the previous exercise there is no point in taking
α = 1/2 here) and define

C 0,α ([0, T ], V ) ⊂ C α ([0, T ], V )

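as the $\varrho_{\alpha}$-closure of smooth paths and their iterated integrals plus smooth V ⊗ V-valued path increments. Show that

$$\mathscr{C}^{0,\alpha}([0, T], V) \cong \mathscr{C}_g^{0,\alpha}([0, T], V) \oplus \mathcal{C}^{0,2\alpha}([0, T], V \otimes V).$$

Define the (non-separable) space of weak geometric α-Hölder rough paths, $\mathscr{C}_g^{\alpha}$, as those elements $\mathbf{X} \in \mathscr{C}^{\alpha}$ for which $2\, \mathrm{Sym}(\mathbb{X}) = X \otimes X$. Show that $\mathscr{C}_g^{0,\alpha}$ is a closed subspace of $\mathscr{C}_g^{\alpha}$ and that

$$\mathscr{C}^{\alpha}([0, T], V) \cong \mathscr{C}_g^{\alpha}([0, T], V) \oplus \mathcal{C}^{2\alpha}([0, T], V \otimes V).$$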

The point of this exercise is that non-geometric rough path spaces can effectively be
embedded in geometric rough path spaces.

Exercise 2.15. At least when dim V < ∞, there is not much difference between
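$\mathscr{C}_g^{0,\alpha} \subset \mathscr{C}_g^{\alpha}$ in the following sense. Let $\tfrac{1}{3} < \alpha < \beta \le \tfrac{1}{2}$. By using the (non-trivial!) fact that every $\mathbf{X} \in \mathscr{C}_g^{\beta}$ can be approximated uniformly by smooth paths, with uniform β-Hölder rough path bounds, use interpolation to see that $\mathbf{X} \in \mathscr{C}_g^{0,\alpha}$; in fact, show that one has the compact embedding

$$\mathscr{C}_g^{\beta} \hookrightarrow \mathscr{C}_g^{0,\alpha}.$$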

Show a similar statement for non-geometric rough path spaces.

Solution 2.16. $\mathscr{C}^{\beta} \subset \mathscr{C}^{0,\alpha}$ (and in fact a continuous embedding) is obvious from the interpolation exercise above. The compactness of the embedding is a consequence of Arzelà–Ascoli (use dim V < ∞). Finally, the extension to non-geometric rough path spaces is fairly straightforward using the embedding into geometric rough path spaces.
Exercise 2.17 (Pure area rough path). Identify $\mathbb{R}^2$ with the complex numbers and consider

$$[0, 1] \ni t \mapsto n^{-1} \exp(2\pi i n^2 t) \equiv X^n.$$

a) Set $\mathbb{X}^n_{s,t} := \int_s^t X^n_{s,r} \otimes dX^n_r$. Show that, for fixed s < t,

$$X^n_{s,t} \to 0, \qquad \mathbb{X}^n_{s,t} \to \pi(t - s) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \tag{2.13}$$

b) Establish the uniform bounds $\sup_n \|X^n\|_{1/2} < \infty$ and $\sup_n \|\mathbb{X}^n\|_{1} < \infty$.

c) Conclude by interpolation that (2.13) takes place in α-Hölder rough path metric $\varrho_{\alpha}$ for any 1/3 < α < 1/2.
Solution 2.18. a) Obviously, $X^n_{s,t} = O(1/n) \to 0$ uniformly in s, t. Then

$$\mathbb{X}^n_{s,t} = \tfrac{1}{2} X^n_{s,t} \otimes X^n_{s,t} + A^n_{s,t} = O(1/n^2) + A^n_{s,t},$$

where $A^n_{s,t} \in \mathfrak{so}(2)$ is the antisymmetric part of $\mathbb{X}^n_{s,t}$. To avoid cumbersome notation, we identify

$$\begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix} \in \mathfrak{so}(2) \leftrightarrow a \in \mathbb{R}.$$

$A^n_{s,t}$ then represents the signed area between the curve $(X^n_r : s \le r \le t)$ and the straight chord from $X^n_t$ to $X^n_s$. (This is a simple consequence of Stokes’ theorem: the exterior derivative of the 1-form $\tfrac{1}{2}(x\, dy - y\, dx)$, which vanishes along straight chords, is the volume form $dx \wedge dy$.) With s < t, $(X^n_r : s \le r \le t)$ makes $\lfloor n^2 (t - s) \rfloor$ full spins around the origin, at radius 1/n. Each full spin contributes area $\pi (1/n)^2$, while the final incomplete spin contributes some area less than $\pi (1/n)^2$. The total signed area, with multiplicity, is thus

$$A^n_{s,t} = \frac{\pi}{n^2} \big( n^2 (t - s) + O(1) \big) = \pi(t - s) + \frac{C_{s,t}}{n^2},$$

where $|C_{s,t}| \le \pi$ uniformly in s, t. It follows that

$$\mathbb{X}^n_{s,t} = \pi(t - s) \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} + O(1/n^2) \tag{2.14}$$

and the claimed uniform convergence follows.

b) The following two estimates for path increments of $n^{-1} \exp(2\pi i n^2 t) \equiv X^n_t$ hold true:

$$|X^n_{s,t}| \le |\dot{X}^n|_{\infty} |t - s| \le 2\pi n |t - s|, \qquad |X^n_{s,t}| \le 2 |X^n|_{\infty} = 2/n.$$

Since $a \wedge b \le \sqrt{ab}$, it immediately follows that

$$|X^n_{s,t}| \le \sqrt{4\pi |t - s|},$$

uniformly in n, s, t. In other words, $\sup_n \|X^n\|_{1/2} < \infty$. The argument for the uniform bounds on $\mathbb{X}^n_{s,t}$ is similar. On the one hand, we have the bound (2.14). On the other hand, we also have

$$|\mathbb{X}^n_{s,t}| = \Big| \iint_{s < u < v < t} \dot{X}^n_u \otimes \dot{X}^n_v \, du\, dv \Big| \le |\dot{X}^n|_{\infty}^2 \frac{|t - s|^2}{2} \le 2\pi^2 n^2 |t - s|^2.$$

The required uniform bound on $\|\mathbb{X}^n\|_{1}$ follows by using (2.14) for $n^2 |t - s| > 1$ and the above bound for $n^2 |t - s| \le 1$.

c) The interpolation argument is left to the reader.
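The pure area phenomenon is easy to observe numerically. The sketch below is only an illustration under our own discretisation choices: it replaces the circle path by a fine piecewise-linear interpolation (`steps_per_turn` points per revolution, a hypothetical parameter) and accumulates the second level over [0, 1] with Chen's relation, using that a straight piece with increment v has second level equal to half the tensor square of v.

```python
import math

def pure_area_lift(n, steps_per_turn=200):
    """First and second level over [0, 1] of X^n_t = exp(2 pi i n^2 t) / n,
    computed for a fine piecewise-linear interpolation via Chen's relation."""
    total_steps = n * n * steps_per_turn
    x = [0.0, 0.0]                        # running increment X_{0,t}
    xx = [[0.0, 0.0], [0.0, 0.0]]         # running second level over [0, t]
    prev = (1.0 / n, 0.0)
    for k in range(1, total_steps + 1):
        angle = 2.0 * math.pi * k / steps_per_turn
        cur = (math.cos(angle) / n, math.sin(angle) / n)
        dx = (cur[0] - prev[0], cur[1] - prev[1])
        for i in range(2):
            for j in range(2):
                # Chen: add X_{0,t} (x) dx plus the lift of the straight piece
                xx[i][j] += x[i] * dx[j] + 0.5 * dx[i] * dx[j]
        x[0] += dx[0]
        x[1] += dx[1]
        prev = cur
    return x, xx

results = {n: pure_area_lift(n) for n in (1, 3, 5)}
```

For each n the path increment over [0, 1] vanishes, while the off-diagonal entries of the second level stay close to π and -π, up to the discretisation error of the polygonal approximation.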

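Exercise 2.19 (Translation of rough paths). Fix $\alpha \in (\tfrac{1}{3}, \tfrac{1}{2}]$ and $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^{\alpha}([0, T], \mathbb{R}^d)$. For sufficiently smooth $h : [0, T] \to \mathbb{R}^d$, the translation of $\mathbf{X}$ in direction h is given by

$$T_h(\mathbf{X}) \overset{\text{def}}{=} (X^h, \mathbb{X}^h),$$

where $X^h := X + h$ and

$$\mathbb{X}^h_{s,t} := \mathbb{X}_{s,t} + \int_s^t h_{s,r} \otimes dX_r + \int_s^t X_{s,r} \otimes dh_r + \int_s^t h_{s,r} \otimes dh_r. \tag{2.15}$$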

a) Assume h is Lipschitz. (In particular, the last three integrals above are well-
defined Riemann–Stieltjes integrals.) Show that for fixed h, the translation operator
Th : X 7→ Th (X) is a continuous map from C α into itself.
b) The above (Lipschitz) assumption on h is equivalently expressed by saying that
h ∈ W 1,∞ , where W 1,q denotes the space of absolutely continuous paths h
with derivative ḣ ∈ Lq . Weaken the assumption on h by only requiring ḣ ∈ Lq ,
for suitable q = q(α). Show that q = 2 (“Cameron–Martin paths of Brownian
motion”) works for all α ≤ 1/2. As a matter of fact, the integrals appearing in
(2.15) make sense for every q ≥ 1, but the resulting translated “rough path” would
not necessarily lie in C α .

2.6 Comments

The notion of rough path is due to Lyons and was introduced in [Lyo98]. Rather
than using Hölder-type norms, the original article introduced rough paths in the p-variation sense for any p ∈ [1, ∞). For p ≥ 3 (corresponding to α < 1/3), this requires additional $[p]^{\text{th}}$ order information. Various notes by Lyons preceding [Lyo98] already dealt with α-Hölder rough paths for $\alpha \in (\tfrac{1}{3}, \tfrac{1}{2}]$.
In the recent literature, elements in Cgα are actually called weakly geometric
(α-Hölder) rough paths. In contrast, the space of geometric rough paths Cg0,α is, by
definition, obtained via completion of smooth paths in $\varrho_{\alpha}$. We do not insist on this
terminology here and indeed, by Proposition 2.5 there is not much difference. In the
early literature the two concepts were somewhat blurred, matters were clarified in
[FV06a].
Chapter 3
Brownian motion as a rough path

Abstract In this chapter, we consider the most important example of a rough path,
which is the one associated to Brownian motion. We discuss the difference, at the
level of rough paths, between Itô and Stratonovich Brownian motion. We also provide
a natural example of approximation to Brownian motion which converges to neither
of them.

3.1 Kolmogorov criterion for rough paths

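Consider random $X(\omega) : [0, T] \to V$ and $\mathbb{X}(\omega) : [0, T]^2 \to V \otimes V$, subject to (2.1). Equivalently, following Exercise 2.7, we can think of

$$\mathbf{X}(\omega) \equiv (X, \mathbb{X})(\omega) : [0, T] \to V \oplus (V \otimes V)$$

as a (random) path. The basic example, of course, is that of d-dimensional standard Brownian motion $B$ enhanced with

$$\mathbb{B}_{s,t} \overset{\text{def}}{=} \int_s^t B_{s,r} \otimes dB_r \in \mathbb{R}^d \otimes \mathbb{R}^d \cong \mathbb{R}^{d \times d}. \tag{3.1}$$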

The integration here is understood either in Itô or Stratonovich sense (in the latter case, we would write $\circ\, dB$); sometimes we indicate this by writing $\mathbb{B}^{\text{Itô}}$ resp. $\mathbb{B}^{\text{Strat}}$. It should be noted that the antisymmetric part of $\mathbb{B}$, which is nothing but Lévy’s stochastic area and takes values in $\mathfrak{so}(d)$, is not affected by the choice of stochastic integration. Condition (2.1) is seen to be valid with either choice, while condition (2.5) only holds in the Stratonovich case. We now address the question of α- resp. 2α-Hölder regularity of $X$ resp. $\mathbb{X}$ by a suitable extension of the classical Kolmogorov criterion; the application to Brownian motion is then carried out in detail in the following subsection.
Recalling that $B \in \mathcal{C}^{\alpha}$, a.s. for any α < 1/2, we now address the question of 2α-Hölder regularity for $\mathbb{B}$. Using Brownian scaling and exponential integrability of $\mathbb{B}_{0,1}$, which is an immediate consequence of the integrability properties of the second Wiener chaos, the following result applies with β = 1/2 and all q < ∞. It gives the desired 2α-Hölder regularity for $\mathbb{B}$, a.s. for any α < 1/2. As a consequence, $(B, \mathbb{B}) \in \mathscr{C}^{\alpha}$ almost surely, where we may take any $\alpha \in (\tfrac{1}{3}, \tfrac{1}{2})$ and $\mathbf{B} \equiv (B, \mathbb{B})$ is known as Brownian rough path or enhanced Brownian motion. In the Stratonovich case, thanks to (2.5), we obtain a geometric rough path, i.e. $(B, \mathbb{B}^{\text{Strat}}) \in \mathscr{C}_g^{\alpha}$.

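Theorem 3.1 (Kolmogorov criterion for rough paths). Let q ≥ 2, β > 1/q. Assume, for all s, t in [0, T]

$$|X_{s,t}|_{L^q} \le C|t - s|^{\beta}, \qquad |\mathbb{X}_{s,t}|_{L^{q/2}} \le C|t - s|^{2\beta}, \tag{3.2}$$

for some constant C < ∞. Then, for all α ∈ [0, β − 1/q), there exists a modification of $(X, \mathbb{X})$ (also denoted by $(X, \mathbb{X})$) and random variables $K_{\alpha} \in L^q$, $\mathbb{K}_{\alpha} \in L^{q/2}$ such that, for all s, t in [0, T]

$$|X_{s,t}| \le K_{\alpha}(\omega)|t - s|^{\alpha}, \qquad |\mathbb{X}_{s,t}| \le \mathbb{K}_{\alpha}(\omega)|t - s|^{2\alpha}. \tag{3.3}$$

In particular, if $\beta - \tfrac{1}{q} > \tfrac{1}{3}$ then, for every $\alpha \in (\tfrac{1}{3}, \beta - \tfrac{1}{q})$, we have $(X, \mathbb{X}) \in \mathscr{C}^{\alpha}$ a.s.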

Proof. The proof is almost the same as the classical proof of Kolmogorov’s continuity criterion, as exposed for example in [RY91]. Without loss of generality take T = 1 and let $D_n$ denote the set of integer multiples of $2^{-n}$ in [0, 1). As in the usual criterion, it suffices to consider $s, t \in \bigcup_n D_n$, with the values at the remaining times filled in using continuity. (This is why in general one ends up with a modification.) Note that the number of elements in $D_n$ is given by $\# D_n = 1/|D_n| = 2^n$. Set

$$K_n = \sup_{t \in D_n} |X_{t, t+2^{-n}}|, \qquad \mathbb{K}_n = \sup_{t \in D_n} |\mathbb{X}_{t, t+2^{-n}}|.$$

It follows from (3.2) that

$$E(K_n^q) \le E \sum_{t \in D_n} |X_{t, t+2^{-n}}|^q \le \frac{1}{|D_n|} C^q |D_n|^{\beta q} = C^q |D_n|^{\beta q - 1},$$

$$E(\mathbb{K}_n^{q/2}) \le E \sum_{t \in D_n} |\mathbb{X}_{t, t+2^{-n}}|^{q/2} \le \frac{1}{|D_n|} C^{q/2} |D_n|^{2\beta q/2} = C^{q/2} |D_n|^{\beta q - 1}.$$
Fix s < t in $\bigcup_n D_n$ and choose m such that $|D_{m+1}| < t - s \le |D_m|$. The interval [s, t) can be expressed as the finite disjoint union of intervals of the form $[u, v) \in D_n$ with n ≥ m + 1 and where no three intervals have the same length. In other words, we have a partition of [s, t) of the form

$$s = \tau_0 < \tau_1 < \cdots < \tau_N = t,$$

where $(\tau_i, \tau_{i+1}) \in D_n$ for some n ≥ m + 1, and for each fixed n ≥ m + 1 there are at most two such intervals taken from $D_n$. It follows that

$$|X_{s,t}| \le \max_{0 \le i < N} |X_{s, \tau_{i+1}}| \le \sum_{i=0}^{N-1} |X_{\tau_i, \tau_{i+1}}| \le 2 \sum_{n \ge m+1} K_n,$$

and similarly,

$$|\mathbb{X}_{s,t}| = \Big| \sum_{i=0}^{N-1} \big( \mathbb{X}_{\tau_i, \tau_{i+1}} + X_{s, \tau_i} \otimes X_{\tau_i, \tau_{i+1}} \big) \Big| \le \sum_{i=0}^{N-1} \big( |\mathbb{X}_{\tau_i, \tau_{i+1}}| + |X_{s, \tau_i}| |X_{\tau_i, \tau_{i+1}}| \big)$$

$$\le \sum_{i=0}^{N-1} |\mathbb{X}_{\tau_i, \tau_{i+1}}| + \max_{0 \le i < N} |X_{s, \tau_{i+1}}| \sum_{j=0}^{N-1} |X_{\tau_j, \tau_{j+1}}| \le 2 \sum_{n \ge m+1} \mathbb{K}_n + \Big( 2 \sum_{n \ge m+1} K_n \Big)^2.$$

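We thus obtain

$$\frac{|X_{s,t}|}{|t - s|^{\alpha}} \le \sum_{n \ge m+1} \frac{1}{|D_{m+1}|^{\alpha}} 2K_n \le \sum_{n \ge m+1} \frac{2K_n}{|D_n|^{\alpha}} \le K_{\alpha},$$

where $K_{\alpha} := 2 \sum_{n \ge 0} K_n / |D_n|^{\alpha}$ is in $L^q$. Indeed, since α < β − 1/q by assumption and $|D_n|$ to any positive power is summable, we have

$$\|K_{\alpha}\|_{L^q} \le \sum_{n \ge 0} \frac{2}{|D_n|^{\alpha}} \big| E(K_n^q) \big|^{1/q} \le \sum_{n \ge 0} \frac{2C}{|D_n|^{\alpha}} |D_n|^{\beta - 1/q} < \infty.$$

Similarly,

$$\frac{|\mathbb{X}_{s,t}|}{|t - s|^{2\alpha}} \le \sum_{n \ge m+1} \frac{1}{|D_{m+1}|^{2\alpha}} 2\mathbb{K}_n + \Big( \sum_{n \ge m+1} \frac{1}{|D_{m+1}|^{\alpha}} 2K_n \Big)^2 \le \mathbb{K}_{\alpha} + K_{\alpha}^2,$$

where $\mathbb{K}_{\alpha} := 2 \sum_{n \ge 0} \mathbb{K}_n / |D_n|^{2\alpha}$ is in $L^{q/2}$. Indeed,

$$\|\mathbb{K}_{\alpha}\|_{L^{q/2}} \le \sum_{n \ge 0} \frac{2}{|D_n|^{2\alpha}} \big| E(\mathbb{K}_n^{q/2}) \big|^{2/q} \le \sum_{n \ge 0} \frac{2C}{|D_n|^{2\alpha}} |D_n|^{2\beta - 2/q} < \infty,$$

thus concluding the proof. □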
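The combinatorial step of the proof, writing [s, t) as a disjoint union of dyadic intervals with at most two per level, can be made explicit. The following sketch is our own illustration (the function name and the integer encoding of s and t are assumptions): it takes s = s_num / 2^level and t = t_num / 2^level with 0 ≤ s < t ≤ 1 and peels off one dyadic interval from each end whenever that end is not aligned with the next coarser grid.

```python
from fractions import Fraction

def dyadic_decomposition(s_num, t_num, level):
    """Split [s, t), s = s_num / 2^level and t = t_num / 2^level in [0, 1],
    into dyadic intervals [k 2^-n, (k+1) 2^-n) with at most two per level n.
    Returns a sorted list of (start, end, n) using exact Fractions."""
    pieces = []
    lo, hi, n = s_num, t_num, level
    while lo < hi:
        if lo % 2 == 1:                    # left end not aligned at level n - 1
            pieces.append((Fraction(lo, 2 ** n), Fraction(lo + 1, 2 ** n), n))
            lo += 1
        if hi % 2 == 1 and lo < hi:        # right end not aligned at level n - 1
            hi -= 1
            pieces.append((Fraction(hi, 2 ** n), Fraction(hi + 1, 2 ** n), n))
        lo //= 2                           # pass to the coarser grid
        hi //= 2
        n -= 1
    return sorted(pieces)

# Example: [3/16, 13/16) at level 4.
pieces = dyadic_decomposition(3, 13, 4)
levels = [n for _, _, n in pieces]
```

In this example the interval splits into four pieces, two of length 1/16 and two of length 1/4, so each level is used at most twice.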

The reader will notice that the classical Kolmogorov criterion (KC) is contained
in the above proof and theorem by simply ignoring all considerations related to the
second-order process X. Let us also note in this context that the classical KC works
for processes (Xt : 0 ≤ t ≤ 1) with values in an arbitrary (separable) metric space
(it suffices to replace |Xs,t | by d(Xs , Xt ) in the argument). This observation actually
gives an alternative and immediate proof of Theorem 3.1, at least for “geometric” $(X, \mathbb{X})$, i.e. in presence of the algebraic constraint (2.5), and at the price of some Lie group language. The key observation, as discussed in Section 2.3, is that $t \mapsto \mathbf{X}_t := (1, X_{0,t}, \mathbb{X}_{0,t})$ takes values in the step-2 nilpotent group with d generators, $(G^{(2)}(\mathbb{R}^d), \otimes)$, endowed with the Carnot-Carathéodory metric

$$d_C(\mathbf{X}_s, \mathbf{X}_t) \asymp |X_{s,t}| + |\mathbb{X}_{s,t}|^{1/2}.$$

Remark 3.2 (Warning). It is not possible to obtain (3.3) by applying the classical KC to the (V ⊗ V)-valued process $(\mathbb{X}_{0,t} : 0 \le t \le T)$. Doing so only gives $|\mathbb{X}_{s,t}| = O(|t - s|^{\alpha})$ a.s. since one misses a crucial cancellation inherent in (cf. (2.1))

$$\mathbb{X}_{s,t} = \mathbb{X}_{0,t} - \mathbb{X}_{0,s} - X_{0,s} \otimes X_{s,t}.$$

That said, it is possible (but tedious) to use a 2-parameter version of the KC to see that $(s, t) \mapsto \mathbb{X}_{s,t} / |t - s|^{2\alpha}$ admits a continuous modification. In particular, this then implies that $\|\mathbb{X}\|_{2\alpha}$ is finite almost surely. In the Brownian setting, this was carried
out in [Fri05].

Here is a similar result for rough path distances, say between X and X̃. Note
that, due to the nonlinear structure of rough path spaces, one cannot simply apply
Theorem 3.1 to the “difference” of two rough paths. Indeed, $\tilde{\mathbf{X}} - \mathbf{X}$ is not defined in general for, formally, one misses the information about the mixed integrals in

$$\tilde{\mathbf{X}} - \mathbf{X} = \Big( \tilde{X} - X, \int (\tilde{X} - X) \otimes d(\tilde{X} - X) \Big).$$

Even when all of these expressions are well-defined, say when $\tilde{X}$ is smooth, convergence of the right-hand side above to zero is different from saying that

$$\tilde{X} \to X, \qquad \int \tilde{X} \otimes d\tilde{X} \to \int X \otimes dX$$

and it is this type of convergence (in suitable Hölder-type norms) which our rough path metric $\varrho_{\alpha}$ expresses.

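Theorem 3.3 (Kolmogorov criterion for rough path distance). Let α, β, q be as above in Kolmogorov’s criterion (KC), Theorem 3.1. Assume that both $\tilde{\mathbf{X}} = (\tilde{X}, \tilde{\mathbb{X}})$ and $\mathbf{X} = (X, \mathbb{X})$ satisfy the moment condition in the statement of KC with some constant C. Set

$$\Delta X := \tilde{X} - X, \qquad \Delta \mathbb{X} := \tilde{\mathbb{X}} - \mathbb{X},$$

and assume that for some ε > 0 and all s, t ∈ [0, T]

$$|\Delta X_{s,t}|_{L^q} \le C\varepsilon |t - s|^{\beta}, \qquad |\Delta \mathbb{X}_{s,t}|_{L^{q/2}} \le C\varepsilon |t - s|^{2\beta}.$$

Then there exists M, depending increasingly on C, so that $|\, \|\Delta X\|_{\alpha} |_{L^q} \le M\varepsilon$ and $|\, \|\Delta \mathbb{X}\|_{2\alpha} |_{L^{q/2}} \le M\varepsilon$. In particular, if $\beta - \tfrac{1}{q} > \tfrac{1}{3}$ then, for every $\alpha \in (\tfrac{1}{3}, \beta - \tfrac{1}{q})$ we have $|||\tilde{\mathbf{X}}|||_{\alpha}, |||\mathbf{X}|||_{\alpha} \in L^q$ and

$$|\varrho_{\alpha}(\tilde{\mathbf{X}}, \mathbf{X})|_{L^q} \le M\varepsilon.$$

Proof. The proof is a straightforward modification of the proof of Theorem 3.1 and is left as an exercise to the reader. □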

In applications of this theorem one typically has a family

$$\{\mathbf{X}^n \equiv (X^n, \mathbb{X}^n) : 1 \le n \le \infty\},$$

such that the moment conditions in the statement of KC hold with a constant C, uniformly over 1 ≤ n ≤ ∞. Application of the above with ε = ε_n then gives $L^q$-rates of convergence,

$$|\varrho_{\alpha}(\mathbf{X}^n, \mathbf{X})|_{L^q} \lesssim \varepsilon_n.$$
Of course, when εn decays sufficiently fast, a Borel–Cantelli argument also gives
almost-sure convergence with suitable rates.

3.2 Itô Brownian motion

Consider a d-dimensional standard Brownian motion $B$ enhanced with its iterated integrals

$$\mathbb{B}_{s,t} \overset{\text{def}}{=} \int_s^t B_{s,r} \otimes dB_r \in \mathbb{R}^d \otimes \mathbb{R}^d \cong \mathbb{R}^{d \times d}, \tag{3.4}$$

where the stochastic integration is understood in the sense of Itô; sometimes we indicate this by writing $\mathbb{B}^{\text{Itô}}$. The antisymmetric part of $\mathbb{B}$ is known as “Lévy’s stochastic area”. We shall assume straight away that $B_t$ and $\mathbb{B}_{s,t}$ are continuous in t and s, t respectively, with probability one. For instance, if one takes for granted that almost surely Brownian motion and indefinite Itô integrals against Brownian motion (such as $\mathbb{B}_{0,\cdot}$) are continuous, then it suffices to (re)define the second order increments as $\mathbb{B}_{s,t} = \mathbb{B}_{0,t} - \mathbb{B}_{0,s} - B_s \otimes B_{s,t}$. Of course, by additivity of the Itô integral, this coincides a.s. with the earlier definition. En passant, the so-defined $\mathbf{B}_{s,t} = (B_{s,t}, \mathbb{B}_{s,t})$ immediately satisfies (2.1), for all times, on a common set of probability one.

Proposition 3.4. For any α ∈ (1/3, 1/2) and T > 0, with probability one,

$$\mathbf{B} = (B, \mathbb{B}^{\text{Itô}}) \in \mathscr{C}^{\alpha}([0, T], \mathbb{R}^d).$$

Proof. Using Brownian scaling and finite moments of $\mathbb{B}_{0,1}$, which are immediate from integrability properties of the (homogeneous) second Wiener–Itô chaos, the KC for rough paths applies with β = 1/2 and all q < ∞. (As an exercise, the reader may want to show finite moments of $\mathbb{B}_{0,1}$ without chaos arguments; an elementary way to do so is via conditioning, Itô isometry, and reflection principle.) □

Observe that Brownian motion enhanced with its iterated Itô integrals (2nd order calculus!) yields a (random) rough path but not a geometric rough path which is, by definition, an object with hardwired first order behaviour. Indeed, Itô’s formula yields the identity

$$d(B^i B^j) = B^i\, dB^j + B^j\, dB^i + d\langle B^i, B^j \rangle_t, \qquad i, j = 1, \dots, d,$$

so that, writing I for the identity matrix in d dimensions, we have for s < t,

$$\mathrm{Sym}\big(\mathbb{B}^{\text{Itô}}_{s,t}\big) = \tfrac{1}{2} B_{s,t} \otimes B_{s,t} - \tfrac{1}{2} I (t - s) \ne \tfrac{1}{2} B_{s,t} \otimes B_{s,t},$$

in contradiction with (2.5).
Let us also mention that Brownian motion with values in infinite-dimensional
spaces can also be lifted to rough paths, see the exercise section.

3.3 Stratonovich Brownian motion

In the previous section we defined $\mathbb{B}^{\text{Itô}}$ by Itô-integration of d-dimensional Brownian motion $B$ against itself. Now, for (scalar) continuous semimartingales, M, N say, the Stratonovich integral is defined as

$$\int_0^t M \circ dN := \int_0^t M\, dN + \tfrac{1}{2} [M, N]_t$$

and has the advantage of a first order calculus. For instance, one has the first order product rule

$$d(MN) = M \circ dN + N \circ dM.$$

One can then define $\mathbb{B}^{\text{Strat}}$ by (component-wise) Stratonovich-integration of Brownian motion against itself. Using basic results on quadratic variation between Brownian motions ($d\langle B^i, B^j \rangle_t = \delta^{i,j}\, dt$ where $\delta^{i,j} = 1$ if i = j, zero else), we see that

$$\mathbb{B}^{\text{Strat}}_{s,t} = \mathbb{B}^{\text{Itô}}_{s,t} + \tfrac{1}{2} I (t - s) \tag{3.5}$$

where I stands for the identity matrix. Note that the difference between $\mathbb{B}^{\text{Strat}}$ and $\mathbb{B}^{\text{Itô}}$ is symmetric, so that the antisymmetric parts of the two processes (Lévy’s stochastic area) are identical.
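The relation between the two lifts has an exact discrete analogue that can be checked by simulation. In the sketch below (our own illustration; the seed, step count and function name are arbitrary choices), Itô integrals are approximated by left-point Riemann sums and Stratonovich integrals by midpoint sums along simulated Gaussian increments. The two sums differ by half the sum of tensor squares of the increments, which is symmetric and close to (t - s)/2 times the identity.

```python
import math
import random

def discrete_lifts(increments):
    """Left-point (Ito-type) and midpoint (Stratonovich-type) Riemann sums
    approximating the iterated integral of B against itself over [0, T]."""
    d = len(increments[0])
    b = [0.0] * d
    ito = [[0.0] * d for _ in range(d)]
    strat = [[0.0] * d for _ in range(d)]
    for db in increments:
        for i in range(d):
            for j in range(d):
                ito[i][j] += b[i] * db[j]
                strat[i][j] += (b[i] + 0.5 * db[i]) * db[j]
        b = [b[i] + db[i] for i in range(d)]
    return b, ito, strat

random.seed(0)                        # arbitrary seed, for reproducibility only
steps, d, T = 20000, 2, 1.0
h = T / steps
increments = [[random.gauss(0.0, math.sqrt(h)) for _ in range(d)]
              for _ in range(steps)]
b, ito, strat = discrete_lifts(increments)
```

The midpoint sum is exactly the iterated integral of the piecewise-linear interpolation, so its symmetric part equals half the tensor square of the total increment, while the antisymmetric parts of `ito` and `strat` coincide.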

Proposition 3.5. For any α ∈ (1/3, 1/2) and with probability one,

$$\mathbf{B} = (B, \mathbb{B}^{\text{Strat}}) \in \mathscr{C}_g^{\alpha}([0, T], \mathbb{R}^d).$$

A typical realisation $\mathbf{B}(\omega)$ is called Brownian rough path, $\mathbf{B} = \mathbf{B}^{\text{Strat}}$ as a process is called (Stratonovich) enhanced Brownian motion.
Proof. Using (3.5), rough path regularity of $\mathbf{B}$ is immediately reduced to the already established Itô case. (Alternatively, one can use again the Kolmogorov criterion for rough paths; the only – insignificant – difference is that now $\mathbb{B}^{\text{Strat}}_{0,1}$ takes values in the inhomogeneous second chaos, due to the deterministic part I/2.) Finally, $\mathbf{B}(\omega)$ is geometric since

$$\mathrm{Sym}\big(\mathbb{B}^{\text{Strat}}_{s,t}\big) = \tfrac{1}{2} B_{s,t} \otimes B_{s,t},$$

an immediate consequence of the first order product rule. □
It is a deterministic feature of every geometric rough path $(X, \mathbb{X})$ that it can be approximated – in the precise sense of Proposition 2.5 – by smooth paths in the rough path topology. Such approximations require knowledge not only of the underlying path $X$, but of the entire rough path, including the second order information $\mathbb{X}$.
In contrast, one has the probabilistic statement that piecewise linear (and in fact:
many other “obvious”) approximations still converge in rough path sense. More
specifically, in the present context of d-dimensional standard Brownian motion, we
now give an elegant proof of this based on (discrete-time!) martingale arguments.

Proposition 3.6. Consider dyadic piecewise-linear approximations $B^{(n)}$ to $B$ on [0, T]. That is, $B^{(n)}_t = B_t$ whenever $t = iT/2^n$ for some integer i, and linearly interpolated on intervals $[iT/2^n, (i + 1)T/2^n]$. Then, with probability one,

$$\Big( B^{(n)}, \int_0^{\cdot} B^{(n)} \otimes dB^{(n)} \Big) \to (B, \mathbb{B}^{\text{Strat}}) \quad \text{in } \mathscr{C}_g^{\alpha}.$$

(The integral on the left-hand side is understood as classical Riemann–Stieltjes integral.)
Remark 3.7. With Theorem 3.3, one can see rough path convergence (in probability,
and actually Lq , any q < ∞) of piecewise linear approximation along any sequence
of dissections with mesh tending to zero. Moreover, this approach will give the rate
θ, any θ < 1/2 − α.
Proof. It is easy to check that $B$ gives $B^{(n)}$ via conditioning on $B$ at dyadic times,

$$B^{(n)} = E\big( B \,\big|\, \sigma\{B_{kT2^{-n}} : 0 \le k \le 2^n\} \big).$$

By independence of the components $B^i, B^j$ for i ≠ j, the same holds for $\mathbb{B}^{\text{Strat}}$ off-diagonal; the on-diagonal terms require no further attention since $\mathbb{B}^{\text{Strat};i,i}_{s,t} = \tfrac{1}{2} (B^i_{s,t})^2$. Almost sure pointwise convergence then readily follows from martingale convergence. Furthermore, Theorem 3.1 implies

$$|B^i_{s,t}| \le K_{\alpha}(\omega)|t - s|^{\alpha}, \qquad |\mathbb{B}^{\text{Strat};i,j}_{s,t}| \le \mathbb{K}_{\alpha}(\omega)|t - s|^{2\alpha},$$

and upon conditioning with respect to $\sigma\{B_{kT2^{-n}} : 0 \le k \le 2^n\}$, the same bounds hold for $B^{(n);i}$ and for $\int_0^{\cdot} B^{(n);i}\, dB^{(n);j}$. In fact, $K_{\alpha}, \mathbb{K}_{\alpha}$ have (more than enough) integrability to apply Doob’s maximal inequality. This leads, with probability one, to the bound

$$\sup_n \Big\| \Big( B^{(n)}, \int_0^{\cdot} B^{(n)} \otimes dB^{(n)} \Big) \Big\|_{2\alpha} < \infty.$$

Together with a.s. pointwise convergence, a (deterministic) interpolation argument shows a.s. convergence with respect to the α-Hölder rough path metric $\varrho_{\alpha}$. □

The reader should be warned that there are perfectly smooth and uniform ap-
proximations to Brownian motion, which do not converge to Stratonovich enhanced
Brownian motion, but instead to some different geometric (random) rough path, such
as

$$\bar{\mathbf{B}} = (B, \bar{\mathbb{B}}), \quad \text{where } \bar{\mathbb{B}}_{s,t} = \mathbb{B}^{\text{Strat}}_{s,t} + (t - s) A, \quad A \in \mathfrak{so}(d).$$

Note that the difference between $\bar{\mathbb{B}}$ and $\mathbb{B}^{\text{Strat}}$ is now antisymmetric, i.e. $\bar{\mathbf{B}}$ has a stochastic area that is different from Lévy’s area. To construct such approximations, it suffices to include oscillations (at small scales) such as to create the desired effect in the area, while they do not affect the limiting path, see Exercise 2.17. (In the context of Brownian motion and SDEs driven by Brownian motion such approximations were studied by McShane, Ikeda–Watanabe and others, see [McS72,
IW89].) Although such “twisted” approximations do not seem to be the most obvious
way to approximate Brownian motion, they also arise naturally in some perfectly
reasonable situations.

3.4 Brownian motion in a magnetic field

Newton’s second law for a particle in R3 with mass m, and position x = x(t), (for
simplicity: constant) frictions α1 , α2 , α3 > 0 in orthonormal directions, subject
to a (3-dimensional) white noise in time, i.e. the distributional derivative of a 3-
dimensional Brownian motion B, reads

mẍ = −M ẋ + Ḃ, (3.6)

assuming $M$ symmetric with spectrum $\alpha_1, \alpha_2, \alpha_3$. The process $x(t)$ describes what is
known as physical Brownian motion. It is well known that in the small mass regime, $m \ll 1$,
of obvious physical relevance when dealing with particles, a good approximation
is given by (mathematical) Brownian motion (with non-standard covariance). To see
this formally, it suffices to take $m = 0$ in (3.6), in which case $x = M^{-1}B$.
Let us now assume that our particle (with position x and momentum mẋ) carries
a non-zero electric charge and moves in a magnetic field which we assume to be
constant. Recall that such a particle experiences a sideways force (“Lorentz force”)
that is proportional to the strength of the magnetic field, the component of the velocity
that is perpendicular to the magnetic field and the charge of the particle. In terms
of our assumptions, this simply means that a non-zero antisymmetric component is
added to M . We shall hence drop the assumption of symmetry, and instead consider
for M a general square matrix with

Real{σ(M )} ⊂ (0, ∞).

Note that these second order dynamics can be rewritten as an evolution equation for the
momentum $p(t) = m\dot{x}(t)$,
$$\dot{p} = -M\dot{x} + \dot{B} = -\frac{1}{m} M p + \dot{B} .$$
As we shall see, $X = X^m$, indexed by “mass” $m$, converges in a quite non-trivial
way to Brownian motion on the level of rough paths. In fact, the correct limit in
rough path sense is $\bar{\mathbf{B}} = (B, \bar{\mathbb{B}})$, where
$$\bar{\mathbb{B}}_{s,t} = \mathbb{B}^{\mathrm{Strat}}_{s,t} + (t-s)A, \tag{3.7}$$
in terms of an antisymmetric matrix $A$, written explicitly as $A = \frac{1}{2}(M\Sigma - \Sigma M^*) \in so(d)$, where
$$\Sigma = \int_0^\infty e^{-Ms} e^{-M^* s}\, ds .$$
When $M$ is normal, i.e. $M^*M = MM^*$, it is an exercise in linear algebra to show
that this expression simplifies to
$$A = \frac{1}{2}\, \mathrm{Anti}(M)\, \mathrm{Sym}(M)^{-1} ,$$
where $\mathrm{Anti}(M)$ denotes the antisymmetric part of a matrix and $\mathrm{Sym}(M)$ its symmetric
part. We can now state the result in full detail.

Theorem 3.8. Let M ∈ Rd×d be a square matrix in dimension d such that all its
eigenvalues have strictly positive real part. Let B be a d-dimensional standard
Brownian motion, m > 0, and consider the stochastic differential equations
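$$dX = \frac{1}{m} P\, dt , \qquad dP = -\frac{1}{m} M P\, dt + dB ,$$
with zero initial position $X$ and momentum $P$. Then, for any $q \ge 1$ and $\alpha \in (1/3, 1/2)$, as mass $m \to 0$,
$$\Big(MX, \int MX \otimes d(MX)\Big) \to \bar{\mathbf{B}} \quad \text{in } \mathscr{C}^\alpha \text{ and } L^q .$$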
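
The correction matrix $A = \frac{1}{2}(M\Sigma - \Sigma M^*)$ entering the limit can be evaluated numerically. The following is only a rough sketch under stated assumptions: the helper names (`expm`, `stationary_covariance`, `area_correction`), the trapezoidal quadrature, the step size and the truncation horizon are all illustrative choices, and the matrix $M = I - \alpha J$ with $J$ the planar rotation generator is just one convenient normal example. For this $M$ the result should agree, up to quadrature error, with the normal-case formula $\frac{1}{2}\mathrm{Anti}(M)\mathrm{Sym}(M)^{-1}$.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def expm(a, terms=25):
    """Matrix exponential via its Taylor series (adequate for small |a|)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def stationary_covariance(M, h=0.01, horizon=30.0):
    """Trapezoidal approximation of Sigma = int_0^inf exp(-M s) exp(-M^T s) ds."""
    n = len(M)
    step = expm([[-h * x for x in row] for row in M])           # exp(-h M)
    E = [[float(i == j) for j in range(n)] for i in range(n)]   # exp(-M s) at s = 0
    sigma = [[0.0] * n for _ in range(n)]
    steps = int(round(horizon / h))
    for k in range(steps + 1):
        weight = h / 2 if k in (0, steps) else h
        integrand = matmul(E, transpose(E))
        for i in range(n):
            for j in range(n):
                sigma[i][j] += weight * integrand[i][j]
        E = matmul(E, step)                                     # advance s by h
    return sigma

def area_correction(M):
    """A = (M Sigma - Sigma M^T) / 2 for a real matrix M."""
    sigma = stationary_covariance(M)
    ms = matmul(M, sigma)
    smt = matmul(sigma, transpose(M))
    n = len(M)
    return [[0.5 * (ms[i][j] - smt[i][j]) for j in range(n)] for i in range(n)]

alpha = 2.0
M = [[1.0, alpha], [-alpha, 1.0]]   # I - alpha * J with J = [[0, -1], [1, 0]]
A = area_correction(M)
```

Here $\mathrm{Sym}(M) = I$ and $\mathrm{Anti}(M) = -\alpha J$, so the computed `A` can be compared entry by entry with $-\frac{\alpha}{2} J$.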

Proof. Step 1. (Pointwise convergence in $L^q$.) In order to exploit Brownian scaling,
it is convenient to set $m = \varepsilon^2$ and then $Y^\varepsilon$ as rescaled momentum,
$$Y^\varepsilon = P/\varepsilon .$$
We shall also write $X^\varepsilon = X$, to emphasize dependence on $\varepsilon$. We then have
$$dY^\varepsilon = -\varepsilon^{-2} M Y^\varepsilon\, dt + \varepsilon^{-1}\, dB , \qquad dX^\varepsilon = \varepsilon^{-1} Y^\varepsilon\, dt .$$

By assumption, there exists λ > 0 such that the real part of every eigenvalue of M
is (strictly) bigger than λ. For later reference, we note that this implies the estimate
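$|\exp(-\tau M)| = O(\exp(-\lambda\tau))$ as $\tau \to \infty$. For fixed $\varepsilon$, define the Brownian motion
$\tilde{B}_t = \varepsilon^{-1} B_{\varepsilon^{2} t}$, and consider the SDEs
$$d\tilde{Y} = -M\tilde{Y}\, dt + d\tilde{B} , \qquad d\tilde{X} = \tilde{Y}\, dt .$$
Note that the law of the solutions does not depend on $\varepsilon$. Furthermore, when solved
with identical initial data, we have pathwise equality
$$\big(Y^\varepsilon_t, \varepsilon^{-1} X^\varepsilon_t\big) = \big(\tilde{Y}_{\varepsilon^{-2}t}, \tilde{X}_{\varepsilon^{-2}t}\big) . \tag{3.8}$$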

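Thanks to our assumption on $M$, $\tilde{Y}$ is ergodic; the stationary solution has (zero
mean, Gaussian) law $\nu \sim N(0, \Sigma)$ for some covariance matrix $\Sigma$. To compute it,
write down the stationary solution
$$\tilde{Y}^{\mathrm{stat}}_t = \int_{-\infty}^{t} e^{-M(t-s)}\, dB_s .$$
For each $t$ (and in particular for $t = 0$), the law of $\tilde{Y}^{\mathrm{stat}}_t$ is precisely $\nu$. We then see
that
$$\Sigma = \mathbf{E}\big(\tilde{Y}^{\mathrm{stat}}_0 \otimes \tilde{Y}^{\mathrm{stat}}_0\big) = \int_{-\infty}^{0} e^{-M(-s)} e^{-M^*(-s)}\, ds = \int_0^\infty e^{-Ms} e^{-M^* s}\, ds .$$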

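Since $\sup_{0 \le t < \infty} \mathbf{E}|\tilde{Y}_t|^2 < \infty$, it is clear that $\varepsilon \tilde{Y}_{\varepsilon^{-2}t} = \varepsilon Y^\varepsilon_t \to 0$ in $L^2$ uniformly in
$t$ (and hence in $L^q$ for any $q < \infty$). Noting that $M X^\varepsilon_t = B_t - \varepsilon Y^\varepsilon_{0,t}$, the first part of
the proposition is now obvious. Moreover, by the ergodic theorem¹,
$$\int_0^t f(Y^\varepsilon_s)\, ds \to t \int f(y)\, \nu(dy) , \quad \text{in } L^q \text{ for any } q < \infty, \tag{3.9}$$
for all reasonable test functions $f$; we shall only use it for quadratics. Using $dX^\varepsilon = \varepsilon^{-1} Y^\varepsilon\, dt$ we can then write
$$\begin{aligned}
\int_0^t M X^\varepsilon_s \otimes d(M X^\varepsilon)_s
&= \int_0^t M X^\varepsilon_s \otimes dB_s - \varepsilon \int_0^t M X^\varepsilon_s \otimes dY^\varepsilon_s \\
&= \int_0^t M X^\varepsilon_s \otimes dB_s - M X^\varepsilon_t \otimes (\varepsilon Y^\varepsilon_t) + \varepsilon \int_0^t d(M X^\varepsilon)_s \otimes Y^\varepsilon_s \\
&= \int_0^t M X^\varepsilon_s \otimes dB_s - M X^\varepsilon_t \otimes (\varepsilon Y^\varepsilon_t) + \int_0^t M Y^\varepsilon_s \otimes Y^\varepsilon_s\, ds \\
&\to \int_0^t B_s \otimes dB_s - 0 + t \int (M y \otimes y)\, \nu(dy) \\
&= \int_0^t B_s \otimes dB_s + t M \Sigma = \mathbb{B}^{\mathrm{Strat}}_{0,t} + t\Big(M\Sigma - \frac{1}{2} I\Big) ,
\end{aligned}$$

¹ As found e.g. in textbooks by Stroock [Str11] or Kallenberg [Kal02]. Test functions are usually
assumed to be bounded, but by a truncation argument in our setting, this is easily extended to
quadratics.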

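where the convergence is in $L^q$ for any $q \ge 2$. By considering the symmetric part of
the above equation,
$$\frac{1}{2} (M X^\varepsilon_t) \otimes (M X^\varepsilon_t) \to \frac{1}{2} B_t \otimes B_t + t\, \mathrm{Sym}\Big(M\Sigma - \frac{1}{2} I\Big) ,$$
we see that $M\Sigma - \frac{1}{2} I$ has symmetric part 0, i.e. is antisymmetric, and hence also
equals $\frac{1}{2}(M\Sigma - \Sigma M^*)$. This settles pointwise convergence, in the sense that
$$S(M X^\varepsilon)_t := \Big(M X^\varepsilon_t, \int_0^t M X^\varepsilon_s \otimes d(M X^\varepsilon)_s\Big) \to \big(B_t, \bar{\mathbb{B}}_{0,t}\big) .$$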

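Step 2. (Uniform rough path bounds in $L^q$.) We claim that, for any $q < \infty$,
$$\sup_{\varepsilon \in (0,1]} \mathbf{E}\big[\|M X^\varepsilon\|_\alpha^q\big] < \infty , \qquad \sup_{\varepsilon \in (0,1]} \mathbf{E}\Big[\Big\|\int M X^\varepsilon \otimes d(M X^\varepsilon)\Big\|_{2\alpha}^q\Big] < \infty ,$$
which, in view of Theorem 3.1, is an immediate consequence of the bounds
$$\sup_{\varepsilon \in (0,1]} \mathbf{E}\big[|X^\varepsilon_{s,t}|^q\big] \lesssim |t-s|^{q/2} , \qquad \sup_{\varepsilon \in (0,1]} \mathbf{E}\Big[\Big|\int_s^t X^\varepsilon_{s,\cdot} \otimes dX^\varepsilon\Big|^q\Big] \lesssim |t-s|^{q} .$$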

Since X is Gaussian, it follows from integrability properties of the first two Wiener–
Itô chaoses that it is enough to show these bounds for q = 2. Furthermore, we note
that the desired estimates are a consequence of the bounds
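$$\mathbf{E}\big[|\tilde{X}_{s,t}|^2\big] \lesssim |t-s| , \tag{3.10}$$
$$\mathbf{E}\Big[\Big|\int_s^t \tilde{X}_{s,u} \otimes d\tilde{X}_u\Big|^2\Big] \lesssim |t-s|^2 , \tag{3.11}$$
where the implied proportionality constants are uniform over $t, s \in (0, \infty)$. Indeed,
this follows directly from writing
$$\mathbf{E}\big[|X^\varepsilon_{s,t}|^2\big] = \mathbf{E}\big[|\varepsilon \tilde{X}_{\varepsilon^{-2}s, \varepsilon^{-2}t}|^2\big] \lesssim \varepsilon^2 \big|\varepsilon^{-2}t - \varepsilon^{-2}s\big| = |t-s| ,$$
(note the uniformity in $\varepsilon$), and similarly for the second moment of the iterated
integral.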
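In order to check (3.10), it is enough to note that $M\tilde{X}_{s,t} = \tilde{B}_{s,t} - \tilde{Y}_{s,t}$, combined
with the estimate
$$\mathbf{E}\big[|\tilde{Y}_{s,t}|^2\big] = \mathbf{E}\big|(e^{-M(t-s)} - I)\tilde{Y}_s\big|^2 + \int_0^{t-s} \mathrm{Tr}\big(e^{-Mu} e^{-M^* u}\big)\, du \lesssim |t-s| ,$$
where we used the fact that $\mathrm{Real}\{\sigma(M)\} \subset (0, \infty)$ to get a uniform bound. In order
to control (3.11), we consider one of the components and write
$$\begin{aligned}
\mathbf{E}\Big[\Big(\int_s^t \tilde{X}^i_{s,u}\, d\tilde{X}^j_u\Big)^2\Big]
&= \mathbf{E}\Big[\Big(\int_s^t \int_s^u \tilde{Y}^i_r \tilde{Y}^j_u\, dr\, du\Big)^2\Big] \\
&= \int_{[s,t]^4} \mathbf{E}\big[\tilde{Y}^i_r \tilde{Y}^j_u \tilde{Y}^i_q \tilde{Y}^j_v\big]\, 1_{\{r \le u;\, q \le v\}}\, dr\, du\, dq\, dv \\
&\le \int_{[s,t]^4} \Big( \big|\mathbf{E}[\tilde{Y}^i_r \tilde{Y}^j_u]\, \mathbf{E}[\tilde{Y}^i_q \tilde{Y}^j_v]\big| + \big|\mathbf{E}[\tilde{Y}^i_r \tilde{Y}^i_q]\, \mathbf{E}[\tilde{Y}^j_u \tilde{Y}^j_v]\big| \\
&\qquad\qquad + \big|\mathbf{E}[\tilde{Y}^i_r \tilde{Y}^j_v]\, \mathbf{E}[\tilde{Y}^j_u \tilde{Y}^i_q]\big| \Big)\, dr\, du\, dq\, dv \\
&\lesssim \Big(\int_{[s,t]^2} \big|\mathbf{E}[\tilde{Y}_r \otimes \tilde{Y}_u]\big|\, dr\, du\Big)^2 \\
&\lesssim \Big(\int_{[s,t]^2} \big|\mathbf{E}[\tilde{Y}_r \otimes \tilde{Y}_u]\big|\, 1_{\{r \le u\}}\, dr\, du\Big)^2 ,
\end{aligned}$$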

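where we have used the fact that $\tilde{Y}$ is Gaussian (which yields Wick’s formula for the
expectation of products) in order to get the bound on the third line. But for $r \le u$,
$\mathbf{E}\big(\tilde{Y}_u \,\big|\, \tilde{Y}_r\big) = e^{-M(u-r)} \tilde{Y}_r$, so that
$$\begin{aligned}
\int_{[s,t]^2} \big|\mathbf{E}[\tilde{Y}_r \otimes \tilde{Y}_u]\big|\, 1_{\{r \le u\}}\, dr\, du
&= \int_{[s,t]^2} \big|\mathbf{E}[\tilde{Y}_r \otimes e^{-M(u-r)} \tilde{Y}_r]\big|\, 1_{\{r \le u\}}\, dr\, du \\
&\lesssim \int_s^t \int_r^t e^{-\lambda(u-r)}\, du\; \mathbf{E}|\tilde{Y}_r|^2\, dr \lesssim |t-s| .
\end{aligned}$$
It now suffices to recall that $|\exp(-\tau M)| = O(\exp(-\lambda\tau))$ to conclude the proof of
(3.11).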
Step 3. (Rough path convergence in $L^q$.) The remainder of the proof is an easy
application of interpolation, along the lines of Exercise 2.9. □

3.5 Cubature on Wiener Space

Quadrature rules replace Lebesgue measure $\lambda$ on $[0, 1]$ by a finite, convex linear
combination of point masses, say $\mu = \sum a_i \delta_{x_i}$, where weights $(a_i)$ and points $(x_i)$
are chosen such that all monomials (and hence all polynomials) up to degree $N$ are
correctly evaluated. In other words, one first computes the moments of $\lambda$, namely
$$\int_0^1 x^n\, d\lambda(x) = \frac{1}{n+1} ,$$
for all $n \ge 0$. One then looks for a measure $\mu$ such that $\int_0^1 x^n\, d\mu(x) = 1/(n+1)$
for all $n \in \{0, 1, \dots, N\}$. The same can be done on Wiener space: the monomial
$x^n$ is then replaced by the $n$-fold iterated integrals (in the sense of Stratonovich), and
integration is on $C([0, T], \mathbf{R}^d)$ against standard $d$-dimensional Wiener measure. In
order to find such cubature formulae, the mandatory first step, on which we focus
here, is the computation of the expectations of the $n$-fold iterated integrals²
$$\mathbf{E}\Big(\int_{0 < t_1 < \dots < t_n < T} \circ dB \otimes \cdots \otimes \circ dB\Big) .$$
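
The classical quadrature idea recalled above (before passing to Wiener space) can be made concrete with a small sketch. It assumes the two-point Gauss–Legendre rule transported to $[0,1]$, which is one standard choice of weights and points; the names `points`, `weights` and `moment` are illustrative only. This rule reproduces the moments $1/(n+1)$ for $n \le 3$ and fails from $n = 4$ on.

```python
import math

# Two-point Gauss-Legendre rule on [0, 1]: equal weights 1/2 at the
# points 1/2 -/+ 1/(2*sqrt(3)).
points = [0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0)]
weights = [0.5, 0.5]

def moment(n):
    """n-th moment of the discrete measure mu = sum_i a_i * delta_{x_i}."""
    return sum(a * x ** n for a, x in zip(weights, points))

# Discrepancy between the moments of mu and those of Lebesgue measure,
# int_0^1 x^n dx = 1 / (n + 1), for n = 0, ..., 4.
errors = [abs(moment(n) - 1.0 / (n + 1)) for n in range(5)]
```

The first four entries of `errors` vanish up to rounding, so this $\mu$ integrates all polynomials of degree at most $N = 3$ exactly; the fifth entry is visibly non-zero.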

Let us combine all of these integrals into one single object (also called the “signature”
of Brownian motion) by writing
$$S(B)_{0,T} = 1 + \sum_{n \ge 1} \int_{0 < t_1 < \dots < t_n < T} \circ dB \otimes \cdots \otimes \circ dB .$$
The signature $S(B)_{0,T}$ naturally takes values in the tensor algebra $T((\mathbf{R}^d)) = \bigoplus_{n \ge 0} (\mathbf{R}^d)^{\otimes n}$. It turns out that in the case of Brownian motion, the expected signature
can be expressed in a particularly concise and elegant form.

Theorem 3.9 (Fawcett). Consider $S(B)_{0,T}$ as above as a $T((\mathbf{R}^d))$-valued random
variable. Then
$$\mathbf{E} S(B)_{0,T} = \exp\Big(\frac{T}{2} \sum_{i=1}^{d} e_i \otimes e_i\Big) .$$
Proof. (Shekhar) Set ϕt := ES(B)0,t . (It is not hard to see, by Wiener–Itô chaos
integrability or otherwise, that all involved iterated integrals are integrable so that ϕ
is well-defined.) By Chen’s formula (in its general form, see Exercise 2.6) and the
independence of Brownian increments, one has the identity

ϕt+s = ϕt ⊗ ϕs .

Since $\varphi_t \otimes \varphi_s = \varphi_s \otimes \varphi_t$, we have $[\varphi_s, \varphi_t] = 0$, so that
$$\log \varphi_{t+s} = \log \varphi_t + \log \varphi_s .$$
For integers $m, n$ we have $\log \varphi_m = n \log \varphi_{m/n}$ and $\log \varphi_m = m \log \varphi_1$. It follows
that
$$\log \varphi_t = t \log \varphi_1 ,$$
first for $t = \frac{m}{n} \in \mathbf{Q}$, then for any real $t$ by continuity. On the other hand, for $t > 0$,
Brownian scaling implies that $\varphi_t = \delta_{\sqrt{t}}\, \varphi_1$ where $\delta_\lambda$ is the dilatation operator, which
acts by multiplication with $\lambda^n$ on the $n$th tensor level, $(\mathbf{R}^d)^{\otimes n}$. Since $\delta_\lambda$ commutes
with $\otimes$ (and thus also with $\log$, defined as power series),
$$\log \varphi_t = \delta_{\sqrt{t}} \log \varphi_1$$
and it follows that one necessarily has
$$\log \varphi_1 \in (\mathbf{R}^d)^{\otimes 2} .$$
It remains to identify $\log \varphi_1$ with $\frac{1}{2} \sum_{i=1}^{d} e_i \otimes e_i$. To this end it suffices to compute
the expected signature up to level two, which yields
$$\mathbf{E} S^{(2)}(B) = \mathbf{E}\Big(1 + B_{0,1} + \int_0^1 B \otimes \circ dB\Big) = 1 + \frac{1}{2} \sum_{i=1}^{d} e_i \otimes e_i .$$
Recall that in this expression, “1” is identified with $(1, 0, 0)$ in the truncated tensor
algebra, and similarly for the other summands, and addition also takes place in
$T^{(2)}(\mathbf{R}^d)$. Taking the logarithm (in the tensor algebra truncated beyond level 2; in
this case $\log(1 + a + b) = a + b - \frac{1}{2} a \otimes a$ if $a$ is a 1-tensor, $b$ a 2-tensor) then
immediately gives the desired identification. □

² We remark that all $n$-fold iterated Stratonovich integrals can be obtained from the “level-2” rough
path $(B(\omega), \mathbb{B}^{\mathrm{Strat}}(\omega)) \in \mathscr{C}^\alpha_g$ by a continuous map. In fact, this so-called Lyons lift allows one to view
any geometric rough path as a “level-$n$” rough path for arbitrary $n \ge 2$.

The (constructive) existence of cubature formulae, that is, of a finite family of piecewise
smooth paths with associated probabilities chosen so as to mimic the behaviour of the
expected signature up to a given level, is not a trivial problem (although much has
been achieved to date); the reader can explore a simple case in Exercise 3.24 below.

3.6 Scaling limits of random walks

Consider a family of continuous processes $\mathbf{X}^n = (X^n, \mathbb{X}^n)$, with values in
$V \oplus (V \otimes V)$ where $\dim V < \infty$. Assume $\mathbf{X}^n_0 = (0, 0)$ for all $n$. We leave the proof
of the following result as an exercise.

Theorem 3.10 (Kolmogorov tightness criterion for rough paths). Let $q \ge 2$, $\beta > 1/q$. Assume, for all $s, t$ in $[0, T]$,
$$\mathbf{E}_n \big|X^n_{s,t}\big|^q \le C|t-s|^{\beta q} , \qquad \mathbf{E}_n \big|\mathbb{X}^n_{s,t}\big|^{q/2} \le C|t-s|^{\beta q} , \tag{3.12}$$
for some constant $C < \infty$. Assume $\beta - \frac{1}{q} > \frac{1}{3}$. Then for every $\alpha \in \big(\frac{1}{3}, \beta - \frac{1}{q}\big)$, the
$\mathbf{X}^n$’s are tight in $\mathscr{C}^{0,\alpha}$.

In typical applications, the $X^n$ are only defined for discrete times, such as $s = j/n$, $t = k/n$ for integers $j, k$. The non-trivial work then consists, for a suitable
choice of $\mathbb{X}^n$, in checking the following discrete tightness estimates,
$$\mathbf{E}_n \Big|X^n_{\frac{j}{n}, \frac{k}{n}}\Big|^q \le C \Big|\frac{j-k}{n}\Big|^{\beta q} , \qquad \mathbf{E}_n \Big|\mathbb{X}^n_{\frac{j}{n}, \frac{k}{n}}\Big|^{q/2} \le C \Big|\frac{j-k}{n}\Big|^{\beta q} . \tag{3.13}$$
The analogous continuous tightness estimates are typically obtained by suitable
extension of $\mathbf{X}^n$ to continuous times (e.g. piecewise geodesic).

Proposition 3.11. Consider a d-dimensional random walk (Xj : j ∈ N), with i.i.d.
increments of zero mean, finite moments of any order q < ∞, and unit covariance
matrix. Extend the rescaled random walk
$$X^n_{\frac{j}{n}} := \frac{1}{\sqrt{n}} X_j ,$$
defined on discrete times only, by piecewise linear interpolation to all times and
construct $\mathbf{X}^n = (X^n, \mathbb{X}^n)$ by iterated (Riemann–Stieltjes) integration. Then the
tightness estimates in Theorem 3.10 hold with $\beta = 1/2$ and all $q < \infty$.

Proof. The iterated integrals of a linear (or affine) path with increment $v \in \mathbf{R}^d$ take
the simple form $\exp(v)$ in terms of the tensor exponential introduced in (2.8). Chen’s
relation then implies
$$\mathbf{X}^n_{\frac{j}{n}, \frac{k}{n}} = \exp\Big(X^n_{\frac{j}{n}, \frac{j+1}{n}}\Big) \otimes \cdots \otimes \exp\Big(X^n_{\frac{k-1}{n}, \frac{k}{n}}\Big) . \tag{3.14}$$
The simple calculus on the level-2 tensor algebra $T^{(2)}(\mathbf{R}^d)$ leads to an explicit
expression for $\mathbb{X}^n_{\frac{j}{n}, \frac{k}{n}}$, to which one can apply the (discrete) Burkholder–Davis–Gundy
inequality in order to get the discrete tightness estimates (3.13). The extension to
all times is straightforward. Details are left to the reader (see e.g. [BF13]). An
alternative argument, not restricted to level 2, is found in [BFH09]. □
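
The product structure in (3.14) is easy to experiment with. The sketch below is an illustration only: it represents elements of the truncated tensor algebra at level 2 as triples `(1, vector, matrix)`, and the function names `tensor_exp`, `tensor_mul` and `signature_level2` are hypothetical helpers introduced here, not notation from the text. It multiplies the tensor exponentials of the linear pieces of a piecewise linear path, as Chen's relation prescribes.

```python
def tensor_exp(v):
    """Level-2 truncated tensor exponential of a vector v: (1, v, v (x) v / 2)."""
    d = len(v)
    return (1.0, list(v), [[0.5 * v[i] * v[j] for j in range(d)] for i in range(d)])

def tensor_mul(g, h):
    """Product in the truncated tensor algebra T^(2)(R^d)."""
    _, a1, a2 = g
    _, b1, b2 = h
    d = len(a1)
    level1 = [a1[i] + b1[i] for i in range(d)]
    # Level 2 collects a2, b2 and the cross term a1 (x) b1.
    level2 = [[a2[i][j] + a1[i] * b1[j] + b2[i][j] for j in range(d)]
              for i in range(d)]
    return (1.0, level1, level2)

def signature_level2(increments):
    """Chen's relation: multiply the exponentials of the linear pieces in order."""
    result = tensor_exp([0.0] * len(increments[0]))
    for v in increments:
        result = tensor_mul(result, tensor_exp(v))
    return result

# Example path in R^2: one unit step to the right, then one unit step up.
steps = [[1.0, 0.0], [0.0, 1.0]]
_, x, xx = signature_level2(steps)

# Antisymmetric part of the second level: the (Levy) area of the path.
area = 0.5 * (xx[0][1] - xx[1][0])
```

For this two-step path the symmetric part of `xx` equals half the tensor square of the total increment `x`, as it must for a geometric rough path, while `area` records the signed area enclosed between the path and its chord.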

Note that $\mathbf{X}^n$, as constructed above, is a (random) geometric rough path. Recall
that such rough paths can be viewed as genuine paths with values in the Lie group
$G^{(2)}(\mathbf{R}^d) \subset T^{(2)}(\mathbf{R}^d)$. On the other hand, from (3.14), we see that $\mathbf{X}^n$ restricted
to discrete times $\{\frac{j}{n} : j \in \mathbf{N}\}$ is a Lie group valued random walk, rescaled with
the aid of the dilatation operator. By using central limit theorems available on such
Lie groups, one can see that $\mathbf{X}^n$ at unit time converges weakly to Brownian motion,
enhanced with its iterated integrals in the Stratonovich sense. Under the additional
assumption that $\mathbf{E}(X \otimes X) = I$, the identity matrix, this Brownian motion is in fact
a standard Brownian motion. This is enough to characterise the finite-dimensional
distributions of any weak limit point and one has the following “Donsker” type result.

Theorem 3.12. In the rescaled random walk setting of Proposition 3.11, and under
the additional assumption that $\mathbf{E}(X \otimes X) = I$, we have the weak convergence
$$\mathbf{X}^n \Longrightarrow \mathbf{B}^{\mathrm{Strat}}$$
in the rough path space $\mathscr{C}^\alpha([0, T], \mathbf{R}^d)$, any $\alpha < 1/2$.

Recall that, by definition, weak convergence is stable under push-forward by

continuous maps. The interest in this result is therefore clearly given by the fact that
stochastic integrals and the Itô map can be viewed as continuous maps on rough path
spaces, as will be discussed in later chapters.

3.7 Exercises

Exercise 3.13. Complete the proof of Theorem 3.3.

Exercise 3.14. Bypass the use of Wiener–Itô chaos integrability in Proposition 3.4
by showing directly that the matrix-valued random variable $\mathbb{B}^{\mathrm{Itô}}_{0,1}$ has moments of all
orders. Hint: this is trivial for the on-diagonal entries; for the off-diagonal entries
one can argue via conditioning, Itô isometry, and reflection principle.

Exercise 3.15. Show that d-dimensional Brownian motion B enhanced with Lévy’s
stochastic area is a degenerate diffusion process and find its generator.

Exercise 3.16 (Q-Wiener process as rough path). Consider a separable Hilbert
space $H$ with orthonormal basis $(e_k)$, $(\lambda_k) \in l^1$, $\lambda_k > 0$ for all $k$, and a countable
sequence $(\beta^k)$ of independent standard Brownian motions. Then the limit
$$X_t := \sum_{k=1}^{\infty} \lambda_k^{1/2} \beta^k_t\, e_k$$
exists a.s. and in $L^2$, uniformly on compacts. This defines a $Q$-Wiener process in
the sense of [DPZ92], where $Q = \sum_k \lambda_k \langle e_k, \cdot \rangle e_k$ is symmetric, non-negative and
trace-class; conversely, any such operator $Q$ on $H$ can be written in this form and
thus gives rise to a $Q$-Wiener process. Show that
$$\mathbb{X}_{s,t} := \sum_{j,k=1}^{\infty} \lambda_j^{1/2} \lambda_k^{1/2} \int_s^t \beta^j_{s,r}\, d\beta^k_r\; e_j \otimes e_k$$
exists a.s. and in $L^2$, uniformly on compacts and so defines $\mathbb{X}$ with values in $H \otimes_{HS} H$,
the closure of the algebraic tensor product $H \otimes_a H$ under the Hilbert–Schmidt norm.
Consider both the case of Itô and Stratonovich integration and verify that with either
choice, $(X, \mathbb{X}) \in \mathscr{C}^\alpha$ a.s. for any $\alpha < 1/2$.

Exercise 3.17 (Banach-valued Brownian motion as rough path [LLQ02]). Consider
a separable Banach space $V$ equipped with a centred Gaussian measure $\mu$. By
a standard construction (cf. [Led96]) this gives rise to a so-called abstract Wiener
space $(V, H, \mu)$, with $H \subset V$ the Cameron–Martin space of $\mu$. (Examples to have in
mind are $V = H = \mathbf{R}^d$ with $\mu = N(0, I)$, or the usual Wiener space $V = C([0, 1])$
equipped with Wiener measure, $H$ is then the space of absolutely continuous paths
starting at zero with $L^2$-derivative.) There then exists a $V$-valued Brownian motion
$(B_t : t \in [0, T])$ such that
• $B_0 = 0$,
• $B$ has independent increments,
• $\langle B_{s,t}, v^* \rangle \sim N\big(0, (t-s)|v^*|_H^2\big)$ whenever $0 \le s < t \le T$ and $v^* \in V^* \hookrightarrow H^* \cong H$.
We assume that $V \otimes V$ is equipped with an exact tensor norm (with respect to $\mu$)
in the sense that there exists $\gamma \in [1/2, 1)$ and a constant $C > 0$ such that for any
sequence $\{G_k \otimes \tilde{G}_k : k \ge 1\}$ of independent $V$-valued Gaussian random variables
with identical distribution $\mu$,
$$\mathbf{E}\Big[\Big|\sum_{k=1}^{N} G_k \otimes \tilde{G}_k\Big|^2_{V \otimes V}\Big] \le C N^{2\gamma} = o(N) .$$

a) Verify that exactness holds with γ = 1/2 whenever dim V < ∞. (More generally,
exactness with γ = 1/2 always holds true if one works with the injective tensor
product space, V ⊗inj V , the injective norm being the smallest possible. For the
largest possible norm, the projective norm, the o(N )-estimate remains true but
can be as slow as one wishes; exactness may then fail; cf. [LLQ02]. Exactness
of the usual Wiener-space, with uniform or Hölder norm, is also known to be true.)

b) Fix $\alpha < 1/2$. Show that dyadic piecewise linear approximations $B^n$, enhanced
with $\mathbb{B}^n = \int B^n \otimes dB^n$, converge in α-Hölder rough path metric to a limit
$\mathbf{B}$ in $\mathscr{C}^\alpha([0, T], V)$. More precisely, use the previous exercise to show that the
sequence $\mathbf{B}^n = (B^n, \mathbb{B}^n)$ is Cauchy in the sense that
$$\big|\varrho_\alpha(\mathbf{B}^n, \mathbf{B}^m)\big|_{L^q} \to 0 \quad \text{with } n, m \to \infty .$$
Conclude that $\mathbf{B}^n$ converges in $\mathscr{C}^\alpha$ and $L^q$ to some limit $\mathbf{B} \in \mathscr{C}^\alpha([0, T], V)$ a.s.

c) Show that B is the Lq -limit in α-Hölder rough path metric for all piecewise linear
approximations, say B Dn , as long as mesh |Dn | → 0 with n → ∞. Show that
the convergence is almost sure if |Dn | ∼ 2−n and also |Dn | ∼ 1/n.

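Solution 3.18. We only sketch the main step in the proof of b). Without loss of
generality, we set $T = 1$. The crux of the matter is to show that $\mathbb{B}^n_{0,1}$ converges in
$V \otimes V$. The rest follows from scaling and equivalence of moments in the first two
Wiener chaoses. Set $t^n_k = k/2^n$. Then
$$\begin{aligned}
\big\|\mathbb{B}^{n+1}_{0,1} - \mathbb{B}^{n}_{0,1}\big\|^2_{L^2}
&\sim \mathbf{E}\Big|\sum_{k=1}^{2^n} B_{t^{n+1}_{2k-2}, t^{n+1}_{2k-1}} \otimes B_{t^{n+1}_{2k-1}, t^{n+1}_{2k}}\Big|^2_{V \otimes V} \\
&\sim \frac{1}{2^{2n+2}}\, \mathbf{E}\Big|\sum_{k=1}^{2^n} 2^{\frac{n+1}{2}} B_{t^{n+1}_{2k-2}, t^{n+1}_{2k-1}} \otimes 2^{\frac{n+1}{2}} B_{t^{n+1}_{2k-1}, t^{n+1}_{2k}}\Big|^2_{V \otimes V} \\
&\sim 2^{-2n-2}\, \mathbf{E}\Big|\sum_{k=1}^{2^n} G_k \otimes \tilde{G}_k\Big|^2_{V \otimes V} \\
&\lesssim 2^{-2n-2}\, 2^{2\gamma n} \\
&\sim 2^{-2n(1-\gamma)} ,
\end{aligned}$$
where the penultimate bound was obtained by exactness. By definition of exactness
$1 - \gamma > 0$ and so $\mathbb{B}^n_{0,1}$ is Cauchy in the $L^2$-space of $V \otimes V$-valued random variables.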
Exercise 3.19. In the context of Theorem 3.8, assume $M$ normal and show that the
Lévy area correction takes the form
$$A = \frac{1}{2}\, \mathrm{Anti}(M)\, \mathrm{Sym}(M)^{-1}$$
and conclude that the correction is zero if and only if $M$ is symmetric. Is this also
true without the assumption that $M$ is normal?
Exercise 3.20. In the context of Theorem 3.8, show that “physical Brownian motion
with mass $m$” converges as $m \to 0$, in $\varrho_\alpha$ and $L^q$, $\alpha \in (1/3, 1/2)$ and $q < \infty$, with
rate
$$O\Big(\frac{1}{m^{\theta}}\Big) , \quad \text{any } \theta < 1/2 - \alpha .$$
Hint: Use Theorem 3.3 to show rough path convergence. (The computations are a
little longer, but of similar type, with the additional feature that the use of the ergodic
theorem can be avoided.)
Exercise 3.21. Consider physical Brownian motion in dimension $d = 2$, with
$$M = I - \alpha \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} , \qquad \alpha \in \mathbf{R} .$$
Show that the area correction of $X^m$, in the (small mass) limit $m \to 0$, is given
by
$$\frac{\alpha}{2(1 + \alpha^2)} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} .$$
(This correction is computed by multiscale / homogenisation techniques in the book
[PS08].)
Exercise 3.22. Consider $X_t = bt + \sigma B_t$ where $b \in \mathbf{R}^d$, $a = \sigma\sigma^* \in (\mathbf{R}^d)^{\otimes 2}$. In other
words, $X$ is a Lévy process with triplet $(a, b, 0)$. Show that the expected signature of
$X$ over $[0, T]$ is given by
$$\mathbf{E} S(X)_{0,T} = \exp\Big(T\Big(b + \frac{1}{2} a\Big)\Big) .$$
Here, the exponential should be interpreted as the exponential in the tensor algebra,
i.e.
$$\exp(u) = 1 + u + \frac{1}{2!}\, u \otimes u + \frac{1}{3!}\, u \otimes u \otimes u + \dots$$
Exercise 3.23 (Expected signature for Lévy processes [FS12b]). Consider a compound
Poisson process $Y$ with intensity $\lambda$ and jumps distributed like $J = J(\omega) \sim \nu$.
In other words, $Y$ is Lévy with triplet $(0, 0, K)$ where the Lévy measure is given by
$K = \lambda\nu$. A sample path of $Y$ gives rise to a piecewise linear, continuous path, simply
by connecting $J_1$, $J_1 + J_2$ etc. Show that, under a suitable integrability condition for
$J$,
$$\mathbf{E} S(Y)_{0,T} = \exp\big(T \lambda\, \mathbf{E}(e^{J} - 1)\big) .$$
Can you handle the case of a general Lévy process?
Exercise 3.24 (Level-3 cubature formula). Define a measure $\mu$ on $C([0, 1], \mathbf{R}^d)$ by
assigning equal weight $2^{-d}$ to each of the paths
$$t \mapsto t \begin{pmatrix} \pm 1 \\ \pm 1 \\ \vdots \\ \pm 1 \end{pmatrix} \in \mathbf{R}^d .$$
Call the resulting process $(X_t(\omega) : t \in [0, 1])$ and compute the expected signature
up to level 3, that is
$$\mathbf{E}\Big(1,\; X_{0,1},\; \int_{0 < t_1 < t_2 < 1} dX_{t_1} \otimes dX_{t_2},\; \int_{0 < t_1 < t_2 < t_3 < 1} dX_{t_1} \otimes dX_{t_2} \otimes dX_{t_3}\Big) .$$
Compare with the expected signature of Brownian motion, the tensor exponential
$\exp(\frac{1}{2} I)$, projected to the first 3 levels.
Solution 3.25. $X_t(\omega) = t \sum_i Z_i(\omega) e_i$ with i.i.d. random variables $Z_i$ taking values
$+1, -1$ with equal probability. Clearly,
$$\mathbf{E} \int_{0 < t_1 < 1} dX_{t_1} = \mathbf{E} X_{1} = 0 .$$
Then,
$$\int_{0 < t_1 < t_2 < 1} dX_{t_1} \otimes dX_{t_2} = \frac{1}{2} \sum_{i,j} Z_i Z_j\, e_i \otimes e_j = \frac{1}{2} I + (\text{zero mean})$$
and so the expected value at level 2 matches $\pi_2 \exp(\frac{1}{2} I) = \frac{1}{2} I$. A similar expansion
on level 3 shows that every summand either contains, for some $i$, a factor $\mathbf{E} Z_i = 0$ or
$\mathbf{E} Z_i^3 = 0$. In other words, the expected signature at level 3 is zero, in agreement
with $\pi_3 \exp(\frac{1}{2} I) = 0$. We conclude that the expected signatures, of $\mu$ on the one
hand and Wiener measure on the other hand, agree up to level 3.
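
The computation in this solution can also be checked by brute force. The sketch below is only an illustration, assuming $d = 3$ and using exact rational arithmetic; the function name `expected_signature` is hypothetical. It enumerates all $2^d$ sign vectors $z$, uses the fact that the signature of the linear path $t \mapsto tz$ over $[0,1]$ is the tensor exponential $\exp(z)$, and averages the first three levels.

```python
from fractions import Fraction
from itertools import product

def expected_signature(d):
    """Exact expected signature, up to level 3, of the paths t -> t*z, z in {-1,+1}^d."""
    weight = Fraction(1, 2 ** d)
    level1 = [Fraction(0)] * d
    level2 = [[Fraction(0)] * d for _ in range(d)]
    level3 = [[[Fraction(0)] * d for _ in range(d)] for _ in range(d)]
    for z in product((-1, 1), repeat=d):
        # Levels of exp(z): z, (z (x) z) / 2!, (z (x) z (x) z) / 3!.
        for i in range(d):
            level1[i] += weight * z[i]
            for j in range(d):
                level2[i][j] += weight * Fraction(z[i] * z[j], 2)
                for k in range(d):
                    level3[i][j][k] += weight * Fraction(z[i] * z[j] * z[k], 6)
    return level1, level2, level3

l1, l2, l3 = expected_signature(3)
```

The averages come out as zero at level 1, $\frac{1}{2} I$ at level 2 and zero at level 3, in line with the projection of $\exp(\frac{1}{2} I)$ onto the first three levels.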

Exercise 3.26. Prove the Kolmogorov tightness criterion, Theorem 3.10.

3.8 Comments

The modification of Kolmogorov’s criterion for rough paths (Theorem 3.1) is a minor
variation on a rather well-known theme. Rough path regularity of Brownian motion
was first established in the thesis of Sipiläinen, [Sip93].
For extensions to infinite dimensional Wiener processes (and also convergence
of piecewise linear approximations in rough path sense) see Ledoux, Lyons and
Qian [LLQ02] and Dereich [Der10]; much of the interest here is to go beyond the
Hilbert space setting. The resulting stochastic integration theory against Banach-
space valued Brownian motion, which in essence cannot be done by classical methods,
has proven crucial in some recent applications (cf. the works of Kawabi–Inahama
[IK06], Dereich [Der10]).
Early proofs of Brownian rough path regularity were typically established by
convergence of dyadic piecewise linear approximations to (B, BStrat ) in (p-variation)
rough path metric; see e.g. Lyons–Qian [LQ02]. Many other “obvious” (but as we
have seen: not all reasonable) approximations are seen to yield the same Brownian
rough path limit. The discussion of Brownian motion in a magnetic field follows
closely Friz, Gassiat and Lyons [FGL13]. Continuous semi-martingales and large
classes of multidimensional Gaussian – and Markovian – processes lift to random
rough paths; convergence of piecewise linear approximation in rough path topology
is also known to hold in great generality. See e.g. Friz–Victoir [FV10b]
and the references therein. The expected signature of Brownian motion was first
established in the thesis of Fawcett [Faw04]; different proofs were then given by
Lyons–Victoir, Baudoin and Friz–Shekhar, [LV04, Bau04, FS12b]. Fawcett’s formula
is central to the Kusuoka–Lyons–Victoir cubature method ([Kus01, LV04]). More
generally, expected signatures capture important aspects of the law of a stochastic
process. See Chevyrev [Che13]. The extension to Lévy processes, Exercise 3.23, is
taken from Friz–Shekhar [FS12b]. The computation of expected signatures of large
classes of stochastic processes including stopped Brownian motion and stochastic
Löwner equations is presently pursued by a number of people including Lyons–
Ni [LN11], Werness [Wer12] and Boedihardjo–Qian [BNQ13]. The Donsker type
theorem, Theorem 3.12, in uniform topology, is a consequence of Stroock–Varadhan
[SV73]; the rough path case is due to Breuillard, Friz, and Huesmann [BFH09].
Applications to cubature are discussed in [BF13].
Chapter 4
Integration against rough paths

Abstract The aim of this section is to give a meaning to the expression $\int Y_t\, dX_t$ for
a suitable class of integrands Y , integrated against a rough path X. We first discuss
the case originally studied by Lyons where Y = F (X). We then introduce the notion
of a controlled rough path and show that this forms a natural class of integrands.

4.1 Introduction
The aim of this chapter is to give a meaning to the expression $\int Y_t\, d\mathbf{X}_t$, for $\mathbf{X} \in \mathscr{C}^\alpha([0, T], V)$ and $Y$ some continuous function with values in $L(V, W)$, the space
of bounded linear operators from $V$ into some other Banach space $W$. Of course,
such an integral cannot be defined for arbitrary continuous functions $Y$, especially if
we want the map $(\mathbf{X}, Y) \mapsto \int Y\, d\mathbf{X}$ to be continuous in the relevant topologies. We
therefore also want to identify a “good” class of integrands $Y$ for the rough path $\mathbf{X}$.
A natural approach would be to try to define the integral as a limit of Riemann–Stieltjes
sums, that is
$$\int_0^1 Y_t\, dX_t = \lim_{|\mathcal{P}| \to 0} \sum_{[s,t] \in \mathcal{P}} Y_s X_{s,t} , \tag{4.1}$$
where $\mathcal{P}$ denotes a partition of $[0, 1]$ (interpreted as a finite collection of essentially
disjoint intervals such that $\bigcup \mathcal{P} = [0, 1]$) and $|\mathcal{P}|$ denotes the length of the largest
element of $\mathcal{P}$. Such a definition, the Young integral, has been studied in detail in
the seminal paper by Young [You36], where it was shown that such a sum converges
if $X \in C^\alpha$ and $Y \in C^\beta$, provided $\alpha + \beta > 1$, and that the resulting bilinear map
is continuous. This result is sharp in the sense that one can construct sequences of
smooth functions $Y^n$ and $X^n$ such that $Y^n \to 0$ and $X^n \to 0$ in $C^{1/2}([0, 1], \mathbf{R})$, but
such that $\int Y^n\, dX^n \to \infty$.
As a consequence of Young’s inequality [You36], one has the bound
$$\Big|\int_0^1 (Y_r - Y_0)\, dX_r\Big| \le C \|Y\|_{\beta;[0,1]} \|X\|_{\alpha;[0,1]} , \tag{4.2}$$
with $C$ depending on $\alpha + \beta > 1$. Given paths $X, Y$ defined on $[s, t]$ rather than $[0, 1]$
it is an easy consequence of the scaling properties of Hölder semi-norms, that
$$\Big|\int_s^t Y_r\, dX_r - Y_s X_{s,t}\Big| \le C \|Y\|_\beta \|X\|_\alpha |t-s|^{\alpha+\beta} . \tag{4.3}$$
In particular, when $\alpha = \beta > 1/2$, the right hand side is proportional to $|t-s|^{2\alpha} = o(|t-s|)$ which is to be compared with the estimate (4.20) below.
The main insight of the theory of rough paths is that this seemingly insurmountable
barrier of $\alpha + \beta > 1$ (which reduces to $\alpha > 1/2$ in the case $\alpha = \beta$ which
is our main interest¹) can be broken by adding additional structure to the problem.
Indeed, for a rough path $\mathbf{X}$, we postulate the values $\mathbb{X}_{s,t}$ of the integral of $X$ against
itself, see (2.2). It is then intuitively clear that one should be able to define $\int Y\, d\mathbf{X}$
in a consistent way, provided that $Y$ “looks like $X$”, at least on very small scales (in
the precise sense of (4.16) below). The easiest way for a function $Y$ to “look like
$X$” is to have $Y_t = F(X_t)$ for some sufficiently smooth $F : V \to L(V, W)$, called a
1-form.

4.2 Integration of 1-forms

We aim to integrate $Y = F(X)$ against $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha$. When $F : V \to L(V, W)$
is in $C^1$, or better, a Taylor approximation gives
$$F(X_r) \approx F(X_s) + DF(X_s) X_{s,r} , \tag{4.4}$$
for $r$ in some (small) interval $[s, t]$, say. Recall (see Sections 1.4 and 1.5 concerning
the infinite-dimensional case) that²
$$L(V, L(V, W)) \cong L(V \otimes V, W) ,$$
so that $DF(X_s)$ may be regarded as an element in $L(V \otimes V, W)$. Since the Young
integral defined in (4.1), when applied to $Y = F(X)$, is effectively based on the
approximation $F(X_r) \approx F(X_s)$, for $r \in [s, t]$, it is natural to hope, with a motivating
look at (2.2), that the compensated Riemann–Stieltjes sum appearing at the right-hand
side of
$$\int_0^1 F(X_s)\, d\mathbf{X}_s \approx \sum_{[s,t] \in \mathcal{P}} F(X_s) X_{s,t} + DF(X_s) \mathbb{X}_{s,t} , \tag{4.5}$$
provides a good enough approximation (say, is Cauchy as $|\mathcal{P}| \to 0$) even when
$X$ ceases to have α-Hölder regularity for $\alpha > 1/2$ (as required by Young theory),
but assuming instead $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha$, $\alpha \in (\frac{1}{3}, \frac{1}{2}]$. Why should this be good
enough? The intuition is as follows: given $\alpha \in (\frac{1}{3}, \frac{1}{2}]$ neither $|X_{s,t}| \sim |t-s|^{\alpha}$ nor
$|\mathbb{X}_{s,t}| \sim |t-s|^{2\alpha}$ in the above sum will be negligible as $|\mathcal{P}| \to 0$. Continuing in the
same fashion, one expects (in fact one can show it) that the third iterated integral
$\mathbb{X}^{(3)}_{s,t}$ is of order $|\mathbb{X}^{(3)}_{s,t}| \sim |t-s|^{3\alpha} = o(|t-s|)$, so that adding a third term of the form
$D^2F(X_s) \mathbb{X}^{(3)}_{s,t}$ in the sum of (4.5), at the very least, will not affect any limit, should
it exist. In the following, we will see that this limit,³
$$\int_0^1 F(X_s)\, d\mathbf{X}_s = \lim_{|\mathcal{P}| \to 0} \sum_{[s,t] \in \mathcal{P}} F(X_s) X_{s,t} + DF(X_s) \mathbb{X}_{s,t} , \tag{4.6}$$
does exist⁴ and call it the rough integral. In fact, in this section we shall construct the
(indefinite) rough integral $Z = \int_0^\cdot F(X)\, d\mathbf{X}$ as an element in $C^\alpha$, i.e. as a path, similar
to the construction of stochastic integrals as processes rather than random variables.
Even this may not be sufficient in applications: one often wants to have an extended
meaning of the rough integral, such as $(Z, \mathbb{Z}) \in \mathscr{C}^\alpha$, a point of view emphasised in
[Lyo98, LQ02, LCL07], or something similar (such as “$Z$ controlled by $X$” in the
sense of Definition 4.6 below, to be discussed in the next section).

¹ ... but see Exercise 4.25.
² In coordinates, when $\dim V, \dim W < \infty$, $G = DF(X_s)$ takes the form of a $(1, 2)$-tensor
$(G^k_{i,j})$ and the identification amounts to
$$v \mapsto \Big(\tilde{v} \mapsto \Big(\sum_{i,j} G^k_{i,j} v^i \tilde{v}^j\Big)_k\Big) \quad \text{versus} \quad M \mapsto \Big(\sum_{i,j} G^k_{i,j} M^{i,j}\Big)_k .$$
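
The role of the second-order term in the compensated sum (4.6) can be seen in a toy computation. The sketch below is illustrative only and rests on several assumptions: the path is scalar, $F(x) = x$ (so that $DF = 1$), the second level is taken to be the geometric choice $\mathbb{X}_{s,t} = \frac{1}{2} X_{s,t}^2$, and a truncated Weierstrass-type series (a smooth function that is badly under-resolved at the chosen mesh) stands in for a genuinely rough signal. The names `weierstrass` and `riemann_sums` are hypothetical. In this setting each compensated summand equals $\frac{1}{2}(X_t^2 - X_s^2)$, so the compensated sum telescopes on any partition, whereas the plain left-point sum is off by half the sum of squared increments.

```python
import math

def weierstrass(t, alpha=0.4, levels=12):
    """Truncated Weierstrass-type series, a stand-in for an alpha-Hoelder path."""
    return sum(2.0 ** (-alpha * k) * math.sin(2.0 * math.pi * 2 ** k * t)
               for k in range(levels))

def riemann_sums(path, n):
    """Left-point and compensated sums for int_0^1 X dX on a uniform partition."""
    plain = 0.0
    compensated = 0.0
    for i in range(n):
        xs, xt = path(i / n), path((i + 1) / n)
        inc = xt - xs
        second = 0.5 * inc * inc          # geometric second level in one dimension
        plain += xs * inc                 # F(X_s) X_{s,t}
        compensated += xs * inc + second  # ... + DF(X_s) XX_{s,t}
    return plain, compensated

def X(t):
    return weierstrass(t) + t

plain, compensated = riemann_sums(X, 1000)
exact = 0.5 * (X(1.0) ** 2 - X(0.0) ** 2)
```

Here `compensated` agrees with `exact` up to rounding, while `plain` differs from it by an amount that does not become small at this resolution, which is the discrete shadow of the statement that $|\mathbb{X}_{s,t}| \sim |t-s|^{2\alpha}$ is not negligible when $\alpha \le 1/2$.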

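Lemma 4.1. Let $F : V \to L(V, W)$ be a $C^2_b$ function and let $(X, \mathbb{X}) \in \mathscr{C}^\alpha$ for some
$\alpha > \frac{1}{3}$. Set $Y_s := F(X_s)$, $Y'_s := DF(X_s)$ and $R^Y_{s,t} := Y_{s,t} - Y'_s X_{s,t}$. Then
$$Y, Y' \in C^\alpha \quad \text{and} \quad R^Y \in C_2^{2\alpha} . \tag{4.7}$$
(In the terminology of the forthcoming Definition 4.6: “$Y$ is controlled by $X$ with
Gubinelli derivative $Y'$; in symbols $(Y, Y') \in \mathscr{D}^{2\alpha}_X$”.) More precisely, we have the
estimates
$$\|Y\|_\alpha \le \|DF\|_\infty \|X\|_\alpha , \qquad \|Y'\|_\alpha \le \|D^2F\|_\infty \|X\|_\alpha , \qquad \|R^Y\|_{2\alpha} \le \frac{1}{2} \|D^2F\|_\infty \|X\|_\alpha^2 .$$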

3
Recall that lim|P|→0 means convergence along any sequence (Pn ) with mesh |Pn | → 0, with
identical limit along each such sequence. In particular, it is not enough to establish convergence
along a particular sequence (Pn ), although a particular sequence may be used to identify the limit.
4
Of course, we can and will consider intervals other than [0, 1]. Without further notice, P always
denotes a partition of the interval under consideration.

Proof. $C^2_b$ regularity of $F$ implies that $F$ and $DF$ are both Lipschitz continuous with
Lipschitz constants $\|DF\|_\infty$ and $\|D^2F\|_\infty$ respectively. The α-Hölder bounds on $Y$
and $Y'$ are then immediate. For the remainder term, consider the function
$$[0, 1] \ni \xi \mapsto F(X_s + \xi X_{s,t}) .$$
A Taylor expansion, with intermediate value remainder, yields $\xi \in (0, 1)$ such that
$$R^Y_{s,t} = F(X_t) - F(X_s) - DF(X_s) X_{s,t} = \frac{1}{2} D^2F(X_s + \xi X_{s,t})(X_{s,t}, X_{s,t}) .$$
The claimed 2α-Hölder estimate, in the sense that $|R^Y_{s,t}| \lesssim |t-s|^{2\alpha}$, then follows at
once. □

Before we prove that the rough integral (4.6) exists, we discuss some sort of
abstract Riemann integration. In what follows, at first reading, one may have in mind
the construction of a Riemann-Stieltjes (or Young) integral $Z_t := \int_0^t Y_r \, dX_r$. From
Young’s inequality (4.3), one has (with $Z_{s,t} = Z_t - Z_s$ as usual)

\[ Z_{s,t} = Y_s X_{s,t} + o(|t - s|) \]

and $\Xi_{s,t} := Y_s X_{s,t}$ is a sufficiently good local approximation in the sense that it
fully determines the integral Z via the limiting procedure given in (4.1). In this
sense $Z = I\Xi$ is the well-defined image of Ξ under some abstract integration map
I. Note that $Z_{s,t} = Z_{s,u} + Z_{u,t}$, i.e. increments are additive (or “multiplicative” if
one regards + as group operation⁵) whereas a similar property fails for Ξ. In the
language of [Lyo98], such a Ξ corresponds to an “almost multiplicative functional”
and it is a key result in the theory that there is a unique associated “multiplicative
functional” (here: Z = IΞ). Following [Gub04, FdLP06] we call “sewing” the step
from a (good enough) local approximation Ξ to some (abstract) integral IΞ; the
concrete estimate which quantifies how well IΞ is approximated by Ξ will be called
“sewing lemma”. It plays an analogous role to “Davie’s lemma” (cf. section 8.7) in
the context of (rough) differential equations.
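We now formalize what we mean by Ξ being a good enough local approximation.
For this, we introduce the space $\mathcal{C}_2^{\alpha,\beta}([0, T], W)$ of functions Ξ from the simplex
$0 \le s \le t \le T$ into W such that $\Xi_{t,t} = 0$ and such that

\[ \|\Xi\|_{\alpha,\beta} \stackrel{\mathrm{def}}{=} \|\Xi\|_\alpha + \|\delta\Xi\|_\beta < \infty , \tag{4.8} \]

where $\|\Xi\|_\alpha = \sup_{s<t} \frac{|\Xi_{s,t}|}{|t-s|^\alpha}$ as usual, and also

\[ \delta\Xi_{s,u,t} \stackrel{\mathrm{def}}{=} \Xi_{s,t} - \Xi_{s,u} - \Xi_{u,t} , \qquad
   \|\delta\Xi\|_\beta \stackrel{\mathrm{def}}{=} \sup_{s<u<t} \frac{|\delta\Xi_{s,u,t}|}{|t - s|^\beta} . \]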

⁵ This terminology becomes natural if one considers Z together with its iterated integrals as
group-valued path, increments of which satisfy Chen’s “multiplicative” relation, see (2.3).

Provided that β > 1, it turns out that such functions are “almost” of the form
$\Xi_{s,t} = F_t - F_s$, for some α-Hölder continuous function F (they would be if and
only if δΞ = 0). Indeed, it is possible to construct in a canonical way a function $\hat\Xi$
with $\delta\hat\Xi = 0$ and such that $\hat\Xi_{s,t} \approx \Xi_{s,t}$ for $|t - s| \ll 1$:

Lemma 4.2 (Sewing lemma). Let α and β be such that 0 < α ≤ 1 < β. Then,
there exists a (unique) continuous map $I : \mathcal{C}_2^{\alpha,\beta}([0, T], W) \to \mathcal{C}^\alpha([0, T], W)$ such
that $(I\Xi)_0 = 0$ and

\[ \big| (I\Xi)_{s,t} - \Xi_{s,t} \big| \le C |t - s|^\beta , \tag{4.9} \]

where C only depends on β and $\|\delta\Xi\|_\beta$. (The α-Hölder norm of IΞ also depends
on $\|\Xi\|_\alpha$ and hence on $\|\Xi\|_{\alpha,\beta}$.)

Proof. Note first that I will be built as a linear map, so that its continuity is an
immediate consequence of its boundedness. Uniqueness of I is also immediate.
Indeed, assume by contradiction that, for a given Ξ, there are two candidates F
and F̄ for IΞ. Since both of these functions have to satisfy the bound (4.9), the
function $F - \bar F$ satisfies $(F - \bar F)_0 = 0$ and $|(F - \bar F)_{s,t}| \lesssim |t - s|^\beta$. Since β > 1 by
assumption, it follows immediately that $F - \bar F$ vanishes identically.
It remains to find the map I. It is very natural to make the guess

\[ (I\Xi)_{s,t} = \lim_{|\mathcal{P}| \to 0} \sum_{[u,v] \in \mathcal{P}} \Xi_{u,v} , \tag{4.10} \]

where P denotes a partition of [s, t] and |P| denotes its mesh, i.e. the length of its
largest element. The remainder of the proof shows that this expression is well-defined
and that (4.9) holds.
Why is (4.10) well-defined? Because of its importance we give two (independent
but related) arguments. The first argument is based on successive (dyadic) refinement,
i.e. one starts by identifying the integral as limit of Riemann type sums, along a
particular sequence $(\mathcal{P}_n)$. This is followed by checking that the limit is indeed
independent of the choice of partitions. More precisely, for a given interval [s, t], we
start with the trivial partition $\mathcal{P}_0 = \{[s, t]\}$ and we set $(I^0 \Xi)_{s,t} = \Xi_{s,t}$. We then
define recursively

\[ \mathcal{P}_{n+1} = \bigcup_{[u,v] \in \mathcal{P}_n} \big\{ [u, m], [m, v] \big\} , \]

with m := m(u, v) := (u + v)/2 so that $\mathcal{P}_n$, the level-n dyadic partition of [s, t],
contains $2^n$ intervals, each of length $2^{-n}|t - s|$. We then set

\[ (I^{n+1} \Xi)_{s,t} \stackrel{\mathrm{def}}{=} \sum_{[u,v] \in \mathcal{P}_{n+1}} \Xi_{u,v}
   = (I^n \Xi)_{s,t} - \sum_{[u,v] \in \mathcal{P}_n} \delta\Xi_{u,m,v} , \]

where it is a straightforward exercise to check that the second equality holds. It then
follows immediately from the definition of $\|\cdot\|_{\alpha,\beta}$ that

\[ \big| (I^{n+1} \Xi)_{s,t} - (I^n \Xi)_{s,t} \big| \le 2^{n(1-\beta)} |t - s|^\beta \|\delta\Xi\|_\beta . \]

Since β > 1, these terms are summable and we conclude immediately that the
sequence $(I^n \Xi)_{s,t}$ is Cauchy. It thus admits a limit $(I\Xi)_{s,t}$ such that, by summing
up the bound above, one has

\[ \big| (I\Xi)_{s,t} - \Xi_{s,t} \big| \le \sum_{n \ge 0} \big| (I^{n+1} \Xi)_{s,t} - (I^n \Xi)_{s,t} \big|
   \le C \|\delta\Xi\|_\beta |t - s|^\beta , \tag{4.11} \]

for some universal constant C depending only on β, which is precisely the required
bound (4.9). It remains to see that the limit just constructed is independent of the
choice of partitions. Once one has shown that δIΞ = 0, which is equivalent to
$(I\Xi)_{0,t} = (I\Xi)_{0,s} + (I\Xi)_{s,t}$ for all pairs s, t, this is not too difficult. Indeed, if $\mathcal{P}$
denotes an arbitrary partition of [s, t] and we introduce

\[ \int_{\mathcal{P}} \Xi := \sum_{[u,v] \in \mathcal{P}} \Xi_{u,v} , \]

then the difference between $(I\Xi)_{s,t}$ and this approximation can then be estimated,
thanks to (4.11), as

\[ \Big| \sum_{[u,v] \in \mathcal{P}} \big( (I\Xi)_{u,v} - \Xi_{u,v} \big) \Big| = O\big( |\mathcal{P}|^{\beta - 1} \big) . \]

Since β > 1, this is enough to show that $(I\Xi)_{s,t}$ is the limit along any sequence $\mathcal{P}_n$
with mesh tending to zero. What remains to be shown is δIΞ = 0. In general this
is not obvious (but see Remark 4.3) and indeed, writing $\mathcal{P}_n^{s,t}$ for the level-n dyadic
partition relative to [s, t], as used above, this is quite tedious since $\mathcal{P}_n^{0,t}$ is not equal
to the partition of [0, t] given by $\mathcal{P}_n^{0,s} \cup \mathcal{P}_n^{s,t}$, even though both have mesh tending
to zero with n → ∞. In fact, one is better off to define the integral over [s, t] as the
limit of $\sum_{[u,v] \in \mathcal{P}_n^{0,T},\, [u,v] \subset [s,t]} \Xi_{u,v}$. In Exercise 4.21, the reader is invited to work
out the remaining details.
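
The dyadic construction can be tried out on a simple case. The sketch below is only an illustration under assumptions not made in the text: X(t) = t² and Y(t) = cos(t) on [0, 1], with Ξ_{s,t} = Y_s X_{s,t}, so that one may take β = 2 and ‖δΞ‖_β ≤ 1/2 (because |Y_{s,u}| ≤ |u − s| and |X_{u,t}| ≤ 2|t − u| there). It computes the level-n sums (I^n Ξ)_{0,1}, compares successive differences with the bound 2^{n(1−β)}|t − s|^β ‖δΞ‖_β, and compares the last sum with the classical value of the integral.

```python
import math

def X(t):
    return t * t          # illustrative smooth integrator (assumption)

def Y(t):
    return math.cos(t)    # illustrative smooth integrand (assumption)

def Xi(s, t):
    """Local approximation Xi_{s,t} = Y_s X_{s,t}."""
    return Y(s) * (X(t) - X(s))

def dyadic_sum(s, t, n):
    """(I^n Xi)_{s,t}: sum of Xi over the level-n dyadic partition of [s, t]."""
    k = 2 ** n
    pts = [s + (t - s) * i / k for i in range(k + 1)]
    return sum(Xi(u, v) for u, v in zip(pts, pts[1:]))

BETA = 2.0
DELTA_NORM = 0.5  # upper bound on ||delta Xi||_beta on [0, 1] for these choices

levels = [dyadic_sum(0.0, 1.0, n) for n in range(15)]
steps = [abs(b - a) for a, b in zip(levels, levels[1:])]
bounds = [2 ** (n * (1 - BETA)) * DELTA_NORM for n in range(len(steps))]

# Classical value of int_0^1 cos(r) d(r^2) = 2 (sin 1 + cos 1 - 1).
exact = 2 * (math.sin(1.0) + math.cos(1.0) - 1.0)
print(levels[-1], exact)
```

The successive differences decay geometrically, which is the summability used to conclude that the sequence of dyadic sums is Cauchy.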
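The second argument, which is essentially due to Young, yields immediately
convergence as $|\mathcal{P}| \to 0$, i.e. the same limit is obtained along any sequence $\mathcal{P}_n$ with
mesh tending to zero. (As an immediate consequence, without any details left to the
reader, δIΞ = 0. Another advantage of Young’s construction is that it works under a
weaker 1/α-variation assumption on $(X, \mathbb{X})$.) Consider a partition $\mathcal{P}$ of [s, t] and let
r ≥ 1 be the number of intervals in $\mathcal{P}$. When r ≥ 2 there exists u ∈ [s, t] such that
$[u_-, u], [u, u_+] \in \mathcal{P}$ and

\[ |u_+ - u_-| \le \frac{2}{r - 1} |t - s| . \]

Indeed, assuming otherwise gives the contradiction $2|t - s| \ge \sum_{u \in \mathcal{P}^\circ} |u_+ - u_-| > 2|t - s|$.
Hence, $\big| \int_{\mathcal{P} \setminus \{u\}} \Xi - \int_{\mathcal{P}} \Xi \big| = |\delta\Xi_{u_-,u,u_+}| \le \|\delta\Xi\|_\beta \big( 2|t - s|/(r - 1) \big)^\beta$
and by iterating this procedure until the partition is reduced to $\mathcal{P} = \{[s, t]\}$, we arrive
at the maximal inequality,

\[ \sup_{\mathcal{P} \subset [s,t]} \Big| \Xi_{s,t} - \int_{\mathcal{P}} \Xi \Big| \le 2^\beta \|\delta\Xi\|_\beta \, \zeta(\beta) \, |t - s|^\beta , \]

where ζ denotes the classical ζ function. It then remains to show that

\[ \sup_{|\mathcal{P}| \vee |\mathcal{P}'| < \varepsilon} \Big| \int_{\mathcal{P}} \Xi - \int_{\mathcal{P}'} \Xi \Big| \to 0 \quad \text{as } \varepsilon \downarrow 0 , \tag{4.12} \]

which implies existence of IΞ as the limit $\lim_{|\mathcal{P}| \to 0} \int_{\mathcal{P}} \Xi$. To this end, at the price
of adding / subtracting $\mathcal{P} \cup \mathcal{P}'$, we can assume without loss of generality that $\mathcal{P}'$
refines $\mathcal{P}$. In particular, then $|\mathcal{P}| \vee |\mathcal{P}'| = |\mathcal{P}|$ and

\[ \int_{\mathcal{P}} \Xi - \int_{\mathcal{P}'} \Xi = \sum_{[u,v] \in \mathcal{P}} \Big( \Xi_{u,v} - \int_{\mathcal{P}' \cap [u,v]} \Xi \Big) . \]

But then, for any $\mathcal{P}$ with $|\mathcal{P}| \le \varepsilon$ we can use the maximal inequality to see that

\[ \Big| \int_{\mathcal{P}} \Xi - \int_{\mathcal{P}'} \Xi \Big| \le 2^\beta \zeta(\beta) \|\delta\Xi\|_\beta \sum_{[u,v] \in \mathcal{P}} |v - u|^\beta
   = O\big( |\mathcal{P}|^{\beta - 1} \big) = O\big( \varepsilon^{\beta - 1} \big) . \]

This concludes the Young argument (with no hidden tedium left to the reader). □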
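
Young's point-removal procedure is easy to simulate. The following sketch again uses the illustrative choices X(t) = t², Y(t) = cos(t), Ξ_{s,t} = Y_s X_{s,t} on [0, 1] (assumptions, with β = 2 and ‖δΞ‖_β ≤ 1/2), starts from a pseudo-random partition, and repeatedly removes the interior point u with the smallest |u_+ − u_-|. It records, at each step, the number of intervals r, the gap |u_+ − u_-| and the change |δΞ_{u_-,u,u_+}| of the Riemann sum.

```python
import math
import random

def X(t):
    return t * t          # illustrative choice (assumption)

def Y(t):
    return math.cos(t)    # illustrative choice (assumption)

def Xi(s, t):
    return Y(s) * (X(t) - X(s))

def delta_Xi(s, u, t):
    return Xi(s, t) - Xi(s, u) - Xi(u, t)

def riemann_sum(pts):
    return sum(Xi(u, v) for u, v in zip(pts, pts[1:]))

def coarsen(pts):
    """Remove interior points one at a time, always the one with the smallest
    |u_+ - u_-|; return a list of (r, gap, change of the Riemann sum)."""
    pts = list(pts)
    log = []
    while len(pts) > 2:
        r = len(pts) - 1  # current number of intervals
        j = min(range(1, len(pts) - 1), key=lambda i: pts[i + 1] - pts[i - 1])
        gap = pts[j + 1] - pts[j - 1]
        change = abs(delta_Xi(pts[j - 1], pts[j], pts[j + 1]))
        log.append((r, gap, change))
        del pts[j]
    return log

BETA, DELTA_NORM = 2.0, 0.5
ZETA_2 = math.pi ** 2 / 6  # zeta(2)

rng = random.Random(0)     # fixed seed for reproducibility
partition = [0.0] + sorted(rng.random() for _ in range(50)) + [1.0]
log = coarsen(partition)
total_error = abs(Xi(0.0, 1.0) - riemann_sum(partition))
maximal_bound = 2 ** BETA * DELTA_NORM * ZETA_2  # here |t - s| = 1
print(total_error, maximal_bound)
```

Summing the recorded changes over r = 2, 3, ... is what produces the factor ζ(β) in the maximal inequality.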
Remark 4.3. The first argument ultimately suffered from the tedium of checking
the additivity property δIΞ = 0. In some cases, however, this additivity property
of IΞ can be immediate. Imagine X : [0, T] → V is smooth, $\mathbb{X} = \int X \otimes dX$,
and one is only interested in an error estimate for second order approximations of
Riemann-Stieltjes integrals, of the form

\[ \Big| \int_s^t F(X_r) \, dX_r - F(X_s) X_{s,t} - DF(X_s) \mathbb{X}_{s,t} \Big| \le \text{right-hand side of (4.13)} . \]

(This is still a highly non-trivial estimate since the right-hand side is uniform over all
(smooth) paths, as long as their α-rough path norms remain bounded!) In the context
of the above proof, this estimate is contained in the first step, applied with

\[ \Xi_{s,t} = F(X_s) X_{s,t} + DF(X_s) \mathbb{X}_{s,t} . \]

But here it is clear from classical Riemann-Stieltjes theory, or in fact just Riemann
integration theory, that $(I\Xi)_{s,t}$, constructed as limit of dyadic partitions of [s, t], is
precisely the Riemann-Stieltjes integral $\int_s^t F(X_r) \, dX_r$ and therefore additive. (The
contribution of $DF(X)\mathbb{X}$ in the approximations disappears in the limit; indeed, it
suffices to remark that $\mathbb{X}_{u,v} \sim |v - u|^2$, thanks to smoothness of X.)
We now apply the sewing lemma to the construction of (4.6). We have the following.

Theorem 4.4 (Lyons). Let $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha([0, T], V)$ for some T > 0 and
$\alpha > \frac{1}{3}$, and let $F : V \to L(V, W)$ be a $C_b^2$ function. Then, the rough integral defined
in (4.6) exists and one has the bound

\[ \Big| \int_s^t F(X_r) \, d\mathbf{X}_r - F(X_s) X_{s,t} - DF(X_s) \mathbb{X}_{s,t} \Big|
   \lesssim \|F\|_{C_b^2} \big( \|X\|_\alpha^3 + \|X\|_\alpha \|\mathbb{X}\|_{2\alpha} \big) |t - s|^{3\alpha} , \tag{4.13} \]

where the proportionality constant depends only on α. Furthermore, the indefinite
rough integral is α-Hölder continuous on [0, T] and we have the following quantitative
estimate,

\[ \Big\| \int_0^\cdot F(X) \, d\mathbf{X} \Big\|_\alpha \le C \|F\|_{C_b^2} \big( |||\mathbf{X}|||_\alpha \vee |||\mathbf{X}|||_\alpha^{1/\alpha} \big) , \tag{4.14} \]

where the constant C only depends on T and α and can be chosen uniformly in
T ≤ 1. Furthermore, $|||\mathbf{X}|||_\alpha = \|X\|_\alpha + \sqrt{\|\mathbb{X}\|_{2\alpha}}$ denotes again the homogeneous
α-Hölder rough path norm.
Remark 4.5. We will see in Section 4.4 that the map $(X, \mathbb{X}) \in \mathscr{C}^\alpha \mapsto \int_0^\cdot F(X) \, d\mathbf{X} \in \mathcal{C}^\alpha$
is continuous in α-Hölder rough path metric.

Proof. Let us stress the fact that the argument given here only relies on the properties
of the integrand Y = F (X) collected in Lemma 4.1 above. In particular, the general-
isation to “extended” integrands (Y, Y 0 ), which replace (F (X), DF (X)), subject to
(4.7), will be immediate. (We shall develop this “Gubinelli” point of view further in
Section 4.3 below.)
The result follows as a consequence of Lemma 4.2. With the notation that we just
introduced, the classical Young integral [You36] can be defined as the usual limit of
Riemann sums by

\[ \int_s^t Y_r \, dX_r = (I\Xi)_{s,t} , \qquad \Xi_{s,t} = Y_s X_{s,t} . \]

Unfortunately, this definition satisfies the identity

\[ \delta\Xi_{s,u,t} = -Y_{s,u} X_{u,t} , \]

so that, except in trivial cases, the required bound (4.8) is satisfied only if Y and
X are Hölder continuous with Hölder exponents adding up to β > 1. In order to
be able to cover the situation $\alpha < \frac{1}{2}$, it follows that we need to consider a better
approximation to the Riemann sums, as discussed above. To this end, we use the
notation from Lemma 4.1, namely

\[ Y_s := F(X_s) , \qquad Y'_s := DF(X_s) \qquad \text{and} \qquad R^Y_{s,t} := Y_{s,t} - Y'_s X_{s,t} , \]

and then set $\Xi_{s,t} = Y_s X_{s,t} + Y'_s \mathbb{X}_{s,t}$. Note that, for any u ∈ (s, t), we have the
identity

\[ \delta\Xi_{s,u,t} = -R^Y_{s,u} X_{u,t} - Y'_{s,u} \mathbb{X}_{u,t} . \]

Thanks to the α-Hölder regularity of X, $Y'$ and the 2α-regularity of $R^Y$, $\mathbb{X}$, the triangle
inequality shows that (4.8) holds true with the given α > 1/3 and β := 3α > 1. The
fact that the integral is well-defined, and the bound

\[ \Big| \int_s^t Y \, d\mathbf{X} - Y_s X_{s,t} - Y'_s \mathbb{X}_{s,t} \Big|
   \lesssim \big( \|X\|_\alpha \|R^Y\|_{2\alpha} + \|\mathbb{X}\|_{2\alpha} \|Y'\|_\alpha \big) |t - s|^{3\alpha} \tag{4.15} \]

then follow immediately from (4.11). Upon substituting the estimate obtained in
Lemma 4.1, we obtain (4.13).
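
The algebraic identity for δΞ and the convergence of the compensated Riemann sums can both be checked numerically in the scalar case. The sketch below rests on assumptions that are purely illustrative: V = W = R, the path X(t) = t^0.4 (so α = 0.4 > 1/3), the second level 𝕏_{s,t} = ½ (X_{s,t})², which satisfies Chen's relation in one dimension, and F = cos with primitive G = sin, so that the limit of the compensated sums over [0, 1] should be sin(X_1) − sin(X_0).

```python
import math

def X(t):
    return t ** 0.4                      # illustrative path, alpha = 0.4 (assumption)

def XX(s, t):
    return 0.5 * (X(t) - X(s)) ** 2      # scalar second level, satisfies Chen's relation

def Y(t):
    return math.cos(X(t))                # Y = F(X) with F = cos

def dY(t):
    return -math.sin(X(t))               # Y' = DF(X)

def R(s, t):
    return Y(t) - Y(s) - dY(s) * (X(t) - X(s))

def Xi(s, t):
    return Y(s) * (X(t) - X(s)) + dY(s) * XX(s, t)

def delta_Xi(s, u, t):
    return Xi(s, t) - Xi(s, u) - Xi(u, t)

def identity_gap(s, u, t):
    """|delta Xi_{s,u,t} - ( -R^Y_{s,u} X_{u,t} - Y'_{s,u} XX_{u,t} )|."""
    rhs = -R(s, u) * (X(t) - X(u)) - (dY(u) - dY(s)) * XX(u, t)
    return abs(delta_Xi(s, u, t) - rhs)

def compensated_sum(n):
    """Compensated Riemann sum over the uniform partition of [0, 1]."""
    pts = [i / n for i in range(n + 1)]
    return sum(Xi(u, v) for u, v in zip(pts, pts[1:]))

gaps = [identity_gap(0.1 * i, 0.1 * i + 0.03, 0.1 * i + 0.08) for i in range(9)]
approx = compensated_sum(4096)
exact = math.sin(X(1.0)) - math.sin(X(0.0))   # G(X_1) - G(X_0) with G = sin
print(max(gaps), approx, exact)
```

The identity holds up to rounding error, as expected from Chen's relation, and the compensated sums approach the value predicted by the chain rule for this geometric choice of 𝕏.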
We now turn to the proof of (4.14). Writing $Z = \int F(X) \, d\mathbf{X}$ and using the triangle
inequality in (4.13) gives

\[ \begin{aligned}
|Z_{s,t}| &\le \|F\|_\infty |X_{s,t}| + \|DF\|_\infty |\mathbb{X}_{s,t}|
   + C \|F\|_{C_b^2} \big( \|X\|_\alpha^3 + \|X\|_\alpha \|\mathbb{X}\|_{2\alpha} \big) |t - s|^{3\alpha} \\
 &\le C \|F\|_{C_b^2} \Big[ A_1 |t - s|^\alpha + A_2 |t - s|^{2\alpha} + A_3 |t - s|^{3\alpha} \Big] ,
\end{aligned} \]

with $A_i \le |||\mathbf{X}|||_\alpha^i$, for 1 ≤ i ≤ 3. Allowing C to change, this already implies

\[ \|Z\|_\alpha \le C \|F\|_{C_b^2} \big( |||\mathbf{X}|||_\alpha \vee |||\mathbf{X}|||_\alpha^3 \big) , \]

which is the claimed estimate (4.14) in the limit α ↓ 1/3. However, one can do better
by realising that the above estimate is best for |t − s| small, whereas for t − s large
it is better to split up $|Z_{s,t}|$ into the sum of small increments. To make this more
precise, set $\varrho := |||\mathbf{X}|||_\alpha$ and write (hiding the factor C = C(α, T) in ≲ below)

\[ |Z_{s,t}| \lesssim \varrho |t - s|^\alpha + \varrho^2 |t - s|^{2\alpha} + \varrho^3 |t - s|^{3\alpha}
   \le 3 \varrho |t - s|^\alpha \qquad \text{for } \varrho^{1/\alpha} |t - s| \le 1 . \]

Increments of Z over [s, t] with length greater than $h := \varrho^{-1/\alpha}$ are handled by
cutting them into pieces of length h. More precisely (cf. Exercise 4.24) we have
$\|Z\|_{\alpha;h} \le 3\varrho$ which entails

\[ \|Z\|_\alpha \le 3 \varrho \big( 1 \vee 2 h^{-(1-\alpha)} \big) \le 6 \big( \varrho \vee \varrho^{1/\alpha} \big) . \]

At last, we note that C = C(α, T) can be chosen uniformly in T ≤ 1. □

4.3 Integration of controlled rough paths

Motivated by Lemma 4.1 and the observation that rough integration essentially relies
on the properties (4.7) we introduce the notion of a controlled path Y , relative to
some “reference” path X, due to Gubinelli [Gub04]. For the sake of the following
definition we assume that Y takes values in some Banach space, say $\bar W$. When it
comes to the definition of a rough integral we typically take $\bar W = L(V, W)$, although
other choices can be useful (see e.g. Remark 4.11). In the context of rough differential
equations, with solutions in $\bar W = W$, we actually need to integrate f(Y), which will
be seen to be controlled by X for sufficiently smooth coefficients $f : W \to L(V, W)$.

Definition 4.6. Given a path $X \in \mathcal{C}^\alpha([0, T], V)$, we say that $Y \in \mathcal{C}^\alpha([0, T], \bar W)$ is
controlled by X if there exists $Y' \in \mathcal{C}^\alpha([0, T], L(V, \bar W))$ so that the remainder term
$R^Y$ given implicitly through the relation

\[ Y_{s,t} = Y'_s X_{s,t} + R^Y_{s,t} , \tag{4.16} \]

satisfies $\|R^Y\|_{2\alpha} < \infty$. This defines the space of controlled rough paths,

\[ (Y, Y') \in \mathscr{D}_X^{2\alpha}([0, T], \bar W) . \]

Although $Y'$ is not, in general, uniquely determined from Y (cf. Remark 4.7 and
Section 6 below) we call any such $Y'$ the Gubinelli derivative of Y (with respect to
X).
Here, $R^Y_{s,t}$ takes values in $\bar W$, and the norm $\|\cdot\|_{2\alpha}$ for a function with two
arguments is given by (2.3) as before. We endow the space $\mathscr{D}_X^{2\alpha}$ with the semi-norm

\[ \|Y, Y'\|_{X,2\alpha} \stackrel{\mathrm{def}}{=} \|Y'\|_\alpha + \|R^Y\|_{2\alpha} . \tag{4.17} \]

As in the case of classical Hölder spaces, $\mathscr{D}_X^{2\alpha}$ is a Banach space under the norm
$(Y, Y') \mapsto |Y_0| + |Y'_0| + \|Y, Y'\|_{X,2\alpha}$. This quantity also controls the α-Hölder
regularity of Y since, for fixed X,

\[ \|Y\|_\alpha \le \|R^Y\|_\alpha + \|Y'\|_\infty \|X\|_\alpha \le C (1 + \|X\|_\alpha) \big( |Y'_0| + \|Y, Y'\|_{X,2\alpha} \big) , \tag{4.18} \]

where the constant C only depends on T and α and in fact can be chosen uniformly
over T ∈ (0, 1].

Remark 4.7. Since we only assume that $\|Y\|_\alpha < \infty$, but then impose that $\|R^Y\|_{2\alpha} < \infty$,
it is in general the case that a genuine cancellation takes place in (4.16). The
question arises to what extent Y determines $Y'$. Somewhat contrary to the classical
situation, where a smooth function has a unique derivative, too much regularity of
the underlying rough path $\mathbf{X}$ leads to less information about $Y'$. For instance, if Y is
smooth, or in fact in $\mathcal{C}^{2\alpha}$, and the underlying rough path $\mathbf{X}$ happens to have a path
component X that is also $\mathcal{C}^{2\alpha}$, then we may take $Y' = 0$, but as a matter of fact
any continuous path $Y'$ would satisfy (4.16) with $\|R^Y\|_{2\alpha} < \infty$. On the other hand,
if X is far from smooth, i.e. genuinely rough on all (small) scales, uniformly in all
directions, then $Y'$ is uniquely determined by Y , cf. Section 6 below.

Remark 4.8. It is important to note that while the space of rough paths $\mathscr{C}^\alpha$ is not
even a vector space, the space $\mathscr{D}_X^{2\alpha}$ is a perfectly normal Banach space for any given
$\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha$. The twist of course is that the space in question depends in a
crucial way on the choice of X. The set of all pairs $(\mathbf{X}; (Y, Y'))$ gives rise to the total
space

\[ \mathscr{C}^\alpha \ltimes \mathscr{D}^{2\alpha} \stackrel{\mathrm{def}}{=} \bigsqcup_{\mathbf{X} \in \mathscr{C}^\alpha} \{\mathbf{X}\} \times \mathscr{D}_X^{2\alpha} , \]

with base space $\mathscr{C}^\alpha$ and “fibres” $\mathscr{D}_X^{2\alpha}$. While this looks reminiscent of fibre-bundles
like the tangent bundle of a smooth manifold, it is quite different in the sense that
the fibre spaces are in general not isomorphic. Loosely speaking, the rougher the
underlying path X, the “smaller” is $\mathscr{D}_X^{2\alpha}$, see Chapter 6.

Remark 4.9. While the notion of “controlled rough path” has many appealing features,
it does not come with a natural approximation theory. To wit, consider
$(X, \mathbb{X}) \in \mathscr{C}_g^\alpha([0, T], \mathbb{R}^d)$ as limit of smooth paths $X_n : [0, T] \to \mathbb{R}^d$ in the sense of
Proposition 2.5. Then it is natural to approximate Y = F(X) by $Y_n = F(X_n)$,
which is again smooth (to the extent that F permits). On the other hand, there are
no obvious approximations $(Y_n, Y'_n) \in \mathscr{D}_{X_n}^{2\alpha}$ for an arbitrary controlled rough path
$(Y, Y') \in \mathscr{D}_X^{2\alpha}$.

We are now ready to extend Young’s integral to that of a path controlled by
X against $\mathbf{X} = (X, \mathbb{X})$. Recall from Lemma 4.1 that Y = F(X), with $Y' = DF(X)$,
is somewhat the prototype of a controlled rough path. The definition of the
rough integral $\int F(X) \, d\mathbf{X}$ in terms of compensated Riemann sums, cf. (4.6), then
immediately suggests defining the integral of Y against $\mathbf{X}$ by⁶

\[ \int_0^1 Y \, d\mathbf{X} \stackrel{\mathrm{def}}{=} \lim_{|\mathcal{P}| \to 0} \sum_{[s,t] \in \mathcal{P}} \big( Y_s X_{s,t} + Y'_s \mathbb{X}_{s,t} \big) , \tag{4.19} \]

where we took $\bar W = L(V, W)$ and used the canonical injection $L(V, L(V, W)) \hookrightarrow L(V \otimes V, W)$
in writing $Y'_s \mathbb{X}_{s,t}$. With these notations, the resulting integral takes
values in W.
With these notations at hand, it is now straightforward to prove the following
result, which is a slight reformulation of [Gub04, Prop 1]:

Theorem 4.10 (Gubinelli). Let T > 0, let $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha([0, T], V)$ for some
$\alpha > \frac{1}{3}$, and let $(Y, Y') \in \mathscr{D}_X^{2\alpha}([0, T], L(V, W))$. Then there exists a constant C
depending only on T and α (and C can be chosen uniformly over T ∈ (0, 1]) such
that
a) The integral defined in (4.19) exists and, for every pair s, t, one has the bound

\[ \Big| \int_s^t Y_r \, d\mathbf{X}_r - Y_s X_{s,t} - Y'_s \mathbb{X}_{s,t} \Big|
   \le C \big( \|X\|_\alpha \|R^Y\|_{2\alpha} + \|\mathbb{X}\|_{2\alpha} \|Y'\|_\alpha \big) |t - s|^{3\alpha} . \tag{4.20} \]

b) The map from $\mathscr{D}_X^{2\alpha}([0, T], L(V, W))$ to $\mathscr{D}_X^{2\alpha}([0, T], W)$ given by

\[ (Y, Y') \mapsto (Z, Z') := \Big( \int_0^\cdot Y_t \, d\mathbf{X}_t , \, Y \Big) , \tag{4.21} \]

is a continuous linear map between Banach spaces and one has the bound

\[ \|Z, Z'\|_{X,2\alpha} \le \|Y\|_\alpha + \|Y'\|_{L^\infty} \|\mathbb{X}\|_{2\alpha}
   + C \big( \|X\|_\alpha \|R^Y\|_{2\alpha} + \|\mathbb{X}\|_{2\alpha} \|Y'\|_\alpha \big) . \]

⁶ Note the abuse of notation: we hide dependence on $Y'$ which in general affects the limit but is
usually clear from the context.

Proof. Part a) is an immediate consequence of Lemma 4.2, as already pointed out in
the proof of Theorem 4.4. The estimate (4.20) was pointed out explicitly in (4.15).
The continuity is a consequence of the continuity of I in Lemma 4.2, and will
be discussed in full detail in Section 4.4 below. It remains to show the bound on
$\|Z, Z'\|_{X,2\alpha}$. Splitting up the left hand side of (4.20) after the first term, using the
triangle inequality, gives immediately an α-Hölder estimate on $\int_s^t Y_r \, d\mathbf{X}_r = Z_{s,t}$, so
that $Z \in \mathcal{C}^\alpha$. ($Z' = Y \in \mathcal{C}^\alpha$ is trivial, by the very nature of Y.) Similarly, splitting
up the left hand side of (4.20) after the second term gives a 2α-Hölder type
estimate on $\int_s^t Y_r \, d\mathbf{X}_r - Y_s X_{s,t} = Z_{s,t} - Z'_s X_{s,t} =: R^Z_{s,t}$, i.e. on the remainder term
in the sense of (4.16). The explicit estimate for $\|Z, Z'\|_{X,2\alpha} = \|Y\|_\alpha + \|R^Z\|_{2\alpha}$ is
then obvious. □

Remark 4.11. As in the above theorem, assume that $(X, \mathbb{X}) \in \mathscr{C}^\alpha([0, T], V)$ and
consider Y and Z two paths controlled by X. More precisely, we assume
$(Y, Y') \in \mathscr{D}_X^{2\alpha}([0, T], L(\bar V, W))$ and $(Z, Z') \in \mathscr{D}_X^{2\alpha}([0, T], \bar V)$, where of course $V, \bar V, W$ are
all Banach spaces. Then, in terms of the abstract integration map I (cf. the sewing
lemma) we may define the integral of Y against Z, with values in W, as follows,

\[ \int_s^t Y_u \, dZ_u \stackrel{\mathrm{def}}{=} (I\Xi)_{s,t} , \qquad \Xi_{u,v} = Y_u Z_{u,v} + Y'_u Z'_u \mathbb{X}_{u,v} . \tag{4.22} \]

Here, we use the fact that $Z'_u \in L(V, \bar V)$ can be canonically identified with an operator
in $L(V \otimes V, V \otimes \bar V)$ by acting only on the second factor, and $Y'_u \in L(V, L(\bar V, W))$
is identified as before with an operator in $L(V \otimes \bar V, W)$. The reader may be helped
to see this spelled out in coordinates, assuming finite dimensions: using indices i, j
in $W, \bar V$ respectively, and then k, l in V:

\[ (\Xi_{u,v})^i = (Y_u)^i_j (Z_{u,v})^j + (Y'_u)^i_{k,j} (Z'_u)^j_l (\mathbb{X}_{u,v})^{k,l} . \]

Note that, relative to the definition of Ξ in the previous proof, it suffices to
replace X by Z and $Y'$ by $Y' Z'$. Making this substitution in δΞ, as it appears in the
aforementioned proof, then gives

\[ \delta\Xi_{s,u,t} = -R^Z_{s,u} X_{u,t} - (Y' Z')_{s,u} \mathbb{X}_{u,t} \]

in the present situation. Clearly $Y' Z' \in \mathcal{C}^\alpha$ and so $\|\delta\Xi\|_\beta$ is finite which allows the
proof to go through mutatis mutandis. In particular, (4.20) is valid, with the above
substitution, and reads

\[ \Big| \int_s^t Y_r \, dZ_r - Y_s Z_{s,t} - Y'_s Z'_s \mathbb{X}_{s,t} \Big|
   \le C \big( \|X\|_\alpha \|R^Z\|_{2\alpha} + \|\mathbb{X}\|_{2\alpha} \|Y' Z'\|_\alpha \big) |t - s|^{3\alpha} . \tag{4.23} \]

If Z = X and $Z'$ is the identity operator, then this coincides with the definition
(4.19). Furthermore, in the smooth case, one can check that we again recover the
usual Riemann / Young integral.

Remark 4.12. If, in the notation of the proof of Theorem 4.4, Ξ and $\tilde\Xi$ are such that
$\Xi - \tilde\Xi \in \mathcal{C}_2^\beta$ for some β > 1, i.e.

\[ |\Xi_{s,t} - \tilde\Xi_{s,t}| = O(|t - s|^\beta) , \]

then $I\Xi = I\tilde\Xi$. Indeed, it is immediate that

\[ \sum_{[u,v] \in \mathcal{P}} |\Xi_{u,v} - \tilde\Xi_{u,v}| = O(|\mathcal{P}|^{\beta - 1}) , \]

which converges to 0 as $|\mathcal{P}| \to 0$. (This remains true if $O(|t - s|^\beta)$ with β > 1 is
replaced by o(|t − s|).)
This also shows that, if X and Y are smooth functions and $\mathbb{X}$ is defined by (2.2),
the integral that we just defined does coincide with the usual Riemann–Stieltjes
integral. However, if we change $\mathbb{X}$, then the resulting integral does change, as will be
seen in the next example.

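Example 4.13. Let f be a 2α-Hölder continuous function and let $\mathbf{X} = (X, \mathbb{X})$ and
$\bar{\mathbf{X}} = (\bar X, \bar{\mathbb{X}})$ be two rough paths such that

\[ \bar X_t = X_t , \qquad \bar{\mathbb{X}}_{s,t} = \mathbb{X}_{s,t} + f(t) - f(s) . \]

Let furthermore $(Y, Y') \in \mathscr{D}_X^{2\alpha}$ as above. Then also $(\bar Y, \bar Y') := (Y, Y') \in \mathscr{D}_{\bar X}^{2\alpha}$.
However, it follows immediately from (4.19) that

\[ \int_s^t \bar Y_r \, d\bar{\mathbf{X}}_r = \int_s^t Y_r \, d\mathbf{X}_r + \int_s^t Y'_r \, df(r) . \tag{4.24} \]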

Here, the second term on the right hand side is a simple Young integral, which is
well-defined since α + 2α > 1 by assumption.
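
The effect described in Example 4.13 can be observed on a toy case. The sketch below makes several illustrative assumptions that do not come from the text: the scalar smooth path X(t) = t, the geometric second level 𝕏_{s,t} = ½ (X_{s,t})², the perturbation f(t) = t (Lipschitz, hence 2α-Hölder for α ≤ 1/2), and the controlled path Y = cos(X) with Y' = −sin(X). For these choices the two compensated Riemann sums of (4.19) over [0, 1] should differ by approximately ∫_0^1 Y'_r df(r) = cos(1) − 1.

```python
import math

def X(t):
    return t                              # smooth first level (illustrative assumption)

def f(t):
    return t                              # Lipschitz, hence 2*alpha-Hoelder for alpha <= 1/2

def XX(s, t):
    return 0.5 * (X(t) - X(s)) ** 2       # geometric second level

def XX_bar(s, t):
    return XX(s, t) + f(t) - f(s)         # perturbed second level as in Example 4.13

def Y(t):
    return math.cos(X(t))

def dY(t):
    return -math.sin(X(t))                # Gubinelli derivative Y' = DF(X), F = cos

def rough_integral(second_level, n=4096):
    """Compensated Riemann sum of (4.19) over the uniform partition of [0, 1]."""
    pts = [i / n for i in range(n + 1)]
    return sum(Y(u) * (X(v) - X(u)) + dY(u) * second_level(u, v)
               for u, v in zip(pts, pts[1:]))

geometric = rough_integral(XX)            # approximates sin(1)
perturbed = rough_integral(XX_bar)        # approximates sin(1) + cos(1) - 1
correction = math.cos(1.0) - 1.0          # int_0^1 Y'_r df(r) = -int_0^1 sin(r) dr
print(geometric, perturbed, perturbed - geometric, correction)
```

The first level is the same in both computations; only the second level changes, and with it the value of the integral.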

Remark 4.14. As we will see below, (4.24) can be interpreted as a generalisation of

the usual expression relating Itô integrals to Stratonovich integrals.

Remark 4.15. The bound (4.20) does behave in a very natural way under dilatations.
Indeed, the integral is invariant under the transformation

\[ (Y, Y', X, \mathbb{X}) \mapsto (\lambda^{-1} Y, \lambda^{-2} Y', \lambda X, \lambda^2 \mathbb{X}) . \tag{4.25} \]

The same is true for the right hand side of (4.20), since under this dilatation, we also
have $R^Y \mapsto \lambda^{-1} R^Y$.

4.4 Stability I: rough integration

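Consider $\mathbf{X} = (X, \mathbb{X})$, $\tilde{\mathbf{X}} = (\tilde X, \tilde{\mathbb{X}}) \in \mathscr{C}^\alpha$ with $(Y, Y') \in \mathscr{D}_X^{2\alpha}$, $(\tilde Y, \tilde Y') \in \mathscr{D}_{\tilde X}^{2\alpha}$.
Although $(Y, Y')$ and $(\tilde Y, \tilde Y')$ live, in general, in different Banach spaces, the “distance”

\[ d_{X,\tilde X,2\alpha}\big( Y, Y'; \tilde Y, \tilde Y' \big) \stackrel{\mathrm{def}}{=} \|Y' - \tilde Y'\|_\alpha + \|R^Y - R^{\tilde Y}\|_{2\alpha} \]

will be useful. Even when $X = \tilde X$, it is not a proper metric for it fails to separate
$(Y, Y')$ and $(Y + cX + \bar c, Y' + c)$ for any two constants c and $\bar c$. When $X \ne \tilde X$,
the assertion “zero distance implies $(Y, Y') = (\tilde Y, \tilde Y')$” does not even make sense.
(The two objects live in completely different spaces!) That said, for every fixed
$(X, \mathbb{X}) \in \mathscr{C}^\alpha$, one has (with $R^Y_{s,t} = Y_{s,t} - Y'_s X_{s,t}$ as usual), a canonical map

\[ \iota_X : (Y, Y') \in \mathscr{D}_X^{2\alpha} \mapsto (Y', R^Y) \in \mathcal{C}^\alpha \oplus \mathcal{C}_2^{2\alpha} . \]

Given $Y_0 = \xi$, this map is injective since one can reconstruct Y by
$Y_t = \xi + Y'_0 X_{0,t} + R^Y_{0,t}$. From this point of view, one simply has

\[ d_{X,\tilde X,2\alpha} = \|\iota_X(.) - \iota_{\tilde X}(.)\|_{\alpha,2\alpha} , \]

and one is back in a normal Banach setting, where $\|\cdot,\cdot\|_{\alpha,2\alpha} = \|\cdot\|_\alpha + \|\cdot\|_{2\alpha}$ is a
natural semi-norm on $\mathcal{C}^\alpha \oplus \mathcal{C}_2^{2\alpha}$. (In fact, it is a norm if one only considers elements
in $\mathcal{C}^\alpha$ started at 0.) Elementary estimates of the form

\[ |ab - \tilde a \tilde b| \le |a| \, |b - \tilde b| + |a - \tilde a| \, |\tilde b| \tag{4.26} \]

then lead to

\[ \begin{aligned}
|Y_{s,t} - \tilde Y_{s,t}| &= \big| (Y'_{0,s} + Y'_0) X_{s,t} - (\tilde Y'_{0,s} + \tilde Y'_0) \tilde X_{s,t} + R^Y_{s,t} - R^{\tilde Y}_{s,t} \big| \\
 &\le C |t - s|^\alpha \big( |Y'_0 - \tilde Y'_0| + \|Y' - \tilde Y'\|_\alpha + \|X - \tilde X\|_\alpha + \|R^Y - R^{\tilde Y}\|_{2\alpha} \big) ,
\end{aligned} \]

with a constant C = C(R, T), provided $|Y'_0|$, $\|X\|_\alpha$, $\|Y'\|_\alpha$ and similarly for the
same quantities with tilde, all have their norms bounded by R. (As usual, C can be
taken uniform in T ≤ 1 since in this case $\|\cdot\|_{\alpha;[0,T]} \le \|\cdot\|_{2\alpha;[0,T]}$.) It follows that

\[ \|Y - \tilde Y\|_\alpha \le C \big( \|X - \tilde X\|_\alpha + |Y'_0 - \tilde Y'_0| + d_{X,\tilde X,2\alpha}( Y, Y'; \tilde Y, \tilde Y' ) \big) . \tag{4.27} \]

An estimate of the proper α-Hölder norm of $Y - \tilde Y$ (rather than its semi-norm) is
obtained by adding $|Y_0 - \tilde Y_0|$ to both sides.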

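Theorem 4.16 (Stability of rough integration). Consider $\mathbf{X} = (X, \mathbb{X})$,
$\tilde{\mathbf{X}} = (\tilde X, \tilde{\mathbb{X}}) \in \mathscr{C}^\alpha$, $(Y, Y') \in \mathscr{D}_X^{2\alpha}$, $(\tilde Y, \tilde Y') \in \mathscr{D}_{\tilde X}^{2\alpha}$ in a bounded set, in the sense

\[ |Y'_0| + \|Y, Y'\|_{X,2\alpha} \le M , \qquad \varrho_\alpha(0, \mathbf{X}) \equiv \|X\|_\alpha + \|\mathbb{X}\|_{2\alpha} \le M , \]

with identical bounds for $(\tilde X, \tilde{\mathbb{X}})$, $(\tilde Y, \tilde Y')$, for some M < ∞. Define

\[ (Z, Z') := \Big( \int_0^\cdot Y \, d\mathbf{X} , \, Y \Big) \in \mathscr{D}_X^{2\alpha} , \]

and similarly for $(\tilde Z, \tilde Z')$. Then the following (local) Lipschitz estimates hold true,

\[ d_{X,\tilde X,2\alpha}\big( Z, Z'; \tilde Z, \tilde Z' \big)
   \le C_M \big( \varrho_\alpha(\mathbf{X}, \tilde{\mathbf{X}}) + |Y'_0 - \tilde Y'_0| + d_{X,\tilde X,2\alpha}( Y, Y'; \tilde Y, \tilde Y' ) \big) , \tag{4.28} \]

and also

\[ \|Z - \tilde Z\|_\alpha
   \le C_M \big( \varrho_\alpha(\mathbf{X}, \tilde{\mathbf{X}}) + |Y_0 - \tilde Y_0| + |Y'_0 - \tilde Y'_0| + d_{X,\tilde X,2\alpha}( Y, Y'; \tilde Y, \tilde Y' ) \big) , \tag{4.29} \]

where $C_M = C(M, T, \alpha)$ is a suitable constant.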

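Proof. (The reader is advised to review the proofs of Theorems 4.4, 4.10.) We first
note that (4.27) applied to $Z, \tilde Z$ (note: $Z'_0 - \tilde Z'_0 = Y_0 - \tilde Y_0$) shows that (4.29) is an
immediate consequence of the first estimate (4.28). Thus, we only need to discuss
the first estimate. By definition of $d_{X,\tilde X,2\alpha}$, we need to estimate

\[ \|Z' - \tilde Z'\|_\alpha + \|R^Z - R^{\tilde Z}\|_{2\alpha} = \|Y - \tilde Y\|_\alpha + \|R^Z - R^{\tilde Z}\|_{2\alpha} . \]

Thanks to (4.27), the first summand is clearly bounded by the right-hand side of
(4.28). For the second summand we recall

\[ R^Z_{s,t} = Z_{s,t} - Z'_s X_{s,t} = \int_s^t Y \, d\mathbf{X} - Y_s X_{s,t} = (I\Xi)_{s,t} - \Xi_{s,t} + Y'_s \mathbb{X}_{s,t} \]

where $\Xi_{s,t} = Y_s X_{s,t} + Y'_s \mathbb{X}_{s,t}$ and similarly for $R^{\tilde Z}$. Setting $\Delta = \Xi - \tilde\Xi$, we use
(4.11) with β = 3α and Ξ replaced by ∆, so that

\[ \begin{aligned}
\big| R^Z_{s,t} - R^{\tilde Z}_{s,t} \big| &= \big| (I\Delta)_{s,t} - \Delta_{s,t} + Y'_s \mathbb{X}_{s,t} - \tilde Y'_s \tilde{\mathbb{X}}_{s,t} \big| \\
 &\le C \|\delta\Delta\|_{3\alpha} |t - s|^{3\alpha} + \big| Y'_s \mathbb{X}_{s,t} - \tilde Y'_s \tilde{\mathbb{X}}_{s,t} \big| ,
\end{aligned} \]

where $\delta\Delta_{s,u,t} = R^{\tilde Y}_{s,u} \tilde X_{u,t} - R^Y_{s,u} X_{u,t} + \tilde Y'_{s,u} \tilde{\mathbb{X}}_{u,t} - Y'_{s,u} \mathbb{X}_{u,t}$. We then conclude
with some elementary estimates of the type (4.26), noting that all involved quantities
stay bounded. □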

4.5 Controlled rough paths of lower regularity

Recall that we showed in Section 2.3 how an α-Hölder rough path $\mathbf{X}$ could be defined
as a path with values in the p-step nilpotent Lie group $G^{(p)}(\mathbb{R}^d) \subset T^{(p)}(\mathbb{R}^d)$, with
$p = \lfloor 1/\alpha \rfloor$. It does not seem obvious at all a priori how one would define a controlled
rough path in this context. One way of interpreting Definition 4.6 is as a kind of
local “Taylor expansion” up to order 2α. It seems natural in the light of the previous
subsections that if $\alpha < \frac{1}{3}$, a controlled rough path should have a kind of “Taylor
expansion” up to order pα.
As a consequence, if we expand $\mathbf{X}_{s,t} \stackrel{\mathrm{def}}{=} \mathbf{X}_s^{-1} \otimes \mathbf{X}_t$ as

\[ \mathbf{X}_{s,t} = \sum_{|w| \le p} \mathbf{X}^w_{s,t} \, e_w , \]

where |w| denotes the length of the word w, one would expect that a controlled rough
path should have an expansion of the form

\[ \delta Y_{s,t} = \sum_{|w| \le p-1} Y^w_s \mathbf{X}^w_{s,t} + R^Y_{s,t} , \tag{4.30} \]

with $|R^Y_{s,t}| \lesssim |t - s|^{p\alpha}$. Recall however that in Definition 4.6 we also needed a
regularity condition on the “derivative process” $Y'$. The equivalent statement in the
present context is that the $Y^w_s$ should themselves be described by a local “Taylor
expansion”, but this time only up to order (p − |w|)α. A neat way of packaging
this into a compact statement is to view Y as a $T^{(p-1)}(\mathbb{R}^d)$-valued function and to
introduce a scalar product on $T^{(p)}(\mathbb{R}^d)$ by postulating that $\langle e_w, e_{\bar w} \rangle = \delta_{w,\bar w}$ for any
two words w and $\bar w$. One then has the following extension of Definition 4.6 (see
Exercise 4.26).

Definition 4.17. A controlled rough path is a $T^{(p-1)}(\mathbb{R}^d)$-valued function Y such
that, for every word w with |w| ≤ p − 1, one has the bound

\[ \big| \langle e_w, Y_t \rangle - \langle \mathbf{X}_{s,t} \otimes e_w, Y_s \rangle \big| \le C |t - s|^{(p - |w|)\alpha} . \tag{4.31} \]

Given such a controlled rough path Y, it is then natural to define its integral
against any component $X^i$ by

\[ Z_t \stackrel{\mathrm{def}}{=} \int_0^t Y_s \, dX^i_s = \lim_{|\mathcal{P}| \to 0} \sum_{[r,s] \in \mathcal{P}} \sum_{|w| \le p-1} Y^w_r \langle e_w \otimes e_i, \mathbf{X}_{r,s} \rangle , \]

where $e_i$ is the unit vector associated to the word consisting of the single letter
i. It turns out [Gub10] that Z is again a controlled rough path in the sense of
Definition 4.17 provided that we lift it to $T^{(p-1)}(\mathbb{R}^d)$ by imposing that

\[ \langle e_w \otimes e_i, Z_t \rangle \stackrel{\mathrm{def}}{=} Y^w_t , \]

and by setting $Z^w_t = 0$ for all non-empty words that do not terminate with the letter
i.

4.6 Exercises

Exercise 4.18. a) Deduce (4.3) from (4.2).

b) Show that there is a constant C depending only on T > 0 and α + β > 1 such
that

\[ \Big\| \int_0^\cdot Y \, dX \Big\|_{\alpha;[0,T]} \le C \big( |Y_0| + \|Y\|_{\beta;[0,T]} \big) \|X\|_{\alpha;[0,T]} . \tag{4.32} \]

In fact, show that C can be chosen uniformly over T ∈ (0, 1].

Solution 4.19. a) Given X on [s, t], define $\tilde X : [0, 1] \ni u \mapsto X(s + u(t - s))$ and
verify $\|\tilde X\|_{\alpha;[0,1]} = |t - s|^\alpha \|X\|_{\alpha;[s,t]}$. Proceeding similarly for Y, applying (4.2)
to $\tilde X, \tilde Y$ then gives (4.3).
b) Write Z for the indefinite integral. From (4.3), for every 0 ≤ s < t ≤ T,

\[ \begin{aligned}
|Z_{s,t}| &\le |Y_s| |X_{s,t}| + C \|Y\|_{\beta;[s,t]} \|X\|_{\alpha;[s,t]} |t - s|^{\alpha+\beta} \\
 &\le \big( |Y_0| + \|Y\|_{\beta;[0,T]} T^\beta \big) |X_{s,t}| + C \|Y\|_{\beta;[0,T]} \|X\|_{\alpha;[0,T]} T^\beta |t - s|^\alpha \\
 &\le \Big[ \big( |Y_0| + \|Y\|_{\beta;[0,T]} T^\beta \big) (1 + C) \Big] \|X\|_{\alpha;[0,T]} |t - s|^\alpha \\
 &\le (1 \vee T^\beta) (1 + C) \Big[ |Y_0| + \|Y\|_{\beta;[0,T]} \Big] \|X\|_{\alpha;[0,T]} |t - s|^\alpha ,
\end{aligned} \]

and this entails the claimed estimates.

Exercise 4.20. Let $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha([0, T], V)$ and assume that $F : V \to L(V, W)$
is of gradient form, i.e. F = DG where $G : V \to W$ is sufficiently
smooth, say $C_b^3$. Show that the relation

\[ \int_s^t F(X) \, d\mathbf{X} = G(X_t) - G(X_s) \]

holds true whenever $\mathbf{X}$ is a geometric rough path. (Hence, from a rough path perspective,
integration of gradient 1-forms against geometric rough paths is trivial for the
outcome does not depend on $\mathbb{X}$.) What about non-geometric rough paths?

Exercise 4.21. Complete the “first argument” in the proof of Lemma 4.2 (the sewing lemma).

Solution 4.22. Let $\mathcal{P}_n$ be the dyadic partitions of [0, T], so that $\#\mathcal{P}_n = 2^n$ and mesh
$|\mathcal{P}_n| = T/2^n$. Call elements of $\mathcal{P}_n$ dyadic intervals (of level n). Given an interval
[s, t] ⊂ [0, T] there exists m ≥ 0, such that $\mathcal{P}_m$ is the coarsest dyadic partition which
contains a dyadic interval ⊂ [s, t]. Note that $|t - s| \sim T/2^m$. We then define, for
n ≥ m, and a general interval [s, t],

\[ I^n_{s,t} := \sum_{\substack{[u,v] \in \mathcal{P}_n : \\ [u,v] \subset [s,t]}} \Xi_{u,v} . \]

Note $I^n_{s,t} = \Xi_{s,t}$ if $[s, t] \in \mathcal{P}_n$. Write $s_+^{(n)}$ (resp. $t_-^{(n)}$) for the closest right (resp. left)
level-n dyadic neighbour of s (resp. t) so that

\[ s \le s_+^{(n)} < t_-^{(n)} \le t . \]

Note that if s is a level-m dyadic (i.e. $s = kT/2^m$ for some integer k) then $s_+^{(n)} = s$
for all n ≥ m, and similarly for t. We have

\[ \begin{aligned}
\big| I^{n+1}_{s,t} - I^n_{s,t} \big| &\le \sum_{\substack{[u,v] \in \mathcal{P}_n : \\ [u,v] \subset [s,t]}} \big| \delta\Xi_{u, \frac{u+v}{2}, v} \big|
   + \big| \Xi_{s_+^{(n+1)}, s_+^{(n)}} \big| + \big| \Xi_{t_-^{(n)}, t_-^{(n+1)}} \big| \\
 &\lesssim \frac{|t_-^{(n)} - s_+^{(n)}|}{2^{-n}} \Big( \frac{1}{2^n} \Big)^{\beta} + 2^{-(n+1)\alpha} + 2^{-(n+1)\alpha} .
\end{aligned} \]

Plainly, these estimates imply, for general [s, t] ⊂ [0, T], that $(I^n_{s,t} : n)$ is Cauchy
and we call the limit $I_{s,t}$. In fact, I is additive in the sense that δI ≡ 0. Indeed, for
general s < u < t in [0, T], if $u_-$ (resp. $u_+$) denotes the closest left (resp. right)
level-n dyadic neighbour, then

\[ I^n_{s,t} = I^n_{s,u} + I^n_{u,t} + \Xi_{u_-,u_+} , \]

and since $|\Xi_{u_-,u_+}| \lesssim |u_+ - u_-|^\alpha \sim (1/2^n)^\alpha$, additivity of the limit $I = \lim I^n$
follows at once. Another immediate consequence of the above estimates, if applied
to a dyadic interval [s, t], is the estimate

\[ |I_{s,t} - \Xi_{s,t}| \lesssim |t - s|^\beta . \tag{4.33} \]

Indeed, in this case m ≥ 0 is determined from $|t - s| = T/2^m$ so that $[s, t] \in \mathcal{P}_m$
and since $I^m_{s,t} = \Xi_{s,t}$ we have

\[ |I_{s,t} - \Xi_{s,t}| = \big| I_{s,t} - I^m_{s,t} \big| \lesssim |t - s| \sum_{n \ge m} 2^{n(1-\beta)} \sim |t - s| \, 2^{m(1-\beta)} \sim |t - s|^\beta . \]

We claim that the estimate (4.33) is valid for all intervals [s, t] ⊂ [0, T]. By
continuity, it will be enough to consider s < t in $\cup_n \mathcal{P}_n$. As in the proof of
the Kolmogorov criterion, Theorem 3.1, we consider a (finite) partition $P = (\tau_i)$
of [s, t], which “efficiently” exhausts [s, t] with dyadic intervals of length
$\sim 2^{-n}$, n ≥ m, in the sense that no three intervals have the same length. Note
that $|P| \equiv \max\{ |v - u| : [u, v] \in P \} = 2^{-m} \le |t - s|$ (and in fact ∼ |t − s| due
to minimal choice of m). Thanks to additivity of I and (4.33) for dyadic intervals,

\[ \begin{aligned}
|I_{s,t} - \Xi_{s,t}| &= \Big| \sum_{[u,v] \in P} (I_{u,v} - \Xi_{u,v}) - \Big( \Xi_{s,t} - \sum_{[u,v] \in P} \Xi_{u,v} \Big) \Big| \\
 &\lesssim \sum_{[u,v] \in P} |v - u|^\beta + \Big| \Xi_{s,t} - \sum_{[u,v] \in P} \Xi_{u,v} \Big| \\
 &\le |t - s|^\beta + \sum_{i=0}^{\infty} \Big( \big| \delta\Xi_{s, \tau_{-(i+1)}, \tau_{-i}} \big| + \big| \delta\Xi_{\tau_i, \tau_{i+1}, t} \big| \Big) ,
\end{aligned} \]

equality (“τi = τi+1 ” for some i),

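and similarly, $|\tau_{-i} - s| \lesssim 1/2^{m+i}$. As a consequence, one obtains

\[ \sum_{i=0}^{\infty} \Big( \big| \delta\Xi_{s, \tau_{-(i+1)}, \tau_{-i}} \big| + \big| \delta\Xi_{\tau_i, \tau_{i+1}, t} \big| \Big)
   \lesssim 2 \sum_{n \ge m} (1/2^n)^\beta \sim 1/2^{m\beta} \sim |t - s|^\beta , \]

so that $|I_{s,t} - \Xi_{s,t}| \lesssim |t - s|^\beta$, as required.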
Exercise 4.23. Adapt the proof of Theorem 4.4 so as to obtain Young’s estimate
(4.3).
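Exercise 4.24. Fix $\alpha \in (0,1]$, $h > 0$ and $M > 0$. Consider a path $Z : [0,T] \to V$ and show that
$$
\|Z\|_{\alpha;h} \equiv \sup_{\substack{0 \le s < t \le T\\ t-s \le h}} \frac{|Z_{s,t}|}{|t-s|^{\alpha}} \le M \quad\Longrightarrow\quad \|Z\|_{\alpha} \le M \bigl( 1 \vee 2h^{-(1-\alpha)} \bigr) .
$$
(Here, as usual, $\|Z\|_{\alpha} \equiv \sup_{0 \le s < t \le T} |Z_{s,t}| / |t-s|^{\alpha}$.)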
Proof. By scaling it suffices to consider $M = 1$. Fix $0 \le s < t \le T$; we need to show that $|Z_{s,t}| / |t-s|^{\alpha}$ is bounded by $1 \vee 2h^{-(1-\alpha)}$. There is nothing to show for $|t-s| \le h$. We therefore assume $h \le |t-s|$ and define $t_i = (s + ih) \wedge t$, for $i = 0, 1, \dots$, noting that $t_N = t$ for $N \ge |t-s|/h$ and also $t_{i+1} - t_i \le h$ for all $i$. But then
$$
|Z_{s,t}| \le \sum_{0 \le i < |t-s|/h} \bigl| Z_{t_i,t_{i+1}} \bigr| \le h^{\alpha} \bigl( 1 + |t-s|/h \bigr) = h^{\alpha-1} \bigl( h + |t-s| \bigr) \le 2h^{\alpha-1} |t-s| ,
$$
and we are done. □
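The chaining argument of this proof can be checked numerically. The following is only a sketch under stated assumptions: the test path, the grid, and the values of `alpha` and `h_steps` are hypothetical choices, the horizon is $T = 1$ (so that $|t-s|^{1-\alpha} \le 1$), and $h$ is taken to be a multiple of the grid step so that the points $t_i = s + ih$ lie on the grid.

```python
import math

def holder_constants(path, dt, alpha, h_steps):
    """Return (local, global) alpha-Hoelder constants of a sampled path.

    `local` uses only pairs of grid points at most h_steps apart,
    `global_` uses all pairs of grid points.
    """
    n = len(path)
    local = 0.0
    global_ = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            ratio = abs(path[j] - path[i]) / ((j - i) * dt) ** alpha
            global_ = max(global_, ratio)
            if j - i <= h_steps:
                local = max(local, ratio)
    return local, global_

# Hypothetical test path on [0, 1] (an assumption made for illustration).
n_steps = 400
dt = 1.0 / n_steps
path = [math.sin(7 * k * dt) + 0.3 * math.cos(31 * k * dt) for k in range(n_steps + 1)]

alpha = 0.4
h_steps = 20                     # h = h_steps * dt = 0.05
h = h_steps * dt
local, global_ = holder_constants(path, dt, alpha, h_steps)

# Bound of Exercise 4.24 with M = local constant, valid here because T = 1.
bound = local * max(1.0, 2.0 * h ** (alpha - 1.0))
print(local, global_, bound)
```

The global constant never exceeds the bound on this grid, for exactly the reason given in the proof: every long increment is a sum of at most $1 + |t-s|/h$ short ones.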
Exercise 4.25. Show that the assumption $Y \in \mathscr{D}^{2\alpha}_X$ can be weakened to $Y \in \mathscr{D}^{2\alpha'}_X$, $\alpha' < \alpha$, provided $\alpha + 2\alpha' > 1$, and reformulate Theorem 4.10 accordingly. In particular, show that the estimate (4.20) holds upon replacing the final factor $|t-s|^{3\alpha}$ by $|t-s|^{\alpha + 2\alpha'}$, and $\|Y'\|_{\alpha}$ (resp. $\|R^Y\|_{2\alpha}$) by $\|Y'\|_{\alpha'}$ (resp. $\|R^Y\|_{2\alpha'}$).
Exercise 4.26. Check that Definition 4.17 is consistent with Definition 4.6 in the case when $\alpha \in (1/3, 1/2]$. Check also that if one takes $w = \phi$, the empty word, then (4.31) reduces to (4.30) with $|R^Y_{s,t}| \lesssim |t-s|^{p\alpha}$.

4.7 Comments

The notion of integration of 1-forms against "general" $p$-variation geometric rough paths, for any $p \in [1, \infty)$, was developed by Lyons [Lyo98]; see also [Lyo98, LQ02, LCL07]. A general estimate of the form (4.14) appears in [FV10b, Thm 10.47], at least in the finite-dimensional setting of that book. Rough integration against "controlled paths" is due to Gubinelli, see [Gub04] where it is developed in an $\alpha$-Hölder setting, $\alpha > 1/3$. Loosely speaking, it allows one to "linearise" many considerations (the space of controlled paths is a Banach space, while a typical space of rough paths is not). This point of view has been generalized to arbitrary $\alpha$ (both in the geometric and the non-geometric setting) in [Gub10].
We will see in Chapter 13 that this point of view can be pushed even further and, as a matter of fact, the theory of regularity structures provides a unified framework in which the Gubinelli derivative and the regular derivatives are but two examples of a more general theory of objects behaving "like Taylor expansions" and allowing one to describe the small-scale structure of a function and / or distribution in terms of "known" objects (polynomials in the case of Taylor expansions, the underlying rough path in the case of controlled paths).
Chapter 5
Stochastic integration and Itô’s formula

Abstract In this chapter, we compare the integration theory developed in the pre-
vious chapter to the usual theories of stochastic integration, be it in the Itô or the
Stratonovich sense.

5.1 Itô integration

Recall from Section 3 that Brownian motion $B$ can be enhanced to a (random) rough path $\mathbf{B} = (B, \mathbb{B})$. Presently our focus is the case when $\mathbb{B}$ is given by the iterated Itô integral$^1$
$$
\mathbb{B}_{s,t} = \mathbb{B}^{\text{Itô}}_{s,t} \overset{\text{def}}{=} \int_s^t B_{s,u} \otimes dB_u
$$
and the so enhanced Brownian motion has almost surely (non-geometric) $\alpha$-Hölder rough sample paths, for any $\alpha \in (1/3, 1/2)$. That is, $\mathbf{B}(\omega) = (B(\omega), \mathbb{B}(\omega)) \in \mathscr{C}^{\alpha}$ for every $\omega \in N_1^c$ where, here and in the sequel, $N_i$, $i = 1, 2, \dots$ denote suitable null sets. We now show that rough integrals (against $\mathbf{B} = \mathbf{B}^{\text{Itô}}$) and Itô integrals, whenever both are well-defined, coincide.
Proposition 5.1. Assume $(Y(\omega), Y'(\omega)) \in \mathscr{D}^{2\alpha}_{B(\omega)}$ for every $\omega \in N_2^c$. Set $N_3 = N_1 \cup N_2$. Then the rough integral
$$
\int_0^T Y \, d\mathbf{B} = \lim_{n\to\infty} \sum_{[u,v]\in P_n} \bigl( Y_u B_{u,v} + Y'_u \mathbb{B}_{u,v} \bigr)
$$
exists, for each fixed $\omega \in N_3^c$, along any sequence $(P_n)$ with mesh $|P_n| \downarrow 0$. If $Y, Y'$ are adapted then, almost surely,
$$
\int_0^T Y \, d\mathbf{B} = \int_0^T Y \, dB .
$$

$^1$ The case when $\mathbb{B}$ is given via iterated Stratonovich integration is left to Section 5.2 below.


Proof. Without loss of generality $T = 1$. The existence of the rough integral for $\omega \in N_3^c$ under the stated assumptions is immediate from Theorem 4.10, applied to $Y(\omega)$, controlled by $B(\omega)$, for $\omega \in N_2^c$ fixed. Recall (e.g. [RY91]) that for any continuous, adapted process $Y$ the Itô integral against Brownian motion has the representation
$$
\int_0^1 Y \, dB = \lim_{n\to\infty} \sum_{[u,v]\in P_n} Y_u B_{u,v} \quad \text{(in probability)}
$$
along any sequence $(P_n)$ with mesh $|P_n| \downarrow 0$. By switching to a subsequence, if necessary, we can assume that the convergence holds almost surely, say on $N_4^c$. Set $N_5 := N_3 \cup N_4$. We shall complete the proof under the assumption that there exists a (deterministic) constant $M > 0$ such that
$$
\sup_{\omega \in N_5^c} |Y'(\omega)|_{\infty} \le M .
$$
(This is the case in the "model" situation $Y = F(X)$, $Y' = DF(X)$ where $F$ was in particular assumed to have bounded derivatives; the general case is obtained by localisation and left to Exercise 5.13.)
The claim is that the rough and Itô integrals coincide on $N_5^c$. With a look at the respective Riemann sums, convergent away from $N_5$, basic analysis tells us that
$$
\forall \omega \in N_5^c : \quad \exists \lim_{n} \sum_{[u,v]\in P_n} Y'_u \mathbb{B}_{u,v} ,
$$
and that this limit equals the difference of rough and Itô integrals (on $N_5^c$, a set of full measure). Of course, $|P_n| \downarrow 0$, and to see that the above limit is indeed zero (at least on a set of full measure), it will be enough to show that
$$
\Bigl\| \sum_{[u,v]\in P} Y'_u \mathbb{B}_{u,v} \Bigr\|^2_{L^2} = O(|P|) . \tag{5.1}
$$
To this end, assume the partition is of the form $P = \{0 = \tau_0 < \dots < \tau_N = 1\}$

as desired. □

5.2 Stratonovich integration

We could equally well have enhanced Brownian motion with
$$
\mathbb{B}^{\text{Strat}}_{s,t} := \int_s^t B_{s,u} \otimes \circ\, dB_u = \mathbb{B}^{\text{Itô}}_{s,t} + \frac{1}{2} (t-s) I .
$$
Almost surely, this construction then yields geometric $\alpha$-Hölder rough sample paths, for any $\alpha \in (1/3, 1/2)$. Recall that, by definition, the Stratonovich integral is given by
$$
\int_0^T Y \circ dB \overset{\text{def}}{=} \int_0^T Y \, dB + \frac{1}{2} [Y, B]_T
$$
whenever the Itô integral is well-defined and the quadratic covariation of $Y$ and $B$ exists in the sense that $[Y, B]_T := \lim_{|P| \to 0} \sum_{[u,v]\in P} Y_{u,v} B_{u,v}$ exists as limit in probability.
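As a sanity check of this definition, one can compare left-point (Itô) sums with their covariation-corrected (Stratonovich) version for the simplest integrand $Y = B$ in dimension one. The sketch below rests on assumptions not in the text: a scalar Brownian path sampled on a uniform grid with a fixed random seed, and discrete sums in place of the limits.

```python
import random

random.seed(0)
n = 2 ** 14
T = 1.0
dt = T / n

# Sample a scalar Brownian path on a uniform grid (illustrative discretisation).
B = [0.0]
for _ in range(n):
    B.append(B[-1] + random.gauss(0.0, dt ** 0.5))

# Left-point Riemann sum approximating the Ito integral of B against B.
ito_sum = sum(B[k] * (B[k + 1] - B[k]) for k in range(n))

# Discrete quadratic covariation [Y, B]_T with Y = B.
covariation = sum((B[k + 1] - B[k]) ** 2 for k in range(n))

# Stratonovich value = Ito value + half the covariation.
strat_sum = ito_sum + 0.5 * covariation

print("Ito sum       ", ito_sum, " vs (B_T^2 - T)/2 =", 0.5 * (B[-1] ** 2 - T))
print("Stratonovich  ", strat_sum, " vs  B_T^2 / 2    =", 0.5 * B[-1] ** 2)
```

On the discrete level the Stratonovich sum equals $B_T^2/2$ exactly (a telescoping identity), while the Itô sum differs from $(B_T^2 - T)/2$ only through the fluctuation of the discrete covariation around $T$.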
In complete analogy to the Itô case, we now show that rough integration against
Stratonovich enhanced Brownian motion coincides with usual Stratonovich integra-
tion against Brownian motion under some natural assumptions guaranteeing that
both notions of integral are well-defined.
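Corollary 5.2. As above, assume $Y = Y(\omega) \in \mathscr{C}^{\alpha}_{B(\omega)}$ for every $\omega \in N_2^c$. Set $N_3 = N_1 \cup N_2$. Then the rough integral of $Y$ against $\mathbf{B} = \mathbf{B}^{\text{Strat}}$ exists,
$$
\int_0^T Y \, d\mathbf{B} = \lim_{n\to\infty} \sum_{[u,v]\in P_n} \bigl( Y_u B_{u,v} + Y'_u \mathbb{B}^{\text{Strat}}_{u,v} \bigr) .
$$
Moreover, if $Y, Y'$ are adapted, the quadratic covariation of $Y$ and $B$ exists and, almost surely,
$$
\int_0^T Y \, d\mathbf{B} = \int_0^T Y \circ dB .
$$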

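Proof. $\mathbb{B}^{\text{Strat}}_{s,t} = \mathbb{B}^{\text{Itô}}_{s,t} + f_{s,t}$ where $f(t) = (1/2)\, t\, I$, and $I \in \mathbf{R}^d \otimes \mathbf{R}^d$ denotes the identity matrix. This entails, as was discussed in Example 4.13,
$$
\int_0^1 Y \, d\mathbf{B}^{\text{Strat}} = \int_0^1 Y \, d\mathbf{B}^{\text{Itô}} + \int_0^1 Y' \, df .
$$
Thanks to Proposition 5.1, it only remains to identify $2 \int_0^1 Y' \, df = \int_0^1 Y'_t \, dt$ with $[Y, B]_1$. To see this, write
$$
\sum_{[u,v]\in P} Y_{u,v} B_{u,v} = \sum_{[u,v]\in P} \bigl( (Y'_u B_{u,v}) B_{u,v} + R_{u,v} B_{u,v} \bigr) = \sum_{[u,v]\in P} Y'_u (B_{u,v} \otimes B_{u,v}) + O\bigl( |P|^{3\alpha - 1} \bigr) ,
$$
where we used that $\sum_{[u,v]\in P} R_{u,v} B_{u,v} = O(|P|^{3\alpha-1})$ thanks to $R \in C_2^{2\alpha}$ and $B \in C^{\alpha}$. Note that
$$
B_{u,v} \otimes B_{u,v} = 2\, \mathrm{Sym}\bigl( \mathbb{B}^{\text{Strat}}_{u,v} \bigr) = 2\, \mathrm{Sym}\bigl( \mathbb{B}^{\text{Itô}}_{u,v} \bigr) + (v-u) I .
$$
We have seen in the proof of Proposition 5.1 that any limit (in probability, say) of
$$
\sum_{[u,v]\in P} Y'_u \mathbb{B}^{\text{Itô}}_{u,v}
$$
must be zero. In fact, a look at the argument reveals that this remains true with $\mathbb{B}^{\text{Itô}}_{u,v}$ replaced by $\mathrm{Sym}(\mathbb{B}^{\text{Itô}}_{u,v})$. It follows that
$$
\lim_{|P| \to 0} \sum_{[u,v]\in P} Y_{u,v} B_{u,v} = \lim_{|P| \to 0} \sum_{[u,v]\in P} Y'_u (v-u) = \int_0^1 Y'_t \, dt ,
$$
thus concluding the proof. □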

5.3 Itô’s formula and Föllmer

Given a smooth path $X : [0,T] \to V$ and a map $F : V \to W$ in $C_b^1$, where $V, W$ are Banach spaces as usual, the chain rule from classical "first order" calculus tells us that
$$
F(X_t) = F(X_0) + \int_0^t DF(X_s) \, dX_s , \qquad 0 \le t \le T .
$$

Unsurprisingly, the same change of variables formula holds for geometric rough paths $\mathbf{X} = (X, \mathbb{X})$, which are essentially limits of smooth paths, and it is not hard to figure out, in view of Example 4.13, that a "second order" correction, involving $D^2 F$, appears in the non-geometric case. In other words, one can write down Itô formulae for rough paths.
Before doing so, however, an important preliminary discussion is in order. Namely, much of our effort so far was devoted to the understanding of (rough) integration against 1-forms, say $G = G(X)$, and indeed we found
$$
\int G(X) \, d\mathbf{X} \approx \sum_{[s,t]\in P} \langle G(X_s), X_{s,t} \rangle + \langle DG(X_s), \mathbb{X}_{s,t} \rangle
$$
in the sense that the compensated Riemann–Stieltjes sums appearing on the right-hand side converge with mesh $|P| \to 0$. Let us split $\mathbb{X}$ into its symmetric part, $\mathbb{S}_{s,t} := \mathrm{Sym}(\mathbb{X}_{s,t})$, and its antisymmetric ("area") part, $\mathbb{A}_{s,t} := \mathrm{Anti}(\mathbb{X}_{s,t})$. Then
$$
\langle DG(X_s), \mathbb{X}_{s,t} \rangle = \langle DG(X_s), \mathbb{S}_{s,t} \rangle + \langle DG(X_s), \mathbb{A}_{s,t} \rangle
$$

and the final term disappears in the gradient case, i.e. when $G = DF$. Indeed, the contraction of a symmetric tensor (here: $D^2 F$) with an antisymmetric tensor (here: $\mathbb{A}$) always vanishes. In other words, area matters very much for general integrals of 1-forms but not at all for gradient 1-forms. Note also that, contrary to $\mathbb{A}$, the symmetric part $\mathbb{S}$ is a nice function of the underlying path $X$. For instance, for Itô enhanced Brownian motion in $\mathbf{R}^d$, one has the identity
$$
\mathbb{S}^{i,j}_{s,t} = \frac{1}{2} \int_s^t \bigl( B^i_{s,r} \, dB^j_r + B^j_{s,r} \, dB^i_r \bigr) = \frac{1}{2} \bigl( B^i_{s,t} B^j_{s,t} - \delta^{ij} (t-s) \bigr) , \qquad 1 \le i, j \le d .
$$
These considerations suggest that the following definition encapsulates all the data required for the integration of gradient 1-forms.
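The vanishing of the contraction of a symmetric with an antisymmetric tensor, used above to discard the area term for gradient 1-forms, is easy to verify on an example. In this sketch the matrices `hessian` and `second_level` are hypothetical stand-ins for $D^2 F(X_s)$ and $\mathbb{X}_{s,t}$ in dimension three.

```python
def sym(m):
    """Symmetric part of a square matrix."""
    d = len(m)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(d)] for i in range(d)]

def anti(m):
    """Antisymmetric part of a square matrix."""
    d = len(m)
    return [[0.5 * (m[i][j] - m[j][i]) for j in range(d)] for i in range(d)]

def contract(a, b):
    """Full contraction <a, b> = sum_{i,j} a_ij * b_ij."""
    d = len(a)
    return sum(a[i][j] * b[i][j] for i in range(d) for j in range(d))

# Hypothetical data: a symmetric "Hessian" and a generic second-level tensor.
hessian = [[2.0, -1.0, 0.5], [-1.0, 3.0, 4.0], [0.5, 4.0, -2.0]]
second_level = [[0.3, 1.2, -0.7], [0.1, -0.4, 2.0], [0.9, 0.6, 0.8]]

S = sym(second_level)
A = anti(second_level)

print("area contribution:", contract(hessian, A))
print("full vs symmetric:", contract(hessian, second_level), contract(hessian, S))
```

Only the symmetric part of the second level contributes, which is what motivates working with the pair $(X, \mathbb{S})$ alone.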
Definition 5.3. We call $\mathbf{X} = (X, \mathbb{S})$ a reduced rough path, in symbols $\mathbf{X} \in \mathscr{C}^{\alpha}_r([0,T], V)$, if $X = X_t$ takes values in a Banach space $V$, $\mathbb{S} = \mathbb{S}_{s,t}$ takes values in $\mathrm{Sym}(V \otimes V)$, and the following hold:
i) a "reduced" Chen relation
$$
\mathbb{S}_{s,t} - \mathbb{S}_{s,u} - \mathbb{S}_{u,t} = \mathrm{Sym}(X_{s,u} \otimes X_{u,t}) , \qquad 0 \le s, t, u \le T ,
$$
ii) the usual analytical conditions, $X_{s,t} = O(|t-s|^{\alpha})$, $\mathbb{S}_{s,t} = O(|t-s|^{2\alpha})$, for some $\alpha > 1/3$.
Clearly, any $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^{\alpha}([0,T], V)$ induces a reduced rough path by ignoring its area $\mathbb{A} = \mathrm{Anti}(\mathbb{X})$. More importantly, and in stark contrast to the general rough path case, the lift of a path $X \in C^{\alpha}$ to a reduced path is essentially trivial by setting $\mathbb{S}_{s,t} = \frac{1}{2} X_{s,t} \otimes X_{s,t}$. Indeed, we have the following result.
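Lemma 5.4. Given $X \in C^{\alpha}$, $\alpha \in (1/3, 1/2]$, the "geometric" choice $\bar{\mathbb{S}}_{s,t} = \frac{1}{2} X_{s,t} \otimes X_{s,t}$ yields a reduced rough path, i.e. $(X, \bar{\mathbb{S}}) \in \mathscr{C}^{\alpha}_r$. Moreover, for any $2\alpha$-Hölder path $\gamma$ (with values in $\mathrm{Sym}(V \otimes V)$), the perturbation
$$
\mathbb{S}_{s,t} = \bar{\mathbb{S}}_{s,t} + \frac{1}{2} (\gamma_t - \gamma_s) = \frac{1}{2} \bigl( X_{s,t} \otimes X_{s,t} + \gamma_{s,t} \bigr)
$$
also yields a reduced rough path $(X, \mathbb{S})$. Finally, all reduced rough paths over $X$ are obtained in this fashion.
Proof. A simple exercise for the reader. □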
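The algebraic half of this lemma, namely that both the geometric choice and its perturbation by increments of $\gamma$ satisfy the reduced Chen relation, can be checked numerically. The sketch below uses a hypothetical smooth path in $\mathbf{R}^2$ and a hypothetical symmetric-matrix-valued $\gamma$; it does not address the Hölder conditions.

```python
import math

def outer(a, b):
    """Tensor product a (x) b of two vectors, as a matrix."""
    return [[x * y for y in b] for x in a]

def sym(m):
    d = len(m)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(d)] for i in range(d)]

def increment(path, s, t):
    return [b - a for a, b in zip(path(s), path(t))]

def S(path, gamma, s, t):
    """S_{s,t} = (X_{s,t} (x) X_{s,t} + gamma_t - gamma_s) / 2."""
    x = increment(path, s, t)
    xx = outer(x, x)
    g_s, g_t = gamma(s), gamma(t)
    d = len(x)
    return [[0.5 * (xx[i][j] + g_t[i][j] - g_s[i][j]) for j in range(d)] for i in range(d)]

def chen_defect(path, gamma, s, u, t):
    """Largest entry of S_{s,t} - S_{s,u} - S_{u,t} - Sym(X_{s,u} (x) X_{u,t})."""
    st, su, ut = S(path, gamma, s, t), S(path, gamma, s, u), S(path, gamma, u, t)
    rhs = sym(outer(increment(path, s, u), increment(path, u, t)))
    d = len(rhs)
    return max(abs(st[i][j] - su[i][j] - ut[i][j] - rhs[i][j])
               for i in range(d) for j in range(d))

# Hypothetical path and perturbation, chosen only for illustration.
path = lambda t: [math.sin(3.0 * t), t * t]
zero = lambda t: [[0.0, 0.0], [0.0, 0.0]]
gamma = lambda t: [[t, 0.5 * t], [0.5 * t, 2.0 * t]]

print(chen_defect(path, zero, 0.1, 0.4, 0.9))    # geometric choice
print(chen_defect(path, gamma, 0.1, 0.4, 0.9))   # perturbed choice
```

Both defects vanish up to rounding, because the increments of $\gamma$ are additive and therefore drop out of the left-hand side of the reduced Chen relation.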
The previous lemma gives in particular a one-one correspondence between S and
γ. We thus formalize the role of γ.
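Definition 5.5 (Bracket of a reduced rough path). Given $\mathbf{X} = (X, \mathbb{S}) \in \mathscr{C}^{\alpha}_r(V)$, we define the bracket
$$
[\mathbf{X}] : [0,T] \to \mathrm{Sym}(V \otimes V) , \qquad t \mapsto [\mathbf{X}]_t \overset{\text{def}}{=} X_{0,t} \otimes X_{0,t} - 2\, \mathbb{S}_{0,t} .
$$
Note that, as a consequence of the previous lemma, $[\mathbf{X}] \in C^{2\alpha}$. Furthermore, if one defines
$$
[\mathbf{X}]_{s,t} \overset{\text{def}}{=} X_{s,t} \otimes X_{s,t} - 2\, \mathbb{S}_{s,t} ,
$$
then one has the identity $[\mathbf{X}]_{s,t} = [\mathbf{X}]_{0,t} - [\mathbf{X}]_{0,s}$ for any two times $s, t$.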

Proposition 5.6 (Itô formula for reduced rough paths). Let $F : V \to W$ be of class $C_b^3$ and let $\mathbf{X} = (X, \mathbb{S}) \in \mathscr{C}^{\alpha}_r([0,T], V)$ with $\alpha > 1/3$. Then
$$
F(X_t) = F(X_0) + \int_0^t DF(X_s) \, d\mathbf{X}_s + \frac{1}{2} \int_0^t D^2 F(X_s) \, d[\mathbf{X}]_s , \qquad 0 \le t \le T .
$$
Here, writing $P$ for partitions of $[0,t]$, the first integral is given by$^2$
$$
\int_0^t DF(X_s) \, d\mathbf{X}_s \overset{\text{def}}{=} \lim_{|P| \to 0} \sum_{[u,v]\in P} \langle DF(X_u), X_{u,v} \rangle + \langle D^2 F(X_u), \mathbb{S}_{u,v} \rangle , \tag{5.2}
$$
while the second integral is a well-defined Young integral.

Proof. Consider first the geometric case, $\mathbb{S} = \bar{\mathbb{S}}$, in which case the bracket is zero. The proof is straightforward. Indeed, thanks to $\alpha$-Hölder regularity of $X$ with $\alpha > 1/3$, we obtain
$$
\begin{aligned}
F(X_T) - F(X_0) &= \sum_{[u,v]\in P} F(X_v) - F(X_u) \\
&= \sum_{[u,v]\in P} \langle DF(X_u), X_{u,v} \rangle + \frac{1}{2} \langle D^2 F(X_u), X_{u,v} \otimes X_{u,v} \rangle + o(|v-u|) \\
&= \sum_{[u,v]\in P} \langle DF(X_u), X_{u,v} \rangle + \langle D^2 F(X_u), \bar{\mathbb{S}}_{u,v} \rangle + o(|v-u|) .
\end{aligned}
$$
We conclude by taking the limit $|P| \to 0$, also noting that $\sum_{[u,v]\in P} o(|v-u|) \to 0$.
For the non-geometric situation, just substitute
$$
\bar{\mathbb{S}}_{u,v} = \mathbb{S}_{u,v} + \frac{1}{2} [\mathbf{X}]_{u,v} .
$$
Since $D^2 F$ is Lipschitz, $D^2 F(X_\cdot) \in C^{\alpha}$ and we can split up the "bracket" term and note that
$$
\sum_{[u,v]\in P} \langle D^2 F(X_u), [\mathbf{X}]_{u,v} \rangle \to \int_0^t D^2 F(X_u) \, d[\mathbf{X}]_u ,
$$
where the convergence to the Young integral follows from $[\mathbf{X}] \in C^{2\alpha}$. The rest is now obvious. □

$^2$ Note consistency with the rough integral when $\mathbf{X} \in \mathscr{C}^{\alpha}$.


Example 5.7. Consider the case when $\mathbf{X} = \mathbf{B}$, Itô enhanced Brownian motion. Then $\mathbb{X}$ is given by the iterated Itô integrals, with twice its symmetric part given by
$$
2\, \mathbb{S}^{i,j}_{0,t} = \int_0^t \bigl( B^i \, dB^j + B^j \, dB^i \bigr) = B^i_t B^j_t - \langle B^i, B^j \rangle_t .
$$
The usual Itô formula is then recovered from the fact that
$$
[\mathbf{B}]^{i,j}_t = B^i_{0,t} B^j_{0,t} - 2\, \mathbb{S}^{i,j}_{0,t} = \langle B^i, B^j \rangle_{0,t} = \delta^{i,j} t .
$$

We conclude this section with a short discussion on Föllmer's calcul d'Itô sans probabilités [Föl81]. For simplicity of notation, we take $V = \mathbf{R}^d$, $W = \mathbf{R}^e$ in what follows. With regard to (5.2), let us insist that the compensation is necessary and one cannot, in general, separate the sum into two convergent sums. On the other hand, we can combine the converging sums and write
$$
\begin{aligned}
F(X)_{0,t} &= \lim_{|P| \to 0} \sum_{[u,v]\in P} \Bigl( \langle DF(X_u), X_{u,v} \rangle + \langle D^2 F(X_u), \mathbb{S}_{u,v} \rangle \Bigr) + \frac{1}{2} \sum_{[u,v]\in P} D^2 F(X_u) [\mathbf{X}]_{u,v} \\
&= \lim_{|P| \to 0} \sum_{[u,v]\in P} \langle DF(X_u), X_{u,v} \rangle + \frac{1}{2} \langle D^2 F(X_u), X_{u,v} \otimes X_{u,v} \rangle .
\end{aligned} \tag{5.3}
$$
We now put forward an assumption that allows us to break up the sum above.
Definition 5.8. Let $(P_n)$ be a sequence of partitions of $[0,T]$ with mesh $|P_n| \to 0$. We say that $X : [0,T] \to \mathbf{R}^d$ has finite quadratic variation in the sense of Föllmer along $(P_n)$ if and only if, for every $t \in [0,T]$ and $1 \le i, j \le d$ the limit
$$
[X^i, X^j]_t := \lim_{n\to\infty} \sum_{[u,v]\in P_n} \bigl( X^i_{v \wedge t} - X^i_{u \wedge t} \bigr) \bigl( X^j_{v \wedge t} - X^j_{u \wedge t} \bigr)
$$
exists. Write $[X, X]$ for the resulting path with values in $\mathrm{Sym}(\mathbf{R}^d \otimes \mathbf{R}^d)$, i.e. the space of symmetric $d \times d$ matrices.

Lemma 5.9. Assume $X : [0,T] \to \mathbf{R}^d$ has finite quadratic variation in the sense of Föllmer, along $(P_n)$. Then the map $t \mapsto [X, X]_t$ is of bounded variation on $[0,T]$. Assume furthermore that $t \mapsto [X, X]_t$ is continuous. Then, for any continuous $G : [0,T] \to L(\mathbf{R}^d \otimes \mathbf{R}^d, \mathbf{R}^e)$,
$$
\lim_{n\to\infty} \sum_{\substack{[u,v]\in P_n\\ u < t}} \langle G(u), X_{u,v} \otimes X_{u,v} \rangle = \int_0^t G(u) \, d[X, X]_u \in \mathbf{R}^e .
$$

Proof. For the first statement, it is enough to argue component by component. Set $[X^i] := [X^i, X^i]$. By polarisation,
$$
[X^i, X^j]_t = \frac{1}{2} \Bigl( [X^i + X^j]_t - [X^i]_t - [X^j]_t \Bigr) .
$$
Since each term on the right-hand side is monotone in $t$, we see that $t \mapsto [X^i, X^j]_t$ is indeed of bounded variation.
Regarding the second statement, it is enough to check that, for continuous $g : [0,T] \to \mathbf{R}$ and $Y$ of finite quadratic variation, with continuous bracket $t \mapsto [Y]_t$,
$$
\lim_{n\to\infty} \sum_{\substack{[u,v]\in P_n\\ u < t}} g(u) Y_{u,v}^2 = \int_0^t g(u) \, d[Y]_u . \tag{5.4}
$$
Indeed, we can apply this for each component, with $g = G^k_{i,j}$ and
$$
Y \in \bigl\{ X^i + X^j, \; X^i, \; X^j \bigr\} ,
$$
which then also gives, by polarisation,
$$
\sum_{\substack{[u,v]\in P_n\\ u < t}} G^k_{i,j}(u) X^i_{u,v} X^j_{u,v} \to \int_0^t G^k_{i,j}(u) \, d[X^i, X^j]_u .
$$
To see that (5.4) holds, write $\sum_{[u,v]\in P_n, u < t} g(u) Y_{u,v}^2 = \int_{[0,t)} g(u) \, d\mu_n(u)$ with
$$
\mu_n = \sum_{[u,v]\in P_n, \, u < t} \delta_u Y_{u,v}^2 .
$$
Note that $\mu_n$ is a (finite) measure on $[0,t)$ with distribution function
$$
F_n(s) := \mu_n([0,s]) = \sum_{\substack{[u,v]\in P_n\\ u \le s}} Y_{u,v}^2 .
$$
As $n \to \infty$, $F_n(s) \to [Y]_s$ for any $s \le t$ by continuity of $Y$. Since $[Y]_s$ is a continuous function of $s$, convergence of the distribution functions implies weak convergence of the measures $\mu_n$ to the measure $d[Y]$ on $[0,t)$, with distribution function $[Y]$. Since $g|_{[0,t)}$ is continuous, (5.4) follows and the proof is finished. □

Combination of the above lemma with (5.3) gives the Itô–Föllmer formula,
$$
F(X_t) = F(X_0) + \int_0^t DF(X_s) \, dX_s + \frac{1}{2} \int_0^t D^2 F(X_s) \, d[X, X]_s , \qquad 0 \le t \le T , \tag{5.5}
$$
where the middle integral is given by the (now existent) limit of left-point Riemann–Stieltjes approximations
$$
\lim_{n\to\infty} \sum_{[u,v]\in P_n} \langle DF(X_u), X_{u,v} \rangle =: \int_0^t DF(X) \, dX .
$$
In fact, we encourage the reader to verify as an exercise that this formula is valid whenever $X : [0,T] \to \mathbf{R}^d$ is continuous, of finite quadratic variation, with $t \mapsto [X, X]_t$ continuous. Note, however, that Föllmer's notion of quadratic variation (and the above integral) can and will depend in general on the sequence $(P_n)$.
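The mechanism behind (5.3) and (5.5) is a second-order Taylor expansion along the partition whose remainder sums to something negligible. A minimal sketch, under assumptions made only for illustration: $d = 1$, $F(x) = x^3$, and a scaled simple random walk with a fixed seed as test path, for which the discrete quadratic variation equals one exactly. For this cubic $F$ the Taylor remainder on each interval is exactly $X_{u,v}^3$.

```python
import random

random.seed(1)
n = 2 ** 12
step = n ** -0.5

# Hypothetical test path: scaled simple random walk, so that sum of squared
# increments is n * step^2 = 1.
X = [0.3]
for _ in range(n):
    X.append(X[-1] + random.choice((-step, step)))

F = lambda x: x ** 3
dF = lambda x: 3.0 * x ** 2
d2F = lambda x: 6.0 * x

# Left-point sums appearing in the last line of (5.3).
first_order = sum(dF(X[k]) * (X[k + 1] - X[k]) for k in range(n))
second_order = 0.5 * sum(d2F(X[k]) * (X[k + 1] - X[k]) ** 2 for k in range(n))

remainder = F(X[-1]) - F(X[0]) - first_order - second_order
cubic_sum = sum((X[k + 1] - X[k]) ** 3 for k in range(n))

print("remainder:", remainder, " sum of cubed increments:", cubic_sum)
print("bound max|increment| * quadratic variation:", step)
```

The remainder is bounded by the mesh-size of the increments times the quadratic variation, so it disappears as the partition is refined, leaving the two sums of (5.3).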

5.4 Backward integration

Let us recall that the backward Itô integral is defined as
$$
\int_0^T f_t \, d\overleftarrow{B}_t = \lim_{n} \sum_{[s,t]\in P^n} f_t B_{s,t}
$$
whenever this limit exists, in probability and uniformly on compact time intervals, and does not depend on the sequence of partitions (as long as their meshes tend to zero). For instance,
$$
\int_0^t B_s \, d\overleftarrow{B}_s = \frac{1}{2} B_t^2 + \frac{t}{2} .
$$
In many applications one encounters integrands $f$ that are "backward adapted" in the sense that $f_t$ is $\mathcal{F}_t^T$-measurable with $\mathcal{F}_s^t := \sigma(B_{u,v} : s \le u \le v \le t)$. For example
$$
\int_0^t (B_t - B_s) \, d\overleftarrow{B}_s = B_t^2 - \Bigl( \frac{1}{2} B_t^2 + \frac{t}{2} \Bigr) = \frac{1}{2} B_t^2 - \frac{t}{2}
$$
and we note (in contrast to the previous example) the zero mean property, which of course comes from a backward martingale structure. By analogy with its forward counterpart, the backward Stratonovich integral is defined as the backward Itô integral, minus 1/2 times the quadratic variation of the integrand.
The purpose of this section is to understand backward integration as rough integration. To this end, recall that the rough integral of $(Y, Y') \in \mathscr{D}^{2\alpha}_X$ against $\mathbf{X} = (X, \mathbb{X})$ was defined by
$$
\int_0^T Y \, d\mathbf{X} = \lim_{|P| \downarrow 0} \sum_{[s,t]\in P} Y_s X_{s,t} + Y'_s \mathbb{X}_{s,t}
$$
where $P$ are partitions of $[0,T]$ with mesh-size $|P|$. Clearly, some sort of "left-point" evaluation has been hard-wired into our definition of rough integral. On the other hand, one can expect that feeding in explicit second order information makes this choice somewhat less important than in the case of classical stochastic integration.
The next proposition, purely deterministic, answers the question to what extent one can replace left-point by right-point evaluation.


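Proposition 5.10 (Backward representation of rough integral). Given a rough path $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^{\alpha}$ with $\alpha > 1/3$ and $(Y, Y') \in \mathscr{D}^{2\alpha}_X$ we have
$$
\int_0^T Y \, d\mathbf{X} = \lim_{|P| \downarrow 0} \sum_{[s,t]\in P} Y_t X_{s,t} + Y'_t \bigl( \mathbb{X}_{s,t} - X_{s,t} \otimes X_{s,t} \bigr) . \tag{5.6}
$$

Proof. We know that the rough integral is given as (compensated) Riemann–Stieltjes limit
$$
\int_0^T Y \, d\mathbf{X} = \lim_{|P| \downarrow 0} \sum_{[s,t]\in P} Y_s X_{s,t} + Y'_s \mathbb{X}_{s,t} + (*)_{s,t}
$$
whenever $(*)_{s,t} \approx 0$ in the sense that $(*)_{s,t} = O(|t-s|^{3\alpha}) = o(|t-s|)$, so that it does not contribute to the limit. (Recall (4.19) and Lemma 4.2.) But then
$$
\begin{aligned}
Y_s X_{s,t} + Y'_s \mathbb{X}_{s,t} &= Y_t X_{s,t} - Y_{s,t} X_{s,t} + Y'_s \mathbb{X}_{s,t} \\
&\approx Y_t X_{s,t} - Y'_s X_{s,t} \otimes X_{s,t} + Y'_s \mathbb{X}_{s,t} \\
&\approx Y_t X_{s,t} + Y'_t \bigl( \mathbb{X}_{s,t} - X_{s,t} \otimes X_{s,t} \bigr) ,
\end{aligned}
$$
and the claimed backward compensated Riemann–Stieltjes representation holds. □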
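The forward and backward compensated sums of this proposition can be compared on a concrete example. The sketch below makes several assumptions that are not in the text: dimension one, the geometric second level $\mathbb{X}_{s,t} = \frac{1}{2} X_{s,t}^2$, the controlled path $Y = X^2$ with $Y' = 2X$, and a scaled random walk with a fixed seed as test path. In this setting the rough integral should equal $(X_T^3 - X_0^3)/3$.

```python
import random

random.seed(2)
n = 2 ** 12
step = n ** -0.5

# Hypothetical rough-looking test path (scaled simple random walk).
X = [0.0]
for _ in range(n):
    X.append(X[-1] + random.choice((-step, step)))

Y = lambda x: x ** 2       # integrand Y_t = F(X_t) with F(x) = x^2
dY = lambda x: 2.0 * x     # Gubinelli derivative Y'_t = F'(X_t)

forward = 0.0
backward = 0.0
for k in range(n):
    inc = X[k + 1] - X[k]
    level2 = 0.5 * inc * inc                      # geometric second level
    # Left-point compensated sum (definition of the rough integral).
    forward += Y(X[k]) * inc + dY(X[k]) * level2
    # Right-point compensated sum, as in (5.6).
    backward += Y(X[k + 1]) * inc + dY(X[k + 1]) * (level2 - inc * inc)

exact = (X[-1] ** 3 - X[0] ** 3) / 3.0
print(forward, backward, exact)
```

Both sums agree with each other up to rounding and differ from the exact value only by one third of the sum of cubed increments, which is at most `step / 3` here.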

Remark 5.11. Note that another way of writing (5.6) is the somewhat more suggestive
$$
\int_0^T Y \, d\mathbf{X} = - \lim_{|P| \downarrow 0} \sum_{[s,t]\in P} Y_t X_{t,s} + Y'_t \mathbb{X}_{t,s} .
$$
It is worth noting that a naively defined backward rough integral, by replacing left-point evaluation $(Y_s, Y'_s)$ in the usual definition of the rough integral by right-point evaluation $(Y_t, Y'_t)$, say
$$
\lim_{|P| \downarrow 0} \sum_{[s,t]\in P} Y_t X_{s,t} + Y'_t \mathbb{X}_{s,t} ,
$$
is, in general, not well-defined. In fact, in view of the above proposition, existence of this limit is equivalent to existence of (either)
$$
\lim_{|P| \downarrow 0} \sum_{[s,t]\in P} Y'_t X_{s,t} \otimes X_{s,t} = \lim_{|P| \downarrow 0} \sum_{[s,t]\in P} Y'_s X_{s,t} \otimes X_{s,t} .
$$
There is no reason why, for a general path $X \in C^{\alpha}$, the above limits should exist. On the other hand, we already considered such sums in the context of the Itô–Föllmer formula, cf. Lemma 5.9. The appropriate condition for $X$ was seen to be "quadratic variation (in the sense of Föllmer, along some $(P^n)$)". And under this assumption,
$$
\sum_{[s,t]\in P^n} Y'_s X_{s,t} \otimes X_{s,t} \to \int_0^T Y'_s \, d[X]_s . \tag{5.7}
$$

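Of course, with probability one, $d$-dimensional standard Brownian motion has quadratic variation in the sense of Föllmer, along dyadic partitions, for instance, with $[B, B]_t = I t$, where $I$ is the identity matrix. These remarks are crucial for proving

Theorem 5.12. Define the random rough paths $\mathbf{B}^{\text{Strat}} \overset{\text{def}}{=} (B, \mathbb{B}^{\text{Strat}})$ and $\mathbf{B}^{\text{back}} \overset{\text{def}}{=} (B, \mathbb{B}^{\text{back}})$ by
$$
\begin{aligned}
\mathbb{B}^{\text{Strat}}_{s,t} &\overset{\text{def}}{=} \int_s^t B_{s,r} \circ dB_r = \mathbb{B}^{\text{Itô}}_{s,t} + \frac{1}{2} I (t-s) , \\
\mathbb{B}^{\text{back}}_{s,t} &\overset{\text{def}}{=} \int_s^t B_{s,r} \, d\overleftarrow{B}_r = \mathbb{B}^{\text{Itô}}_{s,t} + I (t-s) .
\end{aligned}
$$
Then, the following statements hold.

i) Assume $(Y(\omega), Y'(\omega)) \in \mathscr{D}^{2\alpha}_{B(\omega)}$ a.s. and $Y, Y'$ are adapted as processes. Then, with probability one, for all $t \in [0,T]$,
$$
\begin{aligned}
\int_0^t Y \, d\mathbf{B}^{\text{Strat}} &= \int_0^t Y_s \, dB_s + \frac{1}{2} \int_0^t Y'_s I \, ds = \int_0^t Y_s \circ dB_s , \\
\int_0^t Y \, d\mathbf{B}^{\text{back}} &= \int_0^t Y_s \, dB_s + \int_0^t Y'_s I \, ds .
\end{aligned}
$$

ii) Assume $(Y(\omega), Y'(\omega)) \in \mathscr{D}^{2\alpha}_{B(\omega)}$ a.s. and $Y_t, Y'_t$ are $\mathcal{F}_t^T$-measurable for all $t < T$. Then with probability one, for all $r \in [0,T]$,
$$
\begin{aligned}
\int_r^T Y \, d\mathbf{B}^{\text{Strat}} &= \int_r^T Y_t \, d\overleftarrow{B}_t - \frac{1}{2} \int_r^T Y'_t I \, dt = \int_r^T Y_s \circ d\overleftarrow{B}_s , \\
\int_r^T Y \, d\mathbf{B}^{\text{back}} &= \int_r^T Y_t \, d\overleftarrow{B}_t .
\end{aligned}
$$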

Proof. Regarding point i), it follows from the definition of the rough integral (see also Example 4.13) that
$$
\int_0^t Y \, d\mathbf{B}^{\text{back}} = \int_0^t Y \, d\mathbf{B}^{\text{Itô}} + \int_0^t Y' I \, ds .
$$
The claim then follows from Proposition 5.1. The Stratonovich case is similar, now using Corollary 5.2.
We now turn to point ii). Thanks to the backward representation established in Proposition 5.10,
$$
\begin{aligned}
\int_r^T Y \, d\mathbf{B}^{\text{back}} &= \lim_{n\to\infty} \sum_{[s,t]\in P^n} Y_t B_{s,t} + Y'_t \bigl( \mathbb{B}^{\text{Itô}}_{s,t} + I (t-s) - B_{s,t} \otimes B_{s,t} \bigr) \\
&= \lim_{n\to\infty} \sum_{[s,t]\in P^n} Y_t B_{s,t} + Y'_t \mathbb{B}^{\text{Itô}}_{s,t} - Y'_s \bigl( B_{s,t} \otimes B_{s,t} - I (t-s) \bigr) ,
\end{aligned}
$$
using $Y'_{s,t} (B_{s,t} \otimes B_{s,t}) \approx 0$ and $Y'_{s,t} I (t-s) \approx 0$. (As before $(*)_{s,t} \approx 0$ means $(*)_{s,t} = o(|t-s|)$.) Now we know that with probability 1, $B(\omega)$ has finite quadratic variation $[B]_t = I t$, in the sense of Föllmer along some sequence $(P^n)$. As a purely deterministic consequence, cf. (5.7), on the same set of full measure,
$$
\lim_{n\to\infty} \sum_{[s,t]\in P^n} Y'_s B_{s,t} \otimes B_{s,t} = \int_r^T Y'_s \, d[B]_s = \lim_{n\to\infty} \sum_{[s,t]\in P^n} Y'_s I (t-s) .
$$
It follows at once that
$$
\int_r^T Y \, d\mathbf{B}^{\text{back}}(\omega) = \lim_{n\to\infty} \sum_{[s,t]\in P^n} Y_t B_{s,t} + Y'_t \mathbb{B}^{\text{Itô}}_{s,t} .
$$
Since $\mathbb{B}^{\text{Itô}}_{s,t}$ is independent from $\mathcal{F}_t^T$ and $Y_t, Y'_t$ are $\mathcal{F}_t^T$-measurable, a (backward) martingale argument shows that
$$
\lim_{n\to\infty} \sum_{[s,t]\in P^n} Y'_t \mathbb{B}^{\text{Itô}}_{s,t} = 0 .
$$
As a consequence, with probability one,
$$
\int_r^T Y \, d\mathbf{B}^{\text{back}}(\omega) = \lim_{n\to\infty} \sum_{[s,t]\in P^n} Y_t B_{s,t} = \int_r^T Y \, d\overleftarrow{B} .
$$
The (backward) Stratonovich case is then treated as a simple perturbation,
$$
\begin{aligned}
\int_r^T Y \, d\mathbf{B}^{\text{Strat}} &= \lim_{n\to\infty} \sum_{[s,t]\in P^n} Y_t B_{s,t} + Y'_t \bigl( \mathbb{B}^{\text{Itô}}_{s,t} + I (t-s) - B_{s,t} \otimes B_{s,t} \bigr) - \frac{1}{2} Y'_t I (t-s) \\
&= \int_r^T Y_t \, d\overleftarrow{B}_t - \frac{1}{2} \int_r^T Y'_t I \, dt ,
\end{aligned}
$$
thus concluding the proof. □

5.5 Exercises

Exercise 5.13. Complete the proof of Proposition 5.1 in the case of unbounded $Y'$.

Solution 5.14. It suffices to show the convergence of (5.1) in probability; to this end, we introduce stopping times
$$
\tau_M \overset{\text{def}}{=} \max\bigl\{ t \in P : |Y'_t| < M \bigr\} \in [0,T] \cup \{+\infty\}
$$
and note that $\lim_{M \to \infty} \tau_M = \infty$ almost surely. The stopped process $S^{\tau_M}_\cdot$ is also a martingale, and we see as above that, for every fixed $M > 0$,
$$
\Bigl\| \sum_{\substack{[u,v]\in P\\ u \le \tau_M}} Y'_u \mathbb{B}_{u,v} \Bigr\|^2_{L^2} = O(|P|) .
$$
The proof is then easily finished by sending $M$ to infinity.

Exercise 5.15 (Applications to statistics, see [DFM13]). Let $B$ be a $d$-dimensional Brownian motion. Consider a $d \times d$ matrix $A$, a non-degenerate volatility matrix $\sigma$ of the same dimension and a sufficiently nice map $h : \mathbf{R}^d \to \mathbf{R}^d$ so that the Itô stochastic differential equation
$$
dY_t = A\, h(Y_t) \, dt + \sigma \, dB_t \tag{5.8}
$$
has a unique solution, starting from any $Y_0 = y_0 \in \mathbf{R}^d$. (As a matter of fact, this SDE can be solved pathwise by considering the random ODE for $Z_t = Y_t - \sigma B_t$.) We are interested in the maximum likelihood estimation of the drift parameter $A$ over a fixed time horizon $[0,T]$, given some observation path $Y = Y(\omega)$. Recall that this estimator, $\hat{A}_T(\omega)$, is based on the Radon–Nikodym density on pathspace, as given by Girsanov's theorem, relative to the drift free diffusion.
a) Let $d = 1$, $h(y) = y$. Show that the estimator $\hat{A}$ can be "robustified" in the sense that $\hat{A}_T(\omega) = \tilde{A}_T(Y(\omega))$ where
$$
\tilde{A}_T(Y) = \frac{Y_T^2 - y_0^2 - \sigma^2 T}{2 \int_0^T Y_t^2 \, dt} \tag{5.9}
$$
is defined deterministically for any non-zero $Y \in C([0,T], \mathbf{R}^d)$, and continuous with respect to uniform topology.
b) Take again $h(x) = x$, but now in dimension $d > 1$. Show that $\hat{A}$ admits a robust representation on rough path space, i.e. one has $\hat{A}_T(\omega) = \tilde{A}_T(\mathbf{Y}(\omega))$ where $\tilde{A}_T = \tilde{A}_T(\mathbf{Y})$ is deterministically defined and continuous with respect to $\alpha$-Hölder rough path topology for any fixed $\alpha \in (1/3, 1/2)$. Here, $\mathbf{Y}(\omega)$ is the geometric rough path constructed from $Y$ by iterated Stratonovich integration. Explain why there cannot be a robust representation on path space (as was the case when $d = 1$). What about more general $h$?
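For part a), the robust formula (5.9) can be compared numerically with the ratio of a left-point Itô sum and the time integral of $Y^2$, which is a discretised form of the usual likelihood-based estimator. This is only a sketch: the Euler scheme, the parameter values, the seed and both function names are assumptions made for illustration, and neither function is claimed to reproduce the construction in [DFM13].

```python
import random

def robust_drift_estimator(path, dt, sigma):
    """Evaluate (5.9) on a sampled scalar path (left-point rule for the time integral)."""
    T = dt * (len(path) - 1)
    integral_y2 = sum(y * y for y in path[:-1]) * dt
    return (path[-1] ** 2 - path[0] ** 2 - sigma ** 2 * T) / (2.0 * integral_y2)

def ito_sum_estimator(path, dt):
    """Ratio of the left-point sum for the Ito integral of Y dY and the integral of Y^2."""
    numerator = sum(path[k] * (path[k + 1] - path[k]) for k in range(len(path) - 1))
    integral_y2 = sum(y * y for y in path[:-1]) * dt
    return numerator / integral_y2

# Hypothetical experiment: Euler scheme for dY = A Y dt + sigma dB with A = -1.
random.seed(3)
A_true, sigma, T, n = -1.0, 1.0, 10.0, 100_000
dt = T / n
Y = [1.0]
for _ in range(n):
    Y.append(Y[-1] + A_true * Y[-1] * dt + sigma * random.gauss(0.0, dt ** 0.5))

robust = robust_drift_estimator(Y, dt, sigma)
ito = ito_sum_estimator(Y, dt)
print("robust (5.9):", robust, " Ito-sum version:", ito)
```

The two values differ only through the gap between the realised quadratic variation of the sampled path and $\sigma^2 T$. The robust version uses nothing but the endpoint values and a time integral, which is why it is continuous in the uniform topology.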
Exercise 5.16 (Skorokhod anticipating integration versus rough integration). We have seen that Itô integration coincides with rough integration against $\mathbf{B}^{\text{Itô}}(\omega)$, subject to natural conditions (in particular: adaptedness of $(Y, Y')$) which guarantee that both are well-defined. A well-known extension of the Itô integral to non-adapted integrands is given by the Skorokhod integral, details of which are found in most textbooks on Malliavin calculus, see for example [Nua06].

a) Let $B$ denote one-dimensional Brownian motion on $[0,T]$. Show that the Skorokhod integral of $B_T$ against $B$ over $[0,T]$, in symbols $\int_0^T B_T \, \delta B_t$, is given by $B_T^2 - T$.
b) Set $Y_t(\omega) := B_T(\omega)$, with (zero) increments (trivially) controlled by $B$ with $Y' := 0$. (In view of true roughness of Brownian motion, cf. Section 6, there is no other choice for $Y'$.) Show that the rough integral of $Y$ against Brownian motion over $[0,T]$ equals $B_T^2$. Conclude that Skorokhod and rough integrals (against Itô enhanced Brownian motion) do not coincide beyond adapted integrands.

Exercise 5.17 (Stratonovich anticipating integration versus rough integration; [CFV07]). In the spirit of Nualart–Pardoux [NP88], define the Stratonovich anticipating stochastic integral by
$$
\int_0^t u(s, \omega) \circ dB_s(\omega) \overset{\text{def}}{=} \lim_{n\to\infty} \int_0^t u(s, \omega) \frac{dB^n_s(\omega)}{ds} \, ds ,
$$
where $B^n$ is a dyadic piecewise-linear approximation to the Brownian motion $B$, whenever this limit exists in probability and uniformly on compacts. Consider (possibly anticipating) random 1-forms, $u(s, \omega) = F_\omega(B_s)$ with $F_\omega \in C_b^2$ for a.e. $\omega$. Show that with probability one,
$$
\int_0^\cdot F_\omega(B_s) \, d\mathbf{B}^{\text{Strat}}(\omega) \equiv \lim_{n\to\infty} \int_0^\cdot F_\omega(B_s) \frac{dB^n_s(\omega)}{ds} \, ds ,
$$
where the limit on the right-hand side exists in the almost-sure sense. Conclude that in this case rough integration against $\mathbf{B}^{\text{Strat}}$ coincides almost surely with Stratonovich anticipating stochastic integration, i.e.
$$
\int_0^\cdot F_\omega(B_s) \, d\mathbf{B}^{\text{Strat}}(\omega) \equiv \int_0^\cdot F_\omega(B_s) \circ dB_s(\omega) .
$$
Hint: It is useful to consider the pair $(\mathbf{B}^{\text{Strat}}, B^n)$, canonically viewed as (geometric) rough paths over $\mathbf{R}^{2d}$, followed by its rough path convergence to the "doubled" rough path $(\mathbf{B}^{\text{Strat}}, \mathbf{B}^{\text{Strat}})$ (which needs to be defined rigorously).
Remark. Nualart–Pardoux actually define their integral in terms of arbitrary deterministic (not necessarily dyadic) piecewise linear approximations and demand that the limit does not depend on the choice of the sequence of partitions. At the price of giving up the martingale argument, which made dyadic approximations easy (Proposition 3.6), everything can also be done in the general case; see Exercises 10.13 and 10.14 below.

Exercise 5.18. Consider the Itô–Föllmer integral given by
$$
\lim_{n\to\infty} \sum_{[u,v]\in P_n} \langle DF(X_u), X_{u,v} \rangle =: \int_0^t DF(X) \, dX
$$
whenever this limit exists along some sequence of dissections $(P_n) \subset [0,t]$ with mesh $|P_n| \to 0$. Show that this limit does not exist, in general, when $X = B^H$, a $d$-dimensional fractional Brownian motion with Hurst parameter $H < 1/2$. Hint: Consider the simplest possible non-trivial case, namely $d = 1$ and $F(x) = x^2$.
Solution 5.19. Assume convergence in probability, say along some $(P_n)$, for the approximating (left-point) sum,
$$
\sum_{[u,v]\in P_n} X_u X_{u,v} .
$$
We look for a contradiction. Elementary "calculus for sums" implies that the mid-point sum converges, i.e. where $X_u$ above is replaced by $X_u + X_{u,v}/2$. It follows that convergence of the left-point sums is equivalent to existence of quadratic variation, i.e. existence of
$$
\lim_{n\to\infty} \sum_{[u,v]\in P_n} |X_{u,v}|^2 .
$$
Note that $E|X_{u,v}|^2 = (1/2^n)^{2H}$ so that the expectation of this sum equals $2^{n(1-2H)}$, which diverges when $H < 1/2$. In particular, quadratic variation does not exist as $L^1$ limit. But it also cannot exist as a limit in probability, for both types of convergence are equivalent on any finite Wiener–Itô chaos.
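The divergence of the expected quadratic variation is elementary to tabulate. The sketch below assumes dyadic partitions of the unit interval and uses only the variance formula $E|X_{u,v}|^2 = |v-u|^{2H}$ quoted in the solution; the function name is a hypothetical choice and no fractional Brownian path is simulated.

```python
def expected_dyadic_quadratic_variation(hurst, level):
    """Expected sum of squared increments over the 2**level dyadic intervals of [0, 1]."""
    n_intervals = 2 ** level
    return n_intervals * (1.0 / n_intervals) ** (2.0 * hurst)

for hurst in (0.25, 0.5, 0.75):
    values = [expected_dyadic_quadratic_variation(hurst, level) for level in (4, 8, 12)]
    print(hurst, values)
```

For $H = 1/2$ the expectation stays equal to one, for $H > 1/2$ it tends to zero, and for $H < 1/2$ it grows like $2^{n(1-2H)}$, which is the obstruction used in the solution.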
Exercise 5.20. In Proposition 5.6, replace the assumption that $\mathbf{X} = (X, \mathbb{S}) \in \mathscr{C}^{\alpha}_r([0,T], V)$ with $\alpha > 1/3$, by a suitable $p$-variation assumption with $p < 3$. Show that $[\mathbf{X}]$ has finite $p/2$-variation and that $\int D^2 F(X) \, d[\mathbf{X}]$, as it appears in Itô's formula for reduced rough paths, remains a Young integral.

Exercise 5.21. Prove Proposition 1.1.
Solution 5.22. Without loss of generality, we consider the problem on the interval $[0, 2\pi]$. Assume by contradiction that there is a space $\mathcal{B} \subset C([0, 2\pi])$ which carries the law $\mu$ of Brownian motion and such that $(f, g) \mapsto \int f \, dg$ is continuous on $\mathcal{B}$. By definition, the Cameron–Martin space of $\mu$ is $\mathcal{H} = W_0^{1,2}([0, 2\pi])$, which has an orthonormal basis $\{e_n\}_{n \in \mathbf{Z}}$ given by
$$
e_0(t) = \frac{t}{\sqrt{2\pi}} , \qquad e_k(t) = \frac{\sin kt}{k \sqrt{\pi}} , \qquad e_{-k}(t) = \frac{1 - \cos kt}{k \sqrt{\pi}} ,
$$
for $k > 0$. It follows from standard Gaussian measure theory [Bog98] that, given a sequence $\xi_n$ of i.i.d. normal Gaussian random variables, the sequence $X_N = \sum_{n=-N}^{N} e_n \xi_n$ converges almost surely in $\mathcal{B}$ to a limit $X$ such that the law of $X$ is $\mu$. Write now $Y_N = \sum_{n=-N}^{N} \mathrm{sign}(n)\, e_n \xi_n$, so that one also has $Y_N \to Y$ with law of $Y$ given by $\mu$.
This immediately leads to a contradiction: on the one hand, assuming that $(f, g) \mapsto \int f \, dg$ is continuous on $\mathcal{B}$, this implies that $\int_0^{2\pi} X_N(t) \, dY_N(t)$ converges to some finite (random) real number. On the other hand, an explicit calculation yields
$$
\int_0^{2\pi} X_N(t) \, dY_N(t) = \frac{\xi_0^2}{2} + \sum_{n=1}^{N} \frac{\xi_n^2 + \xi_{-n}^2}{n} .
$$
It is now straightforward to verify that this diverges logarithmically, thus concluding the proof.
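The logarithmic divergence can be made concrete at the level of expectations. Assuming, as in the solution, that the $\xi_n$ are i.i.d. standard normal, the expectation of the displayed right-hand side is $\frac{1}{2} + 2 \sum_{n=1}^{N} \frac{1}{n}$. The sketch below only tabulates this deterministic quantity (the function name is a hypothetical choice) and compares it with $2 \log N$.

```python
import math

def expected_integral(N):
    """Expectation of xi_0^2 / 2 + sum_{n=1}^N (xi_n^2 + xi_{-n}^2) / n
    for i.i.d. standard normal xi_n, using E[xi_n^2] = 1."""
    return 0.5 + sum(2.0 / n for n in range(1, N + 1))

for N in (10, 100, 1000, 10_000):
    print(N, expected_integral(N), 2.0 * math.log(N))
```

The gap between the two columns stabilises near $\frac{1}{2} + 2\gamma$, with $\gamma$ the Euler–Mascheroni constant, so the expectation grows like $2 \log N$ without bound.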

5.6 Comments

Rough integrals of 1-forms against the Brownian rough path (and also continuous
semi-martingales enhanced to rough paths) are well known to coincide with stochastic
integrals, see [LQ02, FV10b] for instance, but the extensions presented in this section
seem to be new. Our Itô formula for reduced rough paths also appears to be new.
Chapter 6
Doob–Meyer type decomposition for rough
paths

Abstract A deterministic Doob–Meyer type decomposition is established. It is closely related to the question of to what extent $Y'$ is determined from $Y$, given that $(Y, Y') \in \mathscr{D}^{2\alpha}_X$. The crucial property is true roughness of $X$, a deterministic property that guarantees that $X$ varies in all directions, all the time.

6.1 Motivation from stochastic analysis

Consider a continuous semi-martingale $(S_t : t \ge 0)$. By definition (e.g. [RY91, Ch. IV]) this means that $S = M + A$ where $M \in \mathcal{M}$, the space of continuous local martingales, and $A \in \mathcal{V}$, the space of continuous adapted processes of finite variation. Then it is well known that the decomposition $S = M + A$ is unique in the following sense.

Proposition 6.1. Assume $M, \tilde{M} \in \mathcal{M}$, vanishing at zero, and $A, \tilde{A} \in \mathcal{V}$ such that $M + A \equiv \tilde{M} + \tilde{A}$ (i.e. the respective processes are indistinguishable). Then
$$
M \equiv \tilde{M} \quad \text{and} \quad A \equiv \tilde{A} .
$$
Furthermore, if $S = M + A \equiv 0$ on some random interval $[0, \tau)$ where $\tau$ is a stopping time, then $[M] \equiv 0$ on $[0, \tau)$ and $A \equiv 0$ on $[0, \tau)$.

Proof. Assume $M + A \equiv \tilde{M} + \tilde{A}$. Then $M - \tilde{M} \in \mathcal{V}$, and null at zero. By a standard result in martingale theory, see for example [RY91, IV, Prop 1.2], this entails that $M - \tilde{M} \equiv 0$. But then $A \equiv \tilde{A}$ and the proof is complete.
Regarding the second statement, consider the stopped semi-martingale, $S^{\tau} = M^{\tau} + A^{\tau}$ where $M^{\tau}_t = M_{t \wedge \tau}$ and similarly for $A$. By assumption $S^{\tau} \equiv 0$ and hence, by the first part, $M^{\tau}, A^{\tau} \equiv 0$. This also implies that the quadratic variation of $M^{\tau}$, denoted by $[M^{\tau}]$, vanishes. Since $[M^{\tau}] = [M]^{\tau}$ (see e.g. [RY91, Ch. IV]) it indeed follows that $[M] \equiv 0$ on $[0, \tau)$. □


The above proposition applies in particular when $M$ is given as a multidimensional (say $\mathbb{R}^e$-valued) stochastic integral of a suitable $L(\mathbb{R}^d, \mathbb{R}^e)$-valued integrand $Y$ (continuous and adapted will do) against $d$-dimensional Brownian motion $B$, while $A$ is the indefinite integral of some suitable $\mathbb{R}^e$-valued process $Z$ (again, continuous and adapted will do). We then have

Corollary 6.2. Let $B$ be a $d$-dimensional Brownian motion and let $Y$, $Z$, $\tilde Y$, $\tilde Z$ be continuous stochastic processes adapted to the filtration generated by $B$. Assume, in the sense of indistinguishability of left- and right-hand sides, that
\[
\int_0^\cdot Y \, dB + \int_0^\cdot Z \, dt \equiv \int_0^\cdot \tilde Y \, dB + \int_0^\cdot \tilde Z \, dt \quad \text{on } [0, T]. \tag{6.1}
\]
Then $Y \equiv \tilde Y$ and $Z \equiv \tilde Z$ on $[0, T]$.

Proof. We may take dimension $e = 1$, otherwise argue componentwise. Also, by linearity, it suffices to consider the case $\tilde Y = 0$, $\tilde Z = 0$. By the second part of the previous proposition
\[
\Big[ \int_0^\cdot Y \, dB \Big] \equiv \Big[ \sum_{k=1}^d \int_0^\cdot Y_k \, dB^k \Big] \equiv 0 \quad \text{on } [0, T].
\]
On the other hand, due to $d[B^k, B^l]_t = dt$ if $k = l$, and zero else,
\[
\Big[ \sum_{k=1}^d \int_0^\cdot Y_k \, dB^k \Big] \equiv \sum_{k,l=1}^d \int_0^\cdot Y_k Y_l \, d[B^k, B^l] = \sum_{k=1}^d \int_0^\cdot Y_k^2 \, dt.
\]
It follows that $Y \equiv 0$ as claimed. By differentiation, it then follows that also $Z \equiv 0$. □

Clearly, the martingale and quadratic (co-)variation – i.e. probabilistic – properties of $B$ play a key role in the proof of Corollary 6.2. It is worth noting that, with $\beta$ a scalar Brownian motion and $B^1 = B^2 = \beta$, the conclusion fails; try non-zero $Y^1 \equiv -Y^2$, $Z \equiv 0$. It is crucial that $d$-dimensional standard Brownian motion “moves in all directions”, captured through the non-degeneracy of the quadratic covariation matrix $[B^k, B^l]_t$.
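
The failure in the degenerate case can be seen in a small simulation. The following is only an illustrative sketch: the step count, the seed and the constant integrand $Y = (1, -1)$ are assumptions made for the example, and the discrete sums merely approximate the quadratic variation of $\int Y \, dB$.

```python
import math
import random


def simulate(n_steps=20000, T=1.0, seed=0):
    """Discrete quadratic variation of int Y dB for the constant integrand Y = (1, -1).

    Returns the value for the degenerate driver B^1 = B^2 = beta and for a
    driver with two independent components.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    qv_degenerate = 0.0
    qv_independent = 0.0
    for _ in range(n_steps):
        d_beta = rng.gauss(0.0, sd)   # increment of the scalar motion beta
        d_b1 = rng.gauss(0.0, sd)     # increments of two independent components
        d_b2 = rng.gauss(0.0, sd)
        inc_degenerate = 1.0 * d_beta + (-1.0) * d_beta
        inc_independent = 1.0 * d_b1 + (-1.0) * d_b2
        qv_degenerate += inc_degenerate ** 2
        qv_independent += inc_independent ** 2
    return qv_degenerate, qv_independent


qv_deg, qv_ind = simulate()
print("degenerate driver :", qv_deg)   # the integral vanishes although Y is non-zero
print("independent driver:", qv_ind)   # close to (Y_1^2 + Y_2^2) T = 2
```

In the degenerate case the integral vanishes identically although $Y \neq 0$, so $Y$ cannot be recovered; with independent components the quadratic variation is close to $\sum_k Y_k^2 \, T$ and detects $Y$.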
Surprisingly perhaps, one can formulate a purely deterministic decomposition of the form (6.1): the stochastic integrals will be replaced by rough integrals, the relevant probabilistic properties of $B$ by certain conditions (“roughness from below,¹ in all directions”) on the sample path.

¹ As opposed to Hölder regularity which quantifies “roughness from above”, in the sense of an upper estimate of the increment.

6.2 Uniqueness of the Gubinelli derivative and Doob–Meyer

Here and in the sequel of this section we fix $\alpha \in (\tfrac13, \tfrac12]$, a rough path $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha([0, T], V)$ and a controlled rough path $(Y, Y') \in \mathscr{D}_X^{2\alpha}$. We first address the question to what extent $X$ and $Y$ determine the Gubinelli derivative $Y'$. As it turns out, $Y'$ is uniquely determined, provided that $X$ is sufficiently “rough from below, in all directions”. A Doob–Meyer type decomposition will then follow as a corollary.
Let us first consider the case when $X$ is scalar, i.e. with values in $V = \mathbb{R}$. Assume that for some given $s \in [0, T)$, there exists a sequence of times $t_n \downarrow s$ such that $|X_{s,t_n}| / |t_n - s|^{2\alpha} \to \infty$, i.e.
\[
\limsup_{t \downarrow s} \frac{|X_{s,t}|}{|t - s|^{2\alpha}} = +\infty.
\]
Then $Y'_s$ is uniquely determined from $Y$ by (4.16) and the condition that $\|R^Y\|_{2\alpha} < \infty$. In fact, one necessarily has $X_{s,t_n} \in \mathbb{R} \setminus \{0\}$ for $n$ large enough and so, from the very definition of $R^Y$,
\[
Y'_s = \frac{Y_{s,t_n}}{X_{s,t_n}} - \frac{R^Y_{s,t_n}}{|t_n - s|^{2\alpha}} \, \frac{|t_n - s|^{2\alpha}}{X_{s,t_n}},
\]
which implies that $\lim_{n \to \infty} Y_{s,t_n} / X_{s,t_n}$ exists and equals $Y'_s$. The multidimensional case is not that different, and the above consideration suggests the following definition.
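
As a sanity check of the scalar argument, one can look at a deterministic toy example. The choices below are assumptions made purely for illustration: $X_t = \sqrt{t}$ (which satisfies the divergence condition at $s = 0$ with $\alpha = 1/2$, but at no other time) and $Y = \sin(X)$, for which $Y' = \cos(X)$ and hence $Y'_0 = 1$.

```python
import math


def X(t):
    return math.sqrt(t)        # 1/2-Hölder path, "rough" only at time s = 0


def Y(t):
    return math.sin(X(t))      # controlled by X, with Gubinelli derivative cos(X)


def roughness_quotient(t, s=0.0, alpha=0.5):
    """|X_{s,t}| / |t - s|^(2 alpha); diverges as t decreases to s = 0."""
    return abs(X(t) - X(s)) / abs(t - s) ** (2 * alpha)


def derivative_quotient(t, s=0.0):
    """Y_{s,t} / X_{s,t}; converges to Y'_s as t decreases to s."""
    return (Y(t) - Y(s)) / (X(t) - X(s))


for k in (2, 4, 6, 8, 10):
    t = 10.0 ** (-k)
    print(k, roughness_quotient(t), derivative_quotient(t))
```

The first quotient blows up like $t^{-1/2}$, while the second one approaches $\cos(0) = 1$, in line with the displayed formula for $Y'_s$.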

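Definition 6.3. For fixed $s \in [0, T)$ we call $X \in \mathcal{C}^\alpha([0, T], V)$ “rough at time $s$” if
\[
\forall v^* \in V^* \setminus \{0\} : \quad \limsup_{t \downarrow s} \frac{|\langle v^*, X_{s,t} \rangle|}{|t - s|^{2\alpha}} = \infty.
\]
If $X$ is rough on some dense set of $[0, T]$, we call it truly rough.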

This definition is vindicated by the following result.

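Proposition 6.4 (Uniqueness of $Y'$). Let $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha$, $(Y, Y') \in \mathscr{D}_X^{2\alpha}$, so that the rough integral $\int Y \, d\mathbf{X}$ exists. Assume $X$ is rough at some time $s \in [0, T)$. Then
\[
Y_{s,t} = O\big(|t - s|^{2\alpha}\big) \ \text{as } t \downarrow s \quad \Longrightarrow \quad Y'_s = 0. \tag{6.2}
\]
As a consequence, if $X$ is truly rough and $(Y, \tilde Y') \in \mathscr{D}_X^{2\alpha}$ is another controlled rough path (with respect to $X$) then $Y' \equiv \tilde Y'$.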

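Proof. From the definition of $(Y, Y') \in \mathscr{D}_X^{2\alpha}$, we have
\[
Y_{s,t} = Y'_s X_{s,t} + O\big(|t - s|^{2\alpha}\big).
\]
Hence, for $t \in (s, s + \varepsilon)$,
\[
\frac{Y'_s X_{s,t}}{|t - s|^{2\alpha}} = \frac{Y_{s,t}}{|t - s|^{2\alpha}} + O(1) = O(1),
\]
where the second equality follows from the assumption made in (6.2). Now, $Y'_s X_{s,t}$ takes values in $\bar W$, the same Banach space in which $Y$ takes its values. For every $w^* \in \bar W^*$, the map $V \ni v \mapsto \langle w^*, Y'_s v \rangle$ defines an element $v^* \in V^*$ so that
\[
\frac{|\langle v^*, X_{s,t} \rangle|}{|t - s|^{2\alpha}} = \Big| \Big\langle w^*, \frac{Y'_s X_{s,t}}{|t - s|^{2\alpha}} \Big\rangle \Big| = O(1) \quad \text{as } t \downarrow s.
\]
Unless $v^* = 0$, the assumption that “$X$ is rough at time $s$” implies that, along some sequence $t_n \downarrow s$, we have the divergent behaviour $|\langle v^*, X_{s,t_n} \rangle| / |t_n - s|^{2\alpha} \to \infty$, which contradicts that the same expression is $O(1)$ as $t_n \downarrow s$. We thus conclude that $v^* = 0$. In other words,
\[
\forall w^* \in \bar W^*, \ v \in V : \quad \langle w^*, Y'_s v \rangle = 0,
\]
and this clearly implies $Y'_s = 0$. This finishes the proof of the implication stated in (6.2). The stated consequence follows by applying (6.2) to the controlled rough path $(0, Y' - \tilde Y')$: its increments vanish identically, so that $Y'_s = \tilde Y'_s$ at every time $s$ at which $X$ is rough. Since these times are dense and $Y' - \tilde Y'$ is continuous, $Y' \equiv \tilde Y'$. □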

The following result should be compared with Corollary 6.2.

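Theorem 6.5 (Doob–Meyer for rough paths). Assume that $X$ is rough at some time $s \in [0, T)$ and let $(Y, Y') \in \mathscr{D}_X^{2\alpha}$. Then
\[
\int_s^t Y \, d\mathbf{X} = O\big(|t - s|^{2\alpha}\big) \ \text{as } t \downarrow s \quad \Longrightarrow \quad Y_s = 0. \tag{6.3}
\]
As a consequence, if $X$ is truly rough, $(\tilde Y, \tilde Y') \in \mathscr{D}_X^{2\alpha}$ and $Z, \tilde Z \in C([0, T], W)$, then the identity
\[
\int_0^\cdot Y \, d\mathbf{X} + \int_0^\cdot Z \, dt \equiv \int_0^\cdot \tilde Y \, d\mathbf{X} + \int_0^\cdot \tilde Z \, dt \tag{6.4}
\]
on $[0, T]$ implies that $(Y, Y') \equiv (\tilde Y, \tilde Y')$ and $Z \equiv \tilde Z$ on $[0, T]$.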

Proof. Recall from Theorem 4.10 that $(I, I') := \big( \int Y \, d\mathbf{X}, Y \big)$ is controlled by $X$, i.e. $(I, I') \in \mathscr{D}_X^{2\alpha}$. The statement (6.3) is then an immediate consequence of (6.2).
The stated consequence is now straightforward. Pick any $s \in [0, T)$ such that $X$ is rough at time $s$. From (6.4), and for all $0 \le s \le t \le T$,
\[
\int_s^t (Y - \tilde Y) \, d\mathbf{X} = - \int_s^t (Z_r - \tilde Z_r) \, dr = O(|t - s|) = O\big(|t - s|^{2\alpha}\big),
\]
where the last equality is just the statement that $|t - s| = O(|t - s|^{2\alpha})$ as $t \downarrow s$, thanks to $\alpha \le 1/2$. We then conclude using (6.3) that $Y_s = \tilde Y_s$. If we now assume true roughness of $X$, this conclusion holds for a dense set of times $s$ and hence, by continuity of $Y$ and $\tilde Y$, we have $Y \equiv \tilde Y$. But then, by Proposition 6.4, we also have $Y' \equiv \tilde Y'$ and so
\[
\int_0^\cdot Y \, d\mathbf{X} \equiv \int_0^\cdot \tilde Y \, d\mathbf{X}.
\]
(Note that the above notation “hides” the dependence on $Y'$ resp. $\tilde Y'$.) But then (6.4) implies
\[
\int_0^t Z_r \, dr \equiv \int_0^t \tilde Z_r \, dr \quad \text{for } t \in [0, T],
\]
and we conclude by differentiation with respect to $t$. □

6.3 Brownian motion is truly rough

Recall that (say, $d$-dimensional standard) Brownian motion satisfies the so-called (Khintchine) law of the iterated logarithm, that is
\[
\forall t \ge 0 : \quad \mathbb{P} \Big[ \limsup_{h \downarrow 0} \frac{|B_{t,t+h}|}{h^{1/2} (\ln \ln 1/h)^{1/2}} = \sqrt{2} \Big] = 1. \tag{6.5}
\]
See [McK69, p.18] or [RY91, Ch. II] for instance; the proof typically uses exponential martingales. Note that it is enough to consider $t = 0$ since $(B_{t,t+h} : h \ge 0)$ is also a Brownian motion.

Theorem 6.6. With probability one, Brownian motion on $V = \mathbb{R}^d$ is truly rough, relative to any Hölder exponent $\alpha \in [1/4, 1/2)$.

Proof. It is enough to show that, for fixed time $s$, and any $\theta \in [1/2, 1)$,
\[
\mathbb{P} \Big[ \forall \varphi \in V^*, \ |\varphi| = 1 : \ \limsup_{t \downarrow s} \frac{|\varphi(B_{s,t})|}{|t - s|^\theta} = +\infty \Big] = 1.
\]
(Then take $s \in \mathbb{Q}$ and conclude that the above event holds true, simultaneously for all such $s$, with probability one.)
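To this end, set $\psi(h) := h^{1/2} (\ln \ln 1/h)^{1/2}$. We need the following two consequences of (6.5). There exists $c > 0$ (here $c = \sqrt{2}$) such that for every fixed unit dual vector $\varphi \in V^* = (\mathbb{R}^d)^*$ and every fixed $s \in [0, T)$
\[
\mathbb{P} \Big[ \limsup_{t \downarrow s} \frac{|\langle \varphi, B_{s,t} \rangle|}{\psi(t - s)} \ge c \Big] = 1,
\qquad
\mathbb{P} \Big[ \limsup_{t \downarrow s} \frac{|B_{s,t}|}{\psi(t - s)} < \infty \Big] = 1.
\]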

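Take $K \subset V^*$ to be any countable set of dual unit vectors that is dense in the unit sphere of $V^*$. Since $K$ is countable, the set on which the first condition holds simultaneously for all $\varphi \in K$ has full measure,
\[
\mathbb{P} \Big[ \forall \varphi \in K : \ \limsup_{t \downarrow s} \frac{|\varphi(B_{s,t})|}{\psi(t - s)} \ge c \Big] = 1.
\]
On the other hand, every unit dual vector $\varphi \in V^*$ is the limit of some $(\varphi_n) \subset K$. Then
\[
\frac{|\langle \varphi_n, B_{s,t} \rangle|}{\psi(t - s)} \le \frac{|\langle \varphi, B_{s,t} \rangle|}{\psi(t - s)} + |\varphi_n - \varphi|_{V^*} \frac{|B_{s,t}|_V}{\psi(t - s)}
\]
so that, using $\limsup (|a| + |b|) \le \limsup |a| + \limsup |b|$, and restricting to the above set of full measure,
\[
c \le \limsup_{t \downarrow s} \frac{|\langle \varphi_n, B_{s,t} \rangle|}{\psi(t - s)} \le \limsup_{t \downarrow s} \frac{|\langle \varphi, B_{s,t} \rangle|}{\psi(t - s)} + |\varphi_n - \varphi|_{V^*} \limsup_{t \downarrow s} \frac{|B_{s,t}|_V}{\psi(t - s)}.
\]
Sending $n \to \infty$ gives, with probability one,
\[
0 < c \le \limsup_{t \downarrow s} \frac{|\langle \varphi, B_{s,t} \rangle|}{\psi(t - s)}.
\]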

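Hence, for a.e. sample $B = B(\omega)$ we can pick a sequence $(t_n)$ converging to $s$ such that $|\langle \varphi, B_{s,t_n} \rangle| / \psi(t_n - s) \ge c - 1/n$. On the other hand, for any $\theta \ge 1/2$ we have
\[
\frac{|\langle \varphi, B_{s,t_n}(\omega) \rangle|}{|t_n - s|^\theta}
= \frac{|\langle \varphi, B_{s,t_n}(\omega) \rangle|}{\psi(t_n - s)} \, \frac{\psi(t_n - s)}{|t_n - s|^\theta}
\ge (c - 1/n) \, |t_n - s|^{\frac12 - \theta} L(t_n - s) \to \infty,
\]
where in the borderline case $\theta = 1/2$ (which corresponds to $\alpha = 1/4$) this divergence is only logarithmic, $L(\tau) = (\ln \ln 1/\tau)^{1/2}$. □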
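
The two regimes in the last step can be made concrete by tabulating $\psi(h) / h^\theta$ for small $h$. This is a purely numerical illustration; the function names `psi` and `growth` are ad hoc, and $h$ has to be smaller than $1/e$ for $\ln \ln 1/h$ to be positive.

```python
import math


def psi(h):
    """psi(h) = h^(1/2) (ln ln 1/h)^(1/2), defined for 0 < h < 1/e."""
    return math.sqrt(h) * math.sqrt(math.log(math.log(1.0 / h)))


def growth(h, theta):
    """The factor psi(h) / h^theta appearing in the last display of the proof."""
    return psi(h) / h ** theta


for k in (2, 10, 50, 100, 300):
    h = 10.0 ** (-k)
    print(k, growth(h, 0.5), growth(h, 0.75))
```

For $\theta = 1/2$ the factor grows only like $(\ln \ln 1/h)^{1/2}$, which is extremely slow, whereas for $\theta > 1/2$ the power $h^{1/2 - \theta}$ dominates.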

6.4 A deterministic Norris’ lemma

We now turn our attention to a quantitative version of true roughness. In essence, we now replace $2\alpha$ in Definition 6.3 by $\theta$ and quantify the divergence, uniformly over all directions.

Definition 6.7. A path $X : [0, T] \to V$ with values in a Banach space $V$ is said to be $\theta$-Hölder rough for $\theta \in (0, 1)$, on scale (smaller than) $\varepsilon_0 > 0$, if there exists a constant $L := L_\theta(X) := L(\theta, \varepsilon_0, T; X) > 0$ such that for every $\varphi \in V^*$, $s \in [0, T]$ and $\varepsilon \in (0, \varepsilon_0]$ there exists $t \in [0, T]$ such that
\[
|t - s| < \varepsilon, \quad \text{and} \quad |\varphi(X_{s,t})| \ge L_\theta(X) \, \varepsilon^\theta |\varphi|. \tag{6.6}
\]
The largest such value of $L$ is called the modulus of $\theta$-Hölder roughness of $X$.


Observe that, indeed, any element in $\mathcal{C}^\alpha$ which is $\theta$-Hölder rough for $\theta < 2\alpha$ is truly rough. (We shall see in the next section that multidimensional Brownian motion is $\theta$-Hölder rough for any $\theta > 1/2$.) The following result can be viewed as a quantitative version of Proposition 6.4.

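Proposition 6.8. Let $(X, \mathbb{X}) \in \mathscr{C}^\alpha([0, T], V)$ be such that $X$ is $\theta$-Hölder rough for some $\theta \in (0, 1]$. Then, for every controlled rough path $(Y, Y') \in \mathscr{D}_X^{2\alpha}([0, T], W)$ one has
\[
\forall \varepsilon \in (0, \varepsilon_0] : \quad L \, \varepsilon^\theta \|Y'\|_\infty \le \mathrm{osc}(Y, \varepsilon) + \|R^Y\|_{2\alpha} \, \varepsilon^{2\alpha}. \tag{6.7}
\]
As an immediate consequence, if $\theta < 2\alpha$, $Y'$ is uniquely determined from $Y$, i.e. if $(Y, Y')$ and $(\tilde Y, \tilde Y')$ both belong to $\mathscr{D}_X^{2\alpha}$ and $Y \equiv \tilde Y$, then $Y' \equiv \tilde Y'$.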

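Proof. Let us start with the consequence: apply estimate (6.7) with $Y$ replaced by $Y - \tilde Y = 0$ and similarly $Y'$ replaced by $Y' - \tilde Y'$. Thanks to $L > 0$ it follows that
\[
\|Y' - \tilde Y'\|_\infty = O\big(\varepsilon^{2\alpha - \theta}\big)
\]
and we send $\varepsilon \to 0$ to conclude $Y' = \tilde Y'$. The remainder of the proof is devoted to establishing (6.7). Fix $s \in [0, T]$ and $\varepsilon \in (0, \varepsilon_0]$. From the definition of the remainder $R^Y$ in (4.16), it then follows that
\[
\sup_{|t - s| \le \varepsilon} |Y'_s X_{s,t}| \le \sup_{|t - s| \le \varepsilon} \big( |Y_{s,t}| + |R^Y_{s,t}| \big) \le \mathrm{osc}(Y, \varepsilon) + \|R^Y\|_{2\alpha} \, \varepsilon^{2\alpha}. \tag{6.8}
\]
Let now $\varphi \in W^*$ be such that $|\varphi| = 1$. Since $X$ is $\theta$-Hölder rough by assumption, there exists $v = v(\varphi)$ with $|v - s| < \varepsilon$ such that
\[
|\varphi(Y'_s X_{s,v})| = \big| \big( (Y'_s)^* \varphi \big)(X_{s,v}) \big| \ge L \, \varepsilon^\theta \, |(Y'_s)^* \varphi|. \tag{6.9}
\]
(Note that one has indeed $(Y'_s)^* : W^* \to V^*$.) Combining (6.8) and (6.9), we thus obtain that
\[
L \, \varepsilon^\theta \, |(Y'_s)^* \varphi| \le \mathrm{osc}(Y, \varepsilon) + \|R^Y\|_{2\alpha} \, \varepsilon^{2\alpha}.
\]
Taking the supremum over all such $\varphi \in W^*$ of unit length,² and using the fact that the norm of a linear operator is equal to the norm of its adjoint, we obtain
\[
L \, \varepsilon^\theta \, |Y'_s| \le \mathrm{osc}(Y, \varepsilon) + \|R^Y\|_{2\alpha} \, \varepsilon^{2\alpha}.
\]
Since $s$ was also arbitrary, the stated bound follows at once. □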

Remark 6.9. Even though the argument presented above is independent of the dimen-
sion of V , we are not aware of any example where L(θ, X) > 0 and dim V = ∞.
The reason why this definition works well only in the finite-dimensional case will be
apparent in the proof of Proposition 6.11 below.

² Note that $|(Y'_s)^*|$ denotes the operator norm, by definition equal to $\sup_{|\varphi| = 1} |(Y'_s)^* \varphi|$.

This leads us to the following quantitative version of our previous Doob–Meyer result for rough paths, Theorem 6.5. As usual, we assume that $\alpha \in (1/3, 1/2)$.

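Theorem 6.10 (Norris lemma for rough paths). Let $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha([0, T], V)$ be such that $X$ is $\theta$-Hölder rough with $\theta < 2\alpha$. Let $(Y, Y') \in \mathscr{D}_X^{2\alpha}([0, T], L(V, W))$ and $Z \in \mathcal{C}^\alpha([0, T], W)$ and set
\[
I_t = \int_0^t Y_s \, d\mathbf{X}_s + \int_0^t Z_s \, ds.
\]
Then there exist constants $r > 0$ and $q > 0$ such that, setting
\[
\mathcal{R} := 1 + L_\theta(X)^{-1} + |||\mathbf{X}|||_\alpha + \|Y, Y'\|_{X;2\alpha} + |Y_0| + |Y'_0| + \|Z\|_\alpha + |Z_0|,
\]
one has the bound
\[
\|Y\|_\infty + \|Z\|_\infty \le M \mathcal{R}^q \, \|I\|_\infty^r,
\]
for a constant $M$ depending only on $\alpha$, $\theta$, and the final time $T$.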

Proof. We leave the details of the proof as an exercise, see [HP13], and only sketch its broad lines.
First, we conclude from Proposition 6.8 that $I$ small in the supremum norm implies that $\|Y\|_\infty$ is also small. Then, we use interpolation to conclude from this that $(Y, Y')$ is small when viewed as an element of $\mathscr{D}_X^{2\bar\alpha}$ for $\bar\alpha < \alpha$, thus implying that $\int Y \, d\mathbf{X}$ is necessarily small. This implies that $\int Z \, ds$ is itself small from which, using again interpolation, we finally conclude that $Z$ itself must be small in the supremum norm. □

6.5 Brownian motion is Hölder rough

We now turn to Hölder-roughness of Brownian motion. Our focus will be on the unit
interval T = 1, and we consider scale up to ε0 = 1/2 for the sake of argument.

Proposition 6.11. Let $B$ be a standard Brownian motion on $[0, 1]$ taking values in $\mathbb{R}^d$. Then, for every $\theta > \tfrac12$, the sample paths of $B$ are almost surely $\theta$-Hölder rough. Moreover, with scale $\varepsilon_0 = 1/2$ and writing $L_\theta(B)$ for the modulus of $\theta$-Hölder roughness, there exist constants $M$ and $c$ such that
\[
\mathbb{P}\big( L_\theta(B) < \varepsilon \big) \le M \exp\big( -c \varepsilon^{-2} \big),
\]
for all $\varepsilon \in (0, 1)$.

The proof of Proposition 6.11 relies on the following variation of the standard small ball estimate for Brownian motion:

Lemma 6.12. Let $B$ be a $d$-dimensional standard Brownian motion. Then, there exist constants $c > 0$ and $C > 0$ such that
\[
\mathbb{P} \Big( \inf_{|\varphi| = 1} \sup_{t \in [0, \delta]} |\langle \varphi, B(t) \rangle| \le \varepsilon \Big) \le C \exp\big( -c \delta \varepsilon^{-2} \big). \tag{6.10}
\]

Proof. The standard small ball estimate for Brownian motion (see for example [LS01]) yields the bound
\[
\sup_{|\varphi| = 1} \mathbb{P} \Big( \sup_{t \in [0, \delta]} |\langle \varphi, B(t) \rangle| \le \varepsilon \Big) \le C \exp\big( -c \delta \varepsilon^{-2} \big). \tag{6.11}
\]
The required estimate then follows from a standard chaining argument, as in [Nor86, p. 127]: cover the sphere $|\varphi| = 1$ with $\varepsilon^{-2(d-1)}$ balls of radius $\varepsilon^2$, say, centred at $\varphi_i$. We then use the fact that, since the supremum of $B$ has Gaussian tails, if $\sup_{t \in [0, \delta]} |\langle \varphi_i, B(t) \rangle| \le \varepsilon$, then the same bound, but with $\varepsilon$ replaced by $2\varepsilon$, holds with probability exponentially close to 1 uniformly over all $\varphi$ in the ball of radius $\varepsilon^2$ centred at $\varphi_i$. Since there are only polynomially many such balls required to cover the whole sphere, (6.10) follows. Note that this chaining argument uses in a crucial way that the number of balls of radius $\varepsilon^2$ required to cover the sphere $|\varphi| = 1$ grows only polynomially with $\varepsilon^{-1}$.
It is clear that bounds of the type (6.10) break down in infinite dimensions: if we consider a cylindrical Wiener process, then (6.11) still holds, but the unit sphere of a Hilbert space cannot be covered by a finite number of small balls anymore. If on the other hand, we consider a process with a non-trivial covariance, then we can get the chaining argument to work, but the bound (6.11) would break down due to the fact that $\langle \varphi, B(t) \rangle$ can then have arbitrarily small variance. □

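Proof (Proposition 6.11). With $T = 1$, $\varepsilon_0 = 1/2$, a different way of formulating Definition 6.7 is given by
\[
L_\theta(X) = \inf \sup_{t : |t - s| \le \varepsilon} \frac{1}{\varepsilon^\theta} |\langle \varphi, X_{s,t} \rangle|,
\]
where the inf is taken over $|\varphi| = 1$, $s \in [0, 1]$ and $\varepsilon \in (0, 1/2]$. We then define the “discrete analog” $D_\theta(X)$ of $L_\theta(X)$ to be given by
\[
D_\theta(X) = \inf \sup_{s,t \in I_{k,n}} 2^{n\theta} |\langle \varphi, X_{s,t} \rangle|,
\]
where $I_{k,n} = \big[ \tfrac{k-1}{2^n}, \tfrac{k}{2^n} \big]$ and the inf is taken over $|\varphi| = 1$, $n \ge 1$ and $k \le 2^n$. We first claim that
\[
L_\theta(X) \ge \frac12 \, \frac{1}{2^\theta} D_\theta(X). \tag{6.12}
\]
To this end, fix a unit vector $\varphi \in V^*$, $s \in [0, 1]$ and $\varepsilon \in (0, 1/2]$. Pick $n \ge 1$ such that $\varepsilon/2 < 2^{-n} \le \varepsilon$. It follows that there exists some $k$ such that $I_{k,n}$ is included in the set $\{t : |t - s| \le \varepsilon\}$. Then, by definition of $D_\theta$, for any unit vector $\varphi$ there exist two points $t_1, t_2 \in I_{k,n}$ such that
\[
|\langle \varphi, X_{t_1,t_2} \rangle| \ge 2^{-n\theta} D_\theta(X).
\]
Therefore, by the triangle inequality, the difference between $\langle \varphi, X_s \rangle$ and one of the two terms $\langle \varphi, X_{t_i} \rangle$, $i = 1, 2$ (say $t_1$) has magnitude at least half of this bound,
\[
|\langle \varphi, X_{s,t_1} \rangle| \ge \frac12 \, 2^{-n\theta} D_\theta(X),
\]
and therefore
\[
\frac{|\langle \varphi, X_{s,t_1} \rangle|}{\varepsilon^\theta} \ge \frac12 \, \frac{2^{-n\theta}}{\varepsilon^\theta} D_\theta(X) \ge \frac12 \, \frac{1}{2^\theta} D_\theta(X).
\]
Since $s$, $\varepsilon$ and $\varphi$ were chosen arbitrarily, the claim (6.12) follows.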
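
For a scalar path known only on a dyadic grid, the quantity $D_\theta$ can be mimicked by the following rough proxy. This is a sketch under clear assumptions: the helper name `discrete_modulus` is made up for the example, the infimum over $n$ is truncated at the grid resolution and the suprema run over grid points only, so the returned number is neither an upper nor a lower bound for the true $D_\theta(X)$. In the scalar case the unit dual vectors are $\pm 1$, so the infimum over $\varphi$ is trivial.

```python
def discrete_modulus(samples, theta):
    """Grid proxy for D_theta(X) for a scalar path on [0, 1].

    `samples` holds X(j / 2**m) for j = 0, ..., 2**m.  For each dyadic
    interval I_{k,n} with n <= m, the supremum of |X_{s,t}| over grid points
    s, t in I_{k,n} equals max - min of the samples in that interval.
    """
    n_points = len(samples) - 1
    m = n_points.bit_length() - 1
    if 2 ** m != n_points:
        raise ValueError("expected 2**m + 1 samples")
    best = float("inf")
    for n in range(1, m + 1):
        width = 2 ** (m - n)                      # grid steps per interval I_{k,n}
        for k in range(1, 2 ** n + 1):
            block = samples[(k - 1) * width : k * width + 1]
            oscillation = max(block) - min(block)
            best = min(best, 2 ** (n * theta) * oscillation)
    return best


# A smooth path is not Hölder rough: the proxy decays like 2^(-m (1 - theta)).
linear = [j / 64 for j in range(65)]
print(discrete_modulus(linear, 0.75))
```

For the linear path the minimum is attained at the finest level and tends to zero as the grid is refined, reflecting that a smooth path has no roughness from below.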
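Applying this to Brownian sample paths, $X = B(\omega)$, it follows that it is sufficient to obtain the requested bound on $\mathbb{P}(D_\theta(B) < \varepsilon)$. We have the straightforward bound
\[
\begin{aligned}
\mathbb{P}(D_\theta(B) < \varepsilon)
&\le \mathbb{P} \Big( \inf_{\|\varphi\| = 1} \inf_{n \ge 1} \inf_{k \le 2^n} \sup_{s,t \in I_{k,n}} \frac{|\langle \varphi, B_{s,t} \rangle|}{2^{-n\theta}} < \varepsilon \Big) \\
&\le \sum_{n=1}^\infty \sum_{k=1}^{2^n} \mathbb{P} \Big( \inf_{\|\varphi\| = 1} \sup_{s,t \in I_{k,n}} |\langle \varphi, B_{s,t} \rangle| < 2^{-n\theta} \varepsilon \Big).
\end{aligned}
\]
Since, trivially, $\sup_{s,t \in I_{k,n}} |\langle \varphi, B_{s,t} \rangle| \ge \sup_{t \in I_{k,n}} |\langle \varphi, B_{r,t} \rangle|$, where $r$ is the left endpoint of the interval $I_{k,n}$, we can bound this by applying Lemma 6.12. Noting that the bound obtained in this way is independent of $k$, we conclude that
\[
\mathbb{P}(D_\theta(B) < \varepsilon) \le M \sum_{n=1}^\infty 2^n \exp\big( -c \, 2^{(2\theta - 1) n} \varepsilon^{-2} \big) \le \tilde M \sum_{n=1}^\infty \exp\big( -\tilde c \, n \, \varepsilon^{-2} \big).
\]
Here, we used the fact that as soon as $\theta > \tfrac12$, we can find constants $K$ and $\tilde c$ such that
\[
n \log 2 - c \, 2^{(2\theta - 1) n} \varepsilon^{-2} \le K - \tilde c \, n \, \varepsilon^{-2},
\]
uniformly over all $\varepsilon < 1$ and all $n \ge 1$. (Consider separately the cases $\varepsilon^2 \in (0, 1/n)$ and $\varepsilon^2 \in [1/n, 1)$.) We deduce from this the bound
\[
\mathbb{P}(D_\theta(B) < \varepsilon) \le \tilde M \Big( e^{-\tilde c \varepsilon^{-2}} + \int_1^\infty \exp\big( -\tilde c \, \varepsilon^{-2} x \big) \, dx \Big),
\]
which immediately implies the result. □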
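
The last step compares a geometric-type series with an integral. Writing $a = \tilde c \, \varepsilon^{-2}$, it uses $\sum_{n \ge 1} e^{-an} \le e^{-a} + \int_1^\infty e^{-ax} \, dx = e^{-a}(1 + 1/a)$. The following sketch checks this numerically for a few values of $a$; the function names and the truncation of the series at finitely many terms are choices made for the illustration.

```python
import math


def series_sum(a, n_terms=10_000):
    """Truncated sum of exp(-a n) over n = 1, ..., n_terms."""
    return sum(math.exp(-a * n) for n in range(1, n_terms + 1))


def integral_bound(a):
    """exp(-a) + int_1^infinity exp(-a x) dx = exp(-a) (1 + 1/a)."""
    return math.exp(-a) * (1.0 + 1.0 / a)


for a in (0.1, 1.0, 10.0, 100.0):
    print(a, series_sum(a), integral_bound(a))
```

Since $a \ge \tilde c$ for $\varepsilon < 1$, the factor $1 + 1/a$ is bounded, so the right hand side is of order $\exp(-\tilde c \, \varepsilon^{-2})$, as required in Proposition 6.11.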

Note that the proof given above is quite robust. In particular, we did not really make use of the fact that $B$ has independent increments. In fact, it transpires that all that is required in order to prove the Hölder roughness of sample paths of a Gaussian process $W$ with stationary increments is a small ball estimate of the type
\[
\mathbb{P} \Big( \sup_{t \in [0, \delta]} |W_t - W_0| \le \varepsilon \Big) \le C \exp\big( -c \, \delta^\alpha \varepsilon^{-\beta} \big),
\]
for some exponents $\alpha, \beta > 0$. These kinds of estimates are available for example for fractional Brownian motion with arbitrary Hurst parameter $H \in (0, 1)$.

6.6 Exercises

Exercise 6.13. Show that the Q-Wiener process (as introduced in Exercise 3.16) is
truly rough.
Exercise 6.14. State precisely and prove: multidimensional fractional Brownian motion $B^H$, $H \in (1/3, 1/2]$, is truly rough. Hint: A law of iterated logarithm for fractional Brownian motion of the form
\[
\mathbb{P} \Big[ \limsup_{h \downarrow 0} \frac{|B^H_{t,t+h}|}{h^H (\ln \ln 1/h)^{1/2}} = \sqrt{2} \Big] = 1
\]
holds, cf. for example [MR06, Thm 7.2.15].

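Exercise 6.15. In (6.7), estimate $\mathrm{osc}(Y, \varepsilon)$ by $2\|Y\|_\infty$ (or alternatively by $\|Y\|_\alpha \varepsilon^\alpha$) and deduce the estimate
\[
\|Y'\|_\infty \le \frac{1}{L} \inf_{\varepsilon \in (0, \varepsilon_0]} \Big( 2 \varepsilon^{-\theta} \|Y\|_\infty + \|R^Y\|_{2\alpha} \, \varepsilon^{2\alpha - \theta} \Big).
\]
Carry out the elementary optimization, e.g. when $\varepsilon_0 = T/2$, to see that
\[
\|Y'\|_\infty \le \frac{4 \|Y\|_\infty}{L(\theta, X)} \Big( \|R^Y\|_{2\alpha}^{\theta/(2\alpha)} \|Y\|_\infty^{-\theta/(2\alpha)} \vee T^{-\theta} \Big).
\]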

Exercise 6.16 (Norris lemma for rough paths; [HP13]). Give a complete proof of
Theorem 6.10.

6.7 Comments

The notion of θ-roughness was first introduced in Hairer–Pillai [HP13], which

also contains Proposition 6.8, although some of the ideas underlying the concepts
presented here were already apparent in Baudoin–Hairer and Hairer–Mattingly
[BH07, HM11]. A version of this “Norris lemma” in the context of SDEs driven by
fractional Brownian motion was proposed independently by Hu–Tindel [HT13]. The
simplified condition of “true” roughness (which may be verified in infinite dimensions), targeted directly at a Doob–Meyer decomposition, is taken from Friz–Shekhar [FS12a]; the quantitative “Norris lemma” is taken from Cass, Litterer, Hairer and Tindel [CHLT12]. These results also hold in “rougher” situations, i.e. when $\alpha \le 1/3$, [FS12a, CHLT12].
Chapter 7
Operations on controlled rough paths

Abstract At first sight, the notation $\int Y \, d\mathbf{X}$ introduced in Chapter 4 is ambiguous since the resulting controlled rough path depends in general on the choices of both the second-order process $\mathbb{X}$ and the derivative process $Y'$. Fortunately, this “lack of completeness” in our notations is mitigated by the fact that in virtually all situations of interest, $Y$ is constructed by using a small number of elementary operations described in this chapter. For all of these operations, it turns out to be intuitively rather clear how the corresponding derivative process is constructed.

7.1 Relation between rough paths and controlled rough paths

Consider $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha([0, T], V)$. It is easy to see that $X$ itself can be interpreted as a path “controlled by $X$”. Indeed, we can identify $X$ with the element $(X, I) \in \mathscr{D}_X^{2\alpha}$, where $I$ is the identity matrix (more precisely: the constant path with value $I$ for all times). Conversely, an element $(Y, Y') \in \mathscr{D}_X^{2\alpha}([0, T], W)$ can itself be interpreted as a rough path again, say $\mathbf{Y} = (Y, \mathbb{Y})$. Indeed, with the interpretation of the integral in the sense of (4.22), spelled out in full below for the reader's convenience, we can set
\[
\mathbb{Y}_{s,t} \stackrel{\mathrm{def}}{=} \int_s^t Y_{s,r} \otimes dY_r = \lim_{|\mathcal{P}| \to 0} \int_{\mathcal{P}} \Xi, \qquad \Xi_{u,v} = Y_{s,u} \otimes Y_{u,v} + (Y'_u \otimes Y'_u) \mathbb{X}_{u,v},
\]
where $Y'_u \otimes Y'_u \in L(V \otimes V, W \otimes W)$ is given by $(Y'_u \otimes Y'_u)(v \otimes \tilde v) = (Y'_u(v)) \otimes (Y'_u(\tilde v))$. The fact that $\|\mathbb{Y}\|_{2\alpha}$ is finite is then a consequence of (4.23). On the other hand, the algebraic relations (2.1) already hold for the “Riemann sum” approximations to the three integrals, provided that the partition used for the approximation of $\mathbb{Y}_{s,t}$ is the union of the one used for the approximation of $\mathbb{Y}_{s,u}$ with the one used for $\mathbb{Y}_{u,t}$.
We summarise the above consideration in saying that for every fixed $\mathbf{X} \in \mathscr{C}^\alpha([0, T], V)$, we have a continuous canonical injection
\[
\mathscr{D}_X^{2\alpha}([0, T], W) \hookrightarrow \mathscr{C}^\alpha([0, T], W).
\]

Furthermore, this interpretation of elements of $\mathscr{D}_X^{2\alpha}$ as elements of $\mathscr{C}^\alpha$ is coherent in terms of the theory of integration constructed in the previous section, as can be seen by the following result:

Proposition 7.1. Let $(X, \mathbb{X}) \in \mathscr{C}^\alpha$, let $(Y, Y') \in \mathscr{D}_X^{2\alpha}$, and let $\mathbf{Y} = (Y, \mathbb{Y}) \in \mathscr{C}^\alpha$ be the associated rough path constructed as above. If $(\tilde Z, \tilde Z') \in \mathscr{D}_Y^{2\alpha}$, then $(Z, Z') \in \mathscr{D}_X^{2\alpha}$, where $Z_t = \tilde Z_t$ and $Z'_t = \tilde Z'_t Y'_t$. Furthermore, one has the identity
\[
\int_0^t Z_s \, dY_s = \int_0^t \tilde Z_s \, d\mathbf{Y}_s. \tag{7.1}
\]
Here, the left hand side uses (4.22) to define the integral of two controlled rough paths against each other and the right hand side uses the original definition (4.19) of the integral of a controlled rough path against its reference path.

Proof. By assumption, one has $Y_{s,t} = Y'_s X_{s,t} + O(|t - s|^{2\alpha})$ and $\tilde Z_{s,t} = \tilde Z'_s Y_{s,t} + O(|t - s|^{2\alpha})$. Combining these identities, it follows immediately that
\[
Z_{s,t} = \tilde Z'_s Y'_s X_{s,t} + O(|t - s|^{2\alpha}) = Z'_s X_{s,t} + O(|t - s|^{2\alpha}),
\]
so that $(Z, Z') \in \mathscr{D}_X^{2\alpha}$ as required. Now the left hand side of (7.1) is given by $\mathcal{I}\Xi_{0,t}$ with $\Xi_{s,t} = Z_s Y_{s,t} + Z'_s Y'_s \mathbb{X}_{s,t}$, whereas the right hand side is given by $\mathcal{I}\tilde\Xi_{0,t}$, where we set $\tilde\Xi_{s,t} = \tilde Z_s Y_{s,t} + \tilde Z'_s \mathbb{Y}_{s,t}$. Since $|\mathbb{Y}_{s,t} - Y'_s Y'_s \mathbb{X}_{s,t}| \le C |t - s|^{3\alpha}$ by (4.20), the claim now follows from Remark 4.12. □

Remark 7.2. It is straightforward to see that if $\tfrac13 < \beta < \alpha$, then $\mathscr{C}^\alpha \hookrightarrow \mathscr{C}^\beta$ and, for every $X \in \mathcal{C}^\alpha$, we have a canonical embedding $\mathscr{D}_X^{2\alpha} \hookrightarrow \mathscr{D}_X^{2\beta}$. Furthermore, in view of the definition (4.10) of $\mathcal{I}$, the values of the integrals defined above do not depend on the interpretation of the integrand and integrator as elements of one or the other space.

7.2 Lifting of regular paths.

There is a canonical embedding $\iota : \mathcal{C}^{2\alpha} \hookrightarrow \mathscr{D}_X^{2\alpha}$ given by $\iota Y = (Y, 0)$, since in this case $R_{s,t} = Y_{s,t}$ does indeed satisfy $\|R\|_{2\alpha} < \infty$. Recall that we are only interested in the case $\alpha \le \tfrac12$. After all, if $Y_{s,t} = O(|t - s|^{2\alpha})$ with $\alpha > \tfrac12$, then $Y$ has a vanishing derivative and must be constant.


7.3 Composition with regular functions.

Let $W$ and $\bar W$ be two Banach spaces and let $\varphi : W \to \bar W$ be a function in $C_b^2$. Let furthermore $(Y, Y') \in \mathscr{D}_X^{2\alpha}([0, T], W)$ for some $X \in \mathcal{C}^\alpha$. (In applications $X$ will be part of some $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha$ but this is irrelevant here.) Then one can define a (candidate) controlled rough path $(\varphi(Y), \varphi(Y)') \in \mathscr{D}_X^{2\alpha}([0, T], \bar W)$ by
\[
\varphi(Y)_t = \varphi(Y_t), \qquad \varphi(Y)'_t = D\varphi(Y_t) Y'_t. \tag{7.2}
\]
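
For the simplest instance of (7.2), one can check numerically that the remainder of $\varphi(Y)$ is of second order in the increments of $X$. The sketch below makes several assumptions for the sake of illustration: a scalar random-walk sample stands in for $X$, the controlled path is $(Y, Y') = (X, 1)$, and $\varphi = \sin$, so that Taylor's theorem gives $|R^\varphi_{s,t}| \le \tfrac12 \|D^2\varphi\|_\infty |X_{s,t}|^2$ with $\|D^2\varphi\|_\infty = 1$.

```python
import math
import random


def sample_path(n_steps=256, seed=1):
    """Scaled random walk on [0, 1], used as a stand-in for a rough scalar path X."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def remainder(path, i, j):
    """R_{s,t} = phi(Y_t) - phi(Y_s) - Dphi(Y_s) Y'_s X_{s,t} for Y = X, Y' = 1, phi = sin."""
    increment = path[j] - path[i]
    return math.sin(path[j]) - math.sin(path[i]) - math.cos(path[i]) * increment


path = sample_path()
# Largest violation of |R_{s,t}| <= |X_{s,t}|^2 / 2 over all grid pairs s < t.
worst = max(
    abs(remainder(path, i, j)) - 0.5 * (path[j] - path[i]) ** 2
    for i in range(len(path))
    for j in range(i + 1, len(path))
)
print(worst)
```

Up to rounding, no pair of grid points violates the bound, so $|R^\varphi_{s,t}| \lesssim |X_{s,t}|^2 \lesssim |t - s|^{2\alpha}$ whenever $X$ is $\alpha$-Hölder.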

It is straightforward to check that the corresponding remainder term does indeed satisfy the required bound. It is also straightforward to check that, as a consequence of the chain rule, this definition is consistent in the sense that $(\varphi \circ \psi)(Y, Y') = \varphi(\psi(Y, Y'))$. We have the following bound.

Lemma 7.3. Let $\varphi \in C_b^2$, $(Y, Y') \in \mathscr{D}_X^{2\alpha}([0, T], W)$ for some $X \in \mathcal{C}^\alpha$ with $|Y'_0| + \|Y, Y'\|_{X,2\alpha} \le M \in [1, \infty)$. Let $(\varphi(Y), \varphi(Y)') \in \mathscr{D}_X^{2\alpha}([0, T], \bar W)$ be given by (7.2). Then, there exists a constant $C$ depending only on $T > 0$ and $\alpha > \tfrac13$ such that one has the bound
\[
\|\varphi(Y), \varphi(Y)'\|_{X,2\alpha} \le C_{\alpha,T} \, M \, \|\varphi\|_{C_b^2} (1 + \|X\|_\alpha)^2 \big( |Y'_0| + \|Y, Y'\|_{X,2\alpha} \big).
\]
Finally, $C$ can be chosen uniformly over $T \in (0, 1]$.

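Proof. We have $(\varphi(Y), \varphi(Y)') = (\varphi(Y_\cdot), D\varphi(Y_\cdot) Y'_\cdot) \in \mathscr{D}_X^{2\alpha}$. Indeed,
\[
\begin{aligned}
\|\varphi(Y_\cdot)\|_\alpha &\le \|D\varphi\|_\infty \|Y_\cdot\|_\alpha, \\
\|\varphi(Y)'\|_\alpha &\le \|D\varphi(Y_\cdot)\|_\infty \|Y'_\cdot\|_\alpha + \|Y'_\cdot\|_\infty \|D\varphi(Y_\cdot)\|_\alpha \\
&\le \|D\varphi(Y_\cdot)\|_\infty \|Y'_\cdot\|_\alpha + \|Y'_\cdot\|_\infty \|D^2\varphi(Y_\cdot)\|_\infty \|Y_\cdot\|_\alpha,
\end{aligned}
\]
which shows that $\varphi(Y), \varphi(Y)' \in \mathcal{C}^\alpha$. Furthermore, $R^\varphi \equiv R^{\varphi(Y)}$ is given by
\[
\begin{aligned}
R^\varphi_{s,t} &= \varphi(Y_t) - \varphi(Y_s) - D\varphi(Y_s) Y'_s X_{s,t} \\
&= \varphi(Y_t) - \varphi(Y_s) - D\varphi(Y_s) Y_{s,t} + D\varphi(Y_s) R^Y_{s,t},
\end{aligned}
\]
so that
\[
\|R^\varphi\|_{2\alpha} \le \frac12 |D^2\varphi|_\infty \|Y\|_\alpha^2 + |D\varphi|_\infty \|R^Y\|_{2\alpha}.
\]
It follows that
\[
\begin{aligned}
\|\varphi(Y), \varphi(Y)'\|_{X,2\alpha}
&\le \|D\varphi(Y_\cdot)\|_\infty \|Y'_\cdot\|_\alpha + \|Y'_\cdot\|_\infty \|D^2\varphi(Y_\cdot)\|_\infty \|Y_\cdot\|_\alpha \\
&\quad + \frac12 |D^2\varphi|_\infty \|Y\|_\alpha^2 + |D\varphi|_\infty \|R^Y\|_{2\alpha} \\
&\le \|\varphi\|_{C_b^2} \big( \|Y'_\cdot\|_\alpha + \|Y'_\cdot\|_\infty \|Y_\cdot\|_\alpha + \|Y\|_\alpha^2 + \|R^Y\|_{2\alpha} \big) \\
&\le C_{\alpha,T} \|\varphi\|_{C_b^2} (1 + \|X\|_\alpha)^2 \big( 1 + |Y'_0| + \|Y, Y'\|_{X,2\alpha} \big) \big( |Y'_0| + \|Y, Y'\|_{X,2\alpha} \big),
\end{aligned}
\]
where we used in particular (4.18). □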

It follows immediately that one has the following “Leibniz rule”, the proof of
which is left to the reader:

Corollary 7.4. Let $(Y, Y')$ and $(Z, Z')$ be two controlled paths in $\mathscr{D}_X^{2\alpha}$ for some $X \in \mathcal{C}^\alpha$. Then the path $U = YZ$, with Gubinelli derivative $U' = YZ' + ZY'$, also belongs to $\mathscr{D}_X^{2\alpha}$.
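
A direct way to see this, sketched here under the simplifying assumption that $Y$ and $Z$ are scalar-valued (the general case needs a bounded bilinear product in place of multiplication), is to expand the increment of $U$ and read off its remainder:

```latex
\begin{align*}
U_{s,t} &= Y_t Z_t - Y_s Z_s
         = Y_s Z_{s,t} + Z_s Y_{s,t} + Y_{s,t} Z_{s,t}, \\
R^U_{s,t} &:= U_{s,t} - U'_s X_{s,t}
         = Y_s \bigl( Z_{s,t} - Z'_s X_{s,t} \bigr) + Z_s \bigl( Y_{s,t} - Y'_s X_{s,t} \bigr) + Y_{s,t} Z_{s,t} \\
        &= Y_s R^Z_{s,t} + Z_s R^Y_{s,t} + Y_{s,t} Z_{s,t}, \\
|R^U_{s,t}| &\le \bigl( \|Y\|_\infty \|R^Z\|_{2\alpha} + \|Z\|_\infty \|R^Y\|_{2\alpha}
         + \|Y\|_\alpha \|Z\|_\alpha \bigr) |t - s|^{2\alpha}.
\end{align*}
```

Together with the fact that $U' = YZ' + ZY'$ is $\alpha$-Hölder as a sum of products of $\alpha$-Hölder paths, this gives $(U, U') \in \mathscr{D}_X^{2\alpha}$.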

7.4 Stability II: Regular functions of controlled rough paths

We now investigate the continuity properties of the controlled rough path constructed
in Lemma 7.3. In doing so, we shall use notation previously introduced in Section 4.4.

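Theorem 7.5 (Stability of composition). Let $\mathbf{X} = (X, \mathbb{X})$, $\tilde{\mathbf{X}} = (\tilde X, \tilde{\mathbb{X}}) \in \mathscr{C}^\alpha$, $(Y, Y') \in \mathscr{D}_X^{2\alpha}$, $(\tilde Y, \tilde Y') \in \mathscr{D}_{\tilde X}^{2\alpha}$. For $\varphi \in C_b^3$ define
\[
(Z, Z') := (\varphi(Y), D\varphi(Y) Y') \in \mathscr{D}_X^{2\alpha} \tag{7.3}
\]
and $(\tilde Z, \tilde Z')$ similarly. Then, one has the local Lipschitz estimates
\[
d_{X,\tilde X,2\alpha}\big( Z, Z'; \tilde Z, \tilde Z' \big) \le C_M \Big( \varrho_\alpha(\mathbf{X}, \tilde{\mathbf{X}}) + |Y_0 - \tilde Y_0| + |Y'_0 - \tilde Y'_0| + d_{X,\tilde X,2\alpha}\big( Y, Y'; \tilde Y, \tilde Y' \big) \Big), \tag{7.4}
\]
as well as
\[
\|Z - \tilde Z\|_\alpha \le C_M \Big( \varrho_\alpha(\mathbf{X}, \tilde{\mathbf{X}}) + |Y_0 - \tilde Y_0| + |Y'_0 - \tilde Y'_0| + d_{X,\tilde X,2\alpha}\big( Y, Y'; \tilde Y, \tilde Y' \big) \Big), \tag{7.5}
\]
for a suitable constant $C_M = C(M, T, \alpha, \varphi)$.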

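Proof. (The reader is urged to revisit Lemma 7.3 where the composition (7.3) was seen to be well-defined for $\varphi \in C_b^2$.) Similarly to the previous proof, noting that
\[
|Z'_0 - \tilde Z'_0| = \big| D\varphi(Y_0) Y'_0 - D\varphi(\tilde Y_0) \tilde Y'_0 \big| \le C_M \big( |Y_0 - \tilde Y_0| + |Y'_0 - \tilde Y'_0| \big),
\]
it suffices to establish the first estimate, for (7.5) is an immediate consequence of (7.4) and (4.27). In order to establish the first estimate we need to bound
\[
\big\| D\varphi(Y) Y' - D\varphi(\tilde Y) \tilde Y' \big\|_\alpha + \big\| R^Z - R^{\tilde Z} \big\|_{2\alpha}.
\]
Write $C_M (\varepsilon_X + \varepsilon_0 + \varepsilon'_0 + \varepsilon)$ for the right hand side of (7.4). Note that with this notation, from (4.27),
\[
\|Y - \tilde Y\|_\alpha \lesssim \varepsilon_X + \varepsilon'_0 + \varepsilon =: \varepsilon_Y,
\]
and also $\|Y - \tilde Y\|_{\infty;[0,T]} \lesssim \varepsilon_0 + \varepsilon_Y$ (uniformly over $T \le 1$). Since $D\varphi \in C_b^2$, we know from Lemma 8.2 that
\[
\big\| D\varphi(\tilde Y) - D\varphi(Y) \big\|_{\mathcal{C}^\alpha} = \big| D\varphi(\tilde Y_0) - D\varphi(Y_0) \big| + \big\| D\varphi(\tilde Y) - D\varphi(Y) \big\|_\alpha \le C (\varepsilon_0 + \varepsilon_Y),
\]
where $C$ depends on the $C_b^3$-norm of $\varphi$. Also, $\|Y' - \tilde Y'\|_{\mathcal{C}^\alpha} \le \varepsilon'_0 + \varepsilon$. Clearly then ($\mathcal{C}^\alpha$ is a Banach algebra under pointwise multiplication), we have, for a constant $C_M$,
\[
\big\| D\varphi(Y) Y' - D\varphi(\tilde Y) \tilde Y' \big\|_\alpha \le C_M (\varepsilon_0 + \varepsilon_Y + \varepsilon'_0 + \varepsilon) \lesssim C_M (\varepsilon_X + \varepsilon_0 + \varepsilon'_0 + \varepsilon).
\]
To deal with $R^Z - R^{\tilde Z}$, write
\[
\begin{aligned}
R^Z_{s,t} &= \varphi(Y_t) - \varphi(Y_s) - D\varphi(Y_s) Y'_s X_{s,t} \\
&= \varphi(Y_t) - \varphi(Y_s) - D\varphi(Y_s) Y_{s,t} + D\varphi(Y_s) R^Y_{s,t}.
\end{aligned}
\]
Taking the difference with $R^{\tilde Z}$ (replace $Y, Y', R^Y$ above by $\tilde Y, \tilde Y', R^{\tilde Y}$) leads to the bound $|R^Z_{s,t} - R^{\tilde Z}_{s,t}| \le T_1 + T_2$ where
\[
\begin{aligned}
T_1 &:= \Big| \varphi(Y_t) - \varphi(Y_s) - D\varphi(Y_s) Y_{s,t} - \big( \varphi(\tilde Y_t) - \varphi(\tilde Y_s) - D\varphi(\tilde Y_s) \tilde Y_{s,t} \big) \Big| \\
&= \Big| \int_0^1 \Big( D^2\varphi(Y_s + \theta Y_{s,t})(Y_{s,t}, Y_{s,t}) - D^2\varphi(\tilde Y_s + \theta \tilde Y_{s,t})(\tilde Y_{s,t}, \tilde Y_{s,t}) \Big) (1 - \theta) \, d\theta \Big|, \\
T_2 &:= \Big| D\varphi(Y_s) R^Y_{s,t} - D\varphi(\tilde Y_s) R^{\tilde Y}_{s,t} \Big|.
\end{aligned}
\]
As for the second term, we know $|R^Y_{s,t} - R^{\tilde Y}_{s,t}| \le (\varepsilon'_0 + \varepsilon) |t - s|^{2\alpha}$, for all $s, t$, while
\[
\big| D\varphi(\tilde Y_s) - D\varphi(Y_s) \big| \le |D^2\varphi|_\infty \, |\tilde Y_s - Y_s| \le |D^2\varphi|_\infty (\varepsilon_0 + \varepsilon_Y).
\]
By elementary estimates of the form $|ab - \tilde a \tilde b| \le |a| \, |b - \tilde b| + |a - \tilde a| \, |\tilde b|$ it then follows immediately that one has $T_2 \le C (\varepsilon_X + \varepsilon_0 + \varepsilon'_0 + \varepsilon) |t - s|^{2\alpha}$.
One argues similarly for the first term. This time, we consider the expression under the above integral $\int (\ldots)(1 - \theta) \, d\theta$ for fixed integration variable $\theta \in [0, 1]$. Using the bound on $\|Y - \tilde Y\|_\infty$ obtained above, we obtain
\[
\big| D^2\varphi(\tilde Y_s + \theta \tilde Y_{s,t}) - D^2\varphi(Y_s + \theta Y_{s,t}) \big| \le |D^3\varphi|_\infty \big( |\tilde Y_s - Y_s| + |\tilde Y_{s,t} - Y_{s,t}| \big) \le 3 |D^3\varphi|_\infty \|\tilde Y - Y\|_\infty \lesssim \varepsilon_0 + \varepsilon_Y,
\]
noting that this estimate is uniform in $s, t \in [0, T]$ and $\theta \in [0, 1]$. It then suffices to insert/subtract $D^2\varphi(Y_s + \theta Y_{s,t})(\tilde Y_{s,t}, \tilde Y_{s,t})$ under the integral $\int \ldots (1 - \theta) \, d\theta$ appearing in the definition of $T_1$ and conclude with the triangle inequality and some simple estimates, keeping in mind that $\|Y - \tilde Y\|_\alpha \le \varepsilon_Y$ and $\|Y\|_\alpha, \|\tilde Y\|_\alpha \lesssim C_M$. □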

7.5 Itô’s formula revisited

Let $F : V \to W$ in $C_b^3$, $\mathbf{X} = (X, \mathbb{X}) \in \mathscr{C}^\alpha$ and $(Y, Y') \in \mathscr{D}_X^{2\alpha}$ a controlled rough path of the form
\[
Y_t = \int_0^t Y'_s \, d\mathbf{X}_s + \Gamma_t, \tag{7.6}
\]
for some controlled rough path $(Y', Y'') \in \mathscr{D}_X^{2\alpha}$ and some path $\Gamma \in \mathcal{C}^{2\alpha}$. This is the case for rough integrals of 1-forms, cf. Section 4.2, and also if $Y$ is the solution to a rough differential equation driven by $\mathbf{X}$, to be discussed in Section 8.1.
In this situation, in analogy with the “usual” Itô formula, we would expect that
\[
F(Y_t) = F(Y_0) + \int_0^t DF(Y_s) Y'_s \, d\mathbf{X}_s + \int_0^t DF(Y_s) \, d\Gamma_s + \frac12 \int_0^t D^2F(Y_s)(Y'_s, Y'_s) \, d[\mathbf{X}]_s, \tag{7.7}
\]
which is meaningful if we interpret the last two integrals as Young integrals. To show that this is indeed the case, note first that, as a consequence of (7.6) and Theorem 4.10, the increments of $Y$ are of the form
\[
Y_{s,t} = Y'_s X_{s,t} + Y''_s \mathbb{X}_{s,t} + \Gamma_{s,t} + o(|t - s|). \tag{7.8}
\]
Furthermore, by Lemma 7.3 and Corollary 7.4, the path $G := DF(Y) Y'$ is controlled by $X$, with $G' = D^2F(Y)(Y', Y') + DF(Y) Y''$, so that the integral
\[
\int_0^t DF(Y_s) Y'_s \, d\mathbf{X}_s = \int_0^t G_s \, d\mathbf{X}_s = \lim_{|\mathcal{P}| \to 0} \sum_{[u,v] \in \mathcal{P}} \big( G_u X_{u,v} + G'_u \mathbb{X}_{u,v} \big), \tag{7.9}
\]
which is the first integral in (7.7), makes sense as a rough integral. Note that, if $\mathbf{X} = \mathbf{B}$, Itô enhanced Brownian motion, and $Y, Y', Y''$ are all adapted, then so is $G$ and the integral is identified, by Proposition 5.1, as a classical Itô integral.

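Proposition 7.6. Under the assumption (7.8), the Itô formula (7.7) holds true.

Proof. By the (previous) Itô formula, we know that $F(Y_t) - F(Y_0)$ equals
\[
\lim_{|\mathcal{D}| \to 0} \sum_{[u,v] \in \mathcal{D}} \big( DF(Y_u) Y_{u,v} + D^2F(Y_u) \mathbb{Y}_{u,v} \big) + \frac12 \lim_{|\mathcal{D}| \to 0} \sum_{[u,v] \in \mathcal{D}} D^2F(Y_u) [\mathbf{Y}]_{u,v}, \tag{7.10}
\]
where $\mathbb{Y}_{u,v} = \int_u^v Y_{u,\cdot} \otimes dY$ in the sense of Remark 4.11, noting that $\mathbb{Y}_{u,v} = Y'_u Y'_u \mathbb{X}_{u,v} + o(|v - u|)$. Also,
\[
\begin{aligned}
[\mathbf{Y}]_{u,v} &= Y_{u,v} \otimes Y_{u,v} - 2 \, \mathrm{Sym}(\mathbb{Y}_{u,v}) \\
&= Y'_u Y'_u \big( X_{u,v} \otimes X_{u,v} - 2 \, \mathrm{Sym}(\mathbb{X}_{u,v}) \big) + o(|v - u|) \\
&= Y'_u Y'_u [\mathbf{X}]_{u,v} + o(|v - u|).
\end{aligned}
\]
Let us also subtract/add $DF(Y_u) Y''_u \mathbb{X}_{u,v}$ from (7.10). Then $F(Y_t) - F(Y_0)$ equals
\[
\begin{aligned}
&\lim_{|\mathcal{D}| \to 0} \sum_{[u,v] \in \mathcal{D}} \Big( DF(Y_u) \big( Y_{u,v} - Y''_u \mathbb{X}_{u,v} \big) + DF(Y_u) Y''_u \mathbb{X}_{u,v} + D^2F(Y_u) Y'_u Y'_u \mathbb{X}_{u,v} \Big) \\
&\qquad + \frac12 \lim_{|\mathcal{D}| \to 0} \sum_{[u,v] \in \mathcal{D}} D^2F(Y_u) Y'_u Y'_u [\mathbf{X}]_{u,v} \\
&= \lim_{|\mathcal{D}| \to 0} \sum_{[u,v] \in \mathcal{D}} \Big( DF(Y_u) Y'_u X_{u,v} + \big( DF(Y_u) Y''_u + D^2F(Y_u) Y'_u Y'_u \big) \mathbb{X}_{u,v} \Big) \\
&\qquad + \lim_{|\mathcal{D}| \to 0} \sum_{[u,v] \in \mathcal{D}} DF(Y_u) \Gamma_{u,v} + \frac12 \int_0^t D^2F(Y_u) Y'_u Y'_u \, d[\mathbf{X}]_u,
\end{aligned}
\]
where the last equality uses (7.8). In view of (7.9), also noting the appearance of two Young integrals in the last line, the proof is complete. □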

7.6 Controlled rough paths of low regularity

Let us conclude this section by showing how these canonical operations can be
lifted to the case of controlled rough paths of low regularity, i.e. when α < 13 .
Recall from Section 4.5 that in this case we view a controlled rough path Y as a
T (p−1) (Rd )-valued function, which is controlled by increments of X in the sense of
Definition 4.17.
This suggests that, in order to define the product of two controlled rough paths
Y and Ȳ , we should first ask ourselves how a product of the type Xw w̄
s,t Xs,t for two
different words w a w̄ can be rewritten as a linear combination of the increments of
X. It was realised by Chen [Che54] that such a product is described by the shuffle

product. Recall that, for any alphabet A, the shuffle product is defined on the free
algebra over A by considering all possible ways of interleaving two words in ways
that preserve the original order of the letters. For example, if a, b and c are letters in
A, one has the identity

\[ ab \shuffle ac = abac + 2\,aabc + 2\,aacb + acab . \]
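The interleaving rule just described can be checked mechanically. The following is a minimal sketch, not taken from the text: the function name `shuffle` and the representation of a linear combination of words as a `Counter` mapping words to integer multiplicities are assumptions made for illustration only.

```python
from collections import Counter


def shuffle(u: str, v: str) -> Counter:
    """Shuffle product of the words u and v, as {word: multiplicity}.

    Every interleaving starts either with the first letter of u or with
    the first letter of v; the rest is a shuffle of the remaining letters.
    """
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    result = Counter()
    for word, mult in shuffle(u[1:], v).items():
        result[u[0] + word] += mult
    for word, mult in shuffle(u, v[1:]).items():
        result[v[0] + word] += mult
    return result


if __name__ == "__main__":
    # Reproduces the identity for ab and ac stated in the text.
    print(dict(shuffle("ab", "ac")))
```

The total multiplicity of the shuffle of words of lengths m and n is the binomial coefficient "m+n choose m", which gives a quick sanity check on any implementation.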

In our case, the choice of basis described earlier then defines a natural algebra
homomorphism $w \mapsto e_w$ from the free algebra over $\{1,\dots,d\}$ into $T^{(p)}(\mathbb{R}^d)$, and
we denote by $\star$ the corresponding product. In other words, we have the identity
\[ e_w \star e_{\bar w} = e_{w \shuffle \bar w} , \]
where, if w is a linear combination of words, $e_w$ is the corresponding linear combination
of basis vectors.
With this definition at hand, it turns out that any geometric rough path $\mathbf{X}$ satisfies
the identity
\[ \mathbf{X}^{w}_{s,t}\,\mathbf{X}^{\bar w}_{s,t} = \mathbf{X}^{w \shuffle \bar w}_{s,t} . \]
This strongly suggests that the “correct” way of multiplying two controlled rough
paths Y and Ȳ is to define their product Z by
\[ Z_t = Y_t \star \bar Y_t . \]
It is possible to check that Z is indeed again a controlled rough path. Similarly, if F
is a smooth function and Y is a controlled rough path, we define F(Y) by
\[ F(Y)_t \stackrel{\mathrm{def}}{=} F(Y_t^{\phi}) + \sum_{k=1}^{p-1} \frac{1}{k!}\, F^{(k)}(Y_t^{\phi})\, \tilde Y_t^{\star k} , \]
where $F^{(k)}$ denotes the kth derivative of F and $\tilde Y_t \stackrel{\mathrm{def}}{=} Y_t - Y_t^{\phi}$ is the part describing
the “local fluctuations” of Y.

It is again possible to show that F(Y) is a controlled rough path if Y is a controlled
rough path and F is sufficiently smooth. (It should be of class $C_b^p$.) See [Hai14c] for
an extremely general setting in which a similar calculus is still useful.

7.7 Exercises
Exercise 7.7. Verify that $\mathbb{X}_{s,t} = \int_s^t X_{s,r}\otimes dX_r$ where the integral is to be interpreted
in the sense of (4.22), taking (Y, Y′) to be (X, I). In fact, check that this holds not
only in the limit |P| → 0 but in fact for every fixed partition P, i.e. $\mathbb{X}_{s,t} = \int_{\mathcal{P}} \Xi$. Compare
this with formula (2.12), obtained in Exercise 2.7.

Exercise 7.8. Let $\varphi : W \times [0,T] \to \bar W$ be a function which is uniformly $C^2$ in its
first argument (i.e. $\varphi$ is bounded and both $D_y\varphi$ and $D_y^2\varphi$ are bounded, where $D_y$
denotes the Fréchet derivative with respect to the first argument) and uniformly $C^{2\alpha}$
in its second argument. Let furthermore $(Y,Y') \in \mathscr{D}_X^{2\alpha}([0,T],W)$. Show that
\[ \varphi(Y)_t = \varphi(Y_t,t) , \qquad \varphi(Y)'_t = D_y\varphi(Y_t,t)\,Y'_t \]
defines an element $(\varphi(Y),\varphi(Y)') \in \mathscr{D}_X^{2\alpha}([0,T],\bar W)$. In fact, show that there exists
a constant C, depending only on T, such that one has the bound
\[ \|\varphi(Y)\|_{X,2\alpha} \le C \big( \|D_y^2\varphi\|_\infty + \|\varphi\|_\infty + \|\varphi\|_{2\alpha;t} \big) \big(1 + \|X\|_\alpha\big)^2 \big(1 + \|Y\|_{X,2\alpha}\big)^2 , \]
where we denote by $\|\varphi\|_{2\alpha;t}$ the supremum over y of the 2α-Hölder norm of $\varphi(y,\cdot)$.

Exercise 7.9. Convince yourself that in the case p = 2, the definitions given in
Section 7.6 coincide with the definitions given earlier in this section.
Chapter 8
Solutions to rough differential equations

Abstract We show how to solve differential equations driven by rough paths by a

simple Picard iteration argument. This yields a pathwise solution theory mimicking
the standard solution theory for ordinary differential equations. We start with the
simple case of differential equations driven by a signal that is sufficiently regular for
Young’s theory of integration to apply and then proceed to the case of more general
rough signals.

8.1 Introduction

We now turn our attention to (rough) differential equations of the form

dYt = f (Yt ) dXt , Y0 = ξ ∈ W . (8.1)

Here, X : [0, T ] → V is the driving or input signal, while Y : [0, T ] → W is the

output signal. As usual V and W are Banach spaces, and f : W → L(V, W ). When
dim V = d < ∞, one may think of f as a collection of vector fields (f1 , . . . , fd ) on
W. As usual, the reader is welcome to think $V = \mathbb{R}^d$ and $W = \mathbb{R}^n$ but there is really
no difference in the argument. Such equations are familiar from the theory of ODEs,
and more specifically, control theory, where X is typically assumed to be absolutely
continuous so that dXt = Ẋt dt. The case of SDEs, stochastic differential equations,
with dX interpreted as Itô or Stratonovich differential of Brownian motion, is also
well known. Both cases will be seen as special examples of RDEs, rough differential
equations.
We may consider (8.1) on the unit time interval. Indeed, equation (8.1) is invariant
under time-reparametrization so that any (finite) time horizon may be rescaled
to [0, 1]. Alternatively, global solutions on a larger time horizon are constructed
successively, i.e. by concatenating Y |[0,1] (started at Y0 ) with Y |[1,2] (started at
Y1 ) and so on. As a matter of fact, we shall construct solutions by a variation of
the classical Picard iteration on intervals [0, T ], where T ∈ (0, 1] will be chosen


sufficiently small to guarantee invariance of suitable balls and the contraction property.
Our key ingredients are estimates for rough integrals (cf. Theorem 4.10) and the
composition of controlled paths with smooth maps (Lemma 7.3). Recall that, for
rather trivial reasons (of the sort $|t-s|^{2\alpha} \le |t-s|^{\alpha}$, when 0 ≤ s ≤ t ≤ T ≤ 1), all
constants in these estimates were seen to be uniform in T ∈ (0, 1].

8.2 Review of the Young case: a priori estimates

Let us postulate that there exists a solution to a differential equation in Young’s sense
and let us derive an a-priori estimate. (In finite dimension, this can actually be used
to prove the existence of solutions. Note that the regularity requirement here is “one
degree less” than what is needed for the corresponding uniqueness result.)

Proposition 8.1. Assume $X, Y \in \mathcal{C}^\beta([0,1],V)$ for some $\beta \in (1/2,1]$ such that,
given $\xi \in W$, $f \in C_b^1(W,L(V,W))$, we have
\[ dY_t = f(Y_t)\,dX_t , \qquad Y_0 = \xi , \]
in the sense of a Young integral equation. Then
\[ \|Y\|_\beta \le C \Big( \big(\|f\|_{C_b^1}\|X\|_\beta\big) \vee \big(\|f\|_{C_b^1}\|X\|_\beta\big)^{1/\beta} \Big) . \]

Proof. By assumption, for 0 ≤ s < t ≤ 1, $Y_{s,t} = \int_s^t f(Y_r)\,dX_r$. Using Young’s
inequality (4.3), with C = C(β),
\[ |Y_{s,t} - f(Y_s)X_{s,t}| = \Big| \int_s^t \big(f(Y_r) - f(Y_s)\big)\,dX_r \Big| \le C \|Df\|_\infty \|Y\|_{\beta;[s,t]} \|X\|_{\beta;[s,t]} |t-s|^{2\beta} \]
so that
\[ |Y_{s,t}|/|t-s|^{\beta} \le \|f\|_\infty \|X\|_\beta + C \|Df\|_\infty \|Y\|_{\beta;[s,t]} \|X\|_{\beta;[s,t]} |t-s|^{\beta} . \]
Write $\|Y\|_{\beta;h} \equiv \sup |Y_{s,t}|/|t-s|^{\beta}$ where the sup is restricted to times s, t ∈ [0, 1]
for which t − s ≤ h. Clearly then,
\[ \|Y\|_{\beta;h} \le \|f\|_\infty \|X\|_\beta + C \|Df\|_\infty \|Y\|_{\beta;h} \|X\|_\beta h^{\beta} \]
and upon taking h small enough, s.t. $\delta h^\beta \ll 1$, with $\delta = \|X\|_\beta$, more precisely s.t.
\[ C \|Df\|_\infty \|X\|_\beta h^\beta \le C \big(1 + \|f\|_{C_b^1}\big) \|X\|_\beta h^\beta \le 1/2 \]
(we will take h such that the second ≤ becomes an equality; adding 1 avoids trouble
when f ≡ 0)
\[ \tfrac12 \|Y\|_{\beta;h} \le \|f\|_\infty \|X\|_\beta . \]
It then follows from Exercise 4.24 that, with $h \propto \|X\|_\beta^{-1/\beta}$,
\[ \|Y\|_\beta \le \|Y\|_{\beta;h} \big(1 \vee h^{-(1-\beta)}\big) \le C \|X\|_\beta \big(1 \vee h^{-(1-\beta)}\big) = C \big( \|X\|_\beta \vee \|X\|_\beta^{1/\beta} \big) . \]
Here, we have absorbed the dependence on $f \in C_b^1$ into the constants. By scaling
(any non-zero f may be normalised to $\|f\|_{C_b^1} = 1$ at the price of replacing X by
$\|f\|_{C_b^1} \times X$) we then get immediately the claimed estimate. □

8.3 Review of the Young case: Picard iteration

The reader may be helped by first reviewing the classical Picard argument in a
Young setting, i.e. when $\beta \in (1/2,1]$. Given $\xi \in W$, $f \in C_b^2(W,L(V,W))$,
$X \in \mathcal{C}^\beta([0,1],V)$ and $Y : [0,T] \to W$ of suitable Hölder regularity, $T \in (0,1]$, one
defines the map $\mathcal{M}_T$ by
\[ \mathcal{M}_T(Y) := \Big( \xi + \int_0^t f(Y_s)\,dX_s : t \in [0,T] \Big) . \]

Following a classical pattern of proof, we shall establish invariance of suitable balls,

and then a contraction property upon taking T = T0 small enough. The resulting
unique fixed point is then obviously the unique solution to (8.1) on [0, T0 ]. The
unique solution on [0, 1] is then constructed successively, i.e. by concatenating the
solution Y on [0, T0 ], started at Y0 = ξ, with the solution Y on [T0 , 2T0 ] started
at YT0 and so on. Care is necessary to ensure that T0 can be chosen uniformly; for
instance, if f were only $C^2$ (without the boundedness assumption) one can still obtain
local existence on $[0,T_1]$, and then $[T_1,T_2]$, etc, but explosion may happen at some
finite time $\lim_n T_n$ within our time horizon [0, 1]. The situation here is completely
analogous to what we are familiar with from the usual theory of nonlinear ODEs.
We will need the Hölder norm of X over [0, T] to tend to zero as T ↓ 0. Now,
as the example of the path $t \mapsto t$ with β = 1 shows, this may not be true relative to the
β-Hölder norm; the (cheap) trick is to take $\alpha \in (1/2,\beta)$ and to view $\mathcal{M}_T$ as a map
from the Banach space $\mathcal{C}^\alpha([0,T],W)$, rather than $\mathcal{C}^\beta([0,T],W)$, into itself. Young’s
inequality is still applicable since all paths involved will be (at least) α-Hölder
continuous with α > 1/2. On the other hand,
\[ \|X\|_{\alpha;[0,T]} \le T^{\beta-\alpha} \|X\|_{\beta;[0,T]} , \]
and so the α-Hölder norm of X has the desired behaviour. As previously, when no
confusion is possible, we write $\|\cdot\|_\alpha \equiv \|\cdot\|_{\alpha;[0,T]}$.
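The comparison between the α- and β-Hölder norms used here follows directly from the definition of the Hölder semi-norm. Spelled out as a short worked step (nothing beyond $0 \le s < t \le T$ and $\alpha < \beta$ is assumed):

```latex
\[
\frac{|X_{s,t}|}{|t-s|^{\alpha}}
  = \frac{|X_{s,t}|}{|t-s|^{\beta}}\,|t-s|^{\beta-\alpha}
  \le \|X\|_{\beta;[0,T]}\,T^{\beta-\alpha},
  \qquad 0 \le s < t \le T ,
\]
% taking the supremum over s < t on the left gives
% \|X\|_{\alpha;[0,T]} \le T^{\beta-\alpha}\,\|X\|_{\beta;[0,T]} .
```

For the path $t \mapsto t$ with β = 1 this gives $\|X\|_{\alpha;[0,T]} \le T^{1-\alpha}$, which tends to zero as T ↓ 0, whereas the 1-Hölder norm stays equal to 1.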
To avoid norm versus semi-norm considerations, it is convenient to work on
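the space of paths started at ξ, namely $\{Y \in \mathcal{C}^\alpha([0,T],W) : Y_0 = \xi\}$. This affine
subspace is a complete metric space under $(Y,\tilde Y) \mapsto \|Y - \tilde Y\|_\alpha$ and so is the closed
unit ball
\[ \mathcal{B}_T = \{ Y \in \mathcal{C}^\alpha([0,T],W) : Y_0 = \xi, \ \|Y\|_\alpha \le 1 \} . \]
Young’s inequality (4.32) shows that there is a constant C which only depends on α
(thanks to T ≤ 1) such that for every $Y \in \mathcal{B}_T$,
\begin{align*}
\|\mathcal{M}_T(Y)\|_\alpha &\le C \big( |f(Y_0)| + \|f(Y)\|_\alpha \big) \|X\|_\alpha \\
&\le C \big( |f(\xi)| + \|Df\|_\infty \|Y\|_\alpha \big) \|X\|_\alpha \\
&\le C \big( |f|_\infty + \|Df\|_\infty \big) \|X\|_\alpha \le C |f|_{C_b^1} \|X\|_\beta T^{\beta-\alpha} .
\end{align*}
Similarly, for $Y, \tilde Y \in \mathcal{B}_T$, using Young, $f(Y_0) = f(\tilde Y_0)$ and Lemma 8.2 below (with
K = 1)
\begin{align*}
\big\| \mathcal{M}_T(Y) - \mathcal{M}_T(\tilde Y) \big\|_\alpha
&= \Big\| \int_0^{\cdot} f(Y_s)\,dX_s - \int_0^{\cdot} f(\tilde Y_s)\,dX_s \Big\|_\alpha \\
&\le C \big( |f(Y_0) - f(\tilde Y_0)| + \|f(Y) - f(\tilde Y)\|_\alpha \big) \|X\|_\alpha \\
&\le C \|f\|_{C_b^2} \|X\|_\beta T^{\beta-\alpha} \|Y - \tilde Y\|_\alpha .
\end{align*}
It is clear from the previous estimates that a small enough $T_0 = T_0(f,\alpha,\beta,X) \le 1$
can be found such that $\mathcal{M}_{T_0}(\mathcal{B}_{T_0}) \subset \mathcal{B}_{T_0}$ and, for all $Y, \tilde Y \in \mathcal{B}_{T_0}$,
\[ \big\| \mathcal{M}_{T_0}(Y) - \mathcal{M}_{T_0}(\tilde Y) \big\|_{\alpha;[0,T_0]} \le \tfrac12 \|Y - \tilde Y\|_{\alpha;[0,T_0]} . \]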
Therefore, MT0 (·) admits a unique fixed point Y ∈ BT0 which is the unique solution
Y to (8.1) on the (small) interval [0, T0 ]. Noting that the choice T0 = T0 (f, α, β, X)
can indeed be done uniformly (in particular it does not change when the starting point
ξ is replaced by YT0 ), the unique solution on [0, 1] is then constructed iteratively, as
explained in the beginning.
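To make the fixed-point construction concrete, here is a small numerical sketch of the Picard map in a discretised setting. It is an illustration only and not part of the text: the driver is a smooth scalar path sampled on a uniform grid, the Young integral is replaced by a left-point Riemann-Stieltjes sum, and the names `picard_young`, `euler_fixed_point` and the particular driver are hypothetical choices.

```python
import math


def picard_young(f, xi, X, n_iter):
    """Iterate Y -> xi + int_0^. f(Y) dX on a grid (left-point sums).

    X is the list of sampled values of the driver. Returns the last iterate
    and the sup-distance between consecutive iterates.
    """
    n = len(X) - 1
    Y = [xi] * (n + 1)  # initial guess: the constant path
    history = []
    for _ in range(n_iter):
        new_Y = [xi]
        for k in range(n):
            # the integrand uses the *previous* iterate Y
            new_Y.append(new_Y[-1] + f(Y[k]) * (X[k + 1] - X[k]))
        history.append(max(abs(a - b) for a, b in zip(new_Y, Y)))
        Y = new_Y
    return Y, history


def euler_fixed_point(f, xi, X):
    """Discrete fixed point of the map above: the explicit Euler recursion."""
    Y = [xi]
    for k in range(len(X) - 1):
        Y.append(Y[-1] + f(Y[-1]) * (X[k + 1] - X[k]))
    return Y


if __name__ == "__main__":
    N = 200
    X = [k / N + 0.5 * math.sin(3 * k / N) for k in range(N + 1)]
    Y, history = picard_young(math.sin, 0.3, X, 15)
    print(history)  # distances between successive iterates
```

On the grid, the fixed point of the discretised map is exactly the Euler recursion, so comparing the two gives a simple check that the iteration has converged.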

Lemma 8.2. Assume $f \in C_b^2(W,\bar W)$ and T ≤ 1. Then there exists a $C_{\alpha,K}$ such that
for all $X, Y \in \mathcal{C}^\alpha$ with $\|X\|_{\alpha;[0,T]}, \|Y\|_{\alpha;[0,T]} \le K \in [1,\infty)$
\[ \|f(X) - f(Y)\|_{\alpha;[0,T]} \le C_{\alpha,K} \|f\|_{C_b^2} \big( |X_0 - Y_0| + \|X - Y\|_{\alpha;[0,T]} \big) . \]

Proof. Consider the difference
\[ f(X)_{s,t} - f(Y)_{s,t} = \big(f(X_t) - f(Y_t)\big) - \big(f(X_s) - f(Y_s)\big) . \]
The idea is to use a division property of sufficiently smooth functions. In the present
context, this simply means that one has
\[ f(x) - f(y) = g(x,y)(x-y) \quad\text{with}\quad g(x,y) := \int_0^1 Df\big(tx + (1-t)y\big)\,dt , \]
where $g : W \times W \to L(W,\bar W)$ is obviously bounded by $|Df|_\infty$ and in fact Lipschitz
with $|g|_{\mathrm{Lip}} \le C |D^2 f|_\infty$ for some constant C ≥ 1 relative to any product norm on
W × W, such as $|(x,y)|_{W\times W} = |x| + |y|$. It follows that
\[ |g(x,y) - g(\tilde x,\tilde y)| \le |g|_{\mathrm{Lip}} |(x - \tilde x, y - \tilde y)| \le C |D^2 f|_\infty \big( |x - \tilde x| + |y - \tilde y| \big) . \]
Setting $\Delta_t = X_t - Y_t$ then allows us to write
\begin{align*}
\big| f(X)_{s,t} - f(Y)_{s,t} \big| &= \big| g(X_t,Y_t)\Delta_t - g(X_s,Y_s)\Delta_s \big| \\
&= \big| g(X_t,Y_t)(\Delta_t - \Delta_s) + \big(g(X_t,Y_t) - g(X_s,Y_s)\big)\Delta_s \big| \\
&\le |g|_\infty |X_{s,t} - Y_{s,t}| + |g|_{\mathrm{Lip}} |(X_{s,t},Y_{s,t})|_{W\times W} |X_s - Y_s| \\
&\le |Df|_\infty |X_{s,t} - Y_{s,t}| + C |D^2 f|_\infty \big( |X_{s,t}| + |Y_{s,t}| \big) \|X - Y\|_{\infty;[0,T]} \\
&\lesssim |t-s|^\alpha \big( |Df|_\infty \|X - Y\|_\alpha + K |D^2 f|_\infty \|X - Y\|_{\infty;[0,T]} \big) .
\end{align*}
Since T ≤ 1 we can also estimate $\|X - Y\|_{\infty;[0,T]} \le |X_0 - Y_0| + \|X - Y\|_{\alpha;[0,T]}$
and the claimed estimate on f(X) − f(Y) follows immediately. □

8.4 Rough differential equations: a priori estimates

We now consider a priori estimates for rough differential equations, similar to Section
8.2. Recall that the homogeneous rough path norm |||X|||α was introduced in (2.4).
Proposition 8.3. Let $\xi \in W$, $f \in C_b^2(W,L(V,W))$ and a rough path $\mathbf{X} = (X,\mathbb{X}) \in \mathscr{C}^\alpha$
with $\alpha \in (1/3,1/2]$ and assume that $(Y,Y') = (Y,f(Y)) \in \mathscr{D}_X^{2\alpha}$ is a RDE
solution to $dY = f(Y)\,d\mathbf{X}$ started at $Y_0 = \xi \in W$. That is, for all t ∈ [0, T],
\[ Y_t = \xi + \int_0^t f(Y_s)\,d\mathbf{X}_s , \tag{8.2} \]
where the integral is interpreted in the sense of Theorem 4.10 and $f(Y) \in \mathscr{D}_X^{2\alpha}$ is
built from Y by Lemma 7.3. (Thanks to $C_b^2$-regularity of f and Lemma 7.3 the above
rough integral equation (8.2) is well-defined.¹)
Then the following (a priori) estimate holds true
\[ \|Y\|_\alpha \le C \Big( \big(\|f\|_{C_b^2} |||\mathbf{X}|||_\alpha\big) \vee \big(\|f\|_{C_b^2} |||\mathbf{X}|||_\alpha\big)^{1/\alpha} \Big) \]
where C = C(α) is a suitable constant.

¹ Later we will establish existence and uniqueness under $C_b^3$-regularity.

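Proof. Consider an interval I := [s, t] so that, using basic estimates for rough
integrals (cf. Theorem 4.10),
\begin{align*}
\big| R^Y_{s,t} \big| &= |Y_{s,t} - f(Y_s)X_{s,t}| \\
&\le \Big| \int_s^t f(Y)\,d\mathbf{X} - f(Y_s)X_{s,t} - Df(Y_s)f(Y_s)\mathbb{X}_{s,t} \Big| + |Df(Y_s)f(Y_s)\mathbb{X}_{s,t}| \\
&\lesssim \big( \|X\|_{\alpha;I} \|R^{f(Y)}\|_{2\alpha;I} + \|\mathbb{X}\|_{2\alpha;I} \|f(Y)\|_{\alpha;I} \big) |t-s|^{3\alpha} + \|\mathbb{X}\|_{2\alpha;I} |t-s|^{2\alpha} . \tag{8.3}
\end{align*}
Recall that $\|\cdot\|_\alpha$ is the usual Hölder semi-norm over [0, T], while $\|\cdot\|_{\alpha;I}$ denotes
the same norm, but over I ⊂ [0, T], so that trivially $\|X\|_{\alpha;I} \le \|X\|_\alpha$. Whenever
notationally convenient, multiplicative constants depending on α and f are absorbed
in ≲; at the very end we can use scaling to make the f dependence reappear. We
will also write $\|\cdot\|_{\alpha;h}$ for the supremum of $\|\cdot\|_{\alpha;I}$ over all intervals I ⊂ [0, T] with
length |I| ≤ h. Again, one trivially has $\|X\|_{\alpha;I} \le \|X\|_{\alpha;h}$ whenever |I| ≤ h. Using
this notation, we conclude from (8.3) that
\[ \|R^Y\|_{2\alpha;h} \lesssim \|\mathbb{X}\|_{2\alpha;h} + \big( \|X\|_{\alpha;h} \|R^{f(Y)}\|_{2\alpha;h} + \|\mathbb{X}\|_{2\alpha;h} \|f(Y)\|_{\alpha;h} \big) h^\alpha . \]
We would now like to relate $R^{f(Y)}$ to $R^Y$. As in the proof of Lemma 7.3, we obtain
the bound
\begin{align*}
R^{f(Y)}_{s,t} &= f(Y_t) - f(Y_s) - Df(Y_s)Y'_s X_{s,t} \\
&= f(Y_t) - f(Y_s) - Df(Y_s)Y_{s,t} + Df(Y_s)R^Y_{s,t}
\end{align*}
so that,
\[ \|R^{f(Y)}\|_{2\alpha;h} \le \tfrac12 |D^2 f|_\infty \|Y\|_{\alpha;h}^2 + |Df|_\infty \|R^Y\|_{2\alpha;h} \lesssim \|Y\|_{\alpha;h}^2 + \|R^Y\|_{2\alpha;h} . \]
Hence, also using $\|f(Y)\|_{\alpha;h} \lesssim \|Y\|_{\alpha;h}$, there exists $c_1 > 0$, not dependent on X or
Y, such that
\begin{align*}
\|R^Y\|_{2\alpha;h} \le{}& c_1 \|\mathbb{X}\|_{2\alpha;h} + c_1 \|X\|_{\alpha;h} h^\alpha \|Y\|_{\alpha;h}^2 \tag{8.4} \\
& + c_1 \|X\|_{\alpha;h} h^\alpha \|R^Y\|_{2\alpha;h} + c_1 \|\mathbb{X}\|_{2\alpha;h} h^\alpha \|Y\|_{\alpha;h} .
\end{align*}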

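We now restrict ourselves to h small enough so that $|||\mathbf{X}|||_\alpha h^\alpha \ll 1$. More precisely,
we choose it such that
\[ c_1 \|X\|_\alpha h^\alpha \le \tfrac12 , \qquad c_1 \|\mathbb{X}\|_{2\alpha}^{1/2} h^\alpha \le \tfrac12 . \]
Inserting this bound into (8.4), we conclude that
\[ \|R^Y\|_{2\alpha;h} \le c_1 \|\mathbb{X}\|_{2\alpha;h} + \tfrac12 \|Y\|_{\alpha;h}^2 + \tfrac12 \|R^Y\|_{2\alpha;h} + \|\mathbb{X}\|_{2\alpha;h}^{1/2} \|Y\|_{\alpha;h} . \]
This in turn yields the bound
\begin{align*}
\|R^Y\|_{2\alpha;h} &\le 2c_1 \|\mathbb{X}\|_{2\alpha;h} + \|Y\|_{\alpha;h}^2 + 2 \|\mathbb{X}\|_{2\alpha;h}^{1/2} \|Y\|_{\alpha;h} \\
&\le c_2 \|\mathbb{X}\|_{2\alpha;h} + 2 \|Y\|_{\alpha;h}^2 , \tag{8.5}
\end{align*}
with $c_2 = (2c_1 + 1)$. On the other hand, since $Y_{s,t} = f(Y_s)X_{s,t} + R^Y_{s,t}$ and f is
bounded, we have the bound
\[ \|Y\|_{\alpha;h} \lesssim \|X\|_\alpha + \|R^Y\|_{2\alpha;h} h^\alpha . \]
Combining this bound with (8.5) yields
\begin{align*}
\|Y\|_{\alpha;h} &\le c_3 \|X\|_\alpha + c_3 \|\mathbb{X}\|_{2\alpha;h} h^\alpha + c_3 \|Y\|_{\alpha;h}^2 h^\alpha \\
&\le c_3 \|X\|_\alpha + c_4 \|\mathbb{X}\|_{2\alpha;h}^{1/2} + c_3 \|Y\|_{\alpha;h}^2 h^\alpha ,
\end{align*}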

for some constant $c_3$. Multiplication with $c_3 h^\alpha$ then yields, with $\psi_h := c_3 \|Y\|_{\alpha;h} h^\alpha$
and $\lambda_h := c_5 |||\mathbf{X}|||_\alpha h^\alpha \to 0$ as h → 0,
\[ \psi_h \le \lambda_h + \psi_h^2 . \]
Clearly, for all h small enough depending on Y (so that $\psi_h \le 1/2$) $\psi_h \le \lambda_h + \psi_h/2$
implies $\psi_h \le 2\lambda_h$ and so
\[ \|Y\|_{\alpha;h} \le c_6 |||\mathbf{X}|||_\alpha . \]
To see that this is true for all h small enough without dependence on Y, pick $h_0$
small enough so that $\lambda_{h_0} < 1/4$. It then follows that for each $h \le h_0$, one of the
following two estimates must hold true
\begin{align*}
\psi_h &\ge \psi_+ \equiv \tfrac12 + \sqrt{\tfrac14 - \lambda_h} \ge \tfrac12 , \\
\psi_h &\le \psi_- \equiv \tfrac12 - \sqrt{\tfrac14 - \lambda_h} = \tfrac12 \big( 1 - \sqrt{1 - 4\lambda_h} \big) \sim \lambda_h \quad\text{as } h \downarrow 0 .
\end{align*}
(In fact, for reasons that will become apparent shortly, we may decrease $h_0$ further to
guarantee that for $h < h_0$ we have not only $\psi_h < 1/2$ but $\psi_h < 1/6$.) We already
know that we are in the regime of the second estimate above as h ↓ 0. Noting that
$\psi_h \,(< 1/6) < 1/2$ in the second regime, the only reason that could prevent us from
being in the second regime for all $h < h_0$ is an (upwards) jump of the (increasing)
function $(0,h_0] \ni h \mapsto \psi_h$. But $\psi_h \le 3 \lim_{g\uparrow h} \psi_g$, as seen from
\[ \|Y\|_{\alpha;h} \le 3 \|Y\|_{\alpha;h/3} \le 3 \lim_{g\uparrow h} \|Y\|_{\alpha;g} , \]
(and similarly: $\lim_{g\downarrow h} \psi_g \le 3\psi_h$) which rules out any jumps of relative jump size
greater than 3. However, given that $\psi_h \ge 1/2$ in the first regime and $\psi_h < 1/6$ in the
second, we can never jump from the second into the first regime, as h increases (from
zero). And so, we indeed must be in the second regime for all $h \le h_0$. Elementary
estimates on $\psi_-$, as function of $\lambda_h$, then show that
\[ \|Y\|_{\alpha;h} \le c_6 |||\mathbf{X}|||_\alpha , \]
for all $h \le h_0 \sim |||\mathbf{X}|||_\alpha^{-1/\alpha}$. We conclude with Exercise 4.24, arguing exactly as in
the Young case, Proposition 8.1. □
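The “elementary estimates” on the smaller root of the quadratic inequality in this proof can be made explicit. The following worked step is a sketch using only the definitions of $\psi_h$, $\lambda_h$ and $\psi_-$ from the proof; the closing relation between the constants is one possible choice, not a statement from the text.

```latex
\[
\psi_- \;=\; \tfrac12\bigl(1-\sqrt{1-4\lambda_h}\bigr)
       \;=\; \frac{1-(1-4\lambda_h)}{2\bigl(1+\sqrt{1-4\lambda_h}\bigr)}
       \;=\; \frac{2\lambda_h}{1+\sqrt{1-4\lambda_h}}
       \;\le\; 2\lambda_h ,
       \qquad 0 \le \lambda_h \le \tfrac14 .
\]
% Hence, in the second regime,
%   c_3 \|Y\|_{\alpha;h} h^\alpha = \psi_h \le 2\lambda_h = 2 c_5 |||\mathbf{X}|||_\alpha h^\alpha ,
% and dividing by c_3 h^\alpha gives \|Y\|_{\alpha;h} \le c_6 |||\mathbf{X}|||_\alpha
% with, for instance, c_6 = 2 c_5 / c_3 .
```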

8.5 Rough differential equations

The aim of this section is to show that if f is regular enough and $(X,\mathbb{X}) \in \mathscr{C}^\beta$ with
$\beta > 1/3$, then we can solve differential equations driven by the rough path $\mathbf{X} = (X,\mathbb{X})$
of the type
\[ dY = f(Y)\,d\mathbf{X} . \]
Such an equation will yield solutions in $\mathscr{D}_X^{2\alpha}$ and will be interpreted in the corresponding
integral formulation, where the integral of f(Y) against $\mathbf{X}$ is defined using
Lemma 7.3 and Theorem 4.10. More precisely, one has the following result:

Theorem 8.4. Given $\xi \in W$, $f \in C^3(W,L(V,W))$ and $\mathbf{X} = (X,\mathbb{X}) \in \mathscr{C}^\beta(\mathbb{R}_+,V)$
with $\beta \in (1/3,1/2)$, there exists a unique element $(Y,Y') \in \mathscr{D}_X^{2\beta}([0,1],W)$ such that
\[ Y_t = \xi + \int_0^t f(Y_s)\,d\mathbf{X}_s , \qquad t < \tau , \tag{8.6} \]
for some τ > 0. Here, the integral is interpreted in the sense of Theorem 4.10 and
$f(Y) \in \mathscr{D}_X^{2\beta}$ is built from Y by Lemma 7.3. Furthermore, one has Y′ = f(Y) and,
if $f \in C_b^3$, solutions are global in time.

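Proof. With $\mathbf{X} = (X,\mathbb{X}) \in \mathscr{C}^\beta \subset \mathscr{C}^\alpha$, $1/3 < \alpha < \beta$ and $(Y,Y') \in \mathscr{D}_X^{2\alpha}$ we know
from Lemma 7.3 that
\[ (\Xi,\Xi') := \big( f(Y), f(Y)' \big) := \big( f(Y), Df(Y)Y' \big) \in \mathscr{D}_X^{2\alpha} . \]
Restricting from [0, 1] to [0, T], any T ≤ 1, Theorem 4.10 allows us to define the map
\[ \mathcal{M}_T(Y,Y') \stackrel{\mathrm{def}}{=} \Big( \xi + \int_0^{\cdot} \Xi_s\,d\mathbf{X}_s , \ \Xi \Big) \in \mathscr{D}_X^{2\alpha} . \]
The RDE solution on [0, T] we are looking for is a fixed point of this map. Strictly
speaking, this would only yield a solution (Y, Y′) in $\mathscr{D}_X^{2\alpha}$. But since $\mathbf{X} \in \mathscr{C}^\beta$, it
turns out that this solution is automatically an element of $\mathscr{D}_X^{2\beta}$. Indeed,
$|Y_{s,t}| \le |Y'|_\infty |X_{s,t}| + \|R^Y\|_{2\alpha} |t-s|^{2\alpha}$, so that $Y \in \mathcal{C}^\beta$. From the fixed point property it
then follows that $Y' = f(Y) \in \mathcal{C}^\beta$ and also $R^Y \in \mathcal{C}_2^{2\beta}$, since $\mathbb{X} \in \mathcal{C}_2^{2\beta}$ and
\[ \big| R^Y_{s,t} \big| = |Y_{s,t} - Y'_s X_{s,t}| = \Big| \int_s^t \big( f(Y_r) - f(Y_s) \big)\,d\mathbf{X}_r \Big| \le |Y'|_\infty |\mathbb{X}_{s,t}| + O\big(|t-s|^{3\alpha}\big) . \]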

Note that if (Y, Y′) is such that $(Y_0,Y'_0) = (\xi,f(\xi))$, then the same is true for
$\mathcal{M}_T(Y,Y')$. Therefore, $\mathcal{M}_T$ can be viewed as map on the space of controlled paths
started at $(\xi,f(\xi))$, i.e.
\[ \big\{ (Y,Y') \in \mathscr{D}_X^{2\alpha}([0,T],W) : Y_0 = \xi, \ Y'_0 = f(\xi) \big\} . \]
Since $\mathscr{D}_X^{2\alpha}$ is a Banach space (under the norm $(Y,Y') \mapsto |Y_0| + |Y'_0| + \|Y,Y'\|_{X,2\alpha}$)
the above (affine) subspace is a complete metric space under the induced metric. This
is also true for the (closed) unit ball $\mathcal{B}_T$ centred at, say
\[ t \mapsto \big( \xi + f(\xi)X_{0,t}, \ f(\xi) \big) . \]
(Note here that the apparently simpler choice $t \mapsto (\xi,f(\xi))$ does in general not
belong to $\mathscr{D}_X^{2\alpha}$.) In other words, $\mathcal{B}_T$ is the set of all $(Y,Y') \in \mathscr{D}_X^{2\alpha}([0,T],W)$ :
$Y_0 = \xi$, $Y'_0 = f(\xi)$ and
\begin{align*}
& |Y_0 - \xi| + |Y'_0 - f(\xi)| + \big\| \big( Y - (\xi + f(\xi)X_{0,\cdot}), \ Y'_\cdot - f(\xi) \big) \big\|_{X,2\alpha} \\
&\qquad = \big\| \big( Y - f(\xi)X_{0,\cdot}, \ Y'_\cdot - f(\xi) \big) \big\|_{X,2\alpha} \le 1 .
\end{align*}
In fact, $\| ( Y - f(\xi)X_{0,\cdot}, Y'_\cdot - f(\xi) ) \|_{X,2\alpha} = \|Y,Y'_\cdot\|_{X,2\alpha}$ as a consequence of the
triangle inequality and $\| ( f(\xi)X_{0,\cdot}, f(\xi) ) \|_{X,2\alpha} = \|f(\xi)\|_\alpha + \|0\|_{2\alpha} = 0$, so that
\[ \mathcal{B}_T = \big\{ (Y,Y') \in \mathscr{D}_X^{2\alpha}([0,T],W) : Y_0 = \xi, \ Y'_0 = f(\xi) : \|(Y,Y'_\cdot)\|_{X,2\alpha} \le 1 \big\} . \]
Let us also note that, for all $(Y,Y') \in \mathcal{B}_T$, one has the bound
\[ |Y'_0| + \|(Y,Y')\|_{X,2\alpha} \le |f|_\infty + 1 =: M \in [1,\infty) . \tag{8.7} \]

We now show that, for T small enough, $\mathcal{M}_T$ leaves $\mathcal{B}_T$ invariant and in fact is
contracting. Constants below are denoted by C, may change from line to line and
may depend on α, β, X, $\mathbb{X}$ without special indication. They are, however, uniform
in T ∈ (0, 1] and we prefer to be explicit (enough) with respect to f such as to
see where $C_b^3$-regularity is used. With these conventions, we recall the following
estimates, direct consequences from Lemma 7.3 and Theorem 4.10, respectively,
\begin{align*}
\|\Xi,\Xi'\|_{X,2\alpha} &\le CM \|f\|_{C_b^2} \big( |Y'_0| + \|Y,Y'\|_{X,2\alpha} \big) , \\
\Big\| \int_0^{\cdot} \Xi_s\,d\mathbf{X}_s , \ \Xi \Big\|_{X,2\alpha}
&\le \|\Xi\|_\alpha + \|\Xi'\|_\infty \|\mathbb{X}\|_{2\alpha} + C \big( \|X\|_\alpha \|R^\Xi\|_{2\alpha} + \|\mathbb{X}\|_{2\alpha} \|\Xi'\|_\alpha \big) \\
&\le \|\Xi\|_\alpha + C \big( |\Xi'_0| + \|\Xi,\Xi'\|_{X,2\alpha} \big) \big( \|X\|_\alpha + \|\mathbb{X}\|_{2\alpha} \big) \\
&\le \|\Xi\|_\alpha + C \big( |\Xi'_0| + \|\Xi,\Xi'\|_{X,2\alpha} \big) T^{\beta-\alpha} .
\end{align*}

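Invariance: For $(Y,Y') \in \mathcal{B}_T$, noting that $\|\Xi\|_\alpha = \|f(Y)\|_\alpha \le \|f\|_{C_b^1} \|Y\|_\alpha$ and
that $|\Xi'_0| = |Df(Y_0)Y'_0| \le \|f\|_{C_b^1}^2$, we obtain the bound
\begin{align*}
\|\mathcal{M}_T(Y,Y')\|_{X,2\alpha} &= \Big\| \int_0^{\cdot} \Xi_s\,d\mathbf{X}_s , \ \Xi \Big\|_{X,2\alpha} \\
&\le \|\Xi\|_\alpha + C \big( |\Xi'_0| + \|\Xi,\Xi'\|_{X,2\alpha} \big) T^{\beta-\alpha} \\
&\le \|f\|_{C_b^1} \|Y\|_\alpha + C \Big( \|f\|_{C_b^1}^2 + CM \|f\|_{C_b^2} \big( |Y'_0| + \|Y,Y'\|_{X,2\alpha} \big) \Big) T^{\beta-\alpha} \\
&\le \|f\|_{C_b^1} (\|f\|_\infty + 1) T^{\beta-\alpha} + CM \Big( \|f\|_{C_b^1}^2 + \|f\|_{C_b^2} (\|f\|_\infty + 1) \Big) T^{\beta-\alpha} ,
\end{align*}
where in the last step we used (8.7) and also $\|Y\|_{\alpha;[0,T]} \le C_f T^{\beta-\alpha}$, seen from
\begin{align*}
|Y_{s,t}| &\le |Y'|_\infty |X_{s,t}| + \|R^Y\|_{2\alpha} |t-s|^{2\alpha} \\
&\le \big( |Y'_0| + \|Y'\|_\alpha \big) \|X\|_\beta |t-s|^{\beta} + \|R^Y\|_{2\alpha} |t-s|^{2\alpha} .
\end{align*}
Then, using $T^\alpha \le T^{\beta-\alpha}$ and $\|R^Y\|_{2\alpha} \le \|Y,Y'\|_{X,2\alpha} \le 1$, we obtain the bound
\begin{align*}
\|Y\|_{\alpha;[0,T]} &\le \big( |Y'_0| + \|Y,Y'\|_{X,2\alpha} \big) \|X\|_\beta T^{\beta-\alpha} + \|R^Y\|_{2\alpha} T^{\beta-\alpha} \tag{8.8} \\
&\le \big( (\|f\|_\infty + 1) \|X\|_\beta + 1 \big) T^{\beta-\alpha} .
\end{align*}
In other words, $\|\mathcal{M}_T(Y,Y')\|_{X,2\alpha} = \|\mathcal{M}_T(Y,Y')\|_{X,2\alpha;[0,T]} = O(T^{\beta-\alpha})$ with
constant only depending on α, β, $\mathbf{X}$ and $f \in C_b^2$. By choosing $T = T_0$ small enough,
we obtain the bound $\|\mathcal{M}_{T_0}(Y,Y')\|_{X,2\alpha;[0,T_0]} \le 1$ so that $\mathcal{M}_{T_0}$ leaves $\mathcal{B}_{T_0}$ invariant,
as desired.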
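Contraction: Setting $\Delta_s = f(Y_s) - f(\tilde Y_s)$ as a shorthand, we have the bound
\begin{align*}
\big\| \mathcal{M}_T(Y,Y') - \mathcal{M}_T(\tilde Y,\tilde Y') \big\|_{X,2\alpha} &= \Big\| \int_0^{\cdot} \Delta_s\,d\mathbf{X}_s , \ \Delta \Big\|_{X,2\alpha} \\
&\le \|\Delta\|_\alpha + C \big( |\Delta'_0| + \|\Delta,\Delta'\|_{X,2\alpha} \big) T^{\beta-\alpha} \\
&\le C \|f\|_{C_b^2} \|Y - \tilde Y\|_\alpha + C \|\Delta,\Delta'\|_{X,2\alpha} T^{\beta-\alpha} .
\end{align*}
The contraction property is obvious, provided that we can establish the following
two estimates:
\begin{align*}
\|Y - \tilde Y\|_\alpha &\le C T^{\beta-\alpha} \|Y - \tilde Y, Y' - \tilde Y'\|_{X,2\alpha} , \tag{8.9} \\
\|\Delta,\Delta'\|_{X,2\alpha} &\le C \|Y - \tilde Y, Y' - \tilde Y'\|_{X,2\alpha} . \tag{8.10}
\end{align*}
To obtain (8.9), replacing Y by $Y - \tilde Y$ in (8.8), and noting $Y'_0 - \tilde Y'_0 = 0$, shows that
\begin{align*}
\|Y - \tilde Y\|_\alpha &\le \|Y' - \tilde Y'\|_\alpha \|X\|_\beta T^{\beta-\alpha} + \|R^Y - R^{\tilde Y}\|_{2\alpha} T^{\beta-\alpha} \\
&\le C T^{\beta-\alpha} \|Y - \tilde Y, Y' - \tilde Y'\|_{X,2\alpha} .
\end{align*}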

We now turn to (8.10). Similar to the proof of Lemma 8.2, $f \in C^3$ allows us to write
$\Delta_s = G_s H_s$ where
\[ G_s := g(Y_s,\tilde Y_s) , \qquad H_s := Y_s - \tilde Y_s , \]
and $g \in C_b^2$ with $\|g\|_{C_b^2} \le C \|f\|_{C_b^3}$. Lemma 7.3 tells us that $(G,G') \in \mathscr{D}_X^{2\alpha}$ (with
$G' = (D_Y g)Y' + (D_{\tilde Y} g)\tilde Y'$) and in fact immediately yields an estimate of the form
\[ \|G,G'\|_{X,2\alpha} \le C \|f\|_{C_b^3} , \]
uniformly over $(Y,Y'), (\tilde Y,\tilde Y') \in \mathcal{B}_T$ and T ≤ 1. On the other hand, $\mathscr{D}_X^{2\alpha}$ is an
algebra in the sense that $(GH,(GH)') \in \mathscr{D}_X^{2\alpha}$ with $(GH)' = G'H + GH'$. In fact,
we leave it as an easy exercise to the reader to check that
\[ \|GH,(GH)'\|_{X,2\alpha} \lesssim \big( |G_0| + |G'_0| + \|G,G'\|_{X,2\alpha} \big) \times \big( |H_0| + |H'_0| + \|H,H'\|_{X,2\alpha} \big) . \]
In our situation, $H_0 = Y_0 - \tilde Y_0 = \xi - \xi = 0$, and similarly $H'_0 = 0$, so that, for all
$(Y,Y'), (\tilde Y,\tilde Y') \in \mathcal{B}_T$, we have
\begin{align*}
\|\Delta,\Delta'\|_{X,2\alpha} &\lesssim \big( |G_0| + |G'_0| + \|G,G'\|_{X,2\alpha} \big) \|H,H'\|_{X,2\alpha} \\
&\lesssim \Big( \|g\|_\infty + \|g\|_{C_b^1} \big( |Y'_0| + |\tilde Y'_0| \big) + C \|f\|_{C_b^3} \Big) \|Y - \tilde Y, Y' - \tilde Y'\|_{X,2\alpha} \\
&\lesssim \|Y - \tilde Y, Y' - \tilde Y'\|_{X,2\alpha} ,
\end{align*}
where we made use of $\|g\|_\infty, \|g\|_{C_b^1} \lesssim \|f\|_{C_b^3}$ and $|Y'_0| = |\tilde Y'_0| = |f(\xi)| \le |f|_\infty$.
The argument from here on is identical to the Young case: the previous estimates
allow for a small enough $T_0 \le 1$ such that $\mathcal{M}_{T_0}(\mathcal{B}_{T_0}) \subset \mathcal{B}_{T_0}$ and for all
$(Y,Y'), (\tilde Y,\tilde Y') \in \mathcal{B}_{T_0}$:
\[ \big\| \mathcal{M}_{T_0}(Y,Y') - \mathcal{M}_{T_0}(\tilde Y,\tilde Y') \big\|_{X,2\alpha} \le \tfrac12 \|Y - \tilde Y, Y' - \tilde Y'\|_{X,2\alpha} \]
and so $\mathcal{M}_{T_0}(\cdot)$ admits a unique fixed point $(Y,Y') \in \mathcal{B}_{T_0}$, which is then the unique
solution Y to (8.1) on the (possibly rather small) interval $[0,T_0]$. Noting that the
choice of $T_0$ can again be done uniformly in the starting point, the solution on [0, 1]
is then constructed iteratively as before. □
In many situations, one is interested in solutions to an equation of the type

dY = f0 (Y, t) dt + f (Y, t) dXt , (8.11)

instead of (8.6). On the one hand, it is possible to recast (8.11) in the form (8.6) by
writing it as an RDE for $\hat Y_t = (Y_t,t)$ driven by $\hat{\mathbf{X}} = (\hat X,\hat{\mathbb{X}})$ where $\hat X_t = (X_t,t)$
and $\hat{\mathbb{X}}$ is given by $\mathbb{X}$ and the “remaining cross integrals” of $X_t$ and t, given by usual
Riemann-Stieltjes integration. However, it is possible to exploit the structure of (8.11)
to obtain somewhat better bounds on the solutions. See [FV10b, Ch. 12].

8.6 Stability III: Continuity of the Itô–Lyons map

We now obtain continuity of solutions to rough differential equations as function of

their (rough) driving signals.
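Theorem 8.5 (Rough path stability of the Itô–Lyons map). Let $f \in C_b^3$ and let
$(Y,f(Y)) \in \mathscr{D}_X^{2\alpha}$ be the (unique) RDE solution given by Theorem 8.4 to
\[ dY = f(Y)\,d\mathbf{X} , \qquad Y_0 = \xi \in W ; \]
similarly, let $(\tilde Y,f(\tilde Y))$ be the RDE solution driven by $\tilde{\mathbf{X}}$ and started at $\tilde\xi$ where
$\mathbf{X}, \tilde{\mathbf{X}} \in \mathscr{C}^\beta$ and α < β. Assuming
\[ |||\mathbf{X}|||_\beta , \ |||\tilde{\mathbf{X}}|||_\beta \le M < \infty \]
we have the local Lipschitz estimates
\[ d_{X,\tilde X,2\alpha}\big( Y,f(Y); \tilde Y,f(\tilde Y) \big) \le C_M \big( |\xi - \tilde\xi| + \varrho_\beta(\mathbf{X},\tilde{\mathbf{X}}) \big) , \]
and also
\[ \|Y - \tilde Y\|_\alpha \le C_M \big( |\xi - \tilde\xi| + \varrho_\beta(\mathbf{X},\tilde{\mathbf{X}}) \big) , \]
where $C_M = C(M,\alpha,\beta,f)$ is a suitable constant.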

Remark 8.6. The “loss” of Hölder regularity (the fact that we have two exponents
satisfying α < β) is not really necessary, but it allows for a quick proof.
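Proof. Recall that, for given $\mathbf{X} \in \mathscr{C}^\beta$, the RDE solution $(Y,f(Y)) \in \mathscr{D}_X^{2\alpha}$ was
constructed as fixed point of
\[ \mathcal{M}_T(Y,Y') := (Z,Z') := \Big( \xi + \int_0^{\cdot} f(Y_s)\,d\mathbf{X}_s , \ f(Y_\cdot) \Big) \in \mathscr{D}_X^{2\alpha} \]
and similarly for $\tilde{\mathcal{M}}_T(\tilde Y,f(\tilde Y)) \in \mathscr{D}_{\tilde X}^{2\alpha}$. Then, thanks to the fixed point property
\[ (Y,f(Y)) = (Y,Y') = (Z,Z') = (Z,f(Y)) , \]
(similarly with tilde) and the local Lipschitz estimate for rough integration (uniform
in T ≤ 1), writing $(\Xi,\Xi') := (f(Y),f(Y)')$ for the integrand,
\begin{align*}
d_{X,\tilde X,2\alpha}\big( Y,f(Y); \tilde Y,f(\tilde Y) \big) &= d_{X,\tilde X,2\alpha}\big( Z,Z'; \tilde Z,\tilde Z' \big) \\
&\lesssim \varrho_\alpha(\mathbf{X},\tilde{\mathbf{X}}) + |\xi - \tilde\xi| + d_{X,\tilde X,2\alpha}\big( \Xi,\Xi'; \tilde\Xi,\tilde\Xi' \big) \\
&\le \varrho_\beta(\mathbf{X},\tilde{\mathbf{X}}) + |\xi - \tilde\xi| + d_{X,\tilde X,2\beta}\big( \Xi,\Xi'; \tilde\Xi,\tilde\Xi' \big) ,
\end{align*}
where we used α < β and T ≤ 1 in the last step. Thanks to the local Lipschitz
estimate for composition (also uniform over T ≤ 1)
\begin{align*}
d_{X,\tilde X,2\beta}\big( \Xi,\Xi'; \tilde\Xi,\tilde\Xi' \big) &\lesssim \varrho_\beta(\mathbf{X},\tilde{\mathbf{X}}) + |\xi - \tilde\xi| + d_{X,\tilde X,\beta}\big( Y,f(Y); \tilde Y,f(\tilde Y) \big) \\
&\le \varrho_\beta(\mathbf{X},\tilde{\mathbf{X}}) + |\xi - \tilde\xi| + d_{X,\tilde X,2\alpha}\big( Y,f(Y); \tilde Y,f(\tilde Y) \big) T^{\beta-\alpha} .
\end{align*}
In summary, for some constant C = C(α, β, f, M), we have the bound
\[ d_{X,\tilde X,2\alpha}\big( Y,f(Y); \tilde Y,f(\tilde Y) \big) \le C \Big( \varrho_\beta(\mathbf{X},\tilde{\mathbf{X}}) + |\xi - \tilde\xi| + d_{X,\tilde X,2\alpha}\big( Y,f(Y); \tilde Y,f(\tilde Y) \big) T^{\beta-\alpha} \Big) . \]
By taking $T = T_0(M,\alpha,\beta,f)$ smaller, if necessary, we may assume that $C T^{\beta-\alpha} \le 1/2$,
from which it follows that
\[ d_{X,\tilde X,2\alpha}\big( Y,f(Y); \tilde Y,f(\tilde Y) \big) \le 2C \big( \varrho_\beta(\mathbf{X},\tilde{\mathbf{X}}) + |\xi - \tilde\xi| \big) , \]
which is precisely the required bound. □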

8.7 Davie’s definition and numerical schemes

Fix $f \in C_b^2(W,L(V,W))$ and $\mathbf{X} = (X,\mathbb{X}) \in \mathscr{C}^\beta([0,T],V)$ with $\beta > 1/3$. Under
these assumptions, the rough differential equation $dY = f(Y)\,d\mathbf{X}$ makes sense as a
well-defined integral equation. (In Theorem 8.4 we used additional regularity, namely
$C_b^3$, to establish existence of a unique solution on [0, T].) By the very definition of an
RDE solution, unique or not, $(Y,f(Y)) \in \mathscr{D}_X^{2\beta}$ i.e.
\[ Y_{s,t} = f(Y_s)X_{s,t} + O\big(|t-s|^{2\beta}\big) \]
and we recognise a step of first-order Euler approximation, $Y_{s,t} \approx f(Y_s)X_{s,t}$, started
from $Y_s$. Clearly $O(|t-s|^{2\beta}) = o(|t-s|)$ if and only if β > 1/2 and one can show
that iteration of such steps along a partition P of [0, T] yields a convergent “Euler”
scheme as |P| ↓ 0, see [Dav08] or [FV10b].
In the case $\beta \in (1/3,1/2]$ we have to exploit that we know more than just
$(Y,f(Y)) \in \mathscr{D}_X^{2\beta}$. Indeed, since $Y_{s,t} = \int_s^t f(Y)\,d\mathbf{X}$, estimate (4.20) for rough
integrals tells us that, for all pairs s, t
\[ Y_{s,t} = f(Y_s)X_{s,t} + (f(Y))'_s \mathbb{X}_{s,t} + O\big(|t-s|^{3\beta}\big) . \tag{8.12} \]
Using the identity $f(Y)' = Df(Y)Y' = Df(Y)f(Y)$, this can be spelled out
further to
\[ Y_{s,t} = f(Y_s)X_{s,t} + Df(Y_s)f(Y_s)\mathbb{X}_{s,t} + o(|t-s|) \tag{8.13} \]
and, omitting the small remainder term, we recognise a step of a second-order Euler
or Milstein approximation. Again, one can show that iteration of such steps along a
partition P of [0, T] yields a convergent “Euler” scheme as |P| ↓ 0; see [Dav08] or
[FV10b].
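The first- and second-order steps described above are easy to compare numerically in the simplest possible setting. The sketch below is illustrative only and makes several assumptions not found in the text: the driver is a smooth scalar path, so that the second-level increment over a step reduces to half the squared increment; the vector field is f(y) = sin(y), chosen because the equation then has a closed-form solution; and the function names are hypothetical.

```python
import math


def davie_scheme(f, df, xi, X, second_order):
    """Iterate Euler (first-order) or Milstein-type (second-order) steps.

    X is a list of samples of a smooth scalar driver. For such a path the
    iterated integral over one step equals dx**2 / 2 (an assumption that
    only holds for scalar, geometric drivers).
    """
    y = xi
    for k in range(len(X) - 1):
        dx = X[k + 1] - X[k]
        step = f(y) * dx
        if second_order:
            step += df(y) * f(y) * 0.5 * dx * dx
        y += step
    return y


def exact_solution(xi, X):
    """Closed form for dY = sin(Y) dX: tan(Y/2) = tan(xi/2) * exp(X_T - X_0)."""
    return 2.0 * math.atan(math.tan(xi / 2.0) * math.exp(X[-1] - X[0]))


if __name__ == "__main__":
    N = 1000
    X = [k / N + 0.5 * math.sin(3 * k / N) for k in range(N + 1)]
    ref = exact_solution(0.3, X)
    for second_order in (False, True):
        y = davie_scheme(math.sin, math.cos, 0.3, X, second_order)
        print(second_order, abs(y - ref))
```

For a genuinely rough driver the second-level term cannot be computed from the path increments alone and has to be supplied as part of the rough path; the sketch sidesteps this by working with a smooth scalar signal.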
Remark 8.7. These schemes can be understood from simple Taylor expansions based
on the differential equation dY = f(Y)dX, at least when X is smooth (enough), or
via Itô’s formula in a semi-martingale setting. With focus on the smooth case, the
Euler approximation is obtained by a “left-point freezing” approximation $f(Y_\cdot) \approx f(Y_s)$
over [s, t] in the integral equation,
\[ Y_{s,t} = \int_s^t f(Y_r)\,dX_r \approx f(Y_s)X_{s,t} \]
whereas the Milstein scheme, with $\mathbb{X}_{s,t} = \int_s^t X_{s,r} \otimes dX_r$ for smooth paths, is obtained
from the next-best approximation
\begin{align*}
f(Y_r) &\approx f(Y_s) + Df(Y_s)Y_{s,r} \\
&\approx f(Y_s) + Df(Y_s)f(Y_s)X_{s,r} .
\end{align*}
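For a smooth driver, inserting the next-best approximation of the integrand into the integral equation gives the second-order step in one line. The computation below is a short worked sketch that only uses the definition of the iterated integral for smooth paths:

```latex
\begin{align*}
Y_{s,t} = \int_s^t f(Y_r)\,dX_r
  &\approx \int_s^t \bigl( f(Y_s) + Df(Y_s)f(Y_s)\,X_{s,r} \bigr)\,dX_r \\
  &= f(Y_s)\,X_{s,t} + Df(Y_s)f(Y_s) \int_s^t X_{s,r}\otimes dX_r \\
  &= f(Y_s)\,X_{s,t} + Df(Y_s)f(Y_s)\,\mathbb{X}_{s,t} ,
\end{align*}
% which is (8.13) with the remainder term omitted.
```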

It turns out that the description (8.13) is actually a formulation that is equivalent
to the RDE solution built previously in the following sense.
Proposition 8.8. The following two statements are equivalent
i) (Y, f (Y )) is a RDE solution to (8.6), as constructed in Theorem 8.4.
ii) Y ∈ C([0, T ], W ) is an “RDE solution in the sense of Davie”, i.e. in the sense of
(8.13).
Proof. We already discussed how (8.13) is obtained from an RDE solution to
(8.6). Conversely, (8.13) implies immediately $Y_{s,t} = f(Y_s)X_{s,t} + O(|t-s|^{2\beta})$
which shows that $Y \in \mathcal{C}^\beta$ and also $Y' := f(Y) \in \mathcal{C}^\beta$, thanks to $f \in C_b^2$, so that
$(Y,f(Y)) \in \mathscr{D}_X^{2\beta}$. It remains to see, in the notation of the proof of Theorem 4.10,
that $Y_{s,t} = (\mathcal{I}\Xi)_{s,t}$ with
\[ \Xi_{s,t} = f(Y_s)X_{s,t} + (f(Y))'_s \mathbb{X}_{s,t} = f(Y_s)X_{s,t} + Df(Y_s)f(Y_s)\mathbb{X}_{s,t} . \]
To see this, we note that trivially $Y_{s,t} = (\mathcal{I}\tilde\Xi)_{s,t}$ with $\tilde\Xi_{s,t} := Y_{s,t}$. But
$\tilde\Xi_{s,t} = \Xi_{s,t} + o(|t-s|)$ and one sees as in Remark 4.12 that $\mathcal{I}\tilde\Xi = \mathcal{I}\Xi$. □

8.8 Lyons’ original definition

A slightly different notion of solution was originally introduced in [Lyo98] by Lyons.²
This notion only uses the spaces $\mathscr{C}^\alpha$, without ever requiring the use of the spaces
$\mathscr{D}_X^{2\alpha}$ of “controlled rough paths”. Indeed, for $\mathbf{X} = (X,\mathbb{X}) \in \mathscr{C}^\alpha([0,T],V)$ and
$F \in C_b^2(V,L(V,W))$ we can define an element $\mathbf{Z} = (Z,\mathbb{Z}) = I_F(\mathbf{X}) \in \mathscr{C}^\alpha([0,T],W)$
directly by
\begin{align*}
Z_t &\stackrel{\mathrm{def}}{=} (\mathcal{I}\Xi)_{0,t} , & \Xi_{s,t} &= F(X_s)X_{s,t} + DF(X_s)\mathbb{X}_{s,t} , \\
\mathbb{Z}_{s,t} &\stackrel{\mathrm{def}}{=} (\mathcal{I}\bar\Xi^{s})_{s,t} , & \bar\Xi^{s}_{u,v} &= Z_{s,u} Z_{u,v} + \big( F(X_u) \otimes F(X_u) \big) \mathbb{X}_{u,v} .
\end{align*}
It is possible to check that $\bar\Xi^{s} \in \mathcal{C}_2^{\alpha,3\alpha}$ for every fixed s (see the proof of Theorem
4.10) so that the second line makes sense. It is also straightforward to check that
$(Z,\mathbb{Z})$ satisfies (2.1), so that it does indeed belong to $\mathscr{C}^\alpha$. Actually, one can see that
\[ Z_t = \int_0^t F(X_s)\,d\mathbf{X}_s , \qquad \mathbb{Z}_{s,t} = \int_s^t Z_{s,r} \otimes dZ_r , \]
where the integrals are defined as in the previous sections, where $F(X) \in \mathscr{D}_X^{2\alpha}$ as in
Section 7.3.
We can now define solutions to (8.6) in the following way.

Definition 8.9. A rough path $\mathbf{Y} = (Y,\mathbb{Y}) \in \mathscr{C}^\alpha([0,T],W)$ is a solution in the sense
of Lyons to (8.6) if there exists $\mathbf{Z} = (Z,\mathbb{Z}) \in \mathscr{C}^\alpha(V \oplus W)$ such that the projection
of $(Z,\mathbb{Z})$ onto $\mathscr{C}^\alpha(V)$ is equal to $(X,\mathbb{X})$, the projection onto $\mathscr{C}^\alpha(W)$ is equal to
$(Y,\mathbb{Y})$, and $\mathbf{Z} = I_F(\mathbf{Z})$ where
\[ F(x,y) = \begin{pmatrix} I & 0 \\ f(y) & 0 \end{pmatrix} . \]

It is straightforward to see that if $(Y,Y') \in \mathscr{D}_X^{2\alpha}(W)$ is a solution to (8.6) in the
sense of the previous section, then the path Z = X ⊕ Y is controlled by X. As
seen in Section 7.1, it can therefore be interpreted as an element of $\mathscr{C}^\alpha$. It follows
immediately from the definitions that it is then also a solution in the sense of Lyons.
Conversely, if $(Y,\mathbb{Y})$ is a solution in the sense of Lyons, then one can check that one
necessarily has $(Y,f(Y)) \in \mathscr{D}_X^{2\alpha}(W)$ and that this is a solution in the sense of the
previous section. We leave the verification of this fact as an exercise to the reader.

² As always, we only consider the step-2 α-Hölder case, i.e. $\alpha > 1/3$, whereas Lyons’ theory is
valid for every Hölder-exponent α ∈ (0, 1] (or: variation parameter p ≥ 1) at the complication of
having to deal with $\lfloor p \rfloor$ levels.

8.9 Stability IV: Flows

We briefly state, without proof, a result concerning regularity of flows associated to

rough differential equations, as well as local Lipschitz estimates of the Itô–Lyons
maps on the level of such flows. More precisely, given a geometric rough path
$\mathbf{X} \in \mathscr{C}_g^\alpha([0,T],\mathbb{R}^d)$, we saw in Theorem 8.4 that, for $C_b^3$ vector fields $f = (f_1,\dots,f_d)$
on $\mathbb{R}^e$, there is a unique global solution to the rough integral equation
\[ Y_t = y + \int_0^t f(Y_s)\,d\mathbf{X}_s , \qquad t \ge 0 . \tag{8.14} \]
Write $\pi_{(f)}(0,y;\mathbf{X}) = Y$ for this solution. Note that the inverse flow exists trivially,
by following the RDE driven by $\mathbf{X}(\cdot - t)$,
\[ \pi_{(f)}(0,y;\mathbf{X})_t^{-1} = \pi_{(f)}\big(0,y;\mathbf{X}(\cdot - t)\big)_t . \]
We call the map $y \mapsto \pi_{(f)}(0,y;\mathbf{X})$ the flow associated to the above RDE. Moreover,
if $\mathbf{X}^\varepsilon$ is a smooth approximation to $\mathbf{X}$ (in rough path metric), then the corresponding
ODE solution $Y^\varepsilon$ is close to Y, with a local Lipschitz estimate as given in Section
8.6.
It is natural to ask if the flow depends smoothly on y. Given a multi-index
k = (k1 , . . . , ke ) ∈ Ne , write Dk for the partial derivative with respect to y 1 , . . . , y e .
The proof of the following statement is an easy consequence of [FV10b, Chapter 12].

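Theorem 8.10. Let α ∈ (1/3, 1/2] and $\mathbf{X}, \tilde{\mathbf{X}} \in \mathscr{C}^\alpha_g$. Assume $f \in \mathcal{C}^{3+n}_b$ for some integer n. Then the associated flow is of regularity $\mathcal{C}^{n+1}$ in y, as is its inverse flow. The resulting family of partial derivatives, $\{D^k \pi_{(f)}(0, \xi; \mathbf{X}) : |k| \le n\}$, satisfies the RDE obtained by formally differentiating $dY = f(Y)\,d\mathbf{X}$.
Finally, for every M > 0 there exist C, K depending on M and the norm of f such that, whenever $|||\mathbf{X}|||_\alpha, |||\tilde{\mathbf{X}}|||_\alpha \le M < \infty$ and $|k| \le n$,

$$\begin{aligned}
\sup_{\xi \in \mathbb{R}^e} \big\| D^k \pi_{(f)}(0, \xi; \mathbf{X}) - D^k \pi_{(f)}(0, \xi; \tilde{\mathbf{X}}) \big\|_{\alpha;[0,t]} &\le C \varrho_\alpha(\mathbf{X}, \tilde{\mathbf{X}}), \\
\sup_{\xi \in \mathbb{R}^e} \big\| D^k \pi_{(f)}(0, \xi; \mathbf{X})^{-1} - D^k \pi_{(f)}(0, \xi; \tilde{\mathbf{X}})^{-1} \big\|_{\alpha;[0,t]} &\le C \varrho_\alpha(\mathbf{X}, \tilde{\mathbf{X}}), \\
\sup_{\xi \in \mathbb{R}^e} \big\| D^k \pi_{(f)}(0, \xi; \mathbf{X}) \big\|_{\alpha;[0,t]} &\le K, \\
\sup_{\xi \in \mathbb{R}^e} \big\| D^k \pi_{(f)}(0, \xi; \mathbf{X})^{-1} \big\|_{\alpha;[0,t]} &\le K.
\end{aligned}$$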

8.10 Exercises

Exercise 8.11. a) Consider the case of a smooth, one-dimensional driving signal

X : [0, T ] → R. Show that the solution map to the (ordinary) differential equation
dY = f (Y )dX, for sufficiently nice f (say bounded with bounded derivatives)
and started at some fixed point Y0 = ξ, is locally Lipschitz continuous with
respect to the driving signal in the supremum norm on [0, T ]. Conclude that it
admits a unique continuous extension to every continuous driving signal X.
b) Show by an example that, in general, no such statement holds for multi-dimen-
sional driving signals.
c) Formulate a condition on f under which the statement does still hold for multidi-
mensional driving signals.

Exercise 8.12 (Linear RDEs). Consider $f \in L(W, L(V, W))$. Give an a priori estimate for solutions to dY = f(Y)dX. Conclude with a (global) existence and uniqueness result for such linear RDEs.

Exercise 8.13 (Explicit solution, Chen–Strichartz formula). View

$$f = (f_1, \dots, f_d) \in \mathcal{C}^\infty_b\big(\mathbb{R}^e, L(\mathbb{R}^d, \mathbb{R}^e)\big)$$

as a collection of d (smooth, bounded with bounded derivatives of all orders) vector fields on $\mathbb{R}^e$. Assume that f is step-2 nilpotent in the sense that $[f_i, [f_j, f_k]] \equiv 0$ for any triple of indices i, j, k ∈ {1, ..., d}. Here, [·, ·] denotes the Lie bracket between two vector fields. Let (Y, f(Y)) be the RDE solution to $dY = f(Y)\,d\mathbf{X}$ started at some $\xi \in \mathbb{R}^e$ and assume that the rough path $\mathbf{X}$ is geometric. Give an explicit formula of the type $Y_t = \exp(\ldots)\xi$ where exp denotes the unit time solution flow along a vector field (...) which you should write down explicitly.

Exercise 8.14 (Explosion along linear-growth vector fields). Give an example of

smooth f with linear growth, and X ∈ C α so that dY = f (Y )dX started at some ξ
fails to have a global solution.

Exercise 8.15. Establish existence, continuity and stability for rough differential equations with drift (cf. (8.6)),

$$dY_t = f_0(Y_t)\,dt + f(Y_t)\,d\mathbf{X}_t. \tag{8.15}$$

You may assume $f_0 \in \mathcal{C}^3_b$ (although one can do much better and $f_0$ Lipschitz is enough). Hint: Under this assumption, one solves $dY = \bar f(Y)\,d\bar{\mathbf{X}}$ with $\bar f = (f, f_0)$ and $\bar{\mathbf{X}}$ a “space-time” rough path extension of $\mathbf{X}$.

Exercise 8.16. Let $f \in \mathcal{C}^2_b$ and assume (Y, f(Y)) is an RDE solution to (8.6), as constructed in Theorem 8.4. Show that the o-term in Davie’s definition, (8.13), can be chosen uniformly over $(X, \mathbb{X}) \in B_R$, any R < ∞, where

$$B_R := \big\{ (X, \mathbb{X}) \in \mathscr{C}^\beta : \|X\|_\beta + \|\mathbb{X}\|_{2\beta} \le R \big\}.$$

Show also that RDE solutions are β-Hölder, uniformly over $(X, \mathbb{X}) \in B_R$, any R < ∞.

Exercise 8.17. Show that $d_{X, X^n, 2\alpha}\big((Y, f(Y)), (Y^n, f(Y^n))\big) \to 0$, together with $\mathbf{X}^n \to \mathbf{X}$ in $\mathscr{C}^\beta$, implies that also $(Y^n, \mathbb{Y}^n) \to (Y, \mathbb{Y})$ in $\mathscr{C}^\alpha$. Since, at the price of replacing f by F, cf. Definition 8.9, there is no loss of generality in solving for the controlled rough path $Z = X \oplus Y$, conclude that continuity of the RDE solution map (Itô–Lyons map) also holds with Lyons’ definition of a solution.

8.11 Comments

ODEs driven by not too rough paths, i.e. paths that are α-Hölder continuous for some α > 1/2 or of finite p-variation with p < 2, understood in the (Young) integral sense, were first studied by Lyons in [Lyo94]; nonetheless, the terminology Young-ODEs is now widely used. Existence and uniqueness for such equations via Picard iterations is by now classical; our discussion in Section 8.3 is a mild variation of [LCL07, p.22], where the division property (cf. proof of Lemma 8.2) is also emphasised. Existence and uniqueness of solutions to RDEs via Picard iteration in the (Banach!) space of controlled rough paths originates in [Gub04] for regularity α ∈ (1/3, 1/2). This approach also allows one to treat arbitrary regularities, see [Gub10, Hai14c].
The continuity result of Theorem 8.5 is due to T. Lyons; proofs of uniform
continuity on bounded sets were given in [Lyo98, LQ02, LCL07]. Local Lipschitz
estimates were pointed out subsequently and in different settings by various authors
including Lyons–Qian [LQ02], Gubinelli [Gub04], Friz–Victoir [FV10b], Inahama
[Ina10], Deya et al. [DNT12a].
The name universal limit theorem was suggested by P. Malliavin, meaning con-
tinuity of the Itô–Lyons map in rough path metrics. As we tried to emphasise, the
stability in rough path metrics is seen at all levels of the theory.
Lyons’ original argument (for arbitrary regularity) also involves a Picard iteration, see e.g. [LCL07, p.88]. For regularity α > 1/3, Davie [Dav08] proves existence and uniqueness for Young resp. rough differential equations via discrete Euler resp. Milstein approximations. Using Lie group techniques, Davie’s argument was adapted to arbitrary values of α by Friz–Victoir [FV10b]. Let us also note that the regularity assumption in Theorem 8.4 ($f \in \mathcal{C}^3_b$) is not sharp; it is fairly straightforward to push the argument to γ-Lipschitz (in the sense of Stein) regularity, for any γ > 1/α. It is less straightforward [Dav08, FV10b] to show that uniqueness also holds for γ ≥ 1/α and this is optimal, with counter-examples constructed in [Dav08]. Existence results on the other hand are available for γ > (1/α) − 1. Setting α = 1, this is consistent with the theory of ODEs where it is well known that, modulo possible logarithmic divergences, Lipschitz continuity of the coefficients is required for the uniqueness of local solutions, but continuity is sufficient for their existence.
Chapter 9
Stochastic differential equations

Abstract We identify the solution to a rough differential equation driven by the

Itô or Stratonovich lift of Brownian motion with the solution to the corresponding
stochastic differential equation. In combination with continuity of the Itô–Lyons
maps, a quick proof of the Wong–Zakai theorem is given. Applications to Stroock–
Varadhan support theory and Freidlin–Wentzell large deviations are briefly discussed.

9.1 Itô and Stratonovich equations

We saw in Section 3 that d-dimensional Brownian motion lifts in an essentially canonical way to $\mathbf{B} = (B, \mathbb{B}) \in \mathscr{C}^\alpha([0,T], \mathbb{R}^d)$ almost surely, for any α ∈ (1/3, 1/2). In particular, we may use almost every realisation of $(B, \mathbb{B})$ as the driving signal of a rough differential equation. This RDE is then solved “pathwise”, i.e. for a fixed realisation of $(B(\omega), \mathbb{B}(\omega))$. Recall that the choice of $\mathbb{B}$ is never unique: two important choices are the Itô and the Stratonovich lift, written $\mathbf{B}^{\text{Itô}}$ and $\mathbf{B}^{\text{Strat}}$, where $\mathbb{B}$ is defined as $\int B \otimes dB$ and $\int B \otimes \circ\, dB$ respectively. We now discuss the interplay with classical stochastic differential equations (SDEs).
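The difference between the two lifts can be seen on a single simulated path. The following is a minimal sketch in dimension d = 1, assuming a uniform grid on [0, T] and approximating the Itô integral by left-point sums and the Stratonovich integral by trapezoidal sums (average of the endpoint values); the function names and the seed are illustrative.

```python
import math
import random

def brownian_increments(n, T, rng):
    """Independent N(0, T/n) increments of a scalar Brownian motion on a uniform grid."""
    dt = T / n
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]

def iterated_sums(dB):
    """Left-point (Ito) and trapezoidal (Stratonovich) sums approximating int_0^T B dB."""
    b, ito, strat = 0.0, 0.0, 0.0
    for d in dB:
        ito += b * d                # evaluate the integrand at the left endpoint
        strat += (b + 0.5 * d) * d  # average of left and right endpoint values
        b += d
    return b, ito, strat

rng = random.Random(0)
T, n = 1.0, 20000
dB = brownian_increments(n, T, rng)
b_T, ito, strat = iterated_sums(dB)
qv = sum(d * d for d in dB)         # discrete quadratic variation, close to T

print("Stratonovich sum:", strat, " B_T^2 / 2:", 0.5 * b_T ** 2)
print("Strat - Ito:", strat - ito, " qv / 2:", 0.5 * qv)
```

The trapezoidal sum telescopes to $B_T^2/2$, and the gap between the two sums is exactly half the discrete quadratic variation, which is close to T/2. This is the scalar instance of the relation $\mathbb{B}^{\text{Itô}}_{s,t} = \mathbb{B}^{\text{Strat}}_{s,t} - \frac12 (t-s) I$ used in the proof of Theorem 9.1.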

Theorem 9.1. Let $f \in \mathcal{C}^3_b\big(\mathbb{R}^e, L(\mathbb{R}^d, \mathbb{R}^e)\big)$, let $f_0 : \mathbb{R}^e \to \mathbb{R}^e$ be Lipschitz continuous, and let $\xi \in \mathbb{R}^e$. Then,

i) With probability one, $\mathbf{B}^{\text{Itô}}(\omega) \in \mathscr{C}^\alpha$, any α ∈ (1/3, 1/2), and there is a unique RDE solution $(Y(\omega), f(Y(\omega))) \in \mathscr{D}^{2\alpha}_{B(\omega)}$ to

$$dY = f_0(Y)\,dt + f(Y)\,d\mathbf{B}^{\text{Itô}}, \qquad Y_0 = \xi.$$

Moreover, $Y = (Y_t(\omega))$ is a strong solution to the Itô SDE $dY = f_0(Y)\,dt + f(Y)\,dB$ started at $Y_0 = \xi$.
ii) Similarly, the RDE solution driven by $\mathbf{B}^{\text{Strat}}$ yields a strong solution to the Stratonovich SDE $dY = f_0(Y)\,dt + f(Y) \circ dB$ started at $Y_0 = \xi$.


Proof. We assume zero drift $f_0$, but see Exercise 8.15. The map

$$B|_{[0,t]} \mapsto (B, \mathbb{B}^{\text{Strat}})|_{[0,t]} \in \mathscr{C}^{0,\alpha}_g\big([0,t], \mathbb{R}^d\big)$$

is measurable, where $\mathscr{C}^{0,\alpha}_g$ denotes the (separable, hence Polish) subspace of $\mathscr{C}^\alpha$ obtained by taking the closure, in α-Hölder rough path metric, of piecewise smooth paths. This follows, for instance, from Proposition 3.6. By the continuity of the Itô–Lyons map (adding a drift vector field is left as an easy exercise) the RDE solution $Y_t \in \mathbb{R}^e$ is the continuous image of the driving signal $(B, \mathbb{B}^{\text{Strat}})|_{[0,t]} \in \mathscr{C}^{0,\alpha}_g\big([0,t], \mathbb{R}^d\big)$. It follows that $Y_t$ is adapted to

$$\sigma\{B_{r,s}, \mathbb{B}_{r,s} : 0 \le r \le s \le t\} = \sigma\{B_s : 0 \le s \le t\},$$

and it suffices to apply Corollary 5.2. Since $\mathbb{B}^{\text{Itô}}_{s,t} = \mathbb{B}^{\text{Strat}}_{s,t} - \frac12 (t-s) I$, measurability is also guaranteed and we conclude with the same argument, using Proposition 5.1. □
Remark 9.2. In contrast to standard SDE theory, the present solution constructed
via RDEs is immediately well-defined as a flow, i.e. for all ξ on a common set of
probability one. The price to pay is that of C 3 regularity of f , as opposed to the mere
Lipschitz regularity required for the standard theory.

9.2 The Wong–Zakai theorem

A classical result (e.g. [IW89, p.392]) asserts that SDE approximations based on
piecewise linear approximations to the driving Brownian motions converge to the
solution of the Stratonovich equation. Using the machinery built in the previous
sections, we can now give a simple proof of this by combining Proposition 3.6,
Theorem 8.5 and the understanding that RDEs driven by BStrat yield solutions to the
Stratonovich equation (Theorem 9.1).
Theorem 9.3 (Wong–Zakai, Clark, Stroock–Varadhan). Let $f, f_0, \xi$ be as in Theorem 9.1 above. Let α < 1/2. Consider dyadic piecewise-linear approximations $(B^n)$ to B on [0, T], as defined in Proposition 3.6. Write $Y^n$ for the (random) ODE solutions to $dY^n = f_0(Y^n)\,dt + f(Y^n)\,dB^n$ and Y for the Stratonovich SDE solution to $dY = f_0(Y)\,dt + f(Y) \circ dB$, all started at ξ. Then the Wong–Zakai approximations converge a.s. to the Stratonovich solution. More precisely, with probability one,

$$\|Y - Y^n\|_{\alpha;[0,T]} \to 0.$$
The only reason for dyadic piecewise-linear approximations in the above statement
is the formulation of the martingale-based Proposition 3.6. In Section 10 we shall
present a direct analysis (going far beyond the setting of Brownian drivers) which
easily entails quantitative convergence (in probability and Lq , any q < ∞) for all
piecewise-linear approximations towards a (Gaussian) rough path.

In the forthcoming Exercise 10.14 it will be seen that (non-dyadic) piecewise linear
approximations (meshsize ∼ 1/n), viewed canonically as rough paths, converge a.s.
in C α with rate anything less than 1/2 − α. As long as α > 1/3, it then follows
from (local) Lipschitzness of the Itô–Lyons map that Wong–Zakai approximations
also converge with rate 1/2 − α−. Note that the “best” rate one obtains in this way
is 1/2 − 1/3− = 1/6−; the reason being that rate is measured in some Hölder space
with exponent 1/3+, rather than the uniform norm. The (well known) almost sure
“strong” rate 1/2− can be obtained from rough path theory at the price of working in
rough path spaces of (much) lower regularity; see [FR14].
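The Wong–Zakai phenomenon can be illustrated on the scalar linear equation dY = Y ∘ dB, Y_0 = 1, whose Stratonovich solution is exp(B_t), while the Itô equation dY = Y dB has solution exp(B_t − t/2). The following is a minimal sketch, assuming a uniform (not necessarily dyadic) grid, explicit Euler substeps to integrate the ODE along each linear piece, and a fixed seed; the linear vector field does not satisfy the boundedness assumptions of Theorem 9.1 and is used only because it has closed-form solutions.

```python
import math
import random

rng = random.Random(1)
T, n, m = 1.0, 64, 50   # n linear pieces, m ODE substeps per piece
dB = [rng.gauss(0.0, math.sqrt(T / n)) for _ in range(n)]
B_T = sum(dB)

# Wong-Zakai: solve the ODE dY = Y dB^n along the piecewise-linear interpolation B^n.
# On each piece B^n has constant slope, so explicit Euler substeps multiply by 1 + d/m.
y_wz = 1.0
for d in dB:
    for _ in range(m):
        y_wz *= 1.0 + d / m

# Euler-Maruyama on the coarse grid, which targets the Ito solution instead.
y_em = 1.0
for d in dB:
    y_em *= 1.0 + d

strat_exact = math.exp(B_T)          # Stratonovich solution at time T
ito_exact = math.exp(B_T - 0.5 * T)  # Ito solution at time T
print("Wong-Zakai:", y_wz, " Stratonovich:", strat_exact)
print("Euler-Maruyama:", y_em, " Ito:", ito_exact)
```

The ODE solution along the piecewise-linear path tracks the Stratonovich value, whereas the left-point Euler–Maruyama scheme drifts towards the Itô value; the two limits differ by the factor exp(−T/2).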

9.3 Support theorem and large deviations

We briefly discuss two fundamental results in diffusion theory and explain how the theory of rough paths provides elegant proofs, reducing a question for general diffusions to one for Brownian motion and its Lévy area.
The results discussed in this section were among the very first applications of rough path theory to stochastic analysis, see Ledoux et al. [LQZ02]. Much more on these topics is found in [FV10b], so we shall be brief. The first result, due to Stroock–Varadhan, concerns the support of diffusion processes.

Theorem 9.4 (Stroock–Varadhan support theorem). Let $f, f_0, \xi$ be as in Theorem 9.1 above. Let α < 1/2, B be a d-dimensional Brownian motion and consider the unique Stratonovich SDE solution Y on [0, T] to

$$dY = f_0(Y)\,dt + \sum_{i=1}^d f_i(Y) \circ dB^i \tag{9.1}$$

started at $Y_0 = \xi \in \mathbb{R}^e$. Write $y^h$ for the ODE solution obtained by replacing $\circ\, dB$ with $dh \equiv \dot h\,dt$, whenever $h \in H = W^{1,2}_0$, i.e. h is absolutely continuous, h(0) = 0 and $\dot h \in L^2([0,T], \mathbb{R}^d)$. Then, for every δ > 0,

$$\lim_{\varepsilon \to 0} P\Big( \|Y - y^h\|_{\alpha;[0,T]} < \delta \,\Big|\, \|B - h\|_{\infty;[0,T]} < \varepsilon \Big) = 1 \tag{9.2}$$

(where Euclidean norm is used for the conditioning $\|B - h\|_{\infty;[0,T]} < \varepsilon$). As a consequence, the support of the law of Y, viewed as measure on the pathspace $C^{0,\alpha}([0,T], \mathbb{R}^e)$, is precisely the α-Hölder closure of $\{y^h : \dot h \in L^2([0,T], \mathbb{R}^d)\}$.

Proof. Using Theorem 9.1 we can and will take Y as RDE solution driven by $\mathbf{B}^{\text{Strat}}(\omega)$. For h ∈ H and some fixed α ∈ (1/3, 1/2), we furthermore denote by $S^{(2)}(h) = \big(h, \int h \otimes dh\big) \in \mathscr{C}^{0,\alpha}_g$ the canonical lift given by computing the iterated integrals using usual Riemann-Stieltjes integration. It was then shown in [FLS06]¹ that for every δ > 0,

$$\lim_{\varepsilon \to 0} P\Big( \varrho_{\alpha;[0,T]}\big(\mathbf{B}^{\text{Strat}}, S^{(2)}(h)\big) < \delta \,\Big|\, \|B - h\|_{\infty;[0,T]} < \varepsilon \Big) = 1. \tag{9.3}$$

The conditional statement then follows easily from continuity of the Itô–Lyons map and so yields the “difficult” support inclusion: every $y^h$ is in the support of Y. The easy inclusion, support of Y contained in the closure of $\{y^h\}$, follows from the Wong–Zakai theorem, Theorem 9.3. If one is only interested in the support statement, but without the conditional statement (9.2), there are “softer” proofs; see Exercise 9.6 below. □
The second result to be discussed here, due to Freidlin–Wentzell, concerns the behaviour of diffusion in the singular (ε → 0) limit when B is replaced by εB. We assume the reader is familiar with large deviation theory.
Theorem 9.5 (Freidlin–Wentzell large deviations). Let $f, f_0, \xi$ be as in Theorem 9.1 above. Let α < 1/2, B be a d-dimensional Brownian motion and consider the unique Stratonovich SDE solution $Y = Y^\varepsilon$ on [0, T] to

$$dY = f_0(Y)\,dt + \sum_{i=1}^d f_i(Y) \circ \varepsilon\, dB^i \tag{9.4}$$

started at $Y_0 = \xi \in \mathbb{R}^e$. Write $Y^h$ for the ODE solution obtained by replacing $\circ\, \varepsilon\, dB$ with dh where $h \in H = W^{1,2}_0$. Then $(Y^\varepsilon_t : 0 \le t \le T)$ satisfies a large deviation principle (in α-Hölder topology) with good rate function on pathspace given by

$$J(y) = \inf\{ I(h) : Y^h = y \}.$$

Here I is Schilder’s rate function for Brownian motion, i.e. $I(h) = \frac12 \|\dot h\|^2_{L^2([0,T], \mathbb{R}^d)}$ for h ∈ H and I(h) = +∞ otherwise.
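For a concrete feel for Schilder's rate function, it can be evaluated exactly on piecewise-linear paths. The following is a minimal sketch, assuming h is given by its values on a uniform grid of [0, T] and is linear in between; the function name `schilder_rate` and the test path are illustrative.

```python
import math

def schilder_rate(h, T):
    """I(h) = 0.5 * int_0^T |h'(t)|^2 dt for a piecewise-linear path h.

    h is a list of d-dimensional points sampled on a uniform grid of [0, T].
    Paths not started at 0 lie outside H and receive the value +infinity.
    """
    n = len(h) - 1
    dt = T / n
    if any(c != 0.0 for c in h[0]):
        return math.inf
    total = 0.0
    for k in range(n):
        # on each piece the derivative is constant, equal to (h[k+1] - h[k]) / dt
        total += sum((b - a) ** 2 for a, b in zip(h[k], h[k + 1])) / dt
    return 0.5 * total

n, T = 10, 1.0
h = [(k / n, 2.0 * k / n) for k in range(n + 1)]  # h(t) = (t, 2t)
rate = schilder_rate(h, T)                        # 0.5 * (1 + 4) * T
rate_scaled = schilder_rate([(2.0 * a, 2.0 * b) for a, b in h], T)
rate_shifted = schilder_rate([(a + 1.0, b) for a, b in h], T)
print(rate, rate_scaled, rate_shifted)
```

The quadratic scaling I(ch) = c² I(h) visible here matches the ε² speed of the large deviation principle for εB.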
Proof. The key remark is that large deviation principles are robust under continuous maps, a simple fact known as the contraction principle. The problem is then reduced to establishing a suitable large deviation principle for the Stratonovich lift of εB (which is exactly $\delta_\varepsilon \mathbf{B}^{\text{Strat}}$) in the α-Hölder rough path topology. Readers familiar with general facts of large deviation theory, in particular the inverse and generalized contraction principles, are invited to complete the proof along Exercise 9.7 below. □

9.4 Exercises

Exercise 9.6 (support of Brownian rough path, see [FV10b]). Fix α ∈ (1/3, 1/2) and view the law µ of $\mathbf{B}^{\text{Strat}}$ as probability measure on the Polish space $\mathscr{C}^{0,\alpha}_{g,0}$, the (closed) subspace of $\mathscr{C}^{0,\alpha}_g$ of rough paths $\mathbf{X}$ started at $X_0 = 0$. Show that $\mathbf{B}^{\text{Strat}}$ has full support. The “easy” inclusion, supp µ ⊂ $\mathscr{C}^{0,\alpha}_g$, is clear from Proposition 3.6. For the other inclusion, recall the translation operator from Exercise 2.19 and follow the steps below.
a) (Cameron–Martin theorem for Brownian rough path) Let $h \in H = W^{1,2}_0$. Show that $\mathbf{X} \in$ supp µ implies $T_h(\mathbf{X}) \in$ supp µ.
b) Show that the support of µ contains at least one point, say $\hat{\mathbf{X}} \in \mathscr{C}^{0,\alpha}_g$, with the property that there exists a sequence of Lipschitz paths $(h^{(n)})$ so that $T_{h^{(n)}}(\hat{\mathbf{X}}) \to (0, 0)$ in α-Hölder rough path metric. Hint: Almost every realization of $\mathbf{B}^{\text{Strat}}(\omega)$ will do, with $-h^{(n)} = B^{(n)}$, the dyadic piecewise-linear approximations from Proposition 3.6.
c) Conclude that $(0, 0) = \lim_{n \to \infty} T_{h^{(n)}}(\hat{\mathbf{X}}) \in$ supp µ.
d) As a consequence, any $\big(h, \int h \otimes dh\big) = T_h(0, 0) \in$ supp µ, for any h ∈ H, and taking the closure yields the “difficult” inclusion.
e) Appeal to continuity of the Itô–Lyons map to obtain the “difficult” support inclusion (“every $y^h$ is in the support of Y”) in the context of Theorem 9.4.

1 Strictly speaking, the conditional estimate (9.3) was shown in [FLS06] for $h \in C^2$; the extension to h ∈ H is non-trivial and found in [FV10b].

Exercise 9.7 (“Schilder” large deviations, see [FV10b]). Fix α ∈ (1/3, 1/2) and consider

$$\delta_\varepsilon \mathbf{B}^{\text{Strat}} = \big(\varepsilon B, \varepsilon^2 \mathbb{B}^{\text{Strat}}\big),$$

the laws of which are viewed as probability measures $\mu^\varepsilon$ on the Polish space $\mathscr{C}^{0,\alpha}_{g,0}$. Show that $(\mu^\varepsilon : \varepsilon > 0)$ satisfies a large deviation principle in α-Hölder rough path topology with good rate function

$$J(\mathbf{X}) = I(X),$$

where $\mathbf{X} = (X, \mathbb{X})$ and I is Schilder’s rate function for Brownian motion, i.e. $I(h) = \frac12 \|\dot h\|^2_{L^2([0,T], \mathbb{R}^d)}$ for $h \in H = W^{1,2}_0$ and I(h) = +∞ otherwise.
Hint: Thanks to Fernique estimates for the homogeneous rough paths norm of $\mathbf{B}^{\text{Strat}}$ (which can be obtained by carefully tracking the moment-growth in Theorem 3.1 applied to $\mathbf{B}^{\text{Strat}}$; alternatively see Theorem 11.9 below for an elegant Gaussian argument) it is actually enough to establish a large deviation principle for $(\delta_\varepsilon \mathbf{B}^{\text{Strat}} : \varepsilon > 0)$ in the (much coarser) uniform topology, which is not very hard to do “by hand”, cf. [FV10b].

9.5 Comments

Lyons [Lyo98] used the Wong–Zakai theorem in conjunction with his continuity result to deduce the fact that RDE solutions (driven by the Brownian rough path $\mathbf{B}^{\text{Strat}}$) coincide with solutions to (Stratonovich) stochastic differential equations. Similar to Friz–Victoir [FV10b], the logic is reversed here: thanks to an a priori identification of $\int f(Y)\,d\mathbf{B}^{\text{Strat}}$ as a Stratonovich stochastic integral, the Wong–Zakai result is obtained. Almost sure rates for Wong–Zakai approximations in Brownian (and then more general Gaussian) situations were studied by Hu–Nualart [HN09], Deya, Tindel and Neuenkirch [DNT12b] and Friz–Riedel [FR14]; see also Riedel–Xu [RX13]. Let us also note that $L^q$-rates for the convergence of approximations are not easy to obtain with rough path techniques (in contrast to Itô-calculus which is ideally suited for moment calculations). Nonetheless, such rates can be obtained by Gaussian techniques, as discussed in Section 11.2.3 below; applications include multi-level Monte Carlo for rough differential equations [BFRS13]. The material in Section 9.3 goes back to Ledoux, Qian and Zhang ([LQZ02]; in p-variation). The results in stronger Hölder topology are due to Friz and Victoir [Fri05, FV05, FV07, FV10b]; the conditional estimate (9.3) is due to Friz, Lyons and Stroock [FLS06].
Chapter 10
Gaussian rough paths

Abstract We investigate when multidimensional stochastic processes can be viewed

– in a “canonical” fashion – as random rough paths. Gaussianity only enters through
equivalence of moments. A simple criterion is given which applies in particular to
fractional Brownian motion with suitable Hurst parameter.

10.1 A simple criterion for Hölder regularity

We now consider a driving signal modelled by a continuous, centred Gaussian process with values in $V = \mathbb{R}^d$. We thus have continuous sample paths

$$X(\omega) : [0, T] \to \mathbb{R}^d$$

and may take the underlying probability space as $C([0,T], \mathbb{R}^d)$, equipped with a Gaussian measure µ so that $X_t(\omega) = \omega(t)$. Recall that µ, the law of X, is fully determined by its covariance function

$$R : [0,T]^2 \to \mathbb{R}^{d \times d}, \qquad (s, t) \mapsto E[X_s \otimes X_t].$$

In this section, a major role will be played by the rectangular increments of the covariance, namely

$$R\begin{pmatrix} s, t \\ s', t' \end{pmatrix} \overset{\text{def}}{=} E[X_{s,t} \otimes X_{s',t'}].$$

As far as the Hölder regularity of sample paths is concerned, we have the following classical result, which is nothing but a special case of Kolmogorov’s continuity criterion:

Proposition 10.1. Assume there exist positive ϱ and M such that for every 0 ≤ s ≤ t ≤ T,

$$\left| R\begin{pmatrix} s, t \\ s, t \end{pmatrix} \right| \le M |t - s|^{1/\varrho}. \tag{10.1}$$

Then, for every α < 1/(2ϱ) there exists $K_\alpha \in L^q$, for all q < ∞, such that

$$|X_{s,t}(\omega)| \le K_\alpha(\omega) |t - s|^\alpha.$$

Proof. We may argue componentwise and thus take d = 1 without loss of generality. Since

$$|X_{s,t}|_{L^2} = \big(E[X_{s,t} X_{s,t}]\big)^{1/2} \le \left| R\begin{pmatrix} s, t \\ s, t \end{pmatrix} \right|^{1/2} \le M^{1/2} |t - s|^{\frac{1}{2\varrho}}$$

and $|X_{s,t}|_{L^q} \le c_q |X_{s,t}|_{L^2}$ by Gaussianity, we conclude immediately with an application of the Kolmogorov criterion. □

Whenever the above proposition applies with ϱ < 1, the resulting sample paths can be taken with Hölder exponent $\alpha \in \big(\frac12, \frac{1}{2\varrho}\big)$; differential equations driven by X can then be handled with Young’s theory, cf. Section 8.3. Therefore, our focus will be on Gaussian processes which satisfy a suitable modification of condition (10.1) with ϱ ≥ 1 such that the process X allows for a probabilistic construction of a suitable second order process¹

$$\mathbb{X}(\omega) : [0,T]^2 \to \mathbb{R}^{d \times d},$$

which is tantamount to making sense of the “formal” stochastic integrals

$$\int_s^t X^i_{s,r}\, dX^j_r \qquad \text{for } 0 \le s < t \le T,\ 1 \le i, j \le d, \tag{10.2}$$

such that almost every realisation $\mathbb{X}(\omega)$ satisfies the algebraic and analytical properties of Section 2, notably (2.1) and (2.3) for some α ∈ (1/3, 1/2). We shall also look for $(X, \mathbb{X})$ as (random) geometric rough path; thanks to (2.5), only the case i < j in (10.2) then needs to be considered.
At the risk of being repetitive, the reader should keep in mind the following three points: (i) the sample paths X(ω) will not have, in general, enough regularity to define (10.2) as Young integrals; (ii) the process X will not be, in general, a semimartingale, so (10.2) cannot be defined using classical stochastic integrals; (iii) a lift of the process X to $(X, \mathbb{X}) \in \mathscr{C}^\alpha_g$ for some α ∈ (1/3, 1/2), if at all possible, will never be unique (as discussed in Chapter 2, one can always perturb the area, i.e. Anti($\mathbb{X}$), by the increments of a 2α-Hölder path). But there might still be one distinguished canonical choice for $\mathbb{X}$, in the same way as $\mathbb{B}^{\text{Strat}}$ is canonically obtained as limit (in probability) of $\int B^n \otimes dB^n$, for many natural approximations $B^n$ of Brownian motion B.

1 Despite the two parameters (s, t) one should not think of a random field here: as was noted in Exercise 2.7, $(X, \mathbb{X})$ is really a path.

10.2 Stochastic integration and variation regularity of the covariance

Our standing assumption from here on is independence of the d components of X, which is tantamount to saying that the covariance takes values in the diagonal matrices. Basic examples to have in mind are d-dimensional standard Brownian motion B with

$$R(s, t) = (s \wedge t) \times I_d \in \mathbb{R}^{d \times d}$$

(here $I_d$ denotes the identity matrix in $\mathbb{R}^{d \times d}$) or fractional Brownian motion $B^H$, with

$$R^H(s, t) = \frac12 \Big[ s^{2H} + t^{2H} - |t - s|^{2H} \Big] \times I_d \in \mathbb{R}^{d \times d}$$

where H ∈ (0, 1); note the implication $E\big(B^H_t - B^H_s\big)^2 = |t - s|^{2H}$ for each component. The reader should observe that Proposition 10.1 above applies with ϱ = 1/(2H); the focus on ϱ ≥ 1 (to avoid trivial situations covered by Young theory) translates to H ≤ 1/2.
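The claim that Proposition 10.1 applies with ϱ = 1/(2H) comes down to a one-line computation with rectangular increments, which the following minimal sketch carries out numerically for a scalar component. The helper names `fbm_cov` and `rect_increment` and the chosen values of H, s, t are illustrative.

```python
def fbm_cov(s, t, H):
    """Covariance of one component of fractional Brownian motion with Hurst parameter H."""
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))

def rect_increment(R, s, t, s2, t2):
    """Rectangular increment of R over [s, t] x [s2, t2], i.e. E[X_{s,t} X_{s2,t2}]."""
    return R(t, t2) - R(s, t2) - R(t, s2) + R(s, s2)

H = 0.4
R = lambda s, t: fbm_cov(s, t, H)
s, t = 0.3, 0.8

var_increment = rect_increment(R, s, t, s, t)  # E[(B_t^H - B_s^H)^2]
rho = 1.0 / (2.0 * H)                          # exponent in (10.1): |t - s|^(1/rho)
disjoint = rect_increment(R, 0.0, 0.5, 0.5, 1.0)

print(var_increment, abs(t - s) ** (1.0 / rho))
print("covariance of disjoint increments:", disjoint)
```

For H < 1/2 the increments over disjoint intervals are negatively correlated, and for H = 1/2 the covariance reduces to s ∧ t, the Brownian case.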
We return to the task of making sense of (10.2), componentwise for fixed i < j, and it will be enough to do so for the unit interval; the interval [s, t] is handled by considering $\big(X_{s + \tau(t-s)} : 0 \le \tau \le 1\big)$. Writing $(X, \tilde X)$, rather than $(X^i, X^j)$, we attempt a definition of the form

$$\int_0^1 X_{0,u}\, d\tilde X_u \overset{\text{def}}{=} \lim_{|\mathcal{P}| \downarrow 0} \sum_{[s,t] \in \mathcal{P}} X_{0,\xi} \tilde X_{s,t} \qquad \text{with } \xi \in [s, t], \tag{10.3}$$

where the limit is understood in probability, say. Classical stochastic analysis (e.g. [RY91, p144]) tells us that care is necessary: if $X, \tilde X$ are semimartingales, the choice ξ = s (“left-point evaluation”) leads to the Itô integral; ξ = t (“right-point evaluation”) to the backward Itô integral; and ξ = (s + t)/2 to the Stratonovich integral. On the other hand, all these integrals only differ by a bracket term $\langle X, \tilde X \rangle$ which vanishes if $X, \tilde X$ are independent. While we do not assume a semimartingale structure here, we do have the standing assumption of componentwise independence. This suggests a Riemann sum approximation of (10.2) in which we expect the precise point of evaluation to play no rôle; we thus consider left-point evaluation (but mid- or right-point evaluation would lead to the same result; cf. Exercise 10.18, (ii) below).
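That the point of evaluation plays no rôle under independence can be observed numerically. The following is a minimal sketch, assuming two independent scalar Brownian motions sampled on a uniform grid of [0, 1] with a fixed seed; the gap between the right-point and left-point sums is the discrete bracket $\sum X_{s,t} \tilde X_{s,t}$, whose variance is 1/n.

```python
import math
import random

rng = random.Random(2)
n = 20000
dt = 1.0 / n
dX = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]        # increments of X
dX_tilde = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]  # increments of X~, independent

left, right, x = 0.0, 0.0, 0.0
for a, b in zip(dX, dX_tilde):
    left += x * b    # integrand evaluated at the left endpoint
    x += a
    right += x * b   # integrand evaluated at the right endpoint

gap = right - left   # equals the discrete bracket sum of dX * dX~
bracket = sum(a * b for a, b in zip(dX, dX_tilde))
print("left:", left, " right:", right, " gap:", gap)
```

For a single Brownian motion integrated against itself the same gap would be the quadratic variation, close to 1, which is why the choice of evaluation point matters in the semimartingale case without independence.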
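Given partitions $\mathcal{P}, \mathcal{P}'$ of [0, 1] we set

$$\int_{\mathcal{P}} X_{0,s}\, d\tilde X_s := \sum_{[s,t] \in \mathcal{P}} X_{0,s} \tilde X_{s,t},$$

so that under the assumption that X and $\tilde X$ are independent, we have

$$E\left[ \int_{\mathcal{P}} X_{0,s}\, d\tilde X_s \int_{\mathcal{P}'} X_{0,s}\, d\tilde X_s \right] = \sum_{\substack{[s,t] \in \mathcal{P} \\ [s',t'] \in \mathcal{P}'}} R\begin{pmatrix} 0, s \\ 0, s' \end{pmatrix} \tilde R\begin{pmatrix} s, t \\ s', t' \end{pmatrix}. \tag{10.4}$$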

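On the right-hand side we recognise a 2D Riemann-Stieltjes sum and set

$$\int_{\mathcal{P} \times \mathcal{P}'} R\, d\tilde R := \sum_{\substack{[s,t] \in \mathcal{P} \\ [s',t'] \in \mathcal{P}'}} R\begin{pmatrix} 0, s \\ 0, s' \end{pmatrix} \tilde R\begin{pmatrix} s, t \\ s', t' \end{pmatrix}.$$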
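The 2D Riemann-Stieltjes sum can be evaluated directly for a concrete covariance. The following is a minimal sketch, assuming both processes are standard scalar Brownian motions (covariance s ∧ t) and both partitions are the uniform partition of [0, 1] into n intervals; the helper names are illustrative.

```python
def bm_cov(s, t):
    """Covariance of scalar standard Brownian motion."""
    return min(s, t)

def rect_increment(R, s, t, s2, t2):
    """Rectangular increment of R over [s, t] x [s2, t2]."""
    return R(t, t2) - R(s, t2) - R(t, s2) + R(s, s2)

def riemann_stieltjes_2d(R, R_tilde, P, P2):
    """Sum over [s,t] in P and [s2,t2] in P2 of R(0,s; 0,s2) * R_tilde(s,t; s2,t2)."""
    total = 0.0
    for s, t in zip(P, P[1:]):
        for s2, t2 in zip(P2, P2[1:]):
            total += (rect_increment(R, 0.0, s, 0.0, s2)
                      * rect_increment(R_tilde, s, t, s2, t2))
    return total

n = 100
P = [k / n for k in range(n + 1)]
value = riemann_stieltjes_2d(bm_cov, bm_cov, P, P)
print(value, (n - 1) / (2 * n))
```

For Brownian motion only the diagonal cells contribute, so the sum equals $\sum_k (k/n)(1/n) = (n-1)/(2n)$, which tends to 1/2, the second moment $\int_0^1 s\, ds$ of the Itô integral of one Brownian motion against an independent one.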

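Let us now assume that R has finite ϱ-variation in the sense $\|R\|_{\varrho;[0,1]^2} < \infty$ where the ϱ-variation on a rectangle $I \times I'$ is given by

$$\|R\|_{\varrho; I \times I'} := \sup_{\substack{\mathcal{P} \subset I \\ \mathcal{P}' \subset I'}} \left( \sum_{\substack{[s,t] \in \mathcal{P} \\ [s',t'] \in \mathcal{P}'}} \left| R\begin{pmatrix} s, t \\ s', t' \end{pmatrix} \right|^{\varrho} \right)^{1/\varrho} < \infty, \tag{10.5}$$

and similarly for $\tilde R$, with $\theta = 1/\varrho + 1/\tilde\varrho > 1$. A generalisation of Young’s maximal inequality due to Towghi [Tow02] states that²

$$\sup_{\substack{\mathcal{P} \subset I \\ \mathcal{P}' \subset I'}} \left| \int_{\mathcal{P} \times \mathcal{P}'} R\, d\tilde R \right| \le C(\theta)\, \|R\|_{\varrho; I \times I'}\, \|\tilde R\|_{\tilde\varrho; I \times I'}.$$

In particular, if the covariance of $\tilde X$ has similar variation regularity as X, the condition simplifies to ϱ < 2 and we obtain the following $L^2$-maximal inequality.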
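Lemma 10.2. Let $X, \tilde X$ be independent, continuous, centred Gaussian processes with respective covariances $R, \tilde R$ of finite ϱ-variation, some ϱ < 2. Then

$$\sup_{\mathcal{P} \subset [0,1]} E\left[ \left( \int_{\mathcal{P}} X_{0,r}\, d\tilde X_r \right)^2 \right] \le C\, \|R\|_{\varrho;[0,1]^2}\, \|\tilde R\|_{\varrho;[0,1]^2},$$

where the constant C depends on ϱ.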

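We can now show existence of (10.3) as $L^2$-limit.
Proposition 10.3. Under the assumptions of the previous lemma,

$$\lim_{\varepsilon \to 0}\ \sup_{\substack{\mathcal{P}, \mathcal{P}' \subset [0,1]: \\ |\mathcal{P}| \vee |\mathcal{P}'| < \varepsilon}} \left| \int_{\mathcal{P}} X_{0,r}\, d\tilde X_r - \int_{\mathcal{P}'} X_{0,r}\, d\tilde X_r \right|_{L^2} = 0. \tag{10.6}$$

Hence, $\int_0^1 X_{0,r}\, d\tilde X_r$ exists as the $L^2$-limit of $\int_{\mathcal{P}} X_{0,r}\, d\tilde X_r$ as $|\mathcal{P}| \downarrow 0$ and

$$E\left[ \left( \int_0^1 X_{0,r}\, d\tilde X_r \right)^2 \right] \le C\, \|R\|_{\varrho;[0,1]^2}\, \|\tilde R\|_{\varrho;[0,1]^2} \tag{10.7}$$

with a constant C = C(ϱ).

2 This holds more generally if R is evaluated at $[0, \xi] \times [0, \xi']$ where $\xi \in [s, t]$, $\xi' \in [s', t']$.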

Proof. At first glance, the situation looks similar to Young’s part in the proof of Theorem 4.10 where we deduce (4.12) from Young’s maximal inequality. However, the same argument fails if re-run with $\Xi_{s,t} = X_{0,s} \tilde X_{s,t}$ and | · | replaced by $| \cdot |_{L^2}$; in effect, the triangle inequality is too crude and does not exploit probabilistic cancellations present here. We now present two arguments for the key estimate (10.6).
First argument: at the price of adding/subtracting $\mathcal{P} \cap \mathcal{P}'$, we may assume without loss of generality that $\mathcal{P}'$ refines $\mathcal{P}$. This allows us to write

$$\int_{\mathcal{P}'} X_{0,r}\, d\tilde X_r - \int_{\mathcal{P}} X_{0,r}\, d\tilde X_r = \sum_{[u,v] \in \mathcal{P}} \int_{\mathcal{P}' \cap [u,v]} X_{u,r}\, d\tilde X_r \overset{\text{def}}{=} I,$$

and we need to show convergence of I to zero in $L^2$ as $|\mathcal{P}| = |\mathcal{P}| \vee |\mathcal{P}'| \to 0$. To see this, we rewrite the expectation of the square of this quantity as

$$\begin{aligned}
E I^2 &= \sum_{[u,v] \in \mathcal{P}} \sum_{[u',v'] \in \mathcal{P}} E\left( \int_{\mathcal{P}' \cap [u,v]} X_{u,r}\, d\tilde X_r \int_{\mathcal{P}' \cap [u',v']} X_{u',r'}\, d\tilde X_{r'} \right) \\
&= \sum_{[u,v] \in \mathcal{P}} \sum_{[u',v'] \in \mathcal{P}} \int_{\mathcal{P}' \cap [u,v] \times \mathcal{P}' \cap [u',v']} R\, d\tilde R.
\end{aligned}$$

Thanks to Towghi’s maximal inequality, the absolute value of this term is bounded from above by a constant C = C(ϱ) times

$$\begin{aligned}
&\sum_{[u,v] \in \mathcal{P}} \sum_{[u',v'] \in \mathcal{P}} \|R\|_{\varrho;[u,v] \times [u',v']}\, \|\tilde R\|_{\varrho;[u,v] \times [u',v']} \\
&\qquad \le \sum_{[u,v] \in \mathcal{P}} \sum_{[u',v'] \in \mathcal{P}} \omega([u,v] \times [u',v'])^{1/\varrho}\, \tilde\omega([u,v] \times [u',v'])^{1/\varrho},
\end{aligned}$$

where $\omega = \omega([s,t] \times [s',t'])$ (and similarly for $\tilde\omega$) is a so-called 2D control [FV11]: super-additive, continuous and zero when s = t or s' = t'. A possible choice, if finite, is

$$\omega([s,t] \times [s',t']) \overset{\text{def}}{=} \sup_{\mathcal{Q} \subset [s,t] \times [s',t']} \sum_{[u,v] \times [u',v'] \in \mathcal{Q}} \left| R\begin{pmatrix} u, v \\ u', v' \end{pmatrix} \right|^{\varrho}. \tag{10.8}$$

The difference to (10.5) is that the sup is taken over all (finite) partitions $\mathcal{Q}$ of $[s,t] \times [s',t']$ into rectangles; not just “grid-like” partitions induced by $\mathcal{P} \times \mathcal{P}'$. At this stage it looks like one should change the assumption “covariance of finite ϱ-variation” to “finite controlled ϱ-variation”, which by definition means $\omega([0,1]^2) < \infty$. But in fact there is little difference [FV11]: finite controlled ϱ-variation trivially implies finite ϱ-variation; conversely, finite ϱ-variation implies finite controlled ϱ'-variation, any ϱ' > ϱ. Since (10.6) does not depend on ϱ, we may as well (at the price of replacing ϱ by ϱ') assume finite controlled ϱ-variation. The Cauchy–Schwarz inequality for finite sums shows that $\bar\omega := \omega^{1/2} \tilde\omega^{1/2}$ is again a 2D control; the above estimates can then be continued to

$$\begin{aligned}
E I^2 &\le C \sum_{[u,v] \in \mathcal{P}} \sum_{[u',v'] \in \mathcal{P}} \bar\omega([u,v] \times [u',v'])^{2/\varrho} \\
&\le C \max_{\substack{[u,v] \in \mathcal{P} \\ [u',v'] \in \mathcal{P}}} \bar\omega([u,v] \times [u',v'])^{\frac{2 - \varrho}{\varrho}} \times \sum_{[u,v] \in \mathcal{P}} \sum_{[u',v'] \in \mathcal{P}} \bar\omega([u,v] \times [u',v']) \\
&\le o(1) \times \bar\omega([0,1] \times [0,1]),
\end{aligned}$$

where we used the facts that $|\mathcal{P}| \downarrow 0$, ϱ < 2 and super-additivity of $\bar\omega$ to obtain the last inequality. This is precisely the required bound. The second argument makes use of Riemann-Stieltjes theory, applicable after mollification of $\tilde X$, and a uniformity property of ϱ-variation upon mollification. Let thus $\tilde X^n := \tilde X * f_n$ denote the convolution of $t \mapsto \tilde X_t$ with $(f_n)$, a family of smooth, compactly supported probability density functions, weakly convergent to a Dirac at 0. Writing $\tilde R^n_{s,t} := E\big[\tilde X^n_s \tilde X^n_t\big]$ for the covariance of $\tilde X^n$, and also $\tilde S^n_{s,t} := E\big[\tilde X_s \tilde X^n_t\big]$ for the “mixed” covariance, we leave the fact that

$$\sup_n \|\tilde R^n\|_{\varrho;[0,1]^2},\ \sup_n \|\tilde S^n\|_{\varrho;[0,1]^2} \le \|\tilde R\|_{\varrho;[0,1]^2} \tag{10.9}$$

as an easy exercise for the reader. (Hint: Note $\tilde R^n = \tilde R * (f_n \otimes f_n)$, $\tilde S^n = \tilde R * (\delta \otimes f_n)$; estimate then the rectangular increments of $\tilde R^n$, respectively $\tilde S^n$, to the power ϱ with Jensen’s inequality.)
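Since $\tilde X^n$ has finite variation sample paths, basic Riemann-Stieltjes theory implies

$$\int_{\mathcal{P}} X_{0,r}\, d\tilde X^n_r \to \int X_{0,r}\, d\tilde X^n_r \qquad \text{as } |\mathcal{P}| \to 0. \tag{10.10}$$

In fact, this convergence (n fixed) also takes place in $L^2$, which may be seen as a consequence of Lemma 10.2. On the other hand, pick ϱ' ∈ (ϱ, 2) and apply Lemma 10.2 to obtain³

$$\begin{aligned}
\sup_{\mathcal{P}} \left| \int_{\mathcal{P}} X_{0,r}\, d\tilde X_r - \int_{\mathcal{P}} X_{0,r}\, d\tilde X^n_r \right|^2_{L^2} &\le C\, \|R_X\|_{\varrho';[0,1]^2}\, \|R_{\tilde X - \tilde X^n}\|_{\varrho';[0,1]^2} \\
&\le C\, \|R_X\|_{\varrho';[0,1]^2}\, \|R_{\tilde X - \tilde X^n}\|^{\varrho/\varrho'}_{\varrho;[0,1]^2}\, \|R_{\tilde X - \tilde X^n}\|^{1 - \varrho/\varrho'}_{\infty;[0,1]^2},
\end{aligned} \tag{10.11}$$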

where C = C(ϱ). Now ϱ' > ϱ implies $\|R_X\|_{\varrho';[0,1]^2} \le \|R_X\|_{\varrho;[0,1]^2}$ (immediate consequence of $|x|_{\varrho'} \le |x|_{\varrho} \equiv \big(\sum_{i=1}^m |x_i|^{\varrho}\big)^{1/\varrho}$ on $\mathbb{R}^m$) and thanks to (10.9) we also have the (uniform in n) estimate

$$\|R_{\tilde X - \tilde X^n}\|_{\varrho;[0,1]^2} \le C_\varrho \Big( \|R_{\tilde X}\|_{\varrho;[0,1]^2} + 2 \|\tilde S^n\|_{\varrho;[0,1]^2} + \|R_{\tilde X^n}\|_{\varrho;[0,1]^2} \Big) \le 4 C_\varrho\, \|\tilde R\|_{\varrho;[0,1]^2}.$$

Since $\tilde X^n$ converges to $\tilde X$ uniformly and in $L^2$, it is not hard to see that $R_{\tilde X - \tilde X^n} \to 0$ uniformly on $[0,1]^2$. We then see that (10.11) tends to zero as n → ∞. It is now an elementary exercise to combine this with (10.10) to conclude the (second) proof of (10.6).
At last, the $L^2$-estimate is an immediate corollary of the maximal inequality given in Lemma 10.2 and $L^2$-convergence of the approximating Riemann-Stieltjes sums. □

3 Define $|f|_{\infty;[0,1]^2} = \sup \left| f\begin{pmatrix} u, v \\ u', v' \end{pmatrix} \right|$ where the sup is taken over all $[u,v], [u',v'] \subset [0,1]$.

Note that there was nothing special about the time horizon [0, 1] in the above discussion. Indeed, given any time horizon [s, t] of interest, it suffices to apply the same argument to the process $\big(X_{s + \tau(t-s)} : 0 \le \tau \le 1\big)$. Since variation norms are conveniently invariant under reparametrisation, (10.7) translates immediately to an estimate of the form

$$E\left[ \left( \int_s^t X_{s,r}\, d\tilde X_r \right)^2 \right] \le C\, \|R\|_{\varrho;[s,t]^2}\, \|\tilde R\|_{\varrho;[s,t]^2}, \tag{10.12}$$

first for the approximating Riemann-Stieltjes sums and then for their $L^2$-limits.

Theorem 10.4. Let $(X_t : 0 \le t \le T)$ be a d-dimensional, continuous, centred Gaussian process with independent components and covariance R such that there exists ϱ ∈ [1, 2) and M < ∞ such that for every i ∈ {1, ..., d} and 0 ≤ s ≤ t ≤ T,

$$\|R_{X^i}\|_{\varrho;[s,t]^2} \le M |t - s|^{1/\varrho}. \tag{10.13}$$

Define, for 1 ≤ i < j ≤ d and 0 ≤ s ≤ t ≤ T, in $L^2$-sense (cf. Proposition 10.3),

$$\mathbb{X}^{i,j}_{s,t} := \lim_{|\mathcal{P}| \to 0} \int_{\mathcal{P}} \big( X^i_r - X^i_s \big)\, dX^j_r,$$

and then also (the algebraic conditions (2.1) and (2.5) leave no other choice!)

$$\mathbb{X}^{i,i}_{s,t} := \frac12 \big( X^i_{s,t} \big)^2 \qquad \text{and} \qquad \mathbb{X}^{j,i}_{s,t} := -\mathbb{X}^{i,j}_{s,t} + X^i_{s,t} X^j_{s,t}. \tag{10.14}$$

Then, the following properties hold:
a) For every q ∈ [1, ∞) there exists $C_1 = C_1(q, \varrho, d, T)$ such that for all 0 ≤ s ≤ t ≤ T,

$$E\Big( |X_{s,t}|^{2q} + |\mathbb{X}_{s,t}|^{q} \Big) \le C_1 M^q |t - s|^{q/\varrho}. \tag{10.15}$$

b) There exists a continuous modification of $\mathbb{X}$, denoted by the same letter from here on. Moreover, for any α < 1/(2ϱ) and q ∈ [1, ∞) there exists $C_2 = C_2(q, \varrho, d, \alpha)$ such that

$$E\Big( \|X\|^{2q}_{\alpha} + \|\mathbb{X}\|^{q}_{2\alpha} \Big) \le C_2 M^q. \tag{10.16}$$

c) For any $\alpha < \frac{1}{2\varrho}$, with probability one, the pair $(X, \mathbb{X})$ satisfies conditions (2.1), (2.3) and (2.5). In particular, for $\varrho \in [1, \frac32)$ and any $\alpha \in \big(\frac13, \frac{1}{2\varrho}\big)$ we have $(X, \mathbb{X}) \in \mathscr{C}^\alpha_g$ almost surely.

Proof. By scaling, we can take $M = 1$ without loss of generality. Regarding the first property, the "first level" estimates are contained in Proposition 10.1. Thus, in view of (10.14), in order to establish (10.15) only $E|\mathbb{X}^{i,j}_{s,t}|^q$ for $i < j$ needs to be considered. For $q = 2$ this is an immediate consequence of (10.12) and our assumption (10.13). The case of general $q$ follows from the well known equivalence of $L^q$- and $L^2$-norm on the second Wiener–Itô chaos (e.g. [FV10b, Appendix D]).
Regarding the remaining two properties, almost sure validity of the algebraic constraint (2.1) for any fixed pair of times is an easy consequence of algebraic identities for Riemann sums. The construction of a continuous modification of $(s,t) \mapsto \mathbb{X}_{s,t}$ under the assumed bound is then standard (in fact, the proof of Theorem 3.1 shows this for dyadic times and the unique continuous extension is the desired modification). At last, Theorem 3.1 yields $K_\alpha$, $\mathbb{K}_\alpha$, with moments of all orders, such that
\[ |X_{s,t}| \le K_\alpha(\omega) |t-s|^{\alpha} , \qquad |\mathbb{X}_{s,t}| \le \mathbb{K}_\alpha(\omega) |t-s|^{2\alpha} . \]
The dependence of the moments of $K_\alpha$ and $\mathbb{K}_\alpha$ on $M$ finally follows by simple rescaling. □

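Theorem 10.5. Let $(X,Y) = (X^1,Y^1,\dots,X^d,Y^d)$ be a centred continuous Gaussian process on $[0,T]$ such that $(X^i,Y^i)$ is independent of $(X^j,Y^j)$ when $i \ne j$. Assume that there exist $\varrho \in [1,2)$ and $M \in (0,\infty)$ such that the bounds
\[ \|R_{X^i}\|_{\varrho;[s,t]^2} \le M|t-s|^{1/\varrho} , \qquad \|R_{Y^i}\|_{\varrho;[s,t]^2} \le M|t-s|^{1/\varrho} , \qquad \|R_{X^i-Y^i}\|_{\varrho;[s,t]^2} \le \varepsilon^2 M|t-s|^{1/\varrho} , \qquad (10.17) \]
hold for all $i \in \{1,\dots,d\}$ and all $0 \le s \le t \le T$. Then

a) For every $q \in [1,\infty)$, the bounds
\[ E\bigl( |Y_{s,t} - X_{s,t}|^q \bigr)^{1/q} \lesssim \varepsilon \sqrt{M}\, |t-s|^{\frac{1}{2\varrho}} , \qquad E\bigl( |\mathbb{Y}_{s,t} - \mathbb{X}_{s,t}|^q \bigr)^{1/q} \lesssim \varepsilon M |t-s|^{\frac{1}{\varrho}} , \]
hold for all $0 \le s \le t \le T$.

b) For any $\alpha < 1/(2\varrho)$ and $q \in [1,\infty)$, one has
\[ \bigl| E\bigl( \|Y - X\|_{\alpha}^q \bigr) \bigr|^{1/q} \lesssim \varepsilon \sqrt{M} , \qquad \bigl| E\bigl( \|\mathbb{Y} - \mathbb{X}\|_{2\alpha}^q \bigr) \bigr|^{1/q} \lesssim \varepsilon M . \]

c) For $\varrho \in [1,\frac32)$ and any $\alpha \in (\frac13, \frac{1}{2\varrho})$, $q < \infty$, one has
\[ |\varrho_\alpha(\mathbf{X},\mathbf{Y})|_{L^q} \lesssim \varepsilon . \]
(Here, $\varrho_\alpha(\mathbf{X},\mathbf{Y})$ denotes the $\alpha$-Hölder rough path distance between $\mathbf{X} = (X,\mathbb{X})$ and $\mathbf{Y} = (Y,\mathbb{Y})$ in $\mathscr{C}_g^{\alpha}$.)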
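Proof. By scaling we may without loss of generality assume $M = 1$. As for a) we note (again) that equivalence of $L^q$- and $L^2$-norm on Wiener–Itô chaos allows us to reduce our discussion to $q = 2$. The first level estimate being easy, we focus on the second level estimate; to this end fix $i \ne j$. Since $L^2$-convergence implies a.s. convergence along a subsequence, there exists a sequence $(\mathcal{P}_n)$, with mesh tending to zero, along which we can use Fatou's lemma to estimate
\[ E\bigl| \mathbb{Y}^{i,j}_{s,t} - \mathbb{X}^{i,j}_{s,t} \bigr|^2 = E \lim_{n\to\infty} \Bigl| \int_{\mathcal{P}_n} \bigl( Y^i_{s,r}\, dY^j_r - X^i_{s,r}\, dX^j_r \bigr) \Bigr|^2 \le \liminf_{n} E \Bigl| \int_{\mathcal{P}_n} \bigl( Y^i_{s,r}\, dY^j_r - X^i_{s,r}\, dX^j_r \bigr) \Bigr|^2 \le \sup_{\mathcal{P}} E \Bigl| \int_{\mathcal{P}} \bigl( Y^i_{s,r}\, dY^j_r - X^i_{s,r}\, dX^j_r \bigr) \Bigr|^2 . \]
The result now follows from the bound
\[ \Bigl| \int_{\mathcal{P}} Y^i_{s,r}\, dY^j_r - \int_{\mathcal{P}} X^i_{s,r}\, dX^j_r \Bigr| \le \Bigl| \int_{\mathcal{P}} Y^i_{s,r}\, d(Y-X)^j_r \Bigr| + \Bigl| \int_{\mathcal{P}} (Y-X)^i_{s,r}\, dX^j_r \Bigr| , \]
where we estimate the second moment of each term on the right hand side by the respective variation norms of the covariances; e.g.
\[ E \Bigl| \int_{\mathcal{P}} Y^i_{s,r}\, d(Y-X)^j_r \Bigr|^2 \le C \|R_{Y^i}\|_{\varrho;[s,t]^2} \|R_{Y^j-X^j}\|_{\varrho;[s,t]^2} \le C \varepsilon^2 |t-s|^{2/\varrho} . \]
The case $i = j$ is easier: it suffices to note that
\[ E\bigl| \mathbb{Y}^{i,i}_{s,t} - \mathbb{X}^{i,i}_{s,t} \bigr|^2 = \tfrac14 E\bigl| (Y^i_{s,t})^2 - (X^i_{s,t})^2 \bigr|^2 = \tfrac14 E\bigl| \bigl( Y^i_{s,t} - X^i_{s,t} \bigr)\bigl( Y^i_{s,t} + X^i_{s,t} \bigr) \bigr|^2 , \]
then conclude with Cauchy–Schwarz.
Regarding b), given the pointwise $L^q$-estimates as stated in a), the $L^q$-estimates for $\|X - Y\|_\alpha$ and $\|\mathbb{Y} - \mathbb{X}\|_{2\alpha}$ are obtained from Theorem 3.3. The last statement is then an immediate consequence of the definition of $\varrho_\alpha$. □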
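
Corollary 10.6. As above, let $(X,Y) = (X^1,Y^1,\dots,X^d,Y^d)$ be a centred continuous Gaussian process such that $(X^i,Y^i)$ is independent of $(X^j,Y^j)$ when $i \ne j$. Assume that there exist $\varrho \in [1,\frac32)$ and $M \in (0,\infty)$ such that
\[ \|R_{(X,Y)}\|_{\varrho;[s,t]^2} \le M|t-s|^{1/\varrho} \qquad \forall\, 0 \le s \le t \le T . \qquad (10.18) \]
Then, for every $\alpha \in (\frac13, \frac{1}{2\varrho})$, every $\theta \in (0, \frac12 - \varrho\alpha)$ and $q < \infty$, there exists a constant $C$ such that
\[ |\varrho_\alpha(\mathbf{X},\mathbf{Y})|_{L^q} \le C \sup_{s,t \in [0,T]} \bigl[ E|X_{s,t} - Y_{s,t}|^2 \bigr]^{\theta} . \qquad (10.19) \]

Proof. At the price of replacing $(X,Y)$ by the rescaled process $M^{-1/2}(X,Y)$ we may take $M = 1$. (The concluding $L^q$-estimate on $\varrho_\alpha(M^{-1/2}\mathbf{X}, M^{-1/2}\mathbf{Y})$ is then readily translated into an estimate on $\varrho_\alpha(\mathbf{X},\mathbf{Y})$, given that we allow the final constant to depend on $M$.) Assumption (10.18) then spells out precisely to
\[ \|R_{X^i}\|_{\varrho;[s,t]^2} \le |t-s|^{1/\varrho} , \qquad \|R_{Y^i}\|_{\varrho;[s,t]^2} \le |t-s|^{1/\varrho} \]
and (not present in the assumptions of the previous theorem!)
\[ \|R_{(X^i,Y^i)}\|_{\varrho;[s,t]^2} \le |t-s|^{1/\varrho} , \]
where $R_{(X^i,Y^i)}(u,v) = E(X^i_u Y^i_v)$. Thanks to this assumption we have
\[ \|R_{X^i-Y^i}\|_{\varrho;[s,t]^2} \le C_\varrho \bigl( \|R_{X^i}\|_{\varrho;[s,t]^2} + 2\|R_{(X^i,Y^i)}\|_{\varrho;[s,t]^2} + \|R_{Y^i}\|_{\varrho;[s,t]^2} \bigr) \le 4C_\varrho |t-s|^{1/\varrho} , \]
which is handy in the following interpolation argument. Set
\[ \eta := \max\bigl\{ \|R_{X^i-Y^i}\|_{\infty;[0,T]^2} : 1 \le i \le d \bigr\} \]
and note that, for any $\varrho' > \varrho$,
\[ \|R_{X^i-Y^i}\|_{\varrho';[s,t]^2} \le \|R_{X^i-Y^i}\|^{1-\varrho/\varrho'}_{\infty;[s,t]^2}\, \|R_{X^i-Y^i}\|^{\varrho/\varrho'}_{\varrho;[s,t]^2} \le (4C_\varrho)^{\varrho/\varrho'}\, \eta^{1-\varrho/\varrho'}\, |t-s|^{1/\varrho'} . \]
Also, with $\tilde M = 1 \vee T^{1/\varrho - 1/\varrho'}$, and then similarly for $R_{Y^i}$,
\[ \|R_{X^i}\|_{\varrho';[s,t]^2} \le \|R_{X^i}\|_{\varrho;[s,t]^2} \le |t-s|^{1/\varrho} \le \tilde M |t-s|^{1/\varrho'} \]
and so, picking $\varrho' = \frac{\varrho}{1-2\theta}$, the previous theorem (with $\varrho'$ in place of $\varrho$ and $\varepsilon^2 \leftarrow \eta^{1-\varrho/\varrho'}$, $M \leftarrow \tilde M \vee (4C_\varrho)^{\varrho/\varrho'}$ ...) yields
\[ |\varrho_\alpha(\mathbf{X},\mathbf{Y})|_{L^q} \le C\varepsilon = C\eta^{\frac12 - \frac{\varrho}{2\varrho'}} = C\eta^{\theta} \]
for any given $\theta \in (0, \frac12 - \varrho\alpha)$. At last, take $i_* \in \{1,\dots,d\}$ as the arg max in the definition of $\eta$ and set $\Delta = X^{i_*} - Y^{i_*}$. Then, by Cauchy–Schwarz,
\[ \eta = \|R_\Delta\|_{\infty;[0,T]^2} = \sup_{\substack{0 \le s \le t \le T \\ 0 \le s' \le t' \le T}} E\bigl( \Delta_{s,t} \Delta_{s',t'} \bigr) \le \sup_{0 \le s \le t \le T} E \Delta^2_{s,t} \]
and the proof is finished. □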

Remark 10.7. Corollary 10.6 suggests an alternative route to the construction of a rough path lift $\mathbf{X} = (X,\mathbb{X})$ for some Gaussian process $X$ as in Theorem 10.4. The idea is to establish the crucial estimate (10.19) only for processes with regular sample paths, in which case $\mathbb{X}$ is canonically given by iterated Riemann–Stieltjes integration. Apply this to piecewise linear (or mollifier) approximations $X^n$, $X^m$ to see that $(X^n,\mathbb{X}^n)$ is Cauchy, in probability and rough path metric, in the space $\mathscr{C}_g^{0,\alpha}$. The resulting limiting (random) rough path $\mathbf{X}$ is easily seen to be indistinguishable from the one constructed in Theorem 10.4. All estimates are then seen to remain valid in the limit. (This is the approach taken in [FV10b].)

10.3 Fractional Brownian motion and beyond

We remarked in the beginning of Section 10.2 that ($d$-dimensional) fractional Brownian motion $B^H$, with Hurst parameter $H \in (0,1)$, determined through its covariance
\[ R^H(s,t) = \tfrac12 \bigl[ s^{2H} + t^{2H} - |t-s|^{2H} \bigr] \times \mathrm{Id} \in \mathbb{R}^{d \times d} , \]
has $\alpha$-Hölder sample paths for any $\alpha < H$. For $H > 1/2$, there is little need for rough path analysis; after all, Young's theory is applicable. For $H = 1/2$, one deals with $d$-dimensional standard Brownian motion which, of course, renders the classical martingale based stochastic analysis applicable. For $H < 1/2$, however, all these theories fail but rough path analysis works. In the remainder of this section we detail the construction of a fractional Brownian rough path.
In fact, we shall consider centred, continuous Gaussian processes with independent components $X = (X^1,\dots,X^d)$ and stationary increments. The construction of a (geometric) rough path associated to $X$ then naturally passes through an understanding of the two-dimensional $\varrho$-variation of $R = R_X$, the covariance of $X$; cf. Theorem 10.4. To this end, it is enough to focus on one component and we may take $X$ to be scalar until further notice. The law of such a process is fully determined by
\[ \sigma^2(u) := E\bigl[ X^2_{t,t+u} \bigr] = R\binom{t,\,t+u}{t,\,t+u} . \]

Lemma 10.8. Assume that $\sigma^2(\cdot)$ is concave on $[0,h]$ for some $h > 0$. Then, one has non-positive correlation of non-overlapping increments in the sense that, for $0 \le s \le t \le u \le v \le h$,
\[ E[X_{s,t} X_{u,v}] = R\binom{s,\,t}{u,\,v} \le 0 . \]
If in addition $\sigma^2(\cdot)$ restricted to $[0,h]$ is non-decreasing (which is always the case for some possibly smaller $h$), then for $0 \le s \le u \le v \le t \le h$,
\[ 0 \le E[X_{s,t} X_{u,v}] = |E[X_{s,t} X_{u,v}]| \le E\bigl[ X^2_{u,v} \bigr] = \sigma^2(v-u) . \]
Proof. Using the identity $2ac = (a+b+c)^2 + b^2 - (b+c)^2 - (a+b)^2$ with $a = X_{s,t}$, $b = X_{t,u}$ and $c = X_{u,v}$, we see that
\[ 2E[X_{s,t} X_{u,v}] = E\bigl[X^2_{s,v}\bigr] + E\bigl[X^2_{t,u}\bigr] - E\bigl[X^2_{t,v}\bigr] - E\bigl[X^2_{s,u}\bigr] = \sigma^2(v-s) + \sigma^2(u-t) - \sigma^2(v-t) - \sigma^2(u-s) . \]
The first claim now easily follows from concavity, cf. [MR06, Lemma 7.2.7].
To show the second bound, note that $X_{s,t} X_{u,v} = (a+b+c)b$ where $a = X_{s,u}$, $b = X_{u,v}$, and $c = X_{v,t}$. Applying the algebraic identity
\[ 2(a+b+c)b = (a+b)^2 - a^2 + (c+b)^2 - c^2 \]
and taking expectations yields
\[ 2E[X_{s,t} X_{u,v}] = E\bigl[X^2_{s,v}\bigr] - E\bigl[X^2_{s,u}\bigr] + E\bigl[X^2_{u,t}\bigr] - E\bigl[X^2_{v,t}\bigr] = \sigma^2(v-s) - \sigma^2(u-s) + \sigma^2(t-u) - \sigma^2(t-v) \ge 0 , \]
where we used that $\sigma^2(\cdot)$ is non-decreasing. On the other hand, using $(a+b+c)b = b^2 + ab + cb$ and the non-positive correlation of non-overlapping increments, we have
\[ E[X_{s,t} X_{u,v}] = E\bigl[X^2_{u,v}\bigr] + E[X_{s,u} X_{u,v}] + E[X_{v,t} X_{u,v}] \le E\bigl[X^2_{u,v}\bigr] , \]
thus concluding the proof. □
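
The two bounds of Lemma 10.8 can be sanity-checked numerically. The sketch below is only an illustration, not part of the original text: it assumes the fractional Brownian choice $\sigma^2(\tau) = |\tau|^{2H}$ with $H = 0.3$ (concave and non-decreasing), and the helper names `sigma2`, `increment_cov` and `check_lemma` are hypothetical. It evaluates $E[X_{s,t}X_{u,v}]$ through the polarisation identity used in the proof and tests both inequalities on random ordered quadruples.

```python
import random

def sigma2(tau, H):
    """Variance of an increment of length tau, here sigma^2(tau) = |tau|^(2H) (fBm assumption)."""
    return abs(tau) ** (2.0 * H)

def increment_cov(s, t, u, v, H):
    """E[X_{s,t} X_{u,v}] for stationary increments, via the polarisation identity of the proof."""
    return 0.5 * (sigma2(v - s, H) + sigma2(u - t, H)
                  - sigma2(v - t, H) - sigma2(u - s, H))

def check_lemma(H=0.3, samples=1000, seed=0, tol=1e-12):
    """Return True if both bounds of the lemma hold on random ordered quadruples in [0, 1]."""
    rng = random.Random(seed)
    for _ in range(samples):
        a, b, c, d = sorted(rng.random() for _ in range(4))
        disjoint = increment_cov(a, b, c, d, H)   # increments over [a,b] and [c,d]
        nested = increment_cov(a, d, b, c, H)     # [b,c] contained in [a,d]
        if disjoint > tol or nested < -tol or nested > sigma2(c - b, H) + tol:
            return False
    return True

print(check_lemma())
```

A floating-point tolerance is used because the inequalities hold exactly only in exact arithmetic.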

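Theorem 10.9. Let $X$ be a real-valued Gaussian process with stationary increments and $\sigma^2(\cdot)$ concave and non-decreasing on $[0,h]$, some $h > 0$. Assume also, for constants $L$, $\varrho \ge 1$, and all $\tau \in [0,h]$,
\[ |\sigma^2(\tau)| \le L|\tau|^{1/\varrho} . \]
Then the covariance of $X$ has finite $\varrho$-variation. More precisely
\[ \|R_X\|_{\varrho\text{-var};[s,t]^2} \le M|t-s|^{1/\varrho} \qquad (10.20) \]
for all intervals $[s,t]$ with length $|t-s| \le h$ and some $M = M(\varrho,L) > 0$.

Proof. Consider some interval $[s,t]$ with length $|t-s| \le h$. The proof relies on separating "diagonal" and "off-diagonal" contributions. Let $D = \{t_i\}$, $D' = \{t'_j\}$ be two dissections of $[s,t]$. For fixed $i$, we have
\[ 3^{1-\varrho} \sum_{t'_j \in D'} \bigl| E X_{t_i,t_{i+1}} X_{t'_j,t'_{j+1}} \bigr|^{\varrho} \le 3^{1-\varrho} \bigl\| E X_{t_i,t_{i+1}} X_{\cdot} \bigr\|^{\varrho}_{\varrho\text{-var};[s,t]} \le \bigl\| E X_{t_i,t_{i+1}} X_{\cdot} \bigr\|^{\varrho}_{\varrho\text{-var};[s,t_i]} + \bigl\| E X_{t_i,t_{i+1}} X_{\cdot} \bigr\|^{\varrho}_{\varrho\text{-var};[t_i,t_{i+1}]} + \bigl\| E X_{t_i,t_{i+1}} X_{\cdot} \bigr\|^{\varrho}_{\varrho\text{-var};[t_{i+1},t]} . \qquad (10.21) \]
By Lemma 10.8 above, we have
\[ \bigl\| E X_{t_i,t_{i+1}} X_{\cdot} \bigr\|_{\varrho\text{-var};[s,t_i]} \le \bigl| E X_{t_i,t_{i+1}} X_{s,t_i} \bigr| \le \bigl| E X_{t_i,t_{i+1}} X_{s,t_{i+1}} \bigr| + \bigl| E X^2_{t_i,t_{i+1}} \bigr| \le 2\sigma^2(t_{i+1} - t_i) . \]
The third term is bounded analogously. For the middle term in (10.21) we estimate
\[ \bigl\| E X_{t_i,t_{i+1}} X_{\cdot} \bigr\|^{\varrho}_{\varrho\text{-var};[t_i,t_{i+1}]} = \sup_{D'} \sum_{t'_j \in D'} \bigl| E X_{t_i,t_{i+1}} X_{t'_j,t'_{j+1}} \bigr|^{\varrho} \le \sup_{D'} \sum_{t'_j \in D'} \sigma^2(t'_{j+1} - t'_j)^{\varrho} \le L^{\varrho} |t_{i+1} - t_i| , \]
where the suprema are taken over dissections $D'$ of $[t_i,t_{i+1}]$, and where we used the second estimate of Lemma 10.8 for the penultimate bound and the assumption on $\sigma^2$ for the last bound. Using these estimates in (10.21) yields
\[ \sum_{t'_j \in D'} \bigl| E X_{t_i,t_{i+1}} X_{t'_j,t'_{j+1}} \bigr|^{\varrho} \le C |t_{i+1} - t_i| , \]
and (10.20) follows by summing over $t_i$ and taking the supremum over all dissections of $[s,t]$. □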
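
As a rough numerical illustration of Theorem 10.9 (a sketch under stated assumptions, not the authors' computation), one can evaluate the sum $\sum_{i,j} |E X_{t_i,t_{i+1}} X_{t_j,t_{j+1}}|^{\varrho}$ over uniform grids for fractional Brownian motion with $\sigma^2(\tau) = |\tau|^{2H}$ and $\varrho = 1/(2H)$. A uniform grid only gives a lower bound for the supremum defining the $\varrho$-variation, so the code can merely confirm that these particular sums stay bounded as the grid is refined. Tracking the constants in the proof with $L = 1$, $T = 1$ suggests the bound $3^{\varrho-1}(2^{\varrho+1}+1)$, roughly $7.58$ for $H = 0.4$. The function names are hypothetical.

```python
def sigma2(tau, H):
    """sigma^2(tau) = |tau|^(2H), the fBm increment variance (assumption of this sketch)."""
    return abs(tau) ** (2.0 * H)

def increment_cov(s, t, u, v, H):
    """E[X_{s,t} X_{u,v}] expressed through sigma^2 by polarisation."""
    return 0.5 * (sigma2(v - s, H) + sigma2(u - t, H)
                  - sigma2(v - t, H) - sigma2(u - s, H))

def grid_rho_variation_sum(H, n, T=1.0):
    """Sum of |E[X_{t_i,t_{i+1}} X_{t_j,t_{j+1}}]|^rho over a uniform n x n grid of [0,T]^2."""
    rho = 1.0 / (2.0 * H)
    grid = [T * k / n for k in range(n + 1)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            c = increment_cov(grid[i], grid[i + 1], grid[j], grid[j + 1], H)
            total += abs(c) ** rho
    return total

# The sums should remain bounded as the grid is refined.
for n in (10, 40, 160):
    print(n, grid_rho_variation_sum(0.4, n))
```

The diagonal cells alone contribute exactly $T$, since $\sigma^2(T/n)^{\varrho} = T/n$; the off-diagonal cells add a bounded correction.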
Corollary 10.10. Let $X = (X^1,\dots,X^d)$ be a centred continuous Gaussian process with independent components such that each $X^i$ satisfies the assumption of the previous theorem, with common values of $h$, $L$ and $\varrho \in [1,3/2)$. Then $X$, restricted to any interval $[0,T]$, lifts to $\mathbf{X} = (X,\mathbb{X}) \in \mathscr{C}_g^{\alpha}([0,T],\mathbb{R}^d)$.
Proof. Set $I_n = [(n-1)h, nh]$ so that $[0,T] \subset I_1 \cup I_2 \cup \dots \cup I_{[T/h]+1}$. On each interval $I_n$, we may apply Theorem 10.4 to lift $X^n := X|_{I_n}$ to a (random) rough path $\mathbf{X}^n \in \mathscr{C}_g^{\alpha}(I_n,\mathbb{R}^d)$. The concatenation of $\mathbf{X}^1, \mathbf{X}^2, \dots$ then yields the desired rough path lift on $[0,T]$. □
Example 10.11 (Fractional Brownian motion). Clearly, $d$-dimensional fractional Brownian motion $B^H$ with Hurst parameter $H \in (\frac13, \frac12]$ satisfies the assumptions of the above theorem / corollary for all components with
\[ \sigma^2(u) = u^{2H} , \]
obviously non-decreasing and concave for $H \le \frac12$ and on any time interval $[0,T]$. This also identifies
\[ \varrho = \frac{1}{2H} \]
and $\varrho < \frac32$ translates to $H > \frac13$, in which case we obtain a canonical geometric rough path $\mathbf{B}^H = (B^H,\mathbb{B}^H)$ associated to fBm. In fact, a canonical "level-3" rough path $\mathbf{B}^H$ can be constructed as long as $\varrho < \varrho^* = 2$, corresponding to $H > 1/4$, but this requires level-3 considerations which we do not discuss here (see [FV10b, Ch.15]).
Example 10.12 (Ornstein-Uhlenbeck process). Consider the $d$-dimensional (stationary) OU process, consisting of i.i.d. copies of a scalar Gaussian process $X$ with covariance
\[ E[X_s X_t] = K(|t-s|) , \qquad K(u) = \exp(-cu) , \]
where $c > 0$ is fixed. Note that $\sigma^2(u) = E X^2_{t,t+u} = E X^2_{t+u} + E X^2_t - 2E[X_t X_{t+u}] = 2[K(0) - K(u)] = 2[1 - \exp(-cu)]$, so that $\sigma^2(u)$ is indeed increasing and concave:
\[ \partial_u \sigma^2(u) = 2c \exp(-cu) > 0 , \qquad \partial_u^2 \sigma^2(u) = -2c^2 \exp(-cu) < 0 . \]
One also has the bound $\sigma^2(u) = 2[1 - \exp(-cu)] \le 2cu$, which shows that the assumptions of the above corollary are satisfied with $\varrho = 1$, $L = 2c$ and arbitrary $h > 0$.

10.4 Exercises

Exercise 10.13. Let $X^D$ be a piecewise linear approximation to $X$. Show that $(\mathbb{X}_{s,t})$ as constructed in Theorem 10.4 is the limit, in probability and uniformly on $\{(s,t) : 0 \le s \le t \le T\}$ say, of $\int_s^t X^D_{s,u} \otimes dX^D_u$ as $|D| \to 0$. (In particular, any algebraic relations which hold for (piecewise) smooth paths and their iterated integrals then hold true in the limit. This yields an alternative proof that $(X,\mathbb{X})$ satisfies conditions (2.1) and (2.5).)
Exercise 10.14 (Brownian rough path, rate of convergence [HN09, FR11]). Let $X = B$ and $Y = B^n$ be $d$-dimensional Brownian motion and piecewise linear approximations (with mesh size $1/n$), respectively. Show that the covariance of $(B,B^n)$ has finite 1-variation, uniformly in $n$. Show also that
\[ \sup_{s,t \in [0,T]} E\bigl[ (B_{s,t} - B^n_{s,t})^2 \bigr] = O\Bigl( \frac1n \Bigr) . \]
Conclude that, for any $\theta < 1/2 - \alpha$,
\[ \Bigl| \|B - B^n\|_{\alpha} + \|\mathbb{B} - \mathbb{B}^n\|_{2\alpha} \Bigr|_{L^q} = O\Bigl( \frac{1}{n^{\theta}} \Bigr) . \]
Use a Borel–Cantelli argument to show that, also for any $\theta < 1/2 - \alpha$,
\[ \|B - B^n\|_{\alpha} + \|\mathbb{B} - \mathbb{B}^n\|_{2\alpha} \le C(\omega) \frac{1}{n^{\theta}} . \]
When $\alpha \in (\frac13, \frac12)$ we can conclude convergence in $\alpha$-Hölder rough path metric, i.e.
\[ \varrho_\alpha\bigl( (B,\mathbb{B}), (B^n,\mathbb{B}^n) \bigr) \to 0 , \]
almost surely with rate $1/2 - \alpha - \varepsilon$ for every $\varepsilon > 0$.

Exercise 10.15. Let $(B,\tilde B)$ be a 2-dimensional standard Brownian motion. The (Gaussian) process given by
\[ X = (B_t, B_t + \tilde B_t) \]
fails to have independent components and yet lifts to a Gaussian rough path. Explain how and detail the construction.

Exercise 10.16. Assume $R(s,t) = K(|t-s|)$ for some $C^2$-function $K$. (This was exactly the situation in the above Ornstein–Uhlenbeck case, Example 10.12.) Give a direct proof that $R$ has finite 2-dimensional 1-variation, more precisely,
\[ \|R\|_{1\text{-var};[s,t]^2} \le C|t-s| , \qquad \forall\, 0 \le s \le t \le T , \]
for a constant $C$ which depends on $T$ and $K$.

Solution 10.17. If $(s,t) \mapsto R(s,t) := E[X_s X_t]$ is smooth, the 2-dimensional 1-variation is given by
\[ \|R\|_{1\text{-var};[0,T]^2} = \int_{[0,T]^2} \bigl| \partial^2_{s,t} R(s,t) \bigr| \, ds\, dt . \]
This remains true when the mixed derivative is a signed measure, which in turn is the case when $R(s,t) = K(|t-s|)$ for some $C^2$-function $K$. Indeed, write $H$ and $2\delta$ for the first and second distributional derivatives of $|\cdot|$. Formal application of the chain-rule gives $\partial_t R = K'(|t-s|) H(t-s)$ and then, using $|H| \le 1$ almost everywhere,
\[ \bigl| \partial^2_{s,t} R(s,t) \bigr| \le |K''(|t-s|)| + 2|K'(|t-s|)|\, \delta(t-s) . \]
Integration again over $[s,t]^2 \subset [0,T]^2$ yields
\[ \|R\|_{1\text{-var};[s,t]^2} = \int_{[s,t]^2} \bigl| \partial^2_{u,v} R(u,v) \bigr| \, du\, dv \le \bigl( T|K''|_{\infty} + 2|K'(0)| \bigr) |t-s| . \]
This is easily made rigorous by replacing $|\cdot|$ (and then $H$, $2\delta$) by a mollified version, say $|\cdot|_{\varepsilon}$ (and $H_\varepsilon$, $2\delta_\varepsilon$), noting that variation-norms behave in a lower-semi-continuous fashion under pointwise limits; that is
\[ \|R\|_{1\text{-var};[s,t]^2} \le \liminf_{\varepsilon \to 0} \|R^{\varepsilon}\|_{1\text{-var};[s,t]^2} \]
whenever $R^{\varepsilon} \to R$ pointwise. To see this, it suffices to take arbitrary dissections $D = (t_i)$ and $D' = (t'_j)$ of $[u,v]$ and note that
\[ \sum_{i,j} \Bigl| R\binom{t_{i-1},\,t_i}{t'_{j-1},\,t'_j} \Bigr| = \lim_{\varepsilon \to 0} \sum_{i,j} \Bigl| R^{\varepsilon}\binom{t_{i-1},\,t_i}{t'_{j-1},\,t'_j} \Bigr| \le \liminf_{\varepsilon \to 0} \|R^{\varepsilon}\|_{1\text{-var};[u,v]^2} . \]
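
The bound of Solution 10.17 can be probed numerically on a grid. The following sketch assumes the Ornstein–Uhlenbeck kernel $K(u) = \exp(-cu)$ with $c = 1$ on $[0,1]$, so that $|K''|_\infty = c^2$ and $|K'(0)| = c$; the helper names are hypothetical. Since the sum of absolute rectangular increments over any grid is at most the 1-variation, it should not exceed $(T|K''|_\infty + 2|K'(0)|)\,T = 3$.

```python
import math

def rect_increment(R, s, t, u, v):
    """Rectangular increment R(t,v) - R(t,u) - R(s,v) + R(s,u) = E[X_{s,t} X_{u,v}]."""
    return R(t, v) - R(t, u) - R(s, v) + R(s, u)

def grid_one_variation(R, n, T=1.0):
    """Sum of |rectangular increments| of R over a uniform n x n grid of [0,T]^2."""
    grid = [T * k / n for k in range(n + 1)]
    return sum(abs(rect_increment(R, grid[i], grid[i + 1], grid[j], grid[j + 1]))
               for i in range(n) for j in range(n))

c, T = 1.0, 1.0
R = lambda s, t: math.exp(-c * abs(t - s))   # assumed OU-type covariance K(|t-s|)
bound = (T * c * c + 2.0 * c) * T            # (T |K''|_inf + 2 |K'(0)|) |t - s| with [s,t] = [0,T]

print(grid_one_variation(R, 50), bound)
```

The diagonal cells account for the $2|K'(0)|$ part of the bound (the singular part of the mixed derivative), the off-diagonal cells for the $|K''|$ part.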

Exercise 10.18. Assume $X = (X^1,\dots,X^d)$ is a centred, continuous Gaussian process with independent components.
(i) Assume covariance of finite $\varrho$-variation with $\varrho < 2$. Show that each component $X = X^i$, for $i = 1,\dots,d$, has almost surely vanishing compensated quadratic variation on $[0,T]$ by which we mean
\[ \lim_{n \to \infty} \sum_{[s,t] \in \mathcal{P}_n} \bigl( X^2_{s,t} - E(X^2_{s,t}) \bigr) = 0 , \]
in probability (and $L^q$, any $q < \infty$) for any sequence of partitions $(\mathcal{P}_n)$ of $[0,T]$ with mesh $|\mathcal{P}_n| \to 0$.
(ii) Under the assumptions of (i), show that there exists $(\mathcal{P}_n)$ with $|\mathcal{P}_n| \to 0$ so that, with probability one, the quadratic (co)variation $[X^i,X^j]$, in the sense of Definition 5.8, vanishes, for any $i \ne j$, with $i,j \in \{1,\dots,d\}$. Conclude that, with regard to Theorem 10.4, the off-diagonal elements $\mathbb{X}^{i,j}_{s,t}$, defined as the $L^2$ limit of left-point Riemann-Stieltjes sums, could have been equivalently defined via mid- or right-point Riemann sums.
(iii) Assume $\varrho = 1$. Show that, for all $i = 1,\dots,d$, there exists a sequence $(\mathcal{P}_n)$ with mesh $|\mathcal{P}_n| \to 0$ so that, with probability one, the quadratic variation $[X^i,X^i]$, in the sense of Definition 5.8, exists and equals
\[ [X^i,X^i]_t := \lim_{\varepsilon \to 0} \sup_{|\mathcal{P}| < \varepsilon} \sum_{\substack{[u,v] \in \mathcal{P} \\ u < t}} E\bigl( X^i_{u,v} \bigr)^2 . \]
Discuss the possibility of lifting $X$ to a (random) non-geometric rough path, similar to the Itô-lift of Brownian motion.
(iv) Consider the case of a zero-mean, stationary Gaussian process on $[0,2\pi]$ with i.i.d. components, each specified by
\[ E(X^2_{s,t}) = \cosh(-\pi) - \cosh(|t-s| - \pi) . \]
Verify that $\varrho = 1$ and compute $[X]$. (This example is related to the stochastic heat equation, where $s,t$ should be thought of as spatial variables; cf. Lemma 12.17.)
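Solution 10.19. (i) Using Wick's formula for the expectation of products of centred Gaussians, namely
\[ E[ABCD] = E[AB]E[CD] + E[AC]E[BD] + E[AD]E[BC] , \]
we obtain the identity
\[ E\Bigl[ \Bigl( \sum_{[s,t] \in \mathcal{P}_n} \bigl( X^2_{s,t} - E(X^2_{s,t}) \bigr) \Bigr)^2 \Bigr] = \sum_{[s,t] \in \mathcal{P}_n} \sum_{[s',t'] \in \mathcal{P}_n} \Bigl( E\bigl( X^2_{s,t} X^2_{s',t'} \bigr) - E\bigl( X^2_{s,t} \bigr) E\bigl( X^2_{s',t'} \bigr) \Bigr) = \sum_{[s,t] \in \mathcal{P}_n} \sum_{[s',t'] \in \mathcal{P}_n} 2 E[X_{s,t} X_{s',t'}]\, E[X_{s,t} X_{s',t'}] \]
\[ = 2 \sum_{[s,t] \in \mathcal{P}_n} \sum_{[s',t'] \in \mathcal{P}_n} \Bigl| R\binom{s,\,t}{s',\,t'} \Bigr|^2 \le 2 \sup_{\substack{t-s \le |\mathcal{P}_n| \\ t'-s' \le |\mathcal{P}_n|}} \Bigl| R\binom{s,\,t}{s',\,t'} \Bigr|^{2-\varrho} \|R\|^{\varrho}_{\varrho\text{-var};[0,T]^2} . \]
The right-hand side converges to 0 as $|\mathcal{P}_n| \to 0$. This gives $L^2$-convergence and hence convergence in probability. Convergence in $L^q$ for any $q < \infty$ follows from general facts on Wiener–Itô chaos.
(ii) Left to the reader.
(iii) We fix $i$ and drop the index. We easily see that (i) holds uniformly on compacts, say, in the sense that
\[ \sup_{t \in [0,T]} \Bigl| \sum_{\substack{[u,v] \in \mathcal{P}_n \\ u < t}} \bigl( X^2_{u,v} - E(X^2_{u,v}) \bigr) \Bigr| \to 0 \quad \text{as } n \to \infty \]
in probability whenever $|\mathcal{P}_n| \to 0$. On the other hand,
\[ \sup_{|\mathcal{P}| < \varepsilon} \sum_{\substack{[u,v] \in \mathcal{P} \\ u < t}} E(X^2_{u,v}) < \infty \]
thanks to finite 1-variation of the covariance. By monotonicity, the limit as $\varepsilon = 1/n \to 0$ exists, and we call it $[[X]]_t$. Then, along a suitable sequence $\tilde{\mathcal{P}}_n$,
\[ [[X]]_t = \lim_{n} \sum_{\substack{[u,v] \in \tilde{\mathcal{P}}_n \\ u < t}} E(X^2_{u,v}) . \]
On the other hand, at the price of passing to another subsequence also denoted by $\tilde{\mathcal{P}}_n$, we have
\[ \sup_{t \in [0,T]} \Bigl| \sum_{\substack{[u,v] \in \tilde{\mathcal{P}}_n \\ u < t}} \bigl( X^2_{u,v} - E(X^2_{u,v}) \bigr) \Bigr| \to 0 \quad \text{almost surely,} \]
and so with probability one, and uniformly in $t \in [0,T]$,
\[ \sum_{\substack{[u,v] \in \tilde{\mathcal{P}}_n \\ u < t}} X^2_{u,v} \to [[X]]_t . \]
(iv) One has $E(X^2_{s,t}) = \cosh(-\pi) - \cosh(|t-s| - \pi) = \sinh(\pi)|t-s| + o(|t-s|)$ and so $[X]_t = t \sinh(\pi)$.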
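
To see the $L^2$ estimate of part (i) at work in the simplest case, the following simulation sketch takes $X$ to be a scalar Brownian motion on a uniform partition (an assumption made here for illustration; then $\varrho = 1$, the rectangular increments of $R$ are bounded by the mesh, and the bound reads $2\,|\mathcal{P}_n|\,T$). The function name `compensated_qv` is hypothetical.

```python
import random

def compensated_qv(n, T=1.0, seed=0):
    """Sum over a uniform n-cell partition of (B_{s,t}^2 - E[B_{s,t}^2]) for Brownian motion."""
    rng = random.Random(seed)
    dt = T / n
    total = 0.0
    for _ in range(n):
        inc = rng.gauss(0.0, dt ** 0.5)   # Brownian increment over a cell of length dt
        total += inc * inc - dt           # compensate by E[B_{s,t}^2] = dt
    return total

# The second moment is 2 * T^2 / n, so typical values shrink like n^(-1/2).
for n in (100, 10_000, 100_000):
    print(n, compensated_qv(n))
```

For Brownian motion the second moment of the compensated sum is exactly $2T^2/n$, which matches the general bound $2 \sup|R|^{2-\varrho} \|R\|^{\varrho}_{\varrho\text{-var}}$ with $\varrho = 1$.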
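Exercise 10.20. Assume finite 1-variation of the covariance (as e.g. defined in (10.5)) of a zero-mean Gaussian process $X$ and $E[X^2_{t,t+h}] = f(t)h + o(h)$ as $h \downarrow 0$, for some $f \in C([0,T],\mathbb{R})$. Show that, for every smooth test function $\varphi$,
\[ \int_0^T \varphi(t) \frac{X^2_{t,t+h}}{h}\, dt \to \int_0^T \varphi(t) f(t)\, dt \quad \text{as } h \to 0 , \]
where the convergence takes place in $L^q$ for any $q < \infty$ (and hence also in probability).
Solution 10.21. Since all types of $L^q$-convergence are equivalent on the finite Wiener–Itô chaos (here we only need the chaos up to level 2), it suffices to consider $q = 2$. A dissection $(t_k)$ of $[0,T]$ is given by $t_k = kh \wedge T$. We have
\[ \sum_k \frac1h \int_{t_k}^{t_{k+1}} \varphi(t) X^2_{t,t+h}\, dt = \int_0^1 d\theta \sum_k \varphi(t_k + \theta h) X^2_{t_k+\theta h,\, t_k+\theta h+h} \equiv \int_0^1 \langle \varphi, \mu_{\theta,h} \rangle\, d\theta , \]
where the random measure $\mu_{\theta,h} := \sum_k \delta_{t_k+\theta h} X^2_{t_k+\theta h,\, t_k+\theta h+h}$ acts on test functions by integration. It obviously suffices to establish $\langle \varphi, \mu_{\theta,h} \rangle \to \langle \varphi, f \rangle$ in $L^2$, uniformly in $\theta \in [0,1]$. Define the (random) distribution function of $\mu_{\theta,h}$,
\[ F(t) := \mu_{\theta,h}([0,t]) = \sum_{k : t_k+\theta h \le t} X^2_{t_k+\theta h,\, t_k+\theta h+h} , \]
and also $\bar F(t) = E F(t)$. Note that
\[ \bar F(t) = \sum_{k : t_k+\theta h \le t} \bigl( f(t_k + \theta h) h + o(h) \bigr) \sim \int_0^t f(s)\, ds \quad \text{as } h \downarrow 0 , \]
uniformly in $\theta \in [0,1]$, $t \in [0,T]$. On the other hand, the Gaussian (or Wick) identity $E[A^2B^2] - E[A^2]E[B^2] = 2(E[AB])^2$, applied with $A = X_{t_k+\theta h,\, t_k+\theta h+h}$ and $B = X_{t_j+\theta h,\, t_j+\theta h+h}$, gives
\[ E\bigl| F(t) - \bar F(t) \bigr|^2 = E F^2(t) - \bar F^2(t) = 2 \sum_{\substack{k : t_k+\theta h \le t \\ j : t_j+\theta h \le t}} \Bigl| R_X\binom{t_k+\theta h,\; t_k+\theta h+h}{t_j+\theta h,\; t_j+\theta h+h} \Bigr|^2 \lesssim \operatorname{osc}\bigl( R^{2-\varrho}; h \bigr) \to 0 \quad \text{as } h \to 0 , \]
uniformly in $\theta \in [0,1]$, $t \in [0,T]$. It follows that
\[ F(t) = \mu_{\theta,h}([0,t]) \to \int_0^t f(s)\, ds \]
in $L^2$, again uniformly in $t$ and $\theta$. Now, for fixed smooth $\varphi$, one has the bound
\[ \Bigl| \int \varphi(t)\, \mu_{\theta,h}(dt) - \int \varphi(t) f(t)\, dt \Bigr|^2 = \Bigl| \int \Bigl( \int_0^t f(s)\, ds - \mu_{\theta,h}([0,t]) \Bigr) \dot\varphi(t)\, dt \Bigr|^2 \lesssim \int_0^T \Bigl| \int_0^t f(s)\, ds - \mu_{\theta,h}([0,t]) \Bigr|^2 dt \]
and so
\[ E \Bigl| \int \varphi(t)\, \mu_{\theta,h}(dt) - \int \varphi(t) f(t)\, dt \Bigr|^2 \lesssim \int_0^T E \Bigl| \int_0^t f(s)\, ds - \mu_{\theta,h}([0,t]) \Bigr|^2 dt . \]
This expression converges to 0 as $h \to 0$, uniformly in $\theta$, thus completing the proof.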

10.5 Comments

Classes of Gaussian processes which admit (canonical) lifts to random rough paths were first studied by Coutin–Qian [CQ02], with focus on fBm with Hurst parameter $H > 1/4$. Ledoux, Qian and Zhang [LQZ02] used Gaussian techniques to establish large deviation and support theorems for the Brownian rough path; extensions to fractional Brownian motions were investigated by Millet–Sanz-Solé [MSS06], Feyel and de la Pradelle [FdLP06], Friz–Victoir [FV07, FV06a]. When $H \le 1/4$, there is no canonical rough path lift: as noted in [CQ02], the $L^2$-norm of the area associated to piecewise linear approximations to fBm diverges. See however the works of Unterberger and then Nualart–Tindel [Unt10, NT11].
The notion of two-dimensional $\varrho$-variation of the covariance, as adopted in this chapter, is due to Friz–Victoir, [FV10a], [FV10b, Ch.15], [FV11], and allows for an elegant and general construction of Gaussian rough paths. It also leads naturally to useful Cameron–Martin embeddings, see Section 11.1. If restricted to the "diagonal", $\varrho$-variation of the covariance relates to a classical criterion of Jain–Monrad [JM83]. The question remains how one checks finite $\varrho$-variation when faced with a non-trivial (and even non-explicit, e.g. given as Fourier series) covariance function. A general criterion based on a certain covariance measure structure (reminiscent of Kruk, Russo and Tudor [KRT07]) was recently given by Friz, Gess, Gulisashvili and Riedel [FGGR13], a special case of which is the "concavity criterion" of Theorem 10.9.
Chapter 11
Cameron–Martin regularity and applications

Abstract A continuous Gaussian process gives rise to a Gaussian measure on path-

space. Thanks to variation regularity properties of Cameron–Martin paths, powerful
tools from the analysis on Gaussian spaces become available. A general Fernique
type theorem leads us to integrability properties of rough integrals with Gaussian
integrator akin to those of classical stochastic integrals. We then discuss Malliavin
calculus for differential equations driven by Gaussian rough paths. As an application, a
version of Hörmander's theorem in this non-Markovian setting is established.

11.1 Complementary Young regularity

Although we have chosen to introduce (rough) paths subject to $\alpha$-Hölder regularity, the arguments are not difficult to adapt to $p$-variation with $p = 1/\alpha$. In particular, one uses the $p$-variation semi-norm given by
\[ \|X\|^p_{p\text{-var};[0,T]} = \sup_{\mathcal{P}} \sum_{[s,t] \in \mathcal{P}} |X_{s,t}|^p , \qquad (11.1) \]
where $X \in C([0,T],\mathbb{R}^d)$, say, and the supremum is taken over all partitions of $[0,T]$. The 1-variation ($p = 1$) of such a path is of course nothing but its length, possibly $+\infty$. Hölder regularity implies variation regularity: one has the immediate estimate
\[ \|X\|_{p\text{-var};[0,T]} \le T^{\alpha} \|X\|_{\alpha;[0,T]} . \]
Conversely, a time-change renders $p$-variation paths Hölder continuous with exponent $\alpha = 1/p$. Given two paths $X \in C^{p\text{-var}}([0,T],\mathbb{R}^d)$, $h \in C^{q\text{-var}}([0,T],\mathbb{R}^d)$ let us say that they enjoy complementary Young regularity if Young's condition
\[ \frac1p + \frac1q > 1 \qquad (11.2) \]
is satisfied.
We are now interested in the regularity of Cameron–Martin paths. As in the last section, $X$ is an $\mathbb{R}^d$-valued, continuous and centred Gaussian process on $[0,T]$, realized as $X(\omega) = \omega \in C([0,T],\mathbb{R}^d)$, a Banach space under the uniform norm, equipped with a Gaussian measure. General principles of Gaussian measures on (separable) Banach spaces thus apply [Led96]. Specializing to the situation at hand, the associated Cameron–Martin space $\mathcal{H} \subset C([0,T],\mathbb{R}^d)$ consists of paths $t \mapsto h_t = E(Z X_t)$ where $Z \in \mathcal{W}^1$ is an element in the so-called first Wiener chaos, the $L^2$-closure of $\operatorname{span}\{ X^i_t : t \in [0,T], 1 \le i \le d \}$, consisting of Gaussian random variables. We recall that if $h' = E(Z' X_\cdot)$ denotes another element in $\mathcal{H}$, the inner product $\langle h, h' \rangle_{\mathcal{H}} = E(ZZ')$ makes $\mathcal{H}$ a Hilbert space; $Z \mapsto h$ is an isometry between $\mathcal{W}^1$ and $\mathcal{H}$.
Example 11.1. (Brownian motion). Let $B$ be a $d$-dimensional Brownian motion, let $g \in L^2([0,T],\mathbb{R}^d)$, and set
\[ Z = \sum_{i=1}^d \int_0^T g^i_s\, dB^i_s \equiv \int_0^T \langle g, dB \rangle . \]
By Itô's isometry, $h^i_t := E(Z B^i_t) = \int_0^t g^i_s\, ds$ so that $\dot h = g$ and $\|h\|^2_{\mathcal{H}} := E(Z^2) = \int_0^T |g_s|^2\, ds = \|\dot h\|^2_{L^2}$, where $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^d$. Clearly, $h$ is of finite 1-variation, and its length is given by $\|\dot h\|_{L^1}$. On the other hand, Cauchy–Schwarz shows any $h \in \mathcal{H}$ is 1/2-Hölder which, in general, "only" implies 2-variation. The proposition below applies to Brownian motion with $\varrho = 1$, also recalling that $\|R\|_{1;[s,t]^2} = |t-s|$ in the Brownian motion case.

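Proposition 11.2. Assume the covariance $R : (s,t) \mapsto E(X_s \otimes X_t)$ is of finite $\varrho$-variation (in 2D sense) for $\varrho \in [1,\infty)$. Then $\mathcal{H}$ is continuously embedded in the space of continuous paths of finite $\varrho$-variation. More precisely, for all $h \in \mathcal{H}$ and all $s < t$ in $[0,T]$,
\[ \|h\|_{\varrho\text{-var};[s,t]} \le \|h\|_{\mathcal{H}} \sqrt{ \|R\|_{\varrho\text{-var};[s,t]^2} } . \]

Proof. We assume $X$, $h$ to be scalar. The extension to $d$-dimensional $X$ is straightforward (and even trivial when $X$ has independent components, which will always be the case for us). Let $h = E(Z X_\cdot)$. By scaling, we may assume without loss of generality that $\|h\|^2_{\mathcal{H}} := E(Z^2) = 1$. Let $(t_j)$ be a dissection of $[s,t]$. Let $\varrho'$ be the Hölder conjugate of $\varrho$. Using duality for $l^{\varrho}$-spaces, we have¹
\[ \Bigl( \sum_j \bigl| h_{t_j,t_{j+1}} \bigr|^{\varrho} \Bigr)^{1/\varrho} = \sup_{\beta,\, |\beta|_{l^{\varrho'}} \le 1} \sum_j \bigl\langle \beta_j, h_{t_j,t_{j+1}} \bigr\rangle = \sup_{\beta,\, |\beta|_{l^{\varrho'}} \le 1} E\Bigl( Z \sum_j \bigl\langle \beta_j, X_{t_j,t_{j+1}} \bigr\rangle \Bigr) \]
\[ \le \sup_{\beta,\, |\beta|_{l^{\varrho'}} \le 1} \sqrt{ \sum_{j,k} \bigl\langle \beta_j \otimes \beta_k, E\bigl( X_{t_j,t_{j+1}} \otimes X_{t_k,t_{k+1}} \bigr) \bigr\rangle } \le \sup_{\beta,\, |\beta|_{l^{\varrho'}} \le 1} \sqrt{ \Bigl( \sum_{j,k} |\beta_j|^{\varrho'} |\beta_k|^{\varrho'} \Bigr)^{\frac{1}{\varrho'}} \Bigl( \sum_{j,k} \bigl| E\bigl( X_{t_j,t_{j+1}} \otimes X_{t_k,t_{k+1}} \bigr) \bigr|^{\varrho} \Bigr)^{\frac{1}{\varrho}} } \]
\[ \le \Bigl( \sum_{j,k} \bigl| E\bigl( X_{t_j,t_{j+1}} \otimes X_{t_k,t_{k+1}} \bigr) \bigr|^{\varrho} \Bigr)^{1/(2\varrho)} \le \sqrt{ \|R\|_{\varrho\text{-var};[s,t]^2} } . \]
The proof is then completed by taking the supremum over all dissections $(t_j)$ of $[s,t]$. □

¹ The case $\varrho = 1$ may be seen directly by taking $\beta_j = \operatorname{sgn}\bigl( h_{t_j,t_{j+1}} \bigr)$.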
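
In the Brownian case of Example 11.1, where $\varrho = 1$ and $\|R\|_{1\text{-var};[s,t]^2} = |t-s|$, Proposition 11.2 reduces to $\int_s^t |\dot h_r|\, dr \le \|\dot h\|_{L^2[0,T]} \sqrt{t-s}$. The sketch below checks this with midpoint Riemann sums; the choice $g(r) = \cos(5r)$ and the function name `cameron_martin_check` are assumptions made only for illustration.

```python
import math

def cameron_martin_check(g, s, t, T=1.0, n=2000):
    """Return (1-variation of h on [s,t], ||h||_H * sqrt(t - s)) for h with derivative g.

    Both quantities are approximated by midpoint Riemann sums on a uniform grid of [0, T].
    """
    dt = T / n
    mids = [(k + 0.5) * dt for k in range(n)]
    h_norm = math.sqrt(sum(g(r) ** 2 for r in mids) * dt)       # ||h||_H = ||g||_{L^2[0,T]}
    length = sum(abs(g(r)) for r in mids if s <= r <= t) * dt   # approximates int_s^t |g_r| dr
    return length, h_norm * math.sqrt(t - s)

g = lambda r: math.cos(5.0 * r)   # hypothetical Cameron-Martin derivative
lhs, rhs = cameron_martin_check(g, 0.2, 0.7)
print(lhs, rhs)
```

With grid-aligned endpoints the discrete inequality is an instance of Cauchy–Schwarz for finite sums, so it holds up to rounding.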
Remark 11.3. It is typical (e.g. for Brownian or fractional Brownian motion, with $\varrho = 1/(2H) \ge 1$) that
\[ \forall\, s < t \text{ in } [0,T] : \quad \|R\|_{\varrho\text{-var};[s,t]^2} \le M|t-s|^{1/\varrho} . \]
In such a situation, Proposition 11.2 implies that
\[ |h_{s,t}| \le \|h\|_{\varrho\text{-var};[s,t]} \le \|h\|_{\mathcal{H}} M^{1/2} |t-s|^{1/(2\varrho)} , \]
which tells us that $\mathcal{H}$ is continuously embedded in the space of $1/(2\varrho)$-Hölder continuous paths (which can also be seen directly from $h_{s,t} = E(Z X_{s,t})$ and Cauchy–Schwarz). The point is that $1/(2\varrho)$-Hölder only implies $2\varrho$-variation regularity, in contrast to the sharper result of Proposition 11.2.
In part i) of the following lemma we allow $\mathbf{X} = (X,\mathbb{X})$ to be a (continuous) rough path of finite $p$-variation rather than of $\alpha$-Hölder regularity. More formally, we write $\mathbf{X} \in \mathscr{C}^{p\text{-var}}$ when $p \in [2,3)$ and the analytic Hölder type condition (2.3) in the definition of a rough path is replaced by
\[ \|X\|_{p\text{-var}} := \Bigl( \sup_{\mathcal{P}} \sum_{[s,t] \in \mathcal{P}} |X_{s,t}|^p \Bigr)^{1/p} < \infty , \qquad \|\mathbb{X}\|_{p/2\text{-var}} := \Bigl( \sup_{\mathcal{P}} \sum_{[s,t] \in \mathcal{P}} |\mathbb{X}_{s,t}|^{p/2} \Bigr)^{2/p} < \infty . \qquad (11.3) \]
The homogeneous $p$-variation rough path norm over $[0,T]$ is then given by
\[ |||\mathbf{X}|||_{p\text{-var};[0,T]} := |||\mathbf{X}|||_{p\text{-var}} := \|X\|_{p\text{-var}} + \sqrt{ \|\mathbb{X}\|_{p/2\text{-var}} } . \qquad (11.4) \]
Of course, a geometric rough path of finite $p$-variation, $\mathbf{X} \in \mathscr{C}_g^{p\text{-var}}$, is one for which the "first order calculus" condition (2.5) holds.
The following results will prove crucial in Section 11.2 where we will derive, based on the Gaussian isoperimetric inequality, good probabilistic estimates on Gaussian rough path objects. They are equally crucial for developing the Malliavin calculus for (Gaussian) rough differential equations in Section 11.3.

Recall from Exercise 2.19 that the translation of a rough path $\mathbf{X} = (X,\mathbb{X})$ in direction $h$ is given by
\[ T_h(\mathbf{X}) := \bigl( X^h, \mathbb{X}^h \bigr) \qquad (11.5) \]
where $X^h := X + h$ and
\[ \mathbb{X}^h_{s,t} := \mathbb{X}_{s,t} + \int_s^t h_{s,r} \otimes dX_r + \int_s^t X_{s,r} \otimes dh_r + \int_s^t h_{s,r} \otimes dh_r , \qquad (11.6) \]
provided that $h$ is sufficiently regular to make the final three integrals above well-defined.
Lemma 11.4. i) Let $\mathbf{X} \in \mathscr{C}_g^{p\text{-var}}([0,T],\mathbb{R}^d)$, with $p \in [2,3)$ and consider a function $h \in C^{q\text{-var}}([0,T],\mathbb{R}^d)$ with complementary Young regularity in the sense that
\[ 1/p + 1/q > 1 . \]
Then the translation of $\mathbf{X}$ in direction $h$ is well-defined in the sense that the integrals appearing in (11.6) are well-defined Young integrals and $T_h : \mathbf{X} \mapsto T_h(\mathbf{X})$ maps $\mathscr{C}_g^{p\text{-var}}([0,T],\mathbb{R}^d)$ into itself. Moreover, one has the estimate, for some constant $C = C(p,q)$,
\[ |||T_h(\mathbf{X})|||_{p\text{-var}} \le C \bigl( |||\mathbf{X}|||_{p\text{-var}} + \|h\|_{q\text{-var}} \bigr) . \]
ii) Similarly, let $\alpha = 1/p \in (\frac13, \frac12]$, $\mathbf{X} \in \mathscr{C}_g^{\alpha}([0,T],\mathbb{R}^d)$ and $h : [0,T] \to \mathbb{R}^d$ again of complementary Young regularity, but now "respectful" of $\alpha$-Hölder regularity in the sense that²
\[ \|h\|_{q\text{-var};[s,t]} \le K|t-s|^{\alpha} , \qquad (11.7) \]
uniformly in $0 \le s < t \le T$. Write $\|h\|_{q,\alpha}$ for the smallest constant $K$ in the bound (11.7). Then again $T_h$ is well-defined and now maps $\mathscr{C}_g^{\alpha}([0,T],\mathbb{R}^d)$ into itself. Moreover, one has the estimate, again with $C = C(p,q)$,
\[ |||T_h(\mathbf{X})|||_{\alpha} \le C \bigl( |||\mathbf{X}|||_{\alpha} + \|h\|_{q,\alpha} \bigr) . \]

Proof. This is essentially a consequence of Young's inequality which gives
\[ \Bigl| \int_s^t h_{s,r} \otimes dX_r \Bigr| \le C \|h\|_{q\text{-var};[s,t]} \|X\|_{p\text{-var};[s,t]} , \]
and then similar estimates for the other (Young) integrals appearing in the definition of $\mathbb{X}^h$. One then uses elementary estimates of the form $\sqrt{ab} \le a + b$ (for non-negative reals $a, b$), in view of the definition of homogeneous norm (which involves $\mathbb{X}^h$ with a square root). Details are left to the reader. □

² From Remark 11.3, $\|h\|_{\varrho,\alpha} \lesssim \|h\|_{\mathcal{H}}$ for all $\alpha \le \frac{1}{2\varrho}$.
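
For smooth paths the translation formula (11.6) is nothing but bilinearity of the iterated integral, which can be seen in a small discrete experiment. The sketch below is illustrative only: it uses left-point Riemann sums on a uniform grid, two hypothetical smooth paths $X$ (a circle) and $h$ (a parabola-like curve) in $\mathbb{R}^2$, and hypothetical helper names. It compares the second level of $X + h$ with the right-hand side of (11.6) over $[0,1]$.

```python
import math

def second_level(x, y):
    """Left-point Riemann sum for int x_{0,r} (tensor) dy_r; x and y are lists of d-vectors."""
    d = len(x[0])
    out = [[0.0] * d for _ in range(d)]
    for k in range(len(x) - 1):
        for i in range(d):
            for j in range(d):
                out[i][j] += (x[k][i] - x[0][i]) * (y[k + 1][j] - y[k][j])
    return out

def translate_second_level(x, h):
    """Second level of T_h(X) following (11.6), with all four integrals as Riemann sums."""
    parts = [second_level(x, x), second_level(h, x), second_level(x, h), second_level(h, h)]
    d = len(x[0])
    return [[sum(p[i][j] for p in parts) for j in range(d)] for i in range(d)]

n = 500
ts = [k / n for k in range(n + 1)]
X = [(math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)) for t in ts]   # smooth sample path
h = [(t * t, t) for t in ts]                                               # smooth direction
Xh = [(a[0] + b[0], a[1] + b[1]) for a, b in zip(X, h)]

lhs = translate_second_level(X, h)   # right-hand side of (11.6)
rhs = second_level(Xh, Xh)           # iterated integral of X + h computed directly
print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2)))
```

The content of Lemma 11.4 is that the same identity survives for genuinely rough $X$, provided the cross-integrals make sense as Young integrals; the discrete check above does not test that analytic part.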

By combining the Cameron–Martin regularity established in Proposition 11.2, see also Remark 11.3, with the previous lemma we obtain the following result.

Theorem 11.5. Assume $(X_t : 0 \le t \le T)$ is a continuous $d$-dimensional, centred Gaussian process with independent components and covariance $R$ such that there exist $\varrho \in [1,\frac32)$ and $M < \infty$ such that for every $i \in \{1,\dots,d\}$ and $0 \le s \le t \le T$,
\[ \|R_{X^i}\|_{\varrho\text{-var};[s,t]^2} \le M|t-s|^{1/\varrho} . \]
Let $\alpha \in (\frac13, \frac{1}{2\varrho})$ and $\mathbf{X} = (X,\mathbb{X}) \in \mathscr{C}^{\alpha}([0,T],\mathbb{R}^d)$ a.s. be the random Gaussian rough path constructed in Theorem 10.4. Then there exists a null set $N$ such that for every $\omega \in N^c$ and every $h \in \mathcal{H}$,
\[ T_h(\mathbf{X}(\omega)) = \mathbf{X}(\omega + h) . \]

Proof. Note that complementary Young regularity holds, with $p = \frac1\alpha < 3$ and $q = \varrho < \frac32$, as is seen from $\frac1p + \frac1q > \frac13 + \frac23 = 1$. As a consequence of Lemma 11.4, the translation $T_h(\mathbf{X}(\omega))$ is well-defined whenever $\mathbf{X}(\omega) \in \mathscr{C}^{\alpha}$. The proof requires a close look at the precise construction of $\mathbf{X}(\omega) = (X(\omega),\mathbb{X}(\omega))$ in Theorem 10.4, using Kolmogorov's criterion to build a suitable (continuous, and then Hölder) modification from $\mathbb{X}$ restricted to dyadic times. We recall that $X(\omega) = \omega \in C([0,T],\mathbb{R}^d)$. Let $N_1$ be the null set of $\omega$ where $X(\omega)$ fails to be of $\alpha$-Hölder (or $p$-variation) regularity. Note that $\omega \in N_1^c$ implies $\omega + h \in N_1^c$ for all $h \in \mathcal{H}$. By the very construction of $\mathbb{X}_{s,t}$ as an $L^2$-limit, for fixed $s, t$ there exists a sequence of partitions $(\mathcal{P}^m)$ of $[s,t]$ such that $\mathbb{X}_{s,t}(\omega) = \lim_m \int_{\mathcal{P}^m} X \otimes dX$ exists for a.e. $\omega$, and we write $N_{2;s,t}$ for the null set on which this fails. The union of all these, for dyadic times $s, t$, is again a null set, denoted by $N_2$. Now take $\omega \in N_1^c \cap N_2^c$. For fixed dyadic $s, t$, consider the aforementioned partitions $(\mathcal{P}^m)$ and note
\[ \int_{\mathcal{P}^m} X(\omega + h) \otimes dX(\omega + h) = \int_{\mathcal{P}^m} X(\omega) \otimes dX(\omega) + \int_{\mathcal{P}^m} h \otimes dX + \int_{\mathcal{P}^m} X \otimes dh + \int_{\mathcal{P}^m} h \otimes dh . \]
Thanks to $\omega \in N_1^c$ and Proposition 11.2, $X(\omega)$ and $h$ have complementary Young regularities, which guarantees convergence of the last three integrals to their respective Young integrals. On the other hand, $\omega \in N_2^c$ guarantees that $\int_{\mathcal{P}^m} X(\omega) \otimes dX(\omega) \to \mathbb{X}_{s,t}(\omega)$. This shows that the left hand side converges, the limit being by definition $\mathbb{X}(\omega + h)$. In other words, for all $\omega \in N_1^c \cap N_2^c$, $h \in \mathcal{H}$ and dyadic times $s, t$,
\[ T_h(\mathbf{X}(\omega))_{s,t} = \mathbf{X}(\omega + h)_{s,t} . \]
The construction of $\mathbb{X}_{s,t}$ for non-dyadic times was obtained by continuity (see Theorem 10.4) and the above almost-sure identity remains valid. □
154 11 Cameron–Martin regularity and applications

11.2 Concentration of measure

11.2.1 Borell’s inequality

Let us first recall a remarkable isoperimetric inequality for Gaussian measures.

Following [Led96], we state it in the form due to C. Borell [Bor75], but an essentially
equivalent result was obtained independently by Sudakov and Tsirelson [SC74].
In order to state things in their natural generality, we consider in this section an
abstract Wiener space $(E, \mathcal{H}, \mu)$. The reader may have in mind the Banach space $E = C([0,T], \mathbb{R}^d)$, equipped with norm $\|x\|_E := \sup_{0 \le t \le T} |x_t|$ and a Gaussian measure $\mu$, the law of a $d$-dimensional, continuous centred Gaussian process $X$. In this example, the Cameron–Martin space is given by $\mathcal{H} = \{\mathbb{E}(X_\cdot Z) : Z \in \mathcal{W}^1\}$ with $\|h\|_{\mathcal{H}} = \mathbb{E}(Z^2)^{1/2}$ for $h = \mathbb{E}(X_\cdot Z)$. Let us write
\[ \Phi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-x^2/2} \, dx \]
for the cumulative distribution function of a standard Gaussian, noting the elementary tail estimate
\[ \bar\Phi(y) := 1 - \Phi(y) \le \exp(-y^2/2) , \qquad y \ge 0 . \]

Theorem 11.6 (Borell’s inequality). Let $(E, \mathcal{H}, \mu)$ be an abstract Wiener space and $A \subset E$ a measurable Borel set with $\mu(A) > 0$ so that
\[ \hat a := \Phi^{-1}(\mu(A)) \in (-\infty, \infty] . \]
Then, if $\mathcal{K}$ denotes the unit ball in $\mathcal{H}$, for every $r \ge 0$,
\[ \mu((A + r\mathcal{K})^c) \le \bar\Phi(\hat a + r) , \]
where $A + r\mathcal{K} = \{x + rh : x \in A, h \in \mathcal{K}\}$ is the so-called Minkowski sum.³

Theorem 11.7 (Generalized Fernique Theorem). Let a, σ ∈ (0, ∞) and consider

measurable maps f, g : E → [0, ∞] such that

Aa = {x : g(x) ≤ a}

has (strictly) positive µ measure4 and set

â := Φ−1 (µ(Aa )) ∈ (−∞, ∞].

Assume furthermore that there exists a null-set N such that for all x ∈ N c , h ∈ H :

f (x) ≤ g(x − h) + σkhkH . (11.8)

³ Measurability is a delicate matter but circumventable by reading µ as outer measure; [Led96].
⁴ Unless g = +∞ almost surely, this holds true for sufficiently large a.

Then f has a Gaussian tail. More precisely, for all r > a and with ā := â − a/σ,

µ({x : f (x) > r}) ≤ Φ̄(ā + r/σ).

Proof. Note that µ(Aa ) > 0 implies â = Φ−1 (µ(Aa )) > −∞. We have for all
x ∉ N and arbitrary r, M > 0 and h ∈ rK,

{x : f (x) ≤ M } ⊃ {x : g(x − h) + σkhkH ≤ M }

⊃ {x : g(x − h) + σr ≤ M }
= {x + h : g(x) ≤ M − σr}.

Since h ∈ rK was arbitrary, this immediately implies the inclusion

[
{x : f (x) ≤ M } ⊃ {x + h : g(x) ≤ M − σr}
h∈rK
= {x : g(x) ≤ M − σr} + rK ,

and we see that

µ(f (x) ≤ M ) ≥ µ({x : g(x) ≤ M − σr} + rK) .

Setting $M = \sigma r + a$ and $A := \{x : g(x) \le a\}$, it then follows from Borell’s inequality that
\[ \mu(f(x) > \sigma r + a) \le \mu((A + r\mathcal{K})^c) \le \bar\Phi(\hat a + r) . \]
It then suffices to rewrite the estimate in terms of $\tilde r := \sigma r + a > a$, noting that $\hat a + r = \bar a + \tilde r / \sigma$. $\square$

Example 11.8 (Classical Fernique estimate). Take $f(x) = g(x) = \|x\|_E$. Then the assumptions of the generalized Fernique Theorem are satisfied with $\sigma$ equal to the operator norm of the continuous embedding $\mathcal{H} \hookrightarrow E$. This applies in particular to Wiener measure on $C([0,T], \mathbb{R}^d)$.
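As a sanity check of Theorem 11.7 in the simplest possible setting, the following Python sketch takes the one-dimensional Wiener space E = H = R with µ the standard Gaussian law and f(x) = g(x) = |x|, so that (11.8) holds with σ = 1 by the triangle inequality. The helper names (`tail`, `fernique_bound`, `exact_tail`) and the choice a = 1 are illustrative assumptions and not part of the text; the script only compares the exact tail 2Φ̄(r) with the bound Φ̄(ā + r/σ) on a grid of r > a, and checks the elementary estimate Φ̄(y) ≤ exp(−y²/2).

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard Gaussian N(0, 1), playing the role of mu on E = R


def tail(y):
    """Phi-bar(y) = 1 - Phi(y)."""
    return 1.0 - STD.cdf(y)


def fernique_bound(a, sigma, r):
    """Right-hand side Phi-bar(a_bar + r / sigma) of Theorem 11.7 for g(x) = |x|."""
    mass = STD.cdf(a) - STD.cdf(-a)   # mu(A_a) with A_a = {|x| <= a}
    a_hat = STD.inv_cdf(mass)         # a_hat = Phi^{-1}(mu(A_a))
    a_bar = a_hat - a / sigma
    return tail(a_bar + r / sigma)


def exact_tail(r):
    """mu(|x| > r) for a standard Gaussian."""
    return 2.0 * tail(r)


a, sigma = 1.0, 1.0                              # illustrative choices
radii = [a + 0.1 * k for k in range(1, 41)]      # grid of r in (a, a + 4]
bound_holds = all(exact_tail(r) <= fernique_bound(a, sigma, r) for r in radii)

# Elementary tail estimate Phi-bar(y) <= exp(-y^2 / 2) for y >= 0.
ys = [0.1 * k for k in range(0, 51)]
gaussian_tail_ok = all(tail(y) <= math.exp(-y * y / 2) for y in ys)
```

In this toy case the bound is an equality at r = a and becomes strict for r > a, which reflects the fact that half-spaces, not symmetric intervals, are extremal in Borell’s inequality.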

11.2.2 Fernique theorem for Gaussian rough paths

Theorem 11.9. Let $(X_t : 0 \le t \le T)$ be a $d$-dimensional, centred Gaussian process with independent components and covariance $R$ such that there exists $\varrho \in [1, \frac32)$ and $M < \infty$ such that for every $i \in \{1, \dots, d\}$ and $0 \le s \le t \le T$,
\[ \|R_{X^i}\|_{\varrho\text{-var};[s,t]^2} \le M |t - s|^{1/\varrho} . \]
Then, for any $\alpha \in (\frac13, \frac{1}{2\varrho})$, the associated rough path $\mathbf{X} = (X, \mathbb{X}) \in \mathcal{C}^\alpha_g$ built in Theorem 10.4 is such that there exists $\eta = \eta(M, T, \alpha, \varrho)$ with
\[ \mathbb{E} \exp\big( \eta \, |||\mathbf{X}|||_\alpha^2 \big) < \infty . \tag{11.9} \]

Remark 11.10. Recall that the homogeneous “norm” $|||\mathbf{X}|||_\alpha$ was defined in (2.4) as the sum of $\|X\|_\alpha$ and $\sqrt{\|\mathbb{X}\|_{2\alpha}}$. Since $\mathbb{X}$ is “quadratic” in $X$ (more precisely: in the second Wiener–Itô chaos), the square root is crucial for the Gaussian estimate (11.9) to hold.
Proof. Combining Theorem 11.5 with Lemma 11.4 and Proposition 11.2 shows that for a.e. $\omega$ and all $h \in \mathcal{H}$
\[ |||\mathbf{X}(\omega)|||_\alpha \le C \big( |||\mathbf{X}(\omega - h)|||_\alpha + M^{1/2} \|h\|_{\mathcal{H}} \big) . \]
We can thus apply the generalized Fernique Theorem with $f(\omega) = |||\mathbf{X}|||_\alpha(\omega)$ and $g(\omega) = C f(\omega)$, noting that $|||\mathbf{X}|||_\alpha(\omega) < \infty$ almost surely implies that
\[ A_a \overset{\text{def}}{=} \{x : g(x) \le a\} \]
has positive probability for $a$ large enough (and in fact, any $a > 0$ thanks to a support theorem for Gaussian rough paths, [FV10b]). Gaussian integrability of the homogeneous rough path norm, for a fixed Gaussian rough path $\mathbf{X}$, is thus established. The claimed uniformity, $\eta = \eta(M, T, \alpha, \varrho)$ and not depending on the particular $\mathbf{X}$ under consideration, requires an additional argument. We need to make sure that $\mu(A_a)$ is uniformly positive over all $\mathbf{X}$ with given bounds on the parameters (in particular $M, \varrho, a, d$); but this is easy, using (10.16),
\[ \mu(|||\mathbf{X}|||_\alpha \le a) \ge 1 - \frac{1}{a^2} \mathbb{E} |||\mathbf{X}|||_\alpha^2 \ge 1 - \frac{1}{a^2} C , \]
where $C = C(M, \varrho, \alpha, d)$ and so, say, $a = \sqrt{2C}$ would do. $\square$

11.2.3 Integrability of rough integrals and related topics

The price of a pathwise integration / SDE theory is that all estimates (have to) deal
with the worst possible scenario. To wit, given $\mathbf{X} = (X, \mathbb{X}) \in \mathcal{C}^\alpha_g$ and a nice 1-form, $F \in C_b^2$ say, we had the estimate
\[ \Big| \int_0^T F(X) \, d\mathbf{X} \Big| \le C \big( |||\mathbf{X}|||_{\alpha;[0,T]} \vee |||\mathbf{X}|||_{\alpha;[0,T]}^{1/\alpha} \big) , \]
where $C$ may depend on $F$, $T$ and $\alpha \in (\frac13, \frac12]$. In terms of $p$-variation, $p = 1/\alpha$, one can show similarly, with $|||\mathbf{X}|||_{p\text{-var};[0,T]}$ as introduced earlier, cf. (11.4),
\[ \Big| \int_0^T F(X) \, d\mathbf{X} \Big| \le C \big( |||\mathbf{X}|||_{p\text{-var};[0,T]} \vee |||\mathbf{X}|||_{p\text{-var};[0,T]}^{p} \big) , \tag{11.10} \]
where $C$ depends on $F$ and $\alpha \in (\frac13, \frac12]$ but not on $T$, thanks to invariance under reparametrisation. For the same reason, the integration domain $[0,T]$ in (11.10) may be replaced by any other interval.

Example 11.11. The estimate (11.10) is sharp, at least when $p = 1/\alpha = 2$, in the following sense. Consider the (“pure-area”) rough path given by
\[ t \mapsto (0, At) , \qquad A = \begin{pmatrix} 0 & c \\ -c & 0 \end{pmatrix} , \]
for some $c > 0$. The homogeneous ($p$-variation, or $\alpha$-Hölder) rough path norm here scales with $c^{1/2}$. Hence, the right-hand side of (11.10) scales like $c$ (for $c$ large), as does the left-hand side which in fact is given by $T |DF(0)A|$.

The “trouble”, in Brownian (% = 1) or worse (% > 1) regimes of Gaussian rough

paths is that, despite Gaussian tails of the random variable |||X(ω)|||α , established
in Theorem 11.9, the above estimate (11.10) fails to deliver Gaussian, or even
exponential, integrability of the “random” rough integral
\[ Z(\omega) \overset{\text{def}}{=} \int_0^T F(X(\omega)) \, d\mathbf{X}(\omega) , \]

something which is rather straightforward in the context of (Itô or Stratonovich)

stochastic integration against Brownian motion.
As we shall now see, Borell’s inequality, in the manifestation of our generalized
Fernique estimate, allows us to fully close this “gap” between integrability properties.
The key idea, due to Cass–Litterer–Lyons [CLL13] is to define, for a fixed rough path
X of finite homogeneous p-variation in the sense of (11.4), a tailor-made partition5
of [0, T ], say
P = {[τi , τi+1 ] : i = 0, . . . , N }
with the property that for all i < N

|||X|||p-var;[τi ,τi+1 ] = 1,

i.e. for all but the very last interval for which one has |||X|||p-var;[τN ,τN +1 ] ≤ 1. One
can then exploit rough path estimates such as (11.10) on (small) intervals [τi , τi+1 ]
on which estimates are linear in |||X|||p-var ∼ 1. The problem of estimating rough
integrals is thus reduced to estimating N = N (X) and it was a key technical result
in [CLL13] to use Borell’s inequality to establish good (probabilistic) estimates on
N when X = X(ω) is a Gaussian rough path. (Our proof below is different from
[CLL13] and makes good use of the generalized Fernique estimate.)
To formalize this construction, we fix a (1D) control function $w = w(s,t)$, i.e. a map on $\{0 \le s \le t \le T\}$ that is super-additive, continuous and zero on the diagonal.⁶ The canonical example of a control in this context is⁷
\[ w_{\mathbf{X}}(s,t) = |||\mathbf{X}|||^p_{p\text{-var};[s,t]} . \]

⁵ The construction is purely deterministic. Of course, when $\mathbf{X} = \mathbf{X}(\omega)$ is random, then so is the partition.

Thanks to continuity of $w = w_{\mathbf{X}}$ we can then define a partition tailor-made for $\mathbf{X}$ based on eating up unit ($\beta = 1$ below) pieces of $p$-variation as follows. Set
\[ \tau_0 = 0 , \qquad \tau_{i+1} = \inf\{t : w(\tau_i, t) \ge \beta, \ \tau_i < t \le T\} \wedge T , \tag{11.11} \]
so that $w(\tau_i, \tau_{i+1}) = \beta$ for all $i < N$, while $w(\tau_N, \tau_{N+1}) \le \beta$, where $N$ is given by
\[ N(w) \equiv N_\beta(w; [0,T]) := \sup\{i \ge 0 : \tau_i < T\} . \]
As an immediate consequence of super-additivity of controls,
\[ \beta N_\beta(w; [0,T]) = \sum_{i=0}^{N-1} w(\tau_i, \tau_{i+1}) \le w(0, \tau_N) \le w(0, \tau_{N+1}) = w(0, T) . \]
Note also that $N$ is monotone in $w$, i.e. $w \le \tilde w$ implies $N(w) \le N(\tilde w)$. At last, let us set $N(\mathbf{X}) = N(w_{\mathbf{X}})$. The following (purely deterministic) lemma is most naturally stated in variation regularity.
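The greedy construction (11.11) is easy to mimic on a finite grid. The sketch below is a discrete analogue under stated assumptions: the control is passed as a hypothetical callable `w(s, t)`, the infimum is taken over grid points only (so that w(τ_i, τ_{i+1}) ≥ β rather than = β on the full intervals), and the example control w(s, t) = ((t − s)/100)² is chosen purely for illustration. Super-additivity still yields βN ≤ w(0, T).

```python
def greedy_partition(w, grid, beta=1.0):
    """Discrete analogue of (11.11).

    tau_0 = grid[0]; tau_{i+1} is the first grid point t > tau_i with
    w(tau_i, t) >= beta, or the final grid point T if there is none.
    Returns [tau_0, ..., tau_{N+1}] with tau_{N+1} = T.
    """
    taus = [grid[0]]
    k = 0                                   # grid index of the current tau_i
    while taus[-1] < grid[-1]:
        nxt = len(grid) - 1                 # default: cap at T
        for j in range(k + 1, len(grid)):
            if w(grid[k], grid[j]) >= beta:
                nxt = j
                break
        taus.append(grid[nxt])
        k = nxt
    return taus


def count_full_intervals(taus):
    """N = sup{i >= 0 : tau_i < T}, given taus = [tau_0, ..., tau_{N+1} = T]."""
    return len(taus) - 2


def example_control(s, t):
    """Illustrative super-additive control (a convex power of t - s)."""
    return ((t - s) / 100) ** 2


grid = list(range(0, 1001))                 # integer grid on [0, T] with T = 1000
taus = greedy_partition(example_control, grid, beta=1.0)
N = count_full_intervals(taus)
```

With this particular control the partition points are the multiples of 100, so N = 9, far below the crude bound w(0, T) = 100; the gap reflects how much super-additivity can lose for a strictly convex control.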

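Lemma 11.12. Assume $\mathbf{X} \in \mathcal{C}^{p\text{-var}}_g$, $p \in [2,3)$, and $h \in C^{q\text{-var}}$, $q \ge 1$, of complementary Young regularity in the sense that $\frac1p + \frac1q > 1$. Then there exists $C = C(p,q)$ so that
\[ N_1(\mathbf{X}; [0,T])^{\frac1q} \le C \Big( |||T_{-h}(\mathbf{X})|||^{\frac pq}_{p\text{-var};[0,T]} + \|h\|_{q\text{-var};[0,T]} \Big) . \tag{11.12} \]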

Proof. (Riedel) It is easy to see that all $N_\beta$, $N_{\beta'}$, with $\beta, \beta' > 0$ are comparable; it is therefore enough to prove the lemma for some fixed $\beta > 0$.
Given $h \in C^{q\text{-var}}$, $w_h(s,t) = \|h\|^q_{q\text{-var};[s,t]}$ is a control and so is $w_h^\theta$ whenever $\theta \ge 1$. (Noting $1 \le q \le p$, we shall use this fact with $\theta = p/q$.) From Lemma 11.4 we have, for any interval $I$,
\[ |||T_h \mathbf{X}|||_{p\text{-var};I} \lesssim |||\mathbf{X}|||_{p\text{-var};I} + \|h\|_{q\text{-var};I} . \]
Raise everything to the $p$th power to see that
\[ (s,t) \mapsto |||T_h \mathbf{X}|||^p_{p\text{-var};[s,t]} \le C \big( |||\mathbf{X}|||^p_{p\text{-var};[s,t]} + \|h\|^p_{q\text{-var};[s,t]} \big) =: C \tilde w(s,t) , \]
where $C = C(p,q)$ and $\tilde w$ is a control. Choose $\beta = C$. By monotonicity of $N_\beta$ in the control,
\[ N_\beta(T_h \mathbf{X}; [0,T]) \le N_\beta(C \tilde w; [0,T]) = N_1(\tilde w; [0,T]) . \]
By definition, $\tilde N := N_1(\tilde w; [0,T])$ is the number of consecutive intervals $[\tau_i, \tau_{i+1}]$ for which
\[ 1 = \tilde w(\tau_i, \tau_{i+1}) = |||\mathbf{X}|||^p_{p\text{-var};[\tau_i,\tau_{i+1}]} + \|h\|^p_{q\text{-var};[\tau_i,\tau_{i+1}]} . \]
Using the manifest estimate $\|h\|^p_{q\text{-var};[\tau_i,\tau_{i+1}]} \le 1$ and $q/p \le 1$ we have
\[ 1 \le |||\mathbf{X}|||^p_{p\text{-var};[\tau_i,\tau_{i+1}]} + \|h\|^q_{q\text{-var};[\tau_i,\tau_{i+1}]} = w_{\mathbf{X}}(\tau_i, \tau_{i+1}) + w_h(\tau_i, \tau_{i+1}) \]
for $0 \le i < \tilde N$. Summation over $i$ yields
\[ \tilde N \le w_{\mathbf{X}}(0, \tau_{\tilde N}) + w_h(0, \tau_{\tilde N}) \le |||\mathbf{X}|||^p_{p\text{-var};[0,T]} + \|h\|^q_{q\text{-var};[0,T]} . \]
Combination of these estimates hence shows that
\[ N_\beta(T_h \mathbf{X}; [0,T]) \le |||\mathbf{X}|||^p_{p\text{-var};[0,T]} + \|h\|^q_{q\text{-var};[0,T]} . \]
Replace $\mathbf{X} = T_h T_{-h} \mathbf{X}$ by $T_{-h} \mathbf{X}$ and then use elementary estimates of the type $(a + b)^{1/q} \le a^{1/q} + b^{1/q}$ for non-negative reals $a, b$, to obtain the claimed estimate (11.12). $\square$

⁶ Do not confuse a control $w$ with “randomness” $\omega$.
⁷ Super-additivity, i.e. $w(s,t) + w(t,u) \le w(s,u)$ whenever $s \le t \le u$, is immediate, but continuity is non-trivial, see e.g. [FV10b, Prop. 5.8].
The previous lemma, combined with variation regularity of Cameron–Martin
paths (Proposition 11.2) and the generalized Fernique Theorem 11.7 then gives
immediately
Theorem 11.13 (Cass–Litterer–Lyons). Let $\mathbf{X} = (X, \mathbb{X}) \in \mathcal{C}^\alpha_g$ a.s. be a Gaussian rough path, as in Theorem 11.9. (In particular, the covariance is assumed to have finite 2D $\varrho$-variation.) Then the integer-valued random variable
\[ N(\omega) := N_1(\mathbf{X}(\omega); [0,T]) \]
has a Weibull tail with shape parameter $2/\varrho$ (by which we mean that $N^{1/\varrho}$ has a Gaussian tail).
Let us quickly illustrate how to use the above estimate.
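Corollary 11.14. Let $\mathbf{X}$ be as in the previous theorem and assume $F \in C_b^2$. Then the random rough integral
\[ Z(\omega) \overset{\text{def}}{=} \int_0^T F(X(\omega)) \, d\mathbf{X}(\omega) \]
has a Weibull tail with shape parameter $2/\varrho$, by which we mean that $|Z|^{1/\varrho}$ has a Gaussian tail.
Proof. Let $(\tau_i)$ be the (random) partition associated to the $p$-variation of $\mathbf{X}(\omega)$ as defined in (11.11), with $\beta = 1$ and $w = w_{\mathbf{X}}$. Thanks to (11.10) we may estimate
\begin{align*}
\Big| \int_0^T F(X(\omega)) \, d\mathbf{X}(\omega) \Big| &\le \sum_{[\tau_i, \tau_{i+1}] \in \mathcal{P}} \Big| \int_{\tau_i}^{\tau_{i+1}} F(X(\omega)) \, d\mathbf{X}(\omega) \Big| \\
&\lesssim (N(\omega) + 1) \sup_i \big( |||\mathbf{X}|||_{p\text{-var};[\tau_i,\tau_{i+1}]} \vee |||\mathbf{X}|||^p_{p\text{-var};[\tau_i,\tau_{i+1}]} \big) \\
&= N(\omega) + 1 ,
\end{align*}
where the proportionality constant may depend on $F$, $T$ and $\alpha \in (\frac13, \frac{1}{2\varrho})$. The claim now follows from Theorem 11.13. $\square$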

11.3 Malliavin calculus for rough differential equations

In this section, we assume that the reader is already familiar with the basics of
Malliavin calculus as exposed for example in the monographs [Mal97, Nua06].

11.3.1 Bouleau–Hirsch criterion and Hörmander’s theorem

Consider some abstract Wiener space $(W, \mathcal{H}, \mu)$ and a Wiener functional of the form $F : W \to \mathbb{R}^e$. In the context of stochastic or rough differential equations (driven by Gaussian signals), the Banach space $W$ is of the form $C([0,T], \mathbb{R}^d)$ where $\mu$ describes the statistics of the driving noise. If $F$ denotes the solution to a stochastic differential equation at some time $t \in (0,T]$, then, in general, $F$ is not a continuous, let alone Fréchet regular, function of the driving path. However, as we will see in this section, it can be the case that for $\mu$-almost every $\omega$, the map $\mathcal{H} \ni h \mapsto F(\omega + h)$, i.e. $F(\omega + \cdot)$ restricted to the Cameron–Martin space $(\mathcal{H}, \langle \cdot, \cdot \rangle)$, is Fréchet differentiable. (This implies $\mathbb{D}^{1,p}_{\mathrm{loc}}$-regularity, based on the commonly used Shigekawa Sobolev space $\mathbb{D}^{1,p}$; our notation here follows [Mal97] or [Nua06, Sec. 1.2, 1.3.4].) More precisely, we introduce the following notion, see for example [Nua06, Sec. 4.1.3]:

Definition 11.15. Given an abstract Wiener space $(W, \mathcal{H}, \mu)$, a random variable $F : W \to \mathbb{R}$ is said to be continuously $\mathcal{H}$-differentiable, in symbols $F \in C^1_{\mathcal{H}}$, if for $\mu$-almost every $\omega$, the map
\[ \mathcal{H} \ni h \mapsto F(\omega + h) \]
is continuously Fréchet differentiable. A vector-valued random variable is said to be in $C^1_{\mathcal{H}}$ if this is the case for each of its components. In particular, $\mu$-almost surely, $DF(\omega) = (DF^1(\omega), \dots, DF^e(\omega))$ is a linear bounded map from $\mathcal{H}$ to $\mathbb{R}^e$.

Given an $\mathbb{R}^e$-valued random variable $F$ in $C^1_{\mathcal{H}}$, we define the Malliavin covariance matrix
\[ \mathcal{M}_{ij}(\omega) \overset{\text{def}}{=} \big\langle DF^i(\omega), DF^j(\omega) \big\rangle_{\mathcal{H}} . \tag{11.13} \]

The following well known criterion of Bouleau–Hirsch, see [BH91, Thm 5.2.2] and
[Nua06, Sec. 1.2, 1.3.4] then provides a condition under which the law of F has a
density with respect to Lebesgue measure:

Theorem 11.16. Let $(W, \mathcal{H}, \mu)$ be an abstract Wiener space and let $F$ be an $\mathbb{R}^e$-valued random variable in $C^1_{\mathcal{H}}$. If the associated Malliavin matrix $\mathcal{M}$ is invertible $\mu$-almost surely, then the law of $F$ has a density with respect to Lebesgue measure on $\mathbb{R}^e$.

Remark 11.17. Higher-order differentiability, together with control of inverse moments of $\mathcal{M}$, allows one to strengthen this result to obtain smoothness of this density.

As beautifully explained in his own book [Mal97], Malliavin realised that the strong solution to the stochastic differential equation
\[ dY_t = \sum_{i=1}^d V_i(Y_t) \circ dB^i_t , \tag{11.14} \]
started at $Y_0 = y_0 \in \mathbb{R}^e$ and driven along $C^\infty$-bounded vector fields $V_i$ on $\mathbb{R}^e$, gives rise to a non-degenerate Wiener functional $F = Y_T$, admitting a density with respect to Lebesgue measure, provided that the vector fields satisfy Hörmander’s famous “bracket condition” at the starting point $y_0$:
\[ \mathrm{Lie}\{V_1, \dots, V_d\}\big|_{y_0} = \mathbb{R}^e . \tag{H} \]

(Here, $\mathrm{Lie}\,\mathcal{V}$ denotes the Lie algebra generated by a collection $\mathcal{V}$ of smooth vector fields.) There are many variations on this theme: one can include a drift vector field (which gives rise to a modified Hörmander condition) and under the same assumptions one can show that $Y_T$ admits a smooth density. This result can also (and was originally, see [Hör67, Koh78]) be obtained by using purely functional analytic techniques, exploiting the fact that the density solves Kolmogorov’s forward equation. On the other hand, Malliavin’s approach is purely stochastic and allows one to go beyond the Markovian / PDE setting. In particular, we will see that it is possible to replace $B$ by a somewhat generic sufficiently non-degenerate Gaussian process, with the interpretation of (11.14) as a random RDE driven by some Gaussian rough path $\mathbf{X}$ rather than Brownian motion.
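To make the bracket condition (H) concrete, here is a small Python sketch that checks it numerically for a pair of Heisenberg-type vector fields on R³. Everything in it is an illustrative assumption: the fields `V1`, `V2` are not taken from the text (and are not bounded, so they only illustrate the algebraic condition, not the standing hypotheses), the bracket is computed with central differences, and the sign convention [V, W] = DW·V − DV·W is one common choice that does not affect the span.

```python
def V1(p):
    """Illustrative field d/dx - (y/2) d/dz on R^3."""
    x, y, z = p
    return (1.0, 0.0, -0.5 * y)


def V2(p):
    """Illustrative field d/dy + (x/2) d/dz on R^3."""
    x, y, z = p
    return (0.0, 1.0, 0.5 * x)


def directional_derivative(W, p, v, eps=1e-5):
    """Central-difference approximation of DW(p) v."""
    plus = W(tuple(pi + eps * vi for pi, vi in zip(p, v)))
    minus = W(tuple(pi - eps * vi for pi, vi in zip(p, v)))
    return tuple((a - b) / (2 * eps) for a, b in zip(plus, minus))


def lie_bracket(V, W, p):
    """[V, W](p) = DW(p) V(p) - DV(p) W(p) (sign convention assumed)."""
    dWV = directional_derivative(W, p, V(p))
    dVW = directional_derivative(V, p, W(p))
    return tuple(a - b for a, b in zip(dWV, dVW))


def det3(r0, r1, r2):
    """Determinant of the 3x3 matrix with rows r0, r1, r2."""
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))


y0 = (0.0, 0.0, 0.0)
bracket = lie_bracket(V1, V2, y0)          # approximately (0, 0, 1)
det = det3(V1(y0), V2(y0), bracket)        # non-zero iff the three vectors span R^3
```

Here two driving fields cannot span R³ on their own, but together with their first bracket they do, so (H) holds at y0 even though the system is not elliptic.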

11.3.2 Calculus of variations for ODEs and RDEs

Throughout, we assume that $V = (V_1, \dots, V_d)$ is a given set of smooth vector fields, bounded and with bounded derivatives of all orders. In particular, there is a unique solution flow to the RDE
\[ dY = V(Y) \, d\mathbf{X} , \tag{11.15} \]
for any $\alpha$-Hölder geometric driving rough path $\mathbf{X} = (X, \mathbb{X}) \in \mathcal{C}^{0,\alpha}_g$, which may be obtained as limit of smooth, or piecewise smooth, paths in $\alpha$-Hölder rough path metric. Set $p = 1/\alpha$. Recall that, thanks to continuity of the Itô–Lyons maps, RDE solutions are limits of the corresponding ODE solutions.
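The unique RDE solution (11.15) passing through $Y_{t_0} = y_0$ gives rise to the solution flow $y_0 \mapsto U^{\mathbf{X}}_{t \leftarrow t_0}(y_0) = Y_t$. We call the derivative of the flow with respect to the starting point the Jacobian and denote it by $J^{\mathbf{X}}_{t \leftarrow t_0}$, so that
\[ J^{\mathbf{X}}_{t \leftarrow t_0} a = \frac{d}{d\varepsilon} U^{\mathbf{X}}_{t \leftarrow t_0}(y_0 + \varepsilon a) \Big|_{\varepsilon = 0} . \]
We also consider the directional derivative
\[ D_h U^{\mathbf{X}}_{t \leftarrow 0} = \frac{d}{d\varepsilon} U^{T_{\varepsilon h} \mathbf{X}}_{t \leftarrow 0} \Big|_{\varepsilon = 0} , \]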

for any sufficiently smooth path $h : \mathbb{R}_+ \to \mathbb{R}^d$. Recall that the translation operator $T_h$ was defined in (11.5). In particular, we have seen in Lemma 11.4 that, if $\mathbf{X}$ arises from a smooth path $X$ together with its iterated integrals, then the translated rough path $T_h \mathbf{X}$ is nothing but $X + h$ together with its iterated integrals. In the general case, given $h \in C^{q\text{-var}}$ of complementary Young regularity, i.e. with $1/p + 1/q > 1$, the translation $T_h \mathbf{X}$ can be written in terms of $\mathbf{X}$ and cross-integrals between $X$ and $h$.
Suppose for a moment that the rough path $\mathbf{X}$ is the canonical lift of a smooth $\mathbb{R}^d$-valued path $X$. Then, it is classical to prove that $J^{\mathbf{X}}_{t \leftarrow t_0}$ solves the linear ODE
\[ dJ^{\mathbf{X}}_{t \leftarrow t_0} = \sum_{i=1}^d DV_i(Y_t) J^{\mathbf{X}}_{t \leftarrow t_0} \, dX^i_t , \tag{11.16} \]
and satisfies $J^{\mathbf{X}}_{t_2 \leftarrow t_0} = J^{\mathbf{X}}_{t_2 \leftarrow t_1} \cdot J^{\mathbf{X}}_{t_1 \leftarrow t_0}$. Furthermore, the variation of constants formula leads to
\[ D_h U^{\mathbf{X}}_{t \leftarrow 0} = \int_0^t \sum_{i=1}^d J^{\mathbf{X}}_{t \leftarrow s} V_i(Y_s) \, dh^i_s . \tag{11.17} \]
Similarly, given any smooth vector field $W$, a straightforward application of the chain rule yields
\[ d\big( J^{\mathbf{X}}_{0 \leftarrow t} W(Y_t) \big) = \sum_{i=1}^d J^{\mathbf{X}}_{0 \leftarrow t} [V_i, W](Y_t) \, dX^i_t , \tag{11.18} \]
where $[V, W]$ denotes the Lie bracket between the vector fields $V$ and $W$. All this extends to the rough path limit without difficulties. For instance, (11.16) can be interpreted as a linear equation driven by the rough path $\mathbf{X}$, using the fact that $DV(Y)$ is controlled by $X$ to give meaning to the equation. It is then still the case that $J^{\mathbf{X}}_{t \leftarrow t_0}$ is the derivative of the flow associated to (11.15) with respect to its initial condition.
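In the smooth case, the Jacobian equation (11.16) and the variation of constants formula (11.17) can be checked numerically. The Python sketch below does this for an Euler discretisation in the scalar case e = d = 1; the vector field V = sin, the driving path X_t = sin(2πt), the direction h_t = t² and all function names are hypothetical choices made for illustration. For the Euler scheme the discrete Jacobian is exactly the product of the one-step factors 1 + V'(Y_k)ΔX_k, so both formulas can be compared with finite differences of the discrete flow.

```python
import math


def V(y):
    """Hypothetical bounded smooth vector field (scalar case e = d = 1)."""
    return math.sin(y)


def dV(y):
    """Derivative DV of the vector field above."""
    return math.cos(y)


def euler_flow(y0, dx):
    """Euler scheme for dY = V(Y) dX; returns [Y_0, ..., Y_n]."""
    ys = [y0]
    for inc in dx:
        ys.append(ys[-1] + V(ys[-1]) * inc)
    return ys


def jacobian_and_duhamel(ys, dx, dh):
    """Euler versions of (11.16) and (11.17), accumulated backwards in time."""
    total, J_from = 0.0, 1.0                 # J_from = J_{n <- k+1}
    for k in reversed(range(len(dx))):
        total += J_from * V(ys[k]) * dh[k]   # J_{n <- k+1} V(Y_k) dh_k
        J_from *= 1.0 + dV(ys[k]) * dx[k]    # multiplicative property of J
    return J_from, total                     # (J_{n <- 0}, D_h U_{n <- 0})


n, y0, eps = 2000, 0.3, 1e-6
ts = [k / n for k in range(n + 1)]
dx = [math.sin(2 * math.pi * ts[k + 1]) - math.sin(2 * math.pi * ts[k]) for k in range(n)]
dh = [ts[k + 1] ** 2 - ts[k] ** 2 for k in range(n)]

J, D = jacobian_and_duhamel(euler_flow(y0, dx), dx, dh)

# Finite-difference checks: derivative in the starting point and in the direction h.
fd_J = (euler_flow(y0 + eps, dx)[-1] - euler_flow(y0 - eps, dx)[-1]) / (2 * eps)
dx_plus = [a + eps * b for a, b in zip(dx, dh)]
dx_minus = [a - eps * b for a, b in zip(dx, dh)]
fd_D = (euler_flow(y0, dx_plus)[-1] - euler_flow(y0, dx_minus)[-1]) / (2 * eps)
```

The backward accumulation uses only the multiplicative property of the Jacobian; in higher dimensions the scalars would become matrices, and the order of the products would matter.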

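Proposition 11.18. Let $\mathbf{X} \in \mathcal{C}^{0,\alpha}_g([0,T], \mathbb{R}^d)$ with $\alpha \in (\frac13, \frac12]$ and $h \in C^{q\text{-var}}([0,T], \mathbb{R}^d)$ with complementary Young regularity in the sense that $\alpha + \frac1q > 1$. Then
\[ D_h U^{\mathbf{X}}_{t \leftarrow 0}(y_0) = \int_0^t \sum_{i=1}^d J^{\mathbf{X}}_{t \leftarrow s} \big( V_i(U^{\mathbf{X}}_{s \leftarrow 0}) \big) \, dh^i_s \tag{11.19} \]
where the right-hand side is well-defined as Young integral.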

Proof. Both $J^{\mathbf{X}}_{t \leftarrow 0}$ and $D_h U^{\mathbf{X}}_{t \leftarrow 0}$ satisfy (jointly with $U^{\mathbf{X}}_{t \leftarrow 0}$) an RDE driven by $\mathbf{X}$. This is well known in the ODE case, i.e. when both $X$, $h$ are smooth (Duhamel’s principle, variation of constants formula, ...), and remains valid in the geometric rough path limit by appealing to continuity of the Itô–Lyons map and continuity properties of the Young integral. A little care is needed since the resulting vector fields are not bounded anymore. It suffices to rule out explosion so that the problem can be localized. The required remark is that $J^{\mathbf{X}}_{t \leftarrow 0}$ also satisfies a linear RDE of the form
\[ dJ^{\mathbf{X}}_{t \leftarrow 0} = dM^{\mathbf{X}} \cdot J^{\mathbf{X}}_{t \leftarrow 0}(y_0) \]
and linear RDEs do not explode; cf. Exercise 8.12. $\square$

Consider now an RDE driven by a Gaussian rough path $\mathbf{X} = \mathbf{X}(\omega)$. We now show that the $\mathbb{R}^e$-valued random variable obtained from solving this random RDE enjoys $C^1_{\mathcal{H}}$-regularity.
Proposition 11.19. With $\varrho \in [1, \frac32)$ and $\alpha \in (\frac13, \frac{1}{2\varrho})$, let $\mathbf{X} = (X, \mathbb{X}) \in \mathcal{C}^\alpha_g$ be a Gaussian rough path as constructed in Theorem 10.4. For fixed $t \ge 0$, the $\mathbb{R}^e$-valued random variable
\[ \omega \mapsto U^{\mathbf{X}(\omega)}_{t \leftarrow 0}(y_0) \]
is continuously $\mathcal{H}$-differentiable.

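Proof. Recall $h \in \mathcal{H} \subset C^{\varrho\text{-var}}$ so that a.e. $X(\omega)$ and $h$ enjoy complementary Young regularity. As a consequence, we saw that the event
\[ \{\omega : \mathbf{X}(\omega + h) \equiv T_h \mathbf{X}(\omega) \text{ for all } h \in \mathcal{H}\} \tag{11.20} \]
has full measure. We show that $h \in \mathcal{H} \mapsto U^{\mathbf{X}(\omega + h)}_{t \leftarrow 0}(y_0)$ is continuously Fréchet differentiable for every $\omega$ in the above set of full measure. By basic facts of Fréchet theory, we must show (a) Gateaux differentiability and (b) continuity of the Gateaux differential.
Ad (a): Using $\mathbf{X}(\omega + g + h) \equiv T_g T_h \mathbf{X}(\omega)$ for $g, h \in \mathcal{H}$ it suffices to show Gateaux differentiability of $U^{\mathbf{X}(\omega + \cdot)}_{t \leftarrow 0}(y_0)$ at $0 \in \mathcal{H}$. For fixed $t$, define
\[ Z_{i,s} \equiv J^{\mathbf{X}}_{t \leftarrow s} \big( V_i(U^{\mathbf{X}}_{s \leftarrow 0}) \big) . \]
Note that $s \mapsto Z_{i,s}$ is of finite $p$-variation, with $p = 1/\alpha$. We have, with implicit summation over $i$,
\begin{align*}
\big| D_h U^{\mathbf{X}}_{t \leftarrow 0}(y_0) \big| &= \Big| \int_0^t J^{\mathbf{X}}_{t \leftarrow s} \big( V_i(U^{\mathbf{X}}_{s \leftarrow 0}) \big) \, dh^i_s \Big| \\
&= \Big| \int_0^t Z_i \, dh^i \Big| \\
&\lesssim \big( \|Z\|_{p\text{-var}} + |Z(0)| \big) \times \|h\|_{\varrho\text{-var}} \\
&\lesssim \big( \|Z\|_{p\text{-var}} + |Z(0)| \big) \times \|h\|_{\mathcal{H}} .
\end{align*}
Hence, the linear map $DU^{\mathbf{X}}_{t \leftarrow 0}(y_0) : h \mapsto D_h U^{\mathbf{X}}_{t \leftarrow 0}(y_0) \in \mathbb{R}^e$ is bounded and each component is an element of $\mathcal{H}^*$. We just showed that
\[ h \mapsto \frac{d}{d\varepsilon} U^{T_{\varepsilon h} \mathbf{X}(\omega)}_{t \leftarrow 0}(y_0) \Big|_{\varepsilon = 0} = \big\langle DU^{\mathbf{X}(\omega)}_{t \leftarrow 0}(y_0), h \big\rangle_{\mathcal{H}} \]
and hence
\[ h \mapsto \frac{d}{d\varepsilon} U^{\mathbf{X}(\omega + \varepsilon h)}_{t \leftarrow 0}(y_0) \Big|_{\varepsilon = 0} = \big\langle DU^{\mathbf{X}(\omega)}_{t \leftarrow 0}(y_0), h \big\rangle_{\mathcal{H}} \]
emphasizing again that $\mathbf{X}(\omega + h) \equiv T_h \mathbf{X}(\omega)$ almost surely for all $h \in \mathcal{H}$ simultaneously. Repeating the argument with $T_g \mathbf{X}(\omega) = \mathbf{X}(\omega + g)$ shows that the Gateaux differential of $U^{\mathbf{X}(\omega + \cdot)}_{t \leftarrow 0}$ at $g \in \mathcal{H}$ is given by
\[ DU^{\mathbf{X}(\omega + g)}_{t \leftarrow 0} = DU^{T_g \mathbf{X}(\omega)}_{t \leftarrow 0} . \]
(b) It remains to be seen that $g \in \mathcal{H} \mapsto DU^{T_g \mathbf{X}(\omega)}_{t \leftarrow 0} \in L(\mathcal{H}, \mathbb{R}^e)$, the space of linear bounded maps equipped with operator norm, is continuous. We leave this as exercise to the reader, cf. Exercise 11.23 below. $\square$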

11.3.3 Hörmander’s theorem for Gaussian RDEs

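Recall that $\varrho \in [1, \frac32)$, $\alpha \in (\frac13, \frac{1}{2\varrho})$ and $\mathbf{X} = (X, \mathbb{X}) \in \mathcal{C}^\alpha_g$ a.s. is the Gaussian rough path constructed in Theorem 10.4. Any $h \in \mathcal{H} \subset C^{\varrho\text{-var}}$ and a.e. $X(\omega)$ enjoy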
complementary Young regularity. We now present the remaining conditions on X,
followed by some commentary on each of the conditions, explaining their significance
in the context of the problem and verifying them for some explicit examples of
Gaussian processes.
Condition 1 Fix $T > 0$. For every $t \in (0,T]$ we assume non-degeneracy of the law of $X$ on $[0,t]$ in the following sense. Given $f \in C^\alpha([0,t], \mathbb{R}^d)$, if $\sum_{j=1}^d \int_0^t f_j \, dh^j = 0$ for all $h \in \mathcal{H}$, then one has $f = 0$.
Note that, thanks to complementary Young regularity, the integral $\int_0^t f_j \, dh^j$ makes sense as a Young integral. Some assumption along the lines of Condition 1 is certainly necessary: just consider the trivial rough differential equation $dY = dX$, starting at $Y_0 = 0$, with driving process $X = X(\omega)$ given by a Brownian bridge which returns to the origin at time $T$ (i.e. $X_t = B_t - \frac{t}{T} B_T$ in terms of a standard Brownian motion $B$). Clearly $Y_T = X_T = 0$ and so $Y_T$ does not admit a density, despite the equation $dY = dX$ being even “elliptic”. However, it is straightforward to verify that in this example $\int_0^T dh = 0$ for every $h$ belonging to the Cameron–Martin space of the Brownian bridge, so that Condition 1 is violated by taking for $f$ a non-zero constant function.
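The Brownian bridge counterexample can be checked on the generating elements of the Cameron–Martin space. Recalling that Cameron–Martin paths are of the form h = E(X_· Z), the Python sketch below takes Z = X_s, so that h = R(s, ·), and evaluates the total increment h_T − h_0 for the bridge covariance R(s, t) = min(s, t) − st/T, in contrast with Brownian motion. The function names and the sample times are illustrative assumptions, and only these particular elements of the Cameron–Martin space are tested.

```python
def bridge_cov(s, t, T=1.0):
    """Covariance E[X_s X_t] = min(s, t) - s t / T of the Brownian bridge on [0, T]."""
    return min(s, t) - s * t / T


def bm_cov(s, t):
    """Covariance E[B_s B_t] = min(s, t) of standard Brownian motion."""
    return min(s, t)


T = 1.0
sample_times = [k / 10 for k in range(1, 10)]   # illustrative choices of s in (0, T)

# Total increment h_T - h_0 of the Cameron-Martin path h = R(s, .), for each s.
bridge_increments = [bridge_cov(s, T, T) - bridge_cov(s, 0.0, T) for s in sample_times]
bm_increments = [bm_cov(s, T) - bm_cov(s, 0.0) for s in sample_times]
```

For the bridge every increment vanishes, so a non-zero constant f integrates to zero against all these h; for Brownian motion the increments equal s > 0 and no such degeneracy occurs.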
Condition 2 With probability one, sample paths of X are truly rough, at least in a
right-neighbourhood of 0.
These conditions obviously hold for d-dimensional Brownian motion: the first
condition is satisfied because 0 is the only (continuous) function orthogonal to all of
L2 ([0, T ], Rd ); the second condition was already verified in Section 6.3. More inter-
estingly, these conditions are very robust and also hold for the Ornstein-Uhlenbeck
process, a Brownian bridge which returns to the origin at a time strictly greater than
T , and some non-semimartingale examples such as fractional Brownian motion,
including the rough regime of Hurst parameter less than 1/2. We now show that
under these conditions the process admits a density at strictly positive times. Note
that the aforementioned situations are not at all covered by the “usual” Hörmander
theorem.
Theorem 11.20. With $\varrho \in [1, \frac32)$ and $\alpha \in (\frac13, \frac{1}{2\varrho})$, let $\mathbf{X} = (X, \mathbb{X}) \in \mathcal{C}^\alpha_g$ be a Gaussian rough path as constructed in Theorem 10.4. Assume that the Gaussian process $X$ satisfies Conditions 1 and 2. Let $V = (V_1, \dots, V_d)$ be a collection of $C^\infty$-bounded vector fields on $\mathbb{R}^e$, which satisfies Hörmander’s condition (H) at some point $y_0 \in \mathbb{R}^e$. Then the law of the RDE solution
\[ dY_t = V(Y_t) \, d\mathbf{X}_t , \qquad Y(0) = y_0 , \]
admits a density with respect to Lebesgue measure on $\mathbb{R}^e$ for all $t \in (0,T]$.

Proof. Thanks to Proposition 11.19 and in view of the Bouleau–Hirsch criterion, Theorem 11.16, we only need to show almost sure invertibility of the Malliavin matrix associated to the solution map. As a consequence of (11.13) and (11.19), we have for every $z \in \mathbb{R}^e$ the identity
\[ z^{\top} \mathcal{M}_t z = \sum_{j=1}^d \big\| z^{\top} J^{\mathbf{X}}_{t \leftarrow \cdot} V_j(Y_\cdot) \big\|_t^2 , \]
where we wrote $\|\cdot\|_t$ for the norm given by
\[ \|f\|_t = \sup_{h \in \mathcal{H} : \|h\| = 1} \int_0^t f(s) \, dh(s) . \]
Before we proceed we note that, by the multiplicative property of $J^{\mathbf{X}}_{t \leftarrow s}$, see the remark following (11.16), one has
\[ \mathcal{M}_t = J^{\mathbf{X}}_{t \leftarrow 0} \, \tilde{\mathcal{M}}_t \, \big( J^{\mathbf{X}}_{t \leftarrow 0} \big)^{\top} , \]
where $\tilde{\mathcal{M}}_t$ is given by
\[ z^{\top} \tilde{\mathcal{M}}_t z = \sum_{j=1}^d \big\| z^{\top} J^{\mathbf{X}}_{0 \leftarrow \cdot} V_j(Y_\cdot) \big\|_t^2 . \]
Since we know that the Jacobian is invertible, invertibility of $\mathcal{M}_t$ is equivalent to that of $\tilde{\mathcal{M}}_t$, and it is the invertibility of the latter that we are going to show.
Assume now by contradiction that $\tilde{\mathcal{M}}_t$ is not almost surely invertible. This implies that there exists a random unit vector $z \in \mathbb{R}^e$ such that $z^{\top} \tilde{\mathcal{M}}_t z = 0$ with non-zero probability. It follows immediately from Condition 1 that, with non-zero probability, the functions $s \mapsto z^{\top} J^{\mathbf{X}(\omega)}_{0 \leftarrow s} V_j(Y_s)$ vanish identically on $[0,t]$ for every $j \in \{1, \dots, d\}$. By (11.18), this is equivalent to
\[ \sum_{i=1}^d \int_0^{\cdot} z^{\top} J^{\mathbf{X}}_{0 \leftarrow s} [V_i, V_j](Y_s) \, dX^i(s) \equiv 0 \]
on $[0,t]$. Thanks to Condition 2, true roughness of $X$, we can apply Theorem 6.5 to conclude that one has
\[ z^{\top} J^{\mathbf{X}}_{0 \leftarrow \cdot} [V_i, V_j](Y_\cdot) \equiv 0 , \]
for every $i, j \in \{1, \dots, d\}$. Iterating this argument shows that, with non-zero probability, the processes $s \mapsto z^{\top} J^{\mathbf{X}}_{0 \leftarrow s} W(Y_s)$ vanish identically for every vector field $W$ obtained as a Lie bracket of the vector fields $V_i$. In particular, this is the case for $s = 0$, which implies that with positive probability, $z$ is orthogonal to $W(y_0)$ for all such vector fields. Since Hörmander’s condition (H) asserts precisely that these vector fields span the tangent space at the starting point $y_0$, we conclude that $z = 0$ with positive probability, which is in contradiction with the fact that $z$ is a random unit vector and thus concludes the proof. $\square$

11.4 Exercises

Exercise 11.21 (Improved Cameron–Martin regularity, [FGGR13]). A combination of Theorem 10.9 with the Cameron–Martin embedding, Proposition 11.2, shows that every Cameron–Martin path associated to a Gaussian process enjoys finite $q$-variation regularity with $q = \varrho$. Show that, under the assumptions of Theorem 10.9, this can be improved to
\[ q = \frac{1}{\frac12 + \frac{1}{2\varrho}} . \tag{11.21} \]
As a consequence, “complementary Young regularity” now holds for all $\varrho < 2$. In the fBm setting, this covers every Hurst parameter $H > 1/4$. (To exploit this in the newly covered regime $H \in (1/4, 1/3]$, one would need to work in a “level-3” rough path setting.)
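The arithmetic behind the threshold ϱ < 2 can be verified with exact rational numbers. The Python sketch below assumes, as in this chapter, that sample paths have finite p-variation for every p > 2ϱ, so that complementary Young regularity 1/p + 1/q > 1 is attainable exactly when the borderline quantity 1/(2ϱ) + 1/q − 1 is strictly positive. The function names and the test value ϱ = 7/4 are illustrative choices.

```python
from fractions import Fraction


def q_improved(rho):
    """Variation exponent (11.21): q = 1 / (1/2 + 1/(2 rho))."""
    rho = Fraction(rho)
    return 1 / (Fraction(1, 2) + 1 / (2 * rho))


def young_gap(rho, q):
    """Value of 1/p + 1/q - 1 at the borderline p = 2 rho (supremum over p > 2 rho)."""
    rho, q = Fraction(rho), Fraction(q)
    return 1 / (2 * rho) + 1 / q - 1


rho = Fraction(7, 4)                             # a value in [3/2, 2)
gap_classical = young_gap(rho, rho)              # q = rho: 3/(2 rho) - 1
gap_improved = young_gap(rho, q_improved(rho))   # q from (11.21): 1/rho - 1/2
gap_borderline = young_gap(2, q_improved(2))     # rho = 2 is exactly critical
```

With q = ϱ the gap equals 3/(2ϱ) − 1, positive only for ϱ < 3/2, whereas with (11.21) it equals 1/ϱ − 1/2, positive for all ϱ < 2.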

Exercise 11.22. Formulate a quantitative version of Corollary 11.14. Show in particular that the Gaussian tail of $|Z|^{1/\varrho}$ is uniform over rough integrals against Gaussian rough paths, provided that $\|F\|_{C_b^2}$ and the $\varrho$-variation of the covariance, say in the form of the constant $M$ in Theorem 11.9, are uniformly bounded.

Exercise 11.23. Finish the proof of part (b) of Proposition 11.19.

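Solution 11.24. In the notation of the (proof of) this Proposition, we have to show that $g \in \mathcal{H} \mapsto DU^{T_g \mathbf{X}(\omega)}_{t \leftarrow 0} \in L(\mathcal{H}, \mathbb{R}^e)$ is continuous. To this end, assume $g_n \to g$ in $\mathcal{H}$ (and hence in $C^{\varrho\text{-var}}$). Continuity properties of the Young integral imply continuity of the translation operator viewed as map $h \in C^{\varrho\text{-var}} \mapsto T_h \mathbf{X}(\omega)$ and so
\[ T_{g_n} \mathbf{X}(\omega) \to T_g \mathbf{X}(\omega) \]
in $p$-variation rough path metric. The point here is that
\[ \mathbf{x} \mapsto J^{\mathbf{x}}_{t \leftarrow \cdot} \quad \text{and} \quad J^{\mathbf{x}}_{t \leftarrow \cdot} \big( V_i(U^{\mathbf{x}}_{\cdot \leftarrow 0}) \big) \in C^{p\text{-var}} \]
depends continuously on $\mathbf{x}$ with respect to $p$-variation rough path metric: using the fact that $J^{\mathbf{x}}_{t \leftarrow \cdot}$ and $U^{\mathbf{x}}_{\cdot \leftarrow 0}$ both satisfy rough differential equations driven by $\mathbf{x}$ this is just a consequence of Lyons’ limit theorem (the universal limit theorem of rough path theory). We apply this with $\mathbf{x} = \mathbf{X}(\omega)$ where $\omega$ remains a fixed element in (11.20). It follows that
\[ \big\| DU^{T_{g_n} \mathbf{X}(\omega)}_{t \leftarrow 0} - DU^{T_g \mathbf{X}(\omega)}_{t \leftarrow 0} \big\|_{\mathrm{op}} = \sup_{h : \|h\|_{\mathcal{H}} = 1} \big| D_h U^{T_{g_n} \mathbf{X}(\omega)}_{t \leftarrow 0} - D_h U^{T_g \mathbf{X}(\omega)}_{t \leftarrow 0} \big| \]
and defining $Z^g_i(s) \equiv J^{T_g \mathbf{X}(\omega)}_{t \leftarrow s} \big( V_i(U^{T_g \mathbf{X}(\omega)}_{s \leftarrow 0}) \big)$, and similarly $Z^{g_n}_i(s)$, the same reasoning as in part (a) leads to the estimate
\[ \big\| DU^{T_{g_n} \mathbf{X}(\omega)}_{t \leftarrow 0} - DU^{T_g \mathbf{X}(\omega)}_{t \leftarrow 0} \big\|_{\mathrm{op}} \le c \big( |Z^{g_n} - Z^g|_{p\text{-var}} + |Z^{g_n}(0) - Z^g(0)| \big) . \]
From the explanations just given this tends to zero as $n \to \infty$ which establishes continuity of the Gateaux differential, as required, and the proof is finished.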

Exercise 11.25. Prove Theorem 11.20 in presence of a drift vector field $V_0$. In particular, show that in this case condition (H) can be weakened to
\[ \mathrm{Lie}\{V_1, \dots, V_d, [V_0, V_1], \dots, [V_0, V_d]\}\big|_{y_0} = \mathbb{R}^e . \tag{11.22} \]

11.5 Comments

Section 11.1: Regularity of Cameron–Martin paths ($q$-variation, with $q = \varrho$) under the assumption of finite $\varrho$-variation of the covariance was established in Friz–Victoir, [FV10a], see also [FV10b, Ch.15]. In the context of Gaussian rough paths, this leads to complementary Young regularity (CYR) whenever $\varrho < \frac32$ which covers general “level-2” Gaussian rough paths as discussed in Chapter 10. On the other hand, “level-3” Gaussian rough paths can be constructed for any $\varrho < 2$ (which includes fBm with $H = \frac{1}{2\varrho} > \frac14$). A sharper Cameron–Martin regularity result specific to fBm follows from a Besov–variation embedding theorem [FV06b], thereby leading to CYR for all $H > \frac14$. The general case was understood in [FGGR13]: one can take $q$ as in (11.21), provided one makes the slightly stronger assumption of finite “mixed” $(1, \varrho)$-variation of the covariance. The conclusion concerning $\varrho$-variation of Theorem 10.9 can in fact be strengthened to finite mixed $(1, \varrho)$-variation at no extra price and indeed this theorem is only a special case of a general criterion given in [FGGR13].
Section 11.2: Theorem 11.9 was originally obtained by careful tracking of con-
stants via the Garsia–Rodemich–Rumsey Lemma, see [FV10b]. The generalized
Fernique estimate is taken from Friz–Oberhauser and then Diehl, Oberhauser and
Riedel [FO10, DOR13]. It yields an elegant proof of Theorem 11.13 with which Cass,
Litterer, and Lyons [CLL13] have overcome the longstanding problem of obtaining
moment bounds for the Jacobian of the flow of a rough differential equation driven by
Gaussian rough paths, thereby paving the way for the proof of the Hörmander-type
results, see below. As was illustrated, the above methodology can be adapted to
many other situations of interest, a number of which are discussed in [FR13].
Section 11.3: Baudoin–Hairer [BH07] proved a Hörmander theorem for differ-
ential equations driven by fBm in the regular regime of Hurst parameter H > 1/2
in a framework of Young differential equations. The Brownian case H = 1/2
is of course classical, see the monographs [Nua06, Mal97] or the original articles
[Mal78, KS84, KS85, KS87, Bis81b, Bis81a, Nor86]; a short self-contained proof
can be found in [Hai11a]. In the case of rough differential equations driven by less
regular Gaussian processes (including fBm with H > 1/4), the relevance of
complementary Young regularity of Cameron–Martin paths to Malliavin regularity of
(Gaussian) RDE solutions was first recognised by Cass, Friz and Victoir [CFV09].
Existence of a density under Hörmander’s condition for such RDEs was obtained by
Cass–Friz [CF10], see also [FV10b, Ch.20], but with a Stroock-Varadhan support
type argument instead of true roughness (already commented on at the end of Chapter
6.) Smoothness of densities was subsequently established by Hairer–Pillai [HP13]
in the case case of fBm and then Cass, Hairer, Litterer and Tindel [CHLT12] in
the general Gaussian setting of Chapter 10, making crucial use of the integrability
estimates discussed in Section 11.2. Indeed, combined with known estimates for the
Jacobian of RDE flows (Friz–Victoir, [FV10b, Thm 10.16]) one readily obtains finite
moments of the Jacobian of the inverse flow, a key ingredient in the smoothness proof
via Malliavin calculus. See also Inahama [Ina13] for a discussion about higher-order
Malliavin differentiability of Gaussian RDE solutions.
Chapter 12
Stochastic partial differential equations

Abstract Second order stochastic partial differential equations are discussed from
a rough path point of view. In the linear and finite-dimensional noise case we
follow a Feynman–Kac approach which makes good use of concentration of measure
results, such as those obtained in Section 11.2. Alternatively, one can proceed by flow
decomposition and this approach also works in a number of non-linear situations.
Secondly, now motivated by some semi-linear SPDEs of Burgers’ type with
infinite-dimensional noise, we study the stochastic heat equation (in space dimension 1) as
an evolution in Gaussian rough path space relative to the spatial variable, in the sense
of Chapter 10.

12.1 Rough partial differential equations

12.1.1 Linear theory: Feynman–Kac

The second order stochastic partial differential equations we will be concerned with
here take the form of a terminal value problem,
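$$
-du = L[u]\,dt + \sum_{i=1}^{d} \Gamma_i[u] \circ dW_t^i(\omega), \qquad u(T,\cdot) = g, \tag{12.1}
$$

for $u = u(\omega) : [0,T] \times \mathbb{R}^n \to \mathbb{R}$, with differential operators $L$ and $\Gamma_i$ given by

$$
\begin{aligned}
L[u] &:= \tfrac12 \operatorname{Tr}\big(\sigma(x)\sigma^T(x) D^2 u\big) + \langle b(x), Du\rangle + c(x)u, \\
\Gamma_i[u] &:= \langle \beta_i(x), Du\rangle + \gamma_i(x)u.
\end{aligned} \tag{12.2}
$$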

The coefficients $\sigma = (\sigma_1, \dots, \sigma_m)$, $b$ and $\beta = (\beta_1, \dots, \beta_d)$ are viewed as vector
fields on $\mathbb{R}^n$, while $c, \gamma_1, \dots, \gamma_d$ are scalar functions. For simplicity only, all
coefficients are assumed to be bounded with bounded derivatives of all orders (but see
Remark 12.4). We assume $g \in BC(\mathbb{R}^n)$, that is, bounded and continuous.¹ The reader
may think of $\circ dW$ in (12.1) as the Stratonovich differential of a d-dimensional Brownian
motion. But of course, we are interested in replacing $W$ by a genuine (geometric)
rough path $\mathbf{W}$, so as to give meaning to the rough partial differential equation
(RPDE)
$$
-du = L[u]\,dt + \Gamma[u]\,d\mathbf{W}, \qquad u(T,\cdot) = g. \tag{12.3}
$$
To this end, since geometric rough paths are limits of smooth paths, we first consider
the case $W \in C^1([0,T], \mathbb{R}^d)$. It is a basic exercise in Itô calculus that any bounded
$C^{1,2}$ solution to
$$
-\partial_t u = L[u] + \sum_{i=1}^{d} \Gamma_i[u]\,\dot W_t^i, \qquad u(T,\cdot) = g, \tag{12.4}
$$

is given by the classical Feynman–Kac formula (and hence also unique),

$$
\begin{aligned}
u(s,x) &= E^{s,x}\Big[ g(X_T) \exp\Big( \int_s^T c(X_t)\,dt + \int_s^T \gamma(X_t)\,\dot W_t\,dt \Big) \Big] && (12.5) \\
&=: S[W; g](s,x), && (12.6)
\end{aligned}
$$

where $X$ is the (unique) strong solution to

$$
dX_t = \sigma(X_t)\,dB_t(\omega) + b(X_t)\,dt + \beta(X_t)\,\dot W_t\,dt, \tag{12.7}
$$

and $B$ is an m-dimensional standard Brownian motion.
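
As a hedged illustration of how the representation (12.5) can be evaluated for a smooth driver, the following sketch estimates u(s, x) by Monte Carlo in the scalar case n = m = d = 1. The function name `feynman_kac`, the Euler–Maruyama discretisation of (12.7) and the left-point quadrature are choices made for this example only; they are not part of the text.

```python
import math
import random

def feynman_kac(s, x, T, g, sigma, b, beta, c, gamma, W_dot,
                n_steps=200, n_paths=2000, seed=0):
    """Monte Carlo estimate of u(s, x) in (12.5), scalar case n = m = d = 1.

    All coefficients are plain Python callables; W_dot is the derivative
    of the smooth driving path W (an assumption of this sketch).
    """
    rng = random.Random(seed)
    dt = (T - s) / n_steps
    total = 0.0
    for _ in range(n_paths):
        X, integral, t = x, 0.0, s
        for _ in range(n_steps):
            w_dot = W_dot(t)
            # accumulate int c(X) dt + int gamma(X) W_dot dt (left-point rule)
            integral += (c(X) + gamma(X) * w_dot) * dt
            # Euler-Maruyama step for the SDE (12.7)
            dB = rng.gauss(0.0, math.sqrt(dt))
            X += sigma(X) * dB + (b(X) + beta(X) * w_dot) * dt
            t += dt
        total += g(X) * math.exp(integral)
    return total / n_paths
```

With σ = b = β = 0 and constant c, γ the process X does not move, and the estimate reduces to g(x) exp(c(T − s) + γ(W_T − W_s)) up to quadrature error, which gives a simple sanity check.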

Remark 12.1. The natural form of the Feynman–Kac formula is the reason for
considering terminal value problems here, rather than Cauchy problems of the form
$\partial_t u = L[u] + \Gamma[u]\dot W$ with given initial data $u(0,\cdot)$. Of course, a change of the time
variable $t \mapsto T - t$ allows one to switch between these problems.
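
The change of time variable can be made explicit for smooth W; the following short computation is a sketch that uses only the chain rule and the fact that the coefficients do not depend on time. The notation $\tilde u$, $\tilde W$ is introduced here for illustration. Set $\tilde u(t,x) := u(T-t,x)$ and $\tilde W_t := W_T - W_{T-t}$, so that $\dot{\tilde W}_t = \dot W_{T-t}$. If $u$ solves (12.4), then

$$
\partial_t \tilde u(t,x) = -(\partial_t u)(T-t,x) = L[\tilde u](t,x) + \Gamma[\tilde u](t,x)\,\dot{\tilde W}_t, \qquad \tilde u(0,\cdot) = g,
$$

which is a Cauchy problem driven by the time-reversed path $\tilde W$.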
Clearly, there are situations when solutions cannot be expected to be $C^{1,2}$, notably
when $g \notin C^2$ and $L$ fails to provide smoothing as is the case, for example, in
“transport” equations where $L$ is of first order. In such a case, formula (12.5) is a
perfectly good way to define a generalized solution to (12.4). Such a solution need
not be $C^{1,2}$ although it is bounded and continuous on $[0,T] \times \mathbb{R}^n$, as one can see
directly from (12.5). As a matter of fact, (12.5) yields an (analytically) weak PDE
solution (cf. Exercise 12.22). It is also a stochastic representation of the unique
(bounded) viscosity solution [CIL92, FS06] to (12.4), although this will play no rôle
for us in the present section.

Theorem 12.2. Let $\alpha \in (\tfrac13, \tfrac12]$. Given a geometric rough path $\mathbf{W} = (W, \mathbb{W}) \in
\mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d)$, pick $W^\varepsilon \in C^1([0,T], \mathbb{R}^d)$ so that

$$
(W^\varepsilon, \mathbb{W}^\varepsilon) := \Big( W^\varepsilon, \int_0^{\cdot} W^\varepsilon_{0,t} \otimes dW^\varepsilon_t \Big) \to \mathbf{W}
$$

in α-Hölder rough path metric. Then there exists $u = u(t,x) \in BC([0,T] \times \mathbb{R}^n)$,
not dependent on the approximating $(W^\varepsilon)$ but only on $\mathbf{W} \in \mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d)$, so
that, for $g \in BC(\mathbb{R}^n)$,

$$
u^\varepsilon = S[W^\varepsilon; g] \to u =: S[\mathbf{W}; g]
$$

as $\varepsilon \to 0$ in the sense of locally uniform convergence. Moreover, the resulting solution
map

$$
S : \mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d) \times BC(\mathbb{R}^n) \to BC([0,T] \times \mathbb{R}^n)
$$

is continuous. We say that $u$ satisfies the RPDE (12.3).

¹ In contrast to the space $C_b$ we shall equip $BC$ with the topology of locally uniform convergence.
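
The smooth approximations in Theorem 12.2 come with their canonical lift, the iterated integral of the path against itself. As a minimal sketch (assuming a piecewise-linear path given by its vertices; the helper name `canonical_lift` is hypothetical), the second level over the whole interval can be computed exactly, segment by segment, with Chen's relation:

```python
def canonical_lift(points):
    """Return (W_{0,T}, WW_{0,T}) for the piecewise-linear path through `points`.

    WW[i][j] = int_0^T W^i_{0,t} dW^j_t, computed exactly segment by
    segment with Chen's relation.
    """
    d = len(points[0])
    inc = [0.0] * d                       # running increment W_{0,t}
    area = [[0.0] * d for _ in range(d)]  # running second level WW_{0,t}
    for p, q in zip(points, points[1:]):
        step = [qj - pj for pj, qj in zip(p, q)]
        for i in range(d):
            for j in range(d):
                # Chen: WW_{0,u} = WW_{0,t} + WW_{t,u} + W_{0,t} (x) W_{t,u},
                # and on a straight segment WW_{t,u} = step (x) step / 2.
                area[i][j] += inc[i] * step[j] + 0.5 * step[i] * step[j]
        inc = [a + s for a, s in zip(inc, step)]
    return inc, area
```

For the unit square traversed counter-clockwise, `canonical_lift([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])` has zero increment and an antisymmetric second level whose off-diagonal entries are plus and minus the enclosed area. In general the symmetric part of the second level equals half the tensor square of the increment, which is the "geometric" property of such lifts.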

Proof. Step 1: Write $X = X^W$ for the solution to (12.7) whenever $W \in C^1$. The
first step is to make sense of the hybrid Itô-rough differential equation

$$
dX_t = \sigma(X_t)\,dB_t + b(X_t)\,dt + \beta(X_t)\,d\mathbf{W}_t. \tag{12.8}
$$

This is clearly not an equation that can be solved by Itô theory alone. But it is also not
immediately well-posed as a rough differential equation, since for this we would need to
understand $B$ and $\mathbf{W} = (W, \mathbb{W})$ jointly as a rough path. In view of the Itô differential
$dB$ in (12.8), we take $(B, \mathbb{B}^{\text{Itô}})$, as constructed in Section 3.2, and are basically short
of the cross-integrals between $B$ and $W$. (For simplicity of notation only, pretend
over the next few lines that $W, B$ are scalar.) We can define $\int W\,dB(\omega)$ as a Wiener
integral (Itô with deterministic integrand), and then $\int B\,dW = WB - \int W\,dB$ by
imposing integration by parts. We then easily get the estimate

$$
E\Big( \int_s^t W_{s,r}\,dB_r \Big)^2 \lesssim \|W\|_\alpha^2\,|t-s|^{2\alpha+1},
$$

also when switching the roles of $W, B$, thanks to the integration by parts formula. It
follows from Kolmogorov’s criterion that $\mathbf{Z}^{\mathbf{W}}(\omega) := \mathbf{Z} = (Z, \mathbb{Z}) \in \mathscr{C}^{\alpha'}$ a.s. for any
$\alpha' \in (1/3, \alpha)$, where
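$$
Z_t = \begin{pmatrix} B_t(\omega) \\ W_t \end{pmatrix}, \qquad
\mathbb{Z}_{s,t} = \begin{pmatrix}
\mathbb{B}^{\text{Itô}}_{s,t}(\omega) & \int_s^t W_{s,r} \otimes dB_r(\omega) \\
\int_s^t B_{s,r} \otimes dW_r(\omega) & \mathbb{W}_{s,t}
\end{pmatrix},
$$

where we reverted to tensor notation reflecting the multidimensional nature of $B, W$.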
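
For the reader's convenience, the second-moment estimate for the Wiener integral used before Kolmogorov's criterion can be spelled out in the scalar case. This is a sketch based only on the Itô isometry and the α-Hölder bound $|W_{s,r}| \le \|W\|_\alpha |r-s|^\alpha$:

$$
E\Big( \int_s^t W_{s,r}\,dB_r \Big)^2 = \int_s^t |W_{s,r}|^2\,dr \le \|W\|_\alpha^2 \int_s^t (r-s)^{2\alpha}\,dr = \frac{\|W\|_\alpha^2}{2\alpha+1}\,|t-s|^{2\alpha+1}.
$$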

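It is easy to deduce from Theorem 3.3 that, for any $q < \infty$,

$$
\big| \varrho_{\alpha'}\big( \mathbf{Z}^{\mathbf{W}}, \mathbf{Z}^{\tilde{\mathbf{W}}} \big) \big|_{L^q} \lesssim \varrho_\alpha\big( \mathbf{W}, \tilde{\mathbf{W}} \big). \tag{12.9}
$$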

We are hence able to say that a solution $X = X(\omega)$ of (12.8) is, by definition, a
solution to the genuine (random) rough differential equation

$$
dX = (\sigma, \beta)(X)\,d\mathbf{Z}^{\mathbf{W}}(\omega) + b(X)\,dt \tag{12.10}
$$

driven by the random rough path $\mathbf{Z} = \mathbf{Z}^{\mathbf{W}}(\omega)$. Moreover, as an immediate
consequence of (12.9) and continuity of the Itô–Lyons map, we see that $X$ is really the
limit, e.g. in probability and uniformly on $[0,T]$, of classical Itô SDE solutions $X^\varepsilon$,
obtained by replacing $d\mathbf{W}_t$ by $\dot W^\varepsilon_t\,dt$ in (12.8).
Step 2: Given $(s,x)$ we have a solution $(X_t : s \le t \le T)$ to the hybrid equation
(12.8), started at $X_s = x$. In fact $(X, X') \in \mathscr{D}_Z^{2\alpha'}$ with $X' = (\sigma, \beta)(X)$. In
particular, the rough integral

$$
\int \gamma(X)\,d\mathbf{W} := \int (0, \gamma(X))\,d\mathbf{Z}
$$

is well-defined, as is (with regard to the Feynman–Kac formula (12.5)) the random
variable

$$
g(X_T) \exp\Big( \int_s^T c(X_t)\,dt + \int_s^T \gamma(X_t)\,d\mathbf{W}_t \Big)(\omega). \tag{12.11}
$$

One can see, similar to (11.10), but now also relying on RDE growth estimates as
established in Proposition 8.3, with $p = 1/\alpha'$,

$$
\Big| \int_s^t \gamma(X)\,d\mathbf{W} \Big| \lesssim |||\mathbf{Z}|||_{p\text{-var};[s,t]}
$$

whenever $|||\mathbf{Z}|||_{p\text{-var};[s,t]}$ is of order one. An application of the generalized Fernique
Theorem 11.7, similar to the proof of Theorem 11.13 but with ρ = 1 in the present
context, then shows that the number of consecutive intervals on which $\mathbf{Z}$ accumulates
unit p-variation has Gaussian tails (in fact, uniformly in $\varepsilon \in (0,1]$, if $\mathbf{W}$ is replaced by
$W^\varepsilon$ with limit $\mathbf{W}$). This implies that (12.11) is integrable (and uniformly integrable
with respect to $\varepsilon$ when $\mathbf{W}$ is replaced by $W^\varepsilon$). It follows that

$$
u(s,x) := E^{s,x}\Big[ g(X_T) \exp\Big( \int_s^T c(X_t)\,dt + \int_s^T \gamma(X_t)\,d\mathbf{W}_t \Big) \Big] \tag{12.12}
$$

is indeed well-defined and the pointwise limit of $u^\varepsilon$ (defined in the same way, with
$\mathbf{W}$ replaced by $W^\varepsilon$). By an Arzelà–Ascoli argument, the limit is locally uniform. At
last, the claimed continuity of the solution map follows from the same arguments,
essentially by replacing $W$ by $\mathbf{W}$ everywhere in the above argument, and of course
using (12.12) with $g$, $\mathbf{W}$ replaced by $g^\varepsilon$, $\mathbf{W}^\varepsilon$, respectively. □
Remark 12.3. The proof actually shows that our solution $u = u(s,x;\mathbf{W})$ to the linear
RPDE (12.3) enjoys a Feynman–Kac type representation, namely (12.12), in terms
of the process constructed as solution to the hybrid Itô-rough differential equation
(12.8). Assume now $W$ is a Brownian motion, independent of $B$, and $\mathbf{W}(\omega) =
\mathbf{W}^{\text{Strat}} = (W, \mathbb{W}^{\text{Strat}}) \in \mathscr{C}_g^{0,\alpha}$ a.s. It is not difficult to show that $u = u(\cdot,\cdot;\mathbf{W}^{\text{Strat}}(\omega))$
coincides with the Feynman–Kac SPDE solution derived by Pardoux [Par79] or
Kunita [Kun82], via conditional expectations given $\sigma(\{W_{u,v} : s \le u \le v \le T\})$,
and so provides an identification with classical SPDE theory. In conjunction with
continuity of the solution map $S = S[\mathbf{W}; g]$ one obtains, along the lines of Section
9.2, SPDE limit theorems of Wong–Zakai type, Stroock–Varadhan type support
statements and Freidlin–Wentzell type small noise large deviations.

Remark 12.4. It is easy to quantify the required regularity of the coefficients. The
argument essentially relies on solving (12.10) as a bona fide rough differential equation.
It is then clear that we need to impose $C_b^3$-regularity for the vector fields $\sigma$ and $\beta$.
The drift vector field $b$ may be taken to be Lipschitz and $c \in C_b$.

Remark 12.5. We have not given meaning to the actual equation (12.3),

$$
-du = L[u]\,dt + \Gamma[u]\,d\mathbf{W}, \qquad u(T,\cdot) = g.
$$

Indeed, in the absence of ellipticity or Hörmander type conditions on $L$, the solution
may not be any more regular than $g \in BC$, so that in general the action of the first
order differential operator $\Gamma = (\Gamma_1, \dots, \Gamma_d)$ on $u$ has no pointwise meaning, let
alone its rough integral against $\mathbf{W}$. On the other hand, we can (at least formally) test
the equation against spatial Schwartz functions $\varphi \in \mathcal{D}$ and so arrive at the following
“analytically weak” formulation of (12.3),

$$
\langle u_s, \varphi \rangle = \langle g, \varphi \rangle - \int_s^T \langle u_t, L^*\varphi \rangle\,dt - \int_s^T \langle u_t, \Gamma^*\varphi \rangle\,d\mathbf{W}_t. \tag{12.13}
$$

In Exercise 12.22 the reader is invited to check that this formulation is indeed
meaningful. In particular, the integral term $\int \langle u_t, \Gamma^*\varphi \rangle\,d\mathbf{W}_t$ is a bona fide rough
integral of the controlled rough path $(Y, Y') \in \mathscr{D}_W^{2\alpha}$ against $\mathbf{W}$, where

$$
Y_t = \langle u_t, \Gamma^*\varphi \rangle, \qquad Y'_t = -\langle u_t, \Gamma^*\Gamma^*\varphi \rangle. \tag{12.14}
$$

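Assume now that $W$ is a Brownian motion and take $\mathbf{W}(\omega) = \mathbf{W}^{\text{Strat}}$ as above. Then,
thanks to Theorem 5.12, one can see that $u = u(\cdot,\cdot;\mathbf{W}^{\text{Strat}}(\omega))$ is an analytically
weak SPDE solution in the sense that

$$
\langle u_s, \varphi \rangle = \langle g, \varphi \rangle - \int_s^T \langle u_t, L^*\varphi \rangle\,dt - \int_s^T \langle u_t, \Gamma^*\varphi \rangle \circ d\overleftarrow{W}_t,
$$

where the final integral is a backward Stratonovich integral.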

12.1.2 Nonlinear theory: flow transformation method

We now turn our attention to initial value problems of the form

$$
du = F[u]\,dt + \sum_{i=1}^{d} H_i[u] \circ dW_t^i(\omega), \qquad u(0,\cdot) = g, \tag{12.15}
$$

with (possibly non-linear) differential operators,

$$
F[u] = F(x, u, Du, D^2u), \qquad H_i[u] = H_i(x, u, Du), \quad i = 1, \dots, d,
$$

given (in abusive notation) in terms of continuous functions $F, H$. As in the previous
section we aim to replace $\circ dW$ by a “rough” differential $d\mathbf{W}$, for some geometric
rough path $\mathbf{W} \in \mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d)$, and show that an RPDE solution arises as the
unique limit under approximations $(W^\varepsilon, \mathbb{W}^\varepsilon) \to \mathbf{W}$. Of course, there is little one
can say at this level of generality and we have not even clarified in which sense we
mean to solve (12.15) when $W \in C^1$! Let us postpone this discussion and assume
momentarily that $F$ and $H$ are sufficiently “nice” so that, for every $W \in C^1$ and
$g \in BC$, say, there is a classical solution $u = u(t,x)$ for $t > 0$. We shall focus on
three types of noise.
a) Transport noise. For sufficiently nice vector fields $\beta_i$ on $\mathbb{R}^n$,

$$
H_i[u] = \langle \beta_i(x), Du \rangle;
$$

b) Semilinear² noise. For a sufficiently nice function $H_i$ on $\mathbb{R}^n \times \mathbb{R}$,

$$
H_i[u] = H_i(x, u);
$$

c) Linear noise. With $\beta_i$ as above and sufficiently nice functions $\gamma_i$ on $\mathbb{R}^n$,

$$
H_i[u] = \Gamma_i[u] := \langle \beta_i(x), Du \rangle + \gamma_i(x)u.
$$

We now develop the “calculus” for the transformations associated to each of the
above cases. All proofs consist of elementary computations and are left to the reader.

Proposition 12.6 (Case a). Assume that $\psi = \psi^W$ is a $C^3$ solution flow of
diffeomorphisms associated to the ODE $\dot Y = -\beta(Y)\dot W$, where $W \in C^1$. (This is the case if
$\beta \in C_b^3$.) Then $u$ is a classical solution to

$$
\partial_t u = F(x, u, Du, D^2u) + \langle \beta(x), Du \rangle \dot W
$$

if and only if $v(t,x) = u(t, \psi_t(x))$ is a classical solution to

$$
\partial_t v - F^\psi(t, x, v, Dv, D^2v) = 0,
$$

where $F^\psi$ is determined from

$$
F^\psi(t, \psi_t(x), r, p, X) := F\big( x, r, \langle p, D\psi_t^{-1} \rangle, \langle X, D\psi_t^{-1} \otimes D\psi_t^{-1} \rangle + \langle p, D^2\psi_t^{-1} \rangle \big).
$$
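
A small numerical sanity check of the transport transformation may be helpful. The sketch below makes several simplifying assumptions that are not in the text: one space dimension, a constant vector field β (so that the flow is a translation, its inverse has first derivative 1 and vanishing second derivative, and hence $F^\psi = F$), the choice $F = \tfrac12 D^2u$, and the smooth driver W(t) = sin(3t). Starting from a classical solution v of the heat equation, it back-transforms to u and evaluates the residual of the original equation by finite differences.

```python
import math

BETA = 0.7           # constant vector field (illustrative assumption)

def W(t):            # smooth driving path (illustrative choice)
    return math.sin(3.0 * t)

def W_dot(t):
    return 3.0 * math.cos(3.0 * t)

def v(t, x):         # classical solution of dv/dt = (1/2) d2v/dx2
    return math.exp(-0.5 * t) * math.sin(x)

def psi(t, x):       # flow of dY = -BETA dW started at x
    return x - BETA * (W(t) - W(0.0))

def psi_inv(t, y):   # inverse flow
    return y + BETA * (W(t) - W(0.0))

def u(t, y):         # back-transform, so that v(t, x) = u(t, psi(t, x))
    return v(t, psi_inv(t, y))

def residual(t, y, h=1e-4):
    """Finite-difference residual of du/dt = (1/2) u_yy + BETA u_y W_dot."""
    u_t = (u(t + h, y) - u(t - h, y)) / (2 * h)
    u_y = (u(t, y + h) - u(t, y - h)) / (2 * h)
    u_yy = (u(t, y + h) - 2 * u(t, y) + u(t, y - h)) / h**2
    return u_t - 0.5 * u_yy - BETA * u_y * W_dot(t)
```

Evaluating `residual` at a few points returns values that vanish up to discretisation and rounding error, in line with Proposition 12.6 for this special case.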

Proposition 12.7 (Case b). For any fixed $x \in \mathbb{R}^n$, assume that the one-dimensional
ODE

$$
\dot\varphi = H(x, \varphi)\dot W, \qquad \varphi(0; x) = r,
$$

has a unique solution flow $\varphi = \varphi^W = \varphi(t, r; x)$ which is of class $C^2$ as a function
of both $r$ and $x$. Then $u$ is a classical solution to

$$
\partial_t u = F(x, u, Du, D^2u) + H(x, u)\dot W
$$

if and only if $v(t,x) = \varphi^{-1}(t, u(t,x), x)$, or equivalently $\varphi(t, v(t,x), x) = u(t,x)$,
is a solution of

$$
\partial_t v - {}^{\varphi}F(t, x, v, Dv, D^2v) = 0,
$$

with

$$
{}^{\varphi}F(t, x, r, p, X) := \frac{1}{\varphi'} F\big( t, x, \varphi, D\varphi + \varphi' p,\; \varphi'' p \otimes p + D\varphi' \otimes p + p \otimes D\varphi' + D^2\varphi + \varphi' X \big), \tag{12.16}
$$

where $\varphi'$ denotes the derivative of $\varphi = \varphi(t, r, x)$ with respect to $r$.

² The terminology here follows [LS00a].

Remark 12.8. It is worth noting that the “quadratic gradient” term $\varphi'' p \otimes p$ disappears
in (12.16) whenever $\varphi'' = 0$. This happens when $H(x,u)$ is linear in $u$, i.e. when

$$
H_i[u] = \gamma_i(x)u, \qquad i = 1, \dots, d,
$$

in which case we have

$$
\varphi(t, r, x) = r \exp\Big( \int_0^t \gamma(x)\,dW_s \Big) = r \exp\Big( \sum_{i=1}^{d} \gamma_i(x) W^i_{0,t} \Big). \tag{12.17}
$$

Remark 12.9. Note that all dependence on $\dot W$ has disappeared in (12.17), and
consequently in (12.16). In the SPDE / filtering context this is known as robustification: the
transformed PDE $(\partial_t - {}^{\varphi}F)v = 0$ can be solved for any $W \in C([0,T], \mathbb{R}^d)$. This
provides a way to solve SPDEs of the form $du = F[u]\,dt + \sum_{i=1}^{d} \gamma_i(x)u \circ dW_t^i$
pathwise, so that $u$ depends continuously on $W$ in uniform topology.
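
The robustness expressed by (12.17) can be seen in a few lines of code. In this sketch the function names `phi` and `phi_inverse` and the representation of the increment $W_{0,t}$ as a plain list are assumptions made for illustration; the point is only that the formula involves the increment of W and no derivative of W.

```python
import math

def phi(r, x, W_increment, gammas):
    """phi(t, r, x) = r * exp(sum_i gamma_i(x) * W^i_{0,t}) as in (12.17).

    `W_increment` is the vector W_{0,t} = W_t - W_0; no derivative of W enters.
    """
    exponent = sum(g(x) * w for g, w in zip(gammas, W_increment))
    return r * math.exp(exponent)

def phi_inverse(u, x, W_increment, gammas):
    """Inverse of r -> phi(r, x, ...), i.e. the map u -> v of case b)."""
    exponent = sum(g(x) * w for g, w in zip(gammas, W_increment))
    return u * math.exp(-exponent)
```

Since `phi` depends on the path only through the value of the increment, two continuous paths that are uniformly close give uniformly close values, whether or not they are differentiable.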

We now turn our attention to case c). The point here is that the “inner” and “outer”
transformation seen above, namely

$$
v(t,x) = u(t, \psi_t(x)), \qquad v(t,x) = \varphi^{-1}(t, u(t,x), x),
$$

respectively, can be combined to handle noise coefficients obtained by adding those
from cases a) and b), i.e. noise coefficients of the type $\langle \beta_i(x), Du \rangle + H_i(x,u)$. We
content ourselves with the linear case

$$
H_i[u] = \langle \beta_i(x), Du \rangle + \gamma_i(x)u.
$$

Proposition 12.10 (Case c). Let $\psi = \psi^W$ be as in case a) and set
$\varphi(t, r, x) = r \exp\big( \int_0^t \gamma(\psi_s(x))\,dW_s \big)$. Then $u$ is a (classical) solution to

$$
\partial_t u = F(x, u, Du, D^2u) + \big( \langle \beta(x), Du \rangle + \gamma(x)u \big) \dot W
$$

if and only if $v(t,x) = u(t, \psi_t(x)) \exp\big( -\int_0^t \gamma(\psi_s(x))\,dW_s \big)$ is a (classical) solution
to

$$
\partial_t v - {}^{\varphi}(F^\psi)(t, x, v, Dv, D^2v) = 0.
$$

Remark 12.11. It is worth noting that the inner transformation $F \to F^\psi$ preserves
the class of linear operators. That is, if $F[u] = L[u]$ as given in (12.2), then $F^\psi$ is
again a linear operator. Because of the appearance of quadratic terms in $Du$, this
is not true for the outer transformation $F \to {}^{\varphi}F$ unless $\varphi'' = 0$. Fortunately, this
happens in the linear case and it follows that the transformation $F \to {}^{\varphi}(F^\psi)$ used in
case c) above does preserve the class of linear operators.

Let us reflect for a moment on what has been achieved. We started with a PDE
that involves Ẇ and in all cases we managed to transform the original problem to a
PDE where all dependence on Ẇ has been isolated in some auxiliary ODEs. In the
stochastic context (◦dW instead of dW = Ẇ dt) this is nothing but the reduction,
via stochastic flows, from a stochastic PDE to a random PDE, to be solved ω-wise.
In the same spirit, the rough case is now handled with the aid of flows for RDEs and
their stability properties.
Given $\mathbf{W} \in \mathscr{C}_g^{0,\alpha}$, we pick an approximating sequence $(W^\varepsilon)$, and transform

$$
\partial_t u^\varepsilon = F[u^\varepsilon] + H[u^\varepsilon]\dot W^\varepsilon \tag{12.18}
$$

to a PDE of the form

$$
\partial_t v^\varepsilon = F^\varepsilon[v^\varepsilon], \tag{12.19}
$$

e.g. with $F^\varepsilon = F^\psi$ and $\psi = \psi^{W^\varepsilon}$ in case a) and accordingly in the other cases. Then

$$
F^\varepsilon[w] = F^\varepsilon(t, x, w, Dw, D^2w)
$$

(in abusive notation) and the function $F^\varepsilon$ which appears on the right-hand side above
converges (e.g. locally uniformly) as $\varepsilon \to 0$, due to stability properties of flows
associated to RDEs as discussed in Section 8.9.
All one now needs is a (deterministic) PDE framework with a number of good
properties, along the following “wish list”.
1. All approximate problems, i.e. with $W^\varepsilon \in C^1([0,T], \mathbb{R}^d)$,

$$
\partial_t u^\varepsilon = F[u^\varepsilon] + \sum_{i=1}^{d} H_i[u^\varepsilon]\dot W_t^{\varepsilon,i}, \qquad u^\varepsilon(0,\cdot) = g^\varepsilon,
$$

should admit a unique solution, in a suitable class $\mathcal{U}$ of functions on $[0,T] \times \mathbb{R}^n$,
for a suitable class of initial conditions in some space $\mathcal{G}$.
2. The change of variable calculus (Propositions 12.6–12.10) should remain valid,
so that $u^\varepsilon \in \mathcal{U}$ is a solution to (12.18) if and only if its transformation $v^\varepsilon \in \mathcal{U}$ is a
solution to (12.19).
3. There should be a good stability theory, so that $g^\varepsilon \to g^0$ in $\mathcal{G}$ and $F^\varepsilon \to F^0$ (in a
suitable sense) allows one to obtain convergence in $\mathcal{U}$ of solutions $v^\varepsilon$ to (12.19) with
initial data $g^\varepsilon$ to the (unique) solution of the limiting problem $\partial_t v^0 = F^0[v^0]$
with initial data $g^0$.
4. At last, the topology of $\mathcal{U}$ should be weak enough to make sure that $v^\varepsilon \to v^0$
implies that the “back-transformed” $u^\varepsilon$ converges in $\mathcal{U}$, with limit $u^0$ being $v^0$
back-transformed.³
The final point suggests defining a solution to

$$
du = F[u]\,dt + H[u]\,d\mathbf{W}, \qquad u(0,\cdot) = g, \tag{12.20}
$$

as an element in $\mathcal{U}$ which, under the correct flow transformation associated to $\mathbf{W}$ and
$H$, solves the transformed equation $\partial_t v = F^0[v]$, $v(0,\cdot) = g$. To make this more
concrete, consider the transport case a). As before, $\psi = \psi^{\mathbf{W}}$ is the flow associated to
the RDE $dY = -\beta(Y)\,d\mathbf{W}$ and $u$ solves the above RPDE (with $H[u] = \langle \beta(x), Du \rangle$)
if, by definition, $v(t,x) := u(t, \psi_t(x))$ solves $\partial_t v = F^\psi[v]$, with $v(0,\cdot) = g$. The
same logic applies to cases b) and c).
We then have the following (meta-) theorem, subject to a PDE framework with
the above properties.
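Theorem 12.12. Let $\alpha \in (\tfrac13, \tfrac12]$. Given a geometric rough path $\mathbf{W} = (W, \mathbb{W}) \in
\mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d)$, pick $W^\varepsilon \in C^1([0,T], \mathbb{R}^d)$ so that

$$
(W^\varepsilon, \mathbb{W}^\varepsilon) := \Big( W^\varepsilon, \int_0^{\cdot} W^\varepsilon_{0,t} \otimes dW^\varepsilon_t \Big) \to \mathbf{W}
$$

in α-Hölder rough path metric. Consider unique solutions $u^\varepsilon \in \mathcal{U}$ to the PDEs

$$
\begin{aligned}
\partial_t u^\varepsilon &= F[u^\varepsilon] + H[u^\varepsilon]\dot W^\varepsilon, \\
u^\varepsilon(0,\cdot) &= g \in \mathcal{G}.
\end{aligned} \tag{12.21}
$$

Then there exists $u = u(t,x) \in \mathcal{U}$, not dependent on the approximating $(W^\varepsilon)$ but
only on $\mathbf{W} \in \mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d)$, so that

$$
u^\varepsilon = S[W^\varepsilon; g] \to u =: S[\mathbf{W}; g]
$$

as $\varepsilon \to 0$ in $\mathcal{U}$. This $u$ is the unique solution to the RPDE (12.20) in the sense of the
above definition. Moreover, the resulting solution map,

$$
S : \mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d) \times \mathcal{G} \to \mathcal{U},
$$

is continuous.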
It remains to identify suitable PDE frameworks, depending on the non-linearity
$F$. When $\partial_t u = F[u]$ is a scalar conservation law, entropy solutions actually provide
a suitable framework to handle additional rough noise, at least of (linear) type c),
[FG14]. On the other hand, when $F = F[u]$ is a fully non-linear second order
operator, say of Hamilton–Jacobi–Bellman (HJB) or Isaacs type, the natural framework
is viscosity theory [CIL92, FS06] and the problem of handling additional “rough”
noise, in the sense of $W \notin C^1$, also with non-linear $H = H(Du)$, was first raised by
Lions–Souganidis [LS98a, LS98b, LS00a, LS00b].

³ Given the roughness in $t$ of our transformations, typically α-Hölder, it would not be wise to
incorporate temporal $C^1$-regularity in the definition of the space $\mathcal{U}$.

12.1.3 Rough viscosity solutions

Consider a real-valued function $u = u(x)$ with $x \in \mathbb{R}^m$ and assume $u \in C^2$ is a
classical supersolution,

$$
-G(x, u, Du, D^2u) \ge 0,
$$

where $G$ is continuous and degenerate elliptic in the sense that $G(x, u, p, A) \le
G(x, u, p, A + B)$ whenever $B \ge 0$ in the sense of symmetric matrices. The idea is
to consider a (smooth) test function $\varphi$ which touches $u$ from below at some interior
point $\bar x$. Basic calculus implies that $Du(\bar x) = D\varphi(\bar x)$, $D^2u(\bar x) \ge D^2\varphi(\bar x)$ and, from
degenerate ellipticity,

$$
-G(\bar x, \varphi, D\varphi, D^2\varphi) \ge 0. \tag{12.22}
$$
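
The step from the classical supersolution property to (12.22) can be spelled out as follows; this is a sketch, with all functions evaluated at the touching point $\bar x$. Since $u - \varphi$ has a local minimum at $\bar x$ with $u(\bar x) = \varphi(\bar x)$, we may write $D^2u = D^2\varphi + B$ with $B \ge 0$, and degenerate ellipticity gives

$$
-G(\bar x, \varphi, D\varphi, D^2\varphi) \;\ge\; -G(\bar x, \varphi, D\varphi, D^2\varphi + B) \;=\; -G(\bar x, u, Du, D^2u) \;\ge\; 0.
$$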
This motivates the definition of a viscosity supersolution (at the point x̄) to −G = 0
as a (lower semi-)continuous function u with the property that (12.22) holds for
any test function which touches u from below at x̄. Similarly, viscosity subsolutions
are (upper semi-)continuous functions defined via test functions touching u from
above and by reversing the inequality in (12.22); viscosity solutions are both super-
and subsolutions. Observe that this definition covers (completely degenerate) first
order equations as well as parabolic equations, e.g. by considering ∂t − F = 0
on [0, T ] × Rn where F is degenerate elliptic. Let us mention a few key results of
viscosity theory, with special regard to our “wish list”.
1. One has existence and uniqueness results in the class of BC solutions to the
initial value problem $(\partial_t - F)u = 0$, $u(0,\cdot) = g \in BUC(\mathbb{R}^n)$, provided $F =
F(t, x, u, Du, D^2u)$ is continuous, degenerate elliptic, there exists $\gamma \in \mathbb{R}$ such
that, uniformly in $t, x, p, X$,

$$
\gamma(s - r) \le F(t, x, r, p, X) - F(t, x, s, p, X) \quad \text{whenever } r \le s, \tag{12.23}
$$

and some technical conditions hold.⁴ Without going into technical details, the
conditions are met for $F = L$ as in (12.2) and are robust under taking inf and sup
(provided the regularity of the coefficients holds uniformly). As a consequence, HJB
and Isaacs type non-linearities, where $F$ takes the form $\inf_a L_a$, $\inf_a \sup_{a'} L_{a,a'}$,
are also covered.
2. The change-of-variable “calculus” of Propositions 12.6–12.10 remains valid for
(continuous) viscosity solutions. Indeed, this can be checked directly from the
definition of a viscosity solution.
3. In fact, the technical conditions mentioned in 1. imply a particularly strong form
of uniqueness, known as comparison: assume $u$ (resp. $v$) is a subsolution (resp.
supersolution) and $u_0 \le v_0$; then $u \le v$ on $[0,T] \times \mathbb{R}^n$. A key feature of viscosity
theory is what workers in the field simply call stability, a powerful incarnation
of which is known as the Barles–Perthame procedure [FS06, Section VII.3] and
relies on comparison for (semi-continuous) sub- and super-solutions. In the form
relevant for us, one assumes comparison for $\partial_t - F^0$ and considers viscosity
solutions to $(\partial_t - F^\varepsilon)v^\varepsilon = 0$, with $v^\varepsilon(0,\cdot) = g^\varepsilon$, assuming locally uniform
boundedness of $v^\varepsilon$ and $g^\varepsilon \to g^0$ locally uniformly. Then $v^\varepsilon \to v^0$ locally
uniformly, where $v^0$ is the (unique) solution to the limiting problem $(\partial_t - F^0)v^0 = 0$,
with $v^0(0,\cdot) = g^0$.

⁴ ... the most important of which is [CIL92, (3.14)]. Additional assumptions on $F$ are necessary,
however, in particular due to the unboundedness of the domain $\mathbb{R}^n$, and these are not easily found
in the literature; see [DFO14]. One can also obtain existence and uniqueness results in BUC.
In the context of RPDEs above, again with focus on the transport case a) for
the sake of argument, $F^0 = F^\psi$, where $\psi = \psi^{\mathbf{W}}$ is a flow of $C^3$-diffeomorphisms
(associated to the RDE $dY = -\beta(Y)\,d\mathbf{W}$, thereby leading to
the assumption $\beta \in C_b^5$). As a structural condition on $F$, we may simply assume
“ψ-invariant comparison”, meaning that comparison holds for $\partial_t - F^\psi$, for any
$C^3$-diffeomorphism $\psi$ with bounded derivatives. Checking this condition turns out to be
easy. First, when $F = L$ is linear, we have $F^\psi = L^\psi$ also linear, with similar bounds
on the coefficients as $L$ due to the stringent assumptions on the derivatives of $\psi$.
From the above discussion, and in particular from what was said in 1., it is then
clear that $L$ satisfies ψ-invariant comparison. In fact, stability of the condition in 1.
under taking inf and sup also implies that HJB and Isaacs type non-linearities satisfy
ψ-invariant comparison.
It is now possible to implement the arguments of the previous (meta-) Theorem
12.12 in the viscosity framework [CFO11]. We tacitly assume that all approximate
problems of the form (12.24) below have a viscosity solution, for all $W^\varepsilon \in C^1$ and
$g \in BUC$, but see Remark 12.14.
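Theorem 12.13. Let $\alpha \in (\tfrac13, \tfrac12]$. Given a geometric rough path $\mathbf{W} = (W, \mathbb{W}) \in
\mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d)$, pick $W^\varepsilon \in C^1([0,T], \mathbb{R}^d)$ so that $(W^\varepsilon, \mathbb{W}^\varepsilon) \to \mathbf{W}$ in α-Hölder
rough path metric. Consider unique BC viscosity solutions $u^\varepsilon$ to

$$
\begin{aligned}
\partial_t u^\varepsilon &= F[u^\varepsilon] + \langle \beta(x), Du^\varepsilon \rangle \dot W^\varepsilon, \\
u^\varepsilon(0,\cdot) &= g \in BUC(\mathbb{R}^n),
\end{aligned} \tag{12.24}
$$

where $F$ satisfies ψ-invariant comparison. Then there exists $u = u(t,x) \in BC$, not
dependent on the approximating $(W^\varepsilon)$ but only on $\mathbf{W} \in \mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d)$, so that

$$
u^\varepsilon = S[W^\varepsilon; g] \to u =: S[\mathbf{W}; g]
$$

as $\varepsilon \to 0$ in the locally uniform sense. This $u$ is the unique solution to the RPDE (12.20)
with transport noise $H[u] = \langle \beta(x), Du \rangle$ in the sense of the definition given before
Theorem 12.12. Moreover, we have continuity of the solution map,

$$
S : \mathscr{C}_g^{0,\alpha}([0,T], \mathbb{R}^d) \times BUC(\mathbb{R}^n) \to BC([0,T] \times \mathbb{R}^n).
$$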

Remark 12.14. In the above theorem, existence of RPDE solutions actually relies on
existence of approximate solutions $u^\varepsilon$, which one of course expects from standard
viscosity theory. Mild structural conditions on $F$, satisfied by HJB and Isaacs
examples, which imply this existence are reviewed in [DFO14]. One can also establish a
modulus of continuity for RPDE solutions, so that $u \in BUC$ after all.
Remark 12.15. The RPDE solution to $du = F[u]\,dt + \langle \beta(x), Du \rangle\,d\mathbf{W}$ as constructed
above, when $F = \inf_a L_a$ is of HJB form, arises in pathwise stochastic control
[LS98b, BM07, DFG13].
Unfortunately, in the “semi-linear” noise case b), it turns out that the structural
assumptions one has to impose on $F$ in order to have the necessary comparison for
$\partial_t - F^0 = 0$ are rather restrictive, although semilinear situations are certainly covered.
Even in this case, due to the appearance of a quadratic non-linearity in $Du$, the
argument is involved and requires a careful analysis on consecutive small time intervals,
rather than $[0,T]$; see [LS00a, DF12]. A non-linear Feynman–Kac representation, in
terms of rough backward stochastic differential equations, is given in [DF12].
At last, we return to the fully linear case of Section 12.1.1. That is, we consider
the (linear noise) case c) with linear $F = L$. With some care [FO14], the double
transformation leading to the transformed equation $\partial_t - {}^{\varphi}(F^\psi) = 0$ can be
implemented with the aid of coupled flows of rough differential equations. We can then
recover Theorem 12.2, but with somewhat different needs concerning the regularity
of the coefficients. (For instance, in the aforementioned theorem we really needed
$\sigma, \beta \in C_b^3$ whereas now, using flow decomposition, we need $\beta \in C_b^5$ but only $\sigma \in C_b^1$.)
Remark 12.16. By either approach, case c) with linear $F = L$ or Theorem 12.2,
we obtain a robust view on classes of SPDEs which contain the Zakai equation from
filtering theory, provided the initial law admits a BUC-density. Robustness is an
important issue in filtering theory, see also Exercise 12.24.

12.2 Stochastic heat equation as a rough path

Nonlinear stochastic partial differential equations driven by very singular noise, say
space-time white noise, may suffer from the fact that their nonlinearities are ill-posed.
For instance, even in space dimension one, there is no obvious way of giving “weak”
meaning to Burgers-like stochastic PDEs of the type

$$
\partial_t u^i = \partial_x^2 u^i + f(u) + \sum_{j=1}^{n} g^i_j(u)\,\partial_x u^j + \xi^i, \qquad i = 1, \dots, n, \tag{12.25}
$$

where $\xi = (\xi^i)$ denotes space-time white noise (strictly speaking, n independent
copies of scalar space-time white noise). Recall that, at least formally, space-time
white noise is a Gaussian generalized stochastic process such that

$$
E\,\xi^i(t,x)\,\xi^j(s,y) = \delta_{ij}\,\delta(t-s)\,\delta(x-y).
$$

As a consequence of the lack of regularity of ξ, it turns out that the solution to the
stochastic heat equation (i.e. the case f = g = 0 in (12.25) above) is only α-Hölder
continuous in the spatial variable x for any α < 1/2. In other words, one would
not expect any solution u to (12.25) to exhibit spatial regularity better than that of a
Brownian motion.
As a consequence, even when aiming for a weak solution theory, it is not clear
how to define the integral of a spatial test function $\varphi$ against the nonlinearity. Indeed,
this would require us to make sense of expressions of the type

$$
\int \varphi(x)\,g^i_j(u)\,\partial_x u^j(t,x)\,dx,
$$

for fixed $t$. When $g$ happens to be a gradient, such an integral can be defined by
postulating that the chain rule holds and integrating by parts. For a general $g$, as arising
in applications from path sampling [HSV07], this approach fails. This suggests
seeking an understanding of $u(t,\cdot)$ as a spatial rough path. Indeed, this would solve the
problem just explained by allowing us to define the nonlinearity in a weak sense as

$$
\int \varphi(x)\,g^i_j(u)\,d\mathbf{u}^j(t,x),
$$

where $\mathbf{u}$ is the rough path associated to $u$.

In the particular case of (12.25), it is actually sufficient to be able to associate a
rough path to the solution $\psi$ to the stochastic heat equation

$$
\partial_t \psi = \partial_x^2 \psi + \xi.
$$

Indeed, writing $u = \psi + v$ and proceeding formally for the moment, we then see that
$v$ should solve

$$
\partial_t v^i = \partial_x^2 v^i + f(v + \psi) + \sum_{j=1}^{n} g^i_j(v + \psi)\big( \partial_x \psi^j + \partial_x v^j \big).
$$

If we were able to make sense of the term appearing in the right hand side of this
equation, one would expect it to have the same regularity as $\partial_x \psi$ so that, since
$\psi(t,\cdot)$ turns out to belong to $C^\alpha$ for every $\alpha < 1/2$, one would expect $v(t,\cdot)$ to be
of regularity $C^{\alpha+1}$ for every $\alpha < 1/2$. In particular, we would not expect the term
involving $\partial_x v^j$ to cause any trouble, so that it only remains to provide a meaning for
the term $g^i_j(v + \psi)\,\partial_x \psi^j$. If we know that $v \in C^1$ and we have an interpretation of
$\psi(t,\cdot)$ as a rough path $\boldsymbol{\psi}$ (in space), then this can be interpreted as the distribution
whose action, when tested against a test function $\varphi$, is given by

$$
\int \varphi(x)\,g^i_j(\psi + v)\,d\boldsymbol{\psi}^j(t,x).
$$

This reasoning can actually be made precise, see the original article [Hai11b]. In this
section we limit ourselves to providing the construction of $\boldsymbol{\psi}$ and giving some of its
basic properties.

12.2.1 The linear stochastic heat equation

We now study the model problem in this context: the construction of a spatial rough
path associated, in essence, to the above SPDE in the case $f = g = 0$. More precisely,
we are considering the stationary (in time) solution to the stochastic heat equation⁵,

$$
d\psi_t = -A\psi_t\,dt + \sigma\,dW_t, \tag{12.26}
$$

where, for fixed $\lambda > 0$,

$$
Au = -\partial_x^2 u + \lambda u,
$$

and $W$ is a cylindrical Wiener process over $L^2(\mathbb{T})$, the $L^2$-space over the
one-dimensional torus. Let $(e_k : k \in \mathbb{Z})$ denote the standard Fourier basis of $L^2(\mathbb{T})$,

$$
e_k(x) = \begin{cases}
\frac{1}{\sqrt{\pi}} \sin(kx) & \text{for } k > 0, \\
\frac{1}{\sqrt{2\pi}} & \text{for } k = 0, \\
\frac{1}{\sqrt{\pi}} \cos(kx) & \text{for } k < 0,
\end{cases}
$$

which diagonalises the operator $A$ in the sense that

$$
Ae_k = \mu_k e_k, \qquad \mu_k = k^2 + \lambda, \qquad k \in \mathbb{Z}.
$$

Thanks to the fact that we chose λ > 0, the stochastic heat equation (12.26) has
indeed a stationary solution
P which, by taking Fourier transforms, may be decom-
posed as ψ(x, t; ω) = k Ytk (ω)ek (x). The components Ytk are then a family of
independent stationary one-dimensional Ornstein-Uhlenbeck processes given by

dYtk = −µk Ytk dt + σdBtk ,

where (B k : k ∈ Z) is a family of i.i.d. standard Brownian motions. An explicit

calculation yields
σ2
E Ysk Ytk =

exp (−µk |t − s|) ,
2µk
so that in particular

5
With λ = 0, the 0th mode of ψ behaves like a Brownian motion and ψ cannot be stationary in
time, unless one identifies functions that only differ by a constant.

\[ E\bigl[(Y^k_t)^2\bigr] = \frac{\sigma^2}{2\mu_k} , \]

for any fixed time $t$.

Lemma 12.17. For each fixed $t$, the spatial covariance of $\psi$ is given by

\[ E\bigl(\psi(x,t)\psi(y,t)\bigr) = K(|x-y|) , \]

where $K$ is given by

\[ K(u) := \frac{\sigma^2}{4\pi} \sum_{k\in\mathbb{Z}} \frac{\cos(ku)}{\mu_k}
 = \frac{\sigma^2}{4\sqrt{\lambda}\,\sinh(\sqrt{\lambda}\,\pi)} \cosh\bigl(\sqrt{\lambda}\,(u-\pi)\bigr) . \]

Here, the second equality holds for $u$ restricted to $[0,2\pi]$. In fact, the cosine series is
the periodic continuation of the r.h.s. restricted to $[0,2\pi]$.

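Proof. From the basic identity $\cos(\alpha-\beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta$,

\[ e_{-k}(x)e_{-k}(y) + e_k(x)e_k(y) = \frac{1}{\pi}\cos\bigl(k(x-y)\bigr) , \qquad k \in \mathbb{Z} . \]

Inserting the respective expansion in $R(x,y) := E(\psi(x,t)\psi(y,t))$, and using the
independence of the $(Y^k : k \in \mathbb{Z})$, gives

\[ R(x,y) = \sum_{k\in\mathbb{Z}} e_k(x)e_k(y)\,E\bigl[(Y^k_t)^2\bigr]
 = \frac{1}{2\pi} E\bigl[(Y^0_t)^2\bigr] + \frac{1}{\pi}\sum_{k=1}^{\infty} \cos\bigl(k(x-y)\bigr)\,E\bigl[(Y^k_t)^2\bigr]
 = \frac{\sigma^2}{4\pi} \sum_{k\in\mathbb{Z}} \frac{\cos(k(x-y))}{\lambda+k^2} , \]

and then $R(x,y) = K(|x-y|)$ where

\[ K(x) = \frac{\sigma^2}{4\pi} \sum_{k\in\mathbb{Z}} \frac{\cos(kx)}{\lambda+k^2} . \]

At last, expand the (even) function $\cosh\bigl(\sqrt{\lambda}\,(|\cdot|-\pi)\bigr)$ in its (cosine) Fourier series
to get the claimed equality. □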
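The identity of Lemma 12.17 is easy to check numerically. The following is only a sketch: the function names `K_series` and `K_closed`, the default values λ = σ = 1 and the truncation level are our own choices for illustration and are not part of the text.

```python
import math

def K_series(u, lam=1.0, sigma=1.0, n_terms=20_000):
    """Truncated cosine series (sigma^2 / 4 pi) * sum_k cos(k u) / (lam + k^2)."""
    total = 1.0 / lam  # contribution of k = 0
    for k in range(1, n_terms + 1):
        # the terms k and -k coincide, hence the factor 2
        total += 2.0 * math.cos(k * u) / (lam + k * k)
    return sigma ** 2 / (4.0 * math.pi) * total

def K_closed(u, lam=1.0, sigma=1.0):
    """Closed form of Lemma 12.17, valid for u in [0, 2 pi]."""
    r = math.sqrt(lam)
    return sigma ** 2 * math.cosh(r * (u - math.pi)) / (4.0 * r * math.sinh(r * math.pi))

if __name__ == "__main__":
    for u in (0.0, 1.0, math.pi, 5.0, 2.0 * math.pi):
        print(f"u = {u:.3f}: series = {K_series(u):.6f}, closed form = {K_closed(u):.6f}")
```

Since the tail of the series is of order 1/N for a truncation at N terms, the two values can only be expected to agree up to an error of that order.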

Proposition 12.18. Fix $t \ge 0$. Then $\psi_t(x;\omega) = \psi(t,x;\omega)$, indexed by $x \in [0,2\pi]$,
is a centred Gaussian process with covariance of finite 1-variation. More precisely,

\[ \bigl\| R_{\psi(t,\cdot)} \bigr\|_{1;[x,y]^2} \le 2\pi \|K\|_{C^2;[0,2\pi]}\, |x-y| , \]

and so (cf. Theorem 10.4), for each fixed $t \ge 0$, the $\mathbb{R}^d$-valued process

\[ [0,2\pi] \ni x \mapsto \bigl(\psi^1_t(x), \dots, \psi^d_t(x)\bigr) , \]

consisting of $d$ i.i.d. copies of $\psi_t$, lifts canonically to a Gaussian rough path
$\boldsymbol{\psi}_t(\cdot) \in \mathscr{C}^{0,\alpha}_g\bigl([0,2\pi],\mathbb{R}^d\bigr)$.

Proof. This follows immediately from Exercise 10.16. □

Remark 12.19. There are ad-hoc ways to construct a (spatial) rough path lift asso-
ciated to the stochastic heat equation, for instance by writing ψ(t, ·) as a Brownian
bridge plus a random smooth function. In this way, however, one ignores the large
body of results available for general Gaussian rough paths: for instance, rough path
convergence of hyper-viscosity or Galerkin approximations, extensions to fractional
stochastic heat equations, and concentration of measure can all be deduced from general
principles.

We now show that solutions to the stochastic heat equation induce a continuous
stochastic evolution in rough path space.

Theorem 12.20. There exists a continuous modification of the map $t \mapsto \boldsymbol{\psi}_t$ with
values in $\mathscr{C}^{\alpha}_g\bigl([0,2\pi],\mathbb{R}^d\bigr)$.

Proof. Fix $s$ and $t$. The proof then proceeds in two steps. First, we will verify the
assumptions of Corollary 10.6, namely we will show that

\[ \bigl| \varrho_\alpha(\boldsymbol{\psi}_s, \boldsymbol{\psi}_t) \bigr|_{L^q}
 \le C \sup_{x,y\in[0,2\pi]} \Bigl[ E\bigl( |\psi_s(x,y) - \psi_t(x,y)|^2 \bigr) \Bigr]^{\theta} , \]

for some constant $C$ that is independent of $s$ and $t$. In the second step, we will show
that (here we may assume $d = 1$), with $\psi_s(x,y) := \psi_s(y) - \psi_s(x)$, one has the
bound

\[ \sup_{x,y\in[0,2\pi]} E\bigl[ |\psi_s(x,y) - \psi_t(x,y)|^2 \bigr] = O\bigl( |t-s|^{1/2} \bigr) . \]

The existence of a continuous (and even Hölder) modification is then a consequence
of the classical Kolmogorov criterion.
For the first step, we write $X = \bigl(\psi^1_s(\cdot), \dots, \psi^d_s(\cdot)\bigr)$ and $Y = \bigl(\psi^1_t(\cdot), \dots, \psi^d_t(\cdot)\bigr)$.
Note that one has independence of $(X^i, Y^i)$ with $(X^j, Y^j)$ for $i \ne j$. We have to
verify finite 1-variation (in the 2D sense) of the covariance of $(X, Y)$. In view of
Proposition 12.18, it remains to establish finite 1-variation of

\[ (x,y) \mapsto R_{(X^1,Y^1)}(x,y) = E\bigl(\psi^1_s(x)\psi^1_t(y)\bigr)
 = \sum_{k\in\mathbb{Z}} e_k(x)e_k(y)\,E\bigl(Y^k_s Y^k_t\bigr)
 = \frac{\sigma^2}{4\pi} \sum_{k\in\mathbb{Z}} \frac{\cos(k(x-y))}{\lambda+k^2}\, e^{-(\lambda+k^2)|t-s|}
 =: R_\tau(x,y) , \]

where we write $\tau := |t-s|$.

For every $\tau > 0$, exponential decay of the Fourier modes implies smoothness of $R_\tau$.
We claim

\[ \|R_\tau\|_{1\text{-var};[u,v]^2} \le C|v-u| < \infty , \]

uniformly in $\tau \in (0,1]$ and $u, v$. To see this, write

\begin{align*}
\|R_\tau\|_{1\text{-var};[u,v]^2}
 &= \int_u^v\!\!\int_u^v |\partial_{xy} R_\tau| \,dx\,dy \\
 &\sim \int_u^v\!\!\int_u^v \Bigl| \sum_k \frac{k^2 e^{ik(x-y)}}{\lambda+k^2}\, e^{-(\lambda+k^2)\tau} \Bigr| \,dx\,dy \\
 &\sim \int_u^v\!\!\int_u^v \Bigl| \sum_k e^{ik(x-y)} e^{-k^2\tau} \Bigr| \,dx\,dy \\
 &= \int_u^v\!\!\int_u^v p_\tau(x-y)\,dy\,dx \le |v-u| ,
\end{align*}
where we used the trivial estimate $\int_u^v p_\tau(x-y)\,dy \le \int_0^{2\pi} p_\tau(x-y)\,dy = 1$. In this
expression, $p$ denotes the (positive) transition kernel of the heat semigroup on the
torus. The step above, between the second and third line, where we effectively set $\lambda = 0$,
is harmless. The factor $e^{-\lambda\tau}$ may simply be taken out, and

\[ \Bigl| \sum_k \Bigl( 1 - \frac{k^2}{\lambda+k^2} \Bigr) e^{ik(x-y)} e^{-k^2\tau} \Bigr|
 \le \sum_k \Bigl| 1 - \frac{k^2}{\lambda+k^2} \Bigr| = \sum_k \frac{\lambda}{\lambda+k^2} < \infty . \]

After integrating over $[u,v]^2$, we see that the error made above is actually of order
$O\bigl(|v-u|^2\bigr)$. This is more than enough to conclude that

\[ \bigl\| R_{(X^1,Y^1)} \bigr\|_{1\text{-var};[u,v]^2} \le C|v-u| < \infty , \]

uniformly in $\tau \in (0,1]$ and $u, v$.

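We now turn to the second step of our proof. We claim that
$E|\psi^1_s(x,y) - \psi^1_t(x,y)|^2 = O\bigl(|t-s|^{1/2}\bigr)$, uniformly in $x, y \in [0,2\pi]$. Since

\[ \bigl| \psi^1_s(x,y) - \psi^1_t(x,y) \bigr| \le \bigl| \psi^1_s(x) - \psi^1_t(x) \bigr| + \bigl| \psi^1_s(y) - \psi^1_t(y) \bigr| , \]

the question reduces to a similar bound on $E|\psi^1_s(x) - \psi^1_t(x)|^2$, uniform in $x \in [0,2\pi]$.
This quantity is equal to

\begin{align*}
E\bigl(\psi^1_s(x)\psi^1_s(x)\bigr) - 2E\bigl(\psi^1_s(x)\psi^1_t(x)\bigr) + E\bigl(\psi^1_t(x)\psi^1_t(x)\bigr)
 &= \frac{\sigma^2}{4\pi} \sum_{k\in\mathbb{Z}} \frac{2\bigl(1 - e^{-(\lambda+k^2)|t-s|}\bigr)}{\lambda+k^2} \\
 &\le \frac{\sigma^2}{4\pi} \sum_{|k|<N} 2|t-s| + \frac{\sigma^2}{4\pi} \sum_{k\ge N} \frac{2\bigl(1 - e^{-(\lambda+k^2)|t-s|}\bigr)}{\lambda+k^2} ,
\end{align*}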

where we used that $1 - e^{-cx} \le cx$ for $c, x > 0$ in the first sum. We then take
$N \sim |t-s|^{-1/2}$, so that the first sum is of order $O\bigl(|t-s|^{1/2}\bigr)$. For the second sum,
we use the trivial bound $1 - e^{-(\lambda+k^2)|t-s|} \le 1$. It then suffices to note that

\[ \sum_{k\ge N} \frac{1}{\lambda+k^2} \le \sum_{k\ge N} \frac{1}{k^2} = O(1/N) = O\bigl(|t-s|^{1/2}\bigr) , \]

which completes the proof. □
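The square-root scaling obtained in the second step can be observed numerically. The sketch below is an illustration only: the function names, the choice λ = 1 and the truncation level `n_max` are assumptions made for this example. It evaluates a truncation of the Fourier sum appearing in the proof and divides by |t − s|^{1/2}; if the bound is sharp, the quotient should stay roughly constant as |t − s| decreases.

```python
import math

def temporal_increment_sum(tau, lam=1.0, n_max=100_000):
    """Truncation of sum_{k in Z} 2 (1 - exp(-(lam + k^2) tau)) / (lam + k^2)."""
    total = 2.0 * (1.0 - math.exp(-lam * tau)) / lam  # k = 0
    for k in range(1, n_max + 1):
        mu = lam + k * k
        # the terms k and -k coincide, hence the extra factor 2
        total += 2.0 * 2.0 * (1.0 - math.exp(-mu * tau)) / mu
    return total

def scaled(tau):
    """The sum divided by tau^{1/2}."""
    return temporal_increment_sum(tau) / math.sqrt(tau)

if __name__ == "__main__":
    for tau in (1e-2, 1e-3, 1e-4):
        print(f"tau = {tau:.0e}: sum / sqrt(tau) = {scaled(tau):.4f}")
```

The truncation discards a tail of order 1/n_max, so the check is only meaningful as long as |t − s|^{1/2} is large compared to 1/n_max.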
Remark 12.21. The final estimate in the above proof, namely

\[ E\bigl| \psi^1_s(x) - \psi^1_t(x) \bigr|^2 = O\bigl(|t-s|^{1/2}\bigr) , \]

also implies “almost $\tfrac14$-Hölder” temporal regularity of the stochastic heat equation.

12.3 Exercises

Exercise 12.22 (From [DFS14]).

a) Assume $W \in C^1$. Show that the Feynman–Kac (equivalently: viscosity)
solution to (12.4) is an analytically weak solution in the sense of (12.13) with $dW$
replaced by $\dot W\,dt$.
b) Assume now $\mathbf{W} = (W, \mathbb{W}) \in \mathscr{C}^{0,\alpha}_g$. Show that $(Y, Y') \in \mathscr{D}^{2\alpha}_W$.
c) Show that the Feynman–Kac solution constructed in Theorem 12.2 is an analyti-
cally weak solution in the sense of (12.13).

Exercise 12.23 (From [CDFO13]). A crucial rôle in the proof of Theorem 12.2 was
played by a hybrid Itô-rough differential equation of the form

\[ dX_t = \sigma(X_t)\,dB + \beta(X_t)\,d\mathbf{W} , \tag{12.27} \]

ultimately solved as a (random) rough differential equation, subject to $\sigma, \beta \in C^3_b$. Give
an alternative construction of the hybrid equation based on flow decomposition. That
is, use the flow associated to the RDE $dY = \beta(Y)\,d\mathbf{W}$ and transform (12.27) into a
bona fide Itô differential equation. Hint: When $\mathbf{W}$ is replaced by a $C^1$ path $W^\varepsilon$ this
is a straightforward computation. Use stability of RDE flows, combined with stability
results for Itô SDEs, to conclude. Specify the regularity requirements on $\sigma, \beta$.

Exercise 12.24 (Robust filtering, [CDFO13]). Consider a pair of processes $(X, Y)$
with dynamics

\begin{align}
dX_t &= V_0(X_t,Y_t)\,dt + \sum_k Z_k(X_t,Y_t)\,dW^k_t + \sum_j V_j(X_t,Y_t)\,dB^j_t , \tag{12.28} \\
dY_t &= h(X_t,Y_t)\,dt + dW_t , \tag{12.29}
\end{align}

with $X_0 \in L^\infty$ and $Y_0 = 0$. For simplicity, assume coefficients
$V_0, V_1, \dots, V_{d_B} : \mathbb{R}^{d_X+d_Y} \to \mathbb{R}^{d_X}$, $Z_1, \dots, Z_{d_Y} : \mathbb{R}^{d_X+d_Y} \to \mathbb{R}^{d_X}$ and
$h = (h^1, \dots, h^{d_Y}) : \mathbb{R}^{d_X+d_Y} \to \mathbb{R}^{d_Y}$ to be bounded with bounded derivatives of all orders; $W$ and
$B$ are independent Brownian motions of the correct dimension. We now interpret $X$
as a signal and $Y$ as a noisy and incomplete observation. The filtering problem consists
in computing the conditional distribution of the unobserved component $X$, given the
observation $Y$. Equivalently, one is interested in computing

\[ \pi_t(g) = E\bigl[ g(X_t,Y_t) \,\big|\, \mathcal{Y}_t \bigr] , \]

where $\mathcal{Y}_t$ is the observation filtration and $g$ is a suitably chosen test function. Measure
theory tells us that there exists a Borel-measurable map $\theta^g_t : C([0,t],\mathbb{R}^{d_Y}) \to \mathbb{R}$, such
that a.s. $\pi_t(g) = \theta^g_t(Y)$ where we consider $Y = Y(\omega)$ as a $C([0,t],\mathbb{R}^{d_Y})$-valued
random variable. Note that $\theta^g_t$ is not uniquely determined (after all, modifications
on null sets are always possible). On the other hand, there is obvious interest in
having a robust filter, in the sense of having a continuous version of $\theta^g_t$, so that close
observations lead to nearby conclusions about the signal.
a) Give an example to show that, in general, $\theta^g_t$ does not admit a continuous version.
b) Let $\alpha \in (1/3, 1/2)$. Show that there exists a continuous map on rough path space

\[ \Theta^g_t : \mathscr{C}^{0,\alpha}_g([0,t],\mathbb{R}^{d_Y}) \to \mathbb{R} , \]

such that a.s.

\[ \pi_t(g) = \Theta^g_t(\mathbf{Y}) , \tag{12.30} \]

where $\mathbf{Y}$ is the random geometric rough path obtained from $Y$ by iterated
Stratonovich integration.
Hint: You may use the “Kallianpur–Striebel formula”, a standard result in filtering
theory which asserts that

\[ \pi_t(g) = \frac{p_t(g)}{p_t(1)} , \qquad p_t(g) := E_0\bigl[ g(X_t,Y_t)\, v_t \,\big|\, \mathcal{Y}_t \bigr] , \]

where

\[ \frac{dP_0}{dP}\Big|_{\mathcal{F}_t} = \exp\Bigl( -\sum_i \int_0^t h^i(X_s,Y_s)\,dW^i_s - \frac12 \int_0^t \|h(X_s,Y_s)\|^2\,ds \Bigr) \]

and $v = \{v_t, t > 0\}$ is defined as the right-hand side above with $-W$ replaced by $Y$.

Exercise 12.25. Show almost sure “$(\tfrac14 - \varepsilon)$-Hölder” temporal regularity of
$\psi = \psi_t(x;\omega)$, solution to the stochastic heat equation. Show that, for fixed $x$, $\psi_t(x;\omega)$ is
not a semi-martingale.

Exercise 12.26 (Spatial Itô–Stratonovich correction; from [HM12]). Writing $\mathbb{T}$
for $[0,2\pi]$ with periodic boundary, let us say that

\[ u = u(t,x;\omega) : [0,T] \times \mathbb{T} \times \Omega \to \mathbb{R} \]

is an (analytically) weak solution to

\[ \partial_t u = \partial_{xx} u - u + \partial_x\bigl(\tfrac12 u^2\bigr) + \xi , \tag{$\star$} \]

if and only if $u = v + \psi$ where $\psi$ is the stationary solution to $\partial_t\psi = \partial_{xx}\psi - \psi + \xi$
and, for all test functions $\varphi \in C^\infty(\mathbb{T})$,

\[ \partial_t \langle v, \varphi\rangle = \langle v, \partial_{xx}\varphi\rangle - \langle v, \varphi\rangle - \bigl\langle \tfrac12 u^2, \partial_x\varphi \bigr\rangle . \]

a) Replace $\partial_x(\tfrac12 u^2)$ in ($\star$) by a (spatially right) finite-difference approximation,

\[ \frac12\, \frac{u(\cdot+\varepsilon)^2 - u^2}{\varepsilon} ; \]

write $u^\varepsilon$ for a solution to the resulting equation. Assume $u^\varepsilon \to u$ locally uniformly
in probability. Show that $u$ is a solution to ($\star$).

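b) At least formally, $\partial_x\bigl(\tfrac12 u^2\bigr) = u\,\partial_x u$ in ($\star$), which suggests an alternative finite-
difference approximation, namely,

\[ u\, \frac{u(\cdot+\varepsilon) - u}{\varepsilon} ; \]

write $u^\varepsilon$ for a solution to the resulting equation. Assume that $v^\varepsilon = u^\varepsilon - \psi \to v := u - \psi$,
together with its first (spatial) derivative, converges locally uniformly in probability.
Show that $u$ is an analytically weak solution to the perturbed equation

\[ \partial_t u = \partial_{xx} u - u + \partial_x\bigl(\tfrac12 u^2\bigr) + C + \xi \]

with $C \ne 0$. Determine the value of $C$. Hint: Use Exercise 10.20.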

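Solution 12.27. a) By switching to suitable subsequences, we may assume $u^\varepsilon \to u$
locally uniformly with probability one. Write $D_{\varepsilon,l}$, $D_{\varepsilon,r}$ for the discrete (left, right)
finite-difference approximations. Note

\[ \bigl\langle D_{\varepsilon,r}\bigl(\tfrac12 u^2\bigr), \varphi \bigr\rangle
 = -\bigl\langle \tfrac12 u^2, D_{\varepsilon,l}\varphi \bigr\rangle
 \to -\bigl\langle \tfrac12 u^2, \partial_x\varphi \bigr\rangle . \]

Given that $v^\varepsilon = u^\varepsilon - \psi \to v := u - \psi$ locally uniformly, it then suffices to pass to
the limit in the (integral formulation of)

\[ \partial_t \langle v^\varepsilon, \varphi\rangle = \langle v^\varepsilon, \partial_{xx}\varphi\rangle - \langle v^\varepsilon, \varphi\rangle
 - \bigl\langle \tfrac12 (u^\varepsilon)^2, D_{\varepsilon,l}\varphi \bigr\rangle . \]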

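b) We note

\begin{align*}
D_{\varepsilon,r}\bigl(\tfrac12 u^2\bigr)
 &= \frac12\, \frac{u(\cdot+\varepsilon)^2 - u^2}{\varepsilon}
 = \frac{u(\cdot+\varepsilon) + u}{2}\; \frac{u(\cdot+\varepsilon) - u}{\varepsilon} \\
 &= u\, \frac{u(\cdot+\varepsilon) - u}{\varepsilon} + \frac{1}{2\varepsilon} \bigl( u(\cdot+\varepsilon) - u \bigr)^2 .
\end{align*}

It follows that

\begin{align*}
\partial_t \langle v^\varepsilon, \varphi\rangle
 &= \langle v^\varepsilon, \partial_{xx}\varphi\rangle - \langle v^\varepsilon, \varphi\rangle
 + \Bigl\langle u^\varepsilon\, \frac{u^\varepsilon(\cdot+\varepsilon) - u^\varepsilon}{\varepsilon}, \varphi \Bigr\rangle \\
 &= \langle v^\varepsilon, \partial_{xx}\varphi\rangle - \langle v^\varepsilon, \varphi\rangle
 - \bigl\langle \tfrac12 (u^\varepsilon)^2, D_{\varepsilon,l}\varphi \bigr\rangle
 - \Bigl\langle \frac{1}{2\varepsilon} \bigl( u^\varepsilon(\cdot+\varepsilon) - u^\varepsilon \bigr)^2, \varphi \Bigr\rangle .
\end{align*}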

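In order to pass to the $\varepsilon \to 0$ limit, we must understand the final “quadratic
variation” term. By assumption $v^\varepsilon$ are of class $C^1$, uniformly in $\varepsilon$. Hence

\begin{align*}
u^\varepsilon(\cdot+\varepsilon) - u^\varepsilon &= \psi(\cdot+\varepsilon) - \psi + v^\varepsilon(\cdot+\varepsilon) - v^\varepsilon \\
 &= \psi(\cdot+\varepsilon) - \psi + O(\varepsilon)
\end{align*}

and so, with $\operatorname{osc}(\psi;\varepsilon)\,O(1) + O(\varepsilon) = o(1)$ as $\varepsilon \to 0$,

\[ \frac{1}{2\varepsilon} \bigl( u^\varepsilon(\cdot+\varepsilon) - u^\varepsilon \bigr)^2 = \frac{1}{2\varepsilon} \bigl( \psi(\cdot+\varepsilon) - \psi \bigr)^2 + o(1) , \]

we have

\[ \Bigl\langle \frac{1}{2\varepsilon} \bigl( u^\varepsilon(\cdot+\varepsilon) - u^\varepsilon \bigr)^2, \varphi \Bigr\rangle
 = \Bigl\langle \frac{1}{2\varepsilon} \bigl( \psi(\cdot+\varepsilon) - \psi \bigr)^2, \varphi \Bigr\rangle + o(1) . \]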

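From Lemma 12.17 we know that

\[ E\bigl[ \psi_{x,x+\varepsilon}^2 \bigr] = 2\bigl( K(0) - K(\varepsilon) \bigr) = -2K'(0)\,\varepsilon + o(\varepsilon) = C\varepsilon + o(\varepsilon) . \]

Since $K(u) = \frac{\cosh(u-\pi)}{4\sinh(\pi)}$, we have $C = -2K'(0) = \tfrac12$, and it follows from
Exercise 10.20 that

\[ \Bigl\langle \frac{1}{2\varepsilon} \bigl( \psi(\cdot+\varepsilon) - \psi \bigr)^2, \varphi \Bigr\rangle
 = \frac12 \int \varphi(x)\, \frac{\psi_{x,x+\varepsilon}^2}{\varepsilon}\,dx
 \to \frac12 \int \varphi(x)\, C\,dx = \bigl\langle \tfrac14, \varphi \bigr\rangle , \]

where the convergence takes place in probability. Since this term enters the weak
formulation of b) with a negative sign, it follows that $u$ is a solution (in
the above analytically weak sense) of

\[ \partial_t u = \partial_{xx} u - u + \partial_x\bigl(\tfrac12 u^2\bigr) - \frac14 + \xi . \]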

12.4 Comments

Section 12.1: Linear stochastic partial differential equations go back at least to
Krylov–Rozovskii [KR77]. A Feynman–Kac representation appears in Pardoux
[Par79] and Kunita [Kun82]. Kunita also has flow decompositions of SPDE solutions.
Caruana–Friz [CF09] implement this in the rough path setting in a framework of
classical PDE solutions. In the context of the Crandall–Ishii–Lions viscosity setting,
non-linear SPDE problems (“stochastic viscosity solutions”) were introduced by
Lions–Souganidis [LS98a, LS98b, LS00a, LS00b]. Caruana, Friz and Oberhauser
[CFO11] introduce “rough viscosity solutions” for classes of non-linear SPDEs with
transport noise. Extensions to different noise situations are due to Diehl–Friz [DF12]
and then [FO14]. Non-linear noise, quadratic in Du, is considered by Friz–Gassiat
[FG13]. See [LPS13, FG14] for similar investigations in the context of stochastic
conservation laws. A non-linear Feynman–Kac representation (with relations to
“rough BSDEs”) is given in [DF12]. In a filtering context, a (rough path) robustified
Kallianpur–Striebel formula (cf. Exercise 12.24) was given by Crisan, Diehl, Friz and
Oberhauser [CDFO13], which is also the first source of hybrid differential equations.
The construction of hybrid stochastic/rough differential equations as encountered in
the proof of Theorem 12.2 is taken from [DOR13]; see also [DFS14]. At last, we
refer to Gubinelli–Tindel, Deya et al. and Teichmann [GT10, DGT12, Tei11] for
some other rough path approaches to SPDEs.
Section 12.2: The construction of a spatial rough path associated to the stochastic
heat equation is due to Hairer [Hai11b] and allows one to deal with otherwise ill-posed
SPDEs of stochastic Burgers type, see also Hairer–Weber [HW13] and Friz, Gess,
Gulisashvili, Riedel [FGGR13] for various extensions (including multiplicative noise,
and fractional Laplacian / non-periodic boundary respectively). This construction is
also important in giving meaning to the KPZ equation, see Hairer [Hai13] and Chapter
15. Exercise 12.26, in the spirit of Föllmer – rather than rough path – integration, is
taken from Hairer–Maas [HM12]. Similar results are available for rough SPDEs of
type (12.25), see Hairer, Maas and Weber [HMW14], but this is beyond the scope of
these notes.
Chapter 13
Introduction to regularity structures

Abstract We give a short introduction to the main concepts of the general theory
of regularity structures. This theory unifies the theory of (controlled) rough paths
with the usual theory of Taylor expansions and allows one to treat situations where the
underlying space is multidimensional.

13.1 Introduction

While a full exposition of the theory of regularity structures is well beyond the
scope of this book, we aim to give a concise overview of most of its concepts and
to show how the theory of controlled rough paths fits into it. In most cases, we will
only state results in a rather informal way and give some ideas as to how the proofs
work, focusing on conceptual rather than technical issues. The only exception is
the “reconstruction theorem”, Theorem 13.12 below, which is one of the linchpins
of the whole theory. Since its proof (or rather a slightly simplified version of it) is
relatively concise, we provide a fully self-contained version. For precise statements
and complete proofs of most of the results exposed here, we refer to the original
article [Hai14c]. See also the review articles [Hai14a, Hai14b] for shorter expositions
that complement the one given here.
It should be clear by now that a controlled rough path $(Y, Y') \in \mathscr{D}^{2\alpha}_W$ bears a
strong resemblance to a differentiable function, with the Gubinelli derivative $Y'$
describing the coefficient in front of a “first-order Taylor expansion” of the type

\[ Y_t = Y_s + Y'_s W_{s,t} + O\bigl(|t-s|^{2\alpha}\bigr) . \tag{13.1} \]

Compare this to the fact that a function $f : \mathbb{R} \to \mathbb{R}$ is of class $C^\gamma$ with $\gamma \in (k, k+1)$
if for every $s \in \mathbb{R}$ there exist coefficients $f^{(1)}_s, \dots, f^{(k)}_s$ such that

\[ f_t = f_s + \sum_{\ell=1}^{k} f^{(\ell)}_s (t-s)^\ell + O\bigl(|t-s|^\gamma\bigr) . \tag{13.2} \]

Of course, $f^{(\ell)}_s$ is nothing but the $\ell$th derivative of $f$ at the point $s$, divided by $\ell!$.
In this sense, one should really think of a controlled rough path $(Y, Y') \in \mathscr{D}^{2\alpha}_W$
as a $2\alpha$-Hölder continuous function, but with respect to a “model” given by $W$,
rather than the usual Taylor polynomials. This formal analogy between controlled
rough paths and Taylor expansions suggests that it might be fruitful to systematically
investigate what are the “right” objects that could possibly take the place of Taylor
polynomials, while still retaining many of their nice properties.

13.2 Definition of a regularity structure and first examples

The first step in such an endeavour is to set up an algebraic structure reflecting
the properties of Taylor expansions. First of all, such a structure should contain a
vector space $T$ that will contain the coefficients of our expansion. It is natural to
assume that $T$ has a graded structure: $T = \bigoplus_{\alpha\in A} T_\alpha$, for some set $A$ of possible
“homogeneities”. For example, in the case of the usual Taylor expansion (13.2), it is
natural to take for $A$ the set of natural numbers and to have $T_\ell$ contain the coefficients
corresponding to the derivatives of order $\ell$. In the case of controlled rough paths
however, it is natural to take $A = \{0, \alpha\}$, to have again $T_0$ contain the value of the
function $Y$ at any time $s$, and to have $T_\alpha$ contain the Gubinelli derivative $Y'_s$. This
reflects the fact that the “monomial” $t \mapsto W_{s,t}$ only vanishes at order $\alpha$ near $t = s$,
while the usual monomials $t \mapsto (t-s)^\ell$ vanish at integer order $\ell$.
This however isn’t the full algebraic structure describing Taylor-like expansions.
Indeed, one of the characteristics of Taylor expansions is that an expansion around
some point $x_0$ can be re-expanded around any other point $x_1$ by writing

\[ (x - x_0)^m = \sum_{k+\ell=m} \frac{m!}{k!\,\ell!}\, (x_1 - x_0)^k \cdot (x - x_1)^\ell . \tag{13.3} \]

(In the case when $x \in \mathbb{R}^d$, $k$, $\ell$ and $m$ denote multi-indices and $k! = k_1! \cdots k_d!$.)
Somewhat similarly, in the case of controlled rough paths, we have the (rather trivial)
identity

\[ W_{s_0,t} = W_{s_0,s_1} \cdot 1 + 1 \cdot W_{s_1,t} . \tag{13.4} \]

What is a natural abstraction of this fact? In terms of the coefficients of a “Taylor
expansion”, the operation of reexpanding around a different point is ultimately just a
linear operation $\Gamma : T \to T$, where the precise value of the map $\Gamma$ depends on
the starting point $x_0$, the endpoint $x_1$, and possibly also on the details of the particular
“model” that we are considering. In view of the above examples, it is natural to impose
furthermore that $\Gamma$ has the property that if $\tau \in T_\alpha$, then $\Gamma\tau - \tau \in \bigoplus_{\beta<\alpha} T_\beta$. In
other words, when reexpanding a homogeneous monomial around a different point,
the leading order coefficient remains the same, but lower order monomials may
appear.
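The re-expansion identity (13.3) and the resulting “triangular” behaviour can be checked in the scalar case with a few lines of code. This is a minimal sketch; the helper names `reexpand` and `evaluate` are ours and chosen for illustration only.

```python
from math import comb

def reexpand(m, x0, x1):
    """Coefficients c[l] with (x - x0)**m == sum_l c[l] * (x - x1)**l, as in (13.3)."""
    # m! / (k! l!) with k + l = m is the binomial coefficient comb(m, l)
    return [comb(m, l) * (x1 - x0) ** (m - l) for l in range(m + 1)]

def evaluate(coeffs, x, x1):
    """Evaluate sum_l coeffs[l] * (x - x1)**l."""
    return sum(c * (x - x1) ** l for l, c in enumerate(coeffs))

if __name__ == "__main__":
    coeffs = reexpand(4, 2, 5)
    # both sides of (13.3) at x = 7, with x0 = 2, x1 = 5
    print(evaluate(coeffs, 7, 5), (7 - 2) ** 4)
    # the coefficient of the top monomial is unchanged (equal to 1)
    print(coeffs[-1])
```

The coefficient of the monomial of highest degree is always 1, while all other coefficients belong to monomials of strictly lower degree; this is the scalar instance of the requirement imposed on Γ above.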

These heuristic considerations can be summarised in the following definition of
an abstract object we call a regularity structure:

Definition 13.1. A regularity structure $\mathscr{T} = (A, T, G)$ consists of the following
elements:
• An index set $A \subset \mathbb{R}$ such that $0 \in A$, $A$ is bounded from below, and $A$ is locally
finite.
• A model space $T$, which is a graded vector space $T = \bigoplus_{\alpha\in A} T_\alpha$, with each
$T_\alpha$ a Banach space; elements in $T_\alpha$ are said to have homogeneity (or degree) $\alpha$.
Furthermore $T_0 = \langle \mathbf{1} \rangle \cong \mathbb{R}$. Given $\tau \in T$, we will write $\|\tau\|_\alpha$ for the norm of its
component in $T_\alpha$.
• A structure group $G$ of (continuous¹) linear operators acting on $T$ such that, for
every $\Gamma \in G$, every $\alpha \in A$, and every $\tau_\alpha \in T_\alpha$, one has

\[ \Gamma\tau_\alpha - \tau_\alpha \in T_{<\alpha} \overset{\text{def}}{=} \bigoplus_{\beta<\alpha} T_\beta . \tag{13.5} \]

Furthermore, $\Gamma \mathbf{1} = \mathbf{1}$ for every $\Gamma \in G$.

Remark 13.2. The assumption $T_0 \cong \mathbb{R}$ is not really crucial, but it is convenient in
some cases and all of the natural examples we know of do satisfy it.

Remark 13.3. In principle, the index set $A$ can be infinite. By analogy with the
polynomials, it is then natural to consider $T$ as the set of all formal series of the form
$\sum_{\alpha\in A} \tau_\alpha$, where only finitely many of the $\tau_\alpha$’s are non-zero. This also dovetails
nicely with the particular form of elements in $G$. In practice however we will only
ever work with finite subsets of $A$, so that the precise topology on $T$ does not matter
as long as each of the $T_\alpha$ is finite-dimensional, which is the case in all of the examples
we will consider here.

The model space should be thought of as consisting of “abstract” Taylor expan-
sions (or “jets”), where each element of $T_\alpha$ would correspond to a “monomial of
degree (homogeneity) $\alpha$” (this will be made meaningful with the definition of a
model below). To avoid confusion between “abstract” elements of $T$ and “concrete”
associated functions (or Schwartz distributions), we will use color to denote elements
of $T$, e.g. $\tau$. Typically, $T$ will be generated (as a free vector space) by a set of “basis
symbols”, so that $T$ consists of all formal (finite) linear combinations obtained from
regarding these symbols as basis vectors. Given basis symbols/vectors $\tau_1, \tau_2, \dots$ we
indicate this by

\[ T = \langle \tau_1, \tau_2, \dots \rangle . \tag{13.6} \]

Important convention: basis symbols will always be listed in order of increasing
homogeneities. That is, $\tau_i \in T_{\alpha_i}$ with $\alpha_1 \le \alpha_2 \le \dots$ in (13.6). We now turn to
some first examples of regularity structures.

1
This only matters if dim T<α = +∞ for some α ∈ A.

13.2.1 The canonical polynomial structure

We start with two simple special cases followed by the general polynomial structure.
Fix $\gamma \in (0,1)$ and consider a real-valued function belonging to the Hölder space
of exponent $\gamma$, say $f \in C^\gamma$. In other words, $f : \mathbb{R} \to \mathbb{R}$, and $|f_x - f_y| \lesssim |y - x|^\gamma$
uniformly for $x, y$ on compacts. The trivial regularity structure

\[ A = \{0\} , \qquad T = T_0 = \langle \mathbf{1} \rangle \cong \mathbb{R} , \qquad G = \{I\} , \]

allows us to interpret the function $f$ as a $T$-valued map

\[ x \mapsto \boldsymbol{f}(x) := f_x \mathbf{1} . \]

Consider next a real-valued function $f : \mathbb{R} \to \mathbb{R}$ of class $C^{2+\gamma}$, with $\gamma \in (0,1)$.
By this we mean that continuous derivatives $Df$ and $D^2 f$ exist, with $D^2 f$ locally
$\gamma$-Hölder continuous. The minimal regularity structure allowing us to capture the fact
that $f \in C^{2+\gamma}$ is

\[ A = \{0, 1, 2\} , \qquad T = T_0 \oplus T_1 \oplus T_2 = \langle \mathbf{1}, X, X^2 \rangle \cong \mathbb{R}^3 , \]

with structure group $G = \{\Gamma_h \in L(T,T) : h \in (\mathbb{R},+)\}$ where $\Gamma_h$ is given, with
respect to the ordered basis $\mathbf{1}, X, X^2$, by the matrix

\[ \begin{pmatrix} 1 & h & h^2 \\ 0 & 1 & 2h \\ 0 & 0 & 1 \end{pmatrix} . \]

Note that $\Gamma_g \circ \Gamma_h = \Gamma_{g+h}$, so that $G$ inherits its group structure from $(\mathbb{R},+)$.
Moreover, the triangular form, with ones on the diagonal, expresses exactly the
requirement (13.5), i.e. that the action of $\Gamma_h$ on any element in $T$ amounts to adding
terms of lower homogeneity. This structure allows us to represent the function $f$ and its
first two derivatives as a truncated Taylor series, namely as the $T$-valued map

\[ x \mapsto \boldsymbol{f}(x) := f_x \mathbf{1} + Df_x X + \frac12 D^2 f_x X^2 . \]
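The group law and the interpretation of Γ_h as a re-centring of Taylor jets can be verified directly on coefficient vectors. In the sketch below, all function names are our own, and the choice h = x − y for moving a jet from the base point y to the base point x is an assumption consistent with the substitution X ↦ X + h1 (the text has not yet introduced models at this stage). For a quadratic polynomial the re-centred jet agrees exactly with the jet at the new point.

```python
def gamma(h):
    """Matrix of Gamma_h on <1, X, X^2>, acting on coefficient column vectors."""
    return [[1.0, h, h * h],
            [0.0, 1.0, 2.0 * h],
            [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(a, v):
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def jet(c, x):
    """Coefficients (p(x), Dp(x), D^2 p(x) / 2) of the jet of p(t) = c0 + c1 t + c2 t^2."""
    c0, c1, c2 = c
    return [c0 + c1 * x + c2 * x * x, c1 + 2.0 * c2 * x, c2]

if __name__ == "__main__":
    g, h = 0.5, -1.25
    print(matmul(gamma(g), gamma(h)))   # compare with the next line
    print(gamma(g + h))
    c, x, y = (1.0, -2.0, 3.0), 0.75, -0.5
    print(apply(gamma(x - y), jet(c, y)), jet(c, x))
```

For a general $f \in C^{2+\gamma}$ the two jets no longer agree exactly; the size of their difference is what Hölder-type bounds control.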
It is now an easy matter to generalize the above considerations to general Hölder
maps of several variables, say $f : \mathbb{R}^d \to \mathbb{R}$ in the Hölder space $C^{n+\gamma}$, which is
defined by the obvious generalisation of (13.2) to functions on $\mathbb{R}^d$. In this case, we
would take $A = \{0, 1, \dots, n\}$ and $T$ is the space of abstract polynomials of degree
at most $n$, in $d$ commuting indeterminates $X_1, \dots, X_d$. This motivates the following
definition.

Definition 13.4. The canonical polynomial regularity structure on $\mathbb{R}^d$ is given by

• $A = \mathbb{N} = \{0, 1, 2, \dots\}$ is the set of nonnegative integers.
• $T = \mathbb{R}[X_1, \dots, X_d]$ is the space of polynomials in $d$ commuting indeterminates
with real coefficients and $T_\alpha$ is spanned by the monomials of degree $\alpha \in \mathbb{N}$.
• $G \sim (\mathbb{R}^d, +)$ acts on $T$ via

\[ \Gamma_h P(X) = P(X + h\mathbf{1}) , \qquad h \in \mathbb{R}^d , \]

for any polynomial $P$.

Given an arbitrary multi-index $k = (k_1, \dots, k_d)$, we write $X^k$ as a shorthand
for $X_1^{k_1} \cdots X_d^{k_d}$, and we write $|k| = k_1 + \dots + k_d$. With this notation, for any
$\alpha \in A = \mathbb{N}$,

\[ T_\alpha = \langle X^k : |k| = \alpha \rangle . \tag{13.7} \]

13.2.2 The rough path structure

We start again from simple examples. What structure would be appropriate for Young
integration? Fix $\alpha \in (0,1)$ and consider the problem of integrating a (continuous)
path $Y$ against a scalar $W \in C^\alpha$. In the case of smooth $W$, the indefinite integral
$Z = \int Y\,dW$ exists in Riemann–Stieltjes sense (and then $\dot Z = Y \dot W$). Otherwise,
$\dot W$ only exists as a Schwartz distribution (more precisely, $\dot W$ is an element of the
negative Hölder space $C^{\alpha-1}$). The corresponding regularity structure is given by

\[ A = \{\alpha - 1, 0\} , \qquad T = T_{\alpha-1} \oplus T_0 = \langle \dot W, \mathbf{1} \rangle \cong \mathbb{R}^2 , \qquad G = \{\mathrm{Id}|_T\} . \tag{13.8} \]

The potentially ill-defined product $\dot Z = Y \dot W$ can now be replaced by the perfectly
well-defined (abstract) $T$-valued map

\[ s \mapsto \dot{\boldsymbol{Z}}(s) := Y_s \dot W . \]

We shall see later how $\dot{\boldsymbol{Z}}$ gives rise to $\dot Z$, the distributional derivative of the
indefinite Young integral $\int Y\,dW$, provided of course that $Y$ has the correct regularity
such as $Y \in C^\beta$ with $\alpha + \beta > 1$.
Let us next consider the “task” of representing a controlled rough path in a suitable
regularity structure. More precisely, consider $\alpha \in (1/3, 1/2]$, a path $W \in C^\alpha$ with
values in $\mathbb{R}$, say, and $(Y, Y') \in \mathscr{D}^{2\alpha}_W$ so that

\[ Y_t \approx Y_s + Y'_s W_{s,t} . \tag{13.9} \]

The right-hand side above is (some sort of) Taylor expansion, based on $W \in C^\alpha$,
which describes $Y$ well near the (time) point $s$. We want to formalize this by attaching
to each time $s$ the “jet”

\[ \boldsymbol{Y}(s) := Y_s \mathbf{1} + Y'_s W . \]

Performing the substitution $\mathbf{1} \mapsto 1$, $W \mapsto (t \mapsto W_{s,t})$ gets us back to the right-hand
side of (13.9). This suggests defining the following regularity structure

\[ A = \{0, \alpha\} , \qquad T = T_0 \oplus T_\alpha = \langle \mathbf{1}, W \rangle \cong \mathbb{R}^2 , \]

with structure group $G = \{\Gamma_h \in L(T,T) : h \in (\mathbb{R},+)\}$ where $\Gamma_h$ is given, with
respect to the ordered basis $\mathbf{1}, W$, by the matrix

\[ \begin{pmatrix} 1 & h \\ 0 & 1 \end{pmatrix} . \]
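The regularity structure relevant for rough integration is essentially a combination of
the two previous ones. Let $\mathbf{W} = (W, \mathbb{W}) \in \mathscr{C}^\alpha$ and $(Y, Y') \in \mathscr{D}^{2\alpha}_W$ and consider the
rough integral $Z := \int Y\,d\mathbf{W}$. Since, for $s \approx t$, we have

\[ Z_{s,t} = \int_s^t Y\,d\mathbf{W} \approx Y_s W_{s,t} + Y'_s \mathbb{W}_{s,t} , \]

this suggests (rather informally at this stage), that in the vicinity of any fixed time $s$,
the distributional derivative of $Z$ should have an expansion of the type

\[ \dot Z \approx Y_s \dot W + Y'_s \dot{\mathbb{W}}_s , \tag{13.10} \]

where $\dot W := \partial_t W_t$ and $\dot{\mathbb{W}}_s := \partial_t \mathbb{W}_{s,t}$ are distributional derivatives. This suggests
to attach the following “jet” at each point $s$,

\[ \dot{\boldsymbol{Z}}(s) := Y_s \dot W + Y'_s \dot{\mathbb{W}} , \tag{13.11} \]

which can be done with the aid of the following regularity structure:

\[ A = \{\alpha - 1, 2\alpha - 1, 0, \alpha\} , \qquad
 T = T_{\alpha-1} \oplus T_{2\alpha-1} \oplus T_0 \oplus T_\alpha = \langle \dot W, \dot{\mathbb{W}}, \mathbf{1}, W \rangle \cong \mathbb{R}^4 , \]

with structure group $G = \{\Gamma_h \in L(T,T) : h \in (\mathbb{R},+)\}$ where $\Gamma_h$ is given, with
respect to the ordered basis $\dot W, \dot{\mathbb{W}}, \mathbf{1}, W$, by the matrix

\[ \begin{pmatrix} 1 & h & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & h \\ 0 & 0 & 0 & 1 \end{pmatrix} . \]

Equivalently,

\[ \Gamma_h \mathbf{1} = \mathbf{1} , \qquad \Gamma_h \dot W = \dot W , \qquad \Gamma_h W = W + h\mathbf{1} , \qquad \Gamma_h \dot{\mathbb{W}} = \dot{\mathbb{W}} + h \dot W . \]

It will be seen later that in this framework the function $\dot{\boldsymbol{Z}}$ defined in (13.11) does
indeed give rise to $\dot Z$, the distributional derivative of the indefinite rough integral
$\int Y\,d\mathbf{W}$. The extension to multi-component rough paths, $\mathbf{W} \in \mathscr{C}([0,T], \mathbb{R}^e)$ with
$e > 1$, is essentially trivial. We just need more basis vectors $\dot W^i$, $\dot{\mathbb{W}}^{j,k}$, $W^l$ (with
$1 \le i, j, k, l \le e$):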
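Definition 13.5. Let $\alpha \in (1/3, 1/2]$. The regularity structure for $\alpha$-Hölder rough
paths (over $\mathbb{R}^e$) is given by

• The set of possible homogeneities is given by $A = \{\alpha - 1, 2\alpha - 1, 0, \alpha\}$.
• The model space $T$ is given by $T = T_{\alpha-1} \oplus T_{2\alpha-1} \oplus T_0 \oplus T_\alpha \cong \mathbb{R}^{e + e^2 + 1 + e}$
with

\[ T_0 = \langle \mathbf{1} \rangle , \qquad T_\alpha = \langle W^1, \dots, W^e \rangle , \qquad
 T_{\alpha-1} = \langle \dot W^1, \dots, \dot W^e \rangle , \qquad T_{2\alpha-1} = \langle \dot{\mathbb{W}}^{ij} : 1 \le i, j \le e \rangle . \]

• The group $G \sim (\mathbb{R}^e, +)$ acts on $T$ via

\begin{equation}
\begin{aligned}
\Gamma_h \mathbf{1} &= \mathbf{1} , & \Gamma_h W^i &= W^i + h^i \mathbf{1} , \\
\Gamma_h \dot W^i &= \dot W^i , & \Gamma_h \dot{\mathbb{W}}^{ij} &= \dot{\mathbb{W}}^{ij} + h^i \dot W^j .
\end{aligned}
\tag{13.12}
\end{equation}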
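The action (13.12) is simple enough to be implemented on formal linear combinations of basis symbols. The following sketch uses an encoding of our own (not from the text): an element of T is a dictionary mapping symbols such as `("1",)`, `("W", i)`, `("dW", i)` and `("dWW", i, j)` to real coefficients, with 0-based component indices. It can be used to confirm the group law Γ_g ∘ Γ_h = Γ_{g+h} on examples.

```python
def gamma(h, tau):
    """Apply Gamma_h from (13.12) to tau, a dict {basis symbol: coefficient}.

    Symbols (an illustrative encoding): ("1",), ("W", i), ("dW", i), ("dWW", i, j).
    """
    out = {}

    def add(sym, c):
        out[sym] = out.get(sym, 0.0) + c

    for sym, c in tau.items():
        add(sym, c)                      # the leading term is always kept
        if sym[0] == "W":                # W^i -> W^i + h^i 1
            add(("1",), h[sym[1]] * c)
        elif sym[0] == "dWW":            # dWW^{ij} -> dWW^{ij} + h^i dW^j
            add(("dW", sym[2]), h[sym[1]] * c)
    return out

if __name__ == "__main__":
    tau = {("1",): 1.0, ("W", 0): 2.0, ("dW", 1): 3.0, ("dWW", 0, 1): 4.0}
    g, h = [1.0, 2.0], [0.5, -1.0]
    gh = [a + b for a, b in zip(g, h)]
    print(gamma(g, gamma(h, tau)))
    print(gamma(gh, tau))
```

In this encoding the symbols of homogeneity α − 1 and 0 are left untouched, while those of homogeneity 2α − 1 and α pick up terms of strictly lower homogeneity, in line with (13.5).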
In a Brownian (rough path) context, one has Hölder regularity with exponent
$\alpha = 1/2 - \kappa$, for arbitrarily small $\kappa > 0$. The above index set $A$, relevant for a
“regularity structure view” on stochastic integration, then becomes

\[ A = \Bigl\{ -\frac12 - \kappa,\; -2\kappa,\; 0,\; \frac12 - \kappa \Bigr\} , \]

which, in abusive but convenient notation, we write as

\[ A = \Bigl\{ -\frac12^{-},\; 0^{-},\; 0,\; \frac12^{-} \Bigr\} . \]

Index sets of this form (“half-integers$^-$”) will also be typical in later SPDE situations
driven by spatial or space-time white noise.

13.3 Definition of a model and first examples

At this stage, a regularity structure is a completely abstract object. It only becomes

useful when endowed with a model, which is a concrete way of associating to
any τ ∈ T and x0 ∈ Rd , the actual “Taylor polynomial based at x0 ” represented
by τ . Furthermore, we want elements τ ∈ Tα to represent functions (or possibly
distributions!) that “vanish at order α” around the given point x0 (thereby justifying
our calling α homogeneity).
Since we would like to allow A to contain negative values and therefore allow
elements in T to represent actual distributions, we need a suitable notion of “vanishing
at order α”. We achieve this by considering the size of our distributions, when tested
against test functions that are localised around the given point x0 . Given a test
function $\varphi$ on $\mathbb{R}^d$, we write $\varphi^\lambda_x$ as a shorthand for

\[ \varphi^\lambda_x(y) = \lambda^{-d} \varphi\bigl( \lambda^{-1}(y - x) \bigr) . \]

Given an integer $r > 0$, we also denote by $\mathcal{B}_r$ the set of all functions $\varphi : \mathbb{R}^d \to \mathbb{R}$
such that $\varphi \in C^r_b$ with $\|\varphi\|_{C^r_b} \le 1$ that are furthermore supported in the unit ball
around the origin. We also write $\mathcal{D}'(\mathbb{R}^d)$ for the space of Schwartz distributions on
$\mathbb{R}^d$. With these notations, our definition of a model for a given regularity structure
$\mathscr{T}$ is as follows.

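Definition 13.6. Given a regularity structure $\mathscr{T}$ and an integer $d \ge 1$, a model
$\mathbf{M} = (\Pi, \Gamma)$ for $\mathscr{T}$ on $\mathbb{R}^d$ consists of maps

\[ \Pi : \mathbb{R}^d \to \mathcal{L}\bigl( T, \mathcal{D}'(\mathbb{R}^d) \bigr) , \quad x \mapsto \Pi_x , \qquad\qquad
 \Gamma : \mathbb{R}^d \times \mathbb{R}^d \to G , \quad (x,y) \mapsto \Gamma_{xy} , \]

such that $\Gamma_{xy}\Gamma_{yz} = \Gamma_{xz}$ and $\Pi_x \Gamma_{xy} = \Pi_y$. We then say that $\Pi_x$ realizes an element
of $T$ as a Schwartz distribution.
Furthermore, write $r$ for the smallest integer such that $r > |\min A| \ge 0$. We then
impose that for every compact set $K \subset \mathbb{R}^d$ and every $\gamma > 0$, there exists a constant
$C = C(K, \gamma)$ such that the bounds

\[ \bigl| \bigl( \Pi_x \tau \bigr)(\varphi^\lambda_x) \bigr| \le C \lambda^\alpha \|\tau\|_\alpha , \qquad
 \|\Gamma_{xy} \tau\|_\beta \le C |x-y|^{\alpha-\beta} \|\tau\|_\alpha , \tag{13.13} \]

hold uniformly over $\varphi \in \mathcal{B}_r$, $(x,y) \in K$, $\lambda \in (0,1]$, $\tau \in T_\alpha$ with $\alpha \le \gamma$ and $\beta < \alpha$.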

One very important remark is that the space M of all models for a given regularity
structure is not a linear space. However, it can be viewed as a closed subset (deter-
mined by the nonlinear constraints Γxy ∈ G, Γxy Γyz = Γxz , and Πy = Πx Γxy ) of
the linear space with seminorms (indexed by the compact set K) given by the smallest
constant C in (13.13). Also, there is a natural distance between models (Π, Γ ) and
(Π̄, Γ̄ ) given by the smallest constant C in (13.13), when replacing Πx by Πx − Π̄x
and Γxy by Γxy − Γ̄xy .

Remark 13.7. In principle, test functions appearing in (13.13) should be smooth. It

turns out that if these bounds hold for smooth elements of Br , then Πx τ can be
extended canonically to allow any Cbr test function with compact support.

Remark 13.8. The identity Πx Γxy = Πy reflects the fact that Γxy is the linear map
that takes an expansion around y and turns it into an expansion around x. The first
bound in (13.13) states what we mean precisely when we say that τ ∈ Tα represents
a term that vanishes at order α. (See Exercise 13.31; note that α can be negative, so
that this may actually not vanish at all!) The second bound in (13.13) is very natural
in view of both (13.3) and (13.4). It states that when expanding a monomial of order
α around a new point at distance h from the old one, the coefficient appearing in
front of lower-order monomials of order β is of order at most hα−β .

Remark 13.9. In many cases of interest, it is natural to scale the different directions of
Rd in a different way. This is the case for example when using the theory of regularity
structures to build solution theories for parabolic stochastic PDEs, in which case
the time direction “counts as” two space directions. This “parabolic scaling” can be
formalized by the integer vector (2, 1, . . . , 1). More generally, one can introduce a
scaling $s$ of $\mathbf{R}^d$, which is just a collection of $d$ mutually prime strictly positive integers, and define $\varphi^\lambda_x$ in such a way that the $i$th direction is scaled by $\lambda^{s_i}$. The polynomial
structure introduced earlier, in particular (13.7), should be changed accordingly by
postulating that the homogeneity of $X^k$ is given by $|k|_s = \sum_{i=1}^d s_i k_i$. In this case,
the Euclidean distance between two points should be replaced everywhere by the
corresponding scaled distance $|x|_s = \sum_i |x_i|^{1/s_i}$. See [Hai14c] for more details.
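The scaled homogeneity and scaled distance of Remark 13.9 are easy to experiment with. The following is only a minimal sketch; the function names `scaled_degree` and `scaled_distance` are illustrative choices, not notation from the text.

```python
def scaled_degree(k, s):
    """Scaled homogeneity |k|_s = sum_i s_i * k_i of the monomial X^k."""
    return sum(si * ki for si, ki in zip(s, k))


def scaled_distance(x, s):
    """Scaled distance |x|_s = sum_i |x_i|^(1/s_i)."""
    return sum(abs(xi) ** (1.0 / si) for xi, si in zip(x, s))


# Parabolic scaling in one space dimension: time counts as two space directions.
parabolic = (2, 1)
print(scaled_degree((1, 2), parabolic))       # homogeneity of X_0 * X_1^2
print(scaled_distance((4.0, 3.0), parabolic))
```

Under the parabolic scaling, replacing $(t, x)$ by $(\lambda^2 t, \lambda x)$ multiplies the scaled distance by $\lambda$, which is the sense in which time "counts as" two space directions.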

With these definitions at hand, it is then natural to define an equivalent in this

context of the space of γ-Hölder continuous functions in the following way.

Definition 13.10. Given a regularity structure T equipped with a model M = (Π, Γ)
over $\mathbf{R}^d$, the space $\mathcal{D}^\gamma_M = \mathcal{D}^\gamma_M(T)$ is given by the set of functions $f \colon \mathbf{R}^d \to T_{<\gamma}$
such that, for every compact set K and every α < γ, there exists a constant C with

\[ \|f(x) - \Gamma_{xy} f(y)\|_\alpha \le C\,|x - y|^{\gamma-\alpha} \tag{13.14} \]

uniformly over x, y ∈ K. Such functions f are called modelled distributions. For
fixed K, a seminorm $\|f\|_{M,\gamma;K}$ is defined as the smallest constant C in the bound (13.14). The space $\mathcal{D}^\gamma_M$ endowed
with this family of seminorms is then a Fréchet space.

It is furthermore convenient to be able to compare two modelled distributions

defined over two different models. In this case, a natural way of comparing them is
to take as a “metric” the smallest constant C in the bound

\[ \|f(x) - \Gamma_{xy} f(y) - \bar f(x) + \bar\Gamma_{xy} \bar f(y)\|_\alpha \le C\,|x - y|^{\gamma-\alpha} . \]

Remark 13.11. (Compare with Remark 4.8 in the rough path context.) It is important
to note that while the space of models $\mathscr{M}$ is not a linear space, the space $\mathcal{D}^\gamma_M$ is a
linear space (with Banach, or at least Fréchet structure), given a model $M \in \mathscr{M}$. The
twist of course is that the space in question depends in a crucial way on the choice of
M. The total space then is

\[ \mathscr{M} \ltimes \mathcal{D}^\gamma \stackrel{\text{def}}{=} \bigcup_{M \in \mathscr{M}} \{M\} \times \mathcal{D}^\gamma_M , \]

with base space $\mathscr{M}$ and “fibres” $\mathcal{D}^\gamma_M$.

The most fundamental result in the theory of regularity structures then states that
given f ∈ D γ with γ > 0, there exists a unique Schwartz distribution Rf on Rd
such that, for every x ∈ Rd , Rf “looks like Πx f (x) near x”. More precisely, one
has

Theorem 13.12 (Reconstruction). Let M = (Π, Γ) be a model for a regularity
structure T on $\mathbf{R}^d$. Assume $f \in \mathcal{D}^\gamma_M$ with γ > 0. Then, there exists a unique linear
map $\mathcal{R} = \mathcal{R}_M \colon \mathcal{D}^\gamma_M \to \mathcal{D}'(\mathbf{R}^d)$ such that

\[ \big|\big(\mathcal{R}f - \Pi_x f(x)\big)(\varphi^\lambda_x)\big| \lesssim \lambda^\gamma , \tag{13.15} \]

uniformly over $\varphi \in B_r$ and λ as before, and locally uniformly in x. Without the
positivity assumption on γ, everything remains valid but uniqueness of $\mathcal{R}$.

Remark 13.13. Actually, $\mathcal{R}$ should really be viewed as a (nonlinear!) map from the
total space $\mathscr{M} \ltimes \mathcal{D}^\gamma$ into $\mathcal{D}'(\mathbf{R}^d)$. It is then also continuous with respect to the
natural topology on this space, which is essential when using it to prove convergence
results. We will however not prove this stronger continuity statement here.

In the particular case where $\Pi_x \tau$ happens to be a continuous function for every
τ ∈ T (and every $x \in \mathbf{R}^d$), we will see that $\mathcal{R}f$ is also a continuous function and $\mathcal{R}$
is given by the somewhat trivial explicit formula

\[ (\mathcal{R}f)(x) = \big(\Pi_x f(x)\big)(x) . \]

We postpone the proof of the reconstruction theorem, as well as that of the above remark,
and turn instead to our previous list of regularity structures, now adding the relevant
models and indicating the interest of the reconstruction map.

13.3.1 The canonical polynomial model

Recall the canonical polynomial regularity structure in d variables. In this context,
the canonical polynomial model P is given by

\[ \Pi_x X^k = \big(y \mapsto (y - x)^k\big) , \qquad \Gamma_{xy} = \Gamma_h \big|_{h = x - y} . \]

We leave it as an exercise to the reader to verify that this does indeed satisfy the
bounds and relations of Definition 13.6.
In the sense of the following proposition, modelled distributions in the context of
the polynomial model are nothing but classical Hölder functions.

Proposition 13.14. Let β = n + γ with n ∈ N and γ ∈ (0, 1). Then f is an element
in the Hölder space $C^\beta$ if and only if there exists a function $\hat f \in \mathcal{D}^\beta_P$ with $\langle \hat f, \mathbf{1} \rangle = f$.

The proof is not difficult. Given $f \in C^{n+\gamma}$, write $\boldsymbol{f}(x)$ for the Taylor expansion of f at x up to
order n with all monomials $(y - x)^k$ replaced by $X^k$. It is immediate to check that
$\hat f = \boldsymbol{f}$ will do. The converse is obvious when n = 0; the general case can be seen
by induction. The proposition remains valid for integer values of β with the usual
caveat that in this context $C^\beta$ means β − 1 times continuously differentiable with the
highest order derivatives locally Lipschitz continuous.
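The "immediate" direction of Proposition 13.14 can be checked numerically in one variable. The sketch below is only an illustration under stated assumptions: it takes f = sin, truncates at order n = 3, and represents an element of the polynomial structure by its list of coefficients of $X^0, \ldots, X^n$; the names `sin_jet`, `gamma` and `jet_increment` are hypothetical. It uses $\Gamma_h X^k = (X + h)^k$ together with $\Gamma_{xy} = \Gamma_{x-y}$.

```python
import math


def sin_jet(x, n):
    """Coefficients of X^0..X^n in the abstract Taylor expansion of sin at x."""
    derivs = [math.sin(x), math.cos(x), -math.sin(x), -math.cos(x)]
    return [derivs[k % 4] / math.factorial(k) for k in range(n + 1)]


def gamma(h, coeffs):
    """Apply Gamma_h, i.e. substitute X -> X + h in sum_k coeffs[k] X^k."""
    out = [0.0] * len(coeffs)
    for j, c in enumerate(coeffs):
        for k in range(j + 1):
            out[k] += c * math.comb(j, k) * h ** (j - k)
    return out


def jet_increment(x, y, n):
    """Absolute values of the components of f(x) - Gamma_{xy} f(y)."""
    fx = sin_jet(x, n)
    shifted = gamma(x - y, sin_jet(y, n))
    return [abs(a - b) for a, b in zip(fx, shifted)]


for h in (0.1, 0.01):
    print(h, jet_increment(0.3 + h, 0.3, 3))
```

For a smooth function the component of homogeneity k comes out of order $|x - y|^{n+1-k}$, which is the bound (13.14) with γ = n + 1.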
Validity of such a proposition for negative exponents requires a suitable notion
of “negative” Hölder spaces. In fact, the considerations above (see also Exercise 13.31)
suggest that a very natural space of distributions is obtained in the following way.
Given α > 0, we denote by $C^{-\alpha}$ the space of all Schwartz distributions η such that η
belongs to the dual of $C^r_c$ (elements in $C^r_b$ with compact support) with r the smallest
integer such that r > α, and such that

\[ |\eta(\varphi^\lambda_x)| \lesssim \lambda^{-\alpha} , \]

uniformly over all $\varphi \in B_r$ and λ ∈ (0, 1], and locally uniformly in x. Given any
compact set K, the best possible constant such that the above bound holds uniformly
over x ∈ K yields a seminorm. The collection of these seminorms endows $C^{-\alpha}$ with
a Fréchet space structure.
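As a quick sanity check of this definition, one can test the Dirac mass in dimension d = 1 against rescaled bump functions. This is a sketch under assumptions: the bump below is not normalised to lie in $B_r$, so only the power of λ is meaningful, and the helper names are illustrative.

```python
import math


def bump(x):
    """Smooth test function supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def rescaled(lam, x0, y):
    """phi^lam_{x0}(y) = lam^{-d} * phi((y - x0) / lam) with d = 1."""
    return bump((y - x0) / lam) / lam


def dirac(test_function):
    """The Dirac distribution at the origin."""
    return test_function(0.0)


def pairing(lam):
    """delta(phi^lam_0), which equals bump(0) / lam."""
    return dirac(lambda y: rescaled(lam, 0.0, y))


for lam in (1.0, 0.1, 0.01):
    print(lam, lam * pairing(lam))
```

The product `lam * pairing(lam)` stays constant, so the Dirac mass satisfies the bound with α = 1 in dimension one and with no smaller exponent.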

Remark 13.15. In terms of the scale of classical Besov spaces, the space $C^{-\alpha}$ is a
local version of $B^{-\alpha}_{\infty,\infty}$. It is in some sense the largest space of distributions that is
invariant under the scaling $\varphi(\cdot) \mapsto \lambda^{-\alpha} \varphi(\lambda^{-1} \cdot)$, see for example [BP08].

Let us now give a very simple application of the reconstruction theorem. It is
a classical result in the “folklore” of harmonic analysis (see for example [BCD11,
Thm 2.52] for a very similar statement) that the product extends naturally to $C^{-\alpha} \times C^\beta$
into $\mathcal{D}'(\mathbf{R}^d)$ if and only if β > α. Let us illustrate how to use the reconstruction
theorem in order to obtain a straightforward proof of the “if” part of this result:

Theorem 13.16. For β > α > 0, there is a continuous bilinear map

\[ B \colon C^\beta \times C^{-\alpha} \to \mathcal{D}'(\mathbf{R}^d) \]

such that B(f, g) = f g for any two continuous functions f and g.

Proof. Assume from now on that g = ξ ∈ C −α for some α > 0 and that f ∈ C β for
some β > α. We then build a regularity structure T in the following way. For the
set A, we take A = N ∪ (N − α) and for T , we set T = V ⊕ W , where each one of
the spaces V and W is a copy of the canonical polynomial model (in d commuting
variables). We also choose Γ as in the canonical polynomial model above, acting
simultaneously on each of the two instances.
As before, we denote by $X^k$ the canonical basis vectors in V. We also use the
suggestive notation “$\Xi X^k$” for the corresponding basis vector in W, but we postulate
that $\Xi X^k \in T_{|k|-\alpha}$ rather than $\Xi X^k \in T_{|k|}$. Given any distribution $\xi \in C^{-\alpha}$, we
then define a model $(\Pi^\xi, \Gamma)$, where Γ is as in the canonical model, while $\Pi^\xi$ acts as

\[ \big(\Pi^\xi_x X^k\big)(y) = (y - x)^k , \qquad \big(\Pi^\xi_x \Xi X^k\big)(y) = (y - x)^k \xi(y) , \]

with the obvious abuse of notation in the second expression. It is then straightforward
to verify that $\Pi_y = \Pi_x \circ \Gamma_{xy}$ and that the relevant analytical bounds are satisfied, so
that this is indeed a model.
Denote now by $\mathcal{R}^\xi$ the reconstruction map associated to the model $(\Pi^\xi, \Gamma)$ and,
for $f \in C^\beta$, denote by $\boldsymbol{f}$ the element in $\mathcal{D}^\beta$ given by the local Taylor expansion of
f of order β at each point. Note that even though the space $\mathcal{D}^\beta$ does in principle
depend on the choice of model, in our situation $\boldsymbol{f} \in \mathcal{D}^\beta$ for any choice of ξ. It
follows immediately from the definitions that the map $x \mapsto \Xi \boldsymbol{f}(x)$ belongs to $\mathcal{D}^{\beta-\alpha}$
so that, provided that β > α, one can apply the reconstruction operator to it. This
suggests that the multiplication operator we are looking for can be defined as

\[ B(f, \xi) = \mathcal{R}^\xi \big(\Xi \boldsymbol{f}\big) . \]

By Theorem 13.12, this is a jointly continuous map from $C^\beta \times C^{-\alpha}$ into $\mathcal{D}'(\mathbf{R}^d)$,
provided that β > α. If ξ happens to be a smooth function, then it follows immediately
from the remark after Theorem 13.12 that $B(f, \xi)(x) = f(x)\xi(x)$, so that B is
indeed the requested continuous extension of the usual product. □
Remark 13.17. In the context of this theorem, one can actually show that $B(f, g) \in C^{-\alpha}$.
More generally, denoting by −α the smallest homogeneity arising in a given
regularity structure T, i.e. α = − min A, it is possible to show that the reconstruction
operator $\mathcal{R}$ takes values in $C^{-\alpha}$.
The reader may notice that one can also work with a finite-dimensional regularity
structure, based on the index set $\tilde N \cup (\tilde N - \alpha)$, with $\tilde N = \{0, 1, \ldots, n\}$ and β = n + γ.
In particular, if n = 0, the regularity structure used here is exactly the one already
encountered in (13.8).

13.3.2 The rough path model

Let us see now how some of the results of Section 4 can be reinterpreted in the light
of this theory. Fix α ∈ (1/3, 1/2] and let T be the rough path regularity structure
put forward in Definition 13.5. Recall that this means A = {α − 1, 2α − 1, 0, α}.
We have for $T_0$ a copy of $\mathbf{R}$ with unit vector $\mathbf{1}$, for $T_\alpha$ and $T_{\alpha-1}$ a copy of $\mathbf{R}^e$ with
respective unit vectors $W^j$ and $\dot W^j$, and for $T_{2\alpha-1}$ a copy of $\mathbf{R}^{e \times e}$ with unit vectors
$\dot{\mathbb{W}}^{ij}$. The structure group G is isomorphic to $\mathbf{R}^e$ and, for $h \in \mathbf{R}^e$, acts on T via

\[ \Gamma_h \mathbf{1} = \mathbf{1} , \quad \Gamma_h \dot W^i = \dot W^i , \quad \Gamma_h W^i = W^i + h^i \mathbf{1} , \quad \Gamma_h \dot{\mathbb{W}}^{ij} = \dot{\mathbb{W}}^{ij} + h^i \dot W^j . \tag{13.16} \]

Let now $\mathbf{W} = (W, \mathbb{W})$ be an α-Hölder continuous rough path over $\mathbf{R}^e$. It turns out
that this defines a model for T in the following way:
Lemma 13.18. Given an α-Hölder continuous rough path $\mathbf{W}$, one can define a model
$M = M_{\mathbf{W}}$ for T on $\mathbf{R}$ by setting $\Gamma_{t,s} = \Gamma_{W_{s,t}}$ and

\[ \big(\Pi_s \mathbf{1}\big)(t) = 1 , \qquad \big(\Pi_s W^j\big)(t) = W^j_{s,t} , \]
\[ \big(\Pi_s \dot W^j\big)(\psi) = \int \psi(t)\, dW^j_t , \qquad \big(\Pi_s \dot{\mathbb{W}}^{ij}\big)(\psi) = \int \psi(t)\, d\mathbb{W}^{ij}_{s,t} . \]

Here, both integrals are perfectly well-defined Riemann integrals, with the differential
in the second case taken with respect to the variable t. Given a controlled rough path
$(Y, Y') \in \mathcal{D}^{2\alpha}_W$, this then defines an element $\boldsymbol{Y} \in \mathcal{D}^{2\alpha}_M$ by

\[ \boldsymbol{Y}(s) = Y(s)\, \mathbf{1} + Y'_i(s)\, W^i , \]

with summation over i implied.

Proof. We first check that the algebraic properties of Definition 13.6 are satisfied.
It is clear that $\Gamma_{s,u} \Gamma_{u,t} = \Gamma_{s,t}$ and that $\Pi_s \Gamma_{s,u} \tau = \Pi_u \tau$ for $\tau \in \{\mathbf{1}, W^j, \dot W^j\}$.
Regarding $\dot{\mathbb{W}}^{ij}$, we differentiate Chen’s relations (2.1), which yields the identity

\[ d\mathbb{W}^{i,j}_{s,t} = d\mathbb{W}^{i,j}_{u,t} + W^i_{s,u}\, dW^j_t . \]

The last missing algebraic relation then follows at once. The required analytic bounds
follow immediately (exercise!) from the definition of the rough path space $\mathscr{C}^\alpha$.
Regarding the function $\boldsymbol{Y}$ defined in the statement, we have

\[ \|\boldsymbol{Y}(s) - \Gamma_{s,u} \boldsymbol{Y}(u)\|_0 = |Y(s) - Y(u) + Y'_i(u) W^i_{s,u}| , \]
\[ \|\boldsymbol{Y}(s) - \Gamma_{s,u} \boldsymbol{Y}(u)\|_\alpha = |Y'(s) - Y'(u)| , \]

so that the condition (13.14) with γ = 2α does indeed coincide with the definition of
a controlled rough path. □
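Chen’s relation, which underlies the algebraic identity used in this proof, can be checked by hand on a smooth example. The sketch below assumes the illustrative path $W(t) = (t, t^2)$ in $\mathbf{R}^2$ and uses the closed-form iterated integrals $\mathbb{W}^{ij}_{s,t} = \int_s^t W^i_{s,r}\, dW^j_r$; all function names are hypothetical.

```python
def path(t):
    """Illustrative smooth path t -> (t, t^2) in R^2 (an assumption)."""
    return (t, t * t)


def incr(s, t):
    """Increment W_{s,t} = W_t - W_s."""
    ws, wt = path(s), path(t)
    return (wt[0] - ws[0], wt[1] - ws[1])


def iterated(s, t):
    """Closed-form second-level integrals WW^{ij}_{s,t} for the path above."""
    d1, d2 = incr(s, t)
    w11 = 0.5 * d1 * d1
    w22 = 0.5 * d2 * d2
    w12 = 2.0 * ((t ** 3 - s ** 3) / 3.0 - s * (t ** 2 - s ** 2) / 2.0)
    w21 = (t ** 3 - s ** 3) / 3.0 - s * s * (t - s)
    return ((w11, w12), (w21, w22))


def chen_defect(s, u, t):
    """Largest entry of WW_{s,t} - WW_{s,u} - WW_{u,t} - W_{s,u} (x) W_{u,t}."""
    a, b, c = iterated(s, t), iterated(s, u), iterated(u, t)
    x, y = incr(s, u), incr(u, t)
    return max(
        abs(a[i][j] - b[i][j] - c[i][j] - x[i] * y[j])
        for i in range(2)
        for j in range(2)
    )


print(chen_defect(0.1, 0.4, 0.9))
```

The defect vanishes up to rounding. Differentiating the relation in t gives the identity for $d\mathbb{W}^{i,j}_{s,t}$ displayed in the proof.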

Theorems 4.4 and 4.10 can then be recovered as a particular case of the recon-
struction theorem in the following way.

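Proposition 13.19. In the same context as above, let α ∈ (1/3, 1/2], and consider
the modelled distribution $\boldsymbol{Y} \in \mathcal{D}^{2\alpha}_{M_{\mathbf{W}}}$ built as above from a controlled rough path
$(Y, Y') \in \mathcal{D}^{2\alpha}_W$. Then, the map $\boldsymbol{Y} \dot W^j$ given by

\[ \big(\boldsymbol{Y} \dot W^j\big)(s) := Y(s)\, \dot W^j + Y'_i(s)\, \dot{\mathbb{W}}^{ij} \]

belongs to $\mathcal{D}^{3\alpha-1}$. Furthermore, there exists an essentially unique function Z such
that

\[ \big(\mathcal{R}\, \boldsymbol{Y} \dot W^j\big)(\psi) = \int \psi(t)\, dZ(t) , \]

and such that $Z_{s,t} = Y(s)\, W^j_{s,t} + Y'_i(s)\, \mathbb{W}^{i,j}_{s,t} + O(|t - s|^{3\alpha})$.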

Remark 13.20. The function Z is unique up to addition of constants.

Proof. The fact that $\boldsymbol{Y} \dot W^j \in \mathcal{D}^{3\alpha-1}$ is an immediate consequence of the definitions.
Since α > 1/3 by assumption, we can apply the reconstruction theorem to it, from
which it follows that there exists a unique distribution η such that, if ψ is a smooth
compactly supported test function, one has

\[ \eta(\psi^\lambda_s) = \int \psi^\lambda_s(t)\, Y(s)\, dW^j_t + \int \psi^\lambda_s(t)\, Y'_i(s)\, d\mathbb{W}^{i,j}_{s,t} + O(\lambda^{3\alpha-1}) . \]

By a simple approximation argument, it turns out that one can take for ψ the indicator
function of the interval [0, 1], so that

\[ \eta(\mathbf{1}_{[s,t]}) = Y(s)\, W^j_{s,t} + Y'_i(s)\, \mathbb{W}^{i,j}_{s,t} + O(|t - s|^{3\alpha}) . \]

Here, the reason why one obtains an exponent 3α rather than 3α − 1 is that it is
really $|t - s|^{-1} \mathbf{1}_{[s,t]}$ that scales like an approximate δ-distribution as t → s. □
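The local expansion of Z can be seen at work in the simplest possible setting. The following sketch makes several assumptions that are not part of the proposition: dimension e = 1, the smooth path $W(t) = \sin(3t)$, the lift $\mathbb{W}_{s,t} = \tfrac12 W_{s,t}^2$, and the controlled path $Y = W^2$ with $Y' = 2W$. In this case $\int Y\, dW = W^3/3$ and the local error $Z_{s,t} - Y(s) W_{s,t} - Y'(s) \mathbb{W}_{s,t}$ equals $W_{s,t}^3/3$ exactly.

```python
import math


def W(t):
    """Illustrative smooth scalar path (an assumption)."""
    return math.sin(3.0 * t)


def compensated_sum(n, T=1.0):
    """Sum of Y(s) W_{s,t} + Y'(s) WW_{s,t} over a uniform partition of [0, T]."""
    total = 0.0
    for i in range(n):
        s, t = T * i / n, T * (i + 1) / n
        d = W(t) - W(s)                 # increment W_{s,t}
        y, yp = W(s) ** 2, 2.0 * W(s)   # Y = W^2 and its derivative Y' = 2 W
        total += y * d + yp * 0.5 * d * d  # second level is WW_{s,t} = d^2 / 2
    return total


exact = (W(1.0) ** 3 - W(0.0) ** 3) / 3.0
for n in (10, 100, 1000):
    print(n, abs(compensated_sum(n) - exact))
```

Since each local error is cubic in the increment, the total error over n intervals decays like $n^{-2}$, in line with the $O(|t - s|^{3\alpha})$ remainder for α = 1.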

Remark 13.21. Using the formula (13.25), it is straightforward to verify that if W
happens to be a smooth function and $\mathbb{W}$ is defined from W via (2.2), but this time
viewing it as a definition for the right hand side, with the left hand side given by
a usual Riemann integral, then the function Z constructed in Proposition 13.19
coincides with the usual Riemann integral of Y against $W^j$.

Remark 13.22. The theory of (controlled) rough paths of lower regularity already
hinted at in Section 2.4 can be recovered from the reconstruction operator and a
suitable choice of regularity structure (essentially two copies of the truncated tensor
algebra) in virtually the same way.

Let us give another application to rough path theory. Given an arbitrary path
$W \in C^\alpha$ with values in $\mathbf{R}^e$, does there exist a (since α ≤ 1/2: non-unique) rough
path lift? In dimension e = 1, the answer is trivially yes: it suffices to set $\mathbb{W}_{s,t} = \tfrac12 W_{s,t}^2$,
but the case of e > 1 is non-trivial. The following can be obtained as an easy application
of the reconstruction theorem in the case γ ≤ 0.

Proposition 13.23 (Lyons–Victoir extension; [LV07]). For any $W \in C^\alpha$ with values
in $\mathbf{R}^e$ for e > 1, there exists a rough path lift, i.e. $\mathbb{W}$ so that

\[ \mathbf{W} = (W, \mathbb{W}) \in \mathscr{C}^\alpha([0, T], \mathbf{R}^e) . \]

Furthermore, this can be done in such a way that the map $W \mapsto \mathbf{W}$ is continuous.

Remark 13.24. The reader may wonder how this dovetails with Proposition 1.1. The
point is that if we define $W \mapsto \mathbf{W}$ by an application of the reconstruction theorem
with γ < 0, this map restricted to smooth paths does in general not coincide with the
Riemann–Stieltjes integral of W against itself.

13.4 Wavelets and the reconstruction theorem

We trust the reader is familiar with the Haar (wavelet) basis. The analysis seen earlier
in the rough path context (e.g. the proof of the sewing lemma, based on dyadic refine-
ments) can be viewed as based on this wavelet basis. The Haar basis, however, suffers
from lack of regularity. Fortunately, the following result due to Daubechies [Dau88]
provides us with much more regular functions that enjoy analogous properties:

Theorem 13.25. Given any integer 0 < r < ∞, there exists a function $\varphi \colon \mathbf{R}^d \to \mathbf{R}$
with the following properties:
1. The function ϕ is of class $C^r_b$ and has compact support.
2. For every polynomial P of degree r, there exists a polynomial $\hat P$ of degree r such
that, for every $x \in \mathbf{R}^d$, one has $\sum_{y \in \mathbf{Z}^d} \hat P(y)\, \varphi(x - y) = P(x)$.
3. One has $\int \varphi(x)\, \varphi(x - y)\, dx = \delta_{y,0}$ for every $y \in \mathbf{Z}^d$.
4. There exist coefficients $\{a_k\}_{k \in \mathbf{Z}^d}$ such that $2^{-d/2} \varphi(x/2) = \sum_{k \in \mathbf{Z}^d} a_k\, \varphi(x - k)$.
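To get a feeling for properties (3) and (4), one can check them for the Haar scaling function in dimension d = 1. This is only an illustration: the Haar function is not of class $C^r_b$ for any r > 0, so it violates property (1) and satisfies (2) only for constants. The refinement coefficients $a_0 = a_1 = 2^{-1/2}$ and all function names below are specific to this sketch.

```python
def phi(x):
    """Haar scaling function: the indicator of [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


# Refinement coefficients a_k of property (4) for the Haar function.
A = {0: 2 ** -0.5, 1: 2 ** -0.5}


def refinement_gap(x):
    """|2^{-1/2} phi(x/2) - sum_k a_k phi(x - k)| at the point x."""
    lhs = 2 ** -0.5 * phi(x / 2.0)
    rhs = sum(a * phi(x - k) for k, a in A.items())
    return abs(lhs - rhs)


def inner(y, steps=1000):
    """Midpoint-rule approximation of the integral of phi(x) phi(x - y) over [-2, 3]."""
    h = 5.0 / steps
    points = (-2.0 + (i + 0.5) * h for i in range(steps))
    return sum(phi(x) * phi(x - y) for x in points) * h


print(max(refinement_gap(-1.0 + 0.25 * i) for i in range(17)))
print(inner(0), inner(1), inner(-1))
```

The first printed value shows that the two-scale relation holds exactly on the sample grid, and the second line reproduces the orthonormality relation $\delta_{y,0}$ for y ∈ {0, 1, −1}.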

The existence of such a function ϕ is highly non-trivial and actually equivalent to the
existence of a wavelet basis consisting of Cbr functions with compact support. Let us
restate the reconstruction theorem for the reader’s convenience. (We only consider
the case γ > 0 here.)
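Theorem 13.26. Let T be a regularity structure as above and let (Π, Γ) be a model
for T on $\mathbf{R}^d$. Then, there exists a unique linear map $\mathcal{R} \colon \mathcal{D}^\gamma \to \mathcal{D}'(\mathbf{R}^d)$ such that

\[ \big|\big(\mathcal{R}f - \Pi_x f(x)\big)(\varphi^\lambda_x)\big| \lesssim \lambda^\gamma , \]

uniformly over $\varphi \in B_r$ and λ ∈ (0, 1], and locally uniformly in x.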

Proof. We pick ϕ with properties (1–4), as provided by Theorem 13.25, for some $r > |\inf A|$.
We also set $\Lambda_n = 2^{-n} \mathbf{Z}^d$ and, for $y \in \Lambda_n$, we set $\varphi^n_y(x) = 2^{nd/2} \varphi(2^n(x - y))$.
Here, the normalisation is chosen in such a way that the set $\{\varphi^n_y\}_{y \in \Lambda_n}$ is again
orthonormal in $L^2$. We then denote by $V_n \subset C^r$ the linear span of $\{\varphi^n_y\}_{y \in \Lambda_n}$, so that,
by the property (4) above, one has $V_0 \subset V_1 \subset V_2 \subset \ldots$. We furthermore denote by
$\hat V_n$ the $L^2$-orthogonal complement of $V_{n-1}$ in $V_n$, so that $V_n = V_0 \oplus \hat V_1 \oplus \ldots \oplus \hat V_n$.
In order to keep notations compact, it will also be convenient to define the coefficients
$a^n_k$ with $k \in \Lambda_n$ by $a^n_k = a_{2^n k}$.
With these notations at hand, we then define a sequence of linear operators
$\mathcal{R}^n \colon \mathcal{D}^\gamma \to C^r$ by

\[ \big(\mathcal{R}^n f\big)(y) = \sum_{x \in \Lambda_n} \big(\Pi_x f(x)\big)(\varphi^n_x)\, \varphi^n_x(y) . \]

We claim that there then exists a Schwartz distribution $\mathcal{R}f$ such that, for every
compactly supported test function ψ of class $C^r$, one has $\langle \mathcal{R}^n f, \psi \rangle \to \mathcal{R}f(\psi)$,
and that $\mathcal{R}f$ furthermore satisfies the properties stated in the theorem.
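Let us first consider the size of the components of $\mathcal{R}^{n+1} f - \mathcal{R}^n f$ in $V_n$. Given
$x \in \Lambda_n$, we make use of properties (3-4), so that

\[
\begin{aligned}
\langle \mathcal{R}^{n+1} f - \mathcal{R}^n f , \varphi^n_x \rangle
&= \sum_{k \in \Lambda_{n+1}} a^n_k\, \langle \mathcal{R}^{n+1} f , \varphi^{n+1}_{x+k} \rangle - \big(\Pi_x f(x)\big)(\varphi^n_x) \\
&= \sum_{k \in \Lambda_{n+1}} a^n_k\, \big(\Pi_{x+k} f(x+k)\big)(\varphi^{n+1}_{x+k}) - \big(\Pi_x f(x)\big)(\varphi^n_x) \\
&= \sum_{k \in \Lambda_{n+1}} a^n_k\, \Big( \big(\Pi_{x+k} f(x+k)\big)(\varphi^{n+1}_{x+k}) - \big(\Pi_x f(x)\big)(\varphi^{n+1}_{x+k}) \Big) \\
&= \sum_{k \in \Lambda_{n+1}} a^n_k\, \Big( \Pi_{x+k} \big( f(x+k) - \Gamma_{x+k,x} f(x) \big) \Big)(\varphi^{n+1}_{x+k}) ,
\end{aligned}
\]

where we used the algebraic relations between $\Pi_x$ and $\Gamma_{xy}$ to obtain the last identity.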
Since only finitely many of the coefficients $a_k$ are non-zero, it follows from the
definition of $\mathcal{D}^\gamma$ that for the non-vanishing terms in this sum we have the bound

\[ \|f(x + k) - \Gamma_{x+k,x} f(x)\|_\alpha \lesssim 2^{-n(\gamma-\alpha)} , \]

uniformly over n ≥ 0 and x in any compact set. Furthermore, for any $\tau \in T_\alpha$, it
follows from the definition of a model that one has the bound

\[ \big|\big(\Pi_x \tau\big)(\varphi^n_x)\big| \lesssim 2^{-\alpha n - \frac{nd}{2}} , \]

again uniformly over n ≥ 0 and x in any compact set. Here, the additional factor
$2^{-\frac{nd}{2}}$ comes from the fact that the functions $\varphi^n_x$ are normalised in $L^2$ rather than $L^1$.
Combining these two bounds, we immediately obtain that

\[ \big|\langle \mathcal{R}^{n+1} f - \mathcal{R}^n f , \varphi^n_x \rangle\big| \lesssim 2^{-\gamma n - \frac{nd}{2}} , \tag{13.17} \]

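uniformly over n ≥ 0 and x in compact sets. Take now a test function $\psi \in C^r_b$ with
compact support and let us try to estimate $\langle \mathcal{R}^{n+1} f - \mathcal{R}^n f , \psi \rangle$. Since
$\mathcal{R}^{n+1} f - \mathcal{R}^n f \in V_{n+1}$, we can decompose it into a part $\delta\mathcal{R}^n f \in V_n$ and a part $\hat\delta\mathcal{R}^n f \in \hat V_{n+1}$
and estimate both parts separately. Regarding the part in $V_n$, we have

\[ \big|\langle \delta\mathcal{R}^n f , \psi \rangle\big| = \Big| \sum_{x \in \Lambda_{n+1}} \langle \delta\mathcal{R}^n f , \varphi^n_x \rangle \langle \varphi^n_x , \psi \rangle \Big| \lesssim 2^{-\gamma n - \frac{nd}{2}} \sum_{x \in \Lambda_{n+1}} \big|\langle \varphi^n_x , \psi \rangle\big| , \tag{13.18} \]

where we made use of the bound (13.17). At this stage we use the fact that, due
to the boundedness of ψ, we have $|\langle \varphi^n_x , \psi \rangle| \lesssim 2^{-nd/2}$. Furthermore, thanks to the
boundedness of the support of ψ, the number of non-vanishing terms appearing in
this sum is bounded by $2^{nd}$, so that we eventually obtain the bound

\[ \big|\langle \delta\mathcal{R}^n f , \psi \rangle\big| \lesssim 2^{-\gamma n} . \tag{13.19} \]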

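Regarding the second term, we use the standard fact coming from wavelet analysis
[Mey92] that a basis of $\hat V_{n+1}$ can be obtained in the same way as the basis of $V_n$, but
replacing the function ϕ by functions $\hat\varphi$ from some finite set Φ. In other words, $\hat V_{n+1}$
is the linear span of $\{\hat\varphi^n_x\}_{x \in \Lambda_n ; \hat\varphi \in \Phi}$. Furthermore, as a consequence of property (2),
the functions $\hat\varphi \in \Phi$ all have the property that

\[ \int \hat\varphi(x)\, P(x)\, dx = 0 , \tag{13.20} \]

for any polynomial P of degree less than or equal to r. In particular, this shows that one
has the bound

\[ \big|\langle \hat\varphi^n_x , \psi \rangle\big| \lesssim 2^{-\frac{nd}{2} - nr} . \]

As a consequence, one has

\[ \big|\langle \hat\delta\mathcal{R}^n f , \psi \rangle\big| = \Big| \sum_{x \in \Lambda_n,\, \hat\varphi \in \Phi} \langle \mathcal{R}^{n+1} f , \hat\varphi^n_x \rangle \langle \hat\varphi^n_x , \psi \rangle \Big| \lesssim 2^{-\frac{nd}{2} - nr} \sum_{x \in \Lambda_n,\, \hat\varphi \in \Phi} \big|\langle \mathcal{R}^{n+1} f , \hat\varphi^n_x \rangle\big| . \]

At this stage, we note that, thanks to the definition of $\mathcal{R}^{n+1}$ and the bounds on the
model (Π, Γ), we have $|\langle \mathcal{R}^{n+1} f , \hat\varphi^n_x \rangle| \lesssim 2^{-\frac{nd}{2} - \alpha_0 n}$, where $\alpha_0 = \inf A$, so that
$|\langle \hat\delta\mathcal{R}^n f , \psi \rangle| \lesssim 2^{-nr - \alpha_0 n}$. Combining this with (13.19), we see that one has indeed
$\mathcal{R}^n f \to \mathcal{R}f$ for some Schwartz distribution $\mathcal{R}f$.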
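It remains to show that the bound (13.15) holds. For this, given a distribution
$\eta \in C^\alpha$ for some α > −r, we first introduce the notation

\[ P_n \eta = \sum_{x \in \Lambda_n} \eta(\varphi^n_x)\, \varphi^n_x , \qquad \hat P_n \eta = \sum_{\hat\varphi \in \Phi} \sum_{x \in \Lambda_n} \eta(\hat\varphi^n_x)\, \hat\varphi^n_x . \]

We also choose an integer value n ≥ 0 such that $2^{-n} \sim \lambda$ and we write

\[
\begin{aligned}
\mathcal{R}f - \Pi_x f(x)
&= \mathcal{R}^n f - P_n \Pi_x f(x) + \sum_{m \ge n} \Big( \mathcal{R}^{m+1} f - \mathcal{R}^m f - \hat P_m \Pi_x f(x) \Big) \\
&= \mathcal{R}^n f - P_n \Pi_x f(x) + \sum_{m \ge n} \Big( \hat\delta\mathcal{R}^m f - \hat P_m \Pi_x f(x) \Big) + \sum_{m \ge n} \delta\mathcal{R}^m f .
\end{aligned}
\tag{13.21}
\]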

We then test these terms against $\psi^\lambda_x$ and we estimate the resulting terms separately.
For the first term, we have the identity

\[ \big( \mathcal{R}^n f - P_n \Pi_x f(x) \big)(\psi^\lambda_x) = \sum_{y \in \Lambda_n} \big( \Pi_y f(y) - \Pi_x f(x) \big)(\varphi^n_y)\, \langle \varphi^n_y , \psi^\lambda_x \rangle . \tag{13.22} \]

Since only finitely many (independently of n) terms contribute to the sum in (13.22),
it is indeed bounded by a constant proportional to $2^{-\gamma n} \sim \lambda^\gamma$ as required.
We now turn to the second term in (13.21), where we consider some fixed value
m ≥ n. We rewrite this term very similarly to before as

\[ \big( \hat\delta\mathcal{R}^m f - \hat P_m \Pi_x f(x) \big)(\psi^\lambda_x) = \sum_{\hat\varphi \in \Phi} \sum_{y,z} \big( \Pi_y f(y) - \Pi_x f(x) \big)(\varphi^{m+1}_y)\, \langle \varphi^{m+1}_y , \hat\varphi^m_z \rangle\, \langle \hat\varphi^m_z , \psi^\lambda_x \rangle , \]

where the sum runs over $y \in \Lambda_{m+1}$ and $z \in \Lambda_m$. This time, we use the fact that by
the property (13.20) of the wavelets $\hat\varphi$, one has the bound

\[ \big|\langle \hat\varphi^m_z , \psi^\lambda_x \rangle\big| \lesssim \lambda^{-d-r}\, 2^{-rm - \frac{md}{2}} , \tag{13.24} \]

and the $L^2$-scaling implies that $|\langle \varphi^{m+1}_y , \hat\varphi^m_z \rangle| \lesssim 1$. Furthermore, for each $z \in \Lambda_m$,
only finitely many elements $y \in \Lambda_{m+1}$ contribute to the sum, and these elements all
satisfy $|y - z| \lesssim 2^{-m}$. Bounding the first factor as in (13.23) and using the fact that
there are of the order of $\lambda^d 2^{md}$ terms contributing for every fixed m, we thus see
that the contribution of the second term in (13.21) is bounded by

\[ \sum_{m \ge n} \lambda^d 2^{md} \sum_{\alpha < \gamma} \lambda^{\gamma-\alpha-d-r}\, 2^{-dm - \alpha m - rm} \sim \sum_{\alpha < \gamma} \lambda^{\gamma-\alpha-r} \sum_{m \ge n} 2^{-\alpha m - rm} \sim \lambda^\gamma . \]

For the last term in (13.21), we combine (13.18) with the bound $|\langle \varphi^m_y , \psi^\lambda_x \rangle| \lesssim \lambda^{-d}\, 2^{-dm/2}$
and the fact that there are of the order of $\lambda^d 2^{md}$ terms appearing in the
sum (13.18) to conclude that the mth summand is bounded by a constant proportional
to $2^{-\gamma m}$. Summing over m yields again the desired bound and concludes the proof. □
Remark 13.27. There are obvious analogies between the construction of the recon-
struction operator R and that of the “rough integral” in Section 4, see also Exer-
cise 13.33. As a matter of fact, there exists a slightly more abstract formulation of
the reconstruction theorem which can be interpreted as a multidimensional analogue
to the sewing lemma, Lemma 4.2.
Remark 13.28. In view of Remark 13.11, and with $M = (\Pi, \Gamma) \in \mathscr{M}$, one should
really view $(M, f) \mapsto \mathcal{R}_M f$ as a map from $\mathscr{M} \ltimes \mathcal{D}^\gamma$ into $\mathcal{D}'$. Since the space $\mathscr{M} \ltimes \mathcal{D}^\gamma$ is
not a linear space, this shows that the map $\mathcal{R}$ isn’t actually linear, despite appearances.
However, the map $(\Pi, \Gamma, f) \mapsto \mathcal{R}f$ turns out to be locally Lipschitz continuous
provided that the distance between (Π, Γ, f) and $(\bar\Pi, \bar\Gamma, \bar f)$ is given by the smallest
constant ϱ such that

\[ \|f(x) - \bar f(x) - \Gamma_{xy} f(y) + \bar\Gamma_{xy} \bar f(y)\|_\alpha \le \varrho\, |x - y|^{\gamma-\alpha} , \]
\[ \big|\big( \Pi_x \tau - \bar\Pi_x \tau \big)(\varphi^\lambda_x)\big| \le \varrho\, \lambda^\alpha \|\tau\| , \]
\[ \|\Gamma_{xy} \tau - \bar\Gamma_{xy} \tau\|_\beta \le \varrho\, |x - y|^{\alpha-\beta} \|\tau\| . \]

Here, in order to obtain bounds on $\big(\mathcal{R}f - \bar{\mathcal{R}}\bar f\big)(\psi)$ for some smooth compactly
supported test function ψ, the above bounds should hold uniformly for x and y in a
neighbourhood of the support of ψ. The fact that this stronger continuity property
also holds is actually crucial when showing that sequences of solutions to mollified
equations all converge to the same limiting object. However, its proof is somewhat
more involved, which is why we chose not to give it here.
Remark 13.29. In the particular case where $\Pi_x \tau$ happens to be a continuous function
for every τ ∈ T (and every $x \in \mathbf{R}^d$), $\mathcal{R}f$ is also a continuous function and one has
the identity

\[ (\mathcal{R}f)(x) = \big(\Pi_x f(x)\big)(x) . \tag{13.25} \]

This can be seen from the fact that

\[ (\mathcal{R}f)(y) = \lim_{n \to \infty} \big(\mathcal{R}^n f\big)(y) = \lim_{n \to \infty} \sum_{x \in \Lambda_n} \big(\Pi_x f(x)\big)(\varphi^n_x)\, \varphi^n_x(y) . \]

Indeed, our assumptions imply that the function $(x, z) \mapsto \big(\Pi_x f(x)\big)(z)$ is jointly
continuous and since the non-vanishing terms in the above sum satisfy $|x - y| \lesssim 2^{-n}$,
one has $2^{dn/2} \big(\Pi_x f(x)\big)(\varphi^n_x) \approx \big(\Pi_y f(y)\big)(y)$ for large n. Since furthermore
$\sum_{x \in \Lambda_n} \varphi^n_x(y) = 2^{dn/2}$, the claim follows.

13.5 Exercises

Exercise 13.30. Use wavelets to construct an example demonstrating the “only if”
part of Theorem 13.16.

Exercise 13.31 (Hölder spaces).

a) For k ∈ N and α ∈ (0, 1), it is customary to define C k+α as the space of k times
continuously differentiable functions f : Rd → R such that their derivatives of
order k are α-Hölder continuous. Show that this agrees with the obvious extension
to Rd of the definition given earlier in (13.2).
b) Fix α > 0. Show that $f \in C^\alpha$ if and only if, for each x, there exists a polynomial
$P_x$ such that

\[ \big|\langle f - P_x , \psi^\lambda_x \rangle\big| \lesssim \lambda^\alpha , \]

locally uniformly in x, uniformly over λ ∈ (0, 1] and uniformly over smooth
functions $\psi \in \mathcal{D}$ with support in $B_1(0)$ such that $\|\psi\|_{\infty; B_1(0)} \le 1$.
c) Define $C^{-\alpha}$ as the space of all Schwartz distributions η belonging to the dual of
$C^r$ with r > α some integer and such that

\[ |\eta(\varphi^\lambda_x)| \lesssim \lambda^{-\alpha} , \]

uniformly over all $\varphi \in B_r$ and λ ∈ (0, 1], and locally uniformly in x. Show that
the space $C^{-\alpha}$ is independent of the choice of r in the definition given above,
which justifies the notation. Take now d = 1 and α ∈ (0, 1) for simplicity. Show
that any $f \in C^{-\alpha}$ is the distributional derivative of some Hölder continuous
function $F \in C^{1-\alpha}$.

Exercise 13.32. Show that in general, the function Z defined by (15.2) coincides, up
to an additive constant, with the integral $\int^t Y(s)\, dX^j_s$, interpreted in the sense of
(4.19).

Exercise 13.33. Retrace the proof of Theorem 13.12 in the context of Proposition
13.19 with the Haar basis as the choice of wavelet basis (i.e. set ϕ(x) = 1[0,1] (x)).
Convince yourself that this is equivalent to the proof of Lemma 4.2.

Exercise 13.34. Let (Π, Γ) be a model for the “rough path” regularity structure
given in Definition 13.5 with the additional property that $\Pi_s \dot W^i$ is the distributional
derivative of $\Pi_s W^i$ for every s. Show that it is then necessarily of the form $M_{\mathbf{W}}$ for
some α-Hölder rough path $\mathbf{W}$ as in Lemma 13.18.

Exercise 13.35. Give a detailed proof of Proposition 13.23.

13.6 Comments

An alternative theory to the theory of regularity structures [Hai14c] has been introduced
more or less simultaneously in Gubinelli–Imkeller–Perkowski [GIP12].
Instead of the reconstruction theorem, that theory builds on properties of
Bony’s paraproduct [Bon81, BMN10, BCD11]. It is also in principle able to deal
with stochastic PDEs like the KPZ equation or the dynamical $\Phi^4_3$ equation, see
Catellier–Chouk [CC13], but its scope is not as wide as that of the theory of regularity
structures. (For example, it cannot deal with classical one-dimensional parabolic
SPDEs driven by space-time white noise with a diffusion coefficient depending on
the solution.)
One advantage of the paraproduct-based theory is that one generally deals with
globally defined objects rather than the “jets” used in the theory of regularity struc-
tures. It also uses some already well-studied objects, so that it can rely on a substantial
body of existing literature. However, besides being less systematic than the theory
of regularity structures, it achieves a less clean break between the analytical and the
algebraic aspects of a given problem.
Chapter 14
Operations on modelled distributions

Abstract The original motivation for the development of the theory of regularity
structures was to provide robust solution theories for singular stochastic PDEs like
the KPZ equation or the dynamical $\Phi^4_3$ model. The idea is to reformulate them as fixed
point problems in some space $\mathcal{D}^\gamma$ (or rather a slightly modified version that takes into
account possible singular behaviour near time 0) based on a suitable random model
in a regularity structure purpose-built for the problem at hand. In order to achieve
this, this chapter provides a systematic way of formulating the standard operations
arising in the construction of the corresponding fixed point problem (differentiation,
multiplication, composition by a regular function, convolution with the heat kernel)
as operations on the spaces $\mathcal{D}^\gamma$.

14.1 Differentiation

Being a local operation, differentiating a modelled distribution is straightforward,
provided that the model one works with is sufficiently rich. Denote by L some
(formal) differential operator with constant coefficients that is homogeneous of
degree m, i.e. it is of the form

\[ L = \sum_{|k| = m} a_k D^k , \]

where k is a d-dimensional multi-index, $a_k \in \mathbf{R}$, and $D^k$ denotes the kth mixed
derivative in the distributional sense.
Given a regularity structure (A, T, G), it is convenient to define “abstract” differentiation
only on certain subspaces of T. More precisely, we say that a subspace
V ⊂ T is a sector if it is invariant under the action of the structure group G and if
it can furthermore be written as $V = \bigoplus_{\alpha \in A} V_\alpha$ with $V_\alpha \subset T_\alpha$. We then have the
following definition.


Definition 14.1. Let V be a sector of T . A linear operator ∂ : V → T is said to

realise L (of degree m) for the model (Π, Γ ) if
• one has ∂τ ∈ Tα−m for every τ ∈ Vα ,
• one has Γ ∂τ = ∂Γ τ for every τ ∈ V .
• one has Πx ∂τ = LΠx τ for every τ ∈ V and every x ∈ Rd .
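In the polynomial regularity structure in one variable, the obvious candidate for ∂ is formal differentiation, $\partial X^k = k X^{k-1}$, which lowers homogeneity by m = 1. The sketch below checks the commutation with the structure group, $\Gamma_h \partial \tau = \partial \Gamma_h \tau$, on a sample element. It is only an illustration: elements are represented by coefficient lists, and the names `gamma` and `d_abs` are hypothetical.

```python
import math


def gamma(h, coeffs):
    """Apply Gamma_h, i.e. substitute X -> X + h in sum_k coeffs[k] X^k."""
    out = [0.0] * len(coeffs)
    for j, c in enumerate(coeffs):
        for k in range(j + 1):
            out[k] += c * math.comb(j, k) * h ** (j - k)
    return out


def d_abs(coeffs):
    """Abstract derivative X^k -> k X^{k-1}, padded to keep the same length."""
    return [k * c for k, c in enumerate(coeffs)][1:] + [0.0]


tau = [1.0, -2.0, 0.5, 3.0]   # 1 - 2X + X^2/2 + 3X^3
h = 0.7
lhs = gamma(h, d_abs(tau))    # Gamma_h applied after the derivative
rhs = d_abs(gamma(h, tau))    # derivative applied after Gamma_h
print(max(abs(a - b) for a, b in zip(lhs, rhs)))
```

The two orders of operation agree up to rounding, reflecting that differentiating a polynomial commutes with translating its base point.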

Writing D γ (V ) for those elements in D γ taking values in the sector V , it then

turns out that one has the following fact:

Proposition 14.2. Let ∂ be a map that realises L for the model (Π, Γ) and let
$f \in \mathcal{D}^\gamma(V)$ for some γ > m. Then, $\partial f \in \mathcal{D}^{\gamma-m}$ and the identity $\mathcal{R}\partial f = L\mathcal{R}f$
holds.

Proof. The fact that $\partial f \in \mathcal{D}^{\gamma-m}$ is an immediate consequence of the definitions, so
we only need to show that $\mathcal{R}\partial f = L\mathcal{R}f$.
By the “uniqueness” part of the reconstruction theorem, this on the other hand
follows immediately if we can show that, for every fixed test function ψ and every
$x \in \mathbf{R}^d$, one has

\[ \big|\big( \Pi_x \partial f(x) - L\mathcal{R}f \big)(\psi^\lambda_x)\big| \lesssim \lambda^\delta , \]

for some δ > 0. Here, we defined $\psi^\lambda_x$ as before. By the assumption on the model Π,
we have the identity

\[ \big( \Pi_x \partial f(x) - L\mathcal{R}f \big)(\psi^\lambda_x) = \big( L\Pi_x f(x) - L\mathcal{R}f \big)(\psi^\lambda_x) = - \big( \Pi_x f(x) - \mathcal{R}f \big)(L^* \psi^\lambda_x) , \]

where $L^*$ is the formal adjoint of L. Since, as a consequence of the homogeneity of
L, one has the identity $L^* \psi^\lambda_x = \lambda^{-m} \big(L^* \psi\big)^\lambda_x$, it then follows immediately from the
reconstruction theorem that the right hand side of this expression is of order $\lambda^{\gamma-m}$,
as required. □

14.2 Products and composition by regular functions

One of the main purposes of the theory presented here is to give a robust way to
multiply distributions (or functions with distributions) that goes beyond the barrier
illustrated by Theorem 13.16. Provided that our functions / distributions are represented
as elements in $\mathcal{D}^\gamma$ for some model and regularity structure, we can multiply
their “Taylor expansions” pointwise, provided that we give ourselves a multiplication
table on T.
It is natural to consider products with the following properties.

Definition 14.3. Given a regularity structure (A, T, G) and two sectors $V, \bar V \subset T$,
a product on $(V, \bar V)$ is a bilinear map $\star \colon V \times \bar V \to T$ such that, for any $\tau \in V_\alpha$
and $\bar\tau \in \bar V_\beta$, one has $\tau \star \bar\tau \in T_{\alpha+\beta}$ and such that, for any element Γ ∈ G, one has
$\Gamma(\tau \star \bar\tau) = \Gamma\tau \star \Gamma\bar\tau$.

Remark 14.4. The condition that homogeneities add up under multiplication is very
natural, bearing in mind the case of the polynomial regularity structure. The second
condition is also very natural since it merely states that if one reexpands the product
of two “polynomials” around a different point, one should obtain the same result as
if one reexpands each factor first and then multiplies them together.
Given such a product, we can ask ourselves when the pointwise product of an
element in $\mathcal{D}^{\gamma_1}$ with an element in $\mathcal{D}^{\gamma_2}$ again belongs to some $\mathcal{D}^\gamma$. In order to answer
this question, we introduce the notation $\mathcal{D}^\gamma_\alpha$ to denote those elements $f \in \mathcal{D}^\gamma$ such
that furthermore

\[ f(x) \in T_{\ge\alpha} = \bigoplus_{\beta \ge \alpha} T_\beta , \]

for every x. With this notation at hand, it is not hard to show:

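Theorem 14.5. Let $f_1 \in \mathcal{D}^{\gamma_1}_{\alpha_1}(V)$, $f_2 \in \mathcal{D}^{\gamma_2}_{\alpha_2}(\bar V)$, and let ⋆ be a product on $(V, \bar V)$.
Then, the function f given by $f(x) = f_1(x) \star f_2(x)$ belongs to $\mathcal{D}^\gamma_\alpha$ with

\[ \alpha = \alpha_1 + \alpha_2 , \qquad \gamma = (\gamma_1 + \alpha_2) \wedge (\gamma_2 + \alpha_1) . \tag{14.1} \]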

Proof. It is clear that f (x) ∈ T≥α , so it remains to show that it belongs to D γ .

Furthermore, since we are only interested in showing that f1 ? f2 ∈ D γ , we discard
all of the components in Tβ for β ≥ γ.
By the properties of the product ?, it remains to obtain a bound of the type

kΓxy f1 (y) ? Γxy f2 (y) − f1 (x) ? f2 (x)kβ . |x − y|γ−β .

By adding and subtracting suitable terms, we obtain

kΓxy f (y) − f (x)kβ ≤ k Γxy f1 (y) − f1 (x) ? Γxy f2 (y) − f2 (x) kβ (14.2)

+ k Γxy f1 (y) − f1 (x) ? f2 (x)kβ

+ kf1 (x) ? Γxy f2 (y) − f2 (x) kβ .

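It follows from the properties of the product $\star$ that the first term in (14.2) is bounded by a constant times
\begin{align*}
\sum_{\beta_1+\beta_2=\beta} &\|\Gamma_{xy} f_1(y) - f_1(x)\|_{\beta_1}\, \|\Gamma_{xy} f_2(y) - f_2(x)\|_{\beta_2} \\
&\lesssim \sum_{\beta_1+\beta_2=\beta} \|x-y\|^{\gamma_1-\beta_1}\, \|x-y\|^{\gamma_2-\beta_2} \lesssim \|x-y\|^{\gamma_1+\gamma_2-\beta} .
\end{align*}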

Since $\gamma_1 + \gamma_2 \ge \gamma$, this bound is as required. The second term is bounded by a constant times
\begin{align*}
\sum_{\beta_1+\beta_2=\beta} \|\Gamma_{xy} f_1(y) - f_1(x)\|_{\beta_1}\, \|f_2(x)\|_{\beta_2}
&\lesssim \sum_{\beta_1+\beta_2=\beta} \|x-y\|^{\gamma_1-\beta_1}\, \mathbf{1}_{\beta_2 \ge \alpha_2} \\
&\lesssim \|x-y\|^{\gamma_1+\alpha_2-\beta} ,
\end{align*}
where the second inequality uses the identity $\beta_1 + \beta_2 = \beta$. Since $\gamma_1 + \alpha_2 \ge \gamma$, this bound is again of the required type. The last term is bounded similarly by reversing the roles played by $f_1$ and $f_2$. □

Remark 14.6. It is clear that the formula (14.1) for $\gamma$ is optimal in general, as can be seen from the following two “reality checks”. First, consider the case of the polynomial model and take $f_i \in \mathcal{C}^{\gamma_i}$. In this case, the (abstract) truncated Taylor series $\mathbf{f}_i$ for $f_i$ belong to $\mathcal{D}^{\gamma_i}_0$. It is clear that in this case, the product cannot be expected to have better regularity than $\gamma_1 \wedge \gamma_2$ in general, which is indeed what (14.1) states. The second reality check comes from (the proof of) Theorem 13.16. In this case, with $\beta > \alpha \ge 0$, one has $f \in \mathcal{D}^{\beta}_0$, while the constant function $x \mapsto \Xi$ belongs to $\mathcal{D}^{\infty}_{-\alpha}$ so that, according to (14.1), one expects their product to belong to $\mathcal{D}^{\beta-\alpha}_{-\alpha}$, which is indeed the case.
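The two reality checks can be reproduced numerically with a small sketch of formula (14.1). The function name `product_exponents` and the sample exponents below are illustrative assumptions; an infinite regularity index is modelled by `math.inf`.

```python
import math

def product_exponents(alpha1, gamma1, alpha2, gamma2):
    """Return (alpha, gamma) for the product, following formula (14.1)."""
    alpha = alpha1 + alpha2
    gamma = min(gamma1 + alpha2, gamma2 + alpha1)
    return alpha, gamma

# First check: polynomial model, f_i in D^{gamma_i}_0 with gamma_1 = 1.5, gamma_2 = 0.75.
check1 = product_exponents(0, 1.5, 0, 0.75)

# Second check: f in D^beta_0 and the constant function Xi in D^infty_{-alpha},
# with the sample values beta = 0.75 and alpha = 0.25.
check2 = product_exponents(0, 0.75, -0.25, math.inf)

print(check1, check2)
```

In the first case the regularity index is the minimum of the two inputs; in the second it is $\beta - \alpha$ with lowest homogeneity $-\alpha$, as stated in the remark.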

It turns out that if we have a product on a regularity structure, then in many cases this also naturally yields a notion of composition with regular functions. Of course, one cannot in general expect to be able to compose a regular function with a distribution of negative order. As a matter of fact, we will only define the composition of regular functions with elements in some $\mathcal{D}^{\gamma}$ for which it is guaranteed that the reconstruction operator yields a continuous function. One might think at this stage that this would yield a triviality, since we know of course how to compose arbitrary continuous functions. The subtlety is that we would like to design our composition operator in such a way that the result is again an element of $\mathcal{D}^{\gamma}$.
For this purpose, we say that a given sector $V \subset T$ is function-like if $\alpha < 0 \Rightarrow V_\alpha = 0$ and if $V_0$ is one-dimensional. (Denote the unit vector of $V_0$ by $\mathbf{1}$.) We will furthermore always assume that our models are normal in the sense that $(\Pi_x \mathbf{1})(y) = 1$. In this case, it turns out that if $f \in \mathcal{D}^{\gamma}(V)$, then $\mathcal{R}f$ is a continuous function and one has the identity $(\mathcal{R}f)(x) = \langle \mathbf{1}, f(x)\rangle$, where we denote by $\langle \mathbf{1}, \cdot\rangle$ the element in the dual of $V$ which picks out the prefactor of $\mathbf{1}$.
Assume now that we are given a regularity structure with a function-like sector $V$ and a product $\star : V \times V \to V$. For any smooth function $G : \mathbb{R} \to \mathbb{R}$ and any $f \in \mathcal{D}^{\gamma}(V)$ with $\gamma > 0$, we can then define $G \circ f$ to be the $V$-valued function given by
\[
(G \circ f)(x) = \sum_{k \ge 0} \frac{G^{(k)}(\bar f(x))}{k!}\, \tilde f(x)^{\star k} ,
\]
where we have set
\[
\bar f(x) = \langle \mathbf{1}, f(x)\rangle , \qquad \tilde f(x) = f(x) - \bar f(x)\mathbf{1} .
\]
Here, $G^{(k)}$ denotes the $k$th derivative of $G$ and $\tau^{\star k}$ denotes the $k$-fold product $\tau \star \cdots \star \tau$. We also used the usual conventions $G^{(0)} = G$ and $\tau^{\star 0} = \mathbf{1}$.
Note that as long as $G$ is $\mathcal{C}^\infty$, this expression is well-defined. Indeed, by assumption, there exists some $\alpha_0 > 0$ such that $\tilde f(x) \in T_{\ge\alpha_0}$. By the properties of the product, this implies that one has $\tilde f(x)^{\star k} \in T_{\ge k\alpha_0}$. As a consequence, when considering the component of $G \circ f$ in $T_\beta$ for $\beta < \gamma$, the only terms that give a contribution are those with $k < \gamma/\alpha_0$. Since we cannot possibly hope in general that $G \circ f \in \mathcal{D}^{\gamma'}$ for some $\gamma' > \gamma$, this is all we really need.
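To make the truncation concrete, here is a minimal sketch in the simplest function-like sector one can imagine: polynomials in one abstract variable $X$, truncated below degree `n`, so that $\alpha_0 = 1$ and only the terms $k < n$ contribute. The names `trunc_mul`, `compose` and `exp_derivs`, and the choice $G = \exp$, are illustrative assumptions and not part of the text.

```python
import math

def trunc_mul(p, q, n):
    """Product of coefficient lists, keeping only the powers X^0, ..., X^(n-1)."""
    out = [0.0] * n
    for i, a in enumerate(p[:n]):
        for j, b in enumerate(q[:n - i]):
            out[i + j] += a * b
    return out

def compose(dG, f, n):
    """Truncated sum_k G^(k)(fbar)/k! * ftilde^{*k} at a fixed point x.

    dG(k, u) returns the k-th derivative of G at u; f lists the coefficients
    of 1, X, ..., X^(n-1) of f(x).
    """
    fbar = f[0]
    ftilde = [0.0] + list(f[1:n])        # f(x) - fbar * 1, of homogeneity >= 1
    ftilde += [0.0] * (n - len(ftilde))
    power = [1.0] + [0.0] * (n - 1)      # ftilde^{*0} = 1
    out = [0.0] * n
    for k in range(n):                   # ftilde^{*k} has no component below degree k
        coeff = dG(k, fbar) / math.factorial(k)
        out = [o + coeff * c for o, c in zip(out, power)]
        power = trunc_mul(power, ftilde, n)
    return out

exp_derivs = lambda k, u: math.exp(u)    # every derivative of exp is exp
result = compose(exp_derivs, [0.5, 2.0], n=4)
expected = [math.exp(0.5) * 2.0 ** k / math.factorial(k) for k in range(4)]
print(result)
```

For $f(x) = 0.5\cdot\mathbf{1} + 2X$ the output agrees with the Taylor coefficients of $u \mapsto \exp(0.5 + 2u)$ at $u = 0$, which is what the abstract formula reduces to in the polynomial case.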
It turns out that if G is sufficiently regular, then the map f 7→ G ◦ f enjoys
similarly nice continuity properties to what we are used to from classical Hölder
spaces. The following result is the analogue in this context to Lemma 7.3:

Proposition 14.7. In the same setting as above, provided that G is of class C k with
k > γ/α0 , the map f 7→ G◦f is continuous from D γ (V ) into itself. If k > γ/α0 +1,
then it is locally Lipschitz continuous.

The proof of this result can be found in [Hai14c]. It is somewhat lengthy, but
ultimately rather straightforward.

14.3 Schauder estimates and admissible models

One of the reasons why the theory of regularity structures is very successful at
providing detailed descriptions of the small-scale features of solutions to semilinear
(S)PDEs is that it comes with very sharp Schauder estimates. Recall that the classical
Schauder estimates state that if K : Rd → R is a kernel that is smooth everywhere,
except for a singularity at the origin that is (approximately) homogeneous of degree
β − d for some β > 0, then the operator f 7→ K ∗ f maps C α into C α+β for every
α ∈ R, except for those values for which α + β ∈ N. (See for example [Sim97].)
It turns out that similar Schauder estimates hold in the context of general regularity structures in the sense that it is in general possible to build an operator $\mathcal{K} : \mathcal{D}^{\gamma} \to \mathcal{D}^{\gamma+\beta}$ with the property that $\mathcal{R}\mathcal{K}f = K * \mathcal{R}f$. (Here $\mathcal{K}$ denotes the operator acting on modelled distributions, while $K$ denotes the kernel.) Of course, such a statement can only be true if our regularity structure contains not only the objects necessary to describe $\mathcal{R}f$ up to order $\gamma$, but also those required to describe $K * \mathcal{R}f$ up to order $\gamma + \beta$. What are these objects? At this stage, it might be useful to reflect on the effect of the convolution of a singular function (or distribution) with $K$.
Let us assume for a moment that a given real-valued function $f$ is smooth everywhere, except at some point $x_0$. It is then straightforward to convince ourselves that $K * f$ is also smooth everywhere, except at $x_0$. Indeed, for any $\delta > 0$, we can write $K = K_\delta + K_\delta^c$, where $K_\delta$ is supported in a ball of radius $\delta$ around $0$ and $K_\delta^c$ is a smooth function. Similarly, we can decompose $f$ as $f = f_\delta + f_\delta^c$, where $f_\delta$ is supported in a $\delta$-ball around $x_0$ and $f_\delta^c$ is smooth. Since the convolution of a smooth function with an arbitrary distribution is smooth, it follows that the only non-smooth component of $K * f$ is given by $K_\delta * f_\delta$, which is supported in a ball of radius $2\delta$ around $x_0$. Since $\delta$ was arbitrary, the statement follows. By linearity, this strongly suggests that the local structure of the singularities of $K * f$ can be described completely by only using knowledge of the local structure of the singularities of $f$. It also suggests that the “singular part” of the operator $\mathcal{K}$ should be local, with the non-local parts of $\mathcal{K}$ only contributing to the “regular part”.
This discussion suggests that we certainly need the following ingredients to build an operator $\mathcal{K}$ with the desired properties:
• The canonical polynomial structure should be part of our regularity structure in
order to be able to describe the “regular parts”.
• We should be given an “abstract integration operator” I on T which describes
how the “singular parts” of Rf transform under convolution by K.
• We should restrict ourselves to models which are “compatible” with the action of
I in the sense that the behaviour of Πx Iτ should relate in a suitable way to the
behaviour of K ∗ Πx τ near x.
One way to implement these ingredients is to assume first that our model space T
contains abstract polynomials in the following sense.
Assumption 14.8 There exists a sector $\bar T \subset T$ isomorphic to the space of abstract polynomials in $d$ commuting variables. In other words, $\bar T_\alpha \neq 0$ if and only if $\alpha \in \mathbb{N}$, and one can find basis vectors $X^k$ of $T_{|k|}$ such that every element $\Gamma \in G$ acts on $\bar T$ by $\Gamma X^k = (X + h\mathbf{1})^k$ for some $h \in \mathbb{R}^d$.
Furthermore, we assume that there exists an abstract integration operator $\mathcal{I}$ with the following properties.
Assumption 14.9 There exists a linear map $\mathcal{I} : T \to T$ such that $\mathcal{I}T_\alpha \subset T_{\alpha+\beta}$, such that $\mathcal{I}\bar T = 0$, and such that, for every $\Gamma \in G$ and $\tau \in T$, one has
\[
\Gamma\mathcal{I}\tau - \mathcal{I}\Gamma\tau \in \bar T . \tag{14.3}
\]

Finally, we want to consider models that are compatible with this structure for a given kernel $K$. For this, we first make precise what we meant when we said that $K$ is approximately homogeneous of degree $\beta - d$.
Assumption 14.10 One can write $K = \sum_{n \ge 0} K_n$ where each of the kernels $K_n : \mathbb{R}^d \to \mathbb{R}$ is smooth and compactly supported in a ball of radius $2^{-n}$ around the origin. Furthermore, we assume that for every multi-index $k$, one has a constant $C$ such that the bound
\[
\sup_x |D^k K_n(x)| \le C\, 2^{n(d-\beta+|k|)} , \tag{14.4}
\]
holds uniformly in $n$. Finally, we assume that $\int K_n(x) P(x)\,dx = 0$ for every polynomial $P$ of degree at most $N$, for some sufficiently large value of $N$.
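A hedged numerical sketch of the first two requirements: for the exactly homogeneous kernel $K(x) = |x|^{\beta-d}$ with the sample values $d = 1$, $\beta = 1/2$, a smooth dyadic partition of unity produces pieces $K_n$ supported in the ball of radius $2^{-n}$ whose suprema scale like $2^{n(d-\beta)}$, as in (14.4) with $k = 0$. The cutoff functions `h`, `psi`, `chi` are one standard construction chosen here for illustration, and the sketch makes no attempt to enforce the vanishing-moment condition.

```python
import math

def h(t):
    """Smooth function vanishing for t <= 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def psi(r):
    """Smooth cutoff: equal to 1 for r <= 1/2 and to 0 for r >= 1."""
    a, b = h(1.0 - r), h(r - 0.5)
    return a / (a + b)

def chi(r):
    """Smooth bump supported in 1/4 <= r <= 1; its dyadic dilates telescope."""
    return psi(r) - psi(2.0 * r)

D, BETA = 1, 0.5   # sample dimension and homogeneity parameter

def K(x):
    return abs(x) ** (BETA - D)

def K_n(n, x):
    """n-th dyadic piece, supported in the ball of radius 2^-n."""
    return chi(2.0 ** n * abs(x)) * K(x) if x != 0 else 0.0

def sup_K_n(n, samples=2000):
    """Maximum of K_n over a grid of the support, rescaled with n."""
    return max(K_n(n, 2.0 ** -n * (i / samples)) for i in range(1, samples + 1))

# sup |K_n| divided by 2^{n(d - beta)} should not depend on n.
ratios = [sup_K_n(n) / 2.0 ** (n * (D - BETA)) for n in range(8)]

# The pieces add up to K near the origin: sum_n chi(2^n r) = 1 for small r > 0.
partition_at_03 = sum(chi(2.0 ** n * 0.3) for n in range(6))
print(ratios, partition_at_03)
```

Because the kernel is exactly homogeneous, the ratios are constant up to rounding; for a kernel that is only approximately homogeneous one would expect them to stay bounded instead.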

Remark 14.11. It turns out that in order to define the operator $\mathcal{K}$ on $\mathcal{D}^{\gamma}$, we will need the kernel $K$ to annihilate polynomials of degree $N$ for some $N \ge \gamma + \beta$.

Remark 14.12. The last assumption may appear to be extremely stringent at first
sight. In practice, this turns out not to be a problem at all. Say for example that we
want to define an operator that represents convolution with G, the Green’s function of
the Laplacian. Then, G can be decomposed into a sum of terms satisfying the bound
(14.4) with β = 2, but it does of course not annihilate generic polynomials and it is
not supported in the ball of radius 1.
However, for any fixed value of N > 0, it is straightforward to decompose G
as G = K + R, where the kernel K is compactly supported and satisfies all of the
properties mentioned above, and the kernel R is smooth. Lifting the convolution with
R to an operator from D γ → D γ+β (actually to D γ̄ for any γ̄ > 0) is straightforward,
so that we have reduced our problem to that of constructing an operator describing
the convolution by K.

Given such a kernel K, we can now make precise what we meant earlier when we
said that the models under consideration should be compatible with the kernel K.

Definition 14.13. Given a kernel $K$ as in Assumption 14.10 and a regularity structure $\mathscr{T}$ satisfying Assumptions 14.8 and 14.9, we say that a model $(\Pi, \Gamma)$ is admissible if the identities
\[
(\Pi_x X^k)(y) = (y - x)^k , \qquad \Pi_x \mathcal{I}\tau = K * \Pi_x\tau - \Pi_x \mathcal{J}(x)\tau , \tag{14.5}
\]
hold for every $\tau \in T$ with $|\tau| \le N$. Here, $\mathcal{J}(x) : T \to \bar T$ is the linear map given on homogeneous elements by
\[
\mathcal{J}(x)\tau = \sum_{|k| < |\tau| + \beta} \frac{X^k}{k!} \int D^{(k)}K(x - y)\, (\Pi_x\tau)(dy) . \tag{14.6}
\]

Remark 14.14. Note first that if τ ∈ T̄ , then the definition given above is coherent as
long as |τ | < N . Indeed, since Iτ = 0, one necessarily has Πx Iτ = 0. On the other
hand, the properties of K ensure that in this case one also has K ∗ Πx τ = 0, as well
as J (x)τ = 0.

Remark 14.15. While $K * \xi$ is well-defined for any distribution $\xi$, it is not so clear a priori whether the operator $\mathcal{J}(x)$ given in (14.6) is also well-defined. It turns out that the axioms of a model do ensure that this is the case. The correct way of interpreting (14.6) is by
\[
\mathcal{J}(x)\tau = \sum_{|k| < |\tau| + \beta} \sum_{n \ge 0} \frac{X^k}{k!}\, (\Pi_x\tau)\bigl(D^{(k)}K_n(x - \cdot)\bigr) .
\]
Note now that the scaling properties of the $K_n$ ensure that $2^{(\beta - |k|)n} D^{(k)}K_n(x - \cdot)$ is a test function that is localised around $x$ at scale $2^{-n}$. As a consequence, one has
\[
\bigl|(\Pi_x\tau)\bigl(D^{(k)}K_n(x - \cdot)\bigr)\bigr| \lesssim 2^{(|k| - \beta - |\tau|)n} ,
\]
so that this expression is indeed summable as long as $|k| < |\tau| + \beta$.
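The summability can be spelled out as a geometric series. The following is a short sketch in which $C'$ stands for the (unspecified) proportionality constant in the previous bound; the symbol $C'$ is our notation and does not appear in the text.

```latex
\[
\sum_{n \ge 0} \bigl|(\Pi_x\tau)\bigl(D^{(k)}K_n(x-\cdot)\bigr)\bigr|
\;\le\; C' \sum_{n \ge 0} 2^{-(|\tau| + \beta - |k|)\,n}
\;=\; \frac{C'}{1 - 2^{-(|\tau| + \beta - |k|)}}
\;<\; \infty
\qquad \text{whenever } |k| < |\tau| + \beta .
\]
```

If instead $|k| \ge |\tau| + \beta$, the bound no longer decays in $n$, which is why the sum in (14.6) is restricted to $|k| < |\tau| + \beta$.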

Remark 14.16. As a matter of fact, it turns out that the above definition of an admissible model dovetails very nicely with our axioms defining a general model. Indeed, starting from any regularity structure $\mathscr{T}$, any model $(\Pi, \Gamma)$ for $\mathscr{T}$, and a kernel $K$ satisfying Assumption 14.10, it is usually possible to build a larger regularity structure $\hat{\mathscr{T}}$ containing $\mathscr{T}$ (in the “obvious” sense that $T \subset \hat T$ and the action of $\hat G$ on $T$ is compatible with that of $G$) and endowed with an abstract integration map $\mathcal{I}$, as well as an admissible model $(\hat\Pi, \hat\Gamma)$ on $\hat{\mathscr{T}}$ which reduces to $(\Pi, \Gamma)$ when restricted to $T$. See [Hai14c] for more details.
The only exception to this rule arises when the original structure T contains some
homogeneous element τ which does not represent a polynomial and which is such
that |τ | + β ∈ N. Since the bounds appearing both in the definition of a model and
in Assumption 14.10 are only upper bounds, it is in practice easy to exclude such a
situation by slightly tweaking the definition of either the exponent β or of the original
regularity structure T .

With all of these definitions in place, we can finally build the operator $\mathcal{K} : \mathcal{D}^{\gamma} \to \mathcal{D}^{\gamma+\beta}$ announced at the beginning of this section. Recalling the definition of $\mathcal{J}$ from (14.6), we set
\[
(\mathcal{K}f)(x) = \mathcal{I}f(x) + \mathcal{J}(x)f(x) + (\mathcal{N}f)(x) , \tag{14.7}
\]
where the operator $\mathcal{N}$ is given by
\[
(\mathcal{N}f)(x) = \sum_{|k| < \gamma + \beta} \frac{X^k}{k!} \int D^{(k)}K(x - y)\, \bigl(\mathcal{R}f - \Pi_x f(x)\bigr)(dy) . \tag{14.8}
\]
Note first that thanks to the reconstruction theorem, it is possible to verify that the right hand side of (14.8) does indeed make sense for every $f \in \mathcal{D}^{\gamma}$ in virtually the same way as in Remark 14.15. One has:

Theorem 14.17. Let $K$ be a kernel satisfying Assumption 14.10, let $\mathscr{T} = (A, T, G)$ be a regularity structure satisfying Assumptions 14.8 and 14.9, and let $(\Pi, \Gamma)$ be an admissible model for $\mathscr{T}$. Then, for every $f \in \mathcal{D}^{\gamma}$ with $\gamma \in (0, N - \beta)$ and $\gamma + \beta \notin \mathbb{N}$, the function $\mathcal{K}f$ defined in (14.7) belongs to $\mathcal{D}^{\gamma+\beta}$ and satisfies $\mathcal{R}\mathcal{K}f = K * \mathcal{R}f$.

Proof. The complete proof of this result can be found in [Hai14c] and will not be given here. Let us simply show that one has indeed $\mathcal{R}\mathcal{K}f = K * \mathcal{R}f$ in the particular case when our model consists of continuous functions so that Remark 13.29 applies. In this case, one has
\[
(\mathcal{R}\mathcal{K}f)(x) = \bigl(\Pi_x(\mathcal{I}f(x) + \mathcal{J}(x)f(x))\bigr)(x) + \bigl(\Pi_x (\mathcal{N}f)(x)\bigr)(x) .
\]
As a consequence of (14.5), the first term appearing in the right hand side of this expression is given by
\[
\bigl(\Pi_x(\mathcal{I}f(x) + \mathcal{J}(x)f(x))\bigr)(x) = \bigl(K * \Pi_x f(x)\bigr)(x) .
\]
On the other hand, the only term contributing to the second term is the one with $k = 0$ (which is always present since $\gamma > 0$ by assumption) which then yields
\[
\bigl(\Pi_x (\mathcal{N}f)(x)\bigr)(x) = \int K(x - y)\, \bigl(\mathcal{R}f - \Pi_x f(x)\bigr)(dy) .
\]
Adding both of these terms, we see that the expression $\bigl(K * \Pi_x f(x)\bigr)(x)$ cancels, leaving us with the desired result. □
We are now in principle in possession of all of the ingredients required to formu-
late a large number of semilinear stochastic PDEs: multiplication, composition by
regular functions, differentiation, and integration against the Green’s function of the
linearised equation.
In the next chapter we show how this can be leveraged in practice in order to build
a robust solution theory for the KPZ equation.

14.4 Exercises

Exercise 14.18. a) Construct an example of a regularity structure with trivial group $G$ in which both $\mathcal{R}f_1$ and $\mathcal{R}f_2$ are continuous functions but where the identity
\[
\mathcal{R}(f_1 \star f_2)(x) = (\mathcal{R}f_1)(x)\, (\mathcal{R}f_2)(x)
\]
fails.
b) Transfer Exercise 2.17 to the present context.
Solution 14.19. (We only address the first part.) Consider for instance the regularity structure given by $A = (-2\kappa, -\kappa, 0)$ for fixed $\kappa > 0$ with each $T_\alpha$ being a copy of $\mathbb{R}$ given by $T_{-n\kappa} = \langle \Xi^n \rangle$. We furthermore take for $G$ the trivial group. This regularity structure comes with an obvious product by setting $\Xi^m \star \Xi^n = \Xi^{m+n}$ provided that $m + n \le 2$.
Then, we could for example take as a model for $\mathscr{T} = (T, A, G)$:
\[
(\Pi_x \Xi^0)(y) = 1 , \qquad (\Pi_x \Xi)(y) = 0 , \qquad (\Pi_x \Xi^2)(y) = c , \tag{14.9}
\]
where $c$ is an arbitrary constant. Let furthermore
\[
\mathbf{f}_1(x) = f_1(x)\,\Xi^0 + \tilde f_1(x)\,\Xi , \qquad \mathbf{f}_2(x) = f_2(x)\,\Xi^0 + \tilde f_2(x)\,\Xi .
\]
Since our group $G$ is trivial, one has $\mathbf{f}_i \in \mathcal{D}^{\gamma}$ provided that each of the $f_i$ belongs to $\mathcal{C}^{\gamma}$ and each of the $\tilde f_i$ belongs to $\mathcal{C}^{\gamma+\kappa}$. (And one has $\gamma + \kappa < 1$.) One furthermore has the identity $(\mathcal{R}\mathbf{f}_i)(x) = f_i(x)$.
However, the pointwise product is given by
\[
(\mathbf{f}_1 \star \mathbf{f}_2)(x) = f_1(x) f_2(x)\,\Xi^0 + \bigl(\tilde f_1(x) f_2(x) + \tilde f_2(x) f_1(x)\bigr)\,\Xi + \tilde f_1(x)\tilde f_2(x)\,\Xi^2 ,
\]
which by Theorem 14.5 belongs to $\mathcal{D}^{\gamma-\kappa}$. Provided that $\gamma > \kappa$, one can then apply the reconstruction operator to this product and one obtains
\[
\mathcal{R}(\mathbf{f}_1 \star \mathbf{f}_2)(x) = f_1(x) f_2(x) + c\,\tilde f_1(x)\tilde f_2(x) ,
\]
which is obviously quite different from the pointwise product $(\mathcal{R}\mathbf{f}_1)(x) \cdot (\mathcal{R}\mathbf{f}_2)(x)$.
How should this be interpreted? For $n > 0$, we could have defined a model $\Pi^{(n)}$ by
\[
(\Pi^{(n)}_x \Xi^0)(y) = 1 , \qquad (\Pi^{(n)}_x \Xi)(y) = \sqrt{2c}\,\sin(ny) , \qquad (\Pi^{(n)}_x \Xi^2)(y) = 2c\,\sin^2(ny) .
\]
Denoting by $\mathcal{R}^{(n)}$ the corresponding reconstruction operator, we have the identity
\[
(\mathcal{R}^{(n)}\mathbf{f}_i)(x) = f_i(x) + \sqrt{2c}\,\tilde f_i(x)\sin(nx) ,
\]
as well as $\mathcal{R}^{(n)}(\mathbf{f}_1 \star \mathbf{f}_2) = \mathcal{R}^{(n)}\mathbf{f}_1 \cdot \mathcal{R}^{(n)}\mathbf{f}_2$. As a model, $\Pi^{(n)}$ actually converges to the limiting model $\Pi$ defined in (14.9). As a consequence of the continuity of the reconstruction operator, this implies that
\[
\mathcal{R}^{(n)}\mathbf{f}_1 \cdot \mathcal{R}^{(n)}\mathbf{f}_2 = \mathcal{R}^{(n)}(\mathbf{f}_1 \star \mathbf{f}_2) \to \mathcal{R}(\mathbf{f}_1 \star \mathbf{f}_2) \neq \mathcal{R}\mathbf{f}_1 \cdot \mathcal{R}\mathbf{f}_2 ,
\]
which is of course also easy to see “by hand”. This shows that in some cases, the “non-canonical” models as in (14.9) can be interpreted as limits of “canonical” models for which the usual rules of calculus hold. Even this is however not always the case (think of the Itô Brownian rough path).
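The “by hand” verification can be sketched numerically. The script below pairs the product of the two oscillating reconstructions against the constant test function on $[0, 2\pi]$ and compares it with the pairing of $f_1 f_2 + c\,\tilde f_1\tilde f_2$. The coefficient functions, the value $c = 1$, the frequency $n = 50$ and the midpoint quadrature are illustrative assumptions chosen for this sketch (trigonometric polynomials, so that the quadrature is exact up to rounding).

```python
import math

C = 1.0                                   # sample value of the constant c
f1  = lambda x: 1.0 + math.cos(x)         # hypothetical coefficient functions
f1t = lambda x: math.sin(x)
f2  = lambda x: 2.0
f2t = lambda x: math.sin(x)

def recon_n(f, ft, n, x):
    """R^(n) f_i (x) = f_i(x) + sqrt(2c) * ftilde_i(x) * sin(n x)."""
    return f(x) + math.sqrt(2.0 * C) * ft(x) * math.sin(n * x)

def integrate(g, m=4096):
    """Midpoint rule on [0, 2*pi]."""
    step = 2.0 * math.pi / m
    return step * sum(g((j + 0.5) * step) for j in range(m))

n = 50
# Product of the reconstructions for the oscillating model Pi^(n).
oscillating = integrate(lambda x: recon_n(f1, f1t, n, x) * recon_n(f2, f2t, n, x))
# Reconstruction of the abstract product for the limiting model (14.9).
limit = integrate(lambda x: f1(x) * f2(x) + C * f1t(x) * f2t(x))
# Naive product of the limiting reconstructions.
naive = integrate(lambda x: f1(x) * f2(x))
print(oscillating, limit, naive)
```

The first two numbers agree, while the third differs from them by the contribution of $c\,\tilde f_1\tilde f_2$; this only tests convergence against a single test function and is not a proof.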

Exercise 14.20. Consider space-time $\mathbb{R}^d$ with one temporal and $(d - 1)$ spatial dimensions, under the parabolic scaling $(2, 1, \ldots, 1)$, as introduced in Remark 13.9. Denote by $G$ the heat kernel (i.e. the Green’s function of the operator $\partial_t - \partial_x^2$). Show that one has the decomposition
\[
G = K + \hat K ,
\]
where the kernel $K$ satisfies all of the assumptions of Section 14.3 (with $\beta = 2$) and the remainder $\hat K$ is smooth and bounded.
Chapter 15
Application to the KPZ equation

Abstract We show how the theory of regularity structures can be used to build a
robust solution theory for the KPZ equation. We also give a very short survey of the
original approach to the same problem using controlled rough paths and we discuss
how the two approaches are linked.

15.1 Formulation of the main result

Let us now briefly explain how the theory of regularity structures can be used to
make sense of solutions to very singular semilinear stochastic PDEs. We will keep
the discussion in this section at a very informal level without attempting to make
mathematically precise statements. The interested reader may find more details in
[Hai13, Hai14c].
For definiteness, we focus on the case of the KPZ equation [KPZ86], which is formally given by
\[
\partial_t h = \partial_x^2 h + (\partial_x h)^2 + \xi - C , \tag{15.1}
\]
where $\xi$ denotes space-time white noise, the spatial variable takes values in the circle (i.e. in the interval $[0, 2\pi]$ endowed with periodic boundary conditions), and $C$ is a fixed constant. The problem with such an equation is that even the solution to the linear part of the equation, namely
\[
\partial_t \Psi = \partial_x^2 \Psi + \xi ,
\]
is not differentiable as a function of the spatial variable. As a matter of fact, as already noted in Section 12.2, for any fixed time $t$, $\Psi$ has the regularity of Brownian motion as a function of the spatial variable $x$. As a consequence, it turns out that the only way of giving meaning to (15.1) is to “renormalise” the equation by subtracting from its right hand side an “infinite constant”, which counteracts the divergence of the term $(\partial_x h)^2$.

This has usually been interpreted in the following way. Assuming for a moment that $\xi$ is a smooth function, the change of variables formula shows that if we define $h = \log Z$, then $Z$ satisfies the PDE
\[
\partial_t Z = \partial_x^2 Z + Z\,\xi .
\]
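For the reader's convenience, here is a sketch of the change of variables, written for smooth $\xi$ and with the constant $C$ in (15.1) set to zero (an assumption made here to match the equation for $Z$ as displayed):

```latex
\begin{align*}
Z = e^{h}
\quad\Longrightarrow\quad
\partial_t Z &= Z\,\partial_t h , \qquad
\partial_x Z = Z\,\partial_x h , \qquad
\partial_x^2 Z = Z\bigl(\partial_x^2 h + (\partial_x h)^2\bigr) , \\
\partial_t Z - \partial_x^2 Z
&= Z\bigl(\partial_t h - \partial_x^2 h - (\partial_x h)^2\bigr)
= Z\,\xi .
\end{align*}
```

The nonlinearity $(\partial_x h)^2$ is thus absorbed exactly by the second derivative of the exponential, which is what makes the transformed equation linear in $Z$.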

The only ill-posed product appearing in this equation now is the product of the solution $Z$ with white noise $\xi$. As long as $Z$ takes values in $L^2$, this product can be given a meaning as a classical Itô integral, so that the equation for $Z$ can be interpreted as the Itô equation
\[
dZ = \partial_x^2 Z\,dt + Z\,dW , \tag{15.2}
\]
where $W$ is an $L^2$-cylindrical Wiener process. It is well known [DPZ92] that this equation has a unique (mild) solution and we can then go backwards and define the solution to the KPZ equation as $h = \log Z$. The expert reader will have noticed that this argument is flawed: since (15.2) is interpreted as an Itô equation, we should really use Itô’s formula to find out what equation $h$ satisfies. If one does this a bit more carefully, one notices that the Itô correction term appearing in this way is indeed an infinite constant! This is the case in the following sense. If $W_\varepsilon$ is a Wiener process with a covariance given by $x \mapsto \varepsilon^{-1}\varrho(\varepsilon^{-1}x)$ for some smooth compactly supported function $\varrho$ integrating to 1 and $Z_\varepsilon$ solves (15.2) with $W$ replaced by $W_\varepsilon$, then $h_\varepsilon = \log Z_\varepsilon$ solves
\[
dh_\varepsilon = \partial_x^2 h_\varepsilon\,dt + (\partial_x h_\varepsilon)^2\,dt + dW_\varepsilon - \varepsilon^{-1} C_\varrho\,dt , \tag{15.3}
\]
for some constant $C_\varrho$ depending on $\varrho$. Since $Z_\varepsilon$ converges to a strictly positive limit $Z$, this shows that the sequence of functions $h_\varepsilon$ solving (15.3) converges to a limit $h$. This limit is called the Hopf–Cole solution to the KPZ equation [Hop50, Col51, BG97].
This notion of solution is of course not very satisfactory since it relies on a nonlin-
ear transformation and provides no direct interpretation of the term (∂x h)2 appearing
in the right hand side of (15.1). Furthermore, many natural growth models lead to
equations that structurally “look like” (15.1), rather than (15.2). Since perturbations
are usually rather badly behaved under exponentiation and since there is no really
good approximation theory for (15.2) either (for example it has been an open problem
whether space-time regularisations of the noise lead to the same notion of solution),
one would like to have a robust solution theory for (15.1) directly.
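Such a robust solution theory is precisely what the theory of regularity structures provides. More precisely, it provides spaces $\mathcal{M}$ (a suitable space of “admissible models”) and $\mathcal{D}^{\gamma}$, maps $\mathcal{S}_a$ (an abstract “solution map”), $\mathcal{R}$ (the reconstruction operator) and $\Psi$ (a “canonical lift map”), as well as a finite-dimensional group $\mathfrak{R}$ acting both on $\mathbb{R}$ and $\mathcal{M}$ such that the following diagram commutes:
\[
\begin{array}{ccc}
\mathcal{F} \times \mathcal{M} \times \mathcal{C}^{\alpha} & \xrightarrow{\ \mathcal{S}_a\ } & \mathcal{D}^{\gamma} \\
\big\uparrow {\scriptstyle \Psi} & & \big\downarrow {\scriptstyle \mathcal{R}} \\
\mathcal{F} \times \mathcal{C} \times \mathcal{C}^{\alpha} & \xrightarrow{\ \mathcal{S}_c\ } & \mathcal{C}([0,T], \mathcal{C}^{\alpha}) \\
(C, \xi, h_0) & \mapsto & h
\end{array}
\tag{15.4}
\]
(In this diagram, $\Psi$ acts on the noise component only, and the group $\mathfrak{R}$ acts on $\mathcal{F}$ and on $\mathcal{M}$.)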

Here, $\mathcal{S}_c$ denotes the classical solution map $\mathcal{S}_c(C, \xi, h_0)$ which provides the solution (up to some fixed final time $T$) to the equation
\[
\partial_t h = \partial_x^2 h + (\partial_x h)^2 + \xi - C , \qquad h(0, x) = h_0(x) , \tag{15.5}
\]
for regular instances of the noise $\xi$. The space $\mathcal{F}$ of “formal right hand sides” is in this case just a copy of $\mathbb{R}$ which holds the value of the constant $C$ appearing in (15.5). The diagram commutes in the sense that if $M \in \mathfrak{R}$, then
\[
\mathcal{S}_c(M(C), \xi, h_0) = \mathcal{R}\mathcal{S}_a(C, M(\Psi(\xi)), h_0) ,
\]
where we identify $M$ with its respective actions on $\mathbb{R}$ and $\mathcal{M}$. The important additional features are the following:
• If $\xi_\varepsilon$ denotes a “natural” regularisation of space-time white noise, then there exists a sequence $M_\varepsilon$ of elements in $\mathfrak{R}$ such that $M_\varepsilon\Psi(\xi_\varepsilon)$ converges to a limiting random element $(\Pi, \Gamma) \in \mathcal{M}$. This element can also be characterised directly without resorting to specific approximation procedures and $\mathcal{R}\mathcal{S}_a(0, (\Pi, \Gamma), h_0)$ coincides almost surely with the Hopf–Cole solution to the KPZ equation.
• The maps $\mathcal{S}_a$ and $\mathcal{R}$ are both continuous, unlike the map $\mathcal{S}_c$ which is discontinuous in its second argument for any topology for which $\xi_\varepsilon$ converges to $\xi$.
• As an abstract group, the “renormalisation group” $\mathfrak{R}$ is simply equal to $(\mathbb{R}^3, +)$. However, it is possible to extend the picture to deal with much larger classes of approximations, which has the effect of increasing both $\mathfrak{R}$ and the space $\mathcal{F}$ of possible right hand sides. See for example [HQ14] for a proof of convergence to KPZ for a much larger class of interface growth models.
An example of a statement that can be proved from these considerations (see [Hai13, Hai14c, HQ14]) is the following.
Theorem 15.1. Consider the sequence of equations
\[
\partial_t h_\varepsilon = \partial_x^2 h_\varepsilon + (\partial_x h_\varepsilon)^2 + \xi_\varepsilon - C_\varepsilon , \tag{15.6}
\]
where $\xi_\varepsilon = \delta_\varepsilon * \xi$ with $\delta_\varepsilon(t, x) = \varepsilon^{-3}\varrho(\varepsilon^{-2}t, \varepsilon^{-1}x)$, for some smooth and compactly supported function $\varrho$, and $\xi$ denotes space-time white noise. Then, there exists a (diverging) choice of constants $C_\varepsilon$ such that the sequence $h_\varepsilon$ converges in probability to a limiting process $h$.
Furthermore, one can ensure that the limiting process $h$ does not depend on the choice of mollifier $\varrho$ and that it coincides with the Hopf–Cole solution to the KPZ equation.

Remark 15.2. It is important to note that although the limiting process is independent of the choice of mollifier $\varrho$, the constant $C_\varepsilon$ does very much depend on this choice, as we already alluded to earlier.

Remark 15.3. Regarding the initial condition, one can take h0 ∈ C β for any fixed
β > 0. Unfortunately, this result does not cover the case of “infinite wedge” initial
conditions, see for example [Cor12].

The aim of this section is to sketch how the theory of regularity structures can be used to obtain this kind of convergence result and how (15.4) is constructed. First of all, we note that while our solution $h$ will be a Hölder continuous space-time function (or rather an element of $\mathcal{D}^{\gamma}$ for some regularity structure with a model over $\mathbb{R}^2$), the “time” direction has a different scaling behaviour from the “space” direction. As a consequence, it turns out to be effective to slightly change our definition of “localised test functions” by setting
\[
\varphi^\lambda_{(s,x)}(t, y) = \lambda^{-3}\varphi\bigl(\lambda^{-2}(t - s), \lambda^{-1}(y - x)\bigr) .
\]
Accordingly, the “effective dimension” of our space-time is actually 3, rather than 2. The theory presented in Section 13 extends mutatis mutandis to this setting. (Note however that when considering the homogeneity of a regular monomial, powers of the time variable should now be counted double.) Note also that with this way of measuring regularity, space-time white noise belongs to $\mathcal{C}^{-\alpha}$ for every $\alpha > \frac{3}{2}$. This is because of the bound
\[
\bigl(\mathbf{E}\langle \xi, \varphi^\lambda_x \rangle^2\bigr)^{1/2} = \|\varphi^\lambda_x\|_{L^2} \approx \lambda^{-3/2} ,
\]
combined with an argument somewhat similar to the proof of Kolmogorov’s continuity lemma.
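The exponent $-\frac{3}{2}$ follows from a change of variables in the parabolically rescaled test function. As a short sketch, substituting $t = s + \lambda^2 t'$ and $y = x + \lambda y'$ (so that $dt\,dy = \lambda^3\,dt'\,dy'$) gives:

```latex
\[
\|\varphi^\lambda_{(s,x)}\|_{L^2}^2
= \lambda^{-6} \int\!\!\int \varphi\bigl(\lambda^{-2}(t-s), \lambda^{-1}(y-x)\bigr)^2 \, dt \, dy
= \lambda^{-6} \cdot \lambda^{3}\, \|\varphi\|_{L^2}^2
= \lambda^{-3}\, \|\varphi\|_{L^2}^2 ,
\qquad\text{so}\qquad
\|\varphi^\lambda_{(s,x)}\|_{L^2} = \lambda^{-3/2}\, \|\varphi\|_{L^2} .
\]
```

The factor $\lambda^{3}$ is exactly the “effective dimension” 3 mentioned above: two powers from the time variable and one from the space variable.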

15.2 Construction of the associated regularity structure

Our first step is to build a regularity structure that is sufficiently large to allow us to reformulate (15.1) as a fixed point in $\mathcal{D}^{\gamma}$ for some $\gamma > 0$. Denoting by $G$ the heat kernel (i.e. the Green’s function of the operator $\partial_t - \partial_x^2$), we can rewrite the solution to (15.1) with initial condition $h_0$ as
\[
h = G * \bigl((\partial_x h)^2 + \xi\bigr) + G h_0 ,
\]
where $*$ denotes space-time convolution and where we denote by $G h_0$ the harmonic extension of $h_0$. (That is, the solution to the heat equation with initial condition $h_0$.)
In order to have a chance of fitting this into the framework described above, we first decompose the heat kernel $G$ as in Exercise 14.20 as
\[
G = K + \hat K ,
\]
where the kernel $K$ satisfies all of the assumptions of Section 14.3 (with $\beta = 2$) and the remainder $\hat K$ is smooth. If we consider any regularity structure containing the usual Taylor polynomials and equipped with an admissible model, it is straightforward to associate to $\hat K$ an operator $\hat{\mathcal{K}} : \mathcal{D}^{\gamma} \to \mathcal{D}^{\infty}$ via
\[
(\hat{\mathcal{K}}f)(z) = \sum_{k} \frac{X^k}{k!}\, \bigl(D^{(k)}\hat K * \mathcal{R}f\bigr)(z) ,
\]
where $z$ denotes a space-time point and $k$ runs over all possible 2-dimensional multiindices. Similarly, the harmonic extension of $h_0$ can be lifted to an element
in D ∞ which we denote again by Gh0 by considering its Taylor expansion around
every space-time point. At this stage, we note that we actually cheated a little: while
Gh0 is smooth in {(t, x) : t > 0, x ∈ S 1 } and vanishes when t < 0, it is of course
singular on the time-0 hyperplane {(0, x) : x ∈ S 1 }. This problem can be cured
by introducing weighted versions of the spaces D γ allowing for singularities on a
given hyperplane. A precise definition of these spaces and their behaviour under
multiplication and the action of the integral operator K can be found in [Hai14c].
For the purpose of the informal discussion given here, we will simply ignore this
problem.
This suggests that the “abstract” formulation of (15.1) should be given by
\[
H = \mathcal{K}\bigl((\partial H)^2 + \Xi\bigr) + \hat{\mathcal{K}}\bigl((\partial H)^2 + \Xi\bigr) + G h_0 , \tag{15.7}
\]
where it still remains to be seen how to define an “abstract differentiation operator” $\partial$ realising the spatial derivative $\partial_x$ as in Section 14.1. In view of (14.7), this equation is of the type
\[
H = \mathcal{I}\bigl((\partial H)^2 + \Xi\bigr) + (\ldots) , \tag{15.8}
\]
where the terms $(\ldots)$ consist of functions that take values in the subspace $\bar T$ of $T$ spanned by regular Taylor polynomials in the time variable $X_0$ and the space variable $X_1$. (As previously, $X$ denotes the collection of both.) In order to build a regularity structure in which (15.8) can be formulated, it is then natural to start with the structure $\bar T$ given by these abstract polynomials (again with the parabolic scaling which causes the abstract “time” variable to have homogeneity 2 rather than 1), and to then add a symbol $\Xi$ to it which we postulate to have homogeneity $-\frac{3}{2}^-$, where we denote by $\alpha^-$ an exponent strictly smaller than, but arbitrarily close to, the value $\alpha$. As a consequence of our definitions, it will also turn out that the symbol $\partial$ is always immediately followed by the symbol $\mathcal{I}$, so that it makes sense to introduce the shorthand $\mathcal{I}' = \partial\mathcal{I}$. This is also suggestive of the fact that $\mathcal{I}'$ can itself be considered an abstract integration map, associated to the kernel $K' = \partial_x K$.
We then simply add to $T$ all of the formal expressions that an application of the right hand side of (15.8) can generate for the description of $H$, $\partial H$, and $(\partial H)^2$. The homogeneity of a given expression is furthermore completely determined by the rules $|\mathcal{I}\tau| = |\tau| + 2$, $|\partial\tau| = |\tau| - 1$ and $|\tau\bar\tau| = |\tau| + |\bar\tau|$. For example, it follows from (15.8) that the symbol $\mathcal{I}(\Xi)$ is required for the description of $H$, so that $\mathcal{I}'(\Xi)$ is required for the description of $\partial H$. This then implies that $\mathcal{I}'(\Xi)^2$ is required for the description of the right hand side of (15.8), which in turn implies that $\mathcal{I}(\mathcal{I}'(\Xi)^2)$ is also required for the description of $H$, etc.
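The three homogeneity rules lend themselves to a small recursive computation. The sketch below encodes symbols as nested tuples and evaluates their homogeneity exactly with rational arithmetic; the encoding, the helper names `I`, `dI`, `mul`, `hom`, and the concrete value of `KAPPA` (standing in for the arbitrarily small loss in $|\Xi| = -\frac{3}{2}^-$) are all illustrative assumptions.

```python
from fractions import Fraction

KAPPA = Fraction(1, 100)   # stand-in for the small loss: |Xi| = -3/2 - kappa

def I(tau):
    """Abstract integration: |I(tau)| = |tau| + 2."""
    return ("I", tau)

def dI(tau):
    """Shorthand I' = dI: |I'(tau)| = |tau| + 2 - 1."""
    return ("I'", tau)

def mul(*factors):
    """Product of symbols: homogeneities add up."""
    return ("*",) + factors

# Homogeneities of the basic symbols (parabolic scaling: time X0 counts double).
BASE = {"Xi": Fraction(-3, 2) - KAPPA, "1": Fraction(0),
        "X0": Fraction(2), "X1": Fraction(1)}

def hom(tau):
    """Homogeneity of a symbol built from Xi, 1, X0, X1 using I, I' and products."""
    if isinstance(tau, str):
        return BASE[tau]
    tag = tau[0]
    if tag == "I":
        return hom(tau[1]) + 2
    if tag == "I'":
        return hom(tau[1]) + 1
    return sum((hom(t) for t in tau[1:]), Fraction(0))

square = mul(dI("Xi"), dI("Xi"))          # I'(Xi)^2
for name, tau in [("I(Xi)", I("Xi")), ("I'(Xi)", dI("Xi")),
                  ("I'(Xi)^2", square), ("I(I'(Xi)^2)", I(square))]:
    print(name, hom(tau))
```

The same function can be applied to any other symbol generated by (15.8), which may help when working out which symbols fall below a given homogeneity.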

Remark 15.4. Here we made a distinction between $\mathcal{I}(\Xi)$, interpreted as the linear map $\mathcal{I}$ applied to the symbol $\Xi$, and the formal symbol “$\mathcal{I}(\Xi)$”. Since the map $\mathcal{I}$ is then defined so that the former equals the latter, this distinction is somewhat moot and will be blurred in the sequel.

More formally, denote by U the collection of those formal expressions that are
required to describe H. This is then defined as the smallest collection containing 1,
X, and I(Ξ), and such that

τ1 , τ2 ∈ U ⇒ I(∂τ1 ∂τ2 ) ∈ U ,

where it is understood that I(X k ) = 0 for every multi-index k. We then set

W = U ∪ {Ξ} ∪ {∂τ1 ∂τ2 : τi ∈ U} , (15.9)

and we define our space T as the set of all linear combinations of elements in W.
Naturally, Tα consists of those linear combinations that only involve elements in W
that are of homogeneity α. It is not too difficult to convince oneself that, for every
α ∈ R, W contains only finitely many elements of homogeneity less than α, so that
each Tα is finite-dimensional.
In order to simplify expressions later, we will use the following shorthand graphical notation for elements of $\mathcal{W}$. For $\Xi$, we draw a small circle. The integration map $\mathcal{I}$ is then represented by a downfacing wavy line and $\mathcal{I}'$ is represented by a downfacing plain line. The multiplication of symbols is obtained by joining them at the root. For example, each of the symbols $\mathcal{I}'(\Xi)^2$, $(\mathcal{I}'(\mathcal{I}'(\Xi)^2))^2$ and $\mathcal{I}(\mathcal{I}'(\Xi)^2)$ is drawn as a single tree. [Tree diagrams not reproduced.]
Symbols containing factors of $X$ have no particular graphical representation, so we will for example write $X_i\,\mathcal{I}'(\Xi)^2$ as $X_i$ followed by the tree for $\mathcal{I}'(\Xi)^2$. With this notation, the space $T$ (up to homogeneity $\frac{3}{2}$) is given by
\[
T = \langle \Xi, \ldots, \mathbf{1}, \ldots, X_1, \ldots \rangle , \tag{15.10}
\]
[tree symbols in the list not reproduced] where we ordered symbols in increasing order of homogeneity and used $\langle\cdot\rangle$ to denote the linear span.
Exercise 15.5. Compute the homogeneities of the symbols appearing in the list
(15.10).

15.3 The structure group

Recall that the purpose of the group $G$ is to provide a class of linear maps $\Gamma : T \to T$ arising as possible candidates for the action of “reexpanding” a “Taylor series” around a different point. In our case, in view of (14.5), the coefficients of these reexpansions will naturally be some polynomials in $x$ and in the expressions appearing in (14.6). This suggests that we should define a space $T^+$ whose basis vectors consist of formal expressions of the type
\[
X^k \prod_{i=1}^{N} \mathcal{J}_{\ell_i}(\tau_i) , \tag{15.11}
\]
where $N$ is an arbitrary but finite number, the $\tau_i$ are canonical basis elements in $\mathcal{W}$ defined in (15.9), and the $\ell_i$ are $d$-dimensional multiindices satisfying $|\ell_i| < |\tau_i| + 2$. (The last bound is a reflection of the restriction of the summands in (14.6) with $\beta = 2$.) The space $T^+$ is endowed with a natural commutative product. (In fact, $T^+$ is nothing but the free commutative algebra over the symbols $\{X_i, \mathcal{J}_\ell(\tau)\}$ with $i \in \{1, \ldots, d\}$ and $\tau \in \mathcal{W}$ with $|\ell| < |\tau| + 2$.)

Remark 15.6. While the canonical basis of T^+ is related to that of T, it should be
viewed as a completely disjoint space. We emphasise this by not colouring the basis
vectors of T^+.

The space T^+ also has a natural graded structure T^+ = ⊕_α T^+_α similarly to before
by setting
|J_ℓ(τ)| = |τ| + 2 − |ℓ| , |X^k| = |k| ,
and by postulating that the degree of a product is the sum of the degrees of its
factors. Unlike in the case of T however, elements of T + all have strictly positive
homogeneity, except for the empty product 1 which we postulate to have homogeneity
0.
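
As an illustration of this grading, here is a short worked case. It is a sketch under two assumptions that are not restated in this section: the homogeneity |I′(Ξ)| = −1/2 − κ from the construction of T, and the parabolic convention |ℓ| = 2ℓ_0 + ℓ_1 for a multiindex ℓ = (ℓ_0, ℓ_1).

```latex
\tau = \mathcal{I}'(\Xi), \qquad |\tau| = -\tfrac12 - \kappa, \qquad
|\ell| < |\tau| + 2 = \tfrac32 - \kappa
\;\Longrightarrow\; \ell \in \{(0,0),\,(0,1)\}, \\
|J_{(0,0)}(\tau)| = \tfrac32 - \kappa , \qquad
|J_{(0,1)}(\tau)| = \tfrac12 - \kappa .
```

Both values are strictly positive for small κ, in line with the statement above, while ℓ = (1, 0) is excluded because |ℓ| = 2.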
To any given admissible model (Π, Γ ), it is then natural to associate linear maps
fx : T + → R by setting fx (X k ) = (−x)k , fx (σσ̄) = fx (σ)fx (σ̄), and
f_x(J_{\ell_i}(\tau_i)) = \int D^{(\ell_i)} K(x - y)\, (\Pi_x \tau_i)(dy) . (15.12)

It turns out that with this definition, the coefficients of the linear maps Γ_xy can be
expressed as polynomials of the numbers f_x(J_{ℓ_i}(τ_i)) and f_y(J_{ℓ_i}(τ_i)) for suitable
expressions τ_i and multiindices ℓ_i. In order to formalize this, we consider the following
construction. We define a linear map ∆ : T → T ⊗ T^+ in the following way. For
the basic elements Ξ, 1 and X_i (i ∈ {0, 1}), we set

∆1 = 1 ⊗ 1 , ∆Ξ = Ξ ⊗ 1 , ∆Xi = Xi ⊗ 1 + 1 ⊗ Xi .

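We then extend this recursively to all of T by imposing the following identities

\Delta(\tau\bar\tau) = \Delta\tau \cdot \Delta\bar\tau ,
\Delta \mathcal{I}(\tau) = (\mathcal{I} \otimes I)\Delta\tau + \sum_{\ell,m} \frac{X^\ell}{\ell!} \otimes \frac{X^m}{m!} J_{\ell+m}(\tau) ,
\Delta \mathcal{I}'(\tau) = (\mathcal{I}' \otimes I)\Delta\tau + \sum_{\ell,m} \frac{X^\ell}{\ell!} \otimes \frac{X^m}{m!} J_{\ell+m+(0,1)}(\tau) .

(In the first tensor factor, \mathcal{I} and \mathcal{I}' are the integration maps I and I′; the I in the
second factor is the identity.) Here, we extend τ ↦ J_k(τ) to a linear map J_k : T → T^+ by setting
J_k(τ) = 0 for those basis vectors τ ∈ W for which |τ| < |k| − 2.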
Let now G+ denote the set of all linear maps g : T + → R with the property that
g(σσ̄) = g(σ)g(σ̄) for any two elements σ and σ̄ in T + . Then, to any such map, we
can associate a linear map Γg : T → T by

Γg τ = (I ⊗ g)∆τ . (15.13)

In principle, this definition makes sense for every g ∈ (T + )∗ . However, it turns out
that the set of such maps with g ∈ G+ forms a group, which is our structure group
G.
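
On the basic symbols, the definition (15.13) can be evaluated by hand. The following worked computation only uses the formulas for ∆1, ∆Ξ and ∆X_i given above, together with the assumption g(1) = 1 (multiplicativity forces g(1) ∈ {0, 1}, and g(1) = 0 would make g vanish identically).

```latex
\Gamma_g 1 = (I \otimes g)(1 \otimes 1) = g(1)\, 1 = 1 , \qquad
\Gamma_g \Xi = (I \otimes g)(\Xi \otimes 1) = \Xi , \\
\Gamma_g X_i = (I \otimes g)(X_i \otimes 1 + 1 \otimes X_i) = X_i + g(X_i)\, 1 .
```

In other words, on the polynomial part Γ_g acts as a translation by the vector with components g(X_i).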
Furthermore, there exists a linear map ∆+ : T + → T + ⊗ T + such that

(∆ ⊗ I)∆ = (I ⊗ ∆+ )∆ , ∆+ (σσ̄) = ∆+ σ · ∆+ σ̄ . (15.14)

With this map at hand, we can define a product ◦ on the dual of T + by

(f ◦ g)(σ) = (f ⊗ g)∆+ σ .

This has the property that Γf ◦g = Γf ◦ Γg , with the symbol ◦ on the right denoting
the composition of linear maps as usual. The second identity of (15.14) furthermore
ensures that if f and g belong to G^+, then f ◦ g ∈ G^+. It also turns out that every
f ∈ G^+ admits a unique inverse f^{-1} such that f^{-1} ◦ f = f ◦ f^{-1} = e, where
e : T^+ → R maps every basis vector of the form (15.11) to zero, except for e(1) = 1.
The element e is neutral in the sense that Γe is the identity operator.
It is a highly non-trivial fact [Hai14c, Sec. 8] that if Π comes from an admissible
model as in Definition 14.13 and we define F_x : T → T by

F_x = Γ_{f_x} ,

with f_x given by (15.12), then Π_z F_z^{-1} is independent of the space-time point z. In
particular, for any admissible model, it turns out that Γ is determined by Π through
the identity
Γ_xy = F_x^{-1} F_y = Γ_{γ_xy} , γ_xy = f_x^{-1} ◦ f_y .
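
As a sanity check, this identity can be verified by hand on the polynomial symbols. The sketch below relies on two facts not spelled out in the text: the formula ∆^+ X_i = X_i ⊗ 1 + 1 ⊗ X_i, which is forced by the first identity in (15.14) applied to X_i, and the polynomial model (Π_x X_i)(z) = z_i − x_i with Π_x 1 = 1.

```latex
f_x(X_i) = -x_i , \qquad f_x^{-1}(X_i) = x_i , \qquad
\gamma_{xy}(X_i) = (f_x^{-1} \otimes f_y)\Delta^+ X_i = x_i - y_i , \\
\Gamma_{xy} X_i = X_i + (x_i - y_i)\, 1 , \qquad
\bigl(\Pi_x \Gamma_{xy} X_i\bigr)(z) = (z_i - x_i) + (x_i - y_i) = z_i - y_i = \bigl(\Pi_y X_i\bigr)(z) .
```

This is the familiar re-expansion of a monomial around a different base point, recovered here from the abstract group structure.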

While this is a very nice coherent algebraic framework, it begs the question whether
in general there do even exist any non-trivial admissible models. This is a valid
question since the analytic bounds and algebraic identities that any admissible model
should satisfy are extremely stringent. The next section shows that fortunately there
exists a very rich class of admissible models.

15.4 Canonical lifts of regular functions

Given any sufficiently regular function ξ (say a continuous space-time function),
there is then a canonical way of lifting ξ to a model ιξ = (Π, Γ) for T by setting

(Π_z X^k)(z̄) = (z̄ − z)^k ,   (Π_z Ξ)(z̄) = ξ(z̄) ,

and then recursively by

(Π_z τ τ̄)(z̄) = (Π_z τ)(z̄) · (Π_z τ̄)(z̄) , (15.15)

as well as (14.5). Here we used z and z̄ as notations for generic space-time points in
order not to overload the notations. The maps Γxy are then determined from Π by
the discussion in the previous subsection.
With such a model ιξ at hand, it follows from (15.15), (13.25), and the admissibil-
ity of ιξ that the associated reconstruction operator satisfies the properties

RKf = K ∗ Rf , R(f g) = Rf · Rg ,

as long as all the functions to which R is applied belong to D^γ for some γ > 0. As a
consequence, applying the reconstruction operator R to both sides of (15.7), we see
that if H solves (15.7) then, provided that the model (Π, Γ ) = ιξ was built as above
starting from any continuous realisation ξ of the driving noise, the function h = RH
solves the equation (15.1).
At this stage, the situation is as follows. For any continuous realisation ξ of the
driving noise, we have factorized the solution map (h_0, ξ) ↦ h associated to (15.1)
into maps
(h_0, ξ) ↦ (h_0, ιξ) ↦ H ↦ h = RH ,
where the middle arrow corresponds to the solution to (15.7) in some weighted
D^γ-space. The advantage of such a factorisation is that the last two arrows yield
continuous maps, even in topologies sufficiently weak to be able to describe driving
noise having the lack of regularity of space-time white noise. The only arrow that
isn’t continuous in such a weak topology is the first one. At this stage, it should
be believable that a similar construction can be performed for a very large class of
semilinear stochastic PDEs, provided that certain scaling properties are satisfied.
This is indeed the case and large parts of this programme have been carried out in
[Hai14c].

Given this construction, one is led naturally to the following question: given
a sequence ξ_ε of “natural” regularisations of space-time white noise, for example
as in (15.6), do the lifts ιξ_ε converge in probability in a suitable space of admissible
models? Unfortunately, unlike in the theory of rough paths where this is very often
the case (see Section 10), the answer to this question in the context of SPDEs is often
an emphatic no. Indeed, if it were the case for the KPZ equation, then one would
have been able to choose the constant C_ε to be independent of ε in (15.6), which is
certainly not the case.

15.5 Renormalisation of the KPZ equation

One way of circumventing the fact that ιξε does not converge to a limiting model as
ε → 0 is to consider instead a sequence of renormalised models. The main idea is
to exploit the fact that our abstract definitions of a model do not impose the identity
(15.15), even in situations where ξ itself happens to be a continuous function. One
question that then imposes itself is: what are the natural ways of “deforming” the
usual product which still lead to lifts to an admissible model? It turns out that the
regularity structure whose construction was sketched above comes equipped with
a natural finite-dimensional group of continuous transformations R on its space of
admissible models (henceforth called the “renormalisation group”), which essentially
amounts to the space of all natural deformations of the product. It then turns out that
even though ιξε does not converge, it is possible to find a sequence Mε of elements in
R such that the sequence Mε ιξε converges to a limiting model (Π̂, Γ̂ ). Unfortunately,
the elements M_ε do not preserve the image of ι in the space of admissible models.
As a consequence, when solving the fixed point map (15.7) with respect to the model
Mε ιξε and inserting the solution into the reconstruction operator, it is not clear a
priori that the resulting function (or distribution) can again be interpreted as the
solution to some modified PDE. It turns out that in our case, at least for a suitable
subgroup of R, this is again the case and the modified equation is precisely given
by (15.6), where Cε is some linear combination of the constants appearing in the
description of Mε .
There are now three questions that remain to be answered:
1. How does one construct the renormalisation group R?
2. How does one derive the new equation obtained when renormalising a model?
3. What is the right choice of Mε ensuring that the renormalised models converge?

15.5.1 The renormalisation group

How does all this help with the identification of a natural class of deformations for
the usual product? First, it turns out that for every continuous function ξ, if we denote
again by (Π, Γ ) the model ιξ, then the linear map Π : T → C given by

Π = Π_y F_y^{-1} , (15.16)

which is independent of the choice of y by the above discussion, is given by

(ΠX^k)(x) = x^k ,   (ΠΞ)(x) = ξ(x) , (15.17)

and then recursively by

Πτ τ̄ = Πτ · Πτ̄ , ΠIτ = K ∗ Πτ . (15.18)

Note that this is very similar to the definition of ιξ, with the notable exception that
(14.5) is replaced by the more “natural” identity ΠIτ = K ∗ Πτ . It turns out
that the knowledge of Π and the knowledge of (Π, Γ ) are equivalent since one has
Πx = ΠFx and the map Fx can be recovered from Πx by (15.12). (This argument
appears circular but it is possible to put a suitable recursive structure on T and T +
ensuring that this actually works.) Furthermore, the translation (Π, Γ ) ↔ Π actually
works for any admissible model and does not at all rely on the fact that it was built by
lifting a continuous function. However, in the general case, the first identity in (15.18)
does of course not make any sense anymore and might fail even if the coordinates of
Π consist of continuous functions.
At this stage we note that if ξ happens to be a stationary stochastic process
and Π is built from ξ by following the above procedure, then Πτ is a stationary
stochastic process for every τ ∈ T . In order to define R, it is natural to consider only
transformations of the space of admissible models that preserve this property. Since
we are not in general allowed to multiply components of Π, the only remaining
operation is to form linear combinations. It is therefore natural to describe elements
of R by linear maps M : T → T and to postulate their action on admissible models
by Π ↦ Π^M with
Π^M τ = ΠMτ . (15.19)
It is not clear a priori whether given such a map M and an admissible model (Π, Γ)
there is a coherent way of building a new model (Π^M, Γ^M) such that Π^M is the map
associated to (Π^M, Γ^M) as above. It turns out that one has the following statement:

Proposition 15.7. In the above context, for every linear map M : T → T commuting
with I and multiplication by X^k, there exist unique linear maps ∆^M : T → T ⊗ T^+
and ∆̂^M : T^+ → T^+ ⊗ T^+ such that if we set

Π_x^M τ = (Π_x ⊗ f_x)∆^M τ ,   γ_xy^M(σ) = (γ_xy ⊗ f_x)∆̂^M σ ,

then Π_x^M satisfies again (14.5) and the identity Π_x^M Γ_xy^M = Π_y^M.

At this stage it may look like any linear map M : T → T commuting with I and
multiplication by X^k yields a transformation on the space of admissible models by
Proposition 15.7. This however is not true since we have completely disregarded the
analytical bounds that every model has to satisfy. It is clear from Definition 13.6 that
in the absence of any additional knowledge these are satisfied if and only if Π_x^M τ is

a linear combination of the Πx τ̄ for some symbols τ̄ with |τ̄ | ≥ |τ |. This suggests
the following definition.

Definition 15.8. The renormalisation group R consists of the set of linear maps
M : T → T commuting with I, I′, and with multiplication by X^k, such that for
τ ∈ T_α one has
∆^M τ − τ ⊗ 1 ∈ T_{>α} ⊗ T^+ . (15.20)
Its action on the space of admissible models is given by Proposition 15.7.

Remark 15.9. In principle, one should of course also impose that

∆̂^M σ − σ ⊗ 1 ∈ T^+_{>α} ⊗ T^+ .

However, it turns out that this is always the case, provided that ∆^M satisfies (15.20).
The reason for this is that it is possible to verify that one always has the identity
∆̂^M J_k(τ) = (J_k ⊗ I)∆^M τ.

15.5.2 The renormalised equations

In the case of the KPZ equation, it turns out that we need a three-parameter subgroup
of R to renormalise the equations, but in order to explain the procedure we
will consider a larger 4-dimensional subgroup of R. More precisely, we consider
elements M ∈ R of the form M = exp(−∑_{i=0}^{3} C_i L_i), where the generators L_i are
determined by the following contraction rules:

L_0 : ↦ 1 , L_1 : ↦ 1 , L_2 : ↦ 1 , L_3 : ↦ 1 . (15.21)

This should be understood in the sense that if τ is an arbitrary formal expression,
then L_0 τ is the sum of all formal expressions obtained from τ by performing a
substitution of the type ↦ 1. For example, one has

L_0 = 2 , L_0 = 2 + ,

etc. The extension of the other operators L_i to all of T proceeds in principle along
the same lines. However, as a consequence of the fact that I(1) = I′(1) = 0 by
construction, it actually turns out that L_i τ = 0 for i ≠ 0 and every τ for which L_i
wasn’t defined in (15.21). It is possible to verify that one has the following result.

Proposition 15.10. The linear maps M of the type just described belong to R. Fur-
thermore, if (Π, Γ ) is an admissible model such that Πx τ is a continuous function
for every τ ∈ T , then one has the identity

(Π_x^M τ)(x) = (Π_x Mτ)(x) . (15.22)

Remark 15.11. Note that it is the same value x that appears twice on each side of
(15.22). It is in fact not the case that one has Π_x^M τ = Π_x Mτ in general! However,
the identity (15.22) is all we need to derive the renormalised equation.

It is now rather straightforward to show the following:

Proposition 15.12. Let M = exp(−∑_{i=0}^{3} C_i L_i) as above and let (Π^M, Γ^M) =
M ιξ for some smooth function ξ. Let furthermore H be the solution to (15.7) with
respect to the model (Π^M, Γ^M). Then, writing R^M for the reconstruction operator
associated to this renormalised model, the function h(t, x) = (R^M H)(t, x) solves
the equation

∂_t h = ∂_x^2 h + (∂_x h)^2 − 4C_0 ∂_x h + ξ − (C_1 + C_2 + 4C_3) .

Proof. By Theorem 14.5, it turns out that (15.7) can be solved in D^γ as soon as γ is
a little bit greater than 3/2. Therefore, we only need to keep track of its solution H
up to terms of homogeneity 3/2. By repeatedly applying the identity (15.8), we see
that the solution H ∈ D^γ for γ close enough to 3/2 is necessarily of the form

H = h 1 + + + h′ X_1 + 2 + 2h′ ,

for some real-valued functions h and h′. (Note that h′ is treated as an independent
function here, we certainly do not suggest that the function h is differentiable! Our
notation is only by analogy with the classical Taylor expansion...) As an immediate
consequence, ∂H is given by

∂H = + + h′ 1 + 2 + 2h′ , (15.23)

as an element of D^γ for γ sufficiently close to 1/2. Similarly, the right hand side of
the equation is given up to order 0 by

(∂H)^2 + Ξ = Ξ + + 2 + 2h′ + + 4 + 2h′ + 4h′ + (h′)^2 1 . (15.24)

It follows from the definition of M that one then has the identity

M ∂H = ∂H − 4C_0 ,

so that, as an element of D^γ with very small (but positive) γ, one has the identity

(M ∂H)^2 = (∂H)^2 − 8C_0 .

As a consequence, after neglecting all terms of strictly positive order, one has the
identity (writing c instead of c 1 for real constants c)

M((∂H)^2 + Ξ) = ((∂H)^2 + Ξ) − C_0 (4 + 4 + 8 + 4h′ 1) − C_1 − C_2 − 4C_3
= ((M ∂H)^2 + Ξ) − 4C_0 M ∂H − (C_1 + C_2 + 4C_3) .

Combining this with (15.22), the claim now follows at once. □

Remark 15.13. It turns out that, thanks to the symmetry x ↦ −x enjoyed by our
problem, the corresponding model can be renormalised by a map M as above, but
with C0 = 0. The reason why we considered the general case here is twofold. First,
it shows that it is possible to obtain renormalised equations that differ from the
original equation in a more complicated way than just by the addition of a large
constant. Second, it is plausible that if one tries to approximate the KPZ equation by
a microscopic model which is not symmetric under space inversion, then the constant
C0 could play a non-trivial role.

15.5.3 Convergence of the renormalised models

It remains to argue why one expects to be able to find constants C_i^ε such that the
sequence of renormalised models M^ε ιξ_ε with M^ε = exp(−∑_{i=1}^{3} C_i^ε L_i) converges
to a limiting model. Instead of considering the actual sequence of models, we only
consider the sequence of stationary processes Π̂^ε τ := Π^ε M^ε τ, where Π^ε is
associated to (Π^ε, Γ^ε) = ιξ_ε as in Section 15.5.1.

Remark 15.14. It is important to note that we do not attempt here to give a full proof
that the renormalised model converges to a limit in the correct topology for the space
of admissible models. We only aim to argue that it is plausible that Π̂^ε converges
to a limit in some topology. A full proof of convergence (but in a slightly different
setting) can be found in [Hai13], see also [Hai14c, Section 10].

Since there are general arguments available to deal with all the expressions τ
of positive homogeneity as well as expressions of the type I′(τ) and Ξ itself, we
restrict ourselves to those that remain. Inspecting (15.10), we see that they are given
by
, , , , .
For this part, some elementary notions from the theory of Wiener chaos expansions
are required, but we’ll try to hide this as much as possible. At a formal level, one has
the identity
Π^ε  = K′ ∗ ξ_ε = K′_ε ∗ ξ ,
where the kernel K′_ε is given by K′_ε = K′ ∗ δ_ε. This shows that, at least formally,
one has

(Π^ε )(z) = (K′ ∗ ξ_ε)(z)^2 = ∬ K′_ε(z − z_1) K′_ε(z − z_2) ξ(z_1) ξ(z_2) dz_1 dz_2 .

Similar but more complicated expressions can be found for any formal expression τ .
This naturally leads to the study of random variables of the type
I_k(f) = ∫ ··· ∫ f(z_1, . . . , z_k) ξ(z_1) ··· ξ(z_k) dz_1 ··· dz_k . (15.25)

Ideally, one would hope to have an Itô isometry of the type E I_k(f) I_k(g) =
⟨f^sym, g^sym⟩, where ⟨·, ·⟩ denotes the L^2-scalar product and f^sym denotes the
symmetrisation of f. This is unfortunately not the case. Instead, one should replace the
products in (15.25) by Wick products, which are formally generated by all possible
contractions of the type

ξ(z_i) ξ(z_j) ↦ ξ(z_i) ⋄ ξ(z_j) + δ(z_i − z_j) .

If we then set

Î_k(f) = ∫ ··· ∫ f(z_1, . . . , z_k) ξ(z_1) ⋄ ··· ⋄ ξ(z_k) dz_1 ··· dz_k ,

one has indeed

E Î_k(f) Î_k(g) = ⟨f^sym, g^sym⟩ .
Furthermore, one has equivalence of moments in the sense that, for every k > 0 and
p > 0 there exists a constant C_{k,p} such that

E|Î_k(f)|^p ≤ C_{k,p} ‖f^sym‖^p .

Finally, one has E Î_k(f) Î_ℓ(g) = 0 if k ≠ ℓ. Random variables of the form Î_k(f) for
some k ≥ 0 and some square integrable function f are said to belong to the kth
homogeneous Wiener chaos.
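
For k = 2 the contraction rule can be applied by hand. The following short worked case uses only the rule above and the orthogonality of chaoses of different order; f stands for any kernel for which the diagonal integral makes sense, which is an assumption of this sketch.

```latex
I_2(f) = \iint f(z_1,z_2)\,\xi(z_1)\,\xi(z_2)\,dz_1\,dz_2
       = \iint f(z_1,z_2)\,\xi(z_1)\diamond\xi(z_2)\,dz_1\,dz_2 + \int f(z,z)\,dz
       = \hat I_2(f) + \int f(z,z)\,dz ,
\qquad
\mathbf{E}\, I_2(f) = \int f(z,z)\,dz .
```

The naive double integral therefore differs from its Wick-ordered version by a deterministic constant, the integral of f over the diagonal. Constants of exactly this kind are what the renormalisation has to remove.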
Returning to our problem, we first argue that it should be possible to choose M^ε
in such a way that Π̂^ε  converges to a limit as ε → 0. The above considerations
suggest that one should rewrite Π^ε  as

(Π^ε )(z) = (K′ ∗ ξ_ε)(z)^2 (15.26)
= ∬ K′_ε(z − z_1) K′_ε(z − z_2) ξ(z_1) ⋄ ξ(z_2) dz_1 dz_2 + C_ε^{(1)} ,

where the constant C_ε^{(1)} is given by the contraction

C_ε^{(1)} :=  = ∫ (K′_ε(z))^2 dz .

Note now that K′_ε is an ε-approximation of the kernel K′ which has the same singular
behaviour as the derivative of the heat kernel. In terms of the parabolic distance, the
singularity of the derivative of the heat kernel scales like K′(z) ∼ |z|^{−2} for z → 0.
(Recall that we consider the parabolic distance |(t, x)| = √|t| + |x|, so that this is
consistent with the fact that the derivative of the heat kernel is bounded by t^{−1}.) This
suggests that one has (K′_ε(z))^2 ∼ |z|^{−4} for |z| ≫ ε. Since parabolic space-time has
scaling dimension 3 (time counts double!), this is a non-integrable singularity. As a
matter of fact, there is a whole power of z missing to make it borderline integrable,
which suggests that one has

C_ε^{(1)} ∼ 1/ε .
This already shows that one should not expect Π^ε  to converge to a limit as ε → 0.
However, it turns out that the first term in (15.26) converges to a distribution-valued
stationary space-time process, so that one would like to somehow get rid of this
diverging constant C_ε^{(1)}. This is exactly where the renormalisation map M^ε (in
particular the factor exp(−C_1 L_1)) enters into play. Following the above definitions,
we see that one has

(Π̂^ε )(z) = (Π^ε M )(z) = (Π^ε )(z) − C_1 .

This suggests that if we make the choice C_1 = C_ε^{(1)}, then Π̂^ε  does indeed converge
to a non-trivial limit as ε → 0. This limit is a distribution given, at least formally, by
(Π̂ )(ψ) = ∬ ( ∫ ψ(z) K′(z − z_1) K′(z − z_2) dz ) ξ(z_1) ⋄ ξ(z_2) dz_1 dz_2 .

Using again the scaling properties of the kernel K′, it is not too difficult to show that
this yields indeed a random variable belonging to the second homogeneous Wiener
chaos for every choice of smooth test function ψ.
The case τ =  is treated in a somewhat similar way. This time one has

(Π^ε )(z) = (K′ ∗ ξ_ε)(z) (K′ ∗ K′ ∗ ξ_ε)(z)
= ∬ K′_ε(z − z_1) (K′ ∗ K′_ε)(z − z_2) ξ(z_1) ⋄ ξ(z_2) dz_1 dz_2 + C_ε^{(0)} ,

where the constant C_ε^{(0)} is given by the contraction

C_ε^{(0)} :=  = ∫ K′_ε(z) (K′ ∗ K′_ε)(z) dz .

This time however K′_ε is an odd function (in the spatial variable) and K′ ∗ K′_ε is an
even function, so that C_ε^{(0)} vanishes for every ε > 0. This is why we can set C_0 = 0
and no renormalisation is required for .
Turning to our list of terms of negative homogeneity, it remains to consider ,
, and . It turns out that the latter two are the more difficult ones, so we only
discuss these. Let us first argue why we expect to be able to choose the constant C_2
in such a way that Π̂^ε  converges to a limit. In this case, the “bad” term comes
from the part of (Π^ε )(z) belonging to the homogeneous chaos of order 0. This is
simply a constant, which is given by

C_ε^{(2)} := 2  = 2 ∫ K′(z) K′(z̄) Q_ε^2(z − z̄) dz dz̄ , (15.27)

where the kernel Q_ε is given by

Q_ε(z) = ∫ K′_ε(z̄) K′_ε(z̄ − z) dz̄ .

Remark 15.15. The factor 2 comes from the fact that the contraction (15.27) appears
twice, since it is equal to the contraction . In principle, one would think that the
contraction  also contributes to C_ε^{(2)}. This term however vanishes due to the fact
that the integral of K′_ε vanishes.

Since K′_ε is an ε-mollification of a kernel with a singularity of order −2 and
the scaling dimension of the underlying space is 3, we see that Q_ε behaves like an
ε-mollification of a kernel with a singularity of order −2 − 2 + 3 = −1 at the origin.
As a consequence, the singularity of the integrand in (15.27) is of order −6, which
gives rise to a logarithmic divergence as ε → 0. This suggests that one should choose
C_2 = C_ε^{(2)} in order to cancel out this diverging term and obtain a non-trivial limit
for Π̂^ε  as ε → 0. This is indeed the case.
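
The power counting used for C_ε^{(1)} and C_ε^{(2)} can be summarised in a small Python sketch. It is a heuristic only: it encodes the rule "singularity order plus scaling dimension decides the divergence" and ignores cancellations, such as the symmetry argument that makes C_ε^{(0)} vanish. The function names and the returned strings are hypothetical choices made for this illustration.

```python
def convolution_order(a, b, dim):
    """Heuristic order of the singularity of the convolution of two kernels
    with singularities of order a and b in scaling dimension dim
    (meaningful when a, b > -dim and a + b + dim < 0)."""
    return a + b + dim


def divergence(order, dim):
    """Naive divergence rate as eps -> 0 of the integral, over a region of
    scaling dimension dim, of an eps-mollified integrand of the given order."""
    degree = order + dim
    if degree > 0:
        return "finite"
    if degree == 0:
        return "log(1/eps)"
    return f"eps^{degree}"


DIM = 3  # parabolic space-time in one space dimension: time counts double

c1 = divergence(2 * (-2), DIM)               # integrand (K'_eps)^2, order -4
q = convolution_order(-2, -2, DIM)           # kernel Q_eps, order -1
c2 = divergence(2 * (-2) + 2 * q, 2 * DIM)   # K'(z) K'(zbar) Q_eps^2 over (z, zbar)
```

Here `c1` reproduces the 1/ε behaviour of C_ε^{(1)} and `c2` the logarithmic divergence of C_ε^{(2)}, the latter being an integral over two space-time variables and hence over scaling dimension 6.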
We finally turn to the case τ = . In this case, there are “bad” terms appearing in
the Wiener chaos decomposition of Π^ε  both in the second and the zeroth Wiener
chaos. This time, the constant appearing in the zeroth Wiener chaos is given by

C_ε^{(3)} := 2  = 2 ∫ K′(z) K′(z̄) Q_ε(z̄) Q_ε(z + z̄) dz dz̄ ,

which diverges logarithmically for exactly the same reason as C_ε^{(2)}. Setting C_3 =
C_ε^{(3)}, this diverging constant can again be cancelled out. The combinatorial factor 2
arises in essentially the same way as for  and the contribution of the term where
the two top nodes are contracted vanishes for the same reason as previously.
It remains to consider the contribution of Π^ε  to the second Wiener chaos. This
contribution consists of three terms, which correspond to the contractions
It turns out that the first one of these terms does not give rise to any singularity. The
last two terms can be treated in essentially the same way, so we focus on the last one,
which we denote by η^ε. For fixed ε, the distribution (actually smooth function) η^ε is
given by

η^ε(ψ) = ∫ ψ(z_0) K′(z_0 − z_1) Q_ε(z_0 − z_1) K′(z_2 − z_1)
× K′_ε(z_3 − z_2) K′_ε(z_4 − z_2) ξ(z_3) ⋄ ξ(z_4) dz .

The problem with this is that as ε → 0, the product Q̂_ε := K′Q_ε converges to a
kernel Q̂ = K′Q, which has a non-integrable singularity at the origin. In particular,
it is not clear a priori whether the action of integrating a test function against Q̂_ε
converges to a limiting distribution as ε → 0. Our saving grace here is that since Q_ε
is even and K′ is odd, the kernel Q̂_ε integrates to 0 for every fixed ε.

This is akin to the problem of making sense of the “Cauchy principal value”
distribution, which formally corresponds to the integration against 1/x. For the sake
of the argument, let us consider a function W : R → R which is compactly supported
and smooth everywhere except at the origin, where it diverges like |W (x)| ∼ 1/|x|.
It is then natural to associate to W a “renormalised” distribution RW given by

(RW)(ϕ) = ∫ W(x) (ϕ(x) − ϕ(0)) dx .

Note that RW has the property that if ϕ(0) = 0, then it simply corresponds to
integration against W, which is the standard way of associating a distribution to
a function. Furthermore, the above expression is always well-defined, since ϕ is
smooth and therefore the factor (ϕ(x) − ϕ(0)) cancels out the singularity of W at
the origin. It is also straightforward to verify that if W_ε is a sequence of smooth
approximations to W (say one has W_ε(x) = W(x) for |x| > ε and |W_ε| ≲ 1/ε
otherwise) which has the property that each W_ε integrates to 0, then W_ε → RW in
a distributional sense.
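
The convergence W_ε → RW can be checked numerically in a toy case. The sketch below makes several illustrative assumptions that are not in the text: W(x) = 1/x on [−1, 1] and zero elsewhere (odd, so its truncations integrate to 0, but not smooth at ±1), W_ε obtained by setting W to zero on [−ε, ε] rather than by smoothing, a plain midpoint rule for the integrals, and ϕ = exp as a stand-in for a test function, since only its values on [−1, 1] matter.

```python
import math


def W(x):
    """Toy kernel: 1/x on [-1, 1], zero elsewhere (assumed for illustration)."""
    return 1.0 / x if abs(x) <= 1.0 else 0.0


def midpoint(f, a, b, n=20000):
    """Midpoint rule on [a, b]; never evaluates f at the endpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def renormalised(phi):
    """RW(phi): integral of W(x) * (phi(x) - phi(0)) over [-1, 1]."""
    g = lambda x: W(x) * (phi(x) - phi(0.0))
    # Split at 0 so that no midpoint lands on the singularity.
    return midpoint(g, -1.0, 0.0) + midpoint(g, 0.0, 1.0)


def truncated(phi, eps):
    """Integral of W_eps * phi, where W_eps equals W outside [-eps, eps]
    and vanishes inside, so that W_eps integrates to 0 by oddness."""
    g = lambda x: W(x) * phi(x)
    return midpoint(g, -1.0, -eps) + midpoint(g, eps, 1.0)
```

Comparing `truncated(math.exp, eps)` with `renormalised(math.exp)` for decreasing `eps` shows the two values approaching each other, with a gap of order ε in this toy case.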
In the same way, one can show that Q̂_ε converges as ε → 0 to a limiting distribution
R Q̂. As a consequence, one can show that η^ε converges to a limiting (random)
distribution η given by

η(ψ) = ∫ ψ(z_0) R Q̂(z_0 − z_1) K′(z_2 − z_1) K′(z_3 − z_2) K′(z_4 − z_2) ξ(z_3) ⋄ ξ(z_4) dz .

It should be clear from this whole discussion that while the precise values of the
constants Ci depend on the details of the mollifier δε , the limiting (random) model
(Π̂, Γ̂ ) obtained in this way is independent of it. Combining this with the continuity
of the solution to the fixed point map (15.7) and of the reconstruction operator R
with respect to the underlying model, we see that the statement of Theorem 15.1
follows almost immediately.

15.6 The KPZ equation and rough paths

In the particular case of the KPZ equation, it turns out that it is possible to give a robust
solution theory by only using “classical” controlled rough path theory, as exposed in
the earlier part of this book. This is actually how it was originally treated in [Hai13].
To see how this can be the case, we make the following crucial remarks:
1. First, looking at the expression (15.23) for ∂H, we see that most symbols come
with constant coefficients. The only non-constant coefficients that appear are in
front of the term 1, which is some kind of renormalised value for ∂H, and in front
of the term . This suggests that the problem of finding a solution h to the KPZ
equation (or equivalently a solution h′ to the corresponding Burgers equation) can
be simplified considerably by considering instead the function v given by

v = ∂x h − Π + +2 , (15.28)

where Π is the operator given in (15.16).

2. The only symbol τ appearing in ∂H such that |τ| + | | < 0 is the symbol .
Furthermore, one has

∆1 = 1 ⊗ 1 , ∆ = ⊗ 1 + 1 ⊗ J′( ) ,
∆ = ⊗ 1 , ∆ = ⊗ 1 + ⊗ J′( ) .

It then follows from this and the definition (15.13) of the structure group G that
the space ⟨ , , 1, ⟩ ⊂ T is invariant under the action of G. Furthermore, its
action on this subspace is completely described by one real number corresponding
to J′( ). Finally, viewing this subspace as a regularity structure in its own right,
we see that it is nothing but the regularity structure of Section 13.3.2, provided
that we make the identifications ∼ Ẇ , ∼ W , and ∼ Ẇ.
3. One has the identities

∆ = ⊗ 1 + ⊗ J′( ) , ∆ = ⊗ 1 + ⊗ J′( ) ,

so that the pair of symbols { , } could also have played the role of {W , Ẇ}
in the previous remark.
Let now ξ be a smooth function and let h be given by the solution to the unrenor-
malised KPZ equation (15.1). Defining Π by ΠΞ = ξ and then recursively as in
(15.18), and defining v by (15.28), we then obtain for v the equation

∂_t v = ∂_x^2 v + ∂_x (v Π ) + 4 Π  + R , (15.29)

where the “remainder” R belongs to C^α for every α < −1. Similarly to before, it also
turns out that if we replace Π by Π̂ = Π^M defined as in (15.19) (with C_0 = 0) and
take h to be the solution to the renormalised KPZ equation (15.6) with C_ε = C_1 + C_2 + 4C_3,
then v also satisfies (15.29), but with Π replaced by the renormalised model Π̂.
We are now in the following situation. As a consequence of (15.23) we can guess
that for any fixed time t, the solution v should be controlled by the function Π̂ ,
which we can interpret as one component (say W 1 ) of some rough path (W, W).
Note that here the spatial variable plays the role of time! The time variable merely
plays the role of a parameter, so we really have a family of rough paths indexed
by time. Furthermore, Π̂ can be interpreted as the distributional derivative of
another component (say W 0 ) of the rough path W . Finally, the function Π̂ can be
interpreted as a third component W 2 of W .
As a consequence of the second and third remarks above, the two distributions
Π̂ and Π̂ can then be interpreted as the distributional derivatives of the “iterated
integrals” W1,0 and W2,1 . It follows automatically from these algebraic relations
combined with the analytic bounds (13.13) that W1,0 and W2,1 then satisfy the
required estimates (2.3). Our model does not provide any values for W1,2 , but these
turn out not to be required. Assuming that v is indeed controlled by X1 = Π̂ , it

is then possible to give meaning to the term v Π  appearing in (15.29) by using
“classical” rough integration.
As a consequence, we then see that the right hand side of (15.29) is of the form
∂x2 Y , for some function Y controlled by W 0 . One of the main technical results of
[Hai13] guarantees that if Z solves

∂t Z = ∂x2 Z + ∂x2 Y ,

and Y is controlled by W^0, then Z is necessarily controlled by W^1 = Π̂ . This
“closes the loop” and allows one to set up a fixed point equation for v that is stable as
a function of the underlying model Π̂ and therefore also allows one to deal with the
limiting case of the KPZ equation driven by space-time white noise.
References

[Alm66] F. J. Almgren, Jr. Plateau’s problem: An invitation to varifold geometry. W. A.
Benjamin, Inc., New York-Amsterdam, 1966.
[AR91] S. Albeverio and M. Röckner. Stochastic differential equations in infinite dimensions:
solutions via Dirichlet forms. Probab. Theory Related Fields 89, no. 3, (1991),
347–386. doi:10.1007/BF01198791.
[Bal00] E. J. Balder. Lectures on Young measure theory and its applications in economics.
Rend. Istit. Mat. Univ. Trieste 31, no. suppl. 1, (2000), 1–69. Workshop on Measure
Theory and Real Analysis (Italian) (Grado, 1997).
[Bau04] F. Baudoin. An introduction to the geometry of stochastic flows. Imperial College
Press, London, 2004.
[BCD11] H. Bahouri, J.-Y. Chemin, and R. Danchin. Fourier analysis and nonlinear partial
differential equations, vol. 343 of Grundlehren der Mathematischen Wissenschaften.
Springer, Heidelberg, 2011.
[BF13] C. Bayer and P. K. Friz. Cubature on Wiener space: pathwise convergence. Appl.
Math. Optim. 67, no. 2, (2013), 261–278. doi:10.1007/s00245-012-9187-8.
[BFH09] E. Breuillard, P. Friz, and M. Huesmann. From random walks to
rough paths. Proc. Amer. Math. Soc. 137, no. 10, (2009), 3487–3496.
doi:10.1090/S0002-9939-09-09930-4.
[BFRS13] C. Bayer, P. K. Friz, S. Riedel, and J. Schoenmakers. From rough path
estimates to multilevel Monte Carlo. ArXiv e-prints (2013). arXiv:1305.5779.
[BG97] L. Bertini and G. Giacomin. Stochastic Burgers and KPZ equations from particle
systems. Comm. Math. Phys. 183, no. 3, (1997), 571–607.
[BH91] N. Bouleau and F. Hirsch. Dirichlet forms and analysis on Wiener space, vol. 14
of de Gruyter Studies in Mathematics. Walter de Gruyter & Co., Berlin, 1991.
[BH07] F. Baudoin and M. Hairer. A version of Hörmander’s theorem for the fractional
Brownian motion. Probab. Theory Related Fields 139, no. 3-4, (2007), 373–395.
doi:10.1007/s00440-006-0035-0.
[Bis81a] J.-M. Bismut. Martingales, the Malliavin calculus and Hörmander’s theorem. In
Stochastic integrals (Proc. Sympos., Univ. Durham, Durham, 1980), vol. 851 of Lecture
Notes in Math., 85–109. Springer, Berlin, 1981.
[Bis81b] J.-M. Bismut. Martingales, the Malliavin calculus and hypoellipticity under general
Hörmander’s conditions. Z. Wahrsch. Verw. Gebiete 56, no. 4, (1981), 469–505.
[BM07] R. Buckdahn and J. Ma. Pathwise stochastic control problems and stochastic HJB
equations. SIAM Journal on Control and Optimization 45, no. 6, (2007), 2224–2256.
doi:10.1137/S036301290444335X.
[BMN10] Á. Bényi, D. Maldonado, and V. Naibo. What is . . . a paraproduct? Notices Amer.
Math. Soc. 57, no. 7, (2010), 858–860.

241
242 References

[BNQ13] H. B OEDIHARDJO, H. N I, and Z. Q IAN. Uniqueness of signature for simple curves.

ArXiv e-prints (2013). arXiv:1304.0755.
[Bog98] V. I. B OGACHEV. Gaussian measures, vol. 62 of Mathematical Surveys and Mono-
graphs. American Mathematical Society, Providence, RI, 1998.
[Bon81] J.-M. B ONY. Calcul symbolique et propagation des singularités pour les équations
aux dérivées partielles non linéaires. Ann. Sci. École Norm. Sup. (4) 14, no. 2, (1981),
209–246.
[Bor75] C. B ORELL. The Brunn-Minkowski inequality in Gauss space. Invent. Math. 30, no. 2,
(1975), 207–216.
[BP08] J. B OURGAIN and N. PAVLOVI Ć. Ill-posedness of the Navier-Stokes equations
in a critical space in 3D. J. Funct. Anal. 255, no. 9, (2008), 2233–2247.
doi:10.1016/j.jfa.2008.07.008.
[CC13] R. C ATELLIER and K. C HOUK. Paracontrolled Distributions and the 3-dimensional
Stochastic Quantization Equation. ArXiv e-prints (2013). arXiv:1310.6869.
[CDFO13] D. C RISAN, J. D IEHL, P. K. F RIZ, and H. O BERHAUSER. Robust filtering: correlated
noise and multidimensional observation. Ann. Appl. Probab. 23, no. 5, (2013), 2139–
2160. doi:10.1214/12-AAP896.
[CF09] M. C ARUANA and P. F RIZ. Partial differential equations driven by
rough paths. J. Differential Equations 247, no. 1, (2009), 140–173.
doi:10.1016/j.jde.2009.01.026.
[CF10] T. C ASS and P. F RIZ. Densities for rough differential equations under
Hörmander’s condition. Ann. of Math. (2) 171, no. 3, (2010), 2115–2141.
doi:10.4007/annals.2010.171.2115.
[CFO11] M. C ARUANA, P. K. F RIZ, and H. O BERHAUSER. A (rough) pathwise approach to a
class of non-linear stochastic partial differential equations. Ann. Inst. H. Poincaré Anal.
Non Linéaire 28, no. 1, (2011), 27–46. doi:10.1016/j.anihpc.2010.11.002.
[CFV07] L. C OUTIN, P. F RIZ, and N. V ICTOIR. Good rough path sequences and applications
to anticipating stochastic calculus. Ann. Probab. 35, no. 3, (2007), 1172–1193.
doi:10.1214/009117906000000827.
[CFV09] T. C ASS, P. F RIZ, and N. V ICTOIR. Non-degeneracy of Wiener functionals arising from
rough differential equations. Trans. Amer. Math. Soc. 361, no. 6, (2009), 3359–3371.
doi:10.1090/S0002-9947-09-04677-7.
[Che54] K.-T. C HEN. Iterated integrals and exponential homomorphisms. Proc. London Math.
Soc. (3) 4, (1954), 502–512.
[Che13] I. C HEVYREV. Unitary representations of geometric rough paths. ArXiv e-prints (2013).
arXiv:1307.3580.
[CHLT12] T. C ASS, M. H AIRER, C. L ITTERER, and S. T INDEL. Smoothness of the den-
sity for solutions to Gaussian Rough Differential Equations. ArXiv e-prints (2012).
arXiv:1209.3100. Ann. Probab., to appear.
[CIL92] M. G. C RANDALL, H. I SHII, and P.-L. L IONS. User’s guide to viscosity solutions
of second order partial differential equations. Bull. Amer. Math. Soc. (N.S.) 27, no. 1,
(1992), 1–67.
[CLL13] T. C ASS, C. L ITTERER, and T. LYONS. Integrability and tail estimates for Gaus-
sian rough differential equations. Ann. Probab. 41, no. 4, (2013), 3026–3050.
doi:doi:10.1214/12-AOP821.
[Col51] J. D. C OLE. On a quasi-linear parabolic equation occurring in aerodynamics. Quart.
Appl. Math. 9, (1951), 225–236.
[Cor12] I. C ORWIN. The Kardar-Parisi-Zhang equation and universality class. Random Matrices
Theory Appl. 1, no. 1, (2012), 1130001, 76. doi:10.1142/S2010326311300014.
[CQ02] L. C OUTIN and Z. Q IAN. Stochastic analysis, rough path analysis and fractional
Brownian motions. Probab. Theory Related Fields 122, no. 1, (2002), 108–140.
[Dau88] I. DAUBECHIES. Orthonormal bases of compactly supported wavelets. Comm. Pure
Appl. Math. 41, no. 7, (1988), 909–996. doi:10.1002/cpa.3160410705.
References 243

[Dav08] A. M. DAVIE. Differential equations driven by rough paths: an approach via dis-
crete approximation. Appl. Math. Res. Express. AMRX 2008, no. 2, (2008), 1–40.
doi:10.1093/amrx/abm009.
[Der10] S. D EREICH. Rough paths analysis of general Banach space-valued Wiener processes.
J. Funct. Anal. 258, no. 9, (2010), 2910–2936. doi:10.1016/j.jfa.2010.01.018.
[DF12] J. D IEHL and P. F RIZ. Backward stochastic differential equations with rough drivers.
Ann. Probab. 40, no. 4, (2012), 1715–1758. doi:10.1214/11-AOP660.
[DFG13] J. D IEHL, P. F RIZ, and P. G ASSIAT. Stochastic control with rough paths. ArXiv e-prints
(2013). arXiv:1303.7160.
[DFM13] J. D IEHL, P. F RIZ, and H. M AI. Pathwise stability of likelihood estimators for
diffusions via rough paths. ArXiv e-prints (2013). arXiv:1311.1061.
[DFO14] J. D IEHL, P. K. F RIZ, and H. O BERHAUSER. Regularity theory for rough partial
differential equations and parabolic comparison revisited. Preprint; earlier version
available on arXiv (2014).
[DFS14] J. D IEHL, P. F RIZ, and W. S TANNAT. Stochastic partial differential equations: a rough
path view, 2014. Preprint.
[DGT12] A. D EYA, M. G UBINELLI, and S. T INDEL. Non-linear rough heat equa-
tions. Probab. Theory Related Fields 153, no. 1-2, (2012), 97–147.
doi:10.1007/s00440-011-0341-z.
[DNT12a] A. D EYA, A. N EUENKIRCH, and S. T INDEL. A Milstein-type scheme without Lévy
area terms for SDEs driven by fractional Brownian motion. Ann. Inst. Henri Poincaré
Probab. Stat. 48, no. 2, (2012), 518–550. doi:10.1214/10-AIHP392.
[DNT12b] A. D EYA, A. N EUENKIRCH, and S. T INDEL. A Milstein-type scheme without Lévy
area terms for SDEs driven by fractional Brownian motion. Ann. Inst. Henri Poincaré
Probab. Stat. 48, no. 2, (2012), 518–550. doi:10.1214/10-AIHP392.
[DOR13] J. D IEHL, H. O BERHAUSER, and S. R IEDEL. A Levy-area between Brownian motion
and rough paths with applications to robust non-linear filtering and RPDEs. ArXiv
e-prints (2013). arXiv:1301.3799.
[DPD03] G. DA P RATO and A. D EBUSSCHE. Strong solutions to the stochastic quantization equa-
tions. Ann. Probab. 31, no. 4, (2003), 1900–1916. doi:10.1214/aop/1068646370.
[DPZ92] G. DA P RATO and J. Z ABCZYK. Stochastic equations in infinite dimensions, vol. 44
of Encyclopedia of Mathematics and its Applications. Cambridge University Press,
Cambridge, 1992.
[Faw04] T. FAWCETT. Non-commutative harmonic analysis. Ph.D. thesis, University of Oxford,
2004.
[FdLP06] D. F EYEL and A. DE L A P RADELLE. Curvilinear integrals along enriched paths.
Electronic Journal of Probability 11, (2006), 860–892. doi:10.1214/EJP.v11-356.
[FG13] P. F RIZ and P. G ASSIAT. Eikonal equations and pathwise solutions to fully nonlinear
SPDEs, 2013. Preprint.
[FG14] P. K. F RIZ and B. G ESS. Stochastic scalar conservation laws driven by rough paths.
ArXiv e-prints (2014). arXiv:1403.6785.
[FGGR13] P. K. F RIZ, B. G ESS, A. G ULISASHVILI, and S. R IEDEL. Jain-Monrad criterion for
rough paths. ArXiv e-prints (2013). arXiv:1307.3460.
[FGL13] P. F RIZ, P. G ASSIAT, and T. LYONS. Physcial Brownian motion in magnetic field as
rough path. ArXiv e-prints (2013). arXiv:1302.2531.
[FLS06] P. F RIZ, T. LYONS, and D. S TROOCK. Lévy’s area under condition-
ing. Ann. Inst. H. Poincaré Probab. Statist. 42, no. 1, (2006), 89–101.
doi:10.1016/j.anihpb.2005.02.003.
[FO10] P. F RIZ and H. O BERHAUSER. A generalized Fernique theorem and
applications. Proc. Amer. Math. Soc. 138, (2010), 3679–3688.
doi:10.1090/S0002-9939-2010-10528-2.
[FO14] P. F RIZ and H. O BERHAUSER. Rough path stability of (semi-)linear
SPDEs. Probab. Theory Related Fields 158, no. 1-2, (2014), 401–434.
doi:10.1007/s00440-013-0483-2.
244 References

[Föl81] H. F ÖLLMER. Calcul d’Itô sans probabilités. In Seminar on Probability, XV (Univ.

Strasbourg, Strasbourg, 1979/1980) (French), vol. 850 of Lecture Notes in Math.,
143–150. Springer, Berlin, 1981.
[FR11] P. F RIZ and S. R IEDEL. Convergence rates for the full Brownian rough paths with
applications to limit theorems for stochastic flows. Bull. Sci. Math. 135, no. 6-7, (2011),
613–628. doi:10.1016/j.bulsci.2011.07.006.
[FR13] P. F RIZ and S. R IEDEL. Integrability of (non-)linear rough differential equations
and integrals. Stochastic Analysis and Applications 31, no. 2, (2013), 336–358.
doi:10.1080/07362994.2013.759758.
[FR14] P. F RIZ and S. R IEDEL. Convergence rates for the full Gaussian rough paths. Ann. Inst.
Henri Poincaré Probab. Stat. 50, no. 1, (2014), 154–194. doi:10.1214/12-AIHP507.
[Fri05] P. K. F RIZ. Continuity of the Itô-map for Hölder rough paths with applications to the
support theorem in Hölder norm. In Probability and partial differential equations in
modern applied mathematics, vol. 140 of IMA Vol. Math. Appl., 117–135. Springer,
New York, 2005.
[FS06] W. H. F LEMING and H. M. S ONER. Controlled Markov processes and viscosity
solutions, vol. 25 of Stochastic Modelling and Applied Probability. Springer, New
York, second ed., 2006.
[FS12a] P. F RIZ and A. S HEKHAR. Doob–Meyer for rough paths. Special Varadhan issue of
Bulletin of Institute of Mathematics Academia Sinica New Series, to appear (2012).
[FS12b] P. F RIZ and A. S HEKHAR. The Lévy-Kintchine formula for rough paths. ArXiv e-prints
(2012). arXiv:1212.5888.
[FV05] P. F RIZ and N. V ICTOIR. Approximations of the Brownian rough path with applications
to stochastic analysis. Ann. Inst. H. Poincaré Probab. Statist. 41, no. 4, (2005), 703–724.
doi:10.1016/j.anihpb.2004.05.003.
[FV06a] P. F RIZ and N. V ICTOIR. A note on the notion of geometric rough paths. Probab. The-
ory Related Fields 136, no. 3, (2006), 395–416. doi:10.1007/s00440-005-0487-7.
[FV06b] P. F RIZ and N. V ICTOIR. A variation embedding theorem and applications. J. Funct.
Anal. 239, no. 2, (2006), 631–637. doi:10.1016/j.jfa.2005.12.021.
[FV07] P. F RIZ and N. V ICTOIR. Large deviation principle for enhanced Gaussian pro-
cesses. Ann. Inst. H. Poincaré Probab. Statist. 43, no. 6, (2007), 775 – 785.
doi:10.1016/j.anihpb.2006.11.002.
[FV10a] P. F RIZ and N. V ICTOIR. Differential equations driven by Gaussian signals. Ann. Inst.
H. Poincaré Probab. Statist. 46, no. 2, (2010), 369–413. doi:10.1214/09-AIHP202.
[FV10b] P. F RIZ and N. V ICTOIR. Multidimensional Stochastic Processes as Rough Paths, vol.
120 of Cambridge Studies in Advanced Mathematics. Cambridge University Press,
Cambridge, 2010.
[FV11] P. F RIZ and N. V ICTOIR. A note on higher dimensional p-variation. Electronic Journal
of Probability 16, (2011), 1880–1899. doi:10.1214/EJP.v16-951.
[GIP12] M. G UBINELLI, P. I MKELLER, and N. P ERKOWSKI. Paracontrolled distributions and
singular PDEs. ArXiv e-prints (2012). arXiv:1210.2684.
[GL09] M. G UBINELLI and J. L ÖRINCZI. Gibbs measures on Brownian currents. Comm. Pure
Appl. Math. 62, no. 1, (2009), 1–56. doi:10.1002/cpa.20260.
[GLP99] G. G IACOMIN, J. L. L EBOWITZ, and E. P RESUTTI. Deterministic and stochastic
hydrodynamic equations arising from simple microscopic model systems. In Stochastic
partial differential equations: six perspectives, vol. 64 of Math. Surveys Monogr.,
107–152. Amer. Math. Soc., Providence, RI, 1999.
[GT10] M. G UBINELLI and S. T INDEL. Rough evolution equations. Ann. Probab. 38, no. 1,
(2010), 1–75. doi:10.1214/08-AOP437.
[Gub04] M. G UBINELLI. Controlling rough paths. J. Funct. Anal. 216, no. 1, (2004), 86–140.
doi:10.1016/j.jfa.2004.01.002.
[Gub10] M. G UBINELLI. Ramification of rough paths. J. Differential Equations 248, no. 4,
(2010), 693–721. doi:10.1016/j.jde.2009.11.015.
References 245

[Gub12] M. G UBINELLI. Rough solutions for the periodic Korteweg–de Vries

equation. Commun. Pure Appl. Anal. 11, no. 2, (2012), 709–733.
doi:10.3934/cpaa.2012.11.709.
[Hai11a] M. H AIRER. On Malliavin’s proof of Hörmander’s theorem. Bull. Sci. Math. 135, no.
6-7, (2011), 650–666. doi:10.1016/j.bulsci.2011.07.007.
[Hai11b] M. H AIRER. Rough stochastic PDEs. Comm. Pure Appl. Math. 64, no. 11, (2011),
1547–1585. doi:10.1002/cpa.20383.
[Hai13] M. H AIRER. Solving the KPZ equation. Ann. of Math. (2) 178, no. 2, (2013), 559–664.
doi:10.4007/annals.2013.178.2.4.
[Hai14a] M. H AIRER. Introduction to regularity structures. ArXiv e-prints (2014).
arXiv:1401.3014.
[Hai14b] M. H AIRER. Singular stochastic PDEs. ArXiv e-prints (2014). arXiv:1403.6353.
[Hai14c] M. H AIRER. A theory of regularity structures. Invent. Math. (2014).
doi:10.1007/s00222-014-0505-4.
[HK12] M. H AIRER and D. K ELLY. Geometric versus non-geometric rough paths. ArXiv
e-prints (2012). arXiv:1210.6294. Ann. IHP (B), to appear.
[HM11] M. H AIRER and J. C. M ATTINGLY. A theory of hypoellipticity and unique ergodicity
for semilinear stochastic PDEs. Electron. J. Probab. 16, (2011), no. 23, 658–738.
doi:10.1214/EJP.v16-875.
[HM12] M. H AIRER and J. M AAS. A spatial version of the Itô-Stratonovich correction. Ann.
Probab. 40, no. 4, (2012), 1675–1714. doi:10.1214/11-AOP662.
[HMW14] M. H AIRER, J. M AAS, and H. W EBER. Approximating rough stochastic PDEs. Comm.
Pure Appl. Math. 67, no. 5, (2014), 776–870. doi:10.1002/cpa.21495.
[HN09] Y. H U and D. N UALART. Rough path analysis via fractional calculus. Trans. Amer.
Math. Soc. 361, no. 5, (2009), 2689–2718. doi:10.1090/S0002-9947-08-04631-X.
[Hop50] E. H OPF. The partial differential equation ut + uux = µuxx . Comm. Pure Appl.
Math. 3, (1950), 201–230.
[Hör67] L. H ÖRMANDER. Hypoelliptic second order differential equations. Acta Math. 119,
(1967), 147–171.
[HP13] M. H AIRER and N. S. P ILLAI. Regularity of laws and ergodicity of hypoellip-
tic SDEs driven by rough paths. Ann. Probab. 41, no. 4, (2013), 2544–2598.
doi:10.1214/12-AOP777.
[HQ14] M. H AIRER and J. Q UASTEL. Continuous interface growth models rescale to KPZ,
2014. Preprint.
[HSV07] M. H AIRER, A. M. S TUART, and J. VOSS. Analysis of SPDEs arising in path
sampling. II. The nonlinear case. Ann. Appl. Probab. 17, no. 5-6, (2007), 1657–1706.
doi:10.1214/07-AAP441.
[HT13] Y. H U and S. T INDEL. Smooth density for some nilpotent rough differential equations.
J. Theoret. Probab. 26, no. 3, (2013), 722–749. doi:10.1007/s10959-011-0388-x.
[HW13] M. H AIRER and H. W EBER. Rough Burgers-like equations with multiplica-
tive noise. Probab. Theory Related Fields 155, no. 1-2, (2013), 71–126.
doi:10.1007/s00440-011-0392-1.
[IK06] Y. I NAHAMA and H. K AWABI. Large deviations for heat kernel measures on loop
spaces via rough paths. J. London Math. Soc. (2) 73, no. 3, (2006), 797–816.
doi:10.1112/S0024610706022654.
[Ina10] Y. I NAHAMA. A stochastic Taylor-like expansion in the rough path theory. J. Theor.
Probab. 23, (2010), 671–714.
[Ina13] Y. I NAHAMA. Malliavin differentiability of solutions of rough differential equations.
ArXiv e-prints (2013). arXiv:1312.7621.
[INY78] N. I KEDA, S. NAKAO, and Y. YAMATO. A class of approximations of Brownian
motion. Publ. Res. Inst. Math. Sci. 13, no. 1, (1977/78), 285–300.
[IW89] N. I KEDA and S. WATANABE. Stochastic differential equations and diffusion processes.
North-Holland Publishing Co., Amsterdam, second ed., 1989.
[JLM85] G. J ONA -L ASINIO and P. K. M ITTER. On the stochastic quantization of field theory.
Comm. Math. Phys. 101, no. 3, (1985), 409–436.
246 References

[JM83] N. C. JAIN and D. M ONRAD. Gaussian measures in Bp . Ann. Probab. 11, no. 1,
(1983), 46–57.
[Kal02] O. K ALLENBERG. Foundations of modern probability. Probability and its Applications
(New York). Springer-Verlag, New York, second ed., 2002.
[KM14] D. K ELLY and I. M ELBOURNE. Smooth approximation of stochastic differential
equations. ArXiv e-prints (2014). arXiv:1403.7281.
[Koh78] J. J. KOHN. Lectures on degenerate elliptic problems. In Pseudodifferential operator
with applications (Bressanone, 1977), 89–151. Liguori, Naples, 1978.
[KPZ86] M. K ARDAR, G. PARISI, and Y.-C. Z HANG. Dynamic scaling of growing interfaces.
Phys. Rev. Lett. 56, no. 9, (1986), 889–892.
[KR77] N. V. K RYLOV and B. L. ROZOVSKII. The Cauchy problem for linear stochastic partial
differential equations. Izv. Akad. Nauk SSSR Ser. Mat. 41, no. 6, (1977), 1329–1347,
1448.
[KRT07] I. K RUK, F. RUSSO, and C. A. T UDOR. Wiener integrals, Malliavin calculus
and covariance measure structure. J. Funct. Anal. 249, no. 1, (2007), 92–142.
doi:10.1016/j.jfa.2007.03.031.
[KS84] S. K USUOKA and D. S TROOCK. Applications of the Malliavin calculus. I. In Stochastic
analysis (Katata/Kyoto, 1982), vol. 32 of North-Holland Math. Library, 271–306.
North-Holland, Amsterdam, 1984.
[KS85] S. K USUOKA and D. S TROOCK. Applications of the Malliavin calculus. II. J. Fac. Sci.
Univ. Tokyo Sect. IA Math. 32, no. 1, (1985), 1–76.
[KS87] S. K USUOKA and D. S TROOCK. Applications of the Malliavin calculus. III. J. Fac.
Sci. Univ. Tokyo Sect. IA Math. 34, no. 2, (1987), 391–442.
[Kun82] H. K UNITA. Stochastic partial differential equations connected with nonlinear filtering.
In Nonlinear filtering and stochastic control (Cortona, 1981), vol. 972 of Lecture Notes
in Math., 100–169. Springer, Berlin, 1982.
[Kus01] S. K USUOKA. Approximation of expectation of diffusion process and mathematical
finance. In Taniguchi Conference on Mathematics Nara ’98, vol. 31 of Adv. Stud. Pure
Math., 147–165. Math. Soc. Japan, Tokyo, 2001.
[LCL07] T. J. LYONS, M. C ARUANA, and T. L ÉVY. Differential equations driven by rough
paths, vol. 1908 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures
from the 34th Summer School on Probability Theory held in Saint-Flour, July 6–24,
2004, With an introduction concerning the Summer School by Jean Picard.
[Led96] M. L EDOUX. Isoperimetry and Gaussian analysis. In Lectures on probability theory and
statistics (Saint-Flour, 1994), vol. 1648 of Lecture Notes in Math., 165–294. Springer,
Berlin, 1996.
[LLQ02] M. L EDOUX, T. LYONS, and Z. Q IAN. Lévy area of Wiener processes in Banach
spaces. Ann. Probab. 30, no. 2, (2002), 546–578. doi:10.1214/aop/1023481002.
[LN11] T. LYONS and H. N I. Expected signature of Brownian Motion up to the first exit time
from a bounded domain. ArXiv e-prints (2011). arXiv:1101.5902.
[LPS13] P.-L. L IONS, B. P ERTHAME, and P. E. S OUGANIDIS. Scalar conservation laws with
rough (stochastic) fluxes. ArXiv e-prints (2013). arXiv:1309.1931.
[LQ02] T. LYONS and Z. Q IAN. System control and rough paths. Oxford Mathematical
Monographs. Oxford University Press, Oxford, 2002. Oxford Science Publications.
[LQZ02] M. L EDOUX, Z. Q IAN, and T. Z HANG. Large deviations and support theorem for
diffusion processes via rough paths. Stochastic Process. Appl. 102, no. 2, (2002),
265–283. doi:10.1016/S0304-4149(02)00176-X.
[LS98a] P.-L. L IONS and P. E. S OUGANIDIS. Fully nonlinear stochastic partial differential
equations. C. R. Acad. Sci. Paris Sér. I Math. 326, no. 9, (1998), 1085–1092.
doi:10.1016/S0764-4442(98)80067-0.
[LS98b] P.-L. L IONS and P. E. S OUGANIDIS. Fully nonlinear stochastic partial differential
equations: non-smooth equations and applications. C. R. Acad. Sci. Paris Sér. I Math.
327, no. 8, (1998), 735–741. doi:10.1016/S0764-4442(98)80161-4.
References 247

[LS00a] P.-L. L IONS and P. E. S OUGANIDIS. Fully nonlinear stochastic PDE with semilinear
stochastic dependence. C. R. Acad. Sci. Paris Sér. I Math. 331, no. 8, (2000), 617–624.
doi:10.1016/S0764-4442(00)00583-8.
[LS00b] P.-L. L IONS and P. E. S OUGANIDIS. Uniqueness of weak solutions of fully nonlinear
stochastic partial differential equations. C. R. Acad. Sci. Paris Sér. I Math. 331, no. 10,
(2000), 783–790. doi:10.1016/S0764-4442(00)01597-4.
[LS01] W. V. L I and Q.-M. S HAO. Gaussian processes: inequalities, small ball probabilities
and applications. In Stochastic processes: theory and methods, vol. 19 of Handbook of
Statist., 533–597. North-Holland, Amsterdam, 2001.
[LV04] T. LYONS and N. V ICTOIR. Cubature on Wiener space. Proc. R. Soc. Lond. Ser. A Math.
Phys. Eng. Sci. 460, no. 2041, (2004), 169–198. doi:10.1098/rspa.2003.1239.
Stochastic analysis with applications to mathematical finance.
[LV07] T. LYONS and N. V ICTOIR. An extension theorem to rough paths. Ann.
Inst. H. Poincaré Anal. Non Linéaire 24, no. 5, (2007), 835–847.
doi:10.1016/j.anihpc.2006.07.004.
[LY02] F. L IN and X. YANG. Geometric measure theory—an introduction, vol. 1 of Advanced
Mathematics (Beijing/Boston). Science Press Beijing, Beijing, 2002.
[Lyo91] T. LYONS. On the nonexistence of path integrals. Proc. Roy. Soc. London Ser. A 432,
no. 1885, (1991), 281–290. doi:10.1098/rspa.1991.0017.
[Lyo94] T. LYONS. Differential equations driven by rough signals. I. An extension of
an inequality of L. C. Young. Math. Res. Lett. 1, no. 4, (1994), 451–464.
doi:10.4310/MRL.1994.v1.n4.a5.
[Lyo98] T. J. LYONS. Differential equations driven by rough signals. Rev. Mat. Iberoamericana
14, no. 2, (1998), 215–310. doi:10.4171/RMI/240.
[Mal78] P. M ALLIAVIN. Stochastic calculus of variations and hypoelliptic operators. Proc. In-
tern. Symp. SDE 195–263.
[Mal97] P. M ALLIAVIN. Stochastic analysis, vol. 313 of Grundlehren der Mathematischen
Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag,
Berlin, 1997.
[McK69] H. P. M C K EAN , J R . Stochastic integrals. Probability and Mathematical Statistics, No.
5. Academic Press, New York-London, 1969.
[McS72] E. J. M C S HANE. Stochastic differential equations and models of random processes. In
Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probabil-
ity (Univ. California, Berkeley, Calif., 1970/1971), Vol. III: Probability theory, 263–294.
Univ. California Press, Berkeley, Calif., 1972.
[Mey92] Y. M EYER. Wavelets and operators, vol. 37 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press, Cambridge, 1992. Translated from the 1990
French original by D. H. Salinger.
[MR06] M. B. M ARCUS and J. ROSEN. Markov processes, Gaussian processes, and local
times, vol. 100 of Cambridge Studies in Advanced Mathematics. Cambridge University
Press, Cambridge, 2006.
[MSS06] A. M ILLET and M. S ANZ -S OL É. Large deviations for rough paths of the fractional
Brownian motion. Ann. Inst. H. Poincaré Probab. Statist. 42, no. 2, (2006), 245–271.
[Nor86] J. N ORRIS. Simplified Malliavin calculus. In Séminaire de Probabilités, XX, 1984/85,
vol. 1204 of Lecture Notes in Math., 101–130. Springer, Berlin, 1986.
[NP88] D. N UALART and É. PARDOUX. Stochastic calculus with anticipating in-
tegrands. Probab. Theory Related Fields 78, no. 4, (1988), 535–581.
doi:10.1007/BF00353876.
[NT11] D. N UALART and S. T INDEL. A construction of the rough path above fractional
Brownian motion using Volterra’s representation. Ann. Probab. 39, no. 3, (2011),
1061–1096. doi:10.1214/10-AOP578.
[Nua06] D. N UALART. The Malliavin calculus and related topics. Probability and its Applica-
tions (New York). Springer-Verlag, Berlin, second ed., 2006.
[Par79] E. PARDOUX. Stochastic partial differential equations and filtering of diffusion pro-
cesses. Stochastics 3, no. 2, (1979), 127–167. doi:10.1080/17442507908833142.
248 References

[PS08] G. A. PAVLIOTIS and A. M. S TUART. Multiscale methods, vol. 53 of Texts in Applied

Mathematics. Springer, New York, 2008. Averaging and homogenization.
[PW81] G. PARISI and Y. S. W U. Perturbation theory without gauge fixing. Sci. Sinica 24,
no. 4, (1981), 483–496.
[Qua11] J. Q UASTEL. Introduction to KPZ. Current Developments in Mathematics 2011,
(2011), 125–194.
[RX13] S. R IEDEL and W. X U. A simple proof of distance bounds for gaussian rough paths.
Electron. J. Probab. 18, (2013), no. 108, 1–18. doi:10.1214/EJP.v18-2387.
[RY91] D. R EVUZ and M. YOR. Continuous martingales and Brownian motion, vol. 293 of
Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 1991.
[Rya02] R. A. RYAN. Introduction to tensor products of Banach spaces. Springer Monographs
in Mathematics. Springer-Verlag London Ltd., London, 2002.
[SC74] V. N. S UDAKOV and B. S. C IREL0 SON. Extremal properties of half-spaces for
spherically invariant measures. Zap. Naučn. Sem. Leningrad. Otdel. Mat. Inst. Steklov.
(LOMI) 41, (1974), 14–24, 165. Problems in the theory of probability distributions, II.
[Sim97] L. S IMON. Schauder estimates by scaling. Calc. Var. Partial Differential Equations 5,
no. 5, (1997), 391–407. doi:10.1007/s005260050072.
[Sip93] E.-M. S IPIL ÄINEN. A pathwise view of solutions of stochastic differential equations.
Ph.D. thesis, University of Edinburgh, 1993.
[Str11] D. W. S TROOCK. Probability theory. Cambridge University Press, Cambridge, second
ed., 2011. An analytic view.
[SV73] D. W. S TROOCK and S. R. S. VARADHAN. Limit theorems for random walks on Lie
groups. Sankhyā Ser. A 35, no. 3, (1973), 277–294.
[Tei11] J. T EICHMANN. Another approach to some rough and stochastic par-
tial differential equations. Stoch. Dyn. 11, no. 2-3, (2011), 535–550.
doi:10.1142/S0219493711003437.
[Tow02] N. T OWGHI. Multidimensional extension of L. C. Young’s inequality. JIPAM. J.
Inequal. Pure Appl. Math. 3, no. 2, (2002), Article 22, 13 pp. (electronic).
[Unt10] J. U NTERBERGER. A rough path over multidimensional fractional Brownian motion
with arbitrary Hurst index by Fourier normal ordering. Stochastic Process. Appl. 120,
no. 8, (2010), 1444–1472. doi:10.1016/j.spa.2010.04.001.
[Wer12] B. M. W ERNESS. Regularity of Schramm-Loewner evolutions, annular cross-
ings, and rough path theory. Electron. J. Probab. 17, (2012), no. 81, 21.
doi:10.1214/EJP.v17-2331.
[You36] L. C. YOUNG. An inequality of the Hölder type, connected with Stieltjes integration.
Acta Math. 67, no. 1, (1936), 251–282.