Generic Bounds on the Approximation Error for Physics-Informed (and) Operator Learning
Tim De Ryck ∗
Siddhartha Mishra †
Abstract
We propose a very general framework for deriving rigorous bounds on the approxi-
mation error for physics-informed neural networks (PINNs) and operator learning
architectures such as DeepONets and FNOs as well as for physics-informed oper-
ator learning. These bounds guarantee that PINNs and (physics-informed) Deep-
ONets or FNOs will efficiently approximate the underlying solution or solution
operator of generic partial differential equations (PDEs). Our framework utilizes
existing neural network approximation results to obtain bounds on more involved
learning architectures for PDEs. We illustrate the general framework by deriving
the first rigorous bounds on the approximation error of physics-informed opera-
tor learning and by showing that PINNs (and physics-informed DeepONets and
FNOs) mitigate the curse of dimensionality in approximating nonlinear parabolic
PDEs.
1 Introduction
The efficient numerical approximation of partial differential equations (PDEs) is of paramount im-
portance as PDEs mathematically describe an enormous range of interesting phenomena in the sci-
ences and engineering. Machine learning techniques, particularly deep learning, are playing an
increasingly important role in this context. For instance, given their universal approximation proper-
ties, deep neural networks serve as ansatz spaces for supervised learning of a variety of (parametric)
PDEs, see [19, 70, 40, 53, 54] and references therein. In this setting, large amounts of training data
might be required. However, this data is often acquired from expensive computer simulations or
physical measurements [53], necessitating the design of learning frameworks that work with lim-
ited data. Physics-informed neural networks (PINNs), proposed by [18, 42, 41] and popularized by
[67, 68], are a prominent example of such a learning framework as the residual of the underlying
PDE is minimized within the class of neural networks and in principle, little (or even no) training
data is required. PINNs and their variants have proven to be a very powerful and computationally
efficient framework for approximating solutions to PDEs, see [69, 51, 55, 65, 76, 34, 35, 61, 59, 60, 2]
and references therein.
Often in the context of PDEs, one needs to approximate the underlying solution operator that maps
one infinite-dimensional function space into another [27, 39]. As neural networks can only map
between finite dimensional spaces, a new field of operator learning is emerging wherein novel
learning frameworks need to be designed in order to approximate operators. These include deep
operator networks (DeepONets) [9, 49] and their variants as well as neural operators [39], which
generalize neural networks to this setting. A variety of neural operators have been proposed, see
[45, 46], but arguably the most efficient form of neural operators is provided by the so-called Fourier neural operator (FNO) [44].
∗ Seminar for Applied Mathematics (SAM), D-MATH, ETH Zürich, Switzerland
† Seminar for Applied Mathematics (SAM), D-MATH and ETH AI center, ETH Zürich, Switzerland
2 Preliminaries
2.1 Setting
Given T > 0 and D ⊂ R^d compact, consider the function u : [0,T] × D → R^m, for m ≥ 1, that belongs to a function space H and solves the following (time-dependent) PDE,

$$L_a(u)(t,x) = 0 \quad\text{and}\quad u(0,x) = u_0(x) \qquad \forall\,(t,x)\in[0,T]\times D, \qquad (2.1)$$

where u_0 ∈ Y ⊂ L²(D) is the initial condition and L_a : H → L²([0,T]×D) is a differential operator that can depend on a parameter (function) a ∈ Z ⊂ L²(D). In our notation, we will often suppress the dependence of L := L_a on a for simplicity. Depending on the context, one might want to recover one of the following mathematical objects: for fixed a and u_0, one might want to approximate u(T,·) or u(·,·) with a neural network; a more challenging task would be to learn the solution operator G : X → L²(Ω) : v ↦ u, where v ∈ {u_0, a}, X ∈ {Y, Z} and Ω = D or Ω = [0,T] × D. We will use this notation consistently throughout the paper, see SM A.1 for an overview.
PINNs Physics-informed neural networks (PINNs) are neural networks that are trained with a
different, residual-based loss function. As the PDE solution u satisfies L(u) = 0, the goal of
physics-informed learning is to find a neural network uθ : [0, T ] × D → R for which the PDE
residual is approximately zero, L(uθ ) ≈ 0. To ensure uniqueness, one also needs to require that the
initial condition is satisfied, i.e. u_θ(0,x) ≈ u_0(x), and similarly for boundary conditions. In practice, one minimizes a quadrature approximation of $J(\theta) = \|L(u_\theta)\|^2_{L^2([0,T]\times D)} + \|u_\theta(0,\cdot) - u_0\|^2_{L^2(D)}$, where additional terms can be added to (approximately) impose boundary conditions and augment
the loss function using data. A desirable property of PINNs is that only very little or even no training
data is needed to construct the loss function.
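For concreteness, here is a minimal sketch of assembling such a residual-based loss, for the 1-d heat equation u_t − u_xx = 0 with the hypothetical initial condition u_0(x) = sin(πx) and Monte Carlo points in place of a quadrature rule; all of these choices are illustrative and not the paper's setting.

```python
import math
import torch

# Minimal sketch (illustrative, not the paper's construction): the PINN loss for
# the 1-d heat equation u_t - u_xx = 0 on [0,1] x [0,1], with the hypothetical
# initial condition u0(x) = sin(pi x). Monte Carlo sampling plays the role of
# the quadrature approximation of J(theta).
torch.manual_seed(0)
u_theta = torch.nn.Sequential(                 # tanh network u_theta(t, x)
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def pinn_loss(n_int=1024, n_init=256):
    # interior collocation points for the PDE residual L(u_theta)
    t = torch.rand(n_int, 1, requires_grad=True)
    x = torch.rand(n_int, 1, requires_grad=True)
    u = u_theta(torch.cat([t, x], dim=1))
    ones = torch.ones_like(u)
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    residual = u_t - u_xx                      # L(u_theta) at the collocation points
    # points on {t = 0} for the initial-condition mismatch
    x0 = torch.rand(n_init, 1)
    u0 = torch.sin(math.pi * x0)
    init_err = u_theta(torch.cat([torch.zeros_like(x0), x0], dim=1)) - u0
    return residual.pow(2).mean() + init_err.pow(2).mean()

print(pinn_loss().item())   # value of the (untrained) physics-informed loss
```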
Operator learning In order to approximate operators, one needs to allow the input and output of the learning architecture to be infinite-dimensional. A possible approach is to use deep operator networks (DeepONets), as proposed in [9, 49]. Given m fixed sensor locations {x_j}_{j=1}^m ⊂ D and the corresponding sensor values {v(x_j)}_{j=1}^m as input, a DeepONet can be formulated in terms of two (deep) neural networks: a branch net β : R^m → R^p and a trunk net τ : D → R^{p+1}. The branch and trunk nets are then combined to approximate the underlying nonlinear operator as the following DeepONet G_θ : X → L²(D), with

$$G_\theta(v)(y) = \tau_0(y) + \sum_{k=1}^{p}\beta_k(v)\,\tau_k(y).$$

A second approach is that of neural operators, which generalize hidden layers by including a non-local integral operator [45], of which particularly Fourier neural operators (FNOs) [44] are already well-established. The practical implementation (i.e. discretization) of an FNO maps from and to the space of trigonometric polynomials of degree at most N ∈ N, denoted by L²_N, and can be identified with a finite-dimensional mapping that is a composition of affine maps and nonlinear layers of the form

$$\mathcal{L}_l(z)_j = \sigma\big(W_l z_j + b_{l,j} + F_N^{-1}\big(P_l(k)\cdot F_N(z)(k)\big)_j\big),$$

where the P_l(k) are coefficients that define a non-local convolution operator via the discrete Fourier transform F_N, see [38].
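A minimal sketch of a DeepONet forward pass with random, untrained weights follows; the sizes m, p, the hidden width and the input function v are illustrative assumptions only.

```python
import numpy as np

# Sketch of a DeepONet forward pass: branch net beta on sensor values, trunk net
# tau on evaluation points, combined as tau_0(y) + sum_k beta_k(v) tau_k(y).
rng = np.random.default_rng(0)
m, p, width = 50, 16, 64                      # sensors, basis size p, hidden width

def mlp(sizes):
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, z):
    for W, b in params[:-1]:
        z = np.tanh(z @ W + b)
    W, b = params[-1]
    return z @ W + b

branch = mlp([m, width, width, p])            # beta : R^m -> R^p
trunk = mlp([1, width, width, p + 1])         # tau : D subset of R -> R^{p+1}

x_sensors = np.linspace(0.0, 1.0, m)          # fixed sensor locations in D = [0, 1]
v = np.sin(2 * np.pi * x_sensors)             # sensor values v(x_j) of one input function
y = np.linspace(0.0, 1.0, 200)[:, None]       # evaluation points for G_theta(v)

beta = forward(branch, v[None, :])[0]         # beta(v), shape (p,)
tau = forward(trunk, y)                       # (tau_0(y), ..., tau_p(y)), shape (200, p+1)
G_v = tau[:, 0] + tau[:, 1:] @ beta           # G_theta(v)(y)
print(G_v.shape)                              # (200,)
```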
Physics-informed operator learning Both DeepONets and FNOs are trained by choosing a suitable probability measure μ on X and minimizing a quadrature approximation of $J(\theta) = \|G_\theta(v) - G(v)\|_{L^2_{\mu\times dx}(X\times\Omega)}$. Generating training sets might require many calls to an expensive PDE solver, leading to an enormous computational cost. In order to reduce or even fully eliminate the need for training data, physics-informed operator learning has been proposed in [74] for DeepONets and in [47] for FNOs. Similar to PINNs, the training procedure aims to minimize a quadrature approximation of $J(\theta) = \|L(G_\theta)\|_{L^2_{\mu\times dx}(X\times\Omega)}$.
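A toy sketch of this data-free training idea (an illustrative setup of ours, not that of [74, 47]): learn the antiderivative operator G(v)(t) = ∫_0^t v(s) ds for the parametric inputs v_a(t) = cos(at) by minimizing only the residual d/dt G_θ(v_a)(t) − v_a(t) and the initial condition, without ever generating pairs (v, G(v)).

```python
import torch

# Toy physics-informed operator learning: no solution data is used, only the
# ODE residual of G_theta(v) and the condition G_theta(v)(0) = 0.
torch.manual_seed(0)
m = 20                                            # number of sensor points on [0, 1]
s_grid = torch.linspace(0.0, 1.0, m)
net = torch.nn.Sequential(torch.nn.Linear(m + 1, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))

def G_theta(a, t):                                # a: (n,1) parameters of v_a, t: (n,1)
    v_sens = torch.cos(a * s_grid)                # sensor values v_a(s_j), shape (n, m)
    return net(torch.cat([v_sens, t], dim=1))

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    a = 1.0 + 2.0 * torch.rand(256, 1)            # sample input functions v_a, a ~ U[1, 3]
    t = torch.rand(256, 1, requires_grad=True)    # collocation points in time
    u = G_theta(a, t)
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    residual = u_t - torch.cos(a * t)             # residual of G_theta(v_a) at (a, t)
    loss = residual.pow(2).mean() + G_theta(a, torch.zeros_like(t)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# sanity check against the exact antiderivative sin(a t) / a
a0, t0 = torch.tensor([[2.0]]), torch.tensor([[0.7]])
print(G_theta(a0, t0).item(), (torch.sin(a0 * t0) / a0).item())
```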
3 General results
We propose a framework to obtain bounds on the approximation error for the various neural network
architectures introduced in Section 2.2. Figure 1 visualizes how different types of error estimates
can be obtained from one another. Every box shows the name of the network architecture, the form
of the relevant loss and the theorem which proves the corresponding estimate for the approximation
error. Every arrow in the flowchart represents a proof technique that allows one to transfer an error
estimate from one type of method to another (see the caption of Figure 1 for an overview of these techniques).

[Figure 1 (flowchart). Top row: neural network at a fixed time, ‖u(T) − u_θ(T)‖_{L^q(D)} < ε (assumed to be known); FNO, ‖G − G_θ‖_{L²(X×D)} < ε (Theorem 3.7); DeepONet, ‖G − G_θ‖_{L²(X×D)} < ε (Corollary 3.8). Bottom row: PINN, ‖L(u_θ)‖_{L^q([0,T]×D)} < ε (Theorem 3.5); physics-informed FNO, ‖L(G_θ)‖_{L²(X×Ω)} < ε (Theorem 3.9); physics-informed DeepONet, ‖L(G_θ)‖_{L²(X×Ω)} < ε (Theorems 3.9 & 3.10).]
Figure 1: Flowchart of the structure of the results in this paper, with q ∈ {2, ∞}. The letters reflect the techniques used in the proofs: A uses Taylor approximations (Section 3.1), B is based on finite difference approximations (Section 3.1), C uses trigonometric polynomial interpolation (Section 3.2) and D uses the connection between FNOs and DeepONets (Section 3.2).
We give particular attention to the case where it is known that a neural network can efficiently
approximate the solution to a time-dependent PDE at a fixed time. Such neural networks are usually
obtained by emulating a classical numerical method. Examples include finite difference schemes,
finite volume schemes, finite element methods, iterative methods and Monte Carlo methods, e.g.
[36, 63, 10, 57]. More precisely, for ε > 0, we assume to have access to an operator U ε : X ×
[0, T ] → H that for any t ∈ [0, T ] maps any initial condition/parameter function v ∈ X to a neural
network U ε (v, t) that approximates the PDE solution G(v)(·, t) = u(·, t)∈ Lq (D), q ∈ {2, ∞}, at
time t, as specified below. Moreover, we will assume that we know how its size depends on the
accuracy ε. Explicit examples of the operator U ε will be given in Section 4 and SM C.
Assumption 3.1. Let q ∈ {2, ∞}. For any B, ε > 0, ℓ ∈ N, t ∈ [0,T] and any v ∈ X with ‖v‖_{C^ℓ} ≤ B there exist a neural network U^ε(v,t) : D → R and a constant C^B_{ε,ℓ} > 0 s.t.

$$\|G(v)(\cdot,t) - U^\varepsilon(v,t)\|_{L^q(D)} \le \varepsilon \quad\text{and}\quad \|U^\varepsilon(v,t)\|_{C^\ell(D)} \le C^B_{\varepsilon,\ell}. \qquad (3.1)$$
Remark 3.2. For vanilla neural networks and PINNs one can set X := {v}, G(v) := u and v := u0
or v := a in Assumption 3.1 above and Assumption 3.4 below.
Under this assumption, we prove the existence of space-time neural networks and PINNs that effi-
ciently approximate the PDE solution (Section 3.1), as well as FNOs and DeepONets (Section 3.2)
and physics-informed FNOs and DeepONets (Section 3.3). Finally, we also prove a general result
on the generalization error (Section 3.4).
We will construct a space-time neural network uθ for which both kuθ − ukLq ([0,T ]×D) and the PINN
loss kL(uθ )kLq ([0,T ]×D) are small. To accurately approximate the time derivatives of u we emulate
Taylor expansions, whereas for the spatial derivatives we employ finite difference (FD) operators
in our proofs. Depending on whether forward, backward or central differences are used, a FD
operator might not be defined on the whole domain D, e.g. for f ∈ C([0,1]) the (forward) operator $\Delta^+_h[f](x) := f(x+h) - f(x)$ is not well-defined for x ∈ (1−h, 1]. This can be solved by resorting
to piecewise-defined FD operators, e.g. a forward operator on [0, 0.5] and a backward operator on
(0.5, 1]. In a general domain Ω one can find a well-defined piecewise FD operator if Ω satisfies the
following assumption, which is satisfied by many domains (e.g. rectangular, smooth).
Assumption 3.3. There exists a finite partition P of Ω such that for all P ∈ P there exists ε_P > 0 and v_P ∈ B^1_∞ = {x ∈ R^{dim(Ω)} : ‖x‖_∞ ≤ 1} such that for all x ∈ P it holds that x + ε_P(v_P + B^1_∞) ⊂ Ω.
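A small numerical sketch of the piecewise construction described above, on D = [0,1] with the forward/backward split at 0.5; the test function is an arbitrary choice.

```python
import numpy as np

# Piecewise first-order finite difference on [0, 1]: forward differences on
# [0, 0.5], backward differences on (0.5, 1], so the operator is well defined
# on the whole domain.
def piecewise_fd(f, x, h):
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    left = x <= 0.5
    out[left] = (f(x[left] + h) - f(x[left])) / h       # forward, stays inside [0, 1]
    out[~left] = (f(x[~left]) - f(x[~left] - h)) / h    # backward, stays inside [0, 1]
    return out

f = np.exp                                              # test function with f' = exp
x = np.linspace(0.0, 1.0, 11)
for h in [1e-2, 1e-3]:
    err = np.max(np.abs(piecewise_fd(f, x, h) - f(x)))
    print(f"h = {h:g}: max error = {err:.2e}")          # decreases like O(h)
```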
Additionally, we need to assume that the PINN error can be bounded in terms of the errors related to all relevant partial derivatives, denoted by $D^{(k,\alpha)} := D_t^k D_x^\alpha := \partial_t^k\partial_{x_1}^{\alpha_1}\cdots\partial_{x_d}^{\alpha_d}$, for $(k,\alpha)\in\mathbb{N}_0^{d+1}$.
This assumption is valid for many classical solutions of PDEs. A few worked out examples can be
found in SM D.5 (gravity pendulum) and SM D.6 (Darcy flow).
Assumption 3.4. Let k, ℓ ∈ N, q ∈ {2, ∞} and C > 0 be independent of d. For all v ∈ X it holds that,

$$\|L(G_\theta(v))\|_{L^q([0,T]\times D)} \;\le\; C\cdot\mathrm{poly}(d)\cdot\!\!\sum_{\substack{(k',\alpha)\in\mathbb{N}_0^{d+1}\\ k'\le k,\ \|\alpha\|_1\le\ell}}\!\!\big\|D^{(k',\alpha)}(G - G_\theta)\big\|_{L^q([0,T]\times D)}. \qquad (3.2)$$
In this setting, we prove the following approximation result for space-time networks and PINNs.
Theorem 3.5. Let s, r ∈ N, let u ∈ C^{(s,r)}([0,T]×D) be the solution of the PDE (2.1) and let Assumption 3.1 be satisfied. There exists a constant C(s,r) > 0 such that for every M ∈ N and ε, h > 0 there exists a tanh neural network u_θ : [0,T]×D → R for which it holds that,

$$\|u_\theta - u\|_{L^q([0,T]\times D)} \le C\big(\|u\|_{C^{(s,0)}}\,M^{-s} + \varepsilon\big), \qquad (3.3)$$

and if additionally Assumption 3.3 and Assumption 3.4 hold then,

$$\|L(u_\theta)\|_{L^2([0,T]\times D)} + \|u_\theta - u\|_{L^2(\partial([0,T]\times D))} \le C\cdot\mathrm{poly}(d)\cdot\ln^k(M)\Big(\|u\|_{C^{(s,\ell)}}\,M^{k-s} + M^{2k}\big(\varepsilon h^{-\ell} + C^B_{\varepsilon,\ell}\,h^{r-\ell}\big)\Big). \qquad (3.4)$$

Moreover, depth(u_θ) ≤ C · depth(U^ε) and width(u_θ) ≤ C · M · width(U^ε).
Proof. We only provide a sketch of the full proof (SM B.2). The main idea is to divide [0, T ]
into M uniform subintervals and construct a neural network that approximates a Taylor approxi-
mation in time of u in each subinterval. In the obtained formula, we approximate the monomials
and multiplications by neural networks (SM A.7) and approximate the derivatives of u by finite
differences and use (A.2) of SM A.2 to find an error estimate in C k ([0, T ], Lq (D))-norm. We
use again finite difference operators to prove that spatial derivatives of u are accurately approxi-
mated as well. The neural network will also approximately satisfy the initial/boundary conditions, as $\|u_\theta - u\|_{L^2(\partial([0,T]\times D))} \lesssim C\,\mathrm{poly}(d)\,\|u_\theta - u\|_{H^1([0,T]\times D)}$, which follows from a Sobolev trace inequality.
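In schematic form, the construction of SM B.2 (see also Lemma B.2) reads

$$u_\theta(t,x) \;\approx\; \sum_{m=1}^{M}\Phi^M_m(t)\,\sum_{i=0}^{s-1}\frac{(t-t_m)^i}{i!}\,M^i\,\Delta^{i,s-i}_{1/M,t}\big[U^\varepsilon(u_0,t_m)\big](x),$$

where the Φ^M_m form an approximate partition of unity in time (SM Definition B.4) and the monomials (t−t_m)^i as well as the products are themselves replaced by small tanh networks (SM A.7).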
We note that the bounds (3.3) and (3.4) together imply that there exists a neural network for which the total error as well as the PINN loss can be made arbitrarily small, providing a solid theoretical foundation for PINNs approximating the PDE (2.1).
In this section, we use Assumption 3.1 to prove estimates for DeepONets and FNOs. First, we prove
a generic error estimate for FNOs. Using the known connection between FNOs and DeepONets
(SM Lemma B.6) this result can then easily be applied to DeepONets (Corollary 3.8). In order to
prove these error estimates, we need to assume that the operator U ε from Assumption 3.1 is stable
with respect to its input function, as specified in Assumption 3.6 below. Moreover, we will take the
d-dimensional torus as domain D = Td = [0, 2π)d and assume periodic boundary conditions for
simplicity in what follows. This is not a restriction, as for every Lipschitz subset of Td there exists a
(linear and continuous) Td -periodic extension operator of which also the derivatives are Td -periodic
[38, Lemma 41].
Assumption 3.6. Assumption 3.1 is satisfied and let p ∈ {2, ∞}. For every ε > 0 there exists a constant C^ε_stab > 0 such that for all v, v′ ∈ X it holds that,

$$\|U^\varepsilon(v,T) - U^\varepsilon(v',T)\|_{L^2} \le C^\varepsilon_{\mathrm{stab}}\,\|v - v'\|_{L^p}. \qquad (3.5)$$
Theorem 3.7. Let r ∈ N, T > 0, let G : C^r(T^d) → C^r(T^d) be an operator that maps a function u_0 to the solution u(·,T) of the PDE (2.1) with initial condition u_0, let Assumption 3.6 be satisfied and let p* ∈ {2, ∞} \ {p}. Then there exists a constant C > 0 such that for every ε > 0, N ∈ N there is an FNO G_θ : L²_N(T^d) → L²_N(T^d) of depth O(depth(U^ε)) and width O(N^d width(U^ε)) with accuracy,

$$\|G - G_\theta\|_{L^2} \le C\big(\varepsilon + C^\varepsilon_{\mathrm{stab}}\,B\,N^{-r+d/p^*} + C^B_{\varepsilon,r}\,N^{-r}\big). \qquad (3.6)$$
Proof. We give a sketch of the proof, details can be found in SM B.3. Given function values of v
on a uniform grid with grid size 1/N , we use trigonometric polynomial interpolation (SM A.6) to
reconstruct v and use this together with Assumption 3.1 to construct a neural network. The resulting
approximation is then projected onto the space L2N , of trigonometric polynomials of degree at most
N ∈ N, again through trigonometric polynomial interpolation.
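A minimal one-dimensional illustration of technique C, trigonometric interpolation from uniform grid values via the FFT (a toy example, not the d-dimensional construction used in the proof):

```python
import numpy as np

# Reconstruct a smooth 2*pi-periodic function from N uniform samples by
# trigonometric (Fourier) interpolation and evaluate it at arbitrary points.
def trig_interp(samples, x_eval):
    N = len(samples)                      # number of uniform grid points on [0, 2*pi)
    coeffs = np.fft.fft(samples) / N      # discrete Fourier coefficients
    k = np.fft.fftfreq(N, d=1.0 / N)      # integer wavenumbers 0..N/2-1, -N/2..-1
    return np.real(np.exp(1j * np.outer(x_eval, k)) @ coeffs)

N = 32
grid = 2 * np.pi * np.arange(N) / N
v = np.exp(np.sin(grid))                  # smooth periodic test function
x = np.linspace(0, 2 * np.pi, 200)
err = np.max(np.abs(trig_interp(v, x) - np.exp(np.sin(x))))
print(f"max interpolation error: {err:.2e}")   # spectrally accurate for smooth v
```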
A recent result, [38, Theorem 36] (SM Lemma B.6), shows that any error bound for FNOs also
implies an error bound for DeepONets, by choosing the trunk nets as neural network approximations
of the Fourier basis. We apply this result with ε ∼ poly(1/N ) to Theorem 3.7 to obtain the following
generic error bound for DeepONets.
Corollary 3.8. Assume the setting of Theorem 3.7. Then for every ε > 0, N ∈ N and every corresponding FNO G_θ from Theorem 3.7 there exists a DeepONet G*_θ : X → L²(D) with width(β) = O(N^d), depth(β) = O(depth(G_θ)), width(τ) = O(N^{d+1}) and depth(τ) ≤ 3 that satisfies (3.6).
Using the techniques from previous sections, we now present the very first theoretical result for
physics-informed operator learning. We demonstrate that if an error estimate for a DeepONet/FNO
and the growth of its derivatives are known (see SM D.1 on how to obtain these), then one can
prove an error estimate for the corresponding physics-informed DeepONet/FNO. For simplicity,
the following result focuses only on operators mapping to C r (D) but the generalization to e.g.
C r ([0, T ] × D) is immediate by considering D′ := [0, T ] × D.
Theorem 3.9. Consider an operator G : X → C^r(D), r ∈ N, that satisfies Assumption 3.3 and Assumption 3.4 with ℓ ∈ N. Let λ* ∈ (0, ∞], let λ, C(λ) > 0 with λ ≤ λ* and let σ : N → R be a function such that for all p ∈ N there is a DeepONet/FNO G_θ such that

$$\|G(v) - G_\theta(v)\|_{L^2(D)} \le C p^{-\lambda} \quad\text{and}\quad \|G_\theta(v)\|_{C^r(D)} \le C p^{\sigma(r)} \qquad \forall\, r \in \mathbb{N},\ v \in X. \qquad (3.7)$$

Then for all β ∈ R with $0 < \beta \le \frac{(r-\ell)\lambda^* - \ell\sigma(r)}{r}$ there exists a constant C* > 0 such that for all v ∈ X and p ∈ N it holds that

$$\|L(G_\theta(v))\|_{L^2(D)} \le C^*\, p^{-\beta}. \qquad (3.8)$$
Proof. For suitable D^α, use SM Lemma B.1 with q = 2, f_1 = G(v) and f_2 = G_θ(v) together with (3.7) to find

$$\|D^\alpha(G(v) - G_\theta(v))\|_{L^2(D)} \le C(r,\lambda)\big(p^{-\lambda} h^{-\ell} + p^{\sigma(r)} h^{r-\ell}\big). \qquad (3.9)$$

Let β ∈ R with $0 < \beta \le \frac{(r-\ell)\lambda^* - \ell\sigma(r)}{r}$. We carefully balance terms by setting $h = p^{-\frac{\sigma(r)+\beta}{r-\ell}}$ and $\lambda = \frac{\ell}{r-\ell}\sigma(r) + \frac{r}{r-\ell}\beta$ to find (3.8). Conclude using Assumption 3.4.
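For completeness, with this choice of h and λ both terms in (3.9) balance at the rate p^{−β}:

$$p^{-\lambda}h^{-\ell} = p^{-\frac{\ell\sigma(r)+r\beta}{r-\ell} + \frac{\ell(\sigma(r)+\beta)}{r-\ell}} = p^{-\beta}, \qquad p^{\sigma(r)}h^{r-\ell} = p^{\sigma(r) - (\sigma(r)+\beta)} = p^{-\beta},$$

and the condition β ≤ ((r−ℓ)λ* − ℓσ(r))/r is precisely what guarantees λ ≤ λ*.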
Finally, we use Theorem 3.5 to present an alternative error estimate for a physics-informed Deep-
ONet in the case that Assumption 3.1 is satisfied. As this assumption is different from assuming
access to an error bound for the corresponding DeepONet, it is interesting to use the techniques
from the previous sections rather than directly apply Theorem 3.9. The proof of the following theo-
rem can be found in SM B.4.
Theorem 3.10. Let s, r ∈ N, T > 0, let G : C^r(T^d) → C^{(s,r)}([0,T]×T^d) be an operator that maps a function u_0 to the solution u of the PDE (2.1) with initial condition u_0, let Assumption 3.1 and Assumption 3.6 be satisfied and let p* ∈ {2, ∞} \ {p}. There exists a constant C > 0 such that for every Z, N, M ∈ N and ε, ρ > 0 there is a DeepONet G_θ : C^r(T^d) → L²([0,T]×T^d) with Z^d sensors with accuracy,

$$\|G(v) - G_\theta(v)\|_{L^2([0,T]\times\mathbb{T}^d)} \le C M^\rho\Big(\|u\|_{C^{(s,0)}}\,M^{-s} + M^{s-1}\big(\varepsilon + C^\varepsilon_{\mathrm{stab}}\,Z^{-r+d/p^*} + C^B_{\varepsilon,r}\,N^{-r}\big)\Big), \qquad (3.10)$$

and if additionally Assumption 3.3 and Assumption 3.4 hold then,

$$\|L(G_\theta(v))\|_{L^2([0,T]\times\mathbb{T}^d)} \le C M^{k+\rho}\Big(\|u\|_{C^{(s,\ell)}}\,M^{-s} + M^{s-1} N^\ell\big(\varepsilon + C^\varepsilon_{\mathrm{stab}}\,Z^{-r+d/p^*} + C^B_{\varepsilon,r}\,N^{-r}\big)\Big), \qquad (3.11)$$

for all v. Moreover, it holds that depth(β) = depth(U^ε), width(β) = O(M(Z^d + N^d width(U^ε))), depth(τ) = 3 and width(τ) = O(M N^d(N + ln(N))).
Proof. The proof (SM B.5) combines standard techniques, based on covering numbers and Hoeffd-
ing’s inequality, with an error composition from [16].
For any type of neural network architecture of depth L, width W and weights bounded by R, one
finds that dΘ ∼ LW (W + d). For tanh neural networks and operator learning architectures, one
has that ln(L) ∼ L ln(dRW ), whereas for physics-informed neural networks and DeepONets one
finds that ln(L) ∼ (k + ℓ)L ln(dRW ) with k and ℓ as in Assumption 3.4 [43, 16]. Taking this
into account, one also finds that the imposed lower bound on n is not very restrictive. Moreover,
the RHS of (3.13) depends at most polynomially on L, W, R, d, k, ℓ and c. For physics-informed
architectures, however, upper bounds on c often depend exponentially on L [16, 14].
Remark 3.12. As Theorem 3.11 is an a posteriori error estimate, one can use the network sizes of
the trained networks for L, W and R. The sizes stemming from the approximation error estimates
of the previous sections can be disregarded for this result. Moreover, instead of considering the
expected values of EG and ET in (3.13), one can also prove that such an inequality holds with a
certain probability (see SM B.5).
4 Applications
We demonstrate the power and generality of the framework proposed in Section 3 by applying the
presented theory to the following case studies. First, we demonstrate how these generic bounds can
be used to overcome the curse of dimensionality (CoD) for linear Kolmogorov PDEs and nonlinear
parabolic PDEs (Section 4.1). These are the first available results that overcome the CoD for nonlin-
ear parabolic PDEs for PINNs and (physics-informed) operator learning. Next, we apply the results
of Section 3.3 to both linear and nonlinear operators and provide bounds on the approximation error
for physics-informed operator learning.
For high-dimensional PDEs, it is not possible to obtain efficient approximation results using standard
neural network approximation theory [77, 15] as they will lead to convergence rates that suffer from
the CoD, meaning that the neural network size scales exponentially in the input dimension. In
the literature, it has been shown for some PDEs that their solution at a fixed time can be approximated to accuracy ε > 0 with a network that has size O(poly(d)ε^{−β}), with β > 0 independent of d, therefore overcoming the CoD.
Linear Kolmogorov PDEs We consider linear time-dependent PDEs of the following form.
Setting 4.1. Let s, r ∈ N, u_0 ∈ C_0^2(R^d) and let u ∈ C^{(s,r)}([0,T]×R^d) be the solution of

$$L(u)(x,t) = \partial_t u(x,t) - \tfrac{1}{2}\mathrm{Tr}\big(\sigma(x)\sigma(x)^T\,\Delta_x[u](x,t)\big) - \mu(x)^T\nabla_x[u](x,t) = 0, \qquad u(0,x) = u_0(x), \qquad (4.1)$$

for all (x,t) ∈ D × [0,T], where σ : R^d → R^{d×d} and μ : R^d → R^d are affine functions and for which ‖u‖_{C^{(s,2)}} grows at most polynomially in d. For every ε > 0, there is a neural network û_0 of width O(poly(d)ε^{−β}) such that ‖u_0 − û_0‖_{L^∞(R^d)} < ε.
Prototypical examples of such linear Kolmogorov PDEs include the heat equation and the Black-
Scholes equation. In [23, 7, 36] the authors construct a neural network that approximates u(T ) and
overcomes the CoD by emulating Monte-Carlo methods based on the Feynman-Kac formula. In
[16] it was proven that PINNs overcome the CoD as well, in the sense that the network size grows as O(poly(dρ_d)ε^{−β}), with ρ_d as defined in SM (C.10). For a subclass of Kolmogorov PDEs it is
known that ρd = poly(d), such that the CoD is fully overcome.
We demonstrate that the generic bounds of Section 3 (Theorem 3.5) can be used to provide a much
shorter proof for this result. SM Lemma C.6 verifies that Assumption 3.1 is indeed satisfied. The
full proof can be found in SM C.2.
Theorem 4.2. Assume that Setting 4.1 holds. For every σ, ε > 0 and d ∈ N, there is a tanh neural network u_θ of depth O(depth(û_0)) and width $O\big(\mathrm{poly}(d\rho_d)\,\varepsilon^{-(2+\beta)\frac{r+\sigma}{r-2}\frac{s+1}{s-1} - \frac{1+\sigma}{s-1}}\big)$ such that,

$$\|L(u_\theta)\|_{L^2([0,T]\times[0,1]^d)} + \|u_\theta - u\|_{L^2(\partial([0,T]\times[0,1]^d))} \le \varepsilon. \qquad (4.2)$$
Nonlinear parabolic PDEs Next, we consider nonlinear parabolic PDEs as in Setting 4.3, which typically arise in the context of nonlinear diffusion-reaction equations that describe the change in space and time of some quantities, such as in the well-known Allen-Cahn equation [1].
Setting 4.3. Let s, r ∈ N and for u_0 ∈ X ⊂ C^r(T^d) let u ∈ C^{(s,r)}([0,T]×T^d) be the solution of

$$L(u)(x,t) = \partial_t u(t,x) - \Delta_x u(t,x) - F(u(t,x)) = 0, \qquad u(0,x) = u_0(x), \qquad (4.3)$$

for all (t,x) ∈ [0,T]×D, with periodic boundary conditions, where F : R → R is a polynomial and for which ‖u‖_{C^{(s,2)}} grows at most polynomially in d. For every ε > 0, there is a neural network û_0 of width O(poly(d)ε^{−β}) such that ‖u_0 − û_0‖_{L^∞(T^d)} < ε. Let μ, resp. μ*, be the normalized Lebesgue measure on [0,T]×T^d, resp. ∂([0,T]×T^d).
In [32] the authors have proven that ReLU neural networks overcome the CoD in the approximation
of u(T ). We have reproven this result in SM Lemma C.14 for tanh neural networks to show that
Assumption 3.1 is satisfied. Using Theorem 3.5 we can now prove that PINNs overcome the CoD
for nonlinear parabolic PDEs. The proof is analogous to that of Theorem 4.2.
Theorem 4.4. Assume Setting 4.3. For every σ, ε > 0 and d ∈ N there is a tanh neural network u_θ of depth O(depth(û_0) + poly(d) ln(1/ε)) and width $O\big(\mathrm{poly}(d)\,\varepsilon^{-(2+\beta)\frac{r+\sigma}{r-2}\frac{s+1}{s-1} - \frac{1+\sigma}{s-1}}\big)$ such that,

$$\|L(u_\theta)\|_{L^2([0,T]\times\mathbb{T}^d,\mu)} + \|u - u_\theta\|_{L^2(\partial([0,T]\times\mathbb{T}^d),\mu^*)} \le \varepsilon. \qquad (4.4)$$
Similarly, one can use the results from Section 3.2 to obtain estimates for (physics-informed)
DeepONets for nonlinear parabolic PDEs (4.3) such as the Allen-Cahn equation. In particular, a
dimension-independent convergence rate can be obtained if the solution is smooth enough, which
improves upon the result of [43], which incurred the CoD. For simplicity, we present results for
C (2,r) functions, rather than C (s,r) functions, as we found that assuming more regularity did not
necessarily further improve the convergence rate. The proof is given in SM B.4.
Theorem 4.5. Assume Setting 4.3 and let G : X → C^r(T^d) : u_0 ↦ u(T) and G* : X → C^{(2,r)}([0,T]×T^d) : u_0 ↦ u. For every σ, ε > 0, there exist DeepONets G_θ and G*_θ such that

$$\|G - G_\theta\|_{L^2(\mathbb{T}^d\times X)} \le \varepsilon, \qquad \|L(G^*_\theta)\|_{L^2([0,T]\times\mathbb{T}^d\times X)} \le \varepsilon. \qquad (4.5)$$

Moreover, for G_θ we have $O(\varepsilon^{-\frac{d+\sigma}{r}})$ sensors and,

$$\mathrm{width}(\beta) = O\big(\varepsilon^{-\frac{(d+\sigma)(2+\beta)}{r}}\big), \quad \mathrm{depth}(\beta) = O(\ln(1/\varepsilon)), \quad \mathrm{width}(\tau) = O\big(\varepsilon^{-\frac{d+1+\sigma}{r}}\big), \quad \mathrm{depth}(\tau) = 3, \qquad (4.6)$$

whereas for G*_θ we have $O(\varepsilon^{-\frac{(3+\sigma)d}{r-2}})$ sensors and,

$$\mathrm{width}(\beta) = O\big(\varepsilon^{-1-\frac{(3+\sigma)(d+r(2+\beta))}{r-2}}\big), \quad \mathrm{depth}(\beta) = O(\ln(1/\varepsilon)), \quad \mathrm{width}(\tau) = O\big(\varepsilon^{-1-\frac{(3+\sigma)(d+1)}{r-2}}\big), \quad \mathrm{depth}(\tau) = 3. \qquad (4.7)$$
We demonstrate how Theorem 3.9 can be used to generalize available error estimates for DeepONets
and FNOs, e.g. [43, 38] and SM D.1, to estimates for their physics-informed counterparts.
Linear operators In the simplest case, the operator G of interest is linear. In [43, Theorem D.2],
a general error bound for ReLU DeepONets for linear operators has been established, which still
holds for tanh DeepONets. Using Theorem 3.9 it is then straightforward to prove convergence rates
for physics-informed DeepONets for solution operators of linear PDEs (2.1).
Consider an operator G : X → L²(T^d) : v ↦ u as in Section 2.1, where v is the parameter/initial condition and u the solution of the PDE (2.1). Following [43], we fix the measure μ on L²(T^d) as a Gaussian random field, such that v allows the Karhunen-Loève expansion $v = \sum_{k\in\mathbb{Z}^d}\alpha_k X_k e_k$, where $|\alpha_k| \le \exp(-\ell|k|)$ with ℓ > 0, the X_k ∼ N(0,1) are iid Gaussian random variables and {e_k}_{k∈Z^d} is the standard Fourier basis (SM A.5). In this setting, we can prove the following approximation result, the proof of which can be found in SM D.3. The result can be generalized to other data distributions μ for which a convergence result for DeepONets can be proven, as in [43].
Theorem 4.6. Assume the setting above and that of Assumption 3.4, and assume that G(v) ∈ C^{ℓ+1}(T^d) for all v ∈ X. For all β > 0 there exists a constant C > 0 such that for any p ∈ N there exists a DeepONet G_θ with p sensors and branch and trunk nets such that

$$\|L(G_\theta)\|_{L^2(L^2(\mathbb{T}^d),\mu)} \le C p^{-\beta}. \qquad (4.8)$$

Moreover, size(τ) ≤ C p^{(d+1)/d}, depth(τ) = 3, size(β) ≤ p and depth(β) = 1.
Nonlinear operators For nonlinear PDEs, a general result like Theorem 4.6 cannot be obtained from the currently available tools. Instead, one needs to use Theorem 3.9 for every PDE of interest
on a case-by-case basis. In the SM, we demonstrate this for a nonlinear ODE (gravity pendulum
with external force, SM D.5) and an elliptic PDE (Darcy flow, SM D.6).
generalization to PINNs is not immediate as the proof involves the emulation of the forward Euler
method. We have overcome this difficulty by constructing space-time neural networks using Taylor
expansions instead (Theorem 3.5). To bound the approximation error of PINNs one can use the
generic error bounds in Sobolev norms of e.g. [25, 26] for very general activation functions or the
more concrete bounds [15] for tanh neural networks. In both approaches, the only assumption is
that the solution of the PDE has sufficient Sobolev regularity. As a consequence, these results incur
the curse of dimensionality and are not applicable to high-dimensional PDEs. The authors of [15]
analyze PINNs based on three theoretical questions related to approximation, stability and general-
ization. Other theoretical analyses of PINNs include e.g. [71, 72, 30]. For DeepONets, convergence
rates for advection-diffusion equations are presented in [17] and a clear workflow for obtaining
generic error estimates as well as worked out examples can be found in [43]. Similar results are ob-
tained for FNOs in [38]. A comprehensive comparison of DeepONets and FNOs is the topic of [50].
To the best of the authors’ knowledge, no theoretical results for physics-informed operator learning
are currently available. Unrelated to the approximation error, we also report generic bounds on the
expected value of the generalization error of all the aforementioned deep learning architectures, in
the form of an a posteriori error estimate on the generalization error.
A second goal of the paper is to prove that deep learning-based frameworks can overcome the curse
of dimensionality (CoD). PDEs for which the curse of dimensionality has been overcome include
linear Kolmogorov PDEs e.g. [23, 36], nonlinear parabolic PDEs [32] and elliptic PDEs [4, 10, 57].
By assuming that the initial data lies in a Barron class, the authors of [52] proved for elliptic PDEs
that the Deep Ritz Method [20] can overcome the CoD. Since the Barron class is a Banach algebra
[10] it is possible that our results, which mostly only involve multiplications and additions of neural
networks, can be extended to Barron functions. For PINNs, it is proven that they can overcome
the CoD for linear Kolmogorov PDEs [16]. We give an alternative proof of this result, improve the
convergence rate (Theorem 4.2) and additionally prove that PINNs can also overcome the CoD for
nonlinear parabolic PDEs (Theorem 4.4). DeepONets and FNOs can overcome the CoD in many
cases [43, 38] but we note that this does not yet include nonlinear parabolic PDEs such as the Allen-
Cahn equation. In Theorem 4.5 we prove that dimension-independent convergence rates can be
obtained if the solution is sufficiently regular. Similar results are expected to hold for e.g. elliptic
PDEs by using the results from [4, 10, 57].
It is evident that the generic bounds presented here can only be obtained under suitable assumptions.
These should always be checked to prevent misleading claims about mathematical guarantees for
the considered deep learning methods. We briefly discuss how restrictive these are and whether they
can be relaxed. Assuming the existence of a neural network that approximates the solution of the PDE
at a fixed time (Assumption 3.1) is of course essential, but such a result can usually be obtained by
emulating an existing numerical method. Proving a bound on the Sobolev norm of that network is
always possible as we only consider smooth networks. Assumption 3.3 holds for many domains,
including rectangular and smooth ones. Assumption 3.4 and Assumption 3.6 also hold for a very
broad class of PDEs, much like the assumption on the size of the neural network approximation in
Setting 4.1 and 4.3 holds for most functions of interest. Therefore, the assumption that the PDE
solution is C (s,r) -regular seems to be the most restrictive. However, results like Theorem 3.5 could
be extended to e.g. Sobolev regular functions by using the Bramble-Hilbert lemma instead of Taylor
expansions. Another restriction is that we exclusively focused on neural networks with the tanh acti-
vation function. This was only for simplicity of exposition. All results still hold for other sigmoidal
activation functions, as well as more general smooth activation functions, which might give rise
to slightly different convergence rates. A last restriction is that the obtained rates are not optimal,
but this is not the goal of our framework. In particular, for PINNs for low-dimensional PDEs it is
beneficial to use e.g. [26, 15].
Optimizing the obtained convergence rates and comparing with optimal ones is one direction for
future research. Previously mentioned possibilities include extending to more general activation
functions and less regular functions. Another direction is to make the connection between our results
and that of [10] where they prove that Barron spaces are Banach algebras and use this to obtain
dimension-independent convergence rates for PDEs with initial data in a Barron class by emulating
numerical methods.
We have considered the approximation and generalization errors in the present analysis. It is
clear that the bounds on the generalization error may not be sharp, as in traditional deep learning.
Obtaining sharper bounds will be an interesting topic for further investigation. Finally, there is no
explicit bound on the training (optimization) errors. Obtaining such bounds will be considered in
the future.
References
[1] S. M. Allen and J. W. Cahn. A microscopic theory for antiphase boundary motion and its application to
antiphase domain coarsening. Acta metallurgica, 27(6):1085–1095, 1979.
[2] G. Bai, U. Koley, S. Mishra, and R. Molinaro. Physics informed neural networks (PINNs) for approxi-
mating nonlinear dispersive PDEs. arXiv preprint arXiv:2104.05584, 2021.
[3] A. Barth, A. Jentzen, A. Lang, and C. Schwab. Numerical Analysis of Stochastic Ordinary Differential
Equations. ETH Zürich, 2018.
[4] C. Beck, L. Gonon, and A. Jentzen. Overcoming the curse of dimensionality in the numerical
approximation of high-dimensional semilinear elliptic partial differential equations. arXiv preprint
arXiv:2003.00596, 2020.
[5] C. Beck, F. Hornung, M. Hutzenthaler, A. Jentzen, and T. Kruse. Overcoming the curse of dimensionality
in the numerical approximation of Allen-Cahn partial differential equations via truncated full-history
recursive multilevel Picard approximations. Journal of Numerical Mathematics, 28(4):197–222, 2020.
[6] C. Beck, A. Jentzen, and B. Kuckuck. Full error analysis for the training of deep neural networks, 2020.
[7] J. Berner, P. Grohs, and A. Jentzen. Analysis of the generalization error: Empirical risk minimization over
deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of
Black-Scholes partial differential equations. SIAM Journal on Mathematics of Data Science, 2(3):631–
657, 2020.
[8] S. Cai, Z. Wang, L. Lu, T. A. Zaki, and G. E. Karniadakis. DeepM&Mnet: Inferring the electroconvec-
tion multiphysics fields based on operator approximation by neural networks. Journal of Computational
Physics, 436:110296, 2021.
[9] T. Chen and H. Chen. Universal approximation to nonlinear operators by neural networks with arbitrary
activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks,
6(4):911–917, 1995.
[10] Z. Chen, J. Lu, and Y. Lu. On the representation of solutions to elliptic PDEs in Barron spaces. arXiv
preprint arXiv:2106.07539, 2021.
[11] A. Cohen, R. Devore, and C. Schwab. Analytic regularity and polynomial approximation of parametric
and stochastic elliptic PDEs. Analysis and Applications, 9(01):11–47, 2011.
[12] G. Constantine and T. Savits. A multivariate Faa di Bruno formula with applications. Transactions of the
American Mathematical Society, 348(2):503–520, 1996.
[13] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals
and systems, 2(4):303–314, 1989.
[14] T. De Ryck, A. D. Jagtap, and S. Mishra. Error estimates for physics informed neural networks approxi-
mating the Navier-Stokes equations. arXiv preprint arXiv:2203.09346, 2022.
[15] T. De Ryck, S. Lanthaler, and S. Mishra. On the approximation of functions by tanh neural networks.
Neural Networks, 2021.
[16] T. De Ryck and S. Mishra. Error analysis for physics informed neural networks (PINNs) approximating
Kolmogorov PDEs. arXiv preprint arXiv:2106.14473, 2021.
[17] B. Deng, Y. Shin, L. Lu, Z. Zhang, and G. E. Karniadakis. Convergence rate of DeepONets for learning
operators arising from advection-diffusion equations. arXiv preprint arXiv:2102.10621, 2021.
[18] M. Dissanayake and N. Phan-Thien. Neural-network-based approximations for solving partial differential
equations. Communications in Numerical Methods in Engineering, 1994.
[19] W. E, J. Han, and A. Jentzen. Deep learning-based numerical methods for high-dimensional parabolic
partial differential equations and backward stochastic differential equations. Communications in Mathe-
matics and Statistics, 5(4):349–380, 2017.
[20] W. E and B. Yu. The deep Ritz method: a deep learning-based numerical algorithm for solving variational
problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
[21] R. A. Fisher. The wave of advance of advantageous genes. Annals of eugenics, 7(4):355–369, 1937.
[22] S. Goswami, M. Yin, Y. Yu, and G. E. Karniadakis. A physics-informed variational DeepONet for pre-
dicting crack path in quasi-brittle materials. Computer Methods in Applied Mechanics and Engineering,
391:114587, 2022.
[23] P. Grohs, F. Hornung, A. Jentzen, and P. Von Wurstemberger. A proof that artificial neural networks
overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential
equations. arXiv preprint arXiv:1809.02362, 2018.
[24] P. Grohs, F. Hornung, A. Jentzen, and P. Zimmermann. Space-time error estimates for deep neural network
approximations for differential equations. arXiv preprint arXiv:1908.03833, 2019.
[25] I. Gühring, G. Kutyniok, and P. Petersen. Error bounds for approximations with deep ReLU neural
networks in W s,p norms. Analysis and Applications, 18(05):803–859, 2020.
[26] I. Gühring and M. Raslan. Approximation rates for neural networks with encodable weights in smoothness
spaces. Neural Networks, 134:107–130, 2021.
[27] W. H. Guss and R. Salakhutdinov. On universal approximation by neural networks with uniform guaran-
tees on approximation of infinite dimensional maps. arXiv preprint arXiv:1910.01545, 2019.
[28] P. Henry-Labordere. Counterparty risk valuation: A marked branching diffusion approach. Available at
SSRN 1995503, 2012.
[29] P. Henry-Labordere, X. Tan, and N. Touzi. A numerical algorithm for a class of BSDEs via the branching
process. Stochastic Processes and their Applications, 124(2):1112–1140, 2014.
[30] B. Hillebrecht and B. Unger. Certified machine learning: A posteriori error estimation for physics-
informed neural networks. arXiv preprint arXiv:2203.17055, 2022.
[31] F. Hornung, A. Jentzen, and D. Salimova. Space-time deep neural network approximations for high-
dimensional partial differential equations. arXiv preprint arXiv:2006.02199, 2020.
[32] M. Hutzenthaler, A. Jentzen, T. Kruse, and T. A. Nguyen. A proof that rectified deep neural networks
overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN
partial differential equations and applications, 1(2):1–34, 2020.
[33] M. Hutzenthaler, A. Jentzen, B. Kuckuck, and J. L. Padgett. Strong Lp -error analysis of nonlinear
Monte Carlo approximations for high-dimensional semilinear partial differential equations. arXiv preprint
arXiv:2110.08297, 2021.
[34] A. D. Jagtap and G. E. Karniadakis. Extended physics-informed neural networks (XPINNs): A general-
ized space-time domain decomposition based deep learning framework for nonlinear partial differential
equations. Communications in Computational Physics, 28(5):2002–2041, 2020.
[35] A. D. Jagtap, E. Kharazmi, and G. E. Karniadakis. Conservative physics-informed neural networks on dis-
crete domains for conservation laws: Applications to forward and inverse problems. Computer Methods
in Applied Mechanics and Engineering, 365:113028, 2020.
[36] A. Jentzen, D. Salimova, and T. Welti. A proof that deep artificial neural networks overcome the curse of
dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant
diffusion and nonlinear drift coefficients. arXiv preprint arXiv:1809.07321, 2018.
[37] A. N. Kolmogorov. Étude de l’équation de la diffusion avec croissance de la quantité de matière et son
application à un problème biologique. Bull. Univ. Moskow, Ser. Internat., Sec. A, 1:1–25, 1937.
[38] N. Kovachki, S. Lanthaler, and S. Mishra. On universal approximation and error bounds for Fourier
Neural Operators. arXiv preprint arXiv:2107.07562, 2021.
[39] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar. Neural
operator: Learning maps between function spaces. arXiv preprint arXiv:2108.08481v3, 2021.
[40] G. Kutyniok, P. Petersen, M. Raslan, and R. Schneider. A theoretical analysis of deep neural networks
and parametric PDEs. Constructive Approximation, pages 1–53, 2021.
[41] I. E. Lagaris, A. Likas, and D. G. Papageorgiou. Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11(5):1041–1049, 2000.
[42] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998.
[43] S. Lanthaler, S. Mishra, and G. E. Karniadakis. Error estimates for DeepONets: A deep learning frame-
work in infinite dimensions. Transactions of Mathematics and Its Applications, 6(1):tnac001, 2022.
[44] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar. Fourier
neural operator for parametric partial differential equations, 2020.
[45] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar.
Neural operator: Graph kernel network for partial differential equations. CoRR, abs/2003.03485, 2020.
[46] Z. Li, N. B. Kovachki, K. Azizzadenesheli, B. Liu, A. M. Stuart, K. Bhattacharya, and A. Anandkumar.
Multipole graph neural operator for parametric partial differential equations. In H. Larochelle, M. Ran-
zato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems
(NeurIPS), volume 33, pages 6755–6766. Curran Associates, Inc., 2020.
[47] Z. Li, H. Zheng, N. Kovachki, D. Jin, H. Chen, B. Liu, K. Azizzadenesheli, and A. Anandkumar. Physics-
informed neural operator for learning partial differential equations. arXiv preprint arXiv:2111.03794,
2021.
[48] C. Lin, Z. Li, L. Lu, S. Cai, M. Maxey, and G. E. Karniadakis. Operator learning for predicting multiscale
bubble growth dynamics. The Journal of Chemical Physics, 154(10):104118, 2021.
[49] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet
based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229,
2021.
[50] L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, and G. E. Karniadakis. A comprehensive and
fair comparison of two neural operators (with practical extensions) based on fair data. Computer Methods
in Applied Mechanics and Engineering, 393:114778, 2022.
[51] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis. DeepXDE: A deep learning library for solving differential
equations. SIAM Review, 63(1):208–228, 2021.
[52] Y. Lu, J. Lu, and M. Wang. A priori generalization analysis of the deep Ritz method for solving high
dimensional elliptic partial differential equations. In Conference on Learning Theory, pages 3196–3241.
PMLR, 2021.
[53] K. O. Lye, S. Mishra, and D. Ray. Deep learning observables in computational fluid dynamics. Journal
of Computational Physics, page 109339, 2020.
[54] K. O. Lye, S. Mishra, D. Ray, and P. Chandrashekar. Iterative surrogate model optimization (ISMO): An
active learning algorithm for pde constrained optimization with deep neural networks. Computer Methods
in Applied Mechanics and Engineering, 374:113575, 2021.
[55] Z. Mao, A. D. Jagtap, and G. E. Karniadakis. Physics-informed neural networks for high-speed flows.
Computer Methods in Applied Mechanics and Engineering, 360:112789, 2020.
[56] Z. Mao, L. Lu, O. Marxen, T. A. Zaki, and G. E. Karniadakis. DeepM&Mnet for hypersonics: Predicting
the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of
operators. Journal of Computational Physics, 447:110698, 2021.
[57] T. Marwah, Z. Lipton, and A. Risteski. Parametric complexity bounds for approximating pdes with neural
networks. Advances in Neural Information Processing Systems, 34:15044–15055, 2021.
[58] H. P. McKean. Application of brownian motion to the equation of Kolmogorov-Petrovskii-Piskunov.
Communications on pure and applied mathematics, 28(3):323–331, 1975.
[59] S. Mishra and R. Molinaro. Estimates on the generalization error of physics-informed neural networks
for approximating a class of inverse problems for PDEs. IMA Journal of Numerical Analysis, 2021.
[60] S. Mishra and R. Molinaro. Physics informed neural networks for simulating radiative transfer. Journal
of Quantitative Spectroscopy and Radiative Transfer, 270:107705, 2021.
[61] S. Mishra and R. Molinaro. Estimates on the generalization error of physics informed neural networks
(PINNs) for approximating PDEs. IMA Journal of Numerical Analysis, 2022.
[62] B. Øksendal. Stochastic differential equations. Springer, 2003.
[63] J. A. Opschoor, P. C. Petersen, and C. Schwab. Deep ReLU networks and high-order finite element
methods. Analysis and Applications, 18(05):715–770, 2020.
[64] J. A. Opschoor, C. Schwab, and J. Zech. Exponential ReLU DNN expression of holomorphic maps in
high dimension. Constructive Approximation, pages 1–46, 2021.
[65] G. Pang, L. Lu, and G. E. Karniadakis. fPINNs: Fractional physics-informed neural networks. SIAM
journal of Scientific computing, 41:A2603–A2626, 2019.
[66] J. Pathak, S. Subramanian, P. Harrington, S. Raja, A. Chattopadhyay, M. Mardani, T. Kurth, D. Hall, Z. Li, K. Azizzadenesheli, P. Hassanzadeh, K. Kashinath, and A. Anandkumar. FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators. arXiv preprint arXiv:2202.11214, 2022.
[67] M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differen-
tial equations. Journal of Computational Physics, 357:125–141, 2018.
[68] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deep learning
framework for solving forward and inverse problems involving nonlinear partial differential equations.
Journal of Computational Physics, 378:686–707, 2019.
[69] M. Raissi, A. Yazdani, and G. E. Karniadakis. Hidden fluid mechanics: A Navier-Stokes informed deep
learning framework for assimilating flow visualization data. arXiv preprint arXiv:1808.04327, 2018.
[70] C. Schwab and J. Zech. Deep learning in high dimension: Neural network expression rates for generalized
polynomial chaos expansions in UQ. Analysis and Applications, 17(01):19–55, 2019.
[71] Y. Shin, J. Darbon, and G. E. Karniadakis. On the convergence and generalization of physics informed
neural networks. arXiv preprint arXiv:2004.01806, 2020.
[72] Y. Shin, Z. Zhang, and G. E. Karniadakis. Error estimates of residual minimization using neural networks
for linear equations. arXiv preprint arXiv:2010.08019, 2020.
[73] S. Wang and P. Perdikaris. Long-time integration of parametric evolution equations with physics-informed
DeepONets. arXiv preprint arXiv:2106.05384, 2021.
[74] S. Wang, H. Wang, and P. Perdikaris. Learning the solution operator of parametric partial differential
equations with physics-informed DeepOnets. arXiv preprint arXiv:2103.10974, 2021.
[75] J. Yang, Q. Du, and W. Zhang. Uniform Lp -bound of the Allen-Cahn equation and its numerical dis-
cretization. International Journal of Numerical Analysis & Modeling, 15, 2018.
[76] L. Yang, X. Meng, and G. E. Karniadakis. B-PINNs: Bayesian physics-informed neural networks for
forward and inverse pde problems with noisy data. Journal of Computational Physics, 425:109913, 2021.
[77] D. Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114,
2017.
Checklist
1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper’s
contributions and scope? [Yes] See Section 3 and Section 4.
(b) Did you describe the limitations of your work? [Yes] See Section 5.
(c) Did you discuss any potential negative societal impacts of your work? [Yes] See
Section 5.
(d) Have you read the ethics review guidelines and ensured that your paper conforms to
them? [Yes]
2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes] All assump-
tions are either stated in the theorem statement or described in the text right above the
theorem statement.
(b) Did you include complete proofs of all theoretical results? [Yes] For each result we
mention where the proof can be found.
3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experi-
mental results (either in the supplemental material or as a URL)? [N/A]
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they
were chosen)? [N/A]
(c) Did you report error bars (e.g., with respect to the random seed after running experi-
ments multiple times)? [N/A]
(d) Did you include the total amount of compute and the type of resources used (e.g., type
of GPUs, internal cluster, or cloud provider)? [N/A]
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [N/A]
(b) Did you mention the license of the assets? [N/A]
(c) Did you include any new assets either in the supplemental material or as a URL? [N/A]
(d) Did you discuss whether and how consent was obtained from people whose data
you’re using/curating? [N/A]
(e) Did you discuss whether the data you are using/curating contains personally identifi-
able information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if
applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review
Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount
spent on participant compensation? [N/A]
A Notation and preliminaries
We introduce notation and preliminary results regarding finite differences, Sobolev spaces, the Leg-
endre basis, the Fourier basis, trigonometric polynomial interpolation and neural network approxi-
mation theory.
For h > 0, α ∈ N_0^d, r ∈ N and ℓ := ‖α‖_1, we define a finite difference operator Δ^{α,r}_h as,

$$\Delta^{\alpha,r}_h[f](t,x) = \sum_j c^{\alpha,r}_j\, f\big(t, x + h\,b^{\alpha,r}_j\big), \qquad (A.1)$$

for f ∈ C^{r+ℓ}(R^d), where the number of non-zero terms in the summation can be chosen to be finite and only dependent on ℓ and r, and where the choice of b^{α,r}_j ∈ R^d allows one to approximate D_x^α f up to accuracy O(h^r). This means that for any f ∈ C^{r+ℓ}(R^d) it holds for all x that,

$$\big|h^{-\ell}\cdot\Delta^{\alpha,r}_h[f](t,x) - D_x^\alpha f(t,x)\big| \le c_{\ell,r}\,\|f(t,\cdot)\|_{C^{r+\ell}}\,h^r \quad\text{for } h > 0, \qquad (A.2)$$

where c_{ℓ,r} > 0 does not depend on f and h. Similarly, we can define a finite difference operator Δ^{k,s}_{h,t}[f](t,x) to approximate D_t^k f(t,x) to accuracy O(h^s).
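As a quick numerical check of (A.2) for one concrete, illustrative stencil: the central difference Δ_h[f](x) = (f(x+h) − f(x−h))/2 with ℓ = 1 and r = 2, so that h^{−1}Δ_h[f] approximates f′ with error O(h²).

```python
import numpy as np

# Verify the O(h^r) accuracy in (A.2) for a central-difference stencil with
# ell = |alpha| = 1 and r = 2, applied to f = sin (so f' = cos).
f, df = np.sin, np.cos
x = np.linspace(0.0, 2.0 * np.pi, 50)
for h in [1e-1, 1e-2, 1e-3]:
    approx = (f(x + h) - f(x - h)) / (2.0 * h)       # h^{-ell} * Delta_h[f]
    print(f"h = {h:g}: max error = {np.max(np.abs(approx - df(x))):.2e}")
# the error drops by about a factor 100 per factor 10 in h, i.e. rate O(h^2)
```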
A.3 Sobolev spaces
Let d ∈ N, k ∈ N_0, 1 ≤ p ≤ ∞ and let Ω ⊆ R^d be open. For a function f : Ω → R and a (multi-)index α ∈ N_0^d we denote by

$$D^\alpha f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}} \qquad (A.3)$$

the classical or distributional (i.e. weak) derivative of f. We denote by L^p(Ω) the usual Lebesgue space and we define the Sobolev space W^{k,p}(Ω) as

$$W^{k,p}(\Omega) = \{f \in L^p(\Omega) : D^\alpha f \in L^p(\Omega) \text{ for all } \alpha \in \mathbb{N}_0^d \text{ with } |\alpha| \le k\}. \qquad (A.4)$$

For p < ∞, we define the following seminorms on W^{k,p}(Ω),

$$|f|_{W^{m,p}(\Omega)} = \Big(\sum_{|\alpha|=m}\|D^\alpha f\|^p_{L^p(\Omega)}\Big)^{1/p} \quad\text{for } m = 0,\dots,k. \qquad (A.5)$$

Based on these seminorms, we can define the following norm for p < ∞,

$$\|f\|_{W^{k,p}(\Omega)} = \Big(\sum_{m=0}^{k}|f|^p_{W^{m,p}(\Omega)}\Big)^{1/p}. \qquad (A.7)$$

The space W^{k,p}(Ω) equipped with the norm ‖·‖_{W^{k,p}(Ω)} is a Banach space. We denote by C^k(Ω) the space of functions that are k times continuously differentiable and equip this space with the norm ‖f‖_{C^k(Ω)} = ‖f‖_{W^{k,∞}(Ω)}.
Lemma A.1 (Continuous Sobolev embedding). Let d, ℓ ∈ N and let k ≥ d/2 + ℓ. Then there exists a constant C > 0 such that for any f ∈ H^k(T^d) it holds that

$$\|f\|_{C^\ell(\mathbb{T}^d)} \le C\,\|f\|_{H^k(\mathbb{T}^d)}. \qquad (A.9)$$
constitute an orthonormal basis of L²([−1,1]^d, λ/2^d). By considering the lexicographic order on N_0^d, of which we denote the enumeration by κ : N → N_0^d, one can define an ordered basis (L_j)_{j∈N} by setting L_j := L_{κ(j)}. From [64, eq. (2.19)] it also follows that,

$$\forall\, s \in \mathbb{N}_0,\ \nu \in \mathbb{N}_0^d: \quad \|L_\nu\|_{C^s([-1,1]^d)} \le \prod_{j=1}^{d}(1 + 2\nu_j)^{1/2 + 2s}. \qquad (A.12)$$
A.5 Notation for Standard Fourier basis
Using the notation from [43], we introduce the following "standard" real Fourier basis {e_κ}_{κ∈Z^d} in d dimensions. For κ = (κ_1, ..., κ_d) ∈ Z^d, we let σ(κ) be the sign of the first non-zero component of κ and we define

$$e_\kappa := C_\kappa\begin{cases}1, & \sigma(\kappa) = 0,\\ \cos\langle\kappa, x\rangle, & \sigma(\kappa) = 1,\\ \sin\langle\kappa, x\rangle, & \sigma(\kappa) = -1,\end{cases} \qquad (A.13)$$

where the factor C_κ > 0 ensures that e_κ is properly normalized, i.e. that ‖e_κ‖_{L²(T^d)} = 1. Next, let κ : N → Z^d be a fixed enumeration of Z^d, with the property that j ↦ |κ(j)|_∞ is monotonically increasing, i.e. such that j ≤ j′ implies that |κ(j)|_∞ ≤ |κ(j′)|_∞. This will allow us to introduce an N-indexed version of the Fourier basis,

$$e_j(x) := e_{\kappa(j)}(x), \qquad \forall\, j \in \mathbb{N}. \qquad (A.14)$$

Finally we note that

$$\|e_\kappa\|_{C^s([0,2\pi]^d)} \le \|\kappa\|_\infty^s.$$

where,

$$a_{k,j} = \begin{cases}1, & \sigma(k) = 0,\\ \cos\langle k, x_j\rangle, & \sigma(k) = 1,\\ \sin\langle k, x_j\rangle, & \sigma(k) = -1.\end{cases} \qquad (A.19)$$
A.7 Neural network approximation theory
We recall some basic results on the approximation of functions by tanh neural networks in this
section. All results are adaptations from results in [15]. The following two lemmas address the
approximation of univariate monomials and the multiplication operator.
Lemma A.3 (Approximation of univariate monomials, Lemma 3.2 in [15]). Let k ∈ N0 , s ∈ 2N− 1,
M > 0 and define fp : [−M, M ] → R : x 7→ xp for all p ∈ N. For every ε > 0, there exists a
shallow tanh neural network ψs,ε : [−M, M ] → Rs of width 3(s+1)
2 such that
Lemma A.4 (Shallow approximation of multiplication of d numbers, Corollary 3.7 in [15]). Let
d ∈ N, k ∈ N0 and M > 0. Then l mfor every ε > 0, there exist a shallow tanh neural network
b εd : [−M, M ]d → R of width 3 d+1 Pd,d (or 4 if d = 2) such that
× 2
d
Y
b εd (x) −
× xi ≤ ε. (A.23)
i=1
W k,∞
Lemma B.1. Let q ∈ [1, ∞], r, ℓ ∈ N with ℓ ≤ r and f_1, f_2 ∈ C^{(0,r)}([0,T]×D). If Assumption 3.3 holds then there exists a constant C(r) > 0 such that for any α ∈ N_0^d with ℓ := ‖α‖_1 it holds that

$$\|D_x^\alpha(f_1 - f_2)\|_{L^q} \le C\big(\|f_1 - f_2\|_{L^q}\,h^{-\ell} + \max_{j=1,2}\|f_j\|_{C^{(0,r)}}\,h^{r-\ell}\big) \qquad \forall\, h > 0. \qquad (B.1)$$

Proof. From the triangle inequality and (A.2) the existence of a constant C(r) > 0 follows such that,

$$\|D_x^\alpha(f_1 - f_2)\|_{L^q} \le \max_{j=1,2}\big\|D^\alpha f_j - h^{-\ell}\cdot\Delta^{\alpha,r}_h[f_j]\big\|_{L^q} + C(r)\,h^{-\ell}\,\|f_1 - f_2\|_{L^q}$$
Lemma B.2. Using the notation of the proof of Theorem 3.5 (SM B.2), it holds that

$$\big\|D^{(k,\alpha)}(\tilde u - \hat u)\big\|_{C^0} \le \delta. \qquad (B.2)$$

Proof. Using the Faà di Bruno formula [12] and its consequences for estimating the norms of derivatives of compositions [15, Lemma A.7] one can prove for sufficiently regular functions g_1, g_2, h_1, h_2 and a suitable multi-index β estimates of the form, assuming that the compositions are well-defined and where the constant C > 0 may depend on g_1, g_2, h_1, h_2 and their derivatives. Using this theorem we can prove that

$$\Big|D^{(k,\alpha)}\hat u - D^{(k,\alpha)}\sum_{m=1}^{M}\sum_{i=0}^{s-1}\frac{\Delta^{i,s-i}_{1/M,t}[\hat u^\varepsilon_m](t_m,x)}{M^{-i}\,i!}\,\hat\varphi^\delta_i(t-t_m)\cdot\Phi^M_m(t)\Big| < C\delta. \qquad (B.4)$$

Because the size of the neural network $\widehat{\times}^\delta$ in the definition of $\hat u$ does not depend on its accuracy δ (see Lemma A.4), we can rescale δ and therefore set C = 1/2 in the above inequality.
Next, we observe that,

$$D^{(k,\alpha)}\sum_{m=1}^{M}\sum_{i=0}^{s-1}\frac{\Delta^{i,s-i}_{1/M,t}[\hat u^\varepsilon_m](t_m,x)}{M^{-i}\,i!}\cdot(\hat\varphi^\delta_i - \varphi_i)(t-t_m)\cdot\Phi^M_m(t) = \sum_{m=1}^{M}\sum_{i=0}^{s-1}\frac{\Delta^{i,s-i}_{1/M,t}[D_x^\alpha\hat u^\varepsilon_m](t_m,x)}{M^{-i}\,i!}\cdot\sum_{n=0}^{k}\binom{k}{n}\,\partial_t^n(\hat\varphi^\delta_i - \varphi_i)(t-t_m)\cdot\partial_t^{k-n}\Phi^M_m(t), \qquad (B.5)$$

$$\big\|D^{(k,\alpha)}(\tilde u - \hat u)\big\|_{C^0} \le \|(B.4)\|_{C^0} + \|(B.5)\|_{C^0} \le \delta. \qquad (B.6)$$
Lemma B.3. Let Δ^{k,s}_{h,t} be a finite difference operator, cf. Section 3.3 and SM A.2, let 1 ≤ j ≤ d, let 1 ≤ q ≤ ∞, let ℓ ∈ N_0 and let α ∈ N_0^d with ‖α‖_1 = ℓ. Let u, û ∈ C^{(s,ℓ)}([−2h, 2h] × D) be such that for all t ∈ [−2h, 2h],

$$\big\|D_x^\alpha\big(u(t,\cdot) - \hat u(t,\cdot)\big)\big\|_{L^q(D)} \le \varepsilon. \qquad (B.7)$$

Then there exists c_s > 0 such that,

$$\Big\|D^{k,\alpha}\Big(\sum_{i=0}^{s-1}\frac{\Delta^{i,s-i}_{h,t}[\hat u](0,x)}{h^i\,i!}\,t^i - u(t,\cdot)\Big)\Big\|_{L^q} \le c_s\big(\varepsilon h^{-k} + |D_x^\alpha u|_{C^{(s,0)}}\,h^{s-k}\big). \qquad (B.8)$$
Proof. Let t ∈ [−2h, 2h], α ∈ N_0^d with ‖α‖_1 = ℓ and x ∈ R^d be arbitrary. We first observe that,

$$D^{k,\alpha}\sum_{i=0}^{s-1}\frac{\Delta^{i,s-i}_{h,t}[u](0,x)}{h^i\,i!}\,t^i = \sum_{i=k}^{s-1}\frac{\Delta^{i,s-i}_{h,t}[D_x^\alpha u](0,x)}{h^i\,(i-k)!}\,t^{i-k}. \qquad (B.9)$$

Taylor's theorem then guarantees the existence of ξ_{t,x} ∈ [−2h, 2h] such that

$$D^{k,\alpha}\Big(\sum_{i=0}^{s-1}\frac{\Delta^{i,s-i}_{h,t}[u](0,x)}{h^i\,i!}\,t^i - u(t,\cdot)\Big) = \sum_{i=0}^{s-1-k}\Big(\frac{\Delta^{i+k,s-i-k}_{h,t}[D_x^\alpha u](0,x)}{h^{i+k}\,i!}\,t^i - \frac{D^{i+k,\alpha}u(0,x)}{i!}\,t^i\Big) + \frac{D^{s,\alpha}u(\xi_{t,x},x)}{(s-k)!}\,t^{s-k}. \qquad (B.10)$$

Now observe that because of assumption (B.7) and the definition and properties (A.2) of the finite difference operator, there exists a constant C_s > 0 such that,

$$\big\|\Delta^{i+k,s-i-k}_{h,t}[D_x^\alpha\hat u](0,x) - \Delta^{i+k,s-i-k}_{h,t}[D_x^\alpha u](0,x)\big\|_{L^q} \le C_s\,\varepsilon, \qquad \Big|\frac{\Delta^{i+k,s-i-k}_{h,t}[D_x^\alpha u](0,x)}{h^{i+k}} - D^{i+k,\alpha}u(0,x)\Big| \le C_s\,|D_x^\alpha u|_{C^{(s,0)}}\,h^{s-i-k}. \qquad (B.11)$$

Combining all previous results provides us with the existence of a constant c_s > 0 such that,

$$\Big\|D^{k,\alpha}\Big(\sum_{i=0}^{s-1}\frac{\Delta^{i,s-i}_{h,t}[\hat u](0,x)}{h^i\,i!}\,t^i - u(t,\cdot)\Big)\Big\|_{L^q} \le \sum_{i=0}^{s-1-k}\Big(\frac{C_s\,\varepsilon}{h^{i+k}\,i!}\,h^i + \frac{C_s}{i!}\,|D_x^\alpha u|_{C^{(s,0)}}\,h^{s-i-k}\,h^i\Big) + \frac{1}{(s-k)!}\,|D_x^\alpha u|_{C^{(s,0)}}\,h^{s-k} \le c_s\big(\varepsilon h^{-k} + |D_x^\alpha u|_{C^{(s,0)}}\,h^{s-k}\big). \qquad (B.12)$$
20
Definition B.4. Let C > 0, N ∈ N, 0 < ε < 1 and α = ln CN k /ε . For every 1 ≤ j ≤ N , we
define the function ΦN
j : [0, T ] → [0, 1] by
!
N 1 1 T
Φ1 (t) = − σ α t − ,
2 2 N
! !
N 1 T (j − 1) 1 Tj
Φj (t) = σ α t − − σ α t− , (B.13)
2 N 2 N
!
N 1 T (N − 1) 1
ΦN (t) = σ α t − + .
2 N 2
The functions {ΦNj }j approximate a partition of unity in the sense that for every j it holds on Ij
N
f − pN
j = h max i Dtℓ (f (t, ·) − pN
j (t, ·)) ≤ Cℓ∗ N −s+ℓ + ξ.
C ℓ (JjN ,Lq (µ)) t∈
(j−2)T (j+1)T
, N Lq (µ)
N
(B.15)
Let Ck := max{max0≤ℓ≤k Cℓ∗ , kf kC k ([0,T ],Lq (µ)) , 1}. There exists a constant C(k) > 0 that only
depends on k such that for all N ≥ 3 it holds that,
N
X
Ck
f− pN N
j · Φj ≤ C lnk (N ) + ξN k
. (B.16)
j=1
N s−k
C k ([0,T ],Lq (µ))
Proof. We follow the proof of [15, Theorem 5.1]. All steps of the proofs are identical, with the only
difference being that the W k,∞ ([0, 1]d )-norm of [15] is replaced by the C k ([0, T ], L2(µ))-norm
in this work. Following [15], one divides the domain [0, T ] into intervals IiN = [ti−1 , ti ], with
ti = iT /N and N ∈ N large enough. On each of these intervals, f locally can be approximated
(in Sobolev norm) by pNj , by virtue of the assumptions of the theorem. A global approximation can
then be constructed by multiplying each pN j with an approximation of the indicator function of the
corresponding intervals and summing over all intervals.
We now highlight the main steps in the proof. Step 2a (as in [15]) results in the following estimate,
!
XN k
CN
f− f · ΦNj ≤ Ckf kC k (I N ,Lq (µ)) ε + N k+1 lnk ε . (B.17)
j=1
i ε
C k (IiN ,Lq (µ))
Step 2b results in the estimate,
N
!
X CN k Ck
(f − pN
j ) · ΦN,d
j ≤ C ln k k
+ ξN + Ck N k+1
ε , (B.18)
j=1
ε N s−k
C k (IiN ,Lq (µ))
21
In particular, if we set N k+1 ε = N −s+k and N ≥ 3, then we find that
N
" #
X kf kC k (I N ,Lq (µ)) + Ck
N N k i k
f− p j · Φj ≤ C ln (N ) + ξN . (B.20)
j=1
N s−k
C k ([0,T ],Lq (µ))
Lemma B.6. Let Gθ : X → H be a tanh FNO with grid size N ∈ N and let B > 0. For every
ε > 0, there exists a tanh DeepONet Gθ∗ : X → H with N d sensors and N d branch and trunk nets
such that
sup sup Gθ∗ (v)(x) − Gθ (v)(x) ≤ ε. (B.21)
kvkL∞ ≤B x∈Td
Furthermore, width(β) ∼ N d , depth(β) ∼ ln(N ), width(τ ) ∼ N d (N + ln N/ε ) and
depth(τ ) = 3.
Proof. This is a consequence of [38, Theorem 36] and Lemma D.1 with ε ← N d ε.
Proof. Step 1: construction. To define the approximation, we divide [0, T ] into M subintervals
of the form [tm−1 , tm ], where tm = mT /M with 1 ≤ m ≤ M . One could approximate u on
every subinterval by an s-th order accurate Taylor approximation around tm , provided that one has
access to Dti u(·, tm ) for 0 ≤ i ≤ s − 1. As those values are unknown, we resort to the finite
difference approximation Dti u(·, tm ) ≈ M i · ∆i,s−i 1/M,t [U (u0 , tm )], which is a neural network. See
ε
SM A.2 for an overview of the notation for finite difference operators. Moreover, we replace the
univariate monomials ϕi : [0, T ] → R : t 7→ ti in the Taylor approximation by neural networks
s−1
bδi : [0, T ] → R with kϕi − ϕ
ϕ bδi kC k+1 . δ. Lemma A.3 guarantees that the output of (ϕ bδi )i=1 can
be obtained using a shallow network with width 2(s + 1) (independent of δ). The multiplication
operator is replaced by a shallow neural network × b δ : [−a, a]2 → R (for suitable a > 0) for which
k × −× b δ kC k+1 . δ. By Lemma A.4 only four neurons are needed for this network. This results in
the following approximation for f ∈ C 0 ([0, T ] × D),
s−1 i,s−i
X ∆1/M,t [f ](t m , x)
Nbmδ
[f ](t, x) := bδ
× bδi (t − tm ) ∀t ∈ [0, T ], x ∈ D, 1 ≤ m ≤ M.
,ϕ
M −i i!
i=0
(B.22)
Next, we patch together these individual approximations by (approximately) multiplying them with
a NN approximation of a partition of unity, denoted by ΦM
1 , . . . , ΦM : [0, T ] → [0, 1], as introduced
M
function on [tm−1 , tm ]. For any ε, δ > 0, we then define our final neural network approximation
b : [0, T ] × D → R as,
u
M
X
b(t, x) :=
u bδ N
× bm
δ
[U ε (u0 , tm )](t, x), ΦM
m (t) ∀t ∈ [0, T ], x ∈ D. (B.23)
m=1
Step 2: error estimate. In order to facilitate the proof, we introduce the intermediate approxima-
e : [0, T ] × D → R and Nm : C 0 (D) × [0, T ] × D → R by,
tions u
M
X s−1 ∆i,s−i [b
M X
X ε
1/M,t um ](tm , x)
e(t, x) :=
u uεm ](t, x)·ΦM
Nm [b m (t) := ·ϕi (t−tm )·ΦM
m (t), (B.24)
m=1 m=1 i=0
M −i i!
22
It remains to prove that D(k,α) u
e ≈ D(k,α) u. Combining the observation that D(k,α) Nm [b uεm ] =
k α ε
Dt Nm [Dx u bm ] with Lemma B.3 lets us conclude that for all 0 ≤ k ≤ s − 1 and t ∈ [tm−2 , tm+2 ],
D(k,α) (Nm [b
uεm ](t, ·) − u(t, ·)) ≤ C(r)M k ( Dxα (b
uεm − u)(·, tm ) Lq
+ |u|C (s,ℓ) M −s )
Lq
(B.25)
We use Theorem B.5 with f ← u, pN α ε
bm ], ξ ← C(r)M k
j ← Nm [Dx u uεm − u)(·, tm ) Lq ,
Dxα (b
Cℓ ← C(s)|u|C (k,ℓ) , N ← M to find that,
∗
D(k,α) (b
u − u) ≤ C lnk (M )(kukC (s,ℓ) M k−s + M 2k Dxα (b
uεm − u)(·, tm ) Lq
), (B.26)
Lq
where C(r, s) > 0 only depend on r and s. Finally, using Lemma B.1 to bound
uεm − u)(·, tm ) Lq and combining this with Assumption 3.1 proves (3.4).
Dxα (b
Step 3: size estimate. The following holds,
u) ≤ Cdepth(U ε ), width(b
depth(b u) ≤ CM width(U ε ). (B.27)
First, we find using a Sobolev embedding result (Lemma A.1) and Lemma A.2 that,
∗
u0 − (QN ◦ EN )(u0 ) Lp
≤ u0 − (QN ◦ EN )(u0 ) ≤ C(d, r)N −r+d/p ku0 kH r ,
H d/p∗
(B.29)
where p∗ is such that 1/p + 1/p∗ = 1/2. Next, we observe that for any u0 ∈ X with ku0 kC r ≤ B
that (QN ◦ EN )(u0 ) H r (Td ) ≤ CB =: B. Hence, by applying Lemma A.2 to the second and last
term of (B.28) we find that,
∗
ε
kG − Gθ kL2 ≤ C(ε + Cstab BN −r+d/p + Cε,r
B
N −r ). (B.30)
Step 3: size estimate. As for any FNO, the width is equal to N d width(U ε ). The depth in this case
is equal to depth(U ε ).
cf. Lemma D.1. Using notation from SM A.6, let QN : R|JN | → C(Td ) be the trigonometric
polynomial interpolation operator as in (A.18) and let EN : C(Td ) → R|JN | be the encoder as in
(A.20). We define
X X
QbN : R|JN | → C(Td ) : y 7→ 1 yj ak,j b
ek , (B.32)
|KN |
k∈KN j∈JN
23
with coefficients ak,j as in (A.19), as a neural network approximation of QN .
Inspired by the proof of Theorem 3.5 (and using its notation as well), we define Gb : C(Td ) → L2 (µ)
by
s−1 ∆i,s−i [U ε (Q ◦ E ◦ u , t )](t , x)
M X
X 1/M Z Z 0 m m
b 0 )(t, x) =
G(u · ϕδi (t − tm )ΦM (B.33)
m (t),
m=1 i=0
M −i i!
Then it holds that
M Xs−1 i,s−i ε
X X X ak,j ∆1/M [U (QZ ◦ EZ ◦ u0 , tm )](tm , xj )
b 0 )(t, x) =
(QN ◦ EN ◦ G)(u · Ψi,m,k (t, x)
m=1 i=0
|KN | M −i i!
k∈KN j∈JN
Next, we observe that using Assumption 3.1, Assumption 3.6 and (B.29) it holds that for all t,
∗
(U ε (QZ ◦ EZ ◦ u0 ) − G(u0 ))(·, t) L2
ε
≤ ε + Cstab CBZ −r+d/p . (B.38)
One can then use Theorem 3.5, but by replacing ε by (B.38) in the error bound (3.4), to find that
∗
b
D(k,α) (G − G) ≤ C lnk (M )(kukC (s,ℓ) M k−s + M 2k ((ε + Cstab
ε CB r−ℓ
Z −r+d/p )h−ℓ + Cε,ℓ h ))
L2
(B.39)
Then, using the observation that D(k,α) (Id − QN ◦ EN )Gb = Dxα (Id − QN ◦ EN )Dtk Gb we find that
b 0)
D(k,α) (Id − QN ◦ EN )G(u b 0)
≤ CN −(r−ℓ) Dtk G(u , (B.40)
L2 Hr
which can be combined with the estimate
b 0)
Dtk G(u ≤ M s−1 · M k lnk (M ) U ε (QZ ◦ EZ ◦ u0 ) ≤ M s+k−1 lnk (M )Cε,r
B
, (B.41)
Hr Hr
where we used that for u0 ∈ X with ku0 kC r ≤ B it holds (QN ◦ EN )(u0 ) H r (Td )
≤ CB =: B.
Next, we make the rough estimate that,
bN − QN ) ◦ EN )G(u
D(k,α) (Q b 0) ≤ CN d M s+k−1 lnk (M ) max kek − b
ek kC r . (B.42)
L2 k
24
By setting η = N ℓ−r−d, h = 1/N and using that M 2k ≤ M k+s−1 and Cε,ℓ
B B
≤ Cε,r we find,
∗
L(G − Gθ ) ≤ C lnk (M )(kukC (s,ℓ) M k−s + M k+s−1 ((ε + Cstab
L2
ε
Z −r+d/p )N ℓ + Cε,r
B
N ℓ−r )).
(B.44)
We conclude by using that lnk (M ) ≤ CM ρ for any ρ > 0.
Step 3: size estimate. It follows immediately that depth(β) = depth(U ε ), width(β) =
O(M (Z d + N d width(U ε ))), depth(τ ) = 3 and width(τ ) = O(M N d (N + ln(N ))).
Proof. Define the random variable Y = EG (θ∗ (S))2 − ET (θ∗ (S), S)2 . Then if follows from equa-
tion (4.8) in the proof of [16, Theorem 5] that
d !
2 2RL Θ −2ε4 n
P(Y > ε ) ≤ exp , (B.45)
ε2 c2
since P(Y > ε2 ) = 1 − P (A), where A is as defined in the proof of [16, Theorem 5]. It follows that
E[Y ] = E[Y 1Y ≤ε2 ] + E[Y 1Y >ε2 ] ≤ ε2 + cP Y > ε2 . (B.46)
Setting ε2 = cP Y > ε2 leads to
v
u
u 2c2 dΘ !
c 2RL
E[Y ] ≤ 2ε2 = t ln 2 . (B.47)
n ε ε2
√
For ε < 1, and using that ln(x) ≤ x for all x > 0, this equality implies that
2c3 (2RL)dΘ /2
εdΘ +1 ≤ . (B.48)
n
Hence, we find that if n ≥ 2c2 e8 /(2RL)dΘ /2 then εdΘ +1 ≤ ce−8 (2RL)dΘ which implies that
dΘ ! −1/2
ln c 2RL 1
2 2
≤ √ . (B.49)
ε ε 2 2
Using once more that ε2 = cP Y > ε2 and (B.49) gives us,
v
u
u dΘ +1
u 2 √ dΘ ! −1/2
u 2c 2n c 2RL
E[Y ] ≤ u ln
c(2RL)dΘ ln 2
t n c ε ε 2
(B.50)
r r
2c2 √ 2c2 (dΘ + 1) √
≤ ln (aL n)dΘ +1 = ln aL n .
n n
Lemma C.1. Let ε > 0, let(Ω, F , P) be a probability space, and let X : Ω → R be a random
variable that satisfies E |X| ≤ ε. Then it holds that P(|X| ≤ ε) > 0.
25
Lemma C.2. Let γ ∈ {0, 1}, β ∈ [1, ∞), α0 , α1 , x0 , x1 , x2 , . . . ∈ [0, ∞) satisfy for all k ∈ N0 that
k−1
X h i
xk ≤ 1N (k)(α0 + α1 k)β k +
γ
(k − l) β (k−l) xl + xmax{l−1,0} . (C.1)
l=0
Lemma C.3. Let α ∈ [1, ∞), x0 , x1 , . . . ∈ [0, ∞) satisfy for all k ∈ N0 that xk ≤ αxkk−1 . Then it
holds for all k ∈ N0 that
xk ≤ α(k+1)! xk!
0 (C.3)
Proof. We provide a proof by induction. First of all, it is clear that x0 ≤ αx0 . For the induction
(k−1)!
step, assume that xk−1 ≤ αk! x0 for an arbitrary k ∈ N0 . We calculate that
k
(k−1)!
xk ≤ α αk! x0 ≤ α(k+1)! xk!
0 . (C.4)
Lemma C.4. Let ℓ ∈ N, f ∈ C ℓ (R, R), h ∈ C ℓ (Td , R) and let Bℓ denote the ℓ-th Bell number.
Then it holds that
ℓ
|f ◦ h|C ℓ (R) ≤ kf kC ℓ (R) Bℓ khkC ℓ−1 (Td ) + |h|C ℓ (Td ) . (C.5)
Proof. Let Π be the set of all partitions of the set {1, . . . , ℓ}, let α ∈ Nd0 such that kαk1 = ℓ and
ℓ
let ι : Nℓ → Nd be a map such that Dα = Qℓ ∂ x . Then the Faà di Bruno formula can be
j=1 ι(j)
reformulated as [12],
X Y ∂ |B| h(x)
Dα f (h(x)) = f (|π|) (h(x)) · Q
π∈Π B∈π j∈B ∂xι(j)
X Y ∂ |B| h(x) (C.6)
= f (|π|) (h(x)) · Q + f ′ (h(x))Dα h(x).
π∈Π, B∈π j∈B ∂xι(j)
|π|≥2
Combining this formula with the definition of the Bell number as Bℓ = |Π|, we find the following
upper bound,
X ℓ
|f ◦ h|C ℓ (R) ≤ kf kC ℓ (R) khkC ℓ−1 (R) + kf kC 1 (R) |h|C ℓ (R)
π∈Π
(C.7)
ℓ
≤ kf kC ℓ (R) Bℓ khkC ℓ−1 (R) + |h|C ℓ (R) .
Definition C.5. Let (Ω, F , µ) be a measure space and let q > 0. For every F /B(Rd )-measurable
function f : Ω → Rd , we define
ˆ 1/q
q
kf kLq (µ,k·k d ) := f (ω) Rd µ(dω) . (C.8)
R
Ω
26
Let (Ω, F , P, (Ft )t∈[0,T ] ) be a stochastic basis, D ⊆ Rd a compact set and, for every x ∈ D, let
X x : Ω × [0, T ] → Rd be the solution, in the Itô sense, of the following stochastic differential
equation,
dXtx = µ(Xtx )dt + σ(Xtx )dBt , X0x = x, x ∈ D, t ∈ [0, T ], (C.9)
where Bt is a standard d-dimensional Brownian motion on (Ω, F , P, (Ft )t∈[0,T ] ). The existence of
X x is guaranteed by [3, Theorem 4.5.1].
As in [16, Theorem 3.3] we define ρd as
kXsx − Xtx kLq (P,k·k )
Rd
ρd := max sup 1 < ∞, (C.10)
x∈D s,t∈[0,T ], |s − t| p
s<t
where X x is the solution, in the Itô sense, of the SDE (C.9), q > 2 is independent of d and
k·kLq (P,k·k d ) is as in Definition C.5.
R
Lemma C.6. In Setting 4.1, Assumption 3.1 and Assumption 3.6 are satisfied with
u(·, t) − U ε (ϕ, t) L2 (µ)
≤ ε, B
Cε,ℓ = CB · poly(dρd ), ε
Cstab = 1, p = ∞, (C.11)
where t ∈ [0, T ] and ϕ ∈ C02 (Rd ). Moreover, there exists C ∗ > 0 (independent of d) for which it
holds that depth(U ε ) ≤ C ∗ depth(ϕbε ) and {width, size}(U ε ) ≤ C ∗ ε−2 {width, size}(ϕ
bε ).
Proof. It follows from the Feynman-Kac formula that u(t, x) = E ϕ(Xtx ) [62]. Replacing ϕ by a
neural network ϕbε with kϕ − ϕbε kC 0 ≤ ε gives us for any probability measure µ that,
ε x
E ϕ(Xtx ) − E ϕ b (Xt ) ≤ kϕ − ϕ bε kC 0 . (C.12)
L2 (µ)
From [16, Lemma A.5], for all x ∈ Rd , t ∈ [0, T ] and ω ∈ Ω it holds that
d
X
Xtx (ω) = Xtei (ω) − Xt0 (ω) xi + Xt0 (ω). (C.14)
i=1
Using this equality, together with Hölder’s inequality and the boundedness of kXtx kLp [16, Lemma
A.5] we find that,
1/2
2
m
1 X α ε x
ˆ
E (IIα ) := E bε (Xtx ) −
E Dxα ϕ Dx ϕb (Xt (ωm )) µ(dx)
D m i=1
1/4
4
m
X
ε x 1
ˆ
≤ C · poly(dρd ) · E
b (Xt ) −
E ϕ bε (Xtx (ωm )) µ(dx)
ϕ
D m i=1
≤ CB · poly(dρd )
(C.15)
Combining the previous results gives us,
√ X
E m · (I) + (IIα ) ≤ CB · poly(dρd ). (C.16)
kαk1 ≤ℓ
27
If we combine this with Lemma C.1 then we find the existence of (ωi∗ )mi=1 such that for
Xm X d
1
U ε (ϕ, t)(x) = bε
ϕ Xtei (ωi∗ ) − Xt0 (ωi∗ ) xi + Xt0 (ωi∗ ) (C.17)
m i=1 i=1
it holds that
ε x 2C
b (Xt ) − U ε (ϕ, t)
E ϕ ≤√ . (C.18)
L2 (D) m
and by setting m = ε−2 and (C.12) we find that,
Proof of Theorem 4.2. We use Theorem 3.5 with k = 1 and ℓ = 2 and combine the result with
Lemma C.6. We find that for every M ∈ N and δ, h > 0 it holds that,
L(b
u − u) Lq ([0,T ]×D)
+ kb
u − ukL2 (∂([0,T ]×D))
(C.21)
≤ CB · poly(dρd ) · ln(M )(kukC (1,2) M 1−s + M 2 (δh−2 + hr−2 )).
Using that ln(M ) ≤ CM σ for arbitrarily small σ > 0, we find that we should set
r+σ s+1 −1−σ
δ = ε r−2 s−1 , M =ε s−1 (C.23)
28
C.4 Multilevel Picard approximations
In what follows, we will provide a definition of a particular kind of MLP approximation (cf. [33])
and a theorem that quantifies the accuracy of the approximation. First, we rigorously introduce the
setting of the nonlinear parabolic PDE (4.3) that is under consideration, cf. [33, Setting 3.2 with
p ← 0]. We choose the d-dimensional torus Td = [0, 2π)d as domain and impose periodic boundary
conditions. This setting allows us to use the results of [33], which are set in Rd , and yet still consider
a bounded domain so that the error can be quantified using an uniform probability measure.
Setting C.7. Let d, m ∈ N, T, L, L ∈ [0, ∞), let (Td , B(Td ), µ) be a probability space where µ is
the rescaled Lebesgue measure, let g ∈ C(Td , R) ∩ L2 (µ), let F ∈ C(R, R), assume for all x ∈ Td ,
y, z ∈ R that
F (y) − F (z) ≤ L|y − z|, max{ F (y) , g(x) } ≤ L. (C.24)
Let ud ∈ C 1,2 ([0, T ] × Rd , R) ∩ L2 (µ) satisfy for all t ∈ [0, T ], x ∈ Rd that
(∂t ud )(t, x) = (∆x ud )(t, x) + F (ud (t, x)), ud (0, x) = g(x). (C.25)
Assume that for every ε > 0 there exists a neural network Fbε , a neural network gbε and a neural
network Iε with depth depth(Iε ) = depth(Fbε ) such that
Fbε − F ≤ ε, kb
gε − gkL2 (µ) ≤ ε, kIε − IdkC 0 ([−1−L,1+L]) ≤ ε. (C.26)
C 0 (R)
Note that for some of the equations introduced in Section C.3 the nonlinearity F might not be
globally Lipschitz and hence does not satisfy (C.24). However, it is easy to argue or rescale g [43, 5]
such that ud is globally bounded by some constant C. For instance, for the Allen-Cahn equation
it holds that if kgkL∞ ≤ 1 then ud (t, ·) L∞ ≤ 1 for any t ∈ [0, T ] [75]. One can then define a
‘smooth’, globally Lipschitz, bounded function Fe : R → R such that Fe (v) = F (v) for |v| ≤ C and
such that Fe (v) = 0 for |v| > 2C. This will then also ensure the existence of a neural network Fb
that is close to Fe in C 0 (R)-norm.
In this setting, multilevel Picard approximations can be introduced. We follow the definition of [33].
S
Definition C.8 (MLP approximation). Assume Setting C.7. Let Θ = n∈N Zn , let (Ω, F , P) be a
probability space, let Y θ : Ω → [0, 1], θ ∈ Θ, be i.i.d. random variables, assume for all θ ∈ Θ,
r ∈ (0, 1) that P(Y θ ≤ r) = r, let Uθ : [0, T ] × Ω → [0, T ], θ ∈ Θ, satisfy for all t ∈ [0, T ], θ ∈ Θ
that Uθt = t + (T − t)Y θ , let W θ : [0, T ] × Ω → Rd , θ ∈ Θ, be independent standard Brownian
motions, assume that (Uθ )θ∈Θ and (W θ )θ∈Θ are independent, and let Unθ : [0, T ] × Td × Ω → R,
n ∈ Z, θ ∈ Θ, satisfy for all n ∈ N0 , θ ∈ Θ, t ∈ [0, T ], x ∈ Td that
" mn #
θ 1N (n) X (θ,0,−k)
Un (t, x) = g(x + WT −t )
mn
k=1
n−1
" n−i #
X (T − t) mX
) − 1N (i)F (Ui−1
(θ,i,k) (θ,−i,k) (θ,i,k) (θ,i,k)
+ (F (Ui ))(Ut , x + W (θ,i,k) ) .
i=0
mn−i Ut −t
k=1
(C.27)
Example C.9. In order to improve the intuition of the reader regarding Definition C.8, we provide
explicit formulas for the multilevel Picard approximation (C.27) for n = 0 and n = 1,
"m #
θ θ 1 X (θ,0,−k)
U0 (t, x) = 0 and U1 (t, x) = g(x + WT −t ) + (T − t)F (0). (C.28)
m
k=1
Finally, we provide a result on the accuracy of MLP approximations at single space-time points.
Theorem C.10. It holds for all n ∈ N0 , t ∈ [0, T ], x ∈ Td that
!1/2
0
2 L(T + 1) exp(LT )(1 + 2LT )n
E Un (t, x) − u(t, x) ≤ . (C.29)
mn/2 exp −m/2
29
C.5 Neural network approximation of nonlinear parabolic equations
In this section, we will prove that the solution of the nonlinear parabolic PDE as in Setting C.7 can
be approximated with a neural network without the curse of dimensionality. At this point, we do not
specify the activation function, with the only restriction being that the considered neural networks
should be expressive enough to satisfy (C.26). By emulating an MLP approximation and using that
F , g and the identity function can be approximated using neural networks, the following theorem
can be proven.
Theorem C.11. Assume Setting C.7. For every ε, σ > 0 and t ∈ [0, T ] there exists a neural network
bε : Td → R such that
u
bε (·) − u(t, ·) L2 (µ) ≤ ε.
u (C.30)
In addition, u
b satisfies that
gδ ) + logC2 (3C1 exp m/2 /ε)depth(Fbδ ),
uε ) ≤ depth(b
depth(b
!2+3σ
4C1 exp m/2 (C.31)
width(b
uε ), size(b gδ ) + size(Fbδ ) + size(Iδ ))
uε ) ≤ (size(b ,
ε
where
C1 = (T + 1)(1 + L exp(LT )), C2 = 5 + 3LT,
2
ε 2(1+1/σ) (C.32)
δ= , m = C2 .
9C12 exp m/2
Proof. Step 1: construction of the neural network. Let ε, δ > 0 be arbitrary and let Fb = Fbδ ,
gb = gbδ and I = Iδ as in Setting C.7. We then define for all n ∈ N and θ ∈ Θ,
" mn #
b θ 1N (n) X n−1 (θ,0,−k)
Un (t, x) = (I ◦bg)(x + WT −t )
mn
k=1
n−1
" n−i #
X (T − t) mX
) − 1N (i)(I
(θ,i,k) (θ,−i,k) (θ,i,k) (θ,i,k)
+ ((I n−i−1
◦ Fb )(U
b
i
n−i
◦ Fb )(U
b
i−1 ))(Ut , x + W (θ,i,k) ) ,
i=0
mn−i Ut −t
k=1
(C.33)
with notation and random variables cf. Definition C.8. Note that for every t ∈ [0, T ], n ∈ N, θ ∈ Θ,
bnθ (t, ·) is a neural network that maps from Td to R.
every realization of the random variable U
Let n ∈ N0 , m ∈ N and t ∈ [0, T ] be arbitrary. Integrating the square of the error bound of Theorem
C.10 and Fubini’s theorem tell us that
ˆ ˆ
2 2
E Un0 (t, x) − u(t, x) dµ(x) = E Un0 (t, x) − u(t, x) dµ(x)
Td Td
(C.34)
4L2 (T + 1)2 exp(2LT )(1 + 2LT )2n
≤ .
mn exp(−m)
From Lemma C.1 it then follows that
!
2 4L2 (T + 1)2 exp(2LT )(1 + 2LT )2n
ˆ
0
P Un (t, x) − u(t, x) dµ(x) ≤ > 0. (C.35)
Td mn exp(−m)
As a result, there exists ω = ω(t, n, m) ∈ Ω and a realization Un0 (ω) such that
L(T + 1) exp(LT )(1 + 2LT )n
Un0 (ω)(t, ·) − u(t, ·) ≤ . (C.36)
L2 (µ) mn/2 exp −m/2
We define
ω : [0, T ] × N2 → Ω : (t, n, m) 7→ ω(t, n, m) (C.37)
and set for every 1 ≤ k ≤ n,
b θ (t, x) = U
U b θ (ω(t, n, m))(t, x) θ
and Uk,ω (t, x) = Ukθ (ω(t, n, m))(t, x) (C.38)
k,ω k
30
b 0 (t, ·).
for all k ∈ N0 and all θ ∈ Θ. We then define our approximation as U n,ω
g − gkL2 (µ) +T Fb − F
and in addition we define α0 = kb , α1 = kI − IdkC 0 and β = 2+LT .
C 0 (R)
Taking the supremum over all θ ∈ Θ in (C.40) gives us for all k ∈ N0 that,
k−1
X
xk ≤ 1N (k)(α0 + α1 k)β + k
β k−i (xi + xmax{i−1,0} ). (C.42)
i=0
Therefore, we can use Lemma C.2 with γ ← 0 then gives us that for all k ∈ N0 it holds that,
bk,ω
sup U θ θ
(t, ·) − Uk,ω (t, ·) 2
θ∈Θ L (µ)
√ k
(1 + 2)
≤ 1N (k) g − gkL2 (µ) + T Fb − F
kb + kI − IdkC 0 (2 + LT )k .
2 C 0 (R)
(C.43)
Next we define
C1 = (T + 1)(1 + L exp(LT )), C2 = 5 + 3LT. (C.44)
Combining (C.36) with (C.43) then gives us that,
b 0 (t, ·) − u(t, ·)
U n,ω
L2 (µ)
≤ Ub 0 (t, ·) − U 0 (t, ·) 0
+ Un,ω (t, ·) − u(t, ·)
n,ω n,ω
L2 (µ) L2 (µ)
(C.45)
≤ C1 C2n kb g − gkL2 (µ) + Fb − F 0 + kI − IdkC 0 + m−n/2 exp m/2 .
C (R)
31
For an arbitrary σ > 0, we choose
2(1+1/σ)
m = C2 , n = σ logC2 (4C1 exp m/2 /ε) (C.46)
and if we choose gb = gbδ and Fb = Fbδ such that,
ε ε1+σ
kb
g − gkL2 (µ) ≤ δ = n = 1+σ
, (C.47)
4C1 C2 (4C1 ) exp σm/2
then we obtain that
b 0 (t, ·) − u(t, ·)
U ≤ ε. (C.48)
n,ω
L2 (µ)
Step 3: size estimate. We now provide estimates on the size of the network constructed in Step 1.
First of all, it is straightforward to see that the depth of the network can be bounded by
Lε (Ub 0 ) ≤ Lδ (b g ) + logC2 (3C1 exp m/2 /ε)Lδ (Fb ).
g ) + (n − 1)Lδ (Fb ) ≤ Lδ (b (C.49)
n,ω
Next we prove an estimate on the number of needed neurons. For notation, we write Mn =
Mε (Ub 0 ). We find that for all 0 ≤ k ≤ n,
n,ω
Mk ≤ 1N (k)mk (Mδ (b
g ) + (k − 1)Mδ (I))
k−1
X
+ mk−i (2Mδ (Fb ) + (2k − 2i − 1)Mδ (I) + Mi + Mmax{i−1,0} )
i=0 (C.50)
k−1
X
≤ 1N (k)(Mδ (b
g ) + Mδ (Fb ) + kMδ (I))(2m)k + mk−i (Mi + Mmax{i−1,0} ).
i=0
Applying Lemma C.2 to (C.50) (i.e. α0 ← Mδ (b g) + Mδ (Fb ), α1 ← Mδ (I) and β ← 2m) then
gives us that
1 √
Mn ≤ (Mδ (b g ) + Mδ (Fb) + Mδ (I))(1 + 2)n (2m)n . (C.51)
2
√ 2(1+1/σ)
Observing that 2 + 2 2 ≤ C2 and recalling that m = C2 we find that
1 (3σ+2)n/σ
Mn ≤ g ) + Mδ (Fb) + Mδ (I))C2
(Mδ (b
2
!2+3σ (C.52)
1 4C1 exp m/2
g ) + Mδ (Fb) + Mδ (I))
= (Mδ (b .
2 ε
bn,ω
For the width, we make the estimate widthε (U 0
) ≤ Mn .
Setting C.12. Assume Setting C.7, let b g ∈ C(Td , R) ∩ L2 (µ)3 and let ω : [0, T ] × N2 → Ω be
defined as in (C.37) in the proof of Theorem C.11. Let U bn,ω
θ :
[0, T ] × Td × Ω → R, n ∈ Z, θ ∈ Θ,
d
satisfy for all n ∈ N0 , ε > 0, θ ∈ Θ, t ∈ [0, T ], x ∈ T that
" mn #
b θ 1N (n) X n−1 (θ,0,−k)
Un,ω (t, x) = (Iε ◦ gb)(x + WT −t (ω(t, n, m)))
mn
k=1
" n−i
X (T − t) mX
n−1
+ (Iεn−i−1 ◦ Fbε )(U b (θ,i,k) )
n−i i,ω
i=0
m
k=1
#
− 1N (i)(I n−i ◦ Fbε )(U
b (θ,−i,k) (θ,i,k) (θ,i,k)
ε i−1,ω) U t (ω(t, n, m)), x + W (θ,i,k) (ω(t, n, m)) .
Ut −t
(C.53)
3
The function gb can but need not be the same as the function gbε , for some ε > 0, of Setting C.7.
32
Lemma C.13. Assume Setting C.12. Under the assumption that,
max Ij ≤ 2, (C.54)
1≤j≤k C k ([−L−1,L+1])
and where Bℓ denote the ℓ-th Bell number i.e., the number of possible partitions of a set with ℓ
elements.
bk,ω
sup U θ
gkC 0 (Td ) + 2T Fb
≤ kb k. (C.56)
θ∈Θ C 0 ([0,T ]×Td ) C 0 (R)
ℓ
and where (again using Lemma C.4) it holds that I j ◦ Fb ≤ 2Bℓ Fb .
C ℓ (R) Cℓ
Using this estimate and the fact that (Ck,ℓ )k≥0 is non-decreasing for any ℓ, we can make the follow-
ing calculation for every k ∈ N0 ,
bk,ω
sup U θ
θ∈Θ C (0,ℓ) ([0,T ]×Td )
k−1
X
≤ 1N (k)|b
g|C ℓ (Td ) + T sup (I k−i−1 ◦ Fb)(U
bθ )
i,ω
C (0,ℓ) (([0,T ]×Td ))
i=0 θ∈Θ
k−1
X
+T 1N (i) sup (I k−i ◦ Fb)(Ubi−1,ω
θ
)
θ∈Θ C (0,ℓ) (([0,T ]×Td ))
i=0
k−1
!
ℓ X
≤ 1N (k)|b
g|C ℓ (Td ) + 2Bℓ T Fb ℓ
Bℓ Ck,ℓ−1 + sup bθ
U i,ω
C ℓ (R) θ∈Θ C (0,ℓ) (([0,T ]×Td ))
i=0
k−1
!
ℓ X
+ 2Bℓ T Fb 1N (i) ℓ
Bℓ Ck,ℓ−1 + sup bθ
U i−1,ω
C ℓ (R) θ∈Θ C (0,ℓ) (([0,T ]×Td ))
i=0
ℓ
≤ 1N (k)(|b
g|C ℓ (Td ) + 2Bℓ T Fb ℓ
Ck,ℓ−1 k)
C ℓ (R)
k−1
!
X ℓ
+ 2Bℓ T Fb bθ
sup U i,ω + 1N (i) sup U
bθ
i−1,ω
C ℓ (R) θ∈Θ C (0,ℓ) (([0,T ]×Td )) θ∈Θ C (0,ℓ) (([0,T ]×Td ))
i=0
(C.58)
33
ℓ
Application of Lemma C.2 with α0 ← |b
g|C ℓ (Td ) , α1 ← 2Bℓ Ck,ℓ−1
ℓ
, β ← (1 + 2BℓT Fb ) and
C ℓ (R)
γ ← 0 gives us
ℓ
|b
g |C ℓ (Td ) + 2Bℓ Ck,ℓ−1 √ k ℓ
bθ
sup U ≤ (1 + 2) (1 + 2Bℓ T Fb )k
k,ω
θ∈Θ C 0 ([0,T ]×Td ) 2 C ℓ (R)
(C.59)
√ ℓ
≤ |b
g |C ℓ (Td ) + 2Bℓ (1 + 2) (1 + 2Bℓ T Fb
k k
) ℓ
Ck,ℓ−1 .
C ℓ (R)
Filling in the definition of Ck,ℓ−1 indeed gives us the formula as stated in (C.55), thereby concluding
the proof of the claim.
Lemma C.14. Let F be a polynomial. For every σ, ε > 0 there is an operator U ε as in Assumption
3.1 such that for every t ∈ [0, T ],
Proof. The three bounds are a consequence of, respectively, Theorem C.11 and Lemma C.13 and
(C.45). The size estimates follow from Theorem C.11. Note that one might have to rescale the
constant σ > 0.
In [43], numerous error estimates for DeepONets are proven, with a focus on DeepONets that use
the ReLU activation function. In order to quantify this error, the authors fix a probability measure
µ ∈ P(X ) and define the error as,
1/2
ˆ ˆ
2
Eb = G(u)(y) − Gθ (u)(y) dy dµ(u) , (D.1)
X U
assuming that there exist embeddings X ֒→L2 (D) and Y֒→L2 (U ). From [43, Lemma 3.4], it then
follows that Eb (D.1) can be bounded as,
Eb ≤ Lipα (G)Lip(R ◦ P) (EbE )α + Lip(R)EbA + EbR , (D.2)
where Lipα (·) denotes the α-Hölder coefficient of an operator and where EbE quantifies the encoding
error, where EbA is the error incurred in approximating the approximator A and where EbR quantifies
the reconstruction error. Assuming that all Hölder coefficients are finite, one can prove that Eb is
small if EbE , EbA and EbR are all small. We summarize how each of these three errors can be bounded
using the results from [43].
• The upper bound on the encoding error EbE depends on the chosen sensors and the spectral
decay rate for the covariance operator associated to the measure µ. Use bespoke sensor
points to obtain optimals bounds when possible, otherwise use random sensors to obtain
almost optimal bounds. More information can be found in [43, Section 3.5].
• The upper bound on the reconstruction error EbR depends on the smoothness of the operator
and the chosen basis functions τ i.e., neural networks, for the reconstruction operator R.
Following [43, Section 3.4], one first chooses a standard basis τe of which the properties
are well-known. We denote the corresponding reconstruction by R e and the corresponding
34
reconstruction error by EbR e . In this work, we focus on Fourier and Legendre basis func-
tion, both of which are introduced in SM A. One then proceeds by constructing the neural
network basis τ i.e., the trunk nets, that satisfy for some ε > 0 and p ≥ 1 the condition
ε
max kτk − τek kL2 ≤ , (D.3)
k=1,...,p p3/2
which is shown to imply that,
EbR ≤ EbR
e + Cε, (D.4)
where C ≥ 1 depends only on L2 kuk2 dG# µ(u). Using standard approximation theory,
´
one can calculate an upper bound on EbRe and using neural network theory one can quantify
the network size of τ needed such that (D.3) is satisfied. For the Fourier and Legendre
bases such results are presented in Lemma D.1 and Lemma D.2, respectively.
• The upper bound on the approximation error EbA depends on the regularity of the operator
G. We present the tanh counterparts of some results of [43, Section 3.6] in the following
sections, with the main result being Theorem D.6.
For bounded linear operators, these calculations are rather straightforward and are presented in [43,
SM D]. For nonlinear operators, one has to complete all the above steps for each specific case. In
[43, Section 4], this has been done for four types of differential equations.
Following Section D.1, we need results on the required neural network size to approximate the
reconstruction basis to a certain accuracy (D.3). The following lemma provides such a result for the
Fourier basis introduced in SM A.5.
Lemma D.1. Let s, d, p ∈ N. For any ε > 0, there exists a trunk net τ : Rd → Rp with 2 hidden
d+1
layers of width O(p d + ps ln psε−1 ) and such that
Proof. We note that each element in the (real) trigonometric basis e1 , . . . , ep can be expressed in
the form
ej (x) = cos(κ · x), or ej (x) = sin(κ · x), (D.6)
for κ = κ(j) ∈ Zd with |κ|∞ ≤ N , where N is chosen as the smallest natural number such that
p ≤ (2N + 1)d . We focus only focus on the first form, as the proof for the second form is entirely
similar. Define f : [0, 2π]d → R : x 7→ κ · x and g : [−2πdN, 2πdN ] → R : x 7→ cos(x).
As f ([0, 2π]d ) ⊂ [−2πdN, 2πdN ], the composition g ◦ f is well-defined and one can see that it
coincides with a trigonometric basis function ej . Moreover, the linear map f is a trivial neural
network without hidden layers. Approximating ej by a neural network τj therefore boils down to
approximating g by a suitable neural network.
From [15, Theorem 5.1] it follows that the function g there exists an independent constant R > 0
such that for large enough t ∈ N there is a tanh neural network gbt with two hidden layers and
O(t + N ) neurons such that
This can be proven from [15, eq. (74)] by setting δ ← 31 , k ← s, s ← t, N ← 2 and using
kgkC s = 1 and Stirling’s approximation to obtain
t−s t−s
1 3 1 e
≤p ≤ exp(s − t) for t > s + e2 . (D.8)
(t − s)! 2 · 2 2π(t − s) t − s
35
Setting t = O(ln δ −1 + s ln(s)) then gives a neural network b
gt with kg − gbt kC s < η. Next, it
follows from [15, Lemma A.7] that
s
kg ◦ f − gbt ◦ f kC s ([0,2π]d ) ≤ 16(e2 s4 d2 )s kg − b
gt kC s ([−2πdN,2πdN ])kf kC s ([0,2π]d )
(D.9)
≤ 16(e2 s4 d2 )s η(2πdN )s .
From this follows that we can obtain the desired accuracy (D.5) if we set τj = b
gt(η) ◦ f with
εp−3/2
η= , (D.10)
16(2πN d3 e2 s4 )s
which amounts to t = O(s ln sNε−1 ). As a consequence, the tanh neural network τj has two
hidden layers with O(s ln sN ε−1 + N ) neurons and therefore, by recalling that p ∼ N d , the
combined network τ has two hidden layers with
d+1
O(p(s ln sN ε−1 + N )) = O(ps ln psε−1 + p d ) (D.11)
neurons.
Proof. Consider the setting of Theorem 4.6. Using [43, Theorem D.3], the reasoning as in [43,
Example D.4] and Lemma D.1 we find that there exists a constant C = C(d, ℓ) > 0, such that for
any m, p, s ∈ N there exists a DeepONet with trunk net τ and branch net β, such that
d+1
size(τ ) ≤ C(p d + ps ln psε−1 ), depth(τ ) = 3, (D.12)
and where
size(β) ≤ p, depth(β) ≤ 1, (D.13)
and such that the DeepONet approximation error (D.1) is bounded by
!
c m1/d
1/d
G(v) − Gθ (v) L2 (µ×λ)
≤ ε + C exp −c p + C exp − 1/d
. (D.14)
log(m)
Moreover, it holds that
N (u)(·) C s ≤ Cps/d , (D.15)
since in this case τ approximates the Fourier basis (SM A.5). From (A.15), one can then deduce
the estimate on the C s -norm of the DeepONet. This proves that (3.7) in Theorem 3.9 holds with
σ(s) = s/d. This concludes the proof.
36
In our proofs, we require tanh counterparts to the results for DeepONets with ReLU activation
function from [43]. We present these adapted results below for completeness.
The first lemma considers the neural network approximation of the map u 7→ Yb (u), as defined in
[43, Eq. (3.59)].
Lemma D.3. Let N, d ∈ N, and denote m := (2N + 1)d . There exists a constant C > 0, in-
dependent of N , such that for every N there exists a tanh neural network Ψ : Rm → Rm , with
We can now state the following result [70, Theorem 3.10] which is the counterpart of [43, Theorem
3.32] for tanh neural networks.
Theorem D.4. Let V be a Banach space and let J be a countable index set. Let F : [−1, 1]J → V
be a (b, ε, κ)-holomorphic map for some b ∈ ℓq (N) and q ∈ (0, 1), and an enumeration κ : N → J .
Then there exists a constant C > 0, such that for every N ∈ N, there exists an index set
n Q o
ΛN ⊂ ν = (ν1 , ν2 , . . . ) ∈ j∈J N0 | νj 6= 0 for finitely many j ∈ J , (D.18)
with |ΛN | = N , a finite set of coefficients {cν }ν∈ΛN ⊂ V , and a tanh network Ψ : RN → RΛN ,
y 7→ {Ψν (y)}ν∈ΛN with
size(Ψ) ≤ C(1 + N log(N )), depth(Ψ) ≤ C(1 + log log(N )), (D.19)
and such that
X
sup F (y) − cν Ψν (yκ(1) , . . . , yκ(N ) ) ≤ CN 1−1/q . (D.20)
y∈[−1,1]J ν∈ΛN
V
Using this theorem, we can state the tanh counterpart to [43, Corollary 3.33].
Corollary D.5. Let V be a Banach space. Let F : [−1, 1]J → V be a (b, ε, κ)-holomorphic map
for some b ∈ ℓq (N) and q ∈ (0, 1), where κ : N → J is an enumeration of J . In particular, it
is assumed that {bj }j∈N is a monotonically decreasing sequence. If P : V → Rp is a continuous
linear mapping, then there exists a constant C > 0, such that for every m ∈ N, there exists a tanh
network Ψ : Rm → Rp , with
size(Ψ) ≤ C(1 + pm log(m)), depth(Ψ) ≤ C(1 + log log(m)), (D.21)
and such that
sup kP ◦ F (y) − Ψ(yκ(1) , . . . , yκ(m) )kℓ2 (Rp ) ≤ CkPk m−s , (D.22)
y∈[−1,1]J
where s := q −1 − 1 > 0 and kPk = kPkV →ℓ2 denotes the operator norm.
Proof. The proof is identical to the one presented in [43, Appendix C.18].
Finally, we use this result to state the counterpart to [43, Theorem 3.34], which considers the ap-
proximation of a parametrized version of the operator G, defined as a mapping
F : [−1, 1]J → L2 (U ) : y 7→ G(u(·; y)). (D.23)
A more detailled discussion can be found in [43, Section 3.6.2].
Theorem D.6. Let F : [−1, 1]J → L2 (U ) be (b, ε, κ)-holomorphic with b ∈ ℓq (N) and κ : N → J
an enumeration, and assume that F is given by (D.23). Assume that the encoder/decoder pair is
constructed as in [43, Section 3.5.3], so that [43, Eq. (3.69)] holds. Given an affine reconstruction
R : Rp → L2 (U ), let P : L2 (U ) → Rp denote the corresponding optimal linear projection [43, Eq.
37
(3.17)]. Then given k ∈ N, there exists a constant Ck > 0, independent of m, p and an approximator
A : Rm → Rp that can be represented by a neural network with
size(A) ≤ Ck (1 + pm log(m)), depth(A) ≤ Ck (1 + log(m)).
and such that the approximation error EbA can be estimated by
EbA ≤ Ck kPk m−k ,
where kPk = kPkL2(U)→Rp is the operator norm of P.
Next, we consider the following nonlinear ODE system, already considered in the context of approx-
imation by DeepONets in [49] and [43],
dv1 = v2 ,
dt (D.24)
dv
2 = −γ sin(v1 ) + u(t).
dt
with initial condition v(0) = 0 and where γ > 0 is a parameter. Let us denote v = (v1 , v2 ) and
v2 0
g(v) := , U (t) := , (D.25)
−γ sin(v1 ) u(t)
so that equation (D.24) can be written in the form
dv
Lu (v) := − g(v) + U = 0, v(0) = 0. (D.26)
dt
In (D.26), v1 , v2 are the angle and angular velocity of the pendulum and the constant γ denotes a
frequency parameter. The dynamics of the pendulum is driven by an external force u. With the
external force u as the input, the output of the system is the solution vector v and the underlying
nonlinear operator is given by G : L2 ([0, T ]) → L2 ([0, T ]) : u 7→ G(u) = v. Following the
discussion in [43], we choose an underlying (parametrized) measure µ ∈ P(L2 ([0, T ])) as a law of
a random field u, that can be expanded in the form
X
2πt
u(t; Y ) = Yk αk ek , t ∈ [0, T ], (D.27)
T
k∈Z
38
Proof. The proof of the statement is identical to that of [43, Theorem 4.10], with the only difference
that we consider tanh neural networks instead of ReLU neural networks. As a result, the proof
comes down to determining the size of the trunk net τ using Lemma D.2 instead of [64, Proposition
2.10], thereby proving the tanh counterpart of [43, Proposition 4.5], and replacing [43, Proposition
4.9] by Theorem D.6. The C s -bound of the DeepONet follows from the C s -bound of Legendre
polynomials (A.12) and Lemma D.2.
We can again follow Theorem 3.9 to obtain error bounds for physics-informed DeepONets. As-
sumption 3.3 is satisfied for [0, T ]. As a result, we can apply Theorem 3.9 to obtain the following
result.
Theorem D.8. Consider the setting of Lemma D.7. For every β > 0, there exists a constant C > 0
such that for any p ∈ N , there exists a DeepONet Gθ with a trunk net τ = (0, τ1 , . . . , τp ) with p
outputs and branch net β = (0, β1 , . . . , βp ), such that
size(τ ) ≤ Cp, depth(τ ) = 2, (D.32)
and
size(β) ≤ C(1 + p2 log(p)), depth(β) ≤ C(1 + log(p)), (D.33)
and such that
dGθ (u)1 dGθ (u)2
− Gθ (u)2 + + γ sin Gθ (u)1 − u(t) ≤ Cp−β . (D.34)
dt L2 (µ) dt L2 (µ)
Proof. Lemma D.7 with s ← 1, k ← r and m ← p then provides a DeepONet that satisfies
the conditions of Theorem 3.9 with r∗ = +∞ and equation (3.7) with σ(s) = d/2 + 2sd. The
smoothness of v is guaranteed by [43, Lemma 4.3]. Moreover, it holds that,
dGθ (u)1 dGθ (u)1 dG(u)1
− Gθ (u)2 ≤ − + G(u)2 − Gθ (u)2 L2 (µ)
, (D.35)
dt L2 (µ) dt dt L2 (µ)
Combining this estimate with Theorem 3.9 with k = 2 then gives the wanted result.
with notation from SM A.5, and where for simplicity a(x) ≡ 1 is assumed to be constant. Further-
more, we will consider the case of smooth coefficients x 7→ a(x; Y ), which is ensured by requiring
39
that there exist constants Cα > 0 and ℓ > 1, such that |αk | ≤ Cα exp −ℓ|k|∞ for all k ∈ Zd . Still
following [43], we define b = (b1 , b2 , . . . ) ∈ ℓ1 (N) by
bj := Cα exp −ℓ|κ(j)|∞ , (D.39)
where κ : N → Zd is the enumeration for the standard Fourier basis, (SM A.5). Note that by
assumption on the enumeration κ, we have that b is a monotonically decreasing sequence. In the
following, we will assume throughout that kbkℓ1 < 1, ensuring a uniform coercivity condition on
all random coefficients a = a( · ; Y ) in (D.37). Finally, we assume that the Yj ∈ [−1, 1] are centered
random variables and we let µ ∈ P(L2 (Td )) denote the law of the random coefficient (D.38).
The following lemma provides an error estimate for DeepONets approximating the operator G that
maps the input coefficient a into the solution field u of the PDE (D.37).
Lemma D.9. For any k, r ∈ N, there exists a constant C > 0, such that for any m, p ∈ N, there
exists a DeepONet Gθ = R ◦ A ◦ E with m sensors, a trunk net τ = (0, τ1 , . . . , τp ) with p outputs
and branch net β = (0, β1 , . . . , βp ), such that
size(β) ≤ C(1 + pm log(m)), depth(β) ≤ C(1 + log(m)), (D.40)
and
d+1
size(τ ) ≤ Cp d depth(τ ) ≤ 2 (D.41)
such that the DeepONet approximation error (D.1) satisfies
1
Eb ≤ Ce−cℓm d + Cm−k + Cp−r , (D.42)
and that for all s ∈ N
Gθ (u)(·) Cs
≤ Cps/d . (D.43)
Proof. This statement is the tanh counterpart of [43, Theorem 4.19], which addresses ReLU Deep-
ONets. We only highlight the differences in the proof. First, one should use Lemma D.1 instead of
[43, Lemma 3.13], which then results in different network sizes in [43, Lemma 3.14, Proposition
3.17, Corollary 3.18, Proposition 4.17]. Second, one needs to replace [43, Proposition 4.18] with
Theorem D.6.
Moreover, in this case the trunk net τ approximates the Fourier basis (SM A.5). From (A.15), one
can then deduce the estimate on the C s -norm of the DeepONet.
It is straightforward to verify that the conditions of Theorem 3.9 are satisfied in the current set-
ting. Applying Theorem 3.9 then results in the following theorem on the error of physics-informed
DeepONets for (D.37).
Theorem D.10. Consider the elliptic equation (D.37) with b ≥ 1. For every β > 0, there exists
a constant C > 0 such that for any p ∈ N , there exists a DeepONet Gθ with a trunk net τ =
(0, τ1 , . . . , τp ) with p outputs and branch net β = (0, β1 , . . . , βp ), such that
size(β) ≤ C(1 + p2 log(p)), depth(β) ≤ C(1 + log(p)), (D.44)
and
size(τ ) ≤ Cp2 depth(τ ) ≤ 2 (D.45)
such that
∇ · (a(x)∇Gθ (a)(x)) − f (x) L2 (µ)
≤ Cp−β . (D.46)
Proof. We first check the conditions of Theorem 3.9. Lemma D.9 with s ← 1, k ← r and m ← p
then provides a DeepONet that satisfies the conditions of Theorem 3.9 with r∗ = +∞ and equation
(3.7) with σ(s) = s/d. Moreover, the following estimate holds,
∇ · (a(x)∇Gθ (a)(x)) − f (x) L2 (µ)
40