Learning in Latent Spaces Improves The Predictive Accuracy of Deep Neural Operators
1 Introduction
Achieving universal function approximation is one of the most important tasks in the rapidly growing field of machine
learning (ML). To this end, deep neural networks (DNNs) have been actively developed, enhanced and used for a plethora
of versatile applications in science and engineering including image processing, natural language processing (NLP), rec-
ommendation systems, and design optimization [Guo et al. (2016); Pak and Kim (2017); Brown et al. (2020); Otter et al.
(2020); Khan et al. (2021); Kollmann et al. (2020)]. In the emerging field of scientific machine learning (SciML), DNNs
are a ubiquitous tool for analyzing, solving, and optimizing complex physical systems modeled with partial differential
equations (PDEs) across a range of scenarios, including different initial and boundary conditions (ICs, BCs), model pa-
rameters and geometric domains. Such models are trained from a finite dataset of labeled observations generated from
a (generally expensive) traditional numerical solver (e.g., the finite difference method (FD), the finite element method (FEM), or computational fluid dynamics (CFD)), and once trained they allow for accurate predictions with real-time inference [Berg and
Nyström (2019); Chen et al. (2019); Raissi et al. (2019); Abdar et al. (2021)].
DNNs are conventionally used to learn functions by approximating mappings between finite dimensional vector
spaces. Operator regression, a more recently proposed ML paradigm, focuses on learning operators by approximat-
ing mappings between abstract infinite-dimensional Banach spaces. Neural operators specifically, first introduced in
2019 with the deep operator network (DeepONet) [Lu et al. (2021)], employ DNNs to learn PDE operators and provide a
discretization-invariant emulator, which allows for fast inference and high generalization accuracy. Motivated by the
universal approximation theorem for operators proposed by Chen & Chen [Chen and Chen (1995)], DeepONet encapsu-
lates and extends the theorem for deep neural networks [Lu et al. (2021)]. The architecture of DeepONet features a DNN,
which encodes the input functions at fixed sensor points (branch net) and another DNN, which encodes the informa-
tion related to the spatio-temporal coordinates of the output function (trunk net). Since its first appearance, standard
DeepONet has been employed to tackle challenging problems involving complex high-dimensional dynamical systems
[Di Leoni et al. (2021); Kontolati et al. (2023); Goswami et al. (2022d); Oommen et al. (2022); Cao et al. (2023b)]. In ad-
dition, extensions of DeepONet have been recently proposed in the context of multi-fidelity learning [De et al. (2022);
Lu et al. (2022b); Howard et al. (2022)], integration of multiple-input continuous operators [Jin et al. (2022); Goswami
∗ Corresponding author. Email: michael.shields@jhu.edu
et al. (2022c)], hybrid transferable numerical solvers [Zhang et al. (2022a)], transfer learning [Goswami et al. (2022b)],
and physics-informed learning to satisfy the underlying PDE [Wang et al. (2021); Goswami et al. (2022a)].
Another class of neural operators is the integral operators, first instantiated with the graph kernel networks (GKN)
introduced by Li et al. (2020b). In GKNs, the solution operator is expressed as an integral operator of Green’s function
which is modeled with a neural net and consists of a lifting layer, iterative kernel integration layers, and a projection
layer. GKNs were found to be unstable for multiple layers and a new graph neural operator was developed in D’Elia et al.
(2022) based on a discrete non-local diffusion-reaction equation. Furthermore, to alleviate the inefficiency and cost of
evaluating integral operators, the Fourier neural operator (FNO) [Li et al. (2020a)] was proposed, in which the integral
kernel is parameterized directly in the Fourier space. The input to the network, like in GKNs, is elevated to a higher
dimension, then passed through numerous Fourier layers before being projected back to the original dimension. Each
Fourier layer involves a forward fast Fourier transform (FFT), followed by a linear transformation of the retained low-frequency Fourier modes and then an inverse FFT. Finally, the output is added to a linear transformation of the layer input, and the sum is passed through an activation function to introduce nonlinearity. Different variants of FNO have been proposed, such as FNO-2D, which
performs 2D Fourier convolutions and uses a recurrent structure to propagate the PDE solution in time, and the FNO-3D,
which performs 3D Fourier convolutions through space and time. Compared to DeepONet, FNO employs evaluations
restricted to an equispaced mesh to discretize both the input and output spaces, where the mesh and the domain must
be the same. The interested reader is referred to Lu et al. (2022a) for a comprehensive comparison between DeepONet
and FNO across a range of complex applications. Recent advancements in neural operator research have yielded promis-
ing results for addressing the bottlenecks of FNO. Two such integral operators are the Wavelet Neural Operator (WNO) [Tripura and Chakraborty (2023)] and the Laplace Neural Operator (LNO) [Cao et al. (2023a)], which have been proposed as alternative solutions for capturing the spatial behavior of a signal and for accurately approximating transient responses, respectively.
Figure 1 Latent DeepONet (L-DeepONet) framework for learning deep neural operators on latent spaces. In the first step, a multi-layer autoencoder is trained using a combined dataset of the high-dimensional input and output realizations of a PDE model, $\{x_i, y_i\}_{i=1}^{N}$. The trained encoder projects the data onto a latent space $\mathbb{R}^d$, and the dataset on the latent space, $\{x_i^r, y_i^r\}_{i=1}^{N}$, is then used to train a DeepONet model and learn the operator $\mathcal{G}_\theta$, where $\theta$ denotes the trainable parameters of the network. Finally, to evaluate the performance of the model on the original PDE outputs and perform inference, the pre-trained decoder is employed to map predicted samples back to the physically interpretable space.
Despite the impressive capabilities of the aforementioned methods to learn mesh-invariant surrogates for complex
PDEs, these models are primarily used in a data-driven manner, and thus a representative and sufficient labeled dataset
needs to be acquired a-priori. Often, complex physical systems require high-fidelity simulations defined on fine spatial
and temporal grids, which results in very high-dimensional datasets. Furthermore, the high (and often prohibitive)
expense of traditional numerical simulators (e.g., FEM) allows for the generation of only a few hundred (and possibly even fewer) observations. The combination of few and very high-dimensional observations can result in sparse datasets that often do not adequately represent the input/output distribution space. In addition, raw high-dimensional physics-based
data often consists of redundant features that can (often significantly) delay and hinder network optimization. Physical
constraints cause the data to live on lower-dimensional latent spaces (manifolds) that can be identified with suitable
linear or nonlinear dimension reduction (DR) techniques. Previous studies have shown how latent representations can be
leveraged to enable surrogate modeling and uncertainty quantification (UQ) by addressing the ‘curse of dimensionality’
in high-dimensional PDEs with traditional approaches such as Gaussian processes (GPs) and polynomial chaos expansion
(PCE) [Lataniotis et al. (2020); Nikolopoulos et al. (2022); Giovanis and Shields (2020); Kontolati et al. (2022a,b)]. Although
neural network-based models can naturally handle high-dimensional input and output datasets, it is not clear how their
predictive accuracy, generalizability, and robustness to noise are affected when these models are trained with suitable
latent representations of the high-dimensional data.
In this work, we aim to investigate the aforementioned open questions by exploring the training of DeepONet on
latent spaces for high-dimensional time-dependent PDEs of varying degrees of complexity. The idea of training neural
operators on latent spaces using DeepONet and autoencoders (AE) was originally proposed in Oommen et al. (2022). In that work, the growth of a two-phase microstructure for particle vapor deposition was modeled using the Cahn-Hilliard
equation. In another recent work [Zhang et al. (2022b)], the authors explored neural operators in conjunction with AE to
tackle high-dimensional stochastic problems. But the general questions of the predictive accuracy and generalizability
of DeepONets trained on latent spaces remain and require systematic investigation with comparisons to conventional
neural operators.
The training of neural operators on latent spaces consists of a two-step approach: first, training a suitable AE model to
identify a latent representation for the high-dimensional PDE inputs and outputs, and second, training a DeepONet model
and employing the pre-trained AE decoder to project samples back to the physically interpretable high-dimensional
space (see Figure 1). The L-DeepONet framework has two advantages: first, the accuracy of DeepONet is improved, and
second, the L-DeepONet training is accelerated due to the low dimensionality of the data in the latent space. Combined
with the pre-trained AE model, L-DeepONet can perform accurate predictions with real-time inference and learn the
solution operator of complex time-dependent PDEs in low-dimensional space. The contributions of this work can be
summarized as follows:
• We investigate the performance of L-DeepONet, an extension of standard DeepONet, for high-dimensional time-
dependent PDEs that leverages latent representations of input and output functions identified by suitable autoen-
coders (see Figure 1).
• We perform direct comparisons with vanilla DeepONet for complex physical systems, including brittle fracture of
materials, and complex convective and atmospheric flows, and demonstrate that L-DeepONet consistently outper-
forms the standard approach in terms of accuracy and computational time.
• We perform direct comparisons with another neural operator model, the Fourier neural operator (FNO), and two
of its variants, i.e., FNO-2D and FNO-3D, and identify advantages and limitations for a diverse set of applications.
2 Results
To demonstrate the advantages and efficiency of L-DeepONet, we learn the operator for three diverse PDE models of
increasing complexity and dimensionality. First, we consider a PDE that describes the growth of fracture in brittle
materials, which are widely used in various industries including construction and manufacturing. Accurately predicting the growth of fractures in these materials is important for preventing failures and improving safety, reliability, and cost-effectiveness in a wide range of applications. Second, we consider a PDE describing convective fluid flow, a com-
mon phenomenon in many natural and industrial processes. Understanding how these flows evolve may allow engineers
to better design systems such as heat exchangers or cooling systems to enhance efficiency and reduce energy consump-
tion. Finally, we consider a PDE describing large-scale atmospheric flows which can be used to predict patterns that
occur in weather systems. Such flows play a crucial role in the Earth’s climate system, influencing precipitation and temperature, which in turn have a significant impact on water resources, agricultural productivity, and energy production.
Developing an accurate surrogate to predict such complex atmospheric patterns in detail may allow us to better adapt
to changes in the climate system and develop effective strategies to mitigate the impacts of climate change. For all PDEs,
the input functions for the operator represent initial conditions modeled as Gaussian or non-Gaussian random fields.
We perform direct comparisons of L-DeepONet with the standard DeepONet model trained on the full dimensional data
and with FNO. More details about the models and the corresponding data generation process are provided in the Sup-
plementary Materials to assist the readers in readily reproducing the results presented below.
Figure 2 Left: Results for all applications of the multi-layer autoencoders (MLAE) for different values of the latent dimensionality.
Right: Results for all applications of the neural operators for all studied models. Violin plots represent 5 independent trainings of the models using different random seed numbers.
Brittle fracture mechanics
Modeling fracture using the phase field method involves the integration of two fields, namely the vector-valued elastic field, u(x), and the scalar-valued phase field, φ(x) ∈ [0, 1], with 0 representing the undamaged state of the material and 1 a fully damaged state.
The equilibrium equation for the elastic field for an isotropic model, considering the evolution of crack, can be written
as [Goswami et al. (2019)]:
−∇ · g(φ)σ = f on Ω, (1)
where σ is the Cauchy stress tensor, f is the body force and g(φ) = (1 − φ)2 represents the monotonically decreas-
ing stress-degradation function that reduces the stiffness of the bulk material in the fracture zone. The elastic field is
constrained by Dirichlet and Neumann boundary conditions:
$$g(\phi)\,\sigma \cdot n = t_N \ \text{on } \partial\Omega_N, \qquad u = \bar{u} \ \text{on } \partial\Omega_D, \tag{2}$$
where $t_N$ denotes the prescribed boundary forces and $\bar{u}$ the prescribed displacement for each load step. The Dirichlet and Neumann boundaries are represented by $\partial\Omega_D$ and $\partial\Omega_N$, respectively. Considering the second-order phase field for a
quasi-static setup, the governing equation can be written as:
$$\frac{G_c}{l_0}\,\phi - G_c\, l_0 \nabla^2 \phi = -g'(\phi)\, H(x, t; l_c, y_c) \ \text{on } \Omega, \tag{3}$$
where Gc is a scalar parameter representing the critical energy release rate of the material, l0 is the length scale pa-
rameter, which controls the diffusion of the crack, H(x, t) is a local strain-history functional, and yc , lc represent the
position and length of the crack respectively. For sharp crack topology, l0 → 0 [Bourdin et al. (2008)]. H(x, t) contains
the maximum positive tensile energy ($\Psi_0^+$) in the history of deformation of the system. The strain-history functional is
employed to initialize the crack on the domain as well as to impose irreversibility conditions on the crack growth [Miehe
et al. (2010)]. In this problem, we consider yc , lc to be random variables with yc ∼ U [0.3, 0.7] and lc ∼ U [0.4, 0.6], thus,
the initial strain function H(x, t = 0; lc , yc ) is also random (see the Supplementary Materials for more details). We aim
to learn the solution operator G : H(x, t = 0; lc, yc) ↦ φ(x, t), which maps the initial strain-history function to the crack
evolution.
In Figure 2(a), we show the mean-square error (MSE) between the studied models and ground truth. The left panel
shows the MSE for the multi-layer autoencoder (MLAE) for different latent dimensions (d), where the violin plot shows
the distribution of MSE from n = 5 independent trials. The right panel shows the resulting MSE for L-DeepONet oper-
ating on different latent dimensions (d) compared with the full high-dimensional DeepONet, FNO-2D, and FNO-3D. We
observe that, regardless of the latent dimension, the L-DeepONet outperforms the standard DeepONet (Full DON) and
performs comparably with FNO-2D and FNO-3D. In Figure 3, a comparison between all models for a random represen-
tative result is shown. While L-DeepONet results in prediction fields almost identical to the reference, the predictions
of the standard models deviate from the ground truth both inside and around the propagated crack. Finally, the cost of
training the different models is presented in Table 1. Because the required network complexity is significantly reduced,
the L-DeepONet is 1 − 2 orders of magnitude cheaper to train than the standard approaches.
Figure 3 Brittle fracture in a plate loaded in shear: results of a representative sample with yc = 0.55 and lc = 0.6 for all
neural operators. The results of the L-DeepONet model consider the latent dimension, d = 64. The neural operator is trained to
approximate the growth of the crack for five time steps from a given initial location of the defect.
Rayleigh–Bénard convection
The onset of convection in a fluid layer heated from below and subjected to a temperature difference ΔT is governed by the Rayleigh number,
$$\mathrm{Ra} = \frac{\alpha\, \Delta T\, g\, h^3}{\nu \kappa}, \tag{4}$$
where α is the thermal expansion coefficient, g is the gravitational acceleration, h is the thickness of the fluid layer, ν is the kinematic viscosity, and κ is the thermal diffusivity. When ΔT is small, the convective flow does not occur due
to stabilizing effects of viscous friction. Based on the governing conservation laws for an incompressible fluid (mass,
momentum, energy) and the Boussinesq approximation according to which density perturbations affect only the gravi-
tational force, the dimensional form of the Rayleigh-Bénard equations for a fluid defined on a domain Ω reads:
$$\frac{Du}{Dt} = -\frac{1}{\rho_0}\nabla p + \frac{\rho}{\rho_0}\, g + \nu \nabla^2 u, \quad x \in \Omega,\ t > 0,$$
$$\frac{DT}{Dt} = \kappa \nabla^2 T, \quad x \in \Omega,\ t > 0, \tag{5}$$
$$\nabla \cdot u = 0,$$
$$\rho = \rho_0 \left(1 - \alpha (T - T_0)\right),$$
where D/Dt denotes material derivative, u, p, T are the fluid velocity, pressure and temperature respectively, T0 is the
temperature at the lower plate, and x = (x, y) are the spatial coordinates. Considering two plates (upper and lower) the
corresponding BCs and ICs are defined as
$$T(x, t)|_{y=0} = T_0, \quad x \in \Omega,\ t > 0,$$
$$T(x, t)|_{y=h} = T_1, \quad x \in \Omega,\ t > 0,$$
$$u(x, t)|_{y=0} = u(x, t)|_{y=h} = 0, \quad x \in \Omega,\ t > 0, \tag{6}$$
$$T(y, t)|_{t=0} = T_0 + \frac{y}{h}(T_1 - T_0) + 0.1\, v(x), \quad x \in \Omega,$$
$$u(x, t)|_{t=0} = 0, \quad x \in \Omega,$$
where T0 and T1 are the fixed temperatures of the lower and upper plates, respectively. For a 2D rectangular domain and through a non-dimensionalization of the above equations, the fixed temperatures become T0 = 0 and T1 = 1. The IC of the temperature field is modeled as a linear profile with the addition of a GRF v(x) having correlation length scales ℓx = 0.45, ℓy = 0.4, simulated using a Karhunen–Loève expansion. The objective is to approximate the operator
G : T(x, t = 0) ↦ T(x, t) (see the Supplementary Materials for more details).
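For concreteness, the following is a minimal sketch of how one realization of such an initial temperature field could be assembled, assuming a squared-exponential covariance for the GRF and a simple collocation-based Karhunen–Loève expansion; the grid resolution, function names, and truncation level are illustrative and do not reproduce the exact data-generation pipeline described in the Supplementary Materials.

```python
import numpy as np

def grf_kle_2d(nx=64, ny=64, lx=0.45, ly=0.4, n_terms=100, seed=0):
    """Sample a zero-mean Gaussian random field v(x, y) on [0, 1]^2 through a
    truncated Karhunen-Loeve expansion of a separable squared-exponential
    covariance with correlation lengths (lx, ly)."""
    rng = np.random.default_rng(seed)
    x, y = np.linspace(0, 1, nx), np.linspace(0, 1, ny)

    def kle_1d(pts, ell):
        # Collocation KLE: eigen-decomposition of the 1D covariance matrix.
        C = np.exp(-(pts[:, None] - pts[None, :]) ** 2 / (2.0 * ell ** 2))
        lam, phi = np.linalg.eigh(C)
        order = np.argsort(lam)[::-1]
        return np.clip(lam[order], 0.0, None) / len(pts), phi[:, order]

    lam_x, phi_x = kle_1d(x, lx)
    lam_y, phi_y = kle_1d(y, ly)

    # Retain the n_terms largest products of 1D eigenvalues (2D modes).
    lam_2d = np.outer(lam_y, lam_x)                       # shape (ny, nx)
    flat = np.argsort(lam_2d, axis=None)[::-1][:n_terms]
    iy, ix = np.unravel_index(flat, lam_2d.shape)

    xi = rng.standard_normal(n_terms)                     # i.i.d. standard normal KLE coefficients
    v = np.zeros((ny, nx))
    for k in range(n_terms):
        v += xi[k] * np.sqrt(lam_2d[iy[k], ix[k]]) * np.outer(phi_y[:, iy[k]], phi_x[:, ix[k]])
    return v

# Initial temperature: linear conductive profile plus the small random perturbation,
# T(x, y, t=0) = T0 + (y / h)(T1 - T0) + 0.1 v(x, y), with T0 = 0, T1 = 1, h = 1.
ny, nx = 64, 64
y_grid = np.linspace(0.0, 1.0, ny)[:, None]
T_init = 0.0 + y_grid * (1.0 - 0.0) + 0.1 * grf_kle_2d(nx, ny)
```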
Figure 4 Rayleigh-Bénard convective flow: results of the temperature field of a representative sample for all neural operators.
The results of the L-DeepONet model consider the latent dimension d = 100. The neural operator is trained to approximate the evolution of the temperature field from a realization of the initial temperature field for seven time steps.
Figure 2(b) again shows violin plots of the MSE for the MLAE with differing latent dimensions and the MSE for the
corresponding L-DeepONet compared with the other neural operators. Here we see that the reconstruction accuracy
of the MLAE is improved by increasing the latent dimensionality up to d = 100. However, the change in the predictive
accuracy of L-DeepONet for different values of d is less significant, indicating that latent spaces with even very small
dimensions (d = 25) result in a very good performance. Furthermore, L-DeepONet outperforms all other neural oper-
ators with a particularly significant improvement compared to FNO. In Figure 4, we observe that L-DeepONet is able to
capture the complex dynamical features of the true model with high accuracy as the simulation evolves. In contrast, the
standard DeepONet and FNO result in diminished performance as they tend to smooth out the complex features of the
true temperature fields. Furthermore, the training time of the L-DeepONet is significantly lower than the full DeepONet
and FNO as shown in Table 1.
Shallow-water equations
The shallow-water equations model the dynamics of large-scale atmospheric flows [Galewsky et al. (2004)]. In a vector
form, the viscous shallow-water equations can be expressed as
$$\frac{DV}{Dt} = -f\, k \times V - g \nabla h + \nu \nabla^2 V,$$
$$\frac{Dh}{Dt} = -h \nabla \cdot V + \nu \nabla^2 h, \quad x \in \Omega,\ t \in [0, 1], \tag{7}$$
where Ω = (λ, φ) represents a spherical domain where λ, φ are the longitude and latitude respectively ranging from
[−π, π], V = iu + jv is the velocity vector tangent to the spherical surface (i and j are the unit vectors in the eastward
and northward directions respectively and u, v the velocity components), and h is the height field which represents the
thickness of the fluid layer. Moreover, f = 2Ξ sin φ is the Coriolis parameter, where Ξ is the Earth’s angular velocity, g is
the gravitational acceleration and ν is the diffusion coefficient.
As an initial condition, we consider a zonal flow which represents a typical mid-latitude tropospheric jet. The initial
velocity component u is expressed as a function of the latitude φ as
$$u(\phi, t = 0) = \begin{cases} 0 & \text{for } \phi \le \phi_0, \\[4pt] \dfrac{u_{\max}}{n} \exp\left[\dfrac{1}{(\phi - \phi_0)(\phi - \phi_1)}\right] & \text{for } \phi_0 < \phi < \phi_1, \\[4pt] 0 & \text{for } \phi \ge \phi_1, \end{cases} \tag{8}$$
where $u_{\max}$ is the maximum zonal velocity, $\phi_0$ and $\phi_1$ represent the latitudes of the southern and northern boundaries of the jet in radians, respectively, and $n = \exp[-4/(\phi_1 - \phi_0)^2]$ is a non-dimensional parameter that sets the value $u_{\max}$ at
the jet’s mid-point. A small unbalanced perturbation is added to the height field to induce the development of barotropic
instability. The localized Gaussian perturbation is described as
$$h'(\lambda, \phi) = \hat{h}\, \cos(\phi)\, \exp\!\left[-\left(\frac{\lambda}{\alpha}\right)^2\right] \exp\!\left[-\left(\frac{\phi_2 - \phi}{\beta}\right)^2\right],$$
where $-\pi < \lambda < \pi$ and $\hat{h}$, $\phi_2$, $\alpha$, $\beta$ are parameters that control the location and shape of the perturbation. We consider
α, β to be random variables with α ∼ U [0.1̄, 0.5] and β ∼ U [0.03̄, 0.2] so that the input Gaussian perturbation is random.
The localized perturbation is added to the initial height field, which forms the final initial condition h(λ, φ, t = 0) (see
Supplementary Materials for more details). The objective is to approximate the operator G : h(λ, φ, t = 0) ↦ u(λ, φ, t).
This problem is particularly challenging as the fine mesh required to capture the details of the convective flow both
spatially and temporally results in output realizations having millions of dimensions.
Figure 5 Shallow water equations: results of the evolution of the velocity field through eight time steps for all the operator
models considered in this work, for a representative realization of the initial perturbation to the height field. The results of the
L-DeepONet model consider the latent dimension, d = 81.
Unlike the previous two applications, here the approximated operator learns to map the initial condition of one quan-
tity, h(λ, φ, t = 0), to the evolution of a different quantity, u(λ, φ, t). Given the difference between the input and output
quantities of interest (in scale and features), a single encoding of the combined data as in the standard proposed approach
(see Figure 1) is insufficient. Instead, two separate encodings are needed for the input and output data, respectively.
While an autoencoder is used to reduce the dimensionality of the output data representing the longitudinal component
of the velocity vector u, standard principal component analysis (PCA) is performed on the input data due to the small
local variations in the initial random height field h which results in a small intrinsic dimensionality.
Results, in terms of MSE, are presented in Figure 2(c), where again we see that the L-DeepONet outperforms the stan-
dard approach while changes in the latent dimension do not result in significant differences in the model accuracy. Con-
sistent with the results of the previous application, the training cost of the L-DeepONet is much lower than the full Deep-
ONet (Table 1). We further note that training FNO for this problem (either FNO-2D or FNO-3D) proved computationally
prohibitive. For a moderate 3D problem with spatial discretization beyond $64^3$, the latest GPU architectures such as the
NVIDIA Ampere GPU do not provide sufficient memory to process a single training sample [Grady II et al. (2022)]. Data
partitioning across multiple GPUs with distributed memory, model partitioning techniques like pipeline parallelism,
and domain decomposition approaches [Grady II et al. (2022)] can be implemented to handle high-dimensional tensors
within the context of an automatic differentiation framework to compute the gradients/sensitivities of PDEs and thus op-
timize the network parameters. This advanced implementation is beyond the scope of this work as it proves unnecessary
for the studied approach. Consequently, a comparison to the FNO is not shown here. Figure 5 shows the evolution of the L-DeepONet and the full DeepONet compared to the ground truth for a single realization. The L-DeepONet consistently captures the complex nonlinear dynamical features for all time steps, while the full model prediction degrades over time and again smooths the results, such that it fails to predict extreme velocity values at each time step that can be crucial, e.g., in weather forecasting.
Table 1 Comparison of the computational training time in seconds (s) for all the neural operators across all considered appli-
cations, identically trained on an NVIDIA A6000 GPU. Inference is performed at a fraction of a second for all the approaches.
Table 2 Comparison of the accuracy of the L-DeepONet for two different dimensionality reduction techniques; namely, the
multi-layer autoencoders (MLAE) and principal component analysis (PCA), and d denotes the size of the latent space. Results for
both the maximum and minimum d values tested for each application are provided. To evaluate the performance of L-DeepONet,
we compute the mean square error of predictions, and we report the mean and standard deviation of this metric based on five
independent training trials.
3 Discussion
We have investigated latent DeepONet (L-DeepONet) for learning neural operators on latent spaces for time-dependent
PDEs exhibiting highly non-linear features both spatially and temporally and resulting in high-dimensional observations.
The L-DeepONet framework leverages autoencoder models to cleverly construct compact representations of the high-
dimensional data while a neural operator is trained on the identified latent space for operator regression. Both the
advantages and limitations of L-DeepONet are demonstrated on a collection of diverse PDE applications of increasing
complexity and data dimensionality. As presented, L-DeepONet provides a powerful tool in SciML and UQ that improves the accuracy and generalizability of neural operators in applications where high-fidelity simulations exhibit complex dynamical features, e.g., in climate models.
A systematic comparison with standard DeepONet and FNO revealed that L-DeepONet improves the quality of results
and it can capture with greater accuracy the evolution of the system represented by a time-dependent PDE. This result is
more noticeable as the dimensionality and non-linearity of dynamical features increase (e.g., in complex convective fluid
flows). Another advantage is that L-DeepONet training requires fewer computational resources, as standard DeepONet and FNO are trained on the full-dimensional data and are thus more computationally demanding and require much
larger memory (see Table 1). For all applications, we found that a small latent dimensionality (d ≤ 100) is sufficient for
constructing powerful neural operators, by removing redundant features that can hinder the network optimization and
thus its predictive accuracy. Furthermore, L-DeepONet can alleviate the computational demand and thus enable tasks
that require the computation of kernel matrices, e.g., used in transfer learning for comparing the statistical distance
between data distributions [Goswami et al. (2022b)].
Despite the advantages of learning operators in latent spaces, there are certain limitations that warrant discussion.
L-DeepONet trains DR models to identify suitable latent representations for the combined input and output data. How-
ever, as shown in the final application, in cases where the approximated mapping involves heterogeneous quantities,
two independent DR models need to be constructed. While in this work we found that simple MLAE models result in the
smallest L-DeepONet predictive error, a preliminary study regarding the suitability of the DR approach needs to be per-
formed for all quantities of interest. Another disadvantage is that the L-DeepONet as formulated is unable to interpolate
in the spatial dimensions. The current L-DeepONet consists of a modified trunk net where the time component has been
preserved while the spatial dimensions have been convolved. Thus, L-DeepONet can be used for time but not for space
interpolation/extrapolation. Finally, L-DeepONet cannot be readily employed in a physics-informed learning manner
since the governing equations are not known in the latent space and therefore cannot be directly imposed. These limita-
tions motivate future studies that continue to assist researchers in the process of constructing accurate and generalizable
surrogate models for complex PDE problems prevalent in physics and engineering.
where Θ is a finite-dimensional parameter space. In this standard setting, the optimal parameters $\theta^*$ are learned by training the neural operator (e.g., DeepONet, FNO) with a set of labeled observations $\{x_j, y_j\}_{j=1}^{N}$ generated on a discretized domain $\Omega_m = \{x_1, \ldots, x_m\} \subset \Omega$, where $\{x_j\}_{j=1}^{m}$ represent the sensor locations; thus $x_{j|\Omega_m} \in \mathbb{R}^{D_x}$ and $y_{j|\Omega_m} \in \mathbb{R}^{D_y}$, where $D_x = d_x \times m$ and $D_y = d_y \times m$. Representing the domain discretization with a single parameter $m$ corresponds to the simplistic case where mesh points are equispaced; however, the training data of neural operators are not restricted to equispaced meshes. For example, for a time-dependent PDE with two spatial dimensions and one temporal dimension with discretizations $m_s$ and $m_t$, respectively, the total output dimensionality is computed as $D_y = m_s^{d_x} \times m_t$.
where $\mathcal{J}_{\theta_\text{encoder}}$ and $\mathcal{J}_{\theta_\text{decoder}}$ are the two parts of a DR method, the superscript $r$ corresponds to data on the reduced space, $\mathcal{G}_\theta$ is the approximated latent operator, and $\theta$ denotes its trainable parameters. While the encoder $\mathcal{J}_{\theta_\text{encoder}}$ is used to project high-dimensional data onto the latent space, the decoder $\mathcal{J}_{\theta_\text{decoder}}$ is employed during the training of DeepONet to project predicted samples back to the original space and evaluate accuracy on the full-dimensional data $\{x_j, y_j\}_{j=1}^{N}$. Once trained, L-DeepONet can be used for real-time inference at negligible cost. We note that the term ‘L-DeepONet’ refers to the trained DeepONet model together with the pre-trained encoder and decoder parts of the autoencoder, which are required to perform inference on unseen samples (see Figure 1). Next, the distinct parts of the L-DeepONet framework are elucidated in detail.
where $\{x_i^r\}_{i=1}^{N} \in \mathbb{R}^d$, $\{y_i^r\}_{i=1}^{N \times m_t} \in \mathbb{R}^d$ and $\{z_i^r\}_{i=1}^{N(1+m_t)} \in \mathbb{R}^d$. The trainable parameters of the encoder and decoder are represented by $\theta_\text{encoder}$ and $\theta_\text{decoder}$, respectively. The optimal set of autoencoder parameters $\theta_\text{ae} = \{\theta_\text{encoder}, \theta_\text{decoder}\}$ is obtained via the minimization of the loss function
where $\|\cdot\|_2$ denotes the standard Euclidean norm and $\tilde{z} \equiv \{\tilde{x}, \tilde{y}\}$ denotes the reconstructed dataset of combined input and output data. In a preliminary study, not shown here for the sake of brevity, we investigated three AE models: simple autoencoders (vanilla AE) with a single hidden layer, multi-layer autoencoders (MLAE) with multiple hidden layers, and convolutional autoencoders (CAE), which convolve data through convolutional layers. We found that the MLAE performs best, even with a small number of hidden layers (e.g., 3). Furthermore, the use of alternative AE models
which are primarily used as generative models, such as variational autoencoders (VAE) [Kingma and Welling (2013)]
or Wasserstein autoencoders (WAE) [Tolstikhin et al. (2017)], resulted in significantly worse L-DeepONet performance.
Although such models resulted in good reconstruction accuracy and thus can be used to reduce the data dimensionality
and generate synthetic yet realistic samples, we found that the obtained submanifold is not well-suited for training the
neural operator, as it may result in the reduction of data variability or even representation collapse.
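As an illustration, a minimal sketch of an MLAE and its reconstruction loss is given below, assuming PyTorch; the layer widths follow the fracture-problem architecture of Table S2, all data are assumed scaled to [0, 1] (consistent with the sigmoid output layer), and the class and training-loop details are illustrative rather than the exact implementation.

```python
import torch
import torch.nn as nn

class MLAE(nn.Module):
    """Multi-layer autoencoder: the encoder projects a flattened snapshot z in R^D
    to a latent code in R^d and the decoder maps it back to R^D. All layers use
    ReLU except the last decoder layer, which uses a sigmoid (cf. Table S2)."""
    def __init__(self, input_dim, hidden=(128, 64), latent_dim=64):
        super().__init__()
        dims = (input_dim, *hidden, latent_dim)
        enc = []
        for a, b in zip(dims[:-1], dims[1:]):
            enc += [nn.Linear(a, b), nn.ReLU()]
        self.encoder = nn.Sequential(*enc)
        dec = []
        for a, b in zip(dims[::-1][:-1], dims[::-1][1:]):
            dec += [nn.Linear(a, b), nn.ReLU()]
        dec[-1] = nn.Sigmoid()                       # sigmoid on the reconstruction
        self.decoder = nn.Sequential(*dec)

    def forward(self, z):
        return self.decoder(self.encoder(z))

def train_mlae(model, loader, epochs=500, lr=1e-3):
    """Minimize the reconstruction loss ||z - z_tilde||^2 over the combined
    dataset z = {x, y} of input and output snapshots."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for z in loader:                             # mini-batches of shape (B, D)
            loss = ((z - model(z)) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```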
The latent DeepONet approximates the reduced solution operator as
$$\mathcal{G}_\theta(x^r)(\zeta) = \sum_{k=1}^{p} b_k\, tr_k, \tag{14}$$
where $[b_1, b_2, \ldots, b_p]^T$ is the output vector of the branch net, $[tr_1, tr_2, \ldots, tr_p]^T$ is the output vector of the trunk net, and $p$
denotes a hyperparameter that controls the size of the final hidden layer of both the branch and trunk net. The trainable
parameters of the DeepONet, represented by θ in Eq. (14), are obtained by minimizing a loss function, which is expressed
as:
$$\mathcal{L}(\theta) = \mathcal{L}_r(\theta) + \mathcal{L}_i(\theta), \qquad \mathcal{L}_r(\theta) = \min_{\theta}\, \|y^r - \tilde{y}^r\|_2^2, \tag{15}$$
where $\mathcal{L}_r(\theta)$ and $\mathcal{L}_i(\theta)$ denote the residual loss and the initial condition loss, respectively, $y^r$ the reference reduced outputs, and $\tilde{y}^r$ the predicted reduced outputs. In this work, we only consider the standard regression loss $\mathcal{L}_r(\theta)$; however, additional loss terms can be added to the loss function. The branch and trunk networks can be modeled with any specific architecture. Here we consider a CNN for the branch net and a feed-forward neural network (FNN) for the trunk net to take advantage of the low dimension of the evaluation points, $\zeta$. To feed the branch net of L-DeepONet, the reduced input data are reshaped to $\mathbb{R}^{\sqrt{d}\times\sqrt{d}}$; thus, it is advised to choose latent dimensionalities $d$ that are perfect squares. Once the optimal parameters $\theta$ are obtained, the trained model can be used to predict the reduced output for novel realizations of the input $x \in \mathbb{R}^{D_x}$. Finally, the predicted data are used as inputs to the pre-trained decoder $\mathcal{J}_{\theta_\text{decoder}}$ to transform results back to the original space and obtain the approximated full-dimensional output $y^\text{rec} \in \mathbb{R}^{D_y}$. We note that the training cost of L-DeepONet is significantly lower compared to the standard model, due to the smaller size of the network and the reduced total number of its trainable parameters.
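The sketch below illustrates the resulting latent DeepONet and the decoding step at inference, again assuming PyTorch; the convolutional branch, layer widths, and value of p are illustrative and do not reproduce the exact architecture of Table S3.

```python
import torch
import torch.nn as nn

class LatentDeepONet(nn.Module):
    """DeepONet on the latent space: a CNN branch encodes the reduced input x^r
    (reshaped to sqrt(d) x sqrt(d)), an FNN trunk encodes the temporal coordinate t,
    and their dot product over p basis functions gives the reduced output y^r(t)."""
    def __init__(self, d=64, p=5):
        super().__init__()
        self.side = int(d ** 0.5)                    # d is assumed a perfect square
        self.d, self.p = d, p
        self.branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * d, d * p))
        self.trunk = nn.Sequential(
            nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, p), nn.ReLU())
        self.b0 = nn.Parameter(torch.zeros(1))

    def forward(self, x_r, t):
        # x_r: (B, d) reduced inputs; t: (m_t, 1) temporal query locations zeta.
        b = self.branch(x_r.view(-1, 1, self.side, self.side)).view(-1, self.d, self.p)
        tr = self.trunk(t)                                      # (m_t, p)
        return torch.einsum('bdp,tp->btd', b, tr) + self.b0    # (B, m_t, d)

# Inference: predict latent outputs for a new reduced input and decode them back to
# the physical space with the pre-trained decoder of the autoencoder (mlae above):
#   y_r_pred = ldon(x_r_new, t_query)            # (B, m_t, d)
#   y_rec    = mlae.decoder(y_r_pred)            # (B, m_t, D), full-dimensional snapshots
```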
Error metric
To assess the performance of L-DeepONet we consider the MSE evaluated on a set of Ntest test realizations
$$\mathrm{MSE} = \frac{1}{N_\text{test}} \sum_{i=1}^{N_\text{test}} \left(y_i - y_i^\text{rec}\right)^2, \tag{16}$$
where $y \in \mathbb{R}^{D_y}$ is the reference and $y^\text{rec} \in \mathbb{R}^{D_y}$ the predicted output, respectively.
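In code, this metric amounts to a simple average over the test set; the short NumPy sketch below is illustrative and interprets the squared difference in Eq. (16) as an average over all space-time grid points of each realization.

```python
import numpy as np

def test_mse(y_true, y_pred):
    """MSE of Eq. (16): y_true and y_pred are arrays of shape (N_test, D_y)."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```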
More details on how this framework is implemented for different PDE systems of varying complexity can be found
in Results (Section 2). Information regarding the choice of neural network architectures and generation of training data
are provided in the Supplementary Materials.
References
Abdar, M., Pourpanah, F., Hussain, S., Rezazadegan, D., Liu, L., Ghavamzadeh, M., Fieguth, P., Cao, X., Khosravi, A., Acharya, U. R., et al. A Review
of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges. Information Fusion, 76:243–297, 2021.
Berg, J. and Nyström, K. Data-driven discovery of PDEs in complex datasets. Journal of Computational Physics, 384:239–252, 2019.
Bharali, R., Goswami, S., Anitescu, C., and Rabczuk, T. A robust monolithic solver for phase-field fracture integrated with fracture energy based
arc-length method and under-relaxation. Computer Methods in Applied Mechanics and Engineering, 394:114927, 2022.
Bourdin, B., Francfort, G. A., and Marigo, J.-J. The variational approach to fracture. Journal of Elasticity, 91(3):5–148, 2008.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models
are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
Cao, Q., Goswami, S., and Karniadakis, G. E. Lno: Laplace neural operator for solving differential equations. arXiv preprint arXiv:2303.10528,
2023a.
Cao, Q., Goswami, S., Karniadakis, G. E., and Chakraborty, S. Deep neural operators can predict the real-time response of floating offshore
structures under irregular waves. arXiv preprint arXiv:2302.06667, 2023b.
Chen, T. and Chen, H. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application
to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995.
Chen, Z., Zhang, J., Arjovsky, M., and Bottou, L. Symplectic Recurrent Neural Networks. arXiv preprint arXiv:1909.13334, 2019.
Chillà, F. and Schumacher, J. New perspectives in turbulent Rayleigh-Bénard convection. The European Physical Journal E, 35(7):1–25, 2012.
De, S., Hassanaly, M., Reynolds, M., King, R. N., and Doostan, A. Bi-fidelity Modeling of Uncertain and Partially Unknown Systems using Deep-
ONets. arXiv preprint arXiv:2204.00997, 2022.
D’Elia, M., Silling, S., Yu, Y., You, H., and Gao, T. Nonlocal Kernel Network (NKN): a Stable and Resolution-Independent Deep Neural Network.
Technical report, Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2022.
Di Leoni, P. C., Lu, L., Meneveau, C., Karniadakis, G., and Zaki, T. A. DeepONet prediction of linear instability waves in high-speed boundary
layers. arXiv preprint arXiv:2105.08697, 2021.
Galewsky, J., Scott, R. K., and Polvani, L. M. An initial-value problem for testing numerical models of the global shallow-water equations. Tellus
A: Dynamic Meteorology and Oceanography, 56(5):429–440, 2004.
Giovanis, D. G. and Shields, M. D. Data-driven surrogates for high dimensional models using Gaussian process regression on the Grassmann
manifold. Computer Methods in Applied Mechanics and Engineering, 370:113269, 2020.
Goswami, S. Phase field modeling of fracture with isogeometric analysis and machine learning methods. Doctoral Thesis, 2021.
Goswami, S., Anitescu, C., and Rabczuk, T. Adaptive phase field analysis with dual hierarchical meshes for brittle fracture. Engineering Fracture
Mechanics, 218:106608, 2019.
Goswami, S., Anitescu, C., and Rabczuk, T. Adaptive fourth-order phase field analysis for brittle fracture. Computer Methods in Applied Mechanics
and Engineering, 361:112808, 2020.
Goswami, S., Bora, A., Yu, Y., and Karniadakis, G. E. Physics-Informed Neural Operators. arXiv preprint arXiv:2207.05748, 2022a.
Goswami, S., Kontolati, K., Shields, M. D., and Karniadakis, G. E. Deep transfer operator learning for partial differential equations under condi-
tional shift. Nature Machine Intelligence, pages 1–10, 2022b.
Goswami, S., Li, D. S., Rego, B. V., Latorre, M., Humphrey, J. D., and Karniadakis, G. E. Neural operator learning of heterogeneous mechanobio-
logical insults contributing to aortic aneurysms. Journal of the Royal Society Interface, 19(193):20220410, 2022c.
Goswami, S., Yin, M., Yu, Y., and Karniadakis, G. E. A physics-informed variational DeepONet for predicting crack path in quasi-brittle materials.
Computer Methods in Applied Mechanics and Engineering, 391:114587, 2022d.
Grady II, T. J., Khan, R., Louboutin, M., Yin, Z., Witte, P. A., Chandra, R., Hewett, R. J., and Herrmann, F. J. Towards Large-Scale Learned Solvers
for Parametric PDEs with Model-Parallel Fourier Neural Operators. arXiv preprint arXiv:2204.01205, 2022.
Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., and Lew, M. S. Deep learning for visual understanding: A review. Neurocomputing, 187:27–48,
2016.
Howard, A. A., Perego, M., Karniadakis, G. E., and Stinis, P. Multifidelity Deep Operator Networks. arXiv preprint arXiv:2204.09157, 2022.
Jin, P., Meng, S., and Lu, L. MIONet: Learning multiple-input operators via tensor product. arXiv preprint arXiv:2202.06137, 2022.
Khan, Z. Y., Niu, Z., Sandiwarno, S., and Prince, R. Deep learning techniques for rating prediction: a survey of the state-of-the-art. Artificial
Intelligence Review, 54(1):95–135, 2021.
Kingma, D. P. and Welling, M. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
Kollmann, H. T., Abueidda, D. W., Koric, S., Guleryuz, E., and Sobh, N. A. Deep learning for topology optimization of 2D metamaterials. Materials
& Design, 196:109098, 2020.
Kontolati, K., Loukrezis, D., dos Santos, K. R., Giovanis, D. G., and Shields, M. D. Manifold learning-based polynomial chaos expansions for
high-dimensional surrogate models. International Journal for Uncertainty Quantification, 12(4), 2022a.
Kontolati, K., Loukrezis, D., Giovanis, D. G., Vandanapu, L., and Shields, M. D. A survey of unsupervised learning methods for high-dimensional
uncertainty quantification in black-box-type problems. Journal of Computational Physics, page 111313, 2022b.
Kontolati, K., Goswami, S., Shields, M. D., and Karniadakis, G. E. On the influence of over-parameterization in manifold based surrogates and
deep neural operators. Journal of Computational Physics, page 112008, 2023.
Lataniotis, C., Marelli, S., and Sudret, B. Extending classical surrogate modeling to high dimensions through supervised dimensionality reduc-
tion: a data-driven approach. International Journal for Uncertainty Quantification, 10(1), 2020.
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. Fourier Neural Operator for Parametric Partial
Differential Equations. arXiv preprint arXiv:2010.08895, 2020a.
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. Neural operator: Graph kernel network for
partial differential equations. arXiv preprint arXiv:2003.03485, 2020b.
Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. Learning nonlinear operators via DeepONet based on the universal approximation
theorem of operators. Nature machine intelligence, 3(3):218–229, 2021.
Lu, L., Meng, X., Cai, S., Mao, Z., Goswami, S., Zhang, Z., and Karniadakis, G. E. A comprehensive and fair comparison of two neural operators
(with practical extensions) based on FAIR data. Computer Methods in Applied Mechanics and Engineering, 393:114778, 2022a.
Lu, L., Pestourie, R., Johnson, S. G., and Romano, G. Multifidelity deep neural operators for efficient learning of partial differential equations
with application to fast inverse design of nanoscale heat transport. arXiv preprint arXiv:2204.06684, 2022b.
Miehe, C., Welschinger, F., and Hofacker, M. Thermodynamically consistent phase-field models of fracture: Variational principles and multi-field
FE implementations. International Journal for Numerical Methods in Engineering, 83(10):1273–1311, 2010.
Nikolopoulos, S., Kalogeris, I., and Papadopoulos, V. Non-intrusive surrogate modeling for parametrized time-dependent partial differential
equations using convolutional autoencoders. Engineering Applications of Artificial Intelligence, 109:104652, 2022.
Oommen, V., Shukla, K., Goswami, S., Dingreville, R., and Karniadakis, G. E. Learning two-phase microstructure evolution using neural operators
and autoencoder architectures. npj Computational Materials, 8(1):190, 2022.
Otter, D. W., Medina, J. R., and Kalita, J. K. A Survey of the Usages of Deep Learning for Natural Language Processing. IEEE Transactions on Neural
Networks and Learning Systems, 32(2):604–624, 2020.
Pak, M. and Kim, S. A review of deep learning in image recognition. In 2017 4th international conference on computer applications and information
processing technology (CAIPT), pages 1–3. IEEE, 2017.
Raissi, M., Perdikaris, P., and Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse
problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
Tolstikhin, I., Bousquet, O., Gelly, S., and Schoelkopf, B. Wasserstein Auto-Encoders. arXiv preprint arXiv:1711.01558, 2017.
Tripura, T. and Chakraborty, S. Wavelet neural operator for solving parametric partial differential equations in computational mechanics prob-
lems. Computer Methods in Applied Mechanics and Engineering, 404:115783, 2023.
Wang, S., Wang, H., and Perdikaris, P. Learning the solution operator of parametric partial differential equations with physics-informed Deep-
ONets. Science advances, 7(40):eabi8605, 2021.
Zhang, E., Kahana, A., Turkel, E., Ranade, R., Pathak, J., and Karniadakis, G. E. A Hybrid Iterative Numerical Transferable Solver (HINTS) for PDEs
Based on Deep Operator Network and Relaxation Methods. arXiv preprint arXiv:2208.13273, 2022a.
Zhang, J., Zhang, S., and Lin, G. MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems. arXiv preprint arXiv:2204.03193, 2022b.
Acknowledgements
The authors would like to acknowledge computing support provided by the Advanced Research Computing at Hopkins
(ARCH) core facility at Johns Hopkins University and the Rockfish cluster and the computational resources and services
at the Center for Computation and Visualization (CCV), Brown University where all experiments were carried out.
Funding
KK & MDS: U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research grant under
Award Number DE-SC0020428.
SG & GEK: U.S. Department of Energy project PhILMs under Award Number DE-SC0019453 and the OSD/AFOSR Multi-
disciplinary Research Program of the University Research Initiative (MURI) grant FA9550-20-1-0358.
Author contributions
Conceptualization: KK, SG, GEK, MDS
Investigation: KK, SG
Visualization: KK, SG
Supervision: GEK, MDS
Writing—original draft: KK, SG
Writing—review & editing: KK, SG, GEK, MDS
Competing interests
The authors declare no competing interests.
Supplementary Text
Nomenclature
Table S1 Summary of the main symbols and notation used in this work.
Notation Description
xj an input realization (e.g., ICs, BCs)
yj an output of the PDE model
f (·) a forcing function of the PDE
G PDE solution operator
Gθ approximation of mapping on latent space
θ trainable parameters of the neural operator
Jθencoder encoder part of the autoencoder
Jθdecoder decoder part of the autoencoder
{xi }m i=1 sensor locations
ms , mt spatial and temporal discretization
$[x_j^r(x_1), x_j^r(x_2), \ldots, x_j^r(x_d)]$ pointwise evaluation of the reduced input to the branch net
ζ locations as inputs to the trunk net
Lae autoencoder loss
Lr (θ) L-DeepONet residual loss
d latent space dimensionality
GRF Gaussian random field
CNN convolutional neural network
FNN feed-forward neural network
CAE convolutional autoencoder
VAE variational autoencoder
MLAE multi-layer autoencoder
N total number of train/test data
OOD out-of-distribution
KLE Karhunen–Loève expansion
MSE mean squared error
Theoretical details
Neural operators
Let $\Omega \subset \mathbb{R}^D$ be a bounded open set and $\mathcal{X} = \mathcal{X}(\Omega; \mathbb{R}^{d_x})$ and $\mathcal{Y} = \mathcal{Y}(\Omega; \mathbb{R}^{d_y})$ two separable Banach spaces. Furthermore, assume that $\mathcal{G} : \mathcal{X} \rightarrow \mathcal{Y}$ is a non-linear map arising from the solution of a time-dependent PDE. The objective is to approximate the nonlinear operator via the following parametric mapping
$$\mathcal{G}_\theta : \mathcal{X} \rightarrow \mathcal{Y}, \quad \theta \in \Theta,$$
where $\Theta$ is a finite-dimensional parameter space. The optimal parameters $\theta^*$ are learned via the training of a neural operator with backpropagation based on a dataset $\{x_j, y_j\}_{j=1}^{N}$ generated on a discretized domain $\Omega_m = \{x_1, \ldots, x_m\} \subset \Omega$, where $\{x_j\}_{j=1}^{m}$ represent the sensor locations; thus $x_{j|\Omega_m} \in \mathbb{R}^{D_x}$ and $y_{j|\Omega_m} \in \mathbb{R}^{D_y}$, where $D_x = d_x \times m$ and $D_y = d_y \times m$.
DeepONet
The Deep Operator Network (DeepONet) [Lu et al. (2021)] aims to learn operators between infinite-dimensional Banach
spaces. Learning is performed in a general setting in the sense that the sensor locations $\{x_i\}_{i=1}^{m}$ at which the input functions are evaluated need not be equispaced; however, they need to be consistent across all input function evaluations. Instead of blindly concatenating the input data (input functions $[x(x_1), x(x_2), \ldots, x(x_m)]^T$ and locations $\zeta$) as one input, i.e., $[x(x_1), x(x_2), \ldots, x(x_m), \zeta]^T$, DeepONet employs two subnetworks and treats the two inputs equally. Thus, DeepONet can be applied to high-dimensional problems, where the dimensions of $x(x_i)$ and $\zeta$ no longer match, since the latter is a vector of $d$ components in total. A trunk network $f(\cdot)$ takes as input $\zeta$ and outputs $[tr_1, tr_2, \ldots, tr_p]^T \in \mathbb{R}^p$, while a second network, the branch net $g(\cdot)$, takes as input $[x(x_1), x(x_2), \ldots, x(x_m)]^T$ and outputs $[b_1, b_2, \ldots, b_p]^T \in \mathbb{R}^p$. Both
subnetwork outputs are merged through a dot product to generate the quantity of interest. A bias $b_0 \in \mathbb{R}$ is added in the last stage to increase expressivity, i.e., $\mathcal{G}(x)(\zeta) \approx \sum_{k=1}^{p} b_k t_k + b_0$. The generalized universal approximation theorem for
operators, inspired by the original theorem introduced by Chen and Chen (1995), is presented below. The generalized
theorem essentially replaces shallow networks used for the branch and trunk net in the original work with deep neural
networks to gain expressivity.
Theorem 1 (Generalized Universal Approximation Theorem for Operators.) Suppose that X is a Banach space, K1 ⊂ X,
K2 ⊂ Rd are two compact sets in X and Rd , respectively, V is a compact set in C(K1 ). Assume that: G : V → C(K2 ) is a
nonlinear continuous operator. Then, for any $\epsilon > 0$, there exist positive integers $m, p$, continuous vector functions $g : \mathbb{R}^m \rightarrow \mathbb{R}^p$, $f : \mathbb{R}^d \rightarrow \mathbb{R}^p$, and $x_1, x_2, \ldots, x_m \in K_1$ such that
$$\Big| \mathcal{G}(x)(\zeta) - \big\langle \underbrace{g(x(x_1), x(x_2), \ldots, x(x_m))}_{\text{branch}},\ \underbrace{f(\zeta)}_{\text{trunk}} \big\rangle \Big| < \epsilon$$
holds for all $x \in V$ and $\zeta \in K_2$, where $\langle \cdot, \cdot \rangle$ denotes the dot product in $\mathbb{R}^p$. For the two functions $g, f$, classical deep neural network
models and architectures can be chosen that satisfy the universal approximation theorem of functions, such as fully-connected
networks or convolutional neural networks.
The interested reader can find more information and details regarding the proof of the theorem in Lu et al. (2021).
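A minimal sketch of this branch-trunk construction with deep fully-connected networks is shown below, assuming PyTorch; the number of sensors m, the latent width p, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """G(x)(zeta) ~ <g(x(x_1), ..., x(x_m)), f(zeta)> + b_0, with deep networks for
    the branch g and the trunk f, as in the generalized theorem above."""
    def __init__(self, m, dim_zeta, p=50, width=128):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m, width), nn.ReLU(),
                                    nn.Linear(width, p))
        self.trunk = nn.Sequential(nn.Linear(dim_zeta, width), nn.ReLU(),
                                   nn.Linear(width, p), nn.ReLU())
        self.b0 = nn.Parameter(torch.zeros(1))

    def forward(self, x_sensors, zeta):
        # x_sensors: (B, m) input functions sampled at the fixed sensors x_1, ..., x_m
        # zeta:      (Q, dim_zeta) query coordinates of the output function
        b = self.branch(x_sensors)                   # (B, p)
        t = self.trunk(zeta)                         # (Q, p)
        return b @ t.T + self.b0                     # (B, Q) predictions G(x_i)(zeta_j)

# Example: 100 sensors and 2D query coordinates.
model = DeepONet(m=100, dim_zeta=2)
out = model(torch.rand(8, 100), torch.rand(64, 2))   # shape (8, 64)
```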
Fourier neural operator (FNO)
In the FNO [Li et al. (2020a)], the input is first lifted to a higher-dimensional representation and then updated iteratively through a sequence of layers of the form
$$v_{t+1}(x) = \sigma\Big( W v_t(x) + \big(\mathcal{K}(x; \phi)\, v_t\big)(x) \Big), \quad \forall x \in \Omega, \tag{18}$$
where $\mathcal{K} : \mathcal{X} \times \Theta_{\mathcal{K}} \rightarrow \mathcal{H}\big(\mathcal{Y}(\Omega; \mathbb{R}^{d_v}), \mathcal{Y}(\Omega; \mathbb{R}^{d_v})\big)$ maps to bounded linear operators on $\mathcal{Y}(\Omega; \mathbb{R}^{d_v})$ and is parameterized by $\phi \in \Theta_{\mathcal{K}}$, $W : \mathbb{R}^{d_v} \rightarrow \mathbb{R}^{d_v}$ is a linear transformation, and $\sigma : \mathbb{R} \rightarrow \mathbb{R}$ is an activation function to introduce non-linearity. The kernel integral operator $\mathcal{K}(x; \phi)$ is defined as
$$\big(\mathcal{K}(x; \phi)\, v_t\big)(x) := \int_\Omega \kappa\big(x, y, x(x), x(y); \phi\big)\, v_t(y)\, dy, \quad \forall x \in \Omega, \tag{19}$$
where $\kappa_\phi : \mathbb{R}^{2(d + d_x)} \rightarrow \mathbb{R}^{d_v \times d_v}$ is approximated by a neural network parameterized by $\phi \in \Theta_{\mathcal{K}}$. In FNO, the kernel integral operator in Eq. (19) is replaced with a convolution operator defined in Fourier space. The dependence on the input function $x$ is removed by imposing $\kappa_\phi(x, y) = \kappa_\phi(x - y)$, and thus the operator in Eq. (19) results in
$$\big(\mathcal{K}(x; \phi)\, v_t\big)(x) = \mathcal{F}^{-1}\big(\mathcal{F}(\kappa_\phi) \cdot \mathcal{F}(v_t)\big)(x), \quad \forall x \in \Omega, \tag{20}$$
where $\mathcal{F}, \mathcal{F}^{-1}$ denote the forward and inverse Fourier transforms of a function $f : \Omega \rightarrow \mathbb{R}^{d_v}$, defined as
$$(\mathcal{F}f)_j(k) = \int_\Omega f_j(x)\, e^{-2 i \pi \langle x, k \rangle}\, dx, \qquad (\mathcal{F}^{-1} f)_j(x) = \int_\Omega f_j(k)\, e^{2 i \pi \langle x, k \rangle}\, dk, \tag{21}$$
where $k \in \Omega$ represents the frequency modes and $j = 1, \ldots, d_v$, with $i = \sqrt{-1}$ the imaginary unit. For implementation purposes, a finite-dimensional parameterization is chosen by truncating the Fourier expansion to a maximal number of modes $k_{\max} = |Z_{k_{\max}}| = |\{k \in \mathbb{Z}^d : |k_j| \le k_{\max, j},\ \text{for } j = 1, \ldots, d\}|$. The low-frequency modes are chosen by defining an upper bound on the $\ell_1$-norm of $k \in \mathbb{Z}^d$.
The complete FNO algorithm is employed as follows. An input $x \in \mathcal{X}$ is first lifted to a higher-dimensional representation $v_0(x) = P(x(x))$, parameterized by a shallow FNN. Subsequently, a number of update iterations $v_t \mapsto v_{t+1}$ are applied through a series of Fourier layers. At each Fourier layer, and given that $\Omega$ is discretized with $m \in \mathbb{N}$ points, we have that $v_t \in \mathbb{R}^{m \times d_v}$ and $\mathcal{F}(v_t) \in \mathbb{C}^{m \times d_v}$, which results in $\mathcal{F}(v_t) \in \mathbb{C}^{k_{\max} \times d_v}$ after truncation of the higher-order modes. In practice, it has been shown that $k_{\max, j} = 12$ performs satisfactorily for most applications. Next, the output is multiplied by a weight tensor $R \in \mathbb{C}^{k_{\max} \times d_v \times d_v}$. For a uniform discretization, $\mathcal{F}$ is replaced with the fast Fourier transform (FFT), which greatly reduces the algorithmic complexity from $O(m^2)$ to $O(m \log m)$. After the inverse Fourier transform, the output is added to a linear transformation of the layer input, i.e., $W v_t(x)$, and finally the result is passed
through a non-linear activation function σ(·). After a series of T Fourier layers, the PDE output y(x) = Q(vT (x)) is
computed via the transformation of vT with Q : Rdv → Rdy .
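For illustration, a minimal single Fourier layer in one spatial dimension is sketched below, assuming PyTorch; the full FNO of Li et al. (2020a) stacks several such layers between the lifting network P and the projection network Q, and the tensor shapes and initialization used here are simplified.

```python
import torch
import torch.nn as nn

class FourierLayer1D(nn.Module):
    """One Fourier layer: v_{t+1}(x) = sigma( W v_t(x) + IFFT( R . FFT(v_t) ) ),
    retaining only the k_max lowest Fourier modes (k_max,j = 12 in the text)."""
    def __init__(self, channels, k_max=12):
        super().__init__()
        self.k_max = k_max
        # Complex weight tensor R acting on the retained modes.
        self.R = nn.Parameter(0.02 * torch.randn(k_max, channels, channels,
                                                 dtype=torch.cfloat))
        self.W = nn.Linear(channels, channels)       # pointwise linear transform W

    def forward(self, v):
        # v: (batch, m, channels), values of v_t on an equispaced grid of m points.
        v_hat = torch.fft.rfft(v, dim=1)             # forward FFT along the grid
        out_hat = torch.zeros_like(v_hat)
        out_hat[:, :self.k_max] = torch.einsum('bkc,kcd->bkd',
                                               v_hat[:, :self.k_max], self.R)
        conv = torch.fft.irfft(out_hat, n=v.shape[1], dim=1)   # back to physical space
        return torch.relu(self.W(v) + conv)          # add W v_t and apply sigma

# Example: a batch of 4 functions sampled at 128 grid points with 32 channels.
layer = FourierLayer1D(channels=32)
v_next = layer(torch.rand(4, 128, 32))               # shape (4, 128, 32)
```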
In the original work, two main FNO models are proposed: the FNO-2D and FNO-3D. In FNO-3D, 3-D convolutions are
performed (in space and time) and the model maps 3D functions representing the initial time steps to 3D functions rep-
resenting the full trajectory. It has been shown that FNO-3D is more expressive and leads to better accuracy given sufficient data. However, it is fixed to the training interval, so once trained, it can only predict the solution in this range, though for any time discretization. On the other hand, FNO-2D performs 2-D convolutions together with a recurrent architecture
to propagate in time. While the advantage of this approach is that the model can predict the solution for any number of
time steps (and for fixed time interval ∆t), it has been shown that it is less expressive and more challenging to train. For
more information, the interested reader is referred to Li et al. (2020a).
Data generation
Brittle fracture mechanics
In this application, we consider a continuum fracture modeling method (the second-order phase field model), to ap-
proximate the growth of fracture on a unit square plate, which is fixed on the bottom and the left edge, subjected to
displacement controlled shear loading conditions on the top edge [Goswami (2021)]. We specifically aim to approximate
the mapping G : H(x, t = 0; lc, yc) ↦ φ(x, t). We consider the material parameters λ = 121.15 kN/mm², µ = 80.77 kN/mm², and Gc = 2.7 × 10⁻³ kN/mm, where λ and µ are Lamé’s constants. The computation is performed by applying constant displacement increments of ∆u = 1 × 10⁻⁴ mm to effectively capture the crack propagation. For all simulations,
l0 is considered to be 0.0125 mm.
Initial cracks are modeled by using the local strain-history function, H(x, t). The initial strain-history function,
H(x, t = 0) is defined as a function of the closest distance of any point, x, on the domain to the line, l, which repre-
sents the discrete crack [Goswami (2021)]. In particular, it is set as:
$$H(x, t = 0; l_c, y_c) = \begin{cases} \dfrac{B G_c}{2 l_0}\left(1 - \dfrac{2 d(x, l)}{l_0}\right) & d(x, l) \le \dfrac{l_0}{2}, \\[4pt] 0 & d(x, l) > \dfrac{l_0}{2}, \end{cases} \tag{22}$$
where B is a scalar parameter that controls the magnitude of the scalar history field, considered for this experiment as B = 10³ based on domain knowledge. The function d(x, l) computes the distance of a point x from the middle horizontal line of the crack (defined by the two parameters lc, yc) and sets the appropriate value for the initial strain functional. The simulation takes place in a rectangular domain Ω = [0, 1] × [0, 1], discretized with ms × ms = 162 × 162 mesh points. The quasi-static problem is solved and in total mt = 8 snapshots of the phase field φ(x) are considered. Thus, the dimensionality of input and output realizations is Dx = 26,244 and Dy = 209,952, respectively. In total, we generate N = 261 data and split them into Ntrain = 230 and Ntest = 31 for training and testing, respectively. Figure S1 depicts the simulation box with the
associated varying parameters as well as a representative realization of the model with the propagation of an initial crack
through the phase field quantity in three points in time. The training datasets are generated using the code developed in
Goswami et al. (2020), which is available on https://github.com/somdattagoswami/IGAPack-PhaseField.
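A sketch of how one random realization of the initial strain-history field of Eq. (22) could be evaluated on this grid is given below (NumPy); it assumes that the initial crack is the horizontal segment of length lc at height yc starting from the left edge, which is our reading of d(x, l) and of the schematic in Figure S1, and all names are illustrative.

```python
import numpy as np

def initial_strain_history(yc, lc, n=162, B=1e3, Gc=2.7e-3, l0=0.0125):
    """Evaluate H(x, t=0; lc, yc) of Eq. (22) on an n x n grid of the unit square.
    Assumption: the initial crack is the segment {(x, yc): 0 <= x <= lc} starting
    from the left edge; d is the distance of a grid point to that segment."""
    s = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(s, s)
    dx = np.maximum(X - lc, 0.0)                 # horizontal distance beyond the crack tip
    d = np.sqrt(dx ** 2 + (Y - yc) ** 2)         # distance to the crack segment

    H = np.zeros_like(d)
    inside = d <= l0 / 2.0
    H[inside] = (B * Gc / (2.0 * l0)) * (1.0 - 2.0 * d[inside] / l0)
    return H

# One random realization with yc ~ U[0.3, 0.7] and lc ~ U[0.4, 0.6].
rng = np.random.default_rng(0)
H0 = initial_strain_history(yc=rng.uniform(0.3, 0.7), lc=rng.uniform(0.4, 0.6))
```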
Figure S1 (a) Schematic of the simulation box considered in generating the labeled dataset for brittle fracture under shear loading, depicting the two random parameters, namely the length of the crack (lc) and the height of the crack (yc), and (b) the resulting phase field φ(x) from the solution of the PDE model, showing the evolution of the crack through three time steps t = {0.25, 0.625, 1}.
Figure S2 (a) Schematic of the Rayleigh-Bénard convective flow in a thin fluid layer due to temperature gradient ∆T with the
creation of convective cells at the top and (b)-(g) the evolution of the temperature field T (x, t) for a random realization of the
initial temperature field for six time steps t = {0.05, 0.325, 0.525, 0.775, 0.9, 1.0} based on the numerical solution of the PDE.
Shallow-water equations
In this problem, we aim to approximate the operator between the random Gaussian perturbation h′ and the time-evolved velocity component u, i.e., G : h′(λ, φ, t = 0) ↦ u(φ, λ, t). The constants are defined as: Ξ = 7.292 × 10⁻⁵ s⁻¹ is the Earth’s angular velocity, g = 9.80616 m s⁻² the gravitational acceleration, ν = 1.0 × 10⁵ m² s⁻¹ the diffusion coefficient, umax = 80 m s⁻¹, φ0 = π/7, and φ1 = π/2 − φ0; thus, the mid-point of the jet where the maximum velocity is applied is at φ = π/4. The initial velocity u is defined so that it is zero outside the zone of interest, with no discontinuities at the
northern and southern poles. The parameters of the Gaussian perturbation which is added to the height field are set as:
φ2 = π/4, ĥ = 120 m, while α, β are random parameters. In this expression, the Gaussian functions are multiplied with
a cosine so that the forced perturbation is zero at the two poles.
While the initial condition of the velocity field is given analytically (see Main Text), the height field is obtained by
numerically integrating the balance equation
$$g h(\phi) = g h_0 - \int_{-\pi/2}^{\phi} \alpha\, u(\phi') \left[ f + \frac{\tan(\phi')}{\alpha}\, u(\phi') \right] d\phi', \tag{23}$$
where α = 6.37122 × 10⁶ m is the radius of the Earth and h₀ is set so that the mean layer depth around the sphere is equal
to 10 km. The above integral can be calculated using a numerical scheme such as Gaussian quadrature. The Gaussian perturbation h′(λ, φ, t = 0) is added to the initial height field computed by the expression above to form the final initial condition h(λ, φ, t = 0).
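The following sketch (NumPy/SciPy, illustrative) assembles one realization of the initial condition: the zonal jet of Eq. (8), the height field in balance with it via quadrature of Eq. (23), and the localized Gaussian perturbation in the form described above; the constant h₀ is simply set to 10 km here rather than tuned to the global-mean depth, and the latitude grid is restricted to [−π/2, π/2].

```python
import numpy as np
from scipy.integrate import quad

Omega_E, g, a = 7.292e-5, 9.80616, 6.37122e6          # angular velocity, gravity, Earth radius
u_max, phi_0 = 80.0, np.pi / 7
phi_1 = np.pi / 2 - phi_0
en = np.exp(-4.0 / (phi_1 - phi_0) ** 2)               # normalization so that u(pi/4) = u_max

def u_init(phi):
    """Zonal jet of Eq. (8)."""
    if phi <= phi_0 or phi >= phi_1:
        return 0.0
    return (u_max / en) * np.exp(1.0 / ((phi - phi_0) * (phi - phi_1)))

def h_balanced(phi, h0=1.0e4):
    """Height field in gradient-wind balance with the jet (Eq. 23), via quadrature."""
    def integrand(p):
        f = 2.0 * Omega_E * np.sin(p)                   # Coriolis parameter
        return a * u_init(p) * (f + np.tan(p) / a * u_init(p))
    integral, _ = quad(integrand, -np.pi / 2, phi, limit=200)
    return h0 - integral / g

def h_perturbation(lam, phi, alpha, beta, h_hat=120.0, phi_2=np.pi / 4):
    """Localized Gaussian perturbation with random shape parameters alpha, beta."""
    return h_hat * np.cos(phi) * np.exp(-(lam / alpha) ** 2) \
                 * np.exp(-((phi_2 - phi) / beta) ** 2)

# One random initial condition h(lam, phi, t=0) on a coarse grid.
rng = np.random.default_rng(0)
alpha, beta = rng.uniform(1 / 9, 0.5), rng.uniform(1 / 30, 0.2)
lam_g = np.linspace(-np.pi, np.pi, 64)
phi_g = np.linspace(-np.pi / 2, np.pi / 2, 32)
h_init = np.array([[h_balanced(p) + h_perturbation(l, p, alpha, beta)
                    for l in lam_g] for p in phi_g])
```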
The simulation takes place in a spherical domain Ω = [−π, π] × [−π, π], discretized with ms × ms = 256 × 256 mesh points in the longitudinal and latitudinal directions, respectively. The PDE is solved in the time interval t = [0, 360 h] with δt = 1.6̄ × 10⁻¹ h, and in total mt = 72 time steps (equispaced) are considered. For the presentation of results, the time range is mapped to the dimensionless range t = [0, 1]. Thus, the dimensionality of input and output realizations is Dx = 65,536 and Dy = 4,587,520, respectively. The significantly high dimensionality of the outputs makes this problem particularly challenging. In total, we generate N = 300 data and split them into Ntrain = 260 and Ntest = 40 for training and testing, respectively. The evolution of the velocity field u for a random realization of the initial height field is shown in Figure S3
for six points in time. Datasets were generated using the Dedalus Project that can be found in https://github.com/
DedalusProject/dedalus.
Figure S3 Evolution of the velocity field u(λ, φ) on a sphere (Earth) as a solution of the spherical shallow-water equations, for
a random realization of the initial perturbation to the height field, i.e., α = 0.38, β = 0.20. The velocity field is shown for six time
steps t = {0, 0.32, 0.4, 0.6, 0.8, 1.0}.
Table S2 Architecture of multi-layer autoencoders (MLAE). Parameter d represents the dimensionality of the latent space. All
layers use the ReLU activation function except the last one which uses the sigmoid function.
Application MLAE
Brittle material fracture [128, 64, d, 64, 128]
Rayleigh-Bénard fluid flow [400, 256, 169, d, 169, 256, 400]
Shallow water equation [256, 169, 121, d, 121, 169, 256]
Tables S2 and S3 show the architectures of the autoencoders and the neural operators. For all trained multi-layer autoencoders, the depth and width are chosen based on the dimensionality of the original data. For the neural operators, a standard architecture is chosen, which resulted in good performance for all applications. Finally, for training both FNO-2D and FNO-3D, the code from the original implementation was used, which can be found at https://github.com/zongyi-li/fourier_neural_operator.
Table S3 Architecture of DeepONet. Inputs to the Conv2D layers consist of the number of output filters, kernel size, and activation function, respectively. Parameter p has been set equal to 5.
Supplementary results
Error plots
In Figures S4,S5,S6, the error plots corresponding to the three applications for all studied models are presented for a
single random realization. The error fields represent the point-wise absolute error between the reference response and
model prediction. As shown and discussed in the main paper, L-DeepONet results in the smallest interpolation error
across diverse applications.
Figure S4 Brittle fracture in a plate loaded in shear: absolute error plots of all the neural operators for the results of the repre-
sentative sample with yc = 0.55 and lc = 0.6 shown in Fig. 3. The neural operator is trained to approximate the growth of the
crack for five time steps from a given initial location of the defect on a unit square domain.
Figure S6 Shallow water equations: absolute error plots of the predictions of the velocity field from a given initial perturbation to the height field, as obtained for all the neural operators. The predicted solution is shown in Fig. 5.
S7 c). To summarize, we found that the autoencoder-based L-DeepONet results in a better overall performance (especially for low d), with an accuracy that is either comparable or superior to the PCA-based L-DeepONet. However, in certain problems PCA can perform as well as the AE, with the additional advantage of being much less computationally expensive.
Figure S7 Results for all applications of principal component analysis (PCA) (left plots) for different values of the latent dimen-
sionality and neural operators (right plot) for all studied models. Violin plots represent 5 independent trainings of the models
using different random seed numbers.