Learning in Latent Spaces Improves The Predictive Accuracy of Deep Neural Operators
1 Introduction
Achieving universal function approximation is one of the most important tasks in the rapidly growing field of machine
learning (ML). To this end, deep neural networks (DNNs) have been actively developed, enhanced and used for a plethora
of versatile applications in science and engineering including image processing, natural language processing (NLP), rec-
ommendation systems, and design optimization [Guo et al. (2016); Pak and Kim (2017); Brown et al. (2020); Otter et al.
(2020); Khan et al. (2021); Kollmann et al. (2020)]. In the emerging field of scientific machine learning (SciML), DNNs
are a ubiquitous tool for analyzing, solving, and optimizing complex physical systems modeled with partial differential
equations (PDEs) across a range of scenarios, including different initial and boundary conditions (ICs, BCs), model pa-
rameters and geometric domains. Such models are trained from a finite dataset of labeled observations generated from
a (generally expensive) traditional numerical solver (e.g., the finite difference method (FD), the finite element method (FEM), or computational fluid dynamics (CFD)), and once trained they allow for accurate predictions with real-time inference [Berg and
Nyström (2019); Chen et al. (2019); Raissi et al. (2019); Abdar et al. (2021)].
DNNs are conventionally used to learn functions by approximating mappings between finite dimensional vector
spaces. Operator regression, a more recently proposed ML paradigm, focuses on learning operators by approximat-
ing mappings between abstract infinite-dimensional Banach spaces. Neural operators specifically, first introduced in
2019 with the deep operator network (DeepONet) [Lu et al. (2021)], employ DNNs to learn PDE operators and provide a
discretization-invariant emulator, which allows for fast inference and high generalization accuracy. Motivated by the
universal approximation theorem for operators proposed by Chen & Chen [Chen and Chen (1995)], DeepONet encapsu-
lates and extends the theorem for deep neural networks [Lu et al. (2021)]. The architecture of DeepONet features a DNN,
which encodes the input functions at fixed sensor points (branch net) and another DNN, which encodes the informa-
tion related to the spatio-temporal coordinates of the output function (trunk net). Since its first appearance, standard
DeepONet has been employed to tackle challenging problems involving complex high-dimensional dynamical systems
[Di Leoni et al. (2021); Kontolati et al. (2023); Goswami et al. (2022d); Oommen et al. (2022); Cao et al. (2023b)]. In ad-
dition, extensions of DeepONet have been recently proposed in the context of multi-fidelity learning [De et al. (2022);
Lu et al. (2022b); Howard et al. (2022)], integration of multiple-input continuous operators [Jin et al. (2022); Goswami
∗ Corresponding author. Email: michael.shields@jhu.edu
et al. (2022c)], hybrid transferable numerical solvers [Zhang et al. (2022a)], transfer learning [Goswami et al. (2022b)],
and physics-informed learning to satisfy the underlying PDE [Wang et al. (2021); Goswami et al. (2022a)].
Another class of neural operators is the integral operators, first instantiated with the graph kernel networks (GKN)
introduced by Li et al. (2020b). In GKNs, the solution operator is expressed as an integral operator of Green’s function
which is modeled with a neural net and consists of a lifting layer, iterative kernel integration layers, and a projection
layer. GKNs were found to be unstable for multiple layers and a new graph neural operator was developed in D’Elia et al.
(2022) based on a discrete non-local diffusion-reaction equation. Furthermore, to alleviate the inefficiency and cost of
evaluating integral operators, the Fourier neural operator (FNO) [Li et al. (2020a)] was proposed, in which the integral
kernel is parameterized directly in the Fourier space. The input to the network, like in GKNs, is elevated to a higher
dimension, then passed through numerous Fourier layers before being projected back to the original dimension. Each
Fourier layer involves a forward fast Fourier transform (FFT), followed by a linear transformation of the retained low-frequency Fourier modes and then an inverse FFT. Finally, the output is added to a linear transformation of the layer input, and the sum is passed through an activation function to introduce nonlinearity. Different variants of FNO have been proposed, such as FNO-2D, which
performs 2D Fourier convolutions and uses a recurrent structure to propagate the PDE solution in time, and the FNO-3D,
which performs 3D Fourier convolutions through space and time. Compared to DeepONet, FNO employs evaluations
restricted to an equispaced mesh to discretize both the input and output spaces, where the mesh and the domain must
be the same. The interested reader is referred to Lu et al. (2022a) for a comprehensive comparison between DeepONet
and FNO across a range of complex applications. Recent advancements in neural operator research have yielded promis-
ing results for addressing the bottlenecks of FNO. Two such integral operators are the Wavelet Neural Operator (WNO) [Tripura and Chakraborty (2023)] and the Laplace Neural Operator (LNO) [Cao et al. (2023a)], which have been proposed as alternative solutions for capturing the spatial behavior of a signal and for accurately approximating transient responses, respectively.
Figure 1 Latent DeepONet (L-DeepONet) framework for learning deep neural operators on latent spaces. In the first step, a multi-layer autoencoder is trained using a combined dataset of the high-dimensional input and output realizations of a PDE model, $\{x_i, y_i\}_{i=1}^{N}$. The trained encoder projects the data onto a latent space $\mathbb{R}^d$, and the dataset on the latent space, $\{x_i^r, y_i^r\}_{i=1}^{N}$, is then used to train a DeepONet model and learn the operator $\mathcal{G}_\theta$, where $\theta$ denotes the trainable parameters of the network. Finally, to evaluate the performance of the model on the original PDE outputs and perform inference, the pre-trained decoder is employed to map predicted samples back to the physically interpretable space.
Despite the impressive capabilities of the aforementioned methods to learn mesh-invariant surrogates for complex
PDEs, these models are primarily used in a data-driven manner, and thus a representative and sufficient labeled dataset
needs to be acquired a-priori. Often, complex physical systems require high-fidelity simulations defined on fine spatial
and temporal grids, which results in very high-dimensional datasets. Furthermore, the high (and often prohibitive)
expense of traditional numerical simulators (e.g., FEM) allows for the generation of only a few hundred (and possibly even fewer) observations. The combination of few and very high-dimensional observations can result in sparse datasets that often do not adequately represent the input/output distribution space. In addition, raw high-dimensional physics-based
data often consists of redundant features that can (often significantly) delay and hinder network optimization. Physical
constraints cause the data to live on lower-dimensional latent spaces (manifolds) that can be identified with suitable
linear or nonlinear dimension reduction (DR) techniques. Previous studies have shown how latent representations can be
leveraged to enable surrogate modeling and uncertainty quantification (UQ) by addressing the ‘curse of dimensionality’
in high-dimensional PDEs with traditional approaches such as Gaussian processes (GPs) and polynomial chaos expansion
(PCE) [Lataniotis et al. (2020); Nikolopoulos et al. (2022); Giovanis and Shields (2020); Kontolati et al. (2022a,b)]. Although
neural network-based models can naturally handle high-dimensional input and output datasets, it is not clear how their
predictive accuracy, generalizability, and robustness to noise are affected when these models are trained with suitable
latent representations of the high-dimensional data.
In this work, we aim to investigate the aforementioned open questions by exploring the training of DeepONet on
latent spaces for high-dimensional time-dependent PDEs of varying degrees of complexity. The idea of training neural
operators on latent spaces using DeepONet and autoencoders (AE) was originally proposed in Oommen et al. (2022). In that work, the growth of a two-phase microstructure for particle vapor deposition was modeled using the Cahn-Hilliard
equation. In another recent work [Zhang et al. (2022b)], the authors explored neural operators in conjunction with AE to
tackle high-dimensional stochastic problems. But the general questions of the predictive accuracy and generalizability
of DeepONets trained on latent spaces remain and require systematic investigation with comparisons to conventional
neural operators.
The training of neural operators on latent spaces consists of a two-step approach: first, training a suitable AE model to
identify a latent representation for the high-dimensional PDE inputs and outputs, and second, training a DeepONet model
and employing the pre-trained AE decoder to project samples back to the physically interpretable high-dimensional
space (see Figure 1). The L-DeepONet framework has two advantages: first, the accuracy of DeepONet is improved, and
second, the L-DeepONet training is accelerated due to the low dimensionality of the data in the latent space. Combined
with the pre-trained AE model, L-DeepONet can perform accurate predictions with real-time inference and learn the
solution operator of complex time-dependent PDEs in low-dimensional space. The contributions of this work can be
summarized as follows:
• We investigate the performance of L-DeepONet, an extension of standard DeepONet, for high-dimensional time-
dependent PDEs that leverages latent representations of input and output functions identified by suitable autoen-
coders (see Figure 1).
• We perform direct comparisons with vanilla DeepONet for complex physical systems, including brittle fracture of
materials, and complex convective and atmospheric flows, and demonstrate that L-DeepONet consistently outper-
forms the standard approach in terms of accuracy and computational time.
• We perform direct comparisons with another neural operator model, the Fourier neural operator (FNO), and two
of its variants, i.e., FNO-2D and FNO-3D, and identify advantages and limitations for a diverse set of applications.
2 Results
To demonstrate the advantages and efficiency of L-DeepONet, we learn the operator for three diverse PDE models of
increasing complexity and dimensionality. First, we consider a PDE that describes the growth of fracture in brittle
materials, which are widely used in various industries including construction and manufacturing. Accurately predicting the growth of fractures in these materials is important for preventing failures and improving safety, reliability, and cost-effectiveness in a wide range of applications. Second, we consider a PDE describing convective fluid flow, a com-
mon phenomenon in many natural and industrial processes. Understanding how these flows evolve may allow engineers
to better design systems such as heat exchangers or cooling systems to enhance efficiency and reduce energy consump-
tion. Finally, we consider a PDE describing large-scale atmospheric flows which can be used to predict patterns that
occur in weather systems. Such flows play a crucial role in the Earth’s climate system, influencing precipitation and temperature, which in turn have a significant impact on water resources, agricultural productivity, and energy production.
Developing an accurate surrogate to predict such complex atmospheric patterns in detail may allow us to better adapt
to changes in the climate system and develop effective strategies to mitigate the impacts of climate change. For all PDEs,
the input functions for the operator represent initial conditions modeled as Gaussian or non-Gaussian random fields.
We perform direct comparisons of L-DeepONet with the standard DeepONet model trained on the full dimensional data
and with FNO. More details about the models and the corresponding data generation process are provided in the Sup-
plementary Materials to assist the readers in readily reproducing the results presented below.
Figure 2 Left: Results for all applications of the multi-layer autoencoders (MLAE) for different values of the latent dimensionality.
Right: Results for all applications of the neural operators for all studied models. Violin plots represent 5 independent trainings of the models using different random seed numbers.
Brittle fracture mechanics
Modeling fracture using the phase field method involves the integration of two fields, namely the vector-valued elastic field, u(x), and the scalar-valued phase field, φ(x) ∈ [0, 1], with 0 representing the undamaged state of the material and 1 a fully damaged state.
The equilibrium equation for the elastic field for an isotropic model, considering the evolution of crack, can be written
as [Goswami et al. (2019)]:
−∇ · g(φ)σ = f on Ω, (1)
where σ is the Cauchy stress tensor, f is the body force and g(φ) = (1 − φ)2 represents the monotonically decreas-
ing stress-degradation function that reduces the stiffness of the bulk material in the fracture zone. The elastic field is
constrained by Dirichlet and Neumann boundary conditions:
$$g(\phi)\,\sigma \cdot n = t_N \ \text{on } \partial\Omega_N, \qquad u = \bar{u} \ \text{on } \partial\Omega_D, \tag{2}$$
where $t_N$ denotes the prescribed boundary forces and $\bar{u}$ the prescribed displacement for each load step. The Dirichlet and Neumann boundaries are represented by $\partial\Omega_D$ and $\partial\Omega_N$, respectively. Considering the second-order phase field for a
quasi-static setup, the governing equation can be written as:
$$\frac{G_c}{l_0}\,\phi - G_c\, l_0 \nabla^2 \phi = -g'(\phi)\, H(x, t; l_c, y_c) \ \text{on } \Omega, \tag{3}$$
where Gc is a scalar parameter representing the critical energy release rate of the material, l0 is the length scale pa-
rameter, which controls the diffusion of the crack, H(x, t) is a local strain-history functional, and yc , lc represent the
position and length of the crack respectively. For sharp crack topology, l0 → 0 [Bourdin et al. (2008)]. H(x, t) contains
the maximum positive tensile energy ($\Psi_0^+$) in the history of deformation of the system. The strain-history functional is
employed to initialize the crack on the domain as well as to impose irreversibility conditions on the crack growth [Miehe
et al. (2010)]. In this problem, we consider yc , lc to be random variables with yc ∼ U [0.3, 0.7] and lc ∼ U [0.4, 0.6], thus,
the initial strain function H(x, t = 0; lc , yc ) is also random (see the Supplementary Materials for more details). We aim
to learn the solution operator G : H(x, t = 0; lc, yc) ↦ φ(x, t), which maps the initial strain-history function to the crack
evolution.
In Figure 2(a), we show the mean-square error (MSE) between the studied models and ground truth. The left panel
shows the MSE for the multi-layer autoencoder (MLAE) for different latent dimensions (d), where the violin plot shows
the distribution of MSE from n = 5 independent trials. The right panel shows the resulting MSE for L-DeepONet oper-
ating on different latent dimensions (d) compared with the full high-dimensional DeepONet, FNO-2D, and FNO-3D. We
observe that, regardless of the latent dimension, the L-DeepONet outperforms the standard DeepONet (Full DON) and
performs comparably with FNO-2D and FNO-3D. In Figure 3, a comparison between all models for a random represen-
tative result is shown. While L-DeepONet results in prediction fields almost identical to the reference, the predictions
of the standard models deviate from the ground truth both inside and around the propagated crack. Finally, the cost of
training the different models is presented in Table 1. Because the required network complexity is significantly reduced,
the L-DeepONet is 1 − 2 orders of magnitude cheaper to train than the standard approaches.
Figure 3 Brittle fracture in a plate loaded in shear: results of a representative sample with yc = 0.55 and lc = 0.6 for all
neural operators. The results of the L-DeepONet model consider the latent dimension, d = 64. The neural operator is trained to
approximate the growth of the crack for five time steps from a given initial location of the defect.
Rayleigh–Bénard convection
The onset of convection in a fluid layer heated from below and subjected to a temperature difference ΔT is governed by the Rayleigh number,
$$\mathrm{Ra} = \frac{\alpha\, \Delta T\, g\, h^3}{\nu \kappa}, \tag{4}$$
where α is the thermal expansion coefficient, g is the gravitational acceleration, h is the thickness of the fluid layer, ν is the kinematic viscosity, and κ is the thermal diffusivity. When ΔT is small, the convective flow does not occur due
to stabilizing effects of viscous friction. Based on the governing conservation laws for an incompressible fluid (mass,
momentum, energy) and the Boussinesq approximation according to which density perturbations affect only the gravi-
tational force, the dimensional form of the Rayleigh-Bénard equations for a fluid defined on a domain Ω reads:
$$\frac{Du}{Dt} = -\frac{1}{\rho_0}\nabla p + \frac{\rho}{\rho_0}\, g + \nu \nabla^2 u, \quad x \in \Omega,\ t > 0,$$
$$\frac{DT}{Dt} = \kappa \nabla^2 T, \quad x \in \Omega,\ t > 0, \tag{5}$$
$$\nabla \cdot u = 0,$$
$$\rho = \rho_0 \left(1 - \alpha (T - T_0)\right),$$
where D/Dt denotes material derivative, u, p, T are the fluid velocity, pressure and temperature respectively, T0 is the
temperature at the lower plate, and x = (x, y) are the spatial coordinates. Considering two plates (upper and lower) the
corresponding BCs and ICs are defined as
$$T(x, t)|_{y=0} = T_0, \quad x \in \Omega,\ t > 0,$$
$$T(x, t)|_{y=h} = T_1, \quad x \in \Omega,\ t > 0,$$
$$u(x, t)|_{y=0} = u(x, t)|_{y=h} = 0, \quad x \in \Omega,\ t > 0, \tag{6}$$
$$T(y, t)|_{t=0} = T_0 + \frac{y}{h}(T_1 - T_0) + 0.1\, v(x), \quad x \in \Omega,$$
$$u(x, t)|_{t=0} = 0, \quad x \in \Omega,$$
where T0 and T1 are the fixed temperatures of the lower and upper plates, respectively. For a 2D rectangular domain and through a non-dimensionalization of the above equations, the fixed temperatures become T0 = 0 and T1 = 1. The IC of the temperature field is modeled as a linear profile with the addition of a GRF v(x) having correlation length scales ℓx = 0.45, ℓy = 0.4, simulated using a Karhunen–Loève expansion. The objective is to approximate the operator
G : T(x, t = 0) ↦ T(x, t) (see the Supplementary Materials for more details).
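For concreteness, the following is a minimal sketch of how one realization of such an initial temperature field could be assembled, assuming a squared-exponential covariance for the GRF and a simple collocation-based Karhunen–Loève expansion; the grid resolution, function names, and truncation level are illustrative and do not reproduce the exact data-generation pipeline described in the Supplementary Materials.

```python
import numpy as np

def grf_kle_2d(nx=64, ny=64, lx=0.45, ly=0.4, n_terms=100, seed=0):
    """Sample a zero-mean Gaussian random field v(x, y) on [0, 1]^2 through a
    truncated Karhunen-Loeve expansion of a separable squared-exponential
    covariance with correlation lengths (lx, ly)."""
    rng = np.random.default_rng(seed)
    x, y = np.linspace(0, 1, nx), np.linspace(0, 1, ny)

    def kle_1d(pts, ell):
        # Collocation KLE: eigen-decomposition of the 1D covariance matrix.
        C = np.exp(-(pts[:, None] - pts[None, :]) ** 2 / (2.0 * ell ** 2))
        lam, phi = np.linalg.eigh(C)
        order = np.argsort(lam)[::-1]
        return np.clip(lam[order], 0.0, None) / len(pts), phi[:, order]

    lam_x, phi_x = kle_1d(x, lx)
    lam_y, phi_y = kle_1d(y, ly)

    # Retain the n_terms largest products of 1D eigenvalues (2D modes).
    lam_2d = np.outer(lam_y, lam_x)                       # shape (ny, nx)
    flat = np.argsort(lam_2d, axis=None)[::-1][:n_terms]
    iy, ix = np.unravel_index(flat, lam_2d.shape)

    xi = rng.standard_normal(n_terms)                     # i.i.d. standard normal KLE coefficients
    v = np.zeros((ny, nx))
    for k in range(n_terms):
        v += xi[k] * np.sqrt(lam_2d[iy[k], ix[k]]) * np.outer(phi_y[:, iy[k]], phi_x[:, ix[k]])
    return v

# Initial temperature: linear conductive profile plus the small random perturbation,
# T(x, y, t=0) = T0 + (y / h)(T1 - T0) + 0.1 v(x, y), with T0 = 0, T1 = 1, h = 1.
ny, nx = 64, 64
y_grid = np.linspace(0.0, 1.0, ny)[:, None]
T_init = 0.0 + y_grid * (1.0 - 0.0) + 0.1 * grf_kle_2d(nx, ny)
```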
Figure 4 Rayleigh-Bénard convective flow: results of the temperature field of a representative sample for all neural operators.
The results of the L-DeepONet model consider the latent dimension d = 100. The neural operator is trained to approximate the evolution of the temperature field from a realization of the initial temperature field for seven time steps.
Figure 2(b) again shows violin plots of the MSE for the MLAE with differing latent dimensions and the MSE for the
corresponding L-DeepONet compared with the other neural operators. Here we see that the reconstruction accuracy
of the MLAE is improved by increasing the latent dimensionality up to d = 100. However, the change in the predictive
accuracy of L-DeepONet for different values of d is less significant, indicating that latent spaces with even very small
dimensions (d = 25) result in a very good performance. Furthermore, L-DeepONet outperforms all other neural oper-
ators with a particularly significant improvement compared to FNO. In Figure 4, we observe that L-DeepONet is able to
capture the complex dynamical features of the true model with high accuracy as the simulation evolves. In contrast, the
standard DeepONet and FNO result in diminished performance as they tend to smooth out the complex features of the
true temperature fields. Furthermore, the training time of the L-DeepONet is significantly lower than the full DeepONet
and FNO as shown in Table 1.
Shallow-water equations
The shallow-water equations model the dynamics of large-scale atmospheric flows [Galewsky et al. (2004)]. In a vector
form, the viscous shallow-water equations can be expressed as
$$\frac{DV}{Dt} = -f\, k \times V - g \nabla h + \nu \nabla^2 V,$$
$$\frac{Dh}{Dt} = -h \nabla \cdot V + \nu \nabla^2 h, \quad x \in \Omega,\ t \in [0, 1], \tag{7}$$
where Ω = (λ, φ) represents a spherical domain where λ, φ are the longitude and latitude respectively ranging from
[−π, π], V = iu + jv is the velocity vector tangent to the spherical surface (i and j are the unit vectors in the eastward
and northward directions respectively and u, v the velocity components), and h is the height field which represents the
thickness of the fluid layer. Moreover, f = 2Ξ sin φ is the Coriolis parameter, where Ξ is the Earth’s angular velocity, g is
the gravitational acceleration and ν is the diffusion coefficient.
As an initial condition, we consider a zonal flow which represents a typical mid-latitude tropospheric jet. The initial
velocity component u is expressed as a function of the latitude φ as
$$u(\phi, t = 0) = \begin{cases} 0 & \text{for } \phi \le \phi_0, \\[4pt] \dfrac{u_{\max}}{n} \exp\left[\dfrac{1}{(\phi - \phi_0)(\phi - \phi_1)}\right] & \text{for } \phi_0 < \phi < \phi_1, \\[4pt] 0 & \text{for } \phi \ge \phi_1, \end{cases} \tag{8}$$
where $u_{\max}$ is the maximum zonal velocity, $\phi_0$ and $\phi_1$ represent the latitudes of the southern and northern boundaries of the jet in radians, respectively, and $n = \exp[-4/(\phi_1 - \phi_0)^2]$ is a non-dimensional parameter that sets the value $u_{\max}$ at
the jet’s mid-point. A small unbalanced perturbation is added to the height field to induce the development of barotropic
instability. The localized Gaussian perturbation is described as
$$h'(\lambda, \phi) = \hat{h}\, \cos(\phi)\, \exp\!\left[-\left(\frac{\lambda}{\alpha}\right)^2\right] \exp\!\left[-\left(\frac{\phi_2 - \phi}{\beta}\right)^2\right],$$
where $-\pi < \lambda < \pi$ and $\hat{h}$, $\phi_2$, $\alpha$, $\beta$ are parameters that control the location and shape of the perturbation. We consider
α, β to be random variables with α ∼ U [0.1̄, 0.5] and β ∼ U [0.03̄, 0.2] so that the input Gaussian perturbation is random.
The localized perturbation is added to the initial height field, which forms the final initial condition h(λ, φ, t = 0) (see
Supplementary Materials for more details). The objective is to approximate the operator G : h(λ, φ, t = 0) ↦ u(λ, φ, t).
This problem is particularly challenging as the fine mesh required to capture the details of the convective flow both
spatially and temporally results in output realizations having millions of dimensions.
Figure 5 Shallow water equations: results of the evolution of the velocity field through eight time steps for all the operator
models considered in this work, for a representative realization of the initial perturbation to the height field. The results of the
L-DeepONet model consider the latent dimension, d = 81.
Unlike the previous two applications, here the approximated operator learns to map the initial condition of one quan-
tity, h(λ, φ, t = 0), to the evolution of a different quantity, u(λ, φ, t). Given the difference between the input and output
quantities of interest (in scale and features), a single encoding of the combined data as in the standard proposed approach
(see Figure 1) is insufficient. Instead, two separate encodings are needed for the input and output data, respectively.
While an autoencoder is used to reduce the dimensionality of the output data representing the longitudinal component
of the velocity vector u, standard principal component analysis (PCA) is performed on the input data due to the small
local variations in the initial random height field h which results in a small intrinsic dimensionality.
Results, in terms of MSE, are presented in Figure 2(c), where again we see that the L-DeepONet outperforms the stan-
dard approach while changes in the latent dimension do not result in significant differences in the model accuracy. Con-
sistent with the results of the previous application, the training cost of the L-DeepONet is much lower than the full Deep-
ONet (Table 1). We further note that training FNO for this problem (either FNO-2D or FNO-3D) proved computationally
prohibitive. For a moderate 3D problem with spatial discretization beyond $64^3$, the latest GPU architectures such as the
NVIDIA Ampere GPU do not provide sufficient memory to process a single training sample [Grady II et al. (2022)]. Data
partitioning across multiple GPUs with distributed memory, model partitioning techniques like pipeline parallelism,
and domain decomposition approaches [Grady II et al. (2022)] can be implemented to handle high-dimensional tensors
within the context of an automatic differentiation framework to compute the gradients/sensitivities of PDEs and thus op-
timize the network parameters. This advanced implementation is beyond the scope of this work as it proves unnecessary
for the studied approach. Consequently, a comparison to the FNO is not shown here. Figure 5 shows the evolution of the L-DeepONet and the full DeepONet compared to the ground truth for a single realization. The L-DeepONet consistently captures the complex nonlinear dynamical features for all time steps, while the full model prediction degrades over time and again smooths the results, such that it fails to predict extreme velocity values at each time step that can be crucial, e.g., in weather forecasting.
Table 1 Comparison of the computational training time in seconds (s) for all the neural operators across all considered appli-
cations, identically trained on an NVIDIA A6000 GPU. Inference is performed at a fraction of a second for all the approaches.
Table 2 Comparison of the accuracy of the L-DeepONet for two different dimensionality reduction techniques; namely, the
multi-layer autoencoders (MLAE) and principal component analysis (PCA), and d denotes the size of the latent space. Results for
both the maximum and minimum d values tested for each application are provided. To evaluate the performance of L-DeepONet,
we compute the mean square error of predictions, and we report the mean and standard deviation of this metric based on five
independent training trials.
3 Discussion
We have investigated latent DeepONet (L-DeepONet) for learning neural operators on latent spaces for time-dependent
PDEs exhibiting highly non-linear features both spatially and temporally and resulting in high-dimensional observations.
The L-DeepONet framework leverages autoencoder models to cleverly construct compact representations of the high-
dimensional data while a neural operator is trained on the identified latent space for operator regression. Both the
advantages and limitations of L-DeepONet are demonstrated on a collection of diverse PDE applications of increasing
complexity and data dimensionality. As presented, L-DeepONet provides a powerful tool in SciML and UQ that improves the accuracy and generalizability of neural operators in applications where high-fidelity simulations exhibit complex dynamical features, e.g., in climate models.
A systematic comparison with standard DeepONet and FNO revealed that L-DeepONet improves the quality of results
and it can capture with greater accuracy the evolution of the system represented by a time-dependent PDE. This result is
more noticeable as the dimensionality and non-linearity of dynamical features increase (e.g., in complex convective fluid
flows). Another advantage is that L-DeepONet training requires fewer computational resources, as standard DeepONet and FNO are trained on the full-dimensional data and are thus more computationally demanding and require much
larger memory (see Table 1). For all applications, we found that a small latent dimensionality (d ≤ 100) is sufficient for
constructing powerful neural operators, by removing redundant features that can hinder the network optimization and
thus its predictive accuracy. Furthermore, L-DeepONet can alleviate the computational demand and thus enable tasks
that require the computation of kernel matrices, e.g., used in transfer learning for comparing the statistical distance
between data distributions [Goswami et al. (2022b)].
Despite the advantages of learning operators in latent spaces, there are certain limitations that warrant discussion.
L-DeepONet trains DR models to identify suitable latent representations for the combined input and output data. How-
ever, as shown in the final application, in cases where the approximated mapping involves heterogeneous quantities,
two independent DR models need to be constructed. While in this work we found that simple MLAE models result in the
smallest L-DeepONet predictive error, a preliminary study regarding the suitability of the DR approach needs to be per-
formed for all quantities of interest. Another disadvantage is that the L-DeepONet as formulated is unable to interpolate
in the spatial dimensions. The current L-DeepONet consists of a modified trunk net where the time component has been
preserved while the spatial dimensions have been convolved. Thus, L-DeepONet can be used for time but not for space
interpolation/extrapolation. Finally, L-DeepONet cannot be readily employed in a physics-informed learning manner
since the governing equations are not known in the latent space and therefore cannot be directly imposed. These limita-
tions motivate future studies that continue to assist researchers in the process of constructing accurate and generalizable
surrogate models for complex PDE problems prevalent in physics and engineering.
where Θ is a finite-dimensional parameter space. In this standard setting, the optimal parameters $\theta^*$ are learned by training the neural operator (e.g., DeepONet, FNO) with a set of labeled observations $\{x_j, y_j\}_{j=1}^{N}$ generated on a discretized domain $\Omega_m = \{x_1, \ldots, x_m\} \subset \Omega$, where $\{x_j\}_{j=1}^{m}$ represent the sensor locations; thus $x_{j|\Omega_m} \in \mathbb{R}^{D_x}$ and $y_{j|\Omega_m} \in \mathbb{R}^{D_y}$, where $D_x = d_x \times m$ and $D_y = d_y \times m$. Representing the domain discretization with a single parameter $m$ corresponds to the simplistic case where mesh points are equispaced; however, the training data of neural operators are not restricted to equispaced meshes. For example, for a time-dependent PDE with two spatial dimensions and one temporal dimension with discretizations $m_s$ and $m_t$, respectively, the total output dimensionality is computed as $D_y = m_s^{d_x} \times m_t$.
where $\mathcal{J}_{\theta_\text{encoder}}$ and $\mathcal{J}_{\theta_\text{decoder}}$ are the two parts of a DR method, the superscript $r$ corresponds to data on the reduced space, $\mathcal{G}_\theta$ is the approximated latent operator, and $\theta$ denotes its trainable parameters. While the encoder $\mathcal{J}_{\theta_\text{encoder}}$ is used to project high-dimensional data onto the latent space, the decoder $\mathcal{J}_{\theta_\text{decoder}}$ is employed during the training of DeepONet to project predicted samples back to the original space and evaluate accuracy on the full-dimensional data $\{x_j, y_j\}_{j=1}^{N}$. Once trained, L-DeepONet can be used for real-time inference at negligible cost. We note that the term ‘L-DeepONet’ refers to the trained DeepONet model together with the pre-trained encoder and decoder parts of the autoencoder, which are required to perform inference on unseen samples (see Figure 1). Next, the distinct parts of the L-DeepONet framework are elucidated in detail.
where $\{x_i^r\}_{i=1}^{N} \in \mathbb{R}^d$, $\{y_i^r\}_{i=1}^{N \times m_t} \in \mathbb{R}^d$ and $\{z_i^r\}_{i=1}^{N(1+m_t)} \in \mathbb{R}^d$. The trainable parameters of the encoder and decoder are represented by $\theta_\text{encoder}$ and $\theta_\text{decoder}$, respectively. The optimal set of autoencoder parameters $\theta_\text{ae} = \{\theta_\text{encoder}, \theta_\text{decoder}\}$ is obtained via the minimization of the loss function
where $\|\cdot\|_2$ denotes the standard Euclidean norm and $\tilde{z} \equiv \{\tilde{x}, \tilde{y}\}$ denotes the reconstructed dataset of combined input and output data. In a preliminary study, not shown here for the sake of brevity, we investigated three AE models: simple autoencoders (vanilla AE) with a single hidden layer, multi-layer autoencoders (MLAE) with multiple hidden layers, and convolutional autoencoders (CAE), which convolve data through convolutional layers. We found that the MLAE performs best, even with a small number of hidden layers (e.g., 3). Furthermore, the use of alternative AE models
which are primarily used as generative models, such as variational autoencoders (VAE) [Kingma and Welling (2013)]
or Wasserstein autoencoders (WAE) [Tolstikhin et al. (2017)], resulted in significantly worse L-DeepONet performance.
Although such models resulted in good reconstruction accuracy and thus can be used to reduce the data dimensionality
and generate synthetic yet realistic samples, we found that the obtained submanifold is not well-suited for training the
neural operator, as it may result in the reduction of data variability or even representation collapse.
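As an illustration, a minimal sketch of an MLAE and its reconstruction loss is given below, assuming PyTorch; the layer widths follow the fracture-problem architecture of Table S2, all data are assumed scaled to [0, 1] (consistent with the sigmoid output layer), and the class and training-loop details are illustrative rather than the exact implementation.

```python
import torch
import torch.nn as nn

class MLAE(nn.Module):
    """Multi-layer autoencoder: the encoder projects a flattened snapshot z in R^D
    to a latent code in R^d and the decoder maps it back to R^D. All layers use
    ReLU except the last decoder layer, which uses a sigmoid (cf. Table S2)."""
    def __init__(self, input_dim, hidden=(128, 64), latent_dim=64):
        super().__init__()
        dims = (input_dim, *hidden, latent_dim)
        enc = []
        for a, b in zip(dims[:-1], dims[1:]):
            enc += [nn.Linear(a, b), nn.ReLU()]
        self.encoder = nn.Sequential(*enc)
        dec = []
        for a, b in zip(dims[::-1][:-1], dims[::-1][1:]):
            dec += [nn.Linear(a, b), nn.ReLU()]
        dec[-1] = nn.Sigmoid()                       # sigmoid on the reconstruction
        self.decoder = nn.Sequential(*dec)

    def forward(self, z):
        return self.decoder(self.encoder(z))

def train_mlae(model, loader, epochs=500, lr=1e-3):
    """Minimize the reconstruction loss ||z - z_tilde||^2 over the combined
    dataset z = {x, y} of input and output snapshots."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for z in loader:                             # mini-batches of shape (B, D)
            loss = ((z - model(z)) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```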
The latent DeepONet approximates the reduced solution operator as
$$\mathcal{G}_\theta(x^r)(\zeta) = \sum_{k=1}^{p} b_k\, tr_k, \tag{14}$$
where $[b_1, b_2, \ldots, b_p]^T$ is the output vector of the branch net, $[tr_1, tr_2, \ldots, tr_p]^T$ is the output vector of the trunk net, and $p$
denotes a hyperparameter that controls the size of the final hidden layer of both the branch and trunk net. The trainable
parameters of the DeepONet, represented by θ in Eq. (14), are obtained by minimizing a loss function, which is expressed
as:
$$\mathcal{L}(\theta) = \mathcal{L}_r(\theta) + \mathcal{L}_i(\theta), \qquad \mathcal{L}_r(\theta) = \min_{\theta}\, \|y^r - \tilde{y}^r\|_2^2, \tag{15}$$
where $\mathcal{L}_r(\theta)$ and $\mathcal{L}_i(\theta)$ denote the residual loss and the initial condition loss, respectively, $y^r$ the reference reduced outputs, and $\tilde{y}^r$ the predicted reduced outputs. In this work, we only consider the standard regression loss $\mathcal{L}_r(\theta)$; however, additional loss terms can be added to the loss function. The branch and trunk networks can be modeled with any specific architecture. Here we consider a CNN for the branch net and a feed-forward neural network (FNN) for the trunk net to take advantage of the low dimension of the evaluation points, $\zeta$. To feed the branch net of L-DeepONet, the reduced input data are reshaped to $\mathbb{R}^{\sqrt{d}\times\sqrt{d}}$; thus, it is advised to choose latent dimensionalities $d$ that are perfect squares. Once the optimal parameters $\theta$ are obtained, the trained model can be used to predict the reduced output for novel realizations of the input $x \in \mathbb{R}^{D_x}$. Finally, the predicted data are used as inputs to the pre-trained decoder $\mathcal{J}_{\theta_\text{decoder}}$ to transform results back to the original space and obtain the approximated full-dimensional output $y^\text{rec} \in \mathbb{R}^{D_y}$. We note that the training cost of L-DeepONet is significantly lower compared to the standard model, due to the smaller size of the network and the reduced total number of its trainable parameters.
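The sketch below illustrates the resulting latent DeepONet and the decoding step at inference, again assuming PyTorch; the convolutional branch, layer widths, and value of p are illustrative and do not reproduce the exact architecture of Table S3.

```python
import torch
import torch.nn as nn

class LatentDeepONet(nn.Module):
    """DeepONet on the latent space: a CNN branch encodes the reduced input x^r
    (reshaped to sqrt(d) x sqrt(d)), an FNN trunk encodes the temporal coordinate t,
    and their dot product over p basis functions gives the reduced output y^r(t)."""
    def __init__(self, d=64, p=5):
        super().__init__()
        self.side = int(d ** 0.5)                    # d is assumed a perfect square
        self.d, self.p = d, p
        self.branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * d, d * p))
        self.trunk = nn.Sequential(
            nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, p), nn.ReLU())
        self.b0 = nn.Parameter(torch.zeros(1))

    def forward(self, x_r, t):
        # x_r: (B, d) reduced inputs; t: (m_t, 1) temporal query locations zeta.
        b = self.branch(x_r.view(-1, 1, self.side, self.side)).view(-1, self.d, self.p)
        tr = self.trunk(t)                                      # (m_t, p)
        return torch.einsum('bdp,tp->btd', b, tr) + self.b0    # (B, m_t, d)

# Inference: predict latent outputs for a new reduced input and decode them back to
# the physical space with the pre-trained decoder of the autoencoder (mlae above):
#   y_r_pred = ldon(x_r_new, t_query)            # (B, m_t, d)
#   y_rec    = mlae.decoder(y_r_pred)            # (B, m_t, D), full-dimensional snapshots
```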
Error metric
To assess the performance of L-DeepONet we consider the MSE evaluated on a set of Ntest test realizations
$$\mathrm{MSE} = \frac{1}{N_\text{test}} \sum_{i=1}^{N_\text{test}} \left(y_i - y_i^\text{rec}\right)^2, \tag{16}$$
where $y \in \mathbb{R}^{D_y}$ is the reference and $y^\text{rec} \in \mathbb{R}^{D_y}$ the predicted output, respectively.
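In code, this metric amounts to a simple average over the test set; the short NumPy sketch below is illustrative and interprets the squared difference in Eq. (16) as an average over all space-time grid points of each realization.

```python
import numpy as np

def test_mse(y_true, y_pred):
    """MSE of Eq. (16): y_true and y_pred are arrays of shape (N_test, D_y)."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```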
More details on how this framework is implemented for different PDE systems of varying complexity can be found
in Results (Section 2). Information regarding the choice of neural network architectures and generation of training data
are provided in the Supplementary Materials.
References
Abdar, M., Pourpanah, F., Hussain, S., Rezazadegan, D., Liu, L., Ghavamzadeh, M., Fieguth, P., Cao, X., Khosravi, A., Acharya, U. R., et al. A Review
of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges. Information Fusion, 76:243–297, 2021.
Berg, J. and Nyström, K. Data-driven discovery of PDEs in complex datasets. Journal of Computational Physics, 384:239–252, 2019.
Bharali, R., Goswami, S., Anitescu, C., and Rabczuk, T. A robust monolithic solver for phase-field fracture integrated with fracture energy based
arc-length method and under-relaxation. Computer Methods in Applied Mechanics and Engineering, 394:114927, 2022.
Bourdin, B., Francfort, G. A., and Marigo, J.-J. The variational approach to fracture. Journal of Elasticity, 91(3):5–148, 2008.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models
are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
Cao, Q., Goswami, S., and Karniadakis, G. E. Lno: Laplace neural operator for solving differential equations. arXiv preprint arXiv:2303.10528,
2023a.
Cao, Q., Goswami, S., Karniadakis, G. E., and Chakraborty, S. Deep neural operators can predict the real-time response of floating offshore
structures under irregular waves. arXiv preprint arXiv:2302.06667, 2023b.
Chen, T. and Chen, H. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application
to dynamical systems. IEEE Transactions on Neural Networks, 6(4):911–917, 1995.
Chen, Z., Zhang, J., Arjovsky, M., and Bottou, L. Symplectic Recurrent Neural Networks. arXiv preprint arXiv:1909.13334, 2019.
Chillà, F. and Schumacher, J. New perspectives in turbulent Rayleigh-Bénard convection. The European Physical Journal E, 35(7):1–25, 2012.
De, S., Hassanaly, M., Reynolds, M., King, R. N., and Doostan, A. Bi-fidelity Modeling of Uncertain and Partially Unknown Systems using Deep-
ONets. arXiv preprint arXiv:2204.00997, 2022.
D’Elia, M., Silling, S., Yu, Y., You, H., and Gao, T. Nonlocal Kernel Network (NKN): a Stable and Resolution-Independent Deep Neural Network.
Technical report, Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2022.
Di Leoni, P. C., Lu, L., Meneveau, C., Karniadakis, G., and Zaki, T. A. DeepONet prediction of linear instability waves in high-speed boundary
layers. arXiv preprint arXiv:2105.08697, 2021.
Galewsky, J., Scott, R. K., and Polvani, L. M. An initial-value problem for testing numerical models of the global shallow-water equations. Tellus
A: Dynamic Meteorology and Oceanography, 56(5):429–440, 2004.
Giovanis, D. G. and Shields, M. D. Data-driven surrogates for high dimensional models using Gaussian process regression on the Grassmann
manifold. Computer Methods in Applied Mechanics and Engineering, 370:113269, 2020.
Goswami, S. Phase field modeling of fracture with isogeometric analysis and machine learning methods. Doctoral Thesis, 2021.
Goswami, S., Anitescu, C., and Rabczuk, T. Adaptive phase field analysis with dual hierarchical meshes for brittle fracture. Engineering Fracture
Mechanics, 218:106608, 2019.
Goswami, S., Anitescu, C., and Rabczuk, T. Adaptive fourth-order phase field analysis for brittle fracture. Computer Methods in Applied Mechanics
and Engineering, 361:112808, 2020.
Goswami, S., Bora, A., Yu, Y., and Karniadakis, G. E. Physics-Informed Neural Operators. arXiv preprint arXiv:2207.05748, 2022a.
Goswami, S., Kontolati, K., Shields, M. D., and Karniadakis, G. E. Deep transfer operator learning for partial differential equations under condi-
tional shift. Nature Machine Intelligence, pages 1–10, 2022b.
Goswami, S., Li, D. S., Rego, B. V., Latorre, M., Humphrey, J. D., and Karniadakis, G. E. Neural operator learning of heterogeneous mechanobio-
logical insults contributing to aortic aneurysms. Journal of the Royal Society Interface, 19(193):20220410, 2022c.
Goswami, S., Yin, M., Yu, Y., and Karniadakis, G. E. A physics-informed variational DeepONet for predicting crack path in quasi-brittle materials.
Computer Methods in Applied Mechanics and Engineering, 391:114587, 2022d.
Grady II, T. J., Khan, R., Louboutin, M., Yin, Z., Witte, P. A., Chandra, R., Hewett, R. J., and Herrmann, F. J. Towards Large-Scale Learned Solvers
for Parametric PDEs with Model-Parallel Fourier Neural Operators. arXiv preprint arXiv:2204.01205, 2022.
Guo, Y., Liu, Y., Oerlemans, A., Lao, S., Wu, S., and Lew, M. S. Deep learning for visual understanding: A review. Neurocomputing, 187:27–48,
2016.
Howard, A. A., Perego, M., Karniadakis, G. E., and Stinis, P. Multifidelity Deep Operator Networks. arXiv preprint arXiv:2204.09157, 2022.
Jin, P., Meng, S., and Lu, L. MIONet: Learning multiple-input operators via tensor product. arXiv preprint arXiv:2202.06137, 2022.
Khan, Z. Y., Niu, Z., Sandiwarno, S., and Prince, R. Deep learning techniques for rating prediction: a survey of the state-of-the-art. Artificial
Intelligence Review, 54(1):95–135, 2021.
Kingma, D. P. and Welling, M. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
Kollmann, H. T., Abueidda, D. W., Koric, S., Guleryuz, E., and Sobh, N. A. Deep learning for topology optimization of 2D metamaterials. Materials
& Design, 196:109098, 2020.
Kontolati, K., Loukrezis, D., dos Santos, K. R., Giovanis, D. G., and Shields, M. D. Manifold learning-based polynomial chaos expansions for
high-dimensional surrogate models. International Journal for Uncertainty Quantification, 12(4), 2022a.
Kontolati, K., Loukrezis, D., Giovanis, D. G., Vandanapu, L., and Shields, M. D. A survey of unsupervised learning methods for high-dimensional
uncertainty quantification in black-box-type problems. Journal of Computational Physics, page 111313, 2022b.
Kontolati, K., Goswami, S., Shields, M. D., and Karniadakis, G. E. On the influence of over-parameterization in manifold based surrogates and
deep neural operators. Journal of Computational Physics, page 112008, 2023.
Lataniotis, C., Marelli, S., and Sudret, B. Extending classical surrogate modeling to high dimensions through supervised dimensionality reduc-
tion: a data-driven approach. International Journal for Uncertainty Quantification, 10(1), 2020.
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. Fourier Neural Operator for Parametric Partial
Differential Equations. arXiv preprint arXiv:2010.08895, 2020a.
Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., and Anandkumar, A. Neural operator: Graph kernel network for
partial differential equations. arXiv preprint arXiv:2003.03485, 2020b.
Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. Learning nonlinear operators via DeepONet based on the universal approximation
theorem of operators. Nature machine intelligence, 3(3):218–229, 2021.
Lu, L., Meng, X., Cai, S., Mao, Z., Goswami, S., Zhang, Z., and Karniadakis, G. E. A comprehensive and fair comparison of two neural operators
(with practical extensions) based on FAIR data. Computer Methods in Applied Mechanics and Engineering, 393:114778, 2022a.
Lu, L., Pestourie, R., Johnson, S. G., and Romano, G. Multifidelity deep neural operators for efficient learning of partial differential equations
with application to fast inverse design of nanoscale heat transport. arXiv preprint arXiv:2204.06684, 2022b.
Miehe, C., Welschinger, F., and Hofacker, M. Thermodynamically consistent phase-field models of fracture: Variational principles and multi-field
FE implementations. International Journal for Numerical Methods in Engineering, 83(10):1273–1311, 2010.
Nikolopoulos, S., Kalogeris, I., and Papadopoulos, V. Non-intrusive surrogate modeling for parametrized time-dependent partial differential
equations using convolutional autoencoders. Engineering Applications of Artificial Intelligence, 109:104652, 2022.
Oommen, V., Shukla, K., Goswami, S., Dingreville, R., and Karniadakis, G. E. Learning two-phase microstructure evolution using neural operators
and autoencoder architectures. npj Computational Materials, 8(1):190, 2022.
Otter, D. W., Medina, J. R., and Kalita, J. K. A Survey of the Usages of Deep Learning for Natural Language Processing. IEEE Transactions on Neural
Networks and Learning Systems, 32(2):604–624, 2020.
Pak, M. and Kim, S. A review of deep learning in image recognition. In 2017 4th international conference on computer applications and information
processing technology (CAIPT), pages 1–3. IEEE, 2017.
Raissi, M., Perdikaris, P., and Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse
problems involving nonlinear partial differential equations. Journal of Computational physics, 378:686–707, 2019.
Tolstikhin, I., Bousquet, O., Gelly, S., and Schoelkopf, B. Wasserstein Auto-Encoders. arXiv preprint arXiv:1711.01558, 2017.
Tripura, T. and Chakraborty, S. Wavelet neural operator for solving parametric partial differential equations in computational mechanics prob-
lems. Computer Methods in Applied Mechanics and Engineering, 404:115783, 2023.
Wang, S., Wang, H., and Perdikaris, P. Learning the solution operator of parametric partial differential equations with physics-informed Deep-
ONets. Science advances, 7(40):eabi8605, 2021.
Zhang, E., Kahana, A., Turkel, E., Ranade, R., Pathak, J., and Karniadakis, G. E. A Hybrid Iterative Numerical Transferable Solver (HINTS) for PDEs
Based on Deep Operator Network and Relaxation Methods. arXiv preprint arXiv:2208.13273, 2022a.
Zhang, J., Zhang, S., and Lin, G. MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems. arXiv preprint arXiv:2204.03193, 2022b.
Acknowledgements
The authors would like to acknowledge computing support provided by the Advanced Research Computing at Hopkins
(ARCH) core facility at Johns Hopkins University and the Rockfish cluster and the computational resources and services
at the Center for Computation and Visualization (CCV), Brown University where all experiments were carried out.
Funding
KK & MDS: U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research grant under
Award Number DE-SC0020428.
SG & GEK: U.S. Department of Energy project PhILMs under Award Number DE-SC0019453 and the OSD/AFOSR Multi-
disciplinary Research Program of the University Research Initiative (MURI) grant FA9550-20-1-0358.
Author contributions
Conceptualization: KK, SG, GEK, MDS
Investigation: KK, SG
Visualization: KK, SG
Supervision: GEK, MDS
Writing—original draft: KK, SG
Writing—review & editing: KK, SG, GEK, MDS
Competing interests
The authors declare no competing interests.
Supplementary Text
Nomenclature
Table S1 Summary of the main symbols and notation used in this work.
Notation Description
xj an input realization (e.g., ICs, BCs)
yj an output of the PDE model
f (·) a forcing function of the PDE
G PDE solution operator
Gθ approximation of mapping on latent space
θ trainable parameters of the neural operator
Jθencoder encoder part of the autoencoder
Jθdecoder decoder part of the autoencoder
{xi }m i=1 sensor locations
ms , mt spatial and temporal discretization
$[x_j^r(x_1), x_j^r(x_2), \ldots, x_j^r(x_d)]$ pointwise evaluation of the reduced input to the branch net
ζ locations as inputs to the trunk net
Lae autoencoder loss
Lr (θ) L-DeepONet residual loss
d latent space dimensionality
GRF Gaussian random field
CNN convolutional neural network
FNN feed-forward neural network
CAE convolutional autoencoder
VAE variational autoencoder
MLAE multi-layer autoencoder
N total number of train/test data
OOD out-of-distribution
KLE Karhunen–Loève expansion
MSE mean squared error
Theoretical details
Neural operators
Let $\Omega \subset \mathbb{R}^D$ be a bounded open set and $\mathcal{X} = \mathcal{X}(\Omega; \mathbb{R}^{d_x})$ and $\mathcal{Y} = \mathcal{Y}(\Omega; \mathbb{R}^{d_y})$ two separable Banach spaces. Furthermore, assume that $\mathcal{G} : \mathcal{X} \rightarrow \mathcal{Y}$ is a non-linear map arising from the solution of a time-dependent PDE. The objective is to approximate the nonlinear operator via the following parametric mapping
$$\mathcal{G}_\theta : \mathcal{X} \rightarrow \mathcal{Y}, \quad \theta \in \Theta,$$
where $\Theta$ is a finite-dimensional parameter space. The optimal parameters $\theta^*$ are learned via the training of a neural operator with backpropagation based on a dataset $\{x_j, y_j\}_{j=1}^{N}$ generated on a discretized domain $\Omega_m = \{x_1, \ldots, x_m\} \subset \Omega$, where $\{x_j\}_{j=1}^{m}$ represent the sensor locations; thus $x_{j|\Omega_m} \in \mathbb{R}^{D_x}$ and $y_{j|\Omega_m} \in \mathbb{R}^{D_y}$, where $D_x = d_x \times m$ and $D_y = d_y \times m$.
DeepONet
The Deep Operator Network (DeepONet) [Lu et al. (2021)] aims to learn operators between infinite-dimensional Banach
spaces. Learning is performed in a general setting in the sense that the sensor locations $\{x_i\}_{i=1}^{m}$ at which the input functions are evaluated need not be equispaced; however, they need to be consistent across all input function evaluations. Instead of blindly concatenating the input data (input functions $[x(x_1), x(x_2), \ldots, x(x_m)]^T$ and locations $\zeta$) as one input, i.e., $[x(x_1), x(x_2), \ldots, x(x_m), \zeta]^T$, DeepONet employs two subnetworks and treats the two inputs equally. Thus, DeepONet can be applied to high-dimensional problems, where the dimensions of $x(x_i)$ and $\zeta$ no longer match, since the latter is a vector of $d$ components in total. A trunk network $f(\cdot)$ takes as input $\zeta$ and outputs $[tr_1, tr_2, \ldots, tr_p]^T \in \mathbb{R}^p$, while a second network, the branch net $g(\cdot)$, takes as input $[x(x_1), x(x_2), \ldots, x(x_m)]^T$ and outputs $[b_1, b_2, \ldots, b_p]^T \in \mathbb{R}^p$. Both
subnetwork outputs are merged through a dot product to generate the quantity of interest. A bias $b_0 \in \mathbb{R}$ is added in the last stage to increase expressivity, i.e., $\mathcal{G}(x)(\zeta) \approx \sum_{k=1}^{p} b_k t_k + b_0$. The generalized universal approximation theorem for
operators, inspired by the original theorem introduced by Chen and Chen (1995), is presented below. The generalized
theorem essentially replaces shallow networks used for the branch and trunk net in the original work with deep neural
networks to gain expressivity.
Theorem 1 (Generalized Universal Approximation Theorem for Operators.) Suppose that X is a Banach space, K1 ⊂ X,
K2 ⊂ Rd are two compact sets in X and Rd , respectively, V is a compact set in C(K1 ). Assume that: G : V → C(K2 ) is a
nonlinear continuous operator. Then, for any $\epsilon > 0$, there exist positive integers $m, p$, continuous vector functions $g : \mathbb{R}^m \rightarrow \mathbb{R}^p$, $f : \mathbb{R}^d \rightarrow \mathbb{R}^p$, and $x_1, x_2, \ldots, x_m \in K_1$ such that
$$\Big| \mathcal{G}(x)(\zeta) - \big\langle \underbrace{g(x(x_1), x(x_2), \ldots, x(x_m))}_{\text{branch}},\ \underbrace{f(\zeta)}_{\text{trunk}} \big\rangle \Big| < \epsilon$$
holds for all $x \in V$ and $\zeta \in K_2$, where $\langle \cdot, \cdot \rangle$ denotes the dot product in $\mathbb{R}^p$. For the two functions $g, f$, classical deep neural network
models and architectures can be chosen that satisfy the universal approximation theorem of functions, such as fully-connected
networks or convolutional neural networks.
The interested reader can find more information and details regarding the proof of the theorem in Lu et al. (2021).
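A minimal sketch of this branch-trunk construction with deep fully-connected networks is shown below, assuming PyTorch; the number of sensors m, the latent width p, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """G(x)(zeta) ~ <g(x(x_1), ..., x(x_m)), f(zeta)> + b_0, with deep networks for
    the branch g and the trunk f, as in the generalized theorem above."""
    def __init__(self, m, dim_zeta, p=50, width=128):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m, width), nn.ReLU(),
                                    nn.Linear(width, p))
        self.trunk = nn.Sequential(nn.Linear(dim_zeta, width), nn.ReLU(),
                                   nn.Linear(width, p), nn.ReLU())
        self.b0 = nn.Parameter(torch.zeros(1))

    def forward(self, x_sensors, zeta):
        # x_sensors: (B, m) input functions sampled at the fixed sensors x_1, ..., x_m
        # zeta:      (Q, dim_zeta) query coordinates of the output function
        b = self.branch(x_sensors)                   # (B, p)
        t = self.trunk(zeta)                         # (Q, p)
        return b @ t.T + self.b0                     # (B, Q) predictions G(x_i)(zeta_j)

# Example: 100 sensors and 2D query coordinates.
model = DeepONet(m=100, dim_zeta=2)
out = model(torch.rand(8, 100), torch.rand(64, 2))   # shape (8, 64)
```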
Fourier neural operator (FNO)
In the FNO [Li et al. (2020a)], the input is first lifted to a higher-dimensional representation and then updated iteratively through a sequence of layers of the form
$$v_{t+1}(x) = \sigma\Big( W v_t(x) + \big(\mathcal{K}(x; \phi)\, v_t\big)(x) \Big), \quad \forall x \in \Omega, \tag{18}$$
where $\mathcal{K} : \mathcal{X} \times \Theta_{\mathcal{K}} \rightarrow \mathcal{H}\big(\mathcal{Y}(\Omega; \mathbb{R}^{d_v}), \mathcal{Y}(\Omega; \mathbb{R}^{d_v})\big)$ maps to bounded linear operators on $\mathcal{Y}(\Omega; \mathbb{R}^{d_v})$ and is parameterized by $\phi \in \Theta_{\mathcal{K}}$, $W : \mathbb{R}^{d_v} \rightarrow \mathbb{R}^{d_v}$ is a linear transformation, and $\sigma : \mathbb{R} \rightarrow \mathbb{R}$ is an activation function to introduce non-linearity. The kernel integral operator $\mathcal{K}(x; \phi)$ is defined as
$$\big(\mathcal{K}(x; \phi)\, v_t\big)(x) := \int_\Omega \kappa\big(x, y, x(x), x(y); \phi\big)\, v_t(y)\, dy, \quad \forall x \in \Omega, \tag{19}$$
where $\kappa_\phi : \mathbb{R}^{2(d + d_x)} \rightarrow \mathbb{R}^{d_v \times d_v}$ is approximated by a neural network parameterized by $\phi \in \Theta_{\mathcal{K}}$. In FNO, the kernel integral operator in Eq. (19) is replaced with a convolution operator defined in Fourier space. The dependence on the input function $x$ is removed by imposing $\kappa_\phi(x, y) = \kappa_\phi(x - y)$, and thus the operator in Eq. (19) results in
$$\big(\mathcal{K}(x; \phi)\, v_t\big)(x) = \mathcal{F}^{-1}\big(\mathcal{F}(\kappa_\phi) \cdot \mathcal{F}(v_t)\big)(x), \quad \forall x \in \Omega, \tag{20}$$
where $\mathcal{F}, \mathcal{F}^{-1}$ denote the forward and inverse Fourier transforms of a function $f : \Omega \rightarrow \mathbb{R}^{d_v}$, defined as
$$(\mathcal{F}f)_j(k) = \int_\Omega f_j(x)\, e^{-2 i \pi \langle x, k \rangle}\, dx, \qquad (\mathcal{F}^{-1} f)_j(x) = \int_\Omega f_j(k)\, e^{2 i \pi \langle x, k \rangle}\, dk, \tag{21}$$
where $k \in \Omega$ represents the frequency modes and $j = 1, \ldots, d_v$, with $i = \sqrt{-1}$ the imaginary unit. For implementation purposes, a finite-dimensional parameterization is chosen by truncating the Fourier expansion to a maximal number of modes $k_{\max} = |Z_{k_{\max}}| = |\{k \in \mathbb{Z}^d : |k_j| \le k_{\max, j},\ \text{for } j = 1, \ldots, d\}|$. The low-frequency modes are chosen by defining an upper bound on the $\ell_1$-norm of $k \in \mathbb{Z}^d$.
The complete FNO algorithm is employed as follows. An input $x \in \mathcal{X}$ is first lifted to a higher-dimensional representation $v_0(x) = P(x(x))$, parameterized by a shallow FNN. Subsequently, a number of update iterations $v_t \mapsto v_{t+1}$ are applied through a series of Fourier layers. At each Fourier layer, and given that $\Omega$ is discretized with $m \in \mathbb{N}$ points, we have that $v_t \in \mathbb{R}^{m \times d_v}$ and $\mathcal{F}(v_t) \in \mathbb{C}^{m \times d_v}$, which results in $\mathcal{F}(v_t) \in \mathbb{C}^{k_{\max} \times d_v}$ after truncation of the higher-order modes. In practice, it has been shown that $k_{\max, j} = 12$ performs satisfactorily for most applications. Next, the output is multiplied by a weight tensor $R \in \mathbb{C}^{k_{\max} \times d_v \times d_v}$. For a uniform discretization, $\mathcal{F}$ is replaced with the fast Fourier transform (FFT), which greatly reduces the algorithmic complexity from $O(m^2)$ to $O(m \log m)$. After the inverse Fourier transform, the output is added to a linear transformation of the layer input, i.e., $W v_t(x)$, and finally the result is passed
through a non-linear activation function σ(·). After a series of T Fourier layers, the PDE output y(x) = Q(vT (x)) is
computed via the transformation of vT with Q : Rdv → Rdy .
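For illustration, a minimal single Fourier layer in one spatial dimension is sketched below, assuming PyTorch; the full FNO of Li et al. (2020a) stacks several such layers between the lifting network P and the projection network Q, and the tensor shapes and initialization used here are simplified.

```python
import torch
import torch.nn as nn

class FourierLayer1D(nn.Module):
    """One Fourier layer: v_{t+1}(x) = sigma( W v_t(x) + IFFT( R . FFT(v_t) ) ),
    retaining only the k_max lowest Fourier modes (k_max,j = 12 in the text)."""
    def __init__(self, channels, k_max=12):
        super().__init__()
        self.k_max = k_max
        # Complex weight tensor R acting on the retained modes.
        self.R = nn.Parameter(0.02 * torch.randn(k_max, channels, channels,
                                                 dtype=torch.cfloat))
        self.W = nn.Linear(channels, channels)       # pointwise linear transform W

    def forward(self, v):
        # v: (batch, m, channels), values of v_t on an equispaced grid of m points.
        v_hat = torch.fft.rfft(v, dim=1)             # forward FFT along the grid
        out_hat = torch.zeros_like(v_hat)
        out_hat[:, :self.k_max] = torch.einsum('bkc,kcd->bkd',
                                               v_hat[:, :self.k_max], self.R)
        conv = torch.fft.irfft(out_hat, n=v.shape[1], dim=1)   # back to physical space
        return torch.relu(self.W(v) + conv)          # add W v_t and apply sigma

# Example: a batch of 4 functions sampled at 128 grid points with 32 channels.
layer = FourierLayer1D(channels=32)
v_next = layer(torch.rand(4, 128, 32))               # shape (4, 128, 32)
```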
In the original work, two main FNO models are proposed: the FNO-2D and FNO-3D. In FNO-3D, 3-D convolutions are
performed (in space and time) and the model maps 3D functions representing the initial time steps to 3D functions rep-
resenting the full trajectory. It has been shown that FNO-3D is more expressive and leads to better accuracy given sufficient data. However, it is fixed to the training interval, so once trained, it can only predict the solution in this range, though for any time discretization. On the other hand, FNO-2D performs 2-D convolutions together with a recurrent architecture
to propagate in time. While the advantage of this approach is that the model can predict the solution for any number of
time steps (and for fixed time interval ∆t), it has been shown that it is less expressive and more challenging to train. For
more information, the interested reader is referred to Li et al. (2020a).
Data generation
Brittle fracture mechanics
In this application, we consider a continuum fracture modeling method (the second-order phase field model), to ap-
proximate the growth of fracture on a unit square plate, which is fixed on the bottom and the left edge, subjected to
displacement controlled shear loading conditions on the top edge [Goswami (2021)]. We specifically aim to approximate
the mapping G : H(x, t = 0; lc, yc) ↦ φ(x, t). We consider the material parameters λ = 121.15 kN/mm², µ = 80.77 kN/mm², and Gc = 2.7 × 10⁻³ kN/mm, where λ and µ are Lamé’s constants. The computation is performed by applying constant displacement increments of ∆u = 1 × 10⁻⁴ mm to effectively capture the crack propagation. For all simulations,
l0 is considered to be 0.0125 mm.
Initial cracks are modeled by using the local strain-history function, H(x, t). The initial strain-history function,
H(x, t = 0) is defined as a function of the closest distance of any point, x, on the domain to the line, l, which repre-
sents the discrete crack [Goswami (2021)]. In particular, it is set as:
$$H(x, t = 0; l_c, y_c) = \begin{cases} \dfrac{B G_c}{2 l_0}\left(1 - \dfrac{2 d(x, l)}{l_0}\right) & d(x, l) \le \dfrac{l_0}{2}, \\[4pt] 0 & d(x, l) > \dfrac{l_0}{2}, \end{cases} \tag{22}$$
where B is a scalar parameter that controls the magnitude of the scalar history field, considered for this experiment as B = 10³ based on domain knowledge. The function d(x, l) computes the distance of a point x from the middle horizontal line of the crack (defined by the two parameters lc, yc) and sets the appropriate value for the initial strain functional. The simulation takes place in a rectangular domain Ω = [0, 1] × [0, 1], discretized with ms × ms = 162 × 162 mesh points. The quasi-static problem is solved and in total mt = 8 snapshots of the phase field φ(x) are considered. Thus, the dimensionality of input and output realizations is Dx = 26,244 and Dy = 209,952, respectively. In total, we generate N = 261 data and split them into Ntrain = 230 and Ntest = 31 for training and testing, respectively. Figure S1 depicts the simulation box with the
associated varying parameters as well as a representative realization of the model with the propagation of an initial crack
through the phase field quantity in three points in time. The training datasets are generated using the code developed in
Goswami et al. (2020), which is available on https://github.com/somdattagoswami/IGAPack-PhaseField.
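A sketch of how one random realization of the initial strain-history field of Eq. (22) could be evaluated on this grid is given below (NumPy); it assumes that the initial crack is the horizontal segment of length lc at height yc starting from the left edge, which is our reading of d(x, l) and of the schematic in Figure S1, and all names are illustrative.

```python
import numpy as np

def initial_strain_history(yc, lc, n=162, B=1e3, Gc=2.7e-3, l0=0.0125):
    """Evaluate H(x, t=0; lc, yc) of Eq. (22) on an n x n grid of the unit square.
    Assumption: the initial crack is the segment {(x, yc): 0 <= x <= lc} starting
    from the left edge; d is the distance of a grid point to that segment."""
    s = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(s, s)
    dx = np.maximum(X - lc, 0.0)                 # horizontal distance beyond the crack tip
    d = np.sqrt(dx ** 2 + (Y - yc) ** 2)         # distance to the crack segment

    H = np.zeros_like(d)
    inside = d <= l0 / 2.0
    H[inside] = (B * Gc / (2.0 * l0)) * (1.0 - 2.0 * d[inside] / l0)
    return H

# One random realization with yc ~ U[0.3, 0.7] and lc ~ U[0.4, 0.6].
rng = np.random.default_rng(0)
H0 = initial_strain_history(yc=rng.uniform(0.3, 0.7), lc=rng.uniform(0.4, 0.6))
```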
Figure S1 (a) Schematic of the simulation box considered in generating the labeled dataset for brittle fracture under shear loading, depicting the two random parameters, namely the length of the crack (lc) and the height of the crack (yc), and (b) the resulting phase field φ(x) from the solution of the PDE model, showing the evolution of the crack through three time steps t = {0.25, 0.625, 1}.
Figure S2 (a) Schematic of the Rayleigh-Bénard convective flow in a thin fluid layer due to temperature gradient ∆T with the
creation of convective cells at the top and (b)-(g) the evolution of the temperature field T (x, t) for a random realization of the
initial temperature field for six time steps t = {0.05, 0.325, 0.525, 0.775, 0.9, 1.0} based on the numerical solution of the PDE.
Shallow-water equations
In this problem, we aim to approximate the operator between the random Gaussian perturbation h′ and the time-evolved velocity component u, i.e., G : h′(λ, φ, t = 0) ↦ u(φ, λ, t). The constants are defined as: Ξ = 7.292 × 10⁻⁵ s⁻¹ is the Earth’s angular velocity, g = 9.80616 m s⁻² the gravitational acceleration, ν = 1.0 × 10⁵ m² s⁻¹ the diffusion coefficient, umax = 80 m s⁻¹, φ0 = π/7, and φ1 = π/2 − φ0; thus, the mid-point of the jet where the maximum velocity is applied is at φ = π/4. The initial velocity u is defined so that it is zero outside the zone of interest, with no discontinuities at the
northern and southern poles. The parameters of the Gaussian perturbation which is added to the height field are set as:
φ2 = π/4, ĥ = 120 m, while α, β are random parameters. In this expression, the Gaussian functions are multiplied with
a cosine so that the forced perturbation is zero at the two poles.
While the initial condition of the velocity field is given analytically (see Main Text), the height field is obtained by
numerically integrating the balance equation
$$g h(\phi) = g h_0 - \int_{-\pi/2}^{\phi} \alpha\, u(\phi') \left[ f + \frac{\tan(\phi')}{\alpha}\, u(\phi') \right] d\phi', \tag{23}$$
where α = 6.37122 × 10⁶ m is the radius of the Earth and h₀ is set so that the mean layer depth around the sphere is equal
to 10 km. The above integral can be calculated using a numerical scheme such as Gaussian quadrature. The Gaussian perturbation h′(λ, φ, t = 0) is added to the initial height field computed by the expression above to form the final initial condition h(λ, φ, t = 0).
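The following sketch (NumPy/SciPy, illustrative) assembles one realization of the initial condition: the zonal jet of Eq. (8), the height field in balance with it via quadrature of Eq. (23), and the localized Gaussian perturbation in the form described above; the constant h₀ is simply set to 10 km here rather than tuned to the global-mean depth, and the latitude grid is restricted to [−π/2, π/2].

```python
import numpy as np
from scipy.integrate import quad

Omega_E, g, a = 7.292e-5, 9.80616, 6.37122e6          # angular velocity, gravity, Earth radius
u_max, phi_0 = 80.0, np.pi / 7
phi_1 = np.pi / 2 - phi_0
en = np.exp(-4.0 / (phi_1 - phi_0) ** 2)               # normalization so that u(pi/4) = u_max

def u_init(phi):
    """Zonal jet of Eq. (8)."""
    if phi <= phi_0 or phi >= phi_1:
        return 0.0
    return (u_max / en) * np.exp(1.0 / ((phi - phi_0) * (phi - phi_1)))

def h_balanced(phi, h0=1.0e4):
    """Height field in gradient-wind balance with the jet (Eq. 23), via quadrature."""
    def integrand(p):
        f = 2.0 * Omega_E * np.sin(p)                   # Coriolis parameter
        return a * u_init(p) * (f + np.tan(p) / a * u_init(p))
    integral, _ = quad(integrand, -np.pi / 2, phi, limit=200)
    return h0 - integral / g

def h_perturbation(lam, phi, alpha, beta, h_hat=120.0, phi_2=np.pi / 4):
    """Localized Gaussian perturbation with random shape parameters alpha, beta."""
    return h_hat * np.cos(phi) * np.exp(-(lam / alpha) ** 2) \
                 * np.exp(-((phi_2 - phi) / beta) ** 2)

# One random initial condition h(lam, phi, t=0) on a coarse grid.
rng = np.random.default_rng(0)
alpha, beta = rng.uniform(1 / 9, 0.5), rng.uniform(1 / 30, 0.2)
lam_g = np.linspace(-np.pi, np.pi, 64)
phi_g = np.linspace(-np.pi / 2, np.pi / 2, 32)
h_init = np.array([[h_balanced(p) + h_perturbation(l, p, alpha, beta)
                    for l in lam_g] for p in phi_g])
```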
The simulation takes place in a spherical domain Ω = [−π, π] × [−π, π], discretized with ms × ms = 256 × 256 mesh points in the longitudinal and latitudinal directions, respectively. The PDE is solved in the time interval t = [0, 360 h] with δt = 1.6̄ × 10⁻¹ h, and in total mt = 72 time steps (equispaced) are considered. For the presentation of results, the time range is mapped to the dimensionless range t = [0, 1]. Thus, the dimensionality of input and output realizations is Dx = 65,536 and Dy = 4,587,520, respectively. The significantly high dimensionality of the outputs makes this problem particularly challenging. In total, we generate N = 300 data and split them into Ntrain = 260 and Ntest = 40 for training and testing, respectively. The evolution of the velocity field u for a random realization of the initial height field is shown in Figure S3
for six points in time. Datasets were generated using the Dedalus Project that can be found in https://github.com/
DedalusProject/dedalus.
Figure S3 Evolution of the velocity field u(λ, φ) on a sphere (Earth) as a solution of the spherical shallow-water equations, for
a random realization of the initial perturbation to the height field, i.e., α = 0.38, β = 0.20. The velocity field is shown for six time
steps t = {0, 0.32, 0.4, 0.6, 0.8, 1.0}.
Table S2 Architecture of multi-layer autoencoders (MLAE). Parameter d represents the dimensionality of the latent space. All
layers use the ReLU activation function except the last one which uses the sigmoid function.
Application MLAE
Brittle material fracture [128, 64, d, 64, 128]
Rayleigh-Bénard fluid flow [400, 256, 169, d, 169, 256, 400]
Shallow water equation [256, 169, 121, d, 121, 169, 256]
Tables S2 and S3 show the architectures of the autoencoders and the neural operators. For all trained multi-layer autoencoders, the depth and width are chosen based on the dimensionality of the original data. For the neural operators, a standard architecture is chosen, which resulted in good performance for all applications. Finally, for training both FNO-2D and FNO-3D, the code from the original implementation was used, which can be found at https://github.com/zongyi-li/fourier_neural_operator.
Table S3 Architecture of DeepONet. Inputs to the Conv2D layers consist of the number of output filters, kernel size, and activation function, respectively. Parameter p has been set equal to 5.
Supplementary results
Error plots
In Figures S4,S5,S6, the error plots corresponding to the three applications for all studied models are presented for a
single random realization. The error fields represent the point-wise absolute error between the reference response and
model prediction. As shown and discussed in the main paper, L-DeepONet results in the smallest interpolation error
across diverse applications.
Figure S4 Brittle fracture in a plate loaded in shear: absolute error plots of all the neural operators for the results of the repre-
sentative sample with yc = 0.55 and lc = 0.6 shown in Fig. 3. The neural operator is trained to approximate the growth of the
crack for five time steps from a given initial location of the defect on a unit square domain.
Figure S6 Shallow water equations: absolute error plots of the predictions of the velocity field from a given initial perturbation to the height field, as obtained for all the neural operators. The predicted solution is shown in Fig. 5.
S7 c). To summarize, we found that the autoencoder-based L-DeepONet results in a better overall performance (especially for low d), with an accuracy that is either comparable or superior to the PCA-based L-DeepONet. However, in certain problems PCA can perform as well as the AE, with the additional advantage of being much less computationally expensive.
Figure S7 Results for all applications of principal component analysis (PCA) (left plots) for different values of the latent dimen-
sionality and neural operators (right plot) for all studied models. Violin plots represent 5 independent trainings of the models
using different random seed numbers.