
SIAM J. SCI. COMPUT.                                          © 2022 Society for Industrial and Applied Mathematics
Vol. 44, No. 6, pp. A3490--A3514

MIONET: LEARNING MULTIPLE-INPUT OPERATORS VIA TENSOR PRODUCT*

PENGZHAN JIN†, SHUAI MENG‡, AND LU LU§

Abstract. As an emerging paradigm in scientific machine learning, neural operators aim to
learn operators, via neural networks, that map between infinite-dimensional function spaces. Several
neural operators have been recently developed. However, all the existing neural operators are only
designed to learn operators defined on a single Banach space; i.e., the input of the operator is a single
function. Here, for the first time, we study operator regression via neural networks for multiple-input
operators defined on the product of Banach spaces. We first prove a universal approximation
theorem for continuous multiple-input operators. We also provide a detailed theoretical analysis,
including the approximation error, which provides guidance for the design of the network architecture.
Based on our theory and a low-rank approximation, we propose a novel neural operator, MIONet,
to learn multiple-input operators. MIONet consists of several branch nets for encoding the input
functions and a trunk net for encoding the domain of the output function. We demonstrate that
MIONet can learn solution operators involving systems governed by ordinary and partial differential
equations. In our computational examples, we also show that we can endow MIONet with prior
knowledge of the underlying system, such as linearity and periodicity, to further improve accuracy.

Key words. operator regression, multiple-input operators, tensor product, universal approxi-
mation theorem, neural networks, MIONet, scientific machine learning

MSC codes. 47-08, 47H99, 65D15, 68Q32, 68T07

DOI. 10.1137/22M1477751

1. Introduction. The field of scientific machine learning (SciML) has grown
rapidly in recent years, where deep learning techniques are developed and applied
to solve problems in computational science and engineering [12]. As an active area
of research in SciML, different methods have been developed to solve ordinary and
partial differential equations (ODEs and PDEs) by parameterizing the solutions via
neural networks (NNs), such as physics-informed NNs (PINNs) [48, 31, 36, 38, 47],
the deep Ritz method [8], and the deep Galerkin method [41]. These methods have
shown promising results in diverse applications, such as fluid mechanics [39], optics
[4], systems biology [44, 5], and biomedicine [15]. However, these methods solve only
one specific instance of the PDE, and hence it is necessary to train a new neural
network given a new initial condition, boundary condition, or forcing term, which is
computationally costly and time-consuming.
Another approach is to apply neural networks (called neural operators) to learn
solution operators of PDEs, mapping from an input function v (e.g., initial condition,
boundary condition, or forcing term) to the PDE solution u. This regression for the

*Submitted to the journal's Methods and Algorithms for Scientific Computing section February
14, 2022; accepted for publication (in revised form) July 13, 2022; published electronically November
7, 2022.
https://doi.org/10.1137/22M1477751
Funding: This work was supported by the U.S. Department of Energy (DE-SC0022953).
The first author was partially supported by the China Postdoctoral Science Foundation grant
2022M710005.
†School of Mathematical Sciences, Peking University, Beijing 100871, China (jpz@pku.edu.cn).
‡Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA
19104 USA (shuaim@seas.upenn.edu).
§Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA 19104 USA (lulu1@seas.upenn.edu).



solution operator \mathcal{G} is formulated as

    \mathcal{G} : X \to Y, \quad v \mapsto u,

where X and Y are two infinite-dimensional Banach spaces of functions and u = \mathcal{G}(v).
We aim to learn \mathcal{G} via NNs from a training dataset, i.e., some pairs (v, \mathcal{G}(v)) \in X \times Y.
Once a neural operator is trained, obtaining a solution \mathcal{G}(v) for a new instance of v
requires only a forward pass of the network.
Several neural operator approaches have been recently proposed, such as the deep op-
erator network (DeepONet) [27, 28, 29], the Fourier neural operator (FNO) [21, 29],
the graph kernel network [22, 46], and others [34, 1, 42, 37]. Among these approaches,
DeepONet has been applied and has demonstrated good performance in diverse applica-
tions, such as high-speed boundary layer problems [7], multiphysics and multiscale
problems of hypersonics [32] and electroconvection [2], multiscale bubble growth dy-
namics [23, 24], fractional derivative operators [28], stochastic differential equations
[28], solar-thermal systems [35], and aortic dissection [45]. Several extensions of Deep-
ONet have also been developed, such as Bayesian DeepONet [25], DeepONet with
proper orthogonal decomposition (POD-DeepONet) [29], multiscale DeepONet [26],
neural operator with coupled attention [14], and physics-informed DeepONet [43, 10].
Despite the progress of neural operators in computational tests, our theoretical
understanding is lagging behind. The current theoretical analysis of neural operators
focuses on approximation capability, such as the universal approximation theorems for
DeepONet [3, 28] and FNO [18]. Recent theoretical works show that DeepONet [6, 20]
and FNO [17] may break the curse of dimensionality (CoD) in some problems, and
DeepONet can approximate the solution operators of elliptic PDEs with exponential
accuracy [33]. These results demonstrate the efficient approximation capability of
DeepONet and FNO. One main difference between DeepONet and FNO is that the
theory of FNO requires the input function v and the output function u to be defined
on the same domain [17, 18], but the theory of DeepONet does not have this restriction
[3, 28, 6, 20, 33].
However, all the existing neural operators are only designed to learn operators
defined on a single Banach space X, i.e., the input of the operator is a single func-
tion. The theory of universal approximation for operators has only been proved for
operators defined on a single Banach space. We note that some theoretical work
[18] allows the input function to be a vector-valued function, i.e., the input could be
v = (v_1, v_2, \ldots), but it still requires that all components v_i of the input function
be defined on the same domain. This limitation of the input space prohibits us from
learning a wide range of useful operators, e.g., the PDE solution operator mapping
from both the initial condition and boundary condition to the PDE solution, as the
initial condition and boundary condition are defined on two different domains (the
initial domain and the boundary domain, respectively).
To overcome this limitation, in this work we first study theoretically the approx-
imation theory of the operator regression for a multiple-input operator \mathcal{G} defined on
the product of Banach spaces,

    \mathcal{G} : X_1 \times X_2 \times \cdots \times X_n \to Y,

where X_1, X_2, \ldots, X_n are n different input Banach spaces, and Y is the output Banach
space. For example, X_1 can be the function space of all initial conditions, X_2 can
be the function space of all boundary conditions, and X_3 can be the function space
of all forcing terms. Based on our theory, we then propose a novel neural operator,


MIONet, to learn multiple-input operators. We verify in our computational results
that MIONet can learn solution operators involving systems governed by ODEs and
PDEs. We also discuss how we can endow MIONet with prior knowledge of the
underlying system, such as linearity and periodicity, to further improve accuracy.


This paper is organized as follows. In section 2, we prove the approximation
theory for multiple-input operator regression. Then we propose MIONet based on
the theory in section 3. Subsequently, we test MIONet on several problems of ODEs
and PDEs in section 4. Finally, section 5 summarizes this work.

2. Approximation theory. Our goal is to learn a (typically nonlinear) operator
mapping from a product of n Banach spaces (input spaces) to another Banach space
(output space). These spaces are typically infinite dimensional. We first define the
main notation used throughout this paper. Denote the space comprised of all the
continuous maps from a metric space X to a metric space Y as C(X, Y),
and define C(X) := C(X, \mathbb{R}). Let X_1, X_2, \ldots, X_n and Y be n + 1 Banach spaces,
and let K_i \subset X_i (i = 1, \ldots, n) be a compact set. Then we aim to learn a continuous
operator

    \mathcal{G} : K_1 \times \cdots \times K_n \to Y, \quad (v_1, \ldots, v_n) \mapsto u,

where v_i \in K_i and u = \mathcal{G}(v_1, \ldots, v_n). Such operators \mathcal{G} form the space C(K_1 \times \cdots \times K_n, Y),
which is studied in this paper. In the following results, Y could be any Banach space.
When Y is a function space, for convenience we consider Y = C(K_0) for a compact
domain K_0.
In this section, we prove the approximation theory of continuous multiple-input
operators by first illustrating our basic idea using the example of multilinearity on
finite-dimensional spaces in section 2.1. We then introduce the techniques of Schauder
basis and canonical projection for infinite-dimensional spaces in section 2.2, based on
which we present the main theory of nonlinear operators in section 2.3, with more
detailed analysis in section 2.4. We also provide a view of the theory through the
tensor product of Banach spaces in section 2.5.

2.1. Multilinear operators defined on finite-dimensional Banach spaces.
We first use a simple case to illustrate the main idea of our theoretical approach:
multilinear operators defined on finite-dimensional Banach spaces. Specifically, we
consider a multilinear operator

    \mathcal{G} : X_1 \times \cdots \times X_n \to Y,

where X_1, \ldots, X_n are Banach spaces of finite dimensions d_1, \ldots, d_n.

Let \{\phi_j^i\}_{j=1}^{d_i} \subset X_i be a basis of X_i, and thus for each v_i \in X_i, there exists a
coordinate representation

    v_i = \sum_{j=1}^{d_i} \alpha_j^i \phi_j^i

for some vector \alpha^i = (\alpha_1^i, \alpha_2^i, \ldots, \alpha_{d_i}^i) \in \mathbb{R}^{d_i}. Because \mathcal{G} is multilinear, for any input


(v_1, \ldots, v_n),

    \mathcal{G}(v_1, \ldots, v_n) = \mathcal{G}\left( \sum_{j_1=1}^{d_1} \alpha_{j_1}^1 \phi_{j_1}^1, \ldots, \sum_{j_n=1}^{d_n} \alpha_{j_n}^n \phi_{j_n}^n \right)
                       = \sum_{j_1=1}^{d_1} \cdots \sum_{j_n=1}^{d_n} \mathcal{G}\left( \phi_{j_1}^1, \ldots, \phi_{j_n}^n \right) \alpha_{j_1}^1 \cdots \alpha_{j_n}^n,

where u_{j_1 \cdots j_n} = \mathcal{G}(\phi_{j_1}^1, \ldots, \phi_{j_n}^n) \in Y is the output of \mathcal{G} for the input (\phi_{j_1}^1, \ldots, \phi_{j_n}^n).
For convenience and clarity, for u = (u_{j_1 \cdots j_n})_{d_1 \times \cdots \times d_n} \in Y^{d_1 \times \cdots \times d_n}, we use the nota-
tion u\langle \cdots \rangle to represent the multilinear map

    u\langle \alpha^1, \ldots, \alpha^n \rangle := \sum_{j_1=1}^{d_1} \cdots \sum_{j_n=1}^{d_n} u_{j_1 \cdots j_n} \alpha_{j_1}^1 \cdots \alpha_{j_n}^n.

Hence, a multilinear operator defined on finite-dimensional Banach spaces can be
represented as

(2.1)    \mathcal{G}(v_1, \ldots, v_n) = \left( \mathcal{G}(\phi_{j_1}^1, \ldots, \phi_{j_n}^n) \right)_{d_1 \times \cdots \times d_n} \langle \alpha^1, \ldots, \alpha^n \rangle.

Next we discuss the main idea of the approximation theory of \mathcal{G}, i.e., how to
construct a surrogate model \tilde{\mathcal{G}}_\theta (parameterized by the parameters \theta) to approximate
\mathcal{G}. We note that \alpha^i in (2.1) can be computed directly from v_i, and thus to approximate
\mathcal{G}, it is sufficient to approximate \left( \mathcal{G}(\phi_{j_1}^1, \ldots, \phi_{j_n}^n) \right)_{d_1 \times \cdots \times d_n}. We consider Y = C(K_0)
for a compact set K_0 \subset \mathbb{R}^d, and then we can construct \tilde{\mathcal{G}}_\theta as

    \tilde{\mathcal{G}}_\theta : \mathbb{R}^{d_1} \times \cdots \times \mathbb{R}^{d_n} \to C(K_0), \quad (\alpha^1, \ldots, \alpha^n) \mapsto \tilde{f}_\theta \langle \alpha^1, \ldots, \alpha^n \rangle,

where \tilde{f}_\theta \in C(K_0, \mathbb{R}^{d_1 \times \cdots \times d_n}) is a function class parameterized by parameters \theta. It is
easy to show that \tilde{\mathcal{G}}_\theta is multilinear and can approximate \mathcal{G} arbitrarily well as long as
\tilde{f}_\theta approximates \left( \mathcal{G}(\phi_{j_1}^1, \ldots, \phi_{j_n}^n) \right)_{d_1 \times \cdots \times d_n} well, which can be achieved by choosing
\tilde{f}_\theta as neural networks.
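To make the contraction u\langle \alpha^1, \ldots, \alpha^n \rangle concrete, the following is a minimal NumPy
sketch (not from the paper; array names and sizes are illustrative) of the multilinear map for n = 2,
where the tensor u has been tabulated on a grid of query points y:

    import numpy as np

    d1, d2, n_y = 5, 7, 50                    # coordinate dimensions and number of query points
    u = np.random.rand(d1, d2, n_y)           # u_{j1 j2}(y_k): outputs of G on basis pairs, tabulated at y_k
    alpha1 = np.random.rand(d1)               # coordinates of v1 in the basis {phi_j^1}
    alpha2 = np.random.rand(d2)               # coordinates of v2 in the basis {phi_j^2}

    # G(v1, v2)(y_k) ~= sum_{j1, j2} u_{j1 j2}(y_k) * alpha1_{j1} * alpha2_{j2}
    out = np.einsum("ijk,i,j->k", u, alpha1, alpha2)
    print(out.shape)                          # (50,)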


2.2. Schauder basis and canonical projections for infinite-dimensional
spaces. To deal with infinite-dimensional spaces, we introduce the Schauder basis
and canonical projections. We refer the reader to [9] for more details.

Definition 2.1 (Schauder basis). Let X be an infinite-dimensional normed linear
space. A sequence \{e_i\}_{i=1}^{\infty} in X is called a Schauder basis of X if for every x \in X
there is a unique sequence of scalars \{a_i\}_{i=1}^{\infty}, called the coordinates of x, such that

    x = \sum_{i=1}^{\infty} a_i e_i.

We show two useful examples of the Schauder basis as follows.


Example 2.2 (Faber--Schauder basis of C[0, 1]). Given distinct points \{t_i\}_{i=1}^{\infty}
which are a dense subset of [0, 1] with t_1 = 0, t_2 = 1, let e_1(t) = 1, e_2(t) = t, and
let e_{k+1} be chosen as an element such that e_1, \ldots, e_k, e_{k+1} is a basis of the (k + 1)-
dimensional space which consists of all the piecewise linear functions with grid points
\{t_i\}_{i=1}^{k+1}.


Example 2.3 (Fourier basis of L^2[0, 1]). Any orthogonal basis in a separable Hilbert
space is a Schauder basis.

We denote the coordinate functional of e_i by e_i^*, and thus

    x = \sum_{i=1}^{\infty} e_i^*(x) e_i \quad \forall x \in X.

Then for a constant n, the canonical projection P_n is defined as

    P_n(x) = P_n\left( \sum_{i=1}^{\infty} e_i^*(x) e_i \right) = \sum_{i=1}^{n} e_i^*(x) e_i.

We have the following property for P_n, according to which we can represent points in
an infinite-dimensional Banach space by finite coordinates within a sufficiently small
projection error.

Property 2.4 (canonical projection). Assume that K is a compact set in a Ba-
nach space X equipped with a Schauder basis and corresponding canonical projections
P_n; then we have

    \lim_{n \to \infty} \sup_{x \in K} \|x - P_n(x)\| = 0.

Proof. The proof can be found in Appendix A.


For convenience, we decompose P_n as

    P_n = \psi_n \circ \varphi_n,

where \varphi_n : X \to \mathbb{R}^n and \psi_n : \mathbb{R}^n \to X are defined as

    \varphi_n(x) = (e_1^*(x), \ldots, e_n^*(x))^T, \quad \psi_n(\alpha_1, \ldots, \alpha_n) = \sum_{i=1}^{n} \alpha_i e_i.

The \varphi_n(x) are essentially the truncated coordinates of x. Moreover, sometimes we
can further replace \{e_1, \ldots, e_n\} with an equivalent basis for the decomposition of P_n,
i.e.,

    \hat{\varphi}_n(x) = Q(e_1^*(x), \ldots, e_n^*(x))^T, \quad \hat{\psi}_n(\alpha_1, \ldots, \alpha_n) = (e_1, \ldots, e_n) Q^{-1} (\alpha_1, \ldots, \alpha_n)^T,

with a nonsingular matrix Q \in \mathbb{R}^{n \times n}. For example, when applying the Faber--
Schauder basis (Example 2.2), instead of using the coordinates based on the sequence
\{e_i\}_{i=1}^{\infty}, we use the function values evaluated at certain grid points as the coordinates,
which is the same as using the linear element basis in the finite element method.
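In practice (see section 4), this projection \varphi_n is implemented simply as function evaluation
at fixed grid points. A minimal sketch (the grid and test function are illustrative, not from the
paper):

    import numpy as np

    def varphi(v, q, a=0.0, b=1.0):
        """Grid-point coordinates of a function v on [a, b]: evaluate v at q
        equidistant points, which corresponds to the piecewise-linear
        (Faber--Schauder type) basis up to a change of basis Q."""
        t = np.linspace(a, b, q)
        return v(t)

    coords = varphi(np.sin, q=100)   # 100 equidistant samples of sin on [0, 1]
    print(coords.shape)              # (100,)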
2.3. Main theorems: Approximation theory for multiple-input opera-
tors. Here, we present the main approximation theorems in Theorem 2.5 and Corol-
lary 2.6, and the proofs will be presented afterwards.

Theorem 2.5. Suppose that X_1, \ldots, X_n, Y are Banach spaces, K_i \subset X_i are com-
pact sets, and X_i have a Schauder basis with canonical projections P_{q_i}^i = \psi_{q_i}^i \circ \varphi_{q_i}^i.
Assume that \mathcal{G} : K_1 \times \cdots \times K_n \to Y is a continuous operator; then for any \epsilon > 0,
there exist positive integers p_i, q_i, continuous vector functions g_i \in C(\mathbb{R}^{q_i}, \mathbb{R}^{p_i}), and
u = (u_{j_1 \cdots j_n}) \in Y^{p_1 \times \cdots \times p_n}, such that

(2.2)    \sup_{v_i \in K_i} \left\| \mathcal{G}(v_1, \ldots, v_n) - u\langle g_1(\varphi_{q_1}^1(v_1)), \ldots, g_n(\varphi_{q_n}^n(v_n)) \rangle \right\| < \epsilon.

Corollary 2.6. The conclusion in Theorem 2.5 can also be expressed in the
following equivalent forms.
(i) There exist positive integers p_i, q_i, r and continuous functions g_i \in C(\mathbb{R}^{q_i}, \mathbb{R}^{p_i}),
    u \in Y^r, and W \in \mathbb{R}^{p_1 \times \cdots \times p_n \times r}, such that

(2.3)    \sup_{v_i \in K_i} \left\| \mathcal{G}(v_1, \ldots, v_n) - W\langle g_1(\varphi_{q_1}^1(v_1)), \ldots, g_n(\varphi_{q_n}^n(v_n)), u \rangle \right\| < \epsilon.

    If \{e_i\} is a Schauder basis for Y, we can further have u = (e_1, e_2, \ldots, e_r)^T.
(ii) There exist positive integers p, q_i and continuous functions g_j^i \in C(\mathbb{R}^{q_i}) and
    u_j \in Y, such that

(2.4)    \sup_{v_i \in K_i} \left\| \mathcal{G}(v_1, \ldots, v_n) - \sum_{j=1}^{p} g_j^1(\varphi_{q_1}^1(v_1)) \cdots g_j^n(\varphi_{q_n}^n(v_n)) \cdot u_j \right\| < \epsilon.

The relations between these three results are as follows. We first prove (2.2);
in (2.3), we treat g_i and u in (2.2) symmetrically and combine them via a tensor;
(2.4) is simply a summation of products. In fact, when Y is a space of continuous
functions approximated by fully connected neural networks (FNNs), (2.2) and (2.3)
are technically equivalent, since W can be regarded as the final linear output layer of
the FNN for approximating u. Therefore, we design two architectures in section 3,
one based on (2.2)/(2.3) and the other based on (2.4).
Next we show two special cases of n = 1 based on the theory above. In Example
2.7, we choose the Faber--Schauder basis as a Schauder basis. In Example 2.8, we
have the universal approximation theorem for DeepONets.
Example 2.7. Assume that K is a compact set in C[0, 1], and \mathcal{G} : K \to C[0, 1]
is a continuous operator; then for any \epsilon > 0, there exist positive integers q, r and a
continuous map f : \mathbb{R}^q \to \mathbb{R}^r, such that

    \left\| \mathcal{G}(v) - \sum_{i=1}^{r} f_i\left( v\left(\tfrac{0}{q-1}\right), v\left(\tfrac{1}{q-1}\right), \ldots, v\left(\tfrac{q-1}{q-1}\right) \right) \cdot e_i \right\| < \epsilon

holds for all v \in K, where f = (f_i). Here \{e_i\}_{i=1}^{r} are chosen as the piecewise linear
functions with grid points \frac{j}{r-1}, and e_i(\frac{j-1}{r-1}) = \delta_{ij}. In fact, f_i denotes the value of
\mathcal{G}(v) at \frac{i-1}{r-1}. This example is a direct conclusion of Example 2.2 and Corollary 2.6(i).
Example 2.8 (DeepONet). As a special case, for n = 1 in Theorem 2.5 we obtain
the universal approximation theorem for DeepONet (Theorem 2 in [28]).
2.4. Detailed analysis. We first introduce Lemma 2.9 and Theorem 2.10, which
are used to prove the main theorems in section 2.3.

Lemma 2.9. Suppose that X_1, \ldots, X_n, Y are Banach spaces and K_i \subset X_i are
compact sets. Assume that \mathcal{G} : K_1 \times \cdots \times K_n \to Y is a continuous operator; then
for any \epsilon > 0, there exist positive integers p_i and continuous vector functionals \hat{g}_i \in
C(X_i, \mathbb{R}^{p_i}) and u \in Y^{p_1 \times p_2 \times \cdots \times p_n}, such that

    \sup_{v_i \in K_i} \| \mathcal{G}(v_1, \ldots, v_n) - u\langle \hat{g}_1(v_1), \ldots, \hat{g}_n(v_n) \rangle \| < \epsilon.


Proof. The proof can be found in Appendix B.


Lemma 2.9 gives the approximation theory in the original infinite-dimensional
Banach spaces. Next we extend it to the following result.


Theorem 2.10. Suppose that X_1, \ldots, X_n, Y are Banach spaces, K_i \subset X_i are
compact sets, and X_i have a Schauder basis with canonical projections P_{q_i}^i. Assume
that \mathcal{G} : K_1 \times K_2 \times \cdots \times K_n \to Y is a continuous operator; then for any \epsilon > 0,
there exist positive integers p_i, continuous vector functionals \hat{g}_i \in C(X_i, \mathbb{R}^{p_i}), and
u \in Y^{p_1 \times p_2 \times \cdots \times p_n}, such that

(2.5)    \sup_{v_i \in K_i} \left\| \mathcal{G}(v_1, \ldots, v_n) - u\langle \hat{g}_1(P_{q_1}^1(v_1)), \ldots, \hat{g}_n(P_{q_n}^n(v_n)) \rangle \right\| < \epsilon + M \sum_{i=1}^{n} L_i^\epsilon(q_i)

holds for arbitrary positive integers q_i, where

    L_i^\epsilon(q_i) = \sup_{v_i \in K_i} \left\| \hat{g}_i \circ P_{q_i}^i(v_i) - \hat{g}_i(v_i) \right\|_1, \quad M = \max_{v_i \in K_i} \| \mathcal{G}(v_1, \ldots, v_n) \|.

Note that L_i^\epsilon(q_i) \to 0 as q_i \to \infty.


Proof. The proof can be found in Appendix C.
Theorem 2.10 immediately implies Theorem 2.5 and Corollary 2.6, as shown in
Appendices D and E. In (2.5), the first part of the error, \epsilon, is due to operator approx-
imation, while the second part, M \sum_{i=1}^{n} L_i^\epsilon(q_i), is due to the projection to
finite-dimensional spaces. We note that L_i^\epsilon depends on \epsilon, and thus when \epsilon is small,
which makes L_i^\epsilon converge more slowly, a large value of q_i is needed.

Next we provide further analysis of these results, which also provides guidance for
the design of the network architectures in section 3.
Corollary 2.11 (effect of a bias). In Theorem 2.10, if Y = C(K_0) for a compact
K_0 in a Banach space X_0, we take an additional bias b \in \mathbb{R}, such that (2.5) becomes

    \sup_{v_i \in K_i, \, y \in K_0} | \mathcal{G}(v_1, \ldots, v_n)(y) - f(y)\langle \hat{g}_1(P_{q_1}^1(v_1)), \ldots, \hat{g}_n(P_{q_n}^n(v_n)) \rangle - b | < \epsilon + M \sum_{i=1}^{n} L_i^\epsilon(q_i),

where f \in C(K_0, \mathbb{R}^{p_1 \times p_2 \times \cdots \times p_n}) and

    M = \frac{1}{2} \left( \max_{v_i \in K_i, \, y \in K_0} \mathcal{G}(v_1, \ldots, v_n)(y) - \min_{v_i \in K_i, \, y \in K_0} \mathcal{G}(v_1, \ldots, v_n)(y) \right).

Proof. Replace \mathcal{G}(v_1, \ldots, v_n) by \mathcal{G}(v_1, \ldots, v_n) - b in Theorem 2.10, where

    b = \frac{1}{2} \left( \max_{v_i \in K_i, \, y \in K_0} \mathcal{G}(v_1, \ldots, v_n)(y) + \min_{v_i \in K_i, \, y \in K_0} \mathcal{G}(v_1, \ldots, v_n)(y) \right).

Corollary 2.11 suggests adding a bias, which makes the constant M smaller and
thus decreases the error. In addition, we explore more characteristics of Theorem 2.5
for learning multiple operators.
Corollary 2.12 (approximation theory for multiple operators). Suppose that
X_1, \ldots, X_n, Y_1, \ldots, Y_m are Banach spaces, K_i \subset X_i are compact sets, and X_i have a
Schauder basis with canonical projections P_{q_i}^i = \psi_{q_i}^i \circ \varphi_{q_i}^i. Assume that \mathcal{G}_j : K_1 \times \cdots \times
K_n \to Y_j are continuous operators; then for any \epsilon > 0 the following hold:


(i) There exist positive integers p_i, q_i and continuous vector functions g_i \in C(\mathbb{R}^{q_i}, \mathbb{R}^{p_i})
    and u_j \in Y_j^{p_1 \times p_2 \times \cdots \times p_n}, such that

    \sup_{v_i \in K_i} \left\| \mathcal{G}_j(v_1, \ldots, v_n) - u_j\langle g_1(\varphi_{q_1}^1(v_1)), \ldots, g_n(\varphi_{q_n}^n(v_n)) \rangle \right\| < \epsilon, \quad 1 \leq j \leq m.

(ii) There exist positive integers p_i, q_i, r_j and continuous functions g_i \in C(\mathbb{R}^{q_i}, \mathbb{R}^{p_i}),
    u_j \in Y_j^{r_j}, and W_j \in \mathbb{R}^{p_1 \times \cdots \times p_n \times r_j}, such that

    \sup_{v_i \in K_i} \left\| \mathcal{G}_j(v_1, \ldots, v_n) - W_j\langle g_1(\varphi_{q_1}^1(v_1)), \ldots, g_n(\varphi_{q_n}^n(v_n)), u_j \rangle \right\| < \epsilon, \quad 1 \leq j \leq m.

    If \{e_k^j\} is a Schauder basis for Y_j, we can further have u_j = (e_1^j, e_2^j, \ldots, e_{r_j}^j)^T.
(iii) There exist positive integers p, q_i and continuous functions g_k^i \in C(\mathbb{R}^{q_i}) and
    u_k^j \in Y_j, such that

    \sup_{v_i \in K_i} \left\| \mathcal{G}_j(v_1, \ldots, v_n) - \sum_{k=1}^{p} g_k^1(\varphi_{q_1}^1(v_1)) \cdots g_k^n(\varphi_{q_n}^n(v_n)) \cdot u_k^j \right\| < \epsilon, \quad 1 \leq j \leq m.

Proof. Replace Y by Y_1 \times \cdots \times Y_m in Theorem 2.5.


We list Corollary 2.12 here to emphasize that multiple operators defined on the
same product space can share the same g_i or g_j^i, which indicates a practical approxi-
mation method for operators mapping from X_1 \times \cdots \times X_n to Y_1 \times \cdots \times Y_m.

Corollary 2.13 (linear case). In Theorem 2.5 and Corollary 2.6, if \mathcal{G} is linear
with respect to v_i, then linear g_i and g_j^i are sufficient.

Proof. The proof can be found in Appendix F.

By Corollary 2.13, if we know the operator is linear with respect to v_i, then we
can choose g_i or g_j^i as linear maps in practice to make the learning procedure easier
and generalize better.
Property 2.14. In Theorem 2.5 and Corollary 2.6, if g_i = W_i \cdot h_i for W_i \in
\mathbb{R}^{p_i \times h_i} and h_i \in C(\mathbb{R}^{q_i}, \mathbb{R}^{h_i}), then (2.2) and (2.3) can be rewritten as
(i)  \sup_{v_i \in K_i} \left\| \mathcal{G}(v_1, \ldots, v_n) - \tilde{u}\langle h_1(\varphi_{q_1}^1(v_1)), \ldots, h_n(\varphi_{q_n}^n(v_n)) \rangle \right\| < \epsilon,
(ii) \sup_{v_i \in K_i} \left\| \mathcal{G}(v_1, \ldots, v_n) - \tilde{W}\langle h_1(\varphi_{q_1}^1(v_1)), \ldots, h_n(\varphi_{q_n}^n(v_n)), u \rangle \right\| < \epsilon,

respectively, for a new \tilde{u} \in Y^{h_1 \times \cdots \times h_n} and a new \tilde{W} \in \mathbb{R}^{h_1 \times \cdots \times h_n \times r}.

Proof. The proof can be found in Appendix G.
Proof. The proof can be found in Appendix G.
Property 2.14 shows that the linear output layers of g_i can be removed
without loss of universality. For example, when the g_i are chosen as FNNs, we can
eliminate the redundant parameters of the last linear layer.
Corollary 2.15 (universal approximation theorem for functions). Assume that
f : K_1 \times \cdots \times K_n \to \mathbb{R} is a continuous function for compact K_i \subset \mathbb{R}^{q_i} and \sigma is an
activation function which satisfies the requirements for the approximation theorem of
fully connected neural networks; then for any \epsilon > 0 the following hold:
(i) There exist integers p_i, weights W_i \in \mathbb{R}^{p_i \times q_i}, W \in \mathbb{R}^{p_1 \times \cdots \times p_n}, and biases b_i \in
    \mathbb{R}^{p_i}, such that

    \| f - W\langle \sigma(W_1(\cdot) + b_1), \ldots, \sigma(W_n(\cdot) + b_n) \rangle \|_{C(K_1 \times \cdots \times K_n)} < \epsilon.

(ii) There exist positive integers p, p_i, weights W_i \in \mathbb{R}^{p_i \times q_i}, w_j^i \in \mathbb{R}^{1 \times p_i}, and biases
    b_i \in \mathbb{R}^{p_i}, such that

    \left\| f - \sum_{j=1}^{p} \left( w_j^1 \sigma(W_1(\cdot) + b_1) \right) \cdots \left( w_j^n \sigma(W_n(\cdot) + b_n) \right) \right\|_{C(K_1 \times \cdots \times K_n)} < \epsilon.

Proof. We take X_i = \mathbb{R}^{q_i} and Y = \mathbb{R} in Lemma 2.9. Then the corollary can be
obtained by the universal approximation theorem for fully connected neural networks
with one hidden layer and Property 2.14.
When n = 1, Corollary 2.15 degenerates to the classical universal approximation
theorem for fully connected neural networks with one hidden layer. Generally speak-
ing, K_1 \times \cdots \times K_n can be regarded as a compact set in \mathbb{R}^{q_1 + \cdots + q_n}, and thus f can also be
approximated by FNNs. Compared to FNNs, the two new architectures in Corollary
2.15 divide the input components into several different groups.
Remark 2.16 (tensor neural network). Here, we introduce the tensor neural
network (TNN) as a general function approximator mapping from \mathbb{R}^d to \mathbb{R}^m. Assume
that d = q_1 + \cdots + q_n for positive integers q_1, \ldots, q_n; then a TNN \Phi : \mathbb{R}^d \to \mathbb{R}^m can
be written as

(2.6)    \Phi(x_1, \ldots, x_n) := W(\Psi_1(x_1) \odot \cdots \odot \Psi_n(x_n)), \quad x_i \in \mathbb{R}^{q_i}, \ 1 \leq i \leq n,

where \odot is the Hadamard product (i.e., elementwise product), \Psi_i : \mathbb{R}^{q_i} \to \mathbb{R}^p are
standard neural networks (such as FNNs), and W \in \mathbb{R}^{m \times p} is a trainable weight
matrix. The approximation property for the TNN can be obtained via setting X_i = \mathbb{R}^{q_i}
and Y = \mathbb{R}^m in Theorem 2.5 and Corollary 2.6. A bias can also be added to (2.6).
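As a concrete illustration of (2.6), below is a minimal PyTorch sketch of a TNN (not the
authors' implementation; the network sizes are arbitrary):

    import torch
    import torch.nn as nn

    class TNN(nn.Module):
        """Tensor neural network: Phi(x1, ..., xn) = W (Psi_1(x1) ⊙ ... ⊙ Psi_n(xn)) + bias."""
        def __init__(self, q_list, p, m):
            super().__init__()
            # One sub-network Psi_i: R^{q_i} -> R^p per input group.
            self.subnets = nn.ModuleList(
                [nn.Sequential(nn.Linear(q, 64), nn.ReLU(), nn.Linear(64, p)) for q in q_list]
            )
            self.W = nn.Linear(p, m)  # trainable weight matrix W (with bias)

        def forward(self, xs):
            # xs is a list of tensors, xs[i] of shape (batch, q_i).
            prod = self.subnets[0](xs[0])
            for net, x in zip(self.subnets[1:], xs[1:]):
                prod = prod * net(x)  # Hadamard product of the sub-network outputs
            return self.W(prod)

    tnn = TNN(q_list=[2, 3], p=32, m=1)
    out = tnn([torch.rand(8, 2), torch.rand(8, 3)])  # shape (8, 1)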
2.5. View through the tensor product of Banach spaces. We have pre-
sented all the main theorems and related analysis. Here, we provide another view
of the theory through the tensor product of Banach spaces; this section is optional,
and the reader's understanding of the subsequent content will not be hindered if it is
skipped. We also refer the reader to [40] for more details.

Recall that we aim to learn a continuous operator

    \mathcal{G} \in C(K, Y), \quad K = K_1 \times \cdots \times K_n,

where K_i is a compact set in a Banach space X_i. By the injective tensor product, we
have

    C(K, Y) \cong C(K) \hat{\otimes}_\varepsilon Y,

with the canonical linear map defined as

    J : C(K) \otimes Y \to C(K, Y), \quad \sum_{j=1}^{p} f_j \otimes u_j \mapsto \sum_{j=1}^{p} f_j \cdot u_j

for a representation \mu = \sum_{j=1}^{p} f_j \otimes u_j \in C(K) \otimes Y. Here, C(K) \hat{\otimes}_\varepsilon Y is the completion
of C(K) \otimes Y with the injective norm \varepsilon(\mu) = \sup_{v \in K} \left\| \sum_{j=1}^{p} f_j(v) u_j \right\|, and we have
an isometric isomorphism between C(K) \hat{\otimes}_\varepsilon Y and C(K, Y); for convenience it is still
denoted as J. Then for \mathcal{G} \in C(K, Y) and any \epsilon > 0, there exists \sum_{j=1}^{p} f_j \otimes u_j such
that

(2.7)    \left\| \mathcal{G} - \sum_{j=1}^{p} f_j \cdot u_j \right\|_{C(K,Y)} = \varepsilon\left( J^{-1}\mathcal{G} - \sum_{j=1}^{p} f_j \otimes u_j \right) < \epsilon.

Furthermore, by repeating the decomposition

    C(K_1 \times \cdots \times K_n) \cong C(K_1 \times \cdots \times K_{n-1}, C(K_n)) \cong C(K_1 \times \cdots \times K_{n-1}) \hat{\otimes}_\varepsilon C(K_n),

we obtain

(2.8)    C(K_1 \times K_2 \times \cdots \times K_n, Y) \cong C(K_1) \hat{\otimes}_\varepsilon C(K_2) \hat{\otimes}_\varepsilon \cdots \hat{\otimes}_\varepsilon C(K_n) \hat{\otimes}_\varepsilon Y.

Similar to (2.7), we have

(2.9)    \left\| \mathcal{G} - \sum_{j=1}^{p} f_j^1 \cdot f_j^2 \cdots f_j^n \cdot u_j \right\|_{C(K,Y)} < \epsilon

for some f_j^i \in C(K_i) and u_j \in Y. Note that C(K_i), as a continuous function space on
a compact K_i, has a Schauder basis, denoted as \{g_k^i\}_{k=1}^{\infty}. Let f_j^i = \sum_{k=1}^{\infty} \alpha_{jk}^i g_k^i; then
there exist positive integers p_i such that

(2.10)    \| \mathcal{G} - u\langle g_1, \ldots, g_n \rangle \|_{C(K,Y)} < \epsilon

for g_i = (g_1^i, \ldots, g_{p_i}^i)^T, u = \left( \sum_{j=1}^{p} \alpha_{jk_1}^1 \cdots \alpha_{jk_n}^n u_j \right) \in Y^{p_1 \times \cdots \times p_n}. Furthermore, if Y has
a Schauder basis \{e_k\} and u_j = \sum_{k=1}^{\infty} \beta_{jk} e_k, there exists an r such that

(2.11)    \| \mathcal{G} - W\langle g_1, \ldots, g_n, e \rangle \|_{C(K,Y)} < \epsilon

for e = (e_1, \ldots, e_r)^T and W = \left( \sum_{j=1}^{p} \alpha_{jk_1}^1 \cdots \alpha_{jk_n}^n \beta_{jk_{n+1}} \right) \in \mathbb{R}^{p_1 \times \cdots \times p_n \times r}. Hence,
we have also obtained the three approximation formulas (2.9)--(2.11) corresponding
to (2.2)--(2.4), with the additional information that the components of g_i and e are
all basis functions.
Next we analyze the complexity of the approximation in terms of tensor rank.
The operator \mathcal{G} in (2.11) is discretely represented by a tensor W \in \mathbb{R}^{p_1 \times \cdots \times p_n \times r},
since g_i and e are fixed bases. The number of parameters of W grows exponentially
with respect to n, so directly using W in computation is too expensive for large n.
However, if we rewrite W as

    W = \left( \sum_{j=1}^{p} \alpha_{jk_1}^1 \cdots \alpha_{jk_n}^n \beta_{jk_{n+1}} \right) = \sum_{j=1}^{p} a_j^1 \otimes \cdots \otimes a_j^n \otimes b_j

for a_j^i = (\alpha_{jk}^i) \in \mathbb{R}^{p_i}, b_j = (\beta_{jk}) \in \mathbb{R}^r, then we have

(2.12)    W\langle g_1, \ldots, g_n, e \rangle = \sum_{j=1}^{p} f_j^1 \cdot f_j^2 \cdots f_j^n \cdot u_j

for f_j^i = \sum_{k=1}^{p_i} \alpha_{jk}^i g_k^i, u_j = \sum_{k=1}^{r} \beta_{jk} e_k. Here, W in (2.12) is a tensor of rank at most
p. From this point of view, (2.9) also gives a low-rank approximation by the tensor

    \mu = \sum_{j=1}^{p} f_j^1 \otimes \cdots \otimes f_j^n \otimes u_j

of rank at most p. We note that it is usually difficult to determine the rank of high-
order tensors, which is NP-hard [11], but in some cases there exist relationships
between the dimension of W and its rank p. For example, if W \in \mathbb{R}^{p_1 \times p_2 \times p_3}, then
the rank of W has an upper bound [16, 19]:

    \mathrm{rank}(W) \leq \min\{p_1 p_2, p_1 p_3, p_2 p_3\}.
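The storage savings of the rank-p representation can be checked directly. A small NumPy
sketch (sizes are arbitrary) comparing the full tensor W with its rank-p factors and verifying that
both give the same contraction W\langle \alpha^1, \alpha^2, e \rangle:

    import numpy as np

    p, p1, p2, r = 4, 30, 30, 30
    a1 = np.random.rand(p, p1)              # a_j^1
    a2 = np.random.rand(p, p2)              # a_j^2
    b = np.random.rand(p, r)                # b_j

    # Full tensor: W = sum_j a_j^1 ⊗ a_j^2 ⊗ b_j  (p1*p2*r = 27000 entries)
    W = np.einsum("ji,jk,jl->ikl", a1, a2, b)
    print(W.size, a1.size + a2.size + b.size)   # 27000 vs 360 parameters

    alpha1, alpha2 = np.random.rand(p1), np.random.rand(p2)
    full = np.einsum("ikl,i,k->l", W, alpha1, alpha2)                   # contraction with the full W
    low_rank = np.einsum("ji,i,jk,k,jl->l", a1, alpha1, a2, alpha2, b)  # same contraction via the factors
    print(np.allclose(full, low_rank))          # True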

In short, we have a more general viewpoint for our results. Assume that Y =
C(K_0) for a compact set K_0 in a Banach space X_0; then, depending on the level of
decomposition applied, we have the following cases:

(2.13)    C(K, Y) = C(K_1 \times \cdots \times K_n \times K_0)                                      (standard NN),
(2.14)    C(K, Y) = C(K_1 \times \cdots \times K_n) \hat{\otimes}_\varepsilon C(K_0)                  (DeepONet),
(2.15)    C(K, Y) = C(K_1) \hat{\otimes}_\varepsilon \cdots \hat{\otimes}_\varepsilon C(K_n) \hat{\otimes}_\varepsilon C(K_0)    (MIONet),

where K = K_1 \times \cdots \times K_n. We discuss how the three different representations lead to
different network architectures as follows.
\bullet In (2.13), we first combine all the inputs from all the spaces and then pass
  them into a function (i.e., a machine learning model) to approximate \mathcal{G}. When
  we restrict the model to be a neural network, it is a standard NN, such as an
  FNN, a residual neural network (ResNet), a convolutional neural network (CNN),
  etc.
\bullet In (2.14), we first split the input and output spaces, and have one model for
  the input space and one model for the output space, and then we combine
  them. When both models are standard NNs, this leads to the same archi-
  tecture as DeepONet [28]. Therefore, DeepONet with this simple extension
  can handle the multiple-input case via treating X_1 \times \cdots \times X_n as a single
  Banach space. However, this treatment will lose the structure of the product
  space, which leads to a large generalization error, as we demonstrate in our
  numerical experiments.
\bullet In (2.15), we split all the spaces with one model for each space, and then
  combine them to compute the output. This leads to our proposed MIONet
  in section 3, where each model is a standard NN.
3. Operator regression methods. Based on our theory, we propose a new
neural operator, MIONet, for multiple-input operator regression.
3.1. Network architectures. The architectures of MIONet are designed based
on Theorem 2.5 and Corollary 2.6 with Y = C(K_0) for a compact set K_0 \subset \mathbb{R}^d. We
design two slightly different versions of MIONet according to different formulas as
follows.
MIONet (high-rank). We first construct the architecture according to (2.2) and
(2.3). Note that the architecture induced by (2.3) is technically equivalent to (2.2) as
we discussed in section 2.3. Specifically, we use f \in C(K_0, \mathbb{R}^{p_1 \times \cdots \times p_n}) to denote the
u in (2.2), and we approximate g_i and f by independent neural networks denoted by
\tilde{g}_i (called branch net i) and \tilde{f} (called the trunk net). We also add a trainable bias b \in \mathbb{R}
according to Corollary 2.11. Then the network is

(3.1)    \tilde{\mathcal{G}}(v_1, \ldots, v_n)(y) = \tilde{f}(y)\langle \tilde{g}_1(\varphi_{q_1}^1(v_1)), \ldots, \tilde{g}_n(\varphi_{q_n}^n(v_n)) \rangle + b.

MIONet has n independent branch nets and one trunk net. The ith branch net \tilde{g}_i
encodes the input function v_i, and the trunk net \tilde{f} encodes the input y. The output
tensor of the trunk net has a high rank as we discussed in section 2.5. We note that
the last linear layer of each branch net can be removed by Property 2.14 to reduce
the number of parameters.
MIONet (low-rank; the default version). We then construct MIONet according to
(2.4). Specifically, g_i = (g_1^i, \ldots, g_p^i)^T \in C(\mathbb{R}^{q_i}, \mathbb{R}^p) and f = (u_1, \ldots, u_p)^T \in C(K_0, \mathbb{R}^p)
are approximated by neural networks \tilde{g}_i (called branch net i) and \tilde{f} (called the trunk net).
Then the network (Figure 1) is

(3.2)    \tilde{\mathcal{G}}(v_1, \ldots, v_n)(y) = \mathcal{S}\left( \tilde{g}_1(\varphi_{q_1}^1(v_1)) \odot \cdots \odot \tilde{g}_n(\varphi_{q_n}^n(v_n)) \odot \tilde{f}(y) \right) + b,

where \odot is the Hadamard product, \mathcal{S} is the summation of all the components of a
vector, and b \in \mathbb{R} is a trainable bias. This MIONet is a low-rank version of the MIONet
(high-rank) above, which greatly reduces the number of parameters of the trunk net.
Furthermore, if the output function is also defined on a product space, e.g., C(K_0) =
C(K_0^1 \times K_0^2), we can choose to further decompose it into C(K_0^1) \hat{\otimes}_\varepsilon C(K_0^2), and the
corresponding MIONet can be built similarly. As a special case, if the image space Y
is finite dimensional, we show the corresponding low-rank MIONet in Appendix H.
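To make (3.2) concrete, the following is a minimal PyTorch sketch of the low-rank MIONet
forward pass (a sketch under illustrative layer sizes, not the authors' DeepXDE implementation):

    import torch
    import torch.nn as nn

    def mlp(d_in, width, d_out):
        return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                             nn.Linear(width, width), nn.ReLU(),
                             nn.Linear(width, d_out))

    class MIONet(nn.Module):
        """Low-rank MIONet (3.2): Hadamard product of the branch outputs and the trunk output,
        summed over the latent dimension p, plus a trainable bias."""
        def __init__(self, q_list, d_y, p=200, width=200):
            super().__init__()
            self.branches = nn.ModuleList([mlp(q, width, p) for q in q_list])  # one branch net per input function
            self.trunk = mlp(d_y, width, p)                                    # trunk net for the query point y
            self.bias = nn.Parameter(torch.zeros(1))

        def forward(self, vs, y):
            # vs[i]: (batch, q_i) sampled values of the ith input function; y: (batch, d_y) query points
            out = self.trunk(y)
            for branch, v in zip(self.branches, vs):
                out = out * branch(v)
            return out.sum(dim=-1, keepdim=True) + self.bias

    net = MIONet(q_list=[100, 100], d_y=2)
    u = net([torch.rand(16, 100), torch.rand(16, 100)], torch.rand(16, 2))  # (16, 1)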
The universal approximation theorem for MIONet, Theorem 3.1, follows easily from
Theorem 2.5 and Corollary 2.6.

Theorem 3.1 (universal approximation theorem for MIONet). Under the set-
ting of Theorem 2.5 with Y = C(K_0), for any \epsilon > 0 there exists a MIONet (low-/
high-rank) \tilde{\mathcal{G}}, such that

    \| \mathcal{G} - \tilde{\mathcal{G}} \|_{C(K_1 \times \cdots \times K_n, C(K_0))} < \epsilon.

Proof. We consider \mathcal{F} mapping from C(\varphi_{q_1}^1(K_1), \mathbb{R}^p) \times \cdots \times C(\varphi_{q_n}^n(K_n), \mathbb{R}^p) \times
C(K_0, \mathbb{R}^p) to C(K_1 \times \cdots \times K_n, C(K_0)) defined as

    \mathcal{F}(g_1, \ldots, g_n, f)(v_1, \ldots, v_n) := \mathcal{S}\left( g_1(\varphi_{q_1}^1(v_1)) \odot \cdots \odot g_n(\varphi_{q_n}^n(v_n)) \odot f \right).

It is easy to verify that \mathcal{F} is continuous. Then the universal approximation theo-
rem of MIONet can be easily obtained from Theorem 2.5, Corollary 2.6, and the
approximation property of neural networks.
Connections to DeepONet. Our proposed MIONet is related to DeepONet. When
there is only one input function, i.e., n = 1, MIONet becomes DeepONet with one
branch net and one trunk net.

Fig. 1. Architecture of MIONet. All the branch nets and the trunk net have the same number
of outputs, which are merged together via the Hadamard product and a subsequent summation.

Remark. In this work, we mainly consider MIONet (low-rank), as MIONet (high-
rank) is computationally expensive. We note that MIONet is a high-level architecture,
where the neural networks \tilde{g}_i and \tilde{f} can be chosen as any valid NNs, such as FNN,
ResNet, CNN, etc., depending on the specific problem. All the techniques developed
for DeepONet in [29] can be directly used for MIONet. For example, we can encode
the periodicity in the trunk net to ensure that the predicted functions from MIONet are
always periodic. We refer the reader to [29] for more details.
3.2. Other computational details.
Data. One data point in the dataset is comprised of input functions and their
corresponding output function, i.e., (v_1, \ldots, v_n, \mathcal{G}(v_1, \ldots, v_n)). In the first step of
MIONet, we project the input functions v_i onto finite-dimensional spaces as stated
in (3.1) and (3.2), which can be done separately before the network training. Hence,
the network input in practice is (\varphi_{q_1}^1(v_1), \ldots, \varphi_{q_n}^n(v_n)), and then the dataset takes the
form

(3.3)    \mathcal{T} = \left\{ \left( \varphi_{q_1}^1(v_1^k), \ldots, \varphi_{q_n}^n(v_n^k), y_k, \mathcal{G}(v_1^k, \ldots, v_n^k)(y_k) \right) \right\}_{k=1}^{N},

where the first n entries are the inputs of branch nets 1 to n, y_k \in K_0 \subset \mathbb{R}^d is a single
point location in the domain of the output function (the trunk input), and the last
entry is the target output.
Training. For a training dataset \mathcal{T}, in this study we use a standard mean squared
error (MSE),

    \mathrm{MSE} = \frac{1}{N} \sum_{k=1}^{N} | s_k - \tilde{\mathcal{G}}(v_1^k, \ldots, v_n^k)(y_k) |^2,

where s_k = \mathcal{G}(v_1^k, \ldots, v_n^k)(y_k) is the target output.
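Continuing the earlier PyTorch sketch of the low-rank MIONet, a minimal training step with the
MSE loss and Adam (synthetic random data stands in for the dataset (3.3); not the authors'
DeepXDE code):

    # assumes the MIONet class and the instance net from the sketch in section 3.1
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    v1 = torch.rand(64, 100)   # varphi^1_{q1}(v_1^k): branch-1 inputs
    v2 = torch.rand(64, 100)   # varphi^2_{q2}(v_2^k): branch-2 inputs
    y = torch.rand(64, 2)      # trunk inputs y_k
    s = torch.rand(64, 1)      # targets G(v_1^k, v_2^k)(y_k)

    for epoch in range(1000):
        pred = net([v1, v2], y)
        loss = torch.mean((s - pred) ** 2)   # the MSE above
        opt.zero_grad()
        loss.backward()
        opt.step()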

We also provide alternative losses via numerical integration in Appendix I.


Inference. For new input functions v_1, \ldots, v_n, the prediction is simply given by
\tilde{\mathcal{G}}(v_1, \ldots, v_n). We note that \tilde{\mathcal{G}}(v_1, \ldots, v_n) is a function given by neural networks,
which can be evaluated at arbitrary points without interpolation.
4. Numerical results. To demonstrate the capability of MIONet, we learn
three different operators of ODEs and PDEs. In the experiments, we directly evaluate
the function values at uniform grid points as the input of branch nets, i.e., each \varphi
takes 100 equidistant sampling points in [0, 1] for each input function. The branch


and trunk nets are all chosen as fully connected neural networks (FNNs) unless noted
otherwise. Each branch or trunk net has the same number of neurons (i.e., width) for
each layer. The activation in all networks is set to ReLU. We train all the networks by
the Adam optimizer [13]. To evaluate the performance of the networks, we compute
the L2 relative error of the predictions, and for each case five independent training
trials are performed to compute the mean error and the standard deviation. The
code in this study is implemented by using the library DeepXDE [30] and is publicly
available from the GitHub repository https://github.com/lu-group/mionet.
MIONet is the first neural operator designed for multiple inputs, and there is no
other network that can be directly compared to MIONet. In order to compare the per-
formance of MIONet and DeepONet, we concatenate all the input functions together
as a single input of the DeepONet branch net as discussed in section 2.5. Specifically, if we
assume that \{e_j^i\} is a Schauder basis of X_i, then \{(e_j^1, 0, \ldots, 0)\} \cup \{(0, e_j^2, 0, \ldots, 0)\} \cup
\cdots \cup \{(0, \ldots, 0, e_j^n)\} is, in fact, a Schauder basis of the Banach space X_1 \times \cdots \times X_n.
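In practice this baseline amounts to concatenating the sampled input functions into one branch
input. A small sketch (array shapes are illustrative, not from the paper):

    import numpy as np

    v1 = np.random.rand(1000, 100)   # 1000 samples of varphi^1_{q1}(v_1), 100 grid values each
    v2 = np.random.rand(1000, 100)   # 1000 samples of varphi^2_{q2}(v_2)

    deeponet_branch_input = np.concatenate([v1, v2], axis=1)  # single branch input of size 200
    mionet_branch_inputs = [v1, v2]                           # one input per branch net
    print(deeponet_branch_input.shape)                        # (1000, 200)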
The hyperparameters of DeepONet and MIONet are chosen as follows. For each
case, we perform a grid search for the depth and width to find the best accuracy of
DeepONet. Then we set the depth of MIONet to be the same as DeepONet and man-
ually tune the width, so it is possible that MIONet can achieve even better accuracy
than results reported below.
4.1. An ODE system. We first consider a nonlinear ODE system,

    \frac{du_1}{dt} = u_2, \quad \frac{du_2}{dt} = -f_1(t) \sin(u_1) + f_2(t), \quad t \in [0, 1],

with the initial condition u_1(0) = u_2(0) = 0. We learn the operator mapping from f_1
and f_2 to one of the ODE solutions, u_1:

    \mathcal{G} : (f_1, f_2) \mapsto u_1.
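A minimal sketch of generating one solution sample with SciPy (f_1 and f_2 below are placeholder
callables standing in for the GRF samples described next; not the authors' data-generation code):

    import numpy as np
    from scipy.integrate import solve_ivp

    f1 = lambda t: np.sin(2 * np.pi * t)     # placeholder for a GRF sample of f_1
    f2 = lambda t: np.cos(2 * np.pi * t)     # placeholder for a GRF sample of f_2

    def rhs(t, u):
        u1, u2 = u
        return [u2, -f1(t) * np.sin(u1) + f2(t)]

    t_eval = np.linspace(0, 1, 100)
    sol = solve_ivp(rhs, (0, 1), [0.0, 0.0], t_eval=t_eval, rtol=1e-8)
    u1 = sol.y[0]   # G(f1, f2) evaluated at 100 equidistant grid points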

To generate the dataset, f_1 and f_2 are both sampled from a Gaussian random
field (GRF)

    \mathcal{GP}(0, k_l(x_1, x_2)),

where the covariance kernel k_l(x_1, x_2) = \exp(-\|x_1 - x_2\|^2 / (2l^2)) is the Gaussian kernel
with a length-scale parameter l. Here, we choose l = 0.2. We set the number of
functions in the training set to 1000 and in the test set to 100,000, and for each
pair (f_1, f_2) we compute the numerical solution of u_1 at 100 equidistant grid points
in [0, 1]. We train the networks for 100,000 epochs with learning rate 0.001.
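A minimal sketch of sampling such a GRF on a grid (not the authors' data-generation code; the
jitter term is a numerical-stability assumption):

    import numpy as np

    def sample_grf(n_points=100, l=0.2, n_samples=1):
        """Sample from GP(0, k_l) with k_l(x1, x2) = exp(-|x1 - x2|^2 / (2 l^2)) on [0, 1]."""
        x = np.linspace(0, 1, n_points)
        K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * l ** 2))
        K += 1e-10 * np.eye(n_points)  # jitter for numerical stability
        return np.random.multivariate_normal(np.zeros(n_points), K, size=n_samples)

    f1 = sample_grf()   # one realization of f_1 on 100 equidistant points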
MIONet has an L2 relative error of 1.69% (Table 1), which outperforms Deep-
ONet with almost the same number of parameters (2.41%). We also perform a grid
search for the depth and width to find the best accuracy of DeepONet; the best
accuracy is 2.26%, which is still worse than MIONet.
Table 1
MIONet and DeepONet for an ODE system. DeepONet (same size) has the same number of
parameters as MIONet. DeepONet (best) is the best result chosen from depth 2--5 and width 100--400.

                          Depth   Width   No. of parameters   L2 relative error
    MIONet                  2      200          161K           1.69 ± 0.13%
    DeepONet (same size)    2      312          161K           2.41 ± 0.27%
    DeepONet (best)         2      300          151K           2.26 ± 0.14%


4.2. A diffusion-reaction system. We consider a nonlinear diffusion-reaction
system

    \frac{\partial u}{\partial t} = \frac{\partial}{\partial x}\left( D(x) \frac{\partial u}{\partial x} \right) + k u^2 + g(x), \quad x \in [0, 1], \ t \in [0, 1],

with zero initial and boundary conditions, where D(x) = 0.01(|f(x)| + 1) and k = 0.01
is the reaction rate. We learn the operator

    \mathcal{G} : (D, g) \mapsto u.
In the dataset, f and g are generated by a GRF with length scale 0.2. We set the
number of pairs (D, g) in the training dataset to 1000 and in the test dataset to
5000, and for each pair we solve for u on a grid of 100 \times 100 equidistant points. We
train each case for 100,000 epochs with learning rate 0.001.
The error of MIONet is significantly less than that of DeepONet of similar size
and also that of the best DeepONet (Table 2). In Figure 2, we show an example of
the inputs and the corresponding PDE solution. We also show the prediction and
pointwise error of DeepONet and MIONet.
Table 2
MIONet and DeepONet for a diffusion-reaction system. DeepONet (same size) has the same
number of parameters as MIONet. DeepONet (best) is the best DeepONet result chosen from depth
2--5 and width 100--400.

                          Depth   Width   Parameters   L2 relative error
    MIONet                  2      200       161K       1.97 ± 0.11%
    DeepONet (same size)    2      312       161K       5.25 ± 0.38%
    DeepONet (best)         2      400       242K       5.18 ± 0.11%

4.3. An advection-diffusion system. We consider an advection-diffusion sys-
tem

    \frac{\partial u}{\partial t} + \frac{\partial u}{\partial x} - D(x) \frac{\partial^2 u}{\partial x^2} = 0, \quad x \in [0, 1], \ t \in [0, 1],

with a periodic boundary condition and the initial condition u_0(x) = u(x, 0) =
f_1(\sin^2(\pi x)), where D(x) = 0.05 |f_2(\sin^2(\pi x))| + 0.05 is the diffusion coefficient. We
aim to learn the operator

    \mathcal{G} : (D, u_0) \mapsto u.
In the dataset, f_1 and f_2 are sampled from a GRF with length scale 0.5. The
training/test dataset consists of 1000 pairs (D, u_0). For each (D, u_0), we solve
for u numerically on a grid of size 100 \times 100 and randomly select 100 values
of u out of the 10,000 grid points. We train each case for 100,000 epochs with learning
rate 0.0002.
Here, we show how to encode the prior information of this problem. Since the
operator \mathcal{G} is linear with respect to the initial condition u_0, we choose the branch net
for u_0 in MIONet to be a linear network, i.e., a linear layer without bias. Moreover,
because the solution u is periodic with respect to x, we decompose the single trunk
net into two independent networks, one for x and one for t. For the trunk net of x, we
apply a periodic layer as the input of the FNN [29]:

    \mathrm{Trunk}(x) = \mathrm{FNN}(\cos(2\pi x), \sin(2\pi x), \cos(4\pi x), \sin(4\pi x)), \quad x \in \mathbb{R}.

Fig. 2. Example of the diffusion-reaction system. (Top) Examples of the input functions (left)
and the reference solution (right). (Middle) MIONet prediction and corresponding absolute error.
(Bottom) DeepONet prediction and corresponding absolute error.

It is easy to check that by using these cos and sin features, MIONet is automatically
periodic with respect to x. We present the illustration of the modified MIONet in
Figure 3.
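A minimal PyTorch sketch of these two modifications, i.e., a linear, bias-free branch net for u_0
and a trunk net for x with the periodic feature layer (layer sizes are illustrative, not the authors'
configuration):

    import math
    import torch
    import torch.nn as nn

    # Branch net for the initial condition u_0: linear, without bias (G is linear in u_0).
    branch_u0 = nn.Linear(100, 248, bias=False)

    class PeriodicTrunkX(nn.Module):
        """Trunk net for x: FNN applied to (cos 2πx, sin 2πx, cos 4πx, sin 4πx),
        so the output is automatically 1-periodic in x."""
        def __init__(self, p=248, width=248):
            super().__init__()
            self.fnn = nn.Sequential(nn.Linear(4, width), nn.ReLU(), nn.Linear(width, p))

        def forward(self, x):  # x: (batch, 1)
            feats = torch.cat([torch.cos(2 * math.pi * x), torch.sin(2 * math.pi * x),
                               torch.cos(4 * math.pi * x), torch.sin(4 * math.pi * x)], dim=-1)
            return self.fnn(feats)

    trunk_x = PeriodicTrunkX()
    out = trunk_x(torch.rand(16, 1))  # (16, 248); combined with the trunk net for t via a Hadamard product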
We show the accuracy of different networks in Table 3. MIONet performs sig-
nificantly better than DeepONet (same size) and DeepONet (best). By encoding the
periodicity information, MIONet (periodic) obtains the smallest prediction error. An
example of prediction of MIONet (periodic) is shown in Figure 4.
We also investigate how the dataset size affects the accuracy of MIONet and
DeepONet. We train the MIONet (periodic) and DeepONet (best) architectures in
Table 3 for dataset sizes ranging from 200 to 4000, and the results are shown in Figure
5. The prediction errors of both MIONet and DeepONet monotonically decrease with
the increase of dataset size. MIONet always performs better than DeepONet, but for
a larger dataset, the difference becomes smaller.

Fig. 3. Architecture of the modified MIONet for the advection-diffusion system. There are two
trunk nets, one for x and one for t. The trunk net of x has a periodic layer.

Table 3
MIONet and DeepONet for the advection-diffusion system. MIONet (periodic) has a periodic
layer for the trunk net of x. MIONet, MIONet (periodic), and DeepONet (same size) have the same
number of parameters. DeepONet (best) is the best DeepONet chosen from depth 2--5 and width
100--400.

                          Depth   Width   Parameters   L2 relative error
    MIONet                  3      300       422K       1.98 ± 0.07%
    MIONet (periodic)       3      248       422K       1.29 ± 0.09%
    DeepONet (same size)    3      343       424K       7.83 ± 0.49%
    DeepONet (best)         3      300       332K       7.70 ± 0.69%

5. Conclusions. In this study, we aim to learn an operator mapping from a
product of multiple Banach spaces to another Banach space. Our main contribution
is that for the first time, we provide universal approximation theorems for multiple-
input operator regression based on the tensor product of Banach spaces. Based on the
theory and a low-rank tensor approximation, we propose a new network architecture,
MIONet, which consists of multiple branch nets for encoding the input functions
and one trunk net for encoding the domain of the output function. To show the
effectiveness of MIONet, we have performed three experiments including an ODE
system, a diffusion-reaction system, and an advection-diffusion system. We also show
that MIONet can be flexibly customized to encode prior knowledge.
In future work, more experiments should be done to test the performance of
MIONet on diverse problems. Moreover, MIONet can be viewed as an extension of
DeepONet from a single branch net to multiple branch nets, and thus recent devel-
opments and extensions of DeepONet (see the discussion in the introduction) can
be directly applied to MIONet. For example, similar to DeepONet with proper or-
thogonal decomposition (POD-DeepONet) [29], we can employ POD in MIONet to
develop POD-MIONet. We can also embed physics into the loss function [43, 10] of
MIONet to develop physics-informed MIONet. These techniques will further improve
the accuracy and efficiency of MIONet.

Appendix A. Proof of Property 2.4. Since X is a Banach space, P_n
is uniformly bounded by the basis constant C. For any \epsilon > 0, we choose finitely many
points \{x_i\}_{i=1}^{k} \subset K such that the union of open balls \cup_{i=1}^{k} B(x_i, \delta) covers K, where
\delta = \frac{\epsilon}{2(1+C)}. There exists a large integer m \in \mathbb{N}^* such that \|x_i - P_n(x_i)\| < \frac{\epsilon}{2} holds
for all 1 \leq i \leq k and n \geq m. When n \geq m, for any x \in K, assume that x \in B(x_j, \delta);

then

    \|x - P_n(x)\| = \|(I - P_n)(x - x_j) + x_j - P_n(x_j)\|
                  \leq \|I - P_n\| \cdot \|x - x_j\| + \|x_j - P_n(x_j)\|
                  < (1 + C) \cdot \frac{\epsilon}{2(1 + C)} + \frac{\epsilon}{2}
                  = \epsilon.

Fig. 4. Prediction of MIONet (periodic) for the advection-diffusion system.

Fig. 5. L2 relative errors of MIONet and DeepONet for different dataset sizes. MIONet
(periodic) and DeepONet (best) architectures in Table 3 are applied to the advection-diffusion system.

Appendix B. Proof of Lemma 2.9. As $\mathcal{G}$ is uniformly continuous on $K_1 \times \cdots \times K_n$, there exists a $\delta > 0$ such that $\|\mathcal{G}(v_1, \ldots, v_n) - \mathcal{G}(v_1', \ldots, v_n')\| < \epsilon$ holds for all $v_i, v_i' \in K_i$, $\|v_i - v_i'\| < \delta$, $1 \leq i \leq n$. Due to the compactness of $K_i$, we can choose $\{\nu_j^i\}_{j=1}^{p_i} \subset K_i$ such that
\[
(\mathrm{B.1}) \qquad \bigcup_{j=1}^{p_i} B(\nu_j^i, \delta) \supset K_i,
\]
where $B(\nu_j^i, \delta)$ denotes the open ball centered at $\nu_j^i$ with radius $\delta$, $1 \leq i \leq n$. Now define $\tilde{g}_i : X_i \to \mathbb{R}^{p_i}$ as
\[
\tilde{g}_i(x) = \big(\mathrm{ReLU}(\delta - \|x - \nu_1^i\|), \mathrm{ReLU}(\delta - \|x - \nu_2^i\|), \ldots, \mathrm{ReLU}(\delta - \|x - \nu_{p_i}^i\|)\big)^T,
\]
and $\hat{g}_i : X_i \to \mathbb{R}^{p_i}$ as
\[
\hat{g}_i(x) = \frac{\tilde{g}_i(x)}{\|\tilde{g}_i(x)\|_1 + d(x, K_i)},
\]
where $\mathrm{ReLU}(x) := \max(x, 0)$, and $d(x, K_i) := \inf_{x' \in K_i} \|x - x'\|$ represents the distance between $x$ and $K_i$. $\hat{g}_i$ is, in fact, the normalization of $\tilde{g}_i$ on $K_i$, and condition (B.1) guarantees that $\hat{g}_i(x)$ is well defined everywhere, i.e., $\|\tilde{g}_i(x)\|_1$ is nonzero on $K_i$ while $d(x, K_i)$ is nonzero outside $K_i$. Moreover, define $u \in Y^{p_1 \times p_2 \times \cdots \times p_n}$ as
\[
u = \big(\mathcal{G}(\nu_{j_1}^1, \nu_{j_2}^2, \ldots, \nu_{j_n}^n)\big)_{p_1 \times p_2 \times \cdots \times p_n}.
\]
We will show that the constructed $\hat{g}_i$ and $u$ are what we need.
Denote $\hat{g}_i = (\hat{g}_i^1, \ldots, \hat{g}_i^{p_i})^T$, and define $A_i[v] = \{j \in \{1, 2, \ldots, p_i\} \mid \|\nu_j^i - v\| < \delta\}$ for $v \in K_i$. Given any $v_i \in K_i$, we have
\begin{align*}
\|\mathcal{G}&(v_1, \ldots, v_n) - u\langle \hat{g}_1(v_1), \ldots, \hat{g}_n(v_n)\rangle\| \\
&= \Big\| \mathcal{G}(v_1, \ldots, v_n) - \sum_{j_1, \ldots, j_n} \mathcal{G}(\nu_{j_1}^1, \ldots, \nu_{j_n}^n) \cdot \hat{g}_1^{j_1}(v_1) \cdots \hat{g}_n^{j_n}(v_n) \Big\| \\
&= \Big\| \sum_{j_1, \ldots, j_n} \big( \mathcal{G}(v_1, \ldots, v_n) \cdot \hat{g}_1^{j_1}(v_1) \cdots \hat{g}_n^{j_n}(v_n) - \mathcal{G}(\nu_{j_1}^1, \ldots, \nu_{j_n}^n) \cdot \hat{g}_1^{j_1}(v_1) \cdots \hat{g}_n^{j_n}(v_n) \big) \Big\| \\
&= \Big\| \sum_{j_1, \ldots, j_n} \big( \mathcal{G}(v_1, \ldots, v_n) - \mathcal{G}(\nu_{j_1}^1, \ldots, \nu_{j_n}^n) \big) \cdot \hat{g}_1^{j_1}(v_1) \cdots \hat{g}_n^{j_n}(v_n) \Big\| \\
&= \Big\| \sum_{\substack{j_i \in A_i[v_i] \\ 1 \leq i \leq n}} \big( \mathcal{G}(v_1, \ldots, v_n) - \mathcal{G}(\nu_{j_1}^1, \ldots, \nu_{j_n}^n) \big) \cdot \hat{g}_1^{j_1}(v_1) \cdots \hat{g}_n^{j_n}(v_n) \Big\| \\
&\leq \sum_{\substack{j_i \in A_i[v_i] \\ 1 \leq i \leq n}} \big\| \mathcal{G}(v_1, \ldots, v_n) - \mathcal{G}(\nu_{j_1}^1, \ldots, \nu_{j_n}^n) \big\| \cdot \hat{g}_1^{j_1}(v_1) \cdots \hat{g}_n^{j_n}(v_n) \\
&< \sum_{\substack{j_i \in A_i[v_i] \\ 1 \leq i \leq n}} \epsilon \cdot \hat{g}_1^{j_1}(v_1) \cdots \hat{g}_n^{j_n}(v_n) \\
&= \epsilon.
\end{align*}
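For intuition, the functions $\tilde{g}_i$ and $\hat{g}_i$ constructed above can be examined numerically. The sketch below (ours) takes $X_i = \mathbb{R}$ and $K_i = [0, 1]$ with a few centers $\nu_j^i$, and checks that the normalized features form a partition of unity on $K_i$ and decay to zero away from it; the value of $\delta$ is illustrative, not the one dictated by the uniform continuity of $\mathcal{G}$.

import numpy as np

def g_hat(x, centers, delta, K_lo=0.0, K_hi=1.0):
    # normalized ReLU bump features from the proof of Lemma 2.9, for X_i = R and K_i = [K_lo, K_hi]
    g_tilde = np.maximum(delta - np.abs(x - centers), 0.0)   # ReLU(delta - |x - nu_j|)
    dist_K = max(K_lo - x, x - K_hi, 0.0)                    # d(x, K_i) for an interval
    return g_tilde / (g_tilde.sum() + dist_K)

centers = np.linspace(0.0, 1.0, 6)   # nu_j: balls of radius delta cover K_i = [0, 1]
delta = 0.25
for x in [0.13, 0.5, 0.97]:          # on K_i the features sum to one (partition of unity)
    assert abs(g_hat(x, centers, delta).sum() - 1.0) < 1e-12
print(g_hat(1.1, centers, delta))    # outside K_i the features decay to zero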

Appendix C. Proof of Theorem 2.10. For any $\epsilon > 0$, there exist $p_i$, $\hat{g}_i$, $u$ as defined in the proof of Lemma 2.9, such that
\[
\sup_{v_i \in K_i} \|\mathcal{G}(v_1, \ldots, v_n) - u\langle \hat{g}_1(v_1), \ldots, \hat{g}_n(v_n)\rangle\| < \epsilon.
\]
Denote $M = \max_{v_i \in K_i} \|\mathcal{G}(v_1, \ldots, v_n)\|$, and then for positive integers $q_i$,
\begin{align*}
\big\| u\langle \hat{g}_1 &\circ P_{q_1}^1(v_1), \ldots, \hat{g}_n \circ P_{q_n}^n(v_n)\rangle - u\langle \hat{g}_1(v_1), \ldots, \hat{g}_n(v_n)\rangle \big\| \\
&= \Big\| \sum_{i_1, \ldots, i_n} \mathcal{G}(\nu_{i_1}^1, \nu_{i_2}^2, \ldots, \nu_{i_n}^n) \cdot \big( \hat{g}_1^{i_1} \circ P_{q_1}^1(v_1) \cdots \hat{g}_n^{i_n} \circ P_{q_n}^n(v_n) - \hat{g}_1^{i_1}(v_1) \cdots \hat{g}_n^{i_n}(v_n) \big) \Big\| \\
&\leq M \sum_{i_1, \ldots, i_n} \big| \hat{g}_1^{i_1} \circ P_{q_1}^1(v_1) \cdots \hat{g}_n^{i_n} \circ P_{q_n}^n(v_n) - \hat{g}_1^{i_1}(v_1) \cdots \hat{g}_n^{i_n}(v_n) \big| \\
&= M \sum_{i_1, \ldots, i_n} \Bigg| \sum_{k=1}^{n} \Bigg( \prod_{j=1}^{k-1} \hat{g}_j^{i_j} \circ P_{q_j}^j(v_j) \Bigg) \Bigg( \prod_{j=k+1}^{n} \hat{g}_j^{i_j}(v_j) \Bigg) \big( \hat{g}_k^{i_k} \circ P_{q_k}^k(v_k) - \hat{g}_k^{i_k}(v_k) \big) \Bigg| \\
&\leq M \sum_{i_1, \ldots, i_n} \sum_{k=1}^{n} \Bigg( \prod_{j=1}^{k-1} \hat{g}_j^{i_j} \circ P_{q_j}^j(v_j) \Bigg) \Bigg( \prod_{j=k+1}^{n} \hat{g}_j^{i_j}(v_j) \Bigg) \big| \hat{g}_k^{i_k} \circ P_{q_k}^k(v_k) - \hat{g}_k^{i_k}(v_k) \big| \\
&= M \sum_{k=1}^{n} \sum_{i_1, \ldots, i_n} \Bigg( \prod_{j=1}^{k-1} \hat{g}_j^{i_j} \circ P_{q_j}^j(v_j) \Bigg) \Bigg( \prod_{j=k+1}^{n} \hat{g}_j^{i_j}(v_j) \Bigg) \big| \hat{g}_k^{i_k} \circ P_{q_k}^k(v_k) - \hat{g}_k^{i_k}(v_k) \big| \\
&\leq M \sum_{k=1}^{n} \sum_{i_k} \big| \hat{g}_k^{i_k} \circ P_{q_k}^k(v_k) - \hat{g}_k^{i_k}(v_k) \big| \\
&= M \sum_{k=1}^{n} \big\| \hat{g}_k \circ P_{q_k}^k(v_k) - \hat{g}_k(v_k) \big\|_1.
\end{align*}
Note that $\sum_{i} \hat{g}_j^{i}(x) \in [0, 1]$ for all $x \in X_j$. Therefore,
\begin{align*}
\big\| \mathcal{G}(v_1, \ldots, v_n) &- u\langle \hat{g}_1(P_{q_1}^1(v_1)), \ldots, \hat{g}_n(P_{q_n}^n(v_n))\rangle \big\| \\
&\leq \|\mathcal{G}(v_1, \ldots, v_n) - u\langle \hat{g}_1(v_1), \ldots, \hat{g}_n(v_n)\rangle\| \\
&\quad + \big\| u\langle \hat{g}_1(P_{q_1}^1(v_1)), \ldots, \hat{g}_n(P_{q_n}^n(v_n))\rangle - u\langle \hat{g}_1(v_1), \ldots, \hat{g}_n(v_n)\rangle \big\| \\
&< \epsilon + M \sum_{k=1}^{n} \big\| \hat{g}_k \circ P_{q_k}^k(v_k) - \hat{g}_k(v_k) \big\|_1 \\
&\leq \epsilon + M \sum_{k=1}^{n} L \epsilon_k(q_k).
\end{align*}

Appendix D. Proof of Theorem 2.5. By Theorem 2.10, for any $\epsilon > 0$, there exist positive integers $p_i$, $q_i$ and continuous vector functionals $\hat{g}_i \in C(X_i, \mathbb{R}^{p_i})$ and $u \in Y^{p_1 \times p_2 \times \cdots \times p_n}$, such that
\[
\sup_{v_i \in K_i} \big\| \mathcal{G}(v_1, \ldots, v_n) - u\langle \hat{g}_1(P_{q_1}^1(v_1)), \ldots, \hat{g}_n(P_{q_n}^n(v_n))\rangle \big\| < \epsilon.
\]
Now we define
\[
g_i = \hat{g}_i \circ \psi_{q_i}^i
\]
and obtain this theorem.


Appendix E. Proof of Corollary 2.6.
(2.2) $\Rightarrow$ (2.4): Denoting $g_i = (g_j^i)$, we have
\[
u\langle g_1, \ldots, g_n \rangle = \sum_{j_1, \ldots, j_n} g_{j_1}^1 \cdots g_{j_n}^n u_{j_1 \cdots j_n},
\]
which is indeed the form (2.4) by rearrangement and relabeling of the summation.
(2.4) $\Rightarrow$ (2.3): Denoting $g_i = (g_j^i)$, $u = (u_j)$, we then have
\[
\sum_{j=1}^{p} g_j^1 \cdots g_j^n u_j = \sum_{j_1, \ldots, j_{n+1}} \delta_{j_1 \cdots j_{n+1}} g_{j_1}^1 \cdots g_{j_n}^n u_{j_{n+1}} = (\delta_{j_1 \cdots j_{n+1}})\langle g_1, \ldots, g_n, u \rangle,
\]
where $\delta_{j_1 \cdots j_{n+1}}$ is equal to 1 if $j_1 = \cdots = j_{n+1}$; otherwise, it is 0. Moreover, if $u_j$ is approximated by $\tilde{u}_j = \sum_{k=1}^{r} \alpha_k^j e_k$, denote $\tilde{u} = (\tilde{u}_j)$, $e = (e_j)$, and we then have
\[
(\delta_{j_1 \cdots j_{n+1}})\langle g_1, \ldots, g_n, \tilde{u} \rangle = W \langle g_1, \ldots, g_n, e \rangle,
\]
where $W = \big( \sum_{j_{n+1}} \delta_{j_1 \cdots j_{n+1}} \alpha_k^{j_{n+1}} \big)$.
(2.3) $\Rightarrow$ (2.2): Denote $W = (w_{j_1 \cdots j_{n+1}})$, $u = (u_j)$, so
\[
W \langle g_1, \ldots, g_n, u \rangle = \tilde{u} \langle g_1, \ldots, g_n \rangle,
\]
where $\tilde{u} = \big( \sum_{j_{n+1}} w_{j_1 \cdots j_{n+1}} u_{j_{n+1}} \big)$.

Appendix F. Proof of Corollary 2.13. Without loss of generality, assume that $\mathcal{G}$ is linear with respect to $v_1$; that is, there is a continuous operator defined on $X_1 \times K_2 \times \cdots \times K_n$ which is linear with respect to $v_1$ and equal to $\mathcal{G}$ restricted to $K_1 \times \cdots \times K_n$, and for convenience we still denote it by $\mathcal{G}$. Suppose that $\{e_i\}$, $\{e_i^*\}$ are the Schauder basis and coordinate functionals of $X_1$. For $\epsilon > 0$, according to the continuity of $\mathcal{G}$ and Property 2.4, there exists a positive integer $q_1$ such that
\[
\sup_{v_i \in K_i} \big\| \mathcal{G}(v_1, v_2, \ldots, v_n) - \mathcal{G}(P_{q_1}^1(v_1), v_2, \ldots, v_n) \big\| < \frac{\epsilon}{2}.
\]
Denote $M = \max_{v_1 \in K_1, 1 \leq j \leq q_1} |e_j^*(v_1)|$. Now define continuous operators $\mathcal{G}_j : K_2 \times \cdots \times K_n \to Y$ as
\[
\mathcal{G}_j(v_2, \ldots, v_n) = \mathcal{G}(e_j, v_2, \ldots, v_n), \qquad 1 \leq j \leq q_1.
\]
Then by Corollary 2.12, there exist positive integers $p_i$, $q_i$ and continuous vector functions $g_i \in C(\mathbb{R}^{q_i}, \mathbb{R}^{p_i})$, $u^j = (u_{k_2 \cdots k_n}^j) \in Y^{p_2 \times \cdots \times p_n}$, $2 \leq i \leq n$, $1 \leq j \leq q_1$, such that
\[
\sup_{v_i \in K_i} \big\| \mathcal{G}_j(v_2, \ldots, v_n) - u^j \langle g_2(\varphi_{q_2}^2(v_2)), \ldots, g_n(\varphi_{q_n}^n(v_n)) \rangle \big\| < \frac{\epsilon}{2 q_1 M}, \qquad j = 1, \ldots, q_1.
\]
Let $p_1 = q_1$, $u = (u_{k_2 \cdots k_n}^{k_1}) \in Y^{p_1 \times \cdots \times p_n}$, and let $g_1 : \mathbb{R}^{q_1} \to \mathbb{R}^{q_1}$ be the identity map; then
\begin{align*}
\big\| \mathcal{G}(P_{q_1}^1(v_1)&, v_2, \ldots, v_n) - u \langle g_1(\varphi_{q_1}^1(v_1)), \ldots, g_n(\varphi_{q_n}^n(v_n)) \rangle \big\| \\
&= \Bigg\| \sum_{j=1}^{q_1} e_j^*(v_1) \mathcal{G}(e_j, v_2, \ldots, v_n) - \sum_{j=1}^{q_1} e_j^*(v_1) u^j \langle g_2(\varphi_{q_2}^2(v_2)), \ldots, g_n(\varphi_{q_n}^n(v_n)) \rangle \Bigg\| \\
&\leq \sum_{j=1}^{q_1} |e_j^*(v_1)| \cdot \big\| \mathcal{G}(e_j, v_2, \ldots, v_n) - u^j \langle g_2(\varphi_{q_2}^2(v_2)), \ldots, g_n(\varphi_{q_n}^n(v_n)) \rangle \big\| \\
&< q_1 \cdot M \cdot \frac{\epsilon}{2 q_1 M} = \frac{\epsilon}{2}.
\end{align*}
Therefore,
\[
\sup_{v_i \in K_i} \big\| \mathcal{G}(v_1, \ldots, v_n) - u \langle g_1(\varphi_{q_1}^1(v_1)), \ldots, g_n(\varphi_{q_n}^n(v_n)) \rangle \big\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,
\]
where $g_1$ is linear. The proofs for the other two cases are similar.
Appendix G. Proof of Property 2.14. Assume that $W_i = (w_j^i)_{p_i} = (w_{jk}^i)_{p_i \times h_i}$, $h_i = (H_j^i)_{h_i}$; then for $u = (u_{j_1 \cdots j_n}) \in Y^{p_1 \times p_2 \times \cdots \times p_n}$,
\begin{align*}
u\langle g_1, \ldots, g_n \rangle &= \sum_{j_1, \ldots, j_n} u_{j_1 \cdots j_n} (w_{j_1}^1 h_1) \cdots (w_{j_n}^n h_n) \\
&= \sum_{j_1, \ldots, j_n} u_{j_1 \cdots j_n} \Bigg( \sum_{k} w_{j_1 k}^1 H_k^1 \Bigg) \cdots \Bigg( \sum_{k} w_{j_n k}^n H_k^n \Bigg) \\
&= \sum_{k_1, \ldots, k_n} \Bigg( \sum_{j_1, \ldots, j_n} u_{j_1 \cdots j_n} w_{j_1 k_1}^1 \cdots w_{j_n k_n}^n \Bigg) H_{k_1}^1 \cdots H_{k_n}^n \\
&= \tilde{u} \langle h_1, \ldots, h_n \rangle,
\end{align*}
where $\tilde{u} = \big( \sum_{j_1, \ldots, j_n} u_{j_1 \cdots j_n} w_{j_1 k_1}^1 \cdots w_{j_n k_n}^n \big)_{h_1 \times \cdots \times h_n}$. Briefly speaking, the linear output layers of $g_i$ can be merged into $u$. The proof for the other case is similar.
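The merging identity above is easy to verify numerically. The sketch below (ours) takes $n = 2$ and the scalar-valued case $Y = \mathbb{R}$, so that $u$ is a $p_1 \times p_2$ array, and checks that $u\langle W_1 h_1, W_2 h_2 \rangle$ equals $\tilde{u}\langle h_1, h_2 \rangle$ with $\tilde{u}$ obtained by contracting $u$ with the weight matrices.

import numpy as np

rng = np.random.default_rng(0)
p1, p2, h1_dim, h2_dim = 4, 5, 3, 2
u = rng.standard_normal((p1, p2))        # u for the scalar-valued case Y = R
W1 = rng.standard_normal((p1, h1_dim))   # linear output layer of g_1
W2 = rng.standard_normal((p2, h2_dim))   # linear output layer of g_2
h1 = rng.standard_normal(h1_dim)         # last hidden features of g_1
h2 = rng.standard_normal(h2_dim)         # last hidden features of g_2

lhs = np.einsum("jk,j,k->", u, W1 @ h1, W2 @ h2)   # u<W1 h1, W2 h2>
u_tilde = np.einsum("jk,ja,kb->ab", u, W1, W2)     # merge the linear layers into u
rhs = np.einsum("ab,a,b->", u_tilde, h1, h2)       # u_tilde<h1, h2>
assert np.allclose(lhs, rhs)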
Appendix H. MIONet for finite-dimensional image space. Remark 2.16 for tensor neural networks also gives the approximation theorem for operators projected onto both a finite-dimensional domain and a finite-dimensional image space. Given data points $(v_1, \ldots, v_n, \mathcal{G}(v_1, \ldots, v_n))$, we first transform them into a training set
\[
\{\varphi_{q_1}^1(v_1^k), \ldots, \varphi_{q_n}^n(v_n^k), \varphi_m^Y(\mathcal{G}(v_1^k, \ldots, v_n^k))\}_{k=1}^{N}
\]
by determining basis elements $\{e_i\}_{i=1}^{m}$ for $Y$ with $\varphi_m^Y(x) = (e_1^*(x), \ldots, e_m^*(x))^T$. Then the loss function can be written as
\[
\mathrm{MSE} = \frac{1}{mN} \sum_{k=1}^{N} \Big\| \varphi_m^Y(\mathcal{G}(v_1^k, \ldots, v_n^k)) - W \big( \tilde{g}_1(\varphi_{q_1}^1(v_1^k)) \odot \cdots \odot \tilde{g}_n(\varphi_{q_n}^n(v_n^k)) \big) - b \Big\|_2^2,
\]
where $\tilde{g}_i : \mathbb{R}^{q_i} \to \mathbb{R}^{p}$ are neural networks to be trained, and $W \in \mathbb{R}^{m \times p}$ and $b \in \mathbb{R}^{m}$ are trainable weights and bias, respectively. After training, we make predictions by
\[
\tilde{\mathcal{G}}(v_1, \ldots, v_n) = (e_1, \ldots, e_m) \cdot \big( W \big( \tilde{g}_1(\varphi_{q_1}^1(v_1)) \odot \cdots \odot \tilde{g}_n(\varphi_{q_n}^n(v_n)) \big) + b \big).
\]
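As an illustration, the sketch below (ours, in PyTorch, with $n = 2$ and illustrative sizes) computes this MSE loss and the corresponding prediction: the Hadamard product of the branch outputs is mapped to the $m$ output coefficients by the trainable pair $(W, b)$ and compared with the projected targets $\varphi_m^Y(\mathcal{G}(\cdot))$.

import torch
import torch.nn as nn

m, p, q1, q2 = 32, 64, 100, 100                 # illustrative sizes
g1 = nn.Sequential(nn.Linear(q1, 128), nn.Tanh(), nn.Linear(128, p))
g2 = nn.Sequential(nn.Linear(q2, 128), nn.Tanh(), nn.Linear(128, p))
W = nn.Parameter(torch.randn(m, p) / p ** 0.5)  # trainable weight matrix
b = nn.Parameter(torch.zeros(m))                # trainable bias

def mse_loss(phi_v1, phi_v2, phi_G):
    # phi_v1: (N, q1), phi_v2: (N, q2), phi_G: (N, m) are the projected inputs and outputs
    hadamard = g1(phi_v1) * g2(phi_v2)           # (N, p)
    pred_coeff = hadamard @ W.T + b              # (N, m) predicted coefficients
    N = phi_G.shape[0]
    return ((phi_G - pred_coeff) ** 2).sum() / (m * N)

def predict(phi_v1, phi_v2, basis):
    # basis: (m, n_grid), the basis elements e_1, ..., e_m evaluated on a grid of the output domain
    coeff = (g1(phi_v1) * g2(phi_v2)) @ W.T + b  # (N, m)
    return coeff @ basis                         # (N, n_grid): the prediction on the grid

In practice the networks $\tilde{g}_i$ and the pair $(W, b)$ would be optimized jointly, e.g., with Adam [13].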


Appendix I. Loss function via numerical integration. Suppose that $Y = C[0, 1]$. For $\mathcal{T} = \{v_1^k, \ldots, v_n^k, \mathcal{G}(v_1^k, \ldots, v_n^k)\}_{k=1}^{N}$, the general loss function can be computed as
\[
\mathcal{L}(\mathcal{T}) = \frac{1}{N} \sum_{k=1}^{N} I\big( \mathcal{G}(v_1^k, \ldots, v_n^k) - \tilde{\mathcal{G}}(v_1^k, \ldots, v_n^k) \big),
\]
where $I(\cdot)$ is a numerical integration. For example, for $x_k$ uniformly sampled on $[0, 1]$ ($0 = x_0 < \cdots < x_m = 1$), we have the following choices of $I(\cdot)$:
• rectangle rule:
\[
I_{\mathrm{rec}}(f) = \frac{1}{m} \sum_{k=1}^{m} f\Big( \frac{x_{k-1} + x_k}{2} \Big),
\]
• trapezoidal rule:
\[
I_{\mathrm{tra}}(f) = \frac{1}{m} \sum_{k=1}^{m} \frac{f(x_{k-1}) + f(x_k)}{2},
\]
• Monte Carlo integration:
\[
I_{\mathrm{mon}}(f) = \frac{1}{m} \sum_{k=1}^{m} f(x_k).
\]
For a high-dimensional integration, Monte Carlo integration usually performs better.
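The three quadrature rules above are one-liners; the sketch below (ours) implements them on $[0, 1]$ for an equispaced grid, where in practice $f$ would be, e.g., the squared pointwise error $|\mathcal{G}(v_1^k, \ldots, v_n^k)(x) - \tilde{\mathcal{G}}(v_1^k, \ldots, v_n^k)(x)|^2$ evaluated at the quadrature points.

import numpy as np

def I_rec(f, m):   # rectangle (midpoint) rule on [0, 1]
    x = (np.arange(m) + 0.5) / m
    return f(x).mean()

def I_tra(f, m):   # trapezoidal rule on the equispaced nodes 0 = x_0 < ... < x_m = 1
    x = np.linspace(0.0, 1.0, m + 1)
    fx = f(x)
    return (0.5 * fx[0] + fx[1:-1].sum() + 0.5 * fx[-1]) / m

def I_mon(f, m, seed=0):   # Monte Carlo integration with uniform random samples
    x = np.random.default_rng(seed).uniform(0.0, 1.0, m)
    return f(x).mean()

f = lambda x: np.sin(np.pi * x) ** 2   # stand-in integrand; the exact integral is 1/2
print(I_rec(f, 100), I_tra(f, 100), I_mon(f, 1000))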

REFERENCES

[1] K. Bhattacharya, B. Hosseini, N. B. Kovachki, and A. M. Stuart, Model Reduction and Neural Networks for Parametric PDEs, preprint, https://arxiv.org/abs/2005.03180, 2020.
[2] S. Cai, Z. Wang, L. Lu, T. A. Zaki, and G. E. Karniadakis, DeepM\&Mnet: Inferring the
electroconvection multiphysics fields based on operator approximation by neural networks,
J. Comput. Phys., 436 (2021), 110296.
[3] T. Chen and H. Chen, Universal approximation to nonlinear operators by neural networks
with arbitrary activation functions and its application to dynamical systems, IEEE Trans.
Neural Networks, 6 (1995), pp. 911--917.
[4] Y. Chen, L. Lu, G. E. Karniadakis, and L. Dal Negro, Physics-informed neural net-
works for inverse problems in nano-optics and metamaterials, Optics Express, 28 (2020),
pp. 11618--11633.
[5] M. Daneker, Z. Zhang, G. E. Karniadakis, and L. Lu, Systems Biology: Identifiability
Analysis and Parameter Identification via Systems-Biology Informed Neural Networks,
preprint, https://arxiv.org/abs/2202.01723, 2022.
[6] B. Deng, Y. Shin, L. Lu, Z. Zhang, and G. E. Karniadakis, Convergence Rate of DeepONets
for Learning Operators Arising from Advection-Diffusion Equations, preprint, https://
arxiv.org/abs/2102.10621, 2021.
[7] P. C. Di Leoni, L. Lu, C. Meneveau, G. Karniadakis, and T. A. Zaki, DeepONet Prediction
of Linear Instability Waves in High-Speed Boundary Layers, preprint, https://arxiv.org/
abs/2105.08697, 2021.
[8] W. E and B. Yu, The deep Ritz method: A deep learning-based numerical algorithm for solving
variational problems, Commun. Math. Stat., 6 (2018), pp. 1--12.
[9] M. Fabian, P. Habala, P. Hájek, V. Montesinos, and V. Zizler, Banach Space Theory:
The Basis for Linear and Nonlinear Analysis, Springer, New York, 2011.
[10] S. Goswami, M. Yin, Y. Yu, and G. E. Karniadakis, A physics-informed variational Deep-
ONet for predicting crack path in quasi-brittle materials, Comput. Methods Appl. Mech.
Engrg., 391 (2022), 114587.

[11] J. Håstad, Tensor rank is NP-complete, J. Algorithms, 11 (1990), pp. 644--654.


[12] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, Physics-
informed machine learning, Nature Rev. Phys., 3 (2021), pp. 422--440.
[13] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, in Proceedings of
the 3rd International Conference on Learning Representations (ICLR), San Diego, CA,
Conference Track Proceedings, 2015.
[14] G. Kissas, J. Seidman, L. F. Guilhoto, V. M. Preciado, G. J. Pappas, and P. Perdikaris,
Learning Operators with Coupled Attention, preprint, https://arxiv.org/abs/2201.01032,
2022.
[15] G. Kissas, Y. Yang, E. Hwuang, W. R. Witschey, J. A. Detre, and P. Perdikaris,
Machine learning in cardiovascular flows modeling: Predicting arterial blood pressure from
non-invasive 4D flow MRI data using physics-informed neural networks, Comput. Methods
Appl. Mech. Engrg., 358 (2020), 112623.
[16] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51
(2009), pp. 455--500, https://doi.org/10.1137/07070111X.
[17] N. Kovachki, S. Lanthaler, and S. Mishra, On universal approximation and error bounds
for Fourier neural operators, J. Mach. Learn. Res., 22 (2021), 290.
[18] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and
A. Anandkumar, Neural Operator: Learning Maps between Function Spaces, preprint,
https://arXiv.org/abs/2108.08481, 2021.
[19] J. B. Kruskal, Rank, decomposition, and uniqueness for 3-way and n-way arrays, in Multiway
Data Analysis, Papers from the International Meeting on the Analysis of Multiway Data
Matrices held in Rome, 1988, R. Coppi and S. Bolasco, eds., North-Holland, Amsterdam,
1989, pp. 7--18.
[20] S. Lanthaler, S. Mishra, and G. E. Karniadakis, Error Estimates for DeepONets: A Deep
Learning Framework in Infinite Dimensions, preprint, https://arxiv.org/abs/2102.09618,
2021.
[21] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and
A. Anandkumar, Fourier Neural Operator for Parametric Partial Differential Equations,
preprint, https://arxiv.org/abs/2010.08895, 2020.
[22] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and
A. Anandkumar, Neural Operator: Graph Kernel Network for Partial Differential Equa-
tions, preprint, https://arxiv.org/abs/2003.03485, 2020.
[23] C. Lin, Z. Li, L. Lu, S. Cai, M. Maxey, and G. E. Karniadakis, Operator learning for
predicting multiscale bubble growth dynamics, J. Chem. Phys., 154 (2021), 104118.
[24] C. Lin, M. Maxey, Z. Li, and G. E. Karniadakis, A seamless multiscale operator neural
network for inferring bubble dynamics, J. Fluid Mech., 929 (2021), A18.
[25] G. Lin, C. Moya, and Z. Zhang, Accelerated Replica Exchange Stochastic Gradient Langevin
Diffusion Enhanced Bayesian DeepONet for Solving Noisy Parametric PDEs, preprint,
https://arxiv.org/abs/2111.02484, 2021.
[26] L. Liu and W. Cai, Multiscale DeepONet for Nonlinear Operators in Oscillatory Function
Spaces for Building Seismic Wave Responses, preprint, https://arxiv.org/abs/2111.04860,
2021.
[27] L. Lu, P. Jin, and G. E. Karniadakis, DeepONet: Learning Nonlinear Operators for Identi-
fying Differential Equations Based on the Universal Approximation Theorem of Operators,
preprint, https://arxiv.org/abs/1910.03193, 2019.
[28] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis, Learning nonlinear operators
via DeepONet based on the universal approximation theorem of operators, Nature Mach.
Intell., 3 (2021), pp. 218--229.
[29] L. Lu, X. Meng, S. Cai, Z. Mao, S. Goswami, Z. Zhang, and G. E. Karniadakis, A
Comprehensive and Fair Comparison of Two Neural Operators (with Practical Extensions)
Based on FAIR Data, preprint, https://arxiv.org/abs/2111.05512, 2021.
[30] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis, DeepXDE: A deep learning library for
solving differential equations, SIAM Rev., 63 (2021), pp. 208--228, https://doi.org/10.1137/
19M1274067.
[31] L. Lu, R. Pestourie, W. Yao, Z. Wang, F. Verdugo, and S. G. Johnson, Physics-informed
neural networks with hard constraints for inverse design, SIAM J. Sci. Comput., 43 (2021),
pp. B1105--B1132, https://doi.org/10.1137/21M1397908.
[32] Z. Mao, L. Lu, O. Marxen, T. A. Zaki, and G. E. Karniadakis, DeepM\&Mnet for hyper-
sonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using
neural-network approximation of operators, J. Comput. Phys., 447 (2021), 110698.
[33] C. Marcati and C. Schwab, Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations, preprint, https://arxiv.org/abs/2112.08125, 2021.


[34] N. H. Nelsen and A. M. Stuart, The random feature model for input-output maps between Banach spaces, SIAM J. Sci. Comput., 43 (2021), pp. A3212--A3243, https://doi.org/10.1137/20M133957X.
[35] J. D. Osorio, Z. Wang, G. Karniadakis, S. Cai, C. Chryssostomidis, M. Panwar, and
R. Hovsapian, Forecasting solar-thermal systems performance under transient operation
using a data-driven machine learning approach based on the deep operator network archi-
tecture, Energy Conversion and Management, 252 (2022), 115063.
[36] G. Pang, L. Lu, and G. E. Karniadakis, fPINNs: Fractional physics-informed neural net-
works, SIAM J. Sci. Comput., 41 (2019), pp. A2603--A2626, https://doi.org/10.1137/
18M1229845.
[37] R. G. Patel, N. A. Trask, M. A. Wood, and E. C. Cyr, A physics-informed operator re-
gression framework for extracting data-driven continuum models, Comput. Methods Appl.
Mech. Engrg., 373 (2021), 113500.
[38] M. Raissi, P. Perdikaris, and G. E. Karniadakis, Physics-informed neural networks: A deep
learning framework for solving forward and inverse problems involving nonlinear partial
differential equations, J. Comput. Phys., 378 (2019), pp. 686--707.
[39] M. Raissi, A. Yazdani, and G. E. Karniadakis, Hidden fluid mechanics: Learning velocity
and pressure fields from flow visualizations, Science, 367 (2020), pp. 1026--1030.
[40] R. A. Ryan, Introduction to Tensor Products of Banach Spaces, Springer Monogr. Math.,
Springer-Verlag London, Ltd., London, 2002.
[41] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial dif-
ferential equations, J. Comput. Phys., 375 (2018), pp. 1339--1364.
[42] N. Trask, R. G. Patel, B. J. Gross, and P. J. Atzberger, GMLS-Nets: A Framework for
Learning from Unstructured Data, preprint, https://arxiv.org/abs/1909.05371, 2019.
[43] S. Wang, H. Wang, and P. Perdikaris, Learning the solution operator of parametric partial
differential equations with physics-informed DeepONets, Sci. Adv., 7 (2021), eabi8605.
[44] A. Yazdani, L. Lu, M. Raissi, and G. E. Karniadakis, Systems biology informed deep
learning for inferring parameters and hidden dynamics, PLoS Comput. Biol., 16 (2020),
e1007575.
[45] M. Yin, E. Ban, B. V. Rego, E. Zhang, C. Cavinato, J. D. Humphrey, and G. Em Karni-
adakis, Simulating progressive intramural damage leading to aortic dissection using Deep-
ONet: An operator--regression neural network, J. R. Soc. Interface, 19 (2022), 20210670.
[46] H. You, Y. Yu, M. D'Elia, T. Gao, and S. Silling, Nonlocal Kernel Network (NKN): A
Stable and Resolution-Independent Deep Neural Network, preprint, https://arxiv.org/abs/
2201.02217, 2022.
[47] J. Yu, L. Lu, X. Meng, and G. E. Karniadakis, Gradient-Enhanced Physics-Informed Neural
Networks for Forward and Inverse PDE Problems, preprint, https://arxiv.org/abs/2111.
02801, 2021.
[48] D. Zhang, L. Lu, L. Guo, and G. E. Karniadakis, Quantifying total uncertainty in physics-
informed neural networks for solving forward and inverse stochastic problems, J. Comput.
Phys., 397 (2019), 108850.
