
Learning data-driven discretizations for partial differential equations

Yohai Bar-Sinai (a,1,2), Stephan Hoyer (b,1,2), Jason Hickey (b), and Michael P. Brenner (a,b)

(a) School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138; and (b) Google Research, Mountain View, CA 94043

Edited by John B. Bell, Lawrence Berkeley National Laboratory, Berkeley, CA, and approved June 21, 2019 (received for review August 14, 2018)

The numerical solution of partial differential equations (PDEs) is challenging because of the need to resolve spatiotemporal features over wide length- and timescales. Often, it is computationally intractable to resolve the finest features in the solution. The only recourse is to use approximate coarse-grained representations, which aim to accurately represent long-wavelength dynamics while properly accounting for unresolved small-scale physics. Deriving such coarse-grained equations is notoriously difficult and often ad hoc. Here we introduce data-driven discretization, a method for learning optimized approximations to PDEs based on actual solutions to the known underlying equations. Our approach uses neural networks to estimate spatial derivatives, which are optimized end to end to best satisfy the equations on a low-resolution grid. The resulting numerical methods are remarkably accurate, allowing us to integrate in time a collection of nonlinear equations in 1 spatial dimension at resolutions 4× to 8× coarser than is possible with standard finite-difference methods.

coarse graining | machine learning | computational physics

Significance

In many physical systems, the governing equations are known with high confidence, but direct numerical solution is prohibitively expensive. Often this situation is alleviated by writing effective equations to approximate dynamics below the grid scale. This process is often impossible to perform analytically and is often ad hoc. Here we propose data-driven discretization, a method that uses machine learning to systematically derive discretizations for continuous physical systems. On a series of model problems, data-driven discretization gives accurate solutions with a dramatic drop in required resolution.

Author contributions: Y.B.-S., S.H., J.H., and M.P.B. designed research; Y.B.-S. and S.H. performed research; Y.B.-S. and S.H. analyzed data; and Y.B.-S., S.H., J.H., and M.P.B. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
Data deposition: Source code is available on GitHub (https://github.com/google/data-driven-discretization-1d).
1 Y.B.-S. and S.H. contributed equally to this work.
2 To whom correspondence may be addressed. Email: ybarsinai@gmail.com or shoyer@google.com.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1814058116/-/DCSupplemental.
Published online July 16, 2019.

Solutions of nonlinear partial differential equations can have enormous complexity, with nontrivial structure over a large range of length- and timescales. Developing effective theories that integrate out short lengthscales and fast timescales is a long-standing goal. As examples, geometric optics is an effective theory of Maxwell equations at scales much longer than the wavelength of light (1); density functional theory models the full many-body quantum wavefunction with a lower-dimensional object—the electron density field (2); and the effective viscosity of a turbulent fluid parameterizes how small-scale features affect large-scale behavior (3). These models derive their coarse-grained dynamics by more or less systematic integration of the underlying governing equations (by using, respectively, WKB theory, the local density approximation, and a closure relation for the Reynolds stress). The gains from coarse graining are, of course, enormous. Conceptually, it allows a deep understanding of emergent phenomena that would otherwise be masked by irrelevant details. Practically, it allows computation of vastly larger systems.

Averaging out unresolved degrees of freedom invariably replaces them by effective parameters that mimic typical behavior. In other words, we identify the salient features of the dynamics at short and fast scales and replace these with terms that have a similar average effect on the long and slow scales. Deriving reliable effective equations is often challenging (4). Here we approach this challenge from the perspective of statistical inference. The coarse-grained representation of the function contains only partial information about it, since short scales are not modeled. Deriving coarse-grained dynamics requires first inferring the small-scale structure using the partial information (reconstruction) and then incorporating its effect on the coarse-grained field. We propose to perform reconstruction using machine-learning algorithms, which have become extraordinarily efficient at identifying and reconstructing recurrent patterns in data. Having reconstructed the fine features, modeling their effect can be done using our physical knowledge about the system. We call our method data-driven discretization. It is qualitatively different from coarse-graining techniques that are currently in use: Instead of analyzing equations of motion to derive effective behavior, we directly learn from high-resolution solutions to these equations.

Related Work

Several related approaches for computationally extracting effective dynamics have been previously introduced. Classic works used neural networks for discretizing dynamical systems (5, 6). Similarly, equation-free modeling approximates coarse-scale derivatives by remapping coarse initial conditions to fine scales, which are integrated exactly (7). That method is similar in spirit to our approach, but it does not learn from fine-scale dynamics and reuse the memorized statistics at subsequent times to reduce the computational load. Recent works have applied machine learning to partial differential equations (PDEs), either focusing on speed (8–10) or recovering unknown dynamics (11, 12). Models focused on speed often replace the slowest component of a physical model with machine learning, e.g., the solution of Poisson's equation in incompressible fluid simulations (9), subgrid cloud models in climate simulations (10), or building reduced-order models that approximate dynamics in a lower-dimensional space (8, 13, 14). These approaches are promising, but learn higher-level components than our proposed method. An important development is the ability to satisfy some physical constraints exactly by plugging learned models into a fixed equation of motion. For example, valid fluid dynamics can be guaranteed by learning either velocity fields directly (12) or a vector potential for velocity in the case of incompressible dynamics (8).
Closely related to this work, neural networks can be used to calculate closure conditions for coarse-grained turbulent flow models (15, 16). However, these models rely on existing coarse-grained schemes specific to turbulent flows and do not discretize the equations directly. Finally, ref. 17 suggested discretizations whose solutions can be analytically guaranteed to converge to the center manifold of the governing equation, but not in a data-driven manner.

Data-Driven Subgrid-Scale Modeling

Consider a generic PDE, describing the evolution of a continuous field v(x, t),

    ∂v/∂t = F(t, x, v, ∂v/∂x_i, ∂²v/(∂x_i ∂x_j), ···).   [1]

Most PDEs in the exact sciences can be cast in this form, including equations that describe hydrodynamics, electrodynamics, chemical kinetics, and elasticity. A common algorithm to numerically solve such equations is the method of lines (18): Given a spatial discretization x_1, ..., x_N, the field v(x, t) is represented by its values at node points, v_i(t) = v(x_i, t) (finite differences), or by its averages over a grid cell, v_i(t) = ∆x⁻¹ ∫ from x_i − ∆x/2 to x_i + ∆x/2 of v(x′, t) dx′ (finite volumes), where ∆x = x_i − x_{i−1} is the spatial resolution (19). The time evolution of v_i can be computed directly from Eq. 1 by approximating the spatial derivatives at these points. There are various methods for this approximation—polynomial expansion, spectral differentiation, etc.—all yielding formulas resembling

    ∂ⁿv/∂xⁿ ≈ Σ_i α_i⁽ⁿ⁾ v_i,   [2]

where the α_i⁽ⁿ⁾ are precomputed coefficients. For example, the 1-dimensional (1D) finite-difference approximation for ∂v/∂x to first-order accuracy is ∂_x v(x_i) = (v_{i+1} − v_i)/∆x + O(∆x).

Standard schemes use one set of precomputed coefficients for all points in space, while more sophisticated methods alternate between different sets of coefficients according to local rules (20, 21). This discretization transforms Eq. 1 into a set of coupled ordinary differential equations of the form

    ∂v_i/∂t = F(t, x, v_1, ..., v_N)   [3]

that can be numerically integrated using standard techniques. The accuracy of the solution to Eq. 3 depends on ∆x, converging to a solution of Eq. 1 as ∆x → 0. Qualitatively, accuracy requires that ∆x is smaller than the spatial scale of the smallest feature of the field v(x, t).
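To make this concrete, the sketch below (ours, not the paper's; the function names and the advection test equation are illustrative) implements the first-order stencil above and integrates the resulting ODE system, Eq. 3, with forward Euler on a periodic grid:

    import numpy as np

    def ddx_first_order(v, dx):
        # Eq. 2 with stencil weights (-1/dx, +1/dx): the first-order
        # estimate dv/dx(x_i) ~ (v_{i+1} - v_i) / dx, periodic boundaries.
        return (np.roll(v, -1) - v) / dx

    def integrate_advection(v0, dx, dt, steps, c=-1.0):
        # Method of lines for dv/dt = -c dv/dx: discretize space once, then
        # step the coupled ODEs of Eq. 3 with forward Euler. Note that the
        # one-sided stencil above is upwind (and hence stable) only for
        # c < 0; upwinding is discussed later in the paper.
        v = v0.copy()
        for _ in range(steps):
            v = v - dt * c * ddx_first_order(v, dx)
        return v

    x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
    v_final = integrate_advection(np.sin(x), dx=x[1] - x[0], dt=1e-3, steps=1000)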
However, the scale of the smallest features is often orders of magnitude smaller than the system size. High-performance computing has been driven by the ever-increasing need to accurately resolve smaller-scale features in PDEs. Even with petascale computational resources, the largest direct numerical simulation of a turbulent fluid flow ever performed has a Reynolds number of order 1,000, using about 5 × 10¹¹ grid points (22–24). Simulations at higher Reynolds number require replacing the physical equations with effective equations that model the unresolved physics. These equations are then discretized and solved numerically, e.g., using the method of lines. This overall procedure essentially modifies Eq. 2, changing the α_i to account for the unresolved degrees of freedom and replacing the discrete equations in Eq. 3 with a different set of discrete equations.

The main idea of this work is that unresolved physics can instead be learned directly from data. Instead of deriving an approximate coarse-grained continuum model and discretizing it, we suggest directly learning low-resolution discrete models that encapsulate unresolved physics. Rigorous mathematical work shows that the dimension of a solution manifold for a nonlinear PDE is finite (25, 26) and that approximate parameterizations can be constructed (27–29). If we knew the solution manifold, we could generate equation-specific approximations for the spatial derivatives in Eq. 2, approximations that have the potential to hold even when the system is underresolved. In contrast to standard numerical methods, the coefficients α_i⁽ⁿ⁾ are equation dependent. Different regions in space (e.g., inside and outside a shock) will use different coefficients. To discover these formulas, we use machine learning: We first generate a training set of high-resolution data and then learn the discrete approximations to the derivatives in Eq. 2 from this dataset. This produces a tradeoff in computational cost, which can be alleviated by carrying out high-resolution simulations on small systems to develop local approximations to the solution manifold and using them to solve equations in much larger systems at significantly reduced spatial resolution.

Burgers' Equation. For concreteness, we demonstrate this approach with a specific example in 1 spatial dimension. Burgers' equation is a simple nonlinear equation which models fluid dynamics in 1D and features shock formation. In its conservative form, it is written as

    ∂v/∂t + ∂J/∂x = f(x, t),   J ≡ v²/2 − η ∂v/∂x,   [4]

where η > 0 is the viscosity and f(x, t) is an external forcing term. J is the flux. Generically, solutions of Eq. 4 spontaneously develop sharp shocks, with specific relationships between the shock height, width, and velocity (19) that define the local structure of the solution manifold.

Fig. 1. Polynomial vs. neural net-based interpolation. (A) Interpolation between known points (blue diamonds) on a segment of a typical solution of Burgers' equation. Polynomial interpolation exhibits spurious "overshoots" in the vicinity of shock fronts. These errors compound when integrated in time, such that a naive finite-difference method at this resolution quickly diverges. In contrast, the neural network interpolation is so close to the exact solution that it cannot be visually distinguished. (B) Histogram of exact vs. interpolated function values over our full validation dataset. The neural network vastly reduces the number of poor predictions. (C) Absolute error vs. local curvature. The thick line shows the median and the shaded region shows the central 90% of the distribution over the validation set. The neural network makes much smaller errors in regions of high curvature, which correspond to shocks.
With this in mind, consider a typical segment of a solution to Burgers' equation (Fig. 1A). We want to compute the time derivative of the field given a low-resolution set of points (blue diamonds in Fig. 1). Standard finite-difference formulas predict this time derivative by approximating v as a piecewise-polynomial function passing through the given points (orange curves in Fig. 1). But solutions to Burgers' equation are not polynomials: They are shocks with characteristic properties. By using this information, we can derive a more accurate, albeit equation-specific, formula for the spatial derivatives. For the method to work, it should be possible to reconstruct the fine-scale solution from low-resolution data. To this end, we ran many simulations of Eq. 4 and used the resulting data to train a neural network. Fig. 1 compares the predictions of our neural net (details below and in SI Appendix) to fourth-order polynomial interpolation. This learned model is clearly far superior to the polynomial approximation, demonstrating that the spatial resolution required for parameterizing the solution manifold can be greatly reduced with equation-specific approximations rather than finite differences.

Models for Time Integration

The natural question to ask next is whether such parameterizations can be used for time integration. For this to work well, integration in time must be numerically stable, and our models need a strong generalization capacity: Even a single error could throw off the solution for later times.

To achieve this, we use multilayer neural networks to parameterize the solution manifold, because of their flexibility, including the ability to impose physical constraints and interpretability through choice of model architecture. The high-level aspects of the network's design, which we believe are of general interest, are described below. Additional technical details are described in SI Appendix, and source code is available online at https://github.com/google/data-driven-discretization-1d.

Pseudolinear Representation. Our network represents spatial derivatives with a generalized finite-difference formula similar to Eq. 2: The output of the network is a list of coefficients α_1, ..., α_N such that the nth derivative is expressed as a pseudolinear filter, Eq. 2, where the coefficients α_i⁽ⁿ⁾(v_1, v_2, ...) depend on space and time through their dependence on the field values in the neighboring cells. Finding the optimal coefficients is the crux of our method.

The pseudolinear representation is a direct generalization of the finite-difference scheme of Eq. 2. Moreover, exactly as in the case of Eq. 2, a Taylor expansion allows us to guarantee formal polynomial accuracy. That is, we can impose that approximation errors decay as O(∆xᵐ) for some m ≤ N − n, by layering a fixed affine transformation (SI Appendix). We found the best results when imposing linear accuracy, m = 1, with a 6-point stencil (N = 6), which we used for all results shown here. Finally, we note that this pseudolinear form is also a generalization of the popular essentially nonoscillatory (ENO) and weighted ENO (WENO) methods (20, 21), which choose a local linear filter (or a combination of filters) from a precomputed list according to an estimate of the solution's local curvature. WENO is an efficient, human-understandable way of adaptively choosing filters, inspired by nonlinear approximation theory. We improve on WENO by replacing heuristics with directly optimized quantities.
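The following sketch (ours; the paper's actual affine layer is specified in SI Appendix) shows Eq. 2 applied with position-dependent coefficients, together with one way to realize the fixed affine transformation enforcing first-order accuracy for ∂v/∂x on a 6-point stencil:

    import numpy as np
    from scipy.linalg import null_space

    OFFSETS = np.arange(-2, 4)  # a 6-point stencil: v_{i-2}, ..., v_{i+3}

    def apply_stencils(v, coeffs, dx):
        # Eq. 2 as a pseudolinear filter: coeffs[i, j] is the weight of
        # neighbor OFFSETS[j] when estimating dv/dx at point i (periodic).
        neighbors = np.stack([np.roll(v, -k) for k in OFFSETS], axis=1)
        return (coeffs * neighbors).sum(axis=1) / dx

    def constrain_first_derivative(raw):
        # Map unconstrained network outputs (shape [num_points, 4]) onto
        # stencils satisfying the Taylor conditions sum(a) = 0 and
        # sum(a * OFFSETS) = 1, which make the filter exact for linear
        # functions, i.e., first-order accurate (n = 1, m = 1, N = 6).
        A = np.stack([np.ones(6), OFFSETS.astype(float)])  # constraint matrix
        a0 = np.linalg.lstsq(A, np.array([0.0, 1.0]), rcond=None)[0]
        Z = null_space(A)  # shape (6, 4): directions that keep constraints
        return a0 + raw @ Z.T

Because the constraint map is a fixed affine transformation, gradients flow through it unchanged, and the network upstream can be trained end to end while every emitted stencil satisfies the accuracy conditions exactly.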
optimized time- and space-independent coefficients. These fixed
Physical Constraints. Since Burgers' equation is an instance of the continuity equation, as with traditional methods, a major increase in stability is obtained when using a finite-volume scheme, ensuring the coarse-grained solution satisfies the conservation law implied by the continuity equation. That is, coarse-grained equations are derived for the cell averages of the field v, rather than its nodal values (19). During training we provide the cell average to the network as the "true" value of the discretized field.

Integrating Eq. 4, it is seen that the rate of change of the cell averages is completely determined by the fluxes at cell boundaries. This is an exact relation, in which the only challenge is estimating the flux given the cell averages. Thus, prediction is carried out in 3 steps: First, the network reconstructs the spatial derivatives on the boundary between grid cells (staggered grid). Then, the approximated derivatives are used to calculate the flux J using the exact formula Eq. 4. Finally, the temporal derivative of the cell averages is obtained by calculating the total change at each cell, subtracting J at the cell's left and right boundaries. The calculation of the time derivative from the flux can also be done using traditional techniques that promote stability, such as monotone numerical fluxes (19). For some experiments, we use Godunov flux, inspired by finite-volume ENO schemes (20, 21), but it did not improve predictions for our neural network models.
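A minimal sketch of these 3 steps (ours; periodic boundaries, with a naive 2-point reconstruction standing in for the learned model in the first step):

    import numpy as np

    def burgers_time_derivative(v_bar, dx, eta, f):
        # Step 1: reconstruct v and dv/dx on the staggered grid, i.e., at
        # the boundary between cells i and i+1 (a learned model would
        # replace this naive reconstruction).
        v_edge = 0.5 * (v_bar + np.roll(v_bar, -1))
        dvdx_edge = (np.roll(v_bar, -1) - v_bar) / dx
        # Step 2: evaluate the exact flux formula of Eq. 4.
        J = 0.5 * v_edge ** 2 - eta * dvdx_edge
        # Step 3: the change rate of each cell average is the difference of
        # J at the cell's right and left boundaries, plus the forcing. This
        # form conserves the integral of v by construction.
        return -(J - np.roll(J, 1)) / dx + f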
Dividing the inference procedure into these steps is favorable in a few aspects: First, it allows us to constrain the model at the various stages using traditional techniques; the conservative constraint, numerical flux, and formal polynomial accuracy constraints are what we use here, but other constraints are also conceivable. Second, this scheme limits the machine-learning part to reconstructing the unknown solution at cell boundaries, which is the main conceptual challenge, while the rest of the scheme follows either the exact dynamics or traditional approximations for them. Third, it makes the trained model more interpretable, since the intermediate outputs (e.g., J or α_i) have clear physical meaning. Finally, these physical constraints contribute to more accurate and stable models, as detailed in the ablation study in SI Appendix.

Choice of Loss. The loss of a neural net is the objective function minimized during training. Rather than optimizing the prediction accuracy of the spatial derivatives, we optimize the accuracy of the resulting time derivative.* This allows us to incorporate physical constraints in the training procedure and directly optimize the final predictions rather than intermediate stages. Our loss is the mean-squared error between the predicted time derivative and labeled data produced by coarse graining the fully resolved simulations.

*For one specific case, namely the constant-coefficient model of Burgers' equation with Godunov flux limiting, trained models showed poor performance (e.g., not monotonically increasing with resample factor) unless the loss explicitly included the time-integrated solution, as done in ref. 9. Results shown in Figs. 3 and 4 use this loss for the constant-coefficient models with Burgers' equation. See details in SI Appendix.
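In code, this loss might look as follows (a sketch with our naming; block averaging implements the cell-average coarse graining of the finite-volume representation):

    import numpy as np

    def coarsen(v_fine, factor):
        # Block-average a fully resolved solution onto the coarse grid,
        # consistent with the cell-average (finite-volume) representation.
        return v_fine.reshape(-1, factor).mean(axis=1)

    def loss(dvdt_predicted, dvdt_fine, factor):
        # Mean-squared error between the model's predicted time derivative
        # on the coarse grid and the coarse-grained time derivative of the
        # resolved simulation.
        return np.mean((dvdt_predicted - coarsen(dvdt_fine, factor)) ** 2)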

Note that a low value of our training loss is a necessary but not sufficient condition for accurate and stable numerical integration over time. Many models with low training loss exhibited poor stability when numerically integrated (e.g., without the conservative constraint), particularly for equations with low dissipation. From a machine-learning perspective, this is unsurprising: Imitation learning approaches, such as our models, often exhibit such issues because the distribution of inputs produced by the model's own predictions can differ from the training data (30). Incorporating the time-integrated solution into the loss improved predictions in some cases (as in ref. 9), but did not guarantee stability, and could cause the training procedure itself to diverge due to decreased stability in calculating the loss. Stability for learned numerical methods remains an important area of exploration for future work.
Learned Coefficients. We consider 2 different parameterizations for learned coefficients. In our first parameterization, we learn optimized time- and space-independent coefficients. These fixed coefficients minimize the loss when averaged over the whole training set for a particular equation, without allowing the scheme to adapt the coefficients according to local features of the solution. Below, we refer to these as "optimized constant coefficients." In our second parameterization, we allow the coefficients to be an arbitrary function of the neighboring field values {v_i}, implemented as a fully convolutional neural network (31). We use the exact same architecture (3 layers, each with 32 filters, kernel size of 5, and ReLU nonlinearity) for coarse graining all equations discussed in this work.
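A sketch of this architecture in tf.keras (our rendering, not the paper's implementation; the repository cited above contains the actual code, including the periodic padding and constraint layer omitted here):

    import tensorflow as tf

    def coefficient_network(num_unconstrained=4):
        # 3 convolutional layers, 32 filters each, kernel size 5, ReLU,
        # followed by a linear convolution that outputs the unconstrained
        # degrees of freedom of the stencil at every grid point (4 free
        # parameters for a 6-point stencil with 2 accuracy constraints).
        # 'same' padding is a stand-in for periodic padding.
        inputs = tf.keras.Input(shape=(None, 1))  # field values v_i
        x = inputs
        for _ in range(3):
            x = tf.keras.layers.Conv1D(32, 5, padding="same", activation="relu")(x)
        outputs = tf.keras.layers.Conv1D(num_unconstrained, 5, padding="same")(x)
        return tf.keras.Model(inputs, outputs)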
Example coefficients predicted by our trained models are shown in Fig. 2 and SI Appendix, Fig. S3. Both the optimized constant and data-dependent coefficients differ from baseline polynomial schemes, particularly in the vicinity of the shock. The neural network solutions are particularly interesting: They do not appear to be using 1-sided stencils near the shock, in contrast to traditional numerical methods such as WENO (21), which avoid placing large weights across discontinuities.

The output coefficients can also be interpreted physically. For example, coefficients for both ∂v/∂x (Fig. 2B, Inset) and v (SI Appendix, Fig. S3C) are either right or left biased, opposite the sign of v. This is in line with our physical intuition: Burgers' equation describes fluid flow, and the sign of v corresponds to the direction of flow. Coefficients that are biased in the opposite direction of v essentially look "upwind," a standard strategy in traditional numerical methods for solving hyperbolic PDEs (19), which helps constrain the scheme from violating temporal causality. Alternatively, upwinding could be built into the model structure by construction, as we do in models which use Godunov flux.

Fig. 2. Learned finite-volume coefficients for Burgers' equation. Shown are fixed and spatiotemporally varying finite-volume coefficients α_1⁽¹⁾, ..., α_6⁽¹⁾ (Eq. 2) for ∂v/∂x. (A) Various centered and 1-sided polynomial finite-volume coefficients, along with optimized constant coefficients trained on this dataset (16× resample factor in Fig. 3). The vertical scale, which is the same for all coefficient plots, is not shown for clarity. (B) An example temporal snapshot of a solution to Burgers' equation (Eq. 4), along with data-dependent coefficients produced by our neural network model at each of the indicated positions on cell boundaries. The continuous solution is plotted as a dashed line, and the discrete cell-averaged representation is plotted as a piecewise-constant solid line. The optimized constant coefficients are most similar to the neural network's coefficients at the shock position. Away from the shock, the solution resembles centered polynomial coefficients. (B, Inset) Relative probability density for neural network coefficient "center of mass" vs. field value v across our full test dataset. Center of mass is calculated by averaging the positions of each element in the stencil, weighted by the absolute value of the coefficient.

Results

Burgers' Equation. To assess the accuracy of the time integration from our coarse-grained model, we computed "exact" solutions to Eq. 4 for different realizations of f(x, t) at high enough resolution to ensure mesh convergence. These realizations of f were drawn from the same distribution as those used for training, but were not in the training set. Then, for the same realization of the forcing, we solved the equation at a lower, possibly underresolved, resolution using 4 different methods for calculating the flux: 1) a standard finite-volume scheme with either first-order or third-order accuracy; 2) a fifth-order upwind-biased WENO scheme with Godunov flux (21); 3) spatial derivatives estimated by constant optimized coefficients, with and without Godunov flux; and 4) spatial derivatives estimated by the space- and time-dependent coefficients, computed with a neural net.
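For reference, the Godunov flux used in methods 2 and 3 has a simple closed form for the convex flux v²/2 of Burgers' equation; the sketch below (ours) selects between reconstructed left and right edge states by solving the local Riemann problem:

    import numpy as np

    def godunov_flux_burgers(v_left, v_right):
        # Exact solution of the local Riemann problem for f(v) = v^2 / 2:
        # for a rarefaction (v_left <= v_right), take the minimum of f over
        # [v_left, v_right] (zero if the interval straddles v = 0); for a
        # shock (v_left > v_right), take the larger endpoint flux. This
        # automatically selects the upwind direction.
        f_left, f_right = 0.5 * v_left ** 2, 0.5 * v_right ** 2
        rarefaction = np.where((v_left <= 0.0) & (v_right >= 0.0),
                               0.0, np.minimum(f_left, f_right))
        return np.where(v_left <= v_right, rarefaction, np.maximum(f_left, f_right))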
Results are shown in Fig. 3. Fig. 3A compares the integration results for a particular realization of the forcing for different values of the resample factor, that is, the ratio between the number of grid points in the low-resolution calculation and that of the fully converged solution.† Our learned models, with both constant and solution-dependent coefficients, can propagate the solution in time and dramatically outperform the baseline method at low resolution. Importantly, the ringing effect around the shocks, which leads to numerical instabilities, is practically eliminated.

†Physically, the natural measure of the spatial resolution is with respect to the internal lengthscale of the equation, which in the case of Burgers' equation is the typical shock width. However, since this analysis is meant to be applicable also to situations where the internal lengthscale is a priori unknown, we compare here to the lengthscale at which mesh convergence is obtained.

Since our model is trained on fully resolved simulations, a crucial requirement for our method to be of practical use is that training can be done on small systems but still produce models that perform well on larger ones. We expect this to be the case, since our models, being based on convolutional neural networks, use only local features and by construction are translation invariant. Fig. 3B illustrates the performance of our model trained on the domain [0, 2π] for predictions on a 10-times larger spatial domain of size [0, 20π]. The learned model generalizes well. For example, it shows good performance when function values are all positive in a region of size greater than 2π, which due to the conservation law cannot occur in the training dataset.

To make this assessment quantitative, we averaged over many realizations of the forcing and calculated the mean absolute error integrated over time and space. Results on the 10-times larger inference domain are shown in Fig. 3C: The solution from the full neural network has equivalent accuracy to increasing the resolution for the baseline by a factor of about 8×. Interestingly, even the simpler constant-coefficient method significantly outperforms the baseline scheme. The constant-coefficient model with Godunov flux is particularly compelling. This model is faster than WENO, because there is no need to calculate coefficients on the fly, with comparable accuracy and better numerical stability at coarse resolution, as shown in Figs. 3A and 4.

These calculations demonstrate that neural networks can carry out coarse graining. Even if the mesh spacing is much larger than the shock width, the model is still able to accurately propagate dynamics over time, showing that it has learned an internal representation of the shock structure.

Other Examples. To demonstrate the robustness of this method, we repeated the procedure for 2 other canonical PDEs: the Korteweg–de Vries (KdV) equation (32), which was first derived to model solitary waves on a river bore and is known for being completely integrable and for featuring soliton solutions, and the Kuramoto–Sivashinsky (KS) equation, which models flame fronts and is a textbook example of a classically chaotic PDE (33). All details about these equations are given in SI Appendix.
We repeated the training procedure outlined above for these equations, running high-resolution simulations and collecting data to train equation-specific estimators of the spatial derivative based on a coarse grid. These equations are essentially nondissipative, so we do not include a forcing term. The solution manifold is explored by changing the initial conditions, which are taken to be a superposition of long-wavelength sinusoidal functions with random amplitudes and phases (see SI Appendix for details).
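Such initial conditions can be sampled as in the sketch below (ours; the exact distribution used in the paper is given in SI Appendix):

    import numpy as np

    def random_initial_condition(x, length, num_modes=10, seed=None):
        # Superposition of long-wavelength sinusoids on a periodic domain
        # of the given length, with random amplitudes and phases.
        rng = np.random.default_rng(seed)
        v = np.zeros_like(x)
        for k in range(1, num_modes + 1):  # low wavenumbers only
            v += rng.uniform(0, 1) * np.sin(2 * np.pi * k * x / length
                                            + rng.uniform(0, 2 * np.pi))
        return v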
To assess the accuracy of the integrated solution, for each initial condition we define the "valid simulation time" as the first time that the low-resolution integrated solution deviates from the cell-averaged high-resolution solution by more than a given threshold. We found this metric more informative than absolute error for comparing across very different equations.
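Concretely, with coarse and cell-averaged exact solutions stored as arrays of shape [num_times, num_points], the metric can be computed as in this sketch (ours; the 80% fraction matches the criterion quoted in the caption of Fig. 4 below):

    import numpy as np

    def valid_simulation_time(v_coarse, v_exact_coarsened, times, threshold,
                              min_fraction=0.8):
        # First time at which the low-resolution solution deviates from the
        # cell-averaged high-resolution solution by more than `threshold`
        # on more than (1 - min_fraction) of the grid points.
        error = np.abs(v_coarse - v_exact_coarsened)
        valid = (error < threshold).mean(axis=1) >= min_fraction
        failures = np.nonzero(~valid)[0]
        return times[-1] if failures.size == 0 else times[failures[0]]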
Fig. 4 shows the median valid simulation time as a function of the resample factor. For all equations and resolutions, our neural network models have comparable or better performance than all other methods. The neural network is particularly advantageous at low resolutions, demonstrating its improved ability to solve coarse-grained dynamics. The optimized constant coefficients perform better at coarse resolution than baseline methods, but not always at high resolutions. Finally, at large enough resample factors the neural network approximations also fail to reproduce the dynamics, as expected. These results also hold on a 10-times larger spatial domain, as shown in SI Appendix, along with figures illustrating specific realizations and mean absolute error (SI Appendix, Figs. S8 and S9).

Fig. 3. Time integration results for Burgers' equation. (A) A particular realization of a solution at varying resolution solved by the baseline first-order finite-volume method, WENO, optimized constant coefficients with Godunov flux (Opt. God.), and the neural network (NN), with the white region indicating times when the solution diverged. Both learned methods manifestly outperform the baseline method and even outperform WENO at coarse resolutions. (B) Inference predictions for the 32× neural network model, on a 10-times larger spatial domain (only partially shown). The box surrounded by the dashed line shows the spatial domain used for training. (C) Mean absolute error between integrated solutions and the ground truth, averaged over space, times less than 15, and 10 forcing realizations on the 10-times larger inference domain. These metrics almost exactly match results on the smaller training domain [0, 2π] (SI Appendix, Fig. S8). As ground truth, we use WENO simulations on a 1× grid. Markers are omitted if some simulations diverged or if the average error is worse than fixing v = 0.

Fig. 4. Model performance across all of our test equations. Each plot shows the median time for which an integrated solution remains "valid" for each equation, defined by the absolute error on at least 80% of grid points being less than the 20th percentile of the absolute error from predicting all 0s. These thresholds were chosen so that "valid" corresponds to a relatively generous definition of an approximately correct solution. Error bars show the 95% confidence interval for the median across 100 simulations for each equation, determined by bootstrap resampling. Simulations for each equation were run out to a maximum time of 100.

Discussion and Conclusion

It has long been remarked that even simple nonlinear PDEs can generate solutions of great complexity. But even very complex, possibly chaotic, solutions are not just arbitrary functions: They are highly constrained by the equations they solve. In mathematical terms, despite the fact that the solution set of a PDE is nominally infinite dimensional, the inertial manifold of solutions is much smaller and can be understood in terms of interactions between local features of the solutions to nonlinear PDEs. The dynamical rules for interactions between these features have been well studied over the past 50 years. Examples include, among many others, interactions of shocks in complex media, interactions of solitons (32), and the turbulent energy cascade (34).

Machine learning offers a different approach for modeling these phenomena, by using training data to parameterize the inertial manifold itself; said differently, it learns both the features and their interactions from experience of the solutions. Here we propose a simple algorithm for achieving this, motivated by coarse graining in physical systems. It is often the case that coarse graining a PDE amounts to modifying the weights in a discretized numerical scheme. Instead, we use known solutions to learn these weights directly, generating data-driven discretizations. This effectively parameterizes the solution manifold of the PDE, allowing the equation to be solved at high accuracy with an unprecedentedly low resolution.

Faced with this success, it is tempting to try to leverage the understanding the neural network has developed to gain new insights about the equation or its coarse-grained representation. Indeed, in Fig. 2 we could clearly interpret the directionality of the weights as an upwind bias, the pseudolinear representation providing a clear interpretation of the prediction in a physically sensible way. However, extracting more abstract insight from the network, such as the scaling relation between the shock height and width, is a difficult challenge. This is a general problem in the field of machine learning, which is under intensive current research (35, 36).

Our results are promising, but 2 challenges remain before our approach can be deployed at large scales. The first challenge is speed. We showed that optimized constant coefficients can already improve accuracy, but our best models rely on the flexibility of neural networks.


Unfortunately, our neural nets use many more convolution operations than the single convolution required to implement finite differences, e.g., 32² = 1,024 convolutions with a 5-point stencil between our second and third layers. We suspect that other machine-learning approaches could be dramatically faster. For example, recent work on a related problem—inferring subpixel resolution from natural images—has shown that banks of pretrained linear filters can nearly match the accuracy of neural nets with orders of magnitude better performance (37, 38). The basic idea is to divide input images into local patches, classify patches into classes based on fixed properties (e.g., curvature and orientation), and learn a single optimal linear filter for each class. Such computational architectures would also facilitate extracting physical insights from trained filters.

A second challenge is scaling to higher-dimensional problems and more complex grids. Here we showcased the approach for regular grids in 1D, but most problems in the real world are higher dimensional, and irregular and adaptive grids are common. We do expect larger potential gains in 2D and 3D, as the computational gain in terms of the number of grid points would scale like the square or the cube of the resample factor. Irregular grids may be more challenging, but deep learning methods that respect appropriate invariants have been developed both for arbitrary graphs (39) and for collections of points in 3D space (40). Similar to what we found here, we expect that hand-tuned heuristics for both gridding and grid coefficients could be improved upon by systematic machine learning. More broadly, data-driven discretization suggests the potential of data-driven numerical methods, combining the optimized approximations of machine learning with the generalization of physical laws.

ACKNOWLEDGMENTS. We thank Peyman Milanfar, Pascal Getreuer, Ignacio Garcia Dorado, and Dmitrii Kochkov for collaboration and important conversations; Peter Norgaard and Geoff Davis for feedback on drafts of the manuscript; and Chi-Wang Shu for guidance on the implementation of WENO. Y.B.-S. acknowledges support from the James S. McDonnell postdoctoral fellowship for the study of complex systems. M.P.B. acknowledges support from NSF Grant DMS-1715477 and ONR Grant N00014-17-1-3029, as well as the Simons Foundation.
1. J. D. Jackson, Classical Electrodynamics (John Wiley & Sons, 1999).
2. D. Sholl, J. A. Steckel, Density Functional Theory: A Practical Introduction (Wiley & Sons, 2011).
3. C. J. Chen, Fundamentals of Turbulence Modelling (CRC Press, 1997).
4. M. Van Dyke, Perturbation Methods in Fluid Mechanics (NASA STI/Recon Technical Report A 75, 1975).
5. R. Gonzalez-Garcia, R. Rico-Martinez, I. Kevrekidis, Identification of distributed parameter systems: A neural net based approach. Comput. Chem. Eng. 22, S965–S968 (1998).
6. R. Rico-Martinez, I. Kevrekidis, K. Krischer, "Nonlinear system identification using neural networks: Dynamics and instabilities" in Neural Networks for Chemical Engineers, A. B. Bulsari, Ed. (Elsevier, Amsterdam, The Netherlands, 1995), pp. 409–442.
7. I. G. Kevrekidis, G. Samaey, Equation-free multiscale computation: Algorithms and applications. Annu. Rev. Phys. Chem. 60, 321–344 (2009).
8. B. Kim et al., Deep fluids: A generative network for parameterized fluid simulations. Computer Graphics Forum 38, 59–70 (2019).
9. J. Tompson, K. Schlachter, P. Sprechmann, K. Perlin, "Accelerating Eulerian fluid simulation with convolutional networks" in Proceedings of the 34th International Conference on Machine Learning, D. Precup, Y. W. Teh, Eds. (PMLR, 2017), vol. 70, pp. 3424–3433.
10. S. Rasp, M. S. Pritchard, P. Gentine, Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. U.S.A. 115, 9684–9689 (2018).
11. S. L. Brunton, J. L. Proctor, J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. U.S.A. 113, 3932–3937 (2016).
12. E. de Bezenac, A. Pajot, P. Gallinari, "Deep learning for physical processes: Incorporating prior scientific knowledge" in International Conference on Learning Representations (2018). https://iclr.cc/Conferences/2018/Schedule?showEvent=40. Accessed 11 July 2019.
13. B. Lusch, J. N. Kutz, S. L. Brunton, Deep learning for universal linear embeddings of nonlinear dynamics. Nat. Commun. 9, 4950 (2018).
14. J. Morton, F. D. Witherden, A. Jameson, M. J. Kochenderfer, "Deep dynamical modeling and control of unsteady fluid flows" in Advances in Neural Information Processing Systems, S. Bengio et al., Eds. (Curran Associates, Inc., 2018), vol. 31, pp. 9258–9268.
15. J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech. 807, 155–166 (2016).
16. A. D. Beck, D. G. Flad, C. D. Munz, Deep neural networks for data-driven turbulence models. arXiv:1806.04482 (15 June 2018).
17. A. Roberts, Holistic discretization ensures fidelity to Burgers' equation. Appl. Numer. Math. 37, 371–396 (2001).
18. W. E. Schiesser, The Numerical Method of Lines: Integration of Partial Differential Equations (Academic Press, San Diego, 1991).
19. R. J. LeVeque, Numerical Methods for Conservation Laws (Birkhauser Verlag, 1992).
20. A. Harten, B. Engquist, S. Osher, S. R. Chakravarthy, Uniformly high order accurate essentially non-oscillatory schemes, III. J. Comput. Phys. 71, 231–303 (1987).
21. C. W. Shu, "Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws" in Advanced Numerical Approximation of Nonlinear Hyperbolic Equations, A. Quarteroni, Ed. (Springer, 1998), pp. 325–432.
22. M. Lee, R. D. Moser, Direct numerical simulation of turbulent channel flow up to Reτ ≈ 5200. J. Fluid Mech. 774, 395–415 (2015).
23. M. Clay, D. Buaria, T. Gotoh, P. Yeung, A dual communicator and dual grid-resolution algorithm for petascale simulations of turbulent mixing at high Schmidt number. Comput. Phys. Commun. 219, 313–328 (2017).
24. K. P. Iyer, K. R. Sreenivasan, P. K. Yeung, Reynolds number scaling of velocity increments in isotropic turbulence. Phys. Rev. E 95, 021101 (2017).
25. P. Constantin, C. Foias, B. Nicolaenko, R. Temam, Integral Manifolds and Inertial Manifolds for Dissipative Partial Differential Equations (Springer Science & Business Media, 2012), vol. 70.
26. C. Foias, G. R. Sell, R. Temam, Inertial manifolds for nonlinear evolutionary equations. J. Differ. Equations 73, 309–353 (1988).
27. M. Jolly, I. Kevrekidis, E. Titi, Approximate inertial manifolds for the Kuramoto-Sivashinsky equation: Analysis and computations. Physica D Nonlinear Phenom. 44, 38–60 (1990).
28. E. S. Titi, On approximate inertial manifolds to the Navier-Stokes equations. J. Math. Anal. Appl. 149, 540–557 (1990).
29. M. Marion, Approximate inertial manifolds for reaction-diffusion equations in high space dimension. J. Dyn. Differ. Equations 1, 245–267 (1989).
30. S. Ross, D. Bagnell, "Efficient reductions for imitation learning" in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, Y. W. Teh, M. Titterington, Eds. (PMLR, Chia Laguna Resort, Sardinia, Italy, 2010), vol. 9, pp. 661–668.
31. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, MA, 2016).
32. N. J. Zabusky, M. D. Kruskal, Interaction of "solitons" in a collisionless plasma and the recurrence of initial states. Phys. Rev. Lett. 15, 240–243 (1965).
33. D. Zwillinger, Handbook of Differential Equations (Gulf Professional Publishing, 1998).
34. U. Frisch, Turbulence: The Legacy of A. N. Kolmogorov (Cambridge University Press, 1996).
35. M. Sundararajan, A. Taly, Q. Yan, "Axiomatic attribution for deep networks" in Proceedings of the 34th International Conference on Machine Learning (ICML), D. Precup, Y. W. Teh, Eds. (PMLR, 2017), vol. 70, pp. 3319–3328.
36. A. Shrikumar, P. Greenside, A. Kundaje, "Learning important features through propagating activation differences" in Proceedings of the 34th International Conference on Machine Learning (ICML), D. Precup, Y. W. Teh, Eds. (PMLR, 2017), vol. 70, pp. 3145–3153.
37. Y. Romano, J. Isidoro, P. Milanfar, RAISR: Rapid and accurate image super resolution. IEEE Trans. Comput. Imaging 3, 110–125 (2017).
38. P. Getreuer et al., "BLADE: Filter learning for general purpose computational photography" in 2018 IEEE International Conference on Computational Photography (ICCP) (IEEE, 2018).
39. J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, "Neural message passing for quantum chemistry" in Proceedings of the 34th International Conference on Machine Learning (ICML), D. Precup, Y. W. Teh, Eds. (PMLR, 2017), vol. 70, pp. 1263–1272.
40. C. R. Qi, L. Yi, H. Su, L. J. Guibas, "Pointnet++: Deep hierarchical feature learning on point sets in a metric space" in Advances in Neural Information Processing Systems, I. Guyon et al., Eds. (Curran Associates, Inc., 2017), vol. 30.
