
An Intriguing Property of Geophysics Inversion

Yinan Feng 1   Yinpeng Chen 2   Shihang Feng 1   Peng Jin 1 3   Zicheng Liu 2   Youzuo Lin 1

arXiv:2204.13731v2 [cs.LG] 16 Jun 2022

1 Earth and Environmental Sciences Division, Los Alamos National Laboratory, USA. 2 Microsoft Research, USA. 3 College of Information Sciences and Technology, The Pennsylvania State University, USA. Correspondence to: Youzuo Lin <ylin@lanl.gov>.

Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

Abstract

Inversion techniques are widely used to reconstruct subsurface physical properties (e.g., velocity, conductivity) from surface-based geophysical measurements (e.g., seismic, electric/magnetic (EM) data). The problems are governed by partial differential equations (PDEs) like the wave or Maxwell's equations. Solving geophysical inversion problems is challenging due to the ill-posedness and high computational cost. To alleviate those issues, recent studies leverage deep neural networks to learn the inversion mappings from measurements to the property directly.

In this paper, we show that such a mapping can be well modeled by a very shallow (but not wide) network with only five layers. This is achieved based on our new finding of an intriguing property: a near-linear relationship between the input and output, after applying an integral transform in high dimensional space. In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integrals of the velocity with Gaussian kernels are linearly correlated to the integrals of the seismic data with sine kernels. Furthermore, this property can be easily turned into a light-weight encoder-decoder network for inversion. The encoder contains the integration of the seismic data and the linear transformation, without need for fine-tuning. The decoder only consists of a single transformer block to reverse the integral of the velocity.

Experiments show that this interesting property holds for two geophysics inversion problems over four different datasets. Compared to the much deeper InversionNet (Wu & Lin, 2019), our method achieves comparable accuracy, but consumes significantly fewer parameters.

Figure 1. Illustration of the near-linear relation property between geophysical measurements u(x, t) and geophysical properties y(x, z) after applying integral transforms: U_n = \iint u(x,t)\,\Phi_n(x,t)\,dt\,dx, Y_m = \iint y(x,z)\,\Psi_m(x,z)\,dx\,dz, and Y \approx A U. {Φn} and {Ψm} are two families of kernels for integral transforms (e.g., sine and Gaussian). Here, the full waveform inversion from seismic data to velocity map is used as an example.

1. Introduction

Geophysics inversion techniques are commonly used to characterize site geology, stratigraphy, and rock quality. These techniques uncover subsurface layering and rock geomechanical properties (such as velocity and conductivity), which are crucial in subsurface applications such as subsurface energy exploration, carbon capture and sequestration, groundwater contamination and remediation, and earthquake early warning systems. Technically, these subsurface geophysical properties can be inferred from geophysical measurements (such as seismic or electromagnetic (EM) data) acquired on the surface. Underlying partial differential equations (PDEs) relate the measurements to the geophysical property, which is where inversion gets its name. For example, velocity is reconstructed from seismic data based on full waveform inversion (FWI) of a wave equation, while conductivity is recovered from EM measurements based on EM inversion of Maxwell's equations.

However, these inversion problems can be rather challenging to solve, as they are ill-posed. Recent works study them from two perspectives: physics-driven and data-driven. The former approaches search for the optimal geophysical property (e.g., velocity) from an initial guess, such that the geophysical simulations generated by forward modeling of the governing equation are close to the real measurements (Virieux & Operto, 2009; Feng & Schuster, 2019; Feng et al., 2021). These methods are computationally expensive as they require iterative optimization per sample.
The latter methods (i.e., data-driven approaches) (Wu & Lin, 2019), inspired by the image-to-image translation task, employ encoder-decoder convolutional neural networks (CNNs) to learn the mapping between physical measurements and geophysical properties. A deep network architecture involving multiple convolution blocks is employed as both encoder and decoder, which also results in heavy reliance on data and very high computational cost in training.

In this paper, we found an intriguing property of geophysics inversion that can significantly simplify data-driven methods:

    Geophysical measurements (e.g., seismic data) and geophysical properties (e.g., velocity maps) have a near-linear relationship in high dimensional space after integral transform.

Let u(x, t) denote a spatio-temporal geophysical measurement along the horizontal x and time t dimensions, and let y(x, z) denote a 2D geophysical property along the horizontal x and depth z. Since, in practice, the geophysical measurement is mostly collected at the surface, and one wants to invert the subsurface geophysical property, the measurement u only contains the spatial variable x, while the property y includes (x, z).

As illustrated in Figure 1, the proposed property can be mathematically represented as follows:

    U = [U_1, \ldots, U_N]^T, \quad U_n = \iint u(x,t)\,\Phi_n(x,t)\,dx\,dt,
    Y = [Y_1, \ldots, Y_M]^T, \quad Y_m = \iint y(x,z)\,\Psi_m(x,z)\,dx\,dz,
    Y \approx A\,U,    (1)

where Φn and Ψm are kernels for integral transforms. After applying the integral transforms, both the geophysical measurement u(x, t) and the property y(x, z) are projected into a high dimensional space (denoted as U and Y), and they have a near-linear relationship (Y ≈ AU). Note that the kernels ({Φn}, {Ψm}) are not learnable, but well-known analytical kernels like sine, Fourier, or Gaussian.
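To make Equation 1 concrete, the sketch below shows one way the two integral transforms and the linear fit could be realized numerically: discretize the double integrals as weighted sums over the sampling grids, stack the embeddings of a training set, and solve for A by least squares. This is an illustration under our own assumptions (the grid sizes, kernel counts, and stand-in random data are placeholders), not the released implementation.

```python
import numpy as np

# Toy shapes: measurement u(x, t) on an (Nx, Nt) grid, property y(x, z) on (Nx, Nz).
Nx, Nt, Nz, N, M, n_train = 70, 1000, 70, 512, 256, 200
t = np.linspace(0.0, 1.0, Nt)
xz = np.stack(np.meshgrid(np.linspace(0, 1, Nx),
                          np.linspace(0, 1, Nz), indexing="ij"), axis=-1)

# Measurement-side kernels, e.g. sine kernels over t (constant along x).
Phi = np.stack([np.sin(n * np.pi * t) for n in range(1, N + 1)])   # (N, Nt)

# Property-side kernels, e.g. Gaussians centered at random mu_m over (x, z).
mu = np.random.rand(M, 2)
sigma = 0.1
Psi = np.exp(-((xz[None] - mu[:, None, None, :]) ** 2).sum(-1)
             / (2 * sigma ** 2))                                   # (M, Nx, Nz)

def embed_u(u):   # U_n ~ double integral of u(x, t) Phi_n(x, t) dx dt
    return np.einsum("xt,nt->n", u, Phi) / (Nx * Nt)

def embed_y(y):   # Y_m ~ double integral of y(x, z) Psi_m(x, z) dx dz
    return np.einsum("xz,mxz->m", y, Psi) / (Nx * Nz)

# Stack embeddings of a (stand-in) training set and fit Y ≈ A U by least squares.
U = np.stack([embed_u(np.random.randn(Nx, Nt)) for _ in range(n_train)])
Y = np.stack([embed_y(np.random.randn(Nx, Nz)) for _ in range(n_train)])
A = np.linalg.lstsq(U, Y, rcond=None)[0].T    # (M, N), so that Y ≈ A U per sample
```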
Interestingly, this intriguing property can significantly simplify the encoder-decoder architecture in data-driven methods. The encoder only contains the integral with kernels {Φn} followed by a linear layer with the weight matrix A in Eq. 1. The decoder just uses a single transformer (Vaswani et al., 2017) block followed by a linear projection to reverse the integral with kernels {Ψm}. This results in a much shallower architecture. In addition, the encoder and decoder are learnt separately. The matrix A in the encoder can be directly solved by pseudo-inverse and is frozen afterward. Only the transformer block and the following linear layer in the decoder are learnt via an SGD-based optimizer.

Our method, named InvLINT (Inversion via LINear relationship between INTegrals), achieves comparable (or even better) performance on two geophysics inversion problems (seismic full waveform inversion and electric/magnetic inversion) over four datasets (Kimberlina-Leakage (Jordan & Wagoner, 2017), Marmousi (Feng et al., 2021), Salt (Yang & Ma, 2019), and Kimberlina-Reservoir (Alumbaugh et al., 2021)), but uses significantly fewer parameters than prior works. For instance, on Marmousi, our model only needs 1/20 of the parameters of the previous InversionNet.

2. Background

The governing equation of seismic full waveform inversion is the acoustic wave equation (Schuster, 2017),

    \nabla^2 p(r,t) - \frac{1}{c^2(r)} \frac{\partial^2}{\partial t^2} p(r,t) = s(r,t),    (2)

where r = (x, z) represents the spatial location in Cartesian coordinates (x is the horizontal direction and z is the depth), t denotes time, c(r) is the velocity map, p(r, t) represents the pressure wavefield, ∇² is the Laplacian operator, and s(r, t) is the source term that specifies the location and time history of the source.
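For intuition about where such measurements come from, the following is a minimal (and deliberately crude) second-order finite-difference time stepper for Equation 2: a constant-density acoustic solver with periodic edges instead of absorbing boundaries, a Ricker wavelet source, and receivers along the surface. It is a textbook scheme written by us for illustration, not the forward modeler used to generate the datasets in Section 4.

```python
import numpy as np

nx, nz, nt = 141, 141, 1000
dx, dt = 10.0, 1e-3                       # 10 m grid spacing, 1 ms time step
c = np.full((nz, nx), 2000.0)             # velocity map: 2000 m/s background...
c[70:, :] = 3000.0                        # ...over a faster layer at depth

f0 = 10.0                                 # 10 Hz Ricker source wavelet
ts = np.arange(nt) * dt - 1.0 / f0
src = (1 - 2 * (np.pi * f0 * ts) ** 2) * np.exp(-(np.pi * f0 * ts) ** 2)
sz, sx = 1, nx // 2                       # source just below the surface center

p_prev = np.zeros((nz, nx))
p = np.zeros((nz, nx))
seis = np.zeros((nt, nx))                 # receivers along the surface (z = 0)

for it in range(nt):
    # 5-point Laplacian; np.roll gives (unphysical) periodic edges for brevity.
    lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
           np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4 * p) / dx ** 2
    # Leapfrog update of Eq. 2: p_tt = c^2 (lap(p) - s); source sign folded in.
    p_next = 2 * p - p_prev + (c * dt) ** 2 * lap
    p_next[sz, sx] += (c[sz, sx] * dt) ** 2 * src[it]
    p_prev, p = p, p_next
    seis[it] = p[0]                       # record the surface row as seismic data
```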
For the EM forward modeling, the governing equations are Maxwell's equations (Commer & Newman, 2008),

    \sigma E - \nabla \times H = -J,
    \nabla \times E + i\omega\mu_0 H = -M,    (3)

where E and H are the electric and magnetic fields, J and M are the electric and magnetic sources, σ is the electrical conductivity, and μ0 = 4π × 10⁻⁷ Ω·s/m is the magnetic permeability of free space.

3. Methodology

In this section, we use seismic full waveform inversion (from seismic data to velocity) as an example to illustrate our derivation of the linear property after integral transforms. We also show the encoder-decoder architecture based on this linear property. Empirically, our solution is also applicable to EM inversion (from EM data to conductivity).

3.1. Near-Linear Relationship between Integral Transformations

In the following, we show that seismic data and velocity maps have the near-linear relation after integral transformation, in the format of Equation 1. The seismic data p and velocity map c are governed by the wave equation (Equation 2). Note that the seismic data p and the velocity map c in the wave equation correspond to the input u and output y in Equation 1, respectively.

Rewriting the wave equation in Fourier series: Similar to constant-coefficient PDEs, we assume the spatial variable r = (x, z) and the temporal variable t are separable, i.e., p(x, z, t) = p_1(x, z)\,p_2(t) and s(x, z, t) = s_1(x, z)\,s_2(t). Thus, Equation 2 is rewritten as

    c^2(x,z)\left(\nabla^2 p_1(x,z)\,p_2(t) - s_1(x,z)\,s_2(t)\right) = \frac{\partial^2}{\partial t^2}\left(p_1(x,z)\,p_2(t)\right).    (4)

Next, the temporal parts p_2(t) and s_2(t) are represented as Fourier series: p_2(t) = \sum_{n=1}^{N} B_n e^{j2\pi n t} and s_2(t) = \sum_{n=1}^{N} G_n e^{j2\pi n t}. This turns Equation 4 into

    \sum_{n=1}^{N} c^2(x,z)\left(\nabla^2 p_1(x,z) B_n - s_1(x,z) G_n\right) e^{j2\pi n t} = \sum_{n=1}^{N} 4\pi^2 n^2\, p_1(x,z) B_n\, e^{j2\pi n t}.    (5)

To make sure both sides have the same coefficient for each n, the aggregation \sum_{n=1}^{N} and the factor e^{j2\pi n t} can be removed from Equation 5:

    c^2(x,z)\left(\nabla^2 p_1(x,z) B_n - s_1(x,z) G_n\right) = 4\pi^2 n^2\, p_1(x,z) B_n.    (6)

By further integrating over x, we have

    \frac{1}{4\pi^2 n^2}\int c^2(x,z)\left(\nabla^2 p_1(x,z) B_n - s_1(x,z) G_n\right) dx = \int p_1(x,z)\,|B_n|\, dx = \left|\,\iint \underbrace{p_1(x,z)\,p_2(t)}_{\text{seismic data}}\ \underbrace{e^{-j2\pi n t}}_{\text{Fourier kernel}}\, dt\, dx \right|,    (7)

where |·| is the modulus of a complex number and B_n = \int p_2(t)\, e^{-j2\pi n t}\, dt are the Fourier coefficients. Note that since B_n and G_n are complex numbers, we take the modulus on both sides. Here, taking the real or imaginary part, rather than the modulus, does not affect our conclusions. Now, the right-hand side of Equation 7 has the same format as U_n in Equation 1, with kernel function \Phi_n(x,t) = e^{-j2\pi n t}\,\mathbb{1}(x), where \mathbb{1}(x) = 1 for all x.

Approximation by integral over z: In reality, the seismic data p(x, z, t) is mostly collected at the surface (z = 0). Thus, the right-hand side of Equation 7 is computable at z = 0. However, the left-hand side is hard to calculate, since \nabla^2 p_1(x,z) and s_1(x,z) are unknown. Here, we hypothesize that the left-hand side at z = 0 can be approximated by leveraging the velocity map at multiple depth positions:

    \left.\frac{1}{4\pi^2 n^2}\int c^2(x,z)\left(\nabla^2 p_1(x,z) B_n - s_1(x,z) G_n\right) dx\,\right|_{z=0} \approx \iint c^2(x,z)\, F_n(x,z)\, dx\, dz,    (8)

where F_n(x,z) is a kernel function.

This hypothesis (Eqs. 7–8) bridges the integral transforms of the seismic data (\iint p(t,x,z)\, e^{-j2\pi n t}\, dt\, dx\,|_{z=0}) and of the velocity maps (\iint c^2(x,z)\, F_n(x,z)\, dx\, dz) via the auxiliary function \frac{1}{4\pi^2 n^2}\int c^2(x,z)\left(\nabla^2 p_1(x,z) B_n - s_1(x,z) G_n\right) dx\,|_{z=0}. It has two parts: (a) the double integral of the velocity maps equals the auxiliary function, and (b) the 2D kernel F_n(x,z) can be estimated by a set of basis functions, so that we can further calculate the inverse problem we want to solve. The existence of an F_n(x,z) achieving the equality in (a) can be validated by the special case F_n(x,z) = \frac{1}{4\pi^2 n^2}\left(\nabla^2 p_1(x,z) B_n - s_1(x,z) G_n\right)\delta(z), where \delta(z) is an impulse function. Part (b) may weaken the assertion of equality in (a), but the misfit is likely small, as the velocity map is continuous at most (x, z) positions, and the seismic data p_1(x,z) and source s_1(x,z) in the auxiliary function have strong correlations along x and z. Our experimental results over three datasets empirically validate this hypothesis.
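The hypothesis is also easy to probe numerically: embed simulated pairs, fit A on a training split, and measure how much held-out variance the purely linear map explains. A small sketch (reusing the embed_u/embed_y helpers from the earlier snippet; the R²-style score is our choice of diagnostic, not the paper's):

```python
import numpy as np

def linearity_score(U_train, Y_train, U_test, Y_test):
    """Fraction of held-out variance in Y explained by the linear map Y = A U.

    U_*, Y_*: embeddings computed with embed_u/embed_y from the earlier sketch
    (sine kernels for the seismic data, Gaussian kernels for the velocity maps).
    """
    A = np.linalg.lstsq(U_train, Y_train, rcond=None)[0]   # (N, M)
    resid = Y_test - U_test @ A
    ss_res = (resid ** 2).sum()
    ss_tot = ((Y_test - Y_test.mean(axis=0)) ** 2).sum()
    return 1.0 - ss_res / ss_tot    # values near 1 support the near-linear claim
```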
Further simplification by a single kernel family: As discussed above, we simplify F_n(x,z) as a weighted sum of a series of basis functions:

    F_n(x,z) = \sum_{m=1}^{M} d_{n,m}\, \Psi_m(x,z),    (9)

where d_{n,m} is a weight and \Psi_m(x,z) is a basis function. By further plugging Equations 8 and 9 into Equation 7, we get

    \sum_{m=1}^{M} d_{n,m} \iint c^2(x,z)\, \Psi_m(x,z)\, dx\, dz \approx \left.\iint p(x,z,t)\, e^{-j2\pi n t}\, dt\, dx\right|_{z=0}.    (10)

Relation to Equation 1: Equation 10 is a special case of Equation 1. We can, therefore, express Equation 10 in the form of Equation 1 by letting

    y(x,z) = c^2(x,z), \quad u(x,t) = p(x,t), \quad A = D^{\dagger}, \quad \Phi_n(x,t) = e^{-j2\pi n t}\,\mathbb{1}(x),

where A is the pseudo-inverse of the matrix D, and D = [d_{n,m}]_{N \times M} is the matrix of weights.

In particular, U = [U_1, \ldots, U_N]^T and Y = [Y_1, \ldots, Y_M]^T are the high dimensional embeddings of the measurement and the geophysical property. {Φn} is chosen as a cosine/sine or Fourier transform, while, based on the experiments, the Gaussian kernel becomes our choice for {Ψm} to embed the spatial information in the geophysical property. It is true that the hypothesis may seem strong; however, its validity can be supported by our extensive experimental results using multiple datasets and various PDEs.

3.2. Simplified Encoder-Decoder Architecture

Based on the proposed mathematical property shown in Equation 1, we can easily design a simple network architecture accordingly. The encoder plays exactly the same role as the right-hand side of Equation 1, while the decoder, a neural network, approximates the inverse mapping of the integral transformation (\Psi_m^{-1}(x,z)). The structure of our InvLINT (Figure 2) is described below.

Figure 2. Schematic illustration of our proposed method, using seismic FWI as an example. The encoder maps the input u(x,t) to U = \iint u\,\Phi_n\, dt\, dx with sine kernels {Φn(x,t)} and applies the linear layer A (Y = AU); the decoder uses a single transformer block to reverse the integral with kernels {Ψm(x,z)} and produce y(x,z). The linear regression between the two transformed embeddings is solved by pseudo-inverse and is frozen afterward. The decoder is trained via an SGD-based optimizer.

Encoder: As illustrated in Figure 2, we design the encoder exactly as in Equation 1: an integral transformation with kernels {Φn}, n ∈ [1, N], is applied first, followed by a linear layer represented by A. With such a simple linear relation, one can easily map the input measurement to the embedding of the output.

Decoder: Many kernel functions (like the Gaussian kernel) do not have a closed-form inverse transformation. Instead, we use a shallow decoder network to approximate such a pseudo-inverse. To achieve this, we first use a linear layer L1 to map Y to a more compact embedding and tile it into a grid of shape R^{h×w×k}. Here, h and w are the size of the velocity map after 32× downsampling, and k is the number of channels. After that, L1(Y) is input into a 1-layer transformer with a patch size of 1 × 1 × k. This shallow transformer is the only nonlinear part of our decoder.

The last part of the model is a linear layer Lr. It upsamples each 1 × 1 × k patch to a (32 + d) × (32 + d) block¹, where d is a small integer. The final predicted velocity map ĉ can be constructed by stitching all h × w blocks together. The purpose of this is to recover the output to the original shape, with overlaps among blocks to remove the block effect.

¹ Since the size of the output may not be divisible exactly by 32, the recovered shape will be slightly different for different datasets.
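The sketch below is our reading of this decoder in PyTorch: a linear layer to an h × w × k grid, one transformer encoder block over the 1 × 1 × k patches, and a shared linear layer that upsamples each patch to an overlapping (32 + d) × (32 + d) block, stitched by averaging. The layer sizes, the overlap handling, and the averaging rule are guesses for illustration, not the released architecture.

```python
import torch
import torch.nn as nn

class InvLINTDecoder(nn.Module):
    """Sketch: linear layer -> (h, w, k) grid -> one transformer block over the
    1x1xk patches -> shared linear layer upsampling each patch to an overlapping
    (32+d)x(32+d) block, stitched by averaging."""

    def __init__(self, embed_dim=512, h=3, w=3, k=128, d=8):
        super().__init__()
        self.h, self.w, self.k, self.d = h, w, k, d
        self.l1 = nn.Linear(embed_dim, h * w * k)            # compact embedding grid
        self.block = nn.TransformerEncoderLayer(d_model=k, nhead=8,
                                                batch_first=True)
        self.lr = nn.Linear(k, (32 + d) ** 2)                # per-patch upsampling

    def forward(self, y_emb):                                # y_emb: (B, embed_dim)
        h, w, k, d = self.h, self.w, self.k, self.d
        B = y_emb.shape[0]
        tokens = self.l1(y_emb).view(B, h * w, k)
        tokens = self.block(tokens)                          # long-range mixing
        blocks = self.lr(tokens).view(B, h, w, 32 + d, 32 + d)
        out = torch.zeros(B, h * 32 + d, w * 32 + d)
        cnt = torch.zeros_like(out)
        for i in range(h):                                   # stitch with overlaps
            for j in range(w):
                out[:, i*32:i*32 + 32 + d, j*32:j*32 + 32 + d] += blocks[:, i, j]
                cnt[:, i*32:i*32 + 32 + d, j*32:j*32 + 32 + d] += 1
        return out / cnt                                     # average the overlaps
```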
3.3. Training

Because of the near-linear relation, we can easily solve the linear layer in the encoder, A, with the least squares method. Specifically, we first compute the embeddings on both the encoder and decoder sides by integral transformations, calculate the solution for the matrix A, and freeze it while training the decoder. The decoder is trained by an SGD-based optimizer. The loss function of our InvLINT is a pixel-wise MAE loss, given as

    L(\hat{c}, c) = \ell_1(\hat{c}, c).    (11)

Jin et al. (2022) find that combining MAE, MSE, and perceptual loss is helpful for improving performance. However, to make a fair comparison with the previous work, we only use MAE as our loss function.
formations. Instead, we use a shallow decoder network to
simulated over a duration of 200 years, with 20 leakage
approximate such a pseudo-inverse. To achieve this, we first
velocity maps provided (i.e., at every ten years) for each sce-
use a linear layer L1 to map Y to a more compact embed-
nario (Jordan & Wagoner, 2017). Excluding some missing
ding and tile it a grid with the shape Rh×w×k . Here, h and
velocity maps, the data are split as 807/166 scenarios for
w are the size of the velocity map with 32 times downsam-
training and testing, respectively. The size of the velocity
pling, and k is the number of channels. After that, L1 (Y ) is
maps is 401 × 141 grid points, and the grid size is 10 me-
input into a 1-layer transformer, with patch size of 1 × 1 × k.
ters in both directions. To synthesize the seismic data, nine
This shallow transformer is the only nonlinear part of our
sources are evenly distributed along the top of the model,
decoder.
with depths of 5 m. The seismic traces are recorded by
The last parts of the model are a linear layer Lr . It upsam- 101 receivers positioned at each grid with an interval of 15
ples each 1 × 1 × k patch to a (32 + d) × (32 + d) block1 , m. The source frequency is 10 Hz. Each receiver collects
where d is a small integer. The final predicted velocity map ĉ 1251-timestep data for 1 second.
can be construed by stitching all h × w blocks together. The
Marmousi: We apply the generating method in Jin et al.
1
Since the size of output may not be divided exactly by 32, the (2022), which follows Feng et al. (2021) and adopts the Mar-
recovered shape will be slightly different for different datasets. mousi velocity map as the style image to construct this low-
Dataset               Model                              MAE↓     MSE↓       SSIM↑   #Parameters     FLOPs
Kimberlina-Leakage    InversionNet (Wu & Lin, 2019)      9.43     1086.99    0.9868  15.81M          563.52M
                      VelocityGAN (Zhang et al., 2019)   9.73     1026.27    0.9863  16.99M          1.31G
                      InvLINT (Ours)                     8.13     1534.60    0.9812  1.49M (9.4%)    44.30M (7.9%)
Marmousi              InversionNet (Wu & Lin, 2019)      149.67   45936.23   0.7889  24.41M          189.58M
                      VelocityGAN (Zhang et al., 2019)   124.62   30644.31   0.8642  25.59M          259.49M
                      InvLINT (Ours)                     136.67   36003.43   0.7972  1.45M (5.9%)    9.31M (4.9%)
Salt                  InversionNet (Wu & Lin, 2019)      25.98    8669.98    0.9764  13.74M          32.37M
                      VelocityGAN (Zhang et al., 2019)   332.62   145669.11  0.7760  14.92M          65.98M
                      InvLINT (Ours)                     24.60    8840.79    0.9742  1.62M (11.8%)   5.98M (18.5%)
Kimberlina-Reservoir  InversionNet (Wu & Lin, 2019)      0.01330  0.000855   0.9175  0.30M           1.20G
                      VelocityGAN (Zhang et al., 2019)   0.01313  0.000688   0.8611  1.48M           3.95G
                      InvLINT (Ours)                     0.00703  0.000537   0.9370  0.16M (53.3%)   96.10M (8.0%)

Table 1. Quantitative results evaluated on four datasets in terms of MAE, MSE, SSIM, the number of parameters, and FLOPs. The percentages indicate the ratio of #Parameters (FLOPs) required by InvLINT to that required by InversionNet. Our InvLINT achieves comparable (or even better) inversion accuracy compared to InversionNet and VelocityGAN, with a much smaller number of parameters and FLOPs.

Marmousi: We apply the generating method in Jin et al. (2022), which follows Feng et al. (2021) and adopts the Marmousi velocity map as the style image to construct this low-resolution dataset. The dataset contains 30K paired seismic data and velocity maps. 24K samples are set as the training set, 3K samples are used as the validation set, and the rest are the testing set. The size of the velocity map is 70 × 70, with a 10-meter grid size in both directions. The velocity ranges from 1,500 m/s to 4,700 m/s. There are S = 5 sources placed evenly with a spacing of 170 m. The source frequency is 20 Hz. The seismic data are recorded by 70 receivers with a receiver interval of 10 m. Each receiver collects 1,000-timestep data for 1 second.

Salt: This dataset contains 140 velocity maps (Yang & Ma, 2019). We downsample them to 40 × 60 with a grid size of 10 m, and a 120/10/10 splitting strategy is applied. The velocity ranges from 1,500 m/s to 4,500 m/s. There are also S = 5 sources, with a 12-Hz source frequency and a spacing of 150 m. The seismic data are recorded by 60 receivers, likewise with an interval of 10 m. Each receiver collects 600-timestep data for 1 second.

Kimberlina-Reservoir: The geophysical properties were also developed under DOE's NRAP. The dataset is based on a potential CO2 storage site in the Southern San Joaquin Basin of California (Alumbaugh et al., 2021). We use this dataset to test our method on the EM inversion problem. It contains 780 EM measurements as the geophysical measurement, with the corresponding conductivity as the geophysical property. We use a 750/30 training/testing split. The EM data are simulated by a finite-difference method (Commer & Newman, 2008) with two source locations, at x = 2.5 km, z = 3.025 km and at x = 4.5 km, z = 2.5 km. There are S = 8 source frequencies from 0.1 to 8.0 Hz, recorded with their real and imaginary parts. The conductivity maps have a size of 351 × 601 (H × W), where the grid is 10 m in all dimensions.

4.1.2. Training Details

The input seismic data and EM data are normalized to the range [−1, 1]. In practice, multiple sources are always used for measurement to supply more information, where s ∈ {1, ..., S} indexes the sources. After integration, the vectors from all sources are concatenated. For the seismic data, we use Φn(x,t) = sin(nπt) 1(x)/(x_max − x_min) as the kernel function. However, for the EM data, since the raw data are already in the frequency domain and the input size is small, we skip the integral transformation step.

The Gaussian kernel can be represented as \Psi_m(x,z) = \exp\!\left(-\frac{\|(x,z)-\mu_m\|_2^2}{2\sigma^2}\right). We let the µm distribute evenly over the output shape, and σ is set equal to the distance between adjacent µ. We apply ridge regression to solve the linear layer in the encoder, with the regularization parameter set to α = 1.
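Concretely, we read that recipe as in the sketch below: centers µm on an even grid over the output, σ equal to the spacing of adjacent centers, and scikit-learn's Ridge with α = 1 for the encoder's linear layer. The 16 × 32 grid of centers and the stand-in embeddings are our assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge

Hc, Wc = 16, 32                            # Gaussian centers per axis (M = Hc*Wc)
Nz, Nx = 70, 70                            # output (velocity map) grid
zz, xx = np.meshgrid(np.linspace(0, 1, Nz), np.linspace(0, 1, Nx), indexing="ij")

# Centers mu_m evenly over the output; sigma = spacing of adjacent centers (z axis).
mz, mx = np.meshgrid(np.linspace(0, 1, Hc), np.linspace(0, 1, Wc), indexing="ij")
mu = np.stack([mz.ravel(), mx.ravel()], axis=1)        # (M, 2)
sigma = 1.0 / (Hc - 1)

d2 = (zz[None] - mu[:, 0, None, None]) ** 2 + (xx[None] - mu[:, 1, None, None]) ** 2
Psi = np.exp(-d2 / (2 * sigma ** 2))                   # (M, Nz, Nx) Gaussian kernels

def embed_y(y):                            # Y_m = integral of y * Psi_m over (x, z)
    return np.einsum("zx,mzx->m", y, Psi) / (Nz * Nx)

# Ridge regression with alpha = 1 for the encoder's linear layer, given stacked
# embeddings U (measurement side) and Y (property side).
U = np.random.randn(200, 2048)             # stand-in measurement embeddings
Y = np.stack([embed_y(np.random.randn(Nz, Nx)) for _ in range(200)])
A = Ridge(alpha=1.0, fit_intercept=False).fit(U, Y).coef_    # (M, N)
```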
We employ the AdamW optimizer (Loshchilov & Hutter, 2018) with momentum parameters β1 = 0.5, β2 = 0.999 and a weight decay of 1 × 10⁻⁴ to update the decoder parameters of the network. The initial learning rate is set to 1 × 10⁻³, and we decay the learning rate with cosine annealing (Loshchilov & Hutter, 2016), where T0 = 5, Tmult = 2, and the minimum learning rate is set to 1 × 10⁻³. The size of every mini-batch is set to 128. We implement our models in PyTorch and train them on one NVIDIA Tesla V100 GPU.

4.1.3. Evaluation Metrics

We apply three metrics to evaluate the geophysical properties generated by our method: MAE, MSE, and Structural Similarity (SSIM). In the existing literature (Wu & Lin, 2019; Zhang & Lin, 2020), both MAE and MSE have been employed to measure the pixel-wise error. SSIM is also considered, to measure perceptual similarity (Jin et al., 2022), since both velocity maps and conductivity have highly structured information, and degradation or distortion can be easily perceived by a human. Note that when calculating MAE and MSE, we denormalize the geophysical properties to their original scale, while we keep them in the normalized scale [−1, 1] for SSIM, according to the algorithm.

Moreover, we also employ two common metrics to measure the complexity and computational cost of the model: the number of parameters (#Parameters) and floating-point operations (FLOPs).
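A sketch of this evaluation protocol as we read it: MAE/MSE on denormalized maps and SSIM on the [−1, 1] normalized ones, with scikit-image's SSIM standing in for whichever implementation the authors used. The function name and the linear denormalization are our assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def evaluate(pred_norm, true_norm, vmin, vmax):
    """pred_norm/true_norm: 2D maps in [-1, 1]; vmin/vmax: physical value range."""
    denorm = lambda v: (v + 1.0) / 2.0 * (vmax - vmin) + vmin
    pred, true = denorm(pred_norm), denorm(true_norm)
    mae = np.abs(pred - true).mean()                 # pixel-wise, original scale
    mse = ((pred - true) ** 2).mean()
    s = ssim(true_norm, pred_norm, data_range=2.0)   # SSIM on normalized maps
    return mae, mse, s
```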
4.2. Main Results

Table 1 shows the comparison results of our method with InversionNet and VelocityGAN on the different datasets. Overall, our method achieves comparable or even better performance with a smaller number of parameters and lower FLOPs. Below, we provide a detailed comparison on all four datasets. It may be worth mentioning that FWI is a quantitative inversion technique, meaning that it yields both the shape and the quantitative values of the subsurface property.

Results on Kimberlina-Leakage: Compared to InversionNet and VelocityGAN, our method outperforms in MAE and is slightly worse in MSE and SSIM. However, our InvLINT needs less than 1/10 of the parameters and FLOPs. This demonstrates the power of our model and further validates the property we found. The velocity maps inverted by our method and InversionNet are shown in the first two rows of Figure 3. In the second example, despite some noise produced by our method in the background, the CO2 leakage plume (the most important region, as boxed out in green) has been very well imaged. Compared to the ground truth, our method yields even better quantitative values than those obtained by InversionNet.

Results on Marmousi: Marmousi is a more challenging dataset due to its more complex structure. Compared to InversionNet, our method outperforms in all three metrics with significantly lower computational and memory cost (about 1/20 of the parameters and FLOPs). This result again demonstrates not only the power of our model but also the validity of the near-linear relationship that we found. However, on such a large and complex dataset, VelocityGAN outperforms the others, where the GAN structure helps generate better results. The velocity maps inverted by our method and InversionNet are illustrated in the third and fourth rows of Figure 3. Our InvLINT and InversionNet perform comparably in both the shallow and deep regions compared to the ground truth.

Results on Salt: Compared to InversionNet, our method outperforms in MAE and is slightly worse in MSE and SSIM, with a very small gap. Moreover, our method uses 1/8 of the parameters and 1/5 of the FLOPs of InversionNet. Note that, on this challenging dataset, which only has a small number of samples, VelocityGAN cannot converge well and yields bad results; this is a side effect of its complex structure. The velocity maps inverted by our method and InversionNet are illustrated in the fifth and sixth rows of Figure 3. Consistent with the quantitative results, both methods generate similar results. In the shallow region, our method outputs a slightly clearer structure, but in the deeper region (e.g., the red region in the first example), the output of InversionNet is a little closer to the ground truth. However, the overall difference can be hard to distinguish. Our method achieves comparable results with much less complexity.

Results on Kimberlina-Reservoir: Compared to InversionNet and VelocityGAN, our method outperforms in all three metrics, with 1/2 of the parameters and 1/12 of the FLOPs of InversionNet. Because of the compact input, all models use a much smaller number of parameters here. However, due to its simple architecture, InvLINT requires significantly fewer FLOPs yet achieves better inversion accuracy. The conductivity results inverted by the different models are shown in the last two rows of Figure 3. Contrary to the previous results on the Kimberlina-Leakage dataset, our model yields clearer results. In the first example, we can see that the outputs of our model are less noisy; and in the second case, InvLINT inverts the deep region more precisely, as highlighted by the red squares. This is also consistent with the quantitative results.

Figure 3. Illustration of results evaluated on four datasets (rows: Kimberlina Leakage, Marmousi, Salt, and Kimberlina-Reservoir; columns: Ground Truth, InversionNet, InvLINT (Ours)).

At the same time, we find that the number of parameters of our model varies less for the same inverse problem; the number of model parameters is relatively independent of the data size. In contrast, the previous methods are greatly affected by the input and output sizes. Moreover, our model not only requires fewer parameters, but also enables more efficient training and inference. When training on the Marmousi dataset using 1 GPU (NVIDIA Quadro RTX 8000), our model is 9 times faster than InversionNet/VelocityGAN (1 hour vs. 9 hours). We also tested inference runtime with batch size 1 on a single thread of an Intel(R) Xeon(R) CPU Gold 6248 v3 (2.5 GHz). Our model is 16 times faster than InversionNet/VelocityGAN (5 ms vs. 80 ms). The small model size is suitable for memory-limited mobile devices. More visualization results are provided in the Appendix for interested readers.

4.3. Ablation Tests

In this part, we test how different kernel functions and network architectures influence the performance of our method. We put our default setting in the first row of each table. For ease of illustration, we only provide results on Marmousi; results on Kimberlina-Leakage are given in the Appendix.
Different Encoder Kernels

First, we conduct experiments replacing the 1D sine kernel in the encoder with different 2D sine kernels. The quantitative results are shown in Table 2. By comparing the results on Marmousi with the results on Kimberlina-Leakage (shown in the Appendix), we can see that the optimal strategy for integrating over the x axis is distinct for different datasets. On Marmousi, using the kernel sin(nπt) cos(nπx) improves the performance considerably. This kernel, however, does not perform well on other datasets (e.g., Kimberlina-Leakage).

Dataset: Marmousi
Encoder Kernel                       MAE↓     MSE↓       SSIM↑
sin(nπt) 1(x)/(x_max − x_min)        136.67   36003.43   0.7972
sin(nπt) sin(nπx)                    138.76   37648.80   0.8042
sin(nπt) cos(nπx)                    128.33   32451.22   0.8115
cos(nπt) sin(nπx)                    140.14   38417.23   0.8031
sin(nπ(x + t))                       141.58   38383.58   0.7892
sin(nπt) + sin(nπx)                  142.12   38261.56   0.7884

Table 2. Quantitative results for different encoder kernels.

Different Decoder Kernels

Then, we test different kernels for the geophysical properties. In particular, we evaluate a series of 2D kernels: different 2D sine kernels, a sinc function kernel (sin(π‖r − µm‖₂)/‖r − µm‖₂), and a Gaussian kernel with a smaller variance, denoted Gaussianσ. For the sinc function, the choice of µm is the same as for the Gaussian kernel, while for Gaussianσ, we choose σ as 1/3 of the original. The quantitative results are shown in Table 3. As we can see, our choice of kernel outperforms the rest. A smaller Gaussian variance yields a slightly worse result, while the sinc kernel performs similarly to Gaussianσ.

Dataset: Marmousi
Decoder Kernel                       MAE↓     MSE↓       SSIM↑
Gaussian                             136.67   36003.43   0.7972
Sinc                                 138.02   36534.44   0.7952
Gaussianσ                            138.19   36579.46   0.7954
sin(nπx) sin(nπz)                    177.36   56102.75   0.7455
cos(nπx) sin(nπz)                    165.38   49463.79   0.7491
sin(nπx) cos(nπz)                    175.92   55424.26   0.7376
sin(nπ(x + z))                       209.47   74167.16   0.7057
sin(nπx) + sin(nπz)                  216.12   78496.77   0.7030

Table 3. Quantitative results for different decoder kernels.

Different Number of Kernels

We also test different numbers of kernels for both the sine and Gaussian families. We evaluate performance over a 6 × 6 grid where the dimensions of U and Y vary from 128 to 4096. The quantitative results are shown in Figure 4. The results indicate that the current selection of dimensions is appropriate. Reducing the model's size reduces its capacity, while higher-dimensional choices are more prone to overfitting. However, choosing a small dimension yields a smaller number of parameters and FLOPs. One can easily balance the performance and the cost based on one's requirements and resources, indicating the flexibility of our model.

Figure 4. Performance over dimensions of U and Y.
Different Decoder Architectures

We aim to design an effective and efficient decoder to reverse the integral transform over a velocity map. The shifted Gaussian kernels used in the integral transform split the velocity map into overlapping windows and encode the local structure within each window. To reconstruct the global structure of the velocity map from these local features, we leverage the transformer's power in modeling long-range interactions in a single layer. Options like conv/deconv require more layers to cover long ranges.

To better illustrate this, we test the performance of different decoder architectures. Results are provided in Table 4. A transformer layer followed by a linear layer is a more accurate decoder than shallow conv/deconv layers. Deeper decoders with more conv/deconv layers achieve more accurate results, but require a larger model. When using the deconv decoder of InversionNet in our method, we achieve better performance, clearly outperforming InversionNet (MAE 126.6 vs. 149.7).

Dataset: Marmousi
Architecture                MAE↓     MSE↓       SSIM↑    #Params   FLOPs
Transformer×1 + Linear*     136.67   36003.43   0.7972   1.45M     9.3M
Conv×2 + Linear             140.72   37345.58   0.7903   0.30M     9.2M
Deconv×1 + Linear           167.98   49728.14   0.7520   0.35M     10.8M
(Deconv + Conv)×5           126.59   33830.73   0.8158   12.71M    94.6M
(Up + Conv)×5               128.74   34854.78   0.8120   4.01M     56.7M

Table 4. Comparison among different decoder structures. (*) indicates the default decoder option.

Results for a Larger Decoder

Here, we evaluate our method with a larger/deeper decoder. First, we test it using multiple unshared linear layers, rather than a shared one, Lr, in the last part of our decoder. Furthermore, we evaluate our model with a deeper transformer. The quantitative results are shown in Table 5. The result using unshared linear layers indicates that a single linear layer is enough and the model does not benefit from more parameters. On the other hand, a deeper transformer can improve the performance. As with the number of kernels, the balance depends on requirements.

Dataset: Marmousi
Architecture             MAE↓     MSE↓       SSIM↑
1-layer Transformer      136.67   36003.43   0.7972
Multi-Linear             138.82   36801.89   0.7939
2-layer Transformer      134.24   35111.23   0.8002
3-layer Transformer      132.19   34502.25   0.8037

Table 5. Quantitative results for a larger decoder.

4.4. Singular Value Analysis

Another major benefit of our simplified model is the ease of analysis. Since we use only one linear layer in the encoder, we can analyze it by performing a singular value decomposition. The results are shown in Figure 5. Since the singular values vary greatly across datasets, we divide them by their maximum value to normalize them, and truncate at 150 dimensions. The results indicate that for all datasets, the number of essential dimensions is less than 100. In other words, a 100-dimensional latent space is sufficient to represent the data. Specifically, we can see that a ten-dimensional latent space is enough for the Kimberlina-Reservoir dataset. This explains why the required number of parameters of both our InvLINT and InversionNet on the Kimberlina-Reservoir dataset is much smaller than on the other datasets. All in all, with such a simple architecture, our InvLINT not only helps in analyzing the problem but also helps us quantify the difficulty of different datasets.

Figure 5. Normalized singular values of the encoder's linear layer on different datasets.
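The analysis itself takes only a few lines; a sketch, with a random stand-in for the solved encoder matrix and our own threshold for counting "essential" dimensions:

```python
import numpy as np

A = np.random.randn(512, 2048)          # stand-in for the solved encoder matrix
s = np.linalg.svd(A, compute_uv=False)  # singular values, in descending order
s_norm = (s / s.max())[:150]            # normalize by the max, truncate at 150 dims
essential = int((s_norm > 1e-2).sum())  # rough count of "essential" dimensions
print(f"effective latent dimensionality ~ {essential}")
```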
4.5. Comparison to Traditional FWI

We performed a new comparison with a widely used traditional FWI method (i.e., multiscale FWI (Virieux & Operto, 2009)) on three seismic FWI datasets (Marmousi, Kimberlina-Leakage, Salt). Our method is consistently better on all three datasets (MAE: 11.7 vs. 42.0 on Kimberlina-Leakage, 140.7 vs. 199.5 on Marmousi, 26.1 vs. 176.6 on Salt). The traditional FWI requires a good initial guess and optimization per sample, resulting in slow processing (e.g., 4 hours per sample on Kimberlina-Leakage). Due to the limited rebuttal duration, we ran the comparison over 5 samples per dataset.
icantly fewer parameters.
5. Related Works

5.1. Data-driven Methods for FWI

Recently, a new type of method based on deep learning has been developed. Araya-Polo et al. (2018) use a fully connected network to invert velocity maps in FWI. Wu & Lin (2019) consider FWI as an image-to-image translation problem and employ an encoder-decoder CNN to solve it. By using generative adversarial networks (GANs) and transfer learning, Zhang et al. (2019) achieved improved performance. In Zeng et al. (2021), the authors present an efficient and scalable encoder-decoder network for 3D FWI. Feng et al. (2021) develop a multi-scale framework with two convolutional neural networks to reconstruct the low- and high-frequency components of velocity maps. A thorough review of deep learning for FWI can be found in Adler et al. (2021).

5.2. Physics-informed Machine Learning

The pure data-driven methods above can be considered as incorporating physics information through the training data. Integrating physics knowledge into the loss function or the network architecture is another direction; such methods are called physics-informed neural networks (PINNs). Raissi et al. (2019) proposed utilizing nonlinear PDEs in the loss function as a soft constraint. Through a hard constraint projection, Chen et al. (2021) proposed a framework to ensure that a model's predictions strictly conform to physical mechanisms. Based on the universal approximation theorem of operators, Lu et al. (2021) proposed DeepONet to learn continuous operators of complex systems. Sun et al. (2021) proposed a hybrid network design, which combines deterministic, physics-based modeling with data-driven deep learning. A comprehensive review of PINNs can be found in Karniadakis et al. (2021).

6. Conclusion

In this paper, we find an intriguing property of geophysics inversion: a near-linear relationship between the input and output after applying integral transforms in high dimensional space. Furthermore, this property can easily be turned into a light-weight encoder-decoder network for inversion. The encoder contains the integration of the seismic data and the linear transformation, without fine-tuning. The decoder consists of a single transformer block to reverse the integral of the velocity with Gaussian kernels.

Experiments show that this interesting property holds for two geophysics inversion problems over four different datasets. Compared to the much deeper InversionNet, our method achieves comparable accuracy, but consumes significantly fewer parameters.

References

Adler, A., Araya-Polo, M., and Poggio, T. Deep learning for seismic inverse problems: Toward the acceleration of geophysical analysis workflows. IEEE Signal Processing Magazine, 38(2):89–119, 2021.

Alumbaugh, D., Commer, M., Crandall, D., Gasperikova, E., Feng, S., Harbert, W., Li, Y., Lin, Y., Manthila Samarasinghe, S., and Yang, X. Development of a multi-scale synthetic data set for the testing of subsurface CO2 storage monitoring strategies. In American Geophysical Union (AGU), 2021.

Araya-Polo, M., Jennings, J., Adler, A., and Dahlke, T. Deep-learning tomography. The Leading Edge, 37(1):58–66, 2018.

Chen, Y., Huang, D., Zhang, D., Zeng, J., Wang, N., Zhang, H., and Yan, J. Theory-guided hard constraint projection (HCP): A knowledge-based data-driven scientific machine learning method. Journal of Computational Physics, 445:110624, 2021.

Commer, M. and Newman, G. A. New advances in three-dimensional controlled-source electromagnetic inversion. Geophysical Journal International, 172(2):513–535, 2008.

Feng, S. and Schuster, G. T. Transmission+reflection anisotropic wave-equation traveltime and waveform inversion. Geophysical Prospecting, 67(2):423–442, 2019.

Feng, S., Fu, L., Feng, Z., and Schuster, G. T. Multiscale phase inversion for vertical transverse isotropic media. Geophysical Prospecting, 69(8-9):1634–1649, 2021.

Jin, P., Zhang, X., Chen, Y., Huang, S. X., Liu, Z., and Lin, Y. Unsupervised learning of full-waveform inversion: Connecting CNN and partial differential equation in a loop. In Proceedings of the Tenth International Conference on Learning Representations (ICLR), 2022.

Jordan, P. and Wagoner, J. Characterizing construction of existing wells to a CO2 storage target: The Kimberlina site, California. Technical report, National Energy Technology Laboratory (NETL), Pittsburgh, PA; Morgantown, WV, 2017.

Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
Loshchilov, I. and Hutter, F. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. In Sixth International Conference on Learning Representations (ICLR), 2018.

Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.

Raissi, M., Perdikaris, P., and Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

Schuster, G. T. Seismic inversion. Society of Exploration Geophysicists, 2017.

Sun, J., Innanen, K. A., and Huang, C. Physics-guided deep learning for seismic inversion with hybrid training and uncertainty analysis. Geophysics, 86(3):R303–R317, 2021.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.

Virieux, J. and Operto, S. An overview of full-waveform inversion in exploration geophysics. Geophysics, 74(6):WCC1–WCC26, 2009.

Wu, Y. and Lin, Y. InversionNet: An efficient and accurate data-driven full waveform inversion. IEEE Transactions on Computational Imaging, 6:419–433, 2019.

Yang, F. and Ma, J. Deep-learning inversion: A next-generation seismic velocity model building method. Geophysics, 84(4):R583–R599, 2019.

Zeng, Q., Feng, S., Wohlberg, B., and Lin, Y. InversionNet3D: Efficient and scalable learning for 3D full waveform inversion. arXiv preprint arXiv:2103.14158, 2021.

Zhang, Z. and Lin, Y. Data-driven seismic waveform inversion: A study on the robustness and generalization. IEEE Transactions on Geoscience and Remote Sensing, 58(10):6900–6913, 2020.

Zhang, Z., Wu, Y., Zhou, Z., and Lin, Y. VelocityGAN: Subsurface velocity image estimation using conditional adversarial networks. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 705–714. IEEE, 2019.
A. Appendix

A.1. Inversion Results of Different Datasets

Figure 6. Illustration of results evaluated on Kimberlina Leakage (columns: Ground Truth, InversionNet, InvLINT (Ours)).

Figure 7. Illustration of results evaluated on Marmousi (columns: Ground Truth, InversionNet, InvLINT (Ours)).

Figure 8. Illustration of results evaluated on Salt (columns: Ground Truth, InversionNet, InvLINT (Ours)).

Figure 9. Illustration of results evaluated on Kimberlina Reservoir (columns: Ground Truth, InversionNet, InvLINT (Ours)).

A.2. Ablation Test on Kimberlina Leakage

The ablation test results on Kimberlina Leakage are shown in Tables 6–10.

Dataset: Kimberlina Leakage
Encoder Kernel                       MAE↓    MSE↓      SSIM↑
sin(nπt) 1(x)/(x_max − x_min)        8.13    1534.60   0.9812
sin(nπt) sin(nπx)                    11.07   3227.71   0.9783
sin(nπt) cos(nπx)                    8.88    2015.23   0.9804
cos(nπt) sin(nπx)                    10.95   3222.21   0.9782
sin(nπ(x + t))                       8.17    1751.89   0.9815
sin(nπt) + sin(nπx)                  8.10    1760.43   0.9817

Table 6. Quantitative results for different encoder kernels.

Dataset: Kimberlina Leakage
Decoder Kernel                       MAE↓    MSE↓       SSIM↑
Gaussian                             8.13    1534.60    0.9812
Sinc                                 8.90    2051.99    0.9789
Gaussianσ                            8.84    2042.94    0.9790
sin(nπx) sin(nπz)                    15.41   7357.48    0.9764
cos(nπx) sin(nπz)                    15.40   7349.02    0.9764
sin(nπx) cos(nπz)                    15.37   7252.45    0.9765
sin(nπ(x + z))                       12.86   4721.48    0.9767
sin(nπx) + sin(nπz)                  13.21   74719.95   0.9764

Table 7. Quantitative results for different decoder kernels.

Dataset: Kimberlina Leakage
#Kernels              MAE↓   MSE↓      SSIM↑
N=2048; M=512         8.13   1534.60   0.9812
N=1024; M=512         8.63   1946.62   0.9811
N=4096; M=512         8.29   1780.75   0.9808
N=2048; M=128         8.76   2007.14   0.9805
N=2048; M=1024        8.59   1898.95   0.9808

Table 8. Quantitative results for different numbers of kernels.

Dataset: Kimberlina Leakage
Architecture                MAE↓    MSE↓      SSIM↑
Transformer×1 + Linear*     8.13    1534.60   0.9812
Conv×2 + Linear             13.42   2447.81   0.9762
Deconv×1 + Linear           21.32   4919.03   0.9648
(Deconv + Conv)×5           6.86    1462.29   0.9841
(Up + Conv)×5               6.87    1516.80   0.9840

Table 9. Quantitative results for different decoder architectures. (*) indicates the default decoder option.

Dataset: Kimberlina Leakage
Architecture             MAE↓   MSE↓      SSIM↑
1-layer Transformer      8.13   1534.60   0.9812
Multi-Linear             8.14   1799.70   0.9811
2-layer Transformer      8.24   1781.50   0.9812
3-layer Transformer      8.09   1707.53   0.9813

Table 10. Quantitative results for a larger decoder.

A.3. Regression Results for the Encoder Linear Layer

We also show here the regression results of the linear layer in our encoder on different datasets in Table 11. As a reference, we also show the range and mean of the regression target values as y_max − y_min and |y_mean|. The results demonstrate how well the regressions are fitted.

Dataset                Set            MAE↓     MSE↓        y_max − y_min   |y_mean|
Kimberlina Leakage     Training set   2.83     45.48       884.4           490.93
                       Test set       4.63     245.09      885.27          492.65
Marmousi               Training set   4.26     33.38       107.05          4.92
                       Test set       4.29     33.9        103.01          5.1
Salt                   Training set   0.28     0.46        51.3            12.11
                       Test set       0.48     1.98        49.35           11.97
Kimberlina Reservoir   Training set   169.25   2607.27     26497.1         7197.4873
                       Test set       212.64   109288.08   26496.956       6849.95

Table 11. Regression results of the encoder linear layer on different datasets.
