An Intriguing Property of Geophysics Inversion
Yinan Feng¹, Yinpeng Chen², Shihang Feng¹, Peng Jin³,¹, Zicheng Liu², Youzuo Lin¹
arXiv:2204.13731v2 [cs.LG] 16 Jun 2022

Abstract
Inversion techniques are widely used to reconstruct subsurface physical properties (e.g., velocity, conductivity) from surface-based geophysical measurements (e.g., seismic, electric/magnetic (EM) data). The problems are governed by par- … version problems is challenging due to the ill-posedness and high computational cost. To alleviate those issues, recent studies leverage deep

1. Introduction
[Figure 1. Panels: Geophysical Measurements u(x, t) with axes x and t; Geophysical properties y(x, z) with axes x and z. Labels: U_n = ∬ u(x, t) Φ_n(x, t) dt dx, Y_m = ∬ y(x, z) Ψ_m(x, z) dx dz, vectors U and Y.]
The latter methods (i.e., data-driven approaches) (Wu & Lin, 2019), inspired by the image-to-image translation task, employ encoder-decoder convolutional neural networks (CNNs) to learn the mapping between physical measurements and geophysical properties. A deep network architecture that involves multiple convolution blocks is employed as both encoder and decoder, which also results in heavy reliance on data and very high computational cost in training.

In this paper, we find an intriguing property of geophysics inversion that can significantly simplify data-driven methods:

Geophysical measurements (e.g., seismic data) and geophysical properties (e.g., velocity maps) have a near-linear relationship in high-dimensional space after integral transform.

Let u(x, t) denote a spatio-temporal geophysical measurement along the horizontal x and time t dimensions, and y(x, z) denote a 2D geophysical property along horizontal x and depth z. Since, in practice, geophysical measurements are mostly collected at the surface, and people want to invert the subsurface geophysical property, the measurement u only contains the spatial variable x, while the property y includes (x, z). As illustrated in Figure 1, the proposed property can be mathematically represented as follows:

$$
\begin{aligned}
U &= [U_1, \dots, U_N]^T, & U_n &= \iint u(x, t)\,\Phi_n(x, t)\,dx\,dt, \\
Y &= [Y_1, \dots, Y_M]^T, & Y_m &= \iint y(x, z)\,\Psi_m(x, z)\,dx\,dz, \\
Y &\approx A\,U,
\end{aligned}
\tag{1}
$$

where Φ_n and Ψ_m are kernels for integral transforms. After applying integral transforms, both the geophysical measurement u(x, t) and the property y(x, z) are projected into high-dimensional space (denoted as U and Y), and they have a near-linear relationship (Y ≈ A U). Note that the kernels ({Φ_n}, {Ψ_m}) are not learnable, but well-known analytical kernels like sine, Fourier, or Gaussian.

Interestingly, this intriguing property can significantly simplify the encoder-decoder architecture in data-driven methods. The encoder only contains the integral with kernel {Φ_n} followed by a linear layer with weight matrix A in Eq. 1. The decoder just uses a single transformer (Vaswani et al., 2017) block followed by a linear projection to reverse the integral with kernels {Ψ_m}. This results in a much shallower architecture. In addition, the encoder and decoder are learnt separately. The matrix A in the encoder can be directly solved by pseudo inverse and is frozen afterward. Only the transformer block and the following linear layer in the decoder are learnt via an SGD-based optimizer.

Our method, named InvLINT (Inversion via LINear relationship between INTegrals), achieves comparable (or even better) performance on two geophysics inversion problems (seismic full waveform inversion and electric/magnetic inversion) over four datasets (Kimberlina Leakage (Jordan & Wagoner, 2017), Marmousi (Feng et al., 2021), Salt (Yang & Ma, 2019), and Kimberlina-Reservoir (Alumbaugh et al., 2021)), but uses significantly fewer parameters than prior works. For instance, on Marmousi, our model only needs 1/20 of the parameters of the previous InversionNet.

2. Background

The governing equation of seismic full waveform inversion is the acoustic wave equation (Schuster, 2017),

$$
\nabla^2 p(r, t) - \frac{1}{c^2(r)} \frac{\partial^2}{\partial t^2} p(r, t) = s(r, t),
\tag{2}
$$

where r = (x, z) represents the spatial location in Cartesian coordinates (x is the horizontal direction and z is the depth), t denotes time, c(r) is the velocity map, p(r, t) represents the pressure wavefield, ∇² is the Laplacian operator, and s(r, t) is the source term that specifies the location and time history of the source.

For EM forward modeling, the governing equations are Maxwell's equations (Commer & Newman, 2008),

$$
\sigma E - \nabla \times H = -J,
\tag{3}
$$
$$
\nabla \times E + i \omega \mu_0 H = -M,
$$

where E and H are the electric and magnetic fields, J and M are the electric and magnetic sources, σ is the electrical conductivity, and µ_0 = 4π × 10^−7 Ω · s/m is the magnetic permeability of free space.

3. Methodology

In this section, we use seismic full waveform inversion (from seismic data to velocity) as an example to illustrate our derivation of the linear property after integral transforms. We will also show the encoder-decoder architecture based on this linear property. Empirically, our solution is also applicable to EM inversion (from EM data to conductivity).

3.1. Near-Linear Relationship between Integral Transformations

In the following part, we show that the seismic data and velocity maps have a near-linear relation after integral transformation, in the format of Equation 1. The seismic data p and velocity map c are governed by the wave equation (Equation 2). Note that the seismic data p and velocity map c in the wave equation correspond to the input u and output y in Equation 1, respectively.

Rewriting wave equation in Fourier series: Similar to constant-coefficient PDEs, we assume spatial variable
Published as a conference paper at ICML 2022
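As a rough illustration of Equation 1, the sketch below fits a matrix A so that Y ≈ A U from paired transformed samples, using the closed-form ridge solution (UᵀU + αI) Aᵀ = UᵀY. It is only a sketch under stated assumptions: the function names (`fit_linear_map`, `apply_map`), the list-of-lists data layout, and the small Gauss-Jordan solver are illustrative choices, not the authors' implementation. With α = 0 and a well-conditioned system it reduces to the least-squares (pseudo-inverse) solution.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b (a square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(U, Y, alpha=1.0):
    """Fit A (M x N) so that Y_i ~= A U_i for every sample i.

    U: list of S transformed measurements, each of length N.
    Y: list of S transformed properties, each of length M.
    alpha: ridge regularization strength (hypothetical default).
    """
    Ut = transpose(U)                 # N x S
    gram = matmul(Ut, U)              # N x N, equals U^T U
    for i in range(len(gram)):
        gram[i][i] += alpha           # U^T U + alpha * I
    rhs = matmul(Ut, Y)               # N x M, equals U^T Y
    return transpose(solve(gram, rhs))


def apply_map(A, u):
    """Compute A @ u for a single transformed measurement u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]
```

Once fitted, `apply_map(A, u)` gives the predicted transformed property for a new transformed measurement; in the paper's description this linear layer is then frozen and only the decoder is trained.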
Table 1. Quantitative results evaluated on four datasets in terms of MAE, MSE and SSIM, the number of parameters and FLOPs. The
percentages indicate the ratio of #Parameters (FLOPs) required by InvLINT to that required by InversionNet. Our InvLINT achieves
comparable (or even better) inversion accuracy compared to InversionNet and VelocityGAN with a much smaller number of
parameters and FLOPs.
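The two pixel-wise metrics reported in Table 1, MAE and MSE, can be sketched as follows. The paper's evaluation section states that properties are denormalized to their original scale before these metrics are computed; the linear min-max mapping from [−1, 1] used here, the function names, and the flat-list layout are assumptions made for illustration.

```python
def denormalize(values, lo, hi):
    """Map values from the normalized range [-1, 1] back to [lo, hi].

    Assumes a linear min-max normalization; lo and hi are the
    property bounds of the dataset (e.g., velocity in m/s).
    """
    return [(v + 1.0) / 2.0 * (hi - lo) + lo for v in values]


def mae(pred, target):
    """Mean absolute error over flattened pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse(pred, target):
    """Mean squared error over flattened pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

For example, with velocity bounds of 1,500 m/s and 4,700 m/s, a normalized value of 0 maps to 3,100 m/s, so errors are expressed in physical units rather than in the normalized scale.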
resolution dataset. This dataset contains 30K pairs of seismic data and velocity maps. 24K samples are set as the training set, 3K samples are used as the validation set, and the rest are the testing set. The size of the velocity map is 70 × 70, with a 10-meter grid size in both directions. The velocity ranges from 1,500 m/s to 4,700 m/s. There are S = 5 sources placed evenly with a spacing of 170 m. The source frequency is 20 Hz. The seismic data are recorded by 70 receivers with a receiver interval of 10 m. Each receiver collects 1,000-timestep data for 1 second.

Salt: The dataset contains 140 velocity maps (Yang & Ma, 2019). We downsample it to 40 × 60 with a grid size of 10 m, and the splitting strategy 120/10/10 is applied. The velocity ranges from 1,500 m/s to 4,500 m/s. There are also S = 5 sources used, with a 12-Hz source frequency and a spacing of 150 m. The seismic data are recorded by 60 receivers, also with an interval of 10 m. Each receiver collects 600-timestep data for 1 second.

Kimberlina-Reservoir: The geophysical properties were also developed under DOE's NRAP. The dataset is based on a potential CO2 storage site in the Southern San Joaquin Basin of California (Alumbaugh et al., 2021). We use this dataset to test our method on the EM inversion problem. It contains 780 EM data samples as the geophysical measurements, with the corresponding conductivity as the geophysical property. We use a 750/30 split for training and testing. EM data are simulated by the finite-difference method (Commer & Newman, 2008) with two source locations at x = 2.5 km, z = 3.025 km and x = 4.5 km, z = 2.5 km. There are S = 8 source frequencies from 0.1 to 8.0 Hz, and the data are recorded with their real and imaginary parts. The conductivity has a size of 351 × 601 (H × W), where the grid is 10 m in all dimensions.

4.1.2. Training Details

The input seismic data and EM data are normalized to the range [-1, 1]. In practice, to supply more information, multiple sources are always used for measurement, where s ∈ [1, …, S] is the index of different sources. After integration, all source vectors are concatenated. For the seismic data, we use Φ_n(x, t) = sin(nπt) 1(x)/(x_max − x_min) as the kernel function. However, for the EM data, since the raw data are already in the frequency domain and the input size is small, we skip the integral transformation step.

The Gaussian kernel can be represented as

$$
\Psi_m(x, z) = \exp\left( -\frac{\|(x, z) - \mu_m\|_2^2}{2\sigma^2} \right).
$$

We let µ_m distribute evenly over the output shape. Then, σ is set equal to the distance between adjacent µ. When applying Ridge regression to solve the linear layer in the encoder, we set the regularization parameter α = 1.

We employ the AdamW (Loshchilov & Hutter, 2018) optimizer with momentum parameters β1 = 0.5, β2 = 0.999 and a weight decay of 1 × 10^−4 to update the decoder parameters of the network. The initial learning rate is set to 1 × 10^−3, and we decay the learning rate with cosine annealing (Loshchilov & Hutter, 2016), where T0 = 5, Tmult = 2 and the minimum learning rate is set to 1 × 10^−3. The size of every mini-batch is set to 128. We implement our models in PyTorch and train them on 1 NVIDIA Tesla V100 GPU.

4.1.3. Evaluation Metrics

We apply three metrics to evaluate the geophysical properties generated by our method: MAE, MSE and Structural Similarity (SSIM). In the existing literature (Wu & Lin, 2019; Zhang & Lin, 2020), both MAE and MSE have been employed to measure the pixel-wise error. SSIM is also considered to measure the perceptual similarity (Jin et al., 2022), since both velocity maps and conductivity have highly structured information, and degradation or distortion can be easily perceived by a human. Note that when calculating MAE and MSE, we denormalize geophysical properties to their original scale, while we keep them in the normalized scale [−1, 1] for SSIM according to the algorithm.

Moreover, we also employ two common metrics to measure the complexity and computational cost of the model: the number of parameters (#Parameters) and floating-point operations (FLOPs).

4.2. Main Results

Table 1 shows the comparison results of our method with InversionNet on different datasets. Overall, our method achieves comparable or even better performance with a smaller number of parameters and lower FLOPs. Below, we provide a detailed comparison on all four datasets. It may be worth mentioning that FWI is a quantitative inversion technique, meaning that it will yield both the shape and the quantitative values of the subsurface property.

Results on Kimberlina-Leakage: Compared to InversionNet and VelocityGAN, our method is better in MAE and slightly worse in MSE and SSIM. However, our InvLINT needs less than 1/10 of the parameters and FLOPs. This demonstrates the power of our model, and further validates the property we found. The velocity maps inverted by our method and InversionNet are shown in the first two rows of Figure 3. In the second example, despite some noise produced by our method in the background, the CO2 leakage plume (the most important region, boxed in green) has been very well imaged. Compared to the ground truth, our method yields even better quantitative values than those obtained by InversionNet.

Results on Marmousi: Marmousi is a more challenging dataset due to its more complex structure. Compared to InversionNet, our method is better in all three metrics with significantly less computational and memory cost (about 1/20 of the parameters and FLOPs). This result again demonstrates not only the power of our model but also the validity of the near-linear relationship that we found. However, on such a large and complex dataset, VelocityGAN outperforms the others, as the GAN structure helps generate better results. The velocity maps inverted by our method and InversionNet are illustrated in the third and fourth rows of Figure 3. Our InvLINT and InversionNet perform comparably in both the shallow and deep regions compared to the ground truth.

Results on Salt: Compared to InversionNet, our method is better in MAE, and is slightly worse in MSE and SSIM with a very small gap. Moreover, our method uses 1/8 of the parameters and 1/5 of the FLOPs of InversionNet. Note that, on this challenging dataset, which only has a small number of samples, VelocityGAN cannot converge well and yields bad results. This is a side effect of its complex structure. The velocity maps inverted by our method and InversionNet are illustrated in the fifth and sixth rows of Figure 3. Consistent with the quantitative results, both methods generate similar results. In the shallow region, our method outputs a slightly clearer structure; but in a deeper region (e.g., the red region in the first example), the output of InversionNet is a little closer to the ground truth. However, the overall difference can be hard to distinguish. Our method achieves comparable results with much less complexity.

Results on Kimberlina-Reservoir: Compared to InversionNet and VelocityGAN, our method is better in all three metrics, with 1/2 of the parameters and 1/12 of the FLOPs of InversionNet. Because of the compact input, all models use a much smaller number of parameters. However, due to the simple architecture, InvLINT requires significantly fewer FLOPs but achieves better inversion accuracy. The conductivity results inverted by different models are shown in the last two rows of Figure 3. Contrary to the previous results on the Kimberlina-Leakage dataset, our model yields clearer results. In the first example, we can see that the outputs of our model are less noisy; and in the second case, InvLINT inverts the deep region more precisely, as highlighted by the red squares. This is also consistent with the quantitative results.

At the same time, we find that the number of parameters of our model varies less for the same inverse problem. The number of model parameters is relatively independent of data size. In contrast, the previous methods are greatly affected by the input and output sizes. Moreover, our model not only requires fewer parameters, but also enables more efficient training and inference. When training on the Marmousi dataset using 1 GPU (NVIDIA Quadro RTX 8000), our model is 9 times faster than InversionNet/VelocityGAN (1 hour vs. 9 hours). We also tested inference runtime with batch size 1 on a single thread of an Intel(R) Xeon(R) CPU Gold 6248 v3 (2.5 GHz). Our model is 16 times faster than InversionNet/VelocityGAN (5 ms vs. 80 ms). The small model size is suitable for memory-limited mobile devices. More visualization results are provided in the Appendix for readers who might be interested.

4.3. Ablation Tests

In this part, we test how different kernel functions and network architectures influence the performance of our method. We put our default setting in the first row of each table. For ease of illustration, we only provide results on Marmousi. Results on Kimberlina Leakage are given in the Appendix.
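For reference when reading the kernel ablations, the default Gaussian kernel described in the training details (centers µ_m spread evenly over the output, σ equal to the spacing of adjacent centers) can be sketched as below. The grid layout starting at index 0, the unit grid spacing in the sum, and all function names are assumptions for illustration only, not the authors' code.

```python
import math


def gaussian_centers(height, width, step):
    """Place centers mu_m = (x, z) on an even grid with the given step.

    Starting the grid at index 0 is an assumption; the paper only
    says the centers are distributed evenly over the output shape.
    """
    return [(x, z) for z in range(0, height, step) for x in range(0, width, step)]


def gaussian_kernel(x, z, mu, sigma):
    """Psi_m(x, z) = exp(-||(x, z) - mu_m||^2 / (2 sigma^2))."""
    dx, dz = x - mu[0], z - mu[1]
    return math.exp(-(dx * dx + dz * dz) / (2.0 * sigma * sigma))


def project(y, centers, sigma):
    """Discretized Y_m = sum over the grid of y[z][x] * Psi_m(x, z).

    y is a 2D list indexed as y[z][x]; unit grid spacing is assumed.
    """
    return [
        sum(
            y[z][x] * gaussian_kernel(x, z, mu, sigma)
            for z in range(len(y))
            for x in range(len(y[0]))
        )
        for mu in centers
    ]
```

With a 70 × 70 output and a step of 10 grid points, this layout gives 49 centers, and σ would be set to 10 so that neighboring kernels overlap.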
Different Encoder Kernels
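The default encoder kernel from the training details, Φ_n(x, t) = sin(nπt) 1(x)/(x_max − x_min), can be sketched as a discretized integral transform. This is a minimal sketch under stated assumptions: time is rescaled to [0, 1], the integral over t uses a midpoint rule, the factor 1(x)/(x_max − x_min) is approximated by the mean over receivers, and the function name and data layout are hypothetical.

```python
import math


def sine_transform(u, n_kernels):
    """Approximate U_n = double integral of u(x, t) * Phi_n(x, t) dx dt.

    u[i][j] is the sample at receiver i and time step j. Time is
    assumed to be rescaled to [0, 1] and sampled at midpoints.
    """
    n_x, n_t = len(u), len(u[0])
    dt = 1.0 / n_t
    # Integrating 1(x) / (x_max - x_min) over x is the mean over receivers.
    mean_trace = [sum(u[i][j] for i in range(n_x)) / n_x for j in range(n_t)]
    coeffs = []
    for n in range(1, n_kernels + 1):
        total = 0.0
        for j in range(n_t):
            t = (j + 0.5) * dt  # midpoint of the j-th time bin
            total += mean_trace[j] * math.sin(n * math.pi * t) * dt
        coeffs.append(total)
    return coeffs
```

Because the transform is a weighted sum of the input samples, it is linear in u, which is what allows a subsequent linear layer to act directly on the coefficients. Swapping `math.sin(n * math.pi * t)` for another analytical kernel is one way to picture the encoder-kernel variants compared in this ablation.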
and optimization per sample, resulting in slow processing (e.g., 4 hours per sample in Kimberlina-Leakage). Due to the limited rebuttal duration, we ran the comparison over 5 samples per dataset.

5. Related works

5.1. Data-driven Methods for FWI

Recently, a new type of method based on deep learning has been developed. Araya-Polo et al. (2018) use a fully connected network to invert velocity maps in FWI. Wu & Lin (2019) consider FWI as an image-to-image translation problem, and employ an encoder-decoder CNN to solve it. By using generative adversarial networks (GANs) and transfer learning, Zhang et al. (2019) achieved improved performance. Zeng et al. (2021) present an efficient and scalable encoder-decoder network for 3D FWI. Feng et al. (2021) develop a multi-scale framework with two convolutional neural networks to reconstruct the low- and high-frequency components of velocity maps. A thorough review on deep learning for FWI can be found in Adler et al. (2021).

5.2. Physics-informed machine learning

Previous pure data-driven methods can be considered as incorporating physics information in the training data. On the other hand, integrating physics knowledge into the loss function or network architecture is another direction. All of them are called physics-informed neural networks (PINN). Raissi et al. proposed utilizing nonlinear PDEs in the loss function as a soft constraint (Raissi et al., 2019). Through a hard constraint projection, Chen et al. proposed a framework to ensure that the model's predictions strictly conform to physical mechanisms (Chen et al., 2021). Based on the universal approximation theorem of operators, Lu et al. (2021) proposed DeepONet to learn continuous operators or complex systems. Sun et al. (2021) proposed a hybrid network design, which involves deterministic, physics-based modeling and data-driven deep learning. A comprehensive review of PINN can be found in Karniadakis et al. (2021).

6. Conclusion

In this paper, we find an intriguing property of geophysics inversion: a near-linear relationship between the input and output, after applying integral transform in high-dimensional space. Furthermore, this property can be easily turned into a light-weight encoder-decoder network for inversion. The encoder contains the integration of seismic data and the linear transformation without fine-tuning. The decoder consists of a single transformer block to reverse the integral of velocity with Gaussian kernels. Experiments show that this interesting property holds for two geophysics inversion problems over four different datasets. Compared to the much deeper InversionNet, our method achieves comparable accuracy, but uses significantly fewer parameters.

References

Adler, A., Araya-Polo, M., and Poggio, T. Deep learning for seismic inverse problems: Toward the acceleration of geophysical analysis workflows. IEEE Signal Processing Magazine, 38(2):89–119, 2021.

Alumbaugh, D., Commer, M., Crandall, D., Gasperikova, E., Feng, S., Harbert, W., Li, Y., Lin, Y., Manthila Samarasinghe, S., and Yang, X. Development of a multi-scale synthetic data set for the testing of subsurface CO2 storage monitoring strategies. In American Geophysical Union (AGU), 2021.

Araya-Polo, M., Jennings, J., Adler, A., and Dahlke, T. Deep-learning tomography. The Leading Edge, 37(1):58–66, 2018.

Chen, Y., Huang, D., Zhang, D., Zeng, J., Wang, N., Zhang, H., and Yan, J. Theory-guided hard constraint projection (HCP): A knowledge-based data-driven scientific machine learning method. Journal of Computational Physics, 445:110624, 2021.

Commer, M. and Newman, G. A. New advances in three-dimensional controlled-source electromagnetic inversion. Geophysical Journal International, 172(2):513–535, 2008.

Feng, S. and Schuster, G. T. Transmission + reflection anisotropic wave-equation traveltime and waveform inversion. Geophysical Prospecting, 67(2):423–442, 2019.

Feng, S., Fu, L., Feng, Z., and Schuster, G. T. Multiscale phase inversion for vertical transverse isotropic media. Geophysical Prospecting, 69(8-9):1634–1649, 2021.

Jin, P., Zhang, X., Chen, Y., Huang, S. X., Liu, Z., and Lin, Y. Unsupervised learning of full-waveform inversion: Connecting CNN and partial differential equation in a loop. In Proceedings of the Tenth International Conference on Learning Representations (ICLR), 2022.

Jordan, P. and Wagoner, J. Characterizing construction of existing wells to a CO2 storage target: The Kimberlina site, California. Technical report, National Energy Technology Laboratory (NETL), Pittsburgh, PA, Morgantown, WV, 2017.

Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, 2021.
Loshchilov, I. and Hutter, F. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021.

adversarial networks. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 705–714. IEEE, 2019.
[Figure panel titles: Ground Truth, InversionNet, InvLINT (Ours)]