Seismic Inversion by Newtonian Machine Learning: Yuqing Chen and Gerard T. Schuster
King Abdullah University of Science and Technology, Department of Earth Science and Engineering, Thuwal 23955-6900, Saudi Arabia. E-mail: yuqing.chen@kaust.edu.sa (corresponding author); gerard.schuster@kaust.edu.sa.
10.1190/GEO2019-0434.1
INTRODUCTION

Full-waveform inversion (FWI) has been shown to accurately invert seismic data for high-resolution velocity models (Lailly and Bednar, 1983; Tarantola, 1984; Virieux and Operto, 2009). However, the success of FWI heavily relies on an initial model that is close to the true model; otherwise, cycle-skipping problems will trap the FWI in a local minimum (Bunks et al., 1995).

To mitigate the cycle-skipping problem, Bunks et al. (1995) propose a multiscale inversion approach that initially inverts low-pass-filtered seismic data and then gradually admits higher frequencies as the iterations proceed. AlTheyab and Schuster (2015) remove the mid- and far-offset cycle-skipped seismic traces before inversion and gradually incorporate them into the iterative solutions as the inverted velocity model becomes closer to the true model. Alternatively, Wu et al. (2014) use the envelope of the seismic traces to invert for the subsurface model because they claim that the envelope carries the ultra-low-frequency information of the seismic data. Ha and Shin (2012) invert the data in the Laplace domain, which is less sensitive to the lack of low frequencies than conventional FWI. Sun and Schuster (1993), Fu et al. (2018), and Chen et al. (2019) use an amplitude replacement method to focus the inversion on reducing the phase mismatch instead of the waveform mismatch. In addition, they use a multiscale approach by temporally integrating the traces to boost the low frequencies and mitigate cycle-skipping problems, and then they gradually introduce the higher frequencies as the iterations proceed.
Nonlinear inversion often gets stuck in a local minimum because the objective function is very complex and is characterized by many local minima. To avoid this problem, Luo and Schuster (1991a, 1991b) suggest a skeletonized inversion method that combines the skeletonized representation of seismic data with the implicit function theorem to accelerate convergence to the vicinity of the global minimum (Lu et al., 2017). Simplification of the data by skeletonization reduces the complexity of the misfit function as well as the number of local minima. Examples of wave-equation inversion of skeletonized data include the following. Luo and Schuster (1991a, 1991b) use the solutions to the wave equation to invert the first-arrival traveltimes for the low-to-intermediate wavenumber details of the background velocity model. Feng and Schuster (2019) use the traveltime misfit function to invert for the subsurface velocity and anisotropic parameters in a vertical transverse isotropic medium. Instead of minimizing the traveltime misfit function, Li et al. (2016) find the optimal S-velocity model that minimizes the differences between the observed and predicted dispersion curves associated with surface waves. Liu et al. (2018) extend 2D dispersion inversion of surface waves to the 3D case. Li et al. (2018) invert the data recorded over near-surface waveguides using the dispersion-curve misfit function. Instead of inverting for the velocity model, Dutta and Schuster (2016) develop a wave-equation inversion method that inverts for the subsurface Qp distribution. Here, they find the optimal Qp model by minimizing the misfit between the observed and the predicted peak/centroid-frequency shifts of the early arrivals. Similarly, Li et al. (2017) use the peak-frequency shift of the surface waves to invert for the Qs model. A tutorial for skeletonized inversion is given in Lu et al. (2017).

One of the key problems with skeletonized inversion is that the skeletonized data must be picked from the original data, which can be labor intensive for large data sets. To overcome this problem, we propose computing the skeletonized data with an autoencoder and then using solutions to the wave equation to invert such data for the model of interest (Schuster, 2018). The skeletonized data correspond to the latent codes in the latent space of the autoencoder, which has a reduced dimension and retains significant information related to the model.

The autoencoder neural network is an unsupervised machine-learning method that is trained for dimensionality reduction (Schmidhuber, 2015). An autoencoder maps the data into a lower dimensional space by extracting the data's most important features. It encodes the original data into a condensed representation, also denoted as the skeletonized representation, of the input data. The input data can be reconstructed by a decoder applied to the encoded latent-space vector.

In this paper, we first use the observed seismic traces as the training set to train the autoencoder neural network. Once the autoencoder is well trained, we feed the observed and synthetic traces into the autoencoder to get the corresponding low-dimension representations of the seismic data. We compute the misfit function as the sum of the squared differences between the observed and the predicted encoded values. To calculate the gradient with respect to the model parameters, such as the velocity in each pixel, we use the implicit function theorem to compute the perturbation of the skeletonized information with respect to the velocity. The high-level strategy for inverting the skeletonized latent variables is summarized in Figure 1, where L corresponds to the forward modeling operator of the governing equations, such as the wave equation. Any machine-learning method, such as principal component analysis (PCA), a variational autoencoder (VAE), or a regularized autoencoder, can be used to approximate the original data by a lower dimensional representation.

Figure 1. The strategy for inverting the skeletonized latent variables.

This paper is organized into four sections. After the introduction, we explain the theory of the Newtonian machine-learning (NML) inversion method. This theory includes the formulation first presented in Luo and Schuster (1991a, 1991b), where the implicit function theorem is used to employ numerical solutions to the wave equation for generating the Fréchet derivative of the skeletal data. Then, we present the numerical results for the synthetic data and field data. The last section provides a discussion and a summary of our work and its significance.

THEORY

Conventional FWI inverts for the subsurface velocity distribution by minimizing the $l_2$ norm of the waveform difference between the observed and synthetic data. However, this misfit function is highly nonlinear, and the iterative solution often gets stuck in a local minimum (Bunks et al., 1995). To mitigate this problem, skeletonized inversion methods simplify the objective function by combining the skeletonized representation of data, such as the traveltimes, with the implicit function theorem, to give a gradient optimization method that quickly converges to the vicinity of the global minimum. Instead of manually picking the skeletonized data, we allow the unsupervised autoencoder to generate such picks.
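Before the formal derivation, the following minimal Python sketch shows the latent-space misfit described above: both the observed and synthetic traces are encoded, and the objective is the sum of squared differences of their encoded values. The helper `encode` is a hypothetical stand-in for the encoder of a trained autoencoder; this is an illustration of the objective, not the authors' implementation.

```python
import numpy as np

def latent_misfit(encode, d_obs, d_syn):
    """Sum of squared latent-space residuals over all traces.

    encode : callable mapping an (nt,) trace to a latent vector of length C
    d_obs, d_syn : (ntraces, nt) arrays of observed and synthetic traces
                   for the same source-receiver pairs
    """
    z_obs = np.array([encode(t) for t in d_obs])  # (ntraces, C)
    z_syn = np.array([encode(t) for t in d_syn])  # (ntraces, C)
    dz = z_syn - z_obs                            # latent residual
    return 0.5 * np.sum(dz ** 2), dz
```

The residual `dz` plays the role of the skeletal-data residual that drives the gradient derived in the theory section.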
PCA is generally used to represent input data using a smaller dimensional space than is originally present (Hotelling, 1933). However, PCA is a linear operation that is restricted to finding the optimal rotation of the original data axes that maximizes its projections onto the principal component axes. In comparison, an autoencoder with a sufficient number of layers can find almost any nonlinear sparse mapping between the input and output images. A typical autoencoder architecture is shown in Figure 2 and generally includes three parts: the encoder, the latent space, and the decoder.

• Encoder: Unsupervised learning by an autoencoder uses a set of training data consisting of $N$ training samples $\{x^{(1)}, x^{(2)}, \ldots, x^{(N)}\}$, where $x^{(i)}$ is the $i$th feature vector with dimension $D \times 1$ and $D$ represents the number of features for each feature vector. The encoder network, indicated by the pink box in Figure 2, encodes the high-dimension input data $x^{(i)}$ into a low-dimension latent space with dimension $C \times 1$ using a series of neural layers with a decreasing number of neurons; here, $C$ is typically much smaller than $D$. This encoding operation at the first hidden layer can be mathematically described as $z^{(i)} = g(W_1 x^{(i)} + b_1)$, where $W_1$ and $b_1$ represent the network parameters and the vector of bias terms for the first layer, and $g(\cdot)$ indicates the activation function, such as a sigmoid, ReLU, or tanh.

• Latent space: The compressed data $z^{(i)}$ with dimension $C \times 1$ in the latent-space layer (denoted by the green box in Figure 2) lie in the lowest dimensional space in which the input data are reduced and the key information about the data is preserved. The latent space usually has a few neurons, which forces the autoencoder neural network to create effective low-dimensional representations of the high-dimensional input data. These low-dimensional attributes can be used by the decoder to reconstruct the original input.

• Decoder: The decoder portion of the neural network, represented by the purple box, reconstructs the input data from the latent-space representation $z^{(i)}$ by a series of neural network layers with an increasing number of neurons. For a decoder with one hidden layer, the reconstructed data $\tilde{x}^{(i)}$ are calculated by $\tilde{x}^{(i)} = W_2 z^{(i)} + b_2$, where the coefficients of the matrix $W_2$ and the vector $b_2$ represent the network parameters for the decoder layer. The optimal network parameters are found by minimizing the objective function

$$J(W_1, b_1, W_2, b_2) = \sum_{i=1}^{N} \left(\tilde{x}^{(i)} - x^{(i)}\right)^2 = \sum_{i=1}^{N} \left(W_2\, g(W_1 x^{(i)} + b_1) + b_2 - x^{(i)}\right)^2. \tag{1}$$

In practice, training by a preconditioned steepest-descent method is employed with minibatch inputs. The above equations are for a two-layer autoencoder only; however, this representation can be easily extended to the N-layer case.
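As an illustration of equation 1, the following PyTorch sketch builds a two-layer autoencoder with a sigmoid encoder $z = g(W_1 x + b_1)$ and a linear decoder $\tilde{x} = W_2 z + b_2$, and minimizes the summed squared reconstruction error over minibatches. The paper trains with a preconditioned steepest-descent method; the Adam optimizer below is a convenient stand-in, and the sizes `D` and `C` are placeholders.

```python
import torch
import torch.nn as nn

class TwoLayerAutoencoder(nn.Module):
    """Encoder z = g(W1 x + b1); linear decoder x~ = W2 z + b2 (equation 1)."""
    def __init__(self, D, C):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(D, C), nn.Sigmoid())
        self.decoder = nn.Linear(C, D)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def train(model, loader, epochs=100, lr=1e-3):
    """Minimize J = sum_i ||x~(i) - x(i)||^2 over minibatches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # stand-in optimizer
    loss_fn = nn.MSELoss(reduction="sum")
    for _ in range(epochs):
        for (x,) in loader:
            opt.zero_grad()
            x_hat, _ = model(x)
            loss_fn(x_hat, x).backward()
            opt.step()
    return model

# Hypothetical usage:
# ds = torch.utils.data.TensorDataset(torch.as_tensor(traces, dtype=torch.float32))
# loader = torch.utils.data.DataLoader(ds, batch_size=64, shuffle=True)
# model = train(TwoLayerAutoencoder(D=traces.shape[1], C=1), loader)
```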
Skeletonized representation of seismic data by autoencoder

In this section, we show how the autoencoder computes the low-dimensional skeletonized representation of seismic data. The input data consist of seismic traces, each represented by an $n_t \times 1$ vector. Each seismic trace represents one training example in the training set. For the crosswell experiment, there are $N_s$ sources in the source well and $N_r$ receivers in the receiver well. We mainly focus on the inversion of the transmitted arrivals by windowing the input data around the early arrivals.

Figure 3a shows a homogeneous velocity model with a Gaussian anomaly in the center. Figure 3b is the initial velocity model having the same background velocity as the true velocity model. A crosswell acquisition system with two 1570 m deep cased wells separated by 1350 m describes the source and receiver wells. The finite-difference method is used to compute 77 acoustic shot gathers for the observed and synthetic data with a 20 m shot interval. Each shot is recorded with 156 receivers that are evenly distributed along the depth at a spacing of 10 m. To train the autoencoder network, we use the following workflow.

Figure 3. (a) A homogeneous velocity model with a Gaussian velocity anomaly in the center and (b) the homogeneous background model.

1) Construct the training set. For every five observed shots, we randomly select one shot gather as part of the training set, which consists of a total of 2496 training examples, or seismic traces. We did not use all of the shot gathers for training because of the increased computational cost.
2) Data processing. Each seismic trace is Hilbert transformed to get its envelope, and then the transformed data are subtracted by their mean and divided by their variance (see the processing sketch below). Figure 4a and 4b shows a seismic trace before and after processing, respectively.

Figure 4. The (a) original and (b) processed seismic traces.

Figure 6. (a-c) Three shot gathers and (d-f) their corresponding encoded data.
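A minimal sketch of the data processing in step 2, assuming each trace is a 1D NumPy array: the envelope is obtained from the analytic signal via scipy's Hilbert transform and then standardized by removing the mean and dividing by the variance, as stated above.

```python
import numpy as np
from scipy.signal import hilbert

def preprocess_trace(trace):
    """Envelope via the Hilbert transform, then remove the mean and
    divide by the variance (workflow step 2)."""
    env = np.abs(hilbert(trace))   # envelope = |analytic signal|
    env = env - env.mean()
    return env / env.var()
```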
Figure 6a-6c shows three shot gathers, which are not included in the training set; their encoded values are shown in Figure 6d-6f and are the skeletonized representations of the input seismic traces. The encoded values do not have any units and can be considered as a skeletonized attribute of the data.

We compare the traveltime differences and the differences of the latent variables for the observed and synthetic data in Figure 7. The black and red curves represent the observed and synthetic data, respectively. Figure 7b shows larger traveltime differences than Figure 7a and 7c, as its propagating waves are affected more by the Gaussian velocity anomaly than those of the other two shots. However, the misfit functions for the low-dimensional representations of the seismic data exhibit a pattern similar to that of the traveltime misfit functions. Both reveal a large misfit at the traces affected by the velocity anomaly. Similar to the traveltime misfit values, the encoded values are also sensitive to the velocity changes. In this case, we can conclude that (1) the autoencoder network is able to estimate effective low-dimensional representations of the input data and (2) the encoded low-dimensional representations can be used as skeletonized features sensitive to changes in the velocity model.

Figure 7. (a-c) Comparisons of the traveltimes for different shot gathers. (d-f) Comparisons of the encoded values for different shot gathers. The black and red curves represent the observed and synthetic data, respectively.

Theory of the NML inversion

To invert for the velocity model from the skeletonized data, we use the implicit function theorem to compute the perturbation of the skeletonized data with respect to the velocity.

Connective function

A crosscorrelation function is defined as the connective function that connects the skeletonized data with the pressure field. This connective function measures the similarity between the observed and synthetic traces as

$$f_{z_1}(x_r, t; x_s) = \int dt\, p_{z-z_1}(x_r, t; x_s)_{\mathrm{obs}}\, p_z(x_r, t; x_s)_{\mathrm{syn}}, \tag{2}$$

where $p_z(x_r,t;x_s)_{\mathrm{syn}}$ represents a synthetic trace for a given background velocity model recorded at the receiver location $x_r$ due to a source excited at location $x_s$. The subscript $z$ is the skeletonized feature (a low-dimensional representation of the seismic trace) that is encoded by a well-trained autoencoder network. Similarly, $p_{z-z_1}(x_r,t;x_s)_{\mathrm{obs}}$ denotes the observed trace with encoded skeletonized feature equal to $z - z_1$ that has the same source and receiver locations as $p_z(x_r,t;x_s)_{\mathrm{syn}}$, and $z_1$ is the distance between the synthetic and observed latent-space variables.

For an accurate velocity model, the observed and synthetic traces will have the same encoded values in the latent space. Therefore, we seek to minimize the distance between the synthetic and observed latent-space variables. This can be done by finding the shift value $z_1 = \Delta z$ that maximizes the crosscorrelation function in equation 2. If $\Delta z = 0$, it indicates that the correct velocity model has been found and the synthetic and observed traces have the same encoded values in the latent space. The $\Delta z$ that maximizes the crosscorrelation function in equation 2 should satisfy the condition that the derivative of $f_{z_1}(x_r,t;x_s)$ with respect to $z_1$ is equal to zero. Therefore,

$$\dot{f}_{\Delta z} = \left.\frac{\partial f_{z_1}(x_r,t;x_s)}{\partial z_1}\right|_{z_1 = \Delta z} = \int dt\, \dot{p}_{z-\Delta z}(x_r,t;x_s)_{\mathrm{obs}}\, p_z(x_r,t;x_s)_{\mathrm{syn}} = 0, \tag{3}$$

where $\dot{p}_{z-\Delta z}(x_r,t;x_s)_{\mathrm{obs}} = \left.\partial p_{z-z_1}(x_r,t;x_s)_{\mathrm{obs}}/\partial z_1\right|_{z_1=\Delta z}$. Equation 3 is the connective function that acts as an intermediate equation to connect the seismogram with the skeletonized data, which are the encoded values of the seismograms (Luo and Schuster, 1991a, 1991b). Such a connective function is necessary because there is no wave equation that relates the skeletonized data to a single type of model parameter (Dutta and Schuster, 2016). The connective function will later be used to derive the derivative of the skeletonized data with respect to the velocity.
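The maximizing shift $\Delta z$ in equations 2 and 3 can be estimated with a simple scan over candidate latent shifts. The sketch below uses a trained decoder as a proxy for the latent-shifted observed trace; `decode` is a hypothetical helper, and in practice $\Delta z$ can also be read off directly as the difference between the synthetic and observed encoded values.

```python
import numpy as np

def find_dz(decode, z_obs, trace_syn, dz_grid):
    """Scan candidate latent shifts and return the one maximizing the
    zero-lag crosscorrelation of equation 2 (a sketch, not the authors'
    implementation)."""
    best_dz, best_corr = 0.0, -np.inf
    for z1 in dz_grid:
        p_shift = decode(z_obs + z1)         # trace with latent code z_obs + z1
        corr = np.sum(p_shift * trace_syn)   # crosscorrelation over time
        if corr > best_corr:
            best_dz, best_corr = z1, corr
    return best_dz                           # = z_syn - z_obs at the optimum
```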
For the first-order acoustic wave equation, the Fréchet derivative $\partial p/\partial v$ can be computed explicitly, so that equation 10 can be rewritten as

$$\begin{aligned} \gamma(x) &= -2\rho v \sum_s \sum_r \int dt\, \nabla\cdot\mathbf{v}(x,t;x_s)\,\big(g_p(x_r,-t;x,0)\,\Delta p_z(x_r,t;x_s)\big) \\ &= -2\rho v \sum_s \int dt\, \nabla\cdot\mathbf{v}(x,t;x_s)\,\sum_r \big(g_p(x_r,-t;x,0)\,\Delta p_z(x_r,t;x_s)\big). \end{aligned}$$

Figure 8. Encoded value misfit versus velocity.

The velocity model is then updated by the steepest-descent formula

$$v(x)^{k+1} = v(x)^k + \alpha_k \gamma(x)^k, \tag{13}$$

where $k$ indicates the iteration number and $\alpha_k$ represents the step length.
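A minimal sketch of the iteration in equation 13, with a backtracking line search for the step length $\alpha_k$; `compute_gradient` and `misfit` are hypothetical wrappers around the gradient $\gamma(x)$ and the encoded-value misfit function.

```python
import numpy as np

def invert(v0, compute_gradient, misfit, n_iter=20, alpha0=1.0):
    """Iterate v^{k+1} = v^k + alpha_k * gamma^k (equation 13)."""
    v = v0.copy()
    for _ in range(n_iter):
        g = compute_gradient(v)            # gamma(x) points downhill in misfit
        m0, alpha = misfit(v), alpha0
        while misfit(v + alpha * g) >= m0 and alpha > 1e-6:
            alpha *= 0.5                   # backtracking line search
        v = v + alpha * g
    return v
```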
Checkerboard tests

We first test the NML method on data generated for checkerboard models with three different acquisition geometries.

The crosswell checkerboard model is shown in Figure 9a. A source well is located at x = 10 m, which includes 89 shots evenly distributed along the well with a shot interval of 20 m. Each shot gather is recorded by 179 receivers evenly deployed along the receiver well located at x = 590 m. A 15 Hz Ricker wavelet is used as the source wavelet. The initial model is the homogeneous model shown in Figure 9b. Figure 9c shows the first iteration of the NML gradient, which …

Figure 9. The (a) true and (b) initial velocity models. (c) The NML gradient after the first iteration.

… method is shown in Figure 10d, which successfully recovers the shallow velocity perturbations visited by the diving waves.

Reflection energy surface geometry checkerboard test
… shown in Figure 11a and 11b, respectively. Each shot is recorded with 239 receivers that are evenly distributed on the surface at a …

Figure 12. The (a) true velocity and (b) initial velocity models.

Figure 14. The (a) FWI, (b) WT, and (c) NML tomograms. (d) The FWI tomogram using the NML tomogram as the initial model.

Figure 16. The (a) true and (b) initial velocity models.

Figure 18. The (a) raw data, (b) data after band-pass filtering, (c) data after tube-wave removal, (d) upgoing waves, (e) data after wavefield separation, and (f) final processed data.
… shown in Figure 17a, and the comparison of their vertical profiles at x = 0.5 and x = 0.8 is shown in Figure 17b and 17c, respectively. The blue, red, and black curves represent the velocity profiles of the initial, true, and inverted velocity models, respectively. They show that the inverted model can only reconstruct the low-wavenumber information in the true velocity model. To get a high-resolution inversion result, a hybrid approach, such as skeletonized inversion + FWI, can be used (Luo and Schuster, 1991a, 1991b).

Friendswood crosswell field data

We now test our method on the Friendswood crosswell field data set. Two 305 m deep cased wells separated by 183 m are used as the source and receiver wells. Downhole explosive charges are fired at intervals of 3 m from 9 to 305 m in the source well, and the receiver well has 96 receivers placed at depths ranging from 3 to 293 m. The seismic data are recorded with a sampling interval of 0.25 ms for a total recording time of 0.375 s. We apply the following processing steps to the raw data, which are similar to the processing workflow in Dutta and Schuster (2014) and Cai and Schuster (1993):

1) The raw data are scaled by $\sqrt{t}$ to correct for the 3D geometric spreading effects. We multiply the data spectrum with the filter $\sqrt{i/\omega}$ to correct the phase (a sketch of this step follows the list).
2) A band-pass filter of 80-400 Hz is applied to the observed data to remove the noise shown in Figure 18a. The filtered data have a peak frequency of 190 Hz.
3) To remove the tube waves shown in Figure 18b, we first flatten the tube waves and then apply a nine-point median filter along the horizontal direction to remove all other arrivals except the tube waves. The filtered tube waves are then shifted back to their original time positions and subtracted from the original data. Figure 18c shows the data after tube-wave removal.
4) Because our goal is velocity inversion rather than imaging, we use an FK method to separate the upgoing waves from the downgoing waves. Figure 18d shows the pure upgoing waves after wavefield separation, and Figure 18e shows the data that contain the downgoing waves only. We interpolate the data to a 0.1 ms sampling interval to ensure numerically stable solutions. A final processed shot gather is shown in Figure 18f.
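A sketch of processing step 1, assuming the shot gather is an (nt, ntraces) array: the $\sqrt{t}$ gain corrects for 3D geometric spreading, and the $\sqrt{i/\omega}$ filter applied in the frequency domain corrects the phase. This is an illustration under those assumptions, not the authors' exact workflow.

```python
import numpy as np

def correct_spreading_and_phase(data, dt):
    """sqrt(t) gain plus sqrt(i/omega) spectral phase filter (step 1)."""
    nt, _ = data.shape
    t = np.arange(nt) * dt
    gained = data * np.sqrt(t)[:, None]            # sqrt(t) amplitude gain
    spec = np.fft.rfft(gained, axis=0)
    omega = 2.0 * np.pi * np.fft.rfftfreq(nt, dt)
    omega[0] = omega[1]                            # avoid division by zero at DC
    spec *= np.sqrt(1j / omega)[:, None]           # sqrt(i/omega) phase filter
    return np.fft.irfft(spec, n=nt, axis=0)
```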
The autoencoder architecture used here is almost the same as in the previous two cases, except that the dimensions of the input and output layers are changed to 3750 × 1. A linearly increasing velocity model is used as the initial model and is shown in Figure 19a. Figure 19b shows the inverted velocity model after 10 iterations. Two high-velocity zones at depth ranges between 85-115 m and 170-300 m appear in the inverted result. However, some source artifacts appear near the source well. Figure 20a shows the encoded value map of the observed data, where the vertical and horizontal axes represent the source and receiver indices, respectively. It shows that the near-offset traces have large positive values and that the encoded values decrease as the offset increases.

Figure 20b and 20c shows the encoded value maps of the seismic data generated from the initial and inverted velocity models, respectively, where the latter map is much more similar to the encoded value map of the observed data. To measure the distance between the true and the initial models, we plot the values of the encoded residuals in Figure 20d. It shows that there are relatively larger residuals at the near-offset traces than at the far-offset traces. However, these residuals are largely reduced with the inverted tomogram, as shown in Figure 20e. This clearly demonstrates that our inverted tomogram is much closer to the true velocity model than the initial model.
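The encoded value maps of Figure 20 can be assembled by encoding one trace per source-receiver pair, as in the following sketch; `encode` is again a hypothetical stand-in for the trained encoder mapping a trace to its scalar latent value.

```python
import numpy as np

def encoded_value_map(encode, data):
    """Build the N_s x N_r map of encoded values (cf. Figure 20).

    data : (ns, nr, nt) array holding one trace per source-receiver pair.
    """
    ns, nr, _ = data.shape
    zmap = np.zeros((ns, nr))
    for i in range(ns):
        for j in range(nr):
            zmap[i, j] = encode(data[i, j])
    return zmap

# Encoded residual map (cf. Figure 20d and 20e):
# residual = encoded_value_map(encode, d_obs) - encoded_value_map(encode, d_syn)
```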
DISCUSSION

Tests on the synthetic and observed data demonstrate that wave-equation inversion of seismic data skeletonized by an autoencoder can invert for the low- to intermediate-wavenumber details of the subsurface velocity model. We next test the method's sensitivity to noisy data and discuss the overfitting problem.

The test setup is the same as that in Figure 7, except that we add random noise to the input data. Different levels of noise are added to the observed and synthetic data. Figure 21a, 21d, 21g, and 21j shows four shot gathers, and their 80th traces are displayed in Figure 21c, 21f, 21i, and 21l. Their encoded results are shown in Figure 21b, 21e, 21h, and 21k, where the black and red curves represent the encoded values from the observed and synthetic data, respectively. It appears that the range of encoded values decreases as the noise level increases. Moreover, the encoded residual also decreases, which indicates that the encoded values become less sensitive to the velocity changes as the data noise level increases.

Figure 21. Shot gathers with (a) SNR = 30 dB, (d) SNR = 11 dB, (g) SNR = 4 dB, and (j) SNR = 1 dB. The 80th trace is displayed in (c), (f), (i), and (l), respectively. Panels (b), (e), (h), and (k) display the encoded values from the observed and synthetic data, respectively. The numbers along the horizontal axes of the encoded value graphs correspond to the trace indices, and the numbers along the vertical axes correspond to the values of the latent variable z.

Figure 22 shows magnified views of the encoded values in Figure 21, where some oscillations appear in the noisy data. These oscillations could further affect the accuracy of the inverted result, especially if a small velocity perturbation is omitted. Therefore, good data quality with less noise is preferred for the autoencoder method to recover an accurate subsurface velocity model.

Figure 22. (a-d) Magnified views of Figure 21b, 21e, 21h, and 21k.
Overfitting problem

In our examples, the number of seismic traces in the training set is usually smaller than the number of parameters in the autoencoder, which might result in an overfitting problem. If the data are overfitted, the network learns the intricacies of the training data set at the expense of its ability to represent unseen examples (in the test data) (Valentine and Trampert, 2012). In other words, at some point during training, the reconstruction error of the training set keeps decreasing, while the reconstruction error of the testing set either stays stable or becomes worse. Figure 23a and 23b shows the reconstruction errors of the training and testing sets versus the iteration number, respectively. They clearly show that the reconstruction errors of both data sets decrease rapidly within the first 10 iterations and then gradually become stable. The similar pattern for our training and testing sets demonstrates that we do not suffer from the overfitting problem during training.

Figure 23. The reconstruction error of the training set and the testing set versus the iteration number.
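Overfitting of this kind is commonly detected by monitoring the held-out reconstruction error, as in the following sketch, which flags the iteration at which the testing error stops improving while training continues. This is a generic early-stopping criterion, not a procedure described in the paper.

```python
def detect_overfitting(test_err, patience=5):
    """Return the iteration at which the testing reconstruction error has
    not improved for `patience` iterations (cf. Figure 23), or None."""
    best, stale = float("inf"), 0
    for k, err in enumerate(test_err):
        if err < best:
            best, stale = err, 0
        else:
            stale += 1
        if stale >= patience:
            return k
    return None
```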
The connection between the encoded value and the decoded waveform

An ideal autoencoder neural network seeks to identify the common features in the training set and encapsulate them within the encoder and decoder functions. The latent variables contain the information that distinguishes each individual example in the data set from the others (Valentine and Trampert, 2012). To illustrate this point, we perturb the encoded values in the latent space and observe how the decoded waveform changes. Figure 24 displays the changes in the encoded values and the decoded waveforms. It clearly shows that changes in the encoded values result in temporal shifts of the waveforms, but the shape of the waveform barely changes. Therefore, in this case, the latent-space information is mainly related to the traveltimes, which are necessary to distinguish the different examples in the data set.
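The perturbation experiment of Figure 24 amounts to decoding a trace at shifted latent values, as in this sketch; `decode` is a hypothetical stand-in for the trained decoder.

```python
import numpy as np

def probe_latent(decode, z0, dz_values):
    """Decode a trace at perturbed latent values to see what the latent
    variable controls; for the data in Figure 24, the decoded waveforms
    are time-shifted but nearly identical in shape."""
    return {dz: decode(z0 + dz) for dz in dz_values}

# Example: waveforms = probe_latent(decode, z0, np.linspace(-0.5, 0.5, 5))
```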
Multidimensional NML inversion

An autoencoder with a single latent-space neuron can sometimes be incapable of fully capturing the important information in the traces. For a 2D latent space, the new misfit function can be written as

$$\epsilon = \frac{1}{2}\sum_s \sum_r \big(\Delta z_1(x_r, x_s)^2 + \Delta z_2(x_r, x_s)^2\big), \tag{14}$$

where $\Delta z_1$ and $\Delta z_2$ are the encoded value differences of the first and second latent-space coordinates. The gradient $\gamma(x)$ is

$$\gamma(x) = -\frac{\partial \epsilon}{\partial v(x)} = -\left(\frac{\partial z_1}{\partial v(x)}\Delta z_1 + \frac{\partial z_2}{\partial v(x)}\Delta z_2\right). \tag{15}$$

Similarly, the connective function is

$$f_{z_1,z_2}(x_r,t;x_s) = \int dt\, p_{(z_1^{\mathrm{obs}}-z_1,\, z_2^{\mathrm{obs}}-z_2)}(x_r,t;x_s)_{\mathrm{obs}}\, p_{z^{\mathrm{syn}}}(x_r,t;x_s)_{\mathrm{syn}}, \tag{16}$$

which connects $\partial z_1/\partial v(x)$ and $\partial z_2/\partial v(x)$ with the Fréchet derivative $\partial p/\partial v(x)$. Using the multivariable implicit function theorem, we can get

$$\begin{bmatrix} \dfrac{\partial z_1}{\partial v(x)} \\[2mm] \dfrac{\partial z_2}{\partial v(x)} \end{bmatrix} = -\begin{bmatrix} \dfrac{\partial^2 f}{\partial z_1^2} & \dfrac{\partial^2 f}{\partial z_2 \partial z_1} \\[2mm] \dfrac{\partial^2 f}{\partial z_1 \partial z_2} & \dfrac{\partial^2 f}{\partial z_2^2} \end{bmatrix}^{-1} \begin{bmatrix} \dfrac{\partial^2 f}{\partial z_1 \partial v(x)} \\[2mm] \dfrac{\partial^2 f}{\partial z_2 \partial v(x)} \end{bmatrix}. \tag{17}$$

We apply the multidimensional NML inversion method to the same data that were generated for the crosswell Marmousi model. The same data set is used for training, except that there are two latent-space neurons in the autoencoder. The autoencoder with two latent-space neurons converges to a smaller residual than the single-neuron autoencoder, which means that more waveform information is preserved in the latent space. The initial model is the linearly increasing model shown in Figure 16b. Figure 25b and 25c shows the single-neuron and double-neuron NML tomograms, where the latter recovers more detail, especially at depths between z = 0.1 and z = 0.8 km. The velocity profile comparisons at x = 0.5 and x = 0.9 km are shown in Figure 25d and 25e, respectively, which show that the double-neuron NML profile (the red solid line) agrees more closely with the true velocity profile (the blue solid line). This improvement suggests that two latent-space neurons contain more information about the subsurface than does a single neuron. However, inverting a greater number of neurons will likely lead to a greater chance of getting stuck in a local minimum. To avoid this, we suggest a multiscale strategy that initially inverts for the velocity model from a low-dimensional latent-space representation. This low-wavenumber model is then used as the starting velocity model for inverting higher-dimensional latent-space variables.
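A minimal sketch of equations 14 and 15 for a 2D latent space: the misfit sums the squared residuals of both latent coordinates, and the gradient accumulates the two Fréchet derivatives weighted by their residuals. The `dz_dv` array stands in for the derivatives $\partial z_i/\partial v(x)$ obtained from equation 17.

```python
import numpy as np

def multidim_latent_misfit(dz):
    """Equation 14: eps = 0.5 * sum of (dz1^2 + dz2^2) over all traces.
    dz : (ntraces, 2) latent residuals."""
    return 0.5 * np.sum(dz ** 2)

def multidim_gradient(dz_dv, dz):
    """Equation 15 summed over traces: gamma = -(dz1/dv * dz1_res +
    dz2/dv * dz2_res). dz_dv : (ntraces, 2, nx); dz : (ntraces, 2)."""
    return -np.einsum("ick,ic->k", dz_dv, dz)
```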
CONCLUSION

We presented a wave-equation method that finds the velocity model that minimizes the misfit function in the autoencoder's latent space. The autoencoder can compress a high-dimensional seismic trace to a smaller dimension that best represents the original data in the latent space. In this case, measuring the encoded residuals largely reduces the nonlinearity compared with measuring the waveform differences. Therefore, the inverted result will be less prone to getting stuck in a local minimum. The implicit function theorem is used to connect the perturbation of the encoded values with the velocity perturbations to calculate the gradient. Numerical results with synthetic and field data demonstrate that skeletonized inversion with the autoencoder network can accurately estimate the background velocity model. The inverted result can be used as a good initial model for FWI.

The most significant contribution of this paper is that it provides a general framework for using solutions to the governing partial differential equation to invert skeletal data generated by any type of neural network. The governing equation can be that for gravity, seismic waves, electromagnetic fields, or magnetic fields. The input data can be records from different types of surveys, as long as the skeletal data are sensitive to the model perturbations. The skeletal data can be the latent-space variables of a regularized autoencoder, a VAE, a feature map from a CNN, or PCA features. That is, we have combined the deterministic features of forward and backward modeling in Newtonian physics with the dimensionality-reduction capabilities of machine learning to invert seismic data by NML.
ACKNOWLEDGMENTS

The research reported in this paper was supported by the King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia. We are grateful to the sponsors of the Center for Subsurface Imaging and Modeling (CSIM) Consortium for their financial support. For computer time, this research used the resources of the Supercomputing Laboratory at KAUST, and we thank them for providing the computational resources required for carrying out this work. We also thank Exxon for the Friendswood crosswell data.

DATA AND MATERIALS AVAILABILITY

Data associated with this research are confidential and cannot be released.

REFERENCES

AlTheyab, A., and G. Schuster, 2015, Reflection full-waveform inversion for inaccurate starting models: Workshop on Depth Model Building: Full-waveform Inversion Workshop, 18–22.
Bunks, C., F. M. Saleck, S. Zaleski, and G. Chavent, 1995, Multiscale seismic waveform inversion: Geophysics, 60, 1457–1473, doi: 10.1190/1.1443880.
Cai, W., and G. T. Schuster, 1993, Processing Friendswood cross-well seismic data for reflection imaging: 63rd Annual International Meeting, SEG, Expanded Abstracts, 92–94, doi: 10.1190/1.1822658.
Chen, S., L. Zimmerman, and J. Tugnait, 1990, Subsurface imaging using reversed vertical seismic profiling and crosshole tomographic methods: Geophysics, 55, 1478–1487, doi: 10.1190/1.1442795.
Chen, Y., Z. Feng, L. Fu, A. AlTheyab, S. Feng, and G. Schuster, 2019, Multiscale reflection phase inversion with migration deconvolution: Geophysics, 85, no. 1, R55–R73, doi: 10.1190/geo2018-0751.1.
Dutta, G., and G. T. Schuster, 2014, Attenuation compensation for least-squares reverse time migration using the viscoacoustic-wave equation: Geophysics, 79, no. 6, S251–S262, doi: 10.1190/geo2013-0414.1.
Dutta, G., and G. T. Schuster, 2016, Wave-equation Q tomography: Geophysics, 81, no. 6, R471–R484, doi: 10.1190/geo2016-0081.1.
Feng, S., and G. T. Schuster, 2019, Transmission + reflection anisotropic wave-equation traveltime and waveform inversion: Geophysical Prospecting, 67, 423–442, doi: 10.1111/1365-2478.12733.
Fu, L., B. Guo, and G. T. Schuster, 2018, Multiscale phase inversion of seismic data: Geophysics, 83, no. 2, R159–R171, doi: 10.1190/geo2017-0353.1.
Ha, W., and C. Shin, 2012, Laplace-domain full-waveform inversion of seismic data lacking low-frequency information: Geophysics, 77, no. 5, R199–R206, doi: 10.1190/geo2011-0411.1.
Hotelling, H., 1933, Analysis of a complex of statistical variables into principal components: Journal of Educational Psychology, 24, 417–441, doi: 10.1037/h0071325.
Lailly, P., and J. Bednar, 1983, The seismic inverse problem as a sequence of before stack migrations: Conference on Inverse Scattering: Theory and Application, 206–220.
Li, J., G. Dutta, and G. Schuster, 2017, Wave-equation Qs inversion of skeletonized surface waves: Geophysical Journal International, 209, 979–991, doi: 10.1093/gji/ggx051.
Li, J., Z. Feng, and G. Schuster, 2016, Wave-equation dispersion inversion: Geophysical Journal International, 208, 1567–1578, doi: 10.1093/gji/ggw465.
Li, J., S. Hanafy, and G. Schuster, 2018, Wave-equation dispersion inversion of guided P waves in a waveguide of arbitrary geometry: Journal of Geophysical Research: Solid Earth, 123, 7760–7774, doi: 10.3997/2214-4609.201801961.
Liu, Z., J. Li, S. M. Hanafy, and G. Schuster, 2018, 3D wave-equation dispersion inversion of surface waves: 88th Annual International Meeting, SEG, Expanded Abstracts, 4733–4737, doi: 10.1190/segam2018-2997521.1.
Lu, K., J. Li, B. Guo, L. Fu, and G. Schuster, 2017, Tutorial for wave-equation inversion of skeletonized data: Interpretation, 5, no. 3, SO1–SO10, doi: 10.1190/INT-2016-0241.1.
Luo, Y., and G. T. Schuster, 1991a, Wave equation inversion of skeletalized geophysical data: Geophysical Journal International, 105, 289–294, doi: 10.1111/j.1365-246X.1991.tb06713.x.
Luo, Y., and G. T. Schuster, 1991b, Wave-equation traveltime inversion: Geophysics, 56, 645–653, doi: 10.1190/1.1443081.
Plessix, R.-E., 2006, A review of the adjoint-state method for computing the gradient of a functional with geophysical applications: Geophysical Journal International, 167, 495–503, doi: 10.1111/j.1365-246X.2006.02978.x.
Schmidhuber, J., 2015, Deep learning in neural networks: An overview: Neural Networks, 61, 85–117, doi: 10.1016/j.neunet.2014.09.003.
Schuster, G., 2018, Machine learning and wave equation inversion of skeletonized data: 80th Annual International Conference and Exhibition, EAGE, Extended Abstracts, WS01.
Sun, Y., and G. T. Schuster, 1993, Time-domain phase inversion: 63rd Annual International Meeting, SEG, Expanded Abstracts, 684–687, doi: 10.1190/1.1822588.
Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, 1259–1266, doi: 10.1190/1.1441754.
Valentine, A. P., and J. Trampert, 2012, Data space reduction, quality assessment and searching of seismograms: Autoencoder networks for waveform data: Geophysical Journal International, 189, 1183–1202, doi: 10.1111/j.1365-246X.2012.05429.x.
Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics: Geophysics, 74, no. 6, WCC1–WCC26, doi: 10.1190/1.3238367.
Wu, R.-S., J. Luo, and B. Wu, 2014, Seismic envelope inversion and modulation signal model: Geophysics, 79, no. 3, WA13–WA24, doi: 10.1190/geo2013-0294.1.