Geo 2021 0551.1
10.1190/GEO2021-0551.1
Manuscript received by the Editor 20 August 2021; revised manuscript received 9 August 2022; published ahead of production 5 October 2022; published online 13 December 2022.
¹Petrobras, Rio de Janeiro, Brazil and Pontifícia Universidade Católica do Rio de Janeiro, Departamento de Engenharia Elétrica, Rio de Janeiro, Brazil. E-mail: paula.burkle@petrobras.com.br (corresponding author).
²Universidade de Lisboa, CERENA, DECivil, Instituto Superior Técnico, Lisboa, Portugal. E-mail: leonardo.azevedo@tecnico.ulisboa.pt.
³Pontifícia Universidade Católica do Rio de Janeiro, Departamento de Engenharia Elétrica, Rio de Janeiro, Brazil. E-mail: marley@ele.puc-rio.br.
© 2023 Society of Exploration Geophysicists. All rights reserved.
computationally expensive, mainly due to the model generation and update steps (Ferreirinha et al., 2015).

Deep learning (DL) is a subarea of machine learning (ML) that applies deep neural networks (DNNs): neural networks composed of a large number of hidden layers, in which progressively richer levels of features are integrated as the number of layers grows. Convolutional neural networks (CNNs) (LeCun et al., 1989) are a class of DNN that revolutionized computer vision (Krizhevsky et al., 2012), achieving outstanding results in image-processing tasks (LeCun et al., 2015; Goodfellow et al., 2016; Alom et al., 2019; Alzubaidi et al., 2021). Unlike conventional neural networks, CNNs make the explicit assumption that the input data have a spatial structure of multidimensional arrays, such as images, videos, and volumetric data (i.e., a 3D grid of pixels).

In the field of geophysics, CNNs have gained popularity due to the spatial and multidimensional nature of geophysical data. Examples of applications of CNNs in geophysics comprise seismic inversion (Alfarraj and AlRegib, 2019; Biswas et al., 2019; Feng et al., 2020; Jin et al., 2020; Zhang et al., 2020), velocity analysis (Mosser et al., 2018; Park and Sacchi, 2020), seismic interpretation (Xiong et al., 2018; Di et al., 2021), and seismic facies classification (Liu et al., 2020; Li et al., 2021).

Recently, DL has been applied to solve the seismic inverse problem, overcoming the limitations of traditional seismic inversion methods, for example, the computational cost and the sensitivity to initial prior simulated models, measurement noise, and modeling errors. The potential benefits of DL-based models include scalability (van Essen et al., 2015), the capacity of neural networks to map the complex and nonlinear relationship between observed data and model parameters, easy adaptation to new tasks (Pan and Yang, 2010; Weiss et al., 2016), and the availability of ready-to-use solutions (e.g., automatic differentiation and GPU-accelerated computing) provided by open-source DL frameworks, such as PyTorch (Paszke et al., 2017). Adler et al. (2021) review the current state of the art regarding DL and geophysical inversion.

DL techniques have been shown to perform well in problems involving massive amounts of labeled data. Early DL-based approaches to the seismic inversion problem are fully data driven and thus rely entirely on the completeness of the training data. However, one of the biggest challenges in the generalization of DL in seismic inversion is the lack of sufficient and representative samples of the target model parameters to train these high-capacity neural networks. Due to the high cost of acquiring direct measures of the subsurface (i.e., well data), many geophysical problems have a huge amount of unlabeled data and a limited quantity of annotated data, usually obtained from real elastic and petrophysical data from borehole logging.

The scarcity of labeled data has been addressed using transfer learning, data augmentation techniques, and generating synthetic data with geostatistical modeling tools (Mosser et al., 2018; Das et al., 2019; Colombo et al., 2020; Park and Sacchi, 2020; Sun et al., 2021b). Such strategies aim at reducing the dependency on large training data and mitigating overfitting issues. In addition, deterministic physical models can be applied to guide the training process of ML models, a paradigm known as theory-guided, or physics-driven or -aware, neural networks (Karpatne et al., 2017). In physics-driven approaches for inverse problems, the known physical laws that explain the system under investigation (i.e., the propagation of a seismic wavefield) are used to leverage the use of unlabeled data by constraining the model-data relationship.

Several physics-driven methods have recently been proposed to solve geophysical inversion problems, such as full-waveform inversion (FWI) (Hu et al., 2020; Jin et al., 2020; Sun et al., 2021a), elastic inversion (Alfarraj and AlRegib, 2019; Biswas et al., 2019; Chen et al., 2021), geosteering inversion (Jin et al., 2020), and porosity estimation (Feng et al., 2020). Several experiments are carried out by Sun et al. (2021a) to evaluate and compare fully data-driven, fully physics-driven, and hybrid approaches to solve the FWI problem. The trade-off between data residual and model misfit is analyzed through a set of experiments based on a 2D synthetic velocity data set. The experiments show that the physics-based objective function significantly improves the accuracy of the predicted velocity models.

An additional challenge, still largely open to research, is how to quantify uncertainty in neural networks' predictions (Hüllermeier and Waegeman, 2021; Mena et al., 2021). This is a crucial concern in the prediction of earth models (Feng et al., 2020), where direct observations are rare and indirect investigation methods are inherently limited. Sun et al. (2021a) apply the Monte Carlo dropout method (Gal and Ghahramani, 2016), a widely used technique to approximate Bayesian neural networks, to capture the neural network uncertainty about the predicted velocity models. The results show high-uncertainty areas related to illumination issues and susceptibility to missing or incomplete data.

Concurrently, a DNN of the generative adversarial network (GAN) (Goodfellow et al., 2016) type can be used to learn a prior distribution and define a low-dimensional representation of the original high-dimensional expected geologic patterns. The generative model can be further conditioned to honor observed data (Chan and Elsheikh, 2017; Dupont et al., 2018; Mosser et al., 2018; Richardson, 2018). Alternatively, a Monte Carlo technique can be applied to sample from the posterior distribution of the model parameters given seismic observations (Laloy et al., 2018; Mosser et al., 2020). Although deep generative priors enable the parameterization of geologic models for uncertainty propagation, they are unstable and difficult to train. GANs are known to be prone to mode collapse, in which only a few modes of the data are generated (Wiatrak and Albrecht, 2019).

We present herein a physics-aware DL method to invert poststack seismic data directly for high-resolution models of acoustic impedance (AI). In the proposed method, a deep CNN works as the inverse of the forward operator, which is conditioned on a prior distribution of the reservoir heterogeneity. As with other physics-driven approaches, the seismic forward model is incorporated into the graph of the neural network to provide physics-based updates to the network parameters.

To compensate for the incompleteness of the seismic data, we borrow a prior spatial distribution pattern from an ensemble of models generated with geostatistical simulation (Deutsch and Journel, 1998). These previously simulated subsurface models, which are based solely on local well data and a prior belief on the spatial distribution pattern (i.e., a variogram model), are fed into the neural network alongside the observed seismic data. The geostatistical simulation method is performed at high spatial resolution and provides the small-scale variability not captured in the seismic data.

During training, the network learns to extract and preserve the relevant features from the seismic data and prior simulated models. A manually designed loss function penalizes input pairs of AI values and seismic observations that are physically inconsistent. Each geostatistical realization used as input to the trained network will result in a different estimated AI model. The final ensemble of
estimates is used to evaluate the uncertainty of the predictions. This is an alternative probabilistic perspective to the studies previously introduced. We summarize our main contributions as follows:

1) Fully unsupervised learning: we adopt the physics-driven learning paradigm with a twofold goal: to leverage the use of unlabeled seismic data and to validate the physical consistency of input training pairs. Our learning paradigm is fully unsupervised as it does not depend on ground-truth models. Instead, we rely on the diversity of unconditional (with respect to seismic data) prior simulated models to explore the model parameters' solution space.
2) Uncertainty quantification: as an alternative to dropout approaches, the predictions are given by one single deterministic neural network, but stochastically generated prior models are used as input to produce several predictions that are used for uncertainty quantification.
3) Efficiency: unlike conventional iterative geostatistical seismic inversion methods, the computational complexity of the proposed method does not scale with the number of output models. The training procedure is performed on a representative portion of the prior ensemble, for which the relative computational cost decreases with the number of final realizations. Once trained, the network can efficiently generate thousands of output models to hopefully devise more well-calibrated uncertainty estimates.

The proposed inversion methodology is illustrated in a 1D synthetic example and applied to a real data set from a Brazilian postsalt region. We compare our results with those obtained by an iterative geostatistical acoustic inversion (Azevedo and Soares, 2017) in terms of quantitative and qualitative performance metrics as well as relative computational cost.

We start with a brief review of the geostatistical seismic inversion (GSI) method used to benchmark the results obtained with the method proposed herein. Next, we describe the proposed methodology, followed by its application to invert a full-stack seismic volume for AI. The results of the application examples are then discussed before the main conclusions are presented.

GSI

GSI methodologies are iterative optimization methods where the perturbation of the model parameter space is performed with stochastic sequential simulation and cosimulation (Azevedo and Soares, 2017). These seismic inversion methods use geostatistical simulation tools as they provide a framework to represent reservoir heterogeneity and to model complex relationships between reservoir properties (Doyen, 2007).

In GSI (Soares et al., 2007), we start by generating an ensemble of M AI models from the set of AI-log data and imposing a spatial continuity model (i.e., a variogram model) with direct sequential simulation (DSS) (Soares, 2001). In DSS, the simulation grid (i.e., the inversion grid) is visited along a random path that visits all of the inversion grid cells. At each location, the kriging estimate and kriging variance are computed based on direct observations of the property of interest (i.e., the AI-log data) and previously simulated grid cells. The kriging estimate and variance are then used to draw a value from an auxiliary probability distribution function built from the global distribution function of AI as estimated from the AI-log data. The simulated value is then assigned to the inversion grid and considered as conditioning data for the next simulation location along the random path. As the random path changes each time the simulation runs, the conditioning data at each location along the simulation random path change, thereby resulting in alternative models (i.e., geostatistical realizations).

From the M AI volumes, we compute the normal incidence reflection coefficients (RCs), which are then convolved with a wavelet to generate M synthetic full-stack seismic volumes. Each synthetic seismic trace is compared with the corresponding observed seismic trace as follows:

$$S = \frac{2 \sum_{k=-m}^{m} d_{t+k}\, d_{t+k}^{j,r}}{\sum_{k=-m}^{m} \left(d_{t+k}\right)^{2} + \sum_{k=-m}^{m} \left(d_{t+k}^{j,r}\right)^{2}}, \qquad (1)$$

where S is the local trace-by-trace similarity coefficient; j is the iteration number; r is the realization number; m is the number of samples of a moving window; and d_t and d_t^{j,r} are the observed and synthetic seismic traces at sample t, respectively. The S works as Pearson's correlation coefficient (i.e., it varies between −1 and 1) but is sensitive to the waveform and the amplitude content of the seismic traces. However, due to restrictions related to the geostatistical cosimulation used in the subsequent iteration, negative values of S are truncated at zero.

The AI traces that generate the highest S are selected and stored in auxiliary volumes along with the corresponding S and are used in the cosimulation of a new set of AI models in the subsequent iteration. The iterative inversion finishes when the global S, computed between synthetic and recorded full-stack volumes, is above a given threshold or a predefined number of iterations is reached. The GSI can be summarized in the following sequence of steps (Soares et al., 2007; Azevedo and Soares, 2017):

1) Generate a set of M realizations of AI from the existing AI-log data and impose a variogram model that describes the expected spatial continuity pattern of AI in the subsurface.
2) Forward model each of the M AI models and compute M synthetic seismic volumes.
3) Compare, on a trace-by-trace basis, the M synthetic and real seismic volumes following equation 1.
4) For each location within the inversion grid, select the simulated AI traces that ensure the maximum S. Store the selected AI traces and the collocated S values in two auxiliary volumes.
5) Use the stored auxiliary volumes (step 4) as secondary variables in the cosimulation of a new set of M AI models. Locations associated with a high S will exhibit a similar spatial pattern as observed in the auxiliary volume, whereas locations collocated with a low S will have little influence from the auxiliary volume.
6) Return to step 1 and iterate until the global S between real and synthetic volumes is above a predefined threshold.

METHODOLOGY

In this section, we first introduce a general overview of the proposed seismic inversion method. Then, we describe in detail the parameterization and architecture of the deep networks used to predict AI from full-stack seismic data.
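As a concrete reference before describing the networks, the local trace-by-trace similarity of equation 1 can be sketched as follows. This is a minimal NumPy illustration under our own naming and array conventions (1D traces, window centered at sample t with half-length m, t assumed at least m samples from the trace edges); it is not the paper's implementation.

```python
import numpy as np

def local_similarity(d_obs, d_syn, t, m):
    """Local similarity S (equation 1) between an observed and a synthetic
    trace, in a moving window of half-length m centered at sample t.
    Negative values are truncated at zero, as required by the
    geostatistical cosimulation of the subsequent GSI iteration."""
    w_obs = d_obs[t - m:t + m + 1]
    w_syn = d_syn[t - m:t + m + 1]
    num = 2.0 * np.sum(w_obs * w_syn)
    den = np.sum(w_obs ** 2) + np.sum(w_syn ** 2)
    return max(num / den, 0.0)
```

For identical windows the coefficient is 1, and for sign-flipped windows the raw value −1 is truncated to 0, mirroring the behavior described above.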
The proposed seismic inversion system is composed of two neural networks: the inversion network F⁻¹ and the physical forward operator network F.

F⁻¹ is a deep CNN parameterized by weights and biases in a multidimensional space (θ). It works as a surrogate to the inverse of the physical forward operator. Given an input vector (x) containing seismic reflection data (d) and a prior simulated AI model (m), F⁻¹ outputs a vector m̂ in the AI domain:

$$\hat{m} = F^{-1}\!\left(x = \begin{bmatrix} m \\ d \end{bmatrix}; \theta\right). \qquad (2)$$

The physical forward operator F is a CNN with fixed weights, denoted by ω, that equal the input seismic pulse (i.e., the wavelet). It is used to convolve the seismic pulse with the reflectivity series devised from an input vector (m) in the AI domain and obtain the corresponding synthetic seismic data (d̂):

$$\hat{d} = F(m; \omega). \qquad (3)$$

General workflow

The proposed deep physics-aware seismic inversion method can be summarized by the following steps:

1) Generate M AI models (M: {m_j}, j = 1, ⋯, M) with geostatistical simulation using the existing well-log data and a prior spatial continuity model as revealed by the variogram model.
2) Randomly select a small portion of M for training (M_T: {m_Tj}, j = 1, ⋯, M_T).
3) Train F⁻¹ on M_T and the observed seismic data (d) using the minibatch gradient-descent algorithm.
4) Apply F⁻¹ to each model in M to obtain the final ensemble of estimates (M̂: {m̂_j}, j = 1, ⋯, M).

Next, we detail these main steps.

Stochastic sequential simulation of AI models

We start by generating an ensemble of M AI models (M: {m_j}, j = 1, ⋯, M) from the set of existing AI well-log data and imposing a spatial continuity model (i.e., a variogram model) with DSS (Soares, 2001). The geostatistical method generates geological models consistent with both direct observations and the expected spatial distribution pattern, as represented by the variogram model (Bosch et al., 2010).

Training the inversion network

After the creation of the ensemble of AI models, we train the inversion network (F⁻¹) to output a high-resolution AI model from observed seismic data and a prior simulated AI model. A schematic workflow of the training procedure is shown in Figure 1. During training, F⁻¹ learns to generate AI models that match the observed seismic data. To achieve that, the parameters of F⁻¹ are updated to minimize the residuals between the observed data and the predicted synthetic seismic data:

$$L_D = \left\| F(\hat{m}; \omega) - d \right\|_2^2, \qquad (4)$$

where ‖·‖₂² is the squared ℓ₂ norm.

A second objective function associated with the prior simulated AI model, namely the model loss, is defined to encourage the network to be sensitive to the prior simulated model and preserve its statistical properties (i.e., mean, variance, and spatial pattern). The model loss is defined as

$$L_M = \left\| \Gamma \circ (\hat{m} - m) \right\|_2^2, \qquad (5)$$

where Γ is a matrix representing the confidence in the prior simulated model. The symbol ∘ denotes the Hadamard product (i.e., the element-wise product).

The confidence in the prior model is spatially distributed. It is lower for regions where the AI values after forward modeling lead to synthetic seismic data with a poor match with the observed seismic data. For these low-confidence regions, the update of the network parameters will assimilate more information from the seismic data than from the prior model. We perform the following steps to compute the matrix Γ. First, we calculate the normal incidence RCs from the prior simulated AI model, which are further convolved with the wavelet to generate the synthetic seismic data. Then, the matrix Γ is computed based on the residual between the retrieved synthetic seismic data and the observed seismic data:

$$\Gamma = \sigma\!\left( \left( F(m; \omega) - d \right)^2 \right), \qquad (6)$$

where σ is a sigmoid function applied herein to ensure values between zero and one.

The model loss is added to the data misfit to compose the total loss (L) in our optimization procedure. The final loss function is given by

$$L = \lambda_D L_D + \lambda_M L_M, \qquad (7)$$
where λD and λM are weights controlling the relative contribution of the seismic data or the prior model. These weights depend on the signal-to-noise ratio of the observed seismic data and the confidence one has about the prior model. For high-quality seismic data and relatively unexplored regions where few wells are available, λD should be increased and λM decreased. However, in areas with a considerable number of wells and where the subsurface geology is well understood, λM should be increased. For the sake of simplicity, and due to the relatively large number of conditioning data and high-quality seismic data, λM and λD were set to one in the application examples shown next.

F⁻¹ is trained on a randomly chosen subset from the ensemble of prior simulated models. By not using the full ensemble, we reduce the training time. This approximation is valid as long as all of the models in the prior share the same spatial variability and well-data information. The size of the training subset is data dependent and can be empirically defined by taking into account the generalization error.

The proposed methodology assumes that the trained network will be further applied to AI models drawn from the same distribution as that of the training data. If either the measurement process changes (e.g., source wavelet or well-log data) or the target geologic setting is different from the training data (e.g., subsurface models with different spatial correlations or facies proportions), then one should train an F⁻¹ per set of realizations.

A 2D-CNN architecture is adopted for the inversion and the physical forward operator networks. In a 2D-CNN, the filter in a convolutional layer slides over the 2D input data along two spatial dimensions. The inversion network is trained on vertical sections extracted from the input volumes (i.e., the observed seismic data and prior simulated model). The vertical slices are extracted in a given direction (i.e., along the inline or crossline direction). In the application example, we use the inline direction; however, depending on the complexity of the geologic background, a multiple-direction strategy could have been applied.

To transform AI and seismic amplitudes to a similar scale, we scale each input data set into the range [−1, 1] using the global minimum and maximum according to the formula x′ = 2(x − x_min)/(x_max − x_min) − 1.

Predicting the ensemble of estimated AI models

In the prediction step, the trained network is applied to invert high-frequency AI models from seismic data (d) and the ensemble of prior simulated AI models (M: {m_j}, j = 1, ⋯, M). The vertical sections extracted from the input volumes are used as input to F⁻¹, one after another, to create the output 3D AI models. This process is repeated M times to generate the final ensemble of inverted AI models (M̂: {m̂_j}, j = 1, ⋯, M). As the cost of training is fixed and independent of the number of realizations, this method is especially appropriate for simulating many thousands of realizations to explore the model parameter space.

Next, we detail the architecture of both networks.

Inversion network F⁻¹

Let m ∈ ℝ^(H×W) and d ∈ ℝ^(H×W) be images, where H and W are the height and width of the image, respectively, and ℝ is the set of real numbers. The images m and d are vertical sections extracted from the prior simulated model and observed seismic data volumes, which are stacked onto each other along the channel dimension (C) to compose the input vector x ∈ ℝ^(C×H×W) (C equals 2). F⁻¹ is a transformation function that maps an input vector x ∈ ℝ^(C×H×W) into an output vector m̂ ∈ ℝ^(H×W) in the AI domain.

F⁻¹ is a deep CNN of the encoder-decoder type. Encoder-decoder networks typically include a middle bottleneck layer (i.e., a lower dimensional hidden layer) to force the compression of the input into the latent space representation. The benefits are twofold. First, it reduces the computational cost by decreasing the number of network parameters, and second, it enables the network to capture information about larger objects. The architecture of the inversion network is summarized in Table 1. Residual blocks (He et al., 2016) are used to avoid vanishing gradients and degradation of accuracy, common problems faced in training DNNs (Bengio et al., 1994; He et al., 2016; Radford et al., 2016). In addition, residual blocks enable the network to easily learn the identity function, which is of particular interest for this application, as we expect to preserve the statistics and spatial distribution of the inputs.

Table 1. The architecture of the encoder-decoder CNN for the inversion operator F⁻¹.

Layer                               Filter size, stride    Output (C × H × W)
Input                               —                      2 × 128 × 550
Conv, BN, ReLU                      3 × 3, 1               32 × 128 × 550
Conv, BN, ReLU                      3 × 3, 2               64 × 64 × 275
Conv, BN, ReLU                      3 × 3, 2               128 × 32 × 138
5 × residual blocks
  (Conv, BN, ReLU, Conv, BN)        3 × 3, 1               128 × 32 × 138
Deconv, BN, ReLU                    3 × 3, 1               64 × 64 × 275
Deconv, BN, ReLU                    3 × 3, 1               32 × 128 × 550
Conv, BN, sigmoid                   3 × 3, 1               1 × 128 × 550

All convolutional layers use a 3 × 3 kernel size. In the encoder part of the network, downsampling is directly obtained by convolutional layers with stride 2 followed by batch normalization (BN) (Atanov et al., 2019) and a rectified linear unit (ReLU) (Agarap, 2018) activation function. The last convolutional layer of the encoder is followed by five residual blocks. The residual blocks follow the architecture guidelines given by He et al. (2016). We adopt transposed convolutions (Springenberg et al., 2015) for upsampling to decode the latent vector. A tanh activation function is used in the output layer of the network to bound the values in the range [−1, 1].

Physical forward operator network F

The forward operator network implements a common forward model in geophysics, which is expressed by equation 8a. It receives an input vector m ∈ ℝ^(H×W) in the AI domain and outputs a vector d̂ ∈ ℝ^(H×W) in the amplitude domain. The network F contains one convolutional layer that is used to convolve the RCs with the wavelet ω, according to equation 8b. In the convolutional layer, the wavelet ω, represented by a 1D kernel, is convolved with the RCs along the temporal dimension (i.e., H). The following steps are performed to obtain the RCs. First, the output of the inversion network is scaled to the original AI global minimum and maximum values. Then, the retrieved AI vector is used to calculate the normal
incidence RCs according to equation 8c. Ultimately, in the output layer of the network, the synthetic seismic data are scaled back to the interval [−1, 1]:

$$F(m; \omega) = S(m, \omega, t), \quad t = 1, \cdots, H. \qquad (8a)$$

The quantitative performance evaluation for our deep inversion network is based on the correlation coefficient between the synthetic and observed seismic data. In addition, we evaluate whether the results reproduce the prior variability and spatial continuity by comparing the histograms and variograms of the inputs and estimated AI models.
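The scaling, RC computation, wavelet convolution, and training losses (equations 4-7) can be sketched together for a single 1D trace. This is a minimal NumPy illustration under our own naming, not the paper's actual network implementation: the wavelet is a placeholder kernel, the first RC sample is set to zero to keep the AI and seismic grids the same length, and the network prediction is replaced by a plain array.

```python
import numpy as np

def to_unit_range(x, x_min, x_max):
    """Min-max scaling onto [-1, 1]; note the factor 2 needed to reach
    the stated target range."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def from_unit_range(x, x_min, x_max):
    """Inverse scaling from [-1, 1] back to the original AI range."""
    return (x + 1.0) / 2.0 * (x_max - x_min) + x_min

def forward_model(ai, wavelet):
    """Convolutional forward model for one 1D AI trace: normal-incidence
    reflection coefficients convolved with the wavelet along time."""
    rc = np.zeros_like(ai)
    rc[1:] = (ai[1:] - ai[:-1]) / (ai[1:] + ai[:-1])
    return np.convolve(rc, wavelet, mode="same")

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def total_loss(m_hat, m_prior, d_obs, wavelet, lam_d=1.0, lam_m=1.0):
    """Total training loss L = lam_d * L_D + lam_m * L_M (equation 7)."""
    # Data loss (equation 4): squared l2 residual of the synthetics.
    l_data = np.sum((forward_model(m_hat, wavelet) - d_obs) ** 2)
    # Confidence matrix (equation 6): sigmoid of the squared residual of
    # the *prior* model's synthetics, bounding entries to (0, 1).
    gamma = sigmoid((forward_model(m_prior, wavelet) - d_obs) ** 2)
    # Model loss (equation 5): Hadamard-weighted residual to the prior.
    l_model = np.sum((gamma * (m_hat - m_prior)) ** 2)
    return lam_d * l_data + lam_m * l_model
```

A constant-impedance trace yields zero reflectivity and, hence, a zero synthetic trace, and a prediction equal to the prior with perfectly matching observed data yields a total loss of zero.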
$$L_{\mathrm{samples}} = \left\| M \circ \hat{m} - M \circ m \right\|_2^2. \qquad (9)$$

Results
Figure 3. (a–c) Progress of the model loss and (d–f) data loss on the train and test sets at every epoch, for 500 epochs, for the split ratios of 20%, 35%, and 60%. The split ratios correspond to the percentage of the prior ensemble used to train the neural network. The black curves are the train losses, and the dashed gray lines are the test losses.
from these estimated logs match the target one. The average corre-
lation coefficient between the synthetic seismograms and the target
one is 0.95.
The cumulative distribution functions (CDFs) of the real AI log
data (the red line) and the estimated AI logs (the gray lines) are
shown in Figure 5a. The predicted distribution does reproduce
the true one with a slight overestimation of the larger values. A com-
parison between the experimental vertical variograms retrieved
from the real AI log (the red dots) and the estimated AI logs
(the gray lines) is shown in Figure 5b. The real AI log variogram
represents the expected spatial pattern that we want to preserve. The
experimental variograms calculated from the predicted AI logs re-
produce the expected one.
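The experimental variograms used in this comparison can be computed for a regularly sampled log with a short routine such as the following. This is a sketch under our own naming: the estimator is the standard experimental semivariogram, i.e., half the mean squared increment at each lag.

```python
import numpy as np

def experimental_variogram(values, n_lags, lag_size=1):
    """Experimental semivariogram of a regularly sampled 1D log:
    gamma(h) = 0.5 * mean of squared increments at lag h, for
    h = lag_size, 2*lag_size, ..., n_lags*lag_size."""
    gamma = np.empty(n_lags)
    for i in range(1, n_lags + 1):
        h = i * lag_size
        diffs = values[h:] - values[:-h]
        gamma[i - 1] = 0.5 * np.mean(diffs ** 2)
    return gamma
```

A constant log gives a flat zero variogram, whereas a highly oscillatory log gives large values at short lags, which is the behavior inspected in Figure 5b.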
Figure 7. Vertical well section extracted from (a) one AI geostatistical realization from the prior ensemble, (b) observed seismic data, (c) one
DL estimated AI from (a and b), and (d) synthetic seismic data calculated from (c).
trial-and-error experimentation by monitoring the generalization error through the learning process. When trained on a single realization and tested on another, both randomly selected from M, the neural network can fit the train and test sets equally well. This indicates that the neural network can produce accurate outcomes for previously unseen prior AI realizations. The number of epochs is empirically defined. We observed that, after 300 epochs, no relevant improvement is obtained for the objective function. We trained the inversion network with a batch size of 32 for 350 epochs in approximately 60 min using the computing infrastructure described previously. Then, we applied the trained inversion network to all of the AI models in M. We next present the results.

Results

We start by showing the DL output when a prior AI realization, different from that used to train the inversion network, is the input to the trained inversion network. Figure 7 shows the vertical well
Figure 8. Inversion grid inlines extracted from (a) one AI geostatistical realization from the prior ensemble, (b) one DL estimated AI from (a), (c) the corresponding synthetic seismic data, and (d) the observed seismic data. Below each DL output are the MSE and the SSIM between the inputs and outputs.
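The two image metrics reported in Figure 8 can be sketched as follows. Both functions are our own simplifications: the paper does not spell out the MSE normalization, so normalization by the variance of the reference is assumed here, and this SSIM uses a single global window rather than the local-window average of Wang et al. (2004).

```python
import numpy as np

def normalized_mse(ref, est):
    """MSE between two images, normalized by the variance of the
    reference image (an assumed normalization choice)."""
    return np.mean((ref - est) ** 2) / np.var(ref)

def global_ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM over the whole image, combining luminance,
    contrast, and structure terms into one expression."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den
```

Identical images give an SSIM of one and a normalized MSE of zero, matching the interpretation given in the text.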
sections extracted from a stochastic realization of AI (Figure 7a), the observed seismic data (Figure 7b), the result obtained when the AI realization from the prior is the input of the trained F⁻¹ (Figure 7c), and the predicted synthetic seismic data retrieved from this model (Figure 7d). The predicted AI model presents the structure of the observed seismic data, while preserving the spatial continuity pattern from the prior model. The predicted synthetic seismic data match the observed data in terms of amplitude content and location of the seismic reflections. The global correlation coefficient between the observed and synthetic seismic data for this DL estimated model is 0.85.

To assess the results obtained at several locations within the inversion grid, Figure 8 shows inlines extracted from an AI model predicted with the proposed method along with the input AI model, the observed seismic data, and the synthetic seismic data. In addition to the qualitative perceptual image evaluation, we present two metrics to quantitatively compare input and output data. One is the normalized mean squared error (MSE) and the other is the structural similarity index (SSIM) (Wang et al., 2004). The SSIM is a number between zero and one that measures the similarity between two images based on three key features: structure, contrast, and luminance. It is interpreted as follows: higher values mean the two images are more similar, and one indicates that they are identical.

To assess the robustness of the predictions, we computed the pointwise average (Figure 9a and 9c) and standard deviation (Figure 9b and 9d) volumes from all the estimated AI models and respective synthetic seismic data generated with every single realization in the prior ensemble as input (i.e., 100 models). The spatial continuity pattern of the pointwise average models agrees with the one observed in the recorded seismic data. At the same time, they present the influence of the well data around the wells. The standard deviation models are null at the well locations and increase with the distance. They also point to regions of higher uncertainty in locations where the signal-to-noise ratio of the original seismic data is low.

Figure 9. Vertical well section extracted from (a) the pointwise average model computed from the AI DL ensemble, (b) the pointwise standard deviation model computed from the AI DL ensemble, (c) the pointwise average model computed from the synthetic seismic DL ensemble, and (d) the pointwise standard deviation model computed from the synthetic seismic DL ensemble.

Figure 10. Original CDF (the red line) and CDF of the AI models estimated with the proposed method (the gray lines).

Ultimately, the comparison between the prior and estimated statistics is an additional representation of the goodness of fit of the produced models. A comparison between the CDFs from one prior
The comparison between the CDF of the prior model (the red line) and the CDFs of the AI models estimated with the proposed method (the gray lines) is shown in Figure 10. The estimated and prior CDFs are similar in terms of shape, maximum, and minimum. Figure 11 compares the variogram models (i.e., the spatial continuity pattern) between a realization from the prior ensemble (the red dots) and the estimated AI models (the gray lines). The horizontal omnidirectional variogram is shown in Figure 11a; it is computed with 20 lags and a lag size of 200 m. Figure 11b presents the vertical variograms, computed with 30 lags of size 2 ms. The variograms of the predicted values are similar to the prior model variogram, and the ranges in the horizontal and vertical directions are also preserved.

We then evaluate the performance of the proposed inversion method at the well location. Although, owing to the use of geostatistical simulation, the prior simulated models exactly match the AI-log data after upscaling into the inversion grid, the proposed method does not impose any local constraint. This might lead to a mismatch between measured and predicted AI values, which might be useful when the well-log data imposed during the simulation of the prior ensemble contain errors that originated during data acquisition or processing. Figure 12 shows a comparison between the observed AI log and the predicted ones, together with the correlation between both.

Figure 11. Comparison between the experimental (a) horizontal (omnidirectional) and (b) vertical variograms obtained from a prior geostatistical realization (the red dots) and the estimated AI with the proposed inversion method (the gray lines).

Figure 12. AI at the well locations for 100 realizations predicted with the proposed inversion method (the light-gray lines), the upscaled well logs (the black lines), and the pointwise average of 100 realizations (the red lines).

DISCUSSION

We introduced a physics-aware DL method that can predict high-resolution 3D models of AI from full-stack seismic data and a limited number of well logs. The proposed inversion methodology combines geostatistical simulation with DL. We applied the proposed method to a synthetic and a real data set from a reservoir located offshore Brazil in the postsalt sedimentary sequence. The synthetic application case illustrates the applicability of the proposed method, but it considers no uncertainty regarding the wavelet used during inversion and low uncertainty on the forward model used during the inversion. For both application cases, a qualitative analysis of the predicted models indicates that the network successfully captures and preserves the desired features from the inputs: the spatial distribution from the prior AI model and the large-scale structures and spatial continuity of the seismic data. The correlation coefficient between the input and output data is used as a metric for the quantitative assessment of the performance of the method. The proposed method achieved a global correlation coefficient between the observed and synthetic seismic data equal to 0.95 and 0.85 for the synthetic and real application cases, respectively. We additionally computed the SSIM between inputs and outputs for a better assessment of the preservation of the structural information.

We compare the results obtained with the proposed methodology against the GSI method. For the real application data set, we ran the GSI method on a Xeon Gold 6242 processor with 32 cores (2.8 GHz, DDR4-2933, 12 GB) with six iterations and 100 realizations. We started from the same ensemble of geostatistical realizations used as prior in our application example. Figure 13 shows vertical sections extracted from the best-fit inverted models. Figure 14 shows the observed seismic and synthetic seismic data corresponding to the former results. A visual analysis of the AI models indicates that the DL-based method presents higher lateral and vertical variability than the GSI one. The pointwise average and standard deviation results considering all of the realizations for both methods are shown in Figures 15 and 16. The variability of the DL output presents higher values and a more uniform spatial distribution. Thus, the proposed method can explore more alternative solutions for this inversion problem. In addition, the synthetic seismic data obtained with the proposed method do not present the artifacts caused by the trace-by-trace optimization of GSI. The correlation coefficient between the synthetic and observed seismic data of GSI is 0.88, quite close to the correlation coefficient obtained with the proposed method (0.85).

Figure 13. Comparison between a vertical well section of AI results obtained by the proposed deep physics-aware seismic inversion method and the GSI: (a) one realization predicted with the proposed method and (b) one realization predicted with GSI.

Figure 14. Comparison between vertical well sections extracted from the synthetic seismic data obtained by (a) the proposed method and (b) the GSI.

Figure 15. Comparison between the vertical well section of AI results obtained by the proposed deep physics-aware seismic inversion method and the GSI: (a) pointwise average of AI computed from the ensemble of predicted models with the proposed method and (b) pointwise average of AI computed from the ensemble of predicted models with GSI.

Figure 16. Comparison between the vertical well section of AI results obtained by the proposed deep physics-aware seismic inversion method and the GSI: (a) pointwise standard deviation of AI computed from the ensemble of predicted models with the proposed method and (b) pointwise standard deviation of AI computed from the ensemble of predicted models with GSI.

GSI is computationally expensive, mainly due to the perturbation of the model parameter space, which is performed with stochastic sequential simulation and cosimulation. In this implementation, each individual realization is parallelized given the number of CPUs available in the system. It took 45 min to perform the simulation step for each iteration. The total processing time, considering all of the calculations of the inversion procedure, is approximately 3 h. The time complexity of deep-learning models is evaluated in terms of the total time taken by training and prediction. This may reach millions of calculations and be quite expensive computationally. However, most of these matrix operations are performed in parallel on GPUs, which can considerably reduce the processing time. In practice, a comparison between the computational complexity of the GSI and the proposed method is not straightforward, as both methods leverage different parallelization strategies.

It took approximately 1 h to train the inversion network for the real application example. Considering the initial step of generating the prior ensemble and the estimation step, the total processing time is very similar to that of GSI. However, in the proposed method, because the network is trained on a subset of the prior simulated model ensemble, the training time does not scale with the number of estimates and the inference is performed almost in real time. As a consequence, for a large number of realizations, the computational cost of our method is dramatically improved when compared with the GSI one.

We believe that our method could be successfully applied in a different sedimentary basin, with different geologic settings. An exhaustive discussion regarding the generalization of the proposed inversion method to other geologic environments is out of the scope of this work; however, we anticipate that the proposed inversion method would have similar performance for application examples where seismic data with a good signal-to-noise ratio are available and where we can generate geostatistical simulations with spatial continuity patterns that resemble the true subsurface geology. This will require a reasonable number of wells and/or good geologic knowledge.

CONCLUSION

We presented a seismic inversion method that can predict AI from full-stack seismic data based on DL and geostatistical simulation. The proposed inversion method builds on two neural networks that act as inversion and forward models. The forward network mimics the seismic convolutional model. The misfit between the synthetic and observed seismic data, along with the prior model reconstruction loss, is then used to update the parameters of the network. The encoder-decoder architecture enables the inversion network to extract and preserve the relevant features from the seismic data and the prior model distribution. The AI models estimated with DL keep the variability observed in the prior geostatistical realizations.

The method is applied to a real 3D seismic data set, and the predicted AI models are consistent with those obtained from a full GSI methodology. A comparison against the existing GSI method is carried out. We achieve comparable performance considering the quantitative metric but considerably better visual perceptual quality. The computational cost of our method is dramatically improved when compared with the conventional one. Unlike the GSI approach, our solution is well suited to generate thousands of alternative solutions for better uncertainty quantification.

The proposed method has large potential for further extensions. Although the algorithm is described in the acoustic domain (i.e., inversion of the full stack for AI), it also can be generalized to the prestack domain. In future work, we aim to apply this methodology to solve amplitude-versus-angle and petrophysical seismic inversion in a simultaneous workflow.

ACKNOWLEDGMENTS

PB wants to express her gratitude to Petrobras for sponsoring this research project and for the permission to publish the data. LA gratefully acknowledges the support of CERENA (strategic project FCT-UIDB/04028/2020).

DATA AND MATERIALS AVAILABILITY

Data associated with this research are confidential and cannot be released.

REFERENCES

Adler, A., M. Araya-Polo, and T. Poggio, 2021, Deep learning for seismic inverse problems: Toward the acceleration of geophysical analysis workflows: IEEE Signal Processing Magazine, 38, 89–119, doi: 10.1109/MSP.2020.3037429.

Agarap, A. F., 2018, Deep learning using rectified linear units (ReLU): arXiv preprint, doi: 10.48550/arXiv.1803.08375.

Alfarraj, M., and G. AlRegib, 2019, Semisupervised sequence modeling for elastic impedance inversion: Interpretation, 7, no. 3, SE237–SE249, doi: 10.1190/INT-2018-0250.1.

Alom, M. Z., T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, M. Hasan, B. C. van Essen, A. A. Awwal, and V. K. Asari, 2019, A state-of-the-art survey on deep learning theory and architectures: Electronics, 8, 292, doi: 10.3390/electronics8030292.

Alzubaidi, L., J. Zhang, A. J. Humaidi, A. Al-Dujaili, Y. Duan, O. Al-Shamma, J. Santamaría, M. A. Fadhel, M. Al-Amidie, and L. Farhan, 2021, Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions: Journal of Big Data, 8, 53, doi: 10.1186/s40537-021-00444-8.
Atanov, A., A. Ashukha, D. Molchanov, K. Neklyudov, and D. Vetrov, 2019, Uncertainty estimation via stochastic batch normalization: International Symposium on Neural Networks, Springer International Publishing, 261–269.

Azevedo, L., and A. Soares, 2017, Geostatistical methods for reservoir geophysics: Springer International Publishing.

Bengio, Y., P. Simard, and P. Frasconi, 1994, Learning long-term dependencies with gradient descent is difficult: IEEE Transactions on Neural Networks, 5, 157–166, doi: 10.1109/72.279181.

Biswas, R., M. K. Sen, V. Das, and T. Mukerji, 2019, Prestack and poststack inversion using a physics-guided convolutional neural network: Interpretation, 7, no. 3, SE161–SE174, doi: 10.1190/INT-2018-0236.1.

Bosch, M., T. Mukerji, and E. F. Gonzalez, 2010, Seismic inversion for reservoir properties combining statistical rock physics and geostatistics: A review: Geophysics, 75, no. 5, 75A165–75A176, doi: 10.1190/1.3478209.

Chan, S., and A. H. Elsheikh, 2017, Parametrization and generation of geological models with generative adversarial networks: arXiv preprint, doi: 10.48550/arXiv.1708.01810.

Chen, H., J. Gao, X. Jiang, Z. Gao, and W. Zhang, 2021, Optimization-inspired deep learning high-resolution inversion for seismic data: Geophysics, 86, no. 3, R265–R276, doi: 10.1190/geo2020-0034.1.

Colombo, D., W. Li, E. Sandoval-Curiel, and G. W. McNeice, 2020, Deep-learning electromagnetic monitoring coupled to fluid flow simulators: Geophysics, 85, no. 4, WA1–WA12, doi: 10.1190/geo2019-0428.1.

Das, V., A. Pollack, U. Wollner, and T. Mukerji, 2019, Convolutional neural network for seismic impedance inversion: Geophysics, 84, no. 6, R869–R880, doi: 10.1190/geo2018-0838.1.

Deutsch, C. V., and A. Journel, 1998, GSLIB: Geostatistical software library and user's guide: Oxford University Press, Applied Geostatistics Series.

Di, H., C. Li, S. Smith, Z. Li, and A. Abubakar, 2021, Imposing interpretational constraints on a seismic interpretation convolutional neural network: Geophysics, 86, no. 3, IM63–IM71, doi: 10.1190/geo2020-0449.1.

Doyen, P., 2007, Seismic reservoir characterization: An earth modelling perspective: EAGE Publications bv.

Dupont, E., T. Zhang, P. Tilke, L. Liang, and W. J. Bailey, 2018, Generating realistic geology conditioned on physical measurements with generative adversarial networks: arXiv preprint.

Feng, R., T. M. Hansen, D. Grana, and N. Balling, 2020, An unsupervised deep-learning method for porosity estimation based on poststack seismic data: Geophysics, 85, no. 6, M97–M105, doi: 10.1190/geo2020-0121.1.

Ferreirinha, T., R. Nunes, L. Azevedo, A. Soares, F. Pratas, P. Tomás, and N. Roma, 2015, Acceleration of stochastic seismic inversion in OpenCL-based heterogeneous platforms: Computers and Geosciences, 78, 26–36, doi: 10.1016/j.cageo.2015.02.005.

Gal, Y., and Z. Ghahramani, 2016, Dropout as a Bayesian approximation: Representing model uncertainty in deep learning: Proceedings of the 33rd International Conference on Machine Learning, JMLR.org, 48, 1050–1059.

Gonzalez, E., T. Mukerji, and G. Mavko, 2008, Seismic inversion combining rock physics and multiple-point geostatistics: Geophysics, 73, no. 1, R11–R21, doi: 10.1190/1.2803748.

Goodfellow, I., Y. Bengio, and A. Courville, 2016, Deep learning: MIT Press.

Grana, D., L. Azevedo, L. de Figueiredo, P. Connolly, and T. Mukerji, 2022, Probabilistic inversion of seismic data for reservoir petrophysical characterization: Review and examples: Geophysics, 87, no. 5, M199–M216, doi: 10.1190/geo2021-0776.1.

Grana, D., T. Mukerji, and P. Doyen, 2021, Seismic reservoir modeling: Theory, examples, and algorithms: Wiley.

Grana, D., T. Mukerji, J. Dvorkin, and G. Mavko, 2012, Stochastic inversion of facies from seismic data based on sequential simulations and probability perturbation method: Geophysics, 77, no. 4, M53–M72, doi: 10.1190/geo2011-0417.1.

He, K., X. Zhang, S. Ren, and J. Sun, 2016, Deep residual learning for image recognition: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 770–778.

Hu, W., Y. Jin, X. Wu, and J. Chen, 2020, Physics-guided self-supervised learning for low frequency data prediction in FWI: 90th Annual International Meeting, SEG, Expanded Abstracts, 875–879, doi: 10.1190/segam2020-3423396.1.

Hüllermeier, E., and W. Waegeman, 2021, Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods: Machine Learning, 110, 457–506, doi: 10.1007/s10994-021-05946-3.

Jin, P., S. Feng, Y. Lin, B. Wohlberg, D. Moulton, E. Cromwell, and X. Chen, 2020, CycleFCN: A physics-informed data-driven seismic waveform inversion method: 90th Annual International Meeting, SEG, Expanded Abstracts, 3867–3871, doi: 10.1190/segam2020-w13-05.1.

Karpatne, A., G. Atluri, J. H. Faghmous, M. Steinbach, A. Banerjee, A. Ganguly, S. Shekhar, N. Samatova, and V. Kumar, 2017, Theory-guided data science: A new paradigm for scientific discovery from data: IEEE Transactions on Knowledge and Data Engineering, 29, 2318–2331, doi: 10.1109/TKDE.2017.2720168.

Kingma, D. P., and J. Ba, 2017, Adam: A method for stochastic optimization: arXiv preprint, doi: 10.48550/arXiv.1412.6980.

Krizhevsky, A., I. Sutskever, and G. Hinton, 2012, ImageNet classification with deep convolutional neural networks: Advances in Neural Information Processing Systems 25.

Laloy, E., R. Hérault, D. Jacques, and N. Linde, 2018, Training-image based geostatistical inversion using a spatial generative adversarial neural network: Water Resources Research, 54, 381–406, doi: 10.1002/2017WR022148.

LeCun, Y., Y. Bengio, and G. Hinton, 2015, Deep learning: Nature, 521, 436–444, doi: 10.1038/nature14539.

LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, 1989, Backpropagation applied to handwritten zip code recognition: Neural Computation, 1, 541–551, doi: 10.1162/neco.1989.1.4.541.

Li, F., H. Zhou, Z. Wang, and X. Wu, 2021, ADDCNN: An attention-based deep dilated convolutional neural network for seismic facies analysis with interpretable spatial-spectral maps: IEEE Transactions on Geoscience and Remote Sensing, 59, 1733–1744, doi: 10.1109/TGRS.2020.2999365.

Liu, M., M. Jervis, W. Li, and P. Nivlet, 2020, Seismic facies classification using supervised convolutional neural networks and semisupervised generative adversarial networks: Geophysics, 85, no. 4, O47–O58, doi: 10.1190/geo2019-0627.1.

Mena, J., O. Pujol, and J. Vitrià, 2021, A survey on uncertainty estimation in deep learning classification systems from a Bayesian perspective: ACM Computing Surveys, 54, 1–35, doi: 10.1145/3477140.

Mosser, L., O. Dubrule, and M. J. Blunt, 2020, Stochastic seismic waveform inversion using generative adversarial networks as a geological prior: Mathematical Geosciences, 52, 53–79, doi: 10.1007/s11004-019-09832-6.

Mosser, L., W. Kimman, J. Dramsch, S. Purves, A. D. la Fuente Briceño, and G. Ganssle, 2018, Rapid seismic domain transfer: Seismic velocity inversion and modeling using deep generative neural networks: 80th Annual International Conference and Exhibition, EAGE, Extended Abstracts, doi: 10.3997/2214-4609.201800734.

Pan, S., and Q. Yang, 2010, A survey on transfer learning: IEEE Transactions on Knowledge and Data Engineering, 22, 1345–1359, doi: 10.1109/TKDE.2009.191.

Park, M. J., and M. D. Sacchi, 2020, Automatic velocity analysis using convolutional neural network and transfer learning: Geophysics, 85, no. 1, V33–V43, doi: 10.1190/geo2018-0870.1.

Paszke, A., S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, 2017, Automatic differentiation in PyTorch: NIPS 2017 Autodiff Workshop.

Radford, A., L. Metz, and S. Chintala, 2016, Unsupervised representation learning with deep convolutional generative adversarial networks: Proceedings of the 4th International Conference on Learning Representations.

Richardson, A., 2018, Generative adversarial networks for model order reduction in seismic full-waveform inversion: arXiv preprint, doi: 10.48550/arXiv.1806.00828.

Soares, A., 2001, Direct sequential simulation and cosimulation: Mathematical Geology, 33, 911–926, doi: 10.1023/A:1012246006212.

Soares, A., J. D. Diet, and L. Guerreiro, 2007, Stochastic inversion with a global perturbation method: Petroleum Geostatistics 2007.

Springenberg, J. T., A. Dosovitskiy, T. Brox, and M. A. Riedmiller, 2015, Striving for simplicity: The all convolutional net: Proceedings of the 3rd International Conference on Learning Representations.

Sun, J., K. Innanen, and C. Huang, 2021a, Physics-guided deep learning for seismic inversion with hybrid training and uncertainty analysis: Geophysics, 86, no. 3, R303–R317, doi: 10.1190/geo2020-0312.1.

Sun, Y., M. Araya-Polo, and P. Williamson, 2021b, Data characterization and transfer learning for DL-driven velocity model building: First International Meeting for Applied Geoscience & Energy, SEG, Expanded Abstracts, 1475–1479, doi: 10.1190/segam2021-3594467.1.

Tarantola, A., 2004, Inverse problem theory and methods for model parameter estimation: SIAM.

van Essen, B., H. Kim, R. Pearce, K. Boakye, and B. Chen, 2015, LBANN: Livermore big artificial neural network HPC toolkit: Proceedings of the Machine Learning in High-Performance Computing Environments workshop, held in conjunction with SC15: The International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery.

Wang, Z., A. Bovik, H. Sheikh, and E. Simoncelli, 2004, Image quality assessment: From error visibility to structural similarity: IEEE Transactions on Image Processing, 13, 600–612, doi: 10.1109/TIP.2003.819861.

Weiss, K., T. M. Khoshgoftaar, and D. Wang, 2016, A survey of transfer learning: Journal of Big Data, 3, 9, doi: 10.1186/s40537-016-0043-6.

Wiatrak, M., and S. V. Albrecht, 2019, Stabilizing generative adversarial network training: A survey: CoRR, abs/1910.00927.

Xiong, W., X. Ji, Y. Ma, Y. Wang, N. M. Benhassan, M. N. Ali, and Y. Luo, 2018, Seismic fault detection with convolutional neural network: Geophysics, 83, no. 5, O97–O103, doi: 10.1190/geo2017-0666.1.

Zhang, R., Y. Liu, and H. Sun, 2020, Physics-guided convolutional neural network (PhyCNN) for data-driven seismic response modeling: Engineering Structures, 215, 1–24.

Biographies and photographs of the authors are not available.