The impact of noise
Noise can affect the quality of seismic data, damaging the geological integrity of the final migrated image; therefore, we should minimize its impact. If not properly attenuated it may affect amplitude-related attributes, lead to difficulties during quantitative interpretation (Cambois, 2001; Ball et al., 2011), and result in an inaccurate appraisal of the reservoir.

Noise can take many forms. In this paper, we consider the noise generated during the migration stage. The noise sources might be as diverse as residual impulsive noise, multiple energy, mispositioned primary energy due to errors in estimation of earth properties, or insufficient illumination caused by limitations in the data acquisition and/or complex media-related propagation effects. In each case the migrated image may be affected by suboptimal destructive interference of the migration isochrones (Gardner and Canning, 1994), resulting in a contamination of the data by coherent noise. We concentrate on the latter case, where the resulting noise is due to inadequate illumination in complex media.

An experienced geophysicist may easily differentiate signal and noise in a seismic section. It can be challenging to remove the noise without affecting the signal, because the coherent noise often has similar seismic characteristics to the desirable components of the data. Several approaches can minimize the migration artefacts. These include data regularization prior to migration, filtering during migration, post-processing after migration, and least-squares migration methods.

Data reconstruction is often used to overcome irregularities in data coverage (Chemingui and Biondi, 2002; Schonewille et al., 2009). Depending on the method and migration algorithm, regularized data may produce less noise, but can require a significant effort in the data preparation. Moreover, regularization may impact the resolution of the final image. Aperture optimization may also reduce the impact of noise generated in the migration process (Alerini and Ursin, 2009; Klokov and Fomel, 2012), but requires knowledge of local structural dips, and results are dependent on the accuracy of such dip information. There are pragmatic alternatives, such as filters that can be designed to attenuate the noise in the image domain (Hale, 2011). As the noise is often coherent, sharing seismic characteristics with the signal, it can be challenging to design filters that only remove the unwanted components but preserve the useful energy.

The adoption of artificial intelligence
In the early days, computers learnt how to solve problems that were intellectually difficult for humans by following a sequence of strict mathematical rules. The true challenge was to create a machine that could solve problems that humans solve intuitively, problems that are hard to describe in formal rules. This knowledge somehow needs to be captured for a computer to behave in an intelligent way (Goodfellow et al., 2017), and this is the challenge of creating an artificial intelligence (AI). To overcome this problem, the field of machine learning (ML) was born: AI systems were given the ability to acquire knowledge by extracting patterns from data and gathering knowledge from experience. Classical ML methods are highly dependent on features that are prepared by humans, and it can be time-consuming and difficult to extract and provide the right set of features, free of human bias, for an ML algorithm to perform well. Deep learning (DL) overcomes this problem by extracting information from raw data: complex representations can be learnt from the input by decomposing the data into simpler intermediate representations. DL models look at the data at different scales, layer by layer. Deep learning has the ability to perform automatic feature extraction from raw data without depending completely on human-crafted features. Together with advanced architectures and optimized training approaches, an increase in training data allows DL algorithms to approach human performance on complex tasks by learning from a vast variety of examples.

Data is not in short supply within the seismic processing business; however, the adoption of artificial intelligence has not been as extensive as in other data-rich industries. There is evidence that this is changing, as seismic companies seek to augment decision making and reduce project cycle times. The number of papers and manuscripts describing applications of artificial intelligence and data analytics has grown at geophysical conferences and in geophysical journals. The trend in paper numbers suggests
methods invoking artificial intelligence-enabled automation may be the future of our industry. The combination of data and computer science with geophysics may be applicable to every aspect of a seismic processing project. From unsupervised (Martin et al., 2015) to supervised (Farmani and Pedersen, 2020) classification of denoising workflows, and from support vector regression for data interpolation (Jia and Ma, 2017) to parabolic dictionary learning for data reconstruction (Turquais et al., 2019), most aspects of data domain processing are being tested. Using a variety of neural network approaches, efforts are being made to compare and contrast velocity model building with conventional inversion-based schemes (Øye and Dahl, 2019; Yang and Ma, 2019; Zheng et al., 2019). The reported results look encouraging.

Using a deep convolutional neural network for image denoising
For coherent noise attenuation, rather than explicitly formulating filters as in conventional methods, we present an AI approach that utilizes a deep convolutional neural network (CNN) to achieve the same goal. The main components of CNNs are convolutional filters that are iteratively adjusted during the training step to handle the artefacts and produce clean outputs from the noisy inputs. The trained models are then used to denoise the seismic images from field experiments.

A neural network may act as a universal function approximator to mimic the characteristics of a complex function F that maps a noisy input x to a noise-free output y:

y = F(x)  (1)

The goal of the training is to find a transformation that maps x into a set of corresponding y. To do this we minimize a cost function (J), defined as the difference between the transformed inputs y' and the desired clean outputs y, in the L2 sense, with respect to the parameters w of the trained network:

y' = F(x, w)  (2)

J(w) = ||y' - y||₂²  (3)

The stability and quality of predictions depends on the network architecture, the hyperparameters and the training data set. In our example, the training data were created using noisy images as an input to the CNN and clean images (noise-free) as an output.

The architecture of a convolutional neural network contains a number of different operations; the goal of the trained network is to replicate human endeavour. In our case we are trying to identify and remove coherent noise from a seismic image. A typical CNN architecture for computer vision problems consists of a number of different components that include convolutional layers and activation functions. It may also contain other operations such as downsampling (or pooling), upsampling, batch normalization, etc. All network components are connected in the form of a graph.

The essential components of CNN architectures are convolutional layers separated by non-linear activation functions. Convolutional layers use filters, which are also known as kernels. Each filter is an array consisting of a sequence of numbers, or weights. The filter slides over the image, only being exposed to a small number of input pixels at any time. The operation used is a dot product of the input values with the filter weights, and the output is a single number per sliding window. The result over the entire input is called an activation map or feature map. Depending on the filter, each activation map identifies distinguishing features; with each progressive convolutional layer, more complex features can be determined. Non-linearity is a crucial part that allows the neural network to approximate the complex operations necessary for solving a given task. The so-called activation functions are used for this purpose.

There are many other components that could be used in CNN architectures. In our denoising architecture, we use a pooling step for downsampling. There are several types of pooling; we used a maxpooling operation, which selects the largest number from the neighbouring cells during the downsampling process. This reduces the spatial dimensionality of the input data, limiting the computational cost and increasing the exposure of the input for the next convolutional layer. The opposite operation is upsampling, which is used to refine the spatial sampling of the feature map.
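To make the convolution, activation and pooling operations just described concrete, the following NumPy sketch implements a single valid convolution (a sliding-window dot product), a ReLU activation and a 2x2 maxpooling step on a toy patch; the helper names, the random 8x8 patch and the 3x3 kernel are illustrative placeholders and are not part of the published workflow.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a small kernel over the image; each output value is the dot
    product of the kernel weights with the pixels under the window."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

def relu(x):
    """Non-linear activation applied element-wise to the feature map."""
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    """Downsample by keeping the largest value in each 2x2 neighbourhood."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A toy 3x3 filter applied to a random "image patch".
patch = np.random.randn(8, 8)
kernel = np.random.randn(3, 3)
activation = maxpool2x2(relu(conv2d_valid(patch, kernel)))
print(activation.shape)  # (3, 3): 8x8 -> 6x6 after valid conv -> 3x3 after pooling
```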
The hyperparameters of a convolutional neural network are its structure, components, and training specifications. To achieve the best performance of the convolutional neural network, we optimize the hyperparameters through testing. In practice, the network is trained using user-defined data. This is critical for the success of the neural network in achieving its goals; the data needs to be representative of the problem we are trying to solve. The training happens over the course of multiple epochs. Each epoch consists of multiple iterations, and each iteration uses a subset of the input data set, called a batch. On average, every epoch passes through all input data samples once. Data augmentation enables a modification to the pool of training data. It is one way to increase the number and variability of the data set, enabling a more robust prediction and resulting in an increase in the level of sophistication of the trained network.

Specifics of the architecture
Among the wide variety of commonly used network architectures for image denoising in computer vision, we considered the U-net architecture (Ronneberger et al., 2015) to be most suitable for this problem. During our testing phase, it showed better convergence and faster training, and it solves the problem naturally as it enables operations at different feature resolutions (Figure 1).

The architecture consists of three parts: the contraction (left branch), the bottleneck (bottom) and the expansion (right branch). Each convolutional layer receives an input and applies a set of 3x3 filters, followed by a non-linear activation function.

The contracting path consists of four blocks; each block has two convolutional layers followed by a downsampling procedure (maxpooling). The number of filters in the convolutional layers doubles each time the resolution decreases, so the architecture retains the ability to represent the complex features present in the input.

The bottom layer takes an input from the left branch and applies two convolutional layers.

The expansion path receives the input from the bottleneck and also consists of four blocks; each block has two convolutional layers followed by an upsampling procedure. After each upsampling step, the number of filters in the convolutional layers halves.

The corresponding blocks of the contraction and expansion paths are connected by 'skip connections'. This helps to solve the problem of vanishing gradients during the training stage and simplifies the prediction task, as there is no need to reconstruct the image at full resolution from its compressed representation.

To accommodate the challenge, we modified the convolutional blocks of U-net and fine-tuned the hyperparameters of the network during the training process to achieve better performance of the neural network. In order to reduce the likelihood of overfitting, we added dropout layers.
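As a rough illustration (not the authors' implementation), the PyTorch sketch below assembles a reduced two-level U-net with the ingredients described above: paired 3x3 convolutions, maxpooling on the contracting path, upsampling and skip connections on the expanding path, and dropout layers. The channel counts, dropout rate and reduced depth are placeholder choices; the network described in the text uses four blocks per path.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, dropout=0.0):
    """Two 3x3 convolutions with ReLU activations, optionally followed by dropout."""
    layers = [
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    ]
    if dropout > 0:
        layers.append(nn.Dropout2d(dropout))
    return nn.Sequential(*layers)

class SmallUNet(nn.Module):
    """Two-level encoder/decoder with skip connections (a reduced stand-in
    for the four-block contraction and expansion paths described in the text)."""
    def __init__(self, base=32, dropout=0.2):
        super().__init__()
        self.enc1 = conv_block(1, base, dropout)
        self.enc2 = conv_block(base, base * 2, dropout)       # filters double per level
        self.bottleneck = conv_block(base * 2, base * 4, dropout)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2, dropout)   # filters halve per level
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base, dropout)
        self.out = nn.Conv2d(base, 1, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                                     # contraction, level 1
        e2 = self.enc2(self.pool(e1))                         # contraction, level 2
        b = self.bottleneck(self.pool(e2))                    # bottom of the "U"
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.out(d1)

# One 256x256 noisy image patch in, one denoised patch out.
model = SmallUNet()
noisy = torch.randn(1, 1, 256, 256)
print(model(noisy).shape)  # torch.Size([1, 1, 256, 256])
```

The concatenations in forward implement the skip connections, so the decoder refines detail from the encoder features rather than reconstructing the full-resolution image from the bottleneck alone.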
We modelled synthetic shot gathers, which were migrated to form the noisy inputs and clean outputs used in the training step. We subsampled and migrated the synthetic data to generate the coherent noise in the images. The noise-free output consisted of clean images from the migration of appropriately sampled data. The image patch size for training was 256x256 pixels (Figure 2). We carefully selected the data set so that it included variations in the following: frequency content, structural dip, amplitude, and noise character and level. To increase the variability of the input, we used data augmentation, which included horizontal flips, random crops, sign reversal, and filtering and scaling with depth, resulting in approximately 100,000 total input samples.
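A minimal sketch of how such paired augmentation might be implemented is shown below; the NumPy helper, the random stand-in images and the 50% probabilities are illustrative assumptions, and the depth-dependent filtering and scaling mentioned above are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def augment_pair(noisy, clean, crop_size=256):
    """Apply the same random crop, horizontal flip and sign reversal to a
    noisy patch and its clean counterpart."""
    h, w = noisy.shape
    i = rng.integers(0, h - crop_size + 1)   # random crop origin
    j = rng.integers(0, w - crop_size + 1)
    noisy = noisy[i:i + crop_size, j:j + crop_size]
    clean = clean[i:i + crop_size, j:j + crop_size]
    if rng.random() < 0.5:                   # horizontal flip
        noisy, clean = noisy[:, ::-1], clean[:, ::-1]
    if rng.random() < 0.5:                   # sign (polarity) reversal
        noisy, clean = -noisy, -clean
    return noisy, clean

# Stand-ins for a migrated image with and without coherent migration noise.
noisy_img = rng.standard_normal((512, 512))
clean_img = rng.standard_normal((512, 512))
x, y = augment_pair(noisy_img, clean_img)
print(x.shape, y.shape)   # (256, 256) (256, 256)
```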
Hyperparameters such as learning rate schedule, dropout rate and batch size were adjusted during the training phase to minimize the prediction error. We trained the network for 50 epochs on a single GPU with 32 GB of memory.

The challenge of overfitting
A common challenge for machine learning algorithms is overfitting. This occurs when the trained network's performance shows great promise on the data used for training, but has a poor success rate when attempting to generalize to previously unseen data. This happens when the capacity of the model is too large compared to the diversity of the data set used for building the model. With neural networks, this occurs when there are too many parameters. The model may provide great flexibility and approximation power, but the amount and variability of the data given to it is not enough to constrain the weights within the network, at least not without regularization. As a result, the network makes unreasonable predictions for any data that differs from the training set, in our case in frequency, amplitude or noise level. As a precaution, the input data set is split into two subsets, one for training and the remainder for validation. We then monitor the trained model's performance on the latter. A gradual decrease of the loss function for both training and validation data sets implies reasonable generalization, assuming fair selection of the validation data set.

To reduce the overfitting problem, we used a dropout technique. During the training step, we carefully monitored the behaviour of the objective function for both the training and validation data sets, assuring the proper behaviour whilst preserving an effective convergence (Figure 3).
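To tie together the training procedure described above (an L2 objective, mini-batch iterations within epochs, dropout, and a held-out validation subset whose loss is monitored alongside the training loss), a schematic PyTorch loop might look as follows; the stand-in model, random tensors, learning rate, batch size and split sizes are placeholders rather than the settings used in this work.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-ins for the denoising network and the migrated image patches;
# in the text these are a modified U-net and roughly 100,000 synthetic 256x256 patches.
model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Dropout2d(0.2), nn.Conv2d(16, 1, 3, padding=1))
noisy = torch.randn(256, 1, 64, 64)           # noisy migrated patches (inputs)
clean = torch.randn(256, 1, 64, 64)           # corresponding clean patches (targets)

dataset = TensorDataset(noisy, clean)
train_set, val_set = random_split(dataset, [224, 32])   # hold out data for validation
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                        # L2 difference between prediction and target

for epoch in range(5):                        # the text describes training for 50 epochs
    model.train()
    for x, y in train_loader:                 # each iteration consumes one batch
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():                     # monitor the held-out loss to detect overfitting
        val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader) / len(val_loader)
    print(f"epoch {epoch}: validation loss {val_loss:.4f}")
```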
Case studies
The denoising capabilities of the neural network were tested on two field data sets, one from a deep water survey offshore Brazil, the other from a shallow water example in the North Sea. In both cases, insufficient illumination and complex media cause coherent noise in the images that has similar seismic characteristics to the signal we want to preserve. The case studies demonstrate the ability of the trained network to attenuate the noise from data sets that represent two geologically different settings. We used only synthetic data to train the neural network; therefore, the case studies demonstrate the ability of the network to generalize outside the training data set.

We compared the performance of the CNN-based denoising tool with an application of a commonly used structure-oriented filter (Hale, 2011). We could parameterize the filter differently to preserve the primary energy; however, here we were focused primarily on the noise attenuation aspect. More aggressive filter settings can better eliminate the noise, at the cost of damaging image resolution.

Example one – offshore Brazil
In the first example, from Brazil, there is strong and pervasive coherent noise. The upper yellow arrow in Figure 4 highlights where this is most evident. The middle yellow arrow in the same figure shows coherent noise above a high-contrast and rugose surface. The migration noise directly overlying the unconformity distorts the seismic events, making interpretation challenging. In all cases, the noise shares seismic characteristics with the signal that we want to use and preserve, such as dipping fault planes and the flanks of the deeper steep-sided body. Figure 5 shows the result of using the CNN on the input data. The blue arrows show the removal of the coherent noise. The reflectivity above the rugose unconformity is more continuous, no longer disrupted by the noise forms, and there is no noise contamination of the data abutting the deep steep-sided body.

The denoised section is much cleaner, and reflectivity is easier to track. Steep dipping energy, such as fault planes, is still present. The difference section (Figure 6) demonstrates the impact of the CNN – a large amount of noise has been removed. There are indications that the process has attenuated some steep dipping energy that correlates with the noise; however, the output section (Figure 5) shows that much of this energy remains unscathed.
Figure 7 demonstrates an application of a structure-oriented filter. The upper turquoise arrows indicate locations where the noise is still present, whilst the lower orange arrows show where the process has removed the complementary steep dipping signal we would ideally preserve. It is also important to emphasize that the application of the structure-oriented filter affected image resolution. This conventional method has not been as effective as the CNN approach.

Example two – North Sea
In the second example, poor illumination of a single high-contrast and undulating event causes localized migration-related noise (yellow arrows – Figure 8). The noise swings upwards, disrupting the events directly above the rugose event and making interpretation challenging.

The conventional denoise approach using structure-oriented filters mitigates some noise (turquoise arrows – Figure 9), but also smears some of the isolated injectites located in the shallower layer (orange arrows – Figure 9). The injectites have localized reservoir potential. In Figure 10 (blue arrows), we see that the trained neural network has removed more noise; the events overlying the source of the noise are easier to interpret. Figure 11 shows the impact of the deep-learning approach. The noise has almost been eradicated. It is worth noting the orange arrows in Figure 11. They show that the injectites are affected by the denoise process. Their shape is very similar to that of the coherent noise created in the migration process, as each migration-related noise form has an apex. Consequently, the process does attenuate some energy from the injectites, but no more than the conventional approach, which underperforms in attenuating the migration-related noise.

Discussion
The goal of this work is to demonstrate the denoising capabilities of a convolutional neural network, in particular the attenuation of coherent noise formed during the migration process. The network was trained using approximately 100,000 input samples, and