Jiang Et Al 2021
doi: 10.29382/eqs-2021-0038
Article
Key points:
• The advantages and disadvantages of two deep learning models, namely PhaseNet and EQTransformer, are compared for
the detection of aftershock sequences of the Yangbi and Maduo earthquakes.
• An analysis of factors affecting the generalization ability of earthquake detection models is presented, which can serve as
a reference for the development of new earthquake detection networks and evaluation indicators.
• Foreshock activity was evident three days prior to the mainshock of the Yangbi earthquake, whereas no foreshock
activity was detected three weeks prior to the Maduo earthquake.
Abstract
PhaseNet and EQTransformer are two state-of-the-art earthquake detection methods that have been increasingly applied worldwide. To evaluate the generalization ability of the two models and provide insights for the development of new models, this study took the sequences of the Yunnan Yangbi M6.4 earthquake and Qinghai Maduo M7.4 earthquake as examples to compare the earthquake detection effects of the two models as well as their abilities to process dense seismic sequences. The results show that, owing to the differences in seismic waveforms between geographical regions, the picking performance is reduced when the two models are applied directly to the detection of the Yangbi and Maduo earthquakes. PhaseNet has a higher recall than EQTransformer, but the recall of both models is reduced by 13%–56% when compared with the results reported in the original papers. The analysis results indicate that neural networks with deeper layers and complex structures may not necessarily enhance earthquake detection performance. In designing earthquake detection models, attention should be paid not only to the balance of depth, width, and architecture but also to the quality and quantity of the training datasets. In addition, noise datasets should be incorporated during training. According to the detection results for the continuous waveforms recorded 21 days before the Yangbi and Maduo earthquakes, the Yangbi earthquake exhibited foreshock activity, while the Maduo earthquake showed no foreshock activity, indicating that the nucleation processes of the two earthquakes were different.

Keywords: earthquake detection; deep learning; PhaseNet; EQTransformer; Yangbi earthquake; Maduo earthquake.

Citation: Jiang C, Fang LH, Fan LP, and Li BR (2021). Comparison of the earthquake detection effects of PhaseNet and EQTransformer considering the Yangbi and Maduo earthquakes. Earthq Sci 34(5): 425–435, doi: 10.29382/eqs-2021-0038.

Received 22 August 2021; received in revised form 28 September 2021; accepted 2 October 2021; available online 25 October 2021.

© The Seismological Society of China and Institute of Geophysics, China Earthquake Administration 2021.

1. Introduction

The automatic detection and phase picking of earthquakes are two important aspects of seismology research. With the construction of the National Earthquake Intensity Rapid Report and Early Warning Project in recent years and the inclusion of the China Seismic Experimental Site (CSES) in the 14th National Five-Year Plan, the number of fixed seismic stations in China is expected to see a rapid increase from over 1,200 to approximately 15,000. With the densification of this network, the number of monitored earthquakes will increase exponentially. Meanwhile, with the rise of induced earthquakes (Lei XL et al., 2020; Yang W et al., 2021), manual processing can no longer keep up with the demands of real-time processing, such as earthquake early warning and rapid reporting. Thus, there is a pressing need to develop efficient automatic earthquake detection methods.

Among the early earthquake detection methods proposed by seismologists, the most representative ones can be categorized into three types: (1) amplitude and energy ratio-based, such as the short-/long-time window (STA/LTA) method (Allen, 1978; Withers et al., 1998; Baer and Kradolfer, 1987); (2) waveform similarity-based, such as the template matching algorithm (Peng ZG and Zhao P, 2009; Gibbons and Ringdal, 2006; Shelly et al., 2007) and the fingerprint and similarity thresholding (FAST) method (Yoon et al., 2015); and (3) traditional machine learning-based (Wang J and Teng TL, 1997; Gentili and Michelini, 2006; Dai HC and MacBeth, 1995). In these traditional methods, features such as waveform polarization and sudden changes in amplitude and frequency are utilized to pick up seismic signals. While the template matching method has good catalog completeness, it requires pre-preparation of templates, has a low detection efficiency, and cannot determine the seismic phase arrival time directly. The early machine learning-based detection methods were limited by hardware performance, had difficulty in performing complex calculations, and were not widely adopted for earthquake detection.

The AlphaGo computer program developed by Google defeated a top professional human Go player in March 2016, which prompted increased interest in deep learning methods from seismologists. In May 2017, the Institute of Geophysics of China Earthquake Administration (CEA) and Alibaba Cloud Tianchi platform jointly launched the “International Aftershock Detection Contest”, which further accelerated the development of artificial intelligence (AI) in the field of earthquake detection and seismic phase recognition (Fang LH et al., 2017). Deep learning networks are more complex than traditional machine learning networks (LeCun et al., 2015), and they are currently applied to seismic sequence processing for two main purposes: image classification and semantic segmentation. Image classification can be used to identify both natural and unnatural earthquakes (Wei YG et al., 2019) as well as seismic signals and noise (Li ZF et al., 2018). Semantic segmentation is primarily used for detecting earthquakes, identifying the intervals for P- and S-waves in seismic waveforms, and precisely locating their first motions. In seismic sequence processing, semantic segmentation is mainly performed by convolutional neural networks (CNNs), which extract features from waveforms in a manner similar to image feature extraction in human vision. Improvements made by scholars based on CNNs include the following. The Generalized Phase Detection (GPD) method uses a full CNN to recognize seismic phases and noise (Ross et al., 2018). The PickNet automatic seismic phase picking algorithm utilizes multi-scale and multi-level information (Wang J et al., 2019). The PpkNet hybrid algorithm employs a CNN for event detection and a recurrent neural network (RNN) for phase picking (Zhou YJ et al., 2019). The Unet method leverages a CNN and symmetric network structure (Zhao M et al., 2019; Zhu WQ and Beroza, 2018). The EQTransformer method was developed based on a CNN, RNN, and attention mechanism (Mousavi et al., 2020; Xiao ZW et al., 2021). The above models were trained on large datasets. Among them, PhaseNet developed by Zhu WQ and Beroza (2019) and EQTransformer developed by Mousavi et al. (2020) have exhibited excellent precision, recall, and F1-Score and have been increasingly applied both at home and abroad (Table 1).

As different geographical regions exhibit different waveform characteristics, if a given earthquake detection model is applied directly to the study area, its picking performance may be reduced, and its recall could drop by 30%–40% (Chai CP et al., 2020). To evaluate the generalization ability of the PhaseNet and EQTransformer models, especially when applied to dense seismic sequences, the Yangbi and Maduo earthquakes were chosen as the examples for this study, and the continuous waveforms were processed using the two methods for earthquake detection. The arrival time picking, recall, and precision were analyzed in comparison to manual processing results, and the influences of training sets, network structures, and other factors on the detection effect are discussed. The research outcomes are expected to serve as a reference for developing new evaluation indicators as well as improving the performance of earthquake detection models.

2. Methods

2.1. PhaseNet method

PhaseNet is an earthquake detection model developed by Zhu WQ and Beroza at Stanford University based on
Unet, which is an image semantic segmentation network. Unet can extract the precise outline of the target object from an input image. It has a symmetrical U-shaped structure, with convolutional layers on the left and upsampling layers on the right. PhaseNet is an adaptation of Unet for picking seismic phases. It has a depth of five layers, and the number of parameters is 268,000. PhaseNet was trained on more than 600,000 waveforms recorded by the Northern California Earthquake Data Center (NCEDC). The window length for the training data was 30 s, the sampling rate was 100 Hz, and normalization preprocessing was performed on the waveforms. When applying this model to the test set, a pick with a residual (arrival time difference between AI and analyst) of less than 0.5 s was counted as a true positive. For P-wave picking, the precision was 96.0%, and the recall was 96.0%; for S-wave picking, the precision was 96.0%, and the recall was 93.0%. To date, this method has been applied to aftershock sequence detection in regions including Changning, Sichuan (Zhao M et al., 2019), Ridgecrest, US (Liu M et al., 2020), and Central Apennines, Italy (Zhang X et al., 2020).

2.2. EQTransformer method

EQTransformer is an earthquake detection and phase picking network established by Mousavi et al. at Stanford University based on Transformer, which is an attention mechanism-based network framework developed by Google. This framework has been widely applied in natural language processing and has also achieved promising results in the field of image segmentation. EQTransformer combines both CNN and RNN models and introduces the attention mechanism to retain the local and global features of seismic signals. EQTransformer differs from other networks in that it has one encoder consisting of 17 layers and three decoders consisting of 8–10 layers each, which output the earthquake probability, P-wave probability, and S-wave probability. The number of parameters is 378,000. EQTransformer was trained using the STanford EArthquake Dataset (STEAD), which contains the data of approximately 850,000 earthquakes (Mousavi et al., 2019). The waveform length was 1 min, the sampling rate was 100 Hz, and the training data were preprocessed by bandpass filtering at 1.0–45.0 Hz. When applying this model to the STEAD test set, a pick with a residual of less than 0.5 s was counted as a true positive. For P-wave picking, the precision was 99.0%, and the recall was 99.0%; for S-wave picking, the precision was 99.0%, and the recall was 96.0%.

Note: The data in Table 1 are cited from Zhu WQ and Beroza (2019) and Mousavi et al. (2020). Only data with an arrival time difference of less than 0.5 s are compared here.

3. Yangbi and Maduo earthquakes

An M6.4 earthquake occurred in Yangbi, Yunnan at 21:48 on May 21, 2021. Four hours later, an M7.4 earthquake occurred in Maduo, Qinghai. Both events were accompanied by a large number of aftershocks (Figure 1) (Wang WL et al., 2021; Yang T et al., J Earth Sci, in revision; Su JB et al., 2021). For the Yangbi earthquake, continuous waveform data recorded at 14 stations near the epicenter from May 21 to May 27 were used. During this period, 3,650 earthquake events were processed manually, and the earthquake bulletin contained 14,151 P-wave phases and 11,214 S-wave phases. For the Maduo earthquake, continuous waveform data recorded at 14 stations near the epicenter from May 22 to May 26 were used. During this period, 2,192 earthquake events were processed manually, and the seismic phase report contained 8,053 P-wave phases and 4,870 S-wave phases. The above data were sampled at 100 Hz and stored in MiniSEED format. Before using PhaseNet, the data were preprocessed by detrending and normalization. Before using EQTransformer, the data were preprocessed by detrending, bandpass filtering at 1.0–45.0 Hz, and normalization.

4. Results

The evaluation indicators used in the original papers of PhaseNet and EQTransformer, namely precision, recall, and F1-Score (Table 2), were adopted in this study. Precision is the ratio of true positives to both true and false
positives, and recall is the ratio of true positives to both true positives and false negatives. F1-Score is a comprehensive evaluation indicator that combines precision and recall. The corresponding calculation formulas are as follows:

$$
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall} &= \frac{TP}{TP + FN}, \\
\text{F1-Score} &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},
\end{aligned} \tag{1}
$$

where TP (true positive) represents the actual true data predicted as true by the model, FP (false positive) represents the actual false data predicted as true by the model, and FN (false negative) represents the actual true data predicted as false by the model.

Figure 1. Epicenter distributions of aftershock sequences following the (a) Yangbi earthquake and (b) Maduo earthquake.

Table 2. Comparison of detection results between PhaseNet and EQTransformer for the Maduo and Yangbi earthquakes. Columns: Indicator; Maduo earthquake (PhaseNet, EQTransformer); Yangbi earthquake (PhaseNet, EQTransformer).

The default probability thresholds of the original methods were adopted for the evaluation in this study: 0.3 for PhaseNet and 0.1 for EQTransformer. The results are listed in Table 2. Compared with the results given in the original papers of Zhu WQ and Beroza (2019) and Mousavi et al. (2020), the recall of both PhaseNet and EQTransformer dropped significantly, showing reductions of approximately 13%–56%. A greater reduction was observed for precision, largely because the original studies only used event waveforms as test data, whereas processing continuous waveforms produced many false detections. For the Yangbi earthquake, PhaseNet’s recall reached 82.9%, which was much higher than EQTransformer’s recall of 44.9%; however, PhaseNet’s precision was 17.1%, which was much lower than EQTransformer’s precision of 50.8%. For the Maduo earthquake, PhaseNet’s recall was 67.0%, which was still higher than EQTransformer’s recall of 50.0%; PhaseNet’s precision was 12.7%, which was also lower than EQTransformer’s precision of 32.4%. To examine the impact of dense seismic events on the detection results, the data from May 24–28 were selected, during which the aftershock frequency of the Yangbi earthquake gradually decreased. On this basis, the various indicators were recalculated and compared with those including the day of the mainshock. The recall and precision of PhaseNet were 82.3% and 11.2%, respectively; the recall decreased by 0.6 percentage points, and the precision decreased by 5.9 percentage points. The recall and precision of EQTransformer were 33.4% and 46.5%, respectively; the recall decreased by 11.5 percentage points, and the precision decreased by 4.3 percentage points. The recall of EQTransformer may have dropped more because the method is poor at detecting seismic events with a low signal-to-noise ratio (SNR). The SNR of seismic phases was high during the
intense outbreak of aftershocks, but then it gradually decreased, resulting in missed phase picking.

In addition, the choice of probability threshold had a large impact on the recall and precision. To objectively compare the earthquake detection effects of PhaseNet and EQTransformer, their precision, recall, and F1-Score were compared under different probability thresholds (Table 3). The comparison results for the two earthquake sequences under different probability thresholds show that PhaseNet gave a better recall performance and provided more sensitive earthquake detection. EQTransformer was more stable and exhibited higher precision. PhaseNet had a precision of only 8.3%–31.3%, indicating that the number of detected earthquakes was 3 to 16 times that of manual picking; that is, a large number of false earthquakes may have been detected. This can be attributed to the fact that no noise data were used in the training, so the false detection rate was high when processing continuous waveforms.

Table 3. Comparison of detection results between PhaseNet and EQTransformer under different probability thresholds (PT). Column groups: PhaseNet; EQTransformer.

The arrival time picking errors of PhaseNet and EQTransformer (Figure 2) were obtained by comparison with the manually picked arrival times of the Yangbi and Maduo earthquakes. The P-wave picking error of PhaseNet was lower than that of EQTransformer, while its S-wave picking error was similar to that of EQTransformer. For the Yangbi earthquake, the mean error, mean absolute error, and standard deviation of the PhaseNet arrival times relative to manual picking were 0.028 s, 0.066 s, and 0.107 s for the P-wave and 0.073 s, 0.116 s, and 0.146 s for the S-wave. The corresponding EQTransformer values were 0.011 s, 0.077 s, and 0.119 s for the P-wave and 0.076 s, 0.162 s, and 0.192 s for the S-wave. For the Maduo earthquake, the PhaseNet values were 0.009 s, 0.049 s, and 0.088 s for the P-wave and 0.017 s, 0.082 s, and 0.128 s for the S-wave, and the EQTransformer values were 0.000 s, 0.074 s, and 0.120 s for the P-wave and 0.001 s, 0.080 s, and 0.121 s for the S-wave.

Figure 2. Histograms of arrival time differences between manual and AI picking. P- and S-wave picking error distributions for the (a) Yangbi and (b) Maduo earthquakes. The picking errors of PhaseNet and EQTransformer are represented by red and blue histograms, respectively. Horizontal axis: T_AI − T_Catalog (s), from −0.4 to 0.4; vertical axis: percentage.

5. Discussion

As shown in Tables 2 and 3, the recall of PhaseNet was much higher than that of EQTransformer, but its precision was lower. In other words, compared with EQTransformer, PhaseNet is more likely to pick arrivals that are not present in the manual catalog; these include both arrivals missed in the manual analysis and false detections. In this study, only seismic phase picking was conducted; however, in routine processing of continuous waveforms, earthquake association, location, and other steps are also performed, and these filter out false detections. Therefore, recall has a higher priority than precision in practical applications; that is, it should be ensured that no seismic phase is missed during the picking process. This also indicates that, in addition to achieving high recall, precision, and F1-Score, an earthquake detection model should be tested on seismic data from different regions to improve its generalization ability so that actual data can be processed more effectively. For earthquake early warning and automatic reporting, recall should be emphasized to avoid missing earthquakes, and a multi-station strategy should be employed to filter false picks. For automatic cataloging, the precision should be increased appropriately, and the seismic phase picking
error should be reduced while ensuring a certain recall performance such that the manual review workload is minimized.

Both PhaseNet and EQTransformer exhibit some limitations (Figure 3). In cases where the same detection window contains multiple earthquakes with significant differences in amplitudes, the earthquake probabilities for smaller earthquakes detected by PhaseNet will be significantly lower, resulting in them being missed. Thus, when using PhaseNet, the window length should be carefully selected. A longer time window will allow the detection of more events with a greater distance between epicenters, but this will also miss many adjacent small events. EQTransformer, in turn, produces unstable missed detections, even for clearly distinguishable earthquake events.

In addition to optimizing input window length and other hyperparameters, attention should also be paid to the architecture of the network to enhance its performance. Regarding network architecture improvement, the most common methods include increasing the network’s depth (He KM et al., 2016), width (Zagoruyko and Komodakis, 2016), and resolution (Huang YP et al., 2019). However, a large width, depth, or resolution alone will not necessarily deliver better network performance; instead, a balance between them will achieve higher precision and efficiency (Tan MX and Le QV, 2019). In this study, a section of the waveform was extracted as an example to plot the output of each layer of the EQTransformer network (Figure 4). As the layers deepened, the output of each layer gradually became more abstract. The Layer_normalization layer is the deepest layer for extracting global features, and the decoding starts from this point until the final probability is output. Deep features are mostly high-level features that express frequencies or even regions. However, the waveforms differ from human faces or other objects, and as such, the reliability of the extracted high-level features cannot be evaluated by manual analysis. Nevertheless, blindly increasing the network depth will result in excessive calculations and the extraction of too
many invalid features. Furthermore, the extraction of high-level features will also require more precise input, i.e., high-quality datasets are required. PhaseNet is a five-layer Unet network containing 268,000 parameters, and EQTransformer has one encoder consisting of 17 layers and three decoders consisting of 8–10 layers each, containing 378,000 parameters. Thus, EQTransformer has a greater depth, width, and parameter count than PhaseNet. However, from the evaluation indicators, such as picking error and recall, it can be seen that PhaseNet exhibits a stronger generalization ability than EQTransformer in other geographical regions. Therefore, it is not entirely reliable to judge the generalization ability of a model solely by considering the network architecture.

Figure 3. Earthquake events missed by PhaseNet and EQTransformer. The plot on the left shows the events detected by PhaseNet but missed by EQTransformer, while the plot on the right shows the events detected by EQTransformer but missed by PhaseNet. The blue and red lines correspond to the arrival times of the P- and S-waves, respectively. Horizontal axes: t (s), from 0 to 12.

Data quality is also a critical factor affecting model performance. EQTransformer was trained on the STEAD dataset, which contains 850,000 seismic records, and PhaseNet was trained with 600,000 waveforms recorded by NCEDC. In terms of quantity, EQTransformer used a greater amount of data. In terms of the spatial distribution of the training sets, the STEAD dataset is global, while the NCEDC dataset is regional. In theory, training the model with the STEAD dataset should deliver better generalization ability. To analyze the impact of dataset size on the training results, the P-wave and S-wave travel times were extracted from the STEAD and NCEDC datasets, and the corresponding travel time curves are plotted in Figure 5. It can be seen that both the STEAD and NCEDC datasets are highly concentrated near the fitted straight line in these plots, and some data are evenly distributed in areas far
Figure 4. Input and output features of different layers of EQTransformer. (a) Input layer; (b) features extracted by Conv1d_1; (c) features extracted by Conv1d_5; (d) features extracted by Conv1d_20; (e) global features extracted by the deepest layer; and (f) outputs of P-wave, S-wave, and detection probability. Horizontal axes: sample index; legend in (f): picking S-phase, picking P-phase, and detection.
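The probability traces in panel (f) only become picks once a probability threshold is applied, which is why the threshold trades recall against precision. The following is a minimal sketch of that step and of the scoring in Eq. (1); it is not the post-processing code of PhaseNet or EQTransformer. It assumes that a pick is the highest sample of each run above the threshold and that automatic picks are matched greedily to manual picks within a 0.5 s tolerance. The function names, the `min_separation_s` parameter, and the matching rule are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def picks_from_probability(prob: Sequence[float], threshold: float,
                           sampling_rate: float = 100.0,
                           min_separation_s: float = 1.0) -> List[float]:
    """Return pick times (s): the peak of each run of samples >= threshold.

    Assumption: one pick per contiguous run; peaks closer than
    min_separation_s to the previous pick are discarded.
    """
    picks: List[float] = []
    i, n = 0, len(prob)
    while i < n:
        if prob[i] >= threshold:
            j = i
            while j < n and prob[j] >= threshold:
                j += 1
            peak = max(range(i, j), key=lambda k: prob[k])
            t = peak / sampling_rate
            if not picks or t - picks[-1] >= min_separation_s:
                picks.append(t)
            i = j
        else:
            i += 1
    return picks


def score_picks(auto: Sequence[float], manual: Sequence[float],
                tolerance_s: float = 0.5) -> Tuple[float, float, float]:
    """Precision, recall, and F1-Score as in Eq. (1).

    Assumption: greedy one-to-one matching; an automatic pick is a true
    positive if the nearest unused manual pick is within tolerance_s.
    """
    unmatched = sorted(manual)
    tp = 0
    for t in sorted(auto):
        best = min(unmatched, key=lambda m: abs(m - t), default=None)
        if best is not None and abs(best - t) < tolerance_s:
            tp += 1
            unmatched.remove(best)
    fp = len(auto) - tp          # automatic picks with no manual counterpart
    fn = len(unmatched)          # manual picks that were never matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Lowering the threshold in `picks_from_probability` can only add picks, so recall cannot decrease while precision may fall, which mirrors the behaviour reported for the two models under different thresholds.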
Figure 5. Travel time curves for (a) STEAD dataset, (b) NCEDC seismic data from 2018 to 2020, (c) Yangbi earthquake aftershock sequences, and (d) Maduo earthquake aftershock sequences. The numbers in the upper left corners represent the standard deviations of the travel times of the P- and S-waves in seconds. The color bars represent the logarithmic values of the normalized phase proportions for the P- and S-phases. Horizontal axes: hypocenter distance (km), from 0 to 140; vertical axes: time (s), from 0 to 20.
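The dispersion of a travel time curve can be quantified in a few lines. The sketch below fits a straight line to travel time against hypocenter distance by ordinary least squares and reports the standard deviation of the residuals. It assumes that the dispersion is measured about the fitted line, which the caption does not state explicitly; the function name and the population (divide-by-n) standard deviation are likewise illustrative choices.

```python
from math import sqrt
from typing import Sequence, Tuple


def fit_travel_time(distance_km: Sequence[float],
                    travel_time_s: Sequence[float]) -> Tuple[float, float, float]:
    """Fit t = intercept + slope * d and return (slope, intercept, residual_std).

    slope is in s/km (its inverse is an apparent velocity in km/s);
    residual_std is the population standard deviation of the misfit in s.
    """
    n = len(distance_km)
    if n < 2 or n != len(travel_time_s):
        raise ValueError("need at least two matching distance/time pairs")
    mean_d = sum(distance_km) / n
    mean_t = sum(travel_time_s) / n
    sxx = sum((d - mean_d) ** 2 for d in distance_km)
    if sxx == 0.0:
        raise ValueError("all distances are identical; slope is undefined")
    sxy = sum((d - mean_d) * (t - mean_t)
              for d, t in zip(distance_km, travel_time_s))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_d
    residuals = [t - (intercept + slope * d)
                 for d, t in zip(distance_km, travel_time_s)]
    residual_std = sqrt(sum(r * r for r in residuals) / n)
    return slope, intercept, residual_std
```

Applied separately to the P- and S-wave picks of each dataset, a larger `residual_std` would indicate more scattered arrival time labels, which is the kind of comparison made between the training datasets and the Yangbi and Maduo catalogs.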
similar. Thus, it can be concluded that using the STEAD training set with a greater amount of data did not improve the generalization ability of the model. This may be due to overfitting during the training process of EQTransformer and/or inadequacy of its network architecture. It can be seen from Figure 5 that the Yangbi and Maduo earthquake sequences performed better than the foreign data in terms of travel time variance and standard deviation. The quality of the arrival time data is closely related to the experience of the analysts as well as factors such as the complexity of the velocity structure and distribution of focal depths. With high-quality Chinese seismic data as the training set, a better seismic detection model than PhaseNet and EQTransformer may be developed.

Most studies currently regard the Yangbi earthquake as a typical foreshock-mainshock-aftershock sequence owing [...] research has yet been conducted on the foreshock activities of the Maduo earthquake due to the sparse distribution of stations. In this study, the PhaseNet method, which exhibited a higher recall, was selected to process the continuous waveforms from the stations nearest to the Yangbi and Maduo earthquakes. The results (Figure 6) show that the frequency of earthquakes recorded at the CHT station increased significantly three days before the Yangbi earthquake, whereas that recorded at the MAD station did not increase significantly in the three weeks before the mainshock, suggesting that the Maduo earthquake may not have been preceded by foreshocks.

Figure 6. Earthquake detection results at the CHT station in the Yangbi aftershock area and at the MAD station in the Maduo aftershock area. The abscissa is the date, the ordinate is the number of P-waves detected, and the stars and triangle indicate the occurrence times of the mainshock and foreshock, respectively. Horizontal axes: 2021-05-01 to 2021-05-25; vertical axes: counts (×10^3).

6. Conclusions

In this study, the PhaseNet and EQTransformer methods were employed to process the continuous waveforms recorded at stations near the Yangbi and Maduo earthquakes. The obtained results were compared with those from manual processing to analyze their respective pros and cons. The results show that when the two methods were applied to the detection of the Yangbi and Maduo earthquake sequences, their precision, recall, and F1-Score were all significantly lower than those reported in the original papers. When PhaseNet and EQTransformer are applied to detect earthquakes in other geographic regions, retraining or transfer learning may help enhance their detection effects in the new region and improve the completeness of the earthquake catalog.

PhaseNet provides better recall and lower arrival time picking error than EQTransformer. PhaseNet has a low missed detection rate, but the number of detected earthquakes is much larger than the actual number, possibly due to the absence of noise datasets during its training process. EQTransformer offers a higher picking precision, and fewer false phases are detected; however, more phases will be missed.

EQTransformer has a greater depth, width, and parameter count than PhaseNet, and its training used a global seismic dataset with a larger amount of data. However, its generalization ability is weaker than that of PhaseNet, suggesting that increasing the training set size and network depth will not necessarily deliver better generalization ability and recall. Statistical analysis of the travel time data showed that the STEAD data dispersion is much greater than that of the Yangbi earthquake data, which may have resulted from the different data processing methods employed by different stations in STEAD. As a result, EQTransformer may have extracted incorrect features, which would affect the performance of the network. In developing an earthquake detection model using deep learning, the quality of the training data is more important than its quantity. It would be difficult to obtain a model
with good generalization ability solely by increasing the size of the training dataset. Besides the high precision and recall of the model, its generalization ability is also an important evaluation indicator. To assess the performance of the model objectively, it must be tested on datasets from different geographical regions.

This study used PhaseNet to detect the continuous waveforms 21 days before the Yangbi and Maduo earthquakes. The Yangbi earthquake was found to exhibit a foreshock sequence, while the Maduo earthquake showed no foreshock activity three weeks before the occurrence of the event, indicating that the nucleation processes of the two earthquakes were different. With the development of intelligent seismic data processing technology and the real-time processing of observational data, earthquake prediction and forecasting could be supported technically through analyzing the changes in seismic frequency, b-values, and other seismic data.

Acknowledgments

This study was jointly funded by the National Key R&D Program of China (No. 2021YFC3000702), the National Natural Science Foundation of China (No. 41774067), and the Fundamental Research Funds for the Institute of Geophysics, China Earthquake Administration (Nos. DQJB21Z05, DQJB20X07). The authors would like to extend their sincere gratitude to the Yunnan and Qinghai Provincial Seismological Bureaus for providing the seismic waveform data. In addition, the authors wish to express their gratitude to Shirong Liao and Hongcai Zhang, senior engineers from the Fujian Earthquake Agency, for the discussions they contributed during the writing process. The EQTransformer earthquake detection model can be downloaded at https://github.com/smou-

Geophys Res Lett 47(6): e2020GL088651.
Dai HC, and MacBeth C (1995). Automatic picking of seismic arrivals in local earthquake data using an artificial neural network. Geophys J Int 120(3): 758–774.
Fang LH, Wu ZL, and Song K (2017). SeismOlympics. Seismol Res Lett 88(6): 1429–1430.
Gentili S, and Michelini A (2006). Automatic picking of P and S phases using a neural tree. J Seismol 10(1): 39–63.
Gibbons SJ, and Ringdal F (2006). The detection of low magnitude seismic events using array-based waveform correlation. Geophys J Int 165(1): 149–166.
He KM, Zhang XY, Ren SQ, and Sun J (2016). Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas.
Huang YP, Cheng YL, Bapna A, Firat O, Chen MX, Chen D, Lee HJ, Ngiam J, Le QV, Wu YH, and Chen ZF (2019). GPipe: efficient training of giant neural networks using pipeline parallelism. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. NeurIPS, Vancouver.
LeCun Y, Bengio Y, and Hinton G (2015). Deep learning. Nature 521(7553): 436–444.
Lei XL, Su JR, and Wang ZW (2020). Growing seismicity in the Sichuan Basin and its association with industrial activities. Sci China Earth Sci 63(11): 1633–1660.
Li ZF, Meier MA, Hauksson E, Zhan ZW, and Andrews J (2018). Machine learning seismic wave discrimination: application to earthquake early warning. Geophys Res Lett 45(10): 4773–4779.
Liu M, Zhang M, Zhu WQ, Ellsworth WL, and Li HY (2020). Rapid characterization of the July 2019 Ridgecrest, California, earthquake sequence from raw seismic data using machine-learning phase picker. Geophys Res Lett 47(4): e2019GL086189.
Mousavi SM, Sheng YX, Zhu WQ, and Beroza GC (2019). STanford EArthquake Dataset (STEAD): a global data set of seismic signals for AI. IEEE Access 7: 179464–179476.
Mousavi SM, Ellsworth WL, Zhu WQ, Chuang LY, and Beroza
savi05/EQTransformer. The PhaseNet earthquake detec-
GC (2020). Earthquake transformer—an attentive deep-
tion model can be downloaded at https://github.com/way- learning model for simultaneous earthquake detection and
neweiqiang/PhaseNet. The STEAD training dataset can be phase picking. Nat Commun 11(1): 3 952.
downloaded at https://github.com/smousavi05/STEAD. Peng ZG, and Zhao P (2009). Migration of early aftershocks
following the 2004 Parkfield earthquake. Nat Geosci 2(12):
References 877–881.
Ross ZE, Meier MA, Hauksson E, and Heaton TH (2018).
Allen RV (1978). Automatic earthquake recognition and timing Generalized seismic phase detection with deep learning. Bull
from single traces. Bull Seismol Soc Am 68(5): 1 521–1 532. Seismol Soc Am 108(5A): 2 894–2 901.
Baer M, and Kradolfer U (1987). An automatic phase picker for Shelly DR, Beroza GC, and Ide S (2007). Non-volcanic tremor
local and teleseismic events. Bull Seismol Soc Am 77(4): and low-frequency earthquake swarms. Nature 446(7133):
1437–1 445. 305–307.
Chai CP, Maceira M, Santos-Villalobos HJ, Venkatakrishnan SV, Su JB, Liu M, Zhang YP, Wang WT, Li HY, Yang J, Li XB, and
Schoenball M, Zhu WQ, Beroza GC, Thurber C, and EGS Zhang M (2021). High resolution earthquake catalog building
Collab Team (2020). Using a deep neural network and for the 21 May 2021 Yangbi, Yunnan, MS6.4 earthquake
transfer learning to bridge scales for seismic phase picking. sequence using deep-learning phase picker. Chin J Geophys
Jiang C et al. doi: 10.29382/eqs-2021-0038 435
64(8): 2647–2656 (in Chinese with English abstract). seismic array. J Geophys Res 126(5): e2020JB021444.
Tan MX, and Le QV (2019). EfficientNet: rethinking model Yang W, Chen GY, Meng LY, Zang Y, Zhang HJ, and Li JL
scaling for convolutional neural networks. In: Proceedings of (2021). Determination of the local magnitudes of small
the 36th International Conference on Machine Learning. earthquakes using a dense seismic array in the Changning-
PMLR, Long Beach. Zhaotong Shale Gas Field, Southern Sichuan Basin. Earth
Wang J, and Teng TL (1997). Identification and picking of S Planet Phys 5(6): 532–546.
phase using an artificial neural network. Bull Seismol Soc Yoon CE, O’Reilly O, Bergen KJ, and Beroza GC (2015).
Am 87(5): 1 140–1 149. Earthquake detection through computationally efficient
Wang J, Xiao ZW, Liu C, Zhao DP, and Yao ZX (2019). Deep similarity search. Sci Adv 1(11): e1501057.
learning for picking seismic arrival times. J Geophys Res Zagoruyko S, and Komodakis N (2016). Wide residual networks.
124(7): 6 612–6 624. In: Proceedings of British Machine Vision Conference 2016.
Wang WL, Fang LH, Wu JP, Tu HW, Chen LY, Lai GJ, and BMVA Press, York.
Zhang L (2021). Aftershock sequence relocation of the 2021 Zhang X, Zhang M, and Tian X (2020). Real-time earthquake
MS7.4 Maduo Earthquake, Qinghai, China. Sci China Earth early warning with deep learning: application to the 2016
Sci 64(8): 1 371–1 380. Central Apennines, Italy earthquake sequence. arXiv:
Wei YG, Yang QL, Wang TT, Jiang CS, and Bian YJ (2019). 2006.01332.
Earthquake and explosion identification based on deep Zhao M, Chen S, and Yuen D (2019). Waveform classification
learning residual network model. Acta Seismol Sin 41(5): and seismic recognition by convolution neural network. Chin
646–657. J Geophys 62(1): 374–382 (in Chinese with English abstract).
Withers M, Aster R, Young C, Beiriger J, Harris M, Moore S, Zhou YJ, Yue H, Kong QK, and Zhou SY (2019). Hybrid event
and Trujillo J (1998). A comparison of select trigger detection and phase-picking algorithm using convolutional
algorithms for automated global seismic phase and event and recurrent neural networks. Seismol Res Lett 90(3): 1 079–
detection. Bull Seismol Soc Am 88(1): 95–106. 1087.
Xiao ZW, Wang J, Liu C, Li J, Zhao L, and Yao ZX (2021). Zhu WQ, and Beroza GC (2019). PhaseNet: a deep-neural-
Siamese earthquake transformer: a pair-input deep-learning network-based seismic arrival-time picking method. Geophys
model for earthquake detection and phase picking on a J Int 216(1): 261–273.