A Machine Learning Tutorial for Operational Meteorology.
ABSTRACT: Recently, the use of machine learning in meteorology has increased greatly. While many machine learning
methods are not new, university classes on machine learning are largely unavailable to meteorology students and are not
required to become a meteorologist. The lack of formal instruction has contributed to the perception that machine learning
methods are "black boxes," and thus end users are hesitant to apply them in their everyday workflow. To reduce the
opaqueness of machine learning methods and lower hesitancy toward machine learning in meteorology, this paper provides
a survey of some of the most common machine learning methods. A familiar meteorological
example is used to contextualize the machine learning methods while also discussing machine learning topics using plain
language. The following machine learning methods are demonstrated: linear regression, logistic regression, decision trees,
random forest, gradient boosted decision trees, naïve Bayes, and support vector machines. Beyond discussing the different
methods, the paper also contains discussions on the general machine learning process as well as best practices to enable
readers to apply machine learning to their own datasets. Furthermore, all code (in the form of Jupyter notebooks and
Google Colaboratory notebooks) used to make the examples in the paper is provided in an effort to catalyze the use of
machine learning in meteorology.
KEYWORDS: Radars/Radar observations; Satellite observations; Forecasting techniques; Nowcasting;
Operational forecasting; Artificial intelligence; Classification; Data science; Decision trees; Machine learning;
Model interpretation and visualization; Regression; Support vector machines; Other artificial intelligence/machine learning
DOI: 10.1175/WAF-D-22-0070.1
© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright
Policy (www.ametsoc.org/PUBSReuseLicenses).
FIG. 1. Search results for the Meteorology and Atmospheric Science category when searching abstracts for machine learning methods
and severe weather. Machine learning keywords searched were the following: linear regression, logistic regression, decision trees, random
forest, gradient-boosted trees, support vector machines, k-means, k-nearest, empirical orthogonal functions, principal component analysis,
self-organizing maps, neural networks, convolutional neural networks, and unets. Severe weather keywords searched were the following:
tornadoes, hail, hurricanes, and tropical cyclones. (a) Counts of publications per year for all papers in the Meteorology and Atmospheric
Science category (black line; reduced by one order of magnitude), machine learning topics (blue line), and severe weather topics (red
line). (b) As in (a), but with the two subtopics normalized by the total number of Meteorology and Atmospheric Science papers. (c) Num-
ber of neural network papers (including convolutional and unets) published in Meteorology and Atmospheric sciences. All data are de-
rived from Clarivate Web of Science.
define common ML terms. Section 3 discusses the general ML methods in context of a simple meteorological example, while also describing the end-to-end ML pipeline. Then, section 4 summarizes this paper and also discusses the topics of the next paper in the series.

2. Machine learning methods and common terms

This section will describe a handful of the most common ML methods. Before that, it is helpful to define some terminology used within ML. First, we define ML as any empirical¹ method where parameters are fit (i.e., learned) on a training dataset in order to optimize (e.g., minimize or maximize) a predefined loss (i.e., cost) function. Within this general framework, ML has two categories: supervised and unsupervised learning. Supervised learning refers to ML methods that are trained with prescribed input features and output labels, for example, predicting tomorrow's high temperature at a specific location where we have measurements (i.e., labels). Meanwhile, unsupervised methods do not have a predefined output label (e.g., self-organizing maps; Nowotarski and Jensen 2013). An example of an unsupervised ML task would be clustering all 500 mb geopotential height maps to look for unspecified patterns in the weather. This paper focuses on supervised learning.

The input features for supervised learning, also referred to as input data, predictors, or variables, can be written mathematically as the vector (matrix) X. The desired output of the ML model is usually called the target, predictand, or label, and is mathematically written as the scalar (vector) y. Drawing on the meteorological example of predicting tomorrow's high temperature, the input feature would be tomorrow's forecasted temperature from a numerical weather model (e.g., GFS) and the label would be tomorrow's observed temperature.

Supervised ML methods can be further broken into two subcategories: regression and classification. Regression tasks are ML methods that output a continuous range of values, like the forecast of tomorrow's high temperature (e.g., 75.0°F). Meanwhile, classification tasks are characteristic of ML methods that classify data (e.g., will it rain or snow tomorrow). Reposing tomorrow's high temperature forecast as a classification task would be: "Will tomorrow be warmer than today?" This paper will cover both regression and classification methods. In fact, many ML methods can be used for both tasks.

All ML methods described here will have one thing in common: the ML method quantitatively uses the training data to optimize a set of weights (i.e., thresholds) that enable the prediction. These weights are determined either by minimizing the error of the ML prediction or maximizing the probability of a class label; the two approaches coincide with regression and classification, respectively. Alternative names for error that readers might encounter in the literature are loss or cost.

Now that some of the common ML terms have been discussed, the following subsections will describe the ML methods. They start with the simplest methods (e.g., linear regression) and move to more complex methods (e.g., support vector machines) as the sections proceed. Please note that the following subsections aim to provide an introduction and the intuition behind each method. An example of the methods being applied and helpful application discussion can be found in section 3.

¹ By "empirical" we mean any method that uses data as opposed to physics.

a. Linear regression

An important concept in ML is that, when choosing to use ML for a task, one should start with the simpler ML models first.
$\hat{y} = \sum_{i=0}^{D} w_i x_i$.   (1)
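To ground Eq. (1), the short sketch below fits the weights $w_i$ of a linear regression with scikit-learn (which is cited in the references). The single-feature forecast/observation setup and the synthetic data are illustrative assumptions only, not data from the paper's notebooks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical example: predict tomorrow's observed high temperature (label, y)
# from a numerical weather model forecast of that temperature (feature, X).
rng = np.random.default_rng(seed=0)
X = rng.uniform(50.0, 90.0, size=(200, 1))            # forecast temperature (degF)
y = 0.9 * X[:, 0] + 5.0 + rng.normal(0.0, 2.0, 200)   # synthetic "observed" temperature

model = LinearRegression()   # fits the weights in Eq. (1) by least squares
model.fit(X, y)

print("weight (w_1):", model.coef_[0])
print("bias (w_0):  ", model.intercept_)
print("prediction for a 75 degF forecast:", model.predict([[75.0]])[0])
```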
c. Naïve Bayes
An additional method to do classification is known as naïve
Bayes (Kuncheva 2006), which is named for its use of Bayes’s
theorem and can be written as the following:
$P(y\,|\,x) = \dfrac{P(y)\,P(x\,|\,y)}{P(x)}$.   (8)
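As a minimal sketch of Eq. (8) in code, scikit-learn's GaussianNB estimates $P(y)$ and $P(x|y)$ from training data (assuming each feature is conditionally Gaussian given the class) and returns $P(y|x)$ for new examples. The two-class rain/no-rain framing and the synthetic data below are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic two-feature, two-class example (e.g., rain vs. no rain).
rng = np.random.default_rng(1)
X_rain = rng.normal(loc=[5.0, 2.0], scale=1.0, size=(100, 2))
X_dry = rng.normal(loc=[1.0, -1.0], scale=1.0, size=(100, 2))
X = np.vstack([X_rain, X_dry])
y = np.array([1] * 100 + [0] * 100)

nb = GaussianNB()                      # estimates P(y) and P(x|y) from the training data
nb.fit(X, y)
print(nb.predict_proba([[4.0, 1.0]]))  # P(y|x) for one new example, via Bayes's theorem
```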
$\mathrm{Gini} = \sum_{i=0}^{k} p_i (1 - p_i)$,   (11)
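The Gini impurity of Eq. (11) can be evaluated directly from the class proportions at a candidate tree node; the small sketch below (with made-up counts) shows the arithmetic.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of Eq. (11): sum over classes of p_i * (1 - p_i)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

# Hypothetical node: 8 thunderstorm images and 2 no-thunderstorm images.
print(gini_impurity([1] * 8 + [0] * 2))   # 0.32; a perfectly pure node would give 0.0
```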
$\hat{y} = \mathbf{w}^{\mathrm{T}} \mathbf{x} + b$,   (13)
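For a linear support vector machine, the learned weight vector $\mathbf{w}$ and bias $b$ of Eq. (13) are exposed by scikit-learn after fitting. The sketch below uses synthetic two-dimensional data in the spirit of Fig. 6a; it is an illustration, not the paper's own code.

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic classes in a two-dimensional feature space (cf. Fig. 6a).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

w, b = svm.coef_[0], svm.intercept_[0]   # w^T x + b = 0 is the decision boundary
print("w:", w, "b:", b)
print("decision value for one point:", svm.decision_function([[0.5, 0.5]])[0])
```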
FIG. 6. Support vector machine classification examples. (a) Ideal (synthetic) data where the x and y axis are both in-
put features, while the color designates what class each point belongs to. The decision boundary learned by the sup-
port vector machine is the solid black line, while the margin is shown by the dashed lines. (b) A real world example
using NAM 1800 UTC forecasts of U and V wind and tipping-bucket measurements of precipitation. Blue plus
markers are raining instances, and the red minus signs are non-raining instances. Black lines are the decision boundary
and margins.
from the Storm Event Imagery dataset (SEVIR; Veillette et al. 2020), which contains over 10 000 storm events from between 2017 and 2019. Each event spans four hours and includes measurements from both GOES-16 and NEXRAD. An example storm event and the five measured variables are found in Fig. 7: red channel visible reflectance (0.64 μm; channel 2), midtropospheric water vapor brightness temperature (6.9 μm; channel 9), clean infrared window brightness temperature (10.7 μm; channel 13), vertically integrated liquid (VIL; from NEXRAD), and Geostationary Lightning Mapper (GLM) measured lightning flashes. In addition to discussing ML in context of the SEVIR dataset, this section will follow the general steps to using ML and contain helpful discussions of the best practices as well as the most common pitfalls.

a. Problem statements

The SEVIR data will be applied to two tasks: 1) Does this image contain a thunderstorm? and 2) How many lightning flashes are in this image? To be explicit, we assume the GLM observations are unavailable, and we need to use the other measurements (e.g., infrared brightness temperature) as features to estimate if there are lightning flashes (i.e., classification) and how many of them there are (i.e., regression). While both of these tasks might be considered redundant since we have GLM, the goal of this paper is to provide discussion on how to use ML as well as discussion on the ML methods themselves. That being said, a potentially useful application of the trained models herein would be to use them on satellite sensors that do not have lightning measurements. For example, all generations of GOES prior to GOES-16 did not have a lightning sensor collocated with the main sensor. Thus, we could potentially use the ML models trained here to estimate GLM measurements prior to GOES-16 (i.e., November 2016).

b. Data

The first step of any ML project is to obtain data. Here, the data are from a public archive hosted on Amazon Web Services. For information on how to obtain the SEVIR data as well as the code associated with this manuscript, see the data availability statement. One major question at this juncture is as follows: "How much data are needed to do machine learning?" While there does not exist a generic number that can apply to all datasets, the idea is to obtain enough data such that one's training data are diverse. A diverse dataset is desired because any bias found within the training data would be encoded in the ML method (McGovern et al. 2021). For example, if an ML model was trained on only images where thunderstorms were present, then the ML model would likely not know what a non-lightning-producing storm would look like and would be biased. Diversity in the SEVIR dataset is created by including random images (i.e., no storms) from all around the United States (cf. Fig. 2 in Veillette et al. 2020).

After obtaining the data, it is vital to remove as much spurious data as possible before training because the ML model will not know how to differentiate between spurious data and high-quality data. A common anecdote when using ML models is garbage in, garbage out. The SEVIR dataset has already gone through rigorous quality control, but this is often not the case with raw meteorological datasets. Two examples of quality issues that would likely be found in satellite and radar datasets are satellite artifacts (e.g., GOES-17 heat pipe; McCorkel et al. 2019) and radar ground clutter (e.g., Hubbert et al. 2009). Cleaning and manipulating the dataset to get it ready for ML often takes a researcher 50%–80% of their time.³ Thus, do not be discouraged if cleaning one's datasets is taking a large amount of time, because a high-quality dataset will be best for having a successful ML model.

³ https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html.

Subsequent to cleaning the data, the next step is to engineer the inputs (i.e., features) and outputs (i.e., labels). One avenue to create features is to use every single pixel in the image as a predictor. While this could work, given the number of pixels in the SEVIR images (589 824 total pixels for one visible image) it is computationally impractical to train an ML model with all pixels. Thus, we are looking for a set of statistics that can be extracted from each image. For the generation of features, domain knowledge is critical because choosing meteorologically relevant quantities will ultimately determine the ML model's skill. For the ML tasks presented in section 3a, information about the storm characteristics (e.g., strength) in the image would be beneficial features. For example, a more intense storm is often associated with more lightning. Proxies for estimating storm strength would be the magnitude of reflectance in the visible channel; how cold brightness temperatures in the water vapor and clean infrared channels are; and how much vertically integrated water there is. Thus, to characterize these statistics, we extract the following percentiles from each image and variable: 0, 1, 10, 25, 50, 75, 90, 99, and 100.

To create the labels, the number of lightning flashes in the image are summed. For Problem Statement 1, an image is classified as containing a thunderstorm if the image has at least one flash in the last five minutes. For Problem Statement 2, the sum of all lightning flashes in the past five minutes within the image is used for the regression target.

Now that the data have been quality controlled and our features and labels have been extracted, the next step is to split the dataset into three independent subcategories named the training, validation, and testing sets. The reason for these three subcategories is the relative ease with which ML methods can "memorize" the training data. This occurs because ML models can contain numerous (e.g., hundreds, thousands, or even millions of) learnable parameters; thus, the ML model can learn to perform well on the training data but not generalize to other non-training data, which is called over-fitting. To assess how over-fit an ML model is, it is important to evaluate a trained ML model on data outside of its training data (i.e., the validation and testing sets).

The training dataset is the largest subset of the total amount of data. The reason the training set is the largest is because the aforementioned desired outcome of most ML models is to generalize on a wide variety of examples. Typically, the amount of training data is between 70% and 85% of the total amount of data available. The validation dataset, regularly 5%–15% of the total dataset, is a subset of data used to assess if an ML model is over-fit and is also used for evaluating the best model configurations (e.g., the depth of a decision tree). These model configurations are also known as hyper-parameters. Machine learning models have numerous configurations and permutations that can be varied and could impact the skill of any
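To make the feature-engineering and data-splitting steps concrete, the sketch below extracts the percentile features from a stack of hypothetical images and then carves out training, validation, and testing sets with scikit-learn. The array shapes, variable names, and 70/15/15 proportions are illustrative assumptions, not the exact SEVIR processing code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stack of images: (n_samples, ny, nx) for one variable (e.g., channel 13).
n_samples = 1000
images = np.random.rand(n_samples, 192, 192)
labels = (np.random.rand(n_samples) > 0.5).astype(int)   # 1 = thunderstorm, 0 = none

# Engineer features: the percentiles of each image (one row of features per image).
percentiles = [0, 1, 10, 25, 50, 75, 90, 99, 100]
X = np.percentile(images, percentiles, axis=(1, 2)).T     # shape (n_samples, 9)

# Split into ~70% training, ~15% validation, ~15% testing (held out until the very end).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, labels, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=42)
print(X_train.shape, X_val.shape, X_test.shape)
```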
FIG. 9. Performance metrics from the simple classification (only using Tb). (a) Receiver operating characteristic
(ROC) curves for each ML model (except support vector machines), logistic regression (LgR; blue), naïve Bayes
(NB; red), decision tree (DT; green), random forest (RF; yellow), and gradient boosted trees (GBT; light green). The
area under the ROC curve is reported in the legend. (b) Performance diagram for all ML models [same colors as (a)].
Color fill is the corresponding CSI value for each success ratio–probability of detection (SR–POD) pair. Dashed con-
tours are the frequency bias.
FIG. 10. As in Fig. 9, but now trained with all available predictors. The annotations from Fig. 9 have been removed.
system or a slightly underforecasting system. For the thunderstorm/no-thunderstorm task, there are not many implications for overforecasting or underforecasting. However, developers of a tornado prediction model may prefer a system that produces more false positives (overforecasting; storm warned, no tornado) than false negatives (underforecasting; storm not warned, tornado), as missed events could have significant impact to life and property. It should be clear that without going beyond a single metric, this differentiation between the ML methods would not be possible.

While the previous example was simple by design, we as humans could have used a simple threshold at the intersection of the two histograms in Fig. 8 to achieve similar accuracy (e.g., 81%; not shown). The next logical step with the classification task would be to use all available features. One important thing to mention at this step is that it is good practice to normalize input features. Some of the ML methods (e.g., random forest) can handle inputs of different magnitudes (e.g., CAPE is on the order of hundreds to thousands, but lifted index is on the order of one to tens), but others (e.g., logistic regression) will be unintentionally biased toward larger-magnitude features if you do not scale your input features. Common scaling methods include min–max scaling and scaling your input features to have a mean of 0 and a standard deviation of 1 (i.e., standard anomaly), which are defined mathematically as

$\mathrm{minmax} = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$, and   (19)

$\text{standard anomaly} = \dfrac{x - \mu}{\sigma}$,   (20)

respectively. In Eq. (19), $x_{\min}$ is the minimum value within the training dataset for some input feature x, while $x_{\max}$ is the maximum value in the training dataset. In Eq. (20), μ is the mean of feature x in the training dataset and σ is the standard deviation. For this paper, the standard anomaly is used.
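As a sketch of Eqs. (19) and (20) in practice, scikit-learn's MinMaxScaler and StandardScaler implement the two scalings; the key point is that the scaling statistics must come from the training set only and then be reused on the validation and testing sets. The arrays below are placeholders.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.random.rand(700, 9) * 100.0   # placeholder training features
X_val = np.random.rand(150, 9) * 100.0     # placeholder validation features

# Eq. (19): min-max scaling; Eq. (20): standard anomaly (mean 0, standard deviation 1).
minmax = MinMaxScaler().fit(X_train)        # learns x_min and x_max from the training data
standard = StandardScaler().fit(X_train)    # learns mu and sigma from the training data

X_train_scaled = standard.transform(X_train)
X_val_scaled = standard.transform(X_val)    # reuse the training statistics; never refit
```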
FIG. 11. Backward permutation importance test for the best performing classification ML models. Single pass results are in the top row,
while multi-pass forward results are for the bottom row. Each column corresponds to a different ML method: (a),(d) logistic regression;
(b),(e) random forest; and (c),(f) gradient boosted trees. Bars are colored by their source, yellow for the vertically integrated liquid (VIL),
red for the infrared (IR), blue for water vapor (WV), and black for visible (VIS). Number subscripts correspond to the percentile of that
variable. The dashed black line is the original AUC value when all features are not shuffled.
Using all available input features yields an accuracy of 90%, 84%, 86%, 91%, 90%, and 89% for logistic regression, naïve Bayes, decision tree, random forest, gradient boosted trees, and support vector machines, respectively. Beyond the relatively good accuracy, the ROC curves are shown in Fig. 10a. This time there are generally two sets of curves: one better performing group (logistic regression, random forest, gradient boosted trees, and support vector machines) with AUCs of 0.97, and a worse performing group (naïve Bayes and decision tree) with AUCs around 0.87. This separation coincides with the flexibility of the classification methods. The better performing group is better suited to deal with many features and nonlinear interactions of the features, while the worse performing group is a bit more restricted in how it combines many features. Considering the performance diagram (Fig. 10b), the same grouping of high-AUC models have higher CSI scores (>0.8) and have little to no frequency bias. Meanwhile, the lower-AUC models have lower CSI (0.75) and NB has a slight overforecasting bias. Overall, the ML performance on classifying if an image has a thunderstorm is doing well with all predictors. While a good performing model is a desired outcome of ML, at this point we do not know how the ML is making its predictions. This is part of the "black box" issue of ML and does not lend itself to being consistent with the ML user's prior knowledge (see note in introduction on consistency; Murphy 1993).

To alleviate some of the opaqueness of the ML black box, one can interrogate the trained ML models by asking: "What input features are most important to the decision?" and "Are the patterns the ML models learned physical (e.g., do they follow meteorological expectation)?" The techniques named permutation importance (Breiman 2001; Lakshmanan et al. 2015) and accumulated local effects (ALE; Apley and Zhu 2020) are used to answer these two questions, respectively. Permutation importance is a method in which the relative importance of an input feature is quantified by considering the change in an evaluation metric (e.g., AUC) when that input variable is shuffled (i.e., randomized). The intuition is that the most important variables, when shuffled, will cause the largest change to the evaluation metric. There are two main flavors of permutation importance, named single-pass and multi-pass. Single-pass permutation importance goes through each input variable and shuffles them one by one, calculating the change in the evaluation metrics. Multi-pass permutation importance uses the result of the single-pass, but progressively permutes features. In other words, features are successively permuted in the order that they were determined as important (most important, then second most important, etc.) from the single pass but are now left shuffled. The specific name for the method we have been describing is the backward multi-pass permutation importance. The backward name comes from the direction of shuffling, starting with all variables unshuffled and shuffling more and more of them. There is the opposite direction, named forward multi-pass permutation importance, where the starting point is that all features are shuffled. Then each feature is unshuffled in order of its importance from the single-pass permutation importance. For visual learners, see the animations (for the backward direction; Figs. ES4 and ES5) in the supplement of McGovern et al. (2019). The reason for doing multi-pass permutation importance is that correlated features could result in falsely identifying unimportant variables using the single-pass permutation importance. The best analysis of the permutation test is to use both the single-pass and multi-pass tests in conjunction.

The top five most important features for the better performing models (i.e., logistic regression, random forest, and gradient boosted trees) as determined by permutation importance are shown in Fig. 11. For all ML methods, both the single- and multi-pass tests show that the maximum vertically integrated liquid is the most important feature, while the minimum brightness temperatures from the clean infrared and midtropospheric water vapor channels are found within the top five predictors (except the multi-pass test for logistic regression). In general, the way to interpret these is to take the consensus over all models of which features are important. At this point it is time to consider whether the most important predictors make meteorological sense.
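Single-pass permutation importance is available directly in scikit-learn. The sketch below fits a stand-in random forest on placeholder arrays and ranks features by the mean drop in AUC when each one is shuffled; the multi-pass (backward) variant described above would repeat this while leaving the most important features shuffled.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Placeholder data standing in for the SEVIR percentile features and labels.
X_train, y_train = np.random.rand(700, 9), np.random.randint(0, 2, 700)
X_val, y_val = np.random.rand(150, 9), np.random.randint(0, 2, 150)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Single-pass test: shuffle each feature in turn and record the change in AUC.
result = permutation_importance(model, X_val, y_val, scoring="roc_auc",
                                n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("features ranked by mean AUC drop:", ranking)
```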
FIG. 12. Accumulated local effects (ALE) for (a) the maximum vertically integrated liquid (VILmax), (b) the minimum brightness tem-
perature from infrared (IRmin), and (c) the minimum brightness temperature from the water vapor channel (WVmin). Lines correspond to
all the ML methods trained (except support vector machines) and colors match Fig. 9. Gray histograms in the background are the counts
of points in each bin.
Vertically integrated liquid has been shown to have a relationship to lightning (e.g., Watson et al. 1995) and is thus plausible to be the most important predictor. Similarly, the minimum brightness temperature at the water vapor and clean infrared channels also makes physical sense because lower temperatures are generally associated with taller storms. We could also reconcile the maximum infrared brightness temperature (Fig. 11a) as a proxy for the surface temperature, which correlates to buoyancy, but note that the relative change in AUC with this feature is quite small. Conversely, any important predictors that do not align with traditional meteorological knowledge may require further exploration to determine why the model is placing such weight on those variables. Does the predictor have some statistical correlation with the meteorological event that is unexplained by past literature, or are there nonphysical characteristics of the data that may be influencing the model during training? In the latter case, it is possible that your model might be getting the right answer for the wrong reasons.

Accumulated local effects quantify how small changes to input features are associated with changes in the output of the model. The goal behind ALE is to investigate the relationship between an input feature and the output. ALE is performed by binning the data based on the feature of interest. Then, for each example in each bin, the feature value is replaced by the edges of the bin. The mean difference in the model output from the replaced feature value is then used as the ALE for that bin. This process is repeated for all bins, which results in a curve. For example, the ALE for some of the top predictors of the permutation test is shown in Fig. 12. At this step, the ALEs can be mainly used to see if the ML models have learned physically plausible trends with input features. For the vertically integrated liquid, all models show that as the maximum vertically integrated liquid increases from about 2 to 30 kg m⁻² the average output probability of the model will increase, but values larger than 30 kg m⁻² generally all have the same local effect on the prediction (Fig. 12a). As for the minimum clean infrared brightness temperature, the magnitude of the average change is considerably different across the different models, but generally all have the same pattern. As the minimum temperature increases from −88° to −55°C, the mean output probability decreases; temperatures larger than −17°C have no change (Fig. 12b). Last, all models but the logistic regression show a similar pattern with the minimum water vapor brightness temperature, but notice the magnitude of the y axis (Fig. 12c); much less change occurs with this feature. For interested readers, additional interpretation techniques and examples can be found in Molnar (2022).
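The binning-and-differencing recipe described above can be written out in a few lines. This is a simplified, uncentered first-order ALE sketch for a single feature, assuming a fitted classifier with a predict_proba method; it is meant only to show the mechanics, not to reproduce the Apley and Zhu (2020) implementation.

```python
import numpy as np

def simple_ale(model, X, feature, n_bins=20):
    """Uncentered first-order ALE curve for one feature (column index) of X."""
    edges = np.quantile(X[:, feature], np.linspace(0.0, 1.0, n_bins + 1))
    effects = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (X[:, feature] >= lo) & (X[:, feature] <= hi)
        if not np.any(in_bin):
            effects.append(0.0)
            continue
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[:, feature] = lo          # replace the feature value with the bin edges
        X_hi[:, feature] = hi
        diff = model.predict_proba(X_hi)[:, 1] - model.predict_proba(X_lo)[:, 1]
        effects.append(diff.mean())    # mean change in model output within this bin
    return edges, np.cumsum(effects)   # accumulate the local effects into a curve
```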
2) REGRESSION

As stated in section 3a, task 2 is to predict the number of lightning flashes inside an image. Thus, the regression methods available for this task are linear regression, decision tree, random forest, gradient boosted trees, and support vector machines. Similar to task 1, a simple scenario is considered first, using Tb as the lone predictor. Figure 13 shows the general relationship between Tb and the number of flashes in the image. For Tb > −25°C, most images do not have any lightning, while Tb < −25°C shows a general increase of lightning flashes. Given there are a lot of images with zero flashes (approximately 50% of the total dataset; black points in Fig. 13), the linear methods will likely struggle to capture a skillful prediction. One way to improve performance would be to only predict the number of flashes on images where there are nonzero flashes. While this might not seem like a viable way forward, since non-lightning cases would be useful to predict, in practice we could leverage the very good performance of the classification model from section 3c(1) and then use the trained regression on images that are confident to have at least one flash in them. An example of this in the literature is Gagne et al. (2017), where hail size predictions were only made if the classification model said there was hail.

FIG. 13. The training data relationship between the minimum brightness temperature from infrared (Tb) and the number of flashes detected by GLM. All non-thunderstorm images (number of flashes equal to 0) are in black.

As before, all methods are fit on the training data initially using the default hyper-parameters. A common way to compare regression model performance is to create a one-to-one plot, which has the predicted number of flashes on the x axis and the true measured number of flashes on the y axis. A perfect model will show all points tightly centered along the diagonal of the plot. This is often the quickest qualitative assessment of how a regression model is performing. While Tb was well suited for the classification of thunderstorm/no-thunderstorm, it is clear that fitting a linear model to the data in Fig. 13 did not do well (Figs. 14a,e), leading to a strong overprediction of the number of lightning flashes in images with less than 100 flashes, while under-predicting the number of flashes for images with more than 100 flashes. The tree-based methods tend to do better, but there is still a large amount of scatter and an overestimation of storms with less than 100 flashes.

To tie quantitative metrics to the performance of each model, the following common metrics are calculated: mean bias, mean absolute error (MAE), root-mean-squared error (RMSE), and coefficient of determination (R²). Their mathematical representations are the following:

$\mathrm{bias} = \dfrac{1}{N}\sum_{j=1}^{N} (y_j - \hat{y}_j)$,   (21)

$\mathrm{MAE} = \dfrac{1}{N}\sum_{j=1}^{N} |y_j - \hat{y}_j|$,   (22)

$\mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{j=1}^{N} (y_j - \hat{y}_j)^2}$,   (23)

$R^2 = 1 - \dfrac{\sum_{j=1}^{N} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{N} (y_j - \bar{y})^2}$.   (24)

All of these metrics are shown in Fig. 15. In general, the metrics give a more quantitative perspective to the one-to-one plots. The poor performance of the linear methods is evident, with the two worst performances being the support vector machines and linear regression, with biases of 71 and 6 flashes, respectively. While no method provides remarkable performance, the random forest and gradient boosted trees perform better with this single-feature model (i.e., they show better metrics holistically).
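Equations (21)–(24) map directly onto a few lines of NumPy, or the equivalent scikit-learn metric functions; the true and predicted flash counts below are placeholders.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([0.0, 5.0, 120.0, 300.0])   # placeholder observed flash counts
y_pred = np.array([2.0, 30.0, 100.0, 250.0])  # placeholder model predictions

bias = np.mean(y_true - y_pred)                     # Eq. (21)
mae = mean_absolute_error(y_true, y_pred)           # Eq. (22)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # Eq. (23)
r2 = r2_score(y_true, y_pred)                       # Eq. (24)
print(bias, mae, rmse, r2)
```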
As before, the next logical step is to use all available features to predict the number of flashes; those results are found in Figs. 16 and 17. As expected, the model performance increases. Now all models show a general correspondence between the predicted number of flashes and the true number of flashes in the one-to-one plot (Fig. 16). Meanwhile, the scatter for random forest and gradient boosted trees has reduced considerably when comparing to the single-input models (Figs. 16c,d). While the bias of the models trained with all predictors is relatively similar, the other metrics are much improved, showing large reductions in MAE and RMSE and increases in R² (Fig. 17) for all methods except decision trees. This reinforces the fact that, similar to the classification example, it is always good to compare more than one metric.

Since the initial fitting of the ML models used the default parameters, there might be room for tuning the models to have better performance. Here we will show an example of some hyper-parameter tuning of a random forest. The common parameters that can be altered in a random forest include the following: the maximum depth of the trees (i.e., number of decisions in a tree) and the number of trees in the forest. The formal hyper-parameter search will use the full training dataset, and systematically vary the depth of the trees from 1 to 10 (in increments of 1) as well as the number of trees from 1 to 100 (1, 5, 10, 25, 50, 100). This results in 60 total models that are trained.
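A sketch of the hyper-parameter sweep described above: loop over the 60 depth/tree-count combinations, fit on the training set, and score on the validation set (here with MAE; the data arrays are placeholders). scikit-learn's GridSearchCV is a cross-validated alternative, but the explicit loop mirrors the fixed training/validation split used in this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Placeholder arrays standing in for the SEVIR features and flash counts.
X_train, y_train = np.random.rand(700, 36), np.random.rand(700) * 300.0
X_val, y_val = np.random.rand(150, 36), np.random.rand(150) * 300.0

results = {}
for depth in range(1, 11):                    # maximum depth of each tree: 1-10
    for n_trees in [1, 5, 10, 25, 50, 100]:   # number of trees in the forest
        rf = RandomForestRegressor(max_depth=depth, n_estimators=n_trees, random_state=0)
        rf.fit(X_train, y_train)
        results[(depth, n_trees)] = mean_absolute_error(y_val, rf.predict(X_val))

best = min(results, key=results.get)
print("best (depth, n_trees):", best, "validation MAE:", results[best])
```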
FIG. 14. The one-to-one relationship between the predicted number of lightning flashes from the ML models trained on only Tb (x axis; ŷ) and the number of measured flashes from GLM (y axis; y). Each marker is one observation. Meanwhile, areas with more than 100 points in close proximity are shown in the colored boxes. The lighter the shade of the color, the higher the density of points. (a) Linear regression (LnR; reds), (b) decision tree (DT; blues), (c) random forest (RF; oranges), (d) gradient boosted trees (GBT; purples), and (e) linear support vector machines (SVM; grays).
To evaluate which is the best configuration, the same metrics as before are shown in Fig. 18 as a function of the depth of the trees. The random forest quickly gains skill with added depth beyond one, with all metrics improving for both the training (dashed lines) and validation (solid lines) datasets. Beyond a depth of four, the bias, MAE, and RMSE all stagnate, but the R² value increases until a depth of eight, beyond which only the training-data R² continues to increase. There does not seem to be that large of an effect of increasing the number of trees beyond 10 (color change of lines). The characteristic of increasing training metric skill but no increase (or a decrease) in validation skill is the overfitting signal we discussed in section 3b. Thus, the best random forest model choice for predicting lightning flashes is one with a max depth of eight and a total of 10 trees. The reason we choose 10 trees is because, in general, a simpler model is less computationally expensive to use as well as more interpretable than a model with 1000 trees.
d. Testing
As mentioned before, the test dataset is the dataset you
hold out until the end when all hyper-parameter tuning has
finished so that there is no unintentional tuning of the final
model configuration to a dataset. Thus, now that we have eval-
uated the performance of all our models on the validation da-
taset it is time to run the same evaluations as in sections 3c(1)
and 3c(2). These test results are the end performance metrics
that should be interpreted as the expected ML performance
on new data (e.g., the ML applied in practice). For the ML
models here, the metrics are very similar to those from the validation set.
(For brevity the extra figures are included in the appendix
Figs. A1–A3.)
FIG. 16. As in Fig. 14, but now the x axis is provided from the ML models trained with all available input features.
common ML methods. All ML methods described here are considered supervised methods, meaning the data the models are trained from include pre-labeled truth data. The specific methods covered included linear regression, logistic regression, decision trees, random forests, gradient boosted decision trees, naïve Bayes, and support vector machines. The overarching goal of the paper was to introduce the ML methods in such a way that ML methods are more familiar to readers as they encounter them in the operational community and within the general meteorological literature. Moreover, this manuscript provided ample references of published meteorological examples as well as open-source code to act as catalysts for readers to adapt and try ML on their own datasets and in their workflows.

Additionally, this manuscript provided a tutorial example of how to apply ML to a couple of meteorological tasks using the Storm Event Imagery dataset (SEVIR; Veillette et al. 2020). We

1) Discussed the various steps of preparing data for ML (i.e., removing artifacts, engineering features, training/validation/testing splits; section 3b).

2) Conducted a classification task to predict if satellite images had lightning within them. This section included discussions of training, evaluation, and interrogation of the trained ML models [section 3c(1)].

3) Exhibited a regression task to predict the number of lightning flashes in a satellite image. This section also contained discussions of training/evaluation as well as an example of hyper-parameter tuning [section 3c(2)].

4) Released Python code to conduct all steps and examples in this manuscript (see data availability statement).

The follow-on paper in this series will discuss a more complex, yet potentially more powerful, grouping of ML methods: neural networks and deep learning. Like a lot of the ML methods described in this paper, neural networks are not necessarily
APPENDIX
REFERENCES
Meteor. Climatol., 60, 341–359, https://doi.org/10.1175/JAMC-D-20-0177.1.

Chisholm, D., J. Ball, K. Veigas, and P. Luty, 1968: The diagnosis of upper-level humidity. J. Appl. Meteor., 7, 613–619, https://doi.org/10.1175/1520-0450(1968)007<0613:TDOULH>2.0.CO;2.

Cintineo, J. L., M. Pavolonis, J. Sieglaff, and D. Lindsey, 2014: An empirical model for assessing the severe weather potential of developing convection. Wea. Forecasting, 29, 639–653, https://doi.org/10.1175/WAF-D-13-00113.1.

——, and Coauthors, 2018: The NOAA/CIMSS ProbSevere model: Incorporation of total lightning and validation. Wea. Forecasting, 33, 331–345, https://doi.org/10.1175/WAF-D-17-0099.1.

——, M. J. Pavolonis, J. M. Sieglaff, L. Cronce, and J. Brunner, 2020: NOAA ProbSevere v2.0: ProbHail, ProbWind, and ProbTor. Wea. Forecasting, 35, 1523–1543, https://doi.org/10.1175/WAF-D-19-0242.1.

Conrick, R., J. P. Zagrodnik, and C. F. Mass, 2020: Dual-polarization radar retrievals of coastal Pacific Northwest raindrop size distribution parameters using random forest regression. J. Atmos. Oceanic Technol., 37, 229–242, https://doi.org/10.1175/JTECH-D-19-0107.1.

Cui, W., X. Dong, B. Xi, and Z. Feng, 2021: Climatology of linear mesoscale convective system morphology in the United States based on the random-forests method. J. Climate, 34, 7257–7276, https://doi.org/10.1175/JCLI-D-20-0862.1.

Czernecki, B., M. Taszarek, M. Marosz, M. Półrolniczak, L. Kolendowicz, A. Wyszogrodzki, and J. Szturc, 2019: Application of machine learning to large hail prediction: The importance of radar reflectivity, lightning occurrence and convective parameters derived from ERA5. Atmos. Res., 227, 249–262, https://doi.org/10.1016/j.atmosres.2019.05.010.

Elmore, K. L., and H. Grams, 2016: Using mPING data to generate random forests for precipitation type forecasts. 14th Conf. on Artificial and Computational Intelligence and its Applications to the Environmental Sciences, New Orleans, LA, Amer. Meteor. Soc., 4.2, https://ams.confex.com/ams/96Annual/webprogram/Paper289684.html.

Flora, M. L., C. K. Potvin, P. S. Skinner, S. Handler, and A. McGovern, 2021: Using machine learning to generate storm-scale probabilistic guidance of severe weather hazards in the Warn-on-Forecast system. Mon. Wea. Rev., 149, 1535–1557, https://doi.org/10.1175/MWR-D-20-0194.1.

Friedman, J., 2001: Greedy function approximation: A gradient boosting machine. Ann. Stat., 29, 1189–1232, https://doi.org/10.1214/aos/1013203451.

Gagne, D., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, https://doi.org/10.1175/2008JTECHA1205.1.

——, ——, ——, and M. Xue, 2013: Severe hail prediction within a spatiotemporal relational data mining framework. 13th Int. Conf. on Data Mining, Dallas, TX, Institute of Electrical and Electronics Engineers, 994–1001, https://doi.org/10.1109/ICDMW.2013.121.

——, ——, S. Haupt, R. Sobash, J. Williams, and M. Xue, 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 1819–1840, https://doi.org/10.1175/WAF-D-17-0010.1.

——, H. Christensen, A. Subramanian, and A. Monahan, 2019: Machine learning for stochastic parameterization: Generative adversarial networks in the Lorenz '96 model. J. Adv. Model. Earth Syst., 12, e2019MS001896, https://doi.org/10.1029/2019MS001896.

Gensini, V. A., C. Converse, W. S. Ashley, and M. Taszarek, 2021: Machine learning classification of significant tornadoes and hail in the United States using ERA5 proximity soundings. Wea. Forecasting, 36, 2143–2160, https://doi.org/10.1175/WAF-D-21-0056.1.

Glahn, H. R., and D. A. Lowry, 1972: The use of Model Output Statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211, https://doi.org/10.1175/1520-0450(1972)011<1203:TUOMOS>2.0.CO;2.

Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. MIT Press, 800 pp., http://www.deeplearningbook.org.

Grams, H. M., P.-E. Kirstetter, and J. J. Gourley, 2016: Naïve Bayesian precipitation type retrieval from satellite using a cloud-top and ground-radar matched climatology. J. Hydrometeor., 17, 2649–2665, https://doi.org/10.1175/JHM-D-16-0058.1.

Harris, C. R., and Coauthors, 2020: Array programming with NumPy. Nature, 585, 357–362, https://doi.org/10.1038/s41586-020-2649-2.

Herman, G., and R. Schumacher, 2018a: Dendrology in numerical weather prediction: What random forests and logistic regression tell us about forecasting. Mon. Wea. Rev., 146, 1785–1812, https://doi.org/10.1175/MWR-D-17-0307.1.

——, and ——, 2018b: Money doesn't grow on trees, but forecasts do: Forecasting extreme precipitation with random forests. Mon. Wea. Rev., 146, 1571–1600, https://doi.org/10.1175/MWR-D-17-0250.1.

Hilburn, K. A., I. Ebert-Uphoff, and S. D. Miller, 2021: Development and interpretation of a neural-network-based synthetic radar reflectivity estimator using GOES-R satellite observations. J. Appl. Meteor. Climatol., 60, 3–21, https://doi.org/10.1175/JAMC-D-20-0084.1.

Hill, A. J., and R. S. Schumacher, 2021: Forecasting excessive rainfall with random forests and a deterministic convection-allowing model. Wea. Forecasting, 36, 1693–1711, https://doi.org/10.1175/WAF-D-21-0026.1.

——, G. R. Herman, and R. S. Schumacher, 2020: Forecasting severe weather with random forests. Mon. Wea. Rev., 148, 2135–2161, https://doi.org/10.1175/MWR-D-19-0344.1.

Hoerl, A. E., and R. W. Kennard, 1970: Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67, https://doi.org/10.1080/00401706.1970.10488634.

Holte, R. C., 1993: Very simple classification rules perform well on most commonly used datasets. Mach. Learn., 11, 63–90, https://doi.org/10.1023/A:1022631118932.

Hu, L., E. A. Ritchie, and J. S. Tyo, 2020: Short-term tropical cyclone intensity forecasting from satellite imagery based on the deviation angle variance technique. Wea. Forecasting, 35, 285–298, https://doi.org/10.1175/WAF-D-19-0102.1.

Hubbert, J. C., M. Dixon, S. M. Ellis, and G. Meymaris, 2009: Weather radar ground clutter. Part I: Identification, modeling, and simulation. J. Atmos. Oceanic Technol., 26, 1165–1180, https://doi.org/10.1175/2009JTECHA1159.1.

Jergensen, G. E., A. McGovern, R. Lagerquist, and T. Smith, 2020: Classifying convective storms using machine learning. Wea. Forecasting, 35, 537–559, https://doi.org/10.1175/WAF-D-19-0170.1.

Kalnay, E., 2002: Atmospheric Modeling, Data Assimilation and Predictability. Cambridge University Press, 341 pp., https://doi.org/10.1017/CBO9780511802270.

Key, J., J. Maslanik, and A. Schweiger, 1989: Classification of merged AVHRR and SMMR Arctic data with neural networks. Photogramm. Eng. Remote Sens., 55, 1331–1338.

Kluyver, T., and Coauthors, 2016: Jupyter Notebooks: A publishing format for reproducible computational workflows. Positioning and Power in Academic Publishing: Players, Agents and Agendas, F. Loizides and B. Schmidt, Eds., IOS Press, 87–90.

Kossin, J. P., and M. Sitkowski, 2009: An objective model for identifying secondary eyewall formation in hurricanes. Mon. Wea. Rev., 137, 876–892, https://doi.org/10.1175/2008MWR2701.1.

Kühnlein, M., T. Appelhans, B. Thies, and T. Nauß, 2014: Precipitation estimates from MSG SEVIRI daytime, nighttime, and twilight data with random forests. J. Appl. Meteor. Climatol., 53, 2457–2480, https://doi.org/10.1175/JAMC-D-14-0082.1.

Kuncheva, L. I., 2006: On the optimality of naïve Bayes with dependent binary features. Pattern Recognit. Lett., 27, 830–837, https://doi.org/10.1016/j.patrec.2005.12.001.

Kurdzo, J. M., E. F. Joback, P.-E. Kirstetter, and J. Y. N. Cho, 2020: Geospatial QPE accuracy dependence on weather radar network configurations. J. Appl. Meteor. Climatol., 59, 1773–1792, https://doi.org/10.1175/JAMC-D-19-0164.1.

Lackmann, G., Ed., 2011: Numerical weather prediction/data assimilation. Midlatitude Synoptic Meteorology: Dynamics, Analysis, and Forecasting, Amer. Meteor. Soc., 274–287.

Lagerquist, R., A. McGovern, and T. Smith, 2017: Machine learning for real-time prediction of damaging straight-line convective wind. Wea. Forecasting, 32, 2175–2193, https://doi.org/10.1175/WAF-D-17-0038.1.

——, ——, C. R. Homeyer, D. J. Gagne II, and T. Smith, 2020: Deep learning on three-dimensional multiscale data for next-hour tornado prediction. Mon. Wea. Rev., 148, 2837–2861, https://doi.org/10.1175/MWR-D-19-0372.1.

——, J. Q. Stewart, I. Ebert-Uphoff, and C. Kumler, 2021: Using deep learning to nowcast the spatial coverage of convection from Himawari-8 satellite data. Mon. Wea. Rev., 149, 3897–3921, https://doi.org/10.1175/MWR-D-21-0096.1.

Lakshmanan, V., C. Karstens, J. Krause, K. Elmore, A. Ryzhkov, and S. Berkseth, 2015: Which polarimetric variables are important for weather/no-weather discrimination? J. Atmos. Oceanic Technol., 32, 1209–1223, https://doi.org/10.1175/JTECH-D-13-00205.1.

Lee, C.-Y., S. J. Camargo, F. Vitart, A. H. Sobel, J. Camp, S. Wang, M. K. Tippett, and Q. Yang, 2020: Subseasonal predictions of tropical cyclone occurrence and ACE in the S2S dataset. Wea. Forecasting, 35, 921–938, https://doi.org/10.1175/WAF-D-19-0217.1.

Lee, J., R. Weger, S. Sengupta, and R. Welch, 1990: A neural network approach to cloud classification. IEEE Trans. Geosci. Remote Sens., 28, 846–855, https://doi.org/10.1109/36.58972.

Li, L., and Coauthors, 2020: A causal inference model based on random forests to identify the effect of soil moisture on precipitation. J. Hydrometeor., 21, 1115–1131, https://doi.org/10.1175/JHM-D-19-0209.1.

Loken, E. D., A. J. Clark, and C. D. Karstens, 2020: Generating probabilistic next-day severe weather forecasts from convection-allowing ensembles using random forests. Wea. Forecasting, 35, 1605–1631, https://doi.org/10.1175/WAF-D-19-0258.1.

——, ——, and A. McGovern, 2022: Comparing and interpreting differently designed random forests for next-day severe weather hazard prediction. Wea. Forecasting, 37, 871–899, https://doi.org/10.1175/WAF-D-21-0138.1.

Malone, T., 1955: Application of statistical methods in weather prediction. Proc. Natl. Acad. Sci. USA, 41, 806–815, https://doi.org/10.1073/pnas.41.11.806.

Mao, Y., and A. Sorteberg, 2020: Improving radar-based precipitation nowcasts with machine learning using an approach based on random forest. Wea. Forecasting, 35, 2461–2478, https://doi.org/10.1175/WAF-D-20-0080.1.

McCorkel, J., J. Van Naarden, D. Lindsey, B. Efremova, M. Coakley, M. Black, and A. Krimchansky, 2019: GOES-17 advanced baseline imager performance recovery summary. IGARSS 2019: 2019 IEEE Int. Geoscience and Remote Sensing Symp., Yokohama, Japan, Institute of Electrical and Electronics Engineers, 1–4, https://doi.org/10.1109/IGARSS40859.2019.9044466.

McGovern, A., D. Gagne, J. Williams, R. Brown, and J. Basara, 2014: Enhancing understanding and improving prediction of severe weather through spatiotemporal relational learning. Mach. Learn., 95, 27–50, https://doi.org/10.1007/s10994-013-5343-x.

——, ——, J. Basara, T. Hamill, and D. Margolin, 2015: Solar energy prediction: An international contest to initiate interdisciplinary research on compelling meteorological problems. Bull. Amer. Meteor. Soc., 96, 1388–1395, https://doi.org/10.1175/BAMS-D-14-00006.1.

——, R. Lagerquist, D. Gagne, G. Jergensen, K. Elmore, C. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.

——, I. Ebert-Uphoff, D. J. Gagne II, and A. Bostrom, 2021: The need for ethical, responsible, and trustworthy artificial intelligence for environmental sciences. arXiv, 2112.08453, https://arxiv.org/abs/2112.08453.

McKinney, W., 2010: Data structures for statistical computing in Python. Proceedings of the Ninth Python in Science Conference, S. van der Walt and J. Millman, Eds., 56–61, https://doi.org/10.25080/Majora-92bf1922-00a.

Mecikalski, J., J. Williams, C. Jewett, D. Ahijevych, A. LeRoy, and J. Walker, 2015: Probabilistic 0–1-h convective initiation nowcasts that combine geostationary satellite observations and numerical weather prediction model data. J. Appl. Meteor. Climatol., 54, 1039–1059, https://doi.org/10.1175/JAMC-D-14-0129.1.

Molina, M. J., D. J. Gagne, and A. F. Prein, 2021: A benchmark to test generalization capabilities of deep learning methods to classify severe convective storms in a changing climate. Earth Space Sci., 8, e2020EA001490, https://doi.org/10.1029/2020EA001490.

Molnar, C., 2022: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2nd ed. 329 pp., https://christophm.github.io/interpretable-ml-book.

Muñoz-Esparza, D., R. D. Sharman, and W. Deierling, 2020: Aviation turbulence forecasting at upper levels with machine learning techniques based on regression trees. J. Appl. Meteor. Climatol., 59, 1883–1899, https://doi.org/10.1175/JAMC-D-20-0116.1.

Murphy, A. H., 1993: What is a good forecast? An essay on the nature of goodness in weather forecasting. Wea. Forecasting, 8, 281–293, https://doi.org/10.1175/1520-0434(1993)008<0281:WIAGFA>2.0.CO;2.

Neetu, S., M. Lengaigne, J. Vialard, M. Mangeas, C. Menkes, I. Suresh, J. Leloup, and J. Knaff, 2020: Quantifying the benefits of nonlinear methods for global statistical hindcasts of tropical cyclones intensity. Wea. Forecasting, 35, 807–820, https://doi.org/10.1175/WAF-D-19-0163.1.

Nowotarski, C. J., and A. A. Jensen, 2013: Classifying proximity soundings with self-organizing maps toward improving supercell and tornado forecasting. Wea. Forecasting, 28, 783–801, https://doi.org/10.1175/WAF-D-12-00125.1.

Pedregosa, F., and Coauthors, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.

Peter, J. R., A. Seed, and P. J. Steinle, 2013: Application of a Bayesian classifier of anomalous propagation to single-polarization radar reflectivity data. J. Atmos. Oceanic Technol., 30, 1985–2005, https://doi.org/10.1175/JTECH-D-12-00082.1.

Quinlan, J., 1993: C4.5: Programs for Machine Learning. Morgan Kaufmann, 302 pp.

Ravuri, S., and Coauthors, 2021: Skilful precipitation nowcasting using deep generative models of radar. Nature, 597, 672–677, https://doi.org/10.1038/s41586-021-03854-z.

Roebber, P., 2009: Visualizing multiple measures of forecast quality. Wea. Forecasting, 24, 601–608, https://doi.org/10.1175/2008WAF2222159.1.

Rumelhart, D. E., G. E. Hinton, and R. J. Williams, 1986: Learning representations by back-propagating errors. Nature, 323, 533–536, https://doi.org/10.1038/323533a0.

Schumacher, R. S., A. J. Hill, M. Klein, J. A. Nelson, M. J. Erickson, S. M. Trojniak, and G. R. Herman, 2021: From random forests to flood forecasts: A research to operations success story. Bull. Amer. Meteor. Soc., 102, E1742–E1755, https://doi.org/10.1175/BAMS-D-20-0186.1.

Sessa, M. F., and R. J. Trapp, 2020: Observed relationship between tornado intensity and pretornadic mesocyclone characteristics. Wea. Forecasting, 35, 1243–1261, https://doi.org/10.1175/WAF-D-19-0099.1.

Shield, S. A., and A. L. Houston, 2022: Diagnosing supercell environments: A machine learning approach. Wea. Forecasting, 37, 771–785, https://doi.org/10.1175/WAF-D-21-0098.1.

Taillardat, M., A.-L. Fougères, P. Naveau, and O. Mestre, 2019: Forest-based and semiparametric methods for the postprocessing of rainfall ensemble forecasting. Wea. Forecasting, 34, 617–634, https://doi.org/10.1175/WAF-D-18-0149.1.

Tibshirani, R., 1996: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc., 58B, 267–288, https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

Vapnik, V., 1963: Pattern recognition using generalized portrait method. Autom. Remote Control, 24, 774–780.

Veillette, M., S. Samsi, and C. Mattioli, 2020: SEVIR: A storm event imagery dataset for deep learning applications in radar and satellite meteorology. Advances in Neural Information Processing Systems, H. Larochelle et al., Eds., Vol. 33, Curran Associates, Inc., 22 009–22 019, https://proceedings.neurips.cc/paper/2020/file/fa78a16157fed00d7a80515818432169-Paper.pdf.

Vigaud, N., M. K. Tippett, J. Yuan, A. W. Robertson, and N. Acharya, 2019: Probabilistic skill of subseasonal surface temperature forecasts over North America. Wea. Forecasting, 34, 1789–1806, https://doi.org/10.1175/WAF-D-19-0117.1.

Wang, C., P. Wang, D. Wang, J. Hou, and B. Xue, 2020: Nowcasting multicell short-term intense precipitation using graph models and random forests. Mon. Wea. Rev., 148, 4453–4466, https://doi.org/10.1175/MWR-D-20-0050.1.

Watson, A. I., R. L. Holle, and R. E. López, 1995: Lightning from two national detection networks related to vertically integrated liquid and echo-top information from WSR-88D radar. Wea. Forecasting, 10, 592–605, https://doi.org/10.1175/1520-0434(1995)010<0592:LFTNDN>2.0.CO;2.

Williams, J., 2014: Using random forests to diagnose aviation turbulence. Mach. Learn., 95, 51–70, https://doi.org/10.1007/s10994-013-5346-7.

——, D. Ahijevych, S. Dettling, and M. Steiner, 2008a: Combining observations and model data for short-term storm forecasting. Proc. SPIE, 7088, 708805, https://doi.org/10.1117/12.795737.

——, R. Sharman, J. Craig, and G. Blackburn, 2008b: Remote detection and diagnosis of thunderstorm turbulence. Proc. SPIE, 7088, 708804, https://doi.org/10.1117/12.795570.

Yang, L., H. Xu, and S. Yu, 2021: Estimating PM2.5 concentrations in contiguous eastern coastal zone of China using MODIS AOD and a two-stage random forest model. J. Atmos. Oceanic Technol., 38, 2071–2080, https://doi.org/10.1175/JTECH-D-20-0214.1.

Yoshida, S., T. Morimoto, T. Ushio, and Z. Kawasaki, 2009: A fifth-power relationship for lightning activity from Tropical Rainfall Measuring Mission satellite observations. J. Geophys. Res., 114, D09104, https://doi.org/10.1029/2008JD010370.

Zhang, Z., D. Wang, J. Qiu, J. Zhu, and T. Wang, 2021: Machine learning approaches for improving near-real-time IMERG rainfall estimates by integrating cloud properties from NOAA CDR PATMOS-x. J. Hydrometeor., 22, 2767–2781, https://doi.org/10.1175/JHM-D-21-0019.1.

Zou, H., and T. Hastie, 2005: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc., 67B, 301–320, https://doi.org/10.1111/j.1467-9868.2005.00503.x.