
Sustainability 14 14934 v2

The article compares various predictive models for dam water levels in Botswana, focusing on the Gaborone and Bokaa dams from 2001 to 2019. It evaluates Multivariate Linear Regression, Vector AutoRegressive, Random Forest Regression, and Multilayer Perceptron ANN models, finding that RFR and MLP-ANN provided the best predictions based on climate factors and land-use. A hybrid VAR-ANN model is suggested for improved accuracy in predicting dam water level variability.


sustainability

Article
Dam Water Level Prediction Using Vector AutoRegression,
Random Forest Regression and MLP-ANN Models Based on
Land-Use and Climate Factors
Yashon O. Ouma 1,*, Ditiro B. Moalafhi 2, George Anderson 3, Boipuso Nkwae 1, Phillimon Odirile 1,
Bhagabat P. Parida 4 and Jiaguo Qi 5

1 Department of Civil Engineering, University of Botswana, Private Bag UB 0061, Gaborone, Botswana
2 Faculty of Natural Resources, Botswana University of Agriculture and Natural Resources (BUAN),
Private Bag 0027, Gaborone, Botswana
3 Department of Computer Science, University of Botswana, Private Bag UB 0061, Gaborone, Botswana
4 Department of Civil and Environmental Engineering, Botswana International University of Science and
Technology (BIUST), Private Bag 16, Palapye, Botswana
5 Center for Global Change and Earth Observations, Michigan State University, East Lansing, MI 48824, USA
* Correspondence: oumay@ub.ac.bw

Abstract: To predict the variability of dam water levels, parametric Multivariate Linear Regression
(MLR), stochastic Vector AutoRegressive (VAR), Random Forest Regression (RFR) and Multilayer
Perceptron (MLP) Artificial Neural Network (ANN) models were compared based on the influences
of climate factors (rainfall and temperature), climate indices (DSLP, Aridity Index (AI), SOI and
Niño 3.4) and land-use land-cover (LULC) as the predictor variables. For the case study of the
Gaborone dam and the Bokaa dam in the semi-arid Botswana, from 2001 to 2019, the prediction
results showed that the linear MLR is not robust for predicting the complex non-linear variabilities
of the dam water levels with the predictor variables. The stochastic VAR detected the relationship
between LULC and the dam water levels with R² > 0.95; however, it was unable to sufficiently capture
the influence of climate factors on the dam water levels. RFR and MLP-ANN showed significant
correlations between the dam water levels and the climate factors and climate indices, with a higher
R² value between 0.890 and 0.926, for the Gaborone dam, compared to 0.704–0.865 for the Bokaa dam.
Using LULC for dam water predictions, RFR performed better than MLP-ANN, with higher accuracy
results for the Bokaa dam. Based on the climate factors and climate indices, MLP-ANN provided
the best prediction results for the dam water levels for both dams. To improve the prediction results,
a VAR-ANN hybrid model was found to be more suitable for integrating LULC and the climate
conditions and in predicting the variability of the linear and non-linear time-series components of the
dam water levels for both dams.

Keywords: Bokaa and Gaborone dams (Botswana); dam water levels; land-use land-cover; climate
change; multivariate linear regression; Vector AutoRegressive (VAR); Random Forest Regression;
Multilayer Perceptron ANN; VAR-Neural Network hybrid model

Citation: Ouma, Y.O.; Moalafhi, D.B.; Anderson, G.; Nkwae, B.; Odirile, P.; Parida, B.P.; Qi, J. Dam
Water Level Prediction Using Vector AutoRegression, Random Forest Regression and MLP-ANN Models
Based on Land-Use and Climate Factors. Sustainability 2022, 14, 14934.
https://doi.org/10.3390/su142214934

Academic Editors: Ozgur Kisi and Mohammad Valipour
Received: 6 October 2022
Accepted: 4 November 2022
Published: 11 November 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Despite freshwater scarcity being a global problem, the solutions must be locally
formulated in order to understand the connections between water supply and demand, and
to adequately respond to the local water shortages. Water shortages are being exacerbated
by human activities, as manifested by population growth and the impacts of land-use
changes driven by urbanization, agricultural activities, industrialization, and economic
development. The cumulative effects of the intensification of land-use activities and climate
change continue to pose uncertainty on the availability of water resources, with the effect
of intensified manipulation of the surface and groundwater hydrological regimes [1].

Sustainability 2022, 14, 14934. https://doi.org/10.3390/su142214934 https://www.mdpi.com/journal/sustainability



The monitoring of dam water levels is important, not only to ensure efficient dam
operations, but also for applications related to the integration of reservoir management
schemes, identifying the main factors that influence the dam water level variabilities,
determining the impacts of global climate changes on catchment hydrological systems,
and ensuring sufficient freshwater supply [2,3]. In addition, the accurate monitoring and
prediction of dam water level is important as it relates to the parameters such as inflows into
the dams, dam water storage and water release from the dam reservoirs, evaporation, and
infiltration. These parameters constitute the dam reservoir uncertainties and are important
in dam operations and modelling.
For the simulation, prediction and forecasting of the dam water levels, reliable models
are required [4]. However, the variability of dam water levels results from complex non-
linear processes, which include factors such as precipitation, evaporation, discharge from
tributaries, topographic structure, land use, etc. These influences are more complicated
when the dam has various water supply sources, e.g., precipitation, rivers, wellfields and
supplies from other dams. As such, reliable and accurate prediction of dam water levels is
challenging for hydrologists and water resource managers.
To solve the hydrological time series simulation and prediction problems, numerous
techniques have been developed. Such models include the hydrodynamic models
(e.g., MIKE21, CHAM and EFDC), time-series models using ARMA and ARIMA, and soft
computing approaches, e.g., Artificial Neural Networks (ANNs), Support Vector Regression
(SVR), and model trees [5–7]. While the hydrodynamic models have proven to be
superior in simulating water levels, for accurate and reliable predictions, they require
detailed and calibrated data, complex boundary conditions and parameters as input data,
and are computationally expensive to implement [8–11].
To improve the prediction and forecasting of water levels under data scarcity, soft
computing techniques have been recommended [12,13] because of their ability to capture
complex and non-linear input-output relationships with no explicit knowledge of the
physical processes [12,14]. Further, machine learning (ML) models have been considered
as they can efficiently represent the complex non-linear relationships in the temporally
dynamic system, which are not normally addressed in traditional mathematical models [13].
Additionally, machine learning models can deal with large spatial-temporal data in terms
of scalability, multi-dimensionality, flexibility, efficiency and accuracy. As such, they can
capture not only the primary exogenous parameters that influence the dam water level
variabilities, such as the catchment land-use and land-cover, watershed characteristics,
hydrological variables and climate factors but also the secondary factors, including the
reservoir operational decisions.
In recent decades, numerous machine learning models have been proposed and compared
for predicting dam water levels. In [15], the support vector machine (SVM) and
adaptive network-based fuzzy inference system (ANFIS) were compared for the forecasting
of daily reservoir water levels at the Klang Gate in Malaysia, concluding that SVM was
superior to the ANFIS model. In addition, for the Kenyir Dam in Malaysia, Reference [16]
compared supervised Boosted Decision Tree Regression (BDTR), Decision Forest Regression
(DFR), Bayesian Linear Regression (BLR) and Neural Network Regression (NNR)
and showed that BLR and the tree-based BDTR models were more accurate in predicting
the reservoir water levels. Using the wavelet decomposition with ANN and ANFIS,
Reference [17] demonstrated that the hybrid WANN and WANFIS models were more suitable
for predicting daily reservoir water levels. In addition, previous research [18] predicted
the water level variabilities in the Chahnimeh reservoirs in Zabol based on evaporation,
wind speed and daily average temperature factors using ANN, ANFIS and Cuckoo optimization
algorithms, and the results indicated that ANFIS was the best algorithm.
For short-term reservoir water level predictions in Yaojiang, China, Reference [19] also
compared ANN, SVM and ANFIS, with the results showing that the three models had
advantages in using all the predictor datasets, avoiding noisy information with lags of
inputs, and detecting the peaks under extreme conditions, respectively.

Furthermore, Reference [20] predicted and estimated the daily reservoir levels for the
Millers Ferry dam on the Alabama River using ANFIS, SVM, Radial Basis Neural Networks
(RBNN), and Generalized Regression Neural Networks (GRNN) methods in comparison
with the ARMA and Multilinear Regression (MLR) methods. The study concluded that,
for the best-input combinations, ANFIS produced better results. For the prediction of dam
inflows into the Soyang River dam in South Korea, Reference [21] showed that instead
of individual models, the combined ensemble forecasts using Random Forest (RF) and
Gradient Boost Trees (GT) with Multilayer Perceptron (MLP) could give better results.
In predicting the water levels in the Upo wetland in South Korea, Reference [22] also
concluded that RF regression tree-based ML had the best prediction accuracy against ANN,
decision trees (DT), and SVM. In addition, Reference [23] showed that MLR and M5P not
only had higher accuracy than the k-NN and ANN but were also faster to train than the
Advanced Hydrologic Prediction System (AHPS).
Despite the accurate prediction results, which also varied according to the case studies
and different machine learning models, there are also limitations with some of the machine
learning based prediction algorithms. For example, ANN and ANFIS have shown the
disadvantage of presenting different results that depend on the system complexity and
the available data [19]. Some algorithms also tend to have low and unstable convergence
rates, some tend to fall into the local optimum trap, and others require
high computational time [24]. In addition to these drawbacks, most of the implemented studies
did not include baseline methods when evaluating competing forecasting models. A baseline is
particularly important in gauging the relative performance of the ML models to allow for
better contextualization of the results in relation to the complexity between the models [25].
Further, most of the previous investigations tended to input all the exogenous predictor
variables in the prediction without significance and impact evaluations on the performance
of the models, with the assumption that the inclusion of additional variables improves the
model prediction accuracy [26].
From previous studies, the following is a summary of the drawbacks in the prediction
of dam water levels: (1) only a few studies have focused on the optimization of machine
learning and stochastic models and their integration for the prediction of dam water levels;
(2) most of the related studies focused on dam water level forecasting, as influenced by flood
stages and different reservoirs rather than on the dam water capacity predictions, and (3) the
studies utilized few variables in dam water level forecasting, with the dependent variable
as dam water level, and only rainfall and dam water itself as the independent variables.
To determine a suitable model for predicting the water levels in Botswana’s Limpopo
River Basin from 2001 to 2019, this study evaluates the results of the case study of the
Gaborone dam and the Bokaa dam. To improve on the drawbacks in the previous studies,
the aims of the current study are: (1) to determine the optimal machine learning model
for the accurate prediction of monthly dam water levels by comparing the parametric
Multivariate Linear Regression (MLR) as the baseline model, stochastic Vector AutoRegressive
(VAR), ensemble Random Forest Regression (RFR) and Multilayer Perceptron
Artificial Neural Network (MLP-ANN); (2) to evaluate the effectiveness of the algorithms in learning
and predicting the temporal trends in the dam water levels by comparing the performances
of the optimized models; (3) to determine the significance of climate factors (rainfall and
temperature), climate indices (southern oscillation index (SOI), Niño 3.4, Aridity Index (AI),
and Darwin Sea level pressure (DSLP)), and land-use land-cover comprising built-up, cropland,
water, forest, shrubland, grassland and bare-land, in the prediction of the dam water
levels in the two dams; and (4) to derive the optimal model approach(es) for predicting the
variability of dam water levels in the two dams. The main contribution of this work is
the derivation of a hybrid model capable of combining stochastic and machine learning
models for the accurate prediction of dam water levels in the two dams by integrating the
LULC and the climate conditions within the dam catchments.

2. Materials and Methods
2.1. Study Area
The study area is located within Botswana's Limpopo River Basin (BLRB). The larger
Limpopo River Basin is a transboundary basin, covering an area of approximately
416,300 km², and straddles four southern African countries: South Africa (45%), Botswana
(19%), Mozambique (21%), and Zimbabwe (15%). The basin is home to more than 18 million
people and Botswana has the highest percentage (61%) of its population living in the
basin. As shown in Figure 1, the semi-arid Botswana relies on the following small-to-
medium-sized dams, which are located within the BLRB: Gaborone (141.4 MCM); Letsibogo
(100 MCM); Shashe (85 MCM); Dikgatlhong (400 MCM); Bokaa (18.5 MCM); Lotsane
(42.35 MCM); Ntimbale (26.5 MCM), and Thune (90 MCM). The case study dams are the
Bokaa dam and the Gaborone dam, located in the southern part of the BLRB (Figure 1). The
two dams are located at a distance of approximately 40 km apart.

Figure 1. Location map of the Limpopo River Basin (LRB), Botswana's LRB, Bokaa and Gaborone
dams and the dam catchment areas. Reprinted with permission from ref. [27]. Copyright 2022 Society
of Photo-Optical Instrumentation Engineers.

With the general scarcity of freshwater in the arid and semi-arid regions, water
management problems tend to worsen, especially during extreme hydrological events, such
as drought. For this reason, and to optimally manage the dam operations, continuous and
accurate reservoir management schemes (including predictions of the variabilities of the
dam water capacities and the determinations of the influences of natural climatic
phenomena and anthropogenic activities on the water resources) are essential. In most
regions, predicting and forecasting dam water capacities is still challenging for water
resource operators and managers. This is attributed to the fact that, despite reservoir
water levels being directly regulated by the inflows and outflow releases, there are several
uncertainties in the dam water level determinant variables, such as the temporal dynamics
of climatic factors, e.g., rainfall and temperature, and dam operations and management
regimes, which are complex to model.
2.2. Data
2.2.1. Land-Use and Land-Cover (LULC)
For the multitemporal LULC classification, Landsat series data from Landsat 4 (L4-MSS),
Landsat 5 (L5-TM), and Landsat 7 (L7-ETM+), acquired from 1986 to 2020, were used.
Using the FLAASH (Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes)
atmospheric correction algorithm and the Landsat rescaling coefficients, the multitemporal
Landsat images were corrected to generate the surface reflectance imagery.
The LULC classification was carried out using Breiman’s random forest algorithm [28]
and was implemented within the Google Earth Engine, as detailed in [29]. To improve the
classification accuracy, the mean, variance, and contrast gray-level cooccurrence matrix
(GLCM) texture features were found to be most significant and were included in the
classification scheme. The LULC classification accuracy metrics results are presented
in Table 1, and the LULC area coverages are summarized in Table 2 for the Bokaa and
Gaborone dam catchments. From the results, the Bokaa dam catchment occupies an area of
approximately 3610 km² and the Gaborone catchment is approximately 4344 km².

Table 1. LULC classification accuracy. OA is average overall classification accuracy; PA is average
producer accuracy and UA is the average user accuracy.

        Bokaa Dam                               Gaborone Dam
Year    PA (%)   UA (%)   OA (%)   Kappa Index  PA (%)   UA (%)   OA (%)   Kappa Index
1986    -        -        -        -            82.4     80.0     88.9     0.82
1989    -        -        -        -            83.1     88.1     89.1     0.84
1994    86.0     86.2     90.5     0.857        75.8     86.8     83.4     0.76
1999    87.5     86.1     85.5     0.790        85.4     86.6     85.2     0.78
2004    89.7     87.8     88.3     0.837        86.7     87.3     89.6     0.84
2009    90.1     86.7     89.3     0.869        80.0     84.3     81.3     0.75
2014    85.3     88.2     89.4     0.858        87.8     91.0     85.2     0.78
2019    91.3     88.3     88.0     0.833        86.0     83.6     84.8     0.80

Table 2. Spatial-temporal LULC in Bokaa and Gaborone catchments.

Bokaa Catchment (Year/Area (km²))
LULC Class    2001       2004       2009       2014       2019
Tree Cover    206.14     477.58     609.53     776.80     507.08
Shrubland     2237.10    2066.95    2173.66    2175.63    2296.40
Grassland     492.00     487.99     453.44     236.40     248.72
Cropland      297.20     306.55     173.84     196.23     233.57
Water         1.76       7.15       8.93       4.04       5.23
Built-Up      70.30      76.42      96.53      99.59      108.79
Bare-soil     305.40     187.26     93.97      121.21     210.10

Gaborone Catchment (Year/Area (km²))
LULC Class    2001       2004       2009       2014       2019
Tree Cover    752.46     870.96     1160.17    1202.26    1364.78
Shrubland     2782.85    2478.52    1805.56    2210.82    1778.31
Grassland     167.24     174.03     445.11     62.23      323.70
Cropland      451.62     454.66     588.12     526.62     268.74
Water         15.07      12.65      18.49      7.66       17.88
Built-Up      70.28      98.31      155.77     167.44     172.95
Bare-soil     104.73     255.12     171.03     167.22     417.89

From the classification error matrix, the overall accuracy (OA) is determined from the
ratio of the correctly classified pixels to the total number of sampled pixels. Further, the
respective class User's Accuracy (UA) is the ratio of the correctly classified pixels of a class
to all pixels assigned to that class (the correct positive predictions), while the Producer's
Accuracy (PA) is the ratio of the correctly classified pixels of a class to all reference pixels
of that class (the correctly detected positives). For each year, the average OA, UA and PA
are presented in Table 1. The results in Table 1 show that for both dam catchments, the
LULC classification accuracies, as measured using the PA, UA and OA metrics, were mostly
at or above 80% (the lowest being a PA of 75.8% for the Gaborone catchment in 1994), and
the corresponding Kappa Index ranged between 0.75 and 0.87. The accuracy measures
demonstrate that the LULC was derived with a high degree of accuracy for both dam catchments.
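The accuracy measures described above can be sketched from an error (confusion) matrix as follows. This is a minimal illustration and not the authors' implementation: the matrix layout (rows as classified labels, columns as reference labels), the use of Cohen's kappa for the Kappa Index, the function name `accuracy_metrics`, and the example counts are all assumptions made for the sketch.

```python
def accuracy_metrics(cm):
    """Return OA, per-class UA and PA, and Cohen's kappa.

    Assumed layout: cm[r][c] counts pixels classified as class r
    whose reference label is class c.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]                               # classified totals
    col_tot = [sum(cm[r][c] for r in range(n)) for c in range(n)]    # reference totals
    correct = [cm[k][k] for k in range(n)]

    oa = sum(correct) / total
    ua = [correct[k] / row_tot[k] for k in range(n)]   # user's accuracy
    pa = [correct[k] / col_tot[k] for k in range(n)]   # producer's accuracy
    # Agreement expected by chance, from the row and column marginals.
    pe = sum(row_tot[k] * col_tot[k] for k in range(n)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, ua, pa, kappa


# Hypothetical 3-class error matrix (illustrative counts, not study data).
cm = [[50, 3, 2],
      [4, 40, 6],
      [1, 2, 42]]
oa, ua, pa, kappa = accuracy_metrics(cm)
print(f"OA = {oa:.3f}, kappa = {kappa:.3f}")
```

Averaging the per-class `ua` and `pa` lists would give single yearly values of the kind reported in Table 1, assuming that is how the averages were formed.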
In Table 2, it is observed that for both catchments, the built-up areas increased
steadily over the study period, while the vegetation and bare soil-covered areas increased and
decreased, interchangeably, either due to activities in croplands or due to climate influences. Tree
cover shows an overall increase in both catchments (despite a decline in the Bokaa catchment
between 2014 and 2019), while shrubland decreased in extent in the Gaborone catchment and
remained comparatively stable in the Bokaa catchment.
2.2.2. Climate Data
1. Rainfall and Temperature
Monthly rainfall data from the Gaborone gauge station was also used for both the
Bokaa dam and Gaborone dam catchments due to their geographical proximity, climatic
similarities, and given that there is no gauge station within the Bokaa dam catchment.
Figure 2 shows the observed rainfall patterns within the Bokaa and Gaborone dam catchments,
and Figure 3 shows the minimum, average, and maximum temperature variabilities
within the catchments. Over the 19 years of study, it is observed that the mean temperature
is increasing while the amount of rainfall received in the two catchments is decreasing.

Figure 2. Variability of rainfall and climate indices within Bokaa and Gaborone dam
catchments (2001–2019).

Figure 3. Temperature variability within the Bokaa and Gaborone dam catchments (2001–2019).
2. Climate Indices
The climate indices considered were those that have teleconnections with particular
rainfall over southern Africa, that is, DSLP, SOI, and Niño 3.4. The average March-June
pressures at Darwin have proven to have high positive sea level pressure (SLP) anomalies
and teleconnections to droughts over southern Africa [30]. The SOI standardized sea-level
pressure difference between Papeete and Darwin is also related to rainfall over the sub-region.
In addition to the three climate indices, the aridity index (AI) was derived using
station rainfall and temperature data, as in Equation (1):

$$I_i = \frac{12\,P_i}{T_i + 10} \tag{1}$$

where $P_i$ = the monthly total precipitation (mm) and $T_i$ = mean near-surface temperature (°C).
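As a worked illustration of Equation (1), the sketch below evaluates the aridity index for a single month. The function name `aridity_index`, the guard against temperatures at or below −10 °C (where the denominator is zero or negative), and the example inputs are assumptions introduced here, not part of the study.

```python
def aridity_index(p_mm, t_c):
    """Monthly aridity index I = 12 * P / (T + 10), following Equation (1).

    p_mm: monthly total precipitation in mm
    t_c:  mean near-surface temperature in degrees Celsius
    """
    if t_c <= -10.0:
        # Assumption: treat a zero or negative denominator as invalid input.
        raise ValueError("index is undefined for T <= -10 degrees C")
    return 12.0 * p_mm / (t_c + 10.0)


# Hypothetical month: 35 mm of rainfall at a mean temperature of 21 degrees C.
ai = aridity_index(35.0, 21.0)
print(f"AI = {ai:.2f}")
```

The factor of 12 scales the monthly precipitation to an annual equivalent, so a month with no rainfall gives an index of zero regardless of temperature.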

2.2.3. Dam Reservoir Water Levels
The mean monthly dam water levels were used as the indicators for water availability
in surface water storages, from 2001 to 2019, for the two dams. Figure 4a shows the variability
of the dam water levels with rainfall, with the Bokaa dam exhibiting a marginally higher
degree of correlation with rainfall than the Gaborone dam. The scatterplot regressions
in Figure 4b depict very low correlations between the measured dam water levels and
rainfall in the two catchments.

Figure 4. (a) Variability of dam water levels in Bokaa and Gaborone dams with mean monthly
precipitation. (b) Correlation between dam water levels and rainfall.

2.2.4. Data Statistics and Correlational Analysis


The summary of the mean monthly statistical descriptions of the study datasets, from
2001 to 2019, for the two dams is presented in Table 3.
In terms of the correlations presented in Figure 5, the Bokaa dam water levels exhibit
the strongest, but inverse, correlation with tree cover, at −0.349, followed by maximum
temperature, bare soil, grassland, average temperature and aridity index, respectively,
at −0.243, 0.216, 0.175, 0.161, and 0.161. The Bokaa dam water level correlations were
weakest with Niño 3.4 and DSLP, at −0.035 and −0.047, respectively. In general,
the water levels in the Bokaa dam have positive but low correlations with LULC classes
and low negative correlations with the climate factors. Comparatively, the Gaborone dam
had higher correlations with the predictor variables (Figure 5). The highest correlations
for the Gaborone dam water levels with the predictor variables were for grassland, water
bodies, and shrubland, at 0.815, 0.761 and −0.730, respectively. The lowest correlations
were with built-up, rainfall, and aridity index, at 0.013, 0.029 and 0.034, respectively.
The Gaborone dam water levels display positive and higher correlations with dam surface area
and grassland, but lower, negative correlations with climate factors and indices.
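A ranking of predictors by correlation with the dam water level, of the kind discussed above, can be sketched as follows. The source does not state which correlation coefficient was used, so Pearson's r is an assumption here; the synthetic series, the sample size, and the variable names are hypothetical stand-ins and do not reproduce the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # hypothetical number of monthly records

# Synthetic stand-ins for three predictors (not the study's data).
predictors = {
    "RN": rng.gamma(shape=1.0, scale=35.0, size=n),  # rainfall, mm
    "TMA": rng.normal(21.0, 4.8, size=n),            # mean temperature, deg C
    "GL": rng.uniform(1.4, 7.5, size=n),             # grassland share, %
}
# Synthetic water level driven mainly by the grassland share, plus noise.
wl = 10.0 * predictors["GL"] + rng.normal(0.0, 5.0, size=n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D series."""
    return float(np.corrcoef(x, y)[0, 1])


r = {name: pearson_r(values, wl) for name, values in predictors.items()}
# Rank by absolute value so strong inverse correlations are not overlooked.
ranked = sorted(r.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, value in ranked:
    print(f"{name}: r = {value:+.3f}")
```

Ranking on the absolute value matters for cases such as tree cover in the Bokaa catchment, where the strongest relationship reported is negative.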
Sustainability 2022, 14, 14934 8 of 31

Table 3. Descriptive statistics for the datasets. (B-dam = Bokaa dam; G-dam = Gaborone dam;
BC = Bokaa dam catchment; GC = Gaborone dam catchment; SD = standard deviation;
CV = coefficient of variation and SE = standard error).

Parameters                   Site    Min      Max      Median   Mean     SD      CV       SE
Dam water levels (WL) (%)    B-dam   2.00     105.00   47.00    46.46    28.50   61.34    2.74
                             G-dam   1.00     100.00   44.50    43.86    31.65   72.17    3.05
Rainfall (RN) (mm)           -       0.00     174.10   17.70    35.15    43.82   124.68   4.24
Min Temp (TMM) (°C)          -       1.60     23.30    15.15    13.49    6.03    44.73    0.58
Max Temp (TMX) (°C)          -       20.90    39.10    29.45    28.88    3.93    13.61    0.38
Avg Temp (TMA) (°C)          -       11.25    29.05    22.40    21.19    4.82    22.73    0.46
AI                           -       10.00    96.15    19.21    27.36    21.71   79.35    2.09
DSLP                         -       4.80     15.00    10.45    10.20    2.80    27.49    0.27
SOI                          -       −6.55    8.04     0.25     0.40     3.16    791.38   0.30
Niño 3.4 (NINO)              -       25.00    29.42    27.18    27.17    0.99    3.64     0.10
Tree Cover (FR)              BC      14.05%   21.52%   18.80%   18.41%   0.02    13.33    0.01
                             GC      27.10%   31.42%   28.40%   28.78%   0.02    5.41     0.01
Shrubland (SR)               BC      60.24%   63.61%   60.95%   61.38%   0.01    2.12     0.00
                             GC      40.93%   50.89%   46.81%   46.31%   0.03    6.80     0.01
Grassland (GL)               BC      6.55%    10.13%   6.82%    7.47%    0.01    17.05    0.00
                             GC      1.43%    7.45%    5.00%    4.56%    0.02    45.01    0.01
Cropland (CL)                BC      5.07%    6.47%    5.65%    5.70%    0.00    8.65     0.00
                             GC      6.19%    12.90%   11.00%   10.34%   0.02    23.93    0.01
Water body (WT)              BC      0.11%    0.19%    0.14%    0.14%    0.00    18.04    0.00
                             GC      0.18%    0.41%    0.27%    0.28%    0.00    26.34    0.00
Built-Up (BU)                BC      2.71%    3.01%    2.81%    2.83%    0.00    3.88     0.00
                             GC      3.68%    3.98%    3.88%    3.86%    0.00    2.60     0.00
Bare-soil (BL)               BC      2.90%    5.82%    3.85%    4.08%    0.01    26.03    0.00
                             GC      3.85%    9.62%    5.00%    5.80%    0.02    38.22    0.01

2.3. Methods
2.3.1. Multivariate Linear Regression (MLR)
MLR was utilized as a baseline for comparative evaluation [24]. Linear regression
models are simple models in which the predictions are linear in the model parameters. For
small sample sizes, the parametric multilinear regression (MLR) models are able to establish
the relationships between the predictor variables and the dependent variable using least
squares fitting. In this study, the dam water levels depend on climate factors, climate
indicators, and LULC. The general MLR model is expressed as in Equation (2).
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_q x_{qi} + \varepsilon_i \tag{2}$$

where: $y_i$ = observed dependent variable; $n$ = sample size with $i = 1, \ldots, n$; $x_1, x_2, \ldots, x_q$ = explanatory predictor variables; $x_{1i}, x_{2i}, \ldots, x_{qi}$ = observed value descriptors; $\varepsilon_i$ = residual
or error for individual $i$; $\beta_0$ = constant; $\beta_1, \beta_2, \ldots, \beta_q$ = multiple regression coefficients. In
Equation (2), dam water level (WL) is the dependent variable, Y, determined by a set of
predictor variables, as those in Figure 5 (RN, TMX, TMM, TMA, DSLP, AI, SOI, NINO, BUP,
CL, WT, FR, SL, GL, BL).
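As an illustration of the least squares fitting behind Equation (2), the following minimal sketch estimates the coefficients from the normal equations using only the Python standard library. The function names (`solve_linear_system`, `fit_mlr`, `predict_mlr`) and the two-predictor synthetic dataset are assumptions made for this example; they are not the study's implementation or data.

```python
# Minimal sketch of Equation (2): ordinary least squares with several predictors.
# All names and the synthetic data are illustrative assumptions.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations act on both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Return [beta_0, beta_1, ..., beta_q] that minimize the squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 carries the constant beta_0
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(beta, row):
    """Evaluate beta_0 + beta_1 * x_1 + ... + beta_q * x_q for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic example: the response is generated as y = 5 + 2*x1 - 3*x2 (no noise).
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
y = [5.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in rows]
beta = fit_mlr(rows, y)
```

Because the synthetic response is noise-free, the fitted coefficients recover the generating values up to floating-point error; with real, noisy data the residual term in Equation (2) absorbs the mismatch.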

Figure 5. Correlation matrix heatmap of the predictor variables and dam water levels for (a) Bokaa
dam and (b) Gaborone dam. The datasets are abbreviated as dam water level (WL), rainfall (RN),
max temperature (TMX), min temperature (TMM), average temperature (TMA), Darwin Sea Level
Pressure (DSLP), Aridity Index (AI), Southern Oscillation Index (SOI), Niño 3.4 (NINO), Built-Up
(BUP), cropland (CL), water body (WT), tree-cover or forest (FR), shrubland (SL), grassland (GL) and
bare-land or soil (BL).
2.3.2. Vector AutoRegressive Model
VAR is a stochastic linear prediction model that predicts the current value of a variable
based on its previous values, and takes into consideration other predictor variables.
Through dynamic analysis, VAR detects how changes to a particular variable affect changes
to other variables, the lags of those variables and the changes in the variables' lags. VAR
thus extends the univariate autoregression to multiple time-series regression, with the
lagged values of all series as regressors. For example, the VAR model of two variables $X_t$
and $Y_t$ ($k = 2$) with the lag order $p$ is defined as in Equations (3) and (4). The $\beta$ and $\gamma$ coefficients can
be estimated using the ordinary least squares method.

$$Y_t = \beta_{10} + \beta_{11} Y_{t-1} + \cdots + \beta_{1p} Y_{t-p} + \gamma_{11} X_{t-1} + \cdots + \gamma_{1p} X_{t-p} + \mu_{1t} \tag{3}$$

$$X_t = \beta_{20} + \beta_{21} Y_{t-1} + \cdots + \beta_{2p} Y_{t-p} + \gamma_{21} X_{t-1} + \cdots + \gamma_{2p} X_{t-p} + \mu_{2t} \tag{4}$$

The lag order for the VAR(p) model is determined using the lag-length selection criteria:
the VAR(p) models are fitted with orders $p = 0, 1, \ldots, p_{max}$, and the value of $p$ that
minimizes the model selection criterion is chosen. The lag selection criteria in
this study are the Akaike's Information Criterion (AIC$_p$), Schwarz Bayesian Information
Criterion (BIC$_p$), Hannan-Quinn Criterion (HQC$_p$), and Final Prediction Error (FPE$_p$). The
traditional unrestricted VAR is unsuitable for non-stationary data with seasonality and,
therefore, this study imposed a priori differencing on the input datasets for stationarity.
The implemented VAR model for dam water level time series prediction was developed
with the following steps:
1. Testing for stationarity of the individual predictor variables using the augmented
Dickey-Fuller (ADF) test.
2. Determining the lag for the VAR(p) model using lag-length selection. VAR(p) models
are fitted with orders $p = 0, 1, \ldots, p_{max}$, and the value of $p$ resulting in the minimal model
selection criteria is chosen based on the parameter selection criteria above. In this
study, the lag orders are determined for the specific predictor variables.

3. Establishment of an optimal VAR model with appropriate lags for each parameter.
For multivariate time series, the VAR model is constructed such that each variable, at
a time point, is expressed as a linear function of the recent lag of itself and other variables.
The generalized VAR(p) = VAR(1) form for the n = 15 predictor variables can be
expressed as in Equation (5). Equation (5) is solved using ordinary least squares, and
c represents the intercepts; A is the regression coefficient matrix, and e is the error in
prediction at time t.

$$\begin{bmatrix} Y_{1,t} \\ \vdots \\ Y_{15,t} \end{bmatrix} = \begin{bmatrix} c_1 \\ \vdots \\ c_{15} \end{bmatrix} + \begin{bmatrix} A_{1,1} & \cdots & A_{1,15} \\ \vdots & \ddots & \vdots \\ A_{15,1} & \cdots & A_{15,15} \end{bmatrix} \begin{bmatrix} Y_{1,t-1} \\ \vdots \\ Y_{15,t-1} \end{bmatrix} + \begin{bmatrix} e_{1,t} \\ \vdots \\ e_{15,t} \end{bmatrix} \tag{5}$$
4. Residual autocorrelation assessment for goodness-of-fit. For the time series data, the
autocorrelation of the residuals between the observed and the model-fitted values is
used to determine the goodness-of-fit of the model. Accuracy assessment metrics,
including R2, RMSE, MAE and MAPE are used.
5. VAR system stability test assessment with the autoregressive (AR) roots graph. The
VAR stability determines how well the model represents the time series over the
sampling window. This is evaluated using the roots of the characteristic polynomial
of the coefficient matrix A in Equation (5). If all the inverse roots (the eigenvalues of A)
have modulus less than 1, i.e., lie inside the unit circle, the VAR model
is considered stable.
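The fitting and stability steps above can be sketched for the two-variable VAR(1) case of Equations (3) and (4). This is a minimal illustration under stated assumptions, not the study's implementation: the series are taken to be already stationary (the ADF test and lag-length selection are omitted), each equation is fitted separately by ordinary least squares, and stability is judged by the standard condition that all eigenvalues of the coefficient matrix have modulus below one. The function names and the simulated, noise-free series are hypothetical.

```python
import cmath

# Sketch of a two-variable VAR(1): OLS fit per equation, then a stability check.
# Names and simulated data are illustrative assumptions.


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(design, target):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fit_var1(y, x):
    """Fit Y_t = c1 + a11*Y_{t-1} + a12*X_{t-1} and X_t = c2 + a21*Y_{t-1} + a22*X_{t-1}."""
    design = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    c1, a11, a12 = ols(design, y[1:])
    c2, a21, a22 = ols(design, x[1:])
    return [c1, c2], [[a11, a12], [a21, a22]]


def eigenvalue_moduli(a):
    """Moduli of the two eigenvalues of a 2x2 matrix (closed-form quadratic)."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return [abs((trace + disc) / 2.0), abs((trace - disc) / 2.0)]


def is_stable(a):
    """A VAR(1) is stable when every eigenvalue of A lies inside the unit circle."""
    return max(eigenvalue_moduli(a)) < 1.0


# Simulate a short noise-free VAR(1) with known (hypothetical) coefficients.
true_c = [1.0, 2.0]
true_a = [[0.5, 0.1], [0.2, 0.3]]
y, x = [10.0], [-5.0]
for _ in range(9):
    y_prev, x_prev = y[-1], x[-1]
    y.append(true_c[0] + true_a[0][0] * y_prev + true_a[0][1] * x_prev)
    x.append(true_c[1] + true_a[1][0] * y_prev + true_a[1][1] * x_prev)

c_hat, a_hat = fit_var1(y, x)
stable = is_stable(a_hat)
```

For higher lag orders or the 15-variable system of Equation (5), the same idea applies to the companion form of the model; a numerical linear algebra library would normally replace the hand-written solver and eigenvalue formula used here.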

2.3.3. Random Forest Regression


RFR is an ensemble learning regression model based on a decision tree algorithm [28].
The RFR principle entails randomly generating different unpruned CART decision trees, in
which the decrease in node impurity (for regression, the reduction in variance) is regarded as the splitting criterion. As a bootstrap
resampling and bagging approach, bootstrap samples are drawn from the training set data and
an unpruned decision tree is fitted to each bootstrap sample. At the decision tree nodes,
variable selection is made on small random subsets of the predictor variables, and the best
split among these predictors is used to split the node. The outputs of the trees in the forest are averaged (for regression) or voted (for classification)
to generate a final, more robust prediction. In this
study, the RFR was constructed through the following steps:
1. From the original data, nTree bootstrap samples are drawn.
2. For each bootstrap dataset, a tree is grown, and for each tree-node mTry variables are
randomly selected for splitting.
3. The aggregated information from the nTree trees is used for new data prediction, in
this case by averaging for regression.
4. The out-of-bag (OOB) error rate is computed using the data not included in the bootstrap sample.
RFR hyperparameters were tuned to determine the optimal lag-order, epochs, number
of trees (n_estimators) and max_depth for predicting the dam water levels.
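The four construction steps can be illustrated with a deliberately simplified sketch in which each "tree" is a single-split regression stump, so that selecting mTry variables per tree coincides with selecting them per node. This is an assumption made to keep the example short; a full RFR grows deep unpruned trees. The names `fit_stump`, `fit_forest`, `predict_forest` and `oob_mse`, and the synthetic step-function data, are hypothetical.

```python
import random

# Simplified bagging sketch: bootstrap samples, random feature subsets,
# averaged predictions, and an out-of-bag (OOB) error estimate.


def fit_stump(rows, y, features):
    """Best single split (minimum squared error) among the given feature indices."""
    best = None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [yi for r, yi in zip(rows, y) if r[f] <= threshold]
            right = [yi for r, yi in zip(rows, y) if r[f] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((v - left_mean) ** 2 for v in left)
                   + sum((v - right_mean) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left_mean, right_mean)
    if best is None:  # every candidate feature is constant in this sample
        mean = sum(y) / len(y)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean


def fit_forest(rows, y, n_tree=50, m_try=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_tree):
        idx = [rng.randrange(n) for _ in range(n)]        # step 1: bootstrap sample
        features = rng.sample(range(n_features), m_try)   # step 2: random mTry variables
        stump = fit_stump([rows[i] for i in idx], [y[i] for i in idx], features)
        forest.append((stump, set(idx)))                  # remember in-bag indices
    return forest


def predict_forest(forest, row):
    """Step 3: average the individual tree predictions (regression)."""
    preds = [predict_stump(stump, row) for stump, _ in forest]
    return sum(preds) / len(preds)


def oob_mse(forest, rows, y):
    """Step 4: score each sample using only trees that did not see it."""
    errors = []
    for i, (row, yi) in enumerate(zip(rows, y)):
        preds = [predict_stump(s, row) for s, in_bag in forest if i not in in_bag]
        if preds:
            errors.append((sum(preds) / len(preds) - yi) ** 2)
    return sum(errors) / len(errors)


# Synthetic data: the target steps from 0 to 10 with feature 0; feature 1 is uninformative.
rows = [(float(i), float((i * 7) % 5)) for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(rows, y, n_tree=50, m_try=1, seed=0)
oob = oob_mse(forest, rows, y)
```

The OOB estimate is obtained without a separate validation set, which is why it is commonly reported alongside the hyperparameters n_estimators (nTree here) and max_depth.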

2.3.4. Multilayer Perceptron (MLP) Neural Network


MLP-ANN is one of the most popular neural network models, with input, hidden,
and output layers. The advantage of MLP-ANN is that, even with a single hidden layer
and an arbitrary bounded and smooth activation function, the network can approximate a
continuous non-linear system. The network adopted in this study was trained using
Levenberg–Marquardt backpropagation, with a gradient scheme for weight adjustment
that minimizes the errors between the predicted and observed data. The MLP-ANN model was implemented
following the structure and detailed steps in [31].

2.4. Performance Evaluation Metrics


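Four statistical measures were used to evaluate the prediction efficiency of the models:
RMSE, R2, MAE, and MAPE. The metrics are respectively represented in Equations (6)–(9),
where $h_i^o$ is the observed dam water level, $h_i^s$ is the simulated or predicted dam water
level, and $\bar{h}^o$ and $\bar{h}^s$ are their respective means. RMSE, MAE, and h are measured in % of dam water level.

$$\mathrm{RMSE} = \left[ \frac{\sum_{i=1}^{n} \left( h_i^s - h_i^o \right)^2}{n} \right]^{0.5} \tag{6}$$

$$R^2 = \frac{\left[ \sum_{i=1}^{n} \left( h_i^o - \bar{h}^o \right) \left( h_i^s - \bar{h}^s \right) \right]^2}{\sum_{i=1}^{n} \left( h_i^o - \bar{h}^o \right)^2 \sum_{i=1}^{n} \left( h_i^s - \bar{h}^s \right)^2} \tag{7}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| h_i^s - h_i^o \right| \tag{8}$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{h_i^o - h_i^s}{h_i^o} \right| \times 100\% \tag{9}$$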
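Equations (6)–(9) translate directly into code. The sketch below is a plain standard-library illustration; the function names and the four-point example series are assumptions chosen for readability, and MAPE presumes that no observed value is zero.

```python
import math

# Direct transcription of Equations (6)-(9); obs = observed, sim = simulated levels.


def rmse(obs, sim):
    n = len(obs)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)


def r_squared(obs, sim):
    """Squared correlation between observed and simulated values, Equation (7)."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    cross = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cross ** 2 / (var_o * var_s)


def mae(obs, sim):
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


def mape(obs, sim):
    """Mean absolute percentage error; assumes every observed value is non-zero."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


# Hypothetical water levels in % of dam capacity.
observed = [50.0, 60.0, 40.0, 80.0]
simulated = [55.0, 57.0, 42.0, 70.0]
scores = {
    "RMSE": rmse(observed, simulated),
    "R2": r_squared(observed, simulated),
    "MAE": mae(observed, simulated),
    "MAPE": mape(observed, simulated),
}
```

Note that MAPE weights errors by the observed level, so the same absolute error counts for more when the dam is nearly empty; this may explain cases in the results where a low MAE coexists with a high MAPE.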

2.5. Data Normalization


The input datasets were normalized to the range [0.1–0.9] using the minimum-maximum
boundary, as expressed in Equation (10). The standardization minimizes biases as all the input
data receive the same attention.

$$f: h_i^o \rightarrow 0.1 + 0.8 \times \frac{h_i^o - h^o_{\min}}{h^o_{\max} - h^o_{\min}} \tag{10}$$

where $h_i^o$ = input data, $h^o, y \in \mathbb{R}^n$, $h^o_{\min} = \min\left(h_i^o\right)$ and $h^o_{\max} = \max\left(h_i^o\right)$. The datasets
were divided into 70% for training sets (April 2001–May 2014) and 30% for testing
(June 2014–December 2019).
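A minimal sketch of Equation (10), its inverse, and a chronological 70/30 split is given below. The function names are illustrative assumptions, and the three example values are simply the minimum, median and maximum Bokaa dam water levels from Table 3; the series is assumed not to be constant, since the denominator would otherwise be zero.

```python
# Sketch of the [0.1, 0.9] min-max normalization of Equation (10) and its inverse.


def normalize(values, lower=0.1, upper=0.9):
    """Map values linearly so that min -> lower and max -> upper."""
    v_min, v_max = min(values), max(values)
    scale = (upper - lower) / (v_max - v_min)  # assumes v_max > v_min
    return [lower + scale * (v - v_min) for v in values], (v_min, v_max)


def denormalize(scaled, bounds, lower=0.1, upper=0.9):
    """Invert normalize() so that errors can be reported in original units."""
    v_min, v_max = bounds
    return [v_min + (s - lower) * (v_max - v_min) / (upper - lower) for s in scaled]


def chronological_split(series, train_percent=70):
    """Keep time order: the earliest records train, the latest records test."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]


levels = [2.0, 47.0, 105.0]  # min, median, max of Bokaa dam levels (%), Table 3
scaled, bounds = normalize(levels)
restored = denormalize(scaled, bounds)
train, test = chronological_split(list(range(10)))
```

Splitting by position rather than at random avoids training on records that come after the test period, which matters for time-series prediction.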
The predictor parameters were organized into predefined significant inputs comprising:
Set-1: Climate Indices, Rainfall and Temperatures;
Set-2: Min-Avg-Max Temperatures;
Set-3: All Variables;
Set-4: Rainfall;
Set-5: Land-Use Land-Cover (LULC);
Set-6: LULC, Rainfall, Minimum and Maximum Temperatures, Climate Indices;
Set-7: Rainfall, Minimum-Maximum Temperatures; and
Set-8: Climate Indices.
To evaluate the relative importance of the predictor variables, backward sensitivity
analysis is adopted, where the significance of each input variable is determined by stepwise
variable replacement and the measure of the MAE deviation.
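One way to read the sensitivity analysis described above is sketched here. It assumes that "variable replacement" means substituting one predictor at a time with its mean value and recording how far the MAE moves from the baseline; the text does not specify the replacement rule, so this choice, the function names, and the toy linear model are all assumptions for illustration.

```python
# Sketch of a replacement-based sensitivity analysis scored by MAE deviation.


def mae(obs, sim):
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


def sensitivity_by_replacement(predict, rows, obs):
    """Return {feature index: increase in MAE when that feature is neutralized}."""
    baseline = mae(obs, [predict(r) for r in rows])
    deviations = {}
    for j in range(len(rows[0])):
        col_mean = sum(r[j] for r in rows) / len(rows)
        # Replace feature j by its mean in every row; other features are untouched.
        replaced = [tuple(col_mean if k == j else v for k, v in enumerate(r))
                    for r in rows]
        deviations[j] = mae(obs, [predict(r) for r in replaced]) - baseline
    return deviations


def toy_model(row):
    """Hypothetical fitted model: strongly driven by feature 0, weakly by feature 1."""
    return 10.0 * row[0] + 0.1 * row[1]


rows = [(1.0, 5.0), (2.0, 3.0), (3.0, 8.0), (4.0, 1.0)]
obs = [toy_model(r) for r in rows]
deviations = sensitivity_by_replacement(toy_model, rows, obs)
ranking = sorted(deviations, key=deviations.get, reverse=True)
```

A larger MAE deviation indicates a more influential predictor; in this toy case feature 0 ranks first.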

3. Results
3.1. Hyperparameter Tuning for the Models
3.1.1. Parameter Lag Order Determination for VAR Model
The optimal lag orders for the Gaborone and Bokaa dams were determined based on
the AIC, BIC, and HQIC measures. From the summary results in Table 4, rainfall (Set-4)
had the smallest-magnitude AIC, BIC, and HQIC information criteria for the Gaborone dam, respectively
corresponding to −9.493, −5.990, and −8.070 (Table 4). Set-7, comprising rainfall and
temperature, was the second smallest, followed by Set-8, consisting of the climate indices,
and the largest magnitudes were obtained from Set-3, comprising all the parameters. For
the Gaborone dam, temperature and rainfall had the highest lag orders, at 43 and 40,
respectively. From the results in Table 4 for the Bokaa dam, the rainfall factor (Set-4) gave
the smallest-magnitude AIC, BIC, and HQIC, at −9.061, −7.732 and −8.523, respectively, and the highest
lag order of 20. This is followed by Set-7, combining rainfall and temperatures, with a lag
order of 12. For the Bokaa dam, the respective optimal lag orders varied between 6–20,
with temperature having the least lag order compared to the Gaborone dam. In Table 4, the
FPE values are not included since their magnitudes were all negligible.

Table 4. Optimal VAR(p) lag order determinants for Gaborone dam (G-dam) and Bokaa dam (B-dam).

| Dataset | G-dam Lag Order | G-dam AIC | G-dam BIC | G-dam HQIC | G-dam R2 | B-dam Lag Order | B-dam AIC | B-dam BIC | B-dam HQIC | B-dam R2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Set-1 | 7 | −45.9 | −37.3 | −42.5 | 0.761 | 7 | −43.3 | −35.1 | −40.0 | 0.785 |
| Set-2 | 43 | −58.6 | −43.8 | −52.6 | 0.329 | 6 | −57.6 | −55.8 | −56.9 | 0.203 |
| Set-3 | 2 | −142.6 | −133.1 | −138.7 | 0.810 | 8 | −157.6 | −125.2 | −144.5 | 0.872 |
| Set-4 | 40 | −9.5 | −5.9 | −8.1 | 0.224 | 20 | −9.1 | −7.7 | −8.5 | 0.256 |
| Set-5 | 8 | −62.8 | −53.2 | −58.9 | 0.952 | 7 | −78.2 | −69.9 | −74.8 | 0.917 |
| Set-6 | 2 | −100.8 | −92.5 | −97.5 | 0.936 | 6 | −116.9 | −91.9 | −106.8 | 0.860 |
| Set-7 | 7 | −20.8 | −18.7 | −19.9 | 0.121 | 12 | −19.6 | −16.5 | −18.3 | 0.798 |
| Set-8 | 2 | −25.9 | −24.9 | −25.6 | 0.884 | 7 | −25.4 | −22.1 | −24.1 | 0.902 |

The VAR training results show that the contributions of rainfall and temperatures were
insignificant for both dams, with R2 of less than 35%. The combination of the two climate
factors in Set-7 only improved the training results for the Bokaa dam water levels, but did
not influence the water levels in the Gaborone dam. Both dams responded well with LULC
and the four regional climate indices, with R2 of between 76% and 95% (Table 4).

3.1.2. Training for RF Regression


To determine the optimal RFR tuning hyperparameters, the models were trained
with 70% of the data. The training results, based on lag order, max_depth, and n_estimators,
are presented in Figure 6 for the Bokaa dam, and the corresponding results for the four best
predictor variables are presented in Table 5. For the Gaborone dam water level simulations
and predictions, the results for the RFR model tuning parameters are also presented in
Figure 6, with the best predictor variables statistics presented in Table 5.
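How the lag order enters the RFR and MLP-ANN inputs is not spelled out in this section. One common construction, shown here purely as an assumption, turns the time series into a supervised dataset in which the water level at time t is predicted from the previous `lag_order` values of the water level and of each predictor. The function name and the tiny example series are hypothetical.

```python
# Sketch: build lagged features so a regression model can learn from past values.


def make_lagged_dataset(target, predictors, lag_order):
    """target: list of values; predictors: list of equal-length series.

    Returns (features, labels) where each feature row holds, for lags
    1..lag_order, the lagged target followed by the lagged predictors.
    """
    features, labels = [], []
    for t in range(lag_order, len(target)):
        row = []
        for lag in range(1, lag_order + 1):
            row.append(target[t - lag])
            row.extend(series[t - lag] for series in predictors)
        features.append(row)
        labels.append(target[t])
    return features, labels


water_level = [1.0, 2.0, 3.0, 4.0, 5.0]
rainfall = [10.0, 20.0, 30.0, 40.0, 50.0]
features, labels = make_lagged_dataset(water_level, [rainfall], lag_order=2)
```

Under this construction a larger lag order widens every feature row and shortens the usable training series by the same number of records, which is one reason the lag order is tuned rather than fixed.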

[Figure 6 plots: R2 versus lag order, max_depth and n_estimators; (a) Bokaa dam with Sets 1, 2, 3 and 7; (b) Gaborone dam with Sets 2, 4, 7 and 8.]

Figure 6. Hyperparameter tuning response for dam water level predictions using RFR model based
on lag order, max_depth and n_estimators for (a) Bokaa dam (top row), and (b) Gaborone dam
(bottom row). Reprinted with permission from ref. [27]. Copyright 2022 Society of Photo-Optical
Instrumentation Engineers.

Table 5. Bokaa and Gaborone dam RFR optimal hyperparameters after tuning for best
predictor datasets.

| Dam | Set | Best Descriptor Data Sets | Lag Order | Max_Depth | n_Estimators | R2 |
|---|---|---|---|---|---|---|
| Bokaa dam | Set-1 | Climate indices, Rainfall, Min-Avg-Max Temperatures | 12 | 11 | 52 | 0.840 |
| Bokaa dam | Set-2 | Min-Avg-Max Temperatures | 11 | 11 | 51 | 0.831 |
| Bokaa dam | Set-3 | All Variables | 2 | 20 | 50 | 0.824 |
| Bokaa dam | Set-7 | Rainfall, Min-Max Temperatures | 13 | 20 | 50 | 0.820 |
| Gaborone dam | Set-2 | Min-Avg-Max Temperatures | 1 | 28 | 379 | 0.914 |
| Gaborone dam | Set-4 | Rainfall | 1 | 28 | 379 | 0.817 |
| Gaborone dam | Set-7 | Rainfall, Min-Max Temperatures | 1 | 20 | 100 | 0.819 |
| Gaborone dam | Set-8 | Climate Indices | 2 | 20 | 350 | 0.563 |

The RFR hyperparameter tuning results show that the water level prediction in the
Bokaa dam required significantly higher lag orders than the Gaborone dam but relatively
shallower depth and fewer n_estimators or number of RFR trees (Table 5). The RFR training
results for the best datasets depict R2 > 0.82, with the exception of the Gaborone dam,
where the climate indices yield R2 = 0.563.

3.1.3. Training of MLP-ANN Model


The training of the MLP-ANN for predicting the water levels in the two dams was
based on the lag order, the network number of hidden layers, epochs, and batch sizes. The
tuning results for the dams are illustrated in Figure 7 and the summary statistics for the
best four predictor variables are presented in Table 6.
For MLP training, low lag orders, between 1–4, are required to train the ANN, with
the hidden layers varying from 2–4 (Table 6). The Bokaa dam required higher epochs, with
relatively lower batch sizes, to train the model compared to the Gaborone dam, with the
exception of the data set comprising min-avg-max temperatures for the Gaborone dam. The
difference between the MLP and RFR hyperparameter tuning is that MLP-ANN detected
the direct impact of rainfall (Set-4) on the Bokaa dam water level variability, while RFR
only detected it indirectly, in combination with temperature (Set-7). For the Gaborone dam,
RFR detected the direct impact of climate indices (Set-8), however, this was only captured
indirectly for the Bokaa dam using RFR with Set-1. The RFR and MLP-ANN results
indicate that the temporal variability of the dam water levels within the two catchments is
influenced by the climate indices and climate factors. The impact of LULC is not directly
related to the water levels but may contribute to the determination of demand and dam
operation regimes. The best predictor variables in Table 6 show high training output with
R2 > 0.83.
The variable responses observed in the hyperparameter tuning for water levels in
both dams, using RFR and MLP-ANN, shown in Figures 6 and 7, respectively, are attributed
to the systematic one-parameter-at-a-time tuning approach. For both models, each
hyperparameter is tuned with the previously determined optimal hyperparameters as inputs,
so that the final tuning step minimizes the model errors and yields the best-fit
results, as observed in the final tuning response curves in Figures 6 and 7.
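The one-parameter-at-a-time approach can be sketched as a sequential search in which each hyperparameter is swept while the others stay at their current best values. The scoring function below is a hypothetical stand-in for a validation R2 (in practice it would train and evaluate the RFR or MLP-ANN), and the grid ranges merely mirror the axis ranges in Figure 6; none of this is the study's code.

```python
# Sketch of sequential (one-parameter-at-a-time) hyperparameter tuning.


def tune_one_at_a_time(score, grid, start):
    """Sweep each hyperparameter in turn, keeping the best value found so far."""
    best = dict(start)
    for name, candidates in grid.items():
        best_value, best_score = best[name], score(best)
        for value in candidates:
            trial = dict(best, **{name: value})
            trial_score = score(trial)
            if trial_score > best_score:
                best_value, best_score = value, trial_score
        best[name] = best_value  # later sweeps reuse this optimum
    return best


def toy_score(params):
    """Hypothetical stand-in for validation R2, peaking at (2, 20, 50)."""
    return (1.0
            - 0.001 * (params["lag_order"] - 2) ** 2
            - 0.0001 * (params["max_depth"] - 20) ** 2
            - 0.000001 * (params["n_estimators"] - 50) ** 2)


grid = {
    "lag_order": range(1, 22),
    "max_depth": range(10, 100, 10),
    "n_estimators": range(50, 450, 50),
}
start = {"lag_order": 1, "max_depth": 10, "n_estimators": 50}
best = tune_one_at_a_time(toy_score, grid, start)
```

This search costs the sum, not the product, of the grid sizes, but it can miss the joint optimum when hyperparameters interact strongly; a full grid or randomized search is the usual alternative in that case.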
[Figure 7 plots: R2 versus lag order, number of hidden layers and number of epochs; (a) Bokaa dam with Sets 2, 3, 4 and 7; (b) Gaborone dam with Sets 1, 2, 4 and 7.]

Figure 7. Hyperparameter tuning for dam water level prediction using MLP-ANN model based on
lag order, number of hidden layers, and epochs for (a) Bokaa dam (top row), and (b) Gaborone dam
(bottom row). Reprinted with permission from ref. [27]. Copyright 2022 Society of Photo-Optical
Instrumentation Engineers.
Table 6. Bokaa and Gaborone dam optimal hyperparameters after tuning for MLP-ANN.

| Dam | Set | Best Descriptor Data Sets | Lag Order | Hidden_Layers | Epochs | Batch Size | R2 |
|---|---|---|---|---|---|---|---|
| Bokaa dam | Set-2 | Min-Avg-Max Temperatures | 2 | 3 | 700 | 5 | 0.865 |
| Bokaa dam | Set-3 | All Variables | 1 | 4 | 400 | 13 | 0.825 |
| Bokaa dam | Set-4 | Rainfall | 1 | 2 | 400 | 5 | 0.829 |
| Bokaa dam | Set-7 | Rainfall, Min-Max Temperatures | 1 | 5 | 300 | 5 | 0.850 |
| Gaborone dam | Set-1 | Climate indices, Rainfall, Min-Avg-Max Temperatures | 3 | 2 | 100 | 1 | 0.882 |
| Gaborone dam | Set-2 | Min-Avg-Max Temperatures | 1 | 2 | 600 | 13 | 0.914 |
| Gaborone dam | Set-4 | Rainfall | 3 | 2 | 100 | 15 | 0.920 |
| Gaborone dam | Set-7 | Rainfall, Min-Max Temperatures | 4 | 2 | 100 | 13 | 0.921 |

3.2. Dam Water Level Prediction Results
This section presents the standardized dam water prediction results for comparison
between the two dams. The RMSE, MAE, and MAPE are calculated on the inverse of
Equation (10) of the standardized datasets.

3.2.1. Prediction of Dam Water Levels Using MLR
The results for the prediction of the dam water levels using MLR are presented in
Table 7, and show that for both dams, Set-3, comprising all variables, Set-5 (LULC), and
Set-6 (LULC, Rainfall, Min and Max Temperatures) were the best predictors. For the Bokaa
dam, the highest R2 was 0.583, from Set-3 and Set-6, while the same sets yielded R2 = 0.841
for the Gaborone dam, and LULC (Set-5) had R2 of 0.785 for the Gaborone dam, compared
to 0.489 for the Bokaa dam. The rest of the predictor variables predicted the time-series
variability of the dam water levels at less than 50% accuracy in terms of R2. Since the same
regression fitting equation was used for training and testing the time-series dam water
levels, the MLR results were found to be similar, with very low prediction accuracy. Using
the same fit for the entire 19-year data gave better results, as presented in Figure 8, and
demonstrated the fact that more robust model(s), at both training and testing phases, are
required in the prediction of dam water levels.
Table 7. Performance of the different datasets as water level predictors using MLR.

| Predictor Set | RMSE (%) B-Dam | RMSE (%) G-Dam | R2 B-Dam | R2 G-Dam | MAE (%) B-Dam | MAE (%) G-Dam | MAPE (%) B-Dam | MAPE (%) G-Dam |
|---|---|---|---|---|---|---|---|---|
| Set-1 | 26.2 | 62.4 | 0.151 | 0.181 | 22.5 | 61.9 | 75.8 | 73.2 |
| Set-2 | 26.9 | 26.6 | 0.098 | 0.013 | 22.9 | 22.8 | 58.2 | 92.0 |
| Set-3 | 18.3 | 15.8 | 0.583 | 0.841 | 14.6 | 12.7 | 62.3 | 50.8 |
| Set-4 | 28.2 | 26.9 | 0.015 | 0.001 | 24.2 | 22.8 | 60.6 | 93.2 |
| Set-5 | 20.3 | 16.6 | 0.489 | 0.785 | 16.5 | 13.5 | 66.3 | 39.1 |
| Set-6 | 18.3 | 15.8 | 0.583 | 0.841 | 14.6 | 12.7 | 62.3 | 60.8 |
| Set-7 | 26.8 | 26.4 | 0.110 | 0.021 | 22.8 | 22.6 | 78.0 | 91.5 |
| Set-8 | 27.5 | 26.3 | 0.058 | 0.177 | 23.4 | 22.9 | 88.9 | 73.7 |

Figure 8. Dam water level predictions for Bokaa dam (top) and Gaborone dam (bottom) using
multivariate linear regression. Reprinted with permission from ref. [27]. Copyright 2022 Society of
Photo-Optical Instrumentation Engineers.

Despite the good predictions using LULC for the Gaborone dam, which impacts on
Set-3 and Set-6, the graphical plots in Figure 8 and the large RMSE, MAE, and MAPE show
that the linear MLR is not suitable for simulating and predicting the complex, seasonal,
and non-linear trends exhibited by the water levels in both dams. As such, the MLR results
confirm the hypothesis that more robust regression models are necessary for predicting
water levels in the dams.

3.2.2. VAR Prediction of Dam Water Levels


1. Bokaa Dam Water Level Prediction Using VAR
The dam water level predictions for the Bokaa dam were based on the predetermined
optimal training results for each dataset, shown in Table 4. The prediction results show
that only Sets-1, -3, -5, -6, and -8 presented the highest convergence for the Bokaa dam
(Table 8). Set-5, comprising the LULC classes, gave the highest R2, at 0.998. The second
highest (R2 = 0.975) predictor variable is Set-6, followed by Set-1 (R2 = 0.959), Set-3
(R2 = 0.928) and Set-8 (R2 = 0.916). In terms of climate indices and climate factors, Set-1
(R2 = 0.959; RMSE = 3.3%; MAE = 2.7%; MAPE = 14.3%) and Set-8 (R2 = 0.916; RMSE = 3.4%;
MAE = 2.8%; MAPE = 16.2%) gave the best results. Without the climate indices, the long-term
predictions of dam water levels using temperatures (Set-2), rainfall (Set-4), and their
combination show low prediction accuracy. The rainfall and temperature sets registered the
highest MAPE errors, of more than 50%. The good performance of the LULC is attributed
to the interpolation within the five years, which results in minimal variability within the
input data and, therefore, high accuracy.
Table 8. Accuracy statistics for water level predictions using VAR model for Bokaa dam and
Gaborone dam.

| Predictor Set | RMSE (%) B-Dam | RMSE (%) G-Dam | R2 B-Dam | R2 G-Dam | MAE (%) B-Dam | MAE (%) G-Dam | MAPE (%) B-Dam | MAPE (%) G-Dam |
|---|---|---|---|---|---|---|---|---|
| Set-1 | 3.5 | 2.7 | 0.959 | 0.995 | 2.7 | 2.2 | 14.3 | 36.9 |
| Set-2 | 27.9 | 52.9 | 0.157 | 0.181 | 23.1 | 49.9 | 87.4 | 88.3 |
| Set-3 | 4.9 | 0.2 | 0.928 | 0.998 | 3.4 | 0.1 | 21.9 | 1.5 |
| Set-4 | 43.7 | 86.7 | 0.167 | 0.116 | 34.9 | 73.7 | 41.0 | 45.9 |
| Set-5 | 0.7 | 0.6 | 0.998 | 0.999 | 0.2 | 0.2 | 1.5 | 3.3 |
| Set-6 | 4.4 | 0.2 | 0.975 | 0.876 | 3.0 | 0.1 | 28.9 | 1.5 |
| Set-7 | 42.8 | 1.3 | 0.291 | 0.858 | 32.1 | 0.9 | 40.6 | 15.4 |
| Set-8 | 3.4 | 0.7 | 0.916 | 0.929 | 2.8 | 0.6 | 16.2 | 8.0 |

The best results for the water level predictions in the Bokaa dam are presented in
Figure 9. The prediction results and the graphical fits show that, despite having the highest
performance accuracy, the predictor sets that combine LULC (Set-3 and Set-6) are not the
best predictor variables. This is particularly due to the inability of the model to capture the
dam water levels at the beginning of the prediction when LULC is used as the predictor factor.
These differences are captured within the dotted boxes in Figure 9, depicting a lack of
expected trends and patterns. From the graphical and statistical analysis, the best predictor
variables for the Bokaa dam water levels are Set-1 and Set-8, where Set-1 was influenced by
both climate factors and climate indices.
2. Gaborone Dam Water Level Prediction Using VAR
Using the VAR model, the prediction of the Gaborone dam water levels is found
to be significant using the four climate indices (Set-8), as shown in Table 8 and Figure 10
(R2 = 0.929; RMSE = 0.7%; MAE = 0.6%; MAPE = 8%). The rainfall and temperature climate
factors performed only marginally in predicting the Gaborone dam water levels, with R2 of less
than 0.3 and MAPE above 20%, while their combination in Set-7 yielded higher
prediction accuracy. Similarly, high prediction results were obtained using the
integration of the climate indices with rainfall and temperature in Set-1. The results for
the climate-based predictors are presented in Figure 10 for the Gaborone dam, with Set-3
including all parameters. By visually assessing the trends of the predictions within the
dotted boxes in Figure 10, it is empirically observed that climate indices gave the best
results. However, the results show that in the absence of climate factors, LULC can be
used to predict the water levels in the dams with good accuracy (R2 > 0.990; RMSE < 0.7%;
MAE < 0.3%; MAPE < 3.5%).
est performance accuracy, the predictor factors combined LULC (Set‐3 and Set‐6) are not
the best predictor variables. This is particularly due to the inability of the model to capture
the dam water levels at the beginning of the prediction using the LULC as the predictor
factor. These differences are captured within the dotted boxes in Figure 9, depicting a lack
Sustainability 2022, 14, 14934
of expected trends and patterns. From the graphical and statistical analysis, the best pre‐ 17 of 31
dictor variables for the Bokaa dam water levels are Set‐1 and Set‐8, where Set‐1 was influ‐
enced by both climate factors and climate indices.

Sustainability 2022, 14, x FOR PEERFigure


REVIEW9. Water level prediction for Bokaa dam using VAR model. Reprinted with permission
18 of 32
Figure 9. Water level prediction for Bokaa dam using VAR model. Reprinted with permission
from ref. [27]. Copyright 2022 Society of Photo‐Optical Instrumentation Engineers. from
ref. [27]. Copyright 2022 Society of Photo-Optical Instrumentation Engineers.
2. Gaborone Dam Water Level Prediction Using VAR
Using the VAR model, the prediction of the Gaborone dam water levels is found to be
significant using the four climate indices (Set-8), as shown in Table 8 and Figure 10
(R2 = 0.929; RMSE = 0.7%; MAE = 0.6%; MAPE = 8%). The rainfall and temperature climate
factors performed marginally in predicting the Gaborone dam water levels, with R2 of less
than 0.3 and MAPE above 20%, while their combination in Set-7 yielded higher prediction
accuracy. Similarly, high prediction accuracy was obtained by integrating the climate
indices with rainfall and temperature in Set-1. The results for the climate-based predictors
are presented in Figure 10 for the Gaborone dam, with Set-3 including all parameters. By
visually assessing the trends of the predictions within the dotted boxes in Figure 10, it is
empirically observed that climate indices gave the best results. However, the results show
that in the absence of climate factors, LULC can be used to predict the water levels in the
dams with good accuracy (R2 > 0.990; RMSE < 0.7%; MAE < 0.3%; MAPE < 3.5%).

Figure 10. Water level predictions for Gaborone dam using VAR model. Reprinted with permission
from ref. [27]. Copyright 2022 Society of Photo-Optical Instrumentation Engineers.

3.2.3. RFR Simulation and Prediction


1. RFR Prediction of Bokaa Dam Water Levels
The RFR prediction results for the Bokaa dam show that all the datasets are suitable
for predicting the water levels, with R2 > 0.8. LULC (Set-5) presented the lowest prediction
accuracy with R2 = 0.807, and the best four predictors were Set-2 of all the temperatures,
followed by Set-3, Set-7, and Set-1, with R2 of 0.836, 0.829, 0.824, and 0.820, respectively. The
corresponding RMSE varied between 11.3% and 12.5%, with an MAE average of approximately
7% and MAPE of approximately 13%. Figure 11 presents the predictions for the four best
predictor variable sets. The results from Set-2 and Set-7 comprise temperatures and rainfall
and depict that RFR captured the relationship between the dam water levels and the climate
factors (rainfall and temperature). The analysis of the prediction trends confirms Set-2 and
Set-7 as the most suitable for predicting the dam water levels, as illustrated within the
dotted boxes, where the predictor variables are able to capture the temporal trends of the
measured dam water levels.

Figure 11. Observed and RFR predicted water levels for Bokaa dam. Reprinted with permission from
ref. [27]. Copyright 2022 Society of Photo-Optical Instrumentation Engineers.

2. RFR Prediction of Gaborone Dam Water Levels
Using the optimal RFR hyperparameters for predicting the water levels in the Gaborone
dam, Table 9 shows Sets-2, -4, -7, and -8 presenting the best results, with R2 values of
0.918, 0.898, 0.897, and 0.890, respectively. The datasets comprise temperature, rainfall,
their combination, and the climate indices, respectively. The RMSE is observed to be lower
for the Gaborone dam than the Bokaa dam, ranging between 9.7% and 11.4%, while the
MAE averages were at 6.5% of dam water levels and MAPE is higher, at between 23% and
38%. The LULC-based prediction results show that, despite the positive correlation of more
than 65% with the dam water levels, LULC does not capture the temporal seasonality and
variability of the dam water levels (Figure 12). The results in Table 9 and Figure 12 depict
that RFR is able to predict the water levels in the Gaborone dam using the climate factors,
with the temperatures (Set-2) being the best climate factor, followed by rainfall (Set-4). The
combination of temperature and rainfall marginally reduces the influence of the predictive
ability of temperatures by nearly 10%, to R2 of 0.898. The climate indices (Set-8) display a
significant impact on the water levels in the Gaborone dam, with R2 = 0.890. The dotted
box regions in Figure 12 show the inability of RFR to accurately predict the temporal trends
in the Gaborone dam water levels.
Table 9. Accuracy statistics for water level predictions using the RFR model for Bokaa (B-Dam)
and Gaborone (G-Dam) dams.

Predictor   RMSE (%)          R2                MAE (%)           MAPE (%)
Set         B-Dam    G-Dam    B-Dam    G-Dam    B-Dam    G-Dam    B-Dam    G-Dam
Set-1       12.2     10.6     0.820    0.884    7.8      6.5      18.3     25.0
Set-2       11.3     9.8      0.836    0.918    7.1      5.6      14.5     23.7
Set-3       11.9     10.1     0.829    0.816    7.1      6.8      12.1     30.7
Set-4       12.5     10.9     0.811    0.898    7.9      6.1      12.9     24.8
Set-5       12.5     11.3     0.807    0.653    8.0      7.1      12.8     37.5
Set-6       12.3     10.9     0.815    0.782    7.9      6.9      17.6     30.3
Set-7       12.3     10.9     0.824    0.897    7.2      6.1      13.3     25.2
Set-8       12.6     11.3     0.808    0.890    8.0      6.4      18.3     23.6

Figure 12. Observed and RFR predicted water levels for Gaborone dam. Reprinted with permission
from ref. [27]. Copyright 2022 Society of Photo-Optical Instrumentation Engineers.

3.2.4. MLP-ANN Simulation and Prediction
1. Bokaa Dam Water Level Prediction Using MLP-ANN
With the rectified linear unit (ReLU) activation function, Adam optimizer, and a learning
rate of 0.0003, the results for predicting water levels in the Bokaa dam are presented in
Table 10. The local temperature (Set-2) is linked to the dam water levels with the highest
R2 of 0.865, and the lowest RMSE = 10.9% and MAE = 6.5%. The combination of temperature and
rainfall (Set-7) is second, with R2 of 0.850, followed by rainfall (R2 = 0.829). Climate indices
(Set-8) also influenced the dam water levels with R2 of 0.805 and the least MAPE = 13.2%. LULC
had the least influence on the dam water levels, with MAPE of 27.7%, and its combination
with the other parameters in Set-6 further reduced the accuracy, with MAPE = 56.6% and
R2 = 0.449.
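
As a brief illustration of the training configuration named above, the following Python sketch
shows a ReLU activation and a single Adam update for one scalar weight. Only the learning rate
of 0.0003 is taken from the text; the decay rates (0.9 and 0.999), the epsilon value, and the
function names are assumptions based on commonly used defaults, not settings reported in this
study.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, clips negatives to zero."""
    return x if x > 0.0 else 0.0


def adam_step(theta, grad, m, v, t, lr=0.0003, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    theta: current parameter value; grad: gradient of the loss at theta;
    m, v: running first and second moment estimates; t: step count (from 1).
    beta1, beta2 and eps are assumed defaults, not values from the paper.
    Returns the updated (theta, m, v).
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# On the first step the update size is close to the learning rate itself.
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

With such a small learning rate each step moves a weight by roughly 0.0003 at most, which is
one reason a network of this kind can need many epochs to converge.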
Table 10. Performance accuracy for water level predictions using MLP-ANN for Bokaa (B-Dam)
and Gaborone (G-Dam) dams.

Predictor   RMSE (%)          R2                MAE (%)           MAPE (%)
Set         B-Dam    G-Dam    B-Dam    G-Dam    B-Dam    G-Dam    B-Dam    G-Dam
Set-1       14.6     9.8      0.717    0.917    10.7     5.6      17.1     24.2
Set-2       10.9     9.3      0.865    0.925    6.5      4.9      24.6     30.8
Set-3       15.2     45.4     0.704    −7.801   10.7     40.9     23.4     57.7
Set-4       11.6     9.3      0.829    0.926    6.7      4.5      25.3     24.1
Set-5       15.8     42.6     0.627    −5.790   11.7     38.7     27.7     76.9
Set-6       17.3     28.7     0.449    −2.049   13.5     24.9     56.6     38.9
Set-7       11.3     9.8      0.850    0.920    6.8      4.7      17.7     24.7
Set-8       12.1     18.2     0.805    0.407    8.4      13.7     13.2     38.0


The performances for the best predictor variables within the box window time regions
in Figure 13 show that local temperature (Set-2) and rainfall (Set-4) exhibit similar and best
prediction trends with MAEs of approximately 6.5% and MAPE of 25%.

Figure 13. MLP-ANN prediction of dam water levels for Bokaa dam.
2. Gaborone Dam Water Level Prediction Using MLP-ANN
For the Gaborone dam, all the predictor datasets with LULC (Sets-3, -5, and -6) did not
converge to predict the dam water levels (Table 10). This further confirms the observations
in MLR and RFR, where LULC recorded low correlations with dam water levels. The best
performing sets in predicting dam water levels for the Gaborone dam were Set-4, rainfall
(0.926), performing equally with Set-2 (0.925), then Set-7 (0.920), and Set-1 (0.917). The
results show a positive response of the dam water levels to rainfall, temperature, and to
the climate indices with an average low RMSE of less than 10%, R2 > 0.91, and the least
MAE, at approximately 5% on average. The dotted boxes in Figure 14 show the differences in
the dam water predictions for the Gaborone dam. In comparison to Set-7, Sets-1, -2, and -4
present good initial estimations of the dam water level. The MLP-ANN results improved on the
ability of RFR to detect near-linear trends, with Set-2 (temperature) presenting the best
empirical and statistical predictions (Figure 14).

Figure 14. MLP-ANN prediction of dam water levels for Gaborone dam.

3.3. Relative Importance of the Predictor Variables


For the Bokaa dam, Figure 15 presents the relative importance of each variable in the
predictor groups and compares all the factors. Comparing the variables, the tree-cover
and shrubland exhibited the highest correlation with dam water levels (slightly more than
50% influence), followed by the maximum temperature and Niño 3.4. The least contributions
are from bare soil, built-up and grassland, with the significance of rainfall and aridity
index being negligible. The significance of the predictor variables indicates that within the
Bokaa catchment, the degree of vegetation index and the regional temperature have
higher correlations with the Bokaa dam water capacity. For the Gaborone dam (Figure 15),
grassland and water bodies exhibit the highest significance, followed by cropland and bare
soil, with the rest of the parameters contributing less than 2% each. The aridity index and
rainfall are observed to have the least contributions toward predicting the Gaborone dam
water levels. While grassland has negligible contributions to dam water levels in the Bokaa
dam, it has the highest significance for water capacity in the Gaborone dam, accounting for
nearly 48% significance. Similar to the Bokaa dam, the significance of vegetation health is
observed to have higher correlations with the dam water levels in the Gaborone dam.
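
The percentages quoted in this section are relative shares within a set of predictors. The
section does not state how the raw importance scores were obtained, so the Python sketch below
only illustrates the final normalization step, under the assumption that each predictor has a
non-negative raw score. The function name and the example scores are hypothetical placeholders,
not values from this study.

```python
def relative_importance(raw_scores):
    """Convert non-negative raw importance scores into percentage shares.

    raw_scores: dict mapping a predictor name to its raw score.
    Returns a list of (name, share_in_percent) sorted from largest to smallest.
    """
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    shares = {name: 100.0 * score / total for name, score in raw_scores.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores for illustration only.
ranking = relative_importance({"tree_cover": 3.0, "rainfall": 1.0})
```

Because the shares always sum to 100%, a variable's percentage depends on which other variables
are in the same group, which is why the same predictor can carry different percentages in the
all-variable comparison and in the within-group comparison.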

Figure 15. Relative importance of the predictor variables for Bokaa and Gaborone dam water
levels. Reprinted with permission from ref. [27]. Copyright 2022 Society of Photo-Optical
Instrumentation Engineers.

Investigating the predictor data groups for the Bokaa dam, in terms of the catchment
LULC, tree-cover has the most influence in predicting the dam water levels, accounting
for more than 50%; built-up, bare-soil, and grassland have the least contribution, with the
significance of each at less than 1%. Among the climate factors, maximum temperature has
the highest contribution, at 34%, with rainfall at 22% for the Bokaa catchment. Among the
climate indicators, Niño 3.4 has the highest contribution in predicting the dam water
levels in the Bokaa dam, at 28%. For the Gaborone dam, the existence of water bodies and
grassland is most important for predicting the dam water levels, with up to 32%. The climate
factors exhibit competing significance, ranging between 21% and 25%, with minimum temperature
and rainfall as the most significant climate factors. For the climate indices, Niño 3.4 has
the highest significance, at 42%, with AI and DSLP being the least, with a nearly equal
relative importance of 17%.
The relative importance measures, shown in Figure 16, depict the sensitivities of the
predictor variables. The results show that for both dams, LULC forms part of the most
significant predictor variables; therefore, more accurate catchment LULC, in terms of
high temporal resolution and actual classification accuracy, is important in predicting dam
water levels for both dams. The parametric sensitivities in Figures 15 and 16 also imply
that the prediction model should be able to capture the influences of both the high and low
significant variables.

Figure 16. Relative significance of predictor variables within LULC, climate factors and climate
indices for Bokaa and Gaborone dam water levels.

4. Discussions
The present study compares the performance of the stochastic VAR and the machine
learning RFR and MLP-ANN models. The performances of each prediction horizon are
compared using the average MAE, RMSE, and MAPE estimates and the R2 statistics as
a goodness-of-fit of the models. The metrics are considered to adequately measure the
prediction accuracy and depict how well the model generalizes the unseen or test data.
To determine the best predictor variables and to gauge the sensitivity of the models to
the inputs, different exogenous input combinations were explored, and the results were
compared using the above statistical indicators.
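
For reference, the four accuracy statistics can be computed as in the following Python sketch.
It assumes the standard definitions, with R2 taken as the coefficient of determination
(1 minus the ratio of residual to total sum of squares); the exact formulas used in this study
are given in its methods section and may differ in detail. The function name and the sample
values in the usage note are illustrative only.

```python
import math


def accuracy_metrics(observed, predicted):
    """Return RMSE, MAE, MAPE (%) and R2 for two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so zeros in `observed` are not allowed.
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # R2 is negative whenever the model does worse than predicting the mean.
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}
```

For example, `accuracy_metrics([80.0, 90.0, 100.0], [82.0, 88.0, 100.0])` gives an R2 of 0.96.
Under this definition R2 has no lower bound, which is consistent with the negative R2 values
reported in Table 10 for the sets that did not converge.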
4.1. Influence of the Predictor Variables on Dam Water Level Predictions
4.1.1. Impact of LULC on Water Level Predictions
The current study reveals the significance of LULC in predicting dam water levels
as detected by the tested models. The assumption in the five-year time epoch used in the
LULC temporal resolution is that there are insignificant changes in the natural land-covers
such as water bodies, grasslands, shrublands, forests, bare soils, and land-use such as
croplands. However, significant changes are expected in urban built-up, although at a slow
spatial and temporal rate. Only the stochastic VAR detected the correlation and variability
between the dam water levels and LULC, and predicted the dam water levels with LULC
as the best predictor variable, with the highest accuracy of greater than 99%. The prediction
results using MLR, RFR, and MLP-ANN showed that the LULC pattern, as interpolated
over the 20-year period, may not be suitable for predicting the dam water levels for both
dams as it exhibited high RMSE, MAE, and MAPE errors. For the Gaborone dam, the use of
LULC resulted in a lack of convergence in prediction using the MLP-ANN. To improve the
significance of LULC in dam water predictions, it is recommended to increase the temporal
resolution of the LULC to annual.
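
To make the effect of the five-year epochs concrete, the Python sketch below fills the years
between epochs by linear interpolation. The interpolation scheme is an assumption made here
for illustration, since the text only states that the LULC pattern was interpolated over the
20-year period; the function name and the example areas are hypothetical.

```python
def interpolate_annual(epochs):
    """Linearly interpolate {year: value} epoch data to every year in range.

    epochs: dict mapping an epoch year to a LULC value (for example, the
    percentage of the catchment under one land-cover class).
    Returns a dict with one value per year from the first to the last epoch.
    """
    years = sorted(epochs)
    annual = {}
    for start, end in zip(years, years[1:]):
        span = end - start
        for year in range(start, end):
            weight = (year - start) / span
            annual[year] = epochs[start] + weight * (epochs[end] - epochs[start])
    annual[years[-1]] = epochs[years[-1]]
    return annual


# Hypothetical class coverage (%) at three five-year epochs.
annual_cover = interpolate_annual({2000: 10.0, 2005: 20.0, 2010: 15.0})
```

A series built this way is piecewise linear and carries no seasonal signal, which may help
explain why a linear model such as VAR tracks it well while it adds little to models that
respond to seasonal variability.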

4.1.2. Influence of Climate Factors and Climate Indices


In predicting Bokaa dam water levels using the VAR model, the combination of climate
indices, rainfall, and temperature gave the best results (R2 = 0.959, MAPE = 14.3%). This is
attributed to the high correlation with the climate indices (R2 = 0.916, MAPE = 16.2%), which
resulted in good performance of all the parameters combined. Rainfall and temperature,
however, did not give good results. RFR detected a higher relationship of the dam water
levels using temperature (R2 = 0.836, MAPE = 14.5%), the combination of temperature and
rainfall (R2 = 0.824, MAPE = 13.3%), the climate indices (R2 = 0.808, MAPE = 18.3%), and
the combination of climate indices, rainfall, and temperature (R2 = 0.820, MAPE = 18.3%).
The MLP-ANN results for the Bokaa dam water levels show temperature (R2 = 0.865,
MAPE = 24.6%), rainfall (R2 = 0.829, MAPE = 25.3%), the combination of rainfall and
temperature (R2 = 0.850, MAPE = 17.7%), and climate indices (R2 = 0.805, MAPE = 13.2%)
are directly related to the Bokaa dam water levels.
For the Gaborone dam, VAR predicted the dam water levels using the combined influences
of rainfall and temperature (R2 = 0.858, MAPE = 15.4%), climate indices
(R2 = 0.929, MAPE = 8.0%), and climate indices, rainfall and temperatures (R2 = 0.995,
MAPE = 36.9%). Using RFR, the dam water level trends were best predicted using the local
temperature observations (R2 = 0.918, MAPE = 23.7%), rainfall (R2 = 0.898, MAPE = 24.8%),
integrated temperatures and rainfall (R2 = 0.897, MAPE = 25.2%), and climate indices
(R2 = 0.890, MAPE = 23.6%). Using MLP-ANN, similar results as RFR were observed,
with local temperatures (R2 = 0.925, MAPE = 30.8%), rainfall (R2 = 0.926, MAPE = 24.1%),
integrated temperatures and rainfall (R2 = 0.920, MAPE = 24.7%), and climate indices,
rainfall and temperatures (R2 = 0.917, MAPE = 24.2%).
While the VAR predictor variables are different for the two dams, with the exception of
a combination of climate indices, rainfall and temperature, the predictor parameters for the
Gaborone dam are observed to be similar to those of the Bokaa dam. It is observed that the
predictions using RFR and MLP-ANN detected the variability of both dam water levels to be
influenced by the same factors. For both dams using RFR and MLP-ANN, the results show
that the climate factors and climate indices are the best predictors for dam water levels and
are best modelled using MLP-ANN, which had the highest prediction accuracy, compared
to RFR. The results further show that in the absence of reliable rainfall and temperature
data, the water levels in both dams can reliably be predicted using the machine learning
models based on the regional climate indices (DSLP, AI, SOI and Niño 3.4).
From the analysis of the significance of the predictor variables in Figure 15, the
relatively lower contribution of rainfall in the prediction of dam water levels suggests that
precipitation and the resulting runoff within the catchment may not be the only main sources
of dam water, and that there are also marginal contributions from conjunctive water sources,
such as wellfields and other dams. As such, improvements in the prediction of the dam water
levels should include the determination of the influences of the network of inter-reservoir
water transfers.

4.1.3. Model Performances


In general, MLR was not able to detect and predict the variability of the dam water
levels. The lower performance of VAR in detecting the influence of the seasonal climate
factors and climate indices on the variability of the dam water levels is attributed to its
low convergence rate: the convergence tends to be unstable, and the predictions easily fall
into the local optimum trap, with an increase in the computational time, especially for the
non-stationary variables [32]. On the other hand, the main advantage of the RFR machine
learning, resulting in generally good results with all the variables, is in the ability to
detect and discard the outlier dam water levels with ease due to the improved grouping of
water level data contained in the set of terminal nodes in the decision tree. The results
from MLR, VAR and RFR imply that the fluctuations in the water level in the dams are difficult
to capture using the stochastic linear models [33]. The advantage of RFR, and why it was able
to give relatively good results, is that it can handle non-linear and non-Gaussian data well
and with minimal over-fitting problems as the number of trees increases [34].
MLP-ANN results support the suggestion that data-driven techniques tend to overcome
the drawbacks of traditional models in terms of accuracy and the ability to model
complex phenomena [35]. MLP-ANN was able to capture the influence of climate factors
and climate indices with higher accuracy, though it had non-converging prediction using
LULC. For the two dams, it is possible to infer that the MLP-ANN predictions adapted
to the changing climate conditions. The advantages of the ANNs over other methods in
predicting dam levels can be attributed to the fact that the ANN structure can detect and
include the non-linear components of the system in the whole data set. Comparatively,
in predicting reservoir water levels for the Angat dam in the Philippines, [25] tested the
Naïve-persistence and Seasonal Mean methods as baselines against ARIMA, gradient boosting
machines (GBM), and Deep Neural Networks based on LSTM, univariate (DNN-U)
and multivariate models (DNN-M). The results showed that the prediction of the dam
water levels was better performed using the data-driven Deep Neural Network and not the
traditional linear models.

4.2. VAR-ANN Hybrid Dam Water Prediction Model


The results show that neither the stochastic VAR, the decision tree based RFR, nor the
MLP-ANN can independently detect the compounded impacts of LULC, climate factors,
and climate indices in predicting the dam water levels. In particular, the stochastic VAR is
observed to be more capable of predicting the dam water levels using LULC, which exhibits
a linear trend from the five-year interval interpolations, while MLP-ANN performed better
than RFR and VAR in predicting the dam water levels using the seasonal and non-linear
climate factors and indices.
Since time-series hydrological data comprises different frequency components
characterized by non-linear interactions, hybrid models have been proposed to improve the
performance in hydrological prediction [36]. These approaches include Neural Networks
based on Set Pair Analysis (SPA) and Principal Component Analysis (PCA) [37,38], Chaotic
Neural Networks [39], Cluster Hybrid Neural Networks [40], and Bootstrapped Artificial
Neural Networks [41,42].
From the prediction results, a hybrid dam water level prediction model comprising
VAR-ANN is proposed as optimal in modeling the linear and non-linear components of the
dam water levels. The VAR-ANN time-series representation of the dam water levels $WL_t$ is
proposed to comprise the linear $L_t$ and non-linear $N_t$ components (Equation (11)).

$$WL_t = L_t + N_t \qquad (11)$$

$$\varepsilon_t = WL_t - \hat{L}_t \qquad (12)$$

$$\hat{N}_t = f\left(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-p}\right) + e_t \qquad (13)$$

$$\widehat{WL}_t = \hat{L}_t + \hat{N}_t \qquad (14)$$

In the implementation, VAR is fitted to the linear components and the resulting linear
predictions $\hat{L}_t$ at time t are derived. The residuals from the VAR, termed $\varepsilon_t$ at
time t, are determined as in Equation (12). The $\varepsilon_t$ dataset after VAR fitting is considered
to contain the non-linear $N_t$ time-series components of the dam water levels $WL_t$ and can
be modelled using the ANN. With p input nodes, the ANN for residuals has the form
in Equation (13), with f as the non-linear function estimated by the ANN and $e_t$ as the
white noise. If $\hat{N}_t$ is the ANN prediction, then the hybrid prediction at time t is defined
according to Equation (14). The hybrid VAR-ANN model is implemented as depicted in
Figure 17.
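
The two-stage logic of Equations (11) to (14) can be sketched in a few lines of Python. This is
a minimal illustration under simplifying assumptions, not the authors' implementation: a
univariate least-squares AR(1) fit stands in for the VAR stage, and a k-nearest-neighbour
average over lagged residuals stands in for the ANN. All function names and default values are
hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1] (stand-in for the VAR stage)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def knn_predict(history, query, k=3):
    """Average the targets of the k stored lag-vectors closest to `query`.

    history: list of (lag_vector, target) pairs. This simple non-parametric
    regressor stands in for the non-linear function f of Equation (13).
    """
    def squared_distance(item):
        return sum((u - v) ** 2 for u, v in zip(item[0], query))

    nearest = sorted(history, key=squared_distance)[:k]
    return sum(target for _, target in nearest) / len(nearest)


def hybrid_forecast(series, p=2, k=3):
    """One-step-ahead forecast: linear prediction plus residual correction."""
    a, b = fit_ar1(series)
    linear_fit = [a + b * prev for prev in series[:-1]]
    # Residuals of the linear stage, as in Equation (12).
    residuals = [obs - fit for obs, fit in zip(series[1:], linear_fit)]
    # Training pairs: p lagged residuals mapped to the next residual.
    history = [(residuals[i - p:i], residuals[i]) for i in range(p, len(residuals))]
    linear_next = a + b * series[-1]
    nonlinear_next = knn_predict(history, residuals[-p:], k)
    # Hybrid prediction, as in Equation (14).
    return linear_next, nonlinear_next, linear_next + nonlinear_next
```

When the series is exactly linear in its own lag, the residuals are close to zero and the hybrid
forecast reduces to the linear forecast; the second stage only contributes when the linear
stage leaves structure in its residuals.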

Figure 17. VAR-ANN hybrid model for dam water level prediction.

From the best predictor variables for both dams, the average results of the hybrid VAR-ANN model for the two dams, presented in Figure 18, show an overall improvement in the prediction accuracy of the dam water levels. The results show that the hybrid model integrates the linear and non-linear variabilities in the predictor datasets to accurately predict the dam water levels. The VAR-ANN produces positive predictions using rainfall, temperature, climate indices, and LULC, with an average R2 > 0.84 and MAPE < 10%. The results show that the average prediction RMSE, MAE, and MAPE error measures for both dams are also significantly reduced. The results imply that the hybrid model is able to capture the parametric sensitivities of both the high and low significant variables that are depicted in Figures 15 and 16.
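For readers who wish to reproduce the accuracy measures quoted above, the following sketch computes RMSE, MAE, MAPE and R2 using their standard textbook definitions. It assumes R2 is the coefficient of determination (the study may instead use the squared correlation coefficient), and the function name and sample values are hypothetical.

```python
import math


def error_metrics(observed, predicted):
    """Standard accuracy measures; R2 is taken here as the coefficient of determination (an assumption)."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)                    # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)       # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MAPE": 100.0 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


# Illustrative values only (not taken from the study).
print(error_metrics([100.0, 200.0], [110.0, 190.0]))
```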

4.3. Average Model Errors and ROC Area under Curve (AUC)
The average model prediction error E (%) in Equation (15) is determined as the average for both dams using the best predictor parameters with the highest R2 and the lowest RMSE, MAE, and MAPE error measures. Figure 19 shows the average dam water level prediction errors for the four models: the combination of the VAR and ANN reduces the magnitude of the prediction error between the predicted and observed dam water levels for the two dams, producing the lowest errors for the predicted time-series dam water levels and thus improved consistency in predicting the water levels.

$$E = \left(\frac{WL_{\mathrm{predicted}} - WL_{\mathrm{observed}}}{WL_{\mathrm{observed}}}\right) \times 100\% \qquad (15)$$
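As a worked illustration of Equation (15), the signed percentage error can be computed per time step as below. The function name and the sample water levels are hypothetical.

```python
def prediction_error_percent(predicted, observed):
    """Signed percentage error E (%) of a predicted water level, following Equation (15)."""
    if observed == 0:
        raise ValueError("observed water level must be non-zero")
    return 100.0 * (predicted - observed) / observed


# Illustrative values only: an over-prediction and an under-prediction.
print(prediction_error_percent(105.0, 100.0))
print(prediction_error_percent(95.0, 100.0))
```

A positive E indicates over-prediction and a negative E indicates under-prediction of the observed level.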

In the first months, the E (%) for VAR-ANN is observed to be between −5% and +8%
of dam water levels and diminishes to nearly 0.01% for more than 70% of the predicted dam
water levels. Even though MLP-ANN performs better than RFR and VAR, its prediction
errors exhibit low convergence with sinusoidal patterns in time, and this could be attributed
to the influence of LULC. RFR and VAR present higher degrees of error at about 5–10%,
with VAR exhibiting random spikes in error with time.
To further assess the significance of the models, the area under the receiver operating characteristic curve (AUC) scores were computed for the two dams, with the results in Figure 20. The AUC scores are also based on the average true positive rate (sensitivity) and false positive rate (1 − specificity) measures from the average of the best predictor variables for the dams. The results in Figure 20 show that for the Bokaa dam and Gaborone dam, VAR-ANN had the highest AUC scores, 0.89 and 0.93, performing better than MLP-ANN and RFR. The AUC scores for RFR were nearly equal, at 0.77 and 0.78, respectively, for the Bokaa dam and Gaborone dam, while VAR performance was at AUC < 0.7 for both dams. Despite the good performance from VAR-ANN, the MAPE measures for the Gaborone dam were observed to be higher than those of the Bokaa dam. The average AUC shows that the VAR-ANN has a higher ability to predict the dam water levels from all the predictor variables.
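The text does not state how the continuous water levels were converted into the binary classes that a ROC analysis requires. The sketch below therefore assumes, purely for illustration, that observed levels are labelled positive when they reach a chosen threshold and that the predicted levels serve as the ranking scores. The AUC is then the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. All names, the threshold and the values are hypothetical.

```python
def binarise(levels, threshold):
    """Label a level 1 if it reaches the threshold, else 0 (assumed labelling rule)."""
    return [1 if v >= threshold else 0 for v in levels]


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only (not the study's data).
observed = [8.0, 9.5, 10.2, 11.0, 9.0, 10.8]
predicted = [8.4, 10.1, 9.9, 11.2, 9.2, 10.5]
labels = binarise(observed, threshold=10.0)
print("AUC:", roc_auc(labels, predicted))
```

The pairwise form is quadratic in the number of samples, which is acceptable for monthly series of the length considered here; a rank-based formulation would be preferable for large datasets.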

Figure 18. VAR-ANN model average results for dam water level predictions in Bokaa and Gaborone dams.

Figure 19. Mean dam water level prediction errors E (%) from VAR-ANN, MLP-ANN, RFR and VAR.


Figure 20. ROC curves for VAR, RFR, ANN and VAR-ANN and the respective AUC scores.

5. Conclusions

Under the influence of climate change and the intensification of land-use activities, understanding dam water capacity variations is important for planning dam water supply regimes and management. In the present study, dam water level observations in the Bokaa dam and Gaborone dam, in semi-arid Botswana, were simulated and predicted using multivariate linear regression (MLR) and stochastic Vector AutoRegression (VAR) models, along with Random Forest Regression (RFR) and Multilayer Perceptron Neural Network (MLP-ANN) techniques. Using LULC, climate factors (rainfall and temperature) and climate indices (DSLP, Aridity Index (AI), SOI and Niño 3.4) as the dam water predictor variables, the results show that the stochastic VAR was able to detect the variation of LULC with dam water levels better than MLR, RFR and MLP-ANN, while RFR and MLP-ANN captured the relationships with the climate conditions, with MLP-ANN performing better than RFR. The stochastic VAR was not able to correlate rainfall and temperature with the dam water levels, except when integrated with the four climate indices. RFR and
MLP-ANN gave the most accurate dam water level predictions using rainfall, temperature, and the climate indices. MLP-ANN gave the best prediction results for the dam water level fluctuations for both dams, with the Gaborone dam predictions being more accurate than those for the Bokaa dam in terms of R2, but slightly less accurate in terms of MAPE. The higher MAPE for the Gaborone dam confirmed that the dam does not entirely rely on precipitation, but also on conjunctive water sources, including periodic direct supply from the Bokaa dam and wellfields. The proposed VAR-ANN hybrid model improved the prediction accuracy of the dam water levels for both dams by integrating the linear and non-linear variabilities in the predictor datasets and the dam water levels. To improve on the current study, first, the temporal resolution of the LULC data should be increased to annual intervals in order to accurately capture the seasonal variabilities in the LULC; secondly, the contributions of water sources from wellfields and other dams should be incorporated into the prediction modeling. To address the low convergence in the simulation and prediction of the dam water levels, faster and hybrid tree-based machine learning algorithms are recommended for further investigation.

Author Contributions: Conceptualization, Y.O.O., D.B.M. and G.A.; Funding acquisition, Y.O.O. and
J.Q.; Investigation, B.N., P.O. and B.P.P.; Methodology, Y.O.O., D.B.M. and G.A.; Project administration,
B.N., P.O., B.P.P. and J.Q.; Resources, B.N., B.P.P. and J.Q.; Writing—original draft, Y.O.O. and D.B.M.
All authors have read and agreed to the published version of the manuscript.
Funding: This research project was funded by the Office of Research and Development (ORD) of the
University of Botswana and by USAID Partnerships for Enhanced Engagement in Research (PEER)
under the PEER program cooperative agreement number: AID-OAA-A-11-00012.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data sources for this study are summarized as: (1) Landsat-
image from USGS Earth Explorer (https://earthexplorer.usgs.gov/, accessed on 17 November 2021);
(2) DEM from ALOS PALSAR (https://search.asf.alaska.edu/#/, accessed on 17 November 2021);
(3) precipitation and temperature data from the Department of Meteorological Services (Botswana);
and (4) dam reservoir water level data from the Department of Water and Sanitation (Botswana) and
Water Utilities Corporation (WUC) (Botswana). The rest of the data used in the study are presented
in this paper.
Acknowledgments: The authors wish to thank the Department of Water and Sanitation (Botswana)
for providing the measured dam water levels.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Garza-Díaz, L.E.; DeVincentis, A.J.; Sandoval-Solis, S.; Azizipour, M.; Ortiz-Partida, J.P.; Mahlknecht, J.; Cahn, M.; Medellín-
Azuara, J.; Zaccaria, D.; Kisekka, I. Land-use optimization for sustainable agricultural water management in Pajaro Valley,
California. J. Water Resour. Plan. Manag. 2019, 145, 05019018. [CrossRef]
2. Wantzen, K.M.; Rothhaupt, K.O.; Mörtl, M.; Cantonati, M.; Tóth, L.G.; Fischer, P. Ecological effects of water-level fluctuations
in lakes: An urgent issue. In Ecological Effects of Water-Level Fluctuations in Lakes; Springer: Dordrecht, The Netherlands, 2008;
pp. 1–4.
3. Hu, W.; Zhai, S.; Zhu, Z.; Han, H. Impacts of the Yangtze River water transfer on the restoration of Lake Taihu. Ecol. Eng. 2008, 34,
30–49. [CrossRef]
4. Mosavi, A.; Ozturk, P.; Chau, K. Flood prediction using machine learning models: Literature review. Water 2018, 10, 1536.
[CrossRef]
5. Khan, M.S.; Coulibaly, P. Application of support vector machine in lake water level prediction. J. Hydrol. Eng. 2006, 11, 199–205.
[CrossRef]
6. Altunkaynak, A. Forecasting surface water level fluctuations of Lake Van by artificial neural networks. Water Resour. Manag.
2007, 21, 399–408. [CrossRef]
7. Lai, X.; Jiang, J.; Liang, Q.; Huang, Q. Large-scale hydrodynamic modeling of the middle Yangtze River Basin with complex
river–lake interactions. J. Hydrol. 2013, 492, 228–243. [CrossRef]
8. Li, Y.; Zhang, Q.; Werner, A.; Yao, J. Investigating a complex lake–catchment–river system using artificial neural networks: Poyang
Lake (China). Hydrol. Res. 2015, 46, 912–928. [CrossRef]
9. Zaji, A.H.; Bonakdari, H.; Gharabaghi, B. Reservoir water level forecasting using group method of data handling. Acta Geophys.
2018, 66, 717–730. [CrossRef]
10. Kumar, R.; Singh, M.P.; Roy, B.; Shahid, A.H. A comparative assessment of metaheuristic optimized extreme learning machine
and deep neural network in multi-step-ahead long-term rainfall prediction for all-Indian regions. Water Resour. Manag. 2021, 35,
1927–1960. [CrossRef]
11. Do Carmo, J.S.A. Physical Modelling vs. Numerical Modelling: Complementarity and Learning. 2020. Available online:
https://www.preprints.org/manuscript/202007.0753/v2 (accessed on 17 November 2021).
12. Fotovatikhah, F.; Herrera, M.; Shamshirband, S.; Chau, K.; Faizollahzadeh, A.S.; Piran, M.J. Survey of computational intelligence
as basis to big flood management: Challenges, research directions and future work. Eng. Appl. Comput. Fluid Mech. 2018, 12,
411–437. [CrossRef]
13. Li, B.; Yang, G.; Wan, R.; Dai, X.; Zhang, Y. Comparison of random forests and other statistical methods for the prediction of lake
water level: A case study of the Poyang Lake in China. Hydrol. Res. 2016, 47, 69–83. [CrossRef]
14. Trichakis, I.C.; Nikolos, I.K.; Karatzas, G.P. Artificial Neural Network (ANN) Based Modeling for Karstic Groundwater Level
Simulation. Water Resour. Manag. 2011, 25, 1143–1152. [CrossRef]
15. Hipni, A.; El-shafie, A.; Najah, A.; Karim, O.A.; Hussain, A.; Mukhlisin, M. Daily forecasting of dam water levels: Comparing a
Support Vector Machine (SVM) Model with Adaptive Neuro Fuzzy Inference System (ANFIS). Water Resour. Manag. 2013, 27,
3803–3823. [CrossRef]
16. Sapitang, M.; Ridwan, W.M.; Faizal, F.K.; Najah, A.A.; El-Shafie, A. Machine learning Application in reservoir water level
forecasting for sustainable hydropower generation strategy. Sustainability 2020, 12, 6121. [CrossRef]
17. Seo, Y.; Kim, S.; Singh, V.P. Multistep-ahead flood forecasting using wavelet and data-driven methods. KSCE J. Civ. Eng. 2015, 19,
401–417. [CrossRef]
18. Piri, J.; Kahkha, M.R.R. Prediction of water level fluctuations of chahnimeh reservoirs in Zabol using ANN, ANFIS and Cuckoo
optimization algorithm. Iran. J. Health Saf. Environ. 2016, 4, 706–715.
19. Zhang, S.; Lu, L.; Yu, J.; Zhou, H. Short term water level prediction using different artificial intelligent models. In Proceedings of
the 5th International Conference on Agro-geoinformatics (Agro-geoinformatics), Tianjin, China, 18–20 July 2016.
20. Üneş, F.; Demirci, M.; Taşar, B.; Kaya, Y.Z.; Varçin, H. Estimating dam reservoir level fluctuations using data-driven techniques.
Pol. J. Environ. Stud. 2018, 28, 3451–3462. [CrossRef]
21. Hong, J.; Lee, S.; Bae, J.H.; Lee, J.; Park, W.J.; Lee, D.; Kim, J.; Lim, K.J. Development and Evaluation of the Combined Machine
Learning Models for the Prediction of Dam Inflow. Water 2020, 12, 2927. [CrossRef]
22. Choi, C.; Kim, J.; Han, H.; Han, D.; Kim, H.S. Development of Water Level Prediction Models Using Machine Learning in
Wetlands: A Case Study of Upo Wetland in South Korea. Water 2020, 12, 93. [CrossRef]
23. Wang, Q.; Wang, S. Machine Learning-Based Water Level Prediction in Lake Erie. Water 2020, 12, 2654. [CrossRef]
24. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. The M5 Accuracy Competition: Results, Findings and Conclusions. Int. J. Forecast.
2022, 38, 1365–1385. [CrossRef]
25. Ibañez, S.C.; Dajac, C.V.G.; Liponhay, M.P.; Legara, E.F.T.; Esteban, J.M.H.; Monterola, C.P. Forecasting reservoir water levels
using deep neural networks: A case study of Angat Dam in the Philippines. Water 2021, 14, 34. [CrossRef]
26. Hyndman, R.J. A Brief History of Forecasting Competitions. Int. J. Forecast. 2020, 36, 7–14. [CrossRef]
27. Ouma, Y.O.; Moalahi, D.; Anderson, G.; Nkwae, B.; Odirile, P.; Parida, B.P.; Sebusang, N.; Nkgau, T.; Qi, J. Predicting the variability
of dam water levels with land-use and climatic factors using Random Forest and Vector AutoRegression models. Proceedings of
SPIE 12262, Remote Sensing for Agriculture, Ecosystems, and Hydrology XXIV, 122620J, Berlin, Germany, 5–7 September 2022.
28. Breiman, L. Random forests. Mach Learn. 2001, 45, 5–32. [CrossRef]
29. Ouma, Y.; Nkwae, B.; Moalafhi, D.; Odirile, P.; Parida, B.; Anderson, G.; Qi, J. Comparison of Machine Learning Classifiers For
Multitemporal and Multisensor Mapping of Urban LULC Features. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 43,
681–689. [CrossRef]
30. Manatsa, D.; Chingombe, W.; Matsikwa, H.; Matarira, C.H. The superior influence of Darwin Sea level pressure anomalies over
ENSO as a simple drought predictor for Southern Africa. Theor. Appl. Climatol. 2008, 92, 1–14. [CrossRef]
31. Ouma, Y.O.; Okuku, C.O.; Njau, E.N. Use of artificial neural networks and multiple linear regression model for the prediction of
dissolved oxygen in rivers: Case study of hydrographic basin of River Nyando, Kenya. Complexity 2020, 2020, 9570789. [CrossRef]
32. Ahmed, A.N.; Yafouz, A.; Birima, A.H.; Kisi, O.; Huang, Y.F.; Sherif, M.; Sefelnasr, A.; El-Shafie, A. Water level prediction using
various machine learning algorithms: A case study of Durian Tunggal river, Malaysia. Eng. Appl. Comput. Fluid Mech. 2022, 16,
422–440. [CrossRef]
33. Štefelová, N.; Alfons, A.; Palarea-Albaladejo, J.; Filzmoser, P.; Hron, K. Robust regression with compositional covariates including
cellwise outliers. Adv. Data Anal. Classif. 2021, 15, 869–909. [CrossRef]
34. Genuer, R.; Poggi, J.-M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recogn. Lett. 2010, 31, 2225–2236.
[CrossRef]
35. Allawi, M.F.; Binti Othman, F.; Afan, H.A.; Ahmed, A.N.; Hossain, M.S.; Fai, C.M.; El-Shafie, A. Reservoir Evaporation Prediction
Modeling Based on Artificial Intelligence Methods. Water 2019, 11, 1226. [CrossRef]
36. Okkan, U.; Serbes, Z.A. The combined use of wavelet transform and black box models in reservoir inflow modeling. J. Hydrol.
Hydromech. 2013, 61, 112–119. [CrossRef]
37. Wang, W.; Van Gelder, P.; Vrijling, J.K.; Ma, J. Forecasting daily streamflow using hybrid ANN models. J. Hydrol. 2006, 324,
383–399. [CrossRef]
38. Wu, C.L.; Chau, K.W.; Li, Y.S. Predicting monthly streamflow using data-driven models coupled with data-preprocessing
techniques. Water Resour. Res. 2009, 45, W08432. [CrossRef]
39. Karunasinghe, D.S.K.; Liong, S.Y. Chaotic time series prediction with a global model: Artificial neural network. J. Hydrol. 2006,
323, 92–105. [CrossRef]
40. Cigizoglu, H.K.; Kisi, O. Flow prediction by three back propagation techniques using k-fold partitioning of neural network
training data. Nord. Hydrol. 2005, 36, 49–64.
41. Seo, Y.; Park, K.B.; Kim, S.; Singh, V.P. Application of bootstrap-based artificial neural networks to flood forecasting and
uncertainty assessment. Proceedings of 6th International Perspective on Water Resources and the Environment, Izmir, Turkey,
7–9 January 2013; EWRI-ASCE: Reston, VA, USA, 2013.
42. Tiwari, M.K.; Chatterjee, C. Development of an accurate and reliable hourly flood forecasting model using wavelet-bootstrap-ANN
(WBANN) hybrid approach. J. Hydrol. 2010, 394, 458–470. [CrossRef]
