Research Article
Multivariate Streamflow Simulation Using Hybrid Deep Learning Models
Received 13 August 2021; Revised 30 September 2021; Accepted 5 October 2021; Published 27 October 2021
Copyright © 2021 Eyob Betru Wegayehu and Fiseha Behulu Muluneh. This is an open access article distributed under the Creative
Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.
Reliable and accurate streamflow simulation plays a vital role in water resource development, mainly in agriculture, the environment, domestic water supply, hydropower generation, flood control, and early warning systems. In this context, deep learning algorithms have recently received enormous attention due to their high-performance simulation capacity. In this study, we compared multilayer perceptron (MLP), long short-term memory (LSTM), and gated recurrent unit (GRU) networks with the proposed new hybrid models CNN-LSTM and CNN-GRU for one-step daily streamflow simulation under different agroclimatic conditions, rolling time windows, and a range of input variable combinations. The analysis used daily multivariate and multisite time series data collected from Awash River Basin (Borkena watershed, Ethiopia) and Tiber River Basin (Upper Tiber River Basin, Italy) stations. The datasets were subjected to rigorous quality control, smoothed with rolling windows of different lengths to remove noise in the time series, and split chronologically into training and testing sets at a ratio of 80:20. The results showed that integrating the GRU layer with a convolutional layer and using monthly rolled average daily input time series can substantially improve the simulation of streamflow time series.
1. Introduction

One of the emerging research areas in hydrology is hydrological simulation [1], through which catchment responses are evaluated in terms of meteorological forcing variables. Hydrological simulation is also crucial for water resource planning and management, such as flood prevention, water supply distribution, hydraulic structure design, and reservoir operation [2, 3]. However, river flow simulation is not an easy task, since river flow time series are commonly random, dynamic, and chaotic. The relationship between streamflow generation and other hydrologic processes is nonlinear and is controlled not only by external climatic factors and global warming but also by physical catchment characteristics.

Stream flows are mostly recorded at river gauging stations. However, different research studies show that the availability of gauging station records is generally decreasing in most parts of the world [4]. Tourian et al. [5] gathered a time series plot of the number of stations with available discharge data from the Global Runoff Data Centre (GRDC); it indicates a decline in the total monitored annual stream flows between 1970 and 2010. Besides, inadequate discharge observation and malfunctioning gauging stations worsen the situation in developing countries [6]. Sparsely distributed rain gauge stations in Ethiopia also limit the performance of physical hydrological models. Therefore, research studies on the robustness of innovative discharge data estimation models are undeniably important.

Streamflow simulation models in the literature are generally divided into two groups: (1) process- or physics-based models built from catchment characteristics and (2) data-driven models that depend on historically collected data [2, 3, 7]. Process-based models commonly use experimental formulas that provide insight into physical characteristics but have extensive data requirements.
On the other hand, data-driven models are suitable and can function easily without considering the internal physical mechanisms of the watershed system [2, 3, 7].

Artificial neural networks (ANNs) are the most used and studied "black-box" models. They are utilized in more scientific and technological areas than other available black-box algorithms, such as support vector machines (SVMs), genetic programming (GP), fuzzy logic (FL), recurrent neural networks (RNNs), and long short-term memory (LSTM) [7, 8]. ANNs are available in different functionalities and architectural forms, from simple to advanced levels. The recurrent neural network (RNN) is one of the advanced ANN architectures. It is considered a specially designed deep learning network for time series analysis that quickly adapts to temporal dynamics using previous time step information [2]. However, RNNs cannot capture long-time dependencies and are susceptible to vanishing and exploding gradients.

Couta et al. [8] suggested the advanced RNN, or long short-term memory (LSTM), as one of the most effective approaches. The LSTM unit has a cell that comprises an input gate, an output gate, and a forget gate [9]. Due to these gates, the LSTM model has shown promising results in different applications, including speech recognition, time series modelling, natural language processing, handwriting recognition, and traffic flow simulation [3, 10]. Studies have also shown that LSTM outperforms different powerful multilayered (ML) tools for streamflow simulation [3, 11]. Campos et al. [10] applied an autoregressive integrated moving average (ARIMA) model and an LSTM network to forecast floods at four Paraíba do Sul River stations in Brazil. Aljahdali et al. [7] also compared the LSTM network and a layered RNN to forecast streamflow in two USA rivers, the Black and Gila rivers. A recent article by Rahimzad et al. [12] used time-lagged discharges Qt-1, Qt-2, and other climatic variables to forecast Qt and concluded that the LSTM network outperforms linear regression (LR), multilayer perceptron (MLP), and support vector machine (SVM) models in forecasting daily streamflow.

A few years back, Cho et al. [13] introduced gated recurrent units (GRUs), which are similar to an LSTM with a forget gate but have fewer parameters than LSTM, as they lack an output gate. GRU's capacities in speech signal modelling and natural language processing were similar to those of LSTM. However, there are debates on the relative performance of these two architectures for streamflow and reservoir inflow simulation, which is not well studied across different timescales and environments.

Notwithstanding the difference in their performance, selecting an appropriate time series model from the various known deep learning network architectures is difficult. LSTMs and GRUs are not always the ideal sequence prediction option, and simulation with better prediction accuracy, faster running time, and less complicated models requires more research. Hence, this comparative analysis of the network architectures helps decide the optimized alternative for time series analysis. Recently, different hybrid deep learning models have been getting wide attention from researchers in various fields of study. Chen et al. [14] used convolutional neural network (CNN), LSTM, and hybrid CNN-LSTM models for nitrogen oxide emission prediction. They concluded that CNN-LSTM gives an accurate and stable forecast of periodic nitrogen oxide emissions from the refining industry. Moreover, Li et al. [15] used univariate and multivariate time series data as input for LSTM and CNN-LSTM models; for air quality analysis using particulate matter (PM2.5) concentration prediction, the proposed multivariate CNN-LSTM model gave the best result due to low error and short training time.

The integration of CNN and LSTM benefits time series prediction models in that the LSTM model can efficiently capture long sequences of pattern information, while CNN models can filter out the noise of the input data and extract more valuable features, which could increase the accuracy of the prediction model [16]. Moreover, integrating CNN with GRU can also lead to robust preprocessing of data, providing a viable option to improve the model's accuracy [17]. Even though combining CNN with LSTM has shown remarkable results in different studies, its application in hydrological fields still demands more research [18]. Muhammad et al. [19] used LSTM, GRU, and hybrid CNN-GRU models for streamflow simulation based on a 35-year Model Parameter Estimation Experiment (MOPEX) dataset of 10 river basins in the USA. They revealed that the proposed hybrid model outperforms the conventional LSTM; nevertheless, its performance is almost the same as that of GRU. Recently, Barzegar et al. [20] studied short-term water quality variable prediction using a hybrid CNN-LSTM model and effectively captured low and high water quality variables, mainly dissolved oxygen concentrations.

Screening input variables for different model architectures is also a challenging task for researchers. Even though rainfall, evaporation, and temperature are causal variables for streamflow modelling, data availability and study objectives limit the choice of variables [21]. Van et al. [21] discussed that adding temperature and evapotranspiration input nodes to the model increases the network complexity and causes overfitting. In contrast, Parisouj et al. [22] concluded that using readily available input variables such as temperature and precipitation for data-driven streamflow simulation provides reliable results. Hence, this research contributes a step to this debate by testing different input combinations from various climatic regions on the performance of the proposed models.

To the best of our knowledge, minimal literature shows the performance variation of different hybrid models for streamflow simulation under various input variability conditions at once. Thus, we compared various forms of hybrid CNN-LSTM and CNN-GRU architectures with the classical MLP, GRU, and LSTM networks to simulate single-step streamflow in two climatic regions, using the available precipitation and minimum and maximum temperature data. Moreover, the study tests the hybrid models with different layer arrangements and applies Keras tuner to optimize model hyperparameters. In general, the primary objective of this study is to test the performance variation of the proposed models under extreme input variability conditions, including climatic, input combination, input time window, and average rolling time window variability.
This study used different open-source software and machine learning libraries, including Python 3.6 for programming and the NumPy, pandas, Scikit-learn, Hydroeval, Statsmodels, and Matplotlib libraries, all used for data preprocessing, evaluation, and graphical interpretation. Moreover, the TensorFlow and Keras deep learning frameworks were employed for modelling the deep learning architectures.

2. Study Area

In the present study, two river subcatchments were selected in two climatic regions: the Awash River Basin, Borkena subcatchment in Ethiopia (Figure 1(a)), and the Upper Tiber River Basin in Italy (Figure 1(b)).

2.1. Borkena Watershed (Ethiopia). The first case study area is in the Borkena watershed at the Kombolcha station outlet, located in the upper part of the Awash River Basin in northern Ethiopia. The mainstream of the watershed emanates from Tosa mountain, which is found near Dessie town. The area's altitude ranges from 1,775 m at the lowest site near Kombolcha to 2,638 m at the highest site upstream of Dessie. The main rainy season of this watershed is from July to September.

2.2. Upper Tiber River Basin (Italy). The second case study area is located in the Upper Tiber River Basin (UTRB) in Italy. The Tiber River Basin (TRB) is the second-largest catchment in Italy [23]. Geographically, the basin is located between latitudes 40.5°N and 43°N and longitudes 10.5°E and 13°E, covering about 17,500 km2, roughly 5% of the Italian territory. The UTRB is part of the TRB, covering 4,145 km2 (~20% of the TRB) with its outlet at Ponte Nuovo. The elevation of the catchment ranges from 148 to 1,561 m above sea level. The area's climate is Mediterranean, with precipitation mainly occurring from autumn (September to November) to spring (March to May). Intense rainfall highly influences the basin's hydrology in the upstream part and causes frequent floods in the downstream areas [24].

3. Data Source and Preprocessing

Borkena's required hydrological and meteorological datasets were collected from the Ministry of Water Irrigation and Energy (MoWIE) of Ethiopia and the National Meteorological Agency of Ethiopia (NMA), respectively. UTRB's datasets were collected from the National Research Council of Italy (CNR) and archived for public use on the Water Resource Management and Evaluation (WRME) platform at the following link: http://hydrogate.unipg.it/wrme/.

We collected 5,844 available daily records from the time window of January 1, 1999, to December 31, 2014, for the Borkena watershed. Similarly, for UTRB, 7,670 records were collected from January 1, 1958, to December 31, 1978. Both case study datasets are multivariate and multisite. Even though we deliberately chose time windows with a minimum data gap for both stations, the datasets contain many missing values for different reasons. Thus, our first task for this research was to fill the missing values with a Monte Carlo approach.

The study applied linear correlation statistics to measure the strength of dependency between the different input variables [25]. Even though Mehr and Gandomi [26] stated that linear correlation might mislead or provide redundant inputs, our study does not have a feature set large enough to require intensive feature selection criteria. Hence, we adopted a linear correlation coefficient. Moreover, Kun et al. [27] concluded that the Pearson correlation coefficient (PCC) is the most applicable for multiple linear regressions (MLRs), and Oyebode [28] also stated that inputs selected with PCC showed superior model accuracy. Hence, this study applied the Pearson linear correlation coefficient [29, 30]. It has a value ranging between +1 and -1, where +1 indicates a positive linear correlation, 0 indicates no linear correlation, and -1 indicates a negative linear correlation [25]. Correlation values between 0 and +0.3, or between 0 and -0.3, show a weak linear relationship among variables [31]. Equation (1) calculates the Pearson correlation coefficient, and Tables 1 and 2 present the results. However, since we have a small number of variables and a limited data size, we decided for this study to omit the Borkena station maximum temperature (Tmax) values, whose r values range between -0.129 and +0.107; the details are presented in Table 1.

$$ r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}. \qquad (1) $$
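For illustration, this screening step can be reproduced with pandas. A minimal sketch with made-up values, where the column names stand in for the station series summarized in Tables 1 and 2:

```python
import pandas as pd

# Illustrative frame: one column per variable (values are made up).
df = pd.DataFrame({
    "Q":    [10.2, 12.5, 9.8, 15.1, 11.3],   # streamflow at the outlet
    "P":    [3.0, 7.2, 0.0, 12.4, 5.1],      # precipitation
    "Tmin": [12.1, 13.4, 11.9, 14.0, 12.8],  # minimum temperature
    "Tmax": [27.1, 26.4, 28.0, 25.2, 27.7],  # maximum temperature
})

# Pearson r of every candidate input against streamflow, as in equation (1).
r = df.corr(method="pearson")["Q"].drop("Q")

# Flag inputs inside the weak band |r| < 0.3 discussed above.
weak = r[r.abs() < 0.3].index.tolist()
print(r, "\ncandidates to omit:", weak)
```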
After passing rigorous quality control, the raw data were split chronologically into training and testing datasets at a ratio of 80:20. The time series graph and the corresponding box plot of the split data for both stations are presented in Figure 2. Different options exist in the literature to remove noise from a time series. A sliding window is the first option, temporarily approximating the actual value of the time series data [32]. Rolling windows (moving averages) are the second option, smoothing the time series data by calculating the average, maximum, minimum, or sum over a specific period [33]. For this study, we applied average rolling windows to smooth and remove noise from the time series while keeping the data length unchanged.

Then, daily, weekly, and monthly average rolling windows were used to rebuild the input and output time series into a supervised learning format. Accordingly, the rolled time series data were prepared with a time lag window of 30 or 45 days for single-step streamflow simulation at the Borkena and UTRB stations, respectively. Moreover, the split time series variables were scaled with StandardScaler for the computational ease and numerical stability of the modelling process.
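The preprocessing chain described in this section can be condensed into a short sketch. This is one reasonable ordering under stated assumptions: df is a date-indexed pandas DataFrame holding the predictors and a "Q" target column, a 7-day rolling mean stands in for the weekly window, the lag window is the 30 steps used for Borkena, and the scaler is fitted on the training split only (a common safeguard; the paper does not spell out this detail):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def prepare(df: pd.DataFrame, target="Q", window=7, lag=30, train_frac=0.8):
    # 1. Average rolling window: smooth every series, keeping the length unchanged.
    smoothed = df.rolling(window=window, min_periods=1).mean()

    # 2. Chronological 80:20 split (time series are never shuffled).
    split = int(len(smoothed) * train_frac)
    train, test = smoothed.iloc[:split], smoothed.iloc[split:]

    # 3. Standardize with statistics learned from the training part.
    scaler = StandardScaler().fit(train)
    train = pd.DataFrame(scaler.transform(train), columns=df.columns)
    test = pd.DataFrame(scaler.transform(test), columns=df.columns)

    # 4. Rebuild each split as supervised samples: `lag` past steps -> next-step Q.
    def to_supervised(frame):
        X = np.stack([frame.values[i:i + lag] for i in range(len(frame) - lag)])
        y = frame[target].values[lag:]
        return X, y  # X: (samples, lag, features), y: (samples,)

    return to_supervised(train), to_supervised(test)
```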
Figure 1: Location maps of the study areas: (a) the Borkena watershed and its gauging station within the Awash River Basin, Ethiopia; (b) the Upper Tiber River Basin and the Ponte Nuovo gauging station within the Tiber River Basin, Italy.
Table 1: Descriptive statistics of the split time series data for the Borkena watershed.

Station | Data type | PCC with streamflow | Training data (80%): Mean / Max / Min / SD | Testing data (20%): Mean / Max / Min / SD
Kombolcha | Streamflow (m3/sec) | 1.000 | 10.9 / 216.9 / 0.0 / 23.2 | 10.1 / 94.8 / 0.0 / 20.2
Kombolcha | P (mm/day) | 0.321 | 3.1 / 73.2 / 0.0 / 7.5 | 2.9 / 60.4 / 0.0 / 7.2
Kombolcha | Tmin (°C) | 0.271 | 12.5 / 20.9 / 1.5 / 3.3 | 12.5 / 20.6 / 2.6 / 3.4
Kombolcha | Tmax (°C) | -0.099 | 27.2 / 33.6 / 16.4 / 2.5 | 27.3 / 33.0 / 19.6 / 2.1
Chefa | P (mm/day) | 0.344 | 3.5 / 81.6 / 0.0 / 8.6 | 3.4 / 64.3 / 0.0 / 8.1
Chefa | Tmin (°C) | 0.266 | 13.3 / 21.5 / 0.1 / 3.7 | 14.1 / 22.2 / 3.9 / 3.5
Chefa | Tmax (°C) | -0.069 | 29.9 / 38.0 / 18.5 / 2.8 | 30.3 / 38.0 / 22.2 / 2.5
Dessie | P (mm/day) | 0.335 | 3.5 / 80.6 / 0.0 / 8.6 | 2.9 / 67.0 / 0.0 / 7.3
Dessie | Tmin (°C) | 0.319 | 8.5 / 15.5 / 0.1 / 2.5 | 7.8 / 15.5 / 0.0 / 3.1
Dessie | Tmax (°C) | 0.107 | 23.8 / 30.0 / 16.0 / 1.9 | 24.1 / 30.0 / 15.0 / 2.1
Kemise | P (mm/day) | 0.372 | 3.1 / 81.9 / 0.0 / 8.3 | 2.9 / 72.1 / 0.0 / 7.5
Kemise | Tmin (°C) | 0.282 | 13.8 / 22.0 / 3.0 / 3.4 | 13.5 / 20.1 / 4.5 / 3.6
Kemise | Tmax (°C) | -0.129 | 31.0 / 38.3 / 14.0 / 2.7 | 31.9 / 37.8 / 23.5 / 2.4
Majete | P (mm/day) | 0.347 | 3.3 / 80.7 / 0.0 / 8.6 | 3.3 / 81.3 / 0.0 / 8.6
Majete | Tmin (°C) | 0.202 | 14.7 / 23.0 / 1.4 / 2.9 | 14.6 / 21.5 / 6.7 / 2.9
Majete | Tmax (°C) | -0.057 | 28.6 / 37.8 / 17.2 / 2.8 | 29.1 / 38.0 / 20.8 / 2.4
Table 2: Descriptive statistics of the split time series data for the UTRB.

Station | Data type | PCC with streamflow | Training data (80%): Mean / Max / Min / SD | Testing data (20%): Mean / Max / Min / SD
Ponte Nuovo | Streamflow (m3/sec) | 1.000 | 50.6 / 939.0 / 1.9 / 75.5 | 50.6 / 737.0 / 3.7 / 68.6
Castel Rigone | P (mm/day) | 0.384 | 2.6 / 72.8 / 0.0 / 6.6 | 2.7 / 67.7 / 0.0 / 6.9
Montecoronaro | P (mm/day) | 0.339 | 3.9 / 229.0 / 0.0 / 10.7 | 4.0 / 110.0 / 0.0 / 10.5
Perugia (ISA) | P (mm/day) | 0.379 | 2.4 / 120.4 / 0.0 / 6.6 | 2.5 / 61.8 / 0.0 / 6.3
Perugia (ISA) | Tmin (°C) | -0.353 | 9.7 / 30.4 / -9.0 / 6.3 | 9.3 / 25.2 / -5.0 / 5.6
Perugia (ISA) | Tmax (°C) | -0.379 | 17.4 / 37.4 / -4.5 / 8.1 | 16.3 / 33.0 / 0.6 / 7.2
Petrelle | P (mm/day) | 0.345 | 2.51 / 90.0 / 0.0 / 6.9 | 2.7 / 117.1 / 0.0 / 7.4
Pietralunga | P (mm/day) | 0.428 | 3.22 / 150.0 / 0.0 / 8.1 | 3.1 / 73.1 / 0.0 / 7.3
Spoleto | P (mm/day) | 0.412 | 2.9 / 113.6 / 0.0 / 7.9 | 2.9 / 94.2 / 0.0 / 7.8
Spoleto | Tmin (°C) | -0.265 | 7.5 / 23.0 / -12.6 / 6.4 | 8.8 / 21.7 / -5.4 / 5.8
Spoleto | Tmax (°C) | -0.383 | 18.8 / 38.7 / -3.5 / 8.6 | 18.7 / 36.8 / 2.0 / 7.8
Torgiano | P (mm/day) | 0.364 | 2.4 / 141.2 / 0.0 / 7.1 | 2.5 / 62.0 / 0.0 / 6.9
Gubbio | Tmin (°C) | -0.315 | 8.7 / 26.0 / -12.0 / 5.9 | 6.1 / 19.3 / -11.3 / 5.4
Gubbio | Tmax (°C) | -0.377 | 18.1 / 39.0 / -8.0 / 8.1 | 17.4 / 34.1 / -0.9 / 7.5
Assisi | Tmin (°C) | -0.325 | 9.2 / 25.6 / -11.6 / 6.2 | 8.2 / 21.5 / -8.0 / 5.6
Assisi | Tmax (°C) | -0.378 | 18.2 / 37.8 / -5.0 / 8.3 | 18.1 / 35.8 / 0.0 / 7.8
Figure 2: Streamflow time series graph and the corresponding box plot of split data. (a) Borkena. (b) UTRB.
1D CNN is mainly implemented for sequence data processing [41], 2D CNN is usually used for text and image identification [42], and 3D CNN is recognized for modelling medical image and video data [43]. Since the aim of the present study is time series analysis, we implemented 1D CNN. The detailed process of 1D CNN is described in Figure 7.

As depicted in Figure 7, the input series is convolved by the convolution layer from top to bottom (shown by the arrows). The grey and mesh colours represent different filters; the size of the convolution layer output depends on the number of input data dimensions, the size of the filter, and the convolution step length.
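The sliding-filter arithmetic of Figure 7 (C1 = W1P1 + W2P2, C2 = W1P2 + W2P3, and so on) is easy to verify directly. A toy NumPy check with made-up values, using the figure's W1, W2 notation:

```python
import numpy as np

P = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # input series P1 ... Pn
W = np.array([0.5, -1.0])                # filter weights W1, W2

# Slide the two-weight filter along the series: Ci = W1*Pi + W2*P(i+1).
C = np.array([W[0] * P[i] + W[1] * P[i + 1] for i in range(len(P) - 1)])
print(C)  # [-1.5 -2.  -2.5 -3. ]

# np.convolve with the reversed filter ('valid' mode) reproduces the same output.
assert np.allclose(C, np.convolve(P, W[::-1], mode="valid"))
```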
4.6. CNN-LSTM and CNN-GRU Hybrid Models. In this study, hybrid models were designed by integrating CNN with LSTM or GRU layers: the feature sequence produced by the CNN layers is taken as the input to the LSTM or GRU layers, from which the short- and long-time dependencies are then extracted.

The proposed CNN-LSTM and CNN-GRU models contain two main components. The first component consists of one-dimensional single or double convolutional and average pooling layers, with a flatten layer connected to bring the data into the format required by the LSTM or GRU. In the second component, the generated features are processed using LSTM, GRU, and dense layers; additionally, dropouts are introduced to prevent overfitting. Figure 8 shows the designed model inputs and outputs with a basic description of the convolutional, pooling, and LSTM or GRU layers proposed for this project.
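As a concrete illustration, one way to assemble such a stack in Keras, filling in the Borkena-tuned CNN-GRU2 values from Table 8 (single Conv1D layer with 24 filters, kernel 2, pool 3, GRU units 15 and 20, dropouts 0.1 and 0.2, learning rate 1e-4). The feature count is a placeholder, and the flatten step of Figure 8 is not reproduced here because Keras recurrent layers consume the pooled sequence directly; this is a sketch, not the authors' exact graph:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_steps, n_features = 30, 10  # 30-day lag window; feature count is illustrative

model = keras.Sequential([
    layers.Input(shape=(n_steps, n_features)),
    # Component 1: convolutional feature extraction (Table 8 values).
    layers.Conv1D(filters=24, kernel_size=2, activation="relu"),
    layers.AveragePooling1D(pool_size=3),
    # Component 2: recurrent sequence learning plus regression head.
    layers.GRU(15, return_sequences=True),
    layers.Dropout(0.1),
    layers.GRU(20),
    layers.Dropout(0.2),
    layers.Dense(1),  # single-step streamflow
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss="mse")
```

Swapping the GRU layers for LSTM layers gives the corresponding CNN-LSTM variant, and a second Conv1D/AveragePooling1D pair gives the "double convolutional" form.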
5. Data Analysis

Simulation with deep learning requires selecting a probable combination of hyperparameters: batch size, epochs, number of layers, and number of units per layer [8]. Optimizing hyperparameters is not always consistent, as there is no hard rule to follow; "the process is more of an art than a science" [44]. Hence, in this study, we chose the Keras tuner optimizer developed by the Google team and included in the Keras open library [45, 46].

5.1. Hyperparameter Optimization. Tuning machine learning model hyperparameters is critical: varying hyperparameter values often results in models with significantly different performances [47]. The models applied in this study mainly contain two types of hyperparameters: constant hyperparameters, which are not altered through the optimization process, and variable hyperparameters. The Adam optimizer is applied as a constant hyperparameter because of its efficiency and ease of implementation, requiring minimal memory.
[Figure: overview of the model input-output setting: P(t+1), Tmin(t+1), and Tmax(t+1) as inputs to the CNN-LSTM/CNN-GRU model architectures, streamflow Q as output.]

[Figure: LSTM cell schematic: input gate it, forget gate ft, and output gate Ot update the previous long-term state Ct-1 to Ct and the short-term state ht-1 to ht from input Xt.]

[Figure: GRU cell schematic: reset gate rt and update gate zt act on the previous hidden state ht-1 and input Xt to produce the candidate state and the updated output ht.]

[Figure 7: 1D convolution of the input series P1 ... Pn with a two-weight filter: C1 = W1P1 + W2P2, C2 = W1P2 + W2P3, ..., Cn = W1Pn-1 + W2Pn.]

[Figure 8: proposed hybrid architecture. Inputs: for the Borkena stations, P and Tmin of the past 30 days; for the Upper Tiber stations, P, Tmin, and Tmax of the past 45 days. Layer stack: Input, 1D conv, AveragePooling, Flatten, LSTM or GRU, Dropout, LSTM or GRU, Dropout, Dense, Output (daily, weekly, and monthly rolled single-step streamflow, single output).]
Table 4: Model hyperparameter choices or value ranges for optimization by Keras tuner.

No. | Hyperparameter | Choices | Min | Max | Step | Default
1 | Conv_1_filter | n/a | 8 | 32 | 8 | n/a
2 | Conv_1_kernal | 2 or 3 | n/a | n/a | n/a | n/a
3 | Conv_1_pool_size | 2 or 3 | n/a | n/a | n/a | n/a
4 | Conv_2_filter | n/a | 8 | 32 | 8 | n/a
5 | Conv_2_kernal | 2 or 3 | n/a | n/a | n/a | n/a
6 | Conv_2_pool_size | 2 or 3 | n/a | n/a | n/a | n/a
7 | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 1 units | n/a | 5 | 30 | 5 | n/a
8 | Dropout 1 | n/a | 0.0 | 0.3 | 0.1 | 0.2
9 | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 2 units | n/a | 5 | 30 | 5 | n/a
10 | Dropout 2 | n/a | 0.0 | 0.3 | 0.1 | 0.2
11 | Learning rate | 1e-2, 1e-3, or 1e-4 | n/a | n/a | n/a | n/a
12 | Number of epochs | n/a | 10 | 100 | 10 | n/a
13 | Number of batch sizes | n/a | 10 | 100 | 10 | n/a

Keras tuner settings: objective = "validation loss", max trials = 20, and executions per trial = 3; "n/a" denotes not applicable.
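Table 4's search space maps almost one-to-one onto keras-tuner primitives. A compressed sketch for the single-convolution CNN-GRU variant, using the tuner settings from the table footnote (validation-loss objective, 20 trials, 3 executions per trial); the input shape is illustrative, the hyperparameter names mirror Table 4, and the epoch and batch-size rows (12 and 13) would need a custom run_trial override, which is omitted here:

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    model = keras.Sequential([
        layers.Input(shape=(30, 10)),  # illustrative lag window and feature count
        layers.Conv1D(hp.Int("conv_1_filter", 8, 32, step=8),
                      hp.Choice("conv_1_kernal", [2, 3]), activation="relu"),
        layers.AveragePooling1D(hp.Choice("conv_1_pool_size", [2, 3])),
        layers.GRU(hp.Int("gru_l1_units", 5, 30, step=5), return_sequences=True),
        layers.Dropout(hp.Float("dropout_1", 0.0, 0.3, step=0.1, default=0.2)),
        layers.GRU(hp.Int("gru_l2_units", 5, 30, step=5)),
        layers.Dropout(hp.Float("dropout_2", 0.0, 0.3, step=0.1, default=0.2)),
        layers.Dense(1),
    ])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss",
                        max_trials=20, executions_per_trial=3)
# tuner.search(X_train, y_train, validation_split=0.2, epochs=50)
```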
Table 5: Daily streamflow simulation: performance comparison of the proposed models for different input variables and climatic conditions.

Model | Borkena, P + Tmin: RMSE / MAE / R2 / TTPE (sec) | Borkena, P: RMSE / MAE / R2 / TTPE (sec) | UTRB, P + Tmin + Tmax: RMSE / MAE / R2 / TTPE (sec) | UTRB, P: RMSE / MAE / R2 / TTPE (sec)
MLP | 9.91 / 5.01 / 0.77 / 0.89 | 9.38 / 4.63 / 0.79 / 0.63 | 49.11 / 22.74 / 0.49 / 0.78 | 56.57 / 28.14 / 0.33 / 0.41
GRU | 8.78 / 4.37 / 0.82 / 3.61 | 7.94 / 3.64 / 0.85 / 3.32 | 46.63 / 20.89 / 0.55 / 2.61 | 51.09 / 26.74 / 0.45 / 3.39
LSTM | 8.41 / 4.09 / 0.83 / 2.35 | 9.65 / 4.87 / 0.78 / 2.92 | 48.64 / 22.79 / 0.51 / 3.86 | 48.59 / 25.00 / 0.51 / 5.98
CNN-LSTM1 | 8.09 / 4.07 / 0.84 / 0.46 | 8.57 / 4.67 / 0.82 / 0.41 | 51.20 / 22.95 / 0.45 / 1.19 | 56.16 / 26.55 / 0.34 / 0.57
CNN-LSTM2 | 7.99 / 4.09 / 0.85 / 0.72 | 9.14 / 4.50 / 0.80 / 0.45 | 45.38 / 21.85 / 0.57 / 0.82 | 51.57 / 25.84 / 0.44 / 1.85
CNN-GRU1 | 7.94 / 3.66 / 0.85 / 0.63 | 8.32 / 4.09 / 0.83 / 0.86 | 55.06 / 23.49 / 0.37 / 1.16 | 52.42 / 24.98 / 0.43 / 0.83
CNN-GRU2 | 9.07 / 4.19 / 0.80 / 1.01 | 8.43 / 4.26 / 0.83 / 0.28 | 45.61 / 21.79 / 0.57 / 0.64 | 49.96 / 25.38 / 0.48 / 0.68

TTPE: training time per epoch.
Table 6: Weekly rolled streamflow simulation: performance comparison of the proposed models for different input variables and climatic conditions.

Model | Borkena, P + Tmin: RMSE / MAE / R2 / TTPE (sec) | Borkena, P: RMSE / MAE / R2 / TTPE (sec) | UTRB, P + Tmin + Tmax: RMSE / MAE / R2 / TTPE (sec) | UTRB, P: RMSE / MAE / R2 / TTPE (sec)
MLP | 8.11 / 4.29 / 0.84 / 0.23 | 7.33 / 4.19 / 0.87 / 0.22 | 33.01 / 20.01 / 0.60 / 0.74 | 38.03 / 25.17 / 0.47 / 0.79
GRU | 7.59 / 3.71 / 0.86 / 2.04 | 7.15 / 4.13 / 0.87 / 2.43 | 25.21 / 14.83 / 0.77 / 5.56 | 31.39 / 19.26 / 0.64 / 16.79
LSTM | 8.41 / 4.01 / 0.82 / 2.98 | 7.93 / 3.91 / 0.84 / 1.27 | 31.07 / 18.87 / 0.65 / 3.55 | 31.07 / 19.49 / 0.65 / 2.69
CNN-LSTM1 | 7.90 / 4.09 / 0.85 / 0.78 | 7.72 / 4.25 / 0.85 / 0.63 | 28.04 / 17.33 / 0.71 / 0.93 | 34.57 / 21.92 / 0.57 / 0.62
CNN-LSTM2 | 7.33 / 3.86 / 0.87 / 0.52 | 7.63 / 4.25 / 0.86 / 0.55 | 28.45 / 16.66 / 0.71 / 1.14 | 35.04 / 21.77 / 0.55 / 1.56
CNN-GRU1 | 7.83 / 3.94 / 0.85 / 0.44 | 7.91 / 4.31 / 0.85 / 0.50 | 30.57 / 18.01 / 0.66 / 2.32 | 35.14 / 22.58 / 0.55 / 0.63
CNN-GRU2 | 8.73 / 4.61 / 0.81 / 0.43 | 8.43 / 4.35 / 0.82 / 0.97 | 27.81 / 16.99 / 0.72 / 4.37 | 33.76 / 23.01 / 0.59 / 1.01

TTPE: training time per epoch.
Table 7: Monthly rolled streamflow simulation: performance comparison of the proposed models for different input variables and climatic conditions.

Model | Borkena, P + Tmin: RMSE / MAE / R2 / TTPE (sec) | Borkena, P: RMSE / MAE / R2 / TTPE (sec) | UTRB, P + Tmin + Tmax: RMSE / MAE / R2 / TTPE (sec) | UTRB, P: RMSE / MAE / R2 / TTPE (sec)
MLP | 6.68 / 4.37 / 0.87 / 0.58 | 5.57 / 3.80 / 0.91 / 0.41 | 20.24 / 13.84 / 0.78 / 0.44 | 28.79 / 21.05 / 0.56 / 0.41
GRU | 5.15 / 3.52 / 0.91 / 1.62 | 5.22 / 3.06 / 0.92 / 3.31 | 20.79 / 14.30 / 0.77 / 16.63 | 26.47 / 20.08 / 0.63 / 4.70
LSTM | 5.55 / 3.49 / 0.91 / 2.75 | 5.76 / 3.51 / 0.90 / 2.51 | 21.49 / 15.11 / 0.76 / 4.15 | 32.29 / 24.47 / 0.45 / 5.09
CNN-LSTM1 | 6.05 / 4.42 / 0.89 / 0.98 | 5.58 / 3.40 / 0.91 / 0.58 | 21.53 / 14.87 / 0.76 / 1.29 | 27.48 / 21.19 / 0.60 / 0.42
CNN-LSTM2 | 5.36 / 3.17 / 0.92 / 1.41 | 6.87 / 4.05 / 0.86 / 1.44 | 19.07 / 13.53 / 0.81 / 0.70 | 27.79 / 20.90 / 0.59 / 0.42
CNN-GRU1 | 5.76 / 3.62 / 0.90 / 0.52 | 5.77 / 3.56 / 0.90 / 0.69 | 19.31 / 13.78 / 0.80 / 4.87 | 28.67 / 21.07 / 0.57 / 3.08
CNN-GRU2 | 5.36 / 3.25 / 0.92 / 0.62 | 5.15 / 3.18 / 0.92 / 0.78 | 17.98 / 12.99 / 0.83 / 0.71 | 27.77 / 20.36 / 0.59 / 1.22

TTPE: training time per epoch.
Figure 9: Training and test loss functions of the optimized high-score hybrid model: (a) CNN-GRU2 model for Borkena station; (b) CNN-GRU2 model for UTRB station.
Figure 10: Comparison of true values and predicted values of the optimized high-score hybrid model: (a) CNN-GRU2 model for Borkena station; (b) CNN-GRU2 model for UTRB station.
$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Q_{\mathrm{obs}}^{t} - Q_{\mathrm{sim}}^{t}\right|, \qquad (9) $$

where Qobs is the observed discharge, Qsim is the simulated discharge, and n is the number of observations. The range of R2 lies between 0 and 1, representing, respectively, no correlation and a perfect correlation between observed and simulated values, whereas the smallest RMSE and MAE scores, or values closest to zero, indicate the best model performance.
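All three metrics are simple to compute directly (the study used the Hydroeval library for this). A minimal NumPy version, taking R2 as the squared Pearson correlation in line with the description above:

```python
import numpy as np

def evaluate(q_obs: np.ndarray, q_sim: np.ndarray) -> dict:
    rmse = float(np.sqrt(np.mean((q_obs - q_sim) ** 2)))  # root mean square error
    mae = float(np.mean(np.abs(q_obs - q_sim)))           # equation (9)
    r2 = float(np.corrcoef(q_obs, q_sim)[0, 1] ** 2)      # squared Pearson r
    return {"RMSE": rmse, "MAE": mae, "R2": r2}
```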
Table 8: Best hybrid model type, input feature, and Keras tuner optimized hyperparameter values for Borkena station with its MSE score.

Model: CNN-GRU2 (input: monthly rolled P)
Hyperparameter | Value
Conv_1_filter | 24
Conv_1_kernal | 2
Conv_1_pool_size | 3
GRU_l1_units | 15
Dropout1 | 0.1
GRU_l2_units | 20
Dropout2 | 0.2
Learning rate | 0.0001
Number of epochs | 80
Number of batch sizes | 20
Score (MSE) | 0.083
Table 9: Best hybrid model type, input features, and Keras tuner optimized hyperparameter values for UTRB station with its MSE score.

Model: CNN-GRU2 (input: monthly rolled P, Tmin, and Tmax)
Hyperparameter | Value
Conv_1_filter | 8
Conv_1_kernal | 2
Conv_1_pool_size | 2
GRU_l1_units | 20
Dropout1 | 0.3
GRU_l2_units | 30
Dropout2 | 0.2
Learning rate | 0.0001
Number of epochs | 60
Number of batch sizes | 40
Score (MSE) | 0.193
Figure 11: Internal network structure of the optimized high score hybrid CNN-GRU2 model for Borkena Station.
Figure 12: Internal network structure of the optimized high score hybrid CNN-GRU2 model for UTRB Station.
6. Results

Streamflow simulation results for the proposed seven deep learning architectures, different input time window series, two climatic regions, two input combinations, and three average rolling time windows are presented in Tables 5, 6, and 7. Regardless of the combination of these conditions, the CNN-GRU model showed promising results in most of these scenarios (Tables 5 and 7). The highest scores are presented here.

(1) In daily streamflow simulation for Borkena station, CNN-GRU1 scored 7.94, 3.66, 0.85, and 0.63, and for UTRB station, CNN-GRU2 scored 45.61, 21.79, 0.57, and 0.64 for RMSE, MAE, R2, and training time per epoch, respectively.

(2) In weekly rolled streamflow simulation for Borkena station, CNN-LSTM2 scored 7.33, 3.86, 0.87, and 0.52, and for UTRB station, GRU scored 25.21, 14.83, 0.77, and 5.56 for RMSE, MAE, R2, and training time per epoch, respectively.

(3) In monthly rolled streamflow simulation, the CNN-GRU2 model showed high performance with 5.15, 3.18, 0.92, and 0.78 scores for Borkena station and 17.98, 12.99, 0.83, and 0.71 for UTRB station for RMSE, MAE, R2, and training time per epoch, respectively.

Moreover, among the proposed four hybrid models, CNN-GRU2, the model designed with a single 1D CNN layer, showed the most promising results in trial model 1 (UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU in model 2 (UTRB), CNN-LSTM2 in model 2 (Borkena), and CNN-GRU1 in model 1 (Borkena) shared the second-most promising results. Streamflow simulation with the CNN-GRU2 model generally showed higher performance than the other tested hybrid deep learning models and the state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the results are discussed under the different variability conditions in the following paragraphs.
6.1. Climatic Region Variability. Testing models in different climatic conditions with historical data will likely provide robust deep learning models for streamflow simulation in the future [51]. Hence, this research tested the models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores in both tested case study areas.

6.2. Input Combination Variability. The input combination of minimum temperature (Tmin) with precipitation (P) does not show a significant performance increase at the Borkena station (Tables 5 and 6); in some scenarios, adopting P only as input increases the performance of the model (Table 7). In contrast, for UTRB, streamflow simulation with all input variables, that is, Tmin, Tmax, and P, showed significant performance increases (Table 7).

6.3. Average Rolling Time Window Variability. Streamflow simulation without rolling the daily time series showed deficient performance compared to the monthly rolled average time series. This could be because the time series noise in UTRB is more visible than that at the Borkena station. As a result, the performance increase from daily to monthly rolled window models is much higher in UTRB than at the Borkena station.

Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results at both stations (Table 7). The corresponding training and test loss functions of this optimized high-score hybrid model for both stations are displayed in Figure 9. Consequently, Figure 10 compares the true and predicted values of this model. The optimized hybrid model boosts the performance score and lowers the training time per epoch much more than the GRU and LSTM models. The model, input features, and Keras tuner optimized hyperparameter values for both stations, with their MSE scores, are presented in Tables 8 and 9. Moreover, the internal network structures of these models are shown in Figures 11 and 12, which display the model input and output parameter matrices for each layer.

7. Conclusions

This study presented a comparative analysis of different hybrid deep learning algorithms against state-of-the-art machine learning models for one-step daily streamflow simulation at two river basin or subcatchment streamflow outlets. The proposed algorithms for this study are the CNN-LSTM and CNN-GRU hybrid deep learning models, each having one or two 1D CNN layers, compared with the classic MLP, LSTM, and GRU models. This study conducted a series of experiments to observe the performance variation of the proposed models by introducing different input combinations, rolling time windows, and climatic conditions for streamflow simulation. The following points summarize the significant findings of this study:

(i) CNN-GRU2 with one 1D CNN layer showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R2 of all models in both case study areas. Such results indicate that the performance of the selected architectures is independent of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as model input gave a smaller performance increase at the Borkena station than in the UTRB case study area, which clearly shows that temperature data scarcity implies more performance loss at the UTRB station. On the other hand, the Borkena station has more significant natural streamflow variability than UTRB, which is also reflected in the model results. This implies that the catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation with the proposed models increases performance considerably more in the UTRB than at the Borkena station.

(iv) The analysis results also showed that the training time per epoch for the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, given the acceptable results of this case study, and considering the hybrid models' hyperparameter sensitivity and complexity, future research may further design optimized configurations. Moreover, these hybrid models can be tested for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems and in other fields of study. Our future research will try to synchronize the best-performing hybrid deep learning models of this study with remote sensing datasets for the problems we experience in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et/) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for UTRB, the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme/).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Council (Centro Nazionale di Ricerca, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and follow-up.
[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843-3865, 2009.
[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299-346, Springer, New York, NY, USA, 2003.
[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.
[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005-2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.
[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853-16864, 2020.
[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81-100, 2003.
[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT, Transactions on Computational Science and Computational Intelligence, pp. 45-61, Springer, New York, NY, USA, 2019.
[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006-1014, 2019.
[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.
[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154-170, 2017.
[42] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285-1298, 2016.
[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1-6, Alberobello, Italy, July 2019.
[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840-27867, 2021.
[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7-20, 2020.
[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36-52, San Francisco, CA, USA, May 2018.
[48] C. Boyraz and Ş. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering Information Technology (CEIT), pp. 1-5, Istanbul, Turkey, October 2018.
[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling - a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32-48, 2019.
[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120-2130, 2019.
[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.