
Hindawi

Computational Intelligence and Neuroscience


Volume 2021, Article ID 5172658, 16 pages
https://doi.org/10.1155/2021/5172658

Research Article
Multivariate Streamflow Simulation Using Hybrid Deep
Learning Models

Eyob Betru Wegayehu and Fiseha Behulu Muluneh


School of Civil and Environmental Engineering, Addis Ababa Institute of Technology, Addis Ababa University,
Addis Ababa, Ethiopia

Correspondence should be addressed to Eyob Betru Wegayehu; eyob.betru@aait.edu.et

Received 13 August 2021; Revised 30 September 2021; Accepted 5 October 2021; Published 27 October 2021

Academic Editor: Maciej Lawrynczuk

Copyright © 2021 Eyob Betru Wegayehu and Fiseha Behulu Muluneh. This is an open access article distributed under the Creative
Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.

Reliable and accurate streamflow simulation has a vital role in water resource development, mainly in agriculture, the environment, domestic water supply, hydropower generation, flood control, and early warning systems. In this context, deep learning algorithms have recently gained enormous attention due to their high-performance simulation capacity. In this study, we compared the multilayer perceptron (MLP), long short-term memory (LSTM), and gated recurrent unit (GRU) networks with the proposed new hybrid models, CNN-LSTM and CNN-GRU, for simulating one-step daily streamflow under different agroclimatic conditions, rolling time windows, and a range of input variable combinations. The analysis used daily multivariate and multisite time series data collected from Awash River Basin (Borkena watershed, Ethiopia) and Tiber River Basin (Upper Tiber River Basin, Italy) stations. The datasets were subjected to rigorous quality control, averaged over different rolling time windows to remove noise, and then split into training and testing datasets at a ratio of 80:20. The results showed that integrating a GRU layer with a convolutional layer and using monthly rolled averages of the daily input time series could substantially improve the simulation of streamflow time series.

1. Introduction

One of the emerging research areas in hydrology is hydrological simulation [1], through which catchment responses are evaluated in terms of meteorological forcing variables. Hydrological simulation is also crucial for water resource planning and management, such as flood prevention, water supply distribution, hydraulic structure design, and reservoir operation [2, 3]. However, river flow simulation is not an easy task, since river flow time series are commonly random, dynamic, and chaotic. The relationship between streamflow generation and other hydrologic processes is nonlinear and is controlled not only by external climatic factors and global warming but also by physical catchment characteristics.

Stream flows are mostly recorded at river gauging stations. However, different research studies show that the availability of gauging station records is generally decreasing in most parts of the world [4]. Tourian et al. [5] gathered a time series plot of the number of stations with available discharge data from the Global Runoff Data Centre (GRDC); it indicates a decline in the total monitored annual stream flows between 1970 and 2010. Besides, inadequate discharge observation and malfunctioning gauging stations worsen the situation in developing countries [6]. Sparsely distributed rain gauge stations in Ethiopia also limit the performance of physical hydrological models. Therefore, research studies on the robustness of innovative discharge data estimation models are undeniably important.

Streamflow simulation models in the literature are generally divided into two groups: (1) process- or physics-based models that are built from catchment characteristics and (2) data-driven models that depend on historically collected data [2, 3, 7]. Process-based models commonly use experimental formulas that provide insight into physical characteristics and have extensive data requirements. On the other hand, data-driven models are suitable and can function easily without considering the internal physical mechanism of the watershed system [2, 3, 7].

Artificial neural networks (ANNs) are the most used and studied "black-box" models. They are utilized in more scientific and technological areas than the other available black-box algorithms, such as support vector machines (SVM), genetic programming (GP), fuzzy logic (FL), recurrent neural networks (RNN), and long short-term memory (LSTM) [7, 8]. ANN is available in different functionalities and architectural forms, from simple to advanced levels. A recurrent neural network (RNN) is one of the advanced ANN architectures. It has been considered a specially designed deep learning network for time series analysis that quickly adapts to temporal dynamics using previous time step information [2]. However, RNN cannot capture long-time dependencies, and it is susceptible to vanishing and exploding gradients.

Couta et al. suggested the advanced RNN, or long short-term memory (LSTM), as one of the most effective approaches [8]. The LSTM unit has a cell that comprises an input gate, an output gate, and a forget gate [9]. Due to these gates, the LSTM model has shown promising results in different applications, including speech recognition, time series modelling, natural language processing, handwriting recognition, and traffic flow simulation [3, 10]. Studies have also shown that LSTM has powerful performance for streamflow simulation over different powerful multilayered (ML) tools [3, 11]. Campos et al. [10] applied autoregressive integrated moving average (ARIMA) and LSTM networks to forecast floods at four Paraíba do Sul River stations in Brazil. Aljahdali et al. [7] also compared the LSTM network and layered RNN to forecast streamflow in two rivers in the USA, the Black and Gila rivers. A recent article by Rahimzad et al. [12] used time-lagged Qt-1, Qt-2, and other climatic variables to forecast Qt and concluded that the LSTM network outperforms linear regression (LR), multilayer perceptron (MLP), and support vector machine (SVM) in forecasting daily streamflow.

A few years back, Cho et al. [13] introduced gated recurrent units (GRUs), similar to LSTM with a forget gate, which have fewer parameters than LSTM, as the GRU lacks an output gate. GRU's capacities in speech signal modelling and natural language processing were similar to those of LSTM. However, there are debates on the relative performance of these two architectures for streamflow and reservoir inflow simulation, which is not well studied over different timescales and environments.

Notwithstanding the difference in their performance, selecting appropriate time series models from the various known deep learning network architectures is difficult. LSTMs and GRUs are not always the ideal sequence prediction option, and simulation with better prediction accuracy, fast running time, and less complicated models requires more research. Hence, this comparative analysis of the network architectures helps decide the optimized alternative for time series analysis. Recently, different hybrid deep learning models have been getting wide attention from researchers in various fields of study. Chen et al. [14] used convolutional neural network (CNN), LSTM, and hybrid CNN-LSTM models for nitrogen oxide emission prediction. They concluded that CNN-LSTM gives an accurate and stable forecast of periodic nitrogen oxide emissions from the refining industry. Moreover, Li et al. [15] used univariate and multivariate time series data as input for LSTM and CNN-LSTM models; for air quality analysis using particulate matter (PM2.5) concentration prediction, the proposed multivariate CNN-LSTM model gave the best result due to low error and short training time.

The integration of CNN and LSTM models benefits time series prediction models in that the LSTM model can efficiently capture long time sequences of pattern information, while CNN models can filter out the noise of the input data and extract more valuable features, which could increase the accuracy of the prediction model [16]. Moreover, integrating CNN with GRU can also lead to robust preprocessing of data, providing a viable option to improve the model's accuracy [17]. Even though combining CNN with LSTM has shown remarkable results in different studies, its application in hydrological fields still demands more research [18]. Muhammad et al. [19] used LSTM, GRU, and hybrid CNN-GRU models for streamflow simulation based on 35 years of the Model Parameter Estimation Experiment (MOPEX) dataset of 10 river basins in the USA. They revealed that the proposed hybrid model outperforms the conventional LSTM; nevertheless, the performance is almost the same as GRU. Recently, Barzegar et al. [20] studied short-term water quality variable prediction using a hybrid CNN-LSTM model and effectively captured low and high water quality variables, mainly dissolved oxygen concentrations.

Screening input variables for different model architectures is also a challenging task for researchers. Even though rainfall, evaporation, and temperature are causal variables for streamflow modelling, data availability and study objectives limit the choice [21]. Van et al. [21] discussed that adding temperature and evapotranspiration input nodes to the model increases the network complexity and causes overfitting. In contrast, Parisouj et al. [22] concluded that using readily available input variables such as temperature and precipitation for data-driven streamflow simulation provides a reliable result. Hence, this research will contribute a step to this debate by testing different input combinations from various climatic regions on the performance of the proposed models.

To the best of our knowledge, there is minimal literature showing the performance variation of different hybrid models for streamflow simulation under various input variability conditions at once. Thus, we compared various forms of hybrid CNN-LSTM and CNN-GRU architectures with the classical MLP, GRU, and LSTM networks to simulate single-step streamflow using two climatic regions, available precipitation, and minimum and maximum temperature data. Moreover, the study tests the hybrid models with different layer arrangements and applies Keras tuner to optimize model hyperparameters. In general, the primary objective of this study is to test the performance variation of the proposed models under extreme input variability conditions, including climatic, input combination, input time window, and average rolling time window variability.

This study used different open-source software and machine learning libraries, including Python 3.6 for programming and the NumPy, pandas, Scikit-learn, Hydroeval, Statsmodels, and Matplotlib libraries, all used for data preprocessing, evaluation, and graphical interpretation. Moreover, the TensorFlow and Keras deep learning frameworks were employed for modelling the deep learning architectures.

2. Study Area

In the present study, two river subcatchments were selected in two climatic regions: the Awash River Basin, Borkena subcatchment in Ethiopia (Figure 1(a)), and the Upper Tiber River Basin in Italy (Figure 1(b)).

2.1. Borkena Watershed (Ethiopia). The first case study area is in the Borkena watershed at the Kombolcha station outlet, located in the upper part of the Awash River Basin in the northern part of Ethiopia. The mainstream of the watershed emanates from Tosa mountain, which is found near Dessie town. The area's altitude ranges from 1,775 m at the lowest site near Kombolcha to 2,638 m at the highest site upstream of Dessie. The main rainy season of this watershed is from July to September.

2.2. Upper Tiber River Basin (Italy). The second case study area is located in the Upper Tiber River Basin (UTRB) in Italy. The Tiber River Basin (TRB) is the second-largest catchment in Italy [23]. Geographically, the basin is located between 40.5°N and 43°N latitude and 10.5°E and 13°E longitude, covering about 17,500 km2, roughly 5% of the Italian territory. The Upper Tiber River Basin (UTRB) is part of the TRB, covering 4145 km2 (~20% of the TRB) with its outlet at Ponte Nuovo. The elevation of the catchment ranges from 148 to 1561 m above sea level. The area's climate is Mediterranean, with precipitation mainly occurring from autumn (September to November) to spring (March to May). Intense rainfall highly influences the basin's hydrology in the upstream part, causing frequent floods in the downstream areas [24].

3. Data Source and Preprocessing

Borkena's required hydrological and meteorological datasets were collected from the Ministry of Water Irrigation and Energy (MoWIE) of Ethiopia and the National Meteorological Agency of Ethiopia (NMA), respectively. UTRB's datasets were collected from the National Research Council of Italy (CNR) and archived for public use on the Water Resource Management and Evaluation (WRME) platform at the following link: http://hydrogate.unipg.it/wrme/.

We collected 5844 available daily records for the Borkena watershed from the time window of January 1, 1999, to December 31, 2014. Similarly, for UTRB, 7670 records were collected from January 1, 1958, to December 31, 1978. Both case study datasets are multivariate and multisite. Even though we carefully chose series time windows with minimum data gaps for both stations, the datasets contain many missing values for different reasons. Thus, our first task for this research was to fill the missing values with the Monte Carlo approach.

The study applied linear correlation statistics to measure the strength of dependency between different input variables [25]. Even though Mehr and Gandomi [26] stated that linear correlation might mislead or provide redundant inputs, our study does not have a huge feature size that requires intensive feature selection criteria. Hence, we adopted a linear correlation coefficient. Moreover, Kun et al. [27] concluded that the Pearson correlation coefficient (PCC) is the most applicable for multiple linear regressions (MLRs), and Oyebode [28] also stated that inputs selected with PCC showed superior model accuracy. Hence, this study applied the Pearson linear correlation coefficient [29, 30]. It has a value ranging between +1 and −1, where +1 indicates a positive linear correlation, 0 indicates no linear correlation, and −1 shows a negative linear correlation [25]. Equation (1) calculates the Pearson correlation coefficient, and Tables 1 and 2 present the result. Correlation values between 0 and 0.3 (positive) or 0 and −0.3 (negative) show a weak linear relationship among variables [31]. However, since we have a small number of variables and a small data size, we decided for this study to omit only the Borkena station Tmax values, whose r values range between −0.129 and +0.107; the details are presented in Table 1.

$$ r = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}. \qquad (1) $$
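As a rough illustration of this screening step (our sketch, not the authors' code), the correlation in equation (1) can be computed directly with pandas; the file name and column names below are hypothetical:

```python
import pandas as pd

# Hypothetical gap-filled daily series for one station: streamflow plus
# candidate meteorological inputs (P, Tmin, Tmax).
df = pd.read_csv("borkena_daily.csv", parse_dates=["date"], index_col="date")

# Pearson r between each candidate input and streamflow, as in equation (1).
r = df[["P", "Tmin", "Tmax"]].corrwith(df["streamflow"], method="pearson")

# Keep inputs outside the weak band (|r| < 0.3 in the text's reading);
# the study dropped only the Borkena Tmax series on this basis.
selected = r[r.abs() >= 0.3].index.tolist()
print(r, selected)
```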
After passing rigorous quality control processes, the raw data were then split chronologically into training and testing datasets with a ratio of 80:20. The time series graph and the corresponding box plot of the split data for both stations are presented in Figure 2. Different options exist in the literature to remove noise from a time series. A sliding window is the first option, used to temporarily approximate the time series data's actual value [32]. Rolling windows (moving averages) are the second option, smoothing the time series data by calculating the average, maximum, minimum, or sum over a specific time [33]. Hence, for this study, we applied average rolling windows to smooth and remove noise from the time series while keeping the length of the data unchanged.

Then, daily, weekly, and monthly average rolling sliding windows were used to rebuild the input and output time series into a supervised learning format. Accordingly, the rolled time series data were prepared with a time lag window of 30 or 45 for single-step streamflow simulation at the Borkena and UTRB stations, respectively. Moreover, the split time series variables were scaled using the Standard Scaler for computational ease and numerical stability of the modelling process.
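The preprocessing chain described above (rolling average, chronological 80:20 split, standardization) maps onto a few library calls. The following is a minimal sketch under the same assumptions, not the study's released script:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Gap-filled daily multivariate frame (inputs plus streamflow);
# file and column layout are hypothetical.
df = pd.read_csv("station_daily.csv", parse_dates=["date"], index_col="date")

# Average rolling window: 7 for weekly or 30 for monthly smoothing; the
# moving mean keeps the series aligned, and only warm-up rows drop out.
rolled = df.rolling(window=30).mean().dropna()

# Chronological 80:20 split; time series must not be shuffled.
n_train = int(len(rolled) * 0.8)
train, test = rolled.iloc[:n_train], rolled.iloc[n_train:]

# Standard Scaler fitted on the training split only, then applied to both.
scaler = StandardScaler().fit(train.values)
train_scaled = scaler.transform(train.values)
test_scaled = scaler.transform(test.values)
```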

Figure 1: Location of case study areas. (a) Borkena (the Borkena watershed and gauging station within the Awash River Basin, Ethiopia). (b) UTRB (the Upper Tiber River Basin and the Ponte Nuovo gauging station within the Tiber River Basin, Italy).

4. Methods

In this study, three types of network architectures, MLP, GRU, and LSTM, were compared with the proposed hybrid deep neural network architectures, CNN-LSTM and CNN-GRU, for the simulation of single-step streamflow by taking different combinations of precipitation (P), minimum temperature (Tmin), and maximum temperature (Tmax) as inputs. The proposed simulation model architectures with their input and output variables are briefly presented as a flowchart in Figure 3.

4.1. Deep Learning Models. Deep learning models are part of a broader family of machine learning and include recurrent neural networks (RNNs), convolutional neural networks (CNNs), deep belief networks (DBNs), and deep neural networks (DNNs). These models have been applied to different fields of study, including speech recognition, computer vision, natural language processing, and time series analysis [13, 16, 34–36]. The following sections briefly discuss the architectures used in the present study.

4.2. Artificial Neural Network (ANN). The artificial neural network (ANN) is the most common machine learning model that has found application in streamflow simulation over the last two decades [1, 37]. It is known for modelling the complex input-output relationships inherent in the hydrological time series features of a river catchment. The traditional feedforward neural network (FFNN), with three layers (input, hidden, and output) trained by the backpropagation (BP) algorithm, gained popularity for nonlinear hydrological time series modelling. Figure 4 displays the typical architecture of an ANN, whose output is computed as

$$ \hat{y}_{j} = f_{j}\left[\sum_{h=1}^{m} w_{jh}\, f_{h}\!\left(\sum_{i=1}^{n} w_{hi} x_{i} + w_{hb}\right) + w_{jb}\right], \qquad (2) $$

where i, h, j, b, and w indicate neurons of the input, hidden, and output layers, the bias, and the applied weight of the neuron, respectively; f_h and f_j are the activation functions of the hidden layer and output layer, respectively; x_i, n, and m represent, respectively, the input value, the number of input neurons, and the number of hidden neurons; and y and ŷ_j denote the observed and calculated target values, respectively. In the calibration phase of the model, the values of the hidden and output layers and the corresponding weights can be varied and calibrated [38].

Figure 4: Typical architecture of ANN.

The ability of ANN to link input and output variables in complex hydrological systems without the need for prior knowledge about the nature of the process has led to a huge leap in the use of ANN models in hydrological simulations [38].
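Read as code, equation (2) is two affine maps with activations. A minimal NumPy rendering with illustrative shapes (not the study's trained weights):

```python
import numpy as np

def mlp_forward(x, W_h, w_hb, W_j, w_jb, f_h=np.tanh, f_j=lambda z: z):
    """Equation (2): hidden layer activation f_h, then output layer f_j."""
    hidden = f_h(W_h @ x + w_hb)     # f_h(sum_i w_hi * x_i + w_hb)
    return f_j(W_j @ hidden + w_jb)  # f_j(sum_h w_jh * hidden_h + w_jb)

rng = np.random.default_rng(0)
n, m = 3, 5                                   # input and hidden neuron counts
x = rng.normal(size=n)
W_h, w_hb = rng.normal(size=(m, n)), rng.normal(size=m)
W_j, w_jb = rng.normal(size=(1, m)), rng.normal(size=1)
y_hat = mlp_forward(x, W_h, w_hb, W_j, w_jb)  # calculated target value
```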
4.3. Long Short-Term Memory (LSTM). The difference of LSTM from the classical MLP network is that the layers of neurons in LSTM have recurrent connections; thus, the state from the previous activation time step is used to formulate an output. The LSTM replaces the typical neuron in the hidden layer with a memory cell and three gates: an input gate, a forget gate, and an output gate [39]. It is an advanced form of recurrent neural network (RNN) that can capture long-term dependencies. An RNN, in turn, is a circular network in which an additional input is added to represent the state of the neuron in the hidden layer at the previous time steps [40]. LSTM has two critical benefits over RNN: overcoming vanishing and exploding gradients and holding memory to capture long-term temporal dependencies in input sequences. The mathematical formulation of the different parameters is listed in Table 3, and Figure 5 displays the LSTM memory cell with three gated layers. In Table 3, W_i, W_f, W_o, and W_c are the weights that map the hidden layer input to the three gates of input, forget, and output; the U_i, U_f, U_o, and U_c weight matrices map the hidden layer output to the gates; b_i, b_f, b_o, and b_c are bias vectors; and C_t and h_t are the outcome of the cell and the outcome of the layer, respectively.

Table 1: Descriptive statistics of split time series data for the Borkena watershed (r = Pearson correlation with streamflow; training and testing columns give Mean / Max / Min / SD).

Station   | Data type           | r      | Training data (80%)       | Testing data (20%)
Kombolcha | Streamflow (m3/sec) | 1.000  | 10.9 / 216.9 / 0.0 / 23.2 | 10.1 / 94.8 / 0.0 / 20.2
Kombolcha | P (mm/day)          | 0.321  | 3.1 / 73.2 / 0.0 / 7.5    | 2.9 / 60.4 / 0.0 / 7.2
Kombolcha | Tmin (°C)           | 0.271  | 12.5 / 20.9 / 1.5 / 3.3   | 12.5 / 20.6 / 2.6 / 3.4
Kombolcha | Tmax (°C)           | −0.099 | 27.2 / 33.6 / 16.4 / 2.5  | 27.3 / 33.0 / 19.6 / 2.1
Chefa     | P (mm/day)          | 0.344  | 3.5 / 81.6 / 0.0 / 8.6    | 3.4 / 64.3 / 0.0 / 8.1
Chefa     | Tmin (°C)           | 0.266  | 13.3 / 21.5 / 0.1 / 3.7   | 14.1 / 22.2 / 3.9 / 3.5
Chefa     | Tmax (°C)           | −0.069 | 29.9 / 38.0 / 18.5 / 2.8  | 30.3 / 38.0 / 22.2 / 2.5
Dessie    | P (mm/day)          | 0.335  | 3.5 / 80.6 / 0.0 / 8.6    | 2.9 / 67.0 / 0.0 / 7.3
Dessie    | Tmin (°C)           | 0.319  | 8.5 / 15.5 / 0.1 / 2.5    | 7.8 / 15.5 / 0.0 / 3.1
Dessie    | Tmax (°C)           | 0.107  | 23.8 / 30.0 / 16.0 / 1.9  | 24.1 / 30.0 / 15.0 / 2.1
Kemise    | P (mm/day)          | 0.372  | 3.1 / 81.9 / 0.0 / 8.3    | 2.9 / 72.1 / 0.0 / 7.5
Kemise    | Tmin (°C)           | 0.282  | 13.8 / 22.0 / 3.0 / 3.4   | 13.5 / 20.1 / 4.5 / 3.6
Kemise    | Tmax (°C)           | −0.129 | 31.0 / 38.3 / 14.0 / 2.7  | 31.9 / 37.8 / 23.5 / 2.4
Majete    | P (mm/day)          | 0.347  | 3.3 / 80.7 / 0.0 / 8.6    | 3.3 / 81.3 / 0.0 / 8.6
Majete    | Tmin (°C)           | 0.202  | 14.7 / 23.0 / 1.4 / 2.9   | 14.6 / 21.5 / 6.7 / 2.9
Majete    | Tmax (°C)           | −0.057 | 28.6 / 37.8 / 17.2 / 2.8  | 29.1 / 38.0 / 20.8 / 2.4

Table 2: Descriptive statistics of split time series data for the UTRB (r = Pearson correlation with streamflow; training and testing columns give Mean / Max / Min / SD).

Station       | Data type           | r      | Training data (80%)       | Testing data (20%)
Ponte Nuovo   | Streamflow (m3/sec) | 1.000  | 50.6 / 939.0 / 1.9 / 75.5 | 50.6 / 737.0 / 3.7 / 68.6
Castel Rigone | P (mm/day)          | 0.384  | 2.6 / 72.8 / 0.0 / 6.6    | 2.7 / 67.7 / 0.0 / 6.9
Montecoronaro | P (mm/day)          | 0.339  | 3.9 / 229.0 / 0.0 / 10.7  | 4.0 / 110.0 / 0.0 / 10.5
Perugia (ISA) | P (mm/day)          | 0.379  | 2.4 / 120.4 / 0.0 / 6.6   | 2.5 / 61.8 / 0.0 / 6.3
Perugia (ISA) | Tmin (°C)           | −0.353 | 9.7 / 30.4 / −9.0 / 6.3   | 9.3 / 25.2 / −5.0 / 5.6
Perugia (ISA) | Tmax (°C)           | −0.379 | 17.4 / 37.4 / −4.5 / 8.1  | 16.3 / 33.0 / 0.6 / 7.2
Petrelle      | P (mm/day)          | 0.345  | 2.51 / 90.0 / 0.0 / 6.9   | 2.7 / 117.1 / 0.0 / 7.4
Pietralunga   | P (mm/day)          | 0.428  | 3.22 / 150.0 / 0.0 / 8.1  | 3.1 / 73.1 / 0.0 / 7.3
Spoleto       | P (mm/day)          | 0.412  | 2.9 / 113.6 / 0.0 / 7.9   | 2.9 / 94.2 / 0.0 / 7.8
Spoleto       | Tmin (°C)           | −0.265 | 7.5 / 23.0 / −12.6 / 6.4  | 8.8 / 21.7 / −5.4 / 5.8
Spoleto       | Tmax (°C)           | −0.383 | 18.8 / 38.7 / −3.5 / 8.6  | 18.7 / 36.8 / 2.0 / 7.8
Torgiano      | P (mm/day)          | 0.364  | 2.4 / 141.2 / 0.0 / 7.1   | 2.5 / 62.0 / 0.0 / 6.9
Gubbio        | Tmin (°C)           | −0.315 | 8.7 / 26.0 / −12.0 / 5.9  | 6.1 / 19.3 / −11.3 / 5.4
Gubbio        | Tmax (°C)           | −0.377 | 18.1 / 39.0 / −8.0 / 8.1  | 17.4 / 34.1 / −0.9 / 7.5
Assisi        | Tmin (°C)           | −0.325 | 9.2 / 25.6 / −11.6 / 6.2  | 8.2 / 21.5 / −8.0 / 5.6
Assisi        | Tmax (°C)           | −0.378 | 18.2 / 37.8 / −5.0 / 8.3  | 18.1 / 35.8 / 0.0 / 7.8

4.4. Gated Recurrent Unit (GRU). GRU is a special type of LSTM architecture that merges the input and forget gates into an update gate, which reduces the number of parameters and makes training easier. There are two input features at each time step: the input vector x_t and the previous output vector h_{t−1}. The output of each specific gate can be calculated through a logical operation and nonlinear transformation of the input [34]. The mathematical formulations among the inputs, outputs, and different parameters are listed in equations (3)–(6), where z_t is the update gate vector, r_t is the reset gate vector, W and U are parameter matrices, σ is the sigmoid function, and tanh is the hyperbolic tangent. Figure 6 displays the structure of the gated recurrent unit (GRU) network.

$$ z_{t} = \sigma\left(W_{z} x_{t} + U_{z} h_{t-1} + b_{z}\right), \qquad (3) $$

$$ r_{t} = \sigma\left(W_{r} x_{t} + U_{r} h_{t-1} + b_{r}\right), \qquad (4) $$

$$ \tilde{h}_{t} = \tanh\left(W_{h} x_{t} + r_{t} \ast h_{t-1} + b_{h}\right), \qquad (5) $$

$$ h_{t} = \left(1 - z_{t}\right) \ast h_{t-1} + z_{t} \ast \tilde{h}_{t}. \qquad (6) $$
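A direct NumPy transcription of one GRU step as equations (3)–(6) are printed (placeholder weights; note that some formulations apply a second matrix U_h to the reset term in equation (5)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, U_z, b_z, W_r, U_r, b_r, W_h, b_h):
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)     # update gate, eq. (3)
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)     # reset gate, eq. (4)
    h_cand = np.tanh(W_h @ x_t + r_t * h_prev + b_h)  # candidate state, eq. (5)
    return (1.0 - z_t) * h_prev + z_t * h_cand        # new hidden state, eq. (6)
```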
4.5. Convolutional Neural Network (CNN). The convolutional neural network (CNN) is one of the most successful deep learning models, especially for feature extraction, and its network structures include 1D CNN, 2D CNN, and 3D CNN [15]. A CNN structure generally consists of a convolution layer, a pooling layer, and a full connection layer [18].

Figure 2: Streamflow time series graph and the corresponding box plot of split data. (a) Borkena. (b) UTRB.

1D CNN is mainly implemented for sequence data processing [41], 2D CNN is usually used for text and image identification [42], and 3D CNN is usually recognized for modelling medical image and video data identification [43]. Hence, since the aim of the present study is time series analysis, we implemented 1D CNN. The detailed process of 1D CNN is described in Figure 7. As depicted in Figure 7, the input series is convoluted to the convolution layer from top to bottom (shown by the arrows). The grey or mesh colours represent different filters, and the size of the convolution layer depends on the number of input data dimensions, the size of the filter, and the convolution step length.

4.6. CNN-LSTM and CNN-GRU Hybrid Models. In this study, hybrid models were designed by integrating CNN with LSTM or GRU layers. The feature sequence from the CNN layer is considered as the input for the LSTM or GRU layer, from which the short- and long-time dependencies are further extracted.

The proposed CNN-LSTM or CNN-GRU models contain two main components: the first component consists of one-dimensional single or double convolutional and average pooling layers, with a flatten layer connected to further process the data into the format required by the LSTM or GRU. In the second component, the generated features are processed using LSTM, GRU, and dense layers. Additionally, dropouts are introduced to prevent overfitting. Figure 8 shows the designed model inputs and outputs with a basic description of the convolutional, pooling, and LSTM or GRU layers proposed for this project; a sketch of this stacking is given below.
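The following is our reading of that stacking in Keras, shaped to match the single-convolution CNN-GRU2 structure later reported in Figure 11 and Table 8 (filter count, units, and dropout rates are the tuned Borkena values; everything else is an assumption, not the authors' released code):

```python
from tensorflow.keras import layers, models

def build_cnn_gru(window=30, n_features=5):
    """One 1D conv + average pooling block feeding stacked GRU layers."""
    model = models.Sequential([
        # (batch, subsequences, window, features), as in Figure 11.
        layers.Input(shape=(None, window, n_features)),
        layers.TimeDistributed(layers.Conv1D(24, kernel_size=2, activation="relu")),
        layers.TimeDistributed(layers.AveragePooling1D(pool_size=3)),
        layers.TimeDistributed(layers.Flatten()),
        layers.GRU(15, return_sequences=True),
        layers.Dropout(0.1),
        layers.GRU(20),
        layers.Dropout(0.2),
        layers.Dense(1),  # single-step streamflow output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_cnn_gru()
model.summary()  # reproduces the Figure 11 shapes: (?, ?, 30, 5) -> (?, 1)
```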
5. Data Analysis

Simulation with deep learning requires selecting a probable combination of hyperparameters: batch size, epochs, number of layers, and number of units for each layer [8]. Optimizing hyperparameters is not always consistent, as there is no hard rule to follow: "The process is more of an art than a science" [44]. Hence, in this study, we chose the Keras tuner optimizer, developed by the Google team and included in the Keras open library [45, 46].

5.1. Hyperparameter Optimization. Tuning machine learning model hyperparameters is critical. Varying hyperparameter values often results in models with significantly different performances [47]. The models applied in this study mainly contain two types of hyperparameters: constant hyperparameters, which are not altered through the optimization process, and variable hyperparameters. The Adam optimizer is applied under the category of constant hyperparameters because of its efficiency and ease of implementation; it requires minimum memory and is commonly suited to different problems [48]. In this category, the rectified linear unit (ReLU) was used as the activation function, and the mean squared error (MSE) was used as the loss function.

Figure 3: A simple architecture of the proposed models. Catchment input time series P(t), Tmin(t), and Tmax(t) through P(t+n), Tmin(t+n), and Tmax(t+n) feed the MLP, GRU, LSTM, CNN-LSTM, and CNN-GRU model architectures, which output the simulated streamflow Q(t+n).

In contrast, the second type, the variable hyperparameters, is optimized by the Keras tuner, and the hyperparameter choices or value ranges for optimization are set using different trials. We also considered our PC capacity (processor: Intel(R) Core(TM) i7-6500U CPU at 2.50 GHz; RAM: 8 gigabytes) with the Windows 10 operating system. Hyperparameters are optimized with 20 trials, and since deep learning networks produce different training and validation plots for each run, we decided to repeat each trial three times.

All hyperparameter ranges or choices are listed in Table 4. The CNN-LSTM1 and CNN-GRU1 models used hyperparameter values from numbers 1 to 13 for optimization (Table 4), while numbers 4, 5, and 6 were omitted for CNN-LSTM2 and CNN-GRU2. The remaining deep learning models, MLP, LSTM, and GRU, used the list of hyperparameters from numbers 7 to 13. Finally, each optimized hyperparameter set is used for each training and testing experiment. Moreover, the training and test traces from each run can be plotted to give a more robust idea of the behaviour of the model and to inspect overfitting and underfitting issues.

5.2. Performance Measures. A wide variety of evaluation metrics are listed in the literature [49]. However, the popular ones are the mean error (ME), coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), mean percentage error (MPE), mean absolute percentage error (MAPE), mean absolute scaled error (MASE), and Nash–Sutcliffe efficiency (NSE). This study used different input and model variability conditions. Hence, to concisely measure the analysis output and present the result, we applied the following top three standard performance evaluation criteria, which also have the potential to capture extreme streamflow time series values effectively [50].

Coefficient of determination (R2):

$$ R^{2} = \left(\frac{n\sum Q_{\mathrm{obs}} Q_{\mathrm{sim}} - \left(\sum Q_{\mathrm{obs}}\right)\left(\sum Q_{\mathrm{sim}}\right)}{\sqrt{\left[n\sum Q_{\mathrm{obs}}^{2} - \left(\sum Q_{\mathrm{obs}}\right)^{2}\right]\left[n\sum Q_{\mathrm{sim}}^{2} - \left(\sum Q_{\mathrm{sim}}\right)^{2}\right]}}\right)^{2}. \qquad (7) $$

Root mean square error (RMSE):

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{N}\left(Q_{\mathrm{obs}}^{t} - Q_{\mathrm{sim}}^{t}\right)^{2}}{N}}. \qquad (8) $$

Mean absolute error (MAE):

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Q_{\mathrm{obs}}^{t} - Q_{\mathrm{sim}}^{t}\right|, \qquad (9) $$

where Q_obs is the observed discharge, Q_sim is the simulated discharge, and n (or N) is the number of observations. The range of R2 lies between 0 and 1, representing, respectively, no correlation and a perfect correlation between observed and simulated values, whereas RMSE and MAE scores close to zero indicate the best model performance.
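For reference, equations (7)–(9) in NumPy form (a small illustrative helper we add here; the study itself lists Hydroeval among its libraries):

```python
import numpy as np

def r2(q_obs, q_sim):
    """Coefficient of determination, equation (7): squared Pearson r."""
    return np.corrcoef(q_obs, q_sim)[0, 1] ** 2

def rmse(q_obs, q_sim):
    """Root mean square error, equation (8)."""
    return float(np.sqrt(np.mean((q_obs - q_sim) ** 2)))

def mae(q_obs, q_sim):
    """Mean absolute error, equation (9)."""
    return float(np.mean(np.abs(q_obs - q_sim)))
```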

Table 3: Mathematical formulation for the LSTM cell.

Network gate | Purpose                                                                     | Equation
Forget gate  | Chooses the information to reject from the cell                            | f_t = σ(U_f x_t + W_f h_{t−1} + b_f)
Input gate   | Decides what information is relevant to update in the current cell state   | i_t = σ(U_i x_t + W_i h_{t−1} + b_i)
Output gate  | Decides what to output based on input and the long-term memory of the cell | o_t = σ(U_o x_t + W_o h_{t−1} + b_o)
Cell state   | Long-term memory                                                            | C̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c); C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t
Hidden state | Short-term memory                                                           | h_t = tanh(C_t) ∗ o_t
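Taken together, the Table 3 equations perform one cell update per time step; a NumPy transcription with a placeholder parameter dictionary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM update following Table 3; p holds the named weight arrays."""
    f_t = sigmoid(p["Uf"] @ x_t + p["Wf"] @ h_prev + p["bf"])    # forget gate
    i_t = sigmoid(p["Ui"] @ x_t + p["Wi"] @ h_prev + p["bi"])    # input gate
    o_t = sigmoid(p["Uo"] @ x_t + p["Wo"] @ h_prev + p["bo"])    # output gate
    c_cand = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"]) # candidate cell
    c_t = f_t * c_prev + i_t * c_cand                            # cell state
    h_t = np.tanh(c_t) * o_t                                     # hidden state
    return h_t, c_t
```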

Figure 5: LSTM memory cell with three gated layers [11].

Figure 6: The structure of the gated recurrent unit (GRU) network [34].

Figure 7: The process of 1D CNN [15]. For a filter of size 2×1 with weights W1 and W2 sliding over a multivariate input series P1, P2, ..., Pn, the convolution layer outputs are C1 = W1·P1 + W2·P2, C2 = W1·P2 + W2·P3, ..., Cn = W1·Pn−1 + W2·Pn.
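The sliding products in Figure 7 amount to a valid-mode 1D convolution; a few lines of NumPy reproduce them:

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # input series P1..Pn
w = np.array([0.5, -1.0])                # filter weights W1, W2

# C_k = W1*P_k + W2*P_(k+1), exactly as annotated in Figure 7.
c = np.array([w[0] * p[k] + w[1] * p[k + 1] for k in range(len(p) - 1)])
```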



Figure 8: The basic architecture of the proposed CNN-LSTM or CNN-GRU models. Daily multivariate time series (daily, weekly rolled, and monthly rolled averages; P and Tmin of the past 30 days for the Borkena stations, and P, Tmin, and Tmax of the past 45 days for the Upper Tiber stations) enter a 1D conv layer followed by AveragePooling, Flatten, two LSTM or GRU layers each followed by Dropout, and a final Dense layer producing the single-step streamflow output.

Table 4: Model hyperparameter choices or value ranges for optimization by Keras tuner.**

No | Hyperparameter                                                            | Choices             | Value range (Min–Max, Step) | Default
1  | Conv_1_filter                                                             | —                   | 8–32, step 8                | —
2  | Conv_1_kernal                                                             | 2 or 3              | —                           | —
3  | Conv_1_pool_size                                                          | 2 or 3              | —                           | —
4  | Conv_2_filter                                                             | —                   | 8–32, step 8                | —
5  | Conv_2_kernal                                                             | 2 or 3              | —                           | —
6  | Conv_2_pool_size                                                          | 2 or 3              | —                           | —
7  | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 1 units | —                   | 5–30, step 5                | —
8  | Dropout 1                                                                 | —                   | 0.0–0.3, step 0.1           | 0.2
9  | CNN-LSTM1, CNN-LSTM2, CNN-GRU1, CNN-GRU2, LSTM, GRU, or MLP layer 2 units | —                   | 5–30, step 5                | —
10 | Dropout 2                                                                 | —                   | 0.0–0.3, step 0.1           | 0.2
11 | Learning rate                                                             | 1e-2, 1e-3, or 1e-4 | —                           | —
12 | Number of epochs                                                          | —                   | 10–100, step 10             | —
13 | Number of batch sizes                                                     | —                   | 10–100, step 10             | —

**Keras tuner settings: objective = "validation loss," max trials = 20, and executions per trial = 3. "—" means not applicable.
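A condensed sketch of how the Table 4 search space can be wired into Keras tuner for the recurrent models (our hypothetical adaptation; the numeric ranges follow Table 4, the rest is assumed):

```python
import keras_tuner as kt
from tensorflow.keras import layers, models, optimizers

def build_model(hp):
    model = models.Sequential()
    model.add(layers.Input(shape=(30, 2)))  # e.g., 30-day window of P and Tmin
    model.add(layers.GRU(hp.Int("layer_1_units", 5, 30, step=5),
                         return_sequences=True))
    model.add(layers.Dropout(hp.Float("dropout_1", 0.0, 0.3, step=0.1, default=0.2)))
    model.add(layers.GRU(hp.Int("layer_2_units", 5, 30, step=5)))
    model.add(layers.Dropout(hp.Float("dropout_2", 0.0, 0.3, step=0.1, default=0.2)))
    model.add(layers.Dense(1))
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=optimizers.Adam(learning_rate=lr), loss="mse")
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss",
                        max_trials=20, executions_per_trial=3)
# Epochs and batch size are themselves tuned in the paper (rows 12-13 of
# Table 4); a fixed example call would be:
# tuner.search(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
```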

Table 5: Daily streamflow simulation: performance comparison of the proposed models for different input variables and climatic conditions (each cell gives RMSE / MAE / R2 / TTPE*).

Model     | Borkena, P + Tmin         | Borkena, P                | UTRB, P + Tmin + Tmax       | UTRB, P
MLP       | 9.91 / 5.01 / 0.77 / 0.89 | 9.38 / 4.63 / 0.79 / 0.63 | 49.11 / 22.74 / 0.49 / 0.78 | 56.57 / 28.14 / 0.33 / 0.41
GRU       | 8.78 / 4.37 / 0.82 / 3.61 | 7.94 / 3.64 / 0.85 / 3.32 | 46.63 / 20.89 / 0.55 / 2.61 | 51.09 / 26.74 / 0.45 / 3.39
LSTM      | 8.41 / 4.09 / 0.83 / 2.35 | 9.65 / 4.87 / 0.78 / 2.92 | 48.64 / 22.79 / 0.51 / 3.86 | 48.59 / 25.00 / 0.51 / 5.98
CNN-LSTM1 | 8.09 / 4.07 / 0.84 / 0.46 | 8.57 / 4.67 / 0.82 / 0.41 | 51.20 / 22.95 / 0.45 / 1.19 | 56.16 / 26.55 / 0.34 / 0.57
CNN-LSTM2 | 7.99 / 4.09 / 0.85 / 0.72 | 9.14 / 4.50 / 0.80 / 0.45 | 45.38 / 21.85 / 0.57 / 0.82 | 51.57 / 25.84 / 0.44 / 1.85
CNN-GRU1  | 7.94 / 3.66 / 0.85 / 0.63 | 8.32 / 4.09 / 0.83 / 0.86 | 55.06 / 23.49 / 0.37 / 1.16 | 52.42 / 24.98 / 0.43 / 0.83
CNN-GRU2  | 9.07 / 4.19 / 0.80 / 1.01 | 8.43 / 4.26 / 0.83 / 0.28 | 45.61 / 21.79 / 0.57 / 0.64 | 49.96 / 25.38 / 0.48 / 0.68

*TTPE: training time per epoch, in seconds. The bold values in the original indicate the highest performance score.

Table 6: Weekly rolled streamflow simulation: performance comparison of the proposed models for different input variables and climatic conditions (each cell gives RMSE / MAE / R2 / TTPE*).

Model     | Borkena, P + Tmin         | Borkena, P                | UTRB, P + Tmin + Tmax       | UTRB, P
MLP       | 8.11 / 4.29 / 0.84 / 0.23 | 7.33 / 4.19 / 0.87 / 0.22 | 33.01 / 20.01 / 0.60 / 0.74 | 38.03 / 25.17 / 0.47 / 0.79
GRU       | 7.59 / 3.71 / 0.86 / 2.04 | 7.15 / 4.13 / 0.87 / 2.43 | 25.21 / 14.83 / 0.77 / 5.56 | 31.39 / 19.26 / 0.64 / 16.79
LSTM      | 8.41 / 4.01 / 0.82 / 2.98 | 7.93 / 3.91 / 0.84 / 1.27 | 31.07 / 18.87 / 0.65 / 3.55 | 31.07 / 19.49 / 0.65 / 2.69
CNN-LSTM1 | 7.90 / 4.09 / 0.85 / 0.78 | 7.72 / 4.25 / 0.85 / 0.63 | 28.04 / 17.33 / 0.71 / 0.93 | 34.57 / 21.92 / 0.57 / 0.62
CNN-LSTM2 | 7.33 / 3.86 / 0.87 / 0.52 | 7.63 / 4.25 / 0.86 / 0.55 | 28.45 / 16.66 / 0.71 / 1.14 | 35.04 / 21.77 / 0.55 / 1.56
CNN-GRU1  | 7.83 / 3.94 / 0.85 / 0.44 | 7.91 / 4.31 / 0.85 / 0.50 | 30.57 / 18.01 / 0.66 / 2.32 | 35.14 / 22.58 / 0.55 / 0.63
CNN-GRU2  | 8.73 / 4.61 / 0.81 / 0.43 | 8.43 / 4.35 / 0.82 / 0.97 | 27.81 / 16.99 / 0.72 / 4.37 | 33.76 / 23.01 / 0.59 / 1.01

*TTPE: training time per epoch, in seconds. The bold values in the original indicate the highest performance score.

Table 7: Monthly rolled streamflow simulation: performance comparison of the proposed models for different input variables and climatic conditions (each cell gives RMSE / MAE / R2 / TTPE*).

Model     | Borkena, P + Tmin         | Borkena, P                | UTRB, P + Tmin + Tmax        | UTRB, P
MLP       | 6.68 / 4.37 / 0.87 / 0.58 | 5.57 / 3.80 / 0.91 / 0.41 | 20.24 / 13.84 / 0.78 / 0.44  | 28.79 / 21.05 / 0.56 / 0.41
GRU       | 5.15 / 3.52 / 0.91 / 1.62 | 5.22 / 3.06 / 0.92 / 3.31 | 20.79 / 14.30 / 0.77 / 16.63 | 26.47 / 20.08 / 0.63 / 4.70
LSTM      | 5.55 / 3.49 / 0.91 / 2.75 | 5.76 / 3.51 / 0.90 / 2.51 | 21.49 / 15.11 / 0.76 / 4.15  | 32.29 / 24.47 / 0.45 / 5.09
CNN-LSTM1 | 6.05 / 4.42 / 0.89 / 0.98 | 5.58 / 3.40 / 0.91 / 0.58 | 21.53 / 14.87 / 0.76 / 1.29  | 27.48 / 21.19 / 0.60 / 0.42
CNN-LSTM2 | 5.36 / 3.17 / 0.92 / 1.41 | 6.87 / 4.05 / 0.86 / 1.44 | 19.07 / 13.53 / 0.81 / 0.70  | 27.79 / 20.90 / 0.59 / 0.42
CNN-GRU1  | 5.76 / 3.62 / 0.90 / 0.52 | 5.77 / 3.56 / 0.90 / 0.69 | 19.31 / 13.78 / 0.80 / 4.87  | 28.67 / 21.07 / 0.57 / 3.08
CNN-GRU2  | 5.36 / 3.25 / 0.92 / 0.62 | 5.15 / 3.18 / 0.92 / 0.78 | 17.98 / 12.99 / 0.83 / 0.71  | 27.77 / 20.36 / 0.59 / 1.22

*TTPE: training time per epoch, in seconds. The bold values in the original indicate the highest performance score.

Figure 9: Training and test loss function of the optimized high score hybrid model. (a) CNN-GRU2 model for Borkena Station. (b) CNN-GRU2 model for UTRB Station.

Figure 10: Comparison of true values and predicted values of the optimized high score hybrid model. (a) CNN-GRU2 model for Borkena Station. (b) CNN-GRU2 model for UTRB Station.

Table 8: Best hybrid model type, input feature, and Keras tuner optimized hyperparameter values for Borkena station with its MSE score.

CNN-GRU2 (input: monthly rolled P)
Conv_1_filter: 24
Conv_1_kernal: 2
Conv_1_pool_size: 3
GRU_l1_units: 15
Dropout1: 0.1
GRU_l2_units: 20
Dropout2: 0.2
Learning rate: 0.0001
Number of epochs: 80
Number of batch sizes: 20
Score (MSE): 0.083

Table 9: Best hybrid model type, input features, and Keras tuner optimized hyperparameter values for UTRB station with its MSE score.

CNN-GRU2 (inputs: monthly rolled P, Tmin, and Tmax)
Conv_1_filter: 8
Conv_1_kernal: 2
Conv_1_pool_size: 2
GRU_l1_units: 20
Dropout1: 0.3
GRU_l2_units: 30
Dropout2: 0.2
Learning rate: 0.0001
Number of epochs: 60
Number of batch sizes: 40
Score (MSE): 0.193

Figure 11: Internal network structure of the optimized high score hybrid CNN-GRU2 model for Borkena Station, with input and output shapes per layer:

InputLayer [(?, ?, 30, 5)] → TimeDistributed(Conv1D): (?, ?, 29, 24) → TimeDistributed(AveragePooling1D): (?, ?, 9, 24) → TimeDistributed(Flatten): (?, ?, 216) → GRU: (?, ?, 15) → Dropout: (?, ?, 15) → GRU: (?, 20) → Dropout: (?, 20) → Dense: (?, 1).

Figure 12: Internal network structure of the optimized high score hybrid CNN-GRU2 model for UTRB Station, with input and output shapes per layer:

InputLayer [(?, ?, 45, 15)] → TimeDistributed(Conv1D): (?, ?, 44, 8) → TimeDistributed(AveragePooling1D): (?, ?, 22, 8) → TimeDistributed(Flatten): (?, ?, 176) → GRU: (?, ?, 20) → Dropout: (?, ?, 20) → GRU: (?, 30) → Dropout: (?, 30) → Dense: (?, 1).

6. Results

Streamflow simulation results with the proposed seven deep learning architectures, different input time window series, two climatic regions, two input combinations, and three average rolling time windows are presented in Tables 5, 6, and 7. Regardless of the combination of these conditions, the CNN-GRU model showed promising results in most of these scenarios (Tables 5 and 7). The highest scores are presented here.

(1) In daily streamflow simulation for Borkena station, CNN-GRU1 scored 7.94, 3.66, 0.85, and 0.63, and for UTRB station, CNN-GRU2 scored 45.61, 21.79, 0.57, and 0.64 for RMSE, MAE, R2, and training time per epoch, respectively.

(2) In weekly rolled streamflow simulation for Borkena station, CNN-LSTM2 scored 7.33, 3.86, 0.87, and 0.52, and for UTRB station, GRU scored 25.21, 14.83, 0.77, and 5.56 for RMSE, MAE, R2, and training time per epoch, respectively.

(3) In monthly rolled streamflow simulation, the CNN-GRU2 model showed high performance with scores of 5.15, 3.18, 0.92, and 0.78 for Borkena station and 17.98, 12.99, 0.83, and 0.71 for UTRB station for RMSE, MAE, R2, and training time per epoch, respectively.

Moreover, of the proposed four hybrid models, CNN-GRU2, the model designed with a single 1D CNN layer, showed the most promising result on trial model 1 (UTRB) and model 3, as shown in Tables 5 and 7. In contrast, GRU on model 2 (UTRB), CNN-LSTM2 on model 2 (Borkena), and CNN-GRU1 on model 1 (Borkena) shared the second-most promising results. Streamflow simulation with the CNN-GRU2 model generally showed higher performance than the other tested hybrid deep learning models and the state-of-the-art LSTM, GRU, and MLP models. In line with our objectives, the results are discussed under different variability conditions in the following paragraphs.

6.1. Climatic Region Variability. Testing models in different climatic conditions with historical data will likely provide robust deep learning models for streamflow simulation in the future [51]. Hence, this research also tested different
models in two climatic regions, and irrespective of climatic and time window variation, the CNN-GRU model displayed the highest scores in the tested case study areas.

6.2. Input Combination Variability. The input combination of minimum temperature (Tmin) with precipitation (P) does not show a significant performance increment at the Borkena station (Tables 5 and 6). In some scenarios, adopting P only as input increases the performance of the model (Table 7). In contrast, for UTRB, streamflow simulation with all input variables, Tmin, Tmax, and P, showed significant performance increments (Table 7).

6.3. Average Rolling Time Window Variability. Streamflow simulation without rolling the daily time series data had deficient performance compared to the monthly rolled average time series. This could be because the time series noise in UTRB is visible compared to that at the Borkena station. As a result, the performance increment from daily to monthly rolled window models is much higher in UTRB than at the Borkena station.

Generally, the monthly rolled time window with the CNN-GRU2 model showed the top performance results at both stations (Table 7). The corresponding training and test loss functions of this optimized high score hybrid model for both stations are displayed in Figure 9. Consequently, Figure 10 compares the true values and predicted values of this model. The optimized hybrid model boosts the performance score and lowers the training time per epoch much better than the GRU and LSTM models. This model, its input features, and the Keras tuner optimized hyperparameter values for both stations with their MSE scores are presented in Tables 8 and 9. Moreover, the internal network structures of these models are shown in Figures 11 and 12, which display the model input and output parameter matrices for each layer.

7. Conclusions

This study showed a comparative analysis of different hybrid deep learning algorithms with state-of-the-art machine learning models for one-step daily streamflow simulation at two river basin or subcatchment streamflow outlets. The proposed algorithms for this study are the CNN-LSTM and CNN-GRU hybrid deep learning models, each having one or two 1D CNN layers, together with the classic MLP, LSTM, and GRU models. This study conducted a series of experiments to observe the performance variation of the proposed models by introducing different input combinations, rolling time windows, and climatic conditions for streamflow simulation. The following points summarize the significant findings of this study.

(i) CNN-GRU2, with one 1D CNN layer, showed the best simulation performance, reporting the lowest RMSE and MAE and the highest R2 of all models in both case study areas. Such results indicate that the performance of the selected architectures is irrespective of the climatic characteristics of the basins.

(ii) Combining temperature data with precipitation as input to the proposed models gave a minimal performance increment at the Borkena station compared to the UTRB case study area, which clearly showed that temperature data scarcity has a greater performance loss implication at the UTRB station. On the other hand, the Borkena station has more significant natural streamflow variability than UTRB, which is also reflected in the model results. This implies that catchment response should be considered before any deep learning model application.

(iii) Rolling the time window of the input and output time series for streamflow simulation using the proposed models increases performance considerably more in the UTRB than at the Borkena station.

(iv) The analysis results also showed that the training time per epoch for the hybrid deep learning models is much lower than that of the GRU and LSTM models.

Deep learning models usually require massive datasets, and their performance drops with small to medium datasets. However, this case study produced acceptable results, and considering the hybrid models' hyperparameter sensitivity and complexity, future research may further design optimized configurations. Moreover, these hybrid models can be tested for long-term streamflow simulation in ephemeral, seasonal, and perennial river systems and in other fields of study. Our future research will try to synchronize the best-performing hybrid deep learning models from this study with remote sensing datasets for the problems we experience in ungauged catchments.

Data Availability

The raw hydrological and meteorological datasets used for the Borkena watershed are available from the corresponding author upon request. However, authorization letters are required from the Ministry of Water Irrigation and Energy (MoWIE) of Ethiopia (http://mowie.gov.et/) and the National Meteorological Agency of Ethiopia (NMA) (http://www.ethiomet.gov.et), whereas for UTRB, the datasets can be retrieved from an online repository (http://hydrogate.unipg.it/wrme/).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The corresponding author acknowledges the Ministry of Water Irrigation and Energy of Ethiopia (MoWIE) and the Italian National Research Center (Centro Nazionale di Ricerca, CNR) for the hydrological and meteorological datasets. The corresponding author also thanks Dr. Fiseha Behulu for his advice and serious follow-up.

References

[1] Z. Zhang, Q. Zhang, and V. P. Singh, "Univariate streamflow forecasting using commonly used data-driven models: literature review and case study," Hydrological Sciences Journal, vol. 63, no. 7, pp. 1091–1111, 2018.
[2] L. Ni, D. Wang, V. P. Singh et al., "Streamflow and rainfall forecasting by two long short-term memory-based models," Journal of Hydrology, vol. 583, Article ID 124296, 2020.
[3] X. Yuan, C. Chen, X. Lei, Y. Yuan, and R. Muhammad Adnan, "Monthly runoff forecasting based on LSTM-ALO model," Stochastic Environmental Research and Risk Assessment, vol. 32, no. 8, pp. 2199–2212, 2018.
[4] A. Sichangi, L. Wang, and Z. Hu, "Estimation of river discharge solely from remote-sensing derived data: an initial study over the Yangtze river," Remote Sensing, vol. 10, no. 9, p. 1385, 2018.
[5] M. J. Tourian, N. Sneeuw, and A. Bárdossy, "A quantile function approach to discharge estimation from satellite altimetry (ENVISAT)," Water Resources Research, vol. 49, no. 7, pp. 4174–4186, 2013.
[6] A. W. Sichangi, L. Wang, K. Yang et al., "Estimating continental river basin discharges using multiple remote sensing data sets," Remote Sensing of Environment, vol. 179, pp. 36–53, 2016.
[7] S. Aljahdali, A. Sheta, and H. Turabieh, "River flow forecasting: a comparison between feedforward and layered recurrent neural network," Learning and Analytics in Intelligent Systems, vol. 43, pp. 523–532, 2020.
[8] D. Couta, Y. Zhang, and Y. Li, "River flow forecasting using long short-term memory," DEStech Transactions on Computer Science and Engineering, vol. 16, 2019.
[9] Y. Bai, N. Bezak, K. Sapač, M. Klun, and J. Zhang, "Short-term streamflow forecasting using the feature-enhanced regression model," Water Resources Management, vol. 28, no. 5, pp. 1327–1343, 2014.
[10] L. C. D. Campos, L. Goliatt da Fonseca, T. L. Fonseca, G. D. de Abreu, L. F. Pires, and Y. Gorodetskaya, "Short-term streamflow forecasting for Paraíba do Sul river using deep learning," Progress in Artificial Intelligence, vol. 43, pp. 507–518, 2019.
[11] B. B. Sahoo, R. Jha, A. Singh, and D. Kumar, "Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting," Acta Geophysica, vol. 67, no. 5, pp. 1471–1481, 2019.
[12] M. Rahimzad, A. Moghaddam Nia, H. Zolfonoon, J. Soltani, A. Danandeh Mehr, and H.-H. Kwon, Performance Comparison of an LSTM-Based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting, Water Resources Management, New York, NY, USA, 2021.
[13] K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder–decoder approaches," in Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103–111, Doha, Qatar, October 2014.
[14] C. Chen, W. He, J. Li, and Z. Tang, "A novel hybrid CNN-LSTM scheme for nitrogen oxide emission prediction in FCC unit," Mathematical Problems in Engineering, vol. 2020, Article ID 8071810, 2020.
[15] T. Li, M. Hua, and X. Wu, "A hybrid CNN-LSTM model for forecasting particulate matter (PM2.5)," IEEE Access, vol. 8, pp. 26933–26940, 2020.
[16] I. E. Livieris, E. Pintelas, and P. Pintelas, "A CNN–LSTM model for gold price time-series forecasting," Neural Computing and Applications, vol. 32, 2020.
[17] A. A. M. Ahmed, R. C. Deo, N. Raj et al., "Deep learning forecasts of soil moisture: convolutional neural network and gated recurrent unit models coupled with satellite-derived MODIS, observations and synoptic-scale climate index data," Remote Sensing, vol. 13, no. 4, p. 554, 2021.
[18] Y. Liu, T. Zhang, A. Kang, J. Li, and X. Lei, "Research on runoff simulations using deep-learning methods," Sustainability, vol. 13, no. 3, p. 1336, 2021.
[19] A. U. Muhammad, X. Li, and J. Feng, "Using LSTM GRU and hybrid models for streamflow forecasting," Machine Learning and Intelligent Communications, vol. 294, pp. 510–524, 2019.
[20] R. Barzegar, M. T. Aalami, and J. Adamowski, "Short-term water quality variable prediction using a hybrid CNN-LSTM deep learning model," Stochastic Environmental Research and Risk Assessment, vol. 34, no. 2, pp. 415–433, 2020.
[21] S. P. Van, H. M. Le, D. V. Thanh, T. D. Dang, H. H. Loc, and D. T. Anh, "Deep learning convolutional neural network in rainfall-runoff modelling," Journal of Hydroinformatics, vol. 22, no. 3, pp. 541–561, 2020.
[22] P. Parisouj, H. Mohebzadeh, and T. Lee, "Employing machine learning algorithms for streamflow prediction: a case study of four river basins with different climatic zones in the United States," Water Resources Management, vol. 34, no. 13, pp. 4113–4131, 2020.
[23] A. Annis and F. Nardi, "Integrating VGI and 2D hydraulic models into a data assimilation framework for real time flood forecasting and mapping," Geo-Spatial Information Science, vol. 22, no. 4, pp. 223–236, 2019.
[24] B. M. Fiseha, S. G. Setegn, A. M. Melesse, E. Volpi, and A. Fiori, "Impact of climate change on the hydrology of upper Tiber River Basin using bias corrected regional climate model," Water Resources Management, vol. 33, no. 14, pp. 4783–4797, 2019.
[25] S. Rania, B. Waad, and E. Nadia, "Hybrid feature selection method based on the genetic algorithm and Pearson correlation coefficient," Machine Learning Paradigms: Theory and Application, Studies in Computational Intelligence, vol. 801, 2000.
[26] A. D. Mehr and A. H. Gandomi, "MSGP-LASSO: an improved multi-stage genetic programming model for streamflow prediction," Information Sciences, vol. 561, pp. 181–195, 2021.
[27] R. Kun, F. Wei, Q. Jihong, Z. Xia, and S. Xiaoyu, "Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets," Journal of Hydrology, vol. 586, Article ID 124897, 2020.
[28] O. Oyebode, "Evolutionary modelling of municipal water demand with multiple feature selection techniques," Journal of Water Supply: Research & Technology - Aqua, vol. 68, no. 4, pp. 264–281, 2019.
[29] R. Dehghani, H. Torabi Poudeh, and Z. Izadi, "Dissolved oxygen concentration predictions for running waters with using hybrid machine learning techniques," Modeling Earth Systems and Environment, vol. 597, 2021.
[30] N. Yuvaraj, V. Chang, B. Gobinathan et al., "Automatic detection of cyberbullying using multi-feature based artificial intelligence with deep decision tree classification," Computers & Electrical Engineering, vol. 92, Article ID 107186, 2021.
[31] B. Ratner, "The correlation coefficient: its values range between +1/−1, or do they?" Journal of Targeting, Measurement and Analysis for Marketing, vol. 17, no. 2, pp. 139–142, 2009.
[32] S. K. Tanbeer, C. F. Ahmed, B.-S. Jeong, and Y.-K. Lee, "Sliding window-based frequent pattern mining over data streams," Information Sciences, vol. 179, no. 22, pp. 3843–3865, 2009.
[33] E. Zivot and J. Wang, "Rolling analysis of time series," in Modeling Financial Time Series with S-Plus, pp. 299–346, Springer, New York, NY, USA, 2003.
[34] Y. Wang, W. Liao, and Y. Chang, "Gated recurrent unit network-based short-term photovoltaic forecasting," Energies, vol. 11, no. 8, p. 2163, 2018.
[35] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu, "Financial time series forecasting with deep learning: a systematic literature review: 2005-2019," Applied Soft Computing, vol. 90, Article ID 106181, 2020.
[36] Q. Zou, Q. Xiong, Q. Li, H. Yi, Y. Yu, and C. Wu, "A water quality prediction method based on the multi-time scale bidirectional long short-term memory network," Environmental Science and Pollution Research, vol. 27, no. 14, pp. 16853–16864, 2020.
[37] S.-C. Wang, "Artificial neural network," Interdisciplinary Computing in Java Programming, vol. 26, pp. 81–100, 2003.
[38] V. Nourani, A. Molajou, H. Najafi, and A. Danandeh Mehr, "Emotional ANN (EANN): a new generation of neural networks for hydrological modeling in IoT," in Artificial Intelligence in IoT. Transactions on Computational Science and Computational Intelligence, pp. 45–61, Springer, New York, NY, USA, 2019.
[39] L. Yan, J. Feng, and T. Hang, "Small watershed stream-flow forecasting based on LSTM," Advances in Intelligent Systems and Computing, vol. 935, pp. 1006–1014, 2019.
[40] S. Zhu, X. Luo, X. Yuan, and Z. Xu, "An improved long short-term memory network for streamflow forecasting in the upper Yangtze River," Stochastic Environmental Research and Risk Assessment, vol. 14, no. 9, 2020.
[41] A. Osama, A. Onur, K. Serkan, G. Moncef, and I. D. Inman, "Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks," Journal of Sound and Vibration, vol. 388, pp. 154–170, 2017.
[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[43] H.-C. Shin, H. R. Roth, M. Gao et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1285–1298, 2016.
[44] A. Essien and C. Giannetti, "A deep learning framework for univariate time series prediction using convolutional LSTM stacked autoencoders," in Proceedings of the IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA), pp. 1–6, Alberobello, Italy, July 2019.
[45] D. S. Abdelminaam, F. H. Ismail, M. Taha, A. Taha, E. H. Houssein, and A. Nabil, "CoAID-DEEP: an optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter," IEEE Access, vol. 9, pp. 27840–27867, 2021.
[46] O. Kazakov and O. Mikheenko, "Transfer learning and domain adaptation based on modeling of socio-economic systems," Business Inform, vol. 14, no. 2, pp. 7–20, 2020.
[47] B. Wang and N. Z. Gong, "Stealing hyperparameters in machine learning," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 36–52, San Francisco, CA, USA, May 2018.
[48] C. Boyraz and Ş. N. Engin, "Streamflow prediction with deep learning," in Proceedings of the 6th International Conference on Control Engineering Information Technology (CEIT), pp. 1–5, Istanbul, Turkey, October 2018.
[49] E. K. Jackson, W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson, and D. P. Ames, "Introductory overview: error metrics for hydrologic modelling - a review of common practices and an open source library to facilitate use and adoption," Environmental Modelling & Software, vol. 119, pp. 32–48, 2019.
[50] S. Kumar, T. Roshni, and D. Himayoun, "A comparison of emotional neural network (ENN) and artificial neural network (ANN) approach for rainfall-runoff modelling," Civil Engineering Journal, vol. 5, no. 10, pp. 2120–2130, 2019.
[51] P. Bai, X. Liu, and J. Xie, "Simulating runoff under changing climatic conditions: a comparison of the long short-term memory network with two conceptual hydrologic models," Journal of Hydrology, vol. 592, Article ID 125779, 2021.
