
Renewable and Sustainable Energy Reviews


journal homepage: www.elsevier.com/locate/rser

An overview of performance evaluation metrics for short-term statistical wind power forecasting

J.M. González-Sopeña a, V. Pakrashi b,c,∗, B. Ghosh a

a Department of Civil, Structural and Environmental Engineering, Museum Building, Trinity College Dublin, Dublin 2, Ireland
b Dynamical Systems and Risk Laboratory, School of Mechanical & Materials Engineering, University College Dublin, Dublin, Ireland
c SFI MaREI Centre, University College Dublin and the Energy Institute, University College Dublin, Ireland

ARTICLE INFO

Keywords: Wind power forecasting; Accuracy estimation; Performance evaluation metrics; Hybrid decomposition-based models

ABSTRACT

Wind power forecasting has become an essential tool for energy trading and the operation of the grid due to the increasing importance of wind energy. Therefore, estimating the forecast accuracy of a WPF model and understanding how that accuracy is calculated are necessary steps to appropriately validate WPF models. The present study gives an extensive overview of the performance evaluation methods used for assessing the forecast accuracy of short-term statistical wind power forecast estimates, and the concept of robustness is introduced to determine the validity of a model over different wind power generation scenarios in the testing set. Finally, a numerical study using decomposition-based hybrid models is presented to analyse the robustness of the performance evaluation metrics under different conditions in the context of wind power forecasting. Data from Ireland are employed at two different resolutions to examine their influence on the forecast accuracy.

1. Introduction

Wind power forecasting (WPF) has established itself as one of the main challenges faced by the energy industry due to the stochastic character of the wind. As the penetration of wind power in the grid increases, accurate WPFs become more necessary, since they improve the performance of the energy market [1] and the operation of the grid [2]. Furthermore, WPF errors cannot be prevented and, consequently, they must be reduced and properly assessed to evaluate the validity of a WPF model.

WPF models can be broadly divided into physical and statistical methods. Physical forecasting models draw on meteorological information and specific site conditions at a current or future wind farm, combined with the laws of physics, to produce predictions. On the other hand, statistical models are built using historical data of the wind farm. Statistical methods such as time series modelling are used to predict future values of wind power output. Alternatively, statistical models are based on machine learning techniques, such as artificial neural networks (ANNs), or deep learning. Physical models do not require historical data from the wind farm.

Different aspects of WPF have been discussed in other papers. A chronological evolution of short-term WPF from a qualitative point of view is provided in [3]. [4] presents an overview of the main numerical wind prediction methodologies, such as upscaling and downscaling, and also of WPF models based on statistical and machine learning methods. [5] introduces a collection of data-mining techniques to discover hidden patterns in a dataset. Some of these techniques are cluster analysis, to split a dataset into different groups with similar characteristics, or association analysis, to find relationships among observations. The performance of these techniques is evaluated for different time horizons. [6] analyses probabilistic methodologies to predict wind power generation and classifies them into three different categories: probabilistic forecasting [7], where the output is regarded as a random variable; risk index [8], where an index is used to define the level of uncertainty of the WPF; and scenario forecasting [9,10], where statistical scenarios are generated considering the spatial and temporal interdependence of prediction errors. [11] overviews combined forecasting techniques for wind speed and wind power prediction. Forecasting models are usually combined by estimating the output independently for each model and afterwards assigning a weight coefficient depending on the efficiency of every model. [12] gives an in-depth analysis of wind power ramp forecasting, a subclass of wind power prediction that focuses on large and fast variations of wind power known as ramp events. [13] discusses the impact of different sources of uncertainty on the wind power forecast, such as the weather conditions or the prediction algorithm.

∗ Correspondence to: University College Dublin, School of Mechanical and Materials Engineering, Engineering Building Belfield Dublin 4, Ireland.
E-mail address: vikram.pakrashi@ucd.ie (V. Pakrashi).

https://doi.org/10.1016/j.rser.2020.110515
Received 24 December 2019; Received in revised form 24 September 2020; Accepted 23 October 2020
1364-0321/© 2020 Elsevier Ltd. All rights reserved.


None of the discussed studies goes into the limitations of evaluating WPFs, especially in terms of robustness, meaning that the forecast accuracy of a certain WPF model should not be affected by the wind power generation process and should therefore be similar for different scenarios of wind power generation [14]. Even though performance evaluation metrics are introduced in the literature to analyse the forecast accuracy of WPF models, the robustness of the models is often disregarded. The main contributions of this paper are the review of the main performance evaluation metrics used in the literature to assess WPF models and the empirical evaluation of this feature through a case study using a set of WPF models and two different multi-step ahead forecast strategies. Furthermore, forecasts are estimated at two different resolutions (10 and 60 min) to examine their influence on the evaluation of the models.

The remainder of this paper is organized as follows. Section 2 provides an overview of recently proposed WPF models. Section 3 presents the different performance evaluation methods for assessing WPFs. Section 4 presents different techniques used to estimate multi-step ahead forecasts, which are considered for longer prediction horizons. Section 5 presents a numerical study with data from Ireland. Section 6 includes the concluding remarks of this paper.

Abbreviations

ACE Average Coverage Error
AIC Akaike Information Criterion
ANN Artificial Neural Network
AR Autoregressive
ARMA Autoregressive Moving Average
ARIMA Autoregressive Integrated Moving Average
BIC Bayesian Information Criterion
CRPS Continuous Ranked Probability Score
CWC Coverage Width-based Criterion
EEMD Ensemble Empirical Mode Decomposition
ELM Extreme Learning Machine
FFNN Feedforward Neural Network
IA Index of Agreement
IS Interval Sharpness
KDE Kernel Density Estimation
LS-SVM Least Squares Support-vector Machine
LUBE Lower Upper Bound Estimation
MA Moving Average
MAAPE Mean Arctangent Absolute Percentage Error
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MASE Mean Absolute Scaled Error
MDL Minimum Description Length
MIMO Multiple-Input Multiple-Output
NMAE Normalized Mean Absolute Error
NRMSE Normalized Root Mean Square Error
PDF Probability Density Function
PI Prediction Interval
PICP PI Coverage Probability
PINAW PI Normalized Average Width
PINC PI Nominal Confidence
QR Quantile Regression
RMSE Root Mean Square Error
RNN Recurrent Neural Network
SC Skill Score
SDE Standard Deviation Error
SVM Support-vector Machine
TSA Time Series Analysis
VAR Vector Autoregression
VMD Variational Mode Decomposition
WPF Wind Power Forecasting

2. Overview of WPF models

Traditionally, WPF forecasts have been provided as point or deterministic estimates, meaning that a single value is computed for every time step to forecast. Nonetheless, every prediction is associated with a certain degree of uncertainty which is impossible to reduce entirely. Probabilistic forecasts overcome this issue and make it possible to obtain a probabilistic estimate for future wind power outputs. Several representations for probabilistic estimates are found in the literature: predictive densities, which represent the probability distribution of future outputs; quantiles, which divide the probability distribution into intervals; and prediction intervals (PIs), which provide the range where a value will be located under a given distribution. The latter representation tends to be more appealing for end users, so PIs are usually provided to assess probabilistic estimates.

Comparing the different WPF models proposed in the literature is a challenging task as they are tested under different conditions. Firstly, the inputs given to the model differ, as univariate time series can be considered (only wind power data) or multivariate time series that reflect the dependency on other variables such as wind speed or wind direction. Another condition is the dataset: in terms of its scale, as the model is fitted at either a turbine, farm or national production level; in terms of sample size, as it varies from a few months to approximately three years of data to benchmark the model; and in terms of time-resolution, usually from 10 min to 1 h, as intra-hour resolution data show higher volatility [15]. Forecasting competitions represent a good opportunity to compare different forecasting models [16,17], as they are evaluated under the same rules and using the same dataset.

Model performance is evaluated using deterministic estimates, probabilistic estimates, or both. In this study, the aim is to create a benchmark of performance evaluation metrics from the existing literature. A brief overview of WPF models is presented to provide a context to the forecasting estimates. Fig. 1 shows an overview of the major categorizations of WPF models and estimates. Further details on the modelling of wind power are found in [18,19].

2.1. Time series analysis

In the realm of time-series analysis (TSA), autoregressive (AR) processes are modelled using a combination of previous values of the variable, whereas moving average (MA) processes are modelled combining previous forecast errors. They can be merged together resulting in ARMA processes, and generalized to non-stationary processes by differencing the original time series, generating what are known as ARIMA (autoregressive integrated moving average) models. Compared to ANNs and other machine learning models, TSA provides a well-established statistical framework that allows conclusions to be drawn more confidently [20]. In the field of WPF, vector autoregression (VAR) is a generalization of AR models that allows WP to be predicted for several wind farms considering their spatio-temporal dependencies. [21] tests a VAR-based method using 22 wind farms located in southeastern Australia, whereas [22] employs a similar method for 172 and 100 wind farms in France and Denmark respectively, proving to be effective for large datasets. ARFIMA (autoregressive fractionally integrated moving average) models [23] are an extension of ARIMA that allows long memory in time series to be characterized. [24] proposes to use an ARFIMA process to model the linear component of WP time series, together with a least-squares support-vector machine (LS-SVM) to estimate the non-linear component.
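As a brief illustration of how the TSA models above are applied in practice, the following sketch fits an ARIMA model and produces a multi-step point forecast with the statsmodels library; the synthetic series and the order (2, 1, 1) are placeholders chosen for the example rather than settings taken from the reviewed studies.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Illustrative stand-in for a historical wind power series (kW).
rng = np.random.default_rng(0)
wind_power = np.abs(rng.normal(700, 300, size=500))

model = ARIMA(wind_power, order=(2, 1, 1))   # AR(2), one differencing step, MA(1)
fitted = model.fit()
print(fitted.forecast(steps=6))              # 6-step-ahead point forecasts
```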


Fig. 1. Overview of WPF models and estimates.

[25] introduces a model whose main features are dynamic clustering and linear regression; its main advantage is that it requires less data in the training phase in comparison with ANNs or support-vector machines (SVMs), making it appropriate for new wind farms as the method can be applied with less than 1 month of data. [26] develops a non-parametric statistical model where meteorological changes are modelled by a Markov state transition process and the local dynamic behaviours by AR time series. [27] applies a hybrid method where an ARMA model is used as a first step to build a base model.

2.2. Machine learning

Most of the work on machine learning models is focused on ANNs. A basic feedforward neural network (FFNN) is defined as a classical network with a hidden layer between the input and output layers and uses the backpropagation training algorithm [28]. ANN methods used in the literature aim to improve forecast accuracy while reducing or maintaining the computational burden of the network by implementing improved optimization algorithms, by using feature selection criteria to avoid the use of redundant data and the propagation of errors generated by additional exogenous inputs, or by using hybrid methodologies where the wind power time series is previously decomposed and the ANNs are applied to the resulting modes. For instance, improved optimization algorithms are found in [29], where the convergence and accuracy of the training algorithm are improved using an error feedback scheme; in [30], where the clonal search algorithm contributes to capturing non-linearities in the data; and in [31], which uses an optimization algorithm where the concepts of evolutionary computing and particle swarm optimization are combined. Feature selection methods are implemented in [32], using genetic programming to prevent the propagation of errors of the predictors; in [33], where data are clustered into groups of similar patterns and the best one is chosen and trained by an ANN; and in [34], where a feature selection technique based on mutual information is used to select the most informative input variables to feed the ANN. Decomposition-based hybrid models decompose the WP time series before training the ANN and have shown better forecasting performance in recent times [35]. Decomposition techniques such as the wavelet transform are employed in [30,33], and variational mode decomposition (VMD) in [36]. Decomposition-based hybrid models are described in detail in Section 5.2.

ANNs can also be employed to build PIs. An interesting non-parametric approach is the method known as LUBE (lower upper bound estimation) [37,38]. It consists of a FFNN with two outputs which represent the upper and lower boundaries of the constructed PI. The loss function is based on the two main properties of the PI: its coverage and its width. [39] adds a fuzzy-based loss function to the LUBE method to facilitate the adjustment of the NN parameters. The forecast accuracy of the method is demonstrated for two case studies where the PIs are evaluated changing the reference membership values and using a set of NNs with structures from 5 to 15 neurons. [40] proposes a methodology inspired by the LUBE method in which the lower and upper boundaries of the PI are designed using a novel interval type-2 fuzzy model. Another alternative to build PIs is quantile regression (QR), a non-parametric method where the uncertainty is estimated by means of a set of forecast quantiles. QR is characterized by its distribution-free approach and its flexibility to include predictors [7]. Further details on QR are found in Section 5.2. This methodology can be combined easily with ANN models to extend deterministic estimates to PIs, such as in [41], where QR is used to establish probabilistic estimates for a neural network based methodology. Additionally, a QR neural network is combined with another non-parametric approach known as kernel density estimation (KDE) in [42], where the predictions at different quantiles are used as an input for the KDE to model the wind power probability density function (PDF).

An alternative to the backpropagation algorithm for training ANNs faster are extreme learning machines (ELMs), a NN-based technique built on a single-hidden-layer feedforward neural network (FFNN) where the weights between the input and the hidden layers are randomly assigned and never updated, accelerating the training of the network. ELM-based models found in the literature are [43], which proposes a bidirectional mechanism based on this technique, and [44], which introduces a two-stage WPF model where the ELM is optimized by the grey wolf algorithm. Estimation of PIs using ELM can be found in [45–48].

Other WPF models combine ANNs with other techniques such as a SVM [49], a LS-SVM [50], or a Gaussian process [51]. Alternatively, other machine learning WPF models in the literature are [52], where an ensemble of decision trees and support vector regression is used, and [53], where deterministic estimates are generated by random forests and intervals by QR forests.

2.3. Deep learning

Deep learning based models are an extension of machine learning methods in the sense that the networks are composed of several layers that provide different interpretations of the data fed to the model. Details on the implementation of deep learning for energy forecasting can be found in [54]. [55] obtains WPFs with a deep learning based model using convolutional neural networks. [56] uses the LUBE method with a recurrent neural network (RNN) model instead of training the model with a FFNN. [57] proposes a deep neural network based ensemble technique combined with the concept of transfer learning to extend the knowledge gained training one wind farm to others.


3. Performance evaluation metrics of WP forecasts

The assessment of a forecasting model is a crucial step in its development to address its validity for estimating future values of wind power. The forecast accuracy of deterministic estimates is evaluated by measuring the discrepancy between the forecast and actual values through several criteria. The evaluation of probabilistic estimates is a more challenging task, as the forecast cannot be compared directly to the actual values, and several properties of the forecast have to be addressed to verify the forecast accuracy of the model.

3.1. Accuracy of deterministic estimates

Many performance evaluation methods are used in the literature to assess the accuracy of deterministic estimates. The most common ones are shown below.

- Mean absolute error (MAE):

$MAE = \frac{1}{N}\sum_{i=1}^{N} |\hat{y}_i - y_i|$ (1)

- Root mean square error (RMSE):

$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$ (2)

- Mean absolute percentage error (MAPE):

$MAPE = \frac{1}{N}\sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \cdot 100\%$ (3)

- Standard deviation error (SDE):

$SDE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\epsilon_i - \bar{\epsilon})^2}$ (4)

- Bias:

$BIAS = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)$ (5)

- Index of Agreement (IA):

$IA = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} \left( |\hat{y}_i - \bar{y}| + |y_i - \bar{y}| \right)^2}$ (6)

where N is the number of samples, $y_i$ is the actual value, $\hat{y}_i$ the predicted value, $\bar{y}$ is the mean of the actual values, $\epsilon_i = y_i - \hat{y}_i$ is the prediction error (also known as the residual), and $\bar{\epsilon}$ is the average value of the errors. Table 1 summarizes the performance evaluation metrics used in the literature for deterministic wind power estimates.
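To make Eqs. (1)–(6) concrete, the following minimal sketch computes the deterministic metrics with NumPy; the function name and the array-based interface are illustrative choices and are not taken from any WPF package. The NMAE and NRMSE mentioned below follow by dividing the MAE and RMSE by the installed capacity or by the range of the data.

```python
import numpy as np

def deterministic_metrics(y, y_hat):
    """Deterministic accuracy metrics of Eqs. (1)-(6).

    y and y_hat are 1-D arrays of observed and predicted wind power.
    MAPE is undefined whenever an observation is exactly zero, as noted in the text.
    """
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    err = y - y_hat                                  # residuals eps_i = y_i - y_hat_i
    mae = np.mean(np.abs(err))                       # Eq. (1)
    rmse = np.sqrt(np.mean(err ** 2))                # Eq. (2)
    mape = 100.0 * np.mean(np.abs(err / y))          # Eq. (3), requires y_i != 0
    sde = np.std(err)                                # Eq. (4): spread of errors around their mean
    bias = np.mean(err)                              # Eq. (5)
    ia = 1.0 - np.sum(err ** 2) / np.sum(
        (np.abs(y_hat - y.mean()) + np.abs(y - y.mean())) ** 2)  # Eq. (6)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "SDE": sde, "BIAS": bias, "IA": ia}
```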
Eq. (1) shows the mean absolute error (MAE). It is defined as the average of the prediction errors in absolute value. The root mean square error (Eq. (2)) depicts the standard deviation of the residuals. Normalized versions of the MAE (NMAE) and the RMSE (NRMSE) are commonly used in the literature as well. While both MAE and RMSE are suitable indicators for assessing the performance of a model, the RMSE should be preferred when the model errors follow a Gaussian distribution [64].

Another statistical measure is the MAPE (Eq. (3)). It quantifies the accuracy as a percentage of the error. However, the MAPE produces very large values when the actual values are close to zero and is undefined when the actual value is equal to zero. Alternative versions of the MAPE have been proposed to prevent this shortcoming. For instance, the mean absolute scaled error (MASE) is an alternative defined as the MAE of the forecast values scaled by the MAE of the in-sample naïve forecast [65]. It is especially useful for wind power time series, as there are periods where a wind farm does not generate any power. The mean arctangent absolute percentage error (MAAPE) is another alternative to the MAPE [66]. It transforms the MAPE using the arctangent function. Its main advantage is that it preserves the characteristics of the MAPE while overcoming its limitations.

The prediction error can be decomposed into the random error, which is inherently unpredictable, and the systematic error, which occurs due to inaccuracies in the system. The standard deviation of errors (Eq. (4)) addresses the random component of the prediction error, whereas the bias (Eq. (5)) deals with the systematic component [67].

Lastly, another common metric in the literature to assess deterministic estimates is the index of agreement (Eq. (6)). It was originally proposed in [68] and refined versions of this index have been developed since then [69]. It measures to which degree the predictions are error-free and takes values between zero, when the agreement between predictions and observations is null, and one, when the predictions fully match the actual values.

Another alternative to examine model performance is to use probabilistic statistical measures that address not only the performance but also the complexity of the model. Some of them are the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Minimum Description Length (MDL).

In order to facilitate the comparison with benchmark models, the improvement of a technique over a reference is defined by means of the relation [67]:

$\text{Improvement} = 1 - \frac{M}{M_{ref}}$ (7)

where M is the value of the selected measure for a specific model, and $M_{ref}$ is the value of the same measure for the benchmark model. Comparing different models becomes an issue as there is not a unified criterion for selecting benchmarks to evaluate them. Typically, models are compared to the persistence model, as outperforming it is a requirement to be considered skilful, and to state-of-the-art methods such as neural networks.
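The benchmarking step discussed above can be sketched as follows: a persistence forecast, the scaled error used by the MASE, and the improvement ratio of Eq. (7). The in-sample naïve forecast is assumed here to be one-step persistence of the training series; this is an illustrative convention rather than a prescription taken from the reviewed studies.

```python
import numpy as np

def persistence_forecast(y_train, horizon):
    # Persistence benchmark: repeat the last observed value over the whole horizon.
    return np.full(horizon, y_train[-1], dtype=float)

def mase(y_test, y_hat, y_train):
    # MASE: MAE of the forecast scaled by the MAE of the in-sample naive forecast [65].
    mae_forecast = np.mean(np.abs(np.asarray(y_test) - np.asarray(y_hat)))
    mae_naive = np.mean(np.abs(np.diff(y_train)))    # one-step persistence errors in-sample
    return mae_forecast / mae_naive

def improvement(metric_model, metric_reference):
    # Eq. (7): relative improvement of a model over a benchmark such as persistence.
    return 1.0 - metric_model / metric_reference
```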


Table 1
Performance evaluation metrics for deterministic estimates.
Reference Year MAE NMAE RMSE NRMSE MAPE SDE Bias IA Others
Dowell et al [21] 2015 ✕ ✕
Messner et al [22] 2019 ✕ ✕ ✕
Yuan et al [24] 2017 ✕ ✕ ✕
Ozkan et al [25] 2015 ✕ ✕
Jiang et al [27] 2017 ✕ ✕ ✕
Chang et al [29] 2017 ✕ ✕
Chitsaz et al [30] 2015 ✕ ✕
Osório et al [31] 2015 ✕ ✕ ✕
Zameer et al [32] 2017 ✕ ✕ ✕
Azimi et al [33] 2016 ✕ ✕ ✕
Li et al [34] 2015 ✕ ✕
Naik et al [36] 2018 ✕ ✕ ✕
Haque et al [41] 2014 ✕ ✕ ✕
He et al [42] 2018 ✕ ✕ ✕
Zhao et al [43] 2016 ✕ ✕ ✕ ✕ ✕
Hao et al [44] 2019 ✕ ✕ ✕ ✕
Buhan et al [49] 2015 ✕
Liu et al [50] 2017 ✕ ✕ ✕
Lee et al [51] 2013 ✕ ✕ ✕ ✕ ✕ ✕
Heinermann et al [52] 2016 ✕
Lahouar et al [53] 2017 ✕ ✕ ✕ ✕ ✕
Qureshi et al [57] 2017 ✕ ✕ ✕
Y. Zhang et al [58] 2016 ✕
Yan et al [59] 2016 ✕ ✕
Yang et al [60] 2015 ✕ ✕
Y. Wang et al [61] 2017 ✕ ✕ ✕
Han et al [62] 2015 ✕ ✕
Zjavka et al [63] 2018 ✕ ✕ ✕

Table 2
Performance evaluation metrics for probabilistic estimates.
Reference Year PICP PINAW CWC ACE IS CRPS SC Others
Dowell et al [21] 2015 ✕ ✕
Xie et al [26] 2018 ✕ ✕
Khosravi et al [37] 2013 ✕ ✕ ✕
Quan et al [38] 2013 ✕ ✕ ✕
Kavousi-Fard et al [39] 2015 ✕ ✕
Zou et al [40] 2019 ✕ ✕ ✕ ✕
Haque et al [41] 2014 ✕ ✕
He et al [42] 2018 ✕ ✕
G. Zhang et al [45] 2014 ✕ ✕
Wan et al [46] 2016 ✕ ✕
Mahmoud et al [47] 2018 ✕ ✕ ✕ ✕
Afshari-Igder et al [48] 2018 ✕ ✕
Lahouar et al [53] 2017 ✕
H. Wang et al [55] 2017 ✕ ✕ ✕
Shi et al [56] 2017 ✕ ✕ ✕ ✕
Y. Zhang et al [58] 2016 ✕
Yang et al [60] 2015 ✕
Y. Wang et al [61] 2017 ✕ ✕ ✕ ✕ ✕
Gallego-Castillo et al [70] 2016 ✕
Lin et al [71] 2018 ✕ ✕
Khorramdel et al [72] 2018 ✕ ✕ ✕
Alessandrini et al [73] 2015 ✕ ✕

3.2. Accuracy of probabilistic estimates

A PI is an interval that gives the expectation of where a future value will fall with a specified probability. Therefore, the PI relies on the significance level $\alpha$. The probability that a future wind power output $y_i$ lies within the PI is known as the prediction interval nominal confidence (PINC):

$PINC = 100(1 - \alpha)\%$ (8)

Taking this into consideration, a PI for a future time step i and a significance level $\alpha$ is defined as:

$\hat{I}_i^{\alpha} = \hat{U}_i^{\alpha} - \hat{L}_i^{\alpha}$ (9)

where $\hat{U}_i^{\alpha}$ and $\hat{L}_i^{\alpha}$ are the upper and lower boundaries of the PI respectively. The most common metrics defined for probabilistic estimates are shown below.

- Prediction interval coverage probability (PICP):

$PICP = \frac{1}{N}\sum_{i=1}^{N} c_i$ (10)

where N is the number of samples and $c_i$ is

$c_i = \begin{cases} 1, & \text{if } y_i \in \hat{I}_i^{\alpha} \\ 0, & \text{otherwise} \end{cases}$ (11)

- Prediction interval normalized average width (PINAW):

$PINAW = \frac{1}{NR}\sum_{i=1}^{N} \hat{I}_i^{\alpha}$ (12)

where R is the range of the target variable.

- Coverage width-based criterion (CWC):

$CWC = PINAW \left[ 1 + \gamma(PICP)\, e^{-\eta(PICP - \mu)} \right]$ (13)

where $\gamma(PICP)$ is a step function dependent on the values of PICP and $\mu$:

$\gamma(PICP) = \begin{cases} 0, & \text{if } PICP \geq \mu \\ 1, & \text{if } PICP < \mu \end{cases}$ (14)

- Average coverage error (ACE):

$ACE = PICP - PINC$ (15)

- Interval sharpness (IS):

$IS = \frac{1}{N}\sum_{i=1}^{N} b_i$ (16)

where $b_i$ is

$b_i = \begin{cases} -2\alpha \hat{I}_i^{\alpha} - 4(\hat{L}_i^{\alpha} - y_i), & \text{if } y_i < \hat{L}_i^{\alpha} \\ -2\alpha \hat{I}_i^{\alpha}, & \text{if } y_i \in \hat{I}_i^{\alpha} \\ -2\alpha \hat{I}_i^{\alpha} - 4(y_i - \hat{U}_i^{\alpha}), & \text{if } y_i > \hat{U}_i^{\alpha} \end{cases}$ (17)

- Continuous ranked probability score (CRPS):

$CRPS = \frac{1}{N}\sum_{i=1}^{N} \int_0^{P_{max}} \left[ CDF_i(y) - H(y - y_i) \right]^2 dy$ (18)

where $CDF_i$ is the cumulative form of the forecast distribution and H is the Heaviside step function:

$H(y - y_i) = \begin{cases} 0, & \text{if } y < y_i \\ 1, & \text{otherwise} \end{cases}$ (19)


- Skill Score (SC):

$SC = \frac{1}{N}\sum_{i=1}^{N} SC_i = \frac{1}{N}\sum_{i=1}^{N} \left[ \sum_{j=1}^{M} \left( \xi^{\alpha_j} - \alpha_j \right) \left( y_i - q_i^{\alpha_j} \right) \right]$ (20)

where $SC_i$ is the score of the set of quantiles at a single time step i, $\alpha_j$ is the quantile proportion, $q_i^{\alpha_j}$ is the corresponding quantile forecast, M is the number of quantiles considered, and $\xi^{\alpha_j}$ is an indicator variable denoted by

$\xi^{\alpha_j} = \begin{cases} 1, & \text{if } y_i < q_i^{\alpha_j} \\ 0, & \text{otherwise} \end{cases}$ (21)

Table 2 shows the application of these performance evaluation metrics for probabilistic estimates in the recent literature.

The two main features of a PI, the most common representation for probabilistic estimates, are its reliability and its informativeness: a PI will be reliable when the actual wind power output falls within the interval, whereas it will be informative depending on its width. Ideally, the PI should be as narrow as possible to facilitate the decision-making process. The uncertainty of the prediction can be attributed to several sources. One source is the model uncertainty (or epistemic uncertainty), which occurs due to a misspecification of the forecasting model or of the parameters of the model. The other main source is the data uncertainty (or aleatoric uncertainty), which quantifies the inherent noise of the observations.

The PICP (Eq. (10)) is a metric that measures exclusively the reliability of the PI. It accounts for the fraction of target values covered by the interval. Its counterpart in terms of width is the PINAW (Eq. (12)). The PICP and the PINAW can be further merged into the CWC (Eq. (13)). In this equation, $\eta$ and $\gamma$ are two controlling hyperparameters that determine how much invalid PIs are penalized. Alternative versions of the CWC have been introduced in the literature. For instance, a new CWC is proposed in [56], which includes a new term designed to take better into consideration the information provided by the actual measurements. Another alternative is presented in [45], which considers an additional function to account for those samples that lie beyond the interval.

Another parameter that describes the reliability of a PI is the ACE (Eq. (15)). It is defined as the deviation between the PICP and the PINC. Smaller deviations indicate more reliable PIs. This metric provides additional information compared to the PICP, since a larger PICP is not necessarily better for a given PINC [74].

The interval sharpness (also known as the Winkler score) (Eq. (16)) evaluates the PI in terms of its width [75]. Narrower intervals are rewarded by this metric, whereas PIs that do not contain the observations are penalized.

The CRPS (Eq. (18)) is a global criterion as it assesses both features simultaneously. This metric is equivalent to the MAE when a forecast generates a deterministic estimate. It differs from other metrics in that the CRPS assesses cumulative distribution functions. Lower scores of the CRPS mean a better performance of the model.

Eq. (20) denotes the scoring rule proposed by [76] for probabilistic forecasts that are estimated by non-parametric models and represented by a set of quantile forecasts. Its orientation is positive and a value of zero represents a perfect forecast.

Comparison with benchmark models for probabilistic estimates can be performed by using the same relation presented for deterministic estimates in Eq. (7).
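As a minimal sketch of Eqs. (10)–(17), the interval metrics can be evaluated directly from the PI boundaries; the quantities are expressed as fractions rather than percentages, the range R is taken here as the turbine capacity, and the penalty hyperparameter eta of the CWC is a user choice, so the exact numbers depend on these assumptions.

```python
import numpy as np

def interval_metrics(y, lower, upper, alpha, capacity, eta=50.0):
    """Prediction interval metrics of Eqs. (10)-(17), expressed as fractions.

    y, lower and upper are arrays of observations and PI boundaries,
    alpha is the significance level (0.05 for a 95% PI) and capacity
    plays the role of the range R used to normalize the width.
    """
    y, lower, upper = (np.asarray(a, float) for a in (y, lower, upper))
    width = upper - lower
    inside = (y >= lower) & (y <= upper)             # c_i of Eq. (11)
    picp = inside.mean()                             # Eq. (10)
    pinaw = width.mean() / capacity                  # Eq. (12)
    ace = picp - (1.0 - alpha)                       # Eq. (15), PINC as a fraction
    b = -2.0 * alpha * width                         # Eqs. (16)-(17): interval sharpness
    b = np.where(y < lower, b - 4.0 * (lower - y), b)
    b = np.where(y > upper, b - 4.0 * (y - upper), b)
    sharpness = b.mean()
    mu = 1.0 - alpha                                 # Eq. (14): penalize PICP below the nominal level
    gamma = 0.0 if picp >= mu else 1.0
    cwc = pinaw * (1.0 + gamma * np.exp(-eta * (picp - mu)))   # Eq. (13)
    return {"PICP": picp, "PINAW": pinaw, "ACE": ace, "IS": sharpness, "CWC": cwc}
```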
4. Prediction horizon for WPF estimates

The prediction horizon is one of the main aspects to consider when a WPF model is developed. Very short-term forecasts consider predictions up to 30 min ahead. The main applications of these forecasts are wind farm control and the operation of reserves [77]. Short-term forecasts consider predictions from hours to a few days ahead and are an indispensable tool for power system management and energy trading. The use of additional exogenous inputs such as meteorological data may be considered to train a statistical model for this horizon, as the dynamics of wind power generation become significant and could potentially lead to a better performance of the model. Nonetheless, the use of these data could increase the computational complexity of the model [78], and these datasets are associated with their own prediction error, which will affect the prediction accuracy [79]. Alternatively, physical models are also used for short-term forecasting in the absence of an actual wind farm, although they are more computationally expensive than statistical models. Longer-term forecasts usually make use of physical methods and are used to make decisions on unit commitment or maintenance scheduling [80].

For longer prediction horizons from very short-term and short-term statistical models, it is necessary to consider multi-step ahead forecasts. A multi-step ahead forecast estimates the next H steps $[y_{t+1}, \ldots, y_{t+H}]$ of a time series. There are several strategies to approach this matter [81,82]: the recursive, the MIMO (multiple-input multiple-output) and the direct strategies are the most commonly used approaches to estimate multi-step ahead predictions.

In the recursive strategy [81], also known as the iterated or multi-stage strategy, the model is trained to compute one-step ahead forecasts. Afterwards, the next steps are predicted iteratively using the previous one-step ahead forecasts as inputs. This strategy is sensitive to larger prediction horizons, as the errors of the predictions accumulate at every iteration.

$\hat{y}_{t+h} = \begin{cases} f(y_t, \ldots, y_{t-d+1}) & \text{if } h = 1 \\ f(\hat{y}_{t+h-1}, \ldots, \hat{y}_{t+1}, y_t, \ldots, y_{t-d+h}) & \text{if } h \in \{2, \ldots, d\} \\ f(\hat{y}_{t+h-1}, \ldots, \hat{y}_{t+h-d}) & \text{if } h \in \{d+1, \ldots, H\} \end{cases}$ (22)

where d denotes the number of steps used in the input set.

The MIMO strategy [83] produces a vector with the whole sequence of outputs by training a single model:

$[\hat{y}_{t+H}, \ldots, \hat{y}_{t+1}] = f(y_t, \ldots, y_{t-d+1})$ (23)

The direct strategy [81] consists of training H different models independently, one for each horizon. While it prevents the accumulation of errors, it neglects the dependencies between the H forecasts and carries a larger computational cost, since every model is trained separately.

$\hat{y}_{t+h} = f_h(y_t, \ldots, y_{t-d+1}), \quad h \in \{1, \ldots, H\}$ (24)

Further combinations of these strategies lead to other approaches. For instance, the DirRec strategy [84] combines elements from the recursive and direct approaches, and the DIRMO strategy [85] presents a trade-off between the direct and the MIMO strategies.
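A compact sketch of the recursive and direct strategies of Eqs. (22) and (24) is given below; the scikit-learn MLPRegressor, the lag length d and the network size are illustrative assumptions, and a MIMO variant would instead fit a single multi-output model on H-dimensional targets.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def recursive_forecast(y_train, d, H):
    # One model trained for one-step-ahead forecasts, applied iteratively (Eq. (22)).
    X = np.array([y_train[i - d:i] for i in range(d, len(y_train))])
    t = y_train[d:]
    model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000).fit(X, t)
    window = list(y_train[-d:])
    preds = []
    for _ in range(H):
        preds.append(float(model.predict([window[-d:]])[0]))
        window.append(preds[-1])                     # feed the forecast back as an input
    return np.array(preds)

def direct_forecast(y_train, d, H):
    # One independent model per horizon h (Eq. (24)).
    preds = []
    for h in range(1, H + 1):
        X = np.array([y_train[i - d:i] for i in range(d, len(y_train) - h + 1)])
        t = y_train[d + h - 1:]
        model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000).fit(X, t)
        preds.append(float(model.predict([y_train[-d:]])[0]))
    return np.array(preds)
```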


Fig. 2. Historical wind power generation during the year 2017 (left) and a sample of the wind power generation time series from February to May (right). Data are shown with
a temporal resolution of 60 min.

Table 4
WPF models used for the numerical study.
Decomposition technique Training model Multi-step strategy
VMD FFNN MIMO
VMD FFNN Recursive
EEMD FFNN MIMO
EEMD FFNN Recursive
– FFNN MIMO
– FFNN Recursive

5. Numerical study

In this section, the features of the performance evaluation metrics for WPF models are investigated considering data from a single wind farm, modelled using a set of decomposition-based hybrid models. Firstly, the dataset and the forecasting models are described, followed by a discussion of the features of the metrics over the testing set.

5.1. Dataset

The dataset used for the case study contains measurements from a wind farm located in Ireland. Measurements are collected every 10 min between January 2017 and December 2017 (Fig. 2). As shown in this figure, wind power generation shows a large variability, as it is influenced by wind and other meteorological variables, as well as by human activities such as maintenance operations. Less than 1% of the values are missing and have been reconstructed considering the previous and posterior values. Simulations are run for one wind turbine with the reconstructed dataset at a 10-min temporal resolution and with a resampled dataset at a 1-h resolution. The dataset has been divided into training, validation and testing sets to train the models and test the accuracy of the forecasts they provide. Table 3 shows the number of samples for each set considering the low-resolution (1-h) and high-resolution (10-min) data and the summary statistics for both sets of data. The location of the wind farm is not disclosed due to confidentiality reasons.

Table 3
Sample size of training, validation and testing sets and summary statistics of the datasets.
              Low-res set   High-res set
Resolution    60-min        10-min
Train         7296          43 776
Validation    1080          6480
Test          384           2304
Mean (kW)     726.49        726.49
Std (kW)      696.15        715.30
Min (kW)      0             0
Q1 (kW)       157.13        140
Q2 (kW)       478           462
Q3 (kW)       1134          1151
Max (kW)      2364.3        2365
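The preprocessing described above (gap filling, 1-h resampling and the chronological split of Table 3) can be reproduced along the following lines with pandas; the file name and column names are assumptions made for the example, since the original wind farm data are confidential.

```python
import pandas as pd

# Assumed file and column names; the original wind farm data are not public.
df = pd.read_csv("wind_farm_2017.csv", parse_dates=["timestamp"], index_col="timestamp")
df["power"] = df["power"].interpolate(limit_direction="both")   # fill the <1% missing values
hourly = df["power"].resample("1H").mean()                       # low-resolution (1-h) series

# Chronological split matching the low-resolution row counts of Table 3.
train, val, test = hourly[:7296], hourly[7296:8376], hourly[8376:8760]
```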
5.2. Modelling methodology

The performance evaluation metrics reported in the literature show that decomposition-based hybrid models contribute to a better forecasting accuracy for WPF [35]. Two different techniques are applied to decompose the wind power time series: ensemble empirical mode decomposition (EEMD) and VMD. Additionally, two multi-step forecast strategies (MIMO and Recursive) are implemented. In total, six models are employed (Table 4), including two FFNN models where the WP time series is not decomposed.

EEMD [86] is a non-linear signal processing technique in which the time series is decomposed into a set of stationary modes; it mitigates the mode mixing issues existing in the standard EMD approach [87]. EEMD obtains the modes, known as intrinsic mode functions (IMFs), as the mean value of the modes generated by an ensemble of various noise-added copies of the original signal.

VMD [88] is another method to decompose a signal into its principal modes. The modes are estimated concurrently by looking for a set of modes and their centre frequencies that together reproduce the signal. The bandwidth of each mode is estimated by a constrained variational optimization process: first, the Hilbert transform is employed to determine the associated analytical signal and consequently a unilateral frequency spectrum; second, the mode's frequency spectrum is shifted to baseband by mixing with an exponential tuned to the respective estimated centre frequency; and third, the $H^1$ Gaussian smoothness of the demodulated signal is used to identify the bandwidth of the mode. This process is transformed into an unconstrained problem by introducing a penalty term and Lagrangian multipliers $\lambda$ as follows:

$L(\{u_k\}, \{\omega_k\}, \lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| y(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, y(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$ (25)

where y(t) is the original time series, $\{u_k\}$ the set of all modes, $\{\omega_k\}$ the set of the respective centre frequencies, $\delta(t)$ the Dirac function, and $\alpha$ the balancing parameter of the data fidelity constraint. Finally, this equation is solved using the alternating direction method of multipliers [89].

Fig. 3. Flowchart of the WP forecasting model.

Using either EEMD or VMD, every resulting mode is trained using an FFNN and afterwards the signal is reconstructed by aggregating all forecasts (Fig. 3). The FFNN is defined as a classical network with a hidden layer between the input and output layers and uses the backpropagation training algorithm [28], in which a Rectified Linear Unit (ReLU) is used as the activation function in the network.


Table 5
Results for deterministic estimates.
Forecast horizon: 6-h ahead
Low-resolution dataset (1-h resolution)
Method Strategy NMAE (%) NRMSE (%) MAPE (%) NBias (%) NSDE (%) IA
EEMD-FFNN MIMO 10.40 14.33 29.80 4.12 13.72 0.936
EEMD-FFNN Recursive 10.80 14.66 31.45 −2.09 14.51 0.935
VMD-FFNN MIMO 1.77 2.44 12.05 −0.58 2.37 0.998
VMD-FFNN Recursive 4.51 5.45 20.25 4.38 3.23 0.992
FFNN MIMO 21.08 28.41 44.91 5.98 27.97 0.706
FFNN Recursive 21.87 30.12 48.61 2.35 30.02 0.730
High-resolution dataset (10-min resolution)
Method Strategy NMAE (%) NRMSE (%) MAPE (%) NBias (%) NSDE (%) IA
EEMD-FFNN MIMO 11.16 15.50 28.69 1.03 15.46 0.929
EEMD-FFNN Recursive 24.91 33.02 56.97 23.51 23.19 0.675
VMD-FFNN MIMO 8.27 10.77 25.19 −3.61 10.14 0.969
VMD-FFNN Recursive 13.65 19.02 35.50 9.49 16.48 0.890
FFNN MIMO 22.17 30.55 46.84 7.48 29.62 0.689
FFNN Recursive 34.26 46.11 74.19 34.05 31.10 0.478
Forecast horizon: 24-h ahead
Low-resolution dataset (1-h resolution)
Method Strategy NMAE (%) NRMSE (%) MAPE (%) NBias (%) NSDE (%) IA
EEMD-FFNN MIMO 18.18 22.33 35.84 1.54 22.27 0.765
EEMD-FFNN Recursive 24.69 33.03 55.89 20.71 25.73 0.582
VMD-FFNN MIMO 5.95 7.57 18.99 1.66 7.38 0.982
VMD-FFNN Recursive 8.81 10.48 25.78 −7.68 7.13 0.969
FFNN MIMO 26.66 34.93 49.68 12.66 32.56 0.379
FFNN Recursive 28.74 36.92 58.63 12.50 34.74 0.434
High-resolution dataset (10-min resolution)
Method Strategy NMAE (%) NRMSE (%) MAPE (%) NBias (%) NSDE (%) IA
EEMD-FFNN MIMO 20.28 25.43 38.52 1.06 25.32 0.689
EEMD-FFNN Recursive 43.49 53.73 36.47 −39.05 36.91 0.522
VMD-FFNN MIMO 20.39 25.94 40.41 4.90 25.48 0.713
VMD-FFNN Recursive 29.40 35.77 38.97 −11.20 33.97 0.589
FFNN MIMO 28.62 38.25 56.42 17.63 33.95 0.389
FFNN Recursive 63.79 71.43 31.31 −63.79 32.14 0.411

Fig. 4. Performance of the selected models for 6-h and 24-h ahead deterministic estimates. Data are shown with a temporal (60-min) resolution.

Fig. 5. Residuals for 6-h and 24-h ahead deterministic estimates. Data are shown with a temporal (60-min) resolution.


Fig. 6. Distribution of the residuals for 6-h and 24-h ahead predictions.

Fig. 7. Performance evaluation metrics for deterministic estimates with respect to the number of steps ahead (low resolution dataset).

In order to facilitate the training of the network, data from every mode are normalized before the training using the following expression:

$y_{norm} = \frac{y - y_{min}}{y_{max} - y_{min}}$ (26)

where y is a data point of a given mode, $y_{max}$ is the maximum value, and $y_{min}$ is the minimum value of that mode. As the outputs are normalized as well, the scaling is undone (inverse normalization) before aggregating the forecast outputs of each mode. In every case for this study, the vector input has the same size as the vector output, which is as long as the prediction horizon.
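The decompose-train-aggregate workflow of Fig. 3, together with the per-mode normalization of Eq. (26), can be sketched as follows. The EEMD call relies on the PyEMD package and `fit_ffnn_and_forecast` is only a placeholder name for the per-mode FFNN training and prediction step; both are assumptions made for illustration rather than the exact implementation used in this study.

```python
import numpy as np
from PyEMD import EEMD   # assumed decomposition backend; VMD could be used instead

def minmax(mode):
    # Eq. (26): scale one mode to [0, 1]; the bounds are kept for the inverse transform.
    lo, hi = mode.min(), mode.max()
    return (mode - lo) / (hi - lo), (lo, hi)

def hybrid_forecast(series, horizon, fit_ffnn_and_forecast):
    """Decompose the series, forecast every mode, and aggregate the forecasts (Fig. 3).

    fit_ffnn_and_forecast(mode_norm, horizon) stands in for the per-mode FFNN
    training/prediction step and must be supplied by the user.
    """
    modes = EEMD().eemd(np.asarray(series, float))   # intrinsic mode functions
    total = np.zeros(horizon)
    for mode in modes:
        mode_norm, (lo, hi) = minmax(mode)
        pred_norm = fit_ffnn_and_forecast(mode_norm, horizon)
        total += pred_norm * (hi - lo) + lo          # inverse normalization, then aggregate
    return total
```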


Fig. 8. Daily performance of metrics for deterministic estimates (VMD-FFNN MIMO model).

Lastly, the boundaries of the PIs are estimated by quantile regression, using an asymmetric loss function (also known as the pinball loss function) that depends on the required quantile $\tau$:

$\rho_{\tau}(\epsilon) = \begin{cases} \tau \epsilon, & \text{if } \epsilon \geq 0 \\ (\tau - 1)\epsilon, & \text{otherwise} \end{cases}$ (27)

Taking this into consideration, the error function to be minimized is

$E_{\tau} = \frac{1}{N}\sum_{i=1}^{N} \rho_{\tau}\left( y(i) - \hat{y}_{\tau}(i) \right)$ (28)

where y(i) is the target value at time i and $\hat{y}_{\tau}(i)$ is the conditional $\tau$-quantile at the same time. By doing so, the conditional quantiles are estimated instead of the conditional mean, which makes it possible to compute PIs. Further information can be found in [90,91].
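A minimal sketch of the pinball loss of Eqs. (27)–(28) is given below; in practice this loss is minimized during training for each required quantile, and a central PI is then formed from the lower and upper quantile forecasts.

```python
import numpy as np

def pinball_loss(y, y_hat_tau, tau):
    # Eqs. (27)-(28): asymmetric loss averaged over the sample for quantile tau.
    eps = np.asarray(y, float) - np.asarray(y_hat_tau, float)
    return np.mean(np.where(eps >= 0, tau * eps, (tau - 1.0) * eps))

# Example: a 95% central PI uses the 0.025 and 0.975 conditional quantiles,
# each obtained by minimizing the corresponding pinball loss.
```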
Deterministic estimates are obtained using only one output for every FFNN, whereas the boundaries of PIs are estimated by QR as described in Eqs. (27) and (28). This technique has been chosen for the case study to provide interval estimates as it is a well-established technique in the field of WPF [7,92]. Further methodologies based on QR can be found in the literature [41,46]. The final prediction will be a single output for every step for deterministic estimates, and the lower and upper boundaries of the interval in the case of PIs.

5.3. Results

The decomposition-based hybrid models previously described are used to obtain forecast estimates, which allow numerical values to be produced for the performance evaluation metrics. In this study, the metrics not only evaluate the general forecast accuracy of the models, as usually considered during the model development stage, but also their behaviour in terms of time-resolution, robustness, and prediction horizon length. In the case of probabilistic estimates, PIs are chosen to present the uncertainty of the forecast as they are the most widespread representation. Hence the CRPS and SC are not considered in the study, as they are used to assess other representations of probabilistic estimates.

5.3.1. Deterministic predictions

Forecasts are estimated 6-h and 24-h ahead. These horizons are important for activities such as energy trading, where an initial forecast is usually provided 24-h ahead and subsequently corrected between 6- and 8-h ahead. Table 5 shows the results of the performance evaluation measurements for 6-h and 24-h ahead forecasts.

The values of MAE, RMSE, BIAS, and SDE are normalized by the capacity of the wind turbine to facilitate their interpretation and the assessment of the model errors. The values of MAPE and IA are not normalized, as MAPE is by definition a percentage, and IA only takes values between zero and one. The lower scores for NMAE and NRMSE indicate that, overall, the VMD-FFNN MIMO model produces better forecasts considering every source of error. Additionally, the better scores for the NBIAS and NSDE indicate that this model deals better with the systematic and random error separately. The numerical values of the metrics are larger for 24-h ahead forecasts, indicating that the forecast accuracy is lower. Values close to one for the IA, such as those of the VMD-FFNN MIMO model for both 6-h and 24-h ahead forecasts using the low-resolution dataset, indicate that the forecasts have a low degree of error. Compared to the rest of the metrics, MAPE shows unsteady values that are not consistent with the rest of the metrics evaluating the overall forecast accuracy (NMAE, NRMSE, and IA). As observed in Fig. 4, the 6-h ahead forecasts are more accurate than the 24-h ahead forecasts, as the forecast accuracy is lower for larger prediction horizons. Fig. 5 shows the residuals in the 1-h resolution testing set. In both scenarios, the prediction error shows less variability when the VMD technique is applied to decompose the wind power time series.


Table 6
Results for 6-h ahead prediction intervals.
Confidence level: 99%: Low-resolution dataset (1-h resolution)
Method Strategy PICP (%) PINAW (%) CWC ACE IS
EEMD-FFNN MIMO 100 99.66 0.997 0.01 −47.84
EEMD-FFNN Recursive 95 76.77 6.440 −0.04 −78.45
VMD-FFNN MIMO 100 77.95 0.780 0.01 −37.42
VMD-FFNN Recursive 100 60.05 0.601 0.01 −28.82
FFNN MIMO 93.61 92.68 14.641 −0.05 −55.95
FFNN Recursive 63.89 80.69 3.3e7 −0.35 −394.49
Confidence level: 99%: High-resolution dataset (10-min resolution)
Method Strategy PICP (%) PINAW (%) CWC ACE IS
EEMD-FFNN MIMO 100 96.88 0.969 0.01 −46.50
EEMD-FFNN Recursive 96.25 81.20 1.881 −0.03 −80.02
VMD-FFNN MIMO 100 90.32 0.903 0.01 −43.35
VMD-FFNN Recursive 88.98 40.14 1.495 −0.1 −92.97
FFNN MIMO 99.8 98.77 0.988 0.01 −47.51
FFNN Recursive 41.48 67.26 2.1e12 −0.57 −1096.71
Confidence level: 95%: Low-resolution dataset (1-h resolution)
Method Strategy PICP (%) PINAW (%) CWC ACE IS
EEMD-FFNN MIMO 100 81.96 0.820 0.05 −196.70
EEMD-FFNN Recursive 88.61 41.29 10.486 −0.06 −184.08
VMD-FFNN MIMO 100 39.75 0.397 0.05 −95.39
VMD-FFNN Recursive 100 31.09 0.311 0.05 −74.61
FFNN MIMO 93.06 83.43 3.04 −0.01 −263.53
FFNN Recursive 43.89 50.51 6.3e10 −0.5 −723.95
Confidence level: 95%: High-resolution dataset (10-min resolution)
Method Strategy PICP (%) PINAW (%) CWC ACE IS
EEMD-FFNN MIMO 100 84.76 0.848 0.05 −203.42
EEMD-FFNN Recursive 45.74 32.07 44.521 −0.49 −850.53
VMD-FFNN MIMO 100 68.49 0.685 0.05 −164.37
VMD-FFNN Recursive 57.78 26.06 11.038 −0.37 −346.90
FFNN MIMO 90.97 85.55 7.265 −0.04 −269.465
FFNN Recursive 27.78 44.97 1.8e14 −0.67 −1373.64
Confidence level: 80%: Low-resolution dataset (1-h resolution)
Method Strategy PICP (%) PINAW (%) CWC ACE IS
EEMD-FFNN MIMO 96.67 51.59 0.516 0.17 −528.03
EEMD-FFNN Recursive 56.11 18.44 28395.9 −0.24 −526.32
VMD-FFNN MIMO 100 21.34 0.213 0.2 −204.90
VMD-FFNN Recursive 96.67 14.26 0.143 0.17 −146.72
FFNN MIMO 71.94 56.99 32.57 −0.08 −854.34
FFNN Recursive 33.61 32.82 3.9e9 −0.46 −1234.63
Confidence level: 80%: High-resolution dataset (10-min resolution)
Method Strategy PICP (%) PINAW (%) CWC ACE IS
EEMD-FFNN MIMO 99.44 58.90 0.589 0.19 −567.42
EEMD-FFNN Recursive 15.88 15.94 81.20 −0.64 −1477.55
VMD-FFNN MIMO 99.95 47.93 0.479 0.19 −460.54
VMD-FFNN Recursive 36.76 11.50 8.798 −0.43 −511.88
FFNN MIMO 77.18 59.92 3.059 −0.03 −905.04
FFNN Recursive 21.67 16.84 7.8e11 −0.58 −1751.05

Spikes are visible when using EEMD, as the prediction error is larger due to the lower ability of EEMD to correctly predict sudden changes in wind power.

In terms of the effect of time-resolution on the forecasts, the larger volatility existing in higher resolution wind power data reduces the forecast accuracy of WPF models. Fig. 6 depicts the distribution of the residuals for every scenario. The larger spread observed for the high-resolution data comes mostly from the volatility of these data, as their intrinsic characteristics are harder for the model to capture, and it is also affected by the number of steps ahead to predict, as the errors accumulate at every step. Low-resolution residuals show a normal distribution in most of the cases, except for the EEMD models for 24-h ahead forecasts, which are slightly skewed to the right. Furthermore, the 6-h ahead predictions present a few outliers, as most of the residuals are centred around the median. The prediction errors from the high-resolution dataset have different patterns. A right-skewed distribution is observed in the EEMD-FFNN Recursive and the VMD-FFNN Recursive models for 6-h ahead forecasts, and in the EEMD-FFNN MIMO, VMD-FFNN MIMO and VMD-FFNN Recursive models for 24-h ahead forecasts. A normal distribution is visible for the EEMD-FFNN MIMO and VMD-FFNN MIMO models, although there is a considerable number of outliers at the end of both tails. Only the EEMD-FFNN Recursive model shows a skewness to the left. As the error distribution seems to be influenced by the model and the resolution, RMSE can in some scenarios be a more suitable metric than MAE to examine the accuracy of deterministic estimates of wind power forecasts [64], since it provides information not only about the performance of the model but also about the error distribution. Additionally, as the RMSE is related to the second moment of the error, it represents more accurately the presence of larger residuals.

The evolution of the performance evaluation metrics with respect to the number of steps ahead predicted for the low-resolution dataset is shown in Fig. 7. As every step represents a 60-min interval, the number of steps is equivalent to the hours ahead predicted. As expected, the NMAE and the NRMSE values increase as the number of steps increases, since the quality of the predictions shrinks with the prediction horizon.


Fig. 9. 6-h ahead prediction intervals (95% confidence level) using the low-resolution (left) and high-resolution dataset (right).

The same expected behaviour is observed for the IA, indicating a lower performance for larger prediction horizons. The NBIAS is not affected by the prediction horizon, producing approximately regular values for the VMD-FFNN MIMO model and high variance for the other three models. The SDE tends to increase with larger prediction horizons. MAPE scores are higher for a larger number of steps, although this relationship is not linear.

Considering the techniques and the available dataset, the VMD-FFNN MIMO model performs overall better than the rest of the models, and it is used hereafter to discuss robustness by analysing the daily values of the metrics obtained with the forecasts provided by this model (Fig. 8). Ideally, the forecast accuracy of the model should be as independent as possible from the data used to benchmark its validity, and therefore the numerical values obtained by the metrics should be similar for every subinterval. This performance is achieved for NMAE and NRMSE in three of the scenarios: 6-h ahead forecasts (both low- and high-resolution sets) and 24-h ahead forecasts with the low-resolution set. The lack of robustness in the other case (24-h ahead forecasts in the high-resolution set) results from a combination of several aspects: the model itself, the dataset, the prediction horizon length and the time-resolution of the data. Therefore, the model should be further calibrated to verify whether it is able to produce accurate forecasts for this case. The daily-averaged NBIAS shows low variability over the whole testing set for 6-h ahead forecasts, whereas it does not follow any pattern for 24-h ahead forecasts. The NSDE behaves similarly to the NMAE and NRMSE, producing robust outcomes for every case but the 24-h ahead forecasts in the high-resolution dataset. The IA produces similar scores for 6-h ahead predictions in the low-resolution dataset. The results are quite steady for 6-h ahead (high-resolution set) and 24-h ahead (low-resolution set) forecasts, except for day 2, where there is a sudden drop in the performance of the metric. MAPE shows a great variability in its values, indicating a large sensitivity to changes of wind power output.


Fig. 10. Performance evaluation for interval metrics with respect to the number of steps ahead (low resolution dataset).

The performance evaluation metrics not only provide information about the general performance of the models in terms of accuracy, but can also act as a tool to analyse aspects such as the robustness of the models under different conditions. The VMD-FFNN MIMO model shows a robust behaviour when the forecast accuracy is high, since the metrics show low variations when their values are calculated for different periods of the testing set. As expected, the performance evaluation metrics show a decreasing forecast accuracy as the prediction horizon increases. However, the NBIAS does not seem to be affected by the prediction horizon and keeps values around zero for the VMD-FFNN MIMO model. Additionally, the MAPE values are not in line with the scores obtained by the NMAE, the NRMSE, and the IA in terms of accuracy, and consequently its use is not recommended.

5.3.2. Prediction intervals

PIs are estimated 6-h ahead for three confidence levels: 99%, 95%, and 80%. The results are shown in Table 6. As done previously for deterministic estimates, two FFNN models are used to benchmark the skill of the decomposition-based hybrid models.

The PICP and ACE measure exclusively the coverage of the PI. Considering that, these metrics indicate that the VMD-FFNN MIMO model presents the best results in terms of coverage, meaning that a larger number of observations fall within the interval. The PINAW and the IS quantify the width of the interval. The first metric only considers the width of the interval at every time step, while the IS penalizes incorrect intervals as shown in Eq. (16). In this study, narrower intervals are usually built when the recursive strategy is applied, at the expense of reducing the coverage of the interval. Therefore, better scores for PINAW are obtained for narrower intervals (60.05%, 31.09% and 14.26% for the VMD-FFNN Recursive model given 99%, 95%, and 80% confidence levels respectively). Furthermore, since the coverage of the intervals given by this model is high for all confidence levels, it also has the best scores for the IS. The CWC takes into account both PICP and PINAW to consider both features simultaneously. As stated in Eq. (14), those intervals where the PICP is lower than $\mu$ will be penalized, whereas otherwise the CWC equals the PINAW. Looking at the metrics' scores, the decomposition-based hybrid models present more skilful intervals than the FFNN models. Additionally, even if the EEMD-FFNN MIMO model has a high PICP, the intervals are not informative as they are also very wide, and therefore not useful for applications in the industry.

For the high-resolution dataset, the metrics also provide information regarding the multi-step ahead forecast strategy used. For instance, even if the PINAW is only 26.06% for the VMD-FFNN Recursive model compared to 68.49% when the MIMO strategy is used (95% confidence level), the interval covers only 57.78% of the data points of the testing set. The IS provides additional details in terms of the coverage-width relation: the VMD-FFNN MIMO model scores better in every scenario, meaning that this relation is balanced, as fewer intervals are penalized for not covering the actual values. In terms of skill, the behaviour is identical to that observed for the low-resolution dataset.

Fig. 9 displays the PIs obtained for the four models using the low-resolution and high-resolution sets respectively. As the number of steps to predict is lower for the low-resolution dataset, intervals are more accurate and the boundaries are closer to the observations. In the low-resolution dataset, all models can produce reliable intervals, although the intervals are too wide for some of the models.


Fig. 11. Daily performance of metrics for PIs (95% confidence level).

The evolution of the PI performance metrics with respect to the prediction horizon length is shown in Fig. 10. The coverage of the PI decreases with the number of steps only for the EEMD-FFNN Recursive model, since the rest maintain all the values within the interval. This behaviour is replicated for the ACE, whose value decreases with a larger number of steps. The PINAW is barely influenced by the increase of the prediction horizon and does not reveal large changes except for one of the models. Lastly, the IS decreases with the forecast horizon, indicating a lower forecast accuracy for longer horizons.

Fig. 11 shows the daily performance of the interval metrics for both sets. The PICP has reached its maximum value for some of the models, meaning that all the observations fall inside the interval. Therefore, the daily-averaged values of the PICP indicate that the models are robust in these cases. Otherwise, the metric shows the sensitivity of the models to changes in the data, as the PICP acts as a control depending on the percentage of values within the PI.

to changes in the data, as the PICP acts as a control depending on the percentage of values within the PI. The second metric (PINAW) shows a more robust performance for the models in the testing set, although it generates narrower intervals in days 2, 7 and 12 for every model. This behaviour takes place with low wind power generation, so these values seem justified, as the interval will not grow any further at its lower boundary. The CWC shows spikes whenever the PICP is lower than the parameter 𝜇 (set to the confidence level). It provides a control system to know the coverage of the interval and to what degree it is good, as a larger CWC implies a greater penalization by the metric. However, as the CWC is very sensitive to this penalization, evaluating the robustness of the models with this metric is not advised. The ACE provides similar results to the PICP, only it takes into account the significance level. For this reason, except for exceptional cases, it is enough to choose only one of them to analyse the robustness of the interval. The daily-averaged IS produces robust values with low variability for the VMD-FFNN models in the low-resolution dataset. Additionally, the IS provides supplementary information about the forecast accuracy of the interval, since the prediction error by itself accounts only for the size of the PI. The more negative the IS is for a time step, the further away the actual value is from the interval.
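The daily analysis summarised in Fig. 11 can be reproduced by grouping the testing set into daily blocks before scoring. A minimal sketch follows; the column names, the use of pandas, and the normalisation of the PINAW by the range of the observations are assumptions, and the spread of the daily values is used only as a rough robustness indicator.

import numpy as np
import pandas as pd

def daily_interval_metrics(df, mu=0.95):
    # df: DataFrame with a DatetimeIndex and hypothetical columns obs, lower, upper.
    alpha = 1.0 - mu
    covered = (df["obs"] >= df["lower"]) & (df["obs"] <= df["upper"])
    width = df["upper"] - df["lower"]
    outside = (df["lower"] - df["obs"]).clip(lower=0) + (df["obs"] - df["upper"]).clip(lower=0)
    score = -2.0 * alpha * width - 4.0 * outside            # negatively oriented interval score
    days = df.index.date                                    # daily grouping key
    daily = pd.DataFrame({
        "PICP": covered.groupby(days).mean(),
        "PINAW": width.groupby(days).mean() / (df["obs"].max() - df["obs"].min()),
        "IS": score.groupby(days).mean(),
    })
    return daily, daily.std()                               # day-to-day spread as a robustness proxy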
The PICP and PINAW provide direct knowledge in terms of coverage and width. The CWC is highly sensitive to the coverage of the interval and the confidence level, and is therefore not suitable to evaluate the robustness of the models. The ACE provides very similar information to the PICP, so its assessment is not necessary if the PICP is already estimated. The IS allows the robustness of a model to be determined reliably, and provides information about the interval width while penalizing incorrect PIs in terms of coverage.
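The same three recommended scores can also be tracked along the prediction horizon, in the spirit of Fig. 10, by arranging the forecasts with one row per issue time and one column per lead time. The array shapes and the normalisation below are assumptions of this sketch, not the paper's procedure.

import numpy as np

def horizon_wise_interval_metrics(y, lower, upper, mu=0.95):
    # y, lower, upper: arrays of shape (n_issue_times, n_steps).
    alpha = 1.0 - mu
    covered = (y >= lower) & (y <= upper)
    width = upper - lower
    outside = np.clip(lower - y, 0.0, None) + np.clip(y - upper, 0.0, None)
    picp_h = covered.mean(axis=0)                                  # coverage per lead time
    pinaw_h = width.mean(axis=0) / (y.max() - y.min())             # width per lead time
    is_h = np.mean(-2.0 * alpha * width - 4.0 * outside, axis=0)   # interval score per lead time
    return picp_h, pinaw_h, is_h

Plotting the three returned vectors against the lead time gives a Fig. 10 style view of how coverage, sharpness and overall interval quality change with the horizon.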
6. Conclusions
This paper presents an overview of the most common metrics applied for evaluating WPF models in the recent literature for deterministic and probabilistic estimates. Furthermore, this paper illustrates the capability of these metrics to properly evaluate the performance of WPF models over different datasets, time-resolutions and other model-specific attributes. This aspect is often disregarded when determining the validity of a forecasting model over an out-of-sample set, as the values of these metrics can be influenced by the intrinsic characteristics of the dataset and can fluctuate considerably for different periods of the same testing set. A numerical study is presented using wind data from Ireland with two different time-resolutions (10-min and 60-min) and decomposition-based hybrid models to be assessed by the performance evaluation metrics.
Metrics are based on the fundamental theory of statistics and can be considered robust as such. However, they capture different aspects of model performance. Most of the performance evaluation metrics identified for the assessment of deterministic estimates analyse all sources of error together (MAE, RMSE, MAPE, IA), while others evaluate a specific source of error, such as the BIAS, which accounts for the systematic component of the error, or the SDE, where only the random error is analysed. Probabilistic estimates account for both accuracy and precision, and are therefore preferred for model comparability. Their metrics evaluate the coverage provided by the interval, such as the PICP, or its width, such as the PINAW, whereas the IS provides additional information in terms of the overall quality of the interval. These three metrics give enough information to address the forecast accuracy of the interval. On the other hand, the ACE does not provide any additional information if the PICP is already estimated, and is consequently not deemed necessary to evaluate the coverage of the interval. The CWC is highly sensitive to both the tuning parameters and the nature of the training set. Therefore, this metric is recommended as a parameter to train the data in methodologies such as the LUBE method, but not to evaluate the accuracy of a forecasting model.
The different performance evaluation methods are also considered to evaluate the robustness of the models over the testing set. Several aspects are considered, such as the prediction horizon and the time-resolution of the data, as intra-hour wind data show higher volatility. For higher-resolution sets, the models can be further calibrated to provide more accurate forecasts.
This study proves to be useful in the model development stage and aims to improve the benchmarking of WPF models by considering aspects such as the data, the prediction horizon and the time-resolution, as well as the robustness of the models, which can be evaluated through the use of the appropriate performance evaluation metrics. In addition, 6-h and 24-h ahead forecasts have been stressed since these horizons are relevant for wind power trading in electricity markets.

CRediT authorship contribution statement

J.M. González-Sopeña: Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization. V. Pakrashi: Conceptualization, Validation, Investigation, Resources, Data curation, Writing - review & editing, Supervision, Project administration, Funding acquisition. B. Ghosh: Conceptualization, Methodology, Validation, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors acknowledge the funding of SEAI WindPearl, Ireland Project 18/RDD/263. Vikram Pakrashi would like to acknowledge the support of SFI MaREI centre and UCD Energy Institute, Ireland.