1. Introduction
Short-term travel flow forecasting serves as an essential tool to provide useful traffic flow information in the immediate future, and hence wide supports to intelligent transportation systems (ITSs), especially advanced traveler information system, route guidance system, and proactive traffic signal control system. For instance, with accurate real-time traffic flow forecasts, a proactive signal control system is able to set up strategies to significantly reduce traffic delay, and a route guidance system is capable of providing travelers with the optimal route with a minimum travel cost. With this in mind, plenty of methods have been proposed for short-term traffic flow forecasting, which can be generally divided into two categories: (1) traditional forecasting methods that concentrate on improving the accuracy of traffic mean forecasts; and (2) traffic volatility forecasting that focus on the unpredictable part to capture the time-dependent variances.
During the last couple of decades, extensive traffic flow forecasting studies have been conducted using time series models [
1,
2,
3,
4], Kalman filtering [
5,
6], nonparametric regression [
7,
8], neural networks [
9,
10], Bayesian network [
11,
12], random forests [
13,
14], deep neural networks [
15,
16,
17], and so on. The majority of the methods belong to point forecasting methods, in which the second moment (i.e., variance) of the traffic variables was presumed as a constant. Therefore, traffic volatility was artificially omitted during the forecasting process.
As the traffic reliability or uncertainty problem has been highly valued in recent years, many volatility models have been proposed for short-term traffic flow forecasting. Compared with the point forecasting models that presume a constant variance, the volatility models can not only produce accurate mean forecasts but also give more reliable forecasting confidence intervals by capturing the temporal evolution of conditional variances. Among the volatility models, the GARCH-type models, such as the fractionally integrated asymmetric power ARCH (FIAPARCH) model [
18], the exponential GARCH (EGARCH) model [
19], and the GJR-GARCH model [
20], have enjoyed much popularity in past years, for their outstanding ability to describe the asymmetric effect of positive and negative shocks on volatility. Even though these models have been confirmed to be able to produce more reliable forecasts than the traditional point forecasting models, the more or less imposed restrictions on parameter estimations may make the asymmetric property of traffic volatility be not or insufficiently considered. Furthermore, the performance of the models has not been fully evaluated and compared in the traffic forecasting context, rendering the choice of the models dilemmatic for traffic volatility modeling. Therefore, it is necessary to develop a flexible traffic volatility forecasting framework under which various volatility models can be easily implemented and fully tested to yield more reliable forecasts and enhance the forecasting models’ applicability.
In this research, an omnibus traffic volatility forecasting framework based on the FGARCH model [
21] is proposed to model traffic speed volatility on urban roads and freeways. The volatility models under the proposed framework are established by fixing or flexibly estimating three key parameters, namely the Box-Cox transformation coefficient
, the shift factor
, and the rotation factor
. Extensive case studies show that the proposed framework can be used to account for both shift and rotation asymmetric effects and allows for more flexible traffic volatility descriptions that are usually restricted in some classical volatility models. The achieved findings provide insights into how to develop and select proper traffic volatility forecasting models in different situations and are helpful for building traffic flow forecasting models with high reliability and good applicability.
The main contributions of this research include two aspects. First, we present a comprehensive traffic volatility forecasting framework, which can be utilized to develop effective volatility models in a flexible manner, and are capable of properly capturing the evolution of conditional variance with symmetric or asymmetric properties. Second, we conduct extensive experimental studies to fully evaluate and compare the performance of the established volatility forecasting models, including the symmetric standard GARCH [
22] and the asymmetric TGARCH [
23], NGARCH [
24], NAGARCH [
25], GJR-GARCH [
19], and FGARCH [
21] and further summarize their applicability in various situations.
The remainder of this paper is organized as follows: In
Section 2, a comprehensive literature review regarding traffic volatility forecasting is provided. Then, the proposed framework based on the FGARCH model for traffic volatility forecasting is elaborated in
Section 3. In
Section 4, the performance of the developed traffic volatility forecasting models is fully evaluated and compared using real traffic speed datasets collected from urban road links and freeway segments. Conclusions and future work are summed up in
Section 5.
2. Literature Review
During the past decades, a large number of volatility methods have been developed for short-term traffic flow forecasting. In general, these methods describe the traffic volatility during the forecasting process by using the standard GARCH or its advanced extensions.
The GARCH-type models were firstly proposed in the fields of finance and economics. In order to deal with the problem that the volatility of financial time series changes over time, the autoregressive conditional heteroscedastic (ARCH) model was introduced by Engle [
26], and then further promoted by Bollerslev [
22] as the GARCH model. In a following study, Kamarianakis et al. [
27] pointed out that the time-varying nature of traffic volatility can be recognized as the alternative large movement period that can be observed when the traffic condition changes. Therefore, a standard GARCH model was applied to study the volatility of relative speed. Given the strong ability to capture the traffic volatility with time-variant conditional variances, the GARCH model has been widely applied in many pieces of research for short-term traffic flow forecasting. Guo and Williams [
28] proposed to use the autoregressive moving average (ARMA) model for traffic speed mean forecasting and the GARCH model for traffic speed volatility forecasting. At the stage of online processing, a layered adaptive Kalman filter approach was developed. Similarly, Chen et al. [
29] proposed an integrated model which combines the autoregressive integrated moving average (ARIMA) model and the GARCH model for traffic flow forecasting and reached the conclusion that the time-variant confidence intervals achieved can provide more reliable information for travelers, compared with the standard ARIMA model. Ding et al. [
30] modeled the dynamic volatility of the subway short-term ridership forecasting process by constructing four kinds of integrated ARIMA-GARCH models. Their experimental results show the performance of the hybrid models is better than the traditional models because of the improvement in the reliability of the forecasting point value with the associated coverage probability of the forecasting interval. Considering that the inherent relationships between different traffic variables could contribute to the improvement of traffic forecasting accuracy and reliability, Xia et al. [
31] developed a multivariate GARCH model for traffic volume and speed volatility forecasting, in which a vector autoregressive model was used for jointly forecasting traffic volume and speed mean values. In addition, from the perspective of traffic flow decomposition, Chen et al. [
32] proposed a time-series analysis and supervised-learning (TSA-SL) model for short-term traffic flow prediction by decomposing the traffic flow series into two components, namely periodicity and volatility.
Due to the standard GARCH process having a linear relationship between the conditional variance and the shocks in the past, the GARCH model is limited in perfectly describing the temporal evolution of traffic volatility. Moreover, the squared term in the standard GARCH model does not allow the asymmetric effects of traffic volatility that might exist when traffic condition changes dramatically. Therefore, Tsekeris and Stathopoulos [
18] presented a fractionally integrated asymmetric power ARCH (FIAPARCH) model to relax the linear and symmetric restrictions of the standard GARCH model, and in return, to better represent the nonlinear and asymmetric properties of traffic volatility. In their research, the fractionally integrated component was introduced to describe the long memory of traffic volatility since the standard GARCH process can only capture the short memory of the volatility evolution that was restrained by the assumption of exponentially decayed propagation of shocks. Evaluation results show that the FIAPARCH model outperforms the standard GARCH model in traffic volatility forecasting. Similarly, Zhang et al. [
19] applied another two asymmetric volatility models, i.e., an EGARCH model and a GJR-GARCH model, for reliable short-term travel time forecasting. Compared with the standard GARCH model, both the EGARCH and GJR-GARCH models allow the conditional variance to respond differently to the past negative and positive shocks that release the symmetric restriction in the GARCH model. Performance evaluation demonstrates that the GJR-GARCH is better than the standard GARCH model and the EGARCH model in terms of the forecasting bandwidth and coverage probability.
Considering traffic volatility may be better modeled by removing underlying traffic patterns that can be decomposed from original traffic flow data series, Zhang et al. [
20] proposed a hybrid traffic flow forecasting model that used spectral analysis to remove the periodic trend of traffic flows using an ARIMA process to model the deterministic component of traffic flow, and using a GJR-GARCH process to model the traffic volatility. Experimental results show that the hybrid model is able to improve forecasting accuracy and reliability. Years later, Lin et al. [
33] proposed a hybrid PSO-ELM model to quantify the uncertainty of border crossing traffic volume prediction and compared it with the adaptive Kalman filter method used by Guo et al. [
28] and the GJR-GARCH method proposed by Zhang et al. [
20]. The results confirm that the proposed PSO-ELM model can always reach satisfactory forecasting reliability. As a nonlinear analytical model, the integer-valued GARCH (INGARCH) can be utilized to capture the characteristics of network traffic. Therefore, it was introduced in the modeling of network traffic forecasting by Kim [
34] and turned out to be more competitive than the ARIMA, GARCH, and long short-term memory neural network. In addition, Yao et al. [
35] presented a nonlinear hybrid method for traffic flow forecasting, which uses the ARIMA-GARCH model to predict the similar and volatile parts, and applies the Markov model with state membership degree and wavelet neural network to predict the irregular part. Another hybrid model named ARIMA-GARCH-M developed by Lin and Huang [
36] is able to predict short-term high-speed traffic flow data with good accuracy and reliability. To extend the GARCH model to be used for spatial data, Sato and Matsuda [
37] proposed a spatial GARCH (S-GARCH) model. Considering the effect of both space and time restrictions, Hølleland and Karlsen [
38] introduced a stationary spatiotemporal GARCH. To deal with the time-varying characteristic of traffic flow, a hybrid method that combines the ARIMA, maximum correntropy criterion, conditional kernel density estimation, and GARCH model was put forward by Zhao et al. [
39], which outperforms the ARIMA model in terms of both accuracy and reliability.
The aforementioned GARCH-type models describe the evolution of traffic volatility from linear to nonlinear, from symmetric effect to asymmetric effect. Though the dynamics of traffic volatility has been modeled explicitly step-by-step, the ability of capturing traffic volatility may be subject to the restrictions of the appointed GARCH-type models. For instance, the asymmetric property of volatility can be captured by EGARCH, GJR-GARCH, and asymmetric power ARCH models because of the introduced “rotation factor” in the ARCH term. However, another critical factor—“shift factor”—that can also be used to describe the asymmetric property is not considered in these models. According to Helbing et al. [
40,
41], the volatile behavior dynamics of drivers was found to be similar to the stockholders in a financial market. As both the rotation and shift effects have been recognized in the financial market [
21], it is reasonable and necessary to develop a more flexible volatility model that permits both shift and rotation asymmetric properties for traffic volatility modeling. Moreover, according to Hentschel [
21], multiple volatility models are able to be established within the same theoretical GARCH framework. However, the performance of these models on traffic volatility has not been comprehensively tested, making their feasibility in traffic reliability inexplicable for researchers and practitioners.
3. Method
To develop an effective traffic volatility model that can better capture the evolution of the conditional heteroskedasticity underlying traffic flow time series, an omnibus traffic volatility forecasting framework is presented, as depicted in
Figure 1.
Generally, the framework can be systematically deconstructed into five steps. First, a mean forecasting model is built based on a calibration dataset and the ARIMA algorithm. To achieve satisfactory mean forecasts, the stationarity test and autocorrelation test are conducted using the augmented Dickey–Fuller (ADF) test [
42] and the Ljung–Box test [
43], respectively, and the orders of the ARIMA model are optimally identified with a step-wise algorithm [
44] based on the Bayesian information criterion (BIC) [
45]. Second, conditional heteroscedasticity, also known as the ARCH effect, is checked for the residual series obtained according to the mean equation. To achieve this purpose, the Lagrange multiplier test [
26] is carried out. Third, if the ARCH effects are statistically significant, multiple traffic volatility models based on the FGARCH algorithmic framework are established to capture the symmetric or asymmetric effects in traffic volatility, while a joint estimation of the mean and volatility equations is performed to determine the model parameters using the maximum likelihood estimation (MLE) [
46] method. Fourth, the developed models are checked to ensure the validity of the mean equation, volatility equation, as well as the distribution assumption. Finally, the traffic volatility forecasting models are evaluated in terms of performance measures of both mean forecasting and volatility forecasting, and the best model is selected for different application scenarios. In what follows, each step of the proposed framework is described in more details.
3.1. Mean Forecasting Model Development
The purpose of developing the mean forecasting model is to remove the sample mean from the traffic flow data and obtain the residual series for further traffic volatility modeling. One of the most widely used mean forecasting models, namely ARIMA [
19,
20,
27,
31], can be applied to fit the mean values of the traffic flow time series. Note that if the modeling traffic flow data contains the seasonal components, the seasonal ARIMA (SARIMA) [
3,
4] model can be constructed to ensure good mean forecasting performance. In this study, we only use one day’s data to calibrate the model, where the seasonality is not a typical characteristic of the traffic flow time series. As a result, the ARIMA model is established, which is mathematically expressed as
where
is the traffic variable at time
,
is the backshift operator,
is the differencing order,
is the autoregressive polynomial order,
is the moving average polynomial order,
and
are the autoregressive and moving average parameters, and
is the error term. After the specification of the mean equation, four further subprocedures need to be carried out to establish the qualified mean forecasting model.
First, the stationarity of the traffic flow time series needs to be checked, which is an important assumption in modeling the ARIMA model. The ADF test [
42] is a unit root test, which can be utilized to check the stationarity of a time series. The presence of a unit root implies the analyzed time series is non-stationary and the number of unit roots corresponds to the number of differencing operations required to make the series stationary. For the traffic flow time series
, to verify the existence of a unit root in an AR (
), we can perform the test
versus
with an ADF regression
where
is a deterministic function of the time index
, and can be a constant or a linear function of
,
are the autoregressive coefficients,
is the differenced series of
,
is the coefficient of
. Equation (2) can also be rewritten as
where
and the equivalent hypothesis can be replaced as
versus
. If we accept
means the traffic flow series needs to be differenced to make it stationary, otherwise the stationary condition is satisfied.
Second, the autocorrelation of the traffic flow time series is tested with the Ljung–Box test [
43]. As a linear time series model, ARIMA is characterized by its autocorrelation function (ACF), and its modeling process makes use of the sample ACF to specify the forecasting model that can capture the dynamic dependence of the data. In view of this, it is necessary to check whether several autocorrelations of the series
are 0. The Ljung-Box test is a popular statistical means to achieve the above goal. The test statistic is given as
where
is the autoregressive lag order of the series
and can be commonly set as
,
is the sample size,
is the sample ACF of
and a biased estimate of
that is the lag-
autocorrelation of
. The null hypothesis is
and the alternative hypothesis is
for
. If
,
will be rejected, where
denotes the
percentile of a chi-squared distribution with
degrees of freedom.
Third, a step-wise algorithm proposed by Hyndman and Khandakar [
44] is introduced to determine the autoregressive order
and moving average order
. One of the advantages of the algorithm is that it is able to automatically identify the appropriate orders of the mean equation and guarantee a valid model. This advantage is significant since the order
and order
are often application-dependent, implying the modeling process may be labor-intensive or time-consuming if we determine the orders manually with site-to-site traffic flow data. In this study, the BIC [
45] is adopted to measure the quality of each candidate model during the step-wise searching procedure and is defined as
where
is the maximum likelihood estimate of
that is the variance of
,
is the penalty of each parameter. The selection rule is to compute
for
, where
is a prespecified positive integer, and select the value of
as the order
or
that has the minimum BIC value.
Once the orders of the ARIMA are optimally determined, the last subprocedure of building the mean forecasting model is to estimate the parameters
and
with the MLE method. Note that this subprocedure is optional since we can estimate the parameters of the mean equation and volatility equation jointly. In this study, we conduct this subprocedure because we will test whether the joint estimation procedure can indeed improve the point forecasting accuracy. Interested readers can refer to [
47] for more details.
3.2. ARCH Effects Test for Traffic Flow Residual Series
Once the mean forecasting model is established, the residual series can be computed according to the observations and mean forecasts. The conditional heteroscedasticity of the residual series, also known ARCH effects, needs to be carefully checked to model the traffic volatility. In view of this, the Lagrange multiplier test [
26] is introduced to check whether there are ARCH effects in the traffic flow residual series. The Lagrange multiplier test is to conduct testing in the following regression:
where
is the residual of the mean equation at time
,
denotes the regression error term. More specifically, we define the null hypothesis as
against the alternative hypothesis as
for some
between
and
. Denote
, where
is the sample mean of
, and
, where
is the least squares residual of the prior linear regression. Then we have
which follows an
distribution with degrees of freedom
and
under
. When
is sufficiently large,
can be used as the test statistic, which is asymptotically a chi-squared distribution with
degrees of freedom under the null hypothesis. Let
be the upper
percentile of
. If
, we reject the null hypothesis and the residual series is heteroscedastic; otherwise, it does not have any ARCH effect. In this study, if the
-values computed from the
statistic are less than the significant level
, a proper volatility model needs to be established to describe the evolution of the traffic flow volatility.
3.3. Traffic Volatility Forecasting Model Development
When the significant ARCH effects are checked, the subsequent step is to capture the effects with a proper volatility model. The ARCH model proposed by Engle [
26] provides the first systematic framework for volatility modeling. Although owning some advantages for volatility modeling, the ARCH model also has some weaknesses. For example, the ARCH model is rather restrictive and often imposes complicated constraints with the increase of the orders, resulting in the limited applicability of the model. The GARCH model, first introduced by Bollerslev [
22], is a useful extension of the ARCH model and can provide a simpler parametric function to describe the volatility evolution. It should be indicated that both models respond equally to the positive and negative shocks [
47], and hence cannot be utilized to describe the asymmetric property of the volatility, a common phenomenon in a vast number of application areas.
To deal with the above issue, various GARCH-type models have been presented. Among the GARCH-type models for short-term traffic flow forecasting, the GARCH (1,1) model has been verified as the most adequate and simple representation of traffic volatility [
11,
18,
31]. Therefore, the omnibus FGARCH (1,1) based volatility equation is constructed in this study for short-term traffic volatility forecasting, which is mathematically expressed as
where
is the error term of the mean equation,
is the time-variant conditional standard deviation to reflect the traffic volatility, random variable
is a strong white noise process,
,
, and
are the parameters that need to be estimated,
is the Box-Cox transformation coefficient,
is the shifted and rotated absolute value function, in which the parameter
controls the shift effect and the parameter
governs the rotation effect.
The parameter
can be obtained from Box-Cox transformation for the conditional standard deviation. Therefore, the relationship between the conditional variances and all shocks in the past may be more flexible than the linear relationship. As indicated by Ding et al. [
48], a better estimate of the power transformation coefficient
can help maintain the long memory property of the volatility. In the standard GARCH model,
is fixed as
, which only describes the short memory property of volatility. The shift factor
and the rotation factor
are introduced to model the asymmetric impact upon volatility exerted by positive and negative shocks—called the leverage effect in the financial markets. Small shocks are dominated by the shift effect, while the rotation is more important for large shocks [
21]. As Helbing et al. [
40] demonstrated, there are intrinsic similarities between market players and traffic network users, implying that the shift and rotation effects of the asymmetric property that have been recognized in financial volatility may also exist in traffic volatility.
The above volatility equation has a flexible form for modeling volatility by controlling the parameters
,
, and
. For instance, we can derive the standard GARCH model by restricting the parameters
and
. That is, the standard GARCH model imposes restrictive nonnegative constraints on the parameters and also restricts a linear relationship between conditional variances and all shocks in the past. Moreover, the restriction
makes the standard GARCH model a symmetric model, in which both the shift and rotation effects are not allowed to describe the asymmetric property of volatility evolution. Similar restrictions can be seen in other GARCH-type models such as
,
in the EGARCH model, and
and
in the APARCH and TGARCH models. By manually setting up or freely estimating part or all of the three parameters, we can further obtain multiple models that can capture different traffic flow volatility characteristics. Interested readers can refer to Hentschel [
21] for the detailed restrictions of the three parameters in the FGARCH algorithmic framework. Overall, by constructing the FGARCH-based volatility equation, we can implement multiple traffic volatility forecasting models within a unifying framework, and are herein, able to provide a more appropriate and flexible form for explicitly describing the evolution of volatility and to fully investigate the feasibility of the models in different situations. The parameters of the developed traffic volatility forecasting models are finally determined by using the MLE method [
46].
3.4. Traffic Volatility Forecasting Model Checking
The adequacy of the fitted models needs to be examined carefully before traffic volatility forecasting. More specifically, we need to check the validity of the established mean equation and volatility equation. The validity of the mean equation can be examined by applying the Ljung–Box test to the standardized residual series
and the validity of the volatility equation can be checked by applying the Ljung–Box test to the squared standardized residual series
. When the above
-values of the Ljung–Box statistic are smaller than
, the traffic volatility forecasting model can reduce the heteroscedasticity in the residual series and capture the statistical characteristics of traffic volatility well. Moreover, the skewness, kurtosis, and quantile-to-quantile plot of the standardized residual series can be applied to examine the validity of the distribution assumption [
47].
3.5. Traffic Volatility Forecasting Model Evaluation and Selection
The last step of the framework is to evaluate the performance of the developed models and select the most effective one for further traffic volatility forecasting according to the evaluation results.
For traffic flow mean forecasting performance evaluation, two typical measures, i.e., the mean absolute error (MAE) and the mean absolute percentage error (MAPE), are selected, which are defined as follows.
In the above equations,
is the observed value of the traffic variable at time
,
is the forecast value of the traffic variable at time
, and
is the number of the tested data samples. To check the difference of the MAE measures of the two competing models, the two-sided DM test statistic [
49] is adopted.
With regard to traffic volatility forecasting performance evaluation, four classical measures, including the kickoff percentage (KP), the average confidence length (ACL), the volatility MAE (VMAE), and the directional accuracy, are employed. For short-term traffic flow forecasting, the KP measure indicates the probability that future observations do not lie in the forecasted confidence intervals when given the evaluation sample size. Theoretically, the better the performance is, the closer the KP is at a given significant level
(
in this study) [
19]. The ACL measure quantifies the average length of the forecasted confidence interval. To reduce forecasting uncertainties, the forecasted confidence interval should be reasonably as narrow as possible. According to Chen et al. [
50], VMAE measures the average magnitude of volatility forecasting error, and DA measures the correctness of the turning point forecasts that gives a rough indication of the average direction of the forecast volatility. The four measures are defined as follows.
In the above equations,
is a two-valued variable,
if
; otherwise,
.
and
are the lower and upper bounds of the forecasted confidence interval at time
.
is the forecasted confidence length at time
.
and
represent the observed volatility and forecasted volatility, respectively. As the observed volatility is unknown in reality,
can be used as the surrogate [
50], where
is the mean value of the tested data samples. The forecasted volatility is quantified as
.
4. Experimental Analysis
4.1. Data Description
The proposed traffic volatility forecasting framework and developed models were tested using real traffic speed data collected from urban roads of Kunshan city, China and freeway segments of the San Diego Region, USA [
51]. To investigate the performance of the models under different traffic conditions, two kinds of datasets were constructed. One consists of the datasets collected under normal traffic conditions, while the other includes the datasets collected under incident conditions. The former datasets were collected at 35 stations on 5 types of roads, namely major arterials, minor arterials, branches, expressway, and freeway. For each station, we collected data in four days. Two of them are the weekdays, and the other two are weekends. For the latter, the traffic flow data was acquired from the freeway monitoring stations associated with 12 incidents occurred in different days. The traffic speed data of each station was aggregated at 5-min intervals and divided into a calibration dataset and an evaluation dataset. The calibration dataset was used to build the traffic volatility model, while the evaluation dataset was utilized to test the performance of the calibrated model. More detailed information on the used data is described in
Appendix A Table A1 and
Table A2. It is noted that the missing, erroneous, and suspicious data is less than 3% for each monitoring station during the data collection periods, ensuring that the selected traffic datasets are suitable for model calibration and evaluation.
4.2. Performance Evaluation of Mean Forecasting Models
To evaluate the mean forecasting performance of the developed models, the MAE and its two-sided DM test statistics were computed and compared. In this study, the parameters of the mean forecasting model can be estimated in two ways—namely separate estimation and joint estimation. The former is implemented based on a separate ARIMA modeling process, while the latter is achieved by estimating the parameters of the mean equation and volatility equation jointly. The two ways may lead to different mean forecasting performance. Therefore, it is necessary to investigate the performance difference between the two ways.
Table 1 shows the performance comparisons between the separate estimation and joint estimation on the evaluation datasets. The first column of the table indicates the traffic flow patterns considered in this research, including normal traffic patterns on weekdays, normal traffic patterns on weekends, and incident traffic patterns. The second column depicts the comparison results. The label “Win” means the joint estimation way outperforms the separate estimation way. The label “Lose” means the joint estimation way is worse than the separate estimation way. The label “Tie” means the two ways have identical or similar performance. The digital number in the table cell represents the number of stations associated with the label in the corresponding row. The percentage in the parentheses indicates the proportion of the associated label.
As can be seen from the table, under normal traffic conditions, the two estimation ways show very similar performance, especially on weekends. However, under incident traffic conditions, there were significant differences between the performance of the two estimation ways on nearly half of the evaluation stations. More specifically, the separate estimation way achieved better performance than the joint estimation way. This implies that it is better to first estimate the parameters of the mean equation and further use the residual time series to estimate the parameters of the volatility equation when modeling traffic volatility under incident traffic conditions.
Traffic on different types of roads exhibits distinct fluctuation patterns. To properly evaluate the performance of the established models under different volatility levels, we grouped the stations with different road types according to the calculated VMAE measure, as illustrated in
Table 2. As seen, for normal conditions on weekdays, the traffic on minor arterial and branch show significant fluctuations and had a high volatility level. The major arterial and freeway possess medium volatility level, while the expressway has the lowest volatility. Different from the normal conditions on weekdays, the daily traffic on weekends passed through the freeway appears to have very low fluctuations and was hence put into the low volatility level group. The traffic flow on freeway segments under incident conditions shows medium-level volatility.
Table 3 provides the average MAE value and its DM test rank of the developed models on the evaluation datasets. As mentioned above, the stations were categorized into three groups (H-high volatility level; M-medium volatility level; L-low volatility level) according to the volatility level. The average MAE value was calculated as the mean value of all stations in the same group. The DM test rank was computed based on the following procedure: First, for each station in the same volatility level group, we compare the performance of two models according to the DM statistic and
value. If the
value is less than the significant level (0.05 in this study), it means one of the two comparative models is significantly better or worse than the other model. By repeating the above comparison process, we can obtain the rank of each model. Next, the ranks of each model are averaged for all stations in the same volatility level group. The smaller the rank is, the better performance the model possesses.
The best rank is labeled with the red color, while the worst rank is labeled with the blue color. Note that when the DM test ranks of two models are same, the best model is further determined by the average MAE values. As indicated by the average MAE values, the developed six models show very similar mean forecasting performance. By further going through the DM test ranks, we can find that the NGARCH has better performance when the volatility levels are high and medium. Comparatively, the TGARCH shows worse performance in the same situations. It is a bit surprising that the TGARCH model shows advantages over other models under incident conditions. We can also see that the freely estimated FGARCH model takes the most advantage at the high volatility level on weekdays.
Table 4 gives the average MAPE value and its rank for each developed model. The computing procedure is similar with that of the MAE measure. From the evaluation results, we can obtain a similar observation, further confirming the conclusions achieved above.
4.3. Performance Evaluation of Volatility Forecasting Models
To effectively evaluate the volatility forecasting performance of the developed models, the VMAE, DA, KP, and ACL measures were calculated. The four measures quantify the model performance from different perspectives. The VMAE measures the accuracy of traffic volatility forecasting. The DA gives the volatility direction accuracy of the forecasting models. KP and ACL ensure the forecasts fall into the predicted confidence interval as much as possible when the interval length is short enough. Considering the inconsistency of the measures in some cases, we also computed the ranks of the models for each measure to make a fair comparison.
Table 5 lists the average VMAE value and its corresponding DM test rank of the developed models. As can be seen, the volatility forecasting performance is quite similar for the six models. However, their ranks indicate that different models usually show distinct advantages compared to different traffic speed datasets. As indicated by Hentschel [
21], the GARCH-type models developed in this study show similar forecasting results on the Daily U.S. stock dataset due to the fact that they are derived from the same modeling framework. Our experimental results also point to a similar conclusion, although the models indeed show distinct forecasting behaviors on several datasets. For instance,
Table 6 and
Figure 2 depict the distinct performance of the models on the incident dataset I818_1118078.
As seen in
Table 3, the estimated parameters of the mean equation of the models appear to be very similar because the orders of the equation are the same. For the volatility equation, it is obvious that the estimated values of the parameter
exhibit distinct diversity among different models. The different values of the parameter
implies that the developed models may capture the conditional variance and respond differently to the past shocks when modeling the same traffic flow series. In addition, both of the freely estimated parameters
and
are different from 0, indicating the existence of asymmetric property of traffic speed volatility controlled by the shift and rotation effects at the same time. Moreover, for the FGARCH model, the estimated parameter
is positive and parameter
is negative, implying that the asymmetric property of traffic speed volatility at this station may be dominated by the combination of rightward shift effect and anticlockwise rotation effect, which is consistent with the “news impact curve” introduced by Pagan and Schwert [
52].
Figure 2 depicts the forecasted speed standard deviation curves of the developed models. As seen from the figure, these models respond differently to the fluctuations induced by the incident. The NGARCH model decreases the standard deviation forecasts gradually with the time evolution, while the other five models are all able to capture the incident volatility but respond with different magnitudes. To further check the effectiveness of the models, the observation, mean forecasts, as well as the forecast confidence intervals are given in
Appendix A Figure A1. It can easily be seen that the FGARCH exhibits the best volatility forecasting performance among the six models because it can not only correctly capture the volatility characteristics caused by the incident but also provide more reliable and narrower confidence interval than the others.
The DA measure describes the forecasting directional accuracy of the given model. In view of this, we calculated the average DA value and its rank of the six models, as shown in
Table 7. From the table, it is obvious that under normal traffic conditions, the standard GARCH model has the best performance on weekdays and the NAGARCH model shows more advantages over other models on weekends. Interestingly, the FGARCH has the best performance under incident conditions while not performing well under normal traffic conditions.
The absolute deviation of KP from the significance level of 0.05 was calculated to check the validity of the models at each traffic station. The calculation results include the average value and its rank on all stations of the same volatility level group, which are illustrated in
Table 8. From the table, we can see that the standard GARCH model and the GJR-GARCH model show better performance than the other four models in terms of KPD, which means the two models exhibit the smallest absolute deviation of KP from the given significant level (
) and tried to make the confidence interval contain forecasted data samples as much as possible. In contrast, the FGARCH exhibits unsatisfactory performance in most cases.
The average ACL value and the rank of each developed model are presented in
Table 9. As can be seen from the table, the overall ACL values of the models show similarities in different situations. Relatively, the standard GARCH model, GJR-GARCH model, and FGARCH model tend to enlarge the forecasted confidence interval. In contrast, the TGARCH model attempts to give a narrower confidence interval in most cases. It needs to be indicated that it is commonly difficult for a volatility model to achieve the lowest KP and ACL at the same time because the two measures are counterbalanced in describing volatility characteristics.
4.4. Impact of Identifying Orders of ARIMA on Forecasting Performance
There are commonly two strategies for identifying the orders of the mean equation of the proposed framework. The first is an automtic identification strategy, which uses the step-wise searching algorithm [
44] with an information criteria (e.g., AIC or BIC). The other is a manual identification strategy that finds the optimal orders by manually inspecting the ACF and PACF plots. Each strategy has its own advantages. For example, the automatic identification strategy is more computationally efficient and can be utilized to handle extensive datasets in an automated manner and hence realize significant cost savings, while the manual identification strategy is capable of providing more reliable forecasts in some cases.
To investigate the impact of the two order identification strategies on the forecasting performance of the developed models, we recorded the forecasting results using the two strategies. After careful inspection and calibration, we found that the two strategies lead to similar forecasting performance for most stations. However, for several specific monitoring stations, they show significant differences.
Appendix A Table A3 and
Table A4 show the comparison results on such kinds of stations. From the tables, we can see that the models established based on the automatic strategy can result in a significantly poor mean forecasting performance and volatility forecasting performance. The MAE and DA deteriorate dramatically for all of the developed models. This observation can be further obtained with
Figure 3, which describes the forecasting results of the developed models at station MA_1014014 on 9 September 2018. The mean forecasts and volatility forecasts biase the true observations significantly, while they are improved evidently after using the manual order identification strategy. Therefore, it is better to carefully check the identified orders of ARIMA when all of the developed volatility models perform poorly and try to determine the orders with the manual strategy.