Expected Idiosyncratic Volatility

This study analyzes nearly 80 million daily returns from over 19,000 firms to identify the best forecasting model for realized idiosyncratic variances, finding that the ARMA(1,1) model outperforms others, including the martingale model. The research reveals that expected idiosyncratic volatility is influenced by factors such as beta and turnover, while showing no significant relationship with returns when using the best models. The findings provide insights into the pricing of idiosyncratic risk and contribute to the understanding of the IVOL puzzle.


Journal of Financial Economics 167 (2025) 104023

Contents lists available at ScienceDirect

Journal of Financial Economics


journal homepage: www.elsevier.com/locate/jfec

Expected idiosyncratic volatility☆


Geert Bekaert a,*, Mikael Bergbrant b,#, Haimanot Kassa c,##
a Columbia Business School and CEPR
b St. John’s University, Department of Economics and Finance, Tobin College of Business, Queens, NY 11439, USA
c Miami University, Department of Finance, Farmer School of Business, Oxford, OH 45056, USA

A R T I C L E   I N F O

Dataset link: Expected Idiosycratic Volatility (Reference data)

JEL Classification: G11, G12, G14

Keywords: Idiosyncratic volatility, IVOL puzzle, Volatility forecasting, Max returns, Martingale, ARMA, ARIMA, HAR, MIDAS, Quarticity, Realized variances, EGARCH

A B S T R A C T

We use close to 80 million daily returns for more than 19,000 CRSP listed firms to establish the best forecasting model for realized idiosyncratic variances. Comparing forecasts from multiple models, we find that the popular martingale model performs worst. Using the root-mean-squared-error (RMSE) to judge model performance, ARMA(1,1) models perform the best for about 46% of the firms in out-of-sample tests. The ARMA(1,1) model delivers an average RMSE that is statistically significantly lower than all alternative models, and also performs well when not the very best. Its forecasts reverse large, unexpected shocks to realized variances. When using this model to revisit the relation between idiosyncratic risk and returns (the IVOL puzzle), we fail to find a significant relation. The IVOL puzzle is closely connected to a very small set of observations where the martingale forecast over-predicts the future realized variance. These extreme observations are correlated with well-known firm characteristics associated with the IVOL puzzle such as poor liquidity as measured by high bid-ask spreads and the “MAX” effect.

1. Introduction

Quantifying the expected variance of an individual stock position requires estimates of systematic exposure, the conditional variances of the systematic factors (e.g., the market variance in a market model), and the expected idiosyncratic variance. There is a very extensive literature on the first two components,1 but the literature on quantifying the expected idiosyncratic variance is scant. This is surprising as it is well-known that the idiosyncratic variance component, more often than not, dominates the systematic component of individual stock return variances.

In this paper, we build on the state-of-the-art literature on volatility forecasting to run a horse race between a variety of models to measure expected idiosyncratic risk. Our base set of models includes the popular martingale model (see Ang et al., 2006), a variant of Corsi’s (2009) heterogeneous autoregressive (“HAR”) model, a non-linear

☆ Nikolai Roussanov was the editor for this article. We sincerely thank the referee for their valuable comments, which significantly improved the exposition of the paper. We are grateful for comments from Turan Bali, Jonathan Lewellen, Nikolai Roussanov (editor) and Avanidhar Subrahmanyam as well as the remarks of seminar participants at University of Denver, West Virginia University and Ohio University. The paper has also benefitted from feedback at the Southern Finance Conference (2023). We thank Jens Mueller for helping us use Miami University’s Redhawk cluster for processing large batches of data.
* Corresponding author at: Columbia University, 3022 Broadway, Uris Hall, Room 411, New York, NY 10027, United States.
E-mail addresses: gb241@gsb.columbia.edu (G. Bekaert), bergbram@stjohns.edu (M. Bergbrant), kassah@miamioh.edu (H. Kassa).
# Reed-McDermott Associate Professor of Finance.
## Lindmor Associate Professor of Finance.
1 For example, the literature on predicting market variances is too vast to survey, but we use some of its state-of-the-art models, like the Corsi (2009) model, below.

https://doi.org/10.1016/j.jfineco.2025.104023
Received 18 June 2023; Received in revised form 1 February 2025; Accepted 3 February 2025
Available online 20 February 2025
0304-405X/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

autoregressive model and an ARMA(1,1) model. To accommodate potential specification errors for the systematic component of returns and/or market wide variance components, we also consider the last three models with the market variance as an additional independent variable. Our models aim to forecast future monthly idiosyncratic realized variances using daily data for individual firm returns going back to 1926. Using mainly the Root Mean Squared (Forecast) Error, henceforth RMSE, we find that an ARMA(1,1) model for realized variances is the best model in terms of out-of-sample performance. It generates the best out-of-sample forecasts for more than 34% of firms. The ARMA model and the ARMA model with the market variance together generate the best forecasts for over 46% of firms. The ARMA(1,1) model also performs relatively well when not the very best, ranking among the top 3 models for more than 72% of firms. Furthermore, the median ratio of the RMSE of the ARMA model to that of the best model is 1.0106, meaning that the RMSE of the ARMA forecast is only 1.06% higher than that of the best model for the median firm. More telling, the 75th percentile of this ratio is 1.0549, implying that the ARMA forecasts are very good even when not the very best. Finally, it delivers statistically significantly lower average RMSE than all alternative models. Economically, the ARMA model features an autoregressive coefficient much higher than an AR(1) model, but lower, of course, than the martingale model. However, through a large moving average coefficient its forecasts reverse large, unexpected shocks to realized variances.

The outperformance of the ARMA(1,1) should not come as a surprise given the literature on high frequency volatility forecasting (see e.g. Barndorff-Nielsen and Shephard, 2002). When the integrated variance follows a simple autoregressive process, the realized variance naturally follows an ARMA(1,1) process.2 This literature also suggests various extensions of our base models, including higher order ARMA models (see e.g. Andreou and Ghysels, 2002), models embedding quarticity (see e.g. Bollerslev et al., 2016), and MIDAS models (see Ghysels et al., 2007). However, even compared to these more sophisticated models, the ARMA(1,1) model remains the best performing model. We perform a host of robustness checks to the measurement of idiosyncratic realized variances (IVAR) underlying our main results, and the results remain unchanged.

Our results offer a useful perspective on the huge literature on the (negative) pricing of idiosyncratic risk in the cross-section of expected returns (see Ang et al. 2006). We find that while expected idiosyncratic risk derived from the martingale model is negatively (and significantly) related to returns (as shown in prior literature), none of the other forecasts, including those of the best ARMA(1,1) model, are significantly related to returns. Our failure to reject the null of no relation continues to hold when generating the forecasts from the “best” model for each firm (judged by the lowest out-of-sample RMSE for each firm) and when back testing models each month and choosing the forecast from the model with the best past performance. Digging deeper, we document that the negative relationship between idiosyncratic risk and future realized returns using the martingale model is very fragile, and disappears when a limited number of firm-month observations are removed. Specifically, removing just 0.4% of the observations each month with the highest realized idiosyncratic variance forecast errors, measured as forecast minus realization, suffices to render the relationship insignificant. The relationship is also rendered insignificant in the univariate (multivariate) regression models when 5% (7%) of firm months with the highest martingale - ARMA(1,1) forecast differences are removed. Hence, the IVOL puzzle is driven by observations where the martingale forecast is particularly poor, and specifically when past IVAR shocks (i.e. high IVAR in time t-1) revert in time t, which the martingale fails to account for. Importantly, under identical sample variations for the ARMA(1,1) model, the inference regarding the relationship between idiosyncratic variances and returns does not change.

We then link these extreme forecast errors to standard controls used in IVOL-puzzle regressions and the explanatory variables explored in Hou and Loh (2016), using a variety of econometric techniques. We conjecture that firm characteristics associated with large martingale forecast errors should also help explain the IVOL puzzle. We find a strong relationship with bid-ask spreads, size indicators, measures of illiquidity, and past returns (the reversal effect). The variable most dramatically related to extreme forecast errors is the MAX variable (the maximum daily return over the past month) from Bali et al. (2011), which seems a good proxy for firm months in which the realized idiosyncratic variance is excessively and temporarily high.

The ARMA(1,1) forecasts differ drastically from the martingale model, as the former model allows temporary shocks to IVAR to revert, and therefore does not generate many of the poor forecasts (outliers with large forecast errors) that drive the negative relation between martingale IVOL and returns. This helps explain the difference in the relation between the forecasts from this model and returns. We perform the same analysis using the difference between the martingale and ARMA(1,1) forecasts and, indeed, the same firm characteristics that explain the extreme martingale forecast errors are associated with large differences in the forecasts of the two models.

We further characterize what firm properties cause variation in expected idiosyncratic volatility. Focusing on the forecasts from the ARMA(1,1) model, we find that expected idiosyncratic volatility is increasing in beta and turnover while decreasing in firm size and book to market, with the size and turnover effects economically largest. Finally, we investigate the evolution of aggregate expected idiosyncratic volatility over time. It appears overall stationary but shows occasional extreme peaks, such as during the Great Depression, the 1970s’ stagflation, the Tech boom and ensuing bear market, the Great Financial Recession, and the Covid shock.

Our paper provides useful inputs to several literatures. First, the literature on the dynamic properties of expected idiosyncratic volatility should be helpful in risk management, asset allocation and option pricing. In asset management, idiosyncratic volatility is a key determinant of the optimal weight assigned to mis-priced securities or arbitrage opportunities (see Pontiff, 2006; Treynor and Black, 1973). Many individual option pricing models (see e.g., Bakshi et al., 2021) start from models built for index options. Our results suggest that such models may miss an important dynamic component of idiosyncratic risk.3

Secondly, a rapidly growing macroeconomic literature studies the effect of uncertainty shocks on real economic activity and business cycles, and documents that heightened uncertainty can entail economic slowdowns through delayed firm investments, or increased precautionary savings by households. While macroeconomists often employ stock market data to measure uncertainty, they take various short-cuts by using aggregate market volatility or measures of cross-sectional return dispersion (see e.g., Bloom (2009) and Christiano et al. (2014)). However, the relevant economic models seem to call for a measure of idiosyncratic variance, reflecting non-systematic volatility, and, better still, a measure of firm-specific productivity or output uncertainty. Our measures may therefore provide useful input to this macroeconomic literature.4

Finally, our finding that expected IVOL is not priced when estimated using reasonable models should be comforting to proponents of efficient

2 We thank the referee for pointing us in this direction.
3 Bakshi et al. (2003) show, inter alia, that individual equity options reveal less negative skewness than market options. We estimate our models using projections or quasi maximum likelihood and do not take a stand on conditional distributions, which are very important in asset pricing. Our results suggest that the data generating process for individual stock variances should feature a shock dampening volatility effect, as implied by the ARMA(1,1) model.
4 Kozeniauskas et al. (2018) distinguish between uncertainty shocks derived from both micro and macro data, but also show they are substantially correlated.

markets and traditional asset pricing theory. In addition, the findings that the negative relation between martingale IVOL forecasts and returns is driven by very few observations with particularly poor forecasts, and that explanations of the puzzle are all related to the incidence of these poor forecasts should be helpful in resolving the “IVOL Puzzle”.

The remainder of the paper is organized as follows. The next section (2) describes the data. Section 3 describes the construction of our expected realized variance proxies using 7 different base models and a variety of more sophisticated models. Section 4 analyzes the forecast performance; Section 5 contrasts the pricing of idiosyncratic risk under different volatility models and analyzes why the martingale model uniquely delivers a negative relation with returns. Section 6 analyzes the properties of expected idiosyncratic volatility using the best (ARMA) model. The conclusion is presented in Section 7.

2. Data and variable descriptions

We obtain daily holding period returns inclusive of dividends, prices, shares outstanding, and volume data for equities traded on NYSE/AMEX and NASDAQ from January of 1926 to December 2022 from the Center for Research in Security Prices (CRSP). This provides us with a total of 78,304,731 daily (3,715,261 monthly) firm returns covering 26,493 firms. The number of firms varies between 500 (1926:1) and 7,519 (1997:12) over the 1,164 available months. We obtain accounting data from Compustat’s annual fundamentals file. The Fama and French (1993) factors and the risk-free rate (RF) are obtained through Wharton Research Data Services (WRDS). Our tests on the relative performance of different models (Tables 1-4) are based on data over the entire 1926–2022 sample. The remaining tables have a later start date based on data availability from Compustat as described below.

To characterize the results and conduct tests for whether idiosyncratic risk is priced, we collect data on firm characteristics. Beta, size, and market-to-book ratios are defined as in Fama and French (1992) and updated annually while turnover and the coefficient of variation of turnover are constructed following Chordia et al. (2001) and updated monthly. Turnover (TURN) is calculated as the average share turnover (monthly volume divided by the number of shares outstanding) in the past 36 months (a minimum of 24 months of data is required). We also calculate the coefficient of variation of the turnover over the same time period (CVTURN). Momentum (MOM) is calculated as the mean monthly holding period return from month t-7 to t-2, and Reversals (LAGRET) represents last month’s return. Momentum and Reversals are calculated based on monthly data and updated monthly. All cross-sectional variables, except for beta, are calculated ex-ante.5

To explore how martingale IVOL forecasts may contribute to the IVOL puzzle, we collect data on common explanatory variables used in this literature (see e.g. Hou and Loh, 2016). Skewness (SKEW) is calculated using daily returns in the previous month while Co-skewness (COSKEW) is measured as the regression coefficient of squared daily stock returns on market returns in the previous month. We require 15 return observations within a month for both SKEW and COSKEW. We obtain expected idiosyncratic skewness (EXPIDIOSKEW) from Boyer et al. (2010).6 We use several alternative measures of market frictions. Amihud Illiquidity (AMILL) is calculated following Amihud (2002) as the past 12-month average of daily returns divided by turnover while bid-ask spread (BIDASK) is calculated following Amihud and Mendelsohn (1986) as the effective bid-ask spread based on Corwin and Schultz (2012) scaled by the stock price. Both measures are downloaded from the Open Source Asset Pricing website7 and these values are lagged to ensure that we do not introduce a look-ahead bias. Finally, Zero return (ZERORET) is measured following Han and Lesmond (2011) as the fraction of trading days with a zero return in the previous month. We also use two alternative size measures (both dummies). One is for firms that have a price below $5 at the end of the prior month (LOW_PRC), and another is for firms with a market capitalization below the 5th NYSE percentile in the prior month (LOW_MC). Finally, we measure MAX as the highest daily return in the prior month.

Although we use all available data to create our variables, many of our tests start in 1963 due to limitations in accounting data prior to that time. For our asset pricing tests, we follow prior literature on investigating whether idiosyncratic risk is related to returns in the cross-section and winsorize all continuous variables (except for returns8 and BETA) monthly at the 0.5% and 99.5% levels.

3. Variance forecasting models

To compute the best possible estimate of expected idiosyncratic variances, we rely on the state-of-the-art volatility forecasting literature, which now almost exclusively uses realized variance models. Andersen et al. (2003) show that, under mild conditions, the expected conditional realized variance converges to the conditional return variance, where the realized variance is the quadratic variation of high frequency returns. Therefore, our criterion to define forecasting performance is the ability of different models to predict future realized idiosyncratic variances.9 In Section 3.1, we outline the computation of idiosyncratic variances; in Section 3.2, we first discuss why theoretically, an ARMA(1,1) model would be a natural forecasting model for realized variances, whereas the martingale model is mis-specified. We then present the 7 base models we examine. In Section 3.3, we discuss and justify various extensions of the base models. These include higher order ARMA(p,q) processes, models embedding quarticity (following Bollerslev et al., 2016), MIDAS models (Ghysels et al., 2019), and the monthly EGARCH model (Nelson, 1991).

3.1. Calculating idiosyncratic realized variance

The extensive literature on volatility forecasting focuses mostly on index returns, where the consensus currently has converged to using 5-minute returns (see Andersen et al., 2001, and many others) as an adequate trade-off between microstructure noise and the continuous time limit. With microstructure noise (see e.g., Chordia and Subrahmanyam, 2004; Asparouhova et al., 2010; and Han and Lesmond, 2011 among others) more important for most individual firms than for market returns, sparser sampling might be preferred (see Bandi and Russell, 2006). Although an active econometric literature has proposed several new realized variance estimators to reduce the effects of microstructure noise (see e.g. Jacod et al. (2009)), we are forced to measure idiosyncratic shocks at the daily frequency because high frequency data for (a subset of) individual stocks is only available since the early nineties.

Let’s define the idiosyncratic return shock for an individual firm i, observed at day d in month t, as $\varepsilon_{i,d,t}$. We define the monthly idiosyncratic realized variance ($IVAR_{i,t}$) as the sum of the squared daily idiosyncratic return shocks ($\varepsilon_{i,d,t}$) within month t. For our main models, we use $\varepsilon_{i,d,t}$ from the Fama and French (1993) model estimated within

5 Fama and French (1992) argue the forward-looking BETA proxy is more precise than using BETA estimates at the firm level. While this variable is used in analyzing our results, as well as the asset pricing tests, it is not used to compute idiosyncratic risk and does not feature in our model selection exercises.
6 We thank Prof. Brian Boyer for making the data available for download at https://boyer.byu.edu/skewness-data.
7 https://www.openassetpricing.com/.
8 Following prior literature, we set monthly returns over 300% to missing for our asset pricing tests.
9 While a few articles forecast volatility (the square root of the variance), volatility is not a proper moment and constitutes a non-linear transformation of the variance.
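As a rough illustration of the variable construction described in Section 2, the sketch below computes MAX and ZERORET from one firm-month of daily returns and winsorizes a monthly cross-section at the 0.5% and 99.5% levels. The function names, the NumPy dependency, the simulated inputs, and the use of linearly interpolated percentiles are all assumptions made for this example; the paper does not describe its implementation at this level of detail.

```python
import numpy as np

def max_and_zeroret(daily_returns):
    """Return (MAX, ZERORET) for one firm-month of daily returns.

    MAX is the highest daily return in the month; ZERORET is the
    fraction of trading days with a return of exactly zero.
    """
    r = np.asarray(daily_returns, dtype=float)
    return float(r.max()), float(np.mean(r == 0.0))

def winsorize_cross_section(values, lower_pct=0.5, upper_pct=99.5):
    """Clip one month's cross-section of a variable at two percentiles.

    Assumption: percentiles use NumPy's default linear interpolation.
    """
    x = np.asarray(values, dtype=float)
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)

# Hypothetical firm-month with five trading days.
mx, zr = max_and_zeroret([0.01, 0.0, -0.02, 0.05, 0.0])

# Hypothetical cross-section of 1,000 firms with two extreme values.
rng = np.random.default_rng(0)
cross_section = rng.normal(size=1000)
cross_section[0], cross_section[1] = 50.0, -50.0
clipped = winsorize_cross_section(cross_section)
```

In a panel setting the winsorization would be applied separately to each month, since the text specifies monthly cutoffs.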


month t.

$$RET_{i,d,t} - RF_{d,t} = a_{i,t} + b_{i,t}\,(MKT_{d,t} - RF_{d,t}) + s_{i,t}\,SMB_{d,t} + h_{i,t}\,HML_{d,t} + \varepsilon_{i,d,t}$$

$$IVAR_{i,t} = \sum_{d=1}^{N_t} \varepsilon_{i,d,t}^{2} \tag{1}$$

where $RET_{i,d,t}$ is the stock return for firm i on day d in month t, $RF_{d,t}$ is the daily equivalent of the one-month Treasury bill rate from Ibbotson Associates on day d in month t obtained from Professor Ken French (through WRDS), and MKT, SMB and HML are the market, size and value factors. $N_t$ is the maximum number of trading days within the month.

Our definition of idiosyncratic shocks closely follows the finance literature, but we also consider several additional alternatives for robustness (see Section 4.4 for the results). First, we use the CAPM in month t instead of the Fama-French three factor model. Second, we estimate the parameters of the Fama-French three factor model using daily data in the prior 3 months (t-2 to t), to estimate $\varepsilon_{i,d,t}$ in month t. Third, we adjust IVAR for autocorrelation following French et al. (1987) by adding 2 times the sum of daily cross-products of adjacent returns. Specifically,

$$IVAR^{FSS}_{i,t} = \sum_{d=1}^{N_t} \varepsilon_{i,d,t}^{2} + 2 \sum_{d=1}^{N_t - 1} \varepsilon_{i,d,t}\,\varepsilon_{i,d+1,t} \tag{2}$$

where, as stated above, the subscript i is the firm, t is the month, d is the day within the month, and $N_t$ is the maximum number of trading days within the month. Fourth, we restrict the sample to only include firms in the S&P500 in a given month, as this should ensure that the findings are not driven by illiquidity or micro-structure noise.

3.2 Predictive models for realized variances

Ang et al. (2006) find that lagged realized volatility is negatively related to returns (which is often termed the IVOL Puzzle), implicitly using a martingale model for realized variances. This is the most popular IVAR model and the first model we consider:

$$IVAR_{i,t} = IVAR_{i,t-1} + e_{i,t} \tag{3}$$

However, on theoretical grounds, this model is almost surely mis-specified. Without getting into technical details, the realized variance literature shows that under certain conditions, the conditional return variance can be measured as the conditional expected value of integrated variance, which is approximated by the realized variance. Let’s define integrated variance as $IV_t$, and realized variance (measured using a particular sampling interval) as $RV_t$; then $RV_t = IV_t + u_t$, where $u_t$ is a zero mean noise term, reflecting microstructure noise and the fact that $RV_t$ uses a particular sampling frequency during the day (see Andersen et al., 2003; Barndorff-Nielsen and Shephard, 2002 for formal discussions). If we assume an AR(1) model for $IV_t$, and standard white noise for the error term $u_t$, the resulting time series model for $RV_t$ is an ARMA(1,1) model. Thus, there are theoretical reasons to expect the ARMA(1,1) model to outperform the martingale model, and it is the second model we consider:

$$IVAR_{i,t} = a_i + b_i\,IVAR_{i,t-1} - \theta_i\,e_{i,t-1} + e_{i,t} \tag{4}$$

We estimate this model by Maximum Likelihood (MLE), assuming the initial residual is zero. Economically, when a large volatility spike occurs, it likely causes a large positive residual error; with a positive θ coefficient, the model can then counteract the high volatility prediction of the autoregressive component.

Econometrically, there is an alternative way to accommodate relatively quick changes in the dependence on past variances, namely time-varying autoregressive parameters. It is conceivable that the persistence of volatility is different in normal times than in high volatility times. For example, imagine a firm experiencing a large earnings shock, which causes its volatility to spike. Presumably, such a large volatility spike should not persist into the next month. To accommodate such variation in persistence, our third model is a non-linear AR(1) model, where the persistence coefficient is a function of the past variance (“ARNL”):

$$IVAR_{i,t} = a_i + \theta_i \cdot L\left(b_i \cdot IVAR_{i,t-1}\right) \cdot IVAR_{i,t-1} + e_{i,t} \tag{5}$$

where L(.) is the logistic function, i.e., L(x) = exp(x)/[1 + exp(x)]. We expect the b-coefficient to be negative. Bekaert et al. (2024) find evidence for such non-linearity in US, Euro area and Japanese market variances.10 The model is estimated by non-linear least squares.

The fourth model is a variant of the Corsi (2009) “HAR” model, perhaps the most popular model in the extensive realized variance forecasting literature at the market level:

$$IVAR_{i,t} = a_i + b_i\,IVAR_{i,t-1} + \theta_i\,IVAR_{i,t-1,w} + c_i\,IVAR_{i,t-1,d} + e_{i,t} \tag{6}$$

where $IVAR_{i,t-1,w}$ is the sum of squared residuals in the last week (w) of the prior month (t-1) for firm i and $IVAR_{i,t-1,d}$ is the squared residual on the last trading day (d) in the prior month (t-1) for firm i. Thus, more recent information can differentially affect the conditional variance for the next month. The HAR model was initially developed for daily realized variance forecasting and the current formulation may not be optimal for monthly forecasting. However, the results in Bekaert and Hoerova (2014) for stock return variances at the market level, and for various asset classes in Ghysels et al. (2019) show that it performs reasonably well for monthly horizons. In essence, the HAR model is a MIDAS regression with step functions, see e.g. Ghysels et al. (2007) and the relevant discussion in Corsi (2009). If optimal at the daily level, the monthly forecast could also be viewed as a special case of a MIDAS forecast, putting different weights on the various daily realized variances within the month. We consider a MIDAS framework directly in Section 3.3.

In addition to the previous 4 models, we also estimate each of the last 3 models including the sum of squared daily returns of the market during the prior month (i.e., the market realized variance) as an independent variable. While we have removed systematic components in returns, there is evidence of common factors in idiosyncratic firm variances (see Herskovic et al., 2016, for evidence in the US, and Bekaert et al., 2024, for evidence in 21 countries). Allowing dependence on the market variance is a simple way to potentially capture such factors, which may improve the forecasting power of the model.11

3.3. Additional predictive models for realized variances

While the bulk of results focuses on the base models introduced in Section 3.2, we also estimate a wide set of alternative models. These models fall into four different categories: generalizations of the ARMA(1,1) model, alternative time-varying parameter models embedding quarticity, MIDAS models which generalize the HAR model, and the EGARCH model.

3.3.1. Additional ARMA models

While theory suggests that realized variances follow an ARMA(1,1) model, this assumes that the integrated variance is an AR(1) process. Chernov et al. (2003), among others, find that continuous time volatility models are best described by two-factor models, which would suggest an ARMA(2,2) model for realized variances. Component models of

10 The model is also related to the quarticity correction proposed in Bollerslev et al. (2016), which models the persistence coefficient as a function of the degree of uncertainty with which the realized variance is measured, see Section 3.3.
11 Bartram et al. (2016) and Bekaert et al. (2012) find a strong link between the aggregate idiosyncratic variance and the conditional market variance.
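The construction of the idiosyncratic realized variance in Eqs. (1) and (2) can be sketched as follows: regress one firm-month of daily excess returns on the three factors, sum the squared residuals, and optionally add twice the sum of adjacent residual cross-products. This is only a minimal sketch; the function name `ivar_from_factors`, the NumPy least-squares call, and the simulated inputs are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def ivar_from_factors(excess_ret, mkt_rf, smb, hml):
    """IVAR (Eq. 1) and the autocorrelation-adjusted IVAR (Eq. 2).

    All inputs are daily series for a single firm-month. The regression
    includes an intercept and is estimated on that month's days only.
    """
    y = np.asarray(excess_ret, dtype=float)
    X = np.column_stack([np.ones_like(y), mkt_rf, smb, hml])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    eps = y - X @ coef                      # daily idiosyncratic shocks
    ivar = float(np.sum(eps ** 2))          # sum of squared residuals
    # Add 2 x the sum of cross-products of adjacent daily residuals.
    ivar_fss = ivar + 2.0 * float(np.sum(eps[:-1] * eps[1:]))
    return ivar, ivar_fss, eps

# Simulated firm-month with 21 trading days (all numbers are made up).
rng = np.random.default_rng(42)
n_days = 21
mkt_rf = rng.normal(0.0004, 0.010, n_days)   # market excess return
smb = rng.normal(0.0, 0.005, n_days)
hml = rng.normal(0.0, 0.005, n_days)
shock = rng.normal(0.0, 0.020, n_days)       # true idiosyncratic shock
excess_ret = 0.0002 + 1.1 * mkt_rf + 0.4 * smb - 0.2 * hml + shock

ivar, ivar_fss, eps = ivar_from_factors(excess_ret, mkt_rf, smb, hml)
```

Because four parameters are fitted on roughly 21 observations, the residual sum of squares can never exceed the sum of the true squared shocks in this simulation, which is worth keeping in mind when reading within-month estimates.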

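The argument in Section 3.2, that an AR(1) integrated variance observed with white noise yields an ARMA(1,1) realized variance, can be illustrated with a small simulation. The sketch below uses the textbook result that the implied moving average coefficient θ solves θ/(1+θ²) = φσ_u²/(σ_η² + (1+φ²)σ_u²), then compares one-step forecasts from Eq. (4) and Eq. (3). All parameter values, the Gaussian shocks (which are stylized and do not rule out negative values), and the function names are assumptions; the paper estimates its models by maximum likelihood rather than using known parameters.

```python
import math
import random

def implied_theta(phi, var_eta, var_u):
    """MA coefficient of the ARMA(1,1) implied by AR(1) signal plus noise.

    Returns the invertible root (between 0 and 1 for phi, var_u > 0).
    """
    r = phi * var_u / (var_eta + (1.0 + phi ** 2) * var_u)
    return (1.0 - math.sqrt(1.0 - 4.0 * r * r)) / (2.0 * r)

def simulate_rmse(phi=0.9, mean_iv=10.0, sd_eta=1.0, sd_u=2.0,
                  n=20000, seed=1):
    """Simulate RV = IV + noise; return RMSEs of ARMA and martingale forecasts."""
    rng = random.Random(seed)
    a = mean_iv * (1.0 - phi)              # intercept of the AR(1) for IV
    theta = implied_theta(phi, sd_eta ** 2, sd_u ** 2)
    iv, rv_prev, e_prev = mean_iv, mean_iv, 0.0   # initial residual is zero
    se_arma = se_mart = 0.0
    for _ in range(n):
        iv = a + phi * iv + rng.gauss(0.0, sd_eta)   # integrated variance
        rv = iv + rng.gauss(0.0, sd_u)               # noisy realized variance
        f_arma = a + phi * rv_prev - theta * e_prev  # forecast from Eq. (4)
        f_mart = rv_prev                             # forecast from Eq. (3)
        e_prev = rv - f_arma
        se_arma += e_prev ** 2
        se_mart += (rv - f_mart) ** 2
        rv_prev = rv
    return math.sqrt(se_arma / n), math.sqrt(se_mart / n), theta

rmse_arma, rmse_mart, theta = simulate_rmse()
```

In this setup a large positive forecast error in one period lowers the next ARMA forecast through the −θ term, while the martingale forecast carries the noisy spike forward in full, which is the mechanism the text describes.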

volatility (e.g., the permanent and transitory components in Engle and Lee, 1999; or bad and good volatility components in Bekaert et al., 2015) also suggest richer ARMA processes. In fact, Barndorff-Nielsen and Shephard (2002) show theoretically that integrated and realized variances are ARMA(p,p) processes when the spot variance is a linear combination of p independent continuous time autoregressive processes.
An alternative literature, focusing on improving the measurement of realized variances, suggests using smoothing by incorporating lagged realized variances (see e.g. Ghysels et al., 2023 and Andreou and Ghysels, 2002). The Andreou-Ghysels paper suggests that for the S&P 500 index, using up to two lags improves measurement of monthly realized variances. Such measurement schemes would also suggest more complicated ARMA models for the realized variance. We therefore consider an alternative forecasting model, where we estimate all (12) ARMA(p,q) models, with p in the (1,2,3) set and q in the (0,1,2,3) set. For each firm, in each month (t), we then use the forecast for month (t + 1) from the model with the lowest Akaike information criterion (1974, AIC henceforth).
While theory may favor more complicated models, it is well known that in out-of-sample forecasting, parsimonious models do well. We therefore also consider a simple autoregressive AR(1) model for realized variances:

$$IVAR_{i,t} = a_i + b_i \, IVAR_{i,t-1} + e_{i,t} \tag{7}$$

3.3.2. Quarticity models
Ghysels et al. (2023) derive the optimal weight on current and past realized variances in the measurement of integrated variance, suggesting the weight on current realized variance should depend on the noise with which it is measured. This noise is proportional to the quarticity of realized variances, which depends on returns to the fourth power (see e.g. Barndorff-Nielsen and Shephard, 2002). Bollerslev et al. (2016) focus directly on forecasting future realized variances and suggest estimating an AR(1) model with the AR(1) coefficient interacted with realized quarticity, or an HAR model with some or all of the coefficients interacted with the relevant quarticity measure. Note that the intuition here is quite similar to the non-linear autoregressive model (ARNL) in the base set of models. We estimate a number of such models.

3.3.2.1. Autoregressive quarticity model

$$IVAR_{i,t} = \alpha_{i,0} + \left(\alpha_{i,1} + \alpha_{i,1Q}\sqrt{RQ_{i,t-1}}\right) IVAR_{i,t-1} + e_{i,t} \tag{8}$$

where RQ stands for realized quarticity, and RQ is generally estimated as: $RQ_{t,d} = (M/3)\sum_{k=1}^{M} r_{t,k}^{4}$, where $r_{t,k}$ is the high frequency (e.g. 5 min) return, and $M$ is the number of high frequency returns used per day, where we dropped the stock subscript. Of course, in our set up with daily returns, $M = 1$, so that daily quarticity is the daily idiosyncratic return to the 4th power and monthly quarticity becomes $RQ_{t} = (1/N_t)(1/3)\sum_{k=1}^{N_t} r_{t,k}^{4}$, where $r_{t,k}$ are now idiosyncratic daily returns within month $t$ and $N_t$ is the number of trading days within month $t$. The prediction is that $\alpha_{1Q}$ is negative; when the noise in measuring RV is high, the persistence of realized variances decreases.

3.3.2.2. HAR quarticity model. Here all coefficients in the HAR model interact with the relevant quarticity measure:

$$IVAR_{i,t} = \alpha_{i,0} + \left(\alpha_{i,1} + \alpha_{i,1Q}\sqrt{RQ_{i,t-1}}\right) IVAR_{i,t-1} + \left(\beta_{i,1} + \beta_{i,1Q}\sqrt{RQ_{i,t-1,w}}\right) IVAR_{i,t-1,w} + \left(\gamma_{i,1} + \gamma_{i,1Q}\sqrt{RQ_{i,t-1,d}}\right) IVAR_{i,t-1,d} + e_{i,t} \tag{9}$$

where the adjustment for the weekly (5 days) and one day quarticity measures sums the daily returns to the 4th power over the recent 5 (1) days and divides by 5 times 3 for the weekly estimate or by 1 times 3 for the daily estimate.

3.3.2.3. ARMA(1,1) quarticity model. In this model, both the AR and MA coefficients are a linear function of the quarticity measure.

$$IVAR_{i,t} = \alpha_{i,0} + \left(\beta_{i,1} + \beta_{i,1Q}\sqrt{RQ_{i,t-1}}\right) IVAR_{i,t-1} + \left(\theta_{i,1} + \theta_{i,1Q}\sqrt{RQ_{i,t-1}}\right) e_{i,t-1} + e_{i,t} \tag{10}$$

For the above quarticity models, we set the starting values based on the median values from the “base” models, and we set the quarticity coefficients to 0. All quarticity models are estimated using PROC MODEL in SAS.

3.3.2.4. MIDAS quarticity model. We defer the discussion of this model to the next section.
In all these models, we follow Bollerslev et al. (2016) and do not use RQ itself in the estimation but RQ minus its sample mean so that $\beta_{i,1}$, for example, is the unconditional autoregressive estimate. In our out-of-sample analysis, this averaging is done for each rolling sample over which we conduct the estimation.

3.3.3. MIDAS models
As mentioned before, the HAR model is a special case of a MIDAS model. We therefore also consider MIDAS models explicitly: that is, we model the forecast as a function of all $IVAR_{i,t,d}$ in the prior 22 days with a flexible weight function. We parametrize the weights with an (exponential) Almon lag specification as in Ghysels et al. (2019). Obviously, the monthly implied model of the daily HAR model would be a special case of this model. Specifically, this model can be written as follows:

$$IVAR_{i,t} = \alpha_i + \phi_i \sum_{j=0}^{K} \left( w_{i,j} \, IVAR_{i,t-1,d} \right) + e_{i,t} \tag{11}$$

where the t subscript is a month subscript and d indicates the day within the month, with $j = 0$ representing the last day in the month (the one that is nearest), and $K$ the first day. We set $K = 22$. The weight function depends on two parameters, with the weights adding to one and always positive. Dropping the stock subscript i, the weight for day j is:

$$w_j = \exp\left(\theta_1 j + \theta_2 j^2\right) \Big/ \sum_{j=0}^{K} \exp\left(\theta_1 j + \theta_2 j^2\right)$$

Note that this specification has three parameters, two determining the weight function and one “autoregressive” parameter. This weight function can fit almost any pattern, for example, if $\theta_2 < 0$ weights decline to zero eventually. We estimate the model by non-linear least squares, just as we did for the nonlinear AR model. We use agnostic starting values, setting $\theta_1 = \theta_2 = 0$, which produces equal $1/K$ weights. For $\phi$, we try different starting values in the set (11,9,7,5,3) and use the one that produces the smallest objective function value to start the iterative process. These starting values correspond to a wide, reasonable range of AR(1) coefficients. Note that for weights equal to $1/K$, the usual autoregressive coefficient would correspond to $\phi/K$.
Running this non-linear estimation for our large set of stocks over a long historical period is not trivial, and we did experience convergence issues in about 8% of the cases. To improve the estimation performance, we also estimate simpler versions of the MIDAS model, namely a first order and second order Taylor approximation to the actual model. Internet Appendix 3 describes the derivation of these models. These models typically perform about as well as the original model and are more stable. Therefore, our estimation approach uses the quadratic model when the full model fails to converge, and the linear model when the quadratic model also fails to converge.
We also estimate the quarticity version of this model:

$$IVAR_{i,t} = \alpha_i + \left(\phi_{i,0} + \phi_{i,1}\sqrt{RQ_{i,t-1}}\right) \sum_{j=0}^{K} \left( w_{i,j} \, IVAR_{i,t,d} \right) + e_{i,t} \tag{12}$$
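To make the monthly quarticity measure and the exponential Almon weights concrete, the following Python sketch may help. It is an illustration only, not the estimation code behind the paper: the function names are hypothetical, the number of daily lags is passed as an argument rather than fixed by the summation range, and the parameters alpha, phi, theta1 and theta2 are taken as given instead of being estimated by non-linear least squares.

```python
import math


def monthly_realized_quarticity(daily_idio_returns):
    """Monthly RQ from daily returns: (1/N_t) * (1/3) * sum of r^4."""
    n = len(daily_idio_returns)
    return sum(r ** 4 for r in daily_idio_returns) / (3.0 * n)


def almon_weights(theta1, theta2, n_lags):
    """Exponential Almon weights for lags j = 0, ..., n_lags - 1.

    The weights are positive and sum to one by construction.
    With theta1 = theta2 = 0 every lag gets the same weight.
    """
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(n_lags)]
    total = sum(raw)
    return [x / total for x in raw]


def midas_forecast(alpha, phi, theta1, theta2, daily_ivar_latest_first):
    """alpha + phi * (weighted sum of last month's daily variances).

    Assumption: the list is ordered with the most recent day first,
    so that index 0 corresponds to j = 0 in the weight function.
    """
    w = almon_weights(theta1, theta2, len(daily_ivar_latest_first))
    weighted = sum(wj * v for wj, v in zip(w, daily_ivar_latest_first))
    return alpha + phi * weighted
```

With equal weights, setting phi to the number of lags makes the forecast equal to alpha plus the sum of the daily variances, which is one way to see why the implied autoregressive coefficient scales with phi divided by the number of lags.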


3.3.4. EGARCH models
Finally, we also consider Nelson’s (1991) EGARCH model to predict future realized variances using monthly returns. In fact, this is one of the most popular models in the idiosyncratic volatility literature (see Fu, 2009; Fink et al., 2012; and Guo et al., 2014). However, as shown by Bergbrant and Kassa (2021), the model is a poor fit for most firms, and consistent with this we find that its performance is by far the worst of all the models we consider. Internet Appendix 3 describes how, consistent with the literature, we estimate this model and we later briefly further comment on its performance.

4. Forecast performance
Section 4.1 describes the practical setup of our forecasting exercise. Section 4.2 discusses the out-of-sample results for the base set of models, whereas Section 4.3 discusses the relative performance of the additional models considered and Section 4.4 presents robustness tests.

4.1. Forecasting set-up and in sample results
All our predictive models are estimated using monthly data. We require at least 36 monthly observations to include a firm in our sample. While we conduct a thorough in sample analysis, its results are mostly confirmed by the out-of-sample analysis. We therefore relegate a detailed discussion of the in-sample tests to the Internet Appendix (Internet Appendix 1).
For the out-of-sample exercise, we opt to use relatively long rolling samples of 10 years. This choice strikes a balance between having sufficient data to efficiently estimate model parameters and accommodating varying volatility dynamics over a firm’s life cycle. This is feasible because our data go back to 1926 for many firms.¹² Concretely, we use rolling regressions, allowing up to 120 months (10 years) of data to estimate the models. We fit the models with data up until time t-1, and use the forecast for time t as our out-of-sample forecast. As indicated above, we require a minimum of 36 observations to generate the forecast for the following month. (Hence, for firms within their first 10 years of listing, the forecasts use an expanding window). In total, we estimate 17 out-of-sample models, and we only include a forecast for firm i at time t, when calculating statistics for each model, if we have forecasts from all our estimated models for the firm in that month. Our main performance criterion is the out-of-sample RMSE (the square root of the mean squared forecast errors).
With samples sometimes being relatively short, some models produce real outlier forecasts that any reasonable practitioner would reject. Although this does not impact the inferences regarding the best model, we winsorize all variance forecasts at 0 and 0.5 in our main tests. The 0.5 maximum value is close to the 99.5th percentile of realized variances in our full sample. (In robustness exercises, presented in Panels E and F of Appendix A, we redo the main tables with no winsorization and winsorization at the 2.5% and 97.5% levels and find qualitatively similar results.)
To provide a formal statistical test that any model indeed performs better in the cross-section of US stocks, we conduct three paired significance tests. Each test produces a test statistic for the null hypothesis that the mean (or median) of the RMSE is equal across two models (i.e. the difference between them is equal to 0) against the two-sided alternative that the mean or median is not equal. The first is the standard Student’s t-test, which is appropriate if the data is from an approximately normal distribution. Our second and third tests conduct non-parametric tests of the differences in RMSEs between models, specifically the Sign Test and the Wilcoxon Signed Rank Tests.¹³ Because these tests are quite standard, we relegate a brief description of them to Internet Appendix 1, where they are discussed together with the in-sample results.

4.2. Out-of-Sample performance for base models
We report the main out-of-sample performance results for the base models in Table 1, using the RMSE to measure forecast errors. The mean (median) RMSE is reported in the first (last) column. We produce average firm statistics in two ways: the first is to simply average the RMSE across firms and the second is to also weight the RMSE by the number of monthly observations available per firm. Although the weighted results show substantially lower average RMSEs, consistent with expectations that model performance improves with the number of observations used, the relative performance of the models is almost identical with the two methodologies, so we only report and discuss the simple arithmetic averages. The table shows that the martingale model is the worst model, but it is not too far off from the HAR model that uses the market variance. The best model is by far the simple ARMA(1,1) model both in terms of the mean and median RMSE.
In columns 2 and 3, we report the fraction of times the model is the best model (lowest RMSE) and the average rank. That is, for each firm we rank the performance of all models (with 1 being best and 7 being worst) and record the average rank. The ARMA(1,1) model is also best in terms of average rank (2.63) and is the best model in 34.3% of the cases. Together, the ARMA model and the ARMA model appended with market variance are the best for 46.5% of the cases. Despite featuring a higher average and median RMSE than all other models, the martingale model is the best model in 13.8% of the cases, and only two additional models other than the ARMA(1,1) model are best in more than 10% of the cases (the simple ARNL model and the ARMA model with the market variance). In terms of average rank, the second-best model is the ARNL model (3.38), followed closely by the ARMA model with the market

Table 1
Out-of-sample forecast performance for base models.

Model               Mean    Best    Rank  Imp. to Worst  Median
Martingale          0.1031  13.83%  5.32  29.41%         0.0324
ARMA(1,1)           0.0989  34.28%  2.63  81.80%         0.0275
ARNL                0.1005  17.35%  3.38  70.15%         0.0281
HAR                 0.1021  8.18%   4.37  57.11%         0.0300
ARMA(1,1) w/ Mkt    0.1002  12.24%  3.45  72.15%         0.0287
ARNL w/ Mkt         0.1013  9.20%   3.95  63.82%         0.0292
HAR w/ Mkt          0.1030  4.78%   4.90  50.20%         0.0310

This table shows out-of-sample statistics for the 7 models described in the data section, where the statistics presented are based on firm averages. To calculate the out-of-sample statistics, each model is fit every month for each firm to generate a forecast for the following month, using up to 10 years of data (when available) prior to the time of the forecast. Before using these forecasts to calculate the out-of-sample statistic for each firm, all monthly forecasts are winsorized at 0 and 0.5. In addition to the mean statistic, the table also shows the proportion of firms for which a model generates the best forecasts, as well as the average rank of the model. We also report the “improvement to worst”, defined as the average ratio of the difference between the worst RMSE and the model being examined relative to the RMSE difference between the worst and best. The final column reports the median RMSE. The sample period is from 1926 to 2022.

12 This long-term perspective makes it impossible to consider the use of option implied volatilities, which are only available since 1996, and only for a small subset of our stock universe.
13 Details about the estimation of the t-test, sign and Wilcoxon signed rank tests can be found here: Tests for Location: Base SAS(R) 9.4 Procedures Guide: Statistical Procedures, Third Edition.


variance (3.45). Taken together, these results clearly confirm the superiority of the ARMA(1,1) model with regard to forecasting idiosyncratic variances.
Because the average RMSEs appear close to one another, it is important to verify that the improvement in fit is economically meaningful and different across models. We therefore report the “improvement to worst” statistic in column 4; that is, the average ratio of the difference between the worst RMSE and the RMSE of the model being examined, relative to the RMSE difference between the worst and the best model. If the model always has the lowest RMSE, the statistic is obviously one. The martingale model has a very low improvement to worst statistic of only 29.41%. The ARMA(1,1) model is by far the best, on average capturing 81.80% of the worst to best difference. The second and third best models are the ARMA model with the market variance, and the ARNL model, with ratios of around 70%.
The superiority of the ARMA(1,1) model is not driven by the specific time period used in this sample, as it is robust across different sub-periods. Internet Appendix 2 shows that it produces the best forecasts (when judging based on a combination of all criteria) in each decade of our study (from the 1930s to the 2020s).
These results also hold up in the formal paired statistical tests, reported in Table 2, with Panel A reporting the t-test, Panel B the Sign test and Panel C the Signed Rank test. Note that because we have so many firms, the tests are extremely powerful and almost invariably reject the null of equality with very small p-values.¹⁴
The ARMA(1,1) model produces significantly lower average RMSE

Table 2
Formal tests of Out-of-Sample (OOS) differences in RMSE.

Panel A: T-Test
                    Martingale  ARMA        ARNL        HAR         ARMA w/ Mkt ARNL w/ Mkt HAR w/ Mkt
ARMA(1,1)           −32.88                  −16.40      −28.99      −17.47      −22.33      −34.47
                    (0.00)                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
ARNL                −16.18      16.40                   −13.16      2.58        −13.63      −19.52
                    (0.00)      (0.00)                  (0.00)      (0.01)      (0.00)      (0.00)
HAR                 −7.11       28.99       13.16                   15.24       6.17        −14.85
                    (0.00)      (0.00)      (0.00)                  (0.00)      (0.00)      (0.00)
ARMA(1,1) w/ Mkt    −19.58      17.47       −2.58       −15.24                  −10.58      −25.95
                    (0.00)      (0.00)      (0.01)      (0.00)                  (0.00)      (0.00)
ARNL w/ Mkt         −11.18      22.33       13.63       −6.17       10.58                   −14.72
                    (0.00)      (0.00)      (0.00)      (0.00)      (0.00)                  (0.00)
HAR w/ Mkt          −0.72       34.47       19.52       14.85       25.95       14.72
                    (0.47)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)

Panel B: Sign Test
                    Martingale  ARMA        ARNL        HAR         ARMA w/ Mkt ARNL w/ Mkt HAR w/ Mkt
ARMA(1,1)           −5932                   −2719       −5136       −3927       −3525       −5401
                    (0.00)                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
ARNL                −4729       2719                    −3455       339         −2843       −4017
                    (0.00)      (0.00)                  (0.00)      (0.00)      (0.00)      (0.00)
HAR                 −3470       5136        3455                    2967        1695        −2636
                    (0.00)      (0.00)      (0.00)                  (0.00)      (0.00)      (0.00)
ARMA(1,1) w/ Mkt    −4836       3927        −339        −2967                   −1858       −4528
                    (0.00)      (0.00)      (0.00)      (0.00)                  (0.00)      (0.00)
ARNL w/ Mkt         −3936       3525        2843        −1695       1858                    −3573
                    (0.00)      (0.00)      (0.00)      (0.00)      (0.00)                  (0.00)
HAR w/ Mkt          −2622       5401        4017        2636        4528        3573
                    (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)

Panel C: Signed Rank Test
                    Martingale  ARMA        ARNL        HAR         ARMA w/ Mkt ARNL w/ Mkt HAR w/ Mkt
ARMA(1,1)           −57.47                  −30.00      −54.47      −41.40      −38.33      −57.21
                    (0.00)                  (0.00)      (0.00)      (0.00)      (0.00)      (0.00)
ARNL                −42.72      30.00                   −34.95      3.55        −29.82      −41.19
                    (0.00)      (0.00)                  (0.00)      (0.00)      (0.00)      (0.00)
HAR                 −28.43      54.47       34.95                   31.28       16.98       −28.29
                    (0.00)      (0.00)      (0.00)                  (0.00)      (0.00)      (0.00)
ARMA(1,1) w/ Mkt    −45.88      41.40       −3.55       −31.28                  −20.47      −48.01
                    (0.00)      (0.00)      (0.00)      (0.00)                  (0.00)      (0.00)
ARNL w/ Mkt         −34.69      38.33       29.82       −16.98      20.47                   −35.73
                    (0.00)      (0.00)      (0.00)      (0.00)      (0.00)                  (0.00)
HAR w/ Mkt          −19.53      57.21       41.19       28.29       48.01       35.73
                    (0.00)      (0.00)      (0.00)      (0.00)      (0.00)      (0.00)

Panel A shows Student’s t-test statistic and the associated p-value for the difference in OOS RMSE between the model identified in column 1 and the model in the column heading (row model minus column model). For example, the first cell is the t-stat for the difference in OOS RMSE between the ARMA(1,1) and Martingale models. Underneath the t-stat is the associated p-value. Panel B shows the Sign Test statistic and the associated p-value, and Panel C shows the Signed Rank Test statistic and the associated p-value. All Signed Rank Test statistics have been divided by 1 million to conserve space in the table. Cells comparing a model with itself are left blank.

14
These tests assume that the firm values are uncorrelated across firms.
Because the underlying observations represent a difference between test sta-
tistics (such as the RMSE) of two different models for a firm, it is difficult to
imagine they would exhibit strong cross-firm correlation, and it is not clear such
correlation would even be positive.
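As a rough illustration of the three paired tests reported in Table 2, the sketch below computes the test statistics from two lists of per-firm RMSEs (one list per model, firms in the same order). It is not the code used for the paper. It assumes the sign statistic is defined as (n⁺ − n⁻)/2 and the signed rank statistic as the sum of the ranks of the positive differences minus n(n+1)/4, with zero differences dropped and tied absolute differences given average ranks; p-values are omitted, and all function names are hypothetical.

```python
import math


def paired_t_stat(x, y):
    """Student's t statistic for the mean of the paired differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)


def sign_stat(x, y):
    """Assumed definition: (number of positive - number of negative diffs) / 2."""
    d = [a - b for a, b in zip(x, y)]
    pos = sum(1 for v in d if v > 0)
    neg = sum(1 for v in d if v < 0)
    return (pos - neg) / 2.0


def signed_rank_stat(x, y):
    """Assumed definition: sum of ranks of positive diffs minus n(n+1)/4."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    return t_plus - n * (n + 1) / 4.0
```

A negative value of any of the three statistics indicates that the first model tends to have the lower RMSE, which matches the sign convention of the first row of Table 2.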


Table 3
Comparison of performance and forecasts for base models.

Panel A: Comparison to “best”
                    P10     P25     P50     P75     P90
Martingale          0.00%   6.08%   19.76%  30.86%  41.84%
ARMA(1,1)           0.00%   0.00%   1.06%   5.49%   17.63%
ARNL                0.00%   0.54%   3.43%   9.66%   26.07%
HAR                 0.17%   2.32%   6.89%   16.09%  38.48%
ARMA(1,1) w/ Mkt    0.00%   0.65%   3.04%   9.70%   28.88%
ARNL w/ Mkt         0.04%   1.19%   4.80%   13.07%  36.01%
HAR w/ Mkt          0.59%   3.01%   8.60%   20.03%  49.91%

Panel B: Forecast Correlation
                    Martingale  ARMA(1,1)  ARNL  HAR   ARMA(1,1) w/ Mkt  ARNL w/ Mkt  HAR w/ Mkt
Martingale          1
ARMA(1,1)           0.82        1
ARNL                0.77        0.87       1
HAR                 0.79        0.86       0.82  1
ARMA(1,1) w/ Mkt    0.79        0.95       0.84  0.84  1
ARNL w/ Mkt         0.76        0.85       0.95  0.80  0.87              1
HAR w/ Mkt          0.77        0.84       0.80  0.95  0.87              0.84         1

Panel A shows the percentiles (10th, 25th, 50th, 75th, and 90th) of the distribution of a statistic measuring how much worse a given model is compared to the “best”
model, namely the percentage increase in RMSE compared to the “best” model. Panel B shows the Pearson correlation coefficients between the forecasts produced by
the different models.

compared to all other models; and the second-best model is the ARMA(1,1) model featuring the market variance, according to all three tests. The ARNL model is third best. It is striking that the results are very consistent across all three tests.
In addition, it turns out that the performance of the ARMA(1,1) model is stellar for the large majority of the firms. That is, even for the firms for which it is not the best model, its relative performance is still very good. Table 3 (Panel A) shows how each model fares compared to the “best” model. For each firm, we calculate how much larger (in percent) the RMSE is for Model i compared to the “best” model for that firm. We then present percentiles of the distribution across firms of this statistic. For example, for the martingale model, roughly 10% of the firms (90th percentile) perform at least 42% worse than the best model. As is apparent, the ARMA(1,1) model works well in forecasting IVAR even when not the best. In fact, for 50% (75%) of firms, it generates forecasts with RMSEs that are less than 1.06% (5.49%) worse than the “best” model. The ARNL, ARNL plus market, and ARMA(1,1) plus market models also have median statistics less than 5%, with the ARMA(1,1) plus market model second-best at 3.04%. However, at the 90th percentile, the ARNL model is better than the ARMA(1,1) plus market model. The ARMA(1,1) model generates the lowest statistics across the whole distribution.
To gain some intuition on the differences between the forecasts generated by the various models, Panel B of Table 3 reports panel correlations between the forecasts of the 7 models. The forecasts of all models do show substantial correlation. Most of these correlations are between 0.75 and 0.95 with the martingale model generally producing forecasts that are the least correlated with the other models (correlations varying between 0.76 and 0.82).
Not surprisingly, the forecasts of the ARMA(1,1) model are most correlated with the forecasts from the ARMA model featuring the market variance (the correlation is 0.95). In fact, each model that includes the market variance has a correlation with its counterpart without the market variance that is 0.95.

4.3. Out-of-Sample performance for additional models
Here we cast a wider net and also discuss the performance of the additional models described in Section 3.3. The results are reported in several panels in Table 4. In Panel A, we compare the performance of more complicated ARMA models (where we select the best ARMA model), termed ARMA(AIC), and a simple AR(1) model with the four base models (martingale, ARMA(1,1), ARNL and HAR). We drop the models with the market variance as they are dominated by models without the market variance in out-of-sample exercises. Both new models are in fact quite competitive, with both beating the martingale and HAR models for most statistics (mean, median, rank, and improvement to worst). The simple AR(1) model is overall actually slightly better than the ARMA(AIC) model. It is also competitive with the ARNL model for most statistics, but the ARNL model is far superior in terms of the fraction of times it is the best model. The ARMA(1,1) model is by far the best model for all statistics.
Panel B examines whether embedding quarticity improves forecasting performance. We consider the AR(1), ARMA(1,1) and HAR models with their quarticity counterparts. It is very clear that incorporating quarticity in the models is not helpful and invariably increases the average and median RMSE and worsens the rank and “improvement to worst” statistics. The only exception is that the AR(1) model with quarticity is best in 19.2% of the cases, which is better than the simple AR(1) model. This is similar to the ARNL model being much better along this statistic than the AR(1) model in Panel A. Across all the statistics, the ARMA(1,1) model remains the best model.
In Panel C, we report the results for the MIDAS model together with the 4 non-market variance base models. The performance of the MIDAS model is very close to that of the HAR model, and only improves on it in terms of the fraction of times it is the best model (at 10.89% versus 9.36% for the HAR model). Again, the ARMA(1,1) model is by far the best model.
Finally, in Panel D, we report results for all 17 models we estimate, including the EGARCH model. Because of the terrible performance of the EGARCH model, the improvement to worst statistics go up considerably for most models, including the martingale model. The dominance of the ARMA(1,1) model is, if anything, even more glaring. It is the best model in 21.74% of the cases and its improvement to worst statistic is over 90%. The second-best model depends on the statistic considered: it is the AR(1) and ARMA(1,1) plus market variance models for the average RMSE, the martingale model for the best model fraction (at 9.24%), and the ARNL model for the average rank, the improvement to worst statistic and median RMSE.
We conclude that the ARMA(1,1) model is by far the best forecasting model for idiosyncratic variances.
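The cross-firm statistics used throughout this section (best-model share, average rank, “improvement to worst”, and the percentage shortfall relative to the best model) can be sketched in a few lines of Python. This is an illustrative reading of the definitions given in the table notes, not the authors' code; the function names are hypothetical, and ties in RMSE are broken arbitrarily by sort order, which is a simplification.

```python
def firm_metrics(rmse_by_model):
    """Per-firm statistics from a dict mapping model name -> OOS RMSE."""
    best = min(rmse_by_model.values())
    worst = max(rmse_by_model.values())
    ordered = sorted(rmse_by_model, key=rmse_by_model.get)
    out = {}
    for rank, model in enumerate(ordered, start=1):
        rmse = rmse_by_model[model]
        out[model] = {
            "rank": rank,                      # 1 = lowest RMSE
            "is_best": rank == 1,
            # (worst - model) / (worst - best); equals 1 for the best model
            "imp_to_worst": (worst - rmse) / (worst - best) if worst > best else 1.0,
            # percentage increase in RMSE relative to the best model
            "pct_worse_than_best": rmse / best - 1.0,
        }
    return out


def summarize(firms):
    """Simple (unweighted) averages of the per-firm statistics across firms."""
    per_firm = [firm_metrics(f) for f in firms]
    n = len(per_firm)
    summary = {}
    for model in firms[0]:
        summary[model] = {
            "best_share": sum(pf[model]["is_best"] for pf in per_firm) / n,
            "avg_rank": sum(pf[model]["rank"] for pf in per_firm) / n,
            "imp_to_worst": sum(pf[model]["imp_to_worst"] for pf in per_firm) / n,
        }
    return summary
```

For example, `summarize([{"Martingale": 0.12, "ARMA(1,1)": 0.10, "HAR": 0.11}, {"Martingale": 0.08, "ARMA(1,1)": 0.09, "HAR": 0.10}])` gives ARMA(1,1) a best-model share of one half and an average rank of 1.5 in this made-up two-firm sample.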


Table 4
Out-of-sample forecast performance for additional models.
Panel A: ​ ​ ​ ​ ​
Model Mean Best Rank Imp. to Worst Median

Martingale 0.1031 14.28% 4.72 27.11% 0.0324


ARMA(1,1) 0.0989 36.27% 2.36 82.22% 0.0275
ARNL 0.1005 20.78% 3.16 69.60% 0.0281
HAR 0.1021 9.86% 4.09 55.71% 0.0300
ARMA(AIC) 0.1009 8.67% 3.51 66.25% 0.0285
AR(1) 0.1002 9.87% 3.16 69.92% 0.0284

Panel B: ​ ​ ​ ​ ​
Model Mean Best Rank Imp. to Worst Median

ARMA(1,1) 0.0989 41.87% 2.24 85.52% 0.0275


ARMA(1,1) Quart 0.1113 8.83% 4.69 32.67% 0.0403
HAR 0.1021 9.45% 3.68 66.90% 0.0300
HAR Quart 0.1088 6.83% 4.57 41.89% 0.0372
AR(1) 0.1002 13.61% 2.88 76.10% 0.0284
AR(1) Quart 0.1017 19.24% 2.94 75.13% 0.0294

Panel C: ​ ​ ​ ​ ​
Model Mean Best Rank Imp. to Worst Median

Martingale 0.1031 13.83% 3.89 29.08% 0.0324


ARMA(1,1) 0.0989 42.99% 2.04 82.35% 0.0275
ARNL 0.1005 22.81% 2.54 70.44% 0.0281
HAR 0.1021 9.36% 3.24 57.31% 0.0300
MIDAS 0.1024 10.89% 3.30 54.75% 0.0302

Panel D: ​ ​ ​ ​ ​
Model Mean Best Rank Imp. to Worst Median

Martingale 0.1031 9.24% 11.19 76.96% 0.0324


ARMA(1,1) 0.0989 21.74% 4.91 91.11% 0.0275
ARMA(AIC) 0.1009 4.46% 7.55 86.66% 0.0285
ARMA(1,1) Quart 0.1113 5.27% 12.05 63.64% 0.0403
ARNL 0.1005 7.39% 6.54 88.18% 0.0281
HAR 0.1021 3.34% 8.97 83.94% 0.0300
HAR Quart 0.1088 3.55% 11.67 69.96% 0.0372
AR(1) 0.1002 4.29% 6.76 87.78% 0.0284
AR(1) Quart 0.1017 7.44% 7.19 85.72% 0.0294
AR(1) w/ Mkt 0.1012 3.00% 8.02 85.48% 0.0295
ARNL w/ Mkt 0.1013 4.87% 7.75 86.12% 0.0292
ARMA(1,1) w/ Mkt 0.1002 7.22% 6.60 88.08% 0.0287
ARMA(AIC) w/ Mkt 0.1023 3.25% 9.15 83.50% 0.0299
HAR w/ Mkt 0.1030 2.44% 10.01 81.73% 0.0310
MIDAS 0.1024 5.36% 9.20 83.04% 0.0302
MIDAS Quart 0.1046 4.93% 9.86 78.83% 0.0326
EGARCH 0.1397 1.98% 15.58 16.09% 0.0718

This table shows out-of-sample statistics for subsets of the 17 models described in the data section, where the statistics presented are based on firm averages. To
calculate the out-of-sample statistics, each model is fit every month for each firm to generate a forecast for the following month, using up to 10 years of data (when
available) prior to the time of the forecast. Before using these forecasts to calculate the out-of-sample statistic for each firm, all monthly forecasts are winsorized at
0 and 0.5. In addition to the mean statistic, the table also shows the proportion of firms for which a model generates the best forecasts, as well as the average rank of the
model. We also report the “improvement to worst”, defined as the average ratio of the difference between the worst RMSE and the model being examined relative to the
RMSE difference between the worst and best. The final column reports the median RMSE. The sample period is from 1926 to 2022.
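The rolling out-of-sample scheme described in the table notes can be illustrated with a small Python sketch for a single firm. It uses an AR(1) model fitted by ordinary least squares purely for simplicity (the paper's preferred model is the ARMA(1,1)), and it is not the authors' implementation. The function names are hypothetical; the defaults of 36 and 120 months and the winsorization bounds of 0 and 0.5 are taken from the text.

```python
import math


def ols_ar1(series):
    """Fit IVAR_t = a + b * IVAR_{t-1} by OLS on one window of data."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - my) for v, w in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def winsorize(value, low=0.0, high=0.5):
    """Clamp a variance forecast to the interval [low, high]."""
    return min(max(value, low), high)


def rolling_ar1_forecasts(ivar, min_obs=36, max_obs=120):
    """Forecast month t using only data through month t - 1.

    The window expands until max_obs months are available and rolls
    afterwards. Returns a list of (t, winsorized forecast) pairs.
    """
    out = []
    for t in range(min_obs, len(ivar)):
        window = ivar[max(0, t - max_obs):t]
        a, b = ols_ar1(window)
        out.append((t, winsorize(a + b * window[-1])))
    return out


def rmse(forecasts, ivar):
    """Square root of the mean squared out-of-sample forecast error."""
    errors = [(f - ivar[t]) ** 2 for t, f in forecasts]
    return math.sqrt(sum(errors) / len(errors))
```

Calling `rmse(rolling_ar1_forecasts(series), series)` on a firm's monthly variance series returns the firm-level out-of-sample RMSE that would then be averaged across firms.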

4.4. Robustness
In this section, we verify the robustness of our out-of-sample results to changing our definition of how idiosyncratic variance is calculated, to limiting our sample to liquid firms (those included in the S&P 500), and to the choice of winsorization of the forecasts.
We estimate idiosyncratic variance in two alternative ways, designed to reduce the noise inherent in an estimation that uses relatively few observations (around 20 daily observations within the month) to obtain the parameters in Eq. (1). First, we estimate Eq. (1) without the HML and SMB factors, assuming a CAPM model. Second, we follow Welch (2022) and start by winsorizing daily firm returns to an interval of minus 2 to plus 4 times the contemporaneous market return and then estimate Eq. (1) using three months of data (but calculate IVAR using only the residuals in month t).
The results, presented in Panels A-B of Appendix A, show that the specific factor model used to calculate idiosyncratic variances does not change the conclusion that the ARMA model performs best in forecasting future IVAR. This is comforting, suggesting the results in this paper are not sensitive to the particular asset pricing model used to compute idiosyncratic variances.
Our next set of robustness checks attempts to ensure that microstructure noise, such as infrequent trading, does not drive our results. Given that prior literature has suggested microstructure noise may drive the relation between certain measures of idiosyncratic risk and returns (Bali and Cakici, 2008; Han and Lesmond, 2011; Bergbrant and Kassa, 2021), this is an important consideration for any paper measuring idiosyncratic volatility. Our first approach to account for this noise is to correct for autocorrelation in the daily residuals using the French et al. (1987) correction. For these models, the IVAR is calculated by adding 2 times the sum of the products of each daily return with the prior day’s return. Our second approach is to consider a sample which only includes firms in months when they were part of the S&P 500. The results for these tests are presented in Panels C and D of Appendix A. The martingale model now performs substantially worse with the percentage of firms for which it is the best model decreasing to less than 10%; and even decreasing below 6% for the S&P 500 firms (5.77%), which are arguably the firms with the least microstructure noise. Importantly, the


ARMA(1,1) model continues to produce the best forecasts for the largest forecast that most closely approximates the true expected idiosyncratic
number of firms. In fact, for the most liquid (S&P 500) firms, the ARMA variance. Our analysis shows that neither the martingale model nor the
(1,1) produces the best forecast for over 40% of firms (42.37%). EGARCH model is likely to accurately approximate true expected idio-
For the main out-of-sample results in the paper, we winsorize our syncratic risk, whereas the ARMA(1,1) model provides the best estimate.
monthly forecasts at 0 and 0.5. For robustness, we also estimate the main In Table 5, Panel A, we use the seven proxies we developed in this
results without winsorization, as well as with monthly winsorization at paper in a univariate Fama-MacBeth (1973) regression framework to
2.5% and 97.5% of the cross-sectional distribution of the IVAR forecasts. The results are presented in Panels E and F of Appendix A. While the mean RMSEs are much higher (extremely so for some models) without winsorization,15 the ARMA(1,1) model remains the best model for all statistics, and has the lowest RMSE for over 35% of firms. This remains true for the alternative (2.5%, 97.5%) winsorization scheme.

5. The idiosyncratic volatility pricing puzzle

In this section, we reconsider the IVOL puzzle with different models measuring expected idiosyncratic risk (Section 5.1). We then further analyze why the martingale model delivers different results than all other models, by focusing on forecasts for which the martingale performs the worst, as well as differences between the martingale and ARMA(1,1) forecasts (Section 5.2). Finally, we link extreme martingale forecasts, which render the relationship between idiosyncratic volatility and future returns negative, to explanatory variables from the IVOL literature (Section 5.3).

5.1. Revisiting the IVOL puzzle

Traditional asset pricing models such as the CAPM and APT predict that idiosyncratic volatility can be diversified away, and therefore should not be priced in equilibrium. However, subsequent studies argue that many investors might be poorly diversified (Blume and Friend, 1975; Goetzmann and Kumar, 2008) and thus might require a premium for bearing idiosyncratic risk (Levy, 1978; Merton, 1987; Malkiel and Xu, 2002).

However, a large empirical literature documents a negative relation between high realized idiosyncratic volatility and subsequent returns (Ang et al., 2006, 2009). This negative relation is often referred to in the literature as the idiosyncratic volatility (IVOL) puzzle, under the implicit conjecture that IVOL follows a martingale process and past IVOL can be used as a proxy for expected volatility. Consequently, a voluminous literature on the determinants of the “IVOL puzzle” has developed. One strand argues that investors have a preference for lottery-like payoffs; IVOL then simply proxies for desired features of a stock, such as skewness (see, for example, Barberis and Huang, 2008; Chabi-Yo and Yang, 2009; Boyer et al., 2010; Bali et al., 2011). Alternative explanations for the IVOL puzzle include bid-ask bounce (Fu, 2009; Huang et al., 2010), illiquidity (Han and Lesmond, 2011), fundamental uncertainty (Johnson, 2004; Jiang et al., 2009), a missing risk factor (Chen and Petkova, 2012), and arbitrage asymmetry (Stambaugh et al., 2015). Hou and Loh (2016) provide an excellent survey of the extant literature.

A small subsequent literature suggests that when an alternative volatility model is used, namely an EGARCH model, a positive (Fu, 2009; Brockman et al., 2022), albeit fragile (Guo et al., 2014; Bergbrant and Kassa, 2021) relation between expected idiosyncratic volatility and returns emerges at the firm level.

Given that IVOL forecasts from different models (EGARCH vs. martingale) generate opposite results, it is plausible that the relation is model specific and does not reflect the true relation between expected idiosyncratic risk and returns. The actual relation between expected idiosyncratic risk and returns should be present for an idiosyncratic risk

investigate the empirical relation between expected idiosyncratic risk and returns.16 The table shows that the only proxy for expected idiosyncratic volatility that is statistically significantly related to returns is the one derived from the martingale model. While the coefficients are always negative, all the other proxies for expected idiosyncratic risk show no statistically significant relation with returns, and the t-statistics are below 1 in absolute value in all cases. The failure to reject the null of no effect also holds for the best proxy for expected idiosyncratic variance identified in this paper, i.e., the ARMA(1,1) model. Note also that the martingale model generates the lowest adjusted R-square of all models.

In Panel B of Table 5, we report multivariate tests controlling for typical predictors of the cross-section of stock returns including market beta, size, book-to-market, momentum, lagged returns (prior month’s return), turnover, and the coefficient of variation of turnover in our Fama-MacBeth (1973) regressions (see Huang et al., 2010 for a similar specification). Including an extensive set of independent variables is particularly important in our context as IVAR itself is estimated using only the Fama and French 3-factors,17 and hence could be related to returns due to its correlation with any omitted risk factor. Again, a significant relation between expected IVOL and returns only arises using the forecasts from the martingale model.

We also acknowledge the possibility that different IVAR models might work better for different firms, or at different points in time. Results in the previous sections indicate that the ARMA(1,1), although best overall, is not the best for all firms. In the second-to-last column of Panels A and B of Table 5 (Best_Overall), we allow the volatility model to differ across firms by using forecasts from the model with the overall lowest out-of-sample RMSE for each firm. Although all forecasts are out-of-sample, it is important to note that this approach has a look-ahead bias in the sense that the best IVAR model for each firm is determined ex-post. This exercise further corroborates our key result, as the IVOL coefficient remains insignificant.

We also allow different IVAR models to be used in different periods and for different firms, by back testing our models each month. We do this by comparing the out-of-sample RMSE of the forecasts generated by each model, for each firm, in the 10 years leading up to time t. We then use the best IVAR model to generate our realized variance expectation for the next month t+1. The results are presented in the last column of Table 5 (Min_RMSE), showing again an insignificant IVOL coefficient.

Overall, our results suggest that the relation between idiosyncratic volatility and returns is flat, consistent with traditional asset pricing theory, and the negative relation documented in prior literature is likely an artifact of the proxy used for expected idiosyncratic volatility rather than reflecting the true underlying relation between expected idiosyncratic volatility and returns. However, it is important to note that for our tests, we do not know the magnitude of type II errors, so we cannot unambiguously conclude that no relation between idiosyncratic volatility and returns exists. It is possible that all our proxies contain sufficient noise to mask a true relation between expected idiosyncratic risk and returns.

5.2. Why does the martingale model generate the IVOL puzzle?

Although there is a voluminous literature trying to explain the

15 Extreme forecasts for the non-linear AR models render the average statistics meaningless. These models sometimes generate very poor and unrealistic forecasts as a result of extreme and unrealistic parameter estimates. Note that for the ARMA(1,1) model, winsorizing makes little difference.
16 To be consistent with the IVOL literature, we use the square root of the IVAR forecast, but using the IVAR forecast itself produces analogous results.
17 Estimating IVAR using models with more risk factors is difficult because we use daily data within the month.
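
The month-by-month back test behind the Min_RMSE column of Table 5 can be sketched as follows. This is a minimal illustration rather than the authors' code: the array layout, the function name `select_min_rmse_forecast`, and the 120-month window (one reading of "10 years") are assumptions, and missing-data handling is omitted.

```python
import numpy as np

def select_min_rmse_forecast(forecasts, realized, window=120):
    """Pick, for one firm, the forecast of the model with the lowest trailing RMSE.

    forecasts: (T, M) array; forecasts[t, m] is model m's out-of-sample
               forecast of IVAR for month t, made with data through t-1.
    realized:  (T,) array of realized IVAR.
    window:    number of past months used to rank the models (assumed: 120).
    Returns a (T,) array; entries are NaN until a full window is available.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    realized = np.asarray(realized, dtype=float)
    T, _ = forecasts.shape
    sq_err = (forecasts - realized[:, None]) ** 2
    chosen = np.full(T, np.nan)
    for t in range(window, T):
        # Only errors from months strictly before t are used (no look-ahead).
        rmse = np.sqrt(sq_err[t - window:t].mean(axis=0))
        best = int(np.argmin(rmse))
        chosen[t] = forecasts[t, best]
    return chosen
```

Because the ranking uses only past forecast errors, this selection avoids the look-ahead bias that the text attributes to the Best_Overall column.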


Table 5
Is the expected idiosyncratic volatility priced?
Panel A: Univariate ​ ​ ​ ​ w/ Mkt ​ ​

​ Martingale ARMA(1,1) ARNL HAR ARMA(1,1) ARNL HAR Best_Overall Min_RMSE

Intercept 1.014*** 0.957*** 0.936*** 0.972*** 0.943*** 0.914*** 0.956*** 0.877*** 0.949***
​ (5.79) (5.74) (5.72) (5.76) (5.68) (5.64) (5.72) (5.33) (5.76)
IVOL − 2.404** − 1.113 − 0.941 − 1.112 − 0.926 − 0.731 − 0.944 − 0.541 − 0.970
​ (− 2.24) (− 0.75) (− 0.65) (− 0.81) (− 0.63) (− 0.51) (− 0.69) (− 0.37) (− 0.68)
Adj R-Sq 0.020 0.026 0.024 0.023 0.025 0.024 0.023 0.025 0.024

Panel B: Multivariate ​ ​ ​ ​ w/ Mkt ​ ​

​ Martingale ARMA(1,1) ARNL HAR ARMA(1,1) ARNL HAR Best_Overall Min_RMSE

Intercept 3.094*** 2.998*** 3.013*** 2.987*** 2.961*** 2.997*** 2.969*** 2.973*** 3.048***
​ (8.67) (8.70) (8.61) (8.53) (8.55) (8.56) (8.46) (8.36) (8.56)
IVOL − 1.740*** − 0.996 − 0.163 − 0.662 − 0.894 0.027 − 0.472 0.439 − 0.860
​ (− 2.64) (− 0.82) (− 0.15) (− 0.64) (− 0.76) (0.02) (− 0.46) (0.37) (− 0.84)
BETA 0.198 0.200 0.182 0.197 0.207 0.185 0.198 0.171 0.188
​ (1.27) (1.35) (1.21) (1.30) (1.39) (1.23) (1.30) (1.15) (1.24)
LN(ME) − 0.125*** − 0.122*** − 0.112*** − 0.118*** − 0.119*** − 0.110*** − 0.116*** − 0.100*** − 0.121***
​ (− 4.33) (− 4.78) (− 4.28) (− 4.47) (− 4.59) (− 4.17) (− 4.33) (− 3.91) (− 4.63)
LN(BEME) 0.189*** 0.185*** 0.192*** 0.189*** 0.189*** 0.194*** 0.192*** 0.196*** 0.189***
​ (3.55) (3.49) (3.60) (3.56) (3.55) (3.62) (3.60) (3.68) (3.53)
MOM 0.027** 0.028** 0.028** 0.027** 0.028** 0.028** 0.027** 0.028*** 0.026**
​ (2.42) (2.55) (2.50) (2.37) (2.55) (2.49) (2.39) (2.61) (2.34)
LAGRET − 0.059*** − 0.061*** − 0.060*** − 0.061*** − 0.061*** − 0.060*** − 0.060*** − 0.061*** − 0.061***
​ (− 13.92) (− 14.49) (− 14.45) (− 14.47) (− 14.46) (− 14.44) (− 14.45) (− 14.75) (− 14.40)
LN(TURN) − 0.165*** − 0.155*** − 0.171*** − 0.162*** − 0.158*** − 0.175*** − 0.166*** − 0.183*** − 0.154***
​ (− 2.92) (− 2.89) (− 3.13) (− 2.93) (− 2.91) (− 3.21) (− 3.00) (− 3.40) (− 2.85)
LN(CVTURN) − 0.347*** − 0.339*** − 0.363*** − 0.343*** − 0.336*** − 0.365*** − 0.345*** − 0.379*** − 0.354***
​ (− 6.01) (− 6.26) (− 6.66) (− 6.18) (− 6.11) (− 6.65) (− 6.13) (− 6.86) (− 6.15)
Adj R-Sq 0.064 0.065 0.065 0.064 0.065 0.065 0.064 0.065 0.065

This table reports cross-sectional Fama and MacBeth (1973) regression results of forecasting one month ahead (time t) stock returns. IVOL is the square root of the
out-of-sample forecast of IVAR for time t based on prior data (up until time t-1) using 7 different forecasting models and 2 selection models. Panel A shows univariate
results, while Panel B presents the multivariate version. Newey and West (1987) corrected t-statistics with three lags are reported in parentheses, and *, **, and ***
indicate statistical significance at the 10%, 5%, and 1% levels respectively. The data are from 1963:07 to 2022:12.
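
The estimation described in the table note (monthly cross-sectional regressions, time-series averages of the coefficients, and Newey-West t-statistics with three lags) can be sketched in a few lines. This is an illustrative sketch under assumptions: the function names and the flat array layout are hypothetical, and details such as small-sample adjustments in the authors' implementation are not known from the text.

```python
import numpy as np

def newey_west_se(x, lags=3):
    """Newey-West (Bartlett kernel) standard error of the mean of series x."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    u = x - x.mean()
    lrv = u @ u / T                          # lag-0 autocovariance
    for k in range(1, lags + 1):
        w = 1.0 - k / (lags + 1.0)           # Bartlett weight
        lrv += 2.0 * w * (u[k:] @ u[:-k]) / T
    return np.sqrt(lrv / T)

def fama_macbeth(y, X, month, lags=3):
    """y: (N,) returns; X: (N, K) regressors; month: (N,) period labels.

    Returns (mean coefficients including the intercept, Newey-West t-statistics).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    month = np.asarray(month)
    betas = []
    for m in np.unique(month):               # np.unique returns sorted labels
        idx = month == m
        Z = np.column_stack([np.ones(idx.sum()), X[idx]])
        b, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
        betas.append(b)
    betas = np.asarray(betas)                # (T, K+1) monthly coefficients
    mean = betas.mean(axis=0)
    se = np.array([newey_west_se(betas[:, j], lags)
                   for j in range(betas.shape[1])])
    return mean, mean / se
```

In this layout, the univariate specification of Panel A would pass a single column (the IVOL forecast) as `X`, and the multivariate specification of Panel B would append the control variables as further columns.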

negative relation between martingale IVOL and returns, a fully convincing explanation remains elusive (see, for example, the review in Hou and Loh, 2016 and the references therein). In this section, we link the puzzle to the fact that the martingale model generates poor forecasts for future idiosyncratic variances. Even for firms where it produces the “best” forecasts, the martingale forecasts are generally poor with large forecast errors. So, why do forecasts produced by the martingale model generate a negative relation to future returns, while all other models, including the better ARMA(1,1) forecasts, show an insignificant relation?

To understand this, let us first assume that there is no relation between expected IVOL and returns (as suggested by all our models except for the martingale model). While there will still be some firm months with high expected IVOL and subsequently low returns, the impact of those observations would be offset by firm months with high expected IVOL and high returns, leading to an insignificant relation. The same would be true for firm months with low expected IVAR and subsequent (low/high) returns. Put differently, it should then be the case that in the data the four possible quadrants of current volatility forecasts (high vs low) and future returns (high vs low) cancel each other out.

However, outliers (observations with extreme expected variances and subsequent extreme returns) can have a large impact on the estimated coefficients and could easily cause a spurious relation (negative or positive) between expected volatility and returns, especially if the outliers are more likely to occur in one of the four quadrants.

The well-known fact that the realized variance distribution is skewed plays a key role in the martingale model generating a negative relationship between current volatility forecasts and future returns. First, this skewness renders the martingale model particularly sensitive to outliers in the right tail. In fact, we find that the worst forecast errors (both over-predictions and under-predictions) are concentrated in the high realized variance quadrants.

Second, it is well known that the contemporaneous relation between IVAR and equity returns is positive (limited liability is one source of such a relationship). Thus, if the erroneous outliers over-predict future variances (and hence, returns), the relationship between current IVOL and future returns could easily turn negative, as it associates, erroneously, low future return firm months with previously high realized variance months. In contrast, current relatively high IVAR observations that severely under-predict future realized variances are associated with high returns in the future and cannot be the source of a negative relationship between current IVOL and future returns.

Outlying under-prediction observations associated with low realized variances could also turn the relationship negative, associating, erroneously, high future returns with previously low realized variances. However, because there are few large forecast errors in this quadrant, the low current realized variance, high future returns quadrant is not likely responsible for the negative IVOL relation. Hence, it is likely the high volatility quadrant that generates outliers potentially inducing a spurious positive or negative relation between volatility forecasts and returns.

To test the conjecture that the negative relation between the martingale forecasts and returns is driven by a few outliers, we remove observations with the worst possible forecasts and verify how the relation between idiosyncratic volatility and future returns changes. Note that if the negative relation between martingale IVAR and returns found in the data reflects investors actually “pricing” IVOL, we would expect it to be stronger when poor forecasts (ex-post) are excluded and be robust to exclusions of subsets of firms.

In Panel A of Table 6, we first eliminate 0% to 1% of firms (in 0.2% increments) each month where the martingale forecasts overestimate (underestimate) realized IVAR the most. We then report the regression coefficients on IVOL for both the univariate and multivariate cross-sectional models (the multivariate model is the same as in Table 5, but we suppress the coefficients on the controls for brevity). As expected, removing firms where the martingale model overestimates IVAR the


Table 6
Forecast errors and the IVOL puzzle.

Panel A: Drop highest Martingale IVOL forecast errors

                   Overestimation of IVAR        Underestimation of IVAR
                   Univariate    Multivariate    Univariate    Multivariate
Delete top 0%      −2.404**      −1.740***       −2.404**      −1.740***
                   (−2.24)       (−2.64)         (−2.24)       (−2.64)
Delete top 0.2%    −1.924*       −1.397**        −3.776***     −2.604***
                   (−1.73)       (−2.03)         (−3.57)       (−4.03)
Delete top 0.4%    −1.281        −0.859          −4.496***     −2.975***
                   (−1.11)       (−1.21)         (−4.33)       (−4.70)
Delete top 0.6%    −0.697        −0.356          −4.964***     −3.183***
                   (−0.58)       (−0.48)         (−4.83)       (−5.06)
Delete top 0.8%    −0.207        0.152           −5.343***     −3.376***
                   (−0.17)       (0.20)          (−5.26)       (−5.38)
Delete top 1.0%    0.255         0.628           −5.649***     −3.562***
                   (0.20)        (0.79)          (−5.61)       (−5.71)

Panel B: Drop highest Martingale-ARMA Forecasts

                   Martingale Forecast           ARMA(1,1) Forecast
                   Univariate    Multivariate    Univariate    Multivariate
Delete top 0%      −2.404**      −1.740***       −1.113        −0.996
                   (−2.24)       (−2.64)         (−0.75)       (−0.82)
Delete top 1%      −2.337*       −1.855**        −0.741        −0.913
                   (−1.94)       (−2.47)         (−0.49)       (−0.73)
Delete top 2%      −2.305*       −1.783**        −0.551        −0.717
                   (−1.79)       (−2.21)         (−0.36)       (−0.57)
Delete top 3%      −2.322*       −1.898**        −0.452        −0.754
                   (−1.74)       (−2.25)         (−0.29)       (−0.59)
Delete top 4%      −2.303*       −1.856**        −0.365        −0.647
                   (−1.66)       (−2.12)         (−0.23)       (−0.50)
Delete top 5%      −2.219        −1.861**        −0.258        −0.676
                   (−1.55)       (−2.13)         (−0.17)       (−0.54)
Delete top 6%      −2.087        −1.628*         −0.132        −0.526
                   (−1.43)       (−1.79)         (−0.08)       (−0.41)
Delete top 7%      −2.020        −1.465          −0.066        −0.400
                   (−1.35)       (−1.56)         (−0.04)       (−0.31)
Delete top 8%      −1.810        −1.269          0.058         −0.321
                   (−1.19)       (−1.35)         (0.04)        (−0.25)
Delete top 9%      −1.757        −1.164          0.091         −0.261
                   (−1.14)       (−1.22)         (0.06)        (−0.20)
Delete top 10%     −1.680        −1.207          0.146         −0.379
                   (−1.07)       (−1.25)         (0.09)        (−0.30)

This table reports cross-sectional Fama and MacBeth (1973) regression results of forecasting one month ahead (time t) stock returns using the martingale IVOL forecast (Panel A) and martingale and ARMA forecasts (Panel B). In Panel A, we drop observations each month for which the martingale performs the worst in terms of overpredicting (Model 1 and 2) and underpredicting (Model 3 and 4) realized variances. While additional controls are included in Models 2 and 4, they are suppressed for brevity. In Panel B, we drop observations based on the difference between the martingale forecasts compared to the ARMA(1,1) forecasts. Again, the additional controls in Models 2 and 4 are suppressed for brevity. Newey and West (1987) corrected t-statistics with three lags are reported in parentheses, and *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels respectively. The data are from 1963:07 to 2022:12.

most leads to a less negative relation (i.e., the magnitude is smaller, in absolute value), whereas removing observations where the martingale model underestimates realized IVAR the most leads to a more negative relation (because we are removing observations from the high forecast, high future return quadrant which induces a positive relation). Interestingly, eliminating just the top 0.4% of firms with the largest overestimation already renders the relationship statistically insignificant for the remaining firm-month observations. At a 1% (0.8%) cut, the relationship even turns positive for the univariate (multivariate) models. It is important to note that these cuts are made ex-post, and we would expect that the relation between expected IVOL and returns would change in the direction observed if we excluded a large number of firm months with over- or underpredictions. However, these results indicate that the relation between the martingale IVOL forecast and returns is strongly influenced by a few firm months for which the martingale model does the worst job at forecasting and the negative relation can be overturned by eliminating the observations showing the worst overprediction of realized variances.18

We have previously shown that the ARMA(1,1) does a stellar job in forecasting IVAR, possibly because it allows the reversal of large shocks to variances. If the overpredictions from the martingale model occur because of firms with large past shocks to their IVAR which the martingale model fails to reverse, we would expect that eliminating firms where the martingale model has the highest IVAR predictions compared to the ARMA(1,1) model would also diminish the negative relation between martingale IVAR and returns. The benefit of this, compared to the prior results, is that the exclusions are made ex-ante (based solely on differences in expectations). Naturally, we would need to exclude more firm months as we do not know which observations produce the greatest outliers. The results are presented in Panel B of Table 6 (left-hand side columns). A similar pattern as before (in Panel A) is present when dropping these firm months. Dropping 5% (7%) of the firms each month for which the martingale fails to reverse large past IVAR (i.e. the martingale generates the highest forecast compared to the ARMA model) in the univariate (multivariate) regressions renders the relation insignificant. Note that the univariate regressions feature more negative coefficients and require fewer observations to be dropped to render the relationship insignificant. Importantly, dropping the same firm months, the inference regarding the relationship between idiosyncratic volatility and returns does not change when ARMA(1,1) forecasts are used to measure idiosyncratic risk. (See right-hand side columns.)

5.3. Martingale forecasts and firm characteristics

Thus far, we have shown that the IVOL puzzle only obtains for the martingale forecasts, and that it is heavily influenced by observations for which the martingale model generates extremely poor, excessively high forecasts. For these outliers to drive the IVOL puzzle, purported explanations of the puzzle should be related to these extreme martingale forecast errors (outliers). While no article has yet claimed to fully account for the IVOL puzzle, Hou and Loh (2016) provide a comprehensive examination of various explanations and attempt to quantify how much of the puzzle they explain. They organize these explanations into three groups: lottery preferences of investors (measured by skewness, coskewness, expected idiosyncratic skewness, maximum daily return, and retail trading proportion); market frictions (measured by one-month return reversal, a variable that is already included in our standard risk controls, the Amihud illiquidity measure, zero return proportion, and bid-ask spreads); and a third group that includes all other variables.19 In addition, we investigate variables that mimic the size screens that were shown to be important in Bali and Cakici (2008) (namely, dummies indicating stocks with prices lower than $5 and stocks with a market capitalization lower than the 5th percentile of the NYSE stocks ranked on market capitalization). All the variables are defined in the data section.

Because the IVOL puzzle is driven by the observations for which the martingale model overestimates IVOL the most, we conjecture that candidate explanations for the IVOL puzzle more correlated with extreme martingale overestimations (compared to underestimations) should show stronger explanatory power of the puzzle. Such a relationship is not likely captured well by linear regressions (recall that just dropping 0.4% of the extreme forecast error observations suffices to render the martingale IVOL-return relationship insignificant). We

18 Note that our winsorizing of forecasts mitigates the outlier problem, but we observe the exact same phenomenon using the raw, non-winsorized forecasts.
19 We exclude the variables in the last category as they substantially reduce our sample size, but confirm in unreported tests that including them has a negligible impact on the explanatory power of the models.
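
The trimming step used for both panels of Table 6 can be illustrated with a short sketch. The function name `keep_after_trimming`, the flat array layout, and the rounding rule for the number of firms dropped are assumptions made for illustration; the text does not specify how fractional counts are handled.

```python
import numpy as np

def keep_after_trimming(score, month, frac):
    """Boolean mask that drops, within each month, the top `frac` of firms by `score`.

    score: (N,) e.g. martingale forecast minus realized IVAR (ex-post, Panel A)
           or martingale forecast minus ARMA(1,1) forecast (ex-ante, Panel B).
    month: (N,) period labels.
    frac:  fraction in [0, 1) of firms to drop each month.
    """
    score = np.asarray(score, dtype=float)
    month = np.asarray(month)
    keep = np.ones(score.shape[0], dtype=bool)
    for m in np.unique(month):
        idx = np.flatnonzero(month == m)
        n_drop = int(np.floor(frac * idx.size))   # rounding rule is an assumption
        if n_drop > 0:
            worst = idx[np.argsort(score[idx])[-n_drop:]]
            keep[worst] = False
    return keep
```

The surviving firm months would then be passed to the same cross-sectional regressions as before, once for each cut-off.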


therefore resort to several regression methodologies that measure relationships in the tail: a linear probability (LinProb) model where we create a dummy for the top decile of martingale over-predictions (i.e. the negative of the martingale forecast errors); a quantile regression (QREG) focusing on the top 10% over-predictions; and a weighted least squares (WLS) regression where the weights are proportional to the square root of the percentile rank of the (negative of the) forecast error. As independent variables, we use the usual risk controls that we employed in the return model (Model 1) and then add the various variables proposed in Hou and Loh (2016) in Model 2. When we add the size related variables from Bali and Cakici (2008), we drop the size risk control to avoid multi-collinearity (see Internet Appendix 4, Table 3).

In Table 7, for brevity, we only show the coefficients from the multivariate results for the WLS regressions (the coefficients from the other two models, the linear probability model and the quantile regression models, are relegated to Internet Appendix 4, Table 1). In these models, all continuous control variables are winsorized (at 0.5% and 99.5%) and all continuous variables are standardized with a mean of 0 and a standard deviation of 1 to ease interpretation. Standard errors are clustered at both the firm and time (month) level (i.e. 2-way clustered standard errors). We also report a covariance decomposition, indicating the ratio of the covariance between the fitted value and the coefficient times the independent variable over the variance of the fitted value. These covariance contributions add to 100%, and use the average across the 3 models discussed above (WLS, linear probability, and quantile regression). Similarly, the “rank average” reports the relative importance (in terms of the absolute magnitude of the coefficient) of the variable in question across the three different specifications. We rank the independent variables based on the magnitude of the absolute coefficients from largest (1) to smallest (total number of independent variables) and average across the specifications.

We first focus on the multivariate regressions that exclude the maximum return variable (MAX) from Bali et al. (2011). According to Hou and Loh (2016), this variable is very highly correlated with IVOL and it is hard to see it as a proper measure of skewness or lottery preferences. The inclusion of MAX also has a huge impact on the other explanatory variables, as including it changes the sign of various coefficients compared to when the base model 1 is augmented with one additional variable at a time (presented in Internet Appendix 4, Table 2A). In contrast, no coefficients change sign from the specification where only that one variable was added to the risk controls compared to Model 2, where all of them except for MAX are included. The sign consistency is also largely true when using the other econometric methodologies (not reported).

For the multivariate regressions using only standard control variables (Model 1), the economically most important coefficients (given standardization, this is measured immediately from the magnitude of the coefficients) are size (especially when the alternative size metrics are used in Internet Appendix 4, Table 3A), reversal (as measured by LAGRET) and liquidity. This is corroborated by the covariance decomposition, which shows that these variables rank as the most important across the models, and account for 36% (LNME), 27% (CVTURN), and 23% (LAGRET) of the variation respectively. When the additional control variables from Hou and Loh (2016) are added (Model 2), the results

Table 7
Negative martingale forecast error.
(1) (2) (3)

WLS Decompose Avg Rank Avg WLS Decompose Avg Rank Avg WLS Decompose Avg Rank Avg

BETA − 0.000 1.21% 7.00 0.001 0.48% 12.33 − 0.007*** − 0.83% 7.00
​ (− 0.037) ​ ​ (1.520) ​ ​ (− 19.651) ​ ​
LN(ME) − 0.034*** 36.43% 1.00 − 0.004*** 6.76% 8.00 0.019*** − 1.56% 4.67
​ (− 21.898) ​ ​ (− 3.011) ​ ​ (22.043) ​ ​
LN(BEME) − 0.006*** 0.30% 6.00 0.001 − 0.04% 11.67 0.006*** 0.03% 10.00
​ (− 6.972) ​ ​ (1.211) ​ ​ (14.346) ​ ​
MOM − 0.017*** 5.77% 5.00 − 0.009*** 1.08% 8.67 − 0.000 − 0.03% 12.00
​ (− 10.416) ​ ​ (− 7.578) ​ ​ (− 0.522) ​ ​
LAGRET 0.031*** 23.11% 2.33 0.022*** 6.62% 4.00 − 0.002** − 0.47% 9.00
​ (18.889) ​ ​ (14.690) ​ ​ (− 2.022) ​ ​
LN(TURN) 0.028*** 6.59% 3.33 0.015*** 0.95% 6.67 − 0.009*** − 0.30% 6.00
​ (20.344) ​ ​ (16.376) ​ ​ (− 13.481) ​ ​
LN(CVTURN) 0.026*** 26.58% 3.33 0.023*** 8.49% 4.67 0.003*** 0.62% 10.33
​ (23.724) ​ ​ (29.701) ​ ​ (6.423) ​ ​
SKEW ​ ​ ​ 0.022*** 5.15% 6.33 − 0.043*** − 6.69% 2.00
​ ​ ​ ​ (19.517) ​ ​ (− 61.189) ​ ​
COSKEW ​ ​ ​ 0.003*** 0.28% 10.67 − 0.001*** − 0.03% 11.33
​ ​ ​ ​ (3.948) ​ ​ (− 2.911) ​ ​
EXPIDIOSKEW ​ ​ ​ 0.007*** 2.47% 8.00 − 0.000 − 0.09% 13.00
​ ​ ​ ​ (5.392) ​ ​ (− 0.058) ​ ​
AMILL ​ ​ ​ 0.011*** 4.94% 6.67 0.001 0.94% 6.00
​ ​ ​ ​ (9.921) ​ ​ (1.372) ​ ​
ZERORET ​ ​ ​ − 0.065*** − 1.81% 2.33 − 0.011*** 0.08% 6.00
​ ​ ​ ​ (− 28.276) ​ ​ (− 9.014) ​ ​
BIDASK ​ ​ ​ 0.077*** 64.64% 1.00 − 0.004*** − 2.79% 6.67
​ ​ ​ ​ (36.247) ​ ​ (− 3.986) ​ ​
MAX ​ ​ ​ ​ ​ ​ 0.147*** 111.11% 1.00
​ ​ ​ ​ ​ ​ ​ (84.446) ​ ​
Adj R-Sq 0.085 ​ ​ 0.174 ​ ​ 0.375 ​ ​

The table reports coefficients from regressing standardized martingale (negative) forecast error as the dependent variable on standardized and winsorized (at 0.5% and 99.5%) continuous independent variables using weighted least squares. All dependent and independent variables have been multiplied by the square root of the martingale error percentile rank. “Decompose Avg” is the average of the decomposed covariance using $\operatorname{cov}(\hat{y}_{i,t}, \beta_j X_{j,i,t}) / \operatorname{var}(\hat{y}_{i,t})$, where $\hat{y}_{i,t}$ is the predicted y for firm i in month t and $X_j$ is a vector of independent variables used in the model, across the WLS model, a Linear Probability model, and a Quantile Regression model (where the latter two focus on extreme decile). “Rank Avg” is the average rank (highest=1) of the magnitude of the absolute coefficient in each model averaged across the 3 models. Clustered standard errors by time and firm are reported in parentheses, and *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels respectively.
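
The preprocessing, the weighting, and the covariance decomposition described in the table note can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the weighting follows the table note (every variable, including the constant, is multiplied by the square root of the percentile rank), and the decomposition is computed here on the unweighted fitted values, which the text does not specify.

```python
import numpy as np

def winsorize(x, lo=0.5, hi=99.5):
    """Clip x at its lo-th and hi-th percentiles."""
    a, b = np.percentile(x, [lo, hi])
    return np.clip(x, a, b)

def standardize(x):
    """Rescale x to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()

def wls_cov_decomposition(y, X, pct_rank):
    """WLS via pre-multiplication by sqrt(pct_rank), then covariance shares.

    y: (N,) standardized dependent variable; X: (N, K) standardized regressors;
    pct_rank: (N,) percentile rank in (0, 1] of the forecast error.
    Returns (slopes, shares), where shares[j] = cov(yhat, b_j X_j) / var(yhat).
    """
    s = np.sqrt(pct_rank)
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z * s[:, None], y * s, rcond=None)
    slopes = coef[1:]
    parts = X * slopes                        # column j holds b_j * X_j
    yhat = coef[0] + parts.sum(axis=1)
    shares = np.array([np.cov(yhat, parts[:, j], bias=True)[0, 1]
                       for j in range(X.shape[1])]) / yhat.var()
    return slopes, shares
```

Because the fitted value is the intercept plus the sum of the individual contributions, the shares add up to one by construction, which matches the statement that the covariance contributions add to 100%. Individual shares can be negative, as several entries in the table are.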


Table 8
Difference between martingale and ARMA(1,1) Forecasts.
(1) (2) (3)

WLS Decompose Avg Rank Avg WLS Decompose Avg Rank Avg WLS Decompose Avg Rank Avg

BETA − 0.016*** 0.18% 7.00 − 0.011*** − 0.04% 10.33 − 0.064*** − 1.02% 7.00
(− 4.000) (− 3.109) (− 22.110)
LN(ME) − 0.181*** 40.78% 1.00 − 0.001 5.30% 9.67 0.123*** − 2.34% 6.67
(− 17.827) (− 0.056) (20.702)
LN(BEME) − 0.045*** 0.35% 6.00 0.006 0.01% 12.00 0.038*** 0.06% 9.00
​ (− 7.743) ​ (1.138) ​ ​ (12.473) ​ ​
MOM − 0.167*** 22.72% 2.00 − 0.096*** 5.03% 4.00 − 0.032*** 1.09% 8.00
​ (− 14.974) ​ (− 11.132) ​ ​ (− 6.849) ​ ​
LAGRET 0.105*** 8.78% 4.67 0.059*** 1.53% 6.67 − 0.071*** − 1.37% 4.33
​ (8.446) ​ (4.989) ​ ​ (− 12.505) ​ ​
LN(TURN) 0.151*** 5.05% 3.67 0.078*** 0.73% 8.33 − 0.101*** − 0.37% 4.67
​ (15.262) ​ (11.279) ​ ​ (− 21.825) ​ ​
LN(CVTURN) 0.156*** 22.14% 3.67 0.148*** 6.44% 5.00 − 0.002 − 0.71% 11.00
​ (19.360) ​ (24.999) ​ ​ (− 0.490) ​ ​
SKEW ​ ​ 0.124*** 4.82% 5.00 − 0.265*** − 6.24% 2.00
​ ​ ​ (18.069) ​ ​ (− 48.579) ​ ​
COSKEW ​ ​ 0.013*** 0.27% 9.33 − 0.010*** − 0.05% 13.67
​ ​ ​ (3.197) ​ ​ (− 3.550) ​ ​
EXPIDIOSKEW ​ ​ 0.066*** 1.34% 8.67 − 0.014** − 0.95% 10.00
​ ​ ​ (5.783) ​ ​ (− 1.993) ​ ​
AMILL ​ ​ − 0.022*** − 2.43% 9.00 − 0.101*** − 2.58% 6.67
​ ​ ​ (− 2.618) ​ ​ (− 13.632) ​ ​
ZERORET ​ ​ − 0.363*** − 3.38% 2.00 − 0.031*** 0.04% 10.00
​ ​ ​ (− 24.211) ​ ​ (− 5.242) ​ ​
BIDASK ​ ​ 0.430*** 80.38% 1.00 − 0.047*** − 1.58% 11.00
​ ​ ​ (34.435) ​ ​ (− 5.885) ​ ​
MAX ​ ​ ​ ​ ​ 0.867*** 116.01% 1.00
​ ​ ​ ​ ​ ​ (72.953) ​ ​
Adj R-Sq 0.107 ​ 0.241 ​ ​ 0.567 ​ ​

The table reports results from regressing the standardized difference between the martingale and ARMA(1,1) forecasts of IVAR as the dependent variable on standardized and winsorized (at 0.5% and 99.5%) continuous independent variables using weighted least squares. All dependent and independent variables have been multiplied by the square root of the martingale error percentile rank. “Decompose Avg” is the average of the decomposed covariance using $\operatorname{cov}(\hat{y}_{i,t}, \beta_j X_{j,i,t}) / \operatorname{var}(\hat{y}_{i,t})$, where $\hat{y}_{i,t}$ is the predicted y for firm i in month t and $X_j$ is a vector of independent variables used in the model, across the WLS model, a Linear Probability model, and a Quantile Regression model (where the latter two focus on the 90th percentile). “Rank Avg” is the average rank (highest=1) of the magnitude of the absolute coefficient in each model averaged across the 3 models. Clustered standard errors by time and firm are reported in parentheses, and *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels respectively.
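
To make the dependent variable of Table 8 concrete, the two forecasts and their difference can be written out. The following is a sketch under an assumed textbook ARMA(1,1) parameterization; the symbols $c$, $\phi$, $\theta$, and $\varepsilon_t$ and the superscripts $A$ and $M$ are notation introduced here, not necessarily the authors'.

```latex
\begin{align*}
IVAR_{t+1} &= c + \phi\, IVAR_t + \theta\, \varepsilon_t + \varepsilon_{t+1}
  && \text{(assumed ARMA(1,1) form)} \\
E_t^{A}[IVAR_{t+1}] &= c + \phi\, IVAR_t + \theta\, \varepsilon_t
  && \text{(ARMA(1,1) forecast)} \\
E_t^{M}[IVAR_{t+1}] &= IVAR_t
  && \text{(martingale forecast)} \\
E_t^{M}[IVAR_{t+1}] - E_t^{A}[IVAR_{t+1}] &= (1 - \phi)\, IVAR_t - c - \theta\, \varepsilon_t
  && \text{(forecast gap)}
\end{align*}
```

Under this parameterization, and assuming $\phi < 1$ and $\theta \le 0$, a large positive variance shock raises both $IVAR_t$ and $\varepsilon_t$ and therefore widens the gap. This is one way to read the statement that the martingale model fails to reverse large past shocks.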

continue to suggest a large role for liquidity, with zero returns and bid-ask spreads being most important (as judged by the magnitude of the coefficients). The coefficients for the important variables range between 0.02 and 0.08, and we observe similar coefficients for the linear probability model (Internet Appendix 4, Table 1).

The MAX return variable is considered separately (Model 3) because it has a massive effect on the probability of observing a large forecast error. Its coefficient is of the order of 0.15 and its inclusion in the regression changes the sign of the coefficients on many variables, while also reducing them to small values. The R² more than doubles. One exception is the skewness coefficient which now becomes negative, likely driven by its high correlation with the MAX variable which is an indirect indicator of the potential for positive skewness and reflects a realization of high skewness.

In sum, the MAX return variable is a good proxy for martingale overestimations of variances. Internet Appendix 4, Table 1 shows that alternative estimation approaches (including the linear probability model, and quantile regressions) lead to similar conclusions.20 For the quantile regressions, the key variable, when excluding the maximum return variable, is, in a more dominant fashion than seen using other approaches, the bid-ask spread. Small, illiquid firms, which are prone to some extreme return behavior in particular firm months, appear to be the drivers of the large forecast errors for the martingale model and the IVOL puzzle.

We perform the same analysis for the difference between the martingale and ARMA(1,1) forecasts. The results, using the WLS regressions, are reported in Table 8 (as well as Internet Appendix 4). Perhaps not surprisingly, we find that the same variables are associated with larger differences between the two forecasts. In particular, the most important variable is again the MAX variable, which renders other variables much less economically important or even changes the sign of the coefficients (e.g. for the bid-ask spread). When the MAX variable is not included in the regression, the most important variables, economically, are the bid-ask spread and zero returns (the latter with a negative sign). Size indicators also play an important role, especially when measured by the alternative size dummies (Internet Appendix 4, Table 3B).

6. The cross-section of expected idiosyncratic volatility

Given that the ARMA(1,1) model is clearly the best model, we analyze what drives the ARMA(1,1) parameters and use the model forecasts to characterize the cross-sectional and time series behavior of expected idiosyncratic volatility for US firms.

6.1. ARMA(1,1) expected idiosyncratic volatility

We first provide further insights on how the ARMA model produces

20 While the standard errors for the WLS and Linear Probability Models use two-way clustered errors, the Quantile regression models use robust standard errors. For the latter, the standard errors are likely mis-specified, and we only rely on that model for interpreting the magnitude of the coefficients and for covariance decomposition.


expected idiosyncratic variances from past information, in particular past realized variances through the autoregressive parameter, and past variance shocks through the moving average coefficient. Then we investigate how expected idiosyncratic volatility varies with firm characteristics.
Models 1 and 2 of Table 9 report coefficients from a panel regression, regressing the estimated autoregressive and moving average parameters on firm characteristics. Because the parameters result from our out-of-sample analysis with rolling estimations, we have a panel of such coefficients. To facilitate interpretation of the results, all independent variables are standardized to have zero mean and standard deviation of 1.

Table 9
What drives variation in the ARMA(1,1) parameters and IVOL.

              (1)           (2)           (3)
              AR            MA            ARMA(1,1) IVOL
Intercept     0.625***      0.412***      0.481***
              (173.820)     (132.932)     (133.344)
BETA          0.004         −0.002        0.022***
              (1.478)       (−0.954)      (13.002)
LN(ME)        0.041***      0.036***      −0.159***
              (9.293)       (8.910)       (−40.949)
LN(BEME)      0.050***      0.038***      −0.034***
              (19.969)      (15.656)      (−18.808)
MOM           −0.004***     −0.001        −0.018***
              (−2.877)      (−0.428)      (−6.808)
LAGRET        0.002*        0.001         0.009***
              (1.685)       (0.927)       (4.036)
LN(TURN)      −0.018***     −0.010***     0.097***
              (−5.573)      (−3.204)      (41.856)
LN(CVTURN)    −0.037***     −0.036***     0.071***
              (−12.544)     (−12.970)     (34.314)

The table shows a panel regression of the AR (MA) coefficient as well as the annualized volatility forecast (SQRT(Monthly VAR Forecast × 12)) of the ARMA(1,1) model on firm characteristics. All firm level explanatory variables have been winsorized at the 0.5% and 99.5% levels and all independent variables have been standardized with a mean of 0 and standard deviation of 1. Clustered standard errors by time and firm are reported in parentheses, and *, **, and *** indicate statistical significance at the 10%, 5%, and 1% levels respectively.

Focusing on the AR-coefficients (Model 1), because all independent variables are standardized, the constant in the regression represents the average autoregressive coefficient, which is 0.63. This is very far away from the implied value of 1 imposed by the martingale model, and similarly far away from the average persistence coefficient in an AR(1) model, which is only 0.27 (not reported). The largest economic effects arise from the size and the book-to-market variables, with firms one standard deviation larger than the average featuring a 0.04 higher autoregressive coefficient, and firms with book-to-market values one standard deviation above the mean a 0.05 higher coefficient. These effects are highly statistically significant. The turnover variables (level and coefficient of variation) also generate statistically significant effects with higher turnover (variation) lowering the AR coefficients. All other effects are economically very small.
We also report similar results for the moving average coefficients in Model 2. The average moving average coefficient is 0.41, indicating that the best forecast typically involves reversing large shocks to realized idiosyncratic variances. This explains why economically neither a martingale model, nor a simple autoregressive model can fit the data well. When shocks are small, an autoregressive model with a coefficient of around 0.63 works well. When a large positive shock to the realized variance happens, the optimal ARMA forecast lowers the effective persistence considerably; when a negative shock happens, the forecast increases the effective persistence of the process. The importance of the MA coefficient can be gleaned from Fig. 1, where we rank firms in deciles based on their change in IVAR between time t-1 and t. We then show the average IVAR in subsequent time periods for the most extreme deciles. The blue line shows the average IVAR for firms with the largest increases in IVAR from t-1 to t (the graph shows this to represent a shift from about 0.04 to 0.14), and the red line shows the average IVAR for the decile with the largest decreases in IVAR between t-1 and t (representing roughly the opposite switch). Clearly, we see strong mean-reversion in the IVAR levels, which the ARMA model accomplishes through the MA-coefficient.
When looking at the impact on the MA coefficient, size and book-to-market again generate the largest positive coefficients, with large value firms having higher moving average coefficients (0.036 and 0.038, respectively, per standard deviation away from the mean). The only other economically important effect is recorded for the coefficient of variation of turnover, with a −0.036 coefficient.
Fig. 2 provides more intuition on the relative performance of the ARMA(1,1) model relative to simpler but nested models such as the martingale and the AR(1) model. We split the firms every month (at time t-1) based on the change in IVAR from the previous month (from t-2 to t-

Fig. 1. This figure shows the firms with the largest changes in IVAR from time t-1 to time t organized in deciles, and the subsequent mean IVAR for the two extreme
deciles. The blue line shows the average IVAR for firms with the largest increases in IVAR from t-1 to t, and the red line shows the average IVAR for the decile with the
largest decreases in IVAR between t-1 and t.
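
The mean reversion shown in Fig. 1 can be illustrated with a minimal sketch of a one-step ARMA(1,1) forecast. The sketch assumes the sign convention IVAR(t+1) = mu(1 - phi) + phi IVAR(t) + e(t+1) - theta e(t), under which a positive MA coefficient partially reverses the latest shock. That convention, the function name arma11_forecast, and the long-run mean of 0.04 are illustrative assumptions; phi = 0.63 and theta = 0.41 are the average coefficients reported in Table 9, and the jump from 0.04 to 0.14 mirrors the shift described for Fig. 1.

```python
def arma11_forecast(x_t, prev_forecast, mu, phi=0.63, theta=0.41):
    """One-step-ahead forecast of realized variance.

    ASSUMED convention (not confirmed by the paper):
        x_{t+1} = mu * (1 - phi) + phi * x_t + e_{t+1} - theta * e_t
    where e_t is the forecast error made for period t.
    """
    shock = x_t - prev_forecast  # e_t: surprise in the latest realized variance
    return mu * (1 - phi) + phi * x_t - theta * shock


# Hypothetical firm: IVAR was forecast at 0.04 but jumped to 0.14.
mu = 0.04                                  # assumed long-run mean of IVAR
martingale = 0.14                          # treats the jump as permanent
ar_only = mu * (1 - 0.63) + 0.63 * 0.14    # same phi, no MA correction
arma = arma11_forecast(0.14, 0.04, mu)     # MA term reverses part of the shock
```

With these hypothetical inputs the martingale forecast stays at 0.14, the forecast without the MA term is about 0.103, and the ARMA(1,1) forecast falls back to about 0.062.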


Fig. 2. This figure shows the mean IVAR forecast, as well as actual IVAR, for month t where the forecasts are based on data up until t-1. The forecasts are generated
separately for each decile of change in the forecast from time t-2 to time t-1. Hence, the lowest decile (1) shows the forecasts for the firms with the lowest (most
negative) change in IVAR from time t-2 to time t-1, and the highest decile (10 - to the right on the graph), shows the forecasts for the decile with the largest changes in
IVAR from time t-2 to time t-1.
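
The sorting behind Fig. 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name decile_means, the equal-count grouping by rank, and the synthetic records are all assumptions. Each record holds a firm's change in IVAR from t-2 to t-1, a forecast for month t, and the realized IVAR in month t.

```python
from statistics import mean


def decile_means(records, n_groups=10):
    """Group (delta_ivar, forecast, realized) records by the size of delta_ivar.

    Returns a list of (group, mean_forecast, mean_realized), where group 1
    holds the most negative changes and group n_groups the largest increases.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    out = []
    for g in range(n_groups):
        bucket = ordered[g * n // n_groups:(g + 1) * n // n_groups]
        if bucket:  # skip empty groups when there are fewer records than groups
            out.append((g + 1,
                        mean(r[1] for r in bucket),
                        mean(r[2] for r in bucket)))
    return out


# Synthetic month: the forecast is a martingale (last value carried forward),
# while the realized value reverses half of the change (a made-up pattern).
records = [(d / 100, 0.15 + d / 100, 0.15 + 0.5 * d / 100) for d in range(-10, 10)]
result = decile_means(records)
```

In this synthetic example the martingale forecast is too low in group 1 and too high in group 10, which is the qualitative pattern the figure is described as showing.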

1) and organize them into 10 deciles. Hence, the firms in the first decile just experienced large negative shocks to the realized variance, whereas the firms in the 10th decile experienced large positive shocks to the realized variance. The figure plots the mean realized variance (IVAR) at time t and the average forecast, based on data up until t-1, for each of the 10 deciles for the three aforementioned models. Thus, on the left of the figure, the lowest decile (indicated by 1) shows the forecasts for the firms with the lowest (most negative) change in IVAR from time t-2 to time t-1, and on the right, the highest decile (indicated by 10) shows the forecasts for the firms with the largest changes in IVAR from time t-2 to time t-1.
As expected, the martingale model overpredicts IVAR for firms with recent large jumps in IVAR and underpredicts the IVAR for firms with recent large declines in IVAR (as the martingale model assumes that any change to IVAR is permanent). The predicted IVAR for firms with small changes to IVAR generally appears too low for this model, perhaps because it misses the skewness in variance. In contrast, the ARMA(1,1) model performs better at predicting IVAR after both positive and negative large changes in IVAR, suggesting that such shocks are partially reversed. Interestingly, the ARMA(1,1) model also produces better forecasts than the martingale model for firms with modest changes in IVAR. In comparison to the AR(1) model, its performance is most visibly better when there are large negative shocks to IVAR and it is important to revise the predicted realized variance upwards, which the AR(1) model fails to do. Because the AR(1) model typically implies less persistence than the ARMA(1,1) model, it still performs well in cases where the realized variance is temporarily high.
Next, we characterize how expected idiosyncratic volatility varies with firm characteristics more directly and report the results in Model 3 of Table 9. We use the same panel regression framework as above, but the dependent variable now is the expected idiosyncratic volatility (in annualized terms) for each firm i, at time t using the ARMA(1,1) model. The average idiosyncratic volatility is around 48%. Most firm characteristics generate statistically significant deviations from the average, but few generate effects above 5% in absolute magnitude. The economically important effects are size and turnover, with firms one standard deviation above the average size (turnover) featuring 15.9% (9.7%) lower (higher) idiosyncratic volatility.

6.2. Time series dynamics of expected idiosyncratic volatility

In Fig. 3, we plot the time series behavior of idiosyncratic volatility over the full sample period. Panel B shows the median and the 25th and 75th percentiles of annualized expected idiosyncratic volatility using the ARMA(1,1) model, while Panel A shows the realized volatility for comparison. The 90% range, i.e. the 5th to 95th percentile, for median realized volatility over this long time period is [17.3%, 50.9%]. It is apparent that idiosyncratic volatility was extremely high in the early part of our sample (the 1930s) and then declined to much lower levels by 1960. However, the median idiosyncratic volatility then increases again, showing several peaks reaching levels close to those observed in the late thirties. The most recent such peak in our sample occurred during the beginning of the Covid pandemic, which is also characterized by extreme dispersion in idiosyncratic volatility across the 25th and 75th percentiles. Given the very large literature on an alleged upward trend in aggregate idiosyncratic volatility (see e.g., Campbell et al., 2001), this may be surprising. The time series pattern seems more consistent with a stationary process with occasional regime switches into high volatility periods (see also Bekaert et al., 2012). The expected idiosyncratic volatility series are obviously much smoother than the realized volatility graphs but follow the same pattern.

7. Summary and conclusion

Evaluating the riskiness of a stock portfolio requires an estimate of its expected idiosyncratic variance (IVAR). Very little systematic research exists on how to construct such estimates. In this article, we compare the performance of a large suite of different models to forecast idiosyncratic realized variances. While the popular martingale model is only best for 13.83% of firms in out-of-sample forecasts, the ARMA(1,1) model performs the best for 34.28% of firms (and if we include the ARMA(1,1) model incorporating a market variance term, it is best for 46.52% of


Fig. 3. Realized annual volatility.


Figure A shows the 25th, 50th and 75th percentile of realized (annualized) volatility [SQRT(Monthly IVAR × 12)] by month. Figure B shows the same percentiles of
the ARMA(1,1) firm volatility forecasts.
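
The annualization in the caption, SQRT(Monthly IVAR × 12), and the cross-sectional percentiles for one month can be sketched as below. The function names and sample values are hypothetical, and the interpolation rule used for the percentiles (the "inclusive" method of Python's statistics.quantiles) is an assumption, since the percentile method is not stated here.

```python
import math
from statistics import quantiles


def annualized_vol(monthly_ivar):
    """Convert a monthly idiosyncratic variance into an annualized volatility."""
    return math.sqrt(monthly_ivar * 12)


def cross_section_quartiles(monthly_ivars):
    """25th, 50th and 75th percentiles of annualized volatility across firms."""
    vols = [annualized_vol(v) for v in monthly_ivars]
    q25, q50, q75 = quantiles(vols, n=4, method="inclusive")
    return q25, q50, q75


# Hypothetical monthly variances for three firms in a single month.
q25, q50, q75 = cross_section_quartiles([0.0025, 0.01, 0.04])
```

Repeating the second function month by month would give the three lines plotted in each panel.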

firms). Moreover, the ARMA(1,1) model delivers lower average RMSEs than all other models in a statistically significant fashion, and also performs quite well when it is not the best model. The ARMA model produces different predictions from both a martingale model and an AR(1) model, as for a typical firm it generates persistent forecasts (with the autoregressive coefficient around 0.63, much lower than for a martingale, but much higher than observed for an AR(1) model) only when there are no large shocks to realized variance. When there are large shocks to realized variance, the forecast is adjusted in the opposite direction through a large MA coefficient (of about 0.41).
We use our various proxies for expected idiosyncratic risk to revisit the relationship between idiosyncratic risk and returns. We find that the only proxy for which the Ang et al. (2006) result (a negative relation between expected idiosyncratic risk and returns) holds, is the martingale model. For all other models, including the best ARMA(1,1) model, and when using the best IVAR model for each firm in each month, there is no relation between expected idiosyncratic risk and returns. We show that the difference is driven by a very small percentage of firm months for which the martingale model generates poor, excessively high forecasts. That is, the model fails to reverse large shocks to IVAR and therefore overestimates IVAR for firms with subsequently low returns. Eliminating firm months with high differences between the martingale and ARMA(1,1) forecasts also renders the relationship insignificant. Not surprisingly, we find that proposed explanations for the IVOL puzzle, such as small size, skewness and poor liquidity are strongly associated with firm months featuring poor martingale forecasts, with the strongest effect associated with high bid-ask spreads. When the max return variable from Bali et al. (2011) is included, it dominates the regressions. Its explanatory power for the IVOL puzzle is highly associated with high maximum return months coinciding with firm months where the martingale model generates poor, excessively high volatility forecasts.
Our idiosyncratic volatility measures may be useful in several alternative literatures, including option pricing for individual stock options and asset management, because expected idiosyncratic volatility is necessary to determine optimal active positions. In addition, there is little consensus in the macroeconomic literature on how to measure


uncertainty. Our firm-specific measure of expected idiosyncratic variance provides a viable alternative to other stock market variance measures that have been used in the literature, including the aggregate market volatility (see Bloom, 2009), or cross-sectional stock market dispersion (Bloom, 2009; Christiano et al., 2014).

CRediT authorship contribution statement

Geert Bekaert: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. Mikael Bergbrant: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. Haimanot Kassa: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization.

Declaration of competing interest

None.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jfineco.2025.104023.

Appendix A. Out-of-sample Forecast Performance Robustness Checks

This table presents robustness tests in which the forecasted IVAR is computed from a within-month CAPM model (Panel A), from a Fama-French three-factor model with 3-month estimation of parameters (Panel B), or with an autocorrelation correction (Panel C), and in which the sample is restricted to firm months during which the firm was included in the S&P 500 (Panel D). The table shows out-of-sample statistics for the 10 models described in the data section, where the statistics presented are based on firm averages. To calculate the out-of-sample statistics, each model is fit every month for each firm to generate a forecast for the following month, using up to 10 years of data (when available) prior to the time of the forecast. Before using these forecasts to calculate the out-of-sample statistic for each firm, all monthly forecasts are winsorized at 0 and 0.5. Panels E and F present the main results (from Table 1) with forecasts winsorized at 2.5% and 97.5% (Panel E) or not at all (Panel F). In addition to the mean statistic, the table also shows the proportion of firms for which a model generates the best forecasts, as well as the average rank of the model. We also report the “improvement to worst”, defined as the ratio of the difference between the worst RMSE and the RMSE of the model being examined to the RMSE difference between the worst and best models, as well as the median. The sample period is from 1926 to 2022.
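
The winsorization and the "improvement to worst" ratio described above can be sketched for a single firm as follows. The function names, model labels and RMSE values are hypothetical; the ratio follows the definition in the text, (worst RMSE - model RMSE) / (worst RMSE - best RMSE). Because the reported statistics are firm averages, the table entries would be averages of such per-firm ratios, which is an interpretation rather than something the text spells out.

```python
import math


def winsorize(forecasts, lower=0.0, upper=0.5):
    """Clamp monthly variance forecasts to [lower, upper] (0 and 0.5 in the text)."""
    return [min(max(f, lower), upper) for f in forecasts]


def rmse(forecasts, realized):
    """Root mean squared forecast error for one firm and one model."""
    errors = [(f - r) ** 2 for f, r in zip(forecasts, realized)]
    return math.sqrt(sum(errors) / len(errors))


def improvement_to_worst(rmse_by_model):
    """Map each model to (worst - model) / (worst - best) for one firm."""
    worst = max(rmse_by_model.values())
    best = min(rmse_by_model.values())
    if worst == best:  # all models tie, so the ratio is undefined; report 0
        return {m: 0.0 for m in rmse_by_model}
    return {m: (worst - v) / (worst - best) for m, v in rmse_by_model.items()}


# Hypothetical per-firm RMSEs for three of the models.
ratios = improvement_to_worst({"martingale": 0.12, "arma11": 0.10, "har": 0.11})
clipped = winsorize([-0.01, 0.03, 0.9])
```

In this made-up example the worst model scores 0, the best scores 1, and a model halfway between them scores about 0.5.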

Panel A: CAPM

Model              Mean      Best      Rank    Imp. to Worst    Median
Martingale         0.1154    13.57%    5.31    29.72%           0.0374
ARMA(1,1)          0.1111    33.89%    2.63    81.70%           0.0318
ARNL               0.1128    17.63%    3.37    70.26%           0.0326
HAR                0.1144    8.17%     4.36    56.87%           0.0347
ARMA(1,1) w Mkt    0.1124    12.47%    3.47    71.81%           0.0331
ARNL w Mkt         0.1136    9.05%     3.95    63.72%           0.0339
HAR w Mkt          0.1153    5.09%     4.92    49.80%           0.0360

Panel B: Fama-French 3 Factors with 3-month estimation of parameters

Model              Mean      Best      Rank    Imp. to Worst    Median
Martingale         0.1258    13.96%    5.28    30.12%           0.0393
ARMA(1,1)          0.1214    33.21%    2.67    81.06%           0.0335
ARNL               0.1230    17.29%    3.41    69.66%           0.0342
HAR                0.1248    8.47%     4.36    56.70%           0.0365
ARMA(1,1) w Mkt    0.1226    12.51%    3.46    71.95%           0.0348
ARNL w Mkt         0.1238    9.38%     3.93    63.76%           0.0353
HAR w Mkt          0.1256    4.97%     4.88    50.22%           0.0378

Panel C: Autocorrelation Corrected

Model              Mean      Best      Rank    Imp. to Worst    Median
Martingale         0.0989    9.16%     5.84    20.64%           0.0390
ARMA(1,1)          0.0914    29.73%    2.80    82.74%           0.0323
ARNL               0.0924    20.10%    3.25    76.26%           0.0329
HAR                0.0936    12.25%    3.95    68.61%           0.0339
ARMA(1,1) w Mkt    0.0928    12.36%    3.62    74.71%           0.0336
ARNL w Mkt         0.0934    10.33%    3.91    69.73%           0.0340
HAR w Mkt          0.0947    5.95%     4.63    61.67%           0.0350

Panel D: S&P 500 Firms

Model              Mean      Best      Rank    Imp. to Worst    Median
Martingale         0.0124    5.77%     6.01    17.89%           0.008
ARMA(1,1)          0.0105    42.37%    2.28    89.85%           0.006
ARNL               0.0109    15.44%    3.31    76.37%           0.006
HAR                0.0117    6.69%     4.39    64.25%           0.007
ARMA(1,1) w Mkt    0.0108    16.41%    3.13    79.72%           0.006
ARNL w Mkt         0.0111    8.86%     3.86    69.57%           0.007
HAR w Mkt          0.0120    4.46%     5.03    55.00%           0.007

Panel E: Winsorize at 2.5% and 97.5%

Model              Mean      Best      Rank    Imp. to Worst    Median
Martingale         0.1015    16.66%    4.94    36.01%           0.0304
ARMA(1,1)          0.0996    34.17%    2.65    79.50%           0.0272
ARNL               0.1007    16.03%    3.44    66.98%           0.0279
HAR                0.1016    8.27%     4.29    55.36%           0.0291
ARMA(1,1) w Mkt    0.1011    11.36%    3.60    66.78%           0.0286
ARNL w Mkt         0.1017    8.19%     4.13    57.78%           0.0291
HAR w Mkt          0.1028    4.60%     4.97    45.23%           0.0303

Panel F: No Winsorization

Model              Mean       Best      Rank    Imp. to Worst    Median
Martingale         0.1181     11.71%    5.38    31.06%           0.0325
ARMA(1,1)          0.1041     35.85%    2.47    86.23%           0.0275
ARNL               1.01E29    19.95%    3.20    74.41%           0.0283
HAR                0.3242     7.40%     4.48    56.25%           0.0306
ARMA(1,1) w Mkt    0.1117     11.89%    3.44    74.90%           0.0290
ARNL w Mkt         1.04E25    9.43%     3.91    65.97%           0.0297
HAR w Mkt          0.3571     3.78%     5.13    47.16%           0.0320

Data availability

Expected Idiosycratic Volatility (Reference data) (Mendeley Data)

References

Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Automat. Contr. 19 (6), 716–723.
Amihud, Y., 2002. Illiquidity and stock returns: cross-section and time-series effects. J. Financ. Mark. 5 (1), 31–56.
Amihud, Y., Mendelson, H., 1986. Liquidity and stock returns. Financ. Anal. J. 42 (3), 43–48.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H., 2001. The distribution of realized stock return volatility. J. Financ. Econ. 61 (1), 43–76.
Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P., 2003. Measuring and forecasting realized volatility. Econometrica 71 (2), 579–625.
Andreou, E., Ghysels, E., 2002. Rolling-sample volatility estimators: some new theoretical, simulation and empirical results. J. Bus. Econ. Stat. 20 (3), 363–376.
Ang, A., Hodrick, R.J., Xing, Y., Zhang, X., 2006. The cross-section of volatility and expected returns. J. Finance 61 (1), 259–299.
Ang, A., Hodrick, R.J., Xing, Y., Zhang, X., 2009. High idiosyncratic volatility and low returns: international and further US evidence. J. Financ. Econ. 91 (1), 1–23.
Asparouhova, E., Bessembinder, H., Kalcheva, I., 2010. Liquidity biases in asset pricing tests. J. Financ. Econ. 96 (2), 215–237.
Bakshi, G., Cao, C., Zhong, Z.K., 2021. Assessing models of individual equity option prices. Rev. Quant. Finance Account. 57 (1), 1–28.
Bakshi, G., Kapadia, N., Madan, D., 2003. Stock return characteristics, skew laws, and the differential pricing of individual equity options. Rev. Financ. Stud. 16 (1), 101–143.
Bali, T.G., Cakici, N., 2008. Idiosyncratic volatility and the cross section of expected returns. J. Financ. Quant. Anal. 43 (1), 29–58.
Bali, T.G., Cakici, N., Whitelaw, R.F., 2011. Maxing out: stocks as lotteries and the cross-section of expected returns. J. Financ. Econ. 99 (2), 427–446.
Bandi, F.M., Russell, J.R., 2006. Separating microstructure noise from volatility. J. Financ. Econ. 79 (3), 655–692.
Barberis, N., Huang, M., 2008. Stocks as lotteries: the implications of probability weighting for security prices. Am. Econ. Rev. 98 (5), 2066–2100.
Barndorff-Nielsen, O.E., Shephard, N., 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 (2), 253–280.
Bartram, S.M., Brown, G.W., Stulz, R.M., 2016. Why Does Idiosyncratic Risk Increase with Market Risk? NBER. Working Paper No. 22492.
Bekaert, G., Engstrom, E., Ermolov, A., 2015. Bad environments, good environments: a non-Gaussian asymmetric volatility model. J. Econom. 186 (1), 258–275.
Bekaert, G., Hodrick, R.J., Zhang, X., 2012. Aggregate idiosyncratic volatility. J. Financ. Quant. Anal. 47 (6), 1155–1185.
Bekaert, G., Hoerova, M., 2014. The VIX, the variance premium and stock market volatility. J. Econom. 183 (2), 181–192.
Bekaert, G., Hoerova, M., & Xu, N.R. (2024a). Risk, monetary policy and asset prices in a global world. Available At SSRN 3599583.
Bekaert, G., Wang, X., Zhang, X., 2024b. The International Commonality of Idiosyncratic Variances. forthcoming Manag. Sci.
Bergbrant, M., Kassa, H., 2021. Is idiosyncratic volatility related to returns? Evidence from a subset of firms with quality idiosyncratic volatility estimates. J. Bank. Finance 127, 106126.
Bloom, N., 2009. The impact of uncertainty shocks. Econometrica 77 (3), 623–685.
Blume, M.E., Friend, I., 1975. The asset structure of individual portfolios and some implications for utility functions. J. Finance 30 (2), 585–603.
Bollerslev, T., Patton, A.J., Quaedvlieg, R., 2016. Exploiting the errors: a simple approach for improved volatility forecasting. J. Econom. 192 (1), 1–18.
Boyer, B., Mitton, T., Vorkink, K., 2010. Expected idiosyncratic skewness. Rev. Financ. Stud. 23 (1), 169–202.
Brockman, P., Guo, T., Vivero, M.G., Yu, W., 2022. Is idiosyncratic risk priced? The international evidence. J. Empir. Finance 66, 121–136.
Campbell, J., Lettau, M., Malkiel, B., Xu, Y., 2001. Have individual stocks become more volatile? An empirical exploration of idiosyncratic risk. J. Finance 56 (1), 1–43.
Chabi-Yo, F. and Yang, J. (2009). Idiosyncratic coskewness and equity returns, Working Paper.
Chen, Z., Petkova, R., 2012. Does idiosyncratic volatility proxy for risk exposure? Rev. Financ. Stud. 25 (9), 2745–2787.
Chernov, M., Gallant, A.R., Ghysels, E., Tauchen, G., 2003. Alternative models for stock price dynamics. J. Econom. 116 (1–2), 225–257.
Chordia, T., Subrahmanyam, A., Anshuman, V.R., 2001. Trading activity and expected stock returns. J. Financ. Econ. 59 (1), 3–32.
Chordia, T., Subrahmanyam, A., 2004. Order imbalance and individual stock returns: theory and evidence. J. Financ. Econ. 72 (3), 485–518.
Christiano, L.J., Motto, R., Rostagno, M., 2014. Risk shocks. Am. Econ. Rev. 104 (1), 27–65.
Corsi, F., 2009. A simple approximate long-memory model of realized volatility. J. Financ. Econom. 7 (2), 174–196.
Corwin, S.A., Schultz, P., 2012. A simple way to estimate bid-ask spreads from daily high and low prices. J. Finance 67 (2), 719–760.
Engle, R.F., Lee, G.G.J., 1999. A long-run and short-run component model of stock return volatility. In: Cointegration, Causality, and Forecasting, pp. 475–497.
Fama, E.F., French, K.R., 1992. The cross-section of expected stock returns. J. Finance 47 (2), 427–465.
Fama, E.F., French, K.R., 1993. Common risk factors in the returns on stocks and bonds. J. Financ. Econ. 33 (1), 3–56.
Fama, E.F., MacBeth, J.D., 1973. Risk, return, and equilibrium: empirical tests. J. Polit. Economy 81 (3), 607–636.
Fink, J.D., Fink, K.E., He, H., 2012. Expected idiosyncratic volatility measures and expected returns. Financ. Manag. 41 (3), 519–553.
French, K.R., Schwert, G.W., Stambaugh, R.F., 1987. Expected stock returns and volatility. J. Financ. Econ. 19 (1), 3–29.
Fu, F., 2009. Idiosyncratic risk and the cross-section of expected stock returns. J. Financ. Econ. 91 (1), 24–37.
Ghysels, E., Plazzi, A., Valkanov, R., Rubia, A., Dossani, A., 2019. Direct versus iterated multiperiod volatility forecasts. Annu. Rev. Financ. Econ. 11, 173–195.
Ghysels, E., Sinko, A., Valkanov, R., 2007. MIDAS regressions: further results and new directions. Econom. Rev. 26 (1), 53–90.
Ghysels, E., Mykland, P., Renault, E., 2023. In-sample asymptotic and across sample efficiency gains for high frequency data statistics. Econ. Theory 39 (1), 70–106.
Goetzmann, W.N., Kumar, A., 2008. Equity Portfolio Diversification. Rev. Financ. 433–463.
Guo, H., Kassa, H., Ferguson, M.F., 2014. On the relation between EGARCH idiosyncratic volatility and expected stock returns. J. Financ. Quant. Anal. 49 (01), 271–296.
Han, Y., Lesmond, D., 2011. Liquidity biases and the pricing of cross-sectional idiosyncratic volatility. Rev. Financ. Stud. 24 (5), 1590–1629.
Herskovic, B., Kelly, B., Lustig, H., Van Nieuwerburgh, S., 2016. The common factor in idiosyncratic volatility: quantitative asset pricing implications. J. Financ. Econ. 119 (2), 249–283.
Hou, K., Loh, R.K., 2016. Have we solved the idiosyncratic volatility puzzle? J. Financ. Econ. 121 (1), 167–194.
Huang, W., Liu, Q., Rhee, S.G., Zhang, L., 2010. Return reversals, idiosyncratic risk, and expected returns. Rev. Financ. Stud. 23 (1), 147–168.
Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M., 2009. Microstructure noise in the continuous case: the pre-averaging approach. Stochastic processes and their applications 119 (7), 2249–2276.
Jiang, G.J., Xu, D., Yao, T., 2009. The information content of idiosyncratic volatility. J. Financ. Quant. Anal. 44 (1), 1–28.
Johnson, T.C., 2004. Forecast dispersion and the cross section of expected returns. J. Finance 59 (5), 1957–1978.
Kozeniauskas, N., Orlik, A., Veldkamp, L., 2018. What are uncertainty shocks? J. Monet. Econ. 100, 1–15.
Levy, H., 1978. Equilibrium in an imperfect market: a constraint on the number of securities in the portfolio. Am. Econ. Rev. 68 (4), 643–658.
Malkiel, B.G., and Xu, Y. (2002). Idiosyncratic risk and security returns. University of Texas at Dallas Working Paper.
Merton, R.C., 1987. A simple model of capital market equilibrium with incomplete information. J. Finance 42 (3), 483–510.
Nelson, D.B., 1991. Conditional heteroskedasticity in asset returns: a new approach. Econom. J. Econom. Soc. 347–370.
Newey, W.K., West, K.D., 1987. Hypothesis testing with efficient method of moments estimation. International Economic Review, pp. 777–787.
Pontiff, J., 2006. Costly arbitrage and the myth of idiosyncratic risk. J. Account. Econ. 42 (1–2), 35–52.
Stambaugh, R.F., Yu, J., Yuan, Y., 2015. Arbitrage asymmetry and the idiosyncratic volatility puzzle. J. Finance 70 (5), 1903–1948.
Treynor, J.L., Black, F., 1973. How to use security analysis to improve portfolio selection. J. Bus. 46 (1), 66–86.
Welch, I., 2022. Simpler better market betas. Crit. Finance Rev. 11 (1), 37–64.
