
Making and Evaluating Point Forecasts

Tilmann Gneiting

Institut für Angewandte Mathematik


Universität Heidelberg

March 9, 2010
arXiv:0912.0902v2 [math.ST] 7 Mar 2010

Abstract
Single-valued point forecasts continue to be issued and used in almost all realms of
science and society. Typically, competing point forecasters or forecasting procedures
are compared and assessed by means of an error measure or scoring function, such as
the absolute error or the squared error, that depends both on the point forecast and the
realizing observation. The individual scores are then averaged over forecast cases, to
result in a summary measure of the predictive performance, such as the mean absolute
error or the (root) mean squared error. I demonstrate that this common practice can
lead to grossly misguided inferences, unless the scoring function and the forecasting
task are carefully matched.
Effective point forecasting requires that the scoring function be specified a priori,
or that the forecaster receives a directive in the form of a statistical functional, such
as the mean or a quantile of the predictive distribution. If the scoring function is
specified a priori, the forecaster can issue an optimal point forecast, namely, the Bayes
rule, which minimizes the expected loss under the forecaster’s predictive distribution.
If the forecaster receives a directive in the form of a functional, it is critical that the
scoring function be consistent for it, in the sense that the expected score is minimized
when following the directive. Any consistent scoring function induces a proper scoring
rule for probabilistic forecasts, and a duality principle links Bayes rules and consistent
scoring functions.
A functional is elicitable if there exists a scoring function that is strictly consistent
for it. Expectations, ratios of expectations and quantiles are elicitable. For example,
a scoring function is consistent for the mean functional if and only if it is a Bregman
function. It is consistent for a quantile if and only if it is generalized piecewise linear.
Similar characterizations apply to ratios of expectations and to expectiles. Weighted
scoring functions are consistent for functionals that adapt to the weighting in peculiar
ways. Not all functionals are elicitable; for instance, conditional value-at-risk is not,
despite its popularity in quantitative finance.

Key words and phrases: Bayes rule; Bregman function; conditional value-at-risk
(CVaR); consistency; decision theory; elicitability; expectile; mean; median; mode;
optimal point forecast; piecewise linear; proper scoring rule; quantile; statistical func-
tional

1 Introduction
In many aspects of human activity, a major desire is to make forecasts for an uncertain future.
Consequently, forecasts ought to be probabilistic in nature, taking the form of probability
distributions over future quantities or events (Dawid 1984; Gneiting 2008a). Still, many
practical situations require single-valued point forecasts, for reasons of decision making,
market mechanisms, reporting requirements, communications, or tradition, among others.

1.1 Using scoring functions to evaluate point forecasts


In this type of situation, competing point forecasters or forecasting procedures are compared
and assessed by means of an error measure, such as the absolute error or the squared error,
which is averaged over forecast cases. Thus, the performance criterion takes the form
    S̄ = (1/n) Σ_{i=1}^{n} S(x_i, y_i),                                    (1)

where there are n forecast cases with corresponding point forecasts, x1 , . . . , xn , and verifying
observations, y1 , . . . , yn . The function S depends both on the forecast and the realization,
and we refer to it as a scoring function.
Table 1 lists some commonly used scoring functions. We generally take scoring functions to
be negatively oriented, that is, the smaller, the better. The absolute error and the squared
error are of the prediction error form, in that they depend on the forecast error, x − y, only,
and they are symmetric, in that S(x, y) = S(y, x). The absolute percentage error and the
relative error are used for strictly positive quantities only; they are neither of the prediction
error form nor symmetric. Patton (2009) discusses these as well as many other scoring
functions that have been used to assess point forecasts for a strictly positive quantity, such
as an asset value or a volatility proxy.

Table 1: Some commonly used scoring functions.

S(x, y) = (x − y)2 squared error (SE)


S(x, y) = |x − y| absolute error (AE)
S(x, y) = |(x − y)/y| absolute percentage error (APE)
S(x, y) = |(x − y)/x| relative error (RE)
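
As a concrete illustration, the scoring functions of Table 1 and the mean score (1) can be coded in a few lines. The following Python sketch (assuming NumPy) is purely illustrative; the names scoring_functions and mean_score are mine, not the paper's.

import numpy as np

# The four scoring functions of Table 1; x is the point forecast, y the observation.
scoring_functions = {
    "SE":  lambda x, y: (x - y) ** 2,           # squared error
    "AE":  lambda x, y: np.abs(x - y),          # absolute error
    "APE": lambda x, y: np.abs((x - y) / y),    # absolute percentage error (y > 0)
    "RE":  lambda x, y: np.abs((x - y) / x),    # relative error (x > 0)
}

def mean_score(S, x, y):
    """Mean score (1) over n forecast cases with point forecasts x and observations y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean(S(x, y)))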

Our next two tables summarize the use of scoring functions in academia, the public and the
private sector. Table 2 surveys the 2008 volumes of peer-reviewed journals in forecasting
(Group I) and statistics (Group II), along with premier journals in the most prominent
application areas, namely econometrics (Group III) and meteorology (Group IV). We call an

article a forecasting paper if it contains a table or a figure in which the predictive performance
of a forecaster or forecasting method is summarized in the form of the mean score (1), or
a monotone transformation thereof, such as the root mean squared error. Not surprisingly,
the majority of the Group I papers are forecasting papers, and many of them employ several
scoring functions simultaneously. Overall, the squared error is the most popular scoring
function in academia, particularly in Groups III and IV, followed by the absolute error and
the absolute percentage error.
Table 3 reports the use of scoring functions in businesses and organizations, according to
surveys conducted or summarized by Carbone and Armstrong (1982), Mentzer and Kahn
(1995), McCarthy et al. (2006) and Fildes and Goodwin (2007). In addition to the squared
error and the absolute error, the absolute percentage error has been very widely used in
practice, presumably because business forecasts focus on demand, sales, or costs, all of
which are nonnegative quantities.
There are many options and considerations in choosing a scoring function. What scoring
function ought to be used in practice? Do the standard choices have theoretical support?
Arguably, there is considerable contention in the scientific community, along with a critical
need for theoretically principled guidance. Some 20 years ago, Murphy and Winkler (1987,
p. 1330) commented on the state of the art in forecast evaluation, noting that

“[. . . ] verification measures have tended to proliferate, with relatively little effort being
made to develop general concepts and principles [. . . ] This state of affairs has impacted
the development of a science of forecast verification.”

Nothing much has changed since. Armstrong (2001) called for further research, while
Moskaitis and Hansen (2006) asked

“Deterministic forecasting and verification: A busted system?”

Similarly, the recent review by Fildes et al. (2008, p. 1158) states that

“Defining the basic requirements of a good error measure is still a controversial issue.”

1.2 Simulation study


To focus issues and ideas, we consider a simulation study, in which we seek point forecasts
for a highly volatile daily asset value, yt . The data generating process is such that yt is a
realization of the random variable
    Y_t = Z_t^2,                                    (2)
where Zt follows a Gaussian conditionally heteroscedastic time series model (Engle 1982;
Bollerslev 1986), with the parameter values proposed by Christoffersen and Diebold (1996),
in that
    Z_t ∼ N(0, σ_t^2),   where   σ_t^2 = 0.20 Z_{t−1}^2 + 0.75 σ_{t−1}^2 + 0.05.

Table 2: Use of scoring functions in the 2008 volumes of leading peer-reviewed journals
in forecasting (Group I), statistics (Group II), econometrics (Group III) and meteorology
(Group IV). Column 2 shows the total number of papers published in 2008 under Web of
Science document type article, note or review. Column 3 shows the number of forecasting
papers (FP), that is, the number of articles with a table or figure that summarizes predic-
tive performance in the form of the mean score (1) or a monotone transformation thereof.
Columns 4 through 7 show the number of papers employing the squared error (SE), absolute
error (AE), absolute percentage error (APE), or miscellaneous (MSC) other scoring func-
tions. The sum of columns 4 through 7 may exceed the number in column 3, because of
the simultaneous use of multiple scoring functions in some articles. Papers that apply error
measures to evaluate estimation methods, rather than forecasting methods, have not been
considered in this study.

Total FP SE AE APE MSC


Group I: Forecasting
International Journal of Forecasting 41 32 21 10 8 4
Journal of Forecasting 39 25 23 13 5 3
Group II: Statistics
Annals of Applied Statistics 62 8 6 3 1 0
Annals of Statistics 100 5 3 2 0 0
Journal of the American Statistical Association 129 10 9 1 0 0
Journal of the Royal Statistical Society Ser. B 49 5 4 1 0 0
Group III: Econometrics
Journal of Business and Economic Statistics 26 9 8 2 1 0
Journal of Econometrics 118 5 5 0 0 0
Group IV: Meteorology
Bulletin of the American Meteorological Society 73 1 1 0 0 0
Monthly Weather Review 300 63 58 8 2 0
Quarterly Journal of the Royal Meteorological Society 148 19 19 0 0 0
Weather and Forecasting 79 26 20 11 0 1

Table 3: Use of scoring functions in the evaluation of point forecasts in businesses and
organizations. Columns 2 through 4 show the percentage of survey respondents using the
squared error (SE), absolute error (AE) and absolute percentage error (APE), with the
source of the survey listed in column 1.

Source SE AE APE
Carbone and Armstrong (1982), Table 1 27% 19% 9%
Mentzer and Kahn (1995), Table VIII 10% 25% 52%
McCarthy, Davis, Golicic and Mentzer (2006), Table VIII 6% 20% 45%
Fildes and Goodwin (2007), Table 5 9% 36% 44%

Table 4: The mean error measure (1) for the three point forecasters in the simulation study,
using the squared error (SE), absolute error (AE), absolute percentage error (APE) and
relative error (RE) scoring functions.

Forecaster SE AE APE RE
Statistician 5.07 0.97 2.58 × 10^5 0.97
Optimist 22.73 4.35 13.96 × 10^5 0.87
Pessimist 7.61 0.96 0.14 × 10^5 19.24

We consider three forecasters, each of whom issues a one-day ahead point forecast for the
asset value. The statistician has knowledge of the data generating process and the actual
value of the conditional variance σ_t^2, and thus predicts the true conditional mean,

    x̂_t = E(Y_t | σ_t^2) = σ_t^2,

as her point forecast. The optimist always predicts x̂t = 5. The pessimist always issues the
point forecast x̂t = 0.05. Figure 1 shows these point forecasts along with the realizing asset
value for 200 successive trading days. There ought to be little contention as to the predictive
performance, in that the statistician is more skilled than the optimist or the pessimist.
Table 4 provides a formal evaluation of the three forecasters for a sequence of n = 100, 000
sequential forecasts, using the mean score (1) and the scoring functions listed in Table 1.
The results are counterintuitive and disconcerting, in that the pessimist has the best (lowest)
score both under the absolute error and the absolute percentage error scoring functions. In
terms of relative error, the optimist performs best. Yet, what we have done here is common
practice in academia and businesses, in that point forecasts are evaluated by means of these
scoring functions.
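
The simulation study is easy to replicate. The sketch below is a minimal, purely illustrative implementation (the random seed and the function names are mine, not the paper's); it simulates the data generating process (2), evaluates the three forecasters with the mean score (1), and reproduces the qualitative pattern of Table 4 up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(0)   # illustrative seed

def simulate(n):
    """Simulate y_t = z_t^2 under the conditionally heteroscedastic model (2)."""
    z, s2 = 0.0, 1.0
    y, cond_var = np.empty(n), np.empty(n)
    for t in range(n):
        s2 = 0.20 * z ** 2 + 0.75 * s2 + 0.05     # sigma_t^2
        z = rng.normal(0.0, np.sqrt(s2))          # Z_t ~ N(0, sigma_t^2)
        y[t], cond_var[t] = z ** 2, s2
    return y, cond_var

n = 100_000
y, sigma2 = simulate(n)
forecasters = {"Statistician": sigma2,            # the true conditional mean E(Y_t | sigma_t^2)
               "Optimist":     np.full(n, 5.00),
               "Pessimist":    np.full(n, 0.05)}
scores = {"SE":  lambda x, y: (x - y) ** 2,
          "AE":  lambda x, y: np.abs(x - y),
          "APE": lambda x, y: np.abs((x - y) / y),
          "RE":  lambda x, y: np.abs((x - y) / x)}
for name, x in forecasters.items():
    print(name, {k: round(float(np.mean(S(x, y))), 2) for k, S in scores.items()})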


Figure 1: A realized series of volatile daily asset prices under the data generating process
(2), shown by circles, along with the one-day ahead point forecasts by the statistician (blue
line), the optimist (orange line at top) and the pessimist (red line at bottom).

1.3 Discussion
The source of these disconcerting results is aptly explained in a recent paper by Engelberg,
Manski and Williams (2009, p. 30):

“Our concern is prediction of real-valued outcomes such as firm profit, GDP growth,
or temperature. In these cases, the users of point predictions sometimes presume
that forecasters report the means of their subjective probability distributions; that is,
their best point predictions under square loss. However, forecasters are not specifically
asked to report subjective means. Nor are they asked to report subjective medians
or modes, which are best predictors under other loss functions. Instead, they are
simply asked to ‘predict’ the outcome or to provide their ‘best prediction’, without
definition of the word ‘best.’ In the absence of explicit guidance, forecasters may report
different distributional features as their point predictions. Some may report subjective
means, others subjective medians or modes, and still others, applying asymmetric loss
functions, may report various quantiles of their subjective probability distributions.”

Similarly, Murphy and Daan (1985, p. 391) noted that

“It will be assumed here that the forecasters receive a ‘directive’ concerning the pro-
cedure to be followed [. . . ] and that it is desirable to choose an evaluation measure
that is consistent with this concept. An example may help to illustrate this concept.
Consider a continuous [. . . ] predictand, and suppose that the directive states ‘forecast

the expected (or mean) value of the variable.’ In this situation, the mean square error
measure would be an appropriate scoring rule, since it is minimized by forecasting the
mean of the (judgemental) probability distribution. Measures that correspond with a
directive in this sense will be referred to as consistent scoring rules (for that directive).”

Despite these well-argued perspectives, there has been little recognition that the common
practice of requesting ‘some’ point forecast, and then evaluating the forecasters by using
‘some’ (set of) scoring function(s), is not a meaningful endeavor. In this paper, we develop
the perspectives of Murphy and Daan (1985) and Engelberg et al. (2009) and argue that
effective point forecasting depends on ‘guidance’ or ‘directives’, which can be given in one
of two complementary ways, namely, by disclosing the scoring function ex ante to the
forecaster, or by requesting a specific functional of the forecaster’s predictive distribution,
such as the mean or a quantile.
As to the first option, the a priori disclosure of the scoring function allows the forecaster
to tailor the point predictor to the scoring function at hand. In particular, this permits
our statistician forecaster to mutate into Mr. Bayes, who issues the optimal point forecast,
namely the Bayes rule,
    x̂ = arg min_x E_F S(x, Y),                                    (3)
where the expectation is taken with respect to the forecaster’s subjective or objective predic-
tive distribution, F . For example, if the scoring function S is the squared error, the optimal
point forecast is the mean of the predictive distribution. In the case of the absolute error,
the Bayes rule is any median of the predictive distribution. The class
    S_β(x, y) = |1 − (y/x)^β|      (β ≠ 0)                                    (4)
of scoring functions nests both the absolute percentage error (β = −1) and the relative error
(β = 1) scoring functions. If the predictive distribution F has density f on the positive
half-axis and a finite fractional moment of order β, the optimal point forecast under the loss
or scoring function (4) is the median of a random variable whose density is proportional to
y^β f(y). We call this the β-median of the probability distribution F and write med^(β)(F).
The traditional median arises in the limit as β → 0.
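
In practice the β-median can be approximated from a sample of the predictive distribution by a weighted median, weighting each draw y by y^β. The following sketch is a rough Monte Carlo illustration (the sample size, seed and function name are arbitrary choices of mine); for the data generating process (2) with σ_t^2 = 1, so that Y is chi-square with one degree of freedom, it recovers the factors 0.455 and 2.366 that appear in Table 5 below.

import numpy as np

def beta_median(sample, beta):
    """Monte Carlo approximation of med^(beta)(F): the median of the distribution
    whose density is proportional to y^beta f(y)."""
    y = np.sort(np.asarray(sample, dtype=float))
    w = y ** beta                       # weights proportional to y^beta
    cdf = np.cumsum(w) / np.sum(w)      # weighted empirical distribution function
    return float(y[np.searchsorted(cdf, 0.5)])

rng = np.random.default_rng(1)
y = rng.normal(size=1_000_000) ** 2     # Y = Z^2 with Z standard normal
print(beta_median(y, 0))                # approx. 0.455: the ordinary median (AE)
print(beta_median(y, 1))                # approx. 2.366: med^(1), the Bayes rule under RE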
Table 5 summarizes our discussion, in that it shows the optimal point forecast, or Bayes
rule, under the scoring functions in Table 1, both in full generality and in the special case
of the true predictive distribution under the data generating process (2). Table 6 shows the
mean score (1) for the new competitor Mr. Bayes in the simulation study, who issues the
optimal point forecast. As expected, Mr. Bayes outperforms his colleagues.
An alternative to disclosing the scoring function is to request a specific functional of the
forecaster’s predictive distribution, such as the mean or a quantile, and to apply any scoring
function that is consistent with the functional, roughly in the following sense.
Let the interval I be the potential range of the outcomes, such as I = R for a real-valued
quantity, or I = (0, ∞) for a strictly positive quantity, and let the probability distribution F

Table 5: Bayes rules under the scoring functions in Table 1 as a functional of the forecaster’s
predictive distribution, F. The functional med^(β)(F) is defined in the text. The final column
specializes to the true predictive distribution under the data generating process (2) in the
simulation study. The entry for the absolute percentage error (APE) is to be understood as
follows. The predictive distribution F has infinite fractional moment of order −1, and thus
med^(−1)(F) does not exist. However, it is readily seen that the smaller the (strictly positive)
point forecast, the smaller the expected APE. Thus, a prudent forecaster will issue some
very small ǫ > 0 as point predictor.

Scoring Function Bayes Rule Point Forecast in Simulation Study


SE      x̂ = mean(F)           σ_t^2
AE      x̂ = median(F)         0.455 σ_t^2
APE     x̂ = med^(−1)(F)       ε
RE      x̂ = med^(1)(F)        2.366 σ_t^2

Table 6: Continuation of Table 4, showing the corresponding mean scores for the new com-
petitor, Mr. Bayes. In the case of the APE, Mr. Bayes issues the point forecast x̂ = ǫ = 10^−10.

SE AE APE RE
Mr. Bayes 5.07 0.86 1.00 0.75

be concentrated on I. Then a scoring function is any mapping S : I × I → [0, ∞). A functional
is a potentially set-valued mapping F ↦ T(F) ⊆ I. A scoring function S is consistent for
the functional T if
EF [S(t, Y )] ≤ EF [S(x, Y )]
for all F , all t ∈ T(F ) and all x ∈ I. It is strictly consistent if it is consistent and equality
of the expectations implies that x ∈ T(F ). Following Osband (1985) and Lambert, Pennock
and Shoham (2008), a functional is elicitable if there exists a scoring function that is strictly
consistent for it.
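
The defining inequality can be checked numerically by brute force: for a sample from a skewed predictive distribution, the expected squared error is minimized near the mean and the expected absolute error near the median, which is the consistency property at work. The sketch below is illustrative only; the lognormal distribution, the seed and the search grid are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)   # a skewed predictive distribution

grid = np.linspace(0.01, 10.0, 1000)                    # candidate point forecasts
se = lambda x, y: (x - y) ** 2
ae = lambda x, y: np.abs(x - y)

best_se = grid[int(np.argmin([np.mean(se(x, y)) for x in grid]))]
best_ae = grid[int(np.argmin([np.mean(ae(x, y)) for x in grid]))]

print(best_se, y.mean())        # squared error is minimized near the mean (about 1.65)
print(best_ae, np.median(y))    # absolute error is minimized near the median (about 1.00)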

1.4 Plan of the paper


The remainder of the paper is organized as follows. Section 2 develops the notions of con-
sistency and elicitability in a comprehensive way. In addition to reviewing and unifying the
extant literature, we present original results on weighted scoring functions that extend prior
findings on optimal point forecasts, such as those of Park and Stefanski (1998) and Patton
(2010). Section 3 turns to examples. The mean functional, ratios of expectations, quantiles

and expectiles are elicitable. Subject to weak regularity conditions, a scoring function for a
real-valued predictand is consistent for the mean functional if and only if it is a Bregman
function, that is, of the form

S(x, y) = φ(y) − φ(x) − φ′ (x)(y − x),

where φ is a convex function with subgradient φ′ (Savage 1971). More general and novel
results apply to ratios of expectations and expectiles. A scoring function is consistent for
the α-quantile if and only if it is generalized piecewise linear (GPL) of order α ∈ (0, 1), that
is, of the form
S(x, y) = (1(x ≥ y) − α) (g(x) − g(y)),
where 1(·) denotes an indicator function and g is nondecreasing (Thomson 1979; Saerens
2000). However, not all functionals are elicitable. Notably, the conditional value-at-risk
(CVaR) functional is not elicitable, despite its popularity as a risk measure in financial
applications.
The paper closes with a discussion in Section 5, which makes a plea for change in the practice
of point forecasting. I contend that in issuing and evaluating point forecasts, it is essential
that either the scoring function be specified ex ante, or an elicitable target functional be
named, such as an expectation or a quantile, and scoring functions be used that are consistent
for the target functional.

2 A decision-theoretic approach to the evaluation of point forecasts
We now develop a theoretical framework for the evaluation of point forecasts. Towards this
end, we review the more general, classical decision-theoretic setting whose basic ingredients
are as follows.

(a) An observation domain, O, which comprises the potential outcomes of a future obser-
vation.

(b) A class F of probability measures on the observation domain O (equipped with a


suitable σ-algebra), which constitutes a family of probability distributions for the future
observation.

(c) An action domain, A, which comprises the potential actions of a decision maker.

(d) A loss function L : A × O → [0, ∞), where L(a, o) represents the monetary or societal
cost when the decision maker takes the action a ∈ A and the observation o ∈ O
materializes.

Given a probability distribution F ∈ F for the future observation, the Bayes act or Bayes
rule is any decision â ∈ A such that

    â = arg min_a E_F L(a, Y),                                    (5)

where Y is a random variable with distribution F . Thus, if the decision maker’s assessment of
the uncertain future is represented by the probability measure F , and she wishes to minimize
the expected loss, her optimal decision is the Bayes act, â. In general, Bayes acts need not
exist nor be unique, but in most cases of practical interest, Bayes rules exist, and frequently
they are unique (Ferguson 1967).

2.1 Decision-theoretic setting


Point forecasting falls into the general decision-theoretic setting, if we assume that the ob-
servation domain and the action domain coincide. In what follows we assume, for simplicity,
that this common domain,
D = O = A ⊆ Rd ,
is a subset of the Euclidean space Rd and equipped with the corresponding Borel σ-algebra.
Furthermore, we refer to the loss function as a scoring function. With these adaptations,
the basic components of our decision-theoretic framework are as follows.

(a) A prediction-observation (PO) domain, D = D × D, which is the Cartesian product of


the domain D ⊆ Rd with itself.

(b) A family F of potential probability distributions for the future observation Y that
takes values in D.

(c) A scoring function S : D = D × D → [0, ∞), where S(x, y) represents the loss or pe-
nalty when the point forecast x ∈ D is issued and the observation y ∈ D materializes.

In this setting, the optimal point forecast under the probability distribution F ∈ F for the
future observation, Y , is the Bayes act or Bayes rule (5), which can now be written as

    x̂ = arg min_x E_F S(x, Y).                                    (6)

We will mostly work in dimension d = 1, in which any connected domain D is simply an


interval, I. The cases of prime interest then are the real line, I = R, and the nonnegative or
positive halfaxis, I = [0, ∞) or I = (0, ∞).
Table 7 summarizes assumptions which some of our subsequent results impose on scoring
functions. The nonnegativity condition (S0) is standard and not restrictive. Indeed, if S0 is
such that S0 (x, y) ≥ S0 (y, y) for all x, y ∈ I, which is a natural assumption on a loss or scoring
function, then S(x, y) = S0 (x, y)−S0 (y, y) satisfies (S0) and shares the optimal point forecast

Table 7: Assumptions on a scoring function S on a PO domain D = I × I, where I ⊆ R is an
interval, x ∈ I denotes the point forecast and y ∈ I the realizing observation.

(S0) S(x, y) ≥ 0 with equality if x = y


(S1) S(x, y) is continuous in x
(S2) The partial derivative S(1) (x, y) exists and is continuous in x whenever x ≠ y

(6), subject to integrability conditions that are not of practical concern. Generally, a loss
function can be multiplied by a strictly positive constant and any function that depends on y
only can be added, without changing the nature of the optimal point forecast. Furthermore,
the optimization problem in (6) is posed in terms of the point predictor, x. In this light, it is
natural that assumptions (S1) and (S2) concern continuity and differentiability with respect
to the first argument, the point forecast x.
Efron (1991) and Patton (2010) argue that homogeneity or scale invariance is a desirable
property of a scoring function. We adopt this notion and call a scoring function S on the
PO domain D = D × D homogeneous of order b if
S(cx, cy) = |c|^b S(x, y)   for all x, y ∈ D and c ∈ R
which are such that cx ∈ D and cy ∈ D. Evidently, the underlying quest is that for
equivariance in the decision problem. The scoring function S on the PO domain D = D × D
is equivariant with respect to some class H of injections h : D → D if
    arg min_x E_F[S(x, h(Y))] = h( arg min_x E_F[S(x, Y)] )
for all h ∈ H and all probability distributions F that are concentrated on D. For instance,
if S is homogeneous on D = R^d or D = (0, ∞)^d then it is equivariant with respect to the
multiplicative group of the linear transformations {x ↦ cx : c > 0}. If the scoring function is
of the prediction error form on D = R^d, then it is equivariant with respect to the translation
group {x ↦ x + b : b ∈ R^d}.
While our decision-theoretic setting resembles and follows those of Osband (1985) and Lam-
bert et al. (2008), and the subsequent development owes much to their pioneering works,
there are distinctions in technique. For example, Osband (1985) assumes a bounded domain
D, while Lambert et al. (2008) consider D to be a finite set. The work of Granger and
Pesaran (2000a, 2000b), which argues in favor of closer links between decision theory and
forecast evaluation, focuses on probability forecasts for a dichotomous event.

2.2 Consistency
In the decision-theoretic framework, we think of the aforementioned ‘distributional feature’
or ‘directive’ for the forecaster as a statistical functional. Formally, a statistical functional,

or simply a functional, is a potentially set-valued mapping from a class of probability distri-
butions, F , to a Euclidean space (Horowitz and Manski 2006; Huber and Ronchetti 2009;
Wellner 2009). In the current context of point forecasting, we require that the functional

    T : F → D,   F ↦ T(F),

maps into the domain D ⊆ Rd . Frequently, we take F to be the class of all probability
measures on D, or the class of the probability measures with compact support in D.
To facilitate the presentation, the following definitions and results suppress the dependence
of the scoring function S, the functional T and the class F on the domain D.

Definition 2.1. The scoring function S is consistent for the functional T relative to the
class F if
EF S(t, Y ) ≤ EF S(x, Y ) (7)
for all probability distributions F ∈ F , all t ∈ T(F ) and all x ∈ D. It is strictly consistent
if it is consistent and equality in (7) implies that x ∈ T(F ).

As noted, the term consistent was coined by Murphy and Daan (1985, p. 391), who stressed
that it is critically important to define consistency for a fixed, given functional, as opposed to
a generic notion of consistency, which was, correctly, refuted by Jolliffe (2008). For example,
the squared error scoring function, S(x, y) = (x−y)2 , is consistent, but not strictly consistent,
for the mean functional relative to the class of the probability measures on the real line with
finite first moment. It is strictly consistent relative to the class of the probability measures
with finite second moment.
In a parametric context, Lehmann (1951) and Noorbaloochi and Meeden (1983) refer to a re-
lated property as decision-theoretic unbiasedness. The following result notes that consistency
is the dual of the optimal point forecast property, just as decision-theoretic unbiasedness is
the dual of being Bayes (Noorbaloochi and Meeden 1983). It thus connects the problems of
finding optimal point forecasts, and of evaluating point predictions.

Theorem 2.2. The scoring function S is consistent for the functional T relative to the class
F if and only if, given any F ∈ F , any x ∈ T(F ) is an optimal point forecast under S.

Stated differently, the class of the scoring functions that are consistent for a certain functional
is identical to the class of the loss functions under which the functional is an optimal point
forecast. Despite its simplicity, and the proof being immediate from the defining properties,
this duality does not appear to be widely appreciated.
Our next result shows that the class of the consistent scoring functions is convex, and thus
suggests the existence of Choquet representations (Phelps 1966).

Theorem 2.3. Let λ be a measure on a measurable space (Ω, A). Suppose that for all
ω ∈ Ω, the scoring function S_ω satisfies (S0) and is consistent for the functional T relative
to the class F . Then the scoring function
    S(x, y) = ∫_Ω S_ω(x, y) λ(dω)

is consistent for T relative to F .

At this point, it will be useful to distinguish the notions of a proper scoring rule (Winkler
1996; Gneiting and Raftery 2007) and a consistent scoring function. I believe that this
distinction is useful, even though the extant literature has failed to make it. For example, in
referring to proper scoring rules for quantile forecasts, Cervera and Muñoz (1996), Gneiting
and Raftery (2007), Hilden (2008) and Jose and Winkler (2009) discuss scoring functions
that are consistent for a quantile.
Within our decision-theoretic framework, a proper scoring rule is a function S : F × D → R
such that
EF S(F, Y ) ≤ EF S(G, Y ) (8)
for all probability distributions F, G ∈ F , where we assume that the expectations are well-
defined. Note that S is defined on the Cartesian product of the class F and the domain
D. The loss or penalty S(F, y) arises when a probabilistic forecaster issues the predictive
distribution F while y ∈ D materializes. The expectation inequality (8) then implies that
the forecaster minimizes the expected loss by following her true beliefs. Thus, the use of
proper scoring rules encourages sincerity and candor among probabilistic forecasters.
In contrast, a scoring function S acts on the PO domain, D = D × D, that is, the Cartesian
product of D with itself. This is a much simpler domain than that for a scoring rule. However,
any consistent scoring function induces a proper scoring rule in a straightforward and natural
construction, as follows.

Theorem 2.4. Suppose that the scoring function S is consistent for the functional T relative
to the class F . Then the function

    S : F × D → [0, ∞),   (F, y) ↦ S(F, y) = S(T(F), y),

is a proper scoring rule.

A more general decision-theoretic approach to the construction of proper scoring rules is


described by Dawid (2007, p. 78) and Gneiting and Raftery (2007, p. 361).

2.3 Elicitability
We turn to the notion of elicitability, which is a critically important concept in the evaluation
of point forecasts. While the general notion dates back to the pioneering work of Osband

(1985), the term elicitable was coined only recently by Lambert et al. (2008). Whenever
appropriate and feasible, we suppress the dependence of the definitions and results on the
PO domain D = D × D.

Definition 2.5. The functional T is elicitable relative to the class F if there exists a scoring
function S that is strictly consistent for T relative to F .

Evidently, if T is elicitable relative to the class F , then it is elicitable relative to any subclass
F0 ⊆ F . The following result then is a version of Osband’s (1985, p. 9) revelation principle.

Theorem 2.6 (Osband). Suppose that the class F is concentrated on the domain D, and
let g : D → D be a one-to-one mapping. Then the following holds.

(a) If T is elicitable, then Tg = g ◦ T is elicitable.

(b) If S is consistent for T, then the scoring function

    S_g(x, y) = S(g⁻¹(x), y)

is consistent for Tg .

(c) If S is strictly consistent for T, then Sg is strictly consistent for Tg .

The next theorem is an original result that concerns weighted scoring functions, where the
weight function depends on the realizing observation, y, only.

Theorem 2.7. Let the functional T be defined on a class F of probability distributions which
admit a density, f , with respect to some dominating measure on the domain D. Consider
the weight function
w : D → [0, ∞).
Let F^(w) ⊆ F denote the subclass of the probability distributions in F which are such that
w(y)f(y) has finite integral over D, and the probability measure F^(w) with density propor-
tional to w(y)f(y) belongs to F. Define the functional

    T^(w) : F^(w) → I,   F ↦ T^(w)(F) = T(F^(w)),                                    (9)

on this subclass F^(w). Then the following holds.

(a) If T is elicitable, then T^(w) is elicitable.

(b) If S is consistent for T relative to F, then

    S^(w)(x, y) = w(y) S(x, y)                                    (10)

is consistent for T^(w) relative to F^(w).

Table 8: The optimal point forecast or Bayes rule (6) when the scoring function is relative
error, S(x, y) = |(x − y)/x|, and the future quantity Y can be represented as Y = Z^2, where
Z has a t-distribution with mean 0, variance 1 and ν > 2 degrees of freedom. In the limiting
case as ν → ∞, we take Z to be standard normal. If Z has variance σ^2 the entries need to
be multiplied by this factor. As opposed to the approximations in Table 1 of Patton (2010),
which stem from numerical and Monte Carlo methods and are reproduced below, our results
derive from Theorem 2.7 and are exact. For details see Appendix B.

ν=4 ν=6 ν=8 ν = 10 ν→∞


Exact optimal point forecast 3.4048 2.8216 2.6573 2.5801 2.3660
Patton’s approximation 3.0962 2.7300 2.6067 2.5500 2.3600

(c) If S is strictly consistent for T relative to F, then S^(w) is strictly consistent for T^(w)
relative to F^(w).

In other words, a weighted scoring function is consistent for the functional T^(w), which acts
on the predictive distribution in a peculiar way, in that it applies the original functional,
T, to the probability measure whose density is proportional to the product of the weight
function and the original density.
Theorem 2.7 is a very general result with a wealth of applications, both in forecast evaluation
and in the derivation of optimal point forecasts. In particular, the functional (9) is the
optimal point forecast under the weighted scoring function (10), which allows us to unify
and extend scattered prior results. For example, the scoring function S_β of equation (4),

    S_β(x, y) = |1 − (y/x)^β|,

is of the form (10) with the original scoring function S(x, y) = |x^(−β) − y^(−β)| and the weight
function w(y) = y^β on the positive halfaxis, D = (0, ∞). The scoring function S is consistent
for the median functional. Thus, as noted in the introduction, the scoring function S_β
is consistent for the β-median functional, med^(β)(F), that is, the median of a probability
distribution whose density is proportional to y^β f(y), where f is the density of F. If β = −1,
we recover the absolute percentage error, S_{−1}(x, y) = |(x − y)/y|. The case β = 1 corresponds
to the relative error, S_1(x, y) = |(x − y)/x|, which Patton (2010) refers to as the MAE-prop
function. Table 1 of Patton (2010) shows Monte Carlo based approximate values for optimal
point forecasts under this scoring function. Theorem 2.7 permits us to give exact results;
these are summarized in Table 8 and differ notably from the approximations.
Another interesting case arises when the original scoring function S is the squared error,
S(x, y) = (x − y)2 , which is consistent for the mean or expectation functional. If T is the

mean functional, the functional T^(w) of equation (9) becomes

    T^(w)(F) = T(F^(w)) = E_F[Y w(Y)] / E_F[w(Y)].                                    (11)
Park and Stefanski (1998) studied optimal point forecasts in the special case in which D =
(0, ∞) is the positive half-axis and w(y) = 1/y^2, so that S^(w)(x, y) = (x − y)^2/y^2 is the
squared percentage error. By equation (11), the scoring function S^(w) is consistent for the
functional T^(w)(F) = E_F[Y^−1] / E_F[Y^−2]. By Theorem 2.2, this latter quantity is the optimal
point forecast under the squared percentage error scoring function, which is the result derived
by Park and Stefanski (1998).
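
The Park and Stefanski result is easy to confirm numerically via Theorem 2.2: the ratio E_F[Y^−1] / E_F[Y^−2] should coincide with the minimizer of the expected squared percentage error. A rough Python sketch, with an arbitrary lognormal predictive distribution (seed and grid are illustrative):

import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(0.0, 0.5, size=200_000)       # an illustrative positive predictand

# The functional T^(w)(F) = E[Y^-1] / E[Y^-2] of equation (11) with w(y) = 1/y^2
t_w = np.mean(1.0 / y) / np.mean(1.0 / y ** 2)

# Grid-minimize the expected squared percentage error E[(x - Y)^2 / Y^2]
grid = np.linspace(0.1, 3.0, 1000)
expected = [np.mean((x - y) ** 2 / y ** 2) for x in grid]
x_hat = grid[int(np.argmin(expected))]

print(t_w, x_hat)    # the two values agree up to grid resolution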
Situations in which the weight function depends on the point forecast, x, need to be handled
on a case-by-case basis. For example, a routine calculation shows that the squared relative
error scoring function, S(x, y) = (x − y)^2/x^2, is consistent for the functional

    T(F) = E_F[Y^2] / E_F[Y].                                    (12)
Incidentally, by a special case of (11) the observation-weighted scoring function S(x, y) =
y(x − y)2 is also consistent for the functional (12). Later on in equation (23) we characterize
the class of the scoring functions that are consistent for this functional.
While Theorems 2.6 and 2.7 suggest that general classes of functionals are elicitable, not all
functionals are such. The following result, which is a variant of Proposition 2.5 of Osband
(1985) and Lemma 1 of Lambert et al. (2008), states a necessary condition.

Theorem 2.8 (Osband). If a functional is elicitable then its level sets are convex in the
following sense: If F0 ∈ F , F1 ∈ F and p ∈ (0, 1) are such that Fp = (1 − p)F0 + pF1 ∈ F ,
then t ∈ T(F0 ) and t ∈ T(F1 ) imply t ∈ T(Fp ).

For example, the sum of two distinct quantiles generally does not have convex level sets and
thus is not an elicitable functional. Interesting open questions include those for a converse
of Theorem 2.8 and, more generally, for a characterization of elicitability.

2.4 Osband’s principle


Given an elicitable functional T, is there a practical way of describing and characterizing
the class of the scoring functions that are consistent for it? The following general approach,
which originates in the pioneering work of Osband (1985), is frequently useful.
Suppose that the functional T is defined for a class of probability measures on the domain
D which includes the two-point distributions. Assume that there exists an identification
function V : D × D → R such that

EF [V(x, Y )] = 0 ⇐⇒ x ∈ T(F ) (13)

Table 9: Possible choices for the identification function V with the property (13) in the case
in which D = I ⊆ R is an interval.

Functional Identification function


Mean                               V(x, y) = x − y
Ratio E_F[r(Y)] / E_F[s(Y)]        V(x, y) = x s(y) − r(y)
α-Quantile                         V(x, y) = 1(x ≥ y) − α
τ-Expectile                        V(x, y) = 2 |1(x ≥ y) − τ| (x − y)

and V(x, y) ≠ 0 unless x = y. If a consistent scoring function is available, which is smooth


in its first argument, we can take V(x, y) to be the corresponding partial derivative. For
example, if T is the mean or expectation functional on an interval D = I ⊆ R, we can pick
V(x, y) = x − y, which derives from the squared error scoring function, S(x, y) = (x − y)2 .
Table 9 provides further examples, with the second and fourth nesting the first.
The function
ǫ(c) = p S(c, a) + (1 − p)S(c, b) (14)
represents the expected score when we issue the point forecast c for a random vector Y such
that Y = a with probability p and Y = b with probability 1 − p. Since S is consistent for
the functional T, the identification function property (13) implies that ǫ(c) has a minimum
at c = x, where
p V(x, a) + (1 − p) V(x, b) = 0. (15)
If S is smooth in its first argument, we can combine (14) and (15) to result in

S(1) (x, a)/ V(x, a) = S(1) (x, b)/ V(x, b), (16)

where S(1) denotes a partial derivative or gradient with respect to the first argument. If this
latter equality holds for all pairwise distinct a, b and x ∈ D, the function S(1) (x, y)/V(x, y)
is independent of y ∈ D, and we can write

S(1) (x, y) = h(x) V(x, y) (17)

for x, y ∈ D and some function h : D → D. Frequently, we can integrate (17) to obtain the
general form of a scoring function that is consistent for the functional T.
In recognition of Osband’s (1985) fundamental yet unpublished work, we refer to this gen-
eral approach as Osband’s principle. The examples in the subsequent section give various
instances in which the principle can be successfully put to work. For a general technical
result, see Theorem 2.1 of Osband (1985).
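
An identification function also gives a practical way to compute the functional itself, by solving the empirical version of (13) in x. The following sketch (sample, bounds, levels and tolerance are arbitrary choices of mine) does this by bisection for the mean, a quantile and an expectile, using the identification functions of Table 9; it relies on each of these V being nondecreasing in x in expectation.

import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.0, size=200_000)        # an illustrative sample from F

# Identification functions of Table 9
V_mean      = lambda x, y: x - y
V_quantile  = lambda x, y, alpha=0.9: (x >= y).astype(float) - alpha
V_expectile = lambda x, y, tau=0.9: 2.0 * np.abs((x >= y) - tau) * (x - y)

def solve(V, y, lo=0.0, hi=50.0, tol=1e-8):
    """Bisection for the x that makes the empirical mean of V(x, Y) vanish."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.mean(V(mid, y)) < 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

print(solve(V_mean, y), y.mean())                    # the mean
print(solve(V_quantile, y), np.quantile(y, 0.9))     # the 0.9-quantile
print(solve(V_expectile, y))                         # the 0.9-expectile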

3 Examples
We now give examples in the case of a univariate predictand, in which any connected domain
D = I ⊆ R is an interval. Some of the results are classical, such as the characterizations
for expectations (Savage 1971) and quantiles (Thomson 1979), and some are novel, includ-
ing those for ratios of expectations, expectiles and conditional value-at-risk. In a majority
of the examples, the technical arguments rely on the properties of convex functions and
subgradients, for which we refer to Rockafellar (1970).

3.1 Expectations
It is well known that the squared error scoring function, S(x, y) = (x − y)2 , is strictly
consistent for the mean functional relative to the class of the probability distributions on R
whose second moment is finite. Thus, means or expectations are elicitable. Before turning
to more general settings in subsequent sections, we review a classical result of Savage (1971)
which identifies the class of the scoring functions that are consistent for the mean functional
as that of the Bregman functions. Closely related results have been obtained by Reichelstein
and Osband (1984), Saerens (2000), Banerjee, Guo and Wang (2005) and Patton (2010).

Theorem 3.1 (Savage). Let F be the class of the probability measures on the interval I ⊆ R
with finite first moment. Then the following holds.

(a) The mean functional is elicitable relative to the class F .

(b) Suppose that the scoring function S satisfies assumptions (S0), (S1) and (S2) on the
PO domain D = I × I. Then S is consistent for the mean functional relative to the
class of the compactly supported probability measures on I if, and only if, it is of the
form
S(x, y) = φ(y) − φ(x) − φ′ (x)(y − x), (18)
where φ is a convex function with subgradient φ′ on I.
(c) If φ is strictly convex, the scoring function (18) is strictly consistent for the mean
functional relative to the class of the probability measures F on I for which both EF Y
and EF φ(Y ) exist and are finite.

Banerjee et al. (2005) refer to a function of the form (18) as a Bregman function. For
example, if I = R and φ(x) = |x|^a, where a > 1 to ensure strict convexity, the Bregman
representation yields the scoring function

    S_a(x, y) = |y|^a − |x|^a − a sign(x) |x|^(a−1) (y − x),                                    (19)

which is homogeneous of order a and nests the squared error that arises when a = 2. Savage
(1971) showed that up to a multiplicative constant squared error is the unique Bregman

Figure 2: The mean score (1) under the Patton scoring function (20) for Mr. Bayes (green),
the optimist (orange) and the pessimist (red) in the simulation study of Section 1.2.

function of the prediction error form, as well as the unique symmetric Bregman function.
Patton (2010) introduced a rich and flexible family of homogeneous Bregman functions on
the PO domain D = (0, ∞) × (0, ∞), namely

    S_b(x, y) = (y^b − x^b) / (b(b − 1)) − x^(b−1) (y − x) / (b − 1)    if b ∈ R \ {0, 1},
    S_b(x, y) = y/x − log(y/x) − 1                                      if b = 0,            (20)
    S_b(x, y) = y log(y/x) − y + x                                      if b = 1.
Up to a multiplicative constant, these are the only homogeneous Bregman functions on
this PO domain. The squared error scoring function emerges when b = 2 and the QLIKE
function (Patton 2010) when b = 0. If b = a > 1 the Patton function (20) coincides with the
corresponding restriction of the power function (19), up to a multiplicative constant.
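
A generic implementation of the Bregman form (18) makes the role of φ transparent. The sketch below is illustrative (distribution, seed and grid are arbitrary); it uses φ(x) = −log x, for which (18) reduces, after a short calculation, to y/x − log(y/x) − 1, the b = 0 (QLIKE) member of the Patton family (20), and it checks numerically that the expected score is minimized near the mean.

import numpy as np

def bregman_score(x, y, phi, dphi):
    """Bregman scoring function (18): S(x, y) = phi(y) - phi(x) - phi'(x)(y - x)."""
    return phi(y) - phi(x) - dphi(x) * (y - x)

phi  = lambda x: -np.log(x)      # convex on (0, infinity)
dphi = lambda x: -1.0 / x        # its derivative

rng = np.random.default_rng(0)
y = rng.gamma(2.0, 1.0, size=200_000)              # an illustrative positive predictand
grid = np.linspace(0.5, 5.0, 500)
expected = [np.mean(bregman_score(x, y, phi, dphi)) for x in grid]
print(grid[int(np.argmin(expected))], y.mean())    # minimized near the mean, per Theorem 3.1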
Finally, it is worth noting that proper scoring rules for probability forecasts of a dichotomous
event are also of the Bregman form, because the probability of a binary event equals the
expectation of the corresponding indicator variable. Compare McCarthy (1956), Savage
(1971), DeGroot and Fienberg (1983), Schervish (1989), Winkler (1996), Buja, Stuetzle and
Shen (2005) and Gneiting and Raftery (2007), among others.
Figure 2 returns to the initial simulation study of Section 1.2 and shows the mean score
(1) under the Patton scoring function (20) for Mr. Bayes, the optimist and the pessimist.

The optimal point forecast under a Bregman scoring function is the mean of the predictive
distribution, so that the statistician forecaster fuses with Mr. Bayes.

3.2 Ratios of expectations


We now consider statistical functionals which can be represented as ratios of expectations.
The mean functional emerges in the special case in which r(y) = y and s(y) = 1.

Theorem 3.2. Let I ⊆ R be an interval, and suppose that r : I → R and s : I → (0, ∞) are
measurable functions. Then the following holds.

(a) The functional


    T(F) = E_F[r(Y)] / E_F[s(Y)]                                    (21)
is elicitable relative to the class of the probability measures on I for which EF [r(Y )],
EF [s(Y )] and EF [Y s(Y )] exist and are finite.

(b) If S is of the form

S(x, y) = s(y) (φ(y) − φ(x)) − φ′ (x)(r(y) − xs(y)) + φ′ (y)(r(y) − ys(y)), (22)

where φ is a convex function with subgradient φ′ , then it is consistent for the func-
tional (21) relative to the class of the probability measures F on I for which EF [r(Y )],
EF [s(Y )], EF [r(Y )φ′ (Y )], EF [s(Y )φ(Y )] and EF [Y s(Y )φ′ (Y )] exist and are finite. If
φ is strictly convex, then S is strictly consistent.

(c) Suppose that the scoring function S satisfies assumptions (S0), (S1) and (S2) on the
PO domain D = I × I. If s is continuous and r(y) = ys(y) for y ∈ I, then S
is consistent for the functional (21) relative to the class of the compactly supported
probability measures on I if, and only if, it is of the form (22), where φ is a convex
function with subgradient φ′ .

In the case in which s(y) = w(y) and r(y) = yw(y) for a strictly positive, continuous weight
function w, the ratio (21) coincides with the functional (11). If I = (0, ∞) and w(y) = y,
the special case T(F ) = EF [Y 2 ] / EF [Y ] of equation (12) arises. In Section 2.3 we saw that
both the squared relative error scoring function, S(x, y) = (x − y)2 /x2 , and the observation-
weighted scoring function S(x, y) = y(x − y)2 are consistent for this functional. By part (c)
of Theorem 3.2, the general form of a scoring function that is consistent for the functional
(12) is
S(x, y) = y (φ(y) − φ(x)) − y (y − x) φ′ (x), (23)
where φ is convex with subgradient φ′ . The above scoring functions emerge when φ(y) = 1/y
and φ(y) = y^2, respectively.

3.3 Quantiles and expectiles
An α-quantile (0 < α < 1) of the cumulative distribution function F is any number x for
which limy↑x F (y) ≤ α ≤ F (x). In finance, quantiles are often referred to as value-at-risk
(VaR; Duffie and Pan 1997). The literature on the evaluation of quantile forecasts generally
recommends the use of the asymmetric piecewise linear scoring function,

Sα (x, y) = (1(x ≥ y) − α) (x − y), (24)

which is strictly consistent for the α-quantile relative to the class of the probability measures
with finite first moment (Raiffa and Schlaifer 1961, p. 196; Ferguson 1967, p. 51). This well-
known property lies at the heart of quantile regression (Koenker and Bassett 1978).
As regards the characterization of the scoring functions that are consistent for a quantile,
results of Thomson (1979) and Saerens (2000) can be summarized as follows. For a discussion
of their equivalence and historical comments, see Gneiting (2010).

Theorem 3.3 (Thomson, Saerens). Let F be the class of the probability measures on the
interval I ⊆ R, and let α ∈ (0, 1). Then the following holds.

(a) The α-quantile functional is elicitable relative to the class F .

(b) Suppose that the scoring function S satisfies assumptions (S0), (S1) and (S2) on the
PO domain D = I × I. Then S is consistent for the α-quantile relative to the class of
the compactly supported probability measures on I if, and only if, it is of the form

S(x, y) = (1(x ≥ y) − α) ( g(x) − g(y)), (25)

where g is a nondecreasing function on I.

(c) If g is strictly increasing, the scoring function (25) is strictly consistent for the α-
quantile relative to the class of the probability measures F on I for which EF g(Y )
exists and is finite.

Gneiting (2008b) refers to a function of the form (25) as generalized piecewise linear (GPL)
of order α ∈ (0, 1), because it is piecewise linear after applying a nondecreasing transfor-
mation. Any GPL function is equivariant with respect to the class of the nondecreasing
transformations, just as the quantile functional is equivariant under monotone mappings
(Koenker 2005, p. 39). If I = (0, ∞) and g(x) = x^b/|b| for b ∈ R \ {0}, and taking the
corresponding limit as b → 0, we obtain the family

    S_{α,b}(x, y) = (1(x ≥ y) − α) (x^b − y^b) / |b|    if b ∈ R \ {0},
    S_{α,b}(x, y) = (1(x ≥ y) − α) log(x/y)             if b = 0,                    (26)


Figure 3: The mean score (1) under the GPL power scoring function (26) with α = 1/2 for
Mr. Bayes (green), the statistician (blue), the optimist (orange) and the pessimist (red) in
the simulation study of Section 1.2.

of the GPL power scoring functions, which are homogeneous of order b. The asymmetric
piecewise linear function (24) arises when b = 1, and the MAE-LOG and MAE-SD functions
described by Patton (2009) emerge when α = 1/2, and b = 0 and b = 1/2, respectively.
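
The GPL family (25)–(26) is equally simple to code. In the sketch below (distribution, level α, seed and grid are arbitrary), both g(t) = t, which gives the asymmetric piecewise linear function (24), and g = log, which gives (26) with b = 0, lead to an expected score minimized near the α-quantile, as Theorem 3.3 asserts.

import numpy as np

def gpl_score(x, y, alpha, g):
    """Generalized piecewise linear scoring function (25)."""
    return ((x >= y).astype(float) - alpha) * (g(x) - g(y))

rng = np.random.default_rng(0)
y = rng.lognormal(0.0, 1.0, size=100_000)       # an illustrative predictive distribution
alpha = 0.9
grid = np.linspace(0.1, 15.0, 1000)

for g in (lambda t: t, np.log):                 # b = 1 and b = 0 in (26)
    expected = [np.mean(gpl_score(x, y, alpha, g)) for x in grid]
    print(grid[int(np.argmin(expected))], np.quantile(y, alpha))   # both near the 0.9-quantile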
Figure 3 returns to the simulation study in Section 1.2 and shows the mean score (1) under
the GPL power function (26), where α = 1/2, for Mr. Bayes, the statistician, the optimist and
the pessimist. Once again, Mr. Bayes dominates his competitors.
Newey and Powell (1987) introduced the τ -expectile functional (0 < τ < 1) of a probability
measure F with finite mean as the unique solution x = µ_τ to the equation

    τ ∫_x^∞ (y − x) dF(y) = (1 − τ) ∫_−∞^x (x − y) dF(y).

If the second moment of F is finite, the τ -expectile equals the Bayes rule or optimal point
forecast (6) under the asymmetric piecewise quadratic scoring function,

Sτ (x, y) = |1(x ≥ y) − τ | (x − y)2 , (27)

similarly to the α-quantile being the Bayes rule under the asymmetric piecewise linear func-
tion (24). Not surprisingly, expectiles have properties that resemble those of quantiles.

The following original result characterizes the class of the scoring functions that are consistent
for expectiles. It is interesting to observe the ways in which the corresponding class (28)
combines key characteristics of the Bregman and GPL families.
Theorem 3.4. Let F be the class of the probability measures on the interval I ⊆ R with
finite first moment, and let τ ∈ (0, 1). Then the following holds.

(a) The τ -expectile functional is elicitable relative to the class F .


(b) Suppose that the scoring function S satisfies assumptions (S0), (S1) and (S2) on the
PO domain D = I × I. Then S is consistent for the τ -expectile relative to the class of
the compactly supported probability measures on I if, and only if, it is of the form
S(x, y) = |1(x ≥ y) − τ | (φ(y) − φ(x) − φ′ (x)(y − x)) , (28)
where φ is a convex function with subgradient φ′ on I.
(c) If φ is strictly convex, the scoring function (28) is strictly consistent for the τ -expectile
relative to the class of the probability measures F on I for which both EF Y and EF φ(Y )
exist and are finite.
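
Expectiles are straightforward to compute from a sample: the defining equation is equivalent to a weighted-mean fixed point, with weight τ above the current value and 1 − τ below. The sketch below (distribution, τ, seed and grid are arbitrary choices of mine) solves this fixed point and cross-checks it against a grid minimization of the asymmetric piecewise quadratic score (27).

import numpy as np

def expectile(y, tau, n_iter=200, tol=1e-10):
    """tau-expectile of a sample: fixed point of x = sum(w * y) / sum(w), where
    w = tau for y > x and w = 1 - tau for y <= x (the defining equation rearranged)."""
    x = float(np.mean(y))
    for _ in range(n_iter):
        w = np.where(y > x, tau, 1.0 - tau)
        x_new = float(np.sum(w * y) / np.sum(w))
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

rng = np.random.default_rng(0)
y = rng.gamma(2.0, 1.0, size=200_000)
tau = 0.9
grid = np.linspace(1.0, 6.0, 2000)
expected = [np.mean(np.abs((x >= y) - tau) * (x - y) ** 2) for x in grid]   # score (27)
print(expectile(y, tau), grid[int(np.argmin(expected))])                    # should agree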

3.4 Conditional value-at-risk


The α-conditional value-at-risk functional (CVaR α , 0 < α < 1) equals the expectation of a
random variable with distribution F conditional on it taking values in its upper (1 − α)-tail
(Rockafellar and Uryasev 2000, 2002). An often convenient, equivalent definition is
    CVaR_α(F) = (1/(1 − α)) ∫_α^1 q_β(F) dβ,                                    (29)
where qβ denotes the β-quantile (Acerbi 2002), similarly to the functional representation
of the α-trimmed mean (Huber and Ronchetti 2009). The CVaR functional is a popular
risk measure in quantitative finance. Its varied, elegant and appealing properties include
coherency in the sense of Artzner et al. (1999), who consider functionals defined in terms of
random variables, rather than the corresponding probability measures.
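
For a continuous predictive distribution, (29) says that CVaR_α is the average of the quantiles above level α, or equivalently the conditional mean in the upper (1 − α)-tail, which is easy to approximate from a sample. A rough sketch (distribution, level and seed are arbitrary):

import numpy as np

def cvar(y, alpha):
    """CVaR_alpha via (29): the mean of the sample values at or above the alpha-quantile,
    a sample analogue of the average of the beta-quantiles over beta in (alpha, 1)."""
    q = np.quantile(y, alpha)
    return float(np.mean(y[y >= q]))

rng = np.random.default_rng(0)
y = rng.lognormal(0.0, 1.0, size=500_000)
print(np.quantile(y, 0.95), cvar(y, 0.95))    # VaR_0.95 versus the larger CVaR_0.95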
Theorem 3.5. The CVaRα functional is not elicitable relative to any class F of probability
distributions on the interval I ⊆ R that contains the measures with finite support, or the
finite mixtures of the absolutely continuous distributions with compact support.

This negative result challenges the use of the CVaR functional as a predictive measure of risk,
and may provide a partial explanation for the striking lack of literature on the evaluation of
CVaR forecasts, as opposed to quantile or VaR forecasts, for which we refer to Berkowitz and
O’Brien (2002), Giacomini and Komunjer (2005) and Bao, Lee and Saltoğlu (2006), among
others. With consistent scoring functions not being available, it remains unclear how one
might assess and compare CVaR forecasts.

3.5 Mode
Let F be a class of probability measures on the real line, each of which has a well-defined,
unique mode. It is sometimes stated informally that the mode is an optimal point forecast
under the zero-one scoring function,

Sc (x, y) = 1(|x − y| > c),

where c > 0. A rigorous statement is that the optimal point forecast or Bayes rule (6) under
the scoring function Sc is the midpoint

    x̂ = arg max_x ( F(x + c) − lim_{y↑x−c} F(y) )

of the modal interval of length 2c of the probability measure F ∈ F (Ferguson 1967, p. 51).
Example 7.20 of Lehmann and Casella (1998) explores this argument in more detail.
Expressed differently, the zero-one scoring function Sc is consistent for the midpoint func-
tional, which we denote by Tc . If c is sufficiently small, then Tc (F ) is well-defined and
single-valued for all F ∈ F . We can then define the mode functional on F as the limit

    T_0(F) = lim_{c↓0} T_c(F).

I do not know whether or not T0 is elicitable. However, if the members of the class F have
continuous Lebesgue densities, then T0 is asymptotically elicitable, in the sense that it can
be represented as the continuous limit of a family of elicitable functionals.
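
The midpoint functional T_c is simple to estimate from a sample: for each candidate x one counts the fraction of observations in [x − c, x + c] and maximizes. The sketch below (distribution, grid, seed and values of c are arbitrary) illustrates how the midpoint of the modal interval approaches the mode as c decreases.

import numpy as np

def modal_interval_midpoint(y, c, grid):
    """T_c(F): midpoint of the modal interval of length 2c, estimated by maximizing
    the empirical probability of the interval [x - c, x + c] over the grid."""
    y = np.sort(np.asarray(y, dtype=float))
    counts = (np.searchsorted(y, grid + c, side="right")
              - np.searchsorted(y, grid - c, side="left"))
    return float(grid[int(np.argmax(counts))])

rng = np.random.default_rng(0)
y = rng.lognormal(0.0, 0.5, size=500_000)     # unimodal, with mode exp(-0.25), about 0.78
grid = np.linspace(0.1, 3.0, 3000)
for c in (0.5, 0.2, 0.05):
    print(c, modal_interval_midpoint(y, c, grid))   # approaches the mode as c decreases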
Stronger results become available if one puts conditions on both the scoring function S and
the family F of probability distributions. Theorem 2 of Granger (1969) is a result of this
type. Consider the PO domain D = R×R. If the scoring function S is an even function of the
prediction error that attains a minimum at the origin, and each F ∈ F admits a Lebesgue
density, f , which is symmetric, continuous and unimodal, so that mean, median and mode
coincide, then S is consistent for this common functional. Theorem 1 of Granger (1969)
and Theorem 7.15 of Lehmann and Casella (1998) trade the continuity and unimodality
conditions on f for an additional assumption of convexity on the scoring function.
Henderson, Jones and Stare (2001, p. 3087) posit that in survival analysis a loss function of
the form

    S*_k(x, y) = 0 if x/k ≤ y ≤ kx and 1 otherwise,   that is,   S*_k(x, y) = 1(|log(x) − log(y)| > log(k)),

is reasonable, with a choice of k = 2 often being adequate, arguing that “most people for
example would accept that a lifetime prediction of, say, 2 months, was reasonably accurate if
death occurs between about 1 and 4 months”. From the above, the optimal point forecast or
Bayes rule under S∗k is the midpoint functional Tlog(k) applied to the predictive distribution
of the logarithm of the lifetime, rather than the lifetime itself. Henderson et al. (2001) give
various examples.

4 Multivariate predictands
While thus far we have restricted attention to point forecasts of a univariate quantity, the
general case of a multivariate predictand that takes values in a domain D ⊆ Rd is of consid-
erable interest. Applications include those of Gneiting et al. (2008) and Hering and Genton
(2010) to predictions of wind vectors, or that of Laurent, Rombouts and Violante (2009)
to forecasts of multivariate volatility, to name but a few. We turn to the decision-theoretic
setting of Section 2.1 and assume, for simplicity, that the point forecast, the observation and
the target functional take values in D = Rd .
We first discuss the mean functional. Assuming that S(x, y) ≥ 0 with equality if x = y,
Savage (1971), Osband and Reichelstein (1985) and Banerjee et al. (2005) showed that a
scoring function under which the (component-wise) expectation of the predictive distribution
is an optimal point forecast, is of the Bregman form
S(x, y) = φ(y) − φ(x) − ⟨∇φ(x), y − x⟩, (30)
where φ : Rd → R is convex with gradient ∇φ : Rd → Rd and ⟨·, ·⟩ denotes a scalar prod-
uct, subject to smoothness conditions. Expressed differently, a sufficiently smooth scoring
function is consistent for the mean functional if and only if it is of the form (30), which is
a generalization of the Bregman representation (18) in the case of a univariate predictand.
When φ(x) = ‖x‖² is the squared Euclidean norm, we obtain the squared error scoring
function, and similarly its ramifications, such as the weighted squared error and the pseudo
Mahalanobis error (Laurent et al. 2009).
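As a quick numerical check of the representation (30) (a sketch of my own; the trivariate Gaussian sample and the strictly convex choice φ(x) = Σi exp(xi), with gradient ∇φ(x) = exp(x), are merely illustrative assumptions), the average score over a sample is smallest at the componentwise sample mean:

# Illustrative sketch only: with the convex function phi(x) = sum(exp(x)), whose
# gradient is exp(x), the Bregman score (30) averaged over a sample is minimized
# at the componentwise sample mean.
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(20_000, 3)) + np.array([1.0, -0.5, 2.0])   # sample from the predictive distribution

def avg_bregman_score(x):
    # S(x, y) = phi(y) - phi(x) - <grad phi(x), y - x>, averaged over the rows of Y
    return np.mean(np.sum(np.exp(Y), axis=1) - np.sum(np.exp(x)) - (Y - x) @ np.exp(x))

mean_vec = Y.mean(axis=0)
print("average score at the sample mean:", avg_bregman_score(mean_vec))
for _ in range(3):
    x_alt = mean_vec + rng.normal(scale=0.3, size=3)
    print("average score at a perturbation :", avg_bregman_score(x_alt))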
It is of interest to note that rigorous versions of the Bregman characterization depend on
restrictive smoothness conditions. Osband and Reichelstein (1985) assume that the scoring
function is continuously differentiable with respect to its first argument, the point forecast;
Banerjee et al. (2005) assume the existence of continuous second partial derivatives with
respect to the observation. A challenging, nontrivial problem is to unify and strengthen
these results, both in univariate and multivariate settings.
Laurent et al. (2009) consider point forecasts of multivariate stochastic volatility, where the
predictand is a symmetric and positive definite matrix in Rq×q . If the matrix is vectorized, the
above results for the mean functional apply, thereby leading to the Bregman representation
(30) for the respective consistent scoring functions, which is hidden in Proposition 3 of
Laurent et al. (2009). Corollary 1 of Laurent et al. (2009) supplies a version thereof that
applies directly to point forecasts, say Σx ∈ Rq×q , of a matrix-valued, symmetric and positive
definite quantity, say Σy ∈ Rq×q , without any need to resort to vectorization. Specifically,
any scoring function of the form
S(Σx, Σy) = φ(Σy) − φ(Σx) − tr(∇′φ(Σx)(Σy − Σx)) (31)
is consistent for the (component-wise) mean functional, where φ is convex and smooth, and
∇′φ denotes a symmetric matrix of first partial derivatives, with the off-diagonal elements
multiplied by a factor of one half.

Dawid and Sebastiani (1999) and Pukelsheim (2006) give various examples of convex func-
tions φ whose domain is the cone of the symmetric and positive definite elements of Rq×q ,
with the matrix norm

φ(Σ) = ( (1/q) tr(Σ^s) )^{1/s} (32)
for s > 1 being one such instance. The matrix norm is nonnegative, nondecreasing in
the Loewner order, continuous, strictly convex, standardized and homogeneous of order
one. With simple adaptations, the construction extends to any real or extended real-valued
exponent s and to general, not necessarily positive definite symmetric matrices (Pukelsheim
2006, pp. 141 and 151). In the limit as s → 0 in (32) the log determinant φ(Σ) = log det(Σ)
emerges. When used in the Bregman representation (31), the log determinant function gives
rise to a well known homogeneous scoring function for point predictions of a positive definite
symmetric matrix-valued quantity in Rq×q, namely,

S(Σx, Σy) = tr(Σx^{−1} Σy) − log det(Σx^{−1} Σy) − q, (33)

which was introduced by James and Stein (1961, Section 5). When q = 1 the scoring function
(33) reduces to the Patton function (20) with b = 0, that is, the QLIKE function.
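For illustration (my own sketch; the Wishart-type sample below is an arbitrary choice), the following code evaluates (33), confirms that it vanishes when Σx = Σy, and shows that its average over a sample of positive definite matrices is smallest at the sample mean matrix, in line with consistency for the mean functional:

# Illustrative sketch only: the scoring function (33),
#   S(Sx, Sy) = tr(Sx^{-1} Sy) - log det(Sx^{-1} Sy) - q,
# vanishes when Sx = Sy, and its sample average is minimized at the mean matrix.
import numpy as np

def score(Sx, Sy):
    q = Sx.shape[0]
    A = np.linalg.solve(Sx, Sy)                      # Sx^{-1} Sy
    return np.trace(A) - np.log(np.linalg.det(A)) - q

rng = np.random.default_rng(0)
q, k, n = 3, 10, 2000
sample = []
for _ in range(n):                                   # Wishart-type positive definite draws
    Z = rng.normal(size=(q, k))
    sample.append(Z @ Z.T / k)
mean_mat = sum(sample) / n

avg = lambda Sx: np.mean([score(Sx, Sy) for Sy in sample])
print("S(M, M)                      :", score(mean_mat, mean_mat))   # 0 up to rounding
print("average score at sample mean :", avg(mean_mat))
print("average score, shrunk by 0.8 :", avg(0.8 * mean_mat))
print("average score, inflated 1.3x :", avg(1.3 * mean_mat))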
In the case of quantiles, the passage from the univariate functional to multivariate analogues
is much less straightforward. Notions of quantiles for multivariate distributions based on
loss or scoring functions have been studied by Abdous and Theodorescu (1992), Chaudhuri
(1996), Koltchinskii (1997), Serfling (2002) and Hallin, Paindaveine and S̆iman (2010), among
others. In particular, it is customary to define the median of a probability distribution F on
Rd as
x̂ = arg min_x EF (‖x − Y‖ − ‖Y‖),
where ‖·‖ denotes the Euclidean norm (Small 1990). If d = 1, this yields the traditional
median on the real line, with the ‖Y‖ term eliminating the need for moment conditions on
the predictive distribution (Kemperman 1987). Of course, norms and distances other than
the Euclidean could be considered. In this more general type of situation, Koenker (2006)
proposed that a functional based on minimizing the square of a distance be called a Fréchet
mean, and a functional based on minimizing a distance a Fréchet median, just as in the
traditional case of the Euclidean distance.
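For a concrete impression (my own sketch, with an arbitrary heavy-tailed bivariate sample), the spatial median can be approximated by minimizing the empirical counterpart of the criterion displayed above; the ‖Y‖ term is constant in x and does not affect the minimizer:

# Illustrative sketch only: the spatial median of a bivariate sample, obtained by
# minimizing the empirical version of E(||x - Y|| - ||Y||) over x.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
Y = rng.standard_t(df=3, size=(5000, 2)) + np.array([1.0, -2.0])   # heavy-tailed sample

def objective(x):
    return np.mean(np.linalg.norm(Y - x, axis=1) - np.linalg.norm(Y, axis=1))

res = minimize(objective, x0=np.median(Y, axis=0), method="Nelder-Mead")
print("spatial median    :", res.x)
print("componentwise mean:", Y.mean(axis=0))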

5 Discussion
Ideally, forecasts ought to be probabilistic, taking the form of predictive distributions over
future quantities and events (Dawid 1984; Diebold et al. 1998; Granger and Pesaran 2000a,
2000b; Gneiting 2008a). If point forecasts are to be issued and evaluated, it is essential that
either the scoring function be specified ex ante, or an elicitable target functional be named,
such as the mean or a quantile of the predictive distribution, and scoring functions be used
that are consistent for the target functional.

Our plea for the use of consistent scoring functions supplements and qualifies, but does not
contradict, extant recommendations in the forecasting literature, such as those of Armstrong
(2001), Jolliffe and Stephenson (2003) and Fildes and Goodwin (2007). For example, Fildes
and Goodwin (2007) propose forecasting principles for organizations, the eleventh of which
suggests that “multiple measures of forecast accuracy” be employed. I agree, with the
qualification that the scoring functions to be used be consistent for the target functional.
We have developed theory for the notions of consistency and elicitability, and have char-
acterized the classes of the loss or scoring functions that result in expectations, ratios of
expectations, quantiles or expectiles as optimal point forecasts. Some of these results are
classical, such as those for means and quantiles (Savage 1971; Thomson 1979), while others
are original, including a disconcerting negative result, in that scoring functions which are
consistent for the CVaR functional do not exist.
In the case of the mean functional, the consistent scoring functions are the Bregman functions
of the form (18). Among these, a particularly attractive choice is the Patton family (20) of
homogeneous scoring functions, which nests the squared error (SE) and QLIKE functions.
In evaluating volatility forecasts, Patton and Sheppard (2009) recommend the use of the
latter because of its superior power in Diebold and Mariano (1995) and West (1996) tests of
predictive ability, which depend on differences between mean scores of the form (1) as test
statistics. Further work in this direction is desirable, both empirically and theoretically. If
quantile forecasts are to be assessed, the consistent scoring functions are the GPL functions
of the form (25), with the homogeneous power functions in (26) being appealing examples.
Interestingly, the scoring functions that are consistent for expectiles combine key elements
of the Bregman and GPL families.
As regards the most commonly used scoring functions in academia, businesses and organi-
zations, the squared error scoring function is consistent for the mean, and the absolute error
scoring function for the median. The absolute percentage error scoring function, which is
commonly used by businesses and organizations, and occasionally in academia, is consistent
for a non-standard functional, namely, the median of order −1, med(−1) , which tends to sup-
port severe underforecasts, as compared to the mean or median. It thus seems prudent that
businesses and organizations consider the intended or unintended consequences of its use, and
reassess its suitability as a scoring function.
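A small numerical illustration (my own, using an arbitrary right-skewed lognormal example) makes the consequences tangible: the point forecasts that minimize the average squared error, absolute error and absolute percentage error over the same sample differ substantially, and the APE-optimal forecast lies well below both the mean and the median:

# Illustrative sketch only: optimal point forecasts under SE, AE and APE for a
# right-skewed lognormal sample; the APE-optimal value (the median of order -1)
# lies well below the mean and the median.
import numpy as np

rng = np.random.default_rng(7)
y = rng.lognormal(mean=0.0, sigma=1.0, size=50_000)

grid = np.linspace(0.05, 5.0, 1000)
se  = [np.mean((x - y) ** 2)        for x in grid]
ae  = [np.mean(np.abs(x - y))       for x in grid]
ape = [np.mean(np.abs((x - y) / y)) for x in grid]

print("SE -optimal forecast ≈", round(float(grid[np.argmin(se)]), 2), "(sample mean   =", round(float(y.mean()), 2), ")")
print("AE -optimal forecast ≈", round(float(grid[np.argmin(ae)]), 2), "(sample median =", round(float(np.median(y)), 2), ")")
print("APE-optimal forecast ≈", round(float(grid[np.argmin(ape)]), 2))   # ≈ exp(-1) ≈ 0.37 for this example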
Pers et al. (2009) propose a game of prediction for a fair comparison between competing
predictive models, which employs proper scoring rules. As Theorem 2.4 shows, consistent
scoring functions can be interpreted as proper scoring rules. Hence, the protocol of Pers et
al. (2009) applies directly to the evaluation of point forecasting methods. Their focus is on
the comparison of custom-built predictive models for a specific purpose, as opposed to the
M-competitions in the forecasting literature (Makridakis and Hibon 1979, 2000; Makridakis
et al. 1982, 1993), which compare the predictive performance of point forecasting methods
across multiple, unrelated time series. In this latter context, additional considerations arise,
such as the comparability of scores across time series with realizations of differing magnitude
and volatility, and commonly used evaluation methods remain controversial (Armstrong and
Collopy 1992; Fildes 1992; Ahlburg et al. 1992; Hyndman and Koehler 2006).
The notions of consistency and elicitability apply to point forecast competitions, where
participants ought to be advised ex ante about the scoring function(s) to be employed,
or, alternatively, target functional(s) ought to be named. If multiple target functionals
are named, participants can enter possibly distinct point forecasts for distinct functionals.
Similarly, if multiple scoring functions are to be used in the evaluation, and the scoring
functions are consistent for distinct functionals, participants ought to be allowed to submit
possibly distinct point forecasts.
While thus far we have addressed forecasting or prediction problems, similar issues arise
when the goal is estimation. Technically, our discussion relates to M-estimation (Huber
1964; Huber and Ronchetti 2009). A century ago Keynes (1911, p. 325) derived the Breg-
man representation (18) in characterizing the probability density functions for which the
“most probable value” is the arithmetic mean. For a contemporary perspective in terms
of maximum likelihood and M-estimation, see Klein and Grottke (2008). Komunjer (2005)
applied the GPL class (25) in conditional quantile estimation, in generalization of the tra-
ditional approach to quantile regression, which is based on the asymmetric piecewise linear
scoring function (Koenker and Bassett 1978). Similarly, Bregman functions of the origi-
nal form (18) and of the variant in (28) could be employed in generalizing symmetric and
asymmetric least squares regression.
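As a sketch of the asymmetric least squares idea (my own illustration; the exponential sample and the quadratic case of the criterion are assumptions made for concreteness), a τ-expectile of a sample, the intercept-only case of asymmetric least squares regression, can be computed by minimizing the asymmetric squared error |1(x ≥ y) − τ|(x − y)²; for τ = 1/2 this recovers the mean:

# Illustrative sketch only: tau-expectiles of a sample via minimization of the
# asymmetric squared error |1(x >= y) - tau| * (x - y)^2.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
y = rng.exponential(scale=2.0, size=100_000)

def asymmetric_se(x, tau):
    w = np.where(y <= x, 1.0 - tau, tau)        # |1(x >= y) - tau|
    return np.mean(w * (x - y) ** 2)

for tau in [0.25, 0.5, 0.9]:
    res = minimize_scalar(lambda x: asymmetric_se(x, tau), bounds=(0.0, 50.0), method="bounded")
    print(f"tau = {tau:4.2f}:  expectile ≈ {res.x:.3f}")
print("sample mean (the 0.5-expectile):", round(float(y.mean()), 3))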
In applied settings, the distinction between prediction and estimation is frequently blurred.
For example, Shipp and Cohen (2009) report on U.S. Census Bureau plans for evaluating
population estimates against the results of the 2010 Census. Five measures of accuracy are
to be used to assess the Census Bureau estimates, including the root mean squared error
(SE) and the mean absolute percentage error (APE). Our results demonstrate that Census
Bureau scientists face an impossible task in designing procedures and point estimates aimed
at minimizing both measures simultaneously, because the SE and the APE are consistent for
distinct statistical functionals. In this light, it may be desirable for administrative or political
leadership to provide a directive or target functional to Census Bureau scientists, much in
the way that Murphy and Daan (1985) and Engelberg et al. (2009) requested guidance for
point forecasters, in the quotes that open and motivate this paper.

Appendix A: Proofs
Proof of Theorem 2.3. Given F ∈ F , let t ∈ T(F ) and x ∈ D. Then
EF S(t, Y) = EF ∫ Sω(t, Y) λ(dω) = ∫ EF [Sω(t, Y)] λ(dω)
≤ ∫ EF [Sω(x, Y)] λ(dω) = EF S(x, Y),

where the interchange of the expectation and the integration is allowable, because each Sω
is a nonnegative scoring function. 

Proof of Theorem 2.4. Given any two probability measures F, G ∈ F , we have

EF S(F, Y ) = EF S(T(F ), Y ) ≤ EF S(T(G), Y ) = EF S(G, Y ),

where the expectations are well-defined, because the scoring function S is nonnegative. 

Proof of Theorem 2.6. We first show part (b). Towards this end, let tg ∈ Tg (F ) and xg ∈ D.
Then tg = g(t) for some t ∈ T(F ) and xg = g(x) for some x ∈ D. Therefore,

EF Sg (tg , Y ) = EF S(t, Y ) ≤ EF S(x, Y ) = EF Sg (xg , Y ).

As regards parts (c) and (a), it suffices to note that if S is strictly consistent, we have equality
if and only if x ∈ T(F ) or, equivalently, xg ∈ Tg (F ). 

Proof of Theorem 2.7. We first prove part (b). Let F ∈ F (w) , t ∈ T(w) (F ) and x ∈ D. Then

EF S^{(w)}(t, Y) = EF [w(Y) S(t, Y)]
= ∫ S(t, y) w(y) f(y) µ(dy)
= ( ∫ S(t, y) dF^{(w)}(y) ) ( ∫ w(y) f(y) µ(dy) )
≤ ( ∫ S(x, y) dF^{(w)}(y) ) ( ∫ w(y) f(y) µ(dy) )
= EF [w(Y) S(x, Y)]
= EF S^{(w)}(x, Y),

where µ is a dominating measure. The critical inequality holds because F (w) ∈ F (w) ⊆ F
and t(w) ∈ T(w) (F ) = T(F (w) ). To prove parts (c) and (a), we note that the inequality is
strict if S is strictly consistent for T, unless x ∈ T(F (w) ) = T(w) (F ). 

Proof of Theorem 2.8. Suppose that the functional T is elicitable relative to the class F
on the domain D. Then there exists a scoring function S which is strictly consistent for it
relative to F . Suppose now that F0 ∈ F , F1 ∈ F and t ∈ D are such that t ∈ T(F0 ) and
t ∈ T(F1 ). If x ∈ D is arbitrary and p ∈ (0, 1) is such that Fp = (1 − p)F0 + pF1 ∈ F then

EFp S(t, Y ) = (1 − p) EF0 S(t, Y ) + p EF1 S(t, Y )


≤ (1 − p) EF0 S(x, Y ) + p EF1 S(x, Y ) = EFp S(x, Y ).

Hence, t ∈ T(Fp ). 

Sketch of the proof of Theorem 3.1. The statements in parts (b) and (c) are immediate from
the arguments in Section 6.3 of Savage (1971), and form special cases of the more general
result in Theorem 3.2. To prove the necessity of the representation (18), Savage essentially
applied Osband’s principle with the identification function V(x, y) = x − y. 

Proof of Theorem 3.2. We first prove part (b). To show the sufficiency of the representation
(22), let x ∈ I and let F be a probability measure on I for which EF [r(Y )], EF [s(Y )],
EF [r(Y)φ′(Y)], EF [s(Y)φ(Y)] and EF [Y s(Y)φ′(Y)] exist and are finite. Then

EF S(x, Y) − EF S( EF [r(Y)] / EF [s(Y)], Y )
= EF [s(Y)] ( φ( EF [r(Y)] / EF [s(Y)] ) − φ(x) − φ′(x) ( EF [r(Y)] / EF [s(Y)] − x ) )

is nonnegative, and is strictly positive if φ is strictly convex and x ≠ EF [r(Y)] / EF [s(Y)].
As regards part (c), it remains to show the necessity of the representation (22). We apply
Osband’s principle with the identification function V(x, y) = xs(y) − r(y), as proposed by
Osband (1985, p. 14). Arguing in the same way as in Section 2.4, we see that

S(1) (x, a)/(xs(a) − r(a)) = S(1) (x, b)/(xs(b) − r(b))

for all pairwise distinct a, b and x ∈ I. Hence,

S(1) (x, y) = h(x)(xs(y) − r(y))

for x, y ∈ I and some function h : I → I. Partial integration yields the representation (22),
where

φ(x) = ∫_{x0}^{x} ∫_{x0}^{s} h(u) du ds (34)

for some x0 ∈ I. Finally, φ is convex, because the scoring function S is nonnegative, which
implies the validity of the subgradient inequality.
To prove part (a), we consider the scoring function (22) with φ(y) = y²/(1 + |y|), for which
the expectations in part (b) exist and are finite if, and only if, EF [r(Y )], EF [s(Y )] and
EF [Y s(Y )] exist and are finite. 

Sketch of the proof of Theorem 3.3. For concise yet full-fledged proofs of parts (b) and (c),
see Gneiting (2008b), where Osband’s principle is applied with the identification function
V(x, y) = 1(x ≥ y)−α. To prove part (a), we may apply part (c) with any strictly increasing,
bounded function g : I → I, with the logistic function g(x) = 1/(1 + exp(−x)) being one such example. 

Proof of Theorem 3.4. To show the sufficiency of the representation (28), let x ∈ I where
x < µτ , and let F be a probability measure with compact support in I. A tedious but
straightforward calculation shows that if S is of the form (28) then

EF S(x, Y) − EF S(µτ, Y)
= (1 − τ) ∫_(−∞, x) ( φ(µτ) − φ(x) − φ′(x)(µτ − x) ) dF(y)
+ τ ∫_[x, µτ) ( φ(y) − φ(x) − φ′(x)(y − x) ) dF(y)
+ τ ∫_[µτ, ∞) ( φ(µτ) − φ(x) − φ′(x)(µτ − x) ) dF(y)
+ (1 − τ) ∫_[x, µτ) ( φ(µτ) − φ(y) − φ′(x)(µτ − y) ) dF(y),

where the integrand in the last term is bounded below by φ(µτ) − φ(y) − φ′(y)(µτ − y) ≥ 0. Hence the sum
is nonnegative, and is strictly positive if φ is strictly convex. An analogous argument applies
when x > µτ. This proves sufficiency in part (b) as well as the claim in part (c).
To prove the necessity of the representation (28) in part (b), we apply Osband’s principle
with the identification function V(x, y) = |1(x ≥ y) − τ | (x − y). Arguing in the usual way,
we see that
S(1) (x, y) = h(x) V (x, y)
for x, y ∈ I and some function h : I → I. Partial integration yields the representation (28),
where φ is defined as in (34) and is convex, because S is nonnegative.
To prove part (a), we apply part (c) with the convex function φ(y) = y²/(1 + |y|), for which
EF φ(Y ) exists and is finite if, and only if, EF Y exists and is finite. 

Proof of Theorem 3.5. Suppose first that F contains the measures with finite support. Let
a, b, c, d ∈ I be such that a < b < c < (b + d)/2, which implies b < d, and consider the
probability measures

F1 = α δa + ((1 − α)/2) (δb + δd), F2 = α δc + (1 − α) δ(b+d)/2,

where δx denotes the point measure in x ∈ R. Then CVaRα(F1) = CVaRα(F2) = (b + d)/2,
while CVaRα((F1 + F2)/2) = (b + c + 2d)/4 > (b + d)/2. Thus, the level sets of the functional
are not convex. By Theorem 2.8, the CVaR functional is not elicitable relative to the class
F . An analogous example emerges when the point measures are replaced by appropriately
focused and centered absolutely continuous distributions with compact support. 
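The counterexample can be checked numerically (a sketch of my own; it assumes the superquantile form CVaRα(F) = (1 − α)^{−1} ∫_α^1 F^{−1}(u) du, which splits probability atoms at the quantile, together with the concrete values α = 1/2, a = 0, b = 1, c = 2 and d = 5, for which the displayed quantities are 3, 3 and 3.25):

# Numerical check of the counterexample above (illustrative sketch; assumes the
# superquantile form CVaR_alpha(F) = (1 - alpha)^{-1} * int_alpha^1 F^{-1}(u) du).
import numpy as np

def cvar(points, weights, alpha):
    # CVaR of the discrete distribution sum_i weights[i] * delta_{points[i]}
    order = np.argsort(points)
    pts = np.asarray(points, float)[order]
    cum = np.cumsum(np.asarray(weights, float)[order])
    total, lo = 0.0, alpha
    for p, cw in zip(pts, cum):
        hi = min(cw, 1.0)
        if hi > lo:                       # portion of (alpha, 1] covered by this atom
            total += p * (hi - lo)
            lo = hi
    return total / (1.0 - alpha)

alpha, a, b, c, d = 0.5, 0.0, 1.0, 2.0, 5.0              # a < b < c < (b + d) / 2
F1 = ([a, b, d], [alpha, (1 - alpha) / 2, (1 - alpha) / 2])
F2 = ([c, (b + d) / 2], [alpha, 1 - alpha])
mix = (F1[0] + F2[0], [w / 2 for w in F1[1] + F2[1]])

print("CVaR(F1)      =", cvar(*F1, alpha))               # (b + d) / 2 = 3.0
print("CVaR(F2)      =", cvar(*F2, alpha))               # (b + d) / 2 = 3.0
print("CVaR(mixture) =", cvar(*mix, alpha))              # (b + c + 2d) / 4 = 3.25 > 3.0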

Appendix B: Optimal point forecasts under the relative
error scoring function (Table 8)
Here we address a problem posited by Patton (2010), in that we find the optimal point
forecast or Bayes rule

x̂ = arg min_x EF S(x, Y) under S(x, y) = |(x − y)/x|, (35)

where Y = Z² and Z has a t-distribution with mean 0, variance 1 and ν > 2 degrees of
freedom. In the limiting case as ν → ∞, we take Z to be standard normal.
To find the optimal point forecast, we apply Theorem 2.2 and part (b) of Theorem 2.7 with
the original scoring function S(x, y) = |x^{−1} − y^{−1}|, the weight function w(y) = y and the
domain D = (0, ∞), so that S(w) (x, y) = |(x − y)/x|. By Theorem 3.3, the scoring function S
is consistent for the median functional. Therefore, by Theorem 2.7 the optimal point forecast
under the weighted scoring function S(w) is the median of the probability distribution whose
density is proportional to yf (y), where f is the density of Y , or equivalently, proportional
to y^{1/2} g(y^{1/2}), where g is the density of Z.
Hence, if Z has a t-distribution with mean 0, variance 1 and ν > 2 degrees of freedom,
the optimal point forecast under the relative error scoring function is the median of the
probability distribution whose density is proportional to

y^{1/2} ( 1 + y/(ν − 2) )^{−(ν+1)/2}
on the positive halfaxis. Using any computer algebra system, this median can readily be
computed symbolically or numerically, to any desired degree of accuracy. For example, if
ν = 4 the optimal point forecast (35) is

x̂ = 2/(2^{2/3} − 1) = 3.4048 . . .
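The following sketch (my own; it uses routine numerical integration rather than a computer algebra system) reproduces this value and computes the optimal point forecast for other degrees of freedom:

# Illustrative sketch only: the optimal point forecast under the relative error
# score is the median of the density proportional to
#   y^(1/2) * (1 + y / (nu - 2))^(-(nu + 1) / 2)  on (0, infinity).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def optimal_forecast(nu):
    dens = lambda y: np.sqrt(y) * (1.0 + y / (nu - 2.0)) ** (-(nu + 1.0) / 2.0)
    norm = quad(dens, 0.0, np.inf)[0]
    return brentq(lambda m: quad(dens, 0.0, m)[0] / norm - 0.5, 1e-8, 100.0)

for nu in [4, 6, 10, 1000]:
    print(f"nu = {nu:4d}:  optimal forecast ≈ {optimal_forecast(nu):.4f}")
print("2 / (2**(2/3) - 1) =", 2.0 / (2.0 ** (2.0 / 3.0) - 1.0))   # matches nu = 4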
Table 8 provides numerical values along with the approximations in Table 1 of Patton (2010),
which were obtained by Monte Carlo methods, and thus are less accurate. If Z has variance
σ², the entries in the table continue to apply if they are multiplied by this constant.

Acknowledgements
The author thanks Werner Ehm, Marc G. Genton, Peter Guttorp, Jorgen Hilden, Peter
J. Huber, Ian T. Jolliffe, Charles F. Manski, Caren Marzban, Kent H. Osband, Pierre Pin-
son, Adrian E. Raftery, Ken Rice, R. Tyrrell Rockafellar, Paul D. Sampson, J. McLean
Sloughter, Stephen Stigler, Adam Szpiro, Jon A. Wellner and Robert L. Winkler for discus-
sions, references and preprints. Financial support was provided by the Alfried Krupp von

Bohlen und Halbach Foundation, and by the National Science Foundation under Awards
ATM-0724721 and DMS-0706745 to the University of Washington. Special thanks go to
University of Washington librarians Martha Tucker and Saundra Martin for their unfailing
support of the literature survey in Table 2. Of course, the opinions expressed in this paper
as well as any errors are solely the responsibility of the author.

References
Abdous, B., and Theodorescu, R. (1992), “Note on the Spatial Quantile of a Random Vec-
tor,” Statistics & Probability Letters, 13, 333–336.
Acerbi, C. (2002), “Spectral Measures of Risk: A Coherent Representation of Subjective
Risk Aversion,” Journal of Banking and Finance, 26, 1505–1518.
Ahlburg, D. A., Chatfield, C., Taylor, S. J., Thompson, P. H., Murphy, A. H., Winkler,
R. L., Collopy, F., Armstrong, J. S. and Fildes, R. (1992), “A Commentary on Error
Measures,” International Journal of Forecasting, 8, 99–111.
Armstrong, J. S. (2001), “Evaluating Forecasting Methods,” in Principles of Forecasting,
Armstrong, J. S., ed., Kluwer, Norwell, Massachusetts, pp. 443–471.
Armstrong, J. S., and Collopy, F. (1992), “Error Measures for Generalizing About Forecast-
ing Methods: Empirical Comparisons,” International Journal of Forecasting, 8, 69–80.
Artzner, P., Delbaen, F., Eber, J.-M. and Heath, D. (1999), “Coherent Measures of Risk,”
Mathematical Finance, 9, 203–228.
Banerjee, A., Guo, X. and Wang, H. (2005), “On the Optimality of Conditional Expectation
as a Bregman Predictor,” IEEE Transactions on Information Theory, 51, 2664–2669.
Bao, Y., Lee, T.-H., and Saltoğlu, B. (2006), “Evaluating Predictive Performance of Value-
at-Risk Models in Emerging Markets: A Reality Check,” Journal of Forecasting, 25,
101–128.
Berkowitz, J., and O’Brien, J. (2002), “How Accurate are Value-at-Risk Models at Commer-
cial Banks?,” Journal of Finance, 57, 1093–1111.
Bollerslev, T. (1986), “Generalized Autoregressive Conditional Heteroscedasticity,” Journal
of Econometrics, 31, 307–327.
Buja, A., Stuetzle, W. and Shen, Y. (2005), “Loss Functions for Binary Class Probability
Estimation and Classification: Structure and Applications,” Working paper,
http://www-stat.wharton.upenn.edu/~buja/PAPERS/paper-proper-scoring.pdf.
Carbone, R., and Armstrong, J. S. (1982), “Evaluation of Extrapolative Forecasting Meth-
ods: Results of a Survey of Academicians and Practicioners,” Journal of Forecasting, 1,
215–217.
Cervera, J. L., and Muñoz, J. (1996), “Proper Scoring Rules for Fractiles,” in Bayesian
Statistics 5, Bernardo, J. M., Berger, J. O., Dawid, A. P., and Smith, A. F. M., eds.,
Oxford University Press, pp. 513–519.

33
Chaudhuri, P. (1996), “On a Geometric Notion of Quantiles for Multivariate Data,” Journal
of the American Statistical Association, 91, 862–872.
Christoffersen, P. F., and Diebold, F. X. (1996), “Further Results on Forecasting and Model
Selection Under Asymmetric Loss,” Journal of Applied Econometrics, 11, 561–571.
Dawid, A. P. (1984), “Statistical Theory: The Prequential Approach,” Journal of the Royal
Statistical Society, Ser. A, 147, 278–292.
(2007), “The Geometry of Proper Scoring Rules,” Annals of the Institute of Statistical
Mathematics, 59, 77–93.
Dawid, A. P. and Sebastiani, P. (1999), “Coherent Dispersion Criteria for Optimal Experi-
mental Design,” Annals of Statistics, 27, 65–81.
DeGroot, M. H., and Fienberg, S. E. (1983), “The Comparison and Evaluation of Probability
Forecasters,” Statistician, 12, 12–22.
Diebold, F. X., and Mariano, R. S. (1995), “Comparing Predictive Accuracy,” Journal of
Business and Economic Statistics, 13, 253–263.
Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998), “Evaluating Density Forecasts With
Applications to Financial Risk Management,” International Economic Review, 39, 863–
883.
Duffie, D., and Pan, J. (1997), “An Overview of Value at Risk,” Journal of Derivatives, 4,
7–49.
Efron, B. (1991), “Regression Percentiles Using Asymmetric Squared Error Loss,” Statistica
Sinica, 1, 93–125.
Engelberg, J., Manski, C. F., and Williams, J. (2009), “Comparing the Point Predictions and
Subjective Probability Distributions of Professional Forecasters,” Journal of Business
and Economic Statistics, 27, 30–41.
Engle, R. F. (1982), “Autoregressive Conditional Heteroscedasticity With Estimates of the
Variance of United Kingdom Inflation,” Econometrica, 45, 987–1007.
Ferguson, T. S. (1967), Mathematical Statistics: A Decision-Theoretic Approach, Academic,
New York.
Fildes, R. (1992), “The Evaluation of Extrapolative Forecasting Methods,” International
Journal of Forecasting, 8, 81–98.
Fildes, R., and Goodwin, P. (2007), “Against Your Better Judgement? How Organizations
Can Improve Their Use of Management Judgement in Forecasting,” Interfaces, 37, 570–
576.
Fildes, R., Nikolopoulos, K., Crone, S. F., and Syntetos, A. A. (2008), “Forecasting and
Operational Research: A Review,” Journal of the Operational Research Society, 59,
1150–1172.
Giacomini, R., and Komunjer, I. (2005), “Evaluation and Combination of Conditional Quan-
tile Forecasts,” Journal of Business and Economic Statistics, 23, 416–431.

34
Gneiting, T. (2008a), “Editorial: Probabilistic Forecasting,” Journal of the Royal Statistical
Society, Ser. A, 171, 319–321.
(2008b), “Quantiles as Optimal Point Forecasts,” Technical Report no. 538, University
of Washington, Department of Statistics,
http://www.stat.washington.edu/research/reports/2008/tr538.pdf.
(2010), “Quantiles as Optimal Point Forecasts,” International Journal of Forecasting,
in press.
Gneiting, T., and Raftery, A. E. (2007), “Strictly Proper Scoring Rules, Prediction, and
Estimation,” Journal of the American Statistical Association, 102, 359–378.
Gneiting, T., Stanberry, L. I., Grimit, E. P., Held, L., and Johnson, N. A. (2008), “Assess-
ing Probabilistic Forecasts of Multivariate Quantities, With Applications to Ensemble
Predictions of Surface Winds,” Test, 17, 211–264.
Granger, C. W. J. (1969), “Prediction With a Generalized Cost of Error Function”, Opera-
tional Research Quarterly, 20, 199–207.
Granger, C. W. J., and Pesaran, M. H. (2000a), “Economic and Statistical Measures of
Forecast Accuracy,” Journal of Forecasting, 19, 537–560.
(2000b), “A Decision Theoretic Approach to Forecast Evaluation,” in Statistics and
Finance: An Interface, Chan, W.-S., Li, W. K., and Tong, H., eds., Imperial College
Press, London, pp. 261–278.
Hallin, M., Paindaveine, D., and S̆iman, M. (2010), “Regression Quantiles: From L1 Opti-
mization to Halfspace Depth,” Annals of Statistics, 38, 635–703.
Henderson, R., Jones, M., and Stare, J. (2001), “Accuracy of Point Predictions in Survival
Analysis,” Statistics in Medicine, 20, 3083–3096.
Hering, A. S., and Genton, M. G. (2010), “Powering up with Space-Time Wind Forecasting,”
Journal of the American Statistical Association, in press.
Hilden, J. (2008), “Scoring Rules for Evaluation of Prognosticians and Prognostic Rules,”
Workshop notes, http://biostat.ku.dk/~jh/.
Horowitz, J. L., and Manski, C. F. (2006): “Identification and Estimation of Statistical
Functionals Using Incomplete Data,” Journal of Econometrics, 132, 445–459.
Huber, P. J. (1964), “Robust Estimation of a Location Parameter,” Annals of Mathematical
Statistics, 35, 73–101.
Huber, P. J., and Ronchetti, P. M. (2009), Robust Statistics, 2nd edition, Wiley, Hoboken,
New Jersey.
Hyndman, R. J., and Koehler, A. B. (2006), “Another Look at Measures of Forecast Accu-
racy,” International Journal of Forecasting, 22, 679–688.
James, W., and Stein, C. (1961), “Estimation With Quadratic Loss,” in Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Neyman,
J., ed., University of California Press, pp. 361–379.

35
Jolliffe, I. T. (2008), “The Impenetrable Hedge: A Note on Propriety, Equitability and
Consistency,” Meteorological Applications, 15, 25–29.
Jolliffe, I. T., and Stephenson, D. B., eds. (2003), Forecast Verification: A Practicioner’s
Guide in Atmospheric Science, Wiley, Chichester.
Jose, V. R. R., and Winkler, R. L. (2009), “Evaluating Quantile Assessments,” Operations
Research, 57, 1287–1297.
Kemperman, J. H. B. (1987), “The Median of a Finite Measure on a Banach Space,” in
Statistical Data Analysis Based on the L1 Norm and Related Methods, Dodge, Y., ed.,
North Holland, pp. 217–230.
Keynes, J. M. (1911), “The Principal Averages and the Laws of Error which Lead to Them,”
Journal of the Royal Statistical Society, 74, 322–331.
Klein, I. and Grottke, M. (2008), “On J. M. Keynes’ “The Principal Averages and the Laws
of Error which Lead to Them” – Refinement and Generalisation.” Discussion Paper,
http://www.iwqw.wiso.uni-erlangen.de/forschung/07-2008.pdf.
Koenker, R. (2005), Quantile Regression, Cambridge University Press.
(2006), “The Median is the Message: Toward the Fréchet Mean,” Journal de la Société
Française de Statistique, 147, 61–64.
Koenker, R., and Bassett, G. (1978), “Regression Quantiles,” Econometrica, 46, 33–50.
Koltchinskii, V. I. (1997), “M-Estimation, Convexity and Quantiles,” Annals of Statistics,
25, 435–477.
Komunjer, I. (2005), “Quasi Maximum-Likelihood Estimation for Conditional Quantiles,”
Journal of Econometrics, 128, 137–164.
Lambert, N. S., Pennock, D. M., and Shoham, Y. (2008), “Eliciting Properties of Probability
Distributions,” Extended abstract, Proceedings of the 9th ACM Conference on Electronic
Commerce, July 8–12, 2008, Chicago, Illinois.
Laurent, S., Rombouts, J. V. K., and Violante, F. (2009), “On Loss Functions and Ranking
Forecasting Performances of Multivariate Volatility Models”, Discussion Paper,
http://www.cirpee.org/fileadmin/documents/Cahiers_2009/CIRPEE09-48.pdf.
Lehmann, E. L. (1951), “A General Concept of Unbiasedness,” Annals of Mathematical
Statistics, 22, 587–592.
Lehmann, E., and Casella, G. (1998), Theory of Point Estimation, 2nd edition, Springer,
New York.
Makridakis, S., and Hibon, M. (1979), “Accuracy of Forecasting: An Empirical Investiga-
tion” (with discussion), Journal of the Royal Statistical Society, Ser. A, 142, 97–145.
(2000), “The M3-Competition: Results, Conclusions and Implications,” International
Journal of Forecasting, 16, 451–476.
Makridakis, S., Chatfield, C., Hibon, M., Lawrance, M., Mills, T., Ord, K., and Simmons,
L. F. (1993), “The M2-Competition: A Real-Time Judgementally Based Forecasting
Study,” International Journal of Forecasting, 9, 5–22.

36
Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton,
J., Parzen, E., and Winkler, R. (1982), “The Accuracy of Extrapolation (Time Series)
Methods: Results of a Forecasting Competition,” Journal of Forecasting, 1, 111–153.
McCarthy, J. (1956), “Measures of the Value of Information,” Proceedings of the National
Academy of Sciences, 42, 654–655.
McCarthy, T. M., Davis, D. F., Golicic, S. L., and Mentzner, J. T. (2006), “The Evolution of
Sales Forecasting Management: A 20-Year Longitudinal Study of Forecasting Practice,”
Journal of Forecasting, 25, 303–324.
Mentzner, J. T., and Kahn, K. B. (1995), “Forecasting Technique Familiarity, Satisfaction,
Usage, and Application,” Journal of Forecasting, 14, 465–476.
Moskaitis, J. R., and Hansen, J. A. (2006), “Deterministic Forecasting and Verification: A
Busted System?,” Working paper, Massachusetts Institute of Technology,
http://wind.mit.edu/~hansen/papers/MoskaitisHansenWAF2006.pdf.
Murphy, A. H., and Daan, H. (1985), “Forecast Evaluation,” in Probability, Statistics and
Decision Making in the Atmospheric Sciences, Murphy, A. H., and Katz, R. W., eds.,
Westview Press, Boulder, Colorado, pp. 379–437.
Murphy, A. H., and Winkler, R. L. (1987), “A General Framework for Forecast Verification”,
Monthly Weather Review, 115, 1330–1338.
Noorbaloochi, S., and Meeden, G. (1983), “Unbiasedness as the Dual of Being Bayes,”
Journal of the American Statistical Association, 78, 619–623.
Newey, W. K., and Powell, J. L. (1987), “Asymmetric Least Squares Estimation and Test-
ing,” Econometrica, 55, 819–847.
Offerman, T., Sonnemans, J., van de Kuilen, G., and Wakker, P. P. (2009), “A Truth-
Serum for non-Bayesians: Correcting Proper Scoring Rules for Risk Attitudes,” Review
of Economic Studies, 76, 1461–1489.
Osband, K. H. (1985), “Providing Incentives for Better Cost Forecasting,” Ph.D. Thesis,
University of California, Berkeley.
Osband, K., and Reichelstein, S. (1985), “Information-Eliciting Compensation Schemes,”
Journal of Public Economics, 27, 107–115.
Park, H., and Stefanski, L. A. (1998), “Relative-Error Prediction,” Statistics & Probability
Letters, 40, 227–236.
Patton, A. J. (2010), “Volatility Forecast Comparison Using Imperfect Volatility Proxies,”
Journal of Econometrics, in press, http://econ.duke.edu/~ap172/.
Patton, A. J., and Sheppard, K. (2009), “Evaluating Volatility and Correlation Forecasts,”
in Handbook of Financial Time Series, Anderson, T. G., Davis, R. A., Kreiss, J.-P., and
Mikosch, T., eds., Springer, pp. 801–838.
Pers, T. H., Albrechtsen, A., Holst, C., Sørensen, T. I. A., and Gerds, T. A. (2009), “The
Validation and Assessment of Machine Learning: A Game of Prediction from High-
Dimensional Data,” PLoS ONE, 4, e6287, doi:10.1371/journal.pone.0006287.

37
Phelps, R. R. (1966), Lectures on Choquet’s Theorem, D. Van Nostrand, Princeton.
Pukelsheim, F. (2006), Optimal Design of Experiments, SIAM Classics edition, SIAM, Philadel-
phia.
Raiffa, H., and Schlaifer, R. (1961), Applied Statistical Decision Theory, Colonial Press,
Clinton.
Reichelstein, S., and Osband, K. (1984), “Incentives in Government Contracts,” Journal of
Public Economics, 24, 257–270.
Rockafellar, R. T. (1970), Convex Analysis, Princeton University Press.
Rockafellar, R. T., and Uryasev, S. (2000), “Optimization of Conditional Value-at-Risk,”
Journal of Risk, 2, 21–42.
(2002), “Conditional Value-at-Risk for General Loss Distributions,” Journal of Bank-
ing and Finance, 26, 1443–1471.
Saerens, M. (2000), “Building Cost Functions Minimizing to Some Summary Statistics,”
IEEE Transactions on Neural Networks, 11, 1263–1271.
Savage, L. J. (1971), “Elicitation of Personal Probabilities and Expectations,” Journal of
the American Statistical Association, 66, 783–810.
Schervish, M. J. (1989), “A General Method for Comparing Probability Assessors,” Annals
of Statistics, 17, 1856–1879.
Serfling, R. (2002), “Quantile Functions for Multivariate Analysis: Approaches and Appli-
cations,” Statistica Neerlandica, 56, 214–232.
Shipp, S., and Cohen, S. (2009), “COPAFS Focuses on Statistical Activities,” Amstat News,
August 2009, 15–18.
Small, C. G. (1990), “A Survey of Multidimensional Medians,” International Statistical
Review, 58, 263–277.
Thomson, W. (1979), “Eliciting Production Possibilities From a Well-Informed Manager,”
Journal of Economic Theory, 20, 360–380.
Wellner, J. A. (2009), “Statistical Functionals and the Delta Method,” Lecture notes,
http://www.stat.washington.edu/people/jaw/COURSES/580s/581/LECTNOTES/ch7.pdf.
West, K. D. (1996), “Asymptotic Inference About Predictive Ability,” Econometrica, 64,
1067–1084.
Winkler, R. L. (1996), “Scoring Rules and the Evaluation of Probabilities” (with discussion),
Test, 5, 1–60.
