Notes and Correspondence Plotting Positions in Extreme Value Analysis
ABSTRACT
Plotting order-ranked data is a standard technique that is used in estimating the probability of extreme
weather events. Typically, observations, say, annual extremes of a period of N years, are ranked in order of
magnitude and plotted on probability paper. Some statistical model is then fitted to the order-ranked data
by which the return periods of specific extreme events are estimated. A key question in this method is as
follows: What is the cumulative probability P that should be associated with the sample of rank m? This
issue of the so-called plotting positions has been debated for almost a century, and a number of plotting
rules and computational methods have been proposed. Here, it is shown that in estimating the return
periods there is only one correct plotting position: P = m/(N + 1). This formula predicts much shorter
return periods of extreme events than the other commonly used methods. Thus, many estimates of the
weather-related risks should be reevaluated and the related building codes and other related regulations
updated.
FIG. 1. Example of the extreme value analysis of 50 annual extremes on Gumbel probability paper. The dots represent the probability
plotting positions from Castillo (1988), obtained with Hazen's (1914) formula P = (m - 1/2)/N. The effect of erroneous plotting positions
on extrapolation toward extreme events is illustrated by also plotting the 10 largest extremes by Eq. (3), that is, P = m/(N + 1). These
correct plotting positions are marked by crosses. Linear extrapolation using the 10 largest maxima to the wind speed of 35 m s⁻¹ results
in approximate return periods of 200 yr based on Hazen's formula and 90 yr based on Eq. (3).
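As a rough illustration of the comparison described in the caption, the following minimal Python sketch computes where the
10 largest of N = 50 ranked values fall on the Gumbel reduced-variate axis, -ln(-ln P), under Hazen's formula and under
Eq. (3). It is not the procedure used to draw Fig. 1: the function names are hypothetical, and no wind speeds are included
because the data behind Fig. 1 are not reproduced here.

```python
import math


def hazen(m, n):
    """Hazen (1914) plotting position P = (m - 1/2)/N."""
    return (m - 0.5) / n


def weibull(m, n):
    """Plotting position of Eq. (3), P = m/(N + 1)."""
    return m / (n + 1)


def gumbel_reduced_variate(p):
    """Gumbel reduced variate -ln(-ln P) for a nonexceedance probability P."""
    return -math.log(-math.log(p))


def top_positions(n=50, k=10):
    """Return (rank, Hazen variate, Eq. (3) variate) for the k largest ranks."""
    rows = []
    for m in range(n - k + 1, n + 1):
        rows.append(
            (
                m,
                gumbel_reduced_variate(hazen(m, n)),
                gumbel_reduced_variate(weibull(m, n)),
            )
        )
    return rows


if __name__ == "__main__":
    for m, y_hazen, y_eq3 in top_positions():
        print(f"m = {m:2d}   Hazen: {y_hazen:6.3f}   Eq. (3): {y_eq3:6.3f}")
```

For every one of these ranks Hazen's formula places the point higher on the probability axis than Eq. (3) does, which is
the shift between dots and crosses that the caption refers to.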
anticipated cumulative distribution function P = F(x) of the variable appears as a straight line. Typically, the
Gumbel probability paper (Gumbel 1958) is used because in many cases the distribution of the extremes, each selected
from r events, asymptotically approaches the Gumbel distribution when r goes to infinity. In modern analysis, graphs
based on the Pareto distribution and the generalized extreme value distribution are also used (e.g., Pickands 1975;
Brabson and Palutikof 2000). The transformed variable that replaces P on such plots is called the reduced variate.
Figure 1 shows an illustrative example of the extreme value analysis.
In this paper, an important problem of the extreme value analysis, namely how to assign the correct cumulative
probabilities to the ranked values, is solved. It is first shown that there exists a unique plotting formula when P, as
such, is being plotted to estimate return periods. Furthermore, it is pointed out that the so-called modified Gumbel
method, in which the plotting is made through an initial transformation to a reduced variate (e.g., Kimball 1960;
Cunnane 1978; Harris 1996), produces a probability parameter that cannot be used to estimate the return periods.

2. The history of plotting positions

Over the last 90 years, a number of plotting formulas and related computational methods for the extreme value analysis
have been proposed and supported by empirical justification. A summary of the most commonly used plotting formulas is
shown in Table 1. Reviews on this subject are available in Cunnane (1978),
336 JOURNAL OF APPLIED METEOROLOGY AND CLIMATOLOGY VOLUME 45
TABLE 1. Return period R of the largest value in a sample of 21 annual extremes as given by the commonly used plotting
methods. The error is given as the percentage in R when compared with that given by the Weibull formula. All other
formulas overestimate R, that is, underestimate the risk.

Method                   Proponent            R (yr)   Error (%)
m/(N + 1)                Weibull (1939)       22.0      0
(m - 0.31)/(N + 0.38)    Beard (1943)         31.0     41
(m - 0.44)/(N + 0.12)    Gringorten (1963)    37.7     71
(m - 0.5)/N              Hazen (1914)         42.0     91
Numerical method         Harris (1996)        37.9     72

as being "not recommended" because "it gives estimates of return period that are smaller than the other methods." The
statement by Folland and Anderson (2002) is striking in that the Weibull formula in Eq. (3) is generally used and may
be considered as an essential part of the standard Gumbel extreme value method (e.g., Gumbel 1958; Cook 1982, 1985;
Cook et al. 2003). Obviously, more convincing arguments to support a plotting formula and, indeed, a unique solution of
the problem are in demand.
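The closed-form rows of Table 1 can be checked with a few lines of arithmetic. The sketch below is a minimal
illustration, assuming only that the return period follows from the plotting position as R = 1/(1 - P) and that the
largest value has rank m = N = 21. The dictionary and function names are hypothetical, and the numerical method of
Harris (1996) is left out because it has no closed-form expression to evaluate.

```python
# Plotting formulas from Table 1, written as P(m, N).
PLOTTING_FORMULAS = {
    "Weibull (1939)": lambda m, n: m / (n + 1),
    "Beard (1943)": lambda m, n: (m - 0.31) / (n + 0.38),
    "Gringorten (1963)": lambda m, n: (m - 0.44) / (n + 0.12),
    "Hazen (1914)": lambda m, n: (m - 0.5) / n,
}


def return_period(p):
    """Return period R = 1/(1 - P) for a nonexceedance probability P."""
    return 1.0 / (1.0 - p)


def table1(n=21):
    """Return {method: (R, error in percent)} for the largest value, m = n."""
    r_ref = return_period(PLOTTING_FORMULAS["Weibull (1939)"](n, n))
    rows = {}
    for name, formula in PLOTTING_FORMULAS.items():
        r = return_period(formula(n, n))
        # Error relative to the return period given by the Weibull formula.
        rows[name] = (r, 100.0 * (r - r_ref) / r_ref)
    return rows


if __name__ == "__main__":
    for name, (r, err) in table1().items():
        print(f"{name:20s} R = {r:5.1f} yr   error = {err:3.0f} %")
```

Rounded to the precision of the table, the four computed return periods and percentage errors agree with the
corresponding rows of Table 1.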
considered as the unbiased estimate for the plotting position (e.g., Cunnane 1978; Harris 1996). Folland and Anderson
(2002), however, suggested that the median of $F(x_m)$ should be used instead. A mathematical proof is given in the
following for the mean of $F(x_m)$ as the correct estimate for the plotting position in the extreme value analysis of
return periods.
Let us define $P_e = 1 - P$ as the probability of exceedance of the mth smallest observation in the past N trials.
Then, following Castillo (1988, 13–14), the probability w of observing r exceedances in the future n trials is given by

$$ w(N,\, N - m + 1,\, n,\, r \mid P_e) = \binom{n}{r} P_e^{\,r} (1 - P_e)^{n - r}. \tag{6} $$

The mean of the binomial variable in Eq. (6) is $nP_e$. Therefore, the mean number of exceedances $\bar{r}$ is

$$ \bar{r}(N,\, N - m + 1,\, n) = \int_0^1 n P_e\, f(P_e)\, dP_e. \tag{7} $$

Taking into account the total probability rule, and that the mean of the mth-order statistics $F(x_m)$ is given by
Eq. (5), the mean number of exceedances in Eq. (7) becomes

$$ \bar{r} = \frac{n(N - m + 1)}{N + 1}. \tag{8} $$

Let us now return to the exact definition of the return period R. Let A be an event and T be the random time between
consecutive occurrences of A events. Then, the mean value of the random variable T is called the return period R of the
event A. It follows from this definition that the mean number of events A in an observation period that is equal to T
is 1. An event A is here defined so that a random observation, say an annual extreme value, exceeds a value of x. Then,
for the mean number of exceedances $\bar{r}$,

$$ \bar{r} = 1 \quad \text{when } n = R. \tag{9} $$

Combining Eqs. (8) and (9) gives

$$ R = \frac{N + 1}{N - m + 1}. \tag{10} $$

In terms of the probability of exceedance $P_e = 1 - P$, the return period in Eq. (1) becomes $R = 1/P_e$, and one gets
from Eq. (10)

$$ P_e = \frac{N - m + 1}{N + 1}. \tag{11} $$

The notations of Folland and Anderson (2002), where the ranking is done in ascending order and the plotting position is
defined as the probability of nonexceedance P, have been followed above. Solving $P = 1 - P_e$ from Eq. (11) yields

$$ P = \frac{m}{N + 1}, \tag{12} $$

which is Eq. (3). Equation (12) also results if, instead, the order ranking is done in descending order and the
probability of exceedance is considered, the combination of which is a common practice in many applications. It is
noteworthy that the result derived above is independent of the underlying distribution f(x).
In summary, it was shown above that when estimating the return period R, the correct plotting position is obtained by
the mean of $F(x_m)$, that is, by using Eq. (3). Hence, in the analysis of the return period the other suggested
plotting formulas, such as Eq. (2), are incorrect.

4. Plotting positions involving a reduced variate

Most of the plotting formulas suggested historically are, however, not intended to be used for plotting the cumulative
probability or the related return period R on arithmetic paper. Instead, they are used when plotting on paper where the
probability scale is transformed in order to obtain a linear fit that is convenient to extrapolate. For example, on a
Gumbel plot (Fig. 1), the probability scale is transformed into the reduced variate
$\eta = -\ln(-\ln P) = -\ln[-\ln(1 - 1/R)]$.
In the classical Gumbel analysis, Eq. (3) is used so that the probability P is approximated by its mean. A nonlinear
transformation is then applied to that mean. Kimball (1960), Gringorten (1963), Cunnane (1978), and Harris (1996) have
argued that when the plotting involves a reduced variate, a more correct procedure would be to apply the transformation
first and then plot the mean value $E(\eta_m)$ of the reduced variate $\eta_m$ defined that way. This results in a
plotting formula of the type

$$ P = \varphi^{-1}[E(\eta_m)], \tag{13} $$

where $\varphi^{-1}$ is the inverse function of the transformation that gives $\eta$. The plotting positions based on
Eq. (13), in contrast to those based on Eq. (12), depend on the transformation and, hence, on the postulated parent
probability distribution function f(x). The various distribution-tailored plotting formulas and methods presented in
the literature reflect this situation, that is, it is believed that the plotting positions for estimating the return
periods depend on the underlying distribution when a reduced variate is involved in the analysis.
It was shown above in section 3 that Eq. (12) associates the cumulative probability P to the mth rank in N samples.
This fundamental relationship can be expressed in terms of the return period R as Eq. (10). Suppose that we have N
years of observation of, say, annual extremes; then in the analysis of these N maxima Eq. (10) provides a unique
relationship. Let us denote this relationship by g, so that

$$ R = g(m). \tag{14} $$

As shown in section 3 by deriving Eq. (10), the relationship g is independent of the underlying distribution. Thus,
Eq. (14) underlines that there exists a fundamental connection between the rank of an observation and the estimate of
its return period. This relationship is presented in quantitative form in Eq. (10). To estimate the return period, we
may plot the N maxima on arithmetic paper using R as the ordinate by applying R = g(m) and try fitting some curve to
the points thus plotted. As discussed above, the alternative and more commonly used method is to transform the scale on
the ordinate axis so that the points plotted would better fall on a straight line.
Clearly, the fundamental distribution-free relationship g that associates the return period R with a rank m cannot be
affected by the fitting method. In other words, the plotting positions given by Eq. (10) must not be manipulated based
on an arbitrary choice of the scale on the ordinate axis of the graph that is devised to merely alleviate the analysis
of the data. Hence, in estimating R, any deviation from the use of Eq. (10) by applying a plotting formula other than
Eq. (12), based on some presumed statistical model, is misuse of the data. The fitting procedure may reflect the scale
used, but the probability positions of the data must be the same regardless of the method of fitting. In other words,
the plotting formula P = m/(N + 1) is valid regardless of the transformation made.
It is, therefore, concluded that the approach leading to the distribution-specific plotting formulas through Eq. (13)
is both unnecessary and incorrect when analyzing return periods. Because this concept has been persistent in the
literature for many decades, it is of interest to discuss in detail the origins and nature of the errors involved.
First, confusion has been caused by the temptation to obtain a good linear fit for easy extrapolation on probability
paper. This has misled many researchers to manipulate the plotting positions to that end. The comments by Blom (1958,
68–75) that "a condition to be satisfied by any plotting formula is that the points must lie on the average on a line
which deviates only little from a straight line," and by Castillo (1988) that "the plotting position formulas can
affect the linear trend of the cumulative probability distribution so that a careful selection must be made,"
illustrate this confusion. Manipulation of the plotting positions in order to obtain a linear fit can be identified as
a failure to properly separate the two different procedures required in the data analysis; one must first determine
the probability positions, which are independent of the distribution, and only then make transformations hoping to
obtain linearity in relation to some model distribution and a good fit to the plotted data. In other words, one should
not fit the observations to a model, but fit a model to the observations.
Second, the argument given to justify Eq. (13), "It is not the probability ordinate that is plotted but the reduced
variate" (Harris 1996), is misleading. This is so because transforming to the reduced variate is merely a method to
manipulate the probability scale of the graph, so that a parameter that appears linear on the ordinate is obtained. In
the classical Gumbel analysis it is the probability P that is being plotted, but now on another scale. The
transformation then associates $E[F(x_m)]$ to $\eta_m$. Kimball (1960), Gringorten (1963), Cunnane (1978), and Harris
(1996, 1999, 2000), on the other hand, plot the reduced variate by making the transformation before plotting, that is,
by associating $E(\eta_m)$ to m. However, it was shown in section 3 that the foundation for the use of the mean
$E[F(x_m)]$ is merely that the return period R is defined as the mean time period T between events that exceed
$F(x_m)$. There exists no justification for the use of the mean of the cumulative probability function of the
mth-ranked value as the plotting position when the variable is something else than $F(x_m)$. To be useful at all in
estimating R, that parameter must be a result of an operator that retains the fundamental relationship in Eq. (14).
Hence, it must be such that its application to the distribution of $\eta_m$ rescales to the mean of $F(x_m)$, that is,
to P. This redirects the plotting to the use of P and Eq. (3) in the first place.
Third, in the approach of plotting the reduced variate the transformation is from $F(x_m)$ to $E(\eta_m)$. This is
different from that of the classical Gumbel analysis, in which the transformation is from $E[F(x_m)]$ to $\eta_m$,
because the result of taking a mean and making a nonlinear transformation depends on the order in which these
operations are applied. Consequently, the linearity, shown by Gumbel (1958) to exist as a result of plotting
$E[F(x_m)]$ by Eq. (3), is lost when $E(\eta_m)$ is being plotted. This linearity can be returned if one knows the
underlying distribution, as shown empirically by Cunnane (1978) and theoretically by Harris (1996). However, this can
only be done by manipulating the plotting positions, that is, by violating Eq. (12). Such manipulated
FEBRUARY 2006 NOTES AND CORRESPONDENCE 339
plotting positions no longer correspond to the probability P that is required to estimate the return period. The error
thus made can be described in mathematical terms as follows. By an axiom of probability calculus, a sample probability
P is additive. Consequently, its nonlinear transformation $\eta$ is nonadditive. The best estimate of a sample
parameter is its mean value only if that parameter is additive. Hence, keeping in mind that P is being estimated, the
transformation must be made in such a way that the mean is taken over P, not over $\eta$, that is, the transformation
to a reduced variate must not be made before taking the mean.
In summary, in order to use Eq. (1) in the case of order-ranked data, the cumulative probability P in it must be
defined as the mean of $F(x_m)$ in an infinite ensemble of ranked observations, each including N samples. The variable
$\varphi^{-1}[E(\eta_m)]$ in Eq. (13), which is a retransformation of the mean of the nonlinearly transformed $F(x_m)$
values, does not meet this definition. Thus, estimation of return periods based on order-ranked data is not possible by
interpreting the reduced variate as being transformed before plotting. Consequently, the concept of
distribution-specific plotting formulas in analyzing return periods should be abandoned. This causes no problems to the
analysis, however, because the Weibull plotting formula P = m/(N + 1) is to be used regardless of the underlying
distribution.

5. Discussion and conclusions

It was shown above in section 3 that the Weibull plotting formula P = m/(N + 1) directly follows from the definition of
the return period R. Thus, proof was given for Eq. (3) as the correct plotting formula when the return periods are
being analyzed by the extreme value method. The proof is valid for any underlying continuous distribution f(x).
It was further pointed out in section 4 that, because P = m/(N + 1) associates the mth-ranked value of x with the
cumulative probability and the related return period R in a fundamental way, this relationship holds regardless of the
transformations made in the extreme value analysis. Consequently, the various other methods for determining the
plotting positions, suggested during the last 90 years, such as the formulas by Blom, Jenkinson, and Gringorten, the
computational methods by Yu and Huang (2001), as well as the modified Gumbel method, are incorrect when applied to
estimating return periods.
As can be seen in Fig. 1 and in Folland and Anderson (2002), the errors resulting from the use of such incorrect
methods are very large. The error resulting from the use of Hazen's formula [used, for example, by Castillo (1988)] can
be estimated from Fig. 1. For the wind speed of 35 m s⁻¹ Hazen's formula predicts R of approximately 200 yr instead of
the 90 yr predicted by the correct plotting formula, that is, Eq. (3). Jenkinson's formula, supported by Folland and
Anderson (2002), predicts R for 35 m s⁻¹ to be about 130 yr in the case of Fig. 1.
An implication of the overall error that results at high extremes is obtained by simply considering the return period R
that is predicted through Eq. (1) by the different plotting formulas for the largest extreme in the sample, that is,
when m = N. In Table 1 such a comparison is shown for a sample of 21 annual maxima [this period is chosen because for
that the numerical result by Harris (1996) is available]. Table 1 shows that an error of more than 70% in the return
period of the largest observed extreme is obtained by using both the Gringorten formula, which has been used in the
analysis, particularly when utilizing the generalized Pareto distribution (Hosking et al. 1985; Hosking and Wallis
1995; Linacre 1992; Brabson and Palutikof 2000), and the modified Gumbel analysis method (Harris 1996, 1999, 2000).
From the point of view of estimating the risks of extreme weather phenomena in the present and future climates these
errors are very serious because overestimating the return period equals underestimating the risk. Because the present
estimates of many important weather-related risks are partly based on the conventional methods that have been shown
here to be invalid, comprehensive reanalysis of them is suggested. This may make it worthwhile to reevaluate the
related building codes and regulations.

Acknowledgments. This work was supported by the Ministry of Environment, Finland. Thanks are given to Dr. Matti Pajari
for fruitful discussions.

REFERENCES
Beard, L. R., 1943: Statistical analysis in hydrology. Trans. Amer. Soc. Civ. Eng., 108, 1110–1160.
Benson, M. A., 1962: Plotting positions and economics of engineering planning. Proc. Amer. Soc. Civ. Eng. Hydraul. Div., 88 (HY6), 57–71.
Blom, G., 1958: Statistical Estimates and Transformed Beta-Variables. John Wiley and Sons, 146 pp.
Brabson, B. B., and J. P. Palutikof, 2000: Tests of the generalized Pareto distribution for predicting extreme wind speeds. J. Appl. Meteor., 39, 1627–1640.
Castillo, E., 1988: Extreme Value Theory in Engineering. Academic Press, 389 pp.
Cook, N. J., 1982: Towards better estimation of extreme winds. J. Wind Eng. Ind. Aerodyn., 9, 295–323.
Cook, N. J., 1985: The Designer's Guide to Wind Loading on Building Structures. Part I: Background, Damage Survey, Wind Data, and Structural Classification. Butterworths, 371 pp.
Cook, N. J., R. I. Harris, and R. Whiting, 2003: Extreme wind speeds in mixed climates revisited. J. Wind Eng. Ind. Aerodyn., 91, 403–422.
Cunnane, C., 1978: Unbiased plotting positions - A review. J. Hydrol., 37, 205–222.
Folland, C., and C. Anderson, 2002: Estimating changing extremes using empirical ranking methods. J. Climate, 15, 2954–2960.
Gringorten, I. I., 1963: A plotting rule for extreme probability paper. J. Geophys. Res., 68, 813–814.
Gumbel, E. J., 1958: Statistics of Extremes. Columbia University Press, 375 pp.
Harris, R. I., 1996: Gumbel re-visited - A new look at extreme value statistics applied to wind speeds. J. Wind Eng. Ind. Aerodyn., 59, 1–22.
Harris, R. I., 1999: Improvements to the "method of independent storms." J. Wind Eng. Ind. Aerodyn., 80, 1–30.
Harris, R. I., 2000: Control curves for extreme value methods. J. Wind Eng. Ind. Aerodyn., 88, 119–131.
Hazen, A., 1914: Storage to be provided in impounding reservoirs for municipal water supply. Trans. Amer. Soc. Civ. Eng. Pap., 1308 (77), 1547–1550.
Hosking, J. R., and J. R. Wallis, 1995: A comparison of unbiased and plotting-position estimators of L moments. Water Resour. Res., 31, 2019–2025.
Hosking, J. R., J. R. Wallis, and E. F. Wood, 1985: Estimation of the generalized extreme-value distribution by the method of probability weighted moments. Technometrics, 27, 251–261.
Jordaan, I., 2005: Decisions under Uncertainty. Cambridge University Press, 672 pp.
Kharin, V. V., and F. W. Zwiers, 2005: Estimating extremes in transient climate change simulations. J. Climate, 18, 1156–1173.
Kimball, B. F., 1960: On the choice of plotting positions on probability paper. J. Amer. Stat. Assoc., 55, 546–560.
Langbein, W. B., 1960: Plotting positions in frequency analysis. U.S. Geol. Surv. Water Supply Pap., 1543-A, 48–51.
Linacre, E., 1992: Climate Data and Resources. Routledge Press, 366 pp.
Meehl, G. A., F. W. Zwiers, J. Evans, T. Knutson, L. Mearns, and P. Whetton, 2000: Trends in extreme weather and climate events: Issues related to modeling extremes in projection of future climate change. Bull. Amer. Meteor. Soc., 81, 427–436.
Pickands, J., 1975: Statistical inference using extreme order statistics. Ann. Stat., 3, 119–130.
Weibull, W., 1939: A statistical theory of strength of materials. Ing. Vetensk. Akad. Handl., 151, 1–45.
Yu, G. H., and C. C. Huang, 2001: A distribution free plotting position. Stoch. Environ. Res. Risk Assess., 15, 462–476.
Zhang, X., F. W. Zwiers, and G. Li, 2004: Monte Carlo experiments on the detection of trends in extreme values. J. Climate, 17, 1945–1952.