Biondi DENDROCLIM2002
Abstract
Tree-ring chronologies are often calibrated against instrumental climate records using correlation and response
functions. DENDROCLIM2002 uses bootstrapped confidence intervals to estimate the significance of both correlation
and response function coefficients. Input and output file selection, as well as analytical options, are chosen from a
user-friendly GUI. Final results are saved in ASCII format, and are plotted on screen using color-coded symbols.
DENDROCLIM2002 is an extension of existing task-specific software, which is mostly MS-DOS based, and of
available user-supplied code for statistical packages, such as SAS. In addition, DENDROCLIM2002 incorporates the
ability to test for temporal changes of dendroclimatic relationships by means of evolutionary and moving intervals. This
simple approach allows for a complete, dynamical representation of statistical relationships between climate and tree
growth. An example using published dendroclimatic data is used to illustrate the analytical and graphical capabilities of
the software.
© 2004 Elsevier Ltd. All rights reserved.
0098-3004/$ - see front matter
doi:10.1016/j.cageo.2003.11.004
ARTICLE IN PRESS
304 F. Biondi, K. Waikul / Computers & Geosciences 30 (2004) 303–311
regression parameters for the original predictors. In principal component regression, it is standard practice
to discard the principal components with the smallest variances (eigenvalues). This ensures that the new design
matrix is farther from being singular, thereby reducing the multicollinearity problem. In DENDROCLIM2002
principal components are selected according to the PVP criterion (Guiot, 1990; Fig. 3). The model then becomes

$Y = Z_m k_m + e,$

where $Z_m$ is the $n \times m$ matrix obtained after discarding $(q - m)$ principal components, and $e$
incorporates both random disturbances and the discarded components. Linear least squares is used to estimate
the $m \times 1$ vector $k_m$, and an estimate of $b$ can be obtained after setting the last $(q - m)$ elements
of $k$ equal to zero. DENDROCLIM2002 uses 1000 bootstrapped samples to compute response and correlation
coefficients, and to test their significance at the 0.05 level. Bootstrap samples are drawn at random with
replacement from the calibration interval. Median correlation and response coefficients are deemed significant
if they exceed, in absolute value, half the difference between the 97.5th quantile and the 2.5th quantile of
the 1000 estimates (Dixon, 2001).

Fig. 3. Detailed list of computational steps in (A) single interval analysis, and (B) multiple interval analysis.

3. Algorithmic complexity

In this section we discuss the algorithmic complexity of the software considering only the core computation
and neglecting the user interface construction. We consider cases that lead to the maximum number of
iterations, e.g. when single interval analysis uses the
entire common interval or when multiple interval analysis uses the minimum base length. We employ
'Big-Oh' (O) notation, which gives the asymptotic upper bound on execution time but is not necessarily related to
running time for every input combination (Cormen et al., 2001). Mathematically it is represented as

$O(g(n)) = \{ f(n) : 0 \le f(n) \le c\, g(n) \;\; \forall n \ge n_0 \},$

where $c$ and $n_0$ are positive constants (Fig. 4).

Fig. 4. (A) Plot of mathematical expression of "Big-Oh" notation. (B) Example of C code and of its execution
time in "Big-Oh" notation.

In single interval analysis, the first step of reading data files requires $O(4n)$, where $n$ is the number of
years (rows) in each input file, for a maximum of 4 files. Therefore, execution time for this step is directly
proportional to the size of the data sets. Computing the boundaries of the common interval involves $O(1)$.
The following step of reading the input data into the array that will be used for bootstrapping samples
requires $O(nq)$, where $n$ is the number of years (rows) and $q$ is the number of predictors (columns). The
initialization process for result storage matrices takes up to $O(2q^2)$, where the factor 2 refers to the
computation of both correlation and response functions. The generation of each one of the 1000 bootstrap
samples requires $O(nq + n)$, with the additional $n$ term due to the tree-ring index data vector (predictand).
Both predictand and predictors are then standardized, which involves $O(cnq)$ computations, $c$ being a small
constant. Principal component regression requires initialization of new matrices, so that $O(Cnq)$ operations
will be performed, $C$ being a large constant. The first step is to compute $X'X$, where $X$ is the predictor
matrix; this involves $O(nq^2)$. Computing eigenvalues requires $O(nq)$ to reduce the diagonal elements of the
symmetric matrix to 1, and $O(Cq^3)$ to apply Jacobian estimation (Press et al., 1997, 2002). Eigenvectors are
sorted in descending order according to their eigenvalues, a process which takes another $O(q^2)$. Applying the
PVP criterion and multiplying the original matrix $X$ by the
selected principal components involves a total runtime of $O(nq^2)$. The solution vector is obtained by Singular
Value Decomposition (SVD; Press et al., 1997, 2002), which takes $O(Cq^4)$ steps, with $C$ being a very large
constant (in the order of $10^4$). Obtaining correlation coefficients requires $O(q^2)$ computations.
Significance tests of median coefficients based on the 0.975 and 0.025 quantiles are done in $O(q^2)$. In
summary, most of the running time is dedicated to the calculation of response functions, especially because of
the SVD included there.

In multiple interval analysis, response and correlation coefficients are computed a number of times. Hence the
complexity of this analysis is approximately $O(nq^4)$, where $n$ is the number of years in the common interval.
This can be approximated as $O(q^5)$, since the value of $n$ is comparable to $q$ in most cases. Such
computational intensity gives rise to an important user interface issue. Execution time was often 20–25 min or
longer on Intel Celeron and Intel Pentium III processors with 256 MB of RAM. In addition, the computer became
practically unusable while the application was running. Therefore, a worker thread was created to handle all
CPU-intensive computations. This allows the operating system to respond to user requests while the program
executes as a background process.

4. Comparison with other programs

[...] programs. Data used by Fekedulegn et al. (2002) were retrieved from the World Wide Web and re-analyzed
using SAS, RESPO, PRECON, and DENDROCLIM2002. However, we found that RESPO v.6.06P incorrectly read the input
data sets, hence no reliable results could be obtained. That program has since been removed from distribution
until further notice (Richard Holmes, Tucson, AZ, pers. comm.). Sample linear correlations were calculated
using SAS, and their t-values, computed according to Press et al. (2002), were used to plot the correlation
function (Fig. 5). No monthly climatic correlation stands out; coefficients for current July precipitation
(positive) and February temperature (negative) are barely significant. In addition, current February (negative)
and May (positive) precipitation are almost significant, together with previous-year August temperature
(negative).

Fig. 5. Plot of correlation function for example data set. Sample linear correlation coefficients computed using
SAS were transformed into t-values to facilitate graphical representation of their 95% confidence limits.

Fig. 6. Plot of response function generated by PRECON for example data set. A total of 1000 bootstrap samples
were used to compute mean coefficients and their 2 standard deviation intervals.

In their analysis of the same data set, Fekedulegn et al. (2002) used response functions whose significance was
tested without considering the issues raised by Cropper (1985). Because of that, a large number of predictors
were deemed significant: 9 precipitation variables (all positive), and 7 temperature variables (4 positive and 3
negative). Of those, 5 predictors were from the previous year, even though autocorrelation had been removed
from the tree-ring indices. Climate–tree growth relationships are clarified using bootstrapped response
functions, as shown by the PRECON output (Fig. 6). Winter signals no longer appear, and summer
moisture stress dominates the response, with a significant positive coefficient for current July precipitation.
Current August precipitation (positive) is close to significant, and previous-year August temperature (negative)
is barely significant.

In DENDROCLIM2002 significant coefficients are plotted with color-coded symbols (Fig. 7). Although output files
can be used to display correlation and response functions in the style of Fig. 6, which was originally used by
Fritts et al. (1971), we found that identifying the main climate signals 'at a glance' was facilitated by making
all non-significant coefficients equal to zero. This is especially the case when showing results of
multiple-interval analysis (Biondi, 1997, 2000). In the particular example used here, summer precipitation
(mainly July) has a positive relationship with tree growth. Minor differences between PRECON and DENDROCLIM2002
results are related to computational procedures, such as the way in which final coefficients are estimated (mean
in PRECON, median in DENDROCLIM2002) and tested for significance (2 standard deviation interval in PRECON,
difference between the 97.5 and the 2.5 percentile in DENDROCLIM2002).

Analysis of temporal stability requires a large enough number of intervals and of degrees of freedom within
each interval (Fig. 3). In the example data set, the number of predictors (34 variables total, 17 for
temperature and 17 for precipitation) is too large when compared to the overlap between tree-ring and climate
records (62 yr total, from 1935 to 1996). However, considering that previous-year coefficients are not
significant, it would be possible to reduce the number of predictors by studying current-year relationships
alone. As an example, moving interval correlation and response functions are shown in Fig. 8 for January
through October precipitation and temperature, using a base length (or minimum number of years) equal to 45.
The positive relationship with June–July precipitation is the main climate signal in the tree-ring chronology,
although it is not particularly stable over time. Backward and forward evolutionary interval analyses (not
shown) also point to growing season precipitation as the dominant relationship, with a tendency to become more
significant in recent decades.

5. Conclusions

The temporal stability of climate-proxy connections is an extremely important issue in any type of
paleoclimatic reconstruction. In dendroclimatology the investigator has the opportunity to actually investigate
such an issue because of the exact calendar dates assigned to the proxy records. Considering that software for
dendroclimatic analysis is in dire need of updating, DENDROCLIM2002 is expected to facilitate the identification
of climatic signals, and their potential changes over time, embedded in tree-ring records.
Fig. 7. Correlation and response functions computed using DENDROCLIM2002 for example data set.
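The significance rule behind the coefficients plotted in Fig. 7 (median of the bootstrap estimates, compared in absolute value with half the difference between the 97.5th and 2.5th quantiles) can be illustrated with a minimal Python sketch. It is not the DENDROCLIM2002 source code. It covers only a single correlation coefficient, not the principal component regression used for response functions, and it assumes that years are resampled as (growth, climate) pairs. The function names, the fixed random seed, the zero-variance guard, and the quantile interpolation method are all assumptions made for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Assumption: a degenerate resample (no variance) counts as r = 0.
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def bootstrap_correlation(growth, climate, n_boot=1000, seed=0):
    """Return (median, q2.5, q97.5, significant) for one climate variable."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(growth)
    estimates = []
    for _ in range(n_boot):
        # Draw n years at random with replacement from the calibration interval.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson([growth[i] for i in idx],
                                 [climate[i] for i in idx]))
    median = statistics.median(estimates)
    # n=40 gives cut points every 2.5%: first is 2.5th, last is 97.5th percentile.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    q_low, q_high = cuts[0], cuts[-1]
    # Rule from the text: |median| must exceed half the quantile range.
    significant = abs(median) > (q_high - q_low) / 2
    return median, q_low, q_high, significant
```

In a plot in the style of Fig. 7, a coefficient for which `significant` is false would be set to zero before display.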
Fig. 8. Moving interval correlation and response functions computed using DENDROCLIM2002 for example data set. A base length
of 45 years was progressively slid through the total number of available years (without missing values), i.e. from 1935 to 1995.
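The sliding of a fixed base length through the available years, as described in the caption of Fig. 8, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the program's implementation: it assumes consecutive years without gaps, computes a plain sample correlation in each window (the bootstrap test and the response function are omitted), and uses hypothetical function names.

```python
import statistics


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def moving_windows(first_year, last_year, base_length):
    """Inclusive (start, end) years of every window of base_length years."""
    n_years = last_year - first_year + 1
    if base_length > n_years:
        raise ValueError("base length exceeds the number of available years")
    return [(start, start + base_length - 1)
            for start in range(first_year, last_year - base_length + 2)]


def moving_correlation(years, growth, climate, base_length):
    """Return (last year of window, correlation) for each moving interval.

    Assumes `years` is a list of consecutive calendar years aligned with
    `growth` and `climate`.
    """
    results = []
    for start, end in moving_windows(years[0], years[-1], base_length):
        lo = start - years[0]   # index of the window's first year
        hi = lo + base_length   # one past the index of its last year
        results.append((end, pearson(growth[lo:hi], climate[lo:hi])))
    return results
```

With the values given in the caption (1935 to 1995, base length 45), `moving_windows(1935, 1995, 45)` yields 17 intervals, the first covering 1935–1979 and the last 1951–1995.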