Normal Score Variogram GTSIM
Abstract
Application of the truncated Gaussian method for categorical variable simulation requires the calculation of an
appropriate normal scores variogram for generating the Gaussian random field. In the case of only two categories,
the appropriate variogram can be determined by inverting the indicator variogram model from the bivariate
Gaussian distribution. Even though no closed-form relation exists for such inversion, the proper normal scores
variogram can be obtained through numerical integration via a series approximation. The procedure is illustrated
with a small simulation example demonstrating the close reproduction of the variogram of the categorical data.
© 1999 Elsevier Science Ltd. All rights reserved.
PII: S0098-3004(98)00124-1
P.C. Kyriakidis et al. / Computers & Geosciences 25 (1999) 161–169
where $K$ threshold values $\{z_k, k = 1, \ldots, K\}$ discretize the variability of the $Z$ attribute.

Various algorithms exist for generating a number $L$ of realizations $\{j^{(l)}(u), u \in A\}$, $l = 1, \ldots, L$, of the indicator random function (RF) $\{J(u), u \in A\}$; the most common and straightforward is sequential indicator simulation (Journel and Alabert, 1988). Another alternative is to truncate a realization $\{y^{(l)}(u), u \in A\}$ of a standard normal RF $\{Y(u), u \in A\}$ to create an indicator realization $\{j^{(l)}(u), u \in A\}$ (Journel and Isaaks, 1984; Matheron et al., 1987):

$$
j^{(l)}(u) = \begin{cases} 1, & \text{if } y^{(l)}(u) > y_0 \\ 0, & \text{otherwise} \end{cases}
$$

with $y_0$ being some threshold value. The only data available are indicator data, yet the covariance model $C_Y(h)$ of the Gaussian RF $\{Y(u), u \in A\}$ to be truncated is needed to generate the realization $\{y^{(l)}(u), u \in A\}$.

One could use for $C_Y(h)$ the indicator covariance model $C_J(h)$ (standardized to unit sill) inferred from the categorical data. This practice leads to a mismatch between the target indicator variogram model $C_J(h)$ and the variogram of the resulting simulated indicator values $\{j^{(l)}(u), u \in A\}$. The mismatch grows for indicator variograms of lithofacies with small proportions, that is, when the threshold value $y_0$ deviates significantly from zero, the median of $Y$. In addition, Matheron (1989) showed that it is not consistent to use the same variogram type (e.g. spherical) for the Gaussian RF $\{Y(u), u \in A\}$ as for the target indicator variogram $C_J(h)$ inferred from the sample indicator data.

Alternatively, the original categorical data are transformed into continuous pseudo normal scores data, as in Xu and Journel (1993), and $C_Y(h)$ is replaced by the covariance inferred from such data. The resulting pseudo normal score values depend critically on the procedure used to despike the original indicator data, i.e. to break the ties between binary values. Random despiking leads to too high a nugget effect, while the parameters of a despiking algorithm that ranks the categorical data according to local areal proportions are difficult to establish.

In the case of a single threshold $y_0$ separating only two lithofacies, the normal scores covariance $C_Y(h)$ is directly linked, through a one-to-one relation, to the lithofacies indicator covariance $C_J(h)$ (Journel and Isaaks, 1984). However, such a relation exists only for the case of two lithofacies: a standard Gaussian RF $Y(u)$ is fully specified by its single covariance model $C_Y(h)$, hence it cannot be used to identify more than one indicator covariance model $C_J(h)$.

One way around this limitation would be to consider a series of Gaussian RFs $Y_k(u)$, $k = 1, \ldots, K$, each used to simulate (after truncation) a nested set of only two lithofacies, i.e. to simulate each set nested into a previously simulated set. This approach can be used only to simulate lithofacies nested one into another.

An approximation consists of inverting the single covariance model $C_Y(h)$ from some average of the $(K - 1)$ indicator variograms, in the case of $K$ lithofacies. It is this single average indicator variogram that would be reproduced through simulation, not any particular lithofacies indicator variogram model.

Recent efforts to extend the truncated Gaussian method to the simulation of multiple ($K > 2$), non-nested lithofacies call for multiple Gaussian RFs. Contrary to the case $K = 2$, simulation of multiple, non-nested lithofacies involves a highly non-unique inversion. Iterative procedures based on trial-and-error selection of the input normal scores covariances and cross covariances between the multiple Gaussian RFs have been reported in the literature (see, for example, Loc'h and Galli, 1997).

This paper recalls the one-to-one relationship between $C_J(h)$ and $C_Y(h)$ in the case of only two lithofacies and proposes a power series development to approximate it.

2. Theoretical framework

Consider a stationary standard (zero mean and unit variance) multivariate normal RF $\{Y(u), u \in A\}$. The univariate and bivariate cumulative distribution functions (cdfs) are:

$$
G(y) = \operatorname{Prob}\{Y(u) \le y\} = p, \quad \forall u \in A
$$

where $y = G^{-1}(p)$ is the standard normal $p$-quantile, and

$$
G(\rho_Y(h); y, y') = \operatorname{Prob}\{Y(u) \le y,\ Y(u + h) \le y'\}
$$

where $\rho_Y(h) = C_Y(h) = E\{Y(u)\,Y(u + h)\}$ is the covariance of the Gaussian RF $Y(u)$, which fully characterizes its multivariate distribution.

Next, consider the indicator RF $J(u; y_0)$ defined by truncating the Gaussian RF $Y(u)$ at the $p_0$ quantile:

$$
J(u; y_0) = \begin{cases} 1, & \text{if } Y(u) > y_0 = G^{-1}(p_0) \\ 0, & \text{otherwise} \end{cases}
$$

The resulting stationary indicator RF $J(u; y_0)$ has the following moments:

• Mean $m_J$:
$$
m_J = E\{J(u; y_0)\} = \operatorname{Prob}\{Y(u) > y_0\} = 1 - p_0.
$$
• Variance $\sigma_J^2$:
$$
\sigma_J^2 = \operatorname{Var}\{J(u; y_0)\} = m_J(1 - m_J) = p_0(1 - p_0).
$$
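• Non-centered indicator covariance $K_J(h; y_0)$:
$$
K_J(h; y_0) = E\{J(u; y_0)\,J(u + h; y_0)\} = \operatorname{Prob}\{Y(u) > y_0,\ Y(u + h) > y_0\} = 1 - 2p_0 + G(\rho_Y(h); y_0, y_0) \tag{2}
$$
• Centered indicator covariance $C_J(h; y_0)$:
$$
C_J(h; y_0) = K_J(h; y_0) - (1 - p_0)^2 = G(\rho_Y(h); y_0, y_0) - p_0^2 \tag{3}
$$

Fig. 1. Bivariate scatterplot of $Y_1$ versus $Y_2$: the non-centered indicator covariance $K_J(h; y_0)$ is the integral of $g_{Y_1Y_2}(y_1, y_2)$ over the hatched region.

Inversion of Eq. (2) or Eq. (3) provides the normal scores covariance $\rho_Y(h)$ as a function of the non-centered indicator covariance $K_J(h; y_0)$:

$$
\rho_Y(h) = G^{-1}\left[K_J(h; y_0) - (1 - 2p_0);\ y_0\right] \tag{4}
$$

or as a function of the centered indicator covariance $C_J(h; y_0)$:

$$
\rho_Y(h) = G^{-1}\left[C_J(h; y_0) + p_0^2;\ y_0\right] \tag{5}
$$

where $G^{-1}[\,\cdot\,; y_0]$ denotes the inverse of the function $\rho \mapsto G(\rho; y_0, y_0)$.

Conversely, the relation linking $K_J(h; y_0)$ to $\rho_Y(h)$ is determined by integrating the bivariate Gaussian density over the area delineated by $Y_1 > y_0$ and $Y_2 > y_0$ (see hatched area in Fig. 1):

$$
K_J(h; y_0) = \int_{y_0}^{\infty}\int_{y_0}^{\infty} g_{Y_1Y_2}(y_1, y_2)\,dy_1\,dy_2 \tag{6}
$$

where $g_{Y_1Y_2}(y_1, y_2)$ is the bivariate density function of the two Gaussian RVs $Y_1 = Y(u)$ and $Y_2 = Y(u + h)$ separated by vector $h$ (Anderson, 1958):

$$
g_{Y_1Y_2}(y_1, y_2) = \frac{1}{2\pi\sqrt{1 - \rho_Y^2(h)}}\exp\left[-\frac{y_1^2 - 2\rho_Y(h)\,y_1 y_2 + y_2^2}{2\left(1 - \rho_Y^2(h)\right)}\right] \tag{7}
$$

Unfortunately, integrals of Gaussian densities such as Eq. (6) have no closed-form analytical expression. The task is then to approximate $K_J(h; y_0)$ through a numerical integration procedure, and then to invert Eq. (6) with respect to $\rho_Y(h)$ for a given value of $h$. Note that $\rho_Y(h)$ is constrained by $-1 \le \rho_Y(h) \le 1$ and that the integral in Eq. (6) is monotonic in $\rho_Y(h)$.

A power series expansion approach is proposed hereafter for numerically evaluating Eq. (6). First, the coefficients of the power series approximation of $C_J(h)$ in terms of $C_Y(h)$ are computed; the power series is then reverted to obtain a development of $C_Y(h)$ in terms of $C_J(h)$.

3. Implementation

The first step towards simplifying Eq. (6) is to perform a 45° rotation and scaling of the original variables $Y_1, Y_2$ in order to eliminate the cross terms from the exponential integrand. Consider the following change of variables:

$$
u = \frac{y_1 + y_2}{2\sqrt{1 + \rho_Y(h)}} \quad\text{and}\quad v = \frac{y_2 - y_1}{2\sqrt{1 - \rho_Y(h)}}.
$$

The original variables are then written as

$$
y_1 = u\sqrt{1 + \rho_Y(h)} - v\sqrt{1 - \rho_Y(h)}, \qquad y_2 = u\sqrt{1 + \rho_Y(h)} + v\sqrt{1 - \rho_Y(h)}
$$

with Jacobian:

$$
J(u, v) = \begin{vmatrix} \sqrt{1 + \rho_Y(h)} & -\sqrt{1 - \rho_Y(h)} \\ \sqrt{1 + \rho_Y(h)} & \sqrt{1 - \rho_Y(h)} \end{vmatrix} = 2\sqrt{1 - \rho_Y^2(h)}.
$$

The limits of integration are now modified as follows:

• For $y_1 = y_0$ and $y_2 = y_0$: $u = u_0 = y_0/\sqrt{1 + \rho_Y(h)}$.
• For $y_1 = +\infty$ and $y_2 = +\infty$: $u = +\infty$.
• For $y_2 = y_0$ and $y_1 \in (-\infty, +\infty)$: $v = v_1 = -(u - u_0)\sqrt{1 + \rho_Y(h)}/\sqrt{1 - \rho_Y(h)}$.
• For $y_1 = y_0$ and $y_2 \in (-\infty, +\infty)$: $v = v_2 = (u - u_0)\sqrt{1 + \rho_Y(h)}/\sqrt{1 - \rho_Y(h)}$.

Substituting the expressions for $y_1$ and $y_2$ into the numerator of the exponent of the bivariate density (Eq. (7)) yields

$$
-\left(y_1^2 - 2\rho_Y(h)\,y_1 y_2 + y_2^2\right) = -2\left(1 - \rho_Y^2(h)\right)(u^2 + v^2).
$$

The integral in Eq. (6) giving the non-centered indicator covariance $K_J(h; y_0)$ is then simplified to

$$
K_J(h; y_0) = \frac{1}{\pi}\int_{u_0}^{\infty}\int_{v_1}^{v_2} e^{-(u^2 + v^2)}\,dv\,du = \frac{2}{\pi}\int_{u_0}^{\infty}\int_{0}^{v_2} e^{-(u^2 + v^2)}\,dv\,du \tag{8}
$$

since the Jacobian cancels the normalizing factor of Eq. (7) and $v_1 = -v_2$.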
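Eq. (8) can also be evaluated directly, which gives a reference for checking a series approximation. The Python sketch below (standard library only) integrates over $v$ exactly with the error function, leaving $K_J(h; y_0) = \pi^{-1/2}\int_{u_0}^{\infty} e^{-u^2}\operatorname{erf}(v_2)\,du$, evaluates that single integral with Simpson's rule, and inverts it for $\rho_Y(h)$ by bisection, relying on the monotonicity of Eq. (6) in $\rho_Y(h)$. The quadrature rule, the cut-off of the upper limit and the function names (`k_indicator`, `invert_rho`, `normal_scores_rho`) are assumptions made for illustration; this is not the bigaus2 algorithm.

```python
import math
from statistics import NormalDist


def k_indicator(rho, y0, n=2000):
    """Non-centered indicator covariance K_J = Prob{Y1 > y0, Y2 > y0}, Eq. (8).

    The inner integral over v is done exactly with erf; the remaining integral
    over u uses composite Simpson's rule with n (even) intervals.
    Requires -1 < rho < 1, where the change of variables is non-singular.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    u0 = y0 / math.sqrt(1.0 + rho)
    slope = math.sqrt(1.0 + rho) / math.sqrt(1.0 - rho)  # v2 = slope * (u - u0)
    upper = max(u0, 0.0) + 8.0  # assumption: exp(-u^2) is negligible beyond this
    step = (upper - u0) / n

    def integrand(u):
        return math.exp(-u * u) * math.erf(slope * (u - u0))

    total = integrand(u0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(u0 + i * step)
    return total * step / 3.0 / math.sqrt(math.pi)


def invert_rho(k_target, y0, tol=1e-9):
    """Solve K_J(rho; y0) = k_target for rho by bisection (K_J increases with rho)."""
    lo, hi = -1.0 + 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k_indicator(mid, y0) < k_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_scores_rho(gamma_j, p0):
    """Normal scores correlogram for a standardized (unit sill) indicator variogram value."""
    y0 = NormalDist().inv_cdf(p0)
    c_j = p0 * (1.0 - p0) * (1.0 - gamma_j)  # centered indicator covariance
    k_j = c_j + (1.0 - p0) ** 2  # Eq. (3) rearranged: K_J = C_J + (1 - p0)^2
    return invert_rho(k_j, y0)


# Median threshold (y0 = 0) with rho = 0.5: the printed value should be close to 1/3.
print(k_indicator(0.5, 0.0))
```

For the median threshold the printed value can be compared with the exact result $1/4 + \arcsin(0.5)/(2\pi) = 1/3$. Bisection is slower than a reverted series but has no convergence radius to worry about, which makes it a convenient cross-check.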
The normal scores correlogram $\rho_Y(h)$ does not appear explicitly in the integrand; instead, it is accounted for in the limits of integration $u_0$ and $v_2$.

The second step is to convert to polar coordinates (see Fig. 3) by defining

$$
u = r\cos(\theta - \omega) \quad\text{and}\quad v = r\sin(\theta - \omega)
$$

with $\tan\omega = t_0 = \sqrt{1 - \rho_Y(h)}/\sqrt{1 + \rho_Y(h)}$. In these coordinates $y_1 = \sqrt{2}\,r\cos\theta$, so the boundary $y_1 = y_0$ is the line $r\cos\theta = d$ with $d = y_0/\sqrt{2}$, and after integration over $r$ the non-centered indicator covariance reduces to a single integral over the polar angle:

$$
K_J(h; y_0) = (1 - p_0) - \frac{1}{\pi}\int_{0}^{\omega} e^{-d^2\sec^2\theta}\,d\theta \tag{9}
$$

The substitution $t = \tan\theta$ expresses $d\theta$ in terms of $dt$ through the identity $\sec^2\theta - 1 = \tan^2\theta$, hence

$$
\frac{\partial t}{\partial \theta} = \sec^2\theta = t^2 + 1 \;\Rightarrow\; \partial\theta = \frac{\partial t}{t^2 + 1}.
$$

Application of a power series expansion at $t = 0$ (see, for example, Apostol, 1967) then leads to:

$$
K_J(h; y_0) = (1 - p_0) - \frac{1}{\pi}\int_{0}^{t_0}\frac{e^{-d^2(t^2 + 1)}}{t^2 + 1}\,dt \tag{10}
$$

$$
K_J(h; y_0) \approx (1 - p_0) - \sum_{n=1}^{N} b_n t_0^n \tag{11}
$$

where $N$ is the number of terms retained in the series approximation, and the coefficients $b_n$ are evaluated as

$$
b_n = \frac{e^{-d^2}}{n\pi}\,\frac{1}{(n-1)!}\left.\frac{\partial^{n-1}}{\partial t^{n-1}}\left(\frac{e^{-d^2 t^2}}{t^2 + 1}\right)\right|_{t=0}
$$

or, equivalently, as

$$
b_n = (-1)^{(n-1)/2}\,\frac{e^{-d^2}}{n\pi}\sum_{k=0}^{(n-1)/2}\frac{d^{2k}}{k!} \quad \text{if } n \text{ odd}, \qquad b_n = 0 \quad \text{if } n \text{ even}.
$$

Direct reversion of the series in Eq. (11) leads to a power series for $t_0 = \sqrt{1 - \rho_Y(h)}/\sqrt{1 + \rho_Y(h)}$ (see, for example, Abramowitz and Stegun, 1972):

$$
t_0 = \sum_{n=1}^{N} a_n y^n \tag{12}
$$

where $y = (1 - p_0) - K_J(h; y_0) = p_0(1 - p_0) - C_J(h; y_0)$, that is, $y = p_0(1 - p_0)\,\gamma_J(h)$ when $\gamma_J(h)$ is the standardized (unit sill) indicator variogram, and the $a_n$ are coefficients defined as:

$$
a_1 = \frac{1}{b_1}, \qquad a_n = -\frac{\sum_{i=1}^{n-1} a_i\, b(n, i)}{b(n, n)}, \quad n \ge 2
$$

with $b(n, i)$ being the $(n, i)$th element of a matrix $B$ whose $i$th column holds the coefficients of $t_0^n$ in the $i$th power of the series $\sum_k b_k t_0^k$ (so that $b(n, n) = b_1^n$).

The value of $\rho_Y(h)$ is then obtained by simple algebra as:

$$
\rho_Y(h) = \frac{1 - t_0^2}{1 + t_0^2} \tag{13}
$$

Note that, even if the series involves $N$ terms, only the odd-order coefficients $b_n$ are non-zero, so the actual number of non-zero terms is $(N + 1)/2$ for odd $N$.

Remark: The resulting normal scores correlogram values $\rho_Y(h)$ must constitute a positive definite covariance table for any lag $h$. This could be checked a posteriori by verifying that the Fourier transform of the numerically derived correlogram $\rho_Y(h)$, i.e. the corresponding spectral density function, is non-negative for all frequencies (see, for example, Christakos, 1984). Note that for certain indicator covariance models $C_J(h; y_0)$, such as a Gaussian model that behaves like a parabola near the origin, the resulting normal scores correlogram model $\rho_Y(h)$ may not be positive definite, since it behaves like a polynomial with an exponent greater than 2. In general, for an arbitrary threshold $y_0$, in the neighborhood of $\rho_Y(h) = 1$ the variogram $\gamma_J(h)$ of the indicator RF behaves like the square root of the variogram $1 - \rho_Y(h)$ of the Gaussian RF (Matheron, 1989), and this poses problems when dealing with Gaussian indicator covariance models. However, Gaussian covariance models characterize phenomena with very high spatial continuity and should not be used for modeling a (by definition) discontinuous indicator RF. Hence, for realistic positive definite indicator covariance models $C_J(h)$, the normal scores correlogram values derived by bigaus2 constitute a licit covariance table, since they are obtained by direct evaluation (integration and inversion) of Eq. (6). Alternatively, one could ensure positive definiteness of the numerically derived normal scores correlogram $\rho_Y(h)$ by fitting a permissible parametric correlogram model $\rho_Y(h; \boldsymbol{\theta})$ to $\rho_Y(h)$. Here, $\boldsymbol{\theta}$ denotes the vector of parameters (sills, ranges) of each basic structure comprising the adopted parametric correlogram model $\rho_Y(h; \boldsymbol{\theta})$.

4. Program description

The GSLIB program bigaus (Deutsch and Journel, 1998) has been modified to include the reverse procedure, i.e. calculation of $\gamma_Y(h)$ from $\gamma_J(h)$. The new program bigaus2 can handle the following cases:

(1) Input: normal scores variogram model $\gamma_Y(h)$. Output: values of the indicator variogram $\gamma_J(h)$ defined at a specific threshold $y_0$.
(2) Input: indicator variogram model $\gamma_J(h)$ for a specified threshold $y_0$. Output: values of the normal scores variogram $\gamma_Y(h)$ required for generating the Gaussian RF $\{Y(u), u \in A\}$.
(3) Input: experimental normal scores variogram values $\gamma^*_Y(h)$. Output: 'experimental' values of the indicator variogram $\gamma^*_J(h)$ defined at a specific threshold $y_0$.
(4) Input: experimental indicator variogram values $\gamma^*_J(h)$ for a specified threshold $y_0$. Output: 'experimental' values of the normal scores variogram $\gamma^*_Y(h)$ required for generating the Gaussian RF $\{Y(u), u \in A\}$.

Note that in case (4), a further modeling step is required in order to fit a licit normal scores variogram model to the values output from bigaus2. Test runs of bigaus2 indicate that in case (2) the resulting normal scores variogram values could be used directly for generating the Gaussian RF $\{Y(u), u \in A\}$, provided that the threshold value $y_0$ is not too high and that numerical integration errors are small.
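The series route of Eqs. (11) to (13) can be sketched as follows, again in Python with the standard library only. The sketch assumes $d = y_0/\sqrt{2}$ (the distance from the origin to the boundary lines in the rotated coordinates), a fixed truncation order $N$, and hypothetical function names; it is not a transcription of bigaus2.f. The reversion builds the coefficients $b(n, i)$ of the powers of the series explicitly and applies the recursion for $a_n$. Because $1/(t^2 + 1)$ has poles at $t = \pm i$, the forward series in $t_0$ converges only for $t_0 \le 1$, i.e. $\rho_Y(h) \ge 0$, and the reverted series converges more slowly as $y$ grows, so more terms are needed for small $\rho_Y(h)$.

```python
import math
from statistics import NormalDist


def series_coefficients(y0, n_terms):
    """Coefficients b_1..b_N of Eq. (11); b[0] is an unused placeholder."""
    d2 = y0 * y0 / 2.0  # assumption: d = y0 / sqrt(2)
    b = [0.0] * (n_terms + 1)
    for n in range(1, n_terms + 1, 2):  # even-order coefficients stay zero
        m = (n - 1) // 2
        partial = sum(d2 ** k / math.factorial(k) for k in range(m + 1))
        b[n] = (-1) ** m * math.exp(-d2) * partial / (n * math.pi)
    return b


def revert_series(b):
    """Given y = sum b_n t^n (b_1 != 0), return a such that t = sum a_n y^n, Eq. (12)."""
    n_terms = len(b) - 1
    # powers[i][n] = coefficient of t^n in (sum_k b_k t^k)^i, i.e. b(n, i) in the text
    powers = [[0.0] * (n_terms + 1) for _ in range(n_terms + 1)]
    powers[1] = list(b)
    for i in range(2, n_terms + 1):
        for n in range(i, n_terms + 1):
            powers[i][n] = sum(powers[i - 1][j] * b[n - j] for j in range(i - 1, n))
    a = [0.0] * (n_terms + 1)
    a[1] = 1.0 / b[1]
    for n in range(2, n_terms + 1):
        a[n] = -sum(a[i] * powers[i][n] for i in range(1, n)) / powers[n][n]
    return a


def rho_from_series(gamma_j, p0, n_terms=15):
    """Normal scores correlogram from a standardized indicator variogram value."""
    y0 = NormalDist().inv_cdf(p0)
    y = p0 * (1.0 - p0) * gamma_j  # y = (1 - p0) - K_J(h; y0)
    a = revert_series(series_coefficients(y0, n_terms))
    t0 = sum(a[n] * y ** n for n in range(1, n_terms + 1))  # Eq. (12)
    return (1.0 - t0 * t0) / (1.0 + t0 * t0)  # Eq. (13)


print(rho_from_series(2.0 / 3.0, 0.5))
```

As a check, for $p_0 = 0.5$ ($y_0 = 0$, $d = 0$) the series reduces to $\arctan(t_0)/\pi$, whose reversion is $t_0 = \tan(\pi y)$; a standardized indicator variogram value of $2/3$ then maps back to $\rho_Y = 0.5$.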
The bigaus2 program follows GSLIB conventions. The parameters required by the program, also shown in Fig. 4, are listed as follows:

• imd: if imd is set to 1, an input variogram model is required. If imd is set to 2, an input file infl containing experimental semivariogram values is required.
• infl: if imd is set to 2, the input file containing the experimental semivariogram values should be provided.
• pcut: if imd is set to 2, a single threshold is required. This threshold is expressed in units of cumulative probability, e.g. the median is 0.50.
• icl: the type of calculation. If icl is set to 1, the input consists of a standardized indicator semivariogram (either model or experimental). If icl is set to 2, the input consists of a normal scores semivariogram (either model or experimental).
• outfl: the output file for the normal scores or indicator semivariograms, depending on icl. The format is the same as that created by GSLIB; therefore, the GSLIB program vargplt can be used to plot these variograms.
• ncut: the number of thresholds.
• zc(ncut): the ncut threshold values, expressed in units of cumulative probability; for example, the lower quartile is 0.25 and the median is 0.50. Note that the results are symmetric: the variogram for the 5th percentile (0.05) is the same as the variogram for the 95th percentile (0.95).
• ndir and nlag: the number of directions and the number of lags to be considered.
• azm, dip and lag: for each of the ndir directions, an azimuth and a dip must be specified (by azm and dip, respectively), along with a unit lag offset (lag).
• nst and c0: the number of nested structures and the nugget effect.
• For each of the nst nested structures, one must define: it, the type of the structure; cc, the c parameter; ang1, ang2, ang3, the angles defining the geometric anisotropy; aahmax, the maximum horizontal range; aahmin, the minimum horizontal range; and aavert, the vertical range. A detailed description of these parameters is given in section II.3 of the GSLIB manual.

The FORTRAN source code in file bigaus2.f, along with the corresponding parameter file bigaus2.par, can be downloaded from the searchable database www.iamg.org/CGEditor/index.htm.

5. An illustrative example

The following example illustrates the discrepancy incurred by generating the Gaussian RF $\{Y(u), u \in A\}$ using the standardized indicator variogram $\gamma_J(h)$ instead of the correct normal scores variogram $\gamma_Y(h)$. First, the difference between the standardized indicator variogram and the correct normal scores variogram model is shown for a hypothetical system with various proportions of two lithofacies.

The standardized indicator variogram modeling the spatial distribution of the two lithofacies is a spherical model with a 5% nugget and a range of 10 units, the field dimensions being 100 × 100 units. Four facies proportions are examined: $p = 0.5$, $p = 0.8$, $p = 0.9$ and $p = 0.97$. The corresponding normal scores variograms (case (2) in bigaus2) are shown in Fig. 5.

Fig. 5 illustrates the smaller nugget variance of the correct normal scores variogram (dashed line) and its parabolic behavior at the origin, as opposed to the linear behavior of the input standardized indicator variogram model (solid line). The mismatch is more pronounced as the proportion $p$ deviates more from
0.5, i.e. as one of the two lithofacies becomes more abundant. Everything else being equal, utilization of the correct normal scores variogram leads to a more continuous realization at short scales than if the standardized indicator variogram had been used.

Fig. 5. Discrepancy between input standardized indicator variogram models (solid line) and correct normal scores variograms (dashed line) for various facies proportions $p$. Note that the normal scores variograms do not have a zero nugget effect, although this is visible only when the graphs are plotted at a larger scale. The nugget effect values are 0.0031, 0.0026, 0.0021 and 0.0014 for $p$ = 0.5, 0.8, 0.9 and 0.97, respectively.

Utilization of the correct normal scores variogram model becomes more critical when the data control does not suffice to impose the pattern of spatial variation. For this reason, the following example uses unconditional simulation, which shows the discrepancies at their worst.

Consider the problem of generating a realization of the spatial distribution of the two lithofacies (Fig. 6). Let the desired proportion of facies A be $1 - p = 0.25$ and that of facies B be $p = 0.75$. Let the target indicator variogram be an isotropic spherical model with a relative nugget of 5% and a range of 10 units.

Two realizations of a Gaussian RF are generated using program sgsim (Deutsch and Journel, 1998) (see top of Fig. 6). The first (Fig. 6, top left) was generated using an isotropic spherical variogram model with a 5% nugget variance contribution and a range of 10 units. The second (Fig. 6, top right) was generated using the correct normal scores variogram as computed by program bigaus2 for $1 - p = 0.25$. These two Gaussian realizations were truncated via program gtsim (Deutsch and Journel, 1998) at the $1 - p = 0.25$ quantile. The two resulting indicator realizations (binary images) are shown in the middle of Fig. 6. Both indicator realizations have the correct proportion of lithofacies, yet the omnidirectional variograms of the simulated indicator values (Fig. 6, bottom) are quite different. Utilization of the standardized indicator variogram model results in a binary image exhibiting a higher nugget variance (more noise) than the target indicator variogram model. In contrast, using the correct normal scores variogram model yields, after truncation at the $1 - p = 0.25$ quantile, a binary image whose indicator variogram is very close to the target indicator variogram model.
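For the median case $p = 0.5$ ($y_0 = 0$) Eq. (2) has a closed form, Sheppard's formula $K_J(h; 0) = 1/4 + \arcsin(\rho_Y(h))/(2\pi)$, so that $\rho_J(h) = (2/\pi)\arcsin\rho_Y(h)$ and, in terms of standardized variograms, $\gamma_Y(h) = 1 - \cos(\pi\gamma_J(h)/2)$. The Python sketch below applies this to the spherical model of the example (5% nugget, range 10); the spherical formula is the usual textbook one and the function names are hypothetical. Just above $h = 0$ it gives $\gamma_Y \approx 0.0031$, consistent with the nugget quoted for $p = 0.5$ in the Fig. 5 caption, and since $1 - \cos x \approx x^2/2$ the normal scores variogram grows roughly quadratically where $\gamma_J(h)$ grows linearly, in line with the parabolic behavior at the origin described for Fig. 5.

```python
import math


def spherical_variogram(h, nugget=0.05, a=10.0):
    """Standardized (unit sill) spherical variogram with a nugget effect."""
    if h == 0.0:
        return 0.0
    if h >= a:
        return 1.0
    r = h / a
    return nugget + (1.0 - nugget) * (1.5 * r - 0.5 * r ** 3)


def normal_scores_variogram_median(gamma_j):
    """gamma_Y from gamma_J for a median threshold (p = 0.5), via Sheppard's formula."""
    return 1.0 - math.cos(0.5 * math.pi * gamma_j)


for h in (0.0, 1e-9, 1.0, 2.0, 5.0, 10.0):
    g_j = spherical_variogram(h)
    g_y = normal_scores_variogram_median(g_j)
    print(f"h={h:g}  gamma_J={g_j:.4f}  gamma_Y={g_y:.4f}")
```

Only the median case has such a simple closed form; for $p \ne 0.5$ the inversion has to go through Eq. (6) numerically, as bigaus2 does.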
Fig. 6. Indicator variogram reproduction for two gtsim realizations using (a) the standardized indicator variogram model (left graphs) and (b) the correct normal scores variogram model (right graphs). The model indicator variogram is shown as a solid line.
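The bottom graphs of Fig. 6 compare experimental indicator variograms of the truncated realizations with the model. A minimal Python sketch of that check follows, assuming a realization stored as a list of rows on a regular grid with unit spacing: values are truncated at a cumulative probability `p_cut`, and pairs along rows and columns are pooled at each integer lag as a crude stand-in for an omnidirectional variogram. It does not reproduce the GSLIB variogram programs, and the function names are hypothetical. Dividing by $m_J(1 - m_J)$, the indicator variance, standardizes the result for comparison with a unit-sill model.

```python
from statistics import NormalDist


def truncate(grid, p_cut):
    """Indicator image: 1 where the Gaussian value exceeds the p_cut quantile, else 0."""
    y0 = NormalDist().inv_cdf(p_cut)
    return [[1 if y > y0 else 0 for y in row] for row in grid]


def proportion(image):
    """Fraction of cells coded 1, i.e. the indicator mean m_J."""
    cells = [v for row in image for v in row]
    return sum(cells) / len(cells)


def indicator_variogram(image, max_lag):
    """Experimental semivariogram pooling row and column pairs at lags 1..max_lag."""
    ny, nx = len(image), len(image[0])
    gammas = []
    for lag in range(1, max_lag + 1):
        total, pairs = 0.0, 0
        for iy in range(ny):
            for ix in range(nx):
                if ix + lag < nx:  # pair along the row
                    total += (image[iy][ix] - image[iy][ix + lag]) ** 2
                    pairs += 1
                if iy + lag < ny:  # pair along the column
                    total += (image[iy][ix] - image[iy + lag][ix]) ** 2
                    pairs += 1
        gammas.append(total / (2 * pairs) if pairs else float("nan"))
    return gammas


def standardize(gammas, image):
    """Divide by the indicator variance m(1 - m); undefined if m is 0 or 1."""
    m = proportion(image)
    return [g / (m * (1.0 - m)) for g in gammas]


# Hypothetical 4 x 4 field of Gaussian values arranged as a +1/-1 checkerboard.
grid = [[1.0 if (ix + iy) % 2 == 0 else -1.0 for ix in range(4)] for iy in range(4)]
image = truncate(grid, 0.25)  # threshold y0 = G^{-1}(0.25), about -0.674
print(proportion(image), indicator_variogram(image, 2))
```

On this checkerboard every lag-1 pair differs and every lag-2 pair agrees, so the semivariogram is 0.5 at lag 1 and 0 at lag 2; on a simulated realization the same functions would give the curves to be compared with the target model.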