Wilkinson Rogers Formula 2346786
Royal Statistical Society and Wiley are collaborating with JSTOR to digitize, preserve and
extend access to Journal of the Royal Statistical Society. Series C (Applied Statistics)
SUMMARY
The paper describes the symbolic notation and syntax for specifying factorial
models for analysis of variance in the control language of the GENSTAT
statistical program system at Rothamsted. The notation generalizes that
of Nelder (1965). Algorithm AS 65 (Rogers, 1973) converts factorial model
formulae in this notation to a list of model terms represented as binary
integers.
A further extension of the syntax is discussed for specifying models
generally (including non-linear forms).
1. INTRODUCTION
GENERAL computer programs for analysing experiments need a concise, flexible
notation for specifying the appropriate factorial models. The notation in this paper,
and various others due to Zyskind (1962), Hemmerle (1964), Nelder (1965), Fowlkes
(1969) and Claringbold (1969), were discussed at an international workshop meeting
on the computational aspects of analysis of variance at the University of Wisconsin in
1970 (Muller and Wilkinson, 1970).
The present notation for model formulae includes the addition, crossing and
nesting operators common to most of the notations mentioned, a dot operator for
defining multi-factor model terms and deletion operators for eliminating unwanted
terms from otherwise simple formulae. Submodel functions may be substituted for
factors in a formula, to specify regression sub-models for partitioning factorial effects.
The notation is implemented in the GENSTAT language (Nelder et al., 1973) (which
also includes a special pseudo-factor operator not described here). The GENSTAT
system is currently in operation at Rothamsted, the Edinburgh Regional Computing
Centre, Cambridge and Bristol Universities and other centres. Algorithm AS 65
(Rogers, 1973) converts symbolic factorial model formulae to a list of model terms
represented as binary integers.
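As a rough illustration of what "model terms represented as binary integers" can mean, the sketch below gives each factor one bit, so that a term is the integer whose set bits mark the factors it contains. The bit assignment and the helper names (`encode_term`, `decode_term`, `is_marginal`) are assumptions made for illustration only; the encoding actually used by Algorithm AS 65 is not specified in this text.

```python
def encode_term(term, factors):
    """Return the integer whose bit i is set when factors[i] occurs in term."""
    code = 0
    for name in term:
        code |= 1 << factors.index(name)
    return code


def decode_term(code, factors):
    """Recover the factor names from a binary-integer term."""
    return [name for i, name in enumerate(factors) if (code >> i) & 1]


def is_marginal(small, big):
    """True when every factor of `small` also occurs in `big`."""
    return small & big == small


factors = ["A", "B", "C"]               # hypothetical factor list
a = encode_term(["A"], factors)          # binary 001
bc = encode_term(["B", "C"], factors)    # binary 110
abc = a | bc                             # combining terms is a bitwise OR
```

Under this assumed encoding, combining two terms and testing marginality each reduce to a single bitwise operation, which is one plausible motivation for an integer representation.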
Further extensions of the notation are readily envisaged, e.g. a diallel function of
parental genotype factors and a similarity-link operator for combining random terms
with a common variance, such as rows and columns in a lattice square design. A
general extension of notation to include linear or non-linear regression models is
described in Section 4.
factors A and B,
Latin squares, Youden squares, lattice squares and plaid designs have a block structure
rows*cols (6)
or reps/(rows*cols) (7)
while split-plot or split-row and split-column designs have block structures such as
blocks/mainplots/subplots (8)
or reps/(rows*cols)/subplots (9)
or (rows/subrows)*(cols/subcols). (10)
e.g. nitrate*phos*potash (11)
or spray/(type*dose), (12)
where spray is a factor indicating whether experimental plots were sprayed or not
(with insecticide, say). Note that in this example the factors type and dose would
include null levels associated with the unsprayed plots.
The general rules for determining simple factorial formulae from formulae such as
(7)-(12) are given in Section 3.
In addition to the operators +, ., * and /, there
are three deletion operators with meanings as follows:
      Operator   Meaning
(i)   -          Delete the specified term(s) from the preceding model.
(ii)  -*         As for (i), and also any corresponding higher-order terms.
(iii) -/         Delete only the corresponding higher-order terms.
These are useful for deleting unwanted terms from crossed and nested formulae when
the corresponding simple factorial sum of terms would be otherwise too lengthy.
The following equivalent model expressions illustrate their meaning (see rules in
Section 3).
The ANOVA directive of the GENSTAT language also provides a model-order control
parameter for suppressing, from the analysis, all treatment terms above the order
specified.
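The three deletion operators, and the model-order control, can be sketched as filters on a list of model terms. In this sketch a term is a set of factor names, and "corresponding higher-order terms" is read as "terms containing every factor of the deleted term"; that reading, and all function names, are assumptions of the sketch rather than a statement of the GENSTAT implementation.

```python
from itertools import combinations


def full_factorial(factors):
    """All non-empty subsets of the factors, lowest order first."""
    return [frozenset(c) for k in range(1, len(factors) + 1)
            for c in combinations(factors, k)]


def delete(model, terms):
    """Operator '-': remove exactly the listed terms."""
    return [t for t in model if t not in terms]


def delete_with_higher(model, terms):
    """Operator '-*': remove the listed terms and every term containing one."""
    return [t for t in model if not any(d <= t for d in terms)]


def delete_higher(model, terms):
    """Operator '-/': remove only terms that strictly contain a listed term."""
    return [t for t in model if not any(d < t for d in terms)]


def limit_order(model, max_order):
    """Model-order control: drop terms with more than max_order factors."""
    return [t for t in model if len(t) <= max_order]


def show(model):
    """Render a model as a sum of dot-separated terms."""
    return " + ".join(".".join(sorted(t)) for t in model)


abc = full_factorial("ABC")      # the expansion of A*B*C
ab = [frozenset("AB")]
print(show(delete(abc, [frozenset("ABC")])))
print(show(delete_with_higher(abc, ab)))
print(show(delete_higher(abc, ab)))
print(show(limit_order(abc, 1)))
```

With this reading, A*B*C -* A.B drops both A.B and A.B.C, whereas A*B*C -/ A.B drops A.B.C alone.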
2.6. Submodels
An important requirement for the analysis of variance of factorial models is the
ability to specify submodels for partitioning factorial effects into regression
components; linear, quadratic and cubic trends of main effects for instance, and
interactions of these such as linear (A) x linear (B), linear (A) x quadratic (B), etc.,
where A and B are different factors.
It is usually sufficient in practice to specify submodels only for the main effects of
each factor, since these then define by implication the corresponding compound
submodels for higher-order factorial terms. Submodels are specified in the GENSTAT
language by substituting for factors in the treatment model formula the appropriate
submodel functions of them, which are of the form
would produce the type of genotype x site analysis described by Finlay and Wilkinson
(1963), in which the sensitivity of each genotype with respect to site is characterized by a
linear regression of yield values for that genotype on the site means (over all genotypes);
together with an extension of the analysis for linear and quadratic trends of yield on
density (sowing rate) and their interactions with genotype and site. If a linear
submodel were also specified for genotype, as for site, single degrees of freedom for
non-additivity would be produced.
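The regression step of such a genotype x site analysis can be sketched in a few lines: each genotype's yields are regressed on the site means taken over all genotypes, and the slope serves as that genotype's sensitivity. The data below are invented toy values, the function names are hypothetical, and the sketch ignores any transformation of yield or partitioning of sums of squares that a full analysis would involve.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def sensitivities(yields):
    """yields maps genotype -> list of yields, one per site, in a common site order."""
    n_sites = len(next(iter(yields.values())))
    site_means = [sum(y[s] for y in yields.values()) / len(yields)
                  for s in range(n_sites)]
    return {g: slope(site_means, y) for g, y in yields.items()}


# Toy data for illustration only: two genotypes at three sites.
toy = {"G1": [2.0, 2.5, 3.0], "G2": [1.0, 2.5, 4.0]}
b = sensitivities(toy)
```

Because the site means are averages over the genotypes, the slopes average to one; a genotype with slope below one is less sensitive to site than average.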
Operator     .   /   *   +   -   -/   -*     (18)
Precedence   1   2   3   4   4   4    4
(We omit from consideration here the substitution of submodels for factors, which
does not affect the primary, factorial model.)
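One way to apply a precedence table of this kind is to convert a formula to postfix order before evaluating it. The sketch below does so with a standard operator-stack method. It assumes that the seven operators in (18) are ., /, *, +, -, -/ and -*, that precedence 1 binds most tightly, and that operators of equal precedence are applied from left to right; none of these assumptions, nor the function names, is taken from Algorithm AS 65.

```python
import re

# Assumed reading of table (18): a lower number binds more tightly.
PRECEDENCE = {".": 1, "/": 2, "*": 3, "+": 4, "-": 4, "-/": 4, "-*": 4}
# Two-character deletion operators are tried before single characters.
TOKEN = re.compile(r"\s*(-\*|-/|[.+*/()-]|[A-Za-z_]\w*)")


def tokenize(formula):
    """Split a model formula into factor names, operators and parentheses."""
    formula = formula.rstrip()
    pos, tokens = 0, []
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_postfix(formula):
    """Return the formula's tokens in postfix (reverse Polish) order."""
    output, stack = [], []
    for tok in tokenize(formula):
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-to-right rule).
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] <= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                      # discard the "("
        else:
            output.append(tok)               # a factor name
    while stack:
        output.append(stack.pop())
    return output


print(to_postfix("reps/(rows*cols)/subplots"))
print(to_postfix("A*B/C"))
```

Under these assumptions A*B/C is read as A*(B/C), which is consistent with formulae such as (7) and (12) needing parentheses around the crossed part.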
Ordering of model terms. Some of the evaluation rules below may not produce a
statistically appropriate ordering of model terms, so that re-ordering may be required.
An essential order requirement is that any term in a simple factorial model should
precede all terms to which it is marginal, i.e. A before A.B, etc. A stronger requirement
(implemented in GENSTAT) is that terms be arranged in increasing order with
respect to the number of factors in a term, with terms of the same order arranged in a
natural sequence with respect to the factors defining them.
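Both ordering requirements can be sketched directly. Here a term is a set of factor names, and the "natural sequence" within one order is taken to be the lexicographic order of the factors' positions in a declared factor list; that interpretation and the function names are assumptions of the sketch.

```python
def order_terms(terms, factors):
    """Sort by number of factors, then by the factors' positions in `factors`."""
    def key(term):
        positions = sorted(factors.index(f) for f in term)
        return (len(positions), positions)
    return sorted(terms, key=key)


def respects_marginality(terms):
    """True when no term is preceded by a term to which it is marginal."""
    for i, earlier in enumerate(terms):
        for later in terms[i + 1:]:
            if later < earlier:      # `later` is marginal to `earlier`
                return False
    return True


def show(model):
    """Render a model as a sum of dot-separated terms."""
    return " + ".join(".".join(sorted(t)) for t in model)


factors = ["A", "B", "C"]
unordered = [frozenset("AB"), frozenset("C"), frozenset("A"),
             frozenset("BC"), frozenset("B")]
ordered = order_terms(unordered, factors)
print(show(ordered))
```

Sorting by the number of factors guarantees the essential requirement automatically, since a term marginal to another always has fewer factors.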
(A+B).C = A.C + B.C.
where FAC(L) is the dot-product of all factors in L. It will usually be a term in the
expansion of L. For example,
(A+B)*C = A+B+C+(A+B).C
        = A+B+C+A.C+B.C,
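The expansion above can be reproduced with a small sketch in which a formula is a list of terms and a term is a set of factor names. The crossing rule L*M = L + M + L.M follows the worked example; the nesting rule is taken to be L/M = L + FAC(L).M, which is an assumption here because only the definition of FAC(L) survives in this text. All function names are hypothetical.

```python
def factor(name):
    """A formula consisting of one single-factor term."""
    return [frozenset([name])]


def add(L, M):
    """L + M: terms of L followed by any terms of M not already present."""
    return L + [t for t in M if t not in L]


def dot(L, M):
    """L.M: combine every term of L with every term of M (distributive)."""
    out = []
    for s in L:
        for t in M:
            if s | t not in out:
                out.append(s | t)
    return out


def cross(L, M):
    """L*M = L + M + L.M"""
    return add(add(L, M), dot(L, M))


def fac(L):
    """FAC(L): the single term containing all factors that occur in L."""
    return [frozenset().union(*L)]


def nest(L, M):
    """L/M = L + FAC(L).M (assumed form of the nesting rule)."""
    return add(L, dot(fac(L), M))


def show(model):
    """Render a model as a sum of dot-separated terms."""
    return " + ".join(".".join(sorted(t)) for t in model)


A, B, C = factor("A"), factor("B"), factor("C")
crossed = cross(add(A, B), C)                      # (A+B)*C
split_plot = nest(nest(factor("blocks"), factor("mainplots")),
                  factor("subplots"))              # formula (8)
sprayed = nest(factor("spray"),
               cross(factor("type"), factor("dose")))  # formula (12)
print(show(crossed))
```

Applied to formula (8), the assumed nesting rule gives blocks, blocks.mainplots and blocks.mainplots.subplots, so that every term of the nested part carries the factors of the enclosing structure.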
Deletion operations
indicates the linear model subspace determined by the p+q+pq variates associated
with X1, X2 and X1.X2.
(ii) Introducing a symbolic exponentiation operator **, a complete second-degree
model with respect to X1 and X2 is concisely described as
in which MAXVAL and RATE are the parameters, X is a column vector of known
x-values and * here denotes item-by-item multiplication. This raises two points:
(1) Declaration of parameters. A distinction must be made between symbols in an
algebraic expression that represent parameters and those that represent variables with
known values. This can be done for instance with declarations such as
(2) Modes of expression. Since the algebraic and symbolic modes of expression
involve the same operator symbols +, -, *, / but with different meanings, an
unambiguous indication of which mode of expression is being used in particular contexts
is required. Thus, a directive 'MODEL' might carry a modifier to indicate mode:
Mixed mode expressions may sometimes be necessary. For instance, the parameters
MAXVAL and RATE in (29) might depend on certain factorial treatments with
symbolic model A*B. This can be effected by introducing a special function
where LM stands for linear model. The symbolic argument defines a set of x-variates
and the second argument is a name for identifying the corresponding array of
parameters. The functional notation enables symbolic expressions to be introduced into
otherwise algebraic expressions. Thus, (29) could be modified to
or if, alternatively, the parameter definition (30) is extended to include the appropriate
symbolic models, e.g.
'PARAMETERS' MAXVAL,RATE $ A*B, (35)
5. ACKNOWLEDGEMENT
The authors wish to thank a referee for valuable suggestions which substantially
improved the presentation.
REFERENCES
CLARINGBOLD, P. (1969). An approach to conversational statistics. In Statistical Computation
(R. C. Milton and J. A. Nelder, eds.) pp. 267-283. New York: Academic Press.
FINLAY, K. W. and WILKINSON, G. N. (1963). The analysis of adaptation in a plant-breeding
programme. Aust. J. Agric. Res., 14, 742-754.
FOWLKES, E. B. (1969). Some operators for ANOVA calculations. Technometrics, 11, 511-526.
HEMMERLE, W. J. (1964). Algebraic specifications of statistical models for analysis of variance
computations. J. Ass. Comput. Mach., 11, 234-239.
MULLER, M. E. and WILKINSON, G. N. (1970). Statistical algorithms and computational aspects
of the analysis of variance. Report on ANOVA Workshop, 1970, University of Wisconsin.
NELDER, J. A. (1965). The analysis of randomized experiments. I. Block structure and the null
analysis of variance. II. Treatment structure and the general analysis of variance. Proc. Roy.
Soc. A, 283, 147-178.
NELDER, J. A. et al. (1973). GENSTAT Reference Manual, Rothamsted Experimental Station.
ROGERS, C. E. (1973). Algorithm AS 65. Interpreting structure formulae. Appl. Statist., 22,
414-424.
WILKINSON, G. N. (1969). Facilities in a statistical program system for analysis of multiple-
indexed data. In Statistical Computation (R. C. Milton and J. A. Nelder, eds.), pp. 201-228.
New York: Academic Press.
WILKINSON, G. N. (1970). A general recursive procedure for analysis of variance. Biometrika,
57, 19-46.
ZYSKIND, G. (1962). On structure, relation, sigma and expectation of mean squares. Sankhya, A,
24, 115-148.