
Basic methods in Theoretical Biology

S.A.L.M. Kooijman
Dept. Theoretical Biology
Faculty of Earth and Life Science, Vrije Universiteit, Amsterdam
This document is part of the DEB tele-course
http://www.bio.vu.nl/thb/deb/course/
Basic methods in Theoretical Biology discusses a basic toolkit which graduate
students in Quantitative Biology, and especially in Theoretical Biology, should be able to
use. The mathematical material is presented like an extended glossary; applications in
biology are given in examples and exercises.
Acknowledgements I would like to thank J. Ferreira, B.W. Kooi and C. Zonneveld for their helpful comments.
Summary of contents:
1 METHODOLOGY
2 MATHEMATICAL TOOLKIT
3 MODELS FOR PROCESSES
4 MODEL-BASED STATISTICS
Accompanying:
EXAMPLES From biological problem via mathematics to solution
EXERCISES Motivations, given, questions, hints, answers
Contents

Preface

1 Methodology
  1.1 Empirical cycle
  1.2 Consistency
    1.2.1 Dimensions
    1.2.2 Conservation laws
  1.3 Coherence
    1.3.1 Scales in organization
  1.4 Efficiency
  1.5 Numerical behaviour
    1.5.1 Testability
  1.6 Experimental design
  1.7 Identification of variables to be measured
  1.8 Experimentation
  1.9 Realism
    1.9.1 Stochastic versus deterministic models
  1.10 Logic
    1.10.1 Propositional logic
    1.10.2 Predicate logic
  Bibliography

2 Mathematical toolkit
  2.1 Sets
    2.1.1 Combinatorics
  2.2 Operators
  2.3 Numbers
  2.4 Functions
    2.4.1 Trigonometric functions
    2.4.2 Limits
    2.4.3 Sequences and series
    2.4.4 Differentiation
    2.4.5 Integration
    2.4.6 Roots
  2.5 Matrices
    2.5.1 Determinants
    2.5.2 Ranks
    2.5.3 Inverses
    2.5.4 Eigenvalues and eigenvectors
    2.5.5 Quadratic and bilinear forms
    2.5.6 Vector calculus
  2.6 Random variables and probabilities
    2.6.1 Expectations
    2.6.2 Examples of probability distributions
    2.6.3 Examples of probability density functions
    2.6.4 Conditional and marginal probabilities
    2.6.5 Calculations with random variables
  2.7 Numerical methods
    2.7.1 Numerical integration
    2.7.2 Numerical differentiation
    2.7.3 Root finding
    2.7.4 Extreme finding
  Bibliography

3 Models for processes
  3.1 Types of processes
    3.1.1 Stochastic processes
  3.2 Systems
    3.2.1 Constraints on dynamics
    3.2.2 Feedback
    3.2.3 Asymptotic properties
  Bibliography

4 Model-based statistics
  4.1 Scope of statistics
  4.2 Measurements: scales and units
  4.3 Precision and accuracy
  4.4 Smoothing and interpolation
  4.5 Testing hypotheses
  4.6 Likelihood functions
  4.7 Large sample properties of ML estimators
  4.8 Likelihood ratio principle
    4.8.1 Likelihood based confidence region
    4.8.2 Profile likelihood
  4.9 Regression
  4.10 Composite likelihoods
  4.11 Parameter identifiability
  4.12 Monte Carlo techniques
  Bibliography

5 Notation
Preface
The field of theoretical biology uses elements from methodology, mathematics, and computer science to develop new insights in biology. Such developments also require elements from biology, physics, chemistry and earth sciences. Practice teaches us that this cocktail is hard to teach in a single course; it is simply too overwhelming. An extra handicap is that little knowledge of mathematics is less than adequate to deal with the complex non-linearities of life. Physics got its strength from simplification, both in theory and in experimental design. Biology, however, has little access to this powerful approach; the most simple living systems are still very complex. This is why biology still resembles a big bag of facts, semi-facts and artifacts. Yet, many theoretical biologists believe that it need not stay like this. The purpose of this document is to present an adequate formal toolkit that should suffice for most applications in biology.

Although many books exist on each of the topics we discuss, we found no book that just fills our educational needs. Mathematical books (and especially those on statistics) frequently have quite some material that is of little interest for the applications we have in mind, while they do not present some more advanced material that we think is essential for a basic toolkit. Mathematicians use mathematics differently from natural scientists, and have other purposes in mind. We here only focus on basic material for graduate students. You will find little about the linear models and techniques that dominate standard texts. The reason is that linearity hardly occurs in biology. We also omitted some standard material about computations of quantities like ranks, determinants, inverses etc., because basic computer routines are available and we need to be selective. You will find more on multivariate models than in elementary texts. We do realize that research frequently requires more than we offer, but the presented material should allow a rapid consultation of specialized literature.

We focus on conceptual aspects, and did not attempt to write a stand-alone document. The serious student will frequently feel the need to consult elementary textbooks that offer more background, derivations and context. We suggest titles we think to be appropriate. We assume practical knowledge about Octave and/or Matlab and the availability of the software package DEBtool, which can be downloaded from the electronic DEB laboratory at
http://www.bio.vu.nl/thb/deb/deblab
Design of the document on methods
A rather technical document explains elements of methodology, mathematics and computer science. The three disciplines start to blur and cross-fertilize each other in the chapters on modeling and statistics. The first part, on methods, has little material that is specific to any specialization in biology; it therefore remains somewhat abstract. The plan is to keep the document brief: not a collection of all-there-is, but a choice of the most basic methods, with an emphasis on concepts. The selection criterion for material is its use in the applications and exercises.
Applications
The second part of the document consists of an eventually large number of examples of applications in all fields of biology. The plan is to keep each example short, starting with a biological problem and its motivation, and coming back with an answer to that problem, using pieces of the toolkit that is offered in the first part.
If new examples are included that use methods that are not discussed in the first part, the first part will be extended to include these methods.
Exercises
Besides the methods document and the applications database, an eventually substantial collection of exercises and answers is set up. Each exercise has the structure: motivation, given, question, hints, answer. The exercises can illustrate a particular method and/or an application. They can make use of public domain software.
Octave and DEBtool, which is written in Octave and Matlab, can be used to make the exercises. It is also possible to use packages like Maxima (for symbolic calculations) and AUTO (for bifurcation analysis), for instance.
Use of the document
The general idea is that a graduate student who is trained in a particular specialization in biology can be offered a number of examples and exercises in his/her own field, together with the methods part of the document, to gain a working knowledge of Theoretical Biology.
Self improving document
We do have the idealistic view that universities have the task of optimizing the propagation of knowledge in a way that is as free of financial and cultural constraints as possible. We also believe that collaboration leads to improvement. This is why we have set up this education project in several phases.
Phase 1: design
The executing editor first writes a first draft to structure the whole project. This is more efficient than listing the plans in detail.
Phase 2: polishing
The editorial board polishes the material and supplies additions (especially of exercises and examples).
Phase 3: self improvement
The project is now open for contributions of examples and exercises from all over the world. The editorial board will function like that of a journal and judge incoming material, seeking advice from referees. If the material requires new methods, the author should make a proposal for such a text. The first part of the document, on methods, will remain the responsibility of the editorial board (for the time being). If material from submissions is used in this first part, the author's name will be mentioned as contributor. The author's name will remain associated with examples and exercises.
If the collection of examples and exercises grows large, it will be classified according to the biological subject.
Any suggestions for improvement are very welcome; please mail to deb@bio.vu.nl.
Where to get?
This document is freely downloadable from
http://www.bio.vu.nl/thb/course/tb/
and will be updated and improved continuously in educational practice. We hope to stimulate interest in the field of Theoretical Biology in this way, and to help teach a generation of students that will bring the field into blossom.
Disclaimer
This document is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.
Questions?
Send to deb@bio.vu.nl.
Chapter 1
Methodology
1.1 Empirical cycle
Like it or not, humans think in terms of models, although not everybody realizes that. The most important task of Theoretical Biology is to make implicit assumptions explicit, so that they can be replaced by others if necessary. Models have a lot in common with stories about quantities, phrased in the language of mathematics; they can have (and frequently do have) language errors, they can tell nonsense and they can be boring. They can also be exciting, however, depending on the way they are put together.

After identification of the scientific problem, the empirical cycle should start with the formulation of a set of assumptions, a derivation of a mathematical model from these assumptions, and a sequence of tests on consistency, coherence, parameter sensitivity, and relevance with respect to the problem. See Figure 1.1. Most models don't need to be tested against experimental data; they simply do not pass the theoretical tests.

The second part of the empirical cycle then consists of the formulation of auxiliary theory for how variables in the model relate to things that can be measured, the setup of adequate experiments and/or sampling and measurement protocols to test model predictions, the collection of the measurements, and statistical tests of model predictions against measurements. These tests could reveal that the protocols have been less than adequate and should be redesigned and executed, or that inadequacies exist in the auxiliary theory. So inconsistencies between data and model predictions do not necessarily point to inadequacies in the model itself.

If anywhere in this two-segment cycle the need appears to improve the model, it should not be changed directly; instead, the list of assumptions should be adapted and the whole process repeated. It is a long and painstaking process, but sloppy procedures easily lead to useless results. Advocates of putting the lead of the empirical cycle in the observations, rather than in the assumptions, are frequently unaware of the implicit assumptions that need to be made to give observations a meaning. The most important aspect of modeling is to make all assumptions explicit. If modeling procedures are followed in a sloppy way, by adapting models to fit data directly, it is likely that the result will be sloppy too; one easily falls into the trap of curve-fitting. If it comes to fitting curves to data, the use of a pencil, rather than a model, is so much easier.
[Figure 1.1 is a flowchart of the empirical cycle. Its steps read: (re)identify scientific problem & aim of research; collect observations (from literature); (re)formulate assumptions (hypotheses); derive mechanistically inspired math model; test model for consistency (dimensions, conservation laws); test model for coherence with related fields of interest; test model for efficiency with respect to aim of research (plasticity, simplicity); test model for numerical behaviour & qualitative realism; (re)design experiment to test model assumptions (factors to be manipulated); identify variables to be measured (type, accuracy, frequency); formulate relationship between measured & model variables; perform experiment & collect measurements; test model with respect to experimental results (statistical analysis); identify variables and parameters.]
Figure 1.1: The empirical cycle as conceived by a theoretician. In the knowledge that nonsense models can easily fit any given set of data well, given enough flexibility in the parameters, realism is not the first and not the most important criterion for useful models. Lack of fit (so lack of realism) just indicates that the modeling job is not completed yet. This discrepancy between prediction and observation can be used to guide further research, which is perhaps the most useful application of models. This application to improve understanding only works if the model meets the criteria indicated in the figure; few models meet these criteria, however.
It is common practice, unfortunately, to just pose and apply a model, with little attention for the underlying assumptions. If such a model fails one of the tests, nothing is left and one should start again from scratch. There cannot be a sequence of stepwise improvements in understanding and prediction. The fact that such a model fits data is of little use, perhaps only for interpolation purposes.

Models are idealizations and, therefore, always false in the strict sense of the word. This limits the applicability of the principle of falsification. A model can fit data for the wrong reasons, which means that the principle of verification is even more limited in applicability. This points to the criterion of usefulness to judge models, but usefulness is linked to a purpose. This is why a model should never be separated from its purpose. The purpose can contain elements such as an increase in understanding, or in predictability. An increase in understanding can turn a useful model into a less useful one.

If a model passes all tests, including that against experimental data, there is no reason to change the assumptions, and one can work with them until new evidence forces reconsideration. It might seem counter-intuitive, but models that fail the test against experimental data more directly serve their task in leading to greater insight, i.e. in guiding to the assumptions that require reconsideration. This obviously only works well if the step of the formulation of assumptions has been adequate. Models are a means of getting more insight, never an aim in themselves.

Theoretical biology specializes in the interplay between methodology, mathematics and computer science as applied in biological research. It is by its nature an interdisciplinary specialization in generalism and the natural companion of experimental biology. Both have complementary roles to play in the empirical cycle. We hope that Figure 1.1 makes clear that both specializations should be considered as obligate symbionts in the art of science. People with a distaste for models frequently state that a model is no more than you put into it. This is absolutely right, but instead of being a weakness, it is the single most important aspect of the use of models. Assumptions can have far-reaching consequences that cannot be revealed without the use of mathematics. Put into other words: any mathematical statement is either wrong or follows from assumptions. Few people throw mathematics away for this reason. Models play an important role in the mechanism of research, as will be discussed, but also in other contexts, such as in finding answers to 'what if' questions, and in solving extrapolation problems (see the chapter on statistics).

The next sections highlight some steps in the empirical cycle. Table 1.1 gives some practical hints.
1.2 Consistency
That proposition X is inconsistent with proposition Y means that they cannot both be true. Models that are internally inconsistent are meaningless, so they are useless. If different assumptions are directly contradictory, inconsistency is easy to detect. In many cases, however, this is much less easy. Inconsistencies come in many forms; lack of realism (meaning: a difference between measured data and model predictions for those data) is just one form (and one that comes in gradations).
Table 1.1: Some practical hints for starters in science

- Open a document with a unique label, your name, date and purpose. If your document is likely to contain quite some formulas, we suggest using LaTeX, which is public domain.
- Make a list of assumptions (refer to literature items for support).
- Make a list of symbols, variables and dimensions. Follow the mathematical rules for designing symbols; don't use names, like you will do in your computer code. Use different symbols for different dimension groups.
- Derive the equations, and insert any new assumptions or symbols that you need in the list. Include enough of these derivations in your document that you can understand them (much) later; check the consistency of your assumptions.
- Check dimensions before you proceed.
- Write computer code from your written formulas; we suggest using a fourth-generation language, such as Octave, which is public domain. Insert your name and date in the computer code. Refer in your code to the document where you listed the formulas. Make a link between the variables in your code and the symbols in your document.
- Make sure that your code is doing what your formulas prescribe. If your code is not working yet, you still don't have a problem. Problems start as soon as your code is producing something, and you have to answer the difficult question whether or not that something relates to your model.
- Get a numerical feel for the potential behaviour of your model by making lots of graphs using different choices of parameters.
- If you don't like the numerical behaviour of your model, don't start to change the code directly. Change assumptions first, re-do your derivations, then adapt the code (including the date of creation).
- Make various simplifications of your model to see what the different elements of your model do. Learn to think in terms of families of models, rather than 'the' model.
- Plan your experiment carefully, by imagining in detail what you are going to do with the results if you had them. Will the results answer the questions that you have?
- Think of calibrating your equipment before use; does the accuracy meet your requirements? Check mass and energy balances where possible.
- Specify experimental conditions (sources of materials that are used, temperature, etc.); label all experimental results carefully, you might want to re-use them at a much later moment; think of using a database.
- Fit your model to the data, and make a list of parameters, estimates and units. Never insert parameter values in models, because this obscures the units.
- Compare the parameter values with your expectations, based on the literature.
- What is your most promising next step? Discuss your results with colleagues. Consider contacting authors of papers that you read for your work; try to be as specific as possible in the questions that you will have.
Table 1.2: Symbols for frequently used dimensions: - dimensionless, # number, t time, l length, m mass, T temperature.
An example of a model inconsistency that is apparently not so easy to detect is the log-logistic model for the cumulative number of offspring per female at a standardized exposure time as a function of the concentration of test compound. This very popular model (on which much of the environmental risk assessment in the world is based) has the form $N(c) = N_0 \left(1 + (c/c_{50})^b\right)^{-1}$, where $N_0$ is the cumulative number of offspring in the blank, $c_{50}$ the so-called EC50 (50% Effective Concentration) and $b$ a parameter that relates to the slope of the concentration-response curve. The bioassay with (female) daphnids is started with a number of concentrations, and a cohort of neonates in each concentration. The individuals develop and start to reproduce after about 7 days; the bioassay runs for 21 days. The inconsistency reveals itself after the observation that reproduction rates tend to become constant after some time (after growth has ceased and the internal concentration has settled at some value). This means that the cumulative number of offspring eventually grows at a constant (concentration-dependent) rate. The implication is that, if the log-logistic model applies at some exposure time after the reproduction rates have stabilized, it cannot apply at any later exposure time. The assumption that the model applies at 21 days, together with the arbitrariness of this exposure period, in fact translates into the more stringent assumption that the model applies at all exposure times (even if it is not used at other exposure times). This cannot be true, and the fact that the model fits empirical data is meaningless in the knowledge that other models fit these data too. Users of this model are probably not aware of the implicit assumptions about the reproduction process in a model that does not have time as an explicit variable. This type of problem rarely occurs if one starts from assumptions about mechanisms, rather than assuming the applicability of a model.
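The shape of this concentration-response curve is easy to inspect numerically. A minimal Octave sketch, with purely illustrative parameter values (not taken from any particular bioassay):

    % log-logistic concentration-response for cumulative offspring
    N0  = 100;                 % offspring in the blank (illustrative)
    c50 = 1;                   % EC50 in concentration units (illustrative)
    b   = 2;                   % slope parameter (illustrative)
    N   = @(c) N0 ./ (1 + (c / c50) .^ b);    % N(c) = N0 (1 + (c/c50)^b)^(-1)
    c   = linspace(0, 5, 100);
    plot(c, N(c)); xlabel('concentration'); ylabel('cumulative offspring')

Note that exposure time appears nowhere in this code; that is exactly the implicit assumption criticized above.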
1.2.1 Dimensions
A dimension is an identifier for the physical nature of a quantity, see Table 1.2. A quantity of a particular dimension can be measured using several units (see Table 4.1); units determine the dimension fully. It is not necessary that all quantities in a model are measurable; the concept of dimension is more general than the concept of unit. Models that violate the rules for dealing with dimensions are meaningless; this is a special case of inconsistency which frequently relates to errors in the translation of assumptions into a model. This does not imply that models that treat dimensions well are necessarily useful models.

The elementary rules for manipulating dimensions are simple: addition and subtraction of variables are only meaningful if the dimensions of the arguments are the same, but the addition or subtraction of variables with the same dimensions is not always meaningful; meaning depends on interpretation. Multiplication and division of variables correspond with multiplication and division of dimensions. Simplifying the dimension, however, should be done carefully. A dimension that occurs in both the numerator and the denominator of a ratio does not cancel automatically. A handy rule of thumb is that such dimensions only cancel if the sum of the variables to which they belong can play a meaningful role in the theory. The interpretation of the variable and its role in the theory always remain attached to dimensions. So the dimension of the biomass density in the environment expressed on the basis of volume is cubed length (of biomass) per cubed length (of environment); it is not dimensionless. This argument is sometimes quite subtle. The dimension of the total number of females a male butterfly meets during its lifetime is number (of females) per number (of males), as long as males and females are treated as different categories. If it is meaningful for the theory to express the number of males as a fraction of the total number of animals, the ratio becomes dimensionless.
The connection between a model and its interpretation gets lost if it contains transcendental functions of variables that are not dimensionless. Transcendental functions, such as the logarithm, exponential and sine, frequently occur in models. pH is an example, where a logarithm is taken of a variable with dimension number per cubed length ($\ln \# l^{-3}$). When it is used to specify environmental conditions, no problems arise; it just functions as a label. However, if it plays a quantitative role, we must ensure that the dimensions cancel correctly. For example, take the difference between two pH values in the same liquid. This difference is dimensionless: $\dim(\mathrm{pH}_1 - \mathrm{pH}_2) = \ln \# l^{-3} - \ln \# l^{-3} = \ln(\# l^{-3} \cdot \#^{-1} l^{3})$, where the dimensions inside the logarithm cancel. In linear multivariate models in ecology, the pH sometimes appears together with other environmental variables, such as temperature, in a weighted sum. Here the dimension rules are violated and the connection between the model and its interpretation is lost.
Another example of a model is the Arrhenius relationship, where the logarithm of a rate is linear in the inverse of the absolute temperature: $\ln \dot{k}(T) = \alpha - \beta T^{-1}$, where $\dot{k}$ is a rate, $T$ the absolute temperature, and $\alpha$ and $\beta$ are regression coefficients. At first sight, this model seems to violate the dimension rule for transcendental functions. However, it can also be presented as $\dot{k}(T) = \dot{k}_\infty \exp(-T_A T^{-1})$, where $T_A$ is a parameter with dimension temperature and $\dot{k}_\infty$ is the rate at very high temperatures. In this presentation, no dimension problem arises. So, it is not always easy to decide whether a model suffers from dimension problems.
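The equivalence of the two presentations is easy to check numerically. A minimal Octave sketch, with illustrative values for the Arrhenius temperature $T_A$ and the high-temperature rate:

    % Arrhenius relationship in two equivalent parameterizations
    TA    = 8000;              % Arrhenius temperature in K (illustrative)
    k_inf = 1e10;              % rate at very high temperature (illustrative)
    k     = @(T) k_inf * exp(-TA ./ T);       % k(T) = k_inf exp(-T_A/T)
    T     = 273:313;           % absolute temperatures in K
    % ln k is linear in 1/T: slope -T_A and intercept ln k_inf
    p = polyfit(1 ./ T, log(k(T)), 1);
    disp([p(1) -TA; p(2) log(k_inf)])         % fitted versus true coefficients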
A further example of a model is the allometric function in body-size scaling relationships: $\ln y(x) = \alpha + \beta \ln x$, or $y(x) = \alpha x^{\beta}$ (with $\alpha$ redefined), where $y$ is some variable, $x$ has the interpretation of body weight, the parameter $\beta$ is known as the scaling exponent, and $\alpha$ as the scaling coefficient. At first sight, this model also seems to violate the dimension rule for transcendental functions. Huxley introduced it as a solution of the differential equation $\frac{dy}{dx} = \beta \frac{y}{x}$. This equation does not suffer from dimensional problems, nor does its solution $y(x) = y(x_1)\,(x/x_1)^{\beta}$. However, this function has three rather than two parameters. It can be reduced to two parameters for dimensionless variables only. The crucial point is that, in most body-size scaling relationships, a natural reference value $x_1$ does not exist for weights; the choice is arbitrary. The two-parameter allometric function violates the dimension rule for transcendental functions; uncertainty in the value of $\beta$ translates into an uncertainty in the dimensions of $\alpha$. Although this has been stated by many authors, the use of allometric functions is so widespread in energetics that it almost seems obligatory.
Variables are frequently transformed into dimensionless variables to simplify the model and to get rid of as many parameters as possible. This makes the structure of the model more visible and, of course, is essential for understanding the range of possible behaviours of the model when the parameter values change. The actual values of parameters are usually known with a high degree of uncertainty and they can vary a lot. Buckingham's theorem states that any relationship between $m$ variables $x_i$ of the form $f(x_1, \ldots, x_m) = 0$ can be rewritten as a relationship between $n = m - s$ dimensionless variables $y_i = h_i(x_1, \ldots, x_m)$ of the form $g(y_1, \ldots, y_n) = 0$, if the $x$'s have $s$ different dimensions.
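As a worked illustration (anticipating the Monod degradation model of Figure 1.2), consider $\frac{d}{dt}X = -\dot{b}\,X/(X_K + X)$ with initial value $X_0$. The five quantities $X$, $t$, $\dot{b}$, $X_K$ and $X_0$ involve $s = 2$ dimensions (mass per volume, and time), so Buckingham's theorem promises a formulation in $5 - 2 = 3$ dimensionless quantities, for instance

\[ x \equiv \frac{X}{X_0}, \qquad \tau \equiv \frac{\dot{b}\,t}{X_K + X_0}, \qquad x_K \equiv \frac{X_K}{X_0}, \qquad \text{giving} \qquad \frac{dx}{d\tau} = -(x_K + 1)\,\frac{x}{x_K + x}, \quad x(0) = 1 . \]

All parameters except the single dimensionless $x_K$ have been absorbed into the scaling; this is exactly the scaled form used in Figure 1.2.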
1.2.2 Conservation laws
Models that violate the conservation laws for mass, energy or time (or other conserved quantities) are rarely useful; this is a milder form of inconsistency. (The physical conversion between mass and energy occurs on scales in space and time that are of little relevance to life on earth.) Conservation laws can frequently be written as a constraint on the state variables $x_j$ of a system in the form $f_i(x_1, \ldots, x_n) = 0$, where the index $i$ relates to the different conserved quantities (such as chemical elements, energy, etc.).

Thermodynamics makes a most useful distinction between intensive variables, which are independent of size, such as temperature, concentration, density, pressure, viscosity, molar volume and molar heat capacity, and extensive variables, which depend on size, such as mass, heat capacity and volume. Extensive variables can sometimes be added in a meaningful way if they have the same dimension, but intensive variables cannot. Concentrations, for example, can only be added when they relate to the same volume. Then they can be treated as masses, i.e. extensive variables. When the volume changes, we face the basic problem that while concentrations are the most natural choice for dealing with mechanisms, we need masses, i.e. absolute values, to make use of conservation laws. This is one of the reasons why one needs a bit of training to apply the chain rule for differentiation.
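A sketch of how such a constraint can be checked numerically: the rows of a composition matrix count the elements in each compound, and a transformation conserves mass if the matrix times the vector of stoichiometric coefficients is zero. The transformation below (aerobic combustion of glucose) is only an illustration:

    % element conservation check: glucose + 6 O2 -> 6 CO2 + 6 H2O
    % columns: glucose, O2, CO2, H2O; rows: elements C, H, O
    n = [ 6  0  1  0;          % C
         12  0  0  2;          % H
          6  2  2  1];         % O
    y = [-1; -6; 6; 6];        % stoichiometric coefficients (negative = consumed)
    disp(n * y)                % all zeros: each element is conserved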
1.3 Coherence
Coherence is the natural (logical) relationship between quantities. Assumptions should not contradict known relationships in the context of the model. While consistency can only be judged for rather precise quantitative propositions, coherence is weaker and is judged for more qualitative propositions. Consistency mainly applies to the assumptions in direct relationship with the problem; coherence applies to the scientific neighbourhood of the problem in a wider context. The problem that everything depends on everything else in biology has strong implications for models that represent theories. When $y$ depends on $x$, it is usually not hard to formulate a set of assumptions that imply a model that describes the relationship with acceptable accuracy. This also holds for a relationship between $y$ and $z$. When more and more relationships are involved, the cumulative list of assumptions tends to grow and it becomes increasingly difficult to keep them consistent. This holds especially when the same variables occur in different relationships. Moreover, the inclusion of more variables in the model also comes with an increase in constraints that relate to known properties of those variables.
1.3.1 Scales in organization
The field of biology ranges from molecules, via cells, individuals and populations, to ecosystems and system earth. These levels of organization concern scales in space as well as in time. The words of Pascal still apply:

  The whole can only be understood in terms of its parts, but the parts can only be understood in the context of the whole.

Recent successes in molecular biology have made holistic thinking less popular, however. Some workers seem to believe that soon they can explain all of biology from the molecular level. The principle of reduction in science relates to the attempt to explain phenomena in terms of the smallest feasible objects. The hope for success can only be poor, however. Knowledge about the technical details of engines in automobiles is extremely valuable for optimizing design and reducing air pollution, but it is of little help in fighting traffic jams. Similar relationships hold between molecular biology and ecology; these specializations focus on different space-time scales and deal with different processes that partially overlap.

Scales in space and in time are coupled in modelling because of the problem of complexity (see the next section). Models with a large time scale and a small spatial scale (or vice versa) will be complex, and complex models are not very useful. Using impressive computing power, it is feasible to model water transport in the earth's oceans, which seems to defeat the coupling of scales. The modeling of this physical transport, however, involves only a limited number of parameters (and processes), given the shape of the ocean basins, explicit external wind forcing and information on planetary rotation.
1.4 Efficiency

A model should be well balanced in its level of detail. It makes little sense to construct a model for $x$, $y$ and $z$ that is very detailed in the relationship between $x$ and $(y, z)$, but not detailed at all in the relationship between $y$ and $z$. The avoidance of such imbalance by increasing the level of detail between $y$ and $z$ easily leads to complex models. All details should have a necessary function in the model, both conceptually and numerically; the principle of parsimony is to leave out less important details. What is a detail and what an essential feature depends on the problem. The efficiency criterion boils down to the match between the essential features of the model and those of the problem, which ensures that the model can be used optimally to find answers to the problem.

A major trap in model building is the complexity caused by a large number of variables. This trap became apparent with the advent of computers, which removed the technical and practical limitations on the inclusion of many variables. Each relationship and each parameter in a relationship comes with an uncertainty, frequently an enormous one in biology. With considerable labour, it is usually possible to trim computer output to an acceptable fit with a given set of observations. This, however, gives minimal support for the realism of the whole, which turns simulation results into a most unreliable tool for making predictions in other situations. The need for compromise between simplism and realism makes modeling an art that is idiosyncratic to the modeler.

The only solution to the trap of complexity is to use nested modules. Sets of closely interacting objects are isolated from their environment and combined into a new object, a module, with simplified rules for input-output relationships. This strategy is basic to all science. A chemist does not wait for the particle physicists to finish their job, though the behaviour of the elementary particles determines the properties of the atoms and molecules taken as units by the chemist. The same applies to the ecologist who does not wait for the physiologist. The existence of different specializations testifies to the relative success of the modular approach.

The problems that come with defining modules are obvious, especially when they are rather abstract. The first problem is that it is always possible to group objects in different ways to form new objects, which then makes them incomparable. The problem would be easy if we could agree about the exact nature of the basic objects, but life is not that simple. The second problem with modules lies in the simplification of the input-output relationships. An approximation that works well in one circumstance can be inadequate in another. When different approximations are used for different circumstances, and this is done for several modules in a system, the behaviour of the system can easily become erratic and the approximations no longer contribute insight into the behaviour of the real thing.

In the first part of the empirical cycle, where the properties of models are analyzed, a powerful tool is to focus on the most simple models, and to compare different models, where particular variables are included and excluded to study the effect of that variable. This can sometimes be done rather systematically, and families of models can be compared within a given framework. This is the happy hunting ground of Mathematical Biology, where model simplicity allows the application of powerful mathematics.
1.5 Numerical behaviour
Before the realism of a model can be tested in a sensitive way, we need to study how the numerical behaviour of the model depends on the values of the variables and the parameters. Knowledge about the plasticity of the model is important in the estimation of parameter values, and in the best design of experiments; see the section on support. The rescaling of variables to dimensionless quantities is a very useful tool to reduce the complexity of the model by eliminating parameters. A very useful strategy is to choose combinations of parameter or variable values that kick out a particular mechanism, and to compare the results with other choices of values. The contribution of each mechanism to the end result can be studied this way. It frequently happens that a few combinations of a number of parameters (mainly) determine the numerical behaviour, rather than each parameter separately. This reveals opportunities to simplify the model. Odd behaviour of the model can point to undesirable interactions of assumptions, but more frequently to simple programming errors. If the odd behaviour is a genuine implication of the assumptions, however, this is most helpful in the design of experiments to test its realism.
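A minimal Octave sketch of this kind of exploration, using the scaled n-th order degradation curve $x(\tau) = (1 - (1-n)\tau)^{1/(1-n)}$ that returns in Figure 1.2; the grid of orders n is an arbitrary choice:

    % numerical feel for the scaled n-th order degradation model
    x   = @(tau, n) max(1 - (1 - n) * tau, 0) .^ (1 / (1 - n));
    tau = linspace(0, 3, 200);
    hold on
    for n = [0 0.5 1.5 2]      % n = 1 is excluded: it is the limit exp(-tau)
      plot(tau, x(tau, n))
    end
    plot(tau, exp(-tau))       % the limiting curve for n = 1
    hold off; xlabel('scaled time'); ylabel('scaled concentration')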
1.5.1 Testability
Models that cannot be tested against experimental results are likely to be useless. Testability, however, comes in gradations. In most cases assumptions can be tested only indirectly, which involves other assumptions. This complicates the process of replacing unrealistic assumptions in an attempt to find realistic ones, but it does affect the usefulness of a model.

The variables that are easy to measure, or those that will be used to test the model, are not always those that should be state variables. An example is the metabolic rate, which is measured as the respiration rate, i.e. the oxygen consumption rate or the carbon dioxide production rate. The metabolic rate has different components, each of which follows simple rules. The sum of these components is then likely to behave in a less simple way in non-linear models. The same holds, for example, for dry weights, which can be decomposed into structural biomass and reserve materials. A direct consequence of such partitioning is that experimental results that only include composite variables are difficult to interpret. For mechanistic models, it is essential to use variables that are the most natural players in the game. The relationship between these variables and those to be measured is the next problem to be solved, once the model is formulated.
1.6 Experimental design
The art of experimental design fully rests on the prediction of the experimental results, and the choice of statistical procedures that will be used to evaluate those results; it is a form of reversed reasoning. The choice of experimental conditions, the type of measurements to be made, the details of the sampling protocols to be used, and the other choices that have to be made can be motivated, for instance, by the minimization of the confidence intervals of particular model parameters that will be estimated from the experimental results. A problematic aspect of explicit optimization of design is that model parameters have to be known, while the experiment is usually done because they are not known. One has to rely on guesses, which might be in the wrong ball park. Moreover, the optimization of experimental design usually also involves constraints in terms of financial costs (including effort), ethical aspects, and the availability of materials. The numerical analysis of the model (see the previous section) is the main source of inspiration in the design of experiments.
1.7 Identification of variables to be measured

What can be measured and the precision of measurements depend on technical possibilities and financial costs that come with their own constraints. In the most straightforward and ideal situation the variables that occur in the model can be measured directly, without interference with the system (experimental artifacts). Practice is usually remote from this ideal situation.

The usual situation is that the variables that can be measured differ from those in the model, which calls for additional modelling of how the two sets are related. These models come with new parameters, and the numerical behaviour of this model (with variables that can be measured) should again be studied to optimize the design of the experiment and reduce the complexity of the model. It frequently happens that the experiment is not a single experiment, but a set of possibly very different experiments, in which different variables are measured. Some of these experiments require experimental pilot studies before the final experiment can be set up in an optimal way.

It is physically impossible to measure something without interference with the system. The amount of disturbance must be evaluated in one way or another, usually by comparing the results of experiments in which the disturbance is of a different nature.

Before actually performing costly (and/or time-consuming) experiments it can be very useful to fake the possible measured values first, and to complete the full cycle of statistical testing using these faked values. This may reveal that it is unrealistic to expect that the experimental results can possibly be satisfying, and that more effort should be invested in further optimizing the design or in the setup of alternative experiments.
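A sketch of such a rehearsal in Octave, assuming (purely for illustration) that the model to be tested is exponential decay; the guessed parameter values and scatter level stand in for prior knowledge:

    % rehearse the experiment with faked measurements before doing it
    k_guess = 0.2;  X0 = 10;  sd = 0.5;       % guessed values
    t  = (0:3:21)';                           % planned sampling times (days)
    Xf = X0 * exp(-k_guess * t) + sd * randn(size(t));   % faked data
    ssq  = @(p) sum((Xf - p(1) * exp(-p(2) * t)) .^ 2);  % fit by least squares
    phat = fminsearch(ssq, [8; 0.1]);
    disp(phat')    % are X0 and k recovered well enough to answer the question?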
1.8 Experimentation
This is not the right place to focus on the many aspects of experimentation, e.g. procedures for the calibration of measurement devices. As in modelling, testing mass and energy balances can be a very useful tool to check experimental results for consistency in ways that hardly depend on modelling details.
1.9 Realism
Model predictions for measurements (or experimental results) will always differ from those measurements, because models are idealisations and repeated measurements are not identical. Whether a given difference is large or small depends on the specified problem. The judgement can usually be formalized in a statistical procedure for that problem, called a test, that can result in the judgement 'unacceptably large', in which case the model has failed the test against experimental data. A realistic model is a model that predicts measured values with only a small difference. The measurements then support the model, and give no reason to change or replace assumptions. Such support can, however, never prove that the assumptions are right.

The amount of support that a successful test of a model gives depends on the model structure and has an odd relationship with the ability to estimate parameters: the better one can estimate the parameters, the less support a successful test of a model gives. This is a rather technical but vital point in work with models. It can be illustrated with a simple model that relates $y$ to $x$, and which has a few parameters to be estimated on the basis of a given set of observations $(x_i, y_i)$. We make a graph of the model for a given interval of the argument $x$, and get a set of curves if we choose different values of the parameters between realistic boundaries. Two extremes could occur, with all possibilities in between:
- The curves have widely different shapes, together filling the whole $x,y$-rectangle of the plot. Here, one particular curve will probably match the plotted observations, determining the parameters in an accurate way, but a close match gives little support for the model; if the observations were totally different, another curve, with different parameter values, would have a close match.

- The curves all have similar shapes and are close together in the $x,y$-rectangle of the plot. If there is a close match with the observations, this gives substantial support for the model, but the parameter values are not well determined by the observations. Curves with widely different parameter values fit equally well.
Two alternative models for biodegradation, with the same number of parameters, illustrate both situations in Figure 1.2. Of course, the choice of the model's structure is not free; it is dictated by the assumptions. So testability is a property of the theory, and nice statistical properties can combine with nasty theoretical ones, and vice versa. It is essential to make this distinction.

An increase in the number of parameters usually allows models to assume a much wider range of shapes in a graph. This is closely connected with the structural property of models just mentioned. So a successful test against a set of observations gives little support for such a model, unless the set includes many variables as well. A fair comparison of models should be based on the number of parameters per variable described, not on the absolute number.
1.9.1 Stochastic versus deterministic models
Observations show scatter, which reveals itself if one variable is plotted against another. It is such an intrinsic property of biological observations that deterministic models should be considered incomplete. The mechanism behind scatter is frequently the effect of a large number of factors that influence the result but are not modeled explicitly. Think, for instance, about modeling the outcome of throwing a die. A complex deterministic model can (in principle) predict the outcome, when the forces, the trajectory in the air, and the tumbling and bouncing are modeled in great detail, including the many imperfections of die and table. A very simple stochastic model (with the six possible outcomes having equal probability) usually works better, because most parameters of the deterministic model are not known, and the process of throwing cannot be controlled in sufficient detail. This example reveals that it should usually be possible to reduce scatter (deviations between measurements and predictions by deterministic models) either by modeling more factors, or by excluding the scatter-inducing factors experimentally.

Only complete models, i.e. those that describe observations which show scatter, can be tested. The standard way of completing deterministic models is to add measurement error.
[Figure 1.2 shows two panels: scaled concentration $x(\tau)$ against scaled time $\tau$ for n-th order degradation (left) and Monod degradation (right), each drawn for several parameter values. The formulas that generate the curves are:]

n-th order degradation
  Dynamics: $\frac{d}{dt} X = -\alpha X^n$
  Solution: $X(t) = \left( X_0^{1-n} - (1-n)\,\alpha t \right)^{1/(1-n)}$
  Special cases: $X(t)_{n=0} = X_0 - \alpha t$ for $t < X_0/\alpha$; $X(t)_{n=1} = X_0 \exp(-\alpha t)$
  Scaled solution: $x(\tau) = (1 - (1-n)\tau)^{1/(1-n)}$ with $x \equiv X/X_0$; $\tau \equiv \alpha t X_0^{n-1}$

Monod degradation
  Dynamics: $\frac{d}{dt} X = -\dot{b}\, \frac{X}{X_K + X}$
  Solution: $0 = X(t) - X_0 + X_K \ln(X(t)/X_0) + \dot{b} t$
  Special cases: $X(t)_{X_K \ll X_0} = X_0 - \dot{b} t$ for $t < X_0/\dot{b}$; $X(t)_{X_K \gg X_0} = X_0 \exp(-\dot{b} t/X_K)$
  Scaled solution: $0 = x(\tau) - 1 + x_K \ln x(\tau) + (x_K + 1)\tau$ with $x \equiv X/X_0$; $\tau = \frac{\dot{b} t}{X_K + X_0}$; $x_K = X_K/X_0$

Figure 1.2: The n-th order model for the biodegradation of a compound X during time t is much more flexible in its morphology as a function of parameter values than the Monod model, while both models have three parameters (the initial concentration $X_0$, a rate parameter $\alpha$ or $\dot{b}$, and a shape parameter: the order $n$ or the saturation constant $X_K$). Both models give identical $X(t)$ curves if $n = 0$ and $X_K \downarrow 0$, and if $n = 1$ and $X_K \to \infty$. While all possible shapes of curves for the Monod model lie between these two boundaries, many other shapes are possible for the n-th order model. This means that observations better determine the parameter values of the n-th order model, but that a good fit gives less support, compared to the Monod model. Moreover, the n-th order model suffers from dimension problems if n is not an integer, and has a more complex link with mechanisms, if any.
The definition of a measurement error is that, if the measurements are repeated frequently enough, the error will disappear in the mean of these observations. Such models are called regression models: $y_i(x_i) = f(x_i|\text{pars}) + e_i$. They are characterized by a deterministic part, here symbolized by the function $f$, plus a stochastic part, $e_i$. The latter term is usually assumed to follow a normal probability density, with mean 0 and a fixed variance, which is one of the parameters of the model.

The interpretation of scatter as measurement error originates from physics. It is usually not realistic in biology, where many variables can be measured accurately in comparison with the amount of scatter. The observations just happen to differ from model expectations. When the scatter is large, the model is useless, despite its goodness of fit as a stochastic model. A realistic way of dealing with scatter is far from easy and usually gives rise to highly complicated models. Modelers are frequently forced to compromise between realism and mathematical over-simplicity. This further degrades the strict application of goodness-of-fit tests for models with unrealistic stochastic components.
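A minimal Octave sketch of such a regression model, with an arbitrary saturating curve standing in for f; for normally distributed errors with fixed variance, least squares estimation coincides with maximum likelihood (see the chapter on statistics):

    % regression model y_i = f(x_i|pars) + e_i with normal errors
    f  = @(x, p) p(1) * x ./ (p(2) + x);      % deterministic part (stand-in)
    x  = (1:10)';  ptrue = [5; 2];  sd = 0.3;
    y  = f(x, ptrue) + sd * randn(size(x));   % stochastic part: normal scatter
    ssq  = @(p) sum((y - f(x, p)) .^ 2);
    phat = fminsearch(ssq, [1; 1]);           % least squares = ML here
    s2   = ssq(phat) / numel(x);              % ML estimate of the error variance
    disp([phat' s2])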
1.10 Logic
Deduction is the inferring of particular instances from a general law. This is the inverse of induction: the inference of a general law from particular instances. Most scientific reasoning has both deductive and inductive components. Deductive logic consists of two parts: propositional and predicate logic.
1.10.1 Propositional logic
Propositional logic deals with the (re)construction of statements from five elementary operations on particular statements p and q:

  name                  expression          symbol
  negation              not-p               ¬p
  disjunction           p or q              p ∨ q
  conjunction           p and q             p ∧ q
  material implication  if p then q         p → q
  material equivalence  p if and only if q  p ↔ q

Based on the idea that a statement can be either true (T) or false (F), two fundamental truth tables can be constructed:

  p  ¬p        p  q   p ∨ q  p ∧ q  p → q  p ↔ q
  T  F         T  T     T      T      T      T
  F  T         T  F     T      F      F      F
               F  T     T      F      T      F
               F  F     F      F      T      T

The systematic application of these rules reveals, for instance, that the tautology ((p → q) ∧ p) → q is always true and the contradiction ((p → q) ∧ p) ∧ ¬q is always false, for all false/true combinations of p and q.
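Such tables can be generated mechanically. A small Octave sketch, coding T as 1 and F as 0, and material implication as ¬p ∨ q:

    % truth tables for the elementary operations
    p = [1 1 0 0]';  q = [1 0 1 0]';
    imp = ~p | q;                   % material implication
    disp([p q (p | q) (p & q) imp (p == q)])
    % the tautology ((p -> q) & p) -> q is true on every row:
    disp(~(imp & p) | q)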
An argument in formal logic leads from premises to a conclusion, separated by a horizontal bar. Some valid argument forms are:

  modus ponens:             p → q,  p        therefore  q
  modus tollens:            p → q,  ¬q       therefore  ¬p
  disjunctive syllogism:    p ∨ q,  ¬p       therefore  q
  hypothetical syllogism:   p → q,  q → r    therefore  p → r

Some fallacious argument forms are:

  affirming the consequent: p → q,  q        therefore  p
  denying the antecedent:   p → q,  ¬p       therefore  ¬q
  asserting an alternative: p ∨ q,  p        therefore  ¬q
1.10.2 Predicate logic
While propositional logic deals with operations on elementary statements, predicate logic deals with the internal structure of such statements. This structure is again captured in symbols.

If thing a has property P, we write the statement Pa for a one-place predicate. If relationship R exists between thing a and thing b, we write Rab for a two-place predicate. Things can be constants or variables; if they are variables, statements require a quantifier. The symbol ∀x is used for the universal quantifier, i.e. 'for all x'; so ∀x(... x ...) means 'for all x, we have the statement ... x ...'. The symbol ∃x represents the existential quantifier; ∃x(... x ...) means 'there exists at least one x for which the statement ... x ... holds'. If a statement has no quantifier, it is called a singular statement.

Arguments related to the modus ponens are:

  ∀x(Px → Qx),  Pa                     therefore  Qa
  ∀x∀y((Sxy ∧ Fxy) → Cxy),  Sab ∧ Fab  therefore  Cab

where x and y are variables, and a and b are constants.

The following statements are pairwise equivalent:

  ¬∀x(Px)  ≡  ∃x(¬Px)
  ¬∃x(Px)  ≡  ∀x(¬Px)
  ∀x(Px)   ≡  ¬∃x(¬Px)
  ∃x(Px)   ≡  ¬∀x(¬Px)

Intuition will often suffice to evaluate arguments in predicate logic.
Chapter 2
Mathematical toolkit
2.1 Sets
A set is a collection of objects that satisfy a particular property. If such a property can be absent or present, the set can be depicted graphically in a Venn diagram (after John Venn, 1881), see Figure 2.1. If an object x has that property, it belongs to the set A, indicated by x ∈ A; if not, it belongs to the complement of that set, indicated by x ∈ A^c. The set of all objects U is said to be partitioned into the sets A and A^c. The union of the two sets, A ∪ B, is the set of objects that have at least one of the two properties; the intersection of the two sets, A ∩ B, is the set of objects that have both properties. If n(A) denotes the number of elements of A, we have

  n(A ∪ B) = n(A) + n(B) − n(A ∩ B)

If all elements (also called members) of set B are also elements of A, B is said to be a subset of A, denoted by B ⊂ A, or A ⊃ B. If the intersection of two sets is empty, the sets are said to be disjoint, denoted by A ∩ B = ∅.

Some rules for the three Boolean operators ∪, ∩ and ^c on subsets A, B and C of U are

  Idempotent laws: A ∪ A = A; A ∩ A = A
  Commutative laws: A ∪ B = B ∪ A; A ∩ B = B ∩ A
  Associative laws: C ∪ (A ∪ B) = (C ∪ A) ∪ B; C ∩ (A ∩ B) = (C ∩ A) ∩ B
  Absorption laws: A ∪ (A ∩ B) = A ∩ (A ∪ B) = A
  Modular law: if C ⊂ B, then C ∪ (A ∩ B) = (C ∪ A) ∩ B
  Distributive laws: C ∪ (A ∩ B) = (C ∪ A) ∩ (C ∪ B); C ∩ (A ∪ B) = (C ∩ A) ∪ (C ∩ B)
  Universal bounds: C ∩ ∅ = ∅; C ∪ ∅ = C; C ∩ U = C; C ∪ U = U
  Complementarity: C ∩ C^c = ∅; C ∪ C^c = U
Figure 2.1: Venn's diagram depicts the set A of objects that have a certain property with a circle; the other objects don't have that property. The set B of objects that have another property can partly overlap. The union of the two sets, A ∪ B, is the set of objects that have at least one of the two properties (green and yellow); the intersection of the two sets, A ∩ B, is the set of objects that have both properties (yellow).
  Involution law: (Cᶜ)ᶜ = C
  de Morgan's laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ; (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
An interval of real numbers is the set of all real numbers between the boundaries x₀ and x₁. If the set includes the boundaries, the interval is said to be closed, [x₀, x₁]; if not, it is open, (x₀, x₁). It can also be half-open, (x₀, x₁] or [x₀, x₁). An interval is said to be unbounded if it includes all numbers larger or smaller than some number a, and we write [a, ∞) or (−∞, a). If the interval includes all real numbers we write (−∞, ∞). Notice that ∞ and −∞ are not real numbers.
2.1.1 Combinatorics
We will only encounter a very few basic results of combinatorics, the art of counting. The number of different ways to order n objects is n factorial: n! ≡ ∏_{i=1}^n i, so e.g. 4! = 1·2·3·4 = 24. We have that 0! = 1! = 1. The number of different selections of x out of n objects is 'n over x', the binomial coefficient, here written C(n, x):

  C(n, x) ≡ n!/(x! (n − x)!)  for x ≤ n

E.g. C(4, 2) = 4!/(2! 2!) = 24/4 = 6. We have

  C(n, x) = C(n, n − x)  and  C(n, x − 1) + C(n, x) = C(n + 1, x).
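A small sketch (again in Python, with only the standard library) that checks the factorial definition, the symmetry rule and Pascal's rule above:

from math import comb, factorial, prod

n = 4
assert factorial(n) == prod(range(1, n + 1)) == 24    # n! = 1*2*...*n
assert comb(4, 2) == factorial(4) // (factorial(2) * factorial(2)) == 6

for m in range(1, 10):
    for x in range(1, m + 1):
        assert comb(m, x) == comb(m, m - x)                    # symmetry
        assert comb(m, x - 1) + comb(m, x) == comb(m + 1, x)   # Pascal's rule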
2.2 Operators
An operator (or mapping or transformation) T : A → B is a rule which assigns an object y in the set B (the image) to each object x in the set A (the domain). We denote the object in B assigned to x by y = T(x) or by y = Tx. Both x and y can not only be single objects, but also lists of objects.
An operator is said to be injective or one-to-one if different elements in the domain have different images. An operator is surjective, or a mapping of A onto B, if every y ∈ B is the image of at least one x ∈ A. An operator is said to be bijective if it is both injective and surjective. For such a mapping, an inverse mapping exists.
2.3 Numbers
A scalar is a number (in contrast to e.g. a matrix, which will be discussed later). Special
types of numbers are
Real numbers are the familiar numbers, which can be positive, zero as well as negative. They have infinitely many decimals. If all decimals after a certain decimal are zero, the number is said to have a terminating decimal expansion. The absolute value |a| of real number a is a if a ≥ 0, or −a if a < 0. Special types of real numbers are
  Integers: the numbers 0, ±1, ±2, ⋯.
  Rational numbers: ratios of integers (so they include the integers).
Complex numbers are numbers of the form c = a + bi, where real number Re(c) = a is called the real part, real number Im(c) = b is called the imaginary part, and i is defined as i = √−1 (so i² = −1). The complex numbers include the real ones (b = 0).
The conjugate of a + bi is defined to be a − bi; both numbers form a conjugated pair. Simple rules exist for operations with complex numbers:

  Addition: (x + iy) + (u + iv) = x + u + i(y + v)
  Subtraction: (x + iy) − (u + iv) = x − u + i(y − v)
  Multiplication: (x + iy)(u + iv) = xu − yv + i(xv + yu)
  Division: (x + iy)/(u + iv) = (xu + yv)/(u² + v²) + i(yu − xv)/(u² + v²)

Complex numbers have important applications in e.g. the roots of functions, and the analysis of the behaviour of dynamic systems.
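The division rule above can be checked against Python's built-in complex type (a sketch; the numbers are arbitrary):

x, y, u, v = 1.0, 2.0, 3.0, -4.0
z1, z2 = complex(x, y), complex(u, v)

by_rule = complex((x*u + y*v) / (u**2 + v**2), (y*u - x*v) / (u**2 + v**2))
assert abs(z1 / z2 - by_rule) < 1e-12

assert z1.conjugate() == complex(x, -y)   # the conjugated pair
assert (1j)**2 == -1                      # i^2 = -1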
2.4 Functions
A function f : A → B is an operator (so a rule) which assigns a unique number y in the set B (the range) to each number x in the set A (the domain). We denote the number in B assigned to x by y = f(x).
We call x the independent variable and y the dependent variable. The specification of f can contain constants, called parameters.
A function f is said to be monotonously increasing if f(x) > f(x₁) for all x > x₁; monotonously decreasing if f(x) < f(x₁); monotonously non-decreasing if f(x) ≥ f(x₁); monotonously non-increasing if f(x) ≤ f(x₁). A function f is said to be even if f(−x) = f(x), or odd if f(−x) = −f(x). A function is homogeneous of the n-th degree if f(ax) = aⁿ f(x) for some arbitrary positive constant a.
Special types of functions are
  Polynomials: functions of the type f(x) = Σ_{i=0}^n aᵢ xⁱ, where n is called the degree of the polynomial if aₙ ≠ 0.
  Rational functions: ratios of two polynomials, f(x) = p(x)/q(x), where p(x) and q(x) are polynomials.
Figure 2.2: A triangle with a right angle and sides of length a, b and c = √(a² + b²) illustrates the basic trigonometric functions of angle φ. We have

  sin φ = b/c    csc φ = c/b
  cos φ = a/c    sec φ = c/a
  tan φ = b/a    cot φ = a/b
  Algebraic functions: functions which only include addition, multiplication, division, and taking powers.
  Composition of two functions g and h: the function f(x) = g(h(x)), for all x in the domain of the inner function h such that h(x) is in the domain of the outer function g.
  Transcendental functions:
    exponential function f(x) = aˣ, where parameter a (the base) is real;
    logarithmic function f(x) = log_a x if a^{f(x)} = x;
    trigonometric functions sin(x), cos(x), tan(x), cot(x).
Many functions in practice are only defined implicitly in the form g(x, y) = 0, where y = f(x) is considered as a function of x provided that y is uniquely defined by this equation.
A function f is the inverse function of function g if f(g(x)) = x for all values of x in the domain of g. We then also have that g(f(x)) = x. Examples are the pairs

  y = log x and exp y, for x > 0
  y = arctan x and tan y, for −π/2 < y < π/2
  y = arcsin x and sin y, for −π/2 < y < π/2
2.4.1 Trigonometric functions
Figure 2.2 illustrates the definitions of the basic trigonometric functions sine, cosecant, cosine, secant, tangent and cotangent. The relationships between these functions follow from the observation that the sum of the three angles of a triangle amounts to π (where π radians = 180 degrees). We have the following relationships

  csc φ = 1/sin φ;  sec φ = 1/cos φ;  cot φ = 1/tan φ;  tan φ = sin φ / cos φ.
  cos(−φ) = cos φ;  sin(−φ) = −sin φ;  cos(φ + 2π) = cos φ;  sin(φ + 2π) = sin φ.
  cos²φ + sin²φ = 1;  1 + tan²φ = sec²φ;  1 + cot²φ = csc²φ.
  sin(φ + θ) = sin φ cos θ + cos φ sin θ;  cos(φ + θ) = cos φ cos θ − sin φ sin θ.
  cos²φ = (1 + cos 2φ)/2;  sin²φ = (1 − cos 2φ)/2.
2.4.2 Limits
The number L is the limit of function f(x) as x approaches x₁ provided that, given any number ε > 0, there exists a number δ > 0 such that |f(x) − L| < ε for all x such that 0 < |x − x₁| < δ. In the right-hand limit lim_{x↓x₁}, x₁ is approached from values larger than x₁; in the left-hand limit lim_{x↑x₁}, x₁ is approached from values smaller than x₁.
If a function increases without bound, we indicate this with the symbol ∞, so lim_{x↓0} x⁻¹ = ∞ and lim_{x↑0} x⁻¹ = −∞, while lim_{x→0} x⁻¹ does not exist. We have the rules for any positive number a: a·∞ = ∞; a + ∞ = ∞; a/0 = ∞. Undefined are: ∞ − ∞, 0·∞, 0/0, ∞/∞.
A function f is continuous in the neighbourhood of a point x₁ if lim_{x→x₁} f(x) exists and equals f(x₁); the left- and right-hand limits are then the same, and called the two-sided limit.
Rules for limits are:
  Constant law: if f(x) = f₁ for a constant f₁, then lim_{x→x₁} f(x) = f₁.
  Addition law: if lim_{x→x₁} f(x) = f₁ and lim_{x→x₁} g(x) = g₁, then lim_{x→x₁} (f(x) + g(x)) = f₁ + g₁.
  Product law: if lim_{x→x₁} f(x) = f₁ and lim_{x→x₁} g(x) = g₁, then lim_{x→x₁} (f(x)g(x)) = f₁ g₁.
  Substitution law: if lim_{x→x₁} g(x) = L and lim_{x→L} f(x) = f(L), then lim_{x→x₁} f(g(x)) = f(L).
  Reciprocal law: if lim_{x→x₁} f(x) = f₁ and f₁ ≠ 0, then lim_{x→x₁} 1/f(x) = 1/f₁.
  Quotient law: if lim_{x→x₁} f(x) = f₁ and lim_{x→x₁} g(x) = g₁, while g₁ ≠ 0, then lim_{x→x₁} f(x)/g(x) = f₁/g₁.
  Squeeze law: if lim_{x→x₁} f(x) = L and lim_{x→x₁} g(x) = L, while f(x) ≤ h(x) ≤ g(x) for x in the neighbourhood of x₁, then lim_{x→x₁} h(x) = L.
The limit of f(x, y) as (x, y) approaches (x₁, y₁) is L provided that, for every number ε > 0, there exists a number δ > 0 with the following property: if (x, y) is a point of the domain of f such that 0 < √((x − x₁)² + (y − y₁)²) < δ, then it follows that |f(x, y) − L| < ε. If L = f(x₁, y₁), and the limit exists, f is said to be continuous at the point (x₁, y₁).
A limit that is of special interest is

  lim_{n→∞} (1 + n⁻¹)ⁿ = e ≈ 2.718281828459045

This number has the property that log_e e = 1. If the base of the logarithm equals e, the base will be suppressed in the notation. We will write exp(x) for eˣ, so exp(log(x)) = x.
The curve y = f(x) has a vertical tangent line at the point (x₁, f(x₁)) provided that f is continuous at x₁ and |d/dx f(x)| → ∞ as x → x₁. The line y = L is said to be a horizontal asymptote of the curve y = f(x) if either lim_{x→∞} f(x) = L or lim_{x→−∞} f(x) = L. The function f(x) = x/(x + 1) has a vertical asymptote at x = −1 and a horizontal asymptote f(x) = 1.
A function of x is said to be of Order x, indicated by O(x), if lim_{x→0} O(x) = 0, and of order x, indicated by o(x), if lim_{x→0} o(x)/x = 0. If a function of x is of order x, it is obviously also of Order x.
2.4.3 Sequences and series
A sequence is an ordered list of numbers (called terms), and is called an infinite sequence if the list does not end. An example is the Fibonacci sequence Fₙ, which is defined as F₁ = 1, F₂ = 1, F_{n+1} = Fₙ + F_{n−1} for n ≥ 2. If new terms are defined as functions of earlier terms, like here, we speak of a recursively defined sequence.
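A minimal sketch of such a recursively defined sequence (in Python), building the first Fibonacci terms directly from the recurrence:

def fibonacci(n):
    """Return [F_1, ..., F_n] with F_1 = F_2 = 1 and F_{n+1} = F_n + F_{n-1}."""
    terms = [1, 1]
    while len(terms) < n:
        terms.append(terms[-1] + terms[-2])
    return terms[:n]

print(fibonacci(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]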
The sequence aₙ converges to a real number L if lim_{n→∞} aₙ = L, which means that for any number ε > 0 there exists an integer N such that |aₙ − L| < ε for all n ≥ N. A sequence is increasing if aᵢ ≤ aⱼ for all i < j, or decreasing if aᵢ ≥ aⱼ.
A series is the sum of all terms in a sequence. A famous series is the geometric series Σ_{n=0}^∞ a rⁿ = a/(1 − r) for |r| < 1. The harmonic series Σ_{n=1}^∞ n⁻¹ diverges to infinity; it is a special case of the Riemann zeta function, which is defined as ζ(x) = Σ_{n=1}^∞ n⁻ˣ, where x is a complex number.
The most important series is probably Newton's binomial series from the 1660s:

  1 + αx + (α(α − 1)/2!) x² + (α(α − 1)(α − 2)/3!) x³ + ⋯ = Σ_{i=0}^∞ C(α, i) xⁱ = (1 + x)^α.

A special case is for x = 1, and α equal to an integer n: Σ_{i=0}^n C(n, i) = 2ⁿ. Some well-known series of finite sequences are

  Σ_{i=1}^n i = n(n + 1)/2,  Σ_{i=1}^n i² = n(n + 1)(2n + 1)/6,  Σ_{i=1}^n i³ = n²(n + 1)²/4.
2.4.4 Differentiation
Single independent variable
The derivative of a continuous function f in the point x₁ is defined as

  d/dx f(x₁) = f′(x₁) = lim_{h→0} ( f(x₁ + h) − f(x₁) ) / h

This limit does not always exist; if it exists, the function f is said to be differentiable in the point x₁. We will write dx for an infinitesimally small step in x. Notice that for y = f(x), dim(y′) = dim(y)/dim(x).
The definition of differentiation implies rules that are listed in Table 2.1. When x has the interpretation of time, d/dx f has the interpretation of a change in y = f(x), which is a rate if y is some quantity.
If a differentiable function y = f(x) is specified implicitly in the form F(x, y) = 0, implicit differentiation of y with respect to x generally results in a differential equation of the form g(x, y, y′) = 0.
  f(x)           f′(x)
  a              0
  xᵃ             a x^{a−1}
  g(x) + h(x)    g′(x) + h′(x)
  g(x) h(x)      g′(x) h(x) + g(x) h′(x)
  g(x)/h(x)      ( g′(x) h(x) − g(x) h′(x) ) / (h(x))²
  g(h(x))        (dg/dh)(dh/dx)
  aˣ             aˣ log a
  log x          x⁻¹
  exp x          exp x
  sin x          cos x
  cos x          −sin x
  tan x          sec² x
  cot x          −csc² x
  sec x          sec x tan x
  csc x          −csc x cot x

Table 2.1: Some frequently occurring rules for differentiation, where a is a parameter.
Differentiation can be nested; the second derivative is

  d²/dx² f(x₁) = f″(x₁) = lim_{h→0} ( f′(x₁ + h) − f′(x₁) ) / h

When x has the interpretation of time, y″ = d²/dx² f has the interpretation of a change in y′ for y = f(x), which is an acceleration if y is some quantity. If y″(x) changes sign at x = x₁, f(x) is said to have an inflection point at x₁. We have dim(y″) = dim(y)/dim²(x).
L'Hopital's rule for the limit of the ratio of two differentiable functions f and g, for some x ≠ a in some open interval containing a, reads

  lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x)  if lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0 while g′(a) ≠ 0
The theorem of Brook Taylor (1685-1731) states that any bounded function f that has derivatives of all orders in some interval containing x₁ can be written as the polynomial

  f(x) = Σ_{n=0}^∞ ((x − x₁)ⁿ / n!) dⁿ/dxⁿ f(x₁)        (2.1)

For x₁ = 0, the Taylor series is called the Maclaurin series. The Taylor series is a generalization of the binomial series; for f(x) = (1 + x)^α, we have

  dⁿ/dxⁿ f(x) = α(α − 1)(α − 2) ⋯ (α − n + 1)(1 + x)^{α−n}.
The Taylor series for the exponential, sine and cosine functions amount to

  exp(x) = 1 + x + x²/2! + x³/3! + ⋯
  sin(x) = x − x³/3! + x⁵/5! − x⁷/7! + ⋯
  cos(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + ⋯
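A quick numerical check (a Python sketch) of the Maclaurin partial sums above against the library functions; a handful of terms already gives good accuracy for small |x|:

from math import exp, factorial, sin

def maclaurin_exp(x, n_terms=12):
    return sum(x**n / factorial(n) for n in range(n_terms))

def maclaurin_sin(x, n_terms=8):
    return sum((-1)**k * x**(2*k + 1) / factorial(2*k + 1) for k in range(n_terms))

x = 0.7
print(maclaurin_exp(x) - exp(x))  # error of order 1e-13
print(maclaurin_sin(x) - sin(x))  # error of order 1e-16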
The line g(x) tangent to f(x) at the point (x₁, f(x₁)) is

  g(x) = f(x₁) + (x − x₁) d/dx f(x₁).

This corresponds with the Taylor series, using the zero-th and first-order derivatives only. Likewise we arrive at the tangent parabola by also including the second-order terms:

  g(x) = f(x₁) + (x − x₁) f′(x₁) + (x − x₁)² f″(x₁)/2.

For x close enough to x₁, and f sufficiently smooth, the function g can be taken as an approximation for function f in the neighbourhood of the point x₁.
Several independent variables
The partial derivatives with respect to x and with respect to y of a function f(x, y) are the two functions defined as

  ∂/∂x f(x, y) = lim_{h→0} ( f(x + h, y) − f(x, y) ) / h  and
  ∂/∂y f(x, y) = lim_{h→0} ( f(x, y + h) − f(x, y) ) / h

The partial derivatives of a function are sometimes organized in a vector, called the gradient, denoted by ∇f. The component of ∇f which lies in the direction of an arbitrary unit vector a is called the directional derivative and is given by (∇f)ᵀ a.
If a differentiable function z = f(x, y) is defined implicitly in the form of the equation F(x, y, z) = 0, implicit partial differentiation of z with respect to x and y gives

  ∂z/∂x = − (∂F/∂x)/(∂F/∂z)  and  ∂z/∂y = − (∂F/∂y)/(∂F/∂z)  wherever ∂F/∂z ≠ 0.
The second-order partial derivatives of f with respect to x and y are the four functions ∂²/∂x² f(x, y), ∂²/∂x∂y f(x, y), ∂²/∂y∂x f(x, y), and ∂²/∂y² f(x, y), where ∂²/∂x∂y f(x, y) = ∂²/∂y∂x f(x, y). The functions are usually organized in a symmetric matrix, called the matrix of (second) partial derivatives, or the Hessian matrix, after Otto L. Hesse.
Suppose that w = f(x, y) has continuous first-order partial derivatives and x = g(t) and y = h(t) are differentiable functions; then w is differentiable with respect to t and

  dw/dt = (∂w/∂x)(dx/dt) + (∂w/∂y)(dy/dt)

which is known as the chain rule. The variable t is the independent variable, x and y are intermediate variables and w is the dependent variable. The chain rule generalizes to several independent variables as

  ∂w/∂tᵢ = Σⱼ (∂w/∂xⱼ)(∂xⱼ/∂tᵢ).
The Taylor series for a function f of two variables in the point (x₁, y₁) that has derivatives of all orders in some region containing (x₁, y₁) reads

  f(x, y) = Σ_{n=0}^∞ Σ_{i=0}^n ( (x − x₁)ⁱ (y − y₁)^{n−i} / (i! (n − i)!) ) ∂ⁿ/∂xⁱ∂y^{n−i} f(x₁, y₁)        (2.2)
The plane g(x, y) tangent to f(x, y) at the point (x₁, y₁) is

  g(x, y) = f(x₁, y₁) + (x − x₁) ∂/∂x f(x₁, y₁) + (y − y₁) ∂/∂y f(x₁, y₁)

This corresponds with the Taylor series, using the zero-th and first-order derivatives only.
Critical points
A critical point of a function on a plane region R with boundary curve C is a point in the interior of R (so not on the boundary curve) where the derivative equals zero or where not all partial derivatives exist. Critical points play an important role in finding extremes of functions.
Extremes
If c is in the closed interval [a, b], which is in the domain of function f, then f(c) is called the minimum value of f(x) on [a, b] if f(c) ≤ f(x) for all x in [a, b], or the maximum value if f(c) ≥ f(x). If f is twice differentiable at c, and f′(c) = 0 while f″(c) < 0, f(c) is a local maximum; and if f′(c) = 0 while f″(c) > 0, f(c) is a local minimum. If the local minimum f(c) is the smallest of all local minima in [a, b], f(c) is said to be the absolute or global minimum; if the local maximum f(c) is the largest of all local maxima in [a, b], f(c) is said to be the absolute or global maximum in [a, b].
Extremes of a function are in practice frequently found through critical points. The function f(x) = x³ illustrates that a function can have a critical point (here at x = 0) without an extreme. The localization of critical points must, therefore, be followed by testing the properties of these points. A critical point (x₁, y₁) of function f(x, y) is a point where the function has an extreme if the determinant ∆ at the critical point is positive, i.e.

  ∆ = ( ∂²/∂x² f(x₁, y₁) )( ∂²/∂y² f(x₁, y₁) ) − ( ∂²/∂x∂y f(x₁, y₁) )² > 0.

In the case of several variables, some variables can have a maximum at the critical point, others a minimum. If this occurs, we speak of a saddle point.
The extreme of an implicit function y = f(x), given by F(x, y) = 0, subject to the constraints gᵢ(x, y) = 0 for i = 1, 2, ⋯ can be found with the method of Lagrange multipliers (named after its inventor, Joseph Louis Lagrange, 1736-1813). The method states that these extremes can be found via the critical points of

  F(x, y) − Σᵢ λᵢ gᵢ(x, y)

where the Lagrange multipliers λᵢ are considered as extra variables. Implicit partial differentiation shows that we have to solve x, y and all λᵢ's from the system of equations

  gᵢ(x, y) = 0;  ∂/∂x F(x, y) = Σᵢ λᵢ ∂/∂x gᵢ(x, y);  ∂/∂y F(x, y) = Σᵢ λᵢ ∂/∂y gᵢ(x, y).

The method generalizes to more variables in a straightforward way.
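A hedged sketch (Python with NumPy; the toy problem is our own choice) of solving the Lagrange system numerically: extremize F(x, y) = xy subject to g(x, y) = x + y − 1 = 0, whose conditions give y = λ, x = λ, so x = y = λ = 1/2.

import numpy as np

def lagrange_system(v):
    x, y, lam = v
    return np.array([
        y - lam,       # dF/dx - lambda * dg/dx = 0
        x - lam,       # dF/dy - lambda * dg/dy = 0
        x + y - 1.0,   # the constraint g(x, y) = 0
    ])

v = np.array([0.3, 0.8, 0.1])   # arbitrary starting point
for _ in range(20):             # Newton iteration with a numerical Jacobian
    h = 1e-7
    J = np.empty((3, 3))
    for j in range(3):
        dv = np.zeros(3); dv[j] = h
        J[:, j] = (lagrange_system(v + dv) - lagrange_system(v)) / h
    v = v - np.linalg.solve(J, lagrange_system(v))

print(v)  # approximately [0.5, 0.5, 0.5]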
2.4.5 Integration
Single independent variable
An antiderivative of the function f is a function F such that F′(x) = f(x), wherever f(x) is defined. Every antiderivative G of f on an open interval has the form G(x) = F(x) + c for a constant c. The collection of all antiderivatives of the function f is called the indefinite integral of f with respect to x, and is denoted by ∫ f(x) dx = F(x) + c; f(x) is called the integrand. The definite integral of f between the boundaries x₀ and x₁ (also called lower and upper limits) is ∫_{x₀}^{x₁} f(x) dx = F(x₁) − F(x₀). If a function is continuous on [a, b], it is integrable on that interval. For F(x) = ∫_{x₁}^x f(y) dy, we have dim(F) = dim(f) dim(x).
Some properties of integrals are

  ∫_{x₀}^{x₁} f(x) dx = −∫_{x₁}^{x₀} f(x) dx
  ∫_a^b c dx = c(b − a)
  ∫_a^b c f(x) dx = c ∫_a^b f(x) dx
  ∫_a^b (f(x) + g(x)) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx
  ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx
  d/dx ( ∫_a^x f(t) dt ) = f(x). This property shows that differentiation and integration are inverse processes (for one variable).
  ∫ u dv = uv − ∫ v du, for du = u′(x) dx and dv = v′(x) dx; a relationship known as integration by parts.
If f is integrable on [a, b], the average value ȳ of y = f(x) is

  ȳ = (1/(b − a)) ∫_a^b f(x) dx

An integral of special interest is ∫_0^1 4 dx/(1 + x²) = π ≈ 3.141592653589793. Many frequently occurring functions are defined as integrals. Examples are
  (natural) logarithm: log x = ∫_1^x dt/t. Some properties are log xy = log x + log y, log x/y = log x − log y, log xʸ = y log x, log e = 1. The exponential function is the inverse function of the logarithm: exp(log x) = x.
  gamma function: Γ(x) = ∫_0^∞ t^{x−1} exp(−t) dt. A property is Γ(x) = (x − 1)! for x = 1, 2, ⋯, and Γ(1/2) = √π.
  beta function: B(m, n) = ∫_0^1 x^{m−1}(1 − x)^{n−1} dx, with the property B(m, n) = Γ(m)Γ(n)/Γ(m + n).
If T : R → S transforms variable x into a variable u, and x = f(u), the integral over x translates into one over u as

  ∫_R F(x) dx = ∫_S F(f(u)) f′(u) du
Leibniz's theorem for the differentiation of an integral reads

  d/dx ∫_{g₀(x)}^{g₁(x)} h(y, x) dy = ∫_{g₀(x)}^{g₁(x)} ∂/∂x h(y, x) dy + h(g₁(x), x) g₁′(x) − h(g₀(x), x) g₀′(x)
Just as integration is inverse to differentiation, double integration is inverse to double differentiation:

  d²/dx² ( ∫_a^x ∫_a^y f(t) dt dy ) = f(x)
Several independent variables
The concept of integration extends to several variables. Suppose that f(x, y) is continuous on the rectangle R = [a, b] × [c, d]. Then

  ∫_a^b ( ∫_c^d f(x, y) dy ) dx = ∫_c^d ( ∫_a^b f(x, y) dx ) dy

is the integral of f over the area R. The expressions within the brackets are called partial integrals of f with respect to x or y.
Transformation of variables in multiple integrals uses the Jacobian, after Carl Jacobi (1804-1851), which is the determinant of the matrix of partial derivatives, also called the Jacobian matrix. If transformation T : R → S transforms variables x and y into variables u and v, with x = f(u, v) and y = g(u, v), the Jacobian is defined as

  J_T(u, v) = | ∂f/∂u  ∂f/∂v |
              | ∂g/∂u  ∂g/∂v |

The double integral over (x, y) translates into one over (u, v) as

  ∫∫_R F(x, y) dx dy = ∫∫_S F(f(u, v), g(u, v)) |J_T(u, v)| du dv

This generalizes to more than two variables in a straightforward way; notice that this also covers univariate transformations.
2.4.6 Roots
The fundamental theorem of algebra states that a polynomial of degree n has exactly n complex roots, i.e. values of the independent variable for which the dependent variable equals zero. Some roots may have the same value. The roots of the second-order polynomial y(x) = a + bx + cx² are

  x₁ = (−b − √d)/(2c)  and  x₂ = (−b + √d)/(2c),  where the discriminant d = b² − 4ac.

The roots are real and different for discriminant d > 0, real and equal for d = 0, and complex-valued and different for d < 0. For all polynomials it holds that if a function has a complex root, its conjugate is also a root. If the degree is odd, the polynomial has at least one real root.
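A small sketch (Python) of the root formula above; cmath.sqrt keeps the complex case d < 0 valid:

import cmath

def quadratic_roots(a, b, c):
    d = b**2 - 4*a*c           # the discriminant
    sd = cmath.sqrt(d)
    return (-b - sd) / (2*c), (-b + sd) / (2*c)

for x in quadratic_roots(2.0, -3.0, 1.0):   # y = 2 - 3x + x^2 = (x-1)(x-2)
    print(x)                                # roots 1 and 2
for x in quadratic_roots(1.0, 0.0, 1.0):    # y = 1 + x^2: conjugated pair +/- i
    print(x)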
The set of pairs (x, y) which satisfies f(x, y) = c, for some chosen constant c, is called the c-isocline or the c-contour, which is often used in graphical presentations of the function f. Familiar geometrical objects are isoclines, such as

  (x − x₁)² + (y − y₁)² = r²  for a circle with radius r and center at (x₁, y₁), or
  x²/a² + y²/b² = 1  for an ellipse with foci at (−√(a² − b²), 0) and (√(a² − b²), 0) for a > b, or
  x²/a² − y²/b² = 1  for a hyperbola.

This idea can be extended to more variables, such as the set of triplets (x, y, z) that satisfy

  (x − x₁)² + (y − y₁)² + (z − z₁)² = r²

and form a sphere of radius r and center at (x₁, y₁, z₁).
2.5 Matrices
A matrix A is a rectangular array of numbers a_{ij}, called elements, for which certain rules are defined. Index i refers to the row number, j to the column number; so a₂₃ is the element in row 2 and column 3 of matrix A. The size of a matrix is the number of rows and columns; an (n, m)-matrix has n rows and m columns. The rules for matrices are

  Equality: A = B if a_{ij} = b_{ij} for all indices; the sizes of A and B must match.
  Addition of matrices: A + B = C is a matrix with elements c_{ij} = a_{ij} + b_{ij}; the sizes of A and B must match. We have A + B = B + A, and (A + B) + C = A + (B + C).
  Multiplication of matrices: AB = C is a matrix with elements c_{ij} = Σ_k a_{ik} b_{kj}; the number of columns of A must equal the number of rows of B. We have AB ≠ BA (in general; the commutative law does not hold for matrices), and (AB)C = A(BC), (A + B)C = AC + BC.
  Multiplication of a matrix and a scalar: Ab = C is a matrix with elements c_{ij} = a_{ij} b. Notice that Ab = bA.
  Direct (element-wise) multiplication of two matrices: A ⊙ B = C is a matrix with elements c_{ij} = a_{ij} b_{ij} for all indices; the sizes of A and B must match.
  Differentiation of a matrix: d/dx A = B is a matrix with elements b_{ij} = d/dx a_{ij}(x).

A consequence of these rules is the subtraction A − B = C, where C is a matrix with elements c_{ij} = a_{ij} − b_{ij}, and A − (B − C) = A − B + C. Division is for matrices a more complex operation; we first have to discuss some other concepts.
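A brief sketch (Python with NumPy) checking the matrix rules above numerically:

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [0.0, 2.0]])

assert np.allclose(A + B, B + A)                # addition commutes
assert not np.allclose(A @ B, B @ A)            # multiplication in general does not
assert np.allclose((A @ B) @ C, A @ (B @ C))    # but it is associative
assert np.allclose((A + B) @ C, A @ C + B @ C)  # and distributive
assert np.allclose(A * B, B * A)                # direct (element-wise) product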
The elements a₁₁, a₂₂, ⋯ are called the diagonal elements of matrix A. The sum of the diagonal elements is called the trace.
Some special matrices exist. A square matrix is a matrix where the number of rows and columns are equal. A diagonal matrix is a square matrix of which the non-diagonal elements are all zero. The identity matrix I is a diagonal matrix of which all diagonal elements are one; it has the property AI = IA = A. The zero matrix 0 is a matrix of which all elements are zero; it has the properties A0 = 0, 0A = 0 and A + 0 = A. An upper triangular matrix is a square matrix where all elements below the diagonal are zero; a lower triangular matrix is a square matrix where all elements above the diagonal are zero. A vector is a matrix with a single column; a row vector is a matrix with a single row. If AA = A, and A is non-zero, matrix A is said to be idempotent.
Transposition is interchanging rows and columns, and is indicated with a superscript T. So B = Aᵀ is a matrix with elements b_{ij} = a_{ji}. An implied property is (AB)ᵀ = BᵀAᵀ. A symmetric matrix is a square matrix with the property that A = Aᵀ.
2.5.1 Determinants
The determinant of a square matrix A of size (n, n) is the sum

  |A| = Σ_P (−1)^φ a_{1j₁} a_{2j₂} ⋯ a_{njₙ}        (2.3)

where the indices jᵢ are all different (which means that each index occurs exactly one time), φ is the number of inversions that are necessary to order the sequence j₁, j₂, ⋯, jₙ into the sequence 1, 2, ⋯, n, and P is the set of n! different permutations of the sequence j₁, j₂, ⋯, jₙ. For matrix sizes (1,1), (2,2) and (3,3) this amounts to
  |a₁₁| = a₁₁

  | a₁₁  a₁₂ |
  | a₂₁  a₂₂ |  =  a₁₁a₂₂ − a₁₂a₂₁

  | a₁₁  a₁₂  a₁₃ |
  | a₂₁  a₂₂  a₂₃ |  =  a₁₁a₂₂a₃₃ + a₁₂a₂₃a₃₁ + a₁₃a₂₁a₃₂ − a₁₃a₂₂a₃₁ − a₁₁a₂₃a₃₂ − a₁₂a₂₁a₃₃
  | a₃₁  a₃₂  a₃₃ |
Determinants are not defined for non-square matrices. Some properties of determinants are

  Transposition does not affect the determinant: |Aᵀ| = |A|.
  The determinants of triangular matrices, and so of diagonal matrices, equal the product of the diagonal elements.
  The determinant of a matrix where we multiply a single row with a scalar c is c times the determinant of that matrix. Consequently we have |cA| = cⁿ|A|, if A is an (n, n)-matrix.
  The sign of the determinant reverses if we interchange two rows or two columns. Consequently the determinant must be zero for a matrix that has two equal rows or columns.
  The determinant is not affected by adding a multiple of some column to another column. The same applies to rows.
  If a row or a column has all elements equal to zero, the determinant is zero.
  We have |AB| = |A||B|.
  If |A| = 0, A is called singular, and if |A| ≠ 0, A is called nonsingular.
2.5.2 Ranks
A row of a matrix is said to be linearly dependent on a set of other rows if it can be written as a weighted sum of the rows in that set. So if aᵢ denotes row i of matrix A, aᵢ depends linearly on rows aⱼ and aₖ if scalars wⱼ and wₖ exist such that aᵢ = wⱼaⱼ + wₖaₖ. The same applies to columns. The rank of an (r, k)-matrix is the number of independent rows if r ≤ k, or the number of independent columns if r ≥ k; if r = k, these numbers are the same. If the rank equals min(r, k), the matrix is said to be of full rank. Some properties of ranks are

  rank(Aᵀ) = rank(A)
  rank(AᵀA) = rank(A) and rank(AAᵀ) = rank(A)
  for nonsingular matrix B we have rank(AB) = rank(A) and rank(BA) = rank(A)
  if a square matrix is of full rank, it is nonsingular
2.5.3 Inverses
The inverse of a matrix A is a matrix A⁻¹ with the property that AA⁻¹ = I (right inverse) or A⁻¹A = I (left inverse); it is the matrix analogue of division (remember: aa⁻¹ = 1 for scalars). Three situations can occur:

  the inverse does not exist; if A is a square matrix, it is singular. The right and left inverses only exist if A is of full rank.
  the inverse is unique, which only happens if A is nonsingular. We have AA⁻¹ = A⁻¹A = I (the left and right inverses are identical), (AB)⁻¹ = B⁻¹A⁻¹, and (Ab)⁻¹ = b⁻¹A⁻¹ for a non-zero scalar b.
  the inverse is not unique, which happens if A is not square. The left and right inverses differ, even in their sizes.
An important application of inverses is in the solution of the so-called non-homogeneous linear system Ax = c, where x is an unknown vector, while vector c is known. An alternative definition of an inverse A⁻¹ of matrix A is a matrix such that x = A⁻¹c is a solution of Ax = c. It can be shown that for each A (rectangular or square, singular or nonsingular), there exists a unique matrix A⁻¹ that satisfies 4 conditions: (1) AA⁻¹A = A, (2) A⁻¹AA⁻¹ = A⁻¹, (3) (AA⁻¹)ᵀ = AA⁻¹, (4) (A⁻¹A)ᵀ = A⁻¹A. Such a matrix A⁻¹ is called a generalized inverse, or g-inverse or pseudo-inverse, of A. If A is nonsingular (so also square), the g-inverse of A is the (regular) inverse.
Some properties of inverses are

  (A⁻¹)⁻¹ = A
  (Aᵀ)⁻¹ = (A⁻¹)ᵀ
  (cA)⁻¹ = c⁻¹A⁻¹, for a non-zero scalar c
  A = AAᵀ(A⁻¹)ᵀ = (A⁻¹)ᵀAᵀA
  A⁻¹ = A⁻¹(A⁻¹)ᵀAᵀ = Aᵀ(A⁻¹)ᵀA⁻¹
  (AᵀA)⁻¹ = A⁻¹(A⁻¹)ᵀ
  If A = Σᵢ Aᵢ, where AᵢᵀAⱼ = 0 and AᵢAⱼᵀ = 0 for i ≠ j, then A⁻¹ = Σᵢ Aᵢ⁻¹
  A⁻¹ = (AᵀA)⁻¹Aᵀ = Aᵀ(AAᵀ)⁻¹
  A⁻¹A, AA⁻¹, I − A⁻¹A, I − AA⁻¹ are all symmetric idempotent
  (ABC)⁻¹ = C⁻¹B⁻¹A⁻¹ if A and C are of full rank
  The inverse of a symmetric matrix is also symmetric
  The inverse of a diagonal matrix is a diagonal matrix with the reciprocals of the original elements

Several alternative definitions for generalized inverses exist, that satisfy other conditions than (3) and (4). The presented definition is also known as the Moore-Penrose inverse.
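A sketch (Python with NumPy) of the pseudo-inverse solving the non-homogeneous system Ax = c; for a rectangular A this yields the least-squares solution, and the four Moore-Penrose conditions can be checked directly:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])       # (3, 2): more equations than unknowns
c = np.array([0.9, 2.1, 2.9])

A_g = np.linalg.pinv(A)          # Moore-Penrose inverse, shape (2, 3)
x = A_g @ c
print(x)                         # least-squares estimate of the two unknowns

assert np.allclose(A @ A_g @ A, A)          # condition (1)
assert np.allclose(A_g @ A @ A_g, A_g)      # condition (2)
assert np.allclose((A @ A_g).T, A @ A_g)    # condition (3)
assert np.allclose((A_g @ A).T, A_g @ A)    # condition (4)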
2.5.4 Eigenvalues and eigenvectors
The eigenvalues λ, also called characteristic or latent roots, of a square (p, p)-matrix A are the p solutions for λ of the determinantal equation

  |A − λI| = 0

The roots can be complex-valued and some roots can have the same value. Since λ is a root of a polynomial of degree p, the remarks in the section on roots (2.4.6) apply. Associated with each eigenvalue λ is a (right) eigenvector, which is a non-zero vector x with the property Ax = λx, and a left eigenvector, which is a non-zero vector y with the property yᵀA = λyᵀ. If x is a (left or right) eigenvector, ax is also an eigenvector for any non-zero scalar a, so eigenvectors are usually normalized to length 1, i.e. xᵀx = 1. Some properties of eigenvalues and eigenvectors are

  The product of the eigenvalues of A is equal to |A|.
  The sum of the eigenvalues of A is equal to the trace of A.
  The eigenvalues of a symmetric matrix with real elements are all real.
  The eigenvalues of a positive definite matrix are all positive.
  A symmetric positive semidefinite matrix of size (n, n) and rank r has r positive eigenvalues and n − r zero eigenvalues.
  The non-zero eigenvalues of AB are equal to the non-zero eigenvalues of BA. As a consequence, the traces of BA and AB are equal.
  The eigenvalues of a diagonal matrix are the diagonal elements.
  If A is symmetric, the eigenvectors xᵢ and xⱼ that correspond with the eigenvalues λᵢ and λⱼ for λᵢ ≠ λⱼ are orthogonal. If the numerical values of λᵢ and λⱼ are identical, the corresponding eigenvectors need not be orthogonal, but can always be chosen to be orthogonal, because all weighted sums of these eigenvectors are again eigenvectors.
  Every real symmetric matrix A has an orthogonal matrix P such that PᵀAP = D while PᵀP = I, where D is a diagonal matrix with the eigenvalues on the diagonal. So, if x = Py, the quadratic form xᵀAx = yᵀPᵀAPy = yᵀDy = Σᵢ λᵢ yᵢ², where the λ's are the eigenvalues.
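A short numerical check (Python with NumPy) of several of these properties for a real symmetric matrix:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, P = np.linalg.eigh(A)       # eigenvalues and orthonormal eigenvectors

assert np.allclose(np.prod(lam), np.linalg.det(A))   # product = determinant
assert np.allclose(np.sum(lam), np.trace(A))         # sum = trace
assert np.allclose(P.T @ P, np.eye(2))               # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(lam))        # P^T A P = D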
2.5.5 Quadratic and bilinear forms
The scalar function f(x, y) = xᵀAy is known as a bilinear form, and the special case f(x) = xᵀAx is known as a quadratic form, for a symmetric matrix A.
A symmetric matrix A is positive definite if xᵀAx > 0 for all non-zero x, or positive semidefinite if xᵀAx ≥ 0. The concepts negative definite and negative semidefinite are defined in a similar way, while a matrix is indefinite if its quadratic form can be positive as well as negative. If A is positive definite, it is also of full rank.
We have that d/dx xᵀAx = 2Ax.
2.5.6 Vector calculus
The inner product of vectors a and b of size n is defined as aᵀb (which is a scalar). The inner product aᵀa is known as the squared length (the length is also called the norm) of vector a. For size 2, this amounts to Pythagoras' theorem:

  ||y − x|| = ( (y − x)ᵀ(y − x) )^{1/2} = ( (y₁ − x₁)² + (y₂ − x₂)² )^{1/2}

The outer product is defined as abᵀ (which is an (n, n)-matrix).
If a vector is thought to represent the coordinates of the end point of a line segment that starts in the origin of the n-dimensional space, the cosine of the angle θ between vectors x and y is given by

  cos θ = xᵀy / ( (xᵀx)^{1/2} (yᵀy)^{1/2} )

If cos θ = 0, vectors x and y are said to be orthogonal, and if xᵀx = yᵀy = 1 as well, they are orthonormal. If all columns of matrix X represent orthonormal vectors, we have XXᵀ = XᵀX = I, and Xᵀ = X⁻¹.
For the rotation matrix

  T = | cos θ  −sin θ |
      | sin θ   cos θ |

the product y = Tx represents a rotation of x through an angle θ.
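A tiny sketch (Python with NumPy) of the rotation matrix above: rotating the unit vector along the first axis by 90 degrees gives the unit vector along the second axis, and lengths are preserved.

import numpy as np

theta = np.pi / 2
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([1.0, 0.0])
y = T @ x
assert np.allclose(y, [0.0, 1.0])
assert np.allclose(np.linalg.norm(y), np.linalg.norm(x))  # rotations preserve length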
2.6 Random variables and probabilities
A random variable is a variable the values of which are given by chance. The set of possible values is called the domain. If the domain is a denumerable subset of the real numbers, such as {1, 2, 3, 4, 5, 6} or {0, 1, 2, ⋯}, the random variable is called discrete and is dimensionless. If the domain is an interval within the real numbers, like the interval [0, 1] or (0, ∞), the random variable is called continuous and can have dimensions.
Observations (or measurements) represent realizations of a random variable; a set of n similar observations is called a sample of size n. The relative frequency distribution of values in the sample converges to the probability distribution for increasing sample size.
If x is a discrete random variable, then every element x of the domain W of x has a certain probability to be realized, symbolized by Pr{x = x}. The set of all probabilities is called a probability distribution. Notice that

  Pr{x = x} ≥ 0  and  Σ_{x∈W} Pr{x = x} = 1
For continuous random variables, probabilities are specified in a different way. Now for every x ∈ W we must have Pr{x = x} = 0, because W is an uncountable set. The probability density function (p.d.f.) f_x specifies the probabilities in the form Pr{x₀ ≤ x ≤ x₁} = ∫_{x₀}^{x₁} f_x(x) dx. If the domain of a continuous x is W, we have

  f_x(x) ≥ 0  and  ∫_W f_x(x) dx = 1.

For a p.d.f. we have dim(f_x(x)) = 1/dim(x), while probabilities are dimensionless. The product f_x(x) dx has the interpretation of an infinitesimally small probability Pr{x ≤ x ≤ x + dx}.
The (cumulative) distribution function of a random variable is defined as F_x(x) = Pr{x ≤ x}, and the survivor function of a random variable is defined as S_x(x) = Pr{x > x}. So we have F_x(x) = 1 − S_x(x) for all values of x. The survivor function takes its name from a very special random variable: the lifespan of an object. The survivor function then specifies the probability that the object will live longer than the value of its argument indicates. Notice that F_x(x) = ∫_0^x f_x(s) ds for a continuous random variable that takes values on the interval [0, ∞), and F_x(x) = Σ_{s≤x} Pr{x = s} for a discrete random variable. Distribution and survivor functions are probabilities, and therefore dimensionless, with values on the interval [0, 1]. They have the same meaning for discrete as for continuous variables.
2.6.1 Expectations
The expectation of a function g of a random variable is defined as E g(x) = Σ_x g(x) Pr{x = x} for discrete random variables, or E g(x) = ∫_x g(x) f_x(x) dx for continuous random variables, where the summation or integration is across all values in the domain of the random variable. The mean is a special case of an expectation of a function of a random variable, namely g(x) = x. The variance is another special case, namely g(x) = (x − E x)², and it is indicated with var x. Notice that var x = E x² − (E x)², and that dim(E x) = dim(x) and dim(var x) = dim(x)². Expectations of the function g(x) = xᵐ are called the m-th moment, and of g(x) = (x − E x)ᵐ, the m-th central moment. The standard deviation is defined as the square root of the variance, so sd x = √(var x); the variation coefficient as the ratio of the standard deviation and the mean, so vc x = sd x / E x. Notice that the variation coefficient is dimensionless. For non-negative random variables we have the relationship E x = ∫_0^∞ S_x(x) dx or E x = Σ_{x=0}^∞ S_x(x).
Simple rules directly follow from the definitions, such as: for a constant a, we must have E a = a, var(ax) = a² var x, sd(ax) = a sd x, vc(ax) = vc x, var a = 0, E x² ≥ (E x)².
Two random variables x and y have a simultaneous (or joint) probability distribution Pr{x₀ ≤ x ≤ x₁, y₀ ≤ y ≤ y₁}. For continuous random variables we can write Pr{x₀ ≤ x ≤ x₁, y₀ ≤ y ≤ y₁} = ∫_{x₀}^{x₁} ∫_{y₀}^{y₁} f_{x,y}(x, y) dy dx. Two random variables are said to be stochastically independent if Pr{x₀ ≤ x ≤ x₁, y₀ ≤ y ≤ y₁} = Pr{x₀ ≤ x ≤ x₁} Pr{y₀ ≤ y ≤ y₁}, i.e. ∫_{x₀}^{x₁} ∫_{y₀}^{y₁} f_{x,y}(x, y) dy dx = ∫_{x₀}^{x₁} f_x(x) dx ∫_{y₀}^{y₁} f_y(y) dy for continuous random variables. For independent variables we have E xy = E x E y.
The covariance between x and y is defined as cov(x, y) = E (x − E x)(y − E y) = E xy − E x E y, which is zero if x and y are independent. Notice that cov(x, x) = var x and dim(cov(x, y)) = dim(x) dim(y). The correlation coefficient between x and y is defined as

  cor(x, y) = cov(x, y) / ( (sd x)(sd y) ).

The correlation coefficient can take values in the interval [−1, 1], and equals 0 for independent variables. It is dimensionless. Notice that cor(x, x) = var x / (sd x · sd x) = 1.
Some implied rules for constants a and b: cov(ax, by) = ab cov(x, y), cor(ax, by) = cor(x, y). If x and y are independent, we have var(x + y) = var(x) + var(y), and var(x − y) = var(x) + var(y).
2.6.2 Examples of probability distributions
The Poisson probability distribution is defined by

  Pr{x = x} = (λˣ/x!) exp(−λ)  for x = 0, 1, ⋯

where the parameter λ is non-negative. Properties are that E x = var x = λ. Notice that λ, and therefore x, must be dimensionless. The sum of independently Poisson distributed random variables is again Poisson distributed.
The binomial probability distribution is defined by

  Pr{x = x} = C(n, x) pˣ(1 − p)^{n−x}  for x = 0, 1, ⋯, n

where the parameter p can take values in the interval (0, 1), and the parameter n can take the values 1, 2, ⋯. Properties are E x = np, and var x = np(1 − p). The parameters p and n, and x, are dimensionless. The binomial distribution converges to the Poisson distribution for n → ∞ and p → 0 with np → λ. For n = 1, the binomial distribution is also called the Bernoulli distribution. If xᵢ follows a Bernoulli distribution with parameter p, Σ_{i=1}^n xᵢ follows a binomial distribution with parameters p and n.
The multinomial probability distribution is an extension of the binomial one, and defined as

  Pr{x₁ = x₁, ⋯, x_s = x_s} = ( n!/(x₁! ⋯ x_s!) ) p₁^{x₁} ⋯ p_s^{x_s}  for xᵢ = 0, 1, ⋯, n and Σᵢ xᵢ = n

with Σ_{i=1}^s pᵢ = 1. The parameter s takes values 2, 3, ⋯, xᵢ takes values 0, 1, ⋯, n, and pᵢ takes values in (0, 1), for all i = 1, ⋯, s. Notice that the multinomial distribution reduces to the binomial one for s = 2; since x₂ = n − x₁, it is then no longer of much interest, and we call x₁ just x, and suppress reference to x₂.
The geometric probability distribution is defined as

  Pr{x = x} = p(1 − p)ˣ  for x = 0, 1, ⋯

where parameter 0 < p < 1 has the interpretation of the probability of success, when x measures the number of unsuccessful trials before the first successful one, if the trials are independent. We have E x = (1 − p)/p and var x = (1 − p)/p². If x measures the number of unsuccessful trials before the r-th successful one (so the r-th success occurs at trial r + x), x is negative binomially distributed:

  Pr{x = x} = C(x + r − 1, x) pʳ(1 − p)ˣ  for x = 0, 1, ⋯

If xᵢ is geometrically distributed with parameter p, Σ_{i=1}^r xᵢ is negative binomially distributed with parameters p and r. We have E x = r(1 − p)/p and var x = r(1 − p)/p².
2.6.3 Examples of probability density functions
The homogeneous (uniform) distribution on the interval (a, b) is given by

  f_x(x) = (b − a)⁻¹ (x > a)(x < b)  for −∞ < x < ∞

where (·) equals 1 if the condition holds and 0 otherwise. The mean is E x = (a + b)/2 and the variance var x = (b − a)²/12. We have dim(a) = dim(b) = dim(x). If x is homogeneously distributed on the interval (0, 1), −log x is exponentially distributed with parameter 1. The importance of this fact is the ready availability of random generators for homogeneously distributed random variables, which are used in Monte Carlo methods.
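A minimal Monte Carlo sketch (Python with NumPy) of this fact: uniform deviates on (0, 1) are turned into exponential deviates via −log(u)/λ, and the sample mean and variance are compared with the theoretical 1/λ and 1/λ²:

import numpy as np

rng = np.random.default_rng(seed=42)
lam = 2.0
u = rng.random(100_000)        # homogeneous on (0, 1)
x = -np.log(u) / lam           # exponential with parameter lambda

print(x.mean(), 1 / lam)       # both about 0.5
print(x.var(), 1 / lam**2)     # both about 0.25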
The beta distribution is defined by

  f_x(x) = x^{α−1}(1 − x)^{β−1} / B(α, β)  for 0 ≤ x ≤ 1

which has mean E x = α/(α + β) and variance var x = αβ/((α + β)²(α + β + 1)). Both α, β and x must be dimensionless.
A random variable is said to be exponentially distributed if

  f_x(x) = λ exp(−λx)  for x ≥ 0

where λ and x take values in (0, ∞), while dim(λ) = 1/dim(x). We have E x = λ⁻¹, var x = λ⁻², F_x(x) = 1 − exp(−λx), S_x(x) = exp(−λx). The relationship between the exponential and the Poisson distribution is explained in the section on point processes.
The sum of n independent exponentially distributed random variables with parameter λ is gamma distributed:

  f_x(x) = (λ/Γ(n)) (λx)^{n−1} exp(−λx)  for x ≥ 0

which has mean E x = n/λ and variance var x = n/λ². We have dim(λ) = 1/dim(x), and n is dimensionless.
The uni- and p-variate normal p.d.f.s are defined as

  f_x(x) = (2πσ²)^{−1/2} exp( −(x − μ)²/(2σ²) )  for −∞ < x < ∞
  f_x(x) = (2π)^{−p/2} |Σ|^{−1/2} exp( −(1/2)(x − μ)ᵀΣ⁻¹(x − μ) )  for −∞ < xᵢ < ∞

where μ takes values in (−∞, ∞), and σ and Σ in (0, ∞). Properties are that E x = μ and var x = σ², which shows that dim(μ) = dim(σ) = dim(x). The significance of the normal distribution is by the central limit theorem, which states that the sum of independent identically distributed random variables is asymptotically normally distributed, for a large enough number of random variables (irrespective of their own distribution).
The χ² p.d.f. is defined as

  f_x(x) = (2^{ν/2} Γ(ν/2))⁻¹ x^{ν/2−1} exp(−x/2)  for x ≥ 0

The dimensionless parameter ν is usually called the degrees of freedom, and can assume a value in the interval (0, ∞). Notice that Γ(ν + 1) = ν!, for ν = 0, 1, ⋯. We have that E x = ν. The sum of ν squared independent normally distributed random variables with mean 0 and variance 1 is χ²-distributed with ν degrees of freedom. Moreover, if x = (x₁, x₂, ⋯, x_p)ᵀ ∼ N(μ, Σ), the quadratic form (x − μ)ᵀΣ⁻¹(x − μ) is χ²-distributed with p degrees of freedom.
2.6.4 Conditional and marginal probabilities
Suppose that x can assume values in Ω. The probability of the result x = x for some x ∈ Ω₀, given that x assumes values in Ω₀, while Ω₀ is a subset of Ω, equals

  Pr{x = x | x ∈ Ω₀} = Pr{x = x} / Pr{x ∈ Ω₀}

This is called a conditional probability. If x follows a Poisson distribution, for example, the probability of x, given that it is larger than 0, is given by Pr{x = x | x > 0} = Pr{x = x}/(1 − Pr{x = 0}) = (λˣ/x!) exp(−λ)/(1 − exp(−λ)). This is an example of a truncated probability distribution.
Likewise, if x and y follow some simultaneous probability distribution, the probability that x attains some value, given that y has some specified value, equals

  Pr{x = x | y = y} = Pr{x = x, y = y} / Pr{y = y}

where Pr{y = y} = Σ_x Pr{x = x, y = y} is the marginal distribution of y. For continuous random variables x and y we have the conditional p.d.f. f_x(x|y) = f_{x,y}(x, y)/f_y(y), where f_y(y) = ∫_x f_{x,y}(x, y) dx is the marginal p.d.f. of y. The conditional p.d.f. of x has dimension dim(x)⁻¹.
If x follows some probability distribution with parameter θ, while this value represents a random trial from some p.d.f. f_θ, we have the mixture Pr{x = x} = ∫_θ Pr{x = x | θ} f_θ(θ) dθ.
2.6.5 Calculations with random variables
The probability distribution of a sum z of two random variables x and y is given by Pr{z = z} = Σ_v Pr{x = v, y = z − v}. If x and y are independent, this reduces to Pr{z = z} = Σ_v Pr{x = v} Pr{y = z − v}. Likewise for independent continuous random variables we have f_z(z) = ∫_v f_x(v) f_y(z − v) dv = ∫_v f_x(z − v) f_y(v) dv. Such an integral is known as a convolution integral.
The probability distribution of a product z of two random variables x and y is given by Pr{z = z} = Σ_v Pr{x = v, y = z/v}; for independent continuous random variables we have f_z(z) = ∫_v f_x(v) f_y(z/v) dv.
The distribution function of the maximum y of n independent identically distributed random variables {xᵢ}_{i=1}^n is F_y(y) = F_x(y)ⁿ. The survivor function of the minimum y of n independent identically distributed random variables {xᵢ}_{i=1}^n is S_y(y) = S_x(y)ⁿ.
Suppose that y = g(x) is some monotonous transformation g of x, with inverse G, so x = G(g(x)). If x has p.d.f. f_x(x), the distribution function of y follows from Pr{y ≤ y} = Pr{x ≤ G(y)} for increasing g (with the inequality reversed for decreasing g), and the p.d.f. of y is f_y(y) = f_x(G(y)) |d/dy G(y)|. If x is homogeneously distributed on [0, 1], so f_x(x) = (x ≥ 0)(x ≤ 1), and if g(x) = −log x, then G(y) = exp(−y), and f_y(y) = (y > 0) exp(−y); so y is exponentially distributed.
2.7 Numerical methods
The art of numerical methods is to find numerical values for quantities that cannot be obtained analytically. The problem is always to achieve the specified accuracy, in combination with robustness and reasonable computational effort. Many numerical procedures have been developed; we only present a very few simple ones to illustrate the concepts. The rapid increase of computer performance means that the appreciation of computational effort is continuously changing, and that numerical methods are still in full development. For simple applications computational effort has become close to meaningless, but for many problems it is still an issue to take into consideration. This is also caused by the fact that problems are nowadays solved numerically that were avoided in earlier days for obvious reasons.
2.7.1 Numerical integration
The trapezoidal approximation for a definite integral is

  ∫_a^b f(x) dx ≈ (∆x/2)(y₀ + 2y₁ + ⋯ + 2y_{n−1} + yₙ)

where ∆x = (b − a)/n for some appropriate choice of n, and yᵢ = f(a + i∆x). Tangent line and tangent parabola approximations usually work better for smooth functions, i.e. they give a better balance between accuracy and computational effort.
For a differential equation d/dt x = f(t, x), the second-order Runge-Kutta method reads

  x_{n+1} = xₙ + (k₁ + k₂)/2 + O(h³)  for k₁ = h f(tₙ, xₙ) and k₂ = h f(tₙ + h, xₙ + k₁)

for some chosen step size h, with t_{n+1} = tₙ + h. Starting from (0, x₀), the pairs (tᵢ, xᵢ) are generated till some chosen endpoint t_N, while x may be vector-valued. If f does not depend on x, the solution of the differential equation reduces to plain integration, and the second-order Runge-Kutta method reduces to the trapezoidal approximation. Higher-order Runge-Kutta schemes have been devised; the fifth-order one allows a variable step size, where the steps are small if the function f changes rapidly. Differential equations can behave in a stiff way, which means that the values change slowly for most of the time, but sometimes change rapidly. The integration of such equations requires special techniques, because the rapid changes force a small choice of step size h, which is most of the time not necessary.
See also splines at 53.
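A compact sketch (Python) of both schemes above: the trapezoidal rule applied to ∫_0^1 4/(1 + x²) dx = π, and second-order Runge-Kutta applied to dx/dt = −x, whose exact solution is exp(−t):

from math import exp

def trapezoid(f, a, b, n):
    dx = (b - a) / n
    y = [f(a + i * dx) for i in range(n + 1)]
    return dx / 2 * (y[0] + 2 * sum(y[1:-1]) + y[-1])

print(trapezoid(lambda x: 4 / (1 + x**2), 0.0, 1.0, 1000))  # ~3.14159...

def rk2(f, x0, t_end, h):
    t, x = 0.0, x0
    while t < t_end - 1e-12:
        k1 = h * f(t, x)
        k2 = h * f(t + h, x + k1)
        t, x = t + h, x + (k1 + k2) / 2
    return x

print(rk2(lambda t, x: -x, 1.0, 1.0, 0.01), exp(-1.0))  # close agreement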
2.7.2 Numerical differentiation
Numerical differentiation follows the definition of differentiation, omitting the limit. This means that the forward and the backward incremental step in the independent variable no longer give the same result. We take the mean with

  d/dx f(x₁) ≈ ( f(x₁ + ∆x) − f(x₁ − ∆x) ) / (2∆x)

for some appropriate choice of ∆x. Numerical differentiation easily gives substantial deviations from the real thing; whenever possible it should be avoided. This applies even more to the numerical approximation of the second derivative

  d²/dx² f(x₁) ≈ ( f(x₁ + ∆x) − 2f(x₁) + f(x₁ − ∆x) ) / (∆x)²

See also splines at 53.

Figure 2.3: The Newton-Raphson procedure for finding a root of a function of a single variable. The root of the tangent line at a point is used as the new point. The approximate domain of attraction is given in green. Any choice for a starting point x₀ outside this domain will not result in an approximation for the root.
2.7.3 Root finding
Newton-Raphson's iteration scheme for finding the roots of equation f(x) = 0 is

  x_{n+1} = xₙ − ∆xₙ  with  ∆xₙ = ( (d/dxᵀ) f(xₙ) )⁻¹ f(xₙ)

where both f and x can be vector-valued. Starting from an appropriately chosen x₀, the convergence is usually very fast, but the basin of attraction is rather small (so x₀ has to be close to the real root). The iteration is stopped when the norm ||f(x)|| is smaller than a specified value, or the number of iterations exceeds a maximum. The basin can usually be extended by restricting the step size ||∆x|| to some upper limit by multiplication of ∆x with a scalar. Be aware of the problem of multiple roots. The iteration only finds one, which depends on the starting value x₀, see Figure 2.3.
Newton's method is frequently used for finding extremes, by applying it to the derivatives of the function. This only works if the function satisfies smoothness conditions. Beware of the problem that Newton's method only finds a critical point, which is not always the one you want to have. The result requires testing to tell critical points apart from extremes (and minima from maxima). Several starting values should be used to find other critical points, but one cannot be sure that all critical points are detected.
See also splines at 53.
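A one-dimensional sketch (Python) of the Newton-Raphson scheme above, here for f(x) = x² − 2 with known derivative f′(x) = 2x, so the iteration converges to √2:

def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)     # Delta x_n = f(x_n) / f'(x_n)
        x = x - step
        if abs(f(x)) < tol:     # stop when the residual is small enough
            return x
    raise RuntimeError("no convergence; try another starting value x0")

print(newton_raphson(lambda x: x**2 - 2, lambda x: 2*x, x0=1.0))  # 1.41421...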
Figure 2.4: The simplex method in 3 variables, so the simplex consists of 4 points. The worst point (red) is replaced by a better point (open), according to one of 3 possible strategies (reflection; reflection and extension; contraction in 1 dimension away from the worst point), or the simplex is contracted in n dimensions to the best point (green). The second-best point is blue. The vertices of the new simplex are connected with dotted lines.
2.7.4 Extreme finding
The downhill simplex method is due to Nelder and Mead [19]. It is slow, but also very robust; it requires no derivatives. If the minimization is in n variables, the simplex method starts in n + 1 points (or vertices), and the worst point is replaced by a better point according to the scheme indicated in Figure 2.4. The procedure is stopped when the vertices are closer together than some threshold value, or the function values at the vertices differ less than some threshold value, or the number of function evaluations exceeds some threshold value (unsuccessful termination).
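A usage sketch of an existing implementation (SciPy's, assuming Python with SciPy available): minimizing a simple quadratic bowl with minimum at (1, 2), without supplying any derivatives; the xatol and fatol options correspond to the two threshold-based stopping rules above.

import numpy as np
from scipy.optimize import minimize

def f(v):
    x, y = v
    return (x - 1.0)**2 + (y - 2.0)**2

result = minimize(f, x0=np.array([0.0, 0.0]), method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-8})
print(result.x)   # approximately [1.0, 2.0]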
Chapter 3
Models for processes
Life is essentially a process in time; no wonder that the analysis of processes is basic in biology. Two aspects of processes are usually studied: transient states, where the process is followed while it evolves in time, and steady states, where the long-term behaviour is studied, which is independent of how the process originally started.
3.1 Types of processes
Processes can be modeled such that time can take particular values only (discrete time processes, where the distance between the time points, the time steps, is usually constant) or any value (continuous time processes).
The specification of discrete time models is usually done in the form of maps, where the value of quantities at time t is written as some function of the values at earlier points in time.
Most models for processes are deterministic, which means that they do not include random variables, contrary to stochastic models. One of the main reasons is that realistic implementations of random variables very easily lead to extremely complex models. Most stochastic models (such as the class of stochastic differential equations) do not satisfy e.g. laws for energy and mass conservation in the strict sense. Their statistical evaluation is very complex, so that one usually has to rely on computer simulation studies. We here only discuss Markovian models, which still have some basic simplicity. Deterministic models for processes obviously suffer from the problem that they are hardly realistic, since stochastic behaviour is basic to most biological processes.
3.1.1 Stochastic processes
Correlation functions
If the pairs {x(tᵢ), y(tᵢ)}ᵢ form a stochastic process with constant time step ∆t = t_{i+1} − tᵢ, the function ρ_{xy}(h∆t) = cor(x(tᵢ), y(t_{i+h})) is known as the cross-correlation function, and ρ_{xx}(h∆t) = cor(x(tᵢ), x(t_{i+h})) as the auto-correlation function. They quantify the memory of the process in a particular way. Similar constructs are defined for continuous time, and for more than two variables. Correlation functions are then matrix-valued, where the independent variable represents the time shift between the members of correlated pairs.
Markov chains
A Markov chain is a model of the type

  p(t_{n+1}) = P p(tₙ)        (3.1)

where p(tₙ) is the vector with probabilities that the system is in its various states, and P the square matrix with fixed transition probabilities; the typical element p_{ij} gives the probability that the system will be in state i in the next time step, given that it is in state j. By definition we must have that 1ᵀP = 1ᵀ (each column of P sums to one) and p(tᵢ)ᵀ1 = 1, where 1 is a vector with ones.
A consequence of the construct is that p(t_{n+r}) = Pʳ p(tₙ), and that the steady-state probability distribution satisfies P p(t_∞) = p(t_∞); in other words, it is an eigenvector of P that is associated with eigenvalue 1.
The simplism of the Markov chain derives from the fact that it has no memory; it does not matter what path the system took to arrive in a certain state. Some memory can be built into Markov chains by including the path in the definition of the state. So if the system originally has n states, we delineate n² states to include the previous state, and n³ states to include the last two states.
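A small sketch (Python with NumPy) of a two-state chain: iterating (3.1) forward converges to the steady state, which is also the eigenvector of P for eigenvalue 1.

import numpy as np

# column j holds the probabilities of moving from state j to each state i:
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
assert np.allclose(P.sum(axis=0), 1.0)   # columns sum to one

p = np.array([1.0, 0.0])                 # start surely in state 0
for _ in range(200):
    p = P @ p
print(p)                                 # approximately [2/3, 1/3]

lam, V = np.linalg.eig(P)
v = V[:, np.argmax(lam.real)].real       # eigenvector for eigenvalue 1
print(v / v.sum())                       # the same steady state, normalized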
A branching process is a Markov chain of a special structure, where the state of the system represents the total number of objects, and at each time step each object is replaced by a number of objects that represents a random trial from some probability distribution. This is typical for particular types of organisms, but interactions among organisms are difficult to implement in branching processes.
Markov processes
Markov processes are Markov chains in continuous time. The transition from discrete to continuous time can be smooth for a particular type of Markov chains, where the states represent numbers in a population of objects. Transition probabilities in an infinitesimally small time increment are only positive for neighbouring states, and proportional to the time increment. So we arrive at the specification for dt ↓ 0

  Pr{x(t + dt) = m | x(t) = n} =  μ dt                    if m = n − 1
                                  1 − (λ + μ) dt − o(dt)  if m = n
                                  λ dt                    if m = n + 1
                                  o(dt)                   if m < n − 1 or m > n + 1        (3.2)

where lim_{dt↓0} o(dt)/dt = 0 by definition. This specification can be rewritten in the differential equation for pₙ(t) ≡ Pr{x(t) = n}

  d/dt pₙ(t) = λ p_{n−1}(t) + μ p_{n+1}(t) − (λ + μ) pₙ(t)

Together with a specification of the initial conditions, such a process is called a birth and death process. For death rate μ = 0 and birth rate λ = bx proportional to the population size x, with b a constant, the process is called the Yule process. A birth rate proportional to the population size is typical for organisms. The inclusion of interactions among organisms (such as competition for resources), however, is difficult to implement in Markov processes.
Point processes

A point process is a special type of stochastic process, where the time points of the occurrence of events are stochastic. Such processes have two aspects: the counting process, which follows the number of events in a fixed time interval, and the interval process, which deals with the time intervals between subsequent events. If the probability of two events in a single time interval is negligibly small, we call the process orderly.

If subsequent time intervals represent independently identically distributed random variables, we call the process a renewal process. The term renewal process originates from a special type of point process, where the events represent replacement of an object, such as a light bulb for instance. If these intervals are independently exponentially distributed, we call the process a Poisson process, because the number of events in a time interval is then Poisson distributed.

The intensity of a point process is defined as the expected number of events per time increment, considered as a function of time. If the intensity is constant, we call the process ergodic. The intensity equals the hazard rate. The hazard rate is defined as

\[ h_t(t) = \frac{f_t(t)}{S_t(t)}, \]

where the continuous random variable t frequently has the interpretation of the life span of an object. It is also called the mortality rate; the product h_t(t) dt specifies the probability that the object will die in the infinitesimally small time interval (t, t + dt), given that it is still alive at t. So it is a conditional p.d.f. Notice that h_t(t) = −(d/dt) ln S_t(t), so

\[ S_t(t) = \exp\left( -\int_0^t h_t(s)\, ds \right). \]

If the hazard rate is constant, the process is a Poisson process, so the number of deaths in a fixed interval is Poisson distributed, and the intervals between subsequent deaths are exponentially distributed. A central theorem states that the sum of an increasing number of mutually independent ergodic point processes converges to a Poisson process.
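The survivor relation above gives a standard recipe for sampling event times under an arbitrary hazard: solve S_t(t) = u for a uniform random u (inverse-transform sampling). A small sketch for the Weibull hazard h(t) = (k/a)(t/a)^{k−1}, chosen here purely as an illustration:

```python
import math, random

def weibull_event_time(a, k):
    # S(t) = exp(-(t/a)**k); solving S(t) = u gives the inverse transform.
    u = random.random()
    return a * (-math.log(u)) ** (1.0 / k)

# Constant hazard (k = 1) reduces to the exponential interval of a
# Poisson process; k > 1 gives an ageing (increasing) hazard.
samples = [weibull_event_time(a=2.0, k=1.0) for _ in range(10000)]
print(sum(samples) / len(samples))   # close to a = 2.0 for k = 1
```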
3.2 Systems

The idea behind the concept of a system is simple in principle. A system is based on the idea of state variables, which are supposed to specify completely the state of the system at a given moment. Completeness is essential. The next step is to specify how the state variables change with time as a function of a number of inputs and each other. The specification usually takes the form of a set of ordinary differential equations (ODEs)

\[ \frac{d}{dt} x = f(x|\theta) \quad \text{for } x = (x_1, \cdots, x_n)^T \tag{3.3} \]

which have parameters θ, i.e. constants that are assumed to have some fixed value in the simplest case. Usually this specification also includes a number of outputs. The set of differential equations fully specifies the dynamics of the system in combination with the specification of the system at the start, x(0), or at some moment in time, x(t₁). Finding the states of the system as a function of time is then called an initial value problem, or a boundary value problem, respectively.
Parameters are typically constant, but sometimes the values change with time. This can be described by a function of time, which again has parameters that are now considered to be constant. For instance, parameters that have the interpretation of physiological rates depend on temperature; therefore, they remain constant as long as the temperature does not change. If the temperature does change, then the parameters do as well. Heat, however, is generated as a side product of metabolism. In ectotherms, i.e. animals that do not heat their body to a constant high temperature, heat production is low, because of their usually low body temperature. The body temperature usually follows that of the environment, and can thus be treated as a function of time. The situation is more complex in developing birds, which make the transition to the endothermic state some days after hatching. The hatchling's temperature is high, because of brooding; therefore, metabolism and heat production are also high. In addition, the young bird starts to invest extra energy in heating. Here, the state variables of the system interact with the environment, but not via input; this means that the body temperature must be considered as an additional state variable.
Choosing the state variables is the most crucial step in defining a system. It is usually a lot easier to compare and test alternative formulations for the change of state variables than different choices of state variables. Models with different sets of state variables are hardly comparable, both conceptually and in tests against data. Statistics basically deals with parameter values, and is of little use when comparing the goodness of fit of models that differ in structure.
3.2.1 Constraints on dynamics

Mass, energy and other conserved quantities pose constraints on the possible behaviour of dynamic systems. The explicit use of these constraints forms a powerful tool in the specification of the dynamics of the system. If the states are appropriately specified, these k constraints on x₁, …, xₙ can be written in the form g_i(x) = 0, for i = 1, …, k. If all x_i represent masses, the functions g_i take the form Σ_j w_{ij} x_j = 0, where the w_{ij} are fixed weight coefficients. The system can be reduced to n − k variables, and we have to find the remaining k variables from the k constraints. If the system is not reduced, the Jacobian matrix of the system will have k eigenvalues equal to zero at any point in time. This might hamper the application of software for the analysis of asymptotic properties of the system.
3.2.2 Feedback

Systems can behave in ways that cause an amplification of that behaviour, a phenomenon called positive feedback, or a reduction of that behaviour, called negative feedback. A growing population of organisms is likely to grow faster because more organisms partake in reproduction (positive feedback), but they exhaust their resources sooner (negative feedback). The notion of feedback originates from engineering, where systems are constructed. In biology, where systems are usually given, and many components of the system work in opposite directions, the notion is less operative.
3.2.3 Asymptotic properties

Some model systems have infinite memories. Think, e.g., of a deterministic model for cell growth and division in a homogeneous environment where each cell divides into two identical daughter cells when it reaches a certain size. Since all cells experience the same environment, all daughter cells of some cell in the inoculum (i.e. the founder population) will divide at the same moment. Such a system remains dependent on properties of the inoculum (e.g. the size distribution). Such (unrealistic) memory can be removed in different ways (such as small random differences in size between daughters, or individual-specific size thresholds for division), but this makes the model more complex. We here discuss asymptotic properties of models that have finite memories, so all information about the (remote) past disappears, and focus on properties of a set of ODEs of the type (d/dt) x = f(x), for some vector-valued x, and how they depend on the parameters of the system.

The first important observation is that, if we follow x(t) through time by evaluating x(t) = x(0) + ∫₀ᵗ f(x(s)) ds, starting from some x(0), several things can happen. First, the system can become degenerate, i.e. one or more components of x can become zero or infinite (the latter possibility is usually excluded by conservation laws). If not, it can evolve to some constant value x*, or it can start to change cyclically. This behaviour might depend on the choice for x(0). The system is then said to be attracted by a point attractor or a cyclic attractor. When the number of loosely coupled variables in a system is sufficiently large, the system is likely to have very complex asymptotic behaviour for some combinations of parameter values, including the occurrence of multiple attractors, possibly of the chaotic type, called strange attractors. A deterministic system is said to behave chaotically if it varies in time without repeating itself periodically. Since this is difficult to check (because periods can be extremely large), another property is used in practice: an infinitesimally small change in the initial values of the system eventually results in a substantial difference in behaviour. This property is not unique to chaos, however. The behaviour of the system is usually compared with that for slightly different values of particular parameters of the system, to see if the pattern matches the known routes to chaos. It remains difficult to be sure that a system behaves chaotically, but it is not rare to find chaos in the more complex models, almost independently of the specific model.
Many asymptotic properties of a set of ODEs (d/dt) x = f(x) can be deduced from the eigenvalues of the Jacobian matrix ∂f/∂xᵀ evaluated at an equilibrium x* of the system, i.e. a point with f(x*) = 0.
Table 3.1: List of basic local bifurcations for ODEs, dx/dt = f(x, α), and maps, y_{n+1} = f(y_n, α), with normal forms. The bifurcation point is α = 0; λ is the eigenvalue of the Jacobian matrix evaluated at the equilibrium ((d/dt)x = 0 or y_{n+1} = y_n) and μ is the Floquet multiplier evaluated at the limit cycle. The bifurcation type depends on the real (Re) parts of these characteristic exponents. A stable positive attractor originates at a supercritical transcritical bifurcation (superscript +) and an unstable positive equilibrium or limit cycle at a subcritical transcritical bifurcation (superscript −). Superscripts ± refer to supercritical and subcritical.

symbol | bifurcation                  | normal form                                                        | characteristic exponents
T_e    | tangent, of equilibrium      | dx/dt = α − x²                                                     | Re λ = 0
T_c    | tangent, of limit cycle      | y_{n+1} = α + y_n + y_n²                                           | Re μ = 1
TC_e^± | transcritical, of equilibrium| dx/dt = αx − x²                                                    | Re λ = 0
TC_c^± | transcritical, of limit cycle| y_{n+1} = (1 + α) y_n − y_n²                                       | Re μ = 1
H^±    | Hopf                         | dx/dt = −y + x(α − (x² + y²)); dy/dt = x + y(α − (x² + y²))        | Re λ_{1,2} = 0
F^±    | flip                         | y_{n+1} = −(1 + α) y_n + y_n³                                      | Re μ = −1
If such an equilibrium exists for non-zero values of the variables, the eigenvalues determine its character: if all eigenvalues have negative real parts, the equilibrium represents a point attractor; if some have positive real parts, the equilibrium is unstable. The imaginary parts give information about the orbit near the equilibrium. If a system happens to be at an unstable equilibrium it will remain there forever, but if it is slightly disturbed (i.e. placed out of equilibrium) it will never return to that unstable equilibrium. If a system is slightly disturbed while it was in a limit cycle, it may return to that limit cycle, and we then speak of a stable limit cycle, or it might follow a different limit cycle, in which case we speak of neutral stability. The study of the reactions of a system to small perturbations is called local stability analysis, which does not give information about reactions to large perturbations, the topic of global stability analysis. A system can have more than one attractor.
In practice we first try to find an approximation for a point attractor by following the system through time (e.g. using a Runge-Kutta method), and then we try to find the roots of the system f(x*) = 0 (e.g. using a Newton-Raphson method). Once we have found a root, we can change one or more parameters a little, and find a new root with the Newton-Raphson method, using the previous root as a starting value. This technique is called continuation; a sketch follows below. Notice that we can find unstable roots this way, to which the system cannot evolve.
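A minimal sketch of this procedure for a one-dimensional example (the logistic equation dx/dt = x(α − x), chosen here only for illustration): integrate towards the attractor, polish the root with Newton-Raphson, then continue it while stepping the bifurcation parameter α.

```python
def f(x, a):                      # logistic growth, illustration only
    return x * (a - x)

def dfdx(x, a):                   # derivative needed by Newton-Raphson
    return a - 2 * x

def newton(x, a, tol=1e-12):
    for _ in range(50):
        step = f(x, a) / dfdx(x, a)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("no convergence")

# crude time integration (Euler) to get near the attractor for a = 1
x, a, dt = 0.1, 1.0, 0.01
for _ in range(5000):
    x += dt * f(x, a)

# continuation: step the parameter, reuse the previous root as start value
branch = []
while a < 2.0:
    x = newton(x, a)
    branch.append((a, x))
    a += 0.1
print(branch[:3], branch[-1])
```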
Bifurcation analysis deals with qualitative changes in the asymptotic behaviour of the system when a parameter is varied in value; this selection from the list of parameters of the system is called the bifurcation parameter. Its interest rests on the interpretation of the system, so no general rules can be given for its selection. Table 3.1 gives the possible bifurcation types. The bifurcation type depends on the value of the eigenvalue of the Jacobian matrix evaluated at the equilibrium and on the Floquet multiplier, which is an eigenvalue of the Poincaré next-return map. If all complex values of the Floquet multipliers are within the unit circle, the dynamic system's orbit converges to a limit cycle.

The analysis of bifurcation behaviour must be done numerically, using specialized software: locbif [13] and auto [4] can calculate bifurcation diagrams using continuation methods. The theory is documented in [15]. The analyses cannot be done on a routine basis, however, and the user must have a fairly good idea of what to expect and what to look for. Although the software is rapidly improving in quality, at present it is still deficient in computing certain types of global bifurcations, for instance, and one has to rely on in-house software to fill in the gaps, see [3].

Results of bifurcation analyses are frequently reported in the form of bifurcation diagrams. These diagrams connect points where the system's asymptotic behaviour changes in a similar way when the bifurcation parameters are varied. So, the system has similar asymptotic behaviour for values of the bifurcation parameters within one region. The construction of such diagrams is only feasible if there are just one or two such parameters. Diagrams with these parameters are called operating diagrams.
Chapter 4

Model-based statistics

4.1 Scope of statistics

Statistics deals basically with the estimation of model parameters from data and with the testing of hypotheses about their values: do comparable parameter values in two samples differ significantly, or does a particular parameter value differ significantly from some given value? These ideas may be used for optimizing experimental design and sampling strategies.

Statistics cannot deal with problems like: is this model correct, or does this model fit significantly better than that model? Statistics treats the model as given. The goodness of fit of a model can be quantified. Only stochastic models can be tested against data; deterministic models are usually extended to stochastic ones by introducing a measurement error, and treating the result as a regression model. This is convenient, but rarely realistic. Models might fit data well for the wrong reasons; models are idealizations of reality, so we can only expect some deviations from model predictions. To what extent deviations from model predictions are problematic depends on the purpose, so on the context. Since we deal with probabilities, we cannot be sure of anything while using statistics; a correct model might fit data poorly.

A model gets its value from the mechanistically inspired assumptions from which it is derived. Without such assumptions (so if the model itself is the assumption), the model is close to useless, including all statistical inference based on it. This is the reason why one should never transform data, whatever statistical textbooks might say about this; transformations destroy the relationship between the model and its assumptions, and so the usefulness of the model.

Many statistical methods are based on linear models (e.g. ANOVA, multiple correlation, principal component analysis, factor analysis, auto-regression). Since such models rarely apply in biology, these methods are not discussed here.
4.2 Measurements: scales and units

A measurement assigns a numerical value to a quantity (object or process). Depending on the nature of this quantity, the measurement can be in one of the following scales:

Nominal scale The numbers only represent a name for a category of objects, e.g. sexes of organisms can be numbered as 0, 1, … (yes, some organisms have 4 sexes). No operators are defined for this scale, and the scale is invariant under bijection.

Ordinal scale The numbers have an order, such as marks of an exam, or wind speed in Beaufort. Addition and multiplication are not defined for this scale. This scale is invariant under monotone transformation.

Interval scale The differences between numbers have a meaning, such as temperature in degrees Celsius or Fahrenheit. This scale is invariant under linear transformation (y = ax + b).

Ratio scale The differences and ratios between numbers have a meaning, which implies that the value zero is not arbitrary, such as temperature in kelvin, or speed. This scale is invariant under proportional transformation (y = ax).

Absolute scale Like the ratio scale, so the scale has a natural origin, but it also has a natural unit. The classic example is counts.

Numbers on the interval and ratio scales have units. These units are standardized in the International System, and symbols are associated with these standardized units, see Table 4.1.

Table 4.1: Symbols for important single units of measurement in the SI system.

a   annum of time (1 a ≈ 365.25 d = 31.56 Ms)      A   ampere of electric current
At  ampere-turn of magnetomotive force              C   coulomb of electrical charge (1 C = 1 A s)
°C  degree Celsius (0 °C = 273.15 K)                cd  candela of luminous intensity
d   day of time (1 d = 24 h = 86.4 ks)              F   farad of electric capacitance (1 F = 1 C V⁻¹)
g   gram of mass                                    h   hour of time (1 h = 3600 s)
H   henry of inductance (1 H = 1 Wb A⁻¹)            Hz  hertz of frequency (1 Hz = 1 s⁻¹)
J   joule of energy (1 J = 1 N m)                   K   kelvin of temperature
l   liter of volume (1 l = 1 dm³)                   lm  lumen of luminous flux (1 lm = 1 cd sr)
lx  lux of illumination (1 lx = 1 lm m⁻²)           m   meter of length
mol mole of compound (1 mol = 6.02·10²³ molecules)  nt  nit of luminance
N   newton of force (1 N = 1 kg m s⁻²)              Ω   ohm of resistance (1 Ω = 1 V A⁻¹)
Pa  pascal of pressure (1 Pa = 1 N m⁻²)             rad radian of plane angle
s   second of time                                  S   siemens of electrical conduction (1 S = 1 Ω⁻¹)
sr  steradian of solid angle                        T   tesla of magnetic flux density (1 T = 1 Wb m⁻²)
V   volt of potential difference (1 V = 1 W A⁻¹)    W   watt of power (1 W = 1 J s⁻¹)
Wb  weber of magnetic flux (1 Wb = 1 V s)

To avoid the notation of small or large numbers, submultiple prefixes are used, such as in ms (millisecond) or km (kilometer); see Table 4.2.

Table 4.2: Standardized prefixes that can be used in combination with SI units.

10⁻¹⁸ a  atto-    10⁻¹⁵ f  femto-   10⁻¹² p  pico-    10⁻⁹ n  nano-
10⁻⁶  μ  micro-   10⁻³  m  milli-   10⁻²  c  centi-   10⁻¹ d  deci-
10¹   da deka-    10²   h  hecto-   10³   k  kilo-    10⁶  M  mega-
10⁹   G  giga-    10¹²  T  tera-    10¹⁵  P  peta-    10¹⁸ E  exa-
4.3 Precision and accuracy

The precision with which a variable should be measured completely depends on the aim of the measurement. Before measuring, we therefore ought to have an idea of what to do with the result. Too high a precision, or too low, can be a waste of effort. The circumstance that the one planning the measurement is often somebody else than the one actually performing it should stimulate an explicit statement of the desired precision. The presentation of the measurement should reflect its precision in the form of the correct number of so-called significant figures. This number is just one larger than the number of figures of which one is certain. The number 13.40, or 0.001340, has 4 significant figures. The measurer is certain of the first three figures, but the last one, 0, is estimated. Non-significant zeros are always suppressed. Therefore, the number of significant figures in 134000 is not obvious; it can range from 3 to 6. In order to make the number of significant figures explicit and, sometimes, to shorten notation, the notion of floating points has been introduced. In a floating point notation, one places the point just behind the first figure and multiplies with the appropriate integer power of 10. So 1.34·10⁴ has 3 significant figures, while 1.3400·10⁴ has 5. Computer-related manuscripts frequently use E for 'times ten to the power', like 1.34E4 or 1.3400E4.

By rounding off, one reduces the number of significant figures: one replaces the number by the nearest number with one (or more) significant figures less. The number 14.0979 is rounded to 14.098, 14.10, 14.1, 14, and 1E1, depending on the desired number of significant figures. In calculations, one should only round numbers in the end result, and not in intermediate results; otherwise, errors can build up rapidly. In machines, figures become lost by a process called truncation, where they are simply deleted. Usually, the number of figures is sufficiently large for this to be no practical problem. Sometimes, however, it is not easy to tell when errors induced by this truncation build up to an intolerable extent. In such a case one should change algorithm or machine.

Measurements can usually not be repeated exactly. The deviation from the unknown true value is termed error, which can be random or systematic. Random errors are characterized by the property that the mean of a sufficiently large number of independent measurements of the same object is arbitrarily close to the true value. This does not hold for systematic errors, like those occurring when the tare is not compensated when weighing objects. Sometimes the term precision is used to describe the number of significant figures, and the term accuracy to describe the distance to the real value (which is usually unknown; that is why we measure). The only defense against systematic errors is calibration, which is, of course, specific for the measurement. A bit more can be said about random errors, when we define them more exactly as the standard deviation. That is to say, we usually do not measure the size of a random error each time by repeating the measurement, but we use the value obtained from some prior calibration procedure (usually performed by the manufacturer of the equipment). The size of a random error can be expressed in terms of the absolute error, having the same dimension as the measurement itself, or in terms of the
relative error, where we divide by the measured value, giving a dimensionless quantity.

Many measurements (like the surface area of a rectangular body, speed, etc.) are compound, i.e. consist of a function of a number of other measurements. The following discussion shows how small independent random errors propagate in such compound measurements. For this purpose we linearize the function by a Taylor series approximation in the true value of the compound measurement. We explicitly neglect all terms involving powers of the error higher than 1, which is only reasonable if the error is small indeed. When f denotes the function of interest, x₁, x₂, … the measurements, y₁, y₂, … the true values for the measurements, i.e. the expected values for x₁, x₂, …, so ℰx_i = y_i, and d₁, d₂, … the deviations, i.e. d_i = x_i − y_i, we approximate f(x₁, x₂, …) by

\[ f(x_1, x_2, \cdots) \simeq f(y_1, y_2, \cdots) + \sum_i d_i f'_i(y_1, y_2, \cdots), \tag{4.1} \]

where f'_i represents the derivative of f with respect to x_i: f'_i(y₁, y₂, …) = (d/dx_i) f(y₁, y₂, …). The first useful observation, when we apply the expectation operator to both sides, is that

\[ \mathcal{E} f(x_1, x_2, \cdots) \simeq f(y_1, y_2, \cdots) + \sum_i (\mathcal{E} d_i)\, f'_i(y_1, y_2, \cdots) = f(y_1, y_2, \cdots) \]

So the expected value of a function of a set of random variables is approximated by the function of the expectations of those random variables. (It should be stressed that this approximation is only close for really small deviations indeed.) When we take the variance of both sides of (4.1), and apply the rules var c = 0 and var(cx) = c² var x for a constant c, and var(x₁ + x₂) = var x₁ + var x₂ for independent x₁ and x₂, we arrive at

\[ \mathrm{var}\, f(x_1, x_2, \cdots) \simeq \mathrm{var} \left( \sum_i d_i f'_i(y_1, y_2, \cdots) \right) = \sum_i \left( f'_i(y_1, y_2, \cdots) \right)^2 \mathrm{var}\, x_i \]

The square of the relative error is thus given by

\[ \frac{\mathrm{var}\, f(x_1, x_2, \cdots)}{f(y_1, y_2, \cdots)^2} \simeq \sum_i \left( \frac{d}{dx_i} \ln f(y_1, y_2, \cdots) \right)^2 \mathrm{var}\, x_i \tag{4.2} \]

From (4.2), we obtain the following squared relative errors, which are of practical interest:

\[ \frac{\mathrm{var}(x_1 + x_2)}{(y_1 + y_2)^2} = \frac{\mathrm{var}\, x_1 + \mathrm{var}\, x_2}{(y_1 + y_2)^2} \tag{4.3} \]

\[ \frac{\mathrm{var}(x_1 - x_2)}{(y_1 - y_2)^2} = \frac{\mathrm{var}\, x_1 + \mathrm{var}\, x_2}{(y_1 - y_2)^2} \tag{4.4} \]

\[ \frac{\mathrm{var}(x_1 x_2)}{(y_1 y_2)^2} \simeq \frac{\mathrm{var}\, x_1}{y_1^2} + \frac{\mathrm{var}\, x_2}{y_2^2} \tag{4.5} \]

\[ \frac{\mathrm{var}(x_1 / x_2)}{(y_1 / y_2)^2} \simeq \frac{\mathrm{var}\, x_1}{y_1^2} + \frac{\mathrm{var}\, x_2}{y_2^2} \tag{4.6} \]

\[ \frac{\mathrm{var}\, x^n}{y^{2n}} \simeq n^2\, \frac{\mathrm{var}\, x}{y^2} \tag{4.7} \]

\[ \frac{\mathrm{var}\, nx}{(ny)^2} = \frac{\mathrm{var}\, x}{y^2} \tag{4.8} \]

The relations (4.3), (4.4) and (4.8) also follow directly by applying the variance operator; they are exact rather than approximative. If the relative errors of the components of a compound measurement are known, (4.3)-(4.8) can be used to obtain the relative error of the compound measurement. As a rule, the last significant figure in the presentation of the (compound) measurement should be of the same order of magnitude as the error.
Example: If we measure that a daphnid of length 3.0 ± 0.2 mm ingested (4.0 ± 0.5)·10⁵ algal cells in 1.00 ± 0.01 h, and if we know that the ingestion rate is proportional to the squared length, then we find an ingestion rate for a 4 mm daphnid of 4.0·10⁵ × 4²/3.0² = 7.111·10⁵ cells/h. The relative error is ((0.5/4.0)² + (0.01/1.00)² + 4(0.2/3.0)²)^{1/2} = 0.18, so the absolute error is 7.111·10⁵ × 0.18 = 1.3·10⁵ cells/h, leading to the final presentation of (7.1 ± 1.3)·10⁵ cells/h. Note that the largest relative errors, those of the cell count and of the squared length, dominate the error in the end result.
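A quick numerical check of such error propagation, both with formula (4.2) and by brute-force Monte Carlo (the numbers reproduce the daphnid example above, the normality of the errors is an assumption):

```python
import math, random

# measured values and standard deviations (daphnid example)
cells, sd_cells = 4.0e5, 0.5e5   # ingested cells
time_, sd_time  = 1.00, 0.01     # h
length, sd_len  = 3.0, 0.2       # mm

f = cells * 4**2 / (time_ * length**2)   # predicted rate for a 4 mm daphnid

# linearized propagation, eq. (4.2): sum of squared relative errors
rel2 = (sd_cells/cells)**2 + (sd_time/time_)**2 + 4*(sd_len/length)**2
print(f, math.sqrt(rel2))        # about 7.1e5 and 0.18

# Monte Carlo check with normally distributed errors
sim = [random.gauss(cells, sd_cells) * 16 /
       (random.gauss(time_, sd_time) * random.gauss(length, sd_len)**2)
       for _ in range(100000)]
mean = sum(sim) / len(sim)
sd = (sum((s - mean)**2 for s in sim) / len(sim)) ** 0.5
print(sd / mean)                 # close to the linearized value
```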
4.4 Smoothing and interpolation

If no strong preference for a particular model function exists, cubic splines can be used to interpolate between data points [25]. Cubic splines have parameters, called knots (also nodes or joints), consisting of a sequence of point coordinates {x_i, y_i}_{i=1}^n. Between two neighbouring knots, the cubic spline is a third-degree polynomial; left of the first knot and right of the n-th knot, the cubic spline is a first-degree polynomial. The coefficients of the n − 1 piecewise cubic polynomials and the 2 line segments are determined by the constraints that the cubic spline hits all knots, and that the first and second derivatives of the polynomials left and right of each knot are equal. So one cannot see the knots in a graph of the spline; the n + 1 pieces are glued smoothly.

Given the knots {x_i, y_i}_{i=1}^n for n > 3, and d_i = x_{i+1} − x_i, δ_i = y_{i+1} − y_i, the second derivatives at the knots, y″₂, …, y″_{n−1} for y″_i = y″(x_i), can be found from

\[ d_{i-1} y''_{i-1} + 2(d_{i-1} + d_i)\, y''_i + d_i y''_{i+1} = 6\left( \delta_i/d_i - \delta_{i-1}/d_{i-1} \right) \quad \text{for } i = 2, \cdots, n-1 \]

while y″₁ = y″ₙ = 0. We will need Δ_i = y″_{i+1} − y″_i. The first derivatives at the knots, y′₁, …, y′ₙ for y′_i = y′(x_i), are

\[ y'_i = \delta_i/d_i - (2 y''_i + y''_{i+1})\, d_i/6 \quad \text{for } i = 1, \cdots, n-1; \qquad y'_n = \delta_{n-1}/d_{n-1} + (y''_{n-1} + 2 y''_n)\, d_{n-1}/6 \]

The cubic spline is now given by

\[ y(x) = \begin{cases} y_1 - (x_1 - x)\, y'_1 & \text{for } x \le x_1 \\ y_i + (x - x_i)\, \delta_i/d_i - (x - x_i)(x_{i+1} - x)\left( y''_i + y''(x) + y''_{i+1} \right)/6 & \text{for } x_i < x \le x_{i+1},\ i = 1, \cdots, n-1 \\ y_n + (x - x_n)\, y'_n & \text{for } x > x_n \end{cases} \]

The first derivative is

\[ y'(x) = \begin{cases} \delta_1/d_1 - d_1 y''_2/6 & \text{for } x \le x_1 \\ y'_i + (x - x_i)\, y''_i + (x - x_i)^2\, \Delta_i/(2 d_i) & \text{for } x_i < x \le x_{i+1},\ i = 1, \cdots, n-1 \\ \delta_{n-1}/d_{n-1} + d_{n-1} y''_{n-1}/6 & \text{for } x > x_n \end{cases} \]

The second derivative is

\[ y''(x) = \begin{cases} 0 & \text{for } x \le x_1 \\ y''_i + (x - x_i)\, \Delta_i/d_i & \text{for } x_i < x \le x_{i+1},\ i = 1, \cdots, n-1 \\ 0 & \text{for } x > x_n \end{cases} \]

The third derivative is y‴(x) = Δ_i/d_i for x_i < x ≤ x_{i+1}, i = 1, …, n − 1, and y‴(x) = 0 for x < x₁ and x > x_n. The integral is

\[ \int_a^b y(x)\, dx = \begin{cases} (b - a)\, y_1 - (b - a)(x_1 - a/2 - b/2)\, y'_1 & \text{for } a \le b \le x_1 \\[1ex] (b - a)\, y_i + (b - a)(a/2 + b/2 - x_i)\, \delta_i/d_i \\ \quad + \frac{1}{6}\left( x_i x_{i+1}(b - a) - (x_i + x_{i+1})(b^2 - a^2)/2 + (b^3 - a^3)/3 \right) \left( 2 y''_i + y''_{i+1} - x_i \Delta_i/d_i \right) \\ \quad + \frac{1}{6}\left( x_i x_{i+1}(b^2 - a^2)/2 - (x_i + x_{i+1})(b^3 - a^3)/3 + (b^4 - a^4)/4 \right) \Delta_i/d_i \\ \hspace{12em} \text{for } x_i \le a \le b \le x_{i+1},\ i = 1, \cdots, n-1 \\[1ex] (b - a)\, y_n - (b - a)(x_n - a/2 - b/2)\, y'_n & \text{for } x_n \le a \le b \end{cases} \]

If the knots are all data points, the spline is called an interpolating spline. Scatter in the data can easily lead to erratic behaviour of the interpolating spline. In practice fewer knots are chosen, i.e. the values for the independent variable are selected, and those for the dependent variable are chosen such that the sum of squared distances between the data points and the spline is minimized. Such a spline is called a smoothing spline. The fewer the number of knots, the smoother the spline, but 4 is the minimum. It is strongly recommended to check spline fittings graphically.

The advantage of splines is that the shape of the function in one interval of the domain hardly affects the shape in another interval; this in contrast to a single high-degree polynomial, for instance.

Important applications, apart from interpolation, are in numerical differentiation, integration and root finding, if only data points are available; the data need not be measured values, but can also be calculated values. Notice that the second derivative of the cubic spline is still continuous, but the third derivative does not exist at the knots.
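In practice one rarely codes these formulas by hand; scipy's CubicSpline, for instance, implements the natural variant used above. A short sketch (the data values are invented):

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.5, 4.0, 6.0, 8.0, 10.0])   # knot abscissae
y = np.sin(x) * np.exp(-0.2 * x)                      # invented ordinates

# 'natural' boundary conditions: second derivative zero at both ends,
# matching the y''_1 = y''_n = 0 convention above
spl = CubicSpline(x, y, bc_type='natural')

xs = 3.3
print(spl(xs))                   # interpolated value
print(spl(xs, 1), spl(xs, 2))    # first and second derivatives
print(spl.integrate(0.0, 10.0))  # definite integral over the domain
```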
What splines do for functions, Bézier curves do for isoclines. These are functions of the type

\[ p(u) = \sum_{i=0}^{n} p_i \binom{n}{i} u^i (1 - u)^{n-i} \]

where p_i is a control point, and n some chosen number, frequently 3. While u runs from 0 to 1, the curve p(u) connects point p(0) = p₀ with p(1) = pₙ. The curve does not hit intermediary control points; at p₀, the curve is tangent to the line segment (p₀, p₁), and at pₙ the curve is tangent to the line segment (p_{n−1}, pₙ). If p₀ and pₙ coincide, the curve is closed; an approximate circle results if p₁, …, p₄ are on the corners of a square, and p₀ = p₅ is in the middle of an edge. The point p₀ of the new segment i + 1, called p₀^{i+1}, will generally coincide with point pₙ^i of the last segment. If p_{n−1}^i, pₙ^i = p₀^{i+1}, p₁^{i+1} are on the same line, the joint between two adjacent Bézier curves is smooth. A problem of Bézier curves is that the whole segment changes in shape if one control point is changed. B-splines are used to solve these problems [20].
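For evaluation, the de Casteljau scheme (repeated linear interpolation of the control points) is numerically more stable than summing the Bernstein polynomials directly; a minimal sketch with made-up control points:

```python
def de_casteljau(points, u):
    """Evaluate a Bezier curve at parameter u in [0, 1] by
    repeated linear interpolation of the control points."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        pts = [tuple((1 - u) * a + u * b for a, b in zip(p, q))
               for p, q in zip(pts[:-1], pts[1:])]
    return pts[0]

# a cubic segment (n = 3) with invented control points
ctrl = [(0, 0), (1, 2), (3, 2), (4, 0)]
print(de_casteljau(ctrl, 0.0))   # (0, 0): the curve starts at p0
print(de_casteljau(ctrl, 0.5))   # midpoint of the curve
print(de_casteljau(ctrl, 1.0))   # (4, 0): the curve ends at pn
```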
Figure 4.1: The 20 data points are indicated by red +, the 6 knots with blue +; the smoothing cubic spline is drawn in green, the first derivative in blue, the second in magenta, the third in red.
4.5 Testing hypotheses

A statistical test usually takes the form of

- formulation of a null (H₀) and an alternative (H₁) hypothesis about the value of a parameter, e.g. H₀: p = p₀ and H₁: p = p₁ (a so-called simple alternative) or e.g. H₁: p ≠ p₀ (a composite alternative)
- definition of a test statistic, which is some appropriate function of the observations
- derivation of the distribution of the test statistic under the null hypothesis
- rejection of the null hypothesis if the survivor function under H₀ at the value of the test statistic is less than some preselected value: the significance level α.

Two types of errors can occur in this decision scheme:

Type 1 error: H₀ is rejected while it is true. This occurs with a probability that is equal to the significance level.

Type 2 error: H₀ is accepted while H₁ is true. This occurs with a probability that is equal to 1 minus the power of the test. The distribution of the test statistic under H₁ must be derived to calculate the power.

The Neyman-Pearson Theorem states that, if x₁, …, xₙ represent n independent trials from some p.d.f., the best (i.e. uniformly most powerful) choice for a test statistic for simple alternatives is the likelihood ratio (see below).

Example: a test on the value of the binomial probability is as follows. Suppose that we have a trial x from the binomial probability distribution with parameters n and p, and want to test the null hypothesis H₀: p = 0.2 against the alternative H₁: p = 0.4. We choose the significance level α and obtain the smallest value for m for which Σ_{i=m}^{n} Pr{x = i | p = 0.2} ≤ α holds. We reject H₀ if x ≥ m. In this case it is usually not possible to test exactly at significance level α, because x is an integer. The power of the test is Σ_{i=m}^{n} Pr{x = i | p = 0.4}.
4.6 Likelihood functions

Consider a random sample x₁, x₂, …, xₙ of some discrete random variable that has probability distribution Pr{x = x; θ}, where θ represents the vector of parameters. If the samples are independent, the simultaneous probability of x₁, x₂, …, xₙ is Π_{i=1}^n Pr{x_i = x_i; θ}. This simultaneous probability may be regarded as a function of θ. When so regarded, it is called the likelihood function L of the random sample, and we write

\[ L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \Pr\{x_i = x_i; \theta\} \]

The value of θ that maximizes L is called the maximum likelihood estimate (MLE) of θ; it will be denoted as θ̂ and is a number. If we replace the observed values x_i by the random variables x_i in the MLE, we have a maximum likelihood estimator θ̂, which is a random variable. Be aware of the problem that likelihood functions usually have several local maxima, while the MLE only relates to the global maximum.

Example with real data: Suppose we have the following 10 observations from a Poisson variable with unknown parameter λ: 4, 0, 2, 5, 3, 2, 4, 1, 7, 3. Which value of λ fits best to the data? According to the maximum likelihood principle we have to maximize the product of the 10 Poisson probabilities

\[ L(\lambda; 4, 0, \ldots, 3) = f(4; \lambda) f(0; \lambda) \cdots f(3; \lambda) = \frac{\lambda^4}{4!} e^{-\lambda}\, \frac{\lambda^0}{0!} e^{-\lambda} \cdots \frac{\lambda^3}{3!} e^{-\lambda} = \frac{\lambda^{31}}{4!\, 0! \cdots 3!}\, e^{-10\lambda} \]

This function can be maximized by setting the first derivative of L with respect to λ equal to zero. Note, however, that L and ln L (denoted as ℓ) are maximal for the same value of λ. It appears to be easier to work with ℓ instead of L:

\[ \ell(\lambda) = 31 \ln \lambda - \ln(4!\, 0! \cdots 3!) - 10\lambda; \qquad \frac{d\ell}{d\lambda} = \frac{31}{\lambda} - 10 = 0 \;\Rightarrow\; \lambda = \frac{31}{10} = 3.1 \]

So the MLE λ̂ = 3.1, the average of the data. From this simple example we can derive a more general result: if we have a sample from a Poisson variable, the maximum likelihood estimator of the parameter is given by

\[ \hat{\lambda} = \frac{\sum x_i}{n} = \bar{x} \]

For n independent samples from a continuous random variable, we can likewise define the likelihood function

\[ L(\theta; x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f_{x_i}(x_i; \theta) \]
with dim(L) = dim(x)⁻ⁿ. An example for the exponentially distributed random variable with unknown parameter λ is as follows. Let x₁, x₂, …, xₙ be such a sample of size n. The likelihood L is now given by the simultaneous p.d.f.

\[ L(\lambda; x_1, \ldots, x_n) = f(x_1; \lambda) f(x_2; \lambda) \cdots f(x_n; \lambda) = \lambda e^{-\lambda x_1}\, \lambda e^{-\lambda x_2} \cdots \lambda e^{-\lambda x_n} = \lambda^n e^{-\lambda \sum x_i} \]

As in the previous example we take the logarithm of L:

\[ \ell(\lambda) = n \ln \lambda - \lambda \sum x_i \]

From this the maximum likelihood estimator can easily be found to be

\[ \hat{\lambda} = \frac{n}{\sum x_i} = \frac{1}{\bar{x}} \]
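When no closed form exists, the log likelihood is maximized numerically; a sketch with scipy, checking against the analytical exponential result (the sample is invented):

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([0.8, 0.1, 1.9, 0.4, 0.7, 2.3, 0.2, 1.1])  # invented sample

def neg_loglik(lam):
    # minus the exponential log likelihood: n*ln(lam) - lam*sum(x)
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method='bounded')
print(res.x, 1.0 / x.mean())   # numerical and analytical MLE agree
```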
Example with two parameters: the normal distribution. Suppose we have a sample from a normal distribution N(μ, σ²). The log likelihood is then given by

\[ \ell(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{\sum_i (x_i - \mu)^2}{2\sigma^2} \tag{4.9} \]

To find the maximum we take the two partial derivatives with respect to μ and σ² and equate these to zero:

\[ \frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum (x_i - \mu); \qquad \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum (x_i - \mu)^2 \]

This leads to the following MLEs for μ and σ²:

\[ \hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum (x_i - \bar{x})^2 \tag{4.10} \]
4.7 Large sample properties of ML estimators

The estimator θ̂ is said to be an unbiased estimator for θ if ℰθ̂ = θ. ML estimators are generally only asymptotically unbiased, i.e. for increasingly large sample sizes.

The ML estimator for the Poisson parameter has nice properties: it is obvious that ℰx̄ = λ and var x̄ = λ/n, so ℰλ̂ = λ and var λ̂ = λ/n. The first property means that the maximum likelihood estimator for the Poisson parameter is unbiased. We also know the distribution of the estimator: Σ x_i ∼ Poisson(nλ), so λ̂ ∼ n⁻¹ Poisson(nλ).

The ML estimator for the exponential parameter does not have the same nice properties as in the previous example. Here it is not straightforward to calculate the expectation of λ̂. It appears that ℰ(1/x̄) = λ n/(n − 1), i.e. λ̂ is biased. But for large n the bias disappears; we say that λ̂ is asymptotically unbiased. This appears to be a general result.

It can be shown that ML estimators have, asymptotically, minimum variance, i.e. no other estimators can be devised that have a smaller variance than ML estimators for increasingly large sample sizes.

The ML theory also applies to a vector of parameters. Under general regularity conditions it can be proved that a maximum likelihood estimator θ̂ asymptotically (i.e. for large values of the sample size n) follows a normal distribution:

\[ \hat{\theta} \sim N(\theta, \Sigma(\theta)) \quad \text{where} \quad \Sigma(\theta) = \left( -\mathcal{E} \frac{d^2 \ell}{d\theta\, d\theta^T} \right)^{-1} = \left( \mathcal{E} \left( \frac{d\ell}{d\theta} \frac{d\ell}{d\theta^T} \right) \right)^{-1} \]

In the case of a Poisson distribution this leads to

\[ \hat{\lambda} \sim N(\lambda, \lambda/n) \quad \text{for large values of } n \]

For the exponential distribution we get

\[ \hat{\lambda} \sim N(\lambda, \lambda^2/n) \quad \text{for large values of } n \]

The asymptotic distributions can be used to construct confidence intervals for the parameter; such an interval is said to be of level α with boundaries θ₀ and θ₁ if Pr{θ₀ ≤ θ̂ ≤ θ₁} = α. Notice that such an interval is not unique. A different way to get confidence intervals is by the use of profile likelihoods.
4.8 Likelihood ratio principle

The likelihood function can be used to construct parameter estimators, but it can also be used to test hypotheses about those parameters. The idea is as follows: we take a sample from a distribution with unknown parameter vector θ = (θ₁, θ₂, …, θ_k) that can take values in a set Θ. We want to test a hypothesis, for instance θ₁ = 0 or, more generally, θ ∈ Θ₀. Now we maximize the likelihood in two ways: first by constraining ourselves to Θ₀, next without any constraint. Then we look at LR, the ratio of the two likelihoods:

\[ LR = \frac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)} \]

If LR is too small we have to reject the null hypothesis. What we mean by 'too small' is determined by the distribution of LR in the usual way: a critical value c is given by Pr{LR < c | H₀} = α, where α is the significance level (usually 0.05). Note that we can also work with the logarithm of LR, that is the difference between the log likelihoods: max_{θ∈Θ₀} ℓ(θ) − max_{θ∈Θ} ℓ(θ). Because it is always negative, we usually work with the deviance −2 ln LR:

\[ \mathcal{L} = 2\left( \max_{\theta \in \Theta} \ell(\theta) - \max_{\theta \in \Theta_0} \ell(\theta) \right), \]

which is always positive.

This idea has been worked out for a lot of standard experiments: the well-known t-test, ANOVA and linear regression tests are all examples of likelihood ratio tests, based on the normal distribution. The binomial test for binary data (0/1, yes/no, blue/green) is also based on the same principle. These tests are exact: they do not assume large samples.

There is a large sample approach that is generally applicable: it can be proved that asymptotically

\[ \mathcal{L} \sim \chi^2_\nu \]

where ν is the difference in the number of parameters to be estimated between maximization over Θ and over Θ₀. Graphically this means that we first approximate the −2 ln likelihood ratio around the MLE by the tangent parabola:

\[ \mathcal{L}(\theta) \simeq (\theta - \hat{\theta})^2 \left( -\frac{d^2 \ell}{d\theta^2}(\hat{\theta}) \right) \]

The boundary values of the 100(1 − α)% confidence region are found by the intersection of this parabola with the line ℒ(θ) = χ²_{ν;1−α}, which gives

\[ \hat{\theta} - \sqrt{ \chi^2_{\nu;1-\alpha} \Big/ \left( -\frac{d^2 \ell}{d\theta^2}(\hat{\theta}) \right) } \;<\; \theta \;<\; \hat{\theta} + \sqrt{ \chi^2_{\nu;1-\alpha} \Big/ \left( -\frac{d^2 \ell}{d\theta^2}(\hat{\theta}) \right) } \]

A simple example: suppose we have data x₁, x₂, …, xₙ from an exponential distribution and Θ = (0, ∞). The ln likelihood is given by ℓ(λ) = n(ln λ − λx̄). The maximum value of ℓ over Θ is reached for λ = λ̂ = 1/x̄ and amounts to ℓ(λ̂) = −n(ln x̄ + 1), so −(d²ℓ/dλ²)(λ̂) = n λ̂⁻² = n x̄², and ℒ(λ) = 2n(λx̄ − 1 − ln(λx̄)). The tangent parabola approximation is ℒ(λ) ≈ n(λx̄ − 1)², which gives the 100(1 − α)% confidence interval

\[ \frac{1}{\bar{x}} \left( 1 - \sqrt{ \chi^2_{1;1-\alpha}/n } \right) < \lambda < \frac{1}{\bar{x}} \left( 1 + \sqrt{ \chi^2_{1;1-\alpha}/n } \right) \]

see Figure 4.2. If we want to test the hypothesis λ = 0.5, we have Θ₀ = {0.5}. Maximization of ℓ over Θ₀ is simple: it is ℓ(0.5). This leads to the following test rule: reject H₀: λ = 0.5 if

\[ 2n\left( 0.5\, \bar{x} - 1 - \ln(0.5\, \bar{x}) \right) > \chi^2_{1;1-\alpha} \]
4.8.1 Likelihood-based confidence region

We can use the previous approach to construct a confidence region for a parameter vector θ = (θ₁, θ₂, …, θ_k). The region consists of all parameter vector values θ₀ that do not lead to rejection of the hypothesis H₀: θ = θ₀. This leads to the 100(1 − α)% confidence region

\[ \left\{ \theta : 2 \ln \frac{L(\hat{\theta})}{L(\theta)} < \chi^2_{k;1-\alpha} \right\} \]
Figure 4.2: The −2 ln likelihood ratio, ℒ, as a function of the parameter λ, together with the tangent parabolic approximation, for 5 samples from an exponential distribution. The range for which the functions are below the indicated horizontal line represents the 95% confidence interval. ℒ is here proportional to the sample size; an increase of the number of samples has the same effect as lowering the horizontal line. Close to the MLE, the parabolic approximation is very good, but for small samples it should not be used.
4.8.2 Profile likelihood

The confidence region defined above becomes difficult to handle or to communicate in the case of multi-parameter models. In most cases we are interested in one parameter at a time, say in θ₁. Therefore we calculate the profile likelihood L_p(θ₁), given by

\[ L_p(\theta_1) = \max_{\theta_2, \ldots, \theta_k} L(\theta_1, \theta_2, \ldots, \theta_k) \]

In the case of a one-parameter model, the boundary values of the 100(1 − α)% confidence region are found by the intersection of the −2 ln likelihood ratio with the line ℒ(θ) = χ²_{1;1−α}. In the section on likelihood-based confidence regions we used the tangent parabola rather than the likelihood ratio itself; with the likelihood ratio the boundary values of the confidence region are no longer equidistant from the MLE. In the case of the exponential distribution the boundary values of the 100(1 − α)% confidence interval have to be found numerically from ℒ(λ) = 2n(λx̄ − 1 − ln(λx̄)) = χ²_{1;1−α}, see Figure 4.2. The purpose is to demonstrate the general applicability of the profile likelihood method; the fact that the exact confidence interval can be obtained analytically in this particular case is of no relevance.

In the case of a two-parameter model the likelihood function can be seen as a mountain landscape. The profile likelihood is then the skyline of this landscape in one dimension. From this one-dimensional function we can calculate a 100(1 − α)% confidence interval for θ₁, given by

\[ \left\{ \theta_1 : 2 \ln \frac{L(\hat{\theta})}{L_p(\theta_1)} < \chi^2_{1;1-\alpha} \right\} \]

The confidence region is based on large samples. Profile likelihood ratios tend to give more correct confidence intervals for small samples than approximations by tangent parabolas. The practical problem in the application of profile likelihoods is that for each value of the parameter under consideration, the MLEs for the other parameters have to be obtained, which might be computationally intensive.

In the case of a likelihood function with more than one (local) maximum this definition of a confidence region may lead to more than one disjunct interval. In general it would be better to use the term confidence set instead of confidence interval.
4.9 Regression

Regression models, deterministic models for dependent variables with additive scatter, frequently have the form

\[ y_i(x) \sim N(f(x_i; \theta), \sigma^2) \]

where f is some specified function of an independent variable x with parameter vector θ, and σ a (constant) scatter parameter. The ln likelihood function is

\[ \ell(\theta, \sigma^2) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{\sum_i (y_i - f(x_i; \theta))^2}{2\sigma^2} \]

The maximization of this likelihood function as a function of the parameters θ amounts to a minimization of the sum of squared deviations between the observed values y_i and the (deterministic) model predictions f(x_i; θ). The least squared deviation criterion for parameter estimation is thus a special case of the ML criterion. The ML estimator for σ², writing f_i = f(x_i; θ), is

\[ \hat{\sigma}^2 = \frac{1}{n} \sum_i (y_i - f_i)^2. \]

If we want to test the hypothesis that p parameters have specified values, based on n observations, the deviance amounts to ℒ = n ln(σ̂₀²/σ̂₁²), where σ̂₁² is the estimated variance if all parameters are estimated, while σ̂₀² is that given the values of the p parameters. Under the null hypothesis, ℒ is χ² distributed with p degrees of freedom.

Biological data tend to have a constant variation coefficient (vc), so σ = v_c f(x; θ) with f(x; θ) > 0, rather than a constant variance. For normally distributed dependent variables the ln likelihood function now amounts to

\[ \ell(\theta, v_c) = -\frac{n}{2} \ln(2\pi) - n \ln v_c - \sum_i \ln f(x_i; \theta) - \frac{1}{2 v_c^2} \sum_i \left( y_i/f(x_i; \theta) - 1 \right)^2 \]

This leads, for f_i = f(x_i; θ), to

\[ \hat{v}_c^2 = \frac{1}{n} \sum_i (y_i/f_i - 1)^2 \]

The ML estimates minimize ln σ̂ in the constant-sd model, and ln v̂_c + (1/n) Σ_i ln f_i in the proportional-sd model.

Figure 4.3: Data with curves y(x) = y_∞ − (y_∞ − y₀) exp(−rx), with ML estimated parameters, based on the assumption of normally distributed scatter with constant standard deviation (in red: y₀ = 0.023, y_∞ = 2.98, r = 0.695) and with standard deviation proportional to the mean (in green: y₀ = 0.003, y_∞ = 3.01, r = 0.686). The small difference is rather typical, but exceptions can occur.
4.10 Composite likelihoods

In the practice of statistical analyses of biological data, it frequently happens that not a single but several variables have been measured, and that particular parameters occur in more than one data set. A very much related problem is the case that the same variable has been measured in different experiments, while some parameters have the same value and others (for instance parameters that relate to scatter) have different values for the different experiments. As long as the data are mutually independent, the method of estimating the parameters (point and interval estimates) is straightforward within the context of the likelihood principle. We illustrate this with a simple example.

Suppose we have data sets {x_i, y_i}_{i=1}^n and {v_i, w_i}_{i=1}^m. We assume that the random variable y is normally distributed with mean ax and variance σ²_y, while the random variable w is also normally distributed with mean av and variance σ²_w. The ln likelihood function for this case is

\[ \ell(a, \sigma_y, \sigma_w) = -\frac{n}{2} \ln 2\pi\sigma_y^2 - \frac{1}{2\sigma_y^2} \sum_{i=1}^{n} (y_i - a x_i)^2 - \frac{m}{2} \ln 2\pi\sigma_w^2 - \frac{1}{2\sigma_w^2} \sum_{i=1}^{m} (w_i - a v_i)^2 \]

The maximization of the ln likelihood function as a function of the parameter a amounts to a minimization of the weighted sum of squared deviations between measured and expected values for y_i and w_i (terms 2 and 4 in the right-hand side of the formula). The weight coefficients are inversely proportional to the variances. The resulting MLE for the parameter a turns out to be

\[ \hat{a} = \left( \hat{\sigma}_y^{-2} \sum_{i=1}^{n} x_i y_i + \hat{\sigma}_w^{-2} \sum_{i=1}^{m} v_i w_i \right) \left( \hat{\sigma}_y^{-2} \sum_{i=1}^{n} x_i^2 + \hat{\sigma}_w^{-2} \sum_{i=1}^{m} v_i^2 \right)^{-1} \]

while the variances are estimated by

\[ \hat{\sigma}_y^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{a} x_i)^2 \quad \text{and} \quad \hat{\sigma}_w^2 = \frac{1}{m} \sum_{i=1}^{m} (w_i - \hat{a} v_i)^2 \]

Notice that the MLE of σ² equals the square of the MLE of σ; this is just a special case of a general property of MLEs: the MLE of a (parameter-free) function of a parameter is the function of the MLE of the parameter. Notice also that the estimates for a, σ_y and σ_w are given implicitly only; explicit estimates can only be obtained numerically.
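The implicit estimates invite a simple fixed-point iteration: estimate a with the current variance weights, re-estimate the variances, and repeat. A sketch with invented data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0]); y = np.array([2.1, 3.9, 6.2, 7.8])
v = np.array([0.5, 1.5, 2.5]);      w = np.array([1.2, 2.8, 5.3])

sy2, sw2 = 1.0, 1.0                  # initial variance guesses
for _ in range(100):
    # weighted MLE for the shared slope a, weights 1/sy2 and 1/sw2
    a = ((x @ y) / sy2 + (v @ w) / sw2) / ((x @ x) / sy2 + (v @ v) / sw2)
    sy2 = np.mean((y - a * x) ** 2)  # ML variance estimates given a
    sw2 = np.mean((w - a * v) ** 2)
print(a, sy2, sw2)
```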
4.11 Parameter identifiability

A parameter is said to be unidentifiable if it cannot be estimated from a given data set. It is theoretically unidentifiable if it can never be estimated, no matter how extensive the data set is. The parameters a and x₀ in the model y(x) = a(x/x₀)^b + ε, for example, cannot both be estimated from a data set {x_i, y_i}_{i=1}^n. The solution to this problem is re-parametrization, using fewer parameters. Here we have y(x) = a′ x^b + ε, where a′ = a x₀⁻ᵇ (although this gives a dimension problem, which strangely does not stop many workers from using this model).

A parameter set is practically unidentifiable if it cannot be estimated from the given data set, but this problem might go away for other data sets. The parameters y_m and k in the model y(x) = y_m x/(k + x) + ε can hardly be estimated from the data set {x_i, y_i}_{i=1}^n if max_i x_i ≪ k; the estimates of y_m and k are then strongly negatively correlated. The solution to this problem is either extending the data set or re-parametrization with fewer parameters. If x ≪ k we have y(x) ≈ y′_m x + ε, where y′_m = y_m/k.

The properties of parameter estimators depend on the way the parameters are introduced. In the regression of y on x, the estimators for parameters a and b in the relationship y = x²(a + bx) are strongly negatively correlated when, in the observations {x_i, y_i}_{i=1}^n, all x_i > 0; the mathematically totally equivalent relationship y = x²(c + b(x − Σ_i x_i³/Σ_i x_i²)) suffers much less from this problem. Replacement of the original parameters by appropriately chosen compound parameters can also reduce correlations between parameter estimates.
4.12 Monte Carlo techniques

The possibilities to evaluate the properties of MLEs for small samples are very limited; likelihood functions are even difficult to construct for interacting sub-processes. The construction of such functions is only feasible if we deal with independent trials from some distribution. The method that is frequently the only one left is the Monte Carlo method, where computer simulations are used to evaluate the propagation of effects of stochasticity.

The basic tool is the random generator, which is a recurrent deterministic algorithm; a next trial from an uncorrelated, approximately homogeneously distributed random variable is obtained from a previous one, after initialization with a seed. These numbers are subsequently transformed to produce random trials from other distributions and used to simulate the process under consideration. This has to be done many times to evaluate the role of stochasticity, but that need not be a problem.

As an example of the application of Monte Carlo methods, we check the validity of the likelihood-based confidence interval for the ML estimator of the parameter of the exponential distribution. The procedure is to choose a confidence level, obtain the likelihood-based confidence interval, and calculate the fraction of the MLEs that are in this interval. We repeat this for many choices of confidence levels. Figure 4.4 shows that even for 2 random trials, the likelihood-based confidence interval is close to correct. (The tail probabilities are in this case not exactly equal, however.)
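A sketch of this coverage check for one confidence level, using the same exponential deviance as before (true parameter λ = 1, sample size and seed chosen arbitrarily):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(seed=42)
n, level, lam_true = 5, 0.95, 1.0
crit = chi2.ppf(level, df=1)

def covers(sample):
    # the true value lies inside the likelihood-based region iff its
    # deviance 2n(lam*xbar - 1 - ln(lam*xbar)) is below the chi2 line
    xbar = sample.mean()
    dev = 2 * len(sample) * (lam_true * xbar - 1 - np.log(lam_true * xbar))
    return dev < crit

hits = sum(covers(rng.exponential(1 / lam_true, size=n))
           for _ in range(10000))
print(hits / 10000)   # empirical coverage, close to 0.95
```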
Figure 4.4: Left: the empirical survivor function of the ML estimator based on 2, 5 and 10 random trials from an exponential distribution with parameter 1. Right: the corresponding empirical confidence level, plotted against the likelihood-based confidence level. All curves were generated from 1000 MLEs.
Chapter 5

Notation

The following conventions are used:

summation: Σ_{i=1}^n x_i = x₁ + x₂ + ⋯ + xₙ. If the summation is over all indices, this is summarized as Σ_i x_i. The double summation Σ_{j=1}^m Σ_{i=1}^n x_{ij} is summarized as Σ_{ij} x_{ij}. The notation Σ_{i≠j} x_{ij} means: summation over all indices i and j such that i ≠ j, so the elements x_{ii} are excluded. The notation Σ_{i<j} x_{ij} means: summation over all indices i and j such that i < j.

product: Π_{i=1}^n x_i = x₁ x₂ ⋯ xₙ. Index handling is similar to summation.

integration: ∫_{x=a}^b f(x) dx or ∫_a^b f(x) dx, where x runs from a till b. If the integration is over the whole domain, it is summarized as ∫_x f(x) dx. The notation ∫ f(x) dx, however, means the indefinite integral.

capitals for matrices
T for transposition
′ and d/dx for differentiation
diag for diagonal elements
| | for determinant
|| || for length
underline for random variables
Pr for probability
var for variance
cov for covariance
cor for correlation coefficient
vc for variation coefficient
sd for standard deviation
ℰ for expectation operator
dim for dimension
Bibliography

[1] J. D. Barrow. The world within the world. Clarendon Press, Oxford, 1988.
[2] E. A. Bender. An introduction to mathematical modelling. J. Wiley & Sons, New York, 1978.
[3] M. P. Boer. The dynamics of tri-trophic food chains. PhD thesis, Vrije Universiteit, Amsterdam, 2000.
[4] E. J. Doedel, A. R. Champneys, T. F. Fairgrieve, Y. A. Kuznetsov, B. Sandstede, and X. Wang. Auto 97: continuation and bifurcation software for ordinary differential equations. Technical report, Concordia University, Montreal, Canada, 1997.
[5] L. Edelstein-Keshet. Mathematical models in biology. Random House, 1988.
[6] G. H. Edwards and D. E. Penney. Calculus with analytic geometry, early transcendentals. Prentice Hall, New York, 1997.
[7] L. C. Epstein. Thinking physics is gedanken physics. Insight Press, San Francisco, 1985. An excellent example of scientific thinking that leads to the formulation of mechanistically inspired assumptions.
[8] E. Glover and K. Mitchell. An introduction to biostatistics. McGraw-Hill, Boston, 2002. Primary introduction to classical applied statistics.
[9] H. B. Griffiths and A. Oldknow. Mathematics of models; continuous and discrete dynamical systems. Ellis Horwood Ltd, Chichester, 1993.
[10] R. Hilborn and M. Mangel. The ecological detective; confronting models with data. Monographs in Population Biology. Princeton University Press, Princeton, New Jersey, 1997. One of the few applied books with an emphasis on parameter estimation.
[11] R. V. Hogg and A. T. Craig. Introduction to mathematical statistics. MacMillan Publ., New York, 1989. A good non-technical intro to statistical concepts.
[12] N. G. van Kampen. Stochastic processes in physics and chemistry. North-Holland, Amsterdam, 1981. Stochastic modeling with emphasis on transport phenomena.
[13] A. I. Khibnik, Yu. A. Kuznetsov, V. V. Levitin, and E. V. Nikolaev. Continuation techniques and interactive software for bifurcation analysis of ODEs and iterated maps. Physica D, 62:360-371, 1993.
[14] H. A. Klein. The science of measurement; a historical survey. Dover Publ. Inc., New York, 1988. Gives backgrounds of the art of measurement of physical and chemical quantities.
[15] Yu. A. Kuznetsov. Elements of applied bifurcation theory, volume 112 of Applied Mathematical Sciences. Springer-Verlag, Berlin, 1995.
[16] H. A. Lauwerier. Modellen met de microcomputer. Epsilon Uitgaven, Utrecht, 1989.
[17] D. C. Lay. Linear algebra and its applications. Addison-Wesley Publishing Comp., Reading, 2000.
[18] J. D. Murray. Mathematical biology. Springer, 1989.
[19] J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308-313, 1965.
[20] W. M. Newman and R. F. Sproull. Principles of interactive computer graphics. McGraw-Hill, Auckland, 1981.
[21] H. E. Nusse and J. A. Yorke. Dynamics: numerical explorations. Springer-Verlag, New York, 1994.
[22] Y. Pawitan. In all likelihood: statistical modelling and inference using likelihood. Oxford Science Publications, 2001. Rather technical but complete book.
[23] D. Peak and M. Frame. Chaos under control. W. H. Freeman & Comp., New York, 1994.
[24] D. Ruelle. Elements of differentiable dynamics and bifurcation theory. Academic Press, Boston, 1989.
[25] L. Schumaker. Spline functions; basic theory. John Wiley & Sons, New York, 1981.
[26] L. A. Segel. Modeling dynamic phenomena in molecular and cellular biology. Cambridge University Press, 1984.
[27] H. M. Taylor and S. Karlin. An introduction to stochastic modeling. Academic Press, London, 1984.
[28] J. R. Taylor. An introduction to error analysis; the study of uncertainties in physical measurements. Oxford University Press, 1982.
Index
accuracy, 51
antiderivative, 26
approximation, 24
Arrhenius, 6
asymptote, 21
attractor
cyclic, 45
point, 45
strange, 45
base, 20
bias, 57
bifurcation, 45
biology
experimental, 3
theoretical, 3
chaos, 45
circle, 28
coefficient
correlation, 34
variation, 34
coherence, 7
complement, 17
conjugate, 19
consistency, 3
constraint, 25
continuation, 46
contour, see isocline
convergence, 22
cosecant, 20
cosine, 20, 33
cotangent, 20
covariance, 34
curve
Bézier, 54
definite
negative, 32
positive, 32
definition
implicit, 20
recursive, 22
degree, 19
dependent
linearly, 30
stochastically, 34
derivative
directional, 24
partial, 24
second, 23
design
experimental, 10
determinant, 25, 27, 29
deviance, 58
deviation
standard, 34
diagram
bifurcation, 47
operating, 47
Venn, 17
differentiation
implicit, 22
implicit partial, 24
numerical, 38
dimension, 5
disjoint, 17
distribution
Bernoulli, 35
beta, 36
binomial, 35
Chi-square, 36
exponential, 36
frequency, 33
gamma, 36
geometric, 35
homogeneous, 36
marginal, 37
multinomial, 35
negative binomial, 35
normal, 36
Poisson, 35
probability, 33
truncated, 37
divergence, 22
domain, 18
element, 28
ellipse, 28
empirical cycle, 1
equation
differential, 43
stiff differential, 38
error
absolute, 51
random, 51
relative, 52
systematic, 51
type 1, 55
type 2, 55
estimate
maximum likelihood, 56
estimator
maximum likelihood, 56
expectation, 34
factorial, 18
falsification, 3
feedback, 45
figure
significant, 51
form
bilinear, 32
normal, 46
quadratic, 32
function, 19
algebraic, 20
allometric, 6
beta, 26
composition, 20
continuous, 21
correlation, 41
differentiable, 22
distribution, 34
even, 19
exponential, 26
gamma, 26
homogeneous, 19
inner, 20
inverse, 20
likelihood, 56
logarithmic, 26
monotonous, 19
odd, 19
outer, 20
polynomial, 19
probability density, 33
rational, 19
Riemann zeta, 22
spline, 53
survivor, 34
transcendental, 20
generator
random, 63
gradient, 24
hyperbola, 28
hypothesis
alternative, 55
null, 55
idempotent, 29
image, 18
input, 43
integer, 19
integral
convolution, 37
integrand, 26
integration
definite, 26
indefinite, 26
numerical, 38
partial, 27
intensity, 43
interpolation, 53
intersection, 17
interval, 18
closed, 18
condence, 58
half-open, 18
open, 18
unbounded, 18
isocline, 28
iteration
Newton-Raphson, 39
Jacobian, 27
knot, 53
level
significance, 55
limit, 21
left-hand, 21
right-hand, 21
line
tangent, 24
vertical tangent, 21
logic
deductive, 14
inductive, 14
predicate, 15
propositional, 14
map, 41
mapping, see operator
Markov
chain, 42
process, 42
matrix, 28
diagonal, 29
generalized inverse, 31
Hessian, 24
identity, 29
inverse, 30
Jacobian, 27
nonsingular, 30
rotation, 33
singular, 30
size, 28
square, 29
symmetric, 29
triangular, 29
zero, 29
maximum
global, 25
local, 25
mean, 34
measurement, 49
scale, 49
unit, 50
memory, 45
minimum
global, 25
local, 25
mixture, 37
model
regression, 61
module, 9
moment, 34
central, 34
multiplier
Floquet, 46
Lagrange, 25
norm
of vector, 33
number
complex, 19
rational, 19
real, 19
operator, 18
bijective, 18
Boolean, 17
injective, 18
surjective, 18
Order, 22
order, 22
orthogonal, 33
orthonormal, 33
output, 44
over, 18
parabola
tangent, 24
parameter, 19
unidentifiable, 62
part
imaginary, 19
real, 19
partition, 17
Pascal, 8
plane
tangent, 25
point
control, 54
critical, 25
floating, 51
inflection, 23
saddle, 25
precision, 51
probability
conditional, 37
process
birth and death, 42
branching, 42
continuous time, 41
counting, 43
deterministic, 41
discrete time, 41
ergodic, 43
interval, 43
Markov, 42
orderly, 43
point, 43
renewal, 43
stochastic, 41
Yule, 43
product
direct, 28
inner, 33
matrix, 28
outer, 33
range, 19
rank, 30
full, 30
rate
hazard, 43
root, 27
rule
chain, 24
de Morgan, 18
l'Hôpital, 23
sample, 33
scalar, 19
scale
measurement, 49
organization, 8
secant, 20
seed, 63
sequence, 22
Fibonacci, 22
series, 22
binomial, 22
geometric, 22
harmonic, 22
Maclaurin, 23
Taylor, 23
set, 17
simplex, 40
sine, 20
sphere, 28
stability
global, 46
local, 46
neutral, 46
state
steady, 41
transient, 41
subset, 17
support, 11
system
non-homogeneous linear, 31
tangent, 20
term, 22
test
power, 55
testability, 10
theorem
algebra, 27
Buckingham, 7
central limit, 36, 43
Leibniz, 27
Neyman-Pearson, 55
Pythagoras, 33
theory
auxiliary, 1
trace, 29
transformation, see operator, 49
transposition, 29
union, 17
usefulness, 3
value
absolute, 19
average, 26
boundary, 44
eigen, 31
initial, 44
maximum, 25
minimum, 25
variable
continuous random, 33
dependent, 19
discrete random, 33
extensive, 7
independent, 19
intensive, 7
intermediate, 24
random, 33
state, 43
variance, 34
vector, 29
eigen, 32
length, 33
verification, 3
