Book GWC10
1 Introduction 11
1.1 The Nature of Software Systems Today . . . . . . . . . . . . . . 12
1.2 Enabling Technology . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3 Formal Modeling as an Engineering Enterprise . . . . . . . . . . 15
1.4 A Guide to Using This Book . . . . . . . . . . . . . . . . . . . . 16
I Foundations 19
2 Formal Models 23
2.1 Models in Engineering . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Choosing the Right Models . . . . . . . . . . . . . . . . . . . . . 24
2.3 Models for Software Engineers . . . . . . . . . . . . . . . . . . . 25
2.4 Formal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3 Formal Systems 33
3.1 Formal Languages . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3 Inference Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Proofs and Theorems . . . . . . . . . . . . . . . . . . . . . . . . 39
3.5 Derivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4 Propositional Logic 49
4.1 Propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.3 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.4 Propositional Calculus . . . . . . . . . . . . . . . . . . . . . . . 54
4.4.1 Conjunction . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.4.2 Implication . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4.3 Bi-implication . . . . . . . . . . . . . . . . . . . . . . . 56
4.4.4 Disjunction . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4.5 Negation . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.5 Derived Inference Rules . . . . . . . . . . . . . . . . . . . . . . 61
4.6 Soundness and Completeness . . . . . . . . . . . . . . . . . . . . 63
4.7 Translating English into Propositional Logic . . . . . . . . . . . . 64
4.7.1 Choosing Atomic Propositions . . . . . . . . . . . . . . . 64
4.7.2 Connectives . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7.3 Example: More Traffic Lights . . . . . . . . . . . . . . . 69
4.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5 Predicate Logic 75
5.1 Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.2 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.3 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.4 Predicate Calculus . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4.1 The Universal Quantifier . . . . . . . . . . . . . . . . . . 82
5.4.2 The Existential Quantifier . . . . . . . . . . . . . . . . . 84
5.5 Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.6 Derived Inference Rules . . . . . . . . . . . . . . . . . . . . . . 87
5.7 Soundness and Incompleteness . . . . . . . . . . . . . . . . . . . 87
5.8 Translating English into Logic . . . . . . . . . . . . . . . . . . . 88
5.8.1 Propositions Versus Predicates . . . . . . . . . . . . . . . 88
5.8.2 Quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.8.3 Beyond Predicate Logic . . . . . . . . . . . . . . . . . . 93
5.9 Fathers and Sons: A Formal Riddle System . . . . . . . . . . . . 93
5.9.1 Fathers and Sons . . . . . . . . . . . . . . . . . . . . . . 93
5.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
several years was the three-part approach that we present in this book. We start
with the subset of basic mathematics that is most centrally relevant to software
modeling. Next we introduce the concepts needed to relate mathematics to com-
putation: state machines, traces, invariants, and so on. Finally, we provide a set of
examples of specific notations to show how the abstract modeling concepts can be
realized in a concrete, scalable way, and how tools can be used to make tractable
the job of specifying and analyzing models.
Initially when we taught this material we relied on standard mathematical texts
for the first part and a variety of method-specific texts for the third part, bridging
the gap in Part 2 with handouts. At some point, however, we realized that there
was some value in putting it all together in a smallish book that could be used as a
springboard for anyone interested in gaining an appreciation of formal modeling
for practical software development. Moreover, we hoped that the foundations and
examples covered by the book would be useful to others in the formal modeling community who might be interested in contributing additional modules to Part 3, essentially making this text an extensible and continuously evolving shared asset.
This book and the course that motivated its creation owe much to past and present members of the software engineering community at Carnegie
Mellon University. In particular we would like to thank Daniel Jackson and Jim
Tomayko, who served on the MSE curriculum committee that positioned Mod-
els of Software Systems in a central place in the software engineering curricu-
lum; Masters students who took the course and made many good suggestions for
improvement; and numerous teaching assistants who contributed to the body of
exercises. In addition we would like to thank Daniel Kroening for his early contributions to this book and for helping us formulate its overall plan and approach. We also thank Sungwon Kang and Paul Strooper, who provided detailed comments on early drafts; Microsoft Corporation, for supporting the production of this book through an educational development grant; and various funding agencies (the National Science Foundation, DARPA, NASA, and others) for supporting our research in formal methods.
Chapter 1
Introduction
our current systems of banking, medical care, commerce, national security, enter-
tainment, communication, transportation, and energy would be unthinkable. Such
software-based systems must continue to run, even in the presence of flaws. They
must serve millions of people on a daily basis. They must provide high degrees of
security and privacy for their users.
To handle the increasingly demanding requirements of modern software-based
systems the complexity of the underlying software has also risen dramatically.
Many types of systems that once were differentiated primarily by their hardware are now commercially competitive by virtue of value-added features provided by
software. As a result, the amount of software resident in appliances, cars, tele-
phones, televisions, and so on, is increasing dramatically.
But in addition to increased size and functionality of software, complexity due
to the context in which systems must operate has also risen dramatically. Currently
most systems must function in a distributed setting, communicating with other
systems over networks or other communication channels. They must interoperate
with other systems, often unknown at the time of their creation. They must be
engineered in ways that permit incremental introduction of new capabilities. They
must be built by development teams spread across the globe.
In short, the day of the simple stand-alone application, created, distributed, and
maintained by a single co-located organization, is over. Software engineers must
now create systems that exhibit high degrees of availability, reliability, security,
and interoperability. Maintaining intellectual control over such systems becomes
a daunting task.
In that context techniques for managing complexity and for ensuring critical
system properties during design become not just a luxury, but a necessity. Formal
models by their very nature can play a significant role in that regard. Through
abstraction they allow software engineers to focus on the critical issues facing
them. Through precision they provide a way to document intended functions and
properties of a system to be built. Through refinement they provide ways to help
guarantee that implementations respect design principles and properties. Through
logical foundations they support the ability to perform analyses to determine con-
sequences of design and implementation choices.
In the early days of formal methods, reasoning about a formal model was largely
a manual exercise. Establishing properties of a model, or relating two models,
usually boiled down to proving a set of theorems. And while several powerful
automated theorem provers and semi-automatic proof assistants were developed
in research communities, their use required considerable mathematical expertise
and patience to develop all of the underlying theories and lemmas required to
demonstrate even the most rudimentary properties of a formal model.
Over the past decade, however, numerous automated tools have made the job
of analyzing formal models considerably more tractable and accessible to prac-
ticing software engineers. One important class of tools is model checkers. A
model checker takes a formal model and a property to check about it. The checker
then explores all computational paths of the model, and either certifies that the
property holds over all of those paths, or provides a counterexample that shows a
computational sequence that leads to the violation of the property.
Model checkers found initial success in the hardware design domain, where
their use is now de rigueur. But over the past decade considerable progress has
also been made in applying these tools to software-based systems. While there re-
main obstacles to their use on very complex software systems, for many restricted
domains, or for sufficiently simple models of a system, they can be remarkably
effective in increasing the engineering payoff of formal modeling by providing
ways to explore properties of those models.
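To make the idea concrete, here is a minimal sketch of an explicit-state checker for invariant properties. It is our own illustration, not a description of any particular tool: the model here is a hypothetical wrap-around counter, and all names are ours. The checker explores every reachable state and either certifies the property or returns a counterexample path.

```python
from collections import deque

def check_invariant(initial, successors, invariant):
    """Breadth-first exploration of all reachable states of a model.
    Returns None if `invariant` holds on every reachable state, or a
    counterexample: the computational sequence leading to a violation."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:      # walk back to the initial state
                path.append(state)
                state = parent[state]
            return list(reversed(path))
        for nxt in successors(state):
            if nxt not in parent:         # visit each state only once
                parent[nxt] = state
                queue.append(nxt)
    return None  # the property holds over all reachable states

# Toy model: a counter with five states (0..4) that wraps around.
succ = lambda s: [(s + 1) % 5]
print(check_invariant(0, succ, lambda s: s < 4))  # → [0, 1, 2, 3, 4]
print(check_invariant(0, succ, lambda s: s < 5))  # → None
```

Real model checkers add far more machinery (symbolic state representations, temporal-logic properties, partial-order reductions), but the certify-or-counterexample contract is exactly this one.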
Other tools for static analysis that take advantage of formal specifications have
drastically improved our ability to eliminate certain classes of errors from our
systems. For example, by embedding certain annotations in code, one can use tools to check for the absence of race conditions in concurrent code, or for the absence of buffer overruns in languages that would otherwise permit them.
Moreover, the state of automated theorem provers and proof assistants has also
improved considerably since their early days. A number of fully automated the-
orem provers for specialized theories have been developed. They take advantage
of advances in optimization and machine learning techniques for proof search.
Powerful interactive proof assistants incorporate automated theorem provers and
link to other external tools (such as SAT solvers and model checkers) to improve
performance. Theorem provers have already been used to prove impressive math-
ematical theorems, some never proved before. Some large industrial verifications
have also been carried out.
1.3 Formal Modeling as an Engineering Enterprise
one kind of model may be ideal for analyzing protocols of communication, while
another might be better suited to understanding the relationships between data.
Other models may be better for understanding the performance or the reliability
of a system.
In the succeeding chapters we will be considering a variety of modeling lan-
guages and their associated methods. As we describe each, we will be looking for
a clear understanding of the contexts in which they are appropriate and the kind
of problems that they can best solve. The goal is to empower the engineer with
the ability to select a formal modeling approach that will best solve a problem at
hand.
The third principle is the use of tool-assisted reasoning. As we noted earlier,
one of the enablers of formal methods has been the remarkable improvement in
tools for reasoning about formal models. We believe that such tools should be
leveraged as appropriate. Of course, as with the choice of modeling approach,
the strengths and weaknesses of a given tool must be evaluated with respect to its
engineering benefits to the project.
In this book, as we introduce various modeling approaches we will also at-
tempt to describe the way that tools can be used to assist in analysis of the models.
While we will not be able to talk about individual tools in depth, we will try to
show what kinds of benefits various tools provide, and give examples of how they
are used.
The fourth principle is that formal models are engineering artifacts them-
selves. That is to say, when we create a formal model we should be concerned
not only with what it tells us about a system, but also the properties of that formal
model that make it a usable engineered artifact. In particular, we need to consider
how easily the models described using a given approach can be incrementally
extended, enhanced, composed with other models, read by software developers,
reused across different development projects, and so on.
As we will see in later chapters, different models take very different approaches
to these engineering concerns. Specifically, the way in which a modeling approach
allows us to compose a model from smaller models becomes a crucial discrimina-
tor when understanding the costs and benefits of that approach.
The first layer provides the foundations on which the rest of the book is based. It presents the mathematical concepts, such
as logics, proofs, theories, sets, functions, relations, etc., on which all modern
modeling approaches build. For some readers this material will be familiar from
a course in discrete mathematics. In that case, a quick skim of that part of the
book may be sufficient to remind the reader of those concepts and to become
familiar with the specific mathematical notation that we use in this book. For other
readers, particularly those who have not been exposed to a course in mathematics
for computer science, this may be a particularly challenging section of the book.
Indeed, the reader may want to refer to one of many textbooks in this area for
additional examples and practice beyond what we can offer in this book.
The second layer presents the concepts that allow us to relate raw mathemat-
ics to computation and to software. We consider topics such as state machines,
invariants, pre- and post-conditions, proving properties about programs, and so
on. The goal here is to introduce these concepts in the simplest possible way,
independently of any specific notation or method. To the extent possible we will
rely on standard mathematical notations, sugared only enough to make the job of
specifying computations more natural.
The third layer consists of a set of modules, each focusing on a particular
notation and method of formal modeling. The goal in this layer is to show how
the general concepts of the second layer can be turned into effective engineering
tools through the specialization of the concepts, and the introduction of special-
purpose notations and tools. For the purposes of the book we will provide a small
number of such modules to illustrate some of the more important parts of the space
of formal modeling approaches. Our hope is that over time the set of such modules
will continue to grow, and that others in the community of formal modeling will
contribute their own modules.
An important cross-cutting theme for the book is the use of exercises to illus-
trate the main points of the text. We strongly encourage the reader to try out these
exercises, as there is no substitute for engaging directly in the process of formal
modeling. Additionally, we use some of the exercises to explore certain themes in
formal modeling that we do not have the space to detail in the main body of the
text.
Finally, each chapter of the book contains a list of additional readings. The
field of formal methods is large, and in this book we can only scratch the surface.
The readings point the way to more in-depth treatment of many of the topics that
we cover, and many that we can only hint at.
Further Reading
[TBD]
Part I
Foundations
Introduction
The starting point for any treatment of formal modeling is mathematics. Essen-
tially all approaches to formal methods are founded on mathematical principles,
and look to mathematics for the underlying mechanisms of reasoning about pre-
cise models.
But what parts of mathematics are needed? The field of mathematics is huge in
its own right, and arguably almost any branch of it might have some applicability
to software systems. Moreover, learning sophisticated mathematical techniques
and ideas could occupy many courses by themselves.
Thankfully, over the past two decades there has emerged an understanding
that, in fact, most of the models needed by a practicing software engineer rely
on a relatively small set of mathematical notions. In many cases these concepts
are taught in specific courses, labeled by names like “Discrete Mathematics” or
“Mathematics for Computer Scientists.”
It is this body of material that we will examine in Part 1. We start by in-
troducing the idea of a formal model, illustrating how mathematical abstraction
can help us reason about interesting systems. As we will see, every formal sys-
tem is constructed from a certain set of basic building blocks that determine what
kinds of models we can express, and what kinds of judgments we can make about
those models. Next we consider the logical apparatus that we will need to reason
formally about mathematical models. This covers much of the standard material
on propositional and predicate logic, with particular emphasis on the ability to
translate between formal and informal models. Finally, we consider the building
blocks for creating models of complex software systems and their behavior: sets,
relations, functions, sequences, and so on.
In presenting the material of Part 1 we will attempt to find a middle ground
between thoroughness and brevity. While we do not expect Part 1 to substitute for
a full course in discrete mathematics, we hope that it will provide a solid overview
of the concepts, and we include pointers for readers who may want to go beyond the material presented here.
Chapter 2
Formal Models
One of the hallmarks of any engineering discipline is effective use of system mod-
els. For example, civil engineers use stress models to determine whether the sup-
porting structure for a bridge will support anticipated traffic loads. Aeronautical
engineers use airflow models to design wing surfaces of jets. Electrical engineers
use heat-transfer models to reason about power consumption of a computing de-
vice.
Models are central to the engineering enterprise because they permit engineers
to reason about a system design or implementation at a level of abstraction at
which the system’s essential properties can be better understood. This capability
in turn supports early exploration of design tradeoffs, often making it possible
to try out various approaches to system design before committing to a particular
solution. These models are often useful in detecting design flaws early in the
system development cycle, when errors are relatively less expensive to fix. In
some cases engineering models can also be used as blueprints for more detailed
design and implementation.
When an engineer decides to use models there are a number of important con-
siderations in choosing what to model and how to model it. These considerations
arise from the fact that engineering resources are limited, and building, analyzing,
and maintaining models takes time and effort. So a central question is how can an
engineer get the maximum value out of the modeling effort?
increasingly lower-level designs. Ideally one would like to make sure that prop-
erties of the more abstract models are preserved by the lower-level ones. (This is
sometimes referred to as a refinement relationship.) So the question arises, how
can we guarantee cross-model properties, and at what cost?
mathematics. The advantage of formality is that such models are (a) precise –
their meaning is unambiguous; (b) formally analyzable – we can use mathematical
reasoning to determine properties of the model; and (c) mechanizable – they can
be processed and analyzed by computer programs.
These three properties are critical in applying formal modeling to real systems.
Without precision we would be unable to satisfy our need to communicate our
ideas unambiguously to others, or to have confidence that we have expressed what
we intend. Without analyzability, models have limited usefulness: in general,
the effort required to produce them would not be commensurate with the benefit
derived. Without tools to process our models, it is difficult to scale them to the
needs of practical systems.
We are also tangentially interested in formal methods. Formal methods pre-
scribe the way in which you can create and reason about formal models. In par-
ticular, they provide guidance on
Usually such methods are tied to specific kinds of models or domains. For exam-
ple, methods for refining models from abstract to more concrete usually depend
on a particular notation or logic.
2.5 An Example
Let us now consider a simple example to illustrate the points we have been mak-
ing. Suppose we are given a description of a game that is to be played as follows:
We start with a large container that contains a number of identical white and black
balls. We are also given a large supply of black balls on the side. To play the game
we repeatedly draw two balls randomly from the container, and then put a single
ball back in the container. The rules for deciding what color ball to put back are
as follows:
1. “If the two selected balls are both black or both white, put a black ball back
into the container.”
[Figure: the jar of balls, with the stock of extra black balls alongside, and the rules of the game.]
2. “If one ball is white and the other black, put a white ball back into the
container.”
3. “If there is only one ball left in the container, stop.”
T(b, w) =
   (b − 1, w)       if two black balls are drawn (Rule 1)
   (b + 1, w − 2)   if two white balls are drawn (Rule 1)
   (b − 1, w)       if one black and one white ball are drawn (Rule 2)
   (b, w)           if only one ball remains (Rule 3)
represent in that model. Figure 2.3 has a summary of the model we choose. The
basic idea is to represent the game as the value of two numeric variables, b and w,
representing (respectively) the number of black and white balls remaining in the
container.
We then prescribe the rules of the game in terms of those variables. We will
do this by defining a transition relation T that says how we change the values of b
and w on each turn. In Figure 2.3 you can see that the three cases are defined to fit
the rules listed above. For example, the first rule models the removal of two black
balls: we decrease b by two when we remove the balls, and then increase it by 1
when we put a black ball back into the container. The final line describes what
happens when there is only one ball left: we leave the configuration the same.1
Thus far we have not done anything more than use a mathematical notation to
express what we already knew. But let us now manipulate the model a little bit.
In this case we will perform simple arithmetic on T to get the simplified rules of
Figure 2.4.
Does the model now suggest something interesting? As a hint, note that the
number of white balls either remains the same or is decreased by two. What does
that imply?
The answer is that the color of the final ball can be predicted if we know something about the original number of white balls in the container. If w is even the result will be black; if odd, then white. To see why this is so, we note that each turn either leaves w unchanged or decreases it by two, so the parity of the number of white balls never changes.
1 The observant reader might notice that our model does not actually “stop,” but we won’t worry about this detail for now.
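The parity argument can also be tested by brute force. The following sketch (our own; the encoding and function names are illustrative) simulates the game directly and checks that the outcome depends only on whether w starts even or odd, no matter how the random draws fall:

```python
import random

def play(b, w, rng):
    """Play the game from b black and w white balls; return the final color."""
    while b + w > 1:
        drawn = rng.sample(['B'] * b + ['W'] * w, 2)  # draw two balls at random
        b -= drawn.count('B')
        w -= drawn.count('W')
        if drawn[0] == drawn[1]:
            b += 1          # Rule 1: two balls of the same color -> put a black back
        else:
            w += 1          # Rule 2: one of each color -> put a white back
    return 'black' if b == 1 else 'white'

# The parity of w predicts the outcome regardless of the random draws.
rng = random.Random(0)
for b0 in range(4):
    for w0 in range(4):
        if b0 + w0 == 0:
            continue
        for _ in range(25):
            assert play(b0, w0, rng) == ('white' if w0 % 2 else 'black')
```

Of course, a simulation only samples possible runs; the parity invariant above is what guarantees the result for every run.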
Chapter Notes
The example of the black and white balls was adapted from the book The Specifi-
cation of Complex Systems by B. Cohen, W.T. Harwood, and M.I. Jackson [2].
Further Reading
[TBD]
2.6 Exercises
1. Consider the game described in this chapter. Suppose that the container
starts out with N balls.
(a) List five aspects of the real world that were not represented in our
formal model.
(b) How many “turns” will it take for the game to stop?
(c) What is the largest number of extra black balls needed, and what con-
figuration of the container causes this number to be required? Assume
that when two black balls are taken out of the container one is put back
into the container and the other into the stock of extra balls.
(d) Argue formally that the game stops.
2. Consider a simple version of the game of Nim (as presented in [1]) in which
two players alternate removing one or two toothpicks from a pile of N tooth-
picks. The game stops when there are no toothpicks left in the pile; the
player who removes the last toothpick loses the game.
(a) Describe a formal model of the game by (a) specifying the state model,
and (b) giving a transition relation that describes the valid moves of the
game in terms of changes to the state.
(b) What aspects of the game are represented in the model? What aspects
are left out? Would the model be different for more than two players?
(c) Argue that the game eventually stops.
(d) When N > 1 and (N mod 3) ≠ 1 the first player has a winning strategy.
What is that strategy?
(e) Extend your formal model to encode the winning strategy for the first
player. That is to say, the rules should automatically lead to a win for
the first player. What new aspects do you need to represent?
Chapter 3
Formal Systems
The goal of this book is to help you understand how to model complex software
systems. There are many ways in which we might describe these systems. Here
we will be particularly interested in approaches that have a mathematical basis,
and hence allow us to be precise about what we want to model, as well as to reason
about properties of the models. That is, we are interested in formal models.
But there are many kinds of formal models. For example, some are based on
systems of equations. Others on mathematical logic. Others on computational
rules. Each of these may have different notational structure and specific ways of
reasoning about it. How are we to make sense of this complex space of possibili-
ties?
Thankfully, all types of formal models have a similar underlying form. First,
they define a language for describing a certain class of models. And second,
they provide a set of inference rules for manipulating models of that type and
for proving results about them. As we will see in this chapter, the combination
of these two things — a formal notation and a set of inference rules — defines a
formal system. In addition, we will also be interested in assigning a meaning to the
models in a formal system by relating them to some domain of interest, or defining
their semantics. Finally, there may exist a body of useful results associated with
a type of model. These will make our life a lot easier by giving us a rich starting
point for working with that type of model.
The second rule, number, expresses the idea that a simple number is a se-
quence of digits. To express this idea the rule uses the choice operator: a number
is either a single digit, or a number followed by a single digit. Note that the rule
is recursive. That is, it uses the rule’s name (number) as part of its own definition.
Hence, numbers of arbitrary length can be built up by applying the rule multiple
times.
The third rule describes a digit as any of the ten decimal digits of the alphabet.
Such rules determine whether a given sequence of symbols in the alphabet of
the language is a well-formed formula, or wff (pronounced “woof”). A sequence
of symbols is a wff of a grammar rule in the language if and only if its structure
conforms to that grammar rule (and any other rules on which that rule depends).
Normally we will designate one of the rules as the “topmost” rule for the grammar
– typically the first rule – and say that a sequence w is a wff of the language if it conforms
to that rule. Determining whether a given sequence of symbols is a wff of the
language is often called parsing.
For example we can see that for Example 3.1, as expected, 42 and 3.14 are
wffs. The first is a wff by virtue of the first branch of the decimal number rule.
The second is a wff by virtue of the second branch of it. But 45. and .22 are not
wffs, since according to the first rule any decimal number with a decimal point
must have a simple number on either side of it.
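Parsing a grammar this small takes only a few lines. The sketch below (our own; the function names are ours, and the grammar is the one from Example 3.1) decides whether a string is a wff of Decimals:

```python
def is_number(s):
    # number ::= digit | number digit  — i.e., one or more decimal digits
    return len(s) > 0 and all(c in "0123456789" for c in s)

def is_wff(s):
    # decimalNumber ::= number | number "." number
    if is_number(s):
        return True
    left, dot, right = s.partition(".")
    return dot == "." and is_number(left) and is_number(right)

print([is_wff(s) for s in ["42", "3.14", "45.", ".22"]])  # → [True, True, False, False]
```

Note that the recursion in the number rule shows up here simply as “one or more digits”; a parser for a richer grammar would mirror the rule structure more directly.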
Example 3.2. Consider a language StarDiamond with alphabet {⋄, ∗} and the following grammar:
• ∗∗∗
•
• ∗∗
• ∗∗
• ∗∗
Example 3.3. As another example, consider the language Smileys with alphabet
{ : , ; , - , ) , ( }, and the following grammar:
3.2 Semantics
It is important to note that wffs in a formal language are merely strings of symbols: there is no intrinsic meaning associated with them. This is particularly clear with a language like StarDiamond. For example, we might choose to interpret “∗” as a 2, “⋄” as a 3, and juxtaposition (following one symbol by another) as addition. In that case, the wff ∗∗⋄⋄⋄ would represent the number 13. On the other hand, we might just as easily interpret “∗” as a 10, “⋄” as a 5, and juxtaposition as multiplication. In that case the same wff would denote the number 12500.
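The point that meaning is assigned rather than intrinsic can be made executable. In this sketch (the encoding is ours: "*" stands for a star, "d" for a diamond) the same wff of two stars followed by three diamonds receives two different values under the two interpretations:

```python
from functools import reduce

def interpret(wff, star, diamond, combine):
    """Map each symbol of a StarDiamond wff to a value and fold the values
    together with `combine`, which interprets juxtaposition."""
    values = [star if c == "*" else diamond for c in wff]
    return reduce(combine, values)

wff = "**ddd"  # two stars followed by three diamonds

# Interpretation 1: star = 2, diamond = 3, juxtaposition = addition
print(interpret(wff, 2, 3, lambda a, b: a + b))    # → 13

# Interpretation 2: star = 10, diamond = 5, juxtaposition = multiplication
print(interpret(wff, 10, 5, lambda a, b: a * b))   # → 12500
```

The string is unchanged; only the interpretation function differs.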
Hence in order for a language to be useful we will need to explicitly assign
meanings to the wffs in that language. To do this we will need to pick a domain
of interest and rules that tell us how each wff in the language is mapped to some
value in that domain. Such an assignment of meanings is called an interpretation
of the language. Providing an interpretation is often called giving the language a
semantics.
In many cases there will be a natural interpretation for a language. For ex-
ample, it would be surprising if the domain of interest for the language Decimals
were not the decimal numbers, in which the symbol 1 denotes the number one, 2
the number two, etc. Similarly in the language of set theory (Chapter 6), the wff
{1, 2} would naturally be interpreted as the set containing the numbers one and
two.
In other cases we will need to be explicit. For example, in the language of
propositional logic (Chapter 4), to interpret a wff like p ∧ q ⇒ r, we will need to
be clear about the interpretation of its constituent symbols p, q, and r.
There are many ways that one might go about defining the semantics of a
language. Indeed, a study of ways in which one can do this formally is itself an
important subfield of computer science. However, a detailed examination of this
topic is beyond the scope of this book. For now we will use relatively informal
3.3 Inference Systems
Wffs in this language consist of three strings of stars separated by a diamond and a circle. Examples of wffs in this language include
• ∗ ⋄ ∗ ◦ ∗∗
• ∗ ⋄ ∗∗ ◦ ∗∗∗
• ∗ ⋄ ∗∗ ◦ ∗∗∗∗∗.
The inference system for Stars consists of the axiom
Axiom A: ∗ ⋄ ∗ ◦ ∗∗
together with a rule R, which says: given any wff, we may add one star to the string of stars after the diamond and one star to the string after the circle.
Using this inference system we can apply rule R to the wff ∗ ⋄ ∗∗ ◦ ∗∗∗ to get ∗ ⋄ ∗∗∗ ◦ ∗∗∗∗ as an immediate consequence, where the m, n, and r in R are filled by ∗, ∗∗, and ∗∗∗, respectively.
In the example above, we characterized the inference rule informally. To elim-
inate ambiguity in rule definitions, we will need a way to specify rules more pre-
cisely. To do this we will use the following general form:
existing patterns
—————————————  rule name
consequence pattern
Above the line is a list of the wff structures required to apply the rule. Below
the line is the resulting structure derived from the existing ones. The rule name
appears on the right. (In some cases we will also include an extra condition of ap-
plicability, called a side condition – we will see examples of this in later chapters.)
To describe wff patterns we use schema variables. These are variables that
can be instantiated with any wff of the appropriate grammar rule.
Example 3.5. The inference rule from Example 3.4 would be written formally as

    m ⋄ n ◦ r
    ─────────────  R
    m ⋄ n∗ ◦ r∗

In this rule m, n, and r are schema variables representing arbitrary sequences of
stars.
Although Stars stands on its own as a purely syntactic system, we can also
associate a semantics with it.
Example 3.6. Consider the following interpretation of Stars:

    ∗    → 1
    ∗∗   → 2
    ∗∗∗  → 3
    etc.
    ⋄    → +
    ◦    → =

That is, a string of N stars denotes the number N. For example, a wff of the
form ∗. . .∗ ⋄ ∗. . .∗ ◦ ∗. . .∗, containing strings of stars of length m, n, and r,
represents a statement of the form m + n = r.
When interpreted in this way, we can evaluate whether a given wff in Stars is
true or false. For instance, the wff ∗⋄∗∗◦∗∗∗, denoting 1 + 2 = 3, would be true,
while the wff ∗∗⋄∗∗◦∗∗∗, denoting 2 + 2 = 3, would be false.
Notice that the inference system of Stars makes sense according to this in-
terpretation. The interpretation of the axiom ∗⋄∗◦∗∗ is true since 1 + 1 = 2.
And the inference rule says that if we know m + n = r, then we can conclude
m + (n + 1) = r + 1.
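Readers who want to experiment can mimic Stars mechanically. The following Python sketch (our illustration, not part of the formal system itself) represents a wff m ⋄ n ◦ r by its triple of star counts and applies rule R to generate theorems from the axiom:

```python
# A wff m⋄n◦r is represented by the triple (m, n, r) of star-string lengths.
AXIOM_A = (1, 1, 2)  # *⋄*◦** , i.e. 1 + 1 = 2

def rule_R(wff):
    """Append one star after the diamond and one star after the circle."""
    m, n, r = wff
    return (m, n + 1, r + 1)

def show(wff):
    """Render a triple back into the concrete Stars syntax."""
    m, n, r = wff
    return "*" * m + "⋄" + "*" * n + "◦" + "*" * r

# Generate the first few theorems by repeatedly applying R to the axiom.
theorems = [AXIOM_A]
for _ in range(3):
    theorems.append(rule_R(theorems[-1]))

for t in theorems:
    print(show(t))
```

Running the sketch prints the axiom followed by its first three consequences, exactly the wffs a hand proof would produce.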
When we want to be precise about the line of reasoning that we are using, we
will format proofs as a numbered sequence of wffs, together with an indication of
the justification for writing each wff down. The justification consists of the name
of the axiom, or the name of the rule and the lines on which it depends.
Example 3.8. Using this format, a proof of ∗⋄∗∗∗◦∗∗∗∗ in Stars would be written
as

    1. ∗⋄∗◦∗∗          axiom A
    2. ∗⋄∗∗◦∗∗∗        R, 1
    3. ∗⋄∗∗∗◦∗∗∗∗      R, 2
To make it easier to see which lines a rule application should reference, we will
often write our inference rules using labels for the wffs above the line, and refer
to those labels in the justification when the rule is applied.
Example 3.9. Using labeling, the inference rule for Stars would be represented as

    a. m ⋄ n ◦ r
       m ⋄ n∗ ◦ r∗    R, a
The collection of all theorems for a formal system F is called the theory of
F. For example, for the formal system of sets (Chapter 6) the set of theorems
is called “set theory.” Sometimes a formal system is given the name calculus.
For example, Chapter 4 discusses the Propositional Calculus and Chapter 5 the
Predicate Calculus.
The example system, Stars, is extremely simple, and it is relatively easy to
determine whether a given wff is in its theory (i.e., can be proved), and if so, how
to prove it. More typically, however, the formal systems that we will be using
in this book will have several axioms and many inference rules. As we will see
later, deciding whether a given wff is a theorem or not in such a system becomes
a non-trivial task, often involving ingenuity and creativity.
It is important to emphasize again that when we prove things in a formal sys-
tem we are manipulating wffs in purely syntactic ways, appealing only to their
linguistic structure and not to any interpretation of them.
However, it is interesting to see what happens when we give our formal system
an interpretation in a domain in which each wff denotes a statement that is either
true or false. In that case it is reasonable to ask whether all theorems that we can
prove in the formal system are true. Conversely, we might wonder whether every
true statement in the domain of interpretation has some proof. If the first condition
holds we say that the formal system is sound with respect to that interpretation. If
the second condition holds we say that the formal system is complete with respect
to the interpretation.
Note that in order for a system to be sound the axioms must be true when
interpreted in the semantic domain, since axioms are themselves trivially theorems
in the formal system. Furthermore, if a set of wffs is true, then any immediate
consequence of those wffs should also be true.
For example, under the interpretation given earlier for Stars the formal system
that we defined is sound. As we noted earlier, the axiom represents a true state-
ment, and the consequences of the inference rule will be true if the wff that it is
applied to is also true.
How about statements like 2 + 1 = 3? These are true in the semantic domain,
and we might hope that there would be proofs of them in the formal system. Can
we prove them? A little thought indicates that we can’t. Since the axiom permits
only a single star in the first place, and the inference rule never increases the
number of stars in that place, there is no way that we can produce as a theorem a
wff containing more than one star in the first place. Hence Stars is not complete
for the interpretation that we gave it.
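This incompleteness argument can be confirmed by brute force. The sketch below (our own illustration) enumerates every wff Stars can reach from the axiom in ten rule applications; each reachable wff denotes a true equation (soundness, on this sample), yet every one keeps a single star in the first place, so a wff like the one for 2 + 1 = 3 is never produced:

```python
# Wffs m⋄n◦r are triples (m, n, r) of star counts, as in the interpretation above.
def rule_R(wff):
    m, n, r = wff
    return (m, n + 1, r + 1)

reachable = [(1, 1, 2)]          # axiom A: 1 + 1 = 2
for _ in range(10):              # ten successive rule applications
    reachable.append(rule_R(reachable[-1]))

# Soundness on this sample: every reachable wff denotes a true equation.
assert all(m + n == r for (m, n, r) in reachable)

# Incompleteness: the first star string never grows beyond one star,
# so the true statement 2 + 1 = 3, i.e. (2, 1, 3), is unreachable.
assert all(m == 1 for (m, n, r) in reachable)
assert (2, 1, 3) not in reachable
```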
As we will see in later chapters, this situation is often the case: most formal
systems of interest are sound, but not complete. It turns out that incompleteness
is a consequence of the fundamental nature of formal systems. Moreover, from a
practical perspective, lack of completeness is entirely reasonable. Naturally, we
want it to be the case that all theorems in our formal system are true. Otherwise
there would be little point in proving them. But often in order to simplify our
reasoning and reduce the cost of developing models, our formal systems will be
partial. That is, they will attempt to express only some aspects of a semantic
domain of interest. Hence, by choice we will be leaving out many of the details
of a formal system that would be needed to prove a broader class of theorems.
For some formal systems it is possible to formally prove that the system is
sound and/or complete. Such theorems are instances of what logicians refer to as
meta-theorems. This is because they are theorems about the theorems of a formal
system – they tell us something about the kinds of things that we can prove, and
the ways we can prove them, in that system.
3.5 Derivations
When we prove a theorem, we are arguing from “first principles” – namely the
axioms of the formal system. In many cases, what we would like to do, however,
is to reason about some consequence under a set of assumptions.
As a simple example, consider the problem of modeling a certain kind of med-
ical patient monitoring device. It might be possible to characterize such devices
by building up from a set of axioms about the fundamental nature of these devices –
the physics of the sensors, the electrical properties of the circuits and processor,
etc. Such a set of axioms would likely be quite complex and require considerable
effort to develop.
Another way, however, is to make a set of assumptions about these devices,
such as that certain primitive actions will have certain effects, that sensors have
certain behavioral characteristics, etc. We would then like to reason about the
properties of the device under the condition that those assumptions hold. (Of
course, if we are wrong about our assumptions, our conclusions will have little
value.)
To enable such an approach, we augment our notion of proof to allow the intro-
duction of a set of assumptions. These assumptions can be treated as if they were
additional axioms of the formal system, introduced temporarily for the purpose of
a particular proof.
A derivation of a wff W in formal system F from a set P of wffs, called
premises, is a sequence of wffs in the language of F, in which W is the last wff,
and where each wff in the sequence is either
• an axiom of F; or
• a premise in P; or
• an immediate consequence of the previous wffs using one of the inference
rules of F.
We say that W is derived from P, and write P ⊢ W. When formatting the proof,
we indicate the use of a premise by noting that fact in the justification column. We
use the shorthand P ⊣⊢ W to indicate that both P ⊢ W and W ⊢ P.
Example 3.10. For Stars prove the following: ∗∗⋄∗◦∗∗∗ ⊢ ∗∗⋄∗∗∗◦∗∗∗∗∗
Proof

    1. ∗∗⋄∗◦∗∗∗        premise
    2. ∗∗⋄∗∗◦∗∗∗∗      R, 1
    3. ∗∗⋄∗∗∗◦∗∗∗∗∗    R, 2

Relative to the earlier interpretation of Stars, we have shown that if we assume
that 2 + 1 = 3, then we can prove that 2 + 3 = 5.
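Mechanically, a derivation differs from a proof only in its starting point: we begin from a premise rather than from the axiom. A small Python sketch of this idea (our illustration; a wff m ⋄ n ◦ r is represented by its triple of star counts):

```python
# A wff m⋄n◦r is represented by the triple (m, n, r) of star-string lengths.
def rule_R(wff):
    """Stars rule R: add one star after the diamond and one after the circle."""
    m, n, r = wff
    return (m, n + 1, r + 1)

def derive(premise, steps):
    """Treat the premise as a temporary extra axiom and apply R repeatedly."""
    wff = premise
    for _ in range(steps):
        wff = rule_R(wff)
    return wff

print(derive((2, 1, 3), 2))   # (2, 3, 5): from 2 + 1 = 3 we derive 2 + 3 = 5
```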
The example above illustrates an interesting point: we can prove nonsense if
our premises are not valid. Had we instead started from the premise ∗∗⋄∗◦∗∗
(that is, 2 + 1 = 2), the same two applications of R would have proved
∗∗⋄∗∗∗◦∗∗∗∗ (that is, 2 + 3 = 4). This is to be expected: since our formal
derivation machinery is independent of any interpretation, it can't be expected to
discriminate between premises that are true and those that are false. In fact, there
may well exist some other interpretation for which those premises are true.
One final comment: there is, of course, a close relationship between proofs and
derivations. In particular, every proof is a derivation in which the set of premises is
empty. And, conversely, every derivation can be thought of as a proof in a “richer”
formal system in which the premises have been added to the set of axioms.
Chapter Notes
The Decimals, StarDiamond, and Stars formal systems were adapted from the
book Software Engineering Mathematics by J. Woodcock and M. Loomes [6].
We have shown one possible way of structuring proofs and derivations. Other
ways of structuring derivations exist. For example, in the Gentzen style of proofs [3],
proofs are given in a tree format with the root of the tree being the wff to be proved.
The application of an inference rule generates branching of the tree in the follow-
ing way: the root is matched to the conclusion of the rule and as many branches as
there are antecedents in the rule are created. Each of these branches in turn is “ex-
panded” (by applying suitable inference rules) causing the proof tree to grow. The
details of the rule applications, especially rules that make use of assumptions, are
not important at this point. The proof is considered complete when all the leaves
are premises, theorems of the system, or assumptions introduced by the inference
rules, and all the subproofs are complete.
Further Reading
[TBD]
3.6 Exercises
1. Two formal languages can have the same alphabet but different syntactic
rules. Consider the language of section numbers, whose alphabet is the
2. For the grammar of Example 3.1 it turns out that expressions like 000 and
0.0 are wffs. Write a version of the grammar that does not permit these
kinds of expressions to be wffs.
(a) Extend the inference system for Stars so that wffs like ∗∗⋄∗∗∗◦∗∗∗∗∗
can be proved to be theorems.
(b) Illustrate the power of the extended system by providing a proof for
the following theorems:
i. ⊢ ∗∗⋄∗∗∗◦∗∗∗∗∗
ii. ⊢ ∗∗∗⋄∗∗◦∗∗∗∗∗
5. Smiiileys
We extend the grammar of Example 3.3 so that a smiley’s nose can be arbi-
trarily long:
:) → happy
: n) → happy
:( → sad
: n( → sad
;) → flirty
; n) → flirty
;( → weeping
; n( → weeping
syntax:
(a) (p ∧ q ∧ s ⇒ p) ∧ (q ∧ r ⇒ p) ∧ (p ∧ s ⇒ s)
(b) (p ∧ q ∧ s ⇒ ¬ p) ∧ (q ∧ r ⇒ p) ∧ (p ∧ s ⇒ s)
(c) (p ∧ q ∧ s ⇒ ⊥) ∧ (q ∧ r ⇒ p) ∧ (> ⇒ s)
(d) (p ∧ q ∧ s ⇒ ⊥) ∧ (¬ q ∧ r ⇒ p) ∧ (> ⇒ s)
NatNum = “zero”
| “s”, “(”, NatNum, “)”
| “add”, “(”, NatNum, “,”, NatNum, “)”
Inference System:

Axioms:

    zero

Inference Rules:

    add(zero, x)
    ─────────────  rule1
    x

    add(s(x), y)
    ─────────────  rule2
    add(x, s(y))

add(s(s(s(zero))), s(s(zero)))
Propositional Logic
4.1 Propositions
The starting point for any formal modeling enterprise is the ability to make state-
ments about some domain of interest, and to reason about the truth of those state-
ments.
Propositions are statements that are either true or false (but not both).
Example 4.1. The following are propositions:
We will avoid statements whose truth or falsity depends on the context. For exam-
ple, statements such as “February has 28 days,” or “my brother is younger than
I,” express different propositions depending on whether the February in question
is from a leap year or not, and who makes the second statement. Other sentences
that are not propositions include interrogative, imperative, and exclamatory sen-
tences. Such sentences cannot be said to be true or false.
Example 4.2. The following are not propositions:
• “What a day!”
• “Are there any questions so far?”
• “Today is Sunday.”
• “Pass the salt, please.”
The same proposition can often be expressed in several ways.
Example 4.3. The following express the same proposition:
• “Five is bigger than four.”
• “Four is smaller than five.”
• “4 < 5”
• “5 > 4”
Some propositions have structure. For example, the proposition
“David went to the store and Bill went to the movies.”
can be decomposed into two simpler propositions:
“‘David went to the store’ and ‘Bill went to the movies’.”
Similarly, any two propositions can be combined using words such as “and”, “or”,
“if . . . then . . . ”, etc. to form new propositions.
4.2 Syntax
To create a formal system for statements such as those in the previous section,
we first need to define the grammar of the language that we will use to represent
propositions.
The alphabet of propositional logic consists of the following symbols:
p, q, r, ..., p1 , q1 , r1 , . . . , ¬, ∨, ∧, ⇒, ⇔, (, )
We use lower case letters to denote primitive or atomic propositions, and assume
that we never run out of such symbols. These letters will act as shorthand for
statements like “David went to the store.”
The symbols ¬, ∨, ∧, ⇒, ⇔ are used to combine other propositions to form
compound propositions and are known as propositional connectives.
The well-formed formulae of propositional logic are determined by the fol-
lowing grammar:
sentence = “p” | “q” | “r” | . . . | “p1 ” | “q1 ” | “r1 ” | . . .
| “¬”, sentence
| “(”, sentence, “ ∨”, sentence, “)”
| “(”, sentence, “ ∧”, sentence, “)”
| “(”, sentence, “ ⇒”, sentence, “)”
| “(”, sentence, “ ⇔”, sentence, “)”
Example 4.4. The following are sentences in propositional logic:
• ((p ∨ q) ∧ ¬(r ⇒ ¬q))
• ¬¬¬p
• ((p ∨ q) ∨ r)
We also introduce a few special terms to refer to specific forms of propositions.
Let p and q be arbitrary sentences in propositional logic. A sentence of the form
¬p is called a negation,
p ∨ q is called a disjunction; p and q are disjuncts,
p ∧ q is called a conjunction; p and q are conjuncts,
p ⇒ q is called an implication; p is the antecedent, and q is the consequent
or conclusion,
p ⇔ q is called a bi-implication.
In practice it is often a good idea to use parentheses to clarify the intended mean-
ing even when they are not strictly necessary. For example, sentence 1 above is
more readable if we write it as (q ∧ ¬r ∨ p) ⇒ ¬q. Similarly, it is often a good
idea to include parentheses when ∧ and ∨ are used next to each other. For ex-
ample, we would write (q ∧ ¬r) ∨ p even though the parentheses are not strictly
required.
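A grammar like this can be checked mechanically. The following Python sketch (our own illustration; note that the grammar as given requires full parenthesization around each binary connective, so unparenthesized sentences are rejected) recognizes well-formed sentences:

```python
import re

# Atomic propositions: p, q, r, optionally followed by a numeric subscript.
ATOM = re.compile(r"[pqr][0-9]*")
CONNECTIVES = ("∨", "∧", "⇒", "⇔")

def is_sentence(s):
    """True iff s is a wff of the fully parenthesized grammar above."""
    ok, rest = _sentence(s)
    return ok and rest == ""

def _sentence(s):
    """Try to consume one sentence from the front of s; return (ok, rest)."""
    m = ATOM.match(s)
    if m:
        return True, s[m.end():]
    if s.startswith("¬"):
        return _sentence(s[1:])
    if s.startswith("("):
        ok, rest = _sentence(s[1:])
        if ok and rest[:1] in CONNECTIVES:
            ok2, rest2 = _sentence(rest[1:])
            if ok2 and rest2.startswith(")"):
                return True, rest2[1:]
    return False, s

print(is_sentence("((p∨q)∨r)"))   # True
print(is_sentence("¬¬¬p"))        # True
print(is_sentence("p¬q"))         # False
```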
4.3 Semantics
To define the meaning of a propositional logic, we need to pick a domain of inter-
est, and explain how every propositional sentence is mapped into that domain.
When interpreting propositions the domain of interest is simply that of truth
values: true (T) and false (F). The meaning of propositional sentences is then
determined as follows:
The truth tables of Figure 4.1 encode the following informal rules:
    p  q | p ∨ q | p ⇒ (p ∨ q)        p  q | p ∧ q | p ⇒ (p ∧ q)
    T  T |   T   |      T             T  T |   T   |      T
    T  F |   T   |      T             T  F |   F   |      F
    F  T |   T   |      T             F  T |   F   |      T
    F  F |   F   |      T             F  F |   F   |      T
Similarly, the truth table for p ⇒ (p ∧ q) reveals that for p true and q false, the sen-
tence is false (hence the sentence is not a tautology); however, the sentence is true
for all other interpretations of p and q (hence the sentence is not a contradiction).
Truth tables work fine for understanding properties of simple propositional
formulae. However, as the size of a formula increases truth tables rapidly become
impractical. Indeed, the number of rows grows exponentially with the number of
propositional symbols in a sentence. To make it possible to reason about complex
propositional sentences, we will introduce a way to carry out reasoning about
propositions at the syntactic level without having to appeal explicitly to the in-
terpretation of propositions. As we discussed in Chapter 3, an inference system
allows us to do just that.
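The truth-table method itself is easy to mechanize. The sketch below (our own illustration) decides whether a sentence is a tautology by enumerating all interpretations, which also makes the exponential growth concrete: n propositional symbols give 2**n rows.

```python
from itertools import product

def implies(a, b):
    """Truth-functional implication: a ⇒ b."""
    return (not a) or b

def is_tautology(sentence, num_vars):
    """A sentence is a tautology iff it is true in every interpretation."""
    return all(sentence(*row) for row in product([True, False], repeat=num_vars))

print(is_tautology(lambda p, q: implies(p, p or q), 2))   # True:  p ⇒ (p ∨ q)
print(is_tautology(lambda p, q: implies(p, p and q), 2))  # False: p ⇒ (p ∧ q)

# The number of rows grows exponentially with the number of symbols.
print(len(list(product([True, False], repeat=10))))       # 1024
```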
4.4 Propositional Calculus

4.4.1 Conjunction

• Conjunction introduction:

    a. p                        a. p
    b. q                        b. q
       p ∧ q   ∧-intro, a,b        q ∧ p   ∧-intro, a,b

• Conjunction elimination:

    a. p ∧ q                    a. p ∧ q
       p   ∧-elim, a               q   ∧-elim, a
4.4.2 Implication

• Implication introduction:

    a. | p    assumption
       | ...
    c. | q
       p ⇒ q    ⇒-intro, a–c

• Implication elimination:

    a. p ⇒ q
    b. p
       q    ⇒-elim, a,b
Implication introduction encodes the intuition that if by assuming p we can
show q, then we must have p ⇒ q. This follows from our understanding of logical
implication: that if q is true whenever p is true, then p ⇒ q is true. To represent
this idea the rule for implication introduction makes use of an assumption in the
derivation. An assumption allows us to introduce an arbitrary sentence that we
can treat temporarily just like any other derived sentence. At some point in the
proof, that assumption is discharged by using it in some inference rule. The part
of the proof within which that assumption can be used is called its scope (marked
by a vertical line). The scope starts at the line in which the assumption is intro-
duced and ends before the line of the rule used to discharge the assumption. An
assumption, and any statements derived within the scope of that assumption, must
not be used outside that assumption's scope.

¹ We label some useful results for easy reference. In Section 4.5 we show how such derived
results can be used as inference rules.
For historical reasons, implication elimination sometimes goes by the name
modus ponens.
Example 4.8. We show that p ⇒ (q ⇒ r) ⊢ p ∧ q ⇒ r.

    1. p ⇒ (q ⇒ r)      premise
    2. | p ∧ q          assumption
    3. | p              ∧-elim, 2
    4. | q ⇒ r          ⇒-elim, 1,3
    5. | q              ∧-elim, 2
    6. | r              ⇒-elim, 4,5
    7. p ∧ q ⇒ r        ⇒-intro, 2–6
While in general discovering a legal derivation requires creativity and insight
into the problem, in the proof above (and in many proofs) the structure of the
sentence that we want to derive helps determine the structure of the derivation.
In this case, since ⇒ is the outermost connective of p ∧ q ⇒ r (remember that ∧
binds tighter than ⇒), we try to match the sentence to rules whose conclusion has
⇒. This suggests ⇒-intro as a potential rule. This in turn causes us to introduce
line 2 (and the scope of the assumption). Now we need to derive r under the
assumption p ∧ q. At this point we may do one of two things: either try to derive
as many sentences from the assumption as we can, or look at the premise for
clues about what may be useful next. Since the premise has ⇒ as its outermost
connective, we try to match that to the premises part of the inference rules, and
notice that we could use ⇒-elim if we could derive p. Moreover, we discover that
p is easily derivable from p ∧ q using ∧-elim, so we write down line 3. Now line
4 follows from ⇒-elim using lines 1 and 3. The rest of the derivation is relatively
straightforward.
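The purely syntactic character of such derivations means they can be checked by a program. The following Python sketch (our own illustration; it covers only ∧-elim and ⇒-elim and omits the bookkeeping for assumption scopes) checks the first six lines of Example 4.8:

```python
# Formulas: atoms are strings; compound formulas are tagged tuples.
def And(a, b): return ("and", a, b)
def Imp(a, b): return ("imp", a, b)

def check(step, lines):
    """Check one derivation step against the lines it cites (0-based refs)."""
    formula, rule, refs = step
    cited = [lines[i] for i in refs]
    if rule in ("premise", "assumption"):
        return True                           # accepted without justification
    if rule == "and-elim":                    # from a ∧ b infer a or infer b
        (tag, a, b), = cited
        return tag == "and" and formula in (a, b)
    if rule == "imp-elim":                    # from a ⇒ b and a infer b
        (tag, a, b), other = cited
        return tag == "imp" and other == a and formula == b
    return False

# Lines 1-6 of Example 4.8 (scope bookkeeping omitted in this sketch).
proof = [
    (Imp("p", Imp("q", "r")), "premise",    []),
    (And("p", "q"),           "assumption", []),
    ("p",                     "and-elim",   [1]),
    (Imp("q", "r"),           "imp-elim",   [0, 2]),
    ("q",                     "and-elim",   [1]),
    ("r",                     "imp-elim",   [3, 4]),
]
formulas = [f for f, _, _ in proof]
assert all(check(step, formulas) for step in proof)
print("derivation steps check out")
```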
4.4.3 Bi-implication

• Bi-implication introduction:

    a. p ⇒ q                       a. p ⇒ q
    b. q ⇒ p                       b. q ⇒ p
       p ⇔ q   ⇔-intro, a,b           q ⇔ p   ⇔-intro, a,b

• Bi-implication elimination:

    a. p ⇔ q                       a. p ⇔ q
       p ⇒ q   ⇔-elim, a              q ⇒ p   ⇔-elim, a
4.4.4 Disjunction

• Disjunction introduction:

    a. p                        a. q
       p ∨ q   ∨-intro, a          p ∨ q   ∨-intro, a

• Disjunction elimination:

    a. p ∨ q
    b. | p    assumption
       | ...
    d. | r
    e. | q    assumption
       | ...
    g. | r
       r    ∨-elim, a,b–d,e–g
    1. p ∨ (q ∨ r)          premise
    2. | p                  assumption
    3. | p ∨ q              ∨-intro, 2
    4. | (p ∨ q) ∨ r        ∨-intro, 3
    5. | q ∨ r              assumption
    6. | | q                assumption
    7. | | p ∨ q            ∨-intro, 6
    8. | | (p ∨ q) ∨ r      ∨-intro, 7
    9. | | r                assumption
    10. | | (p ∨ q) ∨ r     ∨-intro, 9
    11. | (p ∨ q) ∨ r       ∨-elim, 5,6–8,9–10
    12. (p ∨ q) ∨ r         ∨-elim, 1,2–4,5–11
for the schema variable p and r for the schema variable q). Deriving (p ∨ q) ∨ r
from q ∨ r requires applying the disjunction elimination rule once more: we derive
(p ∨ q) ∨ r from q and r separately. This requires two applications of disjunction
introduction to derive the sentence from q, and one application of disjunction
introduction to derive it from r. Thus, we arrive at derivation line 11. Now, having
derived the conclusion from both p and q ∨ r we finish the derivation.
4.4.5 Negation

• Negation introduction:

    a. | p    assumption
       | ...
    c. | q
       | ...
    e. | ¬q
       ¬p    ¬-intro, a,c,e

• Negation elimination:

    a. | ¬p    assumption
       | ...
    c. | q
       | ...
    e. | ¬q
       p    ¬-elim, a,c,e
Although the negation rules are intuitive, figuring out what contradiction to
derive (that is, how to instantiate q in the rules) can be challenging in practice.
Example 4.12. Double negation
We show that ¬¬p ⊣⊢ p.
First we show ¬¬p ⊢ p.

    1. ¬¬p       premise
    2. | ¬p      assumption
    3. | ¬p      copy from 2
    4. | ¬¬p     copy from 1
    5. p         ¬-elim, 2,3,4

Next we show p ⊢ ¬¬p.

    1. p         premise
    2. | ¬p      assumption
    3. | p       copy from 1
    4. | ¬p      copy from 2
    5. ¬¬p       ¬-intro, 2,3,4
    a1. p1
    a2. p2
    ...
    an. pn
        q    rule name, a1,a2,. . . ,an
Example 4.13. In Example 4.12 we proved ¬¬p ⊣⊢ p. We called this rule (which
allows us to derive p and ¬¬p from each other) "Double negation."
We show that ⊢ r ∨ ¬r.
    1. | ¬(r ∨ ¬r)      assumption
    2. | | r            assumption
    3. | | r ∨ ¬r       ∨-intro, 2
    4. | | ¬(r ∨ ¬r)    copy from 1
    5. | ¬r             ¬-intro, 2,3,4
    6. | r ∨ ¬r         ∨-intro, 5
    7. | ¬(r ∨ ¬r)      copy from 1
    8. ¬¬(r ∨ ¬r)       ¬-intro, 1,6,7
    9. r ∨ ¬r           Double negation, 8
Note that we have only needed the ¬¬p ⊢ p side of the rule.
Figure 4.2 summarizes some useful derived rules of propositional calculus.
These include many of the common propositional rules such as DeMorgan’s Laws,
commutativity and associativity of conjunction, disjunction, and bi-implication,
and rules for reasoning with contrapositives.
    ⊢ p ∨ ¬p                              Excluded Middle
    p ∨ q ⊢ q ∨ p                         ∨-Commutativity
    p ∧ q ⊢ q ∧ p                         ∧-Commutativity
    p ⇔ q ⊢ q ⇔ p                         ⇔-Commutativity
    (p ∨ q) ∨ r ⊣⊢ p ∨ (q ∨ r)            ∨-Associativity
    (p ∧ q) ∧ r ⊣⊢ p ∧ (q ∧ r)            ∧-Associativity
    (p ⇔ q) ⇔ r ⊣⊢ p ⇔ (q ⇔ r)            ⇔-Associativity
    p ∨ (q ∧ r) ⊣⊢ (p ∨ q) ∧ (p ∨ r)      ∨∧-Distributivity
    p ∧ (q ∨ r) ⊣⊢ (p ∧ q) ∨ (p ∧ r)      ∧∨-Distributivity
    (p ⇒ q) ∧ (q ⇒ r) ⊢ p ⇒ r             ⇒-Transitivity
    (p ⇔ q) ∧ (q ⇔ r) ⊢ p ⇔ r             ⇔-Transitivity
    p ⇒ q ⊣⊢ ¬p ∨ q                       ⇒-Alternative
    p ⇒ q ⊣⊢ ¬q ⇒ ¬p                      Contrapositives
    ¬¬p ⊣⊢ p                              Double Negation
    (p ⇔ q) ⊣⊢ (¬p ⇔ ¬q)                  ⇔-Alternative
    ¬(p ∧ q) ⊣⊢ ¬p ∨ ¬q                   De Morgan
    ¬(p ∨ q) ⊣⊢ ¬p ∧ ¬q                   De Morgan
    p ∧ q ⇒ r ⊣⊢ p ⇒ (q ⇒ r)              Shunting
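Since the propositional calculus is sound, every interderivable pair in Figure 4.2 must also agree in every interpretation. A brute-force spot-check of a few entries (our own sketch, using truth-value enumeration rather than derivations):

```python
from itertools import product

def implies(a, b): return (not a) or b
def iff(a, b): return a == b

def equivalent(f, g, n):
    """True iff sentences f and g agree in all 2**n interpretations."""
    return all(f(*row) == g(*row) for row in product([True, False], repeat=n))

# De Morgan: ¬(p ∧ q) ⊣⊢ ¬p ∨ ¬q
assert equivalent(lambda p, q: not (p and q),
                  lambda p, q: (not p) or (not q), 2)
# Shunting: p ∧ q ⇒ r ⊣⊢ p ⇒ (q ⇒ r)
assert equivalent(lambda p, q, r: implies(p and q, r),
                  lambda p, q, r: implies(p, implies(q, r)), 3)
# Contrapositives: p ⇒ q ⊣⊢ ¬q ⇒ ¬p
assert equivalent(lambda p, q: implies(p, q),
                  lambda p, q: implies(not q, not p), 2)
# ⇔-Alternative: (p ⇔ q) ⊣⊢ (¬p ⇔ ¬q)
assert equivalent(lambda p, q: iff(p, q),
                  lambda p, q: iff(not p, not q), 2)
print("all checked")
```

Note that this check establishes semantic equivalence only; the derivations themselves must still be carried out syntactically, as in the examples above.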
Traffic Lights
Let’s consider another example.
A simple description of the traffic lights can be created using the following
sentences as atomic propositions:
where == is used to represent the fact that traffic lights1 is a name for the propo-
sition on the right hand side of == . We pick ∧ as the connective since we would
like to say that all the mentioned propositions should simultaneously hold in the
world of traffic lights.
This description is simple, but not particularly useful. What can we say about
the truth or falsity of proposition “if the North-South light is green or yellow, the
East-West light is red”? Not much. However, by revealing more of the structure
of r we can do better.
We might start by introducing r1 and r2 as atomic propositions for the two
parts of r:
r1 : “if the North-South light is green or yellow, the East-West light is red”
r2 : “if the East-West light is green or yellow, the North-South light is red”
We can then characterize the collection of facts about our traffic lights as:
traffic lights2 is more detailed, and the proposition “if the North-South light is
green or yellow, the East-West light is red” is one of its primitive facts. But many
of the questions that might arise would remain unanswered in this model. For
example, what can we say about the East-West light in the situation that “the
North-South light is green”? It surely must be red, but we cannot reason about this
given our current representation of traffic lights. We will return to this example
with a more expressive representation after discussing how English connective
words are translated into logic connectives.
4.7.2 Connectives
The process of translating English connective words into logic connectives is often
straightforward. Figure 4.3 summarizes how some of the most frequent English
connective words are translated.
In the translations of Figure 4.3 a couple of points are worth noting. First,
the word “or” is sometimes used in English in an inclusive and sometimes in
an exclusive sense. The usage that we have shown is inclusive, meaning that it
remains true when both p and q are true. In contrast, we treat the expression
“either p or q” as being exclusive: if both p and q are true the statement is false.
Another point worth noting is that we interpret the word “unless” to mean that p
and q cannot be true at the same time.
There are also some words that signal logical connectives in various ways de-
pending on context. One of these is “but.” In some cases it indicates conjunction.
For instance the sentence
condition for spl. However, ncl is not a sufficient condition for spl because ncl ⇒
spl is not true when the straight lines are on different planes.
Bi-implication
In informal English descriptions we rarely see the use of "if and only if." There-
fore, it can be challenging to recognize when we are dealing with bi-implication.
Bi-implication is associated with facts that are both sufficient and necessary
for some other fact to be true. Such conditions usually arise in definitions. For
example, the definition of parallel straight lines can be expressed as:
“For two straight lines to be parallel, the lines must not cross and the
lines must be on the same plane.”
where samepl stands for “the lines are on the same plane.” However, we know
from Euclidean geometry that we are dealing with conditions that are both suffi-
cient and necessary, therefore, we write:
Another interesting property to prove about our traffic lights is that the lights
cannot both be green at the same time. A situation in which both lights were green
could be catastrophic indeed. We would like, therefore, to make sure that it cannot
happen in our model. In later chapters we will call such properties safety prop-
erties, because as the name suggests they are meant to show that “bad situations”
cannot happen.
Let us show that the lights cannot be green at the same time, that is
We use a proof by contradiction to derive this property: we assume that both lights
can be green at the same time. But from the property we just proved (and which
we call “nsg-ewr”), the North-South light’s greenness implies that the East-West
light is red. So we have that the East-West light is simultaneously green and red,
which contradicts ewl2 .
A note about line 8 of the derivation: we are assuming that the parenthesized
version of
is
Chapter Notes
[TBD]
Further Reading
[TBD]
4.8 Exercises
1. Which of the following are well-formed sentences in propositional logic?
(a) p¬q
(b) ¬¬¬(p ∧ r ∧ q)
(c) ⇒ (s ⇔ r)
(d) ((q ⇔ r) → s)
2. The following sentences have no parentheses and are parsed according to the
precedence rules. Rewrite each sentence with explicit parentheses showing
how it is parsed.
(a) ¬q ∨ r ⇒ s
(b) q⇔r⇒s
(c) p ∨ s ∧ ¬¬q ⇔ p ∧ s
(d) p⇒q⇒r
(e) q⇔r⇔s
Questions:
(a) p ∨ (p ∧ q)
(b) p ∧ (p ∨ q)
(c) ¬p ⇒ (p ∧ (q ⇒ p))
(d) ¬p ∧ (p ∨ (q ⇒ p))
(e) (p ⇒ q) ⇒ (¬p ∨ q)
(f) (p ⇒ q) ⇔ (¬p ∨ q)
(a) valid?
(b) consistent?
(c) contingent?
(d) inconsistent?
5. Show that the following sentences are valid using truth tables:
(a) p ⇒ q ⇔ ¬p ∨ q
(b) p ⇒ q ⇔ ¬q ⇒ ¬p
(c) ¬(p ∧ q) ⇔ ¬p ∨ ¬q
(d) ¬(p ∨ q) ⇔ ¬p ∧ ¬q
(a) p ⇔ q ⊢ q ⇒ p
(b) p, ¬p ⊢ q
(c) q ∧ ¬q ⊢ p ⇒ r
(d) (p ∧ q) ∧ r ⊢ p ∧ (q ∧ r)
(e) ¬¬q ⊢ q ∨ r
(f) p ∧ q, p ⇒ s, q ⇒ t ⊢ s ∧ t
(g) ⊢ (p ∧ q) ⇒ q
(h) q ⇒ ¬p, p ∧ q ⊢ r
(i) p ∧ q ⊢ p ⇒ q
(j) ¬p ∧ ¬q ⊢ p ⇔ q
7. NNF
8. CNF
Predicate Logic
5.1 Predicates
Sometimes we want to assert the proposition that all members of some set satisfy
a property. Suppose we have a set of friends {Larry, Joe, Moe}. If you wanted
to state that they are all tall you might do this in propositional logic using three
propositions and conjunction:
“Larry is tall” ∧ “Joe is tall” ∧ “Moe is tall”
While this achieves the desired effect, for large sets enumeration becomes un-
wieldy. Worse, for sets that have an infinite number of elements, such enumeration
is not possible.
An alternative is to introduce a property template “Tall()” — we call such a
property template a predicate. Given a friend, x, Tall(x) would be either true or
false depending on whether that person is tall or not. For our small set of friends
we could then write:
Tall(Larry) ∧ Tall(Joe) ∧ Tall(Moe)
This helps since the use of a predicate eliminates the need to define distinct
propositions for each element of the set. But we still have the problem that we
must write down the property for each element of the set. To avoid the need for
explicit enumeration we introduce the following new syntax:
∀ x : Friends • Tall(x)
to represent
Tall(friend1) ∧ Tall(friend2) ∧ . . .
where the set Friends is defined as {friend1, friend2, . . .}. The notation “x : Friends”
indicates that x is a variable whose values are drawn from the set Friends. That
is, x stands for any object of the set Friends.
Similarly we introduce the syntax:
∃ x : Friends • Tall(x)
to represent:
Tall(friend1) ∨ Tall(friend2) ∨ . . .
where, again, “x : Friends” indicates that x stands for any object in the set Friends.
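Over a finite set these quantifiers really are just large conjunctions and disjunctions, which a short Python sketch makes concrete (the set of friends and their heights are our invented data):

```python
# Hypothetical data: which members of Friends are tall.
friends = {"Larry": True, "Joe": True, "Moe": False}

def Tall(x):
    """The predicate Tall(x) for our small set of friends."""
    return friends[x]

# ∀ x : Friends • Tall(x)  ~  Tall(Larry) ∧ Tall(Joe) ∧ Tall(Moe)
print(all(Tall(x) for x in friends))   # False: Moe is not tall
# ∃ x : Friends • Tall(x)  ~  Tall(Larry) ∨ Tall(Joe) ∨ Tall(Moe)
print(any(Tall(x) for x in friends))   # True
```

For infinite sets this enumeration is of course impossible, which is exactly why the quantifier notation and its inference rules are needed.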
The notational limitations of propositional logic are problematic, but not par-
ticularly serious: after all, we can often capture our intent using single propo-
sitions, such as “All roads lead to Rome.” However, as we noted earlier, such
propositions limit our ability to use their structure to derive new facts. For exam-
ple, consider the following propositions:
1. “All roads lead to Rome.”
2. “X is a road.”
3. “X leads to Rome.”
What can we say about the truth or falsity of the last proposition? Although our
intuition suggests that the last proposition should follow from the first two, in
propositional logic we cannot deduce anything useful about the last proposition
from the first two. As we will see shortly, predicate logic and its inference system
will enable us to express the structure of propositions so that we can reason as
follows: “given that all roads lead to Rome, X’s being a road implies that X leads
to Rome.”
A predicate like Tall is an example of a unary predicate: it has just one place
for an object name to be put. We will also allow n-ary predicates, which can be
thought of as representing some relationship among n objects. For example, the
predicate “ParentOf (x, y)” can be used to express the fact that “y is a parent of x”.
5.2 Syntax
The sentences of predicate logic are defined by extending the grammar for propo-
sitional logic as follows:
sentence = atomic proposition | predicate
| “¬”, sentence
| “(”, sentence, “ ∨”, sentence, “)”
| “(”, sentence, “ ∧”, sentence, “)”
| “(”, sentence, “ ⇒”, sentence, “)”
| “(”, sentence, “ ⇔”, sentence, “)”
| “(”, “ ∀”, variable, “ :”, setname, “•”, sentence, “)”
| “(”, “ ∃”, variable, “ :”, setname, “•”, sentence, “)”
atomic proposition = “p” | “q” | “r” | . . . | “p1 ” | “q1 ” | “r1 ” | . . .
predicate = predicate name, "(", term list, ")"
term list = term | term, ",", term list
term = constant | variable | function application
constant = “a” | “b” | “c” | . . . | “a1 ” | “b1 ” | “c1 ” | . . .
variable = “x” | “y” | “z” | . . . | “x1 ” | “y1 ” | “z1 ” | . . .
function application = function name, “(”, term list, “)”
Usually we use upper case letters P, Q, R, . . . to denote generic predicates. We also
allow predicate names that suggest a particular property or relationship, such as
Tall and ParentOf .
The “parameters” of a predicate are called terms. Terms are constants, vari-
ables, or formed by function application. A constant represents a specific, fixed
object in a set of objects. A variable represents an undetermined object in a set of
objects — it can be instantiated with the name of any object belonging to the set
in question.¹ Function application allows us to create expressions such as 5x + 3
and √a. We will discuss functions more formally in Chapter 6. For now suffice
it to say that function application must obey well-formedness rules regarding the
number of parameters and the sets associated with them. For example, for addi-
tion of numbers + to make sense it must be applied to two parameters and the
parameters must be numbers.
We assume that each predicate symbol has a fixed arity, representing the num-
ber of places for terms. For a sentence to be well-formed the term list associated
with each predicate must have the same length as the arity of that predicate.
1 Formally these expressions would be written as √(a, 2) and +(×(5, x), 3), but we will typically use the more familiar forms.
78 CHAPTER 5. PREDICATE LOGIC
The symbols ∀ and ∃ are called quantifiers. ∀ is called the universal quantifier,
and ∃ is called the existential quantifier. In sentences of the form (quantifier x :
S • . . .), x is called the quantified variable, and the part after “•” is called the scope
of the quantifier. The quantifier is said to range over S, where S is a set of values.
Example 5.1. The following are sentences in predicate logic:
• (∃ y : S • (P(x) ∧ Q(x)))
The scope of ∃ is P(x) ∧ Q(x).
• (p ⇔ (∀ x : T • Q(x)))
The scope of ∀ is Q(x).
• (∃ x : S • ((∀ y : N • R(x, y)) ∨ ¬Q(x, z)))
The scope of ∃ is ((∀ y : N • R(x, y)) ∨ ¬Q(x, z)); the scope of ∀ is R(x, y).
• ((∀ z : Z • (∀ y : S • P(z, y, x))) ⇒ ¬r)
The scope of the outermost ∀ is (∀ y : S • P(z, y, x)); the scope of the inner-
most ∀ is P(z, y, x).
Conventions The precedence rules for propositional sentences can also be used
in predicate logic to eliminate parentheses. Additionally, we will use the following
conventions to improve sentence readability:
– When parentheses are missing around a quantified sentence we will assume that the quantifier’s scope stretches as far to the right as possible. Specifically, the scope extends to the first unmatched right parenthesis, ‘)’, to the right of the ‘•’, or to the end of the sentence if there is none. For example,
(p ∨ ∀ x : S • ¬Q(x) ⇒ P(x)) ∧ q
should be read as:
(p ∨ (∀ x : S • (¬Q(x) ⇒ P(x)))) ∧ q
– We will use the following notation:
∀ x, y : T; z : S • . . .
as syntactic sugaring for:
∀ x : T • (∀ y : T • (∀ z : S • . . .))
Similarly, we may combine several ∃ appearing one after the other. How-
ever, we may not combine ∀ and ∃ together.
Bound and Free Variables The grammar allows one to use any variable name
as a term in a predicate. In many cases such a variable will have been introduced
as the quantified variable in a scope that includes the predicate. But this is not
strictly necessary. To distinguish between variables that are within the scope of a quantifier and those that aren’t, we introduce the following terminology: an occurrence of a variable is bound if it lies within the scope of a quantifier over that variable; otherwise the occurrence is free. A variable can occur both free and bound in the same sentence.
Example 5.2. In the following sentence x is bound, y is free, and z is both free and bound:
(∀ x : S • P(x, y) ∧ (∃ z : T • Q(z))) ∨ R(z)
5.3 Semantics
The semantics of predicate logic extends that of propositional logic. As before,
each sentence will be mapped to the domain of true or false. Moreover, atomic
propositions and propositional connectives are interpreted exactly as in proposi-
tional logic.
A variable is interpreted as any one of the objects whose names may be used to instantiate it; so a variable has as many possible interpretations as there are objects in the set the variable is drawn from. Each constant has a single interpretation — that of the
object that it denotes. Informally, function application f (t1 , t2 , . . . , tn ) is interpreted
by first interpreting the function name “f ,” and all the variables and constants
appearing in its parameter list. Application is then interpreted as “the value” of
the function for the given arguments (in the sense elaborated in Chapter 6). For
example, f (x, 3) will be interpreted as the value 5 when f is interpreted as addition
and x is interpreted as 2, but as 12 when f is interpreted as multiplication and x
is interpreted as 4. We often associate a fixed interpretation with some function
names; for example, + is usually interpreted as addition.
A sentence of the form (quantifier x : T • P(x)) is interpreted by considering the interpretations of P(x) as x takes on each of the objects in T: (∀ x : T • P(x)) is true when P(x) is true for every object in T, and (∃ x : T • P(x)) is true when P(x) is true for at least one object in T.
The name of a quantified variable does not affect the interpretation: a bound variable may be renamed, provided the new name does not capture a variable occurring free in the sentence. For example, we may rename y to z in
∃ x : Z • (∀ y : Z • P(x, y))
to give:
∃ x : Z • (∀ z : Z • P(x, z))
Some Important Predicates We assume that each set comes equipped with two
special predicates: the element of predicate “∈” where a ∈ S holds when a is a
member of set S, and the equality predicate “=” where a = b holds when a and
b denote the same element of set S. (Sets will be discussed more thoroughly in
Chapter 6.) We will also use some well-known predicates on integers (Z), such as “<” (where x < y holds when “x is smaller than y”), together with some of their properties, without explicitly defining them.
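Over finite sets, the semantics of the quantifiers can be checked mechanically: ∀ corresponds to testing a property for every element, and ∃ for at least one. A minimal sketch in Python (the sets and predicates here are illustrative, not from the text):

```python
# (forall x : S . P(x)) holds when P is true of every element of S;
# (exists x : S . P(x)) holds when P is true of at least one element.
def forall(S, P):
    return all(P(x) for x in S)

def exists(S, P):
    return any(P(x) for x in S)

S = {1, 2, 3, 4}
print(forall(S, lambda x: x > 0))        # True: every element is positive
print(exists(S, lambda x: x % 2 == 0))   # True: some element is even

# Nested quantifiers: (exists x : S . (forall y : S . x <= y)),
# i.e. "S has a least element".
print(exists(S, lambda x: forall(S, lambda y: x <= y)))   # True
```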
• Universal elimination:
a. ∀ x : S • P(x)
b. a ∈ S
P(a) ∀-elim, a,b
The intuition behind this rule is that if we know that P(x) holds for all values
x in S, and a is a specific element of S, then it must hold for a.
To see how these rules are used, consider the following derivation:
Example 5.4. We show that ` (∀ x : S • P(x)) ⇒ (∀ x : S • P(x) ∨ Q(x)).
1. ∀ x : S • P(x) assumption
2. x∈S assumption
3. P(x) ∀-elim, 1,2
4. P(x) ∨ Q(x) ∨-intro, 3
5. ∀ x : S • P(x) ∨ Q(x) ∀-intro, 2–4
6. (∀ x : S • P(x)) ⇒ (∀ x : S • P(x) ∨ Q(x)) ⇒-intro, 1–5
Example 5.5. An incorrect derivation using ∀-intro.
1. z∈S assumption ← correct
2. P(z, z) assumption
3. z∈S assumption ← incorrect!
4. P(z, z) copy from 2.
5. ∀ x : S • P(x, z) ∀-intro, 3–4
6. P(z, z) ⇒ (∀ x : S • P(x, z)) ⇒-intro, 2–5
7. ∀ y : S • P(y, y) ⇒ (∀ x : S • P(x, y)) ∀-intro, 1–6
This derivation erroneously “shows” that if a binary predicate P is true when its
two arguments are the same, then it must hold for any values of its variables
— clearly not something that we would expect to be valid. The incorrect usage
results from introducing the assumption z ∈ S, violating the side condition of ∀-
introduction: z occurs free in ∀ x : S • P(x, z), and the undischarged assumptions
z ∈ S and P(z, z).
Special cases of the universal rules arise when S is empty: that is to say, S
contains no elements. When S is empty then the statement ∀ x : S • P(x) is always
true — the statement is said to hold vacuously. Formally, since the assumption
a ∈ S in the introduction rule contradicts S’s being empty, anything can be derived
from this contradiction, including ∀ x : S • P(x). On the other hand, for an empty
S we can never derive a ∈ S in the elimination rule, so we can never prove P(a) in
this case.
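This vacuous-truth behaviour is mirrored by Python’s all and any on an empty iterable, which gives a quick way to see why the convention is natural (an illustrative aside, not from the text):

```python
# Over the empty set a universal statement holds vacuously and an
# existential statement is false -- exactly the behaviour of all()/any()
# on an empty iterable.
empty = set()
print(all(x > 0 for x in empty))   # True: forall over the empty set
print(any(x > 0 for x in empty))   # False: exists over the empty set
```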
• Existential elimination:
a. ∃ x : S • P(x)
b. a ∈ S ∧ P(a) assumption
..
.
d. R
R ∃-elim, a,b–d [a is a fresh variable]
The intuition behind the elimination rule is that ∃ x : S • P(x) says that P
holds for at least one member of S. Thus we are trying to derive R when at
least one element of S has the property P. By picking a to be an arbitrary
element of S that makes P true, and showing that R must follow, we know
that the proof would work for whichever a makes P(a) true.
The constraint that a is a fresh variable is included to ensure that a is an
arbitrary element of S. a should not appear free in ∃ x : S • P(x), R, or any
undischarged assumptions.
The following derivation uses these rules to show that the order of two adjacent existential quantifiers may be exchanged: from ∃ x : S • ∃ y : T • P(x, y) we derive ∃ y : T • ∃ x : S • P(x, y).
1. ∃ x : S • ∃ y : T • P(x, y) premise
2. z ∈ S ∧ ∃ y : T • P(z, y) assumption
3. ∃ y : T • P(z, y) ∧-elim, 2
4. w ∈ T ∧ P(z, w) assumption
5. z∈S ∧-elim, 2
6. P(z, w) ∧-elim, 4
7. ∃ x : S • P(x, w) ∃-intro, 5,6
8. w∈T ∧-elim, 4
9. ∃ y : T • ∃ x : S • P(x, y) ∃-intro, 8,7
10. ∃ y : T • ∃ x : S • P(x, y) ∃-elim, 3,4–9
11. ∃ y : T • ∃ x : S • P(x, y) ∃-elim, 1,2–10
The following derivation, by contrast, illustrates an incorrect use of ∃-elimination: the variable a is not fresh, since it occurs in the undischarged assumption a ∈ S introduced on line 1.
1. a∈S assumption
2. ∃ y : S • P(y) premise
3. a ∈ S ∧ P(a) assumption ← incorrect!
4. P(a) ∧-elim, 3
5. P(a) ∃-elim, 2,3–4
6. ∀ x : S • P(x) ∀-intro, 1–5
5.5 Equality
We mentioned equality “=” as a special predicate in Section 5.3. This special
predicate denotes that two values from a set T are the same. Comparing two
values for equality makes sense only if they are from the same set. Every set has
an equality predicate associated with it; however, we do not generally distinguish
between the equality symbols, writing = to stand for =T, =U, etc. We also write x ≠ y as a shorthand for ¬(x = y).
The properties of equality are captured as the following inference rules.
• Reflexivity:
t=t eq-refl
• Symmetry:
a. t1 = t2
t2 = t1 eq-sym, a
• Transitivity:
a. t1 = t2
b. t2 = t3
t1 = t3 eq-trans, a,b
• Substitution of equals for equals:
a. t = u
b. S[x ← t]
S[x ← u] eq-sub, a,b
This final inference rule is simple, but extremely powerful. It says that if we know that two terms (t and u) are equal, then whenever a property (S) holds of t (expressed as S[x ← t]) it also holds with u substituted for t (expressed as S[x ← u]). Sometimes this is called “substituting equals for equals.”
The resulting formal system is known as predicate logic with equality. In Chap-
ter 7 we present equational reasoning — a style of reasoning based on the prop-
erties of equality and substitution of equals for equals.
Example 5.7. As an example involving reasoning about equality let us derive
` ∀ x : S • (∃ y : S • y = x ∧ P(y)) ⇒ P(x).
1. x∈S assumption
2. ∃ y : S • y = x ∧ P(y) assumption
3. y ∈ S ∧ (y = x ∧ P(y)) assumption
4. y = x ∧ P(y) ∧-elim, 3
5. y=x ∧-elim, 4
6. P(y) ∧-elim, 4
7. P(x) eq-sub, 5,6
8. P(x) ∃-elim, 2,3–7
9. (∃ y : S • y = x ∧ P(y)) ⇒ P(x) ⇒-intro, 2–8
10. ∀ x : S • (∃ y : S • y = x ∧ P(y)) ⇒ P(x) ∀-intro, 1–9
As we have noted, the meaning of equality will depend on the kinds of enti-
ties being compared, and whenever we use it we must be careful to indicate how
equality is determined. In Chapter 6 we will, for example, define how equality on
sets can be determined.
Similarly we can introduce EW(x) for the East-West light, and express EWL1
and EWL2 analogously to NSL1 and NSL2 . The resulting description is slightly
more concise than the first one. However, a more important advantage of using predicates is that the expressions are immune to changes in the color possibilities: if the traffic light design were to change to allow more than three colors, we would only need to change the definition of the colors set.
Another alternative for the traffic lights is to introduce a new set Directions,
containing the elements North-South and East-West, and a binary predicate L(x, y)
where x represents the direction and y the color. We could then describe the traffic
light rules as:
L1 == ∀ y : Directions • ∃ x : Colors • L(y, x)
L2 == ∀ z : Directions; x, y : Colors • x ≠ y ⇒ ¬(L(z, x) ∧ L(z, y))
R == ∀ x, y : Directions•
x ≠ y ∧ (L(x, green) ∨ L(x, yellow)) ⇒ L(y, red)
and the resulting traffic lights representation as
traffic lights4 == L1 ∧ L2 ∧ R
model may need to be changed over time. By encapsulating certain concepts using
sets and predicates, we can often insulate ourselves from later changes, in a way
similar to how abstract data types and object-oriented classes have been used in
programming to reduce the impact of changes on a software system.
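To see the rules at work, L1, L2, and R can be checked exhaustively against a concrete state of the lights. A sketch in Python; the particular state and the string names are invented for illustration:

```python
Directions = {"North-South", "East-West"}
Colors = {"red", "yellow", "green"}

# L(d, c): direction d currently shows color c, modeled as a set of pairs.
L = {("North-South", "green"), ("East-West", "red")}

# L1: every direction shows at least one color.
L1 = all(any((d, c) in L for c in Colors) for d in Directions)

# L2: no direction shows two different colors at the same time.
L2 = all(x == y or not ((d, x) in L and (d, y) in L)
         for d in Directions for x in Colors for y in Colors)

# R: if a direction shows green or yellow, every other direction shows red.
R = all(dx == dy
        or not ((dx, "green") in L or (dx, "yellow") in L)
        or (dy, "red") in L
        for dx in Directions for dy in Directions)

print(L1 and L2 and R)   # True: this state satisfies all three rules
```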
5.8.2 Quantifiers
The universal quantifier ∀ is typically signaled by words such as “all”, “every”,
“each”, and “any.” For example, “any” indicates that the property under consider-
ation characterizes arbitrary members of a particular set; therefore, it characterizes
all members of the set.2 Similarly, the article “a” when used in the sense of “any”
gives rise to a universally quantified sentence. For example, the first “a” in the
sentence
“A friend in need is a friend indeed.”
indicates that all those who help you in difficult times are true friends.
The existential quantifier ∃ is signaled by words such as “there is”, “there
exist(s)”, and “some.” For example,
“Some students have a background in logic.”
translates to
∃ x : Students • L(x)
where Students is the set of students, and L(x) the predicate that holds if student x
has a background in logic.
“The students, who had a background in logic, did well on the final.”
The situation is a bit different when dealing with sentences that require exis-
tential quantification. In almost all such cases we would use conjunction when
translating the sentences. For example, the sentence
is translated as
where N(x) expresses that x “shall remain nameless.” Moreover, the sentence
“Some of the crocodiles that Jim saw at the zoo looked menacing.”
is translated as
∃ x : Crocodiles • Z(x) ∧ M(x)
where Z(x) holds if Jim saw x at the zoo, and M(x) if x looks menacing.
To understand why this sentence could not be translated as
∃ x : Crocodiles • Z(x) ⇒ M(x)
notice that the sentence would be true if Jim did not see all of the crocodiles
(making the antecedent of Z(x) ⇒ M(x) false for some crocodile, and therefore the
implication true), or if some crocodile looks menacing (making the consequent of
Z(x) ⇒ M(x) true for some crocodile, and therefore the implication true).3 Most
people would agree that this is not the intended meaning of the original sentence.
3 Formally, this follows from the property
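The difference between the two candidate translations can be seen by evaluating both over a small invented domain: if Jim saw no crocodiles at all, the conjunction version is false (as intended) while the implication version is vacuously true. A sketch:

```python
Crocodiles = {"c1", "c2", "c3"}
seen_at_zoo = set()      # Z(x): Jim saw x at the zoo -- here, none of them
menacing = {"c2"}        # M(x): x looks menacing

def implies(p, q):
    return (not p) or q

with_and = any(x in seen_at_zoo and x in menacing for x in Crocodiles)
with_imp = any(implies(x in seen_at_zoo, x in menacing) for x in Crocodiles)

print(with_and)   # False: correct -- no crocodile Jim saw looks menacing
print(with_imp)   # True: wrong -- each implication is vacuously true
```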
We now show how to create a formal system to express and reason about such
riddles. We call our system the Riddle System.
Axioms
We now introduce a collection of axioms for the Riddle System to clarify the
properties of the predicates and sets that we have introduced.
A1: ∀ x : Persons • ¬FatherOf (x, x)
This axiom states that no one can be their own father.
A2: ∀ x : Persons • ∃ y : Persons • FatherOf (x, y)
This axiom states the existence of a father for every person.
A3: ∀ x, y, z : Persons • FatherOf (x, y) ∧ FatherOf (x, z) ⇒ y = z
This axiom states that a person cannot have more than one (biological) fa-
ther.
A4: ∀ x : Persons • ¬SonOf (x, x) ∧ ¬DaughterOf (x, x)
This axiom states that no one can be their own son or daughter.
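On a finite population the axioms can be checked mechanically against a candidate FatherOf relation. A sketch in Python covering A1 to A3; the names are invented, and FatherOf(x, y) is read, as in the text, “y is the father of x”:

```python
from itertools import product

Persons = {"Abe", "Homer", "Bart", "Lisa"}
# FatherOf(x, y): y is the father of x.
FatherOf = {("Homer", "Abe"), ("Bart", "Homer"), ("Lisa", "Homer")}

def father_of(x, y):
    return (x, y) in FatherOf

# A1: no one is their own father.
A1 = all(not father_of(x, x) for x in Persons)

# A2: every person has a father.
A2 = all(any(father_of(x, y) for y in Persons) for x in Persons)

# A3: no one has two different fathers.
A3 = all(y == z or not (father_of(x, y) and father_of(x, z))
         for x, y, z in product(Persons, repeat=3))

print(A1, A3)   # True True
print(A2)       # False: Abe's father lies outside this finite population
```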
5.9. FATHERS AND SONS: A FORMAL RIDDLE SYSTEM 95
Derived Facts
Having defined the primitives and the axioms, the next task is to build a collection
of derived facts. Derived facts can be used like any of the axioms of the Riddle
System or the theorems that hold more generally for predicate logic (cf., Sec-
tion 4.5). As before, the choice of which facts to derive is driven by the specific needs of the formalization. For example, as we will see below, proving that a particular kind of relationship holds between the riddle teller and the unidentified man in the riddles above is made easier by first establishing a collection of lemmas and theorems.
Let us now consider some derived facts in the Riddle System.
Theorem 1. Son-Father
We show that
Theorem 2. Daughter-Father
We show that
1. x ∈ Persons assumption
2. Siblings(x, x) assumption
3. ∀ x, y : Persons•
Siblings(x, y) ⇔
x ≠ y ∧ ∃ z : Persons • FatherOf (x, z) ∧ FatherOf (y, z) premise A7
4. x ∈ Persons ∧ x ∈ Persons ∧-intro 1,1
5. Siblings(x, x) ⇔
x ≠ x ∧ ∃ z : Persons • FatherOf (x, z) ∧ FatherOf (x, z) ∀-elim, 3,4
6. Siblings(x, x) ⇒
x ≠ x ∧ ∃ z : Persons • FatherOf (x, z) ∧ FatherOf (x, z) ⇔-elim, 5
7. x ≠ x ∧ ∃ z : Persons • FatherOf (x, z) ∧ FatherOf (x, z) ⇒-elim, 6,2
8. x ≠ x ∧-elim, 7
9. x=x eq-refl
10. ¬Siblings(x, x) ¬-intro, 2,8,9
11. ∀ x : Persons • ¬Siblings(x, x) ∀-intro, 1–10
Figure 5.2: Proof of Theorem 4
We also prove one more result about siblings: that two different persons that
are not siblings cannot have the same father.
A1, A2, . . . , A7 `
∀ x, y : Persons•
¬Siblings(x, y) ∧ x ≠ y ⇒
∀ z : Persons • ¬(FatherOf(x, z) ∧ FatherOf(y, z))
Equivalently, stated in contrapositive form:
A1, A2, . . . , A7 `
∀ x, y, z : Persons•
x ≠ y ∧ FatherOf(x, z) ∧ FatherOf(y, z) ⇒ Siblings(x, y)
Deriving facts once the axioms of a system are identified is also useful as a
sanity check: if we are unable to prove properties that we expect to hold this
might be an indication that our axioms are too weak. For example, can you derive
that “a person cannot be his father’s father” in the Riddle System? Probably not.
Informally, the reason no one is their own grandfather is that every father is older than his son. But there is nothing in the axioms of the Riddle System that allows us to reason in this way.
Riddle Formalization
Let us now go back to the riddles that we started with. We will formalize the first
riddle in the Riddle System; moreover, we will stipulate a relationship that we
expect to hold between the riddle teller and the unidentified man and try to derive
it. The first line of the riddle, “brothers and sisters have I none,” can be translated as:
∀ z : Persons • ¬Siblings(x, z)
The second line, “this man is my father’s son,” can be translated in several
ways. For example,
∀ v : Persons • FatherOf(x, v) ⇒ SonOf(v, y)
which says that if v is x’s father then y must be one of v’s sons. The careful reader
Alternatively, the line can be translated as
∃ v : Persons • FatherOf(x, v) ∧ SonOf(v, y)
will notice that these two sentences do not generally express the same thing in
predicate logic. So why can we use them interchangeably here? The answer is
that within the Riddle System the sentences express the same fact.
Lemma 1.
A1, A2, . . . , A7 `
∀ x, y : Persons•
(∀ v : Persons • FatherOf(x, v) ⇒ SonOf(v, y)) ⇔
(∃ v : Persons • FatherOf(x, v) ∧ SonOf(v, y))
By now the reader has probably guessed that the solution to the puzzle is that
x and y are the same person. Now we express this relationship formally and prove
it in the Riddle System.
∀ x, y : Persons•
(∀ z : Persons • ¬Siblings(x, z)) ∧
(∀ v : Persons • FatherOf(x, v) ⇒ SonOf(v, y)) ⇒
x=y
Proof Sketch: Intuitively the argument for why x and y must be the same person
is that they are children of the same father (since y is the son of x’s father) and
x’s father has only one child (since x has no brothers or sisters). We prove that
x = y by contradiction: we assume that x ≠ y. Then from the second line of the riddle we deduce that x and y have the same father. Moreover, since x ≠ y, they must be siblings. But now we have a contradiction: the first line says that x has no siblings. The contradiction arose from assuming x ≠ y; therefore, x and y
must be the same person.
The full proof of the theorem is presented in Figure 5.5.
Chapter Notes
[TBD]
Further Reading
[TBD]
5.10 Exercises
1. Which of the following sentences are well-formed sentences in predicate
logic?
2. Which occurrences of the variables x and y are free and which are bound in
each of the following?
S4. Fourth, the second left and the second on the right
Are twins once you taste them, though different at first sight.
(a) Translate the emphasized sentences of the riddle into predicate logic
using the translation key provided below:
6. Assume x does not occur free in Q. Show using predicate calculus that:
Can you explain this confusion using the formal notations covered in this
chapter?
8. (a) Formalize each of the following riddles in the Riddle System, intro-
ducing new primitive predicates if necessary.
i. “Brothers and sisters have I none,
but this man’s father is my father’s son.”
ii. “Brothers and sisters have I none,
but this man’s son is my son.”
iii. “Brothers and sisters have I none,
but this man’s son is my father’s son.”
iv. “Brothers and sisters have I none,
but this man’s father’s son is my son.”
v. “Brothers and sisters have I none,
but this man’s father’s son is my father’s son.”
(b) If new primitive predicates were introduced, augment the Riddle System with a set of axioms relating the newly introduced predicates to those of the original Riddle System.
(a) Characterize the relationship between the riddle teller and the uniden-
tified man.
(b) Prove that the relationship is a logical consequence of the formalized
riddle in the (augmented) Riddle System.
10. Extend the Riddle System so that one can derive the fact that no one is their
own grandfather, great-grandfather, or great-great-grandfather.
(a) An infusion line may become pinched causing the flow to be blocked.
This will be recognized by the pump as an occlusion and will cause
the pump to alarm.
i. The mitigation is to straighten the line and re-start the pump.
ii. A caregiver may silence the alarm during the procedure.
(b) The infusion line may become plugged. The pump will recognize an
occlusion and alarm.
i. The mitigation is to clear the infusion lines and re-start the pump.
ii. The caregiver may silence the alarm during the procedure.
(c) Electrical failure may occur causing the pump to switch to battery op-
eration.
i. The pump will switch over to battery power and notify the care-
giver visually.
ii. The switch may not occur if the battery is not properly charged.
Questions:
6.1 Sets
A set is simply a collection of objects. Examples include the set of prime num-
bers, the set of positive integers, the set of countries in Europe, the set of strings
of letters and numbers, and the set of possible vehicle license plate numbers in
Pennsylvania.
When working with sets, we will assume that there exists a predicate, element
of, that allows us to assert that an element is a member of a set. Notationally, we
write e ∈ S, which is true when e is an element of the set S. We will abbreviate ¬(e ∈ S) by e ∉ S.
An element can, of course, be a member of several sets. For example, the
number 3 is an element of both the sets of prime numbers and the positive integers.
Similarly, “ABC123” is both a possible license plate number in Pennsylvania and
a string of letters and numbers.
108 CHAPTER 6. STRUCTURES AND RELATIONS
Types One important feature of our approach is that we will require that sets be
homogeneous, in the sense that all their elements have the same “shape.” To make
this idea precise, we will associate a type with each element in a model, and insist
that a set contain elements of only one type. This approach to sets is called typed
set theory.
As with programming languages, the use of types has a number of engineer-
ing benefits. First, it permits us to make definitional distinctions between different
kinds of elements in the universe, thereby allowing us to represent important se-
mantic differences in the elements of systems that we are modeling. For example,
we might distinguish calendar dates from employee identification numbers, even
though both could in principle be represented as numbers or strings. Second, it
serves as a sanity check on expressions that we write, and allows tools to help
make sure that we are not writing down nonsense. For example, typing rules
would prohibit the application of a function to an element that has the incorrect
type. Third, it eliminates certain kinds of mathematical paradoxes that would oth-
erwise occur in a less constrained world.
There are many possible type systems that one might use. In this book we
adopt a simple scheme: the type of an object is the maximal set in which it is an
element.1 This approach has the virtue that every element e has a unique type —
the largest set S for which e ∈ S.
In addition to ensuring that sets are homogeneous, the use of types also allows
us to check that expressions are well-formed. For example, we can rule out ex-
pressions of the form a = b if a and b do not have the same type. Similarly, we
can rule out an expression of the form e ∈ S if the type of e is not the same as the
type of elements contained in S.
In the remainder of this chapter, as we introduce new ways to construct sets,
we will also explain how we assign types to their elements.
Basic Sets The starting point for constructing models is to define a set of prim-
itive, or basic, sets. A basic set is defined simply by providing a name for that
set.
Basic sets are primitive in the sense that we have no knowledge about the
internal structure of their elements. We will, however, assume that two different
basic sets are disjoint: that is, no two basic sets have common elements. We will
also assume that each basic set comes with an equality predicate that allows us to
1 Of course, for this definition to be sound we need to be sure that such a maximal set exists.
See the chapter notes for a brief discussion of this issue.
determine whether two elements in that set are the same object.
Syntactically, we declare a basic set by enclosing its name in square brackets.
For example,
[Persons]
[Plants]
declare Persons and Plants to be basic sets. By virtue of the fact that they are
different basic sets, we can assume that no person is also a plant, or the other way
around.
We may declare more than one basic set in the same place. So, an equivalent
declaration for the examples above would be
[Persons, Plants]
Given a basic set B and an element x ∈ B, the type of x is simply the set B.
The Integers We will also include one built-in set — the set of integers. Infor-
mally, this is the set containing
. . . , −2, −1, 0, 1, . . .
We use Z to denote this set.2 We also assume that we know the standard arith-
metic facts about the elements of Z, such as the facts about addition, subtraction,
less than, and so on. Equality between integers is the usual notion of numerical
equality.
Declaring Variables We can introduce a variable into a model with a declaration of the form
x:S
where x is the variable name that we are introducing and S is a set or any expres-
sion that represents a set. The type of x is the type of elements in S.
Note that ‘:’ and ‘∈’ represent different concepts. The former is used to de-
clare a variable, while the latter is a predicate, which will be true or false depend-
ing on whether the value of x is in S or not.
2 Formally, this is a basic set for which the properties of its elements have been axiomatized in
predicate logic.
The value of a declared variable may also be constrained by one or more predicates; such a declaration is called an axiomatic definition and is written:
x:S
P(x)
Here the variable x is introduced globally together with a predicate P that the
value of x must satisfy. More than one variable may be declared and more than
one constraint may be defined in the same definition block. For example,
Max : Z
Unlucky : Z
Max ≥ 100
Unlucky = 13
declares Max, whose value must be greater than or equal to 100, and Unlucky,
whose value is 13.
Multiple predicates in axiomatic definitions are conjoined together. For exam-
ple,
x:S
P1 (x)
P2 (x)
is equivalent to declaring x : S with the single constraint P1(x) ∧ P2(x).
Example 6.1. The following are examples of sets defined by enumeration. (As in
Chapters 4-5, we use the notation S == e to mean that S is by definition the same
as e: wherever we use S we could have used e.)
• SmallEvens == {2, 4, 6, 8}
The elements of SmallEvens are of type Z.
• Primary == {red, green, blue}
The elements of Primary are of type Colors (presumably declared elsewhere
as a basic set).
• Primes == {2, 3, 5, 7, 11, 13, . . .}
This is the set of prime numbers; its elements are of type Z. Note that here
we informally use “. . . ” to represent that the set also contains other elements
that fit the pattern observed in the elements that have been listed. We will
see later how we can define such patterns formally.
Example 6.2. The following is not a well-formed set.
• ColorNumbers == {2, 4, red, green, 10, 15, 1}
The set is not properly typed: we cannot mix color elements with number
elements in the same set.
One common form of set enumeration is a number range — the set of all
integers in some range. For these enumerations we use the syntactic abbreviation
“x..y” to denote the set that includes integers from x up to (and including) y. When
y < x the set x..y is empty.
Example 6.3.
• 1..3 represents the set {1, 2, 3}.
• -2..2 represents the set {−2, −1, 0, 1, 2}.
• 0..9 represents the set of digits {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
• 1..100 represents the set of the first 100 positive integers.
• 1..0 represents the empty set of integers.
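The range notation x..y corresponds to Python’s range with the upper bound made inclusive; a small sketch of the examples above:

```python
def closed_range(x, y):
    """The set x..y of integers from x up to and including y; empty when y < x."""
    return set(range(x, y + 1))

print(closed_range(1, 3))          # {1, 2, 3}
print(closed_range(-2, 2))         # {-2, -1, 0, 1, 2}
print(closed_range(1, 0))          # set(): 1..0 is empty
print(len(closed_range(1, 100)))   # 100
```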
The notion of defining a set by enumeration can be captured formally through
an axiom that tells us how to determine whether an object is a member of such
a set: an object is an element of a set defined by enumeration if and only if it is
equal to one of the elements of the set:
Set Membership: a ∈ {s1 , s2 , . . . , sn } ⇔ a = s1 ∨ a = s2 ∨ . . . ∨ a = sn
Set equality is determined by membership alone: two sets are equal if and only if they contain exactly the same elements. Formally, where U is the type of the elements of S and T:
Set Equality: S = T ⇔ (∀ x : U • x ∈ S ⇔ x ∈ T)
6.1.3 Subsets
A set S is said to be a subset of set T, written S ⊆ T, if and only if every element
of S is an element of T. Formally:
Subset: S ⊆ T ⇔ (∀ x : U • x ∈ S ⇒ x ∈ T)
3 Since ` P ⇔ Q, we also know that ` P ⇒ Q and ` Q ⇒ P. From ` P ⇒ Q and Modus Ponens, to derive Q we need a derivation of P; that is what the following rule says:
a. P
Q R1, a
Similarly, we can introduce a rule from ` Q ⇒ P.
A set may also be defined by comprehension: by stating a property that exactly characterizes its elements. Such a definition is written
S == {x : T | P(x)}
where T is the set from which the elements are drawn, and P(x) is the sentence in
predicate logic that captures the property that each element of S must possess.
For example, the natural numbers are defined as the set of non-negative inte-
gers:
N == {x : Z | x ≥ 0}
In the definition of set comprehension x is considered a bound variable, and,
as with predicate logic, can be renamed provided no variables occurring free in
P(x) are captured during renaming. So, for example, an equivalent definition of
the natural numbers would be
N == {y : Z | y ≥ 0}
The type of the elements in a set defined by comprehension is the type of the
elements in T. Note that the set T from which elements are drawn can be any set,
and not necessarily a type (i.e., a maximal set).
Example 6.6. Consider the set SmallNats == {x : N | x < 5}. Since the values
of x under consideration are drawn from N, the type of elements in SmallNats is
Z, the type of elements in N.
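Set comprehension maps directly onto Python’s set comprehensions. A sketch, restricting Z to a finite window so the sets stay finite (the window is an illustration device, not from the text):

```python
# Z restricted to a finite window so the comprehensions stay finite.
Z_window = set(range(-10, 11))

N_window = {x for x in Z_window if x >= 0}      # N == {x : Z | x >= 0}
SmallNats = {x for x in N_window if x < 5}      # SmallNats == {x : N | x < 5}

print(sorted(SmallNats))                        # [0, 1, 2, 3, 4]
# Renaming the bound variable changes nothing:
print(SmallNats == {y for y in N_window if y < 5})   # True
```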
6.2 Powerset
The powerset of a set T, denoted P T, is the set of all of its subsets.
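For a finite set the powerset can be computed directly. A sketch using itertools; frozenset is used so that subsets can themselves be elements of a set:

```python
from itertools import chain, combinations

def powerset(S):
    """P S for a finite set S, as a set of frozensets."""
    elems = list(S)
    subsets = chain.from_iterable(
        combinations(elems, r) for r in range(len(elems) + 1))
    return {frozenset(sub) for sub in subsets}

print(len(powerset({1, 2})))      # 4 subsets: {}, {1}, {2}, {1, 2}
print(len(powerset({1, 2, 3})))   # 8: a set of size n has 2**n subsets
```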
Finite Subsets Sometimes we would like to talk about the finite subsets of a set.
F S denotes the set of all finite subsets of S.
The powerset and finite-subset operators bind tighter than any of the other set operators (such as union, intersection, and product, which we will discuss shortly).
We can also give names to set expressions, optionally parameterized by a set, using definitions of the form
NewName == SetExpression
NewName[Set] == SetExpression
For example, suppose we wish to talk about the non-empty subsets of a variety
of sets. We could define this for each set as needed, but a more general way would
be to declare
NonEmptySets[S] == {ss : P S | ss ≠ ∅}
Then, for example, NonEmptySets[Z] would represent the set of non-empty sets of integers, and NonEmptySets[COLOR] would represent the set of non-empty sets of colors.
The union of two sets S and T, denoted S ∪ T, is the set containing exactly those elements that appear in S or in T (or in both), where the type of the elements of both S and T is U:
S ∪ T == {x : U | x ∈ S ∨ x ∈ T}
Distributed Union Let S be a set of sets whose elements are of type U. The
distributed union over S is defined as:
⋃ S == {x : U | ∃ s : S • x ∈ s}
That is, an element is in the distributed union over a set of sets if and only if it
appears in at least one of the member sets.
Example 6.12. Suppose PrimarySets and PrimarySetsSets are defined as in Ex-
ample 6.9.
• ⋃ PrimarySets = {green, blue, red}
• ⋃ PrimarySetsSets = {{green, blue}, {red, green, blue}, ∅}
• ⋃(P N) = N
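Distributed union over a finite set of sets can be sketched the same way (the member sets below are invented):

```python
def distributed_union(S):
    """Union of all member sets of S: {x | exists s in S . x in s}."""
    result = set()
    for s in S:
        result |= s
    return result

PrimarySets = {frozenset({"green", "blue"}), frozenset({"red"})}
print(distributed_union(PrimarySets))   # {'green', 'blue', 'red'}
print(distributed_union(set()))         # set(): union over no sets at all
```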
6.4.2 Intersection
The intersection of two sets S and T, denoted S ∩ T, is the set containing exactly
those elements that appear both in S and in T:
S ∩ T == {x : U | x ∈ S ∧ x ∈ T}
where the type of the elements of both S and T is U.
Example 6.13. Here are some intersection examples.
• For S == {4, 5, 6, 7} and T == {2, 3, 4, 5}, S ∩ T = {4, 5}
• For S == {red} and T == ∅, S ∩ T = ∅
• Evens ∩ Odds = ∅
• P Evens ∩ P Odds = {∅}
Set union and intersection satisfy many useful properties. Figure 6.1 lists some
of these. Note that the last two cardinality-related properties apply only to finite
sets.
S∩T = T ∩S ∩-Commutativity
S∪T = T ∪S ∪-Commutativity
S ∩ 0/ = 0/ ∩-Empty
S ∪ 0/ = S ∪-Empty
(S ∩ T) ∩ U = S ∩ (T ∩ U) ∩-Associativity
(S ∪ T) ∪ U = S ∪ (T ∪ U) ∪-Associativity
S ∩ (T ∪ U) = (S ∩ T) ∪ (S ∩ U) ∩∪-Distributivity
S ∪ (T ∩ U) = (S ∪ T) ∩ (S ∪ U) ∪∩-Distributivity
#(S ∩ T) ≤ #S ∧ #(S ∩ T) ≤ #T ∩-Cardinality
#S ≤ #(S ∪ T) ∧ #T ≤ #(S ∪ T) ∪-Cardinality
6.4.3 Difference
The difference of sets S and T, denoted S \ T, is the set containing exactly those
elements of S that do not appear in T:
S \ T == {x : U | x ∈ S ∧ x 6∈ T}
where the type of the elements of both S and T is U.
Example 6.15. Difference examples.
• For S == {4, 5, 6, 7} and T == {2, 3, 4, 5}, S \ T = {6, 7}
• For S == {red} and T == ∅, S \ T = {red}
• Evens \ Odds = Evens
• N \ Evens = Odds
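Union, intersection, and difference are Python’s built-in set operators, which makes the algebraic properties of Figure 6.1 easy to spot-check on sample sets (the sets are invented):

```python
S, T, U = {4, 5, 6, 7}, {2, 3, 4, 5}, {5, 9}

print(S & T)   # intersection: {4, 5}
print(S - T)   # difference:   {6, 7}

# Spot-check a few Figure 6.1 properties on these sets.
assert S & T == T & S                                  # commutativity
assert (S | T) | U == S | (T | U)                      # associativity
assert S & (T | U) == (S & T) | (S & U)                # distributivity
assert len(S & T) <= len(S) and len(S & T) <= len(T)   # cardinality
```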
The Cartesian product of two sets S and T, denoted S × T, is the set of ordered pairs (x, y) where x ∈ S and y ∈ T. If S is a set whose elements have type U, and T a set whose elements have type V, then the elements of S × T have type U × V.
elements of S × T have type U × V.
Generalizing, the Cartesian product of n sets S1 , S2 , . . . , Sn , denoted S1 × S2 ×
. . . × Sn , is a set of n-tuples (x1 , x2 , . . . , xn ) where x1 ∈ S1 , x2 ∈ S2 , . . . , xn ∈ Sn . If
the elements of set Si have type Ui (for 1 ≤ i ≤ n) then the type of the elements of
S1 × S2 × . . . × Sn is U1 × U2 × . . . × Un .
Example 6.17. Let S1 == {red, green}, S2 == {3}, and S3 == {Paul, Ron}. Then
S1 × S2 × S3 = {(red, 3, Paul), (red, 3, Ron), (green, 3, Paul), (green, 3, Ron)}.
Owners ⊆ (N × Persons)
Owners ∈ N ↔ Persons
Owners == {(1234, John S.), (3251, Peter M.), (5132, Mary P.)}
Owners == {1234 ↦ John S., 3251 ↦ Peter M., 5132 ↦ Mary P.}
(Note that when using the “map” notation we do not enclose the map expression in parentheses: that is, we write a ↦ b and not (a ↦ b).)
We assume that “↔” associates to the right, so that S ↔ T ↔ U is interpreted
as S ↔ (T ↔ U).
Domain and Range We identify two important sets in connection with a re-
lation R : S ↔ T. Its domain, denoted dom(R), is the set of values from S that
appear as the first element of some pair in R. Its range, denoted ran(R), is the set
of values from T that appear as the second element of some pair in R. Formally,
the domain and range of a relation are defined as follows:
dom(R) == {x : S | ∃ y : T • (x, y) ∈ R}
ran(R) == {y : T | ∃ x : S • (x, y) ∈ R}
Notice that although dom(R) ⊆ S, it may be the case that dom(R) is not the
same as S, since there can be elements of S that do not appear as the first element
of any pair in R. Similarly for the range and target.
Example 6.18. Consider the relation Div2 defined as follows:

Div2 == {(x, y) : N × N | x = 2 ∗ y}

Its elements are the pairs (2, 1), (4, 2), (6, 3), . . . . The elements are of type Z × Z.
N is both the source and the target of the relation. The domain of the relation is
Evens, and the range of the relation is N.
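The domain and range of a relation represented as a set of pairs follow the definitions directly. Here is a sketch using a finite fragment of Div2 as a stand-in:

```python
# A finite fragment of Div2 = {(x, y) | x = 2*y}.
Div2 = {(2 * y, y) for y in range(5)}   # {(0,0), (2,1), (4,2), (6,3), (8,4)}

def dom(rel):
    # dom(R) = set of first components
    return {x for x, _ in rel}

def ran(rel):
    # ran(R) = set of second components
    return {y for _, y in rel}

assert dom(Div2) == {0, 2, 4, 6, 8}     # the even numbers in this fragment
assert ran(Div2) == {0, 1, 2, 3, 4}
```

On the full (infinite) Div2 the domain is Evens and the range is all of N, exactly as stated above.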
Domain and Range Restriction There are a number of operators that allow us to “filter” elements of a binary relation. The domain restriction operator, denoted ◁, is defined as follows:

S1 ◁ R == {(x, y) : S × T | (x, y) ∈ R ∧ x ∈ S1}

where S1 has the same type as S. Informally, the restricted relation contains those elements from R whose first component appears in S1.
Similarly, we define the range restriction operator, denoted ▷, as:

R ▷ T1 == {(x, y) : S × T | (x, y) ∈ R ∧ y ∈ T1}

where T1 has the same type as T. Informally, the restricted relation contains those elements from R whose second component appears in T1.
Example 6.19. Let
R == {(2, red), (5, blue), (2, yellow), (3, red), (5, pink), (4, azure)}
Primary == {red, blue, green}
Evens == {x : N | ∃ y : N • x = 2 ∗ y}
Then
• Evens ◁ R = {(2, red), (2, yellow), (4, azure)}
• R ▷ Primary = {(2, red), (5, blue), (3, red)}
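Domain and range restriction translate to simple comprehensions. The following sketch uses the sets of Example 6.19:

```python
R = {(2, 'red'), (5, 'blue'), (2, 'yellow'), (3, 'red'), (5, 'pink'), (4, 'azure')}
Primary = {'red', 'blue', 'green'}
evens = {x for x in range(10) if x % 2 == 0}   # a finite stand-in for Evens

def dom_restrict(S1, rel):
    # S1 ◁ rel: keep pairs whose first component is in S1
    return {(x, y) for (x, y) in rel if x in S1}

def ran_restrict(rel, T1):
    # rel ▷ T1: keep pairs whose second component is in T1
    return {(x, y) for (x, y) in rel if y in T1}

assert dom_restrict(evens, R) == {(2, 'red'), (2, 'yellow'), (4, 'azure')}
assert ran_restrict(R, Primary) == {(2, 'red'), (5, 'blue'), (3, 'red')}
```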
• A Pythagorean triple consists of three positive integers that can be the lengths
of the sides of a right triangle. The set of all such triples can be defined as:
Pythagorean == {(a, b, c) : N × N × N | a² + b² = c²}
5DegreePolynomials ∈ P(Z × Z × Z × Z × Z × Z)
Sometimes it is convenient to consider an n-ary relation rn : P(S1 × S2 × . . . ×
Sn ) as a binary relation r2 : (S1 × S2 × . . . × Sn−1 ) ↔ Sn . For example, it may
be convenient to work with terms like (a, b, c) ↦ 5 (an element of r2 ) instead of
(a, b, c, 5) (an element of r4 ).
Whenever we do so, we assume we are working with a relation r2 equivalent to rn in the following sense:
∀ s1 : S1 ; s2 : S2 ; . . . ; sn : Sn •
((s1 , s2 , . . . , sn−1 ), sn ) ∈ r2 ⇔ (s1 , s2 , . . . , sn ) ∈ rn
6.6.3 Functions
A partial function from S to T is a binary relation f : S ↔ T such that f maps an element of S to at most one element of T:
∀ x : S; y1 , y2 : T • (x, y1 ) ∈ f ∧ (x, y2 ) ∈ f ⇒ y1 = y2
We use the notation f (x) = y when there is a y such that (x, y) ∈ f and then say
that f (x) is defined; otherwise we say that f (x) is undefined.
A total function from S to T is a partial function f from S to T such that f (x)
is defined for all x ∈ S. In other words, dom(f ) = S.
Although a total function is a special kind of partial function, it is customary to use the word function to mean a total function. We then say explicitly when we
are dealing with partial functions.
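Treating a relation as a set of pairs, the partial-function and total-function conditions can be tested directly. A sketch, with invented sample relations:

```python
def is_partial_function(rel):
    # No first component may map to two different second components.
    mapping = {}
    for x, y in rel:
        if x in mapping and mapping[x] != y:
            return False
        mapping[x] = y
    return True

def is_total_function(rel, S):
    # Total: a partial function whose domain is all of S.
    return is_partial_function(rel) and {x for x, _ in rel} == S

f = {(1, 'a'), (2, 'b')}
r = {(1, 'a'), (1, 'b')}        # relates 1 to two values, so not a function
assert is_partial_function(f)
assert not is_partial_function(r)
assert is_total_function(f, {1, 2})
assert not is_total_function(f, {1, 2, 3})   # f(3) is undefined
```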
[Figure: the hierarchy of relations (↔), partial functions, total functions, and the surjective and bijective functions.]
Since functions are also relations we can use relational composition to compose them, provided their types match up appropriately. In particular, if the range of one relation is compatible with the domain of another, we can form the composition R1 ; R2, which relates x to z exactly when R1 relates x to some intermediate value y and R2 relates that y to z:

R1 ; R2 == {(x, z) : S × U | ∃ y : T • (x, y) ∈ R1 ∧ (y, z) ∈ R2}

for R1 : S ↔ T and R2 : T ↔ U.

[Figure 6.3: Example of relational composition, composing a relation R1 (over {A, B, X, Y} and {0, 1, 2, 3}) with a relation R2 (over {0, 1, 2, 3} and {red, blue, green}).]

(Note: in some texts ran(R1) must be the same as dom(R2) for ; to be defined.)
Suppose, for example, a relation from persons to their driver’s license numbers, which we denote by Licenses. Composing Owners with Licenses would give us the relation that maps vehicle numbers to the driver’s license numbers of their owners.
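Relational composition can be sketched the same way. The Owners pairs come from the earlier example; the Licenses values here are invented stand-ins:

```python
def compose(r1, r2):
    # (x, z) ∈ r1 ; r2  iff  there is a y with (x, y) ∈ r1 and (y, z) ∈ r2
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

Owners = {(1234, 'John S.'), (3251, 'Peter M.'), (5132, 'Mary P.')}
Licenses = {('John S.', 'L-77'), ('Mary P.', 'L-42')}   # invented license numbers

# Vehicle numbers related to the license numbers of their owners.
# Peter M. drops out because Licenses relates him to nothing.
assert compose(Owners, Licenses) == {(1234, 'L-77'), (5132, 'L-42')}
```

Note that the composition silently discards pairs whose intermediate value has no match, which is exactly the relational (rather than total-function) reading of ;.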
Example 6.24. Pipes and Filters. Relational composition can be thought of as modeling a “pipe-and-filter” style of computation: a filter represents a computing unit that transforms its inputs according to a relation. The outputs from the first filter are piped into a second filter, which also transforms the data according to another relation, and so on. For n filters, the initial inputs are then related to the final outputs according to R1 ; R2 ; . . . ; Rn.

Overriding Frequently we will want to change the value of a function for one or more values. The overriding operator, ⊕, does this. For example, for

f == {(1, red), (2, blue), (3, green)}
g == {(1, pink), (4, mauve)}

we have

f ⊕ g = {(1, pink), (2, blue), (3, green), (4, mauve)}

Note that overriding replaces only the values of f at domain elements of g.

6.6.5 Defining Relations and Functions Axiomatically

As we have illustrated thus far, we can define a relation or a function in a variety of ways: using enumeration, set comprehension, and so on. We can also define a function using an axiomatic declaration introduced in Section 6.1. Consider the square function, which we might informally define as follows:

square(x) = x²

Defining this axiomatically we would have:

square : N → N
∀ x : N • square(x) = x²
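The overriding operator ⊕ behaves much like a dictionary update. A sketch over relations-as-pair-sets:

```python
def override(f, g):
    # f ⊕ g == keep the pairs of f whose domain element is NOT in dom(g),
    # then add all of g.
    dom_g = {x for x, _ in g}
    return {(x, y) for (x, y) in f if x not in dom_g} | g

f = {(1, 'red'), (2, 'blue'), (3, 'green')}
g = {(1, 'pink'), (4, 'mauve')}
assert override(f, g) == {(1, 'pink'), (2, 'blue'), (3, 'green'), (4, 'mauve')}
```

If f and g are stored as Python dicts, `{**f, **g}` computes the same thing, which is why ⊕ feels so natural to programmers.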
root : Z ↔ Z
∀ x, y : Z • (x, y) ∈ root ⇔ (x² = y)
When defining functions and relations in this way we can also indicate that
they are to be treated as an infix operator. For example, when defining the superset
relation ⊃ over sets we can declare that it is an infix operator by using the notation
⊃ as follows:
⊃ : P X ↔ P X
∀ S, T : P X • S ⊃ T ⇔ T ⊂ S
6.7 Records
Another useful structure when creating models is one that allows us to keep track
of mixed types of information (as we do with tuples), but also allows us to identify
each part using a label, rather than its position in some ordering. For example,
we might like to model information associated with an employee, such as social
security number, salary, and date of employment. But rather than listing these in
some fixed order we will refer to each part of the data record using an appropriate
label.
This can be achieved with records, a construct familiar to programmers. Records
are similar to tuples, but their components have names. We refer to the component
names as fields. For example, a sample employee record might be

⟨| ssn == 123456789, salary == 55000, startDate == 5/16/02 |⟩

Because fields are identified by name rather than by position, the record

⟨| salary == 55000, startDate == 5/16/02, ssn == 123456789 |⟩

is the same as the previous record. That is to say, two records are equal if the
values of their fields are the same.
The value of an individual field in a record can be accessed using the familiar
“dot” notation. For example, if x refers to the record defined above, then x.ssn =
123456789, x.salary = 55000, and x.startDate = 5/16/02.
6.8.1 Trees
Consider binary trees. A recursive definition for simple binary trees would be
TREE ::= leaf | node⟨⟨TREE × TREE⟩⟩
This definition introduces a new type, TREE, for which any element of the type is
either a leaf or it is made up of two subtrees glued together with a node. We refer
to leaf and node as tree constructors.
To write down a particular element of a recursive type we treat the non-
parameterized constructors of the recursive definition as constants, and the param-
eterized constructors as functions. For example, here are some TREE instances:
leaf
node(leaf , leaf )
node(node(leaf , leaf ), leaf )
node(leaf , node(node(leaf , leaf ), leaf ))
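A recursive type like TREE can be sketched as a small Python datatype, with the class names chosen to mirror the constructors:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Leaf:
    pass

@dataclass(frozen=True)
class Node:
    left: object    # a TREE
    right: object   # a TREE

# node(leaf, node(node(leaf, leaf), leaf))
t = Node(Leaf(), Node(Node(Leaf(), Leaf()), Leaf()))

assert t.left == Leaf()
assert isinstance(t.right.left, Node)
```

The frozen dataclasses give structural equality, so two trees built the same way compare equal, just as two identical constructor terms denote the same TREE value.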
A recursive structure can have any number of constructors. For example, consider a kind of tree that can have both binary as well as ternary nodes:

MIXTREE ::= mixleaf | binnode⟨⟨MIXTREE × MIXTREE⟩⟩ | ternnode⟨⟨MIXTREE × MIXTREE × MIXTREE⟩⟩

One instance of this type is

ternnode(binnode(mixleaf , mixleaf ), ternnode(mixleaf , mixleaf , mixleaf ), mixleaf )
Enumerations, such as these, guarantee that every value of the type must be
exactly one of the distinct choices provided by the enumeration.
and 5 can be described as push(5, push(3, new)). How the elements
are stored internally is irrelevant to the user of the stack.
Beyond simple data structures, the general principle of characterizing an en-
tity in terms of an interface specification is ubiquitous throughout software en-
gineering. It represents one of the key ideas behind object-oriented program-
ming, component-based systems, peer-to-peer computing, service-oriented archi-
tectures, and many other software engineering paradigms.
When encountering some phenomena that you would like to model, the deci-
sion of whether to use a recursive structure will often be dictated by the nature of
the entities involved and the kind of reasoning that you would like to do. In some
cases, such as trees, a recursive definition will be the most natural. In others, di-
rectly representing it in terms of other modeling structures (e.g., sets, relations,
etc.) will be preferable. In some cases, as we will see shortly, both ways are
commonly used – each offering certain advantages.
6.9 Sequences
Sequences are used to model ordered collections of objects. They are typically
used to model queues, temporal orderings (such as the history of states of system),
and indexed lists of elements. Unlike tuples and records, which have fixed length,
the length of a sequence is variable. For example, one may append new elements
to a sequence or concatenate two sequences together.
Instances of a sequence are denoted using angle brackets, for example, ⟨4, 2, 8⟩.
The empty sequence is the sequence with no elements, and is denoted by ⟨⟩.
In contrast to tuple types, which allow for elements of a tuple to have different types, the elements of a sequence must be of the same type. For example, ⟨{a}, ∅, {b, c}, {a}⟩ is a valid sequence – its elements are drawn from P{a, b, c} – whereas ⟨a, {b}⟩ is not a valid sequence. An element can occur more than once in a sequence; for example sequence ⟨b, a, a, c⟩ is different from sequence ⟨b, a, c⟩.
In the remainder of this section we describe two ways of modeling sequences:
as relations and defined recursively. In addition to providing a useful modeling
abstraction in its own right, this will help to illustrate how the various modeling
concepts introduced in this chapter can be applied to create new kinds of modeling
structures.
That is, a sequence is either empty or is formed by adding a natural number to the
front of another sequence.
To avoid clutter we will pretty-print cons as an infix operator ‘::’. For example, cons(n, s) will be written (n :: s) for a number n and sequence s. Moreover, we let :: associate to the right, so 3 :: 4 :: 5 :: ⟨⟩ means 3 :: (4 :: (5 :: ⟨⟩)).
Example 6.25. These sequences are well-formed:
• ⟨⟩
• 1 :: (2 :: ⟨⟩)
• 7 :: 10 :: ⟨⟩
• 3 :: (6 :: (1 :: (2 :: ⟨⟩)))
Example 6.26. These sequences are badly formed:
• ⟨3⟩ :: ⟨⟩
• (⟨⟩ :: 1) :: 4
In order to define operations over such sequences, we need to describe the
effect of the operator for each construct. How we do that, and also how we reason
about structures defined recursively will be detailed in Chapter 7.
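The recursive view of sequences corresponds to the classic cons-list. A sketch in which nested tuples play the role of cons cells and () plays ⟨⟩:

```python
EMPTY = ()                      # plays the role of ⟨⟩

def cons(n, s):
    # n :: s — put n on the front of sequence s
    return (n, s)

def to_list(s):
    # Flatten a cons-list into a Python list for easy inspection.
    out = []
    while s != EMPTY:
        n, s = s
        out.append(n)
    return out

s = cons(3, cons(4, cons(5, EMPTY)))    # 3 :: 4 :: 5 :: ⟨⟩
assert to_list(s) == [3, 4, 5]
```

Operations such as length, reverse, and concatenation are then defined by one clause for EMPTY and one for cons, which is exactly the shape of definition (and of inductive proof) developed in Chapter 7.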
the amount of effort that we want to put into defining a model, the nature of the
tools that we have at our disposal for automating the specification and analysis
process, expectations about ways that the model may need to be changed in the
future, and the existence of other models and theories that we may wish to build
on.
However, despite these differences, there are a number of steps that are typi-
cally followed to design a specification of a model.
The first step is to decide what kinds of entities are going to be included in the
model. This can often be done without yet knowing how those entities are going
to be modeled. For example, if the system is intended to support secure access to
documents, we are likely to have entities such as documents, people, passwords,
access rights, audit trails, etc.
The second step is to define what kinds of entities are to be primitive. These
will typically be modeled as given sets. For example, we might decide that pass-
words are primitive entities, but documents will require more structured represen-
tations.
Next we need to define the more complex elements using given sets and the
type constructors that we have discussed in this chapter and the relationships
among those elements. This is usually the hard part. In the end, it typically in-
volves thinking through the design of the specification in terms of the constraints
over the possible states of the model, the kinds of properties that are important, and the kinds of reasoning that you would like to perform. Typically this is
an iterative process: early models may turn out to be too complex or not detailed
enough. Or, in the process of specification we may discover new properties and
constraints that are relevant.
In the process of creating our model, it will also be important to document
it, by adding in prose that explains the terminology and the rationale behind the
choice of model elements and properties that are included.
[Future versions will include an example that illustrates the process and tech-
niques.]
Chapter Notes
[TBD]
Further Reading
[TBD]
6.11 Exercises
1. Define the following sets by enumeration. What is the type of the elements
of each set?
(a) {0, 1}
(b) {5}
(c) {∅}
(d) {{∅}}
(a) S1 = {(a, b), (b, c), (c, d)} and T1 = {(b, a), (c, b)}.
(b) S2 = {a : N | a ≤ 5} and T2 = {a : N | 4 ≤ a}.
(c) S3 = {a : Z | ∃ b : Z • a = 15b} and T3 = {a : Z | ∃ b : Z • a = 10b}.
(d) S4 = {(a, b) : N × N | a ∗ b ≤ a2 } and T4 = {(a, b) : N × N | a ∗ b ≤ b2 }.
(a) S1 = {5, 6, 7}, S2 = {6, 7, 8, 9}, S3 = {7, 8, 9, 10, 11}, and S4 = {8, 9, 10, 11, 12, 13}.
(b) Si = {(a, b) : N × N | a ∗ b ≤ i}, for 1 ≤ i ≤ 5.
(c) Si = {a : N | ∃ b : N • i = a ∗ b}, for 1 ≤ i ≤ 5.
(d) Ti = P Si , for 1 ≤ i ≤ 3, where Si is as defined in 6c.
(e) Si = {i} for i ∈ N.
7. (a) Show (A ⊆ B) ∧ (B ⊆ A) ⇔ A = B.
(b) Show (A ⊂ B) ⇒ B ≠ ∅.
(c) Show A \ B ⊆ A.
(d) Show that set difference is not commutative.
13. Can a finite function have an infinite range? Briefly explain why or why
not.
(a) S1 ◁ R
(b) (S1 ∪ S2 ) ◁ R
(c) R ▷ T
(d) S2 ◁ (R ▷ T)
S1 ⩤ R == {(x, y) : S × T | (x, y) ∈ R ∧ x ∉ S1}
R ⩥ T1 == {(x, y) : S × T | (x, y) ∈ R ∧ y ∉ T1}

f ⊕ g == ((dom g) ⩤ f ) ∪ g
The new function has the same results as f for domain elements of f not
appearing in the domain of g, and the same results as g for domain elements
of g.
For example, for
(a) An operator “singletons” that takes a set and returns all its subsets with
one element.
(b) An operator “infinite subsets” that takes a set and returns its infinite
subsets.
20. Using axiomatic definitions (as described in Section 6.6.5) and the defini-
tion of sequences of natural numbers from Section 6.9.1 specify the follow-
ing.
(a) The relation “identic” between two sequences: s and t are identic if
their elements correspond.
(b) The relation “doubled” between two sequences: s and t are related if
each element of t is twice as large as the corresponding element of s.
(c) The relation “mapped” between two sequences and a function f (that
takes a natural number and returns a natural number): s, t, and f are
related if each element of t is obtained by applying f to the correspond-
ing element of s.
(d) The function “identity” that takes a sequence and returns an identic
sequence (in the sense of 20a).
(e) The function “double” that takes a sequence and returns its double (in
the sense of 20b).
(f) The function “map” that takes a sequence, and a function f and returns
the mapped sequence (in the sense of 20c).
(a) An operator “turn into function” that takes a relation like r above and
turns it into a function like f .
(b) An operator “turn into relation” that takes a function like f above and
turns it into a relation like r.
• reflexive: ∀ a : S • a ≤ a,
• antisymmetric: ∀ a, b : S • a ≤ b ∧ b ≤ a ⇒ (a = b), and
• transitive: ∀ a, b, c : S • a ≤ b ∧ b ≤ c ⇒ a ≤ c.
∀ s : S; s1 , s2 : seq[X] • (s = s1 ⌢ s2 ) ⇒ (s1 ∈ S)
25. We defined an object’s type to be the maximal set in which the object is an
element. That is, any other set that the object is an element of will have
fewer elements than the maximal set. Argue (informally) that this implies
that the type of an object is unique — that any two maximal sets must be
the same set. (Hint: assume that two different maximal sets exist and show
that this leads to a contradiction.)
R == {A | A ∉ A}
Reasoning Techniques
understanding was that by rationalizing individual steps you were ensuring that
once a simple enough expression was obtained, that expression would be the same
as the original.
This reasoning approach is formally based on a subset of predicate logic called
equational logic. In the rest of this section we describe equational logic and the
formal justification for equational proofs.
Semantics
In Chapter 5 we axiomatized three important properties of equality: reflexivity,
symmetry, and transitivity. We also introduced an inference rule (“eq-sub”) for
replacing parts of a sentence with equal parts while preserving its truth value.
Now we introduce an inference rule that enables replacing part of an expres-
sion with an equal part without changing the value of the expression. This rule is
known as substitution of equals for equals:
a = b
──────────────────────── =-sub
e[x := a] = e[x := b]
That is, if a = b, then replacing a by b in an expression e does not change e’s value. This rule justifies equational derivations of the form

e0
= [explanation of why e0 = e1 ]
e1
= [explanation of why e1 = e2 ]
e2
⋮
= [explanation of why en−2 = en−1 ]
en−1
= [explanation of why en−1 = en ]
en

where each step rewrites ek into ek+1 by substituting equals for equals, and from which we may conclude e0 = en.
(a + b)²
= [definition of exponentiation m² = m × m]
(a + b) × (a + b)
= [right distributivity of × over +]
a × (a + b) + b × (a + b)
= [left distributivity of × over +]
(a × a + a × b) + (b × a + b × b)
= [definition of exponentiation m × m = m²]
(a² + a × b) + (b × a + b²)
= [associativity of +]
a² + (a × b + b × a) + b²
= [commutativity of ×]
a² + (a × b + a × b) + b²
= [m + m = 2 × m]
a² + 2 × a × b + b²
P0
⇔ [explanation of why P0 ⇔ P1 ]
P1
⋮
⇔ [· · · ]
Pn−1
⇔ [explanation of why Pn−1 ⇔ Pn ]
Pn
7.2.1 ⇔ Substitution
We give two rules that directly support equational-style reasoning about ⇔. The
first is an alternative formulation of rule “eq-sub” from Chapter 5, and the second
allows replacing sub-sentences with equivalent sentences.
m = n
──────────────────────── eq-sub
S[x := m] ⇔ S[x := n]
1 Compare the use of ⇒ to the use of ≤ when concluding a0 ≤ an from a0 ≤ a1 ≤ a2 ≤ . . . ≤
an−1 ≤ an .
P ⇔ Q
──────────────────────── ⇔-sub
S[x := P] ⇔ S[x := Q]

It is obvious that the above rules can also be used in derivations involving ⇒.
For example, we could write:

P ⇔ Q
──────────────────────── ⇒-sub
S[x := P] ⇒ S[x := Q]
7.2.2 Monotonicity
Transformations that involve ⇒ sometimes require application of so-called mono-
tonicity rules. Figure 7.1 lists some useful monotonicity rules.
Monotonicity rules allow restricting the argument of why certain sentences are
related by logical implication to an argument involving parts of such sentences.
For example, ∨-Mono is read as: to show a proof for P ∨ R ⇒ Q ∨ R it is sufficient
to show a proof for P ⇒ Q.2 This is incredibly useful since a proof for P ⇒ Q will
more often than not be much simpler than a proof for P ∨ R ⇒ Q ∨ R.
Example 7.4. A direct application of ∨-Monotonicity is that it is sufficient to prove
P ∧ Q ⇒ P in order to prove (P ∧ Q) ∨ R ⇒ P ∨ R.
Rules such as ∀-Body Mono, which involve quantifiers, can be used to sim-
plify sentences by moving quantifiers outwards, and reducing the number of quan-
tifiers. By applying ∀-Body Mono, instead of having to show a proof involving
two ∀ quantifiers we are allowed to produce an argument involving only one.
⊢ (P ⇒ Q) ⇒ (P ∨ R ⇒ Q ∨ R)                          ∨-Mono
⊢ (P ⇒ Q) ⇒ (P ∧ R ⇒ Q ∧ R)                          ∧-Mono
⊢ (P ⇒ Q) ⇒ ((R ⇒ P) ⇒ (R ⇒ Q))                      Conseq. Mono
⊢ (P ⇒ Q) ⇒ (¬Q ⇒ ¬P)                                ¬-Antimono
⊢ (P ⇒ Q) ⇒ ((Q ⇒ R) ⇒ (P ⇒ R))                      Ante. Antimono
⊢ (P ⇒ P′) ∧ (Q ⇒ Q′) ⇒ (P ∧ Q ⇒ P′ ∧ Q′)
⊢ (∀ x : T • P(x) ⇒ Q(x)) ⇒ ((∀ x : T • P(x)) ⇒ (∀ x : T • Q(x)))   ∀-Body Mono
⊢ (∃ x : T • P(x) ⇒ Q(x)) ⇒ ((∃ x : T • P(x)) ⇒ (∃ x : T • Q(x)))   ∃-Body Mono

Figure 7.1: Monotonicity rules.
first formalize the logical constants true and false, and then show how they can be
used in equational-style derivations.
We axiomatize true as a tautology, and false as a contradiction:
Truth: ⊢ true
Falsity: ⊢ false ⇔ ¬true
Given this axiomatization we can prove that true and false satisfy some inter-
esting properties, which we list in Figure 7.2.
(P ∧ Q) ∨ R ⇒ P ∨ R
⇐ [∨-Mono]
P ∧ Q ⇒ P
⇐ [P ⇔ P ∧ true, Figure 7.2]
P ∧ Q ⇒ P ∧ true
⇐ [∧-Mono]
Q ⇒ true
⇐ [(P ⇒ true) ⇔ true, Figure 7.2]
true
Assume P, Q
Show P ⇔ Q
P
⇔ [Assumption P]
true
⇔ [Assumption Q]
Q
The formal justification of this proof technique relies on the Deduction Theo-
rem for predicate logic (see Chapter 5), which says that if we have P1 , P2 , · · · , Pn `
Q then we also have ` P1 ∧ P2 ∧ · · · ∧ Pn ⇒ Q.
Case x ≤ 0 and y ≤ 0
−x × y
⋮
= [· · · ]
x × −y
To prove R we find the cases P and Q such that P ∨ Q holds. We then show that
in each case R follows, that is, P ⇒ R, and Q ⇒ R.
Often it will be obvious that P ∨ Q holds, and in those cases we omit the formal
proof for P ∨ Q. One such example is when Q is instantiated with ¬P. This gives rise to the strategy known as “simple case analysis,” which can formally be expressed as:
⊢ (P ⇒ R) ∧ (¬P ⇒ R) ⇔ R    (7.2)
The rule generalizes trivially to more than two cases. For example, for three
cases the justification would be:
7.5 Induction
We often find ourselves in the situation of wanting to prove something of the form:
∀ x : S • P(x)
That is, we want to prove that all elements of some set S have the property P. We
could try using universal introduction, or do a proof by contradiction. Unfortu-
nately, these techniques are sometimes insufficient. For example, trying to prove
2ᵐ × 2ⁿ = 2ᵐ⁺ⁿ (where m, n are natural numbers) for arbitrary m, n does not get
• natural numbers: starting with the element 0, each element is the successor
of some other element in the set.
• sequences: starting with the empty sequence, ⟨⟩, each element is built up
by appending an element to an existing list.
• parse trees: starting with the “terminal” nodes of a grammar, each parse
tree is a structure defined by one of the “non-terminal” productions of the
grammar.
In the first of these examples, the proof technique is typically referred to as “nat-
ural induction.” In the others, it is typically referred to as “structural induction.”
The technique of proof by induction resembles a proof by case analysis in the
following sense. Each structural rule used to define a set S describes a (proper)
subset of S, and the union of the resulting subsets corresponds with S. Therefore,
it is sufficient to argue that each of the constituent subsets of S has a property P
in order to prove that S itself has that property. For example, for natural numbers
these subsets would be the set with sole element 0 and the set of elements that can
be expressed as the successor of some natural number. Therefore, it is sufficient
to derive a proof for P(0), and a proof for P(m + 1) where m is an arbitrary natural
number (and m + 1 is therefore described by the rule “is a successor of some
element of N”).
Induction is much more powerful than case analysis, however. We need to
show P(0). And we only need to provide a derivation for P(m + 1) under the
assumption that we have a derivation for P(m).4 That is, it suffices to show that
a proof for P(m + 1) can be constructed from a proof for P(m). Informally, the
reason this works is that if m = 0 then we have a proof for P(m) since we have
a proof for P(0). Since we can construct a proof for P(m + 1) from a proof for
P(m) that in turn means that we have a proof for P(1). The argument continues
that having a proof for P(1) and a way of constructing a proof for P(2) from that
of P(1) gives us a proof for P(2) and so on.
4 Variations allow for assumptions that P is true for all numbers up to m; see Exercise 4.
We will refer to a derivation for P(0) as the base case, and a derivation for ∀ k :
N • P(k) ⇒ P(k + 1) as the inductive case. The inductive case will be proved by an
implicit universal quantifier introduction: we assume that k is an arbitrary natural
number, and try to derive P(k) ⇒ P(k + 1). We will then derive P(k + 1) under
the assumption that P(k) holds – this assumption will be known as the induction
hypothesis. This step of the proof will often require transforming the predicate
P(k + 1) into a predicate containing P(k).5
Example 7.8. As an example we prove that 2⁰ + 2¹ + 2² + ... + 2ⁿ = 2ⁿ⁺¹ − 1; we have therefore let P(n) be the property 2⁰ + 2¹ + 2² + ... + 2ⁿ = 2ⁿ⁺¹ − 1.
Base case: We show P(0), that is 2⁰ = 2⁰⁺¹ − 1; this follows trivially from arithmetic facts.
Inductive step: We assume k ∈ N and P(k) holds. That is, our induction hypothesis is 2⁰ + 2¹ + 2² + ... + 2ᵏ = 2ᵏ⁺¹ − 1. We then show P(k + 1) holds, that is 2⁰ + 2¹ + 2² + ... + 2ᵏ + 2ᵏ⁺¹ = 2⁽ᵏ⁺¹⁾⁺¹ − 1.

2⁰ + 2¹ + 2² + ... + 2ᵏ + 2ᵏ⁺¹
= [substitution, Induction Hypothesis]
(2ᵏ⁺¹ − 1) + 2ᵏ⁺¹
= [arithmetic]
2 × 2ᵏ⁺¹ − 1
= [arithmetic]
2ᵏ⁺² − 1
= [arithmetic]
2⁽ᵏ⁺¹⁾⁺¹ − 1
5 In fact, if such a reduction is not necessary for the proof then the proof can be carried out with non-inductive strategies.
As with most other proof techniques, some derivations will require several applications of the induction technique. For example, proving 2ᵐ × 2ⁿ = 2ᵐ⁺ⁿ requires induction on both m and n.
It says that a tree is either a leaf or it is made up of two subtrees glued together
with a node. Some examples of the kind of structures that we can build up using
this definition:
leaf
node(leaf , leaf )
node(node(leaf , leaf ), leaf )
node(leaf , node(node(leaf , leaf ), leaf ))
As with natural induction there are two cases: a base case for leaves, and an in-
ductive case for composite trees. Since a composite tree includes two subtrees, the
induction hypothesis will include two assumptions, one for each of the subtrees.
Definition of Size
There are many ways that we might define the size of a tree. Some definitions
would count just the leaves, others just the nodes. Here we will count both.
Informally, we could say that the size of a single leaf is 1, while the size of
a tree built out of two subtrees, say t1 and t2 , is the sum of the sizes of the two
subtrees plus 1 (for the joining node).
The basic idea of this definition is that we define the size of a tree inductively
over the structure, saying how the size of a given tree is calculated from the sizes of its
parts. We define the function axiomatically, by first declaring its type (in this case
size : TREE → N), and then by saying how it is defined in each of the two cases.
size : TREE → N
∀ t1 , t2 : TREE •
size(leaf ) = 1 ∧
size(node(t1 , t2 )) = 1 + size(t1 ) + size(t2 )
In a similar way, we might make other definitions about trees. Here are two
useful ones:
leaves : TREE → N
nodes : TREE → N
∀ t1 , t2 : TREE •
leaves(leaf ) = 1 ∧
leaves(node(t1 , t2 )) = leaves(t1 ) + leaves(t2 ) ∧
nodes(leaf ) = 0 ∧
nodes(node(t1 , t2 )) = 1 + nodes(t1 ) + nodes(t2 )
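The three definitions translate directly into recursive functions, and the property size = leaves + nodes can be spot-checked before it is proved. A sketch using a tuple encoding: the string 'leaf' stands for a leaf, a pair for a node:

```python
LEAF = 'leaf'

def node(t1, t2):
    return (t1, t2)

def size(t):
    # size(leaf) = 1; size(node(t1, t2)) = 1 + size(t1) + size(t2)
    return 1 if t == LEAF else 1 + size(t[0]) + size(t[1])

def leaves(t):
    # leaves(leaf) = 1; leaves(node(t1, t2)) = leaves(t1) + leaves(t2)
    return 1 if t == LEAF else leaves(t[0]) + leaves(t[1])

def nodes(t):
    # nodes(leaf) = 0; nodes(node(t1, t2)) = 1 + nodes(t1) + nodes(t2)
    return 0 if t == LEAF else 1 + nodes(t[0]) + nodes(t[1])

t = node(LEAF, node(node(LEAF, LEAF), LEAF))
assert size(t) == leaves(t) + nodes(t)               # the theorem, on one instance
assert (size(t), leaves(t), nodes(t)) == (7, 4, 3)
```

The structural-induction proof below establishes the same equation for every tree, not just the instances we happen to test.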
We now prove by structural induction that, for all trees t, size(t) = leaves(t) + nodes(t).
Proof:
Base Case: Show the property holds for leaf , that is, size(leaf ) = leaves(leaf ) +
nodes(leaf ).
size(leaf )
= [definition of size]
1
= [arithmetic]
1+0
= [definition leaves]
leaves(leaf ) + 0
= [definition nodes]
leaves(leaf ) + nodes(leaf )
Induction Case: Assume that the property holds for trees t1 and t2 , that is,
size(t1 ) = leaves(t1 ) + nodes(t1 ), and size(t2 ) = leaves(t2 ) + nodes(t2 ). Show that
it holds for node(t1 , t2 ).
size(node(t1 , t2 ))
= [definition of size]
1 + size(t1 ) + size(t2 )
= [induction hypothesis]
1 + (leaves(t1 ) + nodes(t1 )) + (leaves(t2 ) + nodes(t2 ))
= [commutative and associative properties of +]
(leaves(t1 ) + leaves(t2 )) + (1 + nodes(t1 ) + nodes(t2 ))
= [definition of leaves]
leaves(node(t1 , t2 )) + (1 + nodes(t1 ) + nodes(t2 ))
= [definition of nodes]
leaves(node(t1 , t2 )) + nodes(node(t1 , t2 ))
Chapter Notes
[TBD]
Further Reading
[TBD]
Exercises
1. Prove in equational style the following laws for set union:
(a) S ∪ T = T ∪ S
(b) S ∪ ∅ = S
3. Natural Induction
Prove the following claims by induction over the natural numbers:
T :N→N
Fib : N → N
∀n : N •
T(0) = 1 ∧
T(1) = 1 ∧
2 ≤ n ⇒ T(n) = T(n − 1) + T(n − 2) + 1 ∧
Fib(0) = 0 ∧
Fib(1) = 1 ∧
2 ≤ n ⇒ Fib(n) = Fib(n − 1) + Fib(n − 2)
(c) Define a function rev that reverses a sequence; for example, applied to sequence (5 :: (3 :: (2 :: ⟨⟩))) the function will return (2 :: (3 :: (5 :: ⟨⟩))).
(d) Prove that rev is its own inverse, that is, rev(rev(s)) = s.
(e) Define an infix function ⌢ that concatenates two sequences; for example, for two sequences s = (4 :: (2 :: ⟨⟩)) and t = (6 :: (3 :: (5 :: ⟨⟩))), s ⌢ t will return sequence (4 :: (2 :: (6 :: (3 :: (5 :: ⟨⟩))))).
(f) Prove that ⟨⟩ is a unit of ⌢, that is, ⟨⟩ ⌢ s = s and s ⌢ ⟨⟩ = s.
(g) Prove that ⌢ is associative, that is, (s ⌢ t) ⌢ r = s ⌢ (t ⌢ r).
(h) Prove that n :: s = (n :: ⟨⟩) ⌢ s.
(i) Prove that rev(s ⌢ t) = rev(t) ⌢ rev(s).
(j) Prove that rev(s ⌢ rev(t)) = t ⌢ rev(s).
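Before proving these properties, they can be prototyped on the tuple cons-lists of Section 6.9. A sketch, in which rev uses an accumulator:

```python
EMPTY = ()                      # ⟨⟩

def cons(n, s):
    return (n, s)

def concat(s, t):
    # s ⌢ t: walk down s, then continue with t
    if s == EMPTY:
        return t
    n, rest = s
    return cons(n, concat(rest, t))

def rev(s, acc=EMPTY):
    # Move elements of s onto acc, reversing the order.
    if s == EMPTY:
        return acc
    n, rest = s
    return rev(rest, cons(n, acc))

s = cons(4, cons(2, EMPTY))
t = cons(6, cons(3, cons(5, EMPTY)))
assert rev(rev(s)) == s                          # rev is its own inverse
assert concat(EMPTY, s) == s and concat(s, EMPTY) == s
assert rev(concat(s, t)) == concat(rev(t), rev(s))
```

Each assertion tests a property on one pair of sequences; the corresponding exercises ask for proofs, by induction on the structure of the sequences, that they hold for all sequences.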
Part II

State Machines
In this chapter we cover some basic concepts for defining simple state machines.
These basic concepts underlie many of the different kinds of state machine models
used in computer science and software engineering. In the next chapter we look
at some of those variations. In this chapter we will also see our use of concepts
from Part I: in Section 8.5 we see how to use set notation and predicate logic to
describe infinite state machines in a succinct and precise manner.
One purpose of this book is to describe some of these ways of managing com-
plexity. There are three themes we will visit and revisit: notation, abstraction, and
modularization. With the appropriate notation, and using abstraction and modularization (composition and decomposition) techniques, we can model and reason about complex software systems. But remember, as with any mathematical model, we
discuss only those things that that model models. If a state machine does not let
us model the cost of the system, then we cannot reason about how expensive or
cheap it will be to build.
This Car has a very short lifetime. It starts out in the initial state where it is
off. When we perform the action of turning the key to start the car, it moves into
the idle state. After we apply some gas, it moves into the accelerating state. If we
apply the brake, it dies, ending up in the crashed state.
We can think of the Car as a black box whose interface to the outside world
is a set of observable states (ovals above) and a set of actions (arrows above).
Sometimes we focus on just the actions and thus depict the Car’s interface as
follows:
[Diagram: the Car’s interface, a box labeled Car with the actions key, gas, and brake.]
Imagine stuffing the Car’s state transition diagram inside the box. In Section
8.6.1 we will discuss interfaces more.
Suppose in this Car example, we turn the key and we are unlucky: the car will
not start. To model the possibility that the car goes from the off state to more than
one possible next state (off and idle), we need to model nondeterministic behavior.
8.3 State Machines: Definitions of Basic Concepts
Car == (
{off , idle, accelerating, crashed},
{off },
{key, gas, brake},
{(off , key, idle), (idle, gas, accelerating), (accelerating, brake, crashed)}
).
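The Car tuple transcribes directly into code. Here is a minimal sketch in Python; the representation and names are our own, not part of the book's notation:

```python
# The Car machine as the 4-tuple (S, I, A, delta) defined above.
S = {"off", "idle", "accelerating", "crashed"}   # states
I = {"off"}                                      # initial states
A = {"key", "gas", "brake"}                      # actions
delta = {("off", "key", "idle"),                 # steps (s, a, s')
         ("idle", "gas", "accelerating"),
         ("accelerating", "brake", "crashed")}

def step(s, a):
    """Return the unique next state for (s, a), or None if no step is enabled."""
    for (s1, act, s2) in delta:
        if s1 == s and act == a:
            return s2
    return None

print(step("off", "key"))    # idle
print(step("off", "brake"))  # None: no such step in delta
```

Because Car's δ is a function, `step` returns at most one next state; we will revisit this choice when we discuss nondeterminism.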
8.3.1 Concepts
Let M be a state machine (S, I, A, δ).
Definition 2. Each triple, (s, a, s′), in δ of M is a step of M.
Definition 3. An execution fragment is a finite or infinite sequence ⟨s0, a1, s1, a2, s2, . . .⟩
of alternating states and actions such that for all i, (si, ai+1, si+1) is a step of M.
Definition 4. An execution is an execution fragment starting with an initial state
of M (and ending in a state if finite).
Definition 5. A state is reachable if it is a last state of a finite execution.
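Definition 5 suggests a simple computation for finite machines: the reachable states are exactly those found by searching forward from the initial states along steps of δ. A Python sketch, using our own encoding of the Car machine's components:

```python
# Reachable states (Definition 5) by breadth-first search over the steps.
from collections import deque

initial = {"off"}
steps = {("off", "key", "idle"),
         ("idle", "gas", "accelerating"),
         ("accelerating", "brake", "crashed")}

def reachable(initial, steps):
    """Return every state that is the last state of some finite execution."""
    seen = set(initial)
    frontier = deque(initial)
    while frontier:
        s = frontier.popleft()
        for (s1, a, s2) in steps:
            if s1 == s and s2 not in seen:
                seen.add(s2)
                frontier.append(s2)
    return seen

print(reachable(initial, steps))  # all four Car states are reachable
```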
There are two reasonable ways to define what the behavior of a state machine
is. One way (“event-based” or “action-based”) says what is observable are a ma-
chine’s actions; the other (“state-based”) says what is observable are a machine’s
states. Which way we might prefer is philosophical. Here are two alternative
definitions of what a trace is:
Definition 6. (Event-based) A trace is the sequence of actions of an execution.
Definition 7. (State-based) A trace is the sequence of states of an execution or is
the sequence, ⟨si⟩, for each si ∈ I.
Finally, we define what the behavior of a machine is.
Definition 8. The behavior of a machine M (Beh(M)) is the set of all traces of M.
Behaviors are prefix-closed, which means, for a given behavior, B = Beh(M): (1) the empty
trace is in B; and (2) if a trace is in B then any prefix of that trace is in B.
In other work on state machine models, behaviors do not have or are not as-
sumed to have the prefix-closure property.
Beh(Car) =
{⟨⟩, ⟨off⟩, ⟨off, idle⟩, ⟨off, idle, accelerating⟩, ⟨off, idle, accelerating, crashed⟩}.
[Figure: Light's state transition diagram — states off and on, with flick transitions between them.]
Light == ({off, on}, {off}, {flick}, {(off, flick, on), (on, flick, off)}). Some executions of Light are:

⟨off⟩
⟨off, flick, on⟩
⟨off, flick, on, flick, off⟩
⟨off, flick, on, flick, off, flick, on, . . .⟩

There are an infinite number of finite executions and the last execution listed
above is infinite. Thus Beh(Light) is an infinite set of finite and infinite traces.
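Although Beh(Light) is infinite, its members up to any bounded length can be enumerated mechanically. A Python sketch of the state-based view; the encoding is ours:

```python
# Enumerate the state-based traces of Light up to a length bound.
delta = {("off", "flick", "on"), ("on", "flick", "off")}

def traces(initial, steps, max_len):
    """All state-based traces of length <= max_len; the empty trace is included."""
    result = {()}
    frontier = [(s,) for s in initial]
    while frontier:
        tr = frontier.pop()
        if len(tr) > max_len:
            continue
        result.add(tr)
        for (s1, a, s2) in steps:
            if s1 == tr[-1]:
                frontier.append(tr + (s2,))
    return result

for tr in sorted(traces({"off"}, delta, 3)):
    print(tr)
```

Note that the result is prefix-closed, exactly as Definition 8 requires.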
Here is an example of a state machine with more than one infinite trace:
[Figure: state transition diagram of a two-button machine — actions pressR and pressB.]
One of the infinite traces (event-based) is the infinite sequence of pressR actions.
What are some others?
[Figure: SimpleCounter's state transition diagram — states 0, 1, 2, . . . , with an inc transition from each state to the next.]
As soon as we need to model a system that deals with a domain with an
infinite set of values, e.g., the integers, we admit the possibility of an infinite
number of states. Now we see why most programs, let alone software systems,
are infinite state machines.
The SimpleCounter example has just one action, inc. It has an infinite number
of state transitions because there is an infinite number of states over which the
state transition relation, δ, is defined — because there is an infinite number of
values that the SimpleCounter can take.
Now, suppose we want to write a description of the state machine using the
notation we have seen so far. We would write it something like this:
SimpleCounter == (
{0, 1, 2, . . .},
{0},
{inc},
{(0, inc, 1), (1, inc, 2), (2, inc, 3), . . .}
).
The problem is: what do those two occurrences of “. . .” stand for? Here, the pattern is
clear and we rely on sharing the same intuition as to what goes in place of each “. . .”. In
general, we need a way to describe more complex sets. We would like to find a
way to characterize an infinite set of things in terms of a finite string of symbols.
Fortunately, predicate logic provides just the notation we need. To characterize
the set of all non-negative integers, we write:

{x : Z | x ≥ 0}

and to characterize the set of initial states, which contains just the value 0, we write:

{x : Z | x = 0}
We have taken care of the first occurrence of . . ., but what about the second?
We define the state transition relation, δ, as a set of triples, (s, a, s′), for which
the pair of states, s, s′, satisfies a given predicate. We do this for each action in A
and then take the union of these sets. For example, we define the set of triples,
(s, a, s′), that inc contributes to δ as follows:
SimpleCounter == (
{x : Z | x ≥ 0},
{x : Z | x = 0},
{inc},
{(s, a, s′) : {x : Z | x ≥ 0} × {inc} × {x : Z | x ≥ 0} | s′ = s + 1}
).
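In code, the same move from enumeration to predicate is natural: instead of storing the infinite set of triples, we can represent δ by a predicate that decides membership. A Python sketch; the encoding is ours, not the book's notation:

```python
# SimpleCounter's delta as a membership predicate rather than an enumerated
# set, mirroring {(s, a, s') : ... | s' = s + 1}.
def in_S(x):
    """S == {x : Z | x >= 0}"""
    return isinstance(x, int) and x >= 0

def in_delta(s, a, s2):
    """True iff (s, a, s2) is a step of SimpleCounter."""
    return in_S(s) and a == "inc" and in_S(s2) and s2 == s + 1

print(in_delta(2, "inc", 3))  # True
print(in_delta(2, "inc", 5))  # False: 5 != 2 + 1
```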
Technical Aside: The observant reader will notice that we use the state name
for two purposes: to name the state and to give the value of the SimpleCounter in
that state. In the next chapter we will refine the structure of states, which will let
us make a distinction between names and values.
8.6 Notes
8.6.1 Environment and Interfaces
A system does not live in isolation. It interacts with its environment. When we
model a system as a state machine we are modeling the interface the system has
with its environment. Later in the book when we discuss concurrency we will
model the environment itself as a state machine and then discuss the interactions
between a system and its environment as the behavior of the composition of two
state machines. For now we focus our attention on modeling the system and un-
derstanding a system’s behavior through its state machine model. For now, acting
as the system’s user, we are the system’s environment.
Intuitively the behavior of a state machine captures what the environment ob-
serves of the system modeled by the machine. Many state machine models differ
on what “observes” means. (That is one reason why we introduced an event-based
view and a state-based view.) When we identify what a system’s sets of states
and actions are we define what its observable behavior is. So, when we design a
system, especially if it is supposed to be put together with some other system, it
is critical that we identify its interface. A rule of thumb to use when we try to nail
down a system’s interface and are unsure whether something belongs in the
interface or not is to ask, “Can I observe it?” If so, then “it” has to be modeled
somehow; “it” is part of the system’s interface.
Abstraction
In the Light example we chose not to make a distinction between flicking the light
switch up or flicking it down. In both cases we simply named the action of flicking
the switch “flick.” We made a choice to abstract from the direction of flicking the
switch. (This abstraction gives the implementor of the Light model the freedom
to implement the light switch with a button that pops in and out rather than with
a lever that moves up and down.) When we model a system we often are faced
with such design decisions. When faced with the problem of whether something
should be modeled at a particular level of abstraction, the question to ask is “Is
this level of detail relevant to this level of abstraction?” or more precisely “Is
this distinction observable by the environment?” If the answer is “no,” i.e., the
observer has no way of telling two things apart or we as the system designer do
not want to provide the observer a way of telling two things apart, then we should
abstract from the difference between the two things. For example, suppose we
have an apple, an orange, and an eggplant. We might decide that we do not want
an observer to tell the difference between the apple and the orange, but only the
difference between fruits (apple or orange) and vegetables (eggplant).
As another example, think of the Car. By the way we chose to model it, the
user (as the environment) does not get to see all its states or all its state transitions.
For example, to go from the idle state to the accelerating state we may actually
have shifted gears, say from first to second and second to third and so on, before
getting to the accelerating state. It was our choice to abstract from some of its
states (e.g., being in third gear) and state transitions (shifting from second to third).
In making our choice of what to reveal to the observer, we hid those states and state
transitions from the observer because they were irrelevant. The only information
we have about the Car is what we reveal to the user. These are design decisions
that we, as the modeler, made.
Some state machine models allow us to make a distinction between external
actions and internal actions. External actions are part of the system’s interface
and are observable by the system’s environment. Internal actions are hidden and
not observable.
Actions Revisited
By combining the two points made in the previous two subsections we see that
we can divide a set of actions into external and internal actions. We can further
divide a set of external actions into input and output actions. Different models of
state machines may or may not make these distinctions. (For example, the I/O
automata model makes these distinctions [?].)
[Figure: An Unpluggable Light? — the Light machine extended with an unplug action.]

What should happen when we unplug the Light? There are four reasonable interpretations:
1. Nothing happens.
2. It is undefined. That is, using functional notation, δ(off , unplug) = ⊥ and
δ(on, unplug) = ⊥. (⊥, read as “bottom,” is the mathematical symbol for
“undefined.”)
3. It is an error. Chaos can occur (core dumped, machine crashes).
4. It cannot happen.
We will take the fourth interpretation because we can model the first three
explicitly if that is the behavior we want. In the first case, we would define the
state transition function such that for each state, s ∈ {off, on}, (s, unplug, s) is a step. In
the state transition diagram, for each state, we would draw an arrow from it to
itself and label the arrow with the action “unplug.” In the second case, we would
introduce a special state called ⊥. In our state transition diagram we would simply
draw an arrow from the off state to ⊥ and an arrow from the on state to ⊥. Both
arrows would be labeled with the action “unplug.” In the third case, we would
introduce a state called “error” and draw arrows similar to those in the second
case.
Further Reading
Exercises
1. A certain simple answering machine has two buttons, “play” and “save,”
and can, of course, receive messages. If someone plays the messages and
doesn’t save them, they are erased/overwritten when the next incoming
message is received. The answering machine only holds a specified num-
ber of messages; when it reaches full capacity it refuses to accept new
messages. The answering machine can be modeled by the state machine,
AnsMachine, whose state transition diagram is attached.
9 State Machines: Variations
The state machine model presented in the previous chapter is not suitable, appro-
priate, or natural for modeling all systems. In this chapter we look at different
variations of the basic model we have presented so far. (In this chapter, we do not
spell out each of the components of the state machine to the same level of detail
as in the last chapter. We also introduce some minimal notation for new concepts
to make the examples concrete.)
Which model we choose to use depends on what we want to model. We want
to choose one that allows us to state as precisely and concisely as possible those
things we care about. Some models may make distinctions that we do not care about; some may
make assumptions that do not fit our problem. But sometimes which we choose is
just a matter of taste. When we decide what model to use we should understand
why we are choosing one over another. The choice should be deliberate, not
arbitrary.
Given a state machine, M == (S, I, A, δ), in the first three sections, we refine
some of M’s components. First we give more structure to states in S (Section
9.1), then to actions in A (Section 9.2), and then generalize the functionality of
δ (Section 9.3). In the fourth section we show how all these things can be used
together. Finally, in the last two sections we discuss other refinements of state
machine models that are often seen in practice.
9.1 States
Let’s revisit the integer counter example.

[Figure: Counter's state transition diagram — states labeled x = 0, x = 1, x = 2, . . . , with inc transitions between successive states.]
In the diagram above, we introduce the variable x to “hold” the value of the
integer counter. The notion of a state as having variables that can have values of
some type should be familiar to us from our programming experience.
With respect to state machine models, we are refining the notion of what a
machine state is; we add some internal structure so that states are more than just
named entities like 2, off, or crashed. In general, each state in S of a state machine,
M, is a record whose field names are variable “names” or “identifiers”. Moreover,
we assume variables and values are typed much like in a programming language:
the values of a variable are drawn from the type associated with the corresponding
field name. For example, the state space of our integer counter is defined as:
S == [x : Z]
That is, S is the space of all records with field name x and whose values are drawn
from the integers.
Since variables correspond to projection functions of records, x(s) denotes the
value of the variable x in state s.
Suppose we want to write the state transition function for the Counter. Then,
as defined earlier, let S (the set of states) be the set of records mapping the variable
x to an integer value. Then, similar to the SimpleCounter, we have:

{(s, inc, s′) : S × {inc} × S | x(s′) = x(s) + 1}
Now suppose we have a counter that allows state transitions only from states
whose value for its state variable, x, is an even number. EvenCounter starts in the
initial state x = 0 and whenever we bump its state, we get to the next even number:
[Figure: EvenCounter's interface and part of its state transition diagram — bump transitions x = 0 → x = 2 → x = 4; the states x = 1 and x = 3 have no transitions.]
where we assume the predicate even has been defined appropriately. (Notice that
some states in EvenCounter are unreachable. Which ones?)
Unfortunately, writing the state transition function as predicates over sets of
pairs of states, and writing x(s) and/or x(s′) whenever we want to refer to the
value of a state variable, quickly becomes unwieldy. By introducing two
keywords, we write the state transition function for each action in a more readable
notation. Here is what we write for Counter’s inc action:
inc
pre true
post x′ = x + 1

bump
pre even(x)
post x′ = x + 2
The first line, which we call the header, gives the name of the action whose state
transition behavior we describe in the subsequent two lines. The second line gives
a pre-condition, which is just a predicate, and the third line gives a post-condition,
which is just another predicate. The interpretation of the pre- and post-conditions
is: In order for the state transition to occur from the state s to the state s′, the pre-condition
must hold in s; after the state transition occurs, the post-condition
must hold in s′. The state transition cannot occur if the pre-condition is not met.
Post-conditions in general need to talk about the values of state variables in both
the state before the state transition occurs (the “pre-state”) and the state after it
occurs (the “post-state”). We use an unprimed variable to denote the value of the
variable in the pre-state and a primed variable to denote its value in the post-state.
So, x′ really stands for x(s′); x, for x(s).
Here is how to visualize what the pre- and post-conditions capture:

[Figure: a transition s —a→ s′; the pre-condition holds in s, and the post-condition holds in s and s′.]

For action a to occur, the pre-condition must hold in s. If a occurs, the post-condition must hold in s and s′.
In the Counter example, the inc action has the trivial pre-condition, “true.”
This means that the inc action can be performed in any state in S. EvenCounter’s
bump action has a non-trivial pre-condition. Another typical non-trivial pre-condi-
tion is requiring that a pop action not be performed on an empty stack. We’ll see
other examples of non-trivial pre-conditions later. Inc’s post-condition says that
the value of the integer counter is increased by one from its previous value; bump’s
post-condition says that the value is increased by two.
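The pre/post style transcribes directly into code as a pair of predicates per action. A Python sketch of the Counter and EvenCounter specifications above; the representation is ours:

```python
# Each action is a pair of predicates: pre over the pre-state value of x,
# post over the pre- and post-state values (x, x2), with x2 playing x'.
inc  = {"pre": lambda x: True,       "post": lambda x, x2: x2 == x + 1}
bump = {"pre": lambda x: x % 2 == 0, "post": lambda x, x2: x2 == x + 2}

def is_step(spec, x, x2):
    """A transition x -> x2 is allowed iff pre holds in x and post in (x, x2)."""
    return spec["pre"](x) and spec["post"](x, x2)

print(is_step(inc, 5, 6))   # True: pre is trivially true and 6 == 5 + 1
print(is_step(bump, 3, 5))  # False: pre even(x) fails in the pre-state
```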
In general, for a given M == (S, I, A, δ), the template we use to describe
δ_action is

action
pre Φ(v)
post Ψ(v, v′)

where action is in A, and Φ and Ψ are (state) predicates over a vector, v, of state
variables. The above template stands for the following part of the definition of the
state transition function, δ:

{(s, action, s′) : S × A × S | Φ(v(s)) ∧ Ψ(v(s), v(s′))}

In English this says that the pre-condition (Φ) has to hold in the pre-state and the
post-condition (Ψ) has to hold in the pre/post-states.²
Other interpretations of pre/post-condition specifications are possible. We are
just giving one reasonable interpretation here.³ For example, another common one is where
the conjunction used above is replaced by an implication. Under this interpreta-
tion, which is used in Z and Larch, if the pre-condition does not hold and we try
to do the action then anything can happen, i.e., “all bets are off.” We could end up
in an unexpected state, an error state, or an undefined state.
9.2 Actions
9.2.1 Actions with Arguments
Now suppose we want to let the integer counter’s inc action take an integer
argument; call the resulting machine BigCounter. It is even more difficult to draw
BigCounter’s state transition diagram, only part of which is shown here:

[Figure: part of BigCounter's state transition diagram — e.g., an inc(1) transition leading to the state x = 1.]
It is much easier and more concise to write the state transition function for inc as
follows:
inc(i: Z)
pre true
post x′ = x + i
2 Recall that the post-condition is defined over two states.
3 This one also is consistent with the discussion (in the Chapter 8) about actions that cannot
occur.
186 CHAPTER 9. STATE MACHINES: VARIATIONS
We extend the header in the specification to include a list of input arguments (and
their types). We intentionally choose syntax to look like programming language
notation.
The technical term for what we do is lambda abstraction. Using a single
template we define an infinite set of functions, one for each integer i. Instead of
defining separate actions inc_1, inc_2, . . . we define a family of actions inc(i).
According to the above specification, there is nothing preventing the input
integer argument that we hand to inc from being negative. Suppose we want the
counter always to increase in value, never decrease. We capture this requirement
by strengthening the pre-condition:
inc(i: Z)
pre i > 0
post x′ = x + i
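The lambda-abstraction reading is literal in code: a function of i returns the specification of the action inc(i). A Python sketch; the encoding is ours:

```python
# A family of actions inc(i), defined by a single template with pre i > 0.
def inc(i):
    return {"pre":  lambda x: i > 0,
            "post": lambda x, x2: x2 == x + i}

a = inc(3)
print(a["pre"](4) and a["post"](4, 7))  # True: 4 --inc(3)--> 7 is a step
print(inc(-2)["pre"](4))                # False: negative arguments are refused
```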
As another example, consider a Register machine that holds an integer value, x, which we can read and write:

read()/ok(Z)
pre true
post result = x

write(i: Z)/ok()
pre true
post x′ = i
9.2. ACTIONS 187
[Figure: Register's interface (read()/ok(Z), write(i: Z)/ok()) and part of its state transition diagram — e.g., write(1)/ok() from x = 0 to x = 1, and read()/ok(1) looping on x = 1.]
The first thing to notice is that we introduce the word “ok” in the header. We
do this for two reasons. The first is that we want to set ourselves up so that we have
a convenient way to distinguish normal termination from exceptional termination
of a procedure, a feature supported by most advanced programming languages.
More on this later. The second is a trivial point: For symmetry, we prefer writing
read()/ok(1) instead of read()/1. Think of the instance of an action as a procedure
call. Then the state transition labeled read()/ok(1) corresponds to calling the read
procedure and getting the integer 1 back.
We are simply adding more structure to actions. In general, a state transition is
an action instance, which is a pair of an invocation event and a response event. An
invocation event is the name of the action plus the values of its input arguments;
a response event is the name of the termination condition (e.g., ok) plus the value
of its result.
An execution of a state machine is a sequence of alternating states and action
instances. Some executions of the Register machine are:
⟨x = 0, write(1)/ok(), x = 1, read()/ok(1), x = 1⟩
⟨x = 0, write(1)/ok(), x = 1, read()/ok(1), x = 1, read()/ok(1), x = 1,
write(5)/ok(), x = 5, read()/ok(5), x = 5⟩
⟨x = 0, write(1)/ok(), x = 1, write(7)/ok(), x = 7, write(9001)/ok(), x =
9001⟩
For the above executions, we have the following (event-based) traces:
⟨write(1)/ok(), read()/ok(1)⟩
⟨write(1)/ok(), read()/ok(1), read()/ok(1), write(5)/ok(), read()/ok(5)⟩
⟨write(1)/ok(), write(7)/ok(), write(9001)/ok()⟩
[Figure: Stack's interface and part of its state transition diagram — e.g., a pop()/ok(7) transition.]
push(i: Z)/ok()
pre true
post st′ = st ⌢ ⟨i⟩

pop()/ok(Z)
pre st ≠ ⟨⟩
post st = st′ ⌢ ⟨result⟩
Here is how we specify a more robust interface to Stack that allows pop to
raise the exception empty if we try to perform the pop action on an empty stack
(push stays the same):
pop()/ok(Z), empty()
pre true
post (st ≠ ⟨⟩ ⇒ (st = st′ ⌢ ⟨result⟩ ∧ terminates = ok)) ∧
(st = ⟨⟩ ⇒ (st = st′ ∧ terminates = empty))
The first thing to notice is the addition of the name of the exceptional termina-
tion condition, empty, in the header. For each termination condition (normal and
exceptional), we allow some kind of result to be returned; here empty does not
return any result.
The second thing to notice is the special reserved word, terminates, which we
introduce to hold the value of the termination condition (“ok” for normal termina-
tion and one of the exceptions listed in the header for an exceptional termination).
From a software engineering perspective, there is usually a close correlation
between pre-conditions and exceptions. It is common to transform a “check” in
the pre-condition to be a “check” in the post-condition. From our programming
experience, this is the same as placing the responsibility on the callee rather than
the caller of a procedure. With a pre-condition it is the caller’s responsibility to
check that the state of the system satisfies the pre-condition before calling the
procedure; with an exception in lieu of the pre-condition, it is the callee’s responsibility
to perform a (run-time) check and raise an exception in case the state
of the system violates the condition.
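The caller/callee trade-off can be sketched in code as two versions of pop over a sequence (a Python list here; names and encoding are ours):

```python
# Pre-condition style vs. exception style for Stack's pop action.
class Empty(Exception):
    """Models the exceptional termination condition empty()."""

def pop_pre(st):
    """Pre-condition style: only defined for nonempty st; the CALLER must check."""
    assert st, "pre-condition st != <> violated"
    return st[:-1], st[-1]

def pop_exc(st):
    """Exception style: the CALLEE performs the run-time check."""
    if not st:
        raise Empty()          # terminates = empty
    return st[:-1], st[-1]     # terminates = ok

print(pop_exc([1, 2, 7]))      # ([1, 2], 7)
```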
Here is an example where upon exceptional termination an interesting value
is returned. Consider a Table machine, which stores keys and values. The state
variable, t, stores the CMU telephone extensions of the 15-671 staff members:
[Figure: part of Table's state transition diagram — e.g., the transition insert(JI, 1234)/already_in(5842).]
We model the state of the Table as a function from keys to values.⁴ The insert
action returns an exception already_in if there already exists a value associated
with the key, k, of the key-value pair, (k, v), that we are trying to insert; in that
case, it returns the current value bound to k.
9.3 Nondeterminism
So far, δ has been a function; that is, for each state, s, and action, a, δ mapped
to at most one next state. Suppose we have a RandomCounter machine with an
inc action that takes an integer argument:
Here is the specification of inc:
inc(i: Z)
pre i > 0
post x′ = x + i ∨ x′ = x + 2i
4 We leave it to the reader to formalize this state machine. We will see something similar when
we get to the Birthday Book example in Z.
[Figure: RandomCounter's interface (action inc(i: Z)) and part of its state transition diagram — from x = 7, inc(4) may lead to x = 11 or to x = 15.]
According to the specification, inc increments the counter’s value either by the
value of its argument i or by twice that value. Thus, there are two possible states
that we might end up in after doing the inc action given some integer argument.
Since there is more than one, δ needs to map to a set of states. (We view δ as a
function from (state, action) pairs to a set of states, rather than as a relation on
(state, action, state) triples.) For example:

δ([x = 7], inc(4)) = {[x = 11], [x = 15]}
As an observer, we do not know which state transition will occur when we feed
inc the integer 4. We must be prepared to deal with either possibility. The choice
of which post-state is taken is made by the machine itself. Notice that the nonde-
terminism shows up in the specification of inc in the use of logical disjunction in
the post-condition.
Suppose we have an integer set, t, and in its interface is a choose action that does
not take any arguments and removes and returns an element from t.
[Figure: part of the integer-set machine's state transition diagram — e.g., from t = {2, 3}, choose()/ok(3) leads to t = {2}.]

choose()/ok(Z)
pre t ≠ ∅
post result ∈ t ∧ t′ = t \ {result}
The nondeterminism shows up in the specification for choose in the use of the set
membership operator (∈) in the post-condition. We do not know which element
of t will be returned; we know only that some element will be returned.
Notice that the labels on the arcs in the state transition diagram above are
different (by what is returned by choose); however, most people would still view
the state transition function as nondeterministic because they would abstract from
the actual value returned. In other words they would define δ something like this:
δ([t = {2, 3}], choose()/ok(Z)) = {[t = {2}], [t = {3}]}
Finally, in a programming language that supports exception handling we would
probably export a more robust interface for the choose action:
choose()/ok(Z), empty()
pre true
post (t ≠ ∅ ⇒ (result ∈ t ∧ t′ = t \ {result} ∧ terminates = ok)) ∧
(t = ∅ ⇒ (t′ = t ∧ terminates = empty))
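Viewing δ as a set-valued function is also the natural encoding of nondeterminism in code. A Python sketch for the choose action; the encoding is ours:

```python
# Nondeterministic delta for choose: a state t maps to the SET of possible
# post-states, one for each element that could be removed.
def delta_choose(t):
    if not t:
        return set()                          # pre-condition t != {} fails
    return {frozenset(t - {e}) for e in t}    # one post-state per choice

print(delta_choose({2, 3}) == {frozenset({2}), frozenset({3})})  # True
```

This matches the abstraction δ([t = {2, 3}], choose()/ok(Z)) = {[t = {2}], [t = {3}]} given above.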
9.4 Putting Everything So Far Together

Putting arguments, exceptions, and nondeterminism together, the general template for an action is

action(inputs)/term_1(output_1), . . . , term_n(output_n)
pre Φ
post Ψ

where inputs is a list of arguments and their types, term_i is the name of one of the
termination conditions (including “ok”), and output_i is the type of the result corresponding
to term_i. Φ and Ψ are state predicates as defined earlier. The reserved
identifiers we use are:
• ok, used in the header, to denote normal termination,
• result, used in the post-condition, to denote the value returned by an action,
and
• terminates, used in the post-condition, to denote the value of the termination
condition. Its value can be any of the term_i, including “ok,” listed in the
header.
We are glossing over a number of technicalities here regarding state variables
to store input arguments and the type of the result returned, depending on how an
action terminates.
Finally, for simplicity, let’s assume actions return only normally unless speci-
fied otherwise (by explicitly listing exceptional conditions in their headers). Un-
der this default case, we avoid clutter in our specifications by not always having
to write “terminates = ok” in the post-conditions of our actions.
One variation of the basic model adds a fifth component, a set, F ⊆ S, of final states, yielding a finite state automaton (FSA):

M == (S, I, F, A, δ)

The language of an FSA, M, is the set of its traces†, where “trace†” means the trace of an execution ending in a final state. Some-
times the language of M is called regular or a regular set because it can be ex-
pressed as a regular expression using ∗, ∪, and concatenation. Just as with the
behavior of a machine, the language of an FSA can be infinite, i.e., there might be
an infinite number of strings accepted by M.
Further Reading
Exercises
1. Consider a TV remote control that allows the user to select the channel to
be viewed, add or remove a “parental block” to a channel that prevents the
channel from being displayed (removal requires a password), and enter a
password to allow a blocked channel currently selected to be displayed. If
a blocked channel is selected, the channel is not initially displayed. The
user may choose to select a different channel or may enter the password to
display the channel. If the incorrect password is entered, the channel is not
displayed. If the correct password is entered, the channel is displayed.
Your task is to model the described functionality of the remote control. That
is,
10 Reasoning about State Machines

Given a state machine model of a system, we can do some formal reasoning about
properties of the model. It is important to remember that we are proving some
property about the model of the system, not the system itself. If the model is
“incorrect” then we may not be able to prove anything useful about the system.
Worse, if the model is “incorrect” then we may be able to prove something that has
no correspondence to the real system. However, we hope that we have modeled
our system properly so that whatever we prove about our model is true of the
system being modeled.
But then, why not just reason about the system itself directly? One reason is
that it is often impossible to reason about the system itself because it is too large,
too complex, or too unwieldy. Another is that we may be interested in one aspect
of the system and want to abstract from the irrelevant aspects. Another is that
we may not actually have a real system; our model could simply be a high-level
design of a system we might build and we want to do some reasoning about our
design before spending the dollars building the real thing. Another is that it may
be impossible to get our hands on the system (maybe it is proprietary). Another is
that it may be impossible for us to run the system to check for the property because
of its safety-critical side-effects (like setting off a bomb). So, in some cases, the
best we can do is reason about a model of the system, and not the system itself.
In this chapter we discuss a few kinds of properties we might want to reason
about for a state machine model. The most important of these are invariant properties:
properties that are true of every reachable state in the system.
10.1 Invariants
An invariant is a predicate that is true in all states. In the context of state machines,
we usually care that an invariant is true in all reachable states. The statement of
an invariant, θ, in full generality looks like:

∀ e : executions • ∀ s : S • s in e ⇒ θ(s)

Because the quantification over executions is understood, we usually abbreviate this to ∀ s : S • θ(s), or even more simply to θ(s).
For example, here is an invariant for the Counter example given in Chapter 9:
x(s) ≥ 0
which says that in all states, x’s value is greater than or equal to 0. We know this
is true because initially x’s value is 0 and because the inc action always increases
x’s value by 1. Since inc is Counter’s only action, there’s no other way to change
x.
Consider any execution of the machine:

⟨s0, a1, s1, a2, . . . , si−1, ai, si, . . .⟩

To prove a property, θ, is invariant requires showing that θ holds in every state of every such execution. One simple proof technique is to show

P ⇒ θ

where P is the predicate describing the set of states. If it is true of every state, then
certainly it is true of every reachable state.
For example, recall in the Counter example, the predicate P is simply x(s) ≥ 0.
Hence we can trivially show the invariant property holds:
x(s) ≥ 0 ⇒ x(s) ≥ 0
To argue that the Counter’s invariant holds, we need to show that the invariant
holds in each initial state and then for each action show that if the invariant holds
in its pre-state, it holds in the post-state. (Here again, is another good reason to
have only a finite set of actions.) In general, we have a proof rule that looks like:
∀ s : I • θ(s)
∀ a : A • ∀ s, s′ : S • (s, a, s′) ∈ δ ∧ Φ(s) ∧ θ(s) ∧ Ψ(s, s′) ⇒ θ(s′)
∀ s : S • θ(s)
where I is the set of initial states and A is the set of actions. Φ and Ψ are the pre-
and post-conditions of a, respectively. Or, said in English: show that the invariant
holds in each initial state, and that each action preserves it.
The two main proof steps are sometimes called (1) establishing the invariant
(true in initial states) and (2) preserving the invariant (assuming it is true in a
pre-state and showing that each action’s post-state preserves it).
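For a finite machine, the two proof steps can even be checked mechanically by brute force. A Python sketch; the machine and names are ours, chosen for illustration:

```python
# Check (1) establish: theta holds in every initial state, and
# (2) preserve: every step from a theta-state lands in a theta-state.
def check_invariant(initial, delta, theta):
    established = all(theta(s) for s in initial)
    preserved = all(theta(s2) for (s, a, s2) in delta if theta(s))
    return established and preserved

# A toy finite machine: a counter modulo 3, with theta == "state < 3".
I = {0}
delta = {(0, "inc", 1), (1, "inc", 2), (2, "inc", 0)}
print(check_invariant(I, delta, lambda s: s < 3))  # True
```

For infinite machines such as Counter, this enumeration is impossible, which is exactly why the inductive proof rule above matters.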
What is the rationale for this proof rule? First, notice we care only about
reachable states (that is why the δ appears above). Second, consider any execution
of a state machine:
⟨s0, a1, s1, a2, . . . , si−1, ai, si, . . .⟩
10.1.2 OddCounter
Let OddCounter be the state machine:
[Figure: OddCounter's interface (action inc(i: Z)) and part of its state transition diagram — e.g., inc(2) from x = 1 to x = 3, and inc(14) from x = 3 to x = 17.]
More precisely,
OddCounter == (
[x : Z],
{[x = 1]},
{inc(i : Z)},
δ ==
inc(i: Z)
pre i is even
post x′ = x + i
).
The invariant we want to prove is that OddCounter’s state always holds an odd
integer:
θ == x is odd.
As another example, consider a FatSet machine whose state variable, t, is a set of integers and whose initial states are the singleton sets. FatSet has two actions:

union(u : F Z)/ok()
pre u ≠ ∅
post t′ = t ∪ u

card()/ok(Z)
pre true
post result = #t ∧ t′ = t
Suppose we want to show the property that the size of the FatSet t is always
greater than or equal to 1:
θ == #t(s) ≥ 1
1. It is true of all initial states since the size of all singleton sets is 1.
2. We need to show the invariant is preserved for each of union and card.
(a) (union). Assume #t ≥ 1 in the pre-state. Since t′ = t ∪ u, we have #t′ ≥ #t ≥ 1.
(b) (card). card leaves t unchanged (t′ = t), so the invariant is trivially preserved.
Now suppose we were to add a delete action to FatSet:

delete(i:Z)/ok()
pre t ≠ ∅
post t′ = t \ {i}
Then the invariant would no longer hold because if delete were called in any state
where t = {i} (where i is the argument to delete) then the size of t′ would be 0.
[Figure: DivergingCounter's interface (action poke) and part of its state transition diagram — e.g., poke(3) and poke(4) transitions.]

DivergingCounter has two state variables, x and y, and one action, poke, specified as:

poke(i : Z)
pre i > 0
post x′ = x + i ∧ y′ = y − i
Notice that all the states drawn above are legitimate initial states. So is the state
where x and y are both initialized to 0.
The invariant maintained by DivergingCounter is:
θ == x(s) + y(s) = 0
To the reader: Can you prove it?
10.2 Constraints
An invariant is supposed to be true of every state of every execution of a state
machine. We might ask whether there is a corresponding notion for state transitions.
The answer is “yes,” though because it is not as commonly discussed in the literature,
there is no common term for such a property. Some people might simply call it
another kind of invariant, one over state transitions rather than states. To avoid
confusion with state invariants, we call it a constraint. This term is not standard,
and others may use “constraint” to mean something different.
Consider any execution of a state machine:
⟨s0, a1, s1, a2, . . . , si−1, ai, si, ai+1, si+1, . . . , sj−1, aj, sj, . . .⟩
A constraint is a predicate that is true in all pairs of states, si and sj , in every
execution, where sj follows si , but need not immediately follow it (sj does not have
to be si+1 ).
The statement of a constraint, χ, in full generality looks like:
∀ e : executions • ∀ si , sj : S • (si in e ∧ sj in e ∧ i < j) ⇒ χ(si , sj )
As for statements of invariants, we omit (because it is understood) the universal
quantification over executions and states; the condition about si and sj both being
states in the execution; and the condition that si precedes sj in the execution.
A constraint that holds for the Counter example is that its value always strictly
increases:

χ == x(si) < x(sj)

The corresponding proof rule for a constraint χ is:

∀ a : A • ∀ si, si+1 : S • (si, a, si+1) ∈ δ ∧ Φ(si) ∧ Ψ(si, si+1) ⇒ χ(si, si+1)

where A is the set of actions, Φ and Ψ are each action’s pre- and post-conditions,
and the states are quantified and qualified as described above. In English: for each
action, whenever its pre-condition holds in a state and its post-condition relates
that state to a successor state, the constraint must hold over that pair of states.
What is the rationale for this proof rule? First, again we care only about reach-
able states. Second, consider any execution of a state machine:

⟨s0, a1, s1, a2, . . . , si, ai+1, si+1, . . . , sj−1, aj, sj, . . .⟩

If we show that χ holds over any pair of successive states, (si, si+1), i.e., every
single state transition, then surely it holds over any pair of states, (si, sj), where
i < j. To show that it holds in any pair of successive states, we need only consider
every possible action, which is the only way we can get from si to si+1. For each
action, we need to make sure that the conjunction of its pre- and post-condition
predicates implies the constraint.
10.2.2 FatSet
A constraint that holds for FatSet is:

χ == ∀ x : Z • x ∈ t(si) ⇒ x ∈ t(sj)

where si and sj are implicitly defined and qualified as usual. This says that once an
integer gets added to the set t it never disappears. We know this constraint holds
because there is no way to delete elements from the set.
Notice that saying that the cardinality of t always strictly increases:

χ == #t(si) < #t(sj)

is not a constraint for FatSet. It does not hold, since taking the union of two sets
may not necessarily increase the size of either.
10.2.3 MaxCounter
Constraints are useful for stating succinctly when things do not change in value.
Consider the following MaxCounter machine whose state variable x can never
exceed the value of the other state variable max. max is initialized to 15 and its
value never changes.
MaxCounter = (
[x, max : Z],
{[x = 0; max = 15]},
{inc(i : Z)},
δ ==
inc(i : Z)
pre x + i ≤ max
post x′ = x + i ∧ max′ = max
).
The constraint that captures this is:

χ == max(si) = max(sj)
This kind of example may look simplistic but it generalizes to any system
where we want to ensure that some state variable never changes. When we have
a huge state space (as is typical of software systems), very often we are careful to
state how some state variable changes but forget to say what state variables do not
change. Constraints are a nice way to describe those properties.
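Like an invariant, a constraint of this frame-property flavor can be checked mechanically over single transitions. A hypothetical Python sketch for MaxCounter, with states modeled as (x, max) pairs:

```python
# Checking MaxCounter's constraint chi == max(s_i) = max(s_j) on
# single transitions (hypothetical Python sketch; a state is a pair
# (x, mx)).  Per the proof rule, it suffices that every enabled
# inc(i) step leaves max unchanged.

def inc(state, i):
    """inc(i): pre x + i <= max; post x' = x + i and max' = max."""
    x, mx = state
    assert x + i <= mx, "pre-condition: x + i <= max"
    return (x + i, mx)

def constraint(s1, s2):                 # chi: max never changes
    return s1[1] == s2[1]

initial = (0, 15)
frontier = {initial}
for _ in range(3):                      # a few rounds of reachable steps
    steps = {(s, inc(s, i)) for s in frontier
             for i in range(-5, 6) if s[0] + i <= s[1]}
    assert all(constraint(s1, s2) for s1, s2 in steps)
    frontier = {s2 for _, s2 in steps}
```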
Chapter Notes
[TBD]
Further Reading
[TBD]
Exercises
[TBD]
Chapter 11
Relating State Machines: Equivalence
So far we have seen what a state machine is and how it can be used to model
concepts like system behavior, input actions (or events) with arguments, output
actions (or events) with results, and nondeterminism. We have even hinted at how
a state machine would be an appropriate model of an abstract data type, and hence
one of the bases for object-oriented programming.
We have also seen that given a state machine we can reason about some of its
properties, most importantly, invariants.
In this and the next three chapters we consider the following question: “Given
two state machines, how are they related?” This chapter discusses the relationship
of equivalence between two state machines. Chapter 12 discusses the problem
of whether one machine satisfies (in some sense) another. Chapter 13 takes a
break from definitions and gives two examples of how to show one state machine
satisfies another. Finally, in Chapter ?? we generalize these relations to show
when one state machine simulates another.
We do not cover in this book the many different notions of equivalence or the
many different ways of showing two state machines equivalent. These chapters
are meant only to introduce the reader to these concepts and to provide enough
detail for the reader to see what the fundamental questions are. Hence this chapter
is short and to be read for edification.
A naive notion of equivalence says that if two machines produce the same output
for the same input, they are equivalent; if not, they are different.
This notion of equivalence takes a pretty narrow view of what a state machine
is. It essentially says that a state machine represents a function and determining
equivalence between two state machines boils down to the problem of determining
whether they compute the same function.
However, a software system is not nearly as simple. For example, software
systems, and hence their state machine models, often have intermediate observ-
able effects on the environment (their context). It does not suffice to just consider
the final states of the machines. Thus, many prefer to take the broader view that
two machines are equivalent if their behaviors (i.e., the sets of traces) are the same.
To determine whether two sets of traces are the same we need a way to determine
whether two traces are the same. To determine whether two traces are the same
we need a way to determine whether two actions (or states) are the same. To de-
termine whether two actions (or states) are the same we may need to ignore some
actions or state variables (e.g., internal ones) and not others. So, it is not so easy
to decide whether two state machines are equivalent; each substructure of a state
machine introduces another place for differing opinions.
For example, depending on whether we take an event-based or state-based
view of what a trace is we could come up with different answers given two ma-
chines. Consider the Light example with off and on states and the flick action.
A different Light example with three states, e.g., red, amber, and green, but with
the same action set {flick} (think of a lever with three positions rather than two)
will have the same behavior if we take an event-based view of traces, but different
behavior if we take a state-based view.
Even for the same view of traces, there are different notions of equivalence.
Suppose in determining whether two behaviors are the same, we need to deter-
mine whether two state-based traces are the same. Would we view a trace with
“stuttering” states as equivalent to one without? Consider the Register example
that has read and write actions where after doing two read’s in a row, we remain
in the same state:
⟨x = 0, x = 1, x = 1, x = 1, x = 5⟩
⟨x = 0, x = 1, x = 5⟩
It may be even harder to answer this question if our model includes infinite
traces where a trace might end with an infinite number of stuttering states.
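For finite traces, the “equivalent modulo stuttering” view is easy to make precise: collapse consecutive duplicate states before comparing. A hypothetical Python sketch, modeling each Register state by the value of x:

```python
# One way to make "equivalent modulo stuttering" precise for finite
# traces (hypothetical Python sketch; each Register state is modeled
# by the value of x): collapse consecutive duplicates, then compare.

def destutter(trace):
    out = []
    for s in trace:
        if not out or out[-1] != s:     # keep s only if the state changes
            out.append(s)
    return out

t1 = [0, 1, 1, 1, 5]                    # <x = 0, x = 1, x = 1, x = 1, x = 5>
t2 = [0, 1, 5]                          # <x = 0, x = 1, x = 5>

assert t1 != t2                         # different as raw traces
assert destutter(t1) == destutter(t2)   # the same modulo stuttering
```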
How do we show whether two machines are equivalent or not? There are two
general approaches to take. First, we could work within just the semantic domain
and show that the two semantic entities, in this case quadruples

⟨S, I, A, δ⟩

are equivalent. If our notion of equivalence is not defined directly in terms of
quadruples, but rather behavior sets, then our semantic domain would be behavior
sets and we would have to show equivalence between two behavior sets, which might
require showing equivalence between traces, and so on. Second, we could work
within just the syntactic domain and show that the two syntactic descriptions, e.g.,
pre/post-condition specifications, Z specifications, or CSP programs, are equivalent
in some sense. For example, for two different pre/post-condition specifications or
two different Z specifications we might show that each of the predicates of one
implies the corresponding predicate of the other and vice versa. For CSP programs,
we might use properties of process algebras to show equivalence. Since the two
syntactic descriptions are “the same,” they denote the same semantic entity, in this
case a quadruple representing a state machine; thus, they must describe the same
state machine.
Another way to categorize how we might show the equivalence between two
machines is as follows:
• Show equivalence from first principles using mathematical logic, theories
of sets and sequences, induction, etc. Since equality on sets, sequences, in-
tegers, booleans, and other primitive types is mathematically well-defined,
if we beat everything down to these primitive types, then we have a way of
showing equivalence. We might use this technique if we want to show the
equivalence of two quadruples.
• Show one “simulates” the other and vice versa. Anything we can do in one
machine has some equivalent action or sequence of actions in the other. For
example, if we want to show that one behavior set is the same as the other,
then we might use this approach. We return to this idea in Chapter ??.
• Transform one into the other. For example, if we want to show that one
state machine description is “the same” as the other, we might use this ap-
proach. Showing that compiled code is “correct” amounts to showing that
the transformed code is “the same” as the original code.
Further Reading
Exercises
Chapter 12
Relating State Machines: Satisfies
This chapter discusses the problem of whether one machine satisfies (in some
sense) another. It should be read simultaneously (if that’s possible!) with Chapter
13, which gives two examples.
Just as there is not one standard notion of equivalence, there is not one stan-
dard notion of “satisfies.” In this chapter we give a definition that is reasonable,
representative, and used in practice.
This chapter presents three key ideas:
Although our notion of satisfies and the proof technique that we present here may
seem specific to the model of state machines described so far, what is the most
important idea to learn from this chapter is the notion of an abstraction function.
We will see it again, in a different guise, when we cover refinement in Z.
C satisfies A.
C implements A.
C is a refinement of A.
The program C is correct with respect to the specification A.
A = (SA , IA , AA , δA )
C = (SC , IC , AC , δC )
each denoting a behavior set, Beh(A) and Beh(C), respectively. We take an event-
based view of trace: Each trace is a sequence of (invocation, response) pairs and
each pair represents a single execution of one of the actions provided by the ma-
chine. We assume for simplicity that there is a one-to-one correspondence be-
tween the action names in the concrete machine to those in the abstract machine
and that we use a renaming function, α, to define the relationship:
α : AC → AA
Since Beh(C) and Beh(A) are both sets of traces, C satisfies A if every trace in
Beh(C) is in Beh(A). This means that A’s set can be larger; C’s set reduces the
choices of possible acceptable behaviors.
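For finite machines, this trace-inclusion definition can be checked directly on bounded behaviors. The sketch below (hypothetical Python; the two toy machines and α are invented for illustration) enumerates event traces up to a fixed length and tests Beh(C) ⊆ Beh(A) after renaming:

```python
# Bounded check of "C satisfies A" as trace inclusion (hypothetical
# Python sketch; the two toy machines and alpha are invented for
# illustration).  Enumerate event traces up to length k and test
# that every trace of C, renamed via alpha, is a trace of A.

def traces(initials, delta, k):
    """All action sequences of length <= k that the machine can perform."""
    result = {()}
    frontier = {((), s) for s in initials}
    for _ in range(k):
        frontier = {(tr + (a,), s2)
                    for tr, s in frontier
                    for (s1, a, s2) in delta if s1 == s}
        result |= {tr for tr, _ in frontier}
    return result

# Abstract machine A: a two-state light flicked on and off.
delta_A = {("off", "flick", "on"), ("on", "flick", "off")}
# Concrete machine C: the same shape under a different action name.
delta_C = {("off", "toggle", "on"), ("on", "toggle", "off")}
alpha = {"toggle": "flick"}             # the renaming function

beh_C = {tuple(alpha[a] for a in tr) for tr in traces({"off"}, delta_C, 4)}
beh_A = traces({"off"}, delta_A, 4)
assert beh_C <= beh_A                   # every renamed trace of C is in Beh(A)
```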
Why does this definition of satisfies make sense? Viewing C as an imple-
mentation of a design specification A, this definition says that an implementor
makes decisions that restrict the scope of the freedom allowed by a design. In
other words, the specification is saying what may or is permitted to occur at the
implementation level, not what must occur. The implementation narrows down
the choice of what is allowed to happen. For example, an implementation might
reduce the amount of nondeterminism allowed by the specification.
Thus, certainly having the behavior sets equal (Beh(C) = Beh(A)) is too strong.
Having the subset relation go the other way (Beh(A) ⊆ Beh(C)) cannot be right
either; otherwise there may be executions of the concrete machine that are not
permitted by the abstract one.
2 In all the examples, the renaming function will be obvious.
According to this definition of correctness the concrete machine with the empty
behavior would be a perfectly acceptable implementation of an abstract machine.
Since the empty behavior is not very satisfying, we normally assume that the set
of initial states and the state transition function for the concrete machine are both
non-empty; thus its behavior is non-empty.
The empty behavior case is the extreme case where the concrete machine does
not do anything bad (a safety property) since it does not do anything at all; how-
ever, the machine also does not do anything good (a liveness property). Our def-
inition of satisfies does not require that our machine do anything; the definition
requires only that the machine does only allowed things.
[Figure: commuting diagram for a concrete action c taking state y to y′, with AF mapping y and y′ to the corresponding abstract states.]
12.3.2 Rationale
Why does this proof technique make sense? The technique should smell familiar.
It is inductive in nature. There is a base case (“for each initial state . . .”) and an
inductive step, defined in terms of all possible action instances (“for each state
transition . . .”). As before, because the action sets are finite, we do a big case
analysis, one action per case, in the inductive step.
• In the base step, notice there is no requirement that all initial states of the
abstract machine be covered. So, there may be some abstract executions
that have no corresponding concrete execution since we could not even get
started. This is okay since we only need to show a subset relationship be-
tween behavior sets.
• The intuition behind the inductive step is that if a state transition can occur
in the concrete machine then it must be allowed to occur in the abstract ma-
chine. If we show the inductive step for all state transitions in the concrete
machine, then we will have shown it for all of its possible executions. (No-
tice that in showing the commuting diagram for each action of the concrete
machine, we are really showing it for each state transition that involves that
action.)
After showing the base case and inductive step, we will have shown that each
trace in the behavior set of the concrete machine has a corresponding (modulo the
abstraction function) trace in the behavior set of the abstract machine. Hence, the
behavior set of the concrete machine is a subset of the behavior set of the abstract
machine (modulo the abstraction function), which is what we needed to prove.
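The two obligations can be phrased operationally for finite machines: check the base case on initial states, and check that every concrete transition commutes with AF. A hypothetical Python sketch, with a toy counter/parity instance:

```python
# The two proof obligations, phrased as checks over finite machines
# (hypothetical Python sketch): AF maps each concrete initial state
# to an abstract initial state, and each concrete transition
# commutes with AF.

def satisfies(C, A, AF, alpha):
    I_C, delta_C = C            # (initial states, transition set)
    I_A, delta_A = A
    # Base case: each concrete initial state maps to an abstract one.
    if not all(AF[s] in I_A for s in I_C):
        return False
    # Inductive step: the diagram commutes for every concrete step.
    return all((AF[s], alpha[a], AF[s2]) in delta_A
               for (s, a, s2) in delta_C)

# A toy instance: a two-state counter implementing a parity machine.
C = ({0}, {(0, "bump", 1), (1, "bump", 0)})
A = ({"even"}, {("even", "step", "odd"), ("odd", "step", "even")})
AF = {0: "even", 1: "odd"}
alpha = {"bump": "step"}
assert satisfies(C, A, AF, alpha)
```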
Aside: In our proof technique, because of our simplistic way of associating
actions in one machine to another (through the one-to-one function α) we are
associating a single state transition in C to a single state transition in A. More
generally, a state transition in C might map to a sequence of transitions in the ab-
stract machine A. This generalization is especially needed for proving concurrent
state machines correct and/or dealing with state machines with internal as well as
external actions. We will return to this generalization in Chapter ??.
In general, AF is many-to-one; requiring it to be one-to-one is not a
good idea, in large part because there is often more than one way of representing
each state of the abstract machine. For example, suppose we represent a set by a
sequence. Then many different sequences, e.g., ⟨3, 5, −1⟩, ⟨5, 3, −1⟩, ⟨−1, 5, 3⟩,
⟨3, 5, 5, −1⟩, could (given the appropriate definition of AF and RI) all represent
the same set {3, 5, −1}.
AF may be partial. Not all states of the concrete machine may represent a
“legal” abstract state. For example, in the integer mod-7 example, integers not
within 0..6 are not “legal” representations of days of the week. The representation
invariant serves to restrict the domain of the abstraction function. We may assume
that for any concrete state for which the representation invariant does not hold, AF
is undefined.
Finally, AF is not necessarily onto. There may be states of the abstract ma-
chine that are not represented by any state of the concrete machine. This can be
true of initial states as well. In the context of showing that one machine adequately
implements another, this may sound strange; we say more on adequacy in Section
12.3.4.
[Figure: the abstraction function AF mapping concrete states to abstract states; the concrete states are divided into a region where RI = true and a region where RI = false,]
where the bottom unshaded region represents the domain of AF and the top un-
shaded region represents its range.
Adequacy
In this chapter we explicitly stated that AF need not be surjective (onto). Were
we to require AF to be surjective, then we would require that every abstract state
have some concrete representation, i.e., that AF is adequate. Requiring AF to be
adequate makes very good sense since we might like to know that every abstract
state we have modeled is implemented by some concrete state. Some refinement
methods like VDM require that AF be adequate. And, in proving the correctness
of an abstract data type, we usually require AF to be adequate for the concrete
representation type.
As mentioned, we also do not require adequacy in the sense of having every
state transition of A be implemented in terms of one in C. Rather, we only require
that every state transition in C relate to some state transition of A. (C cannot
do anything not permitted by A.) This laxity is in contrast to how we defined
whether one binary relation, RC , satisfies another, RA ; we required that for each
input related by RA , RC should be defined. This requirement gets at the adequacy
of RC viewed as an implementation of RA .
Taking this last point to an extreme, we do not even require that every action
in A actually be “implemented” by some action (or sequence of actions) in C. In
other words there may be state transitions, all associated with a particular action
of A, that have no correspondence (i.e., are not adequately represented) in C.
We will see later in Part III when we cover Z and CSP that other state machine-
based models impose different kinds of adequacy restrictions in defining a refine-
ment/correctness relation between two machines.
Abstraction Relations
Some people prefer to use abstraction relations or abstraction mappings more
generally than functions. There are examples where it is easier, more convenient,
or more natural to map a concrete state to a set of abstract states. As mentioned,
however, we would lose the substitution property of abstraction functions.
Auxiliary Variables
Sometimes it is not so straightforward to prove a concrete machine satisfies
an abstract machine in terms of just the state variables of the concrete machine.
In this case, we need to introduce auxiliary variables (sometimes called dummy
variables or history variables in the literature). The need for auxiliary variables
in such proofs is especially common for reasoning about concurrent programs.
Further Reading
Exercises
Chapter 13
Relating State Machines: Two Examples
13.1 Days
Here is a simple example to show how an “integer mod-7” counter state machine
satisfies a “days of the week” machine. Both the abstract and concrete machines
have a finite number of states, a finite number of transitions, and infinite traces.
The proof of correctness uses an abstraction function that is one-to-one.
[Figure: Day's Interface (action tick) and state transition diagram: the states sun, mon, tues, wed, thurs, fri, sat arranged in a cycle, each linked to the next by a tick transition.]
Day is like an enumerated type in Pascal; its set of states is just the days of the
week:
Day = (
{sun, mon, tues, wed, thurs, fri, sat},
{sun},
{tick},
δA = {(sun, tick, mon), (mon, tick, tues), (tues, tick, wed), (wed, tick, thurs),
(thurs, tick, fri), (fri, tick, sat), (sat, tick, sun)}
).
[Figure: Mod7Counter's Interface (action inc) and part of its state transition diagram: the states 0 through 6 in a cycle linked by inc transitions, with states such as 7, 59, and −1 not reachable.]
Mod7Counter = (
{x : int},
{0},
{inc},
δC = {(0, inc, 1), (1, inc, 2), (2, inc, 3), (3, inc, 4), (4, inc, 5),
(5, inc, 6), (6, inc, 0)}
).
The Mod7Counter concrete machine has a single action, inc, which is defined
in the obvious way. Let’s intentionally define Mod7Counter’s set of states to be
the set of integers rather than just the integers between 0 and 6, inclusive. (What
are the reachable states of Mod7Counter?)
It should be pretty obvious that the Mod7Counter machine satisfies the Day machine.
But, let’s go through the steps.
AF : int ⇸ {sun, mon, tues, wed, thurs, fri, sat}
AF(0) = sun
AF(1) = mon
...
AF(6) = sat
RI : int → bool
RI(i) = 0 ≤ i ≤ 6
The representation invariant just says that the only integer values we
have to worry about are between 0 and 6 inclusive. It characterizes the
domain of AF.
[Figure: AF mapping the integers 0 through 6 to the days sun through sat; integers such as 7, 59, and −1 lie outside AF's domain.]
[Figure: commuting diagram for inc taking y to y′, with AF mapping y and y′ to the corresponding Day states.]
In other words, we need to show1 that δA (AF(y), tick) = AF(δC (y, inc))
for all y that satisfy the representation invariant, i.e., for all y ∈ {i : int |
0 ≤ i ≤ 6}.
1 Note the use of functional notation for the state transition relations δA and δC ; more
importantly, because they are both functions, it is sound to use equational reasoning in the proof.
In the last part of the proof above, we did not really do it as shown, but rather
we reduced both sides of the equation at the same time, yielding sun = sun. We
can do that because equality is bi-directional:
δA (AF(6), tick) = AF(δC (6, inc))
δA (sat, tick) = AF(0) def’ns of AF and δC
sun = sun def’ns of δA and AF
The above proof is more readable and it is perfectly acceptable. Just remember
that we need to give our justification next to each proof step if it is not obvious or
clear from context.
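The same equational check can be replayed mechanically for all seven representation values. A hypothetical Python sketch of RI, AF, δA, and δC:

```python
# Replaying the Days proof mechanically (hypothetical Python sketch):
# for every y satisfying RI, check delta_A(AF(y)) = AF(delta_C(y)),
# where delta_A is tick and delta_C is inc.

days = ["sun", "mon", "tues", "wed", "thurs", "fri", "sat"]

def RI(y):
    return 0 <= y <= 6

def AF(y):
    assert RI(y), "AF is partial: defined only where RI holds"
    return days[y]

def delta_C(y):                         # the inc action
    return (y + 1) % 7

def delta_A(d):                         # the tick action
    return days[(days.index(d) + 1) % 7]

for y in range(7):                      # one case per representation value
    assert delta_A(AF(y)) == AF(delta_C(y))
```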
13.2 Sets
The motivation for this example is to show that when we do “object-oriented” pro-
gramming, we are really identifying certain abstract objects (better known as data
abstractions or abstract data types) like sets, stacks, queues, symbol tables, etc.
We eventually have to realize (i.e., implement) these objects in a real program-
ming language in terms of either other abstract objects or the language’s built-in
data objects like sequences, arrays, records, linked lists, etc. After we write our
(concrete) implementation we are then faced with proving it correct with respect
to the (abstract) specification.
Not surprisingly, these data objects (abstract or built-in) can themselves be
viewed as little state machines. So, to show the correctness of an implementa-
tion of an abstract object is very much like showing that one state machine (the
concrete one) satisfies another (the abstract one).
There are other proof techniques used to prove the correctness of the imple-
mentations of abstract data types.
Set = (
{s : {t} → set[int]},
{s : {t} → set[int] | s(t) = ∅},
{insert(i : int)/ok(), . . . see above . . . , pick()/ok(int)},
δA = . . . see next page . . .
).
Here are specifications of the actions, insert, delete, card, member, and pick.
insert(i : int)/ok()
pre true
post t′ = t ∪ {i}
delete(i : int)/ok()
pre true
post t′ = t \ {i}
card()/ok(int)
pre true
post t′ = t ∧ result = #t
member(i : int)/ok(bool)
pre true
post t′ = t ∧ result = (i ∈ t)
pick()/ok(int)
pre t ≠ ∅
post t′ = t ∧ result ∈ t
[Figure: Seq's Interface and part of its state transition diagram, showing states q = ⟨3, 5⟩ and q = ⟨3, 5, −1⟩ and transitions addh(−1)/ok() and fetch(2)/ok(5).]
We define the actions such that the state where q = ⟨3, 5, 5⟩ is unreachable. We
will see why soon.
Seq = (
{s : {q} → seq[int]},
{s : {q} → seq[int] | s(q) = ⟨⟩},
{addh(i : int)/ok(), . . . see above . . . , fetch(i : int)/ok(int)},
δC = . . . see next page . . .
).
addh(i : int)/ok()
pre i ∉ ran q
post q′ = q ⌢ ⟨i⟩
remh()/ok(int)
pre q ≠ ⟨⟩
post q = q′ ⌢ ⟨result⟩
size()/ok(int)
pre true
post q′ = q ∧ result = #q
isin(i : int)/ok(bool)
pre true
post q′ = q ∧ result = (i ∈ ran q)
fetch(i : int)/ok(int)
pre q ≠ ⟨⟩ ∧ i ∈ dom q
post q′ = q ∧ result = q i
Step 1
We first need to define an abstraction function and a representation invariant.
1. Abstraction Function.
Informally AF takes an (ordered) sequence of elements and turns it into an
(unordered) set of the sequence’s elements. Formally,
AF : seq[int] ⇸ set[int]
AF(⟨⟩) = ∅
AF(q ⌢ ⟨e⟩) = AF(q) ∪ {e}
For example, the sequence values ⟨3, 5, −1⟩, ⟨5, 3, −1⟩, and ⟨−1, 5, 3⟩ all map to
the same set value, {3, 5, −1}.
2. Representation Invariant.
Notice that the addh action has a pre-condition that checks whether the ele-
ment to be inserted is already in the sequence. Thus, only sequence values that
have no duplicate elements serve to represent set values. We have the following
representation invariant, which characterizes the domain of AF:
RI : seq[int] → bool
RI(q) = ∀ i, j : 1 . . #q • i ≠ j ⇒ q i ≠ q j
All those sequence values that RI maps to true are legal representations of set
values.
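For concreteness, here is a hypothetical Python sketch of this AF and RI, with sequences modeled as lists and sets as frozensets:

```python
# AF and RI for the sequence representation of sets (hypothetical
# Python sketch: sequences as lists, sets as frozensets).

def RI(q):
    """Representation invariant: q has no duplicate elements."""
    return len(q) == len(set(q))

def AF(q):
    """AF(<>) = {}; AF(q ^ <e>) = AF(q) U {e}. Defined only where RI holds."""
    assert RI(q), "AF is partial: its domain is characterized by RI"
    return frozenset(q)

# Different legal representations of the same set value:
assert AF([3, 5, -1]) == AF([5, 3, -1]) == frozenset({3, 5, -1})
# A sequence with duplicates is not a legal representation:
assert not RI([3, 5, 5, -1])
```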
IMPORTANT: Remember that there is a side proof that we need to do here.
We need to show that the representation invariant is indeed an invariant. That is,
along the lines of the proof technique described in Chapter 10, we need to show
that the invariant is established in the initial states and preserved by each action.
We leave this part of the proof as an exercise to the reader.
Here is a picture illustrating that AF is partial and many-to-one:
[Figure: AF is partial and many-to-one: several sequence values in seq[int] map to the same set value {3, 5, −1} in set[int]; sequences for which RI is false lie outside AF's domain.]
Step 2
Armed with RI and AF, we now show the concrete machine satisfies the abstract
one.
1. Initial condition.
We need to show that each initial state of Seq maps to some initial state of Set.
More formally, we need to show that AF(⟨⟩) = ∅. This is obviously true by the
definition of AF.
2. Commuting diagram for each Seq action.
We need to show this diagram:
[Figure: commuting diagram: a seq-action takes q to q′, the corresponding set-action takes t to t′, and AF maps q to t and q′ to t′.]
There are five cases, one for each Seq action. Let’s just do three actions, addh,
remh, and size.
Case 2: i ∈ ran q
If i is already in the sequence then no state transition occurs and q stays the
same.
choose a particular state transition involving delete—the one for which result is
passed as an argument. We have:

δA (AF(q), delete(result)/ok())
= δA (AF(q′ ⌢ ⟨result⟩), delete(result)/ok())   post-condition of remh
= δA (AF(q′) ∪ {result}, delete(result)/ok())   def’n of AF
= (AF(q′) ∪ {result}) \ {result}   post-condition of delete (and def’n of δA)
= AF(q′)   properties of set union and set difference
= t′   t′ = AF(q′)
Second, we need to show that size does not change the abstract value of the
set that q represents. More formally,

q′ = q
⇒   post-condition of size
AF(q′) = AF(q)
⇒   AF is a function
t′ = t
For the homework exercises, it is fine to give informal proofs like the ones
given here.
Further Reading
Exercises