Automatic Program Construction Techniques
In the brief history of computers, programming has
remained stubbornly complex and time-consum-
ing—demanding that the vague, often intuitive ideas
of the mind be translated into the precise and easily
executable code of the machine. More recently,
however, computer science researchers have
sought to construct flexible algorithms that can au-
tomate the programming process and, thus, free
unsophisticated computer users to perform novel
programming tasks. Automatic Program Construction
Techniques describes a range of these new meth-
odologies, bringing together the varied findings of
more than 40 distinguished international authori-
ties.
Automatic Program
Construction Techniques
Editors
Alan W. Biermann
Gérard Guiho
Yves Kodratoff
Preface
SECTION I
INTRODUCTION
SECTION II
DEDUCTIVE AND TRANSFORMATIONAL METHODS
SECTION IV
OTHER TECHNIQUES FOR SYNTHESIS AND ANALYSIS
Invariance Proof Methods and Analysis Techniques for Parallel Programs 243
Patrick Cousot and Radhia Cousot
SECTION V
PROGRAM SYNTHESIS FROM EXAMPLES
SECTION VI
LEARNING
Computer programming has historically been considered an activity reserved for human beings. The
sequential lines of code for the machine must be carefully assembled to achieve the desired goals and it has
been assumed that humans must be the authors of these commands. However, in recent years, a number of
researchers have been examining the possibility that the code might be written automatically, and that the com-
puter users might be required to specify only the goals for the desired program rather than the program itself.
These goals might be given in various forms. They might be composed of a formal specification of the
needed input-output behavior, they might be described by an informal interaction with the user, and they could
include examples of the desired behavior or other fragmentary information. The result, however, would be that
humans could obtain useful work from machines on novel tasks without the necessity of carefully coding a pro-
gram in the traditional sense. The user would interact with the machine in a convenient dialog, and the
machine would program itself to do the required job.
At the core of any such automatic programming system is a mechanism that assembles fragments of infor-
mation into machine executable code. The mechanism must be able to specify needed data structures, build
sequences of commands with the required loops and branches, construct appropriate subroutines, and complete
other tasks related to the construction of programs. This book describes the results of several dozen researchers
who have been able to build such mechanisms. It contains chapters by most of these individuals, giving their
techniques in considerable detail and including examples of their methodologies. The volume should be ade-
quate to give the reader a good view of the field of automatic programming techniques and it points the way
toward many new avenues of research. Sections of the book sequentially cover deductive and transformational methods, the specification and use of abstract data types, other techniques for synthesis and analysis, program synthesis from examples, and learning.
This book emphasizes construction techniques and does not include coverage of any of the large
automatic programming systems such as those developed during the 1970’s by Robert Balzer at Information
Sciences Institute, Cordell Green at Stanford, George Heidorn at IBM, William Martin at MIT, and others. Such
large systems need to have a user interface, large databases of programming knowledge, a control system for
monitoring dialog and invoking special functions and so forth. This volume examines the central mechanisms
related to the synthesis process which will be at the heart of any complete automatic programming system.
However, it does not include coverage of such large systems which is a substantial research topic in its own
right.
This book is the outgrowth of a meeting that was held several years ago at the beautiful Centre Culturel
de Bonas in southern France. The sponsors were the U. S. Army Research and Standardization Group, the
French Centre National de la Recherche Scientifique and the Institut National de Recherche en Informatique et
Automatique. We are greatly indebted to the Duke University Department of Computer Science which has pro-
vided facilities for preparation of the manuscript and we appreciate the encouragement of the Chairman, Mer-
rell Patrick. We would like to especially thank Patricia Land who typed most of the chapters and Steffani Webb,
David Mutchler, Bruce Smith, and Elaine Levine who did considerable debugging and formatting in the final
stages of the project. The phototypesetting was done by UNICOMP of Los Alamos, New Mexico.
Alan W. Biermann
Gérard Guiho
Yves Kodratoff
SECTION I
INTRODUCTION
CHAPTER 1
OVERVIEW
Alan W. Biermann
Duke University
Durham, NC 27706
A. Introduction
The origins of a computer program begin in the mind of a human being and may have many forms.
There may be a vision of how the output should look, a few ideas concerning the variety of inputs that will be
encountered, some vague thoughts concerning how the computation should be done, some specific examples of
desired or disallowed behaviors, and many other kinds of fragmentary information. On the other hand, the tar-
get computer program is neither vague nor fragmentary but is a very well defined sequence of instructions that
can be loaded into a machine and executed. Computer programming is the process of translating those vague
inclinations in the mind of a human into machine executable code that will perform the desired action.
The fundamental processes needed for the creation of code have been studied from two viewpoints. The one examines human processes in analyzing and solving problems and tries to generate new methodologies of programming. The other tries to discover what can be totally or partially automated in the process of code creation. We shall consider the second only, which is concerned with how to select programming constructs to implement specifications, how to utilize fragmentary information, how to synthesize code from examples of desired behavior, how to utilize domain knowledge, and any of many other processes. These methodologies
have been called automatic program construction techniques, and many of the important results are described in
this volume.
The following sections of this chapter introduce these methodologies. We first will examine the process
of synthesizing programs from formal specifications. Next abstract data structure methodologies will be
described followed by a section on synthesis related issues, program manipulation and correctness. In Section
E, techniques for synthesizing programs from examples will be described. The final section will describe recent
advances in learning theory and their relationship to automatic programming.
Most of the studies in this introductory chapter will involve a very brief introduction to individual tech-
niques and then an examination of how each approach can be used to create a particular example program.
Where possible, the same example has been used so that comparisons can be made. Some of the examples
outlined in this chapter are completed in later chapters or the Appendix. All of these sections necessarily leave
out much detail but enough is included to give the flavor of each approach and its general characteristics. The
later chapters, of course, are designed to give full coverage.
B.1 Introduction
The first synthesis methods to be studied here will be the formal logic based systems. These methodolo-
gies encode information in the ‘‘well formed formulas’’ of logic and utilize deductive procedures in the creation
of programs. We will examine in this section the deductive approach of Manna and Waldinger (as is described
in Chapter 2), the heuristic but still formal methods of Bibel and Hornig (Chapter 3), and the goal reduction
methodology of Follett (Chapter 4). In each case, we present only an overview of the methodology with all
details deferred to the later chapters.
In order to make the discussions concrete and to allow comparisons, the approaches will each be demon-
strated on the same example problem which can be defined as follows: a program f is to be created that reads
an input list x of integers and returns x with all negative integers removed. For example, if f is given the list
(-7 2 9 -3 4), it returns (2 9 4). If f is given NIL, the list of length zero, it returns NIL.
In more formal language, we will write the input specification as P(x) where P is a predicate yielding
‘‘true’’ if x is a list of integers and ‘‘false’’ otherwise. The input-output specification will be represented as
R(x,z) which is true if and only if x is a list of integers and z is x with all negative integers removed. We will
examine in this chapter how seven program synthesis methods approach this problem and create an appropriate
program.
The Manna-Waldinger approach employs a tabular notation called a sequent which has three columns
labelled as assertions, goals, and outputs. A sequent will be typically written as
assertions                    goals                    outputs
A1(a,x)
A2(a,x)
   ...
An(a,x)
                              G1(a,x)                  t1(a,x)
                              G2(a,x)                  t2(a,x)
                                 ...
                              Gm(a,x)                  tm(a,x)

The sequent has the same meaning as the sentence

if    ∀x A1(a,x) and
      ∀x A2(a,x) and
         ...
      ∀x An(a,x)
then  ∃x G1(a,x) or
      ∃x G2(a,x) or
         ...
      ∃x Gm(a,x).
If x=e is the particular value of x such that some Gi(a,e) is true, then the corresponding output ti(a,e) may be
thought of as the program to compute the specified output. All of this can be illustrated by continuing the
example. The theorem to be proven for this particular synthesis can be written directly in sequent form.
assertions                    goals                    outputs
P(a)
                              R(a,z)                   z
Thus if P(a) is true, we seek z such that R(a,z) and that z is the desired output.
The deductive approach provides techniques for adding assertions and goals with their appropriate output
entries to the initial sequent. The aim is to deduce a goal of ‘‘true’’ with a corresponding output entry in terms
of primitive machine instructions. We give one method for making additions to the sequent here.
One technique for adding lines to the sequent is called GG-resolution. Suppose there are two goals G1 and G2 in the sequent with subsentences P1 and P2, respectively, that can be unified. That is, we assume there is a substitution θ such that P1θ and P2θ are identical: P1θ = P2θ. Suppose further that G1 and G2 have output entries t1 and t2. Then the following goal and output may be added:

goal                                                   output
G1θ[P1θ → true] ∧ G2θ[P2θ → false]                     if P1θ then t1θ else t2θ
This output is correct if the new goal is true for the following reason. If the new goal is true, each conjunct
must be true. Thus P1θ = true and G1θ[P1θ → true] is true imply t1θ will be the correct output. This is by
definition of t1. If P1θ = false, it is not known what G1θ[P1θ → true] may evaluate to. However,
G2θ[P2θ → false] is true by the above argument, meaning t2θ is the correct output. So the output should
be ‘‘if P1θ then t1θ else t2θ’’ in either case.

Suppose, for example, that the sequent contains the two goals

goal                          output
(a = NIL)                     NIL
not (a = NIL)                 t
Let P1 = P2 = (a = NIL) and let θ be the null substitution. Then the result of a GG-resolution is

goal                                     output
true and not (false)                     if (a = NIL) then NIL else t

which reduces to

goal                                     output
true                                     if a = NIL then NIL else t
The aim of the Manna-Waldinger deductive technique is to deduce a goal of ‘‘true’’ (or an assertion of ‘‘false’’) with a corresponding output entry in terms of primitive machine instructions. After a series of steps similar to the GG-resolution described above, the method generates the final goal and program described in the Appendix. (Note: A brief introduction to LISP notation appears in the Appendix for readers unfamiliar with this language.)
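As a hedged illustration only (not the derivation's own output notation), an executable LISP program consistent with that result, and with the recursion relations used later in this chapter, would be

(defun f (x)
  (cond ((null x) nil)                        ; the goal a = NIL gives output NIL
        ((minusp (car x)) (f (cdr x)))        ; drop a negative first element
        (t (cons (car x) (f (cdr x))))))      ; keep a nonnegative first element

;; (f '(-7 2 9 -3 4))  =>  (2 9 4)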
Once a complete synthesis has been made using the Manna-Waldinger technique, one can make some
comments on its characteristics. First one should notice how naturally program control structures emanate
from the deduction. For example, GG-resolution, which is a form of the ordinary resolution of the theorem
proving literature (Robinson [1965]), generates an if-then-else form. The resulting code is a logical result of
the deductive step that is being made. Similarly, recursive looping is introduced through an inductive argument
and a resolution step. The well-foundedness of these individual steps and the consequent strength of the com-
plete derivation yields a very satisfying mathematical path from axioms to final program. Once such a synthesis
is complete, one has the feeling that the program is completely understood from the most basic logical building
blocks.
Furthermore, because of the flexibility of logic, one can have confidence that a wide variety of structures
can be handled in aesthetic ways. Thus Manna and Waldinger have pointed out that the method easily handles
conjunctive goals where code is to be generated satisfying more than one specification. Furthermore, various
kinds of quantification in the specification and also subroutines are dealt with in a straightforward manner as
described in Chapter 2.
In the next section, we will examine methods by Bibel and Hornig that use a nondeductive logic based
synthesis method. Their method also emphasizes the use of heuristics to guide the search through the
immense solution space.
INITIALIZE PROBLEM
f
INPUT VARIABLE
x
INPUT CONDITION
list (x)
OUTPUT CONDITION
if x=NIL then f(x) =NIL
otherwise f(x)=y where
∀u [member(u,y) <=> (member(u,x) ∧ not(neg(u)))]
Here list(x) is true if and only if x is a list and member(u,y) is true if and only if u is on list y.
We would expect the system to begin by generating code to handle the trivial case and then seek a recur-
sive solution to the general case. Following this strategy would yield the code
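A hedged LISP sketch of this first step, assuming the trivial case is the empty list and writing the not-yet-synthesized general case as a stub g, is the following.

(defun g (x)
  (declare (ignore x))
  (error "general case not yet synthesized"))   ; to be replaced by the recursion scheme

(defun f (x)
  (if (null x)       ; trivial case: the empty list maps to NIL
      nil
      (g x)))        ; general case delegated to g, which is constructed next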
Bibel and Hornig argue that there are relatively few practical recursion schemes and that a program syn-
thesizer need only try those few. Their method would pose the problem, find g(x) such that
The strategy dictates that an element of the input is to be ‘‘guessed’’ and then used in computing the output.
This ‘‘guessed’’ element can then be removed from the input and a recursive call can be made on the function.
The most easily accessed member of input x is its first element car(x), and the method infers that two cases
must be handled, the case where car(x) is not in the output and the one where it is. In the first case, it builds
the recursive call f(cdr(x)) and in the second it constructs cons(car(x),f(cdr(x))). Finally, it finds a test, as
explained in the Appendix, to distinguish between the two and generates
The use of this passback pair can be illustrated in the synthesis of a simple program. Suppose the input
condition P(x) is that x is negative. Suppose the goal condition R(x,z) requires that x be smaller than z.
We propose that the goal can be achieved with the statement z ← 1 and use the passback pair to determine the
required precondition. But G is x < z, so p(G) is x < 1. Since the input condition P(x) requires x to be nega-
tive, the precondition x < 1 is achieved and the program synthesis is complete. The program is the single state-
ment z ← 1 and it will find, for any x such that P(x), a z such that R(x,z).

Another passback pair for z ← 1 is (G does not contain z, I). This states that if the goal following z ← 1
does not mention z, the identity operator may be used to pass the goal back over z ← 1. This means that if the
goal is y = 1 and if z ← 1 is used to try to achieve that goal, the required precondition for z ← 1 is y = 1. Thus the
statement accomplishes nothing toward this goal.
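A hedged LISP sketch of passing a goal back over the assignment z ← 1, with goals represented as S-expressions: the precondition is the goal with the assigned value substituted for the assigned variable, which is the first passback pair used above. The function name is illustrative, not part of the method's notation.

(defun passback-assignment (goal var value)
  "Precondition of GOAL holding before the statement VAR <- VALUE."
  (subst value var goal))

;; (passback-assignment '(< x z) 'z 1)  =>  (< X 1)
;; When GOAL does not mention VAR, the result is GOAL itself, which is the
;; effect of the second passback pair (G does not contain z, I) above.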
Suppose a program is to be created that yields z=1 if x is negative and z=2 otherwise. Then a conditional
program segment is needed
if A then I1 else I2

where A is a predicate and I1 and I2 are programs. If I1 and I2 have passback pairs (S1,p1) and (S2,p2), then the
conditional statement will have the pair (S1 ∧ S2, if A then p1(G) else p2(G)). To generate the current program,
the following substitutions can be made.
A        x < 0
G        (x < 0 ∧ z = 1) ∨ (x ≥ 0 ∧ z = 2)
and one can check that the passback pair is applicable and yields a precondition of ‘‘true’’ indicating the syn-
thesis is correct.
One can similarly approach the canonical example for this chapter. The goal is
where z is list x with negative entries removed. A conditional statement can be proposed to achieve this goal
with I1 = NIL and I2 = g(x) where g(x) is yet to be created. I1 will have passback pair (true, G ∧ f(x) = NIL)
and I2 will have (true, G′) where G′ is G with the requirement removed that f(x) = z where z equals x with nega-
tive entries removed. Carrying out the synthesis as in the previous example yields
The synthesis of g(x) requires another invocation of the if-then-else feature and a loop construction using an
inductive argument similar to that given by Manna and Waldinger.
S = {integer}
0: → integer
s: integer → integer
+: integer, integer → integer
E = {x+0 = x, y+s(x) = s(y+x)}.
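As a hedged illustration, the two equations in E can be read as a LISP computation over terms built from 0 and s (all names below are illustrative and are not part of the specification formalism).

(defun succ (x) (list 's x))           ; the operation s applied to a term
(defun plus (y x)                      ; + computed from the equations in E
  (if (eq x 'zero)                     ; the constant 0 is written as the symbol ZERO here
      y                                ;   y + 0 = y, an instance of x + 0 = x
      (succ (plus y (second x)))))     ;   y + s(x) = s(y + x)

;; (plus (succ 'zero) (succ (succ 'zero)))  =>  (S (S (S ZERO)))   i.e., 1 + 2 = 3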
Many problems arise with this kind of definition. The first one is to discover what has been specified.
We can say that the natural numbers are a suitable model for the previous specification but the integers
modulo n are also models and there are many others that are more or less natural. In fact, it is well known that
a specification specifies a class of multisorted algebras. Usually we restrict ourselves to algebras that are finitely
generated and when this class is reduced to one element (just one model), the type is in some way totally
specified and said to be monomorphic. When that class is empty or reduced to some kind of trivial type where
every sort is composed of only one element, the type is said to be inconsistent.
Many characterizations have been done on types and many difficult problems arise, some of which are still
at the research level. The principal ones are as follows.
Errors. How should errors be handled? For instance, in the last type, if one adds the operation d (decre-
ment x by 1) with the axiom d(s(x)) = x, one must decide how to define d(0). If we select d(0) = error, then
error is a new constant of the sort integer and terms such as s(error), etc., must be defined.
Parameterized types. We can define in the same way the type stack of integers or stack of booleans with
operations in the first case like
This is related to the genericity of packages in ADA, for instance. In fact, it is quite easy at the syntactic level
as in ADA or CLU but in some cases the semantic conditions are quite complex.
Power of specifications. Can we allow every kind of axiom in the specification? What happens to the deci-
sion procedures if we allow conditional axioms like
x ≠ 0 => s(d(x)) = x
∃k s(k) = 0 ?
Only some strong restrictions on the presentation of axioms make it possible to have effective decision pro-
cedures.
Representation of one type by another. When can one say that one type represents correctly another? A
natural way is to require that the class of multisorted algebras specified by the representing class is included in
the class of the other type. A more effective way is to show that a special homomorphism can be constructed
between the two types that will enforce on one the equations of the other. The principal idea of the method of
Gaudel et al. (Chapter 9) is to consider the computation process as a representation of this homomorphism
between the two types which specify the two languages. The chapter shows how this idea can be applied to pro-
gram construction.
Write a program that inputs a list of integers and outputs the list with all negative entries removed.
The processing of natural language and the subsequent creation of an internal problem representation is beyond
the scope of this book. It is dealt with in Balzer [1973], Biermann and Ballard [1980], Green [1977], Green et
al. [1978], and Martin et al. [1974]. We will follow Barstow in this discussion and concentrate on the program
synthesis problem.
The Barstow synthesis method follows a producer-consumer paradigm which assumes a process (the pro-
ducer) that generates pieces of data and a process (the consumer) that uses those pieces to construct an output.
This paradigm provides a very natural way of looking at our example problem and we will show briefly how it
leads to a solution.
The first step is to divide the processing into the producer and consumer subparts, a straightforward
operation in this case. The producer will scan the input list searching for nonnegative values. The consumer
will append them to the end of a list as they are found. We, of course, in this short overview do not describe
the knowledge rules that lead to this decision but refer the reader to Chapter 10 for details.
The next steps construct sequentially the consumer code and the producer code. Addressing the consu-
mer problem first, the output z is to be initialized. Then objects are to be received from the producer and put
on the list. A series of rules built into the system analyze the code requirements and produce the code. For
example, in this case the system uses these rules.
If a consumer builds a set consisting solely of objects received from another process, the set is initially
empty.
If it is known that an object follows every member of an ordered sequence, the object may be added to
the sequence by inserting it at the back.
Initialize:
z:=NIL
Action:
v:= {received from producer}
insert v at back of z
Next we examine the producer which is to move sequentially across the input looking for values to output.
The knowledge based rules must be able to extract from the problem description the fact that nonnegative
values are sought. The generated producer in this case would be the following:
Initialize:
b := x
Termination test:
Is b NIL?
Each step:
u :=first element of b
remove the first element of b
if u is not negative {send u to consumer }
The final step in the synthesis process is the merging of the producer and consumer. Here the system
invokes the following rule:
A transfer program involving a producer and consumer may be refined into a WHILE loop whose condi-
tion is the termination test of the producer.
The producer and consumer are thus merged to become a solution to the problem. We use the function
consright to append an atom to the right end of a list. As usual, we let x and z stand for the input and out-
put.
b := x
z := NIL
while b is not NIL do
    u := car(b)
    b := cdr(b)
    if not(neg(u)) then
        v := u
        z := consright(z,v)
Of course, this program is not minimal. In fact, all the intermediate variables b, u, and v can be removed. But
synthesis rather than efficiency was the aim of this work. The reader might wish to examine Kant [1979] for a
treatment of efficiency concerns.
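A hedged LISP transcription of the merged program above, with the negativity test and the right-end append written out directly rather than through named helpers, is the following.

(defun filter-nonneg (x)
  ;; b scans the input; z accumulates the output, as in the merged program above
  (let ((b x) (z nil))
    (loop until (null b)
          do (let ((u (car b)))
               (setf b (cdr b))
               (unless (and (integerp u) (minusp u))   ; not(neg(u))
                 (setf z (append z (list u))))))       ; consright(z,u)
    z))

;; (filter-nonneg '(-7 2 9 -3 4))  =>  (2 9 4)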
The knowledge based approach to programming is attractive because it attempts to simulate in some ways
the only good programmers known to exist, humans. Standard methodologies for coding, programming tricks,
and other programming knowledge can be built into the system rules to gain a human-like approach to program
generation. Of course, the rules are manipulating mathematically precise objects, namely, lines of programming
code, so a deductive component is needed. This is, in fact, a central point in the Barstow contribution.
In Chapter 11, Back gives a methodology for program generation based upon the concept of the invariant,
a formal specification of the program state at a particular point in the code. This work grows out of the litera-
ture on program correctness (Dahl et al. [1972], Floyd [1967], Gerhart [1972], Hoare [1969]) and is logic
based. A programming language is described for implementation of invariant based programming and examples
are given to illustrate the method.
A second chapter coming from the correctness literature is Chapter 12 by Cousot and Cousot. This work
introduces a unified approach to the study of various program proof and analysis methods and is based upon
state transition systems as models of programs. The approach is used to study the methods of Floyd [1967],
Owicki et al. [1976], and Lamport [1977] and is offered as a technique for creating and analyzing new proof
methods.
Another approach to automatic programming allows the user to make declarative statements relating
objects or variables in a problem domain, and an automatic system ‘‘executes’’ the statements to find a solution
instantiation for certain output variables. The well known PROLOG language is an example of such a system,
and the equational specification language of Chapter 13 by Pnueli, Prywes, and Zarhi is another. This system
accepts an equational specification which relates numerical data in arrays but which includes no direct indication
concerning the method or order of the calculation to be done. The task of the automatic system is to extract
from the equations the operations necessary to do a computation and to program or ‘‘schedule’’ them for exe-
cution.
Chapter 14 by Pettorossi studies an issue in the synthesis of recursive programs, the optimum use of
memory. The strategy involves a technique for automatically keeping track of which locations contain useful
information and releasing those cells which do not.
E.1 Introduction
A complaint often made against the synthesis methods described above is that they require too much pre-
cision and care from the user. That is, it may be in many cases easier to write a program in a traditional pro-
gramming language than it would be to write its formal input-output specifications. We address in this section a
vastly different type of construction problem, program synthesis from examples. Thus in the canonical example
problem of this chapter, instead of constructing the program from input-output specifications, the synthesis will
be done knowing only that the desired program must yield z = (2 9 4) if given input x = (—7 2 9 —3 4). The
synthesis methods will attempt to produce a program that has this behavior and that will function ‘‘similarly”’
on other examples. If the user observes that the automatically created program has shortcomings either by exa-
mining the code or by running examples, then the system can be given additional examples on which to base
the synthesis. A surprising experimental fact that has come out of this research is that the desired program, if
it is small, often can be very quickly converged upon using only this weak source of information.
Synthesis in this section begins with the example input-output pair: (-7 2 9 -3 4) yields (2 9 4). In fact,
we can express this pair with the relationship
f(x) = cons(car(cdr(x)),
cons(car(cdr(cdr(x))),
cons(car(cdr(cdr(cdr(cdr(x))))),
NIL)))
This is a cons structure which also contains selectors (the car and cdr operations) which ‘‘travel inside’’ the data in order to pick up the desired parts. For instance, the car(cdr(x)) operation selects the 2 from the input list and makes it available for constructing the output. This structure also contains the NIL constructor, and the atom predicate will be available to check if some data has reduced to NIL. Finally, this structure contains objects belonging to an external type which itself has properties. For instance, one can build lists of integers and the external type will be very rich, or lists of variables without any special property. Most of the LISP synthesis methods generate programs to manipulate structure without reference to any external type. Our example will, however, reference integers and the type related fact concerning whether or not they are negative.

The following sections briefly introduce three methods for synthesis from examples.
Then the behaviors are numbered and each output is described in terms of its associated input using the cons
tree construction described above. Thus the first output f1(x) is the trivial cons tree NIL. The second output
f2(x) is obtained by using a selector car(x) to obtain a value from the input. Then cons is used to build the
output list.
f2(x) = cons(car(x), NIL)
f3(x) = cons(car(cdr(x)), NIL)
f4(x) = cons(car(x), cons(car(cdr(cdr(x))), NIL))
f5(x) = cons(car(x),
             cons(car(cdr(x)),
                  cons(car(cdr(cdr(cdr(x)))),
                       NIL)))
f6(x) = cons(car(cdr(x)),
             cons(car(cdr(cdr(x))),
                  cons(car(cdr(cdr(cdr(cdr(x))))),
                       NIL)))
This method attempts to find fi and fj with i > j such that fi can be expressed in terms of fj. If a pattern is found
that relates each fi to a previous fj in a regular way, a program may be synthesized. In the current case, we note
that f6 and f5 look the same except for an inner nested cdr function:

f6(x) = f5(cdr(x))

Next we examine f5 and notice it is different from f4. But an instance of f4 with an extra cdr appears in f5, so we
can write

f5(x) = cons(car(x), f4(cdr(x)))
Summers and his followers have developed a number of elegant theorems relating such sequences of recursion
relations with programs. In the current case, we note that fi(x) = NIL if x is NIL, fi(x) = f_{i-1}(cdr(x)) if car(x)
is negative, and fi(x) = cons(car(x), f_{i-1}(cdr(x))) otherwise. So the program is

f(x) = NIL                            if x is NIL
f(x) = f(cdr(x))                      if neg(car(x))
f(x) = cons(car(x), f(cdr(x)))        otherwise
z=cons(car(cdr(x)),
cons(car(cdr(cdr(x))),
cons(car(cdr(cdr(cdr(cdr(x))))),
NIL)))
One can create the desired program by first breaking this expression into a set of primitive forms and then per-
forming a merge operation on these primitives. As described in Chapter 17, the primitive forms to be used will
be
fi(x) = NIL
fi(x) = x
fi(x) = fj(car(x))
fi(x) = fj(cdr(x))
fi(x) = cons(fj(x), fk(x))

Breaking the example expression above into these forms yields the following seventeen functions.

f1(x) = cons(f2(x), f5(x))
f2(x) = f3(cdr(x))
f3(x) = f4(car(x))
f4(x) = x
f5(x) = cons(f6(x), f10(x))
f6(x) = f7(cdr(x))
f7(x) = f8(cdr(x))
f8(x) = f9(car(x))
f9(x) = x
f10(x) = cons(f11(x), f17(x))
f11(x) = f12(cdr(x))
f12(x) = f13(cdr(x))
f13(x) = f14(cdr(x))
f14(x) = f15(cdr(x))
f15(x) = f16(car(x))
f16(x) = x
f17(x) = NIL
The construction procedure then produces the target program by performing merge operations on the
above seventeen functions. The primary tool used is the conditional of LISP:
cond((pi1, fi1),
     (pi2, fi2),
       ...
     (pin, fin))

This function evaluates the predicates pi1, pi2, pi3, ..., pin sequentially until one is found that is true. If pij is true,
then function fij is computed and its value returned. In this application, pin will always be T (true) to guarantee
that the conditional will always yield a value. The functions fij will simply be the primitive functions defined
above. The predicates pij must be created by a predicate constructor from primitives.
We will assume for this example that predicates may be assembled from the primitives atom and neg
operating on any combination of car’s and cdr’s of x. The predicate constructor will be given sets of S-
expressions and will have the task of finding a predicate that can distinguish between them. For example, if it
is given the sets {3} and {(3 4 5)}, it will generate atom(x). In other words, atom(x) yields T for all members
of one set and F for all members of the others. If it is given {(-3)} and {(3 4) (6)}, it will find neg(car(x)).
Such a constructor is easy to build since its only task is to enumerate the class of allowed predicates until a
satisfactory one is found.
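A hedged sketch of such a predicate constructor: it enumerates candidate predicates of the form atom(...) or neg(...) applied to compositions of car's and cdr's of x, and returns the first one that is true of every member of the first set and false for every member of the second. The depth bound and all helper names below are assumptions, not part of the system of Chapter 17.

(defun neg-p (x)
  "The chapter's neg predicate: true only for negative integers."
  (and (integerp x) (minusp x)))

(defun paths-of-length (k)
  "All compositions of car/cdr selectors of length K, as lists of symbols."
  (if (zerop k)
      (list nil)
      (loop for p in (paths-of-length (1- k))
            append (list (cons 'car p) (cons 'cdr p)))))

(defun all-paths (depth)
  (loop for k from 0 to depth append (paths-of-length k)))

(defun apply-path (path x)
  "Apply a composition of selectors to X; selecting into an atom is undefined."
  (if (null path)
      x
      (let ((inner (apply-path (rest path) x)))
        (if (consp inner)
            (funcall (first path) inner)
            :undefined))))

(defun holds-p (test path x)
  (let ((v (apply-path path x)))
    (and (not (eq v :undefined)) (funcall test v))))

(defun find-predicate (set1 set2 &optional (depth 4))
  "Enumerate predicates until one is true on all of SET1 and false on all of SET2."
  (loop for path in (all-paths depth)
        do (loop for test in (list #'atom #'neg-p)
                 when (and (every (lambda (x) (holds-p test path x)) set1)
                           (notany (lambda (x) (holds-p test path x)) set2))
                   do (return-from find-predicate (list test path)))))

;; (find-predicate '(3) '((3 4 5)))       finds atom applied to x itself
;; (find-predicate '((-3)) '((3 4) (6)))  finds neg-p applied to (car x)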
Proceeding with the synthesis, the above seventeen functions are modified as described in the Appendix
and then every possible merger is attempted. An example merger would be to identify two functions that have
the same value as a single function; this is a reasonable merger. A less obvious merger would be to identify two
functions with different forms as a new function g. Since the forms differ, a conditional is needed, and g can be
written with either form tested first. If predicates p1 and p2 can be found so that either ordering can be used to
successfully compute the original example, then the merge is successful. If no such p1 and p2 can be found, the
merge is not successful, and the two functions are proven to be disjoint.
" one this merge process can be carried out and it will generate the following code as shown in the
ppendix.
f(x) =cond((atom(x) , x)
(neg(car(x)), f(cdr(x)))
(T, cons(f3(x), fs5(x))))
f3(x) =f; (car(x))
f5(x) = fy (cdr(x))
two randomly chosen examples are sufficient for the synthesis of most simple programs. Secondly, the method
can be defined to be completely algorithmic and so can be programmed to be as reliable as any compiler. In
Chapter 17, we describe a version of this system that will generate any program in the class of ‘‘regular’’ LISP
programs. The construction of the trace and the modification of its form described here is a simple computa-
tion as explained in Chapter 17. The merger computation and the predicate generation are also simple algo-
rithmic procedures but they are expensive. Unfortunately, the merger process is exponentially costly in the
length of the target program so that only short programs can be generated.
The program construction method described next grew out of experience gained with this function
merger technique. It attempts to maintain the simplicity and reliability of the current method and to simultane-
ously increase the speed of the construction.
The rule has two parts, the nonterminal symbol [P2, (X0 XL), next] and the generated string
P2(X0, XL) = cons(car(X0), next). The nonterminal symbol [P2, (X0 XL), next] can be interpreted to say that
program P2 is to be defined with arguments (X0 XL) and that the result of its computation is to be appended
onto ‘‘next’’. The arrow => means ‘‘generates’’ so this rule can be understood to say that the nonterminal
where w=23, XL=X4 Xo, and next=NIL. Thus a function P’; has been defined with parameters Xo,X4,X6,
and the value is computed as shown.
Another production rule of interest is the following which generates looping code.

Pi(Xi XL) =
    cond(((Pi entry check), next)
         (T, Pkw(Xi ... XL)))
Here the nonterminal [Pi, (Xi XL), next] generates two things, some LISP code and another nonterminal,
[Pkw, ...]. Again, a number of variables need to be instantiated in order to use this rule:
i, w, XL, next, (Pi entry check), Pkw, and m. The functioning of this rule can probably best be understood by
examining its action in a complete synthesis.
As an example, suppose a program is needed to generate from input (A B C D) the output (A C). Then
the system would first compare the input with output and, using the method described in Chapter 17, deter-
mine that the required behavior can be achieved with one scan of the input. It would then select the only rule
needed for that scan, the looping rule shown above, and set the variables at appropriate values. Specifically,
only one rule is needed, i = 1, the needed entry check for the loop is (atom(X1) or atom(cdr(X1))), the called
routine Pkw should be the rule that appends atoms to the output, and the decrement m across the input list should be
two. So the looping rule now reads
The production rule method then creates the program by expanding the nonterminal [P1, (X1), NIL]. Thus to
do this expansion with the above rule we set

next = NIL.
(a) P1(X1) =
        cond(((atom(X1) or atom(cdr(X1))), NIL)
             (T, P2(X1)))
The new nonterminal can be expanded using the first rule given above if
w = 1
XL = X1
next = P1(cddr(X1))

to yield
(T, P2(X2, X2)))
P2(X1, X2) = cond((not(neg(car(X1))), P21(X1, X1, X2))
                  (T, P2(cdr(X1))))
P21(X0, X1, X2)
    = cons(car(X0), P2(cdr(X2)))
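A hedged, runnable LISP rendering of the program these expansions aim at for the (A B C D) → (A C) example above, with the entry check, the appended atom, and the decrement of two instantiated as described in the text (the name p1 is illustrative):

(defun p1 (x1)
  (cond ((or (atom x1) (atom (cdr x1))) nil)   ; the (P1 entry check), with next = NIL
        (t (cons (car x1) (p1 (cddr x1))))))   ; append car(X1), recurse with a decrement of two

;; (p1 '(A B C D))  =>  (A C)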
F. Learning
F.1 Introduction
The above sections have concentrated on the generation of programs written in traditional languages such
as LISP. However, learning theorists have been developing synthesis methods for other structures such as logi-
cal formulas, and these structures are close enough to being executable programs to be of interest to us.
The model for the learning theorist is similar to the synthesis-from-examples paradigm of the previous
section. Samples of some kind of behavior such as input-output pairs are given and a generalization from them
is to be made. However, a learning system may construct a logical formula, grammar, or other entity to
represent the generalization. The function of the learning system is to build a representation of the relationship
between the input and the output so that if it is given one, it can find the other.
For example, suppose it is desired to train a system such that if its input x is a list of integers, its output
z is that list with the negative ones removed. Then we might begin the training process with the input-output
pair
x=NIL, z=NIL.

From this pair the system would form the assertion

x = NIL ∧ z = NIL

and then attempt to find a generalization of it. Following the strategy of Cohen and Sammut in
Chapter 21, the system would note that this assertion is equivalent to

x = z ∧ x = NIL

and would generalize it by dropping a conjunct to obtain

x = z.
This assertion thus becomes its first guess at the relationship between x and z.
This is clearly not the desired relationship, so another sample of the target behavior can be given.
x=(-1), z=NIL
From the new sample the system would assemble the fact
x.head.sign = "-" ∧
x.head.mag = 1 ∧
x.tail = NIL ∧
z = NIL
using the notation of Chapter 21. Following the same generalization strategy, the formula becomes

x.head.sign = "-" ∧
x.head.mag = 1 ∧
x.tail = z
This expression still does not represent the target behavior, but one can begin to see signs of movement
in the correct direction. Thus, one can imagine that, after several more learning steps, this expression could be
transformed into the equivalent of ‘‘if x.head.sign = "-" then the output of the current computational iteration
should be x.tail.’’
This complete concept learning task is described in Chapter 21 and the resulting learned concept,
‘delete’, is the following.
delete =
    [x,z: x = NIL ∧ x = z
      ∨ [∃ P,Q: x.head.sign = "-"
                ∧ x.head.mag = P
                ∧ x.tail = Q
                ∧ x.cardinal(P)
                ∧ delete(Q,z)]
      ∨ [∃ P,Q: x.head.sign = "+"
                ∧ x.tail = P
                ∧ z.tail = Q
                ∧ z.head = x.head
                ∧ delete(P,Q)] ]
The meaning is that either
Cohen and Sammut argue this concept is executable like a program and they explain why. For example, if
x = (-7 2 9 -3 4), their system will find z = (2 9 4) to satisfy this concept. That is, this concept is like a
PROLOG program, a set of logical assertions which are executable. For another example of this kind of work,
see Shapiro [1981].
Example 1.
Let x, y, u, v, and w be variables. Let t1 = cons(cons(car(x),NIL), cdr(y)) and
t2 = cons(v, cdr(cons(car(w),NIL))). Then the substitution

s = (v \ cons(car(x),NIL), y \ cons(car(w),NIL))

is such that

st1 = st2 = cons(cons(car(x),NIL), cdr(cons(car(w),NIL)))
In learning, we are concerned with a restriction of unification to the case where there is an s such that
st1 = t2. Given two terms t1 and t2, we will say t1 is more general than t2 when there is a substitution s such
that st1 = t2. Intuitively, this means that the tree t1 is ‘‘shorter’’ than the tree t2 since some variables of t1 have
to be replaced by terms to make t1 equal to t2.
Example 2.
Let t1 = cons(cons(car(x),NIL), cdr(y)) and t2 = cons(v, cdr(y)).
Then s = (v \ cons(car(x),NIL)) is such that st2 = t1, and thus t2 is more general than t1.
Example 3.
Let t1 = cons(x,y) and t2 = cons(v,w). One can unify with the substitution x \ v, y \ w but x, y, v, and w
are free variables so there is no change. Therefore, neither t1 nor t2 is more general than the other.
Example 4.
Consider Example 1. One has st1 = st2, and neither t1 nor t2 is more general than the other.
Example 5.
Let t1 = cons(x,NIL) and t2 = cons(x,v). Since NIL is a function of arity 0, it cannot be the left side of a
substitution. But the substitution s = (v \ NIL) yields st2 = t1, so t2 is more general than t1. This is the well
known rule of ‘‘turning constants into variables.’’
Example 6.
Let t1 = cons(x,x) and t2 = cons(x,y). If we attempt to match t1 to t2, we obtain x \ x, x \ y which is
disallowed since x cannot be given two different names. On the contrary, we can match t2 to t1 with
s = (y \ x) so that st2 = t1 and t2 is more general than t1. This is known by numerous names and expresses the
fact that replacing two occurrences of a variable by different variables is a generalization.
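A hedged LISP sketch of the ‘‘more general than’’ test used in Examples 2 through 6: terms are written as S-expressions, the symbols treated as variables are listed explicitly, and a one-way match either returns a substitution (as an association list) whose application to the first term gives the second, or fails. All names are illustrative.

(defun match (general specific vars &optional (subst nil))
  "Return a substitution making GENERAL equal to SPECIFIC, or :FAIL."
  (cond ((eq subst :fail) :fail)
        ((member general vars)                          ; a variable of GENERAL
         (let ((binding (assoc general subst)))
           (cond ((null binding) (acons general specific subst))
                 ((equal (cdr binding) specific) subst) ; consistent reuse of a variable
                 (t :fail))))                           ; bound two ways, as in Example 6
        ((atom general) (if (eql general specific) subst :fail))
        ((atom specific) :fail)
        (t (match (cdr general) (cdr specific) vars
                  (match (car general) (car specific) vars subst)))))

;; Example 2:  t2 = cons(v, cdr(y)) is more general than t1 = cons(cons(car(x),NIL), cdr(y)):
;; (match '(cons v (cdr y)) '(cons (cons (car x) nil) (cdr y)) '(x y v w))
;; succeeds, binding v to (cons (car x) nil), while matching in the other direction fails.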
Next, we shall be concerned with substitutions into formulas which contain the logical connectives ∧ and
∨ considered as 2-ary functions. It happens that their properties make generalization difficult.
Example 7.
Let t1 = x and t2 = x ∧ y. Then there exists a substitution s1 = (x \ x∧y) such that s1t1 = t2. Therefore t1
is more general than t2. But t1 = x = t′1 = x ∧ True. Then there is a substitution s2 = (y \ True) such that
t1 = s2t2. Therefore t2 is more general than t′1 = t1.

We see here that the substitution rule fails. In fact, it works only when applied to terms within formulas
and not to predicates, which is what x and y are in this example. A formula with logical connectives such as
∧ and ∨ must connect comparable things so that we can measure the generality of the expression. This is
illustrated in the following.
Example 8.
Let t1 = EQ(x,1) ∧ EQ(y,2). We claim that t2 = LOWER(x,y) is a generalization of t1 because there is a
substitution s such that st2 = t1. Of course, this does not appear immediately and one must use some intui-
tion to prove it. Thus one can show that t1 is equivalent to
t′1 = EQ(x,1) ∧ EQ(y,2) ∧ LOWER(x,2) ∧ LOWER(1,y). Also t2 can be shown to be equivalent to
t′2 = EQ(x,x1) ∧ EQ(y,y1) ∧ LOWER(x,y1) ∧ LOWER(x1,y). Then there exists an s such that st′2 = t′1 and
therefore t′2 = t2 is more general than t′1 = t1.
G. Conclusion
This chapter has introduced a variety of program synthesis techniques by examining how each approaches
the example problem. A summary of the seven methods appears below in Table I. These are, of course, only
representative of the complete field of automatic programming and the reader is invited to read the remaining
chapters.
Table I.

Logic with heuristic search (Chapter 3). Input: logical specification of I-O characteristics. Technique: heuristic search. Result: two nested conditionals with a recursive loop.

Knowledge based production rules (Chapter 10). Input: coded version of English sentence; internal: many production rules. Technique: heuristic search. Result: iterative loop with a conditional.

Induction on recursion relations (Chapters 15, 16). Input: 6 examples. Technique: construction of recursion relations; instantiation of schemas. Result: two nested conditionals with a recursive loop.

Merge of primitive functions (Chapter 17). Input: 1 example. Technique: uniform search of all merges of 17 primitives with pruning. Result: two nested conditionals with a recursive loop.

Syntactic production rules (Chapter 17). Input: 1 example. Technique: selection and expansion of three production rules. Result: two nested conditionals with a recursive loop.

Learning of I-O relation (Chapter 21). Input: 3 examples plus the answers to 8 yes-no queries. Technique: construction and generalization of logical relations. Result: logic formula for I-O relation with recursion.
Appendix
where <atom> may be either an identifier, integer, or NIL. NIL is a special reserved symbol, a LISP con-
stant. Any S-expression defined by the rule
will be called an atom. Some example S-expressions are NIL, (3.NIL), and ((A . B) . C).
Three basic LISP functions are car, cdr, and cons, which are defined as

car(x) = s1           if x = (s1 . s2)
         undefined    if x is an atom

cdr(x) = s2           if x = (s1 . s2)
         undefined    if x is an atom

cons(x,y) = (x . y)

where s1 and s2 are S-expressions. Conventional LISP notation includes a function f and its arguments
x1, x2, ..., xn on a list as follows: (f x1 x2 ... xn), and sometimes the names will be capitalized. Thus one
would write (CAR X), (CDR X), and (CONS X Y) instead of car(x), cdr(x), and cons(x,y), respectively. A
list is represented by an S-expression as follows:

(x1 x2 ... xn) = (x1 . (x2 . ( ... (xn . NIL) ... )))

Two predicates used in this chapter are

atom(x) = T       if x is an atom
          NIL     otherwise

neg(x) = T        if x is a negative integer
         NIL      otherwise
not(x) = T            if x is NIL
         NIL          if x is T
         undefined    otherwise
For example, if x = ((A . B) . C), then

car(x) = (A . B)
cdr(x) = C
cons(x,x) = (((A . B) . C) . ((A . B) . C))
atom(x) = NIL
not(atom(x)) = T
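For readers who wish to run the chapter's examples in a modern dialect, hedged Common Lisp counterparts of two auxiliary functions used earlier in the chapter are the following.

(defun neg (x)
  "T if X is a negative integer, NIL otherwise."
  (if (and (integerp x) (minusp x)) t nil))

(defun consright (z v)
  "Append the single element V at the right end of the list Z."
  (append z (list v)))

;; (neg -3) => T,  (neg 4) => NIL,  (consright '(2 9) 4) => (2 9 4)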
References
Balzer [1973]
R.M. Balzer, ‘‘A global view of automatic programming,’’ Proc. of the Third International Joint Conference on
Artificial Intelligence (August 1973), pp. 494-499.
Costa [1982]
E. Costa, ‘‘Dérécursivation automatique en utilisant des systèmes de réécriture de termes,’’ Thèse de 3ème
cycle, Paris (1981). Published (in French) by L.R.I., Bât. 490, F 91405 Orsay Cedex, Publication interne no.
118, juin 1982.
Floyd [1967]
R.W. Floyd, ‘‘Assigning meanings to programs,’’ Proc. of the Symposium on Applied Mathematics, Vol. 19
(1967), pp. 19-32.
Gaudel [1978]
M.C. Gaudel, ‘‘Spécifications incomplètes mais suffisantes de la représentation des types abstraits,’’ Rapport
Laboria No. 320, INRIA.
Gerhart [1972]
S.L. Gerhart, ‘‘Verification of APL programs,’’ Ph.D. Thesis, Dept. of Computer Science, Carnegie-Mellon
University, Pittsburgh, PA (1972).
Green [1977]
C.C. Green, ‘‘A summary of the PSI program synthesis system,’’ Proc. of the Fifth International Joint Conf. on
Artificial Intelligence, Vol. I (Aug. 1977), pp. 380-381.
Hoare [1969]
C.A.R. Hoare, ‘‘An axiomatic basis for computer programming,’’ Comm. of the ACM, Vol. 12 (1969), pp. 576-
583.
Kant [1979]
E. Kant, ‘‘Efficiency considerations in program synthesis: A knowledge-based approach,’’ Ph.D. Thesis, Com-
puter Science Dept., Stanford University (1979).
Kodratoff [1979]
Y. Kodratoff, ‘‘A class of functions synthesized from a finite number of examples and a LISP program
scheme,’’ International Journal of Computer and Information Sciences, Vol. 8, No. 6 (1979), pp. 489-521.
Lamport [1977]
L. Lamport, ‘‘Proving the correctness of multiprocess programs,’’ IEEE Trans. on Software Eng., Vol. SE-3 (Mar.
1977), pp. 125-143.
Robinson [1965]
J. Robinson, ‘‘A machine oriented logic based on the resolution principle,’’ Journal of the ACM, Vol. 12
(1965), pp. 23-41.
Shapiro [1981]
E.Y. Shapiro, ‘‘Inductive inference of theories from facts,’’ Report 192, Dept. of Computer Science, Yale
University (Feb. 1981).
Smith [1977]
D.R. Smith, ‘‘A class of synthesizable LISP programs,’’ A.M. Thesis, Dept. of Computer Science, Duke
University (1977).
Summers [1977]
P.D. Summers, ‘‘A methodology for LISP program construction from examples,’’ Journal of the ACM, Vol. 24
(1977).
Vere [1981]
S.A. Vere, ‘‘Constrained N-to-1 generalizations,’’ unpublished draft (Feb. 1981), 23 pp.
SECTION II
DEDUCTIVE AND TRANSFORMATIONAL METHODS

CHAPTER 2
A DEDUCTIVE APPROACH TO PROGRAM SYNTHESIS
Zohar Manna
Stanford University and Weizmann Institute
Richard Waldinger
SRI International
This research was supported in part by the National Science Foundation under Grants MCS 76—83655 and MCS 78—02591, in part by
the Office of Naval Research under Contracts N00014—76—C—0687 and N00014—75—C—0816, in part by the Defense Advanced
Research Projects Agency of the Department of Defense under Contract MDA903—76—C—0206, and in part by the United States-Israel
Binational Science Foundation.
Authors’ addresses: Z. Manna, Department of Computer Science, Stanford University, Stanford, CA. 94305; R. Waldinger, Artificial
Intelligence Center, SRI International, 333 Ravenswood Ave., Menlo Park, CA 94025.
* This article is reprinted from the Transactions on Programming Languages and Systems, January 1980, ‘‘A Deductive Approach to Pro-
gram Synthesis,’ by Zohar Manna and Richard Waldinger. Copyright 1980, Association for Computing Machinery, Inc., reprinted by per-
mission.
Abstract
Program synthesis is the systematic derivation of a program from a given specification. A deductive
approach to program synthesis is presented for the construction of recursive programs. This approach regards
program synthesis as a theorem-proving task and relies on a theorem-proving method that combines the
features of transformation rules, unification, and mathematical induction within a single framework.
A. Motivation
The early work in program synthesis relied strongly on mechanical theorem-proving techniques. The
work of Green [1969] and Waldinger and Lee [1969], for example, depended on resolution-based theorem
proving; however, the difficulty of representing the principle of mathematical induction in a resolution frame-
work hampered these systems in the formation of programs with iterative or recursive loops. More recently,
program synthesis and theorem proving have tended to go their separate ways. Newer theorem-proving sys-
tems are able to perform proofs by mathematical induction (e.g., Boyer and Moore [1975]) but are useless for
program synthesis because they have sacrificed the ability to prove theorems involving existential quantifiers.
Recent work in program synthesis (e.g., Burstall and Darlington [1977] and Manna and Waldinger [1979]), on
the other hand, has abandoned the theorem-proving approach and has relied instead on the direct application of
transformation or rewriting rules to the program’s specification; in choosing this path, these systems have
renounced the use of such theorem-proving techniques as unification or induction.
In this paper we describe a framework for program synthesis that again relies on a theorem-proving
approach. This approach combines techniques of unification, mathematical induction, and transformation rules
within a single deductive system. We outline the logical structure of this system without considering the stra-
tegic aspects of how deductions are directed. Although no implementation exists, the approach is machine
oriented and ultimately intended for implementation in automatic synthesis systems.
In the next section we give examples of specifications accepted by the system. In the succeeding sections
we explain the relation between theorem proving and our approach to program synthesis.
B. Specification
The specification of a program allows us to express the purpose of the desired program, without indicating
an algorithm by which that purpose is to be achieved. Specifications may contain high-level constructs that are
not computable, but are close to our way of thinking. Typically, specifications involve such constructs as the
quantifiers for all ... and for some ..., the set constructor {x: ...}, and the descriptor find z such that ....
For example, to specify a program to compute the integer square root of a nonnegative integer n, we
would write

sqrt(n) <== find z such that
            integer(z) and z·z ≤ n and n < (z+1)·(z+1)
            where integer(n) and 0 ≤ n.

To describe a program to sort a list l, we might write

sort(l) <== find z such that
            ordered(z) and perm(l,z)
            where islist(l).
Here, ordered(z) expresses that the elements of the output list z should be in nondecreasing order; perm(l,z)
expresses that z should be a permutation of the input l; and islist(l) expresses that l can be assumed to be a
list.
To describe a program to find the last element of a nonempty list l, we might write

last(l) <== find z such that
            for some y, l = y<>[z]
            where islist(l) and l ≠ [].

Here, ‘‘u<>v’’ denotes the result of appending the two lists u and v; [u] denotes the list whose sole element
is u; and [] denotes the empty list. (Thus, [A B C]<>[D] yields [A B C D]; therefore, by the above
specification, last([A B C D]) = D.)
In general, we are considering the synthesis of programs whose specifications have the form

f(a) <== find z such that R(a,z)
         where P(a).
Here, a denotes the input of the desired program and z denotes its output; the input condition P(a) and the
output condition R(a,z) may themselves contain quantifiers and set constructors (but not the find descriptor).
The above specification describes an applicative program, one which yields an output but produces no side
effects. To derive a program from such a specification, we attempt to prove a theorem of the form
for alla,
if P(a)
then for some z, R(a,z).
The proof of this theorem must be constructive, in the sense that it must tell us how to find an output z satis-
fying the desired output condition. From such a proof, a program to compute z can be extracted.
The above notation can be extended to describe several related programs at once. For example, to specify
the programs div(i,j) and rem(i,j) for finding the integer quotient and remainder, respectively, of dividing a
nonnegative integer i by a positive integer j, we write
(div(i,j), rem(i,j)) <==
    find (y,z) such that integer(y) and integer(z) and
                         i = y·j + z and 0 ≤ z and z < j
    where integer(i) and integer(j) and 0 ≤ i and 0 < j.
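A hedged sketch of one LISP program meeting this specification, by repeated subtraction (this is an illustration only, not the derivation pursued in this chapter; the name div-rem is assumed):

(defun div-rem (i j)
  ;; assumes the input condition: i and j integers, 0 <= i and 0 < j
  (if (< i j)
      (values 0 i)                        ; i = 0*j + i  with 0 <= i < j
      (multiple-value-bind (y z) (div-rem (- i j) j)
        (values (+ y 1) z))))             ; i = (y+1)*j + z

;; (div-rem 17 5)  =>  3, 2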
C. Basic Structure

The basic structure employed in our approach is the sequent, which consists of two lists, the
assertions A1,A2,...,Am and the goals G1,G2,...,Gn. With each assertion or goal there may be associated an
entry called the output expression. This output entry has no bearing on the proof itself; rather, it records the program
segment that has been constructed at each stage of the derivation (cf. the ‘‘answer literal’’ in Green [1969]).

We denote a sequent by a table with three columns: assertions, goals, and outputs. Each row in the sequent
has the form

assertions                    goals                    outputs
Ai(a,x)                                                ti(a,x)

or

                              Gj(a,x)                  tj(a,x)
The meaning of a sequent is that if all instances of each of the assertions are true, then some instances of
at least one of the goals is true; more precisely, the sequent has the same meaning as its associated sentence

if    for all x, A1(a,x) and
      for all x, A2(a,x) and
           ...
      for all x, Am(a,x)
then  for some x, G1(a,x) or
      for some x, G2(a,x) or
           ...
      for some x, Gn(a,x)

where a denotes all the constants of the sequent and x denotes all the free variables. (In general, we denote
constants or tuples of constants by ‘‘a,b,c,...,n’’ and variables or tuples of variables by ‘‘u,v,w,...,z’’.) If some
instance of a goal is true (or some instance of an assertion is false), the corresponding instance of its output
expression satisfies the given specification. In other words, if some instance Gi(a,e) is true (or some instance
Aj(a,e) is false), then the corresponding instance ti(a,e) (or tj(a,e)) is an acceptable output.
Note that (1) an assertion or goal is not required to have an output entry; (2) an assertion and a goal
never occupy the same row of the sequent; (3) the variables in each row are ‘‘dummies’’ that we can systemati-
cally rename without changing the meaning of the sequent.
The distinction between assertions and goals is artificial and does not increase the logical power of the
deductive system. In fact, if we delete a goal from a sequent and add its negation as a new assertion, we obtain
an equivalent sequent; similarly, we can delete an assertion from a sequent and add its negation as a new goal
without changing the meaning of the sequent. This property is known as duality. Nevertheless, the distinction
between assertions and goals makes our deductions easier to understand.
In other words, we assume that the input condition P(a) is true, and we want to prove that for some z, the goal
R(a,z) is true; if so, z represents the desired output of the program f(a). The output z is a variable, for which
we can make substitutions; the input a is a constant. If we prefer, we may remove quantifiers in P(a) and
R(a,z) by the usual skolemization procedure (see, e.g., Nilsson [1979]).
The input condition P(a) is not the only assertion in the sequent; typically, simple, basic axioms, such as
u=u, are represented as assertions that are tacitly present in all sequents. Many properties of the subject
domain, however, are represented by other means, as we shall see.
The deductive system we describe operates by causing new assertions and goals, and corresponding new
output expressions, to be added to the sequent without changing its meaning. The process terminates if the
goal true (or the assertion false ) is produced, whose corresponding output expression consists entirely of primi-
tives from the target programming language; this expression is the desired program. In other words, if we
develop a row of the form

assertions                    goals                    outputs
                              true                     t
or
false                                                  t

then the desired program is

f(a) <== t.
Note that this deductive procedure never requires us to establish new sequents or (except for strategic
purposes) to delete an existing assertion or goal. In this sense, the approach more resembles resolution than
‘‘natural deduction.”
Suppose we are required to construct two related programs f(a) and g(a); i.e., we are given the
specification

(f(a), g(a)) <== find (y,z) such that R(a,y,z)
                 where P(a).

Then the initial sequent has two output columns, one for each program:

assertions                    goals                    outputs
                                                       f(a)        g(a)
P(a)
                              R(a,y,z)                 y           z

If we subsequently develop a row whose goal is true (or whose assertion is false) and whose output entries are s and t,
where both s and t are primitive expressions, then the desired programs are

f(a) <== s

and

g(a) <== t.
In the remainder of this paper we outline the deductive rules of our system and their application to pro-
gram synthesis.
D. Splitting Rules
The splitting rules allow us to decompose an assertion or goal into its logical components. For example, if
our sequent contains an assertion of form (F and G), we can introduce the two assertions F and G into the
sequent without changing its meaning. We will call this the andsplit rule and express it in the following nota-
tion:
assertions                    goals                    outputs
F and G                                                t
===========================================================================
F                                                      t
G                                                      t
This means that if rows matching those above the double line are present in the sequent, then the correspond-
ing rows below the double line may be added.
Similarly, we have the orsplit rule

assertions                    goals                    outputs
                              F or G                   t
===========================================================================
                              F                        t
                              G                        t

and the ifsplit rule

assertions                    goals                    outputs
                              if F then G              t
===========================================================================
F                                                      t
                              G                        t
There is no orsplit rule or ifsplit rule for assertions and no andsplit rule for goals. Note that the output
entries for the consequents of the splitting rules are exactly the same as the entries for their antecedents.
Although initially only the goal has an output entry, the ifsplit rule can introduce an assertion with an out-
put entry. Such assertions are rare in practice, but can arise by the action of such rules.
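A hedged LISP sketch (not the authors' implementation) of this machinery: each row of a sequent carries a role, a formula, and an output entry, and the andsplit rule adds, for every assertion of the form (and F G), the rows F and G with the same output entry.

(defstruct row role formula output)

(defun andsplit (sequent)
  "Return SEQUENT extended by applying the andsplit rule to its assertions."
  (append sequent
          (loop for r in sequent
                when (and (eq (row-role r) :assertion)
                          (consp (row-formula r))
                          (eq (first (row-formula r)) 'and))
                  append (list (make-row :role :assertion
                                         :formula (second (row-formula r))
                                         :output (row-output r))
                               (make-row :role :assertion
                                         :formula (third (row-formula r))
                                         :output (row-output r))))))

;; (andsplit (list (make-row :role :assertion :formula '(and (p a) (q a)))))
;; adds assertion rows for (P A) and (Q A), each sharing the original (empty) output entry.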
E. Transformation Rules
Transformation rules allow one assertion or goal to be derived from another. Typically, transformations
are expressed as conditional rewriting rules of form

    r ==> s    if P,

meaning that in any assertion, goal, or output expression, a subexpression of form r can be replaced by the
corresponding expression of form s, provided that the condition P holds. We never write such a rule unless r
and s are equal terms or equivalent sentences, whenever condition P holds. For example, the transformation
rule

    u ∈ v ==> (u = head(v) or u ∈ tail(v))    if islist(v) and v ≠ []

expresses that an element belongs to a nonempty list if it equals the head of the list or belongs to its tail.
(Here, head(v) denotes the first element of the list v, and tail(v) denotes the list of all but the first element.)
The rule

    x + 0 ==> x    if number(x)

expresses that adding zero to a number leaves its value unchanged.

A transformation rule

    r ==> s    if P

is not permitted to replace an expression of form s by the corresponding expression of form r when the condition P holds, even though these two expressions have the same values. For that purpose, we would require a second rule

    s ==> r    if P.

Thus we may include the rule

    x + 0 ==> x    if number(x)

but not the rule

    x ==> x + 0    if number(x).
Suppose now that

    r ==> s    if P

is a transformation rule and F is an assertion containing a subexpression r' which is not within the scope of any quantifier. Suppose also that there exists a unifier for r and r', i.e., a substitution θ such that rθ and r'θ are identical. Here, rθ denotes the result of applying the substitution θ to the expression r. We can assume that θ is a "most general" unifier (in the sense of Robinson [1965]) of r and r'. We rename the variables of F, if necessary, to ensure that it has no variables in common with the transformation rule. By the rule, we can conclude that if Pθ holds, then rθ and sθ are equal terms or equivalent sentences. Therefore, we can add the assertion

    if Pθ then Fθ[rθ → sθ]

to our sequent. Here, the notation Fθ[rθ → sθ] indicates that every occurrence of rθ in Fθ is to be replaced by sθ.
For example, suppose we have the assertion

    a ∈ l and a ≠ 0.

Applying the transformation rule above, taking r' to be a ∈ l and θ to be the substitution [u ← a; v ← l], we obtain the new assertion

    if islist(l) and l ≠ []
    then (a = head(l) or a ∈ tail(l)) and a ≠ 0.

Note that a and l are constants, while u and v are variables; indeed, the substitution was made for the variables of the rule but not for the constants of the assertion.
In general, if the given assertion F has an associated output entry t, the new output entry is formed by applying the substitution θ to t. For, suppose some instance of the new assertion "if Pθ then Fθ[rθ → sθ]" is false; then the corresponding instance of Pθ is true, and the corresponding instance of Fθ[rθ → sθ] is false. Then, by the transformation rule, the instances of rθ and sθ are equal; hence the corresponding instance of Fθ is false. We know that if any instance of F is false, the corresponding instance of t satisfies the given specification. Hence, because some instance of Fθ is false, the corresponding instance of tθ is the desired output.
In our deduction rule notation, we write

    assertions                          outputs
    F                                   t
    ============================================
    if Pθ then Fθ[rθ → sθ]              tθ

The corresponding dual deduction rule for goals is

    goals                               outputs
    F                                   t
    ============================================
    Fθ[rθ → sθ] and Pθ                  tθ
For example, suppose we have the goal
taking r’ to be alz and @ to be the substitution [z ~—0; u a]. Then we obtain the goal
This rule will have the desired effect of reducing the goal f(a) = f(b) to the simpler subgoal a = b, and (like
the consequent theorem) will not have the pernicious side effect of deriving from the simple assertion a = b
the more complex assertion f(a) = f(b). The axiomatic representation of the same fact would have both
results. (Incidentally, the transformation rule has the beneficial effect, not shared by the consequent theorem,
of deriving from the complex assertion not(f(a) = f(b)) the simpler assertion not(a = b).)
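The two mechanical steps behind a transformation rule, applying a substitution and replacing the instantiated left-hand side, can be sketched as follows. The representation (rules and sentences as s-expressions, substitutions as association lists) and the function names are illustrative assumptions only; in particular, the unifier θ is taken as given rather than computed.

    ;; A substitution is an alist of (variable . term) pairs; sentences and
    ;; terms are s-expressions.  Illustrative sketch, not the authors' code.
    (defun apply-subst (theta expr)
      "Return EXPR with every variable bound in THETA replaced by its term."
      (cond ((assoc expr theta) (cdr (assoc expr theta)))
            ((consp expr) (cons (apply-subst theta (car expr))
                                (apply-subst theta (cdr expr))))
            (t expr)))

    (defun replace-subexpr (old new expr)
      "Replace every occurrence of the subexpression OLD in EXPR by NEW."
      (cond ((equal expr old) new)
            ((consp expr) (cons (replace-subexpr old new (car expr))
                                (replace-subexpr old new (cdr expr))))
            (t expr)))

    (defun apply-rule-to-assertion (r s p theta assertion)
      "Use the rule R ==> S if P, under substitution THETA, to build the new
    assertion (if P-theta then F-theta[R-theta -> S-theta])."
      (let ((rt (apply-subst theta r))
            (st (apply-subst theta s))
            (pt (apply-subst theta p))
            (ft (apply-subst theta assertion)))
        (list 'if pt (replace-subexpr rt st ft))))

For example, with r, s, and p taken from the list-membership rule above and theta binding u to a and v to l, the function returns an s-expression of the same shape as the derived assertion shown earlier.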
F. Resolution
The original resolution principle (Robinson [1965]) required that sentences be put into conjunctive nor-
mal form. As a result, the set of clauses sometimes exploded to an unmanageable size and the proofs lost their
intuitive content. The version of resolution we employ does not require the sentences to be in conjunctive nor-
mal form.
Assume our sequent contains two assertions F and G, containing subsentences P₁ and P₂, respectively, that are not within the scope of any quantifier. For the time being, let us ignore the output expressions corresponding to these assertions. Suppose there exists a unifier for P₁ and P₂, i.e., a substitution θ such that P₁θ and P₂θ are identical. We can take θ to be the most general unifier. The AA-resolution rule allows us to
deduce the new assertion

    Fθ[P₁θ → true] or Gθ[P₂θ → false]
and add it to the sequent. Recall that the notation Fθ[P₁θ → true] indicates that every instance of the subsentence P₁θ in Fθ is to be replaced by true. (Of course, we may need to do the usual renaming to ensure that F and G have no variables in common.) We will call θ the unifying substitution and P₁θ (= P₂θ) the eliminated subexpression; the deduced assertion is called the resolvent. Note that the rule is symmetric, so the roles of F and G may be reversed.
The two subsentences ‘‘P(x) and Q(b)”’ and ‘‘P(a) and Q(y)”’ can be unified by the substitution
    θ = [x ← a; y ← b].
Therefore, the AA-resolution rule allows us to eliminate the subexpression ‘‘P(a) and Q(b)”’ and derive the
conclusion
R (a)
by application of the appropriate transformation rules.
The conventional resolution rule may be regarded as a special case of the above AA-resolution rule. The
conventional rule allows us to derive from the two assertions

    (not P₁) or Q

and

    P₂ or R

the new assertion

    Qθ or Rθ,

where θ is a most general unifier of P₁ and P₂. From the same two assertions we can use our AA-resolution rule to derive

    Qθ or Rθ.
In our deduction rule notation, the AA-resolution rule is

    assertions                                      goals       outputs
    F
    G
    =====================================================================
    Fθ[P₁θ → true] or Gθ[P₂θ → false]

Analogous GG-, GA-, and AG-resolution rules apply when one or both of the resolved sentences are goals; in the GG case, for example, the resolvent is the new goal Fθ[P₁θ → true] and Gθ[P₂θ → false].
Up to now, we have ignored the output expressions of the assertions and goals. However, if at least one
of the sentences to which a resolution rule is applied has a corresponding output expression, the resolvent will
also have an output expression. If only one of the sentences has an output expression, say t, then the resolvent
will have the output expression tθ. On the other hand, if the two sentences F and G have output expressions t₁ and t₂, respectively, the resolvent will have the output expression

    if P₁θ then t₁θ else t₂θ.

(Of course, if t₁θ and t₂θ are identical, no conditional expression need be formed; the output expression is simply t₁θ.)
The justification for constructing this conditional as an output expression is as follows. We consider only
the GG case: Suppose that the goal

    Fθ[P₁θ → true] and Gθ[P₂θ → false]

has been obtained by GG-resolution from two goals F and G. We would like to show that if the goal is true, the conditional output expression satisfies the desired specification. We assume that the resolvent is true; therefore both Fθ[P₁θ → true] and Gθ[P₂θ → false] are true. In the case that P₁θ is true, we have that Fθ is also true. Consequently, the corresponding instance t₁θ of the output expression t₁ satisfies the specification of the desired program. In the other case, in which P₁θ is false, P₂θ is false, and the same reasoning allows us to conclude that t₂θ satisfies the specification of the desired program. In either case we can conclude that the conditional

    if P₁θ then t₁θ else t₂θ

satisfies the desired specification. By duality, the same output expression can be derived for the AA-resolution, GA-resolution, and AG-resolution rules.
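The construction of the conditional output entry is simple enough to state directly in code. The sketch below uses our own illustrative names and merely packages the case analysis described above.

    (defun resolvent-output (p-theta t1-theta t2-theta)
      "Form the output entry of a resolvent: (if P then t1 else t2), or
    simply t1 when the two instantiated output entries are identical."
      (if (equal t1-theta t2-theta)
          t1-theta
          (list 'if p-theta t1-theta t2-theta)))

    ;; e.g. (resolvent-output 'p 't1 't2)  ==>  (IF P T1 T2)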
For example, let u·v denote the operation of inserting u before the first element of the list v, and suppose we have the goal

    head(z) = a and tail(z) = b        (output: z)

and the assertion

    head(u·v) = u

with no output expression; then by GA-resolution, applying the substitution

    θ = [u ← a; z ← a·v]

and eliminating the subsentence head(a·v) = a, we obtain the goal

    tail(a·v) = b        (output: a·v)

by application of the appropriate transformation rules. Note that we have applied the substitution [u ← a; z ← a·v] to the original output expression z, obtaining the new output expression a·v. Therefore, if we can find v such that tail(a·v) = b, the corresponding instance of a·v will satisfy the desired specification.
Another example: Suppose we have derived the two goals

    goals                                             outputs: max(l)
    max(tail(l)) > head(l) and tail(l) ≠ []           max(tail(l))
    not(max(tail(l)) > head(l)) and tail(l) ≠ []      head(l)

Then by GG-resolution, eliminating the subsentence max(tail(l)) > head(l), we can derive the new goal

    tail(l) ≠ []

with the output entry

    if max(tail(l)) > head(l) then max(tail(l)) else head(l)
and the assertion
However, we can also apply GA-resolution and eliminate P(c,d), yielding the resolvent
Finally, we can also apply AG-resolution to the same assertion and goal in two different ways, eliminating
P(c,d) and eliminating Q(b,a); both of these applications lead to the same trivial goal false.
A polarity strategy adapted from Murray [1978] restricts the resolution rules to prevent many such fruitless
applications. We first assign a polarity (either positive or negative) to every subsentence of a given sequent as
follows:
For the well-founded induction rule, suppose our initial sequent is

    assertions      goals       outputs: f(a)
    P(a)            R(a,z)      z
Then we can always add to our sequent a new assertion, the induction hypothesis
    if u <_w a
    then if P(u)
         then R(u,f(u)).

Here, f denotes the program we are trying to construct. The well-founded set and the particular well-founded ordering <_w to be employed in the proof have not yet been determined. If the induction hypothesis is used more than once in the proof, all its uses must refer to the same well-founded ordering <_w.
Let us paraphrase: We are attempting to construct a program fsuch that for an arbitrary input a satisfy-
ing the input condition P(a), the output f(a) will satisfy the output condition R(a,f(a)). By the well-founded
induction principle, we can assume inductively that for every u less than a (in some well-founded ordering)
such that the input condition P(u) holds, the output f(u) will satisfy the same output condition R(u,f(u)). By
employing the induction hypothesis in the proof, recursive calls to f can be introduced into the output expres-
sion for f(a).
As we Shall see in a later section, we can introduce an induction hypothesis corresponding to any subset
of the assertions or goals in our sequent, not just the initial assertion and goal; most of these induction
hypotheses are not relevant to the final proof, and the proliferation of new assertions obstructs our efforts to
find a proof. Therefore, we employ the following recurrence strategy for determining when to introduce an
induction hypothesis.
Let us restrict our attention to the case where the induction hypothesis is formed from the initial sequent.
Suppose that at some point in the derivation a goal is developed of the form

    R(s,z'),

where s is an arbitrary term. In other words, the new goal is a precise instance of the initial goal R(a,z)
obtained by replacing a by s. This recurrence motivates us to add the induction hypothesis
    if u <_w a
    then if P(u)
         then R(u,f(u)).
The rationale for introducing the induction hypothesis at this point is that now we can perform GA-
resolution between the newly developed goal R(s,z') and the induction hypothesis. The resulting goal is then

    s <_w a and P(s).

Note that a recursive call f(s) has been introduced into the output expression for f(a). By proving the expression s <_w a, we ensure that this recursive call will terminate; by proving the expression P(s), we guarantee that
the argument s of the recursive call will satisfy the input condition of the program f.
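The recurrence strategy depends on recognizing that a newly developed goal is an instance of an earlier one. A simple one-way matcher, sketched below with names of our own choosing, suffices for the basic case: designated symbols of the earlier goal (the input constant and the output variable) may be bound to subterms of the new goal.

    (defun match (pattern expr bindings vars)
      "One-way matching: symbols in VARS occurring in PATTERN may be bound
    to subexpressions of EXPR.  Returns an extended binding alist, or :fail."
      (cond ((eq bindings :fail) :fail)
            ((member pattern vars)
             (let ((b (assoc pattern bindings)))
               (cond ((null b) (cons (cons pattern expr) bindings))
                     ((equal (cdr b) expr) bindings)
                     (t :fail))))
            ((and (consp pattern) (consp expr))
             (match (cdr pattern) (cdr expr)
                    (match (car pattern) (car expr) bindings vars)
                    vars))
            ((equal pattern expr) bindings)
            (t :fail)))

    ;; Example (mirroring the division derivation below): matching the
    ;; initial goal against the recurring goal, allowing I, Y, and Z to be
    ;; rebound, succeeds and binds I to (- I J), Y to Y1, and Z to Z,
    ;; signalling that an induction hypothesis is warranted.
    ;; (match '(= i (+ (* y j) z)) '(= (- i j) (+ (* y1 j) z)) '() '(i y z))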
The particular well-founded ordering <, to be employed by the proof has not yet been determined. We
assume the existence of transformation rules capable of choosing or combining well-founded orderings applicable to the particular theories under considera-
tion (e.g., numbers, lists, and sets).
Let us look at an example. Suppose we are constructing two programs div(i,j) and rem(i,j) to compute
the quotient and remainder, respectively, of dividing a nonnegative integer i by a positive integer j;, the
specification may be expressed as

    div(i,j), rem(i,j)  ⇐  find y, z such that
        i = y·j + z and 0 ≤ z and z < j
        where 0 ≤ i and 0 < j.

(Note that, for simplicity, we have omitted type requirements such as integer(i).) Our initial sequent is then

    assertions                  goals                                   outputs
                                                                        div(i,j)   rem(i,j)
    0 ≤ i and 0 < j             i = y·j + z and 0 ≤ z and z < j         y          z
Here, the inpuis i and j are constants, for which we can make no substitution; y and the output z are vari-
ables.
Assume that during the course of the derivation we develop the goal
    i − j = y₁·j + z and 0 ≤ z and z < j,
obtained by replacing i by i − j. Therefore, we add as a new assertion the induction hypothesis

    if (u₁,u₂) <_w (i,j)
    then if 0 ≤ u₁ and 0 < u₂
         then u₁ = div(u₁,u₂)·u₂ + rem(u₁,u₂)
              and 0 ≤ rem(u₁,u₂) and rem(u₁,u₂) < u₂.

Applying GA-resolution between the developed goal and this hypothesis yields a goal which reduces to

    (i−j, j) <_w (i,j) and 0 ≤ i−j and 0 < j.
Note that the recursive calls div(i—j,j) and rem(i—j,j) have been introduced into the output entry.
The particular well-founded ordering <_w to be employed in the proof has not yet been determined. It can be chosen to be the < ordering on the first component of the pairs, by application of the transformation rule

    (u₁,u₂) <_w (v₁,v₂) ==> true    if u₁ < v₁ and 0 ≤ u₁ and 0 ≤ v₁.
A new goal is eventually obtained whose only condition is j ≤ i. In other words, in the case that j ≤ i, the outputs div(i−j,j) + 1 and rem(i−j,j) satisfy the desired program's
specification. In the next section, we give the full derivation of these programs.
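Anticipating the result of the full derivation given in the next section, the two programs implied by these cases (output 0 and i when i < j, and the recursive calls otherwise) can be rendered as runnable Lisp; the rendering and the function names are ours (int-rem avoids the name of the built-in function REM).

    (defun int-div (i j)
      ;; quotient of i by j: 0 when i < j, otherwise 1 + int-div(i-j, j)
      (if (< i j)
          0
          (+ 1 (int-div (- i j) j))))

    (defun int-rem (i j)
      ;; remainder of i by j: i when i < j, otherwise int-rem(i-j, j)
      (if (< i j)
          i
          (int-rem (- i j) j)))

    ;; (int-div 17 5) => 3,  (int-rem 17 5) => 2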
In our presentation of the induction rule, several limitations were imposed for simplicity but are not actu-
ally essential:
(1) In the example we considered, the only skolem functions in the initial sequent are the constants corresponding to the program's inputs, and the only variables are those corresponding to the program's outputs; the sequent was of form
    assertions      goals       outputs: f(a)
    P(a)            R(a,z)      z
In forming the induction hypothesis, the skolem constant a is replaced by a variable u and the variable z is
replaced by the term f(u); the induction hypothesis was of form
    if u <_w a
    then if P(u)
         then R(u,f(u)).
However, if there are other skolem functions in the initial sequent, they too must be replaced by variables
in the induction hypothesis; if there are other variables in the initial sequent, they must be replaced by new
skolem functions. For example, suppose the initial sequent is of form
    assertions      goals                     outputs: f(a)
    P(a)            R(a, z, g₁(z), x₂)        z
where g₁(z) is the skolem function corresponding to x₁. The induction hypothesis is then of form
    if u <_w a
    then if P(u)
         then R(u, f(u), v, g₂(u,v)).
Here, the skolem function g₁(z) has been replaced by the variable v, and the variable x₂ has been replaced by a new skolem function g₂(u,v).
(2) One limitation to the recurrence strategy was that the induction hypothesis was introduced only when an entire goal is an instance of the initial goal. In fact, the strategy can be extended so that the hypothesis is introduced when some subsentence of a goal is an instance of some subsentence of the initial goal, because the resolution rule can then be applied between the goal and the induction hypothesis. This extension is straightforward.
(3) A final observation: The induction hypothesis was always formed directly from the initial sequent;
thus, the theorem itself was proved by induction. In later sections we extend the rule so that induction can be
applied to lemmas that are stronger or more general than the theorem itself. This extension also accounts for
the formation of auxiliary procedures in the program being constructed.
Some early efforts toward incorporating mathematical induction in a resolution framework were made by
Darlington [1968]. His system treated the induction principle as a second-order axiom schema rather than as a
deduction rule; it had a limited ability to perform second-order unifications.
(For simplicity, we again omit type conditions, such as integer(i), from this discussion.) Our initial sequent is
therefore
    assertions                  goals                                       outputs
                                                                            div(i,j)   rem(i,j)
    0 ≤ i and 0 < j             2.  i = y·j + z and 0 ≤ z and z < j         y          z
Assume we have the following transformation rules that define integer multiplication:

    0·v ==> 0
    (u+1)·v ==> u·v + v
Applying the first of these rules to the subexpression y:j in goal 2 yields
    7.  i = 0 + z and 0 ≤ z and z < j        (outputs: 0, z).

The unifying substitution is

    θ = [y ← 0; v ← j];

applying this substitution to the output entry y produced the new output 0.
Applying the numerical transformation rule

    0 + u ==> u

yields

    8.  i = z and 0 ≤ z and z < j        (outputs: 0, z).
The GA-resolution rule can now be applied between goal 8 and the equality assertion 3, u = u. The unifying
substitution is
    θ = [u ← i; z ← i],

and the resulting goal eventually reduces to

    10.  i < j        (outputs: 0, i).
In other words, we have found that in the case that i < j, the output 0 will satisfy the specification
for the quo-
tient program and the output i will satisfy the specification for the remainder program.
Let us return our attention to the initial goal 2,

    2.  i = y·j + z and 0 ≤ z and z < j.

Applying the second multiplication rule to the subexpression y·j yields

    11.  i = y₁·j + j + z and 0 ≤ z and z < j        (outputs: y₁+1, z);

applying this substitution, [y ← y₁+1; v ← j], to the output entry y produced the new output y₁+1 in the div program.
An algebraic transformation rule then yields

    12.  i − j = y₁·j + z and 0 ≤ z and z < j        (outputs: y₁+1, z).

This goal is a precise instance of the initial goal 2,

    i = y·j + z and 0 ≤ z and z < j,

obtained by replacing the input i by i−j. (Again, the replacement of the dummy variable y by y₁ is not significant.) Therefore, the following induction hypothesis is formed:

    if (u₁,u₂) <_w (i,j)
    then if 0 ≤ u₁ and 0 < u₂
         then u₁ = div(u₁,u₂)·u₂ + rem(u₁,u₂)
              and 0 ≤ rem(u₁,u₂) and rem(u₁,u₂) < u₂.
By applying GA-resolution between goal 12 and this induction hypothesis, we obtain a new goal.
Note that the substitution to the variable y; has caused the output entry y; +1 to be changed to div(i—j,j) + 1
and the output entry z to be replaced by rem(i—j,j). The use of the induction hypothesis has introduced the
recursive calls div(i—j,j) and rem(i—j,j) into the output.
Goal 14 reduces to
The particular ordering <_w has not yet been determined; however, it is chosen to be the < ordering on the first component of the pairs, by application of the transformation rule

    (u₁,u₂) <_w (v₁,v₂) ==> true    if u₁ < v₁ and 0 ≤ u₁ and 0 ≤ v₁.
Note that the conditions of the transformation rule caused new conjuncts to be added to the goal.
By application of algebraic and logical transformation rules, and GA-resolution with the assertion 5, 0 <i,
and assertion 6, 0 < j, goal 16 is reduced to
    j ≤ i        (outputs: div(i−j,j) + 1, rem(i−j,j)).
In other words, we have learned that in the case that j ≤ i, the outputs div(i−j,j) + 1 and rem(i−j,j) satisfy the specifications of the div and rem programs. On the other hand, in deriving goal 10 we learned that in the case that i < j,
0 and i are satisfactory outputs. Assuming we have the assertion 4,

    u < v or v ≤ u,

we can derive the goal

    18.  not(i < j)        (outputs: div(i−j,j) + 1, rem(i−j,j))

by GA-resolution.
The final goal

    true        (outputs:  if i < j then 0 else div(i−j,j) + 1,   if i < j then i else rem(i−j,j))

can then be obtained by GG-resolution between goals 10 and 18. The conditional expressions have been formed because both goals have a corresponding output entry. Because we have developed the goal true and corresponding primitive output entries, the derivation is complete. The final programs are

    div(i,j) <== if i < j then 0 else div(i−j,j) + 1
    rem(i,j) <== if i < j then i else rem(i−j,j).
Let us now consider how auxiliary procedures are formed. Suppose we are constructing a program f(a) whose specification is represented by the initial sequent

    assertions      goals       outputs: f(a)
    P(a)            R(a,z)      z
Let goal A be any goal obtained during the derivation of f(a), and assume that goal A is of form

    A:  R'(a,z')        (output for f(a): t'(z')).

Suppose that by applying deduction rules successively to goal A and to the assertions P'₁(a), P'₂(a), ..., P'ₖ(a) of the sequent, we obtain a goal B of form

    B:  R'(s,z''),

where s is an arbitrary term. (For simplicity, we assume that no goals are required other than those derived
from goal A, and that none of the k required assertions have associated output entries.)
In summation, we have developed a new goal (goal B) that is a precise instance of the earlier goal (goal
A), obtained by replacing the input a by the term s. This recurrence motivates us to define an auxiliary pro-
cedure fnew(a) whose output condition is goal A; we then hope to achieve goal B by a recursive call to the new
procedure.
Let us be more precise. The specification for fnew(a') is

    fnew(a')  ⇐  find z' such that R'(a',z')  where P'(a').

Here, the input condition P'(a') is (P'₁(a') and P'₂(a') and ... and P'ₖ(a')). If we succeed in constructing a pro-
gram that meets this specification, we can employ it as an auxiliary procedure of the main program f(a).
Consequently, at this point we add a new output column for fnew(a') to the sequent, and we introduce the new rows

    assertions      goals               outputs
                                        f(a)            fnew(a')
    P'(a')
                    A':  R'(a',z')      t'(fnew(a))     z'
Note that in these rows we have replaced the input constant a by a new constant a'. This step is logically necessary; adding the induction hypothesis without renaming the constant can lead to false results. The second row (goal A') indicates that if we succeed in constructing fnew(a') to satisfy the above specification, then f(a) may be computed by a call t'(fnew(a)) to the new procedure.
By introducing the procedure fnew(a’) we are able to call it recursively. In other words, we are now able
to form an induction hypothesis from the assertion P’(a’) and the goal R’(a’,z’), namely,
    if u' <_w' a'
    then if P'(u')
         then R'(u',fnew(u')).
If this assertion is employed during a proof, a recursive call to fnew can be introduced into the output column
for fnew(a'). The well-founded ordering <_w', corresponding to fnew(a'), may be distinct from the ordering <_w corresponding to f(a).
Note that we do not begin a new sequent for the derivation of the auxiliary procedure fnew; the synthesis
of the main program f(a) and the auxiliary procedure fnew(a’) are both conducted by applying derivation rules
to the same sequent. Those rows with output entries for fnew(a’) always have the expression t’(fnew(a)) as
the output entry for f(a).
Suppose we ultimately succeed in obtaining the goal true with primitive output entries t and t’:
    assertions      goals       outputs
                                f(a)     fnew(a')
                    true        t        t'

Then the final programs are

    f(a) <== t

and

    fnew(a') <== t'.
Note that although the portion of the derivation leading from goal A to goal B serves to motivate the for-
mation of the auxiliary procedure, it may actually have no part in the derivation of the final program; its role
has been taken over by the derivation of goal B’ from goal A’.
It is possible to introduce many auxiliary procedures for the same main program, each adding a new out-
put column to the sequent. An auxiliary procedure may have its own set of auxiliary procedures. An auxiliary
procedure may call the main program or any of the other procedures; in other words, the system of procedures
can be ‘‘mutually recursive.”
If we fail to complete the derivation of an auxiliary procedure fnew(a’), we may still succeed in finding
some other way of completing the derivation of f(a) without using fnew, by applying deduction rules to rows
that have no output entry for fnew(a’).
To illustrate the formation of auxiliary procedures, we consider the synthesis of a program cart(s,t) to
compute the cartesian product of two (finite) sets s and t, i.e., the set of all pairs whose first component
belongs to s and whose second component belongs to t. The specification for this program is
    assertions      goals                                 outputs: cart(s,t)
                    z = {(a,b): a ∈ s and b ∈ t}          z
(Note that this specification has no input condition, except for the type condition isset(s) and isset(t), which we
omit for simplicity.)
We denote the empty set by {}. If u is a nonempty set, then choice(u) denotes some particular element
of u, and rest(u) denotes the set of all other elements. We assume that the transformation rules concerning
finite sets include:
We will not reproduce the complete derivation, but only those portions that concern the formation of aux-
iliary procedures.
By application of deduction rules to the initial sequent, we obtain the goal

    A:  z = {(a,b): a = choice(s) and b ∈ t}.

By applying several deduction rules to this goal alone, we obtain the new goal

    B:  z = {(a,b): a = choice(s) and b ∈ rest(t)}.

This goal is a precise instance of the earlier goal, obtained by replacing t by rest(t); consequently our recurrence strategy motivates us to introduce an auxiliary procedure cartnew(s,t) having the earlier goal as its output specification, i.e.,

    cartnew(s,t)  ⇐  find z such that z = {(a,b): a = choice(s) and b ∈ t}.
We therefore introduce an additional output column corresponding to the new procedure, and we add to the
sequent the row
    A':  z' = {(a,b): a = choice(s') and b ∈ t'}
         output for cart(s,t):       if s = {} then {} else cartnew(s,t) ∪ cart(rest(s),t)
         output for cartnew(s',t'):  z'
The corresponding induction hypothesis is

    if (u',v') <_w' (s',t')
    then cartnew(u',v') = {(a,b): a = choice(u') and b ∈ v'}.

By applying deduction rules to goal A', we obtain the goal

    B':  z'' = {(a,b): a = choice(s') and b ∈ rest(t')}.
Applying GA-resolution between this goal and the induction hypothesis, and simplifying by transformation
rules, we obtain the goal
Note that a recursive call has now appeared in the output entry for the auxiliary procedure cartnew. By further
transformation, the well-founded ordering <_w' is chosen, and the derivation is ultimately completed. The final program for the auxiliary procedure is
    cartnew(s',t') <== if t' = {}
                       then {}
                       else (choice(s'), choice(t')) ∪ cartnew(s',rest(t')).
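For concreteness, the two procedures just derived can be rendered in Lisp with sets represented as lists, taking choice and rest to be car and cdr; the representation, names, and details of this rendering are our own illustrative choices.

    ;; Sets are represented as lists; choice(u) is (car u), rest(u) is (cdr u).
    (defun cartnew (s tt)
      ;; pairs choice(s) with every element of tt
      (if (null tt)
          '()
          (cons (cons (car s) (car tt))
                (cartnew s (cdr tt)))))

    (defun cart (s tt)
      ;; cartesian product of the sets s and tt
      (if (null s)
          '()
          (append (cartnew s tt)
                  (cart (cdr s) tt))))

    ;; (cart '(1 2) '(a b))  =>  ((1 . A) (1 . B) (2 . A) (2 . B))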
There are a few extensions to the method for forming auxiliary procedures that we will not describe in
detail:
(1) We have been led to introduce an auxiliary procedure when an entire goal was found to be an
instance of a previous goal. As we remarked in the section on mathematical induction, we can actually intro-
duce an auxiliary procedure when some subsentence of a goal is an instance of some subsentence of a previous
goal.
(2) Special treatment is required if the assertions and goal incorporated into the induction hypothesis
contain more than one occurrence of the same skolem function. We do not describe the formation of such an
induction hypothesis here.
(3) To complete the derivation of the auxiliary procedure, we may be forced to weaken or strengthen its
specification by adding input or output conditions incrementally. We do not present here the extension of the
procedure-formation principle that permits this flexibility.
L. Generalization
In performing a proof by mathematical induction, it is often necessary to generalize the theorem to be
proved, so as to have the advantage of a stronger induction hypothesis in proving the inductive step. Paradoxi-
cally, the more general statement may be easier to prove. If the proof is part of the synthesis of a program,
generalizing the theorem can result in the construction of a more general procedure, so that recursive calls to
the procedure will be able to achieve the desired subgoals. The recurrence strategy we have outlined earlier
provides a strong clue as to how the theorem is to be generalized.
We have formed an auxiliary procedure when a goal is found to be a precise instance of a previous goal.
However, in some derivations it is found that the new goal is not a precise instance of the earlier goal, but that
both are instances of some more general expression. This situation suggests introducing a new auxiliary pro-
cedure whose output condition is the more general expression, in the hope that both goals may be achieved by
calls to this procedure.
Let us be more precise. Suppose we are in the midst of a derivation and that we have already developed a
goal A, of form
    A:  R'(a,s₁,z₁)        (output for f(a): t₁(z₁)),
where s₁ is an arbitrary term. Assume that by applying deduction rules only to goal A and some assertions P'₁(a), P'₂(a), ..., P'ₖ(a), we obtain a goal B, of form

    B:  R'(a,s₂,z₂)        (output for f(a): t₂(z₂)),
where s₂ is a term that does not match s₁. Thus, the new goal (goal B) is not a precise instance of the earlier
goal (goal A). Hence, if an induction hypothesis is formed for goal A itself, the resolution rule cannot be
applied between goal B and the induction hypothesis.
However, both goals A and B may be regarded as instances of the more general expression R’(a,b’,z’),
where b' is a new constant: goal A is obtained by replacing b' by s₁, and goal B is obtained by replacing b' by s₂.
This suggests that we attempt to establish a more general expression (goal A’) hoping that the proof of goal A’
will contain a subgoal (goal B’) corresponding to the original goal B, so that the induction hypothesis resulting
from goal A’ will be strong enough to establish goal B’.
The new goal A' constitutes the output condition for an auxiliary procedure, whose specification is

    fnew(a',b')  ⇐  find z' such that R'(a',b',z')  where P'(a').

(Here, P'(a') is the conjunction P'₁(a') and P'₂(a') and ... and P'ₖ(a').) Consequently, we introduce a new output
column to the sequent, and we add the new assertion
    assertions      goals       outputs
                                f(a)    fnew(a',b')
    P'(a')
and the new goal
    A':  R'(a',b',z')
         output for f(a):           t₁(fnew(a,s₁))
         output for fnew(a',b'):    z'
(Note again that it is logically necessary to replace the input constant a by a new constant a'.) Corresponding
to this assertion and goal we have the induction hypothesis

    if (u',v') <_w' (a',b')
    then if P'(u')
         then R'(u',v',fnew(u',v')).
There is no guarantee that we will be able to develop from goal A’ a new goal B’ such that the resolution rule
can be applied between goal B’ and the induction hypothesis. Nor can we be sure that we will conclude the
derivation of fnew successfully. If we fail to derive fnew, we may still complete the derivation of f in some
other way.
We illustrate the generalization process with an example that also serves to show how program-synthesis
techniques can be applied as well to program transformation (see, e.g., Burstall and Darlington [1977]). In this
application we are given a clear and concise program, which may be inefficient, and we attempt to derive an equivalent program that is more efficient, even though it may be neither clear nor concise.
We are given the program
    reverse(l) <== if l = []
                   then []
                   else reverse(tail(l)) <> [head(l)]
for reversing the order of the elements of a list l. Here, head(l) is the first element of a nonempty list l and tail(l) is the list of all but the first element of l. Recall that u <> v is the result of appending two lists u and v, [] denotes the empty list, and [w] is the list whose sole element is w. As usual, we omit type conditions, such as islist(l), from our discussion.
This reverse program is inefficient, for it requires many recursive calls to reverse and to the append pro-
cedure <>. We attempt to transform it to a more efficient version. The specification for the transformed pro-
gram rev(/) is
    assertions      goals                   outputs: rev(l)
                    A:  z = reverse(l)      z
The given reverse program is not considered to be a primitive. However, we admit the transformation rules
    head(u·v) ==> u
    tail(u·v) ==> v
    [u] ==> u·[]
    (u·v = []) ==> false
(where u·v is the result of inserting u before the first element of the list v; it is the LISP cons function)
    u <> v ==> v                               if u = []
    u <> v ==> u                               if v = []
    u <> v ==> head(u) · (tail(u) <> v)        if u ≠ []
    (u <> v) <> w ==> u <> (v <> w)
    tail(l) <_w l ==> true                     if l ≠ []
By applying these transformation rules to goal A, we develop the goal

    B:  z = reverse(tail(l)) <> [head(l)].
This goal is not a precise instance of goal A. However, both goals may be regarded as instances of the more
general expression

    z' = reverse(l') <> m'.

Goal A is obtained by replacing l' by l and m' by [] (because u <> [] = u), and goal B is obtained by
replacing /’ by tail(/) and m’ by [head(/)]. This suggests that we attempt to construct an auxiliary procedure
having the more general expression as an output condition; the specification for this procedure is

    revnew(l',m')  ⇐  find z' such that z' = reverse(l') <> m'.
Consequently, we introduce a new output column to the sequent, and we add the new goal
    A':  z' = reverse(l') <> m'
         output for rev(l):          revnew(l,[])
         output for revnew(l',m'):   z'

together with the induction hypothesis

    if (u',v') <_w' (l',m')
    then revnew(u',v') = reverse(u') <> v'.
    B':  z'' = reverse(tail(l')) <> (head(l')·m')        (for l' ≠ [])
         output for rev(l):          revnew(l,[])
         output for revnew(l',m'):   z''
We succeed in applying the resolution rule between this goal and the induction hypothesis.
Ultimately, we obtain the final program

    rev(l) <== revnew(l,[])
    revnew(l',m') <== if l' = []
                      then m'
                      else revnew(tail(l'), head(l')·m').
This program turns out to be more efficient than the given program reverse(l); it is essentially iterative and employs the insertion operation · instead of the expensive append operation <>. In general, however, we
have no guarantee that the program produced by this approach will be more efficient than the given program.
A possible remedy is to include efficiency criteria explicitly in the specification of the program. For example,
we might require that the rev program should run in time linear to the length of /. In proving the theorem
obtained from such a specification, we would be ensuring that the program constructed would operate within
the specified limitations. Of course, the difficulty of the theorem-proving task would be compounded by such
measures.
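A runnable rendering of the transformed program makes the efficiency claim easy to check: the auxiliary procedure is tail-recursive and performs no append. The rendering below is ours; list insertion · becomes cons.

    (defun rev (l)
      (revnew l '()))

    (defun revnew (l m)
      ;; revnew(l,m) = reverse(l) <> m, computed with an accumulator
      (if (null l)
          m
          (revnew (cdr l) (cons (car l) m))))

    ;; (rev '(1 2 3))  =>  (3 2 1)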
Some generalizations are quite straightforward to discover. For example, if goal A is of form R’(a,0,z;)
and goal B is of form R‘(a,1,z2), this immediately suggests that we employ the general expression R’(a,b’,z’).
Other generalizations may require more ingenuity to discover. In the reverse example, for instance, it is not immediately obvious that z₁ = reverse(l) and z₂ = reverse(tail(l)) <> [head(l)] should both be regarded as instances of the more general expression z' = reverse(l') <> m'.
Our strategy for determining how to generalize an induction hypothesis is distinct from that of Boyer and
Moore [1975]. Their system predicts how to generalize a goal before developing any subgoals. In our
approach, recurrences between a goal and its subgoals suggest how the goal is to be generalized.
in which attention is passed back and forth freely between several competing assertions and goals. The present
framework can take advantage of parallel hardware.
Furthermore, the task of program synthesis always involves a theorem-proving component, which is
needed, say, to prove the termination of the program being constructed, or to establish the input condition for
recursive calls. (The Burstall-Darlington system is interactive and relies on the user to prove these theorems;
DEDALUS incorporates a separate theorem prover.) If we retain the artificial distinction between program syn-
thesis and theorem proving, each component must duplicate the efforts of the other. The mechanism for form-
ing recursive calls will be separate from the induction principle; the facility for handling specifications of the
form
and so forth. By adopting a theorem-proving approach, we can unify these two components.
Theorem proving was abandoned as an approach to program synthesis when the development of
sufficiently powerful automatic theorem provers appeared to flounder. However, theorem provers have been
exhibiting a steady increase in their effectiveness, and program synthesis is one of the most natural applications
of these systems.
N. Acknowledgments
We would like to thank John Darlington, Chris Goad, Jim King, Neil Murray, Nils Nilsson, and Earl
Sacerdoti for valuable discussions and comments.
References
Bledsoe [1977]
W.W. Bledsoe, ‘‘Non-resolution theorem proving,” Artif. Intel. J. 9(1977), pp. 1—35.
Darlington [1968]
J.L. Darlington, ‘‘Automatic theorem proving with equality substitutions and mathematical induction,”
Machine Intell. 3(Edinburgh, Scotland) (1968), pp. 113-127.
Green [1969]
C.C. Green, ‘‘Application of theorem proving to problem solving,’ in Proc. Int. Joint Conf. on Artificial Intelli-
gence (Washington, D.C., May 1969), pp. 219-239.
Hewitt [1971]
C. Hewitt, "Description and theoretical analysis (using schemata) of PLANNER: A language for proving theorems and manipulating models in a robot," Ph.D. Diss., M.I.T., Cambridge, Mass., 1971.
Murray [1978]
N. Murray, ‘‘A proof procedure for non-clausal first-order logic,’’ Tech. Rep. Syracuse Univ., Syracuse, N.Y.,
1978.
Nilsson [1979]
N.J. Nilsson, ‘SA production system for automatic deduction,’ Machine Intell. 9, Ellis Horwood, Chichester,
England, 1979.
Nilsson [1971]
N.J. Nilsson, Problem-solving Methods in Artificial Intelligence, McGraw-Hill, New York, 1971, pp. 165—168.
Robinson [1965]
J.A. Robinson, ‘‘A machine-oriented logic based on the resolution principle,’ J.ACM 12, 1(Jan. 1965), pp.
23—41.
Wilkins [1973]
D. Wilkins, ‘“‘QUEST—A non-clausal theorem proving system,’’ M.Sc. Th., Univ. of Essex, England, 1973.
CHAPTER 3

A STRATEGICAL APPROACH

W. Bibel and K. M. Hornig

Abstract
This paper describes LOPS, an interactive system for LOgical Program Synthesis, which is currently being
implemented. Given the logical input-output specification of a problem, LOPS attempts to construct an algo-
rithmic solution by following several strategies which result in logical transformations, the correctness of which
is established by a theorem prover. One of its further components is an example generator. Its use seems to
be a novelty in this field.
A. Introduction
As with the previous chapter, our approach to the automation of program construction is based on the
view that (human or mechanical) programming is a deductive process which starts with a more or less detailed
purely descriptive specification of the given problem in some representation language and after some search
guided by several basic strategies eventually ends up with the deduction of a suitable program solving the pro-
blem. We also adopt the view that predicate logic (in a wider sense) so far still is the most suitable representa-
tion language for studying this process, because of its conciseness, naturalness, flexibility, extendibility, and in
particular because of the well-understood deductive mechanisms for it.
The crucial issue in such a deductive approach is the development of strategies which both are generally
applicable and are effectively cutting down the otherwise tremendous search space. Strategies of that kind have
been presented in Bibel [1980] and successfully applied by hand to a variety of programming problems of a
rather different nature. It has been argued that these strategies are suitable for implementation. We discuss in
this chapter a number of steps towards such an implementation which have been worked out during the last
several years. Our system is called LOPS which stands for LOgical Program Synthesis. It is written in UT-
LISP, implemented on a CYBER 175, and currently comprises implementational work of about two man-years.
In this introduction we will outline the structure of LOPS, leaving further details to the subsequent sec-
tions. Its top-level conceptual structure is illustrated by figure 1. Note that the flow of control is not shown,
since the arrows rather illustrate the flow of relevant information during the synthesis process.
[Figure 1. The top-level conceptual structure of LOPS: a knowledge base (KB) and a central control unit (CCU), supported by an example explorer (EE) and an example generator (EG), produce the output algorithm.]
B. Description of CCU
In this section, we are describing the components of the central control unit, CCU, in LOPS, as intro-
duced in figure 2 from the previous section. For that purpose, their functions are illustrated by use of the fol-
lowing well-known partition problem (Bibel [1980]) as an example. Given a set S, an element a of S and an ordering relation < on S, we are looking for an algorithm which produces the subset S₁ ⊆ S of all elements of S smaller than a w.r.t. <. Notice that the problem given in the introduction of this book is in fact a special case thereof.
B.1. INPUT

As we said in the introduction, the background for LOPS is predicate logic, that is, the system will mani-
pulate formulas of first order predicate calculus. In the final version of LOPS, however, the user will not have
to be deeply familiar with predicate logic; rather he might then communicate with the system in some artificial
natural language (see section 4 in Bibel [1982a]).
In logic most problems can be stated in the following canonical way:
    ∀(input-variables) ∃(output-variables)
        [input-condition --> output-condition]
In this format our example could be written as follows, where we assume all members of S are different:

(1)    ∀S ∀a ∀before ∃S₁
           [ before is an ordering on S  and  a ∈ S
             -->  S₁ ⊆ S \ {a}
                  and ∀x (x ∈ S₁ --> x before a)
                  and ∀x (x ∈ S \ S₁ and x ≠ a --> a before x) ]
Here we have used a semi-formal language for predicate calculus, allowing more natural phrases like ‘‘< is an
ordering on S’’ rather than ORD (<,S). We use --> to signify implication and \ to indicate set subtraction.
Numerous further liberalizations are possible and a future input-dialogue could proceed as follows (or even
more comfortably). Words in capital letters indicate messages of the system.
INITIALIZE PROBLEM:
PART
INPUT-VARIABLES:
S,a, before
OUTPUT-VARIABLES
St
INPUT-CONDITION:
before ordering on S
INPUT-CONDITION:
a element of S
INPUT-CONDITION:
nil
OUTPUT-CONDITION:
Sl subset of S minus a
OUTPUT-CONDITION:
x element of S minus S1 and x #a implies a before x
OUTPUT-CONDITION:
x element of Sl implies x before a
OUTPUT-CONDITION:
nil
The system would understand expressions like ‘“‘element of’ or ‘‘subset of’. If, for example, ‘‘ordering on”’
was not known to the system it would request more information by asking repeatedly GIVE AXIOM FOR
"ORDERING ON": The user then would have to specify rules like
— before is binary
— before is transitive
— before is antisymmetric, etc.
In the current state of affairs the problem specification may in fact be presented to LOPS in the way of such a
dialogue except that a more formal language still has to be used. The task of designing a suitable user-language
has been deferred to the future. As an aside, it should be noted that there are several different (but equivalent
from a logical point of view) ways to describe problems and that the efficiency of LOPS may well depend on
this description. This effect, however, is certainly well-known to every human programmer as well.
Let us assume that our problem is stored internally in the form of (1) (with <_ replaced by a binary
predicate before) and that the following axioms are known to the system:
"before is an ordering on S" then means "(A₁) ∧ (A₂) ∧ (A₃)", with S as the domain of discourse.
where V is the exclusive disjunction, i.e. A V B holds if and only if exactly one of A or B holds. LOPS obtains
this form as follows. The component GUESS analyzes the syntactical structure of the output-condition with respect to the membership relation. Since S₁ occurs on the right hand side of an ∈, GUESS conjectures that S₁ is a set. This leads to the alternative (u ∈ S₁ V u ∉ S₁), capturing the distinction between the cases "lucky" and "not-lucky", where u denotes the guessed element.
In order to realize its task mentioned before, DOMAIN breaks up the output-condition into a set of
literals (clauses in more complicated cases). From these it selects a subset (specifying the restricted domain)
according to certain criteria. Some of these criteria are of a purely syntactical nature. For instance, the
output-variable should occur in a literal of the subset only if it ranges over individuals, since then, if we replace
it by our guess we obtain a meaningful formula. This would not be the case, if it would range over sets, since
then the output is constructed pointwise, and we could not reasonably replace the output, representing a set, by
one of its elements. There are also semantical criteria. A guess should neither prevent its success nor its
failure. This seems reasonable, since it is useless to make guesses which are always bound to be wrong. On
the other hand, it would be equivalent to the algorithm we wish to produce to make correct guesses every time.
In our example this amounts to the demand that both
Checking this means to construct models, and we can see here for the first time the need for a model-
generating component. A last very important criterion comes from a look ahead to the situation where we want
to transform our logical representation into executable code. Eventually we will need to have an algorithm
which is able to make the guess according to the condition chosen by DOMAIN. Hence, we must either know
that there is such an algorithm (e.g. in KB) or seek to construct it. If this is impossible we cannot use such a
literal as a DOMAIN-condition. If there are literals satisfying all our criteria, then the component has
completed its effort; if not, the system turns to the user and asks for help. In our example, the formula u ∈ S \ {a} satisfies all requirements and is therefore chosen. An algorithm CHOOSE-SIDIFF (SIDIFF stands for the difference of a set and a singleton), which chooses elements from S \ {a} if this set is non-empty, is provided by KB (see section B.5).
To apply GET-DNF (get disjunctive normal form) means simply to break up the problem into two mutu-
ally excluding cases:
B.3. GET-REC
According to Bibel [1980] the next strategy to be applied to our example is GET-REC. This strategy has
the task of finding a recursion scheme which fits the description of the problem obtained by GUESS and
DOMAIN and which makes an algorithmic solution possible. This is achieved by a matching procedure assisted
by a theorem-prover (see section C.1), as will be discussed now. In Bibel [1980] it has been argued that there
are only a few practically useful recursion schemata. In the final version of LOPS these schemata will be avail-
able for the matching process in a hierarchically structured data-base. The fact that u€S holds (by the choice u
€ § \ {a} performed by DOMAIN) gives a high priority to such a recursion scheme which, when applied to the
present situation, reduces the problem from S to S \ {u} (see formula (4) below). So, LOPS attempts to
replace equivalently S by S \ {u}. As for DOMAIN above, the input- and output-conditions are simply
regarded as a set of literals. Each of these literals A determines a new literal Aᵘ, obtained by replacing every occurrence of S by S \ {u}; e.g., a ∈ S would be replaced by a ∈ S \ {u}. The equivalence of A and Aᵘ now has to be checked under the side conditions u ∈ S \ {a}, and u ∈ S₁ or u ∉ S₁, respectively, according to formula (2).
The following three different reactions can be distinguished:
1. Under the given side conditions, A and Aᵘ are equivalent. For instance, it is true that
   holds under the side conditions u ∈ S \ {a} and u ∉ S₁. In this case, A may be replaced by
   Aᵘ without changing the truth value of the formula.
2. There is a formula B such that Aᵘ is similar to a (not necessarily proper) subformula C of B
   (in the sense that C can be obtained from Aᵘ by a substitution of terms by terms), and
   A and B are equivalent under the given side conditions. For instance, under the side
   conditions u ∈ S \ {a} and u ∈ S₁, we have the following equivalence.
3. If there is no obvious way to obtain one of the first two cases, then A remains unchanged. An
   example of such an A is "before is an ordering on S".
Acting as described above the system obtains from (2), splitted by GET-DNF, a new equivalent
problem-specification. This equivalence is the following formula (3), where we omit obvious quantifiers:
To see that this equivalence actually holds, assume the antecedent u ∈ S \ {a} and, distinguishing the two cases u ∈ S₁ and u ∉ S₁, prove the equivalence of corresponding clauses, e.g.

    S₁ ⊆ S \ {a}  <--->  S₁ \ {u} ⊆ S \ {u} \ {a}        in the case u ∈ S₁,

and

    S₁ ⊆ S \ {a}  <--->  S₁ ⊆ S \ {u} \ {a}              in the case u ∉ S₁.
This is an instance of the recursion scheme we had in mind at the beginning of this section.
The transformation from (3) to (4) is done in two steps. First, LOPS determines a scheme from which all
equivalences contained in (3) can be obtained by substitution and possibly addition of formulas. Such a
scheme is given by:
The substitution ((X S) (Y S) (Z S₁) (W S₁)) gives the left hand side of (3), except for the distinction of cases (u ∈ S₁ V u ∉ S₁); ((X S) (Y S \ {u}) (Z S₁ \ {u}) (W S₁)) gives the first alternative (u ∈ S₁) of the right hand side; and ((X S) (Y S \ {u}) (Z S₁) (W S₁)) gives the second alternative (u ∉ S₁), except for the clauses (u before a) and (a before u). In other words, there is a scheme F(X,Y,Z,W) with
    F(S, S, S₁, S₁) ∧ (u ∈ S₁ V u ∉ S₁)
    <--->
    F(S, S \ {u}, S₁ \ {u}, S₁) ∧ (u before a) ∧ u ∈ S₁
    ∨
    F(S, S \ {u}, S₁, S₁) ∧ (a before u) ∧ u ∉ S₁
In the second step LOPS notes that the first and fourth arguments of all F-expressions are equal and thus not
relevant for the recursion. LOPS considers the third component as the result of a computation (it corresponds
to the output variable) and uses the side conditions u ∈ S₁ and u ∉ S₁ to get (4). Thus, PART is defined by
it will react with the counterexample y=u (cf. Bledsoe and Ballantine [1979]), saying that u is a counterex-
ample if and only if u before a holds. Therefore LOPS reacts by adding the negation of this condition, that is
a before u to A". The theorem prover has to establish the validity of the new equivalence. In general, this
technique may be too simple to be successful. To deal with a more complicated situation we use the example
generator EG. This will now be illustrated with the substitution of u ∈ S₁ and u ∉ S₁ by computationally evaluable formulas. Since S₁ is to be determined, any literal containing S₁ is computationally infeasible, and thus has to be substituted. Consider the first case u ∉ S₁. Only an example can help in this case. Therefore we look for a
model (S, before, S₁), say, with S = {0,1,2,3} and the following axioms:

    S = {0,1,2,3}
    before = {(1 2), (1 3), (1 0), (2 3), (2 0), (3 0)}
    S₁ = {1}

(The relation before thus orders the elements as 1, 2, 3, 0.)
During the construction EG remembers why certain elements do not satisfy certain relations. Therefore, looking at a typical example of an element u ∉ S₁ (e.g. 3), we find that the reason is the failure of (A₄), i.e. not (3 before 2). Therefore, the component EG has to suggest u ∈ S₁ <---> (u before a). The theorem-prover would then check that this is in fact true and a simplifying procedure would use (A₂) to obtain
At this point two objections to this description of the use of the model program EG in cooperation with
EE might arise. Namely it is difficult to see in our very simple example why such a complicated approach is in
fact necessary, since u ∉ S₁ <---> a before u expresses hardly more than the definition of S₁. The reason for this is that we wanted to present the basic ideas behind our approach to program synthesis and not simply ad hoc solutions that apply only to specific problems. The reader who wants to appreciate the necessity of our general approach is encouraged to try the maximal spanning tree algorithm (Bibel [1980]). There, one is confronted with a clause e ∈ E₁ V e ∉ E₁, where E₁ is the desired output. It turns out that e ∉ E₁ is equivalent to the
property of being an edge in a closed circuit of the graph. To us it seems promising to find this and similar
equivalences by use of the model program, and in fact we are not aware of any better solution.
Thus SIDE CONDITIONS, the component in LOPS responsible for what has-been described in this sec-
tion, leads to
It remains to transform this into an algorithmic solution, that is to choose suitable data structures and to find a
control structure. This will be discussed in the following section.
B.5. ALGORITHM
It is a straightforward task to transform (5) into a PROLOG procedure (Colmerauer et al. [1981]). This
way LOPS could have finished its task of constructing an algorithm for the partition-problem. However, to
obtain more efficient algorithms it is necessary to select suitable data-structures and to eliminate recursion.
This particular part has lower priority in our project and will be taken into closer consideration at some time in
the future. We note, however, that there are well-explored methods to do this which have been used in other
projects like SETL (Dewar [1978]) and PSI (Green [1977]). Eigemeier et al. [1980] describe methods to gen-
erate data-types, a project which may be regarded as complementing our efforts in this respect.
In the current version of LOPS there is only a rudimentary version of ALGORITHM which, for instance,
translates (5) into the executable LISP-program, shown in figure 3.
(DEF
(PART (S A)
(PROG (U*)
(COND ((NOT (MEMBER A S)) (RETURN NIL)))
(SETQ U* (CHOOSE-SIDIFF (LIST S) (LIST A)))
(COND (U*
(COND ((LESSP A U*)
(RETURN (PART (REMOVEI S U*) A)))
(T
(RETURN (CONS U* (PART (REMOVEI S U*) A))))
))
(T (RETURN NIL))))))
(DEF
(CHOOSE-SIDIFF (ARGS+ ARGS-)
(DO
((Y (CAR ARGS+) (REMOVEI Y Z)) (Z NIL)
(CHOICE NIL))
((OR (NULL Y) CHOICE) CHOICE)
(SETQ Z (RAND Y))
(SETQ CHOICE (TEST-SIDIFF Z (CDR ARGS+) ARGS-)))))
(DEF
(TEST-SIDIFF (CH ARGS+ ARGS-)
(COND ((TEST-INTERSECTION CH ARGS+)
(DO ((FLAG T) (Y ARGS- (CDR Y)))
((OR (NULL Y) (NULL FLAG))
(COND (FLAG CH)
(T NIL)))
(SETQ FLAG (NOT (EQUAL CH (CAR Y)))))))))
(DEF
(RAND (LIST) (NTH LIST (ADD1 (REMAINDER (TEMPUS) (LENGTH LIST))))))
(DEF
(TEST-INTERSECTION (CH ARGS)
(DO ((FLAG T) (Y ARGS (CDR Y)))
((OR (NULL Y) (NULL FLAG))
(COND (FLAG CH)
(T NIL)))
(SETQ FLAG (MEMBER CH (CAR Y))))))
We do not claim that this is the most efficient algorithm to solve the partition-problem. Clearly, a choice of
better data-structures, elimination of recursion and other modifications will lead to considerable improvements.
But as we said, such techniques might be adapted from other projects, hence they are presently not in the focus
of our interest.
Clearly, a model of B₁,B₂ must be a structure M := (S,P,Q) where S is the domain and P ⊆ S and Q ⊆ S determine the subsets of elements x in S for which Px and Qx, respectively, are true. In both approaches we first fix a small number N, let S := {0,1,...,N}, and attempt to find suitable relations P and Q. This particular choice of S is, of course, not essential. Instead of numbers we may use other symbols, e.g. LISP atoms. As an example, let N = 2, so that S = {0,1,2}. M is called a model of, say, B₁ if M satisfies B₁, in symbols M |= B₁. The relation |= is defined by the following induction on the structure of sentences. Let G₁ and G₂ be sentences containing at most the predicate symbols P and Q and constants from S.
a) M |= Pn  <--->  n ∈ P,  and  M |= Qn  <--->  n ∈ Q,  for each n ∈ S.
b) M |= ¬G₁  <--->  not M |= G₁.
c) M |= G₁ ∧ G₂  <--->  M |= G₁ and M |= G₂, and similarly for the other connectives.
d) M |= ∀x G₁(x)  <--->  M |= G₁(n) for all n ∈ S.

For example,

    M |= ∀x (Px ∨ Qx)
    <===>  for all n ∈ S:  M |= Pn ∨ Qn
    <===>  for all n ∈ S:  M |= Pn or M |= Qn
    <===>  for all n ∈ S:  n ∈ P or n ∈ Q.

Similarly,

    M |= ∀x ¬(Px ∧ Qx)
    <===>  for all n ∈ S:  not (n ∈ P and n ∈ Q)
    <===>  for all n ∈ S:  n ∉ P or n ∉ Q.
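The satisfaction relation over a finite domain is directly computable, which is what makes this model-construction approach feasible. The following sketch evaluates formulas of the kind used above for a model given by the lists of elements satisfying P and Q; the formula syntax and the function name are our own illustrative assumptions.

    (defun holds-p (formula s p q)
      "Decide M |= FORMULA for M = (S,P,Q).  Formulas are built from
    (P n), (Q n), NOT, AND, OR, and (ALL x body)."
      (case (first formula)
        (p   (member (second formula) p))
        (q   (member (second formula) q))
        (not (not (holds-p (second formula) s p q)))
        (and (and (holds-p (second formula) s p q)
                  (holds-p (third formula) s p q)))
        (or  (or (holds-p (second formula) s p q)
                 (holds-p (third formula) s p q)))
        (all (every (lambda (n)
                      (holds-p (subst n (second formula) (third formula))
                               s p q))
                    s))))

    ;; With S = {0,1,2}, P = {2}, Q = {0,1}, both of the formulas above
    ;; evaluate to a true (non-NIL) value:
    ;; (holds-p '(all x (or (p x) (q x)))        '(0 1 2) '(2) '(0 1))
    ;; (holds-p '(all x (not (and (p x) (q x)))) '(0 1 2) '(2) '(0 1))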
    B₃:  P0 ∧ P1
    B₄:  ∀x (Px --> Qx)
    B₅:  ∃x ¬Qx
It is rather obvious that the most direct way to satisfy these axioms is to do it in the sequence B₃, B₄, B₅, which in a way reflects increasing complexity. Syntactically this sequence is found by counting the number of positive and negative occurrences of predicates in the axioms. We get
If we apply the heuristic rule "List axioms where the relation R occurs positively before those where the relation R occurs negatively!", then we obtain exactly the sequence B₃, B₄, B₅. The general rules to obtain such a sequence are slightly more complicated, since the structure of positive and negative occurrences is usually not as simple as here. A second strategy is to remove as few elements as possible and to check whether axioms which were valid in Mᵢ are still valid in Mᵢ₊₁. There are more strategies, for which we have no space to go into
details.
We only have a further look at our example. Apparently it holds that M,|=B, and M,|#B, (M, was
defined at the beginning of C.2.1). If we choose P}:=P, \ {0} and Q}:=Q,, we get M,|=B; and Mj|= Bo.
Another such step gives M₂ := (S,{2},S). If we define P₃ := P₂ \ {2}, B₃ no longer holds in M₃. Therefore we had better try to reduce Q₂ = S. This leads us, for instance, to M₃ := (S,{2},{0,1}). It is easily seen that M₃ |= B₁ ∧ B₂.
It may also occur in the present version that the program can find no model even though such models
exist, that is, the procedure is not complete in this sense. Therefore, we have made another, more systematic
attempt to construct models.
which in turn is equivalent to B₁ ∧ B₂. Now, it is not hard to see that a non-complementary path (see Bibel [1982b]) through K determines models of B₁ and B₂. Such paths are obtained by selecting one literal from each column (i.e. clause). As an example consider the path {Q0, Q1, P2, ¬P0, ¬P1, ¬Q2}. In fact, it can be
shown that all models of cardinality 3 can be obtained (up to isomorphism) this way. Now, the proof-
procedure given by the first author in Bibel [1981] after an easy modification produces non-complementary
paths, if applied to matrices of axioms which possess a model of the cardinality in question. These facts justify
the label ‘‘systematic’’. A program which embodies these ideas has been written and is currently undergoing
tests. For more details see Hornig [1981].
which we call model prototypes. To the system, model prototypes could be made available in three ways, by a
data-base installed in the system, by the user, or by a construction using the methods in C.2.1 or C.2.2. What
is needed is a procedure which can join several model prototypes into one model. Theorem proving will cer-
tainly play an important role in this context. Such a procedure should be feasible, since human beings work in
such a way when constructing models, but nothing has been done yet to formalize these ideas.
— transitivity of relations
— uniqueness in arguments
— aboveness in physical spaces
— behindness in physical spaces, etc.
The system then checks which of these properties hold in the model and suggests combinations of them as gen-
eral facts. The generality has to be checked by TP.
There are interesting questions concerning the use and updating of these data-structures. They will not be dis-
cussed here.
F. Conclusion
We have presented LOPS, an interactive system to construct programs by logical transformations. LOPS
consists of a control unit, a knowledge base, a theorem prover, an example generator and an example explorer.
After a two man-years implementational investment some of these components are still in a rudimentary state.
The structure of LOPS is patterned after procedures which are believed to be central to human program-
ming efforts. We hope that with the experience of experimenting with LOPS, this system can be improved to
such an extent as to strengthen the view that this approach, especially the strategies described in Bibel [1980] is
powerful enough to allow the synthesis of algorithms for many practical problems.
G. Acknowledgements
We thank Alan Biermann for a number of detailed comments which provided substantial support for a
considerably improved version of this paper.
References
Bibel [1980]
W. Bibel, "Syntax-directed, semantics-supported program synthesis," Artificial Intelligence, Vol. 14, No. 3, pp. 243-261.
Bibel [1981]
W. Bibel, "Matings in matrices," Proceedings German Workshop on Artificial Intelligence 1981, Bad Honnef, J. Siekmann (ed.), Informatik Fachberichte No. 47, Springer, 1981, pp. 171-187.
Bibel [1982a]
W. Bibel, "Logical program synthesis," Proc. Intern. Conference on Fifth Generation Computer Systems, Tokyo 1981, T. Moto-oka (ed.), North-Holland P.C., 1982, pp. 227-236.
Bibel [1982b]
W. Bibel, "Automated theorem proving," Vieweg, 1982.
Biermann [1976]
A.W. Biermann, "Approaches to automatic programming," Advances in Computers, Vol. 15 (1976), pp. 1-63.
Dewar [1978]
R. Dewar, "The SETL programming language," Courant Inst., New York University (1978).
Green [1977]
C. Green, "A summary of the PSI program synthesis system," Proc. IJCAI-77 (1977), pp. 380-381.
Hornig [1981]
K.M. Hornig, "Generating small models of first order axioms," Proc. German Workshop on Artificial Intelligence 1981, Bad Honnef, J. Siekmann (ed.), Informatik Fachberichte No. 47, Springer, 1981, pp. 248-255.
Kowalski [1979]
R. Kowalski, "Algorithm = logic + control," Communications of the ACM, Vol. 22 (1979), pp. 424-436.
Wegbreit [1976]
B. Wegbreit, "Goal directed program transformations," IEEE Transactions on Software Engineering, Vol. 2 (1976).
CHAPTER 4
COMBINING SYNTHESIS WITH ANALYSIS
Ria Follett
Scientia Pty. Ltd. Computer Consultants
Anzac House
26-36 College Street
Sydney, N.S.W. 2010
Australia
Abstract
When synthesizing programs which may have side effects, these side effects must be discovered and taken
into account. Program analysis is used to describe the effect of any program segment in sufficient detail to
allow the required goal to be achieved and verified. Combining program analysis and program synthesis is espe-
cially important when synthesizing recursive programs which may contain arbitrary side effects. The method is
illustrated by synthesizing two different sorting algorithms—insertion sort and quick sort.
A. Introduction
A major problem in the synthesis of programs is how to take proper account of the side effects of parts of
the program. A side effect is an action which is unplanned. For example, you may open a window to let in
some air, and in flies a bee. The bee flying in is a side effect of opening the window. Side effects may be
harmful but not necessarily so. Side effects can be used constructively in the synthesis of programs, rather than
being purely a destructive element.
Before the side effects of a program segment can be used in program synthesis, these side effects must be
discovered and described. Together with a system that synthesizes programs, there must be a method of
analyzing any new program segment to obtain its description, which includes all possible side effects. To handle
any practical problem, methods of analyzing branches, loops and subroutines must be developed. The analysis
of the program will result in a detailed program description. The program description should contain sufficient
detail to allow the correctness of programs to be proved, while keeping the effort involved in program analysis
to a minimum.
A method of program description has been developed which minimizes the amount of analysis required to
obtain automatically a description which is adequate for the required verification. The program segment
descriptions are then used to synthesize larger programs.
An automatic program synthesis system, PROSYN, has been designed to illustrate these methods.
PROSYN allows the synthesis of recursive subroutines, as well as the synthesis of hierarchies of programs. The
system has automatically synthesized and analyzed a variety of programs including SORT2, MAX, SUM of an
array, solving two simultaneous equations, REMAINDER from dividing two numbers, moving elements up and
down arrays, sorting arrays, adding, multiplying and transposing matrices, inverting matrices, finding eigenvec-
tors of matrices, finding the zeros of a function, and solving a set of linear simultaneous equations.
In the following sections the method of describing a program shall be presented, together with methods of
program analysis and synthesis. Various sorting algorithms will be used as examples of the variety of programs
that may result from using these techniques. Detailed descriptions of two sorting algorithms, insertion sort and
quicksort, shall be used to illustrate the process.
where I is the identity function. This means that only relations R, where R does not depend on the variable
'v', may be passed back. These relations will be passed back unaffected by the assignment statement.
The variety of passback pairs available for any given program segment means that the depth to which the
segment must be analyzed may be varied depending on the depth of description required. For example, no
analysis is needed to describe a program segment by the passback pair
(FALSE, I)
As no relations can be passed back over the segment using this passback pair, it is not very useful in either pro-
gram verification or program synthesis. A variety of more expressive passback pairs of the form
(S, I)
where I is once again the identity function, is often achievable. In these cases, S defines a domain of invariants. Exact descriptions, where obtainable, are of the form
(TRUE,f)
where f defines the predicate transformer of all relations. Any combination of invariants and predicate
transformers may be used to describe a program segment. If the passback pair is
(S, f)
then invariants are relations R which lie in the domain defined by S and for which R => f(R).
The process of obtaining a more precise passback pair is called refining the passback pair. Refining the
passback pair implies widening the domain, with a corresponding modification to f.
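To fix ideas, a passback pair can be modelled as a pair (S, f) of a test on relations and a transformer on relations. In the toy Python rendering below a relation is a predicate over a program state together with the set of identifiers it mentions; the names and the treatment of an assignment v := a are illustrative only, since PROSYN manipulates formulas symbolically rather than states.

# A relation R is modelled as (pred, vars): a predicate over a program state
# (a dict of values) plus the set of identifiers the relation mentions.
def coarse_pair(assigned_var):
    # (R does not contain assigned_var, I): usable with no analysis of the rhs
    S = lambda R: assigned_var not in R[1]
    f = lambda R: R[0]                # the identity transformer I
    return S, f

def exact_pair(assigned_var, rhs):
    # (TRUE, f): f substitutes the assigned expression for the variable in R
    S = lambda R: True
    f = lambda R: (lambda s: R[0]({**s, assigned_var: rhs(s)}))
    return S, f

def pass_back(pair, R):
    # return the precondition f(R) if R lies in the domain S, else None
    S, f = pair
    return f(R) if S(R) else None

# y = 0 (mentioning only y) passes back unchanged over v := a under the coarse
# pair; v = b falls outside its domain and needs the exact pair, giving a = b.
y_is_zero = (lambda s: s["y"] == 0, {"y"})
v_is_b    = (lambda s: s["v"] == s["b"], {"v", "b"})
coarse, exact = coarse_pair("v"), exact_pair("v", lambda s: s["a"])
assert pass_back(coarse, y_is_zero) is y_is_zero[0]
assert pass_back(coarse, v_is_b) is None
precond = pass_back(exact, v_is_b)            # behaves like the relation a = b
assert precond({"a": 1, "b": 1, "v": 0})

Refining a pair in this sense widens S (ultimately to TRUE) at the cost of a more complicated f, which is exactly the trade-off described here.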
For example, consider the following program, SORT2, that sorts two numbers x and y:
IF x <= y THEN
ELSE temp := x ; x := y ; y := temp
END
A passback pair for this segment is
(x, y and temp do not occur in R, I)
where I is the identity function. Using this passback pair, only relations not dependent on the values of x, y or
temp may be passed back over the segment, and these may be passed back unaffected.
This passback pair may be refined by passing the relation back separately over both branches. The rela-
tion R may be passed back over the ELSE branch, resulting in the precondition R''', where R''' is R with x and y interchanged.
The relation R may be passed back over the THEN branch unchanged. Thus a refined passback pair for the
SORT2 segment is
(R''' = R, I)
This passback pair means that, whenever x and y may be interchanged in R without affecting R, and R does not
contain temp, then R may be passed back unaffected by SORT2. The passback pair may be further refined; the
refined pair means that, if both R and R''' are true before SORT2, then R will be true after SORT2. The
passback pair may be refined still further.
Each passback pair is more complicated than the previous passback pair. The passback pair need only be
refined to the level required for program verification or program synthesis.
A passback pair also shows when a primitive, if inserted, will interfere with a protected relation R. Pro-
tected relations, or protected goals, are widely used in program-synthesis systems, for example in Sussman
[1975], Tate [1975] and Waldinger [1977]. After goals are achieved, they are protected, and no further pro-
gram steps may be added that alter or ‘‘undo’’ the protected goal. A program step, with a passback pair of
(S, f), is likely to alter a protected relation R unless S(R) and R => f(R). For example, v := a may be
inserted where y = b is protected, as v is not contained in the relation, but v := a may not be inserted
where v = b is protected. Further refinements may decide that the relation is not affected after all. For
example, if v = a is protected, then the program segment v := a may not be inserted if the segment is
described by
(v is not contained in R, I).
A more refined description of the segment, of the exact form (TRUE, f) with f substituting a for v, shows
otherwise: f(v = a) is a = a, which is true. Thus S(R) and R => f(R), and so the segment may be inserted after all.
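The interference test itself can be phrased directly in these terms. In the sketch below a protected relation is a predicate over states, and the implication R => f(R) is merely sampled over a handful of states; PROSYN discharges it by theorem proving and pattern matching, so the code is only a rough illustration.

def may_insert(pair, protected, sample_states):
    # a step with passback pair (S, f) is accepted where 'protected' is
    # protected only if S(protected) holds and protected => f(protected)
    S, f = pair
    if not S(protected):
        return False
    transformed = f(protected)
    return all(transformed(s) for s in sample_states if protected(s))

# v := a where v = a is protected (the example above): the exact description
# transforms v = a into a = a, which holds everywhere, so insertion is allowed.
exact_for_v_gets_a = (lambda R: True,
                      lambda R: (lambda s: R({**s, "v": s["a"]})))
v_eq_a = lambda s: s["v"] == s["a"]
states = [{"v": i, "a": j} for i in range(3) for j in range(3)]
print(may_insert(exact_for_v_gets_a, v_eq_a, states))      # True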
C. Program Synthesis
The analysis/synthesis approach has been illustrated by implementing an automatic programming system
in LISP, called PROSYN. PROSYN consists of
An important aim in the development of PROSYN was to show the power of the approach in a wide
variety of domains. Hence the synthesizer was designed to be domain independent, and we rely on comple-
mentary information supplied for any desired domain, rather than relying on a fixed domain. Domains imple-
mented include BLOCKS, INTEGERS, ARRAYS with REALS, and LISTS. The more primitive domains such
as BLOCKS did not need the complete power of the synthesis system.
The domain information consists of
1) the grammar of relations allowed. This grammar determines the format of the input required by the syn-
thesizer. PROSYN assumes a prefix format, whereas in this paper relations are expressed in the more
normal infix format. An example used in the grammar is
(VAR 1) = (EXP 1)
where (VAR 1) is a free variable defined in the grammar by VAR, and (EXP 1) is a free expression as
defined by EXP. EXP may include in its definition that EXP is:
(EXP 1) + (EXP 2)
or (VAR 1)
or any atomic value
2) The primitives supplied. The assignment statement was usually the only primitive used. The assignment
statement may be described as:
3) The domain dependent logic. The pattern matcher accesses this logic when attempting to show that two
expressions match. For example, in the REAL domain exp1 < exp2 is equivalent to exp2 > exp1.
4) Any initial assumptions, such as initial values of variables, positions of blocks, etc.
Functions are synthesized by considering goals in turn. After a program to satisfy some of the goals has
been created, remaining goals are synthesized by taking into account the side effects of the program already
written and the goals that are protected. Strategies (corresponding to programming knowledge) are used to
guide the construction. Backtracking occurs when all strategies fail. The strategies, or methods of tackling
problems, used include inserting primitives and achieving their preconditions, passing back goals, splitting the
goal into cases, and induction. Strategies other than induction lead to straight line or branch programs. The
synthesis of recursive programs is far more difficult. In this case the incomplete program must be analyzed so
that recursive function calls, which are to be inserted, do not interfere with the goals already achieved.
Recursive functions are achieved in stages by applying induction. Induction is applied whenever the goal
consists of a relation defined in the domain to operate on a SET of values, for example the relations FORALL
and EXISTS. The induction process may be described as follows
a) if the set cannot be proven to be non-empty, the goal is split into the case of an empty set and the case of a
non-empty set.
b) the non-empty set is then split using one of the possible splits indicated in the domain and the new goals
attempted.
c) when a future goal is an instance of the original goal, then
d) termination is considered. If termination is assured then
e) the program written so far is analyzed for its passback pair.
f) the recursive function call is inserted if its passback pair does not interfere with any relations protected
where it is to be inserted.
g) when completed the function is again analyzed. If only recursive function calls are inserted in the func-
tion program, the passback pair as calculated in f) is valid. If other primitives are added, then the pass-
back pair must be re-evaluated, and all recursive function calls checked for consistency with the protected
relations. If the program consists of only recursive function calls, the function cannot be synthesized by
this method. An alternative split must be used.
h) the program can now be used as a completed subroutine.
The above process can be illustrated by the program that inserts an element in an array. This program
was synthesized as part of the synthesis of the insertion sorting algorithm which is described later. The aim of
INSERT is to achieve
The synthesis of INSERT will be described under the points a) to h) given above.
a) The set is empty if n<p. As it cannot be determined whether or not n < p, the goal is split into the two
cases. This results in the following program segment:
b) The non-empty set is split, and the new goals are attempted in turn.
i) achieve x(n-1) < x(n). The validity of this goal cannot be determined, so it is used as the basis of a branch. If the goal is not
already true, swapping will make it true. As no goals are as yet protected, the elements can be swapped,
and the following program segment is inserted after the ELSE .... in a):
This achieves the goal x(n-1) < x(n) which is then protected.
ii) achieve FORALL{i = p + 1, n-1} x(i-1) < x(i). The validity of this goal cannot be determined, as
the SWAP interferes with the assumed precondition (which is, in fact, identical to this goal). The goal is
then attempted separately in each branch:
• After the THEN, the only program is the null program, and thus the goal is true given the original
precondition.
• After the ELSE, the original precondition is affected by the ELSE. However:
c) The new goal is an instance of the original goal, so induction can be applied.
d) As the set on which the new goal is defined, FORALL{i = p + 1, n-1} is a proper subset of the original
goal (FORALL{i = p + 1, n}), termination is assured, so
e) the program written so far is analyzed for its passback pair. The following passback pair can be derived
(see Follett [1980 a,b]) for INSERT (p , n-1)
INSERT(p , n-1) can be inserted after the SWAP if it does not interfere with the protected goal x(n-1) <
x(n). Denote the above passback pair as (TRUE, f). Then the effect on the protected goal, f(x(n-1) < x(n)),
is FORALL{i = p, n-1} x(i) < x(n). However, closer analysis shows that the relations already true at this
point imply the altered goal (see above reference), and so
f) the recursive function call INSERT(p, n-1) is inserted after the SWAP.
g) As only a recursive function call has been inserted, the function does not need to be reanalyzed.
h) The program can now be used in the SORT program.
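Rendered by hand in Python, the kind of program these steps arrive at looks as follows; the indexing conventions are chosen for this sketch and are not PROSYN output.

def insert(x, p, n):
    # insert x[n] into the already sorted segment x[p..n-1] (inclusive indices)
    if n <= p:                        # a) the index set is empty
        return
    if x[n - 1] <= x[n]:              # i) the goal x(n-1) <= x(n) already holds
        return
    x[n - 1], x[n] = x[n], x[n - 1]   # otherwise SWAP(n-1, n) achieves it
    insert(x, p, n - 1)               # f) the recursive call INSERT(p, n-1)

x = [1, 3, 5, 7, 4]
insert(x, 0, 4)
print(x)                              # [1, 3, 4, 5, 7]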
where s(i) is the initial value of x(i). Note that the last two conjuncts ensure that the resultant array is a per-
mutation of the original array. The first conjunct ensures that the resultant array is sorted.
The last two conjuncts are true initially, and so may be protected over the range of the whole program.
They need not be specifically achieved.
The goal that still needs to be achieved is
As the goal is expressed as a FORALL, induction is used. The set on which i operates, being (p + 1 , n) can
be divided into the subsets
The first decision is to determine the value of m. If m is n—1, various selection, insertion and bubble sorts
result. If m is another predetermined number (such as (n—p)/2), then a merge sort results. If the choice of
m is left until later, partition sorts (such as a quick sort) result.
Secondly the order in which the goals are attempted must be determined. The order affects the resulting
algorithm, but not the correctness, as future goals take into account the program already synthesized and the
goals that were achieved or used. For sorting algorithms, the order in which the goals are attempted is relatively unimportant. Similar programs are obtained, except that attempting the singleton goal (x(n-1) < x(n))
first, tends to lead to bubble sorts.
Figure 1 shows how the major sorting algorithms are obtained. These are discussed in detail in Follett
[1980 b]. The synthesis method is discussed by illustrating how insertion sort and quicksort are synthesized.
D.1 Insertion Sort
The goal can be split only if p < n. As this cannot be determined at this stage, a branch is inserted in the pro-
gram (see B point a). The goal is then split as
This can be solved by a recursive call to SORT(p ,n-1), and, as this does not interfere with any protected rela-
tion (no relation is as yet protected), the recursive call is inserted. This gives the program
Figure 1. Various Sorting Algorithms: a decision tree showing how, starting from the original goal, choices such as passing back a goal or reachieving it lead to the different sorting algorithms.
The remaining goal x(n-1) < x(n) is to be achieved at 100. This can be achieved using SORT2(n-1 ,n),
the subroutine that sorts x(n-1) and x(n). Before this can be inserted, any interactions between SORT2 and
the goal already achieved must be considered. A passback pair for SORT2(n ,n-1) was developed in section B,
as
(TRUE, f), with f(R) = R''',
where R is the relation to be passed back and R''' is the relation with x(n) and x(n-1) interchanged. The effect
of SORT2(n, n-1) on the protected goal is considered. The protected goal, G,
can be passed back over SORT2(n , n-1) giving f(G) as FORALL{i = p + 1 ,n-1} x(i-1) < x(i) AND x(n-2)
< x(n). As G ≠ f(G), the protected goal will be altered if SORT2 is inserted between 50 and 100.
There are two ways of solving this problem:
i) The goal x(n-1) < x(n) can be passed back over the recursive function call, SORT(p ,n-1) at line 50.
This approach leads to various selection sort routines (see Follett [1980 b]), or
ii) The overall goal can be reachieved at 100, given the subgoal already achieved at 50. This approach leads
to insertion sorts, and will be described now.
A new subroutine is synthesized, and will be, for the sake of clarity, described as INSERT. The aim of the
subroutine is to achieve the overall goal at 100, assuming as a precondition the subgoal already achieved at 50.
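The insertion-sort structure this subsection derives can be sketched as follows; INSERT is the subroutine sketched earlier, and the comments refer to the program points 50 and 100 mentioned in the text. The permutation conjuncts, which the text protects throughout, are not represented explicitly.

def insert(x, p, n):                      # as in the earlier sketch
    if n > p and x[n - 1] > x[n]:
        x[n - 1], x[n] = x[n], x[n - 1]
        insert(x, p, n - 1)

def sort(x, p, n):                        # SORT(p, n): make x[p..n] sorted
    if p < n:
        sort(x, p, n - 1)                 # 50: the recursive call SORT(p, n-1)
        insert(x, p, n)                   # 100: reachieve the overall goal

x = [4, 2, 5, 1, 3]
sort(x, 0, len(x) - 1)
print(x)                                  # [1, 2, 3, 4, 5]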
D.2 Quicksort
This algorithm is generated when the goal for SORT(p, n) is split with the value of m left undetermined, giving a partial program with the goal FORALL{i = p + 1 ,m} x(i-1) < x(i) protected over 50 to 100.
The goal FORALL{i = m + 1 ,n} x(i-1) < x(i) is still to be achieved at 100. This goal also matches the
original program goal call with p = m + 1. The passback pair of SORT(m + 1 ,n) must be determined to test
if it interferes with the protected goal.
So far, the only program step is a recursive call to itself. Thus, the passback pair can only be estimated at
this stage. The validity will be checked when the synthesis is complete. Assuming that the only primitive
available is SWAP(i ,j) which interchanges x(i) and x(j), the passback pair for SORT(p ,n) can be estimated as
(Follett [1980 b])
The insertion of the recursive function call, SORT(m + 1, n) can be shown to interfere with the protected goal
in a manner similar to that described for SORT2 in D.1.
There are again several ways of solving this problem
i) The overall goal may be reachieved at this point. This leads to a variety of inefficient merge sorts.
ii) The remaining goal may be further split into
The second subgoal may be achieved with a recursive function call without altering the protected goal.
The first subgoal may be achieved using the method described in i) leading to merge sorts, or passed back
over the two recursive functions calls, leading to partition sorts, similar to that described in iii)
iii) The remaining goal may be passed back over SORT(p ,m), leading to partition sorts. This option will be
investigated below.
is passed back over SORT(p ,m), using the above passback pair, giving the goal to be achieved before the
SORT(p ,m) as
The first conjunction generates a new subroutine, called here PART(p ,n). This subroutine determines
the value for m, as m is as yet undetermined. As m will be altered, the SORT routine must be altered to
include m as a parameter, effectively making m a local variable. The synthesis of PART(p ,n) is given
separately in the next section. The SORT function will be completed here assuming PART, but in fact
PROSYN will recursively solve the subgoal PART first. The effect of PART is to swap the values of x(i), and
to alter the value of m, but as the values of x(i) and m are not protected at step 25, the subroutine is inserted,
leading to the program:
with the goal FORALL{i = m + 2 ,n} x(i-1) < x(i) still to be achieved at 50.
The goal can be achieved by inserting the recursive function call SORT(m + 1 ,n , 0) at 30. However
the side effects must be considered. The goal (given above as G2) is altered by SORT(m + 1 , n, 0) using
the passback pair for SORT given earlier, to
The altered goal is identical to the original goal. Thus the recursive function call SORT(m + 1 ,n ,0) can be
inserted at 30, completing the sorting program.
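The partition-style program described here can be sketched as follows. A conventional in-place partition around the pivot y = x(n) stands in for the synthesized PART of the next section, and the pivot is pinned between the two parts so that both recursive calls shrink; the chapter's derivation obtains termination from its induction argument instead and keeps m as an extra parameter of SORT.

def sort(x, p, n):
    # SORT(p, n): partition x[p..n], then sort the two parts recursively
    if p >= n:
        return
    m = part(x, p, n)        # PART determines m with x[p..m] <= y <= x[m+1..n]
    sort(x, p, m)            # the recursive call SORT(p, m)
    sort(x, m + 2, n)        # the call SORT(m+1, n), with the pivot at m+1 excluded

def part(x, p, n):
    # a conventional stand-in for PART(p, n); the pivot y is chosen as x(n)
    y = x[n]
    m = p - 1
    for i in range(p, n):
        if x[i] <= y:
            m += 1
            x[m], x[i] = x[i], x[m]
    x[m + 1], x[n] = x[n], x[m + 1]      # place the pivot between the two parts
    return m

x = [3, 7, 1, 4, 6, 2, 5]
sort(x, 0, len(x) - 1)
print(x)                                  # [1, 2, 3, 4, 5, 6, 7]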
D.3 PART
This subroutine is synthesized in response to a subgoal in the SORT routine. It is actually synthesized
during the synthesis of the SORT routine, but will be described separately to make the synthesis easier to fol-
low.
The goal that needs to be achieved is
where m is an as yet unprotected variable. This means that the value of m may be altered by PART.
The goal can be restated as there exists a y such that
(Note that m may be p - 1 or n, meaning that either of the FORALL ranges may be null. This allows y to be
greater than all the x values, or less than all the x values.)
The EXISTS allows induction on the value of m. Let the induction set considered be split as (p - 1, n - 1)
and (n, n). The null condition is first inserted. If p - 1 = n then m exists, and must be n. So the null condition
added is
IF p - 1 = n THEN m := n.
The goal for EXISTS may be split into the sets (p - 1 , n - 1) and (n ,n) giving the goal
The first disjunct is attempted first. As the goal is a FORALL, induction occurs. The induction set chosen may
be (p ,p) and (p + 1 ,n). The first disjunct is split as
The first conjunct is attempted first. The validity cannot be determined, so a branch is synthesized. If the con-
junct is true, a recursive call achieves the remainder of the conjunct, giving the partial program
The alternative branch of the worldsplit is the next disjunct. This is the goal
However the disjunct does not match the original goal for PART as n in the original goal must match both n
and n-1 in the current goal. The disjunct can be further split into the sets (p-1 , p-1) and (p ,n-1) giving the
goal
The first disjunct is attempted. The goal is split into the sets (p ,n-1) and (n ,n), giving the goals required as
The first goal is attempted first. As this cannot be determined, a branch is synthesized, giving the resultant
program:
The goal matches the recursive function call except for the goals x(p) < y AND y < x(n). These are attempted
first. Both of these goals are known to be false from the branch conditions, and neither of these goals can be
achieved independently, as x may only be permuted. However, these can be achieved together, still retaining
the overall x values, by using SWAP(p ,n).
The recursive function call then matches the original function call, giving the completed program.
The only undetermined fact is the choice of y. This requires
a) choosing the value for y (any is sufficient, as otherwise the synthesis would have chosen one)
b) adding this to the parameter list of the procedure PART. Choose y to be x(n).
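The recursive shape of PART that these world-splits suggest can be rendered roughly as follows; y is simply passed as the extra parameter chosen in b), indices are inclusive, and the rendering is an illustration rather than the synthesized text. Using it inside SORT additionally requires attending to the degenerate results m = p - 1 and m = n noted above; that bookkeeping is omitted here.

def part(x, p, n, y):
    # permute x[p..n] and return m with x[p..m] <= y and y <= x[m+1..n];
    # either range may be empty, i.e. m may be p-1 or n
    if p > n:                          # the null condition IF p - 1 = n THEN m := n
        return n
    if x[p] <= y:                      # x(p) <= y: x[p] already lies in the low part
        return part(x, p + 1, n, y)
    if y <= x[n]:                      # y <= x(n): x[n] already lies in the high part
        return part(x, p, n - 1, y)
    x[p], x[n] = x[n], x[p]            # otherwise SWAP(p, n) settles both ends at once
    return part(x, p + 1, n - 1, y)

x = [5, 2, 7, 1, 6, 3]
m = part(x, 0, len(x) - 1, x[-1])      # choose y to be x(n), as in b)
print(m, x)                            # 1 [1, 2, 7, 5, 6, 3]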
E. Conclusion
The method illustrated here combines program synthesis with program analysis. Methods used in program
verification, such as the use of invariants and predicate transformers (Manna [1974]), are utilized in program
analysis, and generalized to a form more suitable for combining program analysis with synthesis. The
generalization, called the passback pair, allows flexibility in the amount of analysis required without affecting
the correctness of the resulting program.
The program synthesis is based on passing back goals, and protecting relations required for verification.
These methods are based on those found in Manna and Waldinger [1977 a,b] and Waldinger [1977]. They
have been extended for use in generating recursive programs in conjunction with the passback pair. The syn-
thesis is driven by a set of strategies corresponding to programming knowledge. A wide variety of programs
have been synthesized by the automatic programming system, PROSYN, based on the above principles, as
described in the introduction. The above two examples of sorting algorithms show how the mechanisms work,
and their flexibility in the synthesis process.
References
Darlington [1977]
J. Darlington, "A synthesis of several sorting algorithms," DAI Research Report No. 23A, Univ. of Edinburgh, 1977.
Follett [1980a]
R. Follett, "Synthesizing recursive functions with side effects," Artificial Intelligence, Vol. 13, No. 3, May 1980, pp. 175-200.
Follett [1980b]
R. Follett, "Automatic program synthesis," Ph.D. Thesis, Univ. of NSW, Australia.
Manna [1974]
Z. Manna, "Mathematical theory of computation," McGraw-Hill Book Co., 1974.
Sussman [1975]
G. Sussman, "A computer model of skill acquisition," American Elsevier Publishing Co., New York, 1975.
Tate [1975]
A. Tate, "Interacting goals and their use," Fourth Int. Joint Conf. on A.I., 1975, pp. 215-218.
Waldinger [1977]
R. Waldinger, "Achieving several goals simultaneously," Machine Intelligence 8, E.W. Elcock and D. Michie (eds.), Wiley and Sons, New York, 1977, pp. 94-136.
CHAPTER 5
PROBLEMATIC FEATURES
Zohar Manna
Stanford University and Weizmann Institute
Richard Waldinger
SRI International
Abstract
To construct a program in a given programming language, one must describe the features of that
language. Certain ‘‘problematic’’ features, such as data structure manipulation and procedure call mechanisms,
have been found to be difficult to describe by conventional techniques. A unified conceptual framework, based
on a ‘“‘situational calculus’’, has been developed for describing such problematic features. This framework is
compatible with contemporary theorem-proving techniques, and is suitable for incorporation in automatic program synthesis, transformation, and verification systems.
This research was supported in part by the National Science Foundation under Grants MCS-78-02591 and MCS-79-09495, in part by the Office of Naval Research under Contracts N00014-75-C-0816 and N00014-76-C-0687, and in part by the Air Force Office of Scientific Research under Contract F30602-78-0099.
The authors' addresses: Z. Manna, Department of Computer Science, Stanford University, Stanford, CA 94305; R. Waldinger, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025.
A fuller description of the material in this chapter appears in Manna and Waldinger [1981].
A. Introduction
The most widely accepted approach to program verification and to the synthesis of programs with side
effects has been the one described in Floyd’s [1967] paper and formalized by Hoare [1969]. Hoare’s formaliza-
tion requires that each construct of the programming language be described by an axiom or rule, which defines
how the construct alters the truth of an arbitrary assertion. Certain features of programming languages have
been found to be easier to describe in this way than others:
Programs with only simple assignment statements and while statements can be described adequately.
Programs with arrays are more intractable, but can be treated if the array operations are rewritten in
terms of McCarthy's [1962] assign and contents functions.
Operations on other data structures, such as pointers, lists, and records, can be handled only if spe-
cial restrictions are imposed on the language.
Even the simple assignment statement fails to satisfy the usual Hoare assignment axiom if included
in a programming language with other problematic features.
Certain combinations of features have been shown (Clarke [1977]) to be impossible to describe at
all by the Floyd-Hoare technique.
It has been argued (e.g., in London et al. [1978]) that features of programming languages whose seman-
tics are difficult to describe by the Floyd-Hoare technique are also difficult for people to understand and use
consistently. For this reason, a number of programming languages have been designed with the intention of
eliminating or restricting such ‘‘problematic’’ features. Others have objected (e.g., Hoare [1975], Knuth
[1974], deMillo et al. [1977]) that the disciplined use of such ‘‘unverifiable”’ programming features can aid the
clear and direct representation of a desired algorithm, while their removal may force the programmer into
increasingly obscure circumlocutions.
We have recently developed a conceptual framework capable of describing all of these problematic pro-
gramming features. This framework is suitable to serve as a basis for the implementation of verification systems, as well as synthesis and transformation systems. We do not argue that the problematic features should necessarily be included in programming languages without restriction, but we intend that, if a language designer wishes to use some combination of features, no obstacle should be imposed by verification concerns.
The approach we employ is a "situational calculus", in which we refer explicitly to the states of the computation. In a given state s, the evaluation of an expression e of the programming language produces a new state s;e. The meaning of the expression can then be defined by axioms that relate the characteristics of the new state with those of the original state.
This formalism is quite distinct from that of Hoare, in which no explicit reference is made to states. In this respect, our approach is closer to those adopted by McCarthy [1964] and Burstall [1969] for specifying the semantics of ALGOL-60 subsets, and by Green [1969] for describing robot operations.
To describe the characteristics of the states of a computation, we introduce ‘‘situational operators,’ func-
tions and predicates whose values depend on the state. In defining these operators, we distinguish between the
expressions of the programming language, the storage locations of the machine, and the abstract data objects of
the program’s domain. The precision of this descriptive apparatus enables us to model the effects of
programming-language constructs in full detail. We can describe and compare various implementations of the
same programming language, or we can ignore the details of implementation if we prefer.
Once we have succeeded in describing the constructs of a programming language, we can use that descrip-
tion in proving that programs in the language satisfy a given specification. The situational operators can be
used not only to describe the constructs of the language but also to represent the specifications of a program.
Indeed, they are more expressive for this purpose than the conventional input/output assertions, because they
enable us to refer in a single sentence to different states of the computation. For example, it is possible to say
directly how the final value of an identifier relates to its initial or intermediate values. To show that a program
satisfies such a specification, we then prove a corresponding theorem in situational calculus.
The situational-calculus approach can be applied not only to prove that a single program satisfies given
properties, but also to prove that an entire class of programs, or a programming language, satisfies given proper-
ties. For example, we can state and prove that the ‘‘aliasing’’ phenomenon, in which two identifiers are
regarded as different names for the same variable, cannot be created in languages which satisfy certain con-
straints.
Although the approach has been devised to extend to languages for which the Hoare formalism breaks
down, it can also be used to show that the Hoare formalism does actually apply to suitably restricted program-
ming languages. For example, we can show that the Hoare assignment axiom (which fails to apply to most
languages used in practice) is true and can be proved as a situational-calculus theorem for languages in which
the problematic features have been omitted.
Up to now, we have been discussing the use of a situational-calculus approach for proving properties of
given programs and classes of programs. Historically, however, we were led to this approach in developing a
method for program synthesis, i.e., the systematic construction of a program to meet given specifications. We
have described in Chapter 2 a deductive technique for the synthesis of applicative programs, which yield an out-
put but produce no side effects. We can now construct programs that may produce side effects by applying the
same deductive technique within the situational calculus. More precisely, to construct a program to achieve a
desired condition, we prove the existence of a state in which the condition is true. The proof is constrained to
be ‘‘constructive,’’ so that a program to achieve the desired condition can then be extracted from the proof.
The same deductive technique can be applied to the task of transforming a given program, generally to
improve its efficiency. Often, the performance of a program can be augmented, at the expense of clarity, by
applying transformations that introduce indirect side effects. This transformation process can be conducted
within a situational-calculus deductive system, to ensure that the original purpose of the given program is
preserved.
The approach of this work is similar in intent and scope to that of denotational semantics, but it relies on
a simpler mathematical framework. We do not use functions of higher type, lambda expressions, or fixed
points. Situational calculus can be embedded comfortably in a first-order logic to which the well-developed bar-
rage of mechanical theorem proving techniques, such as the unification algorithm, can be applied. In particular,
no special difficulty is presented by the existential quantifier, which is outside of the scope of denotational
semantics based systems (e.g., Gordon et al. [1979]), but which is valuable for program verification and crucial
for program synthesis.
Because of space limitations, it is impossible to present the technical details of our situational-calculus
approach here. We must be content with outlining some of the features of programming languages that have
caused problems in the past, to indicate sources of the difficulty, and to give some hint of the conceptual
framework with which we approach these problems. For a more detailed discussion, see the full version of this
paper (Manna and Waldinger [1981]).
We begin with one of the less problematic features—the simple assignment statement.
B. Assignment to Identifiers
By the simple assignment statement we mean one of the form
x ← t
where x is an identifier and t is an expression in the programming language. The Hoare axiom for such an
assignment may be expressed by stating that if the assertion P ← (x ← t) holds before executing the assignment x ← t, and if the execution terminates, then the assertion P holds afterward. Here, P ← (x ← t) is the result of replacing all free occurrences of x in P by t. The rationale for this rule is that the value of x after executing the assignment x ← t will be the same as the value of t before; therefore, anything that can be said about t before the execution can be said about x afterwards.
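In the Hoare-triple notation that has since become standard, with P[t/x] written for the result of substituting t for the free occurrences of x in P, the axiom reads:

\{\, P[t/x] \,\} \;\; x := t \;\; \{\, P \,\}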
However, the above reasoning is faulty, and only applies if certain restrictions are applied to the expres-
sion t, the assertion P, and the situation in which the assignment takes place. Let us examine some of these
restrictions:
• The expression t must be static, in the sense that its evaluation must not itself produce side effects. For
example, in the assignment
x ← x + (y ← y + 1)
the evaluation of t, i.e., x + (y ← y + 1), has the side effect of altering the value of the identifier y. Such
assignments are legal in the ALGOL dialects and in LISP. If we take the assertion P to be y = 0, then according
to the Hoare axiom, we have
{y = 0}  x ← x + (y ← y + 1)  {y = 0}.
(Note that the assertion (y = 0) ← (x ← t) reduces to y = 0 because x does not occur in y = 0.) However, this
sentence is false, because if y is 0 before executing this assignment, then y will be 1, not 0, afterwards.
Similarly, the assignment
x ← f(x),
where f is a procedure that has the side effect of increasing the value of the global identifier y by 1, violates
the instance of the Hoare axiom
{y = 0}  x ← f(x)  {y = 0}.
• The assertion P may not refer to the value of an identifier except by mentioning the identifier explicitly.
For example, suppose P is the assertion that precisely one identifier has the value 2. Then
{P}  x ← 0  {P}
is an instance of the Hoare axiom. However, if x is the only identifier whose value is 2 before executing the
assignment, then P will become false afterwards. Here, the axiom broke down because P referred to the value
of x without mentioning x itself.
• In the situation in which the assignment to the identifier x takes place, there must be no way to refer to x
indirectly, in terms of other identifiers. For example, suppose x and y are "aliases," i.e., they can be
regarded as different names for the same variable. Then changing the value of x will also change the value of
y. If P is the condition y = 0, then the instance of the Hoare axiom
{y = 0}  x ← 1  {y = 0}
is false, because after executing the assignment, the value of y will be 1, not 0.
In practice, the aliasing phenomenon can arise in languages that admit procedure calls. For example, sup-
pose we have a procedure
f(x,y) <= x ← 1
whose parameters are passed by a call-by-reference mechanism. In other words, in executing a procedure call
f(u,v), where u and v are identifiers, the identifiers x and u become aliases, and the identifiers y and v
become aliases. In executing the procedure call f(u,u), all three identifiers x, y, and u become aliases, so altering the value of x will alter the value of y as well. Thus, the assignment statement x ← 1 that occurs in the
body of the procedure f(x,y) will violate the instance of the Hoare axiom
{y = 0}  x ← 1  {y = 0}.
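The failure modes above (side-effecting expressions and aliasing) can also be reproduced in a small store-passing model. The sketch below is an invented illustration, with identifiers bound to named locations; it is not a rendering of the situational calculus.

# identifiers are bound to locations; aliasing is two identifiers sharing one
def run_assignment(bindings, store):
    # execute x <- 1 and report whether the assertion y = 0 still holds
    store = dict(store)
    store[bindings["x"]] = 1
    return store[bindings["y"]] == 0

no_alias = {"x": "loc1", "y": "loc2"}
aliased  = {"x": "loc0", "y": "loc0"}          # x and y name the same location
store    = {"loc0": 0, "loc1": 0, "loc2": 0}
print(run_assignment(no_alias, store))         # True: the Hoare instance holds
print(run_assignment(aliased, store))          # False: {y=0} x <- 1 {y=0} fails

# a side-effecting right-hand side breaks the axiom in the same way:
def eval_t(store):                             # evaluates x + (y <- y + 1)
    store["y"] = store["y"] + 1
    return store["x"] + store["y"]

s = {"x": 0, "y": 0}
s["x"] = eval_t(s)                             # x <- t
print(s["y"] == 0)                             # False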
C. Array Assignments
The direct translation of the Hoare assignment axiom to array assignments is
{P ← (a[x] ← t)}  a[x] ← t  {P}.
This sentence is false even for straightforward expressions t and assertions P, and for the simplest situations.
For example, the sentence
{a[y] = 0}  a[x] ← t  {a[y] = 0}
is an instance of the above sentence because a[x] does not occur at all in the assertion a[y] = 0; but, of course,
the sentence is false if x and y have the same values. The problem is that, while it is exceptional for two
identifiers x and y to be aliases, it is commonplace for two array entries a[x] and a[y] to be alternate names
for the same entity.
The difficulty has been approached (McCarthy [1962]) by regarding the entire array as a single entity, so
that assigning to any of the array’s entries produces a new array. More precisely, we regard the entire array a
as an ordinary identifier, we treat an array assignment a[x] ← t as an abbreviation for the simple assignment
a ← assign(a, x, t),
and we treat each array reference a[y] as an abbreviation for the term
contents(a, y).
The assign and contents functions are then assumed to satisfy the properties
contents(assign(a,x,t),y) =t if x=y
and
contents(assign(a,x,t),y) = contents(a,y) if x ≠ y.
Programs involving arrays can then be treated by the Hoare axiom for simple identifier assignments.
Thus, the previous false sentence becomes
{contents(a,y) = 0}  a ← assign(a,x,t)  {contents(a,y) = 0}.
This sentence is not an instance of the Hoare assignment axiom, because the assertion contents(a,y) = 0 does
contain an occurrence of the identifier a. The true instance of the Hoare axiom in this case is
{contents(assign(a,x,t),y) = 0}  a ← assign(a,x,t)  {contents(a,y) = 0}.
Although this solution still suffers from the limitations associated with the simple assignment axiom, it resolves the special difficulties arising from the introduction of arrays.
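The assign/contents treatment amounts to viewing the array as a single functional value; the toy Python rendering below (a dictionary standing in for the array) checks the two McCarthy properties on sample data.

def assign(a, x, t):
    # a new array value, equal to a except that entry x now holds t
    b = dict(a)
    b[x] = t
    return b

def contents(a, y):
    return a[y]

a = {0: "p", 1: "q"}
assert contents(assign(a, 0, "r"), 0) == "r"                 # the case x = y
assert contents(assign(a, 0, "r"), 1) == contents(a, 1)      # the case x != y
# array assignment a[x] <- t is then the ordinary assignment a <- assign(a, x, t)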
D. Pointer Assignment
To describe the pointer mechanism, let us introduce some terminology. If an identifier is declared in a
program, then there is some location bound to that identifier; we can regard the location as a cell in the machine
memory. If two identifiers are aliases, they are bound to the same location. A location may contain data, or it
may store (the address of) another location; we thus distinguish between data locations and storage locations. A
pointer is a storage location which stores (the address of) another storage location. There are many notations
for pointers in different programming languages; ours is typical but is not actually identical to any of these.
Suppose we execute an assignment of the form
x ← | y,
where x and y are both identifiers. Here, the notation | y means the location bound to the identifier y. The
result of this assignment is that the location bound to y is stored in the location bound to x. The
configuration produced may be represented by the following diagram:
Figure 2.1
Here, α and β are the locations bound to x and y, respectively; γ is the location stored in β.
If we subsequently execute a simple assignment statement
y ← t,
where t is an expression, we alter the contents of the location β to which y is bound. The location β will then
store the location γ yielded by the evaluation of t. The new configuration can be represented by the following
diagram:
diagram:
Figure 2.2
We have remarked that such a configuration can easily lead to violations of the Hoare assignment axiom: a
simple assignment to y can alter the truth of an assertion about x.
Suppose instead we execute the special pointer assignment
| x ← t.
The notation | x means that the location altered by the assignment is not the location α bound to x but rather
the location β stored in α. In other words, the effect of the above pointer assignment is precisely the same as
that of the simple assignment y ← t, and results in the same configuration depicted above.
A direct adaptation of the Hoare axiom to the pointer assignment would be
{P ← (| x ← t)}  | x ← t  {P}.
The sentence
{y = 0}  | x ← 1  {y = 0}
is an instance of this axiom, because | x does not occur in the assertion y = 0. However, as we have seen, if x
"points to" y the assignment | x ← 1 can set the value of y to 1. In short, the simple adaptation of the
Hoare assignment axiom fails to describe the action of the pointer assignment, because the assignment can alter
the value of an identifier not mentioned explicitly.
The assign/contents technique for arrays has been extended (e.g. see Cartwright et al. [1978]) to pointers
by regarding all the identifiers in the program as entries in a single array v, which is indexed not by integers
but by identifiers. These array operations can then be treated as simple assignments in terms of the assign and
contents functions, and are correctly described by the Hoare simple assignment axiom and the two McCarthy
axioms for assign and contents.
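One way to picture this extension: a single mapping v from identifiers to values serves as the array, and a pointer is just an identifier whose value is another identifier. The sketch is an illustration of the idea, not of the cited constructions.

def assign(v, i, t):
    w = dict(v); w[i] = t; return w

def contents(v, i):
    return v[i]

v = {"x": "y", "y": 0}          # x "points to" y: the value of x is the identifier y

# the simple assignment  x <- 1   becomes  v <- assign(v, 'x', 1)
v1 = assign(v, "x", 1)

# the pointer assignment  |x <- 1  updates the entry named by the value of x:
v2 = assign(v, contents(v, "x"), 1)
print(contents(v2, "y"))        # 1: y changed although it was never mentioned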
Figure 2.3
Figure 2.4
LISP provides two functions, which we will call left and right, for accessing the corresponding descendents
of a binary-tree location. Suppose t is an expression whose evaluation yields a binary-tree location α; in other
words, α represents the value of t. Then the evaluation of left(t) and right(t) yield the left and right descendents of α, respectively.
LISP also provides two operations for altering binary trees: the rplaca operation, which we denote by
left(e) ← t,
and the rplacd operation, which we denote by
right(e) ← t,
where e and t are any expressions. If the evaluation of e yields a binary-tree location α, and if the evaluation
of t then yields a location β, then the rplaca operation left(e) ← t will cause β to become the new left descendent of α. The rplacd operation behaves analogously.
The problem in describing the rplaca and rplacd operations is precisely the same as for the pointer assign-
ment: a rplaca operation on one binary tree can alter the value of another without mentioning it explicitly. For
example, suppose that x and y are identifiers associated with binary-tree locations α and β, respectively, in the
following configuration:
Figure 2.5
Figure 2.6
where δ is the location yielded by the evaluation of t. Note that the subsequent evaluation of the expression
left(left(x)) yields the location δ, not the location γ1. In other words, the value of x may have been changed
by the rplaca operation, even though the operation was applied to y, not to x. Similarly, if the altered location
had any other ancestors before the execution of the rplaca, then their values could also have been affected by the operation.
It seems that to model the effects of such an assignment completely, we must know all of the ancestors of the
altered location.
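The phenomenon is easy to reproduce with shared mutable cells; in the sketch below plain Python two-element lists stand in for binary-tree locations, and rplaca is a direct field update.

def cons(left_part, right_part):
    return [left_part, right_part]      # a mutable two-field cell

def left(cell):  return cell[0]
def right(cell): return cell[1]

def rplaca(cell, t):                    # the operation written left(e) <- t above
    cell[0] = t

shared = cons("g1", "g2")
y = shared                              # y is bound to the shared cell
x = cons(shared, "e")                   # left(x) is that very same cell

rplaca(y, "delta")                      # the operation is applied to y, not to x,
print(left(left(x)))                    # yet left(left(x)) now yields "delta"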
Many languages admit a more general form of tree structure called a record, in which a record location can
store several other locations. Binary trees can then be regarded as a special type of record. The same problems
that arise with trees clearly apply to records as well.
The assign/contents formalization of arrays has been extended to apply to tree and record structures by
Wegbreit et al. [1972], Cartwright et al. [1978], and Kowaltowski [1979]. Burstall represents the operations that
alter tree and record structures by introducing new functions to access the structures. For example, an rplaca
operation is said to create a new access function left’, which behaves like the left function after the execution
of the assignment.
F. Expressiveness of Specifications
Many of the difficulties that prevent us from describing the behavior of individual programming con-
structs with the Floyd/Hoare approach also obstruct our efforts to express the specifications that describe the
desired behavior of entire programs. The only mechanism for forming specifications in that approach is the pair
of input and output assertions. We have already encountered one weakness in the expressive power of such
assertions: there is no way to refer to an identifier without mentioning it explicitly. For example, we were
unable to deal with an assertion such as
not (alias(x,y))
as part of the program’s input assertions, and to describe the relation alias(x,y) by axioms or rules of inference.
However, this relation cannot be expressed in an assertion because x and y are meant to refer to locations, not
values. Thus, the relation will violate even the simple assignment axioms; e.g., the instance of the Hoare
axiom
{alias(x,y)}  z ← y  {alias(x,z)}
is false: if x and y are aliases, then assigning the value of y to z will not cause x and z to become aliases.
This shortcoming foils a plausible approach to retaining the Hoare formalism by forbidding aliasing to
occur in situations where it can lead to trouble. We certainly forbid such occurrences, but we cannot express
the condition we want to forbid as a Hoare assertion.
Another awkwardness in the Floyd/Hoare assertion mechanism as a specification device is its inability to
refer to more than one state in a single assertion. Thus, it is impossible in an output assertion to refer directly
to the initial or intermediate value of an identifier. For example, suppose we want to say that a program rev-
erses the values of two identifiers x and y. The traditional approach is to introduce a "ghost" input assertion
x = x0 and y = y0.
The purpose of the input assertion is merely to give names to the initial values of x and y. We must be careful, of course, that x0 and y0 are new identifiers that do not occur in the program.
The flaw in this solution is apparent if we attempt to use the above program not in isolation but as a seg-
ment of a larger program or as the body of a procedure. In this case, we would normally have to prove that the
initial assertion
is true when control enters the segment; but this is impossible, because x0 and y0 are new symbols that cannot
occur earlier in the program.
G. Procedures
We have already seen that procedure calls can cause aliasing to occur, which obstructs attempts to axioma-
tize the assignment statement; we have also seen how global side effects of procedure calls foil the assignment
statement axiomatization. Many other problems arise in describing the procedure call mechanism itself. Let us
consider only one of these difficulties: expressing how global identifiers of procedures are treated in languages
with static binding.
A global identifier of a procedure is one that occurs in the procedure’s body but that is not one of its
parameters. For example, consider the procedure f(x) declared by
Static binding is difficult to treat by a Hoare rule for a procedure call because it requires that we refer to
the binding the global identifier y had in a much earlier state, when the procedure f was first declared. In
the meantime, of course, y may have been redeclared.
* * * * *
It is our intention that the situational-calculus approach constitute a single conceptual framework capable
of describing all of these problematic programming-language features. This framework is compatible with con-
temporary theorem-proving techniques, and can be incorporated into systems for the synthesis, verification, and
transformation of computer programs.
References
Burstall [1969]
R.M. Burstall, "Formal description of program structure and semantics in first order logic," in Machine Intelligence 5, B. Meltzer and D. Michie (eds.), Edinburgh University Press, Edinburgh (1969), pp. 79-98.
Clarke [1977]
E.M. Clarke, Jr., "Programming language constructs for which it is impossible to obtain 'good' Hoare-like axiom systems," Proc. of the Fourth ACM Symp. on Principles of Programming Languages, Los Angeles, CA (Jan. 1977), pp. 10-20.
Floyd [1967]
R.W. Floyd, "Assigning meanings to programs," in the Proc. of the Symp. on Applied Mathematics, Vol. 19, J.T. Schwartz (ed.), Providence, RI (1967), pp. 19-32.
Green [1969]
C. Green, "Application of theorem proving to problem solving," in the Proc. of the International Joint Conf. on Artificial Intelligence, Washington, D.C. (May 1969), pp. 219-239.
Hoare [1969]
C.A.R. Hoare, "An axiomatic basis for computer programming," CACM, Vol. 12, No. 10 (1969), pp. 576-580, 583.
Hoare [1975]
C.A.R. Hoare, "Recursive data structures," Intl. Jour. of Computer and Information Sciences, Vol. 4, No. 2 (June 1975).
Knuth [1974]
D.E. Knuth, "Structured programming with 'go to' statements," Computing Surveys, Vol. 6, No. 4 (Dec. 1974), pp. 261-301.
Kowaltowski [1979]
T. Kowaltowski, "Data structures and correctness of programs," JACM, Vol. 26, No. 2 (April 1979), pp. 283-301.
McCarthy [1962]
J. McCarthy, "Towards a mathematical science of computation," in Information Processing, Proc. of IFIP Congress 62, North-Holland (1962).
McCarthy [1964]
J. McCarthy, "A formal description of a subset of ALGOL," Report AIM-24, Stanford University, Stanford, CA (Sept. 1964).
CHAPTER 6
AN INTERACTIVE TOOL
Anne Adam
Maître-Assistante at the University of Caen
14032 Caen cedex FRANCE
G.R. 22 of C.N.R.S.
Paul Gloess
International Fellow at Stanford Research Institute
333 Ravenswood Ave., Menlo Park
California 94025, U.S.A.
Jean-Pierre Laurent
Professor at the University of Chambery
B.P. M04 73011 Chambery, FRANCE
Abstract
An interactive system for understanding programs has been designed. This system provides information
about control structure and data flow. It also performs powerful semantic transformations that are checked for
validity. The system relies on previously implemented algorithms that apply to a graph representation of pro-
grams.
A. Introduction
A practical system, LAURA, has been previously implemented by A. Adam and J.P. Laurent [1980] and
is used at the University of Caen to debug student programs by comparing them with a program model.
LAURA relies on a theoretical work about graphs by A. Adam and J.P. Laurent [1978] and on related algo-
rithms.
We now intend to use these existing algorithms with a different purpose. The system we present in this
paper can be used interactively to understand the meaning of a program. Possible applications are debugging,
maintenance and improvement of readability.
The system provides information about control structure and data flow. It can also apply powerful
transformations to modify the control structure and sometimes extract formulae that express the program’s
meaning.
Since the system deals with a graph representation of programs which is independent of the source
language, it is not concerned with such syntactic transformations as those performed by some program editors
or manipulators. It is entirely devoted to powerful semantic transformations. It is also able to check their vali-
dity before applying them.
Understanding is a very subjective notion. Hence it is essential to have an interactive system. It is also
necessary to have a flexible and well-suited command language, the main features of which are described in this
paper.
Special commands allow the user to insert comments, or restore a previous state. They are not described
here but the reader should keep their existence in mind to better understand the usefulness of our tool.
Figure 1. The initial program: READ Z; I := 1; c: X := 1; d: Y := 1; then a loop whose body is X := X + 2; b: Y := Y + X; I := I + 1, repeated until t: Y > Z, followed by WRITE I.
Comments:
command (1): The system solves the recurrence equation X(j) = X(j-1) + 2 with j varying from 1 to I and
with X(0) = 1. The result is X(I) = 1 + 2*I.
command (2): The system finds that the value of X used in b is 1 + 2*I, and that it may replace X by
1 + 2*I in Y := Y + X without changing the meaning of the program. It does so.
commands (3) and (4): The system may remove the definitions of X since this variable is no longer used.
After execution of these first commands we obtain the graph of Fig. 2
Figure 2. The program after commands (1)-(4): the definitions of X have been removed and node b now reads Y := Y + 1 + 2*I; the rest of the program (READ Z; I := 1; d: Y := 1; the loop with I := I + 1 until t: Y > Z; WRITE I) is unchanged.
It is now possible to ask the system to deal with the variable Y. For that purpose we may use the two
commands:
Comments:
command (5): The system solves the recurrence equation Y(j) = Y(j-1) + 1 + 2*j with j varying from 1
to I and Y(0) = 1. The result is Y(I) = (1 + I)**2.
command (6): The system does not perform the substitution and issues a diagnostic, because the definition d
of Y is no longer valid in t since Y has been redefined in b. The system suggests using
definition b.
command (7): The system replaces Y in the final test t by its value (1 + I)**2. ("!SUBST Y * t" would have
had the same effect).
Then, considering the resulting program in Fig. 3, it is easy to understand that the final value of I is
SQRT(Z).
Figure 3. The final program: READ Z; I := 1; Y := 1; then a loop whose body is Y := (1 + I)**2; I := I + 1, repeated until (1 + I)**2 > Z, followed by WRITE I.
It should be emphasized that the meaning of the program, although it was a short one, was not obvious at
all from the outset. However, the meaning was easily discovered by issuing a few commands.
In larger programs, other commands, that deal more specifically with the control structure, will be useful
to break the complexity.
1 (a,b,c)
2 (e,a,b,c,d)
1 Intuitively, an oval (a "fuseau" in French) is a piece of program with only one entry point and only one exit point. It is not just a simple succession of assignments; it may have a complex structure, since the whole program itself is an oval.
2 Some commands provide information about the program without modifying it: their names start with "?". The others modify the program: they start with "!".
Note that the system has to look for subgraphs that correspond to the classical programming notion of
loops, and not only for simple circuits. According to our system a loop is a union of circuits with same entry
and exit nodes (see Fig. 5).
?LOOPS a returns
1 (a,b,d,e,c)
(a,b,d,e)
(a,c,d,e)
One can also obtain the entry nodes and the exit nodes of a loop. For example, with the graph of Fig. 4,
?ENTRY (a,1) returns a and b; ?ENTRY (a,2) returns e. ?EXIT (a,1) returns c; ?EXIT (a,2) returns d.
In a programming language with GOTO statements, the loops often have arbitrary structures. For under-
standing, it is useful to standardize the structure of loops. By splitting some nodes, the system can reduce the
number of entry points of a loop. It can also standardize the structure according to the user’s preference.
The command !ONE-ENTRY-LOOP (a,n) returns a loop with only one entry node. It is useful for
separating intermixed loops. For example the graph of Fig. 4 may be transformed into that of Fig. 6. In this
graph, the loop (a,b,c) now has one entry point, which makes the structure more obvious.
The command !WHILE-LOOP (a,n) returns a loop in which the entry point is also an exit point. The
command !DO-LOOP (a,n) returns a loop in which an exit point is directly connected to the entry point by one
128 ADAM, GLOESS, LAURENT
arc. After execution of one of these commands, the system can generate a WHILE statement (resp. a DO
statement) corresponding to the new loop.
Figure: a loop whose body updates both P := P*X + A(I) and Q := Q*X + (N-I+1)*A(I) is restructured into two standardized loops over I, one updating P and one updating Q.
The two resulting loops may be transformed afterwards into two formulae, using the RECUR command
described below.
D. Data Flow
A variable X may be defined at different nodes, for instance by READ X at one node and by X=Y+Z at
another one. It is used by nodes such as WRITE X, or Y=X+1. X may eventually be used and redefined in
the same node, e.g., X=X+Y.
Studying the various definitions of a variable, as well as the range of each definition, is essential to under-
standing the meaning(s) of the variable. Graph representation of the control structure is very convenient for
dealing with this kind of problem.
?RANGE V a: if a defines V, returns the set of nodes that may use the value of V defined by a. (Note that if
this set is empty, a can be deleted using !REMOVE a.)
The renaming of variables increases the readability of a program and may also
help pinpoint a bug.
D.3 Substitutions
Replacing occurrences of a variable by its definition often facilitates understanding. The system can determine
whether a substitution is possible or not, and execute the following commands:
!SUBST V d *: definition d is substituted for variable V wherever possible. The set of altered nodes is
returned.
!SUBST V * u: looks for a definition d such that !SUBST V d u may be performed. If d exists, the substitution
is performed.
!SUBST * * *: all possible substitutions in the program are performed. The set of altered nodes is returned.
The system may simplify arithmetic expressions, which is often useful after a substitution.
Substitutions may exhibit the meaning of a sequence of computations. This is illustrated by the example
of Fig. 12.
[Fig. 12: a program fragment computing X := B**2 - 4*A*C and U := -B/(2*A); by !SUBST * * a and !SUBST * * b, the assignment in node a becomes X1 := -B/(2*A) - SQRT(B**2 - 4*A*C)/(2*A)]
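To make the effect of such a command concrete, here is a minimal sketch, in Haskell rather than the authors' system, of what a substitution does on arithmetic expressions; the Expr type and the subst function are illustrative assumptions, and any arithmetic simplification would be applied afterwards.

-- Hedged sketch: replacing a variable by its defining expression, as
-- !SUBST does on the nodes of the program graph.
data Expr = Var String | Const Double | Neg Expr | Sqrt Expr
          | Add Expr Expr | Sub Expr Expr | Mul Expr Expr | Div Expr Expr

subst :: String -> Expr -> Expr -> Expr
subst v d (Var w) | v == w = d                -- replace the variable by its definition
subst _ _ e@(Var _)        = e
subst _ _ e@(Const _)      = e
subst v d (Neg e)          = Neg (subst v d e)
subst v d (Sqrt e)         = Sqrt (subst v d e)
subst v d (Add a b)        = Add (subst v d a) (subst v d b)
subst v d (Sub a b)        = Sub (subst v d a) (subst v d b)
subst v d (Mul a b)        = Mul (subst v d a) (subst v d b)
subst v d (Div a b)        = Div (subst v d a) (subst v d b)

For instance, substituting the definition X := B*B - 4*A*C into X1 := -B/(2*A) - SQRT(X)/(2*A) yields the explicit root formula of Fig. 12.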
[Fig. 13: the general loop form — I := 1; S := S0; repeat S := A(I)*S + B(I); I := I+1; until I > N]

f(N) := S0 * ∏_{j=1..N} A(j) + ∑_{j=1..N-1} ( B(j) * ∏_{k=j+1..N} A(k) ) + B(N)
A special command, !RECUR e, looks for the innermost loop containing e, and if there is in e a recurrence
equation that is linear and of the first order, solves it. Two cases may then occur:
— if the loop has the general form of Fig. 13, the loop is replaced by the assignment S:=f(N). Two
examples of such extractions of formulae are given in Fig. 14.
[Fig. 14: two examples of formula extraction — a loop computing P := P*X + A(I) starting from P := A(0) (Horner's scheme), and a loop computing P := P + A(I)*X**(N-I), each replaced by the corresponding closed formula]
— if the loop has a more complex form (for instance the exit test is not a comparison between the index
and one integer value), the node a is replaced inside the loop by the assignment S := f(I). The introductory
example demonstrates the power of this particular transformation.
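As an illustration of what !RECUR produces, the following sketch (a plain Haskell rendering of ours, not the authors' system; the names loopForm and closedForm are assumptions) runs the loop of Fig. 13 and evaluates the extracted formula f(N); for N ≥ 1 the two agree.

loopForm :: Double -> (Int -> Double) -> (Int -> Double) -> Int -> Double
loopForm s0 a b n = go s0 1
  where
    go s i | i > n     = s
           | otherwise = go (a i * s + b i) (i + 1)   -- S := A(I)*S + B(I); I := I+1

closedForm :: Double -> (Int -> Double) -> (Int -> Double) -> Int -> Double
closedForm s0 a b n =
      s0 * product [a j | j <- [1 .. n]]
    + sum [b j * product [a k | k <- [j + 1 .. n]] | j <- [1 .. n - 1]]
    + b n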
Conclusion
Text editors that deal with character strings only are widely used to type and modify programs but they
cannot help in understanding them. More specific systems such as the INTERLISP analyzer MASTERSCOPE (e.g.
W. Teitelman and Kaplan [1978]) or the program manipulator MENTOR (e.g. V. Donzeau-Gouge et al.
[1979]) have some knowledge of the programming language syntax. MENTOR in particular is able to perform
complex syntactic transformations, which may be a step towards understanding.
However, a specific system for understanding programs must have some semantic knowledge of the pro-
gramming language. This is the case for our system which uses graph and data flow transformations that
preserve the outputs. It is actually a tool for discovering the meaning of a program.
Interestingly, most of the transformations performed by our system are deoptimizing transformations.
This is quite natural since the most optimized programs are generally the most difficult to understand. Under-
standing and optimization are opposite goals. It is then difficult to compare our system with those which
transform programs in order to optimize them, such as the Burstall and Darlington [1978] system.
Our system may be used to debug a program, by checking whether sequences agree with their comments.
It may be used to increase readability by adding meaningful comments. In particular, it could also be an
efficient tool for maintaining programs: maintenance often requires understanding somebody else’s procedures,
which is always hard. It also requires modifying specific parts of a program without introducing bugs anywhere
else. The ability of our system to locate ovals and to provide information about their external variables may be
very helpful for that.
We now intend to implement the system and experiment with it in order to define good strategies for users.
References
Gerhart [1975]
S.L. Gerhart, "Knowledge about programs: a model and case study," IEEE Conf. on Reliable Software (1975).
King [1976]
J.C. King, "Symbolic execution and program testing," CACM, Vol. 19, No. 7 (July 1976).
Loveman [1977]
D. Loveman, "Program improvement by source to source transformation," JACM, Vol. 24, No. 1 (January 1977).
Rosen [1975]
B.K. Rosen, "Data flow analysis for recursive PL/I programs," IBM Res. Report RC 5211 (January 1975).
Rosen [1976]
B.K. Rosen, "Data flow analysis for procedural programs," IBM Res. Report RC 5948 (April 1976).
Ruth [1976]
G.R. Ruth, "Intelligent program analysis," Artificial Intelligence, Vol. 7, No. 1 (1976).
Tarjan [1972]
R.E. Tarjan, "Depth-first search and linear graph algorithms," SIAM Journal on Computing (June 1972).
Waters [1978]
R.C. Waters, "Automatic analysis of the logical structure of programs," MIT Report TR-492 (December 1978).
Wertz [1978]
H. Wertz, "Un système de compréhension, d'amélioration et de correction de programmes incorrects," Thèse de 3ème cycle, Paris VI (1978).
SECTION III
CHAPTER 7
John Darlington
Department of Computing
Imperial College, London
A. Introduction
It is a recognized programming discipline to first approach a task at the appropriate level employing high
level data types and then to design appropriate structures that will represent these data types efficiently. For
example, priority queues can be efficiently implemented as binomial trees, Vuillemin [1976], or trees as vectors,
Floyd [1964]. In this chapter we would like to show how consideration of the computations being per-
formed at the higher level can assist in the design of suitable data representations and propose a method
whereby an automatic programming system could conceivably invent efficient representations for itself. How-
ever we must stress that the ideas presented are very preliminary and we are not intending to incorporate them
in any of our experimental program development systems, Darlington [1978].
We will use an applicative language, NPL, Burstall [1977], to write our programs and an equational
method for defining our data types (Liskov [1975] and Guttag [1977]). The manipulations that we will perform
on these programs and data type definitions in order to design efficient representations will be based on the for-
mal system outlined in Burstall and Darlington [1977]. However readers not familiar with that work can regard
the manipulations as simple equality replacements.
In section B we outline our method of designing data representations and give a simple example of its
application. This method relies on being able to decompose a given function into two or more simpler func-
tions and we describe some techniques for achieving this in section C. In section D we present two examples
of representation design, namely the design of the vector of booleans representation for sets of integers and the
invention of search trees from consideration of binary search on ordered arrays. Finally, in section E we dis-
cuss some possible enhancements to this method.
Given a function f from a domain D1 to a domain D2, the purpose of a representation for D1 in another domain Dconc may be to facilitate f. It can do this by stor-
ing some of the computation of f in a structure in Dconc so that its recomputation can be made more easily.
This is equivalent to saying that there is a function rep that encodes some of the computation of f into a data
structure in Dconc and a function fconc that takes this encoding and completes the rest of the computation of
f, thus
[Diagram: f maps D1 into D2, rep maps D1 into Dconc, and fconc maps Dconc into D2]

fconc(rep(d1)) = f(d1) for d1 in D1
Often D2 will be an abstract domain, often D1 itself. In this case we have the following picture and equa-
tion
[Diagram: f maps D1 into D1, rep maps D1 into Dconc on both sides, and fconc maps Dconc into Dconc]

fconc(rep(d1)) = rep(f(d1)) for d1 in D1
B.1 A Simple Example, the Invention of Ordered Lists From Unordered Lists.
To show our technique in action we will first use a very simple example. The reader is warned that two
steps used are fairly unmotivated. Happily when we come to more substantial examples things become more
mechanical and better motivated.
Our abstract data domain D1 will be conventional lists of integers.
type lists
operations
    nil: → list
    cons: integer x list → list
    hd: list → integer
    tail: list → list
Axioms
    hd(cons(n,l)) = n
    tail(cons(n,l)) = l
Assume that we have the operations min and deletemin defined over these lists.

min(x::nil) <= x
min(x1::x2::l) <= min(x1::l) if x1 ≤ x2
               <= min(x2::l) otherwise

deletemin(l) <= delete(min(l),l)

delete(x1,x2::l) <= l if x1 = x2
                 <= x2::delete(x1,l) otherwise
Assume that we have decided to stay within lists but we would like to have a representation that enables
us to compute min and deletemin more efficiently. Thus we need to synthesize functions rep, minconc and
deleteminconc
such that,
minconc(rep(l)) = min(l)

and,

deleteminconc(rep(l)) = rep(deletemin(l))
Here we take our inventive step mentioned earlier. We are seeking functions for minconc and deletemin-
conc that are simple to compute. The simplest functions that are of the right type are hd and tail respec-
tively. Let us see what happens if we try these for minconc and deleteminconc. Substituting in our equations
we get,
hd(rep(l)) = min(l)
tail(rep(l)) = rep(deletemin(l))
and we can regard this as one equation defining a representation function that orders the list by performing a
selection sort. This equation can be used when l has one element or more. To produce a base case we need to
consider the case where l has only one element, thus we need rep such that

hd(rep(x::nil)) = min(x::nil)

and

tail(rep(x::nil)) = rep(deletemin(x::nil))
Thus we have
hd(rep(x::nil)) = x
and
tail(rep(x::nil)) = rep(nil)
To make our recursion terminate we have to evaluate rep(nil). Given min and deletemin alone this is
difficult as both min and deletemin are undefined for the empty list but it seems reasonable to take rep(nil) =
nil. Thus we have our equations for rep.
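A minimal executable sketch of this derivation (in Haskell, an assumption of ours rather than the NPL text): the equations hd(rep(l)) = min(l) and tail(rep(l)) = rep(deletemin(l)), together with rep(nil) = nil, define rep as a selection sort, and minconc and deleteminconc are then simply hd and tail on the representation.

minL :: [Int] -> Int
minL [x]       = x
minL (x1:x2:l) = if x1 <= x2 then minL (x1:l) else minL (x2:l)

deleteMin :: [Int] -> [Int]
deleteMin l = del (minL l) l
  where del _ []                 = []
        del x (y:ys) | x == y    = ys
                     | otherwise = y : del x ys

rep :: [Int] -> [Int]
rep [] = []                           -- rep(nil) = nil, the chosen base case
rep l  = minL l : rep (deleteMin l)   -- hd . rep = min and tail . rep = rep . deletemin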
Consider next the function append, defined by

append(nil,Y) <= Y
append(x::X,Y) <= x::append(X,Y)

and suppose we want to decompose g(X,Y,Z) = append(append(X,Y),Z) into two functions h and k such that

h(k(X,Y),Z) = g(X,Y,Z)                                   (A)

Unfolding g we get

g(x::X,Y,Z) = x::g(X,Y,Z)
We are attempting to produce a recursion for h. Looking at the right hand side of the last line above we
see that it is of the same form as the right hand side of the main equation for append. Thus we are led to the
hypothesis that h is similar to append. More precisely we can produce an equation for h
h(x::X,Y) = x::h(X,Y)
Using this equation to fold the right hand side of our equation we get
h(k(x::X,Y),Z) = h(x::k(X,Y),Z)

which suggests taking

k(x::X,Y) = x::k(X,Y)
(We have of course assumed that if h(X1,Y) = h(X2,Y) then X1=X2. But this is true for the h we have
chosen.)
We have finally to produce base cases for our recursion for h and k. We do this by first considering (A)
when X is nil getting
h(k(nil,Y),Z) = g(nil,Y,Z)
= append(Y,Z) (unfolding)
k(nil,Y) = Y
This gives us
h(Y,Z) = append(Y,Z)
which is enough.
Thus the equations we have developed for h and k are

h(x::X,Y) = x::h(X,Y)        h(Y,Z) = append(Y,Z)
k(x::X,Y) = x::k(X,Y)        k(nil,Y) = Y

and h and k are exactly append. This is not always the case, as we shall see later: even though we may use an
already existing equation to base our new equation on, the new function, when fully defined, may be different.
Consider now the function g defined by

g(nil) <= 0
g(n::X) <= 2n + g(X)

and suppose we wish to split it into two functions

k: lists → lists
h: lists → integers

such that h(k(X)) = g(X). Expanding this equation on n::X we get

h(k(n::X)) = g(n::X)
           = 2n + g(X)
           = 2n + h(k(X))                                 (B)

The intermediate list produced by k does not appear explicitly on the right hand side.
Thus we have to rewrite (B) to make this intermediate list explicit. We do this by inserting subexpres-
sions of the right hand side of (B) into a list structure and immediately reading them out again. Thus we
rewrite B as,
h(k(n::X)) = u + h(v)
where u::v = 2n::k(X)
Now we can identify a subexpression on the left hand side with a subexpression on the right hand side
getting
k(n::X) = 2n::k(X)
Returning to (B) we can unfold the left hand side using our new equation getting
h(2n::k(X)) = 2n+h(k(X))
Generalizing, we get
h(u::v) = u + h(v)
This technique of introducing an intermediary data structure by writing information into it and then
immediately reading it out has been called anti-projection by Gerard Terrine in his related work on the inven-
tion of data representations, Terrine [1978]. The manipulations performed above are exactly the reverse of the
manipulations performed in the more conventional transformation optimizing a composition of two functions to
a single recursion. See Burstall and Darlington [1977]. There, projection axioms such as head(cons(u,v)) = u
and tail(cons(u,v)) = v are used repeatedly. This connection with well known transformations provides strong
heuristic guidance and helps to motivate some of the seemingly unmotivated steps above.
h(k(nil)) = g(nil)
=0
We have complete freedom of choice but as k maps lists to lists it seems reasonable to set
k(nil) = nil
h(nil) = 0

Thus the equations we have developed for k and h are
k(nil) = nil
k(n::X) = 2n::k(X)
h(nil) = 0
h(n::X) = n+h(X)
The above manipulations are heavily heuristic in their application. However the equations produced, if
they are adequate to define a function, are guaranteed to be correct according to their defining equation.
Whether or not they will be useful is a question to be settled by efficiency considerations in the representation
context.
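For completeness, the decomposition just obtained can be written directly (again a Haskell sketch of ours): k doubles every element, h sums a list, and their composition is g.

g, h :: [Int] -> Int
g []    = 0
g (n:x) = 2 * n + g x        -- the original function

k :: [Int] -> [Int]
k []    = []
k (n:x) = 2 * n : k x        -- the invented intermediate structure

h []    = 0
h (n:x) = n + h x            -- h (k x) == g x for every list x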
type sets
operations
    empty: set → boolean
    nilset: → set
    has: set x integer → boolean
    add: set x integer → set
    remove: set x integer → set
Axioms
    empty(nilset) = true
    empty(add(s,i)) = false
    has(nilset,i) = false
    has(add(s,i1),i2) = if i1 = i2 then true
                        else has(s,i2)
    etc.
type vector
operations
    emptyv: → vector
    isemptyv: vector → boolean
    assign: vector x integer x item → vector
    read: vector x integer → item
Axioms
    isemptyv(emptyv) = true
    isemptyv(assign(v,in,it)) = false
    read(emptyv,in) = error
    read(assign(v,in1,it),in2) = if in1 = in2 then it                    (A)
                                 else read(v,in2).
We will choose has to help us design a representation for sets. We therefore need to synthesize two
functions hasconc and rep such that
hasconc(rep(s),i)=has(s,i) (B)
Thus if rep is of type set → ?, hasconc is of type ? x integer → boolean. Looking first at the case add(s,i1)
and expanding has we have

hasconc(rep(add(s,i1)),i2) = has(add(s,i1),i2)
                           = if i1 = i2 then true
                             else has(s,i2)
Using equation (B) we can rewrite the right hand side getting

= if i1 = i2
  then true
  else hasconc(rep(s),i2)
The computation of the right hand side has now to be split into two, one bit corresponding to hasconc
and the other corresponding to rep. Looking at the right hand side above we can see it matches with the right
hand side of equation (A) in the vector axioms. Thus we can take this as an equation for hasconc,
hasconc(rep(add(s,i1)),i2) = hasconc(assign(rep(s),i1,true),i2)
rep(add(s,i1)) = assign(rep(s),i1,true)

Thus rep: set → vector of booleans and hasconc: vector of booleans x integer → boolean, and we have designed a
representation for our sets of integers as a vector of booleans where the n'th element is true if the integer n is
in the set, a traditional 'clever' representation for small sets of integers. Finally, we have to consider the base
cases. Looking at the nilset case for equation (B) we get
hasconc(rep(nilset),i) = has(nilset,i)
= false
Guided by our recursions for hasconc and rep produced above it seems sensible to parcel this computa-
tion up as
rep(nilset) = emptyv
hasconc(emptyv,i) = false
rep(nilset) = emptyv
rep(add(s,i)) = assign(rep(s),i,true)
hasconc(emptyv,i) = false
hasconc(assign(v,i1,it),i2) = if i1 = i2
                              then it
                              else hasconc(v,i2)
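These equations translate directly into the following sketch (Haskell is our choice here; the Vector and Set types are term representations of emptyv/assign and nilset/add, an assumption made only for readability):

data Vector = EmptyV | Assign Vector Int Bool   -- terms built from emptyv and assign
data Set    = NilSet | AddS Set Int             -- terms built from nilset and add

rep :: Set -> Vector
rep NilSet     = EmptyV
rep (AddS s i) = Assign (rep s) i True

hasconc :: Vector -> Int -> Bool
hasconc EmptyV          _  = False
hasconc (Assign v i1 b) i2 = if i1 == i2 then b else hasconc v i2

-- hasconc (rep s) i coincides with the abstract has(s,i).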
The reader may justifiably complain that this is not quite the traditional representation for small integer
sets. Usually the vector length is bounded by the largest integer to be included in the set and the vector con-
tains false for those integers that are not in the set. We can achieve this by reconsidering the base case. We
have
hasconc(rep(nilset),i) = false
for all 1 ≤ i ≤ n where n is the largest integer to be stored. Now if we retain the first equation derived for
hasconc as the sole equation for hasconc we see that rep(nilset) must be the vector with false at every position.
type tree
operations
    niltree : → tree
    nulltree : tree → boolean
    constree : item x tree x tree → tree
    valof : tree → item
    lefttree : tree → tree
    righttree : tree → tree
axioms
    nulltree(niltree) = true
    nulltree(constree(i,t1,t2)) = false
    valof(constree(i,t1,t2)) = i
    lefttree(constree(i,t1,t2)) = t1
    righttree(constree(i,t1,t2)) = t2
For this example we assume that the items have some total order, >, defined over them and that our
vectors are organized in ascending order. We start our ’invention’ of search trees from consideration of an
operation that performs a binary search for an item on such an ordered vector.
bs(vec,i,j,x) =
    if i > j
    then false
    else if read(vec,mid(i,j)) = x
    then true
    else if read(vec,mid(i,j)) > x
    then bs(vec,i,mid(i,j)-1,x)
    else bs(vec,mid(i,j)+1,j,x)

where mid(i,j) denotes the middle index of the interval i..j and size gives the number of elements of a vector,
with size(emptyv) = 0. We need to synthesize functions repl and treesearch such that

treesearch(repl(vec,i,j),x) = bs(vec,i,j,x)                    (A)

so that, with rep(vec) = repl(vec,1,size(vec)),

treesearch(rep(vec),x) = bs(vec,1,size(vec),x)

Expanding bs on the right hand side of (A) we get
if i>j
then false
else if read(vec,mid(i,j)) = x
then true
else if read(vec,mid(i,j)) > x
then bs(vec,i,mid(i,j)-1,x)
else bs(vec,mid (i,j) + 1,j,x)
Using equation (A) we can replace the calls to bs in the final two lines getting
if i > j
then false
else if read(vec,mid(i,j)) = x
then true
else if read(vec,mid(i,j)) > x
then treesearch(repl(vec,i,mid(i,j)-1),x)
else treesearch(repl(vec,mid(i,j)+1,j),x)
Following the strategy outlined earlier we have to make use of an anti-projection to make the intermediate
data structure explicit. Here our intermediate data structure or representation is to be trees holding items at
their nodes and having similar trees as their left and right subtrees. Looking at the right hand side of the above
equation, we see that read(vec,mid(i,j)) is of type item and repl(vec,i,mid(i,j)-1) and repl(vec,mid(i,j)
+1,j)
are of type tree. Thus using the projections
valof(constree(n,t1,t2)) = n
lefttree(constree(n,t1,t2)) = t1
righttree(constree(n,t1,t2)) = t2
backwards, we rewrite the right hand side of the above equation getting
= if i > j
  then false
  else if valof(t) = x
  then true
  else if valof(t) > x
  then treesearch(lefttree(t),x)
  else treesearch(righttree(t),x)
where t = constree(read(vec,mid(i,j)),
                   repl(vec,i,mid(i,j)-1),
                   repl(vec,mid(i,j)+1,j))
Finally, for the case i > j we need

treesearch(repl(vec,i,j),x) = false
Again we have an arbitrary choice. But as repl maps onto trees, it is natural to choose

treesearch(niltree,x) = false

Having made this choice the right hand side splits cleanly. The 'inner' portion up to the tree is associated
with repl and the 'outer' portion with treesearch. Thus we get
repl(vec,i,j)
    = if i > j
      then niltree
      else constree(read(vec,mid(i,j)),
                    repl(vec,i,mid(i,j)-1),
                    repl(vec,mid(i,j)+1,j))

treesearch(t,x)
    = if nulltree(t)
      then false
      else if valof(t) = x
      then true
      else if valof(t) > x
      then treesearch(lefttree(t),x)
      else treesearch(righttree(t),x)
Thus our representation function organizes the vector into the traditional search tree, one with all items
on the left subtree less than the root and all items on the right subtree greater than the root.
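The final equations correspond to the following sketch (a Haskell rendering of ours; the vector is modelled as an indexing function, an assumption made only to keep the example short):

data Tree = NilTree | ConsTree Int Tree Tree

repl :: (Int -> Int) -> Int -> Int -> Tree
repl vec i j
  | i > j     = NilTree
  | otherwise = ConsTree (vec m) (repl vec i (m - 1)) (repl vec (m + 1) j)
  where m = (i + j) `div` 2                       -- plays the role of mid(i,j)

treesearch :: Tree -> Int -> Bool
treesearch NilTree _ = False
treesearch (ConsTree v l r) x
  | v == x    = True
  | v > x     = treesearch l x
  | otherwise = treesearch r x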
E. Future Developments
Wirsing et al. [1979] stresses the importance of various algebraic models for equational abstract data type
specifications. In particular there is a remark, attributed originally to Wand, that as the terminal models contain
the least amount of redundancy, they could be the best candidates for efficient representations. We have been
thinking along the same lines ourselves and have been considering ways to enrich a data type specification so as
to exclude the initial models. This we can do by adding more laws to the specification, making more things
equal. Of course if the laws added are derivable from the previous laws then there is no change in the models.
We have tried to develop laws that do not change the external behavior of the data type observed through the
functions whose range is not the type being specified (the type being specified is the TOI in Guttag's terminology). That is,
given two terms t1, t2 of the TOI, not equal in the initial model, we try to deduce identities fi(t1) = fi(t2) for
all fi with range not the TOI. We can then add the equation t1 = t2 to the specification, moving the initial
model closer to the terminal one without changing the behavior of the type.
Consider for example the simple type SET defined in Wirsing et al. [1979].
type SET
operations
empty: — set
incorp: set x integer — set
iselem: set x integer — boolean
axioms
iselem(empty,i) = false
iselem(incorp(s,i),j) = if i=j
then true
else iselem(s,j)
Here the only external function is iselem. In the initial model of this type incorp(incorp(s,i),j) ≠
incorp(incorp(s,j),i). However, if we consider iselem(incorp(incorp(s,i1),i2),j) we have
iselem(incorp(incorp(s,i1),i2),j)
    = if i2 = j
      then true
      else if i1 = j
      then true
      else iselem(s,j)
      (unfolding twice)
    = if i1 = j
      then true
      else if i2 = j
      then true
      else iselem(s,j)
      (rearranging the conditional)
    = iselem(incorp(incorp(s,i2),i1),j)
      (folding twice)
Thus adding the identity

incorp(incorp(s,i1),i2) = incorp(incorp(s,i2),i1)

will not change the type's external behavior. Similarly, the identity incorp(incorp(s,i),i) = incorp(s,i) can be
developed. Having developed this new type it is easy to synthesize a new constructor that keeps everything in
the simplest form, e.g., avoiding duplications.
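A small sketch of such a constructor (in Haskell, our illustration rather than anything from the text): keeping the set as an ordered list without duplicates makes the added laws hold trivially, since two sets with the same elements now have the same representation.

type SetRep = [Int]                     -- kept sorted and duplicate-free

incorp :: SetRep -> Int -> SetRep
incorp [] i = [i]
incorp (x:xs) i
  | i == x    = x : xs                  -- incorp(incorp(s,i),i) = incorp(s,i)
  | i <  x    = i : x : xs
  | otherwise = x : incorp xs i         -- insertions commute

iselem :: SetRep -> Int -> Bool
iselem s i = i `elem` s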
F. Acknowledgements
Many of the ideas presented here were also being developed by Gerard Terrine before his untimely death
last year and I greatly benefited from discussions with him and Marie-Claude Gaudel. Thanks also to J. Guttag.
The British Science Research Council provided financial support.
References
Burstall [1977]
R.M. Burstall, "Design considerations for a functional programming language," Infotech State of the Art Conference, Copenhagen (1977), pp. 45-57.
Darlington [1978]
J. Darlington, "Program transformation and synthesis: present capabilities," Report 77/43, Department of Computing and Control, Imperial College (1978). To appear in Artificial Intelligence Journal.
Darlington [1979]
J. Darlington, "The synthesis of implementations for abstract data types," Report 80, Dept. of Computing and Control, Imperial College (1979).
Floyd [1964]
R.W. Floyd, "Algorithm 245: Treesort," CACM 7, 12 (1964).
Hoare [1972]
C.A.R. Hoare, "Proof of correctness of data representations," Acta Informatica 1 (1972), pp. 271-278.
Terrine [1979]
G. Terrine, Personal communication.
Vuillemin [1976]
J. Vuillemin, "A data structure for manipulating priority queues," Internal Report, Département d'Informatique, Université de Paris-Sud, France (1976).
CHAPTER 8
Christine Choppy
Laboratoire de Recherche en Informatique
Université de Paris-Sud

P. Lescanne
J.L. Rémy
Centre de Recherche en Informatique de Nancy
Abstract
In this paper, we are concerned with problems involved in transforming abstract data type specifications.
The questions are: what are the operations and the conditions involved in transforming a given specification
into an equivalent one, what is the gain in this transformation (we define a complexity measure), and finally, is
it possible to define systematic strategies for specification transformation. The example we use is a specification
built on a cartesian product of two abstract data types (called subtypes), modified by a restriction on a definition
domain of a constructor and enriched by some operations.
Introduction
We present here some ideas about how to evaluate and improve an abstract data type specification; we
illustrate them with an example inspired from the Bartussek and Parnas [1977] paper.
We adopt the point of view that an abstract data type is a class of algebras on a signature of typed opera-
tions (Burstall and Goguen [1977], Guttag and Horning [1978], Goguen et al. [1978]). We use equational
specifications with preconditions that restrict the operation definition domain (Guttag [1980]).
The construction of abstract data type specification has been studied by various authors (Bauer and Woss-
ner [1979], Darlington [1978], Partsch and Broy [1979], Rémy [1980]). An example proposed by Majster
[1977] (traversable stacks) and similar to Bartussek and Parnas' has been studied by some authors with an
approach close to ours: Thatcher et al. [1978] simplify the example and are interested in a very simple model
for which they obtain a specification proven thereafter. Veloso and Pequeno [1979] look for the normal forms
and work on them as a model.
The example chosen for this paper is built on the cartesian product of two simple data types enriched by
additional operations, following Burstall and Goguen [1977] terminology. Those operations are costly to evaluate,
and we attempt to transform the specification into another one where calculations are simpler.
In the first part of this paper we describe the techniques we use (and that we either borrowed or
developed) in order to build a specification and to allow for systematic transformations:
1) A methodical presentation of the algebraic specification of an abstract data type with preconditions allow-
ing an easy reading of confluence, finite termination and sufficient completeness. (This presentation is
tied to Guttag and Horning [1978] and Gaudel [1980]. We differ from these works by orienting each
equation into a rewrite rule (Lescanne [1979], Musser [1980], Huet [1980], Rémy [1982]) ).
2) Proof techniques in rewriting systems proposed by Huet [1980], Burstall and Darlington [1977], (folding,
unfolding), Musser [1980] (induction proofs) and Goguen [1980] together with transformation strategies
as defined by Feather [1979] (composition of definitions).
3) A measure of the operations complexity defined as the number of rewritings necessary to evaluate them.
Finally, we describe the transformation process with some strategies and we give the main steps:
1) Choice of a family of constructors.
2) Choice of the two types in the cartesian product.
3) Transforming the definitions of the operations depending on the two types.
● Operations declaration — The operations are divided into three classes: the constructors are the only operations
necessary to describe any object of the data type; the internal operations are operations with range the type of
interest (i.e. the type being specified) which are not constructors; the external operations extract values from the
abstract data type objects (among those operations are the predicates, or operations with range boolean, that test
properties of the objects).
● Declaration of the operation profiles and restrictions — In each class, profiles and status (i.e. infixed or postfixed,
see further down) are given. Often operations are not defined on their whole domain and restrictions are
necessary; they are defined by predicates (predicates are themselves not restricted, therefore, using induction, it
is possible to define a domain for every term), called preconditions.
● Rewriting rules — We use rewriting rules to express the operation semantics; both sides play dissymmetrical
roles, which is mainly useful to take into account the restrictions. Let s(x1,...,xm) → t(y1,...,yn) be a rule where
V(s) = {x1,...,xm} and V(t) = {y1,...,yn} are the variables occurring in s and t. Each rule will be supposed
right regular (i.e. V(t) ⊆ V(s)) and left linear (i.e. each variable occurs only once in s); therefore, t is written
t(x1,...,xm). Finally, we say that a rule is "sound" if the domain of s(x1,...,xm) is included in that of
t(x1,...,xm).
● We replace the notation u → if p then v else w by a separate cases notation: p => u → v ; non p => u → w,
which is simpler to use in proof techniques.
● We consider specifications where the operations have at most one argument of the type of interest.
● In constructors and internal operations, the argument of the type of interest is denoted in postfixed notation,
leaving the other arguments as parameters (Curry and Feys [1958], Backus [1978]). For instance, the term
resulting from adding an item x on a stack s (Fig. A.1) is denoted s.Add(x). The advantage of this notation
is that it takes into account the chronology of the object construction.
In Fig. A.1 we give two specification examples: the type Stack and the type Pointer (which is a version of
the type Integer). In the specification of type Stack, we call respectively the operations Take, Front and Isempty the
destructor, the access and the test associated with the operation Add. Numerous specifications present destruction,
access and test operations associated with a constructor; in the process of specification transformation, once the
family of constructors is chosen, we shall try to associate such operations with them.
It is well known that, in order to perform computations under good conditions, the rewriting system
should have the properties of confluence (or Church-Rosser property) and finite termination. The first property
insures that every term t is rewritten into at most one irreducible term t’ called its normal form. The second
property insures that there is no infinite series of rewritings. These two properties insure then that every term
has a unique normal form. Usually, the confluence property is proved by using the Knuth and Bendix [1970]
algorithm and the finite termination property is proved by using a well-founded partial ordering on the type
expressions (Plaisted [1978], Dershowitz [1979], Kamin et al. [1980], Jouannaud et al. [1982]).
Type Stack:
    s.Take : Stack def if not Isempty(s)
    Front(s) : Item def if not Isempty(s)
    Isempty(s) : Boolean

    s.Add(x).Take → s
    Front(s.Add(x)) → x
    Isempty(Emptystack) → True
    Isempty(s.Add(x)) → False

Type Pointer:
    p.Left : Pointer def if not Isnull(p)
    Isnull(p) : Boolean

    p.Right.Left → p
    Isnull(Pnull) → True
    Isnull(p.Right) → False
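For concreteness, the Stack part of Fig. A.1 can be read as the following sketch (Haskell, our rendering; turning the preconditions into runtime errors is only one possible reading of "def if"):

data Stack a = Emptystack | Add (Stack a) a     -- the constructors

takeS :: Stack a -> Stack a                     -- s.Take, def if not Isempty(s)
takeS (Add s _)  = s
takeS Emptystack = error "Take: not Isempty(s) violated"

front :: Stack a -> a                           -- Front(s), def if not Isempty(s)
front (Add _ x)  = x
front Emptystack = error "Front: not Isempty(s) violated"

isempty :: Stack a -> Bool
isempty Emptystack = True
isempty (Add _ _)  = False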
We require that our specifications be "gracious" in Bidoit [1981]'s terms. Among other things, it means
that each internal or external operation admits, with respect to the constructors, either a direct definition or an
inductive definition. A direct definition is a rule whose left hand side has the following pattern: x.f(z1,...,zp)
where x is a variable of the type of interest and z1,...,zp are variables of external types. An inductive definition is
a family of rules whose left hand sides have the following pattern: x.c(y1,...,ym).f(z1,...,zp) or c0.f(z1,...,zp)
where c0 is a constant constructor and c a non-constant constructor. We need one rule per constructor when
the precondition of f is not identically false.
Rewriting in a context — u rewrites to v in a context p if there exists a rule p' => u' → v' and a substitution σ
such that
    ● σu' is a subterm of u,
    ● v is derived from u by replacing σu' by σv', and
    ● the condition σp' is a rewriting of p.
(In the third condition, we use classical rewritings, without preconditions; this is generally sufficient.)
Equality proof in a context — Equation u = v is directly provable in a context p if there exist some sequences
u0, u1, ..., un and v0, v1, ..., vm such that u0 is u, v0 is v, un and vm coincide, and each term rewrites to the
following one in the context p.
The left hand side member rewrites to the right hand side one. We use the substitution σ: q → q.Addq(x) and
the rule not Isnewq(q) => Frontq(q.Addq(y)) → Frontq(q), and σ(not Isnewq(q)) = not
Isnewq(q.Addq(x)) → not false → true.
Induction proof — When an equation is not directly provable, we proceed by induction: let E be an equation
u = v (p) to be proven and x a variable of the type of interest in E. We can replace the proof of E by the proofs
of a family of equations, denoted ∂E(x)/∂C, so defined: each equation of ∂E(x)/∂C is obtained from E by substitut-
ing for x a term x.c(y1,...,yn), where c is a non-constant constructor of C, or the term c0 if c0 is a constant
constructor. This notation is due to Bidoit [1981]. Moreover, we can consider x as a constant and use the
induction hypothesis to prove the equations of ∂E(x)/∂C.
Application of proof techniques — When we transform a specification, we have to invent the right hand side
members of equations. Hopefully, we know the left hand side members since we know a family of constructors
and we want some recursive definitions of the other operations. We adopt the following strategy (cf. also Bur-
stall and Darlington [1977] and Feather [1979]).
Direct proof — Beginning with a term u (the left hand side member of an equation)
1) Unfold this term as long as possible, i.e. apply specification rules from left to right. If
these rules are conditional, introduce two complementary contexts to deal with suc-
cessively.
2) Then, fold the resulting term by applying rules from right to left (occasionally, intro-
duce two contexts) and stop when the final term constitutes a recursive definition of
u with respect to the new constructors and their associated operations.
Induction proof — Suppose we have proved an assertion E0: u0 = v0 (p0) and there exist a constructor c0, a
variable x and an assertion E such that E0 = ∂E(x)/∂c0. Then we try to prove E by induction.
A.3. Defining and Computing a Complexity Measure for a Specification Rewriting System
The complexity we define has to give a measure of the number of rewritings necessary to transform a
ground term (i.e. a term without variables) into its normal form. This number depends on the rewriting system and
the rewriting strategy considered (Vuillemin [1974], Huet and Lévy [1979]). Taking into account the specific
form of the rules, a call by value strategy, with special consideration for conditionals, is optimal here. There-
fore the complexity is totally defined by its value on the terms t.o(t1,...,tn) or o(t,t1,...,tn), where t is a normal
form of the type of interest and t1,...,tn are constants of external type. We denote by δo(t,t1,...,tn) the number of rewrit-
ings needed in computing the normal form of these terms. If t is a term and R a rewriting system, we denote by
R(t) the normal form of t in R.
The family of functions δo, one for each operation o, verifies a system of equations which can be systematically deduced from R.
Instead of formally specifying the process, we propose a sufficiently complete example: let o, o1, o2 be internal
operations, c a constructor, p a predicate, and consider the following rule, recursively defining o on c:
Normal form and valuation — In many examples, the quantity δo(x,y1,...,yn) depends only on the numbers of
occurrences of constructors in x. We name valuation of a normal form the sequence of these occurrence
numbers. For example, if a normal form x contains p occurrences of one constructor and q of another, the valuation of x, denoted |x|,
is the pair (p,q). In this case we write also δ(|x|) instead of δ(x,y1, ... ,yn).
Isnewq(Newq) — true
Isnewq(q.Addq(x)) — false
Immediately, δIsnewq(n) = 1 for each n ≥ 0. By definition |Newq| = 0 and |q.Addq(x)| = 1 + |q|. So
Isnewq(q) = true ⟺ |q| = 0. We can write
In our example, INIT1 and INIT2 are reduced to one operation; so is INIT1×INIT2, which is denoted NULL.
● For each operation o1 ∈ O1*, its extension ō1 affects the first component of T1×T2 in the same way as o1. In
the rest of the paper ō1 will be denoted o1 (the overlining will be omitted).
The T1×T2 rules are composed of the extensions R'1 and R'2 of the sets of rules R1 and R2, together with commuta-
tivity rules between them:
Operations, with pl: Stack x Pointer, x: Item          Rules, for all pl: Stack x Pointer, x: Item

    NULL : Stack x Pointer
    pl.Add(x) : Stack x Pointer                            pl.Right.Add(x) → pl.Add(x).Right
    pl.Right : Stack x Pointer
    pl.Take : Stack x Pointer                              pl.Add(x).Take → pl
        def if not Isempty(pl)                             pl.Right.Take → pl.Take.Right
    pl.Left : Stack x Pointer                              pl.Add(x).Left → pl.Left.Add(x)
                                                           pl.Right.Left → pl
Restricting the domain of Right with Exright causes the rejection of half of the objects generated by the
cartesian product. Furthermore, the commutativity rule between the constructors is not symmetrical
any more
since the left hand side is less defined than the right hand side. What is left is not a cartesian product
anymore.
The operations Add, Take, Front are not a formalization of operations on the data; they play an important
role in the specification since they are a constructor together with its associated operations, but they are
hidden operations (they could not be invoked by a system's user). Enriching the specification with additional
operations does not modify the set of the objects.
Operations, with pl: Pointed List, x: Item             Rules, for all pl: Pointed List, x: Item

    NULL : {Pointed List}
    pl.Add(x)* : {Pointed List}                            pl.Right.Add(x) → pl.Add(x).Right
    pl.Right : {Pointed List def if Exright(pl)}
    pl.Take* : {Pointed List def if Exright(pl)}           pl.Add(x).Take → pl
                                                           pl.Right.Take → pl.Take.Right
    pl.Left : {Pointed List def if Exleft(pl)}             pl.Add(x).Left → pl.Left.Add(x)
                                                           pl.Right.Left → pl
    pl.Insert(x) : {Pointed List}                          non Exright(pl) => pl.Insert(x) → pl.Add(x).Right
                                                           Exright(pl) => pl.Insert(x) → pl.Take.Insert(x).Add(Front(pl))
    pl.Delete : {Pointed List def if Exleft(pl)}           non Exright(pl) => pl.Delete → pl.Take.Left
                                                           Exright(pl) => pl.Delete → pl.Take.Delete.Add(Front(pl))
    Front(pl)* : {Item def if Exright(pl)                  Front(pl.Add(x)) → x
                  or Exleft(pl)}                           Front(pl.Right) → Front(pl)
    Read(pl) : {Item def if Exleft(pl)}                    non Exright(pl) => Read(pl) → Front(pl)
                                                           Exright(pl) => Read(pl) → Read(pl.Take)
    Exright(pl) : Boolean                                  Exright(NULL) → False
                                                           Exright(pl.Add(x)) → True
                                                           Exright(pl.Right) → Exright(pl.Take)
    Exleft(pl) : Boolean                                   Exleft(NULL) → False
                                                           Exleft(pl.Add(x)) → Exleft(pl)
                                                           Exleft(pl.Right) → True

    * hidden operations
C. Transformations
[Fig. B.3: complexities of the operations of this specification as functions of the valuation (n,p); e.g. NULL = 0 and Front(n,p) = p+1, while Insert and Read involve polynomial expressions in n and p]
The transformations must achieve the expected goal: reducing the complexity of the (nonhidden) operations. Lastly, in this work, the resulting
specification has to be gracious (which implies that it has the confluence, finite termination and sufficient com-
pleteness properties).
To obtain the new specification, we have, first of all, to find the new constructors, then we compute
definition rules for the nonhidden operations (and hidden operations necessary to the specification) in terms of
these new constructors.
The input and output specifications have to be equivalent, i.e. their initial algebras have to be isomorphic
(Goguen et al. [1978]). Without giving formal definitions, it is enough to define an isomorphism between the
normal forms generated by the two families of constructors. Classically, we build a new specification by invent-
ing some relations which are theorems in the old specification and, then, we verify that the axioms of the first
specification are also theorems for the new one.
In the next paragraph, we see how it is possible to make invention and verification easier in this process.
1) The first step is the choice of a new constructor, and we give some criteria to guide this choice. Recall that
one of our goals is reducing the complexity of the specification: we choose, for the first transformation, an
"expensive" operation, able to form, with another constructor, a generating family for the type. On the
other hand, we seek a "cartesian product" specification, and the new constructor has to commute with
the old constructor which is kept. The satisfaction of these conditions leads us to choose, for the second
transformation, a constructor which is a composition of operations.
2) As we noted in paragraph A.1, we try to associate with the new constructor some operations of destruc-
tion, access and test, and test whether these operations verify commutativity rules allowing us to identify the
components of a cartesian product (cf. para. B.1).
3) We have finally to complete the specifications by the definition of the operations which are not dealt with
yet. It remains now to insure that the resulting specification has confluence, finite termination and
sufficient completeness properties (we do not prove that here), to verify that the two specifications are
equivalent and to evaluate the complexity in order to estimate the resulting specification.
The equivalence proof is easier because of the special structure of the specifications. Let C and C' be the
two constructor systems; since only one constructor is modified, we have C = C″ ∪ {c} and C' = C″ ∪ {c'}.
Let A be the set of the auxiliary operations used in the definitions of c with respect to C' and of c' with
respect to C. The rules on C″ ∪ {c,c'} ∪ A consist in:
Rules of the second specification have been computed in the last step as some theorems of the first one. It
remains to verify that the relations in C″ ∪ {c} and the definitions of {c'} ∪ A are theorems of the second
specification. If, in addition, some constructors of the first specification are restricted by preconditions, we have
to verify that they have the same preconditions in the transformed specification. For the other operations
(which do not belong to CUC’UA) we have computed before, definitions in the new specification which are
theorems in the old specification. The fact that the specifications are gracious with respect to C or C’ guaran-
tees their equivalence without having to prove that the rules of the old specification are theorems of the new
one (Rémy and Veloso [1982]).
[Figure C.3.1: the effects of Add(x), Right and Insert(x) on a pointed list]
The constructor to eliminate is Right, since the family {Null, Insert, Right} is not generating. So the new
family is {Null, Add, Insert}. It is generating, as we can define Right in terms of Add, Insert and the operations
associated with Add (Take and Front) (cf. Appendix 2):
pl.Add(x).Insert(y) → pl.Insert(y).Add(x)
pl.Insert(x).Delete → pl
Read(pl.Insert(x)) → x
not Exleft(Null) → True
not Exleft(pl.Insert(x)) → False
Operations, with pl: Pointed List, x: Item             Rules, for all pl: Pointed List, x, y: Item

    NULL : Pointed List
    pl.Add(x)* : Pointed List                              pl.Insert(x).Add(y) → pl.Add(y).Insert(x)
    pl.Insert(x) : Pointed List
    pl.Take* : Pointed List def if Exright(pl)             pl.Add(x).Take → pl
                                                           pl.Insert(x).Take → pl.Take.Insert(x)
    pl.Delete : Pointed List def if Exleft(pl)             pl.Add(x).Delete → pl.Delete.Add(x)
                                                           pl.Insert(x).Delete → pl
    pl.Left : Pointed List def if Exleft(pl)               pl.Add(x).Left → pl.Left.Add(x)
                                                           non Exright(pl) => pl.Insert(x).Left → pl.Add(x)
                                                           Exright(pl) => pl.Insert(x).Left → pl.Take.Insert(x).Left.Add(Front(pl))
    pl.Right : Pointed List def if Exright(pl)             non Exright(pl.Take) => pl.Right → pl.Take.Insert(Front(pl))
                                                           Exright(pl.Take) => pl.Right → pl.Take.Right.Add(Front(pl))
    Front(pl)* : Item def if Exleft(pl)                    Front(pl.Add(x)) → x
                 or Exright(pl)                            non Exright(pl) => Front(pl.Insert(x)) → x
                                                           Exright(pl) => Front(pl.Insert(x)) → Front(pl)
    Read(pl) : Item def if Exleft(pl)                      Read(pl.Add(x)) → Read(pl)
                                                           Read(pl.Insert(x)) → x
    Exright(pl) : Boolean                                  Exright(NULL) → False
                                                           Exright(pl.Add(x)) → True
                                                           Exright(pl.Insert(x)) → Exright(pl)
    Exleft(pl) : Boolean                                   Exleft(NULL) → False
                                                           Exleft(pl.Add(x)) → Exleft(pl)
                                                           Exleft(pl.Insert(x)) → True

    * hidden operations
C″ = {Null, Add}
c  = Right
c' = Insert
A  = {Take, Front, Exright}
We have to verify that the commutativity relation between Add and Right and the definitions of the operations of
{c'} ∪ A in the first specification are theorems of the new one.
C.3.5 Evaluation and analysis of the complexity of the rewriting system of the resulting specification
We give in figure C.3.5. the complexity of the rewriting system of the resulting specification. We want to
compare this complexity with the one of the original specification (Fig. B.3.). We can adopt several points of
view:
a) to compare the complexity by groups of operations
— we want to keep Insert as a constructor, as it is a nonhidden operation and its role of construction gives
  it a low complexity, so we eliminate Add;
— we seek a constructor among the costly operations (Right and Left); Right is not generating with Insert
  and Left does not commute with Insert, yet the composition of Insert and Left is convenient. Insertleft
  can be inductively defined on the constructors Null, Add, Insert by the following rules:
[Figure: the rules defining Insertleft, together with the complexity values NULL = 0, Delete(n,p) = 1, Exright(n,p) = p + 1]
Then the family {Null, Insert, Insertleft} is generating. Indeed, the eliminated constructor Add can be so defined
(where R̄ denotes the symmetric of rule R):
So the operations Null, Insertleft, Deleteright, Readright, not Exright constitute a second stack which com-
mutes with the first one. We therefore have a cartesian product of two stacks.
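A possible concrete picture of this product, our own sketch rather than the specification itself, is the familiar pair-of-lists representation of a pointed list, in which each family of operations touches only one component; the assignment of Insert and Insertleft to particular components below is purely illustrative.

type PointedList a = ([a], [a])      -- one stack on each side of the point

nullPL :: PointedList a
nullPL = ([], [])

insertPL :: PointedList a -> a -> PointedList a
insertPL (l, r) x = (l, x : r)       -- acts on one component only

insertleftPL :: PointedList a -> a -> PointedList a
insertleftPL (l, r) x = (x : l, r)   -- acts on the other component only

rightPL :: PointedList a -> PointedList a
rightPL (l, x : r) = (x : l, r)      -- a non-constructor: it mixes the two stacks
rightPL p          = p

leftPL :: PointedList a -> PointedList a
leftPL (x : l, r) = (l, x : r)
leftPL p          = p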
The proof is here reduced to the commutativity relation of Add and Insert and the definition of Insertleft with
respect to Add and Insert.
[Table: the operations (with pl: Pointed list, x: Item) and rules (for all pl: Pointed list, x, y: Item) of the resulting specification; * marks hidden operations]
D. Conclusion
The main ideas in this paper are the following:
[Fig. C.3.5: complexities of the operations of the resulting specification, e.g. NULL = 0, Insert(n,p) = p, Insertleft(n,p) = 0, Delete(n,p) = p+1, Left(n,p) = 2(p+1), Exleft(n,p) = p+1]
c) Considering the composition of simple types allows us to use particular transformation strategies (Gaudel and
Terrine [1978]).
d) Analyzing the operation complexity in a specification and identifying the costly operations is a useful
guide in the choice of constructors.
This work should be pursued in the following ways:
a) Apply the strategies presented here to other examples (as was done for the representation of sets by
binary trees (Rémy [1980]).
b) Define a complexity measure that distinguishes each type of operation and its cost.
c) Determine computation methods for the complexity functions starting from the equations.
d) Deepen the study of heuristics in guiding the choice of new constructors. Extend the field of the opera-
tions that can be used. Finally, it could be necessary to test the strategies and techniques in this paper by
writing a set of programs or using existing programs such as Darlington’s [1979] or Feather’s [1979].
E. Acknowledgements
We are grateful to our departed colleague, G. Terrine, for proposing this Bartussek and Parnas example,
and for many discussions we had before his untimely death.
We thank M.C. Gaudel, G. Guiho and M. Sintzoff for their encouragements and constructive criticism.
At last, we are indebted to our Castor Research Team at the Centre de Recherche en Informatique de Nancy.
Appendix 1
Complexity computations are done using the following lemma, the proof of which is simple:
Lemma: for every term pl of the Pointed list type such that:
and for every term pl such that |R(pl)| = (n,p) with n > 0
According to the above lemma, the case : Exright(pl) corresponds to the case : n>p, and non Exright(pl)
corresponds to: n=p. Let us consider both cases separately:
● Case n > p
Insert(R(pl),R(x)) = 1 + Exright(R(pl)) + Take(R(pl))
                       + Insert(R(pl.Take),R(x))
                       + Add(R(pl.Take.Insert(x)),R(Front(pl)))
                       + Front(R(pl))
hence, using the results concerning the complexities of the operations Exright, Take, Add and Front given in Fig. B.3:

Insert(n,p) = 1 + (4 + 3(p+1))/2 + Insert(n-1,p)
● Case n = p

Insert(R(pl),R(x)) = 1 + Exright(R(pl)) + Add(R(pl),R(x)) + Right(R(pl.Add(x)))

hence:
Finally :
Appendix 2 Computing the old constructor : Right in terms of the new ones : Add and Insert and their associ-
ated operations
We want either a direct definition (rewriting pl.Right) or an inductive one (rewriting pl.Add(x).Right and
pl.Insert(x).Right) using the following results:
(1) pl.Right is defined if Exright(pl)
pl = pl.Take.Add(Front(pl)) (Exright(pl))
The context of this theorem is the same as the definition context for pl.Right :
References
Backus [1978]
J. Backus, "Can programming be liberated from the von Neumann style? A functional style and its algebra of programs," Comm. A.C.M., 21 (1978), pp. 613-641.
Bidoit [1981]
M. Bidoit, "Une méthode de présentation des types abstraits: application," Thèse de 3ème cycle, Université Paris-Sud (June 1981).
Darlington [1978]
J. Darlington, "Program transformation involving unfree data structures: an extended example," Proc. 3rd Int. Symp. on Programming (Robinet, ed.), Dunod, Paris (1978), pp. 203-217.
Dershowitz [1979]
N. Dershowitz, "Orderings for term rewriting systems," Proc. 20th Symposium on Foundations of Computer Science (1979), pp. 123-131.
Feather [1979]
M. Feather, "A system for developing programs by transformations," Ph.D. Thesis, Dept. of Artificial Intelligence, Univ. of Edinburgh (1979).
Gaudel [1980]
M.C. Gaudel, "Génération et preuve de compilateurs basées sur une sémantique formelle des langages de programmation," Thèse de l'Institut National Polytechnique de Nancy, Nancy (1980).
Goguen [1980]
J.A. Goguen, "How to prove algebraic inductive hypotheses without induction," 5th Conf. on Automated Deduction, Les Arcs (1980).
Guttag [1980]
J. Guttag, "Notes on type abstraction," IEEE Trans. on Soft. Engin., 6 (1980), pp. 13-23.
Huet [1980]
G. Huet, "Confluent reductions: abstract properties and applications to term rewriting systems," J.A.C.M., 27, 4 (1980), pp. 797-821.
Lescanne [1979]
P. Lescanne, "Etude algébrique et relationnelle des types abstraits et de leurs représentations," Thèse de l'Institut National Polytechnique de Lorraine, Centre de Recherche en Informatique de Nancy (1979).
Majster [1977]
M.E. Majster, "Limits of the 'algebraic' specification of abstract data types," SIGPLAN Notices, 12 (1977), pp. 37-42.
Musser [1980]
D. Musser, "On proving inductive properties of abstract data types," 7th ACM Symposium on Principles of Programming Languages (1980).
Plaisted [1978]
D.A. Plaisted, "A recursively defined ordering for proving termination of term rewriting systems," Dept. of Computer Science Research Report 78-948, University of Illinois, Urbana IL (Sept. 1978).
Rémy [1980]
J.L. Rémy, "Construction, évaluation et amélioration systématiques de structures de données," R.A.I.R.O. Theoretical Computer Science, 14 (1980), pp. 83-118.
Rémy [1982]
J.L. Rémy, "Etude des systèmes de réécriture conditionnels et applications aux types abstraits algébriques," Thèse de l'Institut National Polytechnique de Lorraine, Centre de Recherche en Informatique de Nancy (1982).
Vuillemin [1974]
J. Vuillemin, "Correct and optimal implementations of recursion in a simple programming language," J. Comp. Syst. Sc., 9 (1974), pp. 332-354.
CHAPTER 9
M.C. Gaudel*
Ph. Deschamp
M. Mazaud
INRIA**

"A general notation for semantic specification would permit the development of a true compiler generator, just as B.N.F. led to the development of parser generators."
* Present address: Centre de Recherches de la CGE — MARCOUSSIS Route de Nozay 91460 Marcoussis
A. Introduction
For a long time, much attention has been given to the systematic development of compilers. In this area,
automatic production of parsers from a B.N.F.-like specification of a grammar is now widely known and used
(Boullier [1980]). These tools allow only partial construction of compilers, the syntactic part. The so-called
semantic part has to be written most of the time by hand. The discovery of a high level specification method for
programming languages which would be well suited to compiler construction is still an unsolved problem.
In the first part of this paper, we briefly present the current state of the art. In the second part we suggest
a formalism for describing the semantics of programming languages which seems to be convenient for compiler
specification. As an example, we outline what the semantics of a usual programming language with recursive
procedures looks like in this formalism. In the third part, the specification of an implementation of this same
language is presented and discussed.
In the last part, the overall structure of the compiler generator PERLUETTE (which accepts such
specifications as inputs) is explained and the current state of development and experimentation of the system is
given.
[Diagram: the Source Language Definition, the Target Language Definition and the Implementation Choices Specification are the inputs of the COMPILER GENERATOR, which produces a compiler from the Source Language into the Target Language]
— the semantics of the Source Language and of the Target Language are both described with the
same formalism;
— the representation of the semantic values of the source programs in terms of semantic values of
target programs is specified and proved as the representation of an abstract data type by another one.
types int;
operations
    (int,int) → int : plus, sub, mult, divi
axioms
    plus(x, int'0') = x
    plus(x, plus(y,z)) = plus(plus(x,y), z)

N.B.: Constants of a data type are enclosed between quotes, preceded by the name of their type. Usually
they are considered as nullary operators. In our case we prefer to consider int'DIGIT-STRING' as a function from
string into int. The axioms on this function are not given here; for a complete presentation see Gaudel [1980a].
The semantic function V_ yields for each expression of SIMPROC a term of the int data type. It is defined
by equations such as:
where E, T, and NUMBER are some non-terminals of the language. It is important to notice that the func-
tional symbols are not interpreted: we have no more knowledge about ‘‘plus’’ than what is specified in the
axioms.
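The idea that the functional symbols remain uninterpreted can be pictured by the following sketch (Haskell; the Expr abstract syntax is an assumption of ours, not the SIMPROC grammar): V maps an expression to a term of the int data type and never evaluates plus or mult.

data Expr    = Number String | Sum Expr Expr | Prod Expr Expr   -- illustrative syntax

data IntTerm = IntConst String            -- int'DIGIT-STRING'
             | Plus IntTerm IntTerm
             | Mult IntTerm IntTerm

v :: Expr -> IntTerm
v (Number s) = IntConst s
v (Sum e t)  = Plus (v e) (v t)
v (Prod e t) = Mult (v e) (v t)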
We then describe the properties of identifiers in SIMPROC: every identifier is local to a procedure or to
the program (to which the identifier PROG is associated). The identifier types are directly deduced from the
data types of SIMPROC:
The scope operation gives, for any identifier, the identifier of the procedure to which it is local.
One of the most difficult issues encountered when describing such a language is that a variable identifier,
local to a procedure, refers to a different variable for each call of that procedure. In addition, we want to be
able to specify which variable is referred to by an identifier non-local to the current procedure. To deal with
this problem we introduce the type call of procedure and some operations on it.
type call
operations
    ( ) → call : currentcall ;
    (call) → call : caller ;
    (call) → proc-id : name ;
An example of the relationship between calls and procedure identifiers is illustrated by the figure below:
[Diagram: a chain of calls x—x—···—x ending at currentcall; each call is linked to the previous one by caller and to a procedure identifier by name]
where 'x' stands for a call and '.' stands for a procedure identifier.
An important property we must state is that all the calls in the chain are different. (Some procedure identifiers
are possibly the same, as recursive calls are allowed.) We then have the following axioms:
type var ;
operations
    (var-id, call) → var : designates ;
    (var) → int : value-of ;
axioms
Axiom [1] expresses that two different identifiers designate two different variables. Axiom [2] describes
the duplication of local variables for each procedure call. Axiom [3] specifies that, if an identifier is not a local
one, it designates the same variable as in the caller, be it local or global to the caller. (Keep in mind that all
identifiers are different.)
These axioms are not the most concise ones, but we use them here for reasons of understandability.
To conclude the presentation of this part of the abstract data type, let us give the semantic functions for
simple identifiers of SIMPROC: depending on the context where an identifier is used, its meaning can be a
term of the var data type or of the int data type.
N.B.: Array elements and called-by-variable parameters are not treated here. See Gaudel et al. [1978],
Gaudel [1980a], Deschamp [1980].
type modif ;

The semantic value of a statement is a term of this type. Let S be the corresponding semantic function.
Among the operations with modif as co-domain, there are:

operations
We need some formal definitions of these operations, but before stating them we are going to give an intuitive
idea of what a modif is. Let us consider, for instance, the modification assign(v,i). Its meaning is that the for-
mula "value-of(v) = i" becomes valid. The properties of the value-of operation are then changed and,
accordingly, the algebraic data type is different. Thus the meaning of a statement is a transformation of the
algebraic data type. It is convenient to describe such transformations by using the notion of "formulas of an
algebraic data type". We call the set of these formulas a "state".
Definition 1:
Let <T,F,A > be the presentation of the algebraic data type associated with a programming language.
A state is a set S of formulas t = t’, where t and t’ are some constant terms of the data type, which
satisfies the following properties:
Definition 2:
The state generated by a set of axioms A is the smallest state which contains all the formulas obtained by
substituting, for all the free variables occurring in each axiom a of A, all the constant terms of <T,F,A>
of the relevant sort.
These formulas are called "directly derivable from A".
Actually, the last definition defines the well-known smallest-congruence relation in <T,F,A>. It means
that we are considering the initial algebra semantics for the algebraic data types used to characterize a State.
Definition 3:
A state satisfies a set of axioms A if it contains all the formulas directly derivable from A.
We now have the elements to describe the semantics of a programming language. Let us call A the set of
axioms which define all the operations of the programming language except the modifiable ones. This set of axioms
is satisfied by all the states of any program of the language. The initial state of a program is the state generated
by A.
Given a current state, a modification (an assignment, for instance) removes some formulas from the state and adds some new ones. In order to describe these transformations we need, at first, some way to state the semantics of a term of type modif. Let m be such a term. We write
S' = appl(m, S)
for the resulting state when applying the modification denoted by m to the state S.
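To make the state-transformation view concrete, the following is a small Python sketch (not part of the PERLUETTE system; the names State, appl and subst, and the deliberately simplified treatment of subst, are assumptions introduced only for illustration) in which a state is a finite set of ground equations and a modification term is applied to it:

    # Illustrative sketch only: a state is a set of formulas "lhs = rhs",
    # and a modification term is applied to a state with appl.
    State = frozenset

    def subst(lhs, rhs, state):
        # Simplified generalized substitution: the formula lhs = rhs becomes
        # valid; any previous formula about lhs is removed. (The chapter's
        # subst is more careful about occurrences of the modified operation.)
        return State({(l, r) for (l, r) in state if l != lhs} | {(lhs, rhs)})

    def appl(modif, state):
        kind = modif[0]
        if kind == "assign":                 # assign(v, i): value-of(v) = i holds afterwards
            _, v, i = modif
            return subst("value-of(%s)" % v, i, state)
        if kind == "concat":                 # concat(m1, m2): apply m1, then m2
            _, m1, m2 = modif
            return appl(m2, appl(m1, state))
        if kind == "cond":                   # cond(b, m1, m2): branch on the value of b
            _, b, m1, m2 = modif
            return appl(m1 if (b, "true") in state else m2, state)
        raise ValueError("unknown modification: %r" % (modif,))

    # Example: applying an assignment to the empty state
    s1 = appl(("assign", "x", "3"), State())
    assert ("value-of(x)", "3") in s1

The only point of the sketch is that the meaning of a statement is a function from states to states, as described above.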
The definitions of the operations concat and cond are:
To formalize the assignment, one uses a primitive modification subst which is called "generalized substitution". Intuitively, subst(f(A), w) transforms a state S in such a way that:
The rather complicated definition of subst given below ensures that no inconsistencies are introduced into the resulting state even if there are some occurrences of f in A or w.
First, let us consider the following state (where f’ does not belong to F and has the same arity as f):
This state is the result of adding a new axiom to S. By definition, S + a is the smallest state which contains S and all the formulas which are directly derivable from a (see Definition 2).
Now, the definition of subst can be given:
All the modifications of the Source Language are defined in terms of appl and subst. The same thing is
done for the Target Language: it is then possible to prove representations of the modifications.
operations
(proc-id) → modif : possesses ;
(proc-id) → var-id : par1 ;
(proc-id, var-id, modif, proc-id) → modif : proc-decl ;
(proc-id, int, proc-id) → modif : proc-call ;
The value possessed by a procedure identifier is a term of type modif which is the semantic value of the body of the procedure. There is no problem with recursive declarations since we only consider the term, without interpretation. (We would otherwise have to speak of the least fixed-point theorem and Scott's theory.) The semantics of a procedure call with one called-by-value parameter is then:
— a concurrent substitution which describes the change in the calls chain and the assignment of the
parameter;
— the body of the procedure;
— the return.
proc-call(p, i, q) =
    concat( subst( < caller(currentcall'), name(currentcall'), value-of(designates(par1(p), currentcall')) >,
                   < currentcall, p, i > ),
            possesses(p),
            subst( currentcall, caller(currentcall) ) )
Note that substitution embodies a very powerful means to specify the creation of a new call and of the new
variables associated with it: in every valid formula the occurrences of currentcall are replaced by
caller(currentcall), and from the axioms about caller and designates one can see that the "new" currentcall is
different from all the other calls and that the variables local to the called procedure are different from all the
other variables.
D. Specification of an Implementation
operations
(address, value) → address : index ;
(value, value) → cond-code : compare ;
( ) → cond-code : cc ;
(address) → content : ca ;
(register) → content : cr ;
axioms
index(a, value'0') = a ;
index(a, add(v1, v2)) = index(index(a, v1), v2) ;
store(a, v) = subst(ca(a), v) ;
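As a rough illustration of how these axioms can be read (a Python sketch only, with concrete stand-ins for the sorts address, value and content; nothing here is taken from the system itself):

    # Illustrative sketch: addresses are integers, index is displacement
    # addition, and store makes the formula ca(a) = v valid.
    memory = {}          # address -> value  (the contents function ca)
    registers = {}       # register number -> base address (read with cr)

    def index(a, v):
        # index(a, 0) = a  and  index(a, v1 + v2) = index(index(a, v1), v2)
        return a + v

    def ca(a):
        return memory[a]

    def cr(r):
        return registers[r]

    def store(a, v):
        # store(a, v) = subst(ca(a), v): after the call, ca(a) = v holds
        memory[a] = v

    # Example: an area based at address 100, displacement 3
    registers[2] = 100
    store(index(cr(2), 3), 7)
    assert ca(103) == 7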
— At run-time the memory is managed as a stack of areas, each of which is related to a current call;
— We assume the existence of an infinite stack of registers. At run-time of a procedure of level i, the first i registers contain the base addresses of the i areas related to that call and to the current calls of the embedding procedures;
— Each area related to a call includes the saved base addresses, the return label, the parameters and the local variables of the called procedure (of level i), as shown in the figure below.
[Figure: layout of the run-time area associated with a call of a procedure P of level i; the base registers and the top pointer locate the areas on the stack.]
[The function ALLOC1 searches the allocation table for the displacement associated with V;
if V is not found, ALLOC1 returns a displacement depending on its scope.
ALLOC1 is an example of a meta function which describes compile-time computations.]
The specification of the implementation becomes very systematic and formal. It is possible, from this
specification, to translate Source terms into Target terms.
But the main advantage is the opportunity to check that the axioms of the Source data type are preserved by the representation and, accordingly, to obtain a proof of the implementation.
E. The System
The PERLUETTE system is a compiler generator. From the definitions of the source and target
languages and from the representation specification, a translator working in three steps is built. Only the "syntactic" part of the abstract data types is used. The axioms are necessary only for the proof, not for the construction of the translator.
[Figure: overall structure of the PERLUETTE translator: from the Source Program, through the three translation steps and the Target Abstract Data Type, to the Code Generator, which produces the code.]
In the source language description the semantic functions are defined by means of semantic attributes. The type checking is specified in the same way. The module P1 is made of the syntactic constructor and the tabulation part of the DELTA system (Lorho [1975]), with a preprocessor which transforms the functional schemes which occur in attribute definitions into LISP expressions. The STEP 1 module is a compound of the parser and the evaluation part of the DELTA system, which have been associated with a LISP system. The result of STEP 1 is a LISP form: the Source Term is represented as a tree.
The P2 module builds, from the representation specification, a LISP subsystem which is a list of functions. A lambda (or nlambda) definition is associated with each declaration of a type (if there are constants of this type) or of an operation.
The following definitions are associated respectively with the type var-id and with the operations designates, assign, value-of and currentcall.
(var-id
  (nlambda (v)
    (list 'value (alloc1 v) v)))
(designates
  (lambda (v a)
    (list 'index
          (list 'cr (list 'register (niv v)))
          v)))
(assign
  (lambda (v i)
    (list 'store (list v i))))
(value-of
  (lambda (v)
    (list 'ca v)))
(currentcall
  (lambda ( )
    (list 'cr 'register 'base)))
This subsystem is used to obtain the target term from the source term. STEP 2 evaluates the source
term, each of its subterms being considered as a call of a function of the LISP subsystem.
We now give the example of the LISP form generated by STEP 1 for an assignment where the variable j is local to a procedure of level 2 and the variable m to a procedure of level 1. Let us consider that the displacement associated with j by the meta function ALLOC1 is 3 and the displacement associated with m is 4. The LISP form resulting from STEP 2 is then:
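As a rough illustration of what STEP 2 computes for this example (a Python sketch rather than the actual LISP form, with a deliberately simplified rendering of the definitions above; the displacement and level tables are the example values given in the text):

    # Sketch only: each source operation is mapped to a function that builds
    # the corresponding target term as a nested tuple.
    DISPLACEMENT = {"j": 3, "m": 4}   # as computed by the meta function ALLOC1
    LEVEL        = {"j": 2, "m": 1}   # nesting level of the owning procedure

    def designates(v):
        # address of v: base register of its level, indexed by its displacement
        return ("index", ("cr", ("register", LEVEL[v])), DISPLACEMENT[v])

    def value_of(addr):
        return ("ca", addr)            # contents of an address

    def assign(addr, value):
        return ("store", addr, value)  # the target-level modification

    # Evaluating the source term for "j := m" bottom-up:
    target = assign(designates("j"), value_of(designates("m")))
    # ('store', ('index', ('cr', ('register', 2)), 3),
    #           ('ca',    ('index', ('cr', ('register', 1)), 4)))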
The whole system is currently running on Multics. It is written in PASCAL and largely uses the syntactic constructor SYNTAX and the semantic constructor SDELTA. The formal definition of SIMPROC and the specification of the representation presented above have been effectively used as test examples for the available version of the system (P1, STEP 1, P2, STEP 2). The P3 and STEP 3 modules have been tested for the description of the DEC 11 machine language. A code generator for Multics will be described later on.
Meanwhile a Pascal compiler is being described and used as a realistic test for the PERLUETTE system. For now the implementations are hand-proved, but we believe that the proofs can be mechanized: as a first experiment, we have used the DTVS of D. Musser [1977] to prove an implementation of the expressions of a language with flexible arrays, and more recently AFFIRM to prove the representation of booleans as condition-code values.
F. Conclusion
This system is a typical example of an application of theoretical methods to a practical compiler-compiler. The advantages of this formal approach are obvious: the modularity of the compiler specification facilitates the use of the system; the Source and the Target Languages are defined independently and it is feasible to use the corresponding parts of the translator (STEP 1 or STEP 3) in several compilers.
Although the system is still an experimental one, it shows the feasibility of automatic compiler construction. This encouraging result is definitely a consequence of the specific class of programs considered and of the large amount of research already performed in the area of programming language semantics.
Appendix
We create a program which inputs a list of integers and returns that list with all nonpositive integers removed.
Type pos_int :
Type seq :
axioms
{There are no axioms because there is no semantics: we are
specifying a simple syntactic transformation.}
SIMPLE CLASSES
digit = "0123456789" ;
space = " " ;
others = , ;
TOKENS
2) SYNTAX.
<prog> = <list> ;
<list> = %string | - %string | %string %blanc <list> | - %string %blanc <list> ;
end pos_int ;
Type pos_seq :
Code(pos_int'P') = 'P' ;
Code(&) = '' ;
Code(add(p,s)) = Code(p) . ' ' . Code(s) ;
3) REPRESENTATION SPECIFICATION.
end pos_int
Type neg_int : pos_int ;  {This is a trick: neg_int must be represented by a target sort. But from the representation of neg_append any representation of a neg_int is going to disappear.}
repr neg_int'N' = pos_int'N' ;
end seq
repr op () = & ;
repr op pos_append(pos_int p, seq s) = add(repr p, repr s) ;
repr op neg_append(neg_int n, seq s) = repr s ;
References
Goguen [1976]
J.A. Goguen, "Abstract data types as initial algebras and the correctness of data representations," Proceedings of the Conference on Computer Graphics and Pattern Recognition and Data Structures (1976).
Boullier [1980]
P. Boullier, "Generation automatique d'analyseurs syntaxiques avec rattrapage d'erreurs," Journees Franco- 9, 3.
Deschamp [1980]
Ph. Deschamp, "Production de compilateurs a partir d'une description semantique des langages de programmation: le systeme Perluette," These de Docteur-Ingenieur, INPL (Oct. 24, 1980).
Gaudel [1980a]
M.C. Gaudel, "Generation et preuve de compilateurs basees sur une semantique formelle des langages de programmation," These d'Etat, INPL (March 10, 1980).
Gaudel [1980b]
M.C. Gaudel, "On the concepts of state and of state modification in programming languages," WG 2.2 Meeting, Lyngby, Denmark (June 1980).
Knuth [1968]
D.E. Knuth, "Semantics of context-free languages," Mathematical Systems Theory 2, 2 (1968).
Lorho [1975]
B. Lorho, "Semantic attributes in the system DELTA," Symposium on Implementation of Algorithmic Languages, Novosibirsk, USSR (1975).
Milne [1977]
R. Milne, "Verifying the correctness of implementations," Advanced Course on Semantics of Programming Languages, Antibes (1977).
Morris [1973]
F.L. Morris, "Advice on structuring compilers and proving them correct," Proc. of Symp. on Principles of Programming Languages, Boston (1973).
Mosses [1975]
P.D. Mosses, "Mathematical semantics and compiler generation," Ph.D. Thesis, University of Oxford (1975).
Mosses [1978]
P.D. Mosses, "A compiler-generation system using denotational semantics," Reference Manual, Department of Computer Science, University of Aarhus, Denmark (June 1978).
Musser [1977]
D.R. Musser, "A data type verification system based on rewrite rules," 6th Texas Conf. on Computing Systems, Austin, Texas (Nov. 1977).
SECTION IV
OTHER TECHNIQUES
FOR SYNTHESIS AND ANALYSIS
CHAPTER 10
KNOWLEDGE AND DEDUCTION
David R. Barstow
Schlumberger- Doll Research
P.O. Box 307
Ridgefield, CT 06877
Abstract
In the earliest attempts to apply artificial intelligence techniques to program synthesis, deduction (that is,
the use of a general purpose mechanism such as a theorem prover) played a central role. Recent attempts have
relied almost exclusively on knowledge about programming in particular domains, with no significant role for
This material is based upon work supported by the National Science Foundation under Grant No. MCS 78—03827. Any opinions,
findings, and conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect the views
of the National Science Foundation. The research was done while the author was an Assistant Professor in Computer Science at Yale
University. An earlier version of this paper was presented at the Sixth International Joint Conference on Artificial Intelligence, Tokyo, Au-
gust 1979. Reprinted with permission from Machine Intelligence 10, edited by J.E. Hayes, D.Michie, and Y.H. Pao, 1982 and published by
Ellis Horwood Ltd., Chichester.
deduction. Even in such knowledge-based systems, however, there seems to be an important role for deduction in testing the applicability conditions of specific programming rules. This auxiliary role for deduction is
especially important in algorithm design, as can be seen in the hypothetical synthesis of a breadth-first
enumeration algorithm. The interplay between knowledge and deduction also shows how one can be motivated
to consider the central mathematical properties upon which particular algorithms are based, as illustrated in the
synthesis of two minimum cost spanning tree algorithms.
A. Introduction
In applying artificial intelligence techniques to the problem of automatic program synthesis, two basic
approaches have been tried. In the first approach, deduction (i.e., the use of a general-purpose deductive
mechanism such as a theorem prover or problem solver) played a central role. The basic idea was to rephrase
the program specification as a theorem to be proved or a problem to be solved. If a proof or solution was
found, it could be transformed into a program satisfying the original specification. The Heuristic Compiler
[16], the early work of Green [7] and Waldinger and Lee [17], and the recent work of Manna and Waldinger
[12,13,14] are all examples of this approach. The second approach, an application of the knowledge engineer-
ing paradigm, is based on the assumption that the ability of human programmers to write programs comes more
from access to large amounts of knowledge about specific aspects of programming than from the application of
a few general deductive principles. For example, the PECOS system [2,3] had a knowledge base of four hun-
dred rules about many aspects of symbolic programming, and could apply these to the task of implementing
abstract algorithms.
Although much progress has been made, neither approach has yet ‘‘solved’’ the automatic programming
problem, and it is suggested here that neither one alone ever will. There are two problems with the deductive
approach. First, for programs of real-world interest, the search space is too large to be explored by a theorem
prover. While current systems are capable of synthesizing programs such as ‘‘greatest common divisor’’, no
system can yet synthesize any program of significantly greater complexity. And without guiding the theorem
prover with some kind of knowledge about how to write programs, it is unlikely that much more progress can
be made. Second, since there is a one-to-one correspondence between proofs and programs, if there is a meas-
ure by which one program may be preferred to another (e.g., efficiency), then the theorem prover must find,
not just any proof, but the "right" proof. That is, the theorem prover needs to know that it is really writing
programs and how to relate differences between proofs to differences between programs.
On the other hand, the knowledge-based approach has only been applied to the synthesis of programs
from relatively algorithmic specifications. While many rules for algorithm refinement have been developed and
tested, very little progress has been made on rules for algorithm design. And, as has been observed elsewhere
(e.g., [9]), algorithm design requires a deep understanding of the problem to be solved, an understanding
which seems to include the ability to reason (make deductions) about the problem.
In the rest of this paper, a way to combine these two approaches is suggested, as summarized in the fol-
lowing two statements:
(1) Just as there are specific rules for refining abstract algorithms into concrete implementations,
there are also rules for designing algorithms from non-algorithmic specifications.
(2) In testing the applicability conditions of such rules, it is necessary to call on a deductive
mechanism to test whether certain properties hold for the program being developed. (See [4] for a
similar model of algorithm design, but more closely related to the deduction approach.)
In the next section, these two statements are illustrated with a detailed synthesis of a graph algorithm. In section C, two other algorithm syntheses are used to demonstrate how these roles for knowledge and deduction help identify the mathematical basis for a particular algorithm.
B. Hypothetical Synthesis
The synthesis in this section consists of a sequence of rule applications, with each rule representing some
fact that human programmers probably know and that automatic programming systems probably should know.
Several of the rules can only be applied if some condition holds for the program under construction. These are
the conditions that seem appropriate tasks for a deductive system, and are indicated by the word ‘‘PROOF”’ in
the synthesis. The synthesis will be presented in a mixture of English and a hypothetical Algol-like language,
in order to focus on the nature of the programming knowledge and the algorithm design process, independent
from any formalism used to express them. It should be noted at the outset that this synthesis is beyond the
capabilities of any current automatic programming system, and of course, the rules and proofs will have to be
formalized before the synthesis could be automated.
B.1 Specification
Write a program that inputs the root R of a tree with a known successor function CHILDREN(N) and constructs a sequence of all (and only) the nodes of the tree, such that if the distance from R to N1 is less than the distance from R to N2, then N1 precedes N2 in the sequence (where the distance from X to Y, denoted by D(X,Y), is the number of arcs between X and Y; thus, D(R,X) is the depth of X).
B.2 Overview
The synthesis process goes through four major stages. First, the task is broken into two parts, a ‘‘pro-
ducer’’ that generates the nodes in the desired order, and a ‘‘consumer”’ that builds a sequence from the nodes
generated by the producer. The second stage involves constructing the consumer, which simplifies into a sim-
ple concatenation operation. The third stage, in which the producer is built, is relatively complex but eventu-
ally results in a simple queue mechanism in which nodes are produced by taking them from the front and the
children of each produced node are added at the back. Finally, the two processes are combined together into a
simple WHILE loop.
If the task involves using the elements in one set to build another set, try the transfer paradigm,
with the producer producing the elements from the first set and the consumer building the second
set.
When the transfer paradigm is used, objects are passed from the producer to the consumer in some order, which we may call the transfer order. While the transfer order is often inconsequential, the following rule suggests that a particular order might be useful in this case:
If a consumer builds an ordered sequence, try constraining the transfer order to satisfy the same
ordering.
After applying these two rules, the original task has been broken into two subtasks:
Producer:
Send the nodes of the tree to the consumer in any order satisfying the following con-
straint:
if D(R,N1) < D(R,N2), then N1 is produced before N2
Consumer:
Using the objects received from the producer, build a sequence satisfying the following
constraint:
We may now consider the producer and consumer separately, viewing each as a separate process.
If a consumer builds a set consisting solely of objects received from another process, the set is ini-
tially empty.
If a consumer builds a set consisting of all objects received from another process, the action at each
step involves adding the object to the set.
Finally, we may apply a special case rule for adding an object to an ordered sequence:
Before applying this rule, we must prove its applicability condition: ‘‘the object follows every member of
the ordered sequence.’’ In this case, the proof is trivial, since the objects are received from the producer in the
desired order, and these objects are the only members of the sequence. Having applied the rule, we are now
finished with the consumer:
Consumer:
Initialization:
S:= <>
Action:
X := {receive from producer}
insert X at the back of S
If a producer produces all of the nodes of a graph according to some ordering constraint, try using
an ordered enumeration of the nodes in the graph, where the action for each step of the enumera-
tion is to produce the enumerated node.
Any such enumeration produces the nodes of the graph, guaranteeing that each node is produced exactly once.
In order to do this, the enumerator must know, at each stage, which objects have and have not been
enumerated. That is, the state of the enumeration must be saved between ‘‘calls’’ to the producer process. In
this case, the objects to be enumerated are all objects which may be reached from the root node by following
zero or more arcs. At any point in the process of following these arcs, certain nodes will have been found and
others will not. Of those that have been found, some will have had their successors looked at, and others will
not. Thus, the enumeration state may be maintained by tagging the nodes (at least conceptually) with one of
three labels. This technique for saving the state is summarized in the following rule:
If a graph is represented as a base set of nodes and a successor function, the state of an enumeration
of the nodes may be represented as a mapping of nodes to one of three states: "EXPLORED", "BOUNDARY", and "UNEXPLORED".
(Note that up to this point, no decision has been made about how to implement this mapping. In particular,
there is no commitment to explicitly tagging each of the nodes with one of these labels. As will be seen, the
final representation of this mapping involves only a single set, consisting of those nodes tagged "BOUNDARY".)
Any enumeration involves several parts, including an initialization, a termination test, a way of selecting
the next object to enumerate, and a way of incrementing the state. The following three rules are all based on
the technique just selected for saving the enumeration state:
If the state of an enumeration of the nodes in a graph is represented as a three-state mapping, the
state may be initialized by setting the image of all nodes to ‘‘UNEXPLORED” and then changing
the image of the base nodes to ‘‘BOUNDARY”’.
If the state of an enumeration of the nodes in a graph is represented as a three-state mapping, the
termination test involves testing whether any nodes are mapped to ‘“‘BOUNDARY’’. If not, the
enumeration is over.
If the state of an enumeration of the nodes in a graph is represented as a three-state mapping, the
action at each stage consists of selecting some node mapped to "BOUNDARY" (the enumerated node), changing its mapping to "EXPLORED", and changing the mapping of all of its "UNEXPLORED" successors to "BOUNDARY". The "BOUNDARY" node must be selected to satisfy
any ordering constraints on the enumeration.
Thus, applying the above rules gives us the following description of the producer (where MARK[X] specifies the image of X under the mapping and MARK-1[Y] specifies the inverse image of Y, that is, the set of all nodes which map to Y under MARK):
Producer:
Initialization:
for all nodes, X:
MARK[X] := "UNEXPLORED"
change MARK[R] from "UNEXPLORED" to "BOUNDARY"
Termination test:
is MARK-1["BOUNDARY"] empty?
Each step:
X := select any node in MARK-1["BOUNDARY"] such that
    if X1 and X2 are nodes remaining to be selected
    and D(R,X1) < D(R,X2) then X1 is selected before X2
change MARK[X] from "BOUNDARY" to "EXPLORED"
for all successors, Y, of X:
if MARK[Y] = "UNEXPLORED"
then change MARK[Y] from "UNEXPLORED" to "BOUNDARY"
{send X to the consumer}
Note that the nodes that remain to be enumerated include some tagged "UNEXPLORED" as well as those tagged "BOUNDARY". Thus, the selection operation, and the constraint on it, may be precisely stated as follows:
If we call the set from which an object is to be selected the ‘‘selection set’’, and the set of which that
object must be minimal the ‘‘constraint set’’, the following rule gives us a way to implement the selection
operation:
If a selection operation is subject to an ordering constraint, and every member of the constraint set
is dominated by some member of the selection set, the selection operation may be implemented by
taking the minimal element of the selection set.
However, before applying the rule we must prove one of its conditions:
Thus, the action at each stage of the enumeration has been simplified:
Part of the body involves a test of whether a node is ‘‘UNEXPLORED’’. The following rule enables this test
to be omitted:
If it is known that the value of a test in a conditional will always be true, the test may be omitted
and the conditional refined into the action of the ‘‘true”’ branch.
Every successor of X was initially "UNEXPLORED", since every node (except the root, which is not a successor of any node) was initially "UNEXPLORED". The mark of an "UNEXPLORED" node is changed in only one place, when the node is the successor of some other node. Thus, if a successor of X is not "UNEXPLORED", it must be the successor of some node other than X, which is impossible because the graph is a tree. Hence, every successor of X is "UNEXPLORED".
QED
With these simplifications to the enumerator, it is clear that only the “BOUNDARY” tag is really being
used. Thus, the mapping may be simplified somewhat by getting rid of references to the other two possibilities
(i.e., making them implicit instead of explicit). This knowledge is embodied in the following rules:
If a range element is only used in ‘‘set image’’ and ‘‘change image’’ operations, it may be made
implicit.
If the range element of a ‘‘set image’’ operation is implicit, the operation may be omitted.
Producer:
Initialization:
change MARK[R] from implicit to "BOUNDARY"
Termination test:
is MARK-1["BOUNDARY"] empty?
Each step:
X := closest node in MARK-1["BOUNDARY"]
change MARK[X] from "BOUNDARY" to implicit
for all successors, Y, of X:
change MARK[Y] from implicit to "BOUNDARY"
{send X to the consumer}
We are finally ready to implement the mapping. When inverse images are frequent operations, the mapping
can be inverted, as suggested by the following rule:
A mapping with domain X and range Y may be represented as a mapping whose domain is Y and
whose range consists of subsets of X.
In addition to this data structure representation rule, we need the following rules dealing with operations on
such data structures:
If a mapping is inverted, the inverse image of a range object may be found by retrieving the image
of the range object under the inverted mapping.
If a mapping is inverted, a range object may be assigned as the image of a domain object by adding
the domain object to the image of the range object under the inverted mapping.
If a mapping is inverted, the image of a domain object may be changed from one range object to
another by removing the domain object from the image of the first range object and adding it to the
image of the second range object.
If a mapping is inverted, the initial state of the mapping is that all range sets are empty.
Applying these rules gives the following (where MARK’ is the inverted mapping):
Producer:
Initialization:
MARK'["BOUNDARY"] := {}
add R to MARK'["BOUNDARY"]
Termination test:
is MARK'["BOUNDARY"] empty?
Each step:
X := closest node in MARK'["BOUNDARY"]
remove X from MARK'["BOUNDARY"]
for all successors, Y, of X:
add Y to MARK'["BOUNDARY"]
{send X to the consumer}
Of course, the inverted mapping must still be represented concretely. Here, the elements of the domain set are
known precisely (there is only one) and all references to the mapping involve constants for domain elements,
so a record structure is a particularly good technique:
A mapping whose domain consists of a fixed set may be represented as a record structure with one
field for each domain element.
If a mapping is represented as a record structure, the image of a constant domain object may be
referenced by referencing the corresponding field of the record.
A record structure with only one field may be represented as the object stored in the field.
If a record structure is represented as the object stored in its single field, the field may be referenced
by referencing the record structure itself.
Applying all of the above rules gives us the following (in which the single set is called B):
Producer:
Initialization:
B:= {}
add R to B
Termination test:
is B empty?
Each step:
X := closest node in B
remove X from B
for all successors, Y, of X:
add Y to B
{send X to the consumer}
We now have three operations on the B set to consider, and each depends on the representation for that set. Since one of the operations involves taking an extreme element under an ordering relation, it is natural to use an ordered sequence. We thus apply the following rule:
Having applied this rule, we must consider the operations on B. The “‘closest”’ operation may be refined by
using the following rule:
If a set is represented as an ordered sequence, the least element under the ordering may be found
by taking the first element of the sequence.
Normally, removing an element from a sequence requires first searching for its position and then deleting that
position. In this case, we can avoid the search by applying the following rule:
If it is known that an element is the first element of a sequence, that element may be removed from
the sequence by removing the first element.
Finally, we must consider the two addition operations, one adding R to B and the other adding each successor
of X to B. In each case, we may apply the same rule used earlier in writing the consumer:
If it is known that an object follows every member of an ordered sequence, the object may be added
to the sequence by inserting it at the back.
And, in each case, we must prove the applicability condition. The proof when adding R to B is trivial, since
B is empty. However, the proof for adding the successors of X to B is somewhat more complicated. Before
considering the main proof, we need the following lemma:
This is clearly true initially, since B = {R}. Suppose it is true at some point. There are only two
changes in B that can be made during one step:
(a) After X is removed, either B is empty (in which case adding Y restores the desired property)
or there is some new closest node, call it W. We know:
so substitution (into the inequality that held before X was removed) gives us:
D(R,W) ≤ D(R,Z) ≤ D(R,Y) ≤ D(R,W) + 1    for all Z in B
QED
Note that this lemma holds after each successor of X is added to B. So we are now ready to prove the condi-
tion necessary for adding Y at the back of B.
The lemma indicates that there is some X in B such that everything else in B is at least as far from the root as X, but not further away than 1 arc. Therefore, this property must hold for the closest node in B. Since
D(R,Y) = D(R,X)+1, if Y is a successor of X, we know that D(R,Z) <D(R,Y) for all Z in B. Thus, Y
follows every element of B. QED
Producer:
Initialization:
B:= <>
insert R at the back of B
Termination test:
is B empty?
Each step:
X := first element of B
remove first element of B
for all successors, Y, of X:
insert Y at the back of B
{send X to the consumer}
A transfer program involving a producer and consumer may be refined into a WHILE loop whose
condition is the termination test of the producer.
And we have the following algorithm for enumerating the nodes of a tree in breadth-first order:
B := <>
insert R at the back of B
while B is not empty do
    X := first element of B
    remove first element of B
    for all successors, Y, of X:
        insert Y at the back of B
    insert X at the back of S
From this point on, rules about simple symbolic programming (such as those in PECOS) could produce the
final implementation. The interesting aspects involve representing the sequences S and B (each is probably
best represented as a linked list with a special pointer to the last cell), and the ‘‘for all’? construct (which
depends on the representation of the set returned by CHILDREN(P)).
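For comparison, here is a direct rendering of the synthesized algorithm as a runnable Python sketch (the deque stands in for the ordered sequence B, and the specification's CHILDREN(N) is assumed to be supplied as the function children; the names below are introduced only for illustration):

    from collections import deque

    def breadth_first(root, children):
        # children(n) plays the role of CHILDREN(N) in the specification
        s = []                        # the sequence S built by the consumer
        b = deque([root])             # the ordered sequence B of the producer
        while b:                      # termination test: is B empty?
            x = b.popleft()           # first element of B (the closest node)
            for y in children(x):     # successors go to the back of B
                b.append(y)
            s.append(x)               # consumer: insert X at the back of S
        return s

    # Example on a small tree given as a dictionary (hypothetical data):
    tree = {"R": ["A", "B"], "A": ["C"], "B": [], "C": []}
    assert breadth_first("R", lambda n: tree.get(n, [])) == ["R", "A", "B", "C"]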
Write a program that inputs a weighted, undirected, connected graph G and constructs a subgraph T
such that:
(1) T is a tree;
(2) T contains every node of G;
(3) W(T) < min { W(T') | T' is a subgraph of G which is a tree and which contains every node of G }.
(W is the weight function on edges; when applied to a graph it denotes the sum of the weights on the edges.)
There are two standard algorithms for this task, one due to Kruskal [11] and one due to Prim [15]. For each of
these, there are a variety of implementations, based on different techniques and representations for the sets and
graphs involved. (Cheriton and Tarjan give a detailed discussion of several implementations [5].) Both stan-
dard algorithms are based on a single lemma, which may be stated as follows:
If T is a subgraph of some minimum cost spanning tree of G, T contains all the nodes of G, T' is one of the connected components of T, and x is an edge of minimal weight connecting T' to another connected component of T, then T' + {x} is a subgraph of some minimum cost spanning tree of G.
If one knows this lemma, it is not difficult to come up with either of the standard algorithms. But what if one
does not know the lemma? In the rest of this section, we will see that the model of algorithm design discussed
in earlier sections leads naturally and directly to the fundamental lemma. In particular, we will consider the
minimum cost spanning tree from two different viewpoints, in each case applying a sequence of general pro-
gramming rules, and in each case arriving at the need to prove the lemma. In the interests of brevity,
syntheses will not be pursued at the same level of detail as the breadth-first enumeration algorithm, but only to
a sufficient degree to show how one is led to the fundamental lemma in each case. (Note: the rest of this discussion is worded as if the minimum cost spanning tree were unique; this is not crucial to the synthesis process, but it simplifies the discussion.)
If the task involves finding a subset of a given set, try the transfer paradigm, with the producer pro-
ducing the elements from the first set and the consumer testing each for inclusion in the subset.
Of course, this paradigm only succeeds if it is actually possible to test the elements of the set individually as
they are produced, which at this point in the synthesis cannot be guaranteed (hence, the word ‘“‘try’’ in the
rule). If it turns out to be impossible to construct such a test, then a more general paradigm (such as depth-
first search allowing back-up by more than one step) or even an entirely different paradigm (such as the tree-
growth paradigm of the next section) must be tried.
As with the breadth-first enumeration problem, the transfer order is again important. The following rule
suggests that a particular order might be useful in this case:
If the object being built by the consumer must satisfy a minimality constraint based on some func-
tion of the objects being transferred, try constraining the transfer order to be in increasing order of
the function.
Again, there is no guarantee that this approach will succeed; this is simply a heuristic suggesting a potentially
useful strategy. This heuristic is, in fact, one of the key steps in reaching Kruskal’s algorithm, but note that it
is far more general than its application to this particular problem. For example, it also plays a role in creating
the standard breadth-first algorithm for determining the shortest path from one node in a graph to another.
Note also that there are other heuristics that one could apply here. For example, one could try considering the
edges in order of decreasing weight, resulting ultimately in an algorithm that gradually removes edges from the
original graph until the minimum spanning tree is determined (Kruskal describes this as ‘‘Construction A’”’
[11].)
After applying these two rules, the original task has been broken into two subtasks:
Producer:
Send the edges of the graph to the consumer
in any order satisfying the following constraint:
if W(E1) < W(E2), then E1 is produced before E2
Consumer:
Using the objects received from the producer,
build a subset T of the objects such that:
(1) T is a tree;
(2) T contains every node of G;
(3) W(T) <min{W(T’) |T’ is a subgraph of G which is a tree
and which contains every node of G}.
The objects are received from the producer in an order
satisfying the following constraint:
if W(E1) < W(E2), then E1 is produced before E2
For this task, the producer is relatively simple, essentially involving only a sort of the edge set of G, either as
a pre-process before sending any edges to the consumer, or intermittently during the process of sending. In
either case, the algorithm depends on the representation of G and the representation chosen to hold the
sorted edge set, but is not particularly complicated. Since it is also not relevant to identifying the critical
lemma, the synthesis process for the producer will be skipped in the interest of brevity.
The consumer is a process that produces a subgraph of the original graph, assuming that it receives the edges of G in order of increasing weight. The following rule suggests a way to write the consumer:
If a consumer builds a subset from the set produced by the producer, try writing a test that deter-
mines whether or not an element belongs in the subset, based solely on the subset that has been
constructed so far, on the order in which the elements are received, and on the fact that any ele-
ment already received by the consumer is either not in the desired subset or has been added to the
subset already.
In effect, this rule suggests trying to add the elements to the subset one at a time. (A less desirable alternative
would be to reconsider the entire set of edges received so far, but this would be somewhat strange in light of
the original decision to try the transfer paradigm.) The problem here is to determine a relatively simple test.
Before developing the test, we must define the notion of a ‘‘preventive’’ constraint:
A constraint is preventive if, whenever a set S violates the constraint, there is no superset of S
that satisfies it.
Given this definition, the following rule suggests a test to be used in this problem:
If you know that the following is true (where S is the desired set, S’ is the subset constructed so far,
x is the element received by the consumer):
If S’ is a subset of S
and S’+ {x} does not violate any preventive constraints on S
and if y was received before x then y is in S’ or not in S
then S’+ {x} is a subset of S.
then the test for whether a new element may be added to the subset constructed so far may be
implemented as a test of whether the new element does not violate any preventive constraints on
the subset.
(While this phrasing may seem a bit long for a ‘‘rule,”’ its length is due to its precision; a less formal statement
would be ‘“‘Try using only preventive constraints in the test.’’)
In order to apply the rule, the condition must be tested (here, T is the desired subgraph of G):
If T’ is a subset of T
and T’+ {x} does not violate any preventive constraints on T
and if y was received before x then y is in T’ or not in T
then T’+ {x} is a subset of T.
Which are the preventive constraints? The constraint that T be a tree reduces to two subconstraints: that T be acyclic and that T be connected. Of these, the constraint that T be acyclic is clearly preventive: if a graph is cyclic, no graph containing it can be a tree; the constraint that T be connected is clearly not preventive: any disconnected graph may be extended into a connected graph. The constraint that T contain every node of G is not preventive: any graph that contains a subset of the nodes of G can be extended to contain
every node of G. Finally, the minimality constraint is preventive (at least, if all edge weights are positive): if
the weight of a graph is greater than some value, then any extension of that graph will also be greater than that
value.
The third antecedent in the rule’s condition may also be instantiated for this problem: since the edges are
being transferred in order of increasing weight, the condition "y was received before x" reduces to "W(y) < W(x)". Thus, the condition on the rule becomes:
If T' is a subset of T
and T’+ {x} is acyclic
and W(T’+ {x}) <min{W(T’) |T’ is a subgraph of G which is a tree
and which contains every node of G}
and if W(y) < W(x) then y is in T’ or not in T
then T’+ {x} is a subset of T.
since T must contain at least one of the edges not yet received by the consumer and since x has the least
weight of the edges not yet received. Thus, the condition may be reduced to:
If T’ is a subset of T
and T’+ {x} is acyclic
and if W(y) < W(x) then y is in T’ or not in T
then T’+ {x} is a subset of T.
that is,
Since
leading to the following condition which, if proved, is sufficient to permit the rule to be applied:
If T' is a subset of T
and T’+ {x} is acyclic
and W(x) <min {W(y) |T’+ {y} is acyclic }
then T’+ {x} is a subset of T.
Thus, by applying a sequence of relatively general programming rules, we are led naturally and directly to a
(slightly) special case of the fundamental lemma. The rest of the synthesis process is less interesting, and will
be omitted. The final algorithm is as follows (assuming that the producer sorts the edges before sending any to
the consumer):
T := {}
E := edges of G sorted by weight;
while E is non-empty do
    X := first element of E;
    E := rest of E;
    if T + {X} is acyclic
        then T := T + {X}
Of course, a considerable amount of work remains to be done. In particular, the test for acyclicity can be very
inefficient unless some sophisticated data structures are maintained. Aho, Hopcroft and Ullman describe an
efficient implementation of Kruskal’s algorithm [1].
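A minimal sketch of such an efficient implementation, in Python, with a union-find structure replacing the naive acyclicity test (the edge and graph representations below are assumptions introduced for illustration, not taken from the chapter):

    def kruskal(nodes, edges):
        # edges: iterable of (weight, u, v); returns the list of tree edges
        parent = {n: n for n in nodes}        # union-find forest over the nodes

        def find(x):                          # find with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        t = []
        for w, u, v in sorted(edges):         # producer: edges by increasing weight
            ru, rv = find(u), find(v)
            if ru != rv:                      # T + {x} acyclic: the preventive constraint
                parent[ru] = rv
                t.append((w, u, v))
        return t

    # Example (hypothetical weights):
    assert kruskal({1, 2, 3}, [(5, 1, 2), (3, 2, 3), (4, 1, 3)]) == [(3, 2, 3), (4, 1, 3)]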
If the task involves finding a tree satisfying certain constraints, try starting with a degenerate tree (a
single node) and adding edges to the tree until the constraints are satisfied.
As with the transfer paradigm, there is no guarantee that this strategy will succeed. (In particular, it may not
be possible to determine an appropriate edge to add at each step.) But it still seems a reasonable strategy to
explore.
In applying this strategy, there are two subtasks: determining a technique for selecting the initial node
and determining a technique for finding an edge to add at each step. For the first, the following rule may be
used:
If the set of nodes in the desired tree is known, try selecting any node from the set as the initial
node.
Of course there are other plausible techniques. (For example, selecting one of the nodes on an edge of
minimal weight is another good way to start.) But the unconstrained selection seems as good as any, and is
easily implemented (depending on the representation of the nodes of the graph).
The second subtask, determining a technique for finding an edge to add at each step, is more difficult, but
the following rule offers a possibility:
If the tree to be grown must satisfy a minimality constraint, and you know that the following holds:
if T’ is a subtree of T
and x extends T’
and T’+ {x} does not violate any preventive constraints on T
and W(x) = min {W(y) |y extends T’ }
then T’+ {x} is a subtree of T
then at each stage select the edge of minimal weight that extends T’ and does not violate any
preventive constraints.
(This rule is actually just a variant of the minimality heuristic used in developing Kruskal’s algorithm.) Before
applying the rule, the condition must be proved. As with Kruskal’s algorithm, the only preventive constraints
are that T’+ {x} be acyclic and that the weight of T’ be minimal. By similar reasoning, the condition reduces
to:
if T’ is a subtree of T
and x extends T’
and W(x) = min { W(y) | y extends T' }
then T’+ {x} is a subtree of T
Again we are led naturally and directly to a (slightly) special case of the fundamental lemma. In this case, we
were led even more directly to the necessary condition than with Kruskal’s algorithm, but this is primarily
because the rules were specific to graph problems, rather than being oriented more generally toward sets.
The final algorithm is as follows:
T := {}
while T does not contain all of the vertices of G do
    X := edge of least weight that connects T to some vertex not in T;
    T := T + {X}
Here again, considerable work remains to be done. In this case, the hard part is determining the edge of least
weight that extends T. Dijkstra and Kerschenbaum and Van Slyke give implementations of the basic algorithm
[6,10].
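A minimal Python sketch of this tree-growth algorithm, using a priority queue to supply the edge of least weight that extends T (the adjacency-list representation is an assumption for illustration, not the implementations cited above):

    import heapq

    def prim(adj, start):
        # adj: node -> list of (weight, neighbour); returns the tree edges
        in_tree = {start}
        frontier = [(w, start, v) for w, v in adj[start]]
        heapq.heapify(frontier)
        t = []
        while frontier and len(in_tree) < len(adj):
            w, u, v = heapq.heappop(frontier)   # candidate edge of least weight
            if v in in_tree:
                continue                        # it would no longer extend T
            in_tree.add(v)
            t.append((w, u, v))
            for w2, x in adj[v]:                # new edges now extending T
                if x not in in_tree:
                    heapq.heappush(frontier, (w2, v, x))
        return t

    # Example (hypothetical graph):
    adj = {1: [(5, 2), (4, 3)], 2: [(5, 1), (3, 3)], 3: [(4, 1), (3, 2)]}
    assert prim(adj, 1) == [(4, 1, 3), (3, 3, 2)]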
D. Discussion
The model of algorithm design and program synthesis that emerges from these examples has several parts.
First, there is a body of strategic rules, each suggesting a paradigm that may lead to an algorithm for solving the
given problem. These strategic rules are relatively specific (more so than condition- or recursion-introduction
rules) yet applicable in a variety of situations. Even when applicable, however, there is rarely a guarantee that
the paradigm will indeed result in an algorithm: that can only be determined by trying to fill out the pieces. If
an algorithm does result, and if the applicability conditions have all been met, then the algorithm will be correct
(assuming correctness of the individual rules). Second, there is a larger body of implementation rules which
can be used to refine an abstract algorithm into concrete code in some target language. These rules are again
relatively specific, but usually independent of any particular target language. Finally, there must also be rules
for the chosen target language; while these rules must at some point take over from the implementation rules,
this may occur at different levels of abstraction for different languages and problems.
In the process of testing the applicability conditions of any of these rules, deduction plays a very impor-
tant role. When it is used, it involves properties of the program being developed and properties of the domain.
(It is interesting to note that, even if a condition can not be proved, one could apply the rule, resulting in a
heuristic program; this would be especially useful if the condition were tested and found to hold on a large
number of examples.) Although not illustrated by the hypothetical syntheses, a mechanism for choosing from
among applicable rules is also needed. Among the possibilities for such a mechanism are efficiency models of
the program being developed (i.e., an evaluation function) and heuristic rules based both on the efficiency of
the target program and on the likelihood of a particular strategy succeeding. This latter consideration seems
especially important in the initial algorithm design phase.
The rules illustrated here (and the concepts involved in them) are relevant in many more situations than
these graph problems. As noted earlier, the transfer paradigm provides a useful model for several sorting algo-
rithms, including selection and insertion sort. In fact, choosing the transfer order seems to be the primary deci-
sion involved in differentiating the two types of sorting, and the rule about adding objects to ordered sequences
is itself used in deriving an insertion sort. (For a more detailed discussion of this paradigm for sorting, see
[8].) As another example, the notion of an enumeration state is critical to any enumeration (in fact, it
corresponds to the loop invariant), and the three-state mapping used here is simply a generalization of the stan-
dard state-saving schemes used for sequences such as linked lists and arrays.
In fact, many of the rules used in these hypothetical syntheses have already been implemented and tested
in the PECOS system. For example, PECOS’s knowledge base included the rules about inverted mappings and
record structures, and they were used in the implementation of several abstract algorithms, including a reachability algorithm and a simple concept formation algorithm.
E. Conclusions
In order for future automatic programming systems to be truly useful when applied to real-world tasks,
they will necessarily have to deal with complex programs at a rather detailed level. It seems to me that this will
only be possible if these systems have effective access to large amounts of knowledge about what we today con-
sider to be the task of programming. While some of this knowledge certainly involves general strategies, such
as the conditional and recursion introduction rules of deduction-based systems, much of the knowledge also
involves rather specific detailed facts about programming in particular domains, such as PECOS’s rules about
symbolic programming. (This is reminiscent of discussion of the trade-off between generality and power in
artificial intelligence systems.) The first attempt to build such programming knowledge into an automatic pro-
gramming system involved applying the knowledge to algorithmic specifications, in which the programming task
was to produce a concrete implementation. The hypothetical syntheses of a breadth-first enumeration algorithm
and two minimum-cost spanning tree algorithms demonstrate that specific detailed knowledge about program-
ming can also be of significant value in the algorithm creation process. At the same time, they illustrate an
important role to be played by deduction in such knowledge-based automatic programming systems: as a
mechanism for answering particular queries about the program under construction, in order to test the applica-
bility conditions of particular rules. Important directions for future research involve adding to the rules that
have already been codified and developing deductive techniques for use with such rules.
F. Acknowledgements
Much of this work resulted from discussions during seminars in Knowledge-based Automatic Program-
ming held at Yale in Fall 1977 and Spring 1979. L. Birnbaum helped develop the basis for the section on
minimum cost spanning trees. B.G. Buchanan, D. McDermott, A. Perlis, C. Rich, H. Shrobe, L. Snyder, R.
Waldinger, and R. Waters provided helpful comments on earlier drafts of the paper.
References
[1] Aho,A.V., Hopcroft,J.E. & Ullman,J.D. (1974). The Design and Analysis of Computer Algorithms, Reading,
Mass: Addison-Wesley.
[2] Barstow,D.R. (1979). An experiment in knowledge-based automatic programming, Artificial Intelligence, 12, 73-120.
[3] Barstow,D.R. (1979). Knowledge-based Program Construction, New-York and Amsterdam: Elsevier North-
Holland, 1979.
[4] Bibel,W. (1980). Syntax-directed, semantics-supported program synthesis. Artificial Intelligence, 14,
243 —261.
[5] Cheriton,D. & Tarjan,R.E. (1976). Finding minimum spanning trees, SIAM Journal of Computing, 5, 724-742.
[6] Dijkstra,E.W. (1959). A note on two problems in connexion with graphs, Numerische Mathematik, 1,
269-271.
[7] Green,C.C. (1969). The application of theorem proving to question-answering systems. AIM-96. Stanford: Stanford University Computer Science Department.
[8] Green,C.C. & Barstow,D.R. (1978). On program synthesis knowledge, Artificial Intelligence, 10, 241-279.
[9] Gries,D. & Misra,J. (1978). A linear sieve algorithm for finding prime numbers, Communications of the
ACM, 21, 999-1003.
[10] Kerschenbaum,A. & Van Slyke,R. (1972). Computing minimum spanning trees efficiently, Proceedings of
the 25th Annual Conference of the ACM, pp. 518—527.
[11] Kruskal,J.B. (1956). On the shortest spanning subtree of a graph and the travelling salesman problem,
Proceedings of the American Mathematical Society, 7, 48—50.
[12] Manna,Z. & Waldinger,R. (1975). Knowledge and reasoning in program synthesis, Artificial Intelligence, 6, 175-208.
[13] Manna,Z. & Waldinger,R. (1979). Synthesis: dreams => programs, IEEE Transactions on Software Engineering, 4, 294-328.
[14] Manna,Z. & Waldinger,R. (1978). A deductive approach to program synthesis, AIM-320, Computer Science Department, Stanford University, Stanford.
[15] Prim,R.C. (1957). Shortest connection networks and some generalizations, Bell System Technical Journal, 36, 1389-1401.
[16] Simon,H.A. (1963). Experiments with a heuristic compiler, Journal of the ACM, 10, 493—506.
[17] Waldinger,R. & Lee,R.C.T. (1969). A step toward automatic program writing, International Joint Conference on Artificial Intelligence (Washington, D.C.), pp. 241-252.
CHAPTER 11
INVARIANT BASED PROGRAMS
Ralph-Johan Back
Department of Computer Science
University of Helsinki
Editor’s Note: The author has included a very complete discussion of our canonical example from Chapter 1.
It turns out that, through no fault of his own, the specification was understood to be that the output should be a list of all n > 0 from the input rather than all n ≥ 0 as was assumed in Chapter 1. Hopefully, this will not cause confusion.
Abstract
The technique of starting from invariants in constructing iterative programs is studied. Invariants are
viewed as internal specifications of the program, expressing requirements on the way in which the algorithm
should work. The advantages of this view for checking program correctness and for locating possible errors in a
program are discussed. A simple language for programs based on invariants is defined. The technique for con-
structing programs from invariants is described by means of an example. Finally, a system for checking pro-
gram correctness and locating errors in programs is described.
A. Introduction
One of the main problems in programming is the question whether a given program is correct or not.
Correctness is checked by two different techniques, program testing and program verification, neither one of
which in itself is sufficient to decide whether a given program is correct or not. The limits of program testing
are aptly characterized by Dijkstra’s remark that testing only can be used to show the presence of errors, never
their absence. This is usually cited as a motivation for program verification. On the other hand, program
verification is only useful for showing the absence of errors, i.e. correctness, not their presence. The underly-
ing assumption in program verification is that the program to be proved correct is in fact correct. The methods
are not designed to detect program errors.
In practice, one does not know whether a given program is correct or not, and the important thing is to
find this out. Moreover, if the program turns out to be incorrect, one wants to know where the errors are.
Therefore both testing and verification will usually be needed. In addition, one needs debugging to locate pos-
sible errors. Even if testing does reveal that there is an error in the program, it will not provide any informa-
tion about what kind of error is responsible for the incorrectness or where the error is in the program text.
Only when testing results in a run-time error do we get a pointer to a specific location in the program text.
There have been some attempts to strengthen the traditional program verification techniques so that
incorrectness of a program could be established (Brand [1978], Katz and Manna [1976]) and conversely to
strengthen the testing methods so that correctness could be established (Goodenough and Gerhart [1975]).
Debugging, however, is usually left as a more or less informal technique. The situation in checking program
correctness should be compared with the elegance by which syntactic correctness is checked by a compiler. The
compiler scans the program line by line and either detects one or more syntactic errors, indicating their places
in the program text, or does not find any errors, in which case the program is declared syntactically correct. A
similar system for checking semantic correctness would, if possible at all, clearly be desirable. We will here
consider the possibility of such a system, in some form or another, in the context of simple iterative programs.
To make things more concrete, we will take a simple example program and consider how its correctness is
checked. We choose the standard example of this book, i.e., a program which inputs a list of integers and
returns that list with all nonpositive integers removed. Thus e.g. input (-7 2 9 -3 4) should produce output
(2 9 4).
For simplicity, we assume that we have available a data type for lists, called list of T, where T is the type
of the list elements (integer in this case). This data type is assumed to have at least the following constants
and operations:
entry: L = L0,
exit: M is L0 with all nonpositive integers removed.
The constant L0 denotes the initial value of L. The program will change the value of L, so we cannot refer to
L in the exit condition.
We can make the exit condition more precise by introducing some additional notation. If L1 and L2 are
lists, we write L1 << L2 to denote the fact that L1 is a sublist of L2, in the sense that L1 can be created from
L2 by removing some elements from L2, without disturbing the order of occurrence of the remaining elements.
We let e(L) denote the set of elements in the list L. Finally, if P(x) is a property of elements, then we can
extend this property to sets of elements A by asserting P(A) if and only if P(x) holds for all x in A. With
these conventions, the exit condition can be expressed by
    M << L0 and e(M) > 0 and e(L0) - e(M) ≤ 0.
This says that M only contains integers from L0, in their original order of occurrence (M << L0), that all
integers in M are positive (e(M) > 0) and that all positive integers of L0 are included in M (because
e(L0) - e(M) ≤ 0, i.e. all integers in L0 and not in M are nonpositive).
The following program is given as a solution to this programming problem:
The question now is whether this program is correct or not, i.e. does it satisfy its specification, given in the
form of entry and exit conditions.
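For concreteness, the computation such a program must perform can be sketched in modern notation. The following Python version is ours, not the chapter's notation, and the list operations are chosen freely; it simply scans L and copies the positive elements to M:

    def remove_nonpositive(l0):
        l = list(l0)       # L initially holds a copy of L0
        m = []             # M accumulates the positive elements seen so far
        while l:
            x = l.pop(0)   # remove the first element of L
            if x > 0:
                m.append(x)
        return m

    print(remove_nonpositive([-7, 2, 9, -3, 4]))   # prints [2, 9, 4]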
We will only consider partial correctness of programs here. A program S is partially correct with respect to
entry condition P and exit condition Q, denoted P{S}Q,if whenever P holds initially and the execution of S ter-
minates normally, Q will hold upon termination of S. The formula P{S}Q is referred to as a partial correctness
formula. Thus, partial correctness of the above program is expressed by
which says that if initially L = L0 holds and the program terminates normally, then upon termination M will
be L0 with all nonpositive integers removed. (The program is allowed to loop forever or to terminate abnormally.)
We can test this program by executing it for some selected inputs and checking that the outputs satisfy
the exit condition. The values L0 = (-7 2 9 -3 4) and M = (2 9 4) do e.g. satisfy the exit condition. When
confident that the program does not contain any errors, we can try to prove it correct. Partial correctness is
usually proved by attaching an assertion to each program loop (referred to as the loop invariant), which shows
the situation holding prior to each iteration of the loop. A suitable invariant for this program is found by
analyzing the behaviour of the program, looking at the situation which holds prior to the test of the loop. An
initial part of LO has then been scanned, with its positive integers stored in M and the remaining integers still
in L. This situation can be expressed by the invariant:
moving: M*L << L0 and e(M) > 0 and e(L0) - e(M*L) ≤ 0.
Here M*L denotes the concatenation of the lists M and L. To prove partial correctness, we have to show that
this invariant holds initially, when the loop is entered, that it is preserved by each iteration of the loop and that
it implies the exit condition, if the loop ever terminates. These conditions are referred to as verification condi-
tions.
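Read as a runtime check rather than a proof, the invariant can be attached to the loop of the Python sketch given earlier; the helper names below (is_sublist, moving) are ours, and the set-based rendering of e(L0) - e(M*L) ≤ 0 is an assumption:

    def is_sublist(l1, l2):
        # l1 << l2: l1 is obtained from l2 by deleting elements, order preserved
        it = iter(l2)
        return all(any(x == y for y in it) for x in l1)

    def moving(m, l, l0):
        # M*L << L0 and e(M) > 0 and e(L0) - e(M*L) <= 0
        ml = m + l
        return (is_sublist(ml, l0)
                and all(x > 0 for x in m)
                and all(x <= 0 for x in set(l0) - set(ml)))

    def remove_nonpositive_checked(l0):
        l, m = list(l0), []
        assert moving(m, l, l0)        # the invariant holds when the loop is entered
        while l:
            x = l.pop(0)
            if x > 0:
                m.append(x)
            assert moving(m, l, l0)    # it is preserved by each iteration
        # on termination L is empty, so the invariant implies the exit condition
        assert is_sublist(m, l0) and all(x > 0 for x in m)
        return m

    remove_nonpositive_checked([-7, 2, 9, -3, 4])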
We can express the verification conditions quite nicely with partial correctness assertions, if we use the
assert-statement, of the form
assert b
where b is a boolean condition. This statement has no effect if b is true, but if b is false, it forces an abnormal
termination of the program.
The verification conditions of the program above are now the following. First, the initialization of the
loop:
(Note that if L # null, then an abnormal termination occurs, so the partial correctness assertion is trivially
satisfied.) Finally, we have to show that each iteration preserves the loop invariant:
This last verification condition can in fact be split into two, by treating the cases x > 0 and x ≤ 0 separately,
giving the verification conditions
and
Partial correctness is now established by showing that each verification condition holds. This
is quite straight-
forward, by considering the effects of the respective statement sequences. One can
also express the verification
conditions directly as first-order formulas, thus removing the last traces of program text.
The verification con-
ditions (1) and (2) correspond e.g. to the formulas
The last two verification conditions are a little bit more complicated, (3a) corresponds e.g. to the formula
The variables x1 and L1 are here used to denote the initial values of x and L. We write moving[x1,L1/x,L] for
the formula we get by substituting x1 for each (free) occurrence of x and L1 for each free occurrence of L in
the formula moving.
The problem with this verification technique is that it only can provide a positive answer to the question
whether the program is correct or not. If all verification conditions are found to be true, then the program is
partially correct. If, however, some verification condition is not true, then we know nothing about the correct-
ness of the program. It could still be correct, in which case the loop invariants must have been wrongly
chosen, or it could be incorrect, so no choice of invariants could make all verification conditions true. More-
over, even if we could establish incorrectness, by testing or by some other means, we still could not locate the
errors in the program. The notion of a specific location of an error is not meaningful when talking about partial
correctness; the program is either all right or all wrong.
The situation would, however, be different, if we looked upon the invariants in a slightly different way.
Rather than being a tool in verifying partial correctness, we could see them as internal specifications of the pro-
gram, putting certain requirements on the way in which the algorithm is to work. In this case, if some
verification condition is found to be false, it means that the algorithm does not satisfy its specifications, i.e. is
incorrect, because some internal specification (invariant) or external specification (entry or exit condition) is
violated. We could then use a verification condition generator to check the correctness of programs. It would
compute all the verification conditions of the program (using the given invariants) and feed them to the pro-
grammer one by one, asking him (or maybe an automatic theorem prover) to decide whether they are true or
false. If all verification conditions were found to be true, then the program would be correct. If some
verification condition was found to be false, the program would be rejected as incorrect. Moreover, the places
of the errors could also be localized: the parts of the program text from which the false verification conditions
were computed.
The price to be paid for this is that the pre- and postconditions of the program together with the loop
invariants must now be considered part of the program text. Correctness in the sense above is then turned into
an inherent property of the program, which can be decided by a verification condition generator (assuming the
availability of an oracle to decide whether an arbitrary verification condition is true or false). We will refer to
programs of this kind, in which pre- and postconditions together with all loop invariants are explicitly stated, as
invariant based programs.
This view of program correctness is acceptable if the invariants really can be understood as specifications,
to be satisfied by the program code, rather than as comments, explaining how the program works. We will try to
show below that the first view is defendable, by describing a programming method in which the invariants are
designed before the program code is determined, thus serving as specifications for the latter.
The idea of using invariants to prove program correctness is due to Floyd [1967] and Naur [1966]. Hoare
[1971] was the first one to use the invariants as guides in constructing programs, combining it with the stepwise
refinement technique developed by Dijkstra [1968] and Wirth [1971]. The book by Dijkstra [1976] is an impor-
tant step further in this direction, with the issue of termination being given special attention. The use of
invariants without the restriction to while-programs, which is the approach we will take here, has been
described by van Emden [1979] and by Reynolds [1978]. The use of invariants as internal specifications of the
program has also been studied by Blikle [1979].
We will in this chapter consider the following questions concerned with invariant based programs:
(1) What is a suitable language for invariant based programs. We will describe a simple language which
we think is well-suited to the construction of such programs.
(2) How should invariant based programs be constructed. We will describe, by means of an example, a
programming method in which the invariants are designed before the code itself is constructed.
(3) How should the correctness of invariant based programs be checked. We will describe a verification
condition generator for the proposed language which works as a compiler, checking for errors in the pro-
gram text and marking all detected ones, declaring the program to be incorrect or correct depending on
whether errors were found or not.
S ::= x := e; S |
      L |
      if b1 -> S1 [] ... [] bn -> Sn fi
Here L is a label identifier, x is a variable identifier, e is an expression, b1,...,bn are boolean expressions and
S1,...,Sn are simple statements. The syntax of identifiers, expressions and boolean expressions is the conven-
tional one and will not be defined here.
A compound statement C is a collection of labelled simple statements. The syntax is
    C ::= begin S0 [] L1:S1 [] ... [] Ln:Sn end
where S0,..., Sn are simple statements and L1,..., Ln are distinct labels.
A declaration D is either a variable declaration or a label declaration. The syntax is
D ::= var x : T |
      label L : Q
Here x is a variable identifier, T is a type, L is a label identifier and Q is an invariant. The syntax of invariants
and types will not be further defined (first-order formulas can be assumed for invariants, Pascal-types for
types).
An environmentE is a sequence of declarations. The syntax is
E ::= D1; ... ; Dn
A block B is a sequence of declarations followed by a compound statement, and an invariant based program has the form E{P}B, where P is a precondition.
All labels and variables used but not declared in B must be declared in the global environment E. No identifier
declared in E may be redeclared in B (this restriction could be lifted, with a slight complication in the correct-
ness checking). The postconditions of B are given as label declarations in E (we allow more than one exit label
for B).
The assignment statement has its usual interpretation and the semicolon is used for compound state-
ments. Note, however, that only a restricted form of compound statements is allowed by the syntax. (The first
statement must be an assignment.) The conditional statement is essentially the guarded conditional statement
of Dijkstra [1976], i.e. the boolean expressions b1,...,bn (the guards) are evaluated, one of the guards which
evaluate to true is selected, and the corresponding statement is executed. More than one condition may evaluate
to true, in which case the choice is nondeterministic. If no condition is true, then an abnormal termination
of the program occurs.
A label identifier L in a simple statement stands for the command goto L. It signals a jump to the state-
ment labelled by L in the block, with execution continuing from this statement. If L is a label declared in the
global environment, then execution terminates at this label.
The syntax implies that execution of a simple statement always ends in an explicit goto-statement. Simple
statements and blocks are thus single entry - multiple exit constructs. They have a tree structure, with labels as
leaves and branching points at the conditionals. The essential difference, as compared to while-programs, is
that joining of paths which have been split up by a conditional statement is disallowed. More precisely, we do
not allow constructs like
    if b1 -> S1 [] ... [] bn -> Sn fi; S.
These paths have different beginnings but the same ending. They can therefore not be designed independently
of each other, as each one must fit the common continuation S. This construct would make the design of sim-
ple statements more difficult (although the code undoubtedly would be more concise), which is the reason for
its rejection.
A block is essentially an ordinary Pascal block, with the symbol '[]' standing for ';'. Execution of the
block starts with S0 and continues according to the usual rules for assignments, conditional statements and
goto’s. However, because each simple statement ends in an explicit label, the order in which the labeled state-
ments are listed in the block is not significant. This again allows the statements for the different labels to be
constructed independently of each other. Nested blocks could be accommodated quite easily in this
language (see e.g. Back [1980]), but we have omitted them for reasons of simplicity.
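The execution model just described can be mimicked quite directly. In the following Python sketch (our own encoding, not part of the chapter) a block is a dictionary from labels to functions, each of which updates the state and returns the label to jump to next; execution stops at an exit label declared in the global environment:

    def run_block(block, entry, exit_labels, state):
        # single entry - multiple exit: run until an exit label is reached
        label = entry
        while label not in exit_labels:
            label = block[label](state)
        return label

    # the filter example of the introduction, written in this style
    def initial(st):
        st['M'] = []
        return 'moving'

    def moving(st):
        if not st['L']:
            return 'done'          # exit label declared in the environment
        x = st['L'].pop(0)
        if x > 0:
            st['M'].append(x)
        return 'moving'            # backward jump: a loop through the label 'moving'

    state = {'L': [-7, 2, 9, -3, 4]}
    run_block({'start': initial, 'moving': moving}, 'start', {'done'}, state)
    print(state['M'])              # [2, 9, 4]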
The programmer states the required behavior of the program with declarations. The declaration var x: T
gives the type of the variable x. It states that the variable x should only be assigned well-defined values of this
type (otherwise a run-time error will occur). The declaration label L:Q gives the invariant at label L. It states
that Q may be assumed to hold whenever execution of the block is at label L. The syntax of the language is
such that a loop only can be constructed by explicit backward jumps to labels in the block. As all labels must
be declared, this makes it illegal to construct a loop without explicitly stating an invariant for it. The label
declaration thus serves two different purposes, naming invariants and declaring control points. (It will some-
times be convenient to use a label declaration solely for the first purpose. We do it in the examples, to name
the entry conditions of the programs constructed.)
The program E{P}B contains all the information about the intended behavior of the program which is
needed to check its correctness. The global environment E declares all global variables and all labels referred
to but not declared in B. The precondition P states the condition which may be assumed to hold at entry to the
block (the exit conditions are declared as labels in E).
As an example, we show how the example in the introduction is expressed in this
language:
The programming language described here is in content more or less equivalent to the transition diagrams of
Reynolds [1978]. A transition diagram is a finite graph, where invariants are associated with the nodes of the
graph and assignment or assert statements are associated with the arcs of the graph. The language above pro-
vides a linear notation for these transition diagrams and also decreases the number of invariants which need to
be explicitly stated.
Another way of describing invariant based programs has been proposed by van Emden [1979]. His
suggestion is to express the program directly as a set of verification conditions. Our example program would
then be represented as the set of verification conditions {(1), (2), (3a), (3b)} given in the introduction. The
language described here is also very close to van Emden’s representation, but avoids the repetitions inherent in
that approach, as all transitions from an invariant are bundled together to form a single statement. Both Rey-
nolds and van Emden use goto-programs to describe the final executable version of a program. We think that
the language proposed here provides a cleaner way of describing the end product, without sacrificing efficiency
or losing the information provided by the invariants.
Blikle [1979] has very much the same approach as we, emphasizing the role of invariants as internal
specifications of the program, but chooses to extend while-programs to incorporate the invariants and the entry
and exit conditions.
The language described above is in form almost identical to the language for tail recursion described by
Hehner [1979]. However, Hehner gives a very different interpretation for the labels, taking them to be calls on
recursively defined parameterless procedures. This view supports the stepwise refinement technique, but not
directly invariant based program construction in the form we want. It is, however, interesting to note that
essentially the same language provides a basis for both these, conceptually rather different, methods.
(Hehner’s tail recursion provides evidence for the fact that stepwise refinement is also useful without insisting
on a restricted set of control structures.)
where char is the data type of characters. The input will be given in a global variable line, declared as
The input may only be manipulated by the operation ‘‘read(x)’’, x a character variable, which has the effect
The only operations allowed on the output are ‘‘reset’’ and ‘‘write(w)’’, w a string, with the effects
We will not define this function more precisely here. (The precise definition is given as an exercise.) The pro-
gram must establish the situation
[diagram: the two invariant situations, labelled "scanning blanks" and "scanning letters"]
For the other invariant, when we are scanning words, we need an auxiliary variable w, declared as
var w: String.
The word being scanned will be accumulated into this variable. A similar reasoning as the one above gives us
the invariant:
(last(line1) is the last character in line1). Here the last conjunct expresses the fact that w must contain all the
initial letters of the word being scanned.
The program is constructed by considering the entry condition and each invariant in turn, analyzing what
needs to be done to either reach the exit or to reach one of the two invariants (in a way which brings the com-
putation closer to termination). The order in which the invariants are handled does not matter, because of the
independence of the labelled statements in the block.
Let us e.g. start by considering how to proceed from the invariant ‘‘scanning blanks’’. Analyzing the
situation, we see that there are essentially two different cases, either line = null or line # null. In the first case
the task has been completed, i.e. we have established the situation ‘‘parse computed’’. (It follows from the
invariant that parse contains all the words of lineO.) In the second case, there is at least one character left in
line. Let us read this character into the variable c, removing it from line at the same time. We now again have
two cases, either c = ' ' or c ≠ ' '. If c = ' ', then we are still scanning blanks. If, however, c ≠ ' ', then we
have found the first character of the next word. Initializing w to contain this character will mean that we are
scanning letters. In both cases a loop may be created, but at the same time the computation has been brought
closer to termination, because the length of the string in variable line has been decreased.
The above analysis is more concisely expressed by the following labelled simple statement:
scanning blanks:
    if eol ->
        parse computed
    [] not eol ->
        read(c);
        if c = ' ' ->
            scanning blanks
        [] c ≠ ' ' ->
            w := add(null,c);
            scanning letters
        fi
    fi.
This shows the way in which one proceeds from a chosen invariant to other invariants. The initial situation
is analyzed and the possible cases are identified. Each case is then handled independently of the others,
changing the variable values by assignments and identifying new subcases, which again can be handled
independently of the other cases and subcases. The programming language supports this kind of carefully
progressing case analysis, by keeping the different cases separate from each other: once a situation has been
split into different cases, the way in which one of these cases is handled does not influence the other cases.
The need to check that each loop created does indeed terminate is the main break in the independence of
the labelled statements of the block. In this case, the interaction between the statements must be considered.
In Back [1980] we show how the language can be extended so that termination also can be handled in a modular
fashion. A disadvantage of this language, compared to while-programs, is also evident when termination is
considered: identifying the loops of a program is easier in while-programs, because of their explicit looping
constructs.
The initial situation, described by the entry condition, and the other invariant are handled in a similar
way. The resulting invariant based program is as follows:
parse computed
initial
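A rough Python rendering of the same two-invariant word scanner may help to follow the construction; the end-of-line convention, the treatment of the last word, and the helper names below are assumptions of ours rather than the chapter's:

    def parse_line(line0):
        line, parse = list(line0), []
        w = ''
        state = 'scanning blanks'
        while True:
            if state == 'scanning blanks':
                if not line:                      # eol: parse computed
                    return parse
                c = line.pop(0)                   # read(c)
                if c != ' ':
                    w = c                         # w := add(null, c)
                    state = 'scanning letters'
            else:                                 # scanning letters
                if not line:
                    parse.append(w)               # last word of the line
                    return parse
                c = line.pop(0)
                if c == ' ':
                    parse.append(w)
                    state = 'scanning blanks'
                else:
                    w = w + c                     # w := add(w, c)

    print(parse_line('  two  words '))            # ['two', 'words']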
R and x = t,
where R is a first-order formula, x is the list of variables declared in E and t is a list of terms, the number of
terms in t being the same as the number of variables in x. No variable declared in E may occur free in R or
occur in any term in t. The precondition stands for the formula
where K1,..., Kn are the assumptions and K is the conclusion. It says that if K1,..., Kn are true, then the con-
clusion K must also be true. We will depart from this standard, writing a proof rule in the form
    H  -->  H1, ..., Hn
            when F1, ..., Fm
(m,n ≥ 0), where H, H1,..., Hn are invariant based programs and F1,..., Fm are first-order formulas. Assuming
that H = E{P}B, this stands for the proof rule
    H1, ..., Hn,  P => F1, ..., P => Fm
    -----------------------------------
                   H
This notation is chosen to emphasize the use of the proof rules as a reduction system. The correctness of the
program H, with precondition P, can be reduced to the correctness of the programs H1,..., Hn, provided all the
formulas P=>F1,..., P=>Fm are valid. In proving the correctness of an invariant based program, one starts
from the original program and applies these reduction rules as long as possible, each time checking that the
conditions for the application of a reduction rule (proof rule) are satisfied. If the reduction terminates with no
programs left to reduce, the program is correct, otherwise it is incorrect.
Before going to the proof rules, we introduce some useful notation. First, if L is a label, then E(L) is the
invariant associated with L in the environment E. Given a term t (a value expression or a boolean expression),
def(t) is an assertion which is true if and only if the expression t has a well-defined value. Finally, for a a list
and e a list element, a,e denotes the result of appending e to the end of list a.
The proof rules for checking the correctness of invariant based programs are as follows.
1. Variable declaration:
E{R and x = t} var y:T; B
  -->  E; var y:T {R and x,y = t,y'} B
Here y' is a fresh identifier, not declared in E or B or occurring in R or t. Thus the declaration of variable y
means that y is initialized to some arbitrary value y’.
2. Label declaration:
E{R and x = t} label L:Q; B
  -->  E; label L:Q {R and x = t} B
A label declaration is thus simply moved into the environment, without affecting the precondition in any way.
3. Compound statements:
E{R and x = t} begin S0 [] L1:S1 [] ... [] Ln:Sn end
  -->  E{R and x = t} S0,
       E{E(L1)[x'/x] and x = x'} S1,
       ...
       E{E(Ln)[x'/x] and x = x'} Sn
where x' is a list of distinct fresh identifiers, not declared in E or occurring free in R or t. Correctness of a
compound statement thus reduces to n+1 independent checks: the correctness of the initialization statement
S0 and of the statements S1,..., Sn.
4. Conditional statements:
Thus, correctness of a conditional statement is reduced to the correctness of each alternative (with the guard
added to the corresponding precondition).
5. Assignment statement:
Correctness of an assignment statement construct is thus reduced to the correctness of the statement following
the assignment statement, provided that the expression has a well-defined value.
6. Labels:
Reduction thus terminates when a label is reached, provided that the invariant associated with the label is
satisfied.
These proof rules allow us to check the correctness of an invariant based program in a way which is simi-
lar to the syntax checking done by a recursive descent parser. The environment E corresponds to the symbol
table of the parser. R and x function as variables used by the parser to keep relevant information needed to
generate the correctness checks (when-conditions). These checks together make up the verification conditions
of the program. The difference, as compared to the usual verification condition generators, is that each initial
segment of a path in a program generates a verification condition. This allows very precise location of the
errors in the program text. An error occurs at the end of an initial segment for which the verification condition
produced is true, but the extension of which gives a verification condition which is false.
An alternative way of looking at the proof system, which shows the close relation to the program text, is
to look at the precondition {R and x =t} as a marker in the program text. The proof rules show under which
conditions this marker can be moved forward in the text. Starting from the initially given invariant based pro-
gram, with the precondition as the only marker in the text, this marker, and any others which may be created
by the proof rules, are moved forward as far as possible in the text. If all markers can be eliminated from the
text, then the program is correct. If some marker gets stuck, because some when-condition is false, then the
marker shows the place of the error and what is known to be true at that place, while the when-condition shows
the nature of the error.
The simple marking scheme we use will give all the information available about the error detected. This
would not be the case if joining of paths was allowed, e.g. by the construct
    if b1 -> S1 [] ... [] bn -> Sn fi; S.
If the marker pointed at the position immediately preceding the statement S, then it would not be clear which
one of the cases b1,..., bn was responsible for the error (although this information is contained in the
verification condition computed). In this case, some indication of the path responsible for the error would have
to be added to the error message.
We will illustrate the use of the proof rules with the example program of the introduction. We repeat this
program below for ease of reference. We show for selected program points, indicated by numbers in the text,
what may be assumed to be true at that point (the marker R and x =t) and what needs to be checked at that
point (the when-condition).
The assumptions and checks associated with the above program points are then:
E. Summary
Our main purpose here has been to discuss the use of invariants in programs, when these are understood
as internal specifications for the program to be constructed, rather than as comments about the way in which
the program works. We have tried to show the advantages of this view for checking program correctness: Not
only can we show the correctness of programs, but also incorrectness can be established, and the program
errors responsible for the incorrectness can be located.
We have described a simple programming language, intended to support a program construction technique
in which the design of invariants precedes the construction of the program code itself. The feasibility of this
approach was supported by a simple example of program construction. Finally, we described a simple system
for checking program correctness, essentially a verification condition generator, allowing very precise location of
program errors in the text.
Acknowledgements
I would like to thank J.W. de Bakker, A. de Bruin, H.B.M. Jonkers, P. Orponen and J.V. Tucker for their
critical comments and for the stimulating discussions we had on the topics treated here.
References
Back [1980a]
R.J.R. Back, "Exception handling with multi-exit statements," in: Programmiersprachen und Programmentwicklung, Darmstadt 1980, Informatik-Fachberichte 25, Springer Verlag.
Back [1980b]
R.J.R. Back, "Checking whether programs are correct or incorrect," Mathematical Center report IW 144/80, Mathematical Center, Amsterdam 1980.
Blikle [1979]
A. Blikle, "Specified programming," in: E.K. Blum, M. Paul and S. Takasu (eds.), Mathematical Studies of
Information Processing, Lecture Notes in Computer Science 75, Springer Verlag 1979.
Brand [1978]
D. Brand, "Path calculus in program verification," Journal of the ACM, Vol. 25, No. 4 (October 1978), pp. 630-651.
Dijkstra [1968]
E.W. Dijkstra, ‘‘A constructive approach to the problem of program correctness,’ BIT 8 (1968), pp. 174-186.
Dijkstra [1976]
E.W. Dijkstra, A Discipline of Programming, Prentice-Hall, Englewood Cliffs, NJ., 1976.
Floyd [1967]
R.W. Floyd, "Assigning meanings to programs," Proceedings of the AMS Symposium in Applied Mathematics, Vol. 19, 1967, pp. 19-32.
Gerhart [1976]
S.L. Gerhart, ‘‘Proof theory of partial correctness verification systems, SIAM J. of Computing, Vol. 5, No. 3
(September 1976), pp. 355—377.
Hehner [1979] .
E. Hehner, "do considered od: a contribution to the programming calculus," Acta Informatica 11 (1979), pp. 287-304.
Hoare [1971]
C.A.R. Hoare, ‘‘Proof of a program: FIND,’’ Comm. ACM 14(January 1971), pp. 39-45.
Naur [1966]
P. Naur, ‘‘Proof of algorithms by general snapshots,’ BIT 6(1966), pp. 310—316.
Reynolds [1978]
J.C. Reynolds, "Programming with transition diagrams," in: Gries, D. (ed.), Programming Methodology,
Springer Verlag, Berlin 1978.
Sites [1974]
R.L. Sites, ‘‘Proving that computer programs terminate cleanly,’ Stanford Report CS—74—418, 1974.
Wirth [1971]
N. Wirth, "Program development by stepwise refinement," Comm. of the ACM 14, 4 (1971), pp. 221-227.
CHAPTER 12
Patrick Cousot
Université de Metz
Faculté des Sciences
Ile du Saulcy
57045 Metz cedex
France
Radhia Cousot
Centre de Recherche en Informatique de Nancy
France
A. Introduction
We propose a unified approach for the study, comparison and systematic construction of program proof
and analysis methods. Our presentation will be mostly informal but the underlying formal theory can be found
in Cousot and Cousot [1980b, 1979], and Cousot, P. [1981].
We use discrete state transition systems (Keller [1976], Cousot, P. [1979, 1981]) as abstract models of
programs so that our approach is independent of any particular programming language. We use parallel pro-
grams with shared variables for illustration purposes.
Our approach is also independent of the particular class of program properties which is considered. For
simplicity we only consider invariance properties in this paper. Important properties falling under this category
are partial correctness, non termination, absence of run-time errors, deadlock freedom, mutual exclusion, etc.
Since programs are finite descriptions of arbitrarily long and sometimes infinite computations, properties
of these computations can only be proved using some inductive reasoning. Hence program proof methods rely
upon basic induction principles. For a given class of program properties several different induction principles
can be considered. For simplicity, only one basic induction principle will be considered in this paper, which
underlies Floyd [1967]’s partial correctness proof method. (A number of different although equivalent induc-
tion principles for invariance can be found in Cousot and Cousot [1982].)
All proof methods which rely upon the same induction principle intuitively look similar, but can be
difficult to compare in the abstract. We offer a unified view for comparing them. It consists in showing that
the verification conditions involved in any of these methods can be obtained by decomposition of the global
inductive hypothesis used in the induction principle into an equivalent set of local inductive hypotheses. (Such
decompositions can be formalized as connections between lattices (see Cousot and Cousot [1979, 1980b]) and
in particular obtained by a cover of the set of states of the program where each local inductive hypothesis holds
for a given block of the cover (Cousot [1979]). It is possible to find as many proof methods as such different
decompositions. We illustrate only three of them which respectively lead to the Floyd [1967], Owicki and Gries
[1976], and Lamport [1977, 1980] invariance proof methods. This approach also provides a framework for sys-
tematically constructing new sound and complete proof methods based on unexplored induction principles or
decompositions. (See for example Cousot, R. [1981], Cousot and Cousot [1980a]).
Static program flow analysis techniques can be used for discovering semantic properties of programs, that
is, for discovering properties of the runtime behavior of programs without actually running them. Such analysis
methods consist in solving a fixed point system of equations (by elimination or iteration algorithms) associated
with the program to be analyzed (Cousot and Cousot [1977]). In the design of such methods the essential part
consists in defining correctly the rules for associating the system of equations with the program. We have
shown (Cousot and Cousot [1979]) that they can be derived from the verification conditions of a proof method
using an approximate decomposition, hence from a basic induction principle. We illustrate this point of view by
generalizing Cousot and Cousot [1976] to parallel programs with shared variables. Another example can be
found in Cousot and Cousot [1980b] that generalizes Cousot and Halbwachs [1978] to parallel processes com-
municating by rendezvous.
S is a set of states,
t ∈ (S x S → {tt,ff}) is a transition relation,
b ∈ (S → {tt,ff}) characterizes entry states,
e ∈ (S → {tt,ff}) characterizes exit states.
The set S of states is a model of the set of possible data that can be contained in the store(s) on which
the program operates. We ignore for the moment the particular structure of the states. In practice a state has
several memory components (assigning values to program variables, input and output files, ...) and control
components (assigning values to program location counters, ...). Program execution always begins with entry
states. The total function b from states into truth values {tt,ff} characterizes entry states. This means that
b(s) = tt if and only if state s is an entry state and b(s) = ff otherwise. Program execution properly ends
when an exit state is reached. Exit states are characterized by e. The transition relation t specifies the effect
of executing an elementary program step. More precisely t(s,s’) = tt means that starting in state s and exe-
cuting one program step can put the program in successor state s’. A sequential program is modeled by a deter-
ministic transition relation since a state s can only have one successor state s’, if any. A parallel program is
modeled by a non-deterministic transition relation since a state s can have no or several successor states s’.
This is because the transition relation is usually defined in terms of arbitrarily choosing an active process and
executing one step of that process. Some states s may have no successor (that is t(s,s’) = ff for all s’€S), in
which case they are called blocking states. For example, a sequential program can be in a blocking state after a
run-time error or a parallel program can be in a blocking state because all processes which are not terminated
are waiting for some event that never happens.
Example B.1: Defining the semantics of a sequential program by means of a deterministic transition system.
We will consider sequential programs with assignment, conditional and iteration commands. Labels will
only be used to designate program points. For simplicity, type and variable declarations are left implicit.
For example, the following program computes 2^n for n ≥ 0:
L1:
    P := 1;
L2:
    while N > 0 do
L3:
        N := N-1; P := 2 x P;
L4:
    od;
L5:
Let II = {li,...,hi} be the set of integers included between the lowest and greatest machine representable
integers li and hi. A state (l,n,p) ∈ S consists of a memory state, that is a pair (n,p) ∈ M assigning integer values
to program variables N,P and of a control state l ∈ C which is one of the program points, L1,...,L5. Therefore,
    C = {L1,...,L5},  M = II x II,  S = C x M.
We define the transition relation t by the following clauses (where n ∈ II and p ∈ II):
A clause [s -> f(s) iff c(s)] means that for all s ∈ S, t(s,f(s)) = tt whenever condition c(s) holds.
Starting with N=2 and P=p, execution of that program leads to the sequence of states
(L1,2,p) -> (L2,2,1) -> (L3,2,1) -> (L4,1,2) -> (L3,1,2) -> (L4,0,4) -> (L5,0,4).
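The clauses themselves are easy to encode. The following Python successor function is one possible encoding of this transition system (the machine range and the routing of the loop test are assumptions of ours); it reproduces the execution sequence shown above:

    LI, HI = -2**15, 2**15 - 1                 # machine-representable range (assumed)

    def step(state):
        l, n, p = state
        if l == 'L1':                          # P := 1
            return ('L2', n, 1)
        if l in ('L2', 'L4'):                  # loop test
            return ('L3', n, p) if n > 0 else ('L5', n, p)
        if l == 'L3':                          # N := N-1; P := 2 x P
            return ('L4', n - 1, 2 * p) if LI <= n - 1 and 2 * p <= HI else None
        return None                            # L5 is the exit point: no successor

    s = ('L1', 2, 0)                           # P is initially arbitrary; 0 is a stand-in
    while s is not None:
        print(s)                               # (L1,2,0) (L2,2,1) ... (L5,0,4)
        s = step(s)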
Example B.2: Defining the semantics of a parallel program by means of a non-deterministic transition system.
We consider parallel programs [P1 |...| Pk] which consist of k ≥ 1 sequential processes P1,...,Pk executed
concurrently. These processes share (implicitly declared) global variables. (If variables need to be local to
some process Pi, we will use instead global variables neither used nor modified by the other processes Pj, j ≠ i.)
It is sometimes necessary that processes have exclusive access to shared global variables. For that purpose
we will enclose atomic operations inside square brackets. The execution of such operations is indivisible so that
it cannot interfere with the concurrent execution of other processes. For example the program
[ [N:=N+1] | [N:=N+1] ]
will always increment N by two, whereas the program
[ [T1:=N];[T1:=T1+1];[N:=T1] | [T2:=N];[T2:=T2+1];[N:=T2] ]
will increment N by one if both processes read the value of N before it is modified by the other process and
by two if one process reads the value of N after it has been incremented by the other process.
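The two outcomes can be confirmed by enumerating all interleavings of the second program's atomic actions. The small Python check below is our own encoding of that program; it finds exactly the final values 1 and 2 for N:

    from itertools import permutations

    # each process is a sequence of atomic actions on the shared store
    p1 = [lambda s: s.update(T1=s['N']),
          lambda s: s.update(T1=s['T1'] + 1),
          lambda s: s.update(N=s['T1'])]
    p2 = [lambda s: s.update(T2=s['N']),
          lambda s: s.update(T2=s['T2'] + 1),
          lambda s: s.update(N=s['T2'])]

    results = set()
    for schedule in set(permutations([1, 1, 1, 2, 2, 2])):   # all interleavings
        store, position = {'N': 0, 'T1': 0, 'T2': 0}, {1: 0, 2: 0}
        for proc in schedule:
            (p1 if proc == 1 else p2)[position[proc]](store)
            position[proc] += 1
        results.add(store['N'])
    print(results)                                           # {1, 2}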
L0:
[
L11:
    [P1 := 1];
L12:
    while [N > 1] do
L13:
        [N := N-1; P1 := 2 x P1];
L14:
    od;
L15:
|
L21:
    [P2 := 1];
L22:
    while [N > 1] do
L23:
        [N := N-1; P2 := 2 x P2];
L24:
    od;
L25:
];
L1:
    if N = 0 then P := P1 x P2 else P := 2 x P1 x P2 fi;
L2:
A state is of the form (c,n,p1,p2,p) where the values n,p1,p2,p of variables N,P1,P2,P belong to II =
{li,...,hi} and the control state c is either L0, L1, L2 or a pair (l1,l2) of labels, one control location for each of
the two processes:
    C = {L0, L1, L2} ∪ ({L11,...,L15} x {L21,...,L25})
    M = II^4
    S = C x M
    b(c,n,p1,p2,p) = [c = L0]    characterizes entry states
    e(c,n,p1,p2,p) = [c = L2]    characterizes exit states
(a) (L0,n,p1,p2,p) -> ((L11,L21),n,p1,p2,p)
(b) ((L11,l2),n,p1,p2,p) -> ((L12,l2),n,1,p2,p)  iff 1 ∈ II
    ...
(c) ((L13,l2),n,p1,p2,p) -> ((L14,l2),n-1,2 x p1,p2,p)  iff (n-1) ∈ II and (2 x p1) ∈ II
    ...
(d) ((L15,L25),n,p1,p2,p) -> (L1,n,p1,p2,p)
    (L1,n,p1,p2,p) -> (L2,n,p1,p2,p1 x p2)  iff (n = 0) and (p1 x p2) ∈ II
    (L1,n,p1,p2,p) -> (L2,n,p1,p2,2 x p1 x p2)  iff (n ≠ 0) and (2 x p1 x p2) ∈ II
On program entry, executions of both processes begin simultaneously (a). Then each process progresses
at its own speed independently of the other (b). The concurrent execution of commands in different processes
is modelled by an interleaved execution which proceeds as a sequence of discrete steps. In each step a com-
mand is selected in only one of the processes and is executed to completion before the same or another process
may initiate an elementary command and proceed to complete it. Since execution of atomic operations is indi-
visible it is modelled by a single transition (c). Notice that since Pl and P2 are not shared we could have split
[N:=N-1;Pi:=2xPi] into [N:=N-1];[Ti:=Pi];[Ti:=2xTi];(Pi:=Ti]. However the update of N must be indi-
visible. This can be achieved by any hardware or software mutual exclusion mechanism. The concurrent exe-
cution of the two processes ends when both have terminated (d).
A possible execution sequence for N=2 could be:
(L0,2,p1,p2,p) -> ((L11,L21),2,p1,p2,p) -> ((L12,L21),2,1,p2,p) -> ((L13,L21),2,1,p2,p) ->
((L14,L21),1,2,p2,p) -> ((L15,L21),1,2,p2,p) -> ((L15,L22),1,2,1,p) -> ((L15,L25),1,2,1,p) ->
(L1,1,2,1,p) -> (L2,1,2,1,4).
In the above sequence the value of N at L1 was 1. It can also be 0 if both processes simultaneously test that
N>1 when N=2. This is the case in the following execution sequence:
((L13,L23),2,1,1,p) -> ((L14,L23),1,2,1,p) -> ((L14,L24),0,2,2,p) -> ((L15,L24),0,2,2,p) ->
((L15,L25),0,2,2,p) -> (L1,0,2,2,p) -> (L2,0,2,2,4).
Notice that the indeterminacy about the values of N and P when both processes end can easily be taken
into account to yield the correct result. This solution is certainly less costly than the one which would consist
in synchronizing the processes in order to avoid possible simultaneous tests of N. Another solution would con-
sist in having one process iterate ⌊n/2⌋ times and the other ⌈n/2⌉ times. The drawback of this solution is
that its efficiency does depend upon the assumption that both processes are executed at about the same speed.
On the contrary, the efficiency of the above parallel program does not depend upon the relative speed of execu-
tion of the two processes. Another advantage is that it can be easily generalized to an arbitrary number of
processes. 0
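Since the state space is finite for a fixed initial value of N, these claims can also be checked by brute force. The Python sketch below is a simplified re-encoding of the transition system (labels 0 to 4 stand for Li1 to Li5, and the initial values of P1, P2, P are fixed arbitrarily); it explores every interleaving and reports the possible final values of N and P:

    N0 = 5                                     # initial value of N (assumed, kept small)

    def process_moves(l, n, pi):
        # local transitions of one process: (new label, new N, new Pi)
        if l == 0:            return [(1, n, 1)]                     # Pi := 1
        if l in (1, 3):       return [(2, n, pi)] if n > 1 else [(4, n, pi)]
        if l == 2:            return [(3, n - 1, 2 * pi)]            # N := N-1; Pi := 2 x Pi
        return []                                                    # label 4: finished

    init = (0, 0, N0, 0, 0)                    # (l1, l2, n, p1, p2); P is computed at the end
    reached, frontier, finals = {init}, [init], set()
    while frontier:
        l1, l2, n, p1, p2 = frontier.pop()
        succ = ([(m1, l2, nn, q1, p2) for (m1, nn, q1) in process_moves(l1, n, p1)] +
                [(l1, m2, nn, p1, q2) for (m2, nn, q2) in process_moves(l2, n, p2)])
        if not succ:                           # both processes finished: exit of [ ... ]
            finals.add((n, p1 * p2 if n == 0 else 2 * p1 * p2))
        for t in succ:
            if t not in reached:
                reached.add(t)
                frontier.append(t)

    print(finals)                              # {(0, 32), (1, 32)}: N in {0,1} and P = 2**N0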
Abstraction from the above examples is left to the reader. In general, the semantics of a programming
language can be defined operationally. This consists in defining the transition system associated with each pro-
gram of the language by induction on the context-free syntax of programs. (See e.g. Cousot, R. [1981]).
An assertion W € (S—{tt,ff}) is said to be invariant if and only if it characterizes a super-set of the set of
final states that can be reached during some execution started with an initial state, that is
More precisely, if states s ∈ S are pairs (c,x) consisting of a control state c ∈ C and a memory state x ∈ M, b
characterizes entry states and e characterizes exit states, partial correctness can be stated as
    ∀c,c' ∈ C, ∀x,x' ∈ M, [b(c,x) ∧ t*((c,x),(c',x')) ∧ e(c',x')] => [φ(x) => θ(x,x')]
Notice that the fact that an exit state (c',x') can be reached when execution is started with an entry state (c,x) is
an hypothesis which is assumed to be true for W((c,x),(c',x')) = [φ(x) => θ(x,x')] to hold. Therefore termi-
nation is not implied. In particular, any non-terminating program is partially correct since
ff => [φ(x) => θ(x,x')].
For program B.1 the condition that all integers between 0 and 2^n are machine representable is sufficient to
avoid run-time errors. This can be stated as
Example C.5.1: Using program flow analysis algorithms for generating local invariants.
Some program analysis techniques, such as Cousot and Halbwachs [1978], can be used for automatic com-
putation of local invariants of programs. Since the strongest set of local invariants is not computable, only
approximate results can be automatically obtained. The invariant 6(x) associated with program points | €L is
approximate in the sense that it is correct:
await [B then C]
it is delayed until the condition B is true. Then the command C is executed as an atomic action, the evaluation
of B to true and execution of C being indivisible. Command C cannot contain a nested await command. If two
or more processes are waiting for the same condition B, any one may be allowed to proceed when B becomes
true while the others continue waiting. When invariance properties are considered the order in which waiting
processes are scheduled is often irrelevant.
Let us consider a parallel program [P1 |...| Pk]. The corresponding states are of the form ((l1,...,lk),x)
where each li is a location of process Pi and x the memory state of the shared variables X. Let Wi be the set of
waiting locations of process Pi so that Pi contains await commands
    Lij : await [B(Lij)(X) then C] , for Lij ∈ Wi.
Let Lie be the exit location of process Pi. We define W as the set of control states (l1,...,lk) corresponding to
waiting or exit locations, not all of them being exit locations. Formally
A sufficient condition ensuring absence of global deadlocks is that all states that can be reached during execu-
tion are not blocking states. This invariance property can be stated as
    ∀s,s' ∈ S, [b(s) ∧ t*(s,s')] => ¬A(s')
THEOREM D.1
<==>
Corollary D.2
<==>
Proof: The soundness proof (=>) consists in defining I(s,s') = i(s'), W(s,s') = i(s') and applying theorem D.1.
The completeness proof (<=) consists in proving that if I(s,s') satisfies conditions D.1 (a)-(c) then i(s') =
[∃s ∈ S : b(s) ∧ I(s,s')] satisfies conditions D.2 (a)-(c).
Example D.3: Proving the partial correctness of a parallel program by direct application of the basic induction
principle.
The program
    [ [N:=N+1] | [N:=N+1] ]
is modelled by the following transition system:
    C = {L11,L12} x {L21,L22}                          control states
    S = C x II                                         states
    b(l1,l2,n) = [l1=L11 ∧ l2=L21]                     entry states
    e(l1,l2,n) = [l1=L12 ∧ l2=L22]                     exit states
    (L11,l2,n) -> (L12,l2,n+1)  iff (n+1) ∈ II         transition relation
    (l1,L21,n) -> (l1,L22,n+1)  iff (n+1) ∈ II
Let us prove that if execution of that program begins with N=0 and happens to end then N=2. This partial
correctness property can be formulated as
where
which, as can easily be checked by the reader, satisfies conditions D.2 (a) —(c).
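For such a small system the reader can also let a machine do the checking. The following Python fragment (our own encoding) enumerates every state reachable from the entry state with N=0 and confirms that every exit state has N=2:

    def successors(l1, l2, n):
        out = []
        if l1 == 'L11': out.append(('L12', l2, n + 1))
        if l2 == 'L21': out.append((l1, 'L22', n + 1))
        return out

    reached, frontier = set(), [('L11', 'L21', 0)]
    while frontier:
        s = frontier.pop()
        if s in reached:
            continue
        reached.add(s)
        frontier.extend(successors(*s))

    exit_states = [s for s in reached if s[:2] == ('L12', 'L22')]
    print(all(n == 2 for (_, _, n) in exit_states))      # True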
Readers familiar with fixpoint theory can consult Cousot, P. [1981] where it is shown that the invariants
can be defined as fixpoints of predicate transformers. In Cousot and Cousot [1981], other equivalent induction
prin-
ciples are derived from the above ones, and this leads to the construction of new invariance proof
methods.
decomposed into a conjunction of simpler verification conditions, each one corresponding to a basic command
of the program and each one involving only some of the local invariants Q).
Observe that by substitutions we could have eliminated Q» and Q4, keeping only the loop invariant Q3. This
leads to Floyd’s method.
The reader can check that the following local invariants satisfy the above verification conditions:
Q1(n̄,n,p) = [n = n̄ ≥ 0]
Q2(n̄,n,p) = [n = n̄ ≥ 0 ∧ p = 1]
Q3(n̄,n,p) = [n > 0 ∧ p = 2^(n̄-n)]
Q4(n̄,n,p) = [n ≥ 0 ∧ p = 2^(n̄-n)]
Q5(n̄,n,p) = [n = 0 ∧ p = 2^n̄]
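These invariants can at least be exercised at runtime (a check, not a proof). The Python sketch below re-encodes the successor function of program B.1 and asserts the invariant of each program point as it is reached; n0 stands for the initial value written n̄ above, and the encoding of the loop test is an assumption of ours:

    def step(l, n, p):
        # successor of a state of program B.1 (same simplified encoding as earlier)
        if l == 'L1': return ('L2', n, 1)
        if l in ('L2', 'L4'): return ('L3', n, p) if n > 0 else ('L5', n, p)
        if l == 'L3': return ('L4', n - 1, 2 * p)
        return None

    def Q(l, n0, n, p):
        return {'L1': n == n0 >= 0,
                'L2': n == n0 >= 0 and p == 1,
                'L3': n > 0 and p == 2 ** (n0 - n),
                'L4': n >= 0 and p == 2 ** (n0 - n),
                'L5': n == 0 and p == 2 ** n0}[l]

    n0 = 6
    state = ('L1', n0, 1)
    while state is not None:
        l, n, p = state
        assert Q(l, n0, n, p)      # every local invariant holds along this execution
        state = step(l, n, p)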
In order to understand how this partial correctness proof method can be constructively derived from
induction principle D.1, let us define
    I((l̄,n̄,p̄),(l,n,p)) = ∨(k=1..5) [l = Lk ∧ Qk(n̄,n,p)].
The initialization condition D.1.(a) then becomes
    b(l,n,p) => I((l,n,p),(l,n,p))
  = ([l = L1 ∧ n ≥ 0] => ∨(k=1..5) [l = Lk ∧ Qk(n,n,p)])
  = ((n ≥ 0) => ∨(k=1..5) [L1 = Lk ∧ Qk(n,n,p)])
  = ((n ≥ 0) => Q1(n,n,p))
Replacing I and t by their definitions further simplifications lead to the verification conditions E.1.(b).
Finally E.1.(c) is equivalent to D.1.(c) where:
    e(l,n,p) = [l = L5]
More generally, observe that the basic induction principles D.1 and D.2 have verification conditions of the
form:
Invariance proof methods apply induction principles D1 or D2 indirectly in that one uses other verification con-
ditions of the form
Example E.2: The standard decomposition for sequential programs leads to sound and complete proof methods.
Coming back to the partial correctness proof method which we illustrated by example E.1 we had:
    C = {L1,...,L5}
    S = C x II^2
    A = (S^2 → {tt,ff})
    A' = ∏(l ∈ C) (II^3 → {tt,ff})
That is, Q ∈ A' was a vector of assertions Ql, l ∈ C, on (n̄,n,p) ∈ II^3. The correspondence between A and A' was
defined by ρ ∈ [A' → A] such that:
    ρ(Q)((l̄,n̄,p̄),(l,n,p)) = ∨(k=1..5) [l = Lk ∧ Qk(n̄,n,p)]
    ρ̄(I) = (Q1,...,Q5)
where
This correspondence formally defines what is usually explained as "Ql(n̄,n,p) relates the initial value n̄ of N and
the current values n,p of variables N,P when control is at program point l".
Notice that ρ is one-to-one onto and ρ̄ is its inverse. Since
and Cousot [1976]) is more general in that it is suitable for reasoning about program proof methods (where
(A.V) and (A‘,V’) have to be equivalent) and also for reasoning about mechanizable hence fundamentally
incomplete program analysis methods (Cousot and Cousot [1979]).
F.1.1 Decomposition
Let us consider a parallel program [P1 |...| Pk] with memory states M and control states Ci for each pro-
cess Pi, i = 1,...,k. A global invariant I ∈ (C1 x ... x Ck x M → {tt,ff}) can be expressed as a conjunction of
local invariants Qil ∈ (C1 x ... x Ci-1 x Ci+1 x ... x Ck x M → {tt,ff}) on control states (of processes Pj, j ≠ i)
and memory states. A local invariant Qil is attached to each program point l ∈ Ci of each process Pi, i=1,...,k.
More precisely,
    I(c1,...,ck,x) = ∧(i=1..k) ∧(l ∈ Ci) [(ci = l) => Qil(c1,...,ci-1,ci+1,...,ck,x)]
and
    Qil(c1,...,ci-1,ci+1,...,ck,x) = I(c1,...,ci-1,l,ci+1,...,ck,x).
Notice that if program point l belongs to process i then Qil holds when control is at l in process i (wherever con-
trol can be in the other processes). Therefore these local invariants can be interspersed at appropriate places in
the program text, with the interpretation that Qil is a valid comment at program point l.
    [I(c1,...,ck,x) ∧ t((c1,...,ck,x),(c1',...,ck',x')) ∧ (ci' = l)] => Qil(c1',...,ci-1',ci+1',...,ck',x')
for each process i € {1, ...,k} and program point | € C; of that process. Since t is defined as a disjunction of
cases, one for each elementary command of the program, the above verification condition can be further
decomposed. When the transition corresponds to execution of a command of process i we get verification
conditions corresponding to the sequential proof (of process i regarded as an independent sequential program):
When the transition corresponds to execution of a command of process j#i, we get verification conditions of
the form:
A Coron cher URC upc en Ee Pe Berens CURIAUK CCIE MOPoe ol (eee GIN (yian-ch Mecckd Roane)
”
— ”
—— Oi cn! AES ere ss Oe ec.)
which consists in proving that the local invariants of each process are left invariantly true under parallel execu-
tion of the other processes. These verification conditions were termed ‘“‘interference freeness checks’? by
Owicki and Gries [1976] and ‘‘monotonicity conditions’? by Lamport [1977]. However, these authors did not
exactly propose the above verification conditions but instead the following simpler (and stronger) ones:
— sequential proof:
Oi CeCe CD ii AU (rac a ein acts, a) MC ike Cay ere Ce)
% '
Wat (Cie lone cl eeC Xue Cie bcm re Cpa) moe Olin. Gayle Cte Cece Cs)
These verification conditions are obviously sufficient since they imply the above ones
(because I = ∧(i) Qi).
Example F.1.2.1:
The verification conditions corresponding to program F.1.1.1 that is:
are
(a) initialization:
(n) = > [Q,(3,n) A Q3(1,n)], where ¢ is the input specification
(c) finalization:
[Q,(4,n) A Q4(2,n)] => 6(m), where @ is the output specification.O
If we define
    ρ(Q) = I  iff  I(c1,...,ck,x) = ∧(i=1..k) ∧(l ∈ Ci) [(ci = l) => Qil(c1,...,ci-1,ci+1,...,ck,x)]
and
    ρ̄(I) = Q  iff  Qil(c1,...,ci-1,ci+1,...,ck,x) = I(c1,...,ci-1,l,ci+1,...,ck,x)
we have established a formal correspondence between induction principle D.2 and the Owicki-Lamport invari-
ance proof method. We have informally proved that the verification conditions proposed by Owicki-Lamport
are sound (i.e. using the notations of paragraph E, that V'(Q) => V(ρ(Q))). The sufficient completeness con-
dition (V(I) => V'(ρ̄(I))) can also be checked by the reader. Intuitively, this condition is satisfied because
each local invariant Qil can always be made strong enough so as to exactly describe the possible states of the
whole program when process i is at point l.
F.1.3 Example
Let us prove that program B.2 is partially correct (according to definition C.2.1). Instead of using induc-
tion principle D.2 which underlies the Owicki and Gries’ method we use D.1 so as to be able to relate the
current value n_ of variable N to its initial value n. (Owicki and Gries would instead introduce an auxiliary
variable in order to memorize the initial value of N).
Since both processes are symmetric we need only reason about process 1. We will prove that the relation
is invariant in process | after initialization of variable Pl. To prove this we will show that the invariant remains
true after execution of any command of process | and that it is not invalidated by execution of some command
of process 2. Since partial correctness follows from the invariant with c2=L25 (process 2 has terminated) and
0 <n <1, we will also show that the value of N after the parallel command is either 0 or 1. Since the initial
value n of N is assumed to be positive, the only difficulty is for n > 1. In this case N is decremented until
reaching value 2. On one hand both processes can test that N>I1 before it is decremented by the other one,
then each process will decrement N and terminate. In this case N would equal 0 on exit of the parallel com-
mand. On the other hand, when N=2, one process can test for N>1 and decrement N to | before the other
process tests for N>1l. Then both processes terminate and N=1 on exit of the parallel command. For an
invariance proof, the above operational arguments can be rephrased in a ‘“‘time independent manner’’, which
leads to the following local invariants:
Q)(n,p) = [p=25J
It is a simple mathematical exercise to show that these local invariants satisfy the following verification
conditions (which are universally quantified over n,n,n’,pl,pl’,p2,p2’,p € TI, cl € (UliSye
— Initialization:
= > Q),(n,L24,n'—1,pl,2xp2’)
[Qix(n,L24,n,pl,p2) A Qo4(n ,L1k,n,p2,pl1) A n>1] => Qi,(n ,L23,n,p1,p2)
[Qix(n,L24,n,pl,p2) A Qo4(n ,L1k,n,p2,p1) A n<l] => Q),(n ,L25,n,p1,p2)
— Finalization:
[Qi5(n ,L25,n,pl,p2) A Qos5(n,L15,n,p2,pl1)] => Q;(n,n,pl,p2)
[Q)(n,n,pl,p2) A n=0 A (pl xp2) EM] => Q>(n,pl xp2)
[Qi(n,n,pl,p2) A n#0 A (2xp1 xp2) €M] => Q>(n,2 xp] xp2)
Q,(n,p) => [p=249).
Notice that for the whole program we have got 77 verification conditions. Theoretically the number of
sequential verification conditions is linear in the size of the program whereas the number of verification condi-
tions for checking interference freeness grows exponentially with the number and size of processes. The practi-
cal method for avoiding this combinatorial explosion is to make informal proofs and to choose the local invari-
ants of each process as independent as possible of the other processes. In that case most of the interference
freeness checks become trivial.
F.2 Decomposition of the Global Program Invariant Leading to Lamport [1980] Proof Method
F.2.1 Decomposition
Another way to avoid the proliferation of simple verification conditions is to use a coarser decomposition
which consists in associating a global invariant Q; with each process P; of program [P, |...| Py]. Each predicate
Q; may depend upon the values of variables as well as upon program control locations. The correspondence
with induction principle D2 is established along the lines of paragraph E by defining global inductive invariant
I as the conjunction of the global invariants Q; for each process P;:
    I = ρ(Q) = ∧(i=1..k) Qi
(∧(j) Qj) ∧ t => Qi' is possible since t is a disjunction of cases. For each process i, one can distinguish
between a sequential proof (∧(j ∈ Si) Qj) ∧ ti => Qi' (when t corresponds to execution of a basic command
F.2.3 Example
Let us give another proof of program B.2, using process invariants. Obviously, the proof is just a refor-
mulation of F.1.3 using a global invariant for each process instead of local invariants attached to program
points. Since both processes are symmetric, we use the same process invariant for each of them and reason
only on process one.
In order to be able to designate program locations let us introduce:
The central idea of program B.2 is to maintain invariant the following relation:
The other essential observation for the partial correctness proof is that the program can only terminate
when 0 <N <1. To prove this, let us introduce:
For each process, one can choose the following global invariant:
— Initialization.
— The proof of absence of interference of execution of process 2 with the global invariant of process | exactly
amounts to the sequential proof of process 2.
— Finalization:
When compared with F.1.3 the use of a coarser decomposition leads, for that example, to a natural factor-
ization of similar verification conditions.
Reciprocally, a proof using global process invariants can be derived from a proof using local invariants using
representable integer. We will use the selectors <n,p1,p2>[N] = n, <n,p1,p2>[P1] = p1 and
<n,p1,p2>[P2] = p2. The meaning of a description D can be explained in terms of the local invariants Q
considered at paragraph F.1 by the connection Q = γ(D) such that:
Qi(n cn, pj,Pp = Ve ((c-=h) NA (V v €{N,P1,P2 i" Dj (jh) [v] ~l| A v €Dj(jh) [v])).
For example D14(23) = <[1,hi-1],[2,hi],[1,hi]> means that at point L14 of process 1 it is true that
((1 ≤ n ≤ hi-1) ∧ (2 ≤ p1 ≤ hi) ∧ (1 ≤ p2 ≤ hi)) when control is at point L23 of process 2. Reciprocally a set of
local invariants Q can be approximated by D = a(Q) such that
G.3 Fixed Point System of Approximate Equations Associated with a Parallel Program
The verification conditions of paragraph F.1.3 can be written as a system of inequations Q<=V(Q) which
was obtained by decomposition of
In the following presentation of the fixed point system of approximate equations for program B.2, it is assumed
that initially we must have n>0. For each equation we distinguish a term corresponding to the sequential proof
and a term corresponding to the interference check:
Do = <(0,hi),
[li,hi], [li,hi] >
D,,(21) ae Do
D,,(2k) = inter,,(2k)
Dya(2k) = Dyy(2k)
(pl:U1,1]V intery,(2k) a4s ;
D,3(2k) = (Dy2(2k) A <([2,hil, [li,hil, [li,hi] >) et ees
V (Dy4(2k) A <([2,hil,[li,hil, [li,hi] >) V inter,;(2k)
Dy4(2k) = <Dj3(2k)(n)—1, 2XD)3(2k) (pl), D)3(2k) (p2) > V inter,4(2k)
D,5(2k) = (D,)(2k) A <[li,1],Ui,hil, [li,hi] >) scsi
V (Dy4(2k) A <([li, 1), [li,hi],[li,hi] >) V inter,4(2k)
where
inter_11(21) = ...
inter_11(22) = (D_11(21) ∧ D_21(11))[p2:[1,1]]
inter_11(23) = (D_11(22) ∧ D_22(11) ∧ <[2,hi], [li,hi], [li,hi]>) ∨ (D_11(24) ∧ D_24(11) ∧ <[2,hi], [li,hi], [li,hi]>)
The convergence can be accelerated using the Cousot and Cousot [1976, 1977] extrapolation techniques. This consists in defining a widening operation ∇ such that:
⊥ ∇ x = x
[a,b] ∇ [c,d] = [if c < a then li else a, if d > b then hi else b]
and then solving iteratively. The result we have obtained (for process 1) is:
[Table: the computed descriptions for process 1, an array of interval triples <N, P1, P2> indexed by the control points of the two processes.]
Notice that the decomposition is approximate enough to allow a computer implementation of this kind of analysis. However, as shown by the above example, the results of such approximate analyses can be useful since we obtain
D = <[0,1], [1,hi], [1,hi]>
which proves that N ∈ {0,1} on exit of the parallel command of program B.2, a result which is not trivial to obtain by hand.
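To make the acceleration concrete, the following is a minimal Python sketch of interval widening and of a fixed point iteration accelerated by it; the bounds LI and HI, the transfer function f, and the encoding of ⊥ as None are assumptions chosen for illustration, not the formulation used in this chapter.

    LI, HI = -1000, 1000            # assumed smallest and greatest representable bounds

    def widen(old, new):
        # bottom (None) widened with x gives x
        if old is None:
            return new
        if new is None:
            return old
        (a, b), (c, d) = old, new
        # an unstable bound is pushed directly to LI or HI so the iteration stabilizes
        return (LI if c < a else a, HI if d > b else b)

    def iterate_with_widening(f, start=None):
        x = start
        while True:
            nxt = widen(x, f(x))
            if nxt == x:
                return x
            x = nxt

    # assumed transfer function: it keeps raising the upper bound of the interval
    f = lambda x: (0, 1) if x is None else (x[0], x[1] + 1)
    print(iterate_with_widening(f))   # (0, 1000): the unstable upper bound is widened to HI

The widened bound is deliberately coarse; this is the trade of precision for guaranteed convergence that makes the analysis implementable.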
References
Cousot P. [1977]
P. Cousot, "Asynchronous iterative methods for solving a fixed point system of monotone equations in a complete lattice," Research Report No. 88, IMAG, University of Grenoble, France (Sept. 1977).
Cousot P. [1979]
P. Cousot, "Analysis of the behavior of dynamic discrete systems," Research Report No. 161, IMAG, University of Grenoble, France (Jan. 1979).
Cousot P. [1981]
P. Cousot, "Semantic foundations of program analysis," in Program Flow Analysis: Theory and Applications, S.S. Muchnick and N.D. Jones (eds.), Prentice-Hall, Inc. (1981), pp. 303-342.
Cousot R. [1981]
R. Cousot, "Proving invariance properties of parallel programs by backward induction," Research Report CRIN-81-P026, Nancy, France (March 1981), to appear in Acta Informatica.
Floyd [1967]
R.W. Floyd, "Assigning meanings to programs," Proc. Symp. in Applied Math., Vol. 19, AMS, Providence, RI (1967), pp. 19-32.
Hoare [1969]
C.A.R. Hoare, "An axiomatic basis for computer programming," C.ACM 12, 10 (Oct. 1969), pp. 576-580, 583.
Keller [1976]
R.M. Keller, "Formal verification of parallel programs," C.ACM 19, 7 (July 1976), pp. 371-384.
Lamport [1977]
L. Lamport, "Proving the correctness of multiprocess programs," IEEE Trans. on Soft. Eng. SE-3, 2 (March 1977), pp. 125-143.
Lamport [1980]
L. Lamport, "The 'Hoare Logic' of concurrent programs," Acta Informatica 14 (1980), pp. 21-37.
Naur [1966]
P. Naur, "Proof of algorithms by general snapshots," BIT 6 (1966), pp. 310-316.
CHAPTER 13
A. Pnueli
The Weizmann Institute of Science
Rehovot, Israel
N. S. Prywes
University of Pennsylvania
Philadelphia, PA 19104
R. Zarhi
The Weizmann Institute of Science
Rehovot, Israel
The research reported here was supported by the Information Systems Program, Office of Naval Research, Contract No. N00014-76-C-0416.
variables. The user need not (and cannot) include statements for the reading and writing of external variables
in his specification. Conditions are allowed within equations, and so is the definition of an array variable by a
subscripted equation which is interpreted as an individual equation for each element of the array obtained by
varying the subscripts over the defined range of the array.
Apart from these two conventions, the user has no way of exerting direct control such as conditional statements or looping. The translation system derives the need for loops from subscripted equations and the data declarations which specify the dimensions of the arrays.
As an example of a MODEL specification, consider the task of computing the smallest power of 2 which
is larger than 10000. A possible specification for it might be
G IS GROUP (A(*))
A IS FIELD
RESULT IS FIELD
For a more detailed description of the MODEL language and system we refer the reader to Prywes et al.
[1979].
An obvious event in the execution of this specification is the computation of the value of C from the
values of A and B. Less obvious but still necessary events are the reading of the values of A and B off some
external medium and then that of writing the value of C onto some other external medium. The execution of
these separate events is constrained by the natural precedence requirements that an external variable must be
read before its value is used, that all the arguments of an expression must be evaluated before the expression
itself can be evaluated. In the simplest cases these precedence or dependency constraints can be expressed by a
simple graph:
Here the nodes labelled by variable names signify the read and write operations respectively associated with these variables. If we sort this directed graph topologically, i.e. order its nodes in any linear order which conforms with the constraints, we obtain a schedule of events in the execution which can immediately be transformed into the simple program:
READ B
READ A
C:= A+B
WRITE C
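As a small illustration of this step, the following Python sketch topologically sorts such a dependency graph into a schedule; the event names and the dictionary encoding of the graph are assumptions chosen to mirror the example above, not the MODEL translator's internal representation.

    from graphlib import TopologicalSorter   # standard library, Python 3.9+

    # each event is listed with the set of events that must precede it
    deps = {
        "C := A + B": {"READ A", "READ B"},
        "WRITE C":    {"C := A + B"},
    }

    schedule = list(TopologicalSorter(deps).static_order())
    print(schedule)   # e.g. ['READ A', 'READ B', 'C := A + B', 'WRITE C']

Any linear order satisfying the constraints is acceptable, which is why the reads of A and B may appear in either order.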
In the more general case, variables are structured and simple edges do not convey the exact interdepen-
dency between the individual elements. Thus considering the specification above for computing the first power
of 2 exceeding 10000 we obtain the following dependency graph:
Without deeper analysis of the dependencies, one may conclude from this graph that A depends in gen-
eral on itself, and therefore this specification cannot be scheduled. This indeed would have been the case if we
had slightly changed the expression defining A to read
Consequently we introduce the notion of an array graph, in which nodes stand for arrays of individual elements
and edges represent dependencies which may be subscripted. The problem of translating a specification reverts
therefore to the scheduling of array graphs. Indeed the basic operation of the translator for the MODEL sys-
tem proceeds in three main phases. In the first phase, the statements of equations and declarations are
analyzed and the execution events associated with each statement identified. The main output of the first phase
is an array graph representation of the specification. The second phase of the translation consists of the
scheduling of the array graph. As will be seen below, this scheduling is tightly coupled with loop identification
and construction. The last phase transforms the schedule produced above into a PL/I program, performing
essentially a one to one translation of the scheduled events into the respective code for their execution. In this
chapter we chose to concentrate on the scheduling algorithm and, in order to simplify the presentation, apply it
to a simple set of equations defining rectangular arrays. One should bear in mind, though, that these equations
could actually represent arbitrary precedence constraints between arbitrary events in a specification, i.e. are, in
fact, an equational representation of an array graph. Thus, the method presented here is actually the basic algo-
rithm used in the MODEL translator for scheduling the array graph representation of the specification.
C. Equational Specifications
In the following sections we address ourselves to the problem of translating an Equational Specification
into an equivalent working program.
Typically an equational specification is given by a set of equations of the form:
This set of equations defines (recursively) the values of the elements of the possibly infinite arrays A_1, ..., A_m.
The idea is certainly not new. These equations, sometimes representing dependency relations, naturally arise in languages such as Data Flow languages (Arvind and Gostelow), Lucid (Ashcroft et al. [1976]), etc.
On the other hand, for anyone familiar with Functional Languages such as LISP, Backus’ languages, etc., this is
a special case of a mutually recursive set of function definitions when the arguments are restricted to be natural
numbers.
There are several reasons for staying within the limitations of recursively defined arrays rather than
extending to general functions:
1. The area of applications we have in mind is that of Data Processing usage. In this area, arrays and structures in general play important roles and the direct specification of relations between input structures and output structures seems both natural and adequate.
2. Restricting the arguments of the defined entities to be integers and imposing syntactical restrictions on
the expressions that appear as subscripts enable us to perform a much deeper analysis of the situation and
hope for a compiling system rather than an interpreting system.
In fact, the subject of this paper is the study of these simplifying assumptions about the language which
will enable us to propose a compiling system for its execution.
It is a trivial matter to show that such an array definition language, supplemented with the special function "LEAST", is indeed universal in the sense that every computational task can be programmed in it:
Z = h(Y(I))
In the case that some of the y variables are structured, the notation Y(I) calls for the addition of one
more dimension.
The main question we would like to investigate in this paper is under what conditions can we produce a
compiled program for the execution of a given array specification.
Before we formulate this question more precisely, we would like to point out two possible pitfalls that one
must beware of, in considering this question.
The first is that obviously, there exists a universal interpreter for this language. The interpreter operation
can be described as a continuous search for equations which can be applied to quantities which are either input
quantities or already defined array components, and yield a definition for some new array components. Once
such an equation is found, the new component is assigned its value, and is flagged as defined. Systematically
persisting in this search, the interpreter must either reach a state in which all the output quantities are defined,
in which case it can stop, or loop forever. The latter case implies that the specification was inadequate for the definition of all the output variables.
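A minimal Python sketch of such a search-based interpreter is given below for scalar quantities; the dictionary encoding of equations and the bound on the number of passes are assumptions made for illustration (array components would be handled analogously, one element at a time).

    def interpret(equations, inputs, outputs, max_passes=10000):
        """equations maps a target name to (argument names, function).
        Repeatedly applies any equation whose arguments are already defined."""
        defined = dict(inputs)                        # the input quantities are given
        for _ in range(max_passes):                   # guard against searching forever
            if all(o in defined for o in outputs):
                return {o: defined[o] for o in outputs}
            progress = False
            for target, (args, fn) in equations.items():
                if target not in defined and all(a in defined for a in args):
                    defined[target] = fn(*(defined[a] for a in args))
                    progress = True
            if not progress:
                return None                           # inadequate specification
        return None

    # tiny usage example for the specification C = A + B
    eqs = {"C": (["A", "B"], lambda a, b: a + b)}
    print(interpret(eqs, {"A": 2, "B": 3}, ["C"]))    # {'C': 5}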
Such an interpreter could indeed be presented as a solution to the general case. But if we insist on having an efficient compiled program, we should rule out the possibility of this trivial solution.
Another complicating factor is that it is easy to devise examples of simple specifications, for which the derivation of the optimal program seems nontrivial.
Consider for example the specification:
Example 1:
Example 2:
A(25):=X;
Example 3:
else A(I+1) +2
We leave it to the interested reader to a) verify that this specification is adequate for defining a value for
each A(I), 1 <I < 100, and b) to derive an efficient loop-while program for the computation of its elements.
It can easily be shown that the general problem of adequacy of the specification, being equivalent to the
totality of a number theoretic function given by a recursive definition, is of course undecidable.
In order to avoid the trivial solution of the interpreter on one hand, or getting into undecidable analysis
on the other hand, we choose to formulate our problem in schematic terms presenting an abstract model whose
solution is more manageable.
A_i(I_1, ..., I_{d_i}) = τ_i
where each I_k is a subscript. Its appearance in τ_i must satisfy k ≤ d_i. The I_k, 1 ≤ k ≤ d_i, must be disjoint, i.e. if i ≠ j then I_i ≠ I_j.
These specifications are identical except for the instance of A3 in f_1. However, that small difference in fact makes S1 unschedulable while S2 is schedulable.
where some disjoint loop variables have been substituted for the free subscripts I_1, ..., I_d.
We define the notion of an L-block for L a finite set of loop variables. L specifies the loop variables
which are still free in the block.
2. Execute the program in the conventional way. All the array elements are initialized to ⊥, and so is interpreted any reference to A_i(..., I_k, ...) for I_k > N. Since all the concrete functions are total over (Z_⊥)^m and (D_⊥)^m respectively, each statement yields a value (possibly ⊥) which is assigned to the corresponding array element.
This computation terminates and yields values over D_⊥ for A_i(1..N, ..., 1..N). We extend E_N(A_i) by defining E_N(A_i)(..., J, ...) = ⊥ for any J > N.
An immediate consequence of the continuity of the interpretation is that for any N1 > N2 we have E_{N1}(A_1, ..., A_n) ⊒ E_{N2}(A_1, ..., A_n), by which we mean E_{N1}(A_i) ⊒ E_{N2}(A_i) for every 1 ≤ i ≤ n.
Thus E_N(A) forms a chain for N = 1, ... and we define E^P(A) = ⊔_N (E_N(A)).
Intuitively this definition is equivalent to:
A^P(k_1, ..., k_d) = u ∈ D iff for some N the N-execution assigns the value u to A_i(k_1, ..., k_d).
A^P(k_1, ..., k_d) = ⊥ iff every N-execution assigns the value ⊥ to this component.
A program P is said to realize the specification S if for every interpretation E^P(A) = μA.τ(A), i.e., the functions (arrays) A_1, ..., A_n as defined by the limit of N-executions are identical to the least fixpoint of the interpreted specification.
Thus a possible realization of the specification S2 is:
for k do
for m do A1(k,m) := f1(A1(k,m−1), A3(k−1,m)) end;
A2(k) := f2(A1(k,g));
for m do A3(k,m) := f3(A1(k,m), A2(k)) end
end
Problem: Given a specification, does there exist a program realizing it? If the answer is positive, we would like
to also be able to produce such a program.
Because of the high level of abstraction in our problem, our space of candidate programs is actually lim-
ited.
We say that a program P is a schedule of a specification S , if all the statements in P are substitution
instances of the equations in S.
A(I,J) = F(I,J; A(I−1,J), A(J−1,I))
B(I,J) = A(J,I)
Then the following is a consistently subscripted specification which extends the original one:
This transformation can be applied to the general case to produce a subscript consistent specification.
The algorithm for scheduling a specification in normal form uses a labelled dependency graph which
represents the dependencies between elements in the specification. For a given specification S , the depen-
dency graph Gg is defined as follows:
G_S = (V_S, E_S, λ_S) where:
V_S — The set of nodes. Each node corresponds to an array variable (or the equation defining it).
E_S — The set of edges. For each instance of the array A_i in τ_j (the defining expression for A_j), there is an edge leading from node i to node j, implying that elements of A_j depend on elements of A_i, i.e., some elements of A_i have to be computed before the elements of A_j can be computed.
λ_S — For each edge e = (i,j) ∈ E_S, λ(e), the label for e, indicates which elements of A_j depend on which elements of A_i. For an edge e = (i,j) which represents dependency of A_j(I_1,...,I_d) on A_i(J_1,...,J_d), the label λ(e) = (λ_1(e),...,λ_d(e)) is defined by:
λ_k(e) = c if J_k = I_k − c
Let G be the component. If G consists of a single node n with no edges then P(G) = n.
Otherwise —
Locate a position j such that the index I_j is still free, i.e., has not been bound by a loop, and such that for all edges e in G, λ_j(e) ≥ 0. Construct the modified graph G′ obtained from G by deleting all edges e with λ_j(e) > 0.
For I_j ≥ 0 do
  P(G′)
[Figure: two example array graphs, with edge labels (0,0) and (0,1), and the FOR I ≥ 0 DO ... END I schedules generated for them.]
FOR I ≥ 0 DO
  FOR J ≥ 0 DO
    A(I,J) = F(A(I−1,J), B(I,J−2))
  END J
  FOR J ≥ 0 DO
    B(I,J) = G(B(I,J−1), A(I,J))
  END J
END I
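The scheduling of an array graph can be sketched in Python roughly as follows; the edge encoding (one integer label per subscript position, an edge (i, j, labels) meaning that A_j depends on A_i), the example graph, and the omission of the decomposition into strongly connected components are all simplifying assumptions, so this is only one reading of the technique, not the MODEL translator's algorithm.

    from graphlib import TopologicalSorter

    def schedule(nodes, edges, free):
        """nodes: array names; edges: (src, dst, labels) with one integer label per
        subscript position; free: subscript positions not yet bound by a loop."""
        if edges and any(any(lab) for *_, lab in edges):       # some nonzero label remains
            for j in free:
                if all(lab[j] >= 0 for *_, lab in edges):      # admissible free position
                    remaining = [e for e in edges if e[2][j] == 0]   # delete positive edges
                    body = schedule(nodes, remaining, [p for p in free if p != j])
                    return ["FOR I%d DO" % j, body, "END I%d" % j]
            raise ValueError("no admissible free position: cannot be scheduled")
        # only same-index (all-zero) dependencies remain: order the bodies topologically
        deps = {n: {s for s, d, lab in edges if d == n} for n in nodes}
        return ["compute " + n for n in TopologicalSorter(deps).static_order()]

    # a small hypothetical graph: A depends on A one step back in position 0 and on B
    # two steps back in position 1; B depends on B one step back in position 1 and on
    # A at the same indices
    edges = [("A", "A", (1, 0)), ("B", "A", (0, 2)),
             ("B", "B", (0, 1)), ("A", "B", (0, 0))]
    print(schedule(["A", "B"], edges, [0, 1]))
    # ['FOR I0 DO', ['FOR I1 DO', ['compute A', 'compute B'], 'END I1'], 'END I0']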
G. Generalization
A deeper investigation of these problems is reported in Pnueli and Zarhi [1981]. The analysis there considers a significant generalization of the framework and the results presented here. In the more general framework a subscript expression on the right hand side may assume one of the following forms: I_k−c, I_k+c for a nonnegative c ≥ 0, and g_k(I). We may thus have a specification of the following form:
A(I,J) = F(B(I+1,J−1))
B(I,J) = G(B(I,J−1), A(I−1,J+2))
The main algorithm presented there (based on the idea of the algorithm presented here) checks whether a
given specification is realizable and produces a program for realizable specifications. The algorithm described in
Theorem 1 has to be extended as follows:
Corresponding to each dependency of the form
we construct an edge with the label −c in position i. The labels now may have negative values (not only −∞ as in the case described here). In looking for a free index (step 2) we insist now on identification of a subscript
position such that the sum of labels (in this position) on every cycle in the dependency graph must be nonnega-
tive. A negative cycle implies a dependency of an array element on another element of the same array with
higher subscript. This for example will emerge in the analysis of an equation such as A(I) = f(A(I+1)) .
Naturally with our convention of increasing loops such specification cannot be realized. The more general case
where constant subscripts c ≥ 0 are also allowed (and consequently an array may be defined by several equations) is discussed. The problem whether a given specification in this extended form is realizable is proved to
be undecidable.
References
Hoffman [1978]
C.M. Hoffman, "Design and correctness of a compiler for a nonprocedural language," Acta Informatica 9 (1978), pp. 217-241.
Prywes [1978]
N.S. Prywes, "MODEL II: Automatic program generator user manual," Office of Planning and Research, Internal Revenue Service, TIR-77-41 (July 1978), available from CIS Dept., University of Pennsylvania.
Shastry [1978]
S. Shastry, "Verification and correction of a nonprocedural specification in automatic generation of programs," Ph.D. Dissertation, CIS Dept., University of Pennsylvania (May 1978).
Shastry et al.
S. Shastry, A. Pnueli and N.S. Prywes, "Basic algorithms used in the MODEL system for design of programs," Moore School Report, CIS Dept., University of Pennsylvania.
Wadge [1979]
W. Wadge, "An extensional treatment of dataflow deadlocks," Proceedings, Semantics of Concurrent Computation, Evian, France (1979), Springer-Verlag.
CHAPTER 14
Alberto Pettorossi
Istituto di Analisi dei Sistemi ed Informatica del C.N.R.,
Via Buonarroti 12, 00185 Roma
Abstract
We present an algorithm for improving the memory efficiency of applicative recursive programs. It is based on the method of adding "destructive annotations" to programs as suggested in (Pettorossi [1978]; Pettorossi [1979]; Schwarz [1977]). Memory utilization is improved by the overwriting of cells where intermediate results that are no longer useful are stored.
The time complexity of the proposed algorithm is also studied.
A. Introduction
Among other methodologies for writing correct and efficient programs, "program transformation" (Burstall and Darlington [1977]; Darlington and Burstall [1976]) has been of great interest over the past years. The basic idea of this approach, in contrast, for example, with the methodology of stepwise refinement (Dahl et al. [1972]; Wirth [1971]), is that the programmer is first asked to be concerned with the program correctness
and, only at later stages, with efficiency considerations. The original version of the program, which can be
easily proved correct, is transformed (perhaps in several phases) into a program which is still correct, because
the strategies used for the transformation preserve correctness, and it is more efficient because the computa-
tions evoked by the final program save time and/or space.
Several papers have been published about this methodology concerning: (i) systems for transforming pro-
grams (Bauer et al. [1977]; Burstall and Darlington [1977], Feather [1978]), (ii) various strategies for directing
the transformations (Bauer et a/. [1978]; Chatelin [1977]; Partsch and Pepper [1977]; Pettorossi [1977]), and
(iii) some theories for proving their correctness (Huet and Lang [1978], Kott [1978]). The list of references is
not to be considered exhaustive. For a more extensive bibliography one may refer to Burstall and Feather
[1979].
Unfortunately for the program transformation methodology, a general framework in which one can prove that transformations improve efficiency while preserving correctness is not fully available yet. A first step in this direction was taken in Wegbreit [1976]. We have already some partial results and we know that, under given hypotheses, the tupling strategy is a way of reducing time of computation (Burstall and Darlington [1977]) and also the "time × memory product" requirements (Pettorossi [1977]). Other strategies as well, such as the "generalization strategy" or the "where-abstraction introduction", can improve the running time efficiency.
In this paper we are concerned with the problem of reducing memory requirements in computations
evoked by applicative recursive programs, like those studied in (Burstall and Darlington [1977]). In the next
section we present the method used and in sections C and D we develop an algorithm which implements it.
Possible improvements and directions of further research are outlined in the last two sections.
A computation history is given in fig. 1. (We did not represent the evaluation of the predicate of the if-then-
else construct. The notation pred(n) stands for the predecessor of n.)
[Fig. 1: the computation history, shown as Control and Memory columns.]
Obviously one could improve the memory utilization by giving to the interpreter the information that it
could put the result of mult(m,n) where m was and the result of pred(n) where n was. Doing that, we would
have required 2 memory cells instead of 4 (in general 2 cells instead of 2(n+2) when computing fact(n)) (see
fig. 2).
As a first step we therefore need to extend the language in which programs are written so that such
extra information about the destruction of arguments can be suitably represented. We decided to associate to
each basic function (i.e. a function which is not explicitly defined in the applicative recursive program, but
immediately executable by the interpreter) and each if-then-else construct a binary k-tuple. If the i-th com-
ponent of the k-tuple is 1 the interpreter may reclaim the cell used by the value of the i-th argument of the
basic function for storing a new value, which will be computed later on. If it is 0, no reclaim is possible, at
least at the moment of the evaluation of that basic function. For example pred<1>(n) means that the result of pred(n), namely n−1, may be stored in the same cell where n was. We associate with the if-then-else construct a 1-tuple (and not a 3-tuple) denoting the possible destruction of the term of the predicate only (see Pettorossi [1978a]). Therefore we could rewrite program P1.1 as follows:
and the computation of fact(1) according to program P1.2 is like the one in fig. 2.
[Fig. 2: the computation of fact(1) according to program P1.2, shown as Control and Memory columns.]
Notice that the information we give to the interpreter is a "static" one. In other words, it is not given "at run time", but once and for all before execution, and it must be valid for any input value. Since we give "static" information we do not achieve a memory efficiency which is always optimal (as one can see in other examples), but we have the great advantage of being able to "compute" the destructive annotations "off-line".
The fact that the given information is "static" is the reason that we associate a 1-tuple to if-then-else constructs (and not a 3-tuple, as one could expect, if-then-else being a function with 3 arguments). Since a priori we cannot know whether the evaluation of the predicate yields true or false, we should always keep the values of both the left and the right arm of the conditional. So, instead of using the 3-tuple <x00>, we use the 1-tuple <x> for x = 0,1.
An important feature of the destruction phenomenon needs to be underlined: "destruction is propagating".
In the example we just gave, fact(n) is a non-basic function, therefore it has no destructive annotations; nevertheless it destroys its argument n. It is clear from fig. 2 that the value of n, which is 1, is then changed into 0 by pred<1>(n). Since, in general, functions are recursively defined, we have to make sure that the propagation of the destruction is such that correctness is always preserved. In Pettorossi [1978] we studied this
tructive annotations.
Notice also that correctness of the annotations depends on the semantics we adopt for the language in
which we write our applicative recursive programs. Again it is clear from the given example that, for a correct
evaluation of fact(n), it is necessary that pred<1>(n) is performed after mult<10>(m,n). Otherwise the value of n which mult will multiply by m cannot be the correct one. The destructive semantics D we consider is based on the following 3 basic rules:
We are going to have two semantics functions: S and D. S is a "standard" semantics and deals with usual programs, i.e. the non-annotated ones, and D is a "destructive" semantics which deals with programs which are annotated.
Given a program P, and an algorithm Markprog for synthesizing destructive annotations (or destructive
markings, as we call them), the basic facts to be shown are the following ones:
[Fig. 3: a diagram relating programs P to marked programs MP via Markprog, and their semantics S and D over the data values.]
In our previous papers (Pettorossi [1978a] and [1978b]) we formally defined the S and D semantics, we
proved some theorems and we introduced some heuristics which validate the Markprog algorithm we
are going to present here. Indeed it can be shown that, for our algorithm, under given hypotheses, the
diagram of fig. 3 commutes and the memory cells are more efficiently used.
where k is a binary k-tuple. Marked-programs are functions from F to Marked-terms. Each pair <f,mt> belonging to a marked-program mprog such that mprog(f) = mt is written as f(x1,...,xn) <= mt.
The standard semantics S: Terms → Env → Programs → V for terms in L is:
S[g(t1,...,tn)](r)(prog) = B(g)(S[t1](r)(prog), ..., S[tn](r)(prog))
where Env denotes the environments, i.e., functions from X to V; B: G → V^n → V gives the semantics for symbols in G; and COND is the semantic conditional.
For a more complete and detailed definition the reader may refer to Pettorossi [1978]. The definition of the
destructive semantics D is a bit complicated (Pettorossi [1978]). About D we need only to know that the
evaluation of a marked-term like g<b1 b2>(mt1,mt2) proceeds as follows:
mt1 is evaluated first yielding the value v1, then mt2 is evaluated yielding the value v2, and eventually the value of g(v1,v2) is stored in a new location, with the extra effect that the location where the value of mti was stored, for i = 1, 2, is not certain to contain vi any more, if bi = 1.
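As an illustration of this behaviour, here is a small Python sketch of an evaluator for marked terms over a flat store of cells; the tuple encoding of terms, the store model, and the policy of reusing a released cell for the result are assumptions chosen for illustration and are far simpler than the destructive semantics D of Pettorossi [1978].

    store = []                                 # the memory: a growing list of cells

    def alloc(v):
        store.append(v)
        return len(store) - 1                  # index of the newly allocated cell

    def d_eval(term, env):
        """term is a variable name (env maps names to cell indices) or a tuple
        (g, marks, arg1, ..., argn) where marks is a tuple of 0/1 destruction flags."""
        if isinstance(term, str):
            return env[term]
        g, marks, *args = term
        cells = [d_eval(a, env) for a in args]              # evaluate arguments left to right
        value = g(*(store[c] for c in cells))
        freed = [c for c, m in zip(cells, marks) if m == 1] # cells released by the annotation
        if freed:
            store[freed[0]] = value                         # overwrite a no-longer-needed cell
            return freed[0]
        return alloc(value)

    # mult<10>(m, n): the result may overwrite the cell holding m
    env = {"m": alloc(6), "n": alloc(3)}
    c = d_eval((lambda a, b: a * b, (1, 0), "m", "n"), env)
    print(store[c], len(store))                # 18 2  -> no new cell was needed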
We will give the algorithm Markprog for synthesizing destructive annotations using the simplifying
hypothesis that all variable values are independent, i.e. when we destroy the value of one variable we do not
affect the value of any other variable. This is indeed a simplifying hypothesis, because when dealing with com-
posite data structures, such as lists or trees, one may have variables denoting subvalues of others (for example
y may denote the tail of the list denoted by z). Nevertheless the ‘‘independent variables’’ hypothesis is the first
step one has to make as an initial approach to the theory of destructiveness, and it is not too restrictive because
it still allows us to deal with data structures like numbers, arrays, etc. and also composite data structures when
sharing of values is not present.
In order to be more specific, let us now introduce another example to which we will refer for explaining
how the algorithm works. We consider the following program P2.0 for computing 2^x1:
where all k-tuples are undefined, i.e. made out of φ's. Then the algorithm changes each φ into either 1 or 0, denoting destruction or preservation of the value of the corresponding argument, according to the fact that destroying that value does or does not keep the diagram of fig. 3 commutative. Since we look for the highest memory efficiency, we have to try to transform all φ's into 1's. We may write a first version of the Markprog algorithm as follows:
A theorem given in Pettorossi [1978] guarantees the commutativity of diagram of fig. 3 when correctness
and consistency, i.e. two properties defined according to the destructive semantics D, hold. In order to make the
checking of the commutativity of the diagram much easier and faster, we would like to have those properties
locally testable, i.e. testable looking only at one recursive function definition, like f(x1,...,xn) <= mt, at a time. For that purpose we introduce the following notion of destructive characterization of recursive functions. Given a marked program mp ∈ F → Marked-terms, a destructive characterization d is a function from F to 2^X where X = {x1,...,xn}. The informal semantic interpretation of such a function d is that if f ∈ F has arity n and d(f) = {xi1,...,xik} where the xij's are elements of X then, after the evaluation of any term of the form f(mt1,...,mtn), the values of the marked terms mti1, ..., mtik are destroyed.
Example 1. Given d(f) = {x2}, i.e. f destroys its second argument, after the evaluation of f(x1, succ<φ>(x2)) in program P2.1, the value of succ(x2) is destroyed. □
Now we can informally define the correctness of a markedterm mt, w.r.t. a given destructive characterization
d. We say that mt is correct w.r.t. d if, when we evaluate mt according to D and we assume the recursively
defined functions occurring in mt to be destructive as specified by d, we get for mt the same value as
S[t] (r) (prog), i.e. the value which can be obtained using the undestructive semantics.
Example 2. Given d(f) = {x2}, mt1 = if<1> minus<00>(x1,x2) = 0 then 1 else doubleof<1>(f(x1, succ<0>(x2))) is correct w.r.t. d, because no useful value is destroyed; but mt2 = if<1> minus<10>(x1,x2) = 0 then 1 else doubleof<1>(f(x1, succ<0>(x2))) is not correct w.r.t. d, because when we evaluate minus(x1,x2) we destroy x1 which is needed for the evaluation of the right arm of the conditional. Similarly a term like mult<00>(f(succ<0>(x1),x2),x2) is not correct w.r.t. d, because after the evaluation of f, x2 has been destroyed, and it will no longer be possible to evaluate correctly the second argument of mult. □
We also say that a marked term mt, such that f(x1,...,xn) <= mt, is consistent w.r.t. a destructive characterization d if inspect(mt,d) ⊆ d(f), where inspect(mt,d) returns the subset of {x1,...,xn} which contains any variable xi which will be destroyed during the evaluation of mt according to D, when the recursively defined functions destroy their arguments according to d.
In other words, we may also say that mt is consistent w.r.t. d if the variable destruction, performed in
evaluating mt, is not ‘‘greater’’ than the destruction we assume for the corresponding recursively defined func-
tion f.
Example 3. Given d(f) = {},
prog(f) = mt = if<1> minus<00>(x1,x2) = 0 then 1 else doubleof<1>(f(x1, succ<1>(x2)))
Markprog/2:
{every marked term in the marked program, where all φ's are considered as 0's, is correct and consistent w.r.t. d};
It is easy to see (Pettorossi [1978]) that the invariant is true after initialization and it remains true for each
iteration of the while-loop. In what follows we consider the following two functions as primitive. They are:
They check whether or not correctness and consistency hold. They also depend on the definition of the seman-
tics D and any reasonable definition of D determines a suitable coding of them (and vice versa, according to
theorem 2, Pettorossi [1978], any coding defines a D such that the diagram of fig. 3 commutes). We hope that
the rules we mentioned at the end of the previous section, and the examples we have given in this section, can
convey to the reader the definition of D we have in mind, so that he can anticipate the behavior of the func-
tions correct and inspect in the examples we are going to give. The interested reader may also find in Pettorossi
[1979] the NPL programs for correct and inspect, which correspond to the semantics D formally given in Pet-
torossi [1978].
validity of the invariant, we have to check that all marked terms mtj’s of the given marked program, where fi
occurs, are correct and consistent w.r.t. the new destructive characterization d’. As usual, f[v|x] denotes a
function which is identical to f except that f(x) =v.
Obviously this process of checking correctness and consistency for all mtj's w.r.t. d' can be performed by a "recursive call" of the body of the loop we are describing, which indeed has to check correctness and consistency of mti1 w.r.t. d. This recursive call is a tail-recursive call because what one has to do at the end of a call is either to evoke a new call or to stop the overall process by (i) replacing mti by mti0 or mti1, and (ii) respectively keeping d unchanged or replacing it with its last computed value d'. Therefore we can use an inner while-loop to implement this recursion, using the following two extra variables:
i) a set, say ‘‘functset’’, to remember all function symbols fj’s for which we have still to check correctness
and consistency of the corresponding marked term,
ii) a temporary variable d for storing the initial value of the destructive characterization.
Markprog/3:
od;
if t then mprog, d := mprog[mti0/fi], d
else mprog := mprog[mti1/fi]
od
In Markprog/3: (i) mti0 and mti1 denote the marked term mti where the φ which has been found in it is changed into 0 or 1 respectively, (ii) g[v/i] denotes the function g where g(i) = v and all its other values are unchanged. Obviously a correctness proof of Markprog/3 requires the formal specification of the semantics D, of the functions correct and inspect and of the notion of maximal memory efficiency. It would take too much space here to do so, and we leave it to the reader to convince himself of the validity of the algorithm recalling that:
and again d(twoto) = {}, d(f) = {}. The correctness and the consistency of the terms in P2.2 rely on the fact
that no variable value is destroyed. Since xl and x2 are necessary for the evaluation of the right hand side of
the conditional, it is easy to see that during the following two executions of the while-loop body corr is false
and we get:
and d is still unchanged. Again, changing doubleof<φ>(...) into doubleof<1>(...) does not create any problem,
because the variable values are not destroyed and correctness and consistency are guaranteed. We have there-
fore:
and d(twoto) = {} and d(f) = {}. When succ<φ>(x2) is changed into succ<1>(x2), corr is true, but cons is false because inspect(if<1> minus<00>(x1,x2) = 0 then 1 else doubleof<1>(f(x1, succ<1>(x2))), d) = {x2} while d(f) = {}. Since f occurs in mprog(f), we have again functset = {f}. Now corr and cons are both true because d has been updated: we have in fact d(twoto) = {} and d(f) = {x2}. Since functset = {}, the while-loop stops and we have the following final program:
The reader can easily check the improvements on memory utilization one can obtain using program P2.5
instead of program P2.0. In particular succ and doubleof may overwrite the memory cells where the values of
their arguments are stored.
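Stripped of the inner while-loop over functset and of the separate corr and cons checks, the greedy core of the marking process traced above can be sketched in Python as follows; the oracle acceptable(), standing for the combined correctness and consistency test, is an assumed placeholder, so this is only an illustration of the idea, not Markprog/3 itself.

    def mark_greedily(marks, acceptable):
        """marks: the list of annotation slots, each initially the undefined value 'phi'.
        acceptable(marks): assumed oracle; it must hold when every remaining 'phi' is
        read as 0.  Each slot is tried as 1 and kept only if the oracle still holds."""
        for i, m in enumerate(marks):
            if m != "phi":
                continue
            trial = marks[:i] + [1] + marks[i + 1:]
            marks = trial if acceptable(trial) else marks[:i] + [0] + marks[i + 1:]
        return marks

    # hypothetical oracle allowing at most one destructive annotation
    print(mark_greedily(["phi", "phi", "phi"], lambda ms: ms.count(1) <= 1))   # [1, 0, 0]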
Example 5.
i) if x2=0 then g(x2,succ(x1)) else h(x1) is an applicative if-then-else.
ii) if x2=0 then g(x2,succ(x1)) else x2 is not an applicative if-then-else because x2 is not an application. □
Heuristic 1. A basic function can be destructive in any argument which is an application or an applicative
if-then-else.
Heuristic 2. In if<φ> mt0 = 0 then mt1 else mt2, φ can be 1 if mt0 is an application or an applicative if-then-else.
Heuristics 1 and 2 are valid because the value of an argument which is an application or an applicative if-
then-else, is used only once. On the contrary, variable values which are passed by pointers can be used several
times and by different functions: this is why heuristics 1 and 2 cannot be applied for variables.
Heuristic 3. In if mt0 = 0 then mt1 else mt2, mt0 cannot destroy the variables needed for evaluating mt1 and mt2.
Example 6. Applying heuristics 1, 2 and 3 to the program of Example 4 the marking algorithm could have started directly from program P2.4, saving various executions of the outer while-loop body of Markprog/3.
ACK X2) x3) <= Ne gogo (X12 (2 pegs (x2) <d 24507) x3)
If we choose to change the k-tuple <φφφ> of h into <1φφ>, since in the body of f2 the function f1 is called with the third argument equal to the first one and variable passing is by reference, destruction of the first argument of f1 implies also destruction of its third argument. Analogously, if we choose to change that same k-tuple into <φφ1> we destroy the third and the first argument of f1 for the same reason. So we might as well choose to change <φφφ> in one step into <1φ1> and test correctness and consistency for that k-tuple directly.
There are in general other ways of improving the method and the algorithm we have given, as for exam-
ple, taking into consideration the semantics of the basic functions occurring in the program. Given a term like
g(r(hd(x)),tl(x)) where x is a list, hd and tl are the usual head and tail functions on lists, we could destroy in
place the value of hd(x) after having performed the basic operation r because we know that tl(x) would not care
about the value of the head of x.
For simplicity reasons we did not analyze this approach here. The reader may have in fact realized that
the synthesis of the program annotations as we presented it, is basically done via the analysis of uninterpreted
program schemata. We hope to study the problem of destructiveness annotation in interpreted programs in our
future research work in this area.
i) the outermost while-loop is obviously performed a_φ times because the number of φ's is initially equal to the sum of the arities of the basic function occurrences plus the number of occurrences of the if-then-else constructs;
iii) since the top of the lattice of destructive characterizations (see Pettorossi [1978a]) is consistent and the longest upward chain in that lattice has length not greater than a_v, the internal while-do loop is done at most a_v times;
iv) the worst case for the running time of Markprog/3 is a program with 1 equation only which can be considered as a tree with a_φ + a_v nodes, when f(a1,...,an) is represented as a tree with root f and sons a1, ..., an. In that case the evaluation of the internal while-do loop body takes (a_φ + a_v) steps, assuming that the evaluation of one node of such a tree takes one step of computation.
From points i) to iv) it follows that the time complexity of Markprog/3 is in the worst case O((a_φ · a_v)(a_φ + a_v)). Therefore the time complexity is quadratic in the sum of the arities of the basic function occurrences. This means that, roughly speaking, it is quadratic in the size of the program to be marked.
References
Bauer et al. [1978]
F.L. Bauer, M. Broy, H. Partsch, P. Pepper, and H. Wössner, "Systematics of transformation rules," TUM-INT-BER-77-12-0350, Institut für Informatik, Technische Universität München (1978).
Burstall [1977]
R.M. Burstall, "Design considerations for a functional programming language," Proc. of Infotech State of the Art Conference, Copenhagen (1977), pp. 45-57.
Chatelin [1977]
P. Chatelin, "Self-redefinition as a program manipulation strategy," Proc. of Symposium on Artificial Intelligence and Programming Languages, ACM SIGPLAN Notices and SIGART Newsletter (August 1977), pp. 174-179.
Feather [1978]
M.S. Feather, "ZAP program transformation system: primer and user manual," D.A.I. Research Report No. 54, Dept. of Artificial Intelligence, Univ. of Edinburgh (1978).
Kott [1978]
L. Kott, "About transformation system: a theoretical study," 3ème Colloque International sur la Programmation, Dunod, Paris (1978), pp. 232-267.
Pettorossi [1977]
A. Pettorossi, "Transformation of programs and use of tupling strategy," Proc. of Informatica '77 Conference, Bled, Yugoslavia (1977), 3-103, 1-6.
Pettorossi [1978a]
A. Pettorossi, "Improving memory utilization in transforming programs," MFCS '78, Zakopane, Poland, Lecture Notes in Computer Science 64, Springer-Verlag (1978), pp. 416-425.
Pettorossi [1978b]
A. Pettorossi, "Destructive marking: a method and some simple heuristics for improving memory utilization in recursive programs," Informatica '78, Bled, Yugoslavia (1978).
Pettorossi [1979]
A. Pettorossi, "An algorithm for saving memory in recursive programs using destructive annotations," Rapp. Istituto di Automatica R.79-19, Università di Roma (1979).
Schwarz [1977]
J. Schwarz, "Using annotations to make recursion equations behave," D.A.I. Research Report No. 43, Dept. of Artificial Intelligence, Univ. of Edinburgh (1977).
Schwarz [1978]
J. Schwarz, "Verifying the safe use of destructive operations in applicative programs," 3ème Colloque International sur la Programmation, Paris (1978), and IEEE Trans. Software Eng. SE-8, 1 (1982), pp. 21-33.
Wegbreit [1976]
B. Wegbreit, "Goal-directed program transformation," IEEE Trans. Software Eng. SE-2 (1976), pp. 69-79.
Wirth [1971]
N. Wirth, "Program development by stepwise refinement," CACM, Vol. 14, No. 4 (1971), pp. 221-227.
SECTION V
CHAPTER 15
Douglas R. Smith
Naval Postgraduate School
Monterey, California
A. Introduction
For some kinds of programs at least, a few well chosen examples of input and output behavior can convey
quite clearly to a human what program is intended. The automatic construction of programs from the informa-
tion contained in a small set of input/output pairs has received much attention recently, especially in the LISP
language. The user of such an automatic programming system supplies a sequence of input-output (I/O) pairs <x1,y1>, <x2,y2>, ..., <xn,yn>. The system tries to obtain enough information from the examples to infer the target program's behavior on the full domain of inputs. For example, if a user inputs the sequence <nil,nil>, <(a),(a)>, <(a b),(b a)>, <(a b c),(c b a)> then the system should return a program such as
* The work reported herein was supported by the Foundation Research Program of the Naval Postgraduate School with funds provided by
the Chief of Naval Research.
(F x) = (Gx nil)
(G x z) = (cond ((atom x) z)
(T (G (cdr x) (cons (car x) z)))).
If the system is unable to synthesize a program or needs more examples to verify a hypothesized program then
the machine may request more examples. This paper presents an overview of the basic work on program con-
struction from examples and recent approaches to this problem in the domain of the LISP language. It has not
been possible to avoid glossing over many details and interesting mechanisms in the synthesis techniques dis-
cussed here. Our discussion is a simplified treatment of these techniques aimed at presenting what is essential
and novel about them. Further detail may be found by consulting the original papers.
In general the classes of programs which have been studied for synthesis purposes are constructed from
the LISP primitives car, cdr, cons, the predicates atom or null, the McCarthy conditional and recursive function
procedures. An arbitrary composition of car and cdr functions will be called a basic function, e.g., car*cdr is a basic function (usually abbreviated to cadr), where * is the composition operator. A cons-structure is a function
which can be described recursively as follows: i) the special atom nil is a cons-structure, ii) a basic function or a
call to a program is a cons-structure, and iii) a function of the form (cons F1 F2) is a cons-structure if F1 and F2 are cons-structures. For example, (cons (car x) (F (cdr x) z)) is a cons-structure with arguments x and z. The
programs to be synthesized usually have the following general form:
(F x z) = (cond ((p1 x) (f1 x z))
                ...
                ((pk x) (fk x z))
                (T (H x (F (b x) (G x z)))))
where H and G are themselves programs of this type, pj,P2, ...,Px are predicates, f},f>, ...,f, are cons-structures,
and b is a basic function. For simplicity the arguments are restricted above to x and z; in general more or less
will be required. If there is no variable z and no function G then F will be called a forward-loop program. If
there is no H function then the last line of the above scheme has the form (T (F (b x) (G x z))), and F is
called a reverse-loop program.
B. Basic Work
Work on inductive inference properly includes programming by examples because the synthesis of a pro-
gram generally involves the inference of an extended pattern of program behavior from the patterns discovered
in the examples. In the most general setting of computability theory the problem of inductive inference of functions from input/output pairs has been explored in [Gold 1967, Blum & Blum 1975, Barzdin 1977]. The basic result from this work is that a function can be inferred from I/O pairs if it belongs to a class of enumerable functions with a decidable halting problem. Given I/O pairs {<x1,y1>, <x2,y2>, ..., <xn,yn>} our inference mechanism enumerates functions one at a time until one is found which generates y_i given x_i for each i = 1, 2, ..., n. If we then extend our example set and the chosen function does not satisfy some new I/O pair then our mechanism continues its enumeration until a new function is found which does satisfy the examples. Of course such a mechanism is useful only for theoretical purposes since the process of enumerating over a whole class of functions is too expensive for practical use. Nonetheless, this work shows that it is possible to infer large useful classes of programs simply from examples of input/output behavior.
Biermann and his co-workers [Biermann et al. 1975, Biermann and Krishnaswamy 1976] have looked at methods for speeding up this enumerative search process based on the use of a program trace. Given an example input and a desired output, a semitrace is a functional expression which correctly computes the example output from the input but which may allow several different orders of evaluation. A trace is a semitrace which only has one possible order of evaluation. E.g. a semitrace for a computation of 3! is (times 3 2 1). Because of
the associativity of multiplication we obtain two distinct traces from this semitrace: (times 3 (times 2 1)) and
(times (times 3 2) 1). Given this kind of information Biermann has shown how to speed up the enumerative
process enormously by enumerating over only those programs whose trace on an example input corresponds to
some initial subsequence of the given trace. This task is accomplished by partitioning the trace such that each
block corresponds to a unique instruction in the desired program. The process of finding such a partition is
controlled by a straightforward backtrack search. Pattern matching, generation of predicates and looping control structures are involved in determining whether a given trace instruction belongs to a certain block (a process Biermann calls instruction merging). See [Bauer 1979] for recent work on program synthesis from traces.
Despite the great gain which can be realized by using trace information in program synthesis, Biermann’s
mechanism is again an enumerative method and thus can take large amounts of time in order to synthesize all
but the simplest programs. The special attraction of LISP programs is the possibility that a semitrace may be
easily constructed from example I/O pairs and furthermore certain restricted classes of LISP programs can be
synthesized from the resulting semitraces without the need for enumerative search over the class. Some of the
early work on LISP program synthesis [Shaw et al. 1975, Hardy 1975, Siklossy and Sykes 1975] could generate
interesting programs but relied on heuristic techniques and provided no characterization of the class of target
programs. Summers [Summers 1976, 1978] was the first to put the possibility of LISP program synthesis from
examples on a firm theoretical foundation.
All of the methods to be described in this paper derive from Summers’ insight that under certain condi-
tions a semitrace of a computation can be constructed from input/output examples in the domain of LISP.
Once such a semitrace has been generated a synthesis from traces method can be applied to construct the
desired program.
It will be useful to abstract the key elements of a synthesis from traces method in order to facilitate the
description and comparison of the LISP synthesis methods. These elements are described below and illustrated
by Summers’ method.
2. Control Structures
A set of control structures must be available for constructing target programs. The McCarthy conditional
(cond) and recursive function procedures are used in Summers’ method.
3. Program Schemas
All of the synthesis methods described here make use of program schemas in order to constrain the way
in which the control structures and data operators are used. The basic schema used by Summers’ system has
the form
p_i[x] → f_i[x];
T → A[F[b[x]]; x]
where f_1,...,f_k are cons-structures, and A[w;x] is a cons-structure in which w occurs exactly once. The predicates p_i[x] have the form atom[b_i[x]] where b_i is a basic function.
Note that the output of ST is an unEVALed expression. For example ST((a b),(b a)) = (cons (cadr x) (cons
(car x) nil)).
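The construction performed by ST can be sketched in Python over S-expressions encoded as nested pairs; the helper name locate and the string output are assumptions made for illustration, and the sketch ignores the conditions (such as distinct atoms in the input) under which the located occurrences are unambiguous.

    def locate(atom, sexp, path="x"):
        """Return a basic-function expression (as a string) selecting atom from the
        input sexp, or None if it does not occur."""
        if sexp == atom:
            return path
        if isinstance(sexp, tuple):                       # a cons cell (car, cdr)
            return (locate(atom, sexp[0], "(car %s)" % path) or
                    locate(atom, sexp[1], "(cdr %s)" % path))
        return None

    def st(x, y):
        """Build a cons-structure (as a string) computing the output y from the input x."""
        if y == "nil":
            return "nil"
        if not isinstance(y, tuple):                      # an atom of the output
            return locate(y, x)
        return "(cons %s %s)" % (st(x, y[0]), st(x, y[1]))

    # the pair ((a b), (b a)), with lists written as nested pairs: (a b) = (a . (b . nil))
    x = ("a", ("b", "nil"))
    y = ("b", ("a", "nil"))
    print(st(x, y))   # (cons (car (cdr x)) (cons (car x) nil)), i.e. (cons (cadr x) (cons (car x) nil))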
F[x] = ( p_1[x] → f_1[x]; ... ; p_k[x] → f_k[x] )
where f_i is the semitrace generated by the ith example I/O pair. Summers assumes that the user gives a sequence of examples in which the example inputs form a chain, in the sense that for all inputs x_i and x_j either x_i < x_j or x_j < x_i, where u < v iff atom(u) or [car(u) < car(v) & cdr(u) < cdr(v)]. For example (a.b) < (d.e) but the s-expressions u = (a.(b.c)) and v = ((d.e).f) could not both appear as inputs since neither u < v nor v < u holds. This assumption provides a natural ordering for the branches of the conditional. The predicates p_i for 1 ≤ i ≤ k must have the property that (p_i x_i) evaluates to T, but (p_j x_i) evaluates to nil for all 1 ≤ j < i. Since predicates have the form (atom (b x)) where b is a basic function, a mechanism for generating the predicates must find a basic function b_i such that (b_i x_i) is an atom yet (b_i x_{i+1}) is not an atom (thus (p_i x_{i+1}) = false). The set of basic functions which so distinguish between example inputs x_i and x_{i+1} can be computed by
PRED_GEN(x_i, x_{i+1}) = PG(x_i, x_{i+1}, I)
(I denotes the identity function)
PG(x,y,b) = { }                                                   if y is an atom
          = {b}                                                   if x is an atom and y is not an atom
          = PG(car(x),car(y),car*b) ∪ PG(cdr(x),cdr(y),cdr*b)     otherwise
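A direct Python transcription of PRED_GEN over the same nested-pair encoding might look as follows; representing composed basic functions as strings is an assumption made to keep the sketch short.

    def is_atom(s):
        return not isinstance(s, tuple)        # anything that is not a (car, cdr) pair

    def pred_gen(x, y):
        return pg(x, y, "")                    # "" plays the role of the identity function I

    def pg(x, y, b):
        if is_atom(y):
            return set()
        if is_atom(x):                         # x is an atom and y is not: b distinguishes them
            return {b or "I"}
        return (pg(x[0], y[0], "car*" + b if b else "car") |
                pg(x[1], y[1], "cdr*" + b if b else "cdr"))

    # the example from the text: PRED_GEN((A), (A B)) = {cdr}
    print(pred_gen(("A", "nil"), ("A", ("B", "nil"))))   # {'cdr'}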
For example PRED_GEN((A),(A B)) = {cdr}, so a predicate to distinguish between (A) and (A B) is (atom
(cdr x)). If Summers’ system were given the following set of examples for the function REVERSE
{ nil → nil
  (A) → (A)
  (A B) → (B A)
  (A B C) → (C B A) }
then by combining the semitraces produced by ST and the predicates generated by PRED GEN, the following
loop-free program is produced:
If recursive function procedures are allowed then mechanisms must be available for detecting recursive
patterns in a semitrace, determining the primitive cases for termination, and creating and handling all necessary
variables. Summers has shown in his Basic Synthesis Theorem [Summers 1975,1977] the following fundamen-
tal result. Suppose that the following recurrence relations hold on a (possibly infinite) set of input-output pairs
defined by the recurrence relations can be correctly computed by the following instance of the recursive pro-
gram schema given above.
p_i[x] → f_i[x];
T → C[F[b[x]]; x]
This theorem and its generalizations establish a link between the characterization of a function by a
recurrence relation and a recursive program for computing that function. Thus the synthesis of recursive pro-
grams reduces to detecting repetitive patterns in the examples and a decision on when enough instances of a
pattern have been found to inductively infer that the pattern holds generally. At least two or three instances of
a pattern with no counterinstances have been taken as sufficient to allow induction of the pattern. Any decision
on this matter though is subject to easily constructed examples which cause the induction of an incorrect pat-
tern.
Recurrence relations are detected between two semitraces f_i and f_{i+1} by finding a basic function b such that (f_i (b x)) is a substructure of (f_{i+1} x). For example if
nil → nil
((a b)) → (b)
((a b)(c d)) → (b d)
((a b)(c d)(e f)) → (b d f)
which describes the problem of returning the list of second elements of each sublist of the input list. Applying
ST to these examples we obtain the following semitraces
(f1 x) = nil
(f2 x) = (cons (cadar x) nil)
(f3 x) = (cons (cadar x) (cons (cadadr x) nil))
(f4 x) = (cons (cadar x) (cons (cadadr x) (cons (cadaddr x) nil)))
(p1 x) = (atom x)
(p2 x) = (atom (cdr x))
(p3 x) = (atom (cddr x))
(p4 x) = (atom (cdddr x)).
We inductively infer that these patterns hold for all i and using Summers' Basic Synthesis Theorem obtain the following program
(F x) = (cond
((atom x) nil)
(T (cons (cadar x) (F (cdr x)))))
Note that this program computes a partial function since it assumes that the input is a list of lists where each
sublist has length at least two. Thus the program is not defined on inputs (a b) or ((a.b)).
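The detection step can be sketched in Python on a compact encoding of semitraces (cons cells as pairs, a basic function as the letters between c and r, so that cadar becomes "ada"); this encoding and the bounded search over candidate functions b are assumptions made for illustration.

    from itertools import product

    def subst(trace, b):
        """Replace x by (b x) throughout the semitrace, i.e. append the letters of b
        to every leaf selector."""
        if isinstance(trace, tuple):                      # a cons of two sub-semitraces
            return tuple(subst(t, b) for t in trace)
        return trace if trace == "nil" else trace + b

    def subtrees(trace):
        yield trace
        if isinstance(trace, tuple):
            for t in trace:
                yield from subtrees(t)

    def find_recurrence(f_i, f_next, max_depth=3):
        """Search for a basic function b such that (f_i (b x)) occurs inside (f_next x)."""
        for n in range(1, max_depth + 1):
            for letters in product("ad", repeat=n):
                b = "".join(letters)
                if any(subst(f_i, b) == t for t in subtrees(f_next)):
                    return b
        return None

    # the semitraces f2 and f3 of the example above
    f2 = ("ada", "nil")                    # (cons (cadar x) nil)
    f3 = ("ada", ("adad", "nil"))          # (cons (cadar x) (cons (cadadr x) nil))
    print(find_recurrence(f2, f3))         # 'd', i.e. cdr: (f2 (cdr x)) occurs inside (f3 x)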
For some example sets, such as the examples given above for REVERSE, the recurrence relation detec-
tion mechanisms will not work. A fundamental technique used in Summers’ system generalizes the semitraces
by replacing some sub-semitrace which occurs in each semitrace by a new variable. It is important to ensure
that the value of the new variable will be initialized to the value of the sub-semitrace. The system then checks
for recurrence relations amongst the generalized semitraces. Consider the semitraces obtained above for the
function REVERSE. If the constant sub-expression nil is replaced by variable z then the semitraces become
(g1 x z) = z
(g2 x z) = (cons (car x) z)
(g3 x z) = (cons (cadr x) (cons (car x) z))
(g4 x z) = (cons (caddr x) (cons (cadr x) (cons (car x) z)))
(REVERSE x) = (F x nil)
(F x z) = (cond ((atom x) z)
(T (F (cdr x) (cons (car x) z)))).
Biermann [Biermann 1978] has applied the techniques of synthesis from traces to the domain of LISP.
The data type, operators, and control structures used in this approach are the same as in Summers’ method.
The program schemas, called semi-regular LISP schemas, have the form
(F x) = (cond ((p_1 x) f_1)
              ...
              ((p_k x) f_k)
              (T f_{k+1}))
where each f_j has one of the following forms: nil, x, (F_j (car x)), (F_j (cdr x)), or (cons (F_k x) (F_l x)). The predicates p_i are required to have the form (atom (b_i x)) where b_i is a basic function and furthermore if (p_{i+1} x) = (atom (b_{i+1} x)) then b_{i+1} = b_i*w where w is a basic function not equal to the identity function. Certain res-
trictions are placed on the syntax and interpretation of semi-regular LISP schemas to yield a well-behaved sub-
set of instances called regular LISP programs. A semitrace is obtained as in Summers’ method and from this a
trace is constructed in the form of a nonlooping regular program (i.e., no function F_i is called by more than one function f_j). Of course there are no predicates in the trace so each function in the trace has the form
(F_i x) = f_i.
A regular LISP program can be viewed as a directed graph whose nodes represent functions. Each arc
directed from a node representing function f; is labelled by a branch of the conditional of fj, i.e., a predicate and
a function call. The construction of the graph structure of a regular LISP program from a given trace is per-
formed by a backtrack search. At each point during the search a certain portion of the target graph has been
constructed and it accounts for some initial portion of the trace. At this point if all of the trace is accounted for
we are done, otherwise there is some function f_i in the trace which invokes a function f_j such that f_i is accounted for but f_j is not. If the target graph currently has n nodes then there are n+1 choices of nodes to identify with f_j (one for each current node plus a new node for f_j if needed). These n+1 alternatives are tried
in turn. A choice will fail and be pruned from consideration if for various reasons (for example, knowledge
about regular programs or general programming knowledge) the identification of a function with a node is
incompatible with other functions identified with the node, or if all of the children of the node fail.
If several arcs emanate from a node g; then predicates must be synthesized in order to distinguish the
different cases. First, for each transition we collect all inputs which cause it to be taken. An extension of algo-
rithm PRED_GEN from Summers’ method may be used to build predicates distinguishing the resulting set of
inputs.
Consider the trace given in above. Initially we create a new node for f; as in Figure la. For f, we have
two possibilities: identify f, with g,, or create a new node g) as in Figure 1b. The first alternative fails so we
explore further the second. Of the three alternative choices for f; shown in Figure lc only the third does not
fail. After merging f4 and f; we have the graph in Figure 1d which accounts for all but fio, fi1, fi2, and f}3. To
account for fg a new transition is created from g, and predicates must be synthesized to distinguish the two
arcs from g;. From the trace we find that the inputs
{ ((a.b).(c.d)), (c.d) }
lead to the function call (cons (gy x)(g3 x)), and the inputs
(d }
lead to the function call x. The generated predicate which distinguishs these inputs
is (atom x). The resulting
graph shown in Figure le accounts for the entire trace so synthesis halts.
SYNTHESIS OF LISP: ASSURVEY 315
] CONE == |
a. A node for f
AY[cons(g(x),-) ay)
| cons (go(x),-
©)
b. Alternatives for f».
ay (1)
Cons(Go(x),g)(x kconstgix),galx)) f cons(go(x),gs(x))
(92) (92) (Gs)
c. alternatives for f3.
g,(cdr(x) ee (x))
e. Final graph.
The system is able to generate any regular program from a finite number of examples. Although the sys-
tem can be sped up by restricting the schema (for example, to programs consisting of a single loop), in general
it faces the combinatorial explosion inherent in backtracking mechanisms.
Biermann and Smith [Biermann and Smith 1977, 1979] have studied the synthesis of a class of programs
called scanning programs. A scanning program scans its input variables as it generates its output. Synthesis can
be based on a single example and the I/O pairs are assumed to be lists of atoms. Data type, operators, and
control structures are the same as those used by Summers.
The target program is viewed as a hierarchy of LISP functions and is constructed in a bottom up fashion.
A semitrace is constructed for each level of the hierarchy and has the form of a loopless sequence of function
calls. Each function call of a semitrace is either nil or consists of a function and an argument. Each level of a
semitrace assumes that the output can be computed from the input by appending the results of the sequence of
function calls in the semitrace. Since the example input and output are lists,each atom of the output y can be
expressed by a function call to a function which has the form (P° x) = (cons (car x) next) where next is the
function call expressing the next atom in the example output.
For example, if the user gives the input/output pair
(a bie-d ef) —(arc'e fi'dib)
then the lowest level trace is
To obtain the next higher level in the hierarchy the sequence is chopped up into segments such that each seg-
ment can be easily generated by either a forward-loop, a reverse-loop, or a straight line function which simply
executes a fixed sequence of function calls. The segmenting mechanism works by looking for patterns and
extrapolating them as far as they match the sequence of function calls. The above semitrace is segmented as
follows
The first three function calls can be computed by a forward loop, which we call P!,
which in effect computes (a
ce) from (a bc de f), and the next three functions can be computed by a reverse loop
which we call P?. The
second lowest level of semitrace is
Predicates are generated in a manner similar to Summers’ method. The resulting program is
(P? x) = (P! x x)
(P! x, x») = (cond
((atom x,) (P? (cdr x) nil))
(T. (PP X] X>)))
(P? x; x2) = (cons (car x;) (P! (cddr x;) x>))
(P? x z) = (cond
((atom (cdr x)) z)
(T (P? (cddr x) (P9 x z))))
(P? x z) = (cons (car x) z).
Production rule schemas are used to encode both program schemas and the coding knowledge necessary
for coordinating the flow of data and control in a hierarchy of functions.
Each function generated by the synthesizer instantiates a rule schema to become a production rule. At
the top of the hierarchy is the function F (P? in the above example) which when given the example input will
produce the example output. The rule for this highest level routine is applied to a start symbol. Each applica-
tion of a rule adds the code for a routine on the next lowest level of the hierarchy, the correct number and
order of variables having been passed down from the coded string, and a sequence of nonterminals, one for
each function called by the routine with the correct number and order of variables included to be passed along
into the rule for each of these routines. In general it was found necessary to introduce a new variable for each
level of nesting of these functions. Rule schemas for forward loops, reverse-loops, straight-line routines, and
if-then-else statements were studied.
Kodratoff [Kodratoff 1979] has presented a synthesis technique which is a powerful generalization of
Summers’ method. The same data type, operators, and control structures are used as in Summers’ method.
However Kodratoff employs a more powerful schema and technique for detecting looping patterns. Like Sum-
mers’ method, multiple I/O pairs are required and looping patterns are detected by pattern matching the semi-
trace of output y; with a subsemitrace of output yj;4, for various values of i and k. In general such a matching
divides the semitrace y;4, into three segments ylj4,, y2j4,, and y3\4,, where the middle segment y2;,, matches
semitrace y;. Summers’ system stops at this point with the assumption that the initial and the tail segments
have no looping structure. Kodratoff’s system on the other hand creates two new synthesis problems: one with
I/O pairs <x;, ylj4,> and the other with I/O pairs <x;, y3;4,>. Ifh is the resulting synthesized program from
the former problem,and g is the resulting synthesized program for the latter problem, then these programs can
be composed in the following schema:
(F x) = (f x,(i x))
(f x z) = (cond ((p; x) (fy x z))
(pp a5 xeZ))
((py x) (fy x 2)
(T(hex @ (box) (= x77)
where b is a basic function, h and g are programs satisfying the schema, and i is an initialization function for
variable z.
Notice that this system works in a top-down fashion in contrast to Biermann and Smith’s system which
works bottom-up. Another point of contrast is the strikingly different ways that the two systems segment the
318 SMITH
output examples. For example, Biermann and Smith’s system would segment the output of the example (A B
CD) ~(ABCDDCBA) into A BC D and DCB A and conjoin the forward and reverse looping routines
which produce these segments. Kodratoff’s system, given the examples
(F x) = (f x nil)
(f x z) = (cond ((atom x) z)
(T (hx (f (cdr x) (g x z)))))
(h x z) = (cons (car x) z)
(g x z) = (cons (car x) z)
Another feature of Kodratoff’s method is the use of an algorithm he calls BMWk in the matching process
of one semitrace against another. As noted above, the introduction of new variables is one of the key problems
of program synthesis. In attempting to match A B C C B A against BC D D C Baas in the example above it
will be noticed that if (cdr x) is substituted for x everywhere in the semitrace for the former expression, we
obtain the semitrace of the latter subexpression. The fact that a single substitution suffices is the clue that only
one variable is involved. Consider on the other hand, the following example pairs.
(A B) — (BA BBB A)
CAB: C= (CIAGG Bi GeAc C6.BB)
When we attempt to match B A B BB A against C A C BC A we find two different substitutions. For the
first, third and fifth atoms we substitute (cdr x) for x, and for the second, fourth, and sixth atoms we substitute
x for x (the identity substitution). Here the two different substitutions are the clue that two variables are
involved; the first variable is used to obtain the odd numbered atoms and the second is used to obtain the even
numbered atoms. The recurrence relation between these examples then has the form:
Yi4n(X,X2,Z) = yi(cdr(x1),x2,hj(x1,X9,2))
where hj(x;, x2, z) computes the right-most segment of Vi+1-
Further development of this method is reported in [Kodratoff and Papon 1980, Jouannaud and
Kodratoff
1980, Kodratoff and Jouannaud 1983].
Jouannaud and Guiho [Jouannaud and Guiho 1979] have a system called SISP
which works on either sin-
gle or multiple examples. The class of target programs have a somewhat
simpler form than those of the above
systems since a more powerful class of basic functions is used. Let us
call the following functions JG-basic:
Icar((A BC D)) = (A) (the list containing the car of its argument), Irac((A
B C D)) = (D) (the list contain-
ing the last element of the argument list), cdr((A BC D)) = (BC D), and rdc((A BC D)) = (A BC) (the
list formed by deleting the last element of the argument list), and any
composition of JG-basic functions. In
addition to the JG-basic functions SISP uses the predicate null
and a multiple-argument version of append.
SYNTHESIS OF LISP: ASURVEY 319
Given I/O pair <x,y> in which both x and y are lists of atoms SISP segments x such that x =
append(px,c,sx) and y = append(py,c,sy) where c is the longest subexpression common to x and y. Since both
x and y are lists it is easy to find a JG-basic function f such that f(x) = c. The synthesis task then reduces to
two subtasks. SISP finds the shortest subexpressions x; and x of x such that x; — py and x — sy are self-
contained examples (chosing px, c, or sx if possible), and recursively tries to construct functions satisfying
these new examples.
Consider the synthesis of the function REVERSE from example x = (A BC D), y = (DCB A). The
largest common subexpression is (A) (in case of a tie in length, a subexpression on either end of y wins out).
Thus c = (A), sx = (BC D) and py = (DC B). x = append(nil,(A),(B C D)), y = append((D C
B),(A),nil). Since (A) = _ Icar(x) we have the partial semitrace y = append(py,lcar(x),nil) =
append (py,lcar(x)). The smallest subexpression x, of x such that x; — py is self contained is x} = (BC D) =
cdr(x). Applying the above mechanism recursively to the example x; — py we end up reducing it to another
simpler example, etc. Jouannaud and Guiho diagram the resulting ‘‘approximation’’ structure as follows
COMMON
INPUT SUBEXPRESSION OUTPUT
car
means Z = append(Z, Z> ... Z,). Adjacent to each node we have supplied that actual value of the node with
respect to the example (A BC D) ~ (DCB A). This structure is similar to the semitrace generated by Sum-
mers’ method, but its repetitive structure is so clearly brought out that the following recursive program to
generate REVERSE is easily created from a single example:
320 SMITH
F(x) = (cond
((null x) nil)
(T (append (F (cdr x)) (Icar x))))
If SISP fails to find a satisfactory program based on one example then it can ask for another example
whose input has the longest length less than x which is defined on the domain of the program. Examples must
be carefully chosen so that the inputs are of decreasing length. Given the examples xy and xy awhere
b(x) = x’ (b is a JG-basic function) SISP segments y into py, c, sx where c now is the largest common seg-
ment of y and y’. Synthesis control is based on the following diagram:
Perhaps the most striking point of contrast between the five methods surveyed in this paper lies in their
approaches to finding recursion loops. Summers’ and Kodratoff’s methods detect recurrence patterns by match-
ing the semitraces obtained from different example outputs. In Biermann’s Regular LISP system and Biermann
and Smith’s system recursion is detected by matching or folding a semitrace into itself. In Jouannaud and
Guiho’s SISP system the control pattern emerges from the match between an example input and its correspond-
ing output.
Table 1 characterizes each of the synthesis techniques discussed above according to several criteria. The
first column compares the time complexity of programs which are synthesizeable by each method. The time
complexity is closely related to a bound on the number of atoms in the output list since the time complexity
bounds the number of cons operations which can be performed. There is an apparent anomaly in
Biermann’s
regular LISP method in that it can generate programs with any finite number of nested loops yet the
complexity
of such a program is linear in the number of input atoms. This is explained by the structure
of S-expressions
and the lack of mechanisms in regular LISP programs for copying and reusing the input
variable. By way of
illustration consider the problem of producing a list of the last elements of each sublist
of a given list as in the
example
b. nis the depth of nesting and k is the number of conditional branches in the top level routine.
c. A class of LISP functions is closed under function sequcing if whenever F and G are functions in the class then
H(x) =append(F(x),G(x)) is a function in the class.
A program to compute this function requires a nested loop which scans down each sublist looking for the last
element, yet the program’s complexity is clearly linear in the number of input atoms. Columns 6 and 7 com-
pare the uses of program variables. A resource variable provides atoms for the output list. A control variable is
used to control the recursion loops. In many programs constructed by the five methods surveyed the resource
variables are also used as control variables. Column six shows which synthesis techniques allow separate
resource and control variables. Column seven points out Kodratoff’s unique technique of allowing copies of a
resource variable to be made then used differently in a program.
An important aspect of any program synthesizer is a characterization of the class of target programs. It is
important to know that a system is at least sound but preferably complete also over a class. Characterization of
several extensions of the class of programs synthesizable by Summers’ method are given in [Smith 1977,
Jouannaud and Kodratoff 1979]. Biermann’s enumerative methods on input traces are sound and complete
over their class of target programs. Guiho and Jouannaud [Guiho and Jouannaud 1977] have shown that SISP
can correctly synthesize programs over a well defined class of nonlooping functions.
322 SMITH
E. Concluding Remarks
The domain of LISP functions has served as a valuable testbed for the exploration of a number of syn-
thesis mechanisms. Summers’ insight that any self-contained input-output pair has a unique semitrace provides
one of the fundamental starting points for the synthesis techniques described above. The methods surveyed
have explored strikingly different mechanisms for detecting looping patterns in semitraces. Knowledge about
how to correctly sequence and nest recursive programs has been gained. Another valuable lesson of this
research involves the generation and correct use of auxilliary variables. Despite success in generating
ever larger classes of programs the question remains of whether these synthesis techniques can be used in any
practical way. It may be that there are some special programming domains in which some of these techniques
can be applied. However the real importance of this research probably lies in the abstraction and generalization
of the techniques which have been found to be sucessful in the LISP domain and the incorporation of these
techniques in synthesis systems of more general scope.
References
Barzdin [1977]
J.M. Barzdin, ‘“‘/nductive inference of automata, functions and programs,’’ Amer. Math. Soc. Translations, Vol.
LODO) NOT pp wlO 1 1D)
Bauer [1979]
M.A. Bauer, ‘“‘Programming by examples,’ Art. /ntell. 12, 1979, 1—21.
Biermann [1978]
A.W. Biermann, ‘‘The inference of regular LISP programs from examples,’’ /EEE Trans. on Systems, Man, and
Cybernetics, Vol. SMC—8(8), August 1978, pp 585—600.
Gold [1967]
M. Gold, *‘Language identification in the limit,’’ Inform. Control
(5), 1967, pp 447—474.
SYNTHESIS OF LISP A SURVEY,” 323
Hardy [1975]
S. Hardy, ‘‘Synthesis of LISP functions from examples,’’ Advance Papers 4th Int. Joint Conf. Artificial Intelli-
gence, Tbilisi, Georgia, USSR, Sept. 1975, pp 240-245.
Kodratoff [1979]
Y. Kodratoff, ‘‘A class of functions synthesized from a finite number of examples and a LISP program
scheme,”” Int. J. of Comp. and Inf. Sci., Vol. 8, No. 6, 1979.
Smith [1977]
D.R. Smith, ‘‘A class of synthesizeable LISP programs,”’ Technical report CS—1977—4,
Duke University, Dur-
ham, N.C., 1977.
Summers [1975]
P.D. Summers, ‘‘Program construction from examples,’’ Ph.D. dissertation, Yale University,
SYNTHESIZING LISP PROGRAMS = 325
CHAPTER 16
Yves Kodratoff *
Jean-Pierre Jouannaud **
Abstract
We describe program transforms that change the semantics of the program to which they are applied.
They allow the synthesis of very difficult list programs which mix synthesis from input-output behavior and a
program transformation technique.
* Laboratoire de Recherche en Informatique, Université Paris-Sud, Bat. 490, F—91405 ORSAY CEDEX, France.
** Centre de Recherche en Informatique, UER de mathématiques, Université de Nancy 1, 54037 NANCY CEDEX, France.
326 KODRATOFF AND JOUANNAUD
A. Introduction
Summers’ main contribution (Summers [1977]) to the field of LISP program synthesis from input-output
examples is in making a clear distinction between the two steps required by the synthesis.
A first step transforms the input-output examples into computation traces. Our contribution to this step is
rather formal: we simply noticed that the computation traces are terms (Robinson [1965]) built with the con-
structors (and their inverses) of an abstract data type. The type used by Summers and us is the type LIST
without the constructor of the atoms. The computation traces follow from this assumption: we never use the
particular value of an atom but rather compute list structures. The methodology can be applied to any type
described by 1— the empty constructor, 2— the predicate checking the result of the empty constructor, =
the data type constructor, 4— the operators inverse of the constructor, S— one relation between the construc-
tor and its inverses.
A second step detects recursion among the computation traces. We have improved mainly this aspect of
SUMMERS’ work and transformed it into a methodology for matching sequences of terms instead of matching
terms. It uses successive matchings and generalizations. We shall not describe here the details of our metho-
dology but will give enough examples of how it works to allow a precise description of the semantics of the syn-
thesized programs.
Our main point can be stated as follows. A wild combinatorial explosion can be avoided by synthesizing
programs that compute their output atom by atom, i.e. no sequence of atoms may be added ‘‘in one piece’’ to
the result.
This chapter will be devoted to the consequences of these semantics and we shall simply point out now
that the methodology itself can be also used for the transformation of recursive programs into iterative ones
and for the detection of a general enough induction hypothesis for a recurrence proof.
As a consequence of these semantics, our programs can be transformed in three different ways.
The first one, not always possible, transforms some program f defined on any linear ascending domain
{x} (described in Jouannaud et al. [1979], see here section B) into a program f° defined on flat lists {x’}, in
such a way that f(x) = f?(x’) for all x and x’ such that x7 = FLATTEN(x). This means that the new program
is defined on flat lists but has the same output as the original one. (The domain is changed; the outputs are
conserved.)
The second one transforms any program f° defined on flat lists {x’}, into a program f°’, defined on the
whole set of the lists {x’’}, in such a way that f’’ performs on each embedding level of an x’ the same operation
performed by f° on the top-level of an x’’. A good example of this behavior can be seen by considering
REVERSE such that (REVERSE(A B C D)) = (D C B A), REVERSE(A (BC) D)) = (D (BC) A) and
REVEALL such that (REVEALL (A BC D)) = (DCB A), (REVEALL (A (BC) D)) = (D (CB) A).
The third one, not always interesting, performs directly FLATTEN(f’’(x’’)) for any x’’.
We shall as succinctly as possible describe our methodology for program synthesis (described elsewhere
(Kodratoff [1980], Kodratoff et al. [1980]) ) from input-outputs, mainly in order to be able to show that the
three transforms can be applied to the programs we synthesize.
As we shall see, the input-output sequence {x;,F(x,)} is transformed into a sequence of predicates and
fragments {p,(x),f,(x) }. The predicate p;(x) is True if x has the same structure as x;, regardless of the values
of their atoms. For instance, (A B) and (E F) have the same structure since they are both lists with only atoms
at their top-level (hereafter called flat lists) and they contain two atoms. The fragment f;(x) describes how the
atoms of x; are used in order to obtain F(x;). For instance, let x; = (A B) and F(x;) = (B A). Then, fj(x) says
that for any flat list containing two atoms, F works by building a flat list containing first the second atom of x
(here, the value of this atom is B) and second the first atom of x (here, A).
We shall see also that we restrict our work to sequences such that p;(x) is easily computable from po(x)
and a recursion relation pj4;(x) = pj;(b(x)) and such that one can find an (often fairly complicated) recursion
relation between f,4;(x) and f;(x).
Let us suppose for the time being that we have found such recursion relations. We shall prove that these
relations are equivalent to a program.
When recursion relations are found, we make an induction hypothesis which is (and will stay) the really
unprovable part of any work on program synthesis from examples: we suppose that the function we are looking
for always fits these recursion relations. An alternate way of saying this is that we synthesize a function which
extends to infinity the recursion properties of the first few values given as examples.
When this hypothesis is taken for granted, then it is possible to prove the equivalence between the recur-
sion relations and a recursive program. The first proof, called the ‘‘basic synthesis theorem’? by Summers
[1977] uses the formalism of the fixed point semantics of recursive programs (Manna [1974] and Manna et al.
[1973]).
In an appendix, we give detailed proofs of extensions to SUMMERS’ theorem. We give here an intuitive
presentation of these results. As a first approximation, let us state that the following set of recursion relations:
(h; and g; similarly defined by recursion relations) is equivalent to the following recursive function:
F(x,y) = F’(x,y,constant)
F'(x,y,z) =IF po(x) THEN f’o(y,z)
ELSE H(x,y,F’(b(x),a(y) ,G(x,y,z)))
where pp, f’9,a,b are given in the above recursion relations and where H and G are similarly defined by a recur-
sive scheme of the same type.
The actual scheme is somewhat more complicated by the fact that ‘‘constant’’ may actually be computed
by a vector of functions (i.e. we allow a definition by composition), this transforms z into a vector of values
and G into a vector of functions. Besides, the variable y is allowed to be a vector of variables and, therefore,
the function a to be a vector of functions. Since H and G can be defined similarly to F, one may ask where the
process of recursive definition will stop: our scheme includes the fact that this process of recursive definition of
embedded function must stop after a finite number of times, by functions defined by the scheme:
which embed and are embedded in no other recursively defined function, where 1,(y,z) is given.
328 KODRATOFF AND JOUANNAUD
Property of our scheme. The above definition is consistent with our scheme; i.e. a program of recursive embed-
ding level m can call another of recursive embedding level n, only ifm <n.
Consequence. The programs of recursive embedding level 0 are defined by the original sequence of input-
output sequence of examples, their halting condition is thus po(x) = True. The programs of recursive embed-
ding 1 are defined by a sequence of examples which starts with the second example, their halting condition is
thus p;(x) =True. For instance, the fragments h; (see appendix) are defined by g,4)(x) = hj(x, g;(c-(x))), it
follows that g(x) = ho{x, go(c<x))) and ho(x,z) is computed when g;(x) is computed, i.e. when
p,(xx) = True.
x = (A (B) C) = cons
/ \
A cons
/ \
/ \
cons cons
/ / \
B nil G fist
Definition. A reduction function is a finite composition of car and cdr. Let R be the set of reduction functions.
Definition. Let Aj; be the i-th occurrence of the atom A in the list x and
u be the unique reduction function
which verifies u(x) = A;. Then u is said to be the functional name of A; in
x.
Example. Consider
x = cons
/ \
/ \
i \
/ \
cons cons
/ \ / \
A cons cons nil,
i ‘ / \
/ nil» C nil3
cons
/ \
B Meiyly
Definition. Let u and v belong to R, one says that v is aright factor of u ifu=u,.v. If v isa right fac-
tor of u, then uv! denotes a function of R such that u = (uv~!) v. Let (u modulo v) be the function u v~*
provided v* is a right factor of u, iff 0 <k’ <k (ie. vis not a right factor of u v-*).
Example:
u =car.car.cdr.car.cdr, v =car.cdr,
uv_!=car.car.cdr, (u modulo v) =car.
Definition. Let xg and @ belong to L, a owning a single *-labelled atom. One denotes S(a, Xo,*) (Knuth et al.
[1970]) the list obtained by substitution of xg to the *-labelled atom of a. The sequence {xo,x; = S(a, x;_),*)}
is called a linear ascending domain: A(Xo, @).
A linear ascending domain is totally ordered by the order induced by the substitutions, x; < x; for alli <j.
Remark. In fact, we change the names of the atoms in order to avoid several occurrences of the same atom in
Xj.
330 KODRAT OFFUD
AND JOUANNA
Example. Let
A nil A nil.
{(A), (A B), (A BOC), ...} ie. the set of the flat lists.
Example. Let
Definition. An atom of a list is said to be in the direction of a reduction function r, iff its functional name u
verifies:
From the intuitive point of view, an atom x of a list is in the direction of a reduction function r, iff there
exists an integer k >0, such that the path leading to x is a subpath of the path induced by eae
332. KODRATOFF AND JOUANNAUD
Example. Let u = cdr.car.cdr and r = car.cdr. Then (u modulo r) = cdr, and cdr is a right factor of r. An
atom the functional name of which is cdr.car.cdr is in the direction of r = car.cdr.
Example. Let u = car.cdr and r = car.car.cdr, (u modulo r) = u and u isa right factor of r. One of the above
examples was: Xj = (A B(C)), a = (A(B-)), the atom of value B+ in Xo is in the direction of the functional
name of the labelled atom B+ of @ since the functional name of B_ in xo is car.cdr and the functional name of
B- in q@ is car.car.cdr.
Consequence. The linear ascending domains are suitable domains for program synthesis from examples.
Let A(xo,q@) be a linear ascending domain and x belongs to A(Xo,a). Then p;(x) = True if x has the
same structure as x; (regardless of the values of their atoms), pj(x) = False if x has the structure of x;,x; > Xj.
Finally, if x has the structure of x;,x; < xj, the value of p;(x) is of no importance since we want to give
the form {p,\(x) — fj(x)} to the input-output sequence: the p; are checked in order po,p},p>,° °° and any x
has the structure of an x; > Xo. If po(x) = False then x has a structure of an x; > x, and so on.
Besides, this property assures that p,(x) =atom.u.b(x), —p;4;(x) =atom.u.b't!(x), therefore
Di4i(x) = p,(b(x)).
The functional name of nil» in @ is cdr,b=cdr, and the atom of Xg in the direction of cdr is nil,u=cdr.
The flat lists are characterized by po(x) = atom . cdr(x), pj4,(x) = p;(cdr(x)).
Remark. The fragments could have been constructed in a very different way. For instance, one could try to
put into evidence in F(x;) either x; or non-atomic parts of x;. In the case of flat lists cdr(x) or car.cdr(x;) are
such non-atomic parts of x;. This atomic decomposition in function of the functional names of the atoms of x;
implies that fj(x) is a cons-tree, the left leaves of which are functional names of atoms of x, the right leaves of
which are nil. This remark will be of utmost importance in the following.
Xo
= (A) > F(X) = (A),
From the inputs we know that A in F(x9) is car(xg), A and B in F(x)) are car(x) and car.cdr(x»), A, B and C
in F(x3) are respectively car(x3), car.cdr(x3),car.cdr.cdr(x3),..... The atoms in F(x;) are replaced by their func-
tional names in x;:
334. KODRATOFF AND JOUANNAUD
f3(x3) = cons
/ \
car cons
edi ii \
cdr Cam cons
cdr orelie / \
X3 Cai, car cons
X3 cdr / \
X3 Cait ve),
X3
x Cant nil
x
The difference between f}(x;) and f,(x) lies in that f;(x) is defined for all x > Xx, but gives always the same
result as f;(x,) (if the first two atoms have the same values in x and x)):
Remark, Of the atoms of F(x;), why is nil not replaced by its functional name in x;? No real trick is hidden
behind this restriction. First, it enables us to treat without ambiguity the cases where
several nils appear in the
input. Second, even in the unambiguous cases, giving its functional name tonil
would add no powe to rthe
methodology and would be slightly more complicated.
SYNTHESIZING LISP PROGRAMS — 335
[iF attomecdr.cdt-ecdr
edmux)s) LHE Nims coms EESE Ss
/ \
/ cons
/ / \
iL car cons
Cat cdr / \
exe ig cdr Car cons
Cdr X Cdn / \
cdr x Cat nil
x X
where 1 is the undefined value. It follows that the sequence {F,} is an increasing sequence of chains (Manna
[1974]).
336 KODRATOFF AND JOUANNAUD
The relationship between fragments is not usually that simple. Our algorithm then uses two different iterations.
First, iterations on the substitutions; as above, one might find x is substituted by a non-constant term
t)(x). The t;(x) are then looked at as new fragments among which recursion relations are to be detected.
Second, iterations on the generalizations; most often, the matching fails because one variable undergoes
different substitutions or because a function (or constant) receives a substitution (we stick to first order match-
ing). One then transforms f;(x) into a new sequence of generalized fragments g;(z,...,z,) where g;(zj,...,Z,) iS
the least generalization (Plotkin [1970], Reynolds [1970]) of f,(x) and fj4;(x). If the matching of g; and g;4,
fails, one iterates on the generalizations, i.e. the sequence g; is transformed into a new sequence obtained by
the least generalization of g; and g;4;. This process is iterated as long as one does not get a sequence for which
the matching of successive terms succeeds.
cons
/ i
Cat cons
cdr / \
cdr Clan cons
x car / \
cdr Car nil
X X
f,(x) cons
/ \
car cons
cdr / \
edn Car Cons
Chak car / \
Cait Cali car cons
Xx Xx Xx / \
Gar nil
x
f;(x) = cons
/ \
Car cons
cdr / \
ere (Se Gann cons
Car Car / \
Car COh car C.OmnS
Caan Xx X / \
cdr car cons
car x juan
Cah car cons
x Cat [ae
cdr Gracia
X Car
Car
Car
Car
X
f; (x) = cons
/ \
Car cons
cdr / \
Cali Cran cons
erie YouMs if \
C Cif Cut mammalfi cons
Xi x x i} \
CAT etal
Xx
The matching of f; and f;,; fails since one gets contradictory substitutions like x\x and x\car.cdr(x). One sees
also that the constant nil should undergo a substitution. One therefore transforms the sequence
f; into the
sequence of the least generalizations of fj and f,,;. The least generalization of
SYNTHESIZING LISP PROGRAMS = 339
(with 26 xix
li 1) =item)
and we can verify that f(x) =gj(x,x,nil) for all i. The matching of gogandg, succeeds with
x \car.car (x) sx \x"
340 KODRATOFF AND JOUANNAUD
Z Car ye
x’!
@ air Z
z \ CA
exem
Xo
The z variable gets a substitution of the form z\hj(x",z). Considering h,(x,z) like new fragments and looking
for their recursion relationships leads to hj4;(x,z) = h,(car.cdr(x),z). We have therefore found recursion rela-
tionships between generalized expressions of f;(x).
Example 2. The matching of the fragments obtained in section 2.4 fails because the
constant nil undergoes a
substitution. We accordingly generate the sequence of the least generalizations, which
happens to be the same
as the initial one except that a variable z is substituted to each nil.
SYNTHESIZING LISP PROGRAMS 341
£2(x,z) = cons,
/
Cat cons
cdr / ‘
ed'r Case cons
X cdr / \
x car Z
x
and f(x) = gj(x,nil), where the f(x) are those of section 2.3. The matching of g; and g;4, succeeds with x \
cdr(x) and
cons
f 7.(-<) eo xacnciels)
cons
8o(x,z) car Z
cons
hex. 2) Car Li
(iewee a lea
Remark 1. The generalizations of both examples are actually more complicated: they introduce sequences of
new variables. In example 1, since 3 variables are introduced at each generalization, we should have written
342 KODRATOFF AND JOUANNAUD
car Z6
Z5
It is a rule (see remarks 2 and 4) that a variable must be substituted by a term containing this variable. It fol-
lows that one must identify z; and z4 to one variable x’, identify z) and zs to one variable x’’. We thus write
pons
/
z3 \ car Te
Zs5
as
z3 \ Car
ms Z6
Xoo
where x’”’ is known, and it follows that one must identify z; and zp to one variable
z. We have shown
(Kodratoff [1980]) that this kind of reasoning is always possible when the trees are
of polynomial increase.
Some extensions to exponential increase are also described in (Papon [1980]).
Remark 2. Our algorithm uses heuristics we have called ‘‘lethal successes’, since they name matching
successes that would not lead to valid recursion relations. We give here an example of this behavior.
SYNTHESIZING LISP PROGRAMS 343
Example 3. (Composition)
Xo = (A) — F(x) = (AA),
f,;(x) = cons
/ \
Car cons
344 KODRATOFF
AND JOUANNAUD
f(x) = cons
i \
Gar cons
X / \
Car cons
Cdr / \
ix car cons
Oa tr / ‘
edit Cat cons
x x / \
Cat ni |
cdr
cdr
x
The matching of f, and f, leads to two lethal substitutions. The x+ receives cdr.cdr(x) which cannot be a recur-
sion relation, since the fragments cannot recur quicker than the domain (a computation trace could not exist).
The x++is such that cdr(x++) \ x which leads to no recursion since cdr has no inverse. The reader can check
that these lethal successes occur for other f;. Since the first lethal success occurs at x, f,(x) is rewritten as
f(x) = hy(x,g1(x)) with
cons
\
Car cons
hyxe2)s= x a
Car Zz
Galir
X
and
cons
ia
Car cons
g;(x) = X i eX
Crabli nil
Cali
x
In the same way, one could see that f,(x) must be given the form f(x) = ho(x,g2(x)),
with
cons
CAit cons
x nei
hex ee Al ie cons
: cdr \
X car Z
cdr
cdr
X
and
SYNTHESIZING LISP PROGRAMS = 345
jp
car cons
x i
@Aélir cons
£7(x) = cdr / \
Remark 3. We spoke of matching successive fragments. One may also discover recursion relations by matching
f; and a sub-tree of f;4;. This would lead to non-terminal recursive forms, as the following example shows.
fo(x) = cons |
/ \ |
car nil |
ee 8 6k VE aeees
Ae =
| | |
{Cx )e=scons: e| |
/ \ |
car cons |
cdr / \ |
x car nil |
| |
fo(x) = cons |
/ wa
car cons
cdr / \\
cdr cat cons
x cdr / \
Xx car nil
Xx
f;(x) = cons
/ \
car
cdr /
cdr Cae
Cat cdr
x cdr
Xx
and can be matched as shown by the dotted lines. One then finds:
cons
ho oN
fi(x)9 = car fo(x),
cdr
x
cons
fi) ae oy
Calit
X
fi4i(x,z) = hy(x,f,(x))
SYNTHESIZING LISP PROGRAMS = 347
where
cons
\
ho(x,z) = car zo
cdr
X
cons
fie aN
hy(x,z) = oe eo
eGlit
X
Considering the h,(x,z) as new fragments, one finds hj4;(x,z) = hj(cdr(x),z). This ends the recursion relations
necessary to describe the recursive behaviour of fj(x).
Remark 4. We spoke of matching successive fragments. Instead of matching f; and f;4,;, one could also attempt
a matching of f; and f;4,. For each 1 <j <k, a different function can in principle be found. In the following,
we Shall deal with only one function at a time, so this problem will not be studied again.
B.6.2. The halting conditions contain reduction functions that compute only atoms.
With respect to the way our programs are built, this property is a mere triviality:
each program is obtained
as the limit of fragments that, by construction, contain reduction functions computingatoms only. For any x,
the computation which takes place is precisely the computation made by the fragment f(x) following the predi-
cate p(x) = True. This property is not true for any program belonging to the scheme of theorem 2. We shall
now make precise how this semantic feature will show up in the scheme. We have given in (Jouannaud and
Kodratoff [1979]) a formal description of this condition which was stated as a property of our scheme. We
348. KODRATOFF AND JOUANNAUD
Example. In example 1 of section 2.5, the recursion relations instantiate the scheme of theorem 2 in order to
give the program:
ELSE
Gear Sed rex J ex HiCee))s)
HG z) = IF atomicdr.car.,cdr.can
cde) 1 HEN Goms
Car Z
EUSE Hicar
2c dite) rze)
For any x of the domain, car.car.cdr(x) and car(x) are atoms. Since x’’ undergoes no substitution,
car.car.cdr(x’’) and car(x’’) are always atoms in the halting condition of G. On the contrary, car.cdr.cdr(x) is
an atom only if cdr.car.cdr(x) = True. It follows that x’, which undergoes the substitution x’ \ car.cdr(x’) as
long as it does not fulfill this condition is always such that car.cdr.cdr(x’) is an atom when the halting condition
of G is reached. G is therefore an example of our claim: the halting condition of G contains reduction func-
tions which compute atoms only.
As a counter-example, consider the function G’, similar to G where the halting value is
SYNTHESIZING LISP PROGRAMS = 349
cons
/ \
car cons
cdr / \
otaliy car cons
xen Cap / \
corre Ne car Z
x x
for instance, then G’ would not fulfill our condition and we want to show that a function like G’ cannot be syn-
thesized from examples by our methodology.
Preliminary remarks. We shall need to consider in greater detail the way a matching may succeed and
thoroughly examine the case of the unary functions car and cdr. As before, we shall call f,(x) the fragments
obtained from the examples and g,(x) the generalized fragments such that g,(x) and g,4;(x) match.
When the matching between f,(x) and f,4;(x) is attempted several cases may arise due to the fact that f,
and f,4; are cons-trees made of functional names of atoms: there is a reduction function between the leaves x
and the cons which is their nearest father (e.g. in G’ above, car.cdr.cdr is between cons and x’’).
— aterm whose root is cons or a constant is substituted by another term. The generalized expression will
thus contain a variable in place of this constant or term. This variable will be said to be term-typed and we shall
further name it z' (the upper indices will always characterize variables introduced by a generalization. z at
power i will be written (z)'). In other words, each z! is initiated by a term whose root is cons or a constant. In
example | of section 2.5, nil is generalized to the term-typed variable z. In the example of section 4, car(x) is
generalized to the term-typed variable y.
— aterm whose root is a unary function is substituted by an other term. This case is a lethal success and
no further generalization takes place. For instance, we do not accept a matching between car.cdr(x) and
car.car.car(x). The substitution cdr(x) \ car.car(x) leads to a lethal success.
— an x matches a unary function. In general, this matching is not unique and the generalization will
introduce new x-typed variables further named x'. In other words each x’ is initiated by x. In example 1 of sec-
tion 2.5, we have x’ \ car.cdr(x’) and x’’ \ x”’ which are two x-typed variables (where it is understood that the
identity is a particular reduction function).
It follows that the generalized fragments are cons-trees, the cons are father of
— another cons
— a reduction function which has an x-typed variable as leaf
— a term-typed variable
— aconstant.
The nature of ci(®) This vector appears in our scheme in the recurrence relations of g,: g:4;(%) = h;(X,g;(c,(X))).
This vector contains two types of variables, we shall write it as (c’x!,.. . Caxtolaz ces Cage neve: aimey
actually be different for each i but, on the contrary, we shall see that c’, = c’, for all i. Recall that b is the
constant reduction function such that pj4;(x) = p;.b(x). The reduction function b may be the r-th power of an
irreducible reduction function b’, i.e., b = (b’)". For instance, if b = car.cdr.car.cdr then b = (car.cdr)? =
350 KODRATOFF AND JOUANNAUD
(b’)2.. We introduced the following lethal success (see (Kodratoff [1979]) for its reason, it has been here
exemplified in example 3 of section 2.5). If c’, is not equal to (b')®, 0 <p <r, then we have a lethal success.
The case p = 0 means that c’;, is the identity issued from a substitution x \ x. If follows that there are at most
r+1 possible values for c’, and they cannot be obtained as limits of infinite sequences. To each c’, we associate
an x-typed variable x*.
Description of g9(x)
Let u‘; be the reduction function applied to x* in g;(x). From the above preliminary remarks, one sees that
Zo(x) is a cons-tree the sons of which are
— cons
— a term-typed variable
— ufxk
— constants
and we know that uj; =c’,uj, where c’, = (b')?0 <p <r, where r is defined by b = (b’)". These restrictions
to the scheme insure that u;x' in the halting values compute atoms only. By construction of the domain, we know
that each x' is obtained by ‘‘adding’’ the term a to x;_;. We shall write this x; = a + x;_; (+ is not commuta-
tive!). From the input-output examples, we know also that atom.ug(xo) = True, i.e. Ug selects an atom out of
Xo, atom.u;(x;) = True, i.e. uy =Cc’,.Ug selects an atom out of x; = a@ +x 9, but depending on the relative
values of b, Ug, and c’, this atom may be in Xo or in a.
Case |.
If c’, = (b’)" = b, then this atom is always the same atom of Xo and u,(x,) is always an atom.
Case 2.
lie, = (b)*, 0 <p <r, then after a finite number of steps, say m steps,
the atom selected in at: -- +atxXo
where + appears m + | times, will be in a.
SYNTHESIZING LISP PROGRAMS © 351
Let n be the least common multiple of p and r. Let X9,...,Xm4n a Sequence for which we have verified that atom
uj(x)) = True, 0 <i < m+n. The reduction function Up4n41 applied to Xm+n41 reaches the same place in a
aS U,, applied to x,, which is an atom by hypothesis.
It follows that if u,(x,) is an atom for the n + m + 1 first examples then u,(x;) is an atom for any k. Since
each x of the domain has the same structure as some x,, the proof is completed for this case.
a a a XO
i 2 Rag yt atop fl! 2 eh Sa alllen pee ee |\enetetere tar 2 hung ae yet tee lee eg ce era ia
|
|
|
= le J=t=hal-l=l-)<le)-2=-2 |
Co ap placed Uo
10 times.
a a a XQ
||
|--|--|--|--|]--|--|--|--------------------- |
Capp ied Ug
7 times.
Here, b=(b’)* and c’, = b’.Ug ‘‘strikes”? two different places according to whether x; contains an even or odd
number of a’s.
Case 3
Ifc, = (b’)° = Identity, the same proof as in case 2 applies, except that it is enough to stop at X,p.
Ug always ‘‘strikes’’ at the same place in a when at least three a’s are added to Xo.
352. KODRATOFF AND JOUANNAUD
Remark \. It follows that some recursion relations might appear on the very few first examples that would not
be valid for all f;. One has to verify the recursion relationships up to the (n + m + 1)th example in order to
insure their validity.
B.6.3
We are now able to characterize the scheme of the functions we synthesize. They formally are instances
of the scheme of theorem 2 and their semantics fulfill the restrictions 2.6.1 and 2.6.2.
The last restriction implies that the halting value of the functions of embedding 0 contains only atoms and
term-type variables and constants. These term-type variables are in turn computed by functions of embedding
1 which fulfill the same condition. The value g(x) is thus computed atom by atom, each being put into a frame
which expresses the structure of g(x).
Definition.
As in section 2.3, b is the functional name of the *-labelled atom in @ and u is the functional name of the atom
of x9 which is in the direction of b. The cons-tree x adapted to a reduction function r is the smallest cons-tree
such that atom.r(x) = True. For instance, let r = car.cdr.car then x has the structure
cons
—
B ni |
because car.cdr.car(x) = B and any cons-tree which verifies this relation contains x. Let (xo,a) be a linear
ascending domain adapted to b, let x; be the (i+1)-th representative of this domain and uj, j <n, the func-
tional names of the n non-nil atoms of x;.
Let us further suppose that b does not contain car only, i.e., we rule out the (left) flat lists: (A), ((A)),
(((A))), .... For the sake of simplicity, we shall study only the transformation when X = {x} is any linear
ascending domain and X’ = {x’} is the (right) flat lists: (A), (A B), (A BC), ....
Because X is adapted, it is trivial that this transformation puts the atoms of any x into a one-to-one
correspondence with the atoms of x’. For any f(x) whose halting values compute atoms only, it is therefore
evident that f(x’) is obtained by replacing the functional values of these atoms in x by their corresponding
values in x’. This is stated more precisely with the following conventions:
Let uj} be the functional names of atoms computed in the halting values of f or of functions called by f
and u’} the corresponding functional names in X’.
If b = (b’)' one knows that c’; = (b’)k, 0 <k <r, then, if b’ contains m cdr, y; = (cdr) *™.
We define a transformation f — f’ by: in the recursive scheme of f, replace p,(xx) by p’\(xx), replace ui by
u’?, replace b by B, replace c’; by yj; in each level of recursive definition. Then f{x} = f’ {FLATTEN(x) }.
It is quite clear that the number of recursive calls and the names of the atoms can be put in a one-to-one
correspondance, and since the halting values in f compute atoms only, f and f’ will give the same result.
Pi41(x) = p,(car.cdr.(x)).
SYNTHESIZING LISP PROGRAMS 300
f(x) = 6x oarGxoe
ni)
cons
Zo(x,y,z) = lak
y z
Sia;
CX. yz) — = 8 oan acd nm Gx | oN |}
y Z
cons
car nil )
Grann
ealye
Xx
cons
ae
|e cdir
oe
X
and therefore
it follows that the halting condition must not be changed but the x-typed variable which recurs like car.cdr(x)
must now recur like cdr(x). The halting condition of G contains no reduction function; it is therefore
unchanged. The function H is a constant function which stops always at cons(car.car.cdr(x),nil) and since for
356 KODRATOFF AND JOUANNAUD
BG) = GrilGecca
5 Oo )ainly)
cons
G’ (x,y,z) = IF atom.cdr(x) THEN aN
y y/
cons
ELSE G(cdr(x), ae ,H’ (x) )
y
Definitions
— Among the nodes of a binary tree let the right-most nodes R be recursively defined by: the root of the
tree belongs to R and a node which is a right son of a node of R_ belongs to R. In
cons’
A cons’
cons \
/ ‘ cons
B ni /
— Let x be acons-tree. A sub-list of /evel 1 is a left son of a right-most cons. In the above cons-tree
A,(B),C are sublists of level 1.
SYNTHESIZING LISP PROGRAMS = 357
— Let R, be the right-most nodes of a sub-list of level n. Then a sub-list of level n+1 is a left son of a
cons of Rg.
In
cons
i \
A’ \
cons
/ \
hi \
cons’ cons
/ \ / N
B \ E’ nil
\ *
cons
/ \
/ nil
cons
/ \
C \
GOmssS
i \
D mil|
D nil
is left son of a cons in Rj, this sub-list is of level 2.
— let F bea function defined on the flat lists only and F’ be the all-levels recursively defined function
associated to F. We shall first define F’ by its semantics.
The computation of F’(x) is carried as follows: all sub-lists of level 0 in x are given a symbolic name s¥, i.e. x
is re-written as (s?...., s°) which is a flat list. The first computation of F’(x) is then carried on, it is the compu-
tation of F((s?,...,59)). The same operation is recursively applied to each s in F((s?,...,82)) as long as s; is not
an atom.
Syntactic Definition of F’
We said that F’ is derived from F because it works like F on sub-lists of x, this being repeated at each
level. Let F bea function which computes F(x) atom by atom, F might not belong to the scheme of theorem
2 but it will anyway have a halting value like the halting values of our scheme: it contains cons the sons of
which are cons, a reduction function which computes atoms only: u'x', constants and term-typed variables.
These last variables are also computed by functions that compute their result atom by atom.
358 KODRATOFF AND JOUANNAUD
The construction of F’ follows its semantics. F’(x) will have the same overall structure as Fe except
when u' x! are computed in F or the functions called by F. Each u' x’ is replaced by a program which is:
It could be possible to prove formally by induction on the level that the syntactic F’(x) will have the behavior
of the F’(x) defined by its semantics. This proof is quite trivial since it is evident that the syntactic F’(x) will
compute level by level as far as F computes atom by atom.
Example. We have elsewhere given examples of this transformation in the frame of our scheme (Jouannaud
and Kodratoff [1979]). We shall now give an example which does not belong to our scheme.
Let fib(x) be a FIBONACCI-like function defined on flat lists. The length of fib(x) equals the length of
fib(cdr(x)) plus the length of fib(cdr(cdr(x))).
fib(x) = fidx,nil)
cons
We obtain the function fiball(x), the all-level recursively defined function associated to fib(x), by transforming
the car(x) of
into:
fiball(x) = fiall(x,nil)
The reader can easily verify that fib((A B)) = (A), fib((A B C)) = (BC),fib((A BCD E)) =(DEDDE),
Remark. This transformation solves the following problem. We have said that we are able to synthesize func-
tions whose domain is linear ascending. One could try to synthesize all-level recursively defined functions from
examples given on sequences of linear ascending domains from the (right) flat lists: (A),(A B),(A B C),... to
the left flat lists; (A),((A)),(((A))), --* . The methodology we advise is a combination of synthesis from
examples and from transformations: give the examples on the flat list and add the specification ‘‘to be
extended to all-level’’ which will imply the above transformation of the synthesized function.
fialln(x.n,z) = TF atom.cdr
Ce) THEN cons
ELSE
halinCedr (x) ntal Incedm cdr (on nezne
embedding level of x, F’ contains thus explicitly the specification of the embeddings computed in F’(x). One
can imagine a first transformation: each time an expression the atomicity of which is not checked is consed in
the result, replace cons by appen. The function appen is nearly the classical APPEND, except that it must
make the difference between nil and other atoms:
This transformation has no limitation but will not be further described here since it leads to no new result. On
the contrary, we shall study another transformation which applies to a very restricted scheme:
— F’(x) takes always the form F’(x) =FF’(x,nil) with only one accumulator.
— all the halting level in F’ and the programs it calls contain only one term-typed variable and have a flat
list form: the atoms computed by ujx' are consed in the term-typed variable. This means that the halting
values of F and the programs called by it are:
cons
cons
eee Z
cons
We now define F”’ by the following transformation: F’’(x) = FF’’(x,nil) and F’’ recurs like FF’ but its halting
values are:
362 KODRATOFF AND JOUANNAUD
ELSE ugh
xh az ie
Otherwise stated, if u/x) is an atom then the sub-tree of the halting value in F:
\
cons
tae a
U j JxJ \
F
\ roy.
eee, ean
u;’x! \
When we use this transformation, on the one hand the recursive calls necessary when u/x! is not an atom
are kept as in F’ but, on the other hand, the flat list structure of the halting values is also kept because each
halting value has a flat list structure. We shall study this transformation on an example issued from F(x) = x
synthesized on flat lists. The transformation of section 4 is used in order to obtain an F’(x) = x defined at all-
levels, then F(x) = FLATTEN(F’(x)) = FLATTEN(x). The interest of this transformation lies in the fact
that F”’ is quite efficient.
{xo
= (A) > f(x) = A; x; =(A B) >f(x,;) =(ABC) 15)
=(ABOC)::--: }
Matching pj; and pj4, leads trivially to the recursion relation p;4,;(x) = p,(cdr(x)). Matching f; and the ‘‘tail’’ of
fj4; (as the arrows show) leads to the recursion relation:
where
We finally obtain a program whose stopping conditions are given by the first predicate and the first trace and
whose recursive body is given by the recursion relations:
where H_ is the same as in the above COPY. The new variable z undergoes no substitution, its value will
always be z=nil. One would obtain in the same way a COPYALL(x,z) where z has always the value nil. We
apply now the transformation of this section: cons(COPY ALL(car(x)),z) which is now present in COPYALL
and HALL must become FF’’(car(x),z) since we know in this call that car(x) is not atomic. One obtains:
MQFLAT(x) = FLATTEN(x)
= FQQ(x,nil)
FQQ(x,z) = IF atom.cdr(x) THEN IF atom.car(x) THEN cons(car(x),z) [FQQ1]
ELSE FQQ(car(x),z)
We shall characterize the called instance of FQQ by the number which points toward it.
When (MQFLAT)*(A)) is evaluated, only 1 call to FQQI is needed. When one adds one atom in the list
then (MQFLAT’(A B)) calls FQQI then HQQ (thus FQQ4), i.e. each new atom induces 2 new recursive calls.
When one adds a nesting level, MQFLAT applied to ((A)) needs a call to FQQ] and a call to FQQ2, i.e., each
nesting level adds only 1 supplementary recursive call. It follows that if
x contains n atoms and a Sai of
parentheses, 2n+m-—2 calls to FQQ and HQQ are needed.
SYNTHESIZING LISP PROGRAMS _ 365
MCFLAT = FOO(x,nil)
FOO(x,z) = IF(x=nil) THEN z
ELSE IF atom(x) THEN cons(x,z)
ELSE
FOO(car(x) ,FOO(cdr(x),z))
In order to study the complexity of this function we shall count the number of times FOO is called when one
evaluates (MCFLAT X) where X isa list containing n atoms and m couples of parentheses. In an intuitive
way, consider first the list (A) which leads to 3 calls of FOO (the initial one and then 2 recursive calls). When
an atom is added to it, say X becomes (A B), 5 calls to FOO are necessary, i.e. we add 2 new calls to FOO per
new atom. The same is true when one considers ((A)), i.e. we add 2 calls by new nesting level. This means
that FOs by new nesting level. This means that FOO is called 2n + 2m — | times by MCFLAT the complex-
ity of which is the complexity of MQFLAT + m- 1. It is difficult to say that our program is ‘‘better’’ than
McCarthy’s since the stopping conditions are not the same and since we use cross-recursion. We therefore do
not claim that each LISP system should implement MQFLAT. It is nevertheless surprising that a transforma-
tion which is systematic enough to be implemented, i.e. a program that can be automatically synthesized can
well challenge the programs due to very skillful programmers.
Remark. As we hinted at in the beginning of this section, it could be possible to define a more general flatten-
ing transformation. When the above transformation is not allowed, then replace cons by appen.
F. Conclusion
We presented a methodology for program synthesis which mixes synthesis from examples and program
transformation. A first version of the desired program is obtained by its input-output behavior and the
definitive version is obtained by a transformation of the first one, this transformation being induced by a
specification. It must be noticed that the main task is then to define a set of ‘‘suitable”’ specifications, suitable
being understood relatively to the domain and relatively to the problem to be solved. The solution we propose
to program synthesis from examples is a kind of unification algorithm which unifies sequences of terms rather
than two terms. It is a convenient (and theoretically sound!) way to express the fact that one induces from a
set of examples by putting into evidence what is common to all the examples and what are the common
differences between different examples. From the theoretical point of view, we are not totally satisfied because
we have not yet proved the soundness of our methodology. It may happen that we are able to synthesize
different programs from one set of examples. How can we be sure that these programs are equivalent? How
many examples are necessary to insure this equivalence? An answer to these questions is equivalent to a
LaGrange theorem (stated for polynomials on the reals) for the trees of polynomial increase.
G. Acknowledgements
We thank E. Papon who implemented the BMWk algorithm for the synthesis from input-output behavior
and F. Dupuy who implemented the transformations for the three specifications developed in the text.
H. APPENDIX
If we start from n examples, we get a computation trace which contains n embedded
IF...THEN...ELSE... statements. (See section B.4.)
Theorem 1. Let
be a trace where p_{i+1}(x) = p_i(b(x)), f_i(x) = g_i(a(x)), g_{i+1}(x) = h(x, g_i(c(x))), where p_0, b, h, f_0 and the functions
contained in the vectors a and c are strict and monotonic (Manna et al. [1973]). Then the limit of F_n(x)
when n tends towards infinity is strongly equivalent to F(x), F(x) = f_τ(x, a(x)), where f_τ is the least fixed point
of the functional τ defined by:
Proof:
First part: equivalence of τ with a generalized trace.
Let Ω be the everywhere undefined function (i.e. Ω(x) = ⊥ for all x). We shall prove that G_{n-1} and τ^n[Ω]
are identical. We use computational induction (Scott [1970]).
• initial step
The induction hypothesis is τ^n[Ω](xx,x) = G_{n-1}(xx,x), from which we want to prove that
τ^{n+1}[Ω](xx,x) = G_n(xx,x).
By definition of τ,
We use the definition of G_{n-1} and the distributivity of h relative to the conditional (Manna et al. [1973]):
Second part: We still have to prove that F_n(x) = G_n(x, a(x)). The following instantiation lemma, with
a* = (I, a) where I is the identity, proves our point.
The instantiation lemma:
Let G_n be a trace, i.e. a chain whose domain of definition increases with n. It therefore has a limit
when n tends to infinity; let G be this limit. Let a* be a vector of monotonic functions. Then the chain
F_n(x) = G_n(a*(x)) tends toward G(a*(x)) when n tends toward infinity.
Proof:
It follows from the continuity of the functional ρ[H](x) = H(a*(x)) and from the chain property of
F_n and G_n:
= ρ[G](x) = G(a*(x)).
As seen above, the recursion relations that may be found are more complicated than those allowed by
theorem 1: we need to generalize them to any level of embedded recursive calls.
Definition. A sequence f_i is defined by embedded recursive calls, and will further be named a recurrent sequence,
if it is defined by:
F(x) = lim F_n(x) = f_τ(x, A(x,x)), where f_τ is the least fixed point of the functional
H_{n-1}(xx,X,y) = IF p_0(xx) THEN h_0(X,y) ELSE ...
                  IF p_{n-1}(xx) THEN h_{n-1}(X,y) ELSE ⊥.
When one starts with n fragments, the functions C and H are defined from n-1 fragments generated as sub-
problems of the original problem. This explains why we have chosen to write g_{i+1}(x) = h_i(x, g_i(c_i(x))), which,
in turn, implies that c_0 and h_0 must be associated to p_0(xx) = True.
Proof:
It uses a double recurrence, first on n as in theorem 1, second on the recurrent property of the trace F_n.
It also uses the fact that H is strict if h_0 is strict, which we suppose to be true.
- if F_n is of type (2) then one proves first that τ^{n+1}[Ω] = G_n, then that G_n(x, A(x,x)) = F_n(x).
• prove that τ^{n+1}[Ω] = G_n
The initial step is identical to the initial step in theorem 1. The induction step: (the induction hypothesis is
τ^n[Ω](xx,x) = G_{n-1}(xx,x))
We must now induce on the recurrent form of F_n. It is quite evident that if p_i(xx) = True then
But our induction hypothesis says that C and H are limits of traces H_n and C_n; it follows that if p_{i-1}(xx) = True
then H(xx,x,y) = h_{i-1}(x,y) and C(xx,x) = c_{i-1}(x). We can now write:
τ^{n+1}[Ω](xx,x) = IF p_0(xx) THEN g_0(x) ELSE ...
τ^n[Ω](xx,x) = G_n(xx,x)
A is the limit of traces A_n(xx,x), and p_i(xx) = True implies A(xx,x) = a_i(x). It follows that:
= G_n(x, A(x,x)).
The instantiation lemma implies that their limits are equal, which completes the proof.
References
Biermann [1978]
A.W. Biermann, "The inference of regular LISP programs from examples," IEEE Trans. on Systems, Man
and Cybernetics, Vol. SMC-8 (1978), pp. 585-600.
Hardy [1975]
J. Hardy, "Synthesis of LISP functions from examples," Proc. 4th IJCAI (1975), pp. 268-273.
Huet [1976]
G. Huet, "Unification dans les théories d'ordre 1,2,...,ω," Thèse de doctorat, Université Paris 7 (1976).
Green, Waldinger, Barstow, Elschlager, Lenat, McCune, Shaw and Steinberg [1974]
C.C. Green, R.J. Waldinger, D.R. Barstow, R. Elschlager, D.B. Lenat, B.P. McCune, D.E. Shaw, and
L.I. Steinberg, "Progress report on program-understanding systems," Memo AIM-240, Report STAN-CS-74-444,
A.I. Lab., Stanford (1974).
Kodratoff [1978]
Y. Kodratoff, "Choix d'un programme LISP correspondant à un exemple," Congrès AFCET-IRIA reconnais-
sance des formes et traitement des images, Chatenay-Malabry (1978), pp. 212-219.
Kodratoff [1979]
Y. Kodratoff, "A class of functions synthesized from a finite number of examples and a LISP program
scheme," Int. J. of Comp. and Inf. Sci. 8 (1979), pp. 489-521.
Kodratoff [1980]
Y. Kodratoff, "Un algorithme pour l'obtention de formes terminales récursives à partir de traces de calcul,"
Actes des journées francophones: production assistée de logiciel, Genève (1980), pp. 36-63.
Manna [1974]
Z. Manna, Mathematical theory of computation, McGraw-Hill (1974).
Papon [1980]
E. Papon, Thèse de 3ème cycle, Université Paris-Sud (1980).
Reynolds [1970]
J.C. Reynolds, "Transformational systems and the algebraic structure of atomic formulas," Machine Intelli-
gence 5, Meltzer and Michie, eds. (1970), pp. 135-151.
Robinson [1965]
J.A. Robinson, "A machine oriented logic based on the resolution principle," J. ACM 12 (1965), pp. 23-41.
Scott [1970]
D. Scott, "Outline of a mathematical theory of computation," 4th Annual Princeton Conf. Inf. Sci. and Syst.
(1970), pp. 169-176.
Smith [1978]
J.P. Smith, "A class of synthesizeable LISP programs," Report CS-1977-4, Dept. of Computer Science,
Duke University, 1978.
Summers [1977]
P.D. Summers, "A methodology for LISP program construction from examples," J. ACM 24 (1977),
pp. 161-175.
CHAPTER 17
DEALING WITH SEARCH
Alan W. Biermann
Duke University
Durham, NC 27710
A. Introduction
All of the program synthesis methods described in this volume involve search. Unfortunately, this means
they are all computationally expensive and that a central issue in program synthesis is the problem of dealing
with search. For example, if we study almost any methodology in this book, we see it requires about ten or
fifteen steps to generate the example program described in Chapter 1. Furthermore, the number of different
rules or transformations applicable at any given point is very large so the target program is at depth ten or more
in a very bushy tree. If the target program is to be found via a uniform search of the tree, it will probably be
out of range of any realistic automatic procedure simply because of the astronomical number of nodes that
must be expanded before adequate depth would be reached.
The usual method for addressing such search problems is to apply ‘‘heuristic’’ methods that abridge the
complete search tree by making guesses as to which paths to follow. Using analogies with human thought
processes and other arguments, the system designers build strategies into the search program for eliminating
vast portions of the tree and moving to search depths that would not otherwise be possible. Examples of this
approach are described in many chapters of this book. This approach, however, has severe shortcomings
because the tree pruning methods are not well understood and may remove the desired target node as well as
many others from consideration. In fact, this problem seems to occur in computer chess programs and the
result is that uniform search programs seem to be able to dominate heuristic programs. While heuristic
methods seem to be the appropriate method for dealing with large search problems, they do not necessarily
have a good record of success.
This chapter will argue for a third alternative which attempts to circumvent at least partially the problem
of astronomical search and which avoids the dangers of heuristic methods. The technique is to define classes of
programs that are much smaller than the set of all possible programs and to find synthesis methods for the indi-
vidual classes. Since the classes are smaller, it is possible to move through the search to a much deeper level
and to find much more complex programs. However, since the synthesis method can generate any member of
the class, it can function as reliably as any traditional language compiler.
Using this approach, program synthesis is done as follows. We assume there is a list of classes of pro-
grams for which the synthesis problem is solved. Given a new synthesis problem, one attempts to solve it
using one of the known synthesis methods. If one of the synthesis methods is successful, the solution is found
efficiently and reliably. If none of the synthesis methods are applicable, less efficient or less reliable methods
can be tried.
In this chapter, we describe two classes of LISP programs and their associated synthesis methods. Each
synthesis procedure is vastly more efficient than a general synthesis method and is completely reliable. In fact,
programs of the complexity of the example of Chapter 1 are well within the reach of these methods. In the fol-
lowing sections, we describe the classes of "regular" and "scanning" LISP programs and their associated con-
struction algorithms.
By (1) we mean that any member of the class or its equivalent can be synthesized by the method. By (2) we
mean that no synthesis method could be created which could synthesize every regular LISP program with fewer
examples than our method. By (3) we mean that one can choose examples randomly from the behavior of the
target program and be sure convergence will be achieved. No carefully designed training process is necessary.
C. Regular LISP
We will be constructing programs from the following five primitives:
fi(x) = NIL
fi(x) = x
fi(x) = fj(car(x))
fi(x) = fj(cdr(x))
fi(x) = cons(fj(x), fk(x))
The synthesis process will proceed by breaking the desired input-output relation into these primitives and con-
structing a general function by a merge operation. Thus if it is desired to generate a program which computes z
= (A.NIL) from x = (A.(B.(C.NIL))), we can write z in terms of x as follows.
z = cons(car(x),NIL)
In general, one can always find the composition of z in terms of x by the following construction.
After the composition of z in terms of x is found, its breakdown into primitive functions is straightforward.
In fact, the example composition breaks down as follows:
It turns out that in many situations, there are several ways to decompose a relation into primitive func-
tions. As designers of the synthesis procedure, we might find all possible decompositions and then do all possi-
ble mergers on those decompositions. However, it was found that most decompositions have some undesirable
properties and actually only one is needed. We will use only decompositions which are both direct and free of
car-NIL and cdr-NIL instances.
Directness requires two things. The first is that car and cdr operations be given precedence in the func-
tional decomposition. That is, whenever there is a choice between selecting the cons operation and either car
or cdr, one selects the latter. Suppose x = (A.B) and z = (A.A). Then we can write
However, this violates directness because this same input-output relationship can be expressed with a break-
down that applies car before cons.
f1(x) = f2(car(x))
f2(x) = cons(f3(x), f4(x))
f3(x) = x
f4(x) = x
Direct calculations are usually more efficient because they avoid the unnecessary repetition of some car and
cdr operations.
The second requirement for directness is that the calculations be finite. Thus there can be no infinite
series of cons operations in a direct computation.
The other property we want is that the decompositions be free of car-NIL and cdr-NIL instances. Suppose
a program f,; has the form
f1(x) = f2(car(x))
f2(x) = NIL
Then it is clear that the value of f1(x) is NIL and the computation of car(x) is wasted. This is called a car-NIL
instance and is another inefficiency to be avoided.
The desired breakdown of z in terms of x is, in fact, given by the following function t(x,z).
t(x,z) =
  (N)                             if z = NIL
  (I)                             if z = x
  (A(t(car(x),z)))                if car(x) appears in c(x,z)
  (D(t(cdr(x),z)))                if cdr(x) appears in c(x,z)
                                  and no car(x) or x as
                                  an argument of cons or x
                                  alone appears in c(x,z)
  (O(t(x,car(z)), t(x,cdr(z))))   otherwise
One can illustrate the trace function t(x,z) on the example of this section.
t(x,z) = (A t(car(x),z))
       = (A t(A,(A.NIL)))
       = (A (O t(A,car(z)) t(A,cdr(z))))
       = (A (O (I) (N)))
The direct and car-NIL, cdr-NIL free decomposition comes from t(x,z) by interpreting N, I, A, D, and O as
applications of, respectively, the NIL, identity, car, cdr, and cons operations. Moving in from the outermost
parenthesis, one can construct the decomposition
f1(x) = f2(car(x))
f2(x) = cons(f3(x), f4(x))
f3(x) = x
f4(x) = NIL
It is shown in Biermann [1978] that synthesis from the decomposition of t(x,z) is capable of generating any
‘‘regular’ LISP program.
With these concepts, it is now possible to make some key definitions. Many times the car and cdr func-
tions are composed to a considerable depth, so space can be saved with an abbreviated form. Specifically, the
expression c w r (x), where w is a word in (a+d)*, will be used to represent the composition.
Thus cdr(cdr(car(x))) will be written cddar(x). The set of such expressions can be written c(a+d)*r(x) with
the understanding that cr(x) is the identity function.
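Purely as an illustration of this abbreviation (the helper c*r below is ours, not part of the synthesizer), a word w over {a,d} can be interpreted mechanically as the corresponding car/cdr composition in Common Lisp.

(defun c*r (word x)
  ;; WORD is a string of #\a and #\d; the rightmost letter is applied first,
  ;; so (c*r "dda" x) computes cddar(x) = cdr(cdr(car(x))).
  (loop for ch across (reverse word)
        do (setf x (ecase ch
                     (#\a (car x))
                     (#\d (cdr x)))))
  x)

For example, (c*r "dda" '((1 2 3))) returns (3), the same value as (cddar '((1 2 3))); the empty word returns x itself, matching the convention that cr(x) is the identity.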
(1) pi = atom(c wi r(x)) where wi is in (a+d)* for i = 1,2,...,n-1, and where wi is a proper suffix
of w(i+1) for i = 1,2,...,n-2, and
(2) pn = T.
Predicates constructed by the system described here will be such chains. An example chain of predicates is
p1 = atom(car(x))
p2 = atom(cadar(x))
p3 = atom(cddadar(x))
p4 = T
A semiregular LISP program f will be defined to be a finite nonempty set of component programs
fi, i = 1,2,...,m, with one of them, f1, being designated as the initial component. The value of f operating on x
will be f1(x). A component program fi of f is of the form
fi(x) = cond((pi1, fi1)
             (pi2, fi2)
             ...
             (pin, fin))
where pi1, pi2, ..., pin is a chain of predicates with arguments x and each fij, j = 1,2,...,n is one of the following
D. Synthesis
Once the primitive function decomposition is complete, the synthesis can proceed. The concepts related
to the construction will be described here, but the actual implementation for an efficient construction will not be
described. The full details are given in Biermann [1978].
The method is to assume that the total number MAX of component programs is known and to examine
all possible mergers of the functions in the primitive decomposition into MAX functions. Thus in the
decomposition of the example program, there are four primitives f1, f2, f3, and f4. If MAX = 1, then all four
functions would be merged into one.
f1(x) = f1(car(x))
f1(x) = cons(f1(x), f1(x))
f1(x) = x
f1(x) = NIL
Of course this is an unacceptable merge because f1 is not uniquely defined. However, if a chain of predicates
could be found such that
f1(x) = cond((p1, f1(car(x)))
             (p2, cons(f1(x), f1(x)))
             (p3, x)
             (p4, NIL))
then f1 would become well defined. Unfortunately, one can show that no such chain can be found, so this
synthesis attempt fails. Furthermore, no reordering of the clauses within the conditional will yield a form for f1
that will do the example computation, so MAX is increased and another attempt is made.
At MAX = 2, the four functions can be merged in 16 different ways.
f1--f1, f2--f1, f3--f1, f4--f1
f1--f1, f2--f1, f3--f1, f4--f2
  . . .
f1--f2, f2--f2, f3--f2, f4--f2
The synthesis procedure tries each possible merger and attempts to build conditionals with predicate chains to
make the component programs well defined. In this case, it fails again but at MAX = 3 it finds an acceptable
merger:
The program is
f1(x) = cond((atom(x), cons(f2(x), f3(x)))
             (T, f1(car(x))))
f2(x) = x
f3(x) = NIL
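To check that this merged program really performs the example computation, it can be transcribed directly into Common Lisp (the transcription is ours; the component names follow the text):

(defun f1 (x)
  ;; Component f1: descend by car until an atom is reached,
  ;; then build cons(f2(x), f3(x)).
  (cond ((atom x) (cons (f2 x) (f3 x)))
        (t (f1 (car x)))))

(defun f2 (x) x)            ; the identity component

(defun f3 (x)               ; the constant-NIL component
  (declare (ignore x))
  nil)

Evaluating (f1 '(A B C)), i.e. the S-expression (A.(B.(C.NIL))), returns (A), which is the required z = (A.NIL).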
The basic method is thus to try every possible merger. When a merged function becomes doubly defined,
a conditional is constructed and a chain of predicates built to remove the ambiguity. If no such conditional can
be built, the merger fails and another is attempted. In practice, one can design the synthesis method so that
most unsuccessful mergers are never examined, and the correct one is usually found quite directly. This can be
seen by working through the example.
At MAX = 1, we examine the merger of the first two primitives.
The first definition is to hold if x = (A.(B.(C.NIL))) and the second is to hold if x = A. We can build a chain
of predicates that will distinguish between these two sets: p1 = atom(x), p2 = T. This yields the definition
f1(x) = cond((atom(x), cons(f1(x), f1(x)))
             (T, f1(car(x))))
f1(x) = x when x = A
which violates the above tentative definition of f1. Furthermore, no new chain of predicates can be found to
allow a satisfactory modification of f1. So MAX = 1 fails and MAX is incremented.
At MAX = 2, various mergers can be tried but all fail because essentially three different behaviors are
required on an input of one atom: cons(f1(x), f1(x)), x, and NIL. Limitations on the predicate synthesizer
disallow predicates that can distinguish one atom from another. So no chain of predicates can be found to
separate them and the search must proceed to MAX = 3.
At MAX = 3, both the primitives f3(x) = x and f4(x) = NIL must be distinguished from each other and
from f1 since they all give different outputs on an atomic input. This yields the final program
The complete details of the synthesis algorithm are given in Biermann [1978]. If multiple examples are
given, all are converted to primitive form and the merge process functions in the same way. The algorithm is
guaranteed to find a regular LISP program that will execute the given examples.
Algorithm A1:
Input: A finite set S of input-output pairs (x_i, z_i) for the desired program.
Output: A program g_j from class C with the property that g_j(x_i) = z_i for each (x_i, z_i) in S.
(1) Set j = 1.
(2) While there is (x_i, z_i) in S such that g_j(x_i) is not z_i, increment j.
(3) Return result g_j.
The result obtained if algorithm A1 enumerates class C and has input S will be denoted A1(C,S).
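A1 is simply identification by enumeration. The Common Lisp sketch below is a rough illustration under the assumption that the enumeration of C is given as a finite list of candidate functions; the real algorithm enumerates an infinite class and must also cope with candidates that fail to halt on some x_i, which is ignored here.

(defun a1 (class-enumeration examples)
  ;; CLASS-ENUMERATION: candidate functions g1, g2, ... in enumeration order.
  ;; EXAMPLES: a list of (input . output) pairs, i.e. the set S.
  ;; Returns the first candidate that reproduces every example.
  (dolist (g class-enumeration)
    (when (every (lambda (pair)
                   (equal (funcall g (car pair)) (cdr pair)))
                 examples)
      (return g))))

For instance, with hypothetical candidates, (a1 (list #'identity #'reverse) '(((A B C) . (C B A)))) returns the second candidate, the first one in the enumeration that reproduces the single example.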
A program synthesis algorithm A will be called sound if whenever S represents the behavior of a pro-
gram in C, the program A(C,S) operating on x_i yields z_i for each (x_i, z_i) in S.
Program g_i will be said to cover program g_j if the fact that g_j(x) is defined implies that g_i(x) is defined and
that g_i(x) = g_j(x). Algorithm A will be called complete over class C if for each g in C there is a set S of pairs
(x,z) such that A(C,S) will halt, yielding g_i which covers g.
Algorithm A will be called stable if g = A(C,S) and g(x) = z for all (x,z) in S' implies g = A(C, S ∪
S'). Thus if A chooses a program g on the basis of information S and if additional information S' is compa-
tible with g, A will not make a different choice on the basis of S ∪ S'.
Proof. The soundness and stability properties follow immediately from the construction of A1. The complete-
ness property requires only a simple proof. Let g be an arbitrarily chosen program from C, and let g_j be the
first program in the enumeration of C which covers g. For each i = 1,2,3,...,j-1, choose an (x_i, z_i) such that
g(x_i) = z_i and g_i(x_i) is undefined or g_i(x_i) does not equal z_i. Such an (x_i, z_i) can be found for each i =
1,2,...,j-1 because the absence of such (x_i, z_i) would imply that g_i covers g for some i < j, which contradicts the
definition of g_j. If A1 operates on the finite set S = {(x_i, z_i) | i = 1,2,...,j-1}, it will halt and return g_j as its answer.
This completes the proof.
Suppose A0 is a program synthesis algorithm that is sound, complete, and stable over C. If for every S
there is an associated S' ⊆ S such that A0(C,S') = A1(C,S), and if not all such S with associated S' are such
that A1(C,S') = A1(C,S), then A0 will be said to be more input efficient than A1.
Theorem 2: If a program synthesis algorithm A0 is sound, complete, and stable over C, then A0 is not more
input efficient than A1.
Proof: Assume A0 is more input efficient than A1. Then there must be an S and an S' ⊆ S such that
A1(C,S) = g_j, A0(C,S') = g_j, and A1(C,S') = g_i for i < j. Then there must be a subset S'' ⊆ S' such that
A0(C, S'-S'') = g_i, since A0 is more input efficient than A1, and A0 is complete. But for each (x,z) in S'',
g_i(x) = z by the soundness of A1. So by the stability property of A0, A0(C, (S'-S'') ∪ S'') = g_i. But this con-
tradicts the fact that A0(C,S') = g_j for i < j; so it must not be true that A0 is more input efficient than A1.
This completes the proof.
F. Some Examples
The theorems of the previous section cannot be fully appreciated until a synthesis system is constructed
and tested. The regular LISP synthesizer was programmed and used to construct a wide variety of programs. If
the target program was known to be regular and was no larger than about six transitions in size, the synthesizer
generated that program (or an equivalent) immediately and with complete reliability. The absence of any tree
pruning methods except those proven to remove only nonsolutions makes such reliable behavior possible. The
number of examples required to create the target program was small, usually only one.
Of course, the class of regular LISP programs is very large and the cost of synthesis is exponential in the
size of the target program. Programs of four transitions or fewer required several seconds of CPU time for syn-
thesis. Programs with five transitions required most of a minute, and larger programs required a minute or
more to construct.
As a first illustration, consider the construction of a program to find the third from last atom in a list. For
example, list (A B C D E) yields output C. This implementation required that inputs be given in S-expression
form with all atoms distinct and non-NIL. So the actual input was (A.(B.(C.(D.(E.F))))) yields C. The syn-
thesized program after 1/4 second of CPU time was
f1(x) = cond((atom(x), x)
             (atom(cdddr(x)), f1(car(x)))
             (T, f1(cdr(x))))
This same program would have been generated if any other list of atoms
of length 3 or more had been used.
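For readers who want to run it, the synthesized program transcribes directly into Common Lisp (the name third-from-last is ours):

(defun third-from-last (x)
  (cond ((atom x) x)
        ((atom (cdddr x)) (third-from-last (car x)))
        (t (third-from-last (cdr x)))))

Evaluating (third-from-last '(A B C D E . F)) returns C, as in the example above.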
Suppose the user had submitted the above example with the goal in mind that the target program should
find the third element of the list instead of the third from last element. Then the program could be tested on
(A BC D) with the result that B would be returned. Clearly this is not compatible with the current goal so
both examples could be submitted. The following program was generated from them.
f1(x) = cond((atom(x), x)
             (T, f2(cdr(x))))
f2(x) = f3(cdr(x))
f3(x) = f1(car(x))
The synthesis time was 3 seconds. The same program would have been generated if any two lists of length
greater than 2 and of differing length had been used.
A problem of similar complexity to the example of chapter 1 was given to the system. (The actual prob-
lem of chapter 1 was not solvable since it required predicates not available to the regular LISP synthesizer.) The
target program was to collect the atoms in a list of atoms and lists. The example was (A(B) C (D) (E) F)
yields (A C F) and the program was generated in 39 seconds.
f1(x) = cond((atom(x), x)
             (atom(car(x)), cons(f2(x), f3(x)))
             (T, f1(cdr(x))))
f2(x) = f1(car(x))
f3(x) = f1(cdr(x))
Again almost any single example would generate the same program provided it is long enough to force that pro-
gram through all of its transitions.
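This program, too, runs unchanged in Common Lisp; in the transcription below the names collect-atoms, collect-atoms-2 and collect-atoms-3 (ours) play the roles of f1, f2 and f3.

(defun collect-atoms (x)                      ; f1
  (cond ((atom x) x)
        ((atom (car x)) (cons (collect-atoms-2 x) (collect-atoms-3 x)))
        (t (collect-atoms (cdr x)))))

(defun collect-atoms-2 (x)                    ; f2
  (collect-atoms (car x)))

(defun collect-atoms-3 (x)                    ; f3
  (collect-atoms (cdr x)))

Evaluating (collect-atoms '(A (B) C (D) (E) F)) returns (A C F).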
G. Scanning Programs
While the regular LISP synthesizer was both reliable and robust for a variety of inputs, its shortcoming
was its inability to generate programs larger than about six transitions. It was decided that the individual build-
ing blocks for regular programs, specifically car, cdr, and cons operations, were smaller than necessary. If the
building blocks were complete control structures such as a loop or branch, then a program of size, say six,
would be of very substantial size in terms of a real user’s needs.
Conceptually, scanning programs move sequentially across an input list, once or many times, processing
each element as it is encountered. They are the programs generated by production rules of the type described
in Chapter 1, Section E. The primary problem in the program synthesis of scanning programs is the selection
of which production rules to use and how to set the parameters. Once these decisions have been made, actual
code generation is very fast and efficient.
Synthesis uses a hierarchical technique, first accounting for the lowest level behavior and then moving
higher until all levels are accounted for. In order to illustrate this process a synthesis of the program that con-
verts list (A B C D) to (A B C D B C D C D D) will be given. This construction requires only the two pro-
duction rules described in Chapter 1, Section E and begins with a graphical display of the desired behavior. The
target output is graphed as a sequential selection of input atoms.
The synthesis proceeds by accounting for the various parts of the figure. First it is noted that the output
atoms are simply copied from the input list. The first production rule from Chapter 1 yields this behavior.
This accounts for the lowest level behavior, so the synthesis task is now to find code to call P0 repeatedly with
the correct argument. This is illustrated as follows:
Input
A    P0
B    P0 P0
C    P0 P0 P0
D    P0 P0 P0 P0
Here P0 at level A on the graph means call P0 with X0 = (A B C D). P0 at level B means call P0 with
X0 = (B C D), and so forth.
Sequential calls of P0 at regular intervals can be made by looping code as generated by the second produc-
tion rule from Chapter 1. This rule will generate program P1 (thus i = 1) that will scan to the end of the list
(thus the P1 entry check is atom(X1)), decrementing the input list by 1 each time (thus m = 1). The rule is
P1(X1, XL) =
  cond((atom(X1), next)
       (T, P0(X1, X1, XL)))
Input
A    P1
B    P1
C    P1
D    P1
Thus P1 can account for the descending sequences of P0's, but the descending sequence of P1's must also be
accounted for. That can be done with another application of the looping production rule schema. In this
application, a routine called P2 will be created by setting i = 2, the entry check to atom(X2), n = 1, and the sub-
routine call Pk to P1. The result is the following rule.
P2(X2, XL) =
  cond((atom(X2), next)
       (T, P1(X2, X2, XL)))
When enough reductions have been made to achieve a one point representation, the program is generated by
syntactically expanding the nonterminal for that point, in this case P2. The nonterminal [P2, (X2), NIL] is
expanded. Using the previous rule, we see w and XL must be set to the string of length zero and next = NIL.
[P2, (X2), NIL] ==>
P2(X2) = cond((atom(X2), NIL)
              (T, P1(X2, X2)))
This expansion has yielded another nonterminal that can be expanded. The P1 rule is needed with w = 1:
P0(X0, X1, X2) = cons(car(X0), P1(cdr(X1), X2))
The final program is the union of the code from the above three expansions.
P2(X2) = cond((atom(X2), NIL)
              (T, P1(X2, X2)))
P1(X1, X2) = cond((atom(X1), P2(cdr(X2)))
                  (T, P0(X1, X1, X2)))
P0(X0, X1, X2) = cons(car(X0), P1(cdr(X1), X2))
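Transcribed into Common Lisp (the lower-case names p0, p1 and p2 are ours), the three routines run as follows and reproduce the target behavior.

(defun p2 (x2)
  (cond ((atom x2) nil)
        (t (p1 x2 x2))))

(defun p1 (x1 x2)
  (cond ((atom x1) (p2 (cdr x2)))     ; end of one scan: restart on a shorter list
        (t (p0 x1 x1 x2))))

(defun p0 (x0 x1 x2)
  (cons (car x0) (p1 (cdr x1) x2)))   ; copy one atom and continue the scan

Evaluating (p2 '(A B C D)) returns (A B C D B C D C D D).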
Reviewing the above synthesis procedure, there is a diagnosis stage in which the problem is hierarchically
decomposed and rule schemas are selected and instantiated. Then the actual code is generated by a relatively
mechanical syntactic process. In practice, this has been found to be a very fast synthesis procedure and pro-
grams with nested loops up to depth six have been diagnosed and constructed in less than a second of CPU
time.
It has been shown in Biermann and Smith [1979] that a relatively few rules are necessary (only six) to
generate a wide variety of programs. Other kinds of looping behavior, branching structures, and other lowest
level routines can be represented in production rule schemas. We will examine one more kind of rule schema
here, a rule for generating branching code.
[Pi, (Xi, XL), next] ==>
Pi(Xi, XL) = cond(
  ((Pi condition check), Pk(cdr(Xi), Xi, XL))
  (T, next))
i = rule designation
(Pi condition check) = the condition under which the subroutine is executed
Pk = the routine to be conditionally executed
[Graph of the input list against the output: P0 appears only at the items selected for the output.]
Next, we would like to use a looping schema to make the sequence of calls to P0. However, the calls are
uneven so this tactic will not succeed. Then we can look for something distinguishing the items selected from
those not. The predicate generator can quickly discover the selected atoms are not negative and so the branch-
ing schema can be used with i = 1, (P1 condition check) = not(neg(car(X1))), Pk = P0. This yields the rule
P1(X1, XL) = cond(
  (not(neg(car(X1))), P0(cdr(X1), X1, XL))
  (T, next))
The graphical representation of the problem can now be modified so that a looping schema can complete the
computation.
[Graph of the modified problem: P1 now appears at every item of the input list.]
The looping schema can then be used to create a P2 that will call P1 repeatedly and complete the solution to the
problem. The details are given in the Appendix to this book.
H. Conclusion
This chapter began by noting that dealing with search is one of the most difficult issues in automatic pro-
gramming research. If one attempts to scan a completely general class of programs in the process of program
construction, the size of the search becomes unmanageable and even relatively trivial programs will be out of
reach. Yet the usual alternative to uniform search may be unsatisfactory: one prunes the search tree using
‘*heuristics’’ which will hopefully guide the system to an acceptable solution.
The approach advocated here is that automatic programming research be built around the concept of solv-
able classes of problems. If a given problem is to be addressed and falls into one of the known solvable classes,
the automatic synthesis can be achieved by straightforwardly applying the appropriate method. If the size of the
target program is known, the cost of doing the construction may be known as well. This approach attempts to
make synthesis of moderate sized programs both reliable and efficient.
The regular LISP programs include most of the LISP functions that one might think of which use only the
atom predicate and which do not use additional variables. One could broaden the definition but this would
result in slower synthesis times. Only by keeping the definition narrow has the synthesis performance described
here been possible. One could also seek narrower definitions that would allow larger programs to be generated.
The class of scanning programs grew out of such an effort. The scanning programs are able to do most tasks
that involve repeated scans of an input list selecting items to be appended to an output. Reasonably satisfactory
solutions to the synthesis problem have been found for both classes as has been described here.
I. Acknowledgement
This work was supported by the National Science Foundation under Grants MCS 7904120 and MCS
8113491 and by the Air Force Office of Scientific Research, Air Force Systems Command, USAF, under Grant
81-0221.
References
Biermann [1978]
A.W. Biermann, "The inference of regular LISP programs from examples," IEEE Transactions on Systems,
Man, and Cybernetics, Vol. SMC-8, No. 8 (1978), pp. 585-600.
CHAPTER 18
DESIGN DIRECTED SYNTHESIS
Ted J. Biggerstaff
ITT Programming
Stratford, CT 06497
Abstract
This paper describes the C2 design directed program synthesis system. The C2 synthesis paradigm views
program synthesis as the inverse of program evaluation. It uses an abstract design (representing a whole class
of specific programs) to direct the simulation of a target program evaluation. It makes a record of the simu-
lated behavior of the target program and induces the target program from that record of behavior.
The specification information used to perform this simulation and synthesis consists of four basic items
(specification factors):
IO Examples,
IO Spec (similar to IO assertions),
An abstract design, and
An abstract definition of the implementation data structures (i.e., LISP lists).
The IO Examples and the IO Spec are supplied by the user. The abstract design and abstract data structure
definitions are part of the C2 library. The user need only specify the kind of design and implementation data
structures he desires.
A. Introduction
The C2 paradigm (Biggerstaff [1976a, 1976b, 1977]) synthesizes a program by simulating the symbolic
execution of the program, and recording the behavior of the program as a symbolic execution trace (state tree).
The target program is induced from this record of behavior. For contrasting synthesis paradigms, see Biermann
[1976a] and the other chapters of this volume.
The data required to perform such a simulation and construct the state tree is drawn from the four
specification sources.
• The IO Examples reveal data structure manipulations which must be performed to generate various forms
of the output,
• The IO Spec reveals a class of branching predicate,
• The Abstract Design guides the overall simulation process and provides much of the stereotypical code
common to programs of that design, and
• The Abstract Implementation Data Structure definitions provide a class of branching predicate that is
needed for loop control and special case constructions.
The system codifies generic design knowledge by capturing the essence of an algorithmic method in an abstract
design schemata (i.e. algorithm skeleton). This design schemata can be used to synthesize any number of
specific programs which use that method. For example, a binary search or a quicksort could both be syn-
thesized from the same "divide-and-conquer" design schemata. The author believes that the codification and
reuse of programming knowledge is the key to being able to synthesize large and complex programs. (See also
Gerhart [1976], Barstow [1980], Green [1976], Balzer [1976] and Heidorn [1976].)
The system allows target programs to be specified by several separate specification items (called factors),
each of which focuses upon a separate aspect of the target program. Program specification by factors provides
several desirable consequences. First, it allows the user to focus his attention on one aspect of the program at a
time. This leads to clearer, easier to understand specifications, which, in turn, should reduce specification
errors. Second, there is a direct relationship between specific specification factors and specific classes of con-
structions within the target program. This aids the user by relating specific target program behaviors to specific
specification factors, thus making it easy for the user to implement changes of mind.
The fundamental idea of design directed program synthesis is that a target program may be efficiently syn-
thesized by use of general plans called design schema. These design schema contribute the control structures
which are common to the general algorithmic method of the schema, and the user contributes information
necessary to create the specialized constructs required for the specific target program.
This paradigm provides the user with flexibility in controlling the structure of the resulting target program.
In the example to be analyzed in this paper, an insertion sort program, the user can insist on a highly efficient
algorithm, and get a sort which ‘“‘replacd-s’’ the output list and which contains special case constructions for the
cases when ‘“‘replacd’’ will not work. On the other hand, if the user is not so concerned with efficiency, he will
get an algorithm using ‘‘append’’, which requires no special case constructions. Both alternatives are produced
from the same design schemata. The first alternative is chosen for the example synthesis shown in this paper.
B. Overview
In this section, we present an example which will be used throughout the paper. We present the specification
information required by the C2 system from the user and then overview the role of each specification factor in
the synthesis of the example.
‘These functions are expressed in a block structured pseudo-code that will be used throughout the paper.
Figure 1A
Insertion SORT Synthesized by C2
Figure 1B
Function fp Synthesized by C2
This design leads to several special cases within the function fp. First, if the output list is NIL (i.e.,
empty), then a list of one element is constructed and assigned to OUT. Second, if X should be the first
member of a non-empty output list, then OUT must be reassigned to point to the new, first element, whereas
the insertion of X farther down the output list may be effected without reassigning OUT. The C2 optimizer
merges these two special cases within fp.
Within fp’s loop there are two other special cases. First, X is inserted in the middle of the output list
(between X and Z); and second, X is appended to the end of the output list. These two special cases are also
merged into one within fp.
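Figures 1A and 1B are not reproduced here, but the overall shape of the target can be suggested by a small Common Lisp sketch of the "append"-style alternative mentioned above, which needs no special cases; the names sort-c2 and fp-insert are ours, and the element order is assumed to be alphabetical on symbol names.

(defun fp-insert (x out &key (leq #'string<=))
  ;; Insert the current item X into the already ordered output list OUT.
  (cond ((null out) (list x))                        ; empty output list
        ((funcall leq x (car out)) (cons x out))     ; X belongs at the front
        (t (cons (car out)
                 (fp-insert x (cdr out) :leq leq))))) ; insert farther down

(defun sort-c2 (in &key (leq #'string<=))
  ;; Insertion sort: feed each element of IN to FP-INSERT in turn.
  (let ((out nil))
    (dolist (x in out)
      (setf out (fp-insert x out :leq leq)))))

Evaluating (sort-c2 '(C A B)) returns (A B C).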
Many aspects of the target program structure are under user control. In this case, the user has chosen
options which have had the following effects:
• IO Examples²
• IO Spec³
    p(IN,OUT)
    {permute(IN,OUT) ∧
     {null(OUT) ∨
      if not null(cdr(OUT))
      then {(car(OUT) < cadr(OUT)) ∧
            p(diff(IN,car(OUT)),cdr(OUT))};
      else TRUE;}}
• Abstract Design⁴
    gfp
• Implementation Data Structure Specification⁵
    OUT = sort(IN)
Table 1
Inputs From User Specifying Sort
In the following sections, we will examine these specification factors in more detail and summarize how
they are used in the synthesis process.
² A, B, etc. are abstract representations of atoms and do not represent the atoms A, B, etc.
³ "permute" is a predicate which is true when M is a permutation of L, and diff(IN,X) creates a copy of the list IN with the element X re-
moved. If X is not in IN, diff(IN,X) just copies IN.
⁴ Name of the design routine which will direct the development of the target code.
⁵ Both input and output will be LISP style lists.
a. IO Examples
The examples in Table 1 show the input list structure on the left hand side of the arrow and the alterna-
tive output list structures which are possible on the right hand side of the arrow. For example, given the input
list (A B) of length 2, there are two possible forms for the output: the list (A B) or the list (B A). The IO
Examples are, by their nature, strictly limited to information about the structural transformations performed by
the target function. They provide no information about what conditions are required for a particular transfor-
mation to occur.
How are the IO Examples used in the synthesis paradigm? The IO examples reveal the form of the out-
put list for certain states of the computation. By the nature of the abstract design chosen, C2 knows that suc-
cessive states of the output list for an arbitrarily long input list (A B C ...) will correspond to a sequence of sort
outputs produced by the input lists NIL, (A), (A B), (A B C), etc. Thus, the sequence [NIL, (A), (A B), (C
A B)] is one possible sequence of output states for the first three iterations of the ‘‘sort’’ function. By analyz-
ing the difference between states, C2 can determine specific instances of the operations that sort must perform
upon the output list. That is, C2 can easily determine that sort must create a list consisting of A, append B to
that list, push C onto that list, and so forth.
A number of program synthesis systems based on specification by examples have been developed.
Representatives are Biermann [1976b], Hardy [1975] and Summers [1976].
b. The IO Spec
The example IO Spec in Table 1 defines the ‘‘predicate function’’ p, with the parameters IN and OUT, to
be the conjunction of two predicate expressions °. The first predicate expression, permute(IN,OUT), requires
that IN be a permutation of OUT. The second requires, in effect, that for all adjacent items in OUT, the first
item is less than or equal to the second item.
How is the IO Spec used in the synthesis paradigm? The IO Spec defines the conditions under which
specific IO Example output alternatives would occur. For example, under what conditions would sort('(A B
C)) = '(B C A)? The answer is whenever the expression p('(A B C), '(B C A)) is true; that is, whenever B is
less than or equal to C, "(B < C)", and C is less than or equal to A, "(C < A)".
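To make the role of the IO Spec concrete, here is a direct Common Lisp rendering of p (our transcription: spec-p stands for p, permute-p and list-diff stand in for the paper's permute and diff, and the order is again taken to be alphabetical on symbol names).

(defun permute-p (l m)
  ;; True when M is a permutation of L.
  (and (= (length l) (length m))
       (every (lambda (x) (= (count x l) (count x m))) l)))

(defun list-diff (in x)
  ;; Copy of IN with one occurrence of X removed; IN is copied unchanged if X is absent.
  (cond ((null in) nil)
        ((eql (car in) x) (cdr in))
        (t (cons (car in) (list-diff (cdr in) x)))))

(defun spec-p (in out &key (leq #'string<=))
  ;; The IO Spec of Table 1: OUT is a permutation of IN and is ordered by LEQ.
  (and (permute-p in out)
       (or (null out)
           (if (not (null (cdr out)))
               (and (funcall leq (car out) (cadr out))
                    (spec-p (list-diff in (car out)) (cdr out) :leq leq))
               t))))

With these definitions (spec-p '(A B C) '(A B C)) is true, while (spec-p '(A B C) '(B C A)) is false, exactly because the ordering condition above fails.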
The abstract design will formulate constructs of the form
Abstract execution of the predicate expression involving p will produce a simpler (symbolic) predicate expres-
sion, such as "(A < car(<SpecificListValue>))". A subprocess of inductive generalization, called "variabiliza-
tion", will map this expression into the predicate expression "(X < car(Z))" seen in Figure 1B.
The IO Spec mechanism is similar to the specification methods used in deduction based synthesis systems
(Manna [1975, 1976]).
® The predicate function ‘‘p’’ in Table 1 is the analog of the ‘‘output assertion’ in more formal approaches. Since the C2 analog of the
‘input assertion”’ is not required in this discussion, its introduction will be deferred until later.
c. Abstract Design
The user provides to C2 the name of the desired abstract design. This choice will determine whether the
target program produced is an insertion sort, a selection sort, or a quicksort, for example. By choosing the
"gfp" design, the user has determined that the target routine will be an insertion sort.
A specific design choice will cause a specific design synthesis routine and specific design schema to be
chosen from the library. The design synthesis routine implements the specific synthesis procedure and it uses
the associated schema. The schema will be the skeleton(s) determining the general structure of the target rou-
tine.
The gfp schemata can be described informally as:
In C2, arbitrarily long lists are represented in two levels of abstraction, the ellipsis and the abstract list
constant. They differ only in their level of concreteness and therefore, in their demands upon the deductive
mechanisms underlying the evaluator. We will restrict this discussion to the ellipsis, since both are evaluated in
much the same way. The rules for evaluating expressions constructed by C2 are given in Table 2. When such
expressions include abstract data, such as ellipses, the results of evaluation may contain partially reduced sym-
bolic expressions (i.e., code structures). These code structures are used as target program building blocks.
if null(L)
then NIL;
else (car(L) . cdr(L));

if null((...))
then NIL;
else ...;
where the ". . ." in the "else" clause indicates an open-ended tree branch.
Table 2
Evaluation Rules
• If-then-else
For an expression "ex" of the form
if Q
then exp1;
else exp2;
eval(ex) returns
if eval(Q)
then eval(exp1) assuming Q;
else eval(exp2) assuming not Q;
• Distribution of Functions
For "ex" an expression of the form
if ex1
then exp1;
else exp2;
eval(f(ex)) returns
eval(if ex1
     then f(exp1);
     else f(exp2);)
• Expression Simplification
Expressions are simplified whenever possible. For example, "(TRUE ∧ exp)" simplifies to "exp".
• Function Invocation
For f(X,Y,Z) a function defined by the expression "exp",
eval(f(A,B,C))
becomes
eval(σ(exp; (X/A, Y/B, Z/C)))
where the function σ substitutes eval(A) for every occurrence of X in "exp", eval(B) for every oc-
currence of Y in "exp", etc. σ generalizes in the obvious way for functions with a differing number
of parameters.
Table 2 (continued)
Evaluation Rules
(... A B C)
if null((... A B C))
then NIL;
else cons(A,
     if null((... B C))
     then NIL;
     else cons(B,
          if null((... C))
          then NIL;
          else ...
This is, in effect, the abstract execution tree of COPY(’(... A B C)). This extends the concept of a data struc-
ture to include more than just passive data. Arbitrarily long lists now appear more like the execution tree of a
single loop. That is, they are indefinitely long and they contain branching predicates. These structures will be
woven with the design to form special case branches within the fabric of the target program.
The abstract data structures interact (via abstract evaluation) with the abstract schemata producing new
(conditional) branches within the abstract evaluation trace. The predicate expressions, which are to appear in
these new conditional branches, migrate out of the abstract data structure definitions, such as the definition of
the indefinite LISP list (...A BC). Thus, predicates such as null(M), where M is an abstract list constant, ori-
ginate in the abstract data structure definitions. They migrate into the record of simulated behavior and are
finally generalized into predicate expressions such as null(Z), where ‘‘null(Z)’’ checks for the ‘‘special case’’ in
which the first element is to be put onto an empty output list.
1. The IO examples (see Table 1) which supply information about the values of the output list in various
states, and
2. The built-in design schema which supplies the basic control structure framework and code for managing a
number of target program variables (e.g., the input list or the current element variable).
The initial state tree for sort is shown in Figure 2. The following comments explain Figure 2.
• The notation for specifying computational states is explained in Figure 3. We will arbitrarily choose OUT
as the name of the output list.
• Some states have been omitted from the diagram because, at this point of the development, they only
serve to complicate the tree. We will add them as the discussion requires.
• Similarly, a number of value-variable pairs have been omitted from the states at this stage to simplify the
state tree.
• Stereotypical code, which is added by the abstract design, is omitted in order to shorten and simplify the
example. Typical omissions are the code which manages the ‘‘current item variable’’, X, and the ‘‘input
list variable’’, L. The design procedure supplies code of the form
X = car(L);
L = cdr(L);
Though the code has been omitted, its effects on the values shown in the states are observable. For
example, notice the changes in the values of X and L, from state SO to state S3.
• (... A B C) is called an ellipsis. It is shorthand notation for a list which may be NIL or may contain the
element A cons-ed onto the ellipsis (... B C). Eval((... A B C)) returns the expression
if null((... A B C))
then NIL;
else (A . (... B C));
See the value of L in state S1 and S3 relative to SO, as an example of ellipsis behavior.
Each of the labelled blocks of code in the figure is created from a subset of the IO examples for sort
(see Table 1):
Block 1 from NIL -> [NIL],
Block 2 from (A) -> [(A)],
Block 3 from (A B) -> [(A B) (B A)],
Block 4 from (A B C) -> [... (C B A) (B C A) (B A C)],
Block 5 from (A B C) -> [(C A B) (A C B) (A B C) ...]
Figure 2
Formulation of State Tree Using IO Examples
The predicate expressions involving the IO Spec are rationalized as follows. If the current state of the
output list (and possible final answer of the function) is (B A), then by the definition of the IO Spec, we
know that p((A B),(B A)) must be true. Here, (A B) is the input example associated with output (B A)
in the IO Examples. Thus, the statement
would be valid code for computing the output value for this specific case.
si:[VAL1:VBL1, VAL2:VBL2, ...]
where
- "si" represents the state identifier
- "VALi" represents a value for a variable in state "si"
- "VBLi" represents the name of the variable with value "VALi"
Figure 3
State Specification Format
The succeeding sections explain how the gfp design routine processes the state tree to develop the target
routine code. The steps of the gfp design procedure are summarized below:
• It will restructure and generalize this state tree into two trees. The first will evolve into a func-
tion which gets the elements from the input list and calls a second function. The second func-
tion merges the element into the output list.
• It will add code required by the chosen design, e.g. the code that cdr's down the input and
extracts the next element to be added to the output list.
• It will reduce predicates, e.g. "p((A B C),(C B A))", to their minimal form, e.g. "(C < B)".
• It will map specific symbolic values, e.g. "(B C A)", into expressions (involving variables)
which would have those specific values at given states of the function's execution. This is
called variabilization.
• And finally, it will fold the tree into the control structure of the target routine.
fp(X,PREFIXL,OUT)
where X is the ‘‘current item’’ being processed by the target function, PREFIXL is the prefix of the original
input list containing all items which have been processed so far by the target function, and OUT is the output
list being constructed by the target function. PREFIXL is a ‘‘virtual variable’’ introduced by the design pro-
cedure which will be ‘“‘optimized out”’ of the final code.
Assuming the existence of fp, states S6 and S15 can now be represented as a single state
S6 V S15:[fp(B,(A),(A)):OUT].
Similarly, all states developed for input lists of length three can be represented by the same symbolic value,
[fp(C,(A B),fp(B,(A),(A))):OUT].
Given this, the state tree shown in Figure 2 is reformulated into two state trees; one for the outer loop (in
main sort function) and one for the inner loop (the fp function). These are shown in Figures 4 and 6, respec-
tively.
{s0: [NIL:OUT];
 if null((... A B C))
 then s1:[NIL:OUT];
 else {s3:[fp(A,NIL,NIL):OUT];
       if null((... B C))
       then s4:[fp(A,NIL,NIL):OUT];
       else {s6 ∨ s15:[fp(B,(A),(A)):OUT];
             if null((... C))
             then s7 ∨ s16:[fp(B,(A),(A)):OUT];
             else
             {if null((...))
              then s9 ∨ s11 ∨ s13 ∨ s18 ∨ s20 ∨ s22:
                   [fp(C,(A B),fp(B,(A),(A))):OUT];
              else ... }}}}
Figure 4
Reformulated State Tree Assuming
Existence of fp Function
assert p(PREF,OUT);
fp(X,PREF,OUT)
{prog Y Z;
 Y = NIL;
 Z = OUT;
 <Develop pattern M' to account for IO examples that map
  append(PREF,cons(X,NIL)) into M'>
 if <there is no such M'>
 then <cannot synthesize>;
 else {while not null(Z)
        {if not p(append(PREF,cons(X,NIL)),M')
         then {Y = Z; Z = cdr(Z);}
         else {OUT = M'; return(OUT);}}
       OUT = M';
       return(OUT);}}
Figure 5
Find-Put Design Schemata
Since the synthesis of the main sort loop illustrates only a few aspects of the synthesis procedure, we will
not describe the evolution of the state tree of Figure 4 into the target code. Suffice it to say that the state tree
of Figure 4 evolves into the target program shown in Figure 1A.
The synthesis of the fp function, however, illustrates many important aspects of the C2 synthesis pro-
cedure, and the remainder of this section will follow that synthesis in some detail.
0  assert p(PL,M)
1  {t0:[M:OUT, undef:Y, undef:Z, W:X, PL:PREF];
2   Y = NIL;
3   Z = M;
4   t1:[M:OUT, NIL:Y, M:Z, W:X, PL:PREF];
5   if null(M)
6   then {t2:[NIL:OUT, NIL:Y, NIL:Z, W:X, NIL:PREF];
7         OUT = (W);
8         return(OUT);}
9   else {t3:[(car(M) . cdr(M)):OUT, NIL:Y, (car(M) . cdr(M)):Z, W:X, PL:PREF];
10        if p(append(PL,cons(W,NIL)),(W car(M) . cdr(M)))
11        then {OUT = (W car(M) . cdr(M));
12              return(OUT);}
13        else {Y = Z;
14              Z = cdr(Z);
15              t5:[(car(M) . cdr(M)):OUT, (car(M) . cdr(M)):Y, cdr(M):Z, W:X, PL:PREF];
16              if null(cdr(M))
17              then {t6:[(car(M)):OUT, (car(M)):Y, NIL:Z, W:X, PL:PREF];
18                    OUT = (car(M) W);
19                    return(OUT);}
20              else {t7:[(car(M) cadr(M) . cddr(M)):OUT,
                          (car(M) cadr(M) . cddr(M)):Y,
                          (cadr(M) . cddr(M)):Z, W:X, PL:PREF];
21                    if p(append(PL,cons(W,NIL)),(car(M) W cadr(M) . cddr(M)))
22                    then OUT = (car(M) W cadr(M) . cddr(M));
23                    else ... }}}
Figure 6
State Tree Formulated
for Function fp(W,PL,M)
so that the (i+1)th state of the output list is expressed in terms of the ith state of the output list. In effect,
this parameterizes the IO Examples, making PL an arbitrary input list prefix, W an arbitrary element to be
inserted, and M an arbitrary state of the output list.
The IO example re-expression is accomplished as follows. Suppose that we are re-expressing the ith IO
example set. We will re-express each input list prefix as the symbolic list PL. The re-expression of the associ-
ated output possibilities requires a pattern matching process. This pattern matching will treat M and W like
variables for the moment, binding M to successive output possibilities of the previous IO example set, i.e. the
(i-1)th IO example set (and to NULL when (i-1) = 0); and binding W to the last element of the input list of
the ith IO example set. Thus, for the input list (A B C), M will be successively bound to (A B) and then (B
A), while W is bound to C. With these bindings, the output possibilities of the ith IO example set are rewrit-
ten as expressions of M and W, which use every element of M and W, and which do not permute the elements
of M. For example, for M bound to (A B) and W bound to C:
- (C A B) is re-expressed as (W . M),
- (C B A) does not match this binding of M,
- (A C B) is re-expressed as (car(M) W cadr(M)),
- (BC A) does not match,
- (A BC) is re-expressed as (car(M) cadr(M) W),
- (B A C) does not match.
The resulting re-expressed IO example set is:
Binding M to (B A) produces only duplicates of the output possibilities already discovered. Thus, the above set
is the complete set of re-expressed output possibilities for input lists of length 3.
We now have a set of IO Examples in which the number of output possibilities grows linearly as the input
list length increases. This means that we have parameterized out the effect of the outer loop, and the Find-Put
function, fp, will reflect only the process of inserting an element into the output list.
In summary, the IO examples now take the form
(1) PL ---> [NIL],
(2) PL ---> [(W)],
(3) PL ---> [(W . M), (car(M) W)],
(4) PL ---> [(W . M), (car(M) W cadr(M)),
             (car(M) cadr(M) W)]
These forms of the output will be used for the M' values in the design schemata of Figure 5.
Figure 5 is a simplified version of the Find-Put design schemata which will drive the simulation. The
complete Find-Put design schemata includes code for managing a number of variables which are not pertinent
to the sort example, e.g., "first element of input list", "length of the output list", "position of current element
in input list", and others. Such unused variables and code are removed from the target routine during optimi-
zation. It simplifies the example to simply ignore them from the outset.
The simulation process will unfold the 'while' loop in the design schemata, substituting values for PREF,
X and M' from the IO Examples above and executing the intervening code to derive values in a given state
from those of the preceding state. For example, in state t1 of Figure 6, the input prefix variable, PREF; the
output variable, OUT; and the "after" pointer, Z, are all bound to symbolic lists. Executing the "while"
predicate "not null(Z)", shown in Figure 5, results in the value "not null(M)", and this more specifically
determines the values of PREF, OUT and Z in the subsequent states. On the "null(M)" branch (state t2) they
are all NIL. On the "not null(M)" branch (state t3) it is known that the list pointed to by both OUT and Z
contains at least one element, car(M), and may possibly contain other elements following the "car(M)" ele-
ment. This is represented as
(car(M) . cdr(M)).
All values for OUT are derived from the restructured IO examples given above. In line 7 of Figure 6, the
form of the output comes from example 2; in lines 10 and 11, from example 4; in line 18 from example 3; and
so on. In a later step (see subsection c below), the design procedure will induce the code which would compute
values with these forms.
b. Predicate Simplification
The next step of the simulation is the creation of those branching predicates which are expressed in the
abstract design as an expression of the IO Spec, p. This step simplifies the predicate expression given in the
abstract design to one which is the minimum expression required at that specific point in the target routine.
The predicate simplification process uses a form of symbolic execution (Hantler [1976]) called abstract
evaluation (or AE). AE simplifies a predicate expression such as
(W < car(M))
The following discussion steps through this evaluation using the evaluation rules shown in Table 2.
The predicate synthesis process is based on the following axiom expressed in Hoare’s notation:
Every expression in this axiom is defined except the predicate expression q. That is, a definition for p is sup-
plied by the user and the specific forms of g (e.g. cons(W,M)) can be derived by differencing the IO Examples
(see the following section for details). Given this, we can use abstract evaluation to ‘‘solve for q.”’
The process of ‘‘solving for q’’ consists of two steps:
The truth of the expression involving the predicate ‘‘permute’’ can be deduced as TRUE from permute(PL,M).
Next, the evaluator determines that null((W car(M) . cdr(M))) is FALSE. After that, the evaluator begins
evaluation of the "if" predicate expression. The cdr of the list (W car(M) . cdr(M)) is the list
(car(M) . cdr(M)), which is clearly not null. Next the evaluation begins on the "then" clause. The expression
(W <car(M)) cannot be further reduced and the value of this expression is just the expression itself.
p(PL,(car(M).cdr(M))) is TRUE because the list (car(M) . cdr(M)) is equivalent to M, and p(PL,M) was
asserted true at the outset. Thus, the total expression reduces to
{TRUE ∧
 {FALSE ∨
  if TRUE
  then {(W < car(M)) ∧ TRUE};
  else TRUE;}}
which reduces to
(W < car(M)).
Other expressions involving p reduce in a similar fashion, producing the simplified predicates shown in
Figure 7.
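The kind of reduction abstract evaluation performs can be pictured with the following sketch, given in Python rather than the chapter's notation and not taken from the C2 implementation: known truth values are propagated through the connectives, and any sub-expression that cannot be decided is returned symbolically.

def simplify(expr):
    """expr is True, False, a string (a symbolic predicate), or a tuple
    ('and', a, b), ('or', a, b), ('if', c, t, e)."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(a) for a in args]
    if op == 'and':
        a, b = args
        if a is False or b is False:
            return False
        if a is True:
            return b
        if b is True:
            return a
        return ('and', a, b)
    if op == 'or':
        a, b = args
        if a is True or b is True:
            return True
        if a is False:
            return b
        if b is False:
            return a
        return ('or', a, b)
    if op == 'if':
        c, t, e = args
        if c is True:
            return t
        if c is False:
            return e
        return ('if', c, t, e)
    return expr

# The reduction shown above:
#   {TRUE and {FALSE or (if TRUE then {(W < car(M)) and TRUE} else TRUE)}}
expr = ('and', True, ('or', False, ('if', True, ('and', 'W < car(M)', True), True)))
print(simplify(expr))   # -> 'W < car(M)'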
 0 fp(X,OUT)
 1   Y = NIL;
 2   Z = M;
 3   if null(M)
 4   then {OUT = (W);                        /*Create new list*/
 5         return(OUT);}
 6   else {if (W < car(M))
 7         then {OUT = (W car(M) . cdr(M));  /*Put on front*/
 8               return(OUT);}
 9         else {Y = Z;
10               Z = cdr(Z);
11               if null(cdr(M))
12               then {OUT = (car(M) W);     /*Put on Tail*/
13                     return(OUT);}
14               else {if (W < cadr(M))
Figure 7
State Tree After Predicate Simplification
• Differencing two successive output states to produce the instantiated code that would have pro-
  duced the second state given the first, e.g. the difference of the lists ‘‘(car(M))’’ and
  ‘‘(car(M) W)’’ is ‘‘append((car(M)),cons(W,NIL))’’ (a sketch of this step follows the list).
• Mapping data constants to the variables which would have had those values in the given state,
  e.g. mapping the data constants in ‘‘append((car(M)),cons(W,NIL))’’ will transform
  the difference expression to either ‘‘append(Y,cons(X,NIL))’’ or ‘‘append(Y,cons(X,Z))’’.
  The determination of the correct interpretation is made by comparing the interpretation sets
  from several equivalent states or, failing that, by heuristics.
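As referenced in the first item, the differencing step can be sketched as follows; the helper and its output format are illustrative Python stand-ins, not the synthesizer's actual representation.

def difference(before, after):
    """Return a candidate list-building expression (as a string) turning `before` into `after`."""
    if after[:len(before)] == before:          # new material added at the tail
        tail = after[len(before):]
        return f"append({before}, {tail})"
    if after[1:] == before:                    # one element added at the front
        return f"cons({after[0]}, {before})"
    return None                                 # no single-step interpretation

# e.g. the difference of (car(M)) and (car(M) W) discussed in the text:
print(difference(['car(M)'], ['car(M)', 'W']))   # append(['car(M)'], ['W'])
print(difference(['car(M)'], ['W', 'car(M)']))   # cons(W, ['car(M)'])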
Returning to the problem of folding the pair of lines 4 and 5 into the pair of lines 7 and 8, we find that
differencing the output value associated with line 4 and that associated with line 3 produces the following
interpretations for the output list value ‘‘(W)’’,
{cons(W,NIL) }.
Differencing the output value associated with line 7, ‘‘(W car(M) . cdr(M))’’, produces the interpretation
{cons(W,(car(M) . cdr(M)))}
which, after mapping constants to variables, yields the interpretation set
{cons(X,Z), cons(X,OUT)}
in order of preference. Preference ordering is based on the Find-Put design objective which seeks to interpret
modifications of the output list in terms of the ‘‘current element’’, X; the ‘‘before pointer’, Y; and the ‘‘after
pointer’, Z. Thus, the common, preferred interpretation for the two list instances in lines 4 and 7 is
‘‘cons(X,Z)’’. This interpretation allows lines 3 through 8 to be generalized into
if null(Z) ∨ (X < car(Z))
then {OUT = cons(X,Z);
      return(OUT);}
else ...
Note that this choice for an interpretation of the output list value forces the predicate expression instance
‘*(W <car(M))’’ to be variablized to ‘‘(X <car(Z))”’.
Similarly, lines 9 through 13 combine with lines 17 through 21. The expressions ‘‘(car(M) W)’’ in line
12, ‘“‘(car(M) W cadr(M) . cddr(M))”’ in line 15 and ‘‘(car(M) cadr(M) W)”’ in line 20 all have the common
interpretation of ‘‘append(OUT,cons(X,Z))’’, and within the FIND-PUT design context, the synthesizer is
allowed to optimize ‘‘OUT = append(OUT,cons(X,Z))’’ to ‘‘replacd(Y,cons(X,Z))’’. The results of these
combinations and generalizations are shown in Figure 1B.
Notice that if the design criteria were changed to the minimization of the amount of source code rather
than the minimization of computation time (i.e., prefer the use of ‘‘append’’ rather than ‘‘replacd’’), then
‘‘append(OUT,cons(X,Z))’’ would represent an acceptable interpretation for all four cases—inserting into an
empty list, on the front, on the back and in the middle. The resultant form of fp would be a form structurally
the same as the original design schemata of Figure 5.
It should be clear that small changes in the design criteria applied during the induction and variablization
processes can profoundly affect the resulting target program structure. It is the author’s belief that this syn-
thesis paradigm is one of very few in which the user can make design choices which have a significant effect
upon the resulting target routine structure.
D. A Second Example
Consider a different problem. Given a list of integers as input, copy the list with all of the non-positive
integers removed. For example, given the list (-7 2 9 -3) as input, the function ‘‘pcopy’’ should produce
the list (2 9). This problem will introduce several new ideas:
• It will illustrate the role of the input predicate in error processing synthesis. An input predi-
cate describes what is known about the input data before the computation starts. As it turns
out, this is not really necessary for this example unless the user wants the target program to do
error processing (another design option).
• It will reveal that the IO Spec information describing the structural relationships between the
  input and output lists is redundant. The job of describing structural relationships is handled
by the IO Examples. For example, leaving the clause ‘‘permute (IN,OUT)”’ out of the IO spec
definition would make no difference in the sort example discussed earlier.
• It will illustrate another form of optimization allowed by the C2 paradigm.
In this problem, the method requires only three examples in order to synthesize the target program. The pat-
tern that these examples establish is that for each new element of the input list, there are only two alternatives.
It is either included in or omitted from the output.
The Input predicate, ‘‘i’’, and the IO Spec, ‘‘p’’, are specified as:
i(IN)
  {if null(IN)
   then TRUE;
   else {integer(car(IN)) ∧ i(cdr(IN))};}
p(IN,OUT)
  {if null(IN)
   then null(OUT);
The Input predicate asserts that each element of the input list is an integer. Notice that the IO Spec relies
on this fact indirectly, only because it uses the arithmetic operator ‘‘>’’.
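The following Python sketch is offered only as a reading aid and is not the chapter's notation. The definition of i follows the text above; the else branch of p is an assumption, reconstructed from the abstract evaluation of p((A B),(B)) given later in this section: the output must be a subset of the input, each input element is in the output exactly when it is positive, and the predicate recurs on the tails.

def i(IN):
    """Input predicate: every element of the input list is an integer."""
    if not IN:
        return True
    return isinstance(IN[0], int) and i(IN[1:])

def p(IN, OUT):
    """IO Spec (reconstructed): OUT keeps exactly the positive elements of IN."""
    if not IN:
        return OUT == []
    subset = all(x in IN for x in OUT)                       # OUT "subset of" IN
    keeps_positive = (IN[0] in OUT) == (IN[0] > 0)           # membership iff positive
    rest_out = OUT[1:] if (OUT and OUT[0] == IN[0]) else OUT
    return subset and keeps_positive and p(IN[1:], rest_out)

# e.g. p([-7, 2, 9, -3], [2, 9]) -> True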
⊆ is a ‘‘structural function’’ which operates upon abstract objects and lists of abstract objects. ⊆ asks the question, ‘‘is the abstract list
OUT a subset of the abstract list IN.’’ Structural functions ask questions about the abstract objects themselves and not about the values
they represent. Hence, structural functions are total functions which always return either TRUE or FALSE. By contrast, the LISP function
‘‘subset’’ might return as its value, an expression containing ‘‘eq(A,B)”’ meaning that the value of subset depends upon the possible equal-
ity of the value represented by A and the value represented by B.
∈ is a structural function which returns TRUE if the ‘‘abstract data item,’’ car(IN), is in the ‘‘list of abstract objects,’’ OUT, and returns
FALSE otherwise.
≡: ‘‘If and only if.’’
What is the role of the input predicate? In general, the input predicate will influence the abstract evalua-
tion of the IO Spec. However, because the IO Spec is written such that it makes no use of the information
from the input predicate, its only role in this example is in the synthesis of the error processing code. The
error processing capability can be (optionally) provided through the inclusion of the following code sequence in
the loop of the design schemata:
if not i(PREF)
then <Generate error message and
return user specified error code >;
else <Do main line processing >
where PREF is the prefix of input elements processed up to this point. The predicate expression i(PREF)
will evaluate to the series of branching predicates ‘‘integer(A)’’, ‘‘integer(B)’’, etc. within the state tree and
these will generalize to ‘‘integer(X)’’. Since it should be quite clear how such error processing would integrate
into the state tree and since these branches would tend to make the example unnecessarily complex, we will
omit the error processing from the remainder of the example.
Now consider the IO Spec. It levies two basic constraints upon the input and output lists—that the ele-
ments of the output list be a subset of the elements of the input list and that only positive elements are
members of the output. Odd as it may seem, the subset requirement is redundant and will contribute nothing
to the predicate expression. That information is already implicitly transmitted by the IO Examples.
2. Synthesizing ‘‘pcopy”’
The state tree for pcopy is shown in Figure 8. This is developed by using the gfp design routine. The
synthesis of the branching predicates is the most interesting part of this example. The remaining operations are
largely straightforward and are basically variations of those of the sort example.
 1 {s0:[NIL:OUT,(... A B):IN,NIL:X],
 2  if null((... A B))
 3  then s1:[NIL:OUT,NIL:IN,NIL:X];
 4  else {s2:[NIL:OUT,(... B):IN,A:X];
 5   if p((A),(A))
 6   then {s4:[(A):OUT,(... B):IN,A:X];
 7    if null((... B))
 8    then s5:[(A):OUT,NIL:IN,A:X];
 9    else {s6:[(A):OUT,(...):IN,B:X];
10     if p((A B),(A B))
11     then {s7:[(A B):OUT,(...):IN,B:X];
12      ...}
13     else {s8:[(A):OUT,(...):IN,B:X];
14      ...}
       }
      }
15   else {s9:[NIL:OUT,(...):IN,B:X];
16    if p((A B),(B))
17    then {s10:[(B):OUT,(...):IN,B:X];
18     ...}
19    else {s11:[NIL:OUT,(...):IN,B:X];
20     ...}
      }
     }
    }
Figure 8
State Tree for Example Two
Figure 9
Copy Positive Integers Function
Synthesized by C2
If the reader takes on faith that the expression p((A),nil) evaluates to ‘‘(A ≤ 0)’’, then we can follow
the more interesting evaluation of p((A B),(B)). Substituting values into the definition of p results in
Both null((A B)) and null((B)) evaluate to FALSE, and ((B) ⊆ (A B)) evaluates to TRUE. (A ∈ (B)) evalu-
ates to FALSE. (A > 0) evaluates to FALSE because p((A B),(B)) is on a branch on which p((A),NIL) is
true, i.e. (A ≤ 0) is TRUE. The expression becomes
{if FALSE
 then FALSE;
 else {TRUE ∧ {FALSE ≡ FALSE} ∧ p((B),(B))}}
The whole expression reduces to p((B),(B)). Substituting in the definition of p again produces
{if null((B))
 then null((B));
 else {((B) ⊆ (B)) ∧
       {(B ∈ (B)) ≡ (B > 0)} ∧ p(NIL,NIL)}}
which evaluates to
{if FALSE
 then FALSE;
 else {TRUE ∧ {TRUE ≡ (B > 0)} ∧ TRUE}}
which reduces to (B > 0). This will be the predicate in the branch on line 16 of Figure 8.
Induction and variablization operate much like they do in the sort example, with OUT being computed by
the statement
OUT = append(OUT,cons(X,NIL));
which the optimizer transforms into
OUT = cons(X,OUT);
inside the loop, together with a final
return(reverse(OUT));
The optimizer capitalizes upon the design knowledge codified in the C2 synthesis system. For example, the
above transformation is designed very specifically for output lists built by the Find-Put design schemata.
The results of the synthesis are shown in Figure 9.
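The two forms of the output computation can be compared in the following Python sketch, which stands in for the LISP forms above and is not the routine of Figure 9 itself: the first loop builds OUT with the append-style statement, while the optimized form conses onto the front and reverses once at the end.

def pcopy_append(lst):
    out = []
    for x in lst:
        if x > 0:
            out = out + [x]          # OUT = append(OUT, cons(X, NIL))
    return out

def pcopy_cons_reverse(lst):
    out = []
    for x in lst:
        if x > 0:
            out = [x] + out          # OUT = cons(X, OUT)
    return list(reversed(out))       # return(reverse(OUT))

assert pcopy_append([-7, 2, 9, -3]) == pcopy_cons_reverse([-7, 2, 9, -3]) == [2, 9]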
E. Conclusions
There are several conclusions which can be drawn from this research:
• Generic design knowledge can be codified in an easily reusable form by focusing on a somewhat narrow
  problem domain (e.g. list combining and restructuring). Importantly, this codification does not compromise
  the ability to synthesize a wide variety of programs. For example, the ‘‘gfp’’ design can synthesize all
set operations, most list searches, most list restructurings such as ‘‘flatten’’, and most operations which
remove items from the list.
• Disparate kinds of specification information can be mixed together and used for the synthesis. An added
advantage of this is that redundancies within the specification information (e.g. between the IO Examples
and the IO Spec), may be exploited to verify the consistency of the specification information.
• The user can influence the target routine structure through the simple use of design options. In many
synthesis paradigms, such influence on the target routine could only be accomplished through a large
amount of error prone work, such as rewriting a set of axioms.
• In some sense, a specification factored into pieces is easier to develop and understand than some other
kinds of specifications. For example, the IO Examples are a direct and clear expression of the structural
manipulations of a target function.
In summary, the design directed paradigm represents one method by which large libraries of generic software
designs are possible. The author believes that only through capturing generic software design on a large scale
will it be possible to produce complex real-world target systems in a largely automated manner.
F. Acknowledgements
I would like to acknowledge Professor David Johnson and Dr. Chris Jette for their contributions to this
research. Also, I would like to thank the editor for suggestions which significantly changed and improved this
paper.
References
Balzer [1977]
R. Balzer, N. Goldman, and D. Wile, ‘‘Imprecise program specification,’’ ISI/RR-—77—59 (April
1977).
Barstow [1980]
D.R. Barstow, ‘‘The role of knowledge and deduction in algorithm creation,’ International
Workshop on Program Construction, Castera-Verduzan, France (Sept. 1980).
Biermann [1976A]
A.W. Biermann, ‘‘Approaches to automatic programming,’ in Advances in Computers, Vol. 15,
Academic Press, New York (1976).
Biermann [1976B]
A.W. Biermann and R. Krishnaswamy, ‘‘Constructing programs from example computations,’’ IEEE
Trans. on Software Engineering 2,3(1976).
Biggerstaff [1976B]
T.J. Biggerstaff and D.L. Johnson, ‘“‘The C2 super-compiler model of automatic programming,”
First CSCI/SCEIO Natl. Conf., Univ. of British Columbia, Vancouver (Aug. 1976).
Biggerstaff [1977]
T.J. Biggerstaff and D.L. Johnson, ‘‘Design directed program synthesis,’’ CSCI Tech. Rep.
77—02—01, Univ. of Washington (Feb. 1977).
Gerhart [1976]
S.L. Gerhart and L. Yelowitz, ‘“‘Control structure abstraction of the backtracking programming tech-
nique,’’ IEEE Trans. on Software Eng. 2,4(Dec. 1976).
Green [1976]
C. Green, “‘The design of the PSI program synthesis system,’’ 2nd Inter. Conf. on Soft. Eng., Calif.
(1976).
Guttag [1976]
J. Guttag, E. Horowitz, and D.R. Musser, ‘‘Abstract data types and software validation,’’ Tech. Rep.
ISI, Marina del Rey (Aug. 1976).
Hantler [1976]
Sidney L. Hantler and James C. King, ‘‘An introduction to proving the correctness of programs,”’
Computing Surveys, Vol. 8, No. 3(Sept. 1976).
Hardy [1975]
S. Hardy, ‘‘Synthesis of LISP programs from examples,’’ Proc. of the Fourth Intern. Joint Conf. on
Artificial Intelligence (Sept. 1975).
Heidorn [1976]
G.E. Heidorn, ‘‘Automatic programming through natural language dialogue: a survey,’’ IBM J. of
Res. and Develop. (July 1976).
Manna [1975]
Z. Manna and R. Waldinger, ‘‘Knowledge and reasoning in program synthesis,’’ Artificial Intelli-
gence 6,2(Summer 1975).
Manna [1977]
Z. Manna and R. Waldinger, ‘‘The logic of computer programming,’’ Stanford AIM—289 (Aug. 1977).
Summers [1976]
P.D. Summers, ‘‘A methodology for LISP program construction from examples,’’ Proc. of the Third
ACM Symp. on Principles of Programming Languages, Atlanta, Ga. (Jan. 1976).
CHAPTER 19
Pierpaolo Degano
Universita de Pisa
Istituto di Scienze della Informazione
Corso Italia 40, 56100 Pisa, Italy
Abstract
The way a theorem is proven for particular instances of its variables contains information about the way
the theorem can be generalized in order to allow its proof by structural induction. We describe a methodology
which extracts this information and therefore associates to a given theorem to be proven, the inductive
hypothesis suitable for its proof by structural induction.
Step 1. The choice of the induction variables and selection of example proof traces.
One uses the rewriting system in order to choose fertile induction variables (Castaing and Kodratoff
[1980]) and prove the theorem for particular values of these variables. We thus obtain the example proof
traces.
In practice, if the functions are defined on the natural integers and their stopping condition is x=0, one
verifies the theorem for x=0,1,2,3,... . If the functions are defined on lists with only atoms at their top
level and if their stopping condition is x=NIL, one verifies the theorem for x=NIL, x=(A), x=(B A), x=(C
B A), ... (recall that the type list is generally described with a constructor which adds an atom at the left of the
list).
We obtain a sequence of example proof traces; let k be the index of this sequence.
Step 2. Cross-fertilization
One has to prove a relation between two functions f and g. Suppose that the k-th trace contains f applied
to the values for which the proof has been carried on in trace (k—1). We know that the relation holds for
these values (proven in trace k—1), and we apply this relation to obtain a relation depending on g only.
We prove this relation for particular cases and obtain a new sequence of example proof traces to be used
in the following.
Step 5. The desired generalization is obtained. This step calls a theorem prover which will ‘‘easily’’ prove the
generalized theorem.
where * is the multiplication supposed to be also defined by a rewriting rule. We want to prove that
g(x,1) = f(x).
Remark: The reader may try to prove this by structural induction and find that he needs an infinite sequence of
proofs.
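The proof traces below are consistent with f being the recursively defined factorial and g its accumulator form; as a reading aid, here is a Python sketch under that assumption (it is an assumption, not the chapter's rewriting rules).

def f(x):
    return 1 if x == 0 else f(x - 1) * x        # f(0) = 1, f(x) = f(x-1)*x

def g(x, y):
    return y if x == 0 else g(x - 1, x * y)     # g(0,y) = y, g(x,y) = g(x-1, x*y)

# The theorem to prove: g(x, 1) == f(x)
assert all(g(x, 1) == f(x) for x in range(8))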
The proof trace for x=0 is given by the sequence of doublets ((a1,b1),(a2,b2),(a3,b3)) where a1 := g(0,1) =
f(0), a2 := 1 = 1, a3 := 0 = 0, b1 := use the stopping conditions of f and g, b2 := use the constructor of the
integers, ... .
In the following, we shall stop our trace when the definitions of f and g are no longer used since the rela-
tions between the variables will appear at this point.
-for x=1
trace 1:  g(1,1) = f(1)            unfolding
          g(0,1*1) = f(0)*1        stopping condition
          1*1 = 1*1
-for x=2
trace 2:  g(2,1) = f(2)                  unfolding
          g(1,2*1) = f(1)*2              unfolding
          g(0,1*(2*1)) = (f(0)*1)*2      stopping condition
          1*(2*1) = (1*1)*2
Step 4. We follow traces 1 and 2 since the trace for x=0 contains no unfolding.
Trace 1 asks at once for the use of the stopping condition, which can be used only if we instantiate x by 0:
We must use the stopping condition in order to follow trace 2; this implies: x—1=0, or x=1. We obtain:
which proves that g(1,x2*x3)=f(1)*x5 provided 1*(x2*x3)=(1*1)*x5. At this step we suppose that we have a
system able to simplify both x2*x3=1*x5 and 1*(x2*x3)=(1*1)*x5 to x2*x3=x5 by computing the function *. No
new information about the variables x2,x3,x5 is obtained by the use of trace 2 and we make the induction
hypothesis that x2*x3=x5 is the relation which will be found within any trace.
Step 5. We replace x2*x3 by its value x5 in the relation obtained at step 3. We obtain the new relation
g(x,x5) = f(x)*x5 which is the generalized theorem we now have to prove.
Remark 1: Suppose that the generalized theorem is proven. The substitution x5/1 proves at once the desired
theorem.
Remark 2: For illustration purposes, we shall prove the generalized theorem. The basic case is reduced to y=y.
Induction hypothesis: g(x,y) = f(x)*y for all y.
We use the associativity of * in order to write: f(x+1)*y = f(x)*((x+1)*y). We thus must prove
g(x,(x+1)*y) = f(x)*((x+1)*y) which equals the induction hypothesis with the substitution (x+1)*y/y.
Step 2. We remark (see B.2, step 1) that f(0) appears in the trace for x=1. The trace for x=0 tells us that
f(0) = g(0,1) then the expression g(0,1*1) = f(0)*1 in the trace for x=1 becomes g(0,1*1) = g(0,1)*1.
We see as well that f(1) appears in the trace for x=2, obtaining g(1,2*1) = g(1,1)*2. In the same way,
we would obtain g(2,3*1) = g(2,1)*3.
These three relations are the new relations we are going to generalize. We therefore study the proof trace
of these expressions.
Step 3. We generalize the variables and the constants of the relations obtained by cross-fertilization:
g(x,x2*x3) = g(x,x5)*x6
Step 4. We follow trace 1 which starts with the stopping condition of g which can be used only if x = 0 and
obtain
x2*x3 = x5*x6
We follow trace 2:
g(x,x2*x3) = g(x,x5)*x6                       (unfolding)
g(x-1,x*(x2*x3)) = g(x-1,x*x5)*x6             (stopping condition)
In order to use the stopping condition we must have x-1=0, i.e. x=1. This gives:
1*(x2*x3) = (1*x5)*x6
1*(2*(x2*x3)) = (1*(2*x5))*x6
We are able again to prove that all these expressions are equivalent to x2*x3=x5*x6, so we make the induc-
tion hypothesis that this relation is valid for all traces. We find the generalized expression
g(x,x5*x6) = g(x,x5)*x6
obtained by replacing x2*x3 by x5*x6 in the relation obtained at step 3.
Remark 1: Assume the validity of this expression. The induction proof of f(x) = g(x,1) can be written as:
Unfolding once f(x+1), we obtain f(x+1) = f(x)*(x+1) = g(x,1)*(x+1) using the hypothesis.
Unfolding g(x+1,1) gives g(x,(x+1)*1).
We therefore want to prove that g(x,(x+1)*1) = g(x,1)*(x+1) which is an instance of the generalized relation
with x5/1, x6/(x+1), since (x+1)*1 = 1*(x+1).
We must prove: g(x, ((x+1)*y)*z) = g(x, (x+1) * y) * z which equals the induction hypothesis with the sub-
stitution y/(x+1)*y.
C. Discussion
C.1. Cross-fertilization or Not?
The above example seems to imply that cross-fertilization, even when possible, is not needed so that one
is tempted to classify it as a useless complication. However, the study of more difficult examples (such as the
function described in section D.7), shows that a direct generalization of the theorem to prove is not obtained
because the equations have only one solution: the particular values we started from.
Cross-fertilization is the only escape we have found up to now in order to avoid this feature so that if the
same phenomenon occurs in cross-fertilization, our methodology becomes useless.
On the other hand, section B.3 nicely shows that the property to be looked for in the proof trace is exactly
the property issued from the induction proof. More generally, let us consider two functions f and g, f being
recursively defined by f(x) = a(f(x — 1), h(x)) and g being defined either by terminal recursion or by recur-
sion. Proving f(x) = g(x) by structural induction is an attempt at proving that f(x) = g(x) implies f(x+1) =
g(x+1). If our prover chooses to unfold f(x+1), it obtains
a(f(x), h(x+1)) = g(x+1)
and the only possibility for using the induction hypothesis is to replace f(x) by g(x), i.e. attempt to prove
a(g(x), h(x+1)) = g(x+1). Our cross-fertilization procedure tries to find a generalization of this very expres-
sion. When g is also recursively defined, one might cross-fertilize any of g or f and the generalized expression
one has to look for depends on the unfolding planned by the prover (Boyer and Moore [1975]).
C.2. What if There is One Functional Symbol?
We have developed the case of proving equalities between two functions. It is however clear that our
methodology applies to any proof of this type P(f1,...,fn) = True where P is a given property.
A particular case can possibly raise difficulties: when P is an equality with a variable or a constant, for
instance REVERSE(REVERSE(x)) = x (see section D for proofs of this property for different forms of
REVERSE).
— The variable or constant is considered as the result of another function. For instance, we define
COPY(x) = x (see Kodratoff [1979]) and prove REVERSE(REVERSE(x)) = COPY (x).
— The function is defined as an instance of another function. For example, the iterative
REVERSE(X) = REV(x,nil) where REV(x,z) is terminal recursive. At a point in the traces, the
value x can be cross-fertilized by a functional expression containing REVERSE or REV.
It may unfortunately happen that the above solutions fail. More theoretical results or heuristic tricks
must (and we hope will) be included into our methodology.
D. Examples
The variables are either natural integers or lists with only atoms at their top level. We use the usual
definitions for the Predecessor (Pred), the Successor (Suc), the predicates Even, =0, =1, integer division by
2 (x/2), CONS, CAR, CDR, ATOM, nil.
D.0. Basis Functions
We have used the following definitions of PLUS (+), TIMES (*), APPEND.
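The chapter defines these functions by rewriting rules; as a reading aid, the standard recursive definitions they presumably correspond to are sketched below in Python (an assumption, not the chapter's own rules).

def plus(x, y):
    return y if x == 0 else plus(x - 1, y) + 1        # +(0,y)=y, +(Suc(x),y)=Suc(+(x,y))

def times(x, y):
    return 0 if x == 0 else plus(times(x - 1, y), y)  # *(0,y)=0, *(Suc(x),y)=+(*(x,y),y)

def append(x, y):
    return y if not x else [x[0]] + append(x[1:], y)  # APPEND(NIL,y)=y, APPEND(CONS(a,x),y)=CONS(a,APPEND(x,y))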
f(x)*x2 = FACT(x,0)*x2.
-with cross-fertilization:
FACT(x,x2)*x3 = FACT(Pred(x),x4)*x3  WHERE x2=x3, x4=Pred(x2).
This is equivalent to:
FACT(x,x2)*x2 = FACT(Pred(x),Pred(x2))*x2.
D.2.
DIF(x) := IF (x=1) THEN 1
          ELSE IF Even(x) THEN DIF(x/2)
          ELSE +(DIF(Pred(x)/2), DIF(Suc(x)/2))
DIG(x,y,z) := IF (x=1) THEN y
              ELSE IF Even(x) THEN DIG(x/2, +(y,z), z)
              ELSE DIG(Suc(x)/2, y, +(y,z))
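A direct Python transcription of these two definitions, offered only as a reading aid (integer division // plays the role of the halving operations in the text):

def DIF(x):
    if x == 1:
        return 1
    if x % 2 == 0:
        return DIF(x // 2)
    return DIF((x - 1) // 2) + DIF((x + 1) // 2)   # +(DIF(Pred(x)/2), DIF(Suc(x)/2))

def DIG(x, y, z):
    if x == 1:
        return y
    if x % 2 == 0:
        return DIG(x // 2, y + z, z)               # DIG(x/2, +(y,z), z)
    return DIG((x + 1) // 2, y, y + z)             # DIG(Suc(x)/2, y, +(y,z))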
FIBO(x,FIBO(Pred(x),x3)) = +(FIBO(x,x5), FIBO(Pred(x),x7))
WHERE x3=+(x5,x7), for all x.
-without cross-fertilization:
E. Conclusion
The examples of section D show that a wide variety of problems can be solved by our methodology. We
should like to conclude by pointing at its weaknesses.
It may happen that each trace introduces a new relation between the variables so that we have to deal with
an infinity of relations.
It may happen that the solution of the obtained equations is more difficult than the proposed problem.
Our main point now is to define precisely a class of theorems and functions to which our methodology applies.
In the meanwhile, we are carrying on an implementation of the methodology in order to have more
experimental results about its applicability and efficiency.
References
Kodratoff [1979]
Y. Kodratoff, ‘‘A class of functions synthesized from a finite number of examples and a LISP program
scheme,”’ Inter. J. of Comp. and Inform. Sci. 8 (1979), pp. 489—521.
Moore [1975]
J.S. Moore, ‘‘Introducing iteration into the pure LISP theorem prover,’’ IEEE Trans. on Software Eng. SE—1 (1975), pp. 328—338.
CHAPTER 20
D.A. Waterman
W.S. Faught
Philip Klahr
Stanley J. Rosenschein
Robert Wesson
Abstract
This chapter describes considerations and research questions for the design of an advanced exemplary pro-
gramming (EP) system. An exemplary programming system typically synthesizes programs from examples of
the task to be performed. The EP system ‘‘looks over the shoulder’’ of the user as he performs a task on the
computer, and from this example creates a program called an agent to perform the same task or some variant of
it. User comments or ‘‘advice’’ are combined with pre-stored knowledge about the task domain to create a
general-purpose program for performing tasks illustrated by the example.
The purpose of an EP system is to create small personalized programs capable of acting as interfaces to
complex computer systems or as intelligent assistants to aid the user in his work. These programs free users
from repeating detailed interactions with applications programs. Yet writing such programs often cannot be
justified because of the large number of programs needed, their personalized nature, and their fast-changing
specifications. The EP methodology provides a means for exploring quick, easy, and inexpensive methods for
creating individualized software of this type.
Our discussion in this chapter centers on design issues related to the next-generation EP system, EP-3,
which has not been implemented. Ideas for the EP-3 system design were generated from our experience
designing and implementing two earlier EP systems, EP-1 and EP-2.
The most important design issue for EP-3 is the choice of system architecture. We propose a Hearsay-II-
type architecture, a pattern-directed system that uses a collection of ‘experts’ called knowledge sources (KSs)
to hypothesize and refine procedures for accomplishing some specified task. Furthermore, we propose combin-
ing a model-directed top-down approach to program synthesis with a pattern-directed bottom-up approach. The
model-directed approach is based on scripts consisting of high-level descriptions of tasks the user might want to
perform. This works in conjunction with the pattern-directed approach based on data-directed evocation of
rules. Each collection of rules is a KS that fires when it ‘‘recognizes’’ particular information in the trace of the
example or in hypotheses generated by other KSs. This modular design permits extensibility: new KSs may be
added and modified as needed and new scripts may be acquired by monitoring and analyzing the examples and
mapping them into scripts.
A. Introduction
It felt strange to enter my new office for the first time. The highly polished walnut desk was empty, the
bookcases in repose, the dictaphone microphone in its cradle. All were silently waiting for me, poised and
ready for use by the budding new ‘‘data processing manager,’’ as I was to be called.
After depositing my briefcase, I sat down and turned to face the computer terminal stationed where type-
writers usually sit. When I turned it on, it beeped and typed the reassuring ‘‘*’’ prompt of an operating system
I knew. At last, something familiar! Now, I thought, all I need to do is wade through the new manuals of
library programs, transfer my old programs here, and learn the various nuances of this system, and I can begin
using it. About that time, someone appeared at the door holding a cup of coffee.
“Hi! I’m your new assistant. They call me E.P.,’’ he said, offering me the coffee. He moved over
beside me and sat down next to the terminal. ‘‘What would you like to do on the machine today?”’
‘Glad to meet you, E.P.,”’ I replied. ‘‘I’m not very familiar with this particular system, but how about
logging in and sending a message via electronic mail to my colleagues back at school letting them know I’ve
arrived and am setting up shop here. Send it to ‘Johnston @ ITH-1.’ Know my logon here?”’
‘Sure thing,’’ he replied.‘‘I make it my business to get to know all the computer users around here: who
they are, what they use the machine for, and how they do it. Even down to the way they like to name their
files. Like, do you use ---.src, ---.txt, or ---.orig for your text files?”” As I watched, he had expertly
used the
shortest logon and message generation commands and was almost finished composing the mail message
to
Johnston.
. “Oh, I usually don’t use any extension at all, but I keep all text in a sub-directory called
‘TEXT’.”” He
immediately began creating one for me. ‘‘Oh, while you’re at it, would you retrieve
all my Pascal source files
from the machine back at school? You might have to make some name changes
to get them here.”
“Sorry, I don't know how to use your old machine,”’ he confessed, ‘‘but
if youll show me a little, I’m
pretty quick at getting the hang of it.”’
I touched the keyboard for the first time and quickly logged into my old
machine via the net.
He interrupted, ‘‘Let’s see. That was logon name, then password, then
account number. Right?”
‘‘Uh huh.”’ I continued by listing all my old files with the ‘‘PAS:”’ prefix and using the file-transfer pro-
gram there to copy them over to my new directory, changing the prefix to an extension ‘“‘.pas.”’
When I had done a couple like this, he suggested I let him try to finish it. He did it a little differently,
using the local file-transfer program to do it, and I had to correct him once to point out that the files like
‘**PAS:EX-SYS1”’ and ‘‘PAS:EX-BIOSPH”’ were the executable versions, not source code, and not to transfer
them. All in all, though, this guy was going to make using the system fantastically easy!
B. Programming by Example
It would be fantastic, indeed, if we each had an ‘‘E.P.”’ assistant to help us at the terminal. How many
times we’ve had to rename a dozen files according to some simple criterion, or had to do some infrequent
operation that we didn’t quite remember how to do but never had time to construct a macro-program to do for
us! With the tremendous increase in useful program libraries and activities available through the terminal, such
an assistant seems almost essential, to everyone from the professional who doesn’t have the time to learn every
detail of a system to the office clerk who needs to do highly repetitive tasks but thinks ‘‘DO loops’’ are some
kind of new breakfast cereal.
We are researching the ways and means of creating machine-based versions of ‘‘E.P.”’ Through analysis
and the actual construction of two preliminary versions, we have arrived at the point where we are ready to
tackle the design of a program which itself can learn to mimic example tasks and remember how to perform the
routine ones it has seen before—everything that E.P. did above (except, perhaps, bringing the cup of coffee).
We call the method exemplary programming, a type of program synthesis based on program specification from
examples of the task to be performed (see also Biermann and Krishnaswamy, 1974; Biermann, 1976; Siklossy
and Sykes, 1975; Green, 1976). The EP system ‘‘looks over the shoulder’ of the user as he performs a task
on the computer, and from this example creates a program called an agent to perform the same task. User
comments or ‘‘advice’’ are combined with pre-stored knowledge about the task domain to create a general-
purpose program for performing tasks illustrated by the example.
The basic EP paradigm is illustrated in Figure 1. The user interacts with an application program or operat-
ing system to perform a task. The EP program watches and saves the record or “‘trace’’ of the interaction as
one example of how to perform the task. During the interaction the user may provide the EP program with
advice clarifying the example. The trace, advice, and built-in knowledge about the task domain are used by the
EP program to construct an agent for performing the task. As the agent attempts to perform the task, it may
encounter conditions that did not occur during the example. At this point the user is notified and asked to
interact again, providing an example of what to do in this new situation. The EP program monitors the
interaction and augments the agent, enabling it to recognize and respond to the new situation when it next
occurs. Thus, the agent is developed incrementally based on a number of different examples of task execution.
Figure 1. EP Paradigm
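The incremental loop just described can be pictured with the following sketch; it is purely illustrative (the class and method names are hypothetical, not from the EP systems), showing an agent that is built from example traces and augmented whenever it meets an unseen condition.

class Agent:
    def __init__(self):
        self.cases = []                      # list of (condition, action) pairs

    def learn_from_example(self, trace):
        # A real EP system would generalize the trace; here we simply record
        # each observed (situation, response) pair.
        self.cases.extend(trace)

    def perform(self, situation, ask_user):
        for condition, action in self.cases:
            if condition == situation:
                return action
        # Unseen condition: fall back to the user and remember the new example.
        action = ask_user(situation)
        self.cases.append((situation, action))
        return action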
The EP approach to program synthesis is based on four important ideas. The most fundamental idea is
that of the user agent (Anderson, 1976b; Waterman, 1977a, 1978). This is a program that can act as an inter-
face between the user and the computer systems he wants to access. A user agent is typically a small program
residing in a user’s terminal or in a portion of a central timesharing system. It may display many of the charac-
teristics of a human assistant, such as the ability to carry on a dialogue with the user or external computer sys-
tems or even other agents. The user agent is the target program we are attempting to create through use of the
EP system.
Another important idea that pervades the EP system design is that of concurrent processing. By this we
mean an organization that permits the user-system interaction to take place concurrently with its analysis by the
EP system and the subsequent creation of a user agent. Instead of having the EP system situated between the
user and the external system, it sits off to one side (see Figure 1), analyzing the trace as it is being generated.
In our current prototype, EP-2, the EP system actually runs on a remote computer linked to the user-system
interaction on the user’s local computer. Thus the delay in response time seen by the user during agent crea-
tion is minimized. Debugging is facilitated by having the EP system execute the agent as the user watches.
After debugging is complete, the agent can be compiled into an efficient form that runs directly on the user’s
local machine.
An important idea that has dominated the EP design philosophy has been learning from examples
(Hayes-Roth, 1978). Example-based learning has been studied by researchers in artificial intelligence and cog-
nitive psychology, particularly in the areas of concept formation (Hunt, 1962; Hayes-Roth and Hayes-Roth,
1977, Winston, 1975), serial pattern acquisition (Simon and Kotovsky, 1963; Waterman, 1976), and rule induc-
tion from examples (Hayes-Roth, 1976a, 1976b; Vere, 1978; Waterman, 1975). It is our contention that traces
of the activity one would like performed in a man-machine environment contain a wealth of information about
the task, the user, and the approach suggested by the user for attaining his goals. Specification by example
is a
natural means of conveying information, one that minimizes the need for training the user in the operation
of
the program-synthesis technique. When supplemented with advice from the user about
his intentions and
built-in knowledge about the task domain, the example becomes a powerful tool for program specification.
The final important idea incorporated into both the design of the EP system
and the design of the user
agents it creates is that of rule-based systems (Waterman and Hayes-Roth, 1978a, 1978b).
The rule-directed
approach to knowledge acquisition has been the basis for a number of successful research
projects in the past
few years, including MYCIN (Shortliffe, 1974), a system which contains condition-action rules encoding heuris-
tics for diagnosis and treatment of infectious diseases; TEIRESIAS (Davis, 1976), a rule-based approach for
transferring expertise from a human to the knowledge base of a high-performance program; and Meta-
DENDRAL, a program designed to formulate rules of mass spectrometry which describe the operation of
classes of molecular structures (Buchanan, 1974). Other related systems include AM (Lenat, 1976, 1977;
Lenat & Harris, 1978), which uses heuristic rules to develop concepts in mathematics, and SLIM and
SPROUTER (Hayes-Roth, 1973, 1976a, 1976b; Hayes-Roth & McDermott, 1976, 1978), which infer general
condition → action rules from before-and-after examples of their use. The rule-based system design imposes a
high degree of structure on the code, leading to a simple organization that facilitates debugging, verifiability,
and incremental modification (Hayes-Roth, Waterman, Lenat, 1978).
The EP paradigm is applicable to tasks that require repetitive, personalized user-system dialogue and can
be described by one or more sequences of actions on some external system. Examples of these user-interactive
tasks include computer network tasks (e.g., file transfers), operating systems tasks (e.g., file maintenance),
data-base retrievals, and edit macros. Development of an advanced EP system will permit us to make inroads
into the following problem areas involving effective use of computers: 1) the difficulty of correctly interacting
with numerous systems and facilities, each of which requires a unique syntax and protocol; 2) the problem of
remembering how to do something that an ‘‘expert’’ previously demonstrated; 3) the frustrating problem of
repeating the same sequence of instructions to accomplish a frequently occurring task; and 4) the problem of
generalizing specific command sequences to handle a more varied set of problem conditions.
There are, however, certain difficulties associated with the use of example-based programming. First is
the basic problem of specifying a complex algorithm from examples of behavior traces. For most interesting
classes of algorithms, a single behavior trace will generally be consistent with a large number of alternative
algorithms. Since the user of an EP system is trying to build a realization for an algorithm that could be arbi-
trarily complex, he can never be absolutely sure that the approximation constructed by example is close enough
to the desired algorithm to meet his needs. This implies the necessity for an open-ended algorithm-
construction paradigm, i.e., the ability to extend the algorithm by example at any time during its construction
or application by the user.
Implicit in all the EP work are the following conjectures: 1) there exist interesting classes of algorithms
that can be defined by specification of behavior traces, 2) there exist specifiable bases for choosing one algo-
rithm over another as the ‘‘intended’’ one, and 3) programs can be implemented to make this choice and syn-
thesize the algorithms. Reasonable computer scientists may differ as to whether the first two conjectures are
plausible. The skeptic might feel that the approach lacks merit because of the idiosyncratic nature of interesting
algorithms. However, our previous work with exemplary programming has led us to believe that not only do
such classes of algorithms exist, they provide the basis for interesting and practical applications.
Another fundamental problem is how to integrate diverse sources of knowledge in the synthesis of an
algorithm. For any particular application domain there will be several unique knowledge sources that can be
used to help interpret the behavior trace. For example, in the operating system domain they might include
expected input and output strings for each system command, simplified flowcharts of system commands, typical
connected sequences of user actions (e.g., telnetting to a remote site and then logging in), and user advice for
branches and loops. The problem of integrating knowledge sources necessitates the use of a special mechanism
or representational technique, such as cooperating specialists (see Section D.5).
Another problem is that a strict example-based paradigm tends to present information at a very low level
only. High-level information can also be quite useful, e.g., a description of the task, algorithm, etc., in a high-
level language that effectively condenses the information in many examples into a few concise statements. In
our approach we touch on this idea through the application of ‘‘user advice’’ and pre-defined ‘‘scripts’’ (see
Section E.1) that represent high-level abstractions of potential algorithms the user may desire to implement.
Since much knowledge about the task domain is needed to help the system interpret the examples, the
task domain must be well defined ahead of time. Thus, unexpected results occurring during the example will
be difficult to handle. Even worse, the examples must be error-free, or the EP system must have a way of
recognizing errors when they occur. These errors can come from either the user or the system he is accessing.
We feel that the requirement of providing error-free examples is not so great a restriction as to negate the use-
fulness of the approach.
EP applies only to domains where there is an abundance of low-level feedback at short intervals, i.e., a
dialogue that represents or describes the relevant ongoing behavior. Consequently, the exemplary programming
approach would not apply to synthesizing a sort function by giving examples of input-output pairs. However, if
the algorithm could be demonstrated by actually sorting a list of items, showing all intermediate steps, then this
approach might possibly be applicable.
Another problem is that in some interactions involving man-machine dialogue, the man performs crucial
activities in his head that are relevant to the task. How to present these activities to the EP system is the ques-
tion here. For example, a user lists his files and deletes each with the prefix ‘‘bin’’ if he thinks he knows what
is in the file. The EP system might be able to infer that files with the prefix ‘‘bin’’ were being deleted but
would not be able to deal with the latter information.
The ideas we describe for an advanced EP system are clearly speculative, since they have not been imple-
mented and tested. Our initial EP systems have only scratched the surface and do not prove that an advanced
EP system such as the one we describe here can be developed. Many researchers have been stymied by the
complexities of developing an automatic programming system. In our approach we hope to simplify the prob-
lem by limiting the automatic programming system to one that creates programs from examples rather than
general specifications. But the task is still complex and difficult. Our approach is based on the deficiencies
uncovered in developing previous EP systems. Although these previous systems produced only simple pro-
grams, they were both operational and useful. We hope the discussion that follows will shed light on new
issues and ideas in the area of program synthesis.
Section C of this Note describes the design and implementation of EP-2, our current EP System. Section
D discusses design goals for an advanced version of EP, while Section E describes architectural considerations.
The conclusions are presented in Section F.
C. EP-2 System
The current operational EP system, EP-2, is patterned after EP-1 (Waterman, 1977b, 1978), an initial
prototype EP system written in RITA (Anderson & Gillogly, 1976a, 1976b; Anderson, et al., 1977). EP-2,
however, extends and develops the ideas in that initial prototype.
with the system through the FE, and the FE marks and sends all these messages to EP-2 so that they may be
incorporated into the trace. The trace is a verbatim description of the interaction, including user advice. EP-2
processes the trace (either incrementally or all at once) to create a model. If the user makes an error and
wishes to correct it, he can edit the trace. Whenever the user edits the trace, the current model is deleted and
a new model is constructed from the edited (and presumably correct) trace by reprocessing the trace. An inter-
pretable version of the model is then stored as a program (or ‘‘agent’’) in a library under the name provided
by the user. (The trace is also stored.) When the user calls an agent, EP-2 goes into execute mode; it locates
and reads the model from the library and then interprets it in a manner isomorphic to executing an ordered
production system.
The model is a data structure that represents the algorithm the user had in his mind to perform the
actions shown in the trace. The model is represented as a graph structure of nodes and arcs. The nodes
represent states; the arcs have two components: conditions and actions. Several arcs may originate from each
node (state). During execution, EP-2 tests the conditions on all the arcs emanating from a particular node.
The arc whose condition is true (there should be at most one) is the branch to be taken. The arc’s action is
executed and the state of the agent is set to the succeeding node.
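A minimal sketch of executing such a model (in Python, and not EP-2's implementation): nodes are states, arcs carry a condition and an action, and at each state the single arc whose condition holds is taken, much like an ordered production system.

def run_model(model, start, context):
    """model maps a node name to a list of (condition, action, next_node) arcs."""
    node = start
    while node in model:
        for condition, action, next_node in model[node]:
            if condition(context):           # there should be at most one true arc
                action(context)
                node = next_node
                break
        else:
            break                            # no arc applies: stop (or ask the user)
    return context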
they could be built-in knowledge. EP would also have to know how to log on to remote systems and transfer
files. Some of this knowledge is obviously general and should be built-in. Nonetheless, many details (e.g.,
specific logon sequences) could be learned from repeated observations or from user advice.
Consequently, one promising approach is to build into EP numerous descriptions of commonplace activi-
ties that require domain knowledge. This should be done in a way that allows EP to fill in details later through
observation and advice. Thus EP will learn not only what the user does, but relatively permanent domain
knowledge as well.
do some compiling now.’’ When activated, these components automatically establish a complex context that
simplifies EP-3’s understanding of the lower-level actions appearing in the trace.
We will call the action sequences that represent a unified higher-level activity scripts because they resem-
ble drama scripts by specifying contexts and temporal sequences of actions yet to be performed (cf. Schank and
Abelson, 1977). Our scripts are more general than those of the playwright because they allow choice points and
alternative behaviors. A compile script for our imaginary user might look like this:
Recognizers: Scripts recognize that certain activities are being performed by a user.
Predictors: Scripts predict subsequent commands and activities in the trace. The confirmation of such
predictions will improve the credibility of the proposed scripts.
Evaluators: Scripts can support or refute existing hypotheses about user actions.
Generators: Scripts can generate agents by being instantiated within a particular computer environment.
Notice that the use of scripts as generators enables an EP-3 agent to be specified at a high, abstract level.
Executing an agent would involve the instantiation of the specified scripts relative to the current computer
environment. Thus, for example, an agent can be created on one machine and then executed on another
machine. The data independence of these high-level scripts is a necessary requirement for transferring
agents between system environments.
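As an illustration of scripts acting as generators, the following is a minimal sketch (hypothetical names, not EP-3 code) of instantiating an abstract script step against a concrete environment, in the spirit of the level-3 example given below.

def instantiate(script_steps, environment):
    """Replace <PLACEHOLDER> tokens in each abstract step with concrete values."""
    concrete = []
    for step in script_steps:
        for name, value in environment.items():
            step = step.replace(f"<{name}>", value)
        concrete.append(step)
    return concrete

store_files = ["ftp <TO-MACHINE>", "put <FROM-FILENAME> <TO-FILENAME>", "bye"]
env = {"TO-MACHINE": "ecl", "FROM-FILENAME": "code.ep", "TO-FILENAME": "code.ep"}
print(instantiate(store_files, env))
# ['ftp ecl', 'put code.ep code.ep', 'bye']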
An EP-3 with perhaps hundreds of these rules would successively generate more and more abstract
hypotheses directly from the trace. As with most data-directed behavior, this approach might achieve impres-
sive levels of accomplishment if there are enough constraints and redundancies in the data. Care must be taken
however, to avoid pursuit of long search paths obviously at odds with the overall solution being generated.
The architecture proposed below follows this theme. We have tried to be specific whenever possible,
making the design choices necessary to achieve opportunistic analysis along the lines of a Hearsay-II type of
structure. The executive, for example, is not specified but clearly needs to contain a scheduler to resolve dupli-
cate KS firings. This architecture results from our efforts to produce an EP-3 design that will achieve the
highest level of performance possible with current technology.
The data structure chosen is the first instance of a change to the Hearsay-II structure. As others have
done (e.g., Englemore and Nii, 1977), we divide the blackboard into several levels. The example we present
below concerns the transfer of a set of files from one machine to another machine. An initial specification of
blackboard levels and examples relative to transferring files includes:
1) scripts:
top-level knowledge about typical activities users
are likely to do and how they do them, e.g.,
STORE-FILES script:
copy {fileset} from <from-machine > to <to-machine >
2) abstract procedures:
the steps necessary for accomplishing a script specified
at an abstract level, e.g.,
3) specific procedures:
ENVIRONMENT:
<FROM-MACHINE >: unix
<TO-MACHINE >: ecl
<FILE-TRANSFER-PROGRAM >: ftp
VARS:
fileset: SET OF unix-FILE;
from-filename: unix-FILE;
to-filename: ecl-FILE;
4) abstract trace:
a generalization of the actual trace, with specific
commands and their parameters replaced by typed
variables. For example, the last part of the STORE-FILES
script:
Sys: ...                      variables:
Usr: store ...                  from-filename
Sys: ...                          TYPE: <from-machine>-file
                                  VALUE: code.ep
                                to-filename
                                  TYPE: <to-machine>-file
                                  VALUE: code.ep
                                successful-transfer
                                  TYPE: Boolean
                                  VALUE: True
5) Actual trace
Obvious, e.g., the left column above.
We will then have the KSs operating between the levels, generating, evaluating, and deleting hypotheses.
Some of the KSs might look like:
% dir
...system lists his files
% pwd
...prints working directory
% ftp ecl
...executes file transfer program
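The knowledge sources can be pictured as small pattern-directed procedures operating on the blackboard; the following is an illustrative sketch only (the function and field names are hypothetical, not EP-3's), showing a KS that maps actual-trace lines into abstract-trace hypotheses.

import re

def ks_recognize_ftp(blackboard):
    """Fire when the actual trace shows the file-transfer program being started."""
    for line in blackboard["actual_trace"]:
        m = re.match(r"% ftp (\S+)", line)
        if m:
            blackboard["abstract_trace"].append(
                {"step": "start-file-transfer", "TO-MACHINE": m.group(1)})

blackboard = {"actual_trace": ["% dir", "% pwd", "% ftp ecl"], "abstract_trace": []}
ks_recognize_ftp(blackboard)
print(blackboard["abstract_trace"])
# [{'step': 'start-file-transfer', 'TO-MACHINE': 'ecl'}]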
A novel feature of this particular representation comes as a side-effect of the recognition process. When
the entire trace has been ‘‘explained’’ as a particular instance of some script, what falls out is a rather formal
program, complete with state and environment variables, loops, etc. The program can then be used to tackle
EP-3’s ‘‘generational’’ problem—performing the same tasks itself. This model operates similarly to the oppor-
tunistic planner in that it generates its plan as a data-driven side-effect of the more direct answer to the ques-
tion, How do we get from START to GOAL (or FINISH)?
Once such a plan (program) has been generated, with suitable KSs available, EP-3 can proceed to apply it
to new tasks:
1. Directly, by substituting new values for variables where requested and defaulting to the stored ones when
necessary; or
2. Even more intelligently, by using the script information and some evaluation KSs to generate alternate, more
efficient ways of accomplishing the same thing.
As noted above, the realization of this scheme requires that EP possess a great deal of domain-specific
knowledge—in this example, various ways of doing a file transfer.
F. Conclusions
The EP methodology has the potential for making a significant impact on the computing community
because it cuts across task domains, system requirements, and user types. It is most appropriate for repetitive
tasks involving extensive man-machine dialogue. The program created by the EP system acts as a repository of
information about how to perform the task and as an autonomous agent capable of performing that task. As
shown by the discussion of EP-2 applications, the EP paradigm is also useful for tasks composed of many simi-
lar subtasks. The user performs a few of the subtasks as the EP system watches and then tells it to do the rest
itself. Not only is an agent created to perform the task, the user is relieved of providing the EP system with a
repetitious example.
An essential part of the design of an advanced EP system is a pattern interpretation component. We have
described a multilevel framework for this component based on the Hearsay-II architecture that combines a
model-directed top-down approach to program synthesis with a pattern-directed bottom-up approach.
The model-directed approach, based on scripts, provides a concise way of representing contiguous,
context-dependent knowledge. It merges nicely with the learning-by-example paradigm, suggesting more
sophisticated future extensions such as automatic script acquisition by example. The mechanism suggested for
the advanced EP system contains most of the machinery needed for monitoring, analyzing, reformulating, and
storing examples as new scripts. Thus, this approach lends itself to the problem of learning permanent domain
knowledge.
The pattern-directed approach, based on data-directed evocation of rules, is a useful way to represent the
knowledge sources that map knowledge from one level to another. It facilitates both recognizing behavior in a
trace and operationalizing that behavior in new contexts. Maintaining these specialized KS ‘‘experts’’ not only
gives us the modularity and clarity needed to promote good human engineering but also allows us as system
designers to incorporate as much high-level expertise as is needed in the individual modules. Thus KSs
involved in particularly difficult tasks such as recognizing regularities in the trace and mapping them into con-
ventional programming constructs can be easily expanded and augmented until they contain enough expertise to
perform as desired.
A fundamental problem related to the use of exemplary programming for program construction is the
handling of unexpected or novel tasks that are demonstrated by example. In this situation the model-directed
approach may be of little help, since existing scripts will tend not to match the example in a consistent manner.
Furthermore, the data-directed approach may lead to false interpretations of the user’s intent, since without
sufficiently primed knowledge sources, the example could appear ambiguous. To handle this difficult type of
situation, the EP approach may have to be used in conjunction with other program synthesis techniques, such
as summarizing the task in a high-level language or describing the algorithm used.
Appendix A
% ep
Telnetting --- logging in --- starting EP ---
[EP]: create
Describe agent - -
Text: This agent retrieves the file do.doc
Text: from ecl and prints it locally.
Text:
{The user can give a description of the
agent as a form of documentation.}
EP is watching
-EP waiting-
% ftp ecl
{The user starts up FTP, and logs onto
the remote server.}
Connections established.
temp.bak
%
[EP]: end
{The agent is now available for use. The user calls it on a different file.}
{This agent retrieves the file pattern from ecl and prints it locally.}
{The agent starts running by first printing the description of its task.}
% ftp ecl
Connections established.
{The agent tells FTP which remote file is to be retrieved. The file ‘pattern’
is instantiated from the agent call.}
localfile: temp.bak
255 SOCK 3276867075
% del temp.bak
temp.bak
%
Ending agent - print pattern from ecl
-EP dormant-
%
Appendix B
% ep
To illustrate the work the agent must perform, a trace of the agent-system interaction required to effect
the retrieval is shown below. This is what the user would have to go through if he did not have access to the
agent. User input is in italics.
% ep
Telnetting --- logging in --- starting EP ---
%
The GOTO button is control-P
For help, type GOTO and “help <carriage return >".
%
[EP]: ships status 200 miles
; [This agent uses the LADDER system to print the status of all ships within 200 miles of
the default location (currently NORFOLK). Types of ships recognized by LADDER are:
ships submarines carriers cruisers]
%
[PHASE 1: Start the LADDER system.]
%
-Calling: ladder
%
[This starts the LADDER system at SRI-KL.]
%
[Telnet to SRI with a TEE for saving results of this session.]
% tn sriltee ladder.temp
Open
PARSED!
WHAT IS THE CURRENT POSITION FUEL STATUS STATE OF READINESS AND COMMANDING
OFFICER OF JFK
PARSED!
For SHIP equal to KENNEDY JF, give the POSITION and DATE and PCFUEL and READY and RANK
and NAME.
May LIFER assume that "CURRENT POSITION FUEL STATUS STATE OF READINESS AND COM-
MANDING OFFICER" may always be used in place of "OPSTATUS"? Yes
For great circle distance to 37—00N, 76—00W less than or equal to 200, give the POSITION and DATE and
PCFUEL and READY and RANK and NAME and SHIP.
4k
-Calling: beep
4
[This beeps the user’s terminal.]
4
-Ending agent: beep
4
-Ending agent: ships status 200 miles
-EP dormant-
4
[EP]: ladexit
-Calling: ladexit
4
[Exiting LADDER, back to unix...]
4
[PHASE 4: Exit from the LADDER system.]
4 done
PARSED!
@k
[Confirm]
System shutdown scheduled for Mon 18-Sep-78 00:01:00,
Up again at Tue 19-Sep-78 04:00:00
Logout Job 84, User FHOLLISTER, Account DA, TTY 251, at 14-Sep-78 17:14:02
Used 0:0:12 in 0:4:56
%
[PHASE 5: Format, print, and save the transcript.]
0
-Calling: ladsave
%
[This saves the results of each LADDER run
by appending them on the file LADDER.RESULTS
and printing them on the RCC computer.]
%
[Delete cr’s and DEL’s.]
%
[You are now talking to unix.]
%
-Ending agent: ladexit
-EP dormant-
%
References
Anderson, R. H., and J. J. Gillogly, ‘‘The Rand Intelligent Terminal Agent (RITA) as a Network Access Aid,”’
AFIPS Proceedings, Vol. 45, 1976, pp. 501—509. (a)
Anderson, R. H., and J. J. Gillogly, Rand Intelligent Terminal Agent (RITA): Design Philosophy, The Rand Cor-
poration, R-1809— ARPA, 1976. (b)
Anderson, R. H., et al. RITA Reference Manual, The Rand Corporation, R—1808-ARPA, 1977.
Biermann, A. W. ‘‘Regular LISP Programs and their Automatic Synthesis from Examples,’’ Computer Science
Department Report CS—1976—12, Duke University, 1976.
Biermann, A. W., and R. Krishnaswamy, ‘‘Constructing Programs from Example Computations,’’ Computer
and Information Science Research Center Report CISRC—TR—74—5, Ohio State University, 1974.
Buchanan, J. R., A Study in Automatic Programming, Computer Science Report, Carnegie-Mellon University,
1974.
Davis, R., ‘‘Applications of Meta Level Knowledge to the Construction, Maintenance and Use of Large
Knowledge Bases," Stanford AI Memo AIM-283, Stanford University, 1976.
Davis, R., and J. King, ‘‘An Overview of Production Systems,”’ in E. W. Elcock and D. Michie (eds.), Machine
Intelligence, Wiley, New York, 1976, pp. 300—332.
Englemore, R. S., and H. P. Nii, ““A Knowledge-Based System for the Interpretation of Protein X-ray Crystallo-
graphic Data,’’ STAN—CS—77—589, Stanford University, 1977.
Erman, L. D., and V. R. Lesser, ‘“‘A Multi-Level Organization for Problem Solving Using Many Diverse
Cooperating Sources of Knowledge,’’ Proceedings of the Fourth International Joint Conference on Artificial Intelli-
gence, 1975, pp. 483 —490.
Faught, W. S., et al., A Prototype Exemplary Programming System, The Rand Corporation, R-2411-ARPA,
1979.
Green, C., “The Design of the PSI Program Synthesis System,’’ Proceedings of the Second International Confer-
ence on Software Engineering, San Francisco, California, 1976, pp. 4—18.
Hayes-Roth, B., and F. Hayes-Roth, ‘‘Concept Learning and the Recognition and Classification of Exemplars,”’
Journal of Verbal Learning and Verbal Behavior, Vol. 16, 1977, pp. 321—338.
Hayes-Roth, B., and F. Hayes-Roth, Cognitive Processes in Planning, The Rand Corporation, R—2366—ONR,
1978.
Hayes-Roth, F., “A Structural Approach to Pattern Learning and the Acquisition of Classificatory Power,”
Proceedings of the First International Joint Conference on Pattern Recognition, IEEE, New York,
1973.
Hayes-Roth, F., ‘‘Patterns of Induction and Associated Knowledge Acquisition Algorithms,’ in C. H. Chen
(ed.), Pattern Recognition and Artificial Intelligence, Academic Press, New York,
1976. (a)
Hayes-Roth, F., ‘“‘Uniform Representations of Structured Patterns and an Algorithm for the Induction of
Contingency-Response Rules," Information and Control, Vol. 33, 1976, pp. 87-116. (b)
Hayes-Roth, F., ‘‘Learning by Example,’ in A. M. Lesgold et al. (eds.), Cognitive Psychology and Instruction,
Plenum, New York, 1978.
Hayes-Roth, F., and J. McDermott, ‘‘Learning Structured Patterns from Examples,’’ Proceedings of the Third
International Joint Conference on Pattern Recognition, Coronado, California, 1976.
Hayes-Roth, F., and J. McDermott, ‘‘Knowledge Acquisition From Structural Descriptions,’ Communications of
the ACM, May 1978.
Hendrix, G., et al., ‘‘Developing A Natural Language Interface to Complex Data,’’ ACM Transactions on Data-
base Systems Vol. 3, No. 2, June 1978, pp. 105—147.
Hunt, E. B., Concept Formation: An Information Processing Problem, Wiley, New York, 1962.
Lenat, D., ‘‘AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search,’’ SAIL
AIM — 286, Artificial Intelligence Laboratory, Stanford University, 1976.
Lenat, D., ‘‘Automated Theory Formation in Mathematics,’’? Proceedings of the Fifth International Joint Confer-
ence on Artificial Intelligence, 1977, pp. 833 —842.
Lenat, D., and G. Harris, “‘Designing a Rule System that Searches for Scientific Discoveries,’ in D. A. Water-
man and F. Hayes-Roth (eds.), Pattern-Directed Inference Systems, Academic Press, New York, 1978.
Lesser, V. R., and L. D. Erman, ‘‘A Retrospective View of the Hearsay-II Architecture,’ Proceedings of the
Fifth International Joint Conference on Artificial Intelligence, MIT, 1977, pp. 790—800.
Nii, H. P., and E. A. Feigenbaum, ‘‘Rule-Based Understanding of Signals,’ in D. A. Waterman and F. Hayes-
Roth (eds.), Pattern-Directed Inference Systems, Academic Press, New York, 1978.
Schank, R. C., and R. P. Abelson, Scripts, Plans, Goals, and Understanding, Lawrence Erlbaum Associates, New
Jersey, 1977.
Shortliffe, E. H., "MYCIN: A Rule-Based Computer Program for Advising Physicians Regarding Antimicrobial
Therapy Selection," Memo AIM-251, Artificial Intelligence Laboratory, Stanford University, 1974.
Shortliffe, E. H., Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, 1976.
Siklossy, L., and D. A. Sykes, ‘‘Automatic Program Synthesis from Example,’’ Proceedings of the Fourth Inter-
national Joint Conference on Artificial Intelligence, 1975, pp. 268—273.
Simon, H. A., and K. Kotovsky, ‘“‘Human Acquisition of Concepts for Sequential Patterns,” Psychological
Review, Vol. 70, 1963, pp. 534—546.
Vere, S. A., ‘“‘Inductive Learning of Relational Productions,’ in D. A. Waterman and F. Hayes-Roth (eds.),
Pattern-Directed Inference Systems, Academic Press, New York, 1978.
Waterman, D. A., ‘‘Adaptive Production Systems,’ Proceedings of the Fourth International Joint Conference on
Artificial Intelligence, 1975, pp. 296—303.
Waterman, D. A., ‘‘Serial Pattern Acquisition: A Production System Approach,” in C. H. Chen (ed.), Pattern
Recognition and Artificial Intelligence, Academic Press, New York, 1976.
Waterman, D. A., Rule-Directed Interactive Transaction Agents: An Approach to Knowledge Acquisition, The Rand
Corporation, R—2171—ARPA, 1977. (a)
Waterman, D. A., A Rule-Based Approach to Knowledge Acquisition for Man-Machine Interface Programs, The
Rand Corporation, P-5895, 1977. (b)
Waterman, D. A., and F. Hayes-Roth, Pattern-Directed Inference Systems, Academic Press, New York, 1978.
Winston, P. H., ‘‘Learning Structural Descriptions from Examples,”’ in P. H. Winston (ed.), The Psychology of
Computer Vision, McGraw-Hill, New York, 1975.
SECTION VI
LEARNING
CHAPTER 21
CONCEPT LEARNING
B. L. Cohen and C. Sammut
University of N.S.W.
Abstract
A learning program produces, as its output, a boolean function which describes a concept. The function
returns true if and only if the argument is an object which satisfies the logical expression in the body of the
function. Concepts may be learned by interacting with a trainer or by providing a set of positive and a set of
negative instances. An interpreter has been written which performs the reverse of the learning process. The
concept description is regarded as a program which defines the set of objects which satisfy the given conditions.
The interpreter takes as its input, a predicate and a partially specified object. It produces, as its output, the
completed object. The interpreter is used to aid the learning of complex concepts involving existential
quantifiers. This paper presents algorithms for learning concepts and generating objects.
A. Introduction
When a programmer is given an assignment it is usually of the form, ‘‘here is a problem which we would
like the computer to solve’’ or ‘‘we would like to get this type of information given these data.” The first step
in good software design or ‘‘software engineering” is to analyze the problem and to produce high level
specifications. The programmer should determine the class of problem he is dealing with. In doing this he for-
mulates a description of the problem (the specifications) which demonstrates the relationship between the
program’s input space (the data) and its output space (the results). That is, he forms some concept which
specifies what the required program is meant to achieve. This concept may be viewed as the description of the
relation between the input and the output of the desired system. The program then, may be seen as the imple-
mentation of this relation.
The concept description may be considered a recognition device that classifies input/output pairs, whereas execution of the program, given relevant input, produces the appropriate output. Effectively, the
program is the generator of the input/output pairs.
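To make the distinction concrete, here is a small illustrative sketch in Python (not from the chapter; 'append' is chosen only because it reappears below): the same relation packaged once as a recognizer of input/output pairs and once as a generator of outputs from inputs.

# Illustrative sketch only: one relation, two packagings.

def is_append_pair(xs, ys, zs):
    # Recognizer: classify an input/output pair for the append relation.
    return zs == xs + ys

def append(xs, ys):
    # Generator: produce the output from the inputs.
    return xs + ys

print(is_append_pair([1], [2, 3], [1, 2, 3]))   # True
print(append([1], [2, 3]))                      # [1, 2, 3]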
Clearly, there is a fine distinction between the concept and the program. Historically, this distinction has
seemed significant mainly because the intent of programs was buried deep in the rigid syntax of the language,
most of which was not too far removed from the instruction code of the computer. For example, even in a
high level language like Pascal, an algorithm to append a list onto the end of another would look like this:
type
  list = ^listcell;
  listcell = record
    head: integer;
    tail: list
  end;
{In Pascal this could have been more neatly expressed as a recursive function; however, other
languages such as FORTRAN do not allow recursion.}
As more high-level languages have been developed, the trend has been to encode more of ’what’ to do
rather than *how’ to do it. These languages are closer to English than machine code. Through their use, pro-
grams come much nearer to describing the solution to the problem at an understandable level, and not just the
details of the implementation. Such is the case with a language like PROLOG (Roussel [1975]), where
irrelevant details are largely hidden from the user.
Consider the Prolog version of 'append':
append([], X, X).
append([A|B], X, [A|C]) :- append(B, X, C).
This states that the result of appending any list, X, to the empty list is X. The result of appending X to a list
whose head is A and whose tail is B is the list whose head is also A and whose tail is the result of appending X
to B. Thus the Prolog program is a succinct description of the concept of appending lists.
In effect, the difference between the concept and the program lies only in the level of specification and
detail given in their description. Conventional languages specify a great amount of detail, whereas high-level
languages such as Prolog and other AI systems require much less detail, although some efficiency is lost.
Taking the notion of identity of concept and program a step further we may develop a novel approach to
automatic program synthesis. This step involves the incorporation of a computerized concept formation system.
Concept learning provides the initial important part of the automatic programming process.
Input
|
}
I/O Pair --»Learning--»Prover/ Interpreter
l
|
Output
Figure 1 summarizes the method. Input/Output pairs are supplied to the learning system. It produces a con-
cept description which recognizes these pairs. Given an input, the interpreter uses the concept description to
guide the generation of the corresponding output.
B. Background
The ideas given above form the basis of an automatic programming system which has been implemented
at the University of N.S.W. An earlier version of the system was described in Cohen [1978]. This has been
considerably enhanced by extending the syntax of the description language to include quantifiers (Sammut and Cohen [1980] and Sammut [1981]). These improvements are enabled by the use of theorem-proving techniques.
The project began as an attempt to answer three questions posed by Banerji [1969]:
2. How should the description of such a concept be stored and processed so that, given an object, we can determine as quickly as possible whether the object is contained in the concept?
3. Given two sets of objects, how should we construct a short description of a concept that contains all elements of the first set and none of the second?
As a result of this attempt a structural concept learning system, CONFUCIUS, was designed and implemented (Cohen [1978]). The description language, CODE, represents concepts as boolean combinations of predicates.
These predicates can include relational statements and can refer to other previously or partially learned con-
cepts, thus, providing the language with the ability to grow and include recursive concepts. Restricting the
predicates to one free variable obviated the need for instantiation (c.f. Hayes-Roth [1978]) while still providing
a reasonable degree of generality, as the variable could be highly structured (this restriction has been lifted in
the new system).
A concept can be learned by CONFUCIUS in one of two ways:
CONFUCIUS uses a conservative focusing strategy (Bruner et al. [1956]) to generalize positive instances. In
the first method the trainer shows a positive instance and then CONFUCIUS asks if the hypothetical concepts it
generates are subconcepts of the concept to be learned (the target concept).
In the second method the questions are replaced by recognition tests with the set of negative instances.
The structural description language and learning algorithms of CONFUCIUS form the stepping-stones on
the way to the new system. What is of interest here is that earlier it was noted that by enforcing recognition of
partially specified objects in a concept by filling in the spaces as directed by the concept description, a crude
form of program interpretation was being performed.
Initially it seemed that object construction as such, was an interesting extension to the learning system.
The syntax of the language has some subtle refinements made to it so that the interpreter could process recur-
sive concepts and the like. However, when the description language was properly extended to include the
existential quantifier, it soon became apparent that object construction was an integral part of a learning strategy
capable of handling the extended language. It will be shown later that in order to learn concepts involving
quantified variables, objects need to be constructed from components of the data so that new pertinent concept
statements can be generated.
Moreover, with the ability to generate objects, the concept learning system can show likely objects to the trainer for questioning rather than the more complex concepts they represent. This more closely corresponds to the psychological concept formation procedures (Bruner et al. [1956]).
Finally, the occurrence of the existential quantifier in a concept statement implies a search through some domain, with associated pattern matching, to correctly instantiate the quantified variable. However, by using the construction techniques during the recognition process, this search becomes highly constrained, as the target of the search is actually generated.
Thus, it can be seen that the areas of automatic programming and concept formation are interrelated. A concept learning system may provide the basis for a logical form of automatic program synthesis. Such a system is outlined in this paper.
C. The Language
The syntax of the language is quite simple, being a form of predicate calculus. There are two basic types:
An object is a set of property/value pairs. A property is a symbol. A value can be a symbol or another object
(implemented as a name or pointer to that object). Ultimately, values must be symbols. Thus, we can have
structured objects. For example, the binary digit 'one' is simply described as:
d1 = <value: 1>
Adding some structure, the object for the binary number 'one' may be described as:
one = <head: nil; tail: d1>
where d1 is the binary digit defined above.
With this structure a list, or ordered n-tuple, of objects is an object. For this special case, the property
names (first, second, third ...) are omitted for convenience. The descriptions of concepts which classify such an
n-tuple may also be considered as descriptions of n-ary relations. Thus, if we have some concept C which
recognizes all and only those pairs of the form
<X, Y>
where X is less than Y for some ordering relation R, then the technique used in describing C equally applies to
R. Such a relation (concept) is described later.
The basic predicates of the language are of the form
X.p=v
where ’p’ is a property, ’x’ is an object and ’v’ is a value. The predicate is true if ’v’ is the right hand side of
the property/value pair in 'x' whose left hand side is 'p'. Thus, using the above example, d1.value = 1 is true.
These predicates are generalized by allowing ’x’ and ’v’ to evaluate to an object and value respectively
(Banerji [1978] gives the formalism for this). Thus, we can have statements of the form
one.tail.value = 1
and
x.colour = y.colour
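A small illustrative sketch (Python, with objects represented as nested dictionaries; this representation is assumed here for readability and is not the chapter's implementation) of such objects and of evaluating dotted selectors:

# Sketch: objects as property/value maps; d1 and one are modelled on the running example.
d1 = {"value": 1}                       # the binary digit 'one'
one = {"head": "nil", "tail": d1}       # the binary number 'one'

def select(obj, path):
    # Follow a dotted property path such as "tail.value" through nested objects.
    for prop in path.split("."):
        obj = obj[prop]
    return obj

# The basic predicate x.p = v holds when the selector yields that value:
print(select(one, "tail.value") == 1)   # True, i.e. one.tail.value = 1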
The description of a concept is built up as a boolean combination of these predicates. Thus the concept of binary digit is defined as:
digit = [x: x.value = 0 ∨ x.value = 1]
where 'x' is the object passed to the concept (i.e. its parameter). Another way of looking at this expression is to consider it as the generator of the set of all 'x' such that the value of 'x' is 0 or 1.
If C is a concept with one parameter, then C(x) is a predicate which is true for an object 'x' which satisfies the predicates in the body of C after 'x' is substituted for the formal parameter. If C(x) is true, then 'x' is said to be recognized by C, or C is said to classify 'x' (Cohen [1974]). Predicates of this form can be included in a concept definition. Thus, the concept of binary number (with no leading zero) may be defined recursively as:
num = [x: x = one ∨ (num(x.head) ∧ digit(x.tail))]
This says, either the number is ‘one’ or its head is a number and its tail a digit. Now, the description of the
relation, given earlier, of “less’ as applied to binary numbers may be defined by the following concept:
maximum =
  [list, max:
    list.tail = nil ∧ max = list.head
    ∨ [∃ x: maximum(list.tail, x) ∧
        ( less(x, list.head) ∧ max = list.head
          ∨ less(list.head, x) ∧ max = x
        )
      ]
  ]
If we assert that
maximum((1 2 3), z)
then the system is required to prove the assertion. Thus the interpretation of the language is very similar to
that of PROLOG.
The following may be loosely considered as the main "axioms" used in proving the correctness of a concept/program.
1. X = X is always true.
2. X = Y if X is a variable and X is bound to the value of Y.
3. X = Y if Y is a variable and Y = X.
4. The value of X is X if X is a constant.
5. The value of X is Y if X is a variable and X is bound to Y.
6. The value of X.P is V if the value of X is an object and V is bound to P in that object.
7. X ∧ Y is true if X can be achieved and Y can be achieved.
8. X ∨ Y is true if X can be achieved and is consistent with any other constraints, or Y is true. ("Other constraints" may be present if this expression occurs in a concept which is called by another concept.)
9. [∃x: P(x)] is true if P(x), where x is a variable which will be instantiated during the execution of P(x).
10. P(x', y', ...) is true if P(x, y, ...) is of the form [x, y, ...: <expr>], the values x', y', ... are bound to the variables x, y, ... respectively, and the expression "expr" is true.
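Read operationally, rules 1-7 amount to a small binding-and-checking evaluator. The following Python fragment is only an illustrative sketch of those rules for a conjunction of equality atoms (no disjunction, quantifiers or concept calls); the representation used here (tuples for property selections, an explicit set of variable names) is an assumption, not the system's.

def value(term, env, objs):
    # Rules 4-6: a constant evaluates to itself, a variable to its binding,
    # and a selection ('x', 'p') to the value paired with p in the object x.
    if isinstance(term, tuple):
        return objs[value(term[0], env, objs)][term[1]]
    return env.get(term, term)

def solve(atoms, env, objs, variables):
    # Rule 7: satisfy every atom (lhs = rhs) in turn, extending the bindings.
    for lhs, rhs in atoms:
        l, r = value(lhs, env, objs), value(rhs, env, objs)
        if l == r:                       # rules 1-3
            continue
        if isinstance(l, str) and l in variables:
            env[l] = r                   # bind an unbound variable
        elif isinstance(r, str) and r in variables:
            env[r] = l
        else:
            return None                  # two distinct constants: the conjunction fails
    return env

objs = {"one": {"head": "nil", "tail": "d1"}, "d1": {"value": 1}}
print(solve([(("one", "tail"), "x"), (("x", "value"), "y")], {}, objs, {"x", "y"}))
# {'x': 'd1', 'y': 1}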
The specification given above is similar to that given in Banerji [1978]. The most important difference is that in this system expressions may contain references to other concepts, and since concepts may be disjunctive, we must consider the possibility of backtracking as discussed above.
Given a method of using a concept as a program, we now consider ways in which new concepts may be
introduced to the system, that is, the ways in which new concepts may be learned.
E. Learning
The original model for learning used in CONFUCIUS was the ‘‘Conservative Focusing’ algorithm
described in Bruner et al. [1956]. This algorithm was developed by studying the behavior of human subjects in
tests of their learning abilities. One experiment consisted of showing the subject a positive instance of a con-
junctive concept. He could then modify the appearance of the object and ask if the object was now recognized
by the concept. As the name of the algorithm implies, changes to the object are made in conservative steps. If
a modified version of the object is still recognized by the concept, then the property of the object that was
changed is considered irrelevant. However, if the new object fails to be recognized then the property changed
is important.
The learning strategies used in CONFUCIUS went considerably beyond the capabilities of Bruner’s
methods. However, CONFUCIUS was incapable of learning concepts which require existential quantifiers in
their description.
The new system operates in the same environment as CONFUCIUS (i.e. the experimental situation
described above). However, the algorithm used to choose which features of the object are to be changed now
bears little resemblance to conservative focusing. Before discussing the algorithm we will first define some of
the terms used.
Definitions
1. A statement X is implied by a set of statements {Yi} if and only if any object satisfying all Yi also satisfies X.
2. A statement X is directly implied by {Yi} if and only if it is implied by {Yi} and there does not exist a statement Y', itself implied by {Yi}, such that X is implied by the set {Y'i}, where Y'i = Yi for all i except one, say j, for which Y'j = Y'.
Generalization Rules
During the course of the learning procedure, a trial concept description is maintained. This is the current
hypothesis for the correct description. The program either replaces or adds new statements in order to alter the
generality of the trial. The following rules indicate how a concept description may be made more general.
1. Given two statements of the form
   X = V
   Y = V
the single statement
   X = Y
may be used to replace them in the trial. The new concept is more general, since the values of X and Y are no longer restricted to the value V but may be any value, provided that they are the same.
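As a concrete illustration of this first rule (a Python sketch only; representing statements as (left-hand side, value) pairs is an assumption, not the program's representation):

def generalize_equal_values(trial):
    # Rule 1 sketch: replace one pair {X = V, Y = V} by the single statement X = Y.
    for i, (x, v1) in enumerate(trial):
        for j, (y, v2) in enumerate(trial):
            if i < j and v1 == v2:
                rest = [s for k, s in enumerate(trial) if k not in (i, j)]
                return rest + [(x, y)]        # the more general trial
    return trial                               # no such pair: trial unchanged

trial = [("X.head", 1), ("Y", 1), ("X.tail", "nil")]
print(generalize_equal_values(trial))
# [('X.tail', 'nil'), ('X.head', 'Y')]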
2. Concepts are stored in the program's memory in disjunctive normal form, that is, as a disjunction of conjunctions. A conjunction can be viewed as a set of statements. Let the concept C contain a conjunction consisting of the set {si}. Also let the trial concept be a set of statements which has a subset, {s'i}, such that {si} is equivalent to {s'i} for some substitution of variables, σ. For example, a trial might contain
   ... ∧ q = nil ∧ v = w ∧ ...
The two sets {x = nil, y = z} and {q = nil, v = w} are equivalent, for σ = {x/q, y/v, z/w}.
The matched statements {s'i} in the trial may then be replaced by a reference to the concept, C(x, y, ...).
The arguments x, y, ... are obtained from σ. From the example above, the two statements in the trial
could be replaced by C(q, v, w).
The new trial is more general if C is a disjunctive concept, since the other disjuncts of C will recognize
more objects than the one disjunct which appeared in the trial originally. For example, if C is ‘‘append’’,
as defined earlier, then by replacing the matched statements, the trial description admits new objects, q, v,
w which were not recognized by the previous trial.
3. A statement of the form
   C(x, y, ...)
may be replaced by
   [∃ x', y', ... : C(x', y', ...) ∧ x' = x ∧ y' = y ∧ ...]
where x', y', ... are unique variable names. This new statement is implied by
   C(x, y, ...) ∧ x' = x ∧ y' = y ∧ ...
The Algorithm
The program begins by generating a simple, ungeneralized, description of the training instance. Some
statements in the description may directly imply more complex relationships. These implications may be
replaced by the more complex statement in order to generalize the concept. As long as the generalization is
valid, we continue to replace more specific predicates with those that are more general. However, it is possible
for a generalization to ‘‘overshoot’’ and become too general. In this case implied relationships are added to the
concept (rather than replacing existing predicates) in order to make the concept more specific.
Once the concept has been restricted to the point where it is once again contained in the target concept,
generalization can proceed again. Sometimes one line of generalization may be bound to fail and must be aban-
doned. In general, however, this algorithm may be regarded as producing successively better approximations of the
concept, oscillating between over-generalization and under-generalization until the target is finally reached.
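The control structure can be pictured as follows (an illustrative Python skeleton; the four helper functions are assumptions standing for machinery described in the text, not names used by the authors):

def learn_disjunct(trial, propose_generalizations, propose_restrictions,
                   generate_object, trainer_accepts):
    # Oscillate between over- and under-generalization until the trial is stable.
    changed = True
    while changed:
        changed = False
        for candidate in propose_generalizations(trial):
            if trainer_accepts(generate_object(candidate)):
                trial = candidate                     # valid generalization: keep it
                changed = True
            else:
                # Over-generalized: add statements until the trainer accepts again.
                for restricted in propose_restrictions(candidate):
                    if trainer_accepts(generate_object(restricted)):
                        trial = restricted
                        changed = True
                        break
                # If no restriction works, this line of generalization is abandoned.
            if changed:
                break
    return trial                                      # one disjunct of the target concept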
The basic ideas behind the algorithm can best be illustrated by considering an example. Suppose we wish
to learn the concept ‘‘maximum of a list’’ as described earlier.
1. We begin by generating a set of predicates (statements) which describe the training instance in the sim-
plest possible terms. As ‘‘maximum’”’ is a recursive concept, the first positive instance the trainer shows
the system must teach it what the termination condition is. Let this instance be the pair <(1), 1>. That
is, the maximum of the list (1) is the number 1. The initial hypothesis for the concept is:
   X.head = 1
   ∧ X.tail = nil
   ∧ Y = 1
where X is the first sub-object in the pair and Y is the second. These are the primary statements, which
do no more than correspond to the object description.
The system now attempts to generalize the concept description by making some simple deductions (implications) from the hypothesis. It is possible to deduce that X.head = Y. A new hypothesis is proposed:
   X.tail = nil
   ∧ X.head = Y
When two statements are used to deduce a new one, they are temporarily removed from the concept
description.
Employing the construction techniques of the previous section, the hypothesis is used to generate an
object which the system shows to the trainer. For example <(0), 0> is consistent with the constraints
above. When asked if this object is recognized by ’maximum’ the trainer answers ’yes’. The generaliza-
tion made is valid.
* When an object is generated for display to the trainer, the program must ensure that the object does not
conform to any of the conditions imposed by statements which have been temporarily removed.
The program now attempts to generalize further by trying to make other deductions from the new
hypothesis. Let us assume at this stage that no further generalizations are possible. This implies that one
disjunct of ’maximum’ has now been learned.
In response to a question asking if the complete concept has now been learned, the trainer answers ‘‘no’’.
The system then asks for a new training instance so that it may learn another disjunct.
The trainer shows the object <(2 1), 2>. The primary statements now generated are:
   X.head = 2 (1)
   ∧ X.tail.head = 1 (2)
   ∧ X.tail.tail = nil (3)
   ∧ Y = 2 (4)
The first generalization attempted uses statements (1) and (4). These generate the relational statement (5). The new hypothesis is:
   X.tail.head = 1 (2)
   ∧ X.tail.tail = nil (3)
   ∧ X.head = Y (5)
This concept will construct the object <(0, 1), 0> to query the trainer. As this object is not recognized
by ’maximum’, the hypothesis must be too general. Therefore, it should be restricted (i.e. made more
specific). A concept may be made more specific by adding statements to its description. The problem
becomes: which statements should be added? The introduction of the new statement (5) is responsible
for the overgeneralization. Thus, too much information was removed when the implicants (1) and (4)
were replaced. Additional statements will be chosen to restrict the trial by attempting to form new rela-
tionships based on the implicants which have been removed. Thus (1) can be used in combination with
(2) to produce:
   X.tail.head < X.head (6)
which is a restriction of the earlier hypothesis, as statement (6) represents an added constraint on X.head.
A new object, <(1 0), 1 >, is constructed. This is recognized by ‘‘maximum’”’, that is we have a success-
ful generalization. Statements temporarily removed may now be discarded by the generalization process.
(They may still be referred to by the object construction process.)
The program continues by attempting to generalize. Statements (5) and (6) may be used to deduce the following:
   X.tail.head < Y (7)
* When the program attempts a generalization, it begins by trying to find simple relationships, such as
equality, between sub-objects. If these possibilities are exhausted then more complex relationships are
examined such as ‘* <”’ above.
In this case the generalization (7) is too great since the object <(0 0), 1>, which is not recognized by 'maximum', could be constructed. Since the implicants (1) and (2) cannot be used to make the trial more specific, statement (7) must be abandoned as a generalization.
The program must now try a new approach to restricting the trial.
Suppose that all the built-in relations such as "=" and "<" have been tried without success. The program may now attempt to use a concept that it has learned before to make the description more specific.
The problem is, which of the concepts stored in the program’s memory are relevant? The method used
for discovering potentially useful concepts is based on generalization rule 2. The program looks for a con-
cept which contains at least one statement which matches a statement in the trial. In this case remember
that the first disjunct of ‘maximum’ contains ‘*X.tail = nil’? which matches statement (3) and ‘‘X.head =
Y”’’ which matches statement (2) with the substitution {X/X.tail, Y/1}.
As a restricted hypothesis the program may produce this concept:
[∃ P, Q:
   maximum(P, Q) (8)
   ∧ P = X.tail (9)
   ∧ Q = 1 (10)
   ∧ X.head = Y (5)
]
“ Whenever a reference to a concept is introduced, its arguments are existentially quantified as above.
The values of P and Q can be determined from the substitution obtained during the statement matching
operation.
* In order to speed up the search for matching statements, the program keeps a directory of concepts and indexes them according to the form of statements contained in them. It is then only necessary to consult this directory to find the concepts which might be useful.
* In this example, all the statements in one conjunction of the concept stored in memory were matched. However, to learn more complex concepts such as 'quick-sort' it must be possible to allow only partial matches to occur. In such cases the values of some of the parameters to the introduced concept must be left free. As a heuristic, the concept with the largest number of matching statements is selected first.
13. This trial is now a valid generalization, since it is able to construct the object <Gal esc.
14. Continuing to generalize, the program uses (1) and (10) to produce:
[∃ P, Q:
   maximum(P, Q) (8)
   ∧ Q < X.head (11)
   ∧ X.head = Y (5)
   ∧ P = X.tail (9)
]
This is a valid generalization since any object that it constructs always satisfies the concept that the trainer
is trying to teach the system.
15. In fact the program will not be able to generalize any further. That is, the second disjunct of 'maximum' has now been learned.
Note: Throughout this example, "=" and "<" are referred to as built-in predicates. In the present system, "<" is not, in fact, built-in but is a learned concept similar to 'maximum'. The inclusion of statements involving "<" should therefore proceed along similar lines. However, this would complicate the example. For reasons of clarity, it is treated in the same way as "=".
* As above, negative examples are used to test the validity of the hypothesis.
* Positive examples recognized by valid generalizations are pruned from the list as they must be recognized by
the same disjunct.
* When there is a choice of generalizations to be made, those that recognize other positive examples should be
tried first. This would guide the process into learning a concept with a minimal number of disjuncts to cover
the positive instances.
* When no more valid generalizations can be made from the current hypothesis, the hypothesis is added as
another disjunct to the concept being learned.
* At this point the next positive instance in the list is used to initiate the learning of a new disjunct.
Clearly, as has been shown elsewhere (Winston [1970]), the sequence of positive instances provided is critical
to the learning. For example, when learning a recursive concept, an instance representing the termination con-
dition must be given first. An alternative to the ordering of instances fixed by the trainer is to have the pro-
gram choose its own order. The positive and negative lists may be given as unordered sets, and the learning
program charged with the responsibility of ordering the examples on some simplicity criteria. Such an ordering
may be based on the complexity of the structuring of the objects; thus, <(0), 0> is 'simpler' than <(0 1), 1>.
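One simple such criterion (an illustrative Python sketch; this particular measure, counting the atomic values in each training pair, is an assumption rather than a measure specified by the authors):

def size(obj):
    # Total number of atomic values in a structured object.
    if isinstance(obj, (list, tuple)):
        return sum(size(x) for x in obj)
    return 1

examples = [((0, 1), 1), ((0,), 0)]       # <(0 1), 1> and <(0), 0>
examples.sort(key=size)                   # simplest (fewest atoms) first
print(examples)                           # [((0,), 0), ((0, 1), 1)]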
It was already mentioned that the program chooses concepts for inclusion in the hypothesis on the basis of
the forms of the statements it contains. As each disjunct of a concept is learned, its statements are entered into
a directory. Such an entry may be:
During the learning of maximum’, the system tried to use ‘‘X.head = Y”’ to deduce some new information.
Since this statement matches the above directory entry, it would know that ‘append’ and ’maximum’ and possi-
bly other concepts are worth trying. Since the directory look-up will find more matches with ‘maximum’ than
append’, *maximum’ is considered more likely to be included in the concept. Therefore, it is tried first.
*Append’ would involve the construction of objects before it could be implied.
The entries in the directory of concepts may be ordered according to frequency of access. Thus, state-
ments which occur often can be reached quickly. These statements are also most likely to lead to useful con-
cepts since the frequent occurrence of that statement implies the frequent occurrence of the concepts in which
it is contained. (The concepts within an entry may be similarly ordered). If the directory is of limited size,
then statements which are rarely referred to may 'fall off the bottom', thus resulting in 'loss of memory'.
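A minimal sketch of such a directory (Python; the statement forms and concept names below come from the example above, while the dictionary representation itself is assumed):

# Map the form of a statement to the stored concepts containing a statement
# of that form; within an entry, concepts accessed more often are listed first.
directory = {
    "X.head = Y":   ["maximum", "append"],
    "X.tail = nil": ["maximum", "append"],
}

def candidate_concepts(statement_form):
    # Concepts worth trying when the trial contains a statement of this form.
    return directory.get(statement_form, [])

print(candidate_concepts("X.head = Y"))    # ['maximum', 'append']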
F. A Further Example
Let us now turn to the problem of generating a program which, given a list of numbers X (some of the numbers may be negative), produces a list, Y, the same as X except that the negative numbers have been
deleted.
As with ‘*‘maximum”’ a list will be represented by an object, <head: A; tail: B> where ‘‘head”’ and ‘‘tail”’
are the equivalent of LISP’s ‘‘car’’ and ‘‘cdr’’. A number will be represented by an object, <sign: S; mag: M>
where S is either “‘+’’ or ‘‘—”’ and M is the magnitude of the number, that is an unsigned cardinal number.
For this problem, zero will be considered positive. Let us assume that the concepts cardinal(X) (true if X is
any unsigned integer) and number(X) (true for any signed number, X) are already known to the system.
The concept ‘‘delete’’ which is to be learned may be described as follows: If X is nil then Y is nil. If the
head of X is negative then Y is obtained by deleting the negative numbers from the tail of X. If the head of X
is positive then the head of Y is the same as the head of X and the tail of Y is obtained by deleting the nega-
tive numbers from the tail of X.
delete =
  [X, Y:
    X = nil ∧ X = Y
    ∨
    [∃ P, Q:
        X.head.sign = "-"
        ∧ X.head.mag = P
        ∧ X.tail = Q
        ∧ cardinal(P)
        ∧ delete(Q, Y)
    ]
    ∨
    [∃ P, Q:
        X.head.sign = "+"
        ∧ X.tail = P
        ∧ Y.tail = Q
        ∧ Y.head = X.head
        ∧ delete(P, Q)
    ]
  ]
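For comparison, the input/output behaviour this concept captures is simply the following (an illustrative Python rendering, with numbers as ordinary signed integers rather than sign/magnitude objects):

def delete(xs):
    # Return xs with the negative numbers removed (zero counts as positive).
    return [x for x in xs if x >= 0]

print(delete([-1, 2, 0, -3]))    # [2, 0]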
To teach the program the first disjunct, the trainer shows the example <nil, nil>. The initial trial concept is:
X = nil ∧ Y = nil
Since this can recognize any pair of objects which are the same, the trial is invalid and must be restricted. This
can only be done by reintroducing one of the implicants.
X = nil ∧ X = Y
This cannot be generalized any further, so the program accepts this description as the first disjunct.
To learn the second disjunct the trainer shows, <(—1), nil > which results in:
X.head.sign = "-"
∧ X.head.mag = 1
∧ X.tail = nil
∧ Y = nil
Since X.tail = nil and Y = nil the trial can be generalized to:
X.head.sign = "-"
∧ X.head.mag = 1
∧ X.tail = Y
It is possible for this trial to produce the example object <(—1, —1), (—1)>. Therefore it must be made
more specific. As in the first disjunct, one of the implicants must be returned to the trial. This results in a
description equivalent to the initial trial.
X.head.sign = "-"
∧ X.head.mag = 1
∧ X.tail = nil
∧ X.tail = Y
The magnitude of the head of X is recognized as a cardinal number by the concept ‘‘cardinal’’ therefore the
next generalization is:
[∃ P:
   X.head.sign = "-"
   ∧ X.head.mag = P
   ∧ X.tail = nil
   ∧ X.tail = Y
   ∧ cardinal(P)
]
This generalization is now used to generate an object to show to the trainer. <(-2), nil> is shown. Since this is recognized by the target 'delete', the trainer answers 'yes' and the program continues to generalize its trial.
Since "X.tail = nil" and "X.tail = Y" match the statements in the first disjunct of 'delete' for the substitution {X/X.tail, Y/Y}, the following generalization is tested:
[∃ P, Q:
   X.head.sign = "-"
   ∧ X.head.mag = P
   ∧ X.tail = Q
   ∧ cardinal(P)
   ∧ delete(Q, Y)
]
Given this concept, the object generation procedures produce the example <(-2, -1), nil>, which is also valid. Therefore, the program may continue to generalize the trial.
The head of X has a sign and magnitude; therefore it is recognized as a number, and the statements describing X.head are replaced in the trial:
[∃ Q, R:
   X.head = R
   ∧ X.tail = Q
   ∧ number(R)
   ∧ delete(Q, Y)
]
This concept results in the example <(+1, -1), nil>, which is incorrect. Thus the last generalization was invalid.
The program is unable to restrict the trial in any way as long as number(R) is present. Therefore the trial is abandoned and the program returns to the previous trial. It is also impossible to generalize it any further, so the definition of the second disjunct of 'delete' is:
[∃ P, Q:
   X.head.sign = "-"
   ∧ X.head.mag = P
   ∧ X.tail = Q
   ∧ cardinal(P)
   ∧ delete(Q, Y)
]
To teach the final disjunct, the trainer shows the program <(+1), (+1)>. The initial trial is:
   X.head.sign = "+"
   ∧ X.head.mag = 1
   ∧ X.tail = nil
   ∧ Y.head.sign = "+"
   ∧ Y.head.mag = 1
   ∧ Y.tail = nil
The first generalization that can be made is that the signs are equal:
X.head.mag = 1
∧ X.tail = nil
∧ Y.head.mag = 1
∧ Y.tail = nil
∧ X.head.sign = Y.head.sign
With this definition, the object <(—1), (—1) > may be generated, which is incorrect. The implicants, which
have been removed, are now used to look for ways of making the trial more specific. There is no concept
stored in the program’s memory which has a conjunction to match the statements in the trial. So the program
can only reintroduce one of the implicants.
X.head.sign = "+"
∧ X.head.mag = 1
∧ X.tail = nil
∧ Y.head.mag = 1
∧ Y.tail = nil
∧ X.head.sign = Y.head.sign
Of course, this is identical to the original trial, so the program must look for other ways of generalizing the trial. Another equality is possible:
   X.head.sign = "+"
   ∧ X.tail = nil
   ∧ Y.tail = nil
   ∧ X.head.sign = Y.head.sign
   ∧ X.head.mag = Y.head.mag
Since this may produce the object <(+2), (+2)>, the generalization is valid.
Continuing to find equalities, the program attempts to add "X.head = Y.head", replacing the two equalities just introduced. The resulting trial is identical to the previous trial, so the program continues to look for more generalizations.
Yet another equality is "X.tail = Y.tail", resulting in the generalization:
X.head.sign = "+"
∧ X.head = Y.head
∧ X.tail = Y.tail
However, this can result in the object <(+1, -1), (+1, -1)> being produced, hence the trial must be made more specific. The implicant "X.tail = nil" must be returned to the trial. However, "X.tail = nil" and "X.tail = Y.tail" match the first disjunct of delete with the substitution {X/X.tail, Y/Y.tail}. Thus a new generalization,
[∃ P, Q:
   X.head.sign = "+"
   ∧ X.tail = P
   ∧ Y.tail = Q
   ∧ X.head = Y.head
   ∧ delete(P, Q)
]
can produce the object <(+2, -1), (+2)>, which is correct. No more generalizations can be made, so the complete concept 'delete' has been learned.
G. Conclusion
Initially, the intent of the research was to provide a concept learning system—thus CONFUCIUS was
(re)born. However, in extending the language it was found necessary to incorporate object construction pro-
cedures. In effect, the result is a ‘‘computer language’’, since programs will be written by the learning system
and executed without human intervention. Thus, the complete system represents a new approach to automatic
program synthesis.
An area where such a system may find application is in the solution of robot-type planning problems.
QLISP [Rulifson, 1972] is a language which facilitates the writing of problem solving and theorem proving pro-
grams. Although it is a procedural language it also exhibits goal directed behavior in its ““GOAL”’ expression:
(GOAL goal-class goal)
Here the programmer asks the system to attempt some goal. For example, a robot planner might have an
expression of the form:
References
[Bruner, 1956]
J.S. Bruner, J.J. Goodnow, and G.A. Austin, A Study of Thinking, Wiley, New York (1956).
[Banerji, 1969]
R. Banerji, Theory of Problem Solving: An Approach to Artificial Intelligence, American Elsevier, New York (1969).
[Banerji, 1978]
R. Banerji, "Using a descriptive language as a programming language," Fourth International Joint Conference on Pattern Recognition, 346-350 (1978).
[Cohen, 1977]
B.L. Cohen, "A powerful and efficient structural pattern recognition system," Artificial Intelligence, 9, 223-255 (1977).
[Cohen, 1978]
B.L. Cohen, A Theory of Structural Concept Formation and Pattern Recognition, Ph.D. thesis, Dept. of
Computer Science, University of N.S.W. (1978)
[Michalski, 1980]
R.S. Michalski, "Pattern analysis as rule guided inductive inference," IEEE Transactions on Pattern Analysis and Machine Intelligence (2) 4, 349-361 (1980).
[Roussel, 1975]
P. Roussel, "Prolog: Manuel de reference et d'utilisation," Groupe d'Intelligence Artificielle, Marseille-Luminy (1975).
[Rulifson, 1972]
J.F. Rulifson, J.A. Derksen, and R.L. Waldinger, "QA4: a procedural calculus for intuitive reasoning," S.R.I. Artificial Intelligence Center, Technical Note 73 (1972).
[Sammut, 1980]
C. Sammut and B. Cohen, "A language for describing concepts as programs," Language Design and Programming Methodology, Springer Verlag Lecture Notes in Computer Science, Vol. 79, Editor: J.M. Tobias (1980).
[Sammut, 1981]
C. Sammut, Learning Concepts by Performing Experiments, Ph.D. Thesis, Dept. of Computer Science,
University of N.S.W.
[Winston, 1970]
P.H. Winston, Learning Structural Descriptions from Examples, Ph.D. Thesis, MIT Artificial Intelli-
gence Laboratory.
CHAPTER 22
A PATTERN RECOGNITION VIEWPOINT
Ranan B. Banerji
Saint Joseph’s University, Philadelphia, USA
Abstract
A formalism has been exhibited which unifies the basic structures of programs as learned from examples
and patterns as recognized from instances. This indicates that certain techniques of program construction are
available in the pattern-recognition field. Although these techniques would lead to much greater flexibility and
strength in pattern recognition, they are seldom used: there has been a marked reluctance to the use of sub-
routines in the pattern recognition field. It is believed that the formalism exhibited here would remove some
of the roadblocks. The limitations of some of the program-learning techniques are also illuminated by the
method.
Introduction
This paper is addressed to a wide area of activity, including hopefully the entire field of activity of Pro-
gram Construction and Automatic Programming and the area of Pattern Recognition involved with what I have
previously called (in Banerji [1979]) ‘‘Interpreted Logical Descriptions.”’ This area is often also called ‘‘Struc-
tural Descriptions’. I use the former term to avoid confusion with what is known as *‘Syntactic Descriptions”
(Fu and Swain [1969]).
Naturally, if one has to embrace such a wide area, one has to do it at a level of abstraction which may
seem useless. For the purposes of transferring techniques from one sub-area of activity to another such
abstraction may indeed be useless. However, it is my belief that even at this abstract level there will be enough
structure left as to allow us to pinpoint certain problems and insights that permeate the entire area addressed.
For some sub-areas, this pinpointing will have to be informal. For others, it will be so close to the technical
level as to perhaps allow transfers of technique.
In a very informal way we may say that an Automatic Programming System is a device which has com-
puter programs at the output. We are not terribly concerned here as to the language in which the program is
written — as long as one is convinced that a state-of-the-art compiler can be written to convert its sentences to
a program for a present-day digital computer.
By the same token, what the pattern-recognition expert wants as the output of a ‘‘learning system’’ is the
specification of some device which, given formal objects (e.g., ‘‘scenes’’, ‘‘feature vectors’’) yields a
classification signal. There is a wide divergence in the pattern recognition field as to the form this specification
takes at the output of the system. In the Interpreted Logical Description area (Cohen [1978], Cohen and Sam-
mut [1978], Banerji [1964] and [1976], Sammut [1981], Banerji [1978], and Chapter 21, this volume) one
makes the specification into a statement in symbolic logic: the state-of-the-art dictates statements in some sub-
set of a first order predicate calculus.
What brings Interpreted Logical Descriptions close to Automatic Programming is the fact that a number
of workers (Warren [1974], Bibel et al. [1978], Kowalski [1979]) have used the descriptive language of predi-
cate calculus to specify programs. Interpreters exist for making this happen. The problems posed by this inter-
preter have some commonality with the problems of pattern recognition. Moreover, since all ‘‘learning’”’ in
pattern recognition is done on the basis of examples, the entire technique becomes analogous to learning of
programs from input-output.
The major bridge between automatic programming and pattern recognition then, has the learning of pro-
grams from input-output pairs at one end and recognizing relations by ordered-pair examples with logical
description languages at the other.
So far we have talked about the inputs to the automatic programming system, except in passing immedi-
ately above. A wide divergence occurs at this end between different workers in the field of automatic program-
ming. At one end of the spectrum we have workers who feel that the specification can and should be given in
natural language. Then there are others, motivated by problems of Robotics, who feel that the specification
should be in the form of examples of what inputs are to be transformed into what outputs. There are those
who are trying to understand what would be involved if the specification were given interactively (Biermann
[1972]): i.e. if instead of acting upon the entire specification initially, the system interrogated the specifier
(whether a cooperating human or an impersonal ‘‘world of the robot” responding to an experiment with
results) during the development of the program.
At the very comfortable opposite end of the spectrum are the workers who feel that the specification needs to be given with the same precision and the same kind of a language as is used by workers in the field of program verification (Manna and Waldinger [1977]) for specifying the output of a program.
In my own work I have straddled the two extremes. On the one hand I have tried to insist that the specification language should have the same precision of syntax and semantics as a formal language like symbolic logic. On the other hand I have felt that the specification itself should be in the form of a set of examples. This latter feeling came from my continued interest in pattern recognition. As a matter of fact, my rather recent interest in automatic programming arose only when I recognized that the pattern recognition language I was using could describe executable functions.
As I have indicated, my preoccupation has been with program development from examples. However, there are some linguistic aspects of the specification language which hold also when the specification of the input is done by a precise language (Warren [1974], Bibel et al. [1978]) other than that of input-output pairs. When the input specification is by input-output pairs (Biermann and Feldman [1972], Summers [1977], Jouannaud [1977], Guiho and Jouannaud [1978], Treuil et al. [1977]), the commonality with pattern recognition problems is of course much greater. A remarkable number of these problems are also of a linguistic nature.
I believe that the discussion of all these common problems would be facilitated if we tried to translate the
techniques used by various authors into a single syntactic and semantic framework. In this paper, I shall choose
a system used by myself (Banerji [1979]) and (in somewhat modified form) some colleagues (Cohen [1978],
Cohen and Sammut [1978], Sammut [1981], and Chapter 21, this volume). In what follows, we shall introduce
the syntax of the system and—to the extent possible—its semantics.
If one allows the use of these defined predicates inside other definitions, one can obtain considerable
compression. For instance, if one wants to define pairs of binary digits one could write
However, the chaining of function applications could be reduced and considerable compression obtained if we
write
Another convenient compression we shall often use forms a basis of what is to follow. The pre-defined
predicates satisfied by a member of "digit" (i.e. value(x) = 0) can be written
x ∈ (value, 0) (4)
Expressions like the one to the right of the "∈" sign will be called "objects". As an example of a more complex object, note that a typical member of "digpair" might satisfy
and further to
As we proceed with a formal definition of the syntax, it will be found that the symbol "∈" as used in the discussion above is syntactically correct in equations (1), (2) and (3) above, but is not so in (4), (6) and (7). In the last three cases, the syntax demands that the "∈" sign be replaced by the "=" sign. The "=" sign, as well as the "∈" sign, seems to have different interpretations in different syntactic contexts. Only further research can tell whether the ambiguities are an inherent part of the technique or can be removed by careful syntactic redefinition. For the present, we shall leave matters as they are and stick to the syntax as was previously defined by us. We shall discuss the semantics, as well as the ambiguous interpretations, as we proceed.
1. Constants are strings of letters from the early part of the alphabet and digits. Variables are strings of letters from the end of the alphabet. A constant is a term and a variable is a term.
2. If A and B are terms, then A, B is an ordered pair (called "op" for short). A is its left hand side (called "lhs") and B is the right hand side (called "rhs") of this op. Every op is a string of ordered pairs (called "sop"). If A is an op and B is a sop, then A; B is a sop.
3. If A is a sop then (A) is an object. An object is a term.
4. If A is a term and B is a term, then A(B) is a term.
The reader will notice that of these four paragraphs, 1 and 4 define terms almost in the usual logical sense
and can be interpreted in the same way. The main deviation from standard logic is in 2 and 3 above, which define the objects. It is our belief that the use of this syntactic form gives us an efficient method for proving a class of theorems whose proof using standard methods would be less efficient.
5. The value of a constant is itself. The value of a variable is itself. The value of an op A, B is A!, B! where A! and B! are the values of A and B respectively, when they are defined. The value of the sop A; B is A!; B! where A! and B! are the values of A and B, when they are defined. In other cases the values of ops and sops are undefined.
6. The value of an object (A) is defined if and only if the value A! of A is defined and there are no two ordered pairs in A! whose lhs are identical while their rhs are distinct constants (or one is a constant and the other is an object). In such cases, the value of (A) is (A!).
7. The value of a term of the form A(B) is defined only if the values A! and B! of A and B, respectively, are defined and B! is an object. In all such cases, if A! is the lhs of an ordered pair of B! and the rhs of all such ordered pairs (i.e. with A! as the lhs) are identical, the value of A(B) is this unique rhs. Else the value of A(B) is A!(B!).
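As an illustration of rules 6 and 7 (a Python sketch only; representing the value of an object as a dictionary from lhs to rhs is an assumption made here for readability, not Banerji's formalism):

def value_of_object(pairs):
    # Rule 6: defined only if no lhs is paired with two conflicting rhs's.
    obj = {}
    for lhs, rhs in pairs:
        if lhs in obj and obj[lhs] != rhs:
            return None                   # value undefined
        obj[lhs] = rhs
    return obj

def apply_term(a, b):
    # Rule 7: the value of A(B) when the value of B is an object.
    return b[a] if isinstance(b, dict) and a in b else None

thing = value_of_object([("color", "red"), ("shape", "square")])
print(apply_term("shape", thing))         # 'square'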
The reason for the continuous recursion in the definitions will become clear when we start to give some more realistic examples. Meanwhile, it ought to be pointed out that by the above definition the term
shape((color, red; shape, square))
has the value "square". This tempts one to think of the object in the parenthesis as an individual. From our
discussion above however there is also reason to think of the object as a set, or, as a short-hand for a conjunc-
tion of atomic predicates. The need for this will become clearer as we proceed.
8. If A and B are terms, then (A = B) is an atom-statement (called an "atom" for short). F and T are atom-statements. An atom is a conjunction. If A is a conjunction or an empty string and B is an atom, then A ∧ B is a conjunction. (If A is an empty string, then A ∧ B is written in the form B. Also, if B is a conjunction and A is an empty string, then B can be written as B ∧ A.)
A conjunction is a disjunction. If B is a disjunction and A a conjunction, then A ∨ B is a disjunction. A disjunction is a statement. If A is a statement and B is a variable, then (∃B)(A) is a statement. If A is a term and C is a term, then A ∈ C is a statement.
9. If C is a variable, D a constant, and E a statement, then C ∈ D = E is a description of D.
Once again, one is tempted to interpret the symbol ∈ in 9 as ordinary set membership. The description
for instance, looks harmlessly like symbolic logic (with "x ∈ redthing" interpreted as having the same meaning as "redthing(x)", with "redthing" as a predicate). However, if "=" is interpreted normally, then, since our definitions say that the value of "color((color, red))" is "red", the statement
The use of defined predicates as in 9 above gives us the ability to express infinite disjunctions without,
essentially, leaving the propositional calculus. To see this, let us consider the two following descriptions
x ∈ digit = x = 0 ∨ x = 1
0 ∈ digit
as well as an infinite class of statements like
(head, (head, (head, nil; tail, 1); tail, 0); tail, 1) ∈ num
and so on. Purely as a matter of motivation, and also since these examples will continue to play an important role in what follows, the reader may find it convenient to think of 'elements of num' as representing binary strings, the 'tail' standing for the least significant digit and the 'head' for the remainder of the string, including the empty string, here called 'nil'. The three objects shown here can then be written as '1', '10' and '101'. In what follows, we shall often use this 'shorthand' for brevity.
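In the same spirit (an illustrative Python sketch; the nested-dictionary representation is assumed), the shorthand '101' corresponds to the nested head/tail object shown above:

def encode(bits):
    # Build the (head, ...; tail, d) object for a binary string, least
    # significant digit in 'tail', empty string represented by 'nil'.
    obj = "nil"
    for b in bits:
        obj = {"head": obj, "tail": int(b)}
    return obj

print(encode("101"))
# {'head': {'head': {'head': 'nil', 'tail': 1}, 'tail': 0}, 'tail': 1}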
To give meaning to my proposed bridge between program learning and pattern recognition, I would like to
point out that i) a binary predicate is really a unary predicate on objects on which projection functions are
defined (indeed this is true of n-ary predicates also for all n), and ii) an unary function, defined as a binary
predicate, often has enough information in the definition so as to enable the calculation of the output com-
ponent from the input component. Take, for instance, the successor function on numerals (looked upon as
strings of binary digits) as follows
We ask the reader to convince himself that "(first, 11; second, 100) ∈ succ" is true, interpreting the symbols as
belonging to standard logic.
What is more interesting (and gives our system the aspect of a programming language) is that the proces-
sor (which can replace the logical processes we asked the reader to simulate above), can also check the state-
ment
to
x = 100 (10)
The processor derives what I have called the ‘“‘substituted form’’ of statements. If a statement is true in a
standard (or nearly standard) sense and has no free variables, this form is ‘‘T’’. For false statements the form
is "F". For statements with free or existentially quantified variables (note once more the alienation from
standard logic), the form can be a conjunction of atomic statements of the form "x = A" where A is an
object.
A few preliminary comments are in order. First, there is no way that I can see of having the substituted
form T for a statement having free variables, since neither negation nor implication belongs in the language.
Many theorems, hence, cannot be stated in the language.
10. The reduced form of (∃B)(A) is the same as the reduced form of A. This reduced form is said to construct B. If the reduced form does not contain B, then B is arbitrary. If the reduced form is F, the construction fails.
11. The reduced form of A ∨ B is the same as the reduced form of A unless the reduced form of A is F, in which case it is the reduced form of B.
12. The reduced form of A ∧ B depends heavily on the form of the atom B. These are separately discussed below. In all that follows B is (C = D). C! is the value of C and D! the value of D. Also, the reduced form of an empty string is defined to be the empty string. The reduced form of A will be denoted by A!. (C! = D!) will be denoted by B!.
12a. If C! and D! are both constants, then the reduced form of A ∧ B is F, unless C! and D! are identical; in this case, it is A!. If A is an empty string, then A! is T.
12b. If C! is a variable and D! a constant or object, then the reduced form is B! ∧ A!, where A! is the reduced form of A. If D! is a variable, then it is B! ∧ A!. If C! and D! are identical, then it is A!, unless A! is empty in which case it is T.
12c. If D! is a variable and C! is not, then the reduced form of A ∧ B is the same as the reduced form of A ∧ (D = C).
12d. If C! is a variable, constant, or object, and D! is of the form E(F), then the reduced form of A ∧ B is the same as the reduced form of A ∧ (F = (E, C!)).
12e. If C! and D! are of the form E(F), or if they are both objects, then the reduced form of A ∧ B is the same as the reduced form of A ∧ (G = C!) ∧ (G = D!), where G is a variable which does not occur in A, C, or D.
13. A conjunction is said to be in stable form if it is the reduced form of all its cyclic permutations.
It is conjectured that the reduced form of all statements will reach a stable form only if it is the conjunction of atoms of the form C = D where C is a variable and D is a constant, object or variable (unless, of course, the reduced form is F or T). We now proceed to obtain the "merged form" of these.
Avoiding for the time being some of the (often important) details, one of the major merger processes takes two
sentences like "x = (color, red)" and "x = (shape, square)" and yields the sentence
"x = (color, red; shape, square)". This process is completely unjustified, of course, if "=" stood for identity
of objects as strings or if objects were interpreted as individuals. I have found no good way to interpret equality
as the identity of constants named by the equated terms. The natural interpretation of objects seems to be as
sets. However, interpreting variables as having sets for values would not justify the ‘‘merger’’ process if equal-
ity stands for identity of sets. Hence the tentative interpretation for equality as membership suggested above,
at least in this context.
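Under this tentative set interpretation, the merger itself is a simple operation. The sketch below (my own, with objects again modelled as dictionaries) merges two objects equated to the same variable when their lhs's are distinct, the case that 16e-1 below covers; clashing lhs's would call for the fresh-variable treatment of 16e-2.

    def merge(obj1, obj2):
        # Merge two objects equated to the same variable, provided no
        # attribute (lhs) occurs in both; otherwise a fresh variable would
        # have to be introduced, as in 16e-2 below.
        if set(obj1) & set(obj2):
            raise ValueError("shared lhs: fresh-variable case (16e-2)")
        merged = dict(obj1)
        merged.update(obj2)
        return merged

    # "x = (color, red)" and "x = (shape, square)" merge into
    # "x = (color, red; shape, square)":
    print(merge({"color": "red"}, {"shape": "square"}))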
All this, of course, is happening because of ‘‘merger’’, whose importance I intend to justify with an exam-
ple. However, we still need at least one more definition before we can even get started.
14. Let M be a set of descriptions (see Definition 9). A sentence has the reduced form given M as we have
defined above unless it contains a predicate of the form A ∈ B. In the latter case, the reduced form is F
unless the value B! of B is a constant and a description of B! occurs in M. In this case, if C ∈ B! = S is
that description, then in the reduced form A ∈ B is replaced by the reduced form of S!, where S! is
obtained from S by replacing each occurrence of C by A.
which is slightly simpler, for ease of discussion, than the examples in lines (9) to (12) above.
Replacing the object on the lhs of the ∈ sign in (13) for x on the rhs of = in the definition of succ (eqn
8) we obtain three disjuncts. One immediately negates the first since the first conjunct
In a similar manner the second disjunct also is negated since the third conjunct yields
(tail, 0; head, nil) = nil,
This new object immediately satisfies the first conjunct of the first disjunct of the definition of succ. The other
two disjuncts simplify to
and
while the second conjunct of the original third disjunct already was
tail (y) = 0
y = (tail, 0) (16)
15. If A ∧ (x1 = t1) ∧ (x2 = t2) is in stable form, then its sorted form is the sorted form of B ∧ (x2 = t2) if
x1 < x2. Else it is the sorted form of C ∧ (x1 = t1). In the above, B is the sorted form of A ∧ (x1 = t1)
and C is the sorted form of A ∧ (x2 = t2).
16. If A ∧ (x = t1) ∧ (x = t2) is in sorted form, then its merged form is given as follows:
16a. If t1 and t2 are both variables, then the merged form is the merged form of the sorted form of
A ∧ (x = t1) ∧ (t1 = t2).
16b. If t1 is a variable, then the merged form is the merged form of the sorted form of
A ∧ (x = t2) ∧ (t1 = x); similarly, if t2 is a variable. In both cases, it is assumed that 16a is inapplicable.
16c. If t1 is a constant and t2 an object, then the merged form is F.
16e. If t1 is an object (A1) and t2 is an object (A2) then the merged form is the merged form of the sorted
form of the reduced form of A ∧ B where B is constructed as follows:
16e-1. If all the lhs of all the ops in the sop (A1; A2) are distinct, then B is (x = (A1; A2)).
16e-2. If there are two ops a, b1 and a, b2 with the same lhs, then B is of the form
(x = B!) ∧ (G = b1) ∧ (G = b2) where G is a variable other than x which does not occur in A,
t1 or t2, and B! is obtained from (A1; A2) by deleting a, b1 and a, b2 and adding a, G.
In the light of these, let us go back to merging the atoms 14, 15 and 16.
which can merge easily with "y = (tail, 0)" by 16e-1 to yield
Now we merely need one more definition to get the substituted form.
we shall write (11,100). Also, we shall assume that ‘‘num’’, instead of the description in Section A.3, has the
description
(1, 10)
Starting with the first positive example, the following statements can be extracted as true
For simplicity we are leaving out various other deductions, like head(head(first(x))) = 10, etc. These contribute
great inefficiencies to the system (we shall discuss these later: workers in the field of program learning
will recognize these). However, in this section we are merely illustrating a method, not exalting it in any way.
Continuing with our learning technique, the system now constructs objects which hold all but a few of the
above statements true. The first two statements can not be changed without changing the third, fourth and
fifth statements, so these latter are changed first. None of these can be changed individually without changing
one of the others. Changing the fourth and fifth together yields the third negative example and changing the
third and fifth together yields the fourth negative example. Changing the third and fourth together yields no
negative examples (some of the positive examples on the first row are produced this way). So the learning pro-
gram marks the third and fourth statements inessential and the fifth essential. Similarly, changing the sixth and
seventh statements yields the first two negative examples. Removing either of the first two statements yields the
fifth and sixth negative examples. Hence, these two are essential.
Removing the inessential statements from the above list of statements yields the conjunction induced by
the first disjunct of the third conjunct of the description of succ. This description is satisfied by the first row of
positive examples. However, the rest of the positive examples do not satisfy the description, so the system
picks another positive example to focus on.
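The essential/inessential bookkeeping can be pictured with the following much-simplified sketch (an assumed reformulation, not the author's program): a conjunct is treated as essential if deleting it would let some known negative example satisfy what remains.

    def essential_conjuncts(conjuncts, negatives):
        # conjuncts: predicates over examples; negatives: known negative examples.
        # A conjunct is essential if, once it is removed, some negative example
        # satisfies all of the remaining conjuncts.
        essential = []
        for i in range(len(conjuncts)):
            remaining = conjuncts[:i] + conjuncts[i + 1:]
            if any(all(c(n) for c in remaining) for n in negatives):
                essential.append(conjuncts[i])
        return essential

The procedure described above is finer-grained (statements are changed, singly and in pairs, rather than simply removed), but the underlying test is the same: a statement matters exactly when relaxing it admits a negative example.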
The learning of the second disjunct from the second row of positive examples is an identical process and
we shall not expand on it here. However, a third kind of deduction becomes essential for the learning of the
rest of the description, starting with the first element of the third row as the focus. At this moment the first
two disjuncts in the description of succ are already in memory as a partial description.
Using the focus one deduces, in addition to first(x) ∈ num and second(x) ∈ num, also
tail(first(x)) = 1 and tail(second(x)) = 0, which can be changed to yield the negative examples on the third
row. However, a new deduction is suggested by the fact that we can also deduce head(first(x)) ∈ num. This,
together with the fact that first(x) ∈ num is an essential part of the description of all succ seen so far, "suggests"
(a word which hides many blind alleys and heuristics) that (∃z)(z ∈ succ ∧ first(z) = head(first(x)))
is a possible deduction. This yields the following:
z= (10)
and one ‘‘notices’’ that second (z) = head (second (x)) is true.
Once this deduction is made, a repetition of the previous method of removing and testing conjuncts yields
the third disjunct of ‘‘succ,’’ completing the description.
In the discussion that follows, we shall try to bring the subject-matter of this section in line with
automatic programming.
which is readily converted into a program. On the other hand, the maximum m of a larger set S, satisfying the
property
needs to be converted into a quantifier-free recursive form to match it to the basic predicates and operators of
the programming language. If the programming language had quantifications among the basic building blocks,
this second statement would be equally easily convertible to a program.
To bring out the importance of the basic predicates in the case of learning by examples (either of pro-
grams or of descriptions), let me invoke the description of dig and num in Section A.3.
The two conjuncts of the definition of ‘‘num’’ are
and
Of these the first succinctly expresses what with a slightly greater effort, could be written
This would not only complicate the description but would also lengthen the learning process.
The matter becomes even more critical if we imagine what would happen if the second disjunct of the
second conjunct of “‘num’’ above was not available to us. We would be forced to learn longer and longer dis-
junctions.
Some workers (Summers [1977], Jouannaud [1977]) in the area of program learning from examples
invoke a special procedure in the learning algorithm for converting such a growth into a recursion. Others
(Cohen and Sammut [1978], Treuil et al. [1977]), as in my work, build the recursion by matching a part of the
developed program to a previously developed program.
But this is taking us somewhat ahead of ourselves. The major point I want to make is that the beginning
of any learning task—be it learning descriptions or programs—lies in deducing the truth of certain statements
about the examples. An important part of SISP (Guiho [1978]) is the matching of (stating the equality
between) substrings of the input and output. In THESYS (Summers [1977]) these predicates come in two
parts: the recognition of the position of the input in parts of the Summers hierarchy and the equality of the out-
put with some function of the input. While THESYS restricts these structures to be built out of only CAR,
CDR, CONS and ATOM, the Paris school allows itself the liberty of using their LCAR, LRAC, RDC and
APPEND. A lot of the strength of their system comes from the use of these stronger building blocks.
However, Jouannaud [1977] in his thesis used another technique (later used by Treuil et al. [1977]),
somewhat stronger than the technique I have illustrated above in the learning of ‘‘num’”’ and ‘‘succ’’. They
develop (rather than invoke) subprograms which make the expression of the final program simpler.
It may be worthwhile at this point to indicate the kind of analogy I am drawing between program-learning
and description-learning by using the language of section A to describe a LISP program learned by the tech-
nique described by Kodratoff [1979]. The example I use appears in his paper in section 4.4.
Some of the examples (x'_i and y'_i) used by him, rewritten as objects, would be:
second,
(car, (car,(car,A; cdr,nil); cdr,(car,B; cdr,nil));
and this sequence would have to satisfy the relation FARG(x) = y or, in the nomenclature of section A:
∧ β ∈ F}
α ∈ F = [ (cdr(second(α)) = nil) ∧
∧ first(β) = first(α)
∧ second(β) = second(α)
∨ γ ∈ E ∧ first(γ) = first(α)
∧ (second(γ) = (car(cdr(second(α))))))
α ∈ G = { (cdr(car(cdr(second(α)))) = nil)
∧ (car(fourth(α)) = third(α))
∧ (car(cdr(fourth(α)))
= (car(car(cdr(first(α))))))
∧ (cdr(cdr(fourth(α))) = nil) }
∧ first(β) = car(cdr(first(α)))
∧ second(β) = car(cdr(second(α)))
∧ third(β) = third(α))
In my work the sub-descriptions ("dig" inside "num", "num" inside "num" and "succ", "succ" inside
"succ") are extracted from descriptions previously learned. The Paris school develops them "on the run" by
restricting the development to a narrow class.
A learning algorithm which was not guided by any preconception of the class of deductions which would
be useful would have to learn exactly these descriptions (and little else, if any semblance of efficiency is to be
maintained) by examples, before the description of FARG can be learned. The reason the Kodratoff technique
is capable of learning FARG (and can deduce F and G without prior training) lies in the fact that the class of
programs is circumscribed to make certain deductions useful and that the learning program is biased towards
making just such deductions.
It is not clear to me which method of generating these subprograms is superior. One of these (discussed
in sec.B) is open ended in that its flexibility grows with learning—and its efficiency begins to drop at the same
time. The Paris school seems to keep a degree of flexibility and efficiency which reaches saturation. On the
other hand, the Paris school has always been very careful in circumscribing the class of functions
amenable to their techniques. This sets a laudable example of "engineering specification" which is not yet available to our system.
To summarize, and especially to firm up the "bridge" I have been claiming to build, let us make the fol-
lowing initial statement. In all learning, program or pattern, the main things are the examples. In program
learning, the examples are input-output pairs or n-tuples. In pattern recognition, they are what I call ‘‘objects’’,
i.e. n-tuples of measurement values. At both ends of the bridge, the different members of the example n-
tuples have structures of their own—numbers or list structures (mostly) in program construction, and further
objects in (at least our version of) pattern recognition.
The major source of efficiency in both activities is the power to recognize that the examples satisfy many
statements other than the ones specified at the input. Given the set of ordered pairs (1,1), (2,3), (3,6), (4,10),
(5,15), ... the recognizing device can glean right away that (x=1 & y=1) ∨ (x=2 & y=3) ∨ ··· are viable
descriptions (albeit ones that change every time a new example comes in). A more sophisticated learner would recog-
nize in addition that if one subtracts 1 from x and x from y, the resulting pair also forms a member of the set.
This results in a much more efficient finite program of a recursive nature. The greater strength comes from the
program constructors’ previous knowledge of another program—that of subtraction. The advantage is offset by
the effort spent in deciding what to subtract from what. Interpolation-theory received its impetus from the
recognition that the subtraction of the second elements of two examples yield a large class of functions (the
polynomials). Present day workers in program learning also use certain restricted deductions. (We do not
dare, for instance, say that any synthesizer notices that all examples satisfy y = x(x+1)/2 in the above exam-
ple.) Some limit these restricted deductions to a small class of predicates, others allow the class of predicates to
grow as more programs are learned. A recursive class of programs results, determined by the restriction on the
deductions allowed on the examples; this restriction probably has to be made in the interest of efficiency.
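A hedged sketch of the recursive description gleaned from the pairs above (the base case is assumed to be the first example): the more sophisticated learner's observation, that subtracting 1 from x and x from y yields another member, is itself a small recursive program.

    def member(x, y):
        # Membership test induced from (1,1), (2,3), (3,6), (4,10), (5,15), ...
        if (x, y) == (1, 1):
            return True                  # assumed base case: the first example
        if x <= 1:
            return False
        return member(x - 1, y - x)      # subtract 1 from x and x from y

    assert all(member(x, y) for x, y in [(1, 1), (2, 3), (3, 6), (4, 10), (5, 15)])
    # In fact y = x*(x+1)//2, but as noted above no synthesizer is expected to
    # "notice" that closed form directly.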
In the field of pattern recognition, the only innovations in the seventies have been the use of stronger
logical connectives in the description language and the use of n-ary rather than unary predicates in the
specification of the input. These predicates have always come from a fixed repertoire, and no flexible method
for growing this repertoire has even been attempted. So some of the problems that are being actively
addressed by the program construction group have not yet been visualized in pattern recognition.
Standing in the middle of the bridge, I have to admit that one of the banks looks distinctly greener.
Acknowledgment
The preparation of this final manuscript was supported by the National Science Foundation under grant
MCS-8110104.
References
Banerji [1979]
R.B. Banerji, ‘‘Pattern recognition: structural description languages,” Encyclopedia of computer science
and technology, Vol. 12 (Marcel Dekker, Inc., 1979).
Fu [1969]
K.S. Fu and P. Swain, ‘‘On syntactic pattern recognition,”’ Third International Symposium on Com-
puter & Information Science, 1969.
Cohen [1978]
B. Cohen, "A powerful and efficient structural pattern recognition system," Artificial Intelligence 9.
Cohen [1978]
B. Cohen and C.A. Sammut, ‘‘Pattern recognition and learning with a structural description
language,’ Proc. 4th Int. Jt. Conf. on Pattern Recognition (Kyoto, 1978).
Banerji [1964]
R.B. Banerji, ‘‘A language for the description of concepts,’’ General Systems, 9 (1964), p. 135.
Banerji [1976]
R.B. Banerji, ‘‘A data structure which can learn simple programs from examples of input-output,”
Pattern Recognition and Artificial Intelligence (C. Chen, Ed.), Academic Press, (N.Y., 1976).
Banerji [1978]
R.B. Banerji, “‘Using a descriptive language as a programming language,’ Proc. of the 4th Int. Joint
Conf. on Pattern Recognition (Kyoto, 1978).
Warren [1974]
D. Warren, ‘‘Epilog (400, 400)—A user’s guide to the DEC-10 prolog system,’’ Internal Memo,
Dept. of Artificial Intelligence, University of Edinburgh (1974).
Manna [1977]
Z. Manna and R. Waldinger, "Studies in automatic programming logic," North-Holland, NY (1977).
Bibel [1978]
W. Bibel, U. Furbach, and J.F. Schreiber, ‘‘Strategies for the synthesis of algorithms,’’ in Program-
miersprachen, Vol. 12, Informatik-Fachberichte, Springer-Verlag, NY, 1978.
Biermann [1972]
A.W. Biermann and J.A. Feldman, "On the synthesis of finite machines from samples of their
behavior," IEEE Trans. on Computers C-21 (June 1972).
Summers [1977]
P.D. Summers, ‘‘A methodology for LISP program construction from examples,’ Journal Assoc.
Comp. Mach. 24 (1977).
Jouannaud [1977]
J.P. Jouannaud, ‘‘Sur l’Inference et la synthese automatique de fonctions LISP a partir d’examples,”’
Thesis, University of Paris VI (Nov. 1977).
Guiho [1978]
G. Guiho and J.P. Jouannaud, ‘‘Program synthesis from examples for a simple class of non-loop
functions,’’ Research Report, Laboratoire de Recherche en Informatique, Universite de Paris-Sud
(Mar. 1978).
Treuil [1977]
J.P. Treuil, J.P. Jouannaud, and G. Guiho, ““LQAS un Systeme-Question-Reponse base sur
l'apprentissage et la synthese de programmes a partir d’examples,”’ Institut de Programmation,
Universite de Paris-VI (Mar. 1977).
Biermann [1972]
A.W. Biermann, "On the inference of Turing machines from sample computations," Artificial Intelligence 3, 181 (1972).
Kowalski [1979]
R. Kowalski, ‘‘Logic for problem solving,’’ North Holland, NY (1979).
Sammut [1981]
C. Sammut, ‘‘Concept learning by experiment,’’ Proc. of the Int. Joint Conf. on Artificial Intelligence,
Vancouver, B.C. (Aug. 1981), p. 104.
Cohen [1980]
B. Cohen, ‘‘Program synthesis through concept learning,’ this volume.
Kodratoff [1979]
Y. Kodratoff, "A class of functions synthesized from a finite number of examples and a LISP pro-
gram schema," Int. Jour. of Comp. and Infor. Sciences, Vol. 8 (1979), p. 489.
Banerji [1979]
R.B. Banerji, ‘Artificial intelligence: a theoretical approach,’’ North Holland, NY (1979).
CHAPTER 23
INDUCTIVE INFERENCE
Dana Angluin
Yale University
Abstract
In this paper we describe some recent results on efficient procedures for identification of formal languages
from examples. These results are based on structural information concerning the particular domains con-
sidered, and begin to suggest what kinds of structure theory are relevant for inductive inference.
A. Introduction
We begin with two examples. Given the strings of digits
* This work was supported by the National Science Foundation under grant number MCS 8002447.
one natural sequence of observations is that each string begins with 10, that the remainder of each string is a
palindrome of the form xx‘, and that the values of the x’s, namely, 49, 64, 81, 100, are squares of consecutive
integers. If instead we were presented with the strings
we might conjecture that what is common to them is that each one contains an even number of 0’s and an even
number of 1's. Can we find some fruitful theory to account for inferences of this kind, or is this domain
intrinsically beyond analysis? The study of inductive inference is one attempt to find such a theory.
One question that arises immediately is how to define the ’correctness’ of inferences of this kind. Since
there are in general infinitely many different rules compatible with any finite sample, any guess is subject to
being contradicted by the next example. As an illustration, the initial segment 2,4,6,8,10,12,... strongly sug-
gests the even numbers, but if instead it is the sequence of values of Euler’s totient function, then the next
element is 16. (This example is taken from Sloane’s instructive and entertaining book of integer sequences
[1973].)
Gold [1967] proposes the concept of "identification in the limit" to define the correctness of inductive
processes. His paper is a study of the problem of how a child is able to learn a grammar for its native language
from examples. Gold models this as the acquisition of a formal grammar from examples and obtains a number
of fundamental theoretical results, some of which are described in subsequent parts of this paper.
The idea of identification in the limit is the following. There is a fixed domain D of rules, and two
players, say N (Nature) and M (Man). N selects a rule from D and begins giving M examples of it in such
a way that every possible example of the rule will eventually be given after some finite time. M reads the
examples provided by N and occasionally conjectures elements of D. If the sequence of M’s guesses is even-
tually constant (i.e., stops changing after some finite time) and correct (i.e., names the rule that N_ selected at
the start), then we say that M correctly identifies the rule in the limit.
One simple example of a domain of rules that may be correctly identified in the limit by an inference
algorithm M is the set of all integer sequences of the form f(1),f(2),f(3),..., where f is a polynomial. When
M has the first m+1 terms of such a sequence, it interpolates a polynomial of degree m through them and
conjectures this polynomial. If N has chosen a polynomial of degree d to start with, then after d+1 terms,
M’s guesses will stabilize on the correct conjecture.
Even in this simple domain, M_ itself is not able to tell whether the sequence of its guesses has converged
yet. For example, the polynomial
f(n) = (n—1)(n—2)(n—3)+1
is 1 for n = 1,2,3 and then jumps to 7. This is analogous to the residual uncertainty we must have about our
current scientific theories.
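A small Python sketch of this inference machine (my own rendering of the procedure described above, not code from the paper): after m+1 terms it conjectures the degree-m interpolating polynomial, and on the sequence generated by f(n) = (n-1)(n-2)(n-3)+1 its early guesses are the constant 1 before stabilizing on the correct cubic.

    from fractions import Fraction

    def conjecture(terms):
        # Return the polynomial of degree len(terms)-1 interpolating the points
        # (1, terms[0]), (2, terms[1]), ..., via Lagrange's formula.
        points = [(i + 1, Fraction(t)) for i, t in enumerate(terms)]
        def p(n):
            total = Fraction(0)
            for i, (xi, yi) in enumerate(points):
                basis = Fraction(1)
                for j, (xj, _) in enumerate(points):
                    if j != i:
                        basis *= Fraction(n - xj, xi - xj)
                total += yi * basis
            return total
        return p

    f = lambda n: (n - 1) * (n - 2) * (n - 3) + 1
    guesses = [conjecture([f(k) for k in range(1, m + 2)]) for m in range(5)]
    print([int(g(4)) for g in guesses])     # [1, 1, 1, 7, 7]: the guesses stabilize
                                            # once d+1 = 4 terms have been seen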
The criterion of identification in the limit (and variations of it) has been widely accepted as a useful
definition of correctness for inductive processes. However, since we are distinctly finite beings, the question of
measuring the "goodness" of the answers at finite stages of an inductive process is of vital interest. Complexity
theory, with its emphasis on attempting to distinguish problems that are practical to solve from those that are
solvable in theory but infeasible in practice, is a natural tool for the more detailed study of inductive inference
procedures. Some inductive inference processes and problems are directly amenable to standard notions of
complexity theory, while others seem to require new definitions.
The evidence from preliminary studies suggests that an "efficient" inference procedure must rely on a
fairly rich structure theory for the domain in question, and that such a theory in turn illuminates the concept of
"identifiability" in its domain. The remainder of this paper is a description of some of the theoretical work on
measures of efficiency for inference methods, and algorithms that are provably efficient with respect to these
measures. General surveys of other aspects of abstract and concrete work in inductive inference may be found
in Angluin and Smith [1982], Biermann and Feldman [1972], Case and Smith [1983], Fu [1975], Fu and Booth
[1975], Gonzalez and Thomason [1978], and Smith [1980]. A general bibliography of abstract and concrete
work in inductive inference may be found in Smith [1979].
u_1, u_2, u_3, ···
of strings, such that every u_i is an element of S, and every element of S appears somewhere in the sequence
(possibly repeated). If S is any language, a complete presentation of S is any infinite sequence
(u_1, t_1), (u_2, t_2), (u_3, t_3), ···
such that for each i, u_i is a string and t_i is 1 if and only if u_i is in S, and for every string u over the alphabet,
there is some i such that u =u;. Thus, a positive presentation of a language gives only examples of the
language, while a complete presentation gives both members and nonmembers. As an example, if S is the set
of all strings of 1's over the alphabet {0,1}, then a positive presentation of S might begin with the strings 1, 11, 111, 1, 1111, and so on.
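A hedged sketch of such a presentation (the generator below is my own illustration; any enumeration in which every string of 1's eventually appears, repetitions allowed, would do equally well):

    from itertools import count

    def positive_presentation():
        # One positive presentation of S = the strings of 1's over {0,1}:
        # every element of S appears eventually; repetitions are permitted.
        # (If the empty string is counted as a string of 1's, start the count at 0.)
        for n in count(1):
            yield "1" * n

    gen = positive_presentation()
    print([next(gen) for _ in range(4)])    # ['1', '11', '111', '1111']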
Another form of presentation of a language S is by informant. In this case, the inference machine has
access to an oracle for S, which will answer any query of the form "is the string u in S?" with yes or no. In a
formal sense an informant for S and a complete presentation of S contain the same information about S, but in
practice inference methods for the two types of presentation tend to be rather different. Some authors have
considered mixtures of given data and informant presentation.
An inference machine is an algorithmic device that from time to time may request inputs and produce out-
puts. To run such a machine on an infinite sequence of inputs, we start the machine and whenever it requests
an input, we give it the next element of the input sequence, and whenever it produces an output, we append
the output to the (initially null) output sequence. Thus, we may consider the output of the machine to be the
null, finite, or infinite sequence of outputs produced while being run on the input sequence. (A more formal
definition may be found in Angluin [1980a].) This definition may easily be modified for the case of informant
presentation, but we omit this development.
We now define a notion of convergence for the output of such a machine. A finite nonempty sequence of
strings d_1, d_2, ..., d_n is said to converge to the value d if and only if d_n is equal to d. An infinite sequence of
strings d_1, d_2, d_3, ··· is said to converge to the value d if and only if there exists a number N such that d_n is
equal to d for all n greater than N.
An inference machine M is defined to identify the language S in the limit from positive data if and only if
for every positive presentation u_1, u_2, u_3, ··· of S, the sequence of outputs produced by M with this sequence
as input converges to some d in D such that L(d) = S. The definition of identification in the limit from complete
data is the same, with "complete presentation" replacing "positive presentation".
If C is a class of languages with a system of descriptions D and L, C is defined to be identifiable in the limit
from positive data if and only if there exists an inference machine M that identifies in the limit every S in C
from positive data. Identifiability from complete data is defined analogously.
A number of variations of the definition of identification in the limit have been considered, some of
which we now sketch. In ‘‘finite identification’’, the inference machine is required to be able to detect when it
has converged (Freivald and Wiehagen [1979], Kugel [1977]). ‘‘Behavioral correctness’’ requires only that the
language denoted by the guess converge correctly—the descriptions of it may continue to change (Case and
Smith [1983], Feldman [1972]). Another approach permits a finite number of ‘‘bugs’’ or ‘‘anomalies’’ in the
final guess (Case and Smith [1978,1983], Smith [1979]). Still another idea is to allow a finite ‘‘team”’ of infer-
ence machines to work in parallel on one problem (Daley [1981], Smith [1981]). A considerable amount of
work has been done studying the classes of languages identifiable with respect to these criteria and others.
Theorem 1: (Gold [1967]) If D and L is a recursive system of descriptions for the class C, then C is
effectively identifiable in the limit from complete data.
Proof: The elements of D are effectively enumerable by definition, so let d_1, d_2, d_3, ··· be some
effective enumeration of them. Let some input sequence
(u_1, t_1), (u_2, t_2), (u_3, t_3), ···
be given. For each n, search for the least k such that u_i is in L(d_k) if and only if t_i = 1 for all i = 1, 2, ..., n,
and output d_k as the nth guess if such is found. It is clear that if the given sequence is a complete presentation
of any language L(d) in C, the sequence of outputs produced by this process will converge to the
first description of L(d) in the given enumeration of D. □
This theorem shows that the regular sets described by finite automata or regular expressions are
effectively identifiable in the limit from complete data. The same is true of the context free or context sensi-
tive languages described by their customary grammars. The simple enumerative inference procedure described
in the proof above will be referred to as the enumeration algorithm. It is guaranteed to produce as its nth guess
the earliest description (if any) compatible with the first n terms of the input sequence.
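A sketch of the enumeration algorithm in Python (assumed names; the toy description class at the end is mine, chosen only to make the call runnable):

    def enumeration_guess(descriptions, in_language, labelled_sample):
        # descriptions: an enumeration d1, d2, ... of D;
        # in_language(d, u): decides whether u is in L(d);
        # labelled_sample: pairs (u, t) with t == 1 iff u is in the target language.
        # Returns the least-indexed description compatible with the sample so far.
        for d in descriptions:
            if all(in_language(d, u) == (t == 1) for u, t in labelled_sample):
                return d
        return None    # cannot happen if the sample comes from some L(d) in C

    # Toy description class over {0,1}:
    D = ["even 0s", "even 1s", "all strings"]
    def in_language(d, u):
        if d == "even 0s":
            return u.count("0") % 2 == 0
        if d == "even 1s":
            return u.count("1") % 2 == 0
        return True

    print(enumeration_guess(D, in_language, [("00", 1), ("0", 0), ("1", 1)]))  # "even 0s"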
The enumeration algorithm does not seem very practical. However, if we think of it as a systematic
search, certain possibilities suggest themselves. In the enumeration algorithm, each incompatibility between
the current hypothesis and the data results in the elimination of one hypothesis from consideration, because of
the simple linear ordering of the hypotheses. In concrete domains, it is often possible to organize the class of
hypotheses in a more complex way, allowing a whole subclass to be eliminated from further consideration by
each incompatibility. Wharton [1977] compares straight enumeration with a more sophisticated search for
inferring context free languages. Other search-based methods are described by Biermann
[1978,1972,1975,1976], Gaines [1976], Horning [1969], Maryanski and Booth [1977], Mitchell [1979], Shapiro
[1981], and Van der Mude and Walker [1978]. Cook et al. [1976] describe an inference method based on a
hill-climbing search for a local optimum.
Theorem 2: (Gold [1978]) The minimum size inference problem is NP-hard for D = deterministic
finite automata and s(d) = the number of states in d.
Theorem 3: (Angluin [1978]) The minimum size inference problem is NP-hard for D = regular
expressions and s(d) = the length of the expression d.
These results, and refinements of them, suggest that the computational problem of minimum size infer-
ence, although natural and rather appealing in its formulation, may prove intractable for classes as simple as the
regular sets with natural size functions. Search-based algorithms that solve these problems may be improved in
various ways, but are fundamentally up against NP-hard problems.
One approach to NP-hard problems is to analyze heuristic methods for them, in order to establish statisti-
cal or approximation guarantees for the methods. There have not as yet been any such studies in the domain
of inductive inference. Another open problem in this area is to give a general method for analyzing the com-
plexity of minimum size inference problems—so far, only a few specific problems have been analyzed.
event that d generates S. (Stochastic grammars may be used in defining these probabilities.) The objective,
given a finite positive sample S, is to find d to maximize P(d|S), which by Bayes' Theorem may be accom-
plished by maximizing P(d)P(S|d). We may think of P(d) as a kind of inverse of a size function, and P(S|d) as
a measure of the "fit" of the description d to the sample S.
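In outline (a toy formulation with assumed names, not a specific published algorithm), the Bayesian choice of description amounts to maximizing the log of P(d)P(S|d) over the candidates:

    import math

    def best_description(descriptions, prior, likelihood, sample):
        # prior(d) plays the role of P(d) (an inverse size);
        # likelihood(d, s) plays the role of P(s | d) (the "fit" of d to s).
        def log_posterior(d):
            return math.log(prior(d)) + sum(math.log(likelihood(d, s)) for s in sample)
        return max(descriptions, key=log_posterior)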
Horning [1969] considers this objective function for stochastic context free grammars. He gives a search-
based algorithm to maximize P(d|S) and shows that it correctly converges in the limit with probability one under
appropriate assumptions. Related studies are those of Solomonoff [1964,1978,1975], Cook et al. [1976], and
Van der Mude and Walker [1978]. Feldman [1972] gives an abstract treatment of objective functions that com-
bine size and derivational complexity for grammars, and Feldman and Shields [1977] extend this treatment to
functions. Other mixed measures have been studied by Maryanski and Booth [1977] and Gaines [1976] for sto-
chastic deterministic regular grammars. Essentially no work has been done studying the computational com-
plexity of optimizing any of these mixed measures.
It is also possible to dispense with "size" entirely and concentrate on "fit" as an objective. One way to
do this is as follows. Given a finite set S of strings, find a description d such that L(d) contains S and for
any description d' such that L(d') contains S, L(d') is not a proper subset of L(d). That is, L(d) should be
minimal in the set containment ordering among all L(d’) that contain S. This problem is called the minimal
language inference problem.
This objective function does not at first seem very promising —for example, in the domain of regular sets,
the unique minimal regular set containing the finite sample S is S itself. However, as we shall see in the next
two sections, there are nontrivial domains in which this objective leads to correct identification in the limit from
positive data, and can be optimized by provably efficient algorithms.
Theorem 4: (Angluin [1980a]) Suppose that C is a class of languages with a recursive system of
descriptions D and L. Suppose also that every language in C is nonempty. Then C is identifiable in
the limit from positive data if and only if there is an effective procedure to enumerate a marker for L(d)
and C given d, for every description d in D.
Theorem 5: (Gold [1967]) Any class of languages containing all the finite languages and at least one
infinite language is not effectively identifiable in the limit from positive data.
Proof: There are no markers for the infinite languages in such a class. □
Thus, even the class of regular languages is not effectively identifiable from positive data. Nonetheless,
there are some nontrivial classes of languages that may be effectively identified in the limit from positive data,
as we shall describe.
as in the pattern 10xx' of the first example in this paper.) An open problem in this area is to analyze the com-
plexity of the minimal language inference problem for patterns that contain k variables for any fixed k greater
than or equal to 2.
We first construct an incompletely specified tree-like deterministic acceptor that accepts just these strings. The
states are all prefixes of the given strings, with a transition on input b from u to ub provided u and ub are
both states. The start state is the null string, and the accepting states are the given strings themselves.
We now proceed repeatedly to collapse certain sets of states of this machine to obtain our final machine.
First we collapse all the accepting states. Then we collapse any set of states that are all either b-predecessors or
b-successors of a single state in the current machine, where b is either 0 or 1. We continue this process until
no further collapsing is possible. The result will be a deterministic finite state acceptor with one initial and one
final state that accepts a superset of the original sample. In our example this process will result in the acceptor
that recognizes all strings that contain an even number of 0’s and an even number of 1’s.
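The following Python sketch renders the collapsing procedure just described (the data structures and names are my own, not Angluin's published algorithm): the states of the prefix-tree acceptor are kept in a union-find structure, the accepting states are merged first, and then states sharing a b-predecessor or a b-successor are merged until nothing changes.

    def zero_reversible_acceptor(sample):
        # Sketch: smallest 0-reversible acceptor containing a non-empty sample.
        prefixes = {w[:i] for w in sample for i in range(len(w) + 1)}
        parent = {q: q for q in prefixes}               # union-find over prefix-tree states
        def find(q):
            while parent[q] != q:
                parent[q] = parent[parent[q]]
                q = parent[q]
            return q
        def union(p, q):
            parent[find(p)] = find(q)

        edges = [(u[:-1], u[-1], u) for u in prefixes if u]   # prefix-tree transitions
        anchor = next(iter(sample))
        for w in sample:                                      # collapse all accepting states
            union(w, anchor)

        changed = True
        while changed:                 # merge common b-successors and b-predecessors
            changed = False
            for i in range(len(edges)):
                for j in range(i + 1, len(edges)):
                    (p1, b1, q1), (p2, b2, q2) = edges[i], edges[j]
                    if b1 != b2:
                        continue
                    if find(p1) == find(p2) and find(q1) != find(q2):
                        union(q1, q2); changed = True
                    if find(q1) == find(q2) and find(p1) != find(p2):
                        union(p1, p2); changed = True

        states = {find(q) for q in prefixes}
        delta = {(find(p), b): find(q) for p, b, q in edges}
        return states, delta, find(""), find(anchor)

    # For instance, on the small sample {"00", "0000"} the procedure collapses the
    # prefix tree to a two-state machine accepting the even-length strings of 0's.
    states, delta, start, accept = zero_reversible_acceptor({"00", "0000"})
    print(len(states))   # 2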
This technique works well on this particular example. It fails miserably given the sample consisting of the
strings 01, 00011, 00111, 000111—in this case the answer found is the language containing all strings over the
alphabet {0,1}. In Angluin [1983] there is a characterization of this technique and generalizations of it, which
we now summarize.
Consider deterministic finite state acceptors whose transition functions may be partial. (The interpretation
is that if a string attempts to use an undefined transition, it is rejected.) Such an acceptor is defined to be rever-
sible if and only if it has exactly one accepting state and the operation of interchanging the initial and accepting
state, and reversing the direction of each of the transition arrows produces a deterministic acceptor (also possi-
bly with a partial transition function). Alternatively, such an acceptor is reversible if and only if it has just one
accepting state and every input symbol induces an injective mapping on the state set of the machine. A regular
language is called reversible if and only if it is accepted by some reversible automaton.
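The alternative characterization translates directly into a check (a sketch with an assumed representation: the partial transition function as a dictionary from (state, symbol) to state):

    def is_reversible(delta, accepting_states):
        # A deterministic acceptor with partial transition function delta is
        # 0-reversible iff it has exactly one accepting state and every input
        # symbol induces an injective (partial) map on the states.
        if len(accepting_states) != 1:
            return False
        for b in {sym for (_, sym) in delta}:
            targets = [q for (p, sym), q in delta.items() if sym == b]
            if len(targets) != len(set(targets)):
                return False    # two distinct states share a b-successor
        return True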
In Angluin [1983] the collapsing algorithm sketched above is described and shown to run in polynomial
time and to find the smallest reversible regular set containing the given sample. Thus it solves the minimal
language inference problem in the domain of the reversible regular sets. It is also shown that using this
method at the finite stages of an inductive inference process leads to correct identification in the limit of the
reversible regular sets from positive data. (There is also a generalization to k-reversibility, in which the
reversed acceptor need only be deterministic with lookahead k.)
The reversible automata contain permutation machines with one accepting state as a subclass. This obser-
vation suggests the possibility that this technique might be used in conjunction with the theory of algebraic
decomposition of automata to produce more powerful but still efficient inference methods for regular sets. In
the domain of regular languages, this collapsing technique and the labelling technique of Crespi-Reghizzi appear
to complement one another and perhaps may be usefully combined.
I. Concluding Remarks
It seems intuitively clear that when humans communicate complex procedures to one another (for exam-
ple, descriptions of algorithms), they do so most successfully by means of a mixture of general description to
fix the outlines of the procedure and specific examples to fill in many of the details. Such a form of communi-
cation appears to be more robust than either pure description or pure examples. This may be a chance peculiar-
ity of humanity, or it may be a general phenomenon of communication between complex systems. In either
case it is of interest to us as designers of systems that communicate with people. Hence it is important for us
to try to elucidate the general principles and assumptions underlying the use of examples to infer rules.
How such knowledge might ultimately be systematized and integrated with a component allowing ‘‘out-
line’’ description or general broad specification is not at all clear. One possibly promising approach, taken by
Shapiro [1982,1981], is to explore the synthesis of logic programs (for example, in PROLOG), in which exam-
ples and assertions may quite naturally be expressed in the same language.
A very interesting body of work on efficient algorithms to synthesize LISP programs from input/output
data, described for example by Jouannaud and Kodratoff [1979], is more fully treated elsewhere in this book.
This paper has described an approach using complexity theory to explore inductive inference procedures
in various specific domains, and emphasized the use of structural information to construct efficient
identification algorithms. The results in this area, while very preliminary, appear promising.
References
Angluin [1980]
D. Angluin, ‘‘Finding patterns common to a set of strings,”’ J. Comp. Sys. Sci. 21:46—62, 1980.
Angluin [1980a]
D. Angluin, "Inductive inference of formal languages from positive data," Inform. Contr. 45:117—135, 1980.
Angluin [1982]
D. Angluin, ‘‘Inference of reversible languages,’ J. ACM 29:741—765, 1982.
Angluin [1981]
D. Angluin, "A note on the number of queries needed to identify regular languages," Inform. Contr.
51:76—87, 1981.
Angluin [1978]
D. Angluin, "On the complexity of minimum inference of regular sets," Inform. Contr. 39:337—350, 1978.
Barzdin [1974]
J.M. Barzdin, "On synthesizing programs given by examples," in Lecture Notes in Computer Science, Volume 5, Springer-Verlag, 1974.
Barzdin [1972]
J.M. Barzdin, ‘Prognostication of automata and functions,’ in Information Processing 71, North-Holland,
Amsterdam, 1972, pp. 81—84.
Biermann [1978]
A. W. Biermann, “‘The inference of regular LISP programs from examples,’ IEEE Trans. on Systems, Man,
and Cybernetics SMC-8:585—600, 1978.
Biermann [1972]
A.W. Biermann, ‘‘On the inference of Turing machines from sample computations,” Art. Int. 3:181—198,
1972.
Crespi-Reghizzi [1971]
S. Crespi-Reghizzi, "Reduction of enumeration in grammar acquisition," in Proc. Second International Joint
Conference on Artificial Intelligence, IJCAI, 1971.
Crespi-Reghizzi [1972]
S. Crespi-Reghizzi, ‘‘An effective model for grammar inference,” in Information Processing 71, North-Holland
Publishing Co., 1972, pp. 524—529.
Daley [1981]
R. Daley, ‘‘On the error correcting power of pluralism in inductive inference,’ Technical Report, Dept. of
Computer Science, Univ. of Pittsburgh, 1981.
Feldman [1967]
J.A. Feldman, ‘‘First thoughts on grammatical inference,’ Technical Report, Stanford University Artificial
Intelligence Memo #55, 1967.
Feldman [1972]
J.A. Feldman, ‘‘Some decidability results in grammatical inference,’ Inform. Contr. 20:244—262, 1972.
Fu [1975]
K. S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, NY, 1975.
Gaines [1976]
B.R. Gaines, "Behavior/structure transformations under uncertainty," Int. J. of Man-Machine Studies
8:337—365, 1976.
Gold [1978]
E. M. Gold, ‘‘Complexity of automaton identification from given data,”’ Inform. Contr. 37:302—320, 1978.
Gold [1967]
E. M. Gold, ‘‘Language identification in the limit,’ Inform. Contr. 10:447—474, 1967.
Horning [1969]
J. J. Horning, A study of grammatical inference, Ph.D. thesis, Stanford University, Computer Science Dept.,
1969.
Jantke [1979]
K.P. Jantke, "Natural properties of strategies identifying recursive functions," Elektronische Informationsverarbeitung und Kybernetik 15, 1979.
Kugel [1977]
P. Kugel, “Induction, pure and simple,’’ Inform. Contr. 35:276—336, 1977.
Mitchell [1979]
T. M. Mitchell, "An analysis of generalizations as a search problem," in Proc. Sixth International Joint Conference on Artificial Intelligence, IJCAI, 1979, pp. 577—582.
Shapiro [1982]
E. Shapiro, ‘‘Algorithmic program diagnosis,’ in Proc. Ninth Symposium on Principles of Programming
Languages, ACM, 1982.
Shapiro [1981]
E. Shapiro, ‘‘A general incremental algorithm that infers theories from facts,’ in Proc. Seventh International
Joint Conference on Artificial Intelligence, IJCAI, 1981, pp. 446—451.
Sloane [1973]
N. J. A. Sloane, A Handbook of Integer Sequences, Academic Press, NY., 1973.
Smith [1979]
C. H. Smith, Hierarchies of identification criteria for mechanized inductive inference, Ph.D. thesis, S.U.N.Y.,
Buffalo, 1979.
Smith [1979a]
C. H. Smith, ‘‘An inductive inference bibliography,’ Technical Report, Purdue University Computer Science
Dept., CSD TR 323, 1979.
Smith [1981]
C. H. Smith, "The power of parallelism for automatic program synthesis," in Proc. 22nd Annual Symposium
on Foundations of Computer Science, 1981.
Smith [1980]
D. R. Smith, ‘‘A survey of the synthesis of LISP programs from examples,”’ in Proc. Symposium on Program
Construction, INRIA, Bonas, France, 1980.
Solomonoff [1978]
R. J. Solomonoff, ‘‘Complexity-based induction systems: comparisons and convergence theorems,’’ IEEE
Trans. on Information Theory IT —24:422—432, 1978.
Solomonoff [1975]
R. J. Solomonoff, ‘‘Inductive inference theory—a unified approach to problems in pattern recognition and
artificial intelligence,’ in Proc. Fourth International Joint Conference on Artificial Intelligence, IJCAI, 1975,
pp. 274-280.
Solomonoff [1964]
R. J. Solomonoff, ‘‘A formal theory of inductive inference,’ Inform. Contr. 7:1—22, 224—254, 1964.
Wharton [1977]
R. M. Wharton, ‘‘Grammar enumeration and inference,” Inform. Contr. 33:253—272, 1977.
CHAPTER 24
INDUCTIVE LEARNING
Ryszard S. Michalski
Department of Computer Science
University of Illinois
Urbana, Illinois 61801
Abstract
The theory presented here treats inductive learning as a process of generalizing symbolic descriptions,
under the guidance of generalization rules and background knowledge rules. This approach unifies various types
of inductive learning, such as learning from examples and learning from observation.
Two inductive learning programs are presented: INDUCE 1.1 — for learning structural descriptions from
examples, and CLUSTER/PAF — for learning taxonomic descriptions (conceptual clustering). The latter pro-
gram partitions a given collection of entities (objects, computational processes, observations, etc.) into clusters,
such that each cluster is described by a single conjunctive statement and the obtained assembly of clusters
satisfies an assumed criterion of preference.
The presented methodology can be useful for an automated determination of complete and correct pro-
gram specification for computer-aided decision making, for knowledge acquisition in expert systems, and the
conceptual analysis of complex data.
A. Introduction
Our understanding of inductive inference processes remains very limited despite considerable progress in
recent years. Making progress in this area is particularly difficult, not only because of the intrinsic complexity
of these problems, but also because of their open-endedness. This open-endedness implies that when one
makes inductive assertions about some piece of reality, there is no natural limit to the level of detail and to the
scope of concepts and operators used in the expression of these assertions, or to the richness of their forms.
Consequently, in order to achieve non-trivial general solutions, one has to circumscribe carefully the nature and
goals of the research. This includes defining the language in which descriptions may be written and the modes
of inference which will be used. Careful definitions will avoid the main difficulty of most current research:
attacking problems which are too general with techniques which are too limited.
Recently there has been a growing need for practical solutions in the area of computer induction. For
example, the development of knowledge-based expert systems requires efficient methods for acquiring and
refining knowledge. Currently, the only method of knowledge acquisition is the handcrafting of an expert’s
knowledge in some formal systems, e.g., in the form of production rules (Shortliffe [1974], Davis [1976]) or as
a semantic net (Brachman [1978]). Progress in the theory of induction and the development of efficient induc-
tive programs can provide valuable assistance and an alternative method in this area. For example, inductive
programs could be useful for filling in gaps, and testing the consistency and completeness of expert-derived
decision rules, for removing redundancies, or for incremental improvement of the rules through the analysis of
their performance. They could also provide a means for detecting regularities in data bases and knowledge
bases. For appropriately selected problems, the programs could determine the decision rules directly from
examples of expert decisions, which would greatly facilitate the transfer of knowledge from experts into
machines. Experiments on the acquisition of rules for the diagnosis of soybean diseases (Michalski et al.
[1980]), have indicated that rule-learning from examples is not only feasible, but in certain aspects is prefer-
able.
Another potential application of computer induction is in various areas of science, e.g., biology, microbiol-
ogy, and genetics. Here it could assist a scientist in revealing structure or detecting interesting conceptual pat-
terns in collections of observations or results of experiments. The traditional mathematical techniques of
regression analysis, numerical taxonomy, factor analysis, and distance-based clustering techniques are not
sufficiently adequate for this task. Methods of conceptual data analysis are needed, whose results are not
mathematical formulas but conceptual descriptions of data, involving both qualitative and quantitative relation-
ships.
An important sub-area of computer inductive inference is automatic programming (e.g., Shaw et al.
[1975], Jouannaud et al. [1979], Burstall et al. [1977], Biermann [1978], Smith [1980], and Pettorossi [1980]).
Here, the objective is to synthesize a program from I/O pairs or computational traces, or to improve its compu-
tational efficiency by application of correctness-preserving transformation rules. The final result of learning is
thus a program, in a given programming language, with its inherent sequential structure, destined for machine
rather than human ‘‘consumption’’ (or, in other words, a description in ‘‘computer terms’’ rather than in
"human terms"). In this case, the postulate of human comprehensibility, mentioned below, is of lesser impor-
tance. Quite similar to research on automatic programming is research on grammatical inference (e.g., Bier-
mann and Feldman [1972], Yau and Fu [1978]) where the objective of learning is a formal grammar.
This paper is concerned with computer inductive inference, which could be called a "conceptual" induc-
tion. The final result of learning is a symbolic description of a class or classes of entities, typically not computa-
tional processes, in the form of a logical-type expression (e.g., a specification of the program or a classification
rule). Such an expression is expected to be relatively "close" to a natural language description of the same
class(es) of entities. Specifically, it should satisfy the following comprehensibility postulate:
The results of computer inductive learning should be conceptual descriptions of data, similar to the descriptions a
human expert might produce observing the same data. They should be comprehensible by humans as single
‘chunks’ of information, directly interpretable in natural language, and use both quantitative and qualitative
information in an integrated fashion.
This postulate implies that a single description should avoid more than one level of bracketing, more than
one implication or exception symbol, avoid recursion, avoid including more than 3—4 conditions in a conjunc-
tion and more than 2—3 conjunctions in a disjunction, not include more than two quantifiers, etc. (the exact
numbers can be disputed, but the principle is clear). This postulate can be used to decide when to assign a
name to a specific formula and use that name inside of another formula. This postulate stems from the motiva-
tion of this research to provide new methods for knowledge acquisition and techniques for conceptual data
analysis. It is also well confirmed by the new role for research in artificial intelligence, as envisaged by Michie
[1977], which is to develop techniques for conceptual interface and knowledge refinement.
In this chapter we will consider two basic types of inductive inference: learning from examples and learn-
ing from observation (specifically, the so called ‘conceptual clustering”’).
(a) a set of observational assertions (data rules), which consist of data descriptions, {C_i}, specifying initial
knowledge about some entities (objects, situations, processes, etc.), and the generalization class, K_i, associ-
ated with each C_i (this association is denoted by the symbol ::>):
Descriptions C_i can be symbolic specifications of conditions satisfied by given situations, production rules,
sequences of attribute-value pairs representing observations or results of experiments, etc. The descrip-
tions are assumed to be expressions in a certain logical calculus, e.g., propositional calculus, predicate cal-
culus, or a calculus specially developed for inductive inference, such as variable-valued logic systems VL1
(Michalski [1973]) or VL2 (Michalski [1978]).
(b) a set of background knowledge rules defining information relevant to the problem under consideration.
This includes definitions of the value sets of all descriptors* used in the input rules, the properties of descrip-
tors and their interrelationships, and any "world knowledge" relevant to the problem. The background
knowledge also includes a preference (or optimality) criterion, which for any two sets of symbolic descrip-
tions of the same generalization class specifies which one is preferable, or that they are equivalent with
regard to this criterion.
which is the most preferred among all sets of rules in the assumed format, that do not contradict the background
knowledge rules, and are, with regard to the data rules, consistent and complete.
A set of inductive assertions is consistent with regard to data rules, if any situation that satisfies a data rule
of some generalization class either satisfies an inductive assertion of the same class, or does not satisfy any
inductive assertion. A set of inductive assertions is complete with regard to data rules, if any situation that
satisfies some data rules also satisfies some inductive assertion.
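Restricted to a finite universe of situations, the two conditions can be written out directly; the sketch below is an assumed formulation (predicates standing in for the symbolic descriptions), not Michalski's notation.

    def consistent(assertions, data_rules, situations):
        # assertions, data_rules: lists of (predicate, generalization_class) pairs.
        # A situation covered by a data rule of class K must either satisfy an
        # assertion of class K or satisfy no assertion at all.
        for s in situations:
            for d_pred, d_class in data_rules:
                if d_pred(s):
                    satisfied = [a_class for a_pred, a_class in assertions if a_pred(s)]
                    if satisfied and d_class not in satisfied:
                        return False
        return True

    def complete(assertions, data_rules, situations):
        # Every situation covered by some data rule is covered by some assertion.
        return all(any(a_pred(s) for a_pred, _ in assertions)
                   for s in situations
                   if any(d_pred(s) for d_pred, _ in data_rules))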
It is easy to see that if a set of inductive assertions is consistent and complete with regard to the data
rules, then it is semantically equivalent to or more general than the set of data rules (i.e., there may exist situa-
tions which satisfy an inductive assertion but do not satisfy any data rule).
From a given set of data rules it is usually possible to derive many different sets of hypotheses which are
consistent and complete, and which satisfy the background knowledge rules. The role of the preference cri-
terion is to select one (or a few alternatives) which is (are) most desirable in the given application. The prefer-
ence criterion may refer to the simplicity of hypotheses (defined in some way), their generality, the cost of
measuring the information needed for their evaluation, their degree of approximation to the given facts, etc.
(Michalski [1978]).
* Descriptors are variables, relations and functions that are used in symbolic descriptions of objects or situations.
Most of the research on computer inductive learning has dealt with a special subproblem of type Ia,
namely learning a conjunctive concept (description) characterizing a given class of entities. Here the data rules
involve only one generalization class (which represents a certain concept), or two generalization classes; the
second class being the set of ‘‘negative examples” (e.g., Winston [1970], Vere [1975], Hayes-Roth [1976]).
Where there is only one generalization class (the so-called uniclass generalization) there is no natural limit for
generalizing the given set of descriptions. In such a case the limit can be imposed by the form of inductive asser-
tion (e.g., that it should be a most specific conjunctive generalization within the given notational framework, as
in (Hayes-Roth [1976]) and (Vere [1975]), or by the assumed degree of generality (Stepp [1978]). When there
are negative examples the concept of near miss (Winston [1970]) can be used to effectively determine the limit
of generalization.
A general problem of type Ia is to learn a characteristic description (e.g., a disjunctive description, grammar,
or an algorithm) which characterizes all entities of a given class, and does not characterize any entity which is
not in this class.
Problems of type Ib are typical pattern classification problems. Data rules involve many generalization
classes; each generalization class represents a single pattern. In this case, the individual descriptions Ci are generalized as long as this leads to their simplification and preserves the condition of consistency (e.g., Michalski [1980]). The inductive assertions obtained are discriminant descriptions, which permit one to distinguish one recog-
nition class from all other assumed classes. A discriminant description of a class is a special case of characteris-
tic description, where any object which is not in the class is in one of the finite (usually quite limited) number
of other classes. Of special interest are discriminant descriptions which have minimal cost (e.g., the minimal
computational complexity, or minimal number of descriptors involved).
Problems of type Ic are concerned with discovering a rule governing generation of an ordered sequence of
entities. The rule may be deterministic (as in letter sequence prediction considered in Simon and Lea [1973]),
or nondeterministic, as in the card game Eleusis (Dietterich [1980]). Data rules here involve only one generalization class, or two generalization classes, where the second class represents "negative examples."
Problems of type II (learning from observation) are concerned with determining a characterization of a
collection of entities. In particular, such characterization can be a partition of the collection into clusters
representing certain concepts (‘‘conceptual clustering,’’ Michalski [1980], Michalski and Stepp [1983]). In this
case, data descriptions in (1) represent individual entities, and they all belong to the same generalization class
(i.e., data descriptions consist of a single row in (1)).
Methods of induction can be characterized by the type of language used for expressing the initial descriptions Ci and the final inductive assertions C'i. Many authors use a restricted (usually quantifier-free) form of predicate calculus, or some equivalent notation (e.g., Morgan [1975], Fikes et al. [1972], Banerji [1977], Cohen [1977], Hayes-Roth et al. [1978], Vere [1975]).
In our earlier work we used a special propositional calculus with multiple-valued variables, called the variable-valued logic system VL1. Later we developed an extension of first order predicate calculus, called VL2 (Michalski [1978]). It is a much richer language than VL1, including several novel operators not present in predicate calculus (e.g., the internal conjunction, the internal disjunction, the exception, and the selector). We found these operators very useful for describing and implementing generalization processes; they also correspond directly to linguistic constructions used in human descriptions. VL2 also provides a unifying formal framework for adequately handling descriptors of different types (measured on different scales). The handling of descriptors each according to its type in the process of generalization is one of the significant aspects of our approach to induction.
With regard to the content of the input data, three cases can be distinguished.
1. The input data consist of descriptions of objects in terms of variables which are relevant to the problem,
and the machine is supposed to determine a logical or mathematical formula of an assumed form involv-
ing the given variables (e.g., a disjunctive normal expression, a regression polynomial, etc.).
2. The input data consist of descriptions of objects as in case 1, but the descriptions may involve a relatively
large number of irrelevant variables in addition to relevant variables. The machine is to determine a solution description involving only relevant variables.
3. This case is like case 2, except that the initial descriptions may not include the relevant variables at all. They must, however, include variables certain transformations of which (e.g., transformations represented by mathematical expressions or intermediate logical formulas) are relevant derived variables. The final formula is then formulated in terms of these derived variables.
The above cases represent problem statements which put progressively less demand on the content of the
input data (i.e., on the human defining the problem) and more demand on the machine.
The early work on concept formation and the traditional methods of data analysis represent case 1. Most
of the recent research deals with case 2. In this case, the method of induction has to include efficient mechanisms for selecting relevant variables (thus, this case represents selective induction). Formal logic provides such mechanisms, and this fact is one of the advantages of logic-based solutions. Case 3 represents the subject
of what we call constructive induction.
Our research on induction using the system VL1 and our initial work using VL2 has dealt basically with case 2. Later we realized how to approach constructive induction, and formulated the first constructive generalization rules. We have incorporated them in our inductive program INDUCE-1 (Larson et al. [1977], Larson [1977]) and in the newer improved version INDUCE-1.1 (Dietterich [1978]).
The need for introducing the concept of constructive induction may not be obvious. The concept has
basically a pragmatic value. To explain this, assume first that the output assertions involve derived descriptors,
which stand for certain expressions in the same formal language. Suppose that these expressions involve, in
turn, descriptors which stand for some other expressions, and so on, until the final expressions involve only
initial descriptors. In this case the constructive induction simply means that the inductive assertions are multi-
level or recursive descriptions.
But this is not the only interesting case. Derived descriptors in the inductive assertions may be any arbi-
trary, fixed (i.e., not learned) transformations of the input descriptors, specified by a mathematical formula, a computer program, or implemented in hardware (e.g., a hardware implementation of the fast Fourier transform).
Their specification may require a language quite different from the accepted formal descriptive language. To
determine these descriptors by learning, in the same fashion as the inductive assertions, may be a formidable
task. They can be determined, e.g., through suggestions of possibly useful transformations provided by an
expert, or as a result of some generate-and-test search procedure. In our approach, the derived descriptors are
determined by constructive induction rules, which represent segments of problem-oriented knowledge of
experts.
Since selectors can include internal disjunction (see Appendix 1) and involve concepts of different levels of
generality (as defined by the generalization tree; see next section), the c-formulas are more general concepts
than conjunctive statements of predicates.
Other desirable forms of C'i are:

(C1 V C2 V ...) \ C     (3)

where C, C1, C2, ... are c-formulas and \ is the exception operator (see Appendix 1).
The motivation for this form comes from the observation that a description can be simpler in some cases,
if it states an overgeneralized rule and specifies the exceptions. Recently Vere [1978] proposed an algorithm
for handling such assertions in the framework of conventional conjunctive statements.
C (C1 → C2)     (4)

which consists of a context condition C and an implication C1 → C2, which states that the properties in C2 hold whenever C1 is true.
Production rules used in knowledge-based inference systems are a special case of (4), when C is omitted
and there is no internal disjunction. Among interesting inductive problems regarding this case are:
Various aspects of the last problem within a less general framework were studied, e.g., by Hedrick [1974].
• Case assertions
This form occurs when a description is split into individual cases characterized by different values of a cer-
tain descriptor.
Types of descriptors
The process of generalizing a description depends on the type of descriptors used in the description. The
type of a descriptor depends on the structure of the value set of the descriptor. We distinguish among three
different structures of a value set:
1. Unordered
Elements of the domain are considered to be independent entities, no structure is assumed to relate them.
A variable or function symbol with this domain is called nominal (e.g., blood-type).
2. Linearly Ordered
The domain is a linearly (totally) ordered set. A variable or function symbol with this domain is called
linear (e.g., military rank, temperature, weight). Variables measured on ordinal, interval, ratio and abso-
lute scales are special cases of a linear descriptor.
3. Tree Structured
Elements of the domain are ordered into a tree structure, called a generalization tree. A predecessor node in the tree represents a concept which is more general than the concepts represented by its dependent nodes (e.g., the predecessor of the nodes 'triangle', 'rectangle', 'pentagon', etc., may be 'polygon'). A variable or function symbol with such a domain is called structured.
Each descriptor (a variable or function symbol) is assigned its type in the specification of the problem. In
the case of structured descriptors, the structure of the value set is defined by inference rules (e.g., see eqs. (8),
(9), (10)).
Restrictions on variables
Suppose that we want to represent a restriction on the event space saying that if the value of variable x2 is 0 (e.g., 'a person does not smoke'), then the variable x3 is 'not applicable' (x3 being the brand of cigarettes the person smokes). This is represented by a rule:
For example, suppose that for any situation in a given problem, the atomic function f(x1,x2) is always greater than the atomic function g(x1,x2). We represent this:
For example, suppose that a predicate function ‘left’ is transitive. We represent this:
Other types of relationships characteristic for the problem environment can be represented similarly.
The rationale behind the inclusion of the problem background knowledge reflects our position that guidance of the induction process by knowledge pertinent to the problem is necessary for nontrivial inductive problems.
C. Generalization Rules
The transformation from data rules (1) to inductive assertions (2) can be viewed (at least conceptually) as
an application of certain generalization rules.
A generalization rule is defined as a rule which transforms one or more symbolic descriptions in the same
generalization class into a new description of the same class which is equivalent or more general than the initial
set of descriptions.
A description

V ::> K     (6)

is equivalent to a set of descriptions

{Vi ::> K},  i = 1, 2, ...     (7)

if any event (a description of an object or situation) which satisfies at least one of the Vi, i = 1, 2, ..., also satisfies V, and conversely. If the converse is not required, the rule (6) is said to be more general than (7).
The generalization rules are applied to data rules under the condition of preserving consistency and com-
pleteness, and achieving optimality according to the preference criterion. A basic property of a generalization
transformation is that the resulting rule has UNKNOWN truth-status; being a hypothesis, its truth-status must
be tested on new data. Generalization rules do not guarantee that the generated inductive assertions are useful
or plausible.
We have formalized several generalization rules, both for selective and constructive induction. Selective induction differs from constructive induction in that selective induction does not generate any new descriptors in the generalization process. (The notation D1 |< D2 specifies that D2 is more general than D1.)
Selective generalization:
(i) The extending reference rule

V [L = R1] ::> K          |<          V [L = R1 V R2] ::> K

This is a generally applicable rule; the type of the descriptor L does not matter. For example, the description 'objects that are blue or red' is more general than 'objects that are red'.
(ii) The dropping condition rule

V [L = R1] ::> K          |<          V ::> K

This rule is also generally applicable. It is one of the most commonly used rules for generalizing information. It can be derived from rule (i) by assuming that R2 in (i) is equal to the value set D(L). In this case the selector [L = R1 V R2] always has truth-status TRUE and therefore can be removed.
(iii) The closing interval rule

V [L = a] ::> K
V [L = b] ::> K          |<          V [L = a..b] ::> K
To illustrate rule (iii), consider as objects two states of a machine, and as a generalization class, a charac-
terization of the states as normal. The rule says that if two normal states differ only in that the machine
has two different temperatures, say, a and b, then the hypothesis is made that all states in which the
temperature is in the interval [a,b] are also normal.
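A minimal sketch of the closing interval rule for a linear descriptor; selectors are represented here as hypothetical (descriptor, value) pairs:

def close_interval(selector_a, selector_b):
    # Closing the interval rule: from [L = a] ::> K and [L = b] ::> K
    # hypothesize [L = a..b] ::> K for a linear descriptor L.
    (la, a), (lb, b) = selector_a, selector_b
    if la != lb:
        raise ValueError("selectors must involve the same linear descriptor")
    lo, hi = min(a, b), max(a, b)
    return (la, ("interval", lo, hi))

# Two normal machine states observed at temperatures 40 and 70:
print(close_interval(("temperature", 40), ("temperature", 70)))
# -> ('temperature', ('interval', 40, 70)): all states with temperature in [40, 70]
#    are hypothesized to be normal as well.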
(iv) The climbing generalization tree rule

V [L = a] ::> K
V [L = b] ::> K
   . . .               (one or more rules)          |<          V [L = s] ::> K
V [L = i] ::> K

where s represents the node at the next level of generality above the nodes a, b, ..., i in the tree domain of L (i.e., s is the most specific common generalization of the nodes a, b, ..., i).
The rule is applicable only to selectors involving structured descriptors. This rule has been used, e.g., in
(Winston [1970], Hedrick [1974], Lenat [1976]).
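The climbing generalization tree rule amounts to replacing the observed values by their most specific common ancestor in the generalization tree; a small sketch with a hypothetical tree:

# A generalization tree given as child -> parent links (hypothetical domain).
PARENT = {
    "triangle": "polygon", "rectangle": "polygon", "pentagon": "polygon",
    "circle": "oval", "ellipse": "oval",
    "polygon": "plane figure", "oval": "plane figure",
}

def ancestors(node):
    # Path from a node up to the root, including the node itself.
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path

def climb_generalization_tree(values):
    # Most specific common generalization of the given nodes: the deepest
    # node appearing on every node's path to the root.
    common = None
    for v in values:
        chain = ancestors(v)
        common = chain if common is None else [n for n in common if n in chain]
    return common[0] if common else None

# [shape = triangle] ::> K and [shape = rectangle] ::> K generalize to [shape = polygon] ::> K:
print(climb_generalization_tree(["triangle", "rectangle"]))   # -> polygon
print(climb_generalization_tree(["triangle", "circle"]))      # -> plane figure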
The extension against rule

V1 [L = R1] ::> K
V2 [L = R2] ::> ~K          |<          [L ≠ R2] ::> K

where R1 ∩ R2 = ∅
This rule is generally applicable. It is used to take into consideration ‘negative examples’, or, in general,
to maintain consistency. It is a basic rule for determining discriminant class descriptions.
The turning constants into variables rule

V [p(a, v)] ::> K
V [p(b, v)] ::> K
   . . .               (one or more rules)          |<          ∃x V [p(x, v)] ::> K
V [p(i, v)] ::> K
It can be proven that this rule is a special case of the extending reference rule (i). This is a rule of gen-
eral applicability. It is the basic rule used in inductive learning methods employing predicate calculus.
attribute(Pi) - stands for an attribute of the part Pi, e.g., color, size, texture, etc.
Example:
This is a generalization rule, because a set of objects with any two red parts is a superset of a set of
objects with two parts which are red and one part which is blue.
The rule can be extended to a more general form, in which, in addition to the arbitrary context formula V, there is a predicate CONDITION(P1, ..., Pk) which specifies conditions imposed on the variables P1, ..., Pk.
If the arguments of different occurrences of a transitive relation (e.g., the relations 'above', 'left of', 'larger than', etc.) form a chain, i.e., are linearly ordered by the relation, the rule generates descriptors relating to specific objects in the chain. For example:

LST-object - the 'least object', i.e., the object at the beginning of the chain (e.g., the bottom object in the case of the relation 'above')

MST-object - the object at the end of the chain (e.g., the top object)

position(object) - the position of the object in the chain.
Suppose that in the data rules, in the context of condition C, an ascending order of values of a linear descriptor xi corresponds to an ascending (or descending) order of values of another linear descriptor xj with the same quantified arguments. For example, whenever the descriptor weight(P) takes on increasing values, the descriptor length(P) also takes on increasing values. In such situations a two-argument predicate descriptor is generated. If the number of different occurrences of xi and xj is statistically significant, then the 'monotonic' descriptors ↑(xi, xj) and ↓(xi, xj) can be generalized to:
↑(xi, xj) = True, if r(xi, xj) ≥ τ, and False otherwise     (positive correlation)
↓(xi, xj) = True, if r(xi, xj) ≤ -τ, and False otherwise    (negative correlation)

where r(xi, xj) denotes the coefficient of statistical correlation between xi and xj, and τ is a certain threshold, 0 < τ ≤ 1.
The concept of generalization rules is very useful for understanding and classifying different methods of
induction (Dietterich and Michalski [1979]).
1. EASTBOUND TRAINS
2. WESTBOUND TRAINS

Figure 1. Find a rule distinguishing between these two classes of trains.
At the next step, data rules were formulated, which characterized trains in terms of the selected descrip-
tors, and specified the train set to which each train belongs. For example, the data rule for the second east-
bound train was:
∃ car1, car2, car3, car4, load1, load2, ...
[infront(car1,car2)] [infront(car2,car3)] ... [length(car1)=long] &
[car-shape(car1)=engine] [car-shape(car2)=U-shaped] &
[cont-load(car2,load1)] &
[load-shape(load1)=triangle] ... [nrwheels(car3)=2] ... ::> [class=Eastbound]
Background knowledge rules were used to define the structures of structured descriptors (arguments of
descriptors are omitted as irrelevant here):
[car-shape = open rctngl V open trapezoid V U-shaped V dbl open rctngl] => [car-shape = open top]
The criterion of preference was to minimize the number of rules used in describing each class, and, with
secondary priority, to minimize the number of selectors (expressions in brackets) in each rule.
The INDUCE program produced the following inductive assertions*:
Eastbound trains:
It can be interpreted:
Ifa train contains a car which is short and has a closed top then it is an eastbound train.
Alternatively,
It can be interpreted:
Westbound trains:
Either a train has three cars or there is a car with jagged top. (13)
∃car [nr-cars-length-long=2] [position(car)=3] [shape(car)=open-top V jagged-top]
     ::> [class=Westbound]
There are two long cars and the third car has open-top or jagged top.
* It may be a useful exercise for the reader to try at this point to determine his/her own solutions, before reading the computer solutions.
It is interesting to note that the example was constructed with rules (12) and (13) in mind. The rule (11)
which was found by the program as an alternative was rather surprising because it seems to be conceptually
simpler than rule (12). This observation confirms the thesis of this research that the combinatorial part of an
induction process can be successfully handled by a computer program, and, therefore, programs like the above
have a potential to serve as a useful aid to induction processes in various practical problems.
The descriptors underlined by the dotted lines ('nr-cars-length-long') are new descriptors, generated as a result of constructive induction. How were they generated? The constructive generalization rules are imple-
mented as modules which scan the data rules and search for certain properties. For example, the counting rule
of constructive generalization checks for each unary descriptor (e.g., length(car)) how many times a given
value of the descriptor repeats in the data rules.
In our example, it was found that the selector [length(car) =long] occurs for two quantified variables in
every Westbound train, and therefore a new descriptor called ’nr-cars-length-long’ was generated, and a new
selector [nr cars-length-long=2] was formed. This selector, after passing a ‘relevance test’, was included in the
set of potentially useful selectors. The relevance test requires that a selector is satisfied by a sufficiently large
number of positive examples and a sufficiently small number of negative examples (see the details later). Dur-
ing the generation of alternative assertions, this selector was used as one of the conditions in the assertion (14).
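As an illustration of the counting rule just described, the following sketch (hypothetical data structures: each train is a list of car descriptions given as attribute-value dictionaries) counts, for a unary descriptor, how many cars of each example take a given value and proposes a derived selector when the count is shared by all examples:

def counting_rule(examples, descriptor, value):
    # For each example (a list of part descriptions), count how many parts
    # satisfy [descriptor = value]; if the count is the same in every example,
    # propose the derived selector [nr-cars-<descriptor>-<value> = count].
    counts = [sum(1 for part in ex if part.get(descriptor) == value) for ex in examples]
    if len(set(counts)) == 1:
        return ("nr-cars-%s-%s" % (descriptor, value), counts[0])
    return None

# Two simplified, hypothetical Westbound trains, each with two long cars:
westbound = [
    [{"length": "long"}, {"length": "long"}, {"length": "short"}],
    [{"length": "long"}, {"length": "short"}, {"length": "long"}, {"length": "short"}],
]
print(counting_rule(westbound, "length", "long"))
# -> ('nr-cars-length-long', 2): a candidate selector [nr-cars-length-long=2],
#    which must still pass the relevance test against the Eastbound trains.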
The descriptor ’position(car)’ was found by the application of the chain rule.
Now, how does the whole program work? Various versions of the program were described in (Larson
[1977], Michalski [1978], Dietterich [1978]). Appendix 2 provides a description of the top-level algorithm.
Here we will give a summary of the main ideas, their limitations, and describe some problems for future
research.
The work of the program can be viewed essentially as the process of applying generalization rules, infer-
ence rules (describing the problem environment) and constructive generalization (generating new descriptors)
to the data rules, in order to determine inductive assertions which are consistent and complete. The preference
criterion is used to select the most preferable assertions which constitute the solution.
The process of generating inductive assertions is inherently combinatorially explosive, so the major ques-
tion is how to guide this process in order to detect quickly the most preferable assertions.
As described in Appendix 2, the first part of the program generates (by putting together the ’most
relevant’ selectors step-by-step) a set of consistent c-formulas. A simple relevance test for a selector is to have
a large difference between the number of data rules covered by the selector in the given generalization class and
the number of rules covered in other generalization classes.
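The relevance test can be pictured as a simple coverage-difference score; the sketch below is a simplified stand-in for the program's criterion, with hypothetical attribute-value data:

def satisfies(example, selector):
    descriptor, allowed = selector
    return example.get(descriptor) in allowed

def relevance(selector, class_rules, other_rules):
    # Difference between the number of data rules of the given class covered by
    # the selector and the number of rules of the other classes it covers.
    # A large positive value marks the selector as 'relevant'.
    pos = sum(1 for ex in class_rules if satisfies(ex, selector))
    neg = sum(1 for ex in other_rules if satisfies(ex, selector))
    return pos - neg

eastbound = [{"car-length": "short"}, {"car-length": "short"}]
westbound = [{"car-length": "long"},  {"car-length": "long"}, {"car-length": "short"}]
print(relevance(("car-length", {"short"}), eastbound, westbound))   # 2 - 1 = 1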
C-formulas are represented as labelled graphs. Testing them for consistency (i.e., for null intersection of
descriptions of different generalization classes) or for the degree of coverage of the given class is done by deter-
mining the subgraph isomorphism. By taking advantage of the labels on nodes and arcs, this operation was greatly simplified. Nevertheless, it consumes much time and space.
In the second part, the program transforms the consistent c-formulas into VL1 events (i.e., sequences of values of certain many-valued variables (Michalski [1973])), and further generalization is done using the AQVAL generalization procedure (Michalski and Larson [1978]). During this process, the extension against, closing the interval and climbing generalization tree rules are applied. The VL1 events are represented as binary strings, and most of the operations done during this process are simple logical operations on binary strings. Consequently, this part of the algorithm is very fast and efficient. Thus, the high efficiency of the program is due to the transformation of the data structures representing the rules into a more efficient form in the second part of the algorithm (after determining consistent generalizations).
A disadvantage of this algorithm is that the extension of references of selectors (achieved by the application of the extension against, the closing interval and the climbing generalization tree rules) is done after a supposedly relevant set of selectors has been determined. It is possible that a selector from the initial data rules or one
generated by constructive generalization rules that did not pass the ’relevance test’, could still turn out to be
very relevant, if its reference were appropriately generalized. Applying the above generalization rules to each
selector represented as a graph structure (i.e., before the AQVAL procedure takes over) is, however, computa-
tionally very costly, and we decided against it in the INDUCE program. This problem will be aggravated when
the number of constructive generalization rules generating derived descriptors is increased. We plan to seek
solutions to this problem by designing a better descriptor relevance test, determining more adequate data struc-
tures for representing selectors and testing intersections of descriptions, and by applying problem background
knowledge.
Another interesting problem is how to provide an inductive program with the ability to discover relevant
derived descriptors, which are arithmetic expressions involving the input variables and to integrate them as
parts of inductive assertions. For example, suppose that the Eastbound trains in Figure 1 are characterized as:
When the train has 3 cars, the load of the first two cars is twice the total load of Westbound cars, and
when the train is longer, the load of the first two cars is equal the total load of Westbound cars!
How would one design an efficient algorithm which could discover such an assertion?
Let us now consider a problem of describing, say, the Eastbound trains not in the context of Westbound
trains, but in the context of every possible train which is not Eastbound. This is a problem of determining a
characteristic description of Eastbound trains (type Ia).
A trivial solution to this problem is a ’zero degree generalization’ description, which is the disjunction of
descriptions of individual trains. A more interesting solution (although still of "zero degree generalization’)
would be some equivalence preserving transformation of such a disjunction, which would produce a computa-
tionally simpler description. Allowing a ’non-zero degree generalization’ leads us to a great variety of possibili-
ties, called the version space (Mitchell [1978]). As we mentioned before (Sec. A.1), the most studied solution
is to determine the most specific conjunctive generalization (i.e., the longest list of common properties).
Another solution is to determine the description of minimal cost whose degree of generality is under certain
threshold (Stepp [1978]). INDUCE 1.1 gives a solution of the first type, namely, it produces a set of the most
specific (longest) c-formulas (quantified logical products of VL21 selectors). Here are examples of such formulas:
(In every train there is a short car with closed top and two wheels)

∃car [position(car)=2] [car-shape(car)=open-top]

[nr-cars=4 V 5]
(number of cars is 4 or 5)

∀car [nr-wheels(car)=2 V 3]
Problems of this type have been intensively studied in the area of cluster analysis and pattern recognition
(as 'learning without a teacher', or unsupervised learning). The methods which have been developed in these areas partition the entities into clusters, such that the entities within each cluster have a high 'degree of similarity', and entities of different clusters have a low 'degree of similarity'. The degree of similarity between two
entities is typically a function (usually a reciprocal of a distance function), which takes into consideration only
properties of these entities and not their relation to other entities, or to some predefined concepts. Conse-
quently, clusters obtained this way rarely have any simple conceptual interpretation.
In this section we will briefly describe an approach to clustering which we call conceptual clustering. In this
approach, entities are assembled into a single cluster, if together they represent some concept from a pre-
defined set of concepts. For example, consider the set of points shown in Figure 2.
Figure 2. A collection of points that a human would typically describe as 'a circle on a straight line'.
A typical description of this set by a human is something like ’a circle on a straight line’. Thus, the points
A and B, although closer to each other than to any other points, will be put into different clusters, because they
are parts of different concepts.
Since the points in Figure 2 do not constitute a complete circle and straight line, the obtained conceptual
clusters represent generalizations of these data points. Consequently, conceptual clustering can be viewed as a
form of generalization of symbolic descriptions, similarly as problems of learning from examples. The input
rules are symbolic descriptions of the entities in the collection. To interpret this problem as a special case of
the paradigm in Sec. A.1, the collection is considered as a single generalization class.
If the concepts into which the collection is to be partitioned are defined as c-formulas, then the generalization rules discussed before would apply. The restriction imposed by the problem is that the c-formulas must not logically intersect, as each cluster should be disjoint from the other clusters.
We will describe here briefly an algorithm for determining such a clustering, assuming that the concepts
are simpler constructs than C-formulas, namely, non-quantified C-formulas with unary selectors, i.e., logical pro-
ducts of such selectors. Unary selectors are relational statements:

[xi = Ri]     or     [xi ≠ Ri]

where xi is a variable and Ri is a subset of its value set. A selector [xi = Ri] ([xi ≠ Ri]) is satisfied by a value of xi if this value is in relation = (≠) with some (all) values from Ri. Such restricted c-formulas are called VL1 complexes or, briefly, complexes (Michalski [1980]).
Individual entities are assumed to be described by events, which are sequences of values of the variables xi:

(a1, a2, ..., an)

where ai ∈ Dom(xi), and Dom(xi) is the value set of xi, i = 1, 2, ..., n. An event e is said to satisfy a complex if the values of the xi in e satisfy all its selectors. Suppose E is a set of observed events, each of which satisfies a complex C. If there exist events satisfying C which are not in E, then they are called unobserved events. The number of unobserved events in a complex is called the absolute sparseness of the complex. We will consider the following problem: given an event set E and an integer k, determine k pairwise disjoint complexes which together cover all events in E and whose total sparseness is minimal.
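For concreteness, the sketch below (a hypothetical representation in which a complex maps each constrained variable to its admitted value set and an event is a value tuple) computes the absolute sparseness of a complex over a finite event space:

from itertools import product

def events_of(complex_, domains):
    # All events (value tuples over x1..xn) satisfying the complex; a variable
    # not constrained by the complex ranges over its whole domain.
    variables = sorted(domains)
    return set(product(*[sorted(complex_.get(v, domains[v])) for v in variables]))

def absolute_sparseness(complex_, observed, domains):
    # Number of unobserved events contained in the complex.
    covered = events_of(complex_, domains)
    return len(covered - set(observed))

# Event space of the Figure 3 example: |D(x1)|=2, |D(x2)|=5, |D(x3)|=4, |D(x4)|=2.
domains = {"x1": range(2), "x2": range(5), "x3": range(4), "x4": range(2)}
complex_ = {"x1": {0}, "x2": {0, 1}}            # [x1=0][x2=0 v 1]; x3, x4 unconstrained
observed = [(0, 0, 2, 0), (0, 1, 3, 1)]         # two hypothetical data events
print(absolute_sparseness(complex_, observed, domains))   # 16 cells covered, 2 observed -> 14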
The theoretical basis and an algorithm for solving this problem (in a somewhat more general formulation, where the clustering criterion is not limited to sparseness) are described in Michalski [1980]. The algorithm is iterative, and its general structure is based on the dynamic clustering method (Diday and Simon [1976]). Each step starts with k specially selected data events, called seeds. The seeds are treated as representatives of k classes, and in this way the problem is reduced to essentially a classification problem (type Ib). The step ends with the determination of a set of k complexes defining a partition of E. From each such complex a new seed is selected, and the obtained set of k seeds is the input to the next iteration. The algorithm terminates with a k-partition of E, defined by k complexes, which have the minimum or a sub-minimal total sparseness (or, generally, optimize the assumed cost criterion).
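The seed-based iteration itself can be sketched as follows; this is a deliberately simplified stand-in for CLUSTER/PAF in which events are assigned to the nearest seed by Hamming distance, each group is described by the smallest complex covering it, disjointness repair and the full optimality criterion are omitted, and all data are hypothetical:

from itertools import product

def events_of(cpx, domains):
    vars_ = sorted(domains)
    return set(product(*[sorted(cpx.get(v, domains[v])) for v in vars_]))

def sparseness(cpx, events, domains):
    return len(events_of(cpx, domains) - set(events))

def bounding_complex(group, domains):
    # Smallest complex (per-variable set of observed values) covering a group.
    vars_ = sorted(domains)
    return {v: {e[i] for e in group} for i, v in enumerate(vars_)}

def cluster(events, k, domains, iterations=4):
    # Classify events by their nearest seed (a classification step, type Ib),
    # describe each class by a complex, reseed, and keep the set of complexes
    # with the smallest total sparseness seen so far.
    seeds = events[:k]
    best = None
    for _ in range(iterations):
        groups = {i: [] for i in range(k)}
        for e in events:
            nearest = min(range(k), key=lambda j: sum(a != b for a, b in zip(e, seeds[j])))
            groups[nearest].append(e)
        complexes = [bounding_complex(g, domains) for g in groups.values() if g]
        total = sum(sparseness(c, events, domains) for c in complexes)
        if best is None or total < best[0]:
            best = (total, complexes)
        # new seeds: one representative event per class (here simply the first one)
        seeds = [g[0] if g else seeds[i] for i, g in groups.items()]
    return best

domains = {"x1": range(2), "x2": range(5), "x3": range(4), "x4": range(2)}
events = [(0, 0, 0, 0), (1, 4, 3, 1), (0, 1, 0, 0), (0, 0, 2, 0), (1, 3, 2, 1)]
total, complexes = cluster(events, k=2, domains=domains)
print(total, complexes)   # -> total sparseness 3 and two complexes, one per group of events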
Figure 3 presents an example illustrating this process. The space of all events is defined by variables x1, x2, x3 and x4, with the sizes of their value sets 2, 5, 4 and 2, respectively. The space is represented as a diagram, where each cell represents an event. For example, the cell marked e represents the event (0,0,2,0).
Figure 3. The event space and the complexes obtained in four consecutive iterations of the algorithm: (a) iteration 1: sparseness = 18, imbalance = 1.6, dimensionality = 3; (b) iteration 2: sparseness = 20, imbalance = 3.6, dimensionality = 3; (c) iteration 3: sparseness = 12, imbalance = 7.3, dimensionality = 4; (d) iteration 4: sparseness = 16, imbalance = 3.6, dimensionality = 3.
Cells marked by a vertical bar represent data events, while the remaining cells represent unobserved events. Figure 3a also shows the complexes obtained in the first iteration. Cells representing seed events in each iteration are marked by +. So in the first iteration, assuming k=3, three seeds were chosen (the cells marked by + in Figure 3a). The complexes determined in this iteration are a1, a2 and a3 (Figure 3a), with the total sparseness 18 (the total number of unmarked cells in these complexes). Figures 3b, c, and d show the results of the three consecutive iterations. The solution with the minimum sparseness is shown in Figure 3c; it consists of the three complexes shown in that figure.
This result was obtained by program CLUSTER/PAF* implementing the algorithm in PASCAL language
on Cyber 175 (Michalski and Stepp [1982]).
Another experiment with the program involved clustering 47 cases of soybean diseases. These cases
represented four different diseases, as determined by plant pathologists (the program was not, of course, given
this information). Each case was represented by an event of 35 many-valued variables. With k=4, the pro-
gram partitioned all cases into four categories. These four categories turned out to be precisely the categories
corresponding to individual diseases. The complexes defining the categories contained known characteristic
symptoms of the corresponding diseases.
Program CLUSTER/PAF is very general and could be useful for a variety of tasks that require a determi-
nation of intrinsically disjunctive descriptions. For example, such tasks are splitting a goal into subgoals, dis-
covering useful subcases in a collection of computational processes, partitioning specific facts into conceptual
categories, formulating cases in program specification.
F. Summary
We have presented a view of inductive inference as a process of generalization of symbolic descriptions.
The process is conducted by applying generalization rules and the background knowledge rules (representing prob-
lem specific knowledge) to the initial and intermediate descriptions. It is shown that both learning from exam-
ples and learning from observation can be viewed this way.
A form of learning from observation, called 'conceptual clustering', was described, which partitions a collection of entities into clusters, such that each cluster represents a certain concept. The presented methods for
learning from examples (INDUCE) and automated conceptual clustering (CLUSTER/PAF) generate logic-style
descriptions that are easy to comprehend and interpret by humans. Such descriptions can be viewed as formal
specifications of programs solving tasks in the area of computer-based decision making and analysis of complex
data.
G. Acknowledgement
A partial support of this research was provided by the National Science Foundation under grants MCS79-06614 and MCS82-05116. The author is grateful to Robert Stepp for useful discussions and for
proofreading the paper.
APPENDIX 1
Data rules, hypotheses, problem environment descriptions, and generalization rules are all expressed using the same formalism, that of the variable-valued logic calculus VL21*. VL21 is an extension of predicate calculus designed to facilitate a compact and uniform expression of descriptions of different degrees and different types of generalization. The formalism also provides a simple linguistic interpretation of descriptions without losing the precision of the conventional predicate calculus.

There are three major differences between VL21 and the first order predicate calculus (FOPC):
1. In place of predicates, it uses selectors (or relational statements) as basic operands. A selector, in the most
general form, specifies a relationship between one or more atomic functions and other atomic functions or
constants. A common form of a selector is a test to ascertain whether the value of an atomic function is a
specific constant or is a member of a set of constants.
The selectors represent compactly certain types of logical relationships which can not be directly
represented in FOPC but which are common in human descriptions. They are particularly useful for
representing changes in the degree of generality of descriptions and for syntactically uniform treatment of
descriptors of different types.
2. Each atomic function (a variable, a predicate, a function) is assigned a value set (domain), from which it
draws values, together with a characterization of the structure of the value set.
This feature facilitates a representation of the semantics of the problem and the application of generaliza-
tion rules appropriate to the type of descriptors.
3. An expression in VL21 can have a truth-status: TRUE, FALSE or ? (UNKNOWN).
The truth-status '?' provides an interpretation of a VL21 description in situations when, e.g., the outcomes of some measurements are not known.
An atomic function is a variable, or a function symbol followed by a pair of parentheses which enclose a
sequence of atomic functions and/or constants. Atomic functions which have a defined interpretation in the
problem under consideration are called descriptors.
A constant differs from a variable or a function symbol in that its value set is empty. If confusion is possi-
ble, a constant is typed in quotes.
* VL21 is a subset of a more complete system VL2, which is a many-valued logic extension of predicate calculus.
Examples

Constants: 2, 'red'
Atomic forms: x1, color(box), on-top(p1,p2), f(x1, g(x2))

Exemplary value sets:
D(x1) = {0, 1, ..., 10}
D(color) = {red, blue, ...}
D(on-top) = {true, false}
D(f) = {0, 1, ..., 20}
A selector is a form

[L # R]

where

L - called the referee, is an atomic function, or a sequence of atomic functions separated by '.' (the operator '.' is called the internal conjunction);

# - is one of the following relational operators:
Linguistic interpretation
A VL21 expression (or, here, simply a VL expression) is defined by the following rules:
The truth-status of
Figure A1. Definition of the connectives ~, &, V and → in VL21.
An expression of the form

QF1 C1  V  QF2 C2  V  ...  V  QFk Ck

where QFi is a quantifier form ∃x1, x2, ... or ∀x1, x2, ..., and Ci is a conjunction of selectors (called a complex), is called a disjunctive simple VL expression (a DVL expression).
To make possible a name substitution operation, the following notation is adopted:
If FORMULA is an arbitrary VL21 expression, then V: FORMULA assigns the name V to the FORMULA.
If FORMULA is a VL21 expression containing quantified variables P1, P2, ..., Pk, and V is the name of the expression, then

Pi - V

denotes the quantified variable Pi in the FORMULA.
The latter construct enables one to refer in one expression to quantified variables inside of other expres-
sions.
APPENDIX 2
The top-level algorithm of the INDUCE program can be summarized as the following sequence of steps.

1. At the first step, the data rules whose condition parts are in disjunctive simple form are transformed to a new set of rules whose condition parts are in the form of c-expressions. A c-expression (a conjunctive expression) is a product of selectors accompanied by zero or more quantifier forms, i.e., forms QF x1, x2, ..., where QF denotes a quantifier. (Note that, due to the use of the internal disjunction and quantifiers, a c-expression represents a more general concept than a conjunction of predicates.)

2. A generalization class is selected, say K1, and all c-expressions associated with this class are put into a set F1, and all remaining c-expressions are put into a set F0 (the set F1 represents events to be covered, and the set F0 represents constraints, i.e., events not to be covered).

3. By application of inference rules representing background knowledge and of constructive generalization rules, new selectors are generated. The most promising selectors (according to the preference criterion) are added to the c-expressions in F1 and F0.

4. A c-expression is selected from F1, and a set of consistent generalizations (a restricted star) of this expression is generated (Michalski and Stepp [1982]). This is done by starting with single selectors (called 'seeds'), selected from this c-expression as the most promising ones (according to the preference criterion). In each subsequent step, a new selector is added to the c-expression obtained in the previous step (initially the seeds), until a specified number (parameter NCONSIST) of consistent generalizations is determined. Consistency is achieved when a c-expression has NULL intersection with the set F0. This 'rule growing' process is illustrated in Fig. A2 (a small sketch of such a loop is given after this list).

5. The obtained c-expressions, and the c-expressions in F0, are transformed to two sets E1 and E0, respectively, of VL1 events (i.e., sequences of values of certain discrete variables).

6. A procedure for generalizing VL1 descriptions is then applied to obtain the 'best cover' (according to a user-defined criterion) of the set E1 against E0 (the procedure is a version of the AQVAL/1 program, Michalski and Larson [1978]). During this process, the extension against, the closing the interval and the climbing generalization tree rules are applied.

7. The result is transformed to a new set of c-expressions (a restricted star) in which the selectors now have appropriately generalized references.

8. If a c-expression completely covers F1, then the process repeats for another decision class. Otherwise, the set F1 is reduced to contain only the uncovered c-expressions, and steps 4 to 7 are repeated for the same generalization class.
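To make the rule-growing step concrete, here is a minimal sketch (not the INDUCE implementation; attribute-value c-expressions, hypothetical data and a plain breadth-first frontier are assumed) of growing consistent generalizations from a seed selector until they no longer intersect F0:

def covers(expr, example):
    # expr: dict attribute -> admitted value set; example: dict attribute -> value.
    return all(example.get(a) in vals for a, vals in expr.items())

def grow_consistent(seed, candidate_selectors, F1, F0, nconsist=2):
    # Grow c-expressions from a seed selector by adding one selector at a time
    # until they cover no event of F0; stop after nconsist consistent expressions.
    consistent, frontier = [], [dict([seed])]
    while frontier and len(consistent) < nconsist:
        expr = frontier.pop(0)
        if not any(covers(expr, e) for e in F0):
            consistent.append(expr)
            continue
        for attr, vals in candidate_selectors:
            if attr not in expr:
                extended = dict(expr)
                extended[attr] = vals
                # prefer extensions that still cover something in F1
                if any(covers(extended, e) for e in F1):
                    frontier.append(extended)
    return consistent

F1 = [{"length": "short", "top": "closed"}]            # class to be covered
F0 = [{"length": "long", "top": "closed"},             # events not to be covered
      {"length": "long", "top": "open"}]
selectors = [("length", {"short"}), ("top", {"closed"})]
print(grow_consistent(("top", {"closed"}), selectors, F1, F0))

In INDUCE itself the partial expressions are labelled graphs and coverage is tested by subgraph isomorphism, so the bookkeeping is considerably more involved than in this sketch.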
The implementation of the inductive process in INDUCE-1.1 consists of a large collection of specialized algorithms, each accomplishing certain tasks. Among the most important tasks are:
1. 'Growing' rules.

2. Testing whether one c-expression is a generalization of ('covers') another c-expression. (This is done by testing for subgraph isomorphism.)

3. Generalizing a c-expression by extending the selector references and forming irredundant c-expressions (this includes an application of the AQVAL/1 procedure).

Program INDUCE-1.1 has been implemented in PASCAL (for the Cyber 175); its description is given in Larson [1977] and Dietterich [1978].
Figure A2. Illustration of the 'rule growing' process. O - a discarded c-rule; ● - an active c-rule.
References
Banerji [1977]
R.B. Banerji, ‘‘Learning in structural description languages,’’ Temple University Report to NSF Grant MCS
716—0—200 (1977).
Biermann [1978]
A.W. Biermann, "The inference of regular LISP programs from examples," IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-8(8) (Aug. 1978) pp. 585-600.
Brachman [1978]
R.J. Brachman, "On the epistemological status of semantic networks," Report No. 3807, Bolt, Beranek and Newman (April 1978).
Carnap [1962]
R. Carnap, "The aim of inductive logic," in Logic, Methodology and Philosophy of Science, E. Nagel, P. Suppes, and A. Tarski, eds., Stanford, California: Stanford University Press (1962) pp. 303-318.
Cohen [1977]
B.L. Cohen, ‘‘A powerful and efficient structural pattern recognition system,’ Artificial Intelligence, Vol. 9,
No. 3 (December 1977).
Davis [1976]
R. Davis, ‘‘Applications of Meta-level knowledge to the construction, maintenance, and use of large knowledge
bases,’ Report No. 552, Computer Science Department, Stanford University (July 1976).
Dietterich [1980]
T.G. Dietterich, "A methodology of knowledge layers for inducing descriptions of sequentially ordered events," Report No. 80-1024, Department of Computer Science, University of Illinois (May 1980).
Dietterich [1978]
T. Dietterich, ‘‘Description of Inductive program INDUCE 1.1,’’ Internal Report, Department of Computer
Science, University of Illinois at Urbana-Champaign (October 1978).
Hayes-Roth [1976]
F. Hayes-Roth, ‘‘Patterns of induction and associated knowledge acquisition algorithms,’’ Pattern Recognition
and Artificial Intelligence, ed. C. Chen, Academic Press, New York (1976).
Hedrick [1974]
C.L. Hedrick, ‘‘A computer program to learn production systems using a semantic net,’ Ph.D. Thesis, Depart-
ment of Computer Science, Carnegie-Mellon University, Pittsburg (July 1974).
Larson [1977]
J. Larson, "INDUCE-1: An interactive inference program in VL21 logic system," Report No. UIUCDCS-R-77-876, Department of Computer Science, University of Illinois, Urbana, Illinois (May 1977).
Lenat [1976]
D. Lenat, ‘‘AM: An artificial intelligence approach to discovery in mathematics as heuristic search,” Computer
Science Department, Report STAN—CS—76—570, Stanford University (July 1976).
Michalski [1980]
R.S. Michalski, "Knowledge acquisition through conceptual clustering: A theoretical framework and an algorithm for partitioning data into conjunctive concepts," special issue on knowledge acquisition and induction, Inter. Journal on Policy Analysis and Information Systems, No. 3, 1980. (Also Report No. 80-1026, Department of Computer Science, University of Illinois, May 1980.)
Michalski [1980]
R.S. Michalski, "Pattern recognition as rule-guided inductive inference," IEEE Trans. on Pattern Analysis and Machine Intelligence (July 1980).
Michalski [1978]
R.S. Michalski, ‘‘Pattern recognition as knowledge-guided computer induction,’ Report No. 78—927, Depart-
ment of Computer Science, University of Illinois, Urbana, Illinois (June 1978).
Michalski [1973]
R.S. Michalski, ‘‘AQVAL/1—Computer implementation of a variable-valued logic system and the application to
pattern recognition,’’ Proceedings of the First International Joint Conference on Pattern Recognition, Washington,
D.C., (October 30—November 1, 1973).
Michie [1977]
D. Michie, ‘‘New face of artificial intelligence,’’ Information 3 (1977) pp. 5-11.
Mitchell [1978]
T.M. Mitchell, "Version spaces: An approach to concept learning," Doctor of Philosophy Thesis, Stanford University (1978).
Morgan [1975]
C.G. Morgan, "Automated hypothesis generation using extended inductive resolution," Advance Papers of the 4th International Joint Conference on Artificial Intelligence, Vol. I, Tbilisi, Georgia (September 1975).
Pettorossi [1980]
A. Pettorossi, ‘‘An algorithm for reducing memory requirements in recursive programs using annotations,”
IBID.
Shortliffe [1974]
E.H. Shortliffe, "A rule-based computer program for advising physicians regarding antimicrobial therapy selection," Ph.D. Thesis, Computer Science Department, Stanford University (Oct. 1974).
Smith [1980]
D.R. Smith, ‘‘A survey of the synthesis of LISP programs from examples,’’ International Workshop on Pro-
gram Construction, Bonas (Sept. 1980).
Stepp [1978]
R. Stepp, ‘‘The investigation of the UNICLASS inductive program AQ7UNI and user’s guide,’ Report No.
949, Department of Computer Science, University of Illinois, Urbana, Illinois (November 1978).
Vere [1975]
S.A. Vere, ‘“‘Induction of concepts in the predicate calculus,’’ Advance Papers of the 4th International Joint
Conference on Artificial Intelligence, Vol. I, pp. 351—356, Tbilisi, Georgia (September 1975).
Winston [1970]
P.H. Winston, "Learning structural descriptions from examples," Technical Report AI TR-231, MIT AI Lab, Cambridge, Massachusetts (1970).
Appendix
Alan W. Biermann
Duke University
Durham, NC 27706
A. Introduction
The details of some of the examples from Chapter 1 will be completed here.
P(a)
R(a,z) z
Thus, if P(a) is true, we seek z such that R(a,z) and that z is the desired output.
The deductive approach provides techniques for adding assertions and goals with their appropriate output
entries to the initial sequent. The aim is to deduce a goal of ‘‘true”’ with a corresponding output entry in terms
of primitive machine instructions. For the example problem, a sequent entry will be deduced giving the target
program which removes the negative numbers from a list.
goal output
The synthesis begins by stating the constraints of the problem in the form of transformations:
Rule (1) specifies the required output if the input is NIL. Rules (2) and (3) give properties about relation R
necessary for the synthesis. Rule (2) asserts that if a is not NIL with a first element that is negative, that first element may be ignored in the calculation. Rule (3) asserts that if a is not NIL, the first element is not negative, and the answer u is known for f(cdr(a)), then the first element should be added to the front of the list u.
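The program being derived is the familiar 'delete the negative numbers from a list' function. As a point of reference, a direct sketch of the behavior characterized by rules (1)-(3), written in Python rather than the LISP-style notation of the derivation, is:

def f(a):
    # Remove the negative numbers from the list a, mirroring the three rules:
    # (1) f(NIL) = NIL, (2) skip a negative first element, (3) keep a
    # non-negative first element and recur on the rest.
    if not a:                      # rule (1): the input is NIL
        return []
    if a[0] < 0:                   # rule (2): a negative car is ignored
        return f(a[1:])
    return [a[0]] + f(a[1:])       # rule (3): cons the car onto f(cdr(a))

print(f([-7, 2, 9, -3, 4]))        # -> [2, 9, 4]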
The transformation (1) can be applied to the original goal.
goal                              output

R(a,z)                            z

Applying (1) yields the new entry

goal                              output

(4)  a=NIL                        NIL

indicating that if the goal a=NIL is achieved, the output NIL should be returned.
Similarly, transformation rules (2) and (3) can be applied to obtain additional entries in the sequent.
goal output
Next, we use the method for introducing recursion into the program synthesis. We noticed in (5) that a
special case of the original goal R(a,z) has occurred. This leads one to suspect that an inductive argument
might succeed so a new assertion is created, an inductive hypothesis.
assertion
(7) if v<a then
if P(v) then
R(v,f(v))
Thus, by the usual inductive argument, we assume the correctness of f for all v smaller than a by some meas-
ure < and try to prove it for a. The recursion results from a GA-resolution between (5) and (7) with the substitution v ← cdr(a) and u ← f(v).
goal                                                      output

not(a=NIL) ∧ neg(car(a)) ∧ true ∧
not(if cdr(a) < a then
       if P(cdr(a)) then false)                           f(cdr(a))
This reduces to

goal                                                      output

not(a=NIL) ∧ neg(car(a))                                  f(cdr(a))
Recall that we are proceeding to create a program f(a), and note that a recursive call to that program was
entered in the output column by this step. Thus we have seen how an inductive argument leads to a looping structure using this formalism.
A similar resolution of (6) and (7) yields

goal                                                      output

(9)  not(a=NIL) ∧ not(neg(car(a)))                        cons(car(a), f(cdr(a)))
goal output
which reduces to
goal output
The final program can be constructed using GG resolution on (4) and (10). The above steps can, in fact, be
assembled to become a complete derivation of the target program as follows:
INITIALIZE PROBLEM     f
INPUT VARIABLE         x
INPUT CONDITION        list(x)
OUTPUT CONDITION       if x=NIL then f(x)=NIL
                       otherwise f(x)=y where
                       ∀u [member(u,y) <=> (member(u,x) ∧ not(neg(u)))]
We would expect the system to begin by generating code to handle the trivial case and then seek a recur-
sive solution to the general case. Following this strategy would yield the code
f(x) = if x = NIL then NIL else g(x)
where g(x) is a program yet to be created.
Bibel and Hornig argue that there are relatively few practical recursion schemes and that a program syn-
thesizer need only try those few. Their method would pose the problem, find g(x) such that
Here V stands for an exclusive 'or' operation. This is rewritten to consider the two cases separately, not(member(car(x),y)) and member(car(x),y).
g(x) = y where x≠NIL ∧ list(y) ∧
    [ ∀u [member(u,y) <=> (member(u,x) ∧ nonneg(u))]
      ∧ not(member(car(x),y)) ]
    V
    [ ∀u [member(u,y) <=> (member(u,x) ∧ nonneg(u))]
      ∧ member(car(x),y) ]
Next, the system uses domain knowledge of the types (2) and (3) from the previous section to obtain a form
where the recursion can be found.
(For simplicity, we have omitted some of the details concerning the case x=NIL. We leave it to the reader to
fill them in.) This yields
Finally, the Bibel-Hornig method attempts to discover the condition under which not(member(car(x),y)) is
true. It generates a typical model and uses a theorem proving technique to show that the desired condition is
(neg(car(x))). Thus the function g is synthesized as
z=cons(car(cdr(x)),
cons(car(cdr(cdr(x))),
cons(car(cdr(cdr(cdr(cdr(x))))),
NIL)))
One can create the desired program by first breaking this expression into a set of primitive forms and then per-
forming a merge operation on these primitives.
f1(x) = cons(f2(x), f5(x))
f2(x) = f3(cdr(x))
f3(x) = f4(car(x))
f4(x) = x
f5(x) = cons(f6(x), f10(x))
f6(x) = f7(cdr(x))
f7(x) = f8(cdr(x))
f8(x) = f9(car(x))
f9(x) = x
f10(x) = cons(f11(x), f17(x))
f11(x) = f12(cdr(x))
f12(x) = f13(cdr(x))
f13(x) = f14(cdr(x))
f14(x) = f15(cdr(x))
f15(x) = f16(car(x))
f16(x) = x
f17(x) = NIL
It turns out that this trace represents a rather inefficient calculation. Various car and cdr operations can be
saved as explained in Chapter 17 by giving these operations precedence. Following the method of Chapter 17,
the trace is revised to this.
f1(x) = f2(cdr(x))
f2(x) = cons(f3(x), f5(x))
f3(x) = f4(car(x))
f4(x) = x
f5(x) = f6(cdr(x))
f6(x) = cons(f7(x), f9(x))
f7(x) = f8(car(x))
f8(x) = x
f9(x) = f10(cdr(x))
f10(x) = f11(cdr(x))
f11(x) = cons(f12(x), f14(x))
f12(x) = f13(car(x))
f13(x) = x
f14(x) = NIL
Proceeding with the synthesis using the above fourteen functions, every possible merger is attempted. If
a function becomes ambiguously defined due to a merger, a conditional is introduced to remove the problem.
That is, if g becomes defined both as g = f1 and g = f2, then an attempt is made to define g as

    g = cond((p1, f1),
             (p2, f2))

If predicates p1 and p2 can be defined so that the definition is successful, the new definition of g is maintained.
However, if they cannot be found, then the merger is a failure and is discarded. Mergers may also be disal-
lowed if they introduce infinite loops or contradictions as will be illustrated. The final program is the code
remaining after all possible mergers have been completed.
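The merging step can be sketched as follows; this is a simplified stand-in for the procedure of Chapter 17 in which each trace function is represented only by the inputs on which it was observed and the 'predicate generator' is reduced to a caller-supplied test:

def try_merge(calls_f1, calls_f2, predicate):
    # Attempt to merge two trace functions given the inputs on which each was
    # observed. The merger succeeds if the supplied predicate separates the two
    # sets of inputs; the merged function is then cond((p, body1), (not p, body2)).
    if all(predicate(x) for x in calls_f1) and not any(predicate(x) for x in calls_f2):
        return ("cond", ("p", "body of first function"), ("not p", "body of second function"))
    return None            # ambiguously defined: the merger fails and is discarded

neg_head = lambda x: x[0] < 0
# From the example trace: f1 was applied to (-7 2 9 -3 4) and f2 to (2 9 -3 4).
print(try_merge([(-7, 2, 9, -3, 4)], [(2, 9, -3, 4)], neg_head))   # merger succeeds
print(try_merge([(2, 9, -3, 4)], [(9, -3, 4)], neg_head))          # no separation: merger fails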
The example synthesis thus begins with an attempt to merge f1 and f2.

    f1(x) = cond((p1, f1(cdr(x)))
                 (p2, cons(f3(x), f5(x))))

But from the example, we see that the branch f1(cdr(x)) is to be taken when x = (-7 2 9 -3 4) and the other is taken when x = (2 9 -3 4). So the predicate generator produces the pi's and the code becomes
The merger of f1 and f2 appears to be successful. (It is possible a later contradiction would result in back-up and a return to separate f1 and f2.)

Next we attempt to merge f1 and f3, but this leads to an infinite cons loop if car(x) is not negative. So this merger fails. Function f3 will be distinct from f1. Similarly f5 must be distinct from f2.
We could attempt to merge f3 and f5 to obtain a combined cond definition, but both branches are taken for x = (2 9 -3 4), so this merger fails also. Thus, the code
f3(x) = f4(car(x))
f5(x) = f6(cdr(x))
In fact, subsequent mergers follow easily and the above program is the final program. The final partition on the fourteen functions becomes {f1, f2, f5, f6, f9, f10, f11, f14}, {f3, f7, f12}, and {f4, f8, f13}. (For simplicity, special considerations related to the merger of the NIL function f14 have been omitted here.)
Next, the nonlinearity of the input-output graph suggests the introduction of the branching rule from Chapter 17.
This has generated a call for the P! rule with w=1, XL=Xp, and next=P*(cdr(X))).
P?(X>5) =cond((atom(X
) ,NIL)
(Pie. x5))
P}(X1,X») =cond((not(neg(car(X;))), P,(cdr(X1),X1,X))
(TP “edeGx,)))
Pf, (Xo,X1,X2) =cons(car(X,), P2(cdr(X)))