Final Project
Improvements to lilgp
Genetic Programming System
Adam Hewgill
2400083
Introduction
The goal of this project is to improve a freely available genetic programming
(GP) system so that it becomes a more useful tool for future researchers. lilgp is a widely
used GP system that is free and quite robust. It has been around since 1995 and has been
slowly but continually improved by both the original authors and external users of the
system. There now exist multithreading support, a Windows-compatible version, strongly typed
GP, and constrained GP, all of which link off of the main lilgp site [1]. The most recent version
of lilgp is 1.1 beta, which contains the multithreading update for multiple populations on
multiple CPUs; this release is dated September 1998.
Strong typing was added to the kernel by Sean Luke in 1997, at which time some
other multithreading bugs were fixed and rolled into a version of lilgp called “Sean’s
Patched lilgp Kernel” [2]. This is recognized as an independent kernel, a branch off of
the original lilgp kernel that now stands alone. Both versions are well tested,
since they have been used for several years by many researchers.
The improvements that have been made are applied to the strongly typed version
of the kernel, since one of the improvements is directly related to strong typing. Also, the
strongly typed version essentially contains the non-typed version within it as a special
case when only one type is used. It therefore seems logical to make updates to the most
general version so that everyone can use them, not just those who only want to deal with
problems needing a single type.
The first and most major improvement to the kernel is the update to the tree
growth methods. Initially the grow and full tree generation methods were limited by
their simplicity, which meant that every type had to contain at least one function and one
terminal. This caused a great deal of bloat in the tree, since the author needed to
create useless functions and terminals that the tree generators could use to fill the tree up
to its maximum depth. Now these useless functions and terminals are no longer
necessary, so the avoidable bloat is reduced, leaving only the bloat created by
evolution for protection.
The second improvement was to add a method by which functions and terminals
could be turned on and off before a run simply by setting their state in the input file. This
removed the need to recompile to change the function set.
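As a purely hypothetical illustration (the parameter names below are invented for this
example; the exact names read by the modified kernel are not given here), the input file
could contain lines such as:

function[2].state = off
terminal[4].state = on

Any primitive switched off in this way is simply excluded from the function set for that run.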
Thirdly, the random number generator built into the kernel is exposed to
the user via calls to random_seed( int ), random_int( int ), and random_double( ). These
are very useful functions, since random_int( int ) returns a number between zero and
the passed integer, including zero, and random_double( ) returns a double between zero and
one, including zero. The problem was that using these generators from the application
changes the number sequence: while the same seed still gives the same run, changing the
application side in any way relating to the random calls would completely change the
random numbers the kernel subsequently produces. The other problem is that if the user
wants a set of objects to always be the same, then the changing of the random seed is
best circumvented. These problems are fixed by componentizing the random number
generator so that everything necessary is contained in a structure, which all of the
number-generating calls take as their first parameter. The final changes are much smaller
bug fixes and beautification changes which make the kernel better overall.
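To make the componentized generator described above concrete, the following is a minimal
sketch only: the structure layout, the toy generator, and the globrand definition are
assumptions for this example rather than the kernel's actual code; only the call names and
the structure-as-first-parameter convention come from the description above.

/* All state for one generator lives in a structure, so separate
 * generators no longer disturb each other's sequences. */
typedef struct {
    unsigned long state;
} randomgen;

/* the kernel-wide instance referred to later as globrand */
static randomgen globrand_state;
static randomgen *globrand = &globrand_state;

void random_seed(randomgen *r, int seed)
{
    r->state = (unsigned long) seed;
}

/* illustrative 31-bit linear congruential step, not lilgp's actual generator */
static unsigned long next_state(randomgen *r)
{
    r->state = (r->state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return r->state;
}

/* double in [0, 1), as described above */
double random_double(randomgen *r)
{
    return next_state(r) / 2147483648.0;
}

/* integer between zero and max, including zero (here 0 .. max-1) */
int random_int(randomgen *r, int max)
{
    return (int) (random_double(r) * max);
}

With this arrangement an application can create its own randomgen instance for its private
objects without disturbing the kernel's sequence, and the same kernel seed continues to
reproduce the same run.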
grow_tree ( int depth, int return_type )
{
    // Reached maximum depth of tree
    if depth is zero
        randomly select terminal of return_type

full_tree ( int depth, int return_type )
{
    // Reached maximum depth of tree
    if depth is zero
        randomly select terminal of return_type

Listing 1: Original GROW and FULL tree generation algorithms
The function necessity results in the same problems mentioned above for the
extraneous terminals. When the full tree algorithm is not at the maximum depth for a tree,
it must continue to select functions until it reaches the bottom. However, if the type
set has a dead-end type, one for which only terminals and no functions exist,
then the tree generator becomes stuck and cannot extend the branch to the maximum
depth. This leads to the necessity of having a function for every type so that the full tree
algorithm can use it to extend the branches to the maximum depth. This also changes the
search space in unfavorable ways.
The two issues outlined above have been dubbed the “maximum depth cutoff”
problem and the “full tree fill” problem, respectively, for use in this paper. They are
essentially the same problem but occur at different points in the tree generation process,
as discussed previously. The remedy for these problems is to increase the complexity of
the algorithms by adding the ability to backtrack when one of the two problems crops up
during tree generation.
New Algorithm
The new algorithm is based on the original, with only the problems outlined above
fixed. Nothing has been added to improve the quality, shape, or size of the resulting
population of trees. See Listings 2 and 3 for the new algorithm pseudo code.
The first necessary change was to the function definitions themselves. Before, the
two tree generators were void functions (procedures) that never failed, due to the
constraints placed on the type set. Now we wish to tell the parent level of the recursion
that a child sub-tree cannot be created to fit this branch, or that the maximum possible
sub-tree is too short for this branch. This is implemented by making the function return
an integer that is one on error and zero otherwise. This lets the parent know whether a
different function should be selected, leading to the desired backtracking behavior.
In the grow tree algorithm there are two places where terminals can be selected
for use in the tree. The first is when the depth reaches zero (at the bottom of the tree), and
the second is when the tree is not at the bottom but a terminal is chosen randomly, so
terminal selection is spread out. To alleviate this, the very first thing done in the new grow
tree algorithm is to decide probabilistically whether a terminal or a function is selected. This
has nothing to do with the maximum depth cutoff; it simply provides a way of breaking
the terminal selection part off and having it by itself at the beginning of the algorithm.
This is desirable for continuity between the two tree generation algorithms and to reduce
code duplication now that the algorithm is more complex. The decision is made by
calculating the fraction of nodes which are functions and testing a random number in
[0..1) against it: greater or equal signifies selection of a terminal, less than signifies
selection of a function. The first if statement then changes from “if depth is zero” to
“if depth is zero or terminal selected”, which covers all terminal selection possibilities.
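A minimal sketch of this decision, assuming func_count and term_count are the numbers of
functions and terminals available for the current return type (the names and the stand-in
random source are illustrative, not the kernel's code):

#include <stdlib.h>

/* Returns 1 when a terminal should be selected, 0 when a function should.
 * Assumes at least one primitive of the required type exists. */
int choose_terminal(int func_count, int term_count)
{
    double p_function = (double) func_count / (double) (func_count + term_count);
    double r = rand() / ((double) RAND_MAX + 1.0);   /* stand-in for a [0..1) draw */
    return r >= p_function;   /* >= selects a terminal, < selects a function */
}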
The “maximum depth cutoff” problem is simple to solve now that
the algorithm includes a framework for backtracking. When the first if statement is
entered, a check is done immediately as to whether a terminal exists for the current return
type. This will always be true in the case of the grow algorithm where a terminal is
selected probabilistically (as discussed in the previous paragraph). If the check fails and
we have no terminal to place in the tree, then we have reached a point where it is
impossible to generate the sub-tree to attach to the parent, so a one is returned, forcing the
parent to try again. Adding a terminal to a branch is the base case for recursion in both
algorithms, and since the new algorithm maintains the base case behavior that was present
before, the recursive algorithm is still valid. If there is a terminal to add to the tree, then
one is selected randomly and the recursion ends normally.
The solution to the “full tree fill” problem is also simple now that we have
backtracking. After we pass through the section where a terminal would be selected, we
must definitely select a function to add to the tree. So a check is done to see whether there
are any functions of the current return type. If none exist, then a one is returned and
the parent is forced to try again. If there is a function for the type, then one is selected
randomly and attached to the tree.
Listing 2: New GROW tree algorithm (pseudo code)
Listing 3: New FULL tree algorithm (pseudo code)
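As a rough illustration of the behavior described above, the following C sketch implements
the same backtracking idea. Everything in it, the primitive-set representation, the node
structure, the retry limit of ten, and the use of a NULL pointer instead of the integer return
value described earlier to signal failure, is an assumption made for this example and is not
lilgp's actual code.

#include <stdlib.h>

#define MAX_ARGS 3

typedef struct prim {                 /* one function or terminal */
    const char *name;
    int         return_type;
    int         arity;                /* 0 for terminals */
    int         arg_types[MAX_ARGS];
} prim;

typedef struct node {
    const prim  *p;
    struct node *child[MAX_ARGS];
} node;

/* toy primitive set: type 0 has functions and a terminal, type 1 has only a terminal */
static const prim pset[] = {
    { "add",  0, 2, { 0, 0 } },
    { "use",  0, 1, { 1 } },
    { "x",    0, 0, { 0 } },
    { "flag", 1, 0, { 0 } },
};
static const int NPRIM = sizeof pset / sizeof pset[0];

static double rnd01(void) { return rand() / ((double) RAND_MAX + 1.0); }

/* randomly pick a primitive of the given return type: functions if want_func
 * is nonzero, terminals otherwise; NULL means none exists */
static const prim *pick(int return_type, int want_func)
{
    const prim *found[sizeof pset / sizeof pset[0]];
    int n = 0;
    for (int i = 0; i < NPRIM; i++)
        if (pset[i].return_type == return_type &&
            (want_func ? pset[i].arity > 0 : pset[i].arity == 0))
            found[n++] = &pset[i];
    return n ? found[rand() % n] : NULL;
}

static void free_tree(node *n)
{
    if (!n) return;
    for (int a = 0; a < MAX_ARGS; a++)
        free_tree(n->child[a]);
    free(n);
}

/* Build a sub-tree of the requested return type; NULL tells the parent
 * level of the recursion to try a different function. */
static node *grow_tree(int depth, int return_type)
{
    /* decide up front whether a terminal or a function is selected */
    int func_count = 0, term_count = 0;
    for (int i = 0; i < NPRIM; i++) {
        if (pset[i].return_type != return_type) continue;
        if (pset[i].arity > 0) func_count++; else term_count++;
    }
    double p_function = (func_count + term_count)
        ? (double) func_count / (func_count + term_count) : 0.0;

    if (depth == 0 || rnd01() >= p_function) {
        /* "maximum depth cutoff": fail if no terminal of this type exists */
        const prim *t = pick(return_type, 0);
        if (!t) return NULL;
        node *n = calloc(1, sizeof *n);
        n->p = t;
        return n;                     /* base case: recursion ends normally */
    }

    /* a function must be chosen; retry a few times if a child cannot be built */
    for (int attempt = 0; attempt < 10; attempt++) {
        /* "full tree fill": fail if no function of this type exists */
        const prim *f = pick(return_type, 1);
        if (!f) return NULL;

        node *n = calloc(1, sizeof *n);
        n->p = f;
        int ok = 1;
        for (int a = 0; a < f->arity && ok; a++) {
            n->child[a] = grow_tree(depth - 1, f->arg_types[a]);
            if (!n->child[a]) ok = 0;     /* child failed: backtrack */
        }
        if (ok) return n;
        free_tree(n);                     /* discard the partial sub-tree and try again */
    }
    return NULL;                          /* give up; the parent backtracks further */
}

The full tree version would differ only in that, above depth zero, a function is always
selected rather than chosen probabilistically.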
If for some reason the old method is preferred, it can still be accessed via calls to
the kernel’s random number generator instead of a local application one. This is done with
the call random_double( globrand ).
The keep_trying addition on the end forces the retrying outlined above. In
addition, you can specify how many times to retry before giving up and using the parent.
This is specified with num_times, where N is the number of times to retry.
breed[1].operator = crossover, . . . , keep_trying=on, num_times=N
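For context, a hypothetical parameter-file excerpt showing where this setting sits; the
surrounding lines follow the usual lilgp input-file conventions but are only placeholders,
and the retry count of 5 is arbitrary:

breed_phases = 2
breed[1].operator = crossover, select=fitness, keep_trying=on, num_times=5
breed[1].rate = 0.9
breed[2].operator = reproduction, select=fitness
breed[2].rate = 0.1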
Minor Changes
Initially the old kernel used an obsolete version of two functions relating to multi-
threading. These functions worked on the target system the kernel was initially developed
on, but not on operating systems with more recent versions of the POSIX thread library. Now
the user can specify that an older version of the POSIX library is in use by defining
PTHREAD_1.X in the GNUmakefile, adding -DPTHREAD_1.X to the CFLAGS section. This
incompatibility was initially discovered by Cale Fairchild while working with multi-
threading turned on.
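For example, assuming the existing CFLAGS line in the GNUmakefile already carries the
usual options (the other flags shown are only placeholders), the define is simply appended:

CFLAGS = -O2 -Wall -DPTHREAD_1.X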
Checkpoint compression is a rarely used function of the kernel which is
configured in the GNUmakefile. It was initially set to use VFORK to spawn the
compression program, but for wider compatibility the default was changed to SYSTEM,
which works on all systems.
When using an older compiler with strict checking for unused variables
and other trivial code problems, many warnings are produced. All of the warnings
that came up with our compiler have been resolved.
Future Work
The current implementations of the tree generation functions are simple, naive
backtracking algorithms. With a more complicated type set they could be very costly; the
worst case running time occurs for a tree that is impossible to grow given the function
set. It would be more efficient to keep track of the possible ending patterns that could be
used once tree generation reaches a certain level. This could take the form of an
automatically generated table that is queried at every level to check whether a forced
function or terminal selection should be made so that the tree remains possible to grow.
This would greatly reduce the need for backtracking and on the whole speed up tree
generation drastically.
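One way such a table could be built (a sketch only; the primitive-set representation and
all names here are assumptions, not part of lilgp) is to precompute, for each type, the
minimum depth of a complete sub-tree returning that type, and to consult the table before
committing to a function:

#include <limits.h>

#define NUM_TYPES 4
#define MAX_ARGS  3

/* assumed minimal description of the primitive set for this sketch */
typedef struct {
    int return_type;
    int arity;                     /* 0 for terminals */
    int arg_types[MAX_ARGS];
} prim;

/* min_depth[t] = minimum depth of a complete sub-tree returning type t,
 * or INT_MAX if no such sub-tree exists at all */
void build_min_depth_table(const prim *pset, int nprim, int min_depth[NUM_TYPES])
{
    for (int t = 0; t < NUM_TYPES; t++)
        min_depth[t] = INT_MAX;

    /* any type with a terminal can be completed at depth 0 */
    for (int i = 0; i < nprim; i++)
        if (pset[i].arity == 0)
            min_depth[pset[i].return_type] = 0;

    /* relax with functions until nothing changes (fixed point) */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < nprim; i++) {
            if (pset[i].arity == 0) continue;
            int worst = 0;                          /* deepest argument decides */
            for (int a = 0; a < pset[i].arity; a++) {
                int d = min_depth[pset[i].arg_types[a]];
                if (d == INT_MAX) { worst = INT_MAX; break; }
                if (d > worst) worst = d;
            }
            if (worst != INT_MAX && worst + 1 < min_depth[pset[i].return_type]) {
                min_depth[pset[i].return_type] = worst + 1;
                changed = 1;
            }
        }
    }
}

During generation, a function would then only be considered when every one of its argument
types can still be completed within the remaining depth (min_depth of each argument type at
most the remaining depth minus one); otherwise a terminal is forced or the function is
rejected outright, without any backtracking.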
The documentation provided with the original un-typed version of lilgp is
moderately well done but has not been expanded by the author of the strongly typed
version. So now the documentation is scattered over different websites and not very easy
for a new user to assimilate. The task of compiling and updating the documentation with
all new features and updates is the next step before releasing this as a new version of the
lilgp system for other researchers to use.
The programs that are evolved using the GP system should be available for use
outside the system itself; otherwise the results are purely academic in nature. lilgp already
has the built-in capacity to export a population and then import it again. This feature is
called checkpointing because, if your application takes a very long time to run and you
are forced to stop halfway, you can begin again at a checkpoint rather than at the beginning.
This can save a great deal of time and can also be used to test various changes to the
standard GP algorithm. For example, if you have a checkpoint with only ten more
generations to perform, you can test whether a change to the algorithm improves the tail
end of the evolution process, where evolution has usually flattened out considerably.
Such a change could apply to the tail end only, for example increased mutation and decreased
crossover for the last ten generations. A library needs to be produced so that, when
imported into an application, a checkpoint file can be loaded and the individuals therein
can be used in that application just as they are in lilgp itself.
Conclusions
The updates made to the kernel are effective for the problem they were tested on,
which is a general example of the problems the system is designed to handle. It is reasonable
to assume that these changes will be effective in all cases of a similar difficulty level;
further research is needed to decide whether complex type sets are also improved.
References
[1] lilgp 1.1 Beta homepage
http://garage.cps.msu.edu/software/lil-gp/lilgp-index.html
[2] Sean’s Patched lilgp Kernel
http://www.cs.umd.edu/users/seanl/gp/patched-gp/