The Road to General Intelligence
Studies in Computational Intelligence
Volume 1049
Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new develop-
ments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design methods
of computational intelligence, as embedded in the fields of engineering, computer
science, physics and life sciences, as well as the methodologies behind them. The
series contains monographs, lecture notes and edited volumes in computational
intelligence spanning the areas of neural networks, connectionist systems, genetic
algorithms, evolutionary computation, artificial intelligence, cellular automata, self-
organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems.
Of particular value to both the contributors and the readership are the short publica-
tion timeframe and the world-wide distribution, which enable both wide and rapid
dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Jerry Swan · Eric Nivel · Neel Kant · Jules Hedges ·
Timothy Atkinson · Bas Steunebrink
© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribu-
tion and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons license and indicate if changes were
made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
The authors assert their moral rights in
relation to this work.
To Doug Hofstadter, for showing that
programs can be piano as well as forte.
Foreword by Melanie Mitchell
The idea that machines could one day exhibit ‘general intelligence’ has both inspired
and confounded the field of AI since its inception. ‘Inspired’ by drawing talented
people into the field to achieve one of humanity’s grandest challenges. ‘Confounded’
since there is no widely agreed-upon definition of what ‘general intelligence’ actually
means or a definitive list of the properties it entails.
The pursuit of general intelligence in AI has been, in large part, the story of
moving goal posts. Variously called ‘human-level AI’, ‘strong AI’, ‘AGI’, and even
‘superintelligence’, the criteria for machines exhibiting general intelligence have
continually changed over time. Alan Turing, in his classic 1950 paper ‘Computing
Machinery and Intelligence’, proposed that a machine should be considered intelligent (or ‘thinking’) if it could, through conversation alone, persuade a human judge that it was human.
However, the rise of ever more capable chatbots (and the surprising propensity of
humans to assign intentionality to machines) has shown that AI conversationalists
that clearly lack general intelligence can easily fool humans in many cases. Early
AI proponents believed that something like general intelligence could be captured
in systems that heuristically followed symbolic rules. Many of the early founders of
AI predicted in the 1960s that human-level AI was only 10, 15, or at most 20 years
away. However, symbolic AI approaches, such as the optimistically named ‘General
Problem Solver’, turned out to be far from general and often disastrously brittle.
Intellectual board games like chess and Go have long been seen as a grand chal-
lenge for AI systems, and many people believed that conquering them would require
something like general intelligence. In 1958, AI pioneers Newell and Simon declared,
‘If one could devise a successful chess machine, one would seem to have penetrated to
the core of human intellectual endeavor’. And in 1997, the year that IBM’s Deep Blue
defeated world chess champion Garry Kasparov, the New York Times conjectured
about Go, ‘When or if a computer defeats a human Go champion, it will be a sign
that Artificial Intelligence is truly beginning to become as good as the real thing’. But
although Deep Blue and its counterpart AlphaGo are extraordinary achievements,
neither is anywhere close to a general intelligence.
Since the 2010s, the rise of deep learning (and its myriad successes) has once
again encouraged optimism among some in the AI community that ‘true AI’ is close
at hand. More recently, though, cracks have begun to appear in deep learning’s facade
of intelligence, and many have expressed serious doubts about the prospects of ‘big
data’ approaches for developing general AI.
This book is an attempt to clarify what kind of knowledge representation and
information processing is needed for general intelligence in machines. The authors,
inspired by theories of semantics as well as programming-language theory, stress the
need for representations that exhibit both ‘algebraic compositionality’ and ‘strong
typing’—properties of programming languages that allow for explicit propagation of
arbitrary constraints, rapid adaptation to new inputs, and reflection (in which a process
can examine its own behavior). This is in contrast to the representations formed by
today’s deep learning systems, which seem to have none of these properties.
Most people in AI would agree that even when machines exhibit what seems
like intelligent behavior (for example, producing coherent translations between
languages) these machines don’t have anything like the kind of understanding of
their inputs or their own behavior that humans have. The notion of ‘understanding’
in machines is hard to pin down, but the authors of this book note that, at the very
least, understanding must entail the ability to transfer what one has learned to new
tasks, a capacity with which today’s state-of-the-art AI systems still struggle but which is highly desired by the world of automation. The authors frame this notion as ‘Work
on Command’—the ability to respond, in a reasonable time, to dynamic changes in
the specification of the task one is faced with.
This book includes extensive discussion of additional abilities required for general
intelligence—for example, abduction, analogy, and hypothesis generation. Capturing
such abilities in AI systems in a general and humanlike way has been the subject of
much research but little progress to date, in part due to the lack of progress in capturing
the causal knowledge and reasoning that underlies them. Here the authors describe
how such abilities have been implemented in a reference system that exhibits what
they call Semantically Closed Learning (inspired by the concept of semantic closure
in open-ended evolution, and incorporating further ideas from category theory).
In short, this book provides an intriguing and provocative framework for thinking
about what general intelligence is, and how its essential abilities might be attain-
able by machines in an economically viable manner. The philosophy behind both
programming-language theory and category theory plays key roles in the formal-
ization and development of the main ideas. The authors also provide pointers to
what research challenges lie open. Given the complexity and intricacy of the desti-
nation, the road to general intelligence will be a bumpy one. This book gives a
thought-provoking view of one pragmatic direction toward this goal.
Foreword

It’s no coincidence that companies have departments, that furniture has drawers and
shelves, that bodies have organs, that code bases have modules: things organize for a
reason. As Herbert Simon points out in ‘The Sciences of the Artificial’, organizing offers
exponential compression in the search space for solving problems. By carving nature
at its joints, by elegantly articulating and factoring the space of possibilities, and by
finding the right abstractions, we enable ourselves to handle new situations in stride.
Mathematics is the marketplace for humanity’s clearest and most reliable abstrac-
tions. And within mathematics, category theory is unparalleled in its ability to factor
ideas into constituent parts that fit together frictionlessly. In this book, the authors
emphasize a subdiscipline of category theory called bidirectional transformations,
also known by names such as polynomial functors and optics, which I join them
in regarding as essential for expressing the crucial role of feedback in intelligent
systems. Finding categorical abstractions to handle new situations, including creating
more intelligent systems, requires a widespread research program that is only now
starting to emerge. As far as I know, this is the first book to confidently assert this
need and make a real stab at solving it.
Contents
1 Introduction

Part I Requirements

2 Background
  2.1 What we Mean by General Intelligence
  2.2 Science as Extended Mind
  2.3 The Death of ‘Good Old-Fashioned AI’

3 Where is My Mind?
  3.1 A Sanity Check
  3.2 Real-World Machine Learning

4 Challenges for Deep Learning
  4.1 Compositionality
  4.2 Strong Typing
  4.3 Reflection
  4.4 Implications and Summary

5 Challenges for Reinforcement Learning
  5.1 A Priori Reward Specification
  5.2 Sampling: Safety and Efficiency

6 Work on Command: The Case for Generality
  6.1 Goals and Constraints
  6.2 Planning
  6.3 Anytime Operation

Bibliography
Chapter 1
Introduction
The rise of civilization is synonymous with the creation of tools that extend the intel-
lectual and physical reach of human beings [133]. The pinnacle of such endeavours is
to replicate the flexible reasoning capacity of human intelligence within a machine,
making it capable of performing useful work on command, despite the complexity
and adversity of the real world. In order to achieve such Artificial Intelligence (AI),
a new approach is required: traditional symbolic AI has long been known to be too
rigid to model complex and noisy phenomena and the sample-driven approach of
Deep Learning cannot scale to the long-tailed distributions of the real world.
In this book, we describe a new approach for building a situated system that
reflects upon its own reasoning and is capable of making decisions in light of its
limited knowledge and resources. This reflective reasoning process addresses the
vital safety issues that inevitably accompany open-ended reasoning: the system must
perform its mission within a specifiable operational envelope.
We take a perspective centered on the requirements of real-world AI, in order
to determine how well mainstream techniques fit these requirements, and propose
alternative techniques that we claim have a better fit. To reiterate: by AI we mean the
property of a machine that exhibits general-purpose intelligence of the kind exhib-
ited by humans, i.e., enjoying the ability to continually adapt existing knowledge
to different domains. The endeavor to create intelligent machines was definitively
proposed as such in the 1950s [220], although the concept of a humanoid automaton
recurs throughout recorded history. Due to the sheer magnitude and ambition of the
project, there have naturally been many bumps in the road: not only the infamous
‘AI winter’ [202], but also periods where the endeavor’s vision and direction have
been clouded by the prospects of short-term success.
AI for Automation
Given that substantial resources are required to create AI, it cannot be done on a
whim. Therefore the shape of AI (at least in its initial incarnation) will be strongly
influenced by the return anticipated by those investing in it. That is, to answer “How
to build AI?”, we must ask why we want AI in the first place, i.e., what is the business
case for a machine with general intelligence?
Philosophical considerations aside, intelligent machines are ultimately tools for
implementing a new leap in automation. In practical automation settings, the gen-
erality of a system is measured as the inverse of the cost of its deployment and
maintenance in a given environment/task space. At the low end of this spectrum
are systems that depend on full specifications of their environments and tasks. Such
systems are very costly to re-deploy when facing specification changes, possibly
incurring the highest cost: that of a complete rewrite. At the high end are more gen-
eral systems that re-deploy autonomously through continual open-ended adaptation
and anticipation.
The main functional requirement of general intelligence is therefore to control the
process of adaptation. In this work, we claim that this can be achieved in a unified,
domain-agnostic manner via the ability to ground arbitrary symbols (whether arising
from end-user vocabulary or being synthesized by the system) in an explicit learned
semantics. Hence, throughout this work, when we discuss symbols in reference to our
proposed architecture, it is not in the sense of the a priori opaque logical predicates
of ‘Good Old-Fashioned AI’, but rather follows in the footsteps of a collection of
cyberneticists, psychologists and systems theorists [8, 67, 218, 253, 261, 269, 299]
for whom “symbols are merely shorthand notation for elements of behavioral control
strategies.” [49].
In practical terms, the endeavor of creating general intelligence therefore consists
of building a template for a learning control system which can be re-targeted at an
arbitrary environment, bootstrapping the control mechanisms with as little latency as
possible, starting from small amounts of (incomplete or even faulty) knowledge. The
system is then expected to discover further constraints on the fly—be it from a corpus
of ready-made knowledge; from experience acquired with and without supervision;
perhaps by interacting in the environment, possibly under the sporadic guidance of
teachers and end-users.
Notwithstanding these business considerations, the creation of AI still relies on
good science, especially with regard to requirements engineering, with the initial
focus illustrated in Fig. 1.1. Although we set aside those requirements that are mostly
issues of hardware, paperwork, or procedures (e.g., constructing curricula for teach-
ing the system as well as its eventual operators), the fact that they must be addressed
and fulfilled then imposes constraints on which scientific techniques can even be con-
sidered. The requirement-centric perspective dictates which properties are important
for a technique to exhibit or avoid. For example, even legal requirements impinge
on techniques, such as when GDPR1 demands transparency in automated decision making, which is more easily fulfilled when knowledge representation and reasoning are not intrinsically black-box components or processes.

Fig. 1.1 Theme development in Part I: a summary of the most pertinent engineering requirements for constructing a general intelligence system of real-world use (knowledge representation: compositionality, Sect. 4.1; strong typing, Sect. 4.2; reflection, Sect. 4.3). Their importance is established throughout the first part of this book and then leveraged to construct our proposed framework: Semantically Closed Learning.
This book begins with a survey of historical (Chap. 2) and contemporary (Chap. 3)
AI methodologies, discussing their strengths and weaknesses, from the perspective
of their potential to support general intelligence. Machine learning (ML), notably
deep- and reinforcement learning, has emerged as the dominant AI paradigm. There
are certainly many valuable applications for which ML offers functionally good solu-
tions, in particular for industrial applications where such techniques are used to build
control systems beyond the reach of traditional software engineering. Nevertheless,
it remains a feat of imagination to ascribe any meaningful notion of intelligence to
any of these systems: the constraints and ambitions of machine learning and gen-
eral intelligence research are simply orthogonal. Although machine learning is a
valuable engineering technique, this fact is not to be confused with a claim that it
might offer a path toward general intelligence. In Chaps. 4 and 5, we make a criti-
cal appraisal of this claim, by contrasting deep learning and reinforcement learning
techniques against key requirements of general intelligence—from the perspective
of automation engineering, these are reified by the notion of ‘Work on Command’
in Chap. 6.
The second part of the book is concerned with an alternative framework that
we claim better fulfills these requirements. There is increasing consensus that it is
necessary to combine the strengths of both symbolic and connectionist paradigms
[59, 210]: the main advantage of symbolic approaches is the ready injection of
domain knowledge, with the attendant pruning of hypothesis space. In contrast, the
main advantage of connectionism is that it is (at least in principle) a tabula rasa.
In common with Marcus, who has argued this for many years [214], we hold the view that
general intelligence requires the recursively algebraic capacities of human reasoning.
This motivated the research and associated reference architecture implementation
we present in this book. This architecture has been implemented, and prototypes
have been developed, addressing the domains of medical diagnosis, service robotics,
and industrial process automation—empirical demonstrations will be the topic of
subsequent works. In Chaps. 7–10, we define a framework for ‘Semantically Closed
Learning’ which:
• Describes an explicit (but nonetheless ‘universal’) recursive interpreter for a highly
generalized notion of algebraic reasoning.
• Represents the hierarchical causal structure of hypotheses as first-class objects.
• Defines a fine-grained and resource-aware attention mechanism, driven to favor
highly-structured and stable hypotheses.
• Describes key reasoning heuristics using the generic and compositional vocabulary
of category theory, from the emerging perspective of ‘Categorical Cybernetics’ [42,
140].
• Defines a novel compositional mechanism, using lenses [88] (see the sketch after this list), an approach which
unifies conventional backpropagation, variational inference, and dynamic pro-
gramming, for the purpose of abductive reasoning over hybrid numeric-symbolic
expressions.
• Describes a minimal viable implementation design for 2nd order automation engi-
neering—system identification, synthesis, and maintenance—with guarantees rel-
evant to safety.
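As flagged in the lens bullet above, the following minimal Python sketch (ours, with invented names; not the mechanism of Chaps. 7–10) shows a lens as a paired forward/backward transformation and how two lenses compose:

```python
from dataclasses import dataclass
from typing import Callable, Any

@dataclass
class Lens:
    """A lens: a forward pass ('get') paired with a backward pass ('put')."""
    get: Callable[[Any], Any]        # extract a view from a source
    put: Callable[[Any, Any], Any]   # write a modified view back into the source

    def compose(self, other: "Lens") -> "Lens":
        """Sequential composition: forward passes chain left-to-right,
        backward passes chain right-to-left."""
        return Lens(
            get=lambda s: other.get(self.get(s)),
            put=lambda s, v: self.put(s, other.put(self.get(s), v)),
        )

# Example: a record with a nested field, viewed and updated through lenses.
outer = Lens(get=lambda d: d["inner"], put=lambda d, v: {**d, "inner": v})
inner = Lens(get=lambda d: d["x"],     put=lambda d, v: {**d, "x": v})

path = outer.compose(inner)
state = {"inner": {"x": 1.0}, "other": True}
print(path.get(state))         # 1.0
print(path.put(state, 2.0))    # {'inner': {'x': 2.0}, 'other': True}
```

The point of interest is that the backward passes compose in the opposite order to the forward passes, which is exactly the shape shared by backpropagation and related feedback algorithms.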
Finally, in Chap. 11, we summarize our contribution, discuss research avenues, and
conclude.
Part I
Requirements
Chapter 2
Background
It’s all these black boxes you can’t open—see how each spends
most of its time trying to defeat the other.
Knuth [368]
Recent years have seen an explosion in academic, industrial, and popular interest in
AI, as exemplified by machine learning and primarily driven by the widely-reported
successes of deep- and reinforcement learning (e.g. [314, 315, 351]). Deep learning
is essentially predicated on the notion that, with a sufficiently large training set, the
statistical correlations captured by training will actually be causal [310]. However, in
the absence of convergence theorems to support this, it remains a hypothesis. Indeed,
insofar as there is evidence, it increasingly indicates to the contrary, since the appli-
cation of enormous volumes of computational effort has still failed to deliver models
with the generalization capability of an infant. There is accordingly increasing dis-
cussion about what further conceptual or practical insights might be required [57].
At the time of writing, the very definition of deep learning is in flux, with one Turing
Award laureate defining it as “a way to try to make machines intelligent by allowing
computers to learn from examples”1 and another as “differentiable programming”.2
We argue in the following that deep learning is highly unlikely to yield intelligence,
at the very least while it equates intelligence with “solving a regression problem”.
Specifically, we claim that it is necessary to adopt a fundamentally different perspec-
tive on the construction of inferences from observations, and that this is in accordance
with a fundamental revolution in the philosophy of science: Karl Popper’s celebrated
solution to ‘The Problem of Induction’ [268].
1 https://blogs.microsoft.com/ai/a-conversation-ai-pioneer-yoshua-bengio.
2 https://www.facebook.com/yann.lecun/posts/10155003011462143.
to replace black box reward functions with richer representations that are suitable
for reflective reasoning.
In pursuit of greater generality than can be provided by an a priori reward function,
we must therefore adopt the wholly pragmatic perspective of the following value
proposition.
There is clearly a huge gap between the software which enables facial or gait recog-
nition and the yet-to-be-realized technology which will allow safe and trustworthy
autonomous vehicles or factories. One can likewise consider the reality gap between
audio-activated digital assistants and fully-fledged household robots. There exist
countless other examples of roles that current AI techniques are incapable of fulfill-
ing. In roles where humans are currently irreplaceable, what traits enable them to
meet the demands of these roles? The generality of human intelligence is evident in
many ways:
• Humans can handle multiple objectives simultaneously and can typically order
activities so as to meet these objectives relatively efficiently.
• Humans can learn skills without forgetting those previously learned. They can also
make efficient use of related skills to bootstrap their learning process and minimize
this effort.
• Humans can explain their decision-making in terms of relevant causal factors
and ‘locally consistent’ frameworks of thinking, which means that a recipient of
the explanation (perhaps also their subsequent self) can understand, verify, and
possibly rectify the steps taken to reach conclusions.
• Humans can be told what is desired directly as a goal rather than needing to
iteratively try behavior in the hope of optimizing some sampled metric.
• Humans can be told what is forbidden and/or constrained and they can avoid such
situations without needing to physically interact with (i.e. ‘sample’) the environ-
ment, assuming relevant grounded world knowledge.
• Humans can gracefully adjust their cognitive resource usage between perception,
action, and learning rather than having rigid boundaries between them.3
• Humans can operate in multi-agent settings, mostly through being able to effec-
tively model other agents’ trajectories based on their perceived intentions and
behavioral patterns.
• Humans can do all of the above in the real world, perhaps with a curriculum, but
not needing a high-fidelity resettable/reversible simulation within which to learn.
For the purposes of this work, the above list of traits will be considered as a
set of necessary emergent capabilities of general intelligence. With this motivation,
we claim that a system which exhibits these traits must satisfy the requirements
summarized in Fig. 1.1.
2.2 Science as Extended Mind

While the human mind is the most immediate exemplar for general intelligence,
we believe there are strong reasons to consider that the scientific method is better
suited to provide a template for its implementation. As Rodney Brooks has famously
observed [37], insights obtained via mental introspection might cause us to be deeply
misled about the nature of intelligence. In contrast, the adoption of the scientific
method yields falsifiable statements about the physical world. This can be seen as
providing an ‘extended mind’ [50]—an externalized artifact with verifiable proper-
ties that can directly inform the design of general intelligence architectures. Given the
inevitable concerns about ‘AI alignment’,4 such verifiability is of particular impor-
tance in obtaining measures of safety. Hence, we believe that the path to general
intelligence (at the very least, in a form capable of respecting safety concerns) lies in
the attempt to automate the scientific method, from the perspective of an embodied
reasoner with real-world concerns of deadlines and resource availability.
Recent years have seen increasing emphasis on causality in machine learning.
Causality is essential for building reasoning systems as it is a stronger criterion
than merely statistical correlation. Originally having been convincingly argued for
by Pearl [254], the relevance of causality to AI has since been agreed upon by
Schölkopf, Bengio, and others. One of the key ideas is the ‘ladder of causality,’
which is framed as inference situated on three ‘rungs’: the observational, interven-
tional, and counterfactual settings. Statistical learning from fixed datasets operates
solely on the observational rung: training on data generated only by an external pro-
cess which the model does not affect. Interventions imply the ability to set values
of certain variables despite the natural external processes in order to generate infor-
mative data; for example, double-blind experiment design with control groups. The
most demanding yet powerful application of causality is counterfactual reasoning,
where inferences are drawn based on variable values which were never observed but
generated through interventions in a model; for example, alternate history timelines.
Pearl also introduced the Structural Causal Model (SCM), which is a directed
acyclic graph structure specifically designed to enable users to operate on all three
rungs of the ladder of causality. A key idea in the SCM formalism is that dependencies
3 Whilst not necessarily under conscious control, nonetheless a property of human cognition overall.
4 The quest for confidence that a general intelligence won’t attempt to turn everything into paperclips
[30].
between variables are framed as probabilistic functions rather than simply statistical
dependence. This is better aligned with a physical interpretation of observations,
namely that they are caused by physical processes over time. The distributions of a
set of variables X_i are given by the formula

X_i = f_i(PA_i, U_i),    i = 1, …, n,

where PA_i denotes the parent nodes of X_i. The functions are probabilistic due to the U_i, which are exogenous noise variables that are jointly independent of one another.
If there were dependencies, they could be explained by forming yet more causal
relationships (as per the common cause principle), and so noise must be modeled as
independent.
Interventions in SCMs are defined as (temporarily) setting f i to be a constant.
Importantly, the distributions of the parent nodes are unaffected by interventions on
children since their relationships are effectively severed, which is different from stan-
dard Bayesian networks. Instances of the latter encode only conditional independence relationships, with no causal semantics attached to their edges, and so dependence between nodes persists even if
the value of one is set. The ability to perform interventions also allows for principled
counterfactual reasoning in SCMs. If we have observed some value for a node X_i, we can use abduction to estimate the value of U_i and then, after intervening on its parents PA_i, re-apply the observed exogenous noise in order to produce a counterfactual
inference. There are ubiquitous problems which require counterfactual reasoning that
are consequently intractable for purely statistical models [254]. Given their apparent
completeness for causal modeling, the modern problem of causal discovery involves
deducing the topology of an SCM which can accurately describe the system.
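As an informal illustration of the three rungs (our own toy example; the mechanisms and noise scales are assumptions, not drawn from [254]), consider the following Python sketch of a two-variable SCM:

```python
import random

# Structural assignments X_i = f_i(PA_i, U_i) for a toy two-variable SCM:
#   X = U_x                (no parents)
#   Y = 2*X + U_y          (parent: X)
def sample_observation(rng: random.Random) -> tuple[float, float]:
    u_x, u_y = rng.gauss(0, 1), rng.gauss(0, 0.1)
    x = u_x
    y = 2 * x + u_y
    return x, y

def intervene_on_x(x_value: float, rng: random.Random) -> float:
    """Rung two: do(X := x_value) severs X from its own mechanism;
    Y's mechanism is left untouched and fresh noise is drawn."""
    u_y = rng.gauss(0, 0.1)
    return 2 * x_value + u_y

def counterfactual_y(x_obs: float, y_obs: float, x_alt: float) -> float:
    """Rung three: abduce the exogenous noise consistent with the
    observation, then replay it under the alternative intervention."""
    u_y = y_obs - 2 * x_obs          # abduction of U_y
    return 2 * x_alt + u_y           # same world, different action

rng = random.Random(0)
x_obs, y_obs = sample_observation(rng)
print(intervene_on_x(1.0, rng))             # an interventional sample
print(counterfactual_y(x_obs, y_obs, 1.0))  # a counterfactual for the observed world
```

Observation samples the system as given; intervention replaces the mechanism for X while drawing fresh noise; the counterfactual abduces the noise consistent with what was actually seen and replays it under the alternative action.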
Despite widespread interest in the use of SCMs, it is vital to appreciate that, within
scientific practice, causality is best understood as being only part of a contextualized
process of situated, bidirectional inference. Hence we take the deeper view of science
as the construction of statements which provide a concise and consistent description
of possible worlds. As recently observed by David Deutsch:
Finding causal theories is necessary but not sufficient. We need explanatory theories.
“Mosquitos cause malaria” is essential and useful but only 1% of the way to understanding
and curing malaria.
The essence of the scientific method is, of course, the interleaving of problem
formulation, hypotheses generation, experimentation, and analysis of results. Hence,
a core aspect of the proposed approach is the requirement for a reflective expression
language. Reflection is the property that statements can themselves be treated as
data, meaning that hypotheses about knowledge in the language can be evaluated as
first-class objects. This becomes salient for the process of hypotheses generation.
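As a minimal, hypothetical sketch of reflection (ours; the Rule type and the chaining heuristic are invented for the example), statements can be held as ordinary data that the system inspects and recombines:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """A hypothesis represented as data: premise -> conclusion, with a score."""
    premise: str
    conclusion: str
    confidence: float

knowledge = [
    Rule("switch_on", "light_on", 0.9),
    Rule("light_on", "room_bright", 0.8),
]

def chain(rules: list[Rule]) -> list[Rule]:
    """Reflection in miniature: rules are inspected like any other data,
    and new candidate rules are synthesised from existing ones."""
    derived = []
    for a in rules:
        for b in rules:
            if a.conclusion == b.premise:
                derived.append(Rule(a.premise, b.conclusion,
                                    a.confidence * b.confidence))
    return derived

for r in chain(knowledge):
    print(r)   # derives switch_on -> room_bright with confidence ~0.72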
Concretely, for our purposes hypotheses are some (sub)graph of inferences in the
system’s transition model. Mappings from sensor inputs to effectors (or in the oppo-
site direction, in the case of abductive inference) are just specific fragments of this
overall model, starting or terminating in appropriately designated sensor or effector
dimensions. Naturally, it is desired that the only hypotheses that are entertained by
the system are those which (1) actually describe a possible world and (2) are rele-
vant to the task at hand. Considered from a ‘traditional symbolist’ perspective, the
latter is of course equivalent to the well-known ‘Frame Problem’ [219], which, as
discussed in subsequent chapters, is increasingly understood to have been an artefact
of coarse-grained and disembodied inference.
As we discuss in detail in Chaps. 7 and 9, it also follows that any reasonable
candidate architecture for encoding knowledge for general intelligence will be com-
positional, so as to obtain a semantics for compound hypotheses. Another essential
property is the notion of strong typing, which enables inheritance/taxonomy and
the explicit denotation of goals and constraints as regions of a (prospectively open-
ended) state space. These properties jointly enable structured updates to working
knowledge that retain the self-reinforcing nature of a scientific theory [89].
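A loose sketch of this idea (ours; the state type, field names, and thresholds are invented) might represent goals and constraints as regions of a typed state space, i.e. predicates over states of a declared type:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ArmState:
    """A strongly typed state: fields and units are explicit."""
    joint_angle_deg: float
    payload_kg: float

# A goal or constraint is a region of the typed state space.
Region = Callable[[ArmState], bool]

goal: Region = lambda s: 89.0 <= s.joint_angle_deg <= 91.0
safety_constraint: Region = lambda s: s.payload_kg <= 5.0

def admissible(s: ArmState) -> bool:
    return goal(s) and safety_constraint(s)

print(admissible(ArmState(joint_angle_deg=90.2, payload_kg=3.0)))  # True
print(admissible(ArmState(joint_angle_deg=45.0, payload_kg=3.0)))  # False
```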
Finally, the modern scientific method requires that all working knowledge should
be in principle falsifiable via empirical observation. Hence, we include as a require-
ment to our approach that the base symbols of the expression language must include
denotations which are grounded in this way. Causal modeling also stipulates the
ability to intervene directly in the environment to learn the effects of one’s own
agency. Thus, the system and representation language must both support primitives
for interacting with the environment.
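In a similarly hypothetical vein (the names and stubs below are ours), grounding can be sketched as binding base symbols directly to sensing and acting primitives, so that claims about them remain empirically checkable:

```python
from typing import Callable

# Grounded symbols: names bound directly to sensing and acting primitives.
sensors: dict[str, Callable[[], float]] = {
    "door_angle_deg": lambda: 12.0,        # stub standing in for real hardware
}
effectors: dict[str, Callable[[float], None]] = {
    "door_motor": lambda torque: None,     # stub: would actuate the door
}

def falsifiable(symbol: str, predicate: Callable[[float], bool]) -> bool:
    """A statement about a grounded symbol can be put to empirical test."""
    return predicate(sensors[symbol]())

print(falsifiable("door_angle_deg", lambda v: v < 90.0))   # True for the stub
effectors["door_motor"](0.5)   # acting on the world is likewise a primitive
```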
We claim that it will not be possible (or economical) to automate human labor
in the general case until AI also possesses these properties. As such, this is the
context in which we will highlight the challenges and shortcomings of deep learning,
reinforcement learning, and other existing AI approaches. To place contemporary
approaches in the appropriate context, we proceed via a brief historical recapitulation
of the rise and fall of the traditional symbolist approach.
2.3 The Death of ‘Good Old-Fashioned AI’

The key figures at the inaugural AI conference at Dartmouth [220] were split across
the nascent symbolist and connectionist divide, with Simon, Newell, and McCarthy
in the former camp, and Shannon, Minsky, and Rochester in the latter [174]. How-
ever, the symbolist approach became the prevailing one, not least because of the
widespread confusion surrounding the solvability of the XOR problem by percep-
trons [226]. The following decades saw concerted effort in symbolic AI. Many of the
languages used to construct AI originated from the synthesis of procedural and log-
ical programming styles, respectively exemplified by LISP and resolution theorem
provers. Hewitt’s ‘PLANNER’ language [142] was a hybrid of sorts, being able to
procedurally interpret logical sentences using both forward and backward chaining.
It was used to construct SHRDLU [366] which was hailed as a major demonstra-
tion of natural language understanding. This inspired other projects such as CYC
[197], an ongoing attempt to create a comprehensive ontology and knowledge base
that seeks to capture ‘common-sense knowledge’. Less ambitious and more success-
ful were the various expert system projects that started in the 1960s and became
prevalent in the following two decades. These included the MYCIN expert system
for diagnosing infectious disease [349], Dendral for identifying unknown organic
molecules [38] and other well-known systems such as Prospector [135]. Collectively,
such systems became known as ‘Good Old-Fashioned AI’ (GOFAI) [137].
The prospect of general intelligence using such rule-based systems was not highly rated by observers. In the early 1970s, the Lighthill Report [202] led to a drastic reduction in AI research funding by the UK government, and DARPA cut funding
to academic AI research in the USA. In the late 1980s, the United States’ Strategic
Computing Initiative, which was essentially formed to participate in an AI/computing
race with Japan, cut funding for new AI research as its leaders realized that the
effort would not produce the full machine intelligence it desired. Simultaneously,
the market for large-scale expert system hardware collapsed and more affordable
general-purpose workstations took over. These developments formed part of what is
now colloquially known as the ‘AI Winter’.
In hindsight, many of these challenges and the accompanying demise of GOFAI
could be said to be a function of the hardware of the time. Computing power and mem-
5It was eventually concluded that solutions exist to these problems, including default logics [347]
and answer set programming [201].
ory were obviously more expensive and software design decisions (such as deciding
between the use of LISP or C/C++) had a correspondingly disproportionate impact
on what could be computed in practice. A number of companies whose businesses were built upon LISP-
based expert systems (such as Symbolics, LISP Machines Inc., Thinking Machines
Corporation, and Lucid Inc.) went bankrupt. Ambitious undertakings were common,
such as the Fifth-Generation Computer Systems (FGCS) project in Japan during the
1980s. The massive investment in building highly parallel computing systems was
ultimately in vain as simpler, more general architectures such as workstations from
Sun Microsystems and Intel x86 machines became favored for such roles. Some of
those forward-looking ideas have been reinvented in the early 21st century, such
as the emphasis on highly parallel programming from FGCS now commonplace in
general-purpose GPU programming with CUDA and OpenCL.
Regardless of technological advances, the GOFAI paradigm still does not present
a viable path to general intelligence. For the architectures relying on ungrounded
knowledge representation, there is no prospect of deploying them to address tasks in
the real world of complex and noisy data streams. More fundamentally, the absence
of grounding precludes the understanding of causal relationships of the real world—a
core aspect of operationalizing the scientific method. Even if a GOFAI system were
hypothetically to achieve symbol grounding, there would still be a fatal flaw: GOFAI
never matured sufficiently to escape the scalability problem inherent in the ‘sense–
think–act’ loop. As the system’s body of knowledge grows, the time required to make
plans and predictions must also increase. As engineers put it, the system ‘lags behind
the plant’ and faces two options: either to deliver correct action plans but too late,
or to deliver on time plans that are incorrect [241]. This issue arises essentially from
the synchronous coupling of the agent and its environment, i.e., the latter is expected
to wait politely until the agent completes its deliberations. Technically speaking,
synchronicity means that the agent computes in zero time from the environment’s
perspective.
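A toy simulation (ours; the drift rate and deliberation costs are arbitrary assumptions) makes the consequence of dropping this zero-time idealization concrete: while the agent deliberates over a growing body of knowledge, the plant keeps moving, so the plan is issued for a state that no longer obtains.

```python
def plant_step(state: float, dt: float) -> float:
    """The environment does not wait: it drifts at a fixed rate regardless of the agent."""
    return state + 1.0 * dt

def deliberate(observed_state: float, knowledge_size: int) -> tuple[float, float]:
    """Planning cost grows with the body of knowledge.
    Returns the plan target (valid for the observed state) and the time spent."""
    time_spent = 0.001 * knowledge_size
    return observed_state, time_spent

for knowledge_size in (10, 1_000, 100_000):
    state = 0.0
    target, lag = deliberate(state, knowledge_size)
    state = plant_step(state, dt=lag)   # the plant moved while the agent was thinking
    print(f"knowledge={knowledge_size:>7}  lag={lag:8.3f}s  plan staleness={abs(state - target):.3f}")
```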
Machine learning has failed to acknowledge the significance of this problem and
has even adopted GOFAI’s synchronous coupling as one of the fundamentals of
reinforcement learning; see Chap. 3. For now, computation is scaling at a rate that
can sustain the synchronous abstractions used in large-scale projects (see Sect. 5.2).
For lower-level routines (such as reactively handling sensory streams of data at a
fixed frequency) this may suffice. On the other hand, there are certainly aspects to
cognition which are slower, more deliberative and explicitly logical, and this is where
the ‘sense–think–act’ approach breaks. Some believe that given more resources and
innovations in model architectures, deep learning may be able to encode knowledge
effectively enough to empower this sort of cognition and meet the requirements for
general intelligence. In the next chapters, we shall see why this, in fact, cannot be
the case.
Chapter 3
Where is My Mind?
It was like claiming that the first monkey that climbed a tree was
making progress towards landing on the moon.
The research field of AI is concerned with devising theories, methods, and workflows
for producing software artifacts which behave as intelligent subjects. Evidently, intel-
ligence, as the property of an agent, is not of necessity inherited from the methods
used to construct it: that a car has been assembled by robots does not make it a robot.
Unfortunately, even this obvious distinction is sometimes erased in prominent published work. To wit: the statement, “an agent that performs sufficiently
well on a sufficiently wide range of tasks is classified as intelligent” was recently
published by DeepMind [273] to give context to a paper claiming to have developed
“the first deep RL agent that outperforms the standard human benchmark on all 57
Atari games” [14]. This invites the inference that the range of the tasks (57 games)
that have been achieved warrants calling the advertised agent ‘intelligent’. However,
careful reading of the paper reveals that the authors have in fact developed 57 different
agents. Granted, this was achieved using the same development method and system
architecture, but 57 agents were nonetheless trained, rather than the claimed single
agent. Here is a prime example of distilled confusion: a property (applicability to 57
tasks) of one construction method (instantiating the Agent57 system architecture)
has just been ‘magically’ transferred to some 57 artifacts produced by the method.
This only fuels what Marcus terms the “epidemic of AI misinformation” [211]
from which the world has been suffering for some years [165]. As a minimum, this
damages public understanding (impinging on business expectations/validation), trust
(in scientific deontology), and education at large. Indeed, in common with others [75]
we are of the opinion that such ‘misinformation at scale’ steers both governance and
research the wrong way: it gives credence even among some seasoned researchers—
or worse, the next generation of researchers—to claims that machine learning is the
(only) root of general intelligence, a myth we debunk in the next two chapters. But
first, to give the matter a proper grounding, we must return to the roots of ML in order
to objectively assess its domain of application, appreciate its evident achievements,
and delineate the boundaries of its potential.
With the demise of GOFAI came a renewed effort to explore connectionist learning
paradigms. Over time, the field shifted from the symbolic languages of GOFAI
to ‘end-to-end’ feature learning. This has loosened the constraints on knowledge
representation, namely moving away from an expression language with discrete
symbols to something more representationally amorphous. Learning the parameters
of a function approximator then becomes a fitting and regularization problem, as
conceptualized by the bias-variance trade-off.
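For reference, the textbook decomposition of the expected squared error at a point x, with observation noise of variance \sigma^2 and expectations taken over the noise and the training sample, is

\mathbb{E}\bigl[(y - \hat{f}(x))^2\bigr] \;=\; \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2} \;+\; \underbrace{\mathbb{E}\bigl[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\bigr]}_{\text{variance}} \;+\; \sigma^2,

with the first two terms traded off against one another by model capacity and regularization strength.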
Inspired by trial-and-error learning in animals, reinforcement learning (RL) devel-
oped from work in optimal control, which is concerned with minimizing some metric
of a dynamical system over time. The techniques developed for solving these prob-
lems became known as dynamic programming, and also produced the formalism of
the Markov Decision Process (MDP), which is now essential to RL [331]. It provides
the abstractions of states s ∈ S, actions a ∈ A, and reward function r : S × A → ℝ. The evolution of states is assumed to progress in discrete steps according to the transition function P : S × A → Δ(S), i.e. a distribution over successor states, with a scalar reward R_t = r(s_t, a_t) produced at each timestep t. These ingredients can be modified to accommodate additional
complexity such as partial observability, continuous spaces, and multiple agents.
The purpose of RL is to optimize a value function based on sampled rewards
which are stationary and specified a priori, in order to produce an agent (called a
controller) consisting essentially of a policy that maps states to actions. It does so by
shaping the policy encoded in a neural network1 generally either via gradient descent
over its weights or by ‘neuro-evolution’, i.e., optimizing the fitness of a policy within
a population thereof. Training is unsupervised and uses for its ground truth a cor-
pus of trials and errors logged from a number of sessions of interaction (episodes)
with a simulated world—the number of episodes grows with the complexity of the
task/environment distribution and is generally enormous (in other words, the sam-
ple efficiency is inordinately low). Note that, although it is possible in principle to
perform trials and errors in the real world instead of in a simulator, this is rarely the
case in practice, for fear of wear and tear of equipment and safety risks (on sample
inefficiency and safety concerns, see Sect. 5.2).
1 In the early days, policies were encoded as mere lookup tables (as in Q-learning [208]), but this could not scale with the increasing dimensionality of the problems; hence the need to compress said policies via deep neural networks, and hence ‘deep RL’.
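The following is a deliberately minimal sketch of the procedure just described (ours; the toy environment, hyperparameters, and tabular policy are assumptions made for illustration), making explicit that learning happens offline over many simulated episodes and that the deployed controller is a frozen state-to-action map:

```python
import random

# A toy MDP: states 0..4 on a line; action 0 moves left, action 1 moves right;
# reward 1.0 is produced only on reaching state 4, which ends the episode.
N_STATES, ACTIONS = 5, (0, 1)

def step(s: int, a: int) -> tuple[int, float, bool]:
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

# Offline, episodic, synchronous training: the simulated environment waits for the agent.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)
alpha, gamma, epsilon = 0.5, 0.9, 0.3

for _ in range(200):                               # the sample complexity is paid here
    s, done = 0, False
    while not done:
        a = rng.choice(ACTIONS) if rng.random() < epsilon else max(ACTIONS, key=lambda a: Q[s][a])
        s2, r, done = step(s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Deployment: the frozen controller neither learns nor acts autonomously;
# it is simply a state-to-action map queried by the surrounding system.
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)   # the learned policy moves right in every non-terminal state
```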
Fig. 3.1 State-of-the-art high-level RL training procedure (Hoffman et al. [143]). The fact that, in
practice, the procedure is implemented in sophisticated ways does not change its fundamentals
2 https://en.wikipedia.org/wiki/Currying.
3 Because the RL procedure is synchronous and highly sample-inefficient, simulators must be fast
enough to keep the training time under manageable limits.
4 The task is fixed and the deployment environment is drawn from the same distribution as the one
the standard procedure of software update. This is unsurprising: like any other soft-
ware engineering method, ML addresses predefined controlled conditions—a claim
of ‘intelligence’ is neither made or required; client-side engineers know what to
expect.
Note that, in compliance with the continuous integration/continuous delivery
paradigm (CI/CD) that pervades modern software engineering (for better or worse),
it is possible to accelerate the rate of the updates. In principle, this is achieved by
expanding the training distributions via the parallel aggregation of multiple data
sources (as long as they are deemed, by humans in the loop, to pertain to one and the same
task/environment), then training new policies in the background and updating the
deployed controllers asynchronously.
Beyond direct application in standalone deployments, ML-generated controllers
are also used as components of larger systems. For example, some of the building
blocks of the Multi-level Darwinist Brain architecture (MDB) [20] consist of policies
operating on world models. Reminiscent of hierarchical RL, the selection of a policy
among the repertoire is itself a policy, subject to a satisfaction model (intrinsic moti-
vation). World models and the satisfaction model are initially given by the designers,
with some guarantees that they are general enough to support a determined set of
tasks and environmental conditions. Learning of policies and models is performed
offline via neuro-evolution, subject to hand-crafted value functions. In that sense,
an MDB controller still depends critically on the foresight of its designers, which
limits its potential for autonomy. Yet it can be altered after deployment (incurring
some inevitable downtime) in an incremental fashion at a cost arguably lower than
that of re-engineering from scratch. This line of work culminates for example in the
DREAM architecture [68]. Whereas MDB representations are tuned, ad hoc, to indi-
vidual control policies, DREAM introduces a generic slow refactoring loop (called
‘re-representation’) that continually extracts common denominators from existing
representations in order to form more abstract ones to eventually facilitate transfer
learning, in support of a user-defined training curriculum.
There is no contention about the value of ML as long as claims remain aligned
with established principles and proven capabilities. As we have seen, whereas an
agent is learned, the deployed agent does not itself learn or act autonomously. This
is fully consistent with both the principles of ML and its goals and makes perfect
sense from an engineering/economic point of view. What is obviously more debatable
is the claim that a process designed for manufacturing purely reactive software will
eventually produce thinking machines. The next two chapters will question, in depth,
the plausibility of such a prophecy.
Chapter 4
Challenges for Deep Learning
Are you a good noticer? Do you notice things well? I mean, for
instance, supposing you saw two cock-starlings on an apple-tree,
and you only took one good look at them — would you be able to
tell one from the other if you saw them again the next day?
Deep learning (DL) has emerged as the dominant branch of machine learning, becom-
ing the state of the art for machine intelligence in various domains. As discussed in
the previous chapter, this has led some researchers to believe that deep learning
could hypothetically scale to achieve general intelligence. However, there is increas-
ing consensus (e.g. [57, 210, 230]) that the techniques do not scale to harder problems as well as was anticipated.
In particular, deep learning methods find their strength in automatically synthe-
sizing distributed quantitative features from data. These features are useful insofar
as they enable mostly reliable classification and regression, and in some limited
cases also few- or zero-shot transfer to related tasks. However, it is increasingly
questionable whether deep learning methods are appropriate for autonomous roles
in environments that are not strongly constrained. While there are still countless use-
cases for narrow artificial intelligence, many of the truly transformative use-cases
can only be realized by general intelligence.
We recall from Sect. 2.2 that, while we do not know the internal mechanisms
of human general intelligence, we observe that ‘science as extended mind’ is a
pragmatic description of a general intelligence model of the environment. However,
neural network representations are not readily interpretable, either to humans or more
importantly—as we subsequently argue at length—to the learning process itself.
This chapter has two purposes: the first is to explain what properties are wanting
(their relationship to the entire book is shown in Fig. 4.1) and to elicit fundamental
obstacles posed by deep learning. The second purpose is to argue that ‘science as extended mind’ offers a more effective perspective for designing the desired system. The latter is further developed in Sect. 7.2 and is the foundation of our proposed inference mechanisms in Chap. 9.

Fig. 4.1 In this chapter we argue that the lack of structure in the representation languages created via deep learning is in conflict with the requirements of general intelligence (knowledge representation: compositionality, Sect. 4.1; strong typing, Sect. 4.2; reflection, Sect. 4.3; tasks: declarative goals and constraints, Sect. 5.2; non-stationarity, Sect. 5.1).

The following claims contrast deep learning with
the requirements for operationalization of the scientific method:
• Representations are not compositional, which makes them inefficient for modeling
long-tailed distributions or hierarchical knowledge.
• Representations are not strongly typed, which prevents verification against adver-
sarial scenarios and hinders generalization to new domains.
• Representations are generated by models which do not support reflection, which
restricts model improvement to gradient-based methods.
4.1 Compositionality
However, this notion is far weaker than is actually required, in particular for
purposes of AI safety but (as we subsequently argue) also for scalable inference
via greater sample efficiency. The weakness of this notion of compositionality is
evidenced by numerous challenges for deep learning (discussed in more detail in
this and subsequent chapters):
• Adversarial examples.
• Weak generalization capability.
• Inability to explicitly induce or propagate more than a small number of types of
invariant (translation, rotation, etc.).
Indeed, it could be said that DL is closer to the merely syntactic notion of compos-
ability than the semantic notion of compositionality. In its most degenerate form, the
syntactic notion is merely the observation that feature ensembles are instances of the
‘composite’ design pattern [101, 369] and hence hierarchically aggregated features
are syntactically substitutable for individual ones. However, that does not impose any
intrinsic constraints on what the features represent or what the ensembles compute.
In contrast, compositionality is defined as [207]:
The algebraic capacity to understand and produce novel combinations from known compo-
nents.
The term ‘algebraic’ here effectively means ‘having well-defined semantics’, in the
sense that the behaviour of a composite exhibits constraints that are a function of
those of its component parts. The alleged compositionality of DL falls short in almost every respect of this definition: in algebraic terminology, the
feature representations in DL layers can be ‘freely composed’. In contrast, in Chap. 9
we describe a mechanism for imposing a denotational semantics on composite rep-
resentations.
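To make the contrast concrete, here is a small sketch of ours (not the mechanism of Chap. 9) in which composition is admitted only when declared interface types agree, so that a composite inherits explicit constraints from its parts rather than being freely composed:

```python
from dataclasses import dataclass
from typing import Callable, Any

@dataclass(frozen=True)
class Typed:
    """A component with an explicit interface: input type, output type, function."""
    dom: type
    cod: type
    fn: Callable[[Any], Any]

    def then(self, other: "Typed") -> "Typed":
        # Semantic compositionality: the composite is only defined when the
        # interfaces agree, and its type is determined by those of its parts.
        if self.cod is not other.dom:
            raise TypeError(f"cannot compose {self.cod.__name__} -> {other.dom.__name__}")
        return Typed(self.dom, other.cod, lambda x: other.fn(self.fn(x)))

parse = Typed(str, int,   int)              # "42" -> 42
halve = Typed(int, float, lambda n: n / 2)
shout = Typed(str, str,   str.upper)

pipeline = parse.then(halve)
print(pipeline.fn("42"))       # 21.0
# parse.then(shout)            # rejected at composition time: int -> str mismatch
```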
Hence, the only property in ML for which there is a guarantee of generalized, end-
to-end compositionality is differentiability [90]. If, as seems likely, it is necessary to
express more directly whether or not some desired property is compositional, then
this requires extending DL far beyond ‘differentiable programming’. In common
practice, composability in DL consists of assembling a network from constituent
parts which may be trained ‘end-to-end’. Usually, this follows the encoder-decoder
pattern where the encoder is responsible for generating vectorized features, and the
decoder maps the features for classification or regression. This paradigm is common
in deep learning applied to sequential data or labels. Example domains are text-
to-text [66, 276, 330, 370], image-to-text (and vice-versa) [169, 173, 283], and
program synthesis [46, 123, 124, 284]. When explicitly tasked with the generation of
compositional representations, neural networks have been observed to exhibit better
generalization performance [5]. However, as observed throughout the literature (e.g.
Liska et al. [204]), more complex architectures tend not to scale well, showing limited
scope. We argue the following as the most salient consequence:
Deep learning is not designed for generating typed representations. This defi-
ciency is prohibitive for developing general intelligence, since strong typing is
essential for invariant propagation, inheritance, verification, and rapid adapta-
tion of existing inferences to new observations.
invariances which does not involve the distillation of new types. This of course is an
open challenge and we discuss it further in Chap. 9.
Out-of-Domain Generalization and Meta-learning
An essential trait for general intelligence is the ability to efficiently leverage learned
knowledge when facing a novel yet related domain. Existing literature describes
techniques for domain adaptation, where a model can perform in another domain
with few or no labeled examples. Domain-invariant feature learning [102, 232] and
adversarial training methods [348, 352] have shown positive results for deep net-
works. Transfer learning and semi-supervised learning for deep neural networks are
also well-studied topics.
Nonetheless, there is consensus that there remains much to be desired from deep
learning in this regard. We argue that typed representations are the natural way to
address this requirement for general intelligence. Humans are naturally capable of
seeing new situations as modified versions of previous experience. In other words,
there is an abstract type of which both the prior observation and the current stimuli
are examples, but with certain attributes differing. Given enough new observations,
it may be appropriate to reify a different type altogether. Rapid domain adaptation
can also be modeled as a scientific exercise of determining an unknown type with
minimal experimentation. We expand on this perspective in Sect. 7.2.
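As a small illustration of this perspective (ours; the Pendulum type and its attributes are hypothetical), both the prior experience and the novel situation can be instances of one abstract type, and rapid adaptation reduces to re-estimating only the attributes that differ:
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Pendulum:                       # abstract type shared by old and new domains
    length_m: float
    gravity_ms2: float
    damping: float

def period_s(p: Pendulum) -> float:
    # small-angle approximation; damping does not affect the period here
    return 2 * math.pi * math.sqrt(p.length_m / p.gravity_ms2)

earth_lab = Pendulum(length_m=1.0, gravity_ms2=9.81, damping=0.02)
moon_lab = replace(earth_lab, gravity_ms2=1.62)   # same type, one attribute differs

print(round(period_s(earth_lab), 2), round(period_s(moon_lab), 2))   # 2.01 4.94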
Meta-learning has emerged as a popular research topic aiming to expand the
generalization of deep learning systems [86, 200, 227, 239, 286]. These methods
train models to a location in parameter space which allows for efficient adaptation
to unseen tasks as opposed to unseen data points. Conceptually, this may appear to
expand generalization capacity. However, the framework makes assumptions about
tasks coming from the same distribution, much as for individual data points in a
dataset. As such, it suffers similar issues, such as inflexibility to non-stationarity
in tasks. More related to generalization, meta-learning does not yield transferable
abstractions, rather it gives an optimized starting point for creating adaptable models.
As argued by Chao et al. [45], meta-learning is not fundamentally all that different
from supervised learning. This makes it unlikely to truly resolve the challenges of
generalization when the scope or nature of tasks is broadened.
4.3 Reflection
In the previous two sections, we made the case for compositionality and strong
typing as necessary properties for representing knowledge for general intelligence.
This section is concerned with what is needed in order to adapt that knowledge to
make it more accurate and comprehensive. In deep learning, this process is handled as
an optimization problem with the target being minimal error, which fits neatly with a
purely numerical class of models. The incorporation of symbols via typed expressions
complicates this but also offers new opportunities, which we now discuss.
The notion of scalability is applied in various ways. In this section we will draw
attention to scalability in terms of sample efficiency with respect to training data.
We expect that as an agent grows more intelligent, it should be able to evaluate and
compare increasingly complex models with roughly the same efficiency as when
it was less developed and learning about simpler phenomena. This is one of the
merits of the scientific method: given two competing theories for physical reality,
e.g. Newtonian mechanics and Einsteinian relativity, a single experiment (indeed,
even a ‘thought experiment’) may suffice to decisively favor one model over the
other.1
In this section we first characterize what learning looks like for deep neural net-
works and consider the choices that researchers make in order to support this sort
of learning. Ultimately, we find that the resulting formulation is not well suited for the
kind of rapid, scalable learning that general intelligence requires, and we present the
corresponding claim below:
Lacking compositionality and strong typing, deep learning also cannot support
meaningful reflection over proposed models of its target domain. The property
of reflection allows for direct, structured updates to knowledge, which compares
favorably to deep learning’s exponentially growing requirements for data and
an undesirable dependence on end-to-end model design choices.
We reiterate that Claim 1 given at the beginning of this chapter does not detract from
the merits of deep learning. To date, these methods have far outshone their prede-
cessors in their ability to learn features from observational data and make valuable
predictions from them. Most standard neural network architectures are also very
amenable to parallelization and acceleration, making them practical for their cur-
rent use cases. Hence, for some, the preceding discussion may not provide sufficient
impetus to look beyond deep learning. For many practical applications in narrow
domains, it suffices to have training methods which are applicable in the presence
of relatively massive computational resources. In theory, recent work [81, 99, 236,
278, 309, 375] could yield representations which are causally disentangled and enjoy
greater compositionality, but it seems likely that even the exponents of such forms
of causal representation learning would admit that much progress is yet to be made.
We therefore recall the motivation which opened this chapter: the challenges for
deep learning we have discussed arise in the context that the paradigm was not
designed with general intelligence in mind. We can tie the challenges of machine
learning together as being the symptoms of a ‘narrow framing of the problem’. The
most salient part of the framing is that deep learning model parameters are intended
to converge to some satisfactory optimum given the dataset and iterative learning
procedure. Training a model with a priori knowledge of the desired outcome is
fundamentally at odds with the notion of open-ended learning [340], an essential
part of general intelligence.
It should be emphasized that, even for the task of constructing general intelli-
gence, we do believe that deep learning may be the most sensible and practical way
to implement very basic layers of perception on high-dimensional sensory inputs
such as visual and audio feeds. Although we have emphasized the importance of
compositionality and strong typing, we also acknowledge that they may not always
be applicable at this lowest level of representation.
Chapter 5
Challenges for Reinforcement Learning
1 Even though model-based RL leverages world models during learning, its end-product still remains
a policy.
[Fig. 5.1 diagram. Requirements: Knowledge Representation (Compositionality, Sect. 4.1; Strong Typing, Sect. 4.2; Reflection, Sect. 4.3)]
Fig. 5.1 In this chapter we argue that the centrality in RL of a priori fixed reward functions opposes
key requirements for general intelligence
Claim 4
Reinforcement learning techniques which use deep neural networks for function
approximation will inherit the issues stated in Chap. 4.
As per Sect. 2.1, we center our requirements around the continual adaptation of a
world model to achieve a variety of arbitrary goals. We also stipulate that (1) learning
without forgetting should be relatively cheap and (2) safety must be certifiable.
Arguably one of the most central concepts in RL is the value function, which usually
expresses the value of a state or state-action pair in terms of the expected return
under some policy starting from that state—recall from Sect. 3.1 that a policy is
simply a distribution of actions conditioned on states. The iteration of value functions
toward a fixed point makes sense because the reward in RL is specified beforehand
and is assumed to remain stationary. If we entirely commit to this strategy, then the
only way to become more general is to widen stationarity over broader notions of
tasks and environments. Learning is then extended to maximize reward over an entire
distribution of MDPs [277, 293, 305], which share a state and action space, whereas
the reward function is selected from a distribution. These modifications, which are
examples of meta-RL, purportedly allow for an agent to learn how to succeed over a
variety of tasks, and would seem to be the most developed route in fully stationary
RL towards general intelligence.
Notwithstanding those extensions, the value function is still an ideal place to
scrutinize the foundations of RL. The assumptions of RL make it suitable to apply to
problems where there is a naturally occurring quantitative objective to be maximized
and there are ‘operationalized’ actions [11], as exemplified by e.g. video games,
where even random behavior has non-negligible correlation with success. Even then,
the RL engineer may have to do significant work to make a dense, shaped reward
from a sparse reward, in order to obtain a sufficiently rich reward signal. This is not
all that different from having to invest effort into designing the architecture of neural
networks to accommodate inductive biases and ease nonconvex optimization. In the
context of general intelligence, we therefore present the following claim:
Claim 5
which explicitly requires a proxy for desired states and would consequently be vul-
nerable to Goodhart’s Law: “When a measure becomes a target, it ceases to be a
good measure.” [122].
We believe that RL is given credit for much of the work that is actually attributable
to engineers through their intelligent design of reward functions and model archi-
tectures. More fundamentally, we argue that a framework for general intelligence is
better founded if it does not assume that rewards (or more generally, objectives) can
ever be fully known, even by the stakeholders. While some things can be optimized
to a fixed point, a generally intelligent agent intended to replace human labor would
not benefit from such essentially stationary notions. For example, domestic robots
that are expected to work well around growing children must be able to accommodate
ever-changing constraints and preferences from the family as its youngest members
grow up. In fact, it becomes increasingly evident that, in the context of general intel-
ligence, the very concept of ‘reward function’ ends up creating more problems than
it solves. This notion is explored further in Sect. 6.1.
In Sect. 4.3, we emphasized that reflection is a key requirement for knowledge rep-
resentation. As with our preceding discussion of deep supervised learning, we have
identified that feedback mechanisms are a key bottleneck in RL. In the former, we
focused on the limitations of iteratively updating neural networks through gradient
descent. Here, we discuss an analogous issue which is fundamental to RL, irrespec-
tive of reliance on function approximation using deep architectures. In essence, in
the MDP formalism the agent receives a reward at each timestep as a result of its
state and chosen action, and hence, goals (taken to be regions of the state space) are
only expressed via sampling. Note that RL based on model-predictive control (MPC)
often takes a reward function in closed form, visible to the agent, though this still
leaves open the problem of finding its optima.
For arguably the majority of human activities we wish to automate, we take
advantage of the fact that we can interpret our desired goals prior to the need to
sample any information from the environment. Granted, this requires sufficient world
knowledge to make sense of a goal description, but in such a case, this approach
confers multiple benefits. We explore ways to make use of this perspective in Sect. 6.1.
Here, we highlight the undesirable side effects of a sample-based approach to goal
descriptions, of which RL reward feedback is an example. The key point is given in
the following claim:
Claim 6
Chapter 6
Work on Command: The Case for
Generality
Workin’ 9 to 5,
What a way to make a living,
Barely gettin’ by,
It’s all takin’ and no giving,
They just use your mind and they never give you credit,
It’s enough to drive you crazy if you let it.
Let us recall that, from a pragmatic perspective, AI is nothing more or less than a
tool for implementing a new leap in automation. This pragmatic perspective on AI
is the one that matters in the current world economy, and therefore will necessarily
receive primacy for development. While we do acknowledge AI-related notions such
as ‘artificial life’, a significant business case for ‘AI as organism’ has yet to be
demonstrated, and therefore we consider AI that does not directly seek to deliver
automation to be out of scope.
As discussed in Sect. 3.2, applications in automation with a known and well-
defined target or utility function can be addressed using reinforcement learning—RL
allows one to optimize a controller to perform exactly that task with guarantees on
speed and accuracy. However, training policies is difficult and time consuming, and
this hampers an orthogonal class of applications that require the minimization of
the cost/latency incurred by engineer-then-deploy cycles. There is indeed a recurrent
need in industry to streamline automation engineering and, in particular, to capitalize
on learning processes, recognizing that knowledge acquired from automating one
process can be useful for automating the next. Moreover, techno-economic conditions
for business sustainability are shifting rapidly, as exemplified by e.g. the Industry 4.0
initiative in Europe [74, 141]: production lines need to absorb increasing levels of
flexibility, meaning that processes are irreversibly moving away from the stationarity
that was once the norm. For example, a system may be tasked with controlling the
assembly process of product X, but after a few months X is replaced by new-and-
improved product Y, similar to X in some ways. Now we would like to tell the
system:
Stop assembling X immediately, here’s a specification of Y, and here are most of your old
and a few new effectors. Now start assembling Y, avoiding such and such kinds of defects
and wastage.
We use the notion of ‘work on command’ to refer to the ability of a system to
respond, at any time, to changes in task specifications, both positive (goals to be
achieved) and negative (constraints to be respected). To be of any use, we include
in this notion the ability to leverage all relevant knowledge from prior tasks with
little effort. Such leveraging should be non-destructive, i.e., it must be possible to
command the system to resume an earlier task, on which it should in general be able
to perform at least as well as before.
We posit that performing work on command requires general intelligence. Keep-
ing the pragmatic perspective focused on automation engineering, a system’s gener-
ality can be measured as the inverse of the cost of its deployment and maintenance
in a given range of real-world task/environment spaces.1 It will be clear that the
more general an AI system, the better (and cheaper) it will be at performing work on
command.
We have seen in Sect. 3.1 that the function of RL is to compile a policy consisting
of a direct mapping from environment states to actions. In this paradigm, behavior
is the (fixed) computation of a response to a stimulus, best attuned to the assumed
specifications of the task and the environment. This notion of ‘behavior as a curried
planner’ offers the benefits of speed and accuracy: once learned, a policy needs
little computation to deliver optimal results. However, this comes at the cost of
brittleness: adaptation is impossible should the task or environment escape the initial
assumptions after deployment. But it does not have to be like this. In cybernetics,
system theory, and psychology, behavior is better described as “a control process
where actions are performed in order to affect perceptions” as noted by Cisek in his
critique of computationalism [49]—see also von Uexküll [191]. Since it is a process,
a behavior can be more easily adapted to the variability of its goal, environment,
and contingencies. For this reason, we consider processes, not algorithms, as the
concept most constructive for the synthesis of intelligent systems, as opposed to
purely reactive systems—see also “The Irrelevance of Turing Machines to AI” by
Sloman [316].
From this perspective, agents have to learn a world model which allows for
dynamic planning through simulated trajectories.2 The value proposition is that this
1 One might argue that generality is better defined as inversely proportional to computational effort
than to monetary cost. However, both involve empirically-determined (but in general, arbitrary)
resource expenditure. From the pragmatic, real-world perspective taken here, we find money the
more suitable unit of measurement.
2 As noted before, even if model-based RL leverages world models while learning policies, the final
product still remains a policy.
objectives to hasten the learning of a single complex task [6, 161, 237] or alternatively
parameterize a set of related tasks for multitask learning [322, 371]. Regardless, there
is no alteration to the reward structure: it is defined a priori and is sampled from every
state because while the ‘goal’ is known, the reward is not.
The goals in WoC-RL effectively replace the notion of reward to become the
sole motivation for agent action. Whilst state space regions defined by these goals
can indeed be equipped with some associated quantity indicating desirability (which
could therefore be said to constitute a ‘reward’ when reached), the set of state space
regions from which this reward can be earned is explicitly specified, thus obviating
the need for pointwise sampling of rewards. As such, a WoC-RL agent has access,
at any real-valued time t, to the following goal structure G:
Gt = { (S^1, T^1, R^1), (S^2, T^2, R^2), . . . }    (6.1)
where each S^i is (a partial specification of)5 a state, each T^i is a time interval, and
each R^i is a positive or negative real. Having access to Gt, the agent knows which
(future) states are promised to give a reward or punishment when visited during a
specific time interval. If the current state St matches a rewarding state (i.e., there
exists (S^i, T^i, R^i) ∈ Gt such that St ⊇ S^i and t ∈ T^i and R^i > 0), a goal is said to
be achieved; if the current state matches a punishing state (R^i < 0), a constraint is
said to be violated. The relative values of R^i are only useful to an agent in that they
aid prioritization between different goals.
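The goal structure of Eq. (6.1) is straightforward to render as data. In the sketch below (ours; Goal, matches and classify are hypothetical names), a partial state specification S^i is a dictionary of required dimension values, so that St ⊇ S^i becomes a containment test, and the sign of R^i distinguishes goals from constraints:
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Goal:
    spec: Dict[str, float]    # partial state specification S^i
    t_start: float            # time interval T^i
    t_end: float
    value: float              # R^i: positive for a goal, negative for a constraint

def matches(state: Dict[str, float], g: Goal, t: float) -> bool:
    # St ⊇ S^i and t ∈ T^i
    return g.t_start <= t <= g.t_end and all(
        state.get(k) == v for k, v in g.spec.items())

def classify(state: Dict[str, float], G: List[Goal], t: float):
    achieved = [g for g in G if matches(state, g, t) and g.value > 0]
    violated = [g for g in G if matches(state, g, t) and g.value < 0]
    return achieved, violated

G_t = [
    Goal({"part": 1.0, "assembled": 1.0}, 0.0, 3600.0, +1.0),
    Goal({"collision": 1.0}, 0.0, float("inf"), -10.0),
]
print(classify({"part": 1.0, "assembled": 1.0, "collision": 0.0}, G_t, t=60.0))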
The idea is that the tuples in G will typically persist over long time spans. Nonethe-
less, in accordance with the requirement to perform work on command, a user can
at any time modify G by adding or deleting state space regions: whether goal states
in order to ‘command’ the agent to achieve things, or forbidden regions in order to
‘warn’ the agent. Assuming that the user is not perfectly wise in specifying the work,
it may happen that G contains just one or a few goals at any time, but ever
more numerous or detailed constraints, as the user's insight increases.
The expected value of the reward of a WoC-RL agent for taking action a in state
s can be defined as in traditional RL [331]:
r(s, a) ≐ E[ Rt | St−1 = s, At−1 = a ]    (6.2)
but WoC-RL additionally defines the reward signal R as being completely specified
by G:
Rt | G ≐ [ R | (S, T, R) ∈ G, St ⊇ S, t ∈ T ]    (6.3)
where Rt | G reads as ‘the reward given the goals’. The long-term return as well as
the value function can again be defined as is traditional in RL [331]; however, neither
of these notions serves any purpose in WoC-RL.
The goal structure G can also be viewed as an explicit description of those states for
which a (sparse) reward function would output non-zero values—the only difference
being that this description is explicitly given to the WoC-RL agent without needing
to be sampled. Here we see that RL is a restricted case of WoC-RL, namely where a
(hypothetical) user would only provide instantaneous goals. That is, RL is obtained
if G is restricted as follows.
This stipulates that rewards are immediate and their distribution is not explicitly
known a priori. Restricted in this way, WoC-RL turns into traditional RL, as the
agent lacks prior information about goals and constraints, never being told ahead of
time when or where a reward might be received.
6.2 Planning
A WoC-RL agent acts by invoking a function
work : Time → A
that finds the action with the best return according to its current world model M and
goal structure G (which are retrieved from an implicit storage). Here it is crucial to
note that work is fixed, i.e., it is not a learnable policy. As prescribed before, G is
changeable by the user but not by the agent. That leaves M—the world model—as
the only adaptable ingredient of WoC-RL. For pedagogical reasons let us formally
specify work in terms familiar to the RL community:
work(t) ≐ arg max_a q∗(St, a, t | Gt, Mt)    (6.5)
where q∗ is analogous to the optimal action-value function [331], except here it takes
not only state s and action a, but also explicitly propagates time t, goals G, and model
M:
q∗(s, a, t | G, M) ≐ Σ_{s′,t′} p_M(s′, t′ | s, a, t) [ (R_{t′} | G) + max_{a′} q∗(s′, a′, t′ | G, M) ]
where it is assumed the state transition probability p is included in, or can be derived
from, model M.6 It is important to note that the goal structure G is propagated
unchanged by the agent even though it can be changed at any time by the user: if that
happens at time t′, work(t′) will start using the new goal structure Gt′ containing
the user’s new goal and constraint specifications. This is in line with the work-on-
command ideal of an agent performing task(s) exactly until the user tells it to do
otherwise.
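For concreteness, formula (6.5) and the recursion above can be caricatured as exhaustive model-based search. In the sketch below (ours), the world model is a toy transition table, goals simply assign values to named states, time is left implicit, and a finite horizon replaces the unbounded recursion; none of this is the inference mechanism developed later in the book.
# Toy stand-in for the world model M: p(s' | s, a) as a lookup table.
MODEL = {
    ("idle", "pick"): [("holding", 1.0)],
    ("idle", "wait"): [("idle", 1.0)],
    ("holding", "place"): [("assembled", 0.9), ("dropped", 0.1)],
    ("holding", "wait"): [("holding", 1.0)],
    ("assembled", "wait"): [("assembled", 1.0)],
    ("dropped", "wait"): [("dropped", 1.0)],
}
G = {"assembled": +1.0, "dropped": -1.0}          # goal structure: state -> R

def actions(s):
    return [a for (s0, a) in MODEL if s0 == s]

def q_star(s, a, G, model, horizon):
    total = 0.0
    for s_next, p in model[(s, a)]:
        r = G.get(s_next, 0.0)                    # (R | G) for the reached state
        future = 0.0
        if horizon > 1 and actions(s_next):
            future = max(q_star(s_next, a2, G, model, horizon - 1)
                         for a2 in actions(s_next))
        total += p * (r + future)
    return total

def work(s, G, model, horizon=4):                 # arg max_a q*(s, a | G, M)
    return max(actions(s), key=lambda a: q_star(s, a, G, model, horizon))

print(work("idle", G, MODEL))                     # 'pick': leads towards 'assembled'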
The purpose of the above formula is to demonstrate that WoC-RL is not con-
cerned with sampling rewards or learning a policy. State quality q∗ is well-defined
in terms of G and M which are given to q∗ . Action sequences (plans) output by
work change if and only if G or M changes. Thus, here we see how WoC-RL avoids
conflating knowledge and motivation, which is crucial for the ability to work on com-
mand: whenever motivations are (externally) adapted, the latest knowledge should
immediately be brought to bear to act toward those new motivations. In contrast, the
classical notion of a policy is as a body of (behavioral) knowledge with respect to
one target. This offers no provisions for re-applying the same knowledge for new
and changeable targets, such as when automating a succession of related business
cases.
Still for pedagogical reasons, we sketch below the algorithmic context in which
work can be embedded. The simplest setup is a loop which also updates the world
model M based on the results of its predictions (gathered in ‘pred’ below). Note
that changes to the goal structure are supposed to occur externally: work(t) always
retrieves the actual Gt .
1 pred := ∅;
2 while true do
3   t := WallClock.now();
4   expired := { st′ ∈ pred | t′ ≤ t };
5   update Mt if 'expired' contains surprises;
6   pred := (pred \ expired) ∪ { St };
7   pred := pred ∪ { forward(Mt, st′) | st′ ∈ pred };
8   a := work(t);
9   execute(a);
6 Although in RL it is typically the case that t′ = t + 1, WoC-RL does not require such discrete
time-stepping, assuming instead simply that t′ > t.
The function forward yields zero or more predictions. Without going into the
detail of forward, we remark that its predictions could carry a likelihood, which may
be multiplied for each forward step—and forward may omit predictions below a
certain likelihood threshold. In that case, line 7 could be performed in a loop until
‘pred’ no longer changes.
We see in this algorithm that Mt is used for making predictions, causing Mt to be
updated whenever they fail. Now we can also see the naivety of work as specified in
formula (6.5): if Mt is used forward for prediction making, then it should also be used
backward for abductive planning. Such abductive planning can proceed iteratively,
in a similar manner to the prediction making as specified in the above algorithm.
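Purely as a sketch (ours), lines 1–9 together with the likelihood-weighted forward discussed above might be rendered as follows; observe, execute, work and update_model are stubs supplied by the caller, and abductive planning is not attempted:
import time

THRESHOLD = 0.05                     # forward may omit unlikely predictions

def forward(model, state, t, lik):
    # zero or more successor predictions, each carrying a likelihood
    return [(s2, t + 1.0, lik * p)
            for s2, p in model.get(state, []) if lik * p >= THRESHOLD]

def loop(model, observe, work, execute, update_model, steps=100):
    pred = set()                                          # 1
    for _ in range(steps):                                # 2 (bounded for the sketch)
        t = time.time()                                   # 3
        expired = {p for p in pred if p[1] <= t}          # 4
        actual = observe(t)
        if any(p[0] != actual for p in expired):          # 5: surprises revise the model
            model = update_model(model, expired, actual)
        pred = (pred - expired) | {(actual, t, 1.0)}      # 6: include the current state
        for _ in range(10):                               # 7, iterated (bounded here)
            new = {q for p in pred for q in forward(model, *p)} - pred
            if not new:
                break
            pred |= new
        execute(work(t))                                  # 8 and 9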
Notice also that the actual work on command, as embodied in lines 8 and 9, can
be parallelized with the prediction making, since these two lines do not depend on
the others, only on the objective time. Thus, we see here how we can finally abolish
the ‘cognitive cycle’, namely the perceive–act–update loop in RL, which is inherited
from the sense–think–act loop in GOFAI. However, RL still relies on time-stepping
in lockstep with the environment, dangerously assuming that the environment does
not change while the agent chooses its next action. The way out is to realize that
changes in knowledge and motivation do not have to occur at every time-step or at
set times and that plans span over larger time horizons.7 For as long as predictions
succeed, and for as long as the goal structure is not modified by the user, the plans
that were or are being formed can be assumed to remain valid—to the best of the
agent’s knowledge. Thus, a WoC-RL agent could be made to learn, predict, and plan
continually and asynchronously from the environment. An asynchronous open-ended
continual inference mechanism will be described later in Sect. 7.3.
Finally, this sketch of WoC-RL illustrates the need for bootstrapping: if M and/or
G would start out empty, no work will be performed. In the early learning phase, the
user will need to ‘seed’ M with minimal world knowledge, and populate G with
simple goals that can be achieved with that little knowledge. The user may have to
act as a ‘teacher’ or ‘mentor’, populating G with progressively more difficult goals
and constraints.
Here we have described how a pragmatic perspective on general intelligence
calls for the engineering of systems that can perform work on command. WoC-
RL illustrates what such a system might look like, and how it would differ from
‘vanilla’ RL. A crucial aspect is the decoupling of knowledge (M) and motivation
(G). Although we have detailed what G may look like technically, we have so far not
delved into the learnable M. To attain general machine intelligence, both G and M
must be framed in a different manner than has been customary throughout the history
of AI, from GOFAI to reinforcement learning. The proposed approach is described
in the following chapters.
7This would require changing the signature of the anytime planner work to return a set of timed
actions instead of a single one.
[Fig. 6.1 diagram. Requirements: Knowledge Representation (Compositionality, Sect. 4.1; Strong Typing, Sect. 4.2; Reflection, Sect. 4.3); Tasks (Declarative Goals and Constraints, Sect. 5.2; Non-stationarity, Sect. 5.1)]
Fig. 6.1 Anytime operation is required to deliver relevant, correct action plans on time and asyn-
chronously (relative to system activity). It enables the economically desirable capability of work on
command
8Given a goal, at time t, with a deadline d, the WCET of predicting and planning must be less than
d − t.
6.3 Anytime Operation
Within the study of bounded rationality [362], there is work on the corresponding concept of
anytime bounded rationality [27, 150, 241], and we see that this combination is an apt
way to unify many of the desirable properties of general intelligence. Unfortunately,
common practice in RL is fundamentally incompatible with anytime rationality since
it relies both on synchronous coupling with the environment and the concept of
‘behavior as a curried planner’. As alluded to previously, anytime operation has not
been considered in most demonstrations of RL to date, but we believe that its absence
will become a stubborn hindrance for situated control and multi-agent scenarios.
Part II
Semantically Closed Learning
Chapter 7
Philosophy
In many respects, machine learning’s current concerns are reminiscent of those which
heralded the rejection of GOFAI. Since it was evidently not possible to construct a
suitably malleable model of the world a priori in terms of rigid logical facts, the
solution was surely to induce the required representations, ideally from raw data?
Given the challenges previously discussed, the requirement to create robust models
is just as pressing today as it was in the early 1990s, when GOFAI was nominally
supplanted by the ‘Physical Grounding Hypothesis’ [35]. In that sense, AI still needs
learning algorithms that can do more than ‘skim off the surface’ of the world they
attempt to represent. By this, we mean that knowledge representation should enjoy
both robustness and malleability. By ‘robustness’ we mean Gestalt compositional
interpretation in the presence of noise, so that turtles are not considered to be rifles
[9] even if their images have some similarity at a local scale. By ‘malleability’ we
mean the ability to envision a range of alternative hypotheses which are compatible
with some context. In order to achieve this, we believe that machine learning needs
to undergo the same fundamental shift that took place in the philosophy of science
in the mid 20th century.
The discussion of Chaps. 4 and 5 illustrates that a major concern for DL and RL is the
ability to obtain robust generalizations with high sample efficiency.1 However, the
ubiquity of domains with long-tailed distributions is antithetical to the very notion that
learning can be predominantly driven via sampling. For example, a widely-respected
pioneer of autonomous vehicles [339] has stated:
To build a self-driving car that solves 99% of the problems it encounters in everyday driving
might take a month. Then there’s 1% left. So 1% would still mean that you have a fatal
accident every week. One hundredth of one percent would still be completely unacceptable
in terms of safety.
Human drivers, though naturally fallible, are considerably more robust to the combi-
natorially vast range of situations they might encounter. What change in perspective
might be required in order to imbue a system with analogous capabilities?
The essential practice of supervised learning is to tabulate samples of input/output
observations and fit a regression model based on a numerical loss function. Like-
wise, the RL framework optimizes an objective function sampled iteratively from
the environment, and deep RL uses deep learning tools for function approximation.
As previously observed [212, 344], in terms of the philosophy of science, this is
very much in the empiricist tradition, in which observations are the primary entities.
In DL, treating observations as primary has led to the notion of model induction
as a curve-fitting process, independent of the domain of discourse. However, the
incorporation of sufficient analytic information can obviate the need for sampling
in both cases. As observed by Medawar, ‘theories destroy facts’ [221]: in order to
predict e.g. future planetary motion, we do not need to tabulate the position of all
the molecules that compose a celestial body, but rather apply Newton’s Laws to
a macroscale approximation of its center of gravity [326]. Similarly, under certain
conditions a description of the behavior of the simple pendulum can be obtained
in closed form [160], demoting empirical sampling of orbits to the role of fitting a
distribution to any remaining noise. Indeed, science itself can arguably be charac-
terized as the progressive transformation of ‘noise’ into ‘signal’: replacing, insofar
as human comprehension permits, uncertainty and nondeterminism with coherent
(i.e. relationally-consistent) structure. The resulting structures yield a much stronger
notion of ‘compression’ than expressed by the corresponding use of the term in RL.
Although the empiricist perspective has prevailed since the Renaissance, it was
inextricably bound to a deep philosophical problem: the ‘Problem of Induction’. The
problem asks what firm basis we have to believe that past inferences will continue
to be valid, e.g., that the Sun will continue to rise in the morning. Epistemologically,
we cannot hypothesize that past distributions of observations will resemble future
distributions, since this begs the question.
1 Addressing this concern is necessary but not sufficient: even if we could obtain robust generaliza-
tions via current DL methods, they would be neither granular nor compositional, both of which we
subsequently argue for.
The problem resisted all solution attempts until Karl Popper provided one in the mid
20th century [268]. Popper's solution
was to show that conclusions could be well-founded without requiring that ‘laws’ or
distributions be somehow Platonically propagated through time. Instead, he argued
that although our hypotheses may be inspired by observation, they are altogether of a
higher-order, being predominantly characterized by their explanatory power. As sub-
sequently further developed by Deutsch [63], the key objects of discourse for science
are therefore not observations but the explanatory characteristics of the hypotheses
which they motivate. Hence, consistent with the definition of Sect. 2.2, we can con-
sider a hypothesis as an inter-related system of statements intended to account for
empirical observations. The tentative, self-correcting nature of the scientific method
means that:
• At any given instant, the collection of statements is not necessarily entirely self-
consistent (cf. the longstanding inability to reconcile quantum mechanics and
general relativity).
• Falsifiability via observation is not the primary driver. Although Popper empha-
sized falsification via observations, a subsequent refinement [186] emphasized
that the prevailing hypothesis may not even agree with all observations, provided
that it is a rich enough source of alternative hypotheses which potentially could
do so. It seems reasonable to consider this to be the spirit underlying the opening
Feynman quote.
A famous demonstration that such heuristics are part of common scientific prac-
tice is the discrepancy between the predictions of general relativity and the observed
rotational velocities of galaxies [246], relativity being a hypothesis which has repeat-
edly been vindicated in other wide-ranging experiments. Hence, while there is still
some global inconsistency between different local models (which will continue to
motivate further, hopefully ultimately unifying, hypotheses), this still allows the use-
ful application of local models at the appropriate scale. Over the years, the philosophy
of science has conjectured various heuristics for confronting rival hypotheses:
• Parsimony: this heuristic is exemplified by ‘Occam’s Razor’. However, it must be
stressed that this is not merely a domain-independent measure such as is advocated
by Algorithmic Information Theory [43], but something that is achieved via reflec-
tive interpretation of the hypothesis in order to reconcile causal inconsistencies.
• ‘Hard to Vary’: This notion was introduced by Deutsch [63]. Hypotheses which are
so unconstrained as to permit the generation of many roughly-equivalent alterna-
tives are unlikely to capture the pertinent causal aspects of a situation. Conversely,
when a hypothesis which has been preferred to many others generates few or no
alternatives, that is an indication that it is a good hypothesis. An initial investigation
of the role played by ‘hard to vary’ heuristics in AI is given by Elton [79].
It is also important to note that, by virtue of compositionality, the notion of hypoth-
esis here is stronger than ‘distribution over outcomes’ [64]. For example, suppose that
six in a hundred patients who are flu sufferers were to hold a crystal and experience
a subsequent improvement. Despite statistical significance, an experimenter would
not (in the absence of some other deeply compelling reason) subscribe immediately
to the notion that the crystal was the cause, because of the end-to-end consistency
of existing explanations about how viral infections and crystals actually operate.
Such inferences therefore operate at a different level than purely statistical notions,
in which claims of causality must anyway be justified in terms of priors known to the
domain-aware researcher when they frame the experiment. Hence the researcher here
has two privileges that traditional ML lacks: firstly, prior semantic knowledge about
the type of variables (the displacement of the pendulum bob is measured in radians,
the color of the pendulum bob is a property of the material with which it is coated,
etc.). Secondly, in the case that prior knowledge (or hypothetical interventions such
as Pearl’s ‘do operator’ [254]) does not adequately make the case for causality, the
researcher has the potential to clarify further via alternative experiments.
Scientific explanations have proved remarkably effective in describing the world
[363]; for example, our understanding of force and motion at the ‘human scale’ (i.e.,
between quantum and relativistic) has remained robust since Newton. Most signif-
icantly, such understanding is emphatically not in general a quantitative function
of the causal chain (e.g., some loss or objective function), but is instead dependent
on the overall consistency of explanation. ‘Consistency’ here means not only con-
sistency with respect to empirical observation, but the ‘internal consistency’ of the
entire causal chain described by the hypothesis.
The solution to the ‘Problem of Machine Induction’ should therefore precisely
mirror Popper’s solution to the ‘Problem of Induction’, i.e., to reject empiricism in
favor of explanatory power and attempt to afford suitably curious machine learners
the same privileges in determining causality as are presently enjoyed only by human
experimenters. In the remainder of this chapter, we describe ‘Semantically Closed
Learning’, a framework proposed to support this.
Just as the logical expressions of GOFAI could be said to be too ‘rigid’ with respect
to their ability to model complex environments, so the parameter space of DL archi-
tectures is too ‘loose’. Hence, while it is relatively computationally inexpensive to fit
a deep learning model to almost any naturally-occurring observations [203], general-
ization is certainly not assured [374]. It would appear that something intermediate is
required, in which there is no requirement for a priori provision of either an arbitrarily
complex objective function or an exponentially large collection of rules. To that end,
we describe below a set of operations intended to support principled and scalable
scientific reasoning. In particular, the ‘scientific’ aspect of reasoning can be charac-
terized by the gradual progression from an extensional representation (i.e., pointwise
tabulation of the effects of operators) to an intensional one (i.e., representable as an
expression tree with a knowable semantic interpretation), as with the analytic descrip-
tion of the pendulum described above.2 These operations are invoked by a granular
inference architecture, a reference version of which is described in the next chapter.
We tie these together under the heading of ‘Semantically Closed Learning’ (SCL),
the name having been chosen with reference to the property of ‘semantic closure’. In
the context of open-ended evolution, the term semantic closure was coined by Pattee
[252, 253], who described it as:
An autonomous closure between (A) the dynamics (physical laws) of the material aspects
of an organization and (B) the constraints (syntactic rules) of the symbolic aspects of said
organization.
2 As previously stated in Chap. 4, we of course acknowledge that not all phenomena can be effec-
tively compressed analytically. In certain cases, deep learning is indeed a viable approach, but the
anticipation is that this will occur predominantly at the ‘leaves’ of the expression tree, which are
operating directly on raw sensor data.
In one sense, all human beings are scientists (cf. Sloman on ‘Toddler theorems’
[317]). For example, even at a nascent level of cognition, concept formation [262]
can be seen as abstracting from an iterated process of hypothesis generation/valida-
tion. One may consider higher levels of cognition to be hierarchical, in the sense that
they make use of lower-level hypotheses (such as object permanence) as elements. A
certain amount of introspection into one’s own problem-solving activity will reveal
that higher levels of human reasoning are an ‘end-to-end white-box’ activity: arbi-
trary (and even novel) features of both a problem and its proposed solutions can
be confronted with one another. These features are of course ultimately grounded
in experience of the real world [187]. As such, the hypotheses evoked by any con-
frontation of features are so strongly biased towards the real world that events from
the long-tail of impossible/vanishingly unlikely worlds are never even entertained.
However, to talk purely in terms of bias as a means of efficient inference is missing
a key aspect of human cognition in general, and the scientific method in particu-
lar: a hierarchy of compositional representations offers the potential to reason at a
much coarser granularity than the micro-inferences from which it was constructed.
Therefore, reasoning can be considered to occur in a Domain-Specific Language
(DSL) which has been abstracted from the environment by virtue of the ubiquity and
robustness of its terms [87, 187]. This is in contrast to the prevailing approach in
ML, in which inference is entirely mediated through numerical representations, as
biased via some loss or reward function. There, some level of generality is achieved
by reducing the notion of feedback to the ‘lowest common denominator’ across
problem domains.
It is therefore instructive to consider compositional learning in the context of the
historical development of intelligent systems. The early cyberneticists understood
that ‘purposeful’ behavior must be mediated via feedback from the goal [299]. Since
they were predominantly concerned with analog systems, the feedback and the input
signal with which it was combined were commensurate: they could be said to be
‘in the same language’. In the move to digital systems, this essential grounding
property is typically lost: feedback is often mediated via a numeric vector that is typ-
ically a surjective mapping from some more richly-structured source of information.
Useful information is therefore lost at the boundary between the learner and the feed-
back mechanism. In Sect. 9.4, we describe a compositional mechanism for hybrid
symbolic-numeric feedback at arbitrary hierarchical levels that does not intrinsically
require any such information loss.
As detailed subsequently, while the specific choice of expression language for the
DSL is rightfully an open-ended research problem, a number of elementary properties
are required:
[Fig. 7.1 legend: type; compound type; rule; rule firing event type; value; match; rule firing; production; rule firing event; panels a–e]
Fig. 7.1 Concepts involved in the reasoning process. a A rule (colored circle) implements a relation
between typed values (shapes on either side). For forward inference, rules are read left-to-right: an
object of one type is transformed into an object of another type via a transfer function. b A type
may be structured in terms of other types. c A repertoire of rules and types. Rules are values and
may be composed, such as in the blue and gray rules. Rule firing is also a value (here depicted on
the left side of the yellow rule), and so the reasoning process (i.e., the production of inferences)
can be reasoned about. d A possible unfolding of forward inferences produced by the repertoire. e
Inferences can produce new rules—they can also produce new types (not depicted)
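To make the notions of Fig. 7.1 concrete, the following sketch (ours; every name is hypothetical, and the real representation is developed in later chapters) treats a rule as a value carrying input and output types, composes rules only when those types agree, and records each firing as a value that can itself be reasoned about:
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass(frozen=True)
class Rule:                         # a relation between typed values (Fig. 7.1a)
    in_type: str
    out_type: str
    transfer: Callable[[Any], Any]

@dataclass(frozen=True)
class Firing:                       # rule firings are themselves values (Fig. 7.1c)
    rule: Rule
    value_in: Any
    value_out: Any

def fire(rule: Rule, typ: str, value: Any, trace: List[Firing]):
    if typ != rule.in_type:
        raise TypeError(f"rule expects {rule.in_type}, got {typ}")
    out = rule.transfer(value)
    trace.append(Firing(rule, value, out))    # the reasoning process is recorded
    return rule.out_type, out

def compose(r1: Rule, r2: Rule) -> Rule:      # rules are values and may be composed
    if r1.out_type != r2.in_type:
        raise TypeError("incompatible rule composition")
    return Rule(r1.in_type, r2.out_type, lambda x: r2.transfer(r1.transfer(x)))

angle = Rule("displacement:m", "angle:rad", lambda d: d / 2.5)
period = Rule("angle:rad", "time:s", lambda a: 2.0 + 0.1 * a)
trace: List[Firing] = []
print(fire(compose(angle, period), "displacement:m", 0.5, trace))
print(len(trace))                             # the inference itself can be inspected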
At the most elementary level of representation, labels for state space dimensions
can be used to index into code (e.g. stored procedures) and/or data (e.g. ontolo-
gies). Building upon this explicit delineation and naming of state space dimensions,
a defining property of SCL is the use of a strongly-typed ‘expression language’ [265]
which can be used to represent both constrained subregions of the state space and
the ‘transfer functions’ that map between such regions (see Fig. 7.1). Types therefore
form the basis for a representation language which, at a minimum, constrains infer-
ence to compose only commensurate (i.e. type-compatible) objects. Unlike testing,
which can only prove the presence of errors, the absence of errors (and indeed, more
general safety guarantees) can be witnessed via strong typing. An elementary such
example is the construction of causal models which are physically consistent with
respect to dimensional analysis.
In software engineering, such modeling has well-understood safety implications;
for example, the bug that led to the loss of NASA’s ‘Mars Climate Orbiter’ in 1999
was due to an invalid conversion between two different dimensional representations
of impulse [215]. However, this example only scratches the surface of what can be
expressed [263, 265]: the rich body of work in type theory is an ongoing investiga-
tion into which aspects of system behavior can be expressed statically, i.e., without
requiring actual program execution. For example, certain invariants of the application
of transfer functions to subregions of the state space can be modeled via refinement
types, which use predicates to define which subset of the possible values repre-
sentable via a tuple actually correspond to valid instances of the type; as pedagogical
examples, one can explicitly define the type of all primes or the type of pairs of even numbers.
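As a pedagogical sketch (ours, in a dynamically-typed language rather than one with genuine refinement types; all names are hypothetical), both dimensioned quantities and refinement-style predicates can be approximated by types whose constructors reject invalid values, with a static checker such as mypy catching unit confusions of the Mars Climate Orbiter kind before execution:
from dataclasses import dataclass

@dataclass(frozen=True)
class NewtonSecond:                 # SI impulse, a distinct type from...
    value: float

@dataclass(frozen=True)
class PoundForceSecond:             # ...imperial impulse
    value: float

def to_newton_seconds(x: PoundForceSecond) -> NewtonSecond:
    return NewtonSecond(x.value * 4.4482216152605)    # the only sanctioned conversion

def total_impulse(a: NewtonSecond, b: NewtonSecond) -> NewtonSecond:
    return NewtonSecond(a.value + b.value)            # mixing units fails type-checking

# Refinement-style types: the constructor carries the proof obligation.
@dataclass(frozen=True)
class Prime:
    n: int
    def __post_init__(self):
        if self.n < 2 or any(self.n % d == 0 for d in range(2, int(self.n ** 0.5) + 1)):
            raise ValueError(f"{self.n} is not prime")

@dataclass(frozen=True)
class EvenPair:
    a: int
    b: int
    def __post_init__(self):
        if self.a % 2 or self.b % 2:
            raise ValueError("both components must be even")

Prime(13), EvenPair(2, 8)           # well-formed
# Prime(15)                         # rejected at construction time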
[Fig. 7.2 diagram. Requirements: Knowledge Representation (Compositionality, Sect. 4.1; Strong Typing, Sect. 4.2; Reflection, Sect. 4.3); Tasks (Declarative Goals and Constraints, Sect. 5.2; Non-stationarity, Sect. 5.1)]
Fig. 7.2 The property of ‘endogenous situatedness’ imbues an agent with knowledge of its own
causal abilities, which includes various proxies for the capabilities of its own reasoning process.
This of course also requires a reflective representation and declarative goals and constraints
As a minimum, the state space includes the actionables and observables of the envi-
ronment and/or the system. As discussed in Chap. 6, it must be possible to explicitly
declare objectives (‘goals’) as delineated regions within the state space. While the
base dimensions of the state space (corresponding to sensors and actuators for a situ-
ated agent) are specified a priori, the representation may also permit the construction
of synthetic dimensions (e.g. to denote hidden variables or abstractions, as described
below), similar to Drescher’s ’synthetic items’ [71]. As discussed under ‘Work on
Command’, the property of reflection obviates the need for sampling of rewards, and
allows for dynamic changes to goal specification, since state space constraints are
available to the agent. A reflective state space is also key for enabling the creation of
new types through abstraction.
As discussed, our pragmatic definition of general intelligence emphasizes the need for
flexibility of response. This requires that an intelligent system avoids the ‘perceive–
act–update’ cycle of traditional RL and GOFAI, in which it is effectively assumed that
the system and the world progress in lockstep. Since system deliberation time will
necessarily increase with environment and task complexity, the lockstep approach
will not scale. As per previous work on time-bounded inference [242, 243] the alter-
native is to perform many simultaneous inferences of smaller scope, each inference
having a WCET (worst-case execution time) that is both small and bounded—hence
‘granular’.
In some of our previous work [242], scheduling based on dynamic priorities
is used to explore a vast number of lines of reasoning in parallel, while retaining
flexibility and responsiveness. As described in more detail in the next Chapter, by
virtue of scheduling over granular inferences, attention is then an emergent process:
the analog of attention-weights [16] are the priorities, which are dynamically updated
as a function of the expected value of inference chains.
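A minimal sketch (ours; GranularScheduler and its methods are hypothetical, and the reference architecture of Chap. 8 is considerably richer) of this style of operation: each job is one small, bounded inference whose priority reflects the expected value of its line of reasoning, and attention emerges from which jobs reach the front of the queue.
import heapq
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(order=True)
class Job:
    neg_priority: float                          # heapq is a min-heap, so negate
    run: Callable[[], List["Job"]] = field(compare=False)

class GranularScheduler:
    def __init__(self):
        self.queue: List[Job] = []

    def post(self, priority: float, inference: Callable[[], List[Job]]):
        heapq.heappush(self.queue, Job(-priority, inference))

    def step(self):
        # run one small, bounded inference; it may post follow-up jobs
        if self.queue:
            job = heapq.heappop(self.queue)
            for follow_up in job.run():          # each carries an updated priority
                heapq.heappush(self.queue, follow_up)

sched = GranularScheduler()
sched.post(0.9, lambda: [Job(-0.5, lambda: [])])     # promising line of reasoning
sched.post(0.2, lambda: [])                          # low-value line waits its turn
for _ in range(3):
    sched.step()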
3 For example, projections from SCL’s internal ‘train of thought’ state trajectory, monitoring of
resource usage, etc., as described in Chap. 8.
Building on the support for strong typing (Prop-1) and reflective state space represen-
tation (Prop-2), SCL makes use of four methods of compound inference: hypothesis
generation, abduction, abstraction, and analogy. All inference steps in SCL can be
considered to be the application of some rule r : A → B, for types A and B. If no
such rule exists, then it is necessary to synthesize it, as described in more detail in
Chap. 10. This synthesis process may involve any combination of the following:
Abstraction
Hypothesis Generation
This is the means by which salient hypotheses are generated. Hypothesis genera-
tion interprets an existing hypothesis to yield a new one intended to have fewer
relational inconsistencies. It is a broader notion than the counterfactual reasoning
conducted using structural causal models (SCM), since rather than merely taking dif-
ferent actions, it considers the overall consistency of alternative models. Informally,
this can be seen as the imposition of semantic/pragmatic constraints on expressions
in a generative grammar. As an example from the domain of natural language, the
famous phrase “Colorless green ideas sleep furiously” is syntactically valid, but not
semantically consistent: neither color nor the ability to sleep are properties typically
associated with ideas. The semantic inconsistency here is immediately obvious to
the human reader, but in general an artificial learning system must use feedback
from the environment to discover any semantic inconsistencies in its interpretation.
Hypothesis generation is described in detail in Sect. 9.2.
Analogical Reasoning
Analogy has been argued to be the “core of cognition” [146]. It can be considered
as a generative mechanism that factors out a common ‘blend’ [84] between two sit-
uations. There is considerable literature on cognitive and computational models of
analogy: in-depth surveys can be found in Genter and Forbus [106] and Prade and
Richard [271]. Analogy is generally considered to be either predictive or propor-
tional. Predictive analogy is concerned with inferring properties of a target object
as a function of its similarity to a source object (e.g. the well-known association
between the orbits of planets in the solar system and electron shells in the Rutherford
model of the atom). Study of proportional analogy extends at least as far back as
Aristotle [52]. A proportional analogy problem, denoted:
A : B :: C : D,
asks for a D that stands in the same relation to C as B does to A.
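A toy sketch (ours; it does not reflect the richer computational models surveyed above) treats the terms as attribute descriptions and transfers the A-to-B difference onto C:
def solve_proportion(a: dict, b: dict, c: dict) -> dict:
    # return a candidate D such that A : B :: C : D, by applying the A-to-B edits to C
    d = dict(c)
    for key in set(a) | set(b):
        if a.get(key) != b.get(key):
            if key in b:
                d[key] = b[key]        # changed or added attribute
            else:
                d.pop(key, None)       # attribute removed between A and B
    return d

king = {"royal": True, "gender": "male"}
queen = {"royal": True, "gender": "female"}
man = {"royal": False, "gender": "male"}
print(solve_proportion(king, queen, man))   # {'royal': False, 'gender': 'female'}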
Abduction
Fig. 7.3 Bidirectional rules. Rules support both induction and abduction; depending on their denota-
tional semantics, their inputs and outputs (marked ‘?’) are ascribed particular meanings. a Induction:
the output can be a prediction or a timeless entailment (e.g., an instance of a subtyping relation).
The inputs may be (counter)factual (e.g., sensory inputs or absence thereof), induced or abducted. b
Abduction: the input can be a goal, an assumption, or a (counter)fact. The outputs can be subgoals,
subassumptions, or timeless premises; they are not necessarily unique. c The choice of outputs is
constrained by an input
an upside-down table to provide support’) to counterfactuals (‘if the table were turned
the other way up …’). In DL and RL, the ‘representation language’ is untyped and
noncompositional, so this kind of direct modification of hypotheses is not possible.
In Sect. 9.4, we describe a compositional mechanism for abduction.
In contrast to the a priori problem formulation required for supervised learning, the
scientific method is an iterative process of problem formulation and solving. Such an
iterative approach performs both supervised and unsupervised learning, the former
corresponding to the meeting of objectives supplied a priori, the latter being the search
for more compelling hypotheses, potentially via new experiments. In this wider
framework, hypotheses have the non-monotonic property of the scientific method
itself, i.e., they are potentially falsifiable by subsequent observation or experiment.
The aspiring robot scientist must therefore decide how to interleave the processes
of observation and hypothesis generation. Prior art on this in the (simply-stated but
open-ended) domain of number sequence extrapolation is Hofstadter et al.’s Seek-
Whence [144], which decides when to take further samples as a function of the
consistency of its hypothesis. In a more general setting, the self-modifying Power-
Play framework searches a combined task and solver space, until it finds a solver
that can solve all previously learned tasks [307, 323]. In more recent work, Lara-
Dammer et al. [190] induce invariants in a ‘molecular dynamics’ microdomain in a
psychologically-credible manner.
In particular, our chosen definition of general intelligence acknowledges that
resources (compute, expected solution time, relevant inputs, etc.) are finitely bounded.
At the topmost level, the corresponding resource-bounded framework for the scien-
tific method is simple: within supplied constraints, devote resources to finding good
hypotheses, balancing the predicted merits of hypothesis refinement against the abil-
ity of the refinement to further distinguish signal from noise. The presence of such
bounds is an intrinsic guard against the kind of ‘pathologically mechanical’ behav-
iors that one might expect from algorithms which do not incorporate finite concerns
about their own operation, as detailed further in the next chapter.
Chapter 8
Architecture
Machine learning excels at inducing mappings from data, but struggles to induce
causal hierarchies. In contrast, symbolic reasoning (in particular, when considered
as an expression language) can represent any form of domain knowledge and can
index into code or data via pattern matching.1 Evidently, reasoning and learning
must be robust to both the variability of inputs and the reliability of prior knowl-
edge, learned or imparted. In that regard, Marcus has argued extensively for neuro-
symbolic hybrids [210, 213, 214], advocating the complementarity of distributed
representations (‘neuro’) and qualitative localist causal knowledge (symbolic); see
also d’Avila Garcez [59]. We explain in this chapter how SCL defines a framework
with equivalent goals: although not explicitly ‘neural’ in operation, the dynamic
attention mechanism can be considered to play an analogous role to that of neural
connection weights, although perhaps a better guiding metaphor than homogeneous
neurons is the ‘structural stigmergy’ [179] of ant colonies [70, 146]. We then proceed
to present a reference system architecture for SCL: its purpose is of course to show
how SCL can be realized in silico and, by design, to provide sufficient guarantees
for its claims (open-ended learning, anytime operation, grounded reasoning, etc.) at
the operational level.
1Of course, heuristic synthesis of symbolic expressions (e.g. as in Genetic Programming [180])
has been practiced for decades, but has never really been considered as ‘mainstream’ ML.
Numerous authors have indeed attempted the integration of distributed and localist
representations for more than twenty years. However, depending on the original
focus of inquiry (reasoning, control theory, ML, or other), ‘integration’ can serve a
rather broad variety of purposes. We now list a few exemplars of these to give some
perspective to the purpose of hybridization in SCL; a broader survey can be found
in Bharadhwaj et al. [23].
The AKIRA hybrid architecture addresses control rather than learning [257] and
is designed around concurrent code fragments. These fragments are organized in a
weighted dynamic network of dependencies, competing for budget-constrained com-
puting power on the basis of their expected outcome (goal satisfaction). Focusing
on learning instead of control, the DUAL/AMBR architecture [177, 178] controlled
symbolic reasoning via spreading activation based on numeric truth values across
a network of causal and structural relations, in a manner reminiscent of Copycat
[146]. Clarion, another hybrid approach [327, 328], layered symbolic reasoning on
top of neural networks, with the aim of enabling top-down and bottom-up learning
processes. These were ultimately based on RL, thus imposing, architecture-wide,
the fundamental limitations described in previous chapters. In pursuance of another
objective, the Sigma architecture attempts to define intelligence in a principled man-
ner [295]: the authors claim ‘grand unification’, and ‘generic cognition’ based on
graphical models. It is conceivable that such a computational substrate might, pending significant further work, become the lingua franca of representation processes able to transcend levels of abstraction. However, the authors
have so far limited themselves to the reimplementation of established concepts such
as episodic memory [297], RL [296], and a ‘standard’ model of cognition based on
the ‘sense–think–act’ loop [298].
More recently, the majority of ongoing research on hybrids is unsurprisingly ML-
centric, attempting to remedy some of the inadequacies of deep neural networks
for reasoning applications. For example, some recurrent neural networks such as
LSTM, although designed for sequential/temporal inference, face difficulties with
learning long-term dependencies, mainly because their memory consists essentially
of compressed (and degradable) input sequences. The Differentiable Neural Com-
puter (DNC) [124] alleviates this problem by coupling the network to an external
symbolic memory (other work uses different types of memory augmentation [7, 166,
260]), but does so at the cost of magnifying other shortcomings of DL: the DNC is
notoriously harder to train than LSTM (sample inefficiency), and its applicability to
arbitrary tasks appears to depend even more than before on the architecture employed
and its initial choice of parameters (brittleness). Several improvements have since
been proposed [55, 92, 248] and have addressed some of the performance issues
but less so the issue of applicability. As of today, the DNC performs reasonably well on latching, copying, associative recall, and simple state machines; the general framework of differentiable computing is preserved, but the capabilities of the DNC remain far removed from the requirements of general reasoning (abstracting,
analogy making, planning, etc.). Other approaches have made the opposite trade-off,
namely that of sacrificing end-to-end differentiability to accommodate more power-
ful reasoning substrates. They proceed by inserting hand-crafted concepts directly
into the RL framework in various forms, e.g. conceptual graphs [152], algorithmic
primitives [138, 193], Drescher-style schemata [71, 164], or ontologies and associ-
ated auto-encoders [105]. In all of these cases, the agent learns a concept’s extension
and operational semantics, thus enabling planning via various forms of probabilistic
inference. However, as recent research shows, hand-crafting of explicit knowledge
structures can be entirely avoided in some cases (so far, mostly problems that can be
addressed by direct visual attention). Notably, the PrediNet neural architecture [311],
when subjected to an appropriate curriculum, is capable of acquiring some represen-
tations with explicit relational and propositional structure. With PrediNet, the neural
substrate is used only for learning and does not commit to any specific form of knowl-
edge exploitation: this can be carried out either symbolically (e.g. using predicate
calculus, temporal logic, etc.) or, more speculatively, via differentiable ‘reasoning
models’ [69, 225, 294].
These attempts at ‘getting the best of symbolic AI and ML’ proceed quite literally,
essentially reimplementing reasoning (and the necessary supporting representations)
in the terms and components of machine learning. We proceed differently. Three
aspects of SCL are of particular relevance to what is expected from the reconciliation
of ML and symbolic AI:
• Strong typing.
• Fine-grained, open-ended, continual, and compositional inference.
• Emergent resource-aware and goal-directed attention.
We claim that these principles combine the strengths of both approaches: whilst
SCL can be provided with prior domain knowledge in any desired form, it is not
subject to the problems which plagued GOFAI, since the ability to reflectively rea-
son at the type level allows the sustained and progressive learning of invariants from
the environment. In SCL, learning and planning do not need to be reconciled. By
design, both learning and planning are side-effects of the same reasoning process
which unfolds in response to the pressure of external goals and constraints over lim-
ited knowledge and resources (e.g. inputs, time, computational power and memory,
energy, physical agency, etc.). The research cited above pursues the objective of end-
to-end representability and actionability of structured knowledge; to this, we add
the (orthogonal) requirement of end-to-end controllability, i.e., self-referential goal-
directed resource allocation. In the SCL framework, the duality between distributed
and local does not concern the representation of functional knowledge (world mod-
els, goals, etc. are already unified), but rather the representation of cognition itself.
For reasons explained below, we use distributed representations for controlling the
reasoning process, itself explicitly represented in the corpus of world knowledge
to describe the state of the ‘system-in-the-world’ (the ‘endogenous situatedness’
of Prop-3 in Sect. 7.3); Wang’s Non-Axiomatic Reasoning System (NARS) [358]
follows a similar approach.
For open-ended learning, inputs may originate from outside the distribution from
which knowledge was initially learned. Yet progress must nonetheless be sustained,
both in terms of continual learning and action. ‘Falling outside the system’s comfort
zone’ is not sufficient reason for invalidating acquired knowledge. This would amount
to ‘stop, erase, and re-learn’, similar to the RL ‘learn-then-deploy’ procedure that,
as we have seen, runs counter to the desired property of anytime operation. In fact, keeping going might just be the right thing to do: if foreign inputs happen to lie in a semantic continuum with prior distributions, extrapolation would be correct and the system's activity would carry on while vindicating its knowledge over increasingly
broad scopes. Of course, in case of a semantic discontinuity, prior knowledge would
produce faulty extrapolations and the system might fail to meet its goals. In that
respect, the system has to perform two activities, continually and on a case-by-case
basis (possibly concurrently). The first consists of extending an initial distribution
with foreign inputs and vindicating the current related knowledge. The second is to
learn new knowledge from a seemingly novel distribution—the system must also
surmise explicit constraints on the relevance of the initial one to guard it against
further unwarranted use. This can be achieved by assessing the degree to which the
learned patterns match the new inputs and propagating the discrepancies across the
possible consequences at higher levels of abstraction in the knowledge hierarchy—for
these are ‘battle hardened’ oracles: by construction, they are broader, more reliable
and, critically, change less frequently than the lower layers of the hierarchy.
SCL accommodates control heuristics for which truth values are not axiomatic
but instead are assessed up to a certain degree (certainty is asymptotic, a tenet we
share with Wang [356]). In this approach, truth values are not static but unfold over time: they are multiplied at each step of inferencing and are also updated whenever new (counter-)evidence becomes known [241]. At the conceptual level, dynamic numeric truth values thus reflect, quantitatively, inference composition. Interpreted, at the operational level, as the reliability of inference chains, truth values make it possible to compute the worthiness of resource expenditure (i.e., for further ramifying inference
chains) with regards to the goals at hand. This forms the substrate from which goal-
directed attention emerges to control the reasoning process, as we will see in the next
section.
2 Feedback loops are manifested both at the macro-scale, coupling system and environment, and at
the micro-scale (i.e. intra-system), coupling inference chains—formally, the composition of lenses,
defined in Sect. 9.4.
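As a concrete illustration of this quantitative reading, the following sketch assumes a simple multiplicative combination of step reliabilities and a frequency-style revision rule; the Haskell names and formulas are ours, not the calculus of [241].

```haskell
-- Hypothetical sketch of dynamic truth values as reliabilities in [0, 1]:
-- multiplied along an inference chain, revised as evidence accumulates, and
-- weighted against goal importance to estimate the worthiness of further inference.
type Reliability = Double

-- Reliability of a chain of inferences: the product of its step reliabilities.
chainReliability :: [Reliability] -> Reliability
chainReliability = product

-- Revise a step's reliability from counts of confirming and disconfirming
-- evidence (a Laplace-smoothed frequency; an assumption, not the book's rule).
revise :: Int -> Int -> Reliability
revise confirming disconfirming =
  (fromIntegral confirming + 1) / (fromIntegral (confirming + disconfirming) + 2)

-- Worthiness of spending resources on ramifying a chain towards a goal.
worthiness :: Reliability -> Double -> Double
worthiness chainRel goalImportance = chainRel * goalImportance
```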
(Fig. 8.1 diagram: the Workspace and the Executive, with 'instantiate', 'read' and 'write' couplings among the user, the World, and the workspace.)
Fig. 8.1 SCL reference system architecture for end-to-end semantic closure. The executive orchestrates a number of asynchronous processes (in black) and provides interfaces to external sensing/actuation data streams (also asynchronous). Scheduling is a resource-aware, goal-directed, emergent process. Rules and states in grey denote axioms. The picture does not distinguish between state modalities—past, present, assumed, predicted, or desired. States in blue pertain to the world, states in red reflect the reasoning process, and states of the executive (memory usage, energy consumption, etc.) are represented in black. The write operations performed by actuators on the workspace carry efference copies; see text for details
deriving such adaptive behavior from the goals at hand, in light of (3) explicit repre-
sentations of the physical and cognitive state of the system.
Figure 8.1 gives an overview of a reference system architecture. The architecture
consists of two main subsystems: a workspace and an executive. The workspace con-
tains learned relational knowledge (implemented by rules) and a representation of
both world and system states—be they actual (past or present), hypothetical (assumed
or predicted), or desired (goals and constraints). Regarding ‘relational knowledge’,
we move away from the vocabulary of causality, which is framed in terms of ele-
mentary changes to the ‘wiring diagram’ of hypothesis chains, as per Pearl’s ‘do
operator’. Instead, in common with the emerging discipline of behavioural control
[365] (see Sect. 11.2.3), we adopt the relational perspective, which is concerned
with the invariants propagated by a relation, as per the notion of contract [245]. This
offers a more prescriptive framework that is therefore better suited to describing
state spaces, particularly those where rules are partially instantiated (e.g. input terms
contain wildcards or constraints)—see Sect. 10.1.
The consistency of the workspace is maintained by the executive which, besides
providing interfaces to sensors and actuators, consists essentially of a scheduler and
multiple instances of a universal interpreter. In broad terms, the architecture can
be considered a massive fine-grained production system [288, 300], in which huge
numbers of rules fire concurrently and asynchronously. Some small fraction of the
workspace rules may be provided a priori while the vast majority are produced and
maintained dynamically. In addition to rules, the workspace contains states—axioms
are also accommodated. Each inferred state has an associated time horizon and an
estimate of its likelihood. States qualify the world, the deliberations of the system ('trains of thought'), and the system itself as embodied in the world (manifested as memory expenditure, performance profile, physical agency and integrity, etc.).
Sensors and actuators constitute the physical interface of the system to the world,
and as such, must be amenable to modeling. For this reason, actuators write efference
copies in the workspace. An efference copy is an explicit state which encodes the
action that was actually executed (by the 'body') in response to the request (by
the ‘mind’) for a desired action: the error between the two allows the system to
model the contingencies of its embodiment, i.e., capabilities, limitations and costs
(including response latency, energy expenditure, wear and tear, etc.) in a feedback
loop generalizing the biologically plausible model of motor control proposed by
Wolpert [136]. Errors are useful for learning 'body models' at the edge ('inside-out'), but not only that: there is an inherent dual (and equally important) way to make sense
of errors. When body models are well established (i.e., vindicated by significant
experience), errors change their meaning: they then signify hidden world states.
To take the example used by Wolpert, assuming a carton of milk is full, one will
exert a rather strong torque on one’s limbs to lift it, only to overshoot the intended
end location in case the carton turns out to be empty. In other words, a misaligned
efference copy defeats an invalid assumption (a case of abduction). The symmetric
feedback loop (i.e. the one pertaining to sensors) is similar, the main distinction being
that the inherent duality of errors is operationally mirrored. Sensing errors are signals
for learning world models at the edge (‘outside-in’) in response to the invalidation of
predictions (a case of deduction). Conversely, when the stationarity of world models
is well established (i.e. the reliability of world models is predicted reliably, vindicated
by significant experience) sensing errors also change their meaning: they then signify
the failure of the sensing apparatus.
Each rule acts as a match-and-transform schema for typed expressions, encoding patterns that denote subregions S1, S2 of a state space S, together with a transfer
function τ : S1 → S2 . The state space is dynamic: additional dimensions may be
synthesized (e.g. via abstraction) at any time by some rules and subsequently also
be available for matching and transformation; least recently significant dimensions
may conversely be deleted. Matching binds values (states, rules or transformation
events) to rules, whereas transforming combines several such bindings to produce
new values (along with a transformation event as a side effect). The allocation of
resources for transforming available bindings is prioritized by the scheduler; the most
promising bindings will receive computational attention first.
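The following fragment is a minimal, hypothetical rendering of this rule schema; the types Expr, Binding, Event and Rule are illustrative stand-ins rather than SCL's actual data structures.

```haskell
-- A rule encodes patterns denoting subregions S1, S2 of the state space,
-- together with a transfer function tau : S1 -> S2.
data Expr                 -- typed SCL expressions (left abstract here)

data Event = Event        -- record of a transformation, emitted as a side effect
  { firedBy  :: Rule
  , produced :: [Expr]
  }

-- Matching binds values (states, rules or transformation events) to a rule.
data Binding = Binding
  { boundRule   :: Rule
  , boundValues :: [Expr]
  }

data Rule = Rule
  { match     :: Expr -> Maybe Binding          -- does the input fall within S1?
  , transform :: [Binding] -> ([Expr], Event)   -- combine bindings to produce new
  }                                             -- values plus a transformation event
```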
Matching and transformation of SCL expressions is performed via instantiations
of a universal interpreter. Paths through the state space are composed via the appli-
cation of interpreter instances and compound SCL inference methods (as introduced
in Chap. 7 and discussed in further detail in Sects. 9.2–9.4) to produce inferences
(deductive, abductive, or analogical) and state-space dimensions. This interpretation
is explicitly compositional and imposes a denotational semantics on expressions.
The unfolding operation of universal interpreter application bears witness to the sys-
tem’s internal state of deliberation: such ‘trains of thought’ are considered first-class
citizens and are duly recorded in the workspace.
The learned ruleset constitutes a fine-grained representation of the entire transition
function of the world model. In contrast, axiomatic rules have a different purpose: to
implement heuristics for rule and type construction and maintenance. For example,
some rules seek to “carve Nature at its joints” [127], by identifying ‘surprising’
world states such as the failure of a prediction or the unpredicted achievement of
a goal. In either case, corrective rule synthesis is applied to the ruleset: this can
be considered to impose learned pragmatics on the default denotational semantics.
Conversely, the absence of surprise vindicates parts of the world model and the
rules that implement them: these rules will be deemed more reliable than others
that see repeated failures. For completeness, other axiomatic rules include seeking
out candidate patterns as a basis for constructing abstractions and analogies, as well
as identifying opportunities for generating improved hypotheses. Rule synthesis is
addressed extensively in Chap. 10.
Computing resources are inevitably limited and a system must allocate these
wisely to cater to arbitrary influxes of inputs and goals. The reference architecture
decouples matching from transformation and treats them accordingly as two dis-
tinct classes of processes. Matching processes produce requests for transformation,
termed ‘jobs’, each of these being associated with a quantitative estimate of its utility
with respect to all goals at hand, system-wide. This estimate is based on three main
factors: (1) the likelihood of the matched value, which is a function of the reliability
of the (chain of) rules that produced it; (2) the reliability of the matching rule; and
(3) the relative importance of the goals that may be served (or opposed) by the result
of the job. This is revised continually as premises, rules, and goals vary not only
in quality (the geometry of the subregions they occupy in the state space) but also
quantitatively as per their reliability and relevance. Estimates of utility are primarily used for prioritizing jobs for execution, i.e., ultimately, to schedule the execution of transformations competing for resources.
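A minimal sketch of this estimate follows; the weighted-product combination and the Haskell names are assumptions made for illustration, since the text does not fix a formula.

```haskell
import Data.List (sortOn)

data Job = Job
  { valueLikelihood :: Double  -- (1) likelihood of the matched value
  , ruleReliability :: Double  -- (2) reliability of the matching rule
  , goalImportance  :: Double  -- (3) net importance of the goals served or opposed
  }

-- One simple way to combine the three factors into a single utility estimate.
utility :: Job -> Double
utility j = valueLikelihood j * ruleReliability j * goalImportance j

-- The scheduler then merely keeps jobs ordered by continually re-estimated utility.
prioritise :: [Job] -> [Job]
prioritise = sortOn (negate . utility)
```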
The role of the scheduler in the SCL architecture is merely to re-order jobs contin-
ually according to their priorities. For example, previously unattended jobs may jump
ahead of new ones as their prospective benefits become apparent, whereas hopelessly
unpromising jobs are eventually deleted. This particular scheduling design confers
three essential benefits. First of all, it avoids the scalability issues inherent in stan-
dard job-scheduling theory. The scheduler does not perform explicit allocation but
instead slows down relatively less beneficial inference chains, exploiting the fine
granularity thereof to its full extent. In the standard sense of the term, this is hardly a scheduler at all: SCL scheduling is indeed more a distributed emergent pro-
cess than it is an algorithm. Second, this particular design enforces a fundamental
property of the system architecture, namely endogenous situatedness. It achieves this
by imposing a semantically-grounded control over the execution of the expression
transformers: control is based on up-to-date predictions of functional costs and bene-
fits, and balances the cognitive load for the best use of time (deadlines) and available
(possibly varying) computing power. Last but not least, semantically-grounded con-
Chapter 9
A Compositional Framework
A couple in love walking along the banks of the Seine are, in real
fact, a couple in love walking along the banks of the Seine, not
mere particles in motion.
The system architecture presented in Chap. 8 controls (i.e., sustains and constrains)
the invocation of the inference methods introduced in Chap. 7. In this chapter, we
describe the methods of higher level inference in more detail. There is increasing
conviction that methods of category theory are well-suited for providing generic
descriptions for cognitive science, general intelligence [258], and control [42], the
latter being newly christened as ‘categorical cybernetics’. In this chapter, we describe
how to leverage the power of selected category-theoretic constructions to realize SCL
operations in a compositional manner.
Although approaches such as those in linguistics [361] and process algebra [224] might be
better known, the semantic approach via category theory has become increasingly
popular in diverse fields due to its flexibility and mathematical elegance [89].
A category is a two-sorted algebraic structure rather like a graph, consisting of
‘objects’ and ‘morphisms’: every morphism has a source object and a target object,
written f : X → Y. In addition to the graph-like structure there is a way to compose morphisms: given any morphisms f : X → Y and g : Y → Z, with the target of one agreeing with the source of the other, there is a composite morphism f g : X → Z (typically written g ◦ f). This must satisfy the 'associativity property' familiar from algebraic structures such as groups, i.e., given three composable morphisms f, g, h, the two different ways of composing them must agree: (f g)h = f (gh). Every object must also have an assigned 'identity morphism' 1X : X → X, which acts as the identity element for composition: 1X f = f = f 1Y for all f : X → Y.
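This definition can be transcribed almost verbatim as a Haskell type class (it mirrors Control.Category from the base library); the transcription is illustrative only.

```haskell
import Prelude hiding (id, (.))

class Category cat where
  id  :: cat a a                        -- identity morphism 1_X
  (.) :: cat b c -> cat a b -> cat a c  -- composition g . f (written f g in the text)

-- Laws (not checked by the compiler):
--   (h . g) . f == h . (g . f),   id . f == f,   f . id == f

-- Functions form a category: the analogue of the category Set below.
newtype Fn a b = Fn (a -> b)

instance Category Fn where
  id          = Fn (\x -> x)
  Fn g . Fn f = Fn (\x -> g (f x))
```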
There are many examples of categories, and we will name only a handful:
1. The category Set of sets, whose objects are sets and morphisms are functions.
2. The category FinVec of finite-dimensional vector spaces, whose objects are finite
dimensional vector spaces and morphisms are linear maps.
3. The category Rel of relations, whose objects are sets and morphisms are binary
relations.
4. For any graph G, there is the ‘free category on G’, whose objects are nodes of
G and morphisms are paths in G. Composition is concatenation of paths, and
identity morphisms are paths of length zero.
5. Any monoid M can be viewed as a category with a single object ∗, where every
element m ∈ M is viewed as a morphism m : ∗ → ∗.
Of particular relevance is the fact that typed programming languages (e.g. the
simply-typed lambda-calculus [47] or intuitionistic type theory [217]) give rise to
corresponding categories: it is possible1 to consider a category with types as objects
and functions as morphisms [54].
When studying compositionality, it is typical to work in the context of ‘monoidal
categories’, which have additional structure: there is a monoid-like structure on
objects, with a binary operation ⊗ and a unit I , in a way that is also compatible
with morphisms, so if f : X 1 → Y1 and g : X 2 → Y2 are morphisms, then so is
f ⊗ g : X 1 ⊗ X 2 → Y1 ⊗ Y2 , satisfying various laws. A category typically has sev-
eral different monoidal structures. For example, in the category of sets we could take
⊗ to be Cartesian product of sets (whose unit is a 1-element set) or disjoint union of
sets (whose unit is the empty set). In the category of finite-dimensional vector spaces
we could take ⊗ to be the direct product (whose unit is the 0-dimensional space) or
the tensor product (whose unit is the 1-dimensional space).
Morphisms in a monoidal category are often represented using the graphical nota-
tion of string diagrams [216]. For example, if we have morphisms f : X 1 → Y1 ,
g : X 2 → Y2 and h : Y1 ⊗ Y2 → Z , then the composite morphism ( f ⊗ g)h : X 1 ⊗
X 2 → Z is represented by the diagram:
(String diagram: inputs X1 and X2 enter f and g respectively; their outputs Y1 and Y2 both enter h, which outputs Z.)
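Read in the category of sets, with ⊗ taken to be the Cartesian product, the diagram is simply the following function; the sketch below models X1 ⊗ X2 as a Haskell pair.

```haskell
-- (f tensor g) followed by h: given f : X1 -> Y1, g : X2 -> Y2 and h : (Y1, Y2) -> Z,
-- feed the two inputs through f and g in parallel and combine the results with h.
composeFGH :: (x1 -> y1) -> (x2 -> y2) -> ((y1, y2) -> z) -> (x1, x2) -> z
composeFGH f g h (a, b) = h (f a, g b)
```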
The other basic concept required is a ‘functor’, which is a structure-preserving
map between categories. If C and D are categories then a functor F : C → D is an
assignment sending every object X in C to an object F(X ) in D, and every morphism
f : X → Y in C to a morphism F( f ) : F(X ) → F(Y ) in D, in a way that preserves
identities and composition. If our categories are monoidal then we consider ‘monoidal
functors’, which also preserve the monoidal structure.
One of the most important requirements for general intelligence is that the hypotheses
which are generated are salient, i.e., pertinent to the task at hand. In the case of the
simple pendulum, the closed-form expression was obtained by virtue of the observation
that e.g. the color of the pendulum bob was not relevant, but the angle of displacement
was, and so on. However, this does not imply that all features of the pendulum were
given equal attention. Humans have sufficiently strong priors about force and motion
that it is hard to imagine an experimenter ever consciously entertaining color as a
factor. It is therefore evident that scientific hypothesis generation enjoys a degree of
subtlety which is absent from traditional ML approaches.
Previous such work on non-quantitative generation of alternative hypotheses can
be found in Mitchell and Hofstadter’s ‘Copycat’ [146]. Copycat is proposed as a
cognitively-plausible generator of proportional analogies between letter-strings and
operates without requiring a ‘top-down, a priori’ objective function. In the abstract,
Copycat can be considered as an interpreter for expressions that describe (poten-
tially partially constructed) analogies, in which top-down and bottom-up perspec-
tives interact. At any point in construction, structures can be interpreted to yield
what Hofstadter describes as their ‘counterfactual halo’, i.e., to suggest alternative
expressions that tend to be more self-consistent. Copycat avoids explicit combina-
torial search via a combination of attention heuristics (which share a great deal of
commonality with the scheduling mechanism described in Sect. 8.2) and interacting
hierarchical constraints. Salient actions are indexed ‘on demand’ by local updates to
hypotheses and no form of global truth maintenance is required. These local updates
act to greatly prune the space of possible alternatives, being biased by design in the
general direction of ‘more consistent, hence more compelling’ hierarchies.
More generally, a number of previous works in cognitive architectures argue that
the frame problem is an artifact of an overly rigid perspective on hypothesis repre-
sentation [97, 146]. Specifically, the claim is that hypotheses should be both causally
grounded (via participation in a sensor-effector mapping that receives feedback from
the environment) and ‘reified on demand’ via the data-driven context of task features.
It is claimed that the arguments of a priori representationalism that lead to the frame
problem are then no longer applicable. Such context-specific hypothesis-chaining is
an intrinsic aspect of SCL: salience is facilitated via the joint action of fine-grained
scheduling and resource bounds on both time and space [325]. In the following
section, we describe a general mechanism for interpreting the structure of hypothe-
ses, in which alternative hypotheses are generated via a specific form of interpretation
which yields a modified hypothesis as a result.
Compositional Interpretation
2Strictly, the expressiveness is more general than primitive recursive in that it includes e.g. Acker-
mann’s function [154].
Fig. 9.1 h as a homomorphism: the square formed by F h : F A → F B, f : F A → A, g : F B → B and h : A → B commutes, i.e. h ◦ f = g ◦ F h
Fig. 9.2 Cata f as a unique homomorphism: the square formed by in : F (μF) → μF, F (Cata f) : F (μF) → F A, f : F A → A and Cata f : μF → A commutes, i.e. Cata f ◦ in = f ◦ F (Cata f)
Technically, the universality property arises as follows: for a category C and functor
F : C → C, an algebra3 consists of a pair (A, f ) consisting of an object A and a
morphism f : F A → A. A homomorphism h : (A, f ) → (B, g) between algebras
is a morphism h : A → B such that the square in Fig. 9.1 commutes.
In the category that has algebras as objects and homomorphisms as morphisms, an initial algebra is one from which there is a unique homomorphism to any other algebra in the category; such an algebra is itself unique up to isomorphism. We write (μF, in) for
an initial algebra, and Cata f for the unique homomorphism h : (μF, in) → (A, f )
from the initial algebra to any other algebra (A, f ). That is, Cata f is defined as the
unique arrow that makes the diagram of Fig. 9.2 commute.
The universal interpreter property of Cata then arises by virtue of this initiality
[154]. Cata is an abbreviation of 'catamorphism' (from Greek: κατά 'downwards' and μορφή 'shape'); informally, a generic method of transforming some typed source
expression into a target expression (of whatever desired type). Hence, in a category
of expressions Ex in which the objects are types and the morphisms are functions,
Cata thus provides an algorithm template for the interpretation of expressions: predicates on Ex are represented as transformations of a source expression with target type bool, and alternative hypotheses as transformations with target type Ex.
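The standard Haskell rendering of initial algebras and catamorphisms makes this concrete; the toy expression functor below is ours and is only meant to illustrate the two target types (bool and Ex) mentioned above.

```haskell
{-# LANGUAGE DeriveFunctor #-}

newtype Fix f = In (f (Fix f))            -- mu F, with in : F(mu F) -> mu F

-- cata f is the unique homomorphism out of the initial algebra.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (In x) = alg (fmap (cata alg) x)

-- A toy expression functor standing in for SCL's expression language Ex.
data ExprF r = Lit Int | Add r r deriving Functor
type Expr = Fix ExprF

-- A predicate on expressions: interpretation with target type Bool.
mentionsZero :: Expr -> Bool
mentionsZero = cata alg where
  alg (Lit n)   = n == 0
  alg (Add l r) = l || r

-- An alternative expression: interpretation with target type Expr.
simplify :: Expr -> Expr
simplify = cata alg where
  alg (Add (In (Lit 0)) e) = e            -- rewrite 0 + e  ~>  e
  alg e                    = In e
```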
As previously mentioned, in the wider context of general intelligence, such a
‘Closed World Assumption’ is insufficient: reality may always intrude to defeat prior
assumptions.4 Such exceptions arise as a function of the difference between expected
and observed states—whether because an action fails to yield the anticipated state, or
else because some state of interest arises in an unanticipated manner. In the context
of SCL, then, the term hypothesis recovers its traditional meaning as a tentative
proposition about reality that remains forever potentially subject to revision. When
Consensus on the value of abstraction in AI dates back to the inception of the field
[220]. Various forms of analogical reasoning have similarly been widely argued, by
cognitive and computer scientists alike [84, 144], to play a vital role. There are a wide
variety of proposed definitions for both. For example, Gentner [107] defines abstrac-
tion as ‘the process of decreasing the specificity of a concept’. Cremonini et al. [53]
define abstraction as ‘the process of mapping between problem representations, so
as to simplify reasoning while preserving the essence of the problem’. Definitions
of abstraction and analogy can overlap considerably. For example, the formal and
general abstraction framework of Giunchiglia and Walsh [111] describes abstraction
in terms of properties that are provably preserved under the source to target map-
ping. This definition could also be said to be applicable to predictive analogy [271],
which is concerned with inferring properties of a target object as a function of its
similarity to a source object, the oft-cited example of which is the similarity between
the solar system and the Rutherford model of the atom. Given the perceived richness
and complexity of abstraction and analogy, this overlap is unsurprising. Indeed, it
seems possible that the processes are recursively interleaved in a nontrivial and data-
driven manner. Hence, whilst in this Section we propose a concrete mechanism for
abstraction that can then be used as a basis for analogy, this should be considered as
a pedagogical device rather than any attempt at a definitive statement.
The interpretation of SCL expressions via catamorphisms already provides an ele-
mentary abstraction mechanism: the algorithm skeleton for the interpreter abstracts
over the generation of alternative types and values from a given expression. However,
Fig. 9.3 Anti-unification as a categorical product. The anti-unifier of expressions e1 and e2 is an expression u, together with two substitutions σ1 and σ2. For s →σ t, we say that t is a specialization of s, i.e. it has been obtained from s via the instantiation of one or more variables. The least general anti-unifier (u, σ1, σ2) is the unique u such that, for any other candidate (u′, σ′1, σ′2), u is a specialization of u′ via some substitution σ
since expressions in SCL are first-class objects, we may also perform abstraction via
other means, such as anti-unification. Anti-unification has a variety of uses in program
analysis, including invariant generation and clone detection [39]. The various forms
of unification [266] can be described categorically [119] via a category in which
the objects are terms (i.e. SCL expressions) and the morphisms are substitutions,
i.e., the binding of one or more variables to subexpressions. Anti-unification is the
categorical dual of unification [162], representing the factorization of substructure
common to two expressions. The discovery of such ‘abstract patterns’ is analogous
to the induction of subroutines, which can be instantiated across a range of parameter
values. More generally, abstraction is applicable across different dimensions of the
state space—see Sect. 11.2.2 for a discussion of wider prospects in this regard.
Figure 9.3 depicts anti-unification as a categorical product, a construction that generalizes the familiar notion of Cartesian product. The diagram denotes that the anti-unifier (u, σ1, σ2) is more specialized than all other candidates (u′, σ′1, σ′2), because the former can be recovered from the latter via σ.
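A first-order anti-unification procedure is short enough to sketch directly; the Term type and variable-naming scheme below are illustrative, not the SCL expression language.

```haskell
import qualified Data.Map as M
import Control.Monad (zipWithM)
import Control.Monad.State

data Term = Var String | App String [Term]   -- e.g. App "f" [Var "x", App "c" []]
  deriving (Eq, Ord, Show)

type Subst = M.Map String Term               -- substitution: variable |-> subexpression

-- antiUnify t1 t2 = (u, s1, s2), where applying s1 to u yields t1 and s2 yields t2;
-- equal disagreement pairs share a variable, so u is the least general generalization.
antiUnify :: Term -> Term -> (Term, Subst, Subst)
antiUnify t1 t2 = (u, s1, s2)
  where
    (u, (_, _, s1, s2)) =
      runState (go t1 t2) (0 :: Int, M.empty, M.empty, M.empty)
    go a b | a == b = return a               -- identical subterms are kept as-is
    go (App f as) (App g bs)
      | f == g && length as == length bs = App f <$> zipWithM go as bs
    go a b = do                              -- disagreement: factor out a variable
      (n, seen, sl, sr) <- get
      case M.lookup (a, b) seen of
        Just v  -> return (Var v)
        Nothing -> do
          let v = "X" ++ show n
          put (n + 1, M.insert (a, b) v seen, M.insert v a sl, M.insert v b sr)
          return (Var v)
```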
Analogical reasoning in humans appears to afford unbounded creative potential
[95]. We share the belief that analogy is a dominant mechanism in human cognition
[148, 228] and envisage that computational models of analogy will be a key research
area for general intelligence. It should be clear that such research is completely
open-ended. The categorical approach we describe below is therefore for pedagog-
ical purposes; the further incorporation of heuristics is a more realistic prospect for
practical use.
We give a categorical construction for proportional analogies which builds upon
the method of abstraction defined above.
As previously described in Sect. 7.4, the example application domain we con-
sider here is that of letter-string analogy (LSA) problems (e.g., abc : abd :: ijk
: ???). Although the domain may appear simple, it has been remarked that it can
require considerable sophistication to obtain solutions that appear credible to humans
Fig. 9.4 Proportional analogy as a commutative diagram: abc : abd :: ijk : ???. The horizontal morphisms h relate abc to abd and ijk to the unknown string; the vertical morphisms v relate abc to ijk and abd to the unknown string
[95], not least because the domain is not readily modeled as an optimization problem.
It is therefore reasonable to assume that mechanisms for solving LSA problems have
high relevance and are applicable for implementing cognitive processes at many lev-
els. Notable approaches to LSA problems include Hofstadter and Mitchell’s Copycat
[146] and the ‘E-generalization’ approach of Weller and Schmid [360], although the
latter is not cognitively plausible for reasons of scale.
As can be seen in Fig. 9.4, proportional analogy problems can also be consid-
ered to form a commutative diagram. The ‘abstraction via anti-unification’ approach
described above can be used as a building block for constructing analogies, for exam-
ple as is done in ‘Heuristic Driven Theory Projection’ [308]. In particular, abstraction
can be combined with the powerful category theoretic constructions of pushouts and
pullbacks to construct, express, and understand such analogies in a computation-
ally automatable manner across a wide range of expression languages. Specifically,
we can use these constructions to determine the possible relationships between our
objects A = abc, B = abd and C = ijk such that D = ijl is uniquely deter-
mined through commutative diagrams.5
Pushouts
5 The possibility D = ijd is not treated here; a more fully-featured approach to analogy would
include nondeterminism and preference heuristics, e.g. as in Goguen’s ‘sesqui-categories’ [117].
(Commutative square illustrating a pushout: h : A → B and v : A → C, with A = {a, b, c}, B = {1(a), 2(b), 3(c), 4}, C = {W, X(a), Y(b), Z(c)}, and D = {W, 1X(a), 2Y(b), 3Z(c), 4} completing the square.)
Pullbacks
We now demonstrate that these concepts give mechanisms for automatic derivation
of analogies according to abstractions. For a more complete description of pushouts,
pullbacks, and pushout complements in this context, see Taentzer et al. [76]. In
Fig. 9.5, we give a sketch of a pushout-based solution to classic letter-string analogy
problems. Letter-strings are represented as lists of natural numbers, with natural
numbers themselves represented via Peano arithmetic.
Fig. 9.5 A solution to the letter-string analogy question abc : abd :: ijk : ??? via pushouts. Each letter string is represented by its corresponding tree in Peano arithmetic, with the letter 'a' corresponding to the number '0'. The shorthand 'Sx' is used to concisely represent a sequence of x consecutive successor nodes. Elements in red are 'deleted' by the transformation, whereas elements in blue are 'created'. The middle rule can be viewed as an abstract rule for transforming letter-strings, and interpreted in language as 'increment the final (non-a) character in the letter-string'
We are working in the category of labeled graphs (an 'adhesive category' [185]), along injective morphisms which
are both label- and structure-preserving. Note that in this scenario, relabeling may
be achieved in rewriting systems over labeled objects through the use of partially
labeled objects as intermediaries [126].
The example works as follows. We are provided with the expression trees of
the letter strings abc, abd, and ijk. We know that abc relates to abd in some
manner and wish to induce the equivalent relation for ijk and some unknown letter
string. Thus the task may be phrased “abc is to abd as ijk is to what?”. The
first step is to induce common substructures between the pairs abc and abd as
well as abc and ijk. Two particularly promising substructures are shown in the
top middle and left middle of the diagram. This first step is most critical; the two
common substructures will deterministically define the rest of the transformation.
If either common substructure is too specific, it may form too rigid a restriction for
9.4 Abduction
There has recently been much interest in the applied category theory community in
‘lenses’, which provide a simple but powerful abstraction for hierarchical feedback.
Originally appearing in database theory and functional programming for describing
deeply nested destructive updates [91], they have since turned out to be of central
interest in categorical approaches to both game theory [108] and machine learning
[88, 90]. One perspective on lenses is that they are a general theory of ‘things that
compose by a chain rule’.
Lenses
Lenses are the morphisms of a category Lens whose objects are pairs of sets, where
we think of the first as ‘forwards’ and the second as ‘backwards’. A lens
λ : (X+, X−) → (Y+, Y−)
consists of a pair of functions:
λ+ : X+ → Y+
λ− : X+ × Y− → X−
The category of lenses has a monoidal structure, which is given on objects by:
(X+, X−) ⊗ (Y+, Y−) = (X+ × Y+, X− × Y−)
While this definition is written in terms of sets and functions, it turns out that lenses
can be more generally defined over (essentially) any monoidal category, although
the correct general definition is not obvious [291].
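In Haskell this definition, together with sequential and parallel composition, is a few lines; the sketch below works over sets and functions, as above, and is not SCL's implementation.

```haskell
-- A lens (X+, X-) -> (Y+, Y-): a forwards map and a backwards map.
data Lens x dx y dy = Lens
  { fwd :: x -> y         -- lambda+ : X+ -> Y+
  , bwd :: x -> dy -> dx  -- lambda- : X+ x Y- -> X-
  }

-- Sequential composition: the backwards pass threads the forwards value of the
-- first lens into the backwards map of the second ('things that compose by a
-- chain rule').
compose :: Lens x dx y dy -> Lens y dy z dz -> Lens x dx z dz
compose l m = Lens
  { fwd = fwd m . fwd l
  , bwd = \x dz -> bwd l x (bwd m (fwd l x) dz)
  }

-- Monoidal product, given on objects by (X+ x Y+, X- x Y-).
tensor :: Lens x dx y dy -> Lens x' dx' y' dy'
       -> Lens (x, x') (dx, dx') (y, y') (dy, dy')
tensor l m = Lens
  { fwd = \(x, x') -> (fwd l x, fwd m x')
  , bwd = \(x, x') (dy, dy') -> (bwd l x dy, bwd m x' dy')
  }
```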
Backpropagation. There is a functor D from the category of Euclidean spaces and smooth maps to Lens, sending a smooth function f to the lens ( f, (x, v) → J f (x)⊤ v), where J f is the Jacobian of f. In order for this to be a functor it must be the case that the derivative of a composite function f g can be determined from the derivatives of f and g using the lens composition law. This turns out to be essentially the chain rule: given composable smooth functions f, g, we have
(D( f )D(g))− (x, v) = J f (x)⊤ (Jg ( f (x))⊤ v) = [Jg ( f (x)) J f (x)]⊤ v = J f g (x)⊤ v (law of matrix transpose)
From this it follows that D( f )D(g) = D( f g), i.e., that D is a functor. This connec-
tion between lenses and the chain rule was explicitly observed by Zucker [376] and
is also implicit in Fong et al. [90].
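Assuming the Lens type sketched above, the one-dimensional case is easy to write down (in one dimension the transposed Jacobian is just multiplication by the derivative); the example functions are ours.

```haskell
-- D(f) for f(x) = x^2: forwards evaluates f, backwards multiplies by f'(x) = 2x.
squareL :: Lens Double Double Double Double
squareL = Lens { fwd = \x -> x * x, bwd = \x v -> 2 * x * v }

-- D(sin): backwards multiplies by cos x.
sinL :: Lens Double Double Double Double
sinL = Lens { fwd = sin, bwd = \x v -> cos x * v }

-- D(square) composed with D(sin) is D(x |-> sin (x*x)); its backwards map
-- computes the chain rule: bwd chained x v == cos (x*x) * (2 * x * v).
chained :: Lens Double Double Double Double
chained = compose squareL sinL
```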
Variational inference. There is a category whose morphisms X → Y are probabil-
ity distributions on Y conditional on X . There are several different ways to make
this precise, for example to say that X, Y are measurable spaces and the condi-
tional distribution is modeled as a measurable function X → G(Y ), where G(Y ) is
the measurable space of all probability measures on Y [98, 110]. Equivalently, one
might say that objects are finite sets and morphisms are stochastic matrices. Com-
position of morphisms is by ‘integrating out’ the middle variable (sometimes called
the Chapman-Kolmogorov equation [250]), which is simply matrix multiplication
in the finite case. Call this category Stoch. There is a morphism Stoch → Lens that
pairs a conditional distribution with the function that performs Bayesian inversion
on it, namely f → ( f, f # ) where f # : G(X ) × Y → G(X ) returns the posterior dis-
tribution f # (π, y) given a prior π and an observation y. Bayesian inversion satisfies
a ‘chain rule’ with respect to composition, meaning that the Bayesian inverse of a
composite conditional distribution can be computed in terms of the Bayesian inverses
of the components, and this fact precisely says that Stoch → Lens is a functor [318].
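A finite, discrete sketch of this construction can be given over the Lens type above; representing distributions as weighted lists (and normalising naively) is an assumption made purely for illustration.

```haskell
type Dist a = [(a, Double)]   -- finitely-supported (unnormalised) distributions

-- Forwards: push a prior through the conditional distribution ('integrating out').
pushforward :: (x -> Dist y) -> Dist x -> Dist y
pushforward f prior = [ (y, px * py) | (x, px) <- prior, (y, py) <- f x ]

-- Backwards: Bayesian inversion f# : G(X) x Y -> G(X).
posterior :: Eq y => (x -> Dist y) -> Dist x -> y -> Dist x
posterior f prior obs = normalise [ (x, px * likelihood x) | (x, px) <- prior ]
  where
    likelihood x = sum [ p | (y, p) <- f x, y == obs ]
    normalise d  = let z = sum (map snd d) in [ (x, p / z) | (x, p) <- d ]

-- The conditional distribution paired with its Bayesian inversion, as a lens.
bayesLens :: Eq y => (x -> Dist y) -> Lens (Dist x) (Dist x) (Dist y) y
bayesLens f = Lens { fwd = pushforward f, bwd = posterior f }
```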
Dynamic programming. Consider a Markov chain with state space S, action space
A, and (stochastic) transition function P : S × A → S. Suppose further that actions
are controlled by an agent, who obtains utility U : S × A → R on each transition. For
each policy π : S → A we obtain a function f : S → S given by f (s) = P(s, π(s)),
and a function f # : S × R → R given by f # (s, c) = U (s, π(s)) + γ c, where 0 <
γ < 1 is a fixed discount factor. The second input to f # is known as the continuation
payoff. These two functions constitute a lens λπ : (S, R) → (S, R), indexed by the
policy. On the other hand, a lens V : (S, R) → (1, 1) turns out to be just a function
V : S → R, which we take to be the value function. If V is an initial value function
and π is the appropriately optimal policy for it, the lens composition λπ V : (S, R) →
(1, 1) performs a single stage of value function iteration. Thus value function iteration
amounts to approximating the limit:
· · · −→ (S, R) −−λπ2−→ (S, R) −−λπ1−→ (S, R) −−V0−→ (1, 1)
where each πi is the optimal policy for the current value function at each stage.6
6 This connection between dynamic programming and lenses is due to Viktor Winschel.
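A deterministic rendering of this construction fits in a few lines of the same Lens sketch; the text allows a stochastic transition, but here it is taken to be a function for brevity, and the Haskell names are ours.

```haskell
-- The policy lens lambda_pi: forwards advances the state, backwards adds the
-- immediate utility to the discounted continuation payoff c.
policyLens :: (s -> a)            -- policy pi
           -> (s -> a -> s)       -- (deterministic) transition P
           -> (s -> a -> Double)  -- utility U
           -> Double              -- discount factor gamma
           -> Lens s Double s Double
policyLens pol p u gamma = Lens
  { fwd = \s -> p s (pol s)
  , bwd = \s c -> u s (pol s) + gamma * c
  }

-- A value function V : S -> R, viewed as a lens (S, R) -> (1, 1).
valueLens :: (s -> Double) -> Lens s Double () ()
valueLens v = Lens { fwd = const (), bwd = \s () -> v s }

-- One stage of value iteration: the backwards map of lambda_pi composed with V,
-- i.e. s |-> U(s, pi(s)) + gamma * V(P(s, pi(s))).
valueStep :: (s -> a) -> (s -> a -> s) -> (s -> a -> Double) -> Double
          -> (s -> Double) -> s -> Double
valueStep pol p u gamma v s =
  bwd (compose (policyLens pol p u gamma) (valueLens v)) s ()
```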
All three examples of the ‘lens pattern’ we have described above for machine learning
are notably ‘low-level’ and numerical. However, now that the common pattern has
been identified, it is possible in principle to design systems which are structurally the
same but which are semantically ‘higher-level’. This allows the best of both worlds:
logical languages embodying GOFAI principles such as abduction and generaliza-
tion can be combined with the hierarchical feedback which has been enormously
successful in numerical and connectionist approaches. One option is to construct a
monoidal functor C → Lens where C is a suitable category for higher-level reason-
ing; another is to build additional structure into Lens itself using its more general
definition.
The specific approach proposed here is to construct lenses in which the forwards
map performs deductive reasoning, and the backwards map performs abductive rea-
soning. The idea is that the forwards map λ+ will, given a hypothesis x, generate a
deductive conclusion λ+ (x), while the backwards map will, given an initial hypoth-
esis x and an observation y, abductively generate an updated hypothesis λ− (x, y) in
order to explain the observation in a way that is in some sense ‘as close as possible’
to the starting hypothesis.
Suppose now that from an initial hypothesis x we make a 2-step deduction
μ+ (λ+ (x)). If we then observe z, we can perform a 2-step abduction using the lens
composition law to determine a new hypothesis. First, using the deduced hypothesis
λ+ (x) and the observation z, we use μ− to abductively determine the new ‘middle’
hypothesis μ− (λ+ (x), z). We then treat this as though it is an observation, which
together with the initial hypothesis x abductively determines the final result:
(λμ)−(x, z) = λ−(x, μ−(λ+(x), z))
Since we will only be describing the robot’s movement with pseudocode it suffices
to informally describe the map. The factory floor is divided into two zones, with a path
running through both. In each zone there is a goal adjacent to the path, representing
a place where the robot can pick up or deliver objects. An abstracted description of
the robot’s position is given by elements of Zone × Chunk, where
There is a function λ+pos : R → Zone × Chunk that abstracts the robot's position.
Going the other way, a command to move to a certain state at the more abstract level
can be ‘translated down’ into a lower-level command to move in the space R. For
this we also need to know the current position in the concrete space R, making the
type of the backwards function:
λ−pos : R × (Zone × Chunk) → R
λ−pos need not move the robot directly to a position that satisfies the goal, that is, the equation λ+pos(λ−pos(x, g)) = g (known as the 'put-get law' of lenses) need not always hold. Rather, λ−pos can direct the robot through a series of 'waypoints' by treating
the position variable as a state variable. What should be guaranteed is that holding
the goal fixed and iterating λ−pos(−, g) : R → R from any starting position will after finitely many steps reach a position x ∈ R satisfying λ+pos(x) = g (provided the goal is physically reachable for the robot). For example, our λ−pos(x, g) could be given by
the following pseudocode:
• If the current position satisfies the goal (λ+pos(x) = g), then do nothing (λ−pos(x, g) = x).
• Otherwise, if the current position is within a fixed short distance of the goal, then
move onto the center of the goal.
• Otherwise, if the robot is on the path, move along the path towards the goal.
• Otherwise, move directly onto the path.
Thus if we iterate λ−pos(−, (zone1, goal)) from a typical starting position then the robot will first move onto the path, then move along the path towards zone 1, and then move onto the goal. Together the functions λ+pos and λ−pos constitute a lens:
λpos : R → Zone × Chunk
We will now demonstrate how this lens can be a part of a hierarchy in which the next
level is task-centric. Suppose the robot can carry an object, from the set Object = {widget, gizmo, nothing}, and sense its carry weight. A widget weighs 2, a gizmo weighs 7 and no object
weighs 0, defining a function W : Object → [0, ∞).
We define a lens:
λwt : [0, ∞) → Object
where
λ+wt : [0, ∞) → Object
classifies any weight less than 1 as nothing, any weight between 1 and 5 as a
widget and any weight greater than 5 as a gizmo. The backward function:
λ−wt : [0, ∞) × Object → [0, ∞)
ignores the current weight, and takes the desired object to its resulting desired weight,
namely, λ−wt(w, o) = W(o).
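Transcribed with the Lens sketch of the previous subsection (the Haskell names are ours, the thresholds follow the text):

```haskell
data Object = Widget | Gizmo | NoObject deriving (Eq, Show)

weightOf :: Object -> Double       -- the function W
weightOf Widget   = 2
weightOf Gizmo    = 7
weightOf NoObject = 0

weightLens :: Lens Double Double Object Object   -- lambda_wt
weightLens = Lens
  { fwd = \w -> if w < 1 then NoObject else if w <= 5 then Widget else Gizmo
  , bwd = \_w desired -> weightOf desired         -- ignore the current weight
  }
```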
Recall that the monoidal product of two lenses λ1 : X1 → Y1 and λ2 : X2 → Y2 is the lens
λ1 ⊗ λ2 : X1 × X2 → Y1 × Y2
given by applying λ1 and λ2 componentwise: (λ1 ⊗ λ2)+(x1, x2) = (λ1+(x1), λ2+(x2)) and (λ1 ⊗ λ2)−((x1, x2), (y1, y2)) = (λ1−(x1, y1), λ2−(x2, y2)).
We place on top a task-level lens λtask : (Zone × Chunk) × Object → Task, where
Task = {task1, task2, nothing}.
The backwards function takes the current state and the desired task, and returns the
desired next state required to complete the task. Task 1 entails collecting a widget
from the goal in zone 1 and delivering it to the goal in zone 2; task 2 entails collecting a
gizmo from the goal in zone 2 and delivering it to the goal in zone 1. λ−task(c, z, o, t),
which returns the robot’s desired next state given the current state and the task, is
given by the following pseudocode:
• If the task is 1 and the held object is a widget, then proceed to the goal of zone
2 to deliver it:
λ−task(c, z, widget, task1) = (goal, zone2, nothing)
• If the task is 1 and the held object is a gizmo, then proceed to the goal of zone 2
to return it:
λ−task(c, z, gizmo, task1) = (goal, zone2, nothing)
• If the task is 1 and no object is held, then proceed to the goal of zone 1 to pick up
a widget:
λ−task(c, z, nothing, task1) = (goal, zone1, widget)
• If the task is 2, then there are three cases similar to the above.
• If there is no task, then remain in the current state:
λ−task(c, z, o, nothing) = (c, z, o)
We can now compose together the entire control system; it is the lens:
(λpos ⊗ λwt) λtask : R × [0, ∞) → Task
The update function of this composite treats the output of λ−task, which is the next state desired by the high-level planner, as the input to the first level, which 'translates' it down into lower-level coordinates.
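In terms of the earlier sketches, the wiring is simply compose (tensor lensPos weightLens) lensTask, for suitably typed lensPos and lensTask (which we have only described in pseudocode); one tick of the resulting closed loop might look as follows.

```haskell
-- One tick: the forwards pass produces the top-level observation, the planner
-- chooses a task, and the backwards pass pushes the desired low-level state
-- back down the hierarchy. 'sys' stands for the composite control lens.
tick :: Lens obs cmd topObs task -> (topObs -> task) -> obs -> (topObs, cmd)
tick sys planner o =
  let up = fwd sys o             -- e.g. (zone, chunk, heldObject)
  in  (up, bwd sys o (planner up))
```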
Around this composite system we must place an ‘environment’. On the right-hand
side sits the human controller or other top-level planner, which decides the top-level
tasks given the top-level observations. On the left sits the ‘physical environment’,
consisting of the real world (or a simulation thereof), together with actuators that
implement the robot’s bottom-level desires and sensors that produce the bottom-level
observations. Crucially, this physical environment will typically have an internal state
that cannot be observed by the robot.
In our example, for simplicity we take the top-level planner to be a constant task.
The physical environment will store the robot’s position and currently held object.
The current position is continually updated with the desired position provided it is
reachable in a straight line. The desired weight is ignored since the robot has no
corresponding actuator. When the robot’s position is in one of the goal areas, the
carried object will change as an object is either picked up or delivered.
With the task held constant at task1, from any starting position the robot will repeatedly navigate between the goals of
zone1 and zone2, picking up and delivering widgets. (If it is initially carrying
a gizmo it will first return the gizmo before picking up its first widget.)
This setup has the feature that time ‘naturally’ moves slower the higher one goes
up the hierarchy. Suppose the robot’s initial position is in zone2 and it is holding no
object. If the task is task1 then λ−task will output (zone1, goal, widget). This will be used as the desired input to λ−pos. The robot will navigate through several stages towards the goal of zone1, during which time the output of λ−task will not change. After the robot reaches the goal, the environment will update its held object to a widget, which will cause λ+wt to change its output to widget. This in turn will finally cause λ−task to change its output to (zone2, goal, nothing), signaling a change in desire to move to the other goal to deliver the widget. This will again stay constant while the lower level λpos navigates the robot towards the new goal.
Here we have proposed to found abductive inference on the category-theoretic
machinery of lenses. Besides abduction, we have also shown how lenses generalize
backpropagation, variational inference, and dynamic programming. We then intro-
duced novel 'symbolic-numeric' lenses, which allow hybrid structures, consisting
of both symbols and these pre-existing lenses, to be hierarchically composed. This is
important for implementing scalable planning: the general planning problem suffers
from both branching and time horizon, which can be ameliorated by lower dimen-
sionality as well as longer time jumps. This can be achieved by progressively building
a hierarchy of ‘concepts’ and their affordances (cf. Sect. 11.2.2), and operationaliz-
ing planning as abductive reasoning at the highest available level, which, thanks to
the hierarchical composition, will still be firmly anchored in the sensorimotor level.
In the next chapter, we will see how control loops, which are considered from a lens
perspective by emerging research in categorical cybernetics, provide a compositional
vocabulary to identify and regulate control systems.
Chapter 10
2nd Order Automation Engineering
In this chapter, semantic closure meets system engineering: we describe how SCL
systems can be constructed and controlled in practice, casting a developmental per-
spective on automation which we call ‘2nd order automation engineering’. Let us
first give context to our objective, starting with a quote from Bundy and McNeil
[40], who described in 2006 what they considered to be ‘a major goal of artificial
intelligence research over the next 50 years’:
For autonomous agents able to solve multiple and evolving goals, the representation must be
a fluent, i.e., it must evolve under machine control. This proposal goes beyond conventional
machine learning or belief revision, because these both deal with content changes within a
fixed representation. The representation itself needs to be manipulated automatically.
a world model while absorbing the ‘dents’ caused by its contingencies. The function
of an SCL system is therefore to accord both motivations, as per the definition of
semantic closure (Sect. 7.2). In other words, this means to sustain a structural goal-
directed homeostasis ‘at the edge of chaos’ as Kauffman puts it [167]—beyond that,
ignorance unleashes oscillating behaviors, ending up in futility at best, in disaster
at worst. Although Kauffman considers a multi-agent epigenetic landscape over
evolutionary time scales, we are concerned with the growth of a single mind over the
human time scale of work on command. As we will now see, ‘adaptation at the edge
of chaos’ is as relevant for automation engineering as it is for the development of
organisms, when re-cast as the hard-coded drive to control the multitude of feedback
loops which unfold over the dynamic landscape of fine-grained models.
Kauffman developed his concept of adaptation from a systemic perspective to
explain how open systems evolve in complexity to fulfill their intrinsic determina-
tion (spontaneous order) despite exogenous constraints (evolutionary pressure). But
how do forms—the recognizable and composable manifestations of complexity—
arise in the first place? When in 1917 D’Arcy Thompson published the first edition
of ‘On growth and form’ [337] he probably did not anticipate that, four decades later,
his inquiry into the genesis of stable structures would eventually be met by a formal
theory that would extend its reach well beyond embryogenesis. In Thom’s ‘General
Theory of Models’ [336], forms arise from a substrate, determined by a kinematics
(the laws which govern the arrangement/interaction of its constituents) whereas the
temporal evolution of forms is regulated by a separate dynamics. Indeed, forms gener-
ally remain invariant under some selected pseudo-group G of interest (stability), yet
sometimes break up completely (catastrophe) and new forms arise (morphogenesis).
Thom’s theory of models is a mathematical framework for eliciting the dynamics
necessary to explain (and when possible, to predict) change, despite the inevitable
under-specification of the underlying kinematics, and parameterized by G.
We confront the related inverse problem: to control the morphogenesis of con-
trollers arising from a known kinematics. A general solution to this problem from the
perspective of system engineering—‘2nd order automation engineering’—ultimately
deserves its own book. In the present work, we limit ourselves to presenting a minimal
viable design within the scope of the SCL framework.
As we have amply discussed in Part I, autonomous control systems are not systems
that routinely switch between compiled behavior routines. Rather, the function of
an autonomous control system is to engineer its own behavioral processes in com-
pliance with requirements of scope, correctness, and timeliness. Behaviors result
from control feedback loops instantiated over the substrate of a world model. Hence,
to sustain the homeostasis introduced above is to construct and maintain a world
model such that the instantiation of desired control loops is predictable within the
prescribed operational envelope. To an SCL system, control loops are the essential
observables by which it can assess the adequacy of its world model to the prospects
of fulfilling its mission; as such, they constitute the forms of interest to be reasoned
about.
Functionally, a control loop is the coupling of a subset of rules (the output types
of some rules are the input types of others) which guarantees the reachability
of a goal state, starting from a terminal state (for now, the state of an actuator—a
definition we will generalize later) constrained by parameters; parameters are non-
terminal states which are not under the direct control of the loop. The guarantee
of reachability is given by the composition of contracts, locally defined by rules.
A rule is defined by (A, PA, f) → (B, PB), where A and B are types (generally
composite), PA a predicate on A, PB a predicate on B, and f a transfer function that
maps A to B. The pair (A, PA) forms a refinement type [157], where the predicate PA
defines which subset of the possible representable values actually corresponds to valid
instances of the type A for the purpose of the rule. A contract is the guarantee that if
one predicate holds, so will the other. This perspective of programming-by-contract
[245], in which rules are relations (not functions), is close in spirit to the Behavioral
System Theory developed by Willems [365] and confers a number of advantages:
(1) unlike functions, all relations have a converse, (2) they allow ready modeling of
nondeterminism, and (3) they allow identification and composition of invariants.
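As a minimal sketch (hypothetical names, not the framework's representation), a rule (A, PA, f) → (B, PB) can be rendered as a pair of predicate-guarded refinement types plus a transfer function, with the contract checked at the point of firing:

from dataclasses import dataclass
from typing import Callable, Any

@dataclass
class RefinementType:
    """A type together with a predicate selecting its valid instances."""
    name: str
    predicate: Callable[[Any], bool]

@dataclass
class Rule:
    """(A, P_A, f) -> (B, P_B): if P_A holds on the input, P_B holds on f(input)."""
    lhs: RefinementType
    rhs: RefinementType
    f: Callable[[Any], Any]

    def fire(self, a):
        assert self.lhs.predicate(a), "input violates P_A"
        b = self.f(a)
        assert self.rhs.predicate(b), "contract violated: P_B does not hold"
        return b

# Toy usage: doubling a positive number preserves positivity.
positive = RefinementType("positive", lambda x: x > 0)
double = Rule(lhs=positive, rhs=positive, f=lambda x: 2 * x)
print(double.fire(3))   # -> 6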
Figure 10.1 illustrates the relational perspective on control. Further possibilities
arising from an alternative categorical presentation via string diagrams due to Baez
and Erbele are discussed in Sect. 11.2.4. The generic relational controller (b) is a lens
implemented by a single rule
Fig. 10.1 Relational control loops. a A generic model-based controller in functional form (coupling
of input/output values). A forward model fwd makes predictions p from sensor readouts s and
efference copies e; based on such predictions and set-point g, an inverse model inv computes the
action a. b The same controller in relational form (coupling of constraints on types) is a lens,
implemented by a rule R. The controlled state is A0 , the terminal state A1 , the parameters ∅. There
are two concurrent inference loops: (1) deduction loop and (2) abduction loop. The mixed flow (3)
computes possible actions. c Complex control loops are composed of several rules/loops. Here, the
controlled state is E, the terminal state A1 , the parameters C ∪ D; see text for details
where ϕ and ψ are linear arithmetic formulae. Note that FLA negations express
that subregions of the state space are ‘to be avoided’ to the degree of their
likelihood—‘forbidden’ when the likelihood approaches one—a trait we will lever-
age in Sect. 10.4. Predicates in FLA are closed under the application of transfer
functions, hence the predicate constraining the result of a rule can be automatically
derived from the predicate constraining the inputs, namely by applying the transfer
function to all of the constants, variables, and linear functions in the input predicate.
Contracts can therefore be composed along any arbitrary chain of rules coupled via
nonempty intersection between their input/output refinement types. This, in turn,
allows representing control loops as (macro) rules and composing them as such; this
plays an important part in system identification as we will discuss later.
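A companion sketch (again with hypothetical names) of contract composition along a chain: two rules couple when the intersection of the upstream output refinement type and the downstream input refinement type is nonempty, and the resulting macro rule carries the composed contract. Here the nonemptiness check is approximated on a finite sample rather than derived symbolically:

from dataclasses import dataclass
from typing import Callable, Any

@dataclass
class Rule:
    """A rule with predicate-guarded input/output (refinement types)."""
    p_in: Callable[[Any], bool]
    p_out: Callable[[Any], bool]
    f: Callable[[Any], Any]

def couples(r1: Rule, r2: Rule, samples) -> bool:
    """Rules couple when the intersection of r1's output refinement type and
    r2's input refinement type is nonempty (checked here on a finite sample)."""
    return any(r1.p_out(x) and r2.p_in(x) for x in samples)

def compose(r1: Rule, r2: Rule) -> Rule:
    """A macro rule whose contract is the composition of the two contracts."""
    return Rule(p_in=r1.p_in, p_out=r2.p_out, f=lambda a: r2.f(r1.f(a)))

# Toy usage: metres -> centimetres -> millimetres, contracts compose.
m_to_cm = Rule(p_in=lambda x: x >= 0, p_out=lambda x: x >= 0, f=lambda x: x * 100)
cm_to_mm = Rule(p_in=lambda x: x >= 0, p_out=lambda x: x >= 0, f=lambda x: x * 10)
if couples(m_to_cm, cm_to_mm, samples=range(5)):
    m_to_mm = compose(m_to_cm, cm_to_mm)
    print(m_to_mm.f(2))   # -> 2000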
Behaviors result from planning, which itself results from the instantiation of control
loops, subject to the kinematics discussed above. In SCL, abductions are always
simulated: this allows exploring, concurrently, several possible courses of action
(plans) to achieve a given goal. Of course, at some point in time, the system has to
commit to one of these plans and start acting.
Each time an efference copy is a premise of a prediction, the corresponding (ter-
minal) goal is added to an auxiliary list associated with the prediction. Such lists are
1. If there is a current plan and its likelihood is below some threshold of acceptance
T , then cancel the plan.
2. If there is no current plan, then find the best one (at a minimum, the one with the
highest likelihood above T); if none is found, return.
3. Execute all goals in the current plan, whose deadline falls in [t − WCET /2, t +
WCET /2) and which have not been executed yet and are signaled. Due to the
composite nature of states, non-terminal goals are the conjunction of sub-goals.
If a terminal goal k is in such a conjunction C, then the execution of k is effective
if and only if the executions of all other goals in C are effective.
4. If the execution of g is not effective, then cancel g.
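A compressed sketch of the commitment procedure above, with hypothetical plan and goal records; the bookkeeping for conjunctive goals and signalling is simplified away:

# Sketch of the commitment procedure (hypothetical plan/goal records; the
# conjunctive-goal bookkeeping is simplified away).
def commit(current_plan, candidate_plans, t, wcet, threshold):
    # 1. Cancel the current plan if its likelihood has dropped below T.
    if current_plan and current_plan["likelihood"] < threshold:
        current_plan = None
    # 2. Otherwise adopt the best candidate above T, if any.
    if current_plan is None:
        viable = [p for p in candidate_plans if p["likelihood"] >= threshold]
        if not viable:
            return None
        current_plan = max(viable, key=lambda p: p["likelihood"])
    # 3. Execute goals whose deadline falls within the WCET window and which
    #    have not been executed yet and are signalled.
    for goal in current_plan["goals"]:
        due = t - wcet / 2 <= goal["deadline"] < t + wcet / 2
        if due and not goal["executed"] and goal["signalled"]:
            goal["executed"] = execute(goal)
            if not goal["executed"]:            # 4. cancel on ineffective execution
                goal["cancelled"] = True
    return current_plan

def execute(goal):
    """Stand-in for issuing the terminal goal to an actuator."""
    return True

# Toy usage.
plan = {"likelihood": 0.9,
        "goals": [{"deadline": 10.0, "executed": False, "signalled": True, "cancelled": False}]}
print(commit(None, [plan], t=10.0, wcet=2.0, threshold=0.5) is plan)   # -> True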
For the commitment procedure to be effective, end-plans must be produced
before the deadline of their earliest goal, which is of course rarely the case
in practice. To keep the WCET of computing end-plans within acceptable bounds
requires hierarchical planning. Rules are coupled via their types, which in turn are
hierarchized by virtue of abstraction; in that sense, type abstraction imposes a hier-
archy upon the ruleset. Consider for example, a mobile robot with an arm equipped
with a gripper. When the gripper G is actuated, effectively seizing an object O, then
when the robot moves, observations that O moves along the same trajectory as G
will ensue: a rule R will be synthesized (see next section) to capture these, along with
a positive precondition P on R predicting the success of R based on the prior context
of O having been close enough to G and G having been actuated. The successful
firing of P actually signifies ‘G grabbed O’, denoted by a new abstract type T (also
detailed in the next section). In terms of abduction, T is considered a terminal state
in the sense that the modalities of reaching T are irrelevant to the activity of planning
the movement of O to some target location.
All type abstractions (super-types, latent states, etc.) actually decouple control
loops. Accordingly, we extend the definition of terminal states to ‘states transduced to
lower levels of abstraction’—sensors and actuators being at level zero. ‘Transduced’
means that there exists a rule (a transducer) whose right-hand state is at a level of
abstraction higher than that of its left-hand state. Hence, a terminal state of a control
loop at level N may be the controllable state of another loop at N − 1, mediated by
some transducer, as illustrated in Fig. 10.2. Finally, we can extend our definition of
an end-plan to ‘a plan whose goal state is a terminal state’, considering the top-level
mission goal state M a terminal state. The commit procedure will be invoked for
Fig. 10.2 Hierarchical behaviors (an ellipse denotes a path across the ruleset). An end-plan with
end-goal M at abstraction level N is composed of terminal goals, some of which (C and D) are
transduced (rules T0 and T1) into lower levels, where these global goals become end-goals for local
end-plans
all end-plans at some level N , each plan being computed in parallel to assemble the
subgoals of plans at level N + 1.
System Identification
Fig. 10.3 Reflective system adaptation. System identification is performed on and by the system
itself as its world model grows. The dynamics of the system is controlled by two processes by
means of rule and type synthesis: A reacts to variations of surface phenomena (Sect. 10.2), whereas
B proactively invokes structural heuristics (intensification/diversification, Sect. 10.3). B is domain-
independent, decoupled from A, and operates deeper in the structure of the world model and at
longer time horizons. Constraints on resources and responsiveness are less stringent for B than for
A
Fig. 10.4 General control loop template. T is the terminal state, S the controlled state. The prop-
erties P1 , P2a and P2b are illustrated in blue, red, and green, respectively; see text for details
(P2) There exists a terminal state T such that (a) T is abducted from S and (b) S is
predicted from efference copies of T.
The self model is both hierarchical and compositional, both properties being
inherited from rules and contracts. Technically, two rules X : A → B and
Y : C → D with B ∩ C ≠ ∅ compose (written X + Y) into a single rule F → G, where
F = (A ∪ C) \ (B ∩ C) and G = (B ∪ D) \ (B ∩ C).
Fig. 10.5 Loop composition. The result of a + operation is a loop describing its arguments at one
level of hierarchy higher. Controlled states are marked in blue, terminal states in orange, parameters
in green, and control loops in grey; see text for details
More generally, n rules Xi : Ai → Bi and a rule Y : C → D with Bi ∩ C ≠ ∅ compose into a single rule F → G with

F = ⋃_{i=1..n} (Ai ∪ C) \ ⋃_{i=1..n} (Bi ∩ C)

and

G = ⋃_{i=1..n} (Bi ∪ D) \ ⋃_{i=1..n} (Bi ∩ C).
Control loops can therefore be expressed by the same formalism used for rules (see
Fig. 10.5) and, in particular, the general loop template with terminal state T and
controlled state S is
L(T, S) = {R0 + r_α + R1 + r_β, R2 + r_γ + R3} + R4 + r_δ + R5

where the r_i are free parameters and the R_i are subject to the following constraints:

R0 : A0 → B0, T ∈ A0    R1 : A1 → B0, S ∈ A1
R2 : A2 → B2, T ∈ A2    R3 : A3 → B3
R4 : A4 → B4, B3 ∩ A4 ≠ ∅    R5 : A5 → B5, S ∈ B5
L is then the search pattern for identifying control loops during system identification
at arbitrary levels of abstraction.
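As an illustration only (with a hypothetical encoding of rules as sets of input and output types, and ignoring the free parameters r_i), the template L(T, S) can be matched against a ruleset by checking the membership and coupling constraints listed above:

# Hypothetical sketch: search a ruleset for candidate bindings of the template
# L(T, S), checking only the membership/coupling constraints on R0..R5.
from itertools import permutations

def matches_template(rules, T, S):
    """Yield 6-tuples (R0..R5) of rules satisfying the constraints listed above."""
    for R0, R1, R2, R3, R4, R5 in permutations(rules, 6):
        ok = (T in R0["in"] and
              S in R1["in"] and R1["out"] == R0["out"] and
              T in R2["in"] and
              set(R3["out"]) & set(R4["in"]) and
              S in R5["out"])
        if ok:
            yield (R0, R1, R2, R3, R4, R5)

# Toy ruleset: each rule is {'in': set_of_types, 'out': set_of_types}.
rules = [
    {"in": {"T"}, "out": {"B0"}}, {"in": {"S"}, "out": {"B0"}},
    {"in": {"T"}, "out": {"B2"}}, {"in": {"X"}, "out": {"B3"}},
    {"in": {"B3"}, "out": {"B4"}}, {"in": {"Y"}, "out": {"S"}},
]
print(next(matches_template(rules, "T", "S"), None) is not None)   # -> True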
10.2 Reactive Synthesis
The reactive synthesis of a rule is triggered by one of the following two events: the
unpredicted success of a goal, or the failure of a prediction—a missing path in the
model structure in the first case, a faulty path in the second. We begin with the first
case.
Fig. 10.6 Synthesis of a rule (A, PA, f) → (B, PB) with A = ℝ² and B = ℝ³. a Samples, each
a pair (a_i, b_i) of vectors in A and B, are observed in succession, from which an orthogonal
basis (α) of A is constructed. b α is then transformed into an axis-aligned unit basis (β); f is
b : B = C + M × a : A. PA is the polytope formed by the a_i whereas PB is that formed by the b_i;
see text for details
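The numerical core of this synthesis step can be sketched as follows, using an ordinary least-squares fit as a stand-in for the basis construction of Fig. 10.6 and a bounding box as a crude proxy for the polytope predicates (illustrative only, not the actual synthesis procedure):

import numpy as np

def synthesize_rule(a_samples, b_samples):
    """Fit an affine transfer function b = C + M @ a from observed sample pairs,
    and return it together with a simple bounding-box predicate P_A."""
    A = np.asarray(a_samples, dtype=float)          # shape (n, dim_A)
    B = np.asarray(b_samples, dtype=float)          # shape (n, dim_B)
    A1 = np.hstack([A, np.ones((len(A), 1))])       # append constant column
    W, *_ = np.linalg.lstsq(A1, B, rcond=None)      # W stacks M (rows) and C
    M, C = W[:-1].T, W[-1]
    lo, hi = A.min(axis=0), A.max(axis=0)
    P_A = lambda a: bool(np.all(a >= lo) and np.all(a <= hi))   # box proxy for the polytope
    f = lambda a: C + M @ np.asarray(a, dtype=float)
    return f, P_A

# Toy usage with A = R^2 and B = R^3, as in the figure.
rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, size=(20, 2))
b = a @ rng.normal(size=(2, 3)) + np.array([1.0, 2.0, 3.0])
f, P_A = synthesize_rule(a, b)
print(P_A(a[0]), np.allclose(f(a[0]), b[0]))        # -> True True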
A second possible trigger for the reactive synthesis of a rule is the failure of a
prediction. We proceed as in the case of the unexpected success of a goal, to synthesize
a rule N : (C, PC , f ) → (D, PD ) where (D, PD ) is the refinement type denoting the
failure of some rule R. N is a negative precondition for R: when N fires, the reliability
of the inferences produced by R is lowered, proportionally to that of the inference d :
D produced by N ; during abduction, a goal matching the right side of R triggers the
production of a subgoal to prevent N from firing, hence, other subgoals are derived
in order to avoid instances of C. Figure 10.7 illustrates the flow of inferences.
Whereas predicates in left-hand refinement types impose local constraints on
forward input bindings, negative preconditions impose global constraints on the
target rule, i.e., they define the context in which the rule is likely to fail, regardless
of the validity of the bindings of its left-hand refinement type. Just as there exist
contexts for failure, there exist contexts for success: these are captured in the left-
hand refinement types of positive preconditions—see Fig. 10.8. The synthesis of the
positive precondition P is triggered by the success of R if not already predicted
Fig. 10.7 A negative precondition N for a rule R is represented. The first type (in grey) embedded
in the type of either the left- or right-hand term of R always denotes the success of R; inputs
matching A0 or B0 do not trigger inferencing. a Forward inference (in blue): any time an instance
a : A triggers a forward inference, a prediction success(R, a) is produced by R, which in turn may
trigger the production by N of its negation: this lowers the likelihood of b1 . b Backward inference
(in red): b1 → λ ∧ , λ → ϕ ∧ κ, → δ ∨ α, δ → β ∨ γ
by another positive precondition. Note that preconditions are rules and as such, can
themselves be subjected to other preconditions.
Any time a precondition P : A → S (where S is the success/failure of some
rule R) is synthesized, an abstract type X and a transducer T are also surmised: X
is defined as a new dimension of the state space, and T as A → X . The transfer
function of T is learned as any other function, with the minor difference that since
X is abstract (therefore not observable), evidence for the outcome of R is taken as
a proxy for evidence of X.
Mirroring preconditions, whereas predicates in right-hand refinement types define
local guarantees on forward output bindings, postconditions define global conse-
quences of rule firings. Recall that forward rule firing statements (e.g., success(R, ...)
or ¬success(R, ...)) are first-class states: hence they can be surmised as input types
embedded in the composite left-hand type of a rule. Postconditions are rules taking
the (forward) firing of other rules as input types. Such input types summarize (parts
of) the state history: if a rule R admits Ai as input types composing its left-hand
type, then the single input type success(R, ...) in a postcondition S is equivalent to
S admitting all the Ai (and the conjunction of the negations of all negative precon-
ditions for R) as its own input types. Conversely, an input type ¬success(R, ...) in
S is equivalent to the negation of at least one of the Ai or the assertion of at least
Fig. 10.8 A positive precondition P for a rule R is represented. a Forward inference: any time
an instance c : C triggers a forward inference, a prediction success(R, ∗) is produced by P: this
increases the likelihood of b1 . b Backward inference: b1 → κ ∧ , κ → ϕ, → δ ∧ α, δ → β ∧ γ
one of the negative preconditions for R. Note that some of the Ai may themselves
be the firing of postconditions: this enables postconditions to capture state histories
as temporal hierarchical patterns.
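A small sketch of the idea that rule firings are first-class states (hypothetical data structures): a firing statement such as success(R, a) is emitted alongside the rule's output and can then serve as the input type of a postcondition, standing in for R's full left-hand context:

from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Firing:
    """A first-class 'success(R, a)' statement, usable as an input type elsewhere."""
    rule: str
    binding: Any
    success: bool = True

def fire(rule_name, binding, f):
    """Apply a (hypothetical) rule and emit its firing statement as a state."""
    return f(binding), Firing(rule_name, binding)

# A postcondition is itself a rule whose left-hand type is a firing statement:
# here it consumes 'R fired on x' rather than re-listing all of R's inputs.
def postcondition(state_history):
    return any(isinstance(s, Firing) and s.rule == "R" and s.success
               for s in state_history)

value, evt = fire("R", 3, lambda x: x + 1)
print(postcondition([evt]))   # -> True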
³ The related notions of exploration and exploitation also appear in the literature on evolutionary
computation and intrinsic motivation.
10.3 Proactive Synthesis
As observed by Blum and Roli [26], the notions of intensification and diversi-
fication are not mutually exclusive. For example, while random search is strongly
diversifying and gradient descent is strongly intensifying, simulated annealing [172]
lies somewhere in-between, progressing from diversification at high temperatures to
intensification at low temperatures.
The control mechanism we describe here is derived from that of the Reactive
Tabu Search (RTS) [18, 19], an extension of Glover’s original tabu search [113].
Tabu search is a local search metaheuristic [208] in the same family of ‘single-
point search’ techniques as stochastic gradient descent and simulated annealing.
The basic mechanism of tabu search (termed recency-based memory) maintains a
restricted local neighbourhood by prohibiting the choice of a neighbouring state if
(some attribute of) that neighbour has been encountered recently. The simplest (or
fixed-tabu) implementation represents the recency structure as a sequence of the
last k states encountered, where k is the tabu-tenure. In addition to the recency-
based memory structure (which could be said to model ‘short-term’ memory), many
implementations also maintain a ‘long-term’ or frequency-based memory, which is
essentially a frequency-histogram of attributes with a tenure much larger than k.
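A minimal sketch of these two memory structures in Python (plain tabu bookkeeping only, not the reactive controller):

from collections import deque, Counter

class TabuMemory:
    """Recency-based ('short-term') and frequency-based ('long-term') memories."""
    def __init__(self, tenure: int):
        self.recency = deque(maxlen=tenure)   # last `tenure` attributes seen
        self.frequency = Counter()            # histogram over a much longer horizon

    def visit(self, attribute):
        self.recency.append(attribute)
        self.frequency[attribute] += 1

    def is_tabu(self, attribute) -> bool:
        return attribute in self.recency

# Toy usage: with tenure k = 3, revisiting a recently seen state is prohibited.
mem = TabuMemory(tenure=3)
for state in ["s1", "s2", "s3"]:
    mem.visit(state)
print(mem.is_tabu("s1"), mem.is_tabu("s0"))   # -> True False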
The essential idea of RTS is to inform control via dynamical system metrics. We
therefore briefly recap dynamical systems terminology. Informally, an attractor of
a dynamical system is a set of points in state space such that all ‘nearby’ points in
the state space eventually move close to it. The simplest dynamical system is one in
which there is a single fixed point which acts as an attractor for all states. The next
simplest attractor is a limit cycle, in which trajectories converge to a closed loop.
The cycle length of a dynamical system in state s is the number of iterations since
s was last encountered (or ∞ if no encounter with s is recorded). It is also possible
for the trajectory to be confined within some region of phase space but exhibit no
obvious periodicity, due to the presence of a so-called strange attractor. RTS thus
instruments the search space with recency and frequency information, in order to
drive control mechanisms that:
1. are self-adaptive (tabu-tenure is a function of the moving average of the detected
cycle-length);
2. maintain a good balance between intensification (i.e. exploration of promising
regions) and diversification;
3. recognize when the search lies in the attractor of an unpromising region.
The presence of previously-encountered attributes in the recency list triggers
the ‘fast-reaction’ mechanism which leads on successive iterations to a geometric
increase in recency tabu-tenure. This tends to force the search to choose neighbours
with unexplored attributes and will eventually break any limit cycle. Conversely, a
‘slow-reaction’ mechanism acts to counter redundant prohibition by reducing the
tabu-tenure when the fast-reaction mechanism has been idle for some time. In order
to detect the presence of strange attractors, the number of repeated attributes in
the frequency structure is examined. When the number of such attributes exceeds a
threshold level an ‘escape mechanism’ is activated, which (as per Battiti’s original
The heuristics above are merely intended to give a flavor of the relationship
between metrics and strategies. Cross-domain heuristics for proactive synthesis are
clearly of great potential value, and are rightfully the subject of an extended empirical
investigation.
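To give a flavour in code as well, the following toy sketch adapts the tabu-tenure in the fast/slow-reaction style described above; the constants are illustrative and are not Battiti’s published parameters:

# Toy sketch of reactive tenure adaptation (illustrative constants): detected
# repetitions grow the tenure geometrically; long repetition-free periods shrink it.
def adapt_tenure(tenure, repetition_detected, steps_since_last_reaction,
                 grow=1.1, shrink=0.9, patience=50, min_tenure=1, max_tenure=500):
    if repetition_detected:                       # fast reaction
        return min(max_tenure, max(min_tenure, int(tenure * grow) + 1)), 0
    if steps_since_last_reaction > patience:      # slow reaction
        return max(min_tenure, int(tenure * shrink)), 0
    return tenure, steps_since_last_reaction + 1

tenure, idle = 10, 0
for step in range(200):
    repeated = (step % 60 == 0)                   # pretend a cycle is detected now and then
    tenure, idle = adapt_tenure(tenure, repeated, idle)
print(tenure)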
In order to make a choice of rules and target types in the above, the selection
mechanism must ultimately be grounded in a specific state space. There are actually
two choices here: (1) the ‘first-order’ state space described by the current ruleset and
(2) the infinite ‘second-order’ state space described by the prospective addition of
rules. By means of defunctionalization [287], second-order rules of the form A →
(B → C) can be ‘uncurried’ into first-order rules (A, B) → C, thereby allowing all
selection decisions to be grounded in a first-order state space as follows:
10.4 Safety
Experiential Safety
winds, for example. In such an ‘experiential safety’ setting, safety guarantees are
at both global and local levels: the local level is concerned with guarantees at the
scale of single inference steps, the global scale with the longer-term behaviour of
the system. Regarding local guarantees, the framework provided by F-algebras is of
particular importance for general intelligence, in that it allows safety to be reconciled
with open-endedness. Specifically:
• Open-endedness: the algorithm template for the F-algebra of a type can be pro-
grammatically derived [334], even if the type has been synthesized online by the
system. The template orchestrates the invocation of learned rules, as described in
Sect. 10.3.
• Safety: The interpreter defined by the algorithm template can nonetheless be con-
strained to well-defined behavior, i.e., mapping only between prescribed input and
output types with required constraints.
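A minimal sketch of the local guarantee (hypothetical types, using a plain fold in place of a derived F-algebra template): the interpreter for a recursive type is fixed by the type’s shape, and whatever learned rules are plugged into the algebra, they can only be invoked at the prescribed input and output types:

from dataclasses import dataclass
from typing import Callable, Union

# A tiny recursive type and its fold: the 'algorithm template' for the type is the
# fold itself, derivable from the type's shape; learned rules are only ever invoked
# through the algebra's two slots, which fixes their input/output types.
@dataclass
class Leaf:
    value: float

@dataclass
class Node:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def fold(tree: Tree, on_leaf: Callable[[float], float],
         on_node: Callable[[float, float], float]) -> float:
    if isinstance(tree, Leaf):
        return on_leaf(tree.value)
    return on_node(fold(tree.left, on_leaf, on_node),
                   fold(tree.right, on_leaf, on_node))

# Whatever rules are plugged into on_leaf/on_node (even synthesized ones), the
# interpreter can only map Tree -> float through these prescribed signatures.
t = Node(Leaf(1.0), Node(Leaf(2.0), Leaf(3.0)))
print(fold(t, on_leaf=lambda v: v, on_node=lambda a, b: a + b))   # -> 6.0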
Global safety properties of general interest are reachability, i.e. finding the set of
states reachable from a given initial state X_0, and controllability, i.e. finding the set of
states controllable to a given final state X_t. Modulo efficiency considerations, the
bidirectional nature of rules means that reachability and control can be considered
equivalent—they are anyway the same in linear systems, for example (further details
of a categorical treatment in this setting can be found in Sect. 11.2.4). Depending
on the properties of the expression language used, variants (e.g. point-to-point reach-
ability) may be more computationally efficient. More generally, such guarantees can
be divided into two categories:
Formal Reachability
Time-Bounded Reachability
Chapter 11
Prospects
D. A. Wilkenfeld [364]
11.1 Summary
Previous chapters have argued that compositionality is key to addressing the foun-
dational issues of generalization and representing long-tailed distributions with high
sample efficiency. This suggests that what is desired is a collection of represen-
tations which form a ‘basis set’, the elements of which can be composed (perhaps
nontrivially, via analogy) to describe a wide range of phenomena. We therefore claim
that the strongest possible emphasis should be placed on the search for compositional
primitives, i.e., compressed parametric representations of recurring phenomena, cap-
tured across multiple sensorimotor modalities. In cognitive science, such abstractions
are known as image schema [159, 188] and are intended to represent common pat-
terns in the embodied experience of space, force, motion, etc. Early work in this
area induced image schema corresponding to spatial prepositions from video data
[96, 285]. There have also been attempts to model image schema symbolically [3,
139, 182], with recent work on a qualitative representation of containers in a sorted
first-order language [60]. It is clearly desirable that computational representations of
image schema enjoy the cross-domain ubiquity ascribed to their cognitive counter-
parts. Concurrently with the development of the present work, a recent trend in deep
learning is the proposed universality of so-called ‘foundation models’ [29], which
provide a broad basis of representations for downstream tasks through large-scale
self-supervised training. While this paradigm offers the advantage of well-known
engineering pipelines, we saw in Chap. 4 that compositionality in the algebraic sense
is essentially absent from deep learning, as considered across heterogeneous archi-
tectures and arbitrary constraints. Furthermore, the full grounding of language and
other symbols will require representations which support strong invariant propaga-
tion and the ability to produce reasonable counterfactual statements. Since achieving
the associated ‘malleability of representation’ enjoyed by humans has so far proved
elusive, it is perhaps useful to focus initially on a related, but more overtly embodied
notion: that of ‘affordances’.
Affordances
The term ‘affordance’ was coined by Gibson [109] to describe a relation between an
agent and its environment, grounded by the physical embodiment of the agent and
the recognition capacity of its perception system:
If you know what can be done with a graspable detached object, what it can be used for, you
can call it whatever you please. The theory of affordances rescues us from the philosophical
muddle of assuming fixed classes of objects, each defined by its common features and then
given a name. [. . .] But this does not mean you cannot learn how to use things and perceive
their uses. You do not have to classify and label things in order to perceive what they afford.
• Tool use in crows, where previous work [301] implies that affordances are not
simply a function of the relation between body and environment, and have (at the
very least) a memetic component.
• Although the ability to create fire (from flint and tinder) or steel (from iron and
carbon) could be said to be inherent in their component parts, their manufacture
required nontrivial insight.
Hence affordances offer an overarching perspective on situated representations.
We believe that composition of affordances is a key step towards general intelligence
and that the category theoretic machinery of Sect. 9.1 provides a suitable framing
for processes that abstract and generalize across complex configuration spaces. The
initial research task is then to determine the right ‘expression language’ for describing
the affordances of simple agents in simple domains (which are nonetheless ‘noisy
and real-world’ [35]).
Subject to the ability to generalize from initial results to more complex domains, it
is then appropriate to progress from explicitly agent-centric affordances to the more
general patterns described as image schema, which might then have a greater prospect
of being more independent of any specific embodied configuration. By these means,
it may be possible to determine whether image schema do indeed exist as universal
compositional primitives and whether—as has variously been suggested [145, 228,
229, 320]—analogy has a vital role as a universal mechanism for leveraging existing
knowledge.
⟨r, s | r⁴ = s² = (sr)² = 1⟩
is semidecidable; it will halt for all finitely presented algebras, but it is not possible
to determine in advance how long this will take. One option would therefore be to
run it as a background monitor or (typically) low-priority task, as a form of ‘garbage
collection’ for causal inconsistencies.
11.3 Conclusion
Bibliography
1. P. Abbeel, A.Y. Ng, Apprenticeship learning via inverse reinforcement learning, in Twenty-
First International Conference on Machine Learning—ICML’04 (ACM Press, 2004)
2. M. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M.B. Srivastava, K.-W. Chang, Generating
natural language adversarial examples, in Proceedings of the 2018 Conference on Empirical
Methods in Natural Language Processing, Brussels, Belgium, Oct. 31–Nov. 4, 2018, ed. by
E. Riloff, D. Chiang, J. Hockenmaier, J. Tsujii (Association for Computational Linguistics,
2018), pp. 2890–2896
3. R. St. Amant, C.T. Morrison, Y.-H. Chang, W. Mu, P.R. Cohen, C.R. Beal, An Image Schema
Language (2006)
4. D. Amodei, C. Olah, J. Steinhardt, P.F. Christiano, J. Schulman, D. Mané, Concrete Problems
in AI Safety (2016). arxiv: abs/1606.06565
5. J. Andreas, Measuring compositionality in representation learning, in 7th International Con-
ference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019
(2019). https://www.OpenReview.net
6. M. Andrychowicz, D. Crow, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin,
P. Abbeel, W. Zaremba. Hindsight experience replay, in Advances in Neural Information
Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017,
4–9 Dec. 2017, Long Beach, CA, USA, ed. by I. Guyon, U. von Luxburg, S. Bengio, H.M.
Wallach, R. Fergus, S.V.N. Vishwanathan, R. Garnett, pp. 5048–5058 (2017)
7. F. Arabshahi, Z. Lu, S. Singh, A. Anandkumar, Memory Augmented Recursive Neural Net-
works (2019). arxiv:abs/1911.01545
8. W.R. Ashby, Design for a Brain (Wiley, Science Paperbacks, 1960)
9. A. Athalye, L. Engstrom, A. Ilyas, K. Kwok, Synthesizing robust adversarial examples, in Pro-
ceedings of the 35th International Conference on Machine Learning, Proceedings of Machine
Learning Research, PMLR, Stockholm, Sweden, ed. by J. Dy, A. Krause, vol. 80, 10–15 Jul.
2018, pp. 284–293
10. R. Atkey, From parametricity to conservation laws, via Noether’s theorem, in The 41st Annual
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL’14,
San Diego, CA, USA, vol. 20–21, pp. 491–502, Jan. 2014
11. T. Atkinson, H. Baier, T. Copplestone, S. Devlin, J. Swan, The text-based adventure AI
competition. IEEE Trans. Games 11(3), 260–266 (2019)
12. F. Baader, T. Nipkow, Term Rewriting and All That (Cambridge University Press, 1999)
13. P.-L. Bacon, J. Harb, D. Precup, The Option-Critic Architecture (2016). arxiv:abs/1609.05140
14. A.P. Badia, B. Piot, S. Kapturowski, P. Sprechmann, A. Vitvitskyi, Z.D. Guo, C. Blundell,
Agent57: Outperforming the Atari Human Benchmark (2020). arxiv:abs/2003.13350
15. J.C. Baez, J. Erbele, Categories in Control (2015). arxiv:abs/1405.6881
16. D. Bahdanau, K. Cho, Y. Bengio, Neural Machine Translation by Jointly Learning to Align
and Translate (2014). arxiv:abs/1409.0473
17. T. Bansal, J. Pachocki, S. Sidor, I. Sutskever, I. Mordatch, Emergent complexity via multi-
agent competition. In: 6th International Conference on Learning Representations, ICLR 2018,
Vancouver, BC, Canada, 30 Apr.–3 May 2018, Conference Track Proceedings (2018). https://
www.OpenReview.net
18. R. Battiti, G. Tecchiolli, The reactive tabu search. Informs. J. Comput. 6(2), 126–140 (1994)
19. R. Battiti, G. Tecchiolli, The continuous reactive tabu search: blending combinatorial opti-
mization and stochastic search for global optimization. Ann. Oper. Res. 63(2), 151–188 (1996)
20. F. Bellas, R.J. Duro, A. Faina, D. Souto, Multilevel Darwinist Brain (MDB): artificial evolution
in a cognitive architecture for real robots. IEEE Trans. Auton. Mental Dev. 2(4), 340–354
(2010)
21. Y. Bengio, Y. Lecun, G. Hinton, Deep learning for AI. Commun. ACM 64(7), 58–65 (2021)
22. C. Berner, G. Brockman, B. Chan, V. Cheung, P. Debiak, C. Dennison, D. Farhi, Q. Fischer,
S. Hashme, C. Hesse, R. Józefowicz, S. Gray, C. Olsson, J. Pachocki, M. Petrov, H. Pondé
de Oliveira Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J.
Tang, F. Wolski, S. Zhang, Dota 2 with Large Scale Deep Reinforcement Learning (2019).
arxiv:abs/1912.06680
23. T.R. Besold, A.S. d’Avila Garcez, S. Bader, H. Bowman, P.M. Domingos, P. Hitzler, K.-
U. Kühnberger, L.C. Lamb, D. Lowd, P.M.V. Lima, L. de Penning, G. Pinkas, H. Poon, G.
Zaverucha, Neural-Symbolic Learning and Reasoning: a Survey and Interpretation (2017).
arxiv:abs/1711.03902
24. H. Bharadhwaj, A. Kumar, N. Rhinehart, S. Levine, F. Shkurti, A. Garg, Conservative Safety
Critics for Exploration (2020). arxiv:abs/2010.14497
25. C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics)
(Springer, Berlin, Heidelberg, 2006)
26. C. Blum, A. Roli, Metaheuristics in combinatorial optimization: overview and conceptual
comparison. ACM Comput. Surv. 35(3), 268–308 (2003)
27. M.S. Boddy, T. Dean, Deliberation scheduling for problem solving in time-constrained envi-
ronments. Artif. Intell. 67, 245–285 (1994)
28. J. Bolt, B. Coecke, F. Genovese, M. Lewis, D. Marsden, R. Piedeleu, Interacting Conceptual
Spaces I : Grammatical Composition of Concepts (2017). arxiv:abs/1703.08314
29. R. Bommasani, D.A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M.S. Bernstein, J.
Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji,
A. Chen, K. Creel, J.Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S.
Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Goel, N.
Goodman, S. Grossman, N. Guha, T. Hashimoto, P. Henderson, J. Hewitt, D.E. Ho, J. Hong,
K. Hsu, J. Huang, T. Icard, S. Jain, D. Jurafsky, P. Kalluri, S. Karamcheti, G. Keeling, F.
Khani, O. Khattab, P.W. Koh, M. Krass, R. Krishna, R. Kuditipudi, A. Kumar, F. Ladhak,
M. Lee, T. Lee, J. Leskovec, I. Levent, X.L. Li, X. Li, T. Ma, A. Malik, C.D. Manning, S.
Mirchandani, E. Mitchell, Z. Munyikwa, S. Nair, A. Narayan, D. Narayanan, B. Newman,
A. Nie, J.C. Niebles, H. Nilforoshan, J. Nyarko, G. Ogut, L. Orr, I. Papadimitriou, J.S. Park,
C. Piech, E. Portelance, C. Potts, A. Raghunathan, R. Reich, H. Ren, F. Rong, Y. Roohani,
C. Ruiz, J. Ryan, C. Ré, D. Sadigh, S. Sagawa, K. Santhanam, A. Shih, K. Srinivasan, A.
Tamkin, R. Taori, A.W. Thomas, F. Tramèr, R.E. Wang, W. Wang, B. Wu, J. Wu, Y. Wu, S.M.
Xie, M. Yasunaga, J. You, M. Zaharia, M. Zhang, T. Zhang, X. Zhang, Y. Zhang, L. Zheng,
K. Zhou, P. Liang, On the Opportunities and Risks of Foundation Models (2021). arxiv:abs/2108.07258
30. N. Bostrom, Ethical issues in advanced artificial intelligence, in Science Fiction and Philos-
ophy: From Time Travel to Superintelligence, ed by S. Schneider (Wiley, 2009)
31. K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz,
P. Pastor, K. Konolige, S. Levine, V. Vanhoucke, Using simulation and domain adaptation
80. O. Evans, A. Stuhlmüller, N.D. Goodman, Learning the preferences of ignorant, inconsistent
agents, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 12–17 Feb.
2016, Phoenix, Arizona, USA, ed. by D. Schuurmans, M.P. Wellman (AAAI Press, 2016), pp.
323–329
81. U. Evci, F. Pedregosa, A. Gomez, E. Elsen, The Difficulty of Training Sparse Neural Networks
(2020). arxiv:abs/1906.10732
82. T. Everitt, M. Hutter, Avoiding wireheading with value reinforcement learning, in: Artificial
General Intelligence—9th International Conference, AGI 2016, New York, NY, USA, 16–19
Jul 2016. Lecture Notes in Computer Science, vol. 9782 (Springer, 2016), pp. 12–22
83. B. Eysenbach, S. Gu, J. Ibarz, S. Levine, Leave no trace: learning to reset for safe and
autonomous reinforcement learning, in 6th International Conference on Learning Represen-
tations, ICLR 2018, Vancouver, BC, Canada, Conference Track Proceedings, 30 Apr.–3 May
2018 . https://www.OpenReview.net
84. G. Fauconnier, M. Turner. The Way We Think: Conceptual Blending and the Mind’s Hidden
Complexities (Basic Books, 2002)
85. M. Fernández, H. Kirchner, B. Pinaud, Strategic port graph rewriting: an interactive modelling
framework. Math. Struct. Comput. Sci. 29(5), 615–662 (2019)
86. C. Finn, P. Abbeel, S. Levine, Model-Agnostic meta-learning for fast adaptation of deep
networks, in Proceedings of the 34th International Conference on Machine Learning, ICML
2017, Sydney, NSW, Australia, 6–11 Aug. 2017, Proceedings of Machine Learning Research
PMLR, ed. by D. Precup, Y.W. Teh, vol. 70, pp. 1126–1135 (2017)
87. J.A. Fodor, The Language of Thought. Language and Thought Series (Harvard University
Press, 1975)
88. B. Fong, M. Johnson, Lenses and learners, in Proceedings of Bx 2019, CEUR Workshop
Proceedings, vol. 2355, pp. 16–29 (2019)
89. B. Fong, D. Spivak, An Invitation to Applied Category Theory: Seven Sketches in Composi-
tionality (Cambridge University Press, 2019)
90. B. Fong, D.I. Spivak, R. Tuyéras, Backprop as Functor: A Compositional Perspective on
Supervised Learning (2017). arxiv:abs/1711.10455
91. N. Foster, M. Greenwald, J. Moore, B. Pierce, A. Schmitt, Combinators for bi-directional tree
transformations: a linguistic approach to the view update problem. ACM Trans. Programm.
Lang. Syst. 29(3) (2007)
92. J. Franke, J. Niehues, A. Waibel, Robust and scalable differentiable neural computer for ques-
tion answering, in Proceedings of the Workshop on Machine Reading for Question Answer-
ing@ACL 2018, Melbourne, Australia, 19 Jul 2018, ed. by E. Choi, M. Seo, D. Chen, R. Jia,
J. Berant (Association for Computational Linguistics, 2018), pp. 47–59
93. J. Frankle, M. Carbin, The lottery ticket hypothesis: finding sparse, trainable neural networks,
in 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA,
USA, 6–9 May 2019. https://www.OpenReview.net
94. K. Frans, J. Ho, X. Chen, P. Abbeel, J. Schulman, Meta learning shared hierarchies, in 6th
International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada,
30 Apr.–3 May 2018, Conference Track Proceedings (2018). https://www.OpenReview.net
95. R.M. French The Subtlety of Sameness: A Theory and Computer Model of Analogy-Making
(The MIT Press, 1995)
96. R.M. French, Constrained connectionism and the limits of human semantics. Philosop. Psy-
chol. 12(4), 515–523 (1999)
97. R.M. French, P. Anselme, Interactively converging on context-sensitive representations: a
solution to the frame problem. Revue Internationale de Philosophie 53(209 (3)), 365–385
(1999)
98. T. Fritz, A synthetic approach to Markov kernels, conditional independence and theorems on
sufficient statistics. Adv. Math. 370 (2020)
99. T. Gale, E. Elsen, S. Hooker, The State of Sparsity in Deep Neural Networks (2019).
arxiv:abs/1902.09574
100. M. Gallieri, S.S.M. Salehian, N.E. Toklu, A. Quaglino, J. Masci, J. Koutník, F.J. Gomez, Safe
Interactive Model-Based Learning (2019). arxiv:abs/1911.06556
101. E. Gamma, R. Helm, R. Johnson, J.M. Vlissides, Design Patterns: Elements of Reusable
Object-Oriented Software (Addison-Wesley Professional, 1 edn., 1994)
102. Y. Ganin, V.S. Lempitsky, Unsupervised domain adaptation by backpropagation, in Proceed-
ings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France,
6–11 Jul 2015, JMLR Workshop and Conference Proceedings, ed. by F.R. Bach, D.M. Blei,
vol. 37, pp. 1180–1189 (2015). https://www.JMLR.org
103. P. Gärdenfors, The Geometry of Meaning: Semantics Based on Conceptual Spaces (The MIT
Press, MIT Press, 2014)
104. P. Gärdenfors, Conceptual Spaces: The Geometry of Thought (MIT Press, Cambridge, MA,
USA, 2000)
105. M. Garnelo, K. Arulkumaran, M. Shanahan, Towards Deep Symbolic Reinforcement Learning
(2016). arxiv:abs/1609.05518
106. D. Gentner, K.D. Forbus, Computational models of analogy. Wiley Interdiscip. Rev.: Cogn.
Sci. 2(3), 266–276 (2011)
107. D. Gentner, C. Hoyos, Analogy and abstraction. Top. Cogn. Sci. 9(3), 672–693 (2017). https://
doi.org/10.1111/tops.12278
108. N. Ghani, J. Hedges, V. Winschel, P. Zahn, Compositional game theory, in Proceedings of
Logic in Computer Science (LiCS) 2018 (ACM, 2018), pp. 472–481
109. J.J. Gibson, The Ecological Approach to Visual Perception (Houghton Mifflin, 1979)
110. M. Giry, A categorical approach to probability theory. Categorical Aspects of Topology and
Analysis, pp. 68–85 (1982)
111. F. Giunchiglia, T. Walsh, A theory of abstraction. Artif. Intell. 57(2), 323–389 (1992)
112. A. Gleave, M. Dennis, N. Kant, C. Wild, S. Levine, S. Russell, Adversarial Policies: Attacking
Deep Reinforcement Learning (2019). arxiv:abs/1905.10615
113. F. Glover, M. Laguna, Tabu Search (Kluwer Academic Publishers, Norwell, MA, USA, 1997)
114. F. Gobet, P.C.R. Lane, S. Croker, P.C.-H. Cheng, G. Jones, I. Oliver, J.M. Pine, Chunking
mechanisms in human learning. Trends Cogn. Sci. 5(6), 236–243 (2001)
115. K. Gödel, Über formal unentscheidbare Sätze der Principia Mathematica und verwandter
Systeme I. Monatshefte für Mathematik und Physik 38(1), 173–198 (1931)
116. B. Goertzel, Patterns of Cognition: Cognitive Algorithms as Galois Connections Fulfilled by
Chronomorphisms On Probabilistically Typed Metagraphs (2021). arxiv:abs/2102.10581
117. J. Goguen, An introduction to algebraic semiotics, with application to user interface design,
in Computation for Metaphors, Analogy, and Agents, Berlin, Heidelberg. ed. by C.L. Nehaniv
(Springer, Berlin, Heidelberg, 1999), pp. 242–291
118. J. Goguen, Mathematical models of cognitive space and time, in Reasoning and Cognition,
ed. by D. Andler, M. Okada, I. Watanabe, pp. 125–128 (2006)
119. J.A. Goguen, What is unification?—A categorical view of substitution, equation and solution,
in Resolution of Equations in Algebraic Structures, Algebraic Techniques, vol. 1 (Academic,
1989), , pp. 217–261
120. J.I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, MA, USA,
2016)
121. I.J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA,
7–9 May 2015, Conference Track Proceedings, ed. by Y. Bengio, Y. LeCun (2015)
122. C.A.E. Goodhart, Problems of Monetary Management: the UK experience, in Monetary The-
ory and Practice (Macmillan Education, UK, 1984), pp. 91–121
123. A. Graves, G. Wayne, I. Danihelka, Neural Turing Machines (2014). arxiv:abs/1410.5401
124. A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwińska, S.G.
Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou, A.P. Badia, K.M. Hermann, Y. Zwols,
G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, D. Hassabis,
Hybrid computing using a neural network with dynamic external memory. Nature 538(7626),
471–476 (2016)
125. K. Guu, K. Lee, Z. Tung, P. Pasupat, M.-W. Chang, REALM: Retrieval-Augmented Language
Model Pre-Training. arxiv:abs/2002.08909
145. D. Hofstadter, Analogy as the core of cognition, in The Analogical Mind: Perspectives from
Cognitive Science, ed. by D. Gentner, K.J. Holyoak, B.N. Kokinov. (MIT Press, 2001), pp.
499–538
146. D. Hofstadter, M. Mitchell, Fluid Concepts and Creative Analogies: Computer Models of
the Fundamental Mechanisms of Thought, chapter The Copycat Project: A Model of Mental
Fluidity and Analogy-Making (Basic Books Inc, USA, 1995), pp. 205–267
147. D.R. Hofstadter, Godel, Escher, Bach: An Eternal Golden Braid (Basic Books Inc, USA,
1979)
148. D.R. Hofstadter, E. Sander. Surfaces and Essences: Analogy as the Fuel and Fire of Thinking
(Basic Books, 2013)
149. J.H. Holland, K.J. Holyoak, R.E. Nisbett, P.R. Thagard, Induction: Processes of Inference,
Learning, and Discovery (MIT Press, Cambridge, MA, USA, 1986)
150. E.J. Horvitz, G. Rutledge, Time-Dependent Utility and Action Under Uncertainty (2013).
arxiv:abs/1303.5722
151. S.H. Huang, N. Papernot, I.J. Goodfellow, Y. Duan, P. Abbeel, Adversarial attacks on neural
network policies, in 5th International Conference on Learning Representations, ICLR 2017,
Toulon, France, Apr 2017, Workshop Track Proceedings (2017). https://www.OpenReview.
net
152. D.A. Hudson, C.D. Manning, Learning by abstraction: the neural state machine, in Advances
in Neural Information Processing Systems 32: Annual Conference on Neural Information
Processing Systems 2019, NeurIPS 2019, 8–14 Dec. 2019, Vancouver, BC, Canada, ed. by
H.M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E.B. Fox, R. Garnett, pp.
5901–5914 (2019)
153. M. Hutter, A Theory of Universal Artificial Intelligence based on Algorithmic Complexity
(2000). arxiv:cs.AI/0004001
154. G. Hutton, A Tutorial on the Universality and Expressiveness of Fold. J. Funct. Program. 9(4),
355–372 (1999)
155. F.N. Iandola, M.W. Moskewicz, K. Ashraf, S. Han, W.J. Dally, K. Keutzer, SqueezeNet:
AlexNet-level Accuracy with 50x Fewer Parameters and <1MB Model Size (2016).
arxiv:abs/1602.07360
156. M. Jaderberg, V. Dalibard, S. Osindero, W.M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals,
T. Green, I. Dunning, K. Simonyan, C. Fernando, K. Kavukcuoglu, Population Based Training
of Neural Networks (2017). arxiv:abs/1711.09846
157. R. Jhala, N. Vazou, Refinement Types: A Tutorial (2020). arxiv:abs/2010.07763
158. R. Jia, P. Liang, Adversarial examples for evaluating reading comprehension systems, in
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing,
EMNLP 2017, Copenhagen, Denmark, 9–11 Sep. 2017, ed. by M. Palmer, R. Hwa, S. Riedel
(Association for Computational Linguistics, 2017), pp. 2021–2031
159. M. Johnson, The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason
(University of Chicago Press, 1987)
160. J.P.J. Neto, Solving the nonlinear pendulum equation with nonhomogeneous initial conditions.
Int. J. Appl. Math. 30(6) (2017)
161. L. Kaelbling, Learning to achieve goals. IJCAI (1993)
162. W. Kahl, Finite Limits and Anti-unification in Substitution Categories, pp. 87–102, June 2019
163. L. Kaiser, I. Sutskever, Neural GPUs learn algorithms, in 4th International Conference on
Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016, Conference
Track Proceedings, ed. by Y. Bengio, Y. LeCun (2016)
164. K. Kansky, T. Silver, D.A. Mély, M. Eldawy, M. Lázaro-Gredilla, X. Lou, N. Dorfman, S.
Sidor, D. Scott Phoenix, D. George, Schema networks: zero-shot transfer with a generative
causal model of intuitive physics, in Proceedings of the 34th International Conference on
Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 Aug. 2017, Proceedings of
Machine Learning Research, PMLR, ed. by D. Precup, Y.W. Teh, vol. 70, pp. 1809–1818
(2017)
165. J. Kaplan, Artificial intelligence: think again. Commun. ACM 60(1), 36–38 (2017)
185. S. Lack, P. Sobociński, Adhesive categories, in Foundations of Software Science and Com-
putation Structures, Berlin, Heidelberg. ed. by I. Walukiewicz (Springer, Berlin, Heidelberg,
2004), pp. 273–288
186. I. Lakatos. Criticism and the Growth of Knowledge: Proceedings of the International Collo-
quium in the Philosophy of Science, London, 1965, chapter Falsification and the Methodology
of Scientific Research Programmes, vol. 4 (Cambridge University Press, 1970), pp. 91–196
187. G. Lakoff, R.E. Núñez, Where Mathematics Comes from: How the Embodied Mind Brings
Mathematics Into Being (Basic Books, 2000)
188. G. Lakoff, Women, Fire and Dangerous Things: What Categories Reveal About the Mind
(University of Chicago Press, Chicago, 1987)
189. G. Lakoff, M. Johnson, Metaphors we Live by (University of Chicago Press, Chicago, 1980)
190. F. Lara-Dammer, D.R. Hofstadter, R.L. Goldstone, A computational model of scientific dis-
covery in a very simple world, aiming at psychological realism. J. Exp. Theor. Artif. Intell.
31(4), 637–658 (2019). https://doi.org/10.1080/0952813X.2019.1592234
191. K.Y.H. Largerspetz, Jakob von Uexküll and the origins of cybernetics. Semiotica 2001(134),
643–651 (2001)
192. R.H. Lathrop, On the learnability of the uncomputable, in Proceedings of the Thirteenth
International Conference on International Conference on Machine Learning, ICML’96, San
Francisco, CA, USA. (Morgan Kaufmann Publishers Inc, 1996), pp. 302–309
193. M. Lázaro-Gredilla, D. Lin, J. Swaroop Guntupalli, D. George, Beyond Imitation: Zero-
Shot Task Transfer on Robots by Learning Concepts as Cognitive Programs (2018).
arxiv:abs/1812.02788
194. S. Legg, M. Hutter, Universal intelligence: a definition of machine intelligence. Minds Mach.
17(4), 391–444 (2007)
195. J. Leike, D. Krueger, T. Everitt, M. Martic, V. Maini, S. Legg, Scalable Agent Alignment via
Reward Modeling: a Research Direction (2018). arxiv:abs/1811.07871
196. T. Leinster, Basic category theory. Cambridge Studies in Advanced Mathematics (Cambridge
University Press, 2014)
197. D. Lenat, M. Prakash, M. Shepherd, CYC: using common sense knowledge to overcome
brittleness and knowledge Acquistion Bottlenecks. AI Mag. 6(4), 65–85 (1986)
198. M. Leshno, V.Y. Lin, A. Pinkus, S. Schocken, Multilayer feedforward networks with a non-
polynomial activation function can approximate any function. Neural Netw. 6(6), 861–867
(1993)
199. S. Li, R. Wang, M. Tang, C. Zhang, Hierarchical reinforcement learning with advantage-
based auxiliary rewards, in Advances in Neural Information Processing Systems 32: Annual
Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8–14 Dec. 2019.
ed. by H.M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E.B. Fox, R. Garnett
(Canada, Vancouver, BC, 2019), pp. 1407–1417
200. Z. Li, F. Zhou, F. Chen, H. Li, Meta-SGD: Learning to Learn Quickly for Few Shot Learning
(2017). arxiv:abs/1707.09835
201. V. Lifschitz, What is answer set programming? in Proceedings of the 23rd National Conference
on Artificial Intelligence, AAAI’08, vol. 3 (AAAI Press, 2008), pp. 1594–1597
202. J. Lighthill, Artificial Intelligence: A General Survey. Technical report, UK Science Research
Council (1973)
203. H.W. Lin, M. Tegmark, D. Rolnick, Why does deep and cheap learning work so well? J. Stat.
Phys. 168(6), 1223–1247 (2017)
204. A. Liska, G. Kruszewski, M. Baroni, Memorize or Generalize? Searching for a Compositional
RNN in a Haystack (2018). arxiv:abs/1802.06467
205. H. Lofting, The Voyages of Doctor Dolittle (Doctor Dolittle. Frederick A. Stokes, 1922)
206. R. Logan, N.F. Liu, M.E. Peters, M. Gardner, S. Singh, Barack’s wife Hillary: using knowledge
graphs for fact-aware language modeling, in Proceedings of the 57th Annual Meeting of
the Association for Computational Linguistics, Florence, Italy, July 2019 (Association for
Computational Linguistics, 2019), pp. 5962–5971
207. J. Loula, M. Baroni, B. Lake, Rearranging the familiar: testing compositional generalization in
recurrent networks, in Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing
and Interpreting Neural Networks for NLP, Brussels, Belgium, Nov. 2018 (Association for
Computational Linguistics, 2018), pp. 108–114
208. S. Luke, Essentials of Metaheuristics (Lulu, 2nd edn., 2013).
http://cs.gmu.edu/~sean/book/metaheuristics/
209. S. MacLane, Categories for the Working Mathematician. Graduate Texts in Mathematics, vol.
5 (Springer, New York, 1971)
210. G. Marcus, E. Davis, Rebooting AI: Building Artificial Intelligence We Can Trust (Knopf
Doubleday Publishing Group, 2019)
211. G. Marcus, An Epidemic of AI Misinformation (2019). https://thegradient.pub/an-epidemic-
of-ai-misinformation/. Accessed 3rd Sep. 2021
212. G. Marcus, GPT-2 and the Nature of Intelligence (2019). https://thegradient.pub/gpt2-and-
the-nature-of-intelligence/. Accessed 28th Jan. 2020
213. G. Marcus, The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence (2020).
arxiv:abs/2002.06177
214. G.F. Marcus, The Algebraic Mind (MIT Press, 2001)
215. Mars Climate Orbiter Mishap Investigation Board. Phase I Report. Technical report (1999)
216. D. Marsden, Category Theory Using String Diagrams (2014). arxiv:abs/1401.7220
217. P. Martin-Löf, G. Sambin, Bibliopolis Intuitionistic Type Theory Studies in Proof Theory
(Bibliopolis, Napoli, 1984)
218. H.R. Maturana, F.J. Varela, Autopoiesis and Cognition: The Realization of the Living (Boston
Studies in the Philosophy and History of Science. Springer, Netherlands, 1991)
219. J. McCarthy, P.J. Hayes, Some Philosophical Problems from the Standpoint of Artificial Intel-
ligence (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1987), pp. 26–45
220. J. McCarthy, M.L. Minsky, N. Rochester, C.E. Shannon, A proposal for the dartmouth summer
research project on artificial intelligence, 31 Aug. 1955. AI Mag. 27(4), 12 (2006)
221. P. Medawar, British Annual Gathering of MENSA (Annual Lecture, 1962)
222. J. Merel, A. Ahuja, V. Pham, S. Tunyasuvunakool, S. Liu, D. Tirumala, N. Heess, G. Wayne,
Hierarchical visuomotor control of humanoids, in International Conference on Learning Rep-
resentations (2019)
223. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words
and phrases and their compositionality, in Proceedings of the 26th International Conference
on Neural Information Processing Systems, NIPS’13, Red Hook, NY, USA, vol. 2. (Curran
Associates Inc, 2013), pp. 3111–3119
224. R. Milner. A Calculus of Communicating Systems. Lecture Notes in Computer Science, vol.
92 (Springer, 1980)
225. P. Minervini, M. Bosnjak, T. Rocktäschel, S. Riedel, E. Grefenstette, Differentiable reasoning
on large knowledge bases and natural language, in The Thirty-Fourth AAAI Conference on
Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial
Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in
Artificial Intelligence, EAAI 2020, New York, NY, USA, 7–12 Feb. 2020 (AAAI Press, 2020),
pp. 5182–5190
226. M. Minsky, S. Papert, Perceptrons: An Introduction to Computational Geometry (The MIT
Press, 1969)
227. N. Mishra, M. Rohaninejad, X. Chen, P. Abbeel, A simple neural attentive meta-learner, in 6th
International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada,
30 Apr.–3 May 2018, Conference Track Proceedings (2018). https://www.OpenReview.net
228. M. Mitchell, Can Analogy Unlock AI’s Barrier of Meaning? https://www.youtube.com/watch?
v=QvLEmueHhqY. Accessed 28th Feb. 2020
229. M. Mitchell, Abstraction and Analogy-Making in Artificial Intelligence (2021). arXiv:2102.10717
230. M. Mitchell, Why AI is Harder Than We Think (2021). arXiv:2104.12871
231. M. Mittal, M. Gallieri, A. Quaglino, S. Sina Mirrazavi Salehian, J. Koutník, Neural Lyapunov
model predictive control (2020). arxiv:abs/2002.10451
252. H. Pattee, Evolving self-reference: matter, symbols, and semantic closure. Commun. Cogn.—
Artif. Intell. 12, 9–27 (1995)
253. H. Pattee, Laws, Language and Life: Howard Pattee’s classic papers on the physics of symbols
with contemporary commentary, chapter Cell Psychology: An Evolutionary Approach to the
Symbol-Matter Problem (Springer, Netherlands, Dordrecht, 2012), pp. 165–179
254. J. Pearl, Causality: Models, Reasoning and Inference, 2nd edn. (Cambridge University Press,
New York, NY, USA, 2009)
255. X.B. Peng, M. Andrychowicz, W. Zaremba, P. Abbeel, Sim-to-Real transfer of robotic con-
trol with dynamics randomization, in 2018 IEEE International Conference on Robotics and
Automation, ICRA 2018, Brisbane, Australia, 21–25 May 2018, pp. 1–8
256. M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep
contextualized word representations, in Proceedings of NAACL (2018)
257. G. Pezzulo, G. Calvi, Dynamic computation and context effects in the hybrid architecture
AKIRA, in Proceedings of the 5th International Conference on Modeling and Using Context,
CONTEXT’05, Berlin, Heidelberg (Springer, 2005), pp. 368–381
258. S. Phillips, A general (category theory) principle for general intelligence: duality (adjointness),
in Artificial General Intelligence, Cham, ed. by T. Everitt, B. Goertzel, A. Potapov (Springer
International Publishing, 2017), pp. 57–66
259. S. Phillips, W.H. Wilson, Categorial compositionality III: F-(co)algebras and the systematicity
of recursive capacities in human cognition. PloS One 7(4), e35028 (2012)
260. D. Philps, A.S. d’Avila Garcez, T. Weyde, Making Good on LSTMs Unfulfilled Promise
(2019). arxiv:abs/1911.04489
261. J. Piaget, Biologie Et Connaissance: essai sur les relations entre les régulations organiques
et les processus cognitifs (Edinburgh University Press, 1971)
262. J. Piaget, M. Piercy, D.E. Berlyne, The Psychology of Intelligence. International library of
psychology (Routledge & Paul, 1950)
263. B.C. Pierce, Advanced Topics in Types and Programming Languages (MIT Press, 2005)
264. B.C. Pierce, Basic Category Theory for Computer Scientists (MIT Press, 1991)
265. B.C. Pierce, Types and Programming Languages (The MIT Press, 2002)
266. G.D. Plotkin, A note on inductive generalization. Mach. Intell. 5, 153–163 (1970)
267. D. Plump, Hypergraph rewriting: critical pairs and undecidability of confluence. Term Graph
Rewrit: Theory Pract. 15, 201–213 (1993)
268. K.R. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge (Routledge
Classics, Routledge, 1963)
269. W.T. Powers, Behavior: The Control of Perception (Benchmark Publications, Incorpo-
rated, 1973)
270. S. Pozzoli, M. Gallieri, R. Scattolini, Tustin Neural Networks: A Class of Recurrent Nets for
Adaptive MPC of Mechanical Systems (2019). arxiv:abs/1911.01310
271. H. Prade, G. Richard, Computational Approaches to Analogical Reasoning: Current Trends
(Springer, 2014)
272. E. Price, W. Zaremba, I. Sutskever, Extensions and Limitations of the Neural GPU (2016).
arxiv:abs/1611.00736
273. A. Puigdoménech, B. Piot, S. Kapturowski, P. Sprechmann, A. Vitvitskyi, D. Guo, C. Blundell,
Agent57: outperforming the human Atari benchmark (2020). https://deepmind.com/blog/
article/Agent57-Outperforming-the-human-Atari-benchmark. Accessed 3rd Sep. 2021
274. qntm. https://twitter.com/badedgecases/status/1421855808699125762?s=20
275. A. Quaglino, M. Gallieri, J. Masci, J. Koutník, SNODE: spectral discretization of neural ODEs
for system identification, in 8th International Conference on Learning Representations, ICLR
2020, Addis Ababa, Ethiopia, 26–30 Apr. 2020. https://www.OpenReview.net
276. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language Models are Unsu-
pervised Multitask Learners (2019). https://d4mucfpksywv.cloudfront.net/better-language-
models/language-models.pdf
277. R. Raileanu, M. Goldstein, A. Szlam, R. Fergus, Fast Adaptation via Policy-Dynamics Value
Functions (2020). arXiv:2007.02879
278. N. Ramanan, S. Natarajan, Causal learning from predictive modeling for observational data.
Front. Big Data 3, 34 (2020)
279. C.H. Rankin, Invertebrate learning: what can’t a worm learn? Curr. Biol. 14(15), R617–R618
(2004)
280. A. Rasheed, O. San, T. Kvamsdal, Digital twin: values, challenges and enablers from a mod-
eling perspective. IEEE Access 8, 21980–22012 (2020)
281. S. Rasti, N. Marandi, A. Abdoli, M. Delavari, G.A. Mousavi, Serological and
molecular detection of Toxoplasma gondii in sheep and goats in Kashan, Central Iran. J. Food
Saf. 38(2), e12425 (2018)
282. A. Ray, J. Achiam, D. Amodei, Benchmarking Safe Exploration in Deep Reinforcement Learn-
ing (2019). https://cdn.openai.com/safexp-short.pdf
283. S.E. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, Generative adversarial text to
image synthesis, in Proceedings of the 33rd International Conference on Machine Learning,
ICML 2016, New York City, NY, USA, 19–24 June 2016, JMLR Workshop and Conference
Proceedings, ed. by M.-F. Balcan, K.Q. Weinberger, vol. 48 (2016), pp. 1060–1069. https://
www.JMLR.org
284. S.E. Reed, N. de Freitas, Neural programmer-interpreters, in 4th International Conference on
Learning Representations, ICLR 2016, San Juan, Puerto Rico, 2–4 May 2016, Conference
Track Proceedings, ed. by Y. Bengio, Y. LeCun (2016)
285. T. Regier, The Human Semantic Potential: Spatial Language and Constrained Connectionism
(MIT Press, Cambridge, MA, 1996)
286. M. Ren, E. Triantafillou, S. Ravi, J. Snell, K. Swersky, J.B. Tenenbaum, H. Larochelle, R.S.
Zemel, Meta-Learning for semi-supervised few-shot classification, in 6th International Con-
ference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 Apr.–3 May
2018, Conference Track Proceedings. https://www.OpenReview.net
287. J.C. Reynolds, Definitional interpreters for higher-order programming languages, in Proceed-
ings of the ACM Annual Conference, ACM ’72, New York, NY, USA, vol. 2 (Association for
Computing Machinery, 1972), pp. 717–740
288. E. Rich, Artificial Intelligence, 2nd edn. (McGraw-Hill Higher Education, 1990)
289. R.S. Bird, O. de Moor, Algebra of Programming (Prentice Hall International Series in
Computer Science, Prentice Hall, 1997)
290. E. Riehl, Category Theory in Context Dover Modern Math Originals (Dover Publications,
Aurora, 2017)
291. M. Riley, Categories of Optics (2018). arXiv:1809.00738
292. M. Ring, L. Orseau, Delusion, survival, and intelligent agents, in Artificial General Intel-
ligence, Berlin, Heidelberg. ed. by J. Schmidhuber, K.R. Thórisson, M. Looks (Springer,
Berlin, Heidelberg, 2011), pp. 11–20
293. S. Ritter, R. Faulkner, L. Sartran, A. Santoro, M. Botvinick, D. Raposo, Rapid Task-Solving
in Novel Environments (2020). arXiv:2006.03662
294. T. Rocktäschel, S. Riedel, End-to-end differentiable proving, in Advances in Neural Informa-
tion Processing Systems 30: Annual Conference on Neural Information Processing Systems
2017, December 4-9, 2017, Long Beach, CA, USA, ed. by I. Guyon, U. von Luxburg, S.
Bengio, H.M. Wallach, R. Fergus, S.V.N. Vishwanathan, R. Garnett, pp. 3788–3800 (2017)
295. P. Rosenbloom, A. Demski, V. Ustun, The Sigma cognitive architecture and system: towards
functionally elegant grand unification. J. Artif. Gen. Intell. 7, 1–103 (2016)
296. P.S. Rosenbloom, Deconstructing reinforcement learning in sigma, in Artificial General Intel-
ligence, Berlin, Heidelberg. ed. by J. Bach, B. Goertzel, M. Iklé (Springer, Heidelberg, Berlin,
2012), pp. 262–271
297. P.S. Rosenbloom, Deconstructing episodic memory and learning in sigma, in Proceedings
of the 36th Annual Meeting of the Cognitive Science Society, CogSci 2014, Quebec City,
Canada, 23–26 July 2014, ed. by P. Bello, M. Guarini, M. McShane, B. Scassellati (2014).
https://www.cognitivesciencesociety.org
298. P.S. Rosenbloom, Lessons from mapping sigma onto the standard model of the mind: self-
monitoring, memory/learning, and symbols, in 2017 AAAI Fall Symposia, Arlington, Virginia,
USA, 9–11 Nov. 2017 (AAAI Press, 2017), pp. 449–454
299. A. Rosenblueth, N. Wiener, J. Bigelow, Behavior, purpose and teleology. Philosop. Sci. 10(1),
18–24 (1943)
300. S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 3rd edn. (Prentice Hall
Press, USA, 2009)
301. C. Rutz, G.R. Hunt, J.J.H. St Clair, Corvid technologies: how do New Caledonian crows get
their tool designs? Curr. Biol. 28(18), R1109–R1111 (2018)
302. A.M. Saxe, Y. Bansal, J. Dapello, M. Advani, A. Kolchinsky, B.D. Tracey, D.D. Cox, On
the information bottleneck theory of deep learning, in ICLR (Poster) (2018). https://www.
OpenReview.net
303. D. Saxton, E. Grefenstette, F. Hill, P. Kohli, Analysing mathematical reasoning abilities of
neural models, in 7th International Conference on Learning Representations, ICLR 2019,
New Orleans, LA, USA, 6–9 May 2019 (2019). https://www.OpenReview.net
304. R.C. Schank, R.P. Abelson, Scripts, Plans, Goals, and Understanding: An Inquiry Into Human
Knowledge Structures (Lawrence Erlbaum Associates, The Artificial Intelligence Series,
1977)
305. T. Schaul, D. Horgan, K. Gregor, D. Silver, Universal value function approximators, in Pro-
ceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceed-
ings of Machine Learning Research, PMLR, Lille, France, 07–09 Jul 2015, ed. by F. Bach,
D. Blei (2015), pp. 1312–1320
306. J. Schmidhuber, Gödel machines: self-referential universal problem solvers making provably
optimal self-improvements. Technical Report IDSIA-19-03. IDSIA, Manno-Lugano, Switzer-
land (2003). arXiv:cs.LO/0309048. Accessed 2006
307. J. Schmidhuber, POWERPLAY: Training an Increasingly General Problem Solver by Contin-
ually Searching for the Simplest Still Unsolvable Problem (2011). arxiv:abs/1112.5309
308. M. Schmidt, U. Krumnack, H. Gust, K.-U. Kühnberger, Computational Approaches to Analog-
ical Reasoning: Current Trends, chapter Heuristic-Driven Theory Projection: An Overview
(Springer, Berlin Heidelberg, 2014), pp. 163–194
309. B. Schölkopf, F. Locatello, S. Bauer, N.R. Ke, N. Kalchbrenner, A. Goyal, Y. Bengio, Toward
causal representation learning. Proc. IEEE—Adv. Mach. Learn. Deep Neural Netw. 109(5),
612–634 (2021)
310. B. Schölkopf, Causality for Machine Learning (2019). arxiv:abs/1911.10500
311. M. Shanahan, K. Nikiforou, A. Creswell, C. Kaplanis, D.G.T. Barrett, M. Garnelo, An explic-
itly relational neural network architecture, in Proceedings of the 37th International Conference
on Machine Learning, ICML 2020, 13–18 July 2020, Virtual Event, Proceedings of Machine
Learning Research. PMLR, (2020), pp. 8593–8603
312. A. Sharma, S. Gu, S. Levine, V. Kumar, K. Hausman, Dynamics-Aware Unsupervised
Discovery of Skills (2019). arxiv:abs/1907.01657
313. M. Shoeybi, M. Patwary, R. Puri, P. LeGresley, J. Casper, B. Catanzaro, Megatron-
LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (2019).
arxiv:abs/1909.08053
314. D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrit-
twieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N.
Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis,
Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–503
(2016)
315. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D.
Kumaran, T. Graepel, T.P. Lillicrap, K. Simonyan, D. Hassabis, Mastering Chess and Shogi
by Self-Play with a General Reinforcement Learning Algorithm (2017). arxiv:abs/1712.01815
316. A. Sloman, The irrelevance of turing machines to AI, in Computationalism: New Directions,
ed. by M. Scheutz (MIT Press, 2002)
317. A. Sloman, Kantian philosophy of mathematics and young robots, in Intelligent Computer
Mathematics, Berlin, Heidelberg. ed. by S. Autexier, J. Campbell, J. Rubio, V. Sorge, M.
Suzuki, F. Wiedijk (Springer, Berlin, Heidelberg, 2008), pp. 558–573
318. T.S.C. Smithe, Bayesian Updates Compose Optically (2020). arXiv:2006.01631
319. R.M. Smullyan, Gödel's Incompleteness Theorems. Oxford Logic Guides (Oxford University
Press, 1992)
320. M. Sotoudeh, A.V. Thakur, Analogy-Making as a Core Primitive in the Software Engineering
Toolbox (2020). arXiv:2009.06592
321. D.I. Spivak, Functorial Dynamics and Interaction: A Computational Design Environment.
www.dspivak.net/grants/AFOSR2020-Topos-ContextDependence.pdf
322. A. Srinivas, A. Jabri, P. Abbeel, S. Levine, C. Finn, Universal Planning Networks (2018).
arXiv:1804.00645
323. R.K. Srivastava, B.R. Steunebrink, J. Schmidhuber, First experiments with PowerPlay. Neural
Netw. 41, 130–136 (2013). Special Issue on Autonomous Learning
324. J.E.R. Staddon, D.T. Cerutti, Operant conditioning. Ann. Rev. Psychol. 54(1), 115–144 (2003)
325. B.R. Steunebrink, J. Koutník, K.R. Thórisson, E. Nivel, J. Schmidhuber, Resource-
Bounded machines are motivated to be effective, efficient, and curious, in Artificial General
Intelligence—6th International Conference, AGI 2013, Beijing, China, 31 July–3 Aug. 2013
Proceedings, ed. by K.-U. Kühnberger, S. Rudolph, P. Wang. Lecture Notes in Computer
Science, vol. 7999 (Springer, 2013), pp. 119–129
326. I. Stewart, J. Cohen, The Collapse of Chaos: Discovering Simplicity in a Complex World
(Penguin Books Limited, 2000)
327. R. Sun, The motivational and metacognitive control in CLARION. Modeling Integrated Cog-
nitive Systems, Apr. 2012
328. R. Sun, T. Peterson, C. Sessions, Beyond simple rule extraction: acquiring planning knowledge
from neural networks, in Neural Nets WIRN Vietri-01 (Springer, 2002), pp. 288–300
329. G.J. Sussman, J. Wisdom, Structure and Interpretation of Classical Mechanics (MIT Press,
Cambridge, MA, USA, 2001)
330. I. Sutskever, O. Vinyals, Q.V. Le, Sequence to sequence learning with neural networks, in
Advances in Neural Information Processing Systems 27: Annual Conference on Neural Infor-
mation Processing Systems 2014, 8–13 Dec. 2014, Montreal, Quebec, Canada, ed. by Z.
Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, K.Q. Weinberger (2014), pp. 3104–
3112
331. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (The MIT Press, 2018)
332. J. Swan, C.G. Johnson, E.C. Brady, Subtype polymorphism à la carte via machine learning
on dependent types, in Companion Proceedings for the ISSTA/ECOOP 2018 Workshops,
ISSTA’18, New York, NY, USA (Association for Computing Machinery, 2018), pp. 14–16
333. J. Swan, K. Krawiec, Z.A. Kocsis, Stochastic program synthesis via recursion schemes, in Pro-
ceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO’19,
New York, NY, USA (Association for Computing Machinery, 2019), pp. 35–36
334. J. Swan, K. Krawiec, Z.A. Kocsis, Stochastic synthesis of recursive functions made easy with
bananas, lenses, envelopes and barbed wire. Genet. Program. Evolvable Mach. 20(3), 327–350
(2019)
335. C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I.J. Goodfellow, R. Fergus, Intrigu-
ing properties of neural networks, in 2nd International Conference on Learning Representa-
tions, ICLR 2014, Banff, AB, Canada, 14–16 Apr. 2014, Conference Track Proceedings, ed.
by Y. Bengio, Y. LeCun (2014)
336. R. Thom, Structural Stability and Morphogenesis-An Outline of a General Theory of Models
(W.A. Benjamin, 1972)
337. D.A.W. Thompson, On Growth and Form. Cambridge Paperbacks (Cambridge University
Press, 1917)
338. K.R. Thórisson, D. Kremelberg, B.R. Steunebrink, E. Nivel, About understanding, in Artificial
General Intelligence—9th International Conference, AGI 2016, New York, NY, USA, 16–19
Jul. 2016, Proceedings (2016), pp. 106–117
339. S. Thrun, Flying Cars, Autonomous Vehicles, and Education. https://www.youtube.com/
watch?v=ZPPAOakITeQ. Accessed 28th Jan. 2020
340. S. Thrun, T.M. Mitchell, Lifelong robot learning, in The Biology and Technology of Intelligent
Autonomous Agents, Berlin, Heidelberg. ed. by L. Steels (Springer, Berlin, Heidelberg, 1995),
pp. 165–196
341. S. Thys, W. Van Ranst, T. Goedemé, Fooling automated surveillance cameras: adversarial
patches to attack person detection, in IEEE Conference on Computer Vision and Pattern
Recognition Workshops, CVPR Workshops 2019, Long Beach, CA, USA, 16–20 June 2019
(Computer Vision Foundation/IEEE, 2019), p. 0
342. H.R. Tiwary, On the hardness of computing intersection, union and Minkowski sum of poly-
topes. Disc. Comput. Geom. 40(3), 469–479 (2008)
343. J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, P. Abbeel, Domain randomization
for transferring deep neural networks from simulation to the real world, in 2017 IEEE/RSJ
International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, BC,
Canada, 24–28 Sep. 2017 (IEEE, 2017), pp. 23–30
344. J. Togelius, Empiricism and the Limits of Gradient Descent, May 2018. http://togelius.
blogspot.com/2018/05/empiricism-and-limits-of-gradient.html. Accessed 28th Jan. 2020
345. F. Tramèr, A. Kurakin, N. Papernot, I.J. Goodfellow, D. Boneh, P.D. McDaniel, Ensemble
adversarial training: attacks and defenses, in 6th International Conference on Learning Rep-
resentations, ICLR 2018, Vancouver, BC, Canada, 30 Apr.–3 May 2018, Conference Track
Proceedings (2018). https://www.OpenReview.net
346. M. Turchetta, F. Berkenkamp, A. Krause, Safe exploration for interactive machine learning, in
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Infor-
mation Processing Systems 2019, NeurIPS 2019, 8–14 Dec. 2019, Vancouver, BC, Canada, ed.
by H.M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E.B. Fox, R. Garnett (2019),
pp. 2887–2897
347. H. Turner, Representing actions in logic programs and default theories: a situation calculus
approach. J. Log. Program. 31(1–3), 245–298 (1997)
348. E. Tzeng, J. Hoffman, K. Saenko, T. Darrell, Adversarial discriminative domain adaptation, in
2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu,
HI, USA, 21–26 July 2017 (IEEE Computer Society, 2017), pp. 2962–2971
349. W. van Melle, MYCIN: a knowledge-based consultation program for infectious disease diag-
nosis. Int. J. Man-Mach. Stud. 10(3), 313–322 (1978)
350. K. Van Moffaert, M. Drugan, A. Nowe, Scalarized multi-objective reinforcement learning:
novel design techniques, in IEEE Symposium on Adaptive Dynamic Programming and Rein-
forcement Learning, ADPRL, May 2013
351. O. Vinyals, I. Babuschkin, W.M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D.H. Choi, R.
Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang,
L. Sifre, T. Cai, J.P. Agapiou, M. Jaderberg, A.S. Vezhnevets, R. Leblond, T. Pohlen, V.
Dalibard, D. Budden, Y. Sulsky, J. Molloy, T.L. Paine, C. Gulcehre, Z. Wang, T. Pfaff,
Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap,
K. Kavukcuoglu, D. Hassabis, C. Apps, D. Silver, Grandmaster level in StarCraft II using
multi-agent reinforcement learning. Nature 575(7782), 350–354 (2019)
352. R. Volpi, H. Namkoong, O. Sener, J. Duchi, V. Murino, S. Savarese, Generalizing to unseen
domains via adversarial data augmentation, in Proceedings of the 32nd International Con-
ference on Neural Information Processing Systems, NIPS’18, Red Hook, NY, USA (Curran
Associates Inc, 2018), pp. 5339–5349
353. J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior (Princeton
University Press, 1947)
354. A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, S.R. Bowman, GLUE: a multi-task benchmark
and analysis platform for natural language understanding, in 7th International Conference on
Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019 (2019). https://
www.OpenReview.net
355. P. Wang, Embodiment: does a laptop have a body?, in Proceedings of the 2nd Conference on
Artificial General Intelligence (Atlantis Press, 2009)
356. P. Wang, Non-Axiomatic Logic: A Model of Intelligent Reasoning (World Scientific Publishing
Co., Inc, USA, 2013)
357. P. Wang, On defining artificial intelligence. J. Artif. Gen. Intell. 10(2), 1–37 (2019)
358. P. Wang, P. Hammer, H. Wang, An architecture for real-time reasoning and learning, in Arti-
ficial General Intelligence, Cham, ed. by B. Goertzel, A.I. Panov, A. Potapov, R. Yampolskiy
(Springer International Publishing, 2020), pp. 347–356
359. M. Weber, M. Yurochkin, S. Botros, V. Markov, Black Loans Matter: Distributionally Robust
Fairness for Fighting Subgroup Discrimination (2020). arXiv:2012.01193
360. S. Weller, U. Schmid, Solving proportional analogies by E-Generalization, in KI 2006:
Advances in Artificial Intelligence, 29th Annual German Conference on AI, KI 2006, Bre-
men, Germany, 14–17 June 2006, Proceedings, pp. 64–75 (2006)
361. M. Werning, W. Hinzen, E. Machery, (eds.), The Oxford Handbook of Compositionality
(Oxford University Press, 2012)
362. G. Wheeler, Bounded rationality, in The Stanford Encyclopedia of Philosophy, ed. by E.N.
Zalta (Metaphysics Research Lab, Stanford University, fall 2020 edn., 2020)
363. E.P. Wigner, The unreasonable effectiveness of mathematics in the natural sciences. Com-
munications on Pure and Applied Mathematics (Richard Courant Lecture in Mathematical
Sciences Delivered at New York University, 11 May 1959)
364. D.A. Wilkenfeld, Understanding as representation manipulability. Synthese 190(6), 997–1016 (2013)
365. J.C. Willems, The behavioral approach to open and interconnected systems. IEEE Control
Syst. Mag. 27(6), 46–99 (2007)
366. T. Winograd, Understanding Natural Language (Academic Press Inc, USA, 1972)
367. L. Wittgenstein, Tractatus Logico-Philosophicus (Dover Publications, 1922)
368. J. Woehr, An interview with Donald Knuth. Dr Dobb’s J.-Softw. Tools Prof. Programm. 21(4),
16–23 (1996)
369. J. Woodward, J. Swan, S. Martin, The ‘composite’ design pattern in metaheuristics. GECCO
2014—Companion Publication of the 2014 Genetic and Evolutionary Computation Confer-
ence, July 2014
370. A.W. Yu, D. Dohan, M.-T. Luong, R. Zhao, K. Chen, M. Norouzi, Q.V. Le, QANet: combining
local convolution with global self-attention for reading comprehension, in 6th International
Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 Apr.–3
May 2018, Conference Track Proceedings (2018). https://www.OpenReview.net
371. T. Yu, G. Shevchuk, D. Sadigh, C. Finn, Unsupervised Visuomotor Control Through
Distributional Planning Networks (2019). arXiv:1902.05542
372. L.A. Zadeh, Fuzzy sets. Inf. Control 8, 338–353 (1965)
373. W. Zaremba, I. Sutskever, Learning to Execute (2014). arxiv:abs/1410.4615
374. C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals, Understanding deep learning requires
rethinking generalization, in 5th International Conference on Learning Representations, ICLR
2017, Toulon, France, 24–26 Apr. 2017, Conference Track Proceedings (2017). https://www.
OpenReview.net
375. M. Zhu, S. Gupta, To Prune, or not to Prune: Exploring the Efficacy of Pruning for Model
Compression (2017). arXiv:1710.01878
376. P. Zucker, Reverse Mode Differentiation is Kind of Like a Lens II (2018). http://www.
philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/. Accessed 11th
Dec. 2020