Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21)

Abductive Knowledge Induction From Raw Data

Wang-Zhou Dai, Stephen Muggleton


Department of Computing, Imperial College London, London, UK
{w.dai, s.muggleton}@imperial.ac.uk

Abstract

For many reasoning-heavy tasks involving raw inputs, it is challenging to design an appropriate end-to-end learning pipeline. Neuro-Symbolic Learning methods divide the process into sub-symbolic perception and symbolic reasoning, trying to utilise data-driven machine learning and knowledge-driven reasoning simultaneously. However, they suffer from the exponential computational complexity within the interface between these two components, where the sub-symbolic learning model lacks direct supervision, and the symbolic model lacks accurate input facts. Hence, most of them assume the existence of a strong symbolic knowledge base and only learn the perception model while avoiding a crucial problem: where does the knowledge come from? In this paper, we present Abductive Meta-Interpretive Learning (MetaAbd) that unites abduction and induction to learn neural networks and induce logic theories jointly from raw data. Experimental results demonstrate that MetaAbd not only outperforms the compared systems in predictive accuracy and data efficiency but also induces logic programs that can be re-used as background knowledge in subsequent learning tasks. To the best of our knowledge, MetaAbd is the first system that can jointly learn neural networks from scratch and induce recursive first-order logic theories with predicate invention.

1 Introduction

Despite the success of data-driven end-to-end deep learning in many traditional machine learning tasks, it has been shown that incorporating domain knowledge is still necessary for some complex learning problems [Dhingra et al., 2020; Grover et al., 2019; Trask et al., 2018]. In order to leverage complex domain knowledge that is discrete and relational, end-to-end learning systems need to represent it with a differentiable module that can be embedded in the deep learning context. For example, graph neural networks (GNN) use relational graphs as an external knowledge base [Zhou et al., 2018]; some works even consider more specific domain knowledge such as differentiable primitive predicates and programs [Dong et al., 2019; Gaunt et al., 2017]. However, it is hard to design a unified differentiable module to accurately represent general relational knowledge, which may contain complex inference structures such as recursion [Glasmachers, 2017; Garcez et al., 2019].

Therefore, many researchers propose to break the end-to-end learning pipeline apart, and build a hybrid model that consists of smaller modules where each of them only accounts for one specific function [Glasmachers, 2017]. A representative branch in this line of research is Neuro-Symbolic (NeSy) AI [De Raedt et al., 2020; Garcez et al., 2019], which aims to bridge System 1 and System 2 AI [Kahneman, 2011; Bengio, 2017], i.e., neural-network-based machine learning and symbolic-based relational inference.

However, the lack of supervision in the non-differentiable interface between neural and symbolic systems, based on the facts extracted from raw data and their truth values, leads to high computational complexity in learning [Li et al., 2020; Dai et al., 2019]. Consequently, almost all neural-symbolic models assume the existence of a very strong predefined domain knowledge base and cannot perform program induction. This limits the expressive power of the hybrid-structured model and sacrifices many benefits of symbolic learning (e.g., predicate invention, learning recursive theories, and re-using learned models as background knowledge).

In this paper, we integrate neural networks with Inductive Logic Programming (ILP) [Muggleton and de Raedt, 1994] to enable first-order logic theory induction from raw data. More specifically, we present Abductive Meta-Interpretive Learning (MetaAbd), which extends the Abductive Learning (ABL) framework [Dai et al., 2019; Zhou, 2019] by combining logical induction and abduction [Flach et al., 2000] with neural networks in Meta-Interpretive Learning (MIL) [Muggleton et al., 2015]. MetaAbd employs neural networks to extract probabilistic logic facts from raw data, and induces an abductive logic program [Kakas et al., 1992] that can efficiently infer the truth values of the facts to train the neural model.

To the best of our knowledge, MetaAbd is the first system that can simultaneously (1) train neural models from scratch, (2) learn recursive logic theories and (3) perform predicate invention from domains with sub-symbolic representation. In the experiments we compare MetaAbd to state-of-the-art end-to-end deep learning models and neuro-symbolic methods on two complex learning tasks. The results
show that, given the same amount of background knowledge, MetaAbd outperforms the compared models significantly in terms of predictive accuracy and data efficiency, and learns human-interpretable models that can be re-used in subsequent learning tasks.

2 Related Work

Solving “System 2” problems requires the ability of relational and logical reasoning [Kahneman, 2011; Bengio, 2017]. Due to its complexity, many researchers have tried to embed intricate background knowledge in end-to-end deep learning models. For example, [Trask et al., 2018] propose the differentiable Neural Arithmetic Logic Units (NALU) to model basic arithmetic functions (e.g., addition, multiplication, etc.) in neural cells; [Grover et al., 2019] encode permutation operators with a stochastic matrix and present a differentiable approximation to the sort operation; [Wang et al., 2019] introduce a differentiable SAT solver to enable gradient-based constraint solving. However, most of these specially designed differentiable modules are ad hoc approximations to the original symbolic inference mechanisms.

To exploit the complex background knowledge expressed by formal languages directly, Statistical Relational (StarAI) and Neural Symbolic (NeSy) AI [De Raedt et al., 2020; Garcez et al., 2019] try to use probabilistic inference or other differentiable functions to approximate logical inference [Cohen et al., 2020; Dong et al., 2019; Manhaeve et al., 2018; Donadello et al., 2017]. However, they require a pre-defined symbolic knowledge base and only train the attached neural/probabilistic models due to the highly complex interface between the neural and symbolic modules.

One way to learn symbolic theories is to use Inductive Logic Programming [Muggleton and de Raedt, 1994]. Some early work on combining logical abduction and induction can learn logic theories even when input data is incomplete [Flach et al., 2000]. Recently, ∂ILP was proposed for learning first-order logic theories from noisy data [Evans and Grefenstette, 2018]. However, ILP-based works are designed for learning in symbolic domains. Otherwise, they need to use a fully trained neural model to make sense of the raw inputs.

Machine apperception [Evans et al., 2021] unifies Answer Set Programming with perception by modelling it with binary neural networks. It can learn recursive logic theories and perform concept (monadic predicate) invention. However, both logic hypotheses and the parameters of neural networks are represented by logical groundings, making the system very hard to optimise. For problems involving noisy inputs like MNIST images, it still requires a fully pre-trained perceptual neural net to extract the symbols, like ILP-based methods.

Previous work on Abductive Learning (ABL) [Dai et al., 2019; Dai and Zhou, 2017] also unites sub-symbolic perception and symbolic reasoning through abduction, but it needs a pre-defined knowledge base to enable abduction and cannot perform program induction. Our presented Abductive Meta-Interpretive Learning takes a step further: it not only learns a perception model that can make sense of raw data, but also learns logic programs and performs predicate invention to understand the underlying relations in the task.

3 Abductive Meta-Interpretive Learning

3.1 Problem Formulation

A typical model bridging sub-symbolic and symbolic learning contains two major parts: a perception model and a reasoning model [Dai et al., 2019]. The perception model maps sub-symbolic inputs x ∈ X to some primitive symbols z ∈ Z, such as digits, objects, ground logical expressions, etc. The reasoning model takes the interpreted z as input and infers the final output y ∈ Y according to a symbolic knowledge base B. Because the primitive symbols z are uncertain and observable from neither the training data nor the knowledge base, we call them pseudo-labels of x.

The perception model is parameterised with θ and outputs the conditional probability Pθ(z|x) = P(z|x, θ); the reasoning model H ∈ ℋ is a set of first-order logical clauses such that B ∪ H ∪ z |= y, where “|=” means “logically entails”. Our target is to learn θ and H simultaneously from training data D = {⟨x_i, y_i⟩}_{i=1}^{n}. For example, if we have one example with x = [□, □, □] (three digit images) and y = 6, given background knowledge about adding two numbers, the hybrid model should learn a perception model that recognises z = [1, 2, 3] and induce a program to add each number in z recursively.

Assuming that D is an i.i.d. sample from the underlying distribution of (x, y), our objective can be represented as

    (H^*, \theta^*) = \arg\max_{H, \theta} \prod_{\langle x, y \rangle \in D} \sum_{z \in Z} P(y, z \mid B, x, H, \theta),    (1)

where pseudo-label z is a hidden variable. Theoretically, this problem can be solved by the Expectation Maximisation (EM) algorithm. However, the symbolic hypothesis H, a first-order logic theory, is difficult to optimise together with the parameter θ, which has a continuous hypothesis space.

We propose to solve this problem by treating H like z as an extra hidden variable, which gives us:

    \theta^* = \arg\max_{\theta} \prod_{\langle x, y \rangle \in D} \sum_{H \in \mathcal{H}} \sum_{z \in Z} P(y, H, z \mid B, x, \theta).    (2)

Now, the learning problem can be split into two EM steps: (1) Expectation: obtaining the expected value of H and z by sampling them in their discrete hypothesis space from (H, z) ∼ P(H, z|B, x, y, θ); (2) Maximisation: estimating θ by maximising the likelihood of training data with numerical optimisation approaches such as gradient descent.

Challenges. The main challenge is to estimate the expectation of the hidden variables H ∪ z, i.e., we need to search for the most probable H and z given the θ learned in the previous iteration. This is not trivial. Even when B is sound and complete, estimating the truth values of the hidden variable z results in a search space growing exponentially with the number of training examples, which is verified in our experiments with DeepProblog [Manhaeve et al., 2018] in Section 4.1.

Furthermore, H and z are entangled, which makes the task even harder. For example, given x = [□, □, □] (three digit images) and y = 6, when the perception model correctly recognises z = [1, 2, 3], we have at least two choices for H: cumulative sum or cumulative product. When the perception model is under-trained and outputs z = [2, 2, 3], then H could be a program that only multiplies the last two digits.
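The entanglement between H and z described in the Challenges paragraph can be illustrated with a small brute-force enumeration. The following Python sketch is only an illustration, not part of MetaAbd: the three candidate programs and their names are hypothetical stand-ins for hypotheses H, and pseudo-labels are assumed to be sequences of three digits.

```python
from itertools import product
from math import prod

# Hypothetical candidate programs standing in for hypotheses H.
# The names and the choice of three candidates are illustrative assumptions.
HYPOTHESES = {
    "cumulative_sum": lambda z: sum(z),
    "cumulative_product": lambda z: prod(z),
    "product_of_last_two": lambda z: z[-2] * z[-1],
}

def consistent_pairs(y, length=3, digits=range(10)):
    """Enumerate every (hypothesis name, z) pair whose output equals y."""
    pairs = []
    for z in product(digits, repeat=length):
        for name, program in HYPOTHESES.items():
            if program(z) == y:
                pairs.append((name, z))
    return pairs

pairs = consistent_pairs(6)

# Group the hypotheses by the pseudo-labels they can explain.
explanations = {}
for name, z in pairs:
    explanations.setdefault(z, []).append(name)

print(explanations[(1, 2, 3)])  # several programs explain the correct labels
print(explanations[(2, 2, 3)])  # a wrong labelling is still explained by one program
```

Even in this toy setting, the label y = 6 alone does not determine which (H, z) pair is intended, which is why the joint search needs additional guidance such as the perception model's probabilities.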

[Figure 1 panels; □ denotes an input digit image]

Example (⟨x, y⟩):
    f([□, □, □], 15).

Abducible primitives (B):
    add([A,B|T], [C|T]) :- C #= A+B.
    mult([A,B|T], [C|T]) :- C #= A*B.
    eq([A|_], B) :- A #= B.
    head([H|_], H).
    tail([_|T], T).

Neural probabilistic facts (pθ(z|x)):
    nn(□ = 0, 0.02).  nn(□ = 1, 0.39).  ...
    nn(□ = 0, 0.09).  nn(□ = 1, 0.02).  ...
    nn(□ = 0, 0.07).  nn(□ = 1, 0.00).  ...

Pseudo-labels (z):
    [0,0,0], ..., [3,5,0], ..., [0,3,5], ..., [0,5,3], ..., [1,3,5], ..., [7,8,0], ..., [7,8,1], ..., [7,3,5], ...

Abduced facts:
    □ + □ #= 15.
    □ * □ #= 15.
    □ + □ #= X.  X + □ #= 15.
    □ + □ #= X.  X * □ #= 15.
    □ * □ #= X.  X * □ #= 15.

Abductive hypotheses (H):
    f(A,B) :- add(A,B).
    f(A,B) :- mult(A,B).
    f(A,B) :- add(A,C), eq(C,B).
    f(A,B) :- add(A,C), f(C,B).        f(A,B) :- eq(A,B).
    f(A,B) :- tail(A,C), f_1(C,B).     f_1(A,B) :- mult(A,C), eq(C,B).
    f(A,B) :- mult(A,C), f_1(C,B).     f_1(A,B) :- mult(A,C), eq(C,B).

Figure 1: Example of MetaAbd’s abduction-induction learning. Given training examples, background knowledge of abducible primitives and probabilistic facts generated by a perceptual neural net, MetaAbd learns an abductive logic program H and abduces relational facts as constraints (implemented with the CLP(Z) predicate “#=”¹) over the input images; it then uses them to efficiently prune the search space of the most probable pseudo-labels z (in grey blocks) for training the neural network.
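The pruning effect summarised in the Figure 1 caption can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the CLP(Z)-based implementation: the probability tables in `NN_PROBS` are made up, the two constraints stand in for facts abduced by a cumulative-sum and a cumulative-product hypothesis, and the per-image probabilities are assumed to be independent.

```python
from itertools import product
from math import prod

def table(peaks, floor=0.01):
    """Build a normalised distribution over digits 0-9 from a few peaked values."""
    raw = [peaks.get(d, floor) for d in range(10)]
    total = sum(raw)
    return [v / total for v in raw]

# Hypothetical perception output for three images (not the values in Fig. 1).
NN_PROBS = [
    table({1: 0.39, 7: 0.30}),  # first image: unsure between 1 and 7
    table({3: 0.80}),           # second image: confident it is 3
    table({5: 0.85}),           # third image: confident it is 5
]

def likelihood(z):
    """Probability of pseudo-labels z, assuming independent per-image outputs."""
    return prod(p[d] for p, d in zip(NN_PROBS, z))

def abduce(constraint, y, length=3):
    """Keep only the pseudo-labels z that satisfy the abduced constraint."""
    return [z for z in product(range(10), repeat=length) if constraint(z, y)]

def sum_constraint(z, y):
    return sum(z) == y

def product_constraint(z, y):
    return prod(z) == y

sum_candidates = abduce(sum_constraint, 15)
product_candidates = abduce(product_constraint, 15)

print(len(sum_candidates), len(product_candidates))  # far fewer than 10**3
print(max(sum_candidates, key=likelihood))
print(max(product_candidates, key=likelihood))
```

The abduced constraint removes most of the 1000 possible labellings before any probability is consulted; the perception model's output is then only needed to rank the survivors, and different hypotheses lead to different most probable pseudo-labels.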

3.2 Probabilistic Abduction-Induction Reasoning

Inspired by early works in abductive logic programming [Flach et al., 2000], we propose to solve the challenges above by combining logical induction and abduction. The induction learns an abductive logic theory H based on Pθ(z|x); the abduction made by H reduces the search space of z.

Abductive reasoning, or abduction, refers to the process of selectively inferring specific grounded facts and hypotheses that give the best explanation to observations based on background knowledge of a deductive theory.

Definition 3.1 (Abducible primitive) An abducible primitive is a predicate that defines the explanatory grounding facts in abductive reasoning.

Definition 3.2 (Abductive hypothesis) An abductive hypothesis is a set of first-order logic clauses whose body contains literals of abducible primitives.

The following is an example of using an abductive hypothesis and abducible primitives in problem-solving:

Example 1 Observing raw inputs x = [□, □, □] (three digit images) and a symbolic output y = 6, we could formulate an abductive hypothesis H that is a recursive cumulative sum function, whose abducible primitives are “+” and “=”. Hence, H will abduce a set of explanatory ground facts {□ + □ = Z, Z + □ = 6}. Based on these facts, we could infer that none of the digits in x is greater than 6. Furthermore, if the current perception model assigns very high probabilities to two of the images being 2 and 3, we could easily infer that the remaining image is 1 even when the perception model has relatively low confidence about it.

An illustrative example of combining abduction and induction with probabilities is shown in Fig. 1. Briefly speaking, instead of directly sampling pseudo-labels z and H together from the huge hypothesis space, MetaAbd induces an abductive hypothesis H consisting of abducible primitives, and then uses the abduced facts to prune the search space of z. Meanwhile, the perception model outputs the likelihood of pseudo-labels with pθ(z|x), defining a distribution over all possible values of z, and helps to find the most probable H ∪ z.

Formally, we re-write the likelihood of each ⟨x, y⟩ in Eq. 2:

    P(y, H, z \mid B, x, \theta) = P(y, H \mid B, z) \, P_\theta(z \mid x)
                                 = P(y \mid B, H, z) \, P(H \mid B, z) \, P_\theta(z \mid x)
                                 = P(y \mid B, H, z) \, P_{\sigma^*}(H \mid B) \, P_\theta(z \mid x),    (3)

where Pσ*(H|B) is the Bayesian prior distribution on first-order logic hypotheses, which is defined by the transitive closure of stochastic refinements σ* given the background knowledge B [Muggleton et al., 2013], where a refinement σ is a unit modification (e.g., adding/removing a clause or literal) to a logic theory. The equations hold because: (1) pseudo-label z is conditioned on x and θ since it is the output of the perception model; (2) H follows the prior distribution so it only depends on B; (3) y ∪ H is independent from x given z because the relations among B, H, y and z are determined by pure logical inference, where:

    P(y \mid B, H, z) = \begin{cases} 1, & \text{if } B \cup H \cup z \models y, \\ 0, & \text{otherwise.} \end{cases}    (4)

Following Bayes’ rule we have P(H, z|B, x, y, θ) ∝ P(y, H, z|B, x, θ). Now we can search for the most probable H ∪ z in the expectation step according to Eq. 3 as follows:

1. Induce an abductive theory H ∼ Pσ*(H|B);
2. Use H ∪ B and y to abduce² possible pseudo-labels z, which are guaranteed to satisfy H ∪ B ∪ z ⊢ y and exclude the values of z such that P(y|B, H, z) = 0;
3. According to Eq. 3 and 4, score each sampled H ∪ z:

    \mathrm{score}(H, z) = P_{\sigma^*}(H \mid B) \, P_\theta(z \mid x)    (5)

4. Return the H ∪ z with the highest score.

    Abductive Meta-Interpreter
    prove([], Prog, Prog, [], Prob, Prob).
    prove([Atom|As], Prog1, Prog1, Abds, Prob1, Prob2) :-
        deduce(Atom),
        prove(As, Prog1, Prog2, Abds, Prob1, Prob2).
    prove([Atom|As], Prog1, Prog1, Abds, Prob1, Prob2) :-
        call_abducible(Atom, Abd, Prob),
        Prob3 is Prob1 * Prob,
        get_max_prob(Max), Prob3 > Max,
        set_max_prob(Prob3),
        prove(As, Prog1, Prog1, [Abd|Abds], Prob3, Prob2).
    prove([Atom|As], Prog1, Prog2, Abds, Prob1, Prob2) :-
        meta-rule(Name, MetaSub, (Atom :- Body), Order),
        Order,
        substitue(metasub(Name, MetaSub), Prog1, Prog3),
        prove(Body, Prog3, Prog4),
        prove(As, Prog4, Prog2, Abds, Prob1, Prob2).

Figure 2: Prolog code for MetaAbd.

3.3 The MetaAbd Implementation

Meta-Interpretive Learning [Muggleton et al., 2015] is a form of ILP [Muggleton and de Raedt, 1994]. It learns first-order logic programs with a second-order meta-interpreter, which consists of a definite first-order background knowledge B and meta-rules M. B contains the primitive predicates for constructing first-order hypotheses H; M is a set of second-order clauses with existentially quantified predicate variables and universally quantified first-order variables. In short, MIL attempts to prove the training examples and saves the resulting programs for successful proofs.

MetaAbd extends the general meta-interpreter of MIL by including an abduction procedure (bold fonts in Fig. 2) that can abduce groundings (e.g., specific constraints on pseudo-labels z). As shown in Fig. 2, it recursively proves a series of atomic goals by deduction (deduce/1), abducing explanatory facts (call_abducible/3) or generating a new clause from meta-rule/4.

The last argument of call_abducible/3, Prob = Pθ(z|x), describes the distribution of possible worlds collected from the raw inputs. It helps prune the search space of the abductive hypothesis H. During the iterative refinement of H, MetaAbd greedily aborts its current prove/6 procedure once it has a lower probability than the best abduction so far (the 8th line in Fig. 2).

After an abductive hypothesis H has been constructed, the search for z will be done by logical abduction. Finally, the score of H ∪ z will be calculated by Eq. 5, where Pθ(z|x) is the output of the perception model, which in this work is implemented with a neural network ϕθ that outputs:

    P_\theta(z \mid x) = \mathrm{softmax}(\phi_\theta(x, z)).

Meanwhile, we define the prior distribution on H by following [Hocquette and Muggleton, 2018]:

    P_{\sigma^*}(H \mid B) = \frac{6}{(\pi \cdot c(H))^2},

where c(H) is the complexity of H, e.g., its size.

4 Experiments

This section describes the experiments of learning recursive arithmetic and sorting algorithms from images of handwritten digits³, aiming to address the following questions: (1) Can MetaAbd learn first-order logic programs and train perceptual neural networks jointly? (2) Given the same or a smaller amount of domain knowledge, is hybrid modelling, which directly leverages the background knowledge in symbolic form, better than end-to-end learning?

4.1 Cumulative Sum and Product From Images

Materials. We follow the settings in [Trask et al., 2018]. The inputs of the two tasks are sequences of randomly chosen MNIST digits; the numerical outputs are the sum and product of the digits, respectively. The lengths of training sequences are 2–5. To verify if the learned models can extrapolate to longer inputs, the length of test examples ranges from 5 to 100. Note that for cumulative product, the extrapolation examples have a maximum length of 15 and only contain digits from 1 to 9. The dataset contains 3000 and 1000 examples for training and validation, respectively; the test data of each length has 10,000 examples.

Methods. We compare MetaAbd with the following state-of-the-art baselines: end-to-end models include RNN, LSTM and LSTMs attached to Neural Accumulators (NAC) and Neural Arithmetic Logic Units (NALU) [Trask et al., 2018]; the NeSy system is DeepProblog [Manhaeve et al., 2018]⁴.

A convnet processes the input images to the recurrent networks and Problog programs, as [Trask et al., 2018] and [Manhaeve et al., 2018] described; it also serves as the perception model of MetaAbd to output the probabilistic facts. As shown in Tab. 1, NAC, NALU and MetaAbd are aware of the same amount of background knowledge for learning both perceptual convnet and recursive arithmetic algorithms jointly, while DeepProblog is provided with the ground-truth program and only trains the perceptual convnet.

Each experiment is carried out five times, and the average of the results is reported. The performance is measured by classification accuracy (Acc.) on length-one inputs, mean absolute error (MAE) in sum tasks, and mean absolute error on the logarithm (log MAE) of the outputs in product tasks, whose error grows exponentially with sequence length.

Results. Our experimental results are shown in Tab. 2; the learned first-order logic theories are shown in Fig. 3a. The end-to-end models that do not exploit any background knowledge (LSTM and RNN) perform worst. NALU and NAC are slightly better because they include neural cells with arithmetic modules, but the end-to-end learning pipeline based on embeddings results in low sample-efficiency. DeepProblog does not finish the training on the cumulative sum task and the test on the cumulative product task within 72 hours because the recursive programs result in a huge groundings space for its maximum a posteriori (MAP) estimation.

¹ CLP(Z) is a constraint logic programming package accessible at https://github.com/triska/clpz. More implementation details in online version: https://arxiv.org/abs/2010.03514.
² It can be parallelled, see https://arxiv.org/abs/2010.03514.
³ Code & data: https://github.com/AbductiveLearning/Meta_Abd
⁴ We use NAC/NALU at https://github.com/kevinzakka/NALU-pytorch; DeepProblog at https://bitbucket.org/problog/deepproblog
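In contrast to grounding a whole program, the expectation step of Section 3.2 only has to score the (H, z) pairs that survive abduction. The scoring rule of Eq. 5, with the softmax output and the prior of Section 3.3, can be sketched numerically as follows. This is a minimal Python sketch under stated assumptions: the logits, the candidate list and the complexity values are hypothetical, and Pθ(z|x) is assumed to factorise over images.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prior(complexity):
    """Prior 6 / (pi * c(H))^2; complexity c(H) is a positive integer."""
    return 6.0 / (math.pi * complexity) ** 2

def score(complexity, digit_probs, z):
    """Eq. 5: prior on H times the perception likelihood of pseudo-labels z.

    Assumption: images are treated independently, so the likelihood of z
    is the product of the per-image digit probabilities.
    """
    likelihood = math.prod(p[d] for p, d in zip(digit_probs, z))
    return prior(complexity) * likelihood

# Hypothetical network outputs (logits over digits 0-9) for three images.
LOGITS = [
    [0.0, 1.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # unsure: 1 or 2
    [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # confident: 2
    [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # confident: 3
]
DIGIT_PROBS = [softmax(row) for row in LOGITS]

# Hypothetical (name, complexity, abduced z) triples that all explain y = 6.
CANDIDATES = [
    ("cumulative_sum", 2, (1, 2, 3)),
    ("product_of_last_two", 3, (2, 2, 3)),
]

best = max(CANDIDATES, key=lambda c: score(c[1], DIGIT_PROBS, c[2]))
print(best[0])
```

The prior sums to one over complexities 1, 2, 3, ... because the sum of 1/n² equals π²/6, so smaller programs are preferred unless the perception model strongly favours the labels abduced by a larger one.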

Domain Knowledge | End-to-end Models | Neuro-Symbolic Models | MetaAbd
Recurrence | LSTM & RNN | Problog’s list operations | Prolog’s list operations
Arithmetic functions | NAC & NALU [Trask et al., 2018] | Full program of accumulative sum/product | Predicates add, mult and eq
Sequence & Ordering | Permutation matrix P_sort [Grover et al., 2019] | Predicates “>”, “=” and “<” [Dong et al., 2019] | Prolog’s permutation
Sorting | sort operator [Grover et al., 2019] | swap(i,j) operator [Dong et al., 2019] | Predicate s (learned from sub-task)

Table 1: Domain knowledge used by the compared models.

Model | Sum Acc. (length 1) | Sum MAE (length 5) | Sum MAE (length 10) | Sum MAE (length 100) | Product Acc. (length 1) | Product log MAE (length 5) | Product log MAE (length 10) | Product log MAE (length 15)
LSTM | 9.80% | 15.3008 | 44.3082 | 449.8304 | 9.80% | 11.1037 | 19.5594 | 21.6346
RNN-Relu | 10.32% | 12.3664 | 41.4368 | 446.9737 | 9.80% | 10.7635 | 19.8029 | 21.8928
DeepProblog | Training timeout (72 hours) | Training timeout (72 hours) | Training timeout (72 hours) | Training timeout (72 hours) | 93.64% | Test timeout (72 hours) | Test timeout (72 hours) | Test timeout (72 hours)
LSTM-NAC | 7.02% | 6.0531 | 29.8749 | 435.4106 | 0.00% | 9.6164 | 20.9943 | 17.9787
LSTM-NAC_{10k} | 8.85% | 1.9013 | 21.4870 | 424.2194 | 10.50% | 9.3785 | 20.8712 | 17.2158
LSTM-NALU | 0.00% | 6.2233 | 32.7772 | 438.3457 | 0.00% | 9.6154 | 20.9961 | 17.9487
LSTM-NALU_{10k} | 0.00% | 6.1041 | 31.2402 | 436.8040 | 0.00% | 8.9741 | 20.9966 | 18.0257
MetaAbd | 95.27% | 0.5100 | 1.2994 | 6.5867 | 97.73% | 0.3340 | 0.4951 | 2.3735
LSTM-NAC_{1-shot CNN} | 49.83% | 0.8737 | 21.1724 | 426.0690 | 0.00% | 6.0190 | 13.4729 | 17.9787
LSTM-NALU_{1-shot CNN} | 0.00% | 6.0070 | 30.2110 | 435.7494 | 0.00% | 9.6176 | 20.9298 | 18.1792
MetaAbd_{+1-shot CNN} | 98.11% | 0.2610 | 0.6813 | 4.7090 | 97.94% | 0.3492 | 0.4920 | 2.4521

Table 2: Accuracy on the MNIST cumulative sum/product tasks.
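For readers who want to reproduce the error columns of Table 2, the two error measures can be sketched as below. This is a hedged Python sketch: the paper does not spell out the exact formula for log MAE, so the definition used here (mean absolute difference of natural logarithms, valid only for positive outputs) is an assumption, as are the function names.

```python
import math

def mae(predictions, targets):
    """Mean absolute error between predicted and true outputs."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

def log_mae(predictions, targets):
    """Mean absolute error between the logarithms of the outputs.

    Assumption: natural logarithm, and all values are strictly positive
    (the product test sequences only contain digits from 1 to 9).
    """
    return sum(
        abs(math.log(p) - math.log(t)) for p, t in zip(predictions, targets)
    ) / len(targets)

print(mae([6, 15], [6, 13]))
print(log_mae([8.0, 2.0], [2.0, 2.0]))
```

Working in log space keeps the product-task error on a comparable scale across sequence lengths, since the raw products grow exponentially with the number of digits.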

Figure 3: Learned programs and the time efficiency of MetaAbd. Panels: (a) Learned programs; (b) Abduction time.

The EM-based learning of MetaAbd may be trapped in local optima, which happens more frequently in cumulative sum than in cumulative product since its distribution P(H, z|B, x, y, θ) is much denser. Therefore, we also carry out experiments with one-shot pre-trained convnets, which are trained by randomly sampling one example in each class from MNIST data. Although the pre-trained convnet is weak at the start (Acc. 20%∼35%), it provides a good initialisation and significantly improves the learning performance.

Fig. 3b compares the time efficiency between ILP’s induction and MetaAbd’s abduction-induction in one EM iteration of learning cumulative sum. “z → H” means first sampling z and then inducing H with ILP; “H → z” means first sampling an abductive hypothesis H and then using H to abduce z. The x-axis denotes the average number of Prolog inferences; the number at the end of each bar is the average inference time in seconds. Evidently, the abduction leads to a substantial improvement in the number of Prolog inferences and significantly reduces the complexity of searching pseudo-labels.

4.2 Bogosort From Images

Materials. We follow the settings in [Grover et al., 2019]. The input of this task is a sequence of randomly chosen MNIST images of distinct numbers; the output is the correct ranking (from large to small) of the digits. For example, when x = [□, □, □, □, □] (five digit images), the output should be y = [3, 1, 4, 5, 2] because the ground-truth labels are z* = [5, 9, 4, 3, 8]. The dataset contains 3000 training/test and 1000 validation examples. The training examples are sequences of length 5, and we test the learned models on image sequences with lengths 3, 5 and 7.

Methods. We compare MetaAbd to an end-to-end model, NeuralSort [Grover et al., 2019], and a state-of-the-art NeSy approach, Neural Logic Machines (NLM) [Dong et al., 2019]⁵. All experiments are repeated five times.

NeuralSort can be regarded as a differentiable approximation to bogosort (permutation sort). Given an input list of scalars, it generates a stochastic permutation matrix by applying the pre-defined deterministic or stochastic sort operator on the inputs. NLM can learn sorting through reinforcement learning in a domain whose states are described by vectors of relational features (groundings of dyadic predicates “>”, “==”, “<”) and action “swap”. However, the original NLM only takes symbolic inputs⁶, which provide a noise-free relational feature vector. In our experiments, we attach NLM to the same convnet as the other methods to process images.

⁵ We use NeuralSort at https://github.com/ermongroup/neuralsort; NLM at https://github.com/google/neural-logic-machines.
⁶ Please see https://github.com/google/neural-logic-machines/blob/master/scripts/graph/learn_policy.py

Model | Sequence length 3 | Sequence length 5 | Sequence length 7
Neural Logic Machine (NLM) | 17.97% (34.38%) | 1.03% (20.27%) | 0.01% (14.90%)
Deterministic NeuralSort | 95.49% (96.82%) | 88.26% (94.32%) | 80.51% (92.38%)
Stochastic NeuralSort | 95.37% (96.74%) | 87.46% (94.03%) | 78.50% (91.85%)
MetaAbd | 96.33% (97.22%) | 91.75% (95.24%) | 87.42% (93.58%)

Table 3: Accuracy of MNIST sort. First value is the rate of correct permutations; second value is the rate of correct individual element ranks.
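The two values reported in each cell of Table 3 can be computed as in the following Python sketch. The function names are hypothetical, and the element-level rate is assumed to be pooled over all positions of all sequences, which the caption does not state explicitly.

```python
def permutation_accuracy(predicted, target):
    """Fraction of sequences whose whole predicted ranking is correct."""
    correct = sum(1 for p, t in zip(predicted, target) if p == t)
    return correct / len(target)

def element_accuracy(predicted, target):
    """Fraction of individual positions whose predicted rank is correct.

    Assumption: positions are pooled over all sequences.
    """
    hits = 0
    total = 0
    for p, t in zip(predicted, target):
        hits += sum(1 for a, b in zip(p, t) if a == b)
        total += len(t)
    return hits / total

predicted = [[3, 1, 4, 5, 2], [1, 3, 2, 4, 5]]
target = [[3, 1, 4, 5, 2], [1, 2, 3, 4, 5]]
print(permutation_accuracy(predicted, target))
print(element_accuracy(predicted, target))
```

The second measure is always at least as large as the first, since a fully correct permutation contributes all of its positions, which matches the pattern of the values in the table.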

We also compared to DeepProblog with the ground-truth pro- sorting. This experiment also demonstrates M etaAbd ’s abil-
gram of sorting in this task, but it does not terminate when the ity of learning recursive logic programs and predicate inven-
neural predicate “swap net”7 is implemented to take noisy tion (the invented predicate s 1 in Fig. 3a).
image inputs by the aforementioned convnet. Therefore, we
do not display its performance in this task. 5 Conclusion
For M etaAbd , it is easy to include stronger back- In this paper, we present the Abductive Meta-Interpretive
Learning (MetaAbd) approach that can simultaneously train neural networks and learn recursive first-order logic theories with predicate invention. By combining ILP with neural networks, MetaAbd can learn human-interpretable logic programs directly from raw data, and the learned neural models and logic theories can be directly re-used in subsequent learning tasks. MetaAbd adopts a general framework for combining perception with logical induction and abduction. The perception model extracts probabilistic facts from sub-symbolic data; the logical induction searches for first-order abductive theories in a relatively small hypothesis space; the logical abduction uses the abductive theory to prune the vast search space of the truth values of the probabilistic facts. The three parts are optimised together in a probabilistic model.

In future work, we would like to apply MetaAbd to real tasks such as computational science discovery, which is a typical abductive process that involves both symbolic domain knowledge and continuous/noisy raw data. Since MetaAbd uses pure logical inference for reasoning, it is possible to leverage more advanced symbolic inference/optimisation techniques like Satisfiability Modulo Theories (SMT) [Barrett and Tinelli, 2018] and Answer Set Programming (ASP) [Lifschitz, 2019] to reason more efficiently.

ground knowledge for learning more efficient sorting algorithms [Cropper and Muggleton, 2019]. But in order to make a fair comparison with NeuralSort, we adapt the same background knowledge to a logic program and let MetaAbd learn bogosort. The knowledge of permutation in MetaAbd is implemented with Prolog’s built-in predicate permutation. Meanwhile, instead of providing the information about sorting as prior knowledge as NeuralSort does, we try to learn the concept of “sorted” (represented by a monadic predicate s) from data as a sub-task, whose training set is the subset of sorted examples within the training dataset (< 20 examples). The two tasks are trained sequentially as a curriculum. MetaAbd learns the sub-task in the first five epochs and then re-uses the learned models to learn bogosort.

MetaAbd uses an MLP attached to the same untrained convnet as the other models to produce dyadic probabilistic facts nn_pred([ , | ]), which learns whether the first two items in the image sequence satisfy a dyadic relation. Unlike NLM, the background knowledge of MetaAbd is agnostic to ordering, i.e., the dyadic nn_pred is not provided with supervision on whether it should learn “greater than” or “less than”, so nn_pred only learns an unknown dyadic partial order among MNIST images. As we can see, the background knowledge used by MetaAbd is much weaker than that of the other methods.
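To make the setup concrete, the following Python sketch shows how bogosort can be composed from a permutation generator and a "sorted" test that relies only on an unspecified dyadic relation. It is an illustration under stated assumptions, not the program learned in the paper (that one is shown in Fig. 3a): plain integers stand in for MNIST images, the two comparison functions are hypothetical stand-ins for a trained nn_pred network, and nn_pred is simplified here to take two items rather than a list.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def s(xs: Sequence[T], nn_pred: Callable[[T, T], bool]) -> bool:
    """'Sorted' concept: every adjacent pair satisfies the dyadic relation."""
    return all(nn_pred(a, b) for a, b in zip(xs, xs[1:]))


def bogosort(xs: Sequence[T], nn_pred: Callable[[T, T], bool]) -> Optional[Tuple[T, ...]]:
    """Return the first permutation of xs that passes the 'sorted' test."""
    for candidate in permutations(xs):
        if s(candidate, nn_pred):
            return candidate
    return None


# Hypothetical stand-ins for the learned relation. The sorting program itself
# does not know which direction the relation encodes.
def ascending(a: int, b: int) -> bool:
    return a <= b


def descending(a: int, b: int) -> bool:
    return a >= b


print(bogosort([3, 1, 2], ascending))   # (1, 2, 3)
print(bogosort([3, 1, 2], descending))  # (3, 2, 1)
```

The same bogosort definition yields either order depending on the relation plugged in, which mirrors the point that the background knowledge is agnostic to ordering and the direction has to be picked up from the data.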
Results. Tab. 3 shows the average accuracy of the compared methods in the sorting tasks; Fig. 3a shows the programs learned by MetaAbd. The performance is measured by the average proportion of correct permutations and individual permutations following [Grover et al., 2019]. Although it uses weaker background knowledge, MetaAbd performs significantly better than NeuralSort. Due to the high sample complexity of reinforcement learning, NLM failed to learn any valid perceptual model and sorting algorithm (success trajectory rate 0.0% during training).

The learned program of s and the dyadic neural net nn_pred are both successfully re-used in the sorting task, where the learned program of s is consulted as interpreted background knowledge [Cropper et al., 2020], and the neural network that generates probabilistic facts of nn_pred is directly re-used and continuously trained during the learning of

7 Please see https://bitbucket.org/problog/deepproblog/src/master/examples/NIPS/Forth/Sort/quicksort.pl

Acknowledgements
The first author acknowledges financial support from the UK’s EPSRC Robot Synthetic Biologist, grant EP/R034915/1. The second author acknowledges support from the UK’s EPSRC Human-Like Computing Network, grant EP/R022291/1, for which he acts as director. The authors thank Céline Hocquette, Stassa Patsantzis and Ai Lun for their careful proofreading and helpful comments.

References
[Barrett and Tinelli, 2018] Clark W. Barrett and Cesare Tinelli. Satisfiability modulo theories. In Handbook of Model Checking, pages 305–343. Springer, 2018.
[Bengio, 2017] Yoshua Bengio. The consciousness prior. CoRR, abs/1709.08568, 2017.
[Cohen et al., 2020] William W. Cohen, Fan Yang, and Kathryn Mazaitis. TensorLog: A probabilistic database implemented using deep-learning infrastructure. Journal of Artificial Intelligence Research, 67:285–325, 2020.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21)

[Cropper and Muggleton, 2019] Andrew Cropper and Stephen H. Muggleton. Learning efficient logic programs. Machine Learning, 108(7):1063–1083, 2019.
[Cropper et al., 2020] Andrew Cropper, Rolf Morel, and Stephen Muggleton. Learning higher-order logic programs. Machine Learning, 109(7):1289–1322, 2020.
[Dai and Zhou, 2017] Wang-Zhou Dai and Zhi-Hua Zhou. Combining logical abduction and statistical induction: Discovering written primitives with human knowledge. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 4392–4398, San Francisco, CA, 2017.
[Dai et al., 2019] Wang-Zhou Dai, Qiu-Ling Xu, Yang Yu, and Zhi-Hua Zhou. Bridging machine learning and logical reasoning by abductive learning. In Advances in Neural Information Processing Systems 32, pages 2811–2822. Curran Associates, Inc., 2019.
[De Raedt et al., 2020] Luc De Raedt, Sebastijan Dumančić, Robin Manhaeve, and Giuseppe Marra. From statistical relational to neuro-symbolic artificial intelligence. In Christian Bessiere, editor, Proceedings of the 29th International Joint Conference on Artificial Intelligence, pages 4943–4950. IJCAI, 2020.
[Dhingra et al., 2020] Bhuwan Dhingra, Manzil Zaheer, Vidhisha Balachandran, Graham Neubig, Ruslan Salakhutdinov, and William W. Cohen. Differentiable reasoning over a virtual knowledge base. In International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020. OpenReview.
[Donadello et al., 2017] Ivan Donadello, Luciano Serafini, and Artur S. d’Avila Garcez. Logic tensor networks for semantic image interpretation. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 1596–1602, Melbourne, Australia, 2017. IJCAI.
[Dong et al., 2019] Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, and Denny Zhou. Neural logic machines. In International Conference on Learning Representations, New Orleans, LA, 2019. OpenReview.
[Evans and Grefenstette, 2018] Richard Evans and Edward Grefenstette. Learning explanatory rules from noisy data. Journal of Artificial Intelligence Research, 61:1–64, 2018.
[Evans et al., 2021] Richard Evans, Matko Bošnjak, Lars Buesing, Kevin Ellis, David Pfau, Pushmeet Kohli, and Marek J. Sergot. Making sense of raw input. Artificial Intelligence, 299:103521, 2021.
[Flach et al., 2000] Peter A. Flach, Antonis C. Kakas, and Antonis M. Hadjiantonis, editors. Abduction and Induction: Essays on Their Relation and Integration. Applied Logic Series. Springer Netherlands, 2000.
[Garcez et al., 2019] Artur S. d’Avila Garcez, Marco Gori, Luís C. Lamb, Luciano Serafini, Michael Spranger, and Son N. Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. IfCoLog Journal of Logics and their Applications, 6(4):611–632, 2019.
[Gaunt et al., 2017] Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, and Daniel Tarlow. Differentiable programs with neural libraries. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 1213–1222, Sydney, Australia, 2017. PMLR.
[Glasmachers, 2017] Tobias Glasmachers. Limits of end-to-end learning. In Proceedings of The 9th Asian Conference on Machine Learning, volume 77, pages 17–32, Seoul, Korea, 2017. PMLR.
[Grover et al., 2019] Aditya Grover, Eric Wang, Aaron Zweig, and Stefano Ermon. Stochastic optimization of sorting networks via continuous relaxations. In International Conference on Learning Representations, New Orleans, LA, 2019. OpenReview.
[Hocquette and Muggleton, 2018] Céline Hocquette and Stephen H. Muggleton. How much can experimental cost be reduced in active learning of agent strategies? In Proceedings of the 28th International Conference on Inductive Logic Programming, volume 11105, pages 38–53, Ferrara, Italy, 2018. Springer.
[Kahneman, 2011] Daniel Kahneman. Thinking, fast and slow. Farrar, Straus and Giroux, New York, 2011.
[Kakas et al., 1992] Antonis C. Kakas, Robert A. Kowalski, and Francesca Toni. Abductive logic programming. Journal of Logic Computation, 2(6):719–770, 1992.
[Li et al., 2020] Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, and Song-Chun Zhu. Closed loop neural-symbolic learning via integrating neural perception, grammar parsing, and symbolic reasoning. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pages 5884–5894, Online, 2020. PMLR.
[Lifschitz, 2019] Vladimir Lifschitz. Answer Set Programming. Springer, 2019.
[Manhaeve et al., 2018] Robin Manhaeve, Sebastijan Dumančić, Angelika Kimmig, Thomas Demeester, and Luc De Raedt. DeepProbLog: Neural probabilistic logic programming. In Advances in Neural Information Processing Systems 31, pages 3753–3763, Montréal, Canada, 2018. Curran Associates, Inc.
[Muggleton and de Raedt, 1994] Stephen H. Muggleton and Luc de Raedt. Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19-20:629–679, 1994.
[Muggleton et al., 2013] Stephen H. Muggleton, Dianhuan Lin, Jianzhong Chen, and Alireza Tamaddoni-Nezhad. MetaBayes: Bayesian meta-interpretative learning using higher-order stochastic refinement. In Proceedings of the 23rd International Conference on Inductive Logic Programming, volume 8812, pages 1–17, Rio de Janeiro, Brazil, 2013. Springer.
[Muggleton et al., 2015] Stephen H. Muggleton, Dianhuan Lin, and Alireza Tamaddoni-Nezhad. Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited. Machine Learning, 100(1):49–73, 2015.
[Trask et al., 2018] Andrew Trask, Felix Hill, Scott E Reed, Jack Rae, Chris Dyer, and Phil Blunsom. Neural arithmetic logic units. In Advances in Neural Information Processing Systems 31, pages 8035–8044. Curran Associates, Inc., 2018.
[Wang et al., 2019] Po-Wei Wang, Priya L. Donti, Bryan Wilder, and J. Zico Kolter. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In Proceedings of the 36th International Conference on Machine Learning, pages 6545–6554, Long Beach, CA, 2019. PMLR.
[Zhou et al., 2018] Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Graph neural networks: A review of methods and applications. CoRR, abs/1812.08434, 2018.
[Zhou, 2019] Zhi-Hua Zhou. Abductive learning: towards bridging machine learning and logical reasoning. Science China Information Sciences, 62(7), 2019.
