The Challenge of Crafting Intelligible Intelligence
Daniel S. Weld and Gagan Bansal
ABSTRACT
Since Artificial Intelligence (AI) software uses techniques like deep lookahead search and stochastic optimization of huge neural networks to fit mammoth datasets, it often produces complex behavior that is difficult for people to understand. Yet organizations are deploying such systems in mission-critical settings, making it essential that people can understand and control them.
KEYWORDS
HCI, artificial intelligence, machine learning, interpretability
1 INTRODUCTION

Artificial Intelligence (AI) systems have reached or exceeded human performance for many circumscribed tasks. As a result, they are increasingly deployed in mission-critical roles, such as credit scoring, predicting whether a bail candidate will commit another crime, selecting the news we read on social networks, and driving cars. Unlike other mission-critical software, these extraordinarily complex AI systems are difficult to test: AI decisions are context specific and often based on thousands or millions of factors. Typically, AI behaviors are generated by searching vast action spaces or learned by the opaque optimization of mammoth neural networks operating over prodigious amounts of training data. Almost by definition, no clear-cut method can accomplish these AI tasks.

Unfortunately, much AI-produced behavior is alien, i.e., it can fail in unexpected ways. This lesson is most clearly seen in the performance of the latest deep neural network image analysis systems. While their accuracy at object recognition on naturally occurring pictures is extraordinary, imperceptible changes to input images can lead to erratic predictions, as shown in Figure 1. Why are these recognition systems so brittle, making different predictions for apparently identical images? Unintelligible behavior is not limited to machine learning; many AI programs, such as automated planning algorithms, perform search-based lookahead and inference whose complexity exceeds human abilities to verify. While some search and planning algorithms are provably complete and optimal, intelligibility is still important, because the underlying primitives (e.g., search operators or action descriptions) are usually approximations [29]. One can't trust a proof that is based on (possibly) incorrect premises.

Despite intelligibility's apparent value, it remains remarkably hard to specify what makes a system "intelligible." We discuss desiderata for intelligible behavior later in this article. In brief, we seek AI systems where A) it is clear what factors caused the system's action [24], allowing users to predict how changes to the situation would have led to alternative behaviors, and B) the user can effectively control the AI by interacting with it. As we will see, there is a central tension between a concise explanation and an accurate one.

As shown in Figure 2, our survey focuses on two high-level approaches to building intelligible AI software: 1) ensuring that the underlying reasoning or learned model is inherently interpretable, e.g., by learning a linear model over a small number of well-understood features, and 2) if it is necessary to use an inscrutable model, such as a complex neural network or deep-lookahead search, mapping this complex system to a simpler, explanatory model for understanding and control [28]. Using an interpretable model provides the benefit of transparency and veracity; in theory, a user can see exactly what the model is doing. Unfortunately, interpretable methods may not perform as well as more complex ones, such as deep neural networks. Conversely, the approach of mapping to an explanatory model can apply to whichever AI technique is currently delivering the best performance, but its explanation inherently differs from the way the AI system actually operates. This yields a central conundrum: how can a user trust that such an explanation reflects the essence of the underlying decision and does not conceal important details? We posit that the answer is to make the explanation system interactive, so users can drill down until they are satisfied with their understanding.

The key challenge for designing intelligible AI is communicating a complex computational process to a human. This requires interdisciplinary skills, including HCI as well as AI and machine learning expertise. Furthermore, since the nature of explanation has long been studied by philosophy and psychology, these fields should also be consulted.
Figure 2: Approaches for crafting intelligible AI. Section numbers indicate where each aspect is discussed.

This survey highlights key approaches and challenges for building intelligible intelligence. Section 2 characterizes intelligibility and explains why it is important even in systems with measurably high performance. Section 4 describes the benefits and limitations of GA2M, a powerful class of interpretable ML models. Then, in Section 5, we characterize methods for handling inscrutable models, discussing different strategies for mapping to a simpler, intelligible model appropriate for explanation and control. Section 6 sketches a vision for building interactive explanation systems, where the mapping changes in response to the user's needs. Section 7 argues that intelligibility is important for search-based AI systems as well as for those based on machine learning, and that similar solutions may apply.

2 WHY INTELLIGIBILITY MATTERS

While it has been argued that explanations are much less important than sheer performance in AI systems, there are many reasons why intelligibility is important. We start by discussing technical reasons, but social factors matter as well.

AI may have the Wrong Objective: In some situations, even 100% perfect performance may be insufficient, for example, if the performance metric is flawed or incomplete due to the difficulty of specifying it explicitly. Pundits have warned that an automated factory charged with maximizing paperclip production could subgoal on killing humans, who are using resources that could otherwise serve its task. While this example may be fanciful, it illustrates that it is remarkably difficult to balance multiple attributes of a utility function. For example, as Lipton observed [25], "An algorithm for making hiring decisions should simultaneously optimize for productivity, ethics and legality." But how does one express this trade-off? Other examples include balancing training error against the goal of uncovering causality in medicine, and balancing accuracy against fairness in recidivism prediction [12]. For the latter, a simplified objective function such as accuracy, combined with historically biased training data, may cause uneven performance for different groups (e.g., people of color). Intelligibility empowers users to determine whether an AI is right for the right reasons.

AI may be Using Inadequate Features: Features are often correlated, and when one feature is included in a model, machine learning algorithms extract as much signal as possible from it, indirectly modeling other features that weren't included. This can lead to problematic models, as illustrated by Figure 4b (and described in the next section), where the ML determined that a patient's prior history of asthma (a lung disease) was negatively correlated with death by pneumonia, presumably due to correlation with unmodeled variables, such as these patients receiving timely and aggressive therapy for lung problems. An intelligible model helps humans spot these issues and correct them, e.g., by adding additional features [4].

Distributional Drift: A deployed model may perform poorly in the wild, i.e., when a difference exists between the distribution used during training and the one encountered during deployment. Furthermore, the deployment distribution may change over time, perhaps due to feedback from the act of deployment. This is common in adversarial domains, such as spam detection, online ad pricing, and search engine optimization. Intelligibility helps users determine when models are failing to generalize.

Facilitating User Control: Many AI systems induce user preferences from their actions. For example, adaptive news feeds predict which stories are likely most interesting to a user. As robots become more common and enter the home, preference learning will become ever more widespread. If users understand why the AI performed an undesired action, they can better issue instructions that will lead to improved future behavior.

User Acceptance: Even if they don't seek to change system behavior, users have been shown to be happier with, and more likely to accept, algorithmic decisions that are accompanied by an explanation [18]. After being told that they should have their kidney removed, it's natural for a patient to ask the doctor why — even if they don't fully understand the answer.

Improving Human Insight: While improved AI allows automation of tasks previously performed by humans, this is not its only use. Scientists also use machine learning to gain insight from big data; medicine offers several examples [4]. Similarly, the behavior of AlphaGo [35] has revolutionized human understanding of the game. Intelligible models greatly facilitate these processes.

Legal Imperatives: The European Union's GDPR legislation decrees citizens' right to an explanation, and other nations may follow. Furthermore, assessing legal liability is a growing area of concern; a deployed model (e.g., a self-driving car) may introduce new areas of liability by causing accidents that would be unexpected from a human operator, shown as "AI-specific error" in Figure 3. Auditing such situations to assess liability requires understanding the model's decisions.

Figure 3: The dashed blue shape indicates the space of possible mistakes humans can make. The red shape denotes the AI's mistakes; its smaller size indicates a net reduction in the number of errors. The gray region denotes AI-specific mistakes a human would never make. Despite reducing the total number of errors, a deployed model may create new areas of liability (gray), necessitating explanations.

3 DEFINING INTELLIGIBILITY

So far we have treated intelligibility informally. Indeed, few computing researchers have tried to formally define what makes an AI system interpretable, transparent, or intelligible [6], but one suggested criterion is human simulatability [25]: can a human user easily predict the model's output for a given input? By this definition, sparse linear models are more interpretable than dense or non-linear ones.

Philosophers, such as Hempel and Salmon, have long debated the nature of explanation. Lewis [23, p. 217] summarizes: "To explain an event is to provide some information about its causal history." But many causal explanations may exist. The fact that event C causes E is best understood relative to an imagined counterfactual scenario, where absent C, E would not have occurred; furthermore, C should be minimal, an intuition known to early scientists, such as William of Occam, and formalized by Halpern and Pearl [11].

Following this logic, we suggest that a better criterion than simulatability is the ability to answer counterfactuals, aka "what-if" questions. Specifically, we say that a model is intelligible to the degree that a human user can predict how a change to a feature, e.g., a small increase in its value, will change the model's output, and can reliably modify that response curve. Note that if one can simulate the model, predicting its output, then one can predict the effect of a change, but not vice versa.
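To make this criterion concrete, here is a minimal sketch (ours, not taken from any system discussed in this article) of how a what-if question can be posed mechanically to any model that exposes a prediction function; the predict and what_if names are hypothetical.

```python
import numpy as np

def what_if(predict, x, feature, delta):
    """Answer 'what if this feature had been larger by delta?' for any model
    that exposes a prediction function; only black-box queries are used."""
    x_cf = np.array(x, dtype=float)
    x_cf[feature] += delta                       # construct the counterfactual input
    return predict(x_cf) - predict(np.array(x, dtype=float))

# Toy usage with a sparse linear model, whose weights make the answer transparent.
w = np.array([0.8, -0.3, 0.0])
predict = lambda x: float(w @ x)
print(what_if(predict, [1.0, 2.0, 3.0], feature=1, delta=0.5))   # ~ -0.15, i.e., w[1] * delta
```

For an intelligible model, a user should be able to anticipate this answer without running such a probe; for an inscrutable one, probes of this kind are often the only way to obtain it.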
Linear models are especially interpretable under this definition because they allow the answering of counterfactuals. For example, consider a naive Bayes unigram model for sentiment analysis, whose objective is to predict the emotional polarity (positive or negative) of a textual passage. Even if the model were large, combining evidence from the presence of thousands of words, one could see the effect of a given word by looking at the sign and magnitude of the corresponding weight. This answers the question, "What if the word had been omitted?" Similarly, by comparing the weights associated with two words, one could predict the effect on the model of substituting one for the other.

Ranking Intelligible Models: Since one may have a choice of intelligible models, it is useful to consider what makes one preferable to another. Social science research suggests that an explanation is best considered a social process, a conversation between explainer and explainee [15, 30]. As a result, Grice's rules for cooperative communication [10] may hold for intelligible explanations. Grice's maxim of quality says to be truthful, relating only things that are supported by evidence. The maxim of quantity says to give as much information as is needed, and no more. The maxim of relation says to mention only things that are relevant to the discussion. The maxim of manner says to avoid ambiguity, being as clear as possible.

Miller summarizes decades of psychological research, noting that explanations are contrastive, i.e., of the form "Why P rather than Q?" The event in question, P, is termed the fact, and Q is called the foil [30]. Often the foil is not explicitly stated even though it is crucially important to the explanation process. For example, consider the question, "Why did you predict that the image depicts an indigo bunting?" An explanation that points to the color blue implicitly assumes that the foil is another bird, such as a chickadee. But perhaps the questioner wonders why the recognizer did not predict a pair of denim pants; in this case a more precise explanation might highlight the presence of wings and a beak. Clearly, an explanation targeted to the wrong foil will be unsatisfying, but the nature and sophistication of a foil can depend on the end user's expertise; hence, the ideal explanation will differ for different people [6]. For example, to verify that an ML system is fair, an ethicist might generate more complex foils than a data scientist. Most ML explanation systems have restricted their attention to elucidating the behavior of a binary classifier, i.e., where there is only one possible foil choice. However, as we seek to explain multi-class systems, addressing this issue becomes essential.

Many systems are simply too complex to understand without approximation. Here, the key challenge is deciding which details to omit. After long study, psychologists determined that several criteria can be prioritized for inclusion in an explanation: necessary causes (vs. sufficient ones); intentional actions (vs. those taken without deliberation); proximal causes (vs. distant ones); details that distinguish between fact and foil; and abnormal features [30].

According to Lombrozo, humans prefer explanations that are simpler (i.e., contain fewer clauses), more general, and coherent (i.e., consistent with the human's prior beliefs) [26]. In particular, she observed the surprising result that humans preferred simple (one-clause) explanations to conjunctive ones, even when the probability of the latter was higher than that of the former [26]. These results raise interesting questions about the purpose of explanations in an AI system. Is an explanation's primary purpose to convince a human to accept the computer's conclusions (perhaps by presenting a simple, plausible, but unlikely explanation), or is it to educate the human about the most likely true situation? Tversky, Kahneman, and other psychologists have documented many cognitive biases that lead humans to incorrect conclusions; for example, people reason incorrectly about the probability of conjunctions, deeming a concrete and vivid scenario more likely than an abstract one that strictly subsumes it [16]. Should an explanation system exploit human limitations or seek to protect us from them?

Other studies raise an additional complication about how to communicate a system's uncertain predictions to human users. Koehler found that simply presenting an explanation for a proposition makes people think it is more likely to be true [18]. Furthermore, explaining a fact in the same way as previous facts have been explained amplifies this effect [36].

4 INHERENTLY INTELLIGIBLE MODELS

Several AI systems are inherently intelligible, and we previously observed that linear models support counterfactual reasoning. Unfortunately, linear models have limited utility because they often result in poor accuracy. More expressive choices include simple decision trees and compact decision lists. To concretely illustrate the benefits of intelligibility, we focus on generalized additive models (GAMs), a powerful class of ML models that relate a set of features to the target using a linear combination of (potentially nonlinear) single-feature models called shape functions [27]. For example, if y represents the target and {x_1, ..., x_n} represents the features, then a GAM takes the form y = \beta_0 + \sum_j f_j(x_j), where the f_j denote shape functions and the target y is computed by summing single-feature terms. Popular shape functions include non-linear functions such as splines and decision trees. With linear shape functions, GAMs reduce to linear models. GA2M models extend GAMs by including terms for pairwise interactions between features:

    y = \beta_0 + \sum_j f_j(x_j) + \underbrace{\sum_{i \neq j} f_{ij}(x_i, x_j)}_{\text{pairwise terms}}    (1)
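To make Equation (1) concrete, the following sketch (ours; not the implementation of [4] or [27], whose shape functions are learned with bagged shallow regression trees and gradient boosting) shows how a GA2M-style model produces a prediction as a sum of per-term contributions. The toy shape functions and feature names are hypothetical.

```python
def ga2m_predict(x, beta0, shape_fns, pair_fns):
    """Score a feature vector x as an intercept plus single-feature terms f_j(x_j)
    and pairwise terms f_ij(x_i, x_j), returning the per-term contributions that
    make the prediction inspectable."""
    terms = {f"f_{j}": f(x[j]) for j, f in shape_fns.items()}
    terms.update({f"f_{i},{j}": f(x[i], x[j]) for (i, j), f in pair_fns.items()})
    return beta0 + sum(terms.values()), terms

# Toy example: feature 0 is age, feature 1 is a boolean asthma flag.
shape_fns = {0: lambda age: 0.02 * max(age - 65, 0),   # risk rises for older patients
             1: lambda asthma: -0.5 * asthma}          # the suspicious pattern of Figure 4(b)
pair_fns = {(0, 1): lambda age, asthma: 0.1 * asthma * (age > 80)}
score, contributions = ga2m_predict([82, 1], beta0=-2.0,
                                    shape_fns=shape_fns, pair_fns=pair_fns)
print(score, contributions)
```

Because the prediction decomposes into these named terms, a user can read off exactly how much each feature (or feature pair) contributed, which is what the visualizations in Figure 4 display.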
Figure 4: A part of Figure 1 from [4], showing 3 (of 56 total) components of a GA2M model trained to predict a patient's risk of dying from pneumonia. The two line graphs depict the contribution of individual features to risk: a) the patient's age and b) the boolean variable asthma; the y-axis denotes the feature's contribution (log odds) to predicted risk. The heat map, c, visualizes the contribution of the pairwise interaction between age and cancer.
Caruana et al. observed that for domains containing a moderate number of semantic features, GA2M models achieve performance competitive with inscrutable models, such as random forests and neural networks, while remaining intelligible [4]. Lou et al. observed that among the methods available for learning GA2M models, the version with bagged shallow regression-tree shape functions learned via gradient boosting achieves the highest accuracy [27].

Both GAM and GA2M are considered interpretable because the model's learned behavior can be easily understood by examining or visualizing the contribution of terms (individual features or pairs of features) to the final prediction. For example, Figure 4 depicts a GA2M model trained to predict a patient's risk of dying due to pneumonia, showing the contribution (log odds) to total risk for a subset of terms. A positive contribution increases risk, whereas a negative contribution decreases it. For example, Figure 4(a) shows how the patient's age affects predicted risk. While the risk is low and steady for young patients (e.g., age < 20), it increases rapidly for older patients (age > 67). Interestingly, the model shows a sudden increase at age 86, perhaps a result of less aggressive care by doctors for patients "whose time has come." Even more surprising is the sudden drop for patients over 100. This might be another social effect; once a patient reaches the magic "100," he or she gets more aggressive care. One benefit of an interpretable model is its ability to highlight these issues, spurring deeper analysis.

Figure 4(b) illustrates another surprising aspect of the learned model: apparently, a history of asthma, a respiratory disease, decreases the patient's risk of dying from pneumonia! This finding is counterintuitive to any physician, who recognizes that asthma should, in fact, increase such risk. When Caruana et al. checked the data, they concluded that the lower risk was likely due to correlated variables — asthma patients typically receive timely and aggressive therapy for lung issues. Therefore, although the model was highly accurate on the test set, it would likely fail in deployment, dramatically underestimating the risk to a patient with asthma who had not been previously treated for the disease.

Facilitating Human Control of GA2M Models: A domain expert can fix such erroneous patterns learned by the model by setting the weight of the asthma term to zero. In fact, GA2Ms let users provide much more comprehensive feedback to the model by using a GUI to redraw the line graph for a model term [4]. An alternative remedy might be to introduce a new feature to the model, representing whether the patient had recently been seen by a pulmonologist. After adding this feature, which is highly correlated with asthma, and retraining, the newly learned model would likely reflect that asthma (by itself) increases the risk of dying from pneumonia.

There are two more takeaways from this anecdote. First, the absence of an important feature in the data representation can cause any AI system to learn unintuitive behavior for another, correlated feature. Second, if the learner is intelligible, then this unintuitive behavior is immediately apparent, allowing appropriate skepticism (despite high test accuracy) and easier debugging.

Recall that GA2Ms are more expressive than simple GAMs because they include pairwise terms. Figure 4(c) depicts such a term for the features age and cancer. This explanation indicates that among the patients who have cancer, the younger ones are at higher risk, perhaps because younger patients who develop cancer are likely to be critically ill. Again, since doctors can readily inspect these terms, they know if the learner reaches unexpected conclusions.

Limitations: As described, GA2M models are restricted to binary classification, and so explanations are clearly contrastive — there is only one choice of foil. One could extend GA2M to handle multiple classes by training n one-vs-rest classifiers or by building a hierarchy of classifiers. However, while these approaches would yield a working multi-class classifier, we don't know whether they preserve model intelligibility, nor whether a user could effectively adjust such a model by editing the shape functions.

Furthermore, recall that GA2Ms decompose their prediction into the effects of individual terms, which can be visualized. However, if users are confused about what terms mean, they will not understand the model or be able to ask meaningful "what-if" questions. Moreover, if there are too many features, the model's complexity may be overwhelming. Lipton notes that the effort required to simulate some models (such as decision trees) may grow logarithmically with the number of parameters [25], but for GA2M the number of visualizations to inspect could increase quadratically. Several methods might help users manage this complexity; for example, the terms could be ordered by importance. However, it's not clear how best to estimate importance; possible methods include using an ablation analysis to compute the influence of terms on model performance, or computing the maximum contribution of each term over the training samples. Alternatively, a domain expert could group terms semantically to facilitate perusal.
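As an illustration of the maximum-contribution heuristic (a sketch under our own assumptions, not a method prescribed above; the helper name is hypothetical), terms could be ranked by the largest absolute contribution they make on any training sample:

```python
import numpy as np

def rank_terms_by_max_contribution(term_fns, X):
    """Order named model terms by the maximum absolute contribution observed over
    the training samples; term_fns maps a term name to a function of a feature vector."""
    importance = {name: max(abs(f(x)) for x in X) for name, f in term_fns.items()}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with two single-feature terms over a tiny training matrix (age, asthma).
X = np.array([[23.0, 0.0], [67.0, 1.0], [82.0, 1.0]])
term_fns = {"age": lambda x: 0.02 * max(x[0] - 65, 0),
            "asthma": lambda x: -0.5 * x[1]}
print(rank_terms_by_max_contribution(term_fns, X))   # asthma ranks above age here
```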
However, when the number of features grows into the millions — which occurs when dealing with classifiers over text, audio, image and video data — existing intelligible models do not perform nearly as well as inscrutable methods, like deep neural networks. Since these models combine millions of features in complex, nonlinear ways, they are beyond human capacity to simulate.
5 UNDERSTANDING INSCRUTABLE MODELS

There are two ways that an AI model may be inscrutable. It may be provided as a black-box API, such as Microsoft Cognitive Services, which uses machine learning to provide image-recognition capabilities but does not allow inspection of the underlying model. Alternatively, the model may be under the user's control yet extremely complex, such as a deep neural network, where a user has access to myriad learned parameters but cannot reasonably interpret them. How can one best explain such models to the user?

The Comprehensibility/Fidelity Trade-Off: A good explanation of an event is both easy to understand and faithful, conveying the true cause of the event. Unfortunately, these two criteria almost always conflict. Consider the predictions of a deep neural network with millions of nodes: a complete and accurate trace of the network's prediction would be far too complex to understand, but any simplification sacrifices faithfulness.

Finding a satisfying explanation, therefore, requires balancing the competing goals of comprehensibility and fidelity. Lakkaraju et al. [22] suggest formulating an explicit optimization of this form and propose an approximation algorithm for generating global explanations in the form of compact sets of if-then rules. Ribeiro et al. describe a similar optimization algorithm that balances faithfulness and coverage in its search for summary rules [34].

Indeed, all methods for rendering an inscrutable model intelligible require mapping the complex model to a simpler one [28]. Several high-level approaches to this mapping have been proposed.

Local Explanations: One way to simplify the explanation of a learned model is to make it relative to a single input query. Such explanations, termed local [33] or instance-based [20], are akin to a doctor explaining the specific reasons for a patient's diagnosis rather than communicating all of her medical knowledge. Contrast this approach with the global understanding of the model that one gets with a GA2M. Mathematically, one can see a local explanation as currying — several variables in the model are fixed to specific values, allowing simplification.

Generating local explanations is a common practice in AI systems. For example, early rule-based expert systems included explanation facilities that augmented a trace of the system's reasoning — for a particular case — with background knowledge [38]. Recommender systems, one of the first deployed uses of machine learning, also induced demand for explanations of their specific recommendations; the most satisfying answers combined justifications based on the user's previous choices, ratings of similar users, and features of the items being recommended [32].

Locally-Approximate Explanations: In many cases, however, even a local explanation can be too complex to understand without approximation. Here, the key challenge is deciding which details to omit when creating the simpler explanatory model. Human preferences, discovered by psychologists and summarized in Section 3, should guide algorithms that construct these simplifications.

Ribeiro et al.'s LIME system [33] is a good example of a system for generating a locally-approximate explanatory model of an arbitrary learned model, but it sidesteps part of the question of which details to omit. Instead, LIME requires the developer to provide two additional inputs: 1) a set of semantically meaningful features X′ that can be computed from the original features, and 2) an interpretable learning algorithm, such as a linear classifier (or a GA2M), which it uses to generate an explanation in terms of X′.

Figure 5: The intuition guiding LIME's method for constructing an approximate local explanation (taken from [33]): "The black-box model's complex decision function, f, (unknown to LIME) is represented by the blue/pink background, which cannot be approximated well by a linear model. The bold red cross is the instance being explained. LIME samples instances, gets predictions using f, and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the learned explanation that is locally (but not globally) faithful."

The insight behind LIME is shown in Figure 5. Given an instance to explain, shown as the bolded red cross, LIME randomly generates a set of similar instances and uses the black-box classifier, f, to predict their values (shown as the red crosses and blue circles). These predictions are weighted by their similarity to the input instance (akin to locally-weighted regression) and used to train a new, simpler, intelligible classifier over X′, the smaller set of semantic features, shown in the figure as the linear decision boundary. The user receives this intelligible classifier as the explanation. While this explanation model [28] is likely a poor global representation of f, it is hopefully an accurate local approximation of the decision boundary in the vicinity of the instance being explained.

Ribeiro et al. tested LIME on several domains. For example, they explained the predictions of a convolutional neural network image classifier by converting the pixel-level features into a smaller set of "super-pixels"; to do so, they ran an off-the-shelf segmentation algorithm that identified regions in the input image and varied the color of some of these regions when generating "similar" images. While LIME provides no formal guarantees about its explanations, studies showed that LIME's explanations helped users evaluate which of several classifiers generalizes best.
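The following sketch (ours, using scikit-learn's Ridge regression) captures the locally-weighted surrogate idea behind LIME, though it omits LIME's mapping to interpretable features X′ and its sparse model selection; the function and parameter names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(f, x, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model to the black-box f around instance x;
    the learned coefficients serve as a local explanation."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0.0, scale, size=(n_samples, len(x)))    # perturbed neighbors of x
    y = np.array([f(z) for z in Z])                             # black-box predictions
    distances = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)     # nearer samples count more
    return Ridge(alpha=1.0).fit(Z, y, sample_weight=weights).coef_

# Toy usage: explain a nonlinear black box around one point.
black_box = lambda z: float(np.sin(z[0]) + z[1] ** 2)
print(local_surrogate(black_box, np.array([0.3, 1.0])))   # roughly [cos(0.3), 2.0]
```

The user would then inspect the surrogate's coefficients, the local explanation, rather than the black box's own parameters.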
Choice of Explanatory Vocabulary: Ribeiro et al.'s use of pre-segmented image regions to explain image classification decisions illustrates the larger problem of determining an explanatory vocabulary. Clearly, it would not make sense to try to identify the cause of a classification in terms of raw pixels.

Figure 6: Image description, class definition, and visual explanation for a Laysan Albatross, from Hendricks et al. [13]. Image description: "This is a large bird with a white neck and a black beak in the water." Class definition: "The Laysan Albatross is a large seabird with a hooked yellow beak, black back and white belly." Visual explanation: "This is a Laysan Albatross because this bird has a hooked yellow beak, white neck and black back." Visual explanations are both image relevant and class relevant, whereas image descriptions are image relevant but not necessarily class relevant, and class definitions are class relevant but not necessarily image relevant.

Researchers have observed that certain layers of a deep neural network may function as edge or pattern detectors [40]. Whenever a user can identify the presence of such layers, it may be preferable to use them in the explanation. Bau et al. describe an automatic mechanism for matching CNN representations with semantically meaningful concepts using a large, labeled corpus of objects, parts, and textures; furthermore, using this alignment, their method quantitatively scores CNN interpretability, potentially suggesting a way to optimize for intelligible models.

However, many obstacles remain. As one example, it is not clear that there are satisfying ways to describe important, discriminative features, which are often intangible, e.g., textures. An intelligible explanation may need to define new terms or combine language with other modalities, like patches of an image. Another challenge is relational inference; inducing first-order, quantified descriptions would enable explanations such as "a spider because it has eight legs" and "full because all seats are occupied." While quantified and relational …
6 INTERACTIVE EXPLANATION SYSTEMS

… actions by the user. This matches results from the psychology literature summarized in Section 2 and highlights Grice's maxims, especially those pertaining to quantity and relation. It also builds on Lim and Dey's work in ubiquitous computing, which investigated the kinds of questions users wished to ask about complex, context-aware applications [24]. We envision an interactive explanation system that supports many different follow-up and drill-down actions after presenting a user with an initial explanation:

• Redirecting the answer by changing the foil: "Sure, but why didn't you predict class C?"
• Asking for more detail (i.e., a more complex explanatory model), perhaps while restricting the explanation to a subregion of feature space: "I'm only concerned about women over age 50..."
• Asking for a decision's rationale: "What made you believe this?" The system might respond by displaying the labeled training examples that were most influential in reaching that decision, e.g., ones identified by influence functions [19] or nearest-neighbor methods.
• Querying the model's sensitivity by asking what minimal perturbations to certain features would lead to a different output.
• Changing the vocabulary by adding (or removing) a feature in the explanatory model, either from a predefined set, by using methods from machine teaching, or with concept activation vectors [17].
• Perturbing the input example to see the effect on both prediction and explanation. In addition to aiding understanding of the model (by directly testing a counterfactual), this action enables an affected user who wants to contest the initial prediction: "But officer, one of those prior DUIs was overturned...?"
• Adjusting the model: Based on new understanding, the user may wish to correct the model. Here, we expect to build on tools for interactive machine learning [1] and explanatory debugging [20, 21], which have explored interactions for adding new training examples, correcting erroneous labels in existing data, specifying new features, and modifying shape functions. As mentioned in the previous section, it may be challenging to map user adjustments, which are made in reference to an explanatory model, back into the original, inscrutable model.

To make these ideas concrete, Figure 7 presents a possible dialog as a user tries to understand the robustness of a deep neural dog/fish classifier built atop Inception v3 [39]. As the figure shows: 1) The computer correctly predicts that the image depicts a fish. 2) The user requests an explanation, which is provided using LIME [33]. 3) The user, concerned that the classifier is paying more attention to the background than to the fish itself, asks to see the training data that influenced the classifier; the nearest neighbors are computed using influence functions [19]. While there are anemones in those images, it also seems that the system is recognizing a clownfish. 4) To gain confidence, the user edits the input image to remove the background, resubmits it to the classifier, and checks the explanation.

7 EXPLAINING COMBINATORIAL SEARCH

Most of the preceding discussion has focused on intelligible machine learning, which is just one type of artificial intelligence. However, the same issues also confront systems based on deep-lookahead search. While many planning algorithms have strong theoretical properties, such as soundness, they search over action models that include their own assumptions. Furthermore, goal specifications are likewise incomplete [29]. If these unspoken assumptions are incorrect, then a formally correct plan may still be disastrous.

Consider a planning algorithm that has generated a sequence of actions for a remote, mobile robot. If the plan is short, with a moderate number of actions, then the problem may be inherently intelligible, and a human could easily spot a problem. However, larger search spaces could be cognitively overwhelming. In these cases, local explanations offer a simplification technique that is helpful, just as it was when explaining machine learning. The vocabulary issue is likewise crucial: how does one succinctly and abstractly summarize a complete search subtree? Depending on the choice of explanatory foil, different answers are appropriate [8]. Sreedharan et al. describe an algorithm for generating the minimal explanation that patches a user's partial understanding of a domain [37]. Work on mixed-initiative planning [7] has demonstrated the importance of supporting interactive dialog with a planning system. Since many AI systems, e.g., AlphaGo [35], combine deep search and machine learning, additional challenges will result from the need to explain interactions between combinatorics and learned models.
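As a toy illustration of foil-directed explanation for search (our sketch, not an algorithm from the planning literature cited above), a system could contrast the lookahead value of its chosen action with that of a user-suggested foil; the evaluate function is a hypothetical rollout or heuristic estimate.

```python
def contrast_actions(evaluate, state, chosen, foil, depth=3):
    """Answer 'Why this action rather than that one?' by comparing lookahead values;
    evaluate(state, action, depth) is assumed to score an action via limited search."""
    v_chosen = evaluate(state, chosen, depth)
    v_foil = evaluate(state, foil, depth)
    return (f"'{chosen}' was preferred: its {depth}-step lookahead value "
            f"({v_chosen:.2f}) exceeds that of '{foil}' ({v_foil:.2f}).")

# Toy usage with a hypothetical evaluator for a mobile-robot domain.
evaluate = lambda state, action, depth: {"recharge": 0.9, "explore": 0.4}[action]
print(contrast_actions(evaluate, state=None, chosen="recharge", foil="explore"))
```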
8 FINAL THOUGHTS

In order to trust deployed AI systems, we must not only improve their robustness [5], but also develop ways to make their reasoning intelligible. Intelligibility will help us spot AI that makes mistakes due to distributional drift or incomplete representations of goals and features. It will also facilitate control by humans in increasingly common collaborative human/AI teams. Furthermore, intelligibility will help humans learn from AI. Finally, there are legal reasons to want intelligible AI, including the European GDPR and a growing need to assign liability when AI errs.

Depending on the complexity of the models involved, two approaches to enhancing understanding may be appropriate: 1) using an inherently interpretable model, or 2) adopting an inscrutably complex model and generating post hoc explanations by mapping it to a simpler, explanatory model through a combination of currying and local approximation. When learning a model over a medium number of human-interpretable features, one may confidently balance performance and intelligibility with approaches like GA2Ms. However, for problems with thousands or millions of features, performance requirements likely force the adoption of inscrutable methods, such as deep neural networks or boosted decision trees. In these situations, post hoc explanations may be the only way to facilitate human understanding.

Research on explanation algorithms is developing rapidly, with work on both local (instance-specific) explanations and global approximations to the learned model. A key challenge for all these approaches is the construction of an explanation vocabulary, essentially the set of features used in the approximate explanation model. Different explanatory models may be appropriate for different choices of explanatory foil, an aspect deserving more attention from systems builders. While many intelligible models can be directly edited by a user, more research is needed to determine how best to map such actions back to modify an underlying inscrutable model. Results from psychology show that explanation is a social process, best thought of as a conversation. As a result, we advocate increased work on interactive explanation systems that support a wide range of follow-up actions. To spur rapid progress in this important field, we hope to see collaboration between researchers in multiple disciplines.

Acknowledgements: We thank E. Adar, S. Ameshi, R. Calo, R. Caruana, M. Chickering, O. Etzioni, J. Heer, E. Horvitz, T. Hwang, R. Kambhamapti, E. Kamar, S. Kaplan, B. Kim, Mausam, C. Meek, M. Michelson, S. Minton, B. Nushi, G. Ramos, M. Ribeiro, M. Richardson, P. Simard, J. Suh, J. Teevan, T. Wu, and the anonymous reviewers for helpful conversations and comments. This work was supported in part by the Future of Life Institute grant 2015-144577 (5388), with additional support from NSF grant IIS-1420667, ONR grant N00014-15-1-2774, and the WRF/Cable Professorship.

REFERENCES
[1] S. Amershi, M. Cakmak, W. Knox, and T. Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120.
[2] J. R. Anderson, F. Boyle, and B. Reiser. 1985. Intelligent tutoring systems. Science 228, 4698 (1985), 456–462.
[3] T. Besold, A. d'Avila Garcez, S. Bader, H. Bowman, P. Domingos, P. Hitzler, K. Kühnberger, L. Lamb, D. Lowd, P. Lima, L. de Penning, G. Pinkas, H. Poon, and G. Zaverucha. 2017. Neural-Symbolic Learning and Reasoning: A Survey and Interpretation. CoRR abs/1711.03902 (2017). arXiv:1711.03902
[4] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad. 2015. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In KDD.
[5] T. Dietterich. 2017. Steps Towards Robust Artificial Intelligence. AI Magazine 38, 3 (2017).
[6] F. Doshi-Velez and B. Kim. 2017. Towards a Rigorous Science of Interpretable Machine Learning. ArXiv (2017). arXiv:1702.08608
[7] G. Ferguson and J. F. Allen. 1998. TRIPS: An Integrated Intelligent Problem-Solving Assistant. In AAAI/IAAI.
[8] M. Fox, D. Long, and D. Magazzeni. 2017. Explainable Planning. In IJCAI XAI Workshop. http://arxiv.org/abs/1709.10256
[9] I. J. Goodfellow, J. Shlens, and C. Szegedy. 2014. Explaining and Harnessing Adversarial Examples. ArXiv (2014). arXiv:1412.6572
[10] P. Grice. 1975. Logic and Conversation. 41–58.
[11] J. Halpern and J. Pearl. 2005. Causes and explanations: A structural-model approach. Part I: Causes. The British Journal for the Philosophy of Science 56, 4 (2005), 843–887.
[12] M. Hardt, E. Price, and N. Srebro. 2016. Equality of opportunity in supervised learning. In NIPS.
[13] L. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, and T. Darrell. 2016. Generating visual explanations. In ECCV.
[14] L. A. Hendricks, R. Hu, T. Darrell, and Z. Akata. 2017. Grounding Visual Explanations. ArXiv (2017). arXiv:1711.06465
[15] D. Hilton. 1990. Conversational processes and causal explanation. Psychological Bulletin 107, 1 (1990), 65.
[16] D. Kahneman. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York. http://a.co/hGYmXGJ
[17] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, and R. Sayres. 2017. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). ArXiv e-prints (Nov. 2017). arXiv:stat.ML/1711.11279
[18] D. J. Koehler. 1991. Explanation, imagination, and confidence in judgment. Psychological Bulletin 110, 3 (1991), 499.
[19] P. Koh and P. Liang. 2017. Understanding black-box predictions via influence functions. In ICML.
[20] J. Krause, A. Dasgupta, J. Swartz, Y. Aphinyanaphongs, and E. Bertini. 2017. A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations. In IEEE VAST.
[21] T. Kulesza, M. Burnett, W. Wong, and S. Stumpf. 2015. Principles of explanatory debugging to personalize interactive machine learning. In IUI.
[22] H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec. 2017. Interpretable & Explorable Approximations of Black Box Models. KDD-FATML (2017).
[23] D. Lewis. 1986. Causal explanation. Philosophical Papers 2 (1986), 214–240.
[24] B. Y. Lim and A. K. Dey. 2009. Assessing demand for intelligibility in context-aware applications. In Proceedings of the 11th International Conference on Ubiquitous Computing. ACM, 195–204.
[25] Z. Lipton. 2016. The Mythos of Model Interpretability. In ICML Workshop on Human Interpretability in ML.
[26] T. Lombrozo. 2007. Simplicity and probability in causal explanation. Cognitive Psychology 55, 3 (2007), 232–257.
[27] Y. Lou, R. Caruana, and J. Gehrke. 2012. Intelligible models for classification and regression. In KDD.
[28] S. Lundberg and S. Lee. 2017. A unified approach to interpreting model predictions. In NIPS.
[29] J. McCarthy and P. Hayes. 1969. Some Philosophical Problems from the Standpoint of Artificial Intelligence. In Machine Intelligence. 463–502.
[30] T. Miller. 2017. Explanation in artificial intelligence: Insights from the social sciences. ArXiv (2017). arXiv:1706.07269
[31] D. A. Norman. 2014. Some Observations on Mental Models. In Mental Models. Psychology Press, 15–22.
[32] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos. 2012. A generalized taxonomy of explanation styles for traditional and social recommender systems. Data Mining and Knowledge Discovery 24, 3 (2012), 555–583.
[33] M. Ribeiro, S. Singh, and C. Guestrin. 2016. Why Should I Trust You?: Explaining the Predictions of Any Classifier. In KDD.
[34] M. Ribeiro, S. Singh, and C. Guestrin. 2018. Anchors: High-Precision Model-Agnostic Explanations. In AAAI.
[35] D. Silver, A. Huang, C. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484–489.
[36] S. Sloman. 1997. Explanatory coherence and the induction of properties. Thinking & Reasoning 3, 2 (1997), 81–110.
[37] S. Sreedharan, S. Srivastava, and S. Kambhampati. 2018. Hierarchical Expertise-Level Modeling for User Specific Robot-Behavior Explanations. ArXiv e-prints (Feb. 2018). arXiv:1802.06895
[38] W. Swartout. 1983. XPLAIN: A system for creating and explaining expert consulting programs. Artificial Intelligence 21, 3 (1983), 285–325.
[39] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the Inception Architecture for Computer Vision. In CVPR.
[40] M. Zeiler and R. Fergus. 2014. Visualizing and understanding convolutional networks. In ECCV.