
A Causality Inspired Framework for Model Interpretation

Chenwang Wu∗ (University of Science and Technology of China, Hefei, Anhui, China; wcw1996@mail.ustc.edu.cn)
Xiting Wang† (Microsoft Research Asia, Beijing, China; xitwan@microsoft.com)
Defu Lian† (University of Science and Technology of China, Hefei, Anhui, China; liandefu@ustc.edu.cn)
Xing Xie (Microsoft Research Asia, Beijing, China; xing.xie@microsoft.com)
Enhong Chen‡ (University of Science and Technology of China, Hefei, Anhui, China; cheneh@ustc.edu.cn)
ABSTRACT
This paper introduces a unified causal lens for understanding representative model interpretation methods. We show that their explanation scores align with the concept of average treatment effect in causal inference, which allows us to evaluate their relative strengths and limitations from a unified causal perspective. Based on our observations, we outline the major challenges in applying causal inference to model interpretation, including identifying common causes that can be generalized across instances and ensuring that explanations provide a complete causal explanation of model predictions. We then present CIMI, a Causality-Inspired Model Interpreter, which addresses these challenges. Our experiments show that CIMI provides more faithful and generalizable explanations with improved sampling efficiency, making it particularly suitable for larger pretrained models.

CCS CONCEPTS
• Computing methodologies → Machine learning.

KEYWORDS
Interpretability, causal inference, machine learning.

ACM Reference Format:
Chenwang Wu, Xiting Wang, Defu Lian, Xing Xie, and Enhong Chen. 2023. A Causality Inspired Framework for Model Interpretation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '23), August 6-10, 2023, Long Beach, CA, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3580305.3599240

∗ Work done during an internship at Microsoft Research Asia.
† Corresponding authors.
‡ Also affiliated with the State Key Laboratory of Cognitive Intelligence.

1 INTRODUCTION
Although deep learning is widely used in various fields [14, 16, 19], deep models are mostly complex functions that humans cannot understand, which may reduce user trust. For this reason, eXplainable Artificial Intelligence (XAI) has received increasing attention. A fundamental question in XAI is: do explanations reveal important root causes of the model's behavior or merely correlations? The inability to distinguish correlation from causality can result in erroneous explanations for decision-makers [35]. The importance of causality is further highlighted by prominent research in human-computer interaction [41], in which extensive user studies reveal that in XAI, causality increases user trust and helps evaluate the quality of explanations. This result echoes major theories in cognitive science, which hold that humans build mental models of the world using causal relationships [34, 46].
XAI provides an ideal environment for causality studies due to its adherence to fundamental causality assumptions, which are usually difficult to verify in other settings. For example, in XAI, we can obtain a set of variables (e.g., input data and model parameters) that constitutes a complete set of possible causes for the model prediction, which ensures the satisfaction of the essential causal sufficiency assumption [36, 42]. In addition, the black-box models to be studied can be easily intervened on, allowing the vital do-operator to be performed directly without further assumptions such as ignorability or exchangeability. In contrast, the inability to perform different do-operators on the same instance is the fundamental problem of causal inference in more general scenarios [36].
Due to its importance and applicability, causality has attracted increasing attention in XAI. Multiple explanation methods [30, 40, 42] utilize causal analysis techniques such as interventions (e.g., input data perturbation), and some have achieved noteworthy success in delivering more trustworthy explanations. Despite this, a formal and unified causal perspective for explainability remains lacking, and some key research questions remain challenging to answer, for example:
• RQ1: Can the existing explanation methods be framed within a theoretical causal framework? If so, what are the causal models employed, and what distinguishes them from each other?
• RQ2: What are the major challenges in leveraging causal inference for model interpretation, and what benefits may we achieve by solving these challenges?


• RQ3: How can the causal model be improved to overcome these challenges?
In this paper, we aim to bridge the gap between causality and explainability by studying these issues.
We first provide a causal theoretical interpretation for explanation methods including LIME [40], Shapley values [30], and CXPlain [42] (RQ1). Our analysis shows that their explanation scores correspond to the (average) treatment effect [36] in causal inference to some extent, and that they share the same causal graph, with only small differences such as the choice of the treatment (i.e., the perturbed features). This provides a unified view for understanding the precise meaning of their explanations and provides theoretical evidence about their advantages and limitations.
These observations allow us to summarize the core challenge in applying causal inference for model interpretation (RQ2). While it is easy for explanation methods to compute individual causal effects, e.g., understanding how much the model prediction will change when one input feature changes, the core challenge is how to efficiently discover prominent common causes that can be generalized to different instances from a large number of features and data points. Addressing this issue requires ensuring that the explanations (1) are causally sufficient for understanding model predictions and (2) can generalize to different instances. These become increasingly important when the black-box model grows larger and there are more data points to be explained. In this case, it is vital that the explanations correspond to common causes that can be generalized across many data points, so that we can save users' cognitive efforts.
To solve the above challenges (RQ3), we follow important causal principles and propose the Causality Inspired Model Interpreter (CIMI)¹. Specifically, we first discuss different choices of causal graphs for model interpretation and identify the one that can address the aforementioned challenges. Based on the selected causal graph, we devise training objectives and desirable properties of our neural interpreters following important causal principles. We then show how these training objectives and desirable properties can be achieved through our CIMI framework.
¹ The source code of CIMI is available at https://github.com/Daftstone/CIMI.
Finally, we conduct extensive experiments on four datasets. The results consistently show that CIMI significantly outperforms baselines on both causal sufficiency and generalizability metrics on all datasets. Notably, CIMI's sampling efficiency is also outstanding, which is timely because each sample (or intervention) requires a forward pass through the model when analyzing large models [57]. This makes our method particularly suitable for larger pretrained language models: its generalizability allows users to save cognitive effort by checking fewer new inputs, and its sampling efficiency keeps the number of such forward passes small.

2 REVISITING XAI FROM CAUSAL PERSPECTIVE
2.1 Preliminary about Causal Inference
We follow the common terminologies in causal inference [36] to discuss existing and our explanation methods.
Causal graph is used to formally depict causal relations. In the graph, each node is a random variable, and each directed edge represents a causal relation, which means that the target node (child) can change in response to a change of the source node (parent).
Do-operator is a mathematical operator for intervention. In general, applying a do-operator 𝑑𝑜(𝐸 = 𝑒) on a random variable 𝐸 means that we set the random variable to value 𝑒. For example,
• 𝑃(𝑌 = 𝑦 | 𝑑𝑜(𝐸 = 𝑒)) is the probability that 𝑌 is 𝑦 when, in every instance, 𝐸 is assigned to 𝑒. This is a global intervention that happens to the whole population. In comparison, 𝑃(𝑌 = 𝑦 | 𝐸 = 𝑒) denotes the probability that 𝑌 is 𝑦 on the subpopulation where 𝐸 is observed to be 𝑒.
• 𝑃(𝑌 = 𝑦 | 𝑑𝑜(𝐸 = 𝑒), 𝐶 = 𝑐) applies the do-operator to the subpopulation where the random variable 𝐶 has value 𝑐.
Treatment effect is an important method to quantify how much causal effect a random variable 𝐸 has on 𝑌. Suppose that 𝐸 is binary; the average treatment effect 𝑇 of 𝐸 on 𝑌 is
𝑇(𝑌 | 𝑑𝑜(𝐸)) = E𝑐 𝑇(𝑌 | 𝑑𝑜(𝐸), 𝐶 = 𝑐) = E𝑐 [𝑌(𝑑𝑜(𝐸 = 1), 𝐶 = 𝑐) − 𝑌(𝑑𝑜(𝐸 = 0), 𝐶 = 𝑐)],   (1)
where 𝑌(𝑑𝑜(𝐸 = 𝑒), 𝐶 = 𝑐) represents the value of 𝑌 when 𝐸 is set to 𝑒 and all other causes 𝐶 are fixed to 𝑐.

2.2 Causal Graph of Existing XAI Methods
Figure 1: (a) The causal graph for existing methods, in which explanation 𝐸 is not the sole cause of model prediction 𝑌ˆ; (b) another causal graph, in which explanations are causally sufficient for prediction but not generalizable; (c) our proposed model, in which explanation 𝐸 is generalizable and modeled as the only cause of 𝑌ˆ. Observed variables are shaded in blue.
Revisiting existing methods from the causal perspective allows us to show that many well-known perturbation-based methods such as LIME [40], Shapley values [30], and CXPlain [42] actually compute or learn the treatment effect, and that their causal graph corresponds to the one shown in Fig. 1(a). Notably, here we only briefly summarize the commonalities and differences among these XAI methods by presenting the main intuition behind the mathematical analysis; the formal theoretical analysis can be found in Appendix A of our full-version paper².
² Since the supplementary material exceeds the space limit of two pages, we put all of the supplementary material into the full version of the paper, which can be found at https://github.com/Daftstone/CIMI/blob/master/paper.pdf.
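To make the treatment-effect view concrete, the sketch below (our illustration, not code from the paper) estimates the individual treatment effect of each token by applying 𝑑𝑜(𝐸 = 1) (keep the token) and 𝑑𝑜(𝐸 = 0) (mask it) while holding the remaining context 𝐶 fixed, in the spirit of Eq. (1); the toy scoring function and the [MASK] convention are assumptions standing in for a real black-box classifier.

```python
# Minimal sketch (not the authors' implementation) of how a perturbation-based
# explainer estimates a treatment effect: for each feature E (here, a token),
# compare the model output under do(E=1) (token kept) and do(E=0) (token masked),
# while the context C (the remaining tokens) stays fixed.
from typing import Callable, List

def treatment_effects(f: Callable[[List[str]], float],
                      tokens: List[str],
                      mask_token: str = "[MASK]") -> List[float]:
    """Individual treatment effect of each token on the model output f."""
    y_keep = f(tokens)  # outcome under do(E=1): the token is included
    effects = []
    for i in range(len(tokens)):
        intervened = tokens.copy()
        intervened[i] = mask_token          # do(E=0): remove the token's information
        y_drop = f(intervened)              # outcome with the same context C
        effects.append(y_keep - y_drop)     # treatment effect for this single context
    return effects

if __name__ == "__main__":
    # Toy "black box": counts sentiment-bearing words; stands in for a real classifier.
    positive = {"great", "good", "wonderful"}
    toy_model = lambda toks: sum(t in positive for t in toks) / max(len(toks), 1)
    print(treatment_effects(toy_model, ["a", "great", "movie"]))
```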


In the causal graph shown in Fig. 1(a), 𝐸 corresponds to the specific treatment, characterized by one feature (or a set of features) to be perturbed. By 𝑑𝑜(𝐸 = 1), these methods include the feature in the input, while 𝑑𝑜(𝐸 = 0) does the opposite. Then, they obtain the model's outcome 𝑌ˆ when 𝐸 is changed and compute the treatment effect 𝑇(𝑌ˆ | 𝑑𝑜(𝐸)) = 𝑌ˆ(𝑑𝑜(𝐸 = 1), 𝐶 = 𝑐) − 𝑌ˆ(𝑑𝑜(𝐸 = 0), 𝐶 = 𝑐), where 𝐶 denotes the context concerning 𝐸, or, more intuitively, the features that remain unchanged after changing 𝐸. The treatment effect then composes (or is equal to) the explanation weight, revealing the extent to which the feature can be considered part of the explanation, or the "contribution" of each feature to the model prediction. It is worth noting that the do-operator here is directly applied to the data points to collect experimental outcomes, which is different from the traditional approach of modeling confounders and converting a causal estimand into a statistical estimand.
Although all three methods can be summarized using the framework in Fig. 1(a), they differ slightly in the following aspects. It is worth emphasizing that this unified view allows us to easily compare the pros and cons of each work.
• Intervened features 𝐸. CXPlain and Shapley values only consider one feature as 𝐸, while LIME uses a set of features as 𝐸 for testing. Thus, the former two methods cannot measure the causal contribution of a set of features without further extension or assumptions.
• Context 𝐶. Shapley values consider all subsets of features as possible contexts, while the other methods take the input instance 𝑥 as the major context. Accordingly, Shapley values compute the average treatment effect over all contexts (i.e., all possible subsets of features), while the others consider individual treatment effects. While individual treatment effects may be computed more efficiently and have a more precise meaning, their ability to generalize to similar inputs may be significantly reduced.
• Model output 𝑌ˆ. Most methods track changes in model predictions, while CXPlain observes how the input changes the error of the model prediction. Thus, CXPlain may be more useful for debugging, while the others may be more suitable for understanding model behavior.

3 METHODOLOGY
3.1 Causal Graph
Causal insufficiency of explanations in Fig. 1(a). From the previous section, we have seen that existing work adopts the causal graph in Fig. 1(a). The major issue of this framework is that the model prediction 𝑌ˆ is determined by both the explanation and the context; in other words, the explanation 𝐸 is not the sole cause of 𝑌ˆ. Thus, even if users have carefully checked the explanations, the problem remains as long as the specific context is a potential cause of the model prediction, and therefore the real, complete reason for the model prediction cannot be seen.
Solving the causal insufficiency issue. The causal insufficiency of explanations may be addressed by removing the context as a cause of the model prediction. Fig. 1(b) and (c) show two possible causal graphs that solve this issue. Here, 𝑋 denotes the random variable for input instances. 𝐸 and 𝑈 are unknown random variables for explanations and non-explanations respectively, where 𝐸 = 𝑥𝑒 means that the explanation for 𝑋 = 𝑥 is 𝑥𝑒, and 𝑈 = 𝑥𝑢 means that the non-explanation for 𝑋 = 𝑥 is 𝑥𝑢. In both causal graphs, 𝑌ˆ has only one parent (cause), which is the explanation, making the explanation sufficient for the model prediction.
Issue of explanations' generalizability. While both causal graphs allow explanations to be the only cause of model predictions, Fig. 1(b) fails to model the explanation's generalizability: in this causal graph, the explanation may change in arbitrary ways when 𝑋 changes. Generalizability is very important for model interpretability because it helps to foster human trust and reduce human effort. Taking a pathological detector as an example, it would be quite disconcerting if entirely different crucial regions of the same patient were detected at different sectional planes. Prominent common causes that can be generalized to various instances help avoid the high cost of repetitive explanation generation and human investigation for similar instances.
Our choice. Considering the above, we choose the causal graph in Fig. 1(c), which resembles the Domain Generalization causal graph [31] and follows its common cause principle to build a shared parent node (in our case 𝐸) for two statistically dependent variables (in our case 𝑋 and 𝑌ˆ). In this causal graph, it is evident that alterations to the non-explanatory variable 𝑈 have no impact on the explanation 𝐸 or the prediction 𝑌ˆ, only resulting in slight variations in 𝑋. This demonstrates the stability of the explanation across different instances of 𝑋 and its sufficiency as a cause for the model prediction 𝑌ˆ, as 𝐸 is the only determining factor (parent) of 𝑌ˆ.

3.2 Causality Inspired Problem Formulation
Given the causal graph in Fig. 1(c), we aim to learn the unobserved causal factors 𝐸 and 𝑈, where 𝐸 denotes the generalizable causal explanation for the model prediction, and 𝑈 denotes the non-explanation. Following the common assumption of existing feature-attribution-based explanations, we assume that 𝐸 and 𝑈 can be mapped into the input space of 𝑋. More specifically, we assume that 𝐸 is the set of features in 𝑋 that influences 𝑌ˆ, while 𝑈 = 𝑋\𝐸 is the other features in 𝑋 that are not included in 𝐸. Equivalently, 𝐸 and 𝑈 can be represented by learning masks 𝑀 over 𝑋:
• 𝐸 = 𝑀 ⊙ 𝑋, where ⊙ is element-wise multiplication, and 𝑀𝑖 = 1 means that the 𝑖-th feature in 𝑋 is included in the explanation.
• 𝑈 = (1 − 𝑀) ⊙ 𝑋, where 𝑀𝑖 = 0 means that the 𝑖-th feature in 𝑋 is included in the non-explanation.
Our goal is to learn a function 𝑔 : 𝑋 → 𝑀 that takes an instance 𝑋 = 𝑥 as input and outputs the mask representing the causal factor 𝐸 and the non-causal factor 𝑈. The function 𝑔 is the interpreter in this paper³.
In our work, we relax 𝑀 ∈ {0, 1} to [0, 1] (we do not discretize the probability vector of 𝑔), which not only guarantees end-to-end training of our neural interpreter but also distinguishes the different contributions of features to the output. We also try to discretize 𝑀 using a deep hashing technique [26]; see Section 4.6 for the comparison and discussion.
³ Although 𝑔 : 𝑋 → 𝑀 and the flow in Fig. 1(c) appear to be reversed, this is reasonable because 𝑀 = 𝑔(𝑥) is a normal symmetric equation. Since the direction of flows in our framework does not imply causal direction, defining 𝑔 is okay as long as 𝑋 → 𝑀 is a many-to-one (or one-to-one) mapping, which is exactly our case.

3.3 Optimization Principles and Modules
It is impractical to directly reconstruct the causal mechanism in the causal graph of Fig. 1(c), since the important causal factors are unobservable and ill-defined [31]. However, causal factors in causal graphs need to follow clear principles. We use the following two main principles in causal inference to devise desirable properties.
Principle 1. Humean Causality Principle [12]⁴: There exists a causal link 𝑥𝑖 → 𝑦ˆ if the use of all available information results in a more precise prediction of 𝑦ˆ than using information excluding 𝑥𝑖, all causes for 𝑦ˆ are available, and 𝑥𝑖 occurs prior to 𝑦ˆ.
⁴ Although the principle needs a clear occurrence order of variables, we follow it only to determine the relationship between 𝐸/𝑈 and 𝑌ˆ, which can be satisfied.
Principle 2. Independent Causal Mechanisms Principle [38]: The conditional distribution of each variable given its causes (i.e., its mechanism) does not inform or influence the other mechanisms.
Accordingly, we design three modules to ensure that the extracted explanations (causal factors) satisfy the basic properties required by Principles 1 and 2.
• Causal Sufficiency Module. Following Principle 1, we aim to discover an 𝐸 that is causally sufficient for 𝑌ˆ by ensuring that 𝐸 contains all the information needed to predict 𝑌ˆ and explains the dependency between 𝑋 and 𝑌ˆ. Similarly, we also ensure that 𝑈 is causally insufficient for predicting 𝑌ˆ.
• Causal Intervention Module. Following Principle 2, we ensure that 𝑈 and 𝐸 are independent by intervening on 𝑈 and guaranteeing that the learned 𝑔(𝑋) = 𝐸 does not change accordingly. This also allows us to find explanations that generalize better to varying cases.
• Causal Prior Module. Following Principle 1, we facilitate the learning of explanations by using potential causal hints as inputs to the interpreter and weakly supervising its output causal masks 𝑀. These learning priors enable faster and easier learning.

3.3.1 Causal Sufficiency Module. According to Principle 1, to ensure that 𝐸 is a sufficient cause of 𝑌ˆ, it is necessary to guarantee that 𝐸 is the most suitable feature for predicting 𝑌ˆ = 𝑓(𝑋), rather than the other features 𝑈. In other words, 𝑥𝑒 can always predict 𝑓(𝑥) through an optimal function 𝑓′ that maps the explanation 𝑥𝑒 to 𝑓(𝑥), while the non-explanation 𝑥𝑢 cannot give meaningful information for predicting 𝑓(𝑥). Accordingly, the causal sufficiency loss can be modeled as follows:
L𝑠′ = min_{𝑓′} E𝑥 [ℓ(𝑓(𝑥), 𝑓′(𝑥𝑒)) − ℓ(𝑓(𝑥), 𝑓′(𝑥𝑢))],   (2)
where ℓ(·) is the mean squared error loss, 𝑥𝑒 = 𝑔(𝑥) ⊙ 𝑥, 𝑥𝑢 = (1 − 𝑔(𝑥)) ⊙ 𝑥, and 𝑥 is sampled from the entire model input space.
In practice, finding the optimal 𝑓′ directly is very difficult due to the vast and sometimes even continuous input space. The interaction between optimizing 𝑓′ and the interpreter 𝑔 may also easily lead to unstable training and difficulty in converging to an optimal solution [8]. To address this issue, we approximate the optimal 𝑓′ by 𝑓, under the assumption that the difference between 𝑓′ and 𝑓 is minimal, considering that the explanation 𝑥𝑒 lies in the same space as the original model inputs 𝑋. By setting 𝑓′ to 𝑓, we are actually minimizing each individual treatment effect, which has a precise causal meaning. Besides, since we do not have to optimize 𝑓′, we may need far fewer samples 𝑥′ to optimize L𝑠′ and learn an interpreter 𝑔. In summary, the causal sufficiency loss L𝑠 is rewritten as follows:
L𝑠 = E𝑥 [ℓ(𝑓(𝑥), 𝑓(𝑥𝑒)) − ℓ(𝑓(𝑥), 𝑓(𝑥𝑢))],   (3)
where 𝑥𝑒 = 𝑔(𝑥) ⊙ 𝑥, and 𝑥𝑢 = (1 − 𝑔(𝑥)) ⊙ 𝑥.

3.3.2 Causal Intervention Module. Following Principle 2, we desire 𝑈 and 𝐸 to be independent, which makes it possible to find the invariant explanations of neighboring instances and improve the interpreter's generalizability. Despite the lack of true explanations for supervised training, we have the prior knowledge that the learned interpreter 𝑔 should be invariant to interventions on 𝑈, that is, 𝑑𝑜(𝑈) does not affect 𝐸. Based on this prior knowledge, we design a causal intervention loss to separate the explanations.
First, we describe how to intervene on 𝑈. Following common practice [31] in causal inference, we perturb the non-explanation 𝑥𝑢 via a linear interpolation between the non-explanation positions of the original instance 𝑥 and another instance 𝑥′ sampled randomly from 𝑋. The intervention paradigm is as follows:
𝑥𝑖𝑛𝑡 = 𝑔(𝑥) ⊙ 𝑥 + (1 − 𝑔(𝑥)) ⊙ ((1 − 𝜆) · 𝑥 + 𝜆 · 𝑥′),   (4)
where the first term is the invariant explanation, the second term is the intervened non-explanation, 𝜆 ∼ 𝑈(0, 𝜖), and 𝜖 limits the magnitude of the perturbation. Furthermore, we can optimize the following causal intervention loss to ensure that 𝑈 and 𝐸 are independent:
L𝑖 = E𝑥 ℓ(𝑔(𝑥), 𝑔(𝑥𝑖𝑛𝑡)).   (5)
This loss ensures that the generated explanations do not change before and after intervening on the non-explanations. This invariance property guarantees local consistency of explanations, i.e., interpreters should generate consistent explanations for neighboring (or similar) data points. This coincides with the smooth landscape assumption on the loss function of deep learning models [27], which may help to capture more generalizable features and improve the generalizability of the interpreter.
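A simplified sketch of the two losses above (again an illustration with toy stand-ins for 𝑓 and 𝑔, not the authors' implementation): the causal sufficiency loss of Eq. (3) rewards masks whose explanation part reproduces 𝑓(𝑥) while the non-explanation part does not, and the causal intervention loss of Eqs. (4) and (5) penalizes masks that change when only the non-explanation is perturbed.

```python
# Simplified sketch of the causal sufficiency loss (Eq. 3) and the causal
# intervention loss (Eqs. 4-5). The black box f and interpreter g below are toy
# stand-ins; only the loss logic follows the text.
import torch
import torch.nn.functional as F

def causal_sufficiency_loss(f, g, x):
    m = g(x)
    x_e, x_u = m * x, (1.0 - m) * x
    # The explanation should suffice to reproduce f(x); the non-explanation should not.
    return F.mse_loss(f(x_e), f(x)) - F.mse_loss(f(x_u), f(x))

def causal_intervention_loss(f, g, x, x_prime, eps=0.2):
    m = g(x)
    lam = torch.rand(1) * eps                                        # lambda ~ U(0, eps)
    x_int = m * x + (1.0 - m) * ((1.0 - lam) * x + lam * x_prime)    # Eq. (4)
    return F.mse_loss(g(x), g(x_int))                                # Eq. (5): mask stays invariant

if __name__ == "__main__":
    dim = 16
    f = torch.nn.Sequential(torch.nn.Linear(dim, 2))                 # toy "black box"
    g = torch.nn.Sequential(torch.nn.Linear(dim, 1), torch.nn.Sigmoid())
    x, x_prime = torch.randn(10, dim), torch.randn(10, dim)
    print(causal_sufficiency_loss(f, g, x).item(),
          causal_intervention_loss(f, g, x, x_prime).item())
```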


Figure 2: The framework of CIMI. The only trainable component is the decoder 𝜙, which is a simple neural network that can be trained with a relatively small number of samples. (The figure depicts the black-box model 𝑓 = 𝑓𝑑 ∘ 𝑓𝑒, the interpreter 𝑔 = 𝜙 ∘ 𝑓𝑒, and the three losses L𝑠, L𝑖, and L𝑝.)

3.3.3 Causal Prior Module. To facilitate the learning of the interpreter, we 1) inject potential causal hints into the neural network of the interpreter, and 2) design a weakly supervising loss on the output causal masks 𝑀.
Interpreter neural network design. A core challenge in XAI is that there is a lack of prior knowledge about which architecture should be used for the interpreter [42]. When we learn an interpreter with a neural network, it is difficult to decide which network structure should be used. If the architecture of 𝑔 is not as expressive and complex as the black-box model 𝑓, how can we be sure that 𝑔 has the ability to understand the original black box 𝑓? If 𝑔 is more complicated than 𝑓, then it is prone to slow training and overfitting.
Our solution to this problem is inspired by Principle 1, which states that causes (the model 𝑓) are more effective in predicting the effects (the explanation 𝑥𝑒). Hence, we generate the explanation 𝑥𝑒 by directly utilizing the parameters of the black-box model 𝑓. To achieve this, we use the encoding part of the black-box model 𝑓 (denoted as 𝑓𝑒) as the encoder in our interpreter model 𝑔. The decoder of 𝑔 is a simple neural network, denoted as 𝜙. The ease of learning is supported by information bottleneck theory, which states that information in each layer decreases as we progress through the model [13]. Therefore, the input 𝑥 contains the most information, while 𝑓𝑒(𝑥) contains less information, as the information deemed unnecessary for prediction has been removed. The final prediction and the ground-truth explanation use the least amount of information. Consequently, compared with 𝑋, the last embedding layer output 𝑓𝑒(𝑋) is a better indicator for finding the explanation.
Based on this observation, we design 𝜙 so that its input concatenates the encoded embedding 𝑓𝑒(𝑥) ∈ R^{|𝑥|×𝑑} and the original instance embedding 𝑣𝑥 ∈ R^{|𝑥|×𝑑} along axis 1, i.e., [𝑓𝑒(𝑥); 𝑣𝑥]₁ ∈ R^{|𝑥|×2𝑑}, where 𝑑 is the embedding dimension and the operator [𝑎; 𝑏]𝑖 denotes the axis 𝑖 along which matrices 𝑎 and 𝑏 are joined. Therefore, the decoder 𝜙 maps the input [𝑓𝑒(𝑥); 𝑣𝑥] ∈ R^{|𝑥|×2𝑑} to [0, 1]^{|𝑥|×1}, and the 𝑖-th dimension of the output represents the probability that token 𝑖 is used for the explanation. 𝜙 can be any neural network. In summary, the interpreter 𝑔 can be reformulated as
𝑔(𝑥) = 𝜙([𝑓𝑒(𝑥); 𝑣𝑥]₁).   (6)
By setting the encoder in 𝑔 to 𝑓𝑒, the architecture of 𝑔 can be as complicated as 𝑓, and such a complex structure helps to fully understand the model's decision-making mechanism. 𝑔 can also be considered simple, because the parameters of 𝑓𝑒 in 𝑔 are fixed and only the decoder 𝜙 is learnable, requiring only a few additional parameters (a 1-layer LSTM + 2-layer MLP in our paper) and avoiding the issues of overfitting and high training cost.
Weakly supervising loss. Without a further regularization loss on the causal factors, there exists a trivial solution (i.e., all explanation masks set to 1) that makes the interpreter collapse. A common regularization in causal discovery is a sparsity loss, which requires the number of involved causal factors to be small [5]. However, this sparsity loss may fail to adapt to the different requirements of different instances, as the constraint is the same for complicated and simple sentences, which makes it difficult to tune the hyper-parameters for different datasets.
To tackle this issue, we leverage noisy ground-truth labels as a prior for the causal factor 𝐸 to guide the learning process. Our approach is based on the intuition that the explanation for 𝑥 should contain more information about 𝑥 itself than about another instance 𝑥′. Using this, we derive a weakly supervising loss by maximizing the probability that a token in instance 𝑥 is included in 𝑥𝑒 while minimizing the probability that a token not in 𝑥 (noise) is predicted to be the explanation:
L𝑝 = E𝑥,𝑥′,𝑥≠𝑥′ log 𝜎(𝑔(𝑥) − 𝑔𝑥′(𝑥)),   (7)
where 𝑔𝑥′(𝑥) denotes decoding 𝑓𝑒(𝑥) against the embedding of 𝑥′ in Eq. 6, that is, 𝑔𝑥′(𝑥) = 𝜙([𝑓𝑒(𝑥); 𝑣𝑥′]₁). Correspondingly, 𝑔𝑥(𝑥) = 𝜙([𝑓𝑒(𝑥); 𝑣𝑥]₁) = 𝑔(𝑥), and the subscript in 𝑔𝑥(𝑥) is omitted for simplicity. This weakly supervising loss prevents the interpreter from overly optimistically predicting all tokens as explanations, which helps alleviate trivial solutions.

3.3.4 Overall Framework and Optimization. Overall loss function. Combining the above three modules, the overall optimization objective of CIMI is summarized as follows, and the framework is shown in Fig. 2:
min_𝜙 L𝑠 + 𝛼 L𝑖 + L𝑝,   (8)
where 𝛼 is the trade-off parameter. Notably, the weakly supervising loss is introduced to avoid the difficulty of tuning regularization parameters, so this term does not require a trade-off parameter.
Analysis of the framework. As shown in Fig. 2, the only trainable component in our framework is the simple decoder in the interpreter 𝑔, which uses a 1-layer LSTM (hidden size 64) and a 2-layer MLP (64 × 16 and 16 × 2). This enables us to learn the interpreter efficiently with a small number of forward propagations through 𝑓. The validity of our framework can be further verified by considering information bottleneck theory, which says that during forward propagation, a neural network gradually focuses on the most important parts of the input by filtering out, layer by layer, information that is not useful for prediction [13]. According to this theory, setting the first part of the interpreter to the encoder 𝑓𝑒 of the black-box model enables the interpreter to filter out a large portion of the noisy information that has already been filtered by the black-box encoder, thus allowing us to learn the explanations more efficiently and faithfully. A more formal description of the validity of our framework is given in Appendix C.
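The sketch below illustrates the interpreter of Eq. (6) and the weakly supervising term, with toy stand-ins for the frozen encoder 𝑓𝑒 and the token embeddings 𝑣𝑥. The 1-layer LSTM with hidden size 64 follows the text, while the MLP head is simplified to a single-probability output; the weakly supervising term is written here in its minimization form (a BPR-style −log 𝜎(·)), and everything else is an assumption rather than the released implementation.

```python
# Compact sketch of the CIMI interpreter of Eq. (6) and the objective of Eq. (8),
# with toy stand-ins for the black-box encoder f_e and token embeddings v_x.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """phi: maps [f_e(x); v_x] of shape (|x|, 2d) to a per-token probability in [0, 1]."""
    def __init__(self, d: int):
        super().__init__()
        self.lstm = nn.LSTM(2 * d, 64, batch_first=True)   # hidden size 64, as in the text
        self.mlp = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, enc, emb):
        h, _ = self.lstm(torch.cat([enc, emb], dim=-1))     # concatenate along the feature axis
        return torch.sigmoid(self.mlp(h))                   # g(x) = phi([f_e(x); v_x])

d, n = 32, 10                                 # embedding dim and sequence length (toy values)
f_e = nn.Linear(d, d)                         # stand-in for the frozen black-box encoder
for p in f_e.parameters():
    p.requires_grad = False                   # only the decoder phi is trainable
phi = Decoder(d)

v_x, v_xp = torch.randn(1, n, d), torch.randn(1, n, d)      # embeddings of x and x'
g_x = phi(f_e(v_x), v_x)                      # explanation mask for x
g_xp = phi(f_e(v_x), v_xp)                    # same encoding, decoded against x' (cf. Eq. 7)

# Weakly supervising term in minimization form (BPR-style pairwise loss):
loss_p = -F.logsigmoid(g_x - g_xp).mean()
# The full objective of Eq. (8) would add the sufficiency and intervention losses:
# loss = loss_s + alpha * loss_i + loss_p
print(loss_p.item())
```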


Table 1: Faithfulness comparison when explaining BERT. *** indicates that our method's improvement over the best baseline result is statistically significant at 𝑝 < 0.001. Each dataset reports DFFOT↓ / COMP↑ / SUFF↓.

Method | Clickbait | Hate | Yelp | IMDB
Gradient | 0.5139 / 0.3651 / 0.1308 | 0.2776 / 0.4880 / 0.1324 | 0.4245 / 0.1497 / 0.2900 | 0.3216 / 0.1390 / 0.3999
Attention | 0.5247 / 0.3655 / 0.1213 | 0.5933 / 0.2719 / 0.2718 | 0.5890 / 0.0809 / 0.3935 | 0.5459 / 0.0453 / 0.4496
AXAI | 0.5234 / 0.3641 / 0.1245 | 0.4738 / 0.3210 / 0.2299 | 0.5120 / 0.1115 / 0.3088 | 0.4449 / 0.0891 / 0.4111
Probing | 0.5278 / 0.3606 / 0.1249 | 0.5679 / 0.3013 / 0.2462 | 0.7133 / 0.0671 / 0.4392 | 0.6535 / 0.0445 / 0.4531
LIME | 0.3994 / 0.4374 / 0.0778 | 0.2800 / 0.4860 / 0.1441 | 0.3346 / 0.2362 / 0.3201 | 0.2777 / 0.1953 / 0.4078
KernelSHAP | 0.4447 / 0.4183 / 0.0725 | 0.4012 / 0.3963 / 0.1897 | 0.5484 / 0.0992 / 0.3488 | 0.5189 / 0.0565 / 0.4297
Rationale | 0.5250 / 0.3651 / 0.1226 | 0.5937 / 0.2719 / 0.2719 | 0.5963 / 0.0838 / 0.3892 | 0.5501 / 0.0420 / 0.4533
CXPlain | 0.4505 / 0.4092 / 0.0952 | 0.3796 / 0.4414 / 0.1438 | 0.4544 / 0.2287 / 0.3121 | 0.2894 / 0.3273 / 0.4094
Smask | 0.5268 / 0.3561 / 0.1320 | 0.6121 / 0.2722 / 0.2735 | 0.5894 / 0.0839 / 0.3863 | 0.5594 / 0.0446 / 0.4492
CIMI | 0.3826 / 0.4612 / 0.0416 | 0.2761 / 0.5022 / 0.1497 | 0.1896 / 0.3100 / 0.2500 | 0.1270 / 0.3270 / 0.3516
t-test | *** *** *** *** *** *** *** *** *** ***

Relationship with disentanglement. Our method falls into the group of methods that disentangle the causal effects of variables [18], while most existing disentanglement methods focus on disentangling the latent representations [53]. The additional causal perspective is essential for ensuring the extraction of common causes of model predictions, which improves both explanation faithfulness and generalizability. To better illustrate the effectiveness, we implement a variant of CIMI that uses the representation disentanglement loss [53]; it can be found in Appendix E.7.

4 EXPERIMENTS
4.1 Experimental Setup
4.1.1 Datasets. We use four datasets from the natural language processing domain, including Clickbait [2], Hate [9], Yelp [56], and IMDB [32]. See Appendix D.1 for their details.
4.1.2 Black-box Models and Baselines. Although pre-trained models are brilliant in many fields, it is difficult to answer what information they encode [25]. For this reason, we choose two well-known pre-trained models, BERT [10] and RoBERTa [29], as the black-box models to be explained. Notably, the main body shows the experimental results on BERT, and the results for RoBERTa can be found in Appendix E.2 and E.3.
We compare CIMI with Gradient [47], Attention [3], LIME [40], KernelSHAP [30], Rationale [24], Probing [1], CXPlain [42], AXAI [39], and Smask [27]. Their details can be found in Appendix D.3. Although there are some works [28, 37] that can be used to extract causal explanations, they often make strict assumptions about the underlying data format, so they cannot be compared fairly; we put that comparison in Appendix E.1.
4.1.3 Evaluation Metrics. First, we evaluate causal sufficiency using three faithfulness metrics (see Appendix D.4 for more details).
Decision Flip - Fraction of Tokens (DFFOT) [43], which measures the minimum fraction of important tokens that need to be erased in order to change the model prediction.
Comprehensiveness (COMP) [11], which measures the faithfulness score by the change in the output probability of the originally predicted class after the important tokens are removed.
Sufficiency (SUFF) [11], which, in contrast to COMP, keeps only the important tokens and compares the change in output probability for the originally predicted class. Notably, this metric is not equivalent to the causal sufficiency we focus on.
The number of important tokens is selected from {1, 5, 10, 20, 50} and the average performance is reported. Notably, SUFF and COMP have been shown to be more faithful to the model prediction than other metrics [6]. In addition to the above metrics, we use AvgSen to measure the explanation's generalizability.
Average Sensitivity (AvgSen) [52], which measures the average sensitivity of an explanation when the input is perturbed. In our experiments, we replace 5 tokens per instance and calculate the sensitivity of the top-10 important tokens.
In addition, we present generated explanation instances in Appendix E.9 for a more intuitive evaluation.
4.1.4 Parameter Settings. For the two pre-trained models used, BERT and RoBERTa, we add a two-layer MLP as the decoder for downstream tasks. On all four datasets, optimization is based on Adam with a learning rate of 1e-5, the number of training epochs is 20, and the batch size is 8. For the proposed CIMI, unless otherwise specified, we train 100 epochs on Clickbait and Hate, and 50 epochs on the other two larger datasets to improve efficiency. The trade-off parameter 𝛼 is set to 1, 1, 1, and 0.1 on Clickbait, Hate, Yelp, and IMDB, respectively. In addition, the perturbation magnitude 𝜖 in the causal intervention module is set to 0.2.

4.2 Faithfulness Comparison
In this section, we evaluate the causal sufficiency of the explanations using the faithfulness metrics. Table 1 summarizes the average results of 10 independent repeated experiments.
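As a concrete reading of the faithfulness metrics in Section 4.1.3, the following simplified sketch (not the official evaluation code; the masking convention and toy probability function are assumptions) computes COMP and SUFF for a single instance given per-token importance scores.

```python
# Simplified sketch of the COMP and SUFF faithfulness metrics for one instance:
# COMP removes the top-k important tokens, SUFF keeps only them, and both measure
# the change in the probability of the originally predicted class.
from typing import Callable, List

def comp_and_suff(prob_fn: Callable[[List[str]], float],
                  tokens: List[str],
                  importance: List[float],
                  k: int,
                  mask_token: str = "[MASK]"):
    p_orig = prob_fn(tokens)
    top_k = sorted(range(len(tokens)), key=lambda i: -importance[i])[:k]
    removed = [mask_token if i in top_k else t for i, t in enumerate(tokens)]
    kept = [t if i in top_k else mask_token for i, t in enumerate(tokens)]
    comp = p_orig - prob_fn(removed)   # large drop => the explanation was comprehensive
    suff = p_orig - prob_fn(kept)      # small drop => the explanation suffices on its own
    return comp, suff

if __name__ == "__main__":
    positive = {"great", "fun"}
    toy_prob = lambda toks: sum(t in positive for t in toks) / max(len(toks), 1)
    toks = ["a", "great", "and", "fun", "movie"]
    scores = [0.0, 0.9, 0.1, 0.8, 0.2]
    print(comp_and_suff(toy_prob, toks, scores, k=2))
```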


Figure 3: Performance comparison concerning COMP under different explanation lengths: (a) Clickbait, (b) Hate, (c) Yelp, (d) IMDB.

Table 2: Generalizability comparison under BERT (AvgSen↓). IMP(%) indicates the relative improvement of our method over each baseline; AVG_IMP(%) is the average over the four datasets.

Method | Clickbait AvgSen (IMP%) | Hate AvgSen (IMP%) | Yelp AvgSen (IMP%) | IMDB AvgSen (IMP%) | AVG_IMP(%)
Gradient | 0.2182 (43.18) | 0.4530 (123.68) | 0.7934 (561.19) | 0.8088 (612.60) | 335.16
Attention | 0.2155 (41.42) | 0.5413 (167.29) | 0.8642 (620.18) | 0.9482 (735.44) | 391.09
LIME | 0.2036 (33.59) | 0.4689 (131.56) | 0.7880 (556.68) | 0.8555 (653.74) | 343.89
KernelSHAP | 0.2022 (32.66) | 0.4989 (146.38) | 0.8280 (589.96) | 0.9180 (708.82) | 369.45
Rationale | 0.2163 (41.93) | 0.5440 (168.64) | 0.8650 (620.87) | 0.9497 (736.73) | 392.04
Probing | 0.1460 (-4.23) | 0.2093 (3.34) | 0.2953 (146.08) | 0.2873 (153.14) | 74.58
CXPlain | 0.2101 (37.86) | 0.5066 (150.19) | 0.8135 (577.95) | 0.8315 (632.58) | 349.65
AXAI | 0.2146 (40.79) | 0.5127 (153.17) | 0.8221 (585.08) | 0.9103 (702.05) | 370.27
Smask | 0.2179 (42.98) | 0.5532 (173.20) | 0.8662 (621.80) | 0.9468 (734.16) | 393.03
CIMI | 0.1524 | 0.2025 | 0.1200 | 0.1135 |

First, it can be seen that the proposed method achieves the best or comparable results compared with the baselines on all datasets. In particular, the improvement is more pronounced on more complex datasets (from Clickbait to IMDB). For example, the improvement over the best baseline reaches 119% on IMDB w.r.t. the DFFOT metric. This property is valuable because it accommodates the trend toward increasingly complex black-box models, and it verifies that CIMI can generate explanations that are more faithful to the model. Second, Gradient has impressive performance in some cases, which indicates that its linear assumption can reflect the model's decision-making process to some extent. Third, among the perturbation-based methods, LIME, KernelSHAP, and CXPlain all show satisfactory performance, especially LIME, which is based on local linear approximation; this once again supports the first finding regarding the (local) linearity of the model.
In addition, we also illustrate the performance w.r.t. COMP under different explanation lengths, as shown in Fig. 3 (similar findings are obtained for SUFF). The experimental results show that regardless of explanation length, CIMI exhibits significant competitiveness. The above results demonstrate the power of the causal principle constraints in CIMI for understanding model predictions.

4.3 Generalizability Comparison
We use AvgSen to evaluate the generalizability of the explanations to neighboring (similar) samples. Admittedly, for AvgSen, some important tokens included in the explanation may themselves be replaced, but the probability is low, especially in Yelp and IMDB, which have more tokens. The results are summarized in Table 2. It can be seen that the explanations generated by CIMI are the most generalizable. Specifically, on all four datasets, at least 8 of the top-10 important tokens before and after perturbation are consistent, which no other method achieves. Besides, as the dataset becomes more complex, our performance remains stable while the baselines degrade significantly. These results demonstrate the outstanding ability of the proposed method to capture invariant, generalizable features. Additionally, we conduct a generalizability evaluation in attack settings [45, 50]; see Appendix E.8 for the comparison.
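For intuition, the sketch below computes an AvgSen-style sensitivity score under the assumption that sensitivity is measured by how much the set of top-𝑘 important tokens changes after randomly replacing a few input tokens; the exact definition follows [52] and Appendix D.4, and the toy explainer and replacement vocabulary are placeholders.

```python
# Rough sketch of an AvgSen-style sensitivity measure: perturb a few tokens and
# compare the set of top-k important positions before and after (0 = fully stable).
from typing import Callable, List
import random

def top_k_set(importance: List[float], k: int) -> set:
    return set(sorted(range(len(importance)), key=lambda i: -importance[i])[:k])

def sensitivity(explain: Callable[[List[str]], List[float]],
                tokens: List[str], k: int = 10, n_replace: int = 5,
                vocab=("the", "a", "of")) -> float:
    before = top_k_set(explain(tokens), k)
    perturbed = tokens.copy()
    for i in random.sample(range(len(tokens)), min(n_replace, len(tokens))):
        perturbed[i] = random.choice(list(vocab))      # replace a few tokens
    after = top_k_set(explain(perturbed), k)
    return 1.0 - len(before & after) / k               # fraction of the top-k that changed

if __name__ == "__main__":
    random.seed(0)
    toy_explainer = lambda toks: [float(len(t)) for t in toks]  # toy importance scores
    print(sensitivity(toy_explainer, "this movie was surprisingly good".split(),
                      k=3, n_replace=2))
```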


4.4 Effectiveness of Causal Modules
4.4.1 Effectiveness w.r.t. Faithfulness. In this section, we verify the effectiveness of the three proposed causal modules with respect to faithfulness. We define the versions that remove the causal sufficiency loss, the causal intervention loss, the weakly supervising loss, and the interpreter's encoder 𝑓𝑒 as CIMI-s, CIMI-i, CIMI-p, and CIMI-f, respectively. Their effects on faithfulness are shown in Fig. 4. Overall, removing any module leads to performance degradation, which justifies the design of the three modules. Specifically, first, we find that removing the causal sufficiency module (CIMI-s) hurts faithfulness, which matches the original intention of this module: ensuring that the explanations 𝐸 are causally sufficient for the model predictions 𝑌ˆ. Second, the impact of the causal intervention module (CIMI-i) on faithfulness is marginal, since this module is designed primarily for the explanation's generalizability. Finally, both the weakly supervising loss and the interpreter's encoder design in the causal prior module help the model learn more easily.
Figure 4: The effect of the causal modules on faithfulness (COMP and SUFF) on (a) Clickbait, (b) Hate, (c) Yelp, and (d) IMDB.
4.4.2 Effectiveness w.r.t. Generalizability. In this section, we discuss the effect of the three causal modules on generalizability. Keeping the experimental settings consistent with Section 4.3, the results are illustrated in Fig. 5. First, we find that the causal sufficiency module helps improve generalizability against perturbations on Clickbait and Hate, but significantly degrades performance on Yelp and IMDB. We suspect that on the latter two datasets, the CIMI-s explanations are not faithful to the model prediction (Fig. 4 (c)(d)); it is then difficult to capture the invariant explanations of model decisions from similar instances, resulting in a decrease in generalizability. Second, only CIMI-i's performance decreases consistently across the four datasets. This is because the causal intervention module aims to make 𝑈 and 𝐸 independent so that the explanation can generalize to similar instances, that is, generalizability. Removing it results in performance degradation, justifying the rationality of this module's design.
Figure 5: The effect of the causal modules on generalizability (AvgSen) on Clickbait, Hate, Yelp, and IMDB.

4.5 Sampling Efficiency
Fig. 6 illustrates the performance of various perturbation-based methods under the same number of forward propagations, to measure sampling efficiency. First, CXPlain's explanation mechanism perturbs each sample 𝑥 at most |𝑥| times, so it shows high efficiency on small datasets, e.g., Clickbait and Hate. However, it is meaningless to talk about efficiency without explanation quality. Second, LIME performs well on small datasets (e.g., Clickbait); however, as the dataset becomes more complex, more sampling is required to generate high-quality explanations. Rationale's training is unstable and prone to trivial solutions, resulting in insensitivity to the number of samplings. Finally, our method significantly outperforms the baselines, especially on Hate, where we only need 3 samplings to outperform baselines with 100 samplings. This benefits from the generalization of the neural network under the constraints of the causal principles, which summarizes causal laws from a large number of data points and generalizes them to different instances, ultimately improving efficiency.
Figure 6: Performance comparison with different numbers of samplings (perturbations) w.r.t. SUFF on (a) Clickbait, (b) Hate, (c) Yelp, and (d) IMDB.

4.6 Performance Comparison of Different Discretization Strategies
In CIMI, we use the Softmax function for differentiable training. In this section, we add two variants of our method that discretize 𝑈 and 𝐸 by using Gumbel-Softmax [17] and Deep Hash Learning [4, 26], denoted as CIMI-Gumbel and CIMI-DHL, respectively. As shown in Fig. 7, discrete masks help improve explanation generalizability at the expense of a significant performance decline w.r.t. faithfulness. Specifically, in most cases, Deep Hash Learning contributes to explanations' generalizability (AvgSen), because the change in the mask value domain ([0, 1] → {0, 1}) makes the explanations insensitive to noise. However, a discrete mask cannot distinguish the relative importance of features (e.g., when the probabilities of two words being explanations are 0.9 and 0.6, respectively, they are considered indistinguishable explanations after discretization), leading to a significant decline on faithfulness tests that require a correct ordering of features according to their relative importance. These analyses help to better understand the motivation for differentiable training.
Figure 7: Performance comparison under different discretization strategies (COMP, SUFF, and AvgSen) on (a) Clickbait, (b) Hate, (c) Yelp, and (d) IMDB.

4.7 Usefulness Evaluation
In addition to allowing us to better understand the model, explanations can also assist people in debugging the model. Noisy data collection can cause the model to learn wrong correlations during training. To this end, this section analyzes the effectiveness of various explanation methods in removing shortcut features. We use a subset of 20 Newsgroups that classifies "Christianity" versus "Atheism". The reason for choosing this dataset is that there are many shortcut features in its training set, but the test set is clean. For example, 99% of the training instances in which the word "posting" appears belong to the category "Atheism".
To test whether an explanation method can help detect shortcuts, we first train a BERT model on the noisy training set. Then, we obtain explanations from the different methods and treat a token in the explanation as a potential shortcut if it does not appear in the clean test set (more details in Appendix F). We then retrain the classification model after removing the shortcuts. The metric for evaluating the quality of the shortcuts is the retrained model's performance (better classification performance implies that the shortcuts found are more accurate). The result is shown in Fig. 8. First, both explanation methods can effectively remove shortcuts to improve model performance. Second, the improvement of the proposed CIMI is more obvious, verifying its usefulness for debugging models.
Figure 8: Usefulness evaluation. Classification performance (loss and accuracy over training epochs) before and after deleting shortcuts, comparing the original model with LIME- and CIMI-guided shortcut removal.

5 RELATED WORK
Existing explainable works can be divided into self-explaining methods and post-hoc methods [33].
Self-explaining methods focus on building model architectures that are self-explainable and transparent [51], such as decision trees [22], rule-based models [49], and self-attention mechanisms [3, 55]. In order to provide rules that are easy for humans to understand, these models are often too simple to enjoy both interpretability and predictive performance [23]. Recently, methods that integrate add-on modules have received increasing attention [7, 15, 20, 23]. However, the process of generating explanations remains opaque.
Post-hoc interpretation has received more attention as models have gradually evolved into incomprehensible, highly nonlinear forms [51]. Gradient-based methods [30, 44, 47, 48, 54] approximate the deep model as a linear function and accordingly use the gradient as feature importance. Admittedly, the gradient is only an approximation of the decision sensitivity. The influence function [21] has also been introduced to understand models; it efficiently approximates the impact of perturbations of the training data through a second-order optimization strategy. Recently, causal interpretability has attracted increasing attention because it focuses on a fundamental question in XAI: whether existing explanations capture spurious correlations or remain faithful to the underlying causes of model behavior. First, many well-known perturbation-based methods such as Shapley values [30], LIME [40], and Smask [27] implicitly use causal inference, and their explanation scores correspond exactly to the (average) treatment effect [36]. From a causal point of view, the slight differences between them lie only in the number of features selected, the contextual information considered, and the model output. CXPlain [42] explicitly considers that non-informative features should have no effect on model predictions.

6 CONCLUSION
We reinterpreted some classic methods from a causal inference perspective and analyzed their pros and cons from this unified view. Then, we revealed the major challenges in leveraging causal inference for interpretation: causal sufficiency and generalizability. Finally, based on a suitable causal graph and important causal principles, we devised training objectives and desirable properties for our neural interpreters and presented an efficient solution, CIMI. Through extensive experiments, we demonstrated the superiority of the proposed method in terms of the explanation's causal sufficiency and generalizability, and additionally explored the potential of explanation methods to help debug models.

ACKNOWLEDGMENTS
The work was supported by grants from the National Key R&D Program of China (No. 2021ZD0111801) and the National Natural Science Foundation of China (No. 62022077).

REFERENCES
[1] Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, and Yoav Goldberg. 2017. Fine-grained analysis of sentence embeddings using auxiliary prediction tasks. In Proceedings of ICLR'17.
[2] Aman Anand. 2020. Clickbait Dataset. https://www.kaggle.com/datasets/amananandrai/clickbait-dataset.
[3] David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Network dissection: Quantifying interpretability of deep visual representations. In Proceedings of CVPR'17. 6541-6549.
[4] Yoshua Bengio, Nicholas Léonard, and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013).
[5] Philippe Brouillard, Sébastien Lachapelle, Alexandre Lacoste, Simon Lacoste-Julien, and Alexandre Drouin. 2020. Differentiable causal discovery from interventional data. Advances in Neural Information Processing Systems 33 (2020), 21865-21877.
[6] Chun Sik Chan, Huanqi Kong, and Liang Guanqing. 2022. A Comparative Study of Faithfulness Metrics for Model Interpretability Methods. In Proceedings of ACL'22. 5029-5038.
[7] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. NeurIPS'16 (2016).
[8] Zhongxia Chen, Xiting Wang, Xing Xie, Tong Wu, Guoqing Bu, Yining Wang, and Enhong Chen. 2019. Co-attentive multi-task learning for explainable recommendation. In IJCAI. 2137-2143.
[9] Thomas Davidson, Dana Warmsley, Michael Macy, and Ingmar Weber. 2017. Automated hate speech detection and the problem of offensive language. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 11. 512-515.
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[11] Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C Wallace. 2020. ERASER: A Benchmark to Evaluate Rationalized NLP Models. Transactions of the Association for Computational Linguistics (2020).
[12] Clive WJ Granger. 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society (1969), 424-438.
[13] Chaoyu Guan, Xiting Wang, Quanshi Zhang, Runjin Chen, Di He, and Xing Xie. 2019. Towards a deep and unified understanding of deep neural models in NLP. In International Conference on Machine Learning. PMLR, 2454-2463.
[14] JB Heaton, Nicholas G Polson, and Jan Hendrik Witte. 2016. Deep learning in finance. arXiv preprint arXiv:1602.06561 (2016).
[15] Irina Higgins et al. 2017. beta-VAE: Learning basic visual concepts with a constrained variational framework. In Proceedings of ICLR'17.
[16] Xu Huang, Defu Lian, Jin Chen, Liu Zheng, Xing Xie, and Enhong Chen. 2023. Cooperative Retriever and Ranker in Deep Recommenders. In Proceedings of the ACM Web Conference 2023. 1150-1161.
[17] Eric Jang, Shixiang Gu, and Ben Poole. 2016. Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144 (2016).
[18] Olivier Jeunen, Ciarán Gilligan-Lee, Rishabh Mehrotra, and Mounia Lalmas. 2022. Disentangling causal effects from sets of interventions in the presence of unobserved confounders. Advances in Neural Information Processing Systems 35 (2022), 27850-27861.
[19] Yahui Jiang, Meng Yang, Shuhao Wang, Xiangchun Li, and Yan Sun. 2020. Emerging role of deep learning-based artificial intelligence in tumor pathology. Cancer Communications 40, 4 (2020), 154-166.
[20] Diederik P Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013).
[21] Pang Wei Koh and Percy Liang. 2017. Understanding black-box predictions via influence functions. In International Conference on Machine Learning. PMLR, 1885-1894.
[22] Himabindu Lakkaraju, Stephen H Bach, and Jure Leskovec. 2016. Interpretable decision sets: A joint framework for description and prediction. In Proceedings of KDD'16. 1675-1684.
[23] Seungeon Lee, Xiting Wang, Sungwon Han, Xiaoyuan Yi, Xing Xie, and Meeyoung Cha. 2022. Self-explaining deep models with logic rule reasoning. In Advances in Neural Information Processing Systems.
[24] Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. In Proceedings of EMNLP'16. 107-117.
[25] Zhen Li, Xiting Wang, Weikai Yang, Jing Wu, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun, Hui Zhang, and Shixia Liu. 2022. A unified understanding of deep NLP models for text classification. IEEE Transactions on Visualization and Computer Graphics 28, 12 (2022), 4980-4994.
[26] Defu Lian, Haoyu Wang, Zheng Liu, Jianxun Lian, Enhong Chen, and Xing Xie. 2020. LightRec: A memory and search-efficient recommender system. In Proceedings of WWW'20. 695-705.
[27] Dohun Lim, Hyeonseok Lee, and Sungchan Kim. 2021. Building reliable explanations of unreliable neural networks: Locally smoothing perspective of model interpretation. In Proceedings of CVPR'21. 6468-6477.
[28] Wanyu Lin, Hao Lan, Hao Wang, and Baochun Li. 2022. OrphicX: A causality-inspired latent variable model for interpreting graph neural networks. In Proceedings of CVPR'22. 13729-13738.
[29] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[30] Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 30 (2017).
[31] Fangrui Lv, Jian Liang, Shuang Li, Bin Zang, Chi Harold Liu, Ziteng Wang, and Di Liu. 2022. Causality Inspired Representation Learning for Domain Generalization. In Proceedings of CVPR'22. 8046-8056.
[32] Andrew Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of ACL'11. 142-150.
[33] Andreas Madsen, Siva Reddy, and Sarath Chandar. 2022. Post-hoc interpretability for neural NLP: A survey. Comput. Surveys 55, 8 (2022), 1-42.
[34] Prashan Madumal, Tim Miller, Liz Sonenberg, and Frank Vetere. 2020. Explainable reinforcement learning through a causal lens. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 2493-2500.
[35] Raha Moraffah, Mansooreh Karami, Ruocheng Guo, Adrienne Raglin, and Huan Liu. 2020. Causal interpretability for machine learning - problems, methods and evaluation. ACM SIGKDD Explorations Newsletter 22, 1 (2020), 18-33.
[36] Brady Neal. 2020. Introduction to causal inference from a machine learning perspective. Course Lecture Notes (draft) (2020).
[37] Matthew O'Shaughnessy, Gregory Canal, Marissa Connor, Christopher Rozell, and Mark Davenport. 2020. Generative causal explanations of black-box classifiers. Advances in Neural Information Processing Systems 33 (2020), 5453-5467.
[38] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. 2017. Elements of Causal Inference: Foundations and Learning Algorithms. The MIT Press.
[39] Arash Rahnama and Andrew Tseng. 2021. An adversarial approach for explaining the predictions of deep neural networks. In Proceedings of CVPR'21. 3253-3262.
[40] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of KDD'16. 1135-1144.
[41] Jonathan G Richens, Ciarán M Lee, and Saurabh Johri. 2020. Improving the accuracy of medical diagnosis with causal machine learning. Nature Communications 11, 1 (2020), 3923.
[42] Patrick Schwab and Walter Karlen. 2019. CXPlain: Causal explanations for model interpretation under uncertainty. NeurIPS'19 (2019).
[43] Sofia Serrano and Noah A Smith. 2019. Is Attention Interpretable? In Proceedings of ACL'19. 2931-2951.
[44] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013).
[45] Dylan Slack, Sophie Hilgard, Emily Jia, Sameer Singh, and Himabindu Lakkaraju. 2020. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 180-186.
[46] Steven Sloman. 2005. Causal Models: How People Think About the World and Its Alternatives. Oxford University Press.
[47] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. 2017. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017).
[48] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In International Conference on Machine Learning. PMLR, 3319-3328.
[49] P.-N. Tan. 2018. Introduction to Data Mining. India.
[50] Chenwang Wu, Defu Lian, Yong Ge, Zhihao Zhu, and Enhong Chen. 2023. Influence-Driven Data Poisoning for Robust Recommender Systems. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).
[51] G Xu, TD Duong, Q Li, S Liu, and X Wang. 2020. Causality Learning: A New Perspective for Interpretable Machine Learning. IEEE Intelligent Informatics Bulletin (2020).
[52] Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I Inouye, and Pradeep K Ravikumar. 2019. On the (in)fidelity and sensitivity of explanations. Advances in Neural Information Processing Systems 32 (2019).
[53] Linan Yue, Qi Liu, Yichao Du, Yanqing An, Li Wang, and Enhong Chen. 2022. DARE: Disentanglement-Augmented Rationale Extraction. Advances in Neural Information Processing Systems 35 (2022), 26603-26617.
[54] Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision. Springer, 818-833.
[55] Quanshi Zhang, Ying Nian Wu, and Song-Chun Zhu. 2018. Interpretable convolutional neural networks. In Proceedings of CVPR'18. 8827-8836.


[56] Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. NeurIPS'15 (2015).
[57] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. 2023. A Survey of Large Language Models. arXiv:2303.18223 [cs.CL].

A SUPPLEMENTARY MATERIAL
The source code of CIMI is available at https://github.com/Daftstone/CIMI, and the full version of the paper (including the main text and all appendices) is available at https://github.com/Daftstone/CIMI/blob/master/paper.pdf.
