The document discusses the trade-off between accuracy and interpretability in machine learning models, highlighting various methods to achieve interpretability such as building interpretable models and deriving explanations for complex models. It categorizes interpretable machine learning methods based on agnosticity, scope, and explanation type, covering techniques like Permutation-based Feature Importance, Partial Dependence Plots (PDP), LIME, and SHAP. Each method is summarized with its explanation type, scope, and agnosticity, providing insights into their applications and implementations.

Interpretable Machine Learning (MSBA 7027)

Zhengli Wang

Faculty of Business and Economics


The University of Hong Kong
2021

1
Accurate vs Interpretable: a Tradeoff
[Figure: models arranged along an accuracy-interpretability tradeoff curve, from most accurate / least interpretable to least accurate / most interpretable: Ensemble algorithms / Neural Networks, Random Forest / Boosting, Support Vector Machine, Nonlinear Regression, K-Nearest Neighbors, Decision Trees, Linear Regression. Axes: Accuracy (vertical) vs Interpretability (horizontal).]

Highly accurate models
• Non-linear & non-smooth relationships
• Long computation time

Highly interpretable models
• Linear and smooth relationships
• Short computation time

Simple linear model: easily interpreted, but predictions are not accurate for complex problems
Complex nonlinear model: better performance, but too complex for humans to understand

Source: Machine learning for 5G/B5G Mobile and Wireless Communications: Potential, Limitations and Future Directions
2
Achieve Interpretability: Two Options

• Option 1: Build interpretable ML models → Model-based
• Option 2: Derive explanations for complex ML models → Post-hoc (the opposite of ad-hoc)

3
Categorization of Interpretable ML Methods: Overview

Agnosticity
• Model-agnostic: applicable to all model types
• Model-specific: only applicable to a specific model type

Scope
• Global explanation: explaining the whole model
• Local explanation: explaining individual predictions

Explanation type
• Visual
• Feature importance
• Surrogate

4
Categorization of Interpretable ML Methods: Agnosticity

• Model-agnostic: e.g. Permutation-based feature importance, PDP, LIME, SHAP
• Model-specific: e.g. Impurity-based feature importance (only applicable to tree-based methods, e.g. RF, GBM)

5
Categorization of Interpretable ML Methods: Agnosticity

• Model-specific: e.g. Linear regression coefficients, tree split variables

6
Categorization of Interpretable ML Methods: Explanation Type

• Visual: e.g. PDP
• Feature importance: e.g. Impurity-based feature importance, Permutation-based feature importance

7
Achieve Interpretability: Two Options

• Option 1: Build interpretable ML models → Model-based
• Option 2: Derive explanations for complex ML models → Post-hoc (the opposite of ad-hoc)

8
Build Interpretable ML Models
In many instances, simple models suffice; we do NOT always need complex models
• Linear regression
  • learns the coefficients (β) for a weighted sum of the feature inputs:
    $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$
  • the impact of each input can be interpreted directly from its coefficient (see the R sketch below)

• Decision Tree

Source: https://www.vebuso.com/2020/01/decision-tree-intuition-from-concept-to-application/
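For concreteness, a minimal R sketch of the linear-regression case (not code from the slides; it assumes the Ames housing data in a data frame named ames with columns Sale_Price, Gr_Liv_Area and Year_Built):

# Fit a simple linear model and read the coefficients directly
fit <- lm(Sale_Price ~ Gr_Liv_Area + Year_Built, data = ames)
coef(fit)      # beta_0 (intercept) plus one beta_j per feature
summary(fit)   # adds standard errors and significance for each coefficient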
9
Achieve Interpretability: Two Options

• Option 1: Build interpretable ML models → Model-based
• Option 2: Derive explanations for complex ML models → Post-hoc (the opposite of ad-hoc)

The methods covered next are Model-Agnostic post-hoc methods.

10
Outline

• Permutation-based Feature Importance

• PDP

• LIME (Local Interpretable Model-Agnostic Explanations)

• SHAP (SHapley Additive exPlanations)

11
Permutation-based Feature Importance

Idea
• If a feature is important, randomly permuting its values in the training data will make the resulting model noticeably worse
• If a feature is NOT important, permuting its values will likely keep the model error relatively unchanged

12
Permutation-based Feature Importance
Example (a sketch of the computation follows below)
• Sample 50% of the training data (say 1000 observations) and fix a feature
• Use the 1st row to compute the benchmark loss
• Then permute the value of that feature (to each of the feature's values among the 1000 obs), compute the loss, and take the difference from the original loss
• Repeat for every row (for a total of 1000 rows)
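A from-scratch R sketch of the basic permutation idea (my illustration, not the slide's code: it shuffles the whole feature column once per repetition rather than replacing values row by row; f_predict, data, y and feature are placeholder names):

perm_importance <- function(f_predict, data, y, feature, n_rep = 5) {
  rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
  baseline <- rmse(y, f_predict(data))                   # benchmark loss on the original data
  diffs <- replicate(n_rep, {
    permuted <- data
    permuted[[feature]] <- sample(permuted[[feature]])   # randomly permute one feature
    rmse(y, f_predict(permuted)) - baseline              # increase in loss after permuting
  })
  mean(diffs)                                            # average over repetitions
}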

13
Permutation-based Feature Importance

Implementation: vip package

• Sample observations before permuting features

• Repeat simulations many times (stabilize estimate)

14
Permutation-based Feature Importance
[Code screenshot (not reproduced): a prediction wrapper that takes the model object & newdata and returns the predictions; permutation importance is computed on a 50% sample of the training data, with simulations repeated 5 times. A hedged sketch follows.]
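A hedged R sketch of what such code might look like (assuming an h2o model h2o_fit trained on ames_train with response Sale_Price; argument names follow my reading of the vip documentation, so check ?vip::vi_permute):

library(vip)
library(h2o)

# Prediction wrapper: takes the model object and newdata, returns a numeric vector of predictions
pred_fun <- function(object, newdata) {
  as.vector(h2o.predict(object, newdata = as.h2o(newdata))$predict)
}

set.seed(123)
vip(
  h2o_fit,
  method       = "permute",     # permutation-based feature importance
  train        = ames_train,
  target       = "Sale_Price",
  metric       = "rmse",
  pred_wrapper = pred_fun,
  sample_frac  = 0.5,           # sample 50% of the training data before permuting
  nsim         = 5              # repeat the simulation 5 times to stabilize the estimate
)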
15
Permutation-based Feature Importance

Note

• Can become slow as the number of predictors grows

• Can speed up computation by
  • decreasing the sample size
  • decreasing the number of simulations

• However, these may also increase the variability of the feature importance estimates

16
Permutation-based Feature Importance

Summary

Expln. Type: Feature Importance

Scope: Global

Agnosticity: Model-Agnostic

17
PDP (Partial Dependence Plot)
Idea: Understand the marginal effect of a feature on the predicted outcome

By "marginal effect" we mean the effect of that feature after averaging over the effects of all the other features

18
PDP (Partial Dependence Plot)
Example Feature of interest: Gr_Liv_Area

Gr_Liv_Area X1 X2 X3 …
1 687 0 a 2 …
2 334 0 c 6 …
3 2107 1 c 4 …
4 3329 0 b 2 …
5 5095 1 a 2 …

Construct a grid of j evenly spaced values across the range of Gr_Liv_Area

Say j = 20: the grid consists of the 20 values 334, 585, 835, 1086, …, 5095

19
PDP (Partial Dependence Plot)
Example (cont'd): for each grid value g ∈ {334, 585, …, 5095}

• Set Gr_Liv_Area = g in every row of the data (the other features X1, X2, X3, … keep their original values)
• Feed the modified data to the model and record mean(Ŷ)
• Plot mean(Ŷ) against the grid values

[Figure: the resulting curve — Mean(Yhat) × 1000 (roughly 155 to 260) plotted against the 20 grid values of Gr_Liv_Area from 334 to 5095.]
20
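A from-scratch R sketch of the PDP computation above (my illustration; f_predict stands for the fitted model's prediction function and data for the training data):

pdp_manual <- function(f_predict, data, feature, j = 20) {
  grid <- seq(min(data[[feature]]), max(data[[feature]]), length.out = j)  # j evenly spaced values
  pd <- sapply(grid, function(g) {
    tmp <- data
    tmp[[feature]] <- g        # set the feature to the grid value in every row
    mean(f_predict(tmp))       # average prediction over all the other features
  })
  data.frame(grid = grid, mean_yhat = pd)
}
# e.g. pdp_manual(f_predict, ames_train, "Gr_Liv_Area", j = 20)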
PDP (Partial Dependence Plot)

Implementation: pdp package

• Widely used, mature and flexible package for constructing PDPs

• For h2o models, we need to create a custom prediction function wrapper
  • Create a custom prediction function (note it returns the mean of the predictions)
  • Then use pdp::partial() to compute the PDP values

21
PDP (Partial Dependence Plot)
[Code screenshot (not reproduced): a custom prediction function from the pdp package workflow that takes the model object & newdata and returns the mean of the predictions; pdp::partial() is called with j = 20 grid values. A hedged sketch follows.]
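A hedged R sketch of what that code might look like (assuming the same h2o model h2o_fit and training data ames_train as before; check ?pdp::partial for the exact arguments):

library(pdp)
library(h2o)

# Custom prediction function: takes the model object and newdata,
# and returns the MEAN of the predictions (one value per grid point)
pdp_pred <- function(object, newdata) {
  mean(as.vector(h2o.predict(object, newdata = as.h2o(newdata))$predict))
}

pd <- partial(
  h2o_fit,
  pred.var        = "Gr_Liv_Area",
  pred.fun        = pdp_pred,
  grid.resolution = 20,          # j = 20 evenly spaced grid values
  train           = ames_train
)

plotPartial(pd)   # PDP: mean(Yhat) vs Gr_Liv_Area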

22
PDP (Partial Dependence Plot)

23
PDP (Partial Dependence Plot)

[Figure: the resulting PDP — Yhat plotted against Gr_Liv_Area.]

24
PDP (Partial Dependence Plot)

Summary

Expln. Type: Visual

Scope: Global

Agnosticity: Model-Agnostic

25
LIME (Local Interpretable Model-Agnostic Explanations)
Expln. Type: Surrogate Models
Idea: Every complex model is linear on a local scale

[Figure: Y vs X — a complex nonlinear model, with a local linear model fitted around the point being explained.]

27
LIME (Local Interpretable Model-Agnostic Explanations)

$$\xi(x) \;=\; \underset{g \in G}{\operatorname{arg\,min}} \;\; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$

28
LIME (Local Interpretable Model-Agnostic Explanations)

$$\xi(x) \;=\; \underset{g \in G}{\operatorname{arg\,min}} \;\; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$

• x: new data point to be explained
• f: complex model
• g: simple interpretable model
• G: family of interpretable models
• π_x: proximity function
• L: loss function
• Ω(g): penalty

29
LIME (Local Interpretable Model-Agnostic Explanations)

$$\xi(x) \;=\; \underset{g \in G}{\operatorname{arg\,min}} \;\; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$

1. Sample new datapoints (each feature generated from its distribution)

2. Get predictions from the complex model f

This results in a new dataset Z with


• Labels: Prediction of complex model
• Features: Newly generated datapoints

30
LIME (Local Interpretable Model-Agnostic Explanations)

$$\xi(x) \;=\; \underset{g \in G}{\operatorname{arg\,min}} \;\; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$

Loss, e.g. a proximity-weighted squared loss:
$$\mathcal{L}(f, g, \pi_x) \;=\; \sum_{z \in Z} \pi_x(z)\,\big(f(z) - g(z)\big)^2$$

Penalty function Ω(g): e.g. LASSO
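A from-scratch R sketch of the LIME procedure above (my illustration, not from the slides; x0 is a numeric feature vector, f_predict any black-box prediction function, and the Gaussian kernel width is an arbitrary choice; the penalty Ω(g), e.g. LASSO feature selection, is omitted):

lime_sketch <- function(x0, X_train, f_predict, n_samples = 5000, kernel_width = 0.75) {
  p <- ncol(X_train)
  # 1. Sample new datapoints, each feature drawn from its own empirical distribution
  Z <- sapply(seq_len(p), function(j) sample(X_train[, j], n_samples, replace = TRUE))
  colnames(Z) <- colnames(X_train)
  # 2. Label the samples with the complex model's predictions
  y_hat <- f_predict(as.data.frame(Z))
  # Proximity weights pi_x(z): Gaussian kernel on the scaled distance to x0
  d <- sqrt(rowSums(scale(Z, center = x0, scale = apply(X_train, 2, sd))^2))
  w <- exp(-d^2 / kernel_width^2)
  # 3. Fit the simple interpretable surrogate g: a proximity-weighted linear model
  g <- lm(y_hat ~ ., data = as.data.frame(Z), weights = w)
  coef(g)   # local explanation: the surrogate's coefficients around x0
}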

31
LIME (Local Interpretable Model-Agnostic Explanations)

Summary

Expln. Type: Surrogate Models

Scope: Local

Agnosticity: Model-Agnostic

32
SHAP (SHapley Additive exPlanations)
Motivation

• Limitation of observing a single feature's effect at a time
  • Misses interactions between features
  • May produce misleading explanations of the ML model

• Remedy
  • Borrow an idea from cooperative game theory (explained in detail below)
  • Observe the [change in outcome] for each possible subset of features
  • Combine these changes to form a unique contribution for each feature value

33
SHAP (SHapley Additive exPlanations)
Expln. Type Feature Importance

Idea Cooperative Game Theory

• Imagine several players cooperating in a game
  • After the game is over, they receive a certain payout
  • Problem: divide the payout among the players in a fair way

• Answer: Shapley values, which describe the average contribution of each player

34
SHAP (SHapley Additive exPlanations) – Game Theory
Fairness is tricky
• E.g. each individual tends to think he/she contributes the most

Let v be the payoff function

Naïve idea: pay everyone his/her marginal contribution

• This idea does NOT work in the following scenario:
  p = 4 players, v({1,2,3,4}) = 10000, v(S) = 0 for all S ≠ {1,2,3,4}
  (each player's marginal contribution to the full coalition is 10000 − 0 = 10000, so paying all four their marginal contributions would cost 40000, four times the actual payout)

Intuitively, how much should we pay each individual?

35
SHAP (SHapley Additive exPlanations) – Game Theory
Suppose there are p players: player 1,2,3,…,p
Shapley value of individual j
$$\phi_j \;=\; \sum_{S \subseteq \{1,\dots,p\} \setminus \{j\}} \frac{|S|!\,\big(p - |S| - 1\big)!}{p!}\,\Big(v\big(S \cup \{j\}\big) - v(S)\Big)$$

Satisfies certain desired properties (Symmetry, Dummy Player, Additivity)

36
SHAP (SHapley Additive exPlanations) – Game Theory
Example cont’d
p = 4 players, v({1,2,3,4}) = 10000, v(S) = 0 for all S≠{1,2,3,4}
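A worked application of the formula to this scenario (my computation): for any player j, the only subset S with a nonzero marginal contribution is S = {1,2,3,4}\{j}, which has |S| = 3, so

$$\phi_j \;=\; \frac{3!\,(4-3-1)!}{4!}\,\big(v(\{1,2,3,4\}) - v(\{1,2,3,4\}\setminus\{j\})\big) \;=\; \frac{6 \cdot 1}{24}\,(10000 - 0) \;=\; 2500,$$

i.e. every player receives 2500, and the four Shapley values sum to the full payout of 10000.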

37
SHAP (SHapley Additive exPlanations) – Game Theory
Another Example
[Table: players A ($60 coupon), B ($40 coupon), C ($30 coupon); Small / Medium / Large: 500 / 750 / 1000; $70 / $90 / $110 coupons.]

38
SHAP (SHapley Additive exPlanations) – Game Theory

Intuition of Weighting: A player's marginal contribution should be weighted more if

1. The coalition he/she joins already has lots of players; or 2. The coalition has only a few players
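A quick check of the weights for p = 4 (my computation, not on the slide):

$$\frac{|S|!\,(p-|S|-1)!}{p!} \;=\; \begin{cases} \tfrac{0!\,3!}{4!} = \tfrac14 & |S| = 0 \\ \tfrac{1!\,2!}{4!} = \tfrac1{12} & |S| = 1 \\ \tfrac{2!\,1!}{4!} = \tfrac1{12} & |S| = 2 \\ \tfrac{3!\,0!}{4!} = \tfrac14 & |S| = 3 \end{cases}$$

so marginal contributions to an empty or nearly-full coalition get weight 1/4 each, while mid-sized coalitions get 1/12 each: the two extremes are weighted most.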

39
SHAP (SHapley Additive exPlanations)

From Game Theory to Machine Learning


• Think of features as players, and the model's prediction as the payout

40
SHAP (SHapley Additive exPlanations)
Shapley value of feature j
$$\phi_j(f, x) \;=\; \sum_{S \subseteq \{1,\dots,p\} \setminus \{j\}} \frac{|S|!\,\big(p - |S| - 1\big)!}{p!}\,\Big(f_x\big(S \cup \{j\}\big) - f_x(S)\Big)$$

• φ_j(f, x): Shapley value of feature j, for complex model f and input datapoint x
• S: subset of features excluding j
• |S|! (p − |S| − 1)! / p!: weighting
• f_x(S ∪ {j}) − f_x(S): marginal contribution of feature j
• p: total number of features

41
SHAP (SHapley Additive exPlanations)
Shapley value of features for high_ob

[Figure: Shapley values of individual features for the observation high_ob.]

• Avg price: 180K
• high_ob: 663K
• Top three features: Gr_Liv_Area, Second_Flr_SF, Overall_Qual
• They explain more than $75K of this price difference
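The slides do not show the implementation; one hedged R sketch uses the fastshap package (assuming the same h2o model h2o_fit and training data ames_train, and treating high_ob as the row index of the observation being explained; check ?fastshap::explain for the exact arguments):

library(fastshap)
library(h2o)

pred_fun <- function(object, newdata) {
  as.vector(h2o.predict(object, newdata = as.h2o(newdata))$predict)
}

X <- subset(ames_train, select = -Sale_Price)     # features only, response dropped

set.seed(123)
shap_vals <- explain(
  h2o_fit,
  X            = X,
  newdata      = X[high_ob, , drop = FALSE],      # the single observation to explain
  pred_wrapper = pred_fun,
  nsim         = 50                               # Monte Carlo repetitions per feature
)

# Features with the largest absolute Shapley values drive this prediction
head(sort(abs(unlist(shap_vals[1, ])), decreasing = TRUE), 3)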

43
SHAP (SHapley Additive exPlanations)

Summary

Expln. Type: Feature Importance

Scope: Local

Agnosticity: Model-Agnostic

44
End
