LIME Flashcards
What does LIME stand for?
Local (explains each observation individually; approximates the model locally)
Interpretable (simple for a human to understand)
Model-agnostic (works for any model)
Explanations
What kind of models does LIME work for?
ANY classifier or regressor
Text explainer
Image explainer
How does LIME work?
- Perturb (permute) the data to generate new samples around the observation of interest
- Calculate the distance (similarity) between each perturbed sample and the original observation
- Make predictions on the new data using the original (complex) model
- Pick the m features that best describe the complex model's outcome on the perturbed data
- Fit a simple linear model to the perturbed data using those m features, with the similarity scores as weights
- The feature weights from the simple model form the explanation of the complex model's local behaviour (see the sketch after this list)
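A minimal from-scratch sketch of these steps for tabular data (the helper name, the RBF similarity kernel, and Ridge as the simple model are illustrative assumptions, not the lime package's API):

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_instance_sketch(x, predict_fn, X_train, m=5, n_samples=1000, kernel_width=0.75):
    """Illustrative LIME-style explanation of predict_fn's behaviour around one row x."""
    rng = np.random.default_rng(0)
    scale = X_train.std(axis=0) + 1e-12
    # 1) Perturb: sample new points around x using the training data's spread
    Z = x + rng.normal(size=(n_samples, x.shape[0])) * scale
    # 2) Similarity between each perturbed sample and the original observation
    dist = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 3) Predictions on the perturbed data from the original (complex) model
    y = predict_fn(Z)
    # 4) Pick the m features most associated with the complex model's output
    corr = np.array([abs(np.corrcoef(Z[:, j], y)[0, 1]) for j in range(Z.shape[1])])
    top = np.argsort(corr)[-m:]
    # 5) Fit a simple linear model on those features, weighted by similarity
    simple = Ridge(alpha=1.0).fit(Z[:, top], y, sample_weight=weights)
    # 6) Its coefficients are the local explanation
    return dict(zip(top.tolist(), simple.coef_.tolist()))
```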
Why is interpretable ML important?
(LIME talk)
1) Trust: How can we trust the predictions are correct?
2) How can we understand and predict the behaviour?
3) How do we improve the model to prevent potential mistakes? Feature engineering.
4) GDPR: one aspect is that the customer has a right to an explanation in automated decision processes
5) Choosing between competing models
6) Detect and improve untrustworthy models
What is the idea of a “pick-step” in the model evaluation process?
In model evaluation, certain representative predictions are selected and explained to the human by an “explainer” like LIME (SP-LIME does this selection via submodular optimization)
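A rough numpy sketch of a greedy pick-step in the spirit of SP-LIME (the function name and the use of binary feature presence for coverage are illustrative assumptions):

```python
import numpy as np

def pick_step_sketch(W, budget):
    """W[i, j] = |weight| of feature j in the explanation of instance i.
    Greedily pick `budget` instances whose explanations cover the most
    globally important features."""
    importance = np.sqrt(np.abs(W).sum(axis=0))   # global importance per feature
    covered = np.zeros(W.shape[1], dtype=bool)
    picked = []
    for _ in range(budget):
        # coverage gain = importance of not-yet-covered features this instance would add
        gains = [importance[(np.abs(W[i]) > 0) & ~covered].sum() for i in range(W.shape[0])]
        best = int(np.argmax(gains))
        picked.append(best)
        covered |= np.abs(W[best]) > 0
    return picked
```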
How does LIME work for image classification?
1) Take a single image
2) Divide it into interpretable components (superpixels)
3) Make perturbed instances by turning components off (i.e., make them gray)
4) Get predictions on these perturbed instances from the original model
5) Learn a simple linear model on these perturbed images (a sketch of the perturbation step follows below)
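A rough sketch of the perturbation step, assuming scikit-image for superpixel segmentation (the helper name and segmentation parameters are illustrative):

```python
import numpy as np
from skimage.segmentation import slic

def perturb_image_sketch(image, n_segments=50, n_perturbations=100, seed=0):
    """Split an image into superpixel components and randomly gray some of them out."""
    segments = slic(image, n_segments=n_segments)        # component label per pixel
    labels = np.unique(segments)
    rng = np.random.default_rng(seed)
    gray = image.mean(axis=(0, 1))                       # the "off" colour for a component
    masks = rng.integers(0, 2, size=(n_perturbations, labels.size))  # 1 = keep, 0 = off
    perturbed = []
    for mask in masks:
        img = image.copy()
        for comp in labels[mask == 0]:
            img[segments == comp] = gray
        perturbed.append(img)
    # Feed `perturbed` to the original model, then fit a simple linear model
    # from `masks` (which components were on) to its predictions
    return np.array(perturbed), masks
```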
What is the LIME paper?
“Why Should I Trust You?”: Explaining the Predictions of Any Classifier
Ribeiro, Singh, Guestrin
University of Washington
August 2016
What are the three contributions of the LIME paper?
- LIME, an algorithm that can explain the predictions of any classifier or regressor in a faithful way, by approximating it locally with an interpretable model.
- SP-LIME, a method that selects a set of representative instances with explanations to address the “trusting the model” problem, via submodular optimization.
- Comprehensive evaluation with simulated and human subjects, where we measure the impact of explanations on trust and associated tasks. In our experiments, non-experts using LIME are able to pick which classifier from a pair generalizes better in the real world. Further, they are able to greatly improve an untrustworthy classifier trained on 20 newsgroups, by doing feature engineering using LIME. We also show how understanding the predictions of a neural network on images helps practitioners know when and why they should not trust a model.
What are three natural requirements for the interpretation model?
- Local accuracy: the prediction of the explainer, g(x’), must match the prediction of the base model, f(x)
- Missingness: a feature that is missing in the simplified input (a 0 in x’, i.e., the feature is toggled off) should receive no attribution
- Consistency: if toggling a feature off always makes a bigger difference in one model than in another, then that feature’s importance should be greater in the first model than in the second
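These mirror the additive feature attribution properties formalized in the SHAP paper; a sketch in that notation (x’ is the simplified binary input, φ_i the attribution of feature i, and z’ \ i means setting x’_i = 0):

```latex
\text{Local accuracy:}\quad f(x) = g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i
\qquad
\text{Missingness:}\quad x'_i = 0 \;\Rightarrow\; \phi_i = 0
```

```latex
\text{Consistency:}\quad
f'_x(z') - f'_x(z' \setminus i) \ge f_x(z') - f_x(z' \setminus i) \;\;\forall z'
\;\Rightarrow\; \phi_i(f', x) \ge \phi_i(f, x)
```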
What can a 2-D projection of the data tell you?
1) clusters
2) sparsity
3) outliers
4) hierarchy
A model should learn this structure if it does a good job
Get an understanding of the data so you can later check that the model understands it too
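A minimal sketch of such a projection using PCA (t-SNE or UMAP are common alternatives); `X` here is placeholder data standing in for your feature matrix:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)                     # placeholder feature matrix
coords = PCA(n_components=2).fit_transform(X)   # project to 2-D
plt.scatter(coords[:, 0], coords[:, 1], s=10)
plt.title("Look for clusters, sparsity, outliers, hierarchy")
plt.show()
```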
What does a correlation graph do for you?
1) Understand relationships that a model should learn
2) See high-dimensional relationships (relationships between variables)
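A minimal correlation-matrix plot with pandas and matplotlib (`df` is placeholder data; seaborn's heatmap is a common alternative):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(100, 5), columns=list("ABCDE"))  # placeholder data
corr = df.corr()                                                   # pairwise correlations
plt.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns)
plt.colorbar(label="correlation")
plt.show()
```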
What is a Decision Tree Surrogate Model?
Take the inputs to a complex model, X, and the outputs of the complex model, y-hat, and train a single decision tree on them
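A minimal sketch with scikit-learn (the gradient-boosting model and the synthetic data are illustrative stand-ins for your complex model):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=6, random_state=0)  # placeholder data
complex_model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Surrogate: train a single shallow tree on the complex model's predictions (y-hat), not on y
y_hat = complex_model.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_hat)
print(export_text(surrogate))   # human-readable global approximation of the complex model
```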
Why should you compare PDP and ICE lines?
If you see ICE lines criss-crossing with the PDP line then the PDP line may be misleading; interactions may be at play.
PDP shows average
Look at it side-by-side with the surrogate decision tree
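scikit-learn can overlay both in one plot; a sketch reusing the `complex_model` and `X` from the surrogate example above:

```python
from sklearn.inspection import PartialDependenceDisplay

# kind="both" draws the individual ICE lines together with the averaged PDP line,
# so criss-crossing (interactions hidden by the average) is easy to spot
PartialDependenceDisplay.from_estimator(complex_model, X, features=[0, 1], kind="both")
```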
What are the characteristics of LIME, TreeInterpreter, and Shapley?
LIME can be used on any model (model agnostic; even deep learning)
TreeInterpreter must be used on trees
Shapley is best for trees; takes a row of data and follows its path through the tree; game-theory approach; Shapley values are built into XGBoost
In regulated industries: use Shapley
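A short sketch of the tree-based Shapley routes, assuming `X` and `y` are already defined and the shap and xgboost packages are installed:

```python
import shap
import xgboost as xgb

model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

# Shapley values via the shap package's tree explainer (fast, exact for tree models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# The same idea is built into XGBoost itself: per-row feature contributions
contribs = model.get_booster().predict(xgb.DMatrix(X), pred_contribs=True)
```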
What contributes to Gini importance?
How high in the tree a variable appears (splits nearer the root) and how often it appears both contribute to its Gini importance.
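In scikit-learn this is exposed as `feature_importances_`; a minimal sketch with placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)  # placeholder data
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Gini importance: total impurity decrease from a feature's splits, weighted by the
# number of samples reaching those splits (high/early and frequent splits count more)
print(sorted(enumerate(rf.feature_importances_), key=lambda t: -t[1]))
```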