Explainability Flashcards
What are our intentions when analyzing the interpretability of a model?
“Interpretability is the degree to which a human can understand the cause of a decision” (Miller, 2017)
“Interpretability is the degree to which a human can consistently predict the model’s result” (Kim et al., 2016)
• We want to know why the model made a decision
– Explainability/interpretability of the model
– Explanation of a single prediction
What are the occasions in which interpretability is critical?
Model interpretability becomes critical when:
(i) the problem has a large impact on human lives and/or health
(ii) the models have not been extensively validated in real-world settings
(iii) key criteria are hard to quantify and we need to rely on human understanding
What are the types of models and how can we apply interpretability methods to them?
• Black-box models are the ones that cannot be understood by looking at their parameters.
– Explainability is achieved by applying methods after training (post hoc)
• White-box models are the ones considered interpretable due to their simple structures.
– Explainability is achieved by restricting the complexity of the model (intrinsic)
What are some results and goals we aim for when working on the explainability of a model?
• Algorithm transparency
• Holistic model interpretability
• Global model interpretability on modular level
• Local interpretability for a single prediction
• Local interpretability for a group of predictions
• Results of the interpretation method
– Feature summary statistic
– Feature summary visualization
– Model internals
– Data point
– Intrinsically interpretable model
How are the parameters of a linear regression interpreted?
• Interpretation of 𝛽i :
– If 𝑥i is numerical or ordinal, 𝛽i represents how much the output changes for each one-unit increase of the feature
– If 𝑥i is binary and the reference value is encoded as 0, 𝛽i represents how much the output changes when the feature takes the non-reference value (1) instead of the reference value
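A minimal sketch of this interpretation on synthetic data (feature names, values, and coefficients are made up for illustration, not from the slides):

```python
# Sketch: reading linear-regression coefficients (hypothetical features/values).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
area = rng.uniform(40, 120, n)        # numerical feature (m^2), hypothetical
has_garden = rng.integers(0, 2, n)    # binary feature, 0 = reference value
price = 1500 * area + 20000 * has_garden + rng.normal(0, 5000, n)

model = LinearRegression().fit(np.column_stack([area, has_garden]), price)
b_area, b_garden = model.coef_

# b_area: expected change in the output for a one-unit (1 m^2) increase in area,
#         with the other features held fixed.
# b_garden: expected change in the output when has_garden is 1 instead of the
#           reference value 0, with the other features held fixed.
print(f"beta_area = {b_area:.1f}, beta_has_garden = {b_garden:.1f}")
```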
What is the interpretation for logistic regression?
• The interpretation is not very different from linear regression; in fact:
• When 𝑥j is increased by one unit, the odds are multiplied by exp(𝛽j), the odds ratio:
– a negative weight means an odds ratio below 1 (the odds decrease)
– a positive weight means an odds ratio above 1 (the odds increase)
– a null weight means an odds ratio of 1 (the odds do not change)
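A minimal sketch of this interpretation on synthetic data (the true weights and dataset are illustrative assumptions):

```python
# Sketch: odds ratios exp(beta_j) from logistic-regression weights (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# true weights: positive, negative, null (for illustration only)
logits = 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.0 * X[:, 2]
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-logits))).astype(int)

clf = LogisticRegression().fit(X, y)
odds_ratios = np.exp(clf.coef_[0])

# exp(beta_j) is the factor by which the odds P(y=1)/P(y=0) are multiplied
# when x_j increases by one unit:
#   beta_j > 0 -> odds ratio > 1 (odds increase)
#   beta_j < 0 -> odds ratio < 1 (odds decrease)
#   beta_j = 0 -> odds ratio = 1 (odds unchanged)
print(odds_ratios)
```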
What are some model-agnostic analyses we could do for explainability?
• Global model-agnostic methods
– Permutation Feature Importance
– Partial Dependence Plot
• Local model-agnostic methods
What is permutation feature importance? How can we calculate it? What are its pros and cons?
• Evaluates how important a single feature j is in a trained model as follows:
1. Train a model on a dataset
2. Shuffle the values of a single feature in the dataset (i.e., permute one column)
3. Apply and evaluate the model both on the original and on the shuffled data
4. Compute the importance of the feature as the performance loss of the model on the shuffled dataset compared to the original dataset
• Pros
– Global insight
– Problem-independent
– Considers interactions with other features
– Does not require model retraining
• Cons
– Linked only to model performance and requires labeled data
– Is computed with possibly unrealistic data distribution
– Correlated features share importance
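A minimal sketch of the four steps above; the dataset, model, and metric are illustrative assumptions (scikit-learn also provides sklearn.inspection.permutation_importance for the same computation):

```python
# Sketch of the four permutation-importance steps (illustrative dataset/model/metric).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)     # 1. train a model

rng = np.random.default_rng(0)
baseline = r2_score(y, model.predict(X))                    # 3. score on original data
                                                            #    (ideally a held-out set)
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])            # 2. shuffle one column
    score = r2_score(y, model.predict(X_perm))              # 3. score on shuffled data
    importances.append(baseline - score)                    # 4. performance loss

print(importances)  # larger loss -> more important feature
```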
What is the partial dependence plot (PDP)? What is it used for?
• While feature importance shows what variables most affect predictions, partial dependence plots show how a feature affects predictions.
• It works as follows:
1. Train a model on data.
2. Run the model on every sample while forcing the target feature to a given value (chosen from a set of values in the feature’s range)
3. Compute and plot the average model output over all samples for each value of the feature
• The PDP can also be computed for two target features at once
• The variability of the PDP values (i.e., standard deviation or range) can be used as a measure of feature importance
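A minimal sketch of a one-feature PDP; the dataset and model are illustrative assumptions (scikit-learn’s sklearn.inspection.partial_dependence implements the same idea):

```python
# Sketch of a one-feature partial dependence plot (illustrative dataset and model).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)     # 1. train a model

target = 0                                                   # index of the target feature
grid = np.linspace(X[:, target].min(), X[:, target].max(), 20)

pdp = []
for v in grid:
    X_mod = X.copy()
    X_mod[:, target] = v                     # 2. force the target feature to v for every sample
    pdp.append(model.predict(X_mod).mean())  # 3. average prediction over all samples

# The spread of the PDP values can double as a feature-importance score.
print("PDP range:", max(pdp) - min(pdp))
```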
What is the individual conditional expectation (ICE)? How does it relate to the PDP?
• A major issue with the PDP is that it only reflects the average behavior of the model
• ICE consists of one PDP-like curve for each sample in the dataset:
– the values of all the features except the target one are kept fixed at the sample’s own values
– the predictions are plotted against the values of the target feature
• ICE curves can be plotted together with the PDP to highlight the average behavior
• An anchored (centered) version of ICE can be plotted using the prediction at the left extreme of the feature range as a baseline
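A minimal sketch on synthetic data (illustrative model and features) of ICE curves, their average (which is the PDP), and the anchored variant baselined at the left end of the feature range:

```python
# Sketch of ICE curves, their average (the PDP), and the anchored variant
# (illustrative dataset and model).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

target = 0
grid = np.linspace(X[:, target].min(), X[:, target].max(), 20)

# ice[i, k] = prediction for sample i with the target feature set to grid[k];
# all other feature values stay fixed at the sample's own values.
ice = np.empty((X.shape[0], grid.size))
for k, v in enumerate(grid):
    X_mod = X.copy()
    X_mod[:, target] = v
    ice[:, k] = model.predict(X_mod)

pdp = ice.mean(axis=0)              # averaging the ICE curves gives the PDP
ice_anchored = ice - ice[:, [0]]    # baseline: prediction at the left extreme of the range

print(pdp[:5])
print(ice_anchored[0, :5])
```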