Classification Models Flashcards
What is the motivation for learning interpretable classification models?
Understanding a model improves trust in its predictions and can provide insights into the data/application domain.
Important in fields like medicine and finance, where explanations are often legally required.
What are the two approaches to interpretability?
Intrinsic approach and Post-hoc approach.
Intrinsic approaches involve directly interpretable models like decision trees, while post-hoc methods are used for black-box models.
Define global interpretability.
Interpreting the entire model at once, understanding how features interact to predict class labels generally.
Examples include small decision trees.
Define local interpretability.
Explaining the prediction of each testing example separately.
This can involve interpreting specific paths in a decision tree.
What does each path in a decision tree represent?
A rule in the form of IF-THEN statements.
For example, IF (Salary = ‘low’) THEN (Buy = ‘no’).
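The path-to-rule correspondence can be sketched in code. The nested-dict tree below is a hand-built toy (not a trained model), and the attribute/value names are taken from the card's example:

```python
# Hypothetical sketch: each root-to-leaf path in a decision tree is one IF-THEN rule.
# The tree here is a hand-built nested dict, not a trained model.

def paths_to_rules(node, conditions=()):
    """Recursively collect one rule per root-to-leaf path."""
    if not isinstance(node, dict):          # leaf: predicted class label
        cond = " AND ".join(conditions)
        return [f"IF ({cond}) THEN (Buy = '{node}')"]
    attr, branches = node["attr"], node["branches"]
    rules = []
    for value, child in branches.items():
        rules += paths_to_rules(child, conditions + (f"{attr} = '{value}'",))
    return rules

tree = {"attr": "Salary",
        "branches": {"low": "no",
                     "high": {"attr": "Age",
                              "branches": {"young": "yes", "old": "no"}}}}

for rule in paths_to_rules(tree):
    print(rule)
# The 'low' branch yields: IF (Salary = 'low') THEN (Buy = 'no')
```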
What is a pro of interpreting decision trees?
They are visual models that are easy to interpret, especially if small.
Decision trees typically focus on the most relevant attributes.
What is a con of interpreting decision trees?
Once an attribute is selected at a node, every one of its values must be given an outgoing branch, which can fragment the data into small subsets.
These branches can include values that are irrelevant to the classification.
What are the two approaches to learning IF-THEN classification rules?
Approach 1: Extraction from a decision tree; Approach 2: Learning rules directly from data.
Ordered rules can provide a clear hierarchy for classification.
List some pros of IF-THEN rules.
- Can be analyzed modularly
- Can contain only relevant attribute values
- Can be learned directly from data
This contrasts with decision trees, whose paths may include irrelevant attribute values.
List some cons of IF-THEN rules.
- Not visual/hierarchical
- May contain irrelevant values if extracted from a decision tree
- More difficult interpretation for ordered rule lists
Rules are applied sequentially, complicating interpretation.
What is the basic principle regarding model size and interpretability?
The smaller the size of the model, the simpler it is.
For decision trees, this refers to the number of nodes; for rule sets, the number of rules.
What are Naïve Bayes models based on?
Assigning a new example to the class that maximizes the product of the class's prior probability and the conditional probabilities of the example's attribute values given that class.
The Naïve Bayes formula is used for classification.
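A minimal sketch of this classification rule, using made-up prior and conditional probabilities (all numbers are illustrative, not from a real dataset):

```python
# Minimal Naive Bayes sketch: pick the class maximizing
# P(class) * product of P(attribute value | class). Toy probabilities.
from math import prod

priors = {"yes": 0.6, "no": 0.4}
# cond[class][attribute value] -> P(attribute value | class), assumed numbers
cond = {"yes": {"Salary=low": 0.2, "Age=young": 0.7},
        "no":  {"Salary=low": 0.8, "Age=young": 0.3}}

def classify(attr_values):
    scores = {c: priors[c] * prod(cond[c][a] for a in attr_values)
              for c in priors}
    return max(scores, key=scores.get), scores

label, scores = classify(["Salary=low", "Age=young"])
print(label, scores)  # 'no' wins: 0.4*0.8*0.3 = 0.096 > 0.6*0.2*0.7 = 0.084
```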
How is local interpretation of a Naïve Bayes model achieved?
By computing the importance of each attribute value for classifying the test example and ranking them.
The formula used is Imp(Attr_j) = | P(Attr_j | Class = yes) - P(Attr_j | Class = no) |.
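The importance formula and the ranking step can be written out directly; the conditional probabilities below are toy values for illustration:

```python
# Sketch of the local-importance formula from the card:
# Imp(Attr_j) = |P(Attr_j | Class=yes) - P(Attr_j | Class=no)|
p_yes = {"Salary=low": 0.2, "Age=young": 0.7}   # assumed conditionals
p_no  = {"Salary=low": 0.8, "Age=young": 0.3}

importance = {a: abs(p_yes[a] - p_no[a]) for a in p_yes}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)  # attribute values ordered by their local importance
```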
What is LIME in the context of model interpretability?
Local Interpretable Model-agnostic Explanations, which provide local explanations for classifications of new instances.
It fits an interpretable (typically linear) surrogate model on perturbed samples in the neighborhood of the instance.
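A toy LIME-style sketch (not the real lime library): perturb the instance, query a black-box model, and fit a local linear surrogate by ordinary least squares. The black-box model, perturbation radius, and sample count are all illustrative assumptions:

```python
# Toy LIME-style local surrogate: the fitted linear coefficients act as
# the explanation of the black box's behavior near the instance.
import random

def black_box(x):
    # opaque classifier: decision depends mostly on the first feature
    return 1.0 if 2.0 * x[0] + 0.5 * x[1] > 1.0 else 0.0

def solve(A, b):
    """Gaussian elimination for the small normal-equation system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][-1] / M[i][i] for i in range(n)]

def explain_locally(x0, n=400, radius=0.5, seed=1):
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [v + rng.uniform(-radius, radius) for v in x0]
        X.append([1.0] + x)                  # bias term + features
        y.append(black_box(x))
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(A, b)                       # [bias, w1, w2]

bias, w1, w2 = explain_locally([0.5, 0.2])
print(w1, w2)   # w1 should dominate: feature 1 drives the local decision
```

Note that, as the following card points out, the explanation depends on the chosen neighborhood size (`radius` here); shrinking or growing it can change the surrogate's coefficients.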
What limitation does LIME have?
The data space region where the explanation applies is unclear.
The local linear model’s effectiveness depends on the size of the neighborhood around the instance.
What is a key takeaway regarding the interpretability of models?
The relative importance of predictive performance and interpretability is application domain-dependent.
Different models (decision trees, rule sets, Naïve Bayes) have distinct pros and cons.
What is decision tree/rule set size an objective measure of?
Simplicity
It has limited effectiveness as it is a purely syntactic measure, ignoring attribute meanings.
Does a shorter model guarantee better interpretability for users?
No
A shorter model is not necessarily more interpretable by users than a larger one.
What can black box models be indirectly interpreted by?
Learning local models for explaining each example
These local models are just surrogate models, unlike white box models which are intrinsically interpretable.
What is the central question regarding algorithm predictions and biased data?
How fair are the algorithm’s predictions given the biased data?
What percentage of images in the ImageNet dataset come from the US?
45%
This is significant considering the US only represents 4% of the world’s population.
What is the prevalence of cardiovascular disease in UK Biobank participants aged 45-54 compared to the general population?
4.6% for UK Biobank participants vs. 10.9% in general population for men
For women: 2.4% UK Biobank participants vs. 10.3% in general population.
What is the main effect of Google Translate when translating articles referring to women?
Phrases often become ‘he said’ or ‘he wrote’
This amplifies the bias in the data due to the ratio of masculine to feminine pronouns.
What does the Discrimination Score (DS) measure?
The difference in prediction probabilities between favored and unfavored individuals
DS = P(Y = +1 | S = 0) – P(Y = +1 | S = 1).
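The DS formula can be computed directly from labelled predictions. The (S, Y) pairs below are a made-up toy sample, with S = 0 as the favored group per the card's formula:

```python
# Discrimination Score on a toy prediction set; S is a binary sensitive
# attribute, Y the predicted label. All pairs are illustrative.
data = [  # (S, Y) pairs
    (0, +1), (0, +1), (0, +1), (0, -1),
    (1, +1), (1, -1), (1, -1), (1, -1),
]

def discrimination_score(data):
    def p_pos(group):
        ys = [y for s, y in data if s == group]
        return sum(1 for y in ys if y == +1) / len(ys)
    return p_pos(0) - p_pos(1)   # DS = P(Y=+1|S=0) - P(Y=+1|S=1)

print(discrimination_score(data))  # 0.75 - 0.25 = 0.5
```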