Classification Models Flashcards

1
Q

What is the motivation for learning interpretable classification models?

A

Understanding a model improves trust in its predictions and can provide insights into the data/application domain.

Important in fields like medicine and finance, where explanations are often legally required.

2
Q

What are the two approaches to interpretability?

A

Intrinsic approach and Post-hoc approach.

Intrinsic approaches involve directly interpretable models like decision trees, while post-hoc methods are used for black-box models.

3
Q

Define global interpretability.

A

Interpreting the entire model at once, understanding how features interact to predict class labels generally.

Examples include small decision trees.

4
Q

Define local interpretability.

A

Explaining the model's prediction for each test example separately.

This can involve interpreting specific paths in a decision tree.

5
Q

What does each path in a decision tree represent?

A

A rule in the form of IF-THEN statements.

For example, IF (Salary = ‘low’) THEN (Buy = ‘no’).

6
Q

What is a pro of interpreting decision trees?

A

They are visual models that are easy to interpret, especially if small.

Decision trees typically focus on the most relevant attributes.

7
Q

What is a con of interpreting decision trees?

A

Once an attribute is selected at a node, all of its values must be added as outgoing branches, which can fragment the data.

Some of these branches may correspond to attribute values that are irrelevant for predicting the class.

8
Q

What are the two approaches to learning IF-THEN classification rules?

A

Approach 1: Extraction from a decision tree; Approach 2: Learning rules directly from data.

Ordered rules can provide a clear hierarchy for classification.
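Approach 1 can be sketched with scikit-learn's `export_text`, where every printed root-to-leaf path is one IF-THEN rule; the tiny Salary/Buy dataset below is hypothetical:

```python
# A sketch of Approach 1 using scikit-learn; the tiny Salary/Buy
# dataset is hypothetical (Salary encoded as 0='low', 1='high').
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0], [0], [1], [1]]
y = ["no", "no", "yes", "yes"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each printed root-to-leaf path is one IF-THEN rule, e.g.
# IF (Salary = 'low') THEN (Buy = 'no').
print(export_text(tree, feature_names=["Salary"]))
```

For this toy data the two printed paths should correspond to IF (Salary = 'low') THEN (Buy = 'no') and IF (Salary = 'high') THEN (Buy = 'yes').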

9
Q

List some pros of IF-THEN rules.

A
  • Can be analyzed modularly
  • Can contain only relevant attribute values
  • Can be learned directly from data

This contrasts with decision trees, which may include irrelevant values.

10
Q

List some cons of IF-THEN rules.

A
  • Not visual/hierarchical
  • May contain irrelevant values if extracted from a decision tree
  • Ordered rule lists are more difficult to interpret

Rules are applied sequentially, complicating interpretation.

11
Q

What is the basic principle regarding model size and interpretability?

A

The smaller the size of the model, the simpler it is.

For decision trees, this refers to the number of nodes; for rule sets, the number of rules.

12
Q

What are Naïve Bayes models based on?

A

Assigning a new example to the class that maximises the product of the class's prior probability and the conditional probabilities of the example's attribute values given that class.

Classification applies Bayes' theorem under the 'naïve' assumption that attributes are conditionally independent given the class.

13
Q

How is local interpretation of a Naïve Bayes model achieved?

A

By computing the importance of each attribute value for classifying the test example and ranking them.

The formula used is Imp(Attr_j) = | P(Attr_j | Class = yes) - P(Attr_j | Class = no) |.
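The ranking step can be sketched as follows; the attribute values and their conditional probabilities are hypothetical:

```python
# A minimal sketch of ranking attribute values by importance; the
# attribute values and conditional probabilities below are hypothetical.
def importance(p_given_yes, p_given_no):
    """Imp(Attr_j) = |P(Attr_j | Class=yes) - P(Attr_j | Class=no)|"""
    return abs(p_given_yes - p_given_no)

# (P(value | Class=yes), P(value | Class=no)) for one test example
probs = {
    "Salary=low":  (0.2, 0.8),
    "Age=young":   (0.5, 0.4),
    "Student=yes": (0.8, 0.3),
}

# Rank the example's attribute values, most important first
ranked = sorted(probs, key=lambda a: importance(*probs[a]), reverse=True)
print(ranked)  # → ['Salary=low', 'Student=yes', 'Age=young']
```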

14
Q

What is LIME in the context of model interpretability?

A

Local Interpretable Model-agnostic Explanations, which provide local explanations for classifications of new instances.

It learns a linear local model based on the features of the instance.
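A from-scratch toy sketch of the LIME idea (not the `lime` library): perturb the instance, label the perturbations with the black box, and fit a linear surrogate. The black-box model here is a hypothetical stand-in, and per-feature least-squares slopes replace LIME's proximity-weighted multivariate fit:

```python
import random

# Toy sketch of LIME's idea; black_box is a hypothetical stand-in model,
# and per-feature least-squares slopes approximate the local linear fit.
def black_box(x):
    return 1.0 if 2 * x[0] + x[1] > 1 else 0.0

def lime_sketch(instance, n_samples=2000, radius=0.3):
    random.seed(0)  # reproducible perturbations
    xs, ys = [], []
    for _ in range(n_samples):
        # Sample a perturbed neighbour and label it with the black box
        z = [v + random.uniform(-radius, radius) for v in instance]
        xs.append(z)
        ys.append(black_box(z))
    my = sum(ys) / len(ys)
    weights = []
    for j in range(len(instance)):
        mx = sum(x[j] for x in xs) / len(xs)
        cov = sum((x[j] - mx) * (y - my) for x, y in zip(xs, ys))
        var = sum((x[j] - mx) ** 2 for x in xs)
        weights.append(cov / var)  # larger slope = locally more important
    return weights

print(lime_sketch([0.4, 0.3]))  # feature 0 gets the larger weight
```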

15
Q

What limitation does LIME have?

A

The data space region where the explanation applies is unclear.

The local linear model’s effectiveness depends on the size of the neighborhood around the instance.

16
Q

What is a key takeaway regarding the interpretability of models?

A

The relative importance of predictive performance and interpretability is application domain-dependent.

Different models (decision trees, rule sets, Naïve Bayes) have distinct pros and cons.

17
Q

What is decision tree/rule set size an objective measure of?

A

Simplicity

It has limited effectiveness as it is a purely syntactic measure, ignoring attribute meanings.

18
Q

Does a shorter model guarantee better interpretability for users?

A

No

A shorter model is not necessarily more interpretable by users than a larger one.

19
Q

How can black-box models be indirectly interpreted?

A

By learning local models that explain each example's prediction.

These local models are just surrogate models, unlike white box models which are intrinsically interpretable.

20
Q

What is the central question regarding algorithm predictions and biased data?

A

How fair are the algorithm’s predictions given the biased data?

21
Q

What percentage of images in the ImageNet dataset come from the US?

A

45%

This is significant considering the US only represents 4% of the world’s population.

22
Q

What is the prevalence of cardiovascular disease in UK Biobank participants aged 45-54 compared to the general population?

A

4.6% for UK Biobank participants vs. 10.9% in general population for men

For women: 2.4% UK Biobank participants vs. 10.3% in general population.

23
Q

What is the main effect of Google Translate when translating articles referring to women?

A

Phrases often become ‘he said’ or ‘he wrote’

This amplifies the bias in the data due to the ratio of masculine to feminine pronouns.

24
Q

What does the Discrimination Score (DS) measure?

A

The difference in prediction probabilities between favored and unfavored individuals

DS = P(Y = +1 | S = 0) – P(Y = +1 | S = 1).
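The DS formula can be computed directly from a model's predictions; the labels below are hypothetical, with S = 1 marking the unfavored group:

```python
# A small sketch of DS = P(Y=+1 | S=0) - P(Y=+1 | S=1); the prediction
# and sensitive-attribute vectors below are hypothetical.
def discrimination_score(preds, sensitive):
    fav   = [y for y, s in zip(preds, sensitive) if s == 0]
    unfav = [y for y, s in zip(preds, sensitive) if s == 1]
    p_fav   = sum(1 for y in fav if y == +1) / len(fav)
    p_unfav = sum(1 for y in unfav if y == +1) / len(unfav)
    return p_fav - p_unfav

preds     = [+1, +1, -1, +1, -1, -1, +1, -1]
sensitive = [ 0,  0,  0,  0,  1,  1,  1,  1]
print(discrimination_score(preds, sensitive))  # → 0.5 (0.75 - 0.25)
```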

25
Q

What is the true positive (TP) rate formula?

A

TP rate = #TP / (#TP + #FN)
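The formula translates directly into code; the example labels are hypothetical, with +1 the positive class:

```python
# A minimal sketch of the TP rate, #TP / (#TP + #FN), computed from
# hypothetical true and predicted labels (+1 = positive class).
def tp_rate(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == +1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
    return tp / (tp + fn)

print(tp_rate([+1, +1, +1, -1], [+1, +1, -1, +1]))  # → 2/3 ≈ 0.667
```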
26
Q

What is one approach to learning fair classifiers?

A

The pre-processing approach.

This includes removing sensitive attributes and data massaging.
27
Q

What is 'fairness through unawareness'?

A

Removing sensitive attributes from the dataset.

It is trivial to implement but not very effective in practice.
28
Q

What is the effect of removing sensitive attributes on bias detection?

A

It makes it more difficult to detect biases in the learned model.
29
Q

What happens to the Discrimination Score (DS) when using the Naïve Bayes classifier on the Census Income dataset?

A

The DS of NB's predictions can be higher than the DS of the original dataset.

For example, it increased from 0.19 to 0.34.
30
Q

In the Data Massaging Approach, what is done to 'demotion candidates'?

A

They are sorted (by a ranker's score) and the top candidates have their class label changed to Y = -1.

This is done to make the data fairer.
31
Q

What is the goal of reweighing in the context of fair classifiers?

A

Assigning weights to each example based on its sensitive attribute value and class label.
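Reweighing can be sketched as follows, assuming the common scheme (due to Kamiran and Calders) where each (sensitive value, class) combination receives weight P(S=s)P(Y=y) / P(S=s, Y=y), so that the sensitive attribute and the class look independent in the weighted data; the example labels are hypothetical:

```python
from collections import Counter

# A sketch of reweighing, assuming the scheme where each group gets
# weight P(S=s)P(Y=y) / P(S=s, Y=y); the data below is hypothetical.
def reweigh(sensitive, labels):
    n = len(labels)
    p_s  = Counter(sensitive)            # counts per sensitive value
    p_y  = Counter(labels)               # counts per class label
    p_sy = Counter(zip(sensitive, labels))  # joint counts
    return {(s, y): (p_s[s] / n) * (p_y[y] / n) / (p_sy[(s, y)] / n)
            for (s, y) in p_sy}

# Favored group (S=0) is mostly positive, unfavored (S=1) mostly negative
weights = reweigh([0, 0, 0, 1, 1, 1], [+1, +1, -1, +1, -1, -1])
print(weights)  # over-represented combinations get weights below 1
```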
32
Q

What does the summary state about biased data?

A

Data is usually biased, making it challenging to learn fair classification models.
33
Q

What is one measure of unfairness in classification tasks?

A

Difference of TP rates.

This measures the disparity in true positive rates between groups.