Week 8 Flashcards

1
Q

What ML interpretation method separates the explanations from the machine learning model?

A

Model-agnostic interpretation methods

2
Q

What is the advantage of using model-agnostic interpretation methods over model-specific ones?

A

Their flexibility. The same method can be used for any type of model.

3
Q

What is the disadvantage of using only interpretable models instead of using model-agnostic interpretation methods?

A

Predictive performance is usually lost compared to other ML models, and you limit yourself to one type of model.

4
Q

What are two alternatives to using model-agnostic interpretation methods?

A
  1. Use only interpretable models.
  2. Use model-specific interpretation methods.
5
Q

What is the disadvantage of using model-specific interpretation methods compared to model-agnostic ones?

A

It binds you to one model type and it’s difficult to switch to something else.

6
Q

Name three flexibilities that are desirable aspects of a model-agnostic explanation system:

A
  1. Model flexibility
  2. Explanation flexibility
  3. Representation flexibility
7
Q

Model flexibility (as an aspect of a model-agnostic explanation system)

A

It can work with any ML model, such as random forests and deep neural networks.

8
Q

Explanation flexibility (as an aspect of a model-agnostic explanation system)

A

It’s not limited to a certain form of explanation. For example, a linear formula and a graphic with feature importances are both options.

9
Q

Representation flexibility (as an aspect of a model-agnostic explanation system)

A

It’s able to use a different feature representation than the model being explained.

10
Q

How can we further distinguish model-agnostic interpretation methods?

A

Into local and global methods.

11
Q

What do global model-agnostic interpretation methods describe?

A

How features affect the prediction on average.

12
Q

What do local model-agnostic interpretation methods describe?

A

An individual prediction.

13
Q

How are global model-agnostic methods often expressed?

A

As expected values based on the distribution of the data.

14
Q

What is the partial dependence plot?

A

A feature effect plot: the expected prediction when all other features are marginalized out.

15
Q

When are global interpretation methods particularly useful?

A

When you want to understand the general mechanisms in the data or debug a model (since they describe average behavior).

16
Q

PDP (abbreviation)

A

Partial dependence plot

17
Q

PD plot (abbreviation)

A

partial dependence plot

18
Q

What does the PDP show?

A

The marginal effect that one or two features have on the predicted outcome of an ML model. It can show whether the relationship between the target and a feature is linear, monotonic, or more complex.

19
Q

What does xS denote in the PD function for regression?

A

The features for which the PD function should be plotted.

20
Q

What does XC denote in the PD function for regression?

A

The other features (those not in S) that are used in the ML model ^f.

21
Q

How does PD work?

A

By marginalizing the ML model output over the distribution of the features in set C, so that the function shows the relationship between the features in set S that we are interested in and the predicted outcome.

22
Q

Give the PD function for regression in the form of an expectation:

A

^fS(xS) = E_XC[ ^f(xS, XC) ].

23
Q

Give the PD function for regression in the form of an integral:

A

^fS(xS) = ∫ ^f(xS, xC) dP(xC).

24
Q

How is the partial function ^fS estimated?

A

By calculating averages in the training data, using the Monte Carlo method.

25
Q

Give the partial function ^fS that is used in the PD function for regression:

A

^fS(xS) = (1/n) Σ_{i=1}^{n} ^f(xS, xC^(i)).
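
This Monte Carlo average can be sketched in a few lines of numpy; the function name, toy model, and grid here are illustrative, not from the source:

```python
import numpy as np

def partial_dependence(model_predict, X, s, grid):
    """Estimate ^fS at each grid value of feature s (a column index):
    fix xS at the grid value for every instance, keep each instance's
    own xC^(i), predict, and average."""
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, s] = value                           # fix xS for all instances
        pd_values.append(model_predict(X_mod).mean()) # (1/n) * sum ^f(xS, xC^(i))
    return np.array(pd_values)

# Toy linear model ^f(x) = 2*x0 + x1, so the PD curve for feature 0 has slope 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
predict = lambda X: 2 * X[:, 0] + X[:, 1]

grid = np.array([-1.0, 0.0, 1.0])
pd_curve = partial_dependence(predict, X, s=0, grid=grid)
# pd_curve rises by 2.0 per unit of the grid, recovering the model's slope.
```

Because the model is linear here, the PD curve is an exact straight line; for a real black box it would trace out whatever average shape the model has learned.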

26
Q

What does the partial function ^fS in the PD function for regression tell us?

A

For given values of features S, it tells us what the average marginal effect on the prediction is.

27
Q

What does xC^(i) denote in the partial function ^fS in the PD function for regression?

A

Actual feature values from the dataset for the features we are not interested in.

28
Q

What is n in the partial function ^fS in the PD function for regression?

A

The number of instances in the dataset.

29
Q

What is the assumption of the PDP about the relationship between C and S?

A

The features in C are not correlated with the features in S.

30
Q

What happens if the assumption that features in C are not correlated with features in S is violated in PDP?

A

The averages calculated for the PDP will include data points that are unlikely or even impossible.

31
Q

What does the PDP display for classification where the ML model outputs probabilities?

A

The probability for a certain class given different values for features in S.

32
Q

What kind of model-agnostic method is the PDP?

A

A global method.

33
Q

How do you calculate the partial dependence for categorical features?

A

Replace the feature value of all data instances with one value and average the predictions.
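
The same replace-and-average step can be sketched in numpy; the helper name and toy model are illustrative, not from the source:

```python
import numpy as np

def categorical_partial_dependence(model_predict, X, s, categories):
    """PD for a categorical feature: replace the feature value of ALL
    data instances with one category, predict, and average - giving
    one PD value per category."""
    pd = {}
    for c in categories:
        X_mod = X.copy()
        X_mod[:, s] = c
        pd[c] = float(model_predict(X_mod).mean())
    return pd

# Toy model: prediction = 10 * category + other feature.
X = np.array([[0, 1.0], [1, 2.0], [2, 3.0], [0, 4.0]])
predict = lambda X: 10 * X[:, 0] + X[:, 1]

pd = categorical_partial_dependence(predict, X, s=0, categories=[0, 1, 2])
# The other feature averages to 2.5, so pd == {0: 2.5, 1: 12.5, 2: 22.5}
```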

34
Q

What does a flat PDP indicate?

A

The feature is not important.

35
Q

How is importance of a feature defined in PDP for numerical features?

A

As the deviation of the PD curve at each unique feature value from the average curve.

36
Q

What is the variable for the importance of numerical features?

A

I(xS)

37
Q

Range rule

A

A way of roughly estimating the deviation when you only know the range: deviation ≈ range / 4.

38
Q

Why should the PDP-based feature importance be interpreted with care?

A

It captures only the main effect of the feature and ignores possible feature interactions.

39
Q

Name three disadvantages of the PDP:

A
  1. It doesn’t show the feature distribution, so you might overinterpret regions with almost no data.
  2. Assumption of independence.
  3. Heterogeneous effects might be hidden (averaged out by marginalizing).
40
Q

What does permutation feature importance measure?

A

The increase in the prediction error of the model after we permute the feature’s values, which breaks the relation between the feature and the true outcome.
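
A minimal numpy sketch of this procedure (the helper name and toy model are invented for illustration):

```python
import numpy as np

def permutation_importance(model_predict, X, y, error_fn, rng):
    """Importance of feature j = model error after shuffling column j
    minus the baseline error; shuffling breaks the relation between
    the feature and the true outcome."""
    baseline = error_fn(y, model_predict(X))
    importances = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        importances.append(error_fn(y, model_predict(X_perm)) - baseline)
    return np.array(importances)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = 3 * X[:, 0]                      # only feature 0 matters
predict = lambda X: 3 * X[:, 0]      # a model that ignores feature 1
mse = lambda y, yhat: float(np.mean((y - yhat) ** 2))

imp = permutation_importance(predict, X, y, mse, rng)
# imp[0] is large; imp[1] is exactly 0, since the model never uses feature 1.
```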

41
Q

When is a feature important when using permutation feature importance?

A

If shuffling its values increases the model error, because then the model relied on the feature for the prediction.

42
Q

Should you compute importance on training or test data?

A

Since permutation feature importance relies on measurements of the model error, you should use unseen test data; on training data, an overfitted model can make features look important even though they do not help it generalize.

43
Q

Global surrogate model

A

An interpretable model that is trained to approximate the predictions of a black box model.

44
Q
A