W5-Machine Learning Modeling Pipelines in Production Flashcards

1
Q

Interpretability methods can be intrinsic or post-hoc. Define each.

A

-Intrinsically interpretable models have been around for a long time and provide a higher degree of certainty as to why they generated a particular result.

-Post-hoc methods treat models as black boxes and are applied after training. They tend to treat all models the same and often don’t evaluate the actual sequence of operations that led to the generation of the results.

2
Q

Interpretability methods can also be grouped according to whether they are local or global. What is each of them based on?

A

Based on whether the method explains an individual prediction (local, e.g. SHAP values for a single data point) or the entire model's behavior (global, e.g. the mean SHAP value of each feature across the whole dataset).

3
Q

What is the difference between permutation importance and counterfactual explanation?


A

Permutation importance is a global method: it measures each feature's importance by shuffling (permuting) that feature's values and measuring how much the model's overall performance degrades.

A counterfactual explanation is a local method: it explains why a specific prediction was made for a given instance by finding an alternative instance, with minimal feature changes, that would have resulted in a different prediction.

In summary, while both techniques involve changing the values of input features, permutation importance is used to understand the relative importance of features to the overall performance of the model, while counterfactual explanations are used to explain individual predictions and identify how changing the input features can affect the model's output.
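The global half of this comparison can be sketched with scikit-learn's `permutation_importance` helper. This is an illustrative example on synthetic data (the data and variable names are assumptions, not from the course):

```python
# Illustrative sketch: permutation importance on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# The target depends strongly on feature 0 and not at all on feature 1.
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
# Shuffle each feature n_repeats times and record the drop in score.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Shuffling feature 0 hurts the score far more than shuffling feature 1.
print(result.importances_mean)
```

Because the importance is defined by the drop in a global score, this tells you nothing about any single prediction, which is exactly where a counterfactual explanation would be used instead.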

4
Q

Many classic models are highly interpretable, such as ____ models and ____ models.

A

tree-based ,
linear

5
Q

Monotonic features help improve interpretability by matching our intuition about the reality of the world we are trying to model. Give an example of this

A

For example, suppose you're trying to create a model to predict the value of a used car. When all other features are held constant, the more miles on the car, the less it should be worth. You don't expect a car with more miles to be worth more than the same car with fewer miles, all other things being equal. This matches your knowledge of the world, so your model should match it too, and the mileage feature should be monotonic.

Monotonic means that the contribution of the feature towards the model result either consistently increases, consistently decreases, or stays constant as the feature value changes.

6
Q

Linear models are very interpretable because linear relationships are easy to understand and interpret and the features of linear models are always monotonic. True/False

A

True

7
Q

More complex model architectures can achieve high accuracy, but this often comes at a price in terms of interpretability. However, ____ is one example of a newer architecture that delivers far greater accuracy while also delivering good interpretability.

A

TensorFlow Lattice

8
Q

What are partial dependence plots used for?

A

Partial dependence plots (PDPs) are a widely used method to understand the effects of particular features on model results and the type of relationship between those features and the targets or labels in the training data.

PDPs typically focus on the marginal impact of one or two features on model results; the relationship could be linear and monotonic or more complex.

9
Q

PDP is intuitive and causal when the features are not correlated and is fairly easy to implement. True/False

A

True

PDP is causal in the sense that if we change a feature and measure the changes in the results, we expect the results to be consistent.

10
Q

What are PDPs' limitations?

A

PDPs have some limitations: they can only visualize one or two features at a time, and they assume independence between the features you're analyzing, which may not always hold in real-world scenarios.

11
Q

What are the limitations of permutation feature importance?

A

Some disadvantages of permutation feature importance include the uncertainty of whether to use training or test data to measure it, correlated features being a problem, and the need to have access to the original labeled training data set.

12
Q

What are the limitations/disadvantages of SHAP values?

A

Shapley values have some disadvantages, including being computationally expensive, easily misinterpreted, and always using all the features.
Shapley does not create a model, so it cannot be used to test changes in inputs, and it does not work well when features are correlated.
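The "computationally expensive" and "always uses all the features" points both follow from the definition: exact Shapley values enumerate every coalition of features, so cost grows as 2^n. A toy sketch (not the shap library; the value function here sets absent features to a background mean, which is one common but assumed choice):

```python
# Toy sketch of exact Shapley values, showing the 2^n coalition cost.
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values; absent features are set to the background mean."""
    n = len(x)
    base = background.mean(axis=0)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i, without = base.copy(), base.copy()
                for j in subset:
                    with_i[j] = x[j]
                    without[j] = x[j]
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without))
    return phi

# Sanity check: for a linear model f(x) = w @ x with a zero background,
# the Shapley value of feature i is exactly w_i * x_i.
w = np.array([2.0, -1.0, 0.5])
predict = lambda v: float(w @ v)
background = np.zeros((10, 3))
x = np.array([1.0, 2.0, 3.0])
print(exact_shapley(predict, x, background))  # → [2.0, -2.0, 1.5]
```

Three features already require evaluating every subset per feature; at tens of features, exact computation becomes infeasible, which is why practical tools rely on approximations.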

13
Q

What do Concept activation vectors (CAVs) do?

A

Concept activation vectors (CAVs) are an advanced approach that provides an interpretation of a neural network’s internal state.

CAVs can be used to sort examples or images with respect to their relationship to a concept, providing confirmation that the CAVs correctly reflect the concept of interest.
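Conceptually, a CAV is the vector normal to a linear boundary that separates concept examples from random examples in a layer's activation space. A minimal sketch using synthetic "activations" (the shift direction and sizes are assumptions standing in for a real network's internal state):

```python
# Conceptual sketch: a CAV as the normal of a linear boundary separating
# concept examples from random examples in activation space.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Pretend these are layer activations; concept examples are shifted
# along the first activation dimension.
concept = rng.normal(size=(100, 8)) + np.array([2.0] + [0.0] * 7)
random_ex = rng.normal(size=(100, 8))

acts = np.vstack([concept, random_ex])
labels = np.array([1] * 100 + [0] * 100)

# The classifier's weight vector, normalized, is the concept direction.
clf = LogisticRegression().fit(acts, labels)
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
print(cav[0])  # dominant component points along the concept shift
```

Projecting examples onto this direction is what lets CAVs sort images by how strongly they express the concept, confirming the vector reflects the concept of interest.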

14
Q

LIME is a popular and well-known framework for creating local interpretations of model results. True/False

A

True
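LIME's core idea can be sketched in a few lines: perturb the instance, weight the perturbed samples by proximity, and fit an interpretable linear surrogate to the black box locally. This is not the actual lime library; the kernel and noise scale are assumed choices:

```python
# Minimal sketch of LIME's core idea: a locally weighted linear surrogate.
import numpy as np
from sklearn.linear_model import Ridge

def lime_sketch(predict, x, n_samples=500, kernel_width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance of interest with Gaussian noise.
    Z = x + rng.normal(scale=0.5, size=(n_samples, len(x)))
    # 2. Weight samples by proximity to x (exponential kernel).
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)
    # 3. Fit an interpretable (linear) surrogate to the black box locally.
    surrogate = Ridge(alpha=1.0).fit(Z, predict(Z), sample_weight=weights)
    return surrogate.coef_  # local feature attributions

# Black box: nonlinear globally, but near x feature 0 dominates.
black_box = lambda Z: np.sin(Z[:, 0]) + 0.01 * Z[:, 1] ** 2
coefs = lime_sketch(black_box, np.array([0.0, 0.0]))
print(coefs)  # feature 0 gets a large local weight, feature 1 near zero
```

The surrogate's coefficients are the local explanation: they describe the black box only in the neighborhood of this one instance, which is exactly what makes LIME a local method.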
