W5-Machine Learning Modeling Pipelines in Production Flashcards
Interpretability methods can be intrinsic or post-hoc. Define each.
-Intrinsically interpretable models have been around for a long time and provide a higher degree of certainty as to why they generated a particular result.
-Post-hoc methods treat models as black boxes and are applied after training. They tend to treat all models the same and often don’t evaluate the actual sequence of operations that led to the generation of the results.
Interpretability methods can also be grouped according to whether they are local or global. What is each of them based on?
Based on whether the method explains an individual prediction (local, e.g. SHAP values for a single data point) or the model's overall behavior (global, e.g. the mean SHAP value of each feature across the whole dataset).
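A minimal sketch of how the shap library can produce both views (the dataset, model, and sample sizes here are illustrative assumptions, not part of the course material):

```python
import shap
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

# Illustrative setup: a tree model on the California housing data.
X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[:2000], y[:2000])

# TreeExplainer computes SHAP values for tree-based models.
explainer = shap.TreeExplainer(model)
shap_values = explainer(X[:200])

# Local: feature contributions to one individual prediction.
shap.plots.waterfall(shap_values[0])

# Global: mean |SHAP value| per feature across the sample.
shap.plots.bar(shap_values)
```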
What is the difference between permutation importance and counterfactual explanation?
-Permutation importance is a global method: it measures the relative importance of each feature to the model's overall performance by shuffling that feature's values and observing how much performance drops.
-A counterfactual explanation is a local method: it explains why a specific prediction was made for a given instance by finding an alternate instance, with minimal feature changes, that would have resulted in a different prediction.
In summary, while both techniques involve changing the values of input features, permutation importance explains how important features are to overall model performance, while counterfactual explanations explain individual predictions and show how changing the inputs can change the model's output.
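A short sketch of global permutation importance using scikit-learn (the dataset and model are arbitrary choices for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much the test score drops:
# a large drop means the model relies heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: {result.importances_mean[i]:.4f} "
          f"+/- {result.importances_std[i]:.4f}")
```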
Many classic models are highly interpretable, such as ____models and ____ models.
tree-based,
linear
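For instance, a linear model's learned coefficients can be read directly as the explanation (a sketch using scikit-learn and an arbitrary public dataset):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = LinearRegression().fit(X, y)

# Intrinsic interpretability: each coefficient is the change in the
# prediction per unit change in that feature, holding the others fixed.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:+.3f}")
```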
Monotonic features help improve interpretability by matching our intuition about the reality of the world we are trying to model. Give an example of this
For example, if you’re trying to create a model to predict the value of a used car, then when all other features are held constant, the more miles on the car, the less the value should be. You don’t expect a car with more miles to be worth more than the same car with fewer miles, all other things being equal. This matches your knowledge of the world, so your model should match it too, and the mileage feature should be monotonic.
Monotonic means that the contribution of the feature towards the model result either consistently increases, consistently decreases, or stays constant as the feature value changes.
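One way to enforce this in practice (a sketch, assuming scikit-learn's HistGradientBoostingRegressor and made-up used-car data):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

# Synthetic used-car data: feature 0 = mileage, feature 1 = engine size.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 200_000, 1_000), rng.uniform(1.0, 5.0, 1_000)])
y = 30_000 - 0.1 * X[:, 0] + 2_000 * X[:, 1] + rng.normal(0, 1_000, 1_000)

# monotonic_cst: -1 forces a monotonically decreasing effect (mileage),
# 0 leaves the feature unconstrained (engine size).
model = HistGradientBoostingRegressor(monotonic_cst=[-1, 0]).fit(X, y)
```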
Linear models are very interpretable because linear relationships are easy to understand and interpret and the features of linear models are always monotonic. True/False
True
More complex model architectures can achieve high accuracy, but this often comes at a price in terms of interpretability. However, ____ is one example of a newer architecture that delivers far greater accuracy while still delivering good interpretability.
TensorFlow Lattice
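A rough sketch of a calibrated lattice with monotonicity constraints using the tensorflow_lattice Keras layers; the features, keypoints, and output ranges are assumptions for illustration, and the layer arguments follow the library's documented API as I understand it:

```python
import numpy as np
import tensorflow as tf
import tensorflow_lattice as tfl

# Two features (car age in years, mileage) calibrated so the predicted
# price can only decrease as either one increases.
model = tf.keras.Sequential([
    tfl.layers.ParallelCombination([
        tfl.layers.PWLCalibration(
            input_keypoints=np.linspace(0.0, 20.0, num=5),
            output_min=0.0, output_max=1.0,
            monotonicity='decreasing'),
        tfl.layers.PWLCalibration(
            input_keypoints=np.linspace(0.0, 200_000.0, num=5),
            output_min=0.0, output_max=1.0,
            monotonicity='decreasing'),
    ]),
    # The lattice itself is monotonic in its (already calibrated) inputs.
    tfl.layers.Lattice(
        lattice_sizes=[2, 2],
        monotonicities=['increasing', 'increasing'],
        output_min=0.0, output_max=50_000.0),
])
model.compile(loss='mse', optimizer='adam')
```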
What are partial dependence plots used for?
Partial dependence plots (PDP) are a widely used method to understand the effects of particular features on model results and the type of relationship between those features and the targets or labels in training data.
PDP typically focuses on the marginal impact of one or two features on model results, which could be linear and monotonic or more complex.
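A quick sketch of one-way and two-way PDPs with scikit-learn (dataset and feature names chosen only for illustration):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# One-way PDPs for two features, plus a two-way interaction plot.
PartialDependenceDisplay.from_estimator(
    model, X, features=["MedInc", "AveOccup", ("MedInc", "AveOccup")])
plt.show()
```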
PDP is intuitive and causal when the features are not correlated and is fairly easy to implement. True/False
True
PDP is causal in the sense that if we change a feature and measure the changes in the results, we expect the results to be consistent.
What are PDP's limitations?
PDP has some limitations, such as only being able to show one or two features at a time and assuming independence between the features you're analysing, which may not always hold in real-world scenarios.
What are permutation feature importance limitations?
Some disadvantages of permutation feature importance include the uncertainty of whether to measure it on training or test data, problems with correlated features, and the need for access to the original labeled training data set.
What are the limitations/disadvantages of SHAP values?
Shapley values have some disadvantages, including being computationally expensive, easily misinterpreted, and always using all the features.
Shapley does not create a model, so it cannot be used to test changes in inputs, and it does not work well when features are correlated.
What do Concept activation vectors (CAVs) do?
Concept activation vectors (CAVs) are an advanced approach that provides an interpretation of a neural network’s internal state.
CAVs can be used to sort examples or images with respect to their relationship to a concept, providing confirmation that the CAVs correctly reflect the concept of interest.
LIME is a popular and well-known framework for creating local interpretations of model results. True/False
True
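A minimal sketch with the lime package's tabular explainer (the model and dataset are arbitrary; only commonly used arguments are shown):

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)

# Fit a local interpretable surrogate around one instance and report the
# top feature contributions to that single prediction.
exp = explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(exp.as_list())
```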