Interpreting ML Book Cards Flashcards
What does each coefficient represent in a linear model?
Each coefficient quantifies the change in the outcome variable for a one-unit change in that feature, assuming other variables are held constant.
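A minimal sketch of reading coefficients from a fitted scikit-learn model; the data and feature names ("sqft", "age") are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: two hypothetical features predicting a price-like outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
# Each coefficient: expected change in y for a one-unit increase
# in that feature, holding the other feature constant.
print(dict(zip(["sqft", "age"], model.coef_)))
```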
Why is standardization important in linear models?
Standardization (scaling features to have zero mean and unit variance) puts all coefficients on a common scale, making them directly comparable and thereby helping assess the relative importance of features.
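A sketch of standardizing inside a scikit-learn pipeline, assuming made-up features on very different scales:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical features with very different scales.
rng = np.random.default_rng(0)
sqft = rng.normal(1500, 400, size=200)
age = rng.normal(20, 8, size=200)
X = np.column_stack([sqft, age])
y = 0.3 * sqft - 5.0 * age + rng.normal(scale=50, size=200)

pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
# After standardization the coefficients share a scale and can be compared.
print(pipe.named_steps["linearregression"].coef_)
```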
What are the key assumptions for interpreting linear models?
Linearity, independence of errors, homoscedasticity (constant variance of errors), normality of residuals, and no strong multicollinearity among features.
How does high multicollinearity affect linear models?
High multicollinearity inflates the variance of coefficient estimates, making them unstable (small changes in the data can flip signs or magnitudes); it can be mitigated with techniques such as ridge or lasso regression.
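One common way to detect multicollinearity is the variance inflation factor (VIF); a sketch using statsmodels on deliberately collinear toy data:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
X = np.column_stack([np.ones(200), x1, x2])  # intercept column included

# Rule of thumb: VIF above roughly 5-10 signals problematic collinearity.
for i in (1, 2):
    print(f"VIF x{i}: {variance_inflation_factor(X, i):.1f}")
```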
What is the structure of a decision tree?
Decision trees split data based on feature values to create nodes and branches, leading to leaf nodes that provide the final prediction.
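A small sketch that prints a tree's node-and-branch structure as text, using scikit-learn's built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
# Each internal node is a feature threshold; each leaf holds a prediction.
print(export_text(tree, feature_names=list(iris.feature_names)))
```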
How do decision trees handle interpretability?
Decision trees use if-then-else conditions that correspond to paths from the root to leaf nodes, making them easy to visualize and understand.
What is a common limitation of decision trees?
Decision trees can overfit the data, capturing noise rather than underlying patterns, which can be mitigated by pruning.
What is LIME and what does it do?
LIME (Local Interpretable Model-agnostic Explanations) explains predictions of any black-box model by creating an interpretable local approximation of the model around a specific instance.
How does LIME create explanations?
LIME perturbs the instance by making small random changes to its feature values and fits a simple model (e.g., linear) to approximate the complex model’s behavior around that instance.
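A minimal sketch using the `lime` package (assuming it is installed) to explain one prediction of a black-box classifier:

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
model = RandomForestClassifier(random_state=0).fit(iris.data, iris.target)

explainer = LimeTabularExplainer(
    iris.data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    mode="classification",
)
# Perturbs the instance and fits a local linear surrogate around it.
exp = explainer.explain_instance(iris.data[0], model.predict_proba, num_features=2)
print(exp.as_list())  # (feature condition, local weight) pairs
```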
What are some limitations of LIME?
LIME’s explanations are local, not global, and its effectiveness depends on how well the perturbed data samples the local space of the instance.
What are Shapley values?
Shapley values are derived from game theory to fairly attribute the output of a model to its input features, showing each feature’s contribution to a prediction.
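A toy brute-force computation straight from the game-theoretic definition (the value function and player "worths" here are made up); note the cost is exponential in the number of players:

```python
from itertools import combinations
from math import factorial

def shapley(v, n):
    """Exact Shapley values for value function v over players 0..n-1."""
    phi = []
    for i in range(n):
        others = [p for p in range(n) if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Weight = |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi

# Additive toy game: a coalition's value is the sum of its members' worth,
# so each player's Shapley value equals its own worth.
worth = {0: 3.0, 1: 1.0, 2: 2.0}
print(shapley(lambda S: sum(worth[p] for p in S), 3))  # [3.0, 1.0, 2.0]
```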
What are the key properties of Shapley values?
Efficiency, symmetry, dummy, and additivity—these properties ensure fair attribution of the model’s output to features.
What are the limitations of Shapley values?
They are computationally intensive, especially with many features, and assume feature independence, which may not always hold.
What is SHAP and how does it relate to Shapley values?
SHAP (SHapley Additive exPlanations) is an implementation of Shapley values tailored for machine learning models, offering efficient computation and a unified interpretation framework.
How does SHAP provide interpretations?
SHAP values represent additive feature attribution, decomposing a prediction into contributions from each feature, with specific methods like Kernel SHAP, Tree SHAP, and Deep SHAP.
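A sketch of Tree SHAP with the `shap` package (assuming it is installed), on a regression forest so the output is a single attribution matrix:

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

# Tree SHAP computes exact Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)
# Base value plus per-feature contributions reconstructs each prediction.
print(shap_values.shape)
```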
What are some visualization tools used with SHAP?
Force plots, summary plots, and dependence plots are used to visualize how features contribute to individual predictions and overall model behavior.
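Continuing the Tree SHAP sketch above, two of these plots in code:

```python
shap.summary_plot(shap_values, X)        # global overview of feature impact
shap.dependence_plot(0, shap_values, X)  # one feature's effect vs. its value
```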
What are the benefits of using SHAP?
SHAP provides consistent and unbiased explanations, handles feature interactions, and has fast computation methods, especially for tree-based models.
What are the limitations of SHAP?
SHAP values can be overwhelming with many features, require a good understanding of the data and model, and are sensitive to data distributions and feature-independence assumptions.
How do you interpret the coefficient of a feature in a standardized linear model?
The coefficient represents the number of standard deviations the outcome variable will change for a one standard deviation increase in the feature, assuming all other variables are constant.
What is homoscedasticity and why is it important in linear models?
Homoscedasticity means that the variance of the errors is constant across all levels of the independent variables; it’s important because when it is violated, coefficient estimates remain unbiased but their standard errors, and therefore confidence intervals and hypothesis tests, become unreliable.
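A quick visual check is a residuals-vs-fitted plot; homoscedastic errors form a roughly constant band around zero. A sketch on simulated data:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# No funnel or curve in this scatter suggests constant error variance.
plt.scatter(model.predict(X), residuals, s=10)
plt.axhline(0, color="gray")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```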
What methods can be used to handle multicollinearity in linear models?
Ridge regression (adds L2 regularization) and Lasso regression (adds L1 regularization) are common methods to stabilize coefficients and reduce multicollinearity effects.
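A sketch comparing the two on deliberately collinear toy data; the alpha values are illustrative, not tuned:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear features
X = np.column_stack([x1, x2])
y = 2 * x1 + rng.normal(scale=0.5, size=200)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # L2: shrinks correlated coefs together
print(Lasso(alpha=0.1).fit(X, y).coef_)  # L1: may zero one of them out
```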
What is the Gini impurity in decision trees?
Gini impurity measures the likelihood of an incorrect classification of a randomly chosen element if it were randomly classified according to the distribution of labels in the node. A lower Gini impurity indicates a purer node.
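The formula is 1 minus the sum of squared class proportions; a tiny sketch:

```python
def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions in the node."""
    n = len(labels)
    props = [labels.count(c) / n for c in set(labels)]
    return 1 - sum(p * p for p in props)

print(gini_impurity(["a", "a", "a", "a"]))  # 0.0: pure node
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5: maximally mixed, two classes
```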
How do decision trees determine feature importance?
Decision trees determine feature importance based on how effectively each feature splits the data, using metrics like Gini impurity or information gain. Features that result in more homogeneous splits are deemed more important.
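In scikit-learn these impurity-based importances are exposed directly on a fitted tree:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
# Total impurity decrease attributable to each feature, normalized to sum to 1.
for name, imp in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {imp:.3f}")
```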
What is pruning in decision trees and why is it used?
Pruning involves cutting off parts of the tree that have little predictive power to reduce overfitting and improve the tree’s ability to generalize to new data.
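A sketch of scikit-learn's cost-complexity (post-)pruning; the choice of alpha here is illustrative, and in practice it would be selected by cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
# Larger ccp_alpha prunes more aggressively, trading fit for simplicity.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    iris.data, iris.target
)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # an illustrative midpoint
pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(
    iris.data, iris.target
)
print(pruned.get_n_leaves())  # fewer leaves than the unpruned tree
```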