ML Flashcards
What best describes the output of the Metropolis-Hastings algorithm?
A random sample from the posterior
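A minimal random-walk Metropolis-Hastings sketch (the function name, step size, and toy target below are my own illustrative choices, not from the lecture). The returned array is a correlated chain of draws that, after burn-in, behaves like a random sample from the posterior:

```python
import numpy as np

def metropolis_hastings(log_post, init, n_samples=5000, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng() if rng is None else rng
    x, samples = init, []
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal()   # symmetric proposal
        log_alpha = log_post(proposal) - log_post(x)  # log acceptance ratio
        if np.log(rng.uniform()) < log_alpha:         # accept with prob min(1, alpha)
            x = proposal
        samples.append(x)                             # keep the current state either way
    return np.array(samples)

# Example: unnormalized log posterior of a standard normal.
draws = metropolis_hastings(lambda x: -0.5 * x**2, init=0.0)
```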
After the clustering has converged, which output (i.e., parameter of K-Means)
needs to be recalculated outside the loop before returning the outputs such
that both outputs are consistent with each other?
Cluster assignments: the last step inside the loop recalculates the means and then checks convergence. The cluster assignments should therefore be recalculated once more, outside the loop, based on the final set of means, so that the returned assignments and means are consistent with each other.
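A minimal K-Means sketch (my own naming and initialization, ignoring empty-cluster handling) showing the final reassignment outside the loop:

```python
import numpy as np

def k_means(X, k, n_iter=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    means = X[rng.choice(len(X), size=k, replace=False)]  # initialize from data points
    for _ in range(n_iter):
        # Assign each point to its nearest mean.
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        assignments = dists.argmin(axis=1)
        # Recompute the means; convergence is checked on the means.
        new_means = np.array([X[assignments == c].mean(axis=0) for c in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    # Outside the loop: recompute the assignments once more so that the
    # returned assignments match the final means.
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    assignments = dists.argmin(axis=1)
    return means, assignments
```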
In practice, it is common to apply PCA prior to K-Means. What is the main
motivation for this pre-processing?
Decorrelation: K-Means does not use covariance information and implicitly assumes the features are uncorrelated. PCA is therefore primarily used to decorrelate the data; reducing dimensionality may be a secondary motivation.
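A hedged sketch of this pre-processing with scikit-learn (the pipeline, cluster count, and whitening choice are illustrative assumptions; X is an assumed data array):

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize, decorrelate with PCA (all components kept, so no reduction),
# then cluster. whiten=True additionally rescales components to unit variance.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(whiten=True),
    KMeans(n_clusters=3, n_init=10),
)
labels = pipeline.fit_predict(X)  # X: an (n_samples, n_features) array, assumed given
```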
What is the difference between interpretability and explainability?
Interpretability concerns models that are self-explanatory.
Explainability is used to provide individual explanations to understand black-box models.
Which three properties do you think are the most important for individual explanations?
Fidelity, Plausibility, Confidence
Which categories in the interpretable ML taxonomy given by Molnar apply to
(interpretation of) a linear regression model?
Modular, intrinsic, model-specific
In Generalized Additive Models, how can we model quadratic feature interactions in component functions?
Use decision trees of limited depth to model the component functions, e.g. max depth 2 when the chosen split features are different, or deeper trees as long as at most two distinct features are used in the tree.
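A minimal backfitting sketch of such a tree-based GAM (the function names, the backfitting scheme, and the feature_groups argument are my own illustrative choices, not the lecture's training procedure):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_tree_gam(X, y, feature_groups, max_depth=2, n_rounds=20):
    """Backfitting sketch of a GAM whose component functions are shallow trees.

    feature_groups: list of column-index lists, e.g. [[0], [1], [2, 3]];
    a group of two features fitted with max_depth=2 lets that component
    capture a pairwise (quadratic) interaction between the two features.
    """
    intercept = y.mean()
    trees = [DecisionTreeRegressor(max_depth=max_depth) for _ in feature_groups]
    contrib = np.zeros((len(feature_groups), len(y)))
    for _ in range(n_rounds):
        for j, cols in enumerate(feature_groups):
            # Partial residual: remove every other component's contribution.
            residual = y - intercept - contrib.sum(axis=0) + contrib[j]
            trees[j].fit(X[:, cols], residual)
            contrib[j] = trees[j].predict(X[:, cols])
    return intercept, trees

def predict_tree_gam(X, intercept, trees, feature_groups):
    # The prediction is additive over the tree-based component functions.
    return intercept + sum(t.predict(X[:, cols]) for t, cols in zip(trees, feature_groups))
```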
Under what constraints (imposed during training) can a GAM generate a scoring model?
We restrict the GAM component functions to use only a weighted sum of indicator functions, where the weights are constrained to be integers.
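In formula form (the bin symbols B_jm and weight notation w_jm are my own), the restriction looks roughly like this:

```latex
% Each component is an integer-weighted sum of indicator (step) functions over
% bins B_{jm}, so the prediction is an integer point score summed over features.
f_j(x_j) = \sum_{m=1}^{M_j} w_{jm}\,\mathbf{1}\!\left[x_j \in B_{jm}\right],
\qquad w_{jm} \in \mathbb{Z},
\qquad \hat{s}(x) = w_0 + \sum_{j=1}^{p} f_j(x_j)
```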
Given the illustration of the trained decision tree model for the ‘Play Tennis’ task, provide an explanation of the model's decision for the following example using one of the explainability methods discussed in the lecture: [Outlook=Sunny, Temperature=Mild, Humidity=Normal, Windy=True].
This example follows the Outlook = Sunny → Humidity = Normal → Yes path.
IF (Outlook = ‘Sunny’) AND (Humidity = ‘Normal’) THEN Output = Yes
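A small helper sketching how such a rule can be read off the decision path of a fitted scikit-learn tree (path_rule is a hypothetical helper; it assumes a fitted DecisionTreeClassifier and a numerically encoded example):

```python
def path_rule(clf, x, feature_names):
    """Walk the decision path of a fitted sklearn DecisionTreeClassifier for a
    single (numerically encoded) example and return the corresponding IF-THEN rule."""
    t = clf.tree_
    node, conditions = 0, []
    while t.children_left[node] != -1:  # -1 marks a leaf in sklearn's tree structure
        f, thr = t.feature[node], t.threshold[node]
        if x[f] <= thr:
            conditions.append(f"{feature_names[f]} <= {thr:.2f}")
            node = t.children_left[node]
        else:
            conditions.append(f"{feature_names[f]} > {thr:.2f}")
            node = t.children_right[node]
    predicted_class = clf.classes_[t.value[node].argmax()]
    return "IF " + " AND ".join(conditions) + f" THEN Output = {predicted_class}"
```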
Assume that two variables, namely xj and xk, interact and that xj is a
causal ancestor of xk.
Why does PDP fall short in visualizing the univariate causal effect of xj on
the model?
Because xk is dependent on xj, the expected value of xk is partly determined by xj. When we iterate over grid values of xj independently, the dependent variable xk would therefore need to be adjusted accordingly. Since a plain PDP instead passes over the dataset with xj fixed to an arbitrary grid value, the generated samples contain unrealistic combinations of these two features, and the averaged statistics are not correct.
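A naive PDP sketch (my own minimal implementation) that makes the mechanism explicit: column j is overwritten for every row, while xk keeps its observed values:

```python
import numpy as np

def partial_dependence(model, X, j, grid):
    """Naive PDP: for each grid value of feature j, overwrite column j in the
    whole dataset and average the model's predictions."""
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v  # xj forced to v while xk keeps its observed values
        values.append(model.predict(X_mod).mean())
    return np.array(values)
```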
Assume that two variables, namely xj and xk, interact and that xj is a
causal ancestor of xk.
How can we handle this issue using PDP?
A solution is to analyze the PDP for these interacting variables jointly (a two-dimensional PDP).
Even if a grid is used to generate the value pairs, the univariate histograms shown along the axes can reveal which combinations are realistic.
Moreover, if the causal relationship is known, the dependent variable of the two can be drawn from the conditional distribution p(xk|xj).
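A sketch of the second idea (the function name, bandwidth parameter, and the empirical approximation of p(xk|xj) are my own assumptions):

```python
import numpy as np

def pdp_conditional_pair(model, X, j, k, grid, bandwidth, rng=None):
    """For each grid value of xj, redraw xk from an empirical approximation of
    p(xk | xj) (rows whose observed xj lies within `bandwidth` of the grid
    value), so the evaluated feature pairs stay realistic."""
    rng = np.random.default_rng() if rng is None else rng
    values = []
    for v in grid:
        neighbours = X[np.abs(X[:, j] - v) <= bandwidth, k]
        if len(neighbours) == 0:
            values.append(np.nan)  # no observed data near this grid value
            continue
        X_mod = X.copy()
        X_mod[:, j] = v
        X_mod[:, k] = rng.choice(neighbours, size=len(X))
        values.append(model.predict(X_mod).mean())
    return np.array(values)
```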
In permutation feature importance, the assumption is that when perturbed,
more influential features cause ………… error.
higher
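A minimal permutation-feature-importance sketch (my own naming; error_metric is any error function such as mean squared error):

```python
import numpy as np

def permutation_importance(model, X, y, error_metric, n_repeats=10, rng=None):
    """Importance of feature j = increase in error after shuffling column j."""
    rng = np.random.default_rng() if rng is None else rng
    baseline = error_metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(error_metric(y, model.predict(X_perm)))
        importances[j] = np.mean(errors) - baseline  # more influential => higher error
    return importances
```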
What are the drawbacks of global surrogates?
They may not model the global complexity of the original model.
They may not model feature interactions.
Results may reflect the surrogate's own structural bias.
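A global-surrogate sketch (black_box, X_train, and the depth-3 tree are illustrative assumptions):

```python
from sklearn.tree import DecisionTreeClassifier

# `black_box` is the trained model to be explained and X_train its training data
# (both assumed given). The surrogate is fit on the black box's predictions,
# not on the true labels.
surrogate = DecisionTreeClassifier(max_depth=3)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how well the surrogate reproduces the black box. A low score signals
# that the shallow tree cannot capture the original model's global complexity.
fidelity = surrogate.score(X_train, black_box.predict(X_train))
```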
Which term in the optimization objective of LIME aims to ensure ‘local fidelity’
with respect to an instance of interest?
The proximity kernel pi_k(z), which weights each perturbed sample z by its similarity to the instance of interest, so the surrogate's loss emphasizes the local neighbourhood.
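For reference, the LIME objective in the notation of the original paper (which writes the kernel as pi_x rather than pi_k):

```latex
% The kernel-weighted loss enforces local fidelity around the instance x, while
% Omega(g) penalizes the complexity of the interpretable surrogate g.
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g),
\qquad
\mathcal{L}(f, g, \pi_x) = \sum_{z, z'} \pi_x(z)\,\bigl(f(z) - g(z')\bigr)^2
```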
What are the open problems concerning LIME?
measuring similarity / proximity
instance sampling
choosing the interpretable representation depending on the task