ML Flashcards

1
Q

Which of the following best describes the output of the
Metropolis-Hastings algorithm?

A

A random sample from the posterior
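
A minimal sketch (illustrative names, symmetric random-walk proposal assumed) of how the algorithm turns an unnormalized log-posterior into such samples:

```python
# Minimal random-walk Metropolis-Hastings sketch: the chain states are (correlated)
# random samples from the target posterior, known only up to a normalizing constant.
import numpy as np

def metropolis_hastings(log_post, x0, n_steps=5000, step=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + step * rng.standard_normal()    # symmetric proposal
        log_alpha = log_post(proposal) - log_post(x)   # acceptance log-ratio
        if np.log(rng.uniform()) < log_alpha:          # accept with prob min(1, alpha)
            x = proposal
        samples.append(x)                              # kept states ~ posterior
    return np.array(samples)

# Example: draw samples from a standard-normal "posterior"
draws = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0)
```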

2
Q

After the clustering has converged, which output (i.e., parameter of K-Means)
needs to be recalculated outside the loop before returning the outputs such
that both outputs are consistent with each other?

A

Cluster assignments: the last step inside the loop recalculates the means and then checks
convergence. Thus, we should recalculate the cluster assignments based on the final
set of means.
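
A minimal K-Means sketch (hypothetical implementation, empty clusters ignored) showing the extra recomputation after the loop:

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    means = X[rng.choice(len(X), size=k, replace=False)]   # random init
    for _ in range(n_iter):
        assign = np.linalg.norm(X[:, None] - means[None], axis=2).argmin(axis=1)
        new_means = np.array([X[assign == j].mean(axis=0) for j in range(k)])
        converged = np.allclose(new_means, means)           # means updated last, then checked
        means = new_means
        if converged:
            break
    # the loop ends right after updating the means, so the assignments must be
    # recomputed from the *final* means to keep both outputs consistent
    assign = np.linalg.norm(X[:, None] - means[None], axis=2).argmin(axis=1)
    return means, assign
```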

3
Q

In practice, it is common to apply PCA prior to K-Means. What is the main
motivation for this pre-processing?

A

Decorrelation: K-Means does not use covariance information and implicitly assumes
the features are uncorrelated. PCA is therefore primarily used to decorrelate the
data. Reducing dimensionality may be a secondary motivation.
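
A hedged sklearn-style sketch of this pre-processing; the scaler, component count, and cluster count are illustrative choices:

```python
# Hypothetical pipeline: PCA decorrelates the features (and can optionally reduce
# dimensionality) before K-Means, whose distance computation ignores covariance.
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

pipeline = make_pipeline(
    StandardScaler(),             # put features on comparable scales
    PCA(),                        # rotate onto uncorrelated principal components
    KMeans(n_clusters=3, n_init=10),
)
# labels = pipeline.fit_predict(X)   # X: (n_samples, n_features) array
```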

4
Q

What is the difference between interpretability and explainability?

A

Interpretability concerns
models that are self-explanatory.
Explainability is used to provide
individual explanations to understand black-box models.

5
Q

Which three properties do you think are the most important properties of
individual explanations?

A

Fidelity, Plausibility, Confidence

6
Q

Which categories in the interpretable ML taxonomy given by Molnar apply to
(interpretation of) a linear regression model?

A

Modular, intrinsic, model-specific

7
Q

In Generalized Additive Models, how can we model quadratic feature interactions in component functions?

A

By using decision trees with limited depth to model the component functions, e.g. max
depth 2 if the features along a path are different, or deeper as long as at most 2 distinct
features are used in the tree.
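
A hedged sketch with synthetic data: a single pairwise component function modelled by a depth-2 regression tree on the two chosen features:

```python
# A depth-2 tree can split on at most two (different) features along any root-to-leaf
# path, so it captures up to pairwise interactions for one GAM component.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 4))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(500)

# component function f_01(x_0, x_1) restricted to the chosen feature pair
f_01 = DecisionTreeRegressor(max_depth=2).fit(X[:, [0, 1]], y)
contribution_01 = f_01.predict(X[:, [0, 1]])   # this component's additive contribution
```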

8
Q

Under what constraints (imposed during training) can a GAM generate a scoring model?

A

We restrict the GAM component functions to use only a weighted sum of indicator functions, where the weights are constrained to be integers.
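
A hypothetical worked example of such a scoring model (thresholds and integer weights are invented for illustration):

```python
# Hypothetical scoring model: each GAM component is an integer-weighted sum of
# indicator functions, so a prediction is just adding up points.
def risk_score(age, systolic_bp, smoker):
    score = 0
    score += 2 * (age > 60)            # indicator with integer weight 2
    score += 1 * (systolic_bp > 140)   # indicator with integer weight 1
    score += 3 * int(smoker)           # indicator with integer weight 3
    return score                       # e.g. flag "high risk" if score >= 4

print(risk_score(age=65, systolic_bp=150, smoker=False))   # -> 3
```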

9
Q

Given the illustration of the trained decision tree model for the ‘Play Tennis’
task, provide an explanation of the model's decision for the following example
using one of the explainability methods discussed in the lecture: [Outlook = Sunny,
Temperature = Mild, Humidity = Normal, Windy = True].
This example follows the Outlook: Sunny → Humidity: Normal → Yes path.

A

IF (Outlook = ‘Sunny’) AND (Humidity = ‘Normal’) THEN Output = Yes

10
Q

Assume that two variables, namely xj and xk, interact and that xj is a
causal ancestor of xk.
Why does PDP fall short in visualizing the univariate causal effect of xj on
the model?

A

Because if xk depends on xj, the expected value of xk is determined by xj.
Hence, when we iterate over xj independently, the dependent variable would have to be
adjusted accordingly.
When we pass over the dataset with an arbitrary fixed value of xj, the generated samples contain unrealistic combinations of these features, and thus the resulting statistics are not correct.
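
A minimal sketch of the PDP computation that makes the issue explicit: only xj is overridden on the grid, while xk keeps its observed values (`model` stands for any fitted estimator with a `predict` method):

```python
import numpy as np

def partial_dependence(model, X, j, grid):
    # classic PDP: force x_j to each grid value, leave every other column (incl. x_k)
    # at its observed value, then average the predictions
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v                       # unrealistic (x_j, x_k) pairs can appear here
        pd_values.append(model.predict(X_mod).mean())
    return np.array(pd_values)
```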

11
Q

Assume that two variables, namely xj and xk, interact and that xj is a
causal ancestor of xk.
How can we handle this issue using PDP?

A

A solution would be to analyze the PDP for these interacting variables jointly.
Even if a grid search is conducted to generate the value pairs, the univariate histograms shown along the axes can reveal which combinations are realistic.
Moreover, if the causal relation is known, the dependent variable of the two can be drawn from the conditional distribution p(xk|xj).
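
A hedged sketch of the second idea, resampling xk from a crude estimate of p(xk|xj) so that the evaluated points stay realistic (the bandwidth and resampling scheme are illustrative assumptions):

```python
import numpy as np

def conditional_dependence(model, X, j, k, grid, bandwidth=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v
        # crude estimate of p(x_k | x_j ~ v): resample x_k from nearby observations
        near = np.abs(X[:, j] - v) < bandwidth
        if near.any():
            X_mod[:, k] = rng.choice(X[near, k], size=len(X))
        values.append(model.predict(X_mod).mean())
    return np.array(values)
```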

12
Q

In permutation feature importance, the assumption is that when perturbed,
more influential features cause ………… error.

A

higher
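
A minimal sketch of permutation feature importance under exactly this assumption (`error_fn` and `model` are placeholders for any metric and fitted estimator):

```python
import numpy as np

def permutation_importance(model, X, y, error_fn, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    base_error = error_fn(y, model.predict(X))
    importances = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the feature's link to y
        # assumption: the more influential the feature, the higher the error increase
        importances.append(error_fn(y, model.predict(X_perm)) - base_error)
    return np.array(importances)
```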

13
Q

What are the drawbacks of global surrogates?

A

They may not model the global complexity of the original model.
They may not model feature interactions.
Results may reflect their own structural bias.
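
For context, a hedged sketch of fitting a global surrogate, whose shallow-tree simplicity is also the source of these drawbacks:

```python
# A shallow decision tree is trained to mimic the black-box predictions; being simple
# and axis-aligned, it may miss global complexity and feature interactions, and its
# explanations inherit the tree's own structural bias.
from sklearn.tree import DecisionTreeRegressor

def fit_global_surrogate(black_box, X, max_depth=3):
    y_hat = black_box.predict(X)                      # surrogate targets = model outputs
    surrogate = DecisionTreeRegressor(max_depth=max_depth).fit(X, y_hat)
    fidelity_r2 = surrogate.score(X, y_hat)           # how faithfully it imitates the model
    return surrogate, fidelity_r2
```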

14
Q

Which term in the optimization objective of LIME aims to ensure ‘local fidelity’
with respect to an instance of interest?

A

The proximity term π_k(z), which weights each perturbed sample z by its closeness to the instance of interest.
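
As a hedged illustration (the exact kernel and width used in the lecture may differ), this proximity term is commonly an exponential kernel over a distance measure:

```python
import numpy as np

def proximity_kernel(x, z, width=0.75):
    distance = np.linalg.norm(x - z)                  # D(x, z), e.g. Euclidean or cosine
    return np.exp(-(distance ** 2) / (width ** 2))    # weight of perturbed sample z near x
```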

15
Q

What are the open problems concerning LIME?

A

measuring similarity / proximity
instance sampling
choosing the interpretable version depending on task

16
Q

Assume that you work as a machine learning expert in a company and
there are a range of critical tasks on which black-box models are running. You are
tasked with building an explanation system for each of these models using ‘LIME’.
What would your choice of an ‘interpretable representation’ for the models with the
following original input representations be?
Binary tabular input: x in {0,1}^d, where d is feature dimensionality.

A

This is the simple, interpretable representation that we would like to get. Therefore,
no transformation is needed.

17
Q

Assume that you work as a machine learning expert in a company and
there are a range of critical tasks on which black-box models are running. You are
tasked with building an explanation system for each of these models using ‘LIME’.
What would your choice of an ‘interpretable representation’ for the models with the
following original input representations be?
Continuous tabular input: x in R^d

A

We can binarize the continuous features using a suitable threshold. The threshold can
be the training-set mean or median, i.e. a statistic measuring central tendency. This is
also very application dependent: for example, in a medical setting, a value within the
normal range can be represented as 0 and any value outside it as 1.
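
A minimal sketch of such a binarization using training-set medians as the thresholds (purely illustrative):

```python
import numpy as np

def binarize(X_train, X):
    thresholds = np.median(X_train, axis=0)     # per-feature central tendency
    return (X > thresholds).astype(int)         # 1 = above the typical/"normal" value
```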

18
Q

Assume that you work as a machine learning expert in a company and
there are a range of critical tasks on which black-box models are running. You are
tasked with building an explanation system for each of these models using ‘LIME’.
What would your choice of an ‘interpretable representation’ for the models with the
following original input representations be?
Free text input

A

We can get a binary bag-of-words (BoW) representation, where a word is represented
with a 1 if it appears in the corresponding document/input to explain, and a 0 otherwise.
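
A hedged sklearn sketch of this binary bag-of-words representation (the example sentences are invented):

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(binary=True)   # 1 if a word occurs in the document, else 0
Z = vectorizer.fit_transform(["the loan was denied", "the loan was approved quickly"])
print(vectorizer.get_feature_names_out())
print(Z.toarray())
```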

19
Q

Assume you work in an HR consultancy company and you are tasked with
developing an interpretable applicant classification model for a given position. There
are two binary and three continuous features and the target variable (invite to
interview) is binary. What would you do? Which model family / classification
method would you use? Explain your answer.

A

First, the answer should be an intrinsically interpretable model, such as logistic regression (LR), an SVM with a linear kernel, a GAM, or a decision tree (DT).
Among these, a GAM is the most suitable, as it can also handle feature interactions while
remaining interpretable without any preprocessing.
GAMs, SVMs and LR can process the binary/continuous input without any issues and can yield interpretable models.
However, for the DT to yield an interpretable model, we need to discretize the continuous features.
This can be done via binarization using the training-set mean as the threshold.
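
One hypothetical concrete realization, assuming the interpret package is available: an Explainable Boosting Machine (a tree-based GAM) fitted on toy data with two binary and three continuous features:

```python
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 2, size=(200, 2)),     # two binary features
    rng.standard_normal((200, 3)),         # three continuous features
])
y = rng.integers(0, 2, size=200)           # invite to interview (toy labels)

model = ExplainableBoostingClassifier().fit(X, y)
# model.explain_global() exposes the per-feature shape functions for inspection
```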