Chapter 15 Probabilistic Model Selection with AIC, BIC, and MDL Flashcards
It is common to choose a model that performs the best on a hold-out test dataset or to estimate model performance using a resampling technique, such as k-fold cross-validation. What’s an alternative approach? Give an example P 136
Using probabilistic statistical measures that attempt to quantify both the model performance on the training dataset and the complexity of the model. Examples include the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Minimum Description Length (MDL).
What’s one benefit and one limitation of the information criterion statistics? P 136
The benefit of these information criterion statistics is that they do not require a hold-out test set; a limitation is that they do not take the uncertainty of the models into account and may end up selecting models that are too simple.
The simplest reliable method of model selection involves fitting candidate models on a training set, tuning them on a validation dataset, and selecting the model that performs best on a test dataset according to a chosen metric, such as accuracy or error. What is the problem with this approach to evaluation? P 137
A problem with this approach is that only model performance is assessed, regardless of model complexity.
Probabilistic model selection (or ____) provides an analytical technique for scoring and choosing among candidate models. Models are scored both on their ____ and based on the ____. P 137
information criteria, performance on the training dataset, complexity of the model
What’s the definition of model performance and model complexity? P 137
Model Performance. How well a candidate model has performed on the training dataset.
Model Complexity. How complicated the candidate model is after training.
Model performance may be evaluated using a probabilistic framework, such as ____ under the framework of maximum likelihood estimation. Model complexity may be evaluated as ____ aka ____. P 137
Log-likelihood, the number of degrees of freedom, parameters in the model
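As a concrete illustration of evaluating log-likelihood under maximum likelihood estimation (not from the card itself), here is a minimal sketch that fits a Gaussian to data using the closed-form MLE estimates; for a Gaussian, the model complexity would be k = 2 parameters (mean and variance):

```python
import math

def gaussian_log_likelihood(data):
    """Log-likelihood of the data under a Gaussian fit by maximum likelihood.

    MLE estimates: mean = sample mean, variance = biased sample variance.
    At the MLE, the sum of squared deviations term simplifies to n/2, giving
    the closed form -n/2 * (ln(2*pi*var) + 1).
    """
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)
```

This log-likelihood is the "performance" term that AIC, BIC, and MDL all consume; the parameter count k supplies the complexity term.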
A limitation of probabilistic model selection methods is that the same general statistic cannot be calculated across a range of different types of models. Instead, the metric must be carefully derived for each model. True/False P 138
True
There are three statistical approaches to estimating how well a given model fits a dataset and how complex the model is. Each can be shown to be equivalent or proportional to the others, even though each was derived from a different framing or field of study, and each statistic can be calculated using the log-likelihood for a model and the data. What are they, and from which field is each derived? P 138
They are:
Akaike Information Criterion (AIC). Derived from frequentist probability.
Bayesian Information Criterion (BIC). Derived from Bayesian probability.
Minimum Description Length (MDL). Derived from information theory.
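The card does not give the formulas; a minimal sketch using the commonly stated definitions (AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L, and the MDL form (k/2) ln n − ln L, which is proportional to BIC) might look like:

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

def mdl(log_likelihood, k, n):
    """Minimum Description Length, in the common form (k/2)*ln(n) - ln(L).

    Note this is exactly BIC/2, which is why MDL and BIC select the
    same model when minimized over the same candidates.
    """
    return 0.5 * k * math.log(n) - log_likelihood
```

Here `log_likelihood` is the maximized log-likelihood on the training data, `k` the number of model parameters, and `n` the number of training examples.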
To use AIC/BIC/MDL for model selection, we simply choose the model giving the ____ (smallest/biggest) AIC/BIC/MDL over the set of models considered. P 139
Smallest
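Selecting by the smallest criterion value can be sketched as follows, using hypothetical candidate models (names, log-likelihoods, and parameter counts are invented for illustration):

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical candidates: (name, log-likelihood on training data, parameter count).
candidates = [
    ("linear", -120.0, 2),
    ("quadratic", -112.0, 3),
    ("degree-9 poly", -110.5, 10),
]

# Pick the model with the smallest AIC: the degree-9 polynomial fits the
# training data slightly better, but its 10 parameters outweigh the gain.
best = min(candidates, key=lambda m: aic(m[1], m[2]))
print(best[0])  # → quadratic
```

Swapping `aic` for `bic` or `mdl` changes only the scoring function; the selection rule (take the minimum) is the same.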
“Compared to the BIC method, the AIC statistic penalizes complex models less”, what does it mean? P 139
It means that AIC may put more emphasis on model performance on the training dataset and, in turn, select more complex models.
Unlike the AIC, the BIC penalizes the model more for its complexity, meaning that more complex models will have a worse (larger) score and will, in turn, be less likely to be selected. True/False P 139
True
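The difference is visible in the complexity penalties alone: AIC penalizes each parameter by 2, while BIC penalizes it by ln(n), so BIC's penalty exceeds AIC's as soon as ln(n) > 2 (i.e. n ≥ 8). A small sketch (parameter count chosen arbitrarily):

```python
import math

k = 5  # hypothetical parameter count; both penalties scale linearly in k
for n in (5, 8, 1000):
    aic_penalty = 2 * k            # AIC complexity penalty, independent of n
    bic_penalty = k * math.log(n)  # BIC complexity penalty, grows with n
    print(f"n={n}: AIC penalty {aic_penalty}, BIC penalty {bic_penalty:.2f}")
```

For any realistic training-set size, BIC's penalty is larger, which is why it favors simpler models.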
Given a family of models, including the true model, the probability that BIC will select the correct model approaches one as the sample size N → infinity. And it’s the same for AIC too. True/False P 140
False
Importantly, the derivation of BIC under the Bayesian probability framework means that if a selection of candidate models includes a true model for the dataset, then the probability that BIC will select the true model increases with the size of the training dataset. This cannot be said for the AIC score.
A downside of BIC is that for smaller, less representative training datasets, it is more likely to choose models that are too simple. True/False P 140
True
The MDL principle takes the stance that the best theory for a body of data is one that maximizes the size of the theory plus the amount of information necessary to specify the exceptions relative to the theory True/False P 141
False
The MDL principle takes the stance that the best theory for a body of data is one that minimizes the size of the theory plus the amount of information necessary to specify the exceptions relative to the theory.
The MDL calculation is very similar to BIC and can be shown to be equivalent in some situations. True/False P 141
True