model analysis Flashcards

1
Q

reasons predictive models fail

A

PVEOQ (mnemonic)

  • inadequate preprocessing of the data
  • inadequate model validation (eg not enough cross-validation / resampling statistics)
  • unjustified extrapolation (eg the model is asked to predict a point outside its training region in predictor space; see the sketch after this list)
  • overfitting
  • not considering enough candidate models
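
A minimal sketch of one such check, flagging unjustified extrapolation by testing whether a new point falls outside each predictor's training range (a crude stand-in for a proper applicability-domain or convex-hull check); the function name and data here are made up for illustration:

```python
import numpy as np

def outside_training_range(X_train, x_new):
    """Flag the predictors of x_new that fall outside the training data's ranges.

    A per-column range check: crude, but it catches the most obvious cases of
    unjustified extrapolation in predictor space.
    """
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return (x_new < lo) | (x_new > hi)  # boolean mask, one entry per predictor

# hypothetical data: 3 predictors; the new point extrapolates in the last one
X_train = np.random.default_rng(0).uniform(0, 1, size=(100, 3))
x_new = np.array([0.5, 0.2, 1.7])
print(outside_training_range(X_train, x_new))  # e.g. [False False  True]
```
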
2
Q

variance / bias tradeoff (and underfit / overfit)

A
  • bias: how far the model's fit is from the theoretical optimum (ie from the limit set by the irreducible noise)
  • variance: how much the model's fit varies over an ensemble of full resampling-and-training runs
  • overfitting: can produce low bias, but the model is overtrained on the specific data and may not do well on unseen data; the variance between runs will be high
  • underfitting: not very accurate (high bias), but it may perform about the same on unseen data (eg a straight line fit to an amorphous point cloud)
  • eg in cross-validation (see the sketch after this list)
    • very many folds can reduce bias (training on k-1 folds, ie most of the training set, gets close to the best model we could expect given the data), but show high variance between resamples (the small held-out fold could contain a lot of outliers, eg)
    • very few folds can produce a poorer fit (high bias, since each model trains on less data), but low variance (the larger held-out folds give more stable estimates)
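
A small sketch of the fold-count tradeoff, using scikit-learn on synthetic data (an assumed setup, not from the source): fewer folds train each model on less data, giving a more pessimistic (higher-bias) error estimate, while more folds give noisier estimates across the small held-out folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# hypothetical noisy regression data
X, y = make_regression(n_samples=120, n_features=5, noise=20.0, random_state=0)

for k in (2, 20):
    cv = KFold(n_splits=k, shuffle=True, random_state=0)
    # per-fold test MSE (cross_val_score returns negated MSE, so flip the sign)
    fold_mse = -cross_val_score(LinearRegression(), X, y,
                                scoring="neg_mean_squared_error", cv=cv)
    # mean tends to be higher for small k (pessimistic bias: less training data);
    # spread across folds tends to be higher for large k (tiny held-out sets)
    print(f"k={k:2d}  mean fold MSE={fold_mse.mean():8.1f}  "
          f"sd across folds={fold_mse.std():8.1f}")
```
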
3
Q

decomposition of MSE in model fitting and error analysis

A
  • the MSE, (1/n) * sum over test samples of (y_i - yhat_i)^2, has an expected value that decomposes into components (see the simulation sketch below)
  • E(MSE) = sigma^2 + (model bias)^2 + model variance
    • sigma^2: the variance of the inherent, irreducible noise in the data
    • bias: the misfit between the model in question and the ideal model's fitting surface
    • variance: the training-to-training variation in the fit as the model is trained on different samples from the population
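
A quick simulation of the decomposition at a single test point, under assumptions chosen purely for illustration: a deliberately biased model (a straight line fit to a sine curve) is refit on many fresh training samples, and sigma^2 + bias^2 + variance is compared to the empirical expected squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Hypothetical true function."""
    return np.sin(3 * x)

sigma = 0.3   # sd of the irreducible noise
x0 = 0.6      # test point at which we decompose E(MSE)

preds = []
for _ in range(2000):
    # draw a fresh training sample and fit a simple (biased) model: a straight line
    x = rng.uniform(0, 1, 50)
    y = f(x) + rng.normal(0, sigma, 50)
    b1, b0 = np.polyfit(x, y, deg=1)   # slope, intercept
    preds.append(b0 + b1 * x0)
preds = np.array(preds)

bias2 = (preds.mean() - f(x0)) ** 2        # (model bias)^2 at x0
variance = preds.var()                     # model variance at x0
y0 = f(x0) + rng.normal(0, sigma, 2000)    # fresh noisy observations at x0
mse = ((y0 - preds) ** 2).mean()           # empirical E(MSE) at x0

print(f"sigma^2 + bias^2 + variance = {sigma**2 + bias2 + variance:.4f}")
print(f"empirical E(MSE)            = {mse:.4f}")
```
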
4
Q

Kappa (Cohen’s)

A
  • for classification models, a measure of accuracy; rooted in inter-rater agreement (eg comparing 2 raters' labels, or a model's predictions against the true labels)
  • considers the confusion matrix
  • kappa = 1 - (1 - p_o) / (1 - p_e), equivalently (p_o - p_e) / (1 - p_e)
    • p_o is the observed agreement
    • p_e is the agreement expected by chance
  • for a binary confusion matrix
    • p_o is the proportion of samples on the diagonal
    • p_e is the sum of the products of the row/column marginal proportions corresponding to the diagonal entries (a sum of 2 products); see the sketch below
  • Fleiss' kappa extends this to more than 2 raters
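
A minimal sketch of the computation from a confusion matrix (the function and the matrix values are hypothetical):

```python
import numpy as np

def cohens_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows: rater/model 1, cols: rater/model 2)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                           # observed agreement: diagonal proportion
    # chance agreement: sum of products of matching row/column marginal proportions
    p_e = (cm.sum(axis=1) / n) @ (cm.sum(axis=0) / n)
    return 1 - (1 - p_o) / (1 - p_e)

# hypothetical binary confusion matrix: p_o = 0.85, p_e = 0.5, kappa = 0.7
print(round(cohens_kappa([[45, 5],
                          [10, 40]]), 3))
```
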
5
Q

log loss

A
  • aka logarithmic loss or cross-entropy loss
  • eg for a binary classifier
    • form the likelihood: how likely did the model think the actual training observations were?
    • ie for classes {0,1}, with p_j the model's predicted probability that instance j is class 1: use (1 - p_j) if instance j is labeled 0 and p_j if it is labeled 1, then take the product over all instances
  • the negative of the log of this product is the log loss (usually averaged over the instances); see the sketch below
  • penalizes "confidently wrong" answers more heavily than eg the Brier score
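
A minimal hand-rolled sketch for the binary case, with hypothetical labels and predicted probabilities; note how the single confidently wrong prediction dominates the loss:

```python
import numpy as np

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Negative mean log-likelihood for a binary classifier.

    p_pred[j] is the model's predicted probability that instance j is class 1.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    # likelihood contribution: p_j if labeled 1, (1 - p_j) if labeled 0
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# the last prediction (0.99 for an instance labeled 0) contributes most of the loss
print(round(binary_log_loss([1, 0, 1, 0], [0.9, 0.2, 0.8, 0.99]), 3))
```
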
6
Q

Brier score

A
  • for probabilistic classification models, a measure of accuracy
  • the k classes are coded as k-tuples, eg (0,1,0) for a training instance of class B out of {A, B, C}
  • then take the MSE between these coded class labels and the model's predicted class probabilities: sum the squared differences over the classes within each instance, then average over all instances (see the sketch below)
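
A minimal sketch with hypothetical one-hot class labels and predicted probabilities:

```python
import numpy as np

def brier_score(y_true_onehot, p_pred):
    """Multiclass Brier score: mean over instances of the summed squared
    differences between the one-hot coding and the predicted probabilities."""
    y = np.asarray(y_true_onehot, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return np.mean(np.sum((y - p) ** 2, axis=1))

# hypothetical 3-class example: one instance of class B, one of class A (of A, B, C)
y = [[0, 1, 0],
     [1, 0, 0]]
p = [[0.2, 0.7, 0.1],
     [0.6, 0.3, 0.1]]
print(round(brier_score(y, p), 3))  # 0.2
```
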
7
Q

odds ratio

A
  • can be used to assess the informativeness of binary predictors for binary classification; literally a ratio of odds (ie p/(1-p))
  • compute the probability of the "event" (the binary outcome being "positive", via the instance class labels) at each of the two predictor levels; then (see the sketch after this list)
    • OR = odds of the "event" at predictor level A / odds of the "event" at predictor level B = [p1/(1-p1)] / [p2/(1-p2)] = p1(1-p2) / (p2(1-p1))
    • represents the factor by which the odds of the "event" at the first predictor level (ie p1) exceed those at the second level (ie p2)
  • can also be used for control/treatment group comparisons; because the odds ratio does not depend on the class priors of the sample dataset, it can be estimated from eg a case-control sample and still applied to the general population without adjustment (Kaplan)
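
A minimal sketch from a 2x2 table of hypothetical counts (rows are the two predictor levels, columns the outcome):

```python
import numpy as np

def odds_ratio(table):
    """Odds ratio from a 2x2 table:
    rows = predictor levels (A, B), cols = outcome (positive, negative)."""
    (a, b), (c, d) = np.asarray(table, dtype=float)
    p1, p2 = a / (a + b), c / (c + d)             # P(positive) at each level
    return (p1 / (1 - p1)) / (p2 / (1 - p2))      # equals (a*d) / (b*c)

# hypothetical counts: level A has 30/100 positives, level B has 10/100
print(round(odds_ratio([[30, 70],
                        [10, 90]]), 2))  # 3.86
```
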
8
Q

global min/max search routines

A
  • eg having built a predictive model, we want to find the point (predictor values) corresponding to the maximum/minimum predicted outcome (see the sketch after this list)
  • methods:
    • Nelder-Mead simplex method
    • simulated annealing
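
A sketch using SciPy's optimizers on a made-up stand-in for a fitted model's prediction surface: Nelder-Mead for a local simplex search, and dual_annealing as a simulated-annealing-style global search over a bounded region (kept inside the training region to avoid extrapolating).

```python
import numpy as np
from scipy.optimize import minimize, dual_annealing

def predict(x):
    """Stand-in for a fitted model's prediction over 2 predictors."""
    return np.sin(3 * x[0]) * np.cos(2 * x[1]) + 0.1 * (x[0] ** 2 + x[1] ** 2)

bounds = [(-2, 2), (-2, 2)]   # search only inside the (assumed) training region

# local simplex search from a starting point (can get stuck in a local minimum)
local = minimize(predict, x0=[0.5, 0.5], method="Nelder-Mead")

# simulated-annealing-style global search over the bounded region
best = dual_annealing(predict, bounds=bounds, seed=0)

# to find a maximum instead, minimize the negative of predict
print("Nelder-Mead:    x =", np.round(local.x, 3), " f =", round(local.fun, 3))
print("dual_annealing: x =", np.round(best.x, 3), " f =", round(best.fun, 3))
```
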