model analysis Flashcards
1
Q
reasons predictive models fail
A
PVEOQ
- inadequate preprocessing of the data
- inadequate model validation (eg too few cross-validation resamples to get stable statistics)
- unjustified extrapolation (eg the model tries to predict a point outside its training region in predictor space)
- overfitting
- not considering enough models
2
Q
variance / bias tradeoff (and underfit / overfit)
A
- bias–how far our model’s fit is from the theoretical optimum (ie the best achievable fit, limited only by irreducible noise)
- variance–over an ensemble of full resampling and training runs, how much the models’ fits vary
- overfitting–can produce naturally low bias, but the model is overtrained on specific data and might not do well with “unseen data”–the variance between runs will be high
- underfitting–not very accurate (high bias), but the model might perform about the same on unseen data (eg a straight line fit to an amorphous point cloud)
- eg in cross-validation
- very many folds can reduce bias (training on the k-1 folds, ie most of the data, gets close to the best model we could expect given the data), but can show high variance between resamples (the small held-out fold could, eg, contain a lot of outliers)
- very few folds can produce a poor fit (high bias), but low variance (performance is about the same across resamples); see the sketch below
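- a minimal sketch (Python/sklearn; the dataset, model, and fold counts are arbitrary assumptions) of measuring this spread empirically:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()

for k in (2, 20):
    # repeat the k-fold split under different shuffles and watch the
    # spread of the mean CV score across resamples
    means = []
    for seed in range(50):
        cv = KFold(n_splits=k, shuffle=True, random_state=seed)
        scores = cross_val_score(model, X, y, cv=cv,
                                 scoring="neg_mean_squared_error")
        means.append(scores.mean())
    print(f"k={k}: mean={np.mean(means):.1f}, sd across resamples={np.std(means):.2f}")

- the exact pattern depends on the data and model; treat the fold-count effects as tendencies, not laws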
3
Q
decomposition of MSE in model fitting and error analysis
A
- the MSE formula, MSE = (1/n) * sum_{test samples} (y_i - yhat_i)^2, can have its mean decomposed into components
- E(MSE) = sigma^2 + (model bias)^2 + model variance
- sigma^2–the inherent, irreducible noise variance in the data
- bias–the misfit between the model in question and the ideal model’s fitting surface
- variance–the variation in the fitted model across trainings on different sample data from the population (see the simulation sketch below)
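- a minimal simulation sketch (Python; the true function, noise level, and deliberately simple model are made-up assumptions) of this decomposition at a single test point x0:

import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(x)  # assumed true function
sigma = 0.3              # assumed irreducible noise sd
x0 = 1.0                 # test point at which to decompose

preds = []
for _ in range(2000):    # resample a training set, refit, predict at x0
    x = rng.uniform(0, 3, 30)
    y = f(x) + rng.normal(0, sigma, 30)
    coefs = np.polyfit(x, y, deg=1)  # deliberately simple (biased) model
    preds.append(np.polyval(coefs, x0))

preds = np.array(preds)
bias2 = (preds.mean() - f(x0)) ** 2  # squared model bias at x0
var = preds.var()                    # model variance at x0
print(f"sigma^2={sigma**2:.3f}  bias^2={bias2:.3f}  variance={var:.3f}")
print(f"expected MSE at x0 = {sigma**2 + bias2 + var:.3f}")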
4
Q
Kappa (Cohen’s)
A
- for classification models, a chance-corrected measure of accuracy; rooted in comparing eg 2 parties’ (raters’) labelings
- considers the confusion matrix
- kappa = 1 - (1 - p_o) / (1 - p_e), equivalently (p_o - p_e) / (1 - p_e)
- p_o is observed agreement
- p_e is expected agreement
- for binary confusion matrix
- p_o is the proportion of samples on the diagonal
- p_e is computed as the sum, over the diagonal entries, of (row marginal proportion x column marginal proportion)–a sum of 2 products in the binary case; see the sketch below
- Fleiss’ kappa extends this to > 2 raters
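- a minimal sketch (Python; the 2x2 counts are made up for illustration) of kappa from a binary confusion matrix:

import numpy as np

cm = np.array([[45, 5],    # rows: first rater's (or model's) labels
               [10, 40]])  # cols: second rater's (or true) labels
n = cm.sum()

p_o = np.trace(cm) / n             # observed agreement: diagonal proportion
row_marg = cm.sum(axis=1) / n      # row marginal proportions
col_marg = cm.sum(axis=0) / n      # column marginal proportions
p_e = np.sum(row_marg * col_marg)  # expected agreement: sum of marginal products

kappa = 1 - (1 - p_o) / (1 - p_e)  # equivalently (p_o - p_e) / (1 - p_e)
print(f"p_o={p_o:.3f}, p_e={p_e:.3f}, kappa={kappa:.3f}")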
5
Q
log loss
A
- aka logarithmic loss or cross-entropy loss
- eg for binary classifier
- form the likelihood–how likely did the model think the actual training observations were?
- ie for classes {0,1}, with p_j the model’s predicted probability that instance j is class 1: if instance j is labeled 0, use (1-p_j), and if labeled 1, use p_j–then take the product over all instances
- the negative of the log of this result is the log loss (commonly averaged over the n instances)
- penalizes “confidently wrong” answers more heavily than eg the Brier score (see the sketch below)
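- a minimal sketch (Python; the labels and probabilities are made-up illustration data) of binary log loss:

import numpy as np

y = np.array([1, 0, 1, 1, 0])             # true labels in {0, 1}
p = np.array([0.9, 0.2, 0.6, 0.95, 0.1])  # model's predicted P(class = 1)

eps = 1e-15                  # clip to avoid log(0) on hard 0/1 predictions
p = np.clip(p, eps, 1 - eps)

# per-instance likelihood: p_j if labeled 1, (1 - p_j) if labeled 0
likelihoods = np.where(y == 1, p, 1 - p)
log_loss = -np.mean(np.log(likelihoods))  # negative mean log-likelihood
print(f"log loss = {log_loss:.4f}")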
6
Q
Brier score
A
- for probabilistic classification models, a measure of accuracy
- k classes are coded as k-tuples, eg (0,1,0) for a training instance of class B out of classes A, B, C
- then take the MSE between these coded class labels and the model’s predicted class probabilities–sum the squared differences over the k components for each instance, then average over all instances (see the sketch below)
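- a minimal sketch (Python; the three-class labels and probabilities are made up for illustration) of the multiclass Brier score:

import numpy as np

# one-hot labels for classes A, B, C: eg (0, 1, 0) codes class B
y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
# model's predicted class probabilities; rows sum to 1
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])

# squared error summed over the k class components, then averaged over instances
brier = np.mean(np.sum((y - p) ** 2, axis=1))
print(f"Brier score = {brier:.4f}")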
7
Q
odds ratio
A
- can be used to assess informativeness of binary predictors in the case of binary classification; literally a ratio of odds (ie p/(1-p))
- compute the “probability” of an event (say, the binary outcome variable being “positive”) from the instance class labels, at each of the two levels of the (binary) predictor–then,
- OR = odds of “positive” at predictor level A / odds of “positive” at predictor level B = (p1/(1-p1)) / (p2/(1-p2)) = p1(1-p2) / (p2(1-p1))
- represents the factor by which the odds of the “event” change when going from the second level of the predictor (ie related to p2) to the first level (ie related to p1); see the sketch below
- can also be used for control/treatment group comparisons, where it effectively removes the class priors of the sample dataset, so it can be applied to the general population without adjustment (Kaplan)
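- a minimal sketch (Python; the 2x2 counts are made up for illustration) of the odds ratio:

import numpy as np

#                  event+  event-
table = np.array([[30,     70],    # predictor level A
                  [10,     90]])   # predictor level B

p1 = table[0, 0] / table[0].sum()  # P(event | level A)
p2 = table[1, 0] / table[1].sum()  # P(event | level B)

odds_ratio = (p1 * (1 - p2)) / (p2 * (1 - p1))
print(f"p1={p1:.2f}, p2={p2:.2f}, OR={odds_ratio:.2f}")
# OR > 1 means the odds of the event at level A are OR times those at level B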
8
Q
global min/max search routines
A
- eg having created a predictive model, we now want to find the point (predictor values) corresponding to the max/min predicted outcome value (see the sketch after this list)
- methods:
- Nelder-Mead simplex method (derivative-free; local search)
- simulated annealing (stochastic; aims at global optima)
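- a minimal sketch (Python/scipy; the quadratic “model” is a stand-in assumption, not a real fitted model) of searching a prediction surface:

import numpy as np
from scipy.optimize import minimize

def predicted_outcome(x):
    # stand-in for model.predict([x]); true minimum at (2, -1)
    return (x[0] - 2) ** 2 + (x[1] + 1) ** 2 + 3.0

result = minimize(predicted_outcome, x0=np.zeros(2), method="Nelder-Mead")
print(result.x, result.fun)  # ~[2, -1], ~3.0

- to find a maximum, minimize the negated prediction; for rugged surfaces, scipy.optimize.dual_annealing offers a simulated-annealing-style global search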