model evaluation Flashcards
What is the requirement for a goodness metric for backpropagation to work?
has to be differentiable
How do you define the goodness of your model?
Ability to generalize to unseen data
What can affect generalizability?
- algorithm
- hyperparameter values
- training data
- random initialization (can affect accuracy)
What are 4 methods of quantifying generalization goodness?
1) accuracy, precision, recall
2) mean absolute error
3) RMSE
4) area under ROC curve
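A minimal sketch of computing all four kinds of metrics with scikit-learn; the arrays here are toy, illustrative values:

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             mean_absolute_error, mean_squared_error,
                             roc_auc_score)

# Toy classification labels and hard predictions (illustrative values)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))

# Toy regression targets and predictions
r_true = np.array([3.2, 2.8, 3.9, 2.1])
r_pred = np.array([3.0, 3.1, 3.5, 2.4])
print("MAE :", mean_absolute_error(r_true, r_pred))
print("RMSE:", np.sqrt(mean_squared_error(r_true, r_pred)))

# AUC needs probabilistic scores, not hard labels
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3])
print("AUC :", roc_auc_score(y_true, y_score))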
Come up with a metric for predicting lung cancer from chest x-rays
Recall, precision, and the false negative rate (FN/P), since missing a true cancer case is the costliest error
Come up with a metric for predicting high school GPA
MAE
Come up with a metric for evaluating search engine results
recall
Come up with a metric for predicting the location of an object in 3D space
Euclidean or cosine distance
Come up with a metric for predicting if a Twitter user is liberal or conservative
AUC
What is sensitivity?
The true positive rate; also called recall
What is specificity?
True negative rate
What is precision?
A measure of exactness: the percentage of tuples labeled as positive that actually are positive
What is the F-measure?
Harmonic mean of precision and recall, giving equal weight to each
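A small sketch computing these four definitions directly from confusion-matrix counts; the TP/FP/TN/FN values are made up for illustration:

# Confusion-matrix counts (illustrative values)
TP, FP, TN, FN = 40, 10, 45, 5

sensitivity = TP / (TP + FN)   # true positive rate = recall
specificity = TN / (TN + FP)   # true negative rate
precision   = TP / (TP + FP)   # fraction of predicted positives that are actually positive
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
print(sensitivity, specificity, precision, f_measure)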
What metrics can you use for regression?
MAE, RMSE
Why do we take the square of the errors in a metric?
Squaring the errors produces steeper gradients during the search and gives larger penalties to large errors, so RMSE accentuates the impact of outliers. Without it, the search might take too long to converge.
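A quick worked comparison in Python (toy error values) showing how a single outlier moves RMSE far more than MAE:

import numpy as np

uniform = np.array([1.0, 1.0, 1.0, 1.0])  # all errors equal
outlier = np.array([1.0, 1.0, 1.0, 9.0])  # one large outlier
for e in (uniform, outlier):
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")
# MAE goes 1.00 -> 3.00, but RMSE jumps 1.00 -> 4.58: squaring accentuates the outlier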
What is a condition of the AUC?
It is a binary prediction metric: it only applies when the label is binary, and the predictions have to be probabilistic (scores rather than hard labels).
What is the AUC?
The probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative example
What does an AUC of 0.5 mean?
AUC = 0.50 means no better than random chance
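A sketch of this probabilistic reading of AUC: compare every (positive, negative) pair of scores and count how often the positive outranks the negative (scores here are illustrative; ties count as half):

import numpy as np
from sklearn.metrics import roc_auc_score

y_true  = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.6, 0.3, 0.2])  # predicted probabilities

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
# Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
print("pairwise AUC:", np.mean(pairs))                   # 8/9, about 0.889
print("sklearn AUC :", roc_auc_score(y_true, y_score))   # same value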
What is K-fold CV?
The tuples are randomly partitioned into K mutually exclusive subsets (folds) of approximately equal size
What is a standard value for K?
5 or 10
What is the algorithm for K-fold CV?
1) Choose a K and randomly assign each row to a single fold (numbered 1 to K)
- This fold assignment indicates the fold in which that row will serve as part of the test set
2) Conduct K phases of training/testing: in phase i, fold i serves as the test set and the remaining folds serve as the training set
3) Calculate the error metric on the whole dataset (the concatenation of the held-out predictions from each fold)
Do you calculate an error metric for each fold in K-fold CV?
No, it is still only calculated once, over the concatenated held-out predictions from all K folds (see the sketch below)
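A minimal K-fold sketch following the steps above, using scikit-learn and a toy synthetic dataset: each fold's held-out predictions are collected, then the metric is computed once over the concatenation:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)  # toy data

kf = KFold(n_splits=5, shuffle=True, random_state=0)  # K = 5, random fold assignment
y_pred = np.empty_like(y)
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])          # other folds = training set
    y_pred[test_idx] = model.predict(X[test_idx])  # this fold = test set

# One metric over the whole dataset (concatenated held-out predictions)
print("CV accuracy:", accuracy_score(y, y_pred))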
What is CV used for?
Model selection
When would CV NOT give you better results than handpicking your train and test data?
With temporal data, where randomly shuffled folds would let the model train on the future and be tested on the past
What is the ROC curve?
It shows the tradeoff between the true positive rate (TPR) and the false positive rate (FPR) as the classification threshold varies: it lets us visualize how often the model correctly identifies positive cases against how often it mistakenly identifies negative cases as positive.
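A sketch of drawing an ROC curve from probabilistic scores with scikit-learn and matplotlib; labels and scores here are illustrative:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true  = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.5]  # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # TPR/FPR at each score threshold
plt.plot(fpr, tpr, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance (AUC = 0.5)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()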