Model Evaluation Flashcards
Formula for Accuracy
(True Positives + True Negatives) / Total Number of Items
Formula for Precision (p)
p = TP / (TP + FP)
Formula for Recall (r)
r = TP / (TP + FN)
Formula for F-Measure
Precision = p
Recall = r
fm = 2rp / (r + p)
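The four formulas above can be sketched in pure Python. The confusion-matrix counts below are made up for illustration.

```python
# Hypothetical confusion-matrix counts for a binary classifier
# (the numbers are invented for this example).
tp, tn, fp, fn = 40, 45, 5, 10

total = tp + tn + fp + fn
accuracy = (tp + tn) / total                               # (TP + TN) / number of items
precision = tp / (tp + fp)                                 # p = TP / (TP + FP)
recall = tp / (tp + fn)                                    # r = TP / (TP + FN)
f_measure = 2 * recall * precision / (recall + precision)  # fm = 2rp / (r + p)
```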
compares the accuracy of the classifier with that of a random classifier.
Kappa Statistics
was developed in the 1950s for signal detection theory.
Works only for binary classification.
Receiver Operating Characteristic (ROC)
to estimate the performance of a classifier on previously unseen data.
Purpose of Model Evaluation
reserve k% of the data for training and (100 - k)% for testing.
Holdout
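The holdout method can be sketched as a single random split; the function name and the 70/30 split are illustrative choices, not part of the definition.

```python
import random

def holdout_split(data, k=70, seed=0):
    """Reserve k% of the data for training and (100 - k)% for testing."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * k // 100
    return shuffled[:cut], shuffled[cut:]

train, test = holdout_split(list(range(10)), k=70)
```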
partition the data into k disjoint subsets.
Cross Validation
train on k - 1 partitions, test on the remaining one; repeat so each partition serves once as the test set.
K- Fold
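A minimal k-fold sketch: the data is cut into k disjoint folds, and each fold takes a turn as the test set while the other k - 1 folds form the training set. The round-robin fold assignment is one simple choice among many.

```python
def k_fold_splits(data, k=5):
    """Yield (train, test) pairs: test is one fold, train is the other k-1."""
    folds = [data[i::k] for i in range(k)]   # k disjoint subsets
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

splits = list(k_fold_splits(list(range(10)), k=5))
```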
shows how accuracy on unseen examples changes with varying training sample size.
Learning Curve
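A learning curve can be sketched by training on progressively larger samples and recording test accuracy each time. The 1-nearest-neighbor classifier and the toy 1-D dataset below are invented purely for illustration.

```python
def predict_1nn(train, x):
    # train: list of (value, label) pairs; return label of the nearest value
    return min(train, key=lambda p: abs(p[0] - x))[1]

# toy 1-D dataset: label is 1 when the value is >= 5 (made up for this sketch)
data = [(v, int(v >= 5)) for v in range(20)]
test_set = data[1::2]                     # held-out examples

curve = []
for n in (2, 6, 10):                      # increasing training-set sizes
    train_set = data[0::2][:n]
    acc = sum(predict_1nn(train_set, x) == y for x, y in test_set) / len(test_set)
    curve.append((n, acc))
```

Plotting `curve` (size vs. accuracy) gives the learning curve; accuracy typically rises as the training sample grows.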
settings of a learning algorithm that must be chosen before training rather than learned from the data.
Hyperparameters
3 STEPS IN TRAINING THE MODEL
- Train
- Model Selection
- Test
learn models on the training data using different hyperparameters.
Train
evaluate the models using the validation data and choose the hyperparameters with the best accuracy.
Model Selection
test the final model using the test data.
Test
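The three steps above can be sketched end to end with a toy k-NN classifier, where the number of neighbors k is the hyperparameter. The dataset, split, and candidate values of k are all invented for illustration.

```python
from collections import Counter

def knn_predict(train, x, k):
    # "training" a k-NN model is just storing the data; k is the hyperparameter
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# toy 1-D data: label is 1 when the value is >= 10 (made up for this sketch)
data = [(v, int(v >= 10)) for v in range(30)]
train_set, val_set, test_set = data[0::3], data[1::3], data[2::3]

def accuracy(k, dataset):
    return sum(knn_predict(train_set, x, k) == y for x, y in dataset) / len(dataset)

# Steps 1-2 (Train + Model Selection): fit with each candidate hyperparameter,
# keep the one with the best validation accuracy
best_k = max((1, 3, 5), key=lambda k: accuracy(k, val_set))
# Step 3 (Test): report the chosen model's performance on the test data
test_acc = accuracy(best_k, test_set)
```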
3 TYPES OF CLASSIFICATIONS ERRORS
- Training Errors
- Test Errors
- Generalization Errors
errors committed on the training set.
Training Errors
errors committed on the test set.
Test Errors
expected error of a model over a random selection of records drawn from the same distribution.
Generalization Errors
when a model is too simple; both training and test errors are large.
Underfitting
when a model is too complex; training error is small but test error is large.
Overfitting
2 REASONS FOR OVERFITTING
- Not enough training data
- High model complexity
2 MODEL SELECTION FOR DECISION TREE
- Pre-Pruning (Early Stopping Rule)
- Post-Pruning
stops the tree-growing algorithm before the tree is fully grown.
Pre-Pruning (Early Stopping Rule)
grow the decision tree to its entirety, then trim it bottom-up.
Post-Pruning
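A minimal post-pruning sketch: the fully grown tree is walked bottom-up, and each subtree is collapsed to its majority-class leaf whenever that does not increase error on validation data. The nested-dict tree representation and the tiny validation set are invented for this example.

```python
# A decision tree as nested dicts: internal nodes split a 1-D value on a
# threshold, leaves hold a class label.
def predict(node, x):
    if not isinstance(node, dict):
        return node                      # leaf: return its label
    branch = "left" if x < node["thr"] else "right"
    return predict(node[branch], x)

def error(node, data):
    return sum(predict(node, x) != y for x, y in data)

def post_prune(node, data):
    """Bottom-up: replace a subtree with its majority-class leaf unless
    that increases error on the validation data."""
    if not isinstance(node, dict):
        return node
    node["left"] = post_prune(node["left"], [(x, y) for x, y in data if x < node["thr"]])
    node["right"] = post_prune(node["right"], [(x, y) for x, y in data if x >= node["thr"]])
    labels = [y for _, y in data]
    majority = max(set(labels), key=labels.count) if labels else 0
    if error(majority, data) <= error(node, data):
        return majority                  # prune: collapse subtree to a leaf
    return node

# fully grown tree with a redundant split, plus a small validation set
tree = {"thr": 10, "left": {"thr": 3, "left": 0, "right": 0}, "right": 1}
val = [(1, 0), (5, 0), (12, 1), (20, 1)]
pruned = post_prune(tree, val)           # the redundant left subtree collapses
```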