Model performance and evaluation Flashcards

1
Q

Why is it important for statistical models to be generalized and not overfitted?

A

The main purpose of a statistical model is not just to explain observed data; we also want to make forecasts, i.e. predict future, not-yet-observed events.

Models need to generalize so that they can be applied to unseen (future) data. If we train and evaluate on all available data, nothing is left to reveal overfitting: a flexible model can simply “memorize” the training data and fit it perfectly, yet predict new data poorly.
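To make the “memorize” point concrete, here is a minimal sketch using only the Python standard library. The data (a noisy line y = 2x + 1), the function names and the two models are hypothetical choices for illustration: a least-squares line versus a “memorizer” that returns the y of the nearest training point (1-nearest-neighbour). The memorizer fits the training data perfectly, but its error on held-out data is worse.

```python
import random

def make_data(n, rng, noise=1.0):
    """Hypothetical data: y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, noise) for x in xs]
    return xs, ys

def fit_linear(xs, ys):
    """Ordinary least-squares straight line; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Stores every training pair and predicts the y of the nearest stored x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x, train_y = make_data(500, rng)
test_x, test_y = make_data(500, rng)  # held out: never seen while fitting

results = {}
for name, fit in [("linear", fit_linear), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:9s} train MSE={results[name][0]:.3f}  test MSE={results[name][1]:.3f}")
```

The memorizer's training error is exactly zero because every training point is its own nearest neighbour; only the held-out set exposes that it generalizes worse than the simpler line.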

2
Q

What is cross-validation?

A

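Cross-validation is the partitioning of data into multiple subsets (folds) so that the model can be trained and tested on different splits. In k-fold cross-validation, each iteration trains on k-1 folds and tests on the remaining one.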

Each iteration returns one model and one estimate of generalization performance. At the end we have k models and k estimates, which is more reliable than arguing from a single split.

To avoid overfitting, the test data should always be kept strictly independent of model building.
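A minimal k-fold sketch in plain Python, assuming a shuffled, round-robin assignment of indices to folds. `k_fold_splits`, `fit_line`, `mse` and `cross_validate` are hypothetical helpers written for this card, and the noisy linear data is made up. Each fold's test indices are never passed to the fitting function, so every one of the k scores comes from data independent of that model's building.

```python
import random
import statistics

def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) for k-fold CV; each index is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)        # shuffle once so folds do not follow data order
    folds = [idx[i::k] for i in range(k)]   # k disjoint, roughly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def fit_line(xs, ys):
    """Least-squares straight line; stands in for any model-fitting routine."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def cross_validate(xs, ys, fit, score, k=5):
    """Return one held-out score per fold (k scores, one per trained model)."""
    scores = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])  # test fold unused here
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return scores

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]  # hypothetical noisy linear data
scores = cross_validate(xs, ys, fit_line, mse, k=5)
print([round(s, 3) for s in scores], "mean:", round(statistics.fmean(scores), 3))
```

The spread of the k scores gives a rough sense of how much the performance estimate depends on which split was used.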

3
Q

What is a confusion matrix?

A

As soon as we run a model, it makes mistakes. How can we investigate these mistakes?
For classification problems, one simple way is the confusion matrix.

  • A confusion matrix is an n x n contingency table
  • n is the number of classes in our classification problem
  • It separates the different error types (e.g. false positives vs. false negatives), which must be accounted for differently
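A small sketch of building the n x n table by counting (actual, predicted) pairs. It assumes rows hold the actual class and columns the predicted class; that orientation is a common convention but not universal. The labels and predictions below are made up for illustration. Correct predictions land on the diagonal, and in the two-class case the off-diagonal cells are the two distinct error types.

```python
def confusion_matrix(y_true, y_pred, labels):
    """n x n counts: rows = actual class, columns = predicted class (assumed convention)."""
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Multi-class example (n = 3) with hypothetical labels.
labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(f"{label:5s}", row)
# cat   [1, 1, 0]
# dog   [1, 2, 0]
# bird  [0, 0, 1]

# Binary example: the two off-diagonal cells are different kinds of error.
b = confusion_matrix(["pos", "pos", "pos", "neg", "neg"],
                     ["pos", "pos", "neg", "pos", "neg"],
                     ["neg", "pos"])
tn, fp = b[0]  # actual negatives: correctly rejected, falsely flagged
fn, tp = b[1]  # actual positives: missed, correctly detected
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=1 FP=1 FN=1 TP=2
```

Keeping FP and FN as separate counts, rather than a single error rate, is what allows them to be weighted differently when their costs differ.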