Machine Learning Flashcards
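How do you train and test your model?
Split the data into 2/3 for training and 1/3 for testing.
What should you do if you want to choose the best of several models?
Split the data into training, validation and test sets: train every candidate model on the training set, pick the one that performs best on the validation set, and measure that model's final performance once on the test set.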
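A minimal sketch of both splits in plain Python. The function names, the fixed random seed and the default 60/20/20 train/validation/test fractions are illustrative assumptions, not a standard. Each split shuffles first, so that any ordering in the raw data does not leak into one of the sets.

```python
import random

def train_test_split(data, test_fraction=1/3, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)  # fixed seed: hypothetical, for reproducibility
    cut = round(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def train_val_test_split(data, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, validation, test)."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_test = round(n * test_fraction)
    n_val = round(n * val_fraction)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]  # everything left over is for training
    return train, val, test

train, test = train_test_split(range(30))
print(len(train), len(test))  # 20 10
```

To choose between models, fit each one on `train`, compare them on `val`, and touch `test` only once for the winner; otherwise the test score stops being an honest estimate.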
What is a confusion matrix?
For a binary classifier, a 2×2 table that counts the model's predictions: the rows are the predicted classes and the columns are the actual classes (e.g. spam or not spam).
4 possible outcomes:
- True positive: “This message is spam, and we correctly predicted spam.”
- False positive (Type 1 Error): “This message is not spam, but we predicted spam.”
- False negative (Type 2 Error): “This message is spam, but we predicted not spam.”
- True negative: “This message is not spam, and we correctly predicted not spam.”
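The four outcomes can be counted directly from lists of actual and predicted labels. This is a small sketch; the function name and the toy labels are made up for illustration.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="spam"):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {k: counts[k] for k in ("TP", "FP", "FN", "TN")}

# Toy labels, not real data.
actual    = ["spam", "spam",     "not spam", "not spam", "spam", "not spam"]
predicted = ["spam", "not spam", "spam",     "not spam", "spam", "not spam"]
print(confusion_counts(actual, predicted))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```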
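Why is accuracy not a good indicator of a good model, and what should you use instead?
On imbalanced data, accuracy is misleading: if only 1% of messages are spam, a model that always predicts “not spam” is 99% accurate yet never catches any spam. Instead, it's common to look at the combination of precision and recall. Precision measures how accurate our positive predictions were: TP / (TP + FP). Recall measures what fraction of the actual positives our model identified: TP / (TP + FN). They are combined into something called the F1 score, the harmonic mean of the two: 2 · precision · recall / (precision + recall).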
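The metrics can be written as small functions of the confusion-matrix counts. This is a sketch; returning 0.0 when a denominator is zero is a convention chosen here, not the only option. The toy counts reproduce the imbalanced-data case: 1,000 messages, 10 of them spam, and a model that always predicts “not spam”.

```python
def precision(tp, fp):
    """Fraction of positive predictions that were correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Fraction of actual positives the model found."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Always predicting "not spam": TP=0, FP=0, FN=10, TN=990.
print(accuracy(0, 0, 10, 990))  # 0.99
print(recall(0, 10))            # 0.0
print(f1_score(0, 0, 10))       # 0.0
```

The high accuracy hides the fact that no spam is caught; recall and F1 expose it immediately.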
Usually the choice of a model involves a trade-off between precision and recall. A model that predicts “yes” when it's even a little bit confident will probably have a high recall but a low precision; a model that predicts “yes” only when it's extremely confident is likely to have a low recall and a high precision.
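One way to see the trade-off is to sweep the confidence threshold above which the model says “yes”. The scores and labels below are hypothetical, and `precision_recall_at` is a helper written for this sketch.

```python
def precision_recall_at(scores, labels, threshold):
    """Predict positive when score >= threshold; return (precision, recall)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

# Hypothetical confidences that each message is spam (True = actually spam).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [True, True, False, True, False, True, False, False]

for t in (0.25, 0.5, 0.85):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.25: precision=0.57 recall=1.00
# threshold=0.5: precision=0.60 recall=0.75
# threshold=0.85: precision=1.00 recall=0.50
```

A low threshold (“yes” when a little bit confident) finds every spam message but lets through false alarms; a high threshold is always right when it says “yes” but misses half the spam.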
What is a feature in machine learning?
Every input variable you feed into the model. For example, if you want to predict a salary based on years of experience, the years of experience are the only feature you have. You can use as many features as you like, but keep in mind that with too many you risk overfitting the model, and with too few you risk underfitting it.
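To make the salary example concrete, here is a sketch that fits a straight line, salary = a + b · years, by ordinary least squares with years of experience as the single feature. The choice of linear regression and the data points are assumptions for illustration; the flashcard does not name a model.

```python
def fit_one_feature(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var              # slope: salary increase per year of experience
    a = mean_y - b * mean_x    # intercept: predicted salary at zero years
    return a, b

# Hypothetical data: years of experience (the only feature) and salary.
years = [1, 2, 3, 4, 5]
salary = [40_000, 45_000, 50_000, 55_000, 60_000]
a, b = fit_one_feature(years, salary)
print(a, b)        # 35000.0 5000.0
print(a + b * 6)   # predicted salary for 6 years: 65000.0
```

Adding more features (e.g. education level) would turn each input into a vector and the slope into one weight per feature.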