7 Flashcards
If you have trained 5 different models that each achieve 95% precision, how can you improve on them?
Combine them into a voting ensemble: if the models are diverse (different algorithms or different training data), their errors tend to be uncorrelated, so the ensemble often outperforms any single model.
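A minimal sketch of a voting ensemble, assuming scikit-learn is available; the five classifiers here are just stand-ins for any five diverse, well-performing models.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Five diverse models combined by majority vote.
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",  # each model casts one vote per instance
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```

The ensemble's test accuracy is typically at least as good as its best individual member, provided the members make different kinds of errors.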
What's the difference between hard and soft voting classifiers?
Hard voting counts each classifier's predicted class and picks the most common one.
Soft voting averages the predicted class probabilities and picks the class with the highest average; it usually performs better, but it requires every classifier to estimate probabilities.
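A toy illustration (NumPy only, made-up probabilities) of how the two schemes can disagree: two classifiers barely favor class 1, one very confidently favors class 0.

```python
import numpy as np

# Each row: one classifier's [P(class 0), P(class 1)] for a single instance.
probas = np.array([
    [0.45, 0.55],   # votes for class 1 (barely)
    [0.40, 0.60],   # votes for class 1 (barely)
    [0.95, 0.05],   # votes for class 0 (very confidently)
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)             # [1, 1, 0]
hard_pred = np.bincount(votes).argmax()   # class 1 wins the count 2-1

# Soft voting: average the probabilities, then take the argmax.
avg = probas.mean(axis=0)                 # [0.60, 0.40]
soft_pred = avg.argmax()                  # class 0 wins on confidence

print(hard_pred, soft_pred)  # → 1 0
```

Soft voting gives well-calibrated, confident classifiers more weight, which is why it tends to score higher in practice.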
Can you speed up bagging, pasting, boosting, random forests, or stacking by distributing them across multiple servers?
Bagging, pasting, and random forests can, because every predictor trains independently. Stacking can be partially distributed: the predictors within one layer train in parallel, but each layer must wait for the previous one. Boosting cannot, since each predictor depends on the one before it.
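Because each bagging predictor trains independently, parallelizing is trivial. A sketch with scikit-learn (assumed available): on one machine, `n_jobs` spreads training across CPU cores, and the same independence is what lets other frameworks distribute the work across servers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    n_jobs=-1,        # train the 100 trees in parallel on all available cores
    random_state=42,
)
bag_clf.fit(X, y)
```

A boosting ensemble offers no such parameter for training the sequence itself, since tree N+1 needs the errors of tree N before it can start.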
What's the benefit of out-of-bag evaluation?
Each predictor is evaluated on the training instances it never saw during training (the out-of-bag instances), so no separate validation set is needed and more data is available for training.
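A short sketch with scikit-learn (assumed available): with bootstrap sampling each predictor sees roughly 63% of the training set, so the leftover instances give a free validation score via `oob_score_`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    oob_score=True,   # evaluate each tree on its out-of-bag instances
    random_state=42,
)
bag_clf.fit(X, y)
print(bag_clf.oob_score_)  # estimate of test accuracy, no held-out set needed
```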
Why are Extra-Trees more random than random forests, how does that help, and are they faster?
In addition to random feature subsets at each split, they use random split thresholds instead of searching for the best ones.
The added randomness acts as regularization (trading a bit more bias for lower variance), and skipping the threshold search makes training faster.
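A comparison sketch, assuming scikit-learn: the two classes share the same API, and the only behavioral difference is that `ExtraTreesClassifier` draws split thresholds at random rather than optimizing them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
xt = ExtraTreesClassifier(n_estimators=100, random_state=42).fit(X, y)

# Same interface, drop-in replacement; which generalizes better is
# data-dependent, so it is worth trying both.
print(rf.score(X, y), xt.score(X, y))
```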
How can you make AdaBoost stop underfitting?
Increase the number of estimators, reduce the regularization hyperparameters of the base estimator, or increase the learning rate.
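The same three knobs, sketched with scikit-learn (assumed available); the specific values below are illustrative, not tuned.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),  # deeper than a stump: less regularized base
    n_estimators=200,                     # more estimators
    learning_rate=1.0,                    # raise this if still underfitting
    random_state=42,
)
ada_clf.fit(X, y)
```

All three changes increase the ensemble's capacity, which is exactly what an underfitting model needs (and what an overfitting model should avoid).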
How can you stop gradient boosting from overfitting, and what can be done with the learning rate?
Decrease the learning rate (shrinkage), so each tree contributes less.
Use early stopping to find the optimal number of trees.
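Both ideas in one sketch, assuming scikit-learn: a lower learning rate shrinks each tree's contribution (more trees needed, but better generalization), and `n_iter_no_change` triggers early stopping once a held-out validation fraction stops improving.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=42)

gbrt = GradientBoostingRegressor(
    n_estimators=500,        # upper bound on the number of trees
    learning_rate=0.05,      # shrinkage: smaller = stronger regularization
    validation_fraction=0.1, # held-out data used to detect overfitting
    n_iter_no_change=10,     # stop after 10 rounds without improvement
    random_state=42,
)
gbrt.fit(X, y)
print(gbrt.n_estimators_)    # trees actually trained, often well under 500
```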