Week 9 - Model ensembles Flashcards
What are the three types of tree-based models?
- Decision trees
- Random forest
- Gradient boosting
What are model ensembles?
The wisdom of crowds, for machines: combinations of models are known as model ensembles
If a learner's accuracy p > 1/2, this implies
Weak learnability: the learner does better than random guessing (on a two-class task), which is the minimal requirement for a weak learner
What 2 things do ensemble methods have in common?
- They construct multiple, diverse predictive models from adapted versions of the training data.
- They combine the predictions of these models in some way, often by simple averaging or voting
What is bootstrapping?
Bootstrapping is any test or metric that uses random sampling with replacement, and it falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates.
What is the 3-step process for creating a bootstrap sample?
- Randomly sample one observation from the set
- Write it down
- Put it back in the set (repeat until the bootstrap sample is as large as the original set)
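A minimal sketch of drawing one bootstrap sample with NumPy; the toy `data` values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.array([2.3, 1.7, 4.1, 3.9, 2.8])  # toy dataset (illustrative values)

# Sample with replacement, same size as the original set:
# each draw is "sample one, write it down, put it back".
indices = rng.integers(0, len(data), size=len(data))
bootstrap_sample = data[indices]
print(bootstrap_sample)  # typically contains duplicates
```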
Bootstrap samples contain duplicates, which provides
Diversity: each model sees a slightly different version of the data
What is the bootstrap aggregating (bagging) ensemble method?
- Create multiple random samples from the original data using bootstrapping
- Create a different model (learner) from each random sample
- Aggregate the models' predictions, typically by voting or averaging (see the sketch below)
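A minimal bagging sketch, assuming scikit-learn decision trees as base learners and non-negative integer class labels; the function names `bagging_fit` and `bagging_predict` are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=10, seed=0):
    """Train one decision tree per bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap: sample indices with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate by majority vote (assumes non-negative integer class labels)."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

scikit-learn's `BaggingClassifier` packages the same idea behind one estimator.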
What is subspace sampling?
Encouraging diversity in ensembles by building each model from a different random subset of the features instead of all features
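A minimal NumPy sketch of subspace sampling, assuming 10 features and subsets of size 3 (both numbers illustrative); each ensemble member would then train only on its subset's columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 10
k = 3  # size of each random subspace (illustrative)

# One random feature subset per ensemble member; each model is then
# trained only on the columns X[:, subset] instead of all features.
subsets = [rng.choice(n_features, size=k, replace=False) for _ in range(5)]
print(subsets)
```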
What are random forests?
Random forests are an ensemble learning approach for various tasks (regression, classification) that builds many decision trees at training time and outputs the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
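A minimal example using scikit-learn's `RandomForestClassifier` on the built-in iris data; the hyperparameter values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = number of trees; max_features controls subspace sampling
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy; the prediction is the mode of the trees' votes
```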
What is boosting?
Boosting is an ensemble method for reducing error in supervised learning by converting weak learners into strong ones. Each learner should do a bit better than the previous one: at every iteration the error rate is examined, and the next learner focuses on the points the previous ones handled poorly.
How can we improve learning with boosting?
By giving the misclassified instances a higher weight, and modifying the classifier to take these weights into account
How can we assign weights (boosting)?
We want to assign half of the total weight to the misclassified items and the other half to the correctly classified items (see the sketch after this list):
* Initially, every item has weight 1/|D|
* Total weight of all misclassified items: ε
* Total weight of all correctly classified items: 1 − ε
* To rebalance, multiply each misclassified item's weight by 1/(2ε) and each correctly classified one's by 1/(2(1 − ε))
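A minimal NumPy sketch of the halving scheme above; the function name `reweight` and the toy numbers are illustrative.

```python
import numpy as np

def reweight(weights, misclassified, eps):
    """Rescale weights so misclassified items carry total weight 1/2.

    weights:       current item weights, summing to 1
    misclassified: boolean mask of items the current learner got wrong
    eps:           weighted error rate = weights[misclassified].sum()
    """
    new = weights.copy()
    new[misclassified] *= 1 / (2 * eps)          # total misclassified weight -> 1/2
    new[~misclassified] *= 1 / (2 * (1 - eps))   # total correct weight -> 1/2
    return new

w = np.full(4, 1 / 4)                  # every item starts at 1/|D|
wrong = np.array([True, False, False, False])
eps = w[wrong].sum()                   # epsilon = 0.25
print(reweight(w, wrong, eps))         # -> [0.5, 0.1667, 0.1667, 0.1667]
```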
What is ε (epsilon) in boosting?
The error rate: ε = (FP + FN) / Total
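A worked example with illustrative counts: a classifier with FP = 3 and FN = 2 on 50 items has ε = (3 + 2) / 50 = 0.1.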
What are the 3 sources of misclassification errors?
- Unavoidable bias
- Bias due to low expressiveness of models
- High variance