Chapter 15 Improve Performance with Ensembles Flashcards
EXTERNAL Q: WHY DOESN'T GRADIENT BOOSTING USE SAMPLING WITH REPLACEMENT?
By default, the gradient boosting subsample parameter in Python's scikit-learn equals 1.0, so no sampling is done at all: every tree is fit on the full training set. Setting subsample to a value in (0, 1) turns on stochastic gradient boosting, where each tree is fit on a random fraction of the rows drawn without replacement; this reduces variance and increases bias.
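As a minimal sketch (synthetic data and arbitrary parameter values, just for illustration), this is how the subsample parameter is set in scikit-learn:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=7)  # toy data

# subsample=1.0 (the default) means every tree sees the full training set;
# subsample<1.0 fits each tree on a random fraction of the rows, drawn
# without replacement, trading a little extra bias for lower variance.
model = GradientBoostingClassifier(n_estimators=100, subsample=0.5, random_state=7)
print(cross_val_score(model, X, y, cv=10).mean())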
WHAT IS THE DIFFERENCE BETWEEN STOCHASTIC GRADIENT BOOST AND RANDOM FOREST IN SAMPLING
Stochastic gradient boosting (as implemented in XGBoost) adds randomness by sampling observations and features at each stage, similar to the random forest algorithm, except that the sampling is done without replacement.
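A minimal sketch of these sampling knobs in XGBoost (assuming the xgboost package is installed; the fraction values are arbitrary):

from xgboost import XGBClassifier

# subsample: fraction of rows sampled (without replacement) for each tree.
# colsample_bytree: fraction of features sampled for each tree.
model = XGBClassifier(n_estimators=100, subsample=0.8,
                      colsample_bytree=0.8, random_state=7)
# model.fit(X, y) on your own data; both fractions add randomness much
# like a random forest, but the rows are drawn without replacement.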
WHAT ARE THE THREE MOST POPULAR METHODS FOR COMBINING THE PREDICTIONS FROM DIFFERENT MODELS? P100
Bagging. Building multiple models (typically of the same type) from different subsamples of the training dataset.
Boosting. Building multiple models (typically of the same type) each of which learns to fix the prediction errors of a prior model in the sequence of models.
Voting. Building multiple models (typically of differing types) and using simple statistics (like calculating the mean) to combine their predictions.
WHAT IS ANOTHER NAME FOR BAGGING? P101
Bootstrap Aggregation
HOW IS THE BAGGING PROCESS DONE? P101
By taking multiple samples from your training dataset (with replacement) and training a model on each sample. The final output prediction is averaged across the predictions of all of the sub-models.
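A minimal scikit-learn sketch of bagged decision trees (synthetic data, arbitrary parameter values):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=7)  # toy data

# 100 trees, each fit on a bootstrap sample drawn with replacement;
# the ensemble combines the predictions of all the sub-models.
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                          bootstrap=True, random_state=7)
print(cross_val_score(model, X, y, cv=10).mean())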
WITH WHICH TYPE OF ALGORITHM DOES BAGGING PERFORM THE BEST? P101
Algorithms with high variance, for example decision trees constructed without pruning.
WHAT IS THE SIMILARITY AND THE DIFFERENCE BETWEEN RANDOM FOREST AND BAGGED DECISION TREES? P102
SIMILARITY:
Random Forests is an extension of bagged decision trees.
Samples of the training dataset are taken with replacement.
DIFFERENCE: The trees in a random forest are constructed in a way that reduces the correlation between the individual classifiers. Specifically, rather than greedily choosing the best split point among all features when constructing each tree, only a random subset of the features is considered at each split.
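A minimal sketch (synthetic data, arbitrary values); max_features is the knob that limits each split to a random subset of the features:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=7)  # toy data

# max_features=3: only 3 randomly chosen features are considered at each
# split, which decorrelates the trees compared with plain bagging.
model = RandomForestClassifier(n_estimators=100, max_features=3, random_state=7)
print(cross_val_score(model, X, y, cv=10).mean())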
HOW DO BOOSTING ALGORITHMS WORK? P103
Boosting ensemble algorithms create a sequence of models that attempt to correct the mistakes of the models before them in the sequence.
WHAT ARE THE TWO MOST COMMON BOOSTING ENSEMBLE ML ALGORITHMS? P103
AdaBoost.
Stochastic Gradient Boosting (also called Gradient Boosting Machines)
HOW DOES ADABOOST WORK? P103
It generally works by weighting instances in the dataset by how easy or difficult they are to classify, allowing the algorithm to pay more or less attention to them in the construction of subsequent models.
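A minimal scikit-learn sketch (synthetic data, arbitrary parameter values):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=7)  # toy data

# Each stage re-weights the training instances so that examples the
# earlier models got wrong receive more attention in the next model.
model = AdaBoostClassifier(n_estimators=30, random_state=7)
print(cross_val_score(model, X, y, cv=10).mean())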
WHAT IS THE DIFFERENCE BETWEEN STACKING AND VOTING ENSEMBLE MODELS? P105
The predictions of the sub-models can be weighted, but specifying the weights for classifiers manually, or even heuristically, is difficult. More advanced methods can learn how to best weight the predictions from the sub-models; this is called stacking (stacked generalization). In stacking, a meta-model learns how to weight and combine each sub-model's predictions, whereas in voting the predictions are combined with a simple statistic such as the mean or a majority vote.
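A minimal sketch contrasting the two in scikit-learn (StackingClassifier requires scikit-learn 0.22 or later; the sub-models chosen here are arbitrary):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=7)  # toy data

estimators = [('lr', LogisticRegression()),
              ('dt', DecisionTreeClassifier()),
              ('svm', SVC())]

# Voting: combine the sub-models' predictions with a simple statistic
# (a majority vote here, the default).
voting = VotingClassifier(estimators).fit(X, y)

# Stacking: a meta-model (logistic regression here) learns how to best
# weight and combine the sub-models' predictions.
stacking = StackingClassifier(estimators,
                              final_estimator=LogisticRegression()).fit(X, y)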