Ensembles Flashcards
Two characteristics of good ensembles
1. Individual models should be strong
2. The correlation between the models in the ensemble should be weak (diversity)
What is bagging?
- Trains N models in parallel using bootstrapped data samples, from an overall training set
- Aggregates using majority voting
- Bootstrap aggregating = bagging
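A minimal sketch of the procedure, assuming NumPy arrays and scikit-learn decision trees as the base learners (the helper names and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=50, random_state=0):
    """Train n_models trees, each on a bootstrap sample (same size, drawn with replacement)."""
    rng = np.random.default_rng(random_state)
    models = []
    n = len(X)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)               # bootstrap sample of indices
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate the individual predictions by majority vote (assumes integer class labels)."""
    preds = np.array([m.predict(X) for m in models])   # shape: (n_models, n_samples)
    vote = lambda col: np.bincount(col.astype(int)).argmax()
    return np.apply_along_axis(vote, 0, preds)
```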
Bayes Optimal Ensemble
- An ensemble of all hypotheses in the hypothesis space
- On average, no other ensemble can outperform it
- not possible to practically implement
- Tom Mitchell book (1997)
What are some practical approaches to ensembling?
- Bagging
- Random Forests
- Boosting
- Gradient Boosting
- Stacking
Subagging
Bagging, but where we sample without replacement
- Used in large datasets where we want to create bootstrap samples that are smaller than the original dataset
What type of learning algorithms are suited to bagging?
- Decision Trees
- DTs are very sensitive to changes in the training data: a small change can result in a different feature being selected to split the dataset at the root (or high up in the tree), and this has a ripple effect throughout the subtrees under that node
What is subspace sampling?
- A bootstrap sample only uses a randomly selected subset of the descriptive features in the dataset
- Encourages further diversity of the trees within the ensemble
- Has the advantage of reducing training time for each tree
Random Forests
= Bagging + Subspace sampling
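In scikit-learn terms (a usage sketch with placeholder data), bagging and subspace sampling map onto the bootstrap and max_features parameters; note that scikit-learn samples the feature subset at each split rather than once per tree:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# bootstrap=True      -> each tree is trained on a bootstrap sample (bagging)
# max_features="sqrt" -> each split considers only a random subset of features (subspace sampling)
rf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                            max_features="sqrt", random_state=0)
rf.fit(X, y)
```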
Advantages of Bagging over Boosting
- Simpler to implement + parallelize
- Ease of use
- Reduced training time
However, Caruana et al. (2008) showed that boosted DT ensembles performed best for datasets with fewer than 4,000 descriptive features.
- For more than 4,000 descriptive features, random forests (based on bagging) performed better
- Boosted ensembles may be prone to overfitting, and in domains with large numbers of features, overfitting becomes a serious problem
Costs of using ensembles
1. Increased model complexity
2. Increased learning time
What types of algorithms does Bagging work well for?
Unstable algos - algos whose output classifier undergoes major changes in response to small changes in the training data
Examples of unstable classifiers
DTs, NNs and rule-learning algos are all unstable
Examples of stable classifiers
- Linear regression
- Nearest neighbour
- Linear threshold algorithm
Bootstrap replicate
Training set created from bagging procedure
Contains on average 63.2% of the original training set, with several training examples appearing multiple times (Dietterich paper)
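A quick check of the 63.2% figure (a sketch; n is arbitrary): the chance that a particular example is never drawn in n draws with replacement is (1 - 1/n)^n ≈ e^(-1) ≈ 0.368, so roughly 63.2% of the distinct examples appear in each bootstrap replicate.

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)
bootstrap_idx = rng.integers(0, n, size=n)   # one bootstrap replicate (indices drawn with replacement)
print(len(np.unique(bootstrap_idx)) / n)     # fraction of distinct originals present, ~0.632
print(1 - (1 - 1 / n) ** n)                  # analytic value, ~0.632
```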
Methods for manipulating training datasets
- Bagging
- K-fold cross-validation (ensembles constructed in this way are called ‘cross-validation committees’)
- Boosting
AdaBoost
Freund & Schapire (1995-8)
- manipulates the training examples to generate multiple hypotheses
- maintains a set of weights over the training examples
- Effect of the change in weights is to place more weight on training examples that were misclassified by h(l) and less weight on examples that were correctly classified
- In subsequent iterations, therefore, Adaboost constructs progressively more difficult learning problems
- final classifier is constructed by a weighted vote of the individual classifiers. Each classifier is weighted (by w(l)) according to its accuracy on the weighted training set that it was trained on
- can be viewed as trying to maximise the margin (confidence of accuracy) on the training data
- constructs each new DT to eliminate ‘residual errors’ that have not been properly handled by the weighted vote of the previously-constructed trees.
- thus, it is directly trying to optimise the weighted vote, and is therefore making a direct attack on the representational problem; directly optimising an ensemble can increase the risk of overfitting
- in high-noise cases, Adaboost puts a large amount of weight on the mislabelled examples and this leads to it overfitting badly
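A usage sketch with scikit-learn's AdaBoostClassifier (hyperparameters are illustrative; the weak-learner argument is named estimator in recent scikit-learn versions, base_estimator in older ones):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=100,
    learning_rate=1.0,
    random_state=0,
)
ada.fit(X, y)
```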
Bagging vs AdaBoost
Dietterich paper: AdaBoost typically outperforms Bagging, but when 20% artificial classification noise was added, AdaBoost overfit the data badly while Bagging did very well in the presence of noise.
Why doesn’t Adaboost overfit more often?
- Stage-wise nature of AdaBoost
- In each iteration, it reweights the training examples, constructs a new hypothesis, and chooses a weight for that hypothesis.
- HOWEVER, it never backs up and modifies the previous choices of hypotheses or weights that it has made to compensate for this new hypothesis
What is an ensemble?
A prediction model that is composed of a set of models
What is the motivation behind ensembles?
- A committee of experts working together on a problem are more likely to solve it successfully than a single expert working alone
- however, should still avoid GROUPTHINK (i.e. each model should make predictions independently of the other models in the ensemble)
Two defining characteristics of ensembles
1. Uses a modified version of the dataset
2. Aggregates predictions from many models
How can ensembles lead to good predictions from base learners that are only marginally better than random guessing?
- Given a large population of independent models, an ensemble can be very accurate even if the individual models in the ensemble perform only marginally better than random guessing
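A worked illustration of this claim, assuming the models are fully independent and each is correct with probability p = 0.55; the majority vote of n such models is correct with binomial probability:

```python
from math import comb

def majority_vote_accuracy(p, n_models):
    """P(a strict majority of n independent models is correct) when each is correct with prob p."""
    return sum(comb(n_models, k) * p**k * (1 - p)**(n_models - k)
               for k in range(n_models // 2 + 1, n_models + 1))

for n in (11, 101, 1001):
    print(n, round(majority_vote_accuracy(0.55, n), 3))
# rises from ~0.63 with 11 models to ~0.84 with 101 and ~0.999 with 1001
```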
What are the 2 standard approaches to ensembling?
1. Bagging
2. Boosting
Bagging
Bootstrap aggregation
Bagging
Each model in the ensemble is trained on a random sample of the dataset
- each random sample is the same size as the original dataset (unless subagging is used)
- sampling with replacement is used
Boosting
Each new model added to an ensemble is biased to pay more attention to instances that previous models misclassified
How does boosting work?
By incrementally adapting the dataset used to train the models
- uses a weighted dataset
- each instance has an associated weight (w(i) >= 0)
- weights are initially set to 1/n (n=number of examples)
- these weights are used as a distribution over which the dataset is sampled to create a replicated training set
- number of times that an instance is replicated is proportional to its weight
How does boosting work?
Iteratively creating models and adding them to the ensemble
When do the iterations stop in boosting?
- predefined number of models have been added
- model’s accuracy dips below 0.5
Assumptions of boosting algorithm
1. Accuracy of models > 0.5
2. It's a binary classification problem
What are the boosting algorithm steps?
- Induces a model using the weighted dataset and calculates the total error, E, in the set of predictions made by the model for the instances in the training set. The E value is calculated by summing the weights of the training instances for which the predictions by the model are incorrect.
- Increases weights for instances misclassified by the model: w[i] = w[i] * (1 / (2*E))
- Decreases the weights for the instances classified correctly by the model: w[i] = w[i] * (1 / (2*(1-E)))
- Calculates a confidence factor, such that alpha increases as E decreases. alpha = (0.5) * log((1-E)/E)
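A minimal sketch of that reweighting step, assuming NumPy arrays (the function name is illustrative); note the update renormalises the weights so they still sum to 1:

```python
import numpy as np

def boosting_weight_update(weights, correct, epsilon=1e-10):
    """One boosting iteration's reweighting.

    weights: current instance weights (summing to 1)
    correct: boolean array, True where the model classified the instance correctly
    """
    E = weights[~correct].sum()                 # total error: summed weights of misclassified instances
    E = np.clip(E, epsilon, 1 - epsilon)        # guard against division by zero
    new_w = np.where(correct,
                     weights * (1 / (2 * (1 - E))),   # correctly classified: weight decreases
                     weights * (1 / (2 * E)))         # misclassified: weight increases
    alpha = 0.5 * np.log((1 - E) / E)           # confidence factor for this model
    return new_w, alpha
```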
What are the bagging algorithm steps?
- Train N models in parallel using bootstrapped data samples from an overall training set
- Aggregates using majority voting
Why do we sample with replacement in bagging?
It results in duplicates within each of the bootstrap samples and consequently every bootstrap sample will be missing some instances too
- every bootstrap sample will be different, and so every model will be different
Disadvantages of ensembles
1. Increased learning time
2. Increased model complexity
Pros of Bagging
- Simpler to implement and parallelize
=> Ease of use and reduced training time
Evidence of ensembles
Caruana and Niculescu-Mizil (2006) found that bagging and boosting were the best performers out of 7 predictor types
Caruana et al. (2008) - boosted DTs were best performing model of those tested for datasets <4k descriptive features. For more than 4k features, RFs were better as boosting led to overfitting
Summary of Leo Breiman (1996) paper on ‘Bagging Predictors’
Bagging Predictors (Leo Breiman, 1996):
- Bagging can push a good but unstable procedure towards optimality.
- It can degrade the performance of stable procedures.
Gradient Boosting
- Extension of Boosting
- Iteratively train up ensemble
- Train models to predict residuals
e.g. XGBoost
How is Gradient Boosting similar to AdaBoost?
Creates an ensemble model by iteratively adding learners (similar to AdaBoost)
How is Gradient Boosting different to Adaboost?
More aggressive: each new model is fitted directly to the errors of the ensemble built so far, rather than to a reweighted dataset (which is a more subtle adjustment)
How does Gradient Boosting work?
The best model to add next is one that predicts the difference between the true target and the current ensemble's prediction:
t[i] - M_(n-1)(d[i])
(where t[i] is the target for instance d[i] and M_(n-1) is the ensemble after n-1 iterations)
- Uses gradient descent to reduce J (error in predictions)
Where does the gradient in Gradient Boosting come from?
Because we treat the residuals as the negative gradients of the loss function
Under the hood, we’re doing gradient descent on the error surface
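A minimal sketch of gradient boosting for regression with squared loss, where the residuals t[i] - M_(n-1)(d[i]) are exactly the negative gradients; the base learner, depth and learning rate are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=100, learning_rate=0.1):
    """Fit an ensemble in which each new tree predicts the current residuals."""
    baseline = y.mean()
    prediction = np.full(len(y), baseline)      # M_0: constant baseline model
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction              # negative gradient of the squared loss
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return baseline, trees

def gradient_boost_predict(baseline, trees, X, learning_rate=0.1):
    """Sum the baseline and the (shrunken) contributions of every tree."""
    return baseline + learning_rate * sum(t.predict(X) for t in trees)
```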
Key idea of stacking
Stacking ensembles use a machine learning model to combine the outputs of the base models in an ensemble
Using the predictions of the base models as features at the stacked layer
Can be more effective than simple majority voting or weighted voting
Very common for heterogeneous ensembles
Common to use k-folds to generate the stacked level training set
Requires new datasets to be generated for the stack layer
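A usage sketch of a heterogeneous stacked ensemble using scikit-learn's StackingClassifier (the base models and dataset are placeholders); the cv argument controls the k-fold generation of the stack-layer training set:

```python
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[                                   # heterogeneous base models
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),          # meta-model trained on base-model predictions
    cv=5,                                          # k-fold predictions form the stack-layer training set
)
stack.fit(X, y)
```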
Cons of Stacking
Produces small gains for lots of extra complexity