Ensembles Flashcards
Ensembles
Predict the class label with multiple classifiers: different experts that each vote.
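A minimal sketch of the "different experts vote" idea, assuming scikit-learn is available; the dataset and the choice of base classifiers are illustrative, not from the cards:

    # Several different "experts" vote on the class label (majority wins).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier()),
            ("tree", DecisionTreeClassifier()),
        ],
        voting="hard",  # each expert casts one vote
    )
    ensemble.fit(X, y)
    print(ensemble.predict(X[:5]))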
Bagging
Analogy: multiple doctors making a diagnosis by majority vote.
Bootstrap samples D_i ——> models M_i ——> predictions ——> combined by vote (classification) or average (regression).
Unstable learners: small changes in the training set cause large changes in the classifier. Bagging improves them.
Examples: regression trees, decision trees, linear regression, neural networks.
Stable learners: bagging is not a good idea (e.g., k-nearest neighbors).
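A minimal bagging sketch, assuming scikit-learn; the unstable base learner (a decision tree) and the parameter values are illustrative (older scikit-learn versions name the first argument base_estimator instead of estimator):

    # Bagging: bootstrap samples D_i -> models M_i -> vote/average.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    bagging = BaggingClassifier(
        estimator=DecisionTreeClassifier(),  # unstable learner, benefits from bagging
        n_estimators=50,                     # number of bootstrap samples / models
        bootstrap=True,                      # draw each D_i with replacement
        random_state=0,
    )
    bagging.fit(X, y)
    print(bagging.predict(X[:5]))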
Random Forests
Combination of tree predictors
D_i ——> T_i (a random subset F of the n variables is considered at each split) ——> save the tree, no pruning.
Majority voting for classification, averaging for regression.
Works well for classification; less effective for regression.
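A sketch of a random forest with scikit-learn; parameter values are illustrative:

    # Random forest: bagged trees where each split considers a random
    # subset F of the n features; trees are grown without pruning.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    forest = RandomForestClassifier(
        n_estimators=100,
        max_features="sqrt",  # size of the random feature subset F per split
        random_state=0,
    )
    forest.fit(X, y)
    print(forest.predict(X[:5]))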
Out Of Bag (OOB)
The OOB prediction for an observation averages only the trees whose bootstrap sample did not include that observation.
The resulting OOB error estimate is almost identical to that from n-fold cross-validation.
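A sketch of the OOB estimate using scikit-learn's oob_score option; the data and forest size are illustrative:

    # OOB: each observation is scored only by the trees whose bootstrap
    # sample did not include it, giving a cross-validation-like estimate.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    forest.fit(X, y)
    print("OOB accuracy estimate:", forest.oob_score_)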
Boosting (AdaBoost)
Builds a strong classifier as a weighted combination of weak classifiers.
H(x) = sign(sum( alpha_t * h_t(x) ))
alpha_t is the weight of weak classifier h_t
alpha_t = (1/2) * ln((1 - e_t) / e_t), where e_t is the weighted error of h_t
Example weights are multiplied by exp(-alpha_t) if classified correctly, by exp(alpha_t) if misclassified,
then normalized to sum to 1.
Boosting works well with shallow trees.
Boosting continues to reduce test error even after the training error reaches zero.
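A rough sketch of the AdaBoost formulas above, using decision stumps as weak learners; the variable names and parameters are mine, not from the cards, and scikit-learn's AdaBoostClassifier would be the practical choice:

    # AdaBoost with decision stumps, following the update rules above.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y01 = make_classification(n_samples=500, random_state=0)
    y = np.where(y01 == 1, 1, -1)       # labels in {-1, +1}

    n, T = len(y), 50
    w = np.full(n, 1.0 / n)             # example weights
    stumps, alphas = [], []

    for t in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        e_t = np.sum(w[pred != y])      # weighted error of h_t
        if e_t == 0 or e_t >= 0.5:      # stop if perfect or no better than chance
            break
        alpha_t = 0.5 * np.log((1 - e_t) / e_t)
        w *= np.exp(-alpha_t * y * pred)  # exp(-alpha_t) if correct, exp(alpha_t) if not
        w /= w.sum()                      # normalize weights
        stumps.append(stump)
        alphas.append(alpha_t)

    # H(x) = sign( sum_t alpha_t * h_t(x) )
    H = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
    print("training accuracy:", np.mean(H == y))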
Gradient Boosting
1. Learn a regression predictor.
2. Compute the error residual.
3. Learn a model to predict the residual.
4. Go to step 2.
For instance, with MSE loss the negative gradient with respect to the prediction is the residual y - y_hat, so each step updates
y_hat = y_hat + alpha * h(x)
where h is the model fit to the residual and alpha is the step size.
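A small sketch of the residual-fitting loop with MSE loss, where the negative gradient equals the residual; the step size alpha and tree depth are illustrative:

    # Gradient boosting for regression with MSE loss.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

    alpha = 0.1                          # step size (learning rate)
    y_hat = np.full_like(y, y.mean())    # step 1: initial (constant) predictor
    trees = []

    for m in range(100):
        residual = y - y_hat             # step 2: residual = negative gradient of MSE
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residual)            # step 3: learn to predict the residual
        y_hat = y_hat + alpha * tree.predict(X)
        trees.append(tree)               # step 4: repeat

    print("final training MSE:", np.mean((y - y_hat) ** 2))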
Extreme Gradient Boosting (XGBoost)
An efficient and scalable implementation of gradient boosting with classification and regression trees.
Accepts only numerical features (categorical variables must be encoded).
Uses a weighted quantile sketch for approximate split finding.
Faster than other popular implementations on a single machine.
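A usage sketch assuming the xgboost Python package; the parameter values are arbitrary:

    # XGBoost on a classification task; all features must be numerical.
    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X, y)
    print(model.predict(X[:5]))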
Learning Ensembles
Random forests and boosting compute a set of weak models
and combine them to build a stronger model.
F(x) = sum_m( alpha_m * T_m(x) )
Stacked Generalization (Stacking)
Trains a learning algorithm to combine the predictions of a heterogeneous set of base learners.
First, a set of base learners is trained.
Then, a meta-learner is trained on the base learners' predictions.
A cross-validation-like scheme is used to generate the meta-learner's training data.
Typically performs better than any single one of the trained models.
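A sketch of stacking with scikit-learn's StackingClassifier, which internally uses a cross-validation-like scheme to build the meta-learner's training data; the choice of base learners and meta-learner is illustrative:

    # Stacked generalization: heterogeneous base learners + a meta-learner.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("knn", KNeighborsClassifier()),
        ],
        final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
        cv=5,  # base-learner predictions for the meta-learner come from CV folds
    )
    stack.fit(X, y)
    print(stack.predict(X[:5]))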