Ensemble methods Flashcards
In Ensemble Learning, how can we combine K different models?
P(t|x) = Σ_k π_k(x) * P(t|x, k)
where π_k(x) = P(k|x) are input-dependent mixing coefficients
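A minimal NumPy sketch of this combination rule (the number of models, classes and all probability values below are made-up illustrative values):

```python
import numpy as np

# Hypothetical example: K = 3 models, each giving P(t | x, k) for 2 classes.
p_t_given_xk = np.array([
    [0.9, 0.1],   # model 1
    [0.6, 0.4],   # model 2
    [0.2, 0.8],   # model 3
])

# Input-dependent mixing coefficients pi_k(x) = P(k | x); they sum to 1.
pi = np.array([0.5, 0.3, 0.2])

# P(t | x) = sum_k pi_k(x) * P(t | x, k)
p_t_given_x = pi @ p_t_given_xk
print(p_t_given_x)  # [0.67, 0.33]
```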
What is bagging (= bootstrap aggregating)?
It’s averaging the predictions of a set of models, each of which is trained on a different bootstrap sample (drawn with replacement) of the full dataset.
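A minimal bagging sketch with NumPy and scikit-learn decision trees (the dataset, the number of models and the choice of trees as base models are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
M = 25  # number of bagged models

models = []
for _ in range(M):
    # Bootstrap sample: draw n points with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Combine the individual predictions (here: majority vote on 0/1 labels).
votes = np.mean([m.predict(X) for m in models], axis=0)
y_pred = (votes >= 0.5).astype(int)
```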
What is the expected error after bagging?
E_com = E_av / M, where:
- E_av is the average error of the individual models
- M is the number of models
This 1/M reduction assumes the individual models’ errors are uncorrelated and have zero mean; in practice the errors are correlated and the reduction is smaller.
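A quick NumPy simulation of that idealized (uncorrelated, zero-mean error) case, with synthetic Gaussian errors and illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 10, 100_000

# Synthetic, zero-mean, uncorrelated per-model errors on n test points.
errors = rng.normal(loc=0.0, scale=1.0, size=(M, n))

E_av = np.mean(errors ** 2)                # average individual squared error (~1.0)
E_com = np.mean(errors.mean(axis=0) ** 2)  # squared error of the averaged committee
print(E_av, E_com, E_av / M)               # E_com comes out close to E_av / M
```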
What is a Random Forest?
- Training K different Decision Trees, each on a different bootstrap sample of the dataset
- For each node, randomly pick m < M different variables to consider
- No pruning is needed
By averaging the predictions of multiple high-variance trees, we reduce both bias and variance (compared with a single Decision Tree).
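A hedged sketch using scikit-learn’s RandomForestClassifier (the dataset and hyperparameter values are illustrative; max_features plays the role of m):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# K trees, each grown on a bootstrap sample; at each node only
# max_features randomly chosen variables are considered for the split.
rf = RandomForestClassifier(
    n_estimators=100,     # K trees
    max_features="sqrt",  # m < M variables per split
    random_state=0,
).fit(X, y)

print(rf.score(X, y))
```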
What are Extremely Randomized Trees (ERTS)?
It’s like a Random Forest but in addition to randomly picking m attributes, the attribute test is also chosen randomly (2-way split, multiway split, etc.).
ERTS also include a depth hyperparameter.
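scikit-learn’s ExtraTreesClassifier implements this idea; a minimal sketch (the dataset and parameter values are illustrative; max_depth is the depth hyperparameter):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=500, random_state=0)

ert = ExtraTreesClassifier(
    n_estimators=100,
    max_features="sqrt",  # m randomly picked attributes per node
    max_depth=10,         # depth hyperparameter
    random_state=0,
).fit(X, y)
```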
What is boosting?
It’s sequentially training multiple classifiers, re-weighting the training examples according to the errors of the previous classifiers (misclassified data points gain more weight). The global prediction is a weighted majority vote of all the classifiers.
What is the AdaBoost algorithm?
1) Initialize each weight to 1/N
2) For m = 1, 2, … M :
- fit a classifier y_m(x) by minimizing Jm = Σ_n w_n^[m] * Id( y_m(x(n)) ≠ t(n) )
- evaluate Em = Jm / Σ_n w_n^[m] and α_m = ln[ (1 - Em) / Em ]
- update the weights: w_n^[m+1] = w_n^[m] * exp( α_m * Id( y_m(x(n)) ≠ t(n) ) )
3) y_M(x) = Sign( Σ_m α_m * y_m(x) )
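A minimal NumPy implementation of these steps, using decision stumps as the base classifiers (the dataset, M, and the clipping of Em are illustrative assumptions, not part of the algorithm statement above):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, t = make_classification(n_samples=300, random_state=0)
t = np.where(t == 1, 1, -1)                 # targets in {-1, +1}
N, M = len(X), 50

w = np.full(N, 1.0 / N)                     # 1) initialize each weight to 1/N
stumps, alphas = [], []

for m in range(M):                          # 2) for m = 1, ..., M
    # Fit a weak classifier (decision stump) on the weighted data.
    stump = DecisionTreeClassifier(max_depth=1).fit(X, t, sample_weight=w)
    miss = (stump.predict(X) != t).astype(float)   # Id( y_m(x(n)) != t(n) )

    Em = np.sum(w * miss) / np.sum(w)
    Em = np.clip(Em, 1e-10, 1 - 1e-10)      # guard against Em = 0 or 1
    alpha = np.log((1 - Em) / Em)

    w = w * np.exp(alpha * miss)            # up-weight misclassified points
    stumps.append(stump)
    alphas.append(alpha)

# 3) weighted majority vote: sign( sum_m alpha_m * y_m(x) )
y_pred = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
print("training accuracy:", np.mean(y_pred == t))
```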
What is a decision stump?
It’s a 1-level decision tree (i.e. a decision tree with a single decision node, splitting on a single feature).
What are weak learners?
They are the base learners used for boosting: simple models that perform only slightly better than chance. Unlike bagging, boosting doesn’t require complex models as base learners.
What is the exponential error function formula?
E_M = Σ_n exp( - t(n) * f_M(x(n)) ), where f_M is a linear combination of the individual classifiers: f_M(x(n)) = (1/2) * Σ_m α_m * y_m(x(n))
Which function is boosting approximating?
The log-odds ratio y(x) = ln[ P(t=+1|x) / P(t=-1|x) ] / 2
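A short sketch of why, from pointwise minimization of the expected exponential error:
E[ exp(-t * f(x)) | x ] = P(t=+1|x) * exp(-f(x)) + P(t=-1|x) * exp(f(x));
setting the derivative with respect to f(x) to zero gives exp(2 * f(x)) = P(t=+1|x) / P(t=-1|x), i.e. f(x) = ln[ P(t=+1|x) / P(t=-1|x) ] / 2.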
Compare bagging vs boosting.
- faster vs slower (bagged models can be trained in parallel, boosting is sequential)
- smaller vs larger error reduction
- works well with reasonably complex vs weak base classifiers
- doesn’t vs does overfit to wrongly labelled data points
- mainly reduces variance vs mainly reduces bias
What is stacking?
It’s an ensemble method that uses different types of base models. Their outputs are then fed to another model, called the meta-classifier, which makes the final prediction. Generating the base-model outputs with K-fold cross-validation is a good way of avoiding overfitting when stacking.
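A sketch with scikit-learn’s StackingClassifier, where cv controls the K-fold predictions used to train the meta-classifier (the choice of base models and cv=5 are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the meta-classifier
    cv=5,  # K-fold CV outputs help avoid overfitting the meta-classifier
).fit(X, y)
```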