Ensemble methods Flashcards

1
Q

In Ensemble Learning, how do we combine K different models?

A

P(t|x) = Σ_k π_k(x) · P(t|x, k)
where π_k(x) = P(k|x) are input-dependent mixing coefficients
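
A minimal numpy sketch of this combination rule, assuming each of the K models outputs a class distribution P(t|x, k) and a gating model supplies the mixing coefficients π_k(x) (all numbers below are illustrative):

import numpy as np

# K = 3 models, 2 classes, a single input x.
# p_k[k] = P(t | x, k), the class distribution predicted by model k.
p_k = np.array([[0.9, 0.1],
                [0.6, 0.4],
                [0.2, 0.8]])

# pi_k = P(k | x): input-dependent mixing coefficients, summing to 1.
pi_k = np.array([0.5, 0.3, 0.2])

# P(t | x) = sum_k pi_k(x) * P(t | x, k)
p_t = pi_k @ p_k
print(p_t)  # [0.67, 0.33]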

2
Q

What is bagging (= bootstrap aggregating)?

A

It’s averaging the predictions of a set of models, each of which is trained on a different bootstrap sample (drawn with replacement) of the training dataset.
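
A minimal bagging sketch using numpy and scikit-learn decision trees (the dataset and the number of models are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)

M = 25  # number of bagged models
models = []
for _ in range(M):
    # bootstrap sample: draw n points with replacement
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# average the M predictions (majority vote for classification)
votes = np.mean([m.predict(X) for m in models], axis=0)
y_pred = (votes > 0.5).astype(int)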

3
Q

What is the expected error after bagging?

A

E_com = E_av / M, where:
- E_av is the average error of the individual models
- M is the number of models

This holds only if the errors of the individual models are uncorrelated and have zero mean; in practice the errors are correlated and the reduction is smaller, but E_com ≤ E_av still holds.
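
A quick numpy simulation of this result, assuming uncorrelated zero-mean errors (the condition under which the 1/M factor holds):

import numpy as np

rng = np.random.default_rng(0)
M, n = 10, 100_000

# errors of M individual models on n inputs: zero-mean and uncorrelated
errors = rng.normal(0.0, 1.0, size=(M, n))

E_av = np.mean(errors ** 2)                # average squared error of the individual models
E_com = np.mean(errors.mean(axis=0) ** 2)  # squared error of the averaged (committee) prediction

print(E_av, E_com, E_av / M)  # E_com ≈ E_av / M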

4
Q

What is a Random Forest?

A
  • Train K different Decision Trees, each on a different bootstrap sample of the dataset
  • At each node, randomly pick m < M of the M input variables and only consider those for the split
  • No pruning is needed

By averaging the predictions of multiple high-variance trees, we reduce the variance without increasing the bias (compared with a single Decision Tree).
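
A minimal scikit-learn sketch (hyperparameter values are illustrative; max_features controls the m variables considered at each split):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# n_estimators = K bootstrapped trees, max_features = m variables per split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)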

5
Q

What are Extremely Randomized Trees (ERTs)?

A

It’s like a Random Forest, but in addition to randomly picking m attributes at each node, the attribute test (split) is also chosen randomly (2-way split, multiway split, etc.).
ERTs also include a depth hyperparameter.
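
A minimal scikit-learn sketch (max_depth stands in for the depth hyperparameter mentioned above; values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=500, random_state=0)

# random feature subsets AND random split choices at each node
ert = ExtraTreesClassifier(n_estimators=100, max_depth=10, random_state=0)
ert.fit(X, y)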

6
Q

What is boosting?

A

It’s sequentially training multiple classifiers, re-weighting the training examples according to the previous classifier’s errors (misclassified data points gain more weight). The global prediction is obtained by weighted majority voting.

7
Q

What is the AdaBoost algorithm?

A

1) Initialize each weight to w_n^[1] = 1/N
2) For m = 1, 2, … M:
- fit a classifier y_m(x) by minimizing Jm = Σ_n w_n^[m] · Id(y_m(x(n)) ≠ t(n))
- evaluate Em = Jm / Σ_n w_n^[m] and α_m = ln[ (1 − Em) / Em ]
- update the weights: w_n^[m+1] = w_n^[m] · exp( α_m · Id(y_m(x(n)) ≠ t(n)) )

3) Final prediction: Y_M(x) = Sign( Σ_m α_m · y_m(x) )
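
A compact numpy sketch of these steps, using scikit-learn decision stumps as weak learners and assuming labels t ∈ {−1, +1} (dataset and number of rounds are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, t = make_classification(n_samples=500, random_state=0)
t = 2 * t - 1                          # labels in {-1, +1}
N, M = len(X), 50

w = np.full(N, 1.0 / N)                # 1) initialize weights to 1/N
stumps, alphas = [], []

for m in range(M):                     # 2) sequential rounds
    stump = DecisionTreeClassifier(max_depth=1)   # weak learner: decision stump
    stump.fit(X, t, sample_weight=w)              # minimizes the weighted error Jm
    miss = stump.predict(X) != t

    eps = np.sum(w * miss) / np.sum(w)            # Em
    alpha = np.log((1 - eps) / eps)               # alpha_m
    w = w * np.exp(alpha * miss)                  # re-weight misclassified points

    stumps.append(stump)
    alphas.append(alpha)

# 3) weighted majority vote
y_pred = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))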

8
Q

What is a decision stump?

A

It’s a 1-level decision tree, i.e. a decision tree with a single decision node (one split on a single feature).

9
Q

What are weak learners?

A

They are the base learners used for boosting, which (as opposed to bagging) doesn’t require complex models as base learners; a weak learner only needs to perform slightly better than random guessing.

10
Q

What is the exponential error function formula?

A

E_M = Σ_n exp( − t(n) · f_M(x(n)) ), where f_M is a linear combination of the individual classifiers: f_M(x(n)) = (1/2) · Σ_m α_m · y_m(x(n))
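
A short worked sketch (the standard sequential-minimization argument) of how minimizing E_M recovers the quantities on the AdaBoost card; notation follows the cards above:

% Hold y_1..y_{m-1} and alpha_1..alpha_{m-1} fixed and minimize over (alpha_m, y_m):
E = \sum_n \exp\{ -t_n f_m(x_n) \}
  = \sum_n w_n^{(m)} \exp\{ -\tfrac{1}{2} t_n \alpha_m y_m(x_n) \},
\qquad w_n^{(m)} = \exp\{ -t_n f_{m-1}(x_n) \}
% Minimizing over y_m gives the weighted error J_m; minimizing over alpha_m gives
% alpha_m = ln[(1 - E_m)/E_m] and the weight update w_n^{(m+1)} (up to a constant factor).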

11
Q

Which function is boosting approximating?

A

Half the log-odds ratio: y(x) = (1/2) · ln[ P(t=+1|x) / P(t=−1|x) ]

12
Q

Compare bagging vs boosting.

A
  • bagging is faster, boosting slower
  • bagging gives a smaller error reduction, boosting a larger one
  • bagging works well with reasonably good base classifiers, boosting works with weak ones
  • bagging doesn’t overfit to wrongly labelled examples, boosting can
  • bagging reduces variance, boosting reduces bias
13
Q

What is stacking?

A

It’s an ensemble method that uses different types of base models. Their outputs are fed to another model, called a meta-classifier, which makes the final prediction. K-fold cross-validation (training the meta-classifier on out-of-fold predictions of the base models) is a good way of avoiding overfitting when stacking.
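
A minimal scikit-learn sketch (the base models and the meta-classifier are illustrative choices; cv=5 trains the meta-classifier on out-of-fold predictions):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)), ("svc", SVC())],
    final_estimator=LogisticRegression(),   # meta-classifier
    cv=5,                                   # 5-fold CV for out-of-fold predictions
)
stack.fit(X, y)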
