Ensemble Flashcards

1
Q

How do ensemble methods work?

A

They work by combining predictions from several estimators built with a given learning algorithm in order to improve generalization.
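
A minimal sketch of this idea, assuming scikit-learn (whose terminology these cards follow); the dataset and parameter values are only illustrative:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # toy classification problem
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    single = DecisionTreeClassifier(random_state=0)
    ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

    # the combined estimator usually generalizes better than a single one
    print(cross_val_score(single, X, y, cv=5).mean())
    print(cross_val_score(ensemble, X, y, cv=5).mean())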

2
Q

What kinds of ensemble methods are used?

A
  • averaging - reduce variance of ‘strong’ estimators
  • boosting - reduce bias of ‘weak’ estimators

3
Q

How does averaging work?

A

It works by building several estimators independently and averaging their predictions to reduce the variance.
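
A minimal sketch, assuming scikit-learn: a VotingClassifier with voting='soft' fits each estimator independently and averages their predicted class probabilities:

    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    # three estimators built independently; 'soft' voting averages their
    # predicted probabilities to form the ensemble prediction
    averaged = VotingClassifier(
        estimators=[("lr", LogisticRegression(max_iter=1000)),
                    ("dt", DecisionTreeClassifier()),
                    ("nb", GaussianNB())],
        voting="soft")
    # averaged.fit(X, y); averaged.predict(X_new)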

4
Q

How does boosting work?

A

It works by building several estimators sequentially, with each new estimator trying to reduce the bias of the combined estimator.
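
A minimal sketch, assuming scikit-learn; in gradient boosting each new shallow tree is fitted to the errors of the ensemble built so far:

    from sklearn.ensemble import GradientBoostingClassifier

    # estimators are added sequentially; each one corrects the errors of the
    # current combined estimator, reducing its bias step by step
    boosted = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
    # boosted.fit(X, y)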

5
Q

What is ‘Pasting’ - averaging methods?

A

Algorithm whose independent estimators are each built on a random subset of the samples, drawn from the dataset without replacement.
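
A sketch of this configuration with scikit-learn's BaggingClassifier (parameter values are illustrative):

    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Pasting: each estimator sees 70% of the samples, drawn WITHOUT replacement
    pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                max_samples=0.7, bootstrap=False)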

6
Q

What is ‘Bagging’?

A

For averaging methods it means that the random subsets of samples are drawn with replacement (bootstrap sampling).
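
The same sketch with replacement turned on (scikit-learn's BaggingClassifier; bootstrap=True is the default):

    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Bagging: the subsets of samples are drawn WITH replacement
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                max_samples=0.7, bootstrap=True)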

7
Q

What is ‘Random Subspaces’ - averaging methods?

A

Algorithm that builds its estimators on random subsets of the dataset drawn as random subsets of the features (all samples are kept).
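
A sketch with scikit-learn's BaggingClassifier: every estimator keeps all samples but only sees a random subset of the features:

    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Random Subspaces: all samples, half of the features per estimator
    subspaces = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                  max_samples=1.0, max_features=0.5,
                                  bootstrap=False)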

8
Q

What is ‘Random Patches’ - averaging methods?

A

Algorithm that builds its estimators on random subsets of the dataset drawn as random subsets of both the samples and the features.
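
A sketch with scikit-learn's BaggingClassifier subsampling both axes at once:

    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Random Patches: random subsets of the samples AND of the features
    patches = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                max_samples=0.7, max_features=0.5,
                                bootstrap=False)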

9
Q

RandomForests and Extra-Trees - averaging methods

A

They are perturb-and-combine methods based on constructing randomized decision trees and then averaging their predictions.
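
A minimal sketch, assuming scikit-learn:

    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

    # both build many randomized trees and average their predictions
    rf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
    et = ExtraTreesClassifier(n_estimators=100, max_features="sqrt")
    # rf.fit(X, y); et.fit(X, y)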

10
Q

Differences between RandomForests and ExtraTrees

A

During tree construction:

  • in RFs the node split is picked based on the best split among a random subset of features
  • in ETs the splits are drawn at random for each candidate feature with the best one being picked for the node split
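
The difference can be mimicked at the single-tree level with scikit-learn's splitter parameter (a sketch, not how the ensembles are normally built):

    from sklearn.tree import DecisionTreeClassifier

    # RandomForest-style tree: search for the best threshold on each of the
    # randomly chosen candidate features
    rf_style = DecisionTreeClassifier(splitter="best", max_features="sqrt")

    # ExtraTrees-style tree: draw one random threshold per candidate feature
    # and keep the best of those random splits
    et_style = DecisionTreeClassifier(splitter="random", max_features="sqrt")
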
11
Q

What is bias?

A

It is the error from erroneous assumptions in the learning algorithm.
High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).

12
Q

What is variance?

A

It is the error from sensitivity to small fluctuations in the training set.
High variance can cause overfitting: modeling the random noise in the training data, rather than the intended outputs.
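
A quick illustration of the last two cards, assuming scikit-learn (exact scores depend on the data): a depth-1 tree underfits (high bias), while a fully grown tree tends to overfit (high variance):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    shallow = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)    # high bias
    deep = DecisionTreeClassifier(max_depth=None).fit(X_tr, y_tr)    # high variance

    # shallow: mediocre score on both sets (underfitting)
    # deep: near-perfect train score, noticeably lower test score (overfitting)
    print(shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
    print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))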

13
Q

What is a decision tree?

A

It is a model for predicting a dependent variable Y from the independent variables X by checking a collection of splits arranged in a tree.
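
A minimal sketch, assuming scikit-learn; export_text prints the learned collection of splits:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=2).fit(data.data, data.target)

    # prints splits such as "petal width (cm) <= 0.80" with the predicted classes
    print(export_text(tree, feature_names=data.feature_names))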

14
Q

What is a decision tree split?

A

A split is a condition or query on a single independent variable that is either true or false.
Splits are arranged in a tree where each split node has 2 children: left if the condition is true, right if it is false.
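
A toy illustration of the convention (plain Python; the feature name and threshold are made up):

    # a split is a boolean test on a single independent variable
    def split(sample):
        return sample["petal_width"] <= 0.8    # hypothetical condition

    node = {"left": "go to left child", "right": "go to right child"}

    # true condition -> left child, false condition -> right child
    branch = node["left"] if split({"petal_width": 0.2}) else node["right"]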

15
Q

Boosting intuition

A

minimizes bias by sequentially combining estimators that individually have low variance and high bias (e.g. shallow decision trees)
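
A sketch, assuming scikit-learn: AdaBoost over depth-1 trees (decision stumps), which are high-bias/low-variance on their own:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier

    # many sequentially weighted stumps drive the bias of the combination down
    boosted_stumps = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                        n_estimators=200)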

16
Q

Bagging intuition

A

minimizes variance by averaging independently built estimators that individually have low bias and high variance (e.g. fully grown decision trees)
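
A sketch, assuming scikit-learn: bagging over fully grown trees, which are low-bias/high-variance on their own:

    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    # averaging many deep trees built on bootstrap samples drives the variance down
    bagged_trees = BaggingClassifier(DecisionTreeClassifier(max_depth=None),
                                     n_estimators=100)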

17
Q

What is Gini impurity?

A
  • a measure of how often a randomly chosen element from a set will be classified incorrectly
  • the classification is done randomly according to the distribution of the labels in the set
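
A small worked example of the definition (numpy; not part of the original card):

    import numpy as np

    def gini_impurity(labels):
        # probability of misclassifying a random element when labels are
        # assigned at random according to the label distribution of the set
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    gini_impurity(["a", "a", "b", "b"])   # 0.5
    gini_impurity(["a", "a", "a", "a"])   # 0.0 - a pure set
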
18
Q

What is information gain?

A
  • a measure given by the difference between the entropy of the target variable and the entropy of the target variable conditioned on another variable (e.g. a regressor):
    IG(T, rgr) = H(T) - H(T | rgr)
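
A small worked example, using a discrete feature as the conditioning variable (numpy; names are illustrative):

    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(target, feature):
        # IG(T, feature) = H(T) - H(T | feature)
        h_cond = sum((feature == v).mean() * entropy(target[feature == v])
                     for v in np.unique(feature))
        return entropy(target) - h_cond

    T = np.array([0, 0, 1, 1])
    F = np.array(["x", "x", "y", "y"])
    information_gain(T, F)   # 1.0 bit - the feature fully determines the target
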
19
Q

What is ‘reduced error’ pruning?

A
  • method of reducing overfitting in decision trees
  • ‘bottom-up’ algorithm
  • starting at the leaves, each node is replaced with its most popular class; if prediction accuracy (on a validation set) is not reduced, the change is kept
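
A simplified sketch of the idea on a hand-rolled nested-dict tree (not from the card; assumes numpy arrays X_val, y_val for the validation data and that 'majority' stores each node's most popular training class):

    def predict(node, x):
        if "label" in node:                       # leaf
            return node["label"]
        child = "left" if x[node["feature"]] <= node["threshold"] else "right"
        return predict(node[child], x)

    def prune(node, X_val, y_val):
        if "label" in node:                       # leaves cannot be pruned further
            return node
        # bottom-up: prune the children first, routing the validation samples
        go_left = X_val[:, node["feature"]] <= node["threshold"]
        node["left"] = prune(node["left"], X_val[go_left], y_val[go_left])
        node["right"] = prune(node["right"], X_val[~go_left], y_val[~go_left])
        # errors made by the subtree vs. by its most popular class alone
        subtree_errors = sum(predict(node, x) != y for x, y in zip(X_val, y_val))
        leaf_errors = sum(y != node["majority"] for y in y_val)
        if leaf_errors <= subtree_errors:         # accuracy is not reduced
            return {"label": node["majority"]}    # keep the replacement
        return node                               # otherwise keep the subtree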