Random Forest Flashcards

1
Q

Random Forest in 1 Sentence

A

An ensemble method that uses random feature subsets and bootstrap-sampled (bagged) data to build many decorrelated decision trees, whose predictions are aggregated by majority vote (classification) or averaging (regression).
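
A minimal sketch in Python, assuming scikit-learn is available; the synthetic dataset and settings are illustrative, not part of the card:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each of the 100 trees is fit on a bootstrap sample and considers a random
# feature subset at every split; class predictions are majority-voted.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))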

2
Q

Tree Depth

A

The number of successive splits from the root down to the deepest leaf. You can set a maximum depth to limit overfitting.
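
A brief illustration using scikit-learn's max_depth parameter; the depth values are arbitrary examples:

from sklearn.ensemble import RandomForestClassifier

shallow = RandomForestClassifier(max_depth=3)  # at most 3 levels of splits per tree
deep = RandomForestClassifier(max_depth=None)  # default: grow until leaves are pure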

3
Q

Minimum Samples per Split
vs
Min Samples per Leaf

A

Both are ways to prune the tree.

Per Split: the minimum number of samples a node must contain before it is allowed to split.

Per Leaf: the minimum number of samples each leaf resulting from a split must contain. Tuning this parameter has the effect of smoothing predictions, since it won't produce (near-)empty leaves.
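
A sketch of the two knobs using scikit-learn's parameter names (min_samples_split, min_samples_leaf); the values are arbitrary examples:

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    min_samples_split=10,  # a node needs >= 10 samples before it may split
    min_samples_leaf=5,    # every leaf produced by a split must keep >= 5 samples
)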

4
Q

Gini Impurity

A

A method to calculate the purity of a split. It measures the likelihood that a new sample would be misclassified if it were randomly labeled according to the distribution of class labels at the node. Candidate splits are compared by their Gini gain: the reduction in impurity from the parent node to its children.

https://youtu.be/7VeUPuFGJHk?t=391
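
A small from-scratch sketch of the definition in Python; the function names gini and gini_gain are illustrative, not from any library:

from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(parent, children):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

parent = ["a", "a", "b", "b"]
print(gini(parent))                                 # 0.5
print(gini_gain(parent, [["a", "a"], ["b", "b"]]))  # 0.5 (a perfect split)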

5
Q

OOB Score

Out of Bag Score

A

Random Forest’s internal accuracy estimate, computed on predictions for the samples left out of each component decision tree’s bootstrap sample. Roughly equivalent to accuracy on held-out data.

EG: DT1 is trained on a bootstrap sample covering roughly 2/3 of the data; DT1 then predicts the remaining ~1/3 (its out-of-bag samples). Aggregating these out-of-bag predictions across all trees and scoring them gives the OOB score.
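
A short sketch of the built-in OOB estimate in scikit-learn (oob_score and oob_score_ are real scikit-learn names; the data is synthetic):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# oob_score=True asks the forest to score each sample using only the
# trees whose bootstrap sample never contained it.
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X, y)
print(clf.oob_score_)  # accuracy on out-of-bag predictions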

6
Q

Maximum Features (RF)

A

The number of features to consider when searching for the best split. Typically sqrt(n_features) for classification or log2(n_features).
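
A brief illustration of the corresponding scikit-learn settings; the specific values are examples, not recommendations:

from sklearn.ensemble import RandomForestClassifier

RandomForestClassifier(max_features="sqrt")  # sqrt(n_features), the classification default
RandomForestClassifier(max_features="log2")  # log2(n_features)
RandomForestClassifier(max_features=0.5)     # a fixed fraction of the features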
