Random Forest Flashcards
Random Forest in 1 Sentence
An ensemble method that trains many decorrelated decision trees on bootstrapped (bagged) samples of the data, considering a random subset of features at each split, and aggregates their predictions.
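A minimal sketch of that idea, assuming scikit-learn (the toy dataset and parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data; any feature matrix X and label vector y would do.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 100 trees, each fit on a bootstrap sample of the rows, with a
# random subset of features considered at every split.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Each tree votes; the forest predicts the majority class.
print(clf.predict(X[:5]))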
Tree Depth
The number of successive splits from a tree's root down to its deepest leaf. You can set a maximum depth to stop trees from subsplitting further, which limits overfitting.
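A small sketch, again assuming scikit-learn, of how max_depth caps tree growth (values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

deep = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)  # no cap
capped = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0).fit(X, y)

# Depth of each component tree: uncapped trees grow until leaves are pure.
print([t.get_depth() for t in deep.estimators_])
print([t.get_depth() for t in capped.estimators_])  # all <= 3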
Minimum Samples per Split
vs
Min Samples per Leaf
Both are ways to prune the tree.
Per Split: The minimum number of samples a node must contain before it is allowed to be split.
Per Leaf: The minimum number of samples that each leaf resulting from a split must contain. Tuning this parameter has the effect of smoothing predictions, since it won't allow (near-)empty leaves; see the sketch below.
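A sketch contrasting the two parameters in scikit-learn (the values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# min_samples_split=20: a node with fewer than 20 samples is never split.
# min_samples_leaf=10: no split may produce a child with fewer than 10
# samples, which rules out (near-)empty leaves and smooths predictions.
clf = RandomForestClassifier(
    n_estimators=100,
    min_samples_split=20,
    min_samples_leaf=10,
    random_state=0,
).fit(X, y)

# The constraints yield fewer, larger leaves per tree than the defaults.
print([t.get_n_leaves() for t in clf.estimators_[:3]])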
Gini Impurity
A method for calculating the purity of a split. It measures the likelihood that a new sample would be classified incorrectly if it were labeled at random according to the distribution of class labels at that node. Candidate splits are compared by the Gini gain each would produce.
https://youtu.be/7VeUPuFGJHk?t=391
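A small worked example of both quantities, assuming binary class labels:

import numpy as np

def gini_impurity(labels):
    # Class probabilities at this node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # Gini = 1 - sum(p_i^2): the chance a random sample is mislabeled
    # when labeled according to the node's class distribution.
    return 1.0 - np.sum(p ** 2)

def gini_gain(parent, left, right):
    # Gain = parent impurity minus the size-weighted child impurity.
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
             + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

parent = [0, 0, 0, 1, 1, 1]
left, right = [0, 0, 0], [1, 1, 1]      # a perfect split
print(gini_impurity(parent))            # 0.5
print(gini_gain(parent, left, right))   # 0.5: impurity drops to 0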
OOB Score
Out of Bag Score
Random Forest’s internal accuracy estimate, computed from predictions on the samples that were left out of each component decision tree's bootstrap sample. Roughly equivalent to test-set accuracy.
E.g., DT1 is trained on a bootstrap sample covering roughly 2/3 of the data; DT1 then predicts the remaining ~1/3 it never saw. The accuracy of those predictions, aggregated across all trees, is the OOB score.
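In scikit-learn this is a single flag; a sketch (the dataset is illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True scores each sample using only the trees whose
# bootstrap sample left that sample out.
clf = RandomForestClassifier(n_estimators=200, oob_score=True,
                             random_state=0).fit(X, y)
print(clf.oob_score_)  # accuracy estimate without a held-out set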
Maximum Features (RF)
The number of features considered as split candidates at each node. Typically sqrt(n_features) or log2(n_features); keeping it well below n_features helps decorrelate the trees.
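A sketch of the scikit-learn spellings (the dataset is illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16, random_state=0)

# "sqrt" and "log2" are built-in shorthands; with 16 features both
# consider 4 candidates per split. An int or float gives an exact
# count or fraction of features instead.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                             random_state=0).fit(X, y)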