Decision Trees and Random Forest Flashcards
1
Q
Motivation for Decision Trees
A
- goal of a classifier: accurate predictions; it is sufficient to know that a test point lies in a region where all neighbors belong to a particular class, the exact identities of those neighbors are irrelevant
- Decision Trees:
  - build a tree structure that recursively divides the space into regions with similar labels
  - very fast inference
  - no distance metric needed
  - weak learners on their own, but their bias/variance problems can be addressed:
    - variance: bagging
    - bias: boosting
2
Q
Decision Trees
A
- which feature goes on top (root)?
  - compute the Gini impurity of each candidate split and the total Gini impurity (weighted average of the leaf impurities)
  - choose the feature with the lowest total Gini impurity as the root
  - repeat the same procedure for the next leaves
  - only add a node if it lowers the Gini impurity
- numeric features (see the sketch below):
  - sort the values from lowest to highest
  - compute the average of each pair of adjacent rows as a candidate threshold
  - compute the total Gini impurity for each candidate threshold
  - take the threshold with the lowest impurity
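A rough sketch of the numeric-split procedure above, assuming NumPy arrays and a single numeric feature; the function names are illustrative, not from a particular library:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_numeric_split(x, y):
    """Sort the feature, use midpoints of adjacent values as candidate thresholds,
    and return the one with the lowest total (weighted) Gini impurity."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_thr, best_total = None, float("inf")
    for i in range(len(x_sorted) - 1):
        thr = (x_sorted[i] + x_sorted[i + 1]) / 2.0   # average of adjacent rows
        left = y_sorted[x_sorted <= thr]
        right = y_sorted[x_sorted > thr]
        total = (len(left) * gini(left) + len(right) * gini(right)) / len(y_sorted)
        if total < best_total:
            best_thr, best_total = thr, total
    return best_thr, best_total

# e.g. best_numeric_split(np.array([1.0, 2.5, 3.0, 7.0]), np.array([0, 0, 1, 1]))
# -> (2.75, 0.0): the threshold 2.75 separates the two classes perfectly
```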
3
Q
Random Forest
A
- bootstrap the data set (randomly sample with replacement until it reaches the size of the original data set)
- build a decision tree from it (at each split, only consider a random subset of the features)
- repeat
- classify an example: run it through all trees and choose the label assigned most often (majority vote)
- testing a random forest (see the sketch below):
  - about 1/3 of the data does not appear in a bootstrap sample
  - these rows form the 'out-of-bag' data set
  - run them through the forest to estimate the 'out-of-bag error'
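A hedged sketch of the bootstrap / feature-subset / majority-vote / out-of-bag procedure, built on scikit-learn's DecisionTreeClassifier (max_features="sqrt" gives the random feature subset per split); integer class labels are assumed and all other names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    trees, oob_votes = [], [[] for _ in range(n)]
    for _ in range(n_trees):
        # bootstrap: draw n rows with replacement
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier(max_features="sqrt")  # random feature subset at each split
        tree.fit(X[idx], y[idx])
        trees.append(tree)
        # roughly 1/3 of the rows never appear in idx -> out-of-bag rows for this tree
        oob = np.setdiff1d(np.arange(n), idx)
        if len(oob):
            for i, pred in zip(oob, tree.predict(X[oob])):
                oob_votes[i].append(pred)
    # out-of-bag error: for each row, majority vote of the trees that never saw it
    oob_pred = np.array([max(set(v), key=v.count) for v in oob_votes if v])
    oob_true = np.array([y[i] for i, v in enumerate(oob_votes) if v])
    oob_error = np.mean(oob_pred != oob_true)
    return trees, oob_error

def forest_predict(trees, X):
    # classify an example: run it through every tree and take the most frequent label
    votes = np.array([t.predict(X) for t in trees])                   # shape (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])   # integer labels assumed
```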
4
Q
Bagging
A
Bootstrap aggregating: fit many models, each on a bootstrap sample of the training data, and aggregate their predictions (majority vote for classification, average for regression); mainly reduces variance
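Bagging is available off the shelf; a minimal scikit-learn sketch (in releases before 1.2 the parameter is named base_estimator rather than estimator):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 50 trees, each fit on its own bootstrap sample; predictions are aggregated by majority vote
bagged_trees = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)
# bagged_trees.fit(X_train, y_train); bagged_trees.predict(X_test)
```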
5
Q
Random Forest Algorithm
A
6
Q
Random Forest
Differences to Individual Tree
A
7
Q
Boosting
A
- draw a sample of observations from the data (with replacement)
- observations are not sampled uniformly: observations with higher weight are more likely to be chosen
- weight each training example by how badly it was misclassified in the previous round (see the sketch below)
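One common implementation of this reweighting is AdaBoost; the sketch below passes the example weights straight to the learner instead of resampling, and assumes binary labels coded as -1/+1 (all names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """AdaBoost with decision stumps; y must contain -1/+1 labels."""
    n = len(X)
    w = np.full(n, 1.0 / n)                              # start with uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)                 # weighted fit instead of weighted resampling
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)           # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # how much say this stump gets
        w *= np.exp(-alpha * y * pred)                   # raise weights of misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def boosted_predict(stumps, alphas, X):
    # weighted vote of all stumps
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```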