Decision Trees and Random Forest Flashcards
1
Q
Motivation for Decision Trees
A
- goal of a classifier: accurate predictions; it is sufficient to know that a test point lies in a region where all neighbors belong to a particular class, the exact identities of those neighbors are irrelevant
- Decision Trees:
  - build a tree structure that recursively divides the space into regions with similar labels
  - very fast inference
  - no distance metric needed
  - weak learners on their own, but their bias/variance problems can be addressed:
    - variance: bagging
    - bias: boosting
2
Q
Decision Trees
A
- which feature goes on top (root)?
  - compute the Gini impurity of each candidate split and the total Gini impurity (weighted average of the leaf impurities)
  - choose the feature with the lowest total Gini impurity as the root
  - repeat the same procedure for the next leaves
  - only add a node if it lowers the Gini impurity
- numeric features (see the sketch below):
  - sort the values from lowest to highest
  - compute the average of each pair of adjacent rows as a candidate threshold
  - compute the total Gini impurity for each candidate threshold
  - take the threshold with the lowest impurity
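A rough sketch of the numeric-split procedure above, assuming NumPy arrays and a single numeric feature; the function names are illustrative, not from a particular library:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_numeric_split(x, y):
    """Sort the feature, use midpoints of adjacent values as candidate thresholds,
    and return the one with the lowest total (weighted) Gini impurity."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_thr, best_total = None, float("inf")
    for i in range(len(x_sorted) - 1):
        thr = (x_sorted[i] + x_sorted[i + 1]) / 2.0   # average of adjacent rows
        left = y_sorted[x_sorted <= thr]
        right = y_sorted[x_sorted > thr]
        total = (len(left) * gini(left) + len(right) * gini(right)) / len(y_sorted)
        if total < best_total:
            best_thr, best_total = thr, total
    return best_thr, best_total

# e.g. best_numeric_split(np.array([1.0, 2.5, 3.0, 7.0]), np.array([0, 0, 1, 1]))
# -> (2.75, 0.0): the threshold 2.75 separates the two classes perfectly
```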
3
Q
Random Forest
A
- bootstrap the data set (randomly sample with replacement until it reaches the size of the original data set)
- build a decision tree from it (at each split, only consider a random subset of the features)
- repeat
- classify an example: run it through all trees and choose the label assigned most often (majority vote)
- testing a random forest (see the sketch below):
  - about 1/3 of the data does not appear in a bootstrap sample
  - these rows form the 'out-of-bag' data set
  - run them through the forest to estimate the 'out-of-bag error'
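A hedged sketch of the bootstrap / feature-subset / majority-vote / out-of-bag procedure, built on scikit-learn's DecisionTreeClassifier (max_features="sqrt" gives the random feature subset per split); integer class labels are assumed and all other names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_random_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    trees, oob_votes = [], [[] for _ in range(n)]
    for _ in range(n_trees):
        # bootstrap: draw n rows with replacement
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier(max_features="sqrt")  # random feature subset at each split
        tree.fit(X[idx], y[idx])
        trees.append(tree)
        # roughly 1/3 of the rows never appear in idx -> out-of-bag rows for this tree
        oob = np.setdiff1d(np.arange(n), idx)
        if len(oob):
            for i, pred in zip(oob, tree.predict(X[oob])):
                oob_votes[i].append(pred)
    # out-of-bag error: for each row, majority vote of the trees that never saw it
    oob_pred = np.array([max(set(v), key=v.count) for v in oob_votes if v])
    oob_true = np.array([y[i] for i, v in enumerate(oob_votes) if v])
    oob_error = np.mean(oob_pred != oob_true)
    return trees, oob_error

def forest_predict(trees, X):
    # classify an example: run it through every tree and take the most frequent label
    votes = np.array([t.predict(X) for t in trees])                   # shape (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])   # integer labels assumed
```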
4
Q
Bagging
A
Bootstrap aggregating: fit many models, each on a bootstrap sample of the training data, and aggregate their predictions (majority vote for classification, average for regression); mainly reduces variance
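Bagging is available off the shelf; a minimal scikit-learn sketch (in releases before 1.2 the parameter is named base_estimator rather than estimator):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 50 trees, each fit on its own bootstrap sample; predictions are aggregated by majority vote
bagged_trees = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)
# bagged_trees.fit(X_train, y_train); bagged_trees.predict(X_test)
```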
5
Q
Random Forest Algorithm
A
6
Q
Random Forest
Differences to Individual Tree
A
7
Q
Boosting
A
- draw a sample of observations from the data (with replacement)
- observations are not sampled uniformly: observations with higher weight are more likely to be chosen
- weight each training example by how badly it was misclassified in the previous round (see the sketch below)
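One common implementation of this reweighting is AdaBoost; the sketch below passes the example weights straight to the learner instead of resampling, and assumes binary labels coded as -1/+1 (all names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """AdaBoost with decision stumps; y must contain -1/+1 labels."""
    n = len(X)
    w = np.full(n, 1.0 / n)                              # start with uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)                 # weighted fit instead of weighted resampling
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)           # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # how much say this stump gets
        w *= np.exp(-alpha * y * pred)                   # raise weights of misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def boosted_predict(stumps, alphas, X):
    # weighted vote of all stumps
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```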