Decision Trees Flashcards
Information Gain
Entropy of a distribution, in bits
entropy(p1, p2, ..., pn) = -p1*log2(p1) - p2*log2(p2) - ... - pn*log2(pn)
gain(A) = info(D) - info_A(D)
info(D) = information (entropy) before the split
info_A(D) = information after splitting on A = |D1|/|D| * info(D1) + |D2|/|D| * info(D2) + ...
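A minimal Python sketch of these formulas (the function names and the rows/labels data layout are assumptions of the sketch, not from the source):

import math
from collections import Counter

def entropy(labels):
    # Entropy in bits: -sum(p * log2(p)) over the class frequencies p.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    # gain(A) = info(D) - info_A(D): entropy before the split minus the
    # weighted entropy of the partitions induced by attribute A.
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    info_a = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - info_a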
Highly Branching Attributes
Problematic: IDs, primary keys, ...
Each leaf node becomes pure,
which leads to overfitting.
Information Gain Ratio
Intrinsic information is large when the data is spread evenly over the branches
and small when all the data belongs to one branch.
IntrinsicInfo(S, A) = -sum(|Si|/|S| * log2(|Si|/|S|))
S = all instances; Si = the instances with the i-th value of attribute A
GainRatio(S, A) = Gain(S, A) / IntrinsicInfo(S, A)
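A sketch of the gain ratio built on the helpers above (entropy and information_gain are reused from the earlier sketch; the guard for single-valued attributes is an assumption of this sketch):

def intrinsic_info(rows, attribute):
    # IntrinsicInfo(S, A) = -sum(|Si|/|S| * log2(|Si|/|S|)) over the values of A.
    total = len(rows)
    counts = Counter(row[attribute] for row in rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(rows, labels, attribute):
    # GainRatio(S, A) = Gain(S, A) / IntrinsicInfo(S, A).
    ii = intrinsic_info(rows, attribute)
    if ii == 0:
        return 0.0  # attribute has a single value, so the ratio is undefined
    return information_gain(rows, labels, attribute) / ii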
C4.5
Numerical Attributes
Sort the values together with their class labels.
Evaluate candidate cut points and choose the one with the best information gain.
Sorting costs O(n log n), but a child node can inherit the sort order from its parent in O(n).
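One way to implement the threshold search, reusing the entropy helper from above; testing every midpoint between consecutive distinct values is a common convention and an assumption here, not necessarily the exact C4.5 procedure:

def best_numeric_split(values, labels):
    # Sort value/label pairs once (O(n log n)); each midpoint between two
    # consecutive distinct values is a candidate cut point.
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([label for _, label in pairs])
    best_gain, best_threshold = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point inside a run of identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain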
Gini Index
Binary splits
gini(D) = 1 - sum(pj^2), where pj is the relative frequency of class j in D.
If D is split on attribute A into D1 and D2:
gini_A(D) = |D1|/|D| * gini(D1) + |D2|/|D| * gini(D2)
delta_gini(A) = gini(D) - gini_A(D)
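A small sketch of the Gini computations for a binary split (function names are assumptions of the sketch):

def gini(labels):
    # gini(D) = 1 - sum(pj^2) over the relative class frequencies pj.
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_reduction(labels, left_labels, right_labels):
    # delta_gini(A) = gini(D) - gini_A(D) for a binary split of D into D1 and D2.
    total = len(labels)
    gini_a = (len(left_labels) / total) * gini(left_labels) \
             + (len(right_labels) / total) * gini(right_labels)
    return gini(labels) - gini_a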
Overfitting in Trees
Prepruning.
Do not split if the goodness measure falls below a threshold.
Statistical significance test:
stop when there is no statistically significant association between any attribute and the class.
Postpruning.
Remove subtrees from the fully grown tree.
Use data different from the training data to decide on the best pruned tree.
Subtree raising, subtree replacement.
Error on the training data is not a useful estimator; use a holdout set instead.
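A sketch of postpruning by subtree replacement against a holdout set; the Node structure and helper names are assumptions of this sketch, not a specific library API:

from dataclasses import dataclass, field

@dataclass
class Node:
    # Internal node: `attribute` is the tested attribute index and `children`
    # maps attribute values to subtrees. A leaf has no children and returns
    # `prediction`. `majority` is the majority class of the training examples
    # that reached this node.
    attribute: int = -1
    children: dict = field(default_factory=dict)
    prediction: object = None
    majority: object = None

def predict(node, row):
    if not node.children:
        return node.prediction
    child = node.children.get(row[node.attribute])
    return predict(child, row) if child is not None else node.majority

def holdout_errors(node, rows, labels):
    return sum(predict(node, r) != y for r, y in zip(rows, labels))

def prune(node, rows, labels):
    # Subtree replacement, bottom-up: prune the children first, then replace
    # this subtree with a majority-class leaf if that does not increase the
    # error on the holdout (pruning) set.
    if not node.children:
        return node
    for value, child in list(node.children.items()):
        subset = [(r, y) for r, y in zip(rows, labels) if r[node.attribute] == value]
        if subset:
            sub_rows, sub_labels = map(list, zip(*subset))
            node.children[value] = prune(child, sub_rows, sub_labels)
    leaf = Node(prediction=node.majority, majority=node.majority)
    if holdout_errors(leaf, rows, labels) <= holdout_errors(node, rows, labels):
        return leaf
    return node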
Regression and Model Trees
Regression trees: the prediction is computed as the average of the numerical target variable in the leaf.
Model trees: leaves can contain a linear model to predict the target value.
Impurity measure? Standard deviation reduction: SDR = std(D) - sum(|Di|/|D| * std(Di))
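A sketch of standard deviation reduction, assuming the partitions produced by the split are already available (population standard deviation is used here; the exact variant may differ):

import statistics

def sdr(targets, partitions):
    # SDR = std(D) - sum(|Di|/|D| * std(Di)): how much a split reduces the
    # spread of the numerical target variable.
    total = len(targets)
    weighted = sum(len(p) / total * statistics.pstdev(p) for p in partitions)
    return statistics.pstdev(targets) - weighted

For example, sdr([1, 2, 3, 10, 11, 12], [[1, 2, 3], [10, 11, 12]]) is large, because each partition has far less spread than the whole set.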
Decision stumps
One-level decision trees
Categorical: one branch for each attribute value.
One branch for one value, one branch for all others.
Numerical:
Two leaves defined by a threshold.
Multiple splits
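A sketch of a numeric decision stump that reuses best_numeric_split from the C4.5 section; the majority-class prediction at each leaf, and the assumption that the attribute has at least two distinct values, are simplifications of this sketch:

def fit_numeric_stump(values, labels):
    # One-level tree over a numeric attribute: pick the best threshold, then
    # predict the majority class on each side of it.
    threshold, _ = best_numeric_split(values, labels)
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    left_class = Counter(left).most_common(1)[0][0]
    right_class = Counter(right).most_common(1)[0][0]
    return lambda x: left_class if x <= threshold else right_class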