Decision Trees Flashcards
Things that make decision trees unique
supervised learning
batch processing of training examples
uses a preference bias
Decision tree non-leaf node
associated with an attribute/feature
Decision tree leaf node
associated with a classification
Decision tree arc
associated with one of the possible values of the parent node's attribute
How does a decision tree work?
The attribute at the root poses a question; the answer is the value of that attribute in the input example; the answer determines which child to move to; repeat until a leaf is reached (the class label at the leaf is the classification given to the input example)
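A minimal sketch of this walk in Python, under assumed conventions (not from the source): a Node holds either an attribute plus one child per attribute value, or a class label, and an example is a dict mapping attribute names to values.

    class Node:
        """Internal nodes carry an attribute and one child per value (the arcs);
        leaf nodes carry only a class label."""
        def __init__(self, attribute=None, children=None, label=None):
            self.attribute = attribute      # question asked at this node
            self.children = children or {}  # one arc per possible value
            self.label = label              # classification (leaves only)

    def classify(node, example):
        # Repeat the question/answer step until a leaf is reached.
        while node.label is None:
            value = example[node.attribute]  # answer from the input example
            node = node.children[value]      # follow the matching arc
        return node.label                    # leaf's label = classification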
Ockham’s Razor
Preference bias; the smallest decision tree that correctly classifies all training examples is best
Decision Tree Construction (greedy) aka ID3, C5.0
- select the best attribute to use for a new node at the current level
- partition the examples using the possible values of this attribute, assign each subset to the appropriate child node, and recursively generate child nodes until all examples at a node have the same label (see the sketch after this list)
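A sketch of this greedy recursion, reusing the hypothetical Node class above; it assumes attributes is a set of attribute names, examples are (attribute-dict, label) pairs, and a best_attribute chooser exists (one candidate, max gain, is sketched under the next card). Names are illustrative, not ID3's actual code.

    from collections import Counter

    def build_tree(examples, attributes):
        labels = [lab for _, lab in examples]
        if len(set(labels)) == 1 or not attributes:
            # Pure subset (or no attributes left): make a leaf,
            # breaking any ties by majority label.
            return Node(label=Counter(labels).most_common(1)[0][0])
        attr = best_attribute(examples, attributes)      # greedy choice
        children = {}
        for value in {ex[attr] for ex, _ in examples}:   # partition by value
            subset = [(ex, lab) for ex, lab in examples if ex[attr] == value]
            children[value] = build_tree(subset, attributes - {attr})
        return Node(attribute=attr, children=children)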
How to select the best attribute to construct the best tree?
Candidate heuristics: random choice; the attribute with the fewest values; the attribute with the most values; the attribute with the maximum information gain
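A hedged sketch of the max-gain heuristic only: choose the attribute whose split most reduces entropy (H, defined on the next cards). Function names here are my own, not from the source.

    import math
    from collections import Counter

    def entropy(examples):
        # H(Y) = -sum_i p_i * log2(p_i) over the class label distribution.
        total = len(examples)
        counts = Counter(label for _, label in examples)
        return -sum(n / total * math.log2(n / total) for n in counts.values())

    def information_gain(examples, attr):
        # Entropy before the split minus the size-weighted entropy after it.
        remainder = 0.0
        for value in {ex[attr] for ex, _ in examples}:
            subset = [(ex, lab) for ex, lab in examples if ex[attr] == value]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(examples) - remainder

    def best_attribute(examples, attributes):
        # "max gain": the attribute with the largest information gain wins.
        return max(attributes, key=lambda a: information_gain(examples, a))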
Information Value
Given a set S of size |S|, the expected work required to determine a specific element is log2 |S| (e.g., picking out one of 8 equally likely elements takes log2 8 = 3 yes/no questions)
Entropy Interpretation
The number of yes/no questions (in bits) needed on average to determine the value of Y in a random drawing
Entropy H(Y)
H measures the information content (in bits) associated with a set of examples: H(Y) = -sum over classes i of P(Y = y_i) * log2 P(Y = y_i); its minimum value is 0
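A quick numeric check using the entropy helper sketched earlier; with two classes the maximum is log2 2 = 1 bit, which previews the next cards on balance and homogeneity.

    pure  = [({}, "yes")] * 8                     # perfect homogeneity
    mixed = [({}, "yes")] * 4 + [({}, "no")] * 4  # perfect balance
    print(entropy(pure))   # 0.0 bits: no uncertainty (may print -0.0)
    print(entropy(mixed))  # 1.0 bit = log2(2), the maximum for 2 classes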
Bit
the information needed to answer a yes/no question; a real-valued scalar
Perfect balance (maximum inhomogeneity)
high entropy; Y is drawn from a (nearly) uniform distribution
Perfect homogeneity
low entropy; Y is drawn from a highly non-uniform distribution (sharp peaks and valleys)
Max value of H
log2 c, where c is the number of classes (e.g., 2 bits for c = 4 equally likely classes)