4-Decision Trees Flashcards
How easy is it to construct optimal decision trees?
Construction of decision trees is NP-Hard
What is information gain?
The reduction in entropy when the data at a node is partitioned by an attribute A: compare the entropy before the split with the mean information of the resulting children
IG(Attr | Node) = H(Node) - MeanInfo(Attr | Node)
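A minimal sketch of the calculation above in Python (the function name and the count-list representation are my own illustration, not from any particular library):

```python
import math


def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def info_gain(parent_counts, child_counts):
    """IG(Attr | Node) = H(Node) - weighted average entropy of the children.

    parent_counts: class counts at the node, e.g. [4, 4].
    child_counts: class counts per child after splitting on the attribute.
    """
    total = sum(parent_counts)
    h_parent = entropy([c / total for c in parent_counts])
    mean_info = sum(
        (sum(c) / total) * entropy([n / sum(c) for n in c])
        for c in child_counts
    )
    return h_parent - mean_info


# Splitting [4, 4] into a pure child [4, 0] and a mixed child [2, 2]:
print(info_gain([4, 4], [[4, 0], [2, 2]]))  # 1.0 - 0.5 = 0.5
```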
What is entropy?
The expected (average) level of surprise or uncertainty. It is a measure of unpredictability.
Given a probability distribution, the entropy is the information required to predict an event
What is the level of unpredictability of a singular event?
Level of unpredictability for a single event is self-information.
Self-info(i) = log2(1/P(i)) = -log2(P(i))
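The formula above translates directly into code (a tiny illustrative helper, assuming probabilities in bits via log base 2):

```python
import math


def self_info(p):
    """Self-information of an event with probability p, in bits."""
    return -math.log2(p)


# A rarer event carries more surprise:
print(self_info(0.5))    # 1.0 bit  (a fair coin flip)
print(self_info(0.125))  # 3.0 bits (a 1-in-8 event)
```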
How is entropy calculated?
H(X) = sum from i = 1 to N of (P(x_i) * self_info(x_i))
H(X) = sum from i = 1 to N of (-P(x_i) * log2(P(x_i)))
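The same sum as a small Python sketch (the zero-probability guard is a standard convention, since 0 * log2(0) is taken as 0):

```python
import math


def entropy(probs):
    """H(X) = -sum P(x_i) * log2(P(x_i)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))  # 1.0: a fair coin is maximally unpredictable
print(entropy([0.9, 0.1]))  # ~0.469: a skewed distribution is more predictable
```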
How does entropy inform decision trees?
If entropy is low, the outcome is more predictable. This occurs when a single class has a high probability, so splits that produce low-entropy (homogeneous) children are preferred
What is mean information?
The weighted average (by probability) of the entropy of child nodes
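A sketch of that weighted average, assuming each child node is represented by its list of class counts (my own representation for illustration):

```python
import math


def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_info(children):
    """Entropy of each child, weighted by the fraction of instances it holds.

    children: one list of class counts per child node.
    """
    total = sum(sum(c) for c in children)
    return sum(
        (sum(c) / total) * entropy([n / sum(c) for n in c])
        for c in children
    )


# One pure child [4, 0] (entropy 0) and one mixed child [2, 2] (entropy 1),
# each holding half the instances:
print(mean_info([[4, 0], [2, 2]]))  # (4/8)*0 + (4/8)*1 = 0.5
```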
What are the cons of information gain?
It prefers highly branching attributes, as they are more likely to be homogeneous
What is gain ratio?
Gain ratio reduces the bias of information gain by normalising against the split info
GR(Attr | Node) = IG(Attr | Node) / SI(Attr | Node)
It discourages the selection of attributes with many uniformly distributed values
What is split info?
Split info is the entropy of a given split.
SI(Attr | Node) = H(Attr) = sum from i = 1 to N of (-P(x_i) * log2(P(x_i))), where P(x_i) is the fraction of instances sent to branch i
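Putting split info and gain ratio together as a Python sketch (same illustrative class-count representation as assumed above; not a specific library's API):

```python
import math


def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gain_ratio(parent_counts, child_counts):
    """GR(Attr | Node) = IG(Attr | Node) / SI(Attr | Node).

    SI is the entropy of the split proportions themselves, so attributes
    that fragment the data into many small branches are penalised.
    """
    total = sum(parent_counts)
    h_parent = entropy([c / total for c in parent_counts])
    mean_info = sum(
        (sum(c) / total) * entropy([n / sum(c) for n in c])
        for c in child_counts
    )
    ig = h_parent - mean_info
    si = entropy([sum(c) / total for c in child_counts])
    return ig / si


# Split [4, 4] into [4, 0] and [2, 2]: IG = 0.5, SI = 1.0, so GR = 0.5
print(gain_ratio([4, 4], [[4, 0], [2, 2]]))
```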
When does ID3 stop?
ID3 stops when all instances at a node are of the same class, since the information gain of any further split is then 0. It can be modified to stop early once a threshold is reached
What are the characteristics of ID3?
It searches a space of hypotheses for one that fits the training data
The space searched is the set of possible decision trees
Performs a greedy simple-to-complex search through the hypothesis space with no backtracking
What are the pros of decision trees?
Strong, basic, supervised learner
Fast to train and fast to classify
Very transparent
What are the cons of decision trees?
Prone to overfitting
Loss of information for continuous variables
Complex calculation if there are many variables
No guarantee to return global optimum
What are alternatives to ID3?
Oblivious trees - require the same attribute at every node in a layer
Random tree - uses a random set of attributes