Lecture 11 - Decision Tree Induction Part 1 Flashcards
What is Induction?
Learning by generalizing from examples or experiences.
Induction can be contrasted with
Deduction
Repeatedly adding pairs of odd numbers and noticing the result is always even is an example of
induction
Constructing a mathematical proof that two odd numbers added will always be even is
deduction
What is a decision tree?
A tree in which each leaf is a decision
Each non-leaf is an attribute
Each branch is a value that the attribute parent can take
What is decision tree induction?
A procedure that uses a training set of classified data to build a decision tree intended to correctly predict the class of previously unseen, unclassified data
What is a training set, in decision tree induction?
Set of classified examples/samples
Very small fraction of population (usually)
What is a classified example/sample in decision tree induction?
Vector of attributes and values, and the class
e.g:
(Skin Covering = Feathers, Beak = Straight, Teeth = None, Class = Heron)
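One way such a classified example might be represented in code — a minimal sketch, assuming a plain dict of attribute–value pairs with the class label kept separately (the attribute names are taken from the heron example above):

```python
# A classified example: attribute-value pairs plus the class label.
example = {
    "Skin Covering": "Feathers",
    "Beak": "Straight",
    "Teeth": "None",
}
example_class = "Heron"
```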
What is the basic decision tree induction procedure?
Function buildDecTree(examples, atts)
  create node N if necessary
  if examples are all in same class, return N labelled with that class
  if atts is empty, return N labelled with modal example class
  bestAtt = chooseBestAtt(examples, atts)
  label N with bestAtt
  for each value ai of bestAtt
    si = subset of examples with bestAtt = ai
    if si is not empty then
      newAtts = atts - bestAtt
      subtree = buildDecTree(si, newAtts)
      attach subtree as child of N
    else
      create leaf node l
      label l with modal example class
      attach l as child of N
  return N
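The pseudocode above can be sketched in Python. This is an illustrative version, not the lecture's definitive implementation: the nested-dict tree representation is an assumption, and chooseBestAtt is passed in as a function (any scoring method, such as information gain, could be plugged in):

```python
from collections import Counter

def modal_class(examples):
    # Most common class label among the examples
    return Counter(cls for _, cls in examples).most_common(1)[0][0]

def build_dec_tree(examples, atts, choose_best_att):
    # examples: list of (attribute_dict, class_label) pairs
    # atts: set of attribute names still available for splitting
    classes = {cls for _, cls in examples}
    if len(classes) == 1:              # all examples in the same class
        return classes.pop()           # leaf labelled with that class
    if not atts:                       # no attributes left to split on
        return modal_class(examples)   # leaf labelled with modal class
    best = choose_best_att(examples, atts)
    node = {best: {}}                  # non-leaf labelled with bestAtt
    # This loop visits only values that occur in the examples, so every
    # subset is non-empty; the pseudocode's "else" branch is only needed
    # when iterating over the attribute's full domain of values.
    for value in {attrs[best] for attrs, _ in examples}:
        subset = [(a, c) for a, c in examples if a[best] == value]
        node[best][value] = build_dec_tree(subset, atts - {best},
                                           choose_best_att)
    return node
```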
What is the “best attribute” in decision tree induction?
The attribute that best discriminates the examples with respect to their classes
What is the standard way of discriminating examples in decision tree induction?
Information gain
What is Shannon’s Information Function?
A function that gives the number of bits of information gained by learning an outcome
Write the formula of Shannon’s Information Function for equiprobable outcomes
Information = log2(N)
where N is the number of possible outcomes
or Information = -log2(p)
where p is the probability of any of the equiprobable outcomes
What is the formula of Shannon’s Information Function for non-equiprobable outcomes?
Information = -Σ pi*log2(pi)
summed over all possible outcomes i, where pi is the probability of outcome i
Information is also sometimes known as ________ or ________
Uncertainty
Entropy
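Shannon's information function (uncertainty/entropy) can be sketched in a few lines of Python — a minimal version, assuming the probabilities sum to 1:

```python
from math import log2

def entropy(probs):
    # Shannon information / uncertainty / entropy in bits:
    # -sum(p * log2(p)); zero-probability outcomes contribute nothing.
    return -sum(p * log2(p) for p in probs if p > 0)

# For N equiprobable outcomes this reduces to log2(N),
# e.g. entropy([1/8] * 8) gives 3.0 bits = log2(8).
```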
With the training set
- | red | blue
class1 | 63 | 7
class2 | 6 | 24
Calculate the information gain from knowing the color
(class1 appears 70 times and class2 30 times out of 100 examples, so the prior class probabilities are 0.70 and 0.30; 69 examples are red and 31 are blue)
uncertainty_nocolor = -0.70*log2(0.70) - 0.30*log2(0.30) = 0.881
uncertainty_red = -(63/69)*log2(63/69) - (6/69)*log2(6/69) = 0.426
uncertainty_blue = -(7/31)*log2(7/31) - (24/31)*log2(24/31) = 0.771
uncertainty_color = 0.69*0.426 + 0.31*0.771 = 0.533
informationgain_color = 0.881 - 0.533 = 0.348
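This calculation can be checked numerically — a quick sketch (note that the base uncertainty uses the class proportions, 70/100 and 30/100, while the weighting of the per-colour uncertainties uses the colour proportions, 69/100 and 31/100):

```python
from math import log2

def entropy(probs):
    # -sum(p * log2(p)) over the outcome probabilities
    return -sum(p * log2(p) for p in probs if p > 0)

# counts:          red   blue   total
# class1            63      7      70
# class2             6     24      30
base    = entropy([70/100, 30/100])      # uncertainty with no colour known
u_red   = entropy([63/69, 6/69])         # uncertainty given red
u_blue  = entropy([7/31, 24/31])         # uncertainty given blue
u_color = 0.69 * u_red + 0.31 * u_blue   # expected uncertainty given colour
gain    = base - u_color                 # information gain from colour
```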