Lecture 9 - Decision Trees Flashcards
1
Q
Decision Tree
A
- Data-driven method
- Popular classification technique
Reasons
- Performs well across a wide range of situations
- Does not require much effort from the analyst
- Easily understandable by consumers
- At least when the trees are not too large
- Can be used for both:
- Classification, called classification trees
- Prediction, called regression trees
2
Q
Example
A
3
Q
Nodes
A
- The condition in each node gives the splitting value on a predictor
- The number inside a node gives the number of records after the split
- The bracket gives the number of records per class: [not acceptor, acceptor]
- The leaf nodes, called terminal nodes, are coloured to indicate a non-acceptor (orange) or acceptor (blue)
4
Q
Trees are easily translated into a set of rules
A
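As one hedged illustration (using scikit-learn and a stand-in dataset rather than the lecture's loan data), export_text prints a fitted tree as nested if/then conditions; each root-to-leaf path reads as one rule:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree on a stand-in dataset (iris, not the lecture's loan data)
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each root-to-leaf path in the printout reads as one IF ... THEN ... rule
print(export_text(tree, feature_names=["sepal_len", "sepal_wid",
                                       "petal_len", "petal_wid"]))
```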
5
Q
Induction (with a Greedy Strategy)
A
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training instances are at the root
- Instances from the training set are then partitioned recursively based on selected attributes (sketched below)
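A minimal sketch of this recursion in Python, assuming dict-encoded instances and a simple weighted misclassification error as the "best split" criterion (the criterion used in the lecture, information gain, appears on card 10):

```python
from collections import Counter

def split_error(instances, labels, attr):
    # Weighted misclassification error of the children produced by a
    # multi-way split on attr (one simple impurity criterion)
    err = 0.0
    for v in set(i[attr] for i in instances):
        child = [l for i, l in zip(instances, labels) if i[attr] == v]
        err += (len(child) - Counter(child).most_common(1)[0][1]) / len(labels)
    return err

def grow_tree(instances, labels, attributes):
    # Stop when the node is pure or no attributes remain:
    # return a majority-class leaf
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the locally best attribute and never backtrack
    attr = min(attributes, key=lambda a: split_error(instances, labels, a))
    children = {}
    for v in set(i[attr] for i in instances):
        pairs = [(i, l) for i, l in zip(instances, labels) if i[attr] == v]
        sub_inst, sub_lab = zip(*pairs)
        # Divide and conquer: recurse on each partition
        children[v] = grow_tree(list(sub_inst), list(sub_lab),
                                [a for a in attributes if a != attr])
    return {attr: children}
```

For example, `grow_tree([{"outlook": "sunny"}, {"outlook": "rain"}], ["no", "yes"], ["outlook"])` returns `{"outlook": {"sunny": "no", "rain": "yes"}}`.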
6
Q
Issues with Induction (with a Greedy Strategy)
A
- Determine how to split the records
- How to specify the attribute test condition?
- How to determine the best split?
- Determine when to stop splitting
Specifying the Test Condition
- Depends on attribute type:
- Nominal
- Ordinal
- Continuous
- Depends on number of ways to split:
- Binary split, i.e., 2-way
- Multi-way split
7
Q
Splitting based on Nominal Attributes
A
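A small illustration of the two options for a nominal attribute (the attribute values below are made up): a multi-way split uses one partition per distinct value, while a binary split groups the values into two non-empty subsets.

```python
from itertools import combinations

values = ["single", "married", "divorced"]   # made-up nominal attribute

# Multi-way split: one partition per distinct value
print([[v] for v in values])                 # [['single'], ['married'], ['divorced']]

# Binary (2-way) split: the values are grouped into two non-empty
# subsets; a k-valued attribute admits 2**(k-1) - 1 distinct groupings
seen = set()
for r in range(1, len(values)):
    for left in combinations(values, r):
        right = tuple(v for v in values if v not in left)
        if right not in seen:                # skip mirror-image duplicates
            seen.add(left)
            print(list(left), "vs", list(right))
```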
8
Q
Splitting based on Continuous Attributes
A
- Discretization to form an ordinal categorical attribute (a multi-way split into ranges)
- Binary decision: (A < v) or (A ≥ v), considering all possible split points v
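A minimal sketch of the binary-decision variant, assuming Gini as the impurity measure (card 12) and midpoints between consecutive sorted values as candidate cut points; the data below are made up:

```python
def best_threshold(xs, ys):
    """Scan candidate cut points v for a binary test (x < v): a sketch."""
    def gini(labels):
        n = len(labels)
        return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    pairs = sorted(zip(xs, ys))
    best_v, best_imp = None, float("inf")
    # Candidate thresholds: midpoints between consecutive distinct values
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        v = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < v]
        right = [y for x, y in pairs if x >= v]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if imp < best_imp:
            best_v, best_imp = v, imp
    return best_v, best_imp

# Made-up data: income (in $1000s) vs. loan acceptance
print(best_threshold([20, 35, 50, 80, 95], ["no", "no", "yes", "yes", "yes"]))
# -> (42.5, 0.0): a perfect separation of the two classes
```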
9
Q
Determining the Best Split
A
10
Q
Information gain
A
- Used to determine which feature/attribute provides the maximum information about a class
- Records are split based on an attribute test that optimises a certain criterion
- Need a measure of node impurity, e.g., Gini Index, Entropy, etc.
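A minimal sketch of the computation, assuming entropy as the impurity measure (the names and data are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Impurity of a class distribution (see card 13)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    # Gain = parent impurity minus the size-weighted impurity of the
    # children produced by the split (the combined impurity of card 14)
    n = len(parent)
    combined = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - combined

# Made-up example: the split isolates one class in the second child
parent = ["acc", "acc", "acc", "non", "non", "non"]
print(information_gain(parent, [["acc", "acc", "acc", "non"], ["non", "non"]]))
# -> about 0.459 bits
```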
11
Q
Information gain (visual)
A
12
Q
Gini Index
A
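In the usual formulation, the Gini index of a node $t$ is

$$\mathrm{GINI}(t) = 1 - \sum_{j} \big[p(j \mid t)\big]^{2}$$

where $p(j \mid t)$ is the relative frequency of class $j$ at node $t$.
- Minimum 0 when all records belong to one class (purest node)
- Maximum $1 - 1/n_c$ when records are equally distributed across the $n_c$ classes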
13
Q
Entropy measure
A
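In the usual formulation, the entropy of a node $t$ is

$$\mathrm{Entropy}(t) = -\sum_{j} p(j \mid t)\,\log_{2} p(j \mid t)$$

- Minimum 0 when all records belong to one class
- Maximum $\log_{2} n_c$ when records are equally distributed across the $n_c$ classes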
14
Q
Combined impurity
A
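When a parent node with $n$ records is split into $k$ children, the child impurities are combined by weighting each child by its share of the records; for the Gini index, for instance:

$$\mathrm{GINI}_{split} = \sum_{i=1}^{k} \frac{n_i}{n}\,\mathrm{GINI}(i)$$

where $n_i$ is the number of records in child $i$. The split with the lowest combined impurity (equivalently, the highest gain over the parent) is preferred.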
15
Q
Categorical Attributes
A
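A minimal sketch, assuming the usual count-matrix approach: for each distinct value of the categorical attribute, gather the per-class counts, then combine the child Gini values weighted by size (the counts below are made up):

```python
def gini(counts):
    # Gini index from the class counts at one child node, e.g. [1, 3]
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_gini(count_matrix):
    # count_matrix: one row of per-class counts per attribute value
    total = sum(sum(row) for row in count_matrix)
    return sum(sum(row) / total * gini(row) for row in count_matrix)

# Made-up counts [class 0, class 1] for the three values of an attribute
print(split_gini([[1, 3], [8, 0], [1, 7]]))   # -> 0.1625
```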