Lecture 9 - Decision Trees Flashcards

1
Q

Decision Tree

A
  • Data-driven method
  • Popular classification technique

Reasons

  • Performs well across a wide range of situations
  • Does not require much effort from the analyst
  • Easily understandable by the consumers
    • At least when the trees are not too large
  • Can be used for both:
    • Classification, called classification trees
    • Prediction, called regression trees
2
Q

Example

A
3
Q

Nodes

A
  • Conditions in the nodes give the splitting value on a predictor
  • The number inside a node gives the number of records reaching it after the split
  • The bracket provides the number of records per class: [not acceptor, acceptor]
  • The leaf nodes, called terminal nodes, are coloured to indicate a non-acceptor (orange) or an acceptor (blue)
4
Q

Trees are easily translated into a set of rules

A
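A minimal sketch of the idea (not from the lecture; the toy data and feature names are made up), using scikit-learn's export_text to print a fitted tree as nested if/else rules:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [income, family_size] -> acceptor (1) / non-acceptor (0)
X = [[30, 1], [45, 3], [80, 2], [120, 4], [25, 2], [95, 3]]
y = [0, 0, 1, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["income", "family_size"]))
```

Each root-to-leaf path reads as one rule, e.g. "IF income <= 62.50 THEN non-acceptor".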
5
Q

Induction (with a Greedy Strategy)

A
  • Tree is constructed in a top-down recursive divide-and-conquer manner
  • At the start, all the training instances are at the root
  • Instances from the training set are then partitioned recursively based on selected attributes
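A compact sketch of the greedy strategy under stated assumptions (binary splits on numeric attributes, Gini impurity, a depth cap; the names are illustrative, not the lecture's):

```python
from collections import Counter

def gini(labels):
    """Gini impurity of the class labels at a node."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(records, labels):
    """Greedy step: try every (attribute, threshold) pair and keep the one
    with the lowest weighted child impurity."""
    best_attr, best_thr, best_score = None, None, float("inf")
    for a in range(len(records[0])):
        for thr in {r[a] for r in records}:
            left = [y for r, y in zip(records, labels) if r[a] <= thr]
            right = [y for r, y in zip(records, labels) if r[a] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_attr, best_thr, best_score = a, thr, score
    return best_attr, best_thr

def build_tree(records, labels, depth=0, max_depth=3):
    """Top-down recursive divide-and-conquer: all training instances start
    at the root and are partitioned recursively on the locally best split."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    attr, thr = best_split(records, labels)
    if attr is None:  # no useful split exists
        return Counter(labels).most_common(1)[0][0]
    keep = [r[attr] <= thr for r in records]
    return {
        "attr": attr, "thr": thr,
        "left": build_tree([r for r, k in zip(records, keep) if k],
                           [y for y, k in zip(labels, keep) if k],
                           depth + 1, max_depth),
        "right": build_tree([r for r, k in zip(records, keep) if not k],
                            [y for y, k in zip(labels, keep) if not k],
                            depth + 1, max_depth),
    }

print(build_tree([[30], [45], [80], [120]], [0, 0, 1, 1]))
# -> {'attr': 0, 'thr': 45, 'left': 0, 'right': 1}
```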
6
Q

Issues with Induction (with a Greedy Strategy)

A
  • Determine how to split the records
    • How to specify the attribute test condition?
    • How to determine the best split?
  • Determine when to stop splitting

Specifying Test Condition

  • Depends on attribute type:
    • Nominal
    • Ordinal
    • Continuous
  • Depends on number of ways to split:
    • Binary split, i.e., 2-way
    • Multi-way split
7
Q

Splitting based on nominal attributes

A
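An illustrative sketch (the nominal attribute car_type and its values are made up): a multi-way split creates one branch per distinct value, while a binary split groups the values into two subsets:

```python
records = ["Family", "Sports", "Sports", "Luxury", "Family"]

# Multi-way split: one branch per distinct value of the nominal attribute.
multiway = {v: [r for r in records if r == v] for v in set(records)}

# Binary (2-way) split: group the values into two subsets, e.g. {Sports} vs. the rest.
binary = ([r for r in records if r == "Sports"],
          [r for r in records if r != "Sports"])

print(multiway)
print(binary)
```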
8
Q

Splitting based on Continuous Attributes

A

  • Discretization: bin the continuous attribute into an ordinal one
  • Binary split: choose a cut point v and test A < v vs. A ≥ v (see the sketch below)
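A sketch of the binary-split option (the data are illustrative): scan the candidate cut points, taken as midpoints of the sorted attribute values, and keep the one with the lowest weighted Gini impurity:

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return the cut point v minimising the weighted impurity of A < v vs. A >= v."""
    pairs = sorted(zip(values, labels))
    best_v, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        v = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [y for x, y in pairs if x < v]
        right = [y for x, y in pairs if x >= v]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_v, best_score = v, score
    return best_v

print(best_threshold([25, 30, 45, 80, 95, 120], [0, 0, 0, 1, 1, 1]))  # -> 62.5
```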

9
Q

Determining the Best Split

A
10
Q

Information gain

A
  • Used to determine which feature/attribute provides the maximum information about a class
  • Split records based on an attribute test that optimises a chosen criterion
  • Need a measure of node impurity, e.g., Gini Index, Entropy, etc.
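A small sketch under the usual definition: information gain = impurity of the parent node minus the weighted impurity of the child nodes, here using entropy as the impurity measure (the data are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Gain = Entropy(parent) - sum_i (n_i / n) * Entropy(child_i)."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = [0, 0, 0, 1, 1, 1]
print(information_gain(parent, [[0, 0, 0], [1, 1, 1]]))  # 1.0: a perfect split
print(information_gain(parent, [[0, 1, 0], [1, 0, 1]]))  # ~0.08: barely informative
```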
11
Q

Information gain (visual)

A
12
Q

Gini Index

A
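Assuming the standard definition: Gini(t) = 1 - sum_j p(j|t)^2, where p(j|t) is the proportion of class j at node t; it is 0 for a pure node and maximal when the classes are evenly mixed. A minimal sketch:

```python
from collections import Counter

def gini(labels):
    """Gini(t) = 1 - sum over classes j of p(j|t)^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini([0, 0, 0, 0]))  # 0.0   - pure node, minimum impurity
print(gini([0, 0, 1, 1]))  # 0.5   - evenly mixed, maximum for two classes
print(gini([0, 0, 0, 1]))  # 0.375
```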
13
Q

Entropy measure

A
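Assuming the standard definition: Entropy(t) = -sum_j p(j|t) * log2 p(j|t); it is 0 for a pure node and log2(number of classes) when the classes are evenly mixed. A minimal sketch:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(t) = -sum over classes j of p(j|t) * log2 p(j|t)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy([0, 0, 0, 0]))  # 0.0
print(entropy([0, 0, 1, 1]))  # 1.0
print(entropy([0, 0, 0, 1]))  # ~0.811
```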
14
Q

Combined impurity

A
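Assuming the usual convention: the combined impurity of a split is the impurity of each child node weighted by its share of the parent's records, and the split with the lowest combined impurity is preferred. A minimal sketch:

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def combined_impurity(children, impurity=gini):
    """sum over children i of (n_i / n) * impurity(child_i)."""
    n = sum(len(ch) for ch in children)
    return sum(len(ch) / n * impurity(ch) for ch in children)

print(combined_impurity([[0, 0, 0], [1, 1, 1]]))  # 0.0 - a perfect split
print(combined_impurity([[0, 1], [0, 0, 1, 1]]))  # 0.5 - no improvement over the parent
```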
15
Q

Categorical Attributes

A
16
Q

Stopping Criteria for Tree Induction

A
  • Stop expanding a node when all the records belong to the same class
  • Stop expanding a node when all the records have similar attribute values
  • Early termination (to be discussed later)
17
Q

How to Address Overfitting

A

Pre-Pruning

  • Stop the algorithm before it becomes a fully-grown tree
  • Typical stopping conditions for a node:
    • Stop if all instances belong to the same class
    • Stop if all attribute values are the same
  • More restrictive conditions:
    • Stop if number of instances is less than some user-specified threshold
    • Stop if expanding the current node does not improve impurity measures, e.g., Gini or information gain
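As a hedged illustration (the mapping and values are mine, not the lecture's), these stopping conditions correspond to pre-pruning hyperparameters of scikit-learn's DecisionTreeClassifier:

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    min_samples_split=20,        # stop if a node has fewer instances than this
    min_impurity_decrease=0.01,  # stop if splitting barely improves impurity
    max_depth=5,                 # a further common cap on tree growth
)

# Illustrative toy data: [income, family_size] -> class label.
X = [[30, 1], [45, 3], [80, 2], [120, 4]] * 10
y = [0, 0, 1, 1] * 10
tree.fit(X, y)
print(tree.get_depth())
```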
18
Q

How to Address Overfitting

A

Post-pruning

  • Grow decision tree to its entirety
  • Trim the nodes of the decision tree in a bottom-up fashion
  • If generalisation error improves after trimming, replace sub-tree by a leaf node
  • Class label of leaf node is determined from majority class of instances in the sub-tree
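As a concrete (though not necessarily the lecture's) technique, scikit-learn implements post-pruning as minimal cost-complexity pruning: grow the tree to its entirety, compute the candidate pruning strengths, and keep the pruned tree that generalises best on held-out data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow the decision tree to its entirety, then compute pruning candidates.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
path = full.cost_complexity_pruning_path(X_tr, y_tr)

# Refit at each pruning strength; keep the tree with the best held-out score.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_te, y_te),
)
print(best.get_n_leaves(), best.score(X_te, y_te))
```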
19
Q

Pros and cons of decision trees

A

Advantages:

  • Easy to understand (domain experts love them)
  • Easy to generate rules

Disadvantages:

  • May suffer from overfitting
  • Classifies by rectangular partitioning (so does not handle correlated features very well)
  • Can be quite large - pruning is necessary
  • Does not handle streaming data easily
    • … but a few successful ideas/techniques exist