Decision Trees Flashcards

1
Q

What is a decision tree?

A

A decision support tool that uses a tree-like graph to model decisions and their possible consequences

2
Q

What does an internal node represent?

A

A feature. For example, ‘Outlook’

3
Q

What does a branch correspond to?

A

A feature value. For example, ‘rain’

4
Q

What does a leaf represent?

A

A classification
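
Putting cards 2-4 together, a tiny sketch of how such a tree could be held in code (the nested-dict shape and the 'Wind' feature are illustrative assumptions, not from the cards):

# Internal nodes are dicts labelled with a feature; branches are that
# feature's values; leaves are plain classification labels.
example_tree = {
    "feature": "Outlook",                  # internal node: a feature
    "branches": {                          # branches: feature values
        "sunny": "no",                     # leaf: a classification
        "overcast": "yes",
        "rain": {
            "feature": "Wind",
            "branches": {"weak": "yes", "strong": "no"},
        },
    },
}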

5
Q

What is the algorithm for creating a decision tree?

A
  • Split the objects into subsets
  • Check whether each subset is pure
  • If yes, STOP
  • If no, repeat the split on that subset
6
Q

What do we mean by a subset being pure?

A

All of the items in the set give the same output.

i.e. all of the examples give ‘no’

7
Q

After a split, what do we want to be more certain about?

A

We want to be more certain about the yes/no decision.

In other words, we want the subsets to be purer than the original set

8
Q

What does entropy measure?

A

The uncertainty within a dataset

9
Q

What is the formula for entropy?

A

Entropy(S) = − ∑ P(i) log2( P(i) )
The sum runs over each class i present in the set
Where P(i) is the proportion of items in the set that belong to class i
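
As a quick check of the formula, a minimal Python sketch (the helper name and the example label lists are illustrative, not from the cards):

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy = - sum over classes i of P(i) * log2(P(i)),
    # where P(i) is the proportion of labels belonging to class i.
    counts, n = Counter(labels), len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy(["yes", "no", "yes", "no"]))  # 1.0: even two-class split (card 10)
print(entropy(["yes", "yes", "yes"]))       # -0.0, i.e. 0: only one class (card 11)
print(entropy(["yes", "yes", "no", "no", "no", "no"]))  # ~0.918: mostly 'no'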

10
Q

What does an entropy of 1 tell us about a set?

A

The classes in the set are evenly split. For a two-class set, an entropy of 1 is the maximum: a 50/50 split, i.e. complete uncertainty

11
Q

What does an entropy of 0 tell us about a set?

A

The set is pure: it contains only one class, so there is no uncertainty

12
Q

What does information gain measure?

A

The effectiveness of a feature in classifying training data

13
Q

What is the formula for information gain?

A

Gain(S, A) = Entropy(S) − ∑ ( |S(v)| / |S| ) Entropy( S(v) )
Where the sum runs over the values v of feature A, and S(v) is the subset of examples in S with A = v
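
A minimal sketch of the formula, assuming examples are stored as (feature-dict, label) pairs; the toy data and function names are illustrative, and the entropy helper is the one from the card 9 sketch:

from collections import Counter
from math import log2

def entropy(labels):
    # Same helper as in the card 9 sketch.
    counts, n = Counter(labels), len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(examples, feature):
    # Gain(S, A) = Entropy(S) - sum over values v of |S(v)|/|S| * Entropy(S(v)).
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for v in {x[feature] for x, _ in examples}:
        subset = [y for x, y in examples if x[feature] == v]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

# Toy data: which feature is more informative, 'Outlook' or 'Wind'?
data = [
    ({"Outlook": "sunny", "Wind": "weak"}, "no"),
    ({"Outlook": "sunny", "Wind": "strong"}, "no"),
    ({"Outlook": "overcast", "Wind": "weak"}, "yes"),
    ({"Outlook": "rain", "Wind": "weak"}, "yes"),
    ({"Outlook": "rain", "Wind": "strong"}, "no"),
]
print(information_gain(data, "Outlook"))  # larger gain than 'Wind' on this toy data
print(information_gain(data, "Wind"))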

14
Q

What does the ID3 Algorithm do?

A

It recursively decides which feature is best to split on, using the information gain of each of the possible features, until there are no more features left

15
Q

What is the outline of the ID3 Algorithm?

A

Create a root node

A = the feature that has the highest information gain
Set A as the root
For each possible value v of A:
- Add a new branch corresponding to A -> v
- Let S(v) be the subset of examples in S with A = v
- If S(v) is empty: add a leaf node with the most common value of the label in S
- Else: below this branch, add the subtree built by repeating this procedure on S(v) with the remaining features

Repeat for each subtree (see the sketch below)
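
A minimal recursive sketch of this outline (assumptions: the (feature-dict, label) example format, the `data` list and the information_gain helper from the card 13 sketch, and that branch values are taken from the observed data rather than a predeclared list of values):

from collections import Counter

def id3(examples, features):
    labels = [y for _, y in examples]
    # All examples agree (the subset is pure): return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No features left to split on: return the most common label.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # A = the feature with the highest information gain; make it the root.
    best = max(features, key=lambda f: information_gain(examples, f))
    node = {"feature": best, "branches": {}}
    for v in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == v]
        # Because we only iterate over values observed in the examples,
        # S(v) is never empty here; the empty case in the outline would
        # instead add a leaf with the most common label in S.
        node["branches"][v] = id3(subset, [f for f in features if f != best])
    return node

print(id3(data, ["Outlook", "Wind"]))  # a nested-dict tree as in the card 4 sketch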

16
Q

What is the formal definition of overfitting?

A

A given hypothesis h overfits the training data if there is an alternative hypothesis h’ such that

the error of h on the training data is less than the error of h’ on the training data

and

the error of h on the real (unseen) data is greater than the error of h’ on the real data

17
Q

When we train a decision tree, what is the danger?

A

The danger that the tree memorises the training set, as opposed to learning the general concepts in the data

Overfitting

18
Q

How can we avoid overfitting?

A
  • Do not split the data when the split is not statistically significant
  • Use validation methods
  • Grow a full tree, then prune branches that overfit
    • Reduced Error Pruning (REPTree)
    • Rule Post Pruning (RPP)
19
Q

What is the algorithm for Reduced Error Pruning?

A

Split the data into a training set and a validation set

Do until further pruning is harmful:
- Evaluate the impact on the validation set of pruning each possible node (and those below it)
- Greedily remove the node that most improves validation set accuracy (see the sketch below)
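
The card describes a greedy loop over all nodes; the sketch below is a common bottom-up variant (an assumption, not necessarily the exact procedure intended) that works on the nested-dict trees built by the earlier sketches:

from collections import Counter

def predict(node, x):
    # Walk the nested-dict tree until a leaf label is reached; unseen feature
    # values (no matching branch) yield None, which simply counts as a miss.
    while isinstance(node, dict):
        node = node["branches"].get(x[node["feature"]])
    return node

def rep_prune(node, train, val):
    # `train` / `val` are the (feature-dict, label) examples that reach this node.
    if not isinstance(node, dict):
        return node
    f = node["feature"]
    # Prune the children first (bottom-up).
    for v in list(node["branches"]):
        node["branches"][v] = rep_prune(
            node["branches"][v],
            [(x, y) for x, y in train if x[f] == v],
            [(x, y) for x, y in val if x[f] == v],
        )
    # Try replacing this subtree with a leaf predicting the majority training label.
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    if val:
        leaf_correct = sum(y == majority for _, y in val)
        subtree_correct = sum(predict(node, x) == y for x, y in val)
        if leaf_correct >= subtree_correct:
            return majority  # pruning this node does not hurt validation accuracy
    return node

On real data one would first build the tree with id3 on the training split, then call rep_prune(tree, train, val).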

20
Q

What is the procedure for Rule Post Pruning?

A

Grow the tree until it overfits the training data

  • Convert the tree to an equivalent set of rules: one for each root-to-leaf path
  • Prune each rule independently by removing preconditions whose removal does not worsen the rule’s accuracy
  • Sort the final rules by accuracy and apply them in that order (see the sketch below)

Estimate accuracy using the validation set (or the training set)
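
A compact sketch over the same nested-dict trees (the function names and the rule representation, a list of (feature, value) preconditions plus a label, are illustrative; `data` is the validation set, or the training set, mentioned above):

def tree_to_rules(node, conds=()):
    # One rule per root-to-leaf path.
    if not isinstance(node, dict):
        return [(list(conds), node)]
    rules = []
    for v, child in node["branches"].items():
        rules += tree_to_rules(child, conds + ((node["feature"], v),))
    return rules

def rule_accuracy(conds, label, data):
    # Accuracy of one rule over the examples its preconditions cover.
    covered = [y for x, y in data if all(x[f] == v for f, v in conds)]
    return sum(y == label for y in covered) / len(covered) if covered else 0.0

def rule_post_prune(tree, data):
    pruned = []
    for conds, label in tree_to_rules(tree):
        improved = True
        while improved and conds:
            improved = False
            for i in range(len(conds)):
                shorter = conds[:i] + conds[i + 1:]
                # Drop a precondition if removing it does not worsen the rule's accuracy.
                if rule_accuracy(shorter, label, data) >= rule_accuracy(conds, label, data):
                    conds, improved = shorter, True
                    break
        pruned.append((conds, label))
    # Sort the final rules by accuracy and apply them in that order.
    pruned.sort(key=lambda r: rule_accuracy(r[0], r[1], data), reverse=True)
    return pruned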

21
Q

What are two examples of alternate splitting criteria?

A
  • Gain Ratio
  • Gini Index
22
Q

What are the advantages of Gain Ratio as a splitting criterion?

A
  • More sensitive to how a feature splits the data
  • Discourages the selection of features that have many values but a uniform distribution
23
Q

What are the advantages of the Gini Index?

A
  • Favours larger partitions
  • Uses the squared proportions of the classes, so it is less sensitive to noise
24
Q

What is the optimal Gini Index value?

A

0.0 - Low Gini index implies this is a good feature to split on
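
The cards do not give the Gini formula; the standard definition is 1 minus the sum of the squared class proportions, which is easy to check in a short sketch (the labels are illustrative):

from collections import Counter

def gini(labels):
    # Gini index = 1 - sum over classes i of P(i)^2.
    counts, n = Counter(labels), len(labels)
    return 1 - sum((c / n) ** 2 for c in counts.values())

print(gini(["yes", "yes", "yes"]))       # 0.0: a pure set, the optimal value
print(gini(["yes", "no", "yes", "no"]))  # 0.5: an even two-class split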

25
Q

How are Random Forests constructed?

A

Grow k different decision trees

  • Pick a random subset of the training objects
  • Grow a full tree from these objects (NO PRUNING)
    • When splitting, choose from a random subset of the features
    • Compute the gain based on those random features rather than the whole feature set

Repeat for all k trees (see the sketch below)
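
As an illustration (assuming scikit-learn is available; not part of the cards), RandomForestClassifier follows the same recipe: k unpruned trees, each grown on a random sample of the training objects, with a random subset of features considered at each split:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,     # k trees
    bootstrap=True,       # each tree sees a random sample of the training objects
    max_features="sqrt",  # a random subset of features is considered at each split
)
forest.fit(X, y)
# Aggregates the trees' predictions for new objects (card 26); note that
# scikit-learn averages the trees' class probabilities rather than taking a
# strict majority vote.
print(forest.predict(X[:3]))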

26
Q

How do we test a data object X with Random Forests?

A

Classify X using each of the trees
Use majority voting to get the class of X

27
Q

What are some situations in which we should use decision trees?

A
  • Instances that can be described by feature-value pairs
  • Targets that are discrete valued
  • Problems which can be described with a disjunctive hypothesis, e.g. ‘this and that gives y’
  • Data which may be noisy
  • Data which may contain missing feature values
28
Q

What are some situations in which we shouldn’t use decision trees?

A
  • Not ideal for real-valued decisions
  • Problems that are not easy to express, e.g. XOR
  • “Sparse” data, as overfitting can be a problem