Decision Trees Flashcards

1
Q

What is a decision tree?

A

A decision support tool that uses a tree-like graph to model decisions and their possible consequences

2
Q

What does an internal node represent?

A

A feature. For example, ‘Outlook’

3
Q

What does a branch correspond to?

A

A feature value. For example, ‘rain’

4
Q

What does a leaf represent?

A

A classification
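
Putting cards 2-4 together, a tiny sketch of how such a tree could be held in code (the nested-dict shape and the 'Wind' feature are illustrative assumptions, not from the cards):

# Internal nodes are dicts labelled with a feature; branches are that
# feature's values; leaves are plain classification labels.
example_tree = {
    "feature": "Outlook",                  # internal node: a feature
    "branches": {                          # branches: feature values
        "sunny": "no",                     # leaf: a classification
        "overcast": "yes",
        "rain": {
            "feature": "Wind",
            "branches": {"weak": "yes", "strong": "no"},
        },
    },
}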

5
Q

What is the algorithm for creating a decision tree?

A
  • Split the objects into subsets
  • Check whether each subset is pure
  • If yes, STOP
  • If no, repeat the split on that subset
6
Q

What do we mean by a subset being pure?

A

All of the items in the set give the same output.

i.e. all of the examples give ‘no’

7
Q

After a split, what do we want to be more certain about?

A

We want to be more certain about the yes/no decision.

In other words, we want the subsets to be purer than the original set

8
Q

What does entropy measure?

A

The uncertainty within a dataset

9
Q

What is the formula for entropy?

A

Entropy(S) = − ∑ P(i) log2( P(i) )
The sum runs over each class i present in the set
Where P(i) is the proportion of items in the set that belong to class i
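
As a quick check of the formula, a minimal Python sketch (the helper name and the example label lists are illustrative, not from the cards):

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy = - sum over classes i of P(i) * log2(P(i)),
    # where P(i) is the proportion of labels belonging to class i.
    counts, n = Counter(labels), len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy(["yes", "no", "yes", "no"]))  # 1.0: even two-class split (card 10)
print(entropy(["yes", "yes", "yes"]))       # -0.0, i.e. 0: only one class (card 11)
print(entropy(["yes", "yes", "no", "no", "no", "no"]))  # ~0.918: mostly 'no'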

10
Q

What does an entropy of 1 tell us about a set?

A

The classes in the set are evenly split. For a two-class set, an entropy of 1 is the maximum: a 50/50 split, i.e. complete uncertainty

11
Q

What does an entropy of 0 tell us about a set?

A

The set is pure: it contains only one class, so there is no uncertainty

12
Q

What does information gain measure?

A

The effectiveness of a feature in classifying training data

13
Q

What is the formula for information gain?

A

Gain(S, A) = Entropy(S) − ∑ ( |S(v)| / |S| ) Entropy( S(v) )
Where the sum runs over the values v of feature A, and S(v) is the subset of examples in S with A = v
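
A minimal sketch of the formula, assuming examples are stored as (feature-dict, label) pairs; the toy data and function names are illustrative, and the entropy helper is the one from the card 9 sketch:

from collections import Counter
from math import log2

def entropy(labels):
    # Same helper as in the card 9 sketch.
    counts, n = Counter(labels), len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(examples, feature):
    # Gain(S, A) = Entropy(S) - sum over values v of |S(v)|/|S| * Entropy(S(v)).
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for v in {x[feature] for x, _ in examples}:
        subset = [y for x, y in examples if x[feature] == v]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

# Toy data: which feature is more informative, 'Outlook' or 'Wind'?
data = [
    ({"Outlook": "sunny", "Wind": "weak"}, "no"),
    ({"Outlook": "sunny", "Wind": "strong"}, "no"),
    ({"Outlook": "overcast", "Wind": "weak"}, "yes"),
    ({"Outlook": "rain", "Wind": "weak"}, "yes"),
    ({"Outlook": "rain", "Wind": "strong"}, "no"),
]
print(information_gain(data, "Outlook"))  # larger gain than 'Wind' on this toy data
print(information_gain(data, "Wind"))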

14
Q

What does the ID3 Algorithm do?

A

It recursively decides which feature is best to split on, using the information gain of each of the possible features, until there are no more features left

15
Q

What is the outline of the ID3 Algorithm?

A

Create a root node

A = the feature that has the highest information gain
Set A as the root
For each possible value v of A:
- Add a new branch corresponding to A -> v
- Let S(v) be the subset of examples in S with A = v
- If S(v) is empty: add a leaf node with the most common value of the label in S
- Else: below this branch, add the subtree built by repeating this procedure on S(v) with the remaining features

Repeat for each subtree (see the sketch below)
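
A minimal recursive sketch of this outline (assumptions: the (feature-dict, label) example format, the `data` list and the information_gain helper from the card 13 sketch, and that branch values are taken from the observed data rather than a predeclared list of values):

from collections import Counter

def id3(examples, features):
    labels = [y for _, y in examples]
    # All examples agree (the subset is pure): return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No features left to split on: return the most common label.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # A = the feature with the highest information gain; make it the root.
    best = max(features, key=lambda f: information_gain(examples, f))
    node = {"feature": best, "branches": {}}
    for v in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == v]
        # Because we only iterate over values observed in the examples,
        # S(v) is never empty here; the empty case in the outline would
        # instead add a leaf with the most common label in S.
        node["branches"][v] = id3(subset, [f for f in features if f != best])
    return node

print(id3(data, ["Outlook", "Wind"]))  # a nested-dict tree as in the card 4 sketch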

16
Q

What is the formal definition of overfitting?

A

A given hypothesis h overfits the training data if there is an alternative hypothesis h’ such that

the error of h on the training data is less than the error of h’ on the training data

and

the error of h on the real (unseen) data is greater than the error of h’ on the real data

17
Q

When we train a decision tree, what is the danger?

A

The danger that the tree memorises the training set, as opposed to learning the general concepts in the data

Overfitting

18
Q

How can we avoid overfitting?

A
  • Do not split the data when the split is not statistically significant
  • Use validation methods
  • Grow a full tree, then prune branches that overfit
    • Reduced Error Pruning (REPTree)
    • Rule Post Pruning (RPP)
19
Q

What is the algorithm for Reduced Error Pruning?

A

Split the data into a training set and a validation set

Do until further pruning is harmful:
- Evaluate the impact on the validation set of pruning each possible node (and those below it)
- Greedily remove the node that most improves validation set accuracy (see the sketch below)
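
The card describes a greedy loop over all nodes; the sketch below is a common bottom-up variant (an assumption, not necessarily the exact procedure intended) that works on the nested-dict trees built by the earlier sketches:

from collections import Counter

def predict(node, x):
    # Walk the nested-dict tree until a leaf label is reached; unseen feature
    # values (no matching branch) yield None, which simply counts as a miss.
    while isinstance(node, dict):
        node = node["branches"].get(x[node["feature"]])
    return node

def rep_prune(node, train, val):
    # `train` / `val` are the (feature-dict, label) examples that reach this node.
    if not isinstance(node, dict):
        return node
    f = node["feature"]
    # Prune the children first (bottom-up).
    for v in list(node["branches"]):
        node["branches"][v] = rep_prune(
            node["branches"][v],
            [(x, y) for x, y in train if x[f] == v],
            [(x, y) for x, y in val if x[f] == v],
        )
    # Try replacing this subtree with a leaf predicting the majority training label.
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    if val:
        leaf_correct = sum(y == majority for _, y in val)
        subtree_correct = sum(predict(node, x) == y for x, y in val)
        if leaf_correct >= subtree_correct:
            return majority  # pruning this node does not hurt validation accuracy
    return node

On real data one would first build the tree with id3 on the training split, then call rep_prune(tree, train, val).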

20
Q

What is the procedure for Rule Post Pruning?

A

Grow the tree until it overfits the training data

  • Convert the tree to an equivalent set of rules: one for each root-to-leaf path
  • Prune each rule independently by removing preconditions whose removal does not worsen the rule’s accuracy
  • Sort the final rules by accuracy and apply them in that order (see the sketch below)

Estimate accuracy using the validation set (or the training set)
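
A compact sketch over the same nested-dict trees (the function names and the rule representation, a list of (feature, value) preconditions plus a label, are illustrative; `data` is the validation set, or the training set, mentioned above):

def tree_to_rules(node, conds=()):
    # One rule per root-to-leaf path.
    if not isinstance(node, dict):
        return [(list(conds), node)]
    rules = []
    for v, child in node["branches"].items():
        rules += tree_to_rules(child, conds + ((node["feature"], v),))
    return rules

def rule_accuracy(conds, label, data):
    # Accuracy of one rule over the examples its preconditions cover.
    covered = [y for x, y in data if all(x[f] == v for f, v in conds)]
    return sum(y == label for y in covered) / len(covered) if covered else 0.0

def rule_post_prune(tree, data):
    pruned = []
    for conds, label in tree_to_rules(tree):
        improved = True
        while improved and conds:
            improved = False
            for i in range(len(conds)):
                shorter = conds[:i] + conds[i + 1:]
                # Drop a precondition if removing it does not worsen the rule's accuracy.
                if rule_accuracy(shorter, label, data) >= rule_accuracy(conds, label, data):
                    conds, improved = shorter, True
                    break
        pruned.append((conds, label))
    # Sort the final rules by accuracy and apply them in that order.
    pruned.sort(key=lambda r: rule_accuracy(r[0], r[1], data), reverse=True)
    return pruned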

21
Q

What are two examples of alternate splitting criteria?

A
  • Gain Ratio
  • Gini Index
22
Q

What are the advantages of Gain Ratio as a splitting criterion?

A
  • More sensitive to how a feature splits the data
  • Discourages the selection of features that have many values but a uniform distribution
23
Q

What are the advantages of the Gini Index?

A
  • Favours larger partitions
  • Uses the squared proportions of the classes, so it is less sensitive to noise
24
Q

What is the optimal Gini Index value?

A

0.0 - Low Gini index implies this is a good feature to split on
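
The cards do not give the Gini formula; the standard definition is 1 minus the sum of the squared class proportions, which is easy to check in a short sketch (the labels are illustrative):

from collections import Counter

def gini(labels):
    # Gini index = 1 - sum over classes i of P(i)^2.
    counts, n = Counter(labels), len(labels)
    return 1 - sum((c / n) ** 2 for c in counts.values())

print(gini(["yes", "yes", "yes"]))       # 0.0: a pure set, the optimal value
print(gini(["yes", "no", "yes", "no"]))  # 0.5: an even two-class split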

25
Q

How are Random Forests constructed?

A

Grow k different decision trees

  • Pick a random subset of the training objects
  • Grow a full tree from these objects (NO PRUNING)
    • When splitting, choose from a random subset of the features
    • Compute the gain based on those random features rather than the whole feature set

Repeat for all k trees (see the sketch below)
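
As an illustration (assuming scikit-learn is available; not part of the cards), RandomForestClassifier follows the same recipe: k unpruned trees, each grown on a random sample of the training objects, with a random subset of features considered at each split:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,     # k trees
    bootstrap=True,       # each tree sees a random sample of the training objects
    max_features="sqrt",  # a random subset of features is considered at each split
)
forest.fit(X, y)
# Aggregates the trees' predictions for new objects (card 26); note that
# scikit-learn averages the trees' class probabilities rather than taking a
# strict majority vote.
print(forest.predict(X[:3]))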

26
Q

How do we test a data object X with Random Forests?

A

Classify X using each of the trees
Use majority voting to get the class of X

27
Q

What are some situations in which we should use decision trees?

A
  • Instances that can be described by feature-value pairs
  • Targets that are discrete valued
  • Problems which can be described with a disjunctive hypothesis, e.g. ‘this and that gives y’
  • Data which may be noisy
  • Data which may contain missing feature values
28
Q

What are some situations in which we shouldn’t use decision trees?

A
  • Not ideal for real-valued decisions
  • Problems that are not easy to express, e.g. XOR
  • “Sparse” data, as overfitting can be a problem