Decision Trees Flashcards

1
Q

What is the ID3 algorithm?

A

Split(node, {examples}):

  1. A ← the attribute that best splits the examples (highest information gain)
  2. For each value of A, create a child node
  3. Split the training {examples} among the child nodes
  4. For each child subset, call Split(child_node, {subset}) until the subset is pure
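
A minimal Python sketch of these steps (an illustration, not part of the original card; the dict-based representation and names like id3 are assumptions): examples are dicts of attribute → value plus a "label" key, and the tree is a nested dict.

import math
from collections import Counter

def entropy(examples):
    # H(S) over the labels in this subset
    counts = Counter(ex["label"] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(examples, attr):
    # entropy before the split minus the size-weighted entropy after it
    total = len(examples)
    after = 0.0
    for v in {ex[attr] for ex in examples}:
        subset = [ex for ex in examples if ex[attr] == v]
        after += len(subset) / total * entropy(subset)
    return entropy(examples) - after

def id3(examples, attributes):
    labels = {ex["label"] for ex in examples}
    if len(labels) == 1:                                      # subset is pure -> leaf
        return labels.pop()
    if not attributes:                                        # no attributes left -> majority-class leaf
        return Counter(ex["label"] for ex in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a))   # step 1: highest-gain attribute
    tree = {best: {}}
    for v in {ex[best] for ex in examples}:                   # steps 2-3: one child per value
        subset = [ex for ex in examples if ex[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = id3(subset, rest)                     # step 4: recurse until pure
    return tree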
2
Q

What is the definition of Entropy?

A

H(S) = -p(+) log2 p(+) - p(-) log2 p(-)

Where p(+) and p(-) are the percentage of positive/negative examples in S
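
For a quick illustrative check (numbers made up, not from the original card): if S has 9 positive and 5 negative examples, H(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.94 bits; a pure subset gives H = 0 and a 50/50 subset gives H = 1.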

3
Q

How can we interpret entropy?

A

The expected number of bits needed to tell whether an example X drawn from S is positive or negative

4
Q

How do we compute the expected drop in entropy (gain)?

A

Gain(S, A) = H(S) - Σv (|Sv| / |S|) H(Sv)

i.e. the entropy of S minus the size-weighted average entropy of the subsets Sv created by splitting on attribute A
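
As an illustrative (made-up) example: splitting the S above (9+/5-, H ≈ 0.94) on an attribute that produces subsets of size 8 (6+/2-, H ≈ 0.81) and 6 (3+/3-, H = 1.0) gives Gain = 0.94 - (8/14)(0.81) - (6/14)(1.0) ≈ 0.048.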
5
Q

What does gain tell us?

A

The size-weighted average entropy after the split, so it is biased toward larger subsets

6
Q

What is information gain?

A

The difference between the entropy before the split and the entropy after the split

7
Q

What does information gain tell you?

A

How much more certain you are after a split

8
Q

What happens if you run ID3 to completion?

A

All subsets will be pure

9
Q

How do decision trees overfit?

A

Running ID3 to completion ends up with lots of singleton subsets, so there is not a lot of confidence in the estimates (each is based on only 1 training example).

10
Q

How can we avoid overfitting (decision trees)?

A
  • Create the full DT
  • For each node
    • Temporarily remove it
    • Measure performance on a validation set
  • Permanently remove the node whose removal gives the greatest improvement
  • Repeat until accuracy drops
11
Q

How is split entropy defined?

A

SplitEntropy(S, A) = -Σv (|Sv| / |S|) log2 (|Sv| / |S|)

i.e. the entropy of how the examples are distributed across the values v of attribute A
12
Q

What do we use split entropy for?

A

To normalize information gain by how fine-grained the split is

13
Q

Definition of gain ratio?

A

GainRatio(S, A) = Gain(S, A) / SplitEntropy(S, A)
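
A small Python sketch tying the split-entropy and gain-ratio cards together (function names and the numbers are illustrative, not from the deck):

import math

def split_entropy(subset_sizes):
    # entropy of the partition itself: how the examples spread over the attribute's values
    total = sum(subset_sizes)
    return -sum((s / total) * math.log2(s / total) for s in subset_sizes)

def gain_ratio(gain, subset_sizes):
    return gain / split_entropy(subset_sizes)

# The split from the earlier worked example (subsets of size 8 and 6, gain ≈ 0.048):
print(gain_ratio(0.048, [8, 6]))    # ≈ 0.048 / 0.985 ≈ 0.049
# A 14-way split into singletons has split entropy log2(14) ≈ 3.81,
# so the same gain would be shrunk much more heavily.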
14
Q

What does GainRatio penalize?

A

Attributes with many values

15
Q

What is the problem with information gain?

A

Biased towards attributes with many values (they create lots of small pure subsets)

16
Q

What's unique about DT?

A

Not a black box; you can interpret the rules of the tree

17
Q

How can we expand DT to multi-class?

A
  • Predict most freq class
  • Generalize entropy: H(S) = -Σc p(c) log2 p(c) where p(c) is % of class c in S
18
Q

How can we expand DT to regression?

A
  • Predict average of examples in subset (or use linear regression)
  • Minimize variance in subsets
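
An illustrative sketch of the variance criterion (not from the deck; the function name and data are made up): score a candidate split by how much it reduces variance, the regression analogue of gain.

import numpy as np

def variance_reduction(y, y_left, y_right):
    # variance of the parent minus the size-weighted variance of the two children
    n = len(y)
    return np.var(y) - (len(y_left) / n) * np.var(y_left) - (len(y_right) / n) * np.var(y_right)

y = np.array([1.0, 1.2, 0.9, 5.0, 5.3])
print(variance_reduction(y, y[:3], y[3:]))   # large reduction: the split separates low from high targets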
19
Q

What are the pros of DT?

A
  • Interpretable
  • Easily handles irrelevant attributes (gain = 0)
  • Can handle missing data
  • Very compact
  • Very fast testing time: O(depth)
20
Q

What are the cons of DT?

A
  • Only axis-aligned splits of data
  • Greedy (may not find the best tree)
    • There are exponentially many possible trees
21
Q

How do you create a random decision forest?

A
  • Grow K different decision trees
    • Pick a random subset Sr of the training examples
    • Grow a full ID3 tree Tr
      • At each split, pick from a random subset of d << D attributes
      • Compute gain based on Sr
    • Repeat for r = 1…K
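
One convenient way to realise this recipe in practice (an illustration using scikit-learn rather than hand-rolled ID3; the library and dataset are assumptions, not part of the card): each tree sees a random sample of the examples, and each split considers only a random subset of the attributes.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,       # K trees
    max_features="sqrt",    # each split considers only d << D random attributes
    bootstrap=True,         # each tree sees a random subset Sr of the examples
)
forest.fit(X, y)
print(forest.predict(X[:3]))   # classification (card 22): majority vote over the K trees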
22
Q

How do you classify an example with random forests?

A
  • Classify each X using trees T1 to Tk
  • Use majority vote
23
Q

What does entropy measure?

A

How pure/impure a subset is