Decision Trees Flashcards

Question 1

Q

What is the ID3 algorithm?

Answer

A

Split(node, {examples}):

A ← the best attribute for splitting the {examples} (highest information gain)
For each value of A, create a child node
Split the training {examples} among the child nodes
For each subset: Split(child_node, {subset}), until the subset is pure
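
The recursion above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes categorical attributes stored as dicts and uses information gain as the "best attribute" criterion. The names `split`, `gain`, `entropy` and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, labels, attr):
    """Entropy before the split minus the size-weighted entropy after it."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    after = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - after

def split(rows, labels, attrs):
    """Recursive Split(node, {examples}); returns a label (leaf) or a nested dict."""
    if len(set(labels)) == 1 or not attrs:
        # Pure subset, or no attributes left (assumption): predict the majority label.
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, labels, a))
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = split([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return {"attr": best, "children": children}

# Toy data (hypothetical): only "outlook" separates the two classes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]
tree = split(rows, labels, ["outlook", "windy"])
print(tree)
```

On this toy data the root splits on "outlook" and both children are pure, so the recursion stops after one level.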

Question 2

Q

What is the definition of Entropy?

Answer

A

H(S) = -p_+ log₂ p_+ - p_- log₂ p_-

Where p_+ and p_- are the fractions of positive and negative examples in S
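
A small Python sketch of the two-class formula, assuming the usual convention that 0 · log₂ 0 counts as 0. The function name `binary_entropy` is illustrative.

```python
import math

def binary_entropy(p_pos):
    """H(S) in bits, given the fraction of positive examples in S."""
    if p_pos in (0.0, 1.0):
        return 0.0  # convention: 0 * log2(0) = 0, so a pure set has zero entropy
    p_neg = 1.0 - p_pos
    return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)

print(binary_entropy(0.5))     # 1.0 (evenly mixed: one full bit)
print(binary_entropy(1.0))     # 0.0 (pure subset)
print(binary_entropy(9 / 14))  # about 0.940
```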

Question 3

Q

How can we interpret entropy?

Answer

A

The number of bits needed to tell whether a random example X from S is positive or negative

Question 4

Q

How do we compute the expected drop in entropy (gain)?

Question 5

Q

What does gain tell us?

Answer

A

How far entropy is expected to drop: H(S) minus the average entropy after the split, weighted toward larger subsets

Question 6

Q

What is information gain?

Answer

A

The difference between the entropy before the split and the entropy after the split
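
As a worked example, the sketch below computes this difference from class counts, taking "entropy after the split" to be the average over the subsets weighted by subset size. The counts and function names are hypothetical.

```python
import math

def entropy(counts):
    """Entropy in bits from a list of per-class example counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Entropy before the split minus the size-weighted average entropy after it."""
    total = sum(parent_counts)
    after = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - after

# Hypothetical counts: 9 positive / 5 negative examples,
# split into subsets with 6+/2- and 3+/3-.
g = information_gain([9, 5], [[6, 2], [3, 3]])
print(round(g, 3))  # 0.048
```

The gain is small here because both subsets are still quite mixed after the split.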

Question 7

Q

What does information gain tell you?

Answer

A

How much more certain you are after a split

Question 8

Q

What happens if you run ID3 to completion?

Answer

A

All subsets will be pure

Question 9

Q

How do decision trees overfit?

Answer

A

Running ID3 to completion produces many singleton subsets, so the estimates carry little confidence (each rests on only 1 training example).

Question 10

Q

How can we avoid overfitting (decision trees)?

Answer

A

Create the full DT
For each node
- Tentatively remove it
- Measure performance on the validation set
Remove the node whose removal gives the greatest improvement
Repeat until accuracy drops
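
The pruning loop above might look like this in Python. It is a sketch under stated assumptions: trees are nested dicts with hypothetical `attr`, `majority` and `children` keys, removing a node means replacing it with its majority label, and ties are pruned (the loop stops only when validation accuracy would actually drop).

```python
import copy

def predict(node, row):
    """Walk the tree; leaves are plain labels, internal nodes are dicts."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["attr"]], node["majority"])
    return node

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

def internal_paths(node, path=()):
    """Yield the path (sequence of branch values) to every internal node."""
    if isinstance(node, dict):
        yield path
        for value, child in node["children"].items():
            yield from internal_paths(child, path + (value,))

def pruned_copy(tree, path):
    """Copy of the tree with the node at `path` replaced by its majority label."""
    if not path:
        return tree["majority"]
    new_tree = copy.deepcopy(tree)
    parent = new_tree
    for value in path[:-1]:
        parent = parent["children"][value]
    parent["children"][path[-1]] = parent["children"][path[-1]]["majority"]
    return new_tree

def reduced_error_prune(tree, val_rows, val_labels):
    """Greedily remove the node whose removal helps validation accuracy most."""
    best_acc = accuracy(tree, val_rows, val_labels)
    while isinstance(tree, dict):
        candidates = [pruned_copy(tree, p) for p in internal_paths(tree)]
        scored = [(accuracy(c, val_rows, val_labels), i) for i, c in enumerate(candidates)]
        acc, i = max(scored)
        if acc < best_acc:  # accuracy would drop: stop
            break
        tree, best_acc = candidates[i], acc
    return tree

# Hypothetical overfit tree: the "windy" subtree memorised a noisy example.
full_tree = {"attr": "outlook", "majority": "yes", "children": {
    "sunny": "yes",
    "rain": {"attr": "windy", "majority": "no",
             "children": {"yes": "no", "no": "yes"}}}}
val_rows = [{"outlook": "sunny", "windy": "no"},
            {"outlook": "rain", "windy": "no"},
            {"outlook": "rain", "windy": "yes"}]
val_labels = ["yes", "no", "no"]
pruned = reduced_error_prune(full_tree, val_rows, val_labels)
print(pruned)
```

Here the "windy" subtree is replaced by the leaf "no", and pruning stops because removing the root as well would lower validation accuracy.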

Question 11

Q

How is split entropy defined?

Question 12

Q

What do we use split entropy for?

Answer

A

To normalize information gain by how fine-grained the split is

Question 13

Q

Definition of gain ratio?

Question 14

Q

What does GainRatio penalize?

Answer

A

Attributes with many values
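
A short Python sketch of the penalty, assuming the standard C4.5-style definitions (the cards leave these blank): split entropy is the entropy of the subset sizes produced by the attribute, and gain ratio is information gain divided by split entropy. Function names and counts are illustrative.

```python
import math

def entropy(counts):
    """Entropy in bits from a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def split_entropy(subset_sizes):
    """Entropy of the partition itself: how finely the attribute splits S."""
    return entropy(subset_sizes)

def gain_ratio(parent_counts, child_counts):
    """Information gain normalized by split entropy (assumed C4.5-style definition)."""
    total = sum(parent_counts)
    after = sum(sum(ch) / total * entropy(ch) for ch in child_counts)
    gain = entropy(parent_counts) - after
    return gain / split_entropy([sum(ch) for ch in child_counts])

# Hypothetical data: 4 positive and 4 negative examples.
two_way = [[4, 0], [0, 4]]             # binary attribute that separates the classes
id_like = [[1, 0]] * 4 + [[0, 1]] * 4  # a unique value per example (like an ID)
print(gain_ratio([4, 4], two_way))            # 1.0
print(round(gain_ratio([4, 4], id_like), 3))  # 0.333
```

Both attributes have the same information gain (1 bit), but the ID-like attribute splits the data eight ways, so its gain ratio is three times smaller.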

Question 15

Q

What is the problem with information gain?

Answer

A

Biased towards attributes with many values (they create lots of small pure subsets)

Question 16

Q

What's unique about DTs?

Answer

A

Not a black box: you can interpret the rules of the tree

Question 17

Q

How can we expand DT to multi-class?

Answer

A

Predict the most frequent class
Generalize entropy: H(S) = -Σ_c p_c log₂ p_c, where p_c is the fraction of class c in S
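
Both points can be sketched in a few lines of Python. The helper names and the toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_c p_c log2 p_c over however many classes appear in S."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def predict_leaf(labels):
    """A leaf predicts the most frequent class among its training examples."""
    return Counter(labels).most_common(1)[0][0]

labels = ["a", "a", "b", "c"]
print(entropy(labels))       # 1.5
print(predict_leaf(labels))  # a
```

With more than two classes the entropy can exceed 1 bit; its maximum is log₂ of the number of classes.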

Question 18

Q

How can we expand DT to regression?

Answer

A

Predict the average of the examples in the subset (or use linear regression)
Choose splits that minimize the variance in the subsets
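
A minimal sketch of a variance-minimizing split, assuming a single numeric attribute split by a threshold (the cards do not specify the split type) and leaves that predict the subset mean. The function names and data are illustrative.

```python
from statistics import mean, pvariance

def weighted_variance(subsets):
    """Average variance of the subsets, weighted by subset size."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * pvariance(s) for s in subsets)

def best_threshold(xs, ys):
    """Try each midpoint between sorted x values; keep the lowest weighted variance."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = weighted_variance([left, right])
        if best is None or score < best[0]:
            # Each leaf predicts the average target value of its subset.
            best = (score, t, mean(left), mean(right))
    return best

# Hypothetical data: targets cluster near 1 for small x and near 5 for large x.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
score, t, left_pred, right_pred = best_threshold(xs, ys)
print(t, left_pred, right_pred)  # threshold 6.5, predictions close to 1 and 5
```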

Question 19

Q

What are the pros of DT?

Answer

A

Interpretable
Easily handles irrelevant attributes (gain = 0)
Can handle missing data
Very compact
Very fast testing time: O(depth)

Question 20

Q

What are the cons of DT?

Answer

A

Only axis-aligned splits of data
Greedy (may not find the best tree)
- Exponentially many possible trees

Question 21

Q

How do you create a random decision forest?

Answer

A

Grow k different decision trees
- Pick a random subset S_r of training examples
- Grow a full ID3 tree T_r
  - Choose each split from a random subset of d << D attributes
  - Compute gain based on S_r
- Repeat for r = 1…k
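
The procedure can be sketched in Python as below. Several details are assumptions rather than part of the cards: S_r is drawn as a bootstrap sample (with replacement, same size as the training set), the d random attributes are re-drawn at every node, and trees are nested dicts with hypothetical `attr`, `majority` and `children` keys.

```python
import math
import random
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def gain(rows, labels, attr):
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    return entropy(labels) - sum(len(s) / len(labels) * entropy(s) for s in subsets.values())

def grow_tree(rows, labels, attrs, d, rng):
    """Full ID3-style tree where each node only considers d randomly chosen attributes."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    candidates = rng.sample(attrs, min(d, len(attrs)))
    best = max(candidates, key=lambda a: gain(rows, labels, a))
    node = {"attr": best, "majority": majority, "children": {}}
    rest = [a for a in attrs if a != best]
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = grow_tree([rows[i] for i in idx],
                                            [labels[i] for i in idx], rest, d, rng)
    return node

def grow_forest(rows, labels, attrs, k, d, seed=0):
    """Grow k trees T_1..T_k, each on its own random sample S_r."""
    rng = random.Random(seed)
    forest = []
    for _ in range(k):
        # S_r: bootstrap sample of the training set (assumption: with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], attrs, d, rng))
    return forest

# Toy data (hypothetical).
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"}]
labels = ["yes", "yes", "no", "no"]
forest = grow_forest(rows, labels, ["outlook", "windy"], k=5, d=1)
print(len(forest))  # 5
```

Because each tree sees different examples and different candidate attributes, the trees disagree with one another, which is what makes combining them useful.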

Question 22

Q

How do you classify an example with random forests?

Answer

A

Classify X with each of the trees T_1 to T_k
Use the majority vote
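
A minimal voting sketch in Python, assuming trees are nested dicts with hypothetical `attr`, `majority` and `children` keys and leaves are plain labels. The three hand-built trees are for illustration only.

```python
from collections import Counter

def predict(node, row):
    """Follow one branch per level until a leaf label is reached: O(depth)."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["attr"]], node["majority"])
    return node

def forest_predict(forest, row):
    """Classify the example with every tree, then return the most common answer."""
    votes = Counter(predict(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Three hypothetical hand-built trees.
t1 = {"attr": "outlook", "majority": "yes", "children": {"sunny": "yes", "rain": "no"}}
t2 = {"attr": "windy", "majority": "yes", "children": {"yes": "no", "no": "yes"}}
t3 = "yes"  # a tree that is a single leaf
example = {"outlook": "rain", "windy": "no"}
print(forest_predict([t1, t2, t3], example))  # yes (two votes against one)
```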

Question 23

Q

What does entropy measure?

Answer

A

How pure/impure a subset is

(23 cards)