4-Decision Trees Flashcards
How easy is it to construct optimal decision trees?
Construction of decision trees is NP-Hard
What is information gain?
The reduction in entropy when the data at a node is partitioned by an attribute A: compare the entropy before the split with the mean information of the resulting children
IG(Attr | Node) = H(Node) - MeanInfo(Attr | Node)
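A minimal sketch of the calculation above in Python (the function name and the count-list representation are my own illustration, not from any particular library):

```python
import math


def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def info_gain(parent_counts, child_counts):
    """IG(Attr | Node) = H(Node) - weighted average entropy of the children.

    parent_counts: class counts at the node, e.g. [4, 4].
    child_counts: class counts per child after splitting on the attribute.
    """
    total = sum(parent_counts)
    h_parent = entropy([c / total for c in parent_counts])
    mean_info = sum(
        (sum(c) / total) * entropy([n / sum(c) for n in c])
        for c in child_counts
    )
    return h_parent - mean_info


# Splitting [4, 4] into a pure child [4, 0] and a mixed child [2, 2]:
print(info_gain([4, 4], [[4, 0], [2, 2]]))  # 1.0 - 0.5 = 0.5
```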
What is entropy?
The expected (average) level of surprise or uncertainty. It is a measure of unpredictability.
Given a probability distribution, the entropy is the information required to predict an event
What is the level of unpredictability of a singular event?
Level of unpredictability for a single event is self-information.
Self-info(i) = log2(1/P(i)) = -log2(P(i))
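The formula above translates directly into code (a tiny illustrative helper, assuming probabilities in bits via log base 2):

```python
import math


def self_info(p):
    """Self-information of an event with probability p, in bits."""
    return -math.log2(p)


# A rarer event carries more surprise:
print(self_info(0.5))    # 1.0 bit  (a fair coin flip)
print(self_info(0.125))  # 3.0 bits (a 1-in-8 event)
```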
How is entropy calculated?
H(X) = sum from i = 1 to N of (P(x_i) * self_info(x_i))
H(X) = sum from i = 1 to N of (-P(x_i) * log2(P(x_i)))
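The same sum as a small Python sketch (the zero-probability guard is a standard convention, since 0 * log2(0) is taken as 0):

```python
import math


def entropy(probs):
    """H(X) = -sum P(x_i) * log2(P(x_i)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))  # 1.0: a fair coin is maximally unpredictable
print(entropy([0.9, 0.1]))  # ~0.469: a skewed distribution is more predictable
```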
How does entropy inform decision trees?
If entropy is low, the outcome is more predictable. This occurs when a single class has a high probability, so splits that produce low-entropy (homogeneous) children are preferred
What is mean information?
The weighted average (by probability) of the entropy of child nodes
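A sketch of that weighted average, assuming each child node is represented by its list of class counts (my own representation for illustration):

```python
import math


def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_info(children):
    """Entropy of each child, weighted by the fraction of instances it holds.

    children: one list of class counts per child node.
    """
    total = sum(sum(c) for c in children)
    return sum(
        (sum(c) / total) * entropy([n / sum(c) for n in c])
        for c in children
    )


# One pure child [4, 0] (entropy 0) and one mixed child [2, 2] (entropy 1),
# each holding half the instances:
print(mean_info([[4, 0], [2, 2]]))  # (4/8)*0 + (4/8)*1 = 0.5
```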
What are the cons of information gain?
It prefers highly branching attributes, as they are more likely to be homogeneous
What is gain ratio?
Gain ratio reduces the bias of information gain by normalising against the split info
GR(Attr | Node) = IG(Attr | Node) / SI(Attr | Node)
It discourages the selection of attributes with many uniformly distributed values
What is split info?
Split info is the entropy of a given split.
SI(Attr | Node) = H(Attr) = sum from i = 1 to N of (-P(x_i) * log2(P(x_i))), where P(x_i) is the fraction of instances sent to branch i
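Putting split info and gain ratio together as a Python sketch (same illustrative class-count representation as assumed above; not a specific library's API):

```python
import math


def entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gain_ratio(parent_counts, child_counts):
    """GR(Attr | Node) = IG(Attr | Node) / SI(Attr | Node).

    SI is the entropy of the split proportions themselves, so attributes
    that fragment the data into many small branches are penalised.
    """
    total = sum(parent_counts)
    h_parent = entropy([c / total for c in parent_counts])
    mean_info = sum(
        (sum(c) / total) * entropy([n / sum(c) for n in c])
        for c in child_counts
    )
    ig = h_parent - mean_info
    si = entropy([sum(c) / total for c in child_counts])
    return ig / si


# Split [4, 4] into [4, 0] and [2, 2]: IG = 0.5, SI = 1.0, so GR = 0.5
print(gain_ratio([4, 4], [[4, 0], [2, 2]]))
```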
When does ID3 stop?
ID3 stops when all instances at a node are of the same class, since the information gain of any further split is then 0. It can be modified to stop early once a threshold is reached
What are the characteristics of ID3?
It searches a space of hypotheses for one that fits the training data
The space searched is the set of possible decision trees
Performs a greedy simple-to-complex search through the hypothesis space with no backtracking
What are the pros of decision trees?
Strong, basic, supervised learner
Fast to train and fast to classify
Very transparent
What are the cons of decision trees?
Prone to overfitting
Loss of information for continuous variables
Complex calculation if there are many variables
No guarantee to return global optimum
What are alternatives to ID3?
Oblivious trees - require the same attribute at every node in a layer
Random tree - uses a random set of attributes