Decision Trees Flashcards
Things that make decision trees unique
supervised learning
batch processing of training examples
uses a preference bias
Decision tree non-leaf node
associated with an attribute/feature
Decision tree leaf node
associated with a classification
Decision tree arc
associated with one of the possible values of the parent node's attribute
How does a decision tree work?
The attribute at the root poses a question; the answer is the value of that attribute in the input example; the answer determines which child to move to; repeat until a leaf is reached (the class label at the leaf is the classification given to the input example)
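A minimal sketch of this walk in Python, under assumed conventions (not from the source): a Node holds either an attribute plus one child per attribute value, or a class label, and an example is a dict mapping attribute names to values.

    class Node:
        """Internal nodes carry an attribute and one child per value (the arcs);
        leaf nodes carry only a class label."""
        def __init__(self, attribute=None, children=None, label=None):
            self.attribute = attribute      # question asked at this node
            self.children = children or {}  # one arc per possible value
            self.label = label              # classification (leaves only)

    def classify(node, example):
        # Repeat the question/answer step until a leaf is reached.
        while node.label is None:
            value = example[node.attribute]  # answer from the input example
            node = node.children[value]      # follow the matching arc
        return node.label                    # leaf's label = classification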
Ockham’s Razor
Preference bias; the smallest decision tree that correctly classifies all training examples is best
Decision Tree Construction (greedy) aka ID3, C5.0
- select the best attribute to use for a new node at the current level
- partition the examples using the possible values of this attribute, assign each subset to the appropriate child node, and recursively generate child nodes until all examples at a node have the same label (see the sketch after this list)
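A sketch of this greedy recursion, reusing the hypothetical Node class above; it assumes attributes is a set of attribute names, examples are (attribute-dict, label) pairs, and a best_attribute chooser exists (one candidate, max gain, is sketched under the next card). Names are illustrative, not ID3's actual code.

    from collections import Counter

    def build_tree(examples, attributes):
        labels = [lab for _, lab in examples]
        if len(set(labels)) == 1 or not attributes:
            # Pure subset (or no attributes left): make a leaf,
            # breaking any ties by majority label.
            return Node(label=Counter(labels).most_common(1)[0][0])
        attr = best_attribute(examples, attributes)      # greedy choice
        children = {}
        for value in {ex[attr] for ex, _ in examples}:   # partition by value
            subset = [(ex, lab) for ex, lab in examples if ex[attr] == value]
            children[value] = build_tree(subset, attributes - {attr})
        return Node(attribute=attr, children=children)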
How to select the best attribute to construct the best tree?
Candidate heuristics: random choice; the attribute with the fewest values; the attribute with the most values; the attribute with the maximum information gain
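A hedged sketch of the max-gain heuristic only: choose the attribute whose split most reduces entropy (H, defined on the next cards). Function names here are my own, not from the source.

    import math
    from collections import Counter

    def entropy(examples):
        # H(Y) = -sum_i p_i * log2(p_i) over the class label distribution.
        total = len(examples)
        counts = Counter(label for _, label in examples)
        return -sum(n / total * math.log2(n / total) for n in counts.values())

    def information_gain(examples, attr):
        # Entropy before the split minus the size-weighted entropy after it.
        remainder = 0.0
        for value in {ex[attr] for ex, _ in examples}:
            subset = [(ex, lab) for ex, lab in examples if ex[attr] == value]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(examples) - remainder

    def best_attribute(examples, attributes):
        # "max gain": the attribute with the largest information gain wins.
        return max(attributes, key=lambda a: information_gain(examples, a))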
Information Value
Given a set S of size |S|, the expected work required to determine a specific element is log2 |S| (e.g., picking out one of 8 equally likely elements takes log2 8 = 3 yes/no questions)
Entropy Interpretation
The number of yes/no questions (in bits) needed on average to determine the value of Y in a random drawing
Entropy H(Y)
H measures the information content (in bits) associated with a set of examples: H(Y) = -sum over classes i of P(Y = y_i) * log2 P(Y = y_i); its minimum value is 0
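A quick numeric check using the entropy helper sketched earlier; with two classes the maximum is log2 2 = 1 bit, which previews the next cards on balance and homogeneity.

    pure  = [({}, "yes")] * 8                     # perfect homogeneity
    mixed = [({}, "yes")] * 4 + [({}, "no")] * 4  # perfect balance
    print(entropy(pure))   # 0.0 bits: no uncertainty (may print -0.0)
    print(entropy(mixed))  # 1.0 bit = log2(2), the maximum for 2 classes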
Bit
the information needed to answer a yes/no question; a real-valued scalar
Perfect balance (maximum inhomogeneity)
high entropy; Y is drawn from a (nearly) uniform distribution
Perfect homogeneity
low entropy; Y is drawn from a highly non-uniform distribution (sharp peaks and valleys)
Max value of H
log2 c, where c is the number of classes (e.g., 2 bits for c = 4 equally likely classes)