Decision Tree Flashcards
Entropy
Measure of disorder that can be applied to a set.
Disorder corresponds to how mixed (impure) the segment is with respect to the properties of interest.
Entropy formula
entropy = −p1 × log(p1) − p2 × log(p2) − … − pn × log(pn), where each pi is the proportion of property (class) i within the set; log base 2 is the usual choice.
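A minimal Python sketch of this formula (the helper name and the sample labels are mine, not from the cards):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))   # 1.0 (maximally mixed)
print(entropy(["yes", "yes", "yes", "no"]))  # ~0.811 (less disorder)
```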
Information Gain
Measures the change in entropy due to any new information being added, e.g., the reduction in a set's entropy after splitting it on an attribute.
Information Gain formula
IG(parent, children) = entropy(parent) − [p(c1) × entropy(c1) + p(c2) × entropy(c2) + … + p(ck) × entropy(ck)], where p(ci) is the proportion of instances that fall into child ci.
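A sketch of the same computation (the entropy helper is repeated so the snippet runs standalone; names and numbers are mine):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(information_gain(parent, children))  # ~0.278: the split reduced disorder
```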
Classification trees
Each interior node in the tree contains a test of an attribute, with each branch from the node representing a distinct value of the attribute; each leaf node holds a class label (the prediction).
Decision tree - Basic algorithm (a greedy algorithm)
The tree is built top-down: test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain), the data are partitioned on the chosen attribute, and the process repeats on each partition (see the sketch after the stopping-criteria card below).
When does decision tree construction stop?
There are no remaining attributes for further partitioning.
All samples for a given node belong to the same class.
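The following is a minimal ID3-style sketch of the greedy algorithm and both stopping rules, assuming categorical attributes stored as dicts; the toy data and all names are mine:

```python
from collections import Counter
from math import log2

def entropy(rows, target):
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def info_gain(rows, attr, target):
    n = len(rows)
    gain = entropy(rows, target)
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        gain -= len(subset) / n * entropy(subset, target)
    return gain

def build_tree(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:   # stop: all samples at this node share one class
        return labels[0]
    if not attrs:               # stop: no attributes left, use the majority class
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain.
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    return {(best, v): build_tree([r for r in rows if r[best] == v], remaining, target)
            for v in {r[best] for r in rows}}

rows = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
print(build_tree(rows, ["outlook", "windy"], "play"))
# splits on 'windy' (higher gain); both children are pure, so they become leaves
```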
Information Gain drawback
biased towards multivalued attributes
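To see the bias concretely: an ID-like attribute with a unique value per record makes every child pure, so it maximizes gain while generalizing terribly. A small sketch (names and data are mine):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

labels    = ["yes", "no", "yes", "no", "yes", "no"]
record_id = [1, 2, 3, 4, 5, 6]           # unique value per record
windy     = ["n", "n", "n", "y", "y", "y"]
print(info_gain(record_id, labels))  # 1.0: maximal (every child is pure)
print(info_gain(windy, labels))      # ~0.082: modest, but it can generalize
```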
Gain ratio drawback
tends to prefer unbalanced splits in which one partition is much smaller than the others
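The reason, in C4.5's terms: gain ratio divides information gain by SplitInfo, the entropy of the partition sizes, and SplitInfo shrinks as a split becomes more unbalanced, inflating the ratio. A minimal sketch (the helper name is mine):

```python
from math import log2

def split_info(sizes):
    """Entropy of the partition proportions (C4.5's SplitInfo)."""
    n = sum(sizes)
    return -sum((s / n) * log2(s / n) for s in sizes)

# gain ratio = information gain / split info
print(split_info([50, 50]))  # 1.0: balanced split, large denominator
print(split_info([99, 1]))   # ~0.081: unbalanced split, small denominator,
                             # so even a modest gain yields a big ratio
```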
Gini Index
- biased towards multivalued attributes
- has difficulty when the number of classes is large
- tends to favor tests that result in equal-sized partitions and purity in both partitions
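The Gini formula itself is not stated on the card; a common form is Gini = 1 − Σ pi², which is 0 for a pure set. A minimal sketch (names and labels are mine):

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "yes", "yes"]))  # 0.0: pure
print(gini(["yes", "yes", "no", "no"]))    # 0.5: maximally mixed (2 classes)
```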