Lesson 5 - Decision Trees Flashcards

1
Q

What are decision trees?

A
  • tree representation of rules and decision functions
    • can be reduced to a series of if-then rules (comprehensible to humans)
  • medical, biological and financial applications
  • Composed of (see the sketch below):
    • root, the starting point
    • inner nodes, each testing some attribute
    • branches, a possible value (discrete) or range of values (continuous) that the parent node's attribute can assume
    • leaf nodes, the classification
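As a rough illustration (my addition, not the lesson's notation), this structure could be represented in Python roughly as follows; the class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    """Illustrative decision tree node; names are assumptions, not from the lesson."""
    attribute: Optional[str] = None                             # attribute tested by an inner node (None for leaves)
    branches: Dict[str, "Node"] = field(default_factory=dict)   # attribute value -> child node
    label: Optional[str] = None                                 # classification, set only on leaf nodes

# root testing "Outlook"; each branch carries one possible value, leaves carry the class
tree = Node(attribute="Outlook", branches={
    "Sunny":    Node(label="No"),
    "Overcast": Node(label="Yes"),
    "Rain":     Node(label="Yes"),
})
```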
2
Q

How to transform decision trees into Boolean functions?

A
  • paths from the root to a leaf are conjunctions of constraints on attribute values
  • paths that lead to the same classification encode a disjunction
  • together they define a formula in Disjunctive Normal Form (DNF), one for each class (sketched in Python below)
    • (A=1 and B=3) or (C=2) or (C=4 and D=1)
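A hedged sketch of this reading in Python, using an illustrative nested-dict tree (attributes and values are made up): each root-to-leaf path yields one conjunction, and the paths sharing a label form that class's disjunction.

```python
# Tree as nested dicts: {attribute: {value: subtree_or_label}}; a plain string is a leaf label.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def paths_to_rules(tree, conditions=()):
    """Collect (conjunction_of_conditions, label) pairs, one per root-to-leaf path."""
    if isinstance(tree, str):                      # leaf: the accumulated conditions form one conjunction
        return [(conditions, tree)]
    (attribute, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules += paths_to_rules(subtree, conditions + ((attribute, value),))
    return rules

# The disjunction of all conjunctions leading to the same class is that class's DNF formula.
for conds, label in paths_to_rules(tree):
    print(" and ".join(f"{a}={v}" for a, v in conds), "->", label)
```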
3
Q

When are decision trees particularly useful?

A
  • instances are attribute-value pairs
    • fixed set of attributes and values
    • few possible attribute values
    • discrete attribute values [can also be continuous]
  • discrete target output values (2 or more classes)
  • target can be described by disjunctions of boolean functions
  • training data may contain noise and missing values
  • decision tree learning algorithms are efficient and still relevant
4
Q

ID3 learning algorithm

A
  • 1986, most popular decision tree learning algorithm
    • top-down, greedy search through the space of possible DTs
  • ID3(S, A), with S the training samples and A the available attributes (sketched in Python below):
  • Create a root node T
  • If the examples in S are all of the same class c, return the leaf node T with label c
  • If A is empty, return the leaf node T with label the majority class in S
  • Select a ∈ A, the optimal attribute in A (details later)
  • Partition S according to the values that a can take: S_{a=v1}, …, S_{a=vm}, where m is the number of distinct values of attribute a
  • Return the tree T having as sub-trees the trees obtained by recursively calling ID3(S_{a=vj}, A − {a}), for each j
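A minimal, self-contained Python sketch of this recursion, assuming each sample is a dict of attribute values plus a "class" key; the entropy-based gain used to pick the optimal attribute anticipates the next cards, and all names are illustrative.

```python
import math
from collections import Counter

def entropy(samples):
    counts = Counter(s["class"] for s in samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(samples, attr):
    total = len(samples)
    remainder = 0.0
    for value in {s[attr] for s in samples}:
        subset = [s for s in samples if s[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(samples) - remainder

def id3(samples, attributes):
    classes = {s["class"] for s in samples}
    if len(classes) == 1:                        # all samples share one class -> leaf with that label
        return classes.pop()
    if not attributes:                           # no attribute left -> leaf with the majority class
        return Counter(s["class"] for s in samples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(samples, a))
    tree = {best: {}}
    for value in {s[best] for s in samples}:     # one branch per observed value of the chosen attribute
        subset = [s for s in samples if s[best] == value]
        tree[best][value] = id3(subset, attributes - {best})
    return tree

# e.g. id3(samples, {"Outlook", "Wind"}) returns a nested dict {attribute: {value: subtree-or-label}}
```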
5
Q

How does ID3 choose the optimal attribute?

A
  • uses entropy and information gain
    • entropy measures the degree of impurity of a sample
    • information gain is the expected reduction of the entropy obtained when
      partitioning the samples according to the value of attribute a
  • alternative criteria are (sketched below):
    • Cross-Entropy
    • Gini Index
    • Misclassification
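For concreteness, a sketch of these impurity measures on a list of class proportions (function names are mine; cross-entropy is not shown separately).

```python
import math

def entropy(p):              # −Σ p_c · log2(p_c); 0 for a pure node, maximal for a uniform one
    return -sum(pc * math.log2(pc) for pc in p if pc > 0)

def gini_index(p):           # 1 − Σ p_c²
    return 1 - sum(pc ** 2 for pc in p)

def misclassification(p):    # 1 − max_c p_c, error of always predicting the majority class
    return 1 - max(p)

proportions = [0.5, 0.5]     # maximally impure binary node
print(entropy(proportions), gini_index(proportions), misclassification(proportions))  # 1.0 0.5 0.5
```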
6
Q

What are the problems of decision trees?

A
  • information gain overly favors attributes that can assume many possible values
  • overfitting can occur
    • mitigated by setting a minimal number of examples to accept in leaf nodes or by limiting the depth of the tree (pruning); see the scikit-learn example below
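As a point of reference (scikit-learn, not the lesson's ID3), these mitigations map onto standard hyperparameters; the values below are arbitrary.

```python
from sklearn.tree import DecisionTreeClassifier

# pre-pruning controls against overfitting
clf = DecisionTreeClassifier(
    criterion="entropy",    # information-gain-style splitting
    max_depth=4,            # limit the depth of the tree
    min_samples_leaf=5,     # minimal number of examples accepted in a leaf node
)
```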
7
Q

Entropy

A
  • E(S) = −∑_{c=1..C} p_c · log₂(p_c)
  • p_c = |S_c| / |S|
    • S samples
    • S_c subset of S with class c
    • C number of classes
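A standard worked example (my addition): for a binary sample with 9 positive and 5 negative examples, E(S) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) ≈ 0.940, while a pure sample has entropy 0 and a 50/50 sample has entropy 1.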
8
Q

Information Gain

A
  • G(S, a) = E(S) − ∑_{v∈V(a)} (|S_{a=v}| / |S|) · E(S_{a=v})
    • S samples
    • a attribute
    • V(a) set of values that a can assume
    • S_{a=v} subset of S where attribute a has value v
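Continuing the worked example from the entropy card (my addition): splitting the 9+/5− sample on a binary attribute whose first value covers 6+/2− and whose second value covers 3+/3− gives G = 0.940 − (8/14)·0.811 − (6/14)·1.0 ≈ 0.048.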
9
Q

What is the inductive bias of decision trees?

A
  • decision tree learning can be seen as a search in a hypothesis space
  • shorter trees are preferred over larger trees
  • attributes with high information gain are placed closer to the root
10
Q

How is the value for real-valued attributes selected?

A
  • select the threshold that provides the maximal information gain for the attribute (sketched below)
    • the optimal threshold is always located between two adjacent values with a different classification
    • so the gain is computed only for this small set of candidate values
  • the resulting gain is compared with the gains of the other attributes to choose the optimal attribute
  • real-valued attributes can be used multiple times on the same path (discrete ones cannot)
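A hedged sketch of this procedure for a single continuous attribute, with illustrative data: candidate thresholds are the midpoints between adjacent, differently-classified values, and the one with maximal information gain is kept.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) maximising information gain for a continuous attribute."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 == c2:                       # candidates lie only between differently-classified values
            continue
        t = (v1 + v2) / 2
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        gain = base - len(left) / len(pairs) * entropy(left) - len(right) / len(pairs) * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# the class changes around 54 and 85, so only those two midpoints are evaluated
print(best_threshold([40, 48, 60, 72, 80, 90], ["No", "No", "Yes", "Yes", "Yes", "No"]))
```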
11
Q

How are missing values treated?

A
  • Different solutions, based on the other samples (the first two are sketched below):
    • use the most frequent value for the attribute
    • use the most frequent value for the attribute among samples of the same class
    • consider all possible values with their probability of occurrence, replacing the sample with one fractional sample per value, weighted by that probability (fractions multiply along a path)
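A sketch of the first two strategies (helper names and data are mine; the fractional-sample strategy is not shown).

```python
from collections import Counter

def most_frequent(samples, attr):
    """Most frequent non-missing value of attr in samples."""
    values = [s[attr] for s in samples if s[attr] is not None]
    return Counter(values).most_common(1)[0][0]

def impute(samples, attr, by_class=False, target="class"):
    for s in samples:
        if s[attr] is None:
            # pool: all samples, or only those of the same class as the incomplete sample
            pool = [t for t in samples if t[target] == s[target]] if by_class else samples
            s[attr] = most_frequent(pool, attr)

data = [{"Wind": "Weak",   "class": "Yes"},
        {"Wind": "Strong", "class": "No"},
        {"Wind": None,     "class": "No"}]
impute(data, "Wind", by_class=True)   # the missing value becomes "Strong"
print(data)
```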
12
Q

Reduced Error Pruning

A
  • Pruning -> replace a subtree with a leaf labelled with its most frequent label (sketched below)
    • split the training set into a training set and a validation set
    • evaluate each inner node on the validation set while simulating that its subtree is pruned
    • prune the node whose removal gives the best performance
    • repeat until performance gets worse
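A self-contained sketch of this loop on a nested-dict tree ({attribute: {value: subtree-or-label}}); the representation and helper names are my assumptions.

```python
import copy
from collections import Counter

def predict(tree, sample):
    while isinstance(tree, dict):
        (attr, branches), = tree.items()
        tree = branches[sample[attr]]
    return tree

def accuracy(tree, data):
    return sum(predict(tree, s) == s["class"] for s in data) / len(data)

def inner_node_paths(tree, path=()):
    """Paths (tuples of (attribute, value) steps) leading to every inner node."""
    if not isinstance(tree, dict):
        return []
    (attr, branches), = tree.items()
    paths = [path]
    for value, subtree in branches.items():
        paths += inner_node_paths(subtree, path + ((attr, value),))
    return paths

def leaf_labels(tree):
    if not isinstance(tree, dict):
        return [tree]
    (attr, branches), = tree.items()
    return [label for subtree in branches.values() for label in leaf_labels(subtree)]

def pruned_copy(tree, path):
    """Copy of tree where the inner node at `path` is replaced by its most frequent leaf label."""
    new = copy.deepcopy(tree)
    node, parent, key = new, None, None
    for attr, value in path:
        parent, key = node, value
        (_, branches), = node.items()
        node = branches[value]
    label = Counter(leaf_labels(node)).most_common(1)[0][0]
    if parent is None:                      # pruning the root collapses the tree to a single leaf
        return label
    (_, branches), = parent.items()
    branches[key] = label
    return new

def reduced_error_pruning(tree, validation):
    while isinstance(tree, dict):
        base = accuracy(tree, validation)
        best_acc, best_path = max(
            (accuracy(pruned_copy(tree, path), validation), path)
            for path in inner_node_paths(tree)
        )
        if best_acc < base:                 # stop once the best possible prune hurts validation accuracy
            break
        tree = pruned_copy(tree, best_path)
    return tree
```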
13
Q

Post-Rule Pruning

A
  • Transform the decision tree into a set of rules and perform rule pruning (sketched below)
    • one rule is generated for each root-to-leaf path (if-then) [this changes the hypothesis space]
    • for each rule a performance is estimated (using the rule as a classifier)
    • preconditions whose removal increases performance (on a validation set or by a statistical test) are removed, with a greedy approach
    • rules are sorted in descending order of performance [classification follows this order: the first rule whose preconditions match is used, otherwise the default -> most frequent class]
  • Generally improves performance and works better than Reduced-Error Pruning
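A hedged sketch of pruning the preconditions of a single rule against a validation set (the rule representation, data, and stopping rule are my assumptions; sorting the pruned rules by performance is not shown).

```python
def rule_accuracy(conditions, label, data):
    """Accuracy of the rule, seen as a classifier, on the samples its preconditions cover."""
    covered = [s for s in data if all(s[a] == v for a, v in conditions)]
    if not covered:
        return 0.0
    return sum(s["class"] == label for s in covered) / len(covered)

def prune_rule(conditions, label, validation):
    conditions = list(conditions)
    improved = True
    while improved and conditions:
        improved = False
        best = rule_accuracy(conditions, label, validation)
        for cond in list(conditions):
            trial = [c for c in conditions if c != cond]         # greedily try dropping one precondition
            if rule_accuracy(trial, label, validation) > best:   # keep the drop if performance increases
                conditions, best, improved = trial, rule_accuracy(trial, label, validation), True
    return conditions

# rule: IF Outlook=Sunny AND Humidity=High THEN No
validation = [{"Outlook": "Sunny", "Humidity": "High",   "class": "Yes"},
              {"Outlook": "Sunny", "Humidity": "Normal", "class": "No"},
              {"Outlook": "Sunny", "Humidity": "Normal", "class": "No"}]
print(prune_rule([("Outlook", "Sunny"), ("Humidity", "High")], "No", validation))
# the Humidity=High precondition is dropped because removing it improves validation accuracy
```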