classification (decision trees) Flashcards

1
Q

What is the goal of decision trees?

A

To create a model that predicts the value of a target variable by learning simple decision rules inferred from features

2
Q

What are decision trees?

A

A non-parametric supervised learning method used for classification

3
Q

What are benefits of decision trees?

A
  • don’t need as much data and give the rules they learned (interpretable)
  • make no underlying assumptions about the data distribution
  • handle multidimensional data
  • achieve good accuracy
4
Q

What is the splitting criterion?

A

Tells us which attribute to test at node N by determining the “best” way to separate or partition tuples into individual classes

5
Q

What does it mean for a partition to be pure?

A

If all tuples belong to the same class

6
Q

What does information gain do?

A

Chooses the attribute that minimizes the information needed to classify tuples in the resulting partitions, i.e. the split reflecting the least randomness or “impurity”
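
In formula terms (the standard entropy definition, not spelled out on the card): Info(D) = −Σᵢ pᵢ log₂(pᵢ), where pᵢ is the proportion of tuples in D belonging to class i; information gain picks the attribute that minimizes the weighted version of this quantity over the resulting partitions, Info_A(D) = Σⱼ (|Dⱼ|/|D|) Info(Dⱼ).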

7
Q

(T/F): Info gain guarantees that a simple tree is found

A

True (it guarantees a simple, though not necessarily the simplest, tree)

8
Q

What does Gain(A) tell us?

A

Tells us how much would be gained by branching on A. It is the difference between original info required and new info required.
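
A minimal sketch in Python of how Gain(A) could be computed; the class counts below are hypothetical, chosen only for illustration:

    import math

    def entropy(counts):
        # Info(D) = -sum(p_i * log2(p_i)) over the class proportions
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    def info_after_split(partitions):
        # Info_A(D): entropy of each partition, weighted by its share of the tuples
        total = sum(sum(p) for p in partitions)
        return sum((sum(p) / total) * entropy(p) for p in partitions)

    D = [9, 5]                             # class counts before splitting (hypothetical)
    partitions = [[2, 3], [4, 0], [3, 2]]  # class counts in each branch of A (hypothetical)
    gain = entropy(D) - info_after_split(partitions)  # Gain(A) = Info(D) - Info_A(D)
    print(round(gain, 3))                  # ~0.247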

9
Q

How can you determine the “best” split point for A if it is CONTINUOUS-VALUED?

A
  • sort the values of A in increasing order
  • consider the midpoint between each pair of adjacent values, (aᵢ + aᵢ₊₁) / 2, as a candidate split point (see the sketch below)
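
A minimal sketch (Python, made-up values) of generating the candidate split points:

    values = sorted([4.0, 2.0, 3.0, 7.0, 3.0])
    # midpoint between each pair of distinct adjacent values: (a_i + a_{i+1}) / 2
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:]) if a != b]
    print(candidates)  # [2.5, 3.5, 5.5]
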
10
Q

(T/F): For info gain, pick the feature and split that MAXIMIZES weighted entropy

A

False; pick the feature and split that MINIMIZES weighted entropy

11
Q

What kind of DT is info gain used in?

A

ID3

12
Q

What kind of DT is the Gini index used in?

A

CART

13
Q

What is Gini Index?

A

An alternative measure of impurity of class labels among a particular set of tuples
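
In formula terms, Gini(D) = 1 − Σᵢ pᵢ²; a minimal Python sketch with hypothetical class counts:

    def gini(counts):
        # Gini(D) = 1 - sum(p_i^2) over the class proportions
        total = sum(counts)
        return 1 - sum((c / total) ** 2 for c in counts)

    print(gini([9, 5]))   # ~0.459 for a 9-vs-5 class split
    print(gini([7, 7]))   # 0.5 -> maximum impurity for a binary label
    print(gini([14, 0]))  # 0.0 -> pure partition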

14
Q

How many possible ways are there to form 2 partitions of the data based on a binary split on A?

A

2^v − 2 ways, where v is the number of distinct values of A (every subset of values except the full set and the empty set, which would not produce a valid split)
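
For example (hypothetical attribute), if A = income has v = 3 values {low, medium, high}, there are 2³ − 2 = 6 candidate subsets: {low}, {medium}, {high}, {low, medium}, {low, high}, {medium, high}; each subset and its complement define the two branches of the split.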

15
Q

(T/F): We want to MINIMIZE the gini index

A

True

16
Q

How do you pick the splitting subset for DISCRETE-VALUED attribute using Gini index?

A

Subset that gives minimum Gini index for that attribute

17
Q

How do you pick the splitting subset for CONTINUOUS-VALUED attribute using Gini index?

A
  • consider each possible split point
  • the midpoint between each pair of sorted adjacent values is taken as a possible split point (see the sketch below)
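
A minimal sketch (Python, made-up values and labels) of scoring each candidate midpoint by the weighted Gini index of the two partitions it produces:

    def gini(labels):
        n = len(labels)
        return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    values = [2.0, 3.0, 4.0, 7.0, 8.0]
    labels = ["no", "no", "yes", "yes", "yes"]

    best = None
    for a, b in zip(sorted(values), sorted(values)[1:]):
        split = (a + b) / 2
        left = [l for v, l in zip(values, labels) if v <= split]
        right = [l for v, l in zip(values, labels) if v > split]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if best is None or weighted < best[1]:
            best = (split, weighted)

    print(best)  # (3.5, 0.0) -> the purest binary split for this toy data
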
18
Q

What is the range of the Gini index for a BINARY label?

A

[0, 0.50]
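
A quick check of the endpoints: with a 50/50 binary split, Gini = 1 − (0.5² + 0.5²) = 0.50; with a pure partition, Gini = 1 − 1² = 0.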

19
Q

What does maximum purity mean?

A

Every instance has the same label (in the binary case)

20
Q

What does maximum impurity mean?

A

50% of data are one label and 50% are another

21
Q

Explain the bias of info gain.

A

Biased towards multivalued attributes

22
Q

Explain the bias of the gain ratio.

A

Adjusts for the bias of information gain, but tends to prefer unbalanced splits in which one partition is much smaller than the others
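
For reference, the normalization it applies (the standard C4.5 definition): GainRatio(A) = Gain(A) / SplitInfo_A(D), where SplitInfo_A(D) = −Σⱼ (|Dⱼ|/|D|) log₂(|Dⱼ|/|D|) penalizes attributes that split the data into many partitions.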

23
Q

Explain the bias of Gini index.

A

Biased towards multivalued attributes and has difficulty when the number of classes is large. It also favors tests that result in equal-sized partitions with purity in both partitions.

24
Q

What are the 3 partitioning scenarios?

A

1) A is discrete-valued
2) A is continuous-valued
3) A is discrete-valued and a binary tree must be produced

25
Q

Why is pruning performed?

A

To address overfitting

26
Q

What is pre-pruning?

A

The tree is “pruned” by halting its construction early

27
Q

What are 2 examples of pre-pruning?

A
  • max depth
  • min leaf size
28
Q

What does controlling for min leaf size do?

A

The larger the minimum leaf size, the less the tree overfits, because it won’t create rules for leaves smaller than that threshold. The trade-off is that it may miss more detailed rules that would still generalize.
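
A hedged sketch of both pre-pruning controls using scikit-learn's DecisionTreeClassifier; the iris dataset is only a stand-in for whatever training data the cards assume:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # max_depth caps how deep the tree may grow; min_samples_leaf refuses any
    # split that would create a leaf with fewer than 5 training tuples.
    clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
    clf.fit(X, y)
    print(clf.get_depth(), clf.get_n_leaves())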

29
Q

What is post-pruning?

A

Removes rules (subtrees) after the decision tree algorithm has completed.

30
Q

Explain the cost-complexity ratio

A

This refers to post-pruning, where we try to minimize the increase in error caused by pruning while also minimizing the number of rules used (fewer rules, less complex)
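
Roughly, the quantity being traded off is error(T) + α × (number of leaf rules in T). As a hedged illustration, scikit-learn exposes this style of post-pruning through its ccp_alpha parameter; the iris dataset is again only a stand-in:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
    # Larger alpha -> more subtrees pruned -> fewer rules, lower complexity.
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-2])
    pruned.fit(X, y)
    print(pruned.get_n_leaves())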