Classification (Decision Trees) Flashcards
What is the goal of decision trees?
To create a model that predicts the value of a target variable by learning simple decision rules inferred from features
What are decision trees?
A non-parametric supervised learning method used for classification
What are benefits of decision trees?
- requires relatively little data and yields interpretable rules that can be read directly from the tree
- makes no assumptions about the underlying data distribution
- handles multidimensional data
- achieves good accuracy
What is the splitting criterion?
Tells us which attribute to test at node N by determining the “best” way to separate or partition tuples into individual classes
What does it mean for a partition to be pure?
If all tuples belong to the same class
What does information gain do?
Choosing the attribute with the highest information gain minimizes the information needed to classify tuples in the resulting partitions and reflects the least randomness or “impurity” in those partitions
(T/F): Info gain guarantees that a simple tree is found
True (it finds a simple tree, though not necessarily the simplest one)
What does Gain(A) tell us?
Tells us how much would be gained by branching on A. It is the difference between the information required before the split and the information still required after partitioning on A.
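As a sketch of the standard (ID3-style) definitions, where D is the data partition, p_i is the fraction of tuples in class C_i, and attribute A splits D into partitions D_1, ..., D_v:

```latex
% Expected information (entropy) needed to classify a tuple in D
\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i

% Information still needed after partitioning D on attribute A
\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)

% Information gain: original requirement minus new requirement
\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)
```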
How can you determine the “best” split point for A if it is CONTINUOUS-VALUED?
- sort the values of A in increasing order
- consider the midpoint between each pair of adjacent values a_i and a_{i+1} as a candidate split point: (a_i + a_{i+1}) / 2
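A minimal sketch of the midpoint enumeration (the function name is illustrative):

```python
import numpy as np

def candidate_split_points(values):
    """Midpoints between adjacent sorted distinct values: (a_i + a_{i+1}) / 2."""
    v = np.unique(values)          # sorts and de-duplicates
    return (v[:-1] + v[1:]) / 2.0

# e.g. candidate_split_points([3, 1, 2, 2]) -> array([1.5, 2.5])
```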
(T/F): For info gain, pick the feature and split that MAXIMIZES weighted entropy
False; pick the feature and split that MINIMIZES weighted entropy
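A sketch of that selection rule for one continuous feature, assuming NumPy label/feature arrays and illustrative function names. Minimizing weighted entropy is equivalent to maximizing Gain(A), because Info(D) is fixed before the split:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(feature, labels):
    """Threshold that minimizes the weighted entropy of the two partitions."""
    best_t, best_h = None, float("inf")
    v = np.unique(feature)
    for t in (v[:-1] + v[1:]) / 2.0:          # candidate midpoints
        left, right = labels[feature <= t], labels[feature > t]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if h < best_h:
            best_t, best_h = t, h
    return best_t, best_h
```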
What kind of DT is info gain used in?
ID3
What kind of DT is the Gini index used in?
CART
What is Gini Index?
An alternative measure of impurity of class labels among a particular set of tuples
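In its standard (CART-style) form, with p_i the relative frequency of class i in D and a binary split dividing D into D_1 and D_2:

```latex
% Impurity of a partition D (0 when pure; at most 0.5 for a binary label)
\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2

% Weighted impurity after a binary split on attribute A
\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)

% Reduction in impurity; the split maximizing this
% (equivalently, minimizing Gini_A(D)) is chosen
\Delta\mathrm{Gini}(A) = \mathrm{Gini}(D) - \mathrm{Gini}_A(D)
```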
How many possible ways are there to form 2 partitions of the data based on a binary split on A?
2^v - 2 ways, where v is the number of distinct values of A; the full set and the empty set are excluded because neither produces a valid split. For example, with v = 3 values {a, b, c} there are 2^3 - 2 = 6 candidate subsets.
(T/F): We want to MINIMIZE the gini index
True
How do you pick the splitting subset for DISCRETE-VALUED attribute using Gini index?
Subset that gives minimum Gini index for that attribute
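A sketch of that subset search (illustrative function names, NumPy arrays assumed); it walks the candidate subsets from the 2^v - 2 count above, using the fact that a subset and its complement define the same split:

```python
from itertools import combinations
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_subset_split(attr, labels):
    """Value subset of a discrete attribute with minimum weighted Gini index."""
    values = list(np.unique(attr))
    best_subset, best_g = None, float("inf")
    # Sizes 1..v//2 reach every two-way partition at least once,
    # since a subset and its complement give the same split.
    for size in range(1, len(values) // 2 + 1):
        for subset in combinations(values, size):
            mask = np.isin(attr, list(subset))
            left, right = labels[mask], labels[~mask]
            g = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if g < best_g:
                best_subset, best_g = set(subset), g
    return best_subset, best_g
```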
How do you pick the splitting subset for CONTINUOUS-VALUED attribute using Gini index?
- consider each possible split point
- the midpoint between each pair of sorted adjacent values is taken as a possible split point
What is the range of the Gini index for a BINARY label?
[0, 0.50]
What does maximum purity mean?
Every instance has the same label (in the binary case)
What does maximum impurity mean?
50% of the data have one label and 50% have the other (for a binary label)
Explain the bias of info gain.
Biased towards multivalued attributes
Explain the bias of the gain ratio.
Adjusts for the multivalued bias of information gain, but prefers unbalanced splits in which one partition is much smaller than the others
Explain the bias of Gini index.
Biased towards multivalued attributes and has difficulty when the number of classes is large. It also favors tests that result in equal-size partitions with purity in both partitions.
What are the 3 partitioning scenarios?
1) A is discrete-valued
2) A is continuous-valued
3) A is discrete-valued and a binary tree
Why is pruning performed?
To address overfitting
What is pre-pruning?
The tree is “pruned” by halting its construction early, e.g., by deciding not to split a node further
What are 2 examples of pre-pruning?
- max depth
- min leaf size
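In scikit-learn, for instance, both of these map directly to constructor parameters of DecisionTreeClassifier (the dataset and values here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growing at depth 3, and never create a leaf
# containing fewer than 5 training samples.
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
clf.fit(X, y)
```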
What does controlling for min leaf size do?
The higher the minimum leaf size, the less the tree will overfit, because it will not create rules for groups smaller than that threshold. The trade-off is that it may miss more detailed rules that would have captured a genuinely generalizable pattern.
What is post-pruning?
Removes rules after the decision tree algorithm has completed.
Explain the cost-complexity ratio
This refers to post-pruning, where we try to minimize the increase in error caused by pruning while also minimizing the number of rules used (fewer rules, less complexity)
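scikit-learn, for example, implements this idea as minimal cost-complexity pruning through the ccp_alpha parameter; a sketch (the alpha value is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Inspect the effective alphas at which subtrees would be pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)

# A larger ccp_alpha penalizes each extra leaf more heavily, trading a
# small increase in error for a tree with fewer rules.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)
```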