Lecture 4 - Tree models Flashcards
What are the three tree-based models
- decision trees
- random forest
- gradient boosting
What are the three parts of a decision tree
- root node
- internal/decision nodes
- leaf nodes
what is the max depth of a decision tree with d binary features?
d, since each binary feature can be tested at most once on any root-to-leaf path (counting nodes, such a path has d+1 levels)
how many leaves can a decision tree with d binary features have at most
2^d (e.g. with d = 3 there are 2^3 = 8 possible leaves, one per combination of feature values)
Decision tree
The set of literals at a node is called a ___
split
each leaf of the tree represents a ___ ___, which is a conjunction of the literals encountered on the path from the root of the tree to the leaf
logical expression
Give 3 decision tree algorithms
- ID3
- C4.5
- CART (a usage sketch follows below)
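For reference, scikit-learn's DecisionTreeClassifier implements an optimised version of CART; a minimal usage sketch (assuming scikit-learn is installed):

```python
# Minimal sketch: growing a CART-style tree with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", random_state=0)  # "entropy" is also supported
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```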
Growing a tree is recursive
true/false
true
how do we decide the best split
Since the majority class label is assigned to each leaf, we look for a split that separates the two classes as cleanly as possible -> we maximise the purity of the children of the split
Give 3 impurity metrics
- Minority class
- Entropy
- Gini index (sketches of all three below)
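A minimal sketch of the three metrics for a binary node with positive-class proportion p (function names are my own):

```python
import math

def minority_class(p):
    # Proportion of the minority class: min(p, 1 - p).
    return min(p, 1 - p)

def entropy(p):
    # Binary entropy in bits: 0 for a pure node, 1 bit at p = 0.5.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    # Gini index of a binary node: 2p(1 - p).
    return 2 * p * (1 - p)

for p in (0.1, 0.5, 0.9):
    print(p, minority_class(p), round(entropy(p), 3), round(gini(p), 3))
```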
Both the entropy and the Gini index are smooth and concave upper bounds of the training error. These properties can be advantageous in some situations
True
What is entropy
Entropy is the average level of information, surprise, or uncertainty inherent in the n possible outcomes of an event
Entropy for a binary classification task:
H(p, 1-p) = -p log_2(p) - (1-p) log_2(1-p)
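For example: H(0.5, 0.5) = -0.5 log_2(0.5) - 0.5 log_2(0.5) = 1 bit (maximal uncertainty), while H(0.9, 0.1) ≈ 0.47 bits (a 90/10 node is much purer).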
How do we assess if a split is useful at all?
In assessing the quality of a feature for splitting a parent node D into children {D_1, ..., D_j}, it is customary to look at the purity gain: Imp(D) - Σ_j (|D_j|/|D|) Imp(D_j), i.e. the impurity of the parent minus the weighted average impurity of the children.
With entropy as the impurity measure: purity gain = original entropy - weighted entropy after splitting (the information gain)
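A minimal sketch of this computation (labels are assumed to be 0/1; weighting each child by its share of the examples is the standard convention):

```python
import math

def entropy(p):
    # Binary entropy; 0 for a pure node.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def purity_gain(parent, children):
    # Gain = Imp(D) - sum_j |D_j|/|D| * Imp(D_j).
    imp = lambda labels: entropy(sum(labels) / len(labels))
    n = len(parent)
    weighted = sum(len(c) / n * imp(c) for c in children)
    return imp(parent) - weighted

parent = [1, 1, 1, 1, 0, 0, 0, 0]
children = [[1, 1, 1, 0], [1, 0, 0, 0]]
print(round(purity_gain(parent, children), 3))  # ~0.189
```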
Finding the best split at a node is recursive?
False: the search for the best split at a single node is an exhaustive search over the candidate features; only growing the tree as a whole is recursive
How to prevent overfitting in decision trees. Give 2
- Limit the number of iterations of the algorithm leading to a tree with a bounded number of nodes.
- Prune the tree after it is built by removing weak branches (both strategies sketched below).
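In scikit-learn both strategies map onto constructor parameters; a hedged sketch (the parameter values here are arbitrary):

```python
from sklearn.tree import DecisionTreeClassifier

# Strategy 1: bound the tree while growing it.
bounded = DecisionTreeClassifier(max_depth=4, max_leaf_nodes=16, min_samples_leaf=5)

# Strategy 2: grow fully, then prune. sklearn ships cost-complexity
# pruning via ccp_alpha (a different criterion than reduced error
# pruning, but the same idea of removing weak branches).
pruned = DecisionTreeClassifier(ccp_alpha=0.01)
```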
How to do Reduced Error Pruning
- Starting at the leaves, each node is tentatively replaced with a leaf predicting the majority class
- If the prediction accuracy is not affected, the change is kept
- Keep a validation set and measure the pruned tree's performance on it; of course the pruning will not improve accuracy on the training set (see the sketch below)
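A minimal sketch of reduced error pruning on a toy tree representation (the Node class and helpers here are hypothetical, not from the lecture):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None   # index of the binary feature tested here
    left: Optional["Node"] = None   # subtree for feature value 0
    right: Optional["Node"] = None  # subtree for feature value 1
    majority: int = 0               # majority class of the training data at this node

def predict(node, x):
    if node.left is None:           # a node without children is a leaf
        return node.majority
    return predict(node.right if x[node.feature] else node.left, x)

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def prune(node, root, val_data):
    # Bottom-up: tentatively replace each internal node with a leaf
    # predicting its majority class; keep the change only if accuracy
    # on the validation set does not drop.
    if node.left is None:
        return
    prune(node.left, root, val_data)
    prune(node.right, root, val_data)
    before = accuracy(root, val_data)
    left, right = node.left, node.right
    node.left = node.right = None   # collapse to a leaf
    if accuracy(root, val_data) < before:
        node.left, node.right = left, right  # revert: pruning hurt

# Usage: prune(root, root, validation_set)
```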
What are the two sources of imbalance
- Asymmetric class distribution
- Asymmetric misclassification cost
What does adding more samples accomplish?
Adds data for training and increases the training time
It is possible that it does not make any difference
what is sqrt(gini)
√Gini is designed to minimise relative impurity and is therefore insensitive to changes in the class distribution, whereas Gini emphasises children covering more examples
Entropy and the Gini index are sensitive to fluctuations in the class distribution; √Gini is not. Under class imbalance we want distribution-insensitive impurities (numeric sketch below).
True
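A small numeric sketch of this insensitivity (the counts are made up). Scaling the positive class by a constant multiplies every split's weighted √Gini score by the same factor, so the comparison between candidate splits is unchanged, while the Gini comparison shifts:

```python
import math

def weighted_gini(children):
    # children: (pos, neg) counts per child; node Gini is 2p(1 - p).
    n = sum(p + q for p, q in children)
    return sum((p + q) / n * 2 * (p / (p + q)) * (q / (p + q)) for p, q in children)

def weighted_sqrt_gini(children):
    # sqrt(p(1-p)) per node simplifies to sqrt(pos * neg) / (pos + neg).
    n = sum(p + q for p, q in children)
    return sum((p + q) / n * math.sqrt(p * q) / (p + q) for p, q in children)

split_a = [(4, 1), (1, 4)]
split_b = [(5, 2), (0, 3)]
for scale in (1, 10):  # multiply every positive count by 10
    a = [(p * scale, q) for p, q in split_a]
    b = [(p * scale, q) for p, q in split_b]
    print(weighted_gini(a) / weighted_gini(b),            # ratio changes with scale
          weighted_sqrt_gini(a) / weighted_sqrt_gini(b))  # ratio stays constant
```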
In regression trees we can replace imp with ____
variance
What is weighted variance
If a split partitions the set of target values Y into mutually exclusive sets {Y_1, ..., Y_j}, the weighted variance is Σ_j (|Y_j|/|Y|) Var(Y_j): each child's variance weighted by the fraction of examples it covers
The variance of a Boolean variable with success probability p is p(1-p), which is half of the Gini index 2p(1-p). So we could interpret the goal of tree learning as minimising the class variance in the leaves.
true
In regression trees our goal is
to find a split that minimises the weighted average of the variance of the child nodes
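A minimal sketch of this criterion (function names are my own):

```python
def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def weighted_variance(children):
    # children: the target values {Y_1, ..., Y_j} falling into each child;
    # each child's variance is weighted by its share of the examples.
    n = sum(len(ys) for ys in children)
    return sum(len(ys) / n * variance(ys) for ys in children)

# The best split minimises this quantity:
print(weighted_variance([[1.0, 1.2, 0.9], [5.0, 5.1, 4.8]]))  # low: tight children
print(weighted_variance([[1.0, 5.1, 0.9], [5.0, 1.2, 4.8]]))  # high: mixed children
```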