Lecture 4 - Tree models Flashcards
What are the three models based on trees
- decision trees
- random forest
- gradient boosting
What are the three parts of a decision tree
- root node
- internal/decision nodes
- leaf nodes
What is the max depth of a decision tree with d binary features?
d+1 levels: each feature is tested at most once on a root-to-leaf path, giving d decision levels plus the leaf level
What is the max number of leaves of a decision tree with d binary features?
2^d
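As a quick check of both formulas, a minimal sketch (assuming a full tree in which every binary feature is tested once on every path; the function name is illustrative):

```python
def full_tree_size(d):
    # d binary features -> d decision levels plus the leaf level,
    # and one leaf per assignment of the d binary feature values.
    levels = d + 1
    leaves = 2 ** d
    return levels, leaves

print(full_tree_size(3))  # → (4, 8)
```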
Decision tree
The set of literals at a node is called a ___
split
Each leaf of the tree represents a ___ ___, which is a conjunction of the literals encountered on the path from the root of the tree to the leaf
logical expression
Give 3 decision tree algorithms
- ID3
- C4.5
- CART
Growing a tree is recursive
true/false
true
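The recursive growing procedure can be sketched as follows (a minimal illustration, not the lecture's exact algorithm; it uses Gini impurity to pick the split, and the function names are my own):

```python
from collections import Counter

def gini(labels):
    # Gini impurity of a set of class labels: 1 - sum of squared class fractions.
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, depth=0, max_depth=3):
    # Base case: the node is pure or the depth limit is reached -> make a
    # leaf carrying the majority class label.
    if len(set(y)) == 1 or depth == max_depth:
        return ('leaf', Counter(y).most_common(1)[0][0])
    # Greedily pick the binary feature whose split gives the lowest
    # size-weighted impurity of the two children.
    best = None
    for f in range(len(X[0])):
        l = [y[i] for i, x in enumerate(X) if x[f] == 0]
        r = [y[i] for i, x in enumerate(X) if x[f] == 1]
        if not l or not r:
            continue  # this feature does not actually split the node
        score = (len(l) * gini(l) + len(r) * gini(r)) / len(y)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return ('leaf', Counter(y).most_common(1)[0][0])
    f = best[1]
    Xl = [x for x in X if x[f] == 0]
    yl = [y[i] for i, x in enumerate(X) if x[f] == 0]
    Xr = [x for x in X if x[f] == 1]
    yr = [y[i] for i, x in enumerate(X) if x[f] == 1]
    # Recursive case: grow each child subtree the same way.
    return ('node', f, grow_tree(Xl, yl, depth + 1, max_depth),
                       grow_tree(Xr, yr, depth + 1, max_depth))

# One binary feature that perfectly separates the classes:
print(grow_tree([[0], [1]], [0, 1]))  # → ('node', 0, ('leaf', 0), ('leaf', 1))
```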
How do we decide on the best split?
To assign the majority class label to leaves, we look for a clear separation between the two classes -> we measure the purity of the children produced by the split
Give 3 impurity metrics
- Minority class
- Entropy
- Gini index
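The three metrics can be written as functions of p, the fraction of positives at a binary node (a sketch; the 2p(1-p) scaling of the Gini index is one common convention):

```python
import math

def minority_class(p):
    # Fraction of examples misclassified by the majority label.
    return min(p, 1 - p)

def entropy(p):
    # Binary entropy; the limit x*log2(x) -> 0 as x -> 0 handles pure nodes.
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    # Gini index 1 - p^2 - (1-p)^2 = 2p(1-p) for a binary node.
    return 2 * p * (1 - p)

# All three are maximal at p = 0.5 (most impure) and zero at p = 0 or 1.
print(minority_class(0.5), entropy(0.5), gini(0.5))  # → 0.5 1.0 0.5
```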
Both the entropy and the Gini index are smooth and concave upper bounds of the training error. These properties can be advantageous in some situations
True
What is entropy
Entropy is the expected level of information, surprise, or uncertainty inherent in the n possible outcomes of an event
Entropy for a binary classification task:
H(p, 1-p) = -p log_2 p - (1-p) log_2 (1-p)
How do we assess if a split is useful at all?
In assessing the quality of a feature for splitting a parent node D into children D_1..D_j, it is customary to look at the purity gain: Imp(D) - sum_j (|D_j|/|D|) Imp(D_j)
Purity gain = original entropy - size-weighted average entropy of the children after splitting
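Using entropy as the impurity measure, the purity gain (information gain) can be sketched as (function names are my own):

```python
import math

def entropy(labels):
    # Entropy of a list of class labels.
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def purity_gain(parent, children):
    # Imp(D) minus the size-weighted impurity of the children D_1..D_j.
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# A split that separates the classes perfectly gains the full parent entropy:
print(purity_gain([0, 0, 1, 1], [[0, 0], [1, 1]]))  # → 1.0
```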
Finding the best split for a single node is recursive?
False - the best split at a node is chosen by a greedy local search over the features; it is growing the tree as a whole that is recursive