Decision Tree Flashcards
What is a Decision Tree?
A model built as nested if-else conditions: each internal node tests a feature, and each leaf holds a prediction.
How do decision trees work?
Decision trees use hyperplanes that run parallel to one of the axes to cut the coordinate space into hyper-cuboids; data points are classified according to the cuboid they fall into.
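A minimal sketch of this, assuming scikit-learn and its bundled iris dataset: every printed rule compares one feature to a threshold, i.e. an axis-parallel cut of the feature space.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each rule tests a single feature against a threshold: an axis-parallel cut.
print(export_text(tree, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]))
```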
Advantages of Decision Tree
- Minimal data preparation is required (no normalization or standardization).
- Prediction time is logarithmic in the number of training samples.
- Interpretability
Disadvantages of Decision Trees?
- Overfitting
- Prone to errors on imbalanced datasets.
- Computationally expensive when columns are numerical, since every candidate threshold must be evaluated.
CART
Classification and Regression Trees
What is entropy?
Entropy is a measure of randomness; it quantifies the purity/impurity of the data.
E(S) = -Σ p_i * log2(p_i), where p_i is the proportion of class i (log base 2, or base e).
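A small sketch of this formula in code (NumPy assumed; the helper name `entropy` is our own):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a list of class labels (log base 2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(["yes", "yes", "no", "no"]))  # 1.0 -> maximally impure for 2 classes
print(entropy(["yes", "yes", "yes"]))       # -0.0 -> pure
```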
How to calculate entropy for numerical data?
We can plot the distribution of the numerical data points.
The flatter (less peaked, more horizontally spread) the distribution, the higher its entropy.
Information Gain?
Measures the quality of a split.
Information gain is the decrease in entropy after the dataset is split on an attribute.
The goal is to choose, at each node, the split that gives the highest information gain.
IG = E(parent) - Σ (n_child / n_parent) * E(child)
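A sketch of this formula (NumPy assumed; `entropy` is re-defined so the snippet runs on its own):

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """IG = E(parent) - sum(|child| / |parent| * E(child))."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["yes"] * 5 + ["no"] * 5
left   = ["yes"] * 4 + ["no"] * 1
right  = ["yes"] * 1 + ["no"] * 4
print(information_gain(parent, [left, right]))  # about 0.278
```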
Why is Decision Tree a greedy approach?
A decision tree is built by a recursive greedy search in top-down fashion: at every level it picks the split with the highest information gain, without ever revisiting earlier choices.
What is the Entropy of leaf node?
Zero (in a fully grown tree, each leaf is pure, containing samples of only one class).
What is Gini Impurity?
The probability of misclassifying a randomly chosen element in a set. It is used to decide the optimal split for a decision tree.
GI = 1 - Σ (p_i)^2
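The same idea in code (NumPy assumed; `gini_impurity` is our own helper name):

```python
import numpy as np

def gini_impurity(labels):
    """GI = 1 - sum(p_i^2) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(["yes", "yes", "no", "no"]))  # 0.5 -> maximum for 2 classes
print(gini_impurity(["yes", "yes", "yes"]))       # 0.0 -> pure
```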
Why is Gini impurity preferred over entropy?
Gini impurity is computationally cheaper than entropy because it avoids the logarithm.
However, for certain kinds of datasets, entropy performs better than Gini impurity.
How to form a decision tree split for a numerical column?
- Sort the values of the numerical column.
- For every candidate split point, divide the data into two parts and calculate the entropy of each part.
- Take the split with the maximum information gain as the threshold for that node (see the sketch below).
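A sketch of that scan, assuming NumPy; the helper names are our own, and `entropy` is repeated for self-containment:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_numeric_split(values, labels):
    """Scan thresholds (midpoints of consecutive sorted values) and
    return the one with the highest information gain."""
    order = np.argsort(values)
    values = np.asarray(values)[order]
    labels = np.asarray(labels)[order]
    n, parent_e = len(labels), entropy(labels)
    best_gain, best_thr = -1.0, None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # no threshold fits between equal values
        thr = (values[i] + values[i - 1]) / 2
        gain = parent_e - (i / n) * entropy(labels[:i]) - ((n - i) / n) * entropy(labels[i:])
        if gain > best_gain:
            best_gain, best_thr = float(gain), float(thr)
    return best_thr, best_gain

print(best_numeric_split([1, 2, 3, 10, 11, 12], list("aaabbb")))  # (6.5, 1.0)
```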
What is the max_depth criteria in the decision tree?
- If max_depth is None, nodes are expanded until all leaves are pure.
- If max_depth is 1, the tree splits only once (a single level below the root).
What is the splitter criteria in the decision tree? (Hyperparameter)
- If splitter is "best", each node is split on the candidate with the maximum information gain.
- If splitter is "random", a random candidate split is chosen instead of exhaustively searching for the best one.
What is the Min Samples Split criteria in the decision tree?
The minimum number of samples a node must contain for it to be split into further nodes.
What is the Min Samples Leaf criteria in the decision tree?
The minimum number of samples that must remain in each leaf after a split.
What is the Max Features criteria in the decision tree?
The maximum number of features considered when looking for the best split.
To prevent overfitting, we can limit this number; the subset of features is then chosen randomly at each split.
What is the Max Leaf Nodes criteria in the decision tree?
The maximum number of leaf nodes the tree is allowed to grow.
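A sketch tying these hyperparameters together, assuming scikit-learn; the values below are illustrative, not recommendations:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="gini",      # or "entropy"
    splitter="best",       # or "random"
    max_depth=3,           # None = grow until all leaves are pure
    min_samples_split=4,   # a node needs >= 4 samples to be split further
    min_samples_leaf=2,    # every leaf must keep >= 2 samples
    max_features=None,     # consider all features at each split
    max_leaf_nodes=8,      # cap the total number of leaves
    random_state=0,
).fit(X, y)

print(clf.get_depth(), clf.get_n_leaves())
```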
When to use a decision tree for regression?
When the relationship between the features and the target is non-linear, a decision tree can be used for regression.
How to use decision tree regressor?
- Data points (independent variable) are arranged in ascending order.
- A split is made at each candidate point, and the best split is the one with the minimum error.
In the case of multiple columns, every column is sorted separately and the best split is the one with the minimum error across all columns.
Error can be calculated using MSE or MAE (see the sketch below).
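A minimal regression sketch, assuming scikit-learn >= 1.0, fit on a non-linear 1-D target:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# criterion="squared_error" uses MSE; "absolute_error" would use MAE.
reg = DecisionTreeRegressor(max_depth=3, criterion="squared_error").fit(X, y)
print(reg.predict([[1.5], [4.0]]))  # piecewise-constant predictions
```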
What are the types of ensemble learning?
- Voting
- Stacking
- Bagging
- Boosting
What is a voting ensemble?
Each model makes a prediction and votes. Classification: the class with the maximum vote count wins; for regression, the predictions are averaged.
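A hedged sketch with scikit-learn's VotingClassifier (the estimator choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # majority vote; "soft" averages predicted probabilities
).fit(X, y)

print(vote.predict(X[:3]))
```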
What is stacking ensemble?
Stacking has multiple layers.
In the first layer, several models each generate an output. These outputs are passed to a second-layer model (the meta-learner), which assigns a weight to each first-layer model; the final result is the weighted combination of all the first-layer models' outputs.
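A sketch with scikit-learn's StackingClassifier (the layer choices are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

stack = StackingClassifier(
    estimators=[  # first layer: base models
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("knn", KNeighborsClassifier()),
    ],
    # second layer: learns how to weight the first-layer outputs
    final_estimator=LogisticRegression(max_iter=1000),
).fit(X, y)

print(stack.predict(X[:3]))
```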
What is Bagging?
Bootstrap Aggregation: a subset of the data points is selected (with replacement) for every model, and the final result is based on the maximum vote count (or the average, for regression).
When the base model used in bagging is a decision tree, with random feature selection added at each split, it is called a Random Forest.
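A sketch contrasting generic bagging with a random forest, assuming scikit-learn (the `estimator` parameter name applies to scikit-learn >= 1.2; older versions call it `base_estimator`):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Generic bagging: bootstrap samples (with replacement) + a tree base model.
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # "base_estimator" before sklearn 1.2
    n_estimators=100,
    bootstrap=True,
    random_state=0,
).fit(X, y)

# Random Forest: bagged trees plus random feature selection at each split.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(bag.score(X, y), rf.score(X, y))
```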