Decision Tree Modeling Flashcards
What is tree-based learning? What does it do and how?
Tree-based learning is a type of
- supervised machine learning
- performs classification and regression tasks.
- It uses a decision tree as a predictive model to go from observations about an item (represented by the branches) to conclusions about the item's target value (represented by the leaves).
Ensemble Learning
A technique that enables you to use multiple decision trees simultaneously in order to produce very powerful models.
What’s the benefit of hyperparameter tuning?
Knowing how and when to tune a model can help increase its performance significantly
What is a Decision Tree?
- non-parametric supervised learning algorithm (not based on assumptions about distribution)
- for classification and regression tasks
- It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes.
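As a minimal sketch (assuming scikit-learn, which these cards do not name explicitly), training a small tree makes that hierarchy visible: the first printed rule is the root node, indented rules are decision nodes, and `class:` lines are leaf nodes.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree on the iris dataset and print its structure.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Root and decision nodes appear as split rules; leaves as "class: ..." lines.
print(export_text(tree))
```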
How do data professionals use decision trees?
To make predictions about future events based on the information that is currently available.
Decision Tree PROs
- Require no assumptions about the data's distribution
- Handle collinearity easily
- Require little preprocessing to prepare data for training
Decision Tree CONs
- susceptible to overfitting.
- sensitive to variations in the training data.
The model might get extremely good at predicting seen data, but as soon as new data is introduced, it may not work nearly as well.
What are made at each node?
Decisions are made at each node.
Edges
The edges connect the nodes, directing the flow from one node to the next along the tree.
What is a Root Node?
- It’s the first node in the tree
- all decisions needed to make the prediction will stem from it
- It’s a special type of decision node because it has no predecessors.
What is a Decision Node?
- All the nodes above the leaf nodes.
- The nodes where a decision is made
- They always point to a leaf node or to other decision nodes within the tree.
Leaf Node
- where a final prediction is made.
- The whole process ends here as they do not split anymore
What are Child Nodes?
- any node that results from a split.
- They can be either leaf nodes or other decision nodes
What are Parent Nodes?
The node that a child node splits from.
What prediction outcomes types can decision tree be used for?
- classification: where a specific class or outcome is predicted
- regression: where a continuous variable is predicted—like the price of a car.
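A quick sketch of both prediction types, assuming scikit-learn (the toy data here is made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical single-feature data for illustration.
X = [[1.0], [2.0], [3.0], [4.0]]

# Classification: predict a specific class or outcome.
y_class = [0, 0, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
print(clf.predict([[1.5]]))  # a class label

# Regression: predict a continuous value, like the price of a car.
y_price = [10.0, 12.0, 30.0, 34.0]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_price)
print(reg.predict([[3.5]]))  # a continuous value
```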
What is the criteria to split a Decision node?
A decision node is split on the criterion that minimizes the impurity of the classes in the resulting child nodes.
What is Impurity?
- the degree of mixture with respect to class.
- A perfect split would have no impurity in the resulting child nodes; it would partition the data with each child containing only a single class.
Name 4 metrics to determine impurity
- Gini impurity
- entropy
- information gain
- log loss
What’s the requirement for choosing split points?
- identify what type of variable it is—categorical or continuous
- the range of values that exist for that variable
Choosing split for categorical predictor variable
consider splitting based on the values of the categorical variable, e.g. color.
Choosing split for continuous predictor variable
splits can be made anywhere along the range of numbers that exist in the data
E.g. sorting fruit based on diameter: 2.25, 2.75, 3.25, 3.75, 5, and 6.5 centimeters.
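One common convention (used by CART-style trees, though the cards don't name it) is to take the midpoints between consecutive sorted values as the candidate split points. Using the diameters above:

```python
# Diameters from the flashcard example, in centimeters.
diameters = [2.25, 2.75, 3.25, 3.75, 5, 6.5]

# Candidate split points: midpoints between consecutive sorted values.
values = sorted(diameters)
candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
print(candidates)  # [2.5, 3.0, 3.5, 4.375, 5.75]
```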
Describe Gini impurity score
- most straightforward
- the best scores are those closest to 0
- The worst score (for binary classification) is 0.5, which occurs when each child node contains an equal number of each class.
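The score can be computed directly from a node's class counts. A minimal sketch (the `gini` helper is written here for illustration):

```python
def gini(counts):
    """Gini impurity of a node given the count of samples in each class."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([10, 0]))  # pure node -> 0.0, the best possible score
print(gini([5, 5]))   # even 50/50 mix -> 0.5, the worst for two classes
```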
Classification trees PROs
- Require few pre-processing steps.
- Can work with all types of variables (continuous, categorical, discrete).
- No normalization or scaling required
- Decisions are transparent.
- Not affected by extreme univariate values
Name 2 disadvantages of classification trees
- Can be computationally expensive relative to other algorithms.
- sensitive to data changes. Small changes in data can result in significant changes in predictions
What are Hyperparameters?
- parameters that can be set before the model is trained
- affect how the model fits the data
- Help balance best model to neither underfit nor overfit the data
What is Max Depth for decision trees?
- how deep the tree is allowed to grow
- The depth = number of levels between the root node and the farthest node
- the root node is level 0
Max Depth PROs
- reduce overfitting problems by limiting how deep the tree will go
- it can reduce the computational complexity of training and using the model
Min Samples Leaf
- the minimum number of samples that must be in each child node after the parent splits.
- A node is split only if there are enough samples so that each resulting child node satisfies the required minimum value.
Example
A decision node currently has 10 samples, but the min samples leaf hyperparameter is set to 6. There is no way to split the data so that each leaf node has at least 6 samples, so no further split can take place.
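The same scenario can be reproduced in scikit-learn (assumed here; the toy data is hypothetical): with 10 samples and `min_samples_leaf=6`, no split is valid, so the root never splits.

```python
from sklearn.tree import DecisionTreeClassifier

# 10 samples, but min_samples_leaf=6: no split can leave at least
# 6 samples in each child, so the tree stays a single leaf (the root).
X = [[i] for i in range(10)]
y = [0] * 5 + [1] * 5
tree = DecisionTreeClassifier(min_samples_leaf=6, random_state=0).fit(X, y)
print(tree.get_depth())     # 0: the root never split
print(tree.get_n_leaves())  # 1
```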
What is GridSearch?
A tool to find the optimal values for the hyperparameters.
What does GridSearch do?
- to confirm that a model achieves its goal
- by systematically checking every combination of hyperparameters
- to identify which set produces the best results based on the selected metric.
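A short sketch assuming scikit-learn's `GridSearchCV` (the grid values here are arbitrary illustrations): every combination in the grid is cross-validated, and the best set is kept.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical grid: 3 x 3 = 9 combinations, each scored with 5-fold CV.
param_grid = {
    "max_depth": [2, 3, 4],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, scoring="accuracy", cv=5)
search.fit(X, y)
print(search.best_params_)  # the combination with the best CV accuracy
print(search.best_score_)
```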
What is an Overfit model and how to identify it?
- model learns the training data so closely that it captures noise rather than only the intrinsic patterns of the underlying distribution
- model that scores very well on the training data but considerably worse on unseen data because it cannot generalize well.
- identify it when training accuracy is very high (near 1) but test accuracy is considerably lower
What is an under-fitted model and how can it be identified?
- model does not learn the patterns and characteristics of the training data well, and consequently fails to make accurate predictions on new data.
- easier to identify underfitting, because the model performs poorly on both training and test data
Name 3 hyperparameters of a Decision tree
- Max Depth
- min samples split
- Min Samples Leaf
CON of increasing Max Depth
- overfitting
As you increase the max depth parameter, the performance of the model on the training set will continue to increase. It’s possible for a tree to grow so deep that leaves contain just a single sample. However, this overfits the model to the training data, and the performance on the testing data would probably be much worse.
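That trade-off can be demonstrated with a sketch, assuming scikit-learn and a synthetic noisy dataset (all names and parameter values here are illustrative): an unrestricted tree memorizes the training set while a shallow one does not.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with flip_y label noise, so a deep tree can overfit.
X, y = make_classification(n_samples=500, n_features=20,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)
deep = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_tr, y_tr)

print(shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))
# Training accuracy climbs toward 1.0 as depth grows, while test
# accuracy stalls or drops: the signature of overfitting.
```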
Min Samples Split
- minimum number of samples the parent node must have before splitting
If you set this to 10, then any node that contains nine or fewer samples will automatically become a leaf node; it will not continue splitting.
What is the max and min number that min samples split can have?
Min: 2, the smallest number of samples that can be divided into two separate child nodes.
Max: there is no fixed maximum (it can be as large as the total number of samples); the greater the value you use, the sooner the tree will stop growing.
What is regularization?
- the process of reducing model complexity to prevent overfitting.
- Regularization helps to make the model more generalizable to new data
- regularization trades a marginal decrease in training accuracy for an increase in generalizability.
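For decision trees specifically, one concrete regularization mechanism in scikit-learn (assumed here) is minimal cost-complexity pruning via `ccp_alpha`, which adds a penalty proportional to tree size. A sketch of the trade described above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Unregularized tree vs. a pruned one (ccp_alpha value is illustrative).
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

# Pruning reduces complexity: fewer leaves, a more generalizable model.
print(full.get_n_leaves(), pruned.get_n_leaves())
```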
How does regularization prevent overfitting in machine learning models?
Regularization introduces penalty terms to the model’s loss function
- discouraging overly complex solutions and
- promoting better generalization to unseen data.