Jupyter Notebook 2.3 - Decision Trees Flashcards
Why are decision trees important to study in machine learning?
Decision trees are significant for several reasons:
- Foundation for Powerful Methods: They serve as the basis for many advanced machine learning algorithms, such as Random Forests and Gradient Boosting, which are widely used for various predictive tasks.
- Interpretability: Decision trees produce highly interpretable models, making them useful in fields that require clear explanations of decisions, such as law, finance, and medicine. Their structure allows stakeholders to easily understand the reasoning behind predictions.
How do decision trees work in machine learning?
Decision trees are a type of model used for classification and regression. They learn from a labeled dataset by constructing a tree structure based on a series of if-else questions about the features. Each internal node tests a feature against a threshold, each branch corresponds to one outcome of that test, and each leaf node holds a predicted label or value.
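As a minimal sketch (not the notebook's own code), the example below fits a scikit-learn DecisionTreeClassifier on the iris dataset, used here as a stand-in for the notebook's data, and scores it on held-out samples.

```python
# A minimal sketch: fit a decision tree classifier with scikit-learn.
# The iris dataset stands in for the notebook's data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # if-else splits up to depth 3
clf.fit(X_train, y_train)          # learn feature/threshold splits from labeled data
print(clf.score(X_test, y_test))   # accuracy on held-out data
```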
What do the terms “samples,” “value,” and “Gini” mean in the context of decision trees?
In decision trees, the following terms are used to describe different aspects of the nodes:
Root Node: The top-most node in the tree, which has no parent. It poses the first if-else question that splits the data.
Internal Node: A node with one parent and two children, which poses additional if-else questions to further split the data.
Leaf Node: A node without children, representing the final output or prediction for the data that has reached that point.
Samples: The number of training samples that reach a particular node. For example, a node might contain 179 samples, all of which satisfy the condition that led to it (e.g., glucose < 154.5).
Value: A list showing the number of samples belonging to each class at that node. This helps determine the distribution of classes.
Gini: A measure of impurity or diversity at a node. The Gini impurity is 0 when all samples belong to a single class, indicating perfect purity, and it grows as the class mix becomes more even (up to 0.5 for a perfectly mixed two-class node, approaching 1 when many classes are evenly mixed). The sketch below shows where these values appear on a fitted tree.
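To see where these numbers come from, here is a small sketch (assuming the `clf` classifier fitted in the earlier iris example) that renders the tree and reads the same quantities off scikit-learn's underlying `tree_` object.

```python
# Assumes `clf` from the earlier iris sketch. Each box drawn by plot_tree shows
# the node's gini, samples, and value fields described above.
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plot_tree(clf, filled=True)
plt.show()

# The same quantities are exposed programmatically on the fitted tree:
print(clf.tree_.n_node_samples[0])  # samples: training samples reaching the root
print(clf.tree_.value[0])           # value: per-class counts (proportions in newer scikit-learn)
print(clf.tree_.impurity[0])        # gini: impurity of the root node
```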
How do decision trees learn from training data and what is the process of training them?
Decision trees learn the relationship between training data and corresponding labels by organizing the data in a binary tree structure.
Here’s a summary of the training process:
Goal of Purity:
The primary objective is to create leaves that are as pure as possible, meaning the majority of samples in each leaf belong to the same class.
Recursive Splitting:
Starting at the root node, the training data is split into two parts based on the feature and threshold that provide the largest information gain. Each split compares the values of a single feature against a specific threshold.
Iterative Process:
The splitting process is repeated recursively for each child node, creating new internal nodes and leaf nodes until a stopping criterion is met (e.g., all instances in a leaf belong to the same class, or a maximum tree depth is reached).
This recursive splitting continues until the tree structure is fully constructed, with each leaf node making specific predictions based on the training data that reached that node.
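As a quick check of this behavior (reusing the iris train/test split from the first sketch), an unregularized scikit-learn tree keeps splitting until its leaves are pure or no further split is possible:

```python
# Reuses X_train, y_train from the first sketch. With no depth limit, splitting
# continues until the leaves are (essentially) pure.
from sklearn.tree import DecisionTreeClassifier

full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(full_tree.get_depth(), full_tree.get_n_leaves())

is_leaf = full_tree.tree_.children_left == -1   # leaf nodes have no children
print(full_tree.tree_.impurity[is_leaf].max())  # expected to be ~0 (pure leaves)
```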
What is Gini impurity, and how is it calculated in decision trees?
Gini impurity is a measure of the impurity or diversity of a node in a decision tree.
Key points about Gini impurity include:
Formula: Gᵢ = 1 − Σₖ pₖ², where pₖ is the fraction of samples in node i that belong to class k.
Pure Node: If all samples in the node belong to a single class, then Gᵢ = 0 (pure).
Impure Node: If samples are spread across multiple classes, Gᵢ grows toward its maximum (0.5 for a perfectly mixed two-class node, approaching 1 with many classes), indicating high impurity.
The information gain of a split can be computed as the Gini impurity of the parent node minus the weighted sum of the Gini impurities of its children, where each child is weighted by its share of the parent's samples.
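A short sketch of both quantities, written directly from the definitions above (the helper names gini and information_gain are illustrative, not from the notebook):

```python
# Gini impurity and information gain computed from a node's class counts.
import numpy as np

def gini(counts):
    """Gini impurity G = 1 - sum_k p_k^2 for a node with the given class counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent_counts, left_counts, right_counts):
    """Parent impurity minus the size-weighted impurities of the two children."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (gini(parent_counts)
            - (n_left / n) * gini(left_counts)
            - (n_right / n) * gini(right_counts))

print(gini([50, 50]))                                    # 0.5 -> maximally impure two-class node
print(gini([100, 0]))                                    # 0.0 -> pure node
print(information_gain([100, 100], [90, 10], [10, 90]))  # ~0.32 impurity removed by the split
```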
How does the CART algorithm grow decision trees for classification?
The CART (Classification and Regression Trees) algorithm is a method used to construct decision trees based on information gain. The process is as follows:
Greedy Search: Starting at the root node, the algorithm performs a greedy search to find the best feature and threshold that maximize information gain. For example, it might evaluate a feature like “glucose” and a threshold such as “glucose < 132.5.”
Splitting the Data: The selected feature and threshold are used to split the dataset into two subsets: those that meet the condition and those that do not.
Recursive Process: This process of selecting features and thresholds to maximize information gain continues recursively for each child node until a stopping criterion is met (e.g., a pure node, a maximum tree depth, or a minimum number of samples).
CART is widely used due to its efficiency and effectiveness in building decision trees for classification tasks.
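To make the greedy step concrete, here is an illustrative scan over candidate thresholds for a single feature (not scikit-learn's actual implementation), reusing the gini and information_gain helpers sketched above:

```python
# One greedy CART step: try each candidate threshold for one feature and keep
# the one that maximizes information gain. Reuses gini/information_gain above.
import numpy as np

def best_threshold(feature_values, labels, classes):
    counts = lambda y: [np.sum(y == c) for c in classes]
    best_gain, best_t = -np.inf, None
    for t in np.unique(feature_values):
        left, right = labels[feature_values < t], labels[feature_values >= t]
        if len(left) == 0 or len(right) == 0:
            continue  # skip splits that leave one side empty
        gain = information_gain(counts(labels), counts(left), counts(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(x, y, classes=[0, 1]))  # best split at 10.0 with gain 0.5
```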
How are decision trees used for regression, and what measures do they utilize?
Decision trees can be adapted for regression tasks in a manner similar to classification. Key points include:
Mean Squared Error (MSE): Instead of Gini impurity and information gain, decision trees for regression use mean squared error (MSE) to measure node impurity: the average squared difference between each sample's target and the node's prediction, which is the mean target value of the samples in that node.
Making Splits: The algorithm seeks to minimize the MSE when deciding how to split the data. It evaluates different features and thresholds to find splits that result in the lowest MSE in the resulting child nodes.
Recursive Splitting: This splitting process continues recursively for each child node until a stopping criterion is reached, such as a maximum tree depth or a minimum number of samples in a leaf.
In this way, decision trees can effectively model continuous target variables, providing predictions based on the average target value of the samples in each leaf node.
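A minimal regression sketch with scikit-learn's DecisionTreeRegressor on synthetic data (criterion="squared_error" is the name used in recent scikit-learn versions); each leaf predicts the mean target of its training samples:

```python
# Fit a regression tree on a noisy sine curve; predictions are piecewise constant.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_reg = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y_reg = np.sin(X_reg).ravel() + rng.normal(scale=0.1, size=80)

reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3)
reg.fit(X_reg, y_reg)
print(reg.predict([[1.5], [4.0]]))  # each prediction is a leaf's mean target
```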
What are the downsides of using decision trees in machine learning?
Overfitting: Decision trees tend to overfit the training data if not regularized properly. This means they may perform well on the training set but poorly on unseen data.
Instability: Decision trees can be highly sensitive to slight variations in the training data. For instance, changing the random state in the data split can result in different tree structures. This instability can affect the reliability of feature importances calculated from the tree.
Generalization Issues: Due to overfitting and instability, decision trees may struggle to generalize well to new, unseen data, making them less effective in practice.
Despite their ease of use and interpretability, these downsides highlight the importance of applying regularization techniques and considering ensemble methods, such as random forests, to improve performance and stability.
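As a rough illustration of both the overfitting problem and the usual remedies (reusing the iris train/test split from the first sketch), the snippet below compares an unregularized tree, a depth-limited tree, and a random forest on training versus test accuracy:

```python
# Compare training vs. test accuracy for an unregularized tree, a regularized
# tree, and a random forest ensemble. Reuses X_train/X_test/y_train/y_test.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

for name, model in [
    ("full tree", DecisionTreeClassifier(random_state=0)),
    ("depth-3 tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]:
    model.fit(X_train, y_train)
    print(name, model.score(X_train, y_train), model.score(X_test, y_test))
```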