Decision Trees (Classification Methods - Shoaib) Flashcards
What is a Decision Tree?
A decision tree is a flowchart-like structure used to visualize and make decisions based on a series of if-else questions. It is a tree-based model where internal nodes represent feature tests, branches represent outcomes, and leaf nodes represent final decisions or classifications. The tree starts with a root node.
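As a rough intuition, a very small tree can be written directly as nested if-else statements. The sketch below uses the classic "play tennis" example; the feature names and thresholds are made up for illustration.

```python
# A tiny decision tree written as nested if-else checks.
# Features and cutoffs are illustrative only.
def classify(outlook: str, humidity: float, windy: bool) -> str:
    if outlook == "sunny":             # root node: test on 'outlook'
        if humidity > 70:              # internal node: test on 'humidity'
            return "don't play"        # leaf node
        return "play"                  # leaf node
    if outlook == "rainy":
        return "don't play" if windy else "play"
    return "play"                      # outlook == "overcast"

print(classify("sunny", 80.0, False))  # -> "don't play"
```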
What are the key structural components of a decision tree?
■Nodes: Decision nodes (internal nodes) and leaf nodes (terminal nodes).
■Edges: Connections between nodes representing conditions or outcomes.
■Root Node: The starting point of the tree.
■Leaf Nodes: The final decisions or classifications.
How does a decision tree make a prediction?
The process starts at the root node, evaluates the condition at the node, and follows the edge corresponding to the outcome. This process is repeated until a leaf node is reached, which provides the final decision or classification. Each path from the root to a leaf is a decision sequence.
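One way to make this traversal concrete (the dict-based node layout here is an illustrative assumption, not a standard format):

```python
# Walk from the root to a leaf, following the branch each test selects.
def predict(node, sample):
    # A leaf is any node that carries a final 'label'.
    while "label" not in node:
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]

tree = {
    "feature": "age", "threshold": 30,
    "left":  {"label": "young"},
    "right": {"label": "old"},
}
print(predict(tree, {"age": 25}))  # -> "young"
```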
How are decision trees built?
Decision trees are constructed using a divide-and-conquer strategy. At each node, the data is split based on feature tests, aiming to separate the data into increasingly pure subsets. The process is recursive and continues until a stopping criterion is met.
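A from-scratch sketch of the recursive procedure, under simplifying assumptions: data as plain Python lists, a brute-force threshold search, and the Gini index (covered on a later card) as the purity measure. This is a teaching sketch, not a reference implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def find_best_split(X, y):
    """Try every feature/threshold; return the split with lowest impurity."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i in range(len(X)) if i not in left]
            if not left or not right:
                continue  # degenerate split, skip
            y_l, y_r = [y[i] for i in left], [y[i] for i in right]
            score = (len(y_l) * gini(y_l) + len(y_r) * gini(y_r)) / len(y)
            if best is None or score < best[0]:
                X_l, X_r = [X[i] for i in left], [X[i] for i in right]
                best = (score, f, t, (X_l, y_l), (X_r, y_r))
    return None if best is None else best[1:]

def build_tree(X, y, depth=0, max_depth=5):
    # Stopping criteria: pure node, depth limit, or no useful split.
    if len(set(y)) == 1 or depth >= max_depth:
        return {"label": Counter(y).most_common(1)[0][0]}
    split = find_best_split(X, y)
    if split is None:
        return {"label": Counter(y).most_common(1)[0][0]}
    f, t, (X_l, y_l), (X_r, y_r) = split
    return {"feature": f, "threshold": t,
            "left": build_tree(X_l, y_l, depth + 1, max_depth),
            "right": build_tree(X_r, y_r, depth + 1, max_depth)}

tree = build_tree([[1], [2], [3], [4]], ["a", "a", "b", "b"])
print(tree["feature"], tree["threshold"])  # 0 2 -> split on feature 0 at <= 2
```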
What is the goal of split selection?
The goal is to choose the best feature on which to split the data at each node. This is typically done by selecting the feature that maximizes the separation of classes, thereby improving the purity of the resulting child nodes.
What is Information Gain and how is it used for split selection?
Information gain measures how much splitting on a particular feature improves the purity of the data: it quantifies the reduction in entropy achieved by the split. The feature with the highest information gain is usually selected for the split.
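A minimal sketch of both quantities using their standard definitions; the example data is made up:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy: -sum(p * log2(p)) over class proportions p."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_lists):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent_labels)
    weighted = sum(len(ch) / n * entropy(ch) for ch in child_label_lists)
    return entropy(parent_labels) - weighted

# Splitting a mixed node into two pure children gives maximal gain:
print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))  # 1.0
```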
How are missing values handled during decision tree construction?
When a sample has a missing value for a particular feature, it is placed into all child nodes with different weights. The weights are based on the proportions of samples with known values that fall into each child node.
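A small sketch of the weight computation; the function name and dict layout are illustrative assumptions:

```python
# Fractional-weight idea: a sample with a missing value is sent down
# every branch, weighted by the observed proportion of that branch.
def branch_weights(known_branch_counts):
    """Map branch -> proportion of samples with known values."""
    total = sum(known_branch_counts.values())
    return {b: c / total for b, c in known_branch_counts.items()}

# If 60 known samples went left and 40 went right, a sample with a
# missing value counts 0.6 toward the left child and 0.4 toward the right.
print(branch_weights({"left": 60, "right": 40}))  # {'left': 0.6, 'right': 0.4}
```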
What is the purpose of decision tree pruning?
Pruning is a method to reduce the complexity of a decision tree and to prevent overfitting. Pruning methods remove branches or nodes that do not significantly contribute to the model’s generalization performance.
What is post-pruning?
Post-pruning allows the tree to grow fully and then removes branches or subtrees based on their effect on a validation set. This approach can improve generalization by reducing complexity.
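The cards do not name a specific algorithm, but as one concrete example, scikit-learn exposes post-pruning as cost-complexity pruning: grow the full tree, enumerate candidate pruning strengths (ccp_alpha), and keep the one that scores best on a validation set.

```python
# Grow a full tree, then choose how hard to prune it using held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning strengths derived from the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    alpha = max(alpha, 0.0)  # guard against tiny negative float error
    score = (DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
             .fit(X_train, y_train).score(X_val, y_val))
    if score > best_score:
        best_alpha, best_score = alpha, score

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha)
pruned.fit(X_train, y_train)
```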
What is a decision stump?
A decision stump is a decision tree with only one level, i.e., a single split. It consists of a root node whose children are leaf nodes (two leaves for a binary split), making it the simplest form of a decision tree.
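In scikit-learn terms, a stump is simply a tree capped at depth 1 (a minimal sketch):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)  # one split only
print(stump.get_depth())       # 1
print(stump.tree_.node_count)  # 3: one root plus two leaves
```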
How do univariate and multivariate decision trees differ?
■Univariate: Each split is based on a single feature. Decision boundaries are parallel to the axes.
■Multivariate: Each split involves a linear combination of multiple features. Decision boundaries can be oblique, so multivariate decision trees can create more complex decision boundaries (see the sketch below).
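The contrast is easy to see in code; the weights and thresholds below are made up for illustration:

```python
import numpy as np

x = np.array([2.0, 3.0])  # one sample with two features

# Univariate split: one feature against a threshold; the boundary is
# the axis-parallel line x0 = 2.5.
left_univariate = x[0] <= 2.5

# Multivariate (oblique) split: a linear combination of features; the
# boundary is the slanted line 0.8*x0 - 0.6*x1 + 0.1 = 0.
w, b = np.array([0.8, -0.6]), 0.1
left_multivariate = w @ x + b <= 0

print(left_univariate, left_multivariate)  # True True
```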
What are some of the key advantages of using decision trees?
Decision trees are easy to interpret, can handle both categorical and numerical data, require relatively little data preparation (e.g., no feature scaling), and can handle missing values.
What are the main disadvantages of decision trees?
Decision trees tend to overfit the training data, can be sensitive to small changes in the data, and may perform poorly on very high-dimensional data, i.e., data with many features and few examples.
What is CART?
CART (Classification and Regression Trees) is a well-known decision tree algorithm that uses the Gini index as its splitting criterion. It builds binary trees and can be applied to both classification and regression.
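A minimal sketch of the Gini index with its standard definition:

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["a", "a", "b", "b"]))  # 0.5 (maximally impure for 2 classes)
print(gini(["a", "a", "a", "a"]))  # 0.0 (pure node)
```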
What is C4.5?
A popular algorithm that uses information gain for split selection and can handle missing values. C4.5 extends the ID3 algorithm and uses the gain ratio (information gain normalized by split information) to counter plain information gain's bias toward features with many distinct values.
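A sketch of the gain ratio, assuming an information-gain value computed as on the earlier card; the example numbers are illustrative:

```python
import math

def split_information(child_sizes):
    """Entropy of the split itself: -sum((n_i/n) * log2(n_i/n))."""
    n = sum(child_sizes)
    return -sum((s / n) * math.log2(s / n) for s in child_sizes if s)

def gain_ratio(info_gain, child_sizes):
    """C4.5's criterion: information gain normalized by split information."""
    si = split_information(child_sizes)
    return info_gain / si if si > 0 else 0.0

# A many-way split is penalized: the same gain scores lower when the
# data is shattered into many small children.
print(gain_ratio(0.5, [50, 50]))   # 0.5
print(gain_ratio(0.5, [10] * 10))  # ~0.15
```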