Tree-Based Models (10-20%) Flashcards
Compare decision trees, random forests, and gradient boosting machines (GBMs).
Decision trees:
1. intuitive and quick to run
2. good for recognizing natural break points in continuous variables
3. good for recognizing nonlinear interactions between variables
4. unstable and can be prone to overfitting
Random forests:
1. a natural extension of decision trees, uses many weak learners to solve a problem
2. “bagging” (bootstrap aggregation) is used to get from decision trees to random forests: each tree is built on a bootstrapped sample of the data and, at each split, only a random subset of features is considered, with a focus on reducing model variance
3. a powerful tool to detect nonlinear interactions between predictor variables
4. less prone to overfitting than gradient boosting machines
Gradient boosting machines (GBMs):
1. uses the concept of repeatedly fitting new trees to the residuals of the previous model, with a focus on reducing model bias
2. an extension of decision trees built on “boosting” (many implementations also borrow bagging-style ideas such as subsampling rows and columns)
3. a powerful tool to detect nonlinear interactions between predictor variables
4. more prone to overfitting than random forests and more sensitive to hyperparameter inputs (see the fitting sketch after this list)
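Below is a minimal sketch of how the three model types can be fit and compared with scikit-learn; the synthetic dataset, hyperparameters, and variable names are illustrative assumptions, not part of the flashcards.

```python
# Sketch: fit a decision tree, a random forest, and a GBM on the same data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # Single tree: quick and interpretable, but high variance.
    "decision tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    # Bagged trees with random feature subsets: aims to reduce variance.
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Trees fit sequentially to the errors of the current ensemble: aims to
    # reduce bias, but is more sensitive to hyperparameter choices.
    "GBM": GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                      random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {acc:.3f}")
```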
Describe decision trees.
The idea is to split the feature space into exhaustive and mutually exclusive segments that minimize the variability of the target within each segment. At each step, the parent segment is split on the variable (and split point) that makes the resulting child segments most different from each other.
For classification, the estimate of the target for observations in a leaf segment/node is set as the class that appears most in that segment. (E.g., are the majority of observations in that segment lapses or non-lapses?)
For regression, the target estimate is set as the mean of the target values of the observations within the leaf segment.
The structure is similar to a real (upside-down) tree. Each new segment is created by splitting an existing segment.
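As a minimal sketch of these ideas, scikit-learn’s export_text prints the splits of a fitted tree and the prediction in each leaf; the iris dataset and the depth limit used here are illustrative choices.

```python
# Sketch: inspect the splits and leaf predictions of a small classification tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# Each leaf predicts the majority class of the observations in that segment;
# a regression tree would instead predict the mean target value in the leaf.
print(export_text(tree, feature_names=list(data.feature_names)))
```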
Define binary tree.
A tree structure in which each node has at most two children.
Define balanced binary tree.
A binary tree in which the left and right subtrees of any node differ in depth by at most one.
Define a node in decision trees.
A subset of the data. Any node below a given node contains a subset of that node’s data.
Define a root in decision trees.
The top node of a tree.
Define a child in decision trees.
A node directly connected to another node when moving away from the root node. In most tree depictions, the child nodes appear below their parent nodes.
Define a parent in decision trees.
A node directly connected to another node when moving away from the root. In most tree depictions, the parent node appears above its child nodes.
Define a leaf in decision trees.
A node with no children. These represent the final segments of data. The union of all data in the leaf nodes will be the full dataset.
Define edge in decision trees.
The connection between one node and another. These are represented by the arrows in the tree.
Define depth in decision trees.
The number of edges from the tree’s root node to the “furthest” leaf.
Define a subtree in decision trees.
A node of the tree together with all of its descendants; a branch that does not start from the root.
Define pruning in decision trees.
A technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little predictive power. Pruning reduces the complexity of the final model and, hence, improves predictive accuracy by reducing overfitting.
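One concrete way to prune is scikit-learn’s cost-complexity (weakest-link) pruning via ccp_alpha; this is a minimal sketch, and in practice ccp_alpha would normally be chosen by cross-validation rather than taken from the path as done here.

```python
# Sketch: cost-complexity pruning of a decision tree.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The pruning path gives the effective alphas at which subtrees are removed;
# a larger ccp_alpha produces a smaller (more heavily pruned) tree.
path = full_tree.cost_complexity_pruning_path(X, y)
pruned_tree = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2],
                                     random_state=0).fit(X, y)

print("leaves before pruning:", full_tree.get_n_leaves())
print("leaves after pruning: ", pruned_tree.get_n_leaves())
```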
Define unbalanced binary tree.
A binary tree in which the left and right subtrees of some node differ in depth by more than one.
Define entropy.
Entropy is a measure of the impurity of each node in a decision tree (classification only, although there are regression equivalents). Knowing the impurity of a node helps the decision tree decide how to create the splits. We can calculate the impurity of the resulting nodes for a candidate split so that we can measure how good that split is. We will choose the final split as the candidate split that results in child nodes with the lowest levels of impurity.
Entropy = -sum_k(p_k * log(p_k)), where p_k is the proportion of observations in the node belonging to class k.
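A minimal numpy sketch of this calculation for a vector of class labels; the function name and the use of the natural log are assumptions (base-2 logs are also common).

```python
import numpy as np

def node_entropy(labels):
    """Entropy of a node: -sum(p_k * log(p_k)) over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

# A pure node has entropy 0; a 50/50 node is maximally impure.
print(node_entropy([1, 1, 1, 1]))  # 0.0
print(node_entropy([0, 0, 1, 1]))  # log(2) ~ 0.693
```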
Define information gain in decision trees.
We would like to be able to split the data into smaller segments that help us improve the overall impurity of the resulting segments. We call this improvement in the impurity the information gain resulting from a chosen split. In order to pick the best split, we need to calculate the information gain for all possible splits and then pick the best one.
Information gain = entropy(parent) - sum_k[(N_k / N_p) * entropy(child_k)], where N_k is the number of observations in child node k and N_p is the number of observations in the parent node.
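Continuing the entropy sketch above, information gain for a candidate split can be computed as the parent’s entropy minus the weighted average of the children’s entropies; the function and variable names are illustrative.

```python
import numpy as np

def node_entropy(labels):
    """Entropy of a node: -sum(p_k * log(p_k)) over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def information_gain(parent_labels, child_label_lists):
    """entropy(parent) - sum_k((N_k / N_p) * entropy(child_k))."""
    n_parent = len(parent_labels)
    weighted_child_entropy = sum(
        (len(child) / n_parent) * node_entropy(child) for child in child_label_lists
    )
    return node_entropy(parent_labels) - weighted_child_entropy

# Splitting a 50/50 node into two pure children recovers all of its entropy.
print(information_gain([0, 0, 1, 1], [[0, 0], [1, 1]]))  # ~0.693 (= log 2)
```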