Decision Trees - Regression Trees Flashcards
Trees for continuous targets
If the ith observation falls in terminal node j, then the prediction for that observation, y^_i, equals the average of the target values of all training observations in node j.
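For concreteness, a small Python sketch (with invented node assignments and target values) of this node-average prediction rule:

```python
import numpy as np

# Hypothetical terminal-node assignment for 6 training observations,
# alongside their target values.
node = np.array([1, 1, 2, 2, 2, 1])
y = np.array([10.0, 12.0, 30.0, 34.0, 32.0, 14.0])

# Each observation's prediction is the mean target of its node.
node_means = {j: y[node == j].mean() for j in np.unique(node)}
y_hat = np.array([node_means[j] for j in node])
print(node_means)  # node 1 averages to 12.0, node 2 to 32.0
print(y_hat)
```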
A split is considered 'best' when it produces the largest reduction in the SSE. The tree's SSE is the sum of the squared residuals added up over all terminal nodes.
Consequently, each split chosen by the tree minimizes the combined sum of squared residuals of the two resulting nodes. This pushes the predictions (i.e. target averages) of the resulting nodes far apart.
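As an illustration (not from the notes), a brute-force Python sketch of picking the best split threshold on one predictor by minimizing the combined SSE of the two child nodes; the data are made up:

```python
import numpy as np

def sse(v):
    # Sum of squared residuals around the node mean.
    return ((v - v.mean()) ** 2).sum() if len(v) else 0.0

def best_split(x, y):
    # Try each candidate threshold and keep the one with the
    # smallest combined SSE of the two child nodes.
    best_t, best_total = None, np.inf
    for t in np.unique(x)[:-1]:
        total = sse(y[x <= t]) + sse(y[x > t])
        if total < best_total:
            best_t, best_total = t, total
    return best_t, best_total

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 22.0])
t, total = best_split(x, y)
print(t, total)  # split at x <= 3.0, combined SSE 4.0
print(y[x <= t].mean(), y[x > t].mean())  # node averages end up far apart: 6.0 vs 21.0
```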
When a regression tree is fitted to a right-skewed target, it will strongly fit the target's right tail, i.e. gravitate toward splits that separate the right-tail observations from the rest, since those observations contribute the most to the SSE. To avoid over-emphasizing the right tail, the target can be log-transformed before fitting.
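A sketch of this effect (invented, geometrically growing target): on the raw scale the best split peels off the largest observation, while on the log scale the best split lands near the middle of the data:

```python
import numpy as np

def best_split(x, y):
    # Return the threshold minimizing the combined child-node SSE.
    def sse(v):
        return ((v - v.mean()) ** 2).sum() if len(v) else 0.0
    return min(np.unique(x)[:-1], key=lambda t: sse(y[x <= t]) + sse(y[x > t]))

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1, 2, 4, 8, 16, 32], dtype=float)  # right-skewed target

print(best_split(x, y))          # raw scale: splits at x <= 5, isolating the tail
print(best_split(x, np.log(y)))  # log scale: splits at x <= 3, near the middle
```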
The predictor used to split the root node is considered the most important predictor by the tree, as it is the predictor that provides the largest reduction in the SSE for the first split. In contrast, predictors that do not appear in the tree are not important, as they do not help in reducing the SSE.
Poisson trees
A regression tree can be adapted to have characteristics of a Poisson regression model. Instead of minimizing the SSE, splits minimize the Poisson deviance.
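One common form of the Poisson deviance is D = 2 Σ_i [ y_i ln(y_i / μ_i) − (y_i − μ_i) ], with the convention that the log term is 0 when y_i = 0. A Python sketch (invented data) that evaluates it for a single node, where μ_i = w_i × (node rate) and the node's rate is total counts divided by total exposure:

```python
import numpy as np

def poisson_deviance(y, mu):
    # 2 * sum(y*log(y/mu) - (y - mu)); the log term is taken as 0 when y == 0.
    term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

# Counts y with exposures w for the observations in one node.
y = np.array([0.0, 1.0, 2.0, 4.0])
w = np.array([1.0, 1.0, 2.0, 3.0])
rate = y.sum() / w.sum()  # node's predicted rate: 7 / 7 = 1.0
dev = poisson_deviance(y, w * rate)
print(rate, dev)
```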
If using weights (exposures) in rpart, replace the target in the model formula with cbind(w, y), where w is the column of exposures and y is the column of counts.
Add method = "poisson" after the data argument.
Output (per node in the plotted tree):
top = predicted rate
middle = number of counts / number of observations
bottom = proportion of observations in the node
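The three displayed quantities can be computed directly. A Python sketch (invented counts, exposures, and training-set size for one node):

```python
import numpy as np

# Hypothetical node: counts y and exposures w for its observations,
# plus the total number of observations in the training set.
y = np.array([0.0, 1.0, 2.0, 1.0])
w = np.array([0.5, 1.0, 2.0, 0.5])
n_total = 20

rate = y.sum() / w.sum()            # top: predicted rate = counts / exposure
counts_per_obs = y.sum() / len(y)   # middle: counts / observations
prop = len(y) / n_total             # bottom: proportion of observations
print(rate, counts_per_obs, prop)   # 1.0 1.0 0.2
```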