Week 10 DSE Flashcards
What are the pros and cons of decision trees?
pros:
- no need to standardize data
- perform variable selection
- computationally efficient
- scales up very well, allows interpretation
cons:
- simple trees will likely lose out to other methods on prediction (unlike kNN)
- don’t naturally lead to continuous models
- do not handle categorical variables well when there are many categories
- big, deep trees can increase predictive performance, but at the cost of interpretability
What is meant by trees being discontinuous?
- a small change in the data can cause different variables to be selected as important and the predictions to change
- highly dependent on the training data
What are the root node and leaf nodes?
- root: the first (top) node of the tree
- leaf: a terminal node, giving the final decision/prediction
What is the covariate space?
the space of the x variables (predictors)
A regression tree ______________ the _____________ into a set of rectangles and then fits a _______________ in each one.
partitions (splits)
covariate space
simple model (a constant)
What are the steps of building a tree?
CART
- form the tree using recursive binary partitioning of the data
To make a prediction, just find the ___________ to which the new observation belongs and equate the _________ to the __________ of that region.
region
forecast
sample mean
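The recipe above can be sketched in pure Python (the toy data and function names are hypothetical, and a single binary split is shown rather than the full recursive procedure): choose the threshold that minimises the total squared error, then predict the sample mean of whichever region the new observation falls into.

```python
# Minimal sketch of one CART regression split on a single predictor.

def region_means(x, y, c):
    """Split observations at threshold c; return the mean of y in each region."""
    left = [yi for xi, yi in zip(x, y) if xi < c]
    right = [yi for xi, yi in zip(x, y) if xi >= c]
    return sum(left) / len(left), sum(right) / len(right)

def best_split(x, y):
    """Exhaustively search candidate thresholds, minimising the total SSE."""
    best = None
    for c in sorted(set(x))[1:]:          # candidate thresholds
        ml, mr = region_means(x, y, c)
        sse = sum((yi - (ml if xi < c else mr)) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, c, ml, mr)
    return best[1:]                        # threshold, left mean, right mean

# Hypothetical data with two obvious clusters.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
c, ml, mr = best_split(x, y)
print(c, ml, mr)   # splits between the clusters; predicts each region's mean
```

A new observation is then forecast by checking which side of the threshold it falls on and returning that region's sample mean.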
What is the difference between the splits if xi is numeric vs categorical?
- For numeric xi , the rule is based on the threshold xi < c.
- For categorical xi , the rule lists the set of categories sent left.
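As a minimal illustration (the values and function names are hypothetical), the two rule types are just a threshold test and a set-membership test:

```python
# Sketch of the two split-rule types used at an internal node.

def goes_left_numeric(xi, c):
    """Numeric predictor: send the observation left if xi < c."""
    return xi < c

def goes_left_categorical(xi, left_set):
    """Categorical predictor: send left if xi is in the listed category set."""
    return xi in left_set

print(goes_left_numeric(2.5, 3.0))                    # True: 2.5 < 3.0
print(goes_left_categorical("red", {"red", "blue"}))  # True: "red" is in the set
```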
What is used to measure complexity of decision trees?
the number of leaves / terminal nodes, |T|
What is usually the loss function of decision trees?
MSE
What does alpha stand for in the tree-choice minimisation problem?
penalty parameter
What happens when alpha is 0?
we would choose the tree that fits the training data perfectly, resulting in overfitting
What is the algorithm for choosing tree size?
cost-complexity pruning (weakest-link pruning)
What are the steps for the tree minimisation problem?
Step 1: Grow big. Use recursive binary splitting with a stopping condition.
Rationale: a seemingly worthless split high in the tree may play an important role lower down.
Step 2: Prune back. Recursively prune the big tree: examine every pair of leaves and eliminate the pair whose removal gives the SMALLEST increase in loss. This gives a sequence of subtrees of the initial big tree that contains T hat alpha.
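The grow-then-prune trade-off can be sketched with made-up numbers (the subtree losses below are illustrative, not from the course): for each candidate subtree T, compare the penalised cost L(T, y) + alpha * |T| and keep the minimiser.

```python
# Toy sketch of cost-complexity selection: pick |T| minimising
# C_alpha(T) = L(T, y) + alpha * |T|. The losses are hypothetical,
# shrinking as the tree grows (bigger trees fit training data better).

def best_subtree(losses, alpha):
    """losses: {num_leaves: training loss}. Return the |T| minimising C_alpha."""
    return min(losses, key=lambda T: losses[T] + alpha * T)

losses = {1: 50.0, 2: 20.0, 4: 8.0, 8: 5.0}

print(best_subtree(losses, 0.0))    # 8: no penalty -> biggest tree (overfits)
print(best_subtree(losses, 2.0))    # 4: moderate penalty -> a middle-sized tree
print(best_subtree(losses, 100.0))  # 1: huge penalty -> root only (sample mean)
```

This also shows the two limiting cases on the other cards: alpha = 0 selects the perfectly fitting tree, and a very large alpha collapses the tree to a single node.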
What does smaller cp mean?
a smaller alpha
a small penalty on complexity and hence a bigger tree
The ______________ has the CV-minimising cp.
middle tree
Can decision trees help to improve linear regression model?
Yes
The tree can give us ideas about important interactions which can help further improve the linear regression model.
Is it possible to use MSE for binary y?
YES
Why is misclassification not recommended as the loss for growing the tree?
Misclassification is not differentiable and is relatively insensitive to changes in the node probabilities.
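This insensitivity shows up in a classic two-class comparison (the counts below are illustrative): two splits of 800 observations with identical misclassification rates, where the Gini index still prefers the split that creates a pure node.

```python
# Two candidate splits of a node with 400 observations of each class.
# Each child node is written as (count of class 0, count of class 1).

def misclass(node):
    """Misclassification rate at a node: 1 - proportion of the majority class."""
    n = sum(node)
    return 1 - max(node) / n

def gini_counts(node):
    """Gini index at a node: sum over classes of p_k * (1 - p_k)."""
    n = sum(node)
    return sum((c / n) * (1 - c / n) for c in node)

def weighted(impurity, split):
    """Impurity of a split: node impurities weighted by node size."""
    n = sum(sum(node) for node in split)
    return sum(sum(node) / n * impurity(node) for node in split)

split_a = [(300, 100), (100, 300)]  # both children impure
split_b = [(200, 400), (200, 0)]    # second child is pure

print(weighted(misclass, split_a), weighted(misclass, split_b))         # both 0.25
print(weighted(gini_counts, split_a) > weighted(gini_counts, split_b))  # True
```

Misclassification cannot tell the two splits apart, while Gini rewards split_b for producing a pure node, which is why Gini (or deviance) is preferred for growing.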
What is the Gini error?
The error rate of a rule that assigns each observation at a node to outcome k with probability p_k (the node proportion of class k). In the binary case, it is just the variance at the node.
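A minimal sketch (with toy labels) of computing the Gini index of a node from its observed class proportions:

```python
from collections import Counter

def gini(labels):
    """Gini index: expected error of randomly assigning class k with prob p_k."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

print(gini([0, 0, 1, 1]))  # 0.5: binary with p = 0.5 -> 2 * 0.5 * 0.5
print(gini([0, 0, 0, 0]))  # 0.0: a pure node has zero Gini error
```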
What loss is used for growing tree? What is used for pruning?
Growing the tree: Gini or deviance
Pruning: misclassification
Which loss is suitable for cost-complexity pruning? Which one is the most common?
All
misclassification most common
Trees cannot perform variable selection. (T/F)
FALSE
They perform variable selection and detect nonlinearities and interactions without careful user specification.
What is a notable feature of decision trees?
automatic interaction detection
(detects Interaction between 2 variables)
What are T hat alpha and Q?
T hat alpha: the optimal tree structure
Q: the penalised objective, Q(T) = L(T, y) + alpha * |T|, so T hat alpha = argmin_T Q(T)
When alpha is infinity, what kind of model do you get?
The simplest possible model: a single node that predicts the sample mean for every observation.
What do you get after the tree minimisation algorithm?
A sequence of subtrees of the initial big tree that must contain T hat alpha.