Week 10 DSE Flashcards
What are the pros and cons of decision trees?
pros:
- no need to standardize data
- perform variable selection
- computationally efficient
- scales up very well, allows interpretation
cons:
- simple trees will likely lose out to other methods on prediction (unlike knn)
- don’t naturally lead to continuous models
- do not handle categorical variables well when there are many categories
- can increase predictive performance, but at the cost of interpretability (big, deep trees)
What is discontinuous data?
- Changing the data can change which variables the tree identifies as important
- highly dependent on training data
What are the root node and the leaf node?
- root: first node
- leaf: final decision
What is covariate space?
the space of the x variables (covariates)
A regression tree ______________ the _____________ into a set of rectangles and then fits a _______________ in each one.
effectively splits / partitions
covariate space
simple model (constant)
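A minimal sketch of the card above, assuming two covariates and three hand-picked rectangles. The data, the split values and the helper name `region` are made up for illustration; they are not from the course material.

```python
from statistics import mean

# Toy data: (x1, x2) covariates and a response y (made-up numbers).
X = [(1.0, 1.0), (2.0, 4.0), (3.0, 2.0), (6.0, 1.0), (7.0, 5.0), (8.0, 3.0)]
y = [10.0, 12.0, 11.0, 20.0, 30.0, 22.0]

def region(x):
    """Assign a point to one of three rectangles (hypothetical splits)."""
    x1, x2 = x
    if x1 < 5.0:
        return "R1"  # x1 < 5
    if x2 < 4.0:
        return "R2"  # x1 >= 5 and x2 < 4
    return "R3"      # x1 >= 5 and x2 >= 4

# Fit the simple model: a constant (the sample mean of y) in each rectangle.
groups = {}
for xi, yi in zip(X, y):
    groups.setdefault(region(xi), []).append(yi)
fitted = {name: mean(values) for name, values in groups.items()}
print(fitted)
```

The three rectangles cover the whole covariate space without overlapping, so every observation falls in exactly one of them.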
What are the steps of building a tree?
CART
- form the tree using recursive binary partitioning of the data
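Recursive binary partitioning can be sketched roughly as follows for a single numeric covariate. This is a simplified illustration, not the full CART algorithm: the function names, the `min_split` stopping rule and the toy data are assumptions, and the loss is the sum of squared errors around each node's mean.

```python
from statistics import mean

def sse(ys):
    """Sum of squared errors around the sample mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(xs, ys):
    """Return the threshold c for the rule x < c with the lowest total SSE."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2  # candidate threshold halfway between neighbours
        left = [y for x, y in zip(xs, ys) if x < c]
        right = [y for x, y in zip(xs, ys) if x >= c]
        loss = sse(left) + sse(right)
        if best is None or loss < best[1]:
            best = (c, loss)
    return None if best is None else best[0]

def grow(xs, ys, min_split=4):
    """Split recursively; stop when a node has fewer than min_split points."""
    c = best_split(xs, ys) if len(ys) >= min_split else None
    if c is None:
        return {"value": mean(ys)}  # leaf: predict the sample mean
    left = [(x, y) for x, y in zip(xs, ys) if x < c]
    right = [(x, y) for x, y in zip(xs, ys) if x >= c]
    return {
        "threshold": c,
        "left": grow([x for x, _ in left], [y for _, y in left], min_split),
        "right": grow([x for x, _ in right], [y for _, y in right], min_split),
    }

# Made-up training data.
xs = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
ys = [5.0, 5.0, 6.0, 6.0, 20.0, 20.0, 21.0, 21.0]
tree = grow(xs, ys)
```

Each call looks only for the best single split of the node it is given, so the procedure is greedy.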
To make a prediction, just find the ___________ to which the new observation belongs and equate the _________ to the __________ of that region.
interval
forecast
sample mean
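A small sketch of the prediction rule, assuming one numeric covariate and a single hypothetical split at `c = 7.0`. The training data are made up.

```python
from statistics import mean

# Made-up training data and a hypothetical split threshold.
train_x = [1.0, 2.0, 3.0, 10.0, 11.0]
train_y = [5.0, 6.0, 7.0, 20.0, 22.0]
c = 7.0

# Sample mean of the training responses in each region.
left_y = [y for x, y in zip(train_x, train_y) if x < c]
right_y = [y for x, y in zip(train_x, train_y) if x >= c]
region_mean = {"left": mean(left_y), "right": mean(right_y)}

def predict(x_new):
    """Find the region of x_new and return that region's sample mean."""
    region = "left" if x_new < c else "right"
    return region_mean[region]

print(predict(2.5), predict(12.0))
```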
What is the difference between the splits if xi is numeric vs categorical?
- For numeric xi, the rule is based on the threshold xi < c.
- For categorical xi, the rule lists the set of categories sent left.
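The two kinds of rule could be written like this. The function names, thresholds and category sets are hypothetical and chosen only for illustration.

```python
def goes_left_numeric(value, c):
    """Numeric rule: send the observation left when value < c."""
    return value < c

def goes_left_categorical(value, left_categories):
    """Categorical rule: send the observation left when its category is listed."""
    return value in left_categories

# Hypothetical observations and split rules.
ages = [23, 41, 35]
colours = ["red", "green", "blue"]
age_left = [goes_left_numeric(a, 30) for a in ages]
colour_left = [goes_left_categorical(col, {"red", "blue"}) for col in colours]
print(age_left, colour_left)
```

With k categories there are 2^(k-1) - 1 distinct ways to divide them into a left and a right group, which is one reason variables with many categories are awkward for trees.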
What is used to measure complexity of decision trees?
number of leaves/terminal nodes (T)
What is usually the loss function of decision trees?
MSE
What does alpha stand for in the tree choice minimisation problem?
penalty parameter
What happens when alpha is 0?
we would choose the tree that perfectly fits the training data, resulting in overfitting
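A sketch of the trade-off, assuming the criterion takes the usual form loss + α × (number of leaves). The candidate trees and their training losses below are invented numbers, and `choose_size` is a hypothetical helper.

```python
# Hypothetical candidate trees: number of leaves -> training loss.
# The 8-leaf tree is assumed to fit the training data perfectly.
candidates = {1: 100.0, 2: 40.0, 3: 25.0, 5: 18.0, 8: 0.0}

def choose_size(alpha):
    """Return the number of leaves minimising loss + alpha * leaves."""
    return min(candidates, key=lambda t: candidates[t] + alpha * t)

for alpha in (0, 10, 30, 100):
    print(alpha, choose_size(alpha))
```

With α = 0 there is no penalty, so the largest tree wins. As α grows, each extra leaf must reduce the loss by more to be worth keeping, so the chosen tree shrinks.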
What is the algorithm in choosing tree size?
cost-complexity pruning (weakest link pruning)
What are the steps for the tree minimisation problem?
Step 1: Grow big. Use recursive binary splitting with a stopping condition.
Rationale: a seemingly worthless split high in the tree may play an important role lower down.
Step 2: Prune back. Recursively prune back the big tree: examine every pair of leaves and eliminate the pair whose removal results in the SMALLEST increase in loss. This gives a sequence of subtrees of the initial big tree that contains T̂ (T hat).
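The prune-back step, as described on this card, might be sketched as below. It assumes the big tree is already grown and that each node stores `sse`, the loss it would have as a leaf. The tree, its numbers and the function names are made up. Textbook weakest link pruning is stated in terms of collapsing internal nodes, so treat this as a simplified illustration.

```python
import copy

def leaves(node):
    """List all leaf nodes below (and including) this node."""
    if "left" not in node:
        return [node]
    return leaves(node["left"]) + leaves(node["right"])

def collapsible(node):
    """Internal nodes whose two children are both leaves."""
    if "left" not in node:
        return []
    l, r = node["left"], node["right"]
    if "left" not in l and "left" not in r:
        return [node]
    return collapsible(l) + collapsible(r)

def increase(node):
    """Loss increase from merging a pair of leaves back into their parent."""
    return node["sse"] - node["left"]["sse"] - node["right"]["sse"]

def prune_sequence(tree):
    """Return (number of leaves, total loss) for each subtree in the sequence."""
    tree = copy.deepcopy(tree)  # leave the caller's tree untouched
    seq = [(len(leaves(tree)), sum(n["sse"] for n in leaves(tree)))]
    while "left" in tree:
        weakest = min(collapsible(tree), key=increase)
        del weakest["left"], weakest["right"]  # the parent becomes a leaf
        seq.append((len(leaves(tree)), sum(n["sse"] for n in leaves(tree))))
    return seq

# Hypothetical big tree with four leaves.
big_tree = {
    "sse": 100,
    "left": {"sse": 10, "left": {"sse": 4}, "right": {"sse": 4}},
    "right": {"sse": 30, "left": {"sse": 5}, "right": {"sse": 5}},
}
sequence = prune_sequence(big_tree)
print(sequence)
```

Each entry in the sequence is a candidate tree size; the penalised criterion (loss plus α times the number of leaves) can then be compared across these candidates.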
What does smaller cp mean?
means smaller α
small penalty on complexity and hence bigger tree.