Week 10 DSE Flashcards

1
Q

What are the pros and cons of decision trees?

A

pros:
- no need to standardize data
- performs variable selection
- computationally efficient
- scales up very well, allows interpretation

cons:
- simple trees are likely to lose out to other methods on prediction (unlike kNN)
- don't naturally lead to continuous models
- do not handle categorical variables well when there are many categories
- big, deep trees can increase predictive performance, but at a cost to interpretability

2
Q

What is discontinuous data?

A
  • Small changes in the data can cause the important variables predicted to be different
  • highly dependent on training data
3
Q

What are the root node and leaf node?

A
  • root: first node
  • leaf: final decision
4
Q

What is covariate space?

A

the space of the x variables

5
Q

Regression tree ______________ the _____________ into a set of rectangles and then fit a _______________in each one.

A

splits effectively/partition
covariate space
simple model (constant)

6
Q

What are the steps of building a tree?

A

CART
- form the tree using recursive binary partitioning of the data
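One recursive binary partitioning step can be sketched in Python as below. This is an illustrative sketch only (the function names are made up, not from the course); real CART repeats this search recursively over all features and child nodes.

```python
# Minimal sketch of one CART step: find the numeric threshold c that
# minimises the summed squared error (SSE) of the two resulting children.

def sse(ys):
    """Sum of squared errors around the node mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (threshold, loss) of the best split x < c on one feature."""
    best_c, best_loss = None, float("inf")
    for c in sorted(set(xs))[1:]:              # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < c]
        right = [y for x, y in zip(xs, ys) if x >= c]
        loss = sse(left) + sse(right)
        if loss < best_loss:
            best_c, best_loss = c, loss
    return best_c, best_loss

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
print(best_split(xs, ys)[0])   # 10: splits the two clusters apart
```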

7
Q

To make a prediction, just find the ___________to which the new observation belongs and equate the _________to the __________ of that region.

A

interval
forecast
sample mean
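The prediction rule above can be sketched in Python for a one-dimensional partition. This is a hedged illustration with made-up helper names, assuming the regions are intervals defined by sorted split points.

```python
# Sketch: forecast a new x by the sample mean of the region it falls into.
import bisect

def region_means(edges, xs, ys):
    """Mean of y within each interval defined by the sorted split points."""
    means = []
    for lo, hi in zip([float("-inf")] + edges, edges + [float("inf")]):
        vals = [y for x, y in zip(xs, ys) if lo <= x < hi]
        means.append(sum(vals) / len(vals) if vals else None)
    return means

def predict(x, edges, means):
    """Find the interval containing x and return that region's sample mean."""
    return means[bisect.bisect_right(edges, x)]

edges = [10]                       # one split: x < 10 vs x >= 10
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
means = region_means(edges, xs, ys)
print(predict(4, edges, means))    # sample mean of the left region
```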

8
Q

What is the difference between the splits if xi is numeric vs categorical?

A
  • For numeric xi, the rule is based on the threshold xi < c.
  • For categorical xi, the rule lists the set of categories sent left.
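The two rule types can be sketched as below (function names are illustrative, not from the course). Note that a categorical variable with K categories has 2^(K-1) − 1 possible left-sets, which is why trees struggle when there are many categories.

```python
# Sketch of the two split-rule types at a node.

def numeric_rule(x, c):
    """Numeric feature: send the observation left iff x < c."""
    return x < c

def categorical_rule(x, left_set):
    """Categorical feature: send left iff the category is in the listed set."""
    return x in left_set

print(numeric_rule(3.2, 5.0))                    # True -> goes left
print(categorical_rule("red", {"red", "blue"}))  # True -> goes left
```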
9
Q

What is used to measure complexity of decision trees?

A

number of leaves/terminal nodes, |T|

10
Q

What is usually the loss function of decision trees?

A

MSE

11
Q

What does alpha stand for in the tree choice minimisation problem?

A

penalty parameter

12
Q

What happens when alpha is 0?

A

we would choose the tree that perfectly fits the data, resulting in overfitting

13
Q

What is the algorithm in choosing tree size?

A

cost-complexity pruning (weakest link pruning)

14
Q

What are the steps for the tree minimisation problem?

A

Step 1: Grow big. Use recursive binary splitting with a stopping condition.
Rationale: a seemingly worthless split high in the tree may really play an important role lower down.

Step 2: Prune back. Recursively prune back the big tree: examine every pair of leaves and eliminate the pair whose removal results in the SMALLEST increase in loss. This gives a sequence of subtrees of the initial big one that contains T hat.
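After pruning, the final selection over the subtree sequence can be sketched as below: pick the subtree minimising L(T, y) + α|T|. This is a hedged illustration with made-up subtree names and loss values; it also shows the α = 0 (overfitting) and large-α (stump) extremes from the other cards.

```python
# Sketch: choose among candidate subtrees by penalised loss L(T,y) + alpha*|T|.

def pick_tree(subtrees, alpha):
    """subtrees: list of (name, training_loss, n_leaves); return the argmin."""
    return min(subtrees, key=lambda t: t[1] + alpha * t[2])

# Hypothetical subtree sequence from weakest-link pruning (numbers made up).
subtrees = [
    ("stump", 40.0, 1),    # predicts the sample mean everywhere
    ("small", 15.0, 3),
    ("medium", 6.0, 6),
    ("big", 1.0, 12),      # near-perfect fit to the training data
]
print(pick_tree(subtrees, alpha=0.0)[0])    # alpha = 0 rewards fit only
print(pick_tree(subtrees, alpha=2.0)[0])    # moderate penalty
print(pick_tree(subtrees, alpha=100.0)[0])  # huge penalty -> stump
```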

15
Q

What does smaller cp mean?

A

smaller cp means smaller α:
a small penalty on complexity and hence a bigger tree.

16
Q

_________ has the CV-min cp.

A

Middle tree

17
Q

Can decision trees help to improve linear regression model?

A

Yes
The tree can give us ideas about important interactions which can help further improve the linear regression model.

18
Q

Is it possible to use MSE for binary y?

A

YES

19
Q

Why is misclassification not recommended for use as the loss for growing a tree?

A

Misclassification loss is not differentiable and is relatively insensitive to changes in node probabilities

20
Q

What is the gini error?

A

error rate for a rule that assigns each observation to outcome k with probability p̂_k. In the binary case, it is just the variance at the node.
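The Gini error can be sketched in Python as below, using the standard formula gini = Σ_k p_k(1 − p_k) over the class proportions at the node (function name illustrative).

```python
# Sketch: Gini error at a node with class proportions p_k.

def gini(proportions):
    """Gini error: sum over classes of p_k * (1 - p_k)."""
    return sum(p * (1 - p) for p in proportions)

print(gini([0.3, 0.7]))   # ~0.42, i.e. 2 * 0.3 * 0.7 in the binary case
print(gini([0.5, 0.5]))   # maximal impurity for a binary node
print(gini([1.0, 0.0]))   # 0.0: a pure node
```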

21
Q

What loss is used for growing tree? What is used for pruning?

A

Growing tree: gini or deviance

Pruning: Misclassification

22
Q

Which loss is suitable for cost-complexity pruning? Which one is the most common?

A

All

misclassification most common

23
Q

Trees cannot perform variable selection. (T/F)

A

FALSE

Trees perform variable selection and detect nonlinearities and interactions without careful user specification.

24
Q

What is a notable feature of decision trees?

A

automatic interaction detection

(detects interactions between two variables)

25
Q

What is T hat alpha, and Q?

A

T hat alpha: optimal tree structure
Q = argmin_T (L(T, y) + α|T|)

26
Q

when alpha is infinity, what kind of model do you get?

A

the simplest possible model, which is predicting the sample mean for any observation

27
Q

What do you get after the tree minimisation algorithm?

A

a sequence of subtrees of the initial big one that must contain T hat alpha