Non-parametric models Flashcards

1
Q

What is a non-parametric model?

A

A model in which the number of parameters is not fixed before fitting; the effective number of parameters can grow with the amount of training data (e.g. k-nearest neighbours, which stores the whole training set).

2
Q

For k-nearest-neighbours, what is the prediction function for regression/classification?

A
Regression: the prediction is the mean of y over the k nearest neighbours.
Classification: the predicted probability of a class is the mean of the indicator (1 if y is in the class, else 0) over the k nearest neighbours, i.e. the fraction of neighbours in that class.
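
A minimal NumPy sketch of both prediction rules; the function names and data layout are my own assumptions, not from the source:

```python
import numpy as np

def knn_regress(X_train, y_train, x, k):
    # Regression: mean of y over the k nearest neighbours of x.
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return y_train[nearest].mean()

def knn_class_prob(X_train, y_train, x, k, c):
    # Classification: fraction of the k nearest neighbours labelled c.
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return np.mean(y_train[nearest] == c)
```
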
3
Q

For k-nearest neighbours, what happens to the model as we increase/decrease k?

A

Increasing k makes the model simpler: predictions get smoother, and at k = N we just take the global average of y. Decreasing k makes it more complex: at k = 1 every training point is reproduced exactly, which tends to overfit.
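
A tiny 1-D illustration of the two extremes (the toy data is an assumption):

```python
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])   # toy 1-D inputs
y = np.array([0.0, 1.0, 4.0, 9.0])

def knn_1d(x, k):
    nearest = np.argsort(np.abs(X - x))[:k]
    return y[nearest].mean()

print(knn_1d(2.0, k=1))       # 4.0: k = 1 reproduces the training point exactly
print(knn_1d(2.0, k=len(X)))  # 3.5: k = N is just the global average of y
```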

4
Q

Give four disadvantages to the k-nearest neighbour method

A

Memory intensive: the entire training set must be stored and searched at prediction time
Does not generalise to data outside the range of the training set
Suffers from the curse of dimensionality: pairwise distances become nearly equal in high dimensions (illustrated after this list)
Sensitive to the scale of the data and to the choice of distance metric
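
A quick sketch of the distance-concentration point above; the sample size and dimensions are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
for p in (2, 100, 10000):
    X = rng.random((200, p))                    # 200 uniform points in p dimensions
    d = np.linalg.norm(X - X[0], axis=1)[1:]    # distances from one point to the rest
    print(p, (d.max() - d.min()) / d.min())     # relative contrast shrinks as p grows
```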

5
Q

What is a CART? How can these approximate an arbitrary function?

A

A Classification And Regression Tree. It partitions the input space into axis-aligned regions and assigns a constant value to each region; with enough regions, such a piecewise-constant function can approximate an arbitrary function to any desired accuracy.
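
A tiny numerical illustration of the piecewise-constant idea, approximating sin(x) with 8 regions; the target function and region count are assumptions:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x)

# Partition [0, 2*pi] into 8 equal regions and assign each the mean of y inside it.
edges = np.linspace(0, 2 * np.pi, 9)
regions = np.digitize(x, edges[1:-1])                    # region index per point
approx = np.array([y[regions == m].mean() for m in range(8)])[regions]

print(np.abs(y - approx).max())   # the error shrinks as we use more regions
```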

6
Q

What is the prediction function for a CART?

A

A piecewise-constant function defined by a set of split points, split indices (the dimension each split acts on), and one constant value per leaf region: descend the tree by comparing x against the splits, then return the constant of the region x lands in.
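
A sketch of how those parameters define the prediction function, using a hard-coded two-split tree; the dict layout is an assumption:

```python
# Internal nodes store a split index and split point; leaves store a constant value.
tree = {"index": 0, "point": 5.0,
        "left":  {"value": 1.0},
        "right": {"index": 1, "point": 2.0,
                  "left":  {"value": 3.0},
                  "right": {"value": 7.0}}}

def predict(node, x):
    # Descend by comparing x against each split until a leaf is reached.
    if "value" in node:
        return node["value"]
    child = "left" if x[node["index"]] <= node["point"] else "right"
    return predict(node[child], x)

print(predict(tree, [6.0, 1.5]))  # x[0] > 5, x[1] <= 2  ->  3.0
```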

7
Q

What are the parameters we are optimising for?

A

The split points, the split indices (dimensions), and the constant value assigned to each region.

8
Q

Explain what the recursive binary splitting algorithm does

A

A greedy algorithm for growing the tree. Starting from the whole input space, at each iteration we find the best split point and split index for the current region, split it into two subregions, and recurse on each of them.
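
A compact sketch of that greedy loop with squared-error loss; the min_leaf stopping rule and all names are assumptions:

```python
import numpy as np

def sse(y):
    # SE loss with the optimal constant (the region mean) plugged in.
    return np.sum((y - y.mean()) ** 2)

def grow(X, y, min_leaf=5):
    best = None
    for j in range(X.shape[1]):                  # every split index (dimension)
        for s in np.unique(X[:, j])[:-1]:        # every candidate split point
            left = X[:, j] <= s
            loss = sse(y[left]) + sse(y[~left])
            if best is None or loss < best[0]:
                best = (loss, j, s, left)
    if best is None or min(best[3].sum(), (~best[3]).sum()) < min_leaf:
        return {"value": y.mean()}               # stop growing: make a leaf
    _, j, s, left = best
    return {"index": j, "point": s,
            "left":  grow(X[left], y[left], min_leaf),
            "right": grow(X[~left], y[~left], min_leaf)}
```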

9
Q

For a square error loss function, what is the objective function for a CART in a given region?

A

The sum of the SE loss over the left and right regions produced by the split, with each region's constant set to the mean of y in that region (yhat):
sum over x_i in left of (y_i - yhat_left)^2 + sum over x_i in right of (y_i - yhat_right)^2.
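
A worked toy evaluation of that objective at one split; the numbers are made up:

```python
import numpy as np

y_left, y_right = np.array([1.0, 3.0]), np.array([10.0, 12.0, 14.0])

# SE loss per region, with the constant set to that region's mean (yhat).
objective = (np.sum((y_left - y_left.mean()) ** 2)
             + np.sum((y_right - y_right.mean()) ** 2))
print(objective)  # 2.0 + 8.0 = 10.0
```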

10
Q

What method can we use to determine the optimal split and region at each iteration?

A

Exhaustive search: with p dimensions and N datapoints there are at most p(N-1) distinct splits, so we simply evaluate them all and keep the best. We cannot use gradient descent because the loss is piecewise constant in the split point, and hence not differentiable.
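
A small sanity check of the p(N-1) count; the toy data shape is an assumption:

```python
import numpy as np

N, p = 5, 3
X = np.random.rand(N, p)                          # distinct values almost surely
candidates = [(j, s) for j in range(p)
              for s in np.sort(X[:, j])[:-1]]     # N-1 thresholds per dimension
print(len(candidates), "==", p * (N - 1))         # 12 == 12
```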

11
Q

Name four ways we can prevent overfitting with trees

A

Minimum number of datapoints per leaf node
Maximum tree depth
Minimum number of datapoints a node must contain before it can be split
Stop splitting when the improvement in the loss is too small
(all four map onto standard library hyperparameters; see the sketch below)
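
For concreteness, the same four controls as they appear in scikit-learn's DecisionTreeRegressor; the values shown are arbitrary:

```python
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(
    min_samples_leaf=5,          # minimum datapoints per leaf node
    max_depth=4,                 # maximum tree depth
    min_samples_split=10,        # minimum datapoints a node needs before splitting
    min_impurity_decrease=0.01,  # stop when the improvement is too small
)
```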

12
Q

In what situation would a CART not generalise well?

A

They do not extrapolate: predictions are constant everywhere outside the range of the training data, so any trend in the training data is not continued beyond it (demonstrated below).
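
A quick demonstration with scikit-learn; the linear toy data is an assumption:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 10, 100).reshape(-1, 1)
tree = DecisionTreeRegressor().fit(X, 2 * X.ravel())   # y = 2x, perfectly linear

print(tree.predict([[5.0], [20.0], [100.0]]))
# ~[10., 20., 20.]: inside the range the fit tracks y = 2x, but beyond x = 10
# every prediction is stuck at the last leaf's constant instead of 40 or 200.
```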
