Non-parametric models Flashcards

1
Q

What is a non-parametric model?

A

A model in which the number of parameters is not fixed before fitting; the effective number of parameters can grow with the amount of training data (e.g. k-nearest neighbours, which stores the whole training set).

2
Q

For k-nearest-neighbours, what is the prediction function for regression/classification?

A
Regression: the prediction is the mean of y over the k nearest neighbours.
Classification: the predicted probability of a class is the mean of the indicator (1 if y is in the class, else 0) over the k nearest neighbours, i.e. the fraction of neighbours in that class.
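
A minimal NumPy sketch of both prediction rules; the function names and data layout are my own assumptions, not from the source:

```python
import numpy as np

def knn_regress(X_train, y_train, x, k):
    # Regression: mean of y over the k nearest neighbours of x.
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return y_train[nearest].mean()

def knn_class_prob(X_train, y_train, x, k, c):
    # Classification: fraction of the k nearest neighbours labelled c.
    nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    return np.mean(y_train[nearest] == c)
```
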
3
Q

For k-nearest neighbours, what happens to the model as we increase/decrease k?

A

Increasing k makes the model simpler: predictions get smoother, and at k = N we just take the global average of y. Decreasing k makes it more complex: at k = 1 every training point is reproduced exactly, which tends to overfit.
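
A tiny 1-D illustration of the two extremes (the toy data is an assumption):

```python
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])   # toy 1-D inputs
y = np.array([0.0, 1.0, 4.0, 9.0])

def knn_1d(x, k):
    nearest = np.argsort(np.abs(X - x))[:k]
    return y[nearest].mean()

print(knn_1d(2.0, k=1))       # 4.0: k = 1 reproduces the training point exactly
print(knn_1d(2.0, k=len(X)))  # 3.5: k = N is just the global average of y
```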

4
Q

Give four disadvantages to the k-nearest neighbour method

A

Memory intensive: the entire training set must be stored and searched at prediction time
Does not generalise to data outside the range of the training set
Suffers from the curse of dimensionality: pairwise distances become nearly equal in high dimensions (illustrated after this list)
Sensitive to the scale of the data and to the choice of distance metric
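
A quick sketch of the distance-concentration point above; the sample size and dimensions are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
for p in (2, 100, 10000):
    X = rng.random((200, p))                    # 200 uniform points in p dimensions
    d = np.linalg.norm(X - X[0], axis=1)[1:]    # distances from one point to the rest
    print(p, (d.max() - d.min()) / d.min())     # relative contrast shrinks as p grows
```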

5
Q

What is a CART? How can these approximate an arbitrary function?

A

A Classification And Regression Tree. It partitions the input space into axis-aligned regions and assigns a constant value to each region; with enough regions, such a piecewise-constant function can approximate an arbitrary function to any desired accuracy.
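
A tiny numerical illustration of the piecewise-constant idea, approximating sin(x) with 8 regions; the target function and region count are assumptions:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x)

# Partition [0, 2*pi] into 8 equal regions and assign each the mean of y inside it.
edges = np.linspace(0, 2 * np.pi, 9)
regions = np.digitize(x, edges[1:-1])                    # region index per point
approx = np.array([y[regions == m].mean() for m in range(8)])[regions]

print(np.abs(y - approx).max())   # the error shrinks as we use more regions
```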

6
Q

What is the prediction function for a CART?

A

A piecewise-constant function defined by a set of split points, split indices (the dimension each split acts on), and one constant value per leaf region: descend the tree by comparing x against the splits, then return the constant of the region x lands in.
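
A sketch of how those parameters define the prediction function, using a hard-coded two-split tree; the dict layout is an assumption:

```python
# Internal nodes store a split index and split point; leaves store a constant value.
tree = {"index": 0, "point": 5.0,
        "left":  {"value": 1.0},
        "right": {"index": 1, "point": 2.0,
                  "left":  {"value": 3.0},
                  "right": {"value": 7.0}}}

def predict(node, x):
    # Descend by comparing x against each split until a leaf is reached.
    if "value" in node:
        return node["value"]
    child = "left" if x[node["index"]] <= node["point"] else "right"
    return predict(node[child], x)

print(predict(tree, [6.0, 1.5]))  # x[0] > 5, x[1] <= 2  ->  3.0
```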

7
Q

What are the parameters we are optimising for?

A

The split points, the split indices (dimensions), and the constant value assigned to each region.

8
Q

Explain what the recursive binary splitting algorithm does

A

A greedy algorithm for growing the tree. Starting from the whole input space, at each iteration we find the best split point and split index for the current region, split it into two subregions, and recurse on each of them.
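
A compact sketch of that greedy loop with squared-error loss; the min_leaf stopping rule and all names are assumptions:

```python
import numpy as np

def sse(y):
    # SE loss with the optimal constant (the region mean) plugged in.
    return np.sum((y - y.mean()) ** 2)

def grow(X, y, min_leaf=5):
    best = None
    for j in range(X.shape[1]):                  # every split index (dimension)
        for s in np.unique(X[:, j])[:-1]:        # every candidate split point
            left = X[:, j] <= s
            loss = sse(y[left]) + sse(y[~left])
            if best is None or loss < best[0]:
                best = (loss, j, s, left)
    if best is None or min(best[3].sum(), (~best[3]).sum()) < min_leaf:
        return {"value": y.mean()}               # stop growing: make a leaf
    _, j, s, left = best
    return {"index": j, "point": s,
            "left":  grow(X[left], y[left], min_leaf),
            "right": grow(X[~left], y[~left], min_leaf)}
```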

9
Q

For a square error loss function, what is the objective function for a CART in a given region?

A

The sum of the SE loss over the left and right regions produced by the split, with each region's constant set to the mean of y in that region (yhat):
sum over x_i in left of (y_i - yhat_left)^2 + sum over x_i in right of (y_i - yhat_right)^2.
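
A worked toy evaluation of that objective at one split; the numbers are made up:

```python
import numpy as np

y_left, y_right = np.array([1.0, 3.0]), np.array([10.0, 12.0, 14.0])

# SE loss per region, with the constant set to that region's mean (yhat).
objective = (np.sum((y_left - y_left.mean()) ** 2)
             + np.sum((y_right - y_right.mean()) ** 2))
print(objective)  # 2.0 + 8.0 = 10.0
```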

10
Q

What method can we use to determine the optimal split and region at each iteration?

A

Exhaustive search: with p dimensions and N datapoints there are at most p(N-1) distinct splits, so we simply evaluate them all and keep the best. We cannot use gradient descent because the loss is piecewise constant in the split point, and hence not differentiable.
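
A small sanity check of the p(N-1) count; the toy data shape is an assumption:

```python
import numpy as np

N, p = 5, 3
X = np.random.rand(N, p)                          # distinct values almost surely
candidates = [(j, s) for j in range(p)
              for s in np.sort(X[:, j])[:-1]]     # N-1 thresholds per dimension
print(len(candidates), "==", p * (N - 1))         # 12 == 12
```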

11
Q

Name four ways we can prevent overfitting with trees

A

Minimum number of datapoints per leaf node
Maximum tree depth
Minimum number of datapoints a node must contain before it can be split
Stop splitting when the improvement in the loss is too small
(all four map onto standard library hyperparameters; see the sketch below)
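
For concreteness, the same four controls as they appear in scikit-learn's DecisionTreeRegressor; the values shown are arbitrary:

```python
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(
    min_samples_leaf=5,          # minimum datapoints per leaf node
    max_depth=4,                 # maximum tree depth
    min_samples_split=10,        # minimum datapoints a node needs before splitting
    min_impurity_decrease=0.01,  # stop when the improvement is too small
)
```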

12
Q

In what situation would a CART not generalise well?

A

They do not extrapolate: predictions are constant everywhere outside the range of the training data, so any trend in the training data is not continued beyond it (demonstrated below).
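
A quick demonstration with scikit-learn; the linear toy data is an assumption:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.linspace(0, 10, 100).reshape(-1, 1)
tree = DecisionTreeRegressor().fit(X, 2 * X.ravel())   # y = 2x, perfectly linear

print(tree.predict([[5.0], [20.0], [100.0]]))
# ~[10., 20., 20.]: inside the range the fit tracks y = 2x, but beyond x = 10
# every prediction is stuck at the last leaf's constant instead of 40 or 200.
```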
