Non-parametric models Flashcards
What is a non-parametric model?
The number of parameters is not fixed before fitting; it can grow with the amount of training data.
For k-nearest-neighbours, what is the prediction function for regression/classification?
Regression: the mean of y over the k nearest neighbours. Classification: the predicted probability of a class is the fraction of the k nearest neighbours belonging to that class (the mean of the indicator "1 if y is in the class, else 0").
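A minimal sketch of both prediction rules, assuming a Euclidean distance metric and NumPy arrays for the training data (all names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Regression: mean of y over the k nearest neighbours of x."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    return y_train[nearest].mean()

def knn_predict_proba(X_train, y_train, x, k=3, cls=1):
    """Classification: fraction of the k nearest neighbours belonging to class `cls`."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.mean(y_train[nearest] == cls)
```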
For k-nearest neighbours, what happens to the model as we increase/decrease k?
Increasing k makes the model simpler and smoother; as k approaches the size of the training set, the prediction tends toward the global average. Decreasing k makes the model more complex; at k = 1 it fits the training data exactly and is prone to overfitting.
Give four disadvantages of the k-nearest-neighbour method
Memory intensive
Does not generalise to data outside the range of the training set
Suffers from the curse of dimensionality: points become roughly equidistant in high dimensions, so the "nearest" neighbours are no longer meaningfully close
Sensitive to the scale of the data and the choice of distance metric
What is a CART? How can these approximate an arbitrary function?
A Classification And Regression Tree. It partitions the input space into regions and assigns a constant value to each region; by splitting the space into enough small regions, this piecewise-constant function can approximate an arbitrary function.
What is the prediction function for a CART?
A piecewise-constant function defined by split points, split indices (the input dimension each split applies to), and a constant value per region: an input is routed through the splits to a region, and the prediction is that region's constant.
What are the parameters we are optimising for?
The split points, split indices (dimensions), and the constant values assigned to the regions.
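A hedged sketch of how those parameters define the prediction: each internal node stores a split dimension and split point, each leaf stores a constant. The dictionary layout below is an assumption made for illustration, not a fixed CART data structure.

```python
def cart_predict(node, x):
    """Walk the tree: compare x along the stored split dimension until a leaf is
    reached, then return that leaf's constant value."""
    while "value" not in node:                 # internal nodes hold a split, leaves hold a constant
        j, s = node["split_dim"], node["split_point"]
        node = node["left"] if x[j] <= s else node["right"]
    return node["value"]

# Example tree: split on feature 0 at 2.5, constants 1.0 and 3.0 in the two regions
tree = {"split_dim": 0, "split_point": 2.5,
        "left": {"value": 1.0}, "right": {"value": 3.0}}
print(cart_predict(tree, [1.0]))   # 1.0
print(cart_predict(tree, [4.0]))   # 3.0
```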
Explain what the recursive binary splitting algorithm does
A greedy algorithm for growing the tree. At each iteration we find the single best split point and split index for a region (the one that most reduces the loss), split that region in two, and recurse on the resulting sub-regions.
For a square error loss function, what is the objective function for a CART in a given region?
The sum of the squared-error losses of the left and right regions produced by the split, where each region's constant prediction yhat is the mean of the y values in that region.
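Written out (a standard squared-error formulation; the region notation and means below are introduced here for clarity), the best split over dimension $j$ and split point $s$ solves

$$\min_{j,\,s}\ \left[\ \sum_{x_i \in R_1(j,s)} \big(y_i - \hat{y}_{R_1}\big)^2 \;+\; \sum_{x_i \in R_2(j,s)} \big(y_i - \hat{y}_{R_2}\big)^2 \right],$$

where $R_1(j,s) = \{x : x_j \le s\}$, $R_2(j,s) = \{x : x_j > s\}$, and $\hat{y}_{R_m}$ is the mean of the $y_i$ falling in region $R_m$.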
What method can we use to determine the optimal split and region at each iteration?
Exhaustive search: we simply try all p(N − 1) candidate splits (N − 1 thresholds for each of the p input dimensions) and keep the best. We cannot use gradient descent because the loss is not differentiable in the split point.
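A minimal sketch of that exhaustive search, assuming a NumPy design matrix with p columns and the squared-error objective above; for each feature the N − 1 candidate thresholds are taken midway between consecutive sorted values (names are illustrative).

```python
import numpy as np

def best_split(X, y):
    """Try every feature j and every threshold between consecutive sorted values,
    returning the (j, s) pair that minimises the summed squared error."""
    best = (None, None, np.inf)
    n, p = X.shape
    for j in range(p):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        for i in range(1, n):                      # N - 1 candidate thresholds per feature
            s = 0.5 * (xs[i - 1] + xs[i])
            left, right = ys[:i], ys[i:]
            loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if loss < best[2]:
                best = (j, s, loss)
    return best   # (split dimension, split point, loss)
```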
Name four ways we can prevent overfitting with trees
Minimum number of datapoints per leaf node
Max tree depth
Minimum number of datapoints a node must contain before it can be split
Stop splitting when the improvement in the loss is too small (these criteria map onto standard library hyperparameters; see the sketch after this list)
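As a hedged example, the four criteria correspond to hyperparameters of scikit-learn's `DecisionTreeRegressor` (the values below are illustrative, not recommendations):

```python
from sklearn.tree import DecisionTreeRegressor

# Each argument corresponds to one of the stopping rules above:
tree = DecisionTreeRegressor(
    min_samples_leaf=5,          # minimum number of datapoints per leaf node
    max_depth=4,                 # maximum tree depth
    min_samples_split=10,        # minimum number a node must have before splitting
    min_impurity_decrease=1e-3,  # stop when the improvement is too small
)
```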
In what situation would a CART not generalise well?
When asked to extrapolate: a CART predicts a constant (the value of its outermost region) everywhere outside the range of the training data, so it cannot follow trends beyond that range.