Chapter 2 - Statistical Learning Flashcards
Explain typical (squared) bias, variance, training error, test error, and Bayes (or irreducible) error curves as we go from less flexible statistical learning methods towards more flexible approaches.
Exercise 2.4.3
Flexibility can be increased by adding parameters; the simplest model, which predicts only the mean of the response, sits at the inflexible end.
Bayes error is the irreducible error: a flat line that does not depend on flexibility. No model's expected test error can fall below it.
Training error starts relatively high when the model is inflexible and decreases steadily as flexibility increases. If the training error drops below the Bayes error, the model is overfitting.
Bias follows the training error: the more flexible the model, the more closely it fits the points and the lower the bias.
Variance behaves in the opposite way to bias: it grows with flexibility. In a high-variance model, small changes in the training data cause large changes in the fitted model.
Test error typically sits above the training error. As flexibility increases, test error first decreases, then rises again once the model starts to overfit; this U-shape is the bias-variance trade-off, sketched in the code below.
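A minimal sketch of these curves, assuming NumPy and scikit-learn are available; polynomial degree stands in for flexibility, and the data, degrees, and noise level are illustrative choices only.

```python
# Training error keeps falling as flexibility (polynomial degree) grows,
# while test error traces a U-shape above the irreducible error (0.3^2 = 0.09).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=200)  # noisy non-linear truth
x_train, y_train, x_test, y_test = x[:100], y[:100], x[100:], y[100:]

for degree in [1, 3, 10, 20]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Training MSE shrinks steadily with degree, while test MSE bottoms out near the true complexity and then climbs back up.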
Describe the differences between a parametric and a non-parametric statistical learning approach. What are the advantages of a parametric approach to regression or classification (as opposed to a non-parametric approach)? What are its disadvantages?
Exercise 2.4.6
Parametric approaches assume a functional form for the relationship in the data, for example that it is linear. The problem then reduces to estimating the parameters of that assumed form.
Non-parametric approaches make no such assumption: no functional form is fixed in advance. They therefore require a very large dataset to make accurate predictions.
Advantages of parametric approaches: computation is much simpler, the model is more interpretable, and less data is required, because only a small number of parameters needs to be estimated.
Disadvantages of parametric approaches: we might assume the wrong underlying functional form, and if we compensate by choosing a very flexible parametric model, there is a high risk of overfitting.
Advantages/Disadvantages of non-parametric approaches: no assumption about the underlying function or the relationship between the variables is needed, but estimating that function requires a large dataset. These models work well for complex relationships, which covers many real-life applications, since most things are not linear. Predictions will likely be better with a non-parametric model, but it is often a black box whose workings we cannot explain or interpret, and the risk of overfitting is high. The sketch below contrasts the two approaches.
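A hedged sketch of the contrast, on synthetic data and assuming scikit-learn: the parametric fit compresses everything into two numbers, while the non-parametric fit keeps the training data and averages nearby responses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=(300, 1))
y = np.log1p(x).ravel() + rng.normal(scale=0.2, size=300)  # non-linear truth

# Parametric: assumes y = b0 + b1*x; the whole fit is two estimated numbers.
linear = LinearRegression().fit(x, y)
print("intercept:", linear.intercept_, "slope:", linear.coef_[0])

# Non-parametric: no assumed form; predictions average the 10 nearest responses.
knn = KNeighborsRegressor(n_neighbors=10).fit(x, y)
print("linear R^2:", linear.score(x, y), "KNN R^2:", knn.score(x, y))
```

Because the truth here is non-linear, the KNN fit scores higher, but it offers no compact, interpretable summary the way the two linear coefficients do.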
K-nearest neighbors - If the Bayes decision boundary in this problem is highly non linear, then would we expect the best value for K to be large or small? Why?
Exercise 2.4.7
The best value of K should be small. With a large K we average over many neighbors, which smooths the decision boundary toward linear; with a small K the KNN model is more flexible and can trace a highly non-linear boundary. → Highly non-linear boundaries call for a small value of K: more flexibility. See the sketch below.
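A minimal sketch, assuming scikit-learn; the circular class boundary and the K values are illustrative choices only.

```python
# Small K follows the highly non-linear (circular) boundary; large K
# averages over so many neighbors that the boundary over-smooths.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)  # circle = non-linear boundary

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for k in [1, 5, 50, 200]:
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"K = {k:3d}: test accuracy {acc:.2f}")
```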
Is nearest neighbor averaging good for small or large values of p?
Small p
Curse of dimensionality
In high dimensions, even the nearest neighbors tend to be far away, so we are forced to average over a large neighborhood and the estimate stops being local. See the sketch below.
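A minimal sketch of this effect, assuming scikit-learn: with the sample size fixed, the mean distance to the nearest neighbor grows rapidly with the dimension p.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
n = 1000
for p in [1, 2, 10, 50, 100]:
    X = rng.uniform(0, 1, size=(n, p))  # n points in the unit hypercube
    # Distance from each point to its nearest neighbor (column 0 is the point itself).
    dist, _ = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    print(f"p = {p:3d}: mean nearest-neighbor distance {dist[:, 1].mean():.3f}")
```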
The output is qualitative - Regression or classification?
Classification
The output is quantitative - Regression or classification?
Regression
Give examples of classification problems
Whether water is drinkable
Passing or failing an exam
Recognizing animals in pictures
Diagnosing a disease
Give examples of regression problems
Money spent yearly on medical care
House prices
Predicting profit and sales
Give examples when we can use cluster analysis
Product recommendations
Anomaly detection
Image segmentation
Advantages of very flexible models
- Can capture complex relations
- Incorporate more variables and learn more about how they relate to each other and the desired output
- Lower bias: we can fit the data more closely
- Fewer assumptions need to be made beforehand
Disadvantage of very flexible models
- Higher risk of overfitting
- Can become computationally expensive
- Needs more data to train the model
- Higher variance
- Choices must still be made beforehand, such as which variables to include and how much flexibility to allow
When to use less flexible models
- We do not have a lot of data and/or variables
- Interpretability and easy illustration are priorities
- Inference (understanding how the predictors relate to the response) is the goal, as in the sketch below
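A minimal sketch of the inference case, on synthetic data and assuming scikit-learn: each coefficient of a linear fit can be read directly as the estimated effect of one predictor, something a flexible black-box model does not offer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3))  # three predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)  # x3 has no effect

model = LinearRegression().fit(X, y)
for name, coef in zip(["x1", "x2", "x3"], model.coef_):
    print(f"{name}: estimated effect {coef:+.2f}")  # roughly +2.00, -1.00, +0.00
```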
When to use more flexible models
- When a simpler model does not perform well enough
- When we think the underlying relation is complex
- High quality and detailed predictions are a priority
- When the data set is large and we have computational power
- Interpretability is not crucial