Chapter 2 - Statistical Learning Flashcards

1
Q

Explain typical (squared) bias, variance, training error, test error, and Bayes (or irreducible) error curves as we go from less flexible statistical learning methods towards more flexible approaches.

Exercise 2.4.3

A

Flexibility increases with the number of effective parameters; the simplest model just predicts the overall mean, while more flexible methods can bend to follow the training points.

The Bayes error is irreducible: it appears as a flat line, and no model's expected test error can fall below it.

Training error starts relatively high for an inflexible model and decreases monotonically as flexibility increases. If the training error drops below the Bayes error, the model is overfitting.

(Squared) bias follows the training error downwards: the more flexible the model, the lower the bias, because a flexible fit can follow the true relationship closely, while a rigid model is systematically off.

Variance moves in the opposite direction to bias: it increases with flexibility. For a very flexible model, small changes in the training data produce large changes in the fitted function.

The test error will likely start higher than the training error for a low-flexibility model. As flexibility increases, the test error goes down, but eventually it rises again once the model starts to overfit. This U-shape is the bias-variance trade-off, and on average the test error never dips below the Bayes error.
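
A minimal simulation sketch of these curves (not from the text): a toy sine-curve dataset with Gaussian noise, using polynomial degree as the notion of flexibility. The function, noise level, and degrees are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(2 * np.pi * x)              # assumed "true" regression function

sigma = 0.3                                    # noise sd, so Bayes error = sigma**2 = 0.09
x_tr = rng.uniform(0, 1, 50)                   # small training set
y_tr = f(x_tr) + rng.normal(0, sigma, 50)
x_te = rng.uniform(0, 1, 1000)                 # large test set
y_te = f(x_te) + rng.normal(0, sigma, 1000)

for degree in [1, 3, 8, 15]:                   # flexibility = polynomial degree
    coefs = np.polyfit(x_tr, y_tr, degree)     # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_te) - y_te) ** 2)
    # training MSE keeps falling; test MSE is U-shaped in the degree
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

print(f"Bayes (irreducible) error: {sigma**2:.3f}")
```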

2
Q

Describe the differences between a parametric and a non-parametric statistical learning approach. What are the advantages of a parametric approach to regression or classification (as opposed to a non-parametric approach)? What are its disadvantages?

Exercise 2.4.6

A

Parametric approaches assume a functional form for the relationship in the data, for example that it is linear. The problem is then reduced to estimating the parameters of that assumed form.

Non-parametric approaches make no such assumption; there is no fixed functional form. A non-parametric approach therefore needs a very large dataset to make accurate predictions.

Advantages of parametric approaches: computation is much simpler, the model is more interpretable, and fewer observations are needed, since only a small number of parameters has to be estimated.

Disadvantages of parametric approaches: we might assume the wrong underlying function, and if we compensate by choosing a very flexible parametric model, there is a high risk of overfitting.

Advantages/disadvantages of non-parametric approaches: no assumption about the underlying function is needed, which suits the complex, non-linear relationships found in many real-life applications, and the predictions will likely be better; but a large dataset is required, the model is often a black box whose workings we cannot explain or interpret, and the risk of overfitting is high.
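
A hedged sketch of the contrast, assuming a toy non-linear dataset: linear regression (parametric, estimates two coefficients) versus KNN regression (non-parametric, pure local averaging). The data and the choice of K are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 200)      # truly non-linear relationship

lin = LinearRegression().fit(X, y)                  # parametric: assumes y = b0 + b1*x
knn = KNeighborsRegressor(n_neighbors=10).fit(X, y) # non-parametric: no functional form

X_new = np.linspace(-3, 3, 5).reshape(-1, 1)
print("linear :", lin.predict(X_new).round(2))      # biased: forced straight line
print("knn    :", knn.predict(X_new).round(2))      # tracks the curve, needs more data
print("truth  :", np.sin(X_new[:, 0]).round(2))
```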

3
Q

K-nearest neighbors - If the Bayes decision boundary in this problem is highly non linear, then would we expect the best value for K to be large or small? Why?

Exercise 2.4.7

A

The best value of K should be small. A large K averages over many data points, which smooths the decision boundary until it is close to linear; a small K makes the KNN model more flexible and non-linear, so the boundary can follow the data closely. → A highly non-linear Bayes boundary calls for a small value of K: more flexibility.
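
A small sketch under assumed toy data: a circular (highly non-linear) Bayes boundary. A small K can track the circle, while a very large K averages over most of the training set and flattens the boundary.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)        # class 1 inside a circle

X_te = rng.uniform(-1, 1, (2000, 2))
y_te = (X_te[:, 0] ** 2 + X_te[:, 1] ** 2 < 0.5).astype(int)

for k in [1, 5, 50, 399]:
    acc = KNeighborsClassifier(n_neighbors=k).fit(X, y).score(X_te, y_te)
    # small K follows the circle; K near n collapses to the majority class
    print(f"K={k:3d}: test accuracy {acc:.3f}")
```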

4
Q

Is nearest neighbor averaging good for small or large values of p?

A

Small p. When p is large, nearest-neighbor averaging breaks down because even the nearest neighbors are far away (the curse of dimensionality).

5
Q

Curse of dimensionality

A

In high dimensions, even the nearest neighbors tend to be far away, so a nearest-neighbor method has to average over a very large neighborhood and the estimate is no longer local. This is why local methods such as KNN degrade as the number of predictors p grows.
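
A small sketch (assumed setup): with n points uniform in the unit cube [0,1]^p, the distance from a point to its nearest neighbor grows quickly with p, so "local" averaging stops being local.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
for p in [1, 2, 10, 50, 100]:
    X = rng.uniform(0, 1, (n, p))
    d = np.linalg.norm(X - X[0], axis=1)   # distances from the first point
    # d[0] is the point itself (0), so the second-smallest is the nearest neighbor
    print(f"p={p:3d}: nearest-neighbor distance {np.sort(d)[1]:.3f}")
```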

6
Q

The output is qualitative - Regression or classification?

A

Classification

7
Q

The output is quantitative - Regression or classification?

A

Regression

8
Q

Give examples of classification problems

A

Drinkable water
Pass or fail exam
Animals on pictures
Disease

9
Q

Give examples of regression problems

A

Money spent yearly on medical care
House prices
Predict profit and sales

10
Q

Give examples when we can use cluster analysis

A

Product recommendations
Anomaly detection
Image separation

11
Q

Advantages of very flexible models

A
  • Can capture complex relationships
  • Can incorporate more variables and learn how they relate to each other and to the desired output
  • Lower bias: the data can be fit more closely
  • Fewer assumptions need to be made beforehand
12
Q

Disadvantage of very flexible models

A
  • Higher risk of overfitting
  • Can become computationally expensive
  • Need more data to train the model
  • Higher variance
  • Choices still have to be made beforehand, e.g. which variables to include and how much flexibility to allow
13
Q

When to use less flexible models

A
  • We do not have a lot of data and/or variables
  • Interpretability and the ability to illustrate results are priorities
  • Drawing inferences is desired
14
Q

When to use more flexible models

A
  • When a simpler model does not perform well enough
  • When we think the underlying relationship is complex
  • When high-quality, detailed predictions are a priority
  • When the dataset is large and we have the computational power
  • When interpretability is not crucial