Wronged Questions: Statistical Learning Flashcards

1
Q

Adding more predictors will never (increase/decrease) R^2

A

Decrease
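
A minimal sketch of this fact using simulated data and scikit-learn (both the data and the library choice are assumptions, not part of the card): the R^2 computed on the training data never goes down as predictors are added, even when the added predictors are pure noise.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 10))            # 10 candidate predictors
y = 2.0 * X[:, 0] + rng.normal(size=n)  # the response depends only on the first one

r2 = []
for p in range(1, 11):
    model = LinearRegression().fit(X[:, :p], y)
    r2.append(model.score(X[:, :p], y))  # R^2 on the training data

# Each additional predictor leaves training R^2 unchanged or raises it.
assert all(b >= a - 1e-12 for a, b in zip(r2, r2[1:]))
print(np.round(r2, 4))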

2
Q

T/F: The expected test MSE formula applies to both quantitative and qualitative responses

A

False. It only applies to quantitative responses
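
For reference, the formula in question is the expected test MSE at a point x_0, which requires a quantitative response; in the usual notation it decomposes as

E\left( y_0 - \hat{f}(x_0) \right)^2 = \mathrm{Var}\left( \hat{f}(x_0) \right) + \left[ \mathrm{Bias}\left( \hat{f}(x_0) \right) \right]^2 + \mathrm{Var}(\varepsilon)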

3
Q

T/F: In the classification setting, the bias-variance trade-off does not apply since y_i is qualitative.

A

False. The bias-variance trade-off does indeed apply in the classification setting, albeit with some modifications due to y_i being categorical rather than quantitative.

4
Q

T/F: The training error rate is defined as the proportion of correct classifications made when applying our estimate
to the training observations.

A

False. The training error rate is defined as the proportion of incorrect classifications made when applying our estimate
to the training observations.
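
In symbols, the training error rate is the fraction of training observations that are misclassified:

\frac{1}{n} \sum_{i=1}^{n} I\left( y_i \neq \hat{y}_i \right)

where I(y_i ≠ ŷ_i) is an indicator that equals 1 when observation i is misclassified and 0 otherwise.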

5
Q

T/F: A classifier’s effectiveness is determined by the magnitude of its training error rate rather than its test error rate.

A

False. A classifier’s effectiveness is determined by the magnitude of its test error rate rather than its training error rate.

6
Q

T/F: The Bayes classifier is known to produce the highest possible test error rate, known as the Bayes error rate.

A

False. The Bayes classifier is known to produce the lowest possible test error rate, known as the Bayes error rate.

7
Q

T/F: The Bayes error rate serves a role similar to that of the irreducible error in the classification setting.

A

True. The Bayes error rate is analogous to the irreducible error in classification: it is the lowest test error rate achievable by any classifier, and it is due to the noise inherent in the data itself.
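
For reference, the Bayes classifier assigns a test observation x_0 to the class j for which Pr(Y = j | X = x_0) is largest, and the resulting Bayes error rate is

1 - E\left[ \max_j \Pr(Y = j \mid X) \right]

where the expectation is taken over X.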

8
Q

Less flexible, more interpretable

A

Lasso, subset selection

9
Q

Moderately flexible and interpretable

A

Least squares, regression trees, classification trees

10
Q

More flexible, less interpretable

A

Bagging, boosting

11
Q

T/F: The K-nearest neighbors (KNN) algorithm assumes a functional form for f that includes free parameters.

A

False. KNN is non-parametric: it makes no explicit assumption about the functional form of f.
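
A minimal from-scratch sketch of KNN classification (the simulated data and the choice of K are assumptions for illustration): nothing about f's form is estimated; a prediction is simply a majority vote over the K nearest training observations.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, k=3):
    # Euclidean distance from the query point to every training observation
    dists = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the K nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
print(knn_predict(X_train, y_train, np.array([0.5, 0.5]), k=3))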

12
Q

T/F: Random forest makes no assumption about f’s functional form.

A

True. Random forest is non-parametric.

13
Q

T/F: Compared to non-parametric methods, parametric methods are more versatile in fitting various forms of f.

A

False. Non-parametric methods are more versatile because they make no assumptions about f’s form.

14
Q

T/F: Compared to non-parametric methods, parametric methods typically need a larger number of observations to accurately estimate f.

A

False. Parametric methods typically need fewer observations because they already assume a functional form for f.

15
Q

T/F: Compared to non-parametric methods, parametric methods are generally more difficult to interpret.

A

False. Parametric methods are easier to interpret because their assumed functional form gives them a built-in structure.

16
Q

T/F: Decision boundaries become overly flexible and capture noise, especially at lower K values.

A

True

17
Q

T/F: Larger K values lead to smoother, less flexible boundaries that better generalize to new data.

A

True

18
Q

T/F: Smaller K values result in boundaries that closely follow the training data, potentially capturing noise.

A

True

19
Q

T/F: Larger K values result in highly complex boundaries that can easily overfit the data.

A

False. Smaller K values result in highly complex boundaries that can easily overfit the data.

20
Q

T/F: Smaller K values create smoother and more generalized boundaries.

A

False. Larger K values create smoother and more generalized boundaries.
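
A minimal sketch of the point made in cards 16–20, using simulated data and scikit-learn (both are assumptions): K = 1 fits the training data almost perfectly but generalizes worse, while a larger K gives a smoother boundary that generalizes better. Exact numbers depend on the simulated data.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
# Noisy class labels: the true boundary is x1^2 + x2 = 0, with 15% of labels flipped
y = np.logical_xor(X[:, 0] ** 2 + X[:, 1] > 0, rng.random(400) < 0.15).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 50):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"K={k:>2}  train error={1 - knn.score(X_tr, y_tr):.2f}  "
          f"test error={1 - knn.score(X_te, y_te):.2f}")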

21
Q

Inference

A

Exploring and confirming the significant associations between the predictors and the response

22
Q

Non-parametric learning methods

A

1) KNN
2) Decision trees
3) Bagging, boosting, random forest

23
Q

Bias

A

Error arising from the simplifying assumptions the statistical learning method makes when approximating f

24
Q

Variance

A

Error arising from the model’s sensitivity to the training data, i.e., how much the estimate of f would change if it were fit to a different training set

25
Q

T/F: If we are mainly interested in inference, then restrictive models are much more interpretable.

A

True

26
Q

T/F: Subset selection is more interpretable than linear regression.

A

True

27
Q

T/F: Bagging and boosting are more restrictive than linear regression.

A

False. Bagging and boosting are more flexible (less restrictive) than linear regression.

28
Q

T/F: Highly flexible methods make it difficult to discern how changes in predictors affect the response.

A

True

29
Q

T/F: The reducible error provides a lower bound for the test error.

A

False. It is the irreducible error that provides a lower bound for the expected test error.
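
For reference, the standard decomposition of the expected squared prediction error (for fixed X and a fixed estimate of f) separates the reducible and irreducible parts; it is the irreducible part, Var(ε), that bounds the expected test error from below:

E\left[ (Y - \hat{Y})^2 \right] = \underbrace{\left[ f(X) - \hat{f}(X) \right]^2}_{\text{reducible}} + \underbrace{\mathrm{Var}(\varepsilon)}_{\text{irreducible}}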

30
Q

T/F: In the classification setting, the bias-variance trade-off does not apply since y_i is qualitative.

A

False. The bias-variance trade-off does indeed apply in the classification setting, albeit with some modifications due to y_i being categorical rather than quantitative.

31
Q

T/F: KNN can often produce classifiers that are close to the optimal Bayes classifier.

A

True