Wrongly Answered Questions: Statistical Learning Flashcards
Adding more predictors will never (increase/decrease) R^2
Decrease
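Under least squares, R^2 = 1 - RSS/TSS, and adding a predictor can never increase the training RSS (the previous fit remains attainable by setting the new coefficient to zero), so the training R^2 can never decrease.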
T/F: The expected test MSE formula applies to both quantitative and qualitative responses
False. It only applies to quantitative responses
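For reference, at a test point x_0 the expected test MSE decomposes as

E[(y_0 - \hat{f}(x_0))^2] = Var(\hat{f}(x_0)) + [Bias(\hat{f}(x_0))]^2 + Var(\varepsilon),

which presupposes a quantitative y_0.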
T/F: In the classification setting, the bias-variance trade-off does not apply since y_i is quantitative.
False. The bias-variance trade-off does indeed apply in the classification setting, albeit with some modifications due to y_i being categorical rather than quantitative.
T/F: The training error rate is defined as the proportion of correct classifications made when applying our estimate
to the training observations.
False. The training error rate is defined as the proportion of incorrect classifications made when applying our estimate
to the training observations.
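In symbols, with predicted labels \hat{y}_i, the training error rate is

\frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{y}_i),

where the indicator I(y_i \neq \hat{y}_i) equals 1 for a misclassified observation and 0 otherwise.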
T/F: A classifier’s effectiveness is determined by the magnitude of its training error rate rather than its test error rate.
False. A classifier’s effectiveness is determined by the magnitude of its test error rate rather than its training error rate.
T/F: The Bayes classifier is known to produce the highest possible test error rate, known as the Bayes error rate.
False. The Bayes classifier is known to produce the lowest possible test error rate, known as the Bayes error rate.
T/F: The Bayes error rate serves a role similar to that of the irreducible error in the classification setting.
True. The Bayes error rate is analogous to the irreducible error in classification, representing the lowest error rate that can be achieved by any classifier and is due to the noise in the data itself.
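Concretely, the Bayes classifier assigns a test observation x_0 to the class j for which Pr(Y = j | X = x_0) is largest, and the Bayes error rate is

1 - E[\max_j Pr(Y = j | X)],

where the expectation averages over all values of X.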
Less flexible, more interpretable
Lasso, subset selection
Moderately flexible and interpretable
Least squares, regression trees, classification trees
More flexible, less interpretable
Bagging, boosting
T/F: The K-Nearest Neighbors algorithm assumes a functional form for f that includes free parameters.
False. KNN is non-parametric and assumes no functional form for f.
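Rather than fitting parameters, KNN estimates the conditional class probability at x_0 directly from the set N_0 of the K nearest training points,

Pr(Y = j | X = x_0) = \frac{1}{K} \sum_{i \in N_0} I(y_i = j),

and classifies x_0 to the class with the largest estimated probability.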
T/F: Random forest makes no assumption about f’s functional form.
True. Random forest is non-parametric.
T/F: Compared to non-parametric methods, parametric methods are more versatile in fitting various forms of f.
False. Non-parametric methods are more versatile because they make no assumptions about f’s form.
T/F: Compared to non-parametric methods, parametric methods typically need a larger number of observations to accurately estimate f.
False. Parametric methods need fewer observations because they already assume a functional form for f, reducing the problem to estimating a fixed set of parameters.
T/F: Compared to non-parametric methods, parametric methods are generally more difficult to interpret.
False. Parametric methods are easier to interpret because of the built-in structure of their assumed functional form.
T/F: Decision boundaries become overly flexible and capture noise, especially at lower K values.
True
T/F: Larger K values lead to smoother, less flexible boundaries that better generalize to new data.
True
T/F: Smaller K values result in boundaries that closely follow the training data, potentially capturing noise.
True
T/F: Larger K values result in highly complex boundaries that can easily overfit the data.
False. Smaller K values result in highly complex boundaries that can easily overfit the data.
T/F: Smaller K values create smoother and more generalized boundaries.
False. Larger K values create smoother and more generalized boundaries.
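The five cards above describe the same trade-off. A minimal sketch makes it concrete (the scikit-learn classifier and the synthetic two-moons data are illustrative choices, not part of the cards): K = 1 memorizes the training set, while a larger K trades training accuracy for better generalization.

```python
# Sketch: effect of K on KNN over/underfitting (assumes scikit-learn).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Noisy two-class data, split into training and test sets.
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1, 50):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # K=1: near-perfect training accuracy but a noisy boundary on test data.
    # K=50: lower training accuracy, smoother boundary, better generalization.
    print(f"K={k:2d}  train={knn.score(X_train, y_train):.3f}  "
          f"test={knn.score(X_test, y_test):.3f}")
```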
Inference
Exploring and confirming the significant associations between the predictors and the response
Non-parametric learning methods
1) KNN
2) Decision trees
3) Bagging, boosting, random forest
Bias
Error arising from the simplifying assumptions built into the statistical learning method (approximating a complicated real-world relationship with a simpler model)
Variance
Error arising from the model’s sensitivity to the training data, i.e., how much the estimated f would change if it were fit to a different training set
T/F: If we are mainly interested in inference, then restrictive models are much more interpretable.
True
T/F: Subset selection is more interpretable than linear regression.
True
T/F: Bagging and boosting are more restrictive than linear regression.
False
T/F: Highly flexible methods make it difficult to discern how changes in predictors affect the response.
True
T/F: The reducible error provides a lower bound for the test error.
False. The irreducible error provides the lower bound for the test error; the reducible error can, in principle, be driven to zero.
T/F: KNN can often produce classifiers that are close to the optimal Bayes classifier.
True