Wronged Questions: Linear Models Flashcards

1
Q

T/F: Error terms are considered to have a dimensionless measure.

A

False. The error term is not dimensionless. Since it is defined as ε_i = Y_i - β_0 - β_1X_i (using the true, unknown coefficients), it has the same units as the target variable.

1
Q

T/F: The error representation is based on the Poisson theory of errors.

A

False. The error representation is based on the Gaussian theory of errors. The error terms follow a Gaussian/normal distribution.

2
Q

T/F: Error terms are also known as disturbance terms.

A

True. The Frees text (page 31) states that error terms are also called disturbance terms.

3
Q

T/F: Error terms are observable quantities.

A

False. Error terms are not observable because they are defined in terms of the true, unknown regression coefficients. The residuals, computed from the fitted model, are their observable counterparts.

4
Q

T/F: A model with a higher sum of squared errors has a higher total sum of squares compared to a model with lower sum of squared errors.

A

False. The total sum of squares depends only on the observed responses, not on the fitted model, so it is the same for both models.

5
Q

T/F: The validation set approach is a special case of k-fold cross-validation.

A

False. Neither the validation set approach nor k-fold CV is a special case of each other.

Note that LOOCV is a special case of k-fold CV with k = n.

6
Q

T/F: The validation set approach is conceptually complex to implement.

A

False. The validation set approach is conceptually simple and easy to implement.

6
Q

T/F: Performing the validation set approach multiple times always yields the same results.

A

False. While performing LOOCV multiple times always yields the same results, this is not true for the validation set approach, where results vary due to randomness in the split.

7
Q

T/F: The validation error rate will tend to underestimate the test error rate.

A

False. Because the model is fit using only the observations in the training set, the validation error rate tends to overestimate the test error rate for the model fit on the entire dataset.

LOOCV uses nearly all of the data in each fit, so it does not have this issue as much.

8
Q

T/F: The validation set approach has higher bias than leave-one-out cross-validation.

A

True. The LOOCV approach has lower bias than the validation set approach since almost all data is used in the training set, meaning it does not overestimate the test error rate as much as the validation set approach.

9
Q

T/F: The validation set approach is conceptually simple and straightforward to implement.

A

True

10
Q

T/F: The validation estimate of the test error rate can exhibit high variability, depending on the composition of observations in the training and validation sets.

A

True

11
Q

T/F: The model is trained using only a subset of the observations, specifically those in the training set rather than the validation set.

A

True

12
Q

T/F: Given that statistical methods typically perform worse when trained on fewer observations, this implies that the validation set error rate may tend to underestimate the test error rate for the model fitted on the entire dataset.

A

False. Statistical methods trained on fewer observations tend to perform worse, so the validation set error rate tends to overestimate, not underestimate, the test error rate for the model fitted on the entire dataset.

13
Q

T/F: The leverage for each observation in a linear model must be between 1/n and 1.

A

True

14
Q

T/F: The n leverages in a linear model must sum to the number of explanatory variables.

A

False. The leverages must sum to p + 1: the number of predictors plus one for the intercept.

15
Q

T/F: If an explanatory variable is uncorrelated with all other explanatory variables, the corresponding variance inflation factor would be zero.

A

False. If an explanatory variable is uncorrelated with all other explanatory variables, the corresponding variance inflation factor would be 1.

16
Q

T/F: In best subset selection the predictors in the k-variable model must be a subset of the predictors in the (k+1)-variable model.

A

False. The predictors in the k-variable model do not need to be a subset of those in the (k+1)-variable model.

17
Q

T/F: In best subset selection, if p is the number of potential predictors, then 2^(p-1) models have to be fitted.

A

False. The correct number of models that need to be fitted is 2^p, not 2^(p-1).

18
Q

T/F: In best subset selection, the residual sum of squares of the k-variable model is always lower than that of the (k+1)-variable model.

A

False. The residual sum of squares of the best k-variable model must be higher than or equal to that of the best (k+1)-variable model, since adding a predictor can never increase the minimum achievable RSS.

19
Q

T/F: In each step of best subset selection, the most statistically significant variable is dropped.

A

False. Best subset selection does not drop variables step by step; it fits every possible model and selects the best model of each size. Dropping the least statistically significant variable at each step describes backward stepwise selection.

20
Q

T/F: In high-dimensional settings, best subset selection is computationally infeasible.

A

True. In high-dimensional settings, the computational complexity of fitting all possible models makes best subset selection infeasible.

21
Q

se_b0

A
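
A standard form, assuming simple linear regression with residual standard error s:

\mathrm{se}(b_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
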
22
Q

se_b1

A
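
A standard form, assuming simple linear regression with residual standard error s:

\mathrm{se}(b_1) = \frac{s}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
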
23
Q

se_hat(y) - used for estimators

A
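
A standard form for the standard error used when estimating the expected response at a value x_* (simple linear regression, residual standard error s):

\mathrm{se}(\hat{y}_*) = s\sqrt{\frac{1}{n} + \frac{(x_* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
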
24
Q

se_hat(y)_n+1

A
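
A standard form for the standard error used when predicting a new observation at x_* (simple linear regression); the extra 1 under the square root accounts for the variability of the new error term:

\mathrm{se}(\hat{y}_{n+1}) = s\sqrt{1 + \frac{1}{n} + \frac{(x_* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
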
25
Q

Frees rule of thumb for identifying outliers

A

An observation is flagged as an outlier if its standardised residual exceeds 2 in absolute value.

26
Q

High leverage point

A

Observation that is unusual in the horizontal direction

27
Q

R^2 adj

A
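
A standard form, with p the number of explanatory variables:

R^2_{adj} = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
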
28
Q

F statistic

A
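
A standard form of the overall F statistic for a model with p explanatory variables:

F = \frac{SSR/p}{SSE/(n-p-1)} = \frac{MSR}{MSE}
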
29
Q

Variance-covariance matrix

A

Var(b) = σ^2(X^TX)^-1, estimated by s^2(X^TX)^-1, where b is the vector of least squares coefficient estimates.

30
Q

Mallow’s C_p

A
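
One common form, with d the number of predictors in the model and \hat{\sigma}^2 an estimate of the error variance from the full model (other equivalent scalings of C_p exist):

C_p = \frac{1}{n}\left(RSS + 2d\hat{\sigma}^2\right)
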
31
Q

AIC

A

-2ln(L)+2k

32
Q

BIC

A

k ln(n) - 2ln(L)

33
Q

Leverage formula

A
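
A standard form: the i-th leverage is the i-th diagonal element of the hat matrix H = X(X^TX)^{-1}X^T; in simple linear regression it reduces to the second expression:

h_{ii} = \mathbf{x}_i^{T}(X^{T}X)^{-1}\mathbf{x}_i, \qquad h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}
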
34
Q

Cook’s distance

A
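
A standard form, with r_i the standardised residual, h_i the leverage, and p the number of explanatory variables:

D_i = \frac{r_i^2}{p+1} \cdot \frac{h_i}{1 - h_i}
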
35
Q

Breusch-Pagan test for heteroscedasticity

A
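
A sketch of one common version of the test: regress the squared residuals on the variables suspected of driving the heteroscedasticity; a large statistic from that auxiliary regression, approximately chi-square with degrees of freedom equal to the number of those variables, indicates heteroscedasticity. One widely used form of the statistic is

LM = n \cdot R^2_{e^2},

where R^2_{e^2} is the coefficient of determination of the auxiliary regression.
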
36
Q

LOOCV Error

A
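
A standard form: the LOOCV estimate averages the n single-held-out-observation errors; for least squares linear or polynomial regression, the shortcut with leverages h_i allows it to be computed from a single fit:

CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} MSE_i = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2
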
37
Q

Centered variable

A

Variable resulting from subtracting the sample mean from the variable

38
Q

Scaled variable

A

Variable resulting from dividing a variable by its standard deviation

39
Q

Standardised variable

A

Variable resulting from first centering, then scaling the variable

40
Q

Ridge regression

A
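
A brief sketch: ridge regression chooses coefficient estimates minimising the penalised sum of squares below; the l2 penalty shrinks the coefficients toward zero but does not set any of them exactly to zero.

\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2
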
41
Q

Lasso Regression

A
  • performs variable selection
  • yields more interpretable models
42
Q

Frees rule of thumb for high leverage points

A

An observation is a high leverage point if its leverage exceeds 3(p+1)/n.

43
Q

Coefficient Matrix

A
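
Assuming this refers to the vector of least squares coefficient estimates written in matrix form:

b = (X^{T}X)^{-1}X^{T}y
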
44
Q

T/F: The best model by AIC will not also be the best model by 𝐶_p.

A

False. The best model by AIC will also be the best model by C_p.

45
Q

T/F: AIC, BIC, C_p, and R^2adj are not reliable when the model has been overfitted.

A

True

46
Q

List the cross validation techniques in order of least to most bias

A

LOOCV < k-fold < validation set

47
Q

List the cross validation techniques in order of most to least variance

A

LOOCV > k-fold > validation set

48
Q

F statistic using R^2

A
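
A standard form, with p the number of explanatory variables:

F = \frac{R^2/p}{(1 - R^2)/(n-p-1)}
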
49
Q

Sum of Squares Regression (SSR)

A
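
A standard form:

SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = SST - SSE
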
50
Q

T/F: The standard error of the regression provides an estimate of the variance of y for a given x based on n-1 degrees of freedom.

A

False. The standard error of the regression provides an estimate of the variance of y for a given x based on n-2 degrees of freedom.

51
Q

T/F: In forward stepwise selection, if p is the number of potential predictors, then 2^p models have to be fitted.

A

False. Forward stepwise selection fits 1 + p(p+1)/2 models; fitting 2^p models applies to best subset selection.

52
Q

T/F: The predictors in the k-variable model must be a subset of the predictors in the (k+1)-variable model in forward stepwise selection.

A

True

53
Q

T/F: At each iteration, the variable chosen is the one that minimizes the test RSS based on cross-validation in forward stepwise selection.

A

False. At each iteration, the variable added is the one whose inclusion gives the lowest training RSS (equivalently, the highest R^2); cross-validation is not used to choose the variable at each step.

54
Q

T/F: Forward subset selection cannot be used even if the number of variables is greater than the number of observations.

A

False. It is backward stepwise selection that cannot be used when the number of variables is greater than the number of observations; forward stepwise selection can still be applied, building models with up to n-1 predictors.

55
Q

T/F: The least squares line always passes through the point [bar(x), bar(y)].

A

True.

56
Q

T/F: The squared sample correlation between x and y is equal to the coefficient of determination of the model.

A

True

57
Q

T/F: The choice of explanatory variable x affects the total sum of squares.

A

False, because SST is not a function of x.

58
Q

T/F: The F-statistic of the model is the square of the t-statistic of the coefficient estimate for x.

A

True. This is true if both tests have the same set of hypotheses.

59
Q

T/F: A random pattern in the scatterplot of y against x indicates a coefficient of determination close to zero.

A

True

60
Q

Var(X+Y)

A

Var(X) + Var(Y) + 2Cov(X,Y)

61
Q

Var(X-Y)

A

Var(X) + Var(Y) - 2Cov(X,Y)

62
Q

T/F: As λ increases, the budget parameter increases.

A

False. An increase in λ actually corresponds to a decrease in the “budget” allowed for the coefficients’ magnitudes, not an increase.

63
Q

T/F: As λ decreases towards 0, the model becomes more biased.

A

False. As λ decreases towards 0, the model becomes less biased due to flexibility (and thus variance) increasing when λ decreases.

64
Q

T/F: Increasing the budget parameter decreases the variance of the model.

A

False. Increasing the budget parameter decreases λ, which results in an increase in variance.

65
Q

T/F: As λ decreases toward 0, the coefficient estimates become identical to those from ordinary least squares.

A

True. When λ is close to 0, the penalty effect diminishes, and the estimates approach those of the ordinary least squares, which has no penalty for coefficient size.

66
Q

T/F: A high λ value ensures that all coefficient estimates will be exactly zero.

A

False. A high λ value shrinks coefficients towards zero but does not ensure all are exactly zero.

67
Q

T/F: Backward stepwise selection is computationally efficient compared to best subset selection.

A

True. Backward stepwise selection is more computationally efficient than best subset selection because it evaluates a smaller subset of models.

68
Q

T/F: Backwards stepwise selection cannot be used if the number of variables is greater than the number of observations.

A

True. Unlike forward stepwise selection, backward stepwise selection cannot be used if the number of variables is greater than the number of observations.

69
Q

T/F: Backwards stepwise selection can be applied in a high-dimensional setting, unlike forward stepwise selection.

A

False. In high-dimensional settings, backward stepwise selection is not feasible, whereas forward stepwise selection remains feasible.

70
Q

T/F: At each step for backward selection, the variable to be dropped is the one whose absence causes the smallest decrease in the coefficient of determination.

A

True. The least statistically significant variable (the one whose removal causes the smallest decrease in R^2) is dropped at each step of backward stepwise selection.

71
Q

T/F: If p is the number of potential predictors for backwards stepwise selection, then a total of 1+p(p+1)/2 models have to be fitted, which is the same as in the forward stepwise selection.

A

True.

72
Q

T/F: Shrinkage methods increase model rigidity, leading to better prediction accuracy when the rise in bias is offset by a larger reduction in variance.

A

True. Shrinkage methods, such as ridge and lasso, make the model less flexible than OLS, resulting in higher bias but lower variance. These methods improve prediction accuracy when the increase in bias is smaller than the decrease in variance.

73
Q

T/F: Shrinkage methods increase model rigidity, thus enhancing prediction accuracy when the rise in variance is offset by a larger reduction in bias.

A

False

74
Q

T/F: Shrinkage methods increase model flexibility, leading to better prediction accuracy when the rise in bias is offset by a larger reduction in variance.

A

False. They decrease model flexibility.

75
Q

T/F: Shrinkage methods increase model flexibility, thus enhancing prediction accuracy when the rise in variance is offset by a larger reduction in bias.

A

False

76
Q

T/F: The validation set approach tends to have higher bias in error estimates compared to k-fold cross-validation due to its dependency on a single split of the data.

A

True. The validation set approach may suffer from higher bias because the model is trained on only the portion of the data in a single split, which may not represent the entire dataset well.

77
Q

T/F: LOOCV can be computationally expensive as it requires the model to be trained n times, where n is the number of observations.

A

True. LOOCV is computationally intensive because it involves training the model n times, each time leaving out a different single observation.

78
Q

T/F: K-fold cross-validation typically results in a better trade-off between bias and variance in error estimates compared to the validation set approach.

A

True. By repeatedly splitting the data into different subsets for training and validation, k-fold cross-validation provides a more robust estimate of model performance, striking a better balance between bias and variance in error estimates.

79
Q

T/F: With different random splits, the validation set approach generally provides more stable and less variable error estimates than k-fold cross-validation due to the larger size of the validation set.

A

False. With different random splits, the validation set approach can result in more variability and less stability in error estimates compared to K-fold CV, which benefits from averaging across multiple splits.

80
Q

T/F: K-fold cross-validation is generally less suitable for very small datasets compared to LOOCV, which uses all data points except one for training in each iteration.

A

True. LOOCV is particularly useful for small datasets because it maximally utilizes available data for training, while K-fold CV might not provide enough data in each training fold for very small datasets.

81
Q

T/F: Stepwise regression ensures the inclusion of all possible models, preventing the risk of data snooping.

A

False. Stepwise regression often involves data snooping, where fitting a large number of models to one set of data increases the chance of finding one that appears to fit well by chance.

82
Q

T/F: Stepwise regression automatically account for non-linear relationships and the presence of outliers and high leverage points in the data.

A

False. Stepwise regression typically does not account for non-linear relationships or the presence of outliers and high leverage points unless specifically designed to do so.

83
Q

T/F: Stepwise regression guarantees that the final model selected is the best among all possible models constructed from linear combinations of the predictors.

A

False. There’s no guarantee that the final model returned is the best one among all models that can be constructed, as only a subset of the possible models are considered.

84
Q

T/F: Stepwise regression relies on a variety of statistical measures, not just t-ratios, for determining which variables to add or remove.

A

False. Stepwise regression often relies heavily on t-ratios for adding or removing variables, rather than utilizing a broader array of statistical measures.

85
Q

T/F: Stepwise regression may overlook the best model because only a subset of all possible 2^k models are considered, where k is the number of predictors.

A

True. One of the primary drawbacks of stepwise regressions is that they consider only some of the 2^k possible models. Consequently, there’s a possibility that the best model, especially if it involves non-linear combinations of predictors or is otherwise outside the considered subset, might be missed.

86
Q

T/F: Not all multicollinearity problems can be detected by inspecting the correlation matrix.

A

True

87
Q

T/F: Severe multicollinearity reduces the accuracy of the estimates of the regression coefficients.

A

True

88
Q

T/F: The presence of multicollinearity always implies that the information provided by one variable is redundant in the presence of another variable.

A

False. The presence of multicollinearity does not always imply that the information is redundant. It is possible for two variables that are highly correlated to complement one another. This is the case of a suppressor variable, where a predictor increases the importance of another predictor.

89
Q

T/F: As a rule of thumb, when tolerance is less than 0.1 or 0.2, severe multicollinearity exists.

A

True. Tolerance is 1/VIF, so a tolerance below 0.1 or 0.2 corresponds to a VIF above 10 or 5, respectively.

90
Q

T/F: The presence of severe multicollinearity makes it difficult to detect the importance of a variable.

A

True

91
Q

Two transformations that deal with heteroscedasticity

A

Logarithmic and square root transformation

92
Q

Correlation formula

A
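
A standard form for the sample correlation between x and y:

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
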
93
Q

R^2 using covariance formula

A

The correlation formula squared: R^2 = r^2, where r = Cov(x,y)/(s_x s_y), valid for simple linear regression.

94
Q

T/F: The standard least squares coefficient estimates are scale equivariant implying that regardless of how the j-th predictor is scaled, X_jB_j will remain the same.

A

True. On the other hand, ridge regression estimates are not scale equivariant. This is why scaling affects the result of ridge regression, and we often recommend variables to be scaled prior to performing ridge regression.

95
Q

T/F: Ridge regression’s advantage over least squares is rooted in the bias-variance trade-off because as the tuning parameter λ increases, the flexibility of the ridge regression fit increases.

A

False. As λ increases, flexibility decreases.

96
Q

T/F: As λ increases, the shrinkage of the ridge coefficient estimates leads to a substantial reduction in the variance of the predictions, at the expense of a slight increase in bias.

A

True. As λ increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias.

97
Q

T/F: LOOCV is computationally prohibitive when used to validate least squares polynomial regression.

A

False. For polynomial or linear regression, LOOCV is computationally efficient because of the shortcut formula available. The shortcut formula enables the calculation of the estimated test error from just a single round of fitting, which makes the cost of LOOCV the same as a single model fit.

98
Q

T/F: k-fold cross-validation does not overestimate the test error rate as much as LOOCV.

A

False. k-fold CV, by using smaller training sets, tends to overestimate the test error more than LOOCV.

99
Q

T/F: 5-fold cross-validation requires more computational resources than 10-fold cross-validation.

A

False. 5-fold CV is more computationally efficient than 10-fold CV because it requires fewer runs.

100
Q

T/F: If the MSE of the full model containing all k predictors is an unbiased estimator of the true error variance, then Cp is an unbiased estimator of the test MSE.

A

True. The statement is about an unbiasedness property of Cp and reinforces the idea that selecting a model with a small Cp tends to lead to a model with a small test MSE.

101
Q

T/F: AIC and BIC are more general than Cp as they are applicable to linear, non-linear, and other general types of models fitted by maximum likelihood.

A

True.

102
Q

T/F: AIC and BIC provide an indirect estimate of the test error, while Cp provides a direct estimate of the test error.

A

False. All of the four model comparison statistics (adjusted R^2, Cp, AIC, BIC) aim to indirectly estimate the test error by adjusting the training error to account for model complexity. Direct estimation of test error can be achieved through resampling methods such as the validation set approach and cross-validation.

103
Q

T/F: Direct estimation of test error can be achieved through resampling methods such as the validation set approach and cross-validation.

A

True

104
Q

T/F: There are n-1 squared errors, one for each of the observation included in the fitting process for LOOCV.

A

False. There are n squared errors, one for each observation, since each of the n observations is held out exactly once in LOOCV.

105
Q

T/F: In the context of regression models with Gaussian errors, Mallow’s Cp is an unbiased estimate of the test MSE if it is calculated using an unbiased estimate of σ^2.

A

True

106
Q

T/F: In the context of regression models with Gaussian errors, Mallow’s Cp and the Akaike information criterion are proportional to each other.

A

True

107
Q

T/F: In the context of regression models with Gaussian errors, a large value of Mallow’s Cp indicates a model with a high test error.

A

True

108
Q

T/F: Stepwise regression is designed to prioritize models based on non-linear combinations of predictors to accommodate complex relationships.

A

False. Stepwise regression does not consider models based on non-linear combinations of predictors, focusing instead on linear relationships.

109
Q

T/F: Stepwise regression effectively incorporates external knowledge or insights an investigator may have about the data.

A

False

110
Q

T/F: Stepwise regression guarantees that the model selected will not be influenced by outliers or high leverage points in the data.

A

False

111
Q

T/F: Stepwise regression primarily relies on t-ratios for adding or removing variables, which may not always be the most appropriate criterion.

A

True

112
Q

Studentised residuals

A
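
One common definition, with s_{(i)} the residual standard error computed with observation i deleted and h_i the leverage:

\frac{e_i}{s_{(i)}\sqrt{1 - h_i}}
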
113
Q

Standardised residuals formula

A
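
One common definition, with s the residual standard error and h_i the leverage:

\frac{e_i}{s\sqrt{1 - h_i}}
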
114
Q

T/F: The residuals versus fitted values plot can be used to detect multicollinearity.

A

False

115
Q

T/F: Since irrelevant variables lead to unnecessary complexity in the resulting model, we can obtain a more easily interpretable model by setting the corresponding coefficient estimates to zero.

A

True

116
Q

T/F: The least squares approach is unlikely to yield any coefficient estimates that are precisely zero.

A

True

117
Q

T/F: Best subset selection automatically selects features or variables by excluding irrelevant variables from a multiple regression model.

A

True

118
Q

T/F: Partial Least Squares (PLS) is a dimension reduction method.

A

True. Like principal components analysis (PCA), PLS is a dimension reduction method.

119
Q

T/F: After standardizing the predictors, PLS computes the first direction by setting its loadings to the coefficients from the simple linear regression of the response onto each original predictor.

A

True

120
Q

T/F: PLS identifies new features in an unsupervised way by approximating the original predictors, similar to principal components analysis.

A

False. Unlike principal components regression (PCR), PLS identifies new features in a supervised way; it uses the target variable to create new features that not only approximate the old features well, but also that are related to the target variable.

121
Q

T/F: In computing the first direction, PLS places the highest weight on the variables that are most strongly related to the response.

A

True. Since the slope coefficients for each simple linear regression are used for the first direction, a larger value indicates a stronger relationship with the target variable.

122
Q

T/F: The loadings for the first direction are proportional to the covariances between the response and each standardized predictor.

A

True. The loadings for the first direction are proportional to the correlations between the response and each predictor. Since the predictors are standardized, the correlations and the covariances are proportional to each other. Hence, the loadings are proportional to the covariances as well.

123
Q

T/F: Relationships among model deviations in SLR indicate a model misspecification issue.

A

True

124
Q

T/F: One option to handle heteroscedasticity in SLR is to use a logarithmic transformation of the dependent variable.

A

True

125
Q

T/F: If model deviations are associated with a variable, utilizing this information should enhance model specification in SLR.

A

True

126
Q

T/F: Departure from normality in the distribution of the deviations is a sign of model misspecification issue in SLR.

A

True

127
Q

T/F: K-fold CV entails randomly partitioning the observations into k groups, or folds, each of approximately equal size.

A

True

128
Q

T/F: One fold is designated as the validation set, while the model is trained on the remaining k folds.

A

False. One fold is designated as the validation set, while the model is trained on the remaining k-1 folds.

129
Q

T/F: The validation process is iterated k times, with a different fold of observations serving as the validation set each time, yielding k estimates of the test error.

A

True

130
Q

T/F: The k-fold cross-validation error estimate is determined by averaging the resulting mean squared error values obtained from the k validation iterations.

A

True

131
Q

T/F: Linear regression has the advantage of easy fitting by estimating only a small number of coefficients.

A

True

132
Q

T/F: Given a value for K and a prediction point x0, KNN regression identifies the K training observations that are closest to x0, and then estimates f(x0) using the distance-weighted average of all these K training responses.

A

False. KNN regression estimates f(x0) using the average of all the K training responses, not distance-weighted average.

133
Q

T/F: Ridge regression is less flexible than OLS and thus results in an improved prediction accuracy when its increase in bias is less than its decrease in variance.

A

True

134
Q

T/F: The linear model offers distinct advantages in inference and often proves to be surprisingly competitive with non-linear methods in addressing real-world problems when the true relationship between the predictor and response is approximately linear.

A

True

135
Q

T/F: If the true relationship between the target variable and the predictors is approximately linear, then the least squares estimates should be accurate, hence exhibiting low bias.

A

True

136
Q

T/F: If the number of observations is much larger than the number of variables, then the least squares estimates tend to have low variance and will perform well on test data when the true relationship between the target variable and the predictors is approximately linear.

A

True

137
Q

T/F: If the number of observations is not much larger than the number of variables, then there can be a lot of variability in the least squares fit, resulting in overfitting and poor predictions on unseen test observations when the true relationship between the target variable and the predictors is approximately linear.

A

True

138
Q

T/F: If the number of observations is less than the number of variables, then there is a unique least squares coefficient estimate when the true relationship between the target variable and the predictors is approximately linear.

A

False. If the number of variables is greater than the number of observations, then there is no longer a unique least squares coefficient estimate: the variance is infinite so the method cannot be used at all.

139
Q

T/F: The lasso tends to perform better than ridge regression when the response variable is a function of many predictors.

A

False. It is ridge regression that tends to perform better when many predictors influence the response variable.

140
Q

T/F: Ridge regression has the advantage of forcing coefficient estimates to exactly zero.

A

False. Only Lasso can force coefficients to exactly zero.

141
Q

T/F: The penalty function in the lasso is a function of the l2 norm of the coefficients.

A

False. The lasso penalty is based on the l1 norm, not the l2 norm. In contrast, ridge penalty is based on the l2 norm.

142
Q

T/F: It is best to standardize predictors when using either ridge regression or the lasso.

A

True. Standardizing predictors ensures that the shrinkage penalty is applied uniformly across all predictor coefficients, which is important for both ridge regression and the lasso.

143
Q

T/F: The lasso tends to yield models that have better prediction accuracy than ridge regression.

A

False. Whether the lasso or ridge regression yields better prediction accuracy depends heavily on the specific dataset and situation.

144
Q

T/F: Studentised residuals in MLR should be realizations of a t-distribution.

A

True

145
Q

T/F: Studentised residuals in MLR should be realizations of a normal distribution.

A

False. Studentised residuals in MLR should be realizations of a t distribution.

146
Q

T/F: Studentised residuals in MLR have the same unit as the response variable.

A

False. Studentized residuals are comparable across different contexts because they are unitless.

147
Q

T/F: An observation with a negative studentized residual is likely an outlier.

A

False. A likely outlier is indicated by the magnitude of the studentized residual, not its sign.

148
Q

T/F: An observation with a large studentized residual is likely a high leverage point.

A

False. A high leverage point is identified using leverage, not studentized residual.

149
Q

(Large/small) R^2 and (big/small) t-statistics may suggest the presence of multicollinearity in multiple linear regression model.

A

Large R^2 and small t-statistics

150
Q

T/F: As K increases the test error usually decreases initially and then starts to increase.

A

True

151
Q

T/F: R^2 measures the linear relationship between y and the predictions hat(y).

A

True

152
Q

T/F: The training residual sum of squares will steadily decrease as λ increases for Lasso.

A

False. As λ increases, the training RSS will steadily increase because the flexibility of the fit decreases.

153
Q

T/F: As λ increases for Lasso, the test residual sum of squares will remain constant.

A

False. The test error decreases initially, and then increases, following a U shape.

154
Q

T/F: As λ increases for Lasso, the variance will increase initially, and then finally decrease, following an inverted U shape.

A

False. The variance will steadily decrease.

155
Q

T/F: As λ increases for Lasso, the squared bias will steadily increase.

A

True.

156
Q

T/F: As λ increases for Lasso, the irreducible error will decrease initially, and then finally increase, following a U shape.

A

False. The irreducible error never changes.

157
Q

T/F: The validation set approach involves randomly dividing the available set of observations into two parts, a training set and a validation set.

A

True

158
Q

T/F: In validation set approach, the model is fit on the training set, and the validation error is estimated by applying the model to predict responses for the observed data in the validation set.

A

True

159
Q

T/F: The validation set approach can give highly variable results if the size of the validation set is not large enough.

A

True

160
Q

T/F: Validation set approach is typically repeated several times with different training-validation splits to reduce variability in the validation error estimate.

A

True

161
Q

T/F: The validation set error rate is skewed because the model has been optimized for the validation set.

A

False. The validation set error rate is not skewed because the model is fitted to the training set, not the validation set.

162
Q

T/F: LOOCV uses a single observation from the dataset for the validation each time, which leads to high variance in the error estimates.

A

True

163
Q

T/F: The estimate for test MSE in LOOCV is the average of the squared errors from each single validation.

A

True

164
Q

T/F: As λ increases, the sum of bj^2 increases for ridge regression.

A

False. As λ increases, the sum of bj^2 must decrease in order to minimise the penalised sum of squared errors. It is still possible for INDIVIDUAL bj’s to increase in absolute value as λ increases.

165
Q

T/F: As λ increases, it is not possible for an individual bj to increase in absolute value for ridge regression.

A

False. It is still possible for INDIVIDUAL bj’s to increase in absolute value as λ increases.