Wronged Questions: Linear Models Flashcards

1
Q

T/F: Error terms are considered to have a dimensionless measure.

A

False. The error term is not dimensionless. Since it is defined as ε_i = Y_i - β_0 - β_1X_i (using the true, unknown coefficients), it has the same units as the target variable.

1
Q

T/F: The error representation is based on the Poisson theory of errors.

A

False. The error representation is based on the Gaussian theory of errors. The error terms follow a Gaussian/normal distribution.

2
Q

T/F: Error terms are also known as disturbance terms.

A

True. The Frees text (page 31) states that error terms are also called disturbance terms.

3
Q

T/F: Error terms are observable quantities.

A

False. Error terms are not observable because they are defined in terms of the true, unknown regression coefficients. The residuals, computed from the fitted model, are their observable counterparts.

4
Q

T/F: A model with a higher sum of squared errors has a higher total sum of squares compared to a model with lower sum of squared errors.

A

False. The total sum of squares depends only on the observed responses, not on the fitted model, so it is the same for both models.

5
Q

T/F: The validation set approach is a special case of k-fold cross-validation.

A

False. Neither the validation set approach nor k-fold CV is a special case of each other.

Note that LOOCV is a special case of k-fold CV with k = n.

6
Q

T/F: The validation set approach is conceptually complex to implement.

A

False. The validation set approach is conceptually simple and easy to implement.

6
Q

T/F: Performing the validation set approach multiple times always yields the same results.

A

False. While performing LOOCV multiple times always yields the same results, this is not true for the validation set approach, where results vary due to randomness in the split.

7
Q

T/F: The validation error rate will tend to underestimate the test error rate.

A

False. Because the model is fit using only the observations in the training set, the validation error rate tends to overestimate the test error rate for the model fit on the entire dataset.

LOOCV uses nearly all of the data in each fit, so it does not have this issue as much.

8
Q

T/F: The validation set approach has higher bias than leave-one-out cross-validation.

A

True. The LOOCV approach has lower bias than the validation set approach since almost all data is used in the training set, meaning it does not overestimate the test error rate as much as the validation set approach.

9
Q

T/F: The validation set approach is conceptually simple and straightforward to implement.

A

True

10
Q

T/F: The validation estimate of the test error rate can exhibit high variability, depending on the composition of observations in the training and validation sets.

A

True

11
Q

T/F: The model is trained using only a subset of the observations, specifically those in the training set rather than the validation set.

A

True

12
Q

T/F: Given that statistical methods typically perform worse when trained on fewer observations, this implies that the validation set error rate may tend to underestimate the test error rate for the model fitted on the entire dataset.

A

False. Statistical methods trained on fewer observations tend to perform worse, so the validation set error rate tends to overestimate, not underestimate, the test error rate for the model fitted on the entire dataset.

13
Q

T/F: The leverage for each observation in a linear model must be between 1/n and 1.

A

True

14
Q

T/F: The n leverages in a linear model must sum to the number of explanatory variables.

A

False. The leverages must sum to p + 1: the number of predictors plus one for the intercept.

15
Q

T/F: If an explanatory variable is uncorrelated with all other explanatory variables, the corresponding variance inflation factor would be zero.

A

False. If an explanatory variable is uncorrelated with all other explanatory variables, the corresponding variance inflation factor would be 1.

16
Q

T/F: In best subset selection the predictors in the k-variable model must be a subset of the predictors in the (k+1)-variable model.

A

False. The predictors in the k-variable model do not need to be a subset of those in the (k+1)-variable model.

17
Q

T/F: In best subset selection, if p is the number of potential predictors, then 2^(p-1) models have to be fitted.

A

False. The correct number of models that need to be fitted is 2^p, not 2^(p-1).

18
Q

T/F: In best subset selection, the residual sum of squares of the k-variable model is always lower than that of the (k+1)-variable model.

A

False. The residual sum of squares of the best k-variable model must be higher than or equal to that of the best (k+1)-variable model, since adding a predictor can never increase the minimum achievable RSS.

19
Q

T/F: In each step of best subset selection, the most statistically significant variable is dropped.

A

False. Best subset selection does not drop variables step by step; it fits every possible model and selects the best model of each size. Dropping the least statistically significant variable at each step describes backward stepwise selection.

20
Q

T/F: In high-dimensional settings, best subset selection is computationally infeasible.

A

True. In high-dimensional settings, the computational complexity of fitting all possible models makes best subset selection infeasible.

21
Q

se_b0

A
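
A standard form, assuming simple linear regression with residual standard error s:

\mathrm{se}(b_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
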
22
Q

se_b1

A
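
A standard form, assuming simple linear regression with residual standard error s:

\mathrm{se}(b_1) = \frac{s}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
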
23
Q

se_hat(y) - used for estimators

A
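
A standard form for the standard error used when estimating the expected response at a value x_* (simple linear regression, residual standard error s):

\mathrm{se}(\hat{y}_*) = s\sqrt{\frac{1}{n} + \frac{(x_* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
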
24
Q

se_hat(y)_n+1

A
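
A standard form for the standard error used when predicting a new observation at x_* (simple linear regression); the extra 1 under the square root accounts for the variability of the new error term:

\mathrm{se}(\hat{y}_{n+1}) = s\sqrt{1 + \frac{1}{n} + \frac{(x_* - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
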
25
Q

Frees rule of thumb for identifying outliers

A

An observation is flagged as an outlier if its standardised residual exceeds 2 in absolute value.

26
Q

High leverage point

A

Observation that is unusual in the horizontal direction

27
Q

R^2 adj

A
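
A standard form, with p the number of explanatory variables:

R^2_{adj} = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
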
28
Q

F statistic

A
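
A standard form of the overall F statistic for a model with p explanatory variables:

F = \frac{SSR/p}{SSE/(n-p-1)} = \frac{MSR}{MSE}
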
29
Q

Variance-covariance matrix

A

Var(b) = σ^2(X^TX)^-1, estimated by s^2(X^TX)^-1, where b is the vector of least squares coefficient estimates.

30
Q

Mallow’s C_p

A
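
One common form, with d the number of predictors in the model and \hat{\sigma}^2 an estimate of the error variance from the full model (other equivalent scalings of C_p exist):

C_p = \frac{1}{n}\left(RSS + 2d\hat{\sigma}^2\right)
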
31
Q

AIC

A

-2ln(L)+2k

32
Q

BIC

A

k ln(n) - 2ln(L)

33
Q

Leverage formula

A
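
A standard form: the i-th leverage is the i-th diagonal element of the hat matrix H = X(X^TX)^{-1}X^T; in simple linear regression it reduces to the second expression:

h_{ii} = \mathbf{x}_i^{T}(X^{T}X)^{-1}\mathbf{x}_i, \qquad h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}
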
34
Q

Cook’s distance

A
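
A standard form, with r_i the standardised residual, h_i the leverage, and p the number of explanatory variables:

D_i = \frac{r_i^2}{p+1} \cdot \frac{h_i}{1 - h_i}
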
35
Q

Breusch-Pagan test for heteroscedasticity

A
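
A sketch of one common version of the test: regress the squared residuals on the variables suspected of driving the heteroscedasticity; a large statistic from that auxiliary regression, approximately chi-square with degrees of freedom equal to the number of those variables, indicates heteroscedasticity. One widely used form of the statistic is

LM = n \cdot R^2_{e^2},

where R^2_{e^2} is the coefficient of determination of the auxiliary regression.
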
36
Q

LOOCV Error

A
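
A standard form: the LOOCV estimate averages the n single-held-out-observation errors; for least squares linear or polynomial regression, the shortcut with leverages h_i allows it to be computed from a single fit:

CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} MSE_i = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2
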
37
Q

Centered variable

A

Variable resulting from subtracting the sample mean from the variable

38
Q

Scaled variable

A

Variable resulting from dividing a variable by its standard deviation

39
Q

Standardised variable

A

Variable resulting from first centering, then scaling the variable

40
Q

Ridge regression

A
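
A brief sketch: ridge regression chooses coefficient estimates minimising the penalised sum of squares below; the l2 penalty shrinks the coefficients toward zero but does not set any of them exactly to zero.

\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2
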
41
Q

Lasso Regression

A
  • performs variable selection
  • yields more interpretable models
42
Q

Frees rule of thumb for high leverage points

A

An observation is a high leverage point if its leverage exceeds 3(p+1)/n.

43
Q

Coefficient Matrix

A
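
Assuming this refers to the vector of least squares coefficient estimates written in matrix form:

b = (X^{T}X)^{-1}X^{T}y
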
44
Q

T/F: The best model by AIC will not also be the best model by 𝐶_p.

A

False. The best model by AIC will also be the best model by C_p.

45
Q

T/F: AIC, BIC, C_p, and R^2adj are not reliable when the model has been overfitted.

A

True

46
Q

List the cross validation techniques in order of least to most bias

A

LOOCV < k-fold < validation set

47
Q

List the cross validation techniques in order of most to least variance

A

LOOCV > k-fold > validation set

48
Q

F statistic using R^2

A
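
A standard form, with p the number of explanatory variables:

F = \frac{R^2/p}{(1 - R^2)/(n-p-1)}
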
49
Q

Sum of Squares Regression (SSR)

A
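
A standard form:

SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = SST - SSE
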
50
Q

T/F: The standard error of the regression provides an estimate of the variance of y for a given x based on n-1 degrees of freedom.

A

False. The standard error of the regression provides an estimate of the variance of y for a given x based on n-2 degrees of freedom.

51
Q

T/F: In forward stepwise selection, if p is the number of potential predictors, then 2^p models have to be fitted.

A

False. Forward stepwise selection fits 1 + p(p+1)/2 models; fitting 2^p models applies to best subset selection.

52
Q

T/F: The predictors in the k-variable model must be a subset of the predictors in the (k+1)-variable model in forward stepwise selection.

A

True

53
Q

T/F: At each iteration, the variable chosen is the one that minimizes the test RSS based on cross-validation in forward stepwise selection.

A

False. At each iteration, the variable added is the one whose inclusion gives the lowest training RSS (equivalently, the highest R^2); cross-validation is not used to choose the variable at each step.

54
Q

T/F: Forward subset selection cannot be used even if the number of variables is greater than the number of observations.

A

False. It is backward stepwise selection that cannot be used when the number of variables is greater than the number of observations; forward stepwise selection can still be applied, building models with up to n-1 predictors.

55
Q

T/F: The least squares line always passes through the point [bar(x), bar(y)].

A

True.

56
Q

T/F: The squared sample correlation between x and y is equal to the coefficient of determination of the model.

A

True

57
Q

T/F: The choice of explanatory variable x affects the total sum of squares.

A

False, because SST is not a function of x.

58
Q

T/F: The F-statistic of the model is the square of the t-statistic of the coefficient estimate for x.

A

True. This is true if both tests have the same set of hypotheses.

59
Q

T/F: A random pattern in the scatterplot of y against x indicates a coefficient of determination close to zero.

A

True

60
Q

Var(X+Y)

A

Var(X) + Var(Y) + 2Cov(X,Y)

61
Q

Var(X-Y)

A

Var(X) + Var(Y) - 2Cov(X,Y)

62
Q

T/F: As λ increases, the budget parameter increases.

A

False. An increase in λ actually corresponds to a decrease in the “budget” allowed for the coefficients’ magnitudes, not an increase.

63
Q

T/F: As λ decreases towards 0, the model becomes more biased.

A

False. As λ decreases towards 0, the model becomes less biased due to flexibility (and thus variance) increasing when λ decreases.

64
Q

T/F: Increasing the budget parameter decreases the variance of the model.

A

False. Increasing the budget parameter decreases λ, which results in an increase in variance.

65
Q

T/F: As λ decreases toward 0, the coefficient estimates become identical to those from ordinary least squares.

A

True. When λ is close to 0, the penalty effect diminishes, and the estimates approach those of the ordinary least squares, which has no penalty for coefficient size.

66
Q

T/F: A high λ value ensures that all coefficient estimates will be exactly zero.

A

False. A high λ value shrinks coefficients towards zero but does not ensure all are exactly zero.

67
Q

T/F: Backward stepwise selection is computationally efficient compared to best subset selection.

A

True. Backward stepwise selection is more computationally efficient than best subset selection because it evaluates a smaller subset of models.

68
Q

T/F: Backwards stepwise selection cannot be used if the number of variables is greater than the number of observations.

A

True. Unlike forward stepwise selection, backward stepwise selection cannot be used if the number of variables is greater than the number of observations.

69
Q

T/F: Backwards stepwise selection can be applied in a high-dimensional setting, unlike forward stepwise selection.

A

False. In high-dimensional settings, backward stepwise selection is not feasible, whereas forward stepwise selection remains feasible.

70
Q

T/F: At each step for backward selection, the variable to be dropped is the one whose absence causes the smallest decrease in the coefficient of determination.

A

True. The least statistically significant variable (the one whose removal causes the smallest decrease in R^2) is dropped at each step of backward stepwise selection.

71
Q

T/F: If p is the number of potential predictors for backwards stepwise selection, then a total of 1+p(p+1)/2 models have to be fitted, which is the same as in the forward stepwise selection.

A

True.

72
Q

T/F: Shrinkage methods increase model rigidity, leading to better prediction accuracy when the rise in bias is offset by a larger reduction in variance.

A

True. Shrinkage methods, such as ridge and lasso, make the model less flexible than OLS, resulting in higher bias but lower variance. These methods improve prediction accuracy when the increase in bias is smaller than the decrease in variance.

73
Q

T/F: Shrinkage methods increase model rigidity, thus enhancing prediction accuracy when the rise in variance is offset by a larger reduction in bias.

A

False

74
Q

T/F: Shrinkage methods increase model flexibility, leading to better prediction accuracy when the rise in bias is offset by a larger reduction in variance.

A

False. They decrease model flexibility.

75
Q

T/F: Shrinkage methods increase model flexibility, thus enhancing prediction accuracy when the rise in variance is offset by a larger reduction in bias.

A

False

76
Q

T/F: The validation set approach tends to have higher bias in error estimates compared to k-fold cross-validation due to its dependency on a single split of the data.

A

True. The validation set approach may suffer from higher bias because the model is trained on only the portion of the data in a single split, which may not represent the entire dataset well.

77
Q

T/F: LOOCV can be computationally expensive as it requires the model to be trained n times, where n is the number of observations.

A

True. LOOCV is computationally intensive because it involves training the model n times, each time leaving out a different single observation.

78
Q

T/F: K-fold cross-validation typically results in a better trade-off between bias and variance in error estimates compared to the validation set approach.

A

True. By repeatedly splitting the data into different subsets for training and validation, k-fold cross-validation provides a more robust estimate of model performance, striking a better balance between bias and variance in error estimates.

79
Q

T/F: With different random splits, the validation set approach generally provides more stable and less variable error estimates than k-fold cross-validation due to the larger size of the validation set.

A

False. With different random splits, the validation set approach can result in more variability and less stability in error estimates compared to K-fold CV, which benefits from averaging across multiple splits.

80
Q

T/F: K-fold cross-validation is generally less suitable for very small datasets compared to LOOCV, which uses all data points except one for training in each iteration.

A

True. LOOCV is particularly useful for small datasets because it maximally utilizes available data for training, while K-fold CV might not provide enough data in each training fold for very small datasets.

81
Q

T/F: Stepwise regression ensures the inclusion of all possible models, preventing the risk of data snooping.

A

False. Stepwise regression often involves data snooping, where fitting a large number of models to one set of data increases the chance of finding one that appears to fit well by chance.

82
Q

T/F: Stepwise regression automatically account for non-linear relationships and the presence of outliers and high leverage points in the data.

A

False. Stepwise regression typically does not account for non-linear relationships or the presence of outliers and high leverage points unless specifically designed to do so.

83
Q

T/F: Stepwise regression guarantees that the final model selected is the best among all possible models constructed from linear combinations of the predictors.

A

False. There’s no guarantee that the final model returned is the best one among all models that can be constructed, as only a subset of the possible models are considered.

84
Q

T/F: Stepwise regression relies on a variety of statistical measures, not just t-ratios, for determining which variables to add or remove.

A

False. Stepwise regression often relies heavily on t-ratios for adding or removing variables, rather than utilizing a broader array of statistical measures.

85
Q

T/F: Stepwise regression may overlook the best model because only a subset of all possible 2^k models are considered, where k is the number of predictors.

A

True. One of the primary drawbacks of stepwise regressions is that they consider only some of the 2^k possible models. Consequently, there’s a possibility that the best model, especially if it involves non-linear combinations of predictors or is otherwise outside the considered subset, might be missed.

86
Q

T/F: Not all multicollinearity problems can be detected by inspecting the correlation matrix.

A

True

87
Q

T/F: Severe multicollinearity reduces the accuracy of the estimates of the regression coefficients.

A

True

88
Q

T/F: The presence of multicollinearity always implies that the information provided by one variable is redundant in the presence of another variable.

A

False. The presence of multicollinearity does not always imply that the information is redundant. It is possible for two variables that are highly correlated to complement one another. This is the case of a suppressor variable, where a predictor increases the importance of another predictor.

89
Q

T/F: As a rule of thumb, when tolerance is less than 0.1 or 0.2, severe multicollinearity exists.

A

True. Tolerance is 1/VIF, so a tolerance below 0.1 or 0.2 corresponds to a VIF above 10 or 5, respectively.

90
Q

T/F: The presence of severe multicollinearity makes it difficult to detect the importance of a variable.

A

True

91
Q

Two transformations that deal with heteroscedasticity

A

Logarithmic and square root transformation

92
Q

Correlation formula

A
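
A standard form for the sample correlation between x and y:

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
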
93
Q

R^2 using covariance formula

A

The correlation formula squared: R^2 = r^2, where r = Cov(x,y)/(s_x s_y), valid for simple linear regression.

94
Q

T/F: The standard least squares coefficient estimates are scale equivariant implying that regardless of how the j-th predictor is scaled, X_jB_j will remain the same.

A

True. On the other hand, ridge regression estimates are not scale equivariant. This is why scaling affects the result of ridge regression, and we often recommend variables to be scaled prior to performing ridge regression.

95
Q

T/F: Ridge regression’s advantage over least squares is rooted in the bias-variance trade-off because as the tuning parameter λ increases, the flexibility of the ridge regression fit increases.

A

False. As λ increases, flexibility decreases.

96
Q

T/F: As λ increases, the shrinkage of the ridge coefficient estimates leads to a substantial reduction in the variance of the predictions, at the expense of a slight increase in bias.

A

True. As λ increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias.

97
Q

T/F: LOOCV is computationally prohibitive when used to validate least squares polynomial regression.

A

False. For polynomial or linear regression, LOOCV is computationally efficient because of the shortcut formula available. The shortcut formula enables the calculation of the estimated test error from just a single round of fitting, which makes the cost of LOOCV the same as a single model fit.

98
Q

T/F: k-fold cross-validation does not overestimate the test error rate as much as LOOCV.

A

False. k-fold CV, by using smaller training sets, tends to overestimate the test error more than LOOCV.

99
Q

T/F: 5-fold cross-validation requires more computational resources than 10-fold cross-validation.

A

False. 5-fold CV is more computationally efficient than 10-fold CV because it requires fewer runs.

100
Q

T/F: If the MSE of the full model containing all k predictors is an unbiased estimator of the true error variance, then Cp is an unbiased estimator of the test MSE.

A

True. The statement is about an unbiasedness property of Cp and reinforces the idea that selecting a model with a small Cp tends to lead to a model with a small test MSE.

101
Q

T/F: AIC and BIC are more general than Cp as they are applicable to linear, non-linear, and other general types of models fitted by maximum likelihood.

A

True.

102
Q

T/F: AIC and BIC provide an indirect estimate of the test error, while Cp provides a direct estimate of the test error.

A

False. All of the four model comparison statistics (adjusted R^2, Cp, AIC, BIC) aim to indirectly estimate the test error by adjusting the training error to account for model complexity. Direct estimation of test error can be achieved through resampling methods such as the validation set approach and cross-validation.

103
Q

T/F: Direct estimation of test error can be achieved through resampling methods such as the validation set approach and cross-validation.

A

True

104
Q

T/F: There are n-1 squared errors, one for each of the observation included in the fitting process for LOOCV.

A

False. There are n squared errors, one for each observation, since each of the n observations is held out exactly once in LOOCV.

105
Q

T/F: In the context of regression models with Gaussian errors, Mallow’s Cp is an unbiased estimate of the test MSE if it is calculated using an unbiased estimate of σ^2.

A

True

106
Q

T/F: In the context of regression models with Gaussian errors, Mallow’s Cp and the Akaike information criterion are proportional to each other.

A

True

107
Q

T/F: In the context of regression models with Gaussian errors, a large value of Mallow’s Cp indicates a model with a high test error.

A

True

108
Q

T/F: Stepwise regression is designed to prioritize models based on non-linear combinations of predictors to accommodate complex relationships.

A

False. Stepwise regression does not consider models based on non-linear combinations of predictors, focusing instead on linear relationships.

109
Q

T/F: Stepwise regression effectively incorporates external knowledge or insights an investigator may have about the data.

A

False

110
Q

T/F: Stepwise regression guarantees that the model selected will not be influenced by outliers or high leverage points in the data.

A

False

111
Q

T/F: Stepwise regression primarily relies on t-ratios for adding or removing variables, which may not always be the most appropriate criterion.

A

True

112
Q

Studentised residuals

A
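
One common definition, with s_{(i)} the residual standard error computed with observation i deleted and h_i the leverage:

\frac{e_i}{s_{(i)}\sqrt{1 - h_i}}
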
113
Q

Standardised residuals formula

A
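
One common definition, with s the residual standard error and h_i the leverage:

\frac{e_i}{s\sqrt{1 - h_i}}
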
114
Q

T/F: The residuals versus fitted values plot can be used to detect multicollinearity.

A

False

115
Q

T/F: Since irrelevant variables lead to unnecessary complexity in the resulting model, we can obtain a more easily interpretable model by setting the corresponding coefficient estimates to zero.

A

True

116
Q

T/F: The least squares approach is unlikely to yield any coefficient estimates that are precisely zero.

A

True

117
Q

T/F: Best subset selection automatically selects features or variables by excluding irrelevant variables from a multiple regression model.

A

True

118
Q

T/F: Partial Least Squares (PLS) is a dimension reduction method.

A

True. Like principal components analysis (PCA), PLS is a dimension reduction method.

119
Q

T/F: After standardizing the predictors, PLS computes the first direction by setting its loadings to the coefficients from the simple linear regression of the response onto each original predictor.

A

True

120
Q

T/F: PLS identifies new features in an unsupervised way by approximating the original predictors, similar to principal components analysis.

A

False. Unlike principal components regression (PCR), PLS identifies new features in a supervised way; it uses the target variable to create new features that not only approximate the old features well, but also that are related to the target variable.

121
Q

T/F: In computing the first direction, PLS places the highest weight on the variables that are most strongly related to the response.

A

True. Since the slope coefficients for each simple linear regression are used for the first direction, a larger value indicates a stronger relationship with the target variable.

122
Q

T/F: The loadings for the first direction are proportional to the covariances between the response and each standardized predictor.

A

True. The loadings for the first direction are proportional to the correlations between the response and each predictor. Since the predictors are standardized, the correlations and the covariances are proportional to each other. Hence, the loadings are proportional to the covariances as well.

123
Q

T/F: Relationships among model deviations in SLR indicate a model misspecification issue.

A

True

124
Q

T/F: One option to handle heteroscedasticity in SLR is to use a logarithmic transformation of the dependent variable.

A

True

125
Q

T/F: If model deviations are associated with a variable, utilizing this information should enhance model specification in SLR.

A

True

126
Q

T/F: Departure from normality in the distribution of the deviations is a sign of model misspecification issue in SLR.

A

True

127
Q

T/F: K-fold CV entails randomly partitioning the observations into k groups, or folds, each of approximately equal size.

A

True

128
Q

T/F: One fold is designated as the validation set, while the model is trained on the remaining k folds.

A

False. One fold is designated as the validation set, while the model is trained on the remaining k-1 folds.

129
Q

T/F: The validation process is iterated k times, with a different fold of observations serving as the validation set each time, yielding k estimates of the test error.

A

True

130
Q

T/F: The k-fold cross-validation error estimate is determined by averaging the resulting mean squared error values obtained from the k validation iterations.

A

True

131
Q

T/F: Linear regression has the advantage of easy fitting by estimating only a small number of coefficients.

A

True

132
Q

T/F: Given a value for K and a prediction point x0, KNN regression identifies the K training observations that are closest to x0, and then estimates f(x0) using the distance-weighted average of all these K training responses.

A

False. KNN regression estimates f(x0) using the average of all the K training responses, not distance-weighted average.

133
Q

T/F: Ridge regression is less flexible than OLS and thus results in an improved prediction accuracy when its increase in bias is less than its decrease in variance.

A

True

134
Q

T/F: The linear model offers distinct advantages in inference and often proves to be surprisingly competitive with non-linear methods in addressing real-world problems when the true relationship between the predictor and response is approximately linear.

A

True

135
Q

T/F: If the true relationship between the target variable and the predictors is approximately linear, then the least squares estimates should be accurate, hence exhibiting low bias.

A

True

136
Q

T/F: If the number of observations is much larger than the number of variables, then the least squares estimates tend to have low variance and will perform well on test data when the true relationship between the target variable and the predictors is approximately linear.

A

True

137
Q

T/F: If the number of observations is not much larger than the number of variables, then there can be a lot of variability in the least squares fit, resulting in overfitting and poor predictions on unseen test observations when the true relationship between the target variable and the predictors is approximately linear.

A

True

138
Q

T/F: If the number of observations is less than the number of variables, then there is a unique least squares coefficient estimate when the true relationship between the target variable and the predictors is approximately linear.

A

False. If the number of variables is greater than the number of observations, then there is no longer a unique least squares coefficient estimate: the variance is infinite so the method cannot be used at all.

139
Q

T/F: The lasso tends to perform better than ridge regression when the response variable is a function of many predictors.

A

False. It is ridge regression that tends to perform better when many predictors influence the response variable.

140
Q

T/F: Ridge regression has the advantage of forcing coefficient estimates to exactly zero.

A

False. Only Lasso can force coefficients to exactly zero.

141
Q

T/F: The penalty function in the lasso is a function of the l2 norm of the coefficients.

A

False. The lasso penalty is based on the l1 norm, not the l2 norm. In contrast, ridge penalty is based on the l2 norm.

142
Q

T/F: It is best to standardize predictors when using either ridge regression or the lasso.

A

True. Standardizing predictors ensures that the shrinkage penalty is applied uniformly across all predictor coefficients, which is important for both ridge regression and the lasso.

143
Q

T/F: The lasso tends to yield models that have better prediction accuracy than ridge regression.

A

False. Whether the lasso or ridge regression yields better prediction accuracy depends heavily on the specific dataset and situation.

144
Q

T/F: Studentised residuals in MLR should be realizations of a t-distribution.

A

True

145
Q

T/F: Studentised residuals in MLR should be realizations of a normal distribution.

A

False. Studentised residuals in MLR should be realizations of a t distribution.

146
Q

T/F: Studentised residuals in MLR have the same unit as the response variable.

A

False. Studentized residuals are comparable across different contexts because they are unitless.

147
Q

T/F: An observation with a negative studentized residual is likely an outlier.

A

False. A likely outlier is indicated by the magnitude of the studentized residual, not its sign.

148
Q

T/F: An observation with a large studentized residual is likely a high leverage point.

A

False. A high leverage point is identified using leverage, not studentized residual.

149
Q

(Large/small) R^2 and (big/small) t-statistics may suggest the presence of multicollinearity in multiple linear regression model.

A

Large R^2 and small t-statistics

150
Q

T/F: As K increases the test error usually decreases initially and then starts to increase.

A

True

151
Q

T/F: R^2 measures the linear relationship between y and the predictions hat(y).

A

True

152
Q

T/F: The training residual sum of squares will steadily decrease as λ increases for Lasso.

A

False. As λ increases, the training RSS will steadily increase because the flexibility of the fit decreases.

153
Q

T/F: As λ increases for Lasso, the test residual sum of squares will remain constant.

A

False. The test error decreases initially, and then increases, following a U shape.

154
Q

T/F: As λ increases for Lasso, the variance will increase initially, and then finally decrease, following an inverted U shape.

A

False. The variance will steadily decrease.

155
Q

T/F: As λ increases for Lasso, the squared bias will steadily increase.

A

True.

156
Q

T/F: As λ increases for Lasso, the irreducible error will decrease initially, and then finally increase, following a U shape.

A

False. The irreducible error never changes.

157
Q

T/F: The validation set approach involves randomly dividing the available set of observations into two parts, a training set and a validation set.

A

True

158
Q

T/F: In validation set approach, the model is fit on the training set, and the validation error is estimated by applying the model to predict responses for the observed data in the validation set.

A

True

159
Q

T/F: The validation set approach can give highly variable results if the size of the validation set is not large enough.

A

True

160
Q

T/F: Validation set approach is typically repeated several times with different training-validation splits to reduce variability in the validation error estimate.

A

True

161
Q

T/F: The validation set error rate is skewed because the model has been optimized for the validation set.

A

False. The validation set error rate is not skewed because the model is fitted to the training set, not the validation set.

162
Q

T/F: LOOCV uses a single observation from the dataset for the validation each time, which leads to high variance in the error estimates.

A

True

163
Q

T/F: The estimate for test MSE in LOOCV is the average of the squared errors from each single validation.

A

True

164
Q

T/F: As λ increases, the sum of bj^2 increases for ridge regression.

A

False. As λ increases, the sum of bj^2 must decrease in order to minimise the penalised sum of squared errors. It is still possible for INDIVIDUAL bj’s to increase in absolute value as λ increases.

165
Q

T/F: As λ increases, it is not possible for an individual bj to increase in absolute value for ridge regression.

A

False. It is still possible for INDIVIDUAL bj’s to increase in absolute value as λ increases.