2. Linear Models Flashcards

1
Q

Difference between prediction and confidence interval in MLR.

A

Confidence: range for the mean response
Prediction: range for a response value

2
Q

What is the hierarchical principle with interaction terms in MLR?

A

A significant interaction term implies that its individual terms should also be in the model, regardless of the t tests associated with the individual terms

3
Q

Model diagnostics: what is a misspecified model equation? Give an example.

A

Incorrectly assuming that the true form of f follows your model.

Ex. when there is evidence of a higher-order polynomial relationship but only linear terms were fit.

4
Q

Model diagnostics: residuals with non-zero averages. What does this mean?

A

Residuals are realizations of the true error terms, which are assumed to come from a normal distribution with mean zero. A non-zero average means that some assumption of the linear regression model is incorrect.

5
Q

Model diagnostics: heteroscedasticity. This leads to an unreliable _____

A

Variance of the error term is not constant; there is evidence of more than one variance parameter.

This leads to an unreliable MSE, then all outputs that rely on MSE are also unreliable

6
Q

Model diagnostics: dependent errors, what does this mean in terms of the Y’s? The ___ will also be underestimated, leading to ____ CI and PI intervals.

A

This means that the Y’s have non-zero covariances.

The standard errors will be underestimated, this will make the intervals narrower and p-values smaller.

7
Q

Model diagnostics: why is it bad if error terms are non-normal?

A

Then we are unable to make inferences based on the F and t distributions.

8
Q

Model diagnostics: multicollinearity. What is this, and what does it lead to?

A

When one predictor is strongly correlated with another predictor (or a linear combination of the others). This makes the estimates of the regression coefficients unstable.

9
Q

Does multicollinearity affect the predictive power of y-hat, MSE or F test results?

A

No

10
Q

Model diagnostics: unusual points. What are these and what do they do to the model?

A

Outliers: extreme residuals
High leverage point: observation with an unusual set of predictor values. The bj’s are sensitive to these points, which can greatly affect the shape of the fitted model.

11
Q

Model diagnostics: high dimensions, what does this mean?

A

The model is too flexible, it overfits the data

12
Q

Which of the following can challenge the interpretation of a regression coefficient?

Misspecified model equation, multicollinearity, high leverage points

A

Misspecified model equation does make interpreting the bj’s problematic.

Multicollinearity masks which predictors are actually meaningful to the model.

High leverage points have a strong influence over the bj’s.

Answer: all

13
Q

What is the formula for leverage? In SLR?

A

Formula sheet. (In general, hi is the i-th diagonal entry of the hat matrix H = X(X'X)^-1 X'. In SLR, hi = 1/n + (xi - x̄)^2 / Σ(xj - x̄)^2.)

14
Q

High leverage point is given by what inequality?

A

h > 3((p+1)/n)
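This cutoff can be checked numerically. A minimal numpy sketch with simulated data (the seed and the planted value 8.0 are illustrative, not from the course): leverages are the diagonal of the hat matrix H = X(X'X)^-1 X', and points above 3(p+1)/n get flagged.

```python
import numpy as np

# Toy design matrix: n = 20 observations, p = 2 predictors plus an intercept.
rng = np.random.default_rng(0)
n, p = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
X[0, 1] = 8.0  # plant one unusual predictor value

# Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Flag points exceeding the 3(p+1)/n threshold from the card.
cutoff = 3 * (p + 1) / n
high_leverage = np.where(h > cutoff)[0]
print(high_leverage)
```

The planted observation shows up as high leverage; note the leverages always sum to p + 1, so the "typical" leverage is (p + 1)/n.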

15
Q

What is a studentized residual? They can be a realization of what distribution?

A

A unitless version of a residual: the raw residual divided by an appropriate standard error.

They can be a realization of a t distribution with df = n-p-1

16
Q

What is the formula for Cook’s distance? What does it measure, and what distribution is it a realization of? An observation has a typical influence if D = ?

A

Formula sheet. (Di = ri^2/(p+1) * hi/(1-hi), where ri is the studentized residual and hi the leverage.)

Measures influence, realization of the F distribution with ndf = p+1 and ddf = n-p-1

Typical influence if D = 1/n
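As a sketch, Cook’s distance can be computed directly from the studentized residuals and leverages using its standard definition Di = ri^2/(p+1) * hi/(1-hi); the simulated data below (coefficients, seed, sample size) is purely illustrative.

```python
import numpy as np

# Simulated regression data (hypothetical values).
rng = np.random.default_rng(1)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

# OLS fit, residuals, and MSE with n - p - 1 degrees of freedom.
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
mse = e @ e / (n - p - 1)

# Leverages and internally studentized residuals.
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
r = e / np.sqrt(mse * (1 - h))

# Cook's distance: D_i = r_i^2 / (p+1) * h_i / (1 - h_i).
D = r**2 / (p + 1) * h / (1 - h)
print(D.max())
```

With clean data, all the Di stay small (near the "typical" 1/n); a gross outlier or high-leverage point would push its Di up toward the F-based cutoffs.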

17
Q

In the plot of e vs y-hat, what makes residuals well behaved?

A
  1. Points are randomly scattered and lack trends.
    If the residuals seem to be acting as a function of y-hat, then the model is likely missing a predictor that can explain the trend.
    Ex. U-shaped … add a positive quadratic term
  2. Zero average of residuals.
    Equally spread above and below the 0 line
  3. Homoscedasticity.
    The residuals have consistent spread (no cone-like shapes)
18
Q

How may we solve the issue of heteroscedasticity?

A

Cone shape toward infinity: transform the response using log, or square root (any concave function)

Cone shape toward 0: weighted least squares
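As an illustration of the weighted least squares fix, here is a hedged numpy sketch on simulated data whose error spread grows with x (the variance pattern var ∝ x^2, and hence the weights 1/x^2, are assumptions of this example, not the only possibility).

```python
import numpy as np

# Simulated data with a cone-shaped residual plot: error spread grows with x.
rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + x * rng.normal(scale=0.5, size=n)  # var(eps_i) ~ x_i^2

X = np.column_stack([np.ones(n), x])

# Ordinary least squares ignores the unequal variances.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Weighted least squares: weight each observation by 1/variance
# (here w_i = 1/x_i^2, matching the simulated variance pattern).
w = 1.0 / x**2
Xw = X * np.sqrt(w)[:, None]
yw = y * np.sqrt(w)
beta_wls = np.linalg.lstsq(Xw, yw, rcond=None)[0]
print(beta_ols, beta_wls)
```

Both fits are unbiased here, but the WLS coefficients are the more efficient estimates because the weights down-weight the noisy observations.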

19
Q

How may we solve the issue of dependent errors?

A

We use time series

20
Q

How may we solve the issue of non-normal errors?

A

This often occurs when the response is discrete in nature; in that case, use a model designed for that response type (e.g. a generalized linear model).

21
Q

How may we solve the issue of multicollinearity?

A
  1. Exclude all but one of those predictors from the model
  2. Combine the predictors
  3. Do nothing and report its presence
  4. Use orthogonal predictors, then we know that they are uncorrelated
22
Q

What is a suppressor variable? Should we add these to our model?

A

This is a case where multicollinearity is accepted.

A suppressor variable is a predictor that is weakly correlated with the response but, through its relationship with other predictors, enhances their usefulness. Adding a suppressor variable therefore leads to a better model, even though it introduces multicollinearity.

23
Q

What happens when the residuals exhibit a predictable pattern from observation to observation?

A

Use a time series model

24
Q

What is the e vs i plot used to detect?

A

Dependence of error terms

25
Q

Is forward selection a greedy approach?

A

Yes, because at each step it only adds the single best next predictor, rather than searching over all possible subsets of predictors

26
Q

What is the algorithm for backwards selection?

A

Fit the full model, then drop the predictor whose removal yields the lowest SSE (i.e. the least useful predictor), and repeat.

27
Q

Disadvantage of both forward and backward selection?

A

These two procedures will result in nested models. There is no certainty that the absolute best model will be found.

28
Q

Which of backwards and forwards handles higher dimensions better?

A

Forward

29
Q

Besides R^2a, what are 4 other criteria that capture model quality? Would we like to minimize or maximize these values?

A

Mallows’ Cp
AIC
BIC
CV error

We wish to minimize these statistics

30
Q

AIC,BIC,Cp mimic what statistic?

A

Test MSE

31
Q

Cp is an unbiased estimate of _____ if it is calculated using an unbiased estimate of sigma^2.

A

Test MSE

32
Q

When a model is overfit (too many predictors p), which selection criterion (AIC, BIC, Cp, R^2a) will be unreliable?

A

All of them, they are all functions of SSE

33
Q

What is the formula for LOOCV error?

A

Formula sheet. (For OLS, the shortcut is CV = (1/n) Σ (ei / (1 - hi))^2, where hi is the leverage of observation i.)
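For OLS, the LOOCV error has an exact shortcut, CV = (1/n) Σ (ei/(1-hi))^2, so no refitting is needed. The sketch below (simulated data, illustrative values) verifies the shortcut against an explicit leave-one-out loop.

```python
import numpy as np

# Simulated regression data (hypothetical values).
rng = np.random.default_rng(3)
n, p = 25, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

# Shortcut: CV = (1/n) * sum_i ( e_i / (1 - h_i) )^2.
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
cv_shortcut = np.mean((e / (1 - h)) ** 2)

# Brute-force check: refit n times, each time holding one point out.
errs = []
for i in range(n):
    keep = np.arange(n) != i
    b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    errs.append((y[i] - X[i] @ b) ** 2)
cv_loop = np.mean(errs)
print(cv_shortcut, cv_loop)
```

The two numbers agree to machine precision, which is why LOOCV is cheap for linear models despite nominally requiring n fits.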

34
Q

Which type of CV will overestimate the test MSE and which will not have this issue?

A

Validation set. The model is trained on only half the data, so its error tends to overestimate the test MSE, and the estimate varies quite a bit if a different split is chosen.

KFOLD & LOOCV will not have this issue

35
Q

CV: in order of least to most biased.

A

LOOCV , Kfold, validation set

36
Q

CV: in order of least to most variance

A

Validation set , kfold, LOOCV

37
Q

Ridge and Lasso regression are _____ regression techniques. They aim to decrease a model’s ______. They both use _____ predictors.

A

Shrinkage methods
Decrease variance
Scaled predictors
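A minimal sketch of why scaling matters: the predictors below are given wildly different scales, so they are standardized before applying the ridge closed form (X'X + λI)^-1 X'y. The penalty value lam = 5.0, the seed, and the coefficients are arbitrary illustrations.

```python
import numpy as np

# Predictors on wildly different scales (illustrative multipliers).
rng = np.random.default_rng(4)
n, p = 50, 3
X = rng.normal(size=(n, p)) * np.array([1.0, 10.0, 100.0])
y = X @ np.array([1.0, 0.1, 0.01]) + rng.normal(size=n)

# Standardize each predictor and center the response: the ridge/lasso
# penalty treats every coefficient equally, so scales must be comparable.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

# OLS vs. the ridge closed form (X'X + lam I)^{-1} X'y on scaled data.
lam = 5.0
beta_ols = np.linalg.solve(Xs.T @ Xs, Xs.T @ yc)
beta_ridge = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)
print(beta_ols, beta_ridge)
```

For any λ > 0 the ridge coefficient vector has a strictly smaller norm than the OLS one, which is the "shrinkage" the card refers to.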

38
Q

The tuning parameter in lasso and ridge is inversely related to ____. How is this tuning parameter chosen?

A

Flexibility.

The parameter is chosen using CV

39
Q

Ridge is preferred to lasso when _____.

A

When we are sure that the response is a function of all the predictors, because lasso may drop a predictor.

40
Q

Are both lasso and ridge useful when dealing with high dimensions?

A

Yes, as only some of the predictors will have a meaningful estimated coefficient.

41
Q

True or false: lasso is objectively better than ridge

A

False. There is no clear advantage between the two.

42
Q

Which of lasso or ridge is easier to interpret when considering many features?

A

Lasso, as it can drop predictors

43
Q

What is tolerance?

A

Reciprocal of VIF
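Tolerance and VIF can be computed by regressing each predictor on the others: VIFj = 1/(1 - Rj^2) and tolerance = 1/VIFj = 1 - Rj^2. A numpy sketch with one deliberately near-collinear pair (the 0.9 coefficient and seed are illustrative):

```python
import numpy as np

# Three predictors; x2 is nearly collinear with x1.
rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # Regress predictor j on the remaining predictors (with intercept).
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    b = np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    resid = X[:, j] - A @ b
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
tolerances = [1 / v for v in vifs]
print(vifs, tolerances)
```

The collinear pair shows VIFs far above the common rule-of-thumb cutoffs, while the independent predictor stays near 1.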

44
Q

If one dummy variable is not significant to the model (summer is not significant, but winter and fall are), should it be dropped?

A

No, it leads to altering the categorical variable to have w-1 classes instead of w. Further investigation is needed

45
Q

Can backward selection be done if n < p?

A

No, backwards requires n to be larger than p.

46
Q

Performing best subset selection requires fitting all ____ models. (A number)

A

2^p

47
Q

In lasso regression, which of the following is true?
A. As lambda increases, the number of predictors in the model will increase.
B. As lambda increases, the bias of the parameters in the chosen model will increase.
C. As lambda increases, the variance of the predictions made by the chosen model will increase.

A

B

48
Q

How many models are fit in forwards selection?

A

1+(p(p+1))/2
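The count follows from summing the fits per step: one null model, then p candidates at step 1, p-1 at step 2, and so on. A quick sketch checking the sum against the closed form:

```python
# Forward selection fits the null model, then at step k compares each of the
# remaining p - k + 1 predictors, giving 1 + p + (p-1) + ... + 1 fits total.
def forward_fit_count(p):
    return 1 + sum(p - k for k in range(p))

print(forward_fit_count(10))  # 1 + 10*11/2 = 56
```

Compare with best subset selection, which needs all 2^p fits: for p = 10 that is 1024 models versus 56.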

49
Q

In ridge regression, as the BUDGET parameter increases, which of the following will occur?
A. Training SSE will steadily increase
B. Test SSE will have an upside down U shape
C. variance will steadily decrease
D. Squared bias will follow a U shape
E. Irreducible error will remain constant

A

In ridge regression, as budget parameter increases -> flexibility increases
A. False. Training SSE decreases
B. False. Test MSE follows a u shape
C. False. Variance increases.
D. False. Squared bias decreases
E. True. Irreducible error always remains constant.

50
Q

In lasso and ridge, is the budget parameter inversely related with the lambda that is in our optimization problem?

A

Yes

51
Q

If the error term is 0 (no irreducible error), the 95% confidence interval will be equal to the 95% prediction interval.

A

True

52
Q

For OLS, the residuals must sum to __.

A

0

53
Q

How do we calculate CV error in k fold CV?

A

We average the k test MSEs
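The averaging step can be sketched in numpy with a hand-rolled k-fold split (simulated data; k = 5, the seed, and the coefficients are illustrative):

```python
import numpy as np

# Simulated linear-model data (hypothetical values).
rng = np.random.default_rng(7)
n, p, k = 60, 2, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)

# Shuffle the indices and split them into k folds.
folds = np.array_split(rng.permutation(n), k)

# For each fold: fit on the other k-1 folds, record held-out MSE.
fold_mses = []
for test_idx in folds:
    train = np.setdiff1d(np.arange(n), test_idx)
    b = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
    fold_mses.append(np.mean((y[test_idx] - X[test_idx] @ b) ** 2))

# CV error = average of the k test MSEs.
cv_error = np.mean(fold_mses)
print(cv_error)
```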

54
Q

How do we calculate CV error in validation set CV?

A

Using the other half of the data, the testing set, we estimate the test MSE

55
Q

With high dimensional data, which of the following become unreliable?
A. Fitted equation
B. R^2
C. Confidence intervals for regression coefficients

A

All 3.

The fitted equation would fail to generalize.

56
Q

In the presence of heteroscedasticity, which of the following become unreliable?
Adjusted R^2
VIF
F test

A

Adjusted R^2 and F test.

VIF doesn’t have SSE as an input, so it is unaffected by heteroscedasticity.

57
Q

When there is perfect multicollinearity, which of the following are true?
A. It is impossible to determine which predictors are truly significant
B. There is an issue with estimating the regression coefficients.
C. There is no concern with the MSE

A

All true.
As coefficients become less reliable, they mask which predictors are actually significant. With perfect multicollinearity the OLS estimates are no longer unique.

MSE is still reliable because the predictive ability of y-hat is not in question.

58
Q

Are ridge and lasso scaled with variance or standard deviation?

A

Standard deviation

59
Q

Is linear regression a flexible approach?

A

No, it’s relatively inflexible. Can only generate linear functions.

60
Q

KNN Regression. Which of the following are true?
A. N must be large to produce good predictions because this is a non-parametric method
B. It performs well in high dimensions
C. It will outperform linear regression when the chosen functional form poorly approximates the true relationship between x and y.

A

A. True
B. False. It doesn’t perform well in high dimensions.
C. True

61
Q

When n=p+1, what happens to the model?

A

It fits the data perfectly, so y-hat = y and SSE = 0.
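A quick numpy check of the perfect fit (the dimensions and seed are illustrative): with n = p + 1 the design matrix, including the intercept column, is square, so OLS interpolates the data exactly.

```python
import numpy as np

# n = p + 1 observations, p predictors plus an intercept: square design matrix.
rng = np.random.default_rng(6)
p = 3
n = p + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = rng.normal(size=n)

# Square system: the exact solution makes every residual zero.
beta = np.linalg.solve(X, y)
sse = np.sum((y - X @ beta) ** 2)
print(sse)
```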

62
Q

When we use (best subset, forward, backward) with CV, do we use the training data set to build our model or the entire data set?

A

Best subset/forward/backward is to be only done on the training set.

63
Q

In the CV algorithms, are the validation set errors used to find the optimal number of predictors p, or the optimal entire model?

A

The optimal number of predictors p. The final model of that size is then determined using all the data points, not just the training set.