SRM Chapter 2 Flashcards

1
Q

SLR

A
  • Simple Linear Regression
  • Relationship between two numeric variables
  • Parametric
2
Q

MLR

A
  • Multiple Linear Regression
  • Multiple predictors (x’s) used to predict the dependent variable (y).
  • Parametric
3
Q

Residuals

A
  • ei = yi - yi-hat
  • One residual for each observation i
  • Want these to be small
  • This is achieved by ordinary least squares, which minimizes the sum of the squared residuals.
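As a sketch of how this works, the closed-form OLS fit and its residuals can be computed directly (numpy; the data here are made-up toy values):

```python
import numpy as np

# Made-up toy data: y is roughly linear in x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form OLS estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Residual for each observation i: e_i = y_i - yhat_i
residuals = y - (b0 + b1 * x)

# With an intercept in the model, OLS residuals sum to (numerically) zero
print(abs(residuals.sum()) < 1e-10)   # True
```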
4
Q

Partitioning of Variability

A
  • Total variability in y is split into explained and unexplained pieces
  • SST = SSR + SSE
5
Q

Parameter Estimates

A
  • b0 and b1: estimates of the parameters B0 and B1
  • Obtained by ordinary least squares
6
Q

R-squared

A
  • Coefficient of determination
  • Portion of variability in the response explained by the predictors
  • R-squared = SSR/SST
  • Between 0 and 1 (can be interpreted as a percentage).
  • Want this to be high
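A small numeric check of the partition SST = SSR + SSE and of R-squared = SSR/SST (numpy; made-up toy data):

```python
import numpy as np

# Made-up toy data and its OLS fit
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variability
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variability
sse = np.sum((y - y_hat) ** 2)         # unexplained variability

print(np.isclose(sst, ssr + sse))      # True: SST = SSR + SSE
print(round(ssr / sst, 4))             # R-squared, close to 1 here
```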
7
Q

Adjusted R-Squared

A
  • Adjustment for MLR that accounts for the number of predictors
  • Does not have to range from 0 to 1.
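The usual formula is Adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1). A tiny sketch of the penalty (the example numbers are arbitrary):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Arbitrary example: same R^2, increasing numbers of predictors
print(adjusted_r2(0.80, n=20, p=3))    # ~0.7625, below R^2
print(adjusted_r2(0.80, n=20, p=10))   # ~0.5778, penalized harder
```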
8
Q

B0

A
  • Intercept parameter
  • Free parameter
  • Equal to the expected value of y when x is 0.
9
Q

B1

A
  • Slope parameter
  • Free parameter
  • For every unit increase in x, the expected value of y increases by B1.
10
Q

SLR Model Assumptions (6)

A
  1. Yi = B0 + B1Xi + ei
    (linear function plus error)
  2. xi’s are non-random
  3. Expected value of ei is 0.
    -> so the expected value of Yi is B0 + B1Xi (since E[ei] = 0).
  4. Variance of ei is sigma-squared.
    -> Because E[ei] = 0, the variance of Yi is also sigma-squared.
    -> Also homoscedasticity (variance constant across all observations).
  5. ei’s are independent across observations.
  6. ei’s are normally distributed.
11
Q

Homoscedasticity

A
  • Variance (sigma-squared) is constant across all observations
12
Q

b0

A
  • Estimate of B0 to get y-hat
13
Q

b1

A
  • Estimate of B1 to get y-hat
14
Q

Method to estimate b0 and b1

A
  • Ordinary least squares/method of least squares
15
Q

Ordinary Least Squares

A
  • Determines estimates b0, b1
  • Optimization equation
  • Estimators are unbiased (bias = 0).
16
Q

MSE

A
  • Mean squared error
  • Estimate of sigma-squared
  • Denominator is n-2
  • Unbiased, so bias is 0.
  • Best fit is when MSE is minimized.
17
Q

RSE

A
  • Residual standard error
  • Aka residual standard deviation
  • sqrt(MSE)
18
Q

Design Matrix

A
  • X
  • Rows are observations; columns hold the predictor values, with a leading column of 1s for the intercept
19
Q

Hat Matrix

A
  • H
  • Aka projection matrix
  • H times vector of actual responses = fitted values of response
  • In other words, y-hat = H*y
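A minimal check of this identity with numpy, using made-up toy data (H is built from the design matrix, and its diagonal gives the leverages):

```python
import numpy as np

# Made-up toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])   # design matrix: column of 1s + x
H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix H = X(X'X)^-1 X'
y_hat = H @ y                               # fitted values: y-hat = H y

b = np.linalg.inv(X.T @ X) @ X.T @ y        # b = (X'X)^-1 X' y
print(np.allclose(y_hat, X @ b))            # True: same fitted values
print(np.isclose(np.trace(H), 2.0))         # True: leverages sum to p + 1
```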
20
Q

b Matrix

A
  • 2x1 column vector of b0 and b1
  • b = (X-transpose X)^-1 X-transpose y
21
Q

y Matrix

A
  • Matrix for actual observed values of y
22
Q

SSR

A
  • Regression sum of squares
  • Amount of variability in y explained by the predictors (SSR/SST gives the proportion)
23
Q

SSE

A
  • Error sum of squares
  • Aka sum of squared residuals
  • Amount of variability in y that cannot be explained by the predictors
24
Q

SST

A
  • Total sum of squares
  • Total variability (both explained and unexplained)
  • SST = SSR + SSE
25
Q

Positive Residual

A
  • Actual observation > (larger than) predicted observation
26
Q

Negative Residual

A
  • Actual observation < (smaller than) predicted observation
27
Q

Null model

A
  • Y = B0 + e
  • No predictors (x’s)
  • No relationship between y and x’s
28
Q

Do you want R-squared and Adjusted R-squared to be high or low?

A
  • High
  • Means more of the variance in y can be explained by the predictor(s).
  • Want this to be as high as possible so that the unexplained variance is minimized.
29
Q

Is R-squared or Adjusted R-squared better for comparing MLR models? Why?

A
  • Adjusted R-squared
  • Because R-squared increases as predictors are added, a larger R-squared doesn’t necessarily mean a better model.
  • Adjusted R-squared accounts for the number of predictors, so it is a better basis for comparing models.
30
Q

Two-tailed t Test (Hypothesis Test): What are we testing, and why?

A
  • Test to see whether the slope parameter is 0 (B1 = 0).
  • H0: B1 = 0
  • H1: B1 ≠ 0
  • If true, then there is no relationship between the x’s and y.
  • So, we want to reject H0 to say that it’s plausible that there is a linear relationship between x’s and y.
31
Q

Test Decision (Two-Tailed t Test)

A
  • For significance level α, reject H0 if:
  • |t-stat| ≥ t(α/2, n-2), or equivalently
  • p-value ≤ α
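As a sketch (made-up toy data; scipy assumed available): the test statistic is b1 divided by its standard error, compared against a t-distribution with n - 2 degrees of freedom:

```python
import numpy as np
from scipy import stats

# Made-up toy data: test H0: B1 = 0 vs H1: B1 != 0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
mse = np.sum(resid ** 2) / (n - 2)          # estimate of sigma^2, denominator n-2
se_b1 = np.sqrt(mse / Sxx)                  # standard error of the slope estimate

t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-tailed p-value

alpha = 0.05
reject = p_value <= alpha                   # equivalently |t| >= t(alpha/2, n-2)
print(reject)                               # True
```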
32
Q

One-Tailed t Test (Hypothesis Test): What are we testing and why?

A
  • Same as two-tailed, but sometimes it’s more appropriate to have only one rejection region
  • E.g., looking for evidence that the slope between x and y is positive (or negative).
33
Q

When do we use a right-tailed t test?

A
  • When the alternative is one-sided in the positive direction (H1: B1 > 0)
34
Q

When do we use a left-tailed t test?

A
  • When the alternative is one-sided in the negative direction (H1: B1 < 0)
35
Q

Confidence vs Prediction Interval

A
  • Confidence: range for the mean response (across all observations)
  • Prediction: range for the response of a new observation
  • Prediction > Confidence (prediction is always at least as wide as the confidence interval).
36
Q

Confidence Interval

A
  • Range that estimates the MEAN response
  • Narrowest at the sample mean of the predictor (x = x-bar)
37
Q

Prediction Interval

A
  • Range that estimates a NEW observation’s response
  • Narrowest when the chosen predictor value equals the sample mean of the predictor
38
Q

Why is the prediction interval at least as wide as the confidence interval?

A
  • The prediction interval accounts for the variance of the error term in addition to the variance of y-hat
  • Have to cast a wider net to predict a new single response as opposed to the mean response over all observations
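A sketch of the two standard errors (made-up toy data; the prediction one adds the error variance via the leading 1 inside the square root):

```python
import numpy as np
from scipy import stats

# Made-up toy data: compare interval widths at a chosen x0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

x0 = 4.0                                  # arbitrary new predictor value
t_crit = stats.t.ppf(0.975, df=n - 2)     # 95% two-sided critical value

# CI for the mean response: no extra "1 +" term
se_mean = np.sqrt(mse * (1 / n + (x0 - x.mean()) ** 2 / Sxx))
# PI for a new observation: adds the error variance (the leading 1)
se_pred = np.sqrt(mse * (1 + 1 / n + (x0 - x.mean()) ** 2 / Sxx))

print(se_pred > se_mean)   # True: the prediction interval is always wider
```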
39
Q

Regression Coefficients

A
  • B0, B1,…,Bp (Bj’s).
  • B0 is still the intercept
  • B1,…,Bp are regression coefficients instead of slope because that no longer makes sense with multiple predictors (x’s).
40
Q

Added assumption for MLR

A
  • No predictor xj may be a linear combination of the other predictors
  • Because a predictor that is a linear combination of the others adds no new information about the relationship between the x’s and y.
41
Q

Nested Models

A
  • Models where one model’s predictors are a subset of the other’s
  • The smaller model can be obtained from the larger one by dropping predictors
42
Q

Nested MLRs: p

A
  • p is a measure of flexibility
43
Q

MLR: relationship between p and SSE

A
  • p and SSE are inversely related
  • As flexibility increases, the amount of unexplained variability decreases
44
Q

MLR: relationship between SSE and R-squared

A
  • As predictors are added:
  • Flexibility (p) increases
  • SSE (unexplained variability) decreases as more of the variability becomes explained
  • R-squared (ratio of explained variability to total variability) increases as more of the variability becomes explained
45
Q

Formulas for R-squared

A

= 1 - SSE/SST
(1 - ratio of unexplained variability to total variability)

= SSR/SST
(ratio of explained variability to total variability)

46
Q

What is Adjusted R-squared relative to R-squared? (Less/greater than)

A
  • Adjusted R-squared is (almost) always LESS than R-squared
  • Because you can think of Adjusted R-squared as a shrunken version of R-squared that removes the inflation from adding predictors
  • Two cases where Adjusted R-squared EQUALS R-squared:
  • p = 0 (there are no predictors)
  • R-squared = 1 (all of the variance is explained by the predictors, think 100%).
47
Q

Relationship between correlation coefficient and R-squared for an SLR

A

Correlation coefficient = ±sqrt(R-squared), with the same sign as the slope b1.

48
Q

When should a predictor be dropped from an MLR?

A

If p-value > significance level, that variable is insignificant and should be dropped.

49
Q

How should predictors be dropped from an MLR?

A

Drop variables for which the p-value > the acceptable significance level. Drop one at a time (because p-values may change after a variable is dropped), starting with the highest p-value exceeding the significance level.

50
Q

How do you find the degrees of freedom for an MLR?

A

number of observations - (number of predictors + 1)

(Add one to the number of predictors because of the intercept term.)

51
Q

For an MLR how do you decide whether a coefficient is statistically different from 0?

A
  1. Find the degrees of freedom as: # observations - (# predictors + 1)
  2. Should be given significance level (a) - if the test is two-tailed, divide by 2.
  3. Find the value on the t-table that corresponds with the df and the significance level.
  4. Anything that has a t-statistic (absolute value) less than the value on the t-table is not statistically different from 0.
52
Q

What does the F-test examine?

A

The significance of all predictors collectively.

The hypothesis being tested (H0) is that all of the slope coefficients equal 0. If the p-value is greater than the significance level, we fail to reject H0: the coefficients (Bi’s) are not statistically different from 0 as a group, and their respective xi’s do not collectively improve the model.
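A sketch of the overall F-test on simulated data (numpy + scipy; the data-generating coefficients are made up):

```python
import numpy as np
from scipy import stats

# Simulated MLR with made-up true coefficients: F-test of H0: B1 = ... = Bp = 0
rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, 0.0, -1.5])
y = X @ beta_true + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS fit
y_hat = X @ b
sse = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
ssr = sst - sse

f_stat = (ssr / p) / (sse / (n - p - 1))    # F statistic, df = (p, n - p - 1)
p_value = stats.f.sf(f_stat, p, n - p - 1)
print(p_value < 0.05)   # True: the predictors are jointly significant here
```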

53
Q

R output for p-value of a variable

A

Pr(>|t|)

54
Q

What is the hypothesis tested by the F-test?

A

H0: B1 = … = Bp = 0

If the p-value of the F-test is greater than the significance level, then we fail to reject H0. This means that the Bi’s (coefficients) are not statistically different from 0 as a group, and their corresponding xi’s should be removed from the model.

55
Q

MLR violations/issues (9)

A
  1. Misspecified model equation
  2. Residuals with non-zero averages
  3. Heteroscedasticity
  4. Dependent errors
  5. Non-normal errors
  6. Multicollinearity
  7. Outliers
  8. High leverage points
  9. High dimensions
56
Q

Explain the issue/violation of 1. misspecified model equation

A

Assuming f looks like
Y = B0 + B1x1 + B2x2 + … + Bpxp + e

e.g. if you attempt to fit a linear relationship to something that has a higher-order polynomial relationship

More generally just knowing when linear regression is appropriate or not.

57
Q

Explain the issue/violation of 2. residuals with non-zero averages

A

Residuals are how we quantify/approximate the irreducible error.

Since the irreducible error is assumed to have a mean of 0, the residuals should have an average of 0 as well.

If the average of the residuals is far from 0 there is something wrong with the model (this is not a violation but a symptom that points out that there is a violation).

58
Q

How do you check violation/issue 2. residuals with non-zero averages?

A

For a bunch of residuals for observations with a similar y-hat, check their averages and they should each be close to 0.

(Note that averaging all of them together won’t produce 0).

59
Q

Explain the issue/violation of 3. heteroscedasticity

A

Recall homoscedasticity = the variance of e is constant across all observations.

Heteroscedasticity is when the variance of e is not constant across all observations, i.e. there is more than one variance parameter (sigma-squared).

Problems:
- Unreliable MSE
- Coefficient estimators (B-hats) don’t have the smallest variance (but they are still unbiased)

60
Q

Explain the issue/violation of 4. dependent errors

A

When you wrongly assume e’s are independent across observations:
- Get underestimated se’s
- CI and PI will be narrower
- p-values will be smaller
- May pick wrong/non-optimal regression coefficient estimates (B-hats)

61
Q

Explain the issue/violation of 5. non-normal errors

A

If the error terms (e’s) don’t follow a normal distribution, we can’t rely on the usual hypothesis tests, because we can’t say that the test statistics follow a t- or F-distribution.

62
Q

Explain the issue/violation of 6. multicollinearity

A

When a predictor is or is close to being a linear combination of other predictors.

We get:
- Unstable estimates of the regression coefficients (bj’s): many different coefficient combinations produce nearly the same SSE.
- This leads to larger se’s so it’s harder to reject H0 for t-tests

It does not affect:
- y-hat
- reliability of MSE
- F-test results

63
Q

Explain the issue/violation of 7a. outliers

A

Outlier: observation with extreme residual (y - y-hat, actual - predicted). This inflates the SSE.

64
Q

Explain the issue/violation of 7b. high leverage points

A

High leverage point: observation with weird predictor values (x’s) (any one predictor value might be normal but all together they are strange).

65
Q

Explain the issue/violation of 8. high dimensions

A

High-dimensional data is when p (number of predictors) is too large. This is relative to n (number of observations).

Linear regression is meant for datasets with n much greater than p.

Issues of high-dimensionality:
- Overfitting

66
Q

Curse of dimensionality

A

Quantity of predictors (p) dilutes the quality of data (information becomes sparse) when spread across a small number of observations.

*Note that this only happens with MLRs because SLRs only have one predictor.

67
Q

High-dimensionality: what happens when n <= p+1?

A

When the number of observations is less than or equal to the number of predictors plus one:

  • Overfitting
  • The fitted equation will predict the responses perfectly
  • No degrees of freedom for error
  • Unreasonably low SSE
68
Q

Leverage

A

How much an observation influences the prediction of the response.

Observation = i
Predictors = x’s
Leverage = hi

69
Q

Leverage formula

A

hi = (standard error of yi-hat)^2 / MSE

70
Q

Frees text rule of thumb for determining if something is a high leverage point

A
  • If hi > [3(p+1)]/n for an observation.
  • Leverage is between 0 and 1 so no absolute value needed.
70
Q

What issue is happening when an SLR model produces an inverted u-shape for the residual plot?

A

The model is poor because it is likely missing a key predictor.

  • This is because we have a quadratic plot for something that should be linear, so there should probably be a square of an explanatory variable included as a predictor.
  • It isn’t clearly a homoscedasticity violation: the clear trend in the residual plot points to a misspecified model instead.
71
Q

Standard error formula

A

se(bj) = sqrt(Var-hat[bj])

72
Q

What plot can we use to tell if the distribution of the residuals is shaped similarly to a normal distribution?

A

qq plot

73
Q

How do we completely eliminate multicollinearity?

A

Use only orthogonal (think perpendicular) predictors. This way we can ensure they are not linear combinations of one another.

74
Q

What are ways to mitigate multicollinearity? (2)

A
  • Using only orthogonal predictors will completely eliminate multicollinearity.
  • Dropping/combining predictors that have high variance inflation (this reduces the possibility of approximate linear relationships btw predictors).
75
Q

Bounds for leverage (hi)

A
  • Between 1/n and 1
  • All hi’s sum to p+1
76
Q

Cook’s distance

A

Combines effects of outliers and leverage

77
Q

When do we consider an observation to be an outlier?

A

When the standardized residual is greater than 2 or 3 (absolute value).

78
Q

When do we consider an observation to be a high leverage point?

A

When its leverage is greater than 3x the average leverage.

79
Q

When do we consider something to be an influential point?

A

When its Cook’s distance is noticeably larger than that of the other observations (the bounds 1/n to 1 apply to leverage; Cook’s distance has no fixed upper bound).

80
Q

How can we handle outliers? (3)

A
  1. Include it but add a comment until we can do more data analysis.
  2. Delete it from the dataset (if it’s incorrect data collection).
  3. Create a binary variable that indicates whether or not the observation is an outlier (this deals with observations where there isn’t a specific reason for them being outliers).
81
Q

How can you tell if something is heteroscedastic? What does the graph of residuals vs. fitted values look like?

A
  • Recall heteroscedasticity is when the error variance is not constant across all observations. This makes the residuals behave strangely, since residual = actual - predicted.

Examples of what the graph looks like:
- Residuals have a varying spread from 0
- Spread increases with larger fitted values

82
Q

How can you tell if data is non-normal? What does the graph of residuals vs. fitted values look like?

A
  • Residuals are not evenly distributed or symmetric, just all over the place
  • Might be several weirdly large/small residuals indicating a right/left skew
83
Q

How can we tell if there’s multicollinearity?
What do R-squared and t-stats look like?

A
  • Large R-squared value:
    Recall that R-squared tells us how much of the variance in y is explained by the model. A very high value can occur because a linear combination among predictors amplifies their apparent effects (and can lead to overfitting).
  • Small t-statistics:
    Recall that t-stat = b-hatj/sebj (estimated coefficient/its standard error).
    Also recall that multicollinearity inflates standard error… so the t-stat will be smaller than it should be.
  • Note: need these two conditions together. This explains that the model does well (high R-squared) but since the t-stats are small it’s harder to reject H0 (a coefficient is statistically different from 0) so we can’t really say if their respective predictors have a relationship with the response variable (y).
84
Q

Studentized residual

A
  • Residual/estimate of its standard deviation
  • Should be realized from t-distribution (regular residual should be realized from normal distribution)
  • Unitless (so comparable across diff contexts)
85
Q

Variance inflation factor for a predictor uncorrelated with all other predictors

A

1
(Think of inflation factor as a multiplier so since there is no correlation the variance is multiplied by 1 i.e. no effect)

86
Q

What does a high Breusch-Pagan test indicate?

A

Heteroscedasticity
(it suggests the variance of errors is not constant across all observations)

87
Q

Frees text rule of thumb for determining if something is an outlier

A

If the observation’s standardized residual is greater than 2.

Note that you should use the absolute value.
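A sketch of the rule (numpy; the data are made up with one planted outlier, and the standardized residual is computed as e_i / sqrt(MSE(1 - h_i)), one common convention):

```python
import numpy as np

# Made-up data: y = 2x exactly, except a planted outlier at the last point
x = np.arange(1.0, 11.0)
y = 2.0 * x
y[-1] = 30.0

n = len(x)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
e = y - H @ y                           # residuals
mse = np.sum(e ** 2) / (n - 2)
h = np.diag(H)                          # leverages

r = e / np.sqrt(mse * (1 - h))          # standardized residuals
print(np.where(np.abs(r) > 2)[0])       # [9]: only the planted outlier is flagged
```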

88
Q

What is the variance inflation factor (VIF)?

A

Measure of how much the variance of a regression coefficient is inflated because of multicollinearity.

VIF = 1 means no correlation (remember to think of it as a multiplier)

VIF > 1 means there is correlation, this is a symptom of multicollinearity

VIF > 10 means severe multicollinearity (Frees)
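A sketch of computing VIF directly from its definition (numpy; the simulated predictors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)              # independent of the others

def vif(target, others):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    regressing predictor x_j on the other predictors."""
    X = np.column_stack([np.ones(len(target))] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

print(vif(x1, [x2, x3]) > 10)   # True: severe multicollinearity
print(vif(x3, [x1, x2]) < 2)    # True: roughly uncorrelated, VIF near 1
```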

89
Q

What is a suppressor variable? How does it relate to multicollinearity?

A

A predictor that increases the importance of another predictor.

If there is multicollinearity you might think that information provided by a variable is ALWAYS redundant because it’s a linear combination of another variable.

This is not the case because a suppressor variable is an exception.

90
Q

What is the formula for tolerance (think in relation to VIF)?

A

Tolerance is the reciprocal of VIF
Tolerance = 1/VIF

91
Q

What is the rule of thumb for severe multicollinearity?

A
  • If VIF is greater than 5 or 10
  • Equivalently, if tolerance is less than 0.1 or 0.2
  • Recall that tolerance is the reciprocal of VIF
92
Q

When looking at a graph of x plotted against y for observations, including a line of best fit, how do you tell if something is an outlier? How do you tell if something is a high leverage point?

A

Outlier: if the observation is far from the line of best fit.

High leverage point: if the x-value of the observation is unlikely (different/far from the other x values). *remember: “unusual in the horizontal direction”

93
Q

Is the total sum of squares affected by adding/removing variables from the model?

A

No. The total sum of squares is a function of the observed values -> has nothing to do with the underlying variables. So it remains unchanged.

94
Q

Units for studentized and standardized residuals

A

Both are unitless/dimensionless.

95
Q

Which is better at capturing observations with unusually large residuals? (Standardized or studentized residuals)

A

Studentized.

For the standardized residual, both e and the MSE will be really large (the outlier inflates the MSE), and since e is in the numerator and the MSE is in the denominator, they can partially cancel each other out.

96
Q

Leverages are a diagonal element of what?

A

The hat matrix: H = X(X-transpose X)^-1 X-transpose

97
Q

What does a good residual plot look like?

A

Random scatter, no discernible pattern

98
Q

Parsimony

A

The idea that a simpler model is preferred over a more complex model that doesn’t substantially improve on it (i.e. doesn’t provide much more information).

99
Q

How many model equations are there for a model with g predictors?

A

2^g

100
Q

Data snooping

A

Using the same dataset for both developing (training) and evaluating (testing) a model. This can lead to overfitting.

101
Q

Centered variable

A

Result of subtracting the sample mean from a variable

102
Q

Scaled variable

A

Result of dividing a variable by its unbiased sample sd

103
Q

Standardized variable

A

Result of centering then scaling a variable.
1. Start with the variable
2. Subtract the sample mean from it
3. Divide it by its unbiased sample standard deviation
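The three transformations in a small numpy sketch (the example values are arbitrary; ddof=1 gives the unbiased sample sd):

```python
import numpy as np

# Arbitrary example variable
v = np.array([2.0, 4.0, 6.0, 8.0])

centered = v - v.mean()                  # subtract the sample mean
scaled = v / v.std(ddof=1)               # divide by the unbiased sample sd
standardized = centered / v.std(ddof=1)  # center, then scale

print(standardized.mean())               # ~0
print(standardized.std(ddof=1))          # ~1
```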

104
Q

What is ridge regression through a Bayesian lens?

A
  • Posterior mode for B under a GAUSSIAN prior.
  • Prior belief that the coefficients are randomly distributed about 0.
105
Q

What is lasso regression through a Bayesian lens?

A
  • Posterior mode for B under a DOUBLE-EXPONENTIAL prior.
  • Prior belief that many of the coefficients are exactly 0.