Regression - Middle Units Flashcards

1
Q

In multiple regression, we need the linearity assumption to hold for at least one of the predicting variables

A

False

2
Q

Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients

A

True

3
Q

The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients

A

True

4
Q

When making a prediction at a point on the “edge” of the space of the predicting variables, the uncertainty level is high

A

True

5
Q

The prediction of the response variable and the estimation of the mean response have the same interpretation

A

False (prediction has higher uncertainty than estimation)

6
Q

In multiple linear regression, a VIF value of 6 for a predictor means that 80% of the variation in that predictor can be modeled by the other predictors

A

False

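Since VIF = 1/(1 − R²), a quick sketch with the card's hypothetical VIF of 6 shows why 80% is not quite right:

```python
# VIF = 1 / (1 - R^2), where R^2 comes from regressing this predictor
# on the remaining predictors.
vif = 6
r_squared = 1 - 1 / vif
print(r_squared)  # ~0.833, i.e. about 83.3% of the variation, not 80%
```
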
7
Q

We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple regression model

A

True

8
Q

Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients

A

True

9
Q

The estimator of the mean response is unbiased

A

True

10
Q

The sampling distribution of the prediction of the response variable is a chi-squared distribution

A

False (In multiple linear regression, the sampling distribution of the prediction of the response variable is a t-distribution since the variance of the error term is not known.)

11
Q

Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent

A

False

12
Q

A linear regression model has high predictive power if the coefficient of determination is close to 1

A

False

13
Q

In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means the response variable will decrease as this predicting variable increases

A

False

14
Q

Cook's distance measures how much the fitted values (response) in the multiple linear regression model change when the ith observation is removed

A

True

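A minimal sketch, assuming statsmodels and simulated placeholder data, of how Cook's distance is typically computed and screened; the 4/n cutoff is a common rule of thumb, not a hard rule:

```python
# Sketch: Cook's distance for each observation of a fitted OLS model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(50, 2)))        # design matrix with intercept
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=50)

fit = sm.OLS(y, X).fit()
cooks_d = fit.get_influence().cooks_distance[0]      # one distance per observation
print(np.where(cooks_d > 4 / len(y))[0])             # indices flagged by the 4/n rule
```
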
15
Q

The prediction of the response variable has the same levels of uncertainty compared with the estimation of the mean response

A

False

16
Q

The coefficient of variation is used to evaluate goodness-of-fit

A

False

17
Q

Influential points in multiple linear regression are outliers

A

True

18
Q

We could diagnose the normality assumption using the normal probability plot

A

True

19
Q

If the VIF for each predicting variable is smaller than a certain threshold, then we can say that multicollinearity does not exist in this model

A

False

20
Q

If the residuals are not normally distributed, then we can model instead the transformed response variable where the common transformation for normality is the Box-Cox transformation

A

True

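A minimal sketch with scipy, assuming a positive, skewed placeholder response, of estimating a Box-Cox transformation of the response:

```python
# Sketch: Box-Cox transform of a positive response variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=1.0, sigma=0.5, size=100)   # skewed, strictly positive response

y_transformed, lam = stats.boxcox(y)               # lam is the estimated lambda
print(lam)                                         # refit the regression on y_transformed
```
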
21
Q

If a logistic regression model provides accurate classification, then we can conclude that it is a good fit for the data

A

False

22
Q

The logit function is the log of the ratio of the probability of success to the probability of failure. It is also known as the log odds function

A

True

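A small sketch of the logit (log-odds) and its inverse, written out directly with numpy:

```python
# logit(p) = log(p / (1 - p)): the log of the odds of success.
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))      # maps a log-odds value back to a probability

p = 0.8
print(logit(p))                      # ~1.386
print(inv_logit(logit(p)))           # 0.8
```
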
23
Q

We interpret logistic regression coefficients with respect to the response variable

A

False

24
Q

The likelihood function is a linear function with a closed-form solution

A

False

25
Q

In logistic regression, there is not a linear relationship between the probability of success and the predicting variables

A

True

26
Q

We can use a z-test to test for the statistical significance of a coefficient given all predicting variables in a Poisson regression model

A

True

27
Q

The number of parameters that need to be estimated in a logistic regression model with 5 predicting variables and an intercept is the same as the number of parameters that need to be estimated in a standard linear regression model with an intercept and same predicting variables.

A

False (the linear regression model also estimates the error variance σ², so it has one more parameter than the logistic model)

28
Q

Although there are no error terms in a logistic regression model using binary data with replications, we can still perform residual analysis

A

True

29
Q

A goodness-of-fit test should always be conducted after fitting a logistic regression model without replications

A

False

30
Q

For a classification model, training error tends to underestimate the true classification error rate of the model

A

True

31
Q

The binary response variable in logistic regression has a Bernoulli distribution

A

True and false (each individual binary response is Bernoulli; with replications, the aggregated counts are Binomial)

32
Q

For logistic regression, if the p-value of the deviance test for goodness-of-fit is large, then it is an indication that the model is a good fit

A

True

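A minimal sketch of the deviance goodness-of-fit test; the residual deviance and degrees of freedom below are placeholder numbers standing in for a fitted model's output:

```python
# Large p-value -> no evidence of lack of fit (consistent with a good fit).
from scipy.stats import chi2

deviance = 42.3     # residual deviance from a fitted logistic regression (placeholder)
df = 37             # n minus the number of estimated coefficients (placeholder)
p_value = chi2.sf(deviance, df)
print(p_value)
```
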
33
Q

The error term in logistic regression has a normal distribution

A

False (the error term does not exist!)

34
Q

The estimated regression coefficients in Poisson regression are approximate

A

True

35
Q

In Poisson regression, there is a linear relationship between the log rate and the predicting variables

A

True

36
Q

Under logistic regression, the sampling distribution used for a coefficient estimator is a chi-square distribution

A

False (normal distribution)

37
Q

An overdispersion parameter close to 1 indicates that the variability of the response is close to the variability estimated by the model

A

True

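A minimal sketch, assuming statsmodels and simulated placeholder data, of estimating the dispersion parameter as the Pearson chi-square divided by the residual degrees of freedom:

```python
# Sketch: dispersion estimate for a Poisson GLM; a value near 1 suggests
# the response variability matches what the model implies.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 1)))
y = rng.poisson(lam=np.exp(X @ np.array([0.5, 0.3])))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.pearson_chi2 / fit.df_resid)   # estimated dispersion parameter
```
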
38
Q

When testing a subset of coefficients, the deviance follows a chi-square distribution with q degrees of freedom, where q is the number of regression coefficients in the reduced model

A

False (q is the number of discarded coefficients)

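A minimal sketch of the test this card describes: the drop in deviance between the reduced and full models is compared to a chi-square with q degrees of freedom, where q is the number of coefficients dropped; the deviances here are placeholder numbers:

```python
from scipy.stats import chi2

dev_reduced = 61.8   # residual deviance of the reduced model (placeholder)
dev_full = 52.1      # residual deviance of the full model (placeholder)
q = 3                # number of coefficients set to zero in the reduced model
print(chi2.sf(dev_reduced - dev_full, df=q))   # small p-value -> discarded terms matter
```
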
39
Q

For both logistic and Poisson regression, both the Pearson and deviance residuals should approximately follow the standard normal distribution if the model is a good fit for the data

A

True

40
Q

The logit link function is the best link function to model binary response data because the models produced always fit the data better than other link functions

A

False

41
Q

If the constant variance assumption does not hold in multiple linear regression, we apply a transformation to the predicting variables

A

False (we apply a transformation to the response)

42
Q

Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent

A

True

43
Q

In logistic regression, R-squared could be used as a measure of explained variation in the response variable

A

False

44
Q

The interpretation of the regression coefficients is the same for both logistic and Poisson regression

A

False

45
Q

We estimate the regression coefficients in Poisson regression using the MLE approach

A

True

46
Q

The F-test can be used to test for the overall regression in Poisson regression

A

False (a chi-square test based on the deviance is used instead)

47
Q

A logistic regression model may not be a good fit if the responses are correlated or if there is heterogeneity in the success probability that hasn’t been modeled

A

True

48
Q

Trying all three link functions for a logistic regression model (c-log-log, probit, logit) will produce models with the same goodness of fit for a dataset

A

False

49
Q

A Poisson regression model fit to a dataset with a small sample size will have a hypothesis testing procedure with more Type I errors than expected

A

True

50
Q

If a Poisson regression model does not have a good fit, the relationship between the log of the expected rate and the predicting variables might not be linear

A

True

51
Q

R-squared decreases as more predictors are added to a multiple linear regression model, given that the predictors added are unrelated to the response variable

A

False

52
Q

In a multiple linear regression model, an observation should always be discarded when its Cook’s distance is greater than 4/n where n is the sample size

A

False

53
Q

A linear regression model is a good fit to the data set if the adjusted R-squared is above 0.85

A

False

54
Q

The sum of squares regression (SSR) measures the explained variability captured by the regression model given the explanatory variables used in the model

A

True

55
Q

The hypothesis testing procedure for subsets of regression coefficients is not used for GoF assessment in logistic regression

A

True

56
Q

Statistical inference for logistic regression is not reliable for small sample size

A

True

57
Q

In logistic regression, we can perform residual analysis for binary data with replications

A

True

58
Q

When assessing GoF for a logistic regression model on binary data with replications, the assumption is that the response variables come from a normal distribution

A

False (the responses with replications are assumed to follow a Binomial distribution)

59
Q

The null hypothesis for the GoF test of a logistic regression model is that the model does not fit the data

A

False

60
Q

The threshold to calculate the classification error rate of a logistic regression model should always be set at 0.5

A

False

61
Q

Using leave-one-out cross-validation is equivalent to k-fold cross-validation where the number of folds is equal to the sample size of the training set

A

True

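A minimal sketch with scikit-learn, using a placeholder array, showing that leave-one-out CV yields one fold per observation, the same as k-fold CV with k equal to the sample size:

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(20).reshape(-1, 1)            # placeholder training data, n = 20
n = len(X)

loo_splits = list(LeaveOneOut().split(X))
kfold_splits = list(KFold(n_splits=n).split(X))
print(len(loo_splits), len(kfold_splits))   # 20 20: one held-out observation per fold
```
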
62
Q

The assumption of constant variance will always hold for standard linear regression models with Poisson-distributed response data

A

False

63
Q

Since there are no error terms in the Poisson regression model, we cannot perform residual analysis for evaluating the model’s goodness of fit

A

False

64
Q

We can diagnose the constant variance assumption in Poisson regression using the normal probability plot

A

False

65
Q

In Poisson regression, the expectation of the response variable given the predictors is equal to a linear combination of the predicting variables

A

False (the natural log of the expected response is a linear combination of the predicting variables)

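In symbols (a standard statement of the log link, not specific to any one dataset):

```latex
\[
  \log\bigl(E[Y \mid x_1,\dots,x_p]\bigr) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p
  \quad\Longleftrightarrow\quad
  E[Y \mid x_1,\dots,x_p] = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}
\]
```
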
66
Q

The variance of the response is equal to the expected value of the response in Poisson regression with no overdispersion

A

True

67
Q

A Poisson regression model with p predictors and an intercept will have p+2 parameters to estimate

A

False (it’s p+1 because there’s no error term)

68
Q

If a Poisson regression model is found to be overdispersed, there is an indication that the variability of the response variable implied by the model is larger than the variability present in the observed response variable

A

False (overdispersion means the variability in the response variable is larger than the model indicates)

69
Q

In all the regression models we have considered (including multiple linear, logistic, and Poisson), the response variable is assumed to have a distribution from the exponential family of distributions.

A

True

70
Q

When considering using generalized linear models, it’s important to consider the impact of Simpson’s paradox when interpreting relationships between explanatory variables and the response. This paradox refers to the reversal of these associations when looking at a marginal relationship compared to a conditional one.

A

True