Regression - Middle Units Flashcards

1
Q

In multiple regression, we need the linearity assumption to hold for at least one of the predicting variables

A

False

2
Q

Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients

A

True

3
Q

The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients

A

True

4
Q

When making a prediction at values of the predicting variables near the edge of the space of the predicting variables, the uncertainty of that prediction is high

A

True

5
Q

The prediction of the response variable and the estimation of the mean response have the same interpretation

A

False (they have different interpretations: the estimation refers to the mean response at given predictor values, while the prediction refers to a new individual response and carries higher uncertainty)

6
Q

In multiple linear regression, a VIF value of 6 for a predictor means that 80% of the variation in that predictor can be modeled by the other predictors

A

False
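
Worked check, using the standard VIF formula (not stated on the card): $\mathrm{VIF}_j = 1/(1 - R_j^2)$, so $R_j^2 = 0.80$ corresponds to $\mathrm{VIF}_j = 5$, while $\mathrm{VIF}_j = 6$ corresponds to $R_j^2 = 1 - 1/6 \approx 0.83$.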

7
Q

We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple regression model

A

True

8
Q

Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients

A

True

9
Q

The estimator of the mean response is unbiased

A

True

10
Q

The sampling distribution of the prediction of the response variable is a chi-squared distribution

A

False (In multiple linear regression, the sampling distribution of the prediction of the response variable is a t-distribution since the variance of the error term is not known.)
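
For reference, the usual prediction interval in standard notation (not spelled out on the card): $\hat{y}^* \pm t_{\alpha/2,\,n-p-1}\,\hat{\sigma}\sqrt{1 + x_*^{T}(X^{T}X)^{-1}x_*}$; the estimated $\hat{\sigma}$ is what brings in the t-distribution.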

11
Q

Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent

A

False

12
Q

A linear regression model has high predictive power if the coefficient of determination is close to 1

A

False

13
Q

In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means the response variable will decrease as this predicting variable increases

A

False

14
Q

Cook's distance measures how much the fitted values (response) in the multiple linear regression model change when the ith observation is removed

A

True
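
For reference, one standard form of Cook's distance (not spelled out on the card): $D_i = \sum_j \big(\hat{y}_j - \hat{y}_{j(i)}\big)^2 / \big((p+1)\hat{\sigma}^2\big)$, where $\hat{y}_{j(i)}$ is the fitted value computed with observation $i$ removed and $p$ is the number of predictors.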

15
Q

The prediction of the response variable has the same levels of uncertainty compared with the estimation of the mean response

A

False

16
Q

The coefficient of variation is used to evaluate goodness-of-fit

A

False

17
Q

Influential points in multiple linear regression are outliers

A

True

18
Q

We could diagnose the normality assumption using the normal probability plot

A

True

19
Q

If the VIF for each predicting variable is smaller than a certain threshold, then we can say that multicollinearity does not exist in this model

A

False

20
Q

If the residuals are not normally distributed, then we can model instead the transformed response variable where the common transformation for normality is the Box-Cox transformation

A

True
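
A minimal sketch of a Box-Cox transformation in Python (illustrative only; the simulated data and variable names are not from the course):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    y = rng.gamma(shape=2.0, scale=3.0, size=200)  # a skewed, strictly positive response

    # stats.boxcox chooses the transformation parameter lambda by maximum likelihood
    y_bc, lam = stats.boxcox(y)
    print(f"estimated lambda: {lam:.2f}")
    # Refit the regression on y_bc and re-check the normal probability plot of the residuals.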

21
Q

If a logistic regression model provides accurate classification, then we can conclude that it is a good fit for the data

A

False

22
Q

The logit function is the log of the ratio of the probability of success to the probability of failure. It is also known as the log odds function

A

True
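
In symbols (standard definition): $\mathrm{logit}(p) = \log\big(p/(1-p)\big)$, which logistic regression models as $\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$.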

23
Q

We interpret logistic regression coefficients with respect to the response variable

A

False

24
Q

The likelihood function is a linear function with a closed-form solution

A

False

25
Q

In logistic regression, there is not a linear relationship between the probability of success and the predicting variables

A

True

26
Q

We can use a z-test to test for the statistical significance of a coefficient given all predicting variables in a Poisson regression model

A

True

27
Q

The number of parameters that need to be estimated in a logistic regression model with 5 predicting variables and an intercept is the same as the number of parameters that need to be estimated in a standard linear regression model with an intercept and same predicting variables.

A

False

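Worked count for the card above (assuming the usual model forms): logistic regression with an intercept and 5 predictors estimates 6 parameters ($\beta_0, \dots, \beta_5$), while the corresponding linear regression estimates those 6 coefficients plus the error variance $\sigma^2$, for 7 parameters in total.
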
28
Q

Although there are no error terms in a logistic regression model using binary data with replications, we can still perform residual analysis

A

True

29
Q

A goodness-of-fit test should always be conducted after fitting a logistic regression model without replications

A

False

30
Q

For a classification model, training error tends to underestimate the true classification error rate of the model

A

True

31
Q

The binary response variable in logistic regression has a Bernoulli distribution

A

True and false (Bernoulli for binary data without replications; binomial when the binary data have replications)

32
Q

For logistic regression, if the p-value of the deviance test for goodness-of-fit is large, then it is an indication that the model is a good fit

A

True

33
Q

The error term in logistic regression has a normal distribution

A

False (the error term does not exist!)

34
Q

The estimated regression coefficients in Poisson regression are approximate

A

True

35
Q

In Poisson regression, there is a linear relationship between the log rate and the predicting variables

A

True

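In symbols (standard Poisson regression form): $\log \lambda(x) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$, equivalently $\lambda(x) = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)$.
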
36
Q

Under logistic regression, the sampling distribution used for a coefficient estimator is a chi-square distribution

A

False (normal distribution)

37
Q

An overdispersion parameter close to 1 indicates that the variability of the response is close to the variability estimated by the model

A

True

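One common estimate of the overdispersion parameter (a standard approach, not stated on the card): $\hat{\phi} = \sum_i r_i^2 / (n - p - 1)$, the sum of squared Pearson residuals divided by the residual degrees of freedom; values near 1 suggest little or no overdispersion.
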
38
Q

When testing a subset of coefficients, the deviance follows a chi-square distribution with q degrees of freedom, where q is the number of regression coefficients in the reduced model

A

False (q is the number of discarded coefficients)

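In symbols (standard partial deviance test, not spelled out on the card): $D_{\text{reduced}} - D_{\text{full}} \sim \chi^2_q$ approximately, where $q$ is the number of coefficients set to zero in the reduced model.
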
39
Q

For both logistic and Poisson regression, both the Pearson and deviance residuals should approximately follow the standard normal distribution if the model is a good fit for the data

A

True

40
Q

The logit link function is the best link function to model binary response data because the models produced always fit the data better than other link functions

A

False

41
Q

If the constant variance assumption does not hold in multiple linear regression, we apply a transformation to the predicting variables

A

False (we apply a transformation to the response)

42
Q

Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent

A

True

43
Q

In logistic regression, R-squared could be used as a measure of explained variation in the response variable

A

False

44
Q

The interpretation of the regression coefficients is the same for both logistic and Poisson regression

A

False

45
Q

We estimate the regression coefficients in Poisson regression using the MLE approach

A

True

46
Q

The F-test can be used to test for the overall regression in Poisson regression

A

False (a chi-square test based on the deviance is used instead)

47
Q

A logistic regression model may not be a good fit if the responses are correlated or if there is heterogeneity in the success probability that hasn't been modeled

A

True

48
Q

Trying all three link functions for a logistic regression model (complementary log-log, probit, logit) will produce models with the same goodness of fit for a dataset

A

False

49
Q

A Poisson regression model fit to a dataset with a small sample size will have a hypothesis testing procedure with more Type I errors than expected

A

True

50
Q

If a Poisson regression model does not have a good fit, the relationship between the log of the expected rate and the predicting variables might not be linear

A

True

51
Q

R-squared decreases as more predictors are added to a multiple linear regression model, given that the predictors added are unrelated to the response variable

A

False

52
Q

In a multiple linear regression model, an observation should always be discarded when its Cook's distance is greater than 4/n where n is the sample size

A

False

53
Q

A linear regression model is a good fit to the data set if the adjusted R-squared is above 0.85

A

False

54
Q

The sum of squares regression (SSR) measures the explained variability captured by the regression model given the explanatory variables used in the model

A

True

55
Q

The hypothesis testing procedure for subsets of regression coefficients is not used for GoF assessment in logistic regression

A

True

56
Q

Statistical inference for logistic regression is not reliable for small sample sizes

A

True

57
Q

In logistic regression, we can perform residual analysis for binary data with replications

A

True

58
Q

When assessing GoF for a logistic regression model on binary data with replications, the assumption is that the response variables come from a normal distribution

A

False (the responses are assumed to follow a binomial distribution)

59
Q

The null hypothesis for the GoF test of a logistic regression model is that the model does not fit the data

A

False

60
Q

The threshold to calculate the classification error rate of a logistic regression model should always be set at 0.5

A

False

61
Q

Using leave-one-out cross-validation is equivalent to k-fold cross-validation where the number of folds is equal to the sample size of the training set

A

True

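A minimal sketch of this equivalence in Python with scikit-learn (illustrative data and model, not from the course):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 2))
    y = X @ np.array([1.5, -2.0]) + rng.normal(size=30)

    model = LinearRegression()
    loo = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error")
    nfold = cross_val_score(model, X, y, cv=KFold(n_splits=len(y)), scoring="neg_mean_squared_error")
    print(np.allclose(loo, nfold))  # True: n-fold CV without shuffling gives the same splits as LOOCV
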
62
Q

The assumption of constant variance will always hold for standard linear regression models with Poisson-distributed response data

A

False

63
Q

Since there are no error terms in the Poisson model, we cannot perform residual analysis for evaluating the model's goodness of fit

A

False

64
Q

We can diagnose the constant variance assumption in Poisson regression using the normal probability plot

A

False

65
Q

In Poisson regression, the expectation of the response variable given the predictors is equal to the linear combination of the predicting variables

A

False (the natural log of the expected response is equal to the linear combination of the predicting variables)

66
Q

The variance of the response is equal to the expected value of the response in Poisson regression with no overdispersion

A

True

67
Q

A Poisson regression model with p predictors and the intercept will have p+2 parameters to estimate

A

False (it's p+1 because there is no error term, hence no error variance to estimate)

68
Q

If a Poisson regression model is found to be overdispersed, there is an indication that the variability of the response variable implied by the model is larger than the variability present in the observed response variable

A

False (overdispersion means the variability in the response variable is larger than the model indicates)

69
Q

In all the regression models we have considered (including multiple linear, logistic, and Poisson), the response variable is assumed to have a distribution from the exponential family of distributions.

A

True

70
Q

When considering using generalized linear models, it’s important to consider the impact of Simpson’s paradox when interpreting relationships between explanatory variables and the response. This paradox refers to the reversal of these associations when looking at a marginal relationship compared to a conditional one.

A

True