Regression - Middle Units Flashcards
In multiple regression, we need the linearity assumption to hold for at least one of the predicting variables
False (the linearity assumption must hold with respect to all predicting variables in the model, not just one)
Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients
True
The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients
True
When making a prediction at a point on the edge of the space of the predicting variables, the uncertainty level is high
True
The prediction of the response variable and the estimation of the mean response have the same interpretation
False (prediction has higher uncertainty than estimation)
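The extra uncertainty in predicting a new response versus estimating the mean response can be seen in the standard-error formulas for simple linear regression; a minimal pure-Python sketch, using made-up toy data (all numbers illustrative):

```python
import math

# Toy data for a simple linear regression y = b0 + b1*x (illustrative values only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar

# Residual standard error s, with n - 2 degrees of freedom
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

x0 = 3.5
# Standard error for the ESTIMATED MEAN response at x0
se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)
# Standard error for the PREDICTED new response at x0: note the extra "1 +" term
# that accounts for the variability of the error term of the new observation
se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / Sxx)

print(se_pred > se_mean)  # True: the prediction interval is always wider
```

The same "extra 1" appears in the multiple-regression formulas, which is why the prediction interval is always wider than the confidence interval for the mean response.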
In multiple linear regression, a VIF value of 6 for a predictor means that 80% of the variation in that predictor can be modeled by the other predictors
False (a VIF of 6 corresponds to R² = 1 − 1/6 ≈ 83.3% of the variation in that predictor being modeled by the other predictors; 80% corresponds to a VIF of 5)
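The VIF–R² relationship behind this card can be checked directly; a small sketch (the helper functions are illustrative, not part of any library):

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all the other predictors.
def r2_from_vif(vif):
    return 1 - 1 / vif

def vif_from_r2(r2):
    return 1 / (1 - r2)

print(round(r2_from_vif(6), 3))        # 0.833 -> a VIF of 6 means ~83.3%, not 80%
print(round(vif_from_r2(0.80), 3))     # 5.0   -> 80% corresponds to a VIF of 5
```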
We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple regression model
True
Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients
True
The estimator of the mean response is unbiased
True
The sampling distribution of the prediction of the response variable is a chi-squared distribution
False (In multiple linear regression, the sampling distribution of the prediction of the response variable is a t-distribution since the variance of the error term is not known.)
Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent
False (multicollinearity means the columns of the design matrix, i.e. the predicting variables, are (nearly) linearly dependent, not the rows)
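Why dependent columns are a problem can be shown with a toy design matrix (values made up) whose third column is an exact linear combination of the first two, which makes X'X singular so (X'X)⁻¹ cannot be computed:

```python
# Columns of a toy design matrix; x3 is an exact linear combination of x1, x2
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 5.0, 3.0]
x3 = [a + b for a, b in zip(x1, x2)]   # x3 = x1 + x2 -> columns dependent

cols = [x1, x2, x3]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Gram matrix X'X (3x3)
G = [[dot(u, v) for v in cols] for u in cols]

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(abs(det3(G)) < 1e-9)  # True: X'X is singular, so (X'X)^-1 does not exist
```

With near (rather than exact) dependence, the determinant is close to zero and the coefficient standard errors inflate, which is the multicollinearity effect the earlier cards describe.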
A linear regression model has high predictive power if the coefficient of determination is close to 1
False (a coefficient of determination close to 1 indicates good fit to the observed data; predictive power must be evaluated on new or held-out data, e.g. via cross-validation)
In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means the response variable will decrease as this predicting variable increases
False (the interpretation holds only when all other predicting variables are held fixed)
Cook's distance measures how much the fitted values (response) in the multiple linear regression model change when the ith observation is removed
True
The prediction of the response variable has the same levels of uncertainty compared with the estimation of the mean response
False (the prediction of the response variable has higher uncertainty than the estimation of the mean response, since it also accounts for the variability of the error term)
The coefficient of variation is used to evaluate goodness-of-fit
False (goodness-of-fit is evaluated with the coefficient of determination, R²; the coefficient of variation measures relative dispersion)
Influential points in multiple linear regression are outliers
True
We can diagnose the normality assumption using the normal probability plot
True
If the VIF for each predicting variable is smaller than a certain threshold, then we can say that multicollinearity does not exist in this model
False (VIF values below a threshold suggest that multicollinearity is not severe, but they do not establish that it is absent)
If the residuals are not normally distributed, then we can model instead the transformed response variable where the common transformation for normality is the Box-Cox transformation
True
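The Box-Cox family mentioned in this card can be sketched in a few lines of pure Python (the function and the example values are illustrative; in practice λ is chosen by maximum likelihood):

```python
import math

def box_cox(y, lam):
    # Box-Cox power transform of a positive response value y
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response")
    if lam == 0:
        return math.log(y)            # limiting case as lambda -> 0
    return (y ** lam - 1) / lam

print(box_cox(4.0, 0.5))  # (sqrt(4) - 1) / 0.5 = 2.0
print(box_cox(4.0, 0))    # log(4) ~ 1.386
```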
If a logistic regression model provides accurate classification, then we can conclude that it is a good fit for the data
False
The logit function is the log of the ratio of the probability of success to the probability of failure. It is also known as the log odds function
True
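The logit and its inverse (the logistic, or sigmoid, function) are easy to verify numerically; a small sketch with illustrative probabilities:

```python
import math

def logit(p):
    # Log odds: log of P(success) / P(failure)
    return math.log(p / (1 - p))

def inv_logit(eta):
    # Inverse of the logit: maps log odds back to a probability in (0, 1)
    return 1 / (1 + math.exp(-eta))

print(logit(0.5))                        # 0.0 -> even odds
print(round(inv_logit(logit(0.8)), 3))   # 0.8 -> round trip recovers p
```

The inverse logit is also why, as a later card notes, the probability of success is not a linear function of the predicting variables.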
We interpret logistic regression coefficients with respect to the response variable
False (logistic regression coefficients are interpreted with respect to the log odds of success, not the response variable directly)
The likelihood function is a linear function with a closed-form solution
False (the likelihood is a nonlinear function of the parameters with no closed-form maximizer; it is maximized numerically)
In logistic regression, there is not a linear relationship between the probability of success and the predicting variables
True
We can use a z-test to test for the statistical significance of a coefficient given all predicting variables in a Poisson regression model
True
The number of parameters that need to be estimated in a logistic regression model with 5 predicting variables and an intercept is the same as the number of parameters that need to be estimated in a standard linear regression model with an intercept and same predicting variables.
False (logistic regression estimates 6 parameters, the intercept plus 5 coefficients; standard linear regression additionally estimates the error variance σ², for 7)
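The parameter counting behind this card, written out as a small sketch (k = 5 is the card's hypothetical number of predictors):

```python
k = 5  # number of predicting variables, per the card

# Logistic regression: intercept beta_0 plus k slope coefficients,
# and no separate error-variance parameter
logistic_params = k + 1

# Standard linear regression: the same k + 1 coefficients PLUS the
# error variance sigma^2 of the error term
linear_params = (k + 1) + 1

print(logistic_params)  # 6
print(linear_params)    # 7
```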
Although there are no error terms in a logistic regression model using binary data with replications, we can still perform residual analysis
True