Regression - Middle Units Flashcards
In multiple regression, we need the linearity assumption to hold for at least one of the predicting variables
False
Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients
True
The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients
True
When making a prediction at values of the predicting variables on the “edge” of the space of predicting variables, the uncertainty of that prediction is high
True
The prediction of the response variable and the estimation of the mean response have the same interpretation
False (prediction has higher uncertainty than estimation)
In multiple linear regression, a VIF value of 6 for a predictor means that 80% of the variation in that predictor can be modeled by the other predictors
False
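Worked check for the VIF card above: since VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, a VIF of 6 implies R_j^2 = 1 - 1/6, or about 83%, not 80%. A minimal sketch of the conversion follows; the function names are illustrative and do not come from any library.

```python
def r_squared_from_vif(vif: float) -> float:
    """Invert VIF_j = 1 / (1 - R_j^2) to recover R_j^2."""
    if vif < 1:
        raise ValueError("A VIF is always at least 1")
    return 1.0 - 1.0 / vif


def vif_from_r_squared(r_squared: float) -> float:
    """Compute VIF_j from the R_j^2 of predictor j regressed on the others."""
    if not 0 <= r_squared < 1:
        raise ValueError("R-squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Share of variation in the predictor explained by the other predictors
print(round(r_squared_from_vif(6), 4))
# The VIF that would correspond to exactly 80% explained variation
print(round(vif_from_r_squared(0.80), 4))
```

An 80% share would correspond to a VIF of 5, which is why the statement is false.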
We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple regression model
True
Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients
True
The estimator of the mean response is unbiased
True
The sampling distribution of the prediction of the response variable is a chi-squared distribution
False (In multiple linear regression, the sampling distribution of the prediction of the response variable is a t-distribution since the variance of the error term is not known.)
Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent
False
A linear regression model has high predictive power if the coefficient of determination is close to 1
False
In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means the response variable will decrease as this predicting variable increases
False
Cook’s distance measures how much the fitted values (response) in the multiple linear regression model change when the ith observation is removed
True
The prediction of the response variable has the same levels of uncertainty compared with the estimation of the mean response
False
The coefficient of variation is used to evaluate goodness-of-fit
False
Influential points in multiple linear regression are outliers
True
We could diagnose the normality assumption using the normal probability plot
True
If the VIF for each predicting variable is smaller than a certain threshold, then we can say that multicollinearity does not exist in this model
False
If the residuals are not normally distributed, then we can model instead the transformed response variable where the common transformation for normality is the Box-Cox transformation
True
If a logistic regression model provides accurate classification, then we can conclude that it is a good fit for the data
False
The logit function is the log of the ratio of the probability of success to the probability of failure. It is also known as the log odds function
True
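The logit definition above can be sketched in a few lines of standard-library Python. The helper names `logit` and `inv_logit` are illustrative. The loop at the end shows that equal steps on the log-odds scale give unequal steps in the probability of success, so the probability is not linear in the linear predictor.

```python
import math


def logit(p: float) -> float:
    """Log odds: log of P(success) / P(failure)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


def inv_logit(eta: float) -> float:
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-eta))


# Equal one-unit steps in log odds, unequal steps in probability
for eta in (-2, -1, 0, 1, 2):
    print(eta, round(inv_logit(eta), 3))
```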
We interpret logistic regression coefficients with respect to the response variable
False
The likelihood function is a linear function with a closed-form solution
False
In logistic regression, there is not a linear relationship between the probability of success and the predicting variables
True
We can use a z-test to test for the statistical significance of a coefficient given all predicting variables in a Poisson regression model
True
The number of parameters that need to be estimated in a logistic regression model with 5 predicting variables and an intercept is the same as the number of parameters that need to be estimated in a standard linear regression model with an intercept and same predicting variables.
False
Although there are no error terms in a logistic regression model using binary data with replications, we can still perform residual analysis
True
A goodness-of-fit test should always be conducted after fitting a logistic regression model without repetition
False
For a classification model, training error tends to underestimate the true classification error rate of the model
True
The binary response variable in logistic regression has a Bernoulli distribution
True without replications; with replications, the number of successes has a binomial distribution
For logistic regression, if the p-value of the deviance test for goodness-of-fit is large, then it is an indication that the model is a good fit
True
The error term in logistic regression has a normal distribution
False (the error term does not exist!)
The estimated regression coefficients in Poisson regression are approximate
True
In Poisson regression, there is a linear relationship between the log rate and the predicting variables
True
Under logistic regression, the sampling distribution used for a coefficient estimator is a chi-square distribution
False (normal distribution)
An overdispersion parameter close to 1 indicates that the variability of the response is close to the variability estimated by the model
True
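One common way to estimate the overdispersion parameter is the Pearson chi-square statistic divided by its residual degrees of freedom. The sketch below assumes a Poisson model, where the variance equals the mean. The course material may define the estimate from the deviance instead. The function name and the numbers in the example are made up for illustration.

```python
def pearson_dispersion(y, mu, n_params):
    """Estimate the dispersion as Pearson chi-square / (n - number of coefficients).

    y        observed counts
    mu       fitted Poisson means (variance is assumed equal to the mean)
    n_params number of estimated coefficients, including the intercept
    """
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("Need more observations than estimated coefficients")
    chi_square = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi_square / df


# Hypothetical counts and fitted means from an intercept-only model
observed = [1, 3, 5]
fitted = [2.0, 2.0, 2.0]
print(pearson_dispersion(observed, fitted, n_params=1))
```

A value well above 1 would suggest more variability in the response than the Poisson model implies.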
When testing a subset of coefficients, the deviance follows a chi-square distribution with q degrees of freedom, where q is the number of regression coefficients in the reduced model
False (q is the number of discarded coefficients)
For both logistic and Poisson regression, both the Pearson and deviance residuals should approximately follow the standard normal distribution if the model is a good fit for the data
True
The logit link function is the best link function to model binary response data because the models produced always fit the data better than other link functions
False
If the constant variance assumption does not hold in multiple linear regression, we apply a transformation to the predicting variables
False (we apply a transformation to the response)
Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent
True
In logistic regression, R-squared could be used as a measure of explained variation in the response variable
False
The interpretation of the regression coefficients is the same for both logistic and Poisson regression
False
We estimate the regression coefficients in Poisson regression using the MLE approach
True
The F-test can be used to test for the overall regression in Poisson regression
False (the overall regression is tested with a chi-square test based on the deviance)
A logistic regression model may not be a good fit if the responses are correlated or if there is heterogeneity in the success that hasn’t been modeled
True
Trying all three link functions for a logistic regression model (complementary log-log, probit, logit) will produce models with the same goodness of fit for a dataset
False
A Poisson regression model fit to a dataset with a small sample size will have a hypothesis testing procedure with more Type I errors than expected
True
If a Poisson regression model does not have a good fit, the relationship between the log of the expected rate and the predicting variables might not be linear
True
R-squared decreases as more predictors are added to a multiple linear regression model, given that the predictors added are unrelated to the response variable
False
In a multiple linear regression model, an observation should always be discarded when its Cook’s distance is greater than 4/n where n is the sample size
False
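To make the 4/n rule of thumb concrete, here is a minimal sketch for simple linear regression (one predictor plus an intercept), using D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2 with p = 2. The function name and the data are invented for illustration. Observations above the threshold are flagged for investigation, not automatically discarded.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation in a simple linear regression."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each point (diagonal of the hat matrix)
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Made-up data: the last response is far from the trend of the others
x = [0, 1, 2, 3, 4]
y = [0, 1, 2, 3, 10]
distances = cooks_distances(x, y)
threshold = 4 / len(x)
flagged = [i for i, d in enumerate(distances) if d > threshold]
print(flagged)
```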
A linear regression model is a good fit to the data set if the adjusted R-squared is above 0.85
False
The sum of squares regression (SSR) measures the explained variability captured by the regression model given the explanatory variables used in the model
True
The hypothesis testing procedure for subsets of regression coefficients is not used for GoF assessment in logistic regression
True
Statistical inference for logistic regression is not reliable for small sample size
True
In logistic regression, we can perform residual analysis for binary data with replications
True
When assessing GoF for a logistic regression model on binary data with replications, the assumption is that the response variables come from a normal distribution
False (with replications, the response variables are assumed to come from a binomial distribution)
The null hypothesis for the GoF test of a logistic regression model is that the model does not fit the data
False
The threshold to calculate the classification error rate of a logistic regression model should always be set at 0.5
False
Using leave one out cross validation is equivalent to k-fold cross validation where the number of folds is equal to the sample size of the training set
True
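The equivalence above can be checked directly by generating the train/test index splits both ways. This is a sketch with hypothetical helper names and contiguous, unshuffled folds; library implementations may order or shuffle the folds differently.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over 0..n-1."""
    # The first n % k folds get one extra observation
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop


def leave_one_out_indices(n):
    """Yield (train, test) index lists holding out one observation at a time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


n = 5
same_splits = list(k_fold_indices(n, k=n)) == list(leave_one_out_indices(n))
print(same_splits)
```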
The assumption of constant variance will always hold for standard linear regression models with Poisson distributed response data
False
Since there are no error terms in a Poisson model, we cannot perform residual analysis for evaluating the model’s goodness of fit
False
We can diagnose the constant variance assumption in Poisson regression using the normal probability plot
False
In Poisson regression, the expectation of the response variable given the predictors is equal to the linear combination of the predicting variables
False (the natural log of the expected response is a linear combination of the predicting variables)
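A short sketch of the log link: the linear combination gives the log of the expected response, so the expected response itself is its exponential. The coefficients below are invented for illustration and `poisson_mean` is a hypothetical helper. A one-unit increase in a predictor multiplies the expected response by exp(coefficient) rather than adding to it.

```python
import math


def poisson_mean(beta, x):
    """Expected response under a log link: exp(beta_0 + sum of beta_j * x_j)."""
    linear_predictor = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return math.exp(linear_predictor)


# Hypothetical coefficients: intercept 0.5, slope 0.2 for one predictor
beta = [0.5, 0.2]
mean_at_2 = poisson_mean(beta, [2])
mean_at_3 = poisson_mean(beta, [3])

# The ratio of expected responses for a one-unit increase equals exp(slope)
rate_ratio = mean_at_3 / mean_at_2
print(round(rate_ratio, 4), round(math.exp(0.2), 4))
```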
The variance of the response is equal to the expected value of the response in Poisson regression with no overdispersion
True
A Poisson regression model with p predictors and the intercept will have p+2 parameters to estimate
False (it’s p+1 because there’s no error term)
If a Poisson regression model is found to be overdispersed, there is an indication that the variability of the response variable implied by the model is larger than the variability present in the observed response variable
False (overdispersion means the variability in the response variable is larger than the model indicates)
In all the regression models we have considered (including multiple linear, logistic, and Poisson), the response variable is assumed to have a distribution from the exponential family of distributions.
True
When considering using generalized linear models, it’s important to consider the impact of Simpson’s paradox when interpreting relationships between explanatory variables and the response. This paradox refers to the reversal of these associations when looking at a marginal relationship compared to a conditional one.
True