Regression - Middle Units Flashcards by Kyle Contrata

In multiple regression, we need the linearity assumption to hold for at least one of the predicting variables

False

Multicollinearity in the predicting variables will impact the standard deviations of the estimated coefficients

True

The presence of certain types of outliers can impact the statistical significance of some of the regression coefficients

True

When making a prediction at values of the predicting variables on the “edge” of the space of predicting variables, the uncertainty of the prediction is high

True

The prediction of the response variable and the estimation of the mean response have the same interpretation

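False (estimation targets the mean response at given values of the predicting variables, while prediction targets a single new response, which carries extra uncertainty from the error term)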

In multiple linear regression, a VIF value of 6 for a predictor means that 80% of the variation in that predictor can be modeled by the other predictors

False
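
The VIF of predictor j is 1/(1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on the other predictors. The minimal sketch below simply inverts that formula (the helper names are illustrative, not from any library): a VIF of 6 corresponds to R_j^2 of about 0.833, so roughly 83% of the predictor's variation is explained by the others, while 80% would correspond to a VIF of 5.

```python
def vif_from_r2(r2_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2); R_j^2 comes from regressing predictor j on the others."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


def r2_from_vif(vif_j: float) -> float:
    """Invert the VIF formula: R_j^2 = 1 - 1 / VIF_j."""
    if vif_j < 1.0:
        raise ValueError("a VIF is always at least 1")
    return 1.0 - 1.0 / vif_j


print(round(r2_from_vif(6), 4))    # 0.8333 -> about 83% explained, not 80%
print(round(vif_from_r2(0.8), 4))  # 5.0 -> 80% explained corresponds to VIF = 5
```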

We can use a t-test to test for the statistical significance of a coefficient given all predicting variables in a multiple regression model

True

Multicollinearity can lead to less accurate statistical significance of some of the regression coefficients

True

The estimator of the mean response is unbiased

True

The sampling distribution of the prediction of the response variable is a chi-squared distribution

False (in multiple linear regression, the standardized prediction error follows a t-distribution, since the variance of the error term is unknown and has to be estimated)

Multicollinearity in multiple linear regression means that the rows in the design matrix are (nearly) linearly dependent

False

A linear regression model has high predictive power if the coefficient of determination is close to 1

False

In multiple linear regression, if the coefficient of a quantitative predicting variable is negative, that means the response variable will decrease as this predicting variable increases

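False (the coefficient describes the change in the response when this predictor increases while all other predictors are held fixed)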

Cook's distance measures how much the fitted values (response) in the multiple linear regression model change when the ith observation is removed

True
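
To make the "remove the ith observation and see how the fitted values move" definition concrete, here is a minimal brute-force sketch for a simple linear regression (one predictor plus an intercept, an assumption made only to keep the code short). It computes D_i = sum_j (yhat_j - yhat_j(i))^2 / (p * MSE), where yhat_j(i) are the fitted values after refitting without observation i and p is the number of coefficients. The data are made up for illustration.

```python
def fit_line(xs, ys):
    """OLS intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1


def cooks_distances(xs, ys):
    """Cook's distance for each observation, computed by refitting without it."""
    n, p = len(xs), 2  # p = number of coefficients (intercept and slope)
    b0, b1 = fit_line(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    mse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / (n - p)
    distances = []
    for i in range(n):
        b0_i, b1_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # change in all n fitted values when observation i is left out
        shift = sum((f - (b0_i + b1_i * x)) ** 2 for f, x in zip(fitted, xs))
        distances.append(shift / (p * mse))
    return distances


xs = [1, 2, 3, 4, 5, 10]  # hypothetical data; the last point is far from the trend
ys = [1.1, 2.0, 2.9, 4.2, 5.0, 2.0]
print([round(d, 3) for d in cooks_distances(xs, ys)])
```

In practice the same quantity is usually computed in closed form from the residuals and leverages, without refitting n times; the loop here just mirrors the definition on the card.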

The prediction of the response variable has the same levels of uncertainty compared with the estimation of the mean response

False
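
For a simple linear regression (used here as the smallest special case of multiple linear regression), the standard error of the estimated mean response at x0 is s * sqrt(1/n + (x0 - xbar)^2 / Sxx), while for a new response it is s * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx). The extra 1 is the variance of the new observation's own error term. Both grow as x0 moves away from xbar, which is also why predictions on the edge of the predictor space are less certain. Intervals built from these use t quantiles with n - 2 degrees of freedom because sigma is estimated. A minimal sketch with hypothetical data:

```python
import math


def interval_standard_errors(xs, ys, x0):
    """Return (se_mean, se_pred) at x0 for a simple linear regression fit by OLS."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # estimate of sigma, n - 2 degrees of freedom
    h0 = 1.0 / n + (x0 - x_bar) ** 2 / sxx  # grows as x0 moves toward the edge
    se_mean = s * math.sqrt(h0)        # uncertainty in the estimated mean response
    se_pred = s * math.sqrt(1.0 + h0)  # adds the variance of one new error term
    return se_mean, se_pred


xs = [1, 2, 3, 4, 5, 6]  # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
for x0 in (3.5, 6.0, 9.0):  # center, edge, and beyond the observed range
    se_mean, se_pred = interval_standard_errors(xs, ys, x0)
    print(x0, round(se_mean, 3), round(se_pred, 3))
```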

The coefficient of variation is used to evaluate goodness-of-fit

False

Influential points in multiple linear regression are outliers

True

We can diagnose the normality assumption using the normal probability plot

True

If the VIF for each predicting variable is smaller than a certain threshold, then we can say that multicollinearity does not exist in this model

False

If the residuals are not normally distributed, then we can model instead the transformed response variable where the common transformation for normality is the Box-Cox transformation

True
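
The Box-Cox family transforms a positive response as (y^lambda - 1) / lambda for lambda != 0 and log(y) for lambda = 0. A common way to pick lambda is to maximize the profile log-likelihood (lambda - 1) * sum(log y) - (n/2) * log(sigma_hat^2(lambda)). The sketch below assumes an i.i.d. normal model with no predictors (in a regression, sigma_hat^2 would come from the residuals of regressing the transformed response on the predictors) and uses a plain grid search in place of a numerical optimizer; the data are hypothetical.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive responses")
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def boxcox_loglik(ys, lam):
    """Profile log-likelihood of lambda for an i.i.d. normal model (no predictors)."""
    n = len(ys)
    z = [boxcox(y, lam) for y in ys]
    z_bar = sum(z) / n
    sigma2 = sum((v - z_bar) ** 2 for v in z) / n
    return (lam - 1.0) * sum(math.log(y) for y in ys) - 0.5 * n * math.log(sigma2)


def best_lambda(ys, grid=None):
    """Pick lambda from a grid (default -2 to 2 in steps of 0.01)."""
    grid = grid or [round(-2 + 0.01 * i, 2) for i in range(401)]
    return max(grid, key=lambda lam: boxcox_loglik(ys, lam))


ys = [1.2, 1.9, 2.5, 3.8, 5.1, 7.9, 12.4, 20.3]  # hypothetical right-skewed responses
lam = best_lambda(ys)
print(lam, [round(boxcox(y, lam), 3) for y in ys])
```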

If a logistic regression model provides accurate classification, then we can conclude that it is a good fit for the data

False

The logit function is the log of the ratio of the probability of success to the probability of failure. It is also known as the log odds function

True
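
A minimal sketch of the logit and its inverse. The last line also illustrates why the probability of success is not linear in the predictors: equal steps in the linear predictor produce unequal steps in probability.

```python
import math


def logit(p):
    """Log odds: log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse of the logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


print(logit(0.5))             # 0.0: even odds
print(round(logit(0.75), 4))  # 1.0986 = log(3), since the odds are 0.75 / 0.25 = 3
# Equal steps in the linear predictor give unequal steps in probability:
print([round(inv_logit(eta), 3) for eta in (-2, -1, 0, 1, 2)])
```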

We interpret logistic regression coefficients with respect to the response variable

False (they are interpreted with respect to the log odds of success)

The likelihood function is a linear function with a closed-form solution

False
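
In logistic regression, setting the derivative of the log-likelihood to zero gives equations that are nonlinear in the coefficients, so there is no closed-form solution and the MLE is found iteratively. The sketch below uses Newton-Raphson (which, for the logit link, coincides with iteratively reweighted least squares) for one predictor plus an intercept. It is a teaching sketch, not how any particular package implements it, and it assumes the hypothetical data are not perfectly separated (otherwise the MLE does not exist).

```python
import math


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE for logit(P(Y = 1)) = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # information matrix entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


xs = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data (not perfectly separable)
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(round(b0, 4), round(b1, 4))
```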

In logistic regression, there is not a linear relationship between the probability of success and the predicting variables

True

We can use a z-test to test for the statistical significance of a coefficient given all predicting variables in a Poisson regression model

True

The number of parameters that need to be estimated in a logistic regression model with 5 predicting variables and an intercept is the same as the number of parameters that need to be estimated in a standard linear regression model with an intercept and same predicting variables.

False

Although there are no error terms in a logistic regression model using binary data with replications, we can still perform residual analysis

True

A goodness-of-fit test should always be conducted after fitting a logistic regression model without repetition

False

For a classification model, training error tends to underestimate the true classification error rate of the model

True

The binary response variable in logistic regression has a Bernoulli distribution

True (for binary data with replications, the number of successes in each group follows a binomial distribution)

For logistic regression, if the p-value of the deviance test for goodness-of-fit is large, then it is an indication that the model is a good fit

True

The error term in logistic regression has a normal distribution

False (the error term does not exist!)

The estimated regression coefficients in Poisson regression are approximate

True

In Poisson regression, there is a linear relationship between the log rate and the predicting variables

True

Under logistic regression, the sampling distribution used for a coefficient estimator is a chi-square distribution

False (the sampling distribution is approximately normal for large samples)

An overdispersion parameter close to 1 indicates that the variability of the response is close to the variability estimated by the model

True
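
One common estimate of the dispersion parameter divides the Pearson chi-square statistic, sum of (y_i - mu_i)^2 / mu_i, by the residual degrees of freedom n - p - 1 (some texts use the residual deviance in the numerator instead). Values near 1 are consistent with the Poisson assumption that the variance equals the mean; values well above 1 suggest overdispersion. In this sketch the fitted means are hypothetical placeholders for the output of a Poisson fit.

```python
def pearson_dispersion(ys, mus, p):
    """Pearson chi-square / (n - p - 1) for a Poisson model with p predictors plus an intercept."""
    n = len(ys)
    pearson_chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(ys, mus))
    return pearson_chi2 / (n - p - 1)


# Hypothetical observed counts and fitted means from a Poisson model with p = 1 predictor
ys = [0, 3, 1, 7, 2, 9, 4, 12]
mus = [1.1, 1.6, 2.3, 3.2, 4.1, 5.4, 6.9, 8.8]
print(round(pearson_dispersion(ys, mus, p=1), 3))
```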

When testing a subset of coefficients, the difference in deviance follows a chi-square distribution with q degrees of freedom, where q is the number of regression coefficients in the reduced model

False (q is the number of discarded coefficients)
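
A minimal sketch of the drop-in-deviance test: the statistic is the reduced model's deviance minus the full model's deviance, compared with a chi-square distribution whose degrees of freedom q equal the number of coefficients dropped. With SciPy available, `scipy.stats.chi2.sf(stat, q)` gives the p-value directly; here a small stdlib-only chi-square tail function is written out instead (valid for integer degrees of freedom), and the deviance values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, total = 2, math.exp(-half)             # closed form for df = 2
    else:
        k, total = 1, math.erfc(math.sqrt(half))  # closed form for df = 1
    while k < df:  # step up two degrees of freedom at a time
        total += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return total


def deviance_subset_test(dev_reduced, dev_full, q):
    """Drop-in-deviance test: q = number of coefficients dropped from the full model."""
    stat = dev_reduced - dev_full
    return stat, chi2_sf(stat, q)


# Hypothetical residual deviances: full model vs. a reduced model with 3 coefficients removed
stat, p_value = deviance_subset_test(dev_reduced=112.4, dev_full=104.1, q=3)
print(round(stat, 2), round(p_value, 4))
```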

For both logistic and Poisson regression, both the Pearson and deviance residuals should approximately follow the standard normal distribution if the model is a good fit for the data

True

The logit link function is the best link function to model binary response data because the models produced always fit the data better than other link functions

False

If the constant variance assumption does not hold in multiple linear regression, we apply a transformation to the predicting variables

False (we apply a transformation to the response)

Multicollinearity in multiple linear regression means that the columns in the design matrix are (nearly) linearly dependent

True

In logistic regression, R-squared could be used as a measure of explained variation in the response variable

False

The interpretation of the regression coefficients is the same for both logistic and poisson regression

False

We estimate the regression coefficients in Poisson regression using the MLE approach

True

The F-test can be used to test for the overall regression in Poisson regression

False (a chi-square test based on the deviance is used instead)

A logistic regression model may not be a good fit if the responses are correlated or if there is heterogeneity in the success probabilities that has not been modeled

True

Trying all three link functions for a logistic regression model (complementary log-log, probit, logit) will produce models with the same goodness of fit for a dataset

False

A Poisson regression model fit to a dataset with a small sample size will have a hypothesis testing procedure with more Type I errors than expected

True

If a Poisson regression model does not have a good fit, the relationship between the log of the expected rate and the predicting variables might not be linear

True

R-squared decreases as more predictors are added to a multiple linear regression model, given that the predictors added are unrelated to the response variable

False
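
R-squared equals 1 - SSE/SST, and adding a column to the design matrix can never increase the minimized SSE, so R-squared cannot decrease even when the new predictor is pure noise (which is why adjusted R-squared is preferred when comparing models of different sizes). The sketch below demonstrates this with simulated data, a small hand-written normal-equations solver (used only to stay dependency-free), and hypothetical noise predictors.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations X'X beta = X'y
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


random.seed(0)
n = 30
x1 = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * v + random.gauss(0, 1) for v in x1]  # y truly depends on x1 only
columns = [x1]
r2_values = [r_squared(columns, y)]
for _ in range(5):
    columns.append([random.gauss(0, 1) for _ in range(n)])  # pure-noise predictor
    r2_values.append(r_squared(columns, y))
print([round(v, 4) for v in r2_values])  # a non-decreasing sequence
```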

In a multiple linear regression model, an observation should always be discarded when its Cook's distance is greater than 4/n where n is the sample size

False

A linear regression model is a good fit to the data set if the adjusted R-squared is above 0.85

False

The regression sum of squares (SSR) measures the explained variability captured by the regression model given the explanatory variables used in the model

True

The hypothesis testing procedure for subsets of regression coefficients is not used for GoF assessment in logistic regression

True

Statistical inference for logistic regression is not reliable for small sample size

True

In logistic regression, we can perform residual analysis for binary data with replications

True

When assessing GoF for a logistic regression model on binary data with replications, the assumption is that the response variables come from a normal distribution

False (the responses are assumed to follow a binomial distribution)

The null hypothesis for the GoF test of a logistic regression model is that the model does not fit the data

False

The threshold to calculate the classification error rate of a logistic regression model should always be set at 0.5

False
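
The classification error rate depends on the chosen threshold, as this minimal sketch shows with hypothetical fitted probabilities: moving the threshold from 0.5 to 0.33 changes the error rate from 2/6 to 0. Note that tuning the threshold on the same data used for fitting makes the reported error even more optimistic.

```python
def classification_error(probs, labels, threshold=0.5):
    """Fraction of observations misclassified when predicting 1 iff prob >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(pred != y for pred, y in zip(preds, labels)) / len(labels)


# Hypothetical fitted probabilities and true labels
probs = [0.10, 0.35, 0.40, 0.60, 0.80, 0.30]
labels = [0, 1, 1, 1, 1, 0]
for t in (0.5, 0.33):
    print(t, round(classification_error(probs, labels, t), 3))
```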

Using leave one out cross validation is equivalent to k-fold cross validation where the number of folds is equal to the sample size of the training set

True
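
A minimal sketch of k-fold index splitting (contiguous folds without shuffling, an assumption for readability) shows the equivalence: with k equal to the sample size n, every test fold holds exactly one observation, which is leave-one-out cross-validation.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


print(k_fold_indices(7, 3))  # [[0, 1, 2], [3, 4], [5, 6]]
print(k_fold_indices(5, 5))  # [[0], [1], [2], [3], [4]] -> leave-one-out
```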

The assumption of constant variance will always hold for standard linear regression models with Poisson-distributed response data

False

Since there are no error terms in the Poisson model, we cannot perform residual analysis for evaluating the model's goodness of fit

False
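
Residual analysis for a Poisson model uses Pearson residuals, (y - mu) / sqrt(mu), and deviance residuals, sign(y - mu) * sqrt(2 * (y * log(y / mu) - (y - mu))), with y * log(y / mu) taken as 0 when y = 0. A minimal per-observation sketch, where mu would be a fitted mean from the model:

```python
import math


def poisson_residuals(y, mu):
    """Pearson and deviance residuals for one observation of a Poisson model."""
    pearson = (y - mu) / math.sqrt(mu)
    # y * log(y / mu) is taken to be 0 when y == 0
    term = y * math.log(y / mu) if y > 0 else 0.0
    # max() guards against tiny negative values from floating-point rounding
    deviance = math.copysign(math.sqrt(max(0.0, 2.0 * (term - (y - mu)))), y - mu)
    return pearson, deviance


print(poisson_residuals(0, 2.0))  # both residuals are negative: observed below fitted
print(poisson_residuals(5, 2.0))  # both residuals are positive: observed above fitted
```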

We can diagnose the constant variance assumption in Poisson regression using the normal probability plot

False

In Poisson regression, the expectation of the response variable given the predictors is equal to the linear combination of the predicting variables

False (the natural log of the expected response is a linear combination of the predicting variables)

The variance of the response is equal to the expected value of the response in Poisson regression with no overdispersion

True

A Poisson regression model with p predictors and the intercept will have p+2 parameters to estimate

False (it has p+1, since there is no error variance to estimate)

If a Poisson regression model is found to be overdispersed, there is an indication that the variability of the response variable implied by the model is larger than the variability present in the observed response variable

False (overdispersion means the variability in the response variable is larger than the model indicates)

In all the regression models we have considered (including multiple linear, logistic, and Poisson), the response variable is assumed to have a distribution from the exponential family of distributions.

True

When considering using generalized linear models, it’s important to consider the impact of Simpson’s paradox when interpreting relationships between explanatory variables and the response. This paradox refers to the reversal of these associations when looking at a marginal relationship compared to a conditional one.

True

(70 cards)