Wronged Questions: Non-linear Models Flashcards
T/F: Pearson’s goodness-of-fit test statistic is normally distributed under the null hypothesis.
False. Under the null hypothesis it approximately (in large samples) follows a chi-squared distribution.
T/F: The number of degrees of freedom for the likelihood ratio test is the number of parameters in the reduced model.
False. The degrees of freedom equal the number of parameters removed from the full model to create the reduced model, i.e. the difference in the number of parameters between the two models.
T/F: Akaike information criterion values are better when they are smaller, as they indicate a more parsimonious model.
True. We want a small AIC value: AIC rewards goodness of fit while penalizing the number of parameters.
T/F: Z-statistics are not used in Poisson regression for testing the significance of individual regression coefficients.
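False. They are used for this purpose: each coefficient's z-statistic is its estimate divided by its standard error, compared with the standard normal distribution.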
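A minimal sketch of the Wald z-test for a single coefficient, assuming the estimate and its standard error are already available from a fitted Poisson regression (the numbers below are hypothetical):

```python
import math


def wald_z_test(beta_hat: float, std_error: float) -> tuple[float, float]:
    """Return the Wald z-statistic and its two-sided p-value."""
    z = beta_hat / std_error
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficient estimate and standard error
z, p = wald_z_test(0.35, 0.12)
print(f"z = {z:.3f}, p = {p:.4f}")
```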
T/F: The Bayesian information criterion generally favors models with more parameters over simpler ones.
False. BIC favours simpler models: its penalty per parameter, ln(n), exceeds AIC's penalty of 2 once n is 8 or more.
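A small sketch of both criteria using AIC = -2 ln L + 2p and BIC = -2 ln L + p ln(n). The log-likelihoods and parameter counts are hypothetical, chosen so that the two criteria disagree; for both, the lower value is preferred.

```python
import math


def aic(loglik: float, n_params: int) -> float:
    # AIC = -2 * loglik + 2 * p
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    # BIC = -2 * loglik + p * ln(n): a heavier penalty than AIC once ln(n) > 2
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical fits on n = 200 observations
n = 200
models = {"small": (-412.0, 3), "large": (-408.5, 6)}
for name, (ll, p) in models.items():
    print(name, aic(ll, p), round(bic(ll, p, n), 2))
# AIC prefers the larger model here, while BIC's ln(200) penalty prefers the smaller one.
```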
T/F: A square root link is as appropriate as a log link.
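False. A square root link can only output non-negative numbers, while the linear predictor can take any real value, so a negative linear predictor corresponds to no valid mean. A log link maps a positive mean onto the whole real line, which is why it is preferred.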
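A quick illustration of the range argument, using a few hypothetical positive means: the log link produces both negative and positive linear predictors, the square-root link never produces a negative one, and the inverse log link turns any real linear predictor back into a positive mean.

```python
import math

# Means of a count response must be positive; the linear predictor eta can be any real number.
means = [0.01, 0.5, 1.0, 20.0]

# Log link: g(mu) = ln(mu) maps (0, inf) onto the whole real line.
log_eta = [math.log(mu) for mu in means]

# Square-root link: g(mu) = sqrt(mu) only produces eta >= 0.
sqrt_eta = [math.sqrt(mu) for mu in means]


def inverse_log_link(eta: float) -> float:
    # mu = exp(eta) is strictly positive for every real eta
    return math.exp(eta)


print(log_eta)   # includes negative values
print(sqrt_eta)  # never negative
print(inverse_log_link(-3.0))
```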
T/F: If the GLM is adequate, the deviance is a realization from a chi-square distribution.
True. If a GLM is adequate, the scaled deviance is approximately a realization from a chi-square distribution because it equals twice the difference between the maximized log-likelihoods of the saturated model and the fitted model (a pair of nested models). For a Poisson regression, the dispersion parameter is 1, so the deviance equals the scaled deviance.
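A sketch of the Poisson deviance, D = 2 Σ[y ln(y/μ̂) - (y - μ̂)], checked against twice the gap between the saturated and fitted log-likelihoods. The counts and fitted means are hypothetical, and y ln(y/μ̂) is taken as 0 when y = 0.

```python
import math


def poisson_loglik(y: list[float], mu: list[float]) -> float:
    # sum[ y ln(mu) - mu - ln(y!) ], with y ln(mu) = 0 when y = 0
    return sum((yi * math.log(mi) if yi > 0 else 0.0) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))


def poisson_deviance(y: list[float], mu: list[float]) -> float:
    # D = 2 * sum[ y ln(y / mu) - (y - mu) ]
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


y = [2, 0, 3, 1, 4]               # hypothetical observed counts
mu = [1.8, 0.6, 2.9, 1.2, 3.5]    # hypothetical fitted means
D = poisson_deviance(y, mu)
# The saturated model sets each fitted mean equal to its observation.
via_loglik = 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu))
print(D, via_loglik)
```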
Hurdle Model
Model where a random variable is modelled using two parts.
1) Probability of obtaining 0
2) Distribution of the non-zero values, given that the outcome is non-zero
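A minimal sketch of a hurdle probability mass function, assuming a zero-truncated Poisson for the second part (other count distributions, such as the negative binomial, can be used instead). The parameter names `pi_zero` and `lam` are illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


def hurdle_poisson_pmf(k: int, pi_zero: float, lam: float) -> float:
    """Part 1: P(Y = 0) = pi_zero.
    Part 2: for k >= 1, the remaining mass (1 - pi_zero) is spread
    according to a Poisson(lam) conditioned on being positive."""
    if k == 0:
        return pi_zero
    zero_truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - pi_zero) * zero_truncated


# Hypothetical parameters: 70% zeros, positive counts driven by Poisson(2)
total_mass = sum(hurdle_poisson_pmf(k, 0.7, 2.0) for k in range(60))
print(hurdle_poisson_pmf(0, 0.7, 2.0), hurdle_poisson_pmf(1, 0.7, 2.0), total_mass)
```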
Log link function ensures predicted values are always _____, suitable for count data.
Positive
Log link isn’t standard for ______ distribution
Binomial
Gamma distribution is for positive, _________ data
Continuous
Identity link doesn’t guarantee ________ predictions for Poisson.
Positive
The inverse of the logit link maps the linear predictor η to the interval _____, which is appropriate for probabilities
(0, 1)
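A short sketch of the logit link and its inverse. The inverse is written in a numerically stable two-branch form (an implementation choice, not part of the definition) so that large negative η values do not overflow `math.exp`.

```python
import math


def logit(p: float) -> float:
    # Link: maps a probability in (0, 1) onto the real line
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    # Inverse link: maps any real linear predictor back into (0, 1)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


for eta in [-5.0, 0.0, 5.0]:
    print(eta, inv_logit(eta))
```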
The link function in a GLM is a function that relates the linear predictor to the ____ of the distribution.
mean
T/F: When using a linear probability model with a binary response variable, the main advantage is the relative ease of parameter interpretation.
True. A linear probability model, despite its limitations, is favored for its straightforward interpretability. The coefficients in a linear probability model can be directly interpreted as changes in probability associated with unit changes in predictors.
T/F: It is easy to distinguish between logit and probit models graphically, since the forms of their functions are quite different.
False. Graphically, they are not easily distinguishable because both functions are sigmoid (S shaped) and very close in form.
T/F: Between logit and probit models, one is significantly more popular because its cumulative distribution function is the only one of the two that has a closed-form expression.
True. The logit model tends to be more popular in many applications primarily because the logistic distribution used in the logit model has a closed-form cumulative distribution function.
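A sketch contrasting the two CDFs: the logistic CDF has the closed form 1/(1 + e^(-z)), while the standard normal CDF needs the error function. Rescaling z by about 1.702 (a commonly used approximation constant, assumed here) shows how close the two S-shaped curves are.

```python
import math


def logistic_cdf(z: float) -> float:
    # Closed form
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z: float) -> float:
    # No elementary closed form; expressed through the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Compare the rescaled logistic CDF with the normal CDF on a grid from -5 to 5
grid = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(logistic_cdf(1.702 * z) - normal_cdf(z)) for z in grid)
print(f"largest gap between the two curves: {max_gap:.4f}")
```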
T/F: Both logit and probit models have functions pi(z) that must be between 0 and 1, as they are used to model probabilities.
True. Both models are designed to model probabilities, and thus, the output of their respective functions, pi(z), must be bounded between 0 and 1.
T/F: Both logit and probit models aim to circumvent the disadvantages of linear probability models.
True. Both models are used to overcome limitations seen in linear probability models, particularly issues related to heteroscedasticity and probabilities being modeled outside the [0, 1] interval.
T/F: The dependent variable in Poisson regression is a non-negative integer that counts the number of events.
True. This statement is correct as Poisson regression is used specifically for count data, which are non-negative integers.
T/F: The maximum likelihood estimator for the mean in Poisson regression with no predictors is the sample mean of the observed counts.
True. The maximum likelihood estimator (MLE) for the mean in a Poisson distribution is indeed the sample mean of the counts.
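A quick numerical check on hypothetical counts: setting the derivative of ln L(λ) = Σ[y ln λ - λ - ln(y!)] to zero gives Σy/λ - n = 0, so λ̂ = ȳ, and a grid search over λ lands on the same value.

```python
import math


def loglik_at(lam: float, counts: list[int]) -> float:
    # ln L(lam) = sum[ y ln(lam) - lam - ln(y!) ]
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)


counts = [0, 2, 1, 3, 0, 4, 2, 1]    # hypothetical observed counts
lam_hat = sum(counts) / len(counts)  # closed-form MLE: the sample mean

# Brute-force search over lam in (0, 5] with step 0.001
grid = [i / 1000.0 for i in range(1, 5001)]
best = max(grid, key=lambda lam: loglik_at(lam, counts))
print(lam_hat, best)
```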
T/F: The logarithmic link function is used to connect the mean of the dependent variable to the explanatory variables.
True. The logarithmic link function is a standard component in Poisson regression to relate the mean to the explanatory variables.
T/F: Poisson regression can incorporate exposures as an explanatory variable to allow the mean to vary with known amounts.
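True. Exposures can be incorporated into the Poisson regression model so that the mean varies with the known exposure. Commonly, ln(exposure) enters the linear predictor as an offset (a term whose coefficient is fixed at 1), giving mean = exposure × exp(linear predictor); alternatively, ln(exposure) can be included as an ordinary explanatory variable with an estimated coefficient.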
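A minimal sketch of the offset idea for an intercept-only Poisson model, ln(μ_i) = ln(E_i) + β₀. The score equation Σ(y_i - E_i e^β₀) = 0 gives e^β₀ = Σy / ΣE, so the fitted mean is proportional to exposure. The function name and the claim/exposure data are hypothetical.

```python
import math


def fit_rate_with_offset(counts: list[int], exposures: list[float]) -> float:
    """Intercept-only Poisson regression with offset ln(E_i).
    Solving sum(y_i - E_i * exp(beta0)) = 0 gives exp(beta0) = sum(y) / sum(E)."""
    return math.log(sum(counts) / sum(exposures))


claims = [3, 1, 7, 0]             # hypothetical claim counts
exposure = [2.0, 0.5, 4.0, 1.5]   # hypothetical exposures (e.g. policy-years)
beta0 = fit_rate_with_offset(claims, exposure)
fitted = [e * math.exp(beta0) for e in exposure]  # mean scales with exposure
print(beta0, fitted)
```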
T/F: OLS assumes that the response variable is continuous and normally distributed, while GLMs can accommodate various types of distributions like binomial, Poisson, and normal.
True. OLS typically assumes the response variable is continuous and normally distributed. In contrast, GLMs are designed to handle various types of distributions through their distribution family.
T/F: GLMs include a link function that relates the mean of the response variable to the linear predictor, whereas OLS directly models the mean of the response variable as a linear combination of predictors.
True. GLMs use a link function to connect the mean of the response variable to the linear predictor, which is crucial for dealing with non-normal distributions.
OLS, however, does not use a link function and models the response variable directly as a linear combination of predictors without transformation.
T/F: Both OLS and GLM assume that observations are independent of each other.
True. The assumption of independent observations is fundamental in both OLS and GLMs. The independence of observations is crucial for the validity of the statistical tests used in inference, such as tests for coefficients and predictions.
T/F: GLMs do not require homoscedasticity, unlike OLS which assumes homoscedasticity.
True. GLMs can handle cases where the variance of the response variable is not constant (heteroscedasticity): the variance is a function of the mean, Var(y) = φV(μ), determined by the chosen distribution (e.g. V(μ) = μ for Poisson).
OLS, on the other hand, assumes constant variance across all levels of the predictor variables (homoscedasticity).
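A small sketch of Var(y) = φV(μ) for a few common families (binomial written for a single Bernoulli trial, an assumption here). Only the normal family gives a variance that does not change with the mean.

```python
# Variance functions V(mu); the response variance is phi * V(mu).
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,              # constant variance: the OLS setting
    "poisson": lambda mu: mu,              # variance equals the mean when phi = 1
    "gamma": lambda mu: mu ** 2,           # variance grows with the squared mean
    "binomial": lambda mu: mu * (1 - mu),  # mu is a probability (single trial)
}


def glm_variance(family: str, mu: float, phi: float = 1.0) -> float:
    return phi * VARIANCE_FUNCTIONS[family](mu)


for family in VARIANCE_FUNCTIONS:
    print(family, [glm_variance(family, mu) for mu in (0.2, 0.5, 0.8)])
```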
T/F: OLS assumes that the residuals are normally distributed, while GLM assumes that the residuals are uniformly distributed.
False. OLS assumes that the residuals are normally distributed. A GLM makes no assumption that residuals follow a particular distribution; instead it assumes the response variable itself follows a distribution from the exponential family (e.g. normal, Poisson, gamma, binomial).
T/F: The three major drawbacks of the linear probability model are poor fitted values, heteroscedasticity, and meaningless residual analysis.
True
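A sketch of the "poor fitted values" drawback: an ordinary least-squares line fitted to a binary response, using hypothetical data. The slope reads directly as a change in probability per unit of x, but the fitted values fall outside [0, 1] at the edges of the data.

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form simple least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical predictor
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # binary response
b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
# b1 is the estimated change in P(y = 1) per unit increase in x,
# yet the first fitted "probability" is negative and the last exceeds 1.
print(b1, fitted[0], fitted[-1])
```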
T/F: The logistic and probit regression models aim to circumvent the drawbacks of linear probability models.
True
T/F: The logit and probit functions are substantially different.
False.
T/F: A large Pearson chi-square statistic indicates that overdispersion is likely more severe.
True. The overdispersion parameter delta is estimated as the Pearson chi-square statistic divided by its degrees of freedom (n - p), so a large statistic gives a large estimate of delta, indicating more severe overdispersion that the model must account for.
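A sketch of the dispersion estimate for a Poisson model, Pearson χ² / (n - p), assuming Var(y) = δμ. The counts are hypothetical and the fitted means come from an intercept-only model (the sample mean, 3.125); an estimate well above 1 points to overdispersion.

```python
def pearson_chi_square(y: list[float], mu: list[float]) -> float:
    # Sum of squared Pearson residuals; for Poisson, V(mu) = mu
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def dispersion_estimate(y: list[float], mu: list[float], n_params: int) -> float:
    # Pearson chi-square divided by its degrees of freedom, n - p
    return pearson_chi_square(y, mu) / (len(y) - n_params)


y = [0, 5, 1, 9, 2, 0, 7, 1]  # hypothetical counts
mu = [3.125] * len(y)         # intercept-only fit: the sample mean
delta_hat = dispersion_estimate(y, mu, n_params=1)
print(pearson_chi_square(y, mu), delta_hat)
```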
Anscombe residuals
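Used for non-normal response distributions in GLMs because Pearson residuals are often skewed; Anscombe residuals transform the response so that the residuals are closer to normally distributed.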
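A sketch comparing the two residual types for a Poisson response, using the Poisson Anscombe residual (3/2)(y^(2/3) - μ^(2/3)) / μ^(1/6). With a small hypothetical fitted mean, the Pearson residuals show a long right tail that the Anscombe transformation pulls in.

```python
import math


def pearson_residual(y: float, mu: float) -> float:
    # (y - mu) / sqrt(V(mu)), with V(mu) = mu for Poisson
    return (y - mu) / math.sqrt(mu)


def anscombe_residual(y: float, mu: float) -> float:
    # Poisson Anscombe residual: 1.5 * (y^(2/3) - mu^(2/3)) / mu^(1/6)
    return 1.5 * (y ** (2 / 3) - mu ** (2 / 3)) / mu ** (1 / 6)


mu = 0.5  # hypothetical small fitted mean, where Poisson counts are strongly skewed
for y in range(5):
    print(y, round(pearson_residual(y, mu), 3), round(anscombe_residual(y, mu), 3))
```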
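Pearson Residual
(y_i - mu_hat_i) / sqrt(V(mu_hat_i)), where V is the variance function; for Poisson, (y_i - mu_hat_i) / sqrt(mu_hat_i)
Deviance statistic
D = phi × D*, the dispersion parameter times the scaled deviance; for Poisson, D = 2 Σ[y_i ln(y_i / mu_hat_i) - (y_i - mu_hat_i)]
Scaled deviance
D* = 2[l(saturated model) - l(fitted model)], twice the difference of the maximized log-likelihoods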
X^2 statistic for likelihood ratio test
D*_R - D*_F (scaled deviance of the reduced model minus scaled deviance of the full model)
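A sketch of the likelihood ratio test built from scaled deviances, with degrees of freedom equal to the difference in parameter counts. The deviance values are hypothetical, and the p-value helper uses the closed-form chi-square tail that holds only for an even number of degrees of freedom (an assumption made to stay within the standard library).

```python
import math


def lrt_statistic(scaled_dev_reduced: float, scaled_dev_full: float,
                  params_reduced: int, params_full: int) -> tuple[float, int]:
    """Statistic D*_R - D*_F with df = (# params in full) - (# params in reduced)."""
    return scaled_dev_reduced - scaled_dev_full, params_full - params_reduced


def chi2_sf_even(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square with even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))


# Hypothetical scaled deviances: reduced model (3 params) vs full model (5 params)
stat, df = lrt_statistic(52.7, 44.1, params_reduced=3, params_full=5)
p_value = chi2_sf_even(stat, df)
print(stat, df, p_value)
```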