Wronged Questions: Non-linear Models Flashcards

1
Q

T/F: Pearson’s goodness-of-fit test statistic is normally distributed under the null hypothesis.

A

False. It follows the chi-squared distribution
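
A quick numeric sketch of the idea behind this card (the counts below are made up for illustration): the Pearson statistic sums (observed − expected)^2 / expected and is referred to a chi-squared distribution, not a normal one.

import numpy as np
from scipy.stats import chi2

# Hypothetical observed counts and fitted (expected) counts, for illustration only
observed = np.array([30, 45, 15, 10])
expected = np.array([28.0, 47.0, 17.0, 8.0])

# Pearson goodness-of-fit statistic: sum of (observed - expected)^2 / expected
x2 = float(np.sum((observed - expected) ** 2 / expected))

# Under the null hypothesis, x2 is approximately chi-squared distributed
df = len(observed) - 1          # no parameters estimated from the data here
p_value = chi2.sf(x2, df)       # upper-tail p-value
print(x2, p_value)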

2
Q

T/F: The number of degrees of freedom for the likelihood ratio test is the number of parameters in the reduced model.

A

False. The number of degrees of freedom is determined by the number of variables removed from the full model to create the reduced model
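
A small sketch of the likelihood ratio test with hypothetical log-likelihoods: the statistic is twice the gap between maximized log-likelihoods, and the degrees of freedom equal the number of parameters dropped to form the reduced model.

from scipy.stats import chi2

# Hypothetical maximized log-likelihoods, for illustration only
loglik_full = -210.4      # full model with 6 coefficients
loglik_reduced = -215.9   # reduced model with 4 coefficients (2 variables removed)

lrt = 2 * (loglik_full - loglik_reduced)   # likelihood ratio test statistic
df = 6 - 4                                 # number of parameters removed
p_value = chi2.sf(lrt, df)
print(lrt, df, p_value)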

3
Q

T/F: Akaike information criterion values are better when they are smaller, as they indicate a more parsimonious model.

A

True. We want a small AIC value
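
A minimal illustration with hypothetical fits: AIC = −2·(log-likelihood) + 2·(number of parameters), so the model with the smaller value is preferred.

# Hypothetical fits: (maximized log-likelihood, number of estimated parameters)
models = {"small": (-250.0, 3), "large": (-247.5, 8)}

for name, (loglik, k) in models.items():
    aic = -2 * loglik + 2 * k   # Akaike information criterion
    print(name, aic)
# The model with the smaller AIC is preferred.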

4
Q

T/F: Z-statistics are not used in Poisson regression for testing the significance of individual regression coefficients.

A

False. They are used for this purpose

5
Q

T/F: The Bayesian information criterion generally favors models with more parameters over simpler ones.

A

False. BIC favours simpler models
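
A companion sketch with hypothetical numbers: BIC = −2·(log-likelihood) + k·ln(n); since ln(n) exceeds 2 once n > 7, BIC penalizes extra parameters more heavily than AIC, which is why it leans toward simpler models.

import math

n = 500                   # hypothetical sample size
loglik, k = -247.5, 8     # hypothetical fit with 8 estimated parameters
aic = -2 * loglik + 2 * k
bic = -2 * loglik + k * math.log(n)
print(aic, bic)           # BIC applies the larger per-parameter penalty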

6
Q

T/F: A square root link is as appropriate as a log link.

A

False. A square-root link outputs only non-negative values, while the linear predictor can take any real value; a log link maps a positive mean onto the entire real line, making it the more appropriate choice.

7
Q

T/F: If the GLM is adequate, the deviance is a realization from a chi-square distribution.

A

True. If a GLM is adequate, the scaled deviance is a realization from a chi-square distribution because it equals twice the difference of maximized log-likelihoods for nested models. For a Poisson regression, the deviance also equals the scaled deviance.
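
A sketch of the Poisson case with made-up data, using D = 2·Σ[ y·ln(y/μ̂) − (y − μ̂) ] and the convention 0·ln(0) = 0; with dispersion 1 the deviance equals the scaled deviance, and under an adequate model it is roughly chi-squared with n − p degrees of freedom.

import numpy as np
from scipy.stats import chi2

y = np.array([2, 0, 3, 1, 4, 2], dtype=float)       # hypothetical observed counts
mu = np.array([1.8, 0.6, 2.9, 1.2, 3.5, 2.1])       # hypothetical fitted means

# Poisson deviance, taking 0 * ln(0) = 0 for zero counts
term = np.zeros_like(mu)
pos = y > 0
term[pos] = y[pos] * np.log(y[pos] / mu[pos])
deviance = 2 * np.sum(term - (y - mu))

df = len(y) - 2                                      # pretend 2 parameters were fitted
print(deviance, chi2.sf(deviance, df))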

8
Q

Hurdle Model

A

A model in which the random variable is modelled in two parts:
1) The probability of obtaining a zero
2) The distribution of the outcome, conditional on it being non-zero (a zero-truncated distribution)
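
A sketch of a hurdle count model with hypothetical parameters: part 1 gives the probability of a zero; part 2 models the positive outcomes with a zero-truncated Poisson.

import math

p0 = 0.35    # hypothetical probability of a zero (part 1)
lam = 2.0    # hypothetical Poisson rate for the positive part (part 2)

def hurdle_pmf(k: int) -> float:
    """P(Y = k) under this hurdle Poisson model."""
    if k == 0:
        return p0
    # zero-truncated Poisson for k = 1, 2, ...
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return (1 - p0) * poisson_k / (1 - math.exp(-lam))

print(sum(hurdle_pmf(k) for k in range(50)))   # approximately 1: a valid distribution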

9
Q

Log link function ensures predicted values are always _____, suitable for count data.

A

Positive

10
Q

Log link isn’t standard for ______ distribution

A

Binomial

11
Q

Gamma distribution is for positive, _________ data

A

Continuous

12
Q

Identity link doesn’t guarantee ________ predictions for Poisson.

A

Positive

13
Q

The inverse of the logit link maps the linear predictor η to the interval _____, which is appropriate for probabilities

A

(0, 1)
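
A one-line check on arbitrary linear-predictor values: the inverse logit π = 1 / (1 + e^(−η)) squeezes any real η into (0, 1).

import numpy as np

eta = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])   # arbitrary linear-predictor values
pi = 1.0 / (1.0 + np.exp(-eta))                 # inverse logit (logistic function)
print(pi)                                       # every value lies strictly in (0, 1)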

14
Q

The link function in a GLM is a function that relates the linear predictor to the ____ of the distribution.

A

mean

15
Q

T/F: When using a linear probability model with a binary response variable, the main advantage is the relative ease of parameter interpretation.

A

True. A linear probability model, despite its limitations, is favored for its straightforward interpretability. The coefficients in a linear probability model can be directly interpreted as changes in probability associated with unit changes in predictors.

16
Q

T/F: It is easy to distinguish between logit and probit models graphically, since the forms of their functions are quite different.

A

False. Graphically, they are not easily distinguishable because both functions are sigmoid (S shaped) and very close in form.

17
Q

T/F: Between logit and probit models, one is significantly more popular because its cumulative distribution function is the only one of the two that has a closed-form expression.

A

True. The logit model tends to be more popular in many applications primarily because the logistic distribution used in the logit model has a closed-form cumulative distribution function.
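
A sketch comparing the two on a few z values: the logistic CDF has the closed form 1/(1 + e^(−z)), while the probit relies on the standard normal CDF, which has no closed form; once the scales are matched the curves nearly coincide.

import numpy as np
from scipy.stats import norm

z = np.linspace(-3.0, 3.0, 7)
logistic_cdf = 1.0 / (1.0 + np.exp(-z))          # closed-form expression
probit_cdf = norm.cdf(z * np.sqrt(3.0) / np.pi)  # normal CDF rescaled to the logistic sd
print(np.max(np.abs(logistic_cdf - probit_cdf))) # small gap: hard to tell apart graphically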

18
Q

T/F: Both logit and probit models have functions pi(z) that must be between 0 and 1, as they are used to model probabilities.

A

True. Both models are designed to model probabilities, and thus, the output of their respective functions, pi(z), must be bounded between 0 and 1.

19
Q

T/F: Both logit and probit models aim to circumvent the disadvantages of linear probability models.

A

True. Both models are used to overcome limitations seen in linear probability models, particularly issues related to heteroscedasticity and probabilities being modeled outside the [0, 1] interval.

20
Q

T/F: The dependent variable in Poisson regression is a non-negative integer that counts the number of events.

A

True. This statement is correct as Poisson regression is used specifically for count data, which are non-negative integers.

21
Q

T/F: The maximum likelihood estimator for the mean in Poisson regression with no predictors is the sample mean of the observed counts.

A

True. The maximum likelihood estimator (MLE) for the mean in a Poisson distribution is indeed the sample mean of the counts.
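
A quick numeric confirmation on made-up counts: the Poisson log-likelihood (with no predictors) is maximized at the sample mean.

import numpy as np

counts = np.array([0, 2, 1, 3, 2, 4, 1, 2])   # hypothetical observed counts

def poisson_loglik(lam: float) -> float:
    # log-factorial terms omitted; they do not depend on lam
    return float(np.sum(counts * np.log(lam) - lam))

grid = np.linspace(0.5, 4.0, 351)
best = grid[int(np.argmax([poisson_loglik(l) for l in grid]))]
print(best, counts.mean())   # the maximizer sits at (about) the sample mean, 1.875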

22
Q

T/F: The logarithmic link function is used to connect the mean of the dependent variable to the explanatory variables.

A

True. The logarithmic link function is a standard component in Poisson regression to relate the mean to the explanatory variables.

23
Q

T/F: Poisson regression can incorporate exposures as an explanatory variable to allow the mean to vary with known amounts.

A

True. It is correct that exposures can be incorporated into the Poisson regression model to allow variations in the mean.
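
A sketch with hypothetical exposures and coefficients: ln(μ_i) = ln(E_i) + x_i^T β, so the fitted mean scales in proportion to the known exposure E_i.

import numpy as np

exposure = np.array([1.0, 2.5, 0.5])     # known exposures E_i (e.g., policy-years)
X = np.array([[1.0, 0.0],                # design matrix with an intercept column
              [1.0, 1.0],
              [1.0, 2.0]])
beta = np.array([-0.2, 0.35])            # hypothetical fitted coefficients

# ln(mu_i) = ln(E_i) + x_i' beta  is equivalent to  mu_i = E_i * exp(x_i' beta)
mu = exposure * np.exp(X @ beta)
print(mu)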

24
Q

T/F: OLS assumes that the response variable is continuous and normally distributed, while GLMs can accommodate various types of distributions like binomial, Poisson, and normal.

A

True. OLS typically assumes the response variable is continuous and normally distributed. In contrast, GLMs are designed to handle various types of distributions through their distribution family.

25
Q

T/F: GLMs include a link function that relates the mean of the response variable to the linear predictor, whereas OLS directly models the mean of the response variable as a linear combination of predictors.

A

True. GLMs use a link function to connect the mean of the response variable to the linear predictors, which is crucial for dealing with non-normal distributions.

OLS, however, does not use a link function and models the response variable directly as a linear combination of predictors without transformation.

26
Q

T/F: Both OLS and GLM assume that observations are independent of each other.

A

True. The assumption of independent observations is fundamental in both OLS and GLMs. The independence of observations is crucial for the validity of the statistical tests used in inference, such as tests for coefficients and predictions.

27
Q

T/F: GLMs do not require homoscedasticity, unlike OLS which assumes homoscedasticity.

A

True. GLMs can handle cases where the variance of the response variable is not constant (heteroscedasticity), due to the nature of the probability distributions used.

OLS, on the other hand, assumes constant variance across all levels of the predictor variables (homoscedasticity).

28
Q

T/F: OLS assumes that the residuals are normally distributed, while GLM assumes that the residuals are uniformly distributed.

A

False. OLS assumes that the residuals are normally distributed. A GLM makes no such assumption about residuals; it specifies the distribution (gamma, Gaussian, etc.) of the response variable itself.

29
Q

T/F: The three major drawbacks of the linear probability model are poor fitted values, heteroscedasticity, and meaningless residual analysis.

A

True

30
Q

T/F: The logistic and probit regression models aim to circumvent the drawbacks of linear probability models.

A

True

31
Q

T/F: The logit and probit functions are substantially different.

A

False.

32
Q

T/F: A large Pearson chi-square statistic indicates that overdispersion is likely more severe.

A

True. A large Pearson chi-square statistic produces a large estimate of the overdispersion parameter, indicating that a larger adjustment is needed to account for the overdispersion.
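
A sketch with made-up counts and fitted means: for a Poisson fit the Pearson residual is (y − μ̂)/√μ̂, and a common estimate of the dispersion is the Pearson chi-square statistic divided by the residual degrees of freedom; values well above 1 point to more severe overdispersion.

import numpy as np

y = np.array([0, 5, 2, 9, 1, 7, 3, 12], dtype=float)      # hypothetical counts
mu = np.array([1.2, 3.8, 2.5, 6.0, 1.5, 5.1, 2.9, 7.0])   # hypothetical Poisson fits

pearson_resid = (y - mu) / np.sqrt(mu)    # Poisson variance function: v(mu) = mu
x2 = float(np.sum(pearson_resid ** 2))    # Pearson chi-square statistic

n, p = len(y), 2                          # pretend 2 parameters were estimated
dispersion_hat = x2 / (n - p)             # well above 1 suggests overdispersion
print(x2, dispersion_hat)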

33
Q

Anscombe residuals

A

Residuals used with non-normal response distributions in GLMs, because Pearson residuals are often skewed; the Anscombe transformation yields residuals that are closer to normally distributed.

34
Q

Pearson Residual

A
35
Q

Deviance statistic

A
36
Q

Scaled deviance

A
37
Q

X^2 statistic for likelihood ratio test

A

D*_R − D*_F (the scaled deviance of the reduced model minus the scaled deviance of the full model)

38
Q

Pseudo r^2

A
39
Q

Canonical link b’^-1(μ) for inverse gaussian distribution

A
40
Q

Canonical link b’^-1(μ) for gamma distribution

A
41
Q

Canonical link b’^-1(μ) for negative binomial distribution

A
42
Q

Canonical link b’^-1(μ) for poisson distribution

A

ln(μ)

43
Q

Canonical link b’^-1(μ) for binomial distribution

A
44
Q

Canonical link b’^-1(μ) for gaussian distribution

A

μ

45
Q

b(θ) for Gaussian distribution

A
46
Q

b(θ) for binomial distribution

A
47
Q

b(θ) for poisson distribution

A
48
Q

b(θ) for negative binomial distribution

A
49
Q

b(θ) for gamma distribution

A

-ln(-θ)

50
Q

b(θ) for inverse gaussian distribution

A
51
Q

Φ for gaussian distribution

A

σ^2

52
Q

Φ for binomial distribution

A

1

53
Q

Φ for poisson distribution

A

1

54
Q

Φ for negative binomial distribution

A

1

55
Q

Φ for gamma distribution

A

1/α or α^-1

56
Q

Φ for inverse gaussian distribution

A

1/λ or λ^-1

57
Q

θ for gaussian distribution

A

μ

58
Q

θ for binomial distribution

A
59
Q

θ for poisson distribution

A

ln(λ)

60
Q

θ for negative binomial distribution

A

ln(1-p)

61
Q

θ for gamma distribution

A

-γα^-1

62
Q

θ for inverse gaussian distribution

A
63
Q

PDF of gaussian distribution

A
64
Q

PDF of binomial distribution

A

f(x) = (n choose x) p^x (1 − p)^(n − x), where the binomial coefficient (n choose x) = n! / [(n − x)! x!]

65
Q

PDF of poisson distribution

A
66
Q

PDF of negative binomial distribution

A
67
Q

PDF of gamma distribution

A
68
Q

PDF of inverse gaussian distribution

A
69
Q

Var(X) of gamma distribution

A

Φ x μ^2
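
A symbolic check of this card, assuming the exponential-family form f(y) = exp{ [yθ − b(θ)]/Φ + c(y, Φ) }, in which the mean is b′(θ) and the variance is Φ·b″(θ); with b(θ) = −ln(−θ) from the earlier card, the variance works out to Φ·μ².

import sympy as sp

theta = sp.symbols("theta", negative=True)   # canonical parameter (negative for the gamma)
phi = sp.symbols("phi", positive=True)       # dispersion parameter

b = -sp.log(-theta)                    # gamma cumulant function b(theta)
mu = sp.diff(b, theta)                 # mean = b'(theta) = -1/theta
variance = phi * sp.diff(b, theta, 2)  # variance = phi * b''(theta)

print(sp.simplify(mu))                       # -1/theta
print(sp.simplify(variance - phi * mu**2))   # 0, i.e. variance = phi * mu^2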

70
Q

A heterogeneity model requires modeling with a _________ mixture.

A

Continuous

71
Q

A zero-inflated model requires modeling as a mixture of a point mass at zero and another distribution whose domain starts with _.

A

0

72
Q

For Var(Y) = Φ·E[Y]^p, what is p for the Gaussian distribution?

A

0

73
Q

For Var(Y) = Φ·E[Y]^p, what is p for the Poisson distribution?

A

1

74
Q

For Var(Y) = Φ·E[Y]^p, what is p for the Tweedie/compound Poisson-gamma distribution?

A

(1,2)

75
Q

For Var(Y) = Φ·E[Y]^p, what is p for the gamma distribution?

A

2

76
Q

For Var(Y) = Φ·E[Y]^p, what is p for the inverse Gaussian distribution?

A

3
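
A compact sketch of the variance-power relationship Var(Y) = Φ·E[Y]^p behind the last few cards, with a hypothetical mean and dispersion; the Tweedie entry uses 1.5 purely as an illustrative value inside (1, 2).

# Var(Y) = phi * mu**p, with p taken from the cards above
power_p = {"gaussian": 0, "poisson": 1, "tweedie": 1.5, "gamma": 2, "inverse gaussian": 3}

phi, mu = 1.3, 2.0   # hypothetical dispersion and mean
for family, p in power_p.items():
    print(family, phi * mu ** p)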

77
Q

T/F: The cumulative probit model uses cumulative probabilities that differ significantly from those used in the cumulative logit model.

A

False. The cumulative probit model and the cumulative logit model both use similar structures of cumulative probabilities.

78
Q

T/F: The proportional odds model uses logit[Pr(yi<=j)] = aj, which is an intercept-only model and avoids using any explanatory variables.

A

False. The proportional odds model actually incorporates explanatory variables and is expressed as logit[Pr(yi<=j)] = aj + xi^T β. This model is not intercept-only.
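
A sketch of a proportional odds (cumulative logit) model with hypothetical cut-points and a single predictor, written logit[Pr(yi<=j)] = aj + xi^T β as in this card: the cut-points are non-decreasing, and differencing the cumulative probabilities recovers the category probabilities.

import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha = np.array([-1.0, 0.5, 2.0])   # hypothetical non-decreasing cut-points a_j
beta = 0.8                           # hypothetical slope, shared across categories
x = 1.2                              # one observation's predictor value

cum_prob = expit(alpha + x * beta)   # Pr(y <= j) for j = 1, 2, 3
category_prob = np.diff(np.concatenate(([0.0], cum_prob, [1.0])))
print(cum_prob)                              # non-decreasing cumulative probabilities
print(category_prob, category_prob.sum())    # four category probabilities summing to 1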

79
Q

T/F: In the simplest cumulative logit model, the cut-point parameters are decreasing.

A

False. In the simplest cumulative logit model, the cut-point parameters are non-decreasing, reflecting the ordered nature of the probabilities.

This also applies in a proportional odds model.

80
Q

T/F: Ordinal variables are unordered categorical variables that do not consider the sequence of categories.

A

False. Ordinal variables are ordered categorical variables, where the sequence of categories is significant and often represents a scale of measurement.

81
Q

T/F: The cumulative probit model will give results that are very similar to the cumulative logit model.

A

True. The cumulative probit model and the cumulative logit model both predict ordinal dependent variables and are expected to give similar results primarily because they share a similar conceptual framework, despite using different link functions.

82
Q

T/F: The link function for incorporating exposures explicitly is ln(μi) = ln(Ei) + Xi^Tβ.

A

True

83
Q

T/F: If exposures are not considered, the mean of the Poisson distribution is modeled as μi = exp(xi^Tβ).

A

True

84
Q

T/F: The likelihood ratio test is used for testing groups of regression coefficients.

A

True. The likelihood ratio test (LRT) is indeed used to compare the fit of two nested models, specifically testing whether the simpler (or reduced) model is sufficient or the full model with additional parameters is justified.

85
Q

T/F: The degrees of freedom for the likelihood ratio test correspond to the number of variables included in the full model.

A

False. The df for the LRT is determined by the number of variables removed from the full model to form the reduced model.

86
Q

T/F: The simplest cumulative logit model is logit[Pr(yi<=j)] = aj, which is an intercept-only model that does not use any of the explanatory variables.

A

True

87
Q

T/F: For the cumulative logit model, we use logit[Pr(yi<=j)] = aj, where the parameter interpretation is similar to logistic regression.

A

True

88
Q

T/F: Both the cumulative logit and cumulative probit models are based on cumulative probabilities.

A

True

89
Q

T/F: The proportional odds model does not incorporate explanatory variables.

A

False. The proportional odds model does incorporate explanatory variables.

90
Q

T/F: Ordinal variables are ordered categorical variables where the order of categories matters, representing a scale from smallest to largest or vice versa.

A

True. Ordinal variables are indeed ordered categories where the order is significant, depicting a sequence or scale which reflects increasing or decreasing magnitude or importance.

91
Q

T/F: Pearson residuals can be used to calculate a goodness-of-fit statistic.

A

True

92
Q

T/F: Pearson residuals can be used to detect if additional variables of interest can be used to improve the model specification.

A

True

93
Q

T/F: The variance of the response in a GLM is the scale parameter times the variance function v(μ).

A

True