Epidemiology Chapter 3 Flashcards

1
Q

Why is a standard simple linear regression model not appropriate for Bernoulli or binomial responses?

A

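In simple linear regression we assume the errors are normally distributed with mean 0 and variance σ². This implies that, for fixed x, Y is normally distributed, which is not the case for Bernoulli or binomial responses.

There is also a mismatch between the two sides of the model: the mean of a Bernoulli response is a probability between 0 and 1, whereas the linear predictor α+βx can take any real value.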
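
The mismatch can be illustrated with a minimal sketch. The coefficient values and x values below are made up for illustration and do not come from the chapter; the point is only that a straight line can leave the interval [0, 1] while the inverse logit cannot.

```python
import math

def linear_mean(alpha, beta, x):
    # Fitted mean under a straight-line model: alpha + beta * x.
    return alpha + beta * x

def inverse_logit(eta):
    # Maps any real number to a value strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and covariate values (assumptions, not estimates).
alpha, beta = 0.1, 0.2
xs = [-5, 0, 5, 10]

linear_fits = [linear_mean(alpha, beta, x) for x in xs]
logistic_fits = [inverse_logit(linear_mean(alpha, beta, x)) for x in xs]

print(linear_fits)    # some values fall below 0 or above 1
print(logistic_fits)  # every value is a valid probability
```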

2
Q

Likelihood for the binary logistic regression model

A

L(α,β│y_1,y_2,…,y_n) = ∏_(i=1 to n) Pr(Y_i=y_i│α,β)
log(π_i/(1-π_i)) = α+βx_i
Pr(Y_i=1│α,β) = π_i = e^(α+βx_i)/(1+e^(α+βx_i))
Pr(Y_i=0│α,β) = 1-π_i = 1/(1+e^(α+βx_i))
Likelihood function
∏_(i=1 to n) (e^(α+βx_i)/(1+e^(α+βx_i)))^(y_i) × (1/(1+e^(α+βx_i)))^(1-y_i)
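
As a rough sketch, the likelihood can be evaluated directly as a product over observations. The data and the two parameter settings below are invented for illustration, and the function names are hypothetical.

```python
import math

def success_prob(alpha, beta, x):
    # pi_i = e^(alpha + beta*x_i) / (1 + e^(alpha + beta*x_i))
    eta = alpha + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def likelihood(alpha, beta, xs, ys):
    # Product over i of pi_i^(y_i) * (1 - pi_i)^(1 - y_i).
    value = 1.0
    for x, y in zip(xs, ys):
        p = success_prob(alpha, beta, x)
        value *= (p ** y) * ((1.0 - p) ** (1 - y))
    return value

# Made-up data: the response tends to be 1 for larger x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]

L_null = likelihood(0.0, 0.0, xs, ys)    # every pi_i equals 0.5
L_slope = likelihood(-1.5, 1.0, xs, ys)  # pi_i increases with x

print(L_null, L_slope)
```

With these made-up values the parameter pair with a positive slope gives the larger likelihood, which is the sense in which it is better supported by the data.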

3
Q

Log-likelihood for the binary logistic regression

A

∑_(i=1 to n) {y_i log(e^(α+βx_i)/(1+e^(α+βx_i))) + (1-y_i) log(1/(1+e^(α+βx_i)))}
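
Each term of this sum simplifies algebraically to y_i(α+βx_i) - log(1+e^(α+βx_i)), because the two log(1+e^(α+βx_i)) pieces carry weights y_i and 1-y_i, which add to 1. The sketch below checks that identity numerically on made-up data; the function names and values are illustrative assumptions.

```python
import math

def log_likelihood(alpha, beta, xs, ys):
    # Direct form: sum of y*log(pi) + (1 - y)*log(1 - pi).
    total = 0.0
    for x, y in zip(xs, ys):
        eta = alpha + beta * x
        p = math.exp(eta) / (1.0 + math.exp(eta))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def log_likelihood_simplified(alpha, beta, xs, ys):
    # Simplified form: sum of y*eta - log(1 + e^eta).
    total = 0.0
    for x, y in zip(xs, ys):
        eta = alpha + beta * x
        total += y * eta - math.log1p(math.exp(eta))
    return total

# Made-up data and parameter values.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]

direct = log_likelihood(-1.5, 1.0, xs, ys)
simplified = log_likelihood_simplified(-1.5, 1.0, xs, ys)
print(direct, simplified)
```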

4
Q

Interpretation for one explanatory variable

Model and interpretation

A

Underlying model
log⁡(π_i/(1-π_i ))=α+βx_i

Define terms
π_i - probability of success for person i
x_i - explanatory variable value for the ith person
α and β are the model parameters

Interpret the parameter estimates
α_hat - the estimated log odds of success for someone with x_i = 0 (for example, age 0 if x is age), or for someone at the mean of x if the data are centred. Taking exponentials gives the corresponding odds of success.
β_hat - a unit increase in x is estimated to change the log odds of success by β_hat. Taking exponentials gives the odds ratio comparing two individuals 1 unit of x apart.
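
A small numerical sketch of this interpretation follows. The values of alpha_hat and beta_hat are hypothetical, chosen only to show that the ratio of odds one unit apart equals e^(β_hat) wherever on the x scale the comparison is made.

```python
import math

# Hypothetical parameter estimates (assumptions for illustration).
alpha_hat, beta_hat = -2.0, 0.5

def odds(x):
    # Odds of success at covariate value x: e^(alpha_hat + beta_hat * x).
    return math.exp(alpha_hat + beta_hat * x)

baseline_odds = odds(0)          # odds of success when x = 0
odds_ratio = math.exp(beta_hat)  # odds ratio for a one-unit increase in x
ratio_at_3 = odds(4) / odds(3)   # same ratio, evaluated between x = 3 and x = 4

print(baseline_odds, odds_ratio, ratio_at_3)
```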

5
Q

Interpretation for multiple explanatory variables

Model

A

Underlying model
log(π_i/(1-π_i)) = α+β_1x_1i+β_2x_2i
Define terms
π_i - probability of success for person i
x_1i - value of the first explanatory variable for the ith person
x_2i - value of the second explanatory variable for the ith person
α, β_1 and β_2 are the model parameters

6
Q

What is extrapolating?

A

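Making inferences outside the range of our data

For the intercept, fix by using centred data, so that x = 0 corresponds to the mean of x, which lies inside the data range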

7
Q

Interpretation for one explanatory variable

Hypothesis tests for the parameters alpha and beta

A

Null hypothesis: α=0
Alternative: α≠0
Test statistic:
z=(α_hat-α_0)/se(α_hat)
here α_0 is 0
Under the assumption that H_0 is true, our test statistic is approximately a sample from a standard normal distribution
Associated two-sided p-value: p = 2 × Pr(Z>|z|)
If p<0.05, reject the null hypothesis at the 5% significance level

Same for beta
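
The test can be sketched in a few lines. The estimate and standard error below are hypothetical, and the function names are illustrative; the standard normal CDF is built from the error function in the standard library. The p-value uses |z| so that the same formula covers negative test statistics.

```python
import math

def standard_normal_cdf(z):
    # Pr(Z <= z) for a standard normal Z, via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_test(estimate, std_error, null_value=0.0):
    # z = (estimate - null value) / standard error, with a two-sided p-value.
    z = (estimate - null_value) / std_error
    p_value = 2.0 * (1.0 - standard_normal_cdf(abs(z)))
    return z, p_value

# Hypothetical estimate and standard error (assumptions, not real output).
z, p = z_test(0.5, 0.2)
reject_at_5_percent = p < 0.05

print(z, p, reject_at_5_percent)
```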

8
Q

Interpretation for one explanatory variable

Confidence intervals for the parameters alpha and beta

A

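α_hat ± z* × se(α_hat)
β_hat ± z* × se(β_hat)
where z* is the standard normal critical value for the chosen confidence level (1.96 for a 95% interval); it is the same for both parameters

If the interval contains 0 we cannot reject H_0: we can't rule out the possibility that the parameter equals 0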
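
As an illustration, the interval can be computed as below. The estimate and standard error are hypothetical, and 1.96 is the usual critical value for a 95% interval. Exponentiating the endpoints gives an interval on the odds ratio scale, where the value to check for is 1 rather than 0.

```python
import math

def confidence_interval(estimate, std_error, z_crit=1.96):
    # estimate +/- z_crit * standard error
    half_width = z_crit * std_error
    return estimate - half_width, estimate + half_width

# Hypothetical slope estimate and standard error (assumptions).
lower, upper = confidence_interval(0.5, 0.2)
contains_zero = lower <= 0.0 <= upper

# The same interval on the odds ratio scale.
or_lower, or_upper = math.exp(lower), math.exp(upper)

print(lower, upper, contains_zero)
print(or_lower, or_upper)
```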

9
Q

Describe how the AIC is calculated and its uses

A

AIC = -2 × log-likelihood + 2p,
where p is the number of parameters
It is used for comparing models; smaller values are preferred
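
A short sketch of the comparison, using made-up log-likelihoods for two hypothetical models. Here the larger model fits slightly better but not by enough to offset the penalty for its extra parameter.

```python
def aic(log_likelihood, n_params):
    # AIC = -2 * log-likelihood + 2 * (number of parameters)
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical maximised log-likelihoods (assumptions for illustration).
aic_small = aic(-60.0, 2)  # e.g. intercept and one slope
aic_large = aic(-59.5, 3)  # one extra parameter, slightly better fit

preferred = "small" if aic_small < aic_large else "large"
print(aic_small, aic_large, preferred)
```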

10
Q

Describe how the residual deviance is calculated and its uses

A

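The residual deviance compares the fitted (proposed) model with the saturated one
Residual deviance = 2(LL_s-LL_p), where LL_s is the log-likelihood of the saturated model and LL_p is the log-likelihood of the proposed model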
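
A minimal sketch for ungrouped binary data follows. In that case the saturated model fits each observation exactly (π_i = y_i), so LL_s = 0 and the deviance reduces to -2 × LL_p. The responses and fitted probabilities below are invented for illustration.

```python
import math

def bernoulli_log_likelihood(probs, ys):
    # Sum of log(pi_i) where y_i = 1 and log(1 - pi_i) where y_i = 0.
    total = 0.0
    for p, y in zip(probs, ys):
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

ys = [0, 0, 1, 1]
fitted = [0.2, 0.4, 0.6, 0.8]  # hypothetical fitted probabilities

ll_p = bernoulli_log_likelihood(fitted, ys)
# Saturated model: each fitted probability equals the observed response.
ll_s = bernoulli_log_likelihood([float(y) for y in ys], ys)

residual_deviance = 2.0 * (ll_s - ll_p)
print(ll_s, ll_p, residual_deviance)
```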

11
Q

Likelihood ratio test

A

Compares the proposed model with the null model

test statistic = null deviance - residual deviance
Hypotheses tested
Null: β=0, which suggests replacing the proposed model with the null model
Alternative: β≠0

Under the assumption that H_0 is true, our test statistic follows the chi-squared distribution on one degree of freedom (the difference in number of parameters between the two models being considered)
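
The calculation can be sketched as follows, with made-up deviances. For one degree of freedom the chi-squared upper tail has a closed form in terms of the complementary error function, since a chi-squared variable on one degree of freedom is the square of a standard normal; the function name is hypothetical.

```python
import math

def chi2_1df_upper_tail(stat):
    # Pr(X > stat) for X chi-squared on 1 degree of freedom.
    # X = Z^2, so Pr(X > t) = 2 * Pr(Z > sqrt(t)) = erfc(sqrt(t / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical deviances (assumptions, not real model output).
null_deviance = 50.0
residual_deviance = 44.0

lrt_stat = null_deviance - residual_deviance
p_value = chi2_1df_upper_tail(lrt_stat)
reject_at_5_percent = p_value < 0.05

print(lrt_stat, p_value, reject_at_5_percent)
```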

12
Q

Interpreting a slope coefficient associated with an indicator variable
Multiple binary logistic regression

A

Interpret the parameter estimates

e^(β_2) is the odds ratio for a patient with x_2i = 1 compared with a patient with x_2i = 0, all other things being fixed
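
The phrase "all other things being fixed" can be checked numerically. In the sketch below the three estimates are hypothetical; the odds ratio for the indicator comes out as e^(β_2) whatever value the other explanatory variable is held at.

```python
import math

# Hypothetical estimates for alpha, beta_1 and beta_2 (assumptions).
alpha_hat, beta1_hat, beta2_hat = -1.0, 0.03, 0.7

def odds(x1, x2):
    # Odds of success: e^(alpha + beta_1 * x1 + beta_2 * x2).
    return math.exp(alpha_hat + beta1_hat * x1 + beta2_hat * x2)

# Compare x2 = 1 with x2 = 0 while holding x1 fixed at several values.
ratios = [odds(x1, 1) / odds(x1, 0) for x1 in (20, 40, 60)]
indicator_odds_ratio = math.exp(beta2_hat)

print(ratios, indicator_odds_ratio)
```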

13
Q

Alternative link functions

A

The logit function is a link function that links π_i to linear functions of explanatory variables. It is the default in R

The probit link function is based on the inverse cumulative distribution function of the standard normal distribution, Φ^(-1)
Φ^(-1)(π_i) = α+βx_i
in R: family=binomial(link="probit")

The complementary log-log link function maps π_i to the whole real line
log(-log(1-π_i)) = α+βx_i
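
To compare the three links, the sketch below inverts each one, turning a value of the linear predictor α+βx_i into a probability. The function names and the grid of linear predictor values are illustrative assumptions. Note that at a linear predictor of 0 the logit and probit links both give 0.5, while the complementary log-log link gives 1 - e^(-1), so it is not symmetric about 0.5.

```python
import math
from statistics import NormalDist

def inv_logit(eta):
    # Solves log(pi / (1 - pi)) = eta for pi.
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    # Solves Phi^(-1)(pi) = eta for pi, where Phi is the standard normal CDF.
    return NormalDist().cdf(eta)

def inv_cloglog(eta):
    # Solves log(-log(1 - pi)) = eta for pi.
    return 1.0 - math.exp(-math.exp(eta))

etas = [-3.0, -1.0, 0.0, 1.0, 3.0]  # illustrative linear predictor values
for eta in etas:
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```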

14
Q

Comparing binary regression models

A
  1. By comparing AIC statistics
  2. By considering the interpretability of the models
  3. By carrying out residual analysis on the fitted models