Epidemiology Chapter 3 Flashcards
Why is a standard simple linear regression model not appropriate for Bernoulli or binomial responses?
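In simple linear regression we assume the errors are normally distributed with mean 0 and variance σ². This implies that, for fixed x, Y is normally distributed, which is not the case for Bernoulli or binomial responses (these take only the values 0/1 or 0, 1, …, n).
There is also a mismatch between the two sides of the model: the left-hand side, E(Y) = π, is a probability restricted to [0, 1], whereas the right-hand side, α + βx, can take any real value.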
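To see the mismatch numerically, here is a minimal Python sketch. The values α = -4 and β = 0.1 are illustrative assumptions, not estimates from any data: the linear predictor α + βx leaves [0, 1], while the logistic transform of the same quantity always stays inside (0, 1).

```python
import math

# Illustrative (assumed) parameter values, not fitted estimates
ALPHA, BETA = -4.0, 0.1

def linear_predictor(x: float) -> float:
    """Right-hand side alpha + beta*x: can be any real number."""
    return ALPHA + BETA * x

def inverse_logit(eta: float) -> float:
    """Logistic transform: maps any real eta into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

for x in (0, 40, 80):
    eta = linear_predictor(x)
    print(f"x={x:3d}  alpha+beta*x={eta:5.1f}  inverse logit={inverse_logit(eta):.3f}")
```

At x = 0 and x = 80 the linear predictor is not a valid probability, which is why the logistic model places α + βx on the log-odds scale instead.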
Likelihood for the binary logistic regression model
L(α,β│y_1,y_2,…,y_n)=∏(i=1 to n) Pr(Y_i=y_i│α,β)
log(π_i/(1-π_i ))=α+βx_i
π_i=e^( α+βx_i )/(1+e^(α+βx_i ) )
Pr(Y_i=0| α,β)=1-π_i=1/(1+e^(α+βx_i ))
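The three equations above can be checked with a short Python sketch; the values of α, β and x are arbitrary illustrations, and the helper names are hypothetical. It confirms that π_i and 1 - π_i sum to 1 and that the log odds of π_i recover α + βx_i.

```python
import math

def prob_success(alpha: float, beta: float, x: float) -> float:
    """Pr(Y = 1) = pi = e^(alpha + beta*x) / (1 + e^(alpha + beta*x))."""
    eta = alpha + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def prob_failure(alpha: float, beta: float, x: float) -> float:
    """Pr(Y = 0) = 1 - pi = 1 / (1 + e^(alpha + beta*x))."""
    return 1.0 / (1.0 + math.exp(alpha + beta * x))

alpha, beta, x = -1.5, 0.8, 2.0   # illustrative values only
pi = prob_success(alpha, beta, x)

# Should be 1 up to floating-point rounding
print(pi + prob_failure(alpha, beta, x))
# The log odds should equal the linear predictor alpha + beta*x
print(math.log(pi / (1 - pi)), alpha + beta * x)
```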
Likelihood function
L(α,β│y_1,…,y_n)=∏(i=1 to n) (e^(α+βx_i)/(1+e^(α+βx_i)))^(y_i) × (1/(1+e^(α+βx_i)))^(1-y_i)
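As a sketch of how this product is evaluated, the Python below computes it for a small made-up data set (the data and the function name are hypothetical). With α = β = 0 every π_i is 0.5, so the likelihood of four observations is 0.5⁴ = 0.0625; parameters that match the data better give a larger value.

```python
import math

def likelihood(alpha, beta, xs, ys):
    """Product over i of pi_i^y_i * (1 - pi_i)^(1 - y_i) for binary y_i."""
    total = 1.0
    for x, y in zip(xs, ys):
        eta = alpha + beta * x
        pi = math.exp(eta) / (1.0 + math.exp(eta))
        total *= pi ** y * (1.0 - pi) ** (1 - y)
    return total

# Small made-up data set, for illustration only
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]

print(likelihood(0.0, 0.0, xs, ys))    # every pi = 0.5, so 0.5**4 = 0.0625
print(likelihood(-3.0, 2.0, xs, ys))   # larger: these values fit the data better
```

The values of α and β that maximise this product are the maximum likelihood estimates. With many observations the product of probabilities underflows towards 0, which is one practical reason to work with the log-likelihood instead.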
Log-likelihood for the binary logistic regression
∑(i=1 to n) {y_i log(e^(α+βx_i)/(1+e^(α+βx_i))) + (1-y_i) log(1/(1+e^(α+βx_i)))}
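Expanding the logs, each term is y_i(α+βx_i) - y_i log(1+e^(α+βx_i)) - (1-y_i) log(1+e^(α+βx_i)), so the log-likelihood simplifies to ∑(i=1 to n) {y_i(α+βx_i) - log(1+e^(α+βx_i))}. A minimal Python check of this identity on hypothetical data (function names are illustrative) is:

```python
import math

def log_likelihood(alpha, beta, xs, ys):
    """Log-likelihood written term by term as in the card above."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = alpha + beta * x
        total += (y * math.log(math.exp(eta) / (1 + math.exp(eta)))
                  + (1 - y) * math.log(1 / (1 + math.exp(eta))))
    return total

def log_likelihood_simplified(alpha, beta, xs, ys):
    """Equivalent form: sum of y_i*eta_i - log(1 + e^eta_i)."""
    return sum(y * (alpha + beta * x) - math.log1p(math.exp(alpha + beta * x))
               for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]   # hypothetical data
ys = [0, 0, 1, 1]
print(log_likelihood(-3.0, 2.0, xs, ys))
print(log_likelihood_simplified(-3.0, 2.0, xs, ys))
```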
Interpretation for one explanatory variable
Model and interpretation
Underlying model
log(π_i/(1-π_i ))=α+βx_i
Define terms
π_i - probability of success for person i
x_i - explanatory variable value for the ith person
α and β are the model parameters
Interpret the parameter estimates
α_hat - estimated log odds of success for someone with x_i=0 (or with x_i equal to the mean of x if using centred data); taking exponentials gives the estimated odds of success for such a person
β_hat - a unit increase in x is estimated to change the log odds of success by β_hat; taking exponentials, e^β_hat is the estimated odds ratio comparing two individuals whose x values are 1 unit apart
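A small Python sketch makes both interpretations concrete; the estimates α_hat = -2 and β_hat = 0.05 are hypothetical. The odds ratio for a one-unit increase in x is the same wherever it is measured, and equals e^β_hat, while the odds at x = 0 equal e^α_hat.

```python
import math

def odds(alpha, beta, x):
    """Odds of success pi/(1 - pi), which equals e^(alpha + beta*x)."""
    eta = alpha + beta * x
    pi = math.exp(eta) / (1 + math.exp(eta))
    return pi / (1 - pi)

alpha_hat, beta_hat = -2.0, 0.05   # hypothetical estimates

# Odds ratio for two individuals one unit of x apart, at two different x values
print(odds(alpha_hat, beta_hat, 31) / odds(alpha_hat, beta_hat, 30))
print(odds(alpha_hat, beta_hat, 61) / odds(alpha_hat, beta_hat, 60))
print(math.exp(beta_hat))   # e^beta_hat: matches both ratios up to rounding

# Odds at x = 0 versus e^alpha_hat
print(odds(alpha_hat, beta_hat, 0), math.exp(alpha_hat))
```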
Interpretation for multiple explanatory variables
Model
Underlying model
log(π_i/(1-π_i)) = α+β_1x_1i+β_2x_2i
Define terms
π_i - probability of success for person i
x_1i - value of the first explanatory variable for the ith person
x_2i - value of the second explanatory variable for the ith person
α, β_1 and β_2 are the model parameters
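With more than one explanatory variable, each slope is read with the other variables held fixed. The sketch below, using hypothetical parameter values, checks that increasing x_1 by one unit while keeping x_2 fixed changes the log odds by exactly β_1.

```python
import math

def prob_success(alpha, beta1, beta2, x1, x2):
    """pi_i from log(pi_i/(1 - pi_i)) = alpha + beta1*x1i + beta2*x2i."""
    eta = alpha + beta1 * x1 + beta2 * x2
    return 1 / (1 + math.exp(-eta))   # same as e^eta / (1 + e^eta)

def log_odds(p):
    return math.log(p / (1 - p))

alpha, beta1, beta2 = -1.0, 0.3, 0.7   # hypothetical values

# x1 goes from 2 to 3 while x2 stays at 1
change = (log_odds(prob_success(alpha, beta1, beta2, 3.0, 1.0))
          - log_odds(prob_success(alpha, beta1, beta2, 2.0, 1.0)))
print(change, beta1)
```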
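What is extrapolation?
Making inferences outside the range of the observed data, e.g. interpreting α_hat as the log odds at x = 0 when no one in the data has an x value near 0
For the intercept, avoid this by using centred data, so that α_hat refers to someone with the mean value of x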
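Because centring only shifts x, refitting the model to centred data gives the same β_hat and an intercept equal to α_hat + β_hat × x̄. The Python sketch below assumes hypothetical ages and estimates (the names alpha_raw and alpha_centred are illustrative) and checks that both parametrisations give the same linear predictor, while only the centred intercept describes someone inside the data range.

```python
import math

xs = [25.0, 35.0, 45.0, 55.0]   # hypothetical ages; x = 0 lies far outside this range
x_bar = sum(xs) / len(xs)
alpha_raw, beta = -6.0, 0.12    # hypothetical estimates on the raw x scale

# alpha + beta*x rewritten as alpha_centred + beta*(x - x_bar): same slope,
# intercept shifted to alpha + beta*x_bar
alpha_centred = alpha_raw + beta * x_bar
for x in xs:
    assert math.isclose(alpha_raw + beta * x, alpha_centred + beta * (x - x_bar))

print("odds at x = 0 (extrapolated):", math.exp(alpha_raw))
print("odds at the mean x =", x_bar, ":", math.exp(alpha_centred))
```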
Interpretation for one explanatory variable
Hypothesis tests for the parameter estimates α and β
Null hypothesis: α=0
Alternative: α≠0
Test statistic:
z=(α_hat-α_0)/se(α_hat)
where α_0 = 0 is the value of α under H_0
Under the assumption that H_0 is true, our test statistic will have been sampled approximately from a standard normal distribution
Associated p-value: p = 2 × Pr(Z > |z|), where Z ~ N(0,1)
If p < 0.05, reject H_0 at the 5% significance level
The test for β is the same, using β_hat, β_0 = 0 and se(β_hat)
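The test above can be sketched in Python with only the standard library; the helper name wald_test and the numbers 0.52 and 0.21 are hypothetical stand-ins for an estimate and its standard error read off a fitted model. For standard normal Z, Pr(Z > t) = 0.5 × erfc(t/√2), so the two-sided p-value is erfc(|z|/√2).

```python
import math

def wald_test(estimate: float, se: float, null_value: float = 0.0):
    """Return z = (estimate - null_value)/se and the two-sided p = 2*Pr(Z > |z|)."""
    z = (estimate - null_value) / se
    # 2 * 0.5 * erfc(|z| / sqrt(2)) simplifies to erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical output from a fitted model: beta_hat = 0.52, se(beta_hat) = 0.21
z, p = wald_test(0.52, 0.21)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0 at 5%: {p < 0.05}")
```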
Interpretation for one explanatory variable
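Confidence intervals for parameter estimates α and β
α_hat ± z_(γ/2) se(α_hat)
β_hat ± z_(γ/2) se(β_hat)
where z_(γ/2) is the upper γ/2 point of the standard normal distribution (1.96 for a 95% interval, γ = 0.05); γ denotes the significance level here, to avoid confusion with the intercept α
If the interval contains 0, we cannot reject H_0 at that significance level: we can't rule out the possibility that the parameter = 0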
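A minimal Python sketch of these intervals, assuming a hypothetical estimate of 0.52 with standard error 0.21 (the helper name wald_ci is illustrative). It also exponentiates the end points, which turns an interval for β into one for the odds ratio e^β; an interval for β containing 0 corresponds to an odds-ratio interval containing 1.

```python
import math
from statistics import NormalDist

def wald_ci(estimate: float, se: float, level: float = 0.95):
    """estimate ± z * se, with z the upper (1 - level)/2 point of N(0, 1)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    return estimate - z * se, estimate + z * se

# Hypothetical estimate and standard error for beta
lo, hi = wald_ci(0.52, 0.21)
print(f"95% CI for beta: ({lo:.3f}, {hi:.3f}); contains 0: {lo <= 0 <= hi}")
print(f"95% CI for the odds ratio e^beta: ({math.exp(lo):.3f}, {math.exp(hi):.3f})")
```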
Describe how the AIC is calculated and its uses
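AIC = -2 × log-likelihood + 2p,
where p is the number of parameters and the log-likelihood is evaluated at the maximum likelihood estimates
It is used for comparing models fitted to the same data; smaller values are preferred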
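As a worked sketch, the Python below computes the AIC for two hypothetical models fitted to the same data; the maximised log-likelihoods -120.4 and -119.9 are made up for illustration.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 * log-likelihood + 2p."""
    return -2.0 * log_likelihood + 2 * n_params

# Hypothetical maximised log-likelihoods for two models fitted to the same data
models = {
    "x1 only (alpha, beta1)": aic(-120.4, 2),
    "x1 + x2 (alpha, beta1, beta2)": aic(-119.9, 3),
}
best = min(models, key=models.get)   # smaller AIC is preferred
print(models)
print("preferred:", best)
```

Here the extra parameter raises the log-likelihood by only 0.5, which lowers -2 × log-likelihood by 1 but adds a penalty of 2, so the smaller model has the lower AIC (244.8 against 245.8).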
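Describe how the residual deviance is calculated and its uses
The residual deviance compares the fitted (proposed) model with the saturated one:
residual deviance = 2(LL_s-LL_p), where LL_s and LL_p are the maximised log-likelihoods of the saturated and proposed models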
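For 0/1 responses the saturated model sets π_i = y_i, so every term of its log-likelihood is log(1) = 0 and LL_s = 0. The Python sketch below uses hypothetical responses and hypothetical fitted probabilities (not the output of a real fit) to compute the residual deviance and, in the same way, the null deviance of a model with one common π.

```python
import math

def bernoulli_log_likelihood(pis, ys):
    """Sum of y*log(pi) + (1 - y)*log(1 - pi) for 0/1 responses."""
    total = 0.0
    for pi, y in zip(pis, ys):
        total += math.log(pi) if y == 1 else math.log(1 - pi)
    return total

ys = [0, 0, 1, 0, 1, 1]                    # hypothetical binary responses
fitted = [0.2, 0.3, 0.6, 0.4, 0.7, 0.8]    # hypothetical fitted pi_i, proposed model
null_pi = [sum(ys) / len(ys)] * len(ys)    # null model: one common pi for everyone

ll_s = 0.0   # saturated model for 0/1 data: pi_i = y_i, so each term is log(1) = 0
ll_p = bernoulli_log_likelihood(fitted, ys)
ll_0 = bernoulli_log_likelihood(null_pi, ys)

residual_deviance = 2 * (ll_s - ll_p)
null_deviance = 2 * (ll_s - ll_0)
print(residual_deviance, null_deviance)
```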
Likelihood ratio test
Compares the proposed model with the null model
test statistic = null deviance - residual deviance
Hypotheses:
Null: β=0, i.e. the proposed model can be replaced by the null model
Alternative: β≠0
Under the assumption that H_0 is true, our test statistic approximately follows the chi-squared distribution on one degree of freedom (the difference in the number of parameters between the two models being considered)
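A sketch of the test in Python, with hypothetical deviances of 43.2 (null) and 36.1 (residual). On one degree of freedom a chi-squared variable is Z², so Pr(χ²_1 > t) = Pr(|Z| > √t) = erfc(√(t/2)), which needs only the standard library.

```python
import math

def lrt_p_value_1df(null_deviance: float, residual_deviance: float) -> float:
    """Pr(chi-squared_1 > null deviance - residual deviance)."""
    statistic = null_deviance - residual_deviance
    # chi-squared on 1 df is Z^2, so Pr(chi2_1 > t) = erfc(sqrt(t / 2))
    return math.erfc(math.sqrt(statistic / 2))

# Hypothetical deviances from a fitted model and its null model
p = lrt_p_value_1df(null_deviance=43.2, residual_deviance=36.1)
print(f"statistic = {43.2 - 36.1:.1f}, p = {p:.4f}")
```

This shortcut only holds for one degree of freedom; when the two models differ by more than one parameter, a general chi-squared survival function (for example `scipy.stats.chi2.sf`) would be needed instead.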
Interpreting a slope coefficient associated with an indicator variable
Multiple binary logistic regression
Interpret the parameter estimates
e^β_2 is the odds ratio for a patient with x_2i = 1 compared to a patient with x_2i = 0, with all other explanatory variables held fixed
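The Python sketch below illustrates this with hypothetical estimates, treating x_2 as a 0/1 indicator: whatever value x_1 is held at, the odds for x_2 = 1 divided by the odds for x_2 = 0 come out as e^β_2.

```python
import math

def odds(alpha, beta1, beta2, x1, x2):
    """Odds e^(alpha + beta1*x1 + beta2*x2) from the two-variable model."""
    return math.exp(alpha + beta1 * x1 + beta2 * x2)

alpha, beta1, beta2 = -2.0, 0.04, 0.9   # hypothetical estimates; x2 is a 0/1 indicator

for x1 in (30.0, 50.0, 70.0):
    ratio = odds(alpha, beta1, beta2, x1, 1) / odds(alpha, beta1, beta2, x1, 0)
    print(x1, ratio)   # the same at every x1, up to rounding
print(math.exp(beta2))
```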
Alternative link functions
The logit function is a link function that links π_i to a linear function of the explanatory variables. It is the default link for family=binomial in R
The probit link function is based on the inverse cumulative distribution function Φ^(-1) of the standard normal distribution:
Φ^(-1)(π_i)=α+βx_i
In R: family=binomial(link="probit")
The complementary log-log link function also maps π_i from (0, 1) onto the whole real line; unlike the logit and probit links, it is not symmetric about π_i = 0.5:
log(-log(1-π_i)) = α+βx_i
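To compare the three links, the Python sketch below writes out each inverse link, i.e. the function that turns a linear predictor η = α + βx into π. The function names are hypothetical. It uses Φ(η) = 0.5 × erfc(-η/√2) for the probit link and solves log(-log(1-π)) = η to get π = 1 - exp(-e^η) for the complementary log-log link. At η = 0 the logit and probit links both give 0.5, while the complementary log-log gives 1 - e^(-1), reflecting its asymmetry.

```python
import math

# Inverse link functions: map the linear predictor eta = alpha + beta*x to pi
def inv_logit(eta: float) -> float:
    return 1 / (1 + math.exp(-eta))

def inv_probit(eta: float) -> float:
    # Phi(eta), the standard normal CDF
    return 0.5 * math.erfc(-eta / math.sqrt(2))

def inv_cloglog(eta: float) -> float:
    # Solves log(-log(1 - pi)) = eta for pi
    return 1 - math.exp(-math.exp(eta))

for eta in (-2.0, 0.0, 2.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

In R the model itself would be fitted with glm and the corresponding family=binomial(link=...) choice; this sketch only shows how each link maps the linear predictor to a probability.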
Comparing binary regression models
- through the consideration of AIC statistics
- by considering the interpretability of the models
- by carrying out residual analysis on the fitted models