Lecture 10: Logistic regression Flashcards
What is a contingency table?
A table used to assess the frequency distribution of each of two categorical variables, as well as the association between them
- To form one in SPSS, use Crosstabs
What does the expected frequencies table represent?
The cell frequencies (and hence proportions) that would be expected if the null hypothesis of no association were true
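A minimal sketch of how expected frequencies can be computed, assuming the usual rule for a test of independence (row total x column total / grand total). The function name and the counts are hypothetical, chosen only for illustration.

```python
def expected_frequencies(observed):
    """Expected cell counts if the row and column variables were unrelated."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each expected count is row total * column total / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table: rows = exposed / not exposed,
# columns = outcome / no outcome
observed = [[30, 70],
            [10, 90]]
expected = expected_frequencies(observed)
print(expected)
```

The further the observed counts sit from these expected counts, the stronger the evidence against the null hypothesis.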
What is the risk of an outcome?
The risk of an outcome is the number of times the outcome of interest occurred / the total number of possible outcomes (did & didn’t)
How is a risk ratio calculated?
Calculate the risk of having the outcome in both groups (with exposure / without exposure)
Risk of outcome with exposure
= number of times outcome occurred with exposure / total number with exposure (with & without outcome)
Risk of outcome without exposure
= number of times outcome occurred without exposure / total number without exposure (with & without outcome)
To calculate risk ratio
Risk with exposure / risk without exposure
- RR > 1: the risk of the outcome is higher with the exposure
- RR = 1: no difference in risk between the groups
- RR < 1: the risk of the outcome is lower with the exposure
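The risk ratio steps can be sketched as follows. The function names and the 2x2 counts are hypothetical, not taken from the lecture.

```python
def risk(with_outcome, without_outcome):
    """Risk = times the outcome occurred / total (with and without outcome)."""
    return with_outcome / (with_outcome + without_outcome)

def risk_ratio(a, b, c, d):
    """a, b: exposed with / without outcome; c, d: unexposed with / without outcome."""
    return risk(a, b) / risk(c, d)

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed had the outcome
rr = risk_ratio(30, 70, 10, 90)
print(round(rr, 2))  # risk 0.30 vs 0.10, so the ratio is about 3
```

Here the risk ratio is above 1, so the outcome is more likely with the exposure.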
How does the odds calculation differ from risk?
Rather than dividing by the total number of events, the number of times with the outcome is divided by the number of times without the outcome
Odds ratio is calculated by dividing the odds in the exposed group by the odds in the non-exposed group
OR = 1 - the odds of the outcome are the same in both groups; the outcome is not related to exposure
OR > 1 - the odds of the outcome are higher with exposure
OR < 1 - the odds of the outcome are lower with exposure; exposure is not associated with increased risk of disease
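A short sketch of the odds and odds ratio calculation, using hypothetical counts and function names.

```python
def odds(with_outcome, without_outcome):
    """Odds = times with the outcome / times without the outcome."""
    return with_outcome / without_outcome

def odds_ratio(a, b, c, d):
    """a, b: exposed with / without outcome; c, d: unexposed with / without outcome."""
    return odds(a, b) / odds(c, d)

# Hypothetical counts: 30 vs 70 in the exposed group, 10 vs 90 in the unexposed group
or_value = odds_ratio(30, 70, 10, 90)  # (30/70) / (10/90), about 3.86
print(round(or_value, 2))
```

Note that the denominator is the count without the outcome, not the group total, which is the only difference from the risk calculation.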
When can risk and odds ratios be used?
Risk ratio - cannot be used in a case control study - i.e. when participants are selected already for the outcome of interest. Only odds ratio used here
Rare outcomes - both risk and odds ratios can be used, as the odds ratio approximates the risk ratio when the outcome is rare
Odds ratios can be used in many study designs and form the basis for logistic regression
Risk ratios often preferred for clinical practice
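To illustrate the rare-outcome point, the sketch below compares the two ratios for a common and a rare outcome. All counts are hypothetical and the function name is an assumption for illustration.

```python
def risk_and_odds_ratio(a, b, c, d):
    """a, b: exposed with / without outcome; c, d: unexposed with / without outcome."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a / b) / (c / d)
    return rr, odds_ratio

# Common outcome: 30% vs 10% of each group affected
common_rr, common_or = risk_and_odds_ratio(30, 70, 10, 90)
# Rare outcome: 0.3% vs 0.1% of each group affected
rare_rr, rare_or = risk_and_odds_ratio(3, 997, 1, 999)

print(round(common_rr, 2), round(common_or, 2))
print(round(rare_rr, 2), round(rare_or, 2))
```

With the rare outcome the two ratios are almost identical, while with the common outcome the odds ratio is noticeably further from 1 than the risk ratio.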
Why can’t simple linear regression be used for a binary outcome?
Linear regression assumes the outcome is normally distributed around the mean (for each value of X) - this is not going to be the case if there is a binary response
Linear relationship doesn’t make sense for a binary outcome
- Output variable is limited to 0,1 - some of our observations would be outside this range
- Our goal is to best separate the two groups rather than minimise the least squares error
- If linear regression was used it would be very sensitive to influential outliers
- Homogeneity of variance would be violated
A sigmoid (S-shaped) curve is used instead
What function is used in logistic regression?
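g(y) link function - links the expected value of the outcome to the linear predictor, so the left and right sides of the equation are on the same scale
η = alpha + betaX (the linear predictor)
Logit link: log(p / (1 - p)) = alpha + betaX, where p is the probability of the outcome
The logit transformation allows a linear relationship to be modelled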
If beta increases in a simple logistic regression how will the log of the odds change?
The log of the odds will increase more for each unit increase in X, and the steepness of the curve will increase
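The effect of beta on steepness can be checked numerically. This is a sketch with alpha fixed at 0 and arbitrary beta values; `logistic` and `slope_at` are hypothetical helper names.

```python
import math

def logistic(x, alpha, beta):
    """Predicted probability for a given X in a simple logistic model."""
    return 1 / (1 + math.exp(-(alpha + beta * x)))

def slope_at(x, alpha, beta, h=1e-6):
    """Numerical slope of the probability curve at x (central difference)."""
    return (logistic(x + h, alpha, beta) - logistic(x - h, alpha, beta)) / (2 * h)

# Larger beta: higher probability at x = 1 and a steeper curve at the midpoint
for beta in (0.5, 1.0, 2.0):
    print(beta, round(logistic(1, 0.0, beta), 3), round(slope_at(0, 0.0, beta), 3))
```

At the midpoint of the curve (p = 0.5) the slope works out to beta / 4, so doubling beta doubles the steepness there.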
What is used to estimate the beta coefficient and the constant in logistic regression?
Maximum likelihood - an iterative process - many candidate coefficient values are tried until the best fit is found
Find the coefficient value which makes the observed data most likely
(in SLR ordinary least squares is used)
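A minimal sketch of the idea behind iterative maximum likelihood, using plain gradient ascent on the log-likelihood. This is an assumption for illustration only: statistical packages typically use faster algorithms, and the data, learning rate and function names here are hypothetical.

```python
import math

def log_likelihood(alpha, beta, xs, ys):
    """Log of the probability of the observed 0/1 outcomes under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(alpha + beta * x)))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, learning_rate=0.01, n_iter=20000):
    """Repeatedly nudge alpha and beta in the direction that raises the likelihood."""
    alpha, beta = 0.0, 0.0
    for _ in range(n_iter):
        grad_alpha = grad_beta = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(alpha + beta * x)))
            grad_alpha += y - p          # gradient with respect to alpha
            grad_beta += (y - p) * x     # gradient with respect to beta
        alpha += learning_rate * grad_alpha
        beta += learning_rate * grad_beta
    return alpha, beta

# Hypothetical data: the outcome becomes more common as x increases
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
alpha_hat, beta_hat = fit_logistic(xs, ys)
print(alpha_hat, beta_hat, log_likelihood(alpha_hat, beta_hat, xs, ys))
```

The fitted values are the ones under which the observed data are most likely; moving either coefficient away from them lowers the log-likelihood.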
How do we interpret the coefficient in logistic regression?
Increase in X by one unit will affect log(odds) by the value of the coefficient
To work out how the odds change for a one-unit increase in X, use the antilog (exponential) function
antilog = e^(beta coefficient value) = exp(beta) = odds ratio for a one-unit increase in X (e.g. concentration)
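A quick check that exp(beta) is the odds ratio for a one-unit increase in X. The coefficient values are hypothetical, not estimates from the lecture data.

```python
import math

def odds(alpha, beta, x):
    """Odds of the outcome at a given X: exp(alpha + beta * X)."""
    return math.exp(alpha + beta * x)

alpha, beta = -2.0, 0.7  # hypothetical coefficients

# Ratio of the odds at X = 4 to the odds at X = 3 (a one-unit increase)
ratio = odds(alpha, beta, 4) / odds(alpha, beta, 3)
print(round(ratio, 4), round(math.exp(beta), 4))
```

The ratio is the same whichever pair of X values one unit apart is chosen, which is why a single exp(beta) summarises the effect of X.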
What are the assumptions of binary logistic regression?
Binary dependent variable which has a Bernoulli distribution
The binary variable is only linearly related to the predictor variables after transforming into the logit scale
The observations are independent
Continuous variables have a linear effect on the log-odds scale
Used for a binary dependent variable with continuous predictors (categorical predictors can also be included)
What is the probability of an event in logistic regression?
p = exp(L) / (1 + exp(L))
= 1 / (1 + exp(-L))
Odds of an event = p / (1 - p)
Logit model: ln(p / (1 - p)) = alpha + betaX
L = alpha + betaX is the linear predictor
exp(L) = e^L is the odds of an event
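The relationships between the linear predictor, probability, odds and logit can be verified with a small sketch. The function names and the value of L are hypothetical.

```python
import math

def probability(L):
    """p = exp(L) / (1 + exp(L))"""
    return math.exp(L) / (1 + math.exp(L))

def odds(p):
    """Odds of an event = p / (1 - p)"""
    return p / (1 - p)

def logit(p):
    """ln(p / (1 - p)), which recovers the linear predictor"""
    return math.log(p / (1 - p))

L = 1.2  # hypothetical value of alpha + beta * X
p = probability(L)
print(p, 1 / (1 + math.exp(-L)))  # the two forms of the formula agree
print(odds(p), math.exp(L))       # the odds equal exp(L)
print(logit(p), L)                # the logit returns L
```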
How is binary logistic regression run in SPSS?
Select Regression - Binary Logistic
Choose variables - for any categorical variable, change the reference category from last to first: select Categorical - First - Change
Choose the confidence intervals for exp(B)
What does the omnibus test of model coefficients explain?
The omnibus test of model coefficients shows whether the inclusion of a block of variables contributes to a better model fit
The coefficient of determination (R2) gives an indication of how much variation in y is explained by the model
Nagelkerke R2 is used (a pseudo-R2)
The classification table indicates how much the correct classification improves when the predictors are included in the model
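A sketch of how a classification table can be built from predicted probabilities, assuming a cut-off of 0.5. The outcomes, probabilities and function name are hypothetical.

```python
def classification_table(ys, probs, cutoff=0.5):
    """Count (observed, predicted) pairs and return the percentage classified correctly."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(ys, probs):
        predicted = int(p >= cutoff)
        table[(y, predicted)] += 1
    correct = table[(0, 0)] + table[(1, 1)]
    return table, 100 * correct / len(ys)

# Hypothetical observed outcomes and model-predicted probabilities
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
model_probs = [0.1, 0.2, 0.3, 0.6, 0.2, 0.4, 0.7, 0.8, 0.4, 0.9]
# Without predictors, everyone gets the overall proportion with the outcome
null_probs = [sum(ys) / len(ys)] * len(ys)

model_table, model_pct = classification_table(ys, model_probs)
null_table, null_pct = classification_table(ys, null_probs)
print(null_pct, model_pct)
```

Comparing the two percentages shows the improvement in correct classification that comes from adding the predictors.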