Logistic Regression Flashcards
what is logistic regression (LR) for?
Logistic regression is an example of a non-linear regression model, which is what we need when we have a dichotomous or categorical DV
the assumptions of LR are characteristically ….
less severe – relatively assumption free
What are the 3 main reasons for performing a logistic regression rather than a standard multiple regression?
1) DV is categorical, and therefore 2) Line of best fit will be sigmoidal, not linear, and as such 3) There will be non-normality and heteroscedasticity in the residuals if OLS regression is used, which violates important assumptions of this method
how does LR build a model?
by measuring the deviance of predictors, and including them or excluding them based on their contribution to predicting the outcome variable …. LR says: Does an individual predictor increase or decrease the probability of an outcome?
as opposed to MR… LR uses a dichotomous DV, and …
continuous IVs -
LR is also not…
linear,
the predictive model is called XX
P hat (P̂) – the predicted probability of being a case
the residuals are not…..
Residuals are clearly not normal (skewed)
and exhibit …
heteroscedasticity – residuals are all or nothing, and not evenly distributed.
so, instead of the model fit being linear (which rules out modelling probability directly, as a straight line can extend beyond 0 and 1, the range within which probability lies), LR uses
a non-linear (sigmoidal) line of best fit.
Probability means =
0–1 (or a percentage, 0–100) – the likelihood of an event occurring
Odds mean =
Odds = the probability of the event divided by its complement: p / (1 − p)
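As a minimal sketch of the formula on this card (the function name is my own):

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) into odds: p / (1 - p)."""
    return p / (1 - p)

print(odds(0.5))  # even odds: 1.0
print(odds(0.8))  # roughly 4
```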
why does LR use odds?
it unpacks the maths nicely – and the odds are converted back to a probability after the model has been fitted
in LR, instead of using the odds (which are asymmetric), we use the …
natural log of the odds, which is symmetric about 0
what is the natural log called in LR
the logit
what does the odds ratio mean in LR?
Odds ratio: relationship between the odds of an event occurring across levels of another variable (by how much do the odds of Y change as X increases by 1 unit?)
and what does the ‘ratio of ratios’ mean?
Ratio of ratios – the odds of an event occurring as a function of the levels of another variable, e.g. the odds of males having a disease divided by the odds of females having the disease (i.e. the ratio of these two odds)
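A tiny illustration of this ratio of odds, with invented probabilities (names and numbers are mine, not from the cards):

```python
def odds(p):
    return p / (1 - p)

def odds_ratio(p1, p2):
    # the odds of group 1 divided by the odds of group 2
    return odds(p1) / odds(p2)

# e.g. disease probability .20 for males vs .10 for females (made-up numbers)
or_mf = odds_ratio(0.20, 0.10)  # roughly 2.25
```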
why do we present the results in terms of log odds and odds ratio?
as it turns a non-linear relationship into the familiar linear one
this enables us to subsequently …
test whether this coefficient is significantly different from 0 – just like a t-test in MR
the predicted odds range from ?
0 to + ∞
so when p>.50
odds > 1 (p = .50 gives even odds of 1)
the predicted odds varies ….
varies exponentially with the predictor(s)
in comparison the natural logit ranges…..
from - ∞ to + ∞
it reflects odds of being a case but
varies linearly with the predictor(s)
the issue with this is
the logit is not very interpretable – e.g. if p = .8, odds = 4, but logit = 1.386
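The round trip between probability, odds, and logit for the p = .8 example on this card, as a pure-Python sketch:

```python
import math

p = 0.8
odds = p / (1 - p)           # 4
logit = math.log(odds)       # ln(4), roughly 1.386
# the logistic (inverse-logit) function maps the logit back to a probability
p_back = 1 / (1 + math.exp(-logit))
```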
The typical partial regression coefficients (B) indicate ….
increment in the logit given unit increment in predictor
whereas the odds ratios (eB) indicate?
the amount by which odds of being-a-case are multiplied given a unit increment in predictor (or change in level of predictor if predictor is categorical)
if MR uses OLS…. LR uses…..
maximum likelihood estimation: an ITERATIVE solution where the regression coefficients are estimated by trial and error and gradual adjustment. (This seeks to maximise the likelihood (L) of the observed values of Y, given a model and the observed values of the predictors.)
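A bare-bones sketch of that iterative idea, using gradient ascent on the log-likelihood for a single predictor. The data and step size are invented, and real software uses faster schemes (e.g. Newton–Raphson); this just shows "trial and error and gradual adjustment":

```python
import math

xs = [1, 2, 3, 4, 5, 6]   # predictor values (made up)
ys = [0, 0, 1, 0, 1, 1]   # observed 0/1 outcomes (made up)

b0, b1 = 0.0, 0.0         # start from a flat model
step = 0.01
for _ in range(20000):    # iterative gradual adjustment
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))  # current predicted P-hat
        g0 += y - p                             # gradient of the log-likelihood
        g1 += (y - p) * x
    b0 += step * g0       # move the coefficients uphill in likelihood
    b1 += step * g1
```

After the loop, predicted probabilities rise with x, matching the upward trend in the toy data.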
if OLS uses the sum of squares….. LR uses….
uses measures of deviance rather than sums of squares. The focus is on lack of fit – analogous to (1 − R²) – and on minimising it: the same idea as MR, but flipped round.
Null deviance, Dnull (similar to SSTotal) = the amount of variability in the data; the amount of deviance that could potentially be accounted for.
Model deviance, DK (similar to SSResidual) = the amount of variability in the data after accounting for prediction from k predictors.
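A sketch of the two deviances (the function, outcomes, and fitted probabilities are invented for illustration):

```python
import math

def deviance(ys, ps):
    """-2 * log-likelihood of observed 0/1 outcomes given predicted probabilities."""
    ll = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
             for y, p in zip(ys, ps))
    return -2 * ll

ys = [0, 0, 1, 1]
base_rate = sum(ys) / len(ys)                  # null model predicts the overall rate
d_null = deviance(ys, [base_rate] * len(ys))   # like SS_Total
d_model = deviance(ys, [0.2, 0.3, 0.7, 0.9])   # like SS_Residual (made-up fits)
```

A better model leaves less deviance, so d_model comes out smaller than d_null.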
For each model, a xx value is calculated – analogous to the F ratio for the overall model in MR.
log likelihood
LR uses xxx models
nested. So: minimising the lack of fit of the model = maximising the likelihood of the data. We take models and compare sets of them with one another. The simplest way is to compare the model with all the variables in against no model at all; then compare subsets of the model, with and without individual predictors, to get the significance of each predictor. (We compare 2 models, one bigger and one smaller and nested within the bigger model – comparing hierarchically.)
If the xxx model is true then the LRT statistic is distributed as xxx with m df
If the smaller model is true then the likelihood ratio test (LRT) statistic is distributed as χ2 with m df. So it tests whether it is worth having those m extra parameters in the model: if the LRT is no bigger than expected under the χ2 distribution with m df, it is not worth adding the extra parameters, and we prefer the simpler model – the more parsimonious explanation. We only prefer the bigger model if it improves fit.
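A numeric sketch of that comparison with invented deviances. For m = 1 df the χ2 tail probability can be written via the standard normal CDF, which keeps this stdlib-only:

```python
import math
from statistics import NormalDist

d_small = 40.0   # deviance of the smaller (nested) model (made-up number)
d_big = 33.0     # deviance of the bigger model with m = 1 extra parameter
lrt = d_small - d_big   # LRT statistic, ~ chi-square with m df if the small model is true

# P(chi-square with 1 df > x) = 2 * (1 - Phi(sqrt(x)))
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(lrt)))
# here p < .05, so the extra parameter improves fit; keep the bigger model
```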
this LR standard approach is more xxxxx than standard MR – resembles xxxxx MR.
Standard approach is more hierarchical than standard MR – resembles hierarchical MR. We only accept more predictors if they significantly enhance the degree of fit.
TAKE HOME MESSAGE: when you have assessed your model collectively and the 3 predictors together don't enhance prediction, then you ask…
would one predictor on its own be a better fit, etc.?
a limitation of LR is that it is a xxxxxxx procedure
low power
how is it low power?
because the DV is categorical – each case is either a 0 or a 1, so it carries little information – LR needs big sample sizes
what is Pseudo R2 ?
of limited value – analogues of R² (McFadden / Cox & Snell / Nagelkerke) – all problematic: they are not truly "variance accounted for", as the model is not homoscedastic.
how do you calculate DF for categorical DVs?
To calculate DF for a binary DV (has disease: yes vs no)– you need to add up all the main effects and interactions. It is not N-1 as when DV is continuous. Similarly, categorical IVs need (m-1)*(n-1) parameters to capture the effects when there are m levels of the IV and n levels of the DV.
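That parameter count as a one-line helper (the naming is my own):

```python
def params_needed(m_iv_levels, n_dv_levels):
    # (m - 1) * (n - 1) parameters for the effect of an m-level IV on an n-level DV
    return (m_iv_levels - 1) * (n_dv_levels - 1)

print(params_needed(3, 2))  # 3-level IV, binary DV -> 2 parameters
```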