Logistic Regression Flashcards
Concept
Aim is to predict how likely it is that an event will occur (OV = 1) or will not occur (OV = 0)
Why use logistic regression and not linear regression?
With a binary OV, many assumptions of linear regression (such as linearity) are violated, and predicted values can fall outside the 0 to 1 range
Odds
Long run ratio of an event happening to an event not happening
Odds (wins) = n (wins) / n (losses) = p (win) / p (lose)
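A minimal sketch of the odds formula above, showing that the count version and the probability version agree. The win and loss counts are made-up numbers for illustration, and the function names are hypothetical.

```python
def odds_from_counts(n_wins, n_losses):
    """Odds as the ratio of events to non-events."""
    return n_wins / n_losses


def odds_from_probability(p_win):
    """Odds as p(win) / p(lose), where p(lose) = 1 - p(win)."""
    return p_win / (1 - p_win)


# Made-up example: 30 wins and 10 losses
n_wins, n_losses = 30, 10
p_win = n_wins / (n_wins + n_losses)

odds_counts = odds_from_counts(n_wins, n_losses)
odds_prob = odds_from_probability(p_win)

print(odds_counts, odds_prob)  # both routes give the same odds
```

Odds above 1 mean the event is more likely than not; odds below 1 mean the opposite.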
What is chi square used for
To test the significance of the model (because the OV is categorical)
3 measures for testing the model
Deviance (-2LL)
Pseudo R squared
Hit rate
-2LL (log likelihood)
How much information remains unexplained after the model has been fitted
(Big values, bad models)
Pseudo R squared
Explanatory power of the model with PVs compared to null model without PVs
(Higher pseudo R squared the better)
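A rough sketch of how -2LL and a pseudo R squared could be computed by hand. Assumptions: the outcomes and the "fitted" probabilities are made up rather than taken from a real model, and McFadden's version of pseudo R squared is used because it is the simplest to write down; software may report other versions (for example Cox and Snell or Nagelkerke).

```python
import math


def minus_two_ll(y, p):
    """Deviance (-2LL) for observed 0/1 outcomes y and predicted probabilities p."""
    log_likelihood = sum(
        math.log(pi) if yi == 1 else math.log(1 - pi)
        for yi, pi in zip(y, p)
    )
    return -2 * log_likelihood


# Made-up outcomes and hypothetical predicted probabilities from a model with PVs
y = [1, 1, 1, 0, 0, 1, 0, 0]
p_model = [0.9, 0.8, 0.6, 0.3, 0.2, 0.7, 0.4, 0.1]

# The null model (no PVs) predicts the overall share of 1s for every case
p_null = [sum(y) / len(y)] * len(y)

dev_null = minus_two_ll(y, p_null)
dev_model = minus_two_ll(y, p_model)

# McFadden pseudo R squared: proportional reduction in -2LL versus the null model
mcfadden_r2 = 1 - dev_model / dev_null

print(dev_null, dev_model, mcfadden_r2)
```

A smaller -2LL for the model than for the null model means less unexplained information, which shows up as a higher pseudo R squared.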
Hypotheses chi square
H0: the newly added PVs, compared to the null model, have no effect on the OV
- beta 1 = beta 2 = … = beta k = 0
H1: at least one beta does not equal 0
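A sketch of the model chi square test for these hypotheses, assuming the statistic is the drop in -2LL from the null model to the model with PVs. The two -2LL values are made-up numbers, and the p-value helper only covers the case of one added PV (1 degree of freedom); for more PVs a statistics library would be needed.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Made-up deviances: null model (no PVs) and model with one PV
minus_two_ll_null = 11.09
minus_two_ll_model = 4.78

# Model chi square: how much -2LL drops when the PV is added
chi_square = minus_two_ll_null - minus_two_ll_model
p_value = chi_square_p_value_df1(chi_square)

reject_h0 = p_value < 0.05
print(chi_square, p_value, reject_h0)
```

Here the drop in -2LL is larger than 3.84, the 5% critical value for 1 degree of freedom, so H0 would be rejected.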
Hit rate
Proportion of cases in the dataset whose outcome (event occurring or not) we predict correctly
Effect of high cutoff rate
FEWER predicted 1s, MORE predicted 0s
Some of the extra predicted 0s are incorrect, so the share of correct predictions tends to decrease among predicted 0s and increase among predicted 1s
Effect of low cutoff rate
MORE predicted 1s, FEWER predicted 0s
Some of the extra predicted 1s are incorrect, so the share of correct predictions tends to decrease among predicted 1s and increase among predicted 0s
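The cutoff effects and the hit rate can be checked on a small example. This is only a sketch: the outcomes and predicted probabilities are made up, the function names are hypothetical, and the hit rate is taken to be the share of all cases classified correctly.

```python
def classify(probs, cutoff):
    """Predict 1 when the predicted probability reaches the cutoff, else 0."""
    return [1 if p >= cutoff else 0 for p in probs]


def hit_rate(y, y_pred):
    """Share of all cases whose outcome is predicted correctly."""
    return sum(a == b for a, b in zip(y, y_pred)) / len(y)


def share_correct(y, y_pred, label):
    """Among cases predicted as `label`, the share that really are `label`."""
    actual = [a for a, b in zip(y, y_pred) if b == label]
    return sum(a == label for a in actual) / len(actual) if actual else float("nan")


# Made-up observed outcomes and predicted probabilities
y = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

results = {}
for cutoff in (0.25, 0.5, 0.75):
    pred = classify(probs, cutoff)
    results[cutoff] = {
        "predicted_1s": sum(pred),
        "correct_among_1s": share_correct(y, pred, 1),
        "correct_among_0s": share_correct(y, pred, 0),
        "hit_rate": hit_rate(y, pred),
    }
    print(cutoff, results[cutoff])
```

In this made-up data, raising the cutoff from 0.25 to 0.75 cuts the predicted 1s from 6 to 2, makes the predicted 1s more reliable and the predicted 0s less reliable, while the overall hit rate happens to stay the same.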
Exp(B) concept
When we increase a PV by 1 unit (holding the other PVs constant), the odds of the event happening are multiplied by a factor of exp(B)
Less than one: negative effect
More than one: positive effect
One: no effect
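A short sketch that checks the exp(B) idea numerically for a model with one PV. The intercept and slope values are hypothetical, chosen only for illustration.

```python
import math


def predicted_probability(b0, b1, x):
    """Logistic model: probability that OV = 1 for a given PV value x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


# Hypothetical coefficients: intercept b0 and slope b1 for a single PV
b0, b1 = -1.0, 0.5

odds_at_2 = odds(predicted_probability(b0, b1, 2))
odds_at_3 = odds(predicted_probability(b0, b1, 3))

# Raising the PV by 1 unit multiplies the odds by exp(b1)
odds_ratio = odds_at_3 / odds_at_2
print(odds_ratio, math.exp(b1))
```

With a positive B the factor is above 1 (odds go up); with a negative B, such as -0.5, exp(B) is below 1 (odds go down); with B = 0 the factor is exactly 1.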
Reporting chi square statistic
If p is less than 0.05, we reject H0 and conclude that at least one of the betas is not equal to 0, so at least one of the PVs is significant and has an effect on the OV. Our model as a whole therefore adds explanatory power compared to the null model
Reject? Equal to 0? Significant? Effect on OV? Adds explanatory power?
Pseudo R squared interpretation
Explanatory power of the model with PVs compared to model without PVs
Gives an idea about “how well the model with its PVs fits the data”
The higher the value, the greater the improvement over the null model
Interpreting beta
Only the sign is interpreted: it shows whether the effect is positive or negative
Values above 0 have positive effect
Below 0 have negative effect