Logistic Regression Flashcards
The linear regression model is…
- Ŷ = bX + c
• Ŷ is the predicted outcome variable
• b is the slope of the line
• X is the explanatory/predictor variable
• c is the intercept
The residuals are…
Y - Ŷ (the observed score minus the predicted score)
Logistic regression
-A non-linear regression model
•Has a dichotomous or categorical DV
•Predictors are either continuous or categorical
Related methods
-Logit analysis/multiway frequency table analysis:
•Multiple categorical predictors and one categorical DV
-Discriminant analysis:
•Multiple categorical or continuous predictors and one categorical DV
•More assumptions than logistic regression
-Linear regression:
•Multiple categorical or continuous predictors and one continuous DV
Research questions
-Can we predict the presence or absence of a disorder/disease?
•E.g. label present as 1, absent as 0
-Can we predict an outcome using a set of predictors?
•How good is the model?
-Does an individual predictor increase or decrease the probability of an outcome?
•Related to importance of predictors
-Classification and prediction
-Simple categorical outcomes
•Can we predict the outcomes using categorical predictors?
Ordinary Least Squares (OLS)
-All forms of multiple regression are based on the same structural model: OLS
-3 important characteristics:
•Model is linear
•Residuals are assumed to be normally and homogeneously distributed
•Predicted scores (Ŷ) are on the same scale as the data (Y)
-Characteristics don’t apply to logistic regression
When representing results of a logistic regression graphically…
-Better to use the non-linear/sigmoidal model as it represents the essence of the data better
Important concepts in logistic regression:
Probability
-The likelihood of an event occurring
•If p = .80, there is an 80% chance of that event occurring
Important concepts in Logistic Regression
Predicted Odds
-The probability of an event occurring divided by the probability of it not occurring
•Prob. of event happening / prob. of event not happening
-If p = .80, the probability of it not occurring is .20
• = .80/.20
•=4
~The odds were 4:1 in favour of the event occurring
Important concepts in Logistic Regression
Logit
-Odds are asymmetric (they run from 0 to 1 below even odds but from 1 to infinity above), so the natural log of the odds can be used instead
•Log of odds = Logit
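-A minimal numeric sketch of these conversions (Python with numpy; the p = .80 example matches the cards above):
```python
# Minimal sketch: converting a probability to odds and to the logit (log odds).
import numpy as np

p = 0.80                    # probability of the event occurring
odds = p / (1 - p)          # predicted odds = p / (1 - p)  -> ~4 (i.e. odds of 4:1)
logit = np.log(odds)        # natural log of the odds        -> ~1.386

# Going back the other way: p = odds / (1 + odds) = e^logit / (1 + e^logit)
p_back = np.exp(logit) / (1 + np.exp(logit))
print(odds, logit, p_back)  # ~4.0, ~1.386, 0.8
```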
Important concepts in Logistic Regression
Odds ratio
-Relationship between the odds of an event occurring across levels of another variable
•By how much do the odds of Y change as X increases by 1 unit?
•Essentially a ratio of ratios
•It is the central measure of effect size here
~A good way of measuring the strength of the relationship
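-A small hypothetical example (the numbers are illustrative, not from the card): if the odds of the outcome are 2:1 when X = 3 and 4:1 when X = 4, the odds ratio for a one-unit increase in X is 4/2 = 2, i.e. each unit increase in X doubles the odds of the outcome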
The structural model
- Don’t really need to know about it
- Log odds turns a non-linear relationship into a linear one
-Our model is of p̂i rather than Ŷ
•p̂i is the estimated probability of the outcome occurring for case i
- Base e is an irrational constant, roughly 2.718
- B and C are model parameters
- The model relates our predictor(s) to the predicted scores (see the sketch below)
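-A minimal statement of the single-predictor form of the structural model, using the B, C and e from the card above (written in LaTeX):
\[ \ln\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = BX_i + C \qquad\Longleftrightarrow\qquad \hat p_i = \frac{e^{BX_i + C}}{1 + e^{BX_i + C}} \]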
Predicted odds vs Logit
Predicted odds
-Odds of being a case
-Odds = p/(1-p), which ranges from 0 to positive infinity
•When p = .50, the odds are 1 (even odds, 1:1)
•When p>.50, the odds are >1
-Varies exponentially with the predictor(s)
Predicted odds vs Logit
Logit
-Natural logarithm of the odds
•Ranges from negative to positive infinity
-Reflects odds of being a case but varies linearly with predictor(s)
-Not very interpretable
•If p = .8, the odds = 4
~But the logit = 1.386
Predicted odds vs Logit
Essentially…
- Logit = maths
- Predicted odds = descriptive
-Basically the same thing; they are just transformations of each other
Two kinds of regression coefficient in logistic regression
-Typical partial regression coefficients (B)
•Identical in function to OLS regression
•Indicates an increment in the logit given a unit increment in the predictor
-Odds ratios (e^B)
•Indicates the amount by which odds of being a case are multiplied given a unit increment in predictor (or change in level of predictor if the predictor is categorical)
•If B = 0, then e^B = 1 and the predictor has no relationship with the outcome
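-A minimal sketch of the two kinds of coefficient, assuming the statsmodels library and synthetic data (the data and all variable names here are illustrative, not from the lecture):
```python
# Minimal sketch: B coefficients (on the logit scale) and odds ratios (e^B)
# from a logistic regression, using statsmodels on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                     # one continuous predictor
p = 1 / (1 + np.exp(-(0.7 * x - 0.2)))     # true model on the probability scale
y = rng.binomial(1, p)                     # dichotomous outcome (0/1)

model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
B = model.params                           # partial regression coefficients (logit scale)
odds_ratios = np.exp(B)                    # e^B: multiplicative change in the odds per unit of x
print(B)                                   # increment in the logit per unit increment in x
print(odds_ratios)                         # amount by which the odds of being a case are multiplied
```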
Estimating the parameters in a Logistic Regression Model
-OLS uses an analytic solution
•Regression coefficients are calculated from known equations
•Seeks to minimise the sum of (the residuals)^2
-Logistic regression uses maximum likelihood estimation, which is an iterative solution
•Regression coefficients are estimated by trial and error, and gradual adjustment
~Seeks to maximise the likelihood (L) of the observed values of Y given a model and using the observed values of the predictors
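-A minimal sketch of what "maximise the likelihood" means computationally, assuming scipy's general-purpose optimiser rather than the specialised algorithms real packages use (data and names are illustrative):
```python
# Minimal sketch: maximum likelihood estimation for logistic regression by
# numerically maximising the log likelihood (i.e. minimising its negative).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(1.0 * x + 0.5))))   # synthetic 0/1 outcome

def neg_log_likelihood(params):
    C, B = params
    p_hat = 1 / (1 + np.exp(-(B * x + C)))                 # predicted probability for each case
    # Log likelihood: sum over cases of y*ln(p_hat) + (1-y)*ln(1-p_hat)
    ll = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    return -ll

# Iterative solution: start from a guess and let the optimiser gradually adjust B and C
result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
C_hat, B_hat = result.x
print(B_hat, C_hat, -result.fun)   # estimated coefficients and the maximised log likelihood
```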
Evaluating the model: in OLS multiple regression
-For OLS, the sum of squares are the building blocks of model evaluation
• Focus is the partitioning of variance
~SStotal= SSregression + SSresidual
~R^2 = SSregression/SStotal
Evaluating the model: in Logistic Regression
-Logistic regression uses measures of deviance rather than sums of squares
•Deviance is essentially the lack of fit
-The focus is the lack of fit
•Null deviance, Dnull, is similar to SStotal
~Reflects the amount of variability in the data and the amount of deviance that could potentially be accounted for
•Model deviance, Dk, is similar to SSresidual
~Reflects the amount of variability in the data after accounting for prediction from k predictors
Log likelihoods
-A log likelihood (LL) value can be calculated for each model we test to evaluate the model
•Essential to calculate for each model
- The LL is a function of the probabilities of the observed and model-predicted outcomes for each case, summed over all cases
- Can directly compare the goodness-of-fit of different models using LL
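- For a dichotomous outcome the LL has a simple standard form (written in LaTeX, with p̂i the model-predicted probability for case i):
\[ LL = \sum_{i=1}^{n} \Big[ Y_i \ln(\hat p_i) + (1 - Y_i)\ln(1 - \hat p_i) \Big] \]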
Log Likelihood Ratio Tests
- Compute a LL (LLs) value for a smaller model (one with k parameters)
- Compute a LL (LLb) value for a bigger model (one with k+m parameters)
-Likelihood ratio test (LRT) statistic:
•Compares models hierarchically
•LRT = -2LLs - (-2LLb) = -2ln(Ls/Lb)
~If the smaller model is true, the LRT statistic is distributed as chi-squared with m df
Evaluating the model: deviances as likelihood ratios
-Deviance measures contrast LLs using LL ratios
•Dnull = -2ln(Lnull/Lperfect)
~This compares the maximum likelihood (L) for a model with no predictors (only an intercept) with a perfectly fitting model (aka the saturated model)
•Dk = -2ln(Lk/Lperfect)
~This compares the maximum likelihood (L) for a model with a set of k predictors with a perfectly fitting model
Testing model fit
-Won’t be asked directly but need to know it
-In logistic regression, we test the null deviance (from the model including only the constant) against the model deviance (from the model containing k predictors)
•As k increases, the difference between the null and model deviance will generally increase, improving model fit
•If there is no significant improvement in fit when we add the k predictors to the model, we need to question the inclusion of those predictors
•If there is no significant deterioration in fit when we remove the k predictors from the model, we need to question the inclusion of those predictors
~I.e. they are redundant in the context of the outcome variable
-Only accept additional predictors if they significantly improve the fit of the model
Different expressions of the equation for the likelihood ratio test
- Dnull - Dk
- -2LLnull - (-2LLk)
- -2ln(Lnull/Lk)
Testing model fit (2)
-Always relate prediction model to null model
•The null model might not be interesting, but we still need to try to do better than it
-An example:
•Null model deviance = 20.28
•Deviance of the model with 3 predictors = 16.19
•LR test statistic: 20.28-16.19 = 4.09
•Evaluate this number against the critical chi-squared value with 3 df
~3df because we have 3 predictors
~This is like the overall R^2 test based on the F statistic
•This is not significant, so there is no improvement in model fit when the predictors are included in the model
•A low-power technique, so lots of participants (PPs) would be needed
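-A quick check of the example above, assuming scipy (the 20.28, 16.19, 4.09 and 3 df come from the card itself):
```python
# Checking the worked example: is a deviance drop of 4.09 with 3 df significant?
from scipy.stats import chi2

lrt = 20.28 - 16.19               # LR test statistic = Dnull - Dk = 4.09
critical = chi2.ppf(0.95, df=3)   # critical chi-squared value at alpha = .05, 3 df (~7.81)
p_value = chi2.sf(lrt, df=3)      # probability of a value this large if the null model is true
print(lrt, critical, p_value)     # 4.09 < 7.81, p > .05 -> no significant improvement in fit
```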
Testing model fit: Caveat
-Need to be wary of sample size:
•With very large samples, trivial differences in model fit between models are likely to be significant, so adjusted fit indices sometimes need to be used
- Need more PPs than a linear regression
- The more complicated the model, the more PPs needed
Pseudo-R^2s
- Don’t need to know the formulae
- It is possible to evaluate a logistic regression model in an analogous way to standard MR using McFadden's ρ²
-Variations on this:
•Cox and Snell Index:
~Reaches a maximum of .75 when there is equal n in each category of the DV
•Nagelkerke Index:
~Divides Cox and Snell's R^2 by its maximum in order to achieve a measure that ranges from 0 to 1
-These do not indicate "variance accounted for" because the model is inherently heteroscedastic (not homoscedastic)
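-For reference (the card above notes the formulae need not be memorised), the standard definitions, written in LaTeX with LL = log likelihood and n = sample size, are:
\[ \rho^2_{McFadden} = 1 - \frac{LL_k}{LL_{null}}, \qquad R^2_{CS} = 1 - \left(\frac{L_{null}}{L_k}\right)^{2/n}, \qquad R^2_{Nagelkerke} = \frac{R^2_{CS}}{1 - L_{null}^{\,2/n}} \]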
Testing predictor significance
-Significance of regression coefficient:
•Wald statistic:
~Quite conservative
~Distributed as chi-squared with df = 1
~Similar interpretation as testing B or beta for significance in ordinary least squares regression
-> SEb is often overestimated, so there is a risk of Type II errors
•An alternative is to fit the model with all predictors and then see what happens to the fit when each predictor is removed in turn
~If removing an individual predictor has a significant effect on the model, rely on that test more than the Wald statistic
-Contribution to prediction:
•Compares the likelihood ratio with and without the predictor
•Chi-squared = D(k-1) - Dk
•Distributed as chi-squared with df = 1
•Gives similar information to sr^2 in OLS regression
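-The Wald statistic itself is not defined on the card; its usual form (a standard result) is the squared ratio of the coefficient to its standard error:
\[ \text{Wald} = \left(\frac{B}{SE_B}\right)^{2}, \quad \text{distributed as } \chi^2 \text{ with } df = 1 \]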
Model building
-Also possible to examine:
•Main effects model:
~Main effects
•Full factorial model:
~Main effects and interactions between factors, no interactions involving covariates
•Complete:
~Main effects and all interactions, including interactions with covariates
•Saturated:
~Same as complete only covariates treated as factors
A common technique in logistic regression is backward-stepwise selection
-Iterative process, as used in the lab (a sketch follows below):
•Begin with the complete model
•Remove non-significant variables
•Re-run the model and compare fit
•Ends up with a model with only significant predictors in it
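-A minimal sketch of this backward-stepwise loop, assuming statsmodels and synthetic data (the 0.05 cut-off, the data and all names are illustrative, not from the lab):
```python
# Minimal sketch: backward-stepwise logistic regression using LR tests.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                        # three candidate predictors
p = 1 / (1 + np.exp(-(0.9 * X[:, 0])))             # only the first predictor is truly related
y = rng.binomial(1, p)

kept = [0, 1, 2]                                   # begin with the complete model
while kept:
    full = sm.Logit(y, sm.add_constant(X[:, kept])).fit(disp=0)
    # For each remaining predictor, test the deterioration in fit when it is removed
    p_values = []
    for drop in kept:
        reduced_cols = [c for c in kept if c != drop]
        if reduced_cols:
            exog_r = sm.add_constant(X[:, reduced_cols])
        else:
            exog_r = np.ones((n, 1))               # intercept-only (null) model
        reduced = sm.Logit(y, exog_r).fit(disp=0)
        lrt = 2 * (full.llf - reduced.llf)         # = Dreduced - Dfull, df = 1
        p_values.append(chi2.sf(lrt, df=1))
    worst = max(p_values)
    if worst <= 0.05:                              # every remaining predictor is significant
        break
    kept.pop(p_values.index(worst))                # remove the least useful predictor and re-run

print("retained predictors:", kept)
```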
Assumptions and considerations
- Relatively assumption free
- Assumptions regarding distributions of predictor variables do not apply
Assumptions and considerations
Ratio of cases to variables
-Too few cases relative to number of predictor variables can be a problem
• May produce extremely large parameter estimates and standard errors
•Failure of convergence when combinations of discrete variables result in many cells with no cases
~Solution: collapse categories, delete the offending category, or delete the discrete variable if it is not important
-Extremely high parameter estimates and SEs indicate a problem
•Estimates increase with successive iterations, or the solution does not converge, while maximum likelihood estimation is being conducted
~Solution: increase number of cases or eliminate one or more predictors