Logistic Regression Flashcards
The linear regression model is…
- Ŷ = bX + c
• Ŷ is the predicted outcome variable
• b is the slope of the line
• X is the explanatory/predictor variable
• c is the intercept
The residuals are…
Y - Ŷ (the observed score minus the predicted score)
Logistic regression
-A non-linear regression model
•Has a dichotomous or categorical DV
•Predictors are either continuous or categorical
Related methods
-Logit analysis/multiway frequency table analysis:
•Multiple categorical predictors and one categorical DV
-Discriminant analysis:
•Multiple categorical or continuous predictors and one categorical DV
•More assumptions than logistic regression
-Linear regression:
•Multiple categorical or continuous predictors and one continuous DV
Research questions
-Can we predict the presence or absence of a disorder/disease?
•E.g. label present as 1, absent as 0
-Can we predict an outcome using a set of predictors?
•How good is the model?
-Does an individual predictor increase or decrease the probability of an outcome?
•Related to importance of predictors
-Classification and prediction
-Simple categorical outcomes
•Can we predict the outcomes using categorical predictors?
Ordinary Least Squares (OLS)
-All forms of multiple regression are based on the same structural model: OLS
-3 important characteristics:
•Model is linear
•Residuals are assumed to be normally and homogeneously distributed
•Predicted scores (Ŷ) are on the same scale as the data (Y)
-Characteristics don’t apply to logistic regression
When representing results of a logistic regression graphically…
-Better to use the non-linear/sigmoidal model as it represents the essence of the data better
Important concepts in logistic regression:
Probability
-The likelihood of an event occurring
•If p = .80, there is an 80% chance of that event occurring
Important concepts in Logistic Regression
Predicted Odds
-The probability of an event occurring divided by the probability of it not occurring
•Prob. of event happening / prob. of event not happening
-If p = .80, the probability of it not occurring is .20
• = .80/.20
•=4
~The odds were 4:1 in favour of the event occurring
Important concepts in Logistic Regression
Logit
-Odds are asymmetric (they run from 0 to 1 below even odds but from 1 to infinity above), so the natural log of the odds can be used instead
•Log of odds = Logit
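-A minimal numeric sketch of these conversions (Python with numpy; the p = .80 example matches the cards above):
```python
# Minimal sketch: converting a probability to odds and to the logit (log odds).
import numpy as np

p = 0.80                    # probability of the event occurring
odds = p / (1 - p)          # predicted odds = p / (1 - p)  -> ~4 (i.e. odds of 4:1)
logit = np.log(odds)        # natural log of the odds        -> ~1.386

# Going back the other way: p = odds / (1 + odds) = e^logit / (1 + e^logit)
p_back = np.exp(logit) / (1 + np.exp(logit))
print(odds, logit, p_back)  # ~4.0, ~1.386, 0.8
```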
Important concepts in Logistic Regression
Odds ratio
-Relationship between the odds of an event occurring across levels of another variable
•By how much do the odds of Y change as X increases by 1 unit?
•Essentially a ratio of ratios
•It is the central measure of effect size here
~A good way of measuring the strength of the relationship
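-A small hypothetical example (the numbers are illustrative, not from the card): if the odds of the outcome are 2:1 when X = 3 and 4:1 when X = 4, the odds ratio for a one-unit increase in X is 4/2 = 2, i.e. each unit increase in X doubles the odds of the outcome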
The structural model
- Don’t really need to know about it
- Log odds turns a non-linear relationship into a linear one
-Our model is of p̂i rather than Ŷ
•p̂i is the estimated probability of the outcome occurring for case i
- Base e is an irrational constant, roughly 2.718
- B and C are model parameters
- The model relates our predictor(s) to the predicted scores (see the sketch below)
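-A minimal statement of the single-predictor form of the structural model, using the B, C and e from the card above (written in LaTeX):
\[ \ln\!\left(\frac{\hat p_i}{1-\hat p_i}\right) = BX_i + C \qquad\Longleftrightarrow\qquad \hat p_i = \frac{e^{BX_i + C}}{1 + e^{BX_i + C}} \]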
Predicted odds vs Logit
Predicted odds
-Odds of being a case
-Odds = p/(1-p), which ranges from 0 to positive infinity
•When p = .50, the odds are 1 (even odds, 1:1)
•When p>.50, the odds are >1
-Varies exponentially with the predictor(s)
Predicted odds vs Logit
Logit
-Natural logarithm of the odds
•Ranges from negative to positive infinity
-Reflects odds of being a case but varies linearly with predictor(s)
-Not very interpretable
•If p = .8, the odds = 4
~But the logit = 1.386
Predicted odds vs Logit
Essentially…
- Logit = maths
- Predicted odds = descriptive
-Basically the same thing; they are just transformations of each other
Two kinds of regression coefficient in logistic regression
-Typical partial regression coefficients (B)
•Identical in function to OLS regression
•Indicates an increment in the logit given a unit increment in the predictor
-Odds ratios (e^B)
•Indicates the amount by which odds of being a case are multiplied given a unit increment in predictor (or change in level of predictor if the predictor is categorical)
•If B = 0, then e^B = 1 and the predictor has no relationship with the outcome
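-A minimal sketch of the two kinds of coefficient, assuming the statsmodels library and synthetic data (the data and all variable names here are illustrative, not from the lecture):
```python
# Minimal sketch: B coefficients (on the logit scale) and odds ratios (e^B)
# from a logistic regression, using statsmodels on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)                     # one continuous predictor
p = 1 / (1 + np.exp(-(0.7 * x - 0.2)))     # true model on the probability scale
y = rng.binomial(1, p)                     # dichotomous outcome (0/1)

model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
B = model.params                           # partial regression coefficients (logit scale)
odds_ratios = np.exp(B)                    # e^B: multiplicative change in the odds per unit of x
print(B)                                   # increment in the logit per unit increment in x
print(odds_ratios)                         # amount by which the odds of being a case are multiplied
```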
Estimating the parameters in a Logistic Regression Model
-OLS uses an analytic solution
•Regression coefficients are calculated from known equations
•Seeks to minimise the sum of (the residuals)^2
-Logistic regression uses maximum likelihood estimation, which is an iterative solution
•Regression coefficients are estimated by trial and error, and gradual adjustment
~Seeks to maximise the likelihood (L) of the observed values of Y given a model and using the observed values of the predictors
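-A minimal sketch of what "maximise the likelihood" means computationally, assuming scipy's general-purpose optimiser rather than the specialised algorithms real packages use (data and names are illustrative):
```python
# Minimal sketch: maximum likelihood estimation for logistic regression by
# numerically maximising the log likelihood (i.e. minimising its negative).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(1.0 * x + 0.5))))   # synthetic 0/1 outcome

def neg_log_likelihood(params):
    C, B = params
    p_hat = 1 / (1 + np.exp(-(B * x + C)))                 # predicted probability for each case
    # Log likelihood: sum over cases of y*ln(p_hat) + (1-y)*ln(1-p_hat)
    ll = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    return -ll

# Iterative solution: start from a guess and let the optimiser gradually adjust B and C
result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
C_hat, B_hat = result.x
print(B_hat, C_hat, -result.fun)   # estimated coefficients and the maximised log likelihood
```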
Evaluating the model: in OLS multiple regression
-For OLS, the sum of squares are the building blocks of model evaluation
• Focus is the partitioning of variance
~SStotal= SSregression + SSresidual
~R^2 = SSregression/SStotal
Evaluating the model: in Logistic Regression
-Logistic regression uses measures of deviance rather than sums of squares
•Deviance is essentially the lack of fit
-The focus is the lack of fit
•Null deviance, Dnull, is similar to SStotal
~Reflects the amount of variability in the data and the amount of deviance that could potentially be accounted for
•Model deviance, Dk, is similar to SSresidual
~Reflects the amount of variability in the data after accounting for prediction from k predictors
Log likelihoods
-A log likelihood (LL) value can be calculated for each model we test to evaluate the model
•Essential to calculate for each model
- The LL is a function of the probabilities of the observed and model-predicted outcomes for each case, summed over all cases
- Can directly compare the goodness-of-fit of different models using LL
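- For a dichotomous outcome the LL has a simple standard form (written in LaTeX, with p̂i the model-predicted probability for case i):
\[ LL = \sum_{i=1}^{n} \Big[ Y_i \ln(\hat p_i) + (1 - Y_i)\ln(1 - \hat p_i) \Big] \]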
Log Likelihood Ratio Tests
- Compute a LL (LLs) value for a smaller model (one with k parameters)
- Compute a LL (LLb) value for a bigger model (one with k+m parameters)
-Likelihood ratio test (LRT) statistic:
•Compares models hierarchically
•LRT = -2LLs - (-2LLb) = -2ln(Ls/Lb)
~If the smaller model is true, the LRT statistic is distributed as chi-squared with m df
Evaluating the model: deviances as likelihood ratios
-Deviance measures contrast LLs using LL ratios
•Dnull = -2ln(Lnull/Lperfect)
~This compares the maximum likelihood (L) for a model with no predictors (only an intercept) with a perfectly fitting model (aka the saturated model)
•Dk = -2ln(Lk/Lperfect)
~This compares the maximum likelihood (L) for a model with a set of k predictors with a perfectly fitting model
Testing model fit
-Won’t be asked directly but need to know it
-In logistic regression, we test the null deviance (from the model including only the constant) against the model deviance (from the model containing k predictors)
•As k increases, the difference between the null and model deviance will generally increase, improving model fit
•If there is no significant improvement in fit when we add the k predictors to the model, we need to question the inclusion of those predictors
•If there is no significant deterioration in fit when we remove the k predictors from the model, we need to question the inclusion of those predictors
~I.e. they are redundant in the context of the outcome variable
-Only accept additional predictors if they significantly improve the fit of the model
Different expressions of the equation for the likelihood ratio test
- Dnull - Dk
- -2LLnull - (-2LLk)
- -2ln(Lnull/Lk)
Testing model fit (2)
-Always relate prediction model to null model
•The null model might not be interesting, but we still need to try to do better than it
-An example:
•Null model deviance = 20.28
•Deviance of the model with 3 predictors = 16.19
•LR test statistic: 20.28-16.19 = 4.09
•Evaluate this number against the critical chi-squared value with 3 df
~3df because we have 3 predictors
~This is like the overall R^2 test based on the F statistic
•This is not significant, so there is no improvement in model fit when the predictors are included in the model
•A low-power technique, so lots of participants (PPs) would be needed
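-A quick check of the example above, assuming scipy (the 20.28, 16.19, 4.09 and 3 df come from the card itself):
```python
# Checking the worked example: is a deviance drop of 4.09 with 3 df significant?
from scipy.stats import chi2

lrt = 20.28 - 16.19               # LR test statistic = Dnull - Dk = 4.09
critical = chi2.ppf(0.95, df=3)   # critical chi-squared value at alpha = .05, 3 df (~7.81)
p_value = chi2.sf(lrt, df=3)      # probability of a value this large if the null model is true
print(lrt, critical, p_value)     # 4.09 < 7.81, p > .05 -> no significant improvement in fit
```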
Testing model fit: Caveat
-Need to be wary of sample size:
•With very large samples, trivial differences in model fit between models are likely to be significant, so adjusted fit indices sometimes need to be used
- Need more PPs than a linear regression
- The more complicated the model, the more PPs needed
Pseudo-R^2s
- Don’t need to know the formulae
- It is possible to evaluate a logistic regression model in an analogous way to standard MR using McFadden's ρ²
-Variations on this:
•Cox and Snell Index:
~Reaches a maximum of .75 when there is equal n in each category of the DV
•Nagelkerke Index:
~Divides Cox and Snell's R^2 by its maximum in order to achieve a measure that ranges from 0 to 1
-These do not indicate "variance accounted for" because the model is inherently heteroscedastic (not homoscedastic)
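-For reference (the card above notes the formulae need not be memorised), the standard definitions, written in LaTeX with LL = log likelihood and n = sample size, are:
\[ \rho^2_{McFadden} = 1 - \frac{LL_k}{LL_{null}}, \qquad R^2_{CS} = 1 - \left(\frac{L_{null}}{L_k}\right)^{2/n}, \qquad R^2_{Nagelkerke} = \frac{R^2_{CS}}{1 - L_{null}^{\,2/n}} \]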
Testing predictor significance
-Significance of regression coefficient:
•Wald statistic:
~Quite conservative
~Distributed as chi-squared with df = 1
~Similar interpretation as testing B or beta for significance in ordinary least squares regression
-> SEb is often overestimated, so there is a risk of Type II errors
•An alternative is to fit the model with all predictors and then see what happens to the fit when each predictor is removed in turn
~If removing an individual predictor has a significant effect on the model, rely on that test more than the Wald statistic
-Contribution to prediction:
•Compares the likelihood ratio with and without the predictor
•Chi-squared = D(k-1) - Dk
•Distributed as chi-squared with df = 1
•Gives similar information to sr^2 in OLS regression
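-The Wald statistic itself is not defined on the card; its usual form (a standard result) is the squared ratio of the coefficient to its standard error:
\[ \text{Wald} = \left(\frac{B}{SE_B}\right)^{2}, \quad \text{distributed as } \chi^2 \text{ with } df = 1 \]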
Model building
-Also possible to examine:
•Main effects model:
~Main effects
•Full factorial model:
~Main effects and interactions between factors, no interactions involving covariates
•Complete:
~Main effects and all interactions, including interactions with covariates
•Saturated:
~Same as complete only covariates treated as factors
A common technique in logistic regression is backward-stepwise selection
-Iterative process, as used in the lab (a sketch follows below):
•Begin with the complete model
•Remove non-significant variables
•Re-run the model and compare fit
•Ends up with a model with only significant predictors in it
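-A minimal sketch of this backward-stepwise loop, assuming statsmodels and synthetic data (the 0.05 cut-off, the data and all names are illustrative, not from the lab):
```python
# Minimal sketch: backward-stepwise logistic regression using LR tests.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                        # three candidate predictors
p = 1 / (1 + np.exp(-(0.9 * X[:, 0])))             # only the first predictor is truly related
y = rng.binomial(1, p)

kept = [0, 1, 2]                                   # begin with the complete model
while kept:
    full = sm.Logit(y, sm.add_constant(X[:, kept])).fit(disp=0)
    # For each remaining predictor, test the deterioration in fit when it is removed
    p_values = []
    for drop in kept:
        reduced_cols = [c for c in kept if c != drop]
        if reduced_cols:
            exog_r = sm.add_constant(X[:, reduced_cols])
        else:
            exog_r = np.ones((n, 1))               # intercept-only (null) model
        reduced = sm.Logit(y, exog_r).fit(disp=0)
        lrt = 2 * (full.llf - reduced.llf)         # = Dreduced - Dfull, df = 1
        p_values.append(chi2.sf(lrt, df=1))
    worst = max(p_values)
    if worst <= 0.05:                              # every remaining predictor is significant
        break
    kept.pop(p_values.index(worst))                # remove the least useful predictor and re-run

print("retained predictors:", kept)
```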
Assumptions and considerations
- Relatively assumption free
- Assumptions regarding distributions of predictor variables do not apply
Assumptions and considerations
Ratio of cases to variables
-Too few cases relative to number of predictor variables can be a problem
• May produce extremely large parameter estimates and standard errors
•Failure of convergence when combinations of discrete variables result in many cells with no cases
~Solution: collapse categories, delete the offending category, or delete the discrete variable if it is not important
-Extremely high parameter estimates and SEs indicate a problem
•Estimates increase with successive iterations, or the solution does not converge, while maximum likelihood estimation is being conducted
~Solution: increase number of cases or eliminate one or more predictors