Wk 7 - Logistic Regression Flashcards
What is the formula for linear regression? (x1, plus define components)
y’ = bx + c
predicted y = slope times x + constant (y intercept)
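A minimal sketch of that prediction formula in Python (the slope b and intercept c values are made up for illustration):

```python
# y' = b*x + c: predicted score = slope * predictor + intercept
b, c = 0.5, 2.0                               # hypothetical slope and intercept
for x in [0, 1, 2, 3]:
    y_pred = b * x + c
    print(f"x = {x}, predicted y = {y_pred}")  # each unit of x adds b to y'
```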
What does b signify in regression formulas? (x3)
Slope
Coefficient
Amount of change in y for every unit change in x
How is the fit of a regression line maximised/evaluated? (x2)
Least squares criterion:
Want minimal residuals (diff between scores and line)
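A sketch of the least squares idea using numpy (the data values are invented; np.polyfit returns the b and c that minimise the sum of squared residuals):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])   # made-up scores

b, c = np.polyfit(x, y, deg=1)            # least squares line y' = b*x + c
residuals = y - (b * x + c)               # differences between scores and the line
print(b, c, np.sum(residuals ** 2))       # the fitted line minimises this sum
```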
What 2 questions can we ask of any given regression model?
Q1: Does the predictor variable do anything useful?
Q2: Does the model provide a good fit to the data?
What can we conclude if b = 0 in a regression model? (x1)
Changes in x produce no change/effect in y
How do we assess model fit in linear regression? (x2)
Calculate r-square (proportion of variance accounted for)
And test for significance
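A sketch of the r-square calculation from the residuals, assuming the same kind of fitted line as in the sketch above (data values are invented):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
b, c = np.polyfit(x, y, 1)

ss_res = np.sum((y - (b * x + c)) ** 2)   # residual (unexplained) variation
ss_tot = np.sum((y - y.mean()) ** 2)      # total variation in y
r_square = 1 - ss_res / ss_tot            # proportion of variance accounted for
print(r_square)
# significance is then tested (e.g. scipy.stats.linregress reports a p-value for the slope)
```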
How are b and r-square related in linear regression? (x3)
Generally linked to some degree,
But if you move all scores similar distances from line,
b stays the same while r-square drops sharply
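A rough demonstration of that point with made-up data: spreading the scores symmetrically away from the line leaves b roughly unchanged while r-square falls.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_tight = 0.5 * x + 2 + rng.normal(0, 0.2, x.size)   # scores close to the line
y_loose = 0.5 * x + 2 + rng.normal(0, 3.0, x.size)   # scores pushed far from the line

for y in (y_tight, y_loose):
    b, c = np.polyfit(x, y, 1)
    r2 = 1 - np.sum((y - (b * x + c)) ** 2) / np.sum((y - y.mean()) ** 2)
    print(f"b = {b:.2f}, r-square = {r2:.2f}")        # b similar, r-square much lower
```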
What is the major limitation of linear regression method? (x2)
Can’t deal with a categorical outcome (DV)
‘All or nothing’ scores, rather than continuous predictions/outcomes available
How does what we are trying to predict change when using categorical rather than continuous DV/y variable? (x2)
Want to assess the change in PROBABILITY of y given b change in x
Rather than change in y scores
What statistical method enables regression with 2 categorical outcomes? (x1)
Binary logistic regression
In linear regression:
Predictors are continuous or categorical
Outcome is continuous
Predictors assumed normally distributed
Deals with linear relationships among variables
Whereas in logistic regression? (x4)
Predictors are continuous or categorical
Outcome is categorical
Predictors not assumed normally distributed
Deals with non-linear relationships among variables
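A minimal sketch of fitting a binary logistic regression in Python, assuming the statsmodels package and an invented dataset (one continuous predictor, one 0/1 outcome):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)                    # continuous predictor
p = 1 / (1 + np.exp(-(0.8 * x - 0.2)))      # true probabilities (for simulation only)
y = rng.binomial(1, p)                      # categorical (0/1) outcome

X = sm.add_constant(x)                      # adds the intercept column
model = sm.Logit(y, X).fit()
print(model.summary())                      # coefficients, Wald tests, CIs, pseudo R-square
```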
What are the 2 applications/questions of logistic regression? (x1, x2)
Predict category people belong to, given predictors
Identify predictors of particular (categorical) outcome variable
*Outcomes are exhaustive and mutually exclusive
How does the linear regression model change for logistic regression? (x3)
y’ becomes a logistic function (s-shaped curve):
y’ = 1 / (1 + e^(-v))
where v is the linear equation (bx + c)
In logistic regression, if we substitute our largest x value for v… (x1)
And if v is very small… (x1)
y gets close to 1
y gets close to 0
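A small sketch of the logistic function itself (pure numpy, values made up) showing its behaviour at the extremes of v:

```python
import numpy as np

def logistic(v):
    return 1 / (1 + np.exp(-v))       # s-shaped curve, bounded between 0 and 1

for v in [-10, -2, 0, 2, 10]:
    print(v, round(logistic(v), 4))   # very negative v -> near 0, large v -> near 1, v = 0 -> 0.5
```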
What is the statistical question asked by logistic regression? (x2)
How many units change in x does it take
To shift the odds towards favouring a particular category of y?
What are odds? (x1)
Expression of relative probability of an event happening vs. not happening
How are odds calculated? (x1)
What happens to probability if you double the odds of an event occurring? (x2)
odds = p(event) / (1 - p(event))
Increases, but with diminishing returns
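A sketch of the odds calculation and the ‘diminishing returns’ point, using made-up probabilities:

```python
p = 0.5
odds = p / (1 - p)                        # odds = p(event) / (1 - p(event)) = 1.0

for _ in range(4):
    odds *= 2                             # double the odds each time
    p_new = odds / (1 + odds)             # convert the odds back to a probability
    print(f"odds = {odds:>4.1f}, p = {p_new:.3f}")
# p climbs 0.667 -> 0.800 -> 0.889 -> 0.941: each doubling adds less and less
```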
Interpreting odds:
If odds = 1… (x1)
Two outcomes are equally likely
Interpreting odds:
If odds > 1… (x1)
Target outcome is more probable than the alternative
Interpreting odds:
If odds < 1… (x1)
Target outcome is less probable than the alternative
What is the convenient way to compare odds of 2 events? (x2)
And what does this tell us? (x1)
Take their ratio
ie, divide one by the other
How many times greater the odds of the event are for one group compared to another
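A sketch of comparing two groups’ odds by taking their ratio (the probabilities are invented):

```python
p_group_a = 0.60                       # probability of the event in group A
p_group_b = 0.25                       # probability of the event in group B

odds_a = p_group_a / (1 - p_group_a)   # 1.5
odds_b = p_group_b / (1 - p_group_b)   # 0.333...

odds_ratio = odds_a / odds_b           # 4.5: the odds are 4.5 times greater in group A
print(odds_ratio)
```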
Explain changes in odds related to the b (coefficient) in logistic regression? (x2)
In linear, b = change in outcome value
In logistic, b = change in log odds brought by unit change in x
Explain how to interpret changes in log odds in logistic regression? (x2)
Have to EXPONENTIATE the coefficient
*as exp/log undo each other (in the same way multiply/divide do)
Explain what Exp(b) represents in logistic regression? (x3)
Rather than a multiplier of x (as in linear regression)
Exp(b) is the multiplier/proportion of change on the old odds
ie, what we need to multiply old odds by to get new odds
What is the impact of exp(b) on successive odds calculations? (x1)
Which is handy, as… (x1)
For each unit change in x, the odds are multiplied by the same amount, but the resulting change in probability shows diminishing returns
This is what gives us the characteristic s-shaped curve
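A sketch of Exp(b) as a multiplier on the old odds, and of why repeated multiplication traces out the s-shaped probability curve (b and the starting odds are made up):

```python
import numpy as np

b = 0.7                                   # hypothetical logistic coefficient
exp_b = np.exp(b)                         # multiplier applied to the odds per unit of x

odds = 0.05                               # odds at some starting value of x
for step in range(10):
    p = odds / (1 + odds)                 # probability implied by the current odds
    print(f"x + {step}: odds = {odds:7.3f}, p = {p:.3f}")
    odds *= exp_b                         # new odds = old odds * Exp(b)
# the printed p values trace out the characteristic s-shaped curve
```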
Despite technical changes from direct predictions to changes in odds, linear and logistic regression remain conceptually similar in that… (x1)
Coefficients reflect predictive utility of our predictor variables
What are the 2 key questions in evaluating a logistic regression model?
Which are answered with which 2 tests?
Does the predictor variable(s) do anything useful?
Does the model provide a good fit to the data?
Significance tests on the coefficients (Wald test against b = 0)
Evaluation of R2 via chi-square test against R2 = 0 (model explains zero variance)
In what 2 ways can the null hypotheses for testing b (the significance of the coefficient) be expressed in logistic regression?
b coefficients = 0: No change in log odds with increases in predictor
Exp b = 1: No proportionate change in odds with increases in predictor
What 2 specific tests are used to evaluate coefficients in logistic regression? (plus explain/interpretation, x3, x3)
Wald test:
Form of chi-square testing b (change in log odds)
*significance means reject the null - evidence for predictive utility
95% CIs:
Interval we are 95% sure contains true value of Exp b
*If includes the value 1, evidence of no change in odds for predictor change
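A sketch of pulling the Wald tests and the exponentiated 95% CIs out of a fitted statsmodels model (data simulated, as in the earlier sketch):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * x - 0.2))))

result = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

print(result.pvalues)              # Wald tests of each b against 0
print(np.exp(result.params))       # Exp(b) for each predictor
print(np.exp(result.conf_int()))   # 95% CIs for Exp(b): evidence of an effect if 1 is excluded
```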
What 2 tests evaluate R-square (model fit) in logistic regression?
And we should… (x1)
Because…(x1)
Cox & Snell
Nagelkerke
Report both
As first is conservative and second is liberal
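SPSS reports both directly; as a sketch of where they come from, here is the calculation from the null-model and full-model log-likelihoods (the LL values below are invented):

```python
import numpy as np

n = 200            # sample size
ll_null = -135.0   # log-likelihood of the baseline (no-predictor) model - invented value
ll_model = -110.0  # log-likelihood of the fitted model - invented value

cox_snell = 1 - np.exp(2 * (ll_null - ll_model) / n)      # conservative: maximum is below 1
nagelkerke = cox_snell / (1 - np.exp(2 * ll_null / n))    # rescaled so the maximum is 1
print(round(cox_snell, 3), round(nagelkerke, 3))
```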
How do we assess model accuracy (ie, mispredictions) in logistic regression? (x2)
Omnibus test of model coefficients
Hosmer and Lemeshow test
What is involved in omnibus test of model coefficients in logistic regression? (x3)
Chi-square test of whether all predictors combined account for any variance
Test against the H0 that R2 = 0
Significant result implies the model does better than the baseline (no-predictor) model
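A sketch of that omnibus (likelihood-ratio) chi-square using the same kind of log-likelihood values (again invented; k is the number of predictors):

```python
from scipy import stats

ll_null, ll_model, k = -135.0, -110.0, 2    # invented log-likelihoods, 2 predictors

chi_square = 2 * (ll_model - ll_null)       # improvement of the model over the baseline
p_value = stats.chi2.sf(chi_square, df=k)   # significant -> predictors explain some variance
print(chi_square, p_value)
```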
What is involved in the Hosmer and Lemeshow test of R-square in logistic regression? (x3)
Chi-square test of how closely model predicts outcome categories
Test against the H0 that the model’s predicted frequencies match the observed data (good fit)
Significant result implies discrepancies between model and data
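SPSS computes this as part of its output; a rough hand-rolled sketch of the idea (group cases by predicted probability, then chi-square the observed vs expected counts) might look like the following. The grouping and degrees-of-freedom choices follow the usual convention but are simplified.

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p_hat, groups=10):
    """Rough sketch: compare observed and model-expected outcome counts
    within groups of cases sorted by predicted probability."""
    order = np.argsort(p_hat)
    y, p_hat = np.asarray(y)[order], np.asarray(p_hat)[order]
    chi_square = 0.0
    for idx in np.array_split(np.arange(len(y)), groups):   # roughly equal-sized groups
        obs_events, exp_events = y[idx].sum(), p_hat[idx].sum()
        obs_non, exp_non = len(idx) - obs_events, len(idx) - exp_events
        chi_square += (obs_events - exp_events) ** 2 / exp_events
        chi_square += (obs_non - exp_non) ** 2 / exp_non
    return chi_square, stats.chi2.sf(chi_square, df=groups - 2)

# e.g. hosmer_lemeshow(y, result.predict(X)) with a fitted statsmodels model
```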
How do we assess the proportion of correct classifications by a logistic regression model? (x4)
Cases originally all placed in most common category
*% correctly classified reported
Then predicted classifications are compared to empirical data
*% correctly classified reported
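A sketch of that classification-accuracy comparison with made-up outcomes and model probabilities:

```python
import numpy as np

y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])                 # observed categories (invented)
p_hat = np.array([.1, .2, .3, .2, .6, .4, .7, .8, .4, .9])   # model's predicted probabilities

baseline = max(np.mean(y == 0), np.mean(y == 1))   # put everyone in the most common category
predicted = (p_hat >= 0.5).astype(int)             # classify at the 0.5 cut-off
model_accuracy = np.mean(predicted == y)

print(f"baseline % correct: {baseline:.0%}")       # Block 0 figure
print(f"model % correct:    {model_accuracy:.0%}") # Block 1 figure, compared to baseline
```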
What 3 stages of info does SPSS output give us for logistic regression?
Preliminary info
Block 0 output
Block 1+ output
What preliminary info should we double check in SPSS output for logistic regression?
How outcome categories have been coded
What info is given in Block 0 output in SPSS for logistic regression? (x3)
All cases placed in most frequent category in data
Gives baseline model (no predictors) for comparison with more complex models
*Not theoretically interesting, but important
What info is given in Block 1+ output in SPSS for logistic regression? (x3)
Summary and tests of R-square
Tests of variables in Logistic Regression equation (e.g., coefficients)
Info on classification accuracy
What are the 3 methods of conducting logistic regression?
Direct or Enter method
Sequential logistic regression
Stepwise logistic regression
Describe the Direct or Enter method of logistic regression (x3)
As linear MR - all predictors entered simultaneously (ie, Block 1)
Used to evaluate relative strength of predictors
Doesn’t test hypotheses about order/importance of each predictor
Describe the Sequential logistic regression method (x3)
As HMR - researcher chooses order of entering predictors in separate blocks
Determines predictive value of each variable in context of whole model
First predictor explains max possible variance, more added if they improve fit
Describe the Stepwise logistic regression method (x2)
Which is done in what 2 ways?
Predictors entered sequentially, included in model on statistical grounds
Often more exploratory/for hypothesis generation
Forward method - start with no predictors; strongest entered first, then progressively weaker predictors added
Backward method - start with the full model; weakest removed first, then progressively stronger predictors removed
How is logistic model fit evaluated during stages of Stepwise regression? (x3)
Does adding this predictor significantly improve model fit?
Does removing this predictor significantly harm model fit?
Evaluated via nested model comparisons
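A sketch of the nested model comparison used at each stepwise stage: fit the model with and without the candidate predictor and compare log-likelihoods with a chi-square (data simulated, predictor names invented):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=(2, 300))                  # two invented predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(1.0 * x1 + 0.3 * x2))))

smaller = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)                         # without x2
larger = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)   # with x2

lr_chi_square = 2 * (larger.llf - smaller.llf)      # improvement from adding x2
p_value = stats.chi2.sf(lr_chi_square, df=1)        # 1 extra parameter
print(lr_chi_square, p_value)                       # significant -> adding the predictor improves fit
```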