Module 3 - Logistic Regression Flashcards

1
Q

What type of IVs and DVs can we have in logistic regression?

A

IVs can be categorical or continuous

DVs must be categorical

2
Q

Can logistic regression imply any causation?

A

No

3
Q

How do we calculate the odds ratio?

A

The odds of the event occurring in one group divided by the odds of the event occurring in the other group
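A quick worked sketch in Python, with made-up counts, of how the odds ratio comes out of a 2x2 table:

```python
# Hypothetical counts: event vs. no event in two groups.
group1_event, group1_no_event = 30, 70
group2_event, group2_no_event = 15, 85

odds_group1 = group1_event / group1_no_event   # 30/70 ~ 0.43
odds_group2 = group2_event / group2_no_event   # 15/85 ~ 0.18

odds_ratio = odds_group1 / odds_group2         # ~ 2.43
print(f"Odds ratio: {odds_ratio:.2f}")         # in odds terms, the event is
                                               # ~2.4x more likely in group 1
```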

4
Q

What is complete separation? What example does Field (2013) give?

A

When our predictor variables perfectly predict the outcome, e.g. Field's (2013) example of distinguishing cats from burglars. It is problematic because it leaves no in-between data from which to estimate probabilities, which is the whole point of logistic regression
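A minimal numpy sketch (with made-up weights in the spirit of the cats-and-burglars example) of why separation is a problem: the likelihood keeps improving as the slope grows without bound, so there is no finite maximum-likelihood estimate.

```python
import numpy as np

# Hypothetical data: weight perfectly separates the outcomes (0 = cat, 1 = burglar).
x = np.array([4., 5., 6., 7., 40., 60., 70., 80.])
y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])

def log_likelihood(b0, b1):
    p = 1 / (1 + np.exp(-(b0 + b1 * x)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# With the decision boundary fixed between the groups, the fit keeps improving
# as the slope grows, so software reports huge coefficients and huge SEs.
for b1 in [0.1, 1.0, 10.0, 30.0]:
    print(b1, log_likelihood(b0=-20 * b1, b1=b1))  # approaches the ceiling of 0
```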

5
Q

What is overdispersion?

A

Overdispersion is when observed variance is higher than expected. It is caused by a violation of the assumption of independence and causes standard errors to become too small

6
Q

What do we use the log-likelihood statistic for?

A

To determine how well our model fits the data

7
Q

How do we calculate the deviance statistic?

A

Deviance = -2 × log-likelihood (often written -2LL)
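A minimal numpy sketch, with assumed outcomes and predicted probabilities, showing the log-likelihood and the deviance derived from it:

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0])              # toy observed binary outcomes
p = np.array([0.8, 0.3, 0.6, 0.9, 0.2])    # hypothetical predicted probabilities

# Log-likelihood: sum of log probabilities assigned to the observed outcomes
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
deviance = -2 * log_likelihood              # the deviance statistic (-2LL)
print(log_likelihood, deviance)
```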

8
Q

What is parsimony? How do we achieve it in logistic regression?

A

Parsimony is the preference for simpler explanations of phenomena over more complicated ones. We achieve parsimony by including all variables in the model and then progressively removing the ones that aren't relevant/contributing an effect, as in the sketch below.
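As a rough illustration only (simulated data stand in for real variables, and the 0.05 cut-off on Wald p-values is a simplifying assumption, not the module's prescribed criterion), a statsmodels sketch of progressively removing weak predictors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                      # three candidate predictors
y = (rng.random(200) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)  # only x0 matters

cols = [0, 1, 2]
while True:
    fit = sm.Logit(y, sm.add_constant(X[:, cols])).fit(disp=0)
    p_vals = fit.pvalues[1:]                       # skip the intercept
    worst = int(np.argmax(p_vals))
    if p_vals[worst] < 0.05 or len(cols) == 1:     # stop when all look relevant
        break
    cols.pop(worst)                                # drop the weakest predictor
print("retained predictors:", cols)
```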

9
Q

Which assumption is broken by logistic regression? How do we overcome it?

A

The assumption of linearity is broken by logistic regression. We overcome it by using the log odds (the 'logit') of the outcome rather than the raw outcome. Thus, the assumption of linearity in logistic regression is that there is a linear relationship between any continuous predictors and the logit of the outcome variable.

10
Q

The process of cleaning data and checking for assumptions is the same in logistic regression, except for an additional 5 steps. What are they?

A
  1. Evaluate model parsimony
  2. Check linearity against the logit of the outcome variable
  3. Assess the data using deviance statistics
  4. Check for complete separation
  5. Check for overdispersion
11
Q

What type of regression do we do if we want to check the model for parsimony? Do we check for interactions?

A

Hierarchical: put variables 1 and 2 in the first block, and add variable 3 in the second block. Only check for interactions if we have the theory to back it up.

12
Q

What does the Wald statistic test? How do we calculate it?

A

The Wald statistic is similar to the t-test: it tests the null hypothesis that b = 0. It is biased when b is large, because a large b inflates the standard error, which makes the Wald statistic underestimated.
Wald = b / SEb
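A minimal sketch with assumed values for b and its standard error:

```python
from scipy import stats

b = 0.85      # hypothetical coefficient estimate
se_b = 0.30   # hypothetical standard error

wald_z = b / se_b                          # Wald = b / SEb
p_value = 2 * stats.norm.sf(abs(wald_z))   # two-tailed test of H0: b = 0
print(wald_z, p_value)
```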

13
Q

In terms of the relationship between a predictor and outcome variable, what happens when the odds ratio is less than 1 vs. when it is greater than 1?

A

When the odds ratio > 1: as the predictor increases, the odds of the outcome occurring increase
When the odds ratio < 1: as the predictor increases, the odds of the outcome occurring decrease

14
Q

Explain graphically why we can’t just use multiple linear regression.

A

Multiple linear regression creates a straight line of best fit when displayed graphically, which can predict impossible values (below 0 or above 1) for a categorical outcome. Logistic regression instead provides a sigmoidal curve that keeps predicted probabilities between 0 and 1, which suits the data much better.
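A small numpy sketch of the point: a straight line escapes the 0-1 range, while the sigmoidal (logistic) curve always yields a valid probability.

```python
import numpy as np

x = np.linspace(-6, 6, 7)
linear = 0.5 + 0.2 * x            # a line of best fit: drifts below 0 / above 1
sigmoid = 1 / (1 + np.exp(-x))    # logistic curve: stays between 0 and 1

for xi, li, si in zip(x, linear, sigmoid):
    print(f"x={xi:5.1f}  linear={li:5.2f}  sigmoid={si:4.2f}")
```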

15
Q

What is the basic example of a research question suited to logistic regression?

A

“Can one level of the outcome variable (coded 0) be discriminated from the other level (coded 1) using a set of predictors?”

16
Q

What other questions does logistic regression answer?

A

Which variables predict which outcome
How variables affect the outcome
Does a predictor variable increase or decrease the probability of an outcome, or does it have no effect in discriminating between the two levels of the binary outcome?

17
Q

What is logistic regression?

A

Logistic regression is used to predict non-continuous (categorical) variables
Also known as logit analysis
Can be multinomial, ordinal, or binary
Similar to discriminant analysis (in MANOVA) but differs in its assumptions, so the two are not interchangeable
Logistic regression doesn't try to predict an outcome score. Rather, it predicts the probability that an event will occur given the predictor values (see the sketch below)
Predicts the outcome by creating a variate composed of the IVs
Variate = a measure composed of 2+ variables
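A minimal statsmodels sketch with simulated data (the variable names and the simulated effect are assumptions for illustration, not from the module):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
hours = rng.uniform(0, 10, 100)                    # hypothetical predictor
true_p = 1 / (1 + np.exp(-(-3 + 0.8 * hours)))     # assumed true probabilities
passed = (rng.random(100) < true_p).astype(int)    # binary outcome (0/1)

X = sm.add_constant(hours)                         # the variate: b0 + b1*hours
result = sm.Logit(passed, X).fit(disp=0)
print(result.params)                               # b0 and b1
print(result.predict(X)[:5])                       # predicted probabilities,
                                                   # not outcome scores
```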

18
Q

What are the advantages of logistic regression?

A

Doesn't require many assumptions to be met
Doesn't require normality, linearity, or homoscedasticity (although meeting them does increase predictive power)
Can be interpreted in a similar way to multiple regression
Forced, hierarchical, and stepwise methods are all available

19
Q

What are the disadvantages of logistic regression?

A

Still has some assumptions: independence of errors, linearity of the logit, absence of outliers
Needs strong theoretical justification for predictors
Causality cannot be established
Requires a large sample size
Problems with model overfitting/complete separation

20
Q

What are log likelihood and deviance?

A

Log-likelihood is analogous to the SSR (sum of squared residuals) in multiple regression
Log-likelihood: compares predicted with actual probabilities
Large value = poor fit, small value = good fit
The deviance score (-2LL) is used to compare model parsimony and to calculate R2
Changes in -2LL follow a chi-square distribution, which lets us compare models (see the sketch below)
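A sketch of that comparison with assumed log-likelihood values: the drop in -2LL between a baseline model and a fuller model is tested against a chi-square distribution.

```python
from scipy import stats

ll_baseline = -120.5   # assumed log-likelihood, intercept-only model
ll_model = -105.2      # assumed log-likelihood, model with 3 predictors

chi_square = (-2 * ll_baseline) - (-2 * ll_model)  # drop in deviance
df = 3                                             # predictors added
p_value = stats.chi2.sf(chi_square, df)
print(chi_square, p_value)  # significant p => the fuller model fits better
```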

21
Q

What are the different versions of R2 in logistic regression?

A

R2 is a measure of variance explained; all versions are derived from the deviance statistic
In logistic regression we cannot simply square the R statistic
Hosmer and Lemeshow: orders the data into groups and compares observed with predicted outcomes using a chi-square distribution
Cox and Snell: uses the sample size; reported by SPSS
Never reaches its theoretical maximum, so it is limited at the high end
Nagelkerke: modifies Cox and Snell's R2 to fix the upper limit (see the sketch below)
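A sketch of the Cox & Snell and Nagelkerke calculations from assumed log-likelihoods and sample size, showing why Nagelkerke rescales the upper limit:

```python
import numpy as np

n = 200            # assumed sample size
ll_null = -120.5   # assumed intercept-only log-likelihood
ll_model = -105.2  # assumed fitted-model log-likelihood

r2_cox_snell = 1 - np.exp(2 * (ll_null - ll_model) / n)
max_cox_snell = 1 - np.exp(2 * ll_null / n)      # theoretical max, below 1
r2_nagelkerke = r2_cox_snell / max_cox_snell     # rescaled so 1 is reachable
print(r2_cox_snell, r2_nagelkerke)
```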

22
Q

What is the Wald Statistic?

A

The Wald statistic (a z statistic) is the logistic regression equivalent of the t statistic
SPSS reports it as z2 so that it follows a chi-square distribution
Tells us whether a predictor's contribution is significant
Be cautious: when b is large, the SE becomes inflated and the Wald statistic is underestimated
It is more accurate to add predictors hierarchically and examine the change in the likelihood statistics
Check whether the CI for b crosses 0 (equivalently, whether the CI for the odds ratio crosses 1); see the sketch below
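A minimal sketch of the CI check with assumed b and SE:

```python
import numpy as np

b, se_b = 0.85, 0.30                        # assumed estimate and SE
ci_b = (b - 1.96 * se_b, b + 1.96 * se_b)   # 95% CI for b: does it cross 0?
ci_odds_ratio = tuple(np.exp(ci_b))         # CI for Exp(B): does it cross 1?
print(ci_b, ci_odds_ratio)
```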

23
Q

How is logistic regression accomplished in SPSS?

A

First screen the predictors: Correlate -> Bivariate -> add all variables
Select potential predictors
Be careful with negative predictors (they can cancel out positive predictors)
Analyse -> Regression -> Binary Logistic
Outcome in Dependent, predictors in Covariates
Choose the 'Enter' method (unless hierarchical entry is warranted)
If a categorical predictor is present: Categorical -> move the predictor into the box
Save: group membership
Options: Hosmer-Lemeshow goodness of fit, CI for Exp(B), classification plots

24
Q

How do you interpret logistic regression output in SPSS?

A

Check which cases are included under Case Processing
Block 0: the null hypothesis model
Probability without predictors
Variables not in the equation: shows prediction outside the model
Block 1: the simultaneous model
Omnibus test compares the model to Block 0 (a significant p means the predictors improve the model)
Nagelkerke = variance explained
Hosmer-Lemeshow p > .05 = good fit
Exp(B) = odds ratios
Contingency tables: show how many cases were correctly predicted
Classification tables: % correctly predicted
Compare to the null model (see the sketch below)
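A sketch of the classification-table comparison using toy outcomes and hypothetical predicted probabilities:

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 0, 1, 0])          # toy observed outcomes
p = np.array([.9, .2, .6, .4, .1, .7, .8, .3])  # hypothetical predictions

predicted = (p >= 0.5).astype(int)              # classify at the .5 cut-off
accuracy = np.mean(predicted == y)              # % correctly predicted
null_accuracy = max(np.mean(y), 1 - np.mean(y)) # null model: guess modal group
print(accuracy, null_accuracy)                  # the model should beat the null
```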

25
Q

What are some common problems in logistic regression?

A

Overdispersion: the variance is larger than expected from the model
Makes SEs/CIs too small
Caused by violating the independence of errors assumption
Present if the dispersion parameter is greater than 1 (a big problem if over 2; see the sketch below)
Incomplete information from predictors:
Ideally, you should have some data for every possible combination of predictors (definitely for categorical predictors)
Violation causes large SEs
Complete separation: when the outcome can be perfectly predicted by one or more predictors
The model collapses, with large SEs
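A trivial sketch of the dispersion-parameter check with assumed values:

```python
chi_square = 180.0   # assumed model chi-square statistic
df = 120             # assumed residual degrees of freedom

dispersion = chi_square / df
print(dispersion)    # ~1 is fine; > 1 suggests overdispersion, > 2 is serious
```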