Module 3 Flashcards

1
Q

What type of IVs and DVs can we have in logistic regression?

A

IVs can be categorical or continuous

DVs must be categorical

2
Q

Can logistic regression imply any causation?

A

No

3
Q

How do we calculate the odds ratio?

A

The odds of the event occurring in one group divided by the odds of the event occurring in the other group
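
A minimal worked sketch in Python (the 2×2 counts are made up for illustration):

  # Hypothetical 2x2 table: Group A has 30 events and 70 non-events,
  # Group B has 10 events and 90 non-events
  odds_a = 30 / 70              # odds of the event in group A
  odds_b = 10 / 90              # odds of the event in group B
  odds_ratio = odds_a / odds_b
  print(round(odds_ratio, 2))   # 3.86: the odds are ~3.9 times higher in group A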

4
Q

What is complete separation? What example does Field (2013) give?

A

When our predictor variables perfectly predict the outcome, e.g. Field’s (2013) cats-and-burglars example, where weight alone perfectly separates the two groups. It is problematic because it leaves no in-between data from which to estimate probabilities, which is the whole point of logistic regression

5
Q

What is overdispersion?

A

Overdispersion is when observed variance is higher than expected. It is caused by a violation of the assumption of independence and causes standard errors to become too small

6
Q

What do we use the log-likelihood statistic for?

A

To determine how well our model fits the data

7
Q

How do we calculate the deviance statistic?

A

Deviance = -2 x log-likelihood (hence the shorthand ‘-2LL’)
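
A minimal sketch in Python of the relationship between the two statistics, assuming statsmodels is available and using made-up data:

  import numpy as np
  import statsmodels.api as sm

  # Tiny hypothetical dataset: one continuous predictor, binary outcome
  x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
  y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

  model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
  deviance = -2 * model.llf     # deviance = -2 x log-likelihood ('-2LL')
  print(round(deviance, 2))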

8
Q

What is parsimony? How do we achieve it in logistic regression?

A

Parsimony is the principle of preferring simpler explanations of phenomena over more complicated ones. We achieve parsimony by including all variables in the model and then progressively removing the ones that aren’t relevant or contributing to the model.

9
Q

Which assumption is broken by logistic regression? How do we overcome it?

A

The assumption of linearity is broken by logistic regression: a binary outcome cannot be a straight-line function of the predictors. We overcome this by using the log-odds (the ‘logit’) of the outcome. Thus, the assumption of linearity in logistic regression assumes that there is a linear relationship between any continuous predictors and the logit of the outcome variable.
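
A small sketch of the logit transformation in Python (the probabilities are arbitrary examples):

  import math

  def logit(p):
      # log-odds: maps a probability in (0, 1) onto the whole real line,
      # which is what lets the model be linear in the predictors
      return math.log(p / (1 - p))

  for p in (0.1, 0.5, 0.9):
      print(p, round(logit(p), 2))   # 0.1 -> -2.2, 0.5 -> 0.0, 0.9 -> 2.2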

10
Q

The process of cleaning data and checking for assumptions is the same in logistic regression, except for an additional 5 steps. What are they?

A
  1. Evaluate model parsimony
  2. Check the linearity of the log outcome variable (logit)
  3. Assess the data using deviance statistics
  4. Check for complete separation
  5. Check for overdispersion
11
Q

What type of regression do we do if we want to check the model for parsimony? Do we check for interactions?

A

Hierarchical - put variables 1 and 2 in the first block, and add variable 3 in the second block. Only check for interactions if we have the theory to back it up.

12
Q

What does the Wald statistic test? How do we calculate it?

A

The Wald statistic is analogous to the t-test in linear regression: it tests the null hypothesis that b = 0. It is biased when b is large, because the standard error becomes inflated, which makes the Wald statistic underestimated (raising the risk of a Type II error).
Wald = b/SEb
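
A minimal sketch in Python (b and its standard error are made-up values standing in for SPSS output; note that SPSS reports the squared version, (b/SE)², as a chi-square with 1 df):

  from scipy import stats

  b, se_b = 1.23, 0.45                   # hypothetical coefficient and SE
  wald_z = b / se_b                      # Wald = b / SE(b)
  p = 2 * stats.norm.sf(abs(wald_z))     # two-tailed p-value
  print(round(wald_z, 2), round(p, 3))   # 2.73, 0.006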

13
Q

In terms of the relationship between a predictor and outcome variable, what happens when the odds ratio is less than 1 vs. when it is greater than 1?

A

When the odds ratio > 1: as the predictor increases, the odds of the outcome increase
When the odds ratio < 1: as the predictor increases, the odds of the outcome decrease

14
Q

Explain graphically why we can’t just use multiple linear regression.

A

Multiple linear regression fits a straight line of best fit, which suits a continuous outcome. A binary outcome doesn’t suit a straight line: the line predicts impossible values below 0 and above 1. Logistic regression instead fits a sigmoidal (S-shaped) curve bounded between 0 and 1, which is much better.
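
A quick numerical illustration in Python of why the sigmoid suits a binary outcome: it is bounded between 0 and 1, whereas a straight line is not.

  import numpy as np

  def sigmoid(z):
      # the logistic function: maps any real number onto (0, 1)
      return 1 / (1 + np.exp(-z))

  z = np.array([-4., -2., 0., 2., 4.])
  print(sigmoid(z).round(2))   # [0.02 0.12 0.5  0.88 0.98] -- S-shaped, never leaves (0, 1)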

15
Q

What is the basic example of a research question suited to logistic regression?

A

“Can one level of the outcome variable (coded 0) be discriminated from the other level (coded 1) using a set of predictors?”

16
Q

What other questions does logistic regression answer?

A

Which variables predict the outcome
How the variables affect the outcome
Does a predictor variable increase or decrease the probability of the outcome, or does it have no effect in discriminating between the two outcome levels?

17
Q

What does censored data mean in logistic regression?

A

Censored data isn’t data that has been ignored or removed from the set. Rather, it refers to cases where, for example, data above a cutoff is classed as a success and data below the cutoff is classed as a failure.

18
Q

What is the first thing we should do before the main analysis in logistic regression?

A

Check for correlation between predictors
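
A one-line check in Python, assuming the predictors sit in a pandas DataFrame (the column names and values are hypothetical):

  import pandas as pd

  df = pd.DataFrame({"age":    [23, 35, 41, 52, 60],
                     "income": [31, 48, 55, 70, 82],
                     "hours":  [40, 38, 45, 35, 30]})
  print(df.corr().round(2))   # inspect the off-diagonal correlations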

19
Q

What does block 0 refer to?

A

Block 0 refers to the model without any predictors in it

20
Q

What does the table ‘Variables not in the Equation’ refer to?

A

This table shows us the predictive ability of each variable individually, not together in the model

21
Q

What does the Hosmer-Lemeshow test tell us? What do we want its result to be?

A

This test assesses the goodness-of-fit of the model. We want its p-value to be GREATER than .05 (i.e. non-significant), indicating that the model’s predictions don’t differ significantly from the observed data.

22
Q

What does the Classification table tell us? What is a good result?

A

This table shows us what percentage of outcomes were correctly predicted. Anything above 65% is good.
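
A minimal sketch in Python of how that percentage is computed (the probabilities and outcomes are made up):

  import numpy as np

  p_hat = np.array([0.9, 0.2, 0.7, 0.4, 0.8, 0.1])   # predicted probabilities
  y     = np.array([1,   0,   1,   1,   1,   0])      # observed outcomes

  predicted = (p_hat >= 0.5).astype(int)              # default 0.5 cutoff
  accuracy = (predicted == y).mean()
  print(f"{accuracy:.0%} correctly classified")       # 83% here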

23
Q

What are we looking for in the ‘Omnibus tests of model coefficients’ table?

A

This table compares the different models. If it is significant, then there is a significant difference between models (i.e. one is a better predictor than the other)

24
Q

What is another way to compare two models? What do we subtract from what?

A

Subtract the -2LL of the second model from the -2LL of the first model. The difference is distributed as chi-square, with degrees of freedom equal to the difference in the number of parameters between the two models.
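
A minimal sketch in Python (the -2LL values and the 2-parameter difference are made up):

  from scipy import stats

  neg2ll_model1, neg2ll_model2 = 154.08, 144.16   # hypothetical -2LL values
  chi_sq = neg2ll_model1 - neg2ll_model2          # improvement in fit
  df = 2                                          # difference in number of parameters
  p = stats.chi2.sf(chi_sq, df)
  print(round(chi_sq, 2), round(p, 3))            # 9.92, 0.007: the second model fits better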

25
Q

What is the preferred R-square?

A

Nagelkerke’s R² is preferred over Cox & Snell’s

26
Q

What two values can R-square lie between? And what does it mean for the odds of the outcome when Exp(B) is less than vs. greater than 1?

A

Pseudo-R² (Cox & Snell, Nagelkerke) lies between 0 and 1.
The less-than/greater-than-1 interpretation applies to the odds ratio, Exp(B), not to R²: if Exp(B) is less than 1, the odds of the outcome decrease as the predictor increases; if Exp(B) is greater than 1, the odds increase.

27
Q

What should the confidence interval for Exp(B) not cross?

A

1. If the interval contains 1, the direction of the effect in the population is uncertain (the predictor could increase or decrease the odds).

28
Q

What does a significant chi-square mean?

A

That the model containing the predictors fits significantly better than the baseline model, i.e. the predictors significantly improve prediction of the outcome

29
Q

What do we do after we’ve checked multiple models and found that only one of them is significant?

A

Run the analysis again with just this model, to maintain parsimony

30
Q

How do we want our classification plot to look?

A

We don’t want points clustered around the middle; we want them located at the periphery (near predicted probabilities of 0 and 1), where cases are classified confidently.

31
Q

When looking at residuals and influential cases, what do we want Cook’s distance to be?

A

< 1

32
Q

What do we want our leverage values to be?

A

Close to the expected leverage, (k + 1)/N, where k is the number of predictors and N the sample size; this worked out to .018 in the module example.

33
Q

What do we want 95% of our normalized residuals to be between?

A

Between -1.96 and +1.96

34
Q

What do we want our DFBETA for the constant to be?

A

< 1

35
Q

In what situation do we have to check the linearity of the logit? How do we know if this assumption has been violated?

A

Only check for linearity of the logit when we have continuous predictors. The assumption has been violated if the interaction between a continuous predictor and its own natural log is significant.
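
A sketch of that check in Python (made-up data, assuming statsmodels; the idea is to add each continuous predictor’s interaction with its own natural log and test it):

  import numpy as np
  import statsmodels.api as sm

  rng = np.random.default_rng(1)
  x = rng.uniform(1, 10, 200)                  # continuous predictor (must be > 0 to log)
  true_p = 1 / (1 + np.exp(-(x - 5)))          # outcome generated from a clean logistic model
  y = (rng.random(200) < true_p).astype(int)

  X = sm.add_constant(np.column_stack([x, x * np.log(x)]))   # add the x * ln(x) term
  result = sm.Logit(y, X).fit(disp=0)
  print(result.pvalues[2].round(3))   # significant -> linearity of the logit is violated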

36
Q

In what situation do we have to check for multicollinearity? How do we know if this assumption has been violated?

A

Only check for multicollinearity if we have more than one predictor. It has been violated if tolerance is < .1 or VIF is > 10
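
A sketch in Python, assuming statsmodels is available (the two correlated predictors are simulated):

  import numpy as np
  from statsmodels.stats.outliers_influence import variance_inflation_factor

  rng = np.random.default_rng(0)
  x1 = rng.normal(size=100)
  x2 = 0.5 * x1 + rng.normal(size=100)             # moderately correlated with x1
  X = np.column_stack([np.ones(100), x1, x2])      # include a constant column

  for i in (1, 2):
      vif = variance_inflation_factor(X, i)
      print(f"VIF = {vif:.2f}, tolerance = {1/vif:.2f}")   # worry if VIF > 10 or tolerance < .1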