Logistic Regression Flashcards

1
Q

what is logistic regression (LR) for?

A

Logistic regression is an example of a non-linear regression model, which is what we need when we have a dichotomous or categorical DV

2
Q

The assumptions of LR are characteristically …

A

less severe – LR is relatively assumption-free.

3
Q

What are the 3 main reasons for performing a logistic regression rather than a standard multiple regression?

A

1) The DV is categorical; therefore 2) the line of best fit will be sigmoidal, not linear; and as such 3) the residuals will show non-normality and heteroscedasticity if OLS regression is used, violating important assumptions of that method.

4
Q

how does LR build a model?

A

By measuring the deviance of predictors, and including or excluding them based on their contribution to predicting the outcome variable. LR asks: does an individual predictor increase or decrease the probability of the outcome?

5
Q

As opposed to MR… LR uses a dichotomous DV, and …

A

continuous IVs.

6
Q

LR is also not…

A

linear.

7
Q

the predictive model is called XX

A

p̂ ("p hat") – the predicted probability.

8
Q

the residuals are not…..

A

Residuals are clearly not normal (skewed)

9
Q

and exhibit …

A

heteroscedasticity – residuals are all or nothing, and not evenly distributed.

10
Q

So, instead of the model fit being linear (which rules out modelling probability directly, as a straight line can extend past 0 and 1, the range within which probability lies), LOG_REG uses …

A

a non-linear (sigmoidal) line of best fit.

11
Q

Probability means =

A

0–1 (or a percentage, 0–100) – the likelihood of an event occurring.

12
Q

Odds mean =

A

Odds = the probability of an event divided by its complement: odds = p / (1 - p).
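
A minimal sketch of this formula in Python (the values are just illustrative):

def odds(p):
    # odds = probability of the event divided by its complement
    return p / (1 - p)

print(odds(0.5))  # 1.0 -> "even" odds
print(odds(0.8))  # 4.0 -> the event is 4 times as likely as not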

13
Q

why does LR use odds?

A

It unpacks the maths nicely – the result is actually converted back to a probability after working with the odds.

14
Q

in LR, instead of using the odds (which are xx), we use the …

A

asymmetric; the natural log of the odds.

15
Q

What is the natural log of the odds called in LR?

A

the logit
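
A small sketch of the logit and its inverse (the sigmoid); the p = .8 example from a later card gives logit = ln(4) ≈ 1.386:

import math

def logit(p):
    # natural log of the odds
    return math.log(p / (1 - p))

def inv_logit(z):
    # sigmoid: converts a logit back to a probability
    return 1 / (1 + math.exp(-z))

print(logit(0.8))        # ~1.386
print(inv_logit(1.386))  # ~0.8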

16
Q

what does the odds ratio mean in LR?

A

Odds ratio: relationship between the odds of an event occurring across levels of another variable (by how much do the odds of Y change as X increases by 1 unit?)

17
Q

and what does the ‘ratio of ratios’ mean?

A

Ratio of ratios – the odds of an event occurring as a function of levels of another variable (e.g. comparing the odds of males having a disease with the odds of females having the disease – i.e. the ratio of these two odds).
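
A sketch of the "ratio of ratios" using a hypothetical 2x2 table (all counts invented for illustration):

# disease (yes/no) by sex
male_disease, male_healthy = 30, 70
female_disease, female_healthy = 15, 85

odds_male = male_disease / male_healthy        # ~0.429
odds_female = female_disease / female_healthy  # ~0.176
odds_ratio = odds_male / odds_female           # ~2.43: the ratio of the two odds
print(odds_ratio)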

18
Q

why do we present the results in terms of log odds and odds ratio?

A

as it turns a non-linear relationship into the familiar linear one

19
Q

this enables us to subsequently …

A

test whether this coefficient is significantly different from 0 – just like a t-test in MR

20
Q

the predicted odds range from ?

A

0 to +∞

21
Q

so when p>.50

A

odds > 1 (p = .50 gives even odds of 1).

22
Q

The predicted odds vary …

A

exponentially with the predictor(s).

23
Q

In comparison, the logit ranges …

A

from -∞ to +∞

24
Q

it reflects odds of being a case but

A

varies linearly with the predictor(s)

25
Q

the issue with this is

A

not very interpretable – if p = .8, odds = 4, logit = 1.386.

26
Q

The typical partial regression coefficients (B) indicate ….

A

the increment in the logit given a unit increment in the predictor.

27
Q

Whereas the odds ratios (e^B) indicate …?

A

the amount by which the odds of being a case are multiplied given a unit increment in the predictor (or a change in level of the predictor if it is categorical).
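
A sketch of reading an odds ratio off a coefficient, assuming a hypothetical B and standard error (not values from the slides):

import math

B, se = 0.64, 0.21
odds_ratio = math.exp(B)        # Exp(B): odds multiplied by this per unit of X
ci = (math.exp(B - 1.96 * se),  # 95% CI on the odds-ratio scale
      math.exp(B + 1.96 * se))
print(odds_ratio, ci)           # ~1.90, (~1.26, ~2.86)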

28
Q

if MR uses OLS…. LR uses…..

A

Maximum likelihood estimation: an ITERATIVE solution where the regression coefficients are estimated by trial and error and gradual adjustment. (This seeks to maximise the likelihood (L) of the observed values of Y, given a model and using the observed values of the predictors.)
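
A minimal sketch of iterative ML fitting using the statsmodels library on simulated data (variable names and data are invented):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))  # true sigmoidal model
y = rng.binomial(1, p)

fit = sm.Logit(y, sm.add_constant(x)).fit()  # iterative ML estimation
print(fit.params)           # estimated B coefficients (on the logit scale)
print(fit.llf, fit.llnull)  # log likelihood: fitted model vs intercept-only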

29
Q

if OLS uses the sum of squares….. LR uses….

A

Measures of deviance rather than sums of squares. The focus is lack of fit – minimising this, like (1 - R2) in MR – the same idea but flipped round (a small sketch follows below).
– Null deviance, Dnull (similar to SSTotal): reflects the amount of variability in the data, the amount of deviance that could potentially be accounted for.
– Model deviance, Dk (similar to SSResidual): reflects the amount of variability in the data after accounting for prediction from k predictors.
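
A sketch of this deviance bookkeeping, assuming hypothetical log likelihood values (in statsmodels these come from fit.llnull and fit.llf):

LL_null, LL_k = -120.0, -95.0

D_null = -2 * LL_null  # analogous to SSTotal
D_k = -2 * LL_k        # analogous to SSResidual
print(D_null, D_k, D_null - D_k)  # 240.0, 190.0, 50.0 accounted for by the k predictors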

30
Q

For each model, a xx value is calculated – which is analogous to an F ratio for the overall model.

A

log likelihood

31
Q

LR uses xxx models

A

Nested models. Minimising the lack of fit of the model means maximising the likelihood of the data. We take models and compare sets of them with one another. The simplest way is comparing the model with all the variables in against no model at all; we also compare subsets of the model, with and without individual predictors, plus the significance of each predictor. (We compare two models, one bigger and one smaller and nested within the bigger model – comparing hierarchically.)

32
Q

If the xxx model is true then the LRT statistic is distributed as xxx with m df

A

If the smaller model is true then the likelihood ratio test (LRT) statistic is distributed as χ2 with m df. So it tests whether it is worth having those m extra parameters in the model: if the LRT is no bigger than expected under the χ2 distribution with m df, it is not worth putting the additional parameters in, and we prefer the simpler model – the more parsimonious explanation. Only prefer the bigger model if it improves fit (a sketch follows below).
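
A sketch of the LRT comparison for two nested models, with invented -2LL values:

from scipy.stats import chi2

neg2LL_small, neg2LL_big = 210.4, 198.1
m = 3                            # extra parameters in the bigger model

LRT = neg2LL_small - neg2LL_big  # = -2*(LL_S - LL_B) = 12.3
p_value = chi2.sf(LRT, df=m)     # ~.006: the m extra parameters improve fit
print(LRT, p_value)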

33
Q

this LR standard approach is more xxxxx than standard MR – resembles xxxxx MR.

A

Standard approach is more hierarchical than standard MR – resembles hierarchical MR. We only accept more predictors if they significantly enhance the degree of fit.

34
Q

TAKE HOME MESSAGE: when we have assessed our model collectively and the 3 predictors don't enhance its prediction, you then ask…

A

would a simpler model on its own be a better fit?

35
Q

A limitation of LR is that it is a xxxxxxx procedure

A

low power

36
Q

how is it low power?

A

Because it is categorical – each value is either a 0 or a 1 – it needs big sample sizes.

37
Q

what is Pseudo R2 ?

A

A waste of time – analogous to R2 – McFadden's / Cox & Snell / Nagelkerke – all crap. Not "variance accounted for", as the data are not homoscedastic.

38
Q

How do you calculate df for categorical DVs?

A

To calculate df for a binary DV (has disease: yes vs no) you need to add up all the main effects and interactions. It is not N-1, as it is when the DV is continuous. Similarly, categorical IVs need (m-1)*(n-1) parameters to capture the effects when there are m levels of the IV and n levels of the DV.

39
Q

what is the Wald statistic?

A

The Wald statistic is a z statistic (its square is tested against χ2) – similar to the test of B or beta in OLS MR. It is sometimes too conservative, so use the Wald statistic in combination with the change in the model when you take that predictor out.
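
A sketch of the Wald test for a single coefficient, assuming a hypothetical B and standard error:

from scipy.stats import norm

B, se = 0.64, 0.21
wald_z = B / se                     # z statistic (z squared is chi2 with 1 df)
p_value = 2 * norm.sf(abs(wald_z))  # two-tailed
print(wald_z, p_value)              # ~3.05, ~.002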

40
Q

Just a note he mentions – this would be in the exam: what should you be careful about when viewing the SPSS output?

A

When viewing the SPSS output in the exam, be careful to check the coding of the variables, as they will probably try to catch us out: say having autism is coded as 0 and not having autism as 1 – you might think it's the other way around. See slide 39.

41
Q

What effects are possible to examine in LR?

A

Possible to also examine:
– Full factorial: main effects and interactions between factors; no interactions involving covariates (like an omnibus test – doesn't tell us much)
– Complete: main effects and all interactions, including interactions with covariates (good)
– Saturated: as complete, only covariates are treated as factors (just everything – a huge, granular model that just IS the data itself)

42
Q

Common technique in LOG_REG is xxxxx

A

Backward-stepwise (e.g. we would re-run without maternal warmth). An iterative process (see SPSS class; a rough sketch follows below):
– Begin with the complete model
– Remove non-significant variables
– Re-run the model and compare fit
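
A rough sketch of backward elimination by likelihood ratio test, assuming statsmodels and a numeric predictor matrix X (the helper name backward_stepwise is invented, and df=1 assumes one parameter per predictor):

import statsmodels.api as sm
from scipy.stats import chi2

def backward_stepwise(y, X, names, alpha=0.05):
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        full = sm.Logit(y, sm.add_constant(X[:, keep])).fit(disp=0)
        worst, worst_p = None, alpha
        for j in keep:
            cols = [k for k in keep if k != j]
            reduced = sm.Logit(y, sm.add_constant(X[:, cols])).fit(disp=0)
            lrt = -2 * (reduced.llf - full.llf)  # deterioration from dropping j
            p = chi2.sf(lrt, df=1)
            if p > worst_p:                      # dropping j costs the least
                worst, worst_p = j, p
        if worst is None:                        # every removal significantly hurts fit
            break
        keep.remove(worst)
    return [names[k] for k in keep]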

43
Q

what is linearity of the logit?

A

That each continuous predictor has a linear relationship with the logit of the DV (a separate assumption from the absence of multicollinearity among the predictors).

44
Q

what are the two methods mentioned in the slides to test linearity of the logit?

A

The HOSMER & LEMESHOW method and the BOX-TIDWELL method.

45
Q

the HOSMER & LEMESHOW method…..

A

Turn covariates into quartiles and enter them as a factor. Then compare the quartiles: do the Exp(B)s (odds ratios) increase in roughly equal steps (a roughly linear trend)?

46
Q

the BOX-TIDWELL method……?

A

In this approach, terms composed of interactions between each predictor and its natural logarithm are added to the logistic regression model. The assumption is violated if one or more of the added interaction terms are statistically significant. Construct a new predictor (in his example, Leadership*log(Leadership)); if this extra predictor is significant then there is evidence of non-linearity in the logit (see the sketch below).
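
A sketch of the Box-Tidwell check for one continuous predictor, with simulated data ("leadership" follows the card's example; all values invented):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
leadership = rng.uniform(1, 10, size=300)  # must be positive to take log()
y = rng.binomial(1, 1 / (1 + np.exp(-(leadership - 5))))

bt = leadership * np.log(leadership)       # predictor * its natural log
X = sm.add_constant(np.column_stack([leadership, bt]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.pvalues)  # a significant bt term = evidence of non-linearity in the logit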

47
Q

Terminology in Multinomial LR: – Categorical predictors are called ‘x’ – Continuous predictors are called ‘x’

A

– Categorical predictors are called ‘factors’ – Continuous predictors are called ‘covariates’

48
Q

A number of problems may occur when

A

there are too few cases relative to the number of predictor variables.

49
Q

Logistic regression, like all varieties of multiple regression, is sensitive to extremely high correlations among predictor variables. How do we test for this with both discrete and continuous variables?

A

To find a source of multicollinearity among the discrete predictors, use multiway frequency analysis (cf. Chapter 16) to find very strong relationships among them. To find a source of multicollinearity among the continuous predictors, replace the discrete predictors with dichotomous dummy variables and then use the procedures of Section 4.1.7. Delete one or more redundant variables from the model to eliminate multicollinearity.

50
Q

if the design is repeated measures, say the levels of the outcome variable are formed by the time period in which measurements are taken (before and after some treatment), then there is an issue with…..

A

Independence of Errors

51
Q

One way to remedy a violation of independence of errors?

A

One remedy is to do multilevel modeling with a categorical DV in which such dependencies are considered part of the model

52
Q

The model fitting information table includes information about the fit of two models - what are they?

A

One is a model with no effects, just the intercept (the intercept-only model); the other (the final model) is the model specified for this stage.

53
Q

The -2 log likelihood (-2LL) values for each model are given in the table, with larger values indicating …

A

worse-fitting models.

54
Q

The difference between -2LL values for two models is a …

A

likelihood ratio test statistic

and this is distributed approximately as the chi-squared distribution

55
Q

with degrees of freedom (df) equal to

A

the difference in number of parameters between the two models.

56
Q

if the statistic is highly significant for x df it indicates that

A

that there is a statistically significant deterioration in fit from the final model to the intercept-only model.

57
Q

this means that some or all the parameters in the final model are useful in

A

explaining variance in the outcome (i.e., DV category membership).

58
Q

A good answer might also explain why there are x df – explain how we calculate df in LOG_REG.

A

I think this is the number of levels of the DV minus 1, for each parameter (i.e. main effect and interaction term) included. Here I refer to the 2002 model answer, question 2: "The final model contains terms for the age covariate effect, which for a 3-category DV requires 2 parameters; the gender factor also requires 2 parameters; and the age*gender interaction also requires 2 parameters. This explains the 6 df (= 2+2+2). Note that effects in logistic regression are really effect*DV interactions, explaining why the 3 levels of the DV are relevant to the number of parameters needed."

59
Q

a goodness-of-fit table compares …

A

the deterioration of fit between a saturated model (i.e., a model with 0 df, that provides the best possible fit to the data) and the final model for that stage of the analysis.

Nominal Log_Reg (from past exam)

60
Q

The two statistics in a goodness-of-fit table are?

A

Deviance and Pearson; they calculate the goodness-of-fit statistic in slightly different ways (deviance is a log likelihood ratio test).

61
Q

if the significance values are not significant it shows us…..

A

This shows that the final model for stage 1 is not a significantly worse fit to the data than a saturated model with x more parameters (the x extra parameters in the saturated model explain why these test statistics have 8 df). Thus the extra parameters in the saturated model are not particularly useful in fitting the data. A really good answer might explain why there are 8 more parameters required for the saturated model than for the final model of stage 1 – but I don't really understand this, tbh.

62
Q

what information does the likelihood ratio tests table provide?

A

The likelihood ratio tests table provides information on the deterioration in the fit from the final model fitted in this stage to reduced models in which particular terms are removed from the model. (as we go down the list)

63
Q

what is the model fitting table about?

A

The difference between the intercept-only model and our model with x predictors and x df.

64
Q

what does the chi 2 in the model fitting table mean?

A

From the SPSS output: "The chi-square statistic is the difference in -2 log likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of the effect are 0." It gives the difference between the full model and the intercept-only model using the -2 log likelihood, and compares it against a χ2 distribution.

65
Q

Log Reg uses measures of xxxxx rather than sums of squares

A

deviance

66
Q

the focus is on the

A

lack of fit

67
Q

what is -2 log likelihood ?

A

Deviance measures contrast log likelihoods using log likelihood ratios. The log likelihood is a function of the probabilities of the observed and model-predicted outcomes for each case, summed over all cases. Roughly, the maximum log likelihoods of the following get compared:
– the model with no predictors (only an intercept) vs the perfectly fitting model (aka the saturated model)
– the model with a set of k predictors vs the perfectly fitting model
1) Compute a log likelihood value (LLS) for a smaller model (one with k parameters)
2) Compute a log likelihood value (LLB) for a bigger model (one with k+m parameters)
Likelihood ratio test (LRT) statistic: LRT = -2*LLS - (-2*LLB) = -2*log(LS/LB)
If the smaller model is true then the LRT statistic is distributed as χ2 with m df.

68
Q

why is pseudo r2 unreliable?

A

You can't compare models (one with another), and it is sensitive to N.

69
Q

what does the likelihood ratio tests table provide?

A

Individual predictor contributions. The likelihood ratio tests table provides information on the deterioration in fit from the final model [fitted in this stage] to reduced models in which particular terms are removed from the model. So if one variable exhibits a significant value, it means the model fits significantly worse without that variable.

70
Q

after finding non-significant predictors from the outcome of the likelihood ratio tests, we would?

A

Re-run without those predictors, as they do not contribute well to the model.

71
Q

So if the -2LL value gets higher, what does this indicate?

A

It indicates a worse-fitting model.

72
Q

what does the chi 2 column indicate in the likelihood ratio tests table?

A

The difference in -2LL values is shown in the chi-square column of the table; this is the likelihood ratio test statistic for the deterioration in model fit, and it has x df, which is the number of parameters associated with the main effect.

73
Q

what does the Parameter estimates table give us…

A

The effect of each component in a model with all the components present – its effect independent of the other predictors in the model. The parameter estimates table provides a test of each effect within a model containing the other terms – its independent contribution over and above all the other factors in the model (like partial regression coefficients).

74
Q

so for categorical predictors in the Parameter estimates table we get …..

A

one level compared to the other (which is why one level is set to 0 and no values are entered for it).

75
Q

And continuous predictors (like age or a Likert scale) are presented …

A

on their own.

76
Q

And for continuous predictors, the odds ratio Exp(B) reflects …

A

the change in the odds of the DV for every unit increase in the variable.

77
Q

but the Exp(B) for categorical means….

A

the change in odds relative to the other (reference) category.

78
Q

odds ratio of 1 means…

A

no effect

79
Q

further away from 1 =

A

the more is going on

80
Q

Exp(B) values below 1 which are significant in the categorical variables mean what?

A

That the unstated reference category is more likely (women was the reference category here). "In both cases the odds of identification are lower for men than for women. The odds ratios for men:women are given in the gendwitn=1 rows, and are 0.644 and 0.471 for identifying a suspect and identifying a volunteer respectively. These odds ratios are both significantly below 1. This pattern is consistent with a bias towards making an identification (whether accurate or not) in women compared to men."

81
Q

Exp(B) values below 1 which are significant in the continuous variables mean what?

A

The odds of xxx decrease [significantly or not] for each unit increase in xxx.

82
Q

if a continuous Exp(B) value was above 1 at say… 1.899, what would the interpretation be?

A

For each unit increase in xxxxx, the odds of xxxx almost double (are multiplied by 1.899).

83
Q

So above main effects you could look at models with more parameters – a saturated model would be all main effects and interactions – but in between, you could extend a model to include …

A

interaction terms between some of the main effects (e.g. interaction between gender and marital status)

84
Q

if we included that interaction with all the main effects, what would this model be called …

A

A full factorial model (it's full with respect to factors, i.e. it has all the factors' main effects and interactions, and it has all the main effects of the covariates). Same as a full factorial ANCOVA – it doesn't have interactions between categorical and covariate variables; only factors interact with other factors, not with covariates. He calls it the COMPLETE model. A saturated model would treat every possible cell-by-cell combination (so age would be broken down into every single age).

85
Q

again, the difference between -2LL values for two models is a

A

likelihood ratio test statistic

86
Q

and this is distributed approximately as the

A

chi-squared distribution with degrees of freedom (df) equal to the difference in number of parameters between the two models.

87
Q

if the statistic is highly significant ?

A

= statistically significant deterioration in fit from the final model to the intercept only model.

88
Q

State why the df – e.g. "A good answer might also explain why there are x df. The final model contains terms for breed (3 levels), the food factor (2 levels), and the breed*food interaction."

A

A good answer might also explain why there are 5 df. The final model contains terms for breed (3 levels), which requires 2 parameters (3-1); the food factor (2 levels), which requires 1 parameter (2-1); and the breed*food interaction, which requires 2 parameters (2*1). This explains the 5 df (= 2+1+2).

89
Q

This shows that the model fitted for study 1 is a full factorial model (it has all the possible factors and their interactions included). It is also a saturated model, as when only factorial IVs are included (i.e. no covariates) and the model is full, there are …

A

there are no more degrees of freedom.

90
Q

the goodness-of-fit (GOF) table compares

A

the deterioration of fit between a saturated model (i.e., a model with 0 df, that provides the best possible fit to the data) and the final model for that stage of the analysis.

91
Q

The two statistics (Pearson and Deviance) calculate the goodness of fit statistic in

A

slightly different ways

92
Q

(Deviance is equivalent ……

A

to a log likelihood ratio test). The zero df for the GOF statistics indicates that the model tested is a saturated one (as it contains the same number of parameters as the saturated model with which it is compared).

93
Q

McFadden's ρ2 treats …

A

Dnull - Dk as equivalent to SSregression and Dnull as equivalent to SStotal, and thereby calculates R2 just as one would in OLS regression.

94
Q

Cox & Snell is a variation on this which reaches a maximum of

A

.75 when there is an equal N in each category of the DV,

95
Q

The Nagelkerke index divides

A

Cox and Snell’s R2 by its maximum in order to achieve a measure that ranges from 0 to 1.
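
A sketch of all three pseudo-R2 measures from the last few cards, assuming hypothetical log likelihoods and sample size N:

import math

LL_null, LL_k, N = -120.0, -95.0, 200

mcfadden = 1 - (-2 * LL_k) / (-2 * LL_null)               # (Dnull - Dk) / Dnull
cox_snell = 1 - math.exp((2 / N) * (LL_null - LL_k))
nagelkerke = cox_snell / (1 - math.exp((2 / N) * LL_null))
print(mcfadden, cox_snell, nagelkerke)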