Logistic Regression Flashcards

1
Q

what makes logistic regression different from the other types of regression: linear regression, multiple linear regression, non-linear regression?

A

For the others

They model ratio/scale data - the DV must be ratio/scale.

this is necessary because we use the sum of squared residuals as a means to fit the model (a parametric approach)

logistic regression

used if the DV has a limited range - e.g., either 1 or 0, or between 0 and 100

  • could be marks in a test
  • accuracy scores
  • etc.
2
Q

what is the typical linear regression equation

A

Ŷ = c + bX

here we assume a linear relationship between the IV and DV

2
Q

why can we not use linear regression if our DV is limited in range, e.g., pass (1) or fail (0)?

A

because certain IV values will map onto 0 (e.g., 40%) and other values onto 1 (e.g., 90%)

Problem: for values below 40 the model will predict values lower than 0, and for values above 90 it predicts values higher than 1. Also, anything in between will equate to something between 0 and 1.

this doesn’t make sense

  • can’t have values between 0 and 1 - we want to predict ONLY 0 and ONLY 1
  • and can’t have values exceeding 1 / less than 0, but the regression equation predicts values outside the 0-to-1 range

this is a serious problem because it creates really large residuals, which will distort or bias our regression fit.

the residuals will also violate the assumption of homoscedasticity because the data range is limited to 0 and 1.
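
A minimal sketch (with made-up pass/fail data, not the lecture's) of the out-of-range problem described above: an ordinary least-squares line fitted to a 0/1 outcome predicts impossible values at the extremes of the IV.

```python
import numpy as np

# hypothetical attendance (IV) and pass/fail (DV) data
attendance = np.array([30, 40, 50, 55, 60, 65, 70, 80, 90], dtype=float)
passed     = np.array([ 0,  0,  0,  0,  1,  1,  1,  1,  1], dtype=float)

b, c = np.polyfit(attendance, passed, 1)   # slope and intercept of the OLS line

print(c + b * 10)    # well below 0  -> an impossible "probability"
print(c + b * 100)   # well above 1  -> an impossible "probability"
```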

3
Q

what is logistic regression

A

a special case of non-linear regression

4
Q

why is logistic regression a special case of non linear regression

A

because it deals with this limitation in the range of the DV

5
Q

different types of logistic regression

A

Logistic regression

used if you have a limited range in the DV, e.g., proportion of correct answers on a test. this gives a continuous prediction

Binary logistic regression

a type of logistic regression where the DV is binary, e.g., 0 or 1. this ensures we get a binary outcome of either 0 or 1.

both cases deal with this limitation in the range of the DV

6
Q

if I asked people to answer on a 7-point Likert scale and then averaged the scores, would I use logistic regression, because technically there is a limited range of answers?

A

No, because while the scale is limited in range you are analysing the average, which, according to the central limit theorem, is approximately normally distributed.

7
Q

what’s the big problem of using linear regression with data limited in range?

A

the linear equation will fit the 0/1 values at certain points, but everywhere else the residuals are large! the big problem is that it will predict values larger than 1 and smaller than 0.

so we have a real problem with the residuals - and whenever we fit linear regression models, the residuals are what we use to do the fitting

this will bias any result we get

8
Q

can’t we just fit a non-linear curve to the binary/limited-range DV?

A

yes - an S-shaped (logistic) curve nicely levels off at 0 and at 1

9
Q

Let’s say we invent and fit a logistic curve to the binary data - it seems to do quite OK. Can we be satisfied with this?

A

no, while it fits OK we want to find the best-fitting logistic curve - that’s what logistic regression does

it finds the best-fitting curve that has an S shape

10
Q

What is the equation for the non-linear curve we fit in logistic regression

A

p = exp(c + bX) / (1 + exp(c + bX))

(equivalently: compute the logit c + bX, raise e to that power to get the odds, then p = odds / (1 + odds))
11
Q

what is e

A

it’s a constant called Euler’s number (e ≈ 2.718)

12
Q

what is the OLS regression equation

A

Ŷ = c + bX

(ordinary least squares: find the c and b that minimise the sum of squared residuals)
13
Q

what is the rationale for using the logistic regression equation

A
  • deals with the limitation of range - e.g., 0 to 100
  • its functional form is very flexible - fits a wide range of data
  • there are analytical solutions for it - just look up Euler’s number raised to a power
  • easier to compute than other non-linear regression problems
14
Q

just like in linear regression, the form of the equation we are fitting is _____?

A

fixed

thus when fitting the model we are just finding the best-fitting numerical values for the coefficients in the equation (c and b)

15
Q

what is logistic regression doing?

A

modelling/predicting data between 0 and 1

16
Q

mathematically, what is a prediction and what do we call it in statistics?

A

mathematically, a prediction is the probability that a case has the value 1 rather than 0 - in statistics we call the prediction a probability

17
Q

how do we get the probability/prediction

A

plug the IV value into the logistic regression equation: p = exp(c + bX) / (1 + exp(c + bX))
18
Q

what can we use the probability (prediction) to compute?

A

the odds

the equation: odds = p / (1 - p) - the probability of an event happening divided by the probability of it not happening

essentially the odds are Euler’s number raised to the power of the logit built from our best-fitting coefficients: odds = exp(c + bX). so if you know the logistic regression equation you can compute the probability directly, but you can also compute the odds
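
A tiny worked example of the odds formula above, with a made-up probability:

```python
# hypothetical probability of an event
p = 0.8
odds = p / (1 - p)
print(odds)   # 4.0 -> the event is 4 times as likely to happen as not
```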

19
Q

what do we use to measure effect size in logistic regression

A

the odds ratio

20
Q

how do we compute the log odds

A

take the natural logarithm of the odds: log odds = ln(odds)

taking the natural logarithm is the inverse of raising e to a power
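
A quick sketch of that inverse relationship, continuing the odds = 4.0 example:

```python
import math

odds = 4.0
logit = math.log(odds)    # natural logarithm of the odds -> the log odds
print(logit)              # ~1.386
print(math.exp(logit))    # e raised to the logit recovers the odds: 4.0
```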

21
Q

what kind of relationship does the logistic regression have with the log odds

A

any logistic regression is linear with respect to the log odds (just like with OLS regression)

so by taking the natural logarithm of the odds you are creating a new unit (or DV if you will) that is now linear in terms of the independent variable X

so the log odds vary from negative infinity to infinity as the probability moves from 0 to 1

22
Q

what is another word for log odds

A

logit

23
Q

logit regression equation

A

Logit regression equation: logit = c + bX

  • so the result of this is your logit
  • or “logistic probability unit”

here the logit for a yes vs no answer is -213.056

so we have the normal regression equation with the constant and coefficient. to get the logit we just plug X (-6 in this particular case) in
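
A worked sketch of plugging X into c + bX, borrowing the attendance coefficients that appear later in this deck (logit = -12.259 + 0.198 × attendance):

```python
c, b = -12.259, 0.198     # constant and coefficient from the attendance example
attendance = 55           # the X value we plug in
logit = c + b * attendance
print(logit)              # -1.369: the log odds of passing at 55% attendance
```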

24
Q

when would you use Euler’s number

A

if you wanted to compute the odds or probability for a data point that is not in your dataset.

25
Q

how do you go from the logit to the odds

A

exp(...), with whatever you have in parentheses, is just a different notation for Euler’s number raised to that power. this is how you would enter it (universal - this is how it’s done in R, SPSS, and MATLAB).

26
Q

describe the relationship between the logit and odds

A

e to the power of the logit, and taking the natural logarithm, are the inverse operations of one another

so you take the natural log of the odds to get the logit. to go from the logit to the odds you raise e to the power of the logit

to get the odds you literally just type:

“exp(logit)” - here, exp(-213.056)
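
The same computation in Python, using the logit quoted on this card (the result previews the E-notation discussed on the next card):

```python
import math

odds = math.exp(-213.056)   # e raised to the power of the logit
print(odds)                 # ~2.9577e-93 - an extremely small number
```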

27
Q

what would 2.9577E-93 be

A

it just means it’s a really small number. a negative sign after the E means you shift the decimal point that many places (93 here) to the left.

if it were E+93, you would shift the decimal point 93 places to the right

28
Q

how do you go from the odds to the probability/prediction?

A

probability = odds / (1 + odds)
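
Continuing the attendance sketch (odds ≈ 0.254 at 55% attendance):

```python
odds = 0.254                 # from exp(-1.369) in the earlier sketch
p = odds / (1 + odds)
print(round(p, 3))           # 0.203: the predicted probability of passing
```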

29
Q

Why has Lore added the column on the end “rounded p”

here has the logistic regression done a good job matching the outcome?

what relationship is there between the measure and the logit

A

because the predicted probabilities are soooo small (look at the e- on the end), rounding makes the outcome either 0 or 1 - and remember we are doing logistic regression, so the outcome has to be either 0 or 1

in this example, you can see the prediction/probability matches the outcome (yes/no column) very well. so in this case the logistic regression equation does a really good job.

if you plotted the relationship between the measure and the logit, the relationship would be perfectly linear

30
Q

what is the relationship between the logit and the measure?

how does this compare to the relationship between the measure and the data itself (the yes/no responses)?

A

linear

the relationship with the data itself, by contrast, has a limitation in range, and an S-shaped curve is the best fit

31
Q

what the hell is a case vs not a case

is it always right?

A

when the prediction of the logistic regression is a “yes” or 1 it’s a case, and a “no” or 0 is not a case

sometimes your model might predict something to be a case when it’s actually not. this is where the residuals come in - they are what we’re trying to minimise when we fit the regression equation

(we try to minimise the sum of the squared residuals)

(residuals of the logit)

32
Q

how do we turn the probability spat out by the logistic regression into binary outcome

A

you have a cut-off, typically .5
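
A minimal sketch of applying the cut-off (made-up probabilities):

```python
probabilities = [0.03, 0.41, 0.52, 0.97]   # hypothetical model outputs
cutoff = 0.5
predicted = [1 if p >= cutoff else 0 for p in probabilities]
print(predicted)                            # [0, 0, 1, 1]
```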

33
Q

what do the odds range between

what is the range for the probability

what’s the relationship between the two

A

the odds range between 0 and positive infinity

the probability can only range between 0 and 1

they are related such that when the probability increases, the odds increase too

34
Q

Which do we report: the odds or the probability?

A

up to you!

35
Q

how do you write up whether something was a case or not?

A
36
Q

how can we find the point of the 50/50 split in the dataset

A

take the negative of the constant and divide it by the coefficient: X = -c/b

this tells you at which data point the predicted probability is exactly .5

this data point might not exist in your dataset, but it tells you which level of the IV would mark the 50/50 split in prediction
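
Checking this with the deck’s attendance coefficients (c = -12.259, b = 0.198):

```python
c, b = -12.259, 0.198
print(-c / b)   # ~61.9: at about 62% attendance the predicted probability is .5
```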

37
Q

what is a classification table

A

a table showing how many cases were correctly and incorrectly classified, e.g., as a fail or a pass

38
Q

when might you consider changing the cut off?

A

well, look at the classification table. maybe you care more about correctly predicting one outcome than the other. this might depend on the specific research question, e.g., if it’s about predicting whether someone has a disease then of course you want to be extra careful about predicting a positive score

by lowering the cut-off we are more likely to classify someone as a case, and can then do a follow-up test to make sure. a more liberal approach - more likely to catch the real cases

39
Q

what’s the difference between odds and probability

A

odds = p / (1 - p). the probability ranges between 0 and 1, whereas the odds range from 0 to positive infinity
40
Q

IV for example can be “attendance”

the ____ is linearly related to the IV

what 2 things are not linearly related to the IV

A

the relationship between the IV and the logit is linear. this means for every unit increase in your IV you get a constant increase in your logit.

e.g., if the coefficient was 0.1980 then a 1 unit increase in the IV would increase the logit by 0.1980, regardless of whether attendance (the IV) increased from 56 to 57% or from 63 to 64%

the probability and the odds are not linearly related to the IV. that’s the whole point: we’re fitting a logistic regression equation because we don’t have a linear relationship. (see the sketch below)
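
A sketch contrasting the three scales for 1-unit steps in the IV, again borrowing the deck’s attendance coefficients:

```python
import math

c, b = -12.259, 0.198
for attendance in (55, 56, 57):
    logit = c + b * attendance
    odds = math.exp(logit)
    p = odds / (1 + odds)
    print(attendance, round(logit, 3), round(odds, 4), round(p, 4))

# each step adds exactly 0.198 to the logit (linear in the IV),
# multiplies the odds by exp(0.198) ~ 1.219 (a constant ratio),
# but changes the probability by a different amount each time
```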

41
Q

for each successive pair of odds, a 1 unit increase in the IV ….

A

will always produce the same ratio

the ratio of successive odds is a constant (consistently the same)

this is what we call the odds ratio (1.2190 here)

this is why the odds ratio is often used as the effect size in logistic regression

the odds ratio reflects a one unit increase in your IV

42
Q

for each successive pair of probability values,

a 1 unit increase in the IV ….

A

will not produce the same difference across all pairs

43
Q

At 55% attendance the odds of passing were 0.2551. what are the odds of someone with 56% attendance passing?

  • 55% (odds = 0.2551)
  • odds ratio of 1.2190
A

0.2551 * 1.2190 = 0.3110

basically, use the odds ratio as a multiplier on the odds at 55% to get each subsequent odds value

44
Q

how does SPSS fit a non-linear regression equation? what method does it use to do this?

A

it uses an iterative procedure: basically it takes a guess at what the parameters should be and keeps changing them until the difference between successive solutions is less than some critical value. this is more efficient when you have a large dataset with a complex equation

it does this using the maximum likelihood method to get the estimates for your parameters

meaning it selects/finds the coefficients that make the observed effects most likely

to determine what’s most likely it uses the sum of the squared residuals of the logit
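
Not SPSS’s actual internals, but a minimal numpy sketch of the iterative idea it describes - Newton-Raphson maximum likelihood on made-up pass/fail data, stopping when successive solutions barely differ:

```python
import numpy as np

# hypothetical data: attendance (IV) and pass/fail (DV)
x = np.array([30., 40., 50., 55., 60., 65., 70., 80., 90.])
y = np.array([ 0.,  0.,  0.,  1.,  0.,  1.,  1.,  1.,  1.])
X = np.column_stack([np.ones_like(x), x])    # design matrix: constant + IV

beta = np.zeros(2)                           # initial guess for (c, b)
for iteration in range(100):
    p = 1 / (1 + np.exp(-X @ beta))          # current predicted probabilities
    W = p * (1 - p)                          # Newton weights
    step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    beta += step
    if np.max(np.abs(step)) < 1e-8:          # successive solutions ~identical
        break

print(iteration, beta)                       # iterations used, then (c, b)
```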

45
Q

why might my logistic regression give me and Ray slightly different outcomes?

A

because it uses an iterative procedure that starts at potentially different points, it might give you a slightly different outcome to someone else

46
Q

what is the first table we consider with the logistic regression output

A

the dependent variable encoding table

have a close look and make sure what you consider a case is coded as 1

e.g., this should be pass

this is important not only because the nature of the equation would change, but so would our interpretations

47
Q

What is the block 0 model - what information does it give you?

A

SPSS making a prediction without using any IVs - basically a best guess

-2 log likelihood - a goodness-of-fit measure

classification table - tells us how well the model predicts the real outcomes:

  • specificity - % it correctly predicts what isn’t a case (0)
  • sensitivity - % it correctly predicts what is a case (1)
48
Q

block 0 model

if specificity is 0 and sensitivity is 100% with an overall accuracy of 50% - what does the model tell us.

A
  • specificity - % it correctly predicts what isnt a case (0)
  • sensitivity - % it correctly predicts what is a case (1)

just guessing that everyone will pass gives an overall accuracy of 50%, with a sensitivity of 100% and a specificity of 0%.

49
Q

what variables are in the equation for our block 0 model

A

constant only

50
Q

what is the logit of this?

logit equation (c + bX)

then what are the odds using the logit?

A

0 - we only have the constant, and it has a value of 0

the odds are then exp(0) = 1

51
Q

Why is the block 0 model useful

A

We use block 0 as a benchmark - how well would you do if you just predicted the most frequent outcome?

any subsequent model that uses your IV is assessed against that benchmark. if everyone had the same outcome we wouldn’t need to fit any model

52
Q

what is block 1

A

Step where the actual IV(s) are being fitted

53
Q

what does this tell us

A

SPSS took six iterations to arrive at the final solution. the -2 log likelihood has decreased from the block 0 model

54
Q

What tells us whether the jump from block 0 to the block 1 model is a significant improvement in prediction?

A

Omnibus tests of model coefficients

Chi squared.

it tells us the difference in the -2 log likelihood between block 0 and block 1; the p value tells us whether this difference is significant

55
Q

what do the Cox and Snell R2 & Nagelkerke R2 tell us

A

Goodness-of-fit measures

NOT like R2 in linear regression – they do not tell us about the variance explained in the DV. don’t describe them in terms of variance explained; that only makes sense in the context of linear regression

referred to as pseudo R squares

Nagelkerke’s pseudo R2 is the preferred one in the literature because it is normalised – it ranges between 0 and 1

a value of 1 is as good as it gets

you SHOULD always report the value, but it is most useful when you compare different models, as higher values indicate better fit/performance

56
Q

what does the Hosmer and Lemeshow test tell us

A

another goodness of fit test. tells us how well the model fits the data.

if it is not significant, there is no significant difference between the model’s predictions and the data - the model fits well

if it is significant, the prediction deviates significantly from the data - the model does not do a good job

57
Q

what does the variables in the equation table tell us

A
  • the constant
  • the IV coefficients
  • the odds ratio for each predictor - Exp(B)
58
Q

use this to write the logit equation

A

log odds = -12.259 + (0.198 * attendance)

59
Q

interpret what the last right column means

A

Exp(B) is the odds ratio. here it is 1.219, which means that for every 1% increase in attendance the odds of passing are multiplied by 1.219 (i.e., increase by about 22%).

60
Q

what is the Wald statistic - what does it tell us here

A

the significance of the coefficients is determined using a Wald statistic, which is a bit conservative

using it, we can see that neither the coefficient for the constant nor the one for attendance is significant (the p values are larger than .05)

looking at this statistic we might conclude that including attendance doesn’t help us at all - not significant

61
Q

if the Wald statistic tells us the IV is not significant, shall we just delete our whole SPSS account and go to sleep?

A

No! the test itself is quite conservative. even if it says something is not significant, it’s best to look at your overall model (omnibus test of model coefficients)

remember, this compares the -2 log likelihoods and decides whether block 1 is a significant improvement. this is the better indicator of significance: it tells us at an overall level whether the inclusion of the IVs significantly improved the model

the Wald statistic is applied to each coefficient independently + is more conservative

62
Q

what can we do to inspect the misclassified items?

A

you can inspect them case by case - helpful for smaller datasets. if you have thousands of people then the classification plot is more helpful

63
Q

interpret this classification plot

A

the FFFFFFFPPPPPPP along the bottom is the predicted outcome

the symbols above are the actual outcomes

also note how many people each symbol represents (here each symbol is .25 of a person, so every 4 symbols is 1 person)

if a number of people were misclassified slightly to the right of the cut-off, you might want to change your cut-off to a higher value, e.g., .8

64
Q

what does casewise list tell us

A

whether there were any outliers. if there are no outliers, the casewise list of residuals will be empty

65
Q

can we only do logistic regression with 1 variable?

A

No, just like with any regression we can include multiple IVs in the model

each predictor variable will have its own coefficient

66
Q

With multiple IVs in logistic regression can we use the block 0 model to write our logit equation?

A

yes, but because we don’t have any IVs the logit is simply the value of the constant

the Exp(B) for the constant here is .684 - the odds

67
Q

how do we know if the inclusion of the multiple IVs gives a better fit than the block 0 model

A

The -2 log likelihood has dropped from block 0 to block 1 indicating a better fit to the data.

we can check whether this drop is significant by looking at the omnibus test (model row)

a second way: compare the sensitivity, specificity and overall accuracy of the two models in the classification table

68
Q

pseudo R squared statistics - when are these useful to look at

A

When you are comparing multiple models

a higher value tells you the model is good. compare this value across models to see which one is better

69
Q

what does the Hosmer and Lemeshow test show here

A

we want this to be non-significant - that indicates there is no significant difference between the predicted values and the observed data values

70
Q

Write the logit (log odds) regression equation from this table

A

of course we also want to know whether all variables in the model contributed significantly - this table indicates they do (under sig)

so the Wald statistic confirms what the omnibus test told us

71
Q

with multiple IVs in the logit regression equation - do we have a single odds ratio summarising all coefficients OR does each coefficient have its own odds ratio?

A

each coefficient has its own odds ratio, Exp(B)

72
Q

using the odds ratios, interpret the relationship between

  • idealism scores and the decision to stop research on cats
  • relativism scores and deciding to continue the research on cats
A

idealism has an odds ratio of .502

  • means a 1 point increase in idealism score multiplies the odds of someone deciding to continue the research by .502 - a reduction
  • negative relationship between idealism and something being considered a case

relativism has an odds ratio of 1.409

  • this means a 1 unit increase in relativism score multiplies the odds of someone deciding to continue the research by 1.409
  • positive relationship between relativism and something being considered a case
73
Q

here the constant is not significant. should we remove it from the model?

A
74
Q

using the odds ratio, interpret the relationship between

  • gender and the decision to stop research on cats
A

gender is a binary variable

  • males coded as 1
  • means the odds of men deciding to continue the research are 3.225 times higher than for women

so basically it’s looking at the difference between men and women; men were coded as 1, so the odds ratio describes how the odds for men differ from those for women

75
Q

when looking at the relationship between the IVs and the predictions, e.g., deciding to continue or stop research - is it better to look at the odds or the coefficients?

A

better to look at the odds, as the coefficients are sensitive to the scale of the IV while the odds are not

so the magnitude of odds ratio is more informative in that respect

76
Q

with which type of regression are we interested in residuals

A

all

77
Q

let’s look at the residuals, taking a single case at random, e.g.:

A
  • compute the logit from the logit regression equation
  • then use this to predict the odds
  • exp(logit) = .121 (so the odds of this person deciding to continue the research are low)
  • then we get the probability/prediction from odds/(1 + odds), which leaves us with .108

they actually decided to continue the research (actual score 1)

so the residual here is 1 - .108 = .892
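
The same arithmetic as a quick check (odds value taken from this card):

```python
odds = 0.121                 # exp(logit) for this case
p = odds / (1 + odds)        # predicted probability of continuing
actual = 1                   # they actually decided to continue
residual = actual - p
print(round(p, 3), round(residual, 3))   # 0.108 0.892
```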

78
Q

What is the residual

A

The difference between the actual binary outcome and predicted probability of something being a case

79
Q

What is stored in the casewise list

A

the casewise list flags any outliers

  • the default is any residual that exceeds 2 SD
80
Q

What are the three end columns showing

A
  • the residuals
  • Resid is the unstandardised residual, which we get by subtracting the predicted from the observed outcome values
  • then we have our standardised residuals
  • you can see here they all exceed 2 SD
  • the studentised residual also exceeds 2 SD
81
Q

What should you do if you have some variables in the case wise list identified as outliers?

A

Investigate them and look for a pattern in the cases identified as outliers, e.g., a value entered in an incorrect way, such as a case with a residual of 10

82
Q

How do outliers affect the regression equation

A

They have a disproportionate effect on the predictive equation. If you remove them, the regression equation will change.

83
Q

How do I know whether to remove or retain outliers in my regression analysis ?

A

Keep them in if

  • you only have a few outliers, e.g., 4 out of 315
  • they are close to the threshold of e.g. 2 SD (remove one if it was like 10 or something)
84
Q

why are normal residual plots not really applicable to logistic regression?

A

because the DV takes the form of either 0 or 1

  • thus any residuals will be highly clustered
85
Q

Look at the residual plot. How alarming would this be in OLS regression vs logistic regression

A

it would be very concerning in an OLS regression but not so much with logistic regression. the plots here are not high in diagnostic value, which is why we do not make use of them.

86
Q

Ok all 3 IVs are significant.

How can we figure out what the best predictor in our logistic regression model is? And what the order of the remaining predictors is?

A
  • so far we have used the enter method to run the logistic regression
  • for ordering the IVs from best to worst we use a stepwise procedure
  • called the forward likelihood ratio - “forward LR”
  • this evaluates the contribution of each predictor with respect to the overall log likelihood of the model
  • i.e., is there a significant change in the -2 log likelihood when this predictor is added to the equation?
  • this allows you to formally test the relative merit of each variable, and the order of their significance
87
Q

Discuss the output of the forward likelihood ratio (forward LR)

A

you get output in a series of steps – 1 step for each variable in the model

e.g., 3 steps if 3 variables are added to the model

at each step another predictor variable is added and you get updated results

88
Q

Interpret this forward LR output

A

The different steps add a new predictor into the model. remember, the order in which they are added reflects their relative contributions to the overall fit and the reduction in -2 log likelihood

  • step 1: idealism is added (best predictor)
  • step 2: gender is added (second best predictor)
  • step 3: relativism is added (third best predictor)
  • when more variables are added to the model, the coefficients and odds ratios of the other variables change
  • note how idealism’s coefficient and odds ratio both change when gender is entered, then again when gender + relativism are entered
89
Q

Analysing logistic regression using the enter method vs the forward LR (stepwise, best to worst) method - how would the FINAL 3 variables look in terms of coefficients and odds ratios?

A

They would be exactly the same.

90
Q

describe classification table using the forward LR method

A
  • you get a different classification table for each variable added into the analysis
  • what is important to report alongside the classification table? the cut-off point
  • in all classification tables the specificity, sensitivity and overall percentage correct are affected by the cut-off point
91
Q

Cross Validation

A
  1. An approach where you derive the regression equation from one dataset and then apply it to another dataset. Can use CV to test how well the regression equation predicts novel data
  2. Could compute the regression equation for half the dataset, then test this model on the other half
  3. Use this regression equation in the other dataset to predict the outcomes (logit, odds, and probabilities). Use these along with the cut-off to make a prediction (whether something is classified as a case or not a case)
  4. Then, for this new dataset, cross-tabulate the predicted outcome against the actual outcome: the “cross-validation classification table”
  5. Then use overall accuracy, specificity and sensitivity to determine if you have a good regression equation (a minimal sketch follows below)
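
A minimal sketch of the split-half procedure using scikit-learn (simulated data and hypothetical variable names; note sklearn’s LogisticRegression also adds mild regularisation by default):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))                          # one IV, 200 cases
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # binary DV

# steps 1-2: derive the equation from one half of the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# step 3: predict the held-out half (the .5 cut-off is the default)
y_pred = model.predict(X_test)

# step 4: the cross-validation classification table (actual vs predicted)
print(confusion_matrix(y_test, y_pred))
```
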
92
Q

When have we covered cross validation in the context of linear regression?

A

There we used t tests and correlations to test how well a model predicts data (generalises). In logistic regression we use the classification table to test how well the model does this.

93
Q

Interpret this cross-validation classification table

A

the left column shows stop and continue (actual outcome) while the top shows stop and continue (predicted outcome)

53 decided to stop – correctly predicted to stop, 6 others decided to stop but were incorrectly predicted to continue. 35 decided to continue but were incorrectly predicted to stop, and 21 decided to continue and were correctly predicted to continue.

  • Specificity of model: 53/59 = 89.8%
  • Sensitivity of model: 21/56 = 37.5%
  • Overall accuracy: (53 + 21) / 115 = 64.3%

Overall accuracy = everyone correctly predicted to stop/continue divided by everyone.
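
The card’s three figures, recomputed from the four cell counts:

```python
stop_correct, stop_wrong = 53, 6      # actual "stop": predicted stop / continue
cont_wrong, cont_correct = 35, 21     # actual "continue": predicted stop / continue
total = stop_correct + stop_wrong + cont_wrong + cont_correct   # 115

specificity = stop_correct / (stop_correct + stop_wrong)        # 53/59
sensitivity = cont_correct / (cont_correct + cont_wrong)        # 21/56
accuracy = (stop_correct + cont_correct) / total                # 74/115

print(round(specificity, 3), round(sensitivity, 3), round(accuracy, 3))
# 0.898 0.375 0.643
```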

94
Q

Using cross-validation the overall accuracy is 64.3%. is this good or bad?

A

to decide whether this is a good model you might have to compare its performance to chance performance - both in the overall dataset and/or in the new independent dataset