Logistic regression Flashcards
what makes logistic regression different from all the other types of regression: linear regression, multiple linear regression, non-linear regression?
For the others
they model ratio/scale data - the DV must be ratio/scale
necessary because we use the sum of squared residuals as a means to fit the model - a parametric approach
logistic regression
used if the DV has a limited range - e.g., either 1 or 0, or between 0 and 100
- could be like marks in a test
- accuracy scores
- etc
what is the typical linear regression equation?
Y = c + bX - here we assume a linear relationship between the IV and DV
why can we not use linear regression if our DV is limited in range e.g., pass (1) or fail (0)?
because certain IV values will map onto 0 (e.g., 40%) and other values onto 1 (e.g., 90%)
Problem: below 40 the model will predict values lower than 0, and above 90 it predicts values higher than 1. Also, anything in between will equate to something between 0 and 1.
this doesn't make sense
- can't have values between 0 and 1 - we want to predict ONLY 0 and ONLY 1
- and can't have values exceeding 1 / less than 0, but the regression equation predicts values outside the 0-to-1 range
serious problem. This is because it creates really large residuals, which will distort or bias our regression fit.
Residuals will also violate the assumption of homoscedasticity because the data range is limited to 0 and 1.
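To see the problem concretely, here is a minimal sketch (the attendance/pass data are invented for illustration) that fits ordinary least squares by hand and shows the straight line predicting values outside the 0-to-1 range:

```python
# Invented data: attendance (%) as the IV, pass (1) / fail (0) as the DV
xs = [30, 40, 50, 60, 70, 80, 90]
ys = [0, 0, 0, 1, 1, 1, 1]

# Closed-form OLS fit: b = cov(x, y) / var(x), c = mean(y) - b * mean(x)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
c = my - b * mx

def predict(x):
    return c + b * x

# The straight line happily predicts impossible values
print(predict(100))  # greater than 1: "more than certain" to pass
print(predict(20))   # less than 0: "less than impossible" to pass
```

The residuals for those extreme predictions are exactly the large, range-violating residuals the card describes.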
what is logistic regression
a special case of non linear regression.
why is logistic regression a special case of non linear regression
because it deals with this limitation in range
different types of logistic regression
Logistic regression
if you have a limited range in the DV, e.g., proportion of correct answers on a test. This gives a continuous prediction
Binary logistic regression
type of logistic regression where the DV is binary e.g., 0 or 1. this just ensures we get a binary outcome of either 0 or 1.
both cases deal with this limitation in range of the DV
if I asked people to respond on a 7-point Likert scale and then averaged the scores, would I use logistic regression, because technically there is a limited range of answers?
No, because while the scale is limited in range you are analysing the average, which, according to the central limit theorem, is approximately normally distributed.
what's the big problem of using linear regression with data limited in range?
the linear equation will fit the 0/1 values at certain points, but everywhere else the residual is large! the big problem is that it will predict values larger than 1 and smaller than 0.
we have a real problem with the residuals, because whenever we fit linear regression models the residuals are what we use to do the fitting
this will bias any result we get - a real problem
can't we just fit a non-linear curve to the binary/limited-range DV?
one that nicely levels off at 0 and at 1
Let's say we invent and fit a logistic curve to the binary data - it seems to do quite ok. Can we be satisfied with this?
no - while it fits ok, we want to find the best-fitting logistic curve. that's what logistic regression does:
finding the best-fitting curve that has an S shape
What is the equation for the non-linear curve we fit in logistic regression?
p = e^(c + bX) / (1 + e^(c + bX))
what is e?
it's a constant called Euler's number (approximately 2.718)
what is the OLS regression equation?
Y = c + bX
what is the rationale for using the logistic regression equation?
- deals with the limitation of range - e.g., 0 to 100
- its functional form is very flexible - fits a wide range of data
- there are analytical solutions for it - just raising Euler's number to the power of c + bX
- easier to compute than other non-linear regression problems
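A small sketch of the logistic curve (the constant c and coefficient b here are made up for illustration) showing why it deals with the range limitation: whatever X you feed in, the prediction stays strictly between 0 and 1, levelling off at both ends:

```python
import math

c, b = -12.0, 0.2  # hypothetical constant and coefficient

def logistic(x):
    # p = e^(c + bX) / (1 + e^(c + bX)), with e being Euler's number
    z = math.exp(c + b * x)
    return z / (1 + z)

# Predictions never escape the 0-to-1 range, unlike a straight line
print(logistic(0))    # close to 0
print(logistic(60))   # middle of the S
print(logistic(120))  # close to 1
```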
just like in linear regression, the form of the equation we are fitting is _____?
fixed
thus when fitting the model we are just finding the best-fitting numerical values for the coefficients in the equation (c and b)
what is logistic regression doing?
modelling/predicting data between 0 and 1
mathematically, what is a prediction, and what do we call it in statistics?
mathematically a prediction is the probability that a case has a value of 0 or 1
how do we get the probability/prediction?
plug your value of X into the logistic regression equation
what can we use the probability (prediction) to compute?
the odds
the equation: the probability of an event happening divided by the probability of it not happening
essentially the odds are Euler's number raised to the power of the regression equation (c + bX) with our best-fitting coefficients. so if you know the logistic regression equation you can compute the probability directly, but you can also compute the odds
what do we use to measure effect size in logistic regression
the odds ratio
how do we compute the log odds
the natural logarithm of the odds
taking the natural logarithm is the inverse of raising e to a power
what kind of relationship does the logistic regression have with the log odds
any logistic regression is linear with respect to the log odds (just like with OLS regression)
so by taking the natural logarithm of the odds you are creating a new unit (or DV, if you will) that is linear in terms of the independent variable X
the log odds vary from negative infinity to infinity as the probability moves from 0 to 1
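A quick numerical check (with hypothetical coefficients) that the log odds really are linear in X: every 1-unit step in X shifts the logit by exactly the coefficient b, while the logit itself can run from very negative to very positive as the probability approaches 0 or 1:

```python
import math

c, b = -12.0, 0.2  # hypothetical coefficients

def prob(x):
    z = math.exp(c + b * x)
    return z / (1 + z)

def logit(p):
    # natural logarithm of the odds
    return math.log(p / (1 - p))

# The logit recovered from the probabilities is linear in X:
steps = [logit(prob(x + 1)) - logit(prob(x)) for x in range(40, 81)]
print(steps[0])       # every step equals b

print(logit(0.0001))  # large negative as p -> 0
print(logit(0.9999))  # large positive as p -> 1
```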
what is another word for log odds
logit
logit regression equation
logit = c + bX
- the result of this is your logit
- or logistic probability unit
here the logit for a yes vs no answer is -213.056
so we have the normal regression equation with the constant and coefficient. to get the logit we just plug X (-6 in this particular case) in
when would you use Euler's number?
if you wanted to compute the odds or probability for a data point that is not in your dataset.
how do you go from the logit to the odds?
exp(...) is just a different notation for Euler's number raised to the power of whatever is in the parentheses. this is how you would enter it (universal - this is how it's done in R, SPSS, MATLAB).
describe the relationship between the logit and odds
raising e to the power of the logit, and taking the natural logarithm, are inverse operations of one another
So you take the natural log of the odds to get the logit (orange arrow). To go from the logit to the odds you raise e to the power of the logit (green arrow)
To get the odds you literally just type:
“exp(logit value, -213.056 here)”
what would 2.9577E-93 be?
just a really small number. A negative sign after the E means you shift the decimal place that many places (93 here) to the left.
if it were E+93, you would shift the decimal place 93 places to the right
how do you go from the odds to the probability/prediction ?
odds/(1+odds)
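Chaining the conversions from the last few cards together as a sketch, using the logit value -213.056 from the example (exp and the natural log really are inverses, so the chain comes back to where it started):

```python
import math

logit = -213.056                       # logit from the example
odds = math.exp(logit)                 # logit -> odds: e raised to the logit
p = odds / (1 + odds)                  # odds -> probability
back_to_logit = math.log(p / (1 - p))  # probability -> odds -> logit again

print(odds)  # the "really small number" from the slides, about 2.9577E-93
```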
Why has Lore added the column on the end, "rounded p"?
Because the number is soooo small (look at the e- on the end). And remember we are doing logistic regression - so the outcome has to be either 0 or 1.
here, has the logistic regression done a good job matching the outcome?
in this example, you can see the prediction/probability matches the outcome very well (yes/no column). so in this case the logistic regression equation does a really good job.
what relationship is there between the measure and the logit?
if you plotted the relationship between the measure and the logit it would be perfectly linear
what is the relationship between the logit and the measure?
linear
how does this compare to the relationship between the measure and the data itself (the yes/no responses)?
that relationship has a limitation in range, and an S-shaped curve is the best fit
what the hell is a case vs not a case?
when the prediction of the logistic regression is a "yes" or 1 it's a case, and "no" or 0 is not a case.
is it always right?
sometimes your model might predict something to be a case when it's actually not. this is where the residuals come in (this is what we're trying to minimise when we fit the regression equation)
(we try to minimise the sum of the squared residuals)
(residuals of the logit)
how do we turn the probability spat out by the logistic regression into a binary outcome?
you have a cut-off, typically .5
what do the odds range between?
0 and positive infinity
what range for the probability?
probability can only range between 0 and 1
what's the relationship between the two?
they are related such that an increase in the probability means an increase in the odds
Which do we report: the odds or the probabilty?
up to you!
how do you write up whether something was a case or not?
how can we find the point of 50/50 split in the dataset
take the negative of the constant and divide it by the coefficient (X = -c/b)
this tells you where the prediction is exactly .5
this data point might not exist in your dataset, but it tells you which level of the IV would mark the 50/50 split in prediction
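A sketch using the constant and coefficient from the attendance example later in these cards (-12.259 and 0.198): the 50/50 point is -c/b, and plugging it back into the logistic equation returns a probability of exactly .5:

```python
import math

c, b = -12.259, 0.198  # constant and coefficient from the attendance example

x50 = -c / b           # level of the IV giving a predicted probability of .5
print(x50)             # around 62% attendance

def prob(x):
    z = math.exp(c + b * x)
    return z / (1 + z)

print(prob(x50))       # exactly .5 at the 50/50 point
```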
what is a classification table?
a table showing how many cases were correctly labelled as either a fail or a pass
when might you consider changing the cut-off?
look at the classification table. Maybe you care more about correctly predicting X rather than Y. this might depend on the specific research question - e.g., if it's about predicting whether someone has a disease then of course you want to be extra careful not to miss a positive case
by lowering the probability cut-off we are more likely to get a case, and can then do a follow-up test to make sure. a more liberal approach - more likely to catch the real cases
what's the difference between odds and probability?
the IV, for example, can be "attendance"
____ is linearly related to the _IV___
what 2 things are not linearly related to the IV?
the relationship between the IV and the logit is linear. means for every unit increase in your IV you have a constant increase in your logit.
e.g., if the coefficient was 0.1980 then a 1-unit increase in the IV would increase the logit by 0.1980 - regardless of whether attendance (the IV) increased from 56-to-57% or from 63-to-64%
the probability and odds are not linearly related to the IV. that's the whole point - we're fitting a logistic regression equation because we don't have a linear relationship.
for each successive pair of odds, for a 1-unit increase in the IV…
the ratio will always be the same
the ratio of successive odds is a constant (consistently the same)
this is what we call the odds ratio (1.2190 here)
why is the odds ratio often used as the effect size in logistic regression?
it tells you how the odds change for a one-unit increase in your IV
for each successive pair of probability values, for a 1-unit increase in the IV…
the difference will not be the same across all pairs
At 55% attendance the odds of passing were 0.2551. what are the odds of someone with 56% attendance passing?
- 55% (odds = 0.2551)
- odds ratio of 1.2190
0.2551 * 1.2190 = 0.3110
basically use the odds ratio as a multiplier on the odds at 55% to get each subsequent odds
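A numeric check of the multiplier idea, using the attendance example's coefficients (the exact odds differ slightly from the rounded slide values): the ratio of odds at successive attendance levels is always exp(b), the odds ratio:

```python
import math

c, b = -12.259, 0.198     # attendance example coefficients

def odds(x):
    return math.exp(c + b * x)

odds_ratio = math.exp(b)  # Exp(B)
print(odds_ratio)         # about 1.219

# successive odds always differ by the same multiplier...
print(odds(56) / odds(55))  # equals the odds ratio
print(odds(64) / odds(63))  # same again

# ...so the slide's calculation: odds at 55% times the odds ratio gives odds at 56%
print(0.2551 * odds_ratio)  # about 0.311
```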
how does SPSS fit a non-linear regression equation? what method does it use?
it uses an iterative procedure. basically it takes a guess at what the parameters should be and keeps changing them until the difference between successive solutions is less than some critical value. this is more efficient if you have a large dataset with a complex equation
it does this using the maximum likelihood method to get the estimates for your parameters
meaning it selects/finds the coefficients that make the observed data most likely
to determine what's more likely it uses the sum of the squared residuals of the logit
why might my logistic regression give me and Ray slightly different outcomes?
because it uses an iterative procedure, potentially starting at different points, it might give you a slightly different outcome to someone else
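A toy sketch of what an iterative maximum-likelihood fit is doing (the data are invented and the plain gradient-ascent update is a simplification - SPSS uses a more sophisticated algorithm - but the idea of adjusting the coefficients until successive solutions barely change is the same):

```python
import math

# invented scores (IV) and pass/fail outcomes (DV)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

def prob(x, c, b):
    return 1 / (1 + math.exp(-(c + b * x)))

def log_likelihood(c, b):
    # higher = the observed outcomes are more likely under these coefficients
    return sum(math.log(prob(x, c, b) if y else 1 - prob(x, c, b))
               for x, y in zip(xs, ys))

c, b = 0.0, 0.0   # starting guess
rate = 0.01       # step size
for step in range(500000):
    ps = [prob(x, c, b) for x in xs]
    # gradient of the log likelihood with respect to c and b
    gc = sum(y - p for y, p in zip(ys, ps))
    gb = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
    c += rate * gc
    b += rate * gb
    # stop once successive solutions change less than some critical value
    if max(abs(rate * gc), abs(rate * gb)) < 1e-12:
        break

print(c, b)  # fitted coefficients
```

Different starting guesses or critical values can leave the procedure at very slightly different final coefficients, which is the point the card makes.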
what is the first table we consider in the logistic regression output?
the dependent variable encoding table
have a close look and make sure what you consider a case is coded as 1
e.g., this should be pass
this is important not only because the nature of the equation would change, but so would our interpretations
What is the block 0 model - what information does it give you?
SPSS making a prediction without using any IVs. basically a best guess
-2 log likelihood - a goodness-of-fit measure
classification table
- gives us the specificity and sensitivity - how well the model predicts the real outcomes
- specificity - % it correctly predicts what isn't a case (0)
- sensitivity - % it correctly predicts what is a case (1)
block 0 model
if specificity is 0 and sensitivity is 100% with an overall accuracy of 50% - what does the model tell us?
- specificity - % it correctly predicts what isn't a case (0)
- sensitivity - % it correctly predicts what is a case (1)
just guessing everyone will pass gives an overall accuracy of 50%, with a sensitivity of 100% and a specificity of 0%.
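The block 0 "best guess" in miniature (the outcomes are invented - half passes, half fails): predicting the most frequent outcome for everyone yields exactly this pattern of accuracy, sensitivity and specificity:

```python
# invented outcomes: 4 fails (0) and 4 passes (1)
ys = [0, 0, 0, 0, 1, 1, 1, 1]

guess = 1                 # predict "pass" for everyone
preds = [guess] * len(ys)

accuracy = sum(p == y for p, y in zip(preds, ys)) / len(ys)
sensitivity = sum(p == 1 and y == 1 for p, y in zip(preds, ys)) / ys.count(1)
specificity = sum(p == 0 and y == 0 for p, y in zip(preds, ys)) / ys.count(0)

print(accuracy, sensitivity, specificity)  # 0.5 1.0 0.0
```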
what variables are in the equation for our block 0 model?
constant only
what is the logit of this?
the logit equation (c + bX) reduces to just the constant; here the constant has a value of 0, so the logit is 0
then what are the odds using the logit?
the odds are e raised to the power of the logit: exp(0) = 1
Why is the block 0 model useful?
We use block 0 as a benchmark - how well would you do if you just predicted the most frequent outcome?
any subsequent model that uses your IV is assessed against that benchmark. If everyone had the same outcome we wouldn't need to fit any model
what is block 1
Step where the actual IV(s) are being fitted
what does this tell us?
SPSS took six iterations to arrive at the final solution. The -2 log likelihood has decreased from the block 0 model
What tells us whether the jump from block 0 to the block 1 model is a significant improvement in prediction?
Omnibus tests of model coefficients
Chi squared.
Tells us the difference in the -2 log likelihood between block 0 and block 1, p value tells us whether this is significant
what do Cox and Snell R2 & Nagelkerke R2 tell us?
they are goodness-of-fit measures
NOT like R2 in linear regression - they do not tell us about the variance explained in the DV. Don't explain them in those terms; it makes no sense in the context of logistic regression
Referred to as pseudo R-squares
Nagelkerke pseudo R2 is the preferred one in the literature because it's normalised - it ranges between 0 and 1
A value of 1 is as good as it gets
You SHOULD always report the value, but it's most useful when comparing different models, as higher values indicate better fit/performance
what does the Hosmer and Lemeshow test tell us?
another goodness-of-fit test. tells us how well the model fits the data.
if it's not significant, there is no significant difference between the model's predictions and the data - good
if it is significant, the prediction deviates significantly from the data - the model does not do a good job
what does the variables in the equation table tell us?
- constant
- IV coefficients
- odds ratio for each predictor - Exp(B)
use these to write the logit equation:
log odds = -12.259 + (0.198 * attendance)
interpret what the last right column means
Exp(B) is the odds ratio. here it is 1.219, which means that for every 1% increase in attendance the odds of passing are multiplied by 1.219.
what is the Wald statistic - what does it tell us here?
the significance of the coefficients here is determined using the Wald statistic. it is a bit conservative.
using it, we can see neither the coefficient for the constant nor for attendance is significant (p values are larger than .05).
looking at this statistic, we might conclude that including attendance doesn't help us at all - not significant.
if the Wald statistic tells us the IV is not significant, shall we just delete our whole SPSS account and go to sleep?
No! the test itself is quite conservative. Even if it says something is not significant, it's best to look at your overall model (omnibus test of coefficients)
remember this compares the -2 log likelihoods and decides whether block 1 is a significant improvement or not. this is a better indicator of significance - it tells us at an overall level whether the inclusion of IVs significantly improved the model.
the Wald statistic is applied to each coefficient independently + is more conservative.
what can we do to inspect the misclassified items?
look at the casewise list - this is helpful for smaller datasets. if you have thousands of people then the classification plot is more helpful
interpret this classification plot
FFFFFFFPPPPPPP along the bottom is the predicted outcome
the symbols above are the actual outcomes
also note how many people each symbol represents (here each symbol is .25 of a person, so every 4 symbols is 1 person)
if only a few people were misclassified slightly to the right, you might want to change your cut-off to a higher value, e.g., .8
what does the casewise list tell us?
whether there were any outliers. if there are no outliers the casewise list of residuals will be empty
can we only do logistic regression with 1 variable?
No, just like with any regression we can include multiple IVs in the model
each predictor will have its own coefficient
With multiple IVs in logistic regression can we use the block 0 model to write our logit equation?
yes, but because we don't have any IVs the logit is simply the value of the constant
the Exp(B) for that constant, .684, gives the odds
how do we know if the model including the multiple IVs is a better fit than the block 0 model?
The -2 log likelihood has dropped from block 0 to block 1, indicating a better fit to the data.
we can check whether this is significant by looking at the omnibus test (model row).
a second way: compare the sensitivity, specificity and overall accuracy of the two in the classification table
pseudo R-squared statistics - when are these useful to look at?
When you are comparing multiple models
a higher value tells you the model fits better. compare this value across models to see which one is better
what does the Hosmer and Lemeshow test show?
we want this to be non-significant. that indicates there is no significant difference between the predicted values and the observed data values
Write the logit (log odds) regression equation from this table
of course we also want to know whether all variables in the model contributed significantly - this table indicates they do (under Sig.)
so the Wald statistic confirms what the omnibus test told us
multiple IVs in the logit regression equation - is there a single odds ratio summarising all coefficients, OR does each coefficient have its own odds ratio?
each coefficient has its own odds ratio, Exp(B)
using the odds ratio, interpret the relationship between
- idealism scores and the decision to stop research on cats
- relativism scores and deciding to continue the research on cats
idealism has an odds ratio of .502
- means a 1-point increase in idealism score leads to a reduction in the odds of someone deciding to continue the research
- negative relationship between idealism and something being considered a case
relativism has an odds ratio of 1.409
- means a 1-unit increase in relativism score multiplies the odds of someone deciding to continue the research by 1.409
- positive relationship between relativism and something being considered a case
here the constant is not significant. should we remove it from the model?
no - the constant is part of the equation and is kept regardless of its significance
using the odds ratio, interpret the relationship between
- gender and the decision to stop research on cats
gender is a binary variable
- males coded as 1
- the odds ratio means the odds for men deciding to continue the research are 3.225 times higher than for women
so basically it's looking at the difference between men and women; men were coded as 1, so anything it picks up is what differed between men and women.
when looking at the relationship between the IVs and prediction scores (e.g., deciding to continue or stop research), is it better to look at the odds ratios or the coefficients?
Better to look at the odds ratios, as the coefficients are sensitive to scale while the odds ratios are not
so the magnitude of the odds ratio is more informative in that respect
with which type of regression are we interested in residuals
all
let's look at the residuals, taking a single case at random:
- write the logit regression equation
- then use this to predict the odds
- exp(logit) = .121 (basically the odds of this person deciding to continue the research are low)
- then we get the probability/prediction for this, which (odds/(1+odds)) leaves us with .108
they actually decided to continue the research (actual score 1)
so the residual here is 1 - .108 = .892
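The same worked example as a sketch (the logit of about -2.112 is back-computed from the odds shown on the slide, so treat it as approximate):

```python
import math

logit = -2.112          # approximate logit for this case
odds = math.exp(logit)  # about .121: low odds of deciding to continue
p = odds / (1 + odds)   # about .108: the predicted probability
residual = 1 - p        # they actually continued (actual score 1)

print(odds, p, residual)
```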
What is the residual
The difference between the actual binary outcome and predicted probability of something being a case
What is stored in the casewise list
Case wise list flags any outliers
- default is any residual that exceeds 2 SD
What are the three end columns showing?
- the residuals
- Resid is the unstandardised residual that we get by subtracting the predicted outcome value from the observed one
- Then we have our standardised residuals
- You can see here they all exceed 2 SD
- The studentized residuals also exceed 2 SD
What should you do if you have some variables in the case wise list identified as outliers?
Investigate them and look for a pattern to those cases which are identified as outliers e.g., a value entered in an incorrect way. E.g., a case with a residual of 10
How do outliers affect the regression equation?
They have a disproportionate effect on the predictive equation. Removing them would change the regression equation.
How do I know whether to remove or retain outliers in my regression analysis?
Keep them in if
- you only have a few outliers e.g., 4 out of 315
- they are close to the threshold of 2 SD (remove one if its residual was something like 10)
why are normal residual plots not really applicable to logistic regression?
Because the DV takes the form of either 0 or 1
- thus the residuals will be highly clustered
Look at the residual plot. How alarming would this be in OLS regression vs logistic regression?
It would be very concerning in OLS regression, but not so much with logistic regression. These plots are not high in diagnostic value here, which is why we do not make use of them.
Ok all 3 IVs are significant.
How can we figure out what the best predictor in our logistic regression model is? And what the order of the remaining predictors is?
- so far we have used the enter method to run the logistic regression
- for ranking the IVs from best to worst we use a stepwise procedure
- called the forward likelihood ratio - "forward LR"
- this evaluates the contribution of each predictor with respect to the overall log likelihood of the model
- is there a significant change in the -2 log likelihood when this predictor is added into the equation?
- allows you to formally test the relative merit of each variable, and the order of their significance
Discuss the output of the forward likelihood ratio (forward LR)
Get output in a series of steps – 1 step for each variable in the model.
e.g., 3 steps if 3 variables are added to the model
each step – another predictor variable is added and you get an update of the results
Interpret this forward LR output
The different steps add a new predictor into the model. Remember the order they are added reflects their relative contributions to the overall fit and the reduction in -2 log likelihood
- step 1: idealism is added (best predictor)
- step 2: gender is added (second best predictor)
- step 3: relativism is added (third best predictor)
- when more variables are added to the model, the coefficients and odds ratios of the other variables change
- note how idealism's coefficient and odds ratio both change when gender is entered, then again when relativism is entered
Analysing logistic regression using the enter method vs the forward LR (stepwise in order of best-worst) method. How would the FINAL 3 variables look in terms of coefficients and odds ratio?
They would be the exact same.
describe the classification table using the forward LR method
- you get a different classification table for each variable added into the analysis
- what is important to report alongside the classification table? the cut-off point
- in all classification tables the specificity, sensitivity and overall percentage correct are affected by the cut-off point
Cross Validation
- An approach where you derive the regression equation from one dataset then apply it to another. Can use CV to test how well the regression equation predicts novel data
- Could compute the regression equation for half the dataset then test this model on the other half
- Use this regression equation in the other dataset to predict the outcomes (logit, odds, and probabilities). Use these along with the cut-off to make a prediction (whether something is classified as a case or not a case)
- Then for this new dataset, cross-tabulate the predicted outcome against the actual outcome: the "cross-validation classification table"
- Then use overall accuracy, specificity and sensitivity to determine if you have a good regression equation
When have we covered cross validation in the context of linear regression?
There we used t tests and correlations to test how well a model predicts data (generalises). In logistic regression we use the classification table to test how well the model does this.
Interpret this cross-validation classification table
Left column: stop and continue (actual outcome); top row: stop and continue (predicted outcome)
53 decided to stop – correctly predicted to stop, 6 others decided to stop but were incorrectly predicted to continue. 35 decided to continue but were incorrectly predicted to stop, and 21 decided to continue and were correctly predicted to continue.
- Specificity of model: 53/59 = 89.8%
- Sensitivity of model: 21/56 = 37.5%
- Overall accuracy: (53 + 21) / 115 = 64.3%
Overall accuracy = everyone correctly predicted to stop/continue divided by everyone.
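The table's arithmetic as a quick sketch, using the counts above:

```python
# rows: actual outcome, columns: predicted outcome
stop_stop, stop_cont = 53, 6   # actually stopped: predicted stop / continue
cont_stop, cont_cont = 35, 21  # actually continued: predicted stop / continue

total = stop_stop + stop_cont + cont_stop + cont_cont  # 115 people

specificity = stop_stop / (stop_stop + stop_cont)      # correct "not a case" predictions
sensitivity = cont_cont / (cont_stop + cont_cont)      # correct "case" predictions
accuracy = (stop_stop + cont_cont) / total             # everyone correctly predicted / everyone

print(round(specificity * 100, 1))  # 89.8
print(round(sensitivity * 100, 1))  # 37.5
print(round(accuracy * 100, 1))     # 64.3
```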
Using cross-validation the overall accuracy is 64.3%. Is this good or bad?
To decide whether this is a good model you might have to compare this to performance based on chance, both in the overall dataset and/or in the new independent dataset.