Lecture 6 - Binary Logistic Regression Flashcards
Logistic Regression is…?
…a modified form of the linear regression framework we have learned about so far. Logistic regression transforms the output of the linear regression model from a straight line into a sigmoidal curve, which is bounded between 0 and 1. Normal linear regression has no bounds and can in theory run to infinity in either direction.
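A minimal sketch of that transformation, assuming the standard logistic (sigmoid) function; the coefficients b0 and b1 below are made up purely for illustration:

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued linear output z onto the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -4.0, 0.5

for x in (0, 8, 16):
    z = b0 + b1 * x   # the linear regression part: unbounded
    p = sigmoid(z)    # the logistic transformation: bounded by 0 and 1
    print(f"x={x:>2}  linear output={z:+.1f}  probability={p:.3f}")
```

However large or small the linear output becomes, the transformed value stays between 0 and 1.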
The point of logistic regression is…?
…to model a criterion variable which is binary, i.e. one that can only take on one of two values. Just like in linear regression, the predictor variables can be continuous or categorical, and multiple predictors are fine.
The output from a logistic regression is usually interpreted as…?
…providing a probabilistic ‘guess’, based on the predictor variable scores, as to whether the criterion variable will be a 0 or a 1. So, if the output of the model for a given person is 0.9, this can be read as an estimated 90% probability that the individual has a ‘1’ on the criterion variable.
In the SPSS output, where is the difference in -2LL between the model and model without predictors?
In the Omnibus table, under chi-square.
In the SPSS output, where is the equivalent of SSE-left, but for -2LL, between the model and model without predictors?
In the Model Summary table, under -2 log-likelihood.
What is the Likelihood ratio?
The difference in -2LL between the model and model without predictors.
What is the name of the equivalent for Likelihood ratio in linear regression?
SSE-reduced.
Using the SPSS output from logistic regression, how can you calculate the -2LL of the model with no predictors (the equivalent of SSE-total)?
Add the likelihood ratio from the Omnibus table to the -2LL (SSE-left equivalent) from the Model Summary table.
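As a small worked example of that addition (the two numbers are hypothetical, not taken from any real SPSS output):

```python
# Hypothetical values, as they might be read off an SPSS output.
likelihood_ratio = 24.6  # Omnibus table, chi-square column
neg2ll_model = 110.2     # Model Summary table, -2 log-likelihood column

# -2LL of the model with no predictors (the SSE-total equivalent).
neg2ll_null = likelihood_ratio + neg2ll_model
print(round(neg2ll_null, 1))
```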
What is the -2LL?
-2 Log Likelihood. It is -2 multiplied by the log-likelihood.
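A rough sketch of how -2LL could be computed by hand, assuming the usual log-likelihood for a binary outcome (natural log of the predicted probability of whatever actually happened, summed over cases). The data and predicted probabilities are invented for illustration:

```python
import math

def neg2_log_likelihood(outcomes, probabilities):
    """Return -2 * log-likelihood for binary outcomes (0/1), given the
    model's predicted probability of a 1 for each case."""
    log_likelihood = 0.0
    for y, p in zip(outcomes, probabilities):
        # A case contributes log(p) if it is a 1 and log(1 - p) if it is a 0.
        log_likelihood += math.log(p) if y == 1 else math.log(1.0 - p)
    return -2.0 * log_likelihood

# Hypothetical data: five cases, three of which are 1s.
outcomes = [1, 0, 1, 1, 0]
model_probs = [0.9, 0.2, 0.7, 0.6, 0.3]  # made-up predictions from a model
null_probs = [0.6] * 5                   # no predictors: base rate 3/5 for everyone

neg2ll_model = neg2_log_likelihood(outcomes, model_probs)
neg2ll_null = neg2_log_likelihood(outcomes, null_probs)
print(neg2ll_model, neg2ll_null)
```

In this toy example the model's -2LL is smaller than the no-predictor -2LL, meaning the predictions fit the observed 0s and 1s better.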
Why do we use -2LL and not only the Log Likelihood?
Because differences in -2LL follow a known distribution, the chi-square distribution (approximately, in large samples), making it possible to compare two values of -2LL and calculate a p-value for the difference between them.
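A sketch of that comparison, under the assumption that the two models differ by exactly one predictor, so the difference is tested against a chi-square distribution with 1 degree of freedom. The -2LL values are hypothetical, and the 1-df p-value uses the identity P(X > x) = erfc(sqrt(x / 2)):

```python
import math

def chi_square_p_value_df1(chi_square: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.
    With 1 df the statistic is a squared standard normal variable, so the
    tail probability equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi_square / 2.0))

# Hypothetical -2LL values for the two models being compared.
neg2ll_null = 134.8   # model with no predictors
neg2ll_model = 110.2  # model with one predictor

difference = neg2ll_null - neg2ll_model
p_value = chi_square_p_value_df1(difference)
print(f"difference in -2LL = {difference:.1f}, p = {p_value:.2g}")
```

With more than one added predictor the degrees of freedom change, and a general chi-square routine would be needed instead of this 1-df shortcut.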
How do you calculate Hosmer & Lemeshow’s r^2?
(Likelihood ratio) / (-2LL of the model with no predictors )
What does the Likelihood ratio divided by -2LL of the model with no predictors calculate?
Hosmer & Lemeshow’s r^2
What does Hosmer & Lemeshow’s r^2 mean?
The proportion of the no-predictor model’s -2LL that was removed by including predictors in the model.
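A short sketch of the calculation; the function name and the input values are made up for illustration:

```python
def hosmer_lemeshow_r2(likelihood_ratio: float, neg2ll_null: float) -> float:
    """Proportion of the no-predictor model's -2LL removed by the predictors."""
    return likelihood_ratio / neg2ll_null

# Hypothetical values, not taken from real output.
likelihood_ratio = 24.6  # reduction in -2LL (Omnibus table, chi-square)
neg2ll_null = 134.8      # -2LL of the model with no predictors

r2 = hosmer_lemeshow_r2(likelihood_ratio, neg2ll_null)
print(round(r2, 3))
```

Here the predictors would account for roughly 18% of the -2LL of the no-predictor model.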
In the “Variables in the Equations” table, what does the “B” column denote?
The b0 (bottom row) and b1 (top row) of the model.
What is the “Wald” number/value and how is calculated in logistic regression?
Wald is a measure that can be thought of as the equivalent of the t-value in linear regression. It compares a predictor’s b1 to the predictor’s SE, testing whether b1 differs from 0 (the “b1” of the no-predictor model). This generates a p-value. Wald is calculated by:
Wald = (b1 for the predictor / SE for the predictor)^2
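The formula as a tiny sketch; the b1 and SE values are hypothetical:

```python
def wald_statistic(b1: float, se: float) -> float:
    """Wald = (b1 / SE)^2 for a single predictor."""
    return (b1 / se) ** 2

# Hypothetical values from a "Variables in the Equation" table.
b1, se = 0.5, 0.2
wald = wald_statistic(b1, se)
print(round(wald, 2))
```

A b1 that is large relative to its SE gives a large Wald value, and therefore a small p-value.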
Formula for odds?
Odds = P / (1 - P), where P = probability
Formula for probability from odds?
Probability = Odds / (1 + Odds)
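Both conversions as a small sketch (function names are illustrative), showing that they undo each other:

```python
def probability_to_odds(p: float) -> float:
    """Odds = P / (1 - P)."""
    return p / (1.0 - p)

def odds_to_probability(odds: float) -> float:
    """Probability = Odds / (1 + Odds)."""
    return odds / (1.0 + odds)

odds = probability_to_odds(0.8)         # a probability of 0.8 is odds of about 4 to 1
back = odds_to_probability(odds)        # converting back recovers about 0.8
print(round(odds, 3), round(back, 3))
```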
What is Exp(B)?
Exp(B) tells us by what factor the odds of Y are multiplied (going up or down) for every 1-unit increase in X.
How do you get from the probability of Y when X = 9.0 to the probability of Y when X = 10.0, using calculations involving odds?
- Convert the probability of Y when X = 9.0 into Odds.
- Apply Exp(B)^(10-9) to the Odds:
Exp(B)^(10-9) * Odds = New Odds
- Convert New Odds back to probability.
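The steps above can be sketched as follows. The starting probability and B are hypothetical, and the sketch raises Exp(B) to the power of the change in X, which for a one-unit step is simply Exp(B):

```python
import math

def step_probability(p_start: float, b: float, x_start: float, x_end: float) -> float:
    """Move from the probability of Y at x_start to the probability at x_end."""
    odds = p_start / (1.0 - p_start)                    # 1. probability -> odds
    new_odds = odds * math.exp(b) ** (x_end - x_start)  # 2. apply Exp(B) per unit of X
    return new_odds / (1.0 + new_odds)                  # 3. odds -> probability

# Hypothetical example: probability 0.5 at X = 9.0 and B = ln(2), so Exp(B) = 2.
p_at_10 = step_probability(0.5, math.log(2.0), 9.0, 10.0)
print(round(p_at_10, 3))
```

In this example the odds double from 1 to 2, so the probability rises from 0.5 to about 0.667.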
When would a “Chi-square test” be used in traditional psychology?
When you have BOTH a binary criterion variable AND a binary predictor.
When can Chi-square tests NOT be used?
Chi-square can’t handle a continuous predictor variable or multiple predictors. In those situations, students would be taught to use logistic regression instead.
What is the general linear model called after it has been transformed?
the generalized linear model
Name two advantages of using -2LL instead of SSE as a measure of goodness of fit.
- -2LL can be compared directly to another -2LL. When using SSE, we first have to calculate F.
- Using -2LL allows us to compare models with the same number of predictors, which is much harder to do using SSE or F-values.
What is the validity of a measurement tool?
The extent to which the tool measures what we really want to measure, rather than something else.
What are nuisance variables?
Things that we do not want to measure but that also affect how people respond to our measurement tools.
Describe Nominal Data
Options have:
(No order)
(No specific interval)
(No true zero)
e.g. colours, gender, ethnicity
Describe Ordinal Data
Options have:
- Order
(No specific interval)
(No true zero)
e.g. 1st, 2nd, 3rd (race positions)
Describe interval Data
Options have:
- Order
- Specific interval
(No true zero)
e.g. Likert scale: bad, neutral, good
Describe Ratio Data
Options have:
- Order
- Specific interval
- True zero
e.g. 0-100%, reaction time.
How does the public often interpret interval scales like a “5-star system”?
The public often interprets these points as ordinal rather than interval data: 5 means good, not perfect, and 4 means something is wrong, not almost perfect. Shared conventions about what each point on the scale ‘means’ can develop.
What type of test needs to be used to interpret ordinal data?
Non-parametric tests.