W2: Logistic Regression Flashcards
Sometimes the response variable Y is not continuous but binary; for example, we may be interested in whether individuals: (3)
Logistic regression
- pass or fail
- suffer from bipolar disorder or not
- go to university or not
Since the response Y is binary rather than continuous, we cannot use
multiple linear regression; instead, we use logistic regression
Logistic regression equation formula:
ln(p/(1 - p)) = β0 + β1x1 + β2x2 + ... + βkxk
What does p stand for in logistic regression equation formula?
The probability, p, of the event of interest (i.e., of the response Y taking the value 1)
We can find the estimated probability, p, for an individual by rearranging the logistic regression equation formula:
p = exp(β0 + β1x1 + ... + βkxk) / (1 + exp(β0 + β1x1 + ... + βkxk))
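The link between log-odds and probability can be sketched in Python. This is a minimal illustration, not SPSS or R code; the function names `log_odds` and `probability` are hypothetical, and `eta` stands for the right-hand side β0 + β1x1 + ... + βkxk.

```python
import math

def log_odds(p: float) -> float:
    """Logit: ln(p / (1 - p)), the left-hand side of the logistic regression equation."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def probability(eta: float) -> float:
    """Inverse logit: p = exp(eta) / (1 + exp(eta)), with eta = b0 + b1*x1 + ... + bk*xk."""
    # Two algebraically equal forms are used so math.exp never overflows
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Round trip: converting p to log-odds and back recovers p
print(round(probability(log_odds(0.8)), 6))  # 0.8
```

The rearranged formula always returns a value between 0 and 1, whatever the covariate values, which is why it is used instead of a straight line for a binary response.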
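How do we state the null and alternative hypotheses in logistic regression?
H0: βi = 0 (covariate xi is not useful in the model)
H1: βi ≠ 0 (covariate xi is useful in the model)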
Rather than the t-test used in multiple linear regression, we use the __ test in logistic regression
Wald test
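A minimal Python sketch of the Wald test for a single coefficient, assuming you already have the estimate B and its standard error SE from the software output. The function `wald_test` and the numbers in the example are hypothetical. SPSS reports the squared statistic (B/SE)^2 as "Wald" on 1 df, whereas R's `summary()` of a binomial `glm` fit reports z = B/SE; both give the same two-sided p-value.

```python
import math

def wald_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Wald test of H0: beta = 0 against H1: beta != 0 for one coefficient.

    Returns (Wald chi-square statistic on 1 df, two-sided p-value).
    """
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / std_error
    wald = z ** 2                                   # squared form, chi-square with 1 df
    p_value = math.erfc(abs(z) / math.sqrt(2.0))    # P(|Z| >= |z|) for a standard normal Z
    return wald, p_value

# Hypothetical coefficient: B = 0.5 with SE = 0.25
w, p = wald_test(0.5, 0.25)
print(round(w, 3), round(p, 4))  # 4.0 0.0455
```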
In logistic regression, we assess model fit by considering the following three:
- R^2 value
- Sensitivity and specificity of the model
- Significance of the covariates
Sensitivity and Specificity Table (Labelled)
In the sensitivity and specificity table, we use the regression equation to make predictions, then compare them with the observed values and display this information in a
contingency table
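Building the contingency table from predictions can be sketched in Python. This assumes an individual is predicted to have the event when the estimated p is at least a cut-off (0.5 is a common default); the function name and data are hypothetical, and the cell letters c, d, e, f are inferred from the formula cards (e.g., sensitivity f/(e+f)) rather than copied from the lecture's table.

```python
def classification_table(observed, predicted_probs, cutoff=0.5):
    """Cross-tabulate observed 0/1 outcomes against predicted outcomes.

    Cell letters follow the formula cards:
      c = observed 0, predicted 0    d = observed 0, predicted 1
      e = observed 1, predicted 0    f = observed 1, predicted 1
    """
    if len(observed) != len(predicted_probs):
        raise ValueError("observed and predicted_probs must have the same length")
    cells = {"c": 0, "d": 0, "e": 0, "f": 0}
    for y, p in zip(observed, predicted_probs):
        predicted = 1 if p >= cutoff else 0   # predict the event when p reaches the cut-off
        if y == 0:
            cells["c" if predicted == 0 else "d"] += 1
        elif y == 1:
            cells["e" if predicted == 0 else "f"] += 1
        else:
            raise ValueError("observed values must be 0 or 1")
    return cells

# Hypothetical observed outcomes and estimated probabilities
observed = [0, 0, 0, 1, 1, 1, 0, 1]
probs = [0.2, 0.6, 0.1, 0.7, 0.4, 0.9, 0.3, 0.55]
print(classification_table(observed, probs))  # {'c': 3, 'd': 1, 'e': 1, 'f': 3}
```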
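What is the sensitivity of the model?
Logistic regression
Probability of correctly predicting an event (i.e., of predicting the event for those who actually have it)
What is the specificity of the model?
Logistic regression
Probability of correctly predicting no event (i.e., of predicting no event for those who actually do not have it)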
Nagelkerke R^2 can be interpreted
Logistic regression
in the same way as R^2 in simple/multiple linear regression
Why should Nagelkerke R^2 be treated with caution? - (2)
Logistic regression
There is no single way to calculate R^2 in logistic regression
As a result, Nagelkerke R^2 and similar measures are often referred to as pseudo-R^2 values
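One common way to compute Nagelkerke R^2 is to rescale the Cox & Snell R^2 so that its maximum possible value is 1. The Python sketch below does this from observed 0/1 outcomes and a model's fitted probabilities; the function names and example probabilities are hypothetical, and in practice the fitted probabilities would come from SPSS or R.

```python
import math

def log_likelihood(observed, probs):
    """Bernoulli log-likelihood: sum of ln(p) where y = 1 and ln(1 - p) where y = 0."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y, p in zip(observed, probs))

def nagelkerke_r2(observed, fitted_probs):
    """Cox & Snell R^2 divided by its maximum possible value (Nagelkerke R^2)."""
    n = len(observed)
    p_bar = sum(observed) / n                      # null (intercept-only) model predicts the overall proportion
    ll_null = log_likelihood(observed, [p_bar] * n)
    ll_model = log_likelihood(observed, fitted_probs)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

# Hypothetical observed outcomes and fitted probabilities
observed = [0, 0, 0, 1, 1, 1, 0, 1]
fitted = [0.2, 0.6, 0.1, 0.7, 0.4, 0.9, 0.3, 0.55]
print(round(nagelkerke_r2(observed, fitted), 3))
```

Because other pseudo-R^2 measures (e.g., Cox & Snell on its own) use different formulas, they give different values for the same model, which is one reason to treat any single pseudo-R^2 with caution.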
Step 1: Exploratory Analysis Example showing both covariates are useful - (3)
Logistic regression
From the boxplot, those who succeed (0) tend to be younger on average than those who fail (1), with less variability in their ages
From the table, we see that more women (0) succeed at the task than fail, whereas more men (1) fail than succeed
Both covariates are useful
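The numerical side of this exploratory step can be sketched in Python: a summary of a continuous covariate within each outcome group (the comparison a boxplot makes visually) and a cross-tabulation of a binary covariate against the outcome. The data below are hypothetical, coded as on the card (outcome 0 = succeed, 1 = fail; sex 0 = woman, 1 = man), and mean/standard deviation are used here as simple stand-ins for the median and quartiles a boxplot shows.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical data: outcome 0 = succeed, 1 = fail; sex 0 = woman, 1 = man
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
age = [22, 25, 24, 27, 35, 48, 30, 52]
sex = [0, 0, 1, 0, 1, 1, 0, 1]

def summarise_by_group(values, groups):
    """Mean and standard deviation of a continuous covariate within each outcome group."""
    summary = {}
    for g in sorted(set(groups)):
        group_values = [v for v, grp in zip(values, groups) if grp == g]
        summary[g] = (mean(group_values), stdev(group_values))
    return summary

def cross_tab(row_var, col_var):
    """Counts of each (row, column) combination, e.g. (sex, outcome)."""
    return Counter(zip(row_var, col_var))

print(summarise_by_group(age, outcome))
print(cross_tab(sex, outcome))
```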
Step 1: Exploratory Analysis Example - another example of boxplot
Logistic regression
The boxplot shows that individuals who did not take a condom (0) had higher levels of embarrassment on average than those who accepted a free condom (1), with less variability in their embarrassment
Step 2 - Nagelkerke R^2 (in SPSS) - (2)
Logistic regression
We see that Nagelkerke R^2 = 0.336, so 33.6% of the variation in Y (e.g., success/failure) is explained by the model
This seems low compared with multiple linear regression, but this is often the case in logistic regression.
In the contingency table, we want specificity and sensitivity close to __% for a good model
Logistic regression
100
Step 2 - What is a good Nagelkerke R^2 value?
Logistic regression
A value somewhere in the 30s (as a percentage, e.g., 0.336) tends to indicate a reasonable model (i.e., one that gives reasonable predictions)
Step 2 - Nagelkerke R^2 in R
Logistic regression
Step 1 - Exploratory Analysis - Looking at the table in R
Logistic regression
Step 3 - Classification Table of Sensitivity and Specificity (in SPSS)
Logistic regression
From the table, the model has an estimated sensitivity of 69.4% and an estimated specificity of 73.6%, which seems reasonable.
Step 3 - Classification Table of Sensitivity and Specificity in R: More Detailed Interpretation
((1) = presence, (0) = absence of empathy) - (3)
Logistic regression
Estimated sensitivity is 58.3%
Estimated specificity is 94.7%
The model is particularly good at classifying those with an absence of empathy (specificity)
Although it classified those with empathy correctly just over half of the time (sensitivity)
Step 4 - Writing the logistic regression equation (in SPSS) + significance of predictors - (3)
Logistic regression
The regression equation is:
ln(p/(1 - p)) = -15.056 + 1.957(Sex) + 0.196(Age)
Both age and sex are significant at the 0.1% level (p < 0.001), so they are useful and should be kept in the model
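The fitted equation can be turned into estimated probabilities for individuals, as in the rearranged formula for p. The Python sketch below plugs the card's coefficients into that formula. It assumes, as a labelled guess, that Sex is coded as in Step 1 (0 = woman, 1 = man), that Age is in years, and that p is the probability of the outcome coded 1; the function name and the example age of 70 are hypothetical.

```python
import math

# Coefficients from the fitted equation on the Step 4 card
INTERCEPT = -15.056
B_SEX = 1.957
B_AGE = 0.196

def predicted_probability(sex: int, age: float) -> float:
    """Estimated p for one individual: p = exp(eta) / (1 + exp(eta))."""
    eta = INTERCEPT + B_SEX * sex + B_AGE * age   # log-odds, ln(p / (1 - p))
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical individuals aged 70, one of each sex (assumed coding 0 = woman, 1 = man)
for sex in (0, 1):
    print(sex, round(predicted_probability(sex, 70), 3))
```

Since both coefficients are positive, the estimated p increases with age and is higher for the group coded 1 at any given age.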
Step 4 - Writing the logistic regression equation (in R)
Logistic regression
What is the positive predictive value (PPV)? - (2)
Logistic regression
The probability that an individual who is predicted to have the event actually has it
f/(d+f)
What is the negative predictive value (NPV)? - (2)
Logistic regression
The probability that an individual who is predicted not to have the event actually does not have it
c/(c+e)
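The PPV and NPV formulas can be checked with a short Python sketch. The cell letters follow the formula cards (c and e are the cells predicted "no event", d and f the cells predicted "event"); the function names and counts are hypothetical.

```python
def positive_predictive_value(d: int, f: int) -> float:
    """PPV = f / (d + f): of those predicted to have the event, the proportion who actually do."""
    return f / (d + f)

def negative_predictive_value(c: int, e: int) -> float:
    """NPV = c / (c + e): of those predicted not to have the event, the proportion who actually do not."""
    return c / (c + e)

# Hypothetical counts from a classification table
c, d, e, f = 40, 10, 5, 45
print(round(positive_predictive_value(d, f), 3), round(negative_predictive_value(c, e), 3))  # 0.818 0.889
```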
Assumptions in logistic regression - (3)
- The response variable is binary
- Observations are independent
- There are one or more covariates
Sensitivity formula
Logistic regression
f/(e+f)
Specificity formula
Logistic regression
c/(c+d)
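The sensitivity and specificity formulas can be sketched the same way in Python. The cell letters follow the formula cards (e and f are individuals who actually have the event, c and d those who do not); the function names and counts are hypothetical.

```python
def sensitivity(e: int, f: int) -> float:
    """f / (e + f): proportion of those who have the event that the model predicts correctly."""
    return f / (e + f)

def specificity(c: int, d: int) -> float:
    """c / (c + d): proportion of those without the event that the model predicts correctly."""
    return c / (c + d)

# Hypothetical counts from a classification table
c, d, e, f = 40, 10, 5, 45
print(sensitivity(e, f), specificity(c, d))  # 0.9 0.8
```

A good model has both values close to 1 (i.e., close to 100%).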
There are no formal checks for the __ of logistic regression
Assumptions
If a model has high specificity, what should it be used for?
Logistic regression
Example: classifying the absence or presence of empathy for cats in distress in individuals
Because the model has such a high specificity compared with its
sensitivity, it would be best used to rule out empathy for cats in distress in particular individuals