L5 - Logistic Regression Flashcards
What is the aim of Logistic Regression?
To find the combination of variables that best classifies cases into the two categories defined by the dependent measure,
e.g. coded (0,1) or (1,2).
i.e. binary logistic regression is figuring out the best predictor of whether you belong to a particular group
When do you use Logistic Regression?
When the dependent measure is binary
- (two possible outcomes, i.e. yes or no)
- e.g. Are you a smoker? Are you depressed?
Does Logistic Regression give an outcome based on a person's estimated or actual score, or a probability?
Probability
What is the probability outcome of logistic regression telling us?
The probability that the person is in the higher-value group
(i.e., 1 if the pair were coded 0,1 or 2 if it were 1,2).
What is this formula for?
Logistic Regression
Determining the probability that a person belongs to the higher-value group
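The formula image itself did not survive on this card. Assuming it is the standard logistic function, applied to the Z equation used elsewhere in this deck, it would read:

```latex
P(E) = \frac{1}{1 + e^{-Z}}, \qquad Z = B_0 + X_1 B_1 + X_2 B_2 + \dots + X_n B_n
```

Here P(E) is the probability of membership in the higher-value group, so Z = 0 corresponds to P(E) = 50%.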
Can Logistic Regression have multiple IVs?
Yes
Only the DV is binary
The first thing to do when conducting a logistic regression is identify the significant predictors.
What test/statistic do we use to identify which IVs are significant predictors of the DV?
The Wald Statistic
What is the Wald Statistic?
Statistical test that tells us whether the predictor (IV) is a significant predictor in our logistic regression
Like a t-test
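A minimal sketch of how the Wald statistic is usually computed, assuming the common form Wald = (B / SE)², which is compared to a chi-square distribution with 1 degree of freedom. The function name and the coefficient and standard error below are hypothetical, for illustration only.

```python
import math

def wald_test(b, se):
    """Return (wald, p_value) for a single coefficient.

    wald = (b / se) ** 2 is referred to a chi-square with 1 df,
    which is equivalent to a two-sided z-test on b / se.
    """
    z = b / se
    wald = z ** 2
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return wald, p_value

# Hypothetical coefficient (B) and standard error, not taken from the deck
wald, p = wald_test(b=1.23, se=0.40)
print(f"Wald = {wald:.2f}, p = {p:.4f}, significant = {p < 0.05}")
```

The predictor counts as significant when the p-value falls below .05, which mirrors how a t-test is read in linear regression.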
What does “Exp (B)” stand for in logistic regression?
The Odds-ratio.
What does the Odds-ratio (Exp (B)) tell us in logistic regression?
Indicates the multiplicative change in the odds of E (i.e., membership in the higher-value group) resulting from a one-unit increase in that predictor variable (IV), holding the other predictors constant.
For example, 3.42 for gender means that a one-unit change in gender, i.e., going from male to female (assuming male = 1, female = 2 for this variable), multiplies the odds of being in the higher of the two groups defined by the dependent measure by 3.42.
If males = 1 and females = 2 and Exp (B) is 3.42; this means that females (higher IV) are 3.42 times more likely to engage in the behaviour than males.
True or False?
Not quite.
The odds of females engaging in the behaviour are 3.42 times the odds for males.
Exp (B) multiplies the odds, not the probability, so "3.42 times more likely" is only approximately right when the behaviour is rare.
What is a Z equation in logistic regression?
The linear combination of the variables.
i.e., $Z = B_0 + X_1 B_1 + X_2 B_2 + \dots + X_n B_n$
What does an odds-ratio (OR) of “1” indicate?
Chance
(no predictive value, variable makes no difference)
In the Z equation below, what do $B_0$, the other B's and the X's refer to?
$Z = B_0 + X_1 B_1 + X_2 B_2 + \dots + X_n B_n$
$B_0$ = the constant
All other B values are the regression coefficients
X’s are the actual values each person scored on each variable in the equation.
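A small sketch of the Z equation in code, followed by the conversion of Z into P(E). It assumes the standard logistic function P(E) = 1 / (1 + e^(-Z)); the constant, coefficients, and scores are hypothetical values chosen only to show the mechanics.

```python
import math

def z_score(b0, coefs, xs):
    """Linear combination: Z = B0 + X1*B1 + X2*B2 + ... + Xn*Bn."""
    return b0 + sum(b * x for b, x in zip(coefs, xs))

def p_event(z):
    """Logistic function: probability of being in the higher-value group."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical constant, coefficients and one person's scores
b0 = -2.0
coefs = [0.02, 1.23]   # B1 (age), B2 (gender)
person = [30, 1]       # X1 = age in years, X2 = gender code

z = z_score(b0, coefs, person)
print(f"Z = {z:.2f}, P(E) = {p_event(z):.2f}")
```

Each B is multiplied by that person's own X score, and only the final Z is passed through the logistic function.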
Z equation: $Z = B_0 + X_1 B_1 + X_2 B_2 + \dots + X_n B_n$
Imagine gender were $X_2$.
Holding everything else constant, we could calculate a Z value with $X_2$ (gender) = 1.
This value is entered into the P(E) calculation.
Suppose this gives a 24% probability of membership in the ‘higher’ group defined by the DV.
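If the odds-ratio is 3.42 for gender, what would the value of P(E) become if we increased the $X_2$ value from 1 to 2?
The odds (not P(E) itself) become 3.42 times larger, so P(E) rises from 24% to about 52%.
Odds at $X_2$ = 1: 0.24 / 0.76 ≈ 0.316; new odds: 0.316 x 3.42 ≈ 1.08; P(E) = 1.08 / 2.08 ≈ 52%
- If $X_2$ were 3, the odds would be multiplied by 3.42 again (3.42 x 3.42 ≈ 11.7), giving P(E) ≈ 79%
- (remember, an odds-ratio of 1 means no change)
- important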
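A sketch of the arithmetic for this card, assuming the usual reading that an odds-ratio multiplies the odds P / (1 - P) and not the probability itself. The function name is hypothetical; the 24% and 3.42 come from the card.

```python
def shift_probability(p, odds_ratio, units=1):
    """Probability after increasing a predictor by `units`,
    given the starting probability and the predictor's odds-ratio."""
    odds = p / (1 - p)                     # convert probability to odds
    new_odds = odds * odds_ratio ** units  # odds-ratio applies once per unit
    return new_odds / (1 + new_odds)       # convert odds back to probability

print(round(shift_probability(0.24, 3.42), 3))           # 0.519
print(round(shift_probability(0.24, 3.42, units=2), 3))  # 0.787
```

Working on the odds scale keeps the result between 0 and 1, whereas multiplying a probability directly by the odds-ratio can push it past 100%.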
In this logistic regression example, which of the 3 predictor variables are significant predictors of being part of the higher DV?
What difference does each one-unit increase in an IV make to the odds of being in the higher DV group?
Age and Gender are significant as p < .05.
SES has a p > .05 and so is not significant.
Effect of a one-unit increase in each IV on the DV:
Age = 21%
Gender = 342%
SES = 86%
What does the outcome variable (DV) tell us in logistic regression?
The best predictors of whether someone belongs to a particular group.
Outcome measure is a grouping (binary) variable.
Trying to predict which group one should be in.
When predicting the DV in logistic regression, typically the more of the predictive IV categories you belong to, the ___ the likelihood of belonging to the group.
greater
e.g. with problem gamblers; if they belong to the IV groups of 1. get to casino early; 2. use a credit card and not cash; 3. drink while playing - they are likely to be a problem gambler
What is the probability of being part of a group in a “chance” model of logistic regression?
50%
What is considered a good logistic regression model predictor rate?
70-80% classified correctly
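A minimal sketch of how a classification rate can be worked out, assuming each case is assigned to the higher-value group when its predicted P(E) is at least 0.5. The cutoff, the function name, and the five cases are hypothetical.

```python
def classification_rate(probs, actual, cutoff=0.5):
    """Share of cases whose predicted group matches their actual group."""
    predicted = [1 if p >= cutoff else 0 for p in probs]
    correct = sum(1 for pred, act in zip(predicted, actual) if pred == act)
    return correct / len(actual)

# Hypothetical predicted probabilities and actual group codes (0 or 1)
probs = [0.9, 0.2, 0.6, 0.4, 0.7]
actual = [1, 0, 0, 0, 1]

rate = classification_rate(probs, actual)
print(f"{rate:.0%} classified correctly")  # 80% classified correctly
```

A rate near 50% is what a chance model would achieve with two equally sized groups, which is why 70-80% is read as a good result.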
What statistic do you use to determine whether a predictor variable is significant?
Wald statistic
p < .05
If you get a logistic regression model that finds independent variables that are not significant, what should your next step typically be?
Re-run the model without the non-significant predictor variables
After looking at the Wald statistic, where do you look in the logistic regression model?
Odds-Ratio
Why would the Odds-Ratio of age be much smaller than gender?
(why can’t you necessarily compare all odds-ratios?)
Because 1 unit of age is 1 year (not much difference), whereas 1 unit of gender is a completely different sex (large difference)
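One way to make odds-ratios more comparable is to re-express a per-unit odds-ratio over a larger step, since the odds-ratio applies once per unit. The sketch below assumes that multiplicative rule; the per-year value for age is hypothetical.

```python
def odds_ratio_over(odds_ratio_per_unit, units):
    """Odds-ratio for a change of `units`, given the per-unit odds-ratio."""
    return odds_ratio_per_unit ** units

# Hypothetical odds-ratio for age per single year
per_year = 1.05
per_decade = odds_ratio_over(per_year, 10)
print(f"Per year: {per_year:.2f}, per 10 years: {per_decade:.2f}")
```

A small per-year odds-ratio can therefore correspond to a sizeable effect across a realistic age gap, so its size alone does not show that age matters less than gender.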
What does a 1.0 Odds-Ratio indicate?
That the predictor variable makes no difference whether or not you are likely to be part of the group of interest
Why are Logistic Regression and an understanding of Odds-ratios useful?
You can profile people
- E.g. when identifying if they might have cancer
- if they are over 50
- Smoker
- Parents had cancer
- The more of these that are ‘yes’ the more likely they are to have cancer