Logistic Regression, Probability, and Odds Flashcards
What is regression?
Mathematical equation for straight line that is fitted to data as way of describing relationship between two or more variables.
What are the main purposes of regression?
Prediction (to estimate risk).
Control for confounding.
What does the regression line do?
Estimates average values for variable on vertical scale (Y) according to values on horizontal scale (X).
What happens in simple regression?
For each change in X, there is a change in Y.
What is general linear regression?
Refers to conventional linear regression models for a continuous response variable given continuous and/or categorical predictors.
What happens with transformation in a GLM?
Without transformation, the coefficient will estimate rate differences.
With transformation, the coefficient will estimate rate ratios.
What is the model for simple regression?
Yx=(D | X=x)= α+βx
Y = α + βX + ε (prediction)
Yi = BO + B1Xi + Ei (population)
What is a?
Measure when x = 0
Also B0
The y-intercept
What does slope coefficient do?
Estimate increase in risk per unit increase in X.
B1
How are mean and probability related?
Mean is estimate of probability.
Why will linear regression not work?
Probability is scale of 0-1 and regression is not within those constraints.
Linear model assumes risk changed linearly.
Why will logistic regression work?
Used for binary outcomes and on 0-1 scale showing odds.
This is because the probability curve becomes linear after taking natural log of a value.
What is the model form of logistic regression?
Logit (Y) = ln (Y/(1-Y))
OR
Logit [P (Y)] = ln ((P(Y))/(1-P(Y)))
OR
ln (Px/1-Px) = ln (D|X=x) = a +Bx
What is alpha?
Log odds of outcome when X = 0
The y-intercept.
What is beta?
Slope
Log odds RATIO comparing two exposure groups differing by one unit on scale of x.
In simple logistic regression what do 1 and 0 represent?
1 = exposed, 0 = not exposed.
Simple logistic regression, what would 1 (exposure) yield?
Alpha + beta = log odds of exposure
e ^(a+b) is the odds of disease with exposure.
Simple logistic regression, what would 0 (no exposure) yield?
Alpha ONLY. Log odds of outcomes when not exposed.
e^(a) is the odds of disease without exposure.
How would you get ODDS from log odds/?
Exponentiate both sides:
e ^ (a+b) when exposed
e ^ (a) when unexposed
e is a constant
e = 2.718
How would we find odds ratio?
Odds exposed v unexposed:
Odds(a+b)/odds(a)
Therefore:
e^(a+b-a)
Therefore:
e^b
Odds ratio is e^b
What happens when you have more than one categorical predictor?
Create dummy variables with a reference group.
How do you choose reference group?
Generally the one with the largest sample size so estiamtes with be reasonably precise and stable.
What is model for this?
logit(p) = a +b1 + b2
What is alpha in this model?
Log odds of disease for reference group.
What is b1?
Log odds of disease for group 1 versus reference group.
What is b2?
Log odds of disease for group 2 versus reference group.
What is model for continuous predictor?
Logit (p ) = alpha + betax
What is alpha?
Log odds when X = 0
What is beta?
Log OR with 1 unit change in X.
How would you calculate odds when X = to a different number?
Multiply that number by BETA and add to ALPHA.
What is the Wald hypothesis?
Test of exposed versus unexposed.
Null is that OR = 1 (that e^beta = 1); no difference between the two.
H0: beta = 0
Ha: beta does not equal 0
What are advantages to logistic regression?
Obtain estimates and CI for OR (crude and adjusted)
Can handle multiple independent variables.
What are disadvantages to logistic regression?
Implicit assumptions difficult to check.
Model selection difficult.