3.1: Generalised linear model Flashcards
How would you predict a categorical variable according to a generalised linear model? (GLM)
Transform the dependent variable to be continuous so that we can perform regression.
What are binary variables? (dichotomous/ binomials)
Variables with exactly two categories
Suppose we want to know how well beta-amyloid presence in blood plasma predicts whether someone will (or will not) get Alzheimer’s dementia (AD) (at age 75).
What are X and Y in our equation?
- X = amount of β-amyloid at 65
- Y = 0 if the person does not have AD
- Y = 1 if the person does have AD
In this binomial problem regarding Alzheimer’s, can we use Y = β0 + β1X? (3)
No. Linear regression assumes that the prediction errors have a variance that is independent of the predictor, and this assumption is not met with binomial data. This is apparent when the regression is graphed: in the example given, there is likely much more error for X (amount of β-amyloid) = 0 than for X = 1.
Additionally, the predicted outcomes do not make much sense. In the example, X = 0 gives a Y of about 0.5, indicating a 50% chance of AD. However, according to the model, X = 2 gives a chance of over 100% and X = -2 gives a chance of less than zero. This makes interpretation problematic.
Finally, the points on the graph deviate quite a bit from the regression line and do not seem to follow a linear structure. This seems to make P-values invalid.
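The second problem can be reproduced with a small sketch. The data below are hand-made for illustration only (they are not the β-amyloid data from the example), and `linear_prediction` is a hypothetical helper name. A straight line is fitted to a 0/1 outcome by ordinary least squares, and the fitted line is then evaluated at a few predictor values.

```python
import numpy as np

# Hypothetical, hand-made data: x is a centred predictor, y is a 0/1 outcome.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1], dtype=float)

# Ordinary least squares fit of Y = b0 + b1 * X.
design = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(design, y, rcond=None)[0]

def linear_prediction(x_new):
    """Value of the fitted straight line at x_new (not bounded to [0, 1])."""
    return b0 + b1 * x_new

for value in (-2.0, 0.0, 2.0):
    print(value, round(float(linear_prediction(value)), 3))
```

With these made-up numbers the line gives a "probability" below 0 at X = -2 and above 1 at X = 2, which is the interpretation problem described above.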
What type of function fits binomial data better and how does this look when graphed?
Logistic function: S-shaped curve, also known as sigmoid curve
What is the logistic function? Give it and explain it
P(Y=1 | X) ≈ e^(β0 + β1X) / (1 + e^(β0 + β1X))
i.e. the chance that Y = 1 given a score on X
- e ≈ 2.71828
- In the equivalent form 1 / (1 + e^-(β0 + β1X)), the numerator (1) represents the curve’s maximum (since we’re working with binomial data)
- β0 + β1X is the linear part of the function where x is used to predict y
- β0 shifts the position of the steepest point of the curve along the X axis. The larger it is, the more the curve shifts to the left, and vice versa (for a positive β1)
- β1 is the slope of the S curve: smaller values result in a more gradual curve and higher values result in a steeper curve
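The roles of β0 and β1 can be checked numerically with a minimal sketch. The function names `logistic` and `midpoint` are made up for this illustration, and the parameter values in the usage lines are arbitrary.

```python
import math

def logistic(x, b0, b1):
    """P(Y = 1 | X = x) for intercept b0 and slope b1."""
    z = b0 + b1 * x
    # Both branches are the same function; the split only avoids overflow
    # in math.exp for very large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

def midpoint(b0, b1):
    """X value where the probability is 0.5, the steepest point of the curve."""
    return -b0 / b1

# A larger b0 moves the steepest point to the left (for positive b1).
print(midpoint(0.0, 1.0), midpoint(1.0, 1.0))

# A larger b1 makes the curve rise faster around its midpoint.
print(logistic(0.1, 0.0, 1.0) - logistic(-0.1, 0.0, 1.0))
print(logistic(0.1, 0.0, 2.0) - logistic(-0.1, 0.0, 2.0))
```

Whatever values are chosen, the output of `logistic` stays between 0 and 1, unlike the straight-line model.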
What assumptions are there for general linear models?
- (X’X)^-1 exists
- Observations are independent
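Unlike independence, the first assumption can be checked on the data. X’X has an inverse exactly when the columns of the design matrix X are linearly independent, so one possible check is to compare the rank of X with its number of columns. This is a sketch only: `xtx_is_invertible` is a hypothetical helper, and the two matrices are made-up examples.

```python
import numpy as np

def xtx_is_invertible(design):
    """True if the design matrix has full column rank, so (X'X)^-1 exists."""
    design = np.asarray(design, dtype=float)
    return bool(np.linalg.matrix_rank(design) == design.shape[1])

# Intercept column plus one predictor: columns are linearly independent.
x_ok = [[1, 0.2], [1, 0.5], [1, 0.9]]

# Third column is exactly twice the second: perfectly collinear predictors.
x_bad = [[1, 2, 4], [1, 3, 6], [1, 5, 10]]

print(xtx_is_invertible(x_ok))
print(xtx_is_invertible(x_bad))
```

Note that the rank check relies on a numerical tolerance, so predictors that are nearly (but not exactly) collinear can still pass it while producing unstable estimates.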
How can we test whether observations are independent?
Unfortunately we can’t; we can only make sure that our data collection methods and sampling strategies are such that they are independent.