Generalised Linear Models Flashcards
What two equations define a GLM? Define all terms used
The first is y = µ + ε
- Where y is the vector of responses
- µ = E(y) is the vector of expected responses
- ε is the vector of random errors, with mean zero so that E(y) = µ (it is N(0, σ²) only in the normal linear model)
The second is g(µ) = x’β, where x’ is the transpose of the vector of predictor variables and β is the vector of parameters.
g is our link function, linking the mean response to the linear predictor η = x’β.
The idea is that µ need not be linear in β as long as g(µ) = η is.
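A minimal sketch of this idea, assuming a log link and made-up coefficients (neither comes from the cards): the mean is not linear in x, but its log is.

```python
import numpy as np

# Hypothetical values chosen only for illustration.
beta = np.array([0.5, 1.2])                 # parameters (intercept, slope)
x = np.linspace(0.0, 2.0, 5)                # one predictor on an even grid
X = np.column_stack([np.ones_like(x), x])   # each row is x' = (1, x_i)

eta = X @ beta      # linear predictor eta = x'beta
mu = np.exp(eta)    # mean response mu = g^{-1}(eta) for the log link

# On an evenly spaced grid, second differences vanish for a linear function.
second_diff_mu = np.diff(mu, 2)            # clearly non-zero: mu is not linear
second_diff_g = np.diff(np.log(mu), 2)     # ~0: g(mu) = log(mu) is linear
print(second_diff_mu)
print(second_diff_g)
```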
State the three components of a GLM
- An ‘error’ distribution - not to be confused with ε. It is our distribution from the exponential family for the responses y
- Link function g(µ) connecting the mean response to the linear predictor η
- Linear predictor η = x’β, a linear combination of the explanatory variables x with the parameters β as coefficients
What assumptions do we make for a GLM?
We assume the distribution of each response y_i is a member of the exponential family and that the responses y_i for different covariate values are independent.
In addition, only the natural parameter θ_i can change with i; everything else (the dispersion parameter ϕ and the functions a and b) is the same for all i.
Describe the concept of a Canonical Link
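The canonical link is the link function for which the linear predictor equals the natural parameter, g(µ) = θ, where θ is the parameter multiplying y in the yθ term of the exponential family density. Since µ = b’(θ), we find it by writing θ as a function of µ, i.e. g is the inverse of b’.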
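A small numerical check of this, assuming the standard exponential-family result µ = b'(θ). The Poisson and Bernoulli forms of b used here are textbook examples, not taken from the cards.

```python
import numpy as np

theta = np.linspace(-2.0, 2.0, 9)   # a grid of natural-parameter values

# Poisson: b(theta) = exp(theta), so mu = b'(theta) = exp(theta).
mu_poisson = np.exp(theta)
# Bernoulli: b(theta) = log(1 + exp(theta)), so mu = b'(theta) = 1 / (1 + exp(-theta)).
mu_bernoulli = 1.0 / (1.0 + np.exp(-theta))

# The canonical link g must satisfy g(mu) = theta.
log_link = np.log(mu_poisson)                              # log link
logit_link = np.log(mu_bernoulli / (1.0 - mu_bernoulli))   # logit link

print(np.allclose(log_link, theta), np.allclose(logit_link, theta))
```

So under these assumptions the log link is canonical for the Poisson and the logit link is canonical for the Bernoulli.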
State the IRLS Algorithm
β^(k+1) = β^(k) + (X’ W^(k) X)^(-1) X’ W^(k) e^(k)
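A sketch of this update for one concrete case, a Poisson response with log link. W and e are not defined on the card, so the code assumes the usual choices W = diag(1 / (V(µ) g'(µ)^2)) and working residual e = (y - µ) g'(µ). The function name `irls_poisson` and the data are hypothetical.

```python
import numpy as np

def irls_poisson(X, y, n_iter=25, tol=1e-10):
    """IRLS for a Poisson GLM with log link (illustrative sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # Log link: g'(mu) = 1/mu and V(mu) = mu, so
        # W = diag(1 / (V(mu) g'(mu)^2)) = diag(mu) and e = (y - mu) / mu.
        w = mu
        e = (y - mu) / mu
        XtW = X.T * w                                # X'W without building diag(W)
        step = np.linalg.solve(XtW @ X, XtW @ e)     # (X'WX)^{-1} X'W e
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical noise-free data: y equals its mean, so the score X'(y - mu)
# is zero at beta_true and the maximum likelihood estimate is beta_true.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
beta_true = np.array([0.3, 0.8])
y = np.exp(X @ beta_true)

beta_hat = irls_poisson(X, y)
print(beta_hat)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; the update is otherwise the one written on the card.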
What happens to the Newton Raphson Algorithm when we use the Fisher Info Matrix instead of Observed Fisher info?
We go from the N-R algorithm to the Fisher Scoring Algorithm
Which 3 Algorithms are identical?
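Newton-Raphson, Fisher Scoring and IRLS. Fisher Scoring and IRLS are always the same algorithm; Newton-Raphson coincides with them when the canonical link is used, because the observed and expected information are then equal.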
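A sketch comparing the observed information (used by Newton-Raphson) with the expected information (used by Fisher Scoring) for a Poisson response. The per-observation weights are derived here from the Poisson log-likelihood y log µ - µ and are an assumption of this example, as are the data and the helper `info_matrices`.

```python
import numpy as np

def info_matrices(X, y, mu, dmu, d2mu):
    """Observed and expected information X' diag(w) X for a Poisson response.

    dmu and d2mu are the first and second derivatives of mu w.r.t. eta.
    """
    w_obs = y * dmu**2 / mu**2 - (y / mu - 1.0) * d2mu   # minus second derivative
    w_exp = dmu**2 / mu                                  # its expectation, E(y) = mu
    return X.T @ (w_obs[:, None] * X), X.T @ (w_exp[:, None] * X)

# Hypothetical data and current iterate.
x = np.linspace(0.5, 1.5, 8)
X = np.column_stack([np.ones_like(x), x])
y = np.array([2.0, 4.0, 3.0, 1.0, 5.0, 3.0, 2.0, 4.0])
beta = np.array([0.2, 0.5])
eta = X @ beta

# Canonical log link: mu = exp(eta), so both derivatives equal mu.
mu_log = np.exp(eta)
obs_log, exp_log = info_matrices(X, y, mu_log, mu_log, mu_log)

# Non-canonical identity link: mu = eta (positive here), derivatives 1 and 0.
obs_id, exp_id = info_matrices(X, y, eta, np.ones_like(eta), np.zeros_like(eta))

print(np.allclose(obs_log, exp_log))   # the two matrices agree for the log link
print(np.allclose(obs_id, exp_id))     # they differ for the identity link
```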
Describe what’s meant by a Saturated Model
This is the model with one parameter per observation: all our natural parameters θ_i are free to vary, so the fitted means reproduce the data exactly (µ_i = y_i).
Outline Scaled Deviance
The scaled deviance of a GLM is twice the difference between the maximised log-likelihood of the saturated model and the maximised log-likelihood of the fitted model.
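A sketch of this calculation for a Poisson response, where a(ϕ) = 1. The counts are hypothetical, and the fitted model is assumed to be intercept-only so that every fitted mean is the sample mean.

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, mu):
    """Poisson log-likelihood; assumes every y_i > 0 and mu_i > 0."""
    log_fact = np.array([lgamma(v + 1.0) for v in y])    # log(y_i!)
    return float(np.sum(y * np.log(mu) - mu - log_fact))

y = np.array([2.0, 4.0, 3.0, 1.0, 5.0])     # hypothetical counts
mu_hat = np.full_like(y, y.mean())          # intercept-only fit: mu_i = mean(y)

ll_model = poisson_loglik(y, mu_hat)
ll_saturated = poisson_loglik(y, y)         # saturated model: mu_i = y_i
scaled_deviance = 2.0 * (ll_saturated - ll_model)

# Closed form of the same quantity for the Poisson case.
closed_form = float(2.0 * np.sum(y * np.log(y / mu_hat) - (y - mu_hat)))
print(scaled_deviance, closed_form)
```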
Give the formula for the Generalised Pearson Statistic
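X^2 = ∑ (y_i - µ_i)^2 / (a(ϕ) V(µ_i)), where µ_i is the fitted mean for observation i and V is the variance function. It should be approximately ~ χ^2_(n-p) if the model is true, with n observations and p fitted parameters.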
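A sketch of the statistic for a Poisson response, assuming V(µ) = µ and a(ϕ) = 1. The data, the intercept-only fit and the helper name `pearson_statistic` are hypothetical.

```python
import numpy as np

def pearson_statistic(y, mu_hat, variance, a_phi=1.0):
    """Sum of squared residuals, each scaled by a(phi) * V(mu_hat_i)."""
    return float(np.sum((y - mu_hat) ** 2 / (a_phi * variance(mu_hat))))

y = np.array([2.0, 4.0, 3.0, 1.0, 5.0])     # hypothetical counts
mu_hat = np.full_like(y, y.mean())          # intercept-only fit, so p = 1

X2 = pearson_statistic(y, mu_hat, variance=lambda m: m)
df = y.size - 1                             # n - p degrees of freedom
print(X2, df)
```

The value of X2 would then be compared with a chi-squared distribution on df degrees of freedom.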
What do we do after fitting a GLM to test the significance of parameters?