Generalized Linear Models Flashcards
What is the objective of Generalized Linear Models?
To allow us to do regression in problems where our Yi is not normally distributed
What is the stochastic/random part of a model?
The part of the model which characterises the distribution of Yi (eg. Yi ~ N(mu(i), sigma²))
What is the structural part of the model?
A function of mu(i) which describes its relationship with the covariates (eg. mu(i) = B0 + B1X1 + B2X2 + … + BPXP)
What are the two types of model which we go over in this course?
- Poisson Model (for count outcomes)
- Binomial Model (for binary or binomial outcomes)
What is the difference between a binomial outcome and a binary outcome?
A Binary (or Bernoulli) outcome is the result of a single trial, whereas a Binomial outcome is the number of successes across a number of trials
What is a link function?
A function which relates the mean parameter of the distribution to the linear combination of the covariates
What is the link function for the Poisson Model?
log(lambda) = B0 + B1X1 + … + BPXP
*natural logarithm
What is the link function for the Binomial Model?
log(odds of success) = log(p / (1 - p)) = B0 + B1X1 + … + BPXP
*natural logarithm
Define the term “odds”?
The ratio of the probability of an event occurring to the probability of the event not occurring.
= [p(A)] / [p(not A)]
*in Bernoulli events, “A” is success and “not A” is failure
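A minimal sketch of the odds and log-odds calculation in R. The probability 0.75 is an assumed illustrative value; qlogis() and plogis() are base R's logit and inverse-logit functions.

```r
# Probability of success (assumed value, for illustration only).
p <- 0.75

# Odds = P(A) / P(not A).
odds <- p / (1 - p)

# Log-odds (logit): natural logarithm of the odds.
log_odds <- log(odds)

# The inverse logit maps the log-odds back to a probability.
p_back <- plogis(log_odds)
```

Here log_odds agrees with qlogis(p), so either form can be used to move between the probability scale and the link scale.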
How do you read data from a CSV file into R?
data = read.csv("filename.csv")
What is the R function for viewing the first few rows of a data object?
head(data)
What is the R function for viewing the names of the variables in a data object?
names(data)
What is the R code for viewing the values under a specific variable name in a data object?
data$variableName
What is the R code for viewing the number of each type of value under a specific variable name in a data object?
table(data$variableName)
What is the R code for viewing the proportion of each type of value under a specific variable name in a data object?
prop.table(table(data$variableName))
What is the R code for adding a variable name to a data object based on some condition of each row?
data$newVariable = ifelse(data$conditionVariable == "something", 1, 0)
Will set newVariable to 1 if condition is true else set newVariable to 0
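The data-inspection cards above can be tried together on a small example. This is a sketch only: the data frame below is made up and stands in for the result of read.csv(), and the column names group, score and isA are hypothetical.

```r
# A small made-up data frame standing in for read.csv("filename.csv").
data <- data.frame(
  group = c("a", "b", "a", "a", "b"),
  score = c(10, 12, 9, 15, 11)
)

head(data)    # first rows (up to six)
names(data)   # variable names
data$score    # values of one variable

# Count of each value, then the same counts as proportions.
counts <- table(data$group)
props  <- prop.table(counts)

# Indicator variable: 1 where group is "a", otherwise 0.
data$isA <- ifelse(data$group == "a", 1, 0)
```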
What is the R code for fitting a GLM to a binomial dependent variable and viewing a summary of the model?
model1 = glm(dependent ~ explanatory, family = "binomial", data = dataObject)
summary(model1)
- the outputs from the summary give the coefficients for the link function (logit in this case)
- standard error gives an idea of the variability in the estimate of that coefficient
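A hedged, self-contained sketch of a binomial fit. The data are simulated, and the true intercept (-0.5) and slope (1.2) are assumptions chosen for illustration; the names dependent, explanatory and dataObject simply mirror the placeholders on the card.

```r
set.seed(1)  # make the simulated data reproducible

# Simulate a binary outcome whose log-odds are linear in x.
n <- 500
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)          # inverse logit gives P(success)
y <- rbinom(n, size = 1, prob = p)   # one Bernoulli trial per row
dataObject <- data.frame(dependent = y, explanatory = x)

model1 <- glm(dependent ~ explanatory, family = "binomial", data = dataObject)
summary(model1)

# Coefficients are on the log-odds scale; exponentiate for odds ratios.
odds_ratios <- exp(coef(model1))

# Predicted probabilities at a few chosen covariate values.
pred <- predict(model1,
                newdata = data.frame(explanatory = c(-1, 0, 1)),
                type = "response")
```

Using type = "response" returns probabilities; the default, type = "link", would return values on the log-odds scale instead.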
What does logit(p) equate to?
log(p / (1 - p)), ie. the log of the odds of success
- natural logarithm
How do we know how well the model fits the data?
By using the deviance:
D = -2(l(c) - l(f))
D ~ chi-squared with n-k-1 degrees of freedom (approximately)
* where n is the number of observations
* where k+1 is the number of parameters estimated
* where l is the log-likelihood
* where c is the current model
* where f is a full/ideal (saturated) model which fits all of the data
%Deviance Explained = [Dnull - Dcurrent]/[Dnull]
* where Dnull is the Deviance of the model with just the intercept
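A sketch of how these quantities can be pulled out of a fitted glm object. The choice of the built-in mtcars data (am modelled on wt) is an assumption made purely so the example runs as-is; it is not from the course.

```r
# mtcars ships with R; am is a 0/1 variable, wt is car weight.
fit <- glm(am ~ wt, family = "binomial", data = mtcars)

d_current <- deviance(fit)       # deviance of the current model
d_null    <- fit$null.deviance   # deviance of the intercept-only model
df_resid  <- df.residual(fit)    # n - k - 1 = 32 - 1 - 1

# Percentage of deviance explained by the current model.
pct_explained <- 100 * (d_null - d_current) / d_null

# Upper-tail chi-squared probability for the residual deviance.
p_value <- pchisq(d_current, df = df_resid, lower.tail = FALSE)
```

For ungrouped binary data like this the chi-squared approximation to the deviance is unreliable, so p_value here is illustrative only; the approximation is better suited to grouped binomial or Poisson data.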
What is the R code for fitting a GLM to a “count” dependent variable and viewing a summary of the model?
model2 = glm(dependent ~ explanatory, family = "poisson", data = dataObject)
summary(model2)
- the outputs from the summary give the coefficients for the link function (log in this case)
- standard error gives an idea of the variability in the estimate of that coefficient
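A self-contained sketch of a Poisson fit on simulated counts. The true coefficients (0.3 and 0.8) are assumptions chosen for illustration, and the variable names mirror the placeholders on the card. The poisson family uses the log link by default.

```r
set.seed(2)  # make the simulated data reproducible

# Simulate counts whose log-mean is linear in x.
n <- 500
x <- runif(n, min = 0, max = 2)
lambda <- exp(0.3 + 0.8 * x)    # inverse of the log link
y <- rpois(n, lambda)
dataObject <- data.frame(dependent = y, explanatory = x)

model2 <- glm(dependent ~ explanatory, family = "poisson", data = dataObject)
summary(model2)

# Coefficients are on the log scale; exponentiating gives the
# multiplicative change in the expected count per unit of x.
rate_ratios <- exp(coef(model2))
```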
What is the criterion for a distribution to be part of the Exponential Family?
The distribution must be able to be written in the form:
f(y; theta) = exp{a(y).b(theta) + c(theta) + d(y)}
- where theta represents a parameter of the distribution
What is the Interaction Component/Term of a distribution written in Exponential form?
a(y).b(theta)
What is the Additive Component/Term of a distribution written in Exponential form?
c(theta) + d(y)
What is the Natural Parameter of a distribution written in Exponential form?
b(theta)
What does e^ln(x) equate to?
x
- we then use this to prove whether a distribution belongs to the Exponential Family. We place the pmf/pdf in the place of x in e^ln(x)
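As a worked example of this trick, the Poisson pmf can be rewritten in the Exponential Family form. The parameter is written as theta to match the notation above.

```latex
\begin{align*}
f(y;\theta) &= \frac{\theta^{y} e^{-\theta}}{y!} \\
            &= \exp\left\{ \ln\!\left( \frac{\theta^{y} e^{-\theta}}{y!} \right) \right\} \\
            &= \exp\left\{ y \ln\theta - \theta - \ln y! \right\}
\end{align*}
% Matching terms with exp{ a(y) b(theta) + c(theta) + d(y) }:
%   a(y) = y,  b(theta) = ln(theta),  c(theta) = -theta,  d(y) = -ln(y!)
```

So the Poisson distribution is a member of the Exponential Family, with natural parameter ln(theta).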
Is the Weibull distribution a member of the Exponential Family?
In general no, but yes if lambda is known (ie. fixed at some constant)
What is the expected value of a(y) in the context of Exponential Family distributions?
-[c’(theta)]/[b’(theta)]
What is the variance of a(y) in the context of Exponential Family distributions?
[b’’(theta).c’(theta) - b’(theta).c’’(theta)]/[b’(theta)]^3
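A quick check of the mean and variance results using the Poisson distribution, for which a(y) = y, b(theta) = ln(theta) and c(theta) = -theta. Both results should come out as theta, the known Poisson mean and variance.

```latex
\begin{align*}
b'(\theta) &= \frac{1}{\theta}, &
b''(\theta) &= -\frac{1}{\theta^{2}}, &
c'(\theta) &= -1, &
c''(\theta) &= 0 \\[6pt]
E[a(Y)] &= -\frac{c'(\theta)}{b'(\theta)}
         = -\frac{-1}{1/\theta}
         = \theta \\[6pt]
\operatorname{Var}[a(Y)]
  &= \frac{b''(\theta)\,c'(\theta) - c''(\theta)\,b'(\theta)}{[b'(\theta)]^{3}}
   = \frac{\left(-\frac{1}{\theta^{2}}\right)(-1) - 0}{1/\theta^{3}}
   = \theta
\end{align*}
```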