Generalized Linear Models Flashcards

1
Q

What is the objective of Generalized Linear Models?

A

To allow us to do regression in problems where the response Yi is not normally distributed

2
Q

What is the stochastic/random part of a model?

A

The part of the model which characterises the distribution of Yi (e.g. Yi ~ N(mu(i), sigma²))

3
Q

What is the structural part of the model?

A

A function of mu(i) which describes its relationship with the covariates (eg. mu(i) = B0 + B1X1 + B2X2 + … + BPXP)
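
The stochastic and structural parts can be seen together by simulating from a normal linear model. This is only a minimal sketch: the coefficient values, sample size and variable names are made up for illustration and are not from the course.

```r
set.seed(1)                         # fixed seed so the sketch is reproducible
n  <- 1000
x1 <- runif(n)                      # one illustrative covariate
b0 <- 2; b1 <- 3                    # hypothetical "true" coefficients
mu <- b0 + b1 * x1                  # structural part: mu(i) = B0 + B1*X1
y  <- rnorm(n, mean = mu, sd = 1)   # stochastic part: Yi ~ N(mu(i), sigma^2)
fit <- lm(y ~ x1)                   # ordinary regression recovers B0 and B1 approximately
coef(fit)
```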

4
Q

What are the two types of model which we go over in this course?

A
  • Poisson Model (for count outcomes)
  • Binomial Model (for binary or binomial outcomes)

5
Q

What is the difference between a binomial outcome and a binary outcome?

A

A binary (or Bernoulli) outcome is the result of a single trial (success or failure), whereas a binomial outcome is the number of successes in a fixed number of trials
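
As a rough illustration, both kinds of outcome can be simulated with rbinom(); the probability 0.3 and the 10 trials are arbitrary choices.

```r
set.seed(1)
binary_y   <- rbinom(5, size = 1,  prob = 0.3)   # five Bernoulli outcomes, each 0 or 1
binomial_y <- rbinom(5, size = 10, prob = 0.3)   # five counts of successes out of 10 trials
binary_y
binomial_y
```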

6
Q

What is a link function?

A

A function applied to the parameter of a distribution (usually the mean) which relates it to a linear combination of the covariates

7
Q

What is the link function for the Poisson Model?

A

log(lambda) = linear combination of the covariates (B0 + B1X1 + … + BPXP)

*natural logarithm
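
A small sketch of why the log link is convenient: the linear predictor can be any real number, but exp() maps it back to a positive rate. The coefficients and covariate values are hypothetical.

```r
b0 <- 0.5; b1 <- -2        # hypothetical coefficients
x  <- c(-1, 0, 1, 2)
eta    <- b0 + b1 * x      # linear predictor, can be negative
lambda <- exp(eta)         # inverse of the log link, always positive
log(lambda)                # applying the link again recovers eta
```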

8
Q

What is the link function for the Binomial Model?

A

log(odds of success) = log(p / (1 - p)) = linear combination of the covariates

*natural logarithm

9
Q

Define the term “odds”.

A

The ratio of the probability of an event occurring to the probability of the event not occurring.

= [p(A)] / [p(not A)] = [p(A)] / [1 - p(A)]

*in Bernoulli events, “A” is success and “not A” is failure
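
A quick numerical illustration, using arbitrary probabilities, of converting between probabilities and odds:

```r
p    <- c(0.1, 0.5, 0.8)       # example probabilities of success
odds <- p / (1 - p)            # odds = p(A) / p(not A)
odds
p_back <- odds / (1 + odds)    # converting odds back to probabilities
p_back
```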

10
Q

How do you read data from a CSV file into R?

A

data = read.csv("filename.csv")

11
Q

What is the R function for viewing the first few rows of a data object?

A

head(data)

12
Q

What is the R function for viewing the names of the variables in a data object?

A

names(data)

13
Q

What is the R code for viewing the values under a specific variable name in a data object?

A

data$variableName

14
Q

What is the R code for viewing the number of each type of value under a specific variable name in a data object?

A

table(data$variableName)

15
Q

What is the R code for viewing the proportion of each type of value under a specific variable name in a data object?

A

prop.table(table(data$variableName))
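
The data-inspection commands can be tried on a small data frame built by hand, so no CSV file is needed. The column names and values below are invented for illustration.

```r
data <- data.frame(
  id     = 1:6,
  colour = c("red", "blue", "red", "green", "red", "blue")
)
head(data, 3)                   # first three rows
names(data)                     # variable names: "id" and "colour"
data$colour                     # values of one variable
counts <- table(data$colour)    # number of rows per value
props  <- prop.table(counts)    # the same counts as proportions summing to 1
counts
props
```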

16
Q

What is the R code for adding a variable name to a data object based on some condition of each row?

A

data$newVariable = ifelse(data$conditionVariable == "something", 1, 0)

For each row, this sets newVariable to 1 if the condition is true and to 0 otherwise
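
A tiny worked example with a made-up data frame and column names:

```r
data <- data.frame(smoker = c("yes", "no", "no", "yes"))
# new indicator column: 1 where smoker is "yes", 0 otherwise
data$isSmoker <- ifelse(data$smoker == "yes", 1, 0)
data
```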

17
Q

What is the R code for fitting a GLM to a binomial dependent variable and viewing a summary of the model?

A

model1 = glm(dependent ~ explanatory, family = "binomial", data = dataObject)

summary(model1)

  • the outputs from the summary give the coefficients for the link function (logit in this case)
  • standard error gives an idea of the variability in the estimate of that coefficient
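
A runnable sketch of the same call, using R's built-in mtcars data set as a stand-in (the choice of am as the binary outcome and wt as the covariate is only for illustration):

```r
# am is 0/1 (automatic/manual transmission), wt is car weight in 1000 lb
model1 <- glm(am ~ wt, family = "binomial", data = mtcars)
summary(model1)

# coefficients are on the logit scale, so plogis() converts a
# linear predictor back to a probability
coefs <- coef(model1)
p_hat <- plogis(coefs[["(Intercept)"]] + coefs[["wt"]] * 3)   # fitted P(am = 1) at wt = 3
p_hat
```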
18
Q

What does logit(p) equate to?

A

log(odds) = log(p / (1 - p))

  • natural logarithm
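
In base R, qlogis() computes the logit and plogis() its inverse. A short check with arbitrary probabilities:

```r
p <- c(0.2, 0.5, 0.9)
logit_p <- log(p / (1 - p))   # logit written out by hand
qlogis(p)                     # built-in logit, gives the same values
plogis(logit_p)               # inverse logit, recovers p
```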
19
Q

How do we know how well the model fits the data?

A
By looking at the deviance, D:

D = -2[l(c) - l(f)]
D ~ chi-squared (approximately) with n-k-1 degrees of freedom
* where n is the number of observations
* where k+1 is the number of parameters estimated
* where l is the log-likelihood
* where c is the current model
* where f is the full (saturated) model, which fits the data perfectly

% Deviance Explained = 100 × [Dnull - Dcurrent]/[Dnull]
* where Dnull is the deviance of the model with just the intercept
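
These quantities are stored on a fitted glm object. The sketch below uses R's built-in warpbreaks data purely as an example. The chi-squared comparison is only an approximation, so treat the p-value as a rough guide.

```r
model <- glm(breaks ~ wool + tension, family = "poisson", data = warpbreaks)

d_null    <- model$null.deviance     # deviance of the intercept-only model
d_current <- model$deviance          # deviance of the current model
pct_explained <- 100 * (d_null - d_current) / d_null

df_resid <- model$df.residual        # n - k - 1 = 54 - 3 - 1 = 50
# a small p-value suggests the current model fits poorly
p_value <- pchisq(d_current, df = df_resid, lower.tail = FALSE)

pct_explained
p_value
```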

20
Q

What is the R code for fitting a GLM to a “count” dependent variable and viewing a summary of the model?

A

model2 = glm(dependent ~ explanatory, family = "poisson", data = dataObject)

summary(model2)

  • the outputs from the summary give the coefficients for the link function (log in this case)
  • standard error gives an idea of the variability in the estimate of that coefficient
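
A runnable sketch on simulated counts. The "true" coefficients 0.5 and 0.8, the sample size and the variable names are all hypothetical.

```r
set.seed(42)
n <- 500
x <- runif(n, 0, 2)
lambda <- exp(0.5 + 0.8 * x)          # hypothetical true model on the log scale
simData <- data.frame(counts = rpois(n, lambda), x = x)

model2 <- glm(counts ~ x, family = "poisson", data = simData)
summary(model2)

coef(model2)        # estimates on the log scale, close to 0.5 and 0.8
exp(coef(model2))   # multiplicative effect on the expected count
```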
21
Q

What is the criterion for a distribution to be part of the Exponential Family?

A

The distribution must be able to be written in the form:

exp{a(y).b(theta) + c(theta) + d(y)}

  • where theta represents a parameter of the distribution
22
Q

What is the Interaction Component/Term of a distribution written in Exponential form?

A

a(y).b(theta)

23
Q

What is the Additive Component/Term of a distribution written in Exponential form?

A

c(theta) + d(y)

24
Q

What is the Natural Parameter of a distribution written in Exponential form?

A

b(theta)

  • usually stated for the canonical form, where a(y) = y

25
Q

What does e^ln(x) equate to?

A

x

  • we use this identity to check whether a distribution belongs to the Exponential Family: put the pmf/pdf in place of x in e^ln(x) and rearrange the exponent
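
As a worked illustration of this trick (the Poisson distribution is chosen here only as an example), take the Poisson pmf with mean theta and write it as the exponential of its own logarithm:

```latex
f(y;\theta) = \frac{\theta^{y} e^{-\theta}}{y!}
            = \exp\left\{ \ln\!\left( \frac{\theta^{y} e^{-\theta}}{y!} \right) \right\}
            = \exp\left\{ y \ln\theta - \theta - \ln y! \right\}
```

Matching this against exp{a(y).b(theta) + c(theta) + d(y)} gives a(y) = y, b(theta) = ln(theta), c(theta) = -theta and d(y) = -ln(y!), so the Poisson distribution is a member of the Exponential Family.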
26
Q

Is the Weibull distribution a member of the Exponential Family?

A

In general no, but yes if lambda is treated as a known constant
27
Q

What is the expected value of a(y) for a distribution in the Exponential Family?

A

E[a(Y)] = -[c’(theta)]/[b’(theta)]
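
A quick sanity check, assuming the Poisson distribution with b(theta) = ln(theta) and c(theta) = -theta, and the sign convention E[a(Y)] = -c'(theta)/b'(theta):

```latex
b'(\theta) = \frac{1}{\theta}, \qquad c'(\theta) = -1
\quad\Longrightarrow\quad
E[a(Y)] = -\frac{c'(\theta)}{b'(\theta)} = -\frac{-1}{1/\theta} = \theta
```

This matches the known Poisson mean, since a(y) = y for that distribution.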
28
Q

What is the variance of a(y) for a distribution in the Exponential Family?

A

Var[a(Y)] = [b’’(theta).c’(theta) - c’’(theta).b’(theta)]/[b’(theta)]^3
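
The variance formula can be checked numerically with R's symbolic derivative function D(). This sketch assumes the Poisson case, b(theta) = log(theta) and c(theta) = -theta, with an arbitrary value theta = 3.5. The Poisson variance equals theta, so the result should come out as 3.5.

```r
b_expr <- expression(log(theta))   # b(theta) for the Poisson distribution
c_expr <- expression(-theta)       # c(theta) for the Poisson distribution

b1 <- D(b_expr, "theta")           # b'(theta)
b2 <- D(b1, "theta")               # b''(theta)
c1 <- D(c_expr, "theta")           # c'(theta)
c2 <- D(c1, "theta")               # c''(theta)

vals <- list(theta = 3.5)          # arbitrary value at which to evaluate
var_a <- (eval(b2, vals) * eval(c1, vals) - eval(c2, vals) * eval(b1, vals)) /
  eval(b1, vals)^3
var_a
```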