Generalized Linear Models Flashcards

1
Q

What is the objective of Generalized Linear Models?

A

To allow us to do regression in problems where the response Yi is not normally distributed (e.g. counts or binary outcomes)

2
Q

What is the stochastic/random part of a model?

A

The part of the model which characterises the distribution of Yi (e.g. Yi ~ N(mu(i), sigma²))

3
Q

What is the structural part of the model?

A

A function of mu(i) which describes its relationship with the covariates (e.g. mu(i) = B0 + B1X1 + B2X2 + … + BPXP)

4
Q

What are the two types of model which we go over in this course?

A
  • Poisson Model (for count outcomes)
  • Binomial Model (for binary or binomial outcomes)

5
Q

What is the difference between a binomial outcome and a binary outcome?

A

A binary (or Bernoulli) outcome is the result of a single trial, whereas a binomial outcome is the number of successes in a fixed number of trials

6
Q

What is a link function?

A

A function which describes the relationship between the parameter (mean) of a distribution and the covariates: the link function of the parameter is set equal to a linear combination of the covariates

7
Q

What is the link function for the Poisson Model?

A

log(lambda) = linear combination of the covariates (B0 + B1X1 + … + BPXP)

*natural logarithm

8
Q

What is the link function for the Binomial Model?

A

log(odds of success) = log(p / (1 - p)) = linear combination of the covariates

*natural logarithm

9
Q

Define the term “odds”.

A

The ratio of the probability of an event occurring to the probability of the event not occurring.

= [p(A)] / [p(not A)]

*in Bernoulli events, “A” is success and “not A” is failure
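
As a small illustration of the definition above, the R sketch below converts a probability into odds and back again. The value of p is an assumed example, not taken from the course.

```r
# Convert a probability of success into odds, and back again.
p <- 0.8                      # assumed example probability of success
odds <- p / (1 - p)           # p(A) / p(not A)
p_back <- odds / (1 + odds)   # inverts the odds back to a probability
print(odds)
print(p_back)
```

Odds above 1 mean success is more likely than failure; odds below 1 mean the opposite.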

10
Q

How do you read data from a CSV file into R?

A

data = read.csv("filename.csv")

11
Q

What is the R function for viewing the first few rows of a data object?

A

head(data)

12
Q

What is the R function for viewing the names of the variables in a data object?

A

names(data)
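
The reading and inspection functions from the last few cards can be tried together. This is a minimal sketch: the file contents and the column names id and outcome are made up for illustration, and the CSV is written to a temporary file so the example is self-contained.

```r
# Write a tiny CSV to a temporary file (hypothetical contents).
path <- tempfile(fileext = ".csv")
writeLines(c("id,outcome", "1,yes", "2,no", "3,yes"), path)

data <- read.csv(path)   # read the file into a data frame
print(head(data))        # first few rows
print(names(data))       # variable names
```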

13
Q

What is the R code for viewing the values under a specific variable name in a data object?

A

data$variableName

14
Q

What is the R code for viewing the number of each type of value under a specific variable name in a data object?

A

table(data$variableName)

15
Q

What is the R code for viewing the proportion of each type of value under a specific variable name in a data object?

A

prop.table(table(data$variableName))
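
A short sketch of counts versus proportions, using a hypothetical data frame with a made-up variable called colour:

```r
# Hypothetical data standing in for an object read from a CSV file.
data <- data.frame(colour = c("red", "blue", "red", "red"))

counts <- table(data$colour)   # number of rows for each value
props <- prop.table(counts)    # the same counts divided by their total
print(counts)
print(props)
```

The proportions always sum to 1, which makes them easier to compare across data sets of different sizes.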

16
Q

What is the R code for adding a variable name to a data object based on some condition of each row?

A

data$newVariable = ifelse(data$conditionVariable == "something", 1, 0)

Sets newVariable to 1 in each row where the condition is true, and to 0 otherwise
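
For example, a 0/1 indicator can be built from a text column. The data frame and the names smoker and isSmoker below are hypothetical.

```r
# Hypothetical data frame with a text variable.
data <- data.frame(smoker = c("yes", "no", "yes"))

# ifelse() works row by row: 1 where the condition holds, 0 elsewhere.
data$isSmoker <- ifelse(data$smoker == "yes", 1, 0)
print(data)
```

A 0/1 variable like this is a common way to prepare a binary dependent variable for a binomial GLM.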

17
Q

What is the R code for fitting a GLM to a binomial dependent variable and viewing a summary of the model?

A

model1 = glm(dependent ~ explanatory, family = "binomial", data = dataObject)

summary(model1)

  • the summary reports the coefficient estimates on the scale of the link function (logit, i.e. log-odds, in this case)
  • the standard error gives an idea of the variability in the estimate of each coefficient
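
A runnable sketch of the binomial fit on simulated data. The sample size, the true coefficients (-0.5 and 1.2) and the seed are assumptions chosen only so that the example runs; they do not come from the course.

```r
set.seed(1)                                  # assumed seed, for reproducibility
n <- 200
x <- rnorm(n)                                # simulated explanatory variable
p <- plogis(-0.5 + 1.2 * x)                  # success probability via the inverse logit
y <- rbinom(n, size = 1, prob = p)           # binary (Bernoulli) outcomes

dataObject <- data.frame(dependent = y, explanatory = x)
model1 <- glm(dependent ~ explanatory, family = "binomial", data = dataObject)
print(summary(model1))

print(coef(model1))        # estimates on the log-odds scale
print(exp(coef(model1)))   # exponentiated: multiplicative effects on the odds
```

Exponentiating a slope gives an odds ratio: the factor by which the odds of success change for a one-unit increase in the explanatory variable.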
18
Q

What does logit(p) equate to?

A

log(p / (1 - p)), i.e. the log of the odds of success

  • natural logarithm
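
The logit and its inverse can be checked numerically in base R, where qlogis() is the logit and plogis() is its inverse. The value of p is an assumed example.

```r
p <- 0.75                       # assumed example probability
logit_p <- log(p / (1 - p))     # log of the odds, here log(3)
print(logit_p)
print(qlogis(p))                # base R's logit: same value as logit_p
print(plogis(logit_p))          # inverse logit: recovers p
```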
19
Q

How do we know how well the model fits the data?

A
D = -2(l(c) - l(f))
D ~ chi-squared (approximately) with n-k-1 degrees of freedom
* where n is the number of observations
* where k+1 is the number of parameters estimated
* where l is the log-likelihood
* where c is the current model
* where f is the full (saturated) model, which fits the data perfectly

%Deviance Explained = 100 × [Dnull - Dcurrent]/[Dnull]
* where Dnull is the deviance of the model with just the intercept
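
The quantities on this card can be read off a fitted glm object, which stores the current (residual) deviance, the null deviance and the residual degrees of freedom. The sketch below uses simulated Poisson data; the seed and coefficients are assumptions, and the chi-squared comparison is only an approximation.

```r
set.seed(2)                                    # assumed seed
x <- rnorm(100)
y <- rpois(100, lambda = exp(0.3 + 0.5 * x))   # simulated counts
fit <- glm(y ~ x, family = "poisson")

D_current <- fit$deviance        # deviance of the current model
D_null <- fit$null.deviance      # deviance of the intercept-only model
pct_explained <- 100 * (D_null - D_current) / D_null
df_resid <- fit$df.residual      # n - k - 1 = 100 - 1 - 1

# Approximate goodness-of-fit check: a small p-value suggests a poor fit.
p_value <- pchisq(D_current, df = df_resid, lower.tail = FALSE)
print(c(D_null = D_null, D_current = D_current, pct_explained = pct_explained))
print(p_value)
```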

20
Q

What is the R code for fitting a GLM to a “count” dependent variable and viewing a summary of the model?

A

model2 = glm(dependent ~ explanatory, family = "poisson", data = dataObject)

summary(model2)

  • the summary reports the coefficient estimates on the scale of the link function (log in this case)
  • the standard error gives an idea of the variability in the estimate of each coefficient
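
A runnable sketch of the Poisson fit on simulated counts. The sample size, the true coefficients (0.5 and 1.0) and the seed are assumptions made for illustration.

```r
set.seed(3)                                    # assumed seed
n <- 200
x <- runif(n)                                  # simulated explanatory variable
y <- rpois(n, lambda = exp(0.5 + 1.0 * x))     # counts whose log-mean is linear in x

dataObject <- data.frame(dependent = y, explanatory = x)
model2 <- glm(dependent ~ explanatory, family = "poisson", data = dataObject)
print(summary(model2))

print(coef(model2))        # estimates on the log scale (the Poisson link)
print(exp(coef(model2)))   # multiplicative effects on the expected count
```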
21
Q

What is the criterion for a distribution to be part of the Exponential Family?

A

The distribution must be able to be written in the form:

exp{a(y).b(theta) + c(theta) + d(y)}

  • where theta represents a parameter of the distribution
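
As a worked example of the form above (writing the Poisson parameter as theta rather than lambda, to match the card), the Poisson pmf can be rearranged as follows:

```latex
f(y;\theta) = \frac{\theta^{y} e^{-\theta}}{y!}
            = \exp\{\, y \log\theta \;-\; \theta \;-\; \log y! \,\}
\quad\Longrightarrow\quad
a(y) = y,\qquad b(\theta) = \log\theta,\qquad c(\theta) = -\theta,\qquad d(y) = -\log y!
```

So the Poisson distribution belongs to the Exponential Family, and its natural parameter is log(theta), which matches the log link used in the Poisson model.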
22
Q

What is the Interaction Component/Term of a distribution written in Exponential form?

A

a(y).b(theta)

23
Q

What is the Additive Component/Term of a distribution written in Exponential form?

A

c(theta) + d(y)

24
Q

What is the Natural Parameter of a distribution written in Exponential form?

A

b(theta)

25
Q

What does e^ln(x) equate to?

A

x

  • we use this identity to show whether a distribution belongs to the Exponential Family: put the pmf/pdf in the place of x in e^ln(x), then rearrange the exponent into the required form
26
Q

Is the Weibull distribution a member of the Exponential Family?

A

Not in general, but yes if lambda is a known constant

27
Q

What is the expected value of a(y) in the context of Exponential Family distributions?

A

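E[a(Y)] = -[c’(theta)]/[b’(theta)]

  • e.g. for the Poisson, a(y) = y, b(theta) = log(theta) and c(theta) = -theta, so E[Y] = -(-1)/(1/theta) = theta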

28
Q

What is the variance of a(y) in the context of Exponential Family distributions?

A

[b’’(theta).c’(theta) - b’(theta).c’’(theta)]/[b’(theta)]^3
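
The variance formula can be sanity-checked on the Poisson distribution, taking a(y) = y, b(theta) = log(theta) and c(theta) = -theta (the parameter is written as theta here to match the card):

```latex
b'(\theta) = \frac{1}{\theta},\quad
b''(\theta) = -\frac{1}{\theta^{2}},\quad
c'(\theta) = -1,\quad
c''(\theta) = 0
\\[4pt]
\operatorname{Var}[a(Y)]
  = \frac{b''(\theta)\,c'(\theta) - b'(\theta)\,c''(\theta)}{[b'(\theta)]^{3}}
  = \frac{\left(-\tfrac{1}{\theta^{2}}\right)(-1) - \tfrac{1}{\theta}\cdot 0}{\tfrac{1}{\theta^{3}}}
  = \theta
```

This agrees with the known result that a Poisson variable has variance equal to its mean.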