3.1: Generalised linear model Flashcards

Question 1

Q

How would you predict a categorical variable using a generalised linear model (GLM)?

Answer

A

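Transform the dependent variable onto a continuous scale so that we can perform regression. In practice a link function is applied to the expected value of the dependent variable (e.g. the probability of a category), not to the raw category codes.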
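The idea of moving a categorical outcome onto a continuous scale can be sketched with the logit link, the standard choice for a binary outcome (logistic regression). This is a minimal sketch. `logit` and `inverse_logit` are hypothetical helper names and do not come from the course material. The sketch shows that probabilities strictly between 0 and 1 map to log-odds anywhere on the real line, and that the mapping can be reversed.

```python
import math

def logit(p: float) -> float:
    """Map a probability in (0, 1) to the log-odds scale (-inf, inf)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(z: float) -> float:
    """Map a log-odds value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities close to 0 or 1 become large negative or positive log-odds;
# p = 0.5 maps to log-odds 0. The round trip recovers the original p.
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    z = logit(p)
    print(f"p = {p:.2f} -> log-odds = {z:+.3f} -> back to p = {inverse_logit(z):.2f}")
```

The regression is then fitted on the log-odds scale. Predictions are converted back to probabilities with the inverse link.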

Question 2

Q

What are binary variables? (dichotomous/binomial)

Answer

A

Variables with only two categories (e.g. AD vs. no AD, coded 1 and 0)

Question 3

Q

Suppose we want to know how well beta-amyloid presence in blood plasma predicts whether someone will (or will not) get Alzheimer’s dementia (AD) (at age 75).

What are X and Y in our equation?

Answer

A

X = amount of β-amyloid at age 65
Y = 0 if the person does not have AD
Y = 1 if the person does have AD

Question 4

Q

In this binomial problem regarding Alzheimer’s, can we use Y = β0 + β1X? (3)

Answer

A

No. Linear regression assumes that the variance of the prediction errors is independent of the predictor (homoscedasticity). This assumption is not met with binomial data: for a 0/1 outcome the error variance is p(1 − p), which changes as the probability p changes with X. This is apparent when the regression is graphed. In the example given, there is likely much more error at X (amount of β-amyloid) = 0 than at X = 1. Therefore the assumptions are not met.

Additionally, the predicted values do not make much sense. In the example, X = 0 gives a Y of about 0.5, indicating a 50% chance. However, according to the model, X = 2 gives a chance of more than 100% and X = −2 gives a chance below 0%. This makes interpretation problematic.

The points on the graph also deviate considerably from the regression line and do not follow a linear pattern, because the residuals of a 0/1 outcome cannot be normally distributed. This makes the p-values invalid.
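The out-of-range predictions and the changing error variance can be reproduced with a few lines of code. This is a minimal sketch that fits ordinary least squares to a small, made-up data set. The data were chosen only so that the fitted line passes through 0.5 at X = 0, as in the card's example; they are not the course data. `fit_ols` and `bernoulli_error_variance` are hypothetical helper names.

```python
# Hypothetical illustration data (not from the course): predictor X and a 0/1 outcome Y.
xs = [-1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_ols(x, y):
    """Ordinary least squares for Y = b0 + b1*X (closed-form solution)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def bernoulli_error_variance(p):
    """Variance of a 0/1 outcome with P(Y = 1) = p; it is not constant across p."""
    return p * (1.0 - p)

b0, b1 = fit_ols(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")

# The straight line happily predicts "probabilities" outside [0, 1].
for x in (-2.0, 0.0, 2.0):
    print(f"X = {x:+.1f}: predicted 'probability' = {b0 + b1 * x:+.3f}")

# The error variance depends on p, and therefore on X (heteroscedasticity).
for p in (0.5, 0.9):
    print(f"p = {p}: error variance = {bernoulli_error_variance(p):.2f}")
```

With these numbers the line predicts 0.5 at X = 0, a value above 1 at X = 2 and a value below 0 at X = −2. This mirrors the interpretation problem described above.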

Question 5

Q

What type of function fits binomial data better and how does this look when graphed?

Answer

A

Logistic function: an S-shaped curve, also known as a sigmoid curve, which always stays between 0 and 1

Question 6

Q

What is the logistic function? Give it and explain it

Answer

A

P(Y = 1 | X) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

i.e. the probability that Y = 1 given a score on X

e ≈ 2.71828
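The curve's maximum is 1, since we’re working with binomial data and a probability cannot exceed 1. This is clearest in the equivalent form P(Y = 1 | X) = 1 / (1 + e^−(β0 + β1X)), where the numerator 1 sets the maximum.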
β0 + β1X is the linear part of the function, where X is used to predict Y
β0 shifts the position of the curve's steepest point (its midpoint, where P = 0.5, at X = −β0/β1). The larger β0 is, the further the curve shifts to the left, and vice versa (for β1 > 0)
β1 determines the slope (steepness) of the S-curve: smaller values result in a more gradual curve and larger values result in a steeper curve
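The formula and the roles of β0 and β1 can be checked numerically. This is a minimal sketch that uses arbitrary illustrative coefficients, not values from the course. `logistic`, `logistic_alt` and `midpoint` are hypothetical helper names. The sketch evaluates both algebraically equivalent forms of the curve. It shows that raising β0 moves the midpoint (P = 0.5) to the left and that raising β1 makes the curve steeper around that point.

```python
import math

def logistic(x: float, b0: float, b1: float) -> float:
    """P(Y = 1 | X = x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def logistic_alt(x: float, b0: float, b1: float) -> float:
    """Equivalent form 1 / (1 + e^-(b0 + b1*x)); the numerator 1 is the maximum."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def midpoint(b0: float, b1: float) -> float:
    """X at which P = 0.5, the steepest point of the curve (where b0 + b1*X = 0)."""
    return -b0 / b1

# Hypothetical coefficients, chosen only to illustrate the shape of the curve:
# compare b0 = -1 vs. b0 = 1 (shift) and b1 = 1 vs. b1 = 3 (steepness).
for b0, b1 in [(-1.0, 1.0), (1.0, 1.0), (1.0, 3.0)]:
    probs = [round(logistic(x, b0, b1), 3) for x in (-4, -2, 0, 2, 4)]
    print(f"b0 = {b0}, b1 = {b1}: midpoint X = {midpoint(b0, b1):+.2f}, P at X = -4, -2, 0, 2, 4: {probs}")
```

The midpoint formula follows from setting β0 + β1X = 0. At that point e^0 / (1 + e^0) = 1/2.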

Question 7

Q

What assumptions are there for general linear models?

Answer

A

* (X′X)⁻¹ exists, i.e. X′X can be inverted (no predictor is a perfect linear combination of the others)

* Observations are independent
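Unlike independence, the assumption that (X′X)⁻¹ exists can be checked directly from the design matrix X. This is a minimal sketch using NumPy's `numpy.linalg.matrix_rank`. The design matrices are made up for illustration, and `xtx_invertible` is a hypothetical helper name. X′X is invertible exactly when X has full column rank.

```python
import numpy as np

def xtx_invertible(X: np.ndarray) -> bool:
    """True if X'X is invertible, i.e. the design matrix X has full column rank."""
    xtx = X.T @ X
    return bool(np.linalg.matrix_rank(xtx) == xtx.shape[0])

# Hypothetical predictors; each design matrix has a column of ones for the intercept.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

X_ok = np.column_stack([np.ones(5), x1, x2])
# Third column is an exact multiple of the second, so X'X is singular.
X_bad = np.column_stack([np.ones(5), x1, 2.0 * x1])

print("X_ok: (X'X)^-1 exists?", xtx_invertible(X_ok))
print("X_bad: (X'X)^-1 exists?", xtx_invertible(X_bad))
```

When the check fails, one predictor carries no information beyond the others, and the coefficients cannot be uniquely estimated.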

Question 8

Q

How can we test whether observations are independent?

Answer

A

Unfortunately we can’t; we can only make sure that our data collection methods and sampling strategies are such that they are independent.

(8 cards)