Lecture 7 Flashcards

1
Q

Why can’t you use regular regression for binary outcomes?

A
  • because you can get predicted values other than 0 or 1
  • predictions can fall below 0, above 1, or take decimal values
  • this does not make sense when interpreting a binary outcome; you cannot extrapolate
2
Q

What does logistic regression involve?

A
  • model the probability that Y=1 (a continuous function ranging from 0 to 1)
  • model: the log odds of obtaining Y=1
  • predict this log odds as a regression
3
Q

How do you calculate the odds and probability in logistic regression?

A
  • use the values in the formula to get log(odds)
  • odds = e^(log(odds))
  • P(Y=1) = odds/(1+odds)
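The three steps above can be sketched in Python; the coefficients and predictor value here are made-up numbers for illustration, not from the lecture:

```python
import math

# hypothetical fitted values: intercept b0, slope b1, predictor value x
b0, b1, x = -1.5, 0.8, 2.0

log_odds = b0 + b1 * x        # step 1: plug the values into the formula
odds = math.exp(log_odds)     # step 2: odds = e^(log(odds))
p = odds / (1 + odds)         # step 3: P(Y=1) = odds / (1 + odds)
```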
4
Q

How do you interpret odds and log(odds)?

A
  • odds > 1: Y=1 more probable than Y=0
  • log(odds) > 0: Y=1 more probable than Y=0
  • odds=1 or log(odds)=0: equal chances of each
5
Q

Why do we use log in logistic regression?

A
  • you can put in any value from -infinity to +infinity, yet
  • the function cannot go below 0 or above 1

6
Q

How do you sub the regression equation into the log function?

A

P(Y=1) = 1 / (1 + e^-(regression equation))
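As a small check that this substituted form gives the same probability as the odds route from card 3, here is a minimal Python sketch (the function names are mine, not from the lecture):

```python
import math

def logistic(z):
    """P(Y=1) from the regression equation value z: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def via_odds(z):
    """The same probability reached through the odds: odds / (1 + odds)."""
    odds = math.exp(z)
    return odds / (1.0 + odds)
```

Both routes agree for any z, and the output always stays between 0 and 1.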

7
Q

What is the link? What are the different types of links?

A
  • link = a function of Y, f(Y), often written as mu
  • identity link: mu = Y (linear model)
  • logistic link: mu = log(P(Y=1)/P(Y=0)); for binary variables
  • logarithmic link: mu = log(Y); for counts/frequencies (loglinear model)
8
Q

Why do we use links/functions?

A
  • the GLM allows linear techniques to be used on non-linear data
  • used when datasets do not conform to the assumptions of linear regression

9
Q

What are the assumptions of logistic regression? What is not assumed?

A
  • binary outcomes that are MUTUALLY EXCLUSIVE
  • independence of observations (as usual)
  • IVs can be continuous or categorical
  • NOT normality, linearity, homoscedasticity
10
Q

How do you interpret the SPSS output for logistic regression?

A
  • Block 0: doesn’t tell you much; its classification table shows the proportion of Y=0
  • Block 1: look at R2 (Nagelkerke)
  • % correct: the proportion of cases the model classifies correctly
  • Exp(B) = the odds ratio; interpret as: the odds increase by a FACTOR of this value when the IV increases by one unit
  • also look at the CI for Exp(B)
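The "factor" interpretation of Exp(B) can be sketched numerically; the Exp(B) and baseline odds here are hypothetical, not from any real output:

```python
# hypothetical values: Exp(B) = 1.5 for some IV, odds of 2.0 at a given IV value
exp_b = 1.5
baseline_odds = 2.0

# a one-unit increase in the IV multiplies the odds by a factor of Exp(B)
odds_after_one_unit = baseline_odds * exp_b
odds_after_two_units = baseline_odds * exp_b ** 2  # the factor applies per unit
```

Note the multiplicative pattern: each additional unit multiplies the odds again, it does not add a fixed amount.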
11
Q

What is the difference between Cox and Snell’s and Nagelkerke’s R2 values?

A
  • C+S: function of the likelihood ratio, does not have a maximum of 1
  • N: adjusts C+S by taking it to the maximum possible value
12
Q

Why do you have to use loglinear regression rather than X2?

A
  • when there are more than two variables (e.g. a 2x2x2 table rather than a 2x2 table)
  • X2 tests of association handle only two variables at a time

13
Q

What is Simpson’s paradox?

A

conclusions drawn from the margins of a table are not necessarily the same as those from the whole table

14
Q

What are loglinear models based on?

A

counts or frequencies

3+ categorical variables

15
Q

What is the formula for loglinear model? What do you actually test?

A

log F(MD) = sigma + lambda(M) + lambda(D) + lambda(MD)

  • tests the INTERACTION to see if the variables are associated
  • tests whether the NON-SATURATED model is an acceptable fit
16
Q

How does loglinear regression go about reaching a simpler model?

A
  • it starts with a saturated model
  • removes the highest-order interaction and sees whether this affects the fit

17
Q

What measure of fit is used in loglinear regression? What do you do with this?

A
  • G2 (the likelihood ratio statistic)
  • compared against the X2 distribution
  • the saturated model has df = 0 and no probability (shown as a dash in the table)
  • go through the tables and look at the significance levels for each deleted effect
  • remove the non-significant (p > .05) effects when you do your model selection
18
Q

How do you interpret the loglinear regression?

A
  • estimate value: if > 0 then the likelihood increases; if < 0 then the likelihood decreases
  • because the estimates are in terms of log(odds)!
19
Q

What are the assumptions of loglinear regression?

A
  • each case falls in one cell and one cell only
  • at least 5x as many cases as cells
  • all expected cell frequencies should be > 1, and no more than 20% should be less than 5
  • normally distributed standardised residuals, with no obvious pattern when plotted against observed values
20
Q

What is Wald’s test?

A
  • a significance test for an individual coefficient (whether B differs from zero)
  • it plays the same role as the t-test does in ANOVA

21
Q

How do you calculate the EXPECTED cell counts in loglinear regression by hand?

A
  • take e^(B) for each parameter that applies to that cell, then multiply all of these together
  • remember to include one for the constant as well!!

OR you can add the relevant B values together, then take e of this summed total

e.g. if Senior, Male, Appraisal A: need the parameters senior, male, A, senior x A, male x A, senior x male, senior x male x A (plus the constant)
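Both routes can be checked against each other in a short Python sketch; all the B estimates below are invented for illustration, not real output:

```python
import math

# hypothetical B estimates for every parameter that applies to the
# Senior / Male / Appraisal-A cell (including the constant)
params = {
    "constant": 2.1, "senior": 0.4, "male": -0.2, "A": 0.3,
    "senior x A": 0.1, "male x A": 0.05, "senior x male": -0.1,
    "senior x male x A": 0.15,
}

# route 1: e^(B) per parameter, then multiply them all together
expected_count_product = math.prod(math.exp(b) for b in params.values())

# route 2: sum the B values first, then take e of the total
expected_count_sum = math.exp(sum(params.values()))
```

The two routes agree because e^(a+b) = e^a x e^b, which is the same log property the model is built on.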

22
Q

Why do you need to be careful when looking at logistic regression in terms of probability?

A
  • the probability function is not linear (it is S-shaped)
  • you cannot interpret the probability in a linear fashion
    BUT: you do get linear prediction in terms of log(odds)
23
Q

What is the key feature of the coding of binary variables in logistic regression?

A
  • it is arbitrary!
  • the categories just need to be coded 0 and 1

24
Q

Why can you not compare R2 values in logistic regression?

A
  • the variance is a function of the proportion (mean)
  • cannot be compared with R2 from linear regression
  • cannot be compared with R2 for binary outcomes with diff. means
25
Q

Explain the % correct

A
  • a case with predicted P(Y=1) > .5 is classified as Y=1; correct if the actual outcome is 1
  • a case with predicted P(Y=1) < .5 is classified as Y=0; correct if the actual outcome is 0
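The classification rule can be sketched with a few hypothetical cases (the probabilities and outcomes below are made up):

```python
# hypothetical predicted probabilities P(Y=1) and the actual outcomes
predicted_p = [0.8, 0.3, 0.6, 0.2]
actual = [1, 0, 0, 1]

# classify as Y=1 when P(Y=1) > .5, otherwise Y=0; a case counts as
# correct when its classification matches the actual outcome
classified = [1 if p > 0.5 else 0 for p in predicted_p]
pct_correct = 100 * sum(c == a for c, a in zip(classified, actual)) / len(actual)
```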

26
Q

Why do we use log in loglinear models?

A
  • because of the properties of logs: they turn products into sums
  • log(AB) = log(A) + log(B)
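A one-line numerical check of the product rule (a and b are arbitrary positive numbers):

```python
import math

a, b = 3.0, 7.0

# the product rule: the log of a product equals the sum of the logs
lhs = math.log(a * b)
rhs = math.log(a) + math.log(b)
```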

27
Q

What is the equation for logistic regression?

A

log(P(Y=1)/P(Y=0)) = alpha + b1X1 + b2X2 etc.

28
Q

What do the graphs look like in logistic regression for log(odds), odds and probability?

A
  • log(odds): linear (a straight line)
  • odds: exponential curve
  • probability: logistic (S-shaped) curve
29
Q

What is the equation for the Generalised Linear Model?

A

f(Y) = alpha + b1X1 + b2X2 etc. + e

30
Q

How do you calculate proportions in general? how does this translate to the loglinear model?

A

F(md) = N x p(m) x p(d)
  • N = total number, p = proportion

TAKE THE LOG:
log(F(md)) = log(N x p(m) x p(d))
           = log(N) + log(p(m)) + log(p(d))
  • then add the interaction term (between m and d)
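The product-to-sum step above can be verified numerically; N and the two proportions here are hypothetical:

```python
import math

# hypothetical values: total N and marginal proportions for m and d
N = 200
p_m = 0.4
p_d = 0.3

# expected count under independence, as a product
F_md = N * p_m * p_d

# taking the log turns that product into a sum of separate terms
log_F = math.log(N) + math.log(p_m) + math.log(p_d)
```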

31
Q

What are the estimate terms in in loglinear models?

A
  • in log(odds)!!!!
32
Q

How do you calculate, for example, “if you are male, odds of getting an A”? And how do you get an odds ratio for males vs. females?

A
  • (count of males with A) / (count of males without A, i.e. with B/C)
  • odds ratio: divide the males’ odds by the females’ odds, each from the above equation
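As a worked sketch with invented cell counts (not from the lecture's table):

```python
# hypothetical cell counts from a gender x appraisal table
males_with_A, males_with_B_or_C = 30, 20
females_with_A, females_with_B_or_C = 15, 35

odds_A_if_male = males_with_A / males_with_B_or_C        # odds of an A if male
odds_A_if_female = females_with_A / females_with_B_or_C  # odds of an A if female
odds_ratio = odds_A_if_male / odds_A_if_female           # males vs. females
```

An odds ratio above 1 here would mean males have higher odds of an A than females.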

33
Q

What is the nature of loglinear modelling? What does this mean?

A
  • hierarchical
  • if the 3 way is sig, then keep the 3 way, 2 way and main effects
  • always have to keep the lower down effects
  • that’s why you can assume that the main effects are present, if you are keeping even just one 2-way effect
34
Q

Why is the model saturated in loglinear modelling?

A
  • cannot use any more parameters (all main and interaction effects are included)
  • any further parameters would be redundant

e.g. with 0/1 dummy coding (m/f = male/female, d/nd = degree/no degree):
- log(F male, no degree) = sigma + lambda(M)
- log(F female, degree) = sigma + lambda(D)
- log(F female, no degree) = sigma