Lecture 7 Flashcards
Why can’t you use regular regression for binary outcomes?
- because you can get values other than 0 or 1
- predicted values can fall below 0, above 1, or anywhere in between
- such values make no sense as probabilities and cannot be interpreted; you cannot extrapolate
What does logistic regression involve?
- model the probability that Y=1 (a continuous function ranging from 0 to 1)
- model: log odds of obtaining Y=1
- predict this as a regression
How do you calculate the odds and probability in logistic regression?
- use the values in the formula to get log(odds)
- odds = e^(log(odds))
- P(Y=1) = odds/(1+odds)
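A minimal numeric sketch of those two steps in Python, using made-up coefficients purely for illustration:

```python
import numpy as np

# hypothetical fitted equation: log(odds) = b0 + b1*x (coefficients are made up)
b0, b1 = -2.0, 0.5
x = 3.0

log_odds = b0 + b1 * x        # value of the regression equation
odds = np.exp(log_odds)       # odds = e^(log(odds))
p = odds / (1 + odds)         # P(Y=1) = odds / (1 + odds)

print(f"log(odds) = {log_odds:.2f}, odds = {odds:.2f}, P(Y=1) = {p:.2f}")
```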
How do you interpret odds and log(odds)?
- odds > 1: Y=1 more probable than Y=0
- log(odds) > 0: Y=1 more probable than Y=0
- odds=1 or log(odds)=0: equal chances of each
Why do we use log in logistic regression?
- you can put any value from -infinity to +infinity into the regression equation, yet:
- the resulting probability can never go below 0 or above 1
How do you sub the regression equation into the logistic function?
- P(Y=1) = 1 / (1 + e^-(regression equation))
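A short derivation of that form from the log-odds model (writing the regression equation as b0 + b1X):

```latex
\log\frac{P(Y=1)}{1-P(Y=1)} = b_0 + b_1 X
\;\Rightarrow\;
\frac{P(Y=1)}{1-P(Y=1)} = e^{b_0 + b_1 X}
\;\Rightarrow\;
P(Y=1) = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}} = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
```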
What is the link? What are the different types of links?
- link = the function of the outcome, f(Y), that is modelled linearly (often written mu)
- identity link: mu = Y (the ordinary linear model)
- logistic (logit) link: mu = log[P(Y=1)/P(Y=0)], for binary outcomes
- logarithmic link: mu = log(Y), for counts/frequencies (the loglinear model)
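A hedged sketch of how those three links map onto GLM families in Python's statsmodels (the data and variable names are made up for illustration):

```python
import pandas as pd
import statsmodels.api as sm

# toy data set, purely to show how each link is requested (numbers are made up)
df = pd.DataFrame({
    "x":       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "y_cont":  [2.1, 2.4, 3.0, 3.3, 4.1, 4.4, 5.2, 5.3, 6.0, 6.6],  # continuous
    "y_bin":   [0, 0, 1, 0, 1, 0, 1, 1, 0, 1],                      # binary
    "y_count": [1, 0, 2, 3, 2, 4, 5, 4, 6, 7],                      # counts
})
X = sm.add_constant(df[["x"]])

# identity link: mu = Y, the ordinary linear model
sm.GLM(df["y_cont"], X, family=sm.families.Gaussian()).fit()

# logit link: mu = log[P(Y=1)/P(Y=0)], logistic regression
sm.GLM(df["y_bin"], X, family=sm.families.Binomial()).fit()

# log link: mu = log(Y), Poisson/loglinear model for counts
sm.GLM(df["y_count"], X, family=sm.families.Poisson()).fit()
```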
Why do we use links/functions?
- the GLM allows linear-modelling techniques to be used on outcomes that are not linearly related to the predictors
- used when data do not conform to the assumptions of linear regression
What are the assumptions of logistic regression? What is not assumed?
- binary outcomes that are MUTUALLY EXCLUSIVE
- independence of observations (as usual)
- IVs can be continuous or categorical
- NOT normality, linearity, homoscedasticity
How do you interpret the SPSS output for logistic regression?
- Block 0 (constant-only model): doesn’t tell you much; its classification table just shows the proportion of cases with Y=0
- Block 1: look at R2 (Nagelkerke)
- % correct: the proportion of cases the model classifies correctly
- Exp(B) = the odds ratio, interpret as: odds increase by a FACTOR of this when the IV increases by one unit
- also look at the CI for Exp(B): if it includes 1, the effect is not significant
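A minimal sketch of getting the same quantities outside SPSS, using Python's statsmodels with made-up data ("hours" and "passed" are hypothetical variables):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# made-up data: does hours of revision predict passing (1) vs failing (0)?
df = pd.DataFrame({
    "hours":  [1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10],
    "passed": [0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1],
})
X = sm.add_constant(df[["hours"]])
fit = sm.Logit(df["passed"], X).fit()

print(fit.summary())                    # B, SE, Wald z, p for each predictor
print(np.exp(fit.params))               # Exp(B): odds ratios
print(np.exp(fit.conf_int()))           # 95% CI for Exp(B)
print((fit.predict(X).round() == df["passed"]).mean())  # % correctly classified
```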
What is the difference between Cox and Snell’s and Nagelkerke’s R2 values?
- Cox & Snell: a function of the likelihood ratio, but it cannot reach a maximum of 1
- Nagelkerke: rescales Cox & Snell by dividing by its maximum possible value, so it can range from 0 to 1
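A sketch of how both values are computed from the null and fitted log-likelihoods (the `fit` object in the comment is a hypothetical statsmodels Logit result):

```python
import numpy as np

def cox_snell_nagelkerke(ll_null, ll_model, n):
    """Pseudo-R^2 values from the log-likelihoods of the null and fitted models."""
    cs = 1 - np.exp((2 / n) * (ll_null - ll_model))   # Cox & Snell
    cs_max = 1 - np.exp((2 / n) * ll_null)            # its maximum possible value
    return cs, cs / cs_max                            # Nagelkerke rescales to 0-1

# e.g. with a fitted statsmodels Logit result `fit` (hypothetical):
# cs, nag = cox_snell_nagelkerke(fit.llnull, fit.llf, fit.nobs)
```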
Why do you have to use loglinear models rather than X2?
- when there are three or more categorical variables (a multi-way table rather than a two-way table)
- X2 can only test the association between two variables at a time
What is Simpson’s paradox?
- conclusions drawn from the margins of a table (collapsing over a third variable) are not necessarily the same as those drawn from the full table
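An illustrative sketch with hypothetical recovery counts for two treatments split by severity, showing the reversal:

```python
# hypothetical (recovered, total) counts illustrating Simpson's paradox
mild   = {"A": (81, 87),   "B": (234, 270)}
severe = {"A": (192, 263), "B": (55, 80)}

for group, table in (("mild", mild), ("severe", severe)):
    for t, (rec, tot) in table.items():
        print(f"{group} {t}: {rec/tot:.0%}")     # A beats B in BOTH severity groups

for t in ("A", "B"):
    rec = mild[t][0] + severe[t][0]
    tot = mild[t][1] + severe[t][1]
    print(f"overall {t}: {rec/tot:.0%}")         # ...yet B looks better in the margins
```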
What are loglinear models based on?
- counts or frequencies
- 3+ categorical variables
What is the formula for the loglinear model? What do you actually test?
log F(MD) = sigma (constant) + lambda(M) + lambda(D) + lambda(MD)
- test the INTERACTION term to see if the variables are associated
- test whether the NON-SATURATED model (without the interaction) is an acceptable fit
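A hedged sketch of fitting that model as a Poisson GLM in statsmodels (the 2x2 counts and the variable names M and D are made up for illustration):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# hypothetical frequency table for two categorical variables M and D
counts = pd.DataFrame({
    "M": ["yes", "yes", "no", "no"],
    "D": ["yes", "no", "yes", "no"],
    "F": [30, 10, 15, 45],
})

# saturated model: main effects + M:D interaction (reproduces the data exactly)
saturated = smf.glm("F ~ M * D", data=counts, family=sm.families.Poisson()).fit()

# non-saturated (independence) model: main effects only
independence = smf.glm("F ~ M + D", data=counts, family=sm.families.Poisson()).fit()

# the independence model's deviance is the likelihood-ratio test of the M x D
# association: a large (significant) deviance means the interaction is needed,
# i.e. the non-saturated model is NOT an acceptable fit
print(independence.deviance, independence.df_resid)
```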