M13 - Tutorial Logistic Regression Flashcards

1
Q

Logistic Regression

  • basic idea
  • disadv of linear model
  • problems in practice: estimation
  • interpretation of regression parameters
A
  • DV is discrete
  • -> binary problems

–> the DV only takes the values 0 and 1
- disadvantage of the linear model: predicted probabilities can be less than 0 or greater than 1
–> linear regression is only valid if the variables have a linear relationship; a categorical DV does not have this
–> logit regression expresses the multiple regression equation in log terms (log odds) –> overcomes the problem of linearity
The predicted probability of Y takes on values between 0 and 1 (a value close to 0 means that the event is very unlikely to have occurred, close to 1 means that it is very likely to have occurred)

  • logistic regression tries to determine the probability of occurrence of a certain event using a regression approach that considers different influencing variables
  • regression parameters cannot be interpreted as known from linear regression
  • estimation: through maximum likelihood
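
The contrast between the linear model and the logit model can be sketched in a few lines. This is only an illustration: the coefficients `b0` and `b1` are hypothetical values, not estimates from any dataset.

```python
import math

def linear_probability(b0, b1, x):
    # Linear model: nothing keeps the prediction inside [0, 1].
    return b0 + b1 * x

def logistic_probability(b0, b1, x):
    # Logit model: the same linear predictor is passed through the
    # logistic function, so the result always lies between 0 and 1.
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -0.5, 0.4
for x in (-5, 0, 5):
    print(x, linear_probability(b0, b1, x), logistic_probability(b0, b1, x))
```

For extreme values of x the linear prediction leaves the interval from 0 to 1, while the logistic prediction only approaches its bounds.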
2
Q

Classification table - block 0

  • what is included?
  • cut value:
  • hit rate:
A
  • block 0 only includes the constant in the model
  • it does not include explanatory variables
  • -> predictions are only based on which category appears most often in the dataset
  • cut value: the threshold on the estimated probability from which on an observation is assigned to Y=1
  • hit rate: share of observations for which this prediction is right
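
A minimal sketch of how a block 0 classification could be reproduced by hand. The data are hypothetical, the cut value of 0.5 is an assumption, and so is the convention that probabilities at or above the cut value are assigned to Y=1.

```python
def classify(probabilities, cut_value=0.5):
    # Assign Y=1 when the estimated probability reaches the cut value.
    return [1 if p >= cut_value else 0 for p in probabilities]

def hit_rate(observed, predicted):
    # Share of observations whose predicted category equals the observed one.
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return hits / len(observed)

# Hypothetical binary outcomes (7 ones, 3 zeros).
observed = [1, 1, 1, 0, 1, 0, 1, 0, 1, 1]

# Block 0: constant only, so every observation gets the same estimated
# probability, namely the share of ones in the data.
p_constant = sum(observed) / len(observed)
block0_predictions = classify([p_constant] * len(observed))

print(block0_predictions)
print(hit_rate(observed, block0_predictions))
```

Because every observation is assigned to the most frequent category, the block 0 hit rate equals the share of that category.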
3
Q

Classification table - Block 1

  • what is included?

A
  • Block 1 comprises all IV
4
Q

Wald test
- tests…

  • adv over LR
  • adv over t-test
  • disadv
A

–> tests how far the estimated parameters are from zero, measured in standard errors (SE)

  • adv over LR: only requires estimating one model
  • adv over t-test: can test multiple parameters simultaneously
  • disadv: not standardized

= the Wald test approximates the LR test

  • tests the H0: bj = 0
  • -> if H0 cannot be rejected: removing the variable from the model will not substantially harm the fit of the model, since a predictor whose coefficient is very small relative to its SE generally does little to help predict the DV.
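
For a single coefficient, the Wald statistic can be sketched as follows. The function name `wald_test` and the numbers are hypothetical; the p-value uses the fact that the squared statistic is compared against a chi² distribution with 1 df.

```python
import math

def wald_test(b, se):
    # Distance of the estimate from zero, in standard errors.
    z = b / se
    # Squared version, compared against chi² with 1 df.
    wald = z ** 2
    # Upper tail of chi² with 1 df: P(chi² > wald) = erfc(sqrt(wald / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return z, wald, p_value

# Hypothetical estimate and standard error.
z, wald, p_value = wald_test(b=0.98, se=0.5)
print(z, wald, p_value)
```

A coefficient that is small relative to its SE gives a small Wald value and a large p-value, so H0: bj = 0 cannot be rejected.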
5
Q
Interpretation of Metric Variables 
- odds
- odds ratio: = 1, = 2, = 0.2
- label in SPSS
A
  • odds: the probability of an event occurring relative to the probability of the event not occurring
  • odds ratio: “effect size”: how much do the odds of the event occurring increase/decrease when there is a one-unit change in the associated IV (OR = odds after a 1-unit change / original odds)
    –> if > 1, then as the IV increases, the odds of the outcome occurring increase
    –> if < 1, then as the IV increases, the odds of the outcome occurring decrease
    –> if = 1, the odds do not change
    e.g. “the higher X, the lower the probability that Y occurs”

–> a statement about how many percentage points the probability changes can only be given by marginal effects

  • SPSS: odds ratio: Exp(B)
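
The three odds ratio values from the question (1, 2 and 0.2) can be read off with a short sketch. The helper names are hypothetical; the only relation assumed is that the odds ratio equals the exponentiated coefficient.

```python
import math

def odds(p):
    # Odds of an event with probability p.
    return p / (1.0 - p)

def odds_ratio(b):
    # Odds ratio for a one-unit increase in the IV, i.e. Exp(B).
    return math.exp(b)

def percent_change_in_odds(or_value):
    # Percentage change in the odds per one-unit increase in the IV.
    return (or_value - 1.0) * 100.0

for or_value in (1.0, 2.0, 0.2):
    print(or_value, percent_change_in_odds(or_value))
```

An odds ratio of 1 leaves the odds unchanged, 2 doubles them per unit, and 0.2 shrinks them to a fifth (an 80% decrease). These are changes in the odds, not in the probability.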
6
Q

z-test and t-test

difference?

A
t-test:
ONE: compare sample mean with population mean
TWO: compare two independent samples
  • N < 30
  • SD unknown
  • Student's t distribution
--> in regression: does the predictor have explanatory value?

z-test: compare sample mean with population mean

  • N > 30
  • SD known
  • normal distribution
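
The two one-sample statistics differ only in which standard deviation enters the denominator. A minimal sketch with a hypothetical sample and hypothetical population values:

```python
import math
import statistics

def z_statistic(sample, pop_mean, pop_sd):
    # z-test: the population SD is known.
    n = len(sample)
    return (statistics.mean(sample) - pop_mean) / (pop_sd / math.sqrt(n))

def t_statistic(sample, pop_mean):
    # t-test: the SD is unknown and estimated from the sample (n - 1 in the denominator).
    n = len(sample)
    sample_sd = statistics.stdev(sample)
    return (statistics.mean(sample) - pop_mean) / (sample_sd / math.sqrt(n))

# Hypothetical data.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_statistic(sample, pop_mean=4, pop_sd=2))
print(t_statistic(sample, pop_mean=4))
```

The z value is compared against the normal distribution, the t value against Student's t distribution with n - 1 df.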
7
Q

Pseudo-R²-Measure

  • Problem for logit
  • tries to …
  • types
  • values
A
  • for the logit there is no measure exactly matching the R² of OLS

…tries to quantify the fraction of variance explained by the logistic regression model - how well does the logistic model fit the data?

  • McFadden R², Cox & Snell R², Nagelkerke R²
  • values >0.1 acceptable, > 0.2 good
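
All three pseudo-R² measures can be computed from the log-likelihood of the constant-only model and that of the full model. A sketch using the standard formulas; the log-likelihood values and the sample size below are hypothetical.

```python
import math

def mcfadden_r2(ll_null, ll_model):
    # 1 minus the ratio of the two log-likelihoods.
    return 1.0 - ll_model / ll_null

def cox_snell_r2(ll_null, ll_model, n):
    # Based on the likelihood ratio; its maximum is below 1.
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    # Cox & Snell rescaled by its maximum, so the measure can reach 1.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell

# Hypothetical log-likelihoods for n = 100 observations.
ll_null, ll_model, n = -60.0, -45.0, 100
print(mcfadden_r2(ll_null, ll_model))
print(cox_snell_r2(ll_null, ll_model, n))
print(nagelkerke_r2(ll_null, ll_model, n))
```

Nagelkerke R² is always at least as large as Cox & Snell R², because it divides by a number below 1.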
8
Q

Ordered Logit

  • ordered logit model
  • DV
A
  • regression model for an ordinal DV

–> extension of the logistic regression model (which applies to a dichotomous DV), allowing for more than two ordered response categories (e.g. Likert scale)

  • DV: variable with at least three attributes that can be ranked –> ordinally scaled
9
Q

Ordered Logit - Model estimation

- y and y*

A
  • y is a recoding of the metric variable y*
  • values of y* are not observable (latent variable)
  • relationship between y and y* is modeled using a threshold model –> set upper and lower bounds (e.g. threshold values for hot, medium, cold)
  • -> if y* <= threshold value O1, the first category will be observed
  • -> if y* > O1, but <= threshold value O2, the second category will be observed, etc.
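
The threshold rule that maps y* to the observed category can be sketched directly. The threshold values and the cold/medium/hot labels are hypothetical; in an actual ordered logit the thresholds are estimated and y* itself is never observed.

```python
import bisect

def observed_category(y_star, thresholds):
    # thresholds must be sorted ascending: [O1, O2, ...].
    # Category 1 if y* <= O1, category 2 if O1 < y* <= O2, and so on.
    return bisect.bisect_left(thresholds, y_star) + 1

# Hypothetical thresholds separating cold / medium / hot.
thresholds = [10.0, 25.0]
labels = {1: "cold", 2: "medium", 3: "hot"}

for y_star in (5.0, 10.0, 18.0, 30.0):
    print(y_star, labels[observed_category(y_star, thresholds)])
```

With k thresholds there are k + 1 ordered categories.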
10
Q

Log-likelihood test

  • assesses…
  • is an indicator of …
  • large values …., because
A
  • assesses the goodness of fit in logistic regression
  • is an indicator of how much unexplained information there is after the model has been fitted
  • large values (of −2LL, as reported by SPSS) indicate poorly fitting models, because the larger the value, the more unexplained observations there are.
  • H0 = all the parameters of the predictors are zero
  • -> if H0 rejected : predictors do have influence
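
A sketch of how the log-likelihood and −2LL follow from observed outcomes and predicted probabilities. The outcomes and the model probabilities are hypothetical; the constant-only model predicts the share of ones for every observation.

```python
import math

def log_likelihood(observed, probabilities):
    # Sum of the log probabilities that the model assigns to what was actually observed.
    return sum(
        y * math.log(p) + (1 - y) * math.log(1.0 - p)
        for y, p in zip(observed, probabilities)
    )

def minus_2ll(observed, probabilities):
    # -2LL: larger values mean more unexplained information.
    return -2.0 * log_likelihood(observed, probabilities)

# Hypothetical outcomes and predicted probabilities.
observed = [1, 0, 1, 1]
null_probs = [0.75] * 4            # constant-only model
model_probs = [0.9, 0.2, 0.8, 0.7]  # model with predictors

null_dev = minus_2ll(observed, null_probs)
model_dev = minus_2ll(observed, model_probs)
# The drop in -2LL is the chi² statistic for H0: all predictor parameters are zero.
print(null_dev, model_dev, null_dev - model_dev)
```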
11
Q

Maximum-Likelihood estimation

  • corresponding to … in linear regression
  • how?
A
  • corresponds to OLS in linear regression as the method to estimate the regression parameters (but does not aim at minimizing the sum of squared residuals)
  • selects coefficients that make the observed values most likely to occur
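
The idea of picking the coefficients that make the observed values most likely can be sketched with a simple gradient ascent on the log-likelihood. This is an illustration only: statistical software typically uses faster iterative schemes, and the data, step size and number of steps are hypothetical choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    # Log probability of the observed outcomes under coefficients b0, b1.
    return sum(
        y * math.log(sigmoid(b0 + b1 * x)) + (1 - y) * math.log(1.0 - sigmoid(b0 + b1 * x))
        for x, y in zip(xs, ys)
    )

def fit_logit(xs, ys, steps=5000, learning_rate=0.1):
    # Start at zero and repeatedly move the coefficients uphill on the log-likelihood.
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1

# Hypothetical data, not perfectly separable, so finite estimates exist.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(xs, ys)
print(b0, b1, log_likelihood(b0, b1, xs, ys))
```

The fitted coefficients give a higher log-likelihood than the starting values, which is all that "most likely" means here.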
12
Q

Chi²-test

  • tests…
  • calculates the fit/total error of a model, how?
  • interpretation
A
  • tests if there is a relationship between two categorical variables (does the number of cats that line-dance relate to the type of training they use?)
  • chi² = sum over all cells ij of [(observed ij - model ij)² / model ij]
  • -> standardizing the deviation for each cell
  • -> adding up all those standardized deviations: chi²
  • chi²: look up the critical value for the df: the result is significant if the value is bigger than the critical value
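
The calculation above can be sketched for a 2x2 table. The counts are hypothetical, and the model frequencies are the ones expected if the two variables were unrelated (row total times column total, divided by the grand total).

```python
def expected_counts(table):
    # Model frequencies under independence of rows and columns.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi2(table):
    # Sum of the standardized squared deviations over all cells.
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for row_o, row_e in zip(table, expected)
        for o, e in zip(row_o, row_e)
    )

# Hypothetical counts: rows = type of training, columns = dances yes / no.
table = [[28, 10], [48, 114]]
chi2 = pearson_chi2(table)
df = (len(table) - 1) * (len(table[0]) - 1)
print(chi2, df)
```

For df = 1 the critical value at the 5% level is 3.84, so a chi² above that value counts as significant.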
13
Q

Likelihood Ratio test

  • alternative to…
  • based on …
  • the resulting statistic is based on …
  • interpretation
A
  • alternative to chi²
  • based on maximum-likelihood theory
  • the resulting statistic is based on comparing observed frequencies with those predicted by the model
  • also has a chi² distribution: look up the critical value for the df: the result is significant if the value is bigger than the critical value
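
For a contingency table, the likelihood ratio statistic is commonly written as 2 times the sum of observed × ln(observed / model) over all cells. A sketch under that formula, with hypothetical counts and model frequencies computed under independence:

```python
import math

def expected_counts(table):
    # Model frequencies under independence of rows and columns.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def likelihood_ratio_chi2(table):
    # 2 * sum of observed * ln(observed / model); empty cells contribute nothing.
    expected = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for row_o, row_e in zip(table, expected)
        for o, e in zip(row_o, row_e)
        if o > 0
    )

# Hypothetical counts: rows = type of training, columns = dances yes / no.
table = [[28, 10], [48, 114]]
print(likelihood_ratio_chi2(table))
```

The result is compared against the same chi² critical value as the Pearson statistic, with df = (rows - 1) × (columns - 1).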
14
Q

Marginal effects

  • how?
  • difference between …
  • Caution!
A
  • set all variables equal to their mean and consider the marginal effect of xi on y
  • difference between the probabilities of Y=1 and Y=0
  • Caution: the marginal effect depends on the considered variable and on the values of the other IVs
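
For a metric IV in a logit model, the marginal effect at a given point is the coefficient times p × (1 − p), where p is the predicted probability at that point. A sketch with hypothetical coefficients, showing why the value depends on the other IVs:

```python
import math

def marginal_effect(b0, coefs, x_values, i):
    # Predicted probability at the chosen point (e.g. all IVs at their mean).
    z = b0 + sum(b * x for b, x in zip(coefs, x_values))
    p = 1.0 / (1.0 + math.exp(-z))
    # Change in P(Y=1) per small change in x_i, holding the other IVs fixed.
    return coefs[i] * p * (1.0 - p)

# Hypothetical coefficients for two IVs.
b0, coefs = 0.0, [1.0, 2.0]
print(marginal_effect(b0, coefs, [0.0, 0.0], 0))  # other IV at 0
print(marginal_effect(b0, coefs, [0.0, 2.0], 0))  # other IV at 2
```

The same coefficient yields a much smaller marginal effect once the second IV pushes the predicted probability towards 1.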
15
Q

Omnibus test

A
  • test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall.
  • F-test