M13 - Tutorial Logistic Regression Flashcards
Logistic Regression
- basic idea
- disadv of linear model
- problems in practice: estimation
- regression of parameters
- DV is discrete
- -> binary problems
–> you only have 0 and 1 as values of the DV
- a linear model can predict probabilities less than 0 or greater than 1
–> linear regr is only valid if the variables have a linear relationship: categorical variables don't have this
–> Logit regr expresses the multiple regr equation in log terms –> overcomes the problem of linearity
P(Y) takes on values between 0 and 1: (a value close to 0 means that Y is very unlikely to have occurred, close to 1 means that Y is very likely to have occurred)
- log regr tries to determine the probability of occurrence of a certain event using a regression approach considering different influencing variables
- regr parameters cannot be interpreted as known from linear regression
- estimation through maximum likelihood
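A minimal sketch of the basic idea, assuming the usual logistic (sigmoid) link. The function names and the example coefficients are illustrative, not taken from the tutorial.

```python
import math

def predicted_probability(intercept, coefs, xs):
    """Turn the linear predictor b0 + b1*x1 + ... into P(Y=1)."""
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    # The logistic function squeezes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds: the scale on which the model is linear in the IVs."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients: intercept -1.0, one IV with b = 0.5.
p = predicted_probability(-1.0, [0.5], [3.0])
print(p, logit(p))  # logit(p) recovers the linear predictor
```

Unlike a linear model, the output can never fall below 0 or rise above 1, whatever the value of the IV.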
Classification table - block 0
- what is included?
- cut value:
- hit rate:
- block 0 only includes the constant in the model
- it does not include explanatory variables
- -> predictions are only based on which category appears most often in the dataset
- Cut value: indicates the point from which on the estimated probability of an observation is assigned to Y=1
- hit rate: how often is this prediction right?
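A small sketch of the Block 0 logic, assuming a 0/1 outcome list. The helper names, the example data and the choice of `>=` at the cut value are assumptions for illustration.

```python
from collections import Counter

def block0_hit_rate(y):
    """Constant-only model: always predict the most frequent category."""
    majority_class, majority_count = Counter(y).most_common(1)[0]
    return majority_class, majority_count / len(y)

def classify(probabilities, cut_value=0.5):
    """Assign Y=1 once the estimated probability reaches the cut value."""
    return [1 if p >= cut_value else 0 for p in probabilities]

# Made-up outcomes: three events, one non-event.
print(block0_hit_rate([1, 1, 1, 0]))
print(classify([0.2, 0.7]))
```

The Block 0 hit rate is the baseline that the Block 1 classification table has to beat.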
Classification table - Block 1
- what is included?
- Block 1 comprises all IV
Wald test
- tests…
- adv over LR
- adv over t-test
- disadv
–> tests how far the estimated parameters are from zero, measured in SE
- adv over LR: only requires estimating one model
- adv over t-test: can test multiple parameters simultaneously
- disadv: not standardized
= the Wald test approximates the LR test
- tests the H0: bj = 0.
- -> if H0 cannot be rejected: removing the variable from the model will not substantially harm the fit of the model, since a predictor with a coeff that is very small relative to its SE is generally not doing much to help predict the DV.
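A sketch of the Wald statistic for a single coefficient, assuming the common form Wald = (b / SE)² compared against a chi² distribution with 1 df. The function name and the input values are illustrative.

```python
import math

def wald_test(b, se):
    """Wald statistic for H0: b = 0, with its chi²(1 df) p-value."""
    z = b / se                       # distance from zero in SE units
    wald = z ** 2
    # For 1 df, P(chi² > wald) equals the two-sided normal tail of |z|.
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

# Hypothetical coefficient and standard error.
wald, p = wald_test(1.96, 1.0)
print(wald, p)
```

A coefficient that is small relative to its SE gives a small Wald value and a large p-value, so H0 cannot be rejected.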
Interpretation of Metric Variables
- odds
- odds ratio
- =1, =2, =0.2
- label in SPSS
- odds: the probability of an event occurring relative to the probability of the event not occurring
- odds ratio: “effect size”: how much do the odds (of the event occurring) increase/decrease when there is a unit change in the associated IV (OR = odds after 1-unit change / original odds)
–> if > 1, then as the IV increases, the odds of the outcome occurring increase
–> if < 1, then as the predictor increases, the odds of the outcome occurring decrease
“the higher blabla, the lower the probability of blabla to occur”
–> only marginal effects can tell by how many percentage points the probability changes
- SPSS: odds ratio : Exp(B)
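A sketch of how odds and the odds ratio relate to a logit coefficient, assuming the standard result that a one-unit change in an IV multiplies the odds by exp(b). Function names are illustrative.

```python
import math

def odds(p):
    """Odds = P(event) / P(no event)."""
    return p / (1.0 - p)

def odds_ratio(b):
    """Odds ratio for a one-unit change in the IV: exp(b)."""
    return math.exp(b)

def percent_change_in_odds(odds_ratio_value):
    """Translate an odds ratio into a percentage change of the odds."""
    return (odds_ratio_value - 1.0) * 100.0

for or_value in (1.0, 2.0, 0.2):
    print(or_value, percent_change_in_odds(or_value))
```

An odds ratio of 1 leaves the odds unchanged, 2 doubles them, and 0.2 cuts them by 80%. These are changes in odds, not in probability.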
z-test and t-test
difference?
t-test:
- ONE: compare sample mean with pop mean
- TWO: compare two independent samples
- N < 30
- SD unknown
- Student's t distribution
--> does the predictor have explanatory value?
z-test: compare sample mean with pop mean
- N > 30
- SD known
- normally distributed
Pseudo-R²-Measure
- Problem for logit
- tries to …
- types
- values
- for the logit there is no measure exactly matching the R² of OLS
- tries to quantify the fraction of variance explained by the logistic regression model: how well does the logistic model fit the data?
- McFadden R², Cox & Snell R², Nagelkerke R²
- values >0.1 acceptable, > 0.2 good
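A sketch of the three measures, assuming their common textbook definitions in terms of the log-likelihood of the constant-only model (`ll_null`) and of the full model (`ll_model`). The argument names and example numbers are made up.

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Return McFadden, Cox & Snell and Nagelkerke pseudo-R²."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Cox & Snell cannot reach 1; Nagelkerke rescales it by its maximum.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

# Hypothetical log-likelihoods for n = 200 observations.
print(pseudo_r2(-100.0, -80.0, 200))
```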
Ordered Logit
- ordered logit model
- DV
- regr model for ordinal DV
–> extension of the logistic regr model for dichotomous DVs that allows for more than two ordered response categories (e.g. Likert scale)
- DV: variable with at least three attributes that can be ranked –> ordinally scaled
Ordered Logit - Model estimation
- y and y*
- y is a recoding of the metric variable y*
- values of y* are not observable; only the category it falls into relative to the threshold values is observed
- relationship between y and y* is modeled using a threshold model –> set upper and lower bounds (e.g. values for hot, medium, cold)
- -> if y* <= threshold value O1, the first category will be observed
- -> if y* > O1 but <= threshold value O2, the second category will be observed, etc.
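A sketch of the threshold rule, assuming sorted thresholds O1 < O2 < ... and categories numbered from 1. The function name and threshold values are illustrative.

```python
from bisect import bisect_left

def observed_category(y_star, thresholds):
    """Map the latent y* to the observed ordinal category (1-based).

    y* <= O1 gives category 1, O1 < y* <= O2 gives category 2, and so on.
    """
    # bisect_left counts the thresholds that lie strictly below y*.
    return bisect_left(thresholds, y_star) + 1

# Hypothetical thresholds for cold / medium / hot.
thresholds = [-1.0, 1.0]
for y_star in (-1.0, 0.0, 1.0, 1.5):
    print(y_star, observed_category(y_star, thresholds))
```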
Log-likelihood test
- assesses…
- is an indicator of …
- large values …., because
- assesses the goodness of fit in logistic regression
- is an indicator of how much unexplained information there is after the model has been fitted
- large values of the statistic (reported as -2LL) indicate poorly fitting models, because the larger the value, the more unexplained observations there are.
- H0 = all the parameters of the predictors are zero
- -> if H0 rejected : predictors do have influence
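A sketch of how the test statistic can be formed, assuming it is the drop in -2LL between the constant-only model and the model with predictors. The function names and the log-likelihood values are made up.

```python
def minus_two_ll(log_likelihood):
    """-2LL: larger values mean more unexplained information."""
    return -2.0 * log_likelihood

def model_chi_square(ll_null, ll_model):
    """Improvement in fit when the predictors are added.

    Compared against a chi² distribution with df = number of predictors.
    """
    return minus_two_ll(ll_null) - minus_two_ll(ll_model)

# Hypothetical log-likelihoods of the two models.
print(model_chi_square(-100.0, -80.0))
```

If the value exceeds the critical chi² value for the df, H0 (all predictor parameters are zero) is rejected.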
Maximum-Likelihood estimation
- corresponding to … in linear regression
- how?
- corresponds to OLS in linear regression: estimates the regression parameters (but does not aim at minimizing the sum of squared residuals)
- selects coefficients that make the observed values most likely to occur
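A sketch of the quantity that maximum likelihood maximizes, assuming a logit model with one IV. The data and the two candidate coefficient pairs are invented for illustration; a real estimation would search for the best pair iteratively.

```python
import math

def log_likelihood(intercept, slope, xs, ys):
    """Log-likelihood of 0/1 outcomes under a one-predictor logit model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
        # Each observation contributes log P(observed outcome).
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

xs = [0, 1, 2, 3]
ys = [0, 0, 1, 1]
print(log_likelihood(0.0, 0.0, xs, ys))   # coefficients that ignore x
print(log_likelihood(-1.5, 1.0, xs, ys))  # coefficients that follow the data
```

The second pair makes the observed values more likely, so it has the higher log-likelihood and would be preferred.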
Chi²-test
- tests…
- calculates the fit/total error of a model, how?
- interpretation
- tests if there is a relationship between two categorical variables (does the number of cats that line-dance relate to the type of training they use?)
- chi² = Σ [(observed ij - model ij)² / model ij]
- -> standardizing the deviation for each observation
- -> adding up all those standardized deviations: chi²
- chi² : look up critical values for the df: it is significant if the value is bigger than the critical value
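A sketch of the calculation for a contingency table, assuming the model values are the usual expected counts (row total × column total / N). The function name and the 2×2 counts are made up.

```python
def pearson_chi_square(table):
    """Pearson chi² and df for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            model = row_totals[i] * col_totals[j] / n
            # Squared deviation, standardized by the model value.
            chi2 += (observed - model) ** 2 / model
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical counts: training type (rows) by dance / no dance (columns).
print(pearson_chi_square([[10, 20], [20, 10]]))
```

With 1 df the critical value at the 5% level is 3.84, so a larger chi² would count as significant.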
Likelihood Ratio test
- alternative to…
- based on …
- the resulting statistic is based on …
- interpretation
- alternative to chi²
- based on maximum-likelihood theory
- based on comparing observed frequencies with those predicted by the model
- also has a chi² distribution: look up critical values for the df: it is significant if the value is bigger than the critical value
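A sketch of the likelihood ratio statistic for a contingency table, assuming the common form 2 · Σ observed · ln(observed / model) with the same expected counts as in the Pearson chi². The function name and the counts are made up.

```python
import math

def likelihood_ratio_chi_square(table):
    """Likelihood ratio statistic and df for a table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # an empty cell contributes nothing to the sum
            model = row_totals[i] * col_totals[j] / n
            total += observed * math.log(observed / model)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return 2.0 * total, df

# Hypothetical 2x2 counts.
print(likelihood_ratio_chi_square([[10, 20], [20, 10]]))
```

In this example the result is close to the Pearson chi² for the same table; the two statistics tend to agree in larger samples.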
Marginal effects
- how?
- difference between …
- Watch out!
- set all variables equal to the mean and consider the marginal effect of xi on y
- difference between the probabilities of Y=1 and Y=0
- Watch out: the marginal effect depends on the considered variable and on the values of the other IVs
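A sketch of the marginal effect at the mean for a metric IV, assuming the standard logit result dP/dx_i = b_i · p · (1 − p) with p evaluated at the means of all IVs. The function name and the numbers are illustrative.

```python
import math

def marginal_effect_at_means(intercept, coefs, means, index):
    """Change in P(Y=1) per unit of IV `index`, with all IVs at their means."""
    z = intercept + sum(b * m for b, m in zip(coefs, means))
    p = 1.0 / (1.0 + math.exp(-z))
    # The slope of the logistic curve at p, scaled by the coefficient.
    return coefs[index] * p * (1.0 - p)

# Hypothetical model with one IV (b = 2.0), evaluated at two different means.
print(marginal_effect_at_means(0.0, [2.0], [0.0], 0))
print(marginal_effect_at_means(0.0, [2.0], [3.0], 0))
```

The same coefficient gives a different marginal effect at a different point, which is why the values of the other IVs matter.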
Omnibus test
- test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall.
- F-test