Week 9 Logistic Regression Flashcards
To provide revision on the topic of Logistic Regression
How is Logistic regression similar to Multiple Regression?
- Logistic regression uses similar procedures:
* Like multiple regression, the prediction equation includes a linear combination of the predictor variables
What does Logistic Regression (AKA Logit Regression) enable researchers to achieve?
Logistic regression allows one to:
- predict a discrete outcome such as group membership from a set of variables that may be continuous, discrete, or a mix
- evaluate the odds (or probability) of membership in one of the groups … based on the combination of values of the predictor variables
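Both points above can be sketched in Python. Everything here is invented for illustration (a single continuous predictor and a dichotomous outcome); the maximum likelihood fit uses Newton's method, the standard algorithm behind logistic regression software:

```python
import numpy as np

# Hypothetical example: predict a dichotomous outcome (0 = fail, 1 = pass)
# from one continuous predictor (hours studied).  Data are invented.
hours = np.array([1., 2., 3., 4., 4.5, 5., 5.5, 6., 7., 8., 9., 10.])
passed = np.array([0., 0., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1.])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(hours), hours])

# Maximum likelihood fit by Newton's method: repeatedly solve the
# weighted least-squares update until the estimates settle.
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))      # current predicted probabilities
    W = p * (1 - p)                      # Bernoulli variance weights
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (passed - p))

# Probability and odds of group membership ("pass") for a new case.
p_new = 1 / (1 + np.exp(-(beta[0] + beta[1] * 7.0)))
odds_new = p_new / (1 - p_new)
print(f"P(pass | 7 hours) = {p_new:.3f}, odds = {odds_new:.2f}")
```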
What is Binomial or Binary Logistic Regression?
Binomial (or Binary) logistic regression is a form of regression used when the single dependent variable is dichotomous, even though the independent variables may be of any type
What are some key terms to consider in Logistic Regression?
- In Binomial Logistic Regression, all IV’s are entered as covariates
- Multinomial Logistic Regression is used when the DV has more than two (unordered) categories
- Ordinal Logistic Regression is used if multiple classes of DV are ranked
- Sequential and Stepwise Logistic Regression may also be used
- Interactions may be used but must be transformed
- at least 95% of cells should have expected frequencies greater than 5
- by default, Binomial Logistic Regression predicts the higher-coded of the 2 categories; the lower-coded category serves as the reference category
Although Binomial logistic regression is relatively free of restrictions, there are some limitations to be aware of. What are these?
- causal inference does not apply in this form of analysis
- the choice of predictor variables must be theoretically justified
- missing values must be dealt with, and the accuracy of data entry checked, prior to analysis
- when a perfect solution is identified through binomial classification (that is, when one group level has completely polarised values compared to the other group level), the maximum likelihood solution will not converge
- extremely high parameter estimates and standard errors are indications that problems exist
- Logistic regression assumes that responses of different cases are independent of each other
- Achieving multivariate normality and linearity may enhance power
What is a Logit Variable?
A Logit Variable is the natural log of the odds of the dependent variable event occurring versus not occurring
When does Logistic regression apply Maximum Likelihood Estimation (MLE) ?
- MLE is applied after transformation of the DV into a logit variable.
- Thus logistic regression estimates the odds of a certain event occurring.
What is the difference between Maximum Likelihood Estimation (MLE) & Ordinary Least Square (OLS) estimation?
Logistic regression calculates changes in the log odds of the dependent variable, whereas OLS regression calculates changes in the dependent variable itself
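A sketch of this contrast on a made-up dichotomous outcome: OLS fits the raw 0/1 values directly (and can predict impossible values), while MLE works in the log odds and always returns a probability:

```python
import numpy as np

# Invented data: one predictor x, dichotomous outcome y.
x = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
y = np.array([0., 0., 0., 1., 0., 1., 1., 1.])
X = np.column_stack([np.ones_like(x), x])

# OLS: least squares on the raw 0/1 dependent variable itself.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# MLE: Newton's method on the Bernoulli log-likelihood of the log odds.
b_mle = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b_mle))
    b_mle += np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (y - p))

# OLS models the DV directly and can predict values above 1;
# the logistic model stays inside the 0 to 1 probability range.
ols_at_9 = b_ols[0] + b_ols[1] * 9
mle_at_9 = 1 / (1 + np.exp(-(b_mle[0] + b_mle[1] * 9)))
print(f"OLS prediction at x = 9: {ols_at_9:.2f}")
print(f"Logistic probability at x = 9: {mle_at_9:.3f}")
```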
Why does Howell (2002) favour Logistic Regression over alternatives to Logistic analysis such as Standard Multiple Regression (SMR) and Discriminant Function Analysis (DFA)?
- With a dichotomous dependent variable, SMR provides a fairly good estimate only when the predicted values fall between 20% and 80%; outside that range it is not a wise choice.
- DFA requires more stringent assumptions to be met and may produce probabilities outside the valid range being investigated, that is 0 to 1.
What visual representation does Howell (2002) favour over a straight line?
- a sigmoidal curve better represents results when using binomial logistic regression.
- when predicting probabilities from a criterion that takes one of two categorical values, the relationship is not always linear.
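A quick numeric sketch of why the sigmoidal curve is the better representation: the logistic transform maps any linear combination of predictors onto the 0 to 1 probability scale, flattening at the extremes rather than continuing in a straight line (the z values below are illustrative):

```python
import numpy as np

# The logistic (sigmoid) transform: probabilities approach 0 and 1
# smoothly instead of following a straight line.
z = np.array([-6., -4., -2., 0., 2., 4., 6.])   # linear combination of predictors
p = 1 / (1 + np.exp(-z))                        # sigmoidal curve
for zi, pi in zip(z, p):
    print(f"z = {zi:5.1f}  ->  P = {pi:.3f}")
```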
What are the 2 steps required to calculate the probabilities?
- express all probabilities in terms of odds and
- then take all odds and transform to log of odds.
NB: This aspect of analysis is sometimes known as a link function within statistics.
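The two steps above (and the inverse of the link function) sketched for a single probability; the value 0.8 is purely illustrative:

```python
import math

p = 0.8                          # probability of the event
odds = p / (1 - p)               # step 1: probability expressed as odds (about 4)
log_odds = math.log(odds)        # step 2: log of the odds (the logit)

# Inverting the link function recovers the original probability.
p_back = math.exp(log_odds) / (1 + math.exp(log_odds))
print(f"odds = {odds:.2f}, log odds = {log_odds:.3f}, back to p = {p_back:.2f}")
```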
What kind of research questions can logistic regression address?
- Can membership in one level of the outcome variable, as opposed to the other level, be predicted (through an odds-ratio evaluation) from a given set of variables?
- Which variables predict which outcome?
- How do variables affect the outcome?
- Does a particular variable increase or decrease the probability of an outcome, or does it have no effect on the outcome?
Assumption requirements of Logistic Regression vary according to the text you read. What do Hair, Black, Babin and Anderson (2011) suggest is an advantage of logistic regression (LR)?
Hair, Black, Babin and Anderson (2011) suggest an advantage of logistic regression (LR) is its lack of restrictive assumptions:
- LR doesn’t require any specific distributional form for the IVs
- homoscedasticity of the IVs isn’t required &
- linear relationships between DV & IV’s aren’t needed.
Assumption requirements of Logistic Regression vary according to the text you read, what would Pallant (2011) suggest?
Pallant (2011) would suggest you check sample size, multicollinearity and deal with any outliers by inspecting scatter plots if you have problems with goodness of fit in your model.
Assumption requirements of Logistic Regression vary according to the text you read, what would Andy Field (2013) recommend?
Field would suggest you check linearity of the relationship with the log of the outcome variable, check for large standard errors and over dispersion (caused by violating the assumption of independence).
Tell me a little more about how Pallant would ensure assumptions are met for Logistic Regression
- Sample size – a small sample and large number of predictors can cause problems with convergence, however, Pallant doesn’t give example values. Hair, Black, Babin and Anderson (2011) suggest each group should have 10 times the number of predictors.
- Multicollinearity – high inter-correlations among predictors should not be present in your sample. Check the Collinearity Statistics in the Coefficients table: low tolerance values indicate high correlations, so check whether these covarying variables are necessary.
- Outliers – these are cases not explained by the model and can be identified through the residuals; they cause problems with the goodness of fit of the model.
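The tolerance check Pallant points to can be sketched directly: regress each predictor on the remaining predictors and take 1 minus the R-squared. The data below are synthetic (names and values invented), with the third predictor deliberately built to nearly duplicate the first two:

```python
import numpy as np

# Synthetic predictors; x3 carries almost no information beyond x1 and x2.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.7 * x1 + 0.7 * x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

def tolerance(X, j):
    """1 - R^2 from regressing column j on the other columns (with intercept)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    return resid.var() / X[:, j].var()    # residual share of variance = 1 - R^2

# Low tolerance values flag multicollinearity.
for j in range(3):
    print(f"tolerance of predictor {j + 1}: {tolerance(X, j):.3f}")
```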
It is necessary to test Goodness of Fit for Logistic Regression Models. How do we do this?
The Hosmer and Lemeshow goodness-of-fit test computes a chi-square statistic from observed and expected frequencies. It evaluates whether the model’s estimates fit the data well. We want this test to be NOT significant, that is p > .05.
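A rough sketch of the procedure on synthetic inputs (in practice the predicted probabilities come from your fitted model; the split into 10 groups and the .05 chi-square critical value of 15.507 on 8 degrees of freedom follow the usual convention):

```python
import numpy as np

# Hosmer-Lemeshow sketch: sort cases by predicted probability, split into
# 10 groups, and compare observed vs expected event counts via chi-square.
rng = np.random.default_rng(2)
p_hat = np.sort(rng.uniform(0.05, 0.95, 200))      # predicted probabilities
y = (rng.uniform(size=200) < p_hat).astype(float)  # outcomes consistent with them

groups = np.array_split(np.arange(200), 10)
chi2 = 0.0
for idx in groups:
    obs1, exp1 = y[idx].sum(), p_hat[idx].sum()    # observed / expected events
    obs0, exp0 = len(idx) - obs1, len(idx) - exp1  # observed / expected non-events
    chi2 += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0

# df = groups - 2 = 8; a non-significant result (chi2 below the critical
# value, p > .05) indicates the model fits the data well.
print(f"Hosmer-Lemeshow chi-square = {chi2:.2f} (want < 15.507 for good fit)")
```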