Week 9 Logistic Regression Flashcards
To provide revision on the topic of Logistic Regression
How is Logistic regression similar to Multiple Regression?
- Logistic regression uses similar procedures:
* Like multiple regression, the prediction equation includes a linear combination of the predictor variables
What does Logistic Regression (AKA Logit Regression) enable researchers to achieve?
Logistic regression allows one to:
- predict a discrete outcome such as group membership from a set of variables that may be continuous, discrete, or a mix
- evaluate the odds (or probability) of membership in one of the groups … based on the combination of values of the predictor variables
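Both points above can be sketched in Python. Everything here is invented for illustration (a single continuous predictor and a dichotomous outcome); the maximum likelihood fit uses Newton's method, the standard algorithm behind logistic regression software:

```python
import numpy as np

# Hypothetical example: predict a dichotomous outcome (0 = fail, 1 = pass)
# from one continuous predictor (hours studied).  Data are invented.
hours = np.array([1., 2., 3., 4., 4.5, 5., 5.5, 6., 7., 8., 9., 10.])
passed = np.array([0., 0., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1.])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(hours), hours])

# Maximum likelihood fit by Newton's method: repeatedly solve the
# weighted least-squares update until the estimates settle.
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))      # current predicted probabilities
    W = p * (1 - p)                      # Bernoulli variance weights
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (passed - p))

# Probability and odds of group membership ("pass") for a new case.
p_new = 1 / (1 + np.exp(-(beta[0] + beta[1] * 7.0)))
odds_new = p_new / (1 - p_new)
print(f"P(pass | 7 hours) = {p_new:.3f}, odds = {odds_new:.2f}")
```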
What is Binomial or Binary Logistic Regression?
Binomial (or Binary) logistic regression is a form of regression used when the single dependent variable is dichotomous, even though the independent variables may be of any type
What are some key terms to consider in Logistic Regression?
- In Binomial Logistic Regression, all IV’s are entered as covariates
- Multinomial Logistic Regression is used when the DV has more than two (unordered) categories
- Ordinal Logistic Regression is used if multiple classes of DV are ranked
- Sequential and Stepwise Logistic Regression may also be used
- Interactions may be used but must be transformed
- at least 95% of cells should have expected frequencies greater than 5
- by default, Binomial Logistic Regression predicts the higher-coded of the 2 categories; the lower-coded category serves as the reference category
Although Binomial logistic regression is relatively free of restrictions, there are some limitations to be aware of. What are these?
- causal inference does not apply in this form of analysis
- the choice of predictor variables must be theoretically justified
- missing values must be dealt with, and the accuracy of data entry checked, prior to analysis
- when a perfect solution is identified through binomial classification (that is, when one group level has completely polarised values compared to the other group level), the maximum likelihood solution will not converge
- extremely high parameter estimates and standard errors are indications that problems exist
- Logistic regression assumes that responses of different cases are independent of each other
- Achieving multivariate normality and linearity may enhance power
What is a Logit Variable?
A Logit Variable is the natural log of the odds of the dependent variable event occurring versus not occurring
When does Logistic regression apply Maximum Likelihood Estimation (MLE) ?
- MLE is applied after transformation of the DV into a logit variable.
- Thus logistic regression estimates the odds of a certain event occurring.
What is the difference between Maximum Likelihood Estimation (MLE) & Ordinary Least Square (OLS) estimation?
Logistic regression calculates changes in the log odds of the dependent variable, whereas OLS regression calculates changes in the dependent variable itself
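A sketch of this contrast on a made-up dichotomous outcome: OLS fits the raw 0/1 values directly (and can predict impossible values), while MLE works in the log odds and always returns a probability:

```python
import numpy as np

# Invented data: one predictor x, dichotomous outcome y.
x = np.array([0., 1., 2., 3., 4., 5., 6., 7.])
y = np.array([0., 0., 0., 1., 0., 1., 1., 1.])
X = np.column_stack([np.ones_like(x), x])

# OLS: least squares on the raw 0/1 dependent variable itself.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# MLE: Newton's method on the Bernoulli log-likelihood of the log odds.
b_mle = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b_mle))
    b_mle += np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (y - p))

# OLS models the DV directly and can predict values above 1;
# the logistic model stays inside the 0 to 1 probability range.
ols_at_9 = b_ols[0] + b_ols[1] * 9
mle_at_9 = 1 / (1 + np.exp(-(b_mle[0] + b_mle[1] * 9)))
print(f"OLS prediction at x = 9: {ols_at_9:.2f}")
print(f"Logistic probability at x = 9: {mle_at_9:.3f}")
```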
Why does Howell (2002) favour Logistic Regression over alternatives to Logistic analysis such as Standard Multiple Regression (SMR) and Discriminant Function Analysis (DFA)?
- With a dichotomous dependent variable, SMR provides a fairly good estimate only when the predicted values fall between 20% and 80%; outside that range it is not a wise choice.
- DFA requires more stringent assumptions to be met and may produce probabilities outside the valid range being investigated, that is 0 to 1.
What visual representation does Howell (2002) favour over a straight line?
- a sigmoidal curve better represents results when using binomial logistic regression.
- when predicting probabilities from a criterion that takes one of two categorical values, the relationship is not always linear.
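A quick numeric sketch of why the sigmoidal curve is the better representation: the logistic transform maps any linear combination of predictors onto the 0 to 1 probability scale, flattening at the extremes rather than continuing in a straight line (the z values below are illustrative):

```python
import numpy as np

# The logistic (sigmoid) transform: probabilities approach 0 and 1
# smoothly instead of following a straight line.
z = np.array([-6., -4., -2., 0., 2., 4., 6.])   # linear combination of predictors
p = 1 / (1 + np.exp(-z))                        # sigmoidal curve
for zi, pi in zip(z, p):
    print(f"z = {zi:5.1f}  ->  P = {pi:.3f}")
```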
What are the 2 steps required to calculate the probabilities?
- express all probabilities in terms of odds and
- then take all odds and transform to log of odds.
NB: This aspect of analysis is sometimes known as a link function within statistics.
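The two steps above (and the inverse of the link function) sketched for a single probability; the value 0.8 is purely illustrative:

```python
import math

p = 0.8                          # probability of the event
odds = p / (1 - p)               # step 1: probability expressed as odds (about 4)
log_odds = math.log(odds)        # step 2: log of the odds (the logit)

# Inverting the link function recovers the original probability.
p_back = math.exp(log_odds) / (1 + math.exp(log_odds))
print(f"odds = {odds:.2f}, log odds = {log_odds:.3f}, back to p = {p_back:.2f}")
```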
What kind of research questions can logistic regression address?
- Can membership in one level of the outcome variable, as opposed to the other level, be predicted (through an odds-ratio evaluation) from a given set of variables?
- Which variables predict which outcome?
- How do variables affect the outcome?
- Does a particular variable increase or decrease the probability of an outcome, or does it have no effect on the outcome?
Assumption requirements of Logistic Regression vary according to the text you read. What do Hair, Black, Babin and Anderson (2011) suggest is an advantage of logistic regression (LR)?
Hair, Black, Babin and Anderson (2011) suggest an advantage of logistic regression (LR) is its lack of restrictive assumptions:
- LR doesn’t require any specific distributional form for the IVs
- homoscedasticity of the IVs isn’t required &
- linear relationships between DV & IV’s aren’t needed.
Assumption requirements of Logistic Regression vary according to the text you read, what would Pallant (2011) suggest?
Pallant (2011) would suggest you check sample size, multicollinearity and deal with any outliers by inspecting scatter plots if you have problems with goodness of fit in your model.
Assumption requirements of Logistic Regression vary according to the text you read, what would Andy Field (2013) recommend?
Field would suggest you check linearity of the relationship with the log of the outcome variable, check for large standard errors and over dispersion (caused by violating the assumption of independence).
Tell me a little more about how Pallant would ensure assumptions are met for Logistic Regression
- Sample size – a small sample and large number of predictors can cause problems with convergence, however, Pallant doesn’t give example values. Hair, Black, Babin and Anderson (2011) suggest each group should have 10 times the number of predictors.
- Multicollinearity – high inter-correlations among predictors should not be present in your sample. Check the Collinearity Statistics in the Coefficients table: low tolerance values indicate high correlations, so check whether these covarying variables are necessary.
- Outliers – these are cases not explained by the model and can be identified through the residuals; they cause problems with the goodness of fit of the model.
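The tolerance check Pallant points to can be sketched directly: regress each predictor on the remaining predictors and take 1 minus the R-squared. The data below are synthetic (names and values invented), with the third predictor deliberately built to nearly duplicate the first two:

```python
import numpy as np

# Synthetic predictors; x3 carries almost no information beyond x1 and x2.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.7 * x1 + 0.7 * x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

def tolerance(X, j):
    """1 - R^2 from regressing column j on the other columns (with intercept)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    return resid.var() / X[:, j].var()    # residual share of variance = 1 - R^2

# Low tolerance values flag multicollinearity.
for j in range(3):
    print(f"tolerance of predictor {j + 1}: {tolerance(X, j):.3f}")
```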
It is necessary to test Goodness of Fit for Logistic Regression Models. How do we do this?
The Hosmer and Lemeshow goodness-of-fit test computes a chi-square statistic from observed and expected frequencies. It evaluates whether the model’s estimates fit the data well. We want this test to be NOT significant, that is p > .05.
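A rough sketch of the procedure on synthetic inputs (in practice the predicted probabilities come from your fitted model; the split into 10 groups and the .05 chi-square critical value of 15.507 on 8 degrees of freedom follow the usual convention):

```python
import numpy as np

# Hosmer-Lemeshow sketch: sort cases by predicted probability, split into
# 10 groups, and compare observed vs expected event counts via chi-square.
rng = np.random.default_rng(2)
p_hat = np.sort(rng.uniform(0.05, 0.95, 200))      # predicted probabilities
y = (rng.uniform(size=200) < p_hat).astype(float)  # outcomes consistent with them

groups = np.array_split(np.arange(200), 10)
chi2 = 0.0
for idx in groups:
    obs1, exp1 = y[idx].sum(), p_hat[idx].sum()    # observed / expected events
    obs0, exp0 = len(idx) - obs1, len(idx) - exp1  # observed / expected non-events
    chi2 += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0

# df = groups - 2 = 8; a non-significant result (chi2 below the critical
# value, p > .05) indicates the model fits the data well.
print(f"Hosmer-Lemeshow chi-square = {chi2:.2f} (want < 15.507 for good fit)")
```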