Chapter 13 Logistic Regression Flashcards
What are the assumptions of logistic regression about the data? (p. 64)
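Binary Output Variable: Logistic regression is intended for binary (two-class) classification problems. It predicts the probability that an instance belongs to the default class, and that probability can be snapped to a 0 or 1 classification.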
Remove Noise: Logistic regression assumes no error in the output variable (y), so consider removing outliers and possibly misclassified instances from your training data.
Gaussian Distribution: Logistic regression is a linear algorithm (with a nonlinear transform on the output). It assumes a linear relationship between the input variables and the log-odds of the output. Data transforms of your input variables that better expose this linear relationship can result in a more accurate model. For example, you can use log, root, Box-Cox, and other univariate transforms for this purpose; several of them also make a skewed input closer to Gaussian.
Remove Correlated Inputs: As with linear regression, the model can overfit if you have multiple highly correlated inputs. Consider calculating the pairwise correlations between all inputs and removing the highly correlated ones.
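A minimal NumPy sketch of two of the preparation steps above, assuming a numeric input matrix `X` with one column per input. The names `log_transform` and `drop_correlated` and the 0.9 correlation cutoff are hypothetical choices for illustration, not values from the source. The greedy "keep the first column of a correlated pair" rule is only one of several reasonable ways to decide which input to drop.

```python
import numpy as np


def log_transform(x):
    """Apply log(1 + x) to a non-negative, right-skewed input column."""
    # log1p keeps zeros finite (log(0) would be -inf); inputs must be > -1
    return np.log1p(x)


def drop_correlated(X, threshold=0.9):
    """Return the indices of the columns of X to keep.

    Columns are scanned left to right; a column is kept only if its absolute
    Pearson correlation with every already-kept column is <= threshold.
    """
    corr = np.corrcoef(X, rowvar=False)  # pairwise correlations between columns
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in keep):
            keep.append(j)
    return keep


# Hypothetical demo data: column 1 is almost an exact multiple of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + rng.normal(scale=0.01, size=200)
c = rng.normal(size=200)
X = np.column_stack([a, b, c])

kept = drop_correlated(X)
print("kept columns:", kept)
print("log1p of skewed values:", log_transform(np.array([0.0, 9.0, 99.0])))
```

Here the near-copy in column 1 is dropped and columns 0 and 2 are kept. In practice the cutoff, and which member of a pair to keep, are judgment calls that depend on the data.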
It is possible for the maximum likelihood estimation process that learns the coefficients of logistic regression to fail to converge. This can happen if ____ or ____.
there are many highly correlated inputs in your data
the data is very sparse (e.g. lots of zeros in your input data)
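One way to see why these two cases cause trouble, assuming a Newton-style solver (an assumption; the source does not name the optimizer): each Newton step solves a linear system with the matrix X^T W X, where W is diagonal with entries p_i(1 - p_i). A perfectly duplicated input and an all-zero input, the degenerate extremes of "highly correlated" and "very sparse", both make that matrix singular. The sketch below uses hypothetical data and a hypothetical helper `information_matrix`, and checks the condition number at beta = 0.

```python
import numpy as np


def information_matrix(X, beta):
    """Return X^T W X, the negative Hessian of the logistic log-likelihood.

    W is diagonal with entries p_i * (1 - p_i), where p_i is the predicted
    probability for row i. Newton-style MLE solvers must invert this matrix.
    """
    p = 1.0 / (1.0 + np.exp(-X @ beta))  # predicted probabilities
    w = p * (1.0 - p)                     # diagonal of W
    return X.T @ (w[:, None] * X)


rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
ones = np.ones(n)  # intercept column

designs = {
    "independent inputs": np.column_stack([ones, x1, x2]),
    "duplicated input": np.column_stack([ones, x1, x1]),         # perfect correlation
    "all-zero input": np.column_stack([ones, x1, np.zeros(n)]),  # extreme sparsity
}

conds = {}
for name, X in designs.items():
    H = information_matrix(X, np.zeros(X.shape[1]))
    conds[name] = np.linalg.cond(H)  # very large or inf => numerically singular
    print(f"{name}: condition number = {conds[name]:.3g}")
```

Only the first design gives a small condition number. Real data is usually only nearly correlated or mostly zeros, which tends to give a large but finite condition number and unstable steps rather than an outright singular matrix.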
What are two reasons the estimation of logistic regression coefficients can fail to converge?
p. 64
Fail to Converge: It is possible for the maximum likelihood estimation process that learns the coefficients to fail to converge. This can happen if there are many highly correlated inputs in your data or if the data is very sparse (e.g. lots of zeros in your input data).
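The failure case above can be made concrete with a small maximum likelihood fitting loop that reports whether it converged. This is a minimal NumPy sketch using Newton-Raphson, not the source's method: `fit_logistic_newton`, `max_iter`, `tol`, and `cond_limit` are hypothetical names and values. Library implementations typically use other optimizers and regularization, which can behave differently on the same data.

```python
import numpy as np


def fit_logistic_newton(X, y, max_iter=25, tol=1e-8, cond_limit=1e10):
    """Fit logistic regression coefficients by maximum likelihood (Newton-Raphson).

    Returns (beta, converged). converged is False if X^T W X becomes
    numerically singular or if max_iter steps pass without the step
    size dropping below tol.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # predicted probabilities
        grad = X.T @ (y - p)                       # gradient of the log-likelihood
        H = X.T @ ((p * (1.0 - p))[:, None] * X)   # X^T W X (negative Hessian)
        if np.linalg.cond(H) > cond_limit:
            return beta, False                     # Newton step is not well defined
        step = np.linalg.solve(H, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta, True
    return beta, False                             # iteration budget exhausted


# Hypothetical data drawn from a known logistic model.
rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x1 - x2)))).astype(float)

beta, converged = fit_logistic_newton(X, y)
print("independent inputs converged:", converged, "beta =", np.round(beta, 2))

# Appending a copy of x1 makes the inputs perfectly correlated.
_, converged_dup = fit_logistic_newton(np.column_stack([X, x1]), y)
print("duplicated input converged:", converged_dup)
```

On the independent inputs the loop converges. Appending a copy of `x1` makes X^T W X singular on the first iteration, so the loop returns `converged = False` instead of taking an undefined step.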