Logistic Regression Flashcards
Limits of Linear Regression
- Assumes the response variable is normally distributed (given the predictors)
In logistic regression, the coefficients are estimated
using a technique called _______________
Maximum Likelihood Estimation (MLE)
Unlike the _________________ method
used by linear regression, MLE for logistic regression has no closed-form
solution for the coefficients. Instead, the process is _________.
Ordinary Least Squares (OLS)
Iterative
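To make "iterative" concrete, here is a minimal sketch of one common way to maximize the logistic log-likelihood: Newton's method. The function name `fit_logistic_newton`, the simulated data, and the "true" coefficients are all assumptions made for illustration. Library implementations may use different optimizers.

```python
import numpy as np

def fit_logistic_newton(X, y, n_iter=25, tol=1e-8):
    """Estimate logistic regression coefficients by Newton's method on the log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    beta = np.zeros(X.shape[1])                 # start from all-zero coefficients
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # current predicted probabilities
        gradient = X.T @ (y - p)                # gradient of the log-likelihood
        weights = p * (1.0 - p)
        hessian = -(X * weights[:, None]).T @ X # Hessian of the log-likelihood
        step = np.linalg.solve(hessian, gradient)
        beta = beta - step                      # Newton update
        if np.max(np.abs(step)) < tol:          # stop once the update is negligible
            break
    return beta

# Simulated data (hypothetical): intercept -0.5, slope 1.2
rng = np.random.default_rng(0)
x = rng.normal(size=500)
true_beta = np.array([-0.5, 1.2])
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = rng.binomial(1, prob)

beta_hat = fit_logistic_newton(x.reshape(-1, 1), y)
print(beta_hat)
```

Each pass updates the coefficients from the current gradient and Hessian. No single formula gives the answer in one step, which is the contrast with OLS.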
The ___________________________ is an extension of
linear regression that allows for linear predictors to be
related to a response variable that is not normally
distributed by using a transformation or link function
Generalized Linear Model (GLM)
The link function used for binomial logistic regression is called the _________
Logit / Log-odds function
log(p / (1 - p)), where p is a probability
Maps probabilities (0, 1) to (-inf, +inf)
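The logit and its inverse can be sketched in a few lines. The function names `logit` and `inverse_logit` are chosen here for illustration, and the inverse is written in a numerically stable form as a design choice rather than a requirement.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a log-odds value back to a probability (the sigmoid function)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)            # avoids overflow for very negative z
    return ez / (1.0 + ez)

for p in (0.1, 0.5, 0.9):
    print(p, logit(p), inverse_logit(logit(p)))
```

Probabilities below 0.5 map to negative log-odds, 0.5 maps to zero, and probabilities above 0.5 map to positive log-odds.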
In logistic regression, for every one-unit increase in
tumor size, the odds of the tumor being malignant
are multiplied by a factor of ___
e^Beta, where Beta is the coefficient for tumor size
If Beta < 0, the odds that the tumor is malignant _______ as tumor size increases. If Beta > 0, the odds that the tumor is malignant ________ as tumor size increases.
decrease
increase
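A small numeric check of the e^Beta rule, using hypothetical values (intercept -3.0, Beta 0.7, and tumor sizes 3 and 4 are made up for illustration):

```python
import math

def malignant_probability(intercept, beta, tumor_size):
    """Predicted probability from a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta * tumor_size)))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

intercept, beta = -3.0, 0.7   # hypothetical coefficients

odds_at_3 = odds(malignant_probability(intercept, beta, 3.0))
odds_at_4 = odds(malignant_probability(intercept, beta, 4.0))
odds_ratio = odds_at_4 / odds_at_3

print(odds_ratio, math.exp(beta))   # the two values agree up to rounding
```

With a negative Beta the same ratio falls below 1, so the odds shrink with each unit increase.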
An estimate of the relative information lost by a given
model: the less information a model loses, the higher
the quality of the model
Akaike Information Criterion (AIC)
AIC = 2k - 2 ln(L)
where L is the maximum value of the likelihood function for the model
and k is the number of estimated parameters of the model
Given a set of candidate models for the data, the preferred model is the one with the minimum AIC value. Thus, AIC rewards goodness of fit (as assessed by the likelihood function), but it also includes a penalty that is an increasing function of the number of estimated parameters. The penalty discourages overfitting, which is desirable because increasing the number of parameters in the model almost always improves the goodness of fit.
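As a sketch of how the comparison works, the snippet below computes AIC for two candidate models. The model names, log-likelihoods, and parameter counts are invented for illustration and do not come from real data.

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), where log_likelihood is ln(L) at its maximum."""
    return 2 * k - 2 * log_likelihood

# Hypothetical candidates: (maximized log-likelihood, number of estimated parameters)
candidates = {
    "A": (-120.5, 3),
    "B": (-119.8, 6),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)   # the preferred model has the minimum AIC

print(scores, best)
```

Model B fits slightly better (higher log-likelihood), but its three extra parameters cost more than the fit gains, so model A has the lower AIC.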
Strengths of Logistic Regression?
- Outputs have a nice probabilistic interpretation.
- Can be regularized to avoid overfitting.
- Easy to implement and use.
- Very efficient to train.
Weaknesses of Logistic Regression?
- Makes strong assumptions about the data.
- Does not do well with missing data.
- Tends to underperform when there are multiple or non-linear decision boundaries.
- Does not naturally capture complex relationships.