Lecture 7A: Logistic Regression Flashcards
What does logistic regression measure?
Measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using the logistic function
When to use Logistic Model
- Independent variables are continuous
- Meets assumptions of linear regression models
- Distribution fits linear model but target class is binary (normal distribution)
What does it mean to train the model?
It means finding the optimal values of the coefficients, such that we get the best predictive performance, or the best separation of Y(1)’s and N(0)’s.
Optimal Coefficients
● The optimal coefficients can be used to predict the outcome for unseen feature values (the x values in the equation)
Predictive Models
● Predictive models are “predictive” and they are expected to have “Errors”
○ Objective is to go as CLOSE as possible to the would be reality
○ Error is the gap between the prediction and the reality
○ Process: Feed the model with labeled data and modify the parameters to minimize the ERROR (the training process)
Feature importance of the model features can be assessed by
○ Multiplying the coefficients by the Standard Deviation
○ Convert the data set to standardized data before getting the coefficients
○ Higher coefficient values indicate larger influence of corresponding features on Outcome (Target Variable)
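The first approach above (multiplying each coefficient by its feature's standard deviation) can be sketched as follows. This is a minimal illustration, not a definitive method: the feature names, data values, and coefficients are hypothetical, and the coefficients are assumed to have been fitted already.

```python
import statistics

# Hypothetical feature columns and already-fitted coefficients (illustrative values only).
features = {
    "age": [20.0, 30.0, 40.0, 50.0],
    "weight_kg": [60.0, 62.0, 64.0, 66.0],
}
coefficients = {"age": 0.05, "weight_kg": 0.20}


def standardized_importance(features, coefficients):
    """Scale each coefficient's magnitude by its feature's standard deviation."""
    return {
        name: abs(coefficients[name]) * statistics.pstdev(values)
        for name, values in features.items()
    }


importance = standardized_importance(features, coefficients)
# Rank features from most to least influential on the scaled values.
ranked = sorted(importance, key=importance.get, reverse=True)
print(ranked)
```

In this made-up data, "age" has the smaller raw coefficient but the larger spread, so it ranks first once the coefficients are scaled. That is why raw coefficients on unstandardized data should not be compared directly.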
Linear regression is similar to logistic regression, except…
Logistic Regression predicts if something is true or false, instead of predicting something continuous, like size…
Instead of fitting a line, like we do in linear regression, in logistic regression, we fit…
an “S”-shaped “logistic function”. The “S” curve goes from zero to one
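The “S” curve can be sketched in a few lines. This is a minimal illustration, assuming the standard logistic (sigmoid) form 1 / (1 + e^(-z)); the function name `sigmoid` is chosen here for illustration.

```python
import math


def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Large negative inputs approach 0, zero maps to 0.5, large positive inputs approach 1.
for z in (-6.0, 0.0, 6.0):
    print(z, round(sigmoid(z), 4))
```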
What is logistic regression usually used for?
It is usually used for classification
Just like linear regression, logistic regression can work with what type of data?
Logistic regression can work with continuous data (like weight and age) and discrete data (like genotype and astrological sign)
Logistic regression does not have the same concept of a “residual” which is used in linear regression, so it can’t use least squares and it can’t calculate R^2, instead it uses…
Maximum Likelihood
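As a rough sketch of what maximum likelihood compares, the snippet below scores two sets of predicted probabilities against the same labels. The labels and probabilities are made up for illustration, and `log_likelihood` is a hypothetical helper name. The fit with the higher (less negative) log-likelihood is the one maximum likelihood would prefer.

```python
import math


def log_likelihood(labels, probabilities):
    """Sum the log-probability the model assigned to each observed label."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        # p is the predicted probability of class 1.
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


labels = [1, 0, 1, 0]
good_fit = [0.9, 0.1, 0.8, 0.2]  # confident and correct
poor_fit = [0.6, 0.5, 0.4, 0.5]  # hesitant, partly wrong

print(log_likelihood(labels, good_fit))
print(log_likelihood(labels, poor_fit))
```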
In summary, logistic regression can be used to
classify samples, and it can use different types of data (like size and/or genotype) to do that classification. It can also be used to assess what variables are useful for classifying samples
How do we find optimal values of the coefficients?
Cost Function, Loss Function, Error Function
Cost Function
Alternate Terms - Loss Function
Gap between “prediction” and “reality” is prediction “…”
“Error”
In Logistic Regression, how do we minimize error?
We feed the model with “labelled data” and continually modify its parameters (or coefficients) to minimize the ERROR (the training process).
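The training process can be sketched with plain gradient descent, which is one of several possible optimization approaches and is not named in these notes. Everything specific here is an assumption for illustration: the single feature, the toy data, the learning rate, the step count, and the use of log loss as the error being minimized.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Average log loss of the model p = sigmoid(w * x + b) on labelled data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(xs)


def train(xs, ys, learning_rate=0.1, steps=1000):
    """Repeatedly nudge the coefficients w and b in the direction that lowers the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction minus label for every training example.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical labelled data: one feature, binary labels, not perfectly separable.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 1]

initial_loss = log_loss(0.0, 0.0, xs, ys)
w, b = train(xs, ys)
final_loss = log_loss(w, b, xs, ys)
print(initial_loss, final_loss)
```

The loss after training is lower than the loss at the starting coefficients, which is the sense in which the model has been “trained”.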
How do we know when we reached the “trained state” where the ERROR is minimal?
It’s a mathematical optimization problem, and there are various approaches.
In linear regression, we try to minimize the Mean Squared Error (MSE).
Instead of using error values directly, we develop a function that will measure the “cost” or “loss” related to the error, and the function is “continuous”.
Make the function “convex” so there is a clear “global minimum”.
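One common choice with these properties is the log loss (binary cross-entropy), shown here as an illustration; the notes do not name a specific cost function. The symbols are assumptions: n labelled examples, labels y_i in {0, 1}, predicted probabilities p_i, and coefficients β_0, β_1 for a single feature x_i.

```latex
J(\beta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right],
\qquad
p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}
```

This function is continuous and convex in the coefficients, so any minimum an optimizer reaches is the global one.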