Logistic Regression Flashcards
What’s the equation for mean squared error? (multiple dimensions)
J(θ) = (1/m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)², summed over the m training examples (often written with an extra 1/2 factor so the gradient is cleaner)
What’s the equation for the prediction of logistic regression?
h(x) = sigmoid(θᵀx) = 1 / (1 + e^(−θᵀx))
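A minimal sketch of this prediction in Python (the function names are my own, not from the cards):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """h(x) = sigmoid(theta^T x): the probability that y = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

def predict(theta, x):
    """Class prediction: 1 when h(x) >= 0.5, else 0."""
    return 1 if predict_proba(theta, x) >= 0.5 else 0
```

Note that sigmoid(0) = 0.5, so a point with θᵀx = 0 sits exactly on the decision boundary.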
What does logistic regression output?
It calculates the probability of each class and predicts the class with the highest probability. The prediction is based on the values of a set of independent variables (the features).
What is this?
1 / (1 + e^(−θᵀx))
The output of logistic regression
What are some important characteristics to remember about logistic regression? (2)
- easily interpretable
- gives the probability of an event occurring, not just the predicted classification.
Can you apply linear regression to a classification problem?
Usually it’s a bad idea
What is the output of logistic regression?
A probability between 0 and 1 for each class; the predicted class is the argmax over those probabilities
What is this?
h(x) = θᵀx
The hypothesis of linear regression
What is the hypothesis of logistic regression in:
- words
- equation form
The hypothesis of linear regression fed into the sigmoid function
h(x) = sigmoid(θᵀx) = 1 / (1 + e^(−θᵀx))
What does the graph of logistic regression look like?
Sigmoid function
- What is this?
- How do you interpret it?
h(x) = P(y = 1 | x; θ)
- The probability expression of logistic regression’s output (before the argmax)
- Probability that y=1, given x, parametrized by theta

Do the outputs of logistic regression add up to exactly 1?
Yes. In the binary case, P(y=1|x) + P(y=0|x) = 1 by construction; in the multiclass case, the softmax probabilities sum to 1.
How should you think of the prediction of binary logistic regression?
Predict 1 when h(x) = sigmoid(θᵀx) >= 0.5, which is equivalent to θᵀx >= 0. Otherwise, predict 0.
How can you solve for the line of the decision boundary for binary logistic regression?
Essentially, setting θᵀx (the hypothesis of linear regression) equal to 0 gives the equation for the decision boundary.
Steps:
- Write out θᵀx, plugging in the learned parameter values (including the intercept θ0)
- Set that equal to 0
- Treat x2 as y and x1 as x and solve for the equation of the line
- To see which side predicts 1, plug a test point (e.g., the origin) into θᵀx: the half-space where θᵀx >= 0 predicts 1
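The steps above can be sketched for the 2-feature case (theta = [θ0, θ1, θ2] over features [1, x1, x2]; the function names are mine):

```python
def boundary_line(theta):
    """Solve t0 + t1*x1 + t2*x2 = 0 for x2, giving the line
    x2 = -(t0 + t1*x1) / t2, returned as (slope, intercept)."""
    t0, t1, t2 = theta
    return -t1 / t2, -t0 / t2

def side(theta, x1, x2):
    """Predicted class for a point: 1 on the half-space where theta^T x >= 0."""
    t0, t1, t2 = theta
    return 1 if t0 + t1 * x1 + t2 * x2 >= 0 else 0
```

For example, theta = [−3, 1, 1] gives the boundary x2 = 3 − x1 (slope −1, intercept 3); the origin gives θᵀx = −3 < 0, so the origin’s side predicts 0.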
What is important to remember about the decision boundary of binary logistic regression?
h(x) = 0.5 on the boundary (equivalently, θᵀx = 0)
Can logistic regression take on a nonlinear decision boundary? If so, how?
Yes, by adding higher-order polynomial term features
Can binary logistic regression have a decision boundary that is a circle?
- Yes, if you use higher order polynomial features
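A concrete sketch of a circular boundary, using hypothetical parameters theta = [−1, 0, 0, 1, 1] over the polynomial feature vector [1, x1, x2, x1², x2²]:

```python
def predict_circle(x1, x2):
    """With features [1, x1, x2, x1^2, x2^2] and (hypothetical)
    parameters theta = [-1, 0, 0, 1, 1], theta^T x >= 0 reduces to
    x1^2 + x2^2 >= 1: points on or outside the unit circle predict 1."""
    theta = [-1.0, 0.0, 0.0, 1.0, 1.0]
    features = [1.0, x1, x2, x1 ** 2, x2 ** 2]
    z = sum(t * f for t, f in zip(theta, features))
    return 1 if z >= 0 else 0
```

The decision boundary θᵀx = 0 is exactly the circle x1² + x2² = 1, even though the model is still linear in its parameters.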
For logistic regression, can we use the same cost function that linear regression uses?
No. Plugging the sigmoid (a nonlinear function) into the MSE equation makes the cost function nonconvex, so gradient descent can get stuck in local minima.
- What’s the cost function for logistic regression?
- What does the graph look like?
- What’s the intuition?
Cost(h(x), y) = −log(h(x)) if y = 1, and −log(1 − h(x)) if y = 0
Graph: for y = 1, the cost decreases from infinity at h = 0 to zero at h = 1; for y = 0 it is the horizontal mirror image.
Intuition:
- For y=1, as h approaches 0, the penalty goes to infinity. Same idea for y=0, except the graph of the cost function is flipped horizontally
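The per-example cost can be written directly (a sketch; the function name is mine):

```python
import math

def cost(h, y):
    """Per-example logistic cost: -log(h) when y = 1, -log(1 - h) when y = 0.
    Confident correct predictions cost ~0; confident wrong ones blow up."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)
```

For y = 1, cost(0.99, 1) is near zero while cost(0.01, 1) is large, matching the intuition above.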

What do we know about the cost function for logistic regression? (3)
- It’s derived from the principle of MLE
- It’s convex
- No closed-form solution for logistic regression because of the nonlinearity of the sigmoid
What does learning in logistic regression do? Why?
- We minimize the negative log conditional likelihood.
- We can’t maximize the joint likelihood (as in Naïve Bayes) because logistic regression doesn’t model the joint distribution p(x,y), only the conditional p(y|x)
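Since there is no closed-form solution, learning is done iteratively. A minimal batch gradient descent sketch (names, learning rate, and step count are my own choices), using the fact that the gradient of the negative log conditional likelihood is (1/m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) x⁽ⁱ⁾:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, lr=0.1, steps=1000):
    """Minimize the negative log conditional likelihood by batch
    gradient descent. X: list of feature vectors, y: list of 0/1 labels.
    Gradient for parameter j: (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
            for j in range(n):
                grad[j] += (h - yi) * xi[j]
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta
```

Because the cost is convex, this converges toward the global minimum for any reasonable learning rate.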
What’s the cost function for logistic regression in compact form?
J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(h(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − h(x⁽ⁱ⁾)) ]
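The compact form translates directly to code (a sketch; function names are mine):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_compact(theta, X, y):
    """J(theta) = -(1/m) * sum_i [ y_i*log(h_i) + (1 - y_i)*log(1 - h_i) ],
    where h_i = sigmoid(theta^T x_i)."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m
```

As a sanity check, with theta = 0 every prediction is 0.5, so the cost is −log(0.5) = log 2 regardless of the labels.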
What do we know about the negative average conditional log likelihood for logistic regression?
It’s convex
What’s the softmax function?
softmax(z)ᵢ = e^(zᵢ) / Σⱼ e^(zⱼ); it turns a vector of real-valued scores into a probability distribution
What’s the relation between the softmax function and the sigmoid function?
- The sigmoid function is used for the two-class logistic regression, whereas the softmax function is used for the multiclass logistic regression
- When num classes = 2, the softmax function reduces to the sigmoid function used for binary logistic regression. So in some sense, they’re the same
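The reduction to the sigmoid can be checked directly: with two classes and scores [z, 0], softmax gives e^z / (e^z + 1) = sigmoid(z). A sketch (shifting by the max is a standard numerical-stability trick, not from the cards):

```python
import math

def softmax(z):
    """softmax(z)_i = exp(z_i) / sum_j exp(z_j).
    Subtracting max(z) before exponentiating avoids overflow
    without changing the result."""
    mx = max(z)
    exps = [math.exp(v - mx) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
```

So for any z, softmax([z, 0])[0] equals sigmoid(z), which is the sense in which the two functions are the same.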
What happens if theta^T * x = 0 during training for binary logistic regression?
sigmoid(0) = 0.5, so the example sits exactly on the decision boundary and both classes get probability 0.5. The cost for that example is −log(0.5), and the gradient (h − y)x is generally nonzero, so the parameters still get updated.