Week 2: Learning Linear & Logistic Regression Flashcards
What is the loss function?
A measure that expresses the difference between the actual outcome y_i and the model prediction (e.g. the regression plane theta^T x_i) for a single observation i.
State the loss function for linear & logistic regression respectively.
Linear regr.: squared error loss,
L(y, y_hat) = (y_hat - y)^2.
Logistic regr.: cross-entropy loss,
L(y, g(x)) = -ln g(x) for y = 1, -ln(1 - g(x)) for y = -1,
where g(x) is the predicted probability p(y = 1 | x).
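A minimal Python sketch of both losses, using the y in {-1, +1} label convention above (not part of the original cards; NumPy and the function names are assumptions for illustration):

```python
import numpy as np

def squared_error_loss(y, y_hat):
    """Squared error loss used in linear regression: (y_hat - y)^2."""
    return (y_hat - y) ** 2

def cross_entropy_loss(y, g_x):
    """Cross-entropy loss for one observation with labels y in {-1, +1};
    g_x is the predicted probability p(y = 1 | x)."""
    return -np.log(g_x) if y == 1 else -np.log(1.0 - g_x)

# A confident correct prediction gives a small loss,
# a confident wrong prediction a large one.
print(cross_entropy_loss(1, 0.9))    # ~0.105
print(cross_entropy_loss(-1, 0.9))   # ~2.303
print(squared_error_loss(2.0, 1.5))  # 0.25
```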
Which loss function do we actually use when training a classifier? Why not the misclassification?
The cross-entropy loss. Instead of formulating the loss only in terms of the hard class prediction y_hat, it also incorporates the predicted class probability g(x).
We do not use misclassification loss because:
1) Using cross-entropy can yield a model that generalizes better from the training data: the hard prediction y_hat does not reveal all aspects of the classifier, whereas using the probabilities effectively pushes the decision boundary further away from the training data points.
2) Misclassification loss would give us a piecewise constant cost function, which is unsuitable for numerical optimization since its gradient is zero everywhere it is defined (illustrated in the sketch below).
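A small illustrative sketch (assumed, not from the cards): for a single positive example, the cross-entropy loss varies smoothly with the predicted probability, while the misclassification loss is flat except for a jump at 0.5, so it provides no useful gradient.

```python
import numpy as np

g = np.linspace(0.01, 0.99, 9)               # predicted p(y = 1 | x)
cross_entropy = -np.log(g)                   # cross-entropy loss for a true y = 1
misclassification = (g < 0.5).astype(float)  # 0/1 loss: wrong iff g(x) < 0.5

for p, ce, mc in zip(g, cross_entropy, misclassification):
    print(f"g(x)={p:.2f}  cross-entropy={ce:.3f}  misclassification={mc:.0f}")
```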
What is the cross-entropy loss function and when is it used?
Commonly used in classification tasks, especially for binary and multiclass classification. It measures the dissimilarity between predicted class probabilities and true class probabilities.
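A hedged sketch of the multiclass case mentioned above (function name and data are illustrative assumptions, not from the cards):

```python
import numpy as np

def multiclass_cross_entropy(p_true, p_pred):
    """Cross-entropy between the true class distribution (often one-hot)
    and the predicted class probabilities."""
    return -np.sum(p_true * np.log(p_pred))

# One-hot true label (class 2 of 3) vs. a softmax-style prediction.
print(multiclass_cross_entropy(np.array([0, 1, 0]),
                               np.array([0.2, 0.7, 0.1])))  # ~0.357
```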
How do we measure closeness in OLS regression?
By the residual sum of squares (RSS): pick values of theta such that RSS is minimized on the training data.
How do we write RSS (not in matrix notation)?
RSS = sum_{i=1}^{n} (y_i - y_hat_i)^2 = sum_{i=1}^{n} e_i^2
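A minimal NumPy sketch of the same sum (the data here are made up for illustration):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])      # observed outputs
y_hat = np.array([1.1, 1.9, 3.2])  # fitted values from the regression
e = y - y_hat                      # residuals e_i
rss = np.sum(e ** 2)               # RSS = sum of squared residuals
print(rss)                         # ~0.06
```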
What do we generally model with both linear and logistic regression?
The conditional expectation of y given x.
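Concretely (with the conventions above): linear regression models E[y | x] = theta^T x, while logistic regression models the class probability p(y = 1 | x) with the logistic function g(x) = 1 / (1 + e^(-theta^T x)); for a 0/1-coded y this probability equals E[y | x].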
What is the perspective on linear regression in ML (compared to classical statistics)?
The emphasis is on learning the function that maps inputs to outputs, rather than on estimating parameters for inference (as in classical statistics).
Formula for squared error loss?
(y_i - theta^T x_i)^2
Dimensions of X and Y in the training sample?
X has dim [n x (p+1)], Y has dim [n x 1].
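A quick sketch of how the design matrix gets its [n x (p+1)] shape once a column of ones is prepended for the intercept (the data here are random placeholders):

```python
import numpy as np

n, p = 5, 3                              # n observations, p input features
X_raw = np.random.randn(n, p)            # raw inputs, shape (n, p)
X = np.hstack([np.ones((n, 1)), X_raw])  # prepend a column of 1s for the intercept
y = np.random.randn(n, 1)                # outputs as a column vector

print(X.shape)  # (5, 4), i.e. [n x (p+1)]
print(y.shape)  # (5, 1), i.e. [n x 1]
```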
Is the default vector a column or a row vector?
Column
Difference between loss and cost functions?
The loss function measures the dissimilarity between the observed output y_i and the p-dimensional regression plane for one observation individually, while the cost function measures the same dissimilarity averaged over all observations in the training sample.
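A sketch of that distinction for the squared-error case (function names are illustrative): the loss is per observation, the cost averages it over the training set.

```python
import numpy as np

def loss(y_i, x_i, theta):
    """Loss: squared error for a single observation i."""
    return (y_i - theta @ x_i) ** 2

def cost(y, X, theta):
    """Cost: the same loss averaged over all n training observations."""
    return np.mean((y - X @ theta) ** 2)

# The cost is simply the mean of the per-observation losses:
# cost(y, X, theta) == np.mean([loss(y[i], X[i], theta) for i in range(len(y))])
```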
Is linear regression parametric or non-parametric?
Parametric
Is logistic regression parametric or non-parametric?
Parametric
At what rate does the loss function grow as the difference between y and the prediction y_hat(x; theta) increases?
Quadratically.
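For example, doubling the residual quadruples the loss, since (2e)^2 = 4e^2.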