wk7 Flashcards
What is the cross entropy loss function
a measure of the difference / dissimilarity between two distributions, the target and the predicted output:
-E(w) = -Sum over i: y^(i) ln(p_1(x^(i); w)) + (1 - y^(i)) ln(1 - p_1(x^(i); w))
-It can be further simplified to: E(w) = -Sum over i: y^(i) w^T x^(i) - ln(1 + exp(w^T x^(i))), as checked numerically below
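A quick numerical sketch (the data, weights, and variable names are assumed for illustration) checking that the two forms of the loss agree:

```python
import numpy as np

# Assumed synthetic data: 5 examples, 3 features, binary targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=5)
w = rng.normal(size=3)

z = X @ w                          # w^T x^(i) for each example
p1 = 1.0 / (1.0 + np.exp(-z))      # sigmoid: p_1(x^(i); w)

# Form 1: E(w) = -sum_i [ y ln p_1 + (1 - y) ln(1 - p_1) ]
loss1 = -np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))

# Form 2: E(w) = -sum_i [ y * w^T x - ln(1 + exp(w^T x)) ]
loss2 = -np.sum(y * z - np.log(1.0 + np.exp(z)))

print(np.isclose(loss1, loss2))    # True
```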
What is the derivative of the cross entropy loss function
Sum over i: (p_1(x^(i); w) - y^(i)) x^(i)
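A sketch (assumed synthetic data and names) comparing this gradient against a finite-difference estimate of the simplified loss:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
y = rng.integers(0, 2, size=5).astype(float)
w = rng.normal(size=3)

def loss(w):
    # E(w) = -sum_i [ y * w^T x - ln(1 + exp(w^T x)) ]
    z = X @ w
    return -np.sum(y * z - np.log1p(np.exp(z)))

p1 = 1.0 / (1.0 + np.exp(-(X @ w)))
grad = X.T @ (p1 - y)              # sum_i (p_1(x^(i); w) - y^(i)) x^(i)

# Central finite-difference check along each coordinate.
eps = 1e-6
num = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(grad, num))      # True
```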
What are the two problems with using simple gradient descent for logistic regression
1) it can get stuck in local minima (for non-convex losses; the logistic cross entropy itself is convex)
2) if the regression is two-dimensional and one dimension is of a much higher magnitude, gradient descent overshoots and zig-zags along the steep direction, making convergence inefficient, see image and the sketch below
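A toy sketch on an assumed quadratic loss 0.5 * sum(h * w^2): the steep direction forces a tiny step size, leaving the flat direction to crawl:

```python
import numpy as np

def gd(h, lr, steps=100):
    # Gradient descent on 0.5 * sum(h * w**2); gradient is h * w.
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w -= lr * h * w
    return w

# Well scaled: both dimensions converge quickly.
print(gd(np.array([1.0, 1.0]), lr=0.5))      # ~[0, 0]
# Badly scaled: lr must stay below 2/100 to avoid divergence in the steep
# direction, so the flat direction barely moves after 100 steps.
print(gd(np.array([100.0, 1.0]), lr=0.019))  # steep dim ~0, flat dim ~0.15
```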
What is the solution to the problems of simple gradient descent (overshooting and bouncing even in convex functions)
Use the Hessian matrix, the matrix of second-order partial derivatives with respect to each pair of the function's variables. This tells us how rapidly the gradient changes (the local curvature) and lets us adapt the step size in each direction
-We use the Newton-Raphson method for estimating the weights but substitute in the Hessian matrix to come up with a new iterative update
-w = w_0 - H^-1(w_0) * ∇E(w_0), where ∇E(w_0) is the gradient of the cross entropy at w_0; a sketch follows
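A minimal sketch of this update for logistic regression (synthetic data and all names are assumed for illustration); the explicit Hessian used here is the one derived in the last card of this section:

```python
import numpy as np

# Assumed synthetic data generated from known weights.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

w = np.zeros(3)
for _ in range(10):
    p1 = 1 / (1 + np.exp(-X @ w))
    grad = X.T @ (p1 - y)                       # gradient of cross entropy
    H = X.T @ (X * (p1 * (1 - p1))[:, None])    # X^T R X, R = diag(p1*(1-p1))
    w = w - np.linalg.solve(H, grad)            # w_new = w_old - H^-1 grad
print(w)  # should land near true_w
```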
How does iterative reweighted least squares change with regularisation
With L2 regularisation the cost function gains the term (lambda/2) * ||w||^2. For example, see the image. As a result:
-The gradient of the cross entropy has lambda * w added to it
-The Hessian has lambda * I added to it, as in the sketch below
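A sketch of the regularised Newton step (lam is an assumed regularisation strength; the data and names are assumed, as in the previous sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < 1 / (1 + np.exp(-X @ np.array([1.5, -2.0, 0.5])))).astype(float)

lam = 0.1   # assumed regularisation strength
w = np.zeros(3)
for _ in range(10):
    p1 = 1 / (1 + np.exp(-X @ w))
    grad = X.T @ (p1 - y) + lam * w                              # + lambda * w
    H = X.T @ (X * (p1 * (1 - p1))[:, None]) + lam * np.eye(3)   # + lambda * I
    w = w - np.linalg.solve(H, grad)
print(w)  # shrunk slightly toward zero relative to the unregularised fit
```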
In the case of a multi-class logistic regression problem how do we calculate the probability for an individual class
p_i = exp(w_i^T x) / (1 + Sum over j = 1..M-1: exp(w_j^T x)) for classes i = 1, ..., M-1, where class M is the reference class with p_M = 1 / (1 + Sum over j = 1..M-1: exp(w_j^T x))
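A sketch with an assumed M = 3 classes and random weight vectors w_1, ..., w_{M-1} (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=4)                 # one example, 4 features
W = rng.normal(size=(2, 4))            # M - 1 = 2 weight vectors, one per non-reference class

scores = np.exp(W @ x)                 # exp(w_i^T x) for i = 1..M-1
denom = 1.0 + scores.sum()             # 1 + sum_j exp(w_j^T x)
p = np.append(scores / denom, 1.0 / denom)  # classes 1..M-1, then reference class M
print(p, p.sum())                      # probabilities sum to 1
```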
How do you check for multicollinearity in multi-class regression problems
Check the Pearson correlation between each pair of input (predictor) variables:
-if |correlation| > 0.8, there is multicollinearity; see the sketch below
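A sketch of this check on an assumed design matrix X, with one column deliberately constructed to be collinear with another:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=100)  # force collinearity

corr = np.corrcoef(X, rowvar=False)    # pairwise Pearson correlations
for i in range(corr.shape[1]):
    for j in range(i + 1, corr.shape[1]):
        if abs(corr[i, j]) > 0.8:
            print(f"columns {i} and {j} are collinear: r = {corr[i, j]:.2f}")
```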
How do you compute the Hessian for a function
It is the matrix of second-order partial derivatives with respect to each pair of the function's variables. For logistic regression the explicit values can be found by multiplying p_1 with p_0 (= 1 - p_1) for each example and multiplying that by the outer product x x^T: H = Sum over i: p_1(x^(i)) p_0(x^(i)) x^(i) x^(i)^T
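A sketch computing this explicit Hessian for assumed data; the per-example outer products are batched as X^T R X with R = diag(p_1 * p_0):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(5, 3))
w = rng.normal(size=3)

p1 = 1 / (1 + np.exp(-X @ w))
p0 = 1 - p1
H = X.T @ (X * (p1 * p0)[:, None])   # same as sum_i p1_i p0_i x_i x_i^T
print(H.shape)                        # (3, 3), symmetric
```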