noted EQ Flashcards
What are the two ways to define logit (p1)
(1) The ln(odds) of some event, i.e. ln(p / (1 - p)); (2) equivalently, in logistic regression, the weighted sum of the independent variables
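In symbols (a standard statement of the two equivalent definitions; here p is the event probability, x the vector of independent variables and w, b the weights and bias):

\mathrm{logit}(p) = \ln\left(\frac{p}{1-p}\right) = w^\top x + b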
What is the loss function associated with the log odds function (before turning it into cross entropy)
The likelihood function
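Written out (the standard Bernoulli likelihood, assuming labels y^{(n)} \in \{0, 1\} and predicted probabilities p^{(n)} = \sigma(w^\top x^{(n)} + b)):

L(w) = \prod_{n} \left(p^{(n)}\right)^{y^{(n)}} \left(1 - p^{(n)}\right)^{1 - y^{(n)}}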
How do we change the likelihood function into the cross entropy loss function and why do we do so
We take the log of the function and negate it. Taking the log turns the product of probabilities into a sum, which is better for computation (it prevents floating-point underflow) and is easier to differentiate to find maxima / minima; negating it turns maximising the likelihood into minimising a loss.
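As a worked equation (same y^{(n)}, p^{(n)} notation as above), the negative log of the likelihood gives the cross entropy loss:

E(w) = -\ln L(w) = -\sum_{n} \left[ y^{(n)} \ln p^{(n)} + \left(1 - y^{(n)}\right) \ln\left(1 - p^{(n)}\right) \right]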
What is the naive equation for gradient descent, and which version accounts for the issue of problems where the dimensions have very different relative magnitudes
The naive version updates the weights by subtracting the gradient of the loss with respect to those weights, scaled by a learning rate.
Swapping the learning rate for the inverse Hessian matrix (Newton's method) makes for a better solution, as the step size then naturally scales with the curvature of the loss function in each dimension.
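As update rules (a sketch using \eta for the learning rate and H for the Hessian of the loss E with respect to the weights; the second is the inverse-Hessian update described above):

w \leftarrow w - \eta \, \nabla E(w) \qquad \text{(naive gradient descent)}
w \leftarrow w - H^{-1} \nabla E(w) \qquad \text{(learning rate replaced by the inverse Hessian)}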
what is the dual representation equation for a hyperplane in the context of support vector machines
-The original objective just minimises the norm of the weights, which corresponds to maximising the margin
-We add a second term to this: the maximum over the Lagrange multipliers a^(n) of a^(n) (1 - y^(n) (w^T phi(x^(n)) + b)), summed over the data points (written out after this list)
-We have the constraint that 1 - y^(n) (w^T phi(x^(n)) + b) must be less than or equal to 0, and each Lagrange multiplier a^(n) must be greater than or equal to 0
-This way, if the constraint is violated, i.e. the term is greater than 0, the maximisation problem makes the Lagrangian huge and there is a lot of loss
-If the constraint is not violated and the term is negative, maximising the value sets the multiplier to 0 and hence there is no loss
-Expanding the aforementioned equation, differentiating with respect to w and b, setting to 0 and simplifying gives the dual equation below
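A sketch of the standard derivation (writing a^{(n)} for the Lagrange multipliers and \phi for the feature map). The Lagrangian matching the constrained problem above is

L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_{n} a^{(n)} \left[ y^{(n)} \left( w^\top \phi(x^{(n)}) + b \right) - 1 \right], \quad a^{(n)} \ge 0

Setting \partial L / \partial w = 0 gives w = \sum_{n} a^{(n)} y^{(n)} \phi(x^{(n)}), and \partial L / \partial b = 0 gives \sum_{n} a^{(n)} y^{(n)} = 0. Substituting these back yields the dual representation:

\tilde{L}(a) = \sum_{n} a^{(n)} - \frac{1}{2} \sum_{n} \sum_{m} a^{(n)} a^{(m)} y^{(n)} y^{(m)} \, \phi(x^{(n)})^\top \phi(x^{(m)})

maximised subject to a^{(n)} \ge 0 and \sum_{n} a^{(n)} y^{(n)} = 0.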