noted EQ Flashcards

1
Q

What are the two ways to define the logit (p1)?

A

(1) The ln(odds) of some event; (2) equivalently, it is modelled as the weighted sum of the independent variables
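
In symbols (a sketch of both forms, writing p for the event probability, x for the independent variables, and w, b for the weights and bias):

\[
\operatorname{logit}(p) \;=\; \ln\frac{p}{1-p} \;=\; \mathbf{w}^{\top}\mathbf{x} + b
\qquad\Longleftrightarrow\qquad
p \;=\; \sigma(\mathbf{w}^{\top}\mathbf{x} + b) \;=\; \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}
\]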

2
Q

What is the loss function associated with the log-odds function (before turning it into cross-entropy)?

A

The likelihood function
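
Written out (a sketch, assuming binary labels y_n in {0, 1} and per-example probabilities p_n = sigma(w^T x_n + b)):

\[
L(\mathbf{w}) \;=\; \prod_{n=1}^{N} p_n^{\,y_n}\,(1 - p_n)^{1 - y_n}
\]

Training chooses the weights that maximise this probability of the observed labels.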

3
Q

How do we change the likelihood function into the cross-entropy loss function, and why do we do so?

A

We take the log of the likelihood function and make it negative. Taking the log turns the product of probabilities into a sum, which is better for computation (it prevents floating-point underflow errors) and easier to differentiate when finding maxima / minima; making it negative turns maximising the likelihood into minimising a loss.
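
A minimal numerical sketch of the floating-point claim; the dataset size and probabilities below are made-up for illustration:

import numpy as np

# Hypothetical illustration: 10,000 training points, each assigned probability ~0.9
# by the model. The raw likelihood (a product of probabilities) underflows to zero,
# while its log (a sum of logs) is a perfectly ordinary number.
rng = np.random.default_rng(0)
probs = rng.uniform(0.85, 0.95, size=10_000)   # per-example probabilities p(y_n | x_n, w)

likelihood = np.prod(probs)                    # underflows below the smallest float64
log_likelihood = np.sum(np.log(probs))         # roughly 10_000 * ln(0.9), about -1050
cross_entropy = -log_likelihood                # negate so maximising becomes minimising

print(likelihood)        # 0.0 (floating-point underflow)
print(log_likelihood)    # about -1050
print(cross_entropy)     # about 1050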

4
Q

What is the naive update equation for gradient descent, and which version accounts for the issue of (e.g. 2D) problems where the dimensions have different relative magnitudes?

A

The naive version updates the weights by subtracting some factor (the learning rate) of the gradient of the loss with respect to those weights: w <- w - eta * grad L(w).

Swapping the learning rate for the inverse Hessian matrix of the loss (Newton's method, w <- w - H^{-1} grad L(w)) is the better solution, because the step in each dimension is naturally scaled by the rate of change of the gradient (the curvature) of the loss function, so dimensions with different relative magnitudes are handled automatically.
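
A minimal sketch contrasting the two updates on a toy quadratic loss whose two dimensions have very different curvature; the loss matrix, learning rate and iteration count below are illustrative assumptions:

import numpy as np

# Toy loss L(w) = 0.5 * w^T A w: a 2-D problem whose dimensions differ in magnitude.
A = np.diag([1.0, 100.0])
grad = lambda w: A @ w          # gradient of the loss
hess = lambda w: A              # Hessian of the loss (constant for a quadratic)

w_gd = np.array([1.0, 1.0])     # naive gradient descent: w <- w - eta * grad
w_nt = np.array([1.0, 1.0])     # Newton's method:        w <- w - H^{-1} grad
eta = 0.01                      # a single learning rate must suit the steepest dimension

for _ in range(100):
    w_gd = w_gd - eta * grad(w_gd)
    w_nt = w_nt - np.linalg.solve(hess(w_nt), grad(w_nt))

print(w_gd)   # roughly [0.37, 0.0]: the flat dimension has barely moved
print(w_nt)   # [0.0, 0.0]: each direction was scaled by its own curvature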

5
Q

What is the dual representation equation for a hyperplane in the context of support vector machines?

A

-The original (primal) objective just minimises the norm of the weights, which corresponds to maximising the margin
-We add another part to the objective: for each point, the Lagrangian term max over a_n of a_n(1 - y^(n)(w^T phi(x^(n)) + b))
-We have the constraint that the margin expression 1 - y^(n)(w^T phi(x^(n)) + b) must be less than or equal to 0 and the Lagrange multiplier a_n must be greater than or equal to 0
-This way, if the constraint is violated, i.e. the expression is greater than 0, the maximisation over a_n makes the Lagrangian huge and there is a lot of loss
-If the constraint is not violated and the expression is negative, the maximising a_n is 0 and hence there is no loss
-Expanding the aforementioned Lagrangian, differentiating with respect to w and b, setting the derivatives to 0, and simplifying gives the dual equation below
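
A sketch of the standard result those steps arrive at (the usual textbook dual, assuming labels y^(n) in {-1, +1}, multipliers a_n >= 0 and kernel k(x, x') = phi(x)^T phi(x')):

\[
\mathbf{w} \;=\; \sum_{n} a_n\, y^{(n)}\, \phi(\mathbf{x}^{(n)}), \qquad \sum_{n} a_n\, y^{(n)} = 0,
\]
\[
\tilde{L}(\mathbf{a}) \;=\; \sum_{n} a_n \;-\; \frac{1}{2} \sum_{n}\sum_{m} a_n a_m\, y^{(n)} y^{(m)}\, k(\mathbf{x}^{(n)}, \mathbf{x}^{(m)}),
\]

maximised over the a_n subject to a_n >= 0 and the sum of a_n y^(n) being 0.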
