Probabilistic Discriminative Models Flashcards
What are Logistic Regression models?
These models directly model the posterior P(Ck|x) with:
P(C1|x) = σ(w.T * x) [for 2-class classification]
P(Ck|x) = softmax_k(a), with a_j = w_j.T * x [for K-class classification, one weight vector w_j per class]
Parameters are initialized randomly and their optimal values are approximated through gradient descent.
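As a rough illustration of the two formulas above, here is a minimal pure-Python sketch. The helper names (`sigmoid`, `softmax`, `dot`) and the example weights are made up for this card, and the multi-class case assumes one weight vector per class.

```python
import math

def sigmoid(a):
    # Logistic sigmoid: maps a real activation to (0, 1).
    return 1.0 / (1.0 + math.exp(-a))

def softmax(activations):
    # Subtract the max before exponentiating for numerical stability.
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

x = [1.0, 2.0]

# 2-class case: P(C1|x) = sigmoid(w.T * x)
w = [0.5, -0.25]            # illustrative weights, not fitted
p_c1 = sigmoid(dot(w, x))

# K-class case: one (illustrative) weight vector per class
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
probs = softmax([dot(wk, x) for wk in W])
```

In the 2-class case P(C2|x) is simply 1 - P(C1|x); in the K-class case the entries of `probs` sum to 1.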
What is Conditional likelihood?
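P(T|X, w) = Π_i P(ti|xi, w)
(the probability of the observed targets given the inputs, factorized over the data points, which are assumed independent)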
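A small sketch of this product for the 2-class case, assuming binary targets ti in {0, 1} and the sigmoid model P(ti = 1|xi, w) = σ(w.T * xi). The function names and the toy data are illustrative only.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def conditional_likelihood(X, T, w):
    # Product over data points of P(t_i | x_i, w).
    likelihood = 1.0
    for x, t in zip(X, T):
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        likelihood *= y if t == 1 else (1.0 - y)
    return likelihood

def negative_log_likelihood(X, T, w):
    # Same quantity as a sum of logs (the cross-entropy error).
    total = 0.0
    for x, t in zip(X, T):
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= math.log(y) if t == 1 else math.log(1.0 - y)
    return total

# Toy data (made up): three points, two features each.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
T = [1, 0, 1]
w0 = [0.0, 0.0]   # with zero weights every prediction is 0.5

lik = conditional_likelihood(X, T, w0)
nll = negative_log_likelihood(X, T, w0)
```

In practice the negative log form is usually the one that gets minimized, since a product of many probabilities underflows quickly.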
What is the parameter update rule of Newton-Raphson method?
w(new) = w(old) - H^-1 * ∇E(w)
(H is the Hessian of E)
N.B.: If the error function is quadratic, Newton-Raphson finds the solution in a single step.
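The one-step property can be checked on a small quadratic. The sketch below assumes NumPy and an illustrative error function E(w) = 0.5 * w.T * A * w - b.T * w, with A symmetric positive definite; A, b, and the starting point are made up.

```python
import numpy as np

# Illustrative quadratic error: E(w) = 0.5 * w.T A w - b.T w
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])

def grad_E(w):
    return A @ w - b

def hessian_E(w):
    # For a quadratic the Hessian is constant.
    return A

def newton_step(w):
    # w(new) = w(old) - H^-1 * grad E(w); solve instead of inverting H.
    return w - np.linalg.solve(hessian_E(w), grad_E(w))

w_old = np.array([10.0, -7.0])   # arbitrary starting point
w_new = newton_step(w_old)
```

After this single step the gradient at `w_new` is zero up to rounding, whatever `w_old` was.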
What is Iterative Reweighted Least Squares (IRLS)?
w(new) = (X.T * R * X)^-1 * X.T * R * Z, where:
-Z is an n-dimensional vector (for logistic regression, Z = X*w(old) - R^-1 * (y-t))
-R is the diagonal weighting matrix with entries y_n*(1-y_n); it depends on w and is therefore not constant, so we must apply this update equation iteratively.
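A minimal NumPy sketch of this update for 2-class logistic regression, assuming R is diagonal with entries y_n*(1-y_n). The function name `irls`, the fixed iteration count, and the toy data are illustrative; the data are deliberately not linearly separable, since otherwise the weights would diverge.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(X, t, n_iters=20):
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        y = sigmoid(X @ w)
        r = y * (1.0 - y)              # diagonal of R, recomputed each step
        z = X @ w - (y - t) / r        # Z = X*w(old) - R^-1 * (y - t)
        XtR = X.T * r                  # X.T @ diag(r) via broadcasting
        # w(new) = (X.T R X)^-1 X.T R Z, solved without an explicit inverse
        w = np.linalg.solve(XtR @ X, XtR @ z)
    return w

# Toy 1-D data with a bias column (made up, classes overlap).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
t = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

w_hat = irls(X, t)
grad = X.T @ (sigmoid(X @ w_hat) - t)   # gradient of the error at w_hat
```

Each pass is a weighted least-squares solve with weights taken from the current predictions, which is why R must be recomputed at every iteration.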
What is the derivative of the sigmoid function?
σ’(x) = (1-σ(x))*σ(x)
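A quick numerical sanity check of this identity, comparing it with a central finite difference. The helper names and the step size `h` are illustrative choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # Closed form: (1 - sigmoid(x)) * sigmoid(x)
    s = sigmoid(x)
    return (1.0 - s) * s

def central_difference(f, x, h=1e-5):
    # Numerical approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

points = [-3.0, -1.0, 0.0, 0.5, 2.0]
errors = [abs(sigmoid_derivative(p) - central_difference(sigmoid, p))
          for p in points]
```

The derivative peaks at x = 0, where it equals 0.25, and decays toward 0 for large |x|.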