Linear and Kernel Models Flashcards
Write down the general form of a linear predictive model
f(x) = <w, x> + b
Write down the optimal weight vector of a regularised model with loss L
w = argmin_w (1/N) sum_i L(f(xi), yi) + lambda * J(w)
What is the goal of linear regression?
Finding a linear function that best interpolates a given set of labelled training points.
Write down the optimal weight vector of Least Squares Regression
w = argmin_w (1/N) sum_i (f(xi) - yi)^2 = (X^T X)^(-1) X^T y
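A minimal NumPy sketch of the closed form above; the toy data and variable names are illustrative only, and the normal equations are solved rather than inverting X^T X explicitly:

```python
import numpy as np

# Toy design matrix X (N x p) and targets y; values are illustrative only
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # first column plays the role of the bias term
y = np.array([1.0, 2.0, 3.1])

# w = (X^T X)^(-1) X^T y, computed by solving the normal equations
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)
```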
What is Ridge Regression? What does it solve?
w = argmin_w (1/N) sum_i (f(xi) - yi)^2 + lambda * ||w||^2 = (X^T X + lambda I)^(-1) X^T y
It has a closed-form solution, prevents the weights from exploding, and can be used when N < p (where ordinary least squares fails because X^T X is not invertible)
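A short NumPy sketch of the ridge closed form, showing that it still works when N < p; the data and the lambda value are made up for illustration:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # w = (X^T X + lambda I)^(-1) X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.RandomState(0)
X = rng.randn(5, 20)          # N = 5 samples, p = 20 features (N < p)
y = rng.randn(5)
w = ridge_fit(X, y, lam=0.1)
print(w.shape)                # (20,)
```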
Define elastic net regression
A regression that combines the L1 (lasso) and L2 (ridge) penalties in one regulariser: J(w) = lambda1 * ||w||_1 + lambda2 * ||w||^2
Define 0-1 loss
Indicator function that returns 1 when the target and output are not equal and 0 otherwise
What is logistic regression?
A classification model that outputs a value between 0 and 1 representing the probability of an event; the linear part <w, x> + b models the log odds.
It uses a sigmoid function
p(y=1|x) = 1 / {1 + exp(-<w,x>-b)}
Or
log{p(y=1|x) / p(y=0|x)} = <w, x> + b
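A small sketch of the two equivalent forms above; the weights and input are made up for illustration:

```python
import numpy as np

def predict_proba(w, b, x):
    # p(y=1|x) = 1 / (1 + exp(-<w, x> - b))
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

w, b = np.array([2.0, -1.0]), 0.5   # illustrative parameters
x = np.array([1.0, 3.0])
p = predict_proba(w, b, x)
print(p)                            # probability of y = 1
print(np.log(p / (1 - p)))          # log odds, equals <w, x> + b = -0.5
```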
What happens when you apply high regularisation to regression?
It shrinks the weights, which limits the influence of individual points and smooths the fit; too much regularisation leads to underfitting.
What do SVMs maximise?
The margin, i.e. the distance from the separating hyperplane to the closest training points.
Write down the formula for the hard and soft margin versions of an SVM
Hard margin: min over w, b of (1/2)||w||^2 subject to yi(<w, xi> + b) >= 1 for all i.
Soft margin: min over w, b, s of (1/2)||w||^2 + C * sum_i s_i subject to yi(<w, xi> + b) >= 1 - s_i and s_i >= 0, where the s_i are slack variables.
What does increasing C do to an SVM?
It increases how much you pay for each point that violates the margin constraint, i.e. it decreases regularisation.
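An illustrative sketch using scikit-learn's SVC (assuming scikit-learn is available); the data and C values are arbitrary, the point is only that a larger C penalises margin violations more heavily:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + 2, rng.randn(20, 2) - 2])   # two rough clusters
y = np.array([1] * 20 + [-1] * 20)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # small C: violations are cheap, so the margin stays wide and involves many support vectors;
    # large C: each violation is expensive, so the fit hugs the data more tightly
    print(C, len(clf.support_))
```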
What are the advantages of kernel methods?
They provide a computational shortcut (inner products in the feature space are computed without constructing it explicitly), and the dual solution only requires inverting an N x N matrix, which is cheaper than the primal when p > N.
What is the aim of the kernel function?
To embed data into a space where patterns can be discovered as linear relations
Write the formula for both the primal and dual solution
For ridge regression (see above):
Primal: w = (X^T X + lambda I_p)^(-1) X^T y
Dual: alpha = (X X^T + lambda I_N)^(-1) y, with w = X^T alpha, so f(x) = sum_i alpha_i <xi, x>
What is a Kernel function? How is it used?
A function k(x, x') = <phi(x), phi(x')> that computes the inner product of two points in a feature space. It is used as a substitute for the dot product in Kernel Regression, where for instance
alpha = (X X^T)^(-1) y
is replaced by
alpha = K^(-1) y
where K is the Gram matrix with entries K_ij = k(xi, xj).
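A sketch of this substitution using kernel ridge regression with an RBF kernel; the lambda*I term follows the dual solution above, and the data and gamma value are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # K_ij = exp(-gamma * ||a_i - b_j||^2)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.RandomState(0)
X = rng.randn(30, 3)                   # illustrative training data
y = np.sin(X[:, 0])
lam = 0.1

K = rbf_kernel(X, X)                   # N x N Gram matrix replaces X X^T
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # alpha = (K + lambda I)^(-1) y

X_new = rng.randn(5, 3)
f_new = rbf_kernel(X_new, X) @ alpha   # f(x) = sum_i alpha_i k(x, xi)
print(f_new)
```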
What is multiple kernel learning?
The kernel K is considered a linear combination of M basis kernels, K = sum_m mu_m K_m. Both the kernel mixing weights mu_m and the dual coefficients alpha are then learned in a single optimisation problem.
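A toy sketch of the combined kernel; here the mixing weights mu are fixed by hand, whereas full multiple kernel learning would optimise mu and alpha jointly (for example by alternating updates). All names and values are illustrative:

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def rbf_kernel(A, B, gamma=1.0):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.RandomState(0)
X = rng.randn(20, 4)
y = rng.randn(20)

mu = np.array([0.7, 0.3])                                    # kernel mixing weights (fixed here)
K = mu[0] * linear_kernel(X, X) + mu[1] * rbf_kernel(X, X)   # K = sum_m mu_m * K_m
alpha = np.linalg.solve(K + 0.1 * np.eye(len(X)), y)         # dual coefficients for this K
print(alpha[:3])
```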