Week 3 (Trees and SVMs) Flashcards
What is a decision tree and how does it divide space
How are trees structured
How is the tree structure learned
How are trees pruned
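The greedy split search these cards describe can be sketched as a one-level tree (a decision stump). This is an illustrative sketch only; the Gini impurity criterion and the toy data are my choices, not from the cards:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedily pick the (feature, threshold) pair that most reduces
    the weighted Gini impurity -- one step of tree learning."""
    n, d = X.shape
    best = (None, None, gini(y))  # (feature, threshold, impurity)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, thr, score)
    return best

# Toy data: feature 0 separates the classes perfectly at x <= 1.
X = np.array([[0.0, 5.0], [1.0, 2.0], [2.0, 5.0], [3.0, 2.0]])
y = np.array([0, 0, 1, 1])
feature, threshold, impurity = best_split(X, y)
```

A full tree learner applies this split recursively to each child; pruning then removes subtrees that do not improve validation performance.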
What does a kernel function measure
Similarity between data points
What does regularised linear regression look like
What is the design matrix, and what are dual parameters
(Dual parameters represent how much each training data point contributes to the final decision)
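The primal and dual views of regularised least squares give identical predictions; a minimal NumPy check (variable names and random data are mine, lam = λ):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(10, 3))   # design matrix: N=10 points, M=3 features
t = rng.normal(size=10)          # targets
lam = 0.1                        # regularisation strength lambda

# Primal solution: w = (Phi^T Phi + lam I_M)^{-1} Phi^T t
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ t)

# Dual solution: a = (Phi Phi^T + lam I_N)^{-1} t, with w = Phi^T a.
# a_n says how much training point n contributes to the solution.
a = np.linalg.solve(Phi @ Phi.T + lam * np.eye(10), t)
w_dual = Phi.T @ a
```

Note the dual solve works with an N x N matrix of inner products between data points, which is what makes kernel substitution possible.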
What is a linear kernel
What is the kernel trick
Evaluating the kernel k(x, x′) = φ(x)ᵀφ(x′) directly, without ever needing to compute the underlying feature map φ
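A concrete instance: the quadratic kernel k(x, z) = (xᵀz)² equals the inner product under the explicit feature map φ(x) = vec(x xᵀ), but never needs to build φ. The specific kernel choice is mine:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map: all pairwise products x_i * x_j."""
    return np.outer(x, x).ravel()

def k_quadratic(x, z):
    """Quadratic kernel: evaluated directly, no feature map needed."""
    return float(x @ z) ** 2

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

direct = k_quadratic(x, z)         # O(d) work
explicit = float(phi(x) @ phi(z))  # O(d^2) work, same value
```

For a d-dimensional input the explicit map has d² features; the kernel gets the same number in O(d) time.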
Why don’t we need all the dual parameters?
Many of them are exactly zero (common in classification), so those points can be ignored; only the points with non-zero dual parameters (the support vectors) affect the prediction
What is the Gram matrix
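The Gram matrix collects every pairwise kernel evaluation, K_nm = k(x_n, x_m); for a valid kernel it is symmetric and positive semi-definite. A sketch with a linear kernel (data is my choice):

```python
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # N=3 points

# Linear-kernel Gram matrix: K[n, m] = x_n . x_m
K = X @ X.T

symmetric = np.allclose(K, K.T)
eigenvalues = np.linalg.eigvalsh(K)  # all >= 0 (up to rounding) => PSD
```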
What is the closed form solution to compute the dual parameter vector
(note: t is the vector of training labels)
How are kernels used for regression
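Putting the pieces together for regression: build the Gram matrix K, solve a = (K + λI)⁻¹ t in closed form, and predict at a new point with y(x) = Σₙ aₙ k(xₙ, x). A sketch with an RBF kernel; the kernel choice, λ, and sine-curve data are mine:

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    """RBF kernel between two points."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

# Noiseless training data from a sine curve.
X = np.linspace(0.0, 2.0 * np.pi, 20)
t = np.sin(X)
lam = 1e-3  # regularisation strength lambda

# Gram matrix K[n, m] = k(x_n, x_m), then a = (K + lam I)^{-1} t.
K = np.array([[rbf(xn, xm) for xm in X] for xn in X])
a = np.linalg.solve(K + lam * np.eye(len(X)), t)

def predict(x_new):
    """y(x) = sum_n a_n k(x_n, x): a kernel-weighted sum over training points."""
    return sum(an * rbf(xn, x_new) for an, xn in zip(a, X))

fit_error = max(abs(predict(xn) - tn) for xn, tn in zip(X, t))
```

Notice the prediction never uses an explicit weight vector w, only kernel evaluations against the training points.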
How do maximum margin classifiers work
Why does summing kernel evaluations between a new data point and the training points help classify it
Each training point contributes its label (+1 or -1), weighted by its dual parameter a and its kernel similarity to the new point. The sum therefore acts as a vote: the new point is assigned to whichever class it is more similar to.
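The vote can be made concrete: with labels tₙ ∈ {+1, −1} and weights aₙ, the prediction is sign(Σₙ aₙ tₙ k(xₙ, x)). A sketch with uniform weights and an RBF kernel (the weights here are fixed for illustration, not learned; data is mine):

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

# Two clusters: class +1 near (2, 2), class -1 near (-2, -2).
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
t = np.array([1, 1, -1, -1])
a = np.ones(len(X))  # uniform dual weights, for illustration only

def classify(x_new):
    # Each training point votes with its label, weighted by a_n and
    # its kernel similarity to the new point.
    score = sum(an * tn * rbf(xn, x_new) for an, tn, xn in zip(a, t, X))
    return 1 if score > 0 else -1
```

Points near the (2, 2) cluster get large kernel values from the +1 points and near-zero from the −1 points, so the vote comes out positive.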
What is the dual representation for maximum margin classifiers
What is the most popular kernel for SVM
The RBF (radial basis function, i.e. Gaussian) kernel
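The RBF kernel is k(x, z) = exp(−‖x − z‖² / (2σ²)): similarity 1 for identical points, decaying smoothly with distance. A minimal sketch (the σ value is my choice):

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """RBF kernel: similarity decays with squared Euclidean distance."""
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-np.sum(diff ** 2) / (2 * sigma ** 2))

same = rbf_kernel([0.0, 0.0], [0.0, 0.0])  # identical points
near = rbf_kernel([0.0, 0.0], [1.0, 0.0])
far  = rbf_kernel([0.0, 0.0], [5.0, 0.0])
```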
What is a soft margin and a slack variable for SVMs
What does this equation mean for SVMs
The wᵀw term controls the margin (minimising it maximises the margin, since the margin width is 1/‖w‖), and the symbol that looks like an 'e' is ξ (xi), the slack variables. The "subject to" constraints define the relationship between each slack variable and its point's predicted value: ξₙ measures how far point n falls on the wrong side of the margin.
What is the C parameter for in SVMs
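One way to see C in action: it weights the slack (hinge-loss) penalty against the margin term in the objective ½‖w‖² + C Σₙ max(0, 1 − tₙ(wᵀxₙ + b)). A sketch of a linear soft-margin SVM trained by subgradient descent; the optimiser, step size, and toy data are my choices (real solvers use the QP or SMO formulation):

```python
import numpy as np

def train_linear_svm(X, t, C=1.0, lr=0.01, epochs=2000):
    """Minimise 0.5*||w||^2 + C * sum of hinge losses by subgradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        violators = t * (X @ w + b) < 1  # points with non-zero slack
        # Subgradient step: margin term pulls w toward 0, slack term
        # (scaled by C) pushes the boundary away from violating points.
        w -= lr * (w - C * (t[violators][:, None] * X[violators]).sum(axis=0))
        b -= lr * (-C * t[violators].sum())
    return w, b

# Linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 1.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]])
t = np.array([1, 1, 1, -1, -1, -1])

w, b = train_linear_svm(X, t, C=1.0)
predictions = np.sign(X @ w + b)
```

Large C punishes slack heavily (narrow margin, few violations); small C tolerates slack in exchange for a wider margin.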
How do SVMs perform multi class classification
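SVMs are inherently binary, so multi-class problems are handled by schemes such as one-vs-rest (one classifier per class) or one-vs-one (one per pair of classes), picking the class with the strongest score. A one-vs-rest sketch on top of a simple hinge-loss linear trainer; the trainer, data, and helper names are mine:

```python
import numpy as np

def train_linear_svm(X, t, C=1.0, lr=0.01, epochs=2000):
    """Binary soft-margin linear SVM by subgradient descent on the hinge loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        violators = t * (X @ w + b) < 1
        w -= lr * (w - C * (t[violators][:, None] * X[violators]).sum(axis=0))
        b -= lr * (-C * t[violators].sum())
    return w, b

def one_vs_rest(X, y, classes):
    """Train one binary SVM per class: that class = +1, everything else = -1."""
    return {c: train_linear_svm(X, np.where(y == c, 1, -1)) for c in classes}

def predict(models, x):
    # Pick the class whose classifier gives the highest score.
    return max(models, key=lambda c: x @ models[c][0] + models[c][1])

# Three well-separated clusters, labelled 0, 1, 2.
X = np.array([[4.0, 0.0], [5.0, 1.0], [-4.0, 4.0], [-5.0, 5.0],
              [0.0, -5.0], [1.0, -4.0]])
y = np.array([0, 0, 1, 1, 2, 2])

models = one_vs_rest(X, y, classes=[0, 1, 2])
preds = [predict(models, x) for x in X]
```

One-vs-one instead trains K(K−1)/2 pairwise classifiers and lets them vote.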