Week 8 Flashcards
What is kNN?
Supervised classification.
Assumes similar data points exist close to each other (similarity is captured by a distance metric, e.g. Euclidean).
Given labelled data, the class of a new point is determined by the majority class of its k nearest neighbours (k is a hyperparameter).
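A minimal sketch of the idea (assuming scikit-learn, which these notes don't name; the toy points are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1.0, 1.2], [0.8, 0.9], [3.1, 3.0], [3.3, 2.8]])
y_train = np.array([0, 0, 1, 1])  # labelled data, two classes

# k (n_neighbors) is the hyperparameter; the default metric is Euclidean distance
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# A new point takes the majority class of its 3 nearest neighbours
print(knn.predict([[1.0, 1.0]]))  # two of the three nearest points are class 0
```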
kNN disadvantages
Susceptible to the influence of outliers (especially where one class overlaps with another)
Susceptible to class imbalance (with a high k, predictions are biased towards the dominant class)
How do we choose k?
Start with k=1, predict on the test set and evaluate; repeat while increasing k and keep the best-performing value. k should be odd to avoid tied votes (in binary classification).
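A hedged sketch of this selection loop (assuming scikit-learn and a held-out set; the variable names are illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def choose_k(X_train, y_train, X_test, y_test, max_k=15):
    scores = {}
    for k in range(1, max_k + 1, 2):  # odd values of k only
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        scores[k] = accuracy_score(y_test, knn.predict(X_test))
    return max(scores, key=scores.get), scores  # best k and the full score table
```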
What is weighted kNN?
Nearer neighbours should have more influence on the query point than those further away. Weight each neighbour by 1/distance and add up the weights for each class; the class with the largest total wins.
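A minimal sketch of that 1/distance weighting (the data here is made up; scikit-learn exposes the same idea via KNeighborsClassifier(weights="distance")):

```python
import numpy as np

X_train = np.array([[0.0, 0.0], [0.5, 0.5], [2.0, 2.0]])
y_train = np.array([0, 0, 1])
query = np.array([1.2, 1.2])

dists = np.linalg.norm(X_train - query, axis=1)   # Euclidean distances
weights = 1.0 / dists                             # nearer neighbours weigh more
scores = {c: weights[y_train == c].sum() for c in np.unique(y_train)}
print(max(scores, key=scores.get))                # class with the largest summed weight
```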
kNN vs other algorithms
kNN performs instance-based (lazy) learning, so it suffers performance degradation with a large training set.
It is suitable for data with few features, which keeps the distance computations cheap (feature selection should be performed first).
The data must be normalised because distance metrics are used.
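A sketch of scaling before kNN (assuming scikit-learn; a pipeline keeps the scaling learned on the training data and reapplies it at prediction time):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Normalise features first so no single feature dominates the distance metric
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
# model.fit(X_train, y_train); model.predict(X_new)
```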
What is a Support Vector Machine?
A supervised learning algorithm used for binary and multiclass classification problems.
Performs particularly well on text, with high speed and good accuracy; commonly used to classify text and gene expression data.
Once trained on labelled data, an SVM can categorise new, unlabelled data.
How do SVMs work?
Find a line (more generally, a hyperplane) that separates the data points. The shortest distance between the observations and this threshold is called the margin. New points are classified according to which side of the line they fall on.
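A minimal sketch of fitting a linear SVM (assuming scikit-learn; the toy data is made up):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 0.5], [4.0, 4.0], [4.5, 3.5]])
y = np.array([0, 0, 1, 1])

svm = SVC(kernel="linear")   # finds the separating line with the largest margin
svm.fit(X, y)
print(svm.predict([[2.0, 1.5], [4.2, 4.1]]))  # new points classified by which side they fall on
```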
What happens if we choose a threshold that allows for misclassification?
Worse fit on the training data, but often better at classifying new data: lower variance, higher bias.
What are the margins called?
Soft margin when misclassifications are allowed
Hard margin when they are not.
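In practice this trade-off is usually exposed as a regularisation parameter; a hedged sketch assuming scikit-learn's SVC, where C penalises misclassifications:

```python
from sklearn.svm import SVC

soft = SVC(kernel="linear", C=0.1)   # small C: softer margin, more misclassification tolerated
hard = SVC(kernel="linear", C=1e6)   # very large C: approximates a hard margin
```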
What is a hyperplane?
A hyperplane in an n-dimensional Euclidean space is a flat, (n − 1)-dimensional subset of that space that divides the space into two disconnected parts.
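In coordinates, a hyperplane can be written as the solution set of a single linear equation, {x in R^n : w · x + b = 0}, for a nonzero normal vector w and offset b; an SVM decision boundary has exactly this form.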
Why are certain data points in SVMs called support vectors?
They are the data points that support, i.e. determine, the decision boundary. We want to maximise the margin to these points (an optimisation problem).
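A short illustration (assuming scikit-learn; toy data made up): after fitting, the model keeps exactly these boundary-defining points.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 0.5], [4.0, 4.0], [4.5, 3.5]])
y = np.array([0, 0, 1, 1])
svm = SVC(kernel="linear").fit(X, y)

print(svm.support_vectors_)  # the points that determine the maximum-margin boundary
```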
How do SVMs separate data that is not linearly separable by a line?
Apply a transformation such as φ(x) = x² and add a second dimension to the feature space; the data then becomes linearly separable in that higher-dimensional space.
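A hedged sketch of that feature map on 1-D data (assuming scikit-learn; the data is made up):

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])          # no single threshold on x separates the classes

X_mapped = np.column_stack([x, x ** 2])      # add the second dimension x^2
svm = SVC(kernel="linear").fit(X_mapped, y)  # a straight line now separates them

queries = np.array([1.0, -2.5])
print(svm.predict(np.column_stack([queries, queries ** 2])))
```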
What is the kernel trick for?
It computes the high-dimensional relationships (the dot products in the mapped space) without actually transforming the data, reducing the computation required for SVMs by avoiding the explicit mapping.
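A minimal sketch with an RBF kernel (one common choice, assuming scikit-learn): a non-linear separation like the one above is found without ever building the mapped features explicitly.

```python
import numpy as np
from sklearn.svm import SVC

x = np.array([[-3.0], [-2.0], [-0.5], [0.0], [0.5], [2.0], [3.0]])
y = np.array([1, 1, 0, 0, 0, 1, 1])

svm = SVC(kernel="rbf").fit(x, y)   # the kernel computes similarities; no explicit phi(x) is formed
print(svm.predict([[0.2], [2.5]]))
```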