Week 8 Flashcards

1
Q

What is kNN?

A

A supervised classification algorithm (k-nearest neighbours).
Assumes that similar data points lie close to each other, with similarity measured by a distance metric (e.g. Euclidean).
Given labelled data, a new point is assigned the majority class among its k nearest neighbours (k is a hyperparameter).
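The majority vote described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `knn_predict` and the toy data are made up for this card, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    # Sort the training points by distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbours and return the most common one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy labelled data (made up for illustration).
train = [((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.5), "red"),
         ((6.0, 6.0), "blue"), ((6.5, 7.0), "blue"), ((7.0, 6.5), "blue")]

near_red = knn_predict(train, (2.0, 2.0), k=3)
near_blue = knn_predict(train, (6.0, 7.0), k=3)
print(near_red, near_blue)
```

Note that there is no training step: the "model" is just the stored labelled data.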

2
Q

kNN disadvantages

A

Sensitive to outliers and noisy points, especially where one class overlaps with another.
Sensitive to class imbalance: with a high k, the vote is biased towards the dominant class.

3
Q

How do we choose k?

A

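Start with k=1, predict on a held-out validation set and evaluate. Repeat with increasing k and keep the value that scores best. Tune on a validation set (or with cross-validation) rather than the final test set, otherwise the reported test score is optimistic. For binary classification, k should be odd to avoid tied votes.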
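The search over k can be sketched as a simple loop. This is an illustration under assumptions: the helper names, the candidate values (1, 3, 5) and the toy 1-D data (including one deliberately mislabelled outlier at 2.6) are all hypothetical, and the held-out set here plays the role of a validation set.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # Majority class among the k nearest training points (Euclidean distance).
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def accuracy(train, held_out, k):
    # Fraction of held-out points whose predicted class matches the true class.
    correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(held_out)

# Toy 1-D data; the "b" point at 2.6 is an outlier inside class "a".
train = [((1.0,), "a"), ((2.0,), "a"), ((3.0,), "a"), ((2.6,), "b"),
         ((7.0,), "b"), ((8.0,), "b"), ((9.0,), "b")]
validation = [((2.7,), "a"), ((1.5,), "a"), ((7.5,), "b"), ((8.5,), "b")]

# Evaluate odd values of k and keep the smallest one with the best score.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With k=1 the outlier decides the vote for the point at 2.7, which is why a slightly larger k scores better on this data.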

4
Q

What is weighted kNN?

A

Nearer neighbours should influence the query point's class more than farther ones. Each of the k neighbours votes with weight 1/distance; the weights are added up per class and the class with the highest total wins.
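A minimal sketch of the 1/distance vote, assuming Euclidean distance. The name `weighted_knn_predict` and the small `eps` term (added only to avoid dividing by zero when the query coincides with a training point) are choices made for this example, not part of the card.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Return the class with the largest summed 1/distance score."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for features, label in neighbours:
        # Closer neighbours contribute a larger weight to their class.
        scores[label] += 1.0 / (math.dist(features, query) + eps)
    return max(scores, key=scores.get)

# Toy 1-D data: one "a" point close to the query, two "b" points farther away.
train = [((1.0,), "a"), ((4.0,), "b"), ((5.0,), "b")]

# A plain majority vote with k=3 would say "b" (2 votes against 1).
# Weighted: a scores about 1/1 = 1.0, b scores about 1/2 + 1/3 = 0.83.
label = weighted_knn_predict(train, (2.0,), k=3)
print(label)
```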

5
Q

kNN vs other algorithms

A

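Performs instance-based (lazy) learning: the training data is stored and searched at prediction time, so performance degrades as the training set grows.

Suited to data with few features, where distance computations are cheap (perform feature selection first).

Data must be normalised because distance metrics are used; otherwise features with large scales dominate the distance.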
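As an illustration of the normalisation step, here is a sketch of min-max scaling, which is one common choice (the card does not say which method is meant). The function name and the example features are hypothetical.

```python
def min_max_scale(rows):
    """Rescale each feature (column) to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        # A constant column has no spread, so map it to 0.0.
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical features: (age in years, income). Unscaled, income would
# dominate any Euclidean distance because its values are far larger.
rows = [(25, 20000), (35, 60000), (45, 100000)]
scaled = min_max_scale(rows)
print(scaled)
```

After scaling, both features contribute on the same 0 to 1 scale when distances are computed.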

6
Q

What is a Support Vector Machine?

A

A supervised learning algorithm used for binary and multiclass classification problems.

Often performs well on text, in both speed and accuracy. Used to classify text and gene expression data.

Once trained, the boundary defined by the support vectors can be used to categorise unlabelled data.

7
Q

How do SVMs work?

A

Find a line (the threshold) that separates the classes. The shortest distance between the observations and the threshold is called the margin. A new point is classified according to which side of the line it falls on.
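The margin and the "which side of the line" rule can be made concrete with a small sketch. It does not train an SVM: the boundary w·x + b = 0 is hand-picked (a hypothetical vertical line x1 = 3) and the function names are made up for this example.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the boundary w.x + b = 0."""
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / math.hypot(*w)

def classify(w, b, x):
    # The sign of the distance says which side of the line the point is on.
    return 1 if signed_distance(w, b, x) >= 0 else -1

def margin(w, b, points):
    # Shortest distance between any observation and the boundary.
    return min(abs(signed_distance(w, b, x)) for x in points)

# Hand-picked boundary for illustration: the vertical line x1 = 3.
w, b = (1.0, 0.0), -3.0
points = [(1.0, 2.0), (2.0, 0.0), (4.0, 1.0), (6.0, 3.0)]

m = margin(w, b, points)
print(m, classify(w, b, (5.0, 0.0)), classify(w, b, (0.0, 0.0)))
```

An SVM would search for the w and b that make this margin as large as possible.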

8
Q

What happens if we choose a threshold that allows for misclassification?

A

The threshold fits the training data less well (some training points are misclassified) but classifies new data better. Lower variance, higher bias.

9
Q

What are the margins called?

A

Soft margin when misclassifications are allowed.
Hard margin when they are not.
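One standard way to express a soft margin is the hinge-loss objective, sketched below. This goes slightly beyond the card: the penalty parameter `C`, the function name and the toy data are assumptions made for illustration. Points on the wrong side of their margin add a penalty; a hard margin corresponds to requiring every penalty to be zero.

```python
def soft_margin_objective(w, b, data, C=1.0):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y * (w.x + b))."""
    hinge = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Zero when the point is on the correct side, outside the margin.
        hinge += max(0.0, 1.0 - y * score)
    return 0.5 * sum(wi * wi for wi in w) + C * hinge

# Toy 1-D data with labels -1 / +1; the point at 3.5 is labelled -1 but
# lies on the +1 side of the hand-picked boundary x = 3.
data = [((1.0,), -1), ((2.0,), -1), ((4.0,), 1), ((5.0,), 1), ((3.5,), -1)]

value = soft_margin_objective((1.0,), -3.0, data, C=1.0)
print(value)
```

Only the misclassified point at 3.5 contributes a penalty here; a larger `C` would punish it more heavily and push the solution towards a hard margin.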

10
Q

What is a hyperplane?

A

A hyperplane in an n-dimensional Euclidean space is a flat, (n - 1)-dimensional subset of that space that divides the space into two disconnected parts. For example, a line in 2-D or a plane in 3-D.

11
Q

Why are some data points in SVMs called support vectors?

A

They are the data points closest to the decision boundary, which support (determine) its position. Training is an optimisation problem: we want to maximise the margin between these points and the boundary.

12
Q

How do SVMs separate data that is not linearly separable by a line?

A

Apply a transformation such as φ(x) = x^2, which adds a second dimension to the feature space: each point x becomes (x, x^2). In the new space the classes can be separated by a straight line.
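A small sketch of this idea on made-up 1-D data, where one class sits between two groups of the other so no single threshold on x separates them. The names `phi` and `classify` and the threshold value 2.0 are chosen by hand for this example.

```python
def phi(x):
    # Map a 1-D point to 2-D by adding x squared as a second feature.
    return (x, x * x)

# Class "inner" lies between two groups of class "outer" on the number line.
data = [(-3.0, "outer"), (-2.0, "outer"), (-0.5, "inner"), (0.0, "inner"),
        (0.5, "inner"), (2.0, "outer"), (3.0, "outer")]

mapped = [(phi(x), label) for x, label in data]

def classify(x, threshold=2.0):
    # In the mapped space the horizontal line x^2 = threshold splits the classes.
    return "outer" if phi(x)[1] > threshold else "inner"

all_correct = all(classify(x) == label for x, label in data)
print(mapped[0], all_correct)
```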

13
Q

What is the kernel trick for?

A

It calculates the relationships between points as if they had been mapped to the higher-dimensional space, without actually transforming the data. This reduces the computation required for SVMs by skipping the explicit mapping.
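To illustrate, the sketch below uses a degree-2 polynomial kernel as an example (the card does not name a specific kernel, so this choice is an assumption). It checks that the kernel, computed in the original 2-D space, gives the same number as a dot product after an explicit mapping to 3-D.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    # Explicit mapping from 2-D to 3-D that matches the kernel below.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    # Degree-2 polynomial kernel: works directly on the original 2-D points.
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))   # transform first, then take the dot product
via_kernel = poly_kernel(x, y)   # same relationship, no transformation
print(explicit, via_kernel)
```

Both routes give the same value (up to floating-point rounding), but the kernel never builds the 3-D vectors.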
