L17 - KNN and Weighted-KNN Flashcards
What type of model is KNN?
A supervised, non-parametric learning model used for classification.
What core assumption does the model work on?
Data points within close proximity are likely to be of the same class.
What is the majority vote concept of KNN?
The new data point is classified based on the majority class of the surrounding data points.
Which distance metric is used to determine similarity?
The Euclidean distance metric.
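The two cards above can be sketched together: a minimal KNN classifier using Euclidean distance and a majority vote over the K nearest neighbours. Function names (`euclidean`, `knn_classify`) are illustrative, not from the cards.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line (Euclidean) distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, new_point, k=3):
    # train: list of (features, label) pairs.
    # Sort training points by distance to the new point; keep the k nearest.
    neighbours = sorted(train, key=lambda p: euclidean(p[0], new_point))[:k]
    # Majority vote over the k nearest labels.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify(train, (1.5, 1.5), k=3))  # -> A
```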
What is K? What type of parameter is it?
A hyper-parameter of the algorithm, representing the number of surrounding points assessed in the majority vote.
What are the 2 main issues that can skew the classification performance?
Outliers - If a class has outliers close to a cluster of a different class, this can cause incorrect classification of new data.
Class imbalance - If one class count heavily outweighs another, this can cause incorrect classification.
What is the solution to Outliers and Class Imbalance?
Weighting data points by the inverse of their distance to the new data point.
Why is the selection of K so important?
Determines the number of neighbours to assess.
It must be chosen carefully: too small a K makes the model sensitive to noise (overfitting), while too large a K smooths over class boundaries (under-fitting).
Why should K be an odd number?
To avoid classification ties.
If there are a high number of outliers, should a high or low K be chosen? Give reason…
High K.
To compensate for the outliers by including a wider spread of data points in the vote.
This ensures nearby class clusters are assessed, which will outweigh the outlier points.
What are the 2 main methods for choosing K? Explain each…
Incremental: Start with K = 1 and increment by 1. At each increment, run a classification test on held-out test data to measure performance with that K.
Square Root: K = the square root of the number of training data points.
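A minimal sketch of the square-root method, combined with the odd-K rule from the earlier card (the function name `square_root_k` is illustrative):

```python
import math

def square_root_k(n_samples):
    # K = square root of the number of training points,
    # bumped to the nearest odd integer to avoid classification ties.
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1

print(square_root_k(100))  # -> 11 (sqrt(100) = 10, even, so bump to 11)
```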
What is the purpose of applying Weighted KNN? How does it work?
Compensates for outliers and class imbalance by assigning a higher weight to nearer points.
All K points are assessed. Points of the same class have their weight summed. New data point is assigned to the class with the greatest weight.
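The steps above can be sketched as follows, assuming inverse-distance weights of the form 1/d (the epsilon guard and function name `weighted_knn_classify` are illustrative additions):

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_knn_classify(train, new_point, k=3):
    # Take the k nearest neighbours of the new point.
    neighbours = sorted(
        ((euclidean(feat, new_point), label) for feat, label in train)
    )[:k]
    weights = defaultdict(float)
    for dist, label in neighbours:
        # Weight each neighbour by the inverse of its distance; the small
        # epsilon guards against division by zero when the new point
        # coincides with a training point.
        weights[label] += 1.0 / (dist + 1e-9)
    # Assign the class with the greatest summed weight.
    return max(weights, key=weights.get)

# The single nearby "A" point outweighs the two distant "B" points,
# whereas a plain majority vote with k=3 would have chosen "B".
train = [((0, 0), "A"), ((10, 10), "B"), ((10, 11), "B")]
print(weighted_knn_classify(train, (1, 1), k=3))  # -> A
```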
What are some things to consider when using KNN or Weighted-KNN?
Training Data Size - Performance degrades on large data sets, since each prediction requires computing the distance to every training point. Complexity grows with training-set size.
Normalisation - All features should be normalised to the range 0 to 1, so that features with larger scales do not dominate the distance calculation.
Dimensionality - Both work better in lower dimensions, so the feature space should be reduced where possible, e.g. through feature selection.
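The normalisation point can be sketched with min-max scaling, which maps a feature to the 0-1 range mentioned above (the function name `min_max_normalise` is illustrative):

```python
def min_max_normalise(values):
    # Rescale one feature's values to [0, 1] so that no single feature
    # dominates the distance calculation purely because of its scale.
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it all to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalise([10, 20, 30, 40]))  # -> [0.0, 0.333..., 0.666..., 1.0]
```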
What are some advantages and disadvantages of KNN and WKNN?
Advantages:
- Simple to implement
- Adaptable
- Few hyper-parameters
Disadvantages:
- Computationally expensive on large data
- Doesn’t perform well on high-dimensional data
- Prone to overfitting
What are the 2 hyper-parameters of KNN?
K
Distance Metric