5-KNN Flashcards

Question 1

Q

What is the KNN algorithm?

Answer

A

1. Store all training data points
2. Compute the distance from the test instance to every training data point
3. Find the K closest data points
4. Predict the label (class or value) from those nearest data points
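
The four steps can be sketched in plain Python. This is a minimal illustration, assuming numeric feature vectors, Euclidean distance and a simple majority vote for the final step. The function names and the toy data are hypothetical.

```python
from collections import Counter
import math

def euclidean(a, b):
    # Straight-line distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3, dist=euclidean):
    # Step 1: the training data is simply kept (train_X, train_y); nothing is fitted
    # Step 2: compute the distance from the query to every training point
    distances = [(dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: predict by majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]]
train_y = ["red", "red", "blue", "blue"]
print(knn_predict(train_X, train_y, [1.1, 1.0], k=3))  # red
```

Every prediction scans the whole training set, which is why KNN has no training cost but an expensive query step.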

Question 2

Q

How are data points represented in KNN?

Answer

A

Each point is a feature vector
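
A small sketch of turning one record into a feature vector, assuming a hypothetical record with two numeric attributes (`height`, `weight`) and one nominal attribute (`colour`) that is one-hot encoded:

```python
COLOURS = ("red", "green", "blue")  # hypothetical set of possible nominal values

def to_feature_vector(record):
    # Numeric attributes are used as-is
    numeric = [record["height"], record["weight"]]
    # The nominal 'colour' attribute is one-hot encoded: one 0/1 slot per possible value
    one_hot = [1 if record["colour"] == c else 0 for c in COLOURS]
    return numeric + one_hot

print(to_feature_vector({"height": 1.8, "weight": 75, "colour": "green"}))  # [1.8, 75, 0, 1, 0]
```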

Question 3

Q

What is Hamming distance?

Answer

A

Number of positions at which two equal-length strings (or vectors) differ. E.g. counting differences between nominal attributes once they are one-hot encoded
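
A minimal sketch of Hamming distance as a positional count; the function name is illustrative:

```python
def hamming(a, b):
    # Count the positions at which two equal-length sequences differ
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming("karolin", "kathrin"))  # 3
# One-hot encoded colour attribute: red = [1, 0, 0], blue = [0, 0, 1]
print(hamming([1, 0, 0], [0, 0, 1]))  # 2
```

After one-hot encoding, a single mismatched nominal attribute contributes 2 to the distance, because two positions change.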

Question 4

Q

What is simple matching distance?

Answer

A

1 - (number of matching features / total number of features)
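
A short sketch of simple matching distance over nominal feature vectors; the example attributes are hypothetical:

```python
def simple_matching_distance(a, b):
    # 1 - (number of matching features / total number of features)
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 1 - matches / len(a)

x = ["red", "small", "round", "smooth"]
y = ["red", "large", "round", "rough"]
print(simple_matching_distance(x, y))  # 0.5
```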

Question 5

Q

What is Jaccard distance?

Answer

A

d(A,B) = 1 - |A ∩ B| / |A ∪ B|

If nominal features are one-hot encoded, each set consists of the features encoded as 1
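
A minimal sketch of Jaccard distance on sets, with a hypothetical helper that turns a one-hot vector into the set of positions encoded as 1. Returning 0 for two empty sets is an assumed convention, since the formula is undefined there.

```python
def jaccard_distance(a, b):
    # a and b are sets: 1 - |A ∩ B| / |A ∪ B|
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets count as identical
    return 1 - len(a & b) / len(union)

def one_hot_to_set(vector):
    # Hypothetical helper: the set of feature indices whose one-hot value is 1
    return {i for i, v in enumerate(vector) if v == 1}

A = one_hot_to_set([1, 1, 0, 1, 0])  # {0, 1, 3}
B = one_hot_to_set([1, 0, 0, 1, 1])  # {0, 3, 4}
print(jaccard_distance(A, B))  # 0.5
```

Unlike simple matching, positions that are 0 in both vectors are ignored, which suits sparse one-hot data.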

Question 6

Q

What is Manhattan distance?

Answer

A

d(a,b) = sum from i = 1 to N of |a_i - b_i|
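
The formula translates directly into Python; a minimal sketch:

```python
def manhattan(a, b):
    # Sum of absolute differences between corresponding coordinates
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan([1, 2, 3], [4, 0, 3]))  # 5
```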

Question 7

Q

What is Euclidean distance?

Answer

A

d(a,b) = sqrt(sum from i = 1 to N of (a_i - b_i)^2)
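
A minimal sketch of the formula, checked against Python's built-in `math.dist` (available from Python 3.8), which computes the same Euclidean distance:

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean([0, 0], [3, 4]))  # 5.0
print(math.dist([0, 0], [3, 4]))  # 5.0
```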

Question 8

Q

What is cosine distance?

Answer

A

d(a,b) = 1 - cos(a,b)
= 1 - sum(a_i b_i) / (sqrt(sum(a_i^2)) · sqrt(sum(b_i^2)))
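
A minimal sketch of cosine distance. Raising an error for zero vectors is a choice made here, because the formula divides by the vector norms.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # The cosine is undefined when either vector has zero length
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1 - dot / (norm_a * norm_b)

print(cosine_distance([1, 0], [0, 1]))  # 1.0 (orthogonal)
print(cosine_distance([1, 2], [2, 4]))  # approximately 0 (same direction)
```

Because only direction matters, vectors that differ by a positive scale factor have distance close to 0.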

Question 9

Q

What are normalised ranks?

Answer

A

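For ordinal values:
1. Sort the values into their natural order and assign each a rank
2. Map the ranks to evenly spaced values between 0 and 1, e.g. with ranks 0 to 4 the values are 0, 1/4, 1/2, 3/4 and 1
3. Compare the mapped values using a distance function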
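
A minimal sketch of normalised ranks, assuming the ordinal categories are supplied already sorted from lowest to highest. The rating scale is a hypothetical example.

```python
def normalised_ranks(ordered_values):
    # ordered_values lists the ordinal categories from lowest to highest
    # Rank r (starting at 0) maps to r / highest rank, spacing values evenly in [0, 1]
    top = len(ordered_values) - 1
    return {v: (r / top if top else 0.0) for r, v in enumerate(ordered_values)}

scale = normalised_ranks(["very poor", "poor", "fair", "good", "excellent"])
print(scale)  # {'very poor': 0.0, 'poor': 0.25, 'fair': 0.5, 'good': 0.75, 'excellent': 1.0}
# Compare two ordinal values with a numeric distance, here the absolute difference
print(abs(scale["poor"] - scale["excellent"]))  # 0.75
```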

Question 10

Q

How can the label be chosen from the K nearest neighbours?

Answer

A

Majority voting

Inverse distance weighting: w_i = 1/(d_i + ε), where a small ε avoids division by zero when a training instance matches the test instance exactly

Inverse linear distance: w_i = (d_max - d_i)/(d_max - d_min), where d_max and d_min are the largest and smallest distances among the K neighbours
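
The three schemes differ only in the weight each neighbour contributes to its class. A minimal sketch, assuming the K neighbours arrive as `(distance, label)` pairs; the function name and the `scheme` strings are hypothetical. Giving every neighbour weight 1 when all distances are equal is an assumed fallback for the inverse linear scheme, whose formula is undefined in that case.

```python
from collections import defaultdict

def weighted_vote(neighbours, scheme="majority", eps=1e-6):
    # neighbours: list of (distance, label) pairs for the K nearest points
    dists = [d for d, _ in neighbours]
    d_min, d_max = min(dists), max(dists)
    scores = defaultdict(float)
    for d, label in neighbours:
        if scheme == "majority":
            w = 1.0
        elif scheme == "inverse":
            w = 1.0 / (d + eps)
        elif scheme == "linear":
            # Assumed fallback: equal distances give every neighbour full weight
            w = (d_max - d) / (d_max - d_min) if d_max > d_min else 1.0
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        scores[label] += w
    return max(scores, key=scores.get)

neighbours = [(0.5, "A"), (2.0, "B"), (2.5, "B")]
print(weighted_vote(neighbours, "majority"))  # B
print(weighted_vote(neighbours, "inverse"))   # A
print(weighted_vote(neighbours, "linear"))    # A
```

The example shows why weighting matters: two distant "B" neighbours outvote one very close "A" under majority voting, but not once closer points count for more.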

Question 11

Q

What is the impact of k in KNN?

Answer

A

A lower K gives a jagged decision boundary that follows individual (possibly noisy) points. A higher K gives a smoother decision boundary, but too high a K blurs genuine class boundaries

Question 12

Q

What if more than K neighbours share the same distance?

Answer

A

Pick randomly among the tied points

Change the distance metric
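
A sketch of random selection when several points tie at the K-th distance: every point strictly closer is kept, and the remaining slots are filled by sampling from the tied points. It assumes `k` is at most the number of training points; the function name is hypothetical.

```python
import random

def k_nearest_with_random_ties(distances, k, rng=random):
    # distances: list of (distance, label) pairs; assumes 1 <= k <= len(distances)
    ordered = sorted(distances, key=lambda p: p[0])
    kth = ordered[k - 1][0]
    # Points strictly closer than the K-th distance are always kept
    closer = [p for p in ordered if p[0] < kth]
    # Fill the remaining slots with a random sample of the tied points
    tied = [p for p in ordered if p[0] == kth]
    return closer + rng.sample(tied, k - len(closer))

pairs = [(1.0, "A"), (2.0, "B"), (2.0, "C"), (2.0, "D"), (3.0, "E")]
print(k_nearest_with_random_ties(pairs, 2, random.Random(0)))
```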

Question 13

Q

How can ties be broken if two classes are equally likely?

Answer

A

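Avoid an even K (with two classes and unweighted voting, an odd K cannot produce a tie)

Break the tie randomly

Choose the class with the highest prior probability (the most frequent class in the training data)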
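
A sketch of the random and prior-based tie breaks, estimating each class prior from its frequency in the training labels. The function name, the `strategy` values and the cat/dog data are hypothetical.

```python
from collections import Counter
import random

def vote_with_tie_break(neighbour_labels, train_labels, strategy="prior", rng=random):
    votes = Counter(neighbour_labels)
    best = max(votes.values())
    tied = [label for label, count in votes.items() if count == best]
    if len(tied) == 1:
        return tied[0]
    if strategy == "random":
        return rng.choice(tied)
    # "prior": pick the tied class that is most frequent in the training data
    priors = Counter(train_labels)
    return max(tied, key=lambda label: priors[label])

train_labels = ["cat"] * 70 + ["dog"] * 30
print(vote_with_tie_break(["cat", "dog", "dog", "cat"], train_labels))  # cat
```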

Question 14

Q

What are the pros of KNN?

Answer

A

It is intuitive
Supports classification and regression
Makes no assumptions about the underlying data distribution
No training phase

Question 15

Q

What are the cons of KNN?

Answer

A

Difficult to choose best distance function
Difficult to choose right K
Expensive at prediction time with large data sets, since each test instance is compared with every training point

Question 16

Q

What is lazy learning?

Answer


A

Lazy learning is also known as instance-based learning.

Training data is stored and test instances are compared to the training data. No model is built in advance; all the work happens at prediction time.

Question 17

Q

What is eager learning?

Answer


A

A model is trained on labelled training instances and generalises from seen to unseen data. The trained model then predicts the labels of test instances.
