Nearest Neighbour Flashcards
What is Voronoi tessellation?
Partitions space into regions such that every point in a region is closer to that region's training point than to any other training point
What is a Voronoi cell?
A single region in a Voronoi tessellation
What is the decision boundary for 1 nearest neighbor?
The edges of all pairs of Voronoi cells that contain training examples from different classes
kNN classification algorithm
- Compute distance D(x, xi) to every training example
- Select k closest instances
- Output the most frequent class
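The three steps above can be sketched in Python (a minimal illustration; the name `knn_classify` and the Euclidean distance choice are my own):

```python
from collections import Counter
import math

def knn_classify(x, train_X, train_y, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    # Compute distance D(x, xi) to every training example
    dists = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Select the k closest instances
    dists.sort(key=lambda t: t[0])
    nearest = [yi for _, yi in dists[:k]]
    # Output the most frequent class among them
    return Counter(nearest).most_common(1)[0][0]
```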
kNN regression algorithm
- Compute distance D(x, xi) to every training example
- Select k closest instances
- Output the mean target of the k closest instances
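The regression variant differs only in the last step, averaging targets instead of voting (a sketch; `knn_regress` is a name of my choosing):

```python
import math

def knn_regress(x, train_X, train_y, k=3):
    """Predict for x by averaging the targets of its k nearest neighbors."""
    # Sort training examples by distance to x, keeping their targets
    dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))
    # Output the mean target of the k closest instances
    return sum(yi for _, yi in dists[:k]) / k
```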
What is interpolation?
Prediction within range of the training examples
What is extrapolation?
Prediction outside the range of training examples
What happens when you pick small values for k in kNN?
Small changes in training set produce large changes in classification
What happens when you pick very large values for k in kNN?
Everything is classified as the most probable class, i.e. the class with the highest prior P(y)
How can we choose the right value for k in kNN?
Train kNN for k = 1, 2, 3, …, then pick the model with the smallest error on the validation set
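That selection loop can be sketched as follows (self-contained illustration; the helper `knn_predict` and the name `choose_k` are my own, and the error measure is plain misclassification rate):

```python
from collections import Counter
import math

def knn_predict(x, X, y, k):
    """1..k nearest-neighbor majority-vote classifier."""
    d = sorted((math.dist(x, xi), yi) for xi, yi in zip(X, y))
    return Counter(yi for _, yi in d[:k]).most_common(1)[0][0]

def choose_k(X_tr, y_tr, X_val, y_val, max_k=10):
    """Return the k in 1..max_k with the lowest validation error."""
    best_k, best_err = 1, float('inf')
    for k in range(1, max_k + 1):
        # Fraction of validation examples misclassified with this k
        err = sum(knn_predict(x, X_tr, y_tr, k) != y
                  for x, y in zip(X_val, y_val)) / len(X_val)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```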
What's the definition of D(x, x') if D is the Euclidean distance function and x has d dimensions?
D(x, x') = sqrt( Σ_{i=1..d} (x_i - x'_i)^2 )
What's the definition of D(x, x') if D is the Hamming distance function and x has d dimensions?
D(x, x') = Σ_{i=1..d} 1[x_i ≠ x'_i], i.e. the number of dimensions in which x and x' differ
What's the definition of D(x, x') if D is the Minkowski (p-norm) distance function and x has d dimensions?
D(x, x') = ( Σ_{i=1..d} |x_i - x'_i|^p )^(1/p)
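The three distance functions asked about above can be written directly from their definitions (a sketch; function names are my own):

```python
def euclidean(x, xp):
    """sqrt of the sum of squared per-dimension differences."""
    return sum((a - b) ** 2 for a, b in zip(x, xp)) ** 0.5

def hamming(x, xp):
    """Number of dimensions in which the two points differ."""
    return sum(a != b for a, b in zip(x, xp))

def minkowski(x, xp, p):
    """p-th root of the sum of |differences| raised to the p."""
    return sum(abs(a - b) ** p for a, b in zip(x, xp)) ** (1 / p)
```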
What distance function is obtained when p = 2 in the p-norm?
Euclidean
What distance function is obtained when p = 1 in the p-norm?
Manhattan
How does the p-norm behave when p approaches 0?
Like a logical AND: every nonzero dimension contributes equally, so the distance is small only if the points agree in every dimension
How does the p-norm behave when p approaches infinity?
Like a logical OR: the largest single-dimension difference dominates (max norm)
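A quick numeric check of both limits (illustrative only; `pnorm` is my own helper):

```python
def pnorm(v, p):
    return sum(abs(x) ** p for x in v) ** (1 / p)

v = [1.0, 3.0]
# Large p: the norm is dominated by the largest component (OR-like)
large_p = pnorm(v, 100)                        # close to max(v) = 3.0
# Small p: |x|^p -> 1 for every nonzero x, so each dimension
# contributes equally (AND-like); the sum counts nonzero entries
small_p_sum = sum(abs(x) ** 0.001 for x in v)  # close to 2.0
```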
How can you resolve ties in kNN? (4)
- randomly
- prior (class with greatest probability)
- nearest: use 1-nn
- use odd k (doesn't resolve ties in multiclass problems)
What is a reasonable choice to fill in missing values (kNN)?
Mean of the value across entire data set
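Mean imputation over the whole dataset can be sketched like this (illustrative; `None` marks a missing value and `impute_mean` is my own name):

```python
def impute_mean(X):
    """Replace None entries with that column's mean over the entire dataset."""
    n_dims = len(X[0])
    means = []
    for j in range(n_dims):
        # Mean of the observed (non-missing) values in column j
        vals = [row[j] for row in X if row[j] is not None]
        means.append(sum(vals) / len(vals))
    return [[row[j] if row[j] is not None else means[j]
             for j in range(n_dims)]
            for row in X]
```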
How does Parzen Windows differ from kNN?
Instead of picking the k nearest neighbors, Parzen windows looks at all the training examples that lie within a fixed radius of the point
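The fixed-radius idea can be sketched as a classifier (my own minimal version; real Parzen windows typically also weight points with a kernel):

```python
from collections import Counter
import math

def parzen_classify(x, X, y, radius):
    """Vote among all training examples within a fixed radius of x."""
    inside = [yi for xi, yi in zip(X, y) if math.dist(x, xi) <= radius]
    if not inside:          # no training example falls inside the window
        return None
    return Counter(inside).most_common(1)[0][0]
```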
How can kNN be modified to use a kernel?
Replace the distance function with a kernel K(x', x)
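One standard way to turn a kernel into a distance (a sketch, not necessarily the construction the card intends) uses the kernel-induced feature-space distance D(x, x')^2 = K(x, x) - 2 K(x, x') + K(x', x'); the RBF kernel here is just an example choice:

```python
import math

def rbf_kernel(x, xp, gamma=1.0):
    """Example kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xp)))

def kernel_distance(x, xp, K=rbf_kernel):
    # Distance in the kernel's implicit feature space:
    # D(x, x')^2 = K(x, x) - 2 K(x, x') + K(x', x')
    # max(..., 0) guards against tiny negative values from rounding
    return max(K(x, x) - 2 * K(x, xp) + K(xp, xp), 0.0) ** 0.5
```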
What are the cons of kNN? (4)
- Need to handle missing data (fill in or special distance function)
- Sensitive to class outliers
- Sensitive to irrelevant attributes (affects distance)
- Computationally expensive
What is the runtime complexity of kNN? (testing)
O(nd)
n - training examples
d - dimensions
How can we reduce d in kNN?
Dimensionality reduction
For what datasets are K-D trees effective (kNN)?
low-dimensional, real-valued data
For what datasets are inverted lists effective (kNN)?
high-dimensional, discrete (sparse) data (e.g. text)
For what datasets is locality-sensitive hashing effective (kNN)?
high-dimensional, real-valued or discrete
Which methods are inexact at finding neighbors in kNN?
K-D trees and locality-sensitive hashing
Which methods are exact at finding neighbors in kNN?
Inverted lists
How are K-D trees built from training data?
- Pick a random dimension
- Find median
- Split data
- Repeat recursively until each node contains the desired number of points
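The build steps above can be sketched recursively (a minimal dict-based tree of my own design; it cycles through dimensions deterministically for reproducibility, where the card allows a random dimension):

```python
def build_kdtree(points, depth=0, leaf_size=1):
    """Recursively split points at the median of the chosen dimension."""
    if len(points) <= leaf_size:
        return {'leaf': points}           # node holds few enough points
    # Cycle through dimensions; picking one at random also works
    dim = depth % len(points[0])
    # Find the median along that dimension and split the data there
    points = sorted(points, key=lambda p: p[dim])
    mid = len(points) // 2
    return {
        'dim': dim,
        'value': points[mid][dim],
        'left': build_kdtree(points[:mid], depth + 1, leaf_size),
        'right': build_kdtree(points[mid:], depth + 1, leaf_size),
    }
```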
How is a dataset split with locality-sensitive hashing?
Create k random hyperplanes h1, …, hk that split the dataset into 2^k regions
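The hyperplane hashing can be sketched as follows (an illustrative version with hyperplanes through the origin; the function names are my own):

```python
import random

def random_hyperplanes(k, d, seed=0):
    """k random hyperplanes in d dimensions, given by their normal vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]

def lsh_hash(x, planes):
    """Hash x into one of 2^k buckets: one bit per hyperplane side."""
    bits = 0
    for h in planes:
        # Which side of the hyperplane is x on? (sign of the dot product)
        side = sum(a * b for a, b in zip(h, x)) >= 0
        bits = (bits << 1) | int(side)
    return bits
```

Points on the same side of every hyperplane land in the same bucket, so nearby points tend to collide while far-apart points usually do not.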