Non-parametric classification Flashcards

1
Q

Difference between parametric and non-parametric models

A

Parametric model:
A model parametrization is assumed, and the data are used to find the best values of the parameters.

Non-parametric model:
No structure is assumed for the hypothesis set; classification is obtained by other criteria, for example similarities among the data points.

2
Q

Why is non-parametric classification useful?

A

Sometimes the data are not linearly separable, so linear classification can’t be used; moreover, finding a non-linear transform may be difficult. In these cases, a non-parametric model can be used for classification.

3
Q

Nearest neighbor: rule

A

Given a dataset
{(x1, y1), …, (xN, yN)}
where yn = ±1 for every n,

to classify a new point x*, find the nearest point to x* in the dataset and assign its label to x*.

This simple method induces a Voronoi tessellation of the regressor space, which is much more complex than the linear separation given by the perceptron.
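
A minimal sketch of this rule in Python/NumPy (function and variable names are illustrative, not from the source):

    import numpy as np

    def nn_classify(x_star, X, y):
        """1-NN rule: assign to x_star the label of its nearest point in the dataset.

        X: (N, d) array of data points; y: (N,) array of labels in {-1, +1}.
        """
        # Euclidean distance from x_star to every point in the dataset
        dists = np.linalg.norm(X - x_star, axis=1)
        return y[np.argmin(dists)]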

4
Q

How to define a measure of distance between points

A

Euclidean distance:
d(x, x’) = || x - x’ ||

Weighted Euclidean distance:
d(x, x’) = (x - x’)^T Q (x - x’), with Q a positive-definite weighting matrix

Cosine similarity (a similarity measure, not a distance):
cosim(x, x’) = x^T x’ / ( ||x|| * ||x’|| )
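
A hedged NumPy sketch of the three measures (function names are mine, not from the source):

    import numpy as np

    def euclidean(x, xp):
        return np.linalg.norm(x - xp)

    def weighted_euclidean(x, xp, Q):
        # Q is a (d, d) positive-definite weighting matrix
        diff = x - xp
        return diff @ Q @ diff

    def cosine_similarity(x, xp):
        return (x @ xp) / (np.linalg.norm(x) * np.linalg.norm(xp))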

5
Q

Nearest neighbor: algorithm

A

Given x*:

  • reorder the data according to their distance from x*:
    d(x*, x[1]) ≤ d(x*, x[2]) ≤ … ≤ d(x*, x[N])
  • the label assigned to x* is
    h(x*) = y[1]
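
A possible NumPy sketch of this ordering-based formulation (names are illustrative); it keeps the full ordering, which the k-NN generalization reuses:

    import numpy as np

    def nn_by_ordering(x_star, X, y):
        # indices of the data points sorted by distance from x*:
        # d(x*, x[1]) <= d(x*, x[2]) <= ... <= d(x*, x[N])
        order = np.argsort(np.linalg.norm(X - x_star, axis=1))
        # the label assigned to x* is that of the closest point, y[1]
        return y[order[0]]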
6
Q

Nearest neighbor: basic assumptions

A
  • there is a point in the dataset close to x*
  • the target function f(x) is smooth enough, so that the classification of the nearest neighbor is indicative of the classification of x*
7
Q

k-nearest neighbors: need

A

Since the data are noisy, relying on a single point is not recommended; instead, a k-NN algorithm can be used.

8
Q

k-nearest neighbors: rule

A
  • it’s a generalization of the nearest-neighbor rule
  • h(x) = sign( sum(i=1..K) y[i](x) ), where y[i](x) is the label of the i-th nearest neighbor of x
  • the label of a new point is the one shared by the majority of its K nearest points (sketched below)

K = 1 → nearest neighbor
K = N → the label of a new point x* is that of the majority of the points in the dataset
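
A minimal k-NN sketch in NumPy (illustrative names; assumes K is odd so the ±1 vote has no ties):

    import numpy as np

    def knn_classify(x_star, X, y, K):
        # order the data points by distance from x*
        order = np.argsort(np.linalg.norm(X - x_star, axis=1))
        # sign of the sum of the K nearest labels = majority label
        return int(np.sign(np.sum(y[order[:K]])))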

9
Q

Theoretical theorem on the choice of K

A

For N → +∞, if K(N) → +∞ and K(N)/N → 0, then
1) Ein(h) → Eout(h)
2) Eout(h) → Eout*, the minimum achievable out-of-sample error

10
Q

Practical rule for the choice of K

A

K(N) = floor( sqrt(N) )
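
A tiny numeric check (my addition) that this rule satisfies the conditions of the theorem above, i.e. K(N) grows without bound while K(N)/N → 0:

    import math

    def practical_K(N):
        # K(N) = floor(sqrt(N))
        return math.floor(math.sqrt(N))

    for N in (100, 10_000, 1_000_000):
        K = practical_K(N)
        print(N, K, K / N)  # K keeps growing while K/N shrinks toward 0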

11
Q

Pros and cons of K-NN

A
  • No training! → no training cost
  • High testing cost for prediction:
    memory requirement is O(Nd)
    computational complexity is O(Nd + N log K)
  • Exploratory data analysis deals with the efficient solution of K-NN problems