Non-parametric classification Flashcards
Difference between parametric and non-parametric models
Parametric model:
a model parametrization is assumed, and the data are used to find the best values of the parameters
Non-parametric model:
No structure is assumed for the hypothesis set; classification is obtained from other criteria, for example similarity among the data points
Why is non-parametric classification useful?
Sometimes the data are not linearly separable, so linear classification cannot be used; moreover, finding a suitable non-linear transform may be difficult. In these cases, a non-parametric model can be used for classification.
Nearest neighbor: rule
given a dataset
{(x1,y1),…,(xN,yN)}
where yn = ±1 for every n,
to classify a new point x*, find the point in the dataset nearest to x* and assign its label to x*.
This simple method yields a Voronoi tessellation of the regressor space, which is much more complex than the linear division given by the perceptron.
How to define a measure of distance between points
Euclidean distance:
d(x, x’) = || x - x’ ||
Weighted Euclidean distance:
d(x, x’) = (x - x’)’ Q (x - x’), with Q a symmetric positive-definite weighting matrix
Cosine similarity:
cosim(x, x’) = x · x’ / ( ||x|| ||x’|| ) (a similarity rather than a distance: larger values mean the points are more similar)
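A minimal NumPy sketch of these three measures (the points x, xp and the weighting matrix Q below are made-up examples):

import numpy as np

def euclidean(x, xp):
    # Euclidean distance ||x - x'||
    return np.linalg.norm(x - xp)

def weighted_euclidean(x, xp, Q):
    # Weighted distance (x - x')' Q (x - x'), with Q symmetric positive definite
    diff = x - xp
    return diff @ Q @ diff

def cosine_similarity(x, xp):
    # Cosine similarity x . x' / (||x|| ||x'||); larger means more similar
    return (x @ xp) / (np.linalg.norm(x) * np.linalg.norm(xp))

x = np.array([1.0, 2.0])
xp = np.array([2.0, 0.0])
Q = np.diag([1.0, 0.5])   # assumed example weighting matrix
print(euclidean(x, xp), weighted_euclidean(x, xp, Q), cosine_similarity(x, xp))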
Nearest neighbor: algorithm
given x*
- reorder the data according to their distance from x*:
d(x*, x[1]) <= d(x*, x[2]) <= … <= d(x*, x[N])
- the label assigned to x* is
h(x*) = y[1]
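A minimal NumPy sketch of this rule, assuming a made-up toy dataset with labels in {-1, +1}:

import numpy as np

def nearest_neighbor(X, y, x_star):
    # 1-NN: return the label of the training point closest to x_star
    dists = np.linalg.norm(X - x_star, axis=1)   # Euclidean distances d(x_star, x_n)
    return y[np.argmin(dists)]                   # label y[1] of the nearest point

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
y = np.array([-1, -1, +1])
print(nearest_neighbor(X, y, np.array([2.5, 2.5])))   # prints 1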
Nearest neighbor: basic assumptions
- there is a training point near x*
- the target function f(x) is smooth enough, so that the classification of the nearest neighbor is indicative of the classification of x*
k-nearest neighbors: need
Since data are noisy, relying on a single point is not recommended, so a k-NN algorithm can be used
k-nearest neighbors: rule
- it’s a generalization of the nearest-neighbor rule
- h(x*) = sign( sum(i=1,K) y[i](x*) ), where y[i](x*) is the label of the i-th nearest point to x*
→ the label of a new point is that of the majority of its K nearest points (see the sketch below)
K=1 → nearest neighbor
K=N → the label of a new point x* is that of the majority of the points in the dataset
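A minimal NumPy sketch of the k-NN rule, again on a made-up toy dataset (K should be odd to avoid ties in the vote):

import numpy as np

def knn_classify(X, y, x_star, K):
    # h(x_star) = sign of the sum of the labels of the K nearest training points
    dists = np.linalg.norm(X - x_star, axis=1)   # distances to all training points
    nearest = np.argsort(dists)[:K]              # indices of the K closest points
    return np.sign(np.sum(y[nearest]))           # majority vote for labels in {-1, +1}

X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [3.0, 3.0], [3.5, 3.5]])
y = np.array([-1, -1, -1, +1, +1])
print(knn_classify(X, y, np.array([2.8, 2.8]), K=3))   # prints 1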
Theoretical theorem on the choice of K
For N → +∞, if K(N) → +∞ and K(N)/N → 0
then
1) Ein(h) → Eout(h)
2) Eout(h) → Eout*, the minimum achievable out-of-sample error
Practical rule for the choice of K
K(N) = floor( sqrt(N) )
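A one-line sketch of this rule for an assumed dataset size N:

import numpy as np

N = 200                          # assumed number of training points
K = int(np.floor(np.sqrt(N)))    # K(N) = floor(sqrt(N))
print(K)                         # prints 14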
Pros and cons of K-NN
- No training! → no training cost
- High cost at prediction time:
  memory requirement is O(Nd)
  computational complexity is O(Nd + N log K) per query
- Exploratory data analysis deals with the efficient solution of K-NN problems