Lecture 5 - Distance based models Flashcards
What are distance based algorithms
Distance based algorithms are machine learning algorithms that classify instances by computing distances between these instances and a number of internally stored exemplars.
____ that are closest to the instance have the largest influence on the classification assignment to the instance
Exemplars
What is hamming distance
Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different.
110 and 101 hamming distance 2.
What is 0-norm,1-norm and 2-norm give examples.
- Hamming distance
- Manhattan distance
- Euclidean distance
What is the Chebyshev distance
The Chebyshev distance, also known as the maximum or Lā distance, between two points in a space is the greatest of their differences along any coordinate dimension.
what are the 4 distance metric conditions
- Distances between a point and itself are 0
- All other distances are larger than zero if x!=y.
- Distances are symmetric
- Detours can not shorten the distance (tringle inequality)
What are exemplars
Exemplars are prototypical instances within clusters/classes
What are the two exemplars
- Centroid
- Medoid
Which one happens in data and which one doesnt
Centroids do not happen in data
Medoids happen in data, more time consuming to calculate
Since the number of classes is typically much lower than the number of exemplars, decision rules often take more than one nearest exemplar into account (or k-nearest exemplar)
true
What is the curse of dimentionality
In high-dimentional spaces everything is far away from everything and so pairwise distances are uninformative
list nearest-neighbour classifier properties
- Nearly perfect seperation of classes on the training set
- Easily adapted to real-valued targets and structured objects
- Unbalanced complexity
- Perfect seperates training data
- by increasing the number of neighbours k we would increase bias and decrease variance
what are 2 types of distance-based clustering
Predictive clustering
Descriptive clustering
What is predictive clustering
use a distance metric, a way to construct exemplars and a distance-based decision rule to create clusters that are compact with respect to the distance metric
what is descriptive clustering
trees called dendrograms purely defined in terms of a distance measure are used. they partition the given data rather than the entire instance space.