Lecture 5 - Distance based models Flashcards
What are distance-based algorithms?
Distance-based algorithms are machine learning algorithms that classify instances by computing distances between these instances and a number of internally stored exemplars.
____ that are closest to an instance have the largest influence on the class assigned to that instance.
Exemplars
What is Hamming distance?
The Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols are different.
Example: 110 and 101 have a Hamming distance of 2.
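A minimal sketch of the definition above, checked against the 110 / 101 example. The function name `hamming_distance` is only an illustrative choice.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


# Positions 2 and 3 differ, position 1 matches.
print(hamming_distance("110", "101"))  # 2
```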
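What are the 0-norm, 1-norm and 2-norm? Give examples.
- 0-norm: Hamming distance (the number of coordinates in which two points differ)
- 1-norm: Manhattan distance (the sum of the absolute coordinate differences)
- 2-norm: Euclidean distance (the square root of the sum of the squared coordinate differences)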
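The three norms can be sketched as one function with a parameter p. This is an illustrative helper, not a library API; it assumes numeric vectors of equal length and treats p = 0 as a count of the non-zero coordinate differences.

```python
def minkowski_distance(x, y, p):
    """p-norm of the difference x - y; p = 0 counts the non-zero differences."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == 0:
        return sum(1 for d in diffs if d != 0)
    return sum(d ** p for d in diffs) ** (1 / p)


x, y = (0, 0, 0), (3, 4, 0)
print(minkowski_distance(x, y, 0))  # 2    (Hamming: two coordinates differ)
print(minkowski_distance(x, y, 1))  # 7.0  (Manhattan: 3 + 4)
print(minkowski_distance(x, y, 2))  # 5.0  (Euclidean: sqrt(9 + 16))
```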
What is the Chebyshev distance?
The Chebyshev distance, also known as the maximum or L∞ distance, between two points in a space is the greatest of their differences along any coordinate dimension.
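As a small sketch of this definition (the function name is illustrative), the Chebyshev distance is the largest absolute coordinate difference:

```python
def chebyshev_distance(x, y):
    """Greatest absolute difference along any single coordinate."""
    return max(abs(a - b) for a, b in zip(x, y))


# Coordinate differences are 3, 4 and 0, so the maximum is 4.
print(chebyshev_distance((1, 5, 2), (4, 1, 2)))  # 4
```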
What are the 4 distance metric conditions?
- The distance between a point and itself is 0
- All other distances are larger than zero: if x != y, then the distance between x and y is positive
- Distances are symmetric
- Detours cannot shorten the distance (triangle inequality)
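The four conditions can be checked numerically on a small sample of points. This is only a sanity check on the chosen points, not a proof; the helper names are hypothetical. Squared Euclidean distance is included as a function that fails the triangle inequality.

```python
from itertools import product


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def satisfies_metric_conditions(dist, points):
    """Test the four metric conditions on every triple drawn from points."""
    for x, y, z in product(points, repeat=3):
        if dist(x, x) != 0:                       # distance to itself is 0
            return False
        if x != y and dist(x, y) <= 0:            # other distances are positive
            return False
        if dist(x, y) != dist(y, x):              # symmetry
            return False
        if dist(x, z) > dist(x, y) + dist(y, z):  # triangle inequality
            return False
    return True


points = [(0, 0), (1, 0), (2, 0), (-2, 4)]
print(satisfies_metric_conditions(manhattan, points))          # True
print(satisfies_metric_conditions(squared_euclidean, points))  # False
```

The squared Euclidean distance fails because going from (0, 0) to (2, 0) costs 4, while the detour via (1, 0) costs only 1 + 1.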
What are exemplars?
Exemplars are prototypical instances within clusters/classes
What are the two types of exemplar?
- Centroid
- Medoid
Which one occurs in the data and which one doesn't?
Centroids generally do not occur in the data.
Medoids are actual data points, but they are more time-consuming to calculate.
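A minimal sketch contrasting the two exemplars, assuming Euclidean distance (`math.dist`) and points given as tuples. The function names are illustrative.

```python
import math


def centroid(points):
    """Coordinate-wise mean; usually not one of the data points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points, dist=math.dist):
    """Data point with the smallest total distance to all other points."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


points = [(0, 0), (1, 1), (2, 0), (9, 9)]
print(centroid(points))  # (3.0, 2.5), not in the data
print(medoid(points))    # (1, 1), an actual data point
```

The medoid computation compares every point with every other point, which is where the extra cost comes from.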
True or false: since the number of classes is typically much lower than the number of exemplars, decision rules often take more than one nearest exemplar into account (the k nearest exemplars).
True
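A sketch of a k-nearest-exemplar decision rule using a simple majority vote. It assumes Euclidean distance and exemplars stored as (point, label) pairs; the data and the name `knn_classify` are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(query, exemplars, k=3):
    """Majority vote among the k exemplars closest to the query."""
    nearest = sorted(exemplars, key=lambda e: math.dist(query, e[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


exemplars = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
             ((5, 5), "b"), ((5, 6), "b"), ((1, 1), "b")]

# The single nearest exemplar is the isolated "b" at (1, 1),
# but two of the three nearest exemplars are labelled "a".
print(knn_classify((0.9, 0.9), exemplars, k=1))  # b
print(knn_classify((0.9, 0.9), exemplars, k=3))  # a
```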
What is the curse of dimensionality?
In high-dimensional spaces everything is far away from everything else, so pairwise distances become uninformative.
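One way to see this effect is to compare the nearest and farthest neighbour of a query point as the dimension grows. The sketch below assumes points drawn uniformly from the unit hypercube and uses a hypothetical measure, `relative_contrast`, defined as (max - min) / min over the distances from the query.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the dimension grows: the nearest and the
# farthest point end up at almost the same distance from the query.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 2))
```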
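List the properties of the nearest-neighbour classifier.
- Nearly perfect separation of classes on the training set
- Easily adapted to real-valued targets and structured objects
- Unbalanced complexity: training is cheap (the exemplars are just stored), while classifying a new instance is expensive (it needs distances to the stored exemplars)
- Perfectly separates the training data (for k = 1)
- Increasing the number of neighbours k increases bias and decreases variance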
What are the 2 types of distance-based clustering?
Predictive clustering
Descriptive clustering
What is predictive clustering?
It uses a distance metric, a way to construct exemplars and a distance-based decision rule to create clusters that are compact with respect to the distance metric.
What is descriptive clustering?
It uses trees called dendrograms, which are defined purely in terms of a distance measure. They partition the given data rather than the entire instance space.
What is the k-means clustering problem?
The k-means clustering problem is to find a partition that minimises the total within-cluster scatter.
What are the weak points of k-means?
The result depends on the starting points, and you need to know the number of clusters in advance.
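The dependence on starting points can be illustrated with a sketch of the usual iterative procedure (assign each point to its nearest centroid, then recompute the centroids). It assumes Euclidean distance and initial centroids supplied by the caller; the function name `k_means` and the data are illustrative.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until nothing changes."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (4, 0), (4, 1)]
good = k_means(points, [(0, 0), (4, 0)])
bad = k_means(points, [(2, 0), (2, 1)])
print(good[0])  # [(0.0, 0.5), (4.0, 0.5)]  left/right split
print(bad[0])   # [(2.0, 0.0), (2.0, 1.0)]  bottom/top split, much less compact
```

Both runs use k = 2 on the same data, yet the second one stops at a partition with far larger within-cluster scatter, purely because of where it started.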
What is the complexity of finding a medoid?
Finding a medoid requires us to calculate, for each data point, the total distance to all other data points, in order to choose the point that minimises it. Regardless of the distance metric used, this is an O(n^2) operation for n points.
How can we evaluate centroids?
Inertia: the k-means algorithm aims to choose centroids that minimise the inertia, or within-cluster scatter.
Inertia shows how internally coherent and compact the clusters are.
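A small sketch of inertia, assuming it is measured as the sum of squared Euclidean distances from each point to the centroid of its cluster. The two partitions below are made-up examples.

```python
import math


def inertia(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for cluster in clusters:
        centre = tuple(sum(c) / len(cluster) for c in zip(*cluster))
        total += sum(math.dist(p, centre) ** 2 for p in cluster)
    return total


compact = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]
spread = [[(0, 0), (4, 0)], [(0, 1), (4, 1)]]
print(inertia(compact))  # 1.0
print(inertia(spread))   # 16.0
```

Lower inertia means tighter clusters, so the first partition is the more compact of the two.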
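What is a silhouette?
A silhouette sorts and plots s(x) for each instance, grouped by cluster, where s(x) = (b(x) - a(x)) / max(a(x), b(x)). Here a(x) is the average distance from x to the other members of its own cluster, and b(x) is the average distance from x to the members of the nearest neighbouring cluster.
We want a high value for b and a low value for a, which gives s(x) close to 1.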
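The values a and b can be computed directly. The sketch below assumes Euclidean distance, the common definition s = (b - a) / max(a, b), and clusters that each contain at least two distinct points; it returns the sorted values per cluster rather than drawing the plot.

```python
import math


def mean_distance(x, cluster):
    """Average distance from x to the points of cluster other than x."""
    others = [p for p in cluster if p != x]
    return sum(math.dist(x, p) for p in others) / len(others)


def silhouette_values(clusters):
    """Sorted s(x) values, one list per cluster."""
    values = []
    for i, cluster in enumerate(clusters):
        scores = []
        for x in cluster:
            a = mean_distance(x, cluster)  # own cluster: want this low
            b = min(mean_distance(x, other)  # nearest other cluster: want this high
                    for j, other in enumerate(clusters) if j != i)
            scores.append((b - a) / max(a, b))
        values.append(sorted(scores, reverse=True))
    return values


clusters = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]
for scores in silhouette_values(clusters):
    print([round(s, 3) for s in scores])
```

Values close to 1 indicate an instance that sits well inside its own cluster; values near 0 or below suggest it lies between clusters or is in the wrong one.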
What is hierarchical clustering?
K-means produces a flat clustering. Hierarchical clustering instead provides a taxonomy that shows how instances group within clusters and how clusters can be merged with each other.
What is a dendrogram?
Given a data set D, a dendrogram is a binary tree with the elements of D at its leaves. An internal node of the tree represents the subset of elements in the leaves of the subtree rooted at that node. The level of a node is the distance between the two clusters represented by the children of the node. Leaves have level 0.
What is a linkage function?
A linkage function calculates the distance between arbitrary subsets of the instance space, given a distance metric.
What are the 4 linkage types?
Single linkage
Complete linkage
Average linkage
Centroid linkage
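The four linkage types can be sketched as functions of two clusters, assuming Euclidean distance as the underlying metric and the usual definitions: smallest pairwise distance (single), largest pairwise distance (complete), mean pairwise distance (average) and distance between cluster means (centroid). Function names and data are illustrative.

```python
import math
from itertools import product


def single_linkage(A, B):
    """Smallest distance between a point of A and a point of B."""
    return min(math.dist(a, b) for a, b in product(A, B))


def complete_linkage(A, B):
    """Largest distance between a point of A and a point of B."""
    return max(math.dist(a, b) for a, b in product(A, B))


def average_linkage(A, B):
    """Mean of all pairwise distances between A and B."""
    return sum(math.dist(a, b) for a, b in product(A, B)) / (len(A) * len(B))


def centroid_linkage(A, B):
    """Distance between the means of A and B."""
    def mean(points):
        return tuple(sum(c) / len(points) for c in zip(*points))
    return math.dist(mean(A), mean(B))


A = [(0, 0), (1, 0)]
B = [(3, 0), (3, 4)]
print(single_linkage(A, B))    # 2.0  from (1, 0) to (3, 0)
print(complete_linkage(A, B))  # 5.0  from (0, 0) to (3, 4)
print(round(average_linkage(A, B), 3))
print(round(centroid_linkage(A, B), 3))
```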
What are the best 3 linkage types, in order?
- Complete linkage
- Centroid linkage
- Single linkage