K-Means Clustering Flashcards
Is K-Means Clustering supervised or unsupervised?
K-Means Clustering is an example of unsupervised machine learning.
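Explain supervised versus unsupervised learning.
In supervised learning, the model is trained on labeled data, so the desired output is known in advance. In unsupervised learning, there is no labeled output: the data is analyzed for structure without knowing a specific output you're looking for.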
Name some examples of clustering
Market segmentation, product analysis, etc.
What is a cluster?
A cluster is a collection of objects that are similar to one another.
How do we determine similarity in clustering?
We need a notion of distance: the smaller the distance between two data points, the more similar they are.
What is the objective of clustering?
The objective of clustering is to group similar data points together. Some examples are segmenting customers into similar groups, or automatically organizing files or emails into folders.
How does clustering simplify data?
Clustering simplifies data by reducing many data points to a few clusters.
What are some examples of distance used in clustering?
Examples of common distance measures in clustering are Manhattan distance, Euclidean distance, and Chebyshev distance.
What is the formula for Euclidean distance?
Square root of [ (X1 - X2) squared + (Y1 - Y2) squared + (Z1 - Z2) squared + … ]
where m is the number of columns, with one squared difference per column.
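The Euclidean formula can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name `euclidean_distance` and the sample points are assumptions made for the example.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of columns."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of columns")
    # Sum one squared difference per column, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical points with m = 3 columns.
print(euclidean_distance((0, 0, 0), (3, 4, 0)))  # 5.0
```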
What is the formula for Manhattan distance?
Absolute value of X1 minus X2 plus absolute value of Y1 minus Y2 plus absolute value of Z1 minus Z2, etc.
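As a rough sketch, the Manhattan formula translates to a sum of absolute differences. The function name `manhattan_distance` and the sample points are hypothetical, chosen only for illustration.

```python
def manhattan_distance(a, b):
    """Sum of absolute differences, one per column."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of columns")
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical points: |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0
print(manhattan_distance((1, 2, 3), (4, 0, 3)))  # 5
```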
Why is it called Manhattan distance?
In Manhattan, you cannot walk between two points in a straight line. You must follow the street grid, so the distance is the sum of the blocks traveled in each direction.
How do you calculate Chebyshev or chessboard distance?
Take the maximum of the absolute values of X1 - X2, Y1 - Y2, Z1 - Z2, etc.
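A minimal sketch of the Chebyshev (chessboard) calculation follows. The function name `chebyshev_distance` and the sample points are assumptions made for the example.

```python
def chebyshev_distance(a, b):
    """Largest absolute difference across all columns."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of columns")
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical points: max(|1 - 4|, |2 - 0|, |3 - 3|) = max(3, 2, 0)
print(chebyshev_distance((1, 2, 3), (4, 0, 3)))  # 3
```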
What is the Minkowski distance?
A general formula that uses a parameter P to select the distance measure you want. It is calculated by summing the absolute values of Xi - Yi, each raised to the power of P, and then raising that sum to the power of 1/P.
Manhattan distance uses P equals one, Euclidean distance uses P equals two, and chessboard (Chebyshev) distance is the limit as P goes to infinity.
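The relationship between P and the other distance measures can be checked with a short sketch. The function name `minkowski_distance`, the sample points, and the choice of P = 50 as a stand-in for "very large P" are all assumptions made for illustration.

```python
def minkowski_distance(a, b, p):
    """Sum of |xi - yi| ** p over all columns, raised to the power 1 / p."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)  # hypothetical points

print(minkowski_distance(a, b, 1))   # 7.0, same as Manhattan distance
print(minkowski_distance(a, b, 2))   # 5.0, same as Euclidean distance
# A large P gives a value very close to the Chebyshev distance, max(3, 4) = 4.
print(minkowski_distance(a, b, 50))
```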
What measure of distance does K-Means clustering use?
Euclidean distance
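To show where Euclidean distance enters K-Means, here is a sketch of the assignment step only, in which each point is given to its nearest centroid. It is not a full K-Means implementation (there is no centroid update or iteration), and the helper name `nearest_centroid`, the centroids, and the points are hypothetical.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point by Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]  # math.dist needs Python 3.8+
    return distances.index(min(distances))

# Hypothetical centroids and data points in two columns.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 2.0), (9.0, 8.0), (4.0, 4.0)]

labels = [nearest_centroid(p, centroids) for p in points]
print(labels)  # [0, 1, 0]
```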
Name some types of clustering
Connectivity-based clustering (for example, hierarchical clustering)
Centroid-based clustering (for example, K-Means)