K-Means Clustering Flashcards

1
Q

Is K-Means Clustering supervised or unsupervised?

A

K-Means Clustering is an example of unsupervised machine learning.

2
Q

Explain supervised versus unsupervised learning

A

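In supervised learning, the model is trained on labeled data, so the desired output for each example is known in advance. In unsupervised learning, there is no labeled output: the data is analyzed to find structure without knowing a specific output you are looking for.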

3
Q

Name some examples of clustering

A

Market segmentation, product analysis, etc.

4
Q

What is a cluster?

A

A cluster is a collection of objects that are similar to one another.

5
Q

How do we determine similarity in clustering?

A

We need a notion of distance: the smaller the distance between two objects, the more similar they are.

6
Q

What is the objective of clustering?

A

The objective of clustering is to group similar data points together. Some examples are segmenting customers into similar groups, or automatically organizing files or emails into folders.

7
Q

How does clustering simplify data?

A

Clustering simplifies data by reducing many data points into a few clusters, so each point can be described by the cluster it belongs to.

8
Q

What are some examples of distance used in clustering?

A

Examples of common distance measures in clustering are Manhattan distance, Euclidean distance, and Chebyshev distance.

9
Q

What is the formula for Euclidean distance?

A

Square root of [ (X1 - X2) squared + (Y1 - Y2) squared + (Z1 - Z2) squared + … ]

There is one squared term for each of the m columns.
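
As a rough illustration of the formula, here is a minimal Python sketch. The function name `euclidean_distance` and the sample points are made up for this card; each point is assumed to be a tuple with one value per column.

```python
import math

def euclidean_distance(p, q):
    # Square each per-column difference, add them up, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Differences are 3, 4 and 0, so the result is sqrt(9 + 16 + 0).
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0
```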

10
Q

What is the formula for Manhattan distance?

A

Absolute value of X1 minus X2 plus absolute value of Y1 minus Y2 plus absolute value of Z1 minus Z2, etc.
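
A minimal Python sketch of the same calculation, assuming each point is a tuple with one value per column. The function name `manhattan_distance` is illustrative only.

```python
def manhattan_distance(p, q):
    # Add up the absolute per-column differences.
    return sum(abs(a - b) for a, b in zip(p, q))

# Differences are 3, 4 and 0, so the result is 3 + 4 + 0.
print(manhattan_distance((1, 2, 3), (4, 6, 3)))  # 7
```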

11
Q

Why is it called Manhattan distance?

A

In Manhattan, you cannot travel between two points in a straight line. You must walk along the street grid, so the distance is the sum of the horizontal and vertical legs.

12
Q

How do you calculate Chebyshev or chessboard distance?

A

Take the maximum of (absolute value of X1 - X2, absolute value of Y1 - Y2, absolute value of Z1 - Z2, … etc.)
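
A minimal Python sketch of this calculation, assuming each point is a tuple with one value per column. The function name `chebyshev_distance` is illustrative only.

```python
def chebyshev_distance(p, q):
    # Keep only the largest absolute per-column difference.
    return max(abs(a - b) for a, b in zip(p, q))

# Differences are 3, 4 and 0, so the largest one is 4.
print(chebyshev_distance((1, 2, 3), (4, 6, 3)))  # 4
```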

13
Q

What is the Minkowski distance?

A

A generalized distance formula with a parameter p that selects the distance measure. It is calculated as the sum of (the absolute value of Xi - Yi, raised to the power of p) over all columns, with the whole sum then raised to the power of 1/p.

Euclidean distance uses p equals two and Manhattan uses p equals one. Chessboard (Chebyshev) distance is the limit as p goes to infinity.
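
To see how p changes the result, here is a minimal Python sketch. The function name `minkowski_distance` and the sample points are made up for this card. The last call uses a large p only to suggest the limiting behaviour, since p cannot literally be infinity in this form.

```python
def minkowski_distance(p_point, q_point, p):
    # Sum of |difference| ** p over all columns, then the whole sum ** (1 / p).
    total = sum(abs(a - b) ** p for a, b in zip(p_point, q_point))
    return total ** (1 / p)

a, b = (1, 2, 3), (4, 6, 3)            # differences are 3, 4 and 0
print(minkowski_distance(a, b, 1))     # matches Manhattan distance: 3 + 4 + 0
print(minkowski_distance(a, b, 2))     # matches Euclidean distance: sqrt(25)
print(minkowski_distance(a, b, 50))    # very close to the largest difference, 4
```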

14
Q

What measure of distance does K-Means clustering use?

A

Euclidean distance

15
Q

Name some types of clustering

A

Connectivity-based clustering
Centroid-based clustering

16
Q

What is connectivity-based clustering?

A

It is based on the idea that objects closer to each other are more related than objects farther apart.

17
Q

What is the formula for the number of connections between N points?

A

(N x (N-1)) / 2
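
One way to check the formula is to count the pairs directly. This is a minimal Python sketch; the function name `connection_count` is made up for this card, and a "connection" is assumed to mean one unordered pair of distinct points.

```python
from itertools import combinations

def connection_count(n):
    # Each of the n points pairs with the other n - 1, and every pair is counted twice.
    return n * (n - 1) // 2

points = ["A", "B", "C", "D", "E"]
pairs = list(combinations(points, 2))  # every unordered pair of distinct points

print(connection_count(len(points)))   # 10
print(len(pairs))                      # 10
```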

18
Q

What is the point of attempting to choose the optimal K and what is a commonly used method for doing so?

A

You are attempting to strike a good balance between compression and accuracy. The elbow method is commonly used.
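
The elbow method can be sketched roughly as follows: run K-Means for several values of K, record the total squared distance from each point to its nearest centroid, and look for the K where that total stops dropping quickly. Everything here is an assumption made for illustration: the function names, the toy data, and the naive choice of the first K points as starting centroids. It is not a production implementation.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    # Column-by-column mean of a list of points.
    return tuple(sum(column) / len(points) for column in zip(*points))

def kmeans_inertia(points, k, max_iter=100):
    # Naive initialisation (an assumption for this sketch): the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [mean_point(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Total squared distance from each point to its nearest centroid.
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

# Toy data with three visually obvious groups.
points = [(1, 1), (8, 8), (2, 8), (1, 2), (8, 9), (2, 9), (2, 1), (9, 8), (1, 9)]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

On this toy data the total drops sharply up to K = 3 and changes little afterwards, so the "elbow" sits at K = 3.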

19
Q

Using K-Means, how do you calculate the centroid of a cluster?

A

The centroid of a cluster is calculated by finding the mean vector of all data points in that cluster. For example, add up the X values of all data points in the cluster and divide by the number of data points to find the X value of the centroid, then add up the Y values and divide by the number of data points to find the Y value.
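
A minimal Python sketch of the centroid calculation, assuming each data point is a tuple with one value per column. The function name `centroid` and the sample cluster are made up for this card. Note that negative coordinates keep their sign when summed.

```python
def centroid(points):
    # Average each column separately to get the mean vector.
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))

cluster = [(-2, 4), (2, 0), (3, 2)]

# X: (-2 + 2 + 3) / 3 = 1.0, Y: (4 + 0 + 2) / 3 = 2.0
print(centroid(cluster))  # (1.0, 2.0)
```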