Chapter 8: Unsupervised Learning + K-Means Clustering Flashcards

1
Q

What are the types of unsupervised learning tasks? [4]

A
  1. Clustering – grouping similar customer profiles
  2. Dimensionality reduction – finding the key features in data
  3. Anomaly detection – detecting fraudulent transactions
  4. Association rule mining – analysing data for patterns or co-occurrences [e.g. people who buy a new home are likely to buy new furniture]

2
Q

What is the aim of clustering?

A

To identify groups of similar data points, based on some notion of similarity (shape, colour, size, etc.).

3
Q

What measure is used to group data with similarities?

A

Similarity measures, such as distance metrics. Euclidean distance is the most commonly used.

4
Q

What similarity measure is used in k-means?

A

Squared Euclidean Distance
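As a minimal sketch of this measure, assuming points are given as equal-length tuples of numbers (the function name `squared_euclidean` is illustrative, not from the card):

```python
def squared_euclidean(p, q):
    """Sum of squared coordinate differences between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Example: the points (0, 0) and (3, 4) differ by 3 and 4 along each axis.
d = squared_euclidean((0.0, 0.0), (3.0, 4.0))  # 3**2 + 4**2 = 25.0
```

Skipping the square root does not change which centroid is nearest, so the squared form is enough for the assignment step.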

5
Q

What are the 2 main steps in the k-means algorithm?

A
  1. Cluster assignment step
  2. Move centroid step
6
Q

What is the k-means algorithm?

A

It is a [popular] centroid-based clustering algorithm.

7
Q

List the steps of k-means clustering in detail.

A
  1. Randomly initialise k cluster centroids (k = number of clusters).
  2. Cluster assignment step : assign data points to the closest cluster centre. (calculate
  3. Move each cluster centre (centroid) to the average of its assigned points.
  4. Repeat steps 2 and 3, and stop when the algorithm converges.
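The steps above can be sketched in plain Python. This is an illustrative sketch, not a reference implementation. It assumes points are tuples of floats, that the initial centroids are k distinct data points drawn at random, and that an empty cluster simply keeps its old centroid. The names `k_means`, `max_iters` and `seed` are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Step 2 (cluster assignment): index of the nearest centroid per point.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once no point changes its cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3 (move centroid): mean of the points assigned to each cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = k_means(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster gets label 0 depends on the random initialisation.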
8
Q

How do we know that the k-means algorithm has converged and that we can stop running it? [2]

A
  1. No point changes its cluster assignment.
  2. There is minimal change in the loss / objective function.
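Both stopping criteria can be combined in one small check. This is a sketch under assumptions: the function name `has_converged` and the tolerance `tol` are hypothetical, and what counts as a "minimal" change in the loss is a choice left to the user.

```python
def has_converged(old_assignments, new_assignments, old_loss, new_loss, tol=1e-6):
    """Return True if either stopping criterion from the card is met."""
    # Criterion 1: no point changed its cluster assignment.
    if old_assignments == new_assignments:
        return True
    # Criterion 2: the loss changed by less than the (assumed) tolerance.
    return abs(old_loss - new_loss) < tol
```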
9
Q

What is the optimisation objective function in k-means clustering? What is it called?

A

To minimise the sum of squared Euclidean distances between each data point and its assigned cluster centroid [aka the within-cluster sum of squares (WCSS) or the sum of squared errors (SSE)]. Dividing by the number of points gives the average form, which has the same minimiser.
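Written out, the objective looks like the formula below. The notation is an assumption, not from the card: m data points x^{(i)}, centroids \mu_1, ..., \mu_k, and c^{(i)} as the index of the cluster assigned to point i.

```latex
J\bigl(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_k\bigr)
  = \sum_{i=1}^{m} \bigl\lVert x^{(i)} - \mu_{c^{(i)}} \bigr\rVert^{2}
```

Some texts multiply the sum by 1/m to report an average. The scaling does not change which assignments and centroids minimise J.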

10
Q

Is the objective function the same as the error function / cost function?

A

YES.
The function we want to minimize or maximize is called the objective function, or criterion. When we are minimizing it, we may also call it the cost function, loss function, or error function.

11
Q

What does an optimal k-means clustering look like?

A
  1. The clusters are of roughly equal size, spherical, and non-overlapping.
  2. For each cluster, n > k, where k = number of cluster centres and n = number of data points assigned to that cluster.
12
Q

What are the 2 methods we can use to choose k, the number of clusters?

A
  1. Elbow method: plot the loss / error function against k, the number of clusters. [The curve bends like an elbow; pick the k at the bend, known as the elbow point.]
  2. Choose k manually when the application dictates it. For example, shirts come in only 4 distinct sizes (S, M, L, XL), so k = 4.
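The elbow method can be sketched by computing the loss for several values of k and looking for the bend. The sketch below is illustrative only. To keep the run reproducible it replaces random initialisation with a deterministic farthest-point initialisation, which is a swapped-in technique and not part of the card. The names `wcss_for_k` and `farthest_point_init` and the toy data are hypothetical.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    # Assumption: start from the first point, then repeatedly add the point
    # that is farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def wcss_for_k(points, k, iters=20):
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        # Assignment step, then move-centroid step.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Loss: squared distance from each point to its nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Toy data: three small, well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
          (5.0, 10.0), (5.0, 11.0), (6.0, 10.0)]

losses = {k: wcss_for_k(points, k) for k in range(1, 6)}
for k, loss in losses.items():
    print(k, round(loss, 2))
```

On this toy data the loss falls steeply up to k = 3 and only slightly afterwards, so the elbow sits at k = 3. In practice the same numbers would be plotted against k and the bend read off the graph.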