Lecture 3 - Unsupervised Machine Learning Flashcards

1
Q

What are some examples of unsupervised machine learning?

A
  • outlier detection
  • similarity search
  • association rules
  • data visualization
  • clustering
2
Q

Describe Clustering. What inputs does it take? What is the output?

A

Clustering is a way of grouping data into a number of clusters without having labels present

Input: Set of objects described by features xi

Output: An assignment of objects into “groups”

Unlike classification, we are not given the “groups”. The algorithm must figure these groups out

3
Q

Can you give some examples of use cases for clustering?

A
  • define market segments by clustering customers
  • study social networks by recognizing communities
  • recommendation systems (Amazon recommending products, Netflix recommending shows)
4
Q

How do you normalize/scale data?

A

You can either

  • Scale data to the range 0-1 (min-max scaling)
  • Normalize using the Z-score, x’ = (x − μ)/σ: transform the data so that each value is expressed as the number of standard deviations (σ) from the mean (both options are sketched below)
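As an illustration (not from the lecture), both transformations can be written in a few lines of NumPy; the matrix X below is made-up placeholder data:

```python
import numpy as np

# Hypothetical feature matrix: rows are samples, columns are features
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Min-max scaling to the range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Z-score normalization: x' = (x - mean) / std
X_zscore = (X - X.mean(axis=0)) / X.std(axis=0)
```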
5
Q

What is K-Means Clustering? What is the input of the algorithm? What are the assumptions? Describe the 4 steps in the algorithm.

A

K-Means clustering is one of the most popular clustering methods

Input:
- The number of clusters ‘k’ (hyperparameter)

Assumptions:

  • The center of each cluster is the mean of all samples belonging to that cluster
  • Each sample is closer to the center of its own cluster than to the center of other clusters

The four steps are as follows (a minimal sketch is shown after the list):

  1. Initial guess of the center (the “mean”) of each cluster
  2. Assign each xi to its closest mean
  3. Update the means based on the cluster assignments
  4. Repeat steps 2-3 until convergence
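A minimal NumPy sketch of these four steps, assuming a feature matrix X and a chosen k (illustrative only; real implementations such as scikit-learn’s KMeans add smarter initialization, multiple restarts, and handling of empty clusters):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initial guess of each cluster center (pick k random samples)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each x_i to its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: update each center to the mean of its assigned samples
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: repeat 2-3 until convergence (centers stop moving)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy usage: two well-separated blobs
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, k=2)
```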
6
Q

What are the assumptions of K-Means clustering?

A

The center of each cluster is the mean of all samples belonging to that cluster

Each sample is closer to the center of its own cluster than to centers of other clusters

7
Q

How can you relate K-Means clustering to set theory?

A

We can interpret K-Means steps as trying to minimize an objective:
Given a set of observations (x1,x2,…,xn) the algorithm’s goal is to partition the n observations into k sets S={S1,S2,…,Sk} so as to minimize the within-cluster sum of squares:

{See the rest of the math in Notion}
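For reference, the standard way to write that objective (with μi denoting the mean of the points in Si) is:

```latex
S^{*} = \underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^{2}
```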

8
Q

How can you determine the number of clusters ‘k’ in K-Means clustering?

A

You can determine the number of clusters using:

  • Elbow Method
  • Silhouette analysis
9
Q

What is the Elbow Method?

A

Elbow Method:

  • Run K-means for several k
  • Distortion: Sum of distances of each point to the center of the closest cluster
  • Look for the k where the curve stops decreasing rapidly (the “elbow”; see the sketch below)
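A rough scikit-learn sketch, assuming a feature matrix X (placeholder data here); note that sklearn’s inertia_ is the sum of squared distances to the closest center, which is the quantity usually plotted for the elbow method:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.randn(300, 2)  # placeholder data

ks = range(1, 11)
distortions = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    distortions.append(km.inertia_)  # sum of squared distances to the closest center

plt.plot(ks, distortions, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("distortion")
plt.show()
```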
10
Q

What is silhouette analysis?

A

The thickness of each group in the silhouette plot shows the size of its cluster (how many datapoints are assigned to it).

The groups in the plot should have roughly similar silhouette coefficients, they should not fall below the mean silhouette coefficient, and ideally they should also have roughly the same thickness (unless the clusters clearly differ in size).
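A minimal scikit-learn sketch of computing the silhouette coefficients (placeholder data; silhouette_score gives the mean coefficient, silhouette_samples one value per point, which is what a silhouette plot visualizes):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples

X = np.random.randn(300, 2)  # placeholder data

for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    mean_coeff = silhouette_score(X, labels)    # mean silhouette coefficient
    per_sample = silhouette_samples(X, labels)  # one coefficient per point, in [-1, 1]
    print(k, round(mean_coeff, 3), round(per_sample.min(), 3))
```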

11
Q

What are some issues with K-Means clustering?

A

Final cluster assignment depends on initialization of centers

  • Cluster assignments may vary on different runs
  • May not achieve global optimum

Assumes you know the number of clusters ‘k’
- Lots of heuristic approaches to picking ‘k’

Each object is assigned to one (and only one) cluster:

  • No possibility for overlapping clusters or leaving objects unassigned
  • Fuzzy clustering/soft k-means allows assigning an object to multiple clusters

Sensitive to scale

12
Q

When is a set convex?

A

A set is convex if the line between any two points in the set stays within the set (see images in Notion)

13
Q

Can K-Means cluster into non-convex sets?

A

No, it cannot. K-Means partitions the space into convex regions, so each cluster is always a convex set

14
Q

What is density-based clustering?

A
  • Clusters are defined by “dense” regions
  • It’s deterministic, meaning that it always gives the same clusters
  • No fixed number of clusters ‘k’; the algorithm determines the number of clusters itself
  • Objects in non-dense regions don’t get clustered
    i.e. it is not trying to “partition” the space
  • Clusters can be non-convex, i.e. you can find clusters of any shape
15
Q

What is DBSCAN? Which hyperparameters does it have?

A

DBSCAN is a density-based clustering algorithm.

It has two hyperparameters:

  • Epsilon (ε): the distance we use to decide whether another point is a “neighbour”
  • MinNeighbours: the number of neighbours needed to say a region is “dense”
    If you have at least MinNeighbours “neighbours”, you are called a “core point” (see the usage sketch below)
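A minimal scikit-learn usage sketch (placeholder data; note that sklearn calls the second hyperparameter min_samples and counts the point itself as one of its neighbours):

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.randn(200, 2)  # placeholder data

# eps plays the role of epsilon; min_samples roughly corresponds to MinNeighbours
db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_  # label -1 marks points that were not assigned to any cluster
print(set(labels))
```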
16
Q

Describe the algorithm of density-based clustering (the process)

A

For each example xi:

  • If xi is already assigned to a cluster, do nothing
  • Test whether xi is a ‘core’ point (≥ MinNeighbours examples within ‘ε’)
    • If xi is not a core point, do nothing (this could be an outlier).
    • If xi is a core point, “expand” the cluster

“Expand” cluster function:

  • Assign all xj within distance ‘ε’ of core point xi to its cluster.
  • For each newly-assigned neighbour xj that is also a core point, “expand” the cluster from it (a rough sketch follows below)
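A plain-Python sketch of this procedure, assuming a feature matrix X (here a point counts itself among its neighbours; a real implementation would use a spatial index rather than the full pairwise distance matrix):

```python
import numpy as np

def dbscan(X, eps, min_neighbours):
    n = len(X)
    # Pairwise distances and the epsilon-neighbourhood of every point
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = [np.where(dist[i] <= eps)[0] for i in range(n)]
    labels = [None] * n  # None = not assigned to any cluster
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue                             # already assigned: do nothing
        if len(neighbours[i]) < min_neighbours:
            continue                             # not a core point: possible outlier
        cluster += 1                             # core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:                             # "expand" the cluster
            j = queue.pop()
            if labels[j] is None:
                labels[j] = cluster
                if len(neighbours[j]) >= min_neighbours:
                    queue.extend(neighbours[j])  # xj is also a core point: keep expanding
    return labels
```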
17
Q

What are some of the issues with density-based clustering?

A

Some points are not assigned to a cluster
- Good/bad depending on the application

Ambiguity of “non-core” (boundary) points between clusters

Consumes a lot of memory with large datasets

Sensitive to the choice of ε and MinNeighbours
- Otherwise, not sensitive to initialization (except for boundary points)

18
Q

What are the two ways of doing hierarchical clustering?

A

Hierarchical clustering can be split into the following two types:

  • Divisive clustering: top-down hierarchical clustering where all observations start in one cluster, which is then divided into smaller and smaller clusters
  • Agglomerative clustering: bottom-up hierarchical clustering where each observation starts in its own cluster, and clusters are successively merged

In general, Agglomerative clustering works much better in practice

19
Q

In Agglomerative clustering, clusters are successively merged…

A
  • Using some linkage criterion
  • and based on a distance metric

until all samples belong to one cluster (see the sketch below)
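A minimal scikit-learn sketch (placeholder data; linkage can be "ward", "complete", "average", or "single", and the distance metric defaults to Euclidean):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.randn(100, 2)  # placeholder data

# Merge clusters bottom-up using Ward linkage until 3 clusters remain
# (other distance metrics can also be chosen via the metric/affinity parameter,
#  depending on the scikit-learn version)
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)
```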

20
Q

True or False? If uncertain whether scaling is required, I should scale my data

A

True, if you’re not sure whether scaling is needed, scale it.

21
Q

Hierarchical clustering is often visually inspected using…

A

A dendrogram

Which is a tree diagram that shows the hierarchy and how the data is split into clusters
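A minimal SciPy sketch of producing a dendrogram (placeholder data; linkage computes the full merge history, dendrogram draws the tree):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.randn(30, 2)  # placeholder data

Z = linkage(X, method="ward")  # agglomerative merge history
dendrogram(Z)                  # tree diagram showing the hierarchy of merges
plt.show()
```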

22
Q

Which distance metrics are typically used in Agglomerative clustering?

A

Euclidean Distance

Manhattan (block) distance

23
Q

Which different linkages (for hierarchical clustering) are there?

A
  • Centroid
  • Single (“nearest neighbour”)
  • Complete (“farthest neighbour”)
  • Average
  • Ward
24
Q

What is a centroid linkage?

A

Centroid: the distance between the centroids of the two clusters

25
Q

What is a Single(“Nearest neighbour”) linkage?

A

Single (“nearest neighbour”): the shortest distance between any two points, one from each cluster

26
Q

What is a Complete(“Farthest neighbour”) linkage?

A

Complete (“farthest neighbour”): the longest distance between any two points, one from each cluster

27
Q

What is an Average Linkage?

A

Average: the average of the pairwise distances between all points in the two clusters

28
Q

What is a Ward Linkage?

A

Ward: the sum of the squared distances from each point to the mean of the merged cluster (clusters are merged so as to minimize the increase in this quantity)

29
Q

What are the issues with hierarchical clustering?

A

Infeasible with very large datasets

Influenced by order of datapoints

Sensitive to outliers

It is impossible to undo a step in hierarchical clustering (i.e. revert to a previous merge)

30
Q

What is the purpose of unsupervised learning?

A

As we do not have labels, the purpose is to group similar datapoints together and to find patterns in the data

31
Q

What are some common scaling/normalization methods?

A

Rescaling (min-max normalization), Mean normalization, Standardization (Z-score Normalization)
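The first and third have ready-made scikit-learn transformers; mean normalization can be done by hand (placeholder data below):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])  # placeholder feature matrix

X_rescaled = MinMaxScaler().fit_transform(X)    # rescaling (min-max) to [0, 1]
X_standard = StandardScaler().fit_transform(X)  # standardization (Z-score)
X_mean_norm = (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # mean normalization
```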

32
Q

What is the objective of K-means clustering? (can also be used as a definition of K-means)

A

Given a set of observations, the algorithm’s goal is to partition the n observations into K sets so as to minimize within-cluster sum of squares

33
Q

Some extra info on silhouette analysis:

A

The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). The silhouette ranges from −1 to +1, where a high value indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. If most objects have a high value, then the clustering configuration is appropriate. If many points have a low or negative value, then the clustering configuration may have too many or too few clusters.

34
Q

True or False? DBSCAN is sensitive to hyperparameter setting of epsilon and MinNeighbours, and also to the initialization, as it first guesses the mean of the clusters.

A

False. DBSCAN is sensitive to the hyperparameter settings of epsilon and MinNeighbours, but unlike K-Means it does not start from a guess of the cluster means, so it is not sensitive to initialization