Unsupervised Learning Flashcards
What is unsupervised learning?
Unsupervised learning is a type of machine learning where the training data does not contain any output information (i.e., unlabeled data). The goal is to find patterns and structures in the input data.
What is clustering in unsupervised learning?
Clustering is the process of grouping similar objects into clusters based on their characteristics. It is used to create a higher-level representation of the data and for tasks such as data reduction and outlier detection.
What are some common applications of unsupervised learning?
Social network analysis and marketing (e.g., customer segmentation)
Image segmentation
Data annotation (e.g., single-cell transcriptomics)
What is the goal of clustering algorithms?
Clustering algorithms aim to form groups such that members within a group are similar to each other but different from members of other groups.
What are similarity measures in clustering?
Similarity measures define how close two instances are to each other. Examples include Euclidean distance, Manhattan distance, and cosine similarity.
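The three measures named above can be written in a few lines of plain Python (a sketch; the function names are my own):

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; 1 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(euclidean((0.0, 0.0), (3.0, 4.0)))  # 5.0
print(manhattan((0.0, 0.0), (3.0, 4.0)))  # 7.0
```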
What is a cluster center?
A cluster center is a representative data point of a cluster. For numeric data, it is the “center of mass” (mean), while for nominal data, it is the mode.
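A small illustration of both cases using Python's standard library:

```python
from statistics import mean, mode

numeric_cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
# Numeric center: component-wise mean ("center of mass").
center = tuple(mean(dim) for dim in zip(*numeric_cluster))
print(center)  # (3.0, 4.0)

nominal_cluster = ["red", "blue", "red", "green"]
# Nominal center: the most frequent value (mode).
print(mode(nominal_cluster))  # red
```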
What are within-cluster and between-cluster variations?
Within-cluster variation (WC): Measures how compact the clusters are.
Between-cluster variation (BC): Measures the distances between different clusters.
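Both quantities can be computed directly; in this sketch WC is the summed squared distance of points to their own centroid, and BC the squared distance between the two centroids (one common choice among several):

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    # Component-wise mean of a cluster's members.
    return tuple(sum(dim) / len(points) for dim in zip(*points))

clusters = [[(1.0, 1.0), (2.0, 2.0)], [(8.0, 8.0), (9.0, 9.0)]]
centroids = [centroid(c) for c in clusters]

# Within-cluster variation: how far members sit from their own centroid.
wc = sum(sq_dist(p, centroids[i]) for i, c in enumerate(clusters) for p in c)

# Between-cluster variation: how far apart the cluster centroids are.
bc = sq_dist(centroids[0], centroids[1])

print(wc, bc)  # 2.0 98.0
```

Good clusterings have small WC (compact clusters) and large BC (well-separated clusters).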
What is the k-means algorithm?
K-means is a partition-based clustering algorithm that follows these steps:
Define the number of clusters (k).
Choose k initial centroids randomly.
Assign each data object to the nearest centroid.
Compute new centroids as the mean of cluster members.
Repeat steps 3 and 4 until cluster membership no longer changes.
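The steps above can be sketched in plain Python (a minimal illustration, not an optimized implementation):

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 2: choose k initial centroids at random from the data.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid (squared Euclidean).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Step 5: stop once centroids (and hence memberships) stop changing.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.0)]
clusters, centroids = kmeans(data, k=2)
```

On this toy data the two obvious groups are recovered regardless of which points are drawn as initial centroids.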
What are variations of the k-means algorithm?
Selection of the initial k means
Different dissimilarity calculations
Various strategies for calculating cluster means
Use of different distance measures
What is the elbow method in k-means?
The elbow method helps determine the optimal number of clusters by plotting the within-cluster sum of squares (WCSS) against k. The ideal k lies at the ‘elbow’: the point where the curve bends and increasing k further yields only small reductions in WCSS.
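A sketch of the elbow method using scikit-learn (assuming it is installed; `inertia_` is scikit-learn's name for the WCSS of a fitted model):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs, so the elbow should appear at k = 2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

wcss = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)  # within-cluster sum of squares for this k

# Plot wcss against range(1, 7) and pick the k where the curve bends:
# here the drop from k=1 to k=2 is large, and the curve flattens afterwards.
```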
What are the strengths of k-means clustering?
Simple and easy to implement
Computationally efficient
What are the weaknesses of k-means clustering?
Requires predefining k
Sensitive to initialization
Sensitive to noise and outliers
Struggles with non-globular cluster shapes
What is hierarchical clustering?
Hierarchical clustering builds a hierarchy of clusters by either merging (agglomerative) or splitting (divisive) data points based on similarity.
What is agglomerative clustering?
Agglomerative clustering starts with each data point as its own cluster and merges the closest clusters iteratively until only one cluster remains.
What are different distance metrics in agglomerative clustering?
Single linkage: Distance between the closest points of two clusters
Complete linkage: Distance between the farthest points of two clusters
Centroid distance: Distance between cluster centroids
Group average: Average of all pairwise distances between points in the two clusters
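With SciPy (assuming it is installed), the linkage criterion is selected by the `method` argument of `scipy.cluster.hierarchy.linkage`:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.0]])

for method in ("single", "complete", "average", "centroid"):
    Z = linkage(X, method=method)                    # (n-1) x 4 merge history
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 flat clusters
    print(method, labels)
```

On well-separated data like this, all four criteria recover the same two groups; they differ mainly on elongated or noisy clusters.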
What is a dendrogram?
A dendrogram is a tree-like diagram that visualizes the merging process in hierarchical clustering.
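With SciPy (assuming it is installed), `dendrogram` either draws the tree into a matplotlib figure or, with `no_plot=True`, returns its layout as a dictionary:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.0]])
Z = linkage(X, method="single")

# no_plot=True returns the tree layout instead of drawing it; calling
# dendrogram(Z) inside a matplotlib figure draws the diagram itself.
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right display order
```

The height at which two branches join is the linkage distance at which those clusters were merged, so cutting the tree at a chosen height yields a flat clustering.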
What are the strengths of agglomerative clustering?
Produces deterministic results
Multiple possible cluster configurations (cutting the dendrogram at different heights yields different numbers of clusters)
No need to predefine k
Can handle arbitrarily shaped clusters (single-linkage)
What are the weaknesses of agglomerative clustering?
Computationally expensive for large datasets
Requires defining a distance metric
What is the difference between partition-based and hierarchical clustering?
Partition-based clustering (e.g., k-means) requires a predefined number of clusters and assigns data points to clusters iteratively.
Hierarchical clustering builds a tree-like structure of clusters and does not require a predefined number of clusters.
What is the purpose of clustering in data analysis?
Clustering helps with data exploration, pattern discovery, data compression, anomaly detection, and feature engineering for supervised learning models.