Unsupervised Learning Flashcards

Question 1

Q

Supervised vs unsupervised learning

Answer

A

Supervised learning trains on labeled data (inputs paired with known outputs); unsupervised learning looks for structure in unlabeled data.

Question 2

Q

How can unsupervised learning help during exploration of a dataset?

Answer

A

It helps you explore and understand the underlying structure of a dataset,
which makes it easier to prepare the data for further analysis and model building.

Question 3

Q

Define a cluster and its attributes

Answer

A

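A cluster is a group of similar data points.
Attributes: number of clusters, membership (which points belong to which cluster), centroids (cluster centers), inertia (sum of squared distances from each point to its cluster centroid)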
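
The attributes above can be computed by hand on a toy example. This is a minimal sketch using only the Python standard library; the data, the membership list, and the helper names `centroids_of` and `inertia_of` are made up for illustration.

```python
from math import dist

# Toy 2-D data and a hypothetical cluster assignment.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
labels = [0, 0, 1, 1]  # membership: the cluster each point belongs to

def centroids_of(points, labels):
    """Return {cluster_id: centroid}; a centroid is the per-feature mean."""
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)
    return {k: tuple(sum(col) / len(col) for col in zip(*members))
            for k, members in groups.items()}

def inertia_of(points, labels, centroids):
    """Sum of squared distances from each point to its own centroid."""
    return sum(dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))

centroids = centroids_of(points, labels)
inertia = inertia_of(points, labels, centroids)
n_clusters = len(centroids)

print(n_clusters, centroids, inertia)
```

Lower inertia means tighter clusters, but inertia always falls as the number of clusters grows, so it is not a quality measure on its own.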

Question 4

Q

Why is it important to standardize data before performing clustering?

Answer

A

Standardizing ensures that all features contribute equally to distance calculations. This prevents features with larger scales from biasing the result and improves the performance and reliability of the clustering algorithm.
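
The effect of scale on distances can be seen in a small sketch. The income and age values below are hypothetical, and `standardize` is an illustrative z-score helper, not a library function.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical rows: (annual income in dollars, age in years).
rows = [(30000.0, 25.0), (32000.0, 60.0), (90000.0, 27.0)]

def standardize(rows):
    """Z-score each column: subtract its mean, divide by its std dev.

    Assumes no column is constant (a zero std dev would divide by zero).
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(r, mus, sigmas))
            for r in rows]

scaled = standardize(rows)

# Raw distances: the income column swamps the age column.
raw_01 = dist(rows[0], rows[1])  # similar income, 35-year age gap
raw_02 = dist(rows[0], rows[2])  # similar age, large income gap

# Scaled distances: both features now influence the result.
scaled_01 = dist(scaled[0], scaled[1])
scaled_02 = dist(scaled[0], scaled[2])

print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

On the raw data the 35-year age gap is almost invisible next to the income gap; after scaling, the two pairs are roughly equally far apart.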

Question 5

Q

Are all clusters meaningful? If not, what are the pitfalls and how can we avoid them?

Answer

A

No.
Pitfalls include noise and outliers, and choosing an incorrect number of clusters.
To avoid them, use an elbow plot and the silhouette score to choose and check the number of clusters.
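
As a rough illustration of the silhouette score, the sketch below compares two hand-written labelings of the same toy data. The labelings are assumptions chosen for the example; in practice they would come from a clustering algorithm run with different numbers of clusters. The function needs at least two clusters and points that are not all identical.

```python
from math import dist
from statistics import mean

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    scores = []
    for p, k in zip(points, labels):
        own = clusters[k]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(mean(dist(p, q) for q in members)
                for other, members in clusters.items() if other != k)
        scores.append((b - a) / max(a, b))
    return mean(scores)

# Two visibly separate groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels_k2 = [0, 0, 0, 1, 1, 1]  # matches the visible structure
labels_k3 = [0, 0, 0, 1, 1, 2]  # splits one tight group in two

score_k2 = silhouette_score(points, labels_k2)
score_k3 = silhouette_score(points, labels_k3)
print(score_k2, score_k3)
```

The labeling that matches the real structure scores closer to 1, while splitting a tight group pulls the score down.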

Question 6

Q

Explain the concept of an anomaly and describe how unsupervised learning can help us to detect them.

Answer

A

An anomaly is a data point that deviates from the normal pattern in a dataset.
Unsupervised learning helps detect anomalies through clustering, density-based, and distance-based methods.
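
A distance-based detector can be sketched in a few lines. The version below scores each point by the distance to its k-th nearest neighbor; the data, the choice of k = 2, and the "three times the median" threshold are all assumptions made for illustration, not a standard rule.

```python
from math import dist
from statistics import median

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q)
                           for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

# Five points in a tight group plus one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (1.1, 1.2), (0.9, 0.8), (6.0, 6.0)]

scores = knn_distance_scores(points, k=2)
threshold = 3 * median(scores)  # hypothetical cutoff for this toy data
anomalies = [p for p, s in zip(points, scores) if s > threshold]
print(anomalies)
```

Points in dense regions have close neighbors and low scores; an isolated point has a large score and is flagged.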

Question 7

Q

Describe some of the important roles of a human operator when undertaking unsupervised learning.

Answer

A

Data cleaning, standardization
Feature selection
Parameter tuning
Engineering judgement and knowledge

Question 8

Q

Hierarchical vs flat clustering

Answer

A

Hierarchical: captures nested structures and provides a visual representation through dendrograms, but is computationally intensive and scales poorly to large datasets.
Flat: efficient and scalable, but typically requires the number of clusters to be chosen in advance; suitable for data without hierarchical structure.
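
To make the nested structure concrete, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering with single linkage, written from scratch on toy 1-D data. The function name `single_linkage` and the data are illustrative assumptions, and the brute-force search is kept simple rather than efficient.

```python
from math import dist

def single_linkage(points):
    """Agglomerative clustering; returns the merge history.

    Each entry is (member indices of the merged cluster, merge distance),
    which is the information a dendrogram draws.
    """
    clusters = {i: [i] for i in range(len(points))}  # one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Find the pair of clusters whose closest members are nearest.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        merges.append((sorted(clusters[a]), d))
    return merges

points = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
merges = single_linkage(points)
for members, height in merges:
    print(members, height)
```

Each merge contains earlier merges, so stopping at any level gives a flat clustering without fixing the number of clusters beforehand. Re-scanning every pair of clusters on every merge also shows why the approach becomes expensive on large datasets.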

(8 cards)