Unsupervised Learning Flashcards

1
Q
  • Supervised vs unsupervised learning
A

Supervised learning trains on labeled data (inputs paired with known outputs); unsupervised learning looks for structure in unlabeled data.

2
Q
  • How unsupervised learning can be helpful during exploration of a dataset
A

Unsupervised learning helps you explore and understand the underlying structure of a dataset,
which makes it easier to prepare the data for further analysis and model building.

3
Q
  • Define a cluster and its attributes
A
  • Groups of similar data points
  • Attributes: number, membership, centroids, inertia
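To make the attributes concrete, here is a minimal sketch that derives the number of clusters, the centroids, and the inertia from a given membership. The points, labels, and the helper name `centroids_and_inertia` are hypothetical, made up for illustration; inertia is taken here as the sum of squared distances from each point to its own cluster's centroid.

```python
from math import dist

# Hypothetical 2-D points and their cluster membership (illustration only).
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
labels = [0, 0, 1, 1]

def centroids_and_inertia(points, labels):
    """Return ({cluster: centroid}, inertia) for a given membership."""
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)
    # Centroid: coordinate-wise mean of the members of each cluster.
    centroids = {
        k: tuple(sum(coord) / len(members) for coord in zip(*members))
        for k, members in groups.items()
    }
    # Inertia: sum of squared distances from each point to its own centroid.
    inertia = sum(dist(p, centroids[k]) ** 2 for p, k in zip(points, labels))
    return centroids, inertia

centroids, inertia = centroids_and_inertia(points, labels)
print("number of clusters:", len(centroids))
print("centroids:", centroids)
print("inertia:", inertia)
```

A lower inertia means points sit closer to their centroids, but it always falls as more clusters are added, so it is not a quality measure on its own.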
4
Q
  • Why is it important to standardize data before performing clustering?
A

Standardizing puts all features on a comparable scale so that each contributes equally to the distance calculations. This prevents features with larger scales from dominating the result and makes the clustering more reliable.
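A minimal sketch of z-score standardization, assuming two hypothetical features on very different scales (age in years, income in dollars). The data and the helper name `standardize` are illustrative; in practice a library scaler would usually be used instead.

```python
from statistics import mean, pstdev

# Hypothetical rows: (age in years, income in dollars).
rows = [(25, 40_000), (35, 60_000), (45, 80_000)]

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation.

    Assumes no column is constant; a zero standard deviation would divide by zero.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        tuple((value - mu) / sigma for value, (mu, sigma) in zip(row, stats))
        for row in rows
    ]

scaled = standardize(rows)
for row in scaled:
    print(row)
```

Before scaling, income differences are thousands of times larger than age differences, so any distance-based clustering would effectively ignore age. After scaling, both columns span the same range.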

5
Q
  • Are all clusters meaningful? If not, what are the pitfalls and how can we avoid them?
A

No.
Pitfalls include noise and outliers, and choosing an incorrect number of clusters.
To avoid them, use an elbow plot and the silhouette score to choose and validate the number of clusters.
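As a rough illustration of the silhouette score, the sketch below compares two candidate labelings of the same hypothetical points. The data, the labelings, and the helper name `silhouette_score` are assumptions for illustration; the function is a plain-Python version of the usual definition (b - a) / max(a, b), averaged over all points.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    scores = []
    for p, k in zip(points, labels):
        own = clusters[k]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != k
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two visually obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]   # two clusters matching the groups
bad_labels = [0, 0, 1, 1, 2, 2]    # three clusters that split the groups

good = silhouette_score(points, good_labels)
bad = silhouette_score(points, bad_labels)
print("two clusters:", round(good, 3))
print("three clusters:", round(bad, 3))
```

Scores near 1 indicate compact, well-separated clusters; scores near 0 or below suggest that the chosen number of clusters does not match the data.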

6
Q
  • Explain the concept of an anomaly and describe how unsupervised learning can help us to detect them.
A

An anomaly is a data point that deviates from the normal pattern in a dataset.
Unsupervised learning helps detect anomalies through clustering, density-based, and distance-based methods.
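One way to sketch the distance-based approach: score each point by its mean distance to its k nearest neighbors and flag points whose score exceeds a threshold. The data, `k`, the `threshold` value, and the helper name `knn_scores` are all hypothetical choices for illustration; a real threshold would have to come from the data and domain knowledge.

```python
from math import dist

def knn_scores(points, k=2):
    """Mean distance from each point to its k nearest neighbors."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores

# Hypothetical data: a tight group plus one far-away point.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (8.0, 8.0)]
threshold = 3.0  # assumed cutoff, chosen by eye for this toy data

scores = knn_scores(points, k=2)
anomalies = [p for p, s in zip(points, scores) if s > threshold]
print("anomalies:", anomalies)
```

No labels are needed: the point is flagged only because it is far from everything else.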

7
Q
  • Describe some of the important roles of a human operator when undertaking unsupervised learning.
A
  • Data cleaning, standardization
  • Feature selection
  • Parameter tuning
  • Engineering judgement and knowledge
8
Q
  • Hierarchical vs flat clustering
A
  • Hierarchical - captures nested structures and provides a visual representation through dendrograms, but is computationally intensive and less scalable.
  • Flat - efficient and scalable, but typically requires the number of clusters to be chosen in advance; suited to data without nested structure.
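To show what "nested structure" means, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering with single linkage on hypothetical 1-D values. The data and the helper name `single_linkage` are assumptions for illustration. The merge history it returns is the information a dendrogram draws.

```python
def single_linkage(values):
    """Agglomerative clustering of 1-D values; returns the merge history.

    Each entry is (gap, merged_cluster), where gap is the smallest distance
    between members of the two clusters that were joined.
    """
    clusters = [[v] for v in values]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        gap, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        merges.append((gap, merged))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical data with groups inside groups.
values = [1.0, 2.0, 6.0, 7.0, 20.0]
merges = single_linkage(values)
for gap, cluster in merges:
    print(gap, cluster)
```

Cutting the history at a chosen gap gives a flat clustering without fixing the number of clusters up front. The repeated all-pairs search is also why this approach scales poorly compared with flat methods.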