Unsupervised Learning Flashcards
1
Q
- Supervised vs unsupervised learning
A
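Supervised learning trains on labeled data (each input comes with a known target output); unsupervised learning works on unlabeled data and looks for structure in the inputs alone.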
2
Q
- How unsupervised learning can be helpful during exploration of a dataset
A
It helps explore and understand the underlying structure of a dataset (e.g. natural groupings and outliers),
making it easier to prepare the data for further analysis and model building.
3
Q
- Define a cluster and its attributes
A
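- A group of similar data points (closer to each other than to points in other clusters)
- Attributes: number of clusters, membership (which cluster each point belongs to), centroids (cluster centers), inertia (sum of squared distances from each point to its cluster's centroid)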
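A minimal pure-Python sketch of k-means (one common flat clustering algorithm, used here only as an example) that reports all four attributes. The function names `assign` and `kmeans`, the first-k-points initialization and the toy data are illustrative assumptions; scikit-learn exposes the same quantities on a fitted `KMeans` model as `labels_`, `cluster_centers_` and `inertia_`.

```python
import math

def assign(points, centroids):
    """Membership: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (membership, centroids, inertia)."""
    # Assumption: deterministic start from the first k points (libraries typically use k-means++).
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append([sum(dim) / len(members) for dim in zip(*members)] if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: reassigning points no longer moves the centroids
        centroids = new_centroids
    labels = assign(points, centroids)
    # Inertia: sum of squared distances from each point to its own centroid.
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids, inertia = kmeans(points, k=2)
print("number of clusters:", len(centroids))
print("membership:", labels)  # [0, 0, 0, 1, 1, 1]
print("centroids:", centroids)
print("inertia:", inertia)
```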
4
Q
- Why is it important to standardize data before performing clustering?
A
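Standardizing (e.g. rescaling each feature to zero mean and unit variance) ensures that all features contribute equally to the distance calculations. This improves the performance and reliability of the clustering algorithm and prevents features with larger scales from dominating the result.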
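A minimal sketch of z-score standardization in plain Python, using a hypothetical two-feature dataset (age in years, income in dollars). Before scaling, the Euclidean distance between two rows is driven almost entirely by income; after scaling, each column has mean 0 and standard deviation 1, so both features contribute. Mapping a constant column to zeros is an assumption of this sketch.

```python
import math
import statistics

def standardize(rows):
    """Z-score each column: (x - column mean) / column population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    # A constant column carries no information; map it to 0 instead of dividing by zero.
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical data: [age in years, annual income in dollars]
rows = [[25, 40_000], [35, 42_000], [45, 41_000], [55, 60_000]]
scaled = standardize(rows)
print(math.dist(rows[0], rows[1]))      # raw: the 2,000-dollar gap swamps the 10-year gap
print(math.dist(scaled[0], scaled[1]))  # scaled: both features now contribute
```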
5
Q
- Are all clusters meaningful? If not, what are the pitfalls and how can we avoid them?
A
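No.
Pitfalls include noise and outliers, and choosing an incorrect number of clusters.
Use an elbow plot (inertia vs number of clusters) and the silhouette score to choose and check the number of clusters.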
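A minimal pure-Python sketch of the silhouette score on an illustrative toy dataset. For each point, a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster; the point's silhouette is (b - a) / max(a, b), and the score is the mean over all points. Values near 1 indicate tight, well-separated clusters; values near 0 or below suggest a poor clustering. In practice you would compute it (for example with scikit-learn's `sklearn.metrics.silhouette_score`) for several candidate numbers of clusters, alongside the inertia values used for the elbow plot.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least 2 clusters)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least 2 clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels)) if lab == own and j != i]
        if not same:
            scores.append(0.0)  # convention: a point alone in its cluster scores 0
            continue
        a = sum(same) / len(same)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
good = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad = [0, 1, 0, 1, 0, 1]   # mixes the groups together
print(silhouette_score(points, good))  # close to 1
print(silhouette_score(points, bad))   # much lower
```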
6
Q
- Explain the concept of an anomaly and describe how unsupervised learning can help us to detect them.
A
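Data that deviates significantly from the normal pattern in a dataset.
Unsupervised learning helps detect anomalies through clustering-based, density-based and distance-based methods, which flag points that lie far from any cluster, in low-density regions, or far from their neighbors.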
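A minimal sketch of one distance-based method: score every point by the distance to its k-th nearest neighbor, then flag points whose score is far above typical. The function names, the choice of k and the rule "more than `threshold` times the median score" are illustrative assumptions, not a standard; in practice the threshold is tuned with domain knowledge.

```python
import math
import statistics

def knn_distance_scores(points, k=2):
    """Anomaly score = distance to the k-th nearest other point (larger = more isolated)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_anomalies(points, k=2, threshold=3.0):
    """Assumed rule: flag points whose score exceeds `threshold` times the median score."""
    scores = knn_distance_scores(points, k)
    cutoff = threshold * statistics.median(scores)
    return [score > cutoff for score in scores]

# Two tight groups plus one isolated point at (20, 0).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0), (20.0, 0.0)]
print(flag_anomalies(points))  # only the last point is flagged
```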
7
Q
- Describe some of the important roles of a human operator when undertaking unsupervised learning.
A
- Data cleaning, standardization
- Feature selection
- Parameter tuning
- Engineering judgement and domain knowledge to interpret the results
8
Q
- Hierarchical vs flat clustering
A
- Hierarchical: builds a nested tree of clusters, which is useful for capturing nested structures and provides a visual representation through dendrograms. It is computationally intensive and less scalable.
- Flat: efficient and scalable (e.g. k-means), but requires the number of clusters to be predefined; suitable for flat, non-hierarchical data structures.
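A naive pure-Python sketch of agglomerative (bottom-up) hierarchical clustering, assuming single linkage (the distance between two clusters is the distance between their closest members; complete, average and Ward linkage are common alternatives). Recording every merge gives the tree a dendrogram draws, and stopping early at a chosen number of clusters "cuts" the tree into a flat partition. Recomputing all cluster distances at every merge costs roughly O(n^3) time, which illustrates why hierarchical methods scale poorly; libraries such as SciPy (`scipy.cluster.hierarchy.linkage`) use more efficient algorithms. The function names are hypothetical.

```python
import math
from itertools import combinations

def single_link(points, members_a, members_b):
    """Single-linkage distance: closest pair of points across the two clusters."""
    return min(math.dist(points[i], points[j]) for i in members_a for j in members_b)

def agglomerate(points, n_clusters=1):
    """Repeatedly merge the two closest clusters until `n_clusters` remain.

    Returns the merge history (id_a, id_b, merge distance, merged size), i.e. what a
    dendrogram draws, and the remaining clusters as sorted lists of point indices.
    """
    clusters = {i: [i] for i in range(len(points))}  # every point starts in its own cluster
    merges = []
    next_id = len(points)
    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_link(points, clusters[pair[0]], clusters[pair[1]]))
        distance = single_link(points, clusters[a], clusters[b])
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, distance, len(clusters[next_id])))
        next_id += 1
    return merges, sorted(sorted(members) for members in clusters.values())

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
merges, _ = agglomerate(points)             # full hierarchy: 5 merges for 6 points
for merge in merges:
    print(merge)                             # the last merge jumps to a much larger distance
_, flat = agglomerate(points, n_clusters=2)  # cut the tree into 2 flat clusters
print(flat)                                  # [[0, 1, 2], [3, 4, 5]]
```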