Unsupervised Learning Flashcards
What is unsupervised learning?
Finding patterns from unlabeled data (if you know there is some structure).
Exploratory data analysis task.
Unsupervised learning methods used in Data Understanding?
Scatter plots, 2- or 3-dimensional PCA projections.
Correlation analysis.
What is clustering?
Identifying groups of “similar” data objects. As a rule of thumb, roughly 90% of unsupervised learning is clustering. Example algorithm: hierarchical clustering.
What is association analysis?
Finding associations between attributes or typical combinations of values, like: If Demand=high and Supply=low, then Price=high.
What kinds of problems could you solve with these methods?
Why is the evaluation of unsupervised learning results more difficult than supervised learning?
Evaluation of supervised learning is pretty straightforward: use cross-validation and separate test data.
Unsupervised learning: evaluation criteria are not well defined. One can measure the compactness of clusters, or the support and confidence of association rules.
Plausibility checks: does the model make sense to experts (interpretability)?
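Support and confidence can be computed by simple counting. A minimal sketch for the rule "If Demand=high and Supply=low then Price=high"; the records below are made-up toy data, and the helper names `matches` and `support_and_confidence` are hypothetical:

```python
# Toy records (made-up data, for illustration only).
records = [
    {"Demand": "high", "Supply": "low", "Price": "high"},
    {"Demand": "high", "Supply": "low", "Price": "high"},
    {"Demand": "high", "Supply": "low", "Price": "low"},
    {"Demand": "low", "Supply": "low", "Price": "low"},
    {"Demand": "high", "Supply": "high", "Price": "low"},
]


def matches(record, conditions):
    """True if the record has every attribute=value pair in conditions."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def support_and_confidence(records, antecedent, consequent):
    """Support: share of all records that satisfy the whole rule.
    Confidence: share of antecedent-matching records that also satisfy the consequent."""
    n_antecedent = sum(matches(r, antecedent) for r in records)
    n_both = sum(
        matches(r, antecedent) and matches(r, consequent) for r in records
    )
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = support_and_confidence(
    records,
    antecedent={"Demand": "high", "Supply": "low"},
    consequent={"Price": "high"},
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here 2 of 5 records satisfy the whole rule (support 0.4), and 2 of the 3 records with Demand=high and Supply=low also have Price=high (confidence about 0.67).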
Hierarchical clustering algorithms
Build clusters step by step. Usually a bottom-up strategy is used: at first each data object is considered its own cluster, then clusters that are close to each other are joined step by step (agglomerative hierarchical clustering). The opposite strategy is divisive hierarchical clustering.
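The bottom-up strategy can be sketched in a few lines of plain Python. This is an illustration, not a reference implementation: it assumes Euclidean distance and single linkage (cluster distance = distance of the closest pair of members), neither of which is fixed by the card, and the function names and sample points are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(c1, c2, points):
    """Distance between two clusters = distance of their closest members (assumed choice)."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerative(points, n_clusters):
    # Start with every data object as its own cluster (lists of point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # history of (cluster_a, cluster_b, distance)
    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest to each other.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Join the two clusters; b > a, so removing b keeps index a valid.
        clusters[a] = clusters[a] + clusters.pop(b)
    return clusters, merges


# Made-up sample points: two objects near (1, 1) and three near (5, 5).
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.4), (5.5, 5.2)]
clusters, merges = agglomerative(points, n_clusters=2)
print(clusters)
```

The `merges` history records which clusters were joined at which distance, which is the information a dendrogram displays.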
Divisive hierarchical clustering
The entire data set starts as a cluster and is divided step by step into smaller ones.
How does the clustering happen?
A dissimilarity (or similarity) matrix of pairwise distances is used to decide which clusters to join (the closest ones) or to split (the most different ones). Distance measures: Euclidean (simplest), Manhattan, Pearson, Tscheby…
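A short sketch of three of the listed measures and the resulting dissimilarity matrix. The Pearson-based distance is taken here as 1 minus the correlation coefficient, which is one common convention and an assumption of this sketch; the measure whose name is cut off in the card is left out. Function names and sample points are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """1 - Pearson correlation (assumed convention).
    Undefined for constant vectors, whose variance is zero."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def dissimilarity_matrix(points, distance):
    """Square matrix of pairwise distances between all data objects."""
    return [[distance(p, q) for q in points] for p in points]


# Made-up sample points.
points = [(0.0, 0.0, 1.0), (3.0, 4.0, 1.0), (6.0, 8.0, 2.0)]
for row in dissimilarity_matrix(points, euclidean):
    print([round(d, 3) for d in row])
```

Any of the three functions can be passed as `distance`; the matrix is symmetric with zeros on the diagonal for Euclidean and Manhattan.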
Clusters should not depend on the measurement unit! The attributes should therefore be normalized before distances are computed.
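A small demonstration of the unit problem, using made-up height and weight values. Z-score standardization is used as one possible normalization (an assumption; the card does not name a method), and the helper names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def zscore(rows):
    """Standardize each column to mean 0 and standard deviation 1.
    Assumes no column is constant (its standard deviation would be zero)."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def nearest_to_first(rows):
    """Index of the row closest to rows[0] under Euclidean distance."""
    return min(range(1, len(rows)), key=lambda i: euclidean(rows[0], rows[i]))


# Made-up data: (height, weight in kg), with height in metres or centimetres.
in_metres = [(1.60, 60.0), (1.90, 61.0), (1.61, 64.0)]
in_centimetres = [(h * 100, w) for h, w in in_metres]

# Raw distances: the nearest neighbour of the first person changes with the unit.
print(nearest_to_first(in_metres), nearest_to_first(in_centimetres))

# After standardization the result no longer depends on the unit.
print(nearest_to_first(zscore(in_metres)), nearest_to_first(zscore(in_centimetres)))
```

With heights in metres the weight difference dominates the distance; in centimetres the height difference dominates. Standardizing each attribute removes that dependence.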
Isotropic vs non-isotropic distances