Unsupervised Learning Flashcards

Question 1

Q

What is unsupervised learning?

Answer

A

Finding patterns in unlabeled data, on the assumption that the data has some structure.
It is an exploratory data analysis task.

Question 2

Q

Which unsupervised learning methods are used in Data Understanding?

Answer

A

Scatter plots and two- or three-dimensional PCA projections.
Correlation analysis.
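
As a rough illustration of the correlation analysis mentioned in this card, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The attribute names and values (`demand`, `price`, `supply`) are hypothetical toy data, not taken from the course material.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy attributes, invented for illustration only.
demand = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises linearly with demand
supply = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls linearly with demand

r_price = pearson(demand, price)     # close to +1: strong positive correlation
r_supply = pearson(demand, supply)   # close to -1: strong negative correlation
```

A value near +1 or -1 suggests a strong linear relationship between two attributes; a value near 0 suggests none, although a non-linear relationship may still exist.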

Question 3

Q

What is clustering?

Answer

A

Identifying groups of “similar” data objects. Clustering makes up the bulk of unsupervised learning (roughly 90%). Algorithms include hierarchical clustering.

Question 4

Q

What is association analysis?

Answer

A

Finding associations between attributes, or typical combinations of values, such as: If Demand=high and Supply=low then Price=high.

Question 5

Q

What kinds of problems could you solve with these methods?

Question 6

Q

Why is the evaluation of unsupervised learning results more
difficult than supervised learning?

Answer

A

Evaluating supervised learning is fairly straightforward: use cross-validation and separate test data.
For unsupervised learning, the evaluation criteria are not well defined. You can measure the compactness of clusters, or the support and confidence of association rules.
Plausibility checks also help: does the model make sense to experts (interpretability)?
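
Support and confidence can be computed directly from a table of records. The sketch below assumes the usual definitions (support = share of all records matching the whole rule; confidence = share of records matching the "if" part that also match the "then" part) and uses five made-up records for the rule "If Demand=high and Supply=low then Price=high". The function name and the data are hypothetical.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    def matches(record, conditions):
        return all(record.get(key) == value for key, value in conditions.items())

    with_antecedent = [r for r in records if matches(r, antecedent)]
    with_both = [r for r in with_antecedent if matches(r, consequent)]

    support = len(with_both) / len(records)
    # Confidence is undefined if the antecedent never occurs; 0.0 is used here.
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical records, invented for illustration only.
records = [
    {"Demand": "high", "Supply": "low", "Price": "high"},
    {"Demand": "high", "Supply": "low", "Price": "high"},
    {"Demand": "high", "Supply": "low", "Price": "low"},
    {"Demand": "low", "Supply": "high", "Price": "low"},
    {"Demand": "high", "Supply": "high", "Price": "low"},
]

support, confidence = support_and_confidence(
    records,
    antecedent={"Demand": "high", "Supply": "low"},
    consequent={"Price": "high"},
)
```

Here three records satisfy the "if" part and two of them also satisfy the "then" part, so the support is 2/5 and the confidence is 2/3.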

Question 7

Q

Hierarchical clustering algorithms

Answer

A

Build clusters step by step. Usually a bottom-up strategy is used: at first each data object is its own cluster, and clusters that are close to each other are then joined step by step (agglomerative hierarchical clustering). The opposite strategy is divisive hierarchical clustering.
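
The bottom-up strategy can be sketched in a few lines of Python. This is a minimal illustration, not an efficient implementation. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), which is one of several possible linkage choices; the function name, the stopping rule, and the points are hypothetical.

```python
import math

def agglomerative(points, target_clusters):
    """Merge the two closest clusters until target_clusters remain."""
    # Start: every data object is its own cluster.
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > target_clusters:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Join them into one cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 2-D points, invented for illustration only.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = agglomerative(points, target_clusters=3)
```

Recording the order and distance of each merge, instead of stopping at a fixed number of clusters, would give the full hierarchy that is usually drawn as a dendrogram.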

Question 8

Q

Divisive hierarchical clustering

Answer

A

The entire data set starts as a cluster and is divided step by step into smaller ones.

Question 9

Q

How does the clustering happen?

Answer

A

A dissimilarity (or similarity) matrix is used to find which groups are most different from each other. Distance measures: Euclidean (simplest), Manhattan, Pearson (correlation-based), Tscheby…
Clusters should not depend on the measurement unit, so the data should be normalized first.
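
To see why the measurement unit matters, the sketch below computes Euclidean and Manhattan distances on the same hypothetical people, once with height in metres and once in centimetres. Min-max scaling is used here as one possible normalization; the card does not say which normalization is intended, so that choice is an assumption.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def min_max_normalize(rows):
    """Rescale every attribute (column) to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

# Hypothetical (height, weight in kg) records, invented for illustration only.
rows_m = [(1.5, 50.0), (1.8, 80.0), (2.0, 65.0)]         # height in metres
rows_cm = [(150.0, 50.0), (180.0, 80.0), (200.0, 65.0)]  # height in centimetres

# Raw distances between the first two people depend on the unit.
raw_m = euclidean(rows_m[0], rows_m[1])
raw_cm = euclidean(rows_cm[0], rows_cm[1])

# After normalization both versions give the same distance
# (up to floating-point rounding).
norm_m = min_max_normalize(rows_m)
norm_cm = min_max_normalize(rows_cm)
scaled_m = euclidean(norm_m[0], norm_m[1])
scaled_cm = euclidean(norm_cm[0], norm_cm[1])
```

Without normalization, the attribute with the largest numeric range dominates the distance, so a change of unit can change which objects end up in the same cluster.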

Question 10

Q

Isotropic vs non-isotropic distances

(10 cards)