Unsupervised Learning Flashcards

1
Q

What is unsupervised learning?

A

Finding patterns in unlabeled data, assuming the data has some structure to find.
It is an exploratory data analysis task.

2
Q

Unsupervised learning methods used in Data Understanding?

A

Scatter plots, 2- or 3-dimensional PCA projections.
Correlation analysis.
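
Correlation analysis can be sketched as below. This is only a minimal illustration using the Pearson coefficient; the function name `pearson` and the toy `demand`/`price` columns are made up for the example and are not part of the card.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two attributes around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy attributes, invented for illustration
demand = [1, 2, 3, 4, 5]
price = [2, 4, 5, 4, 5]
r = pearson(demand, price)
print(f"correlation(demand, price) = {r:.3f}")
```

A value near +1 or -1 hints at a strong linear relationship between two attributes; a value near 0 hints at none.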

3
Q

What is clustering?

A

Identifying groups of “similar” data objects. Clustering is by far the most common unsupervised learning task. Algorithms: hierarchical clustering.

4
Q

What is association analysis?

A

Finding associations between attributes or typical combinations of values, like: If Demand=high and Supply=low then Price=high.
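
To make the example rule concrete, here is a small sketch that scores it by support and confidence on toy records. The records are invented for illustration, and `support_confidence` is a hypothetical helper, not a library function.

```python
def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of all records matching antecedent AND consequent
    confidence = fraction of antecedent-matching records that also match
                 the consequent
    """
    def matches(record, conditions):
        return all(record.get(k) == v for k, v in conditions.items())

    with_antecedent = [r for r in records if matches(r, antecedent)]
    with_both = [r for r in with_antecedent if matches(r, consequent)]
    support = len(with_both) / len(records)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical market observations
records = [
    {"Demand": "high", "Supply": "low", "Price": "high"},
    {"Demand": "high", "Supply": "low", "Price": "high"},
    {"Demand": "high", "Supply": "low", "Price": "low"},
    {"Demand": "low", "Supply": "high", "Price": "low"},
    {"Demand": "high", "Supply": "high", "Price": "low"},
]
support, confidence = support_confidence(
    records,
    antecedent={"Demand": "high", "Supply": "low"},
    consequent={"Price": "high"},
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here the rule holds in 2 of 5 records (support) and in 2 of the 3 records where the "if" part applies (confidence).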

5
Q

What kinds of problems could you solve with these methods?

A
6
Q

Why is the evaluation of unsupervised learning results more difficult than supervised learning?

A

Evaluation of supervised learning is pretty straightforward: use cross-validation and separate test data.
Unsupervised learning: evaluation criteria are not well defined. Can measure compactness of clusters, or support and confidence for association rules.
Plausibility checks: does the model make sense to experts (interpretability)?
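
One possible way to put a number on cluster compactness is the within-cluster sum of squared distances to each cluster's centroid. The sketch below assumes that particular measure (the card does not name one); `within_cluster_ss` and the point sets are hypothetical.

```python
def within_cluster_ss(clusters):
    """Sum of squared distances from every point to its cluster centroid.

    Lower values mean more compact clusters. Each cluster is a non-empty
    list of equal-length numeric tuples.
    """
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in points
        )
    return total

# The same four points, grouped sensibly and grouped badly
tight = [[(0, 0), (0, 2)], [(10, 10), (10, 12)]]
loose = [[(0, 0), (10, 10)], [(0, 2), (10, 12)]]
print(within_cluster_ss(tight), within_cluster_ss(loose))
```

The score compares groupings of the same data; it does not say whether the grouping is meaningful, which is why plausibility checks by experts still matter.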

7
Q

Hierarchical clustering algorithms

A

Build clusters step by step. The usual strategy is bottom-up: at first each data object is considered its own cluster, and clusters that are close to each other are joined step by step (agglomerative hierarchical clustering). The opposite strategy is divisive hierarchical clustering.
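
The bottom-up strategy can be sketched in a few lines. This is a naive illustration, assuming Euclidean distance and single linkage (cluster distance = distance of the two closest members), neither of which the card prescribes; `single_linkage` is a hypothetical name.

```python
import math

def single_linkage(points, num_clusters):
    """Naive agglomerative clustering with single linkage.

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until num_clusters (>= 1) remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # join the two clusters
        del clusters[j]
    return clusters

# Hypothetical 2-D data objects
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = single_linkage(points, 3)
for cluster in result:
    print(cluster)
```

Recording the order and distance of each merge, rather than stopping at a fixed number of clusters, would give the full hierarchy.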

8
Q

Divisive hierarchical clustering

A

The entire data set starts as a cluster and is divided step by step into smaller ones.

9
Q

How does the clustering happen?

A

A dissimilarity (or similarity) matrix is used to find the most different groups. Distance measures: Euclidean (simplest), Manhattan, Pearson, Tscheby…
Clusters should not depend on the measurement unit, so some normalization should happen first.
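
A short sketch of two of the distance measures and of why normalization matters. It assumes z-score standardization as the normalization step (the card does not say which one to use); the helper names and the height/weight rows are hypothetical.

```python
import math
import statistics

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def standardize(rows):
    """z-score every column: subtract its mean, divide by its std deviation.

    Assumes no column is constant (std deviation would be zero).
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs)) for row in rows
    ]

# Same hypothetical people, height in metres vs centimetres, weight in kg
rows_m = [(1.5, 50), (1.8, 80), (1.6, 65)]
rows_cm = [(150, 50), (180, 80), (160, 65)]

# Raw distances change with the unit ...
print(euclidean(rows_m[0], rows_m[1]), euclidean(rows_cm[0], rows_cm[1]))

# ... standardized distances do not (up to floating-point rounding)
z_m, z_cm = standardize(rows_m), standardize(rows_cm)
print(euclidean(z_m[0], z_m[1]), euclidean(z_cm[0], z_cm[1]))
```

After standardization each attribute contributes on a comparable scale, so switching from metres to centimetres no longer changes which objects look close.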

10
Q

Isotropic vs non-isotropic distances

A