Unsupervised learning Flashcards

1
Q

For what tasks can we use unsupervised learning?

A

Dimensionality reduction
Anomaly detection
Visualization

2
Q

What are some challenges of k-means clustering?

A

Clusters tend to be the same size
Depends on initialization
Handles anisotropic data and non-linearities poorly
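
For example (a minimal sketch, assuming scikit-learn), running k-means with a single random initialization can land in different local optima:

```python
# Minimal sketch: k-means results can differ between random initializations.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# n_init=1 uses a single random initialization, so the local optimum found
# (and its inertia) can change from seed to seed.
for seed in (0, 1, 2):
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}  inertia={km.inertia_:.1f}")
```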

3
Q

How can we improve k-means for non-linear datasets?

A

Change the distance/cost function from Euclidean to geodesic, graph-based, or kernel-based.
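
A minimal sketch of one graph-based alternative, spectral clustering (assuming scikit-learn), on a dataset where plain k-means fails:

```python
# Minimal sketch: graph-based clustering (spectral clustering) vs. plain k-means
# on a non-linearly separable dataset (two interleaved moons).
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)

print("k-means ARI:  ", adjusted_rand_score(y, km_labels))   # typically well below 1
print("spectral ARI: ", adjusted_rand_score(y, sc_labels))   # typically close to 1
```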

4
Q

What is the elbow method?

A

A method for determining the number of clusters: stop increasing the number of clusters when the marginal gain (the drop in within-cluster error) becomes small.
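
A minimal sketch (assuming scikit-learn and matplotlib) of the curve one inspects for the elbow:

```python
# Minimal sketch of the elbow method: plot within-cluster sum of squares
# (inertia) vs. k and stop where the marginal improvement becomes small.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia (within-cluster SSE)")
plt.show()  # the "elbow" appears around k = 4 for this data
```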

5
Q

What are the steps of PCA?

A
  1. Create the design matrix X (one sample per column).
  2. Center the data (subtract the mean).
  3. U D U^T = SVD( 1/(N-1) X X^T ), i.e. the SVD/eigendecomposition of the sample covariance matrix.
  4. Keep the n eigenvectors (columns of U) with the largest eigenvalues.
    Project:     v* = U^T x
    Reconstruct: x ≈ U v*
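
A minimal NumPy sketch of these steps, using the same samples-as-columns convention:

```python
# Minimal sketch of PCA via an SVD/eigendecomposition of the sample covariance.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=(2, 200))                        # 2 underlying factors
mixing = rng.normal(size=(5, 2))
X = mixing @ latent + 0.05 * rng.normal(size=(5, 200))    # 1. design matrix: 5 features x 200 samples

mu = X.mean(axis=1, keepdims=True)                        # 2. center the data
Xc = X - mu

cov = (Xc @ Xc.T) / (Xc.shape[1] - 1)                     # 3. sample covariance (D x D)
U, D, _ = np.linalg.svd(cov)                              #    symmetric PSD, so SVD = eigendecomposition

n = 2
Un = U[:, :n]                                             # 4. keep the n leading eigenvectors

V_star = Un.T @ Xc                                        # project:     v* = U^T x
X_hat = Un @ V_star + mu                                  # reconstruct: x ≈ U v* (+ mean)

print("reconstruction error:", np.linalg.norm(X - X_hat))  # small, since the data is near rank 2
```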
6
Q

Describe the steps in AAM (Active appearance models)

A
  1. Calculate the shape model (mean shape + eigenmodes of the landmarks throughout the dataset).
  2. Warp each image so it fits the mean landmark template.
  3. Create the appearance model using PCA on the “shape-free patches”.
  4. Run PCA jointly on the shape and appearance parameters to capture their correlations.
7
Q

What are eigenfaces?

A

“Eigenvectors” created from face images using PCA. These eigenfaces can be used as a basis to reconstruct face images.
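
A minimal sketch (assuming scikit-learn and its Olivetti faces dataset) of computing eigenfaces and reconstructing a face from them:

```python
# Minimal sketch: eigenfaces = principal components of vectorized face images.
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

faces = fetch_olivetti_faces()                     # 400 images of 64x64 = 4096 pixels
X = faces.data                                     # shape (400, 4096)

pca = PCA(n_components=50).fit(X)
eigenfaces = pca.components_.reshape(-1, 64, 64)   # each component is a "face-shaped" basis vector

# Reconstruct the first face from its 50 eigenface coefficients.
coeffs = pca.transform(X[:1])
reconstruction = pca.inverse_transform(coeffs).reshape(64, 64)
```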

8
Q

What are some interpretations of PCA?

A
  1. Lowest L2 reconstruction error among all linear models of equal rank.
  2. The k’th eigenvector is the direction of maximal variance orthogonal to all previous eigenvectors.
  3. PCA fits an ellipsoid to the data.
9
Q

How can we interpret PCA probabilistically?

A

x = Uw + mu + e
w ~ N(0,I)
e ~ N(0, sigma^2*I)
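
A minimal NumPy sketch sampling from this generative model (U, mu, and sigma are arbitrary illustrative choices):

```python
# Minimal sketch: sample data from the probabilistic-PCA generative model
#   x = U w + mu + e,   w ~ N(0, I),   e ~ N(0, sigma^2 I).
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 2, 1000                           # data dim, latent dim, number of samples

U, _ = np.linalg.qr(rng.normal(size=(d, k)))   # arbitrary orthonormal basis (illustration only)
mu = rng.normal(size=(d, 1))
sigma = 0.1

W = rng.normal(size=(k, n))                    # latent variables w ~ N(0, I)
E = sigma * rng.normal(size=(d, n))            # isotropic noise e ~ N(0, sigma^2 I)
X = U @ W + mu + E                             # observed data, one sample per column
```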

10
Q

How can we enforce sparsity on the PCA?

A

Add a sparsity penalty on the weights (L0, L1, …).
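
A minimal sketch (assuming scikit-learn, whose SparsePCA uses an L1 penalty on the components):

```python
# Minimal sketch: sparse PCA via an L1 penalty on the component weights.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
X[:, :3] += 3 * rng.normal(size=(200, 1))      # a few correlated, high-variance features

dense = PCA(n_components=5).fit(X)
sparse = SparsePCA(n_components=5, alpha=1.0, random_state=0).fit(X)  # alpha controls sparsity

print("non-zero weights, PCA:      ", np.count_nonzero(dense.components_))   # all entries non-zero
print("non-zero weights, SparsePCA:", np.count_nonzero(sparse.components_))  # usually far fewer
```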

11
Q

What is one of the main advantages of sparse penalties in the PCA setting?

A

Can remove noise

12
Q

What are the main disadvantages of autoencoders compared to PCA?

A
  1. Visualizing/interpreting the latent space is harder.
  2. Generating new samples is harder (the latent distribution is not known in advance).

13
Q

How can an AE be used for image anomaly detection?

A

Train the autoencoder on “normal” data only. For a new image, subtract the autoencoder’s reconstruction from the original; regions with large residuals are the anomalies.
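
A minimal sketch of the test-time step; `autoencoder` is a hypothetical callable standing in for a model already trained on normal data:

```python
# Minimal sketch: per-pixel anomaly map from an autoencoder trained on normal data.
# `autoencoder` is a hypothetical callable returning the reconstructed image.
import numpy as np

def anomaly_map(image: np.ndarray, autoencoder, threshold: float = 0.1):
    reconstruction = autoencoder(image)            # AE reproduces only "normal" structure
    residual = np.abs(image - reconstruction)      # large residual = poorly reconstructed region
    return residual, residual > threshold          # error map and binary anomaly mask
```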

14
Q

What is the difference between GMM and fully Bayesian GMM?

A

In a fully Bayesian GMM we assign hyperpriors to the mixture weights pi_k and to the component parameters mu_k, sigma_k. A Dirichlet prior on pi_k usually favors fewer non-zero clusters, and hyperpriors on mu_k, sigma_k help avoid clusters that degenerate to zero variance.
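
A minimal sketch (assuming scikit-learn): with a Dirichlet-process prior on the weights, the Bayesian variant shrinks the weights of unneeded components toward zero:

```python
# Minimal sketch: GMM vs. fully Bayesian GMM with a Dirichlet(-process) prior on the weights.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

gmm = GaussianMixture(n_components=10, random_state=0).fit(X)
bgmm = BayesianGaussianMixture(
    n_components=10,                                  # upper bound on the number of clusters
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.01,                  # small value favours fewer active clusters
    random_state=0,
).fit(X)

print("GMM weights:         ", np.round(gmm.weights_, 2))   # all 10 components get used
print("Bayesian GMM weights:", np.round(bgmm.weights_, 2))  # most weights shrink toward 0
```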

15
Q

What is variational Bayesian inference?

A

When we cannot compute the posterior p(theta | X) analytically, we can approximate it using q(theta) from a family of distributions Q.

16
Q

What kind of families Q can we use?

A
  1. Dirac delta (finds the mode, i.e. the MAP estimate)
  2. A parametric family, e.g. Gaussians
  3. Mean-field approximation (q factorizes).
17
Q

In variational inference, what measure do we use for the distance between two distributions?

A

KL(q || p) = E_q[log(q/p)]
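
A minimal NumPy/SciPy sketch estimating KL(q || p) by Monte Carlo for two Gaussians; note the samples are drawn from q, matching the expectation E_q:

```python
# Minimal sketch: Monte Carlo estimate of KL(q || p) = E_q[log q(x) - log p(x)].
import numpy as np
from scipy.stats import norm

q = norm(loc=0.0, scale=1.0)        # approximating distribution q
p = norm(loc=1.0, scale=2.0)        # target distribution p

x = q.rvs(size=100_000, random_state=0)            # samples from q (the expectation is under q)
kl_estimate = np.mean(q.logpdf(x) - p.logpdf(x))

# Closed form for two 1-D Gaussians, for comparison:
kl_exact = np.log(2.0 / 1.0) + (1.0**2 + (0.0 - 1.0)**2) / (2 * 2.0**2) - 0.5
print(kl_estimate, kl_exact)
```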

18
Q

What can we do instead of minimizing KL?

A

Maximize the ELBO = E_q[log p(X | theta)] - KL(q(theta) || p(theta)). Since log p(X) = ELBO + KL(q(theta) || p(theta | X)), maximizing the ELBO is equivalent to minimizing the KL to the true posterior.
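
A minimal NumPy/SciPy sketch of a Monte Carlo ELBO for a toy model (X_i ~ N(theta, 1), prior theta ~ N(0, 1), Gaussian q); all names and hyperparameters here are illustrative assumptions:

```python
# Minimal sketch: Monte Carlo ELBO = E_q[log p(X | theta)] - KL(q(theta) || p(theta))
# for a toy model: X_i ~ N(theta, 1), prior theta ~ N(0, 1), q(theta) = N(m, s^2).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=50)        # observed data

def elbo(m, s, n_samples=5_000):
    theta = norm(m, s).rvs(size=n_samples, random_state=0)           # theta ~ q(theta)
    loglik = norm.logpdf(X[None, :], loc=theta[:, None], scale=1.0)  # log p(X_i | theta)
    exp_loglik = loglik.sum(axis=1).mean()                           # Monte Carlo E_q[log p(X | theta)]
    kl = np.log(1.0 / s) + (s**2 + m**2) / 2.0 - 0.5                 # KL( N(m, s^2) || N(0, 1) ), closed form
    return exp_loglik - kl

print(elbo(m=2.0, s=0.2), elbo(m=0.0, s=1.0))      # a q close to the posterior gives a higher ELBO
```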