Unsupervised learning Flashcards
For what tasks can we use unsupervised learning?
Dimensionality reduction
Anomaly detection
Visualization
What are some challenges of k-means clustering?
Clusters tend to be the same size
Depends on initialization
Handles anisotropic data and non-linearities poorly
How can we improve k-means for non-linear datasets?
Change the cost function from Euclidean distance to a geodesic / graph-based / kernel-based distance.
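A minimal sketch of the graph-based route using scikit-learn's SpectralClustering on the two-moons toy set; the dataset and parameter choices are illustrative assumptions, not part of the card:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

# Two interleaving half-moons: plain k-means with Euclidean distance splits
# them badly, while a nearest-neighbour graph affinity recovers both moons.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)
```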
What is the elbow method?
A method for determining the number of clusters: stop increasing the number of clusters once the marginal reduction in within-cluster variance (inertia) becomes small.
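A minimal sketch of the elbow method with scikit-learn's KMeans; the toy data and the range of k are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))          # toy data; replace with your dataset

# Within-cluster sum of squares (inertia) for increasing k;
# pick the k where the curve "bends" and further gains are small.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 11)]
```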
What are the steps of PCA?
- Create design matrix
- Center data
- U D V^T = SVD((1/(N-1)) X X^T), i.e. the SVD of the sample covariance (X centered, samples as columns)
- Keep the k eigenvectors (columns of U) with the largest eigenvalues
- Projection: v* = U^T x
- Reconstruction: x ≈ U v*
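A numpy sketch of these steps, following the card's convention that samples are the columns of X; the toy data and the choice k = 2 are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))          # design matrix: 5 features, 100 samples

mu = X.mean(axis=1, keepdims=True)     # center the data
Xc = X - mu

C = (Xc @ Xc.T) / (Xc.shape[1] - 1)    # sample covariance (d x d)
U, D, _ = np.linalg.svd(C)             # symmetric PSD: columns of U are eigenvectors

k = 2
Uk = U[:, :k]                          # keep k eigenvectors with largest eigenvalues
V = Uk.T @ Xc                          # projection: v* = U^T x
X_rec = Uk @ V + mu                    # reconstruction: x ≈ U v* + mu
```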
Describe the steps in AAM (Active appearance models)
- Calculate the shape model (mean + eigenmodes across the dataset)
- Warp the image so it fits the landmark template
- Create the appearance model using PCA on the “shape-free patches”
- Apply PCA jointly to the shape and appearance parameters to capture correlations.
What are eigenfaces?
“Eigenvectors” created from face images using PCA. These eigenfaces can be used as a basis to reconstruct face images.
What are some interpretations of the PCA?
- Best L2 reconstruction error among all linear models of equal rank
- The k’th eigenvector is the direction of maximal variance orthogonal to all previous eigenvectors.
- PCA fits an ellipsoid to the data.
How can we interpret PCA probabilistically?
x = Uw + mu + e
w ~ N(0,I)
e ~ N(0, sigma^2*I)
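A small numpy sketch sampling from this generative model; the dimensions and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, N = 10, 2, 500                            # observed dim, latent dim, samples
U = np.linalg.qr(rng.normal(size=(d, k)))[0]    # orthonormal loading directions
mu = rng.normal(size=(d, 1))
sigma = 0.1

w = rng.normal(size=(k, N))                     # w ~ N(0, I)
e = sigma * rng.normal(size=(d, N))             # e ~ N(0, sigma^2 I)
x = U @ w + mu + e                              # x = Uw + mu + e
```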
How can we enforce sparsity on the PCA?
Add a sparsity penalty on the weights (L0, L1, …)
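A minimal sketch using scikit-learn's SparsePCA, which adds an L1 penalty on the components; the toy data and the value of alpha are assumptions:

```python
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # toy data; replace with your dataset

spca = SparsePCA(n_components=5, alpha=1.0, random_state=0).fit(X)
# Larger alpha -> stronger L1 penalty -> more zero entries in the components.
sparsity = np.mean(spca.components_ == 0)
```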
What is one of the main advantages of sparse penalties in the PCA setting?
They can remove noise.
What are the main disadvantages of autoencoders compared to PCA?
- Visualizing the latent space is harder
- Generating new samples is harder
How can AE be used for image anomaly detection?
Train the autoencoder on “normal” data only. At test time, reconstruct a new image and subtract the reconstruction from the original; large residuals indicate anomalies.
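A sketch of the test-time step; reconstruct is a hypothetical stand-in for a trained autoencoder's forward pass, and the threshold is an assumption:

```python
import numpy as np

def reconstruct(image):
    # Placeholder: a real autoencoder trained on "normal" data goes here.
    return image * 0.95

image = np.random.rand(64, 64)                 # new image to test
residual = np.abs(image - reconstruct(image))  # per-pixel reconstruction error
anomaly_mask = residual > 0.1                  # threshold chosen on validation data
score = residual.mean()                        # image-level anomaly score
```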
What is the difference between GMM and fully Bayesian GMM?
In a fully Bayesian GMM we place (hyper)priors on the mixing weights pi_k and on mu_k, sigma_k. A Dirichlet prior on the mixing weights usually favors fewer non-zero clusters, and priors on mu_k, sigma_k can prevent clusters from degenerating to zero variance.
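A minimal comparison sketch with scikit-learn; the toy data and the prior strength are assumptions. The Dirichlet-process prior typically leaves most of the 10 allowed components with near-zero weight:

```python
import numpy as np
from sklearn.mixture import GaussianMixture, BayesianGaussianMixture

rng = np.random.default_rng(0)
# Toy data with 3 true clusters
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 2)) for m in (-4, 0, 4)])

gmm = GaussianMixture(n_components=10, random_state=0).fit(X)
bgmm = BayesianGaussianMixture(n_components=10,
                               weight_concentration_prior_type="dirichlet_process",
                               weight_concentration_prior=0.01,
                               random_state=0).fit(X)
# gmm.weights_ spreads mass over all 10 components;
# bgmm.weights_ concentrates on roughly 3 of them.
```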
What is variational Bayesian inference?
When we cannot compute the posterior p(theta | X) analytically, we approximate it with a distribution q(theta) from a tractable family Q, typically chosen by minimizing the KL divergence KL(q(theta) || p(theta | X)) (equivalently, maximizing the ELBO).