13 | DW-3 | t-SNE, UMAP Flashcards
(QUIZ 6)
t-SNE was invented by ______ in ______
Laurens van der Maaten (with Geoffrey Hinton), 2008
(QUIZ 6)
t-SNE stands for ________________________.
t-distributed stochastic neighbour embedding
(QUIZ 6)
PCA is a ______ dimensionality reduction technique, whereas t-SNE is a ______ technique.
PCA: linear
t-SNE: non-linear
https://www.geeksforgeeks.org/difference-between-pca-vs-t-sne/
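A minimal sketch of what "linear" means for PCA, assuming NumPy and made-up toy data: the embedding is just the centred data multiplied by a fixed matrix. The helper name `pca_project` is hypothetical and only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features (assumption)

# Centre the data, then use the top-2 right singular vectors as the PCA axes.
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2].T                           # (5, 2) projection matrix

def pca_project(points):
    """Project points onto the first two principal components."""
    return (points - mean) @ W

Z = pca_project(X)

# Linearity check: projecting a weighted average of two samples gives
# the same weighted average of their projections (weights sum to 1).
a, b = X[0], X[1]
lhs = pca_project(0.3 * a + 0.7 * b)
rhs = 0.3 * pca_project(a) + 0.7 * pca_project(b)
print(np.allclose(lhs, rhs))
```

t-SNE has no such fixed matrix: each point's position comes out of an iterative optimisation, which is why it can represent curved (non-linear) structure.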
(QUIZ 6)
PCA is focused on the ______ structure of the data, whereas t-SNE is focused on the ______ structure.
PCA: global
t-SNE: local
https://www.geeksforgeeks.org/difference-between-pca-vs-t-sne/
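One way to see why t-SNE is "local" is to look at the Gaussian neighbour probabilities it builds in the high-dimensional space. The sketch below assumes NumPy, toy data, and a single fixed `sigma`; real t-SNE tunes a separate bandwidth per point from the perplexity, so treat this as a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated toy clusters in 10 dimensions (assumption).
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 10)),
               rng.normal(8.0, 1.0, size=(20, 10))])

def conditional_p(X, sigma=1.0):
    """Row i holds p_{j|i}: Gaussian similarity of j to i, normalised over j != i."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

P = conditional_p(X)
same_cluster_mass = P[0, 1:20].sum()    # mass point 0 gives to its own cluster
other_cluster_mass = P[0, 20:].sum()    # mass point 0 gives to the far cluster
print(same_cluster_mass, other_cluster_mass)
```

Almost all of the probability mass lands on nearby points, so the objective says very little about how far apart distant clusters should be placed.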
(QUIZ 6)
PCA is a ______ algorithm, whereas t-SNE is ______.
This means that, for a given data set, the results of ______ are always the same, whereas those of ______ may differ from run to run.
PCA: deterministic
t-SNE: non-deterministic (stochastic; results depend on the random initialisation)
https://www.geeksforgeeks.org/difference-between-pca-vs-t-sne/
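A small check of this card, assuming scikit-learn is installed and using its `PCA` and `TSNE` classes on made-up data. The sample size, perplexity, and seeds are arbitrary choices for the sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))           # toy data (assumption)

def run_pca():
    return PCA(n_components=2, svd_solver="full").fit_transform(X)

def run_tsne(seed):
    # init="random" makes the starting layout depend only on the seed.
    return TSNE(n_components=2, perplexity=10, init="random",
                random_state=seed).fit_transform(X)

pca_same = np.allclose(run_pca(), run_pca())          # two PCA runs agree
tsne_same = np.allclose(run_tsne(0), run_tsne(1))     # two seeds disagree
print(pca_same, tsne_same)
```

Fixing `random_state` is the usual way to make a t-SNE run repeatable.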
(QUIZ 6)
In ______ we cannot choose how much variance to preserve; instead, hyperparameters (such as perplexity) control how distances between neighbours are preserved.
In ______ we decide how much variance to preserve using the eigenvalues.
t-SNE
PCA
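The eigenvalue idea can be sketched as follows, assuming NumPy and toy data. Each eigenvalue of the covariance matrix is the variance along one principal component, so a running total tells how many components are needed. The 95% threshold is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data whose features have very different variances (assumption).
X = rng.normal(size=(500, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])

cov = np.cov(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)[::-1]       # sorted largest first
explained = eigenvalues / eigenvalues.sum()       # variance share per component
cumulative = np.cumsum(explained)

target = 0.95                                     # hypothetical threshold
k = int(np.searchsorted(cumulative, target) + 1)  # smallest k reaching the target
print(k, cumulative[:k])
```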
What does UMAP stand for?
Uniform Manifold Approximation and Projection
What type of machine learning technique is UMAP?
A nonlinear dimensionality reduction technique
What is UMAP based on?
Topological and geometric principles of manifold learning
What kind of data is UMAP commonly used for?
High-dimensional data, such as transcriptomic, image, and clustering data
What is the main goal of UMAP?
To reduce the dimensionality of high-dimensional data while preserving local and global structure
How does UMAP compare to PCA in terms of aim?
Unlike PCA, which focuses on variance maximization, UMAP seeks to preserve the data’s intrinsic topological structure in a lower-dimensional space
What type of visualization does UMAP aim to provide?
A meaningful, interpretable 2D or 3D projection of complex datasets
What is the theoretical foundation of UMAP?
It is based on concepts from Riemannian geometry and algebraic topology
What does UMAP assume about high-dimensional data?
That it lies on a low-dimensional manifold embedded in a higher-dimensional space
What is the first step in UMAP?
Constructing a weighted graph representation of the data’s local manifold structure
What does UMAP optimize to generate the final embedding?
A low-dimensional graph layout that approximates the original high-dimensional structure
What are the main steps in UMAP’s algorithm?
Construct a k-nearest neighbors (kNN) graph
Apply a fuzzy simplicial set representation
Optimize a low-dimensional embedding that preserves the fuzzy topology
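The first two steps above can be sketched in NumPy. This is a simplified illustration on toy data, not the reference implementation: it uses brute-force neighbours, a plain bisection for each point's bandwidth, and hypothetical helper names (`knn`, `smooth_sigma`, `fuzzy_graph`).

```python
import numpy as np

def knn(X, k):
    """Indices and distances of each point's k nearest neighbours (self excluded)."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)

def smooth_sigma(d, rho, target, n_iter=64):
    """Bisect for sigma so that sum(exp(-(d - rho) / sigma)) is close to target."""
    lo, hi = 0.0, 1e3                    # search range is an assumption for toy data
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        total = np.exp(-np.maximum(d - rho, 0.0) / mid).sum()
        if total > target:
            hi = mid                     # weights too large: shrink sigma
        else:
            lo = mid
    return 0.5 * (lo + hi)

def fuzzy_graph(X, k):
    """Symmetric matrix of fuzzy edge memberships built from a kNN graph."""
    n = len(X)
    idx, dist = knn(X, k)
    W = np.zeros((n, n))
    for i in range(n):
        rho = dist[i, 0]                 # distance to the nearest neighbour
        sigma = smooth_sigma(dist[i], rho, np.log2(k))
        W[i, idx[i]] = np.exp(-np.maximum(dist[i] - rho, 0.0) / sigma)
    # Fuzzy union merges the directed edges i->j and j->i into one weight.
    return W + W.T - W * W.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))             # toy data (assumption)
G = fuzzy_graph(X, k=10)
print(G.shape, G.max())
```

Every point's nearest neighbour gets membership 1, which is how the construction keeps each point connected to at least one other point.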
What role does the k-nearest neighbors (kNN) graph play in UMAP?
It captures the local structure of the data
How does UMAP embed the data in a lower-dimensional space?
By optimizing a cross-entropy loss between the high-dimensional and low-dimensional representations
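A rough sketch of that loss, assuming NumPy, a hand-made membership matrix `P`, and the simplified low-dimensional similarity 1 / (1 + a·d^(2b)) with a = b = 1 (the library fits `a` and `b` from `min_dist` and `spread`). All names and values here are illustrative.

```python
import numpy as np

def low_dim_similarity(Y, a=1.0, b=1.0):
    """Similarity between embedded points: 1 / (1 + a * d^(2b))."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 / (1.0 + a * sq ** b)

def fuzzy_cross_entropy(P, Q, eps=1e-12):
    """Cross-entropy between two fuzzy graphs, summed over pairs i < j."""
    iu = np.triu_indices_from(P, k=1)
    p = np.clip(P[iu], eps, 1.0 - eps)
    q = np.clip(Q[iu], eps, 1.0 - eps)
    attract = p * np.log(p / q)                          # pulls true neighbours together
    repel = (1.0 - p) * np.log((1.0 - p) / (1.0 - q))    # pushes non-neighbours apart
    return float((attract + repel).sum())

# Toy high-dimensional graph: points 0-2 form one group, points 3-5 another.
P = np.full((6, 6), 0.05)
P[:3, :3] = 0.9
P[3:, 3:] = 0.9
np.fill_diagonal(P, 0.0)

Y_good = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [10.0, 0.0], [10.1, 0.0], [10.0, 0.1]])
Y_bad = Y_good[[0, 3, 1, 4, 2, 5]]       # same layout with the groups mixed up

loss_good = fuzzy_cross_entropy(P, low_dim_similarity(Y_good))
loss_bad = fuzzy_cross_entropy(P, low_dim_similarity(Y_bad))
print(loss_good < loss_bad)
```

The layout that keeps each group together scores a lower loss, which is the direction the optimiser moves the points in.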
What are the key hyperparameters in UMAP?
n_neighbors
min_dist
metric
spread
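These four hyperparameters can be passed as keyword arguments, assuming the third-party `umap-learn` package and its `umap.UMAP` class. The values shown are believed to be that package's defaults; the wrapper `make_reducer` is a hypothetical helper for illustration.

```python
# Believed defaults of umap-learn's UMAP class (check the installed version).
UMAP_PARAMS = {
    "n_neighbors": 15,
    "min_dist": 0.1,
    "metric": "euclidean",
    "spread": 1.0,
}

def make_reducer(**overrides):
    """Build a UMAP reducer; requires the third-party umap-learn package."""
    import umap  # imported lazily so the parameter dict can be inspected without it
    return umap.UMAP(**{**UMAP_PARAMS, **overrides})
```

A typical call would be `make_reducer(n_neighbors=30).fit_transform(X)` for a data matrix `X`.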
(UMAP)
What does n_neighbors control?
The size of the local neighborhood used to build the graph, and so the balance between local structure (small values) and global structure (large values)
(UMAP)
What does min_dist affect?
The minimum distance allowed between points in the lower-dimensional space, and so how compact the clusters appear (smaller values give tighter clusters)
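A sketch of how `min_dist` (together with `spread`) shapes the similarity that UMAP aims for in the embedding, assuming NumPy. The piecewise curve below is the target that the low-dimensional similarity function is fitted to; the function name `target_similarity` is made up for this card.

```python
import numpy as np

def target_similarity(d, min_dist=0.1, spread=1.0):
    """Desired low-dimensional similarity as a function of embedding distance d."""
    d = np.asarray(d, dtype=float)
    # Points closer than min_dist already count as fully similar;
    # beyond that, similarity decays exponentially at a rate set by spread.
    return np.where(d <= min_dist, 1.0, np.exp(-(d - min_dist) / spread))

d = np.array([0.0, 0.05, 0.5, 1.0])
tight = target_similarity(d, min_dist=0.0)
loose = target_similarity(d, min_dist=0.5)
print(tight, loose)
```

With a larger `min_dist` there is no reward for pulling neighbours closer than that distance, so clusters come out looser.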
How does the choice of metric influence UMAP?
It determines the distance function used to define neighborhood relationships
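A tiny example of why the metric matters, assuming NumPy and three hand-picked points: the same query has a different nearest neighbour under Euclidean and cosine distance, so the kNN graph (and the embedding built from it) would differ. The helper `nearest_neighbour` is hypothetical.

```python
import numpy as np

def nearest_neighbour(X, query_idx, metric):
    """Index of the point closest to X[query_idx] under the given metric."""
    q = X[query_idx]
    if metric == "euclidean":
        d = np.linalg.norm(X - q, axis=1)
    elif metric == "cosine":
        d = 1.0 - (X @ q) / (np.linalg.norm(X, axis=1) * np.linalg.norm(q))
    else:
        raise ValueError(f"unsupported metric: {metric}")
    d[query_idx] = np.inf                # exclude the query itself
    return int(np.argmin(d))

X = np.array([[1.0, 0.0],     # query
              [10.0, 0.0],    # same direction, far away
              [1.0, 1.0]])    # close by, different direction

print(nearest_neighbour(X, 0, "euclidean"))  # 2
print(nearest_neighbour(X, 0, "cosine"))     # 1
```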