13 | DW-3 | tSNE, UMAP Flashcards

1
Q

(QUIZ 6)
t-SNE was invented by ______ in ______

A

Laurens van der Maaten (with Geoffrey Hinton), 2008

2
Q

(QUIZ 6)
t-SNE means ________________________ .

A

t-distributed stochastic neighbour embedding

3
Q

(QUIZ 6)
PCA is a ______ dimensionality reduction technique, whereas t-SNE is a ______ technique.

A

PCA: linear
t-SNE: non-linear
https://www.geeksforgeeks.org/difference-between-pca-vs-t-sne/

4
Q

(QUIZ 6)
PCA is focused on the ______ structure of the data, whereas t-SNE is focused on the ______ structure.

A

PCA: global
t-SNE: local
https://www.geeksforgeeks.org/difference-between-pca-vs-t-sne/

5
Q

(QUIZ 6)
PCA is a ______ algorithm whereas t-SNE is ______.
The above means that the results from a data set for ______ are always the same whereas for ______ they might differ in each analysis.

A

PCA: deterministic
t-SNE: non-deterministic
https://www.geeksforgeeks.org/difference-between-pca-vs-t-sne/

6
Q

(QUIZ 6)
We cannot preserve variance in ______; instead, we can preserve distance using hyperparameters.
In ______ we decide how much variance to preserve using eigenvalues.

A

First blank: t-SNE (distances preserved via hyperparameters)
Second blank: PCA (variance preserved via eigenvalues)
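For the PCA half of this card, a minimal sketch assuming scikit-learn (the data below is a random placeholder): passing a fraction as n_components keeps just enough components to retain that share of the variance, which is derived from the eigenvalues of the covariance matrix.

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(200, 50)        # placeholder data matrix
    pca = PCA(n_components=0.95)       # keep enough components for 95% of the variance
    X_low = pca.fit_transform(X)
    print(X_low.shape)
    print(pca.explained_variance_ratio_.cumsum())  # cumulative variance retained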

7
Q

What does UMAP stand for?

A

Uniform Manifold Approximation and Projection

8
Q

What type of machine learning technique is UMAP?

A

A nonlinear dimensionality reduction technique

9
Q

What is UMAP based on?

A

Topological and geometric principles of manifold learning

10
Q

What kind of data is UMAP commonly used for?

A

High-dimensional data, such as transcriptomic, image, and clustering data

11
Q

What is the main goal of UMAP?

A

To reduce the dimensionality of high-dimensional data while preserving local and global structure

12
Q

How does UMAP compare to PCA in terms of aim?

A

Unlike PCA, which focuses on variance maximization, UMAP seeks to preserve the data’s intrinsic topological structure in a lower-dimensional space

13
Q

What type of visualization does UMAP aim to provide?

A

A meaningful, interpretable 2D or 3D projection of complex datasets

14
Q

What is the theoretical foundation of UMAP?

A

It is based on concepts from Riemannian geometry and algebraic topology

15
Q

What does UMAP assume about high-dimensional data?

A

That it lies on a low-dimensional manifold embedded in a higher-dimensional space

16
Q

What is the first step in UMAP?

A

Constructing a weighted graph representation of the data’s local manifold structure

17
Q

What does UMAP optimize to generate the final embedding?

A

A low-dimensional graph layout that approximates the original high-dimensional structure

18
Q

What are the main steps in UMAP’s algorithm?

A

Construct a k-nearest neighbors (kNN) graph
Apply a fuzzy simplicial set representation
Optimize a low-dimensional embedding that preserves the fuzzy topology
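Those three steps are what the umap-learn package runs internally when it is fitted; a minimal sketch (assuming umap-learn, with placeholder data):

    import numpy as np
    import umap

    X = np.random.rand(500, 30)            # placeholder high-dimensional data
    reducer = umap.UMAP(
        n_neighbors=15,                    # step 1: size of the kNN-graph neighbourhood
        min_dist=0.1,                      # step 3: how tightly the layout may pack points
        n_components=2,
        random_state=42,                   # fixed seed for a reproducible embedding
    )
    embedding = reducer.fit_transform(X)   # steps 2-3: fuzzy simplicial set + layout optimization
    print(embedding.shape)                 # (500, 2)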

19
Q

What role does the k-nearest neighbors (kNN) graph play in UMAP?

A

It captures the local structure of the data

20
Q

How does UMAP embed the data in a lower-dimensional space?

A

By optimizing a cross-entropy loss between the high-dimensional and low-dimensional representations
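Written out (notation assumed, following the UMAP paper rather than the card): with mu(a,b) the fuzzy edge weights in high dimension and nu(a,b) those of the candidate layout, the loss is the fuzzy-set cross-entropy

    C = \sum_{(a,b)} \left[ \mu(a,b)\,\log\frac{\mu(a,b)}{\nu(a,b)}
        + \bigl(1-\mu(a,b)\bigr)\,\log\frac{1-\mu(a,b)}{1-\nu(a,b)} \right]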

21
Q

What are the key hyperparameters in UMAP?

A

n_neighbors
min_dist
metric
spread
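A sketch of how these are set in umap-learn (the values shown are, to the best of my knowledge, the library defaults; treat them as assumptions):

    import umap

    reducer = umap.UMAP(
        n_neighbors=15,       # local vs global balance (neighbourhood size)
        min_dist=0.1,         # minimum spacing of points in the embedding
        metric="euclidean",   # distance function used to build the kNN graph
        spread=1.0,           # overall scale of the embedding, interacts with min_dist
    )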

22
Q

UMAP
What does n_neighbors control?

A

The balance between local and global structure preservation

23
Q

UMAP
What does min_dist affect?

A

The compactness of clusters in the lower-dimensional space

24
Q

How does the choice of metric influence UMAP?

A

It determines the distance function used to define neighborhood relationships

25
Q

How does UMAP compare to t-SNE in terms of computational efficiency?

A

UMAP is generally faster and scales better to large datasets

26
Q

UMAP
Which method preserves more of the global structure?

A

UMAP tends to preserve more of the global structure, while t-SNE focuses on local relationships

27
Q

How do UMAP and t-SNE handle different perplexity-like parameters?

A

UMAP uses n_neighbors, while t-SNE uses perplexity, but they serve similar functions

28
Q

What is a major difference in the cost function optimization between UMAP and t-SNE?

A

t-SNE minimizes Kullback-Leibler divergence, whereas UMAP optimizes a fuzzy topological representation

29
Q

What is one major weakness of UMAP?

A

It can sometimes distort global relationships in favor of preserving local structures

30
Q

How does UMAP handle density variation across clusters?

A

It may struggle with variable-density clusters, sometimes overcompressing sparse regions

31
Q

Why is UMAP not always deterministic?

A

Because of its reliance on stochastic processes in initialization and optimization

32
Q

What challenge does UMAP face with very high-dimensional sparse data?

A

It may require careful tuning of hyperparameters to avoid misleading embeddings

33
Q

Why is UMAP commonly used in transcriptomic analysis?

A

It effectively visualizes complex gene expression patterns in a low-dimensional space

34
Q

What type of transcriptomic data is UMAP often applied to?

A

Single-cell RNA sequencing (scRNA-seq) data

35
Q

How does UMAP help in scRNA-seq analysis?

A

It reveals cell clusters and differentiation pathways in an intuitive way

36
Q

What preprocessing steps are typically required before applying UMAP to transcriptomic data?

A

Normalization
Feature selection
Distance metric selection
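A sketch of that workflow with Scanpy, a widely used scRNA-seq toolkit (the input file name and the thresholds below are illustrative assumptions, not part of the card):

    import scanpy as sc

    adata = sc.read_h5ad("counts.h5ad")                    # hypothetical AnnData file of raw counts
    sc.pp.normalize_total(adata, target_sum=1e4)           # normalization
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000)   # feature selection
    adata = adata[:, adata.var.highly_variable]
    sc.pp.pca(adata, n_comps=50)
    sc.pp.neighbors(adata, n_neighbors=15, metric="euclidean")  # distance metric selection
    sc.tl.umap(adata, min_dist=0.1)
    sc.pl.umap(adata)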

37
Q

Which of the following dimensionality reduction methods use density aware distances?
- PCA
- ICA
- MDS
- t-SNE
- UMAP

A
  • t-SNE (via perplexity parameter)
  • UMAP (via k-nearest neighbour graph and fuzzy simplicial complex)
38
Q

What does t-SNE stand for?

A

t-Distributed Stochastic Neighbor Embedding

39
Q

What is t-SNE used for?

A

A nonlinear dimensionality reduction technique for high-dimensional data visualization

40
Q

In what field is t-SNE commonly used?

A

Single-cell RNA sequencing (scRNA-seq) and other biological data analyses

41
Q

What problem does t-SNE aim to solve?

A

The challenge of representing high-dimensional data in 2D or 3D while keeping similar points close together

Embeds high dimensional data by preserving local similarity (probabilistic)

42
Q

What are the main steps in t-SNE?

A
  • Compute pairwise similarities in high-dimensional space
  • Define a low-dimensional probability distribution
  • Optimize the embedding using gradient descent
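These steps are carried out by scikit-learn's TSNE when it is fitted; a minimal sketch with placeholder data:

    import numpy as np
    from sklearn.manifold import TSNE

    X = np.random.rand(500, 50)        # placeholder high-dimensional data
    tsne = TSNE(
        n_components=2,
        perplexity=30,                 # sets the Gaussian neighbourhood size in high dimension
        random_state=42,               # fixed seed -> reproducible layout
    )
    X_2d = tsne.fit_transform(X)       # pairwise similarities + gradient-descent optimization
    print(X_2d.shape)                  # (500, 2)
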
43
Q

How does t-SNE measure similarity in high-dimensional space?

A

Using a Gaussian distribution around each point

44
Q

How does t-SNE define similarity in low-dimensional space?

A

Using a Student’s t-distribution with one degree of freedom
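In the standard t-SNE notation (taken from the original paper, not from these cards): the high-dimensional similarity around x_i uses a Gaussian with bandwidth sigma_i set by the perplexity, while the low-dimensional similarity between embedded points y uses the t-distribution with one degree of freedom:

    p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
    \qquad
    q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
                  {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}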

45
Q

What does the perplexity parameter control in t-SNE?

A

The balance between local and global structure

46
Q

How does changing perplexity affect t-SNE results?

A
  • Low perplexity favors local structure
  • High perplexity includes more global relationships
47
Q

t-SNE: Cost Function
What function does t-SNE minimize?

A

Kullback-Leibler (KL) divergence between high- and low-dimensional probability distributions
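In symbols, using the symmetrized similarities (p from the Gaussian kernel, q from the t-distribution):

    C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
    \qquad p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}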

48
Q

t-SNE: Cost Function
Why does t-SNE use a Student’s t-distribution in low-dimensional space?

A

To avoid overcrowding and better separate clusters

49
Q

What are the advantages of t-SNE?

A
  • Excellent for visualizing complex datasets
  • Captures non-linear relationships
  • Good for clustering high-dimensional data
50
Q

What are the main weaknesses of t-SNE?

A
  • Computationally expensive
  • Non-deterministic (results vary across runs)
  • Poor at preserving global structure
51
Q

How does t-SNE handle large datasets?

A

It struggles with large datasets due to high computational cost

52
Q

How does t-SNE compare to UMAP in terms of speed?

A

UMAP is generally faster than t-SNE.
UMAP also preserves more global structure than t-SNE.

53
Q

Why might someone choose UMAP over t-SNE?

A

UMAP is faster, better at maintaining overall data structure, and can be made deterministic with a fixed seed

54
Q

_____ prioritises local structure, ignores global distances

A

t-SNE

55
Q

______ captures both local and some global structure

A

UMAP

56
Q

How are tSNE and UMAP similar?

A
  • non-linear
  • density-aware distances
  • random (but can be made deterministic with fixed seed)
  • NO variable importance assessment
57
Q

Which dimensionality reduction technique(s) include(s) variable importance assessment?

A

PCA: the component loadings indicate how much each original variable contributes to each component (t-SNE and UMAP do not provide this)

58
Q

Which dimensionality reduction technique(s) are or can be made deterministic?

A

is deterministic:
- PCA
- classical MDS

can be made deterministic:
- ICA - depends on implementation
- tSNE, UMAP: can be made deterministic with fixed seeds (see the sketch after this list)

not deterministic:
- metric/non-metric MDS
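For the tSNE/UMAP line above, the usual handle in the Python implementations is the random_state argument (a sketch, assuming scikit-learn and umap-learn):

    from sklearn.manifold import TSNE
    import umap

    tsne = TSNE(n_components=2, random_state=0)   # same seed -> same embedding across runs
    reducer = umap.UMAP(random_state=0)           # note: umap-learn drops some parallelism when seeded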

59
Q

Which dimensionality reduction technique(s) capture global structure?

A
  • PCA
  • MDS can be tuned for local or global (classical MDS - global)
  • UMAP: captures some global structure (but focuses on local)
60
Q

Which dimensionality reduction technique(s) ignore(s) global distances?

A

t-SNE

61
Q

re local/global distances:
Which dimensionality reduction technique(s) focus(es) on statistically independent directions, i.e. not distance based?

A

ICA (independent component analysis)