2 Data Visualizations Flashcards

1
Q

4 Basic Visualization Technique Categories

A
  1. Array plots
  2. Scatter plots
  3. Histograms
  4. Graphs
2
Q

Array Plots

A

Rows are data points (instances)
Columns are numerical features
Each grid element is colored according to its feature value
Often uses a color map together with a color bar
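A minimal sketch of an array plot using matplotlib (an assumption — the cards name no library); the dataset and the badly scaled feature are synthetic, purely for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Synthetic dataset: 50 instances (rows) x 8 numerical features (columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
X[:, 3] *= 100  # one feature on a much larger scale (a normalization issue)

fig, ax = plt.subplots()
im = ax.imshow(X, aspect="auto", cmap="viridis")  # each grid element colored by value
fig.colorbar(im, ax=ax, label="feature value")    # color bar maps colors back to values
ax.set_xlabel("feature")
ax.set_ylabel("instance")
fig.savefig("array_plot.png")
```

The badly scaled column dominates the color range, which is exactly the kind of normalization issue an array plot reveals at a glance.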

3
Q

What does it mean when values on an array plot are not colored?

A

There are missing values in the data (values that have been set to negative infinity, which falls outside the color map's range)

4
Q

Strengths of Array Plots

A

Reveals qualitative information about dataset structure

Helps detect missing values, normalization issues, and batch differences quickly

5
Q

Limitations of Array Plots

A

Can become overwhelming with large datasets.

Lacks precise information about value distributions or feature correlations.

6
Q

How do array plots help detect missing values, normalization issues, and batch differences?

A
  • Missing values: Appear as distinct color gaps or unusual patterns.
  • Normalization issues: Features with different scales will have inconsistent color ranges.
  • Batch differences: Different groups of data points may show noticeably different color patterns, indicating batch effects (visible when each group takes on its own characteristic coloring).
7
Q

Scatter Plots

A

Consider two features at a time in order to detect potential correlations, by displaying the values of the two features on the x- and y-axes respectively.

8
Q

How to augment scatter plots using transparency

A

By adjusting transparency according to density (so that regions where many data points overlap appear darker), it becomes easier to identify where most of the data points lie on the plot.
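A sketch of the transparency trick with matplotlib's `scatter` (library choice and data are assumptions): giving every point a low fixed `alpha` makes overlapping points accumulate opacity, so dense regions appear darker.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Two correlated synthetic features
x = rng.normal(size=2000)
y = 0.8 * x + 0.3 * rng.normal(size=2000)

fig, ax = plt.subplots()
# alpha=0.1: a single point is faint, but 10+ overlapping points are opaque,
# so the plot doubles as a density map.
ax.scatter(x, y, alpha=0.1, s=10)
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
fig.savefig("scatter_alpha.png")
```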

9
Q

Histograms

A

Focus on a single numerical feature in order to extract more information about that feature.

The number of instances having a particular feature value (the count) is given on the y-axis.

Feature values are indicated on the x-axis.
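A minimal histogram sketch with matplotlib (library and skewed "spending" data are assumptions, echoing the large-spenders example on a later card):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Skewed synthetic feature, e.g. spending amounts with a heavy tail
spending = rng.lognormal(mean=3.0, sigma=1.0, size=1000)

fig, ax = plt.subplots()
counts, bins, _ = ax.hist(spending, bins=40)  # y-axis: count, x-axis: feature value
ax.set_xlabel("feature value")
ax.set_ylabel("count")
fig.savefig("histogram.png")
```

The long right tail visible in such a histogram is the kind of hint that suggests preprocessing (e.g. a log transform) to reduce the impact of outliers.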

10
Q

What are the strengths of histograms in data visualization?

A

Precisely shows the distribution of feature values (e.g., mean, variance, tailedness, and outliers).

Provides hints for preprocessing to reduce the impact of outliers (e.g., large spenders).

11
Q

Weaknesses of histograms in data visualization

A

Does not show correlations between features.

Not suitable for high-dimensional data (requires a separate histogram for each feature).

12
Q

Graph Visualization

A

Arrange the nodes of the graph in a ring (2D layout) and draw a line between each pair of nodes connected by an edge.

This works well when the graph is sparse (few edges relative to the number of node pairs)

If edge strengths are real-valued, only draw a line when the edge value is above a certain threshold, or use different transparency levels to encode the strength
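A sketch of the ring layout with thresholded real-valued edges, using only numpy and matplotlib (the random weight matrix and threshold are made up for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
n = 12
# Random symmetric real-valued edge strengths
W = rng.random((n, n))
W = (W + W.T) / 2
threshold = 0.8  # only draw sufficiently strong edges

# Place the n nodes evenly on a ring
angles = 2 * np.pi * np.arange(n) / n
pos = np.column_stack([np.cos(angles), np.sin(angles)])

fig, ax = plt.subplots()
ax.scatter(pos[:, 0], pos[:, 1], zorder=2)
for i in range(n):
    for j in range(i + 1, n):
        if W[i, j] > threshold:  # alternative: always draw, with alpha=W[i, j]
            ax.plot(pos[[i, j], 0], pos[[i, j], 1], color="gray", zorder=1)
ax.set_aspect("equal")
fig.savefig("ring_graph.png")
```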

13
Q

What is the purpose of low-dimensional embedding in data visualization?

A

It aims to construct a scatter plot in which the x-axis and y-axis do not carry any specific meaning, but where distances/similarities between points faithfully represent distances/similarities in the original input space (making clusters interpretable).

14
Q

What is MDS (Multi Dimensional Scaling)

A

Multi-Dimensional Scaling is a popular low-dimensional embedding technique that generates, for each instance, a vector in a low-dimensional space. These vectors are optimized so that the distances between points in the low-dimensional space replicate the true distances between the corresponding instances.

15
Q

What is Metric MDS?

A

Metric MDS (multi-dimensional scaling) is a specific type of MDS that focuses on maintaining the distances from the original data as closely as possible. It aims to minimize the difference (or “stress”) between the true distances d_ij and the distances d̂_ij in the lower-dimensional representation.

16
Q

What is the stress function in Metric MDS (Multi Dimensional Scaling)

A

The stress function measures how well the low-dimensional representation matches the original distances. It is defined as the sum over all pairs of instances (i, j) of (d_ij − d̂_ij)², where d_ij is the original distance between two instances and d̂_ij is their distance in the low-dimensional space.

The goal is to minimize this function, meaning we want distances in the lower-dimensional space to be as close to the true distances as possible.
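The stress function can be sketched in a few lines of numpy (the tiny example dataset is made up; note that an embedding reproducing the distances exactly has zero stress):

```python
import numpy as np

def stress(D_true, X_low):
    """Sum over pairs (i, j) of (d_ij - d_hat_ij)^2, where d_hat_ij is the
    pairwise Euclidean distance in the low-dimensional embedding X_low."""
    diff = X_low[:, None, :] - X_low[None, :, :]
    D_low = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(D_true), k=1)  # count each pair once
    return ((D_true[iu] - D_low[iu]) ** 2).sum()

# Toy "true" distances computed from three 2D points
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(-1))

print(stress(D, X))      # → 0.0 (the embedding reproduces D exactly)
print(stress(D, 2 * X))  # positive: distances are all doubled
```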

17
Q

Explain a physical analogy of Metric MDS solutions

A

Imagine each pair of data points is connected by a spring.

Each spring has a natural length based on the true distance between the data points.

The MDS solution is like finding the configuration in which all springs are as relaxed as possible (minimally stretched or compressed), which corresponds to the best low-dimensional representation of the data.

18
Q

Optimization Process in Metric MDS solutions

A
  1. Initialization: start with random positions for the data points in the low-dimensional space
  2. Iterative improvement: adjust the positions of the points to reduce the stress function (using methods like gradient descent, or more advanced techniques for faster optimization)
  3. Best solution: the optimization may settle in a suboptimal configuration (a local minimum), so it is necessary to repeat the process multiple times with different starting points to find the best configuration (the one with the lowest stress)
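The restart-based process above can be sketched with scikit-learn's `MDS` (an assumption — the cards name no library); `n_init` is the number of random initializations, and the fit keeps the run with the lowest stress:

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
X_high = rng.normal(size=(20, 10))  # 20 synthetic instances in 10 dimensions

# n_init=4: repeat the optimization from 4 random starting configurations
# and keep the embedding with the lowest final stress.
mds = MDS(n_components=2, n_init=4, random_state=0)
X_low = mds.fit_transform(X_high)

print(X_low.shape)  # (20, 2)
print(mds.stress_)  # final (lowest) stress value found
```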
19
Q

Gradient Descent Algorithm Purpose

A

Find the minimum of a function (in the MDS case, the function is the stress function)

20
Q

How the Gradient Descent Algorithm Works

A
  1. Start with a random value for the parameter θ
  2. Update θ: move it a little in the direction that makes the cost J(θ) smaller (the step size is controlled by the learning-rate hyperparameter)
  3. Keep going until the cost J(θ) doesn’t change much anymore, or after a set number of steps

The cost J(θ) in the Multi-Dimensional Scaling process is the stress function (which we try to minimize): the sum of squared differences between the true distances and the corresponding low-dimensional distances.
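The three steps can be sketched on a toy cost J(θ) = (θ − 3)², whose gradient is 2(θ − 3) and whose minimum lies at θ = 3 (the cost and its values are made up; in MDS, J would be the stress function):

```python
def gradient_descent(lr=0.1, steps=200):
    theta = 10.0                  # 1. start from an (arbitrary) initial value
    for _ in range(steps):        # 3. stop after a set number of steps
        grad = 2 * (theta - 3)    # derivative of J at the current theta
        theta -= lr * grad        # 2. step against the gradient (lr = learning rate)
    return theta

print(gradient_descent())  # converges to ~3.0, the minimizer of J
```

Each update shrinks the distance to the minimum by a constant factor here, which is why a few hundred steps suffice.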

21
Q

Chain Rule

A

For a parameter of interest θ, its impact on the quantity to optimize, J(θ), can be measured across the layers in a path by calculating the derivative of each layer and multiplying them together.

(dZ/db) · (db/da) · (da/dθ) = dZ/dθ, the impact of θ on Z, where Z = J(θ). This is for a model with a chain of layers θ → a → b → Z.
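The chain θ → a → b → Z can be checked numerically with made-up layer functions (a = θ², b = sin(a), Z = 3b are arbitrary choices for illustration):

```python
import math

theta = 0.7
a = theta ** 2        # layer 1
b = math.sin(a)       # layer 2
Z = 3 * b             # layer 3 (Z plays the role of J(theta))

# Multiply the per-layer derivatives along the path:
dZ_db = 3.0
db_da = math.cos(a)
da_dtheta = 2 * theta
analytic = dZ_db * db_da * da_dtheta  # chain rule: dZ/dtheta

# Finite-difference check of dZ/dtheta
eps = 1e-6
Z_eps = 3 * math.sin((theta + eps) ** 2)
numeric = (Z_eps - Z) / eps

print(analytic, numeric)  # the two agree up to numerical error
```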

22
Q

Multivariate Chain Rule (multiple paths from θ to Z)

A

Apply the chain rule to each path:

For each path from θ to Z, multiply the derivatives of its layers (e.g., (dZ/db) · (db/da) · (da/dθ) for a path θ → a → b → Z) to get that path's contribution to dZ/dθ.

Then sum up all the products (the per-path contributions) to get the total multivariate impact dZ/dθ.
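A two-path example, with made-up functions: Z = a · b where a = θ² and b = sin(θ), so θ reaches Z through a and through b, and the two path contributions are summed:

```python
import math

theta = 0.9
a = theta ** 2          # path 1: theta -> a -> Z
b = math.sin(theta)     # path 2: theta -> b -> Z
Z = a * b

# Per-path products of layer derivatives (product rule falls out naturally):
dZ_da, da_dtheta = b, 2 * theta
dZ_db, db_dtheta = a, math.cos(theta)
total = dZ_da * da_dtheta + dZ_db * db_dtheta  # sum over both paths

# Finite-difference check of dZ/dtheta
eps = 1e-6
Z_eps = (theta + eps) ** 2 * math.sin(theta + eps)
numeric = (Z_eps - Z) / eps

print(total, numeric)  # the two agree up to numerical error
```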

23
Q

MDS

A

Multi-Dimensional Scaling is a technique used to visualize high-dimensional data in fewer dimensions (usually 2D or 3D). It helps us understand the relationships between data points based on their distances.

24
Q

Metric MDS vs Non Metric MDS

A

Non-metric MDS extends MDS by introducing a function f that allows more flexibility in how distances are treated. It focuses on preserving the order of the distances rather than their exact values.

Non-metric MDS is often better at representing the relationships between data points than metric MDS, because the function f lets it better reflect the structure of the distances in the higher-dimensional space.
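Both variants can be sketched with scikit-learn's `MDS` via the `metric` flag (library choice and data are assumptions; `metric=True` fits the distance values, `metric=False` fits only their rank order):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(5)
X_high = rng.normal(size=(15, 8))  # 15 synthetic instances in 8 dimensions

# Metric MDS: preserve the distance values themselves
metric_mds = MDS(n_components=2, metric=True, n_init=2, random_state=0)
# Non-metric MDS: preserve only the ordering of the distances
nonmetric_mds = MDS(n_components=2, metric=False, n_init=2, random_state=0)

X_m = metric_mds.fit_transform(X_high)
X_nm = nonmetric_mds.fit_transform(X_high)
print(X_m.shape, X_nm.shape)  # (15, 2) (15, 2)
```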

25
Q

Problem with MDS

A

MDS seeks to accurately reconstruct all pairwise distances between points (the global structure), whereas we tend to only care about the local structure of the data distribution, i.e., the relationships between nearby points.

By focusing on all pairwise relationships, MDS can distort local relationships in favor of maintaining all pairwise distances. We would prefer it to just get the local ones right.

26
Q

t-SNE

A

t-SNE was developed to address these issues with MDS by focusing on similarities between data points rather than their exact distances.

t-SNE determines the similarity of points based on their distances in high-dimensional space and converts these distances into probabilities that reflect the similarity.
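The distance-to-probability conversion can be sketched in numpy with a Gaussian kernel (a simplification: here a single made-up bandwidth `sigma` is used, whereas t-SNE tunes a per-point bandwidth via its perplexity parameter):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(10, 5))  # 10 synthetic points in 5 dimensions

# Squared pairwise distances in the high-dimensional space
diff = X[:, None, :] - X[None, :, :]
D2 = (diff ** 2).sum(-1)

# Gaussian kernel turns distances into (unnormalized) similarities
sigma = 1.0  # fixed bandwidth, for illustration only
P = np.exp(-D2 / (2 * sigma ** 2))
np.fill_diagonal(P, 0.0)              # a point is not its own neighbor
P = P / P.sum(axis=1, keepdims=True)  # each row becomes a probability distribution

print(P.sum(axis=1))  # every row sums to 1
```

Nearby points get high probability, distant points nearly zero, which is how t-SNE emphasizes local structure.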

27
Q

Difference in performance between t-SNE and MDS

A

MDS better preserves the overall (global) distances

t-SNE is better at representing the local structure of the data (close relationships)

28
Q

Early Exaggeration

A

A method used in the first phase of t-SNE optimization to help the algorithm form clear, tight clusters. It temporarily boosts the similarities between points so that the algorithm can more easily spot points that are close together.
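In scikit-learn's `TSNE` (an assumption — the cards name no library), this boost is exposed as the `early_exaggeration` parameter, which multiplies the high-dimensional similarities during the first optimization phase:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 10))  # 60 synthetic points in 10 dimensions

# early_exaggeration (default 12.0) scales up the similarities early on,
# pulling nearby points into tighter, better-separated clusters before
# the normal optimization phase takes over.
tsne = TSNE(n_components=2, perplexity=10, early_exaggeration=12.0,
            random_state=0)
X_low = tsne.fit_transform(X)

print(X_low.shape)  # (60, 2)
```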