2 Data Visualizations Flashcards

1
Q

4 Basic Visualization Technique Categories

A
  1. Array plots
  2. Scatter plots
  3. Histograms
  4. Graphs
2
Q

Array Plots

A

Rows are data points (instances)
Columns are numerical features
Each grid element is colored according to its feature value
Often uses a color map together with a color bar
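A minimal sketch of an array plot using matplotlib (an assumption — the cards name no library); the dataset and the badly scaled feature are synthetic, purely for illustration:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Synthetic dataset: 50 instances (rows) x 8 numerical features (columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
X[:, 3] *= 100  # one feature on a much larger scale (a normalization issue)

fig, ax = plt.subplots()
im = ax.imshow(X, aspect="auto", cmap="viridis")  # each grid element colored by value
fig.colorbar(im, ax=ax, label="feature value")    # color bar maps colors back to values
ax.set_xlabel("feature")
ax.set_ylabel("instance")
fig.savefig("array_plot.png")
```

The badly scaled column dominates the color range, which is exactly the kind of normalization issue an array plot reveals at a glance.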

3
Q

What does it mean when values on an array plot are not colored?

A

There are missing values in the data (values that have been set to negative infinity, which falls outside the color map's range)

4
Q

Strengths of Array Plots

A

Reveals qualitative information about dataset structure

Helps detect missing values, normalization issues, and batch differences quickly

5
Q

Limitations of Array Plots

A

Can become overwhelming with large datasets.

Lacks precise information about value distributions or feature correlations.

6
Q

How do array plots help detect missing values, normalization issues, and batch differences?

A
  • Missing values: Appear as distinct color gaps or unusual patterns.
  • Normalization issues: Features with different scales will have inconsistent color ranges.
  • Batch differences: Different groups of data points may show noticeably different color patterns, indicating batch effects (visible when each group takes on its own characteristic coloring).
7
Q

Scatter Plots

A

Consider two features at a time in order to detect potential correlations, by displaying the values of the two features on the x- and y-axes respectively.

8
Q

How to augment scatter plots using transparency

A

By adjusting transparency according to density (so that regions where many data points overlap appear darker), it becomes easier to identify where most of the data points lie on the plot.
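A sketch of the transparency trick with matplotlib's `scatter` (library choice and data are assumptions): giving every point a low fixed `alpha` makes overlapping points accumulate opacity, so dense regions appear darker.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Two correlated synthetic features
x = rng.normal(size=2000)
y = 0.8 * x + 0.3 * rng.normal(size=2000)

fig, ax = plt.subplots()
# alpha=0.1: a single point is faint, but 10+ overlapping points are opaque,
# so the plot doubles as a density map.
ax.scatter(x, y, alpha=0.1, s=10)
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
fig.savefig("scatter_alpha.png")
```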

9
Q

Histograms

A

Focus on a single numerical feature in order to extract more information about that feature.

The number of instances having a particular feature value (the count) is given on the y-axis.

Feature values are indicated on the x-axis.
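A minimal histogram sketch with matplotlib (library and skewed "spending" data are assumptions, echoing the large-spenders example on a later card):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Skewed synthetic feature, e.g. spending amounts with a heavy tail
spending = rng.lognormal(mean=3.0, sigma=1.0, size=1000)

fig, ax = plt.subplots()
counts, bins, _ = ax.hist(spending, bins=40)  # y-axis: count, x-axis: feature value
ax.set_xlabel("feature value")
ax.set_ylabel("count")
fig.savefig("histogram.png")
```

The long right tail visible in such a histogram is the kind of hint that suggests preprocessing (e.g. a log transform) to reduce the impact of outliers.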

10
Q

What are the strengths of histograms in data visualization?

A

Precisely shows the distribution of feature values (e.g., mean, variance, tailedness, and outliers).

Provides hints for preprocessing to reduce the impact of outliers (e.g., large spenders).

11
Q

Weaknesses of histograms in data visualization

A

Does not show correlations between features.

Not suitable for high-dimensional data (requires a separate histogram for each feature).

12
Q

Graph Visualization

A

Arrange the nodes of the graph in a ring (2D layout) and draw a line between each pair of nodes connected by an edge.

This works well when the graph is sparse (few edges relative to the number of node pairs)

If edge strengths are real-valued, only draw a line when the edge value is above a certain threshold, or use different transparency levels to encode the strength
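A sketch of the ring layout with thresholded real-valued edges, using only numpy and matplotlib (the random weight matrix and threshold are made up for illustration):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
n = 12
# Random symmetric real-valued edge strengths
W = rng.random((n, n))
W = (W + W.T) / 2
threshold = 0.8  # only draw sufficiently strong edges

# Place the n nodes evenly on a ring
angles = 2 * np.pi * np.arange(n) / n
pos = np.column_stack([np.cos(angles), np.sin(angles)])

fig, ax = plt.subplots()
ax.scatter(pos[:, 0], pos[:, 1], zorder=2)
for i in range(n):
    for j in range(i + 1, n):
        if W[i, j] > threshold:  # alternative: always draw, with alpha=W[i, j]
            ax.plot(pos[[i, j], 0], pos[[i, j], 1], color="gray", zorder=1)
ax.set_aspect("equal")
fig.savefig("ring_graph.png")
```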

13
Q

What is the purpose of low-dimensional embedding in data visualization?

A

It aims to construct a scatter plot in which the x-axis and y-axis do not carry any specific meaning, but where distances/similarities between points faithfully represent distances/similarities in the original input space (making clusters interpretable).

14
Q

What is MDS (Multi Dimensional Scaling)

A

Multi-Dimensional Scaling is a popular low-dimensional embedding technique that generates, for each instance, a vector in a low-dimensional space. These vectors are optimized so that the distances between points in the low-dimensional space replicate the true distances between the corresponding instances.

15
Q

What is Metric MDS?

A

Metric MDS (multi-dimensional scaling) is a specific type of MDS that focuses on maintaining the distances from the original data as closely as possible. It aims to minimize the difference (or “stress”) between the true distances d_ij and the distances d̂_ij in the lower-dimensional representation.

16
Q

What is the stress function in Metric MDS (Multi Dimensional Scaling)

A

The stress function measures how well the low-dimensional representation matches the original distances. It is defined as the sum over all pairs of instances (i, j) of (d_ij − d̂_ij)², where d_ij is the original distance between two instances and d̂_ij is their distance in the low-dimensional space.

The goal is to minimize this function, meaning we want distances in the lower-dimensional space to be as close to the true distances as possible.
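The stress function can be sketched in a few lines of numpy (the tiny example dataset is made up; note that an embedding reproducing the distances exactly has zero stress):

```python
import numpy as np

def stress(D_true, X_low):
    """Sum over pairs (i, j) of (d_ij - d_hat_ij)^2, where d_hat_ij is the
    pairwise Euclidean distance in the low-dimensional embedding X_low."""
    diff = X_low[:, None, :] - X_low[None, :, :]
    D_low = np.sqrt((diff ** 2).sum(-1))
    iu = np.triu_indices(len(D_true), k=1)  # count each pair once
    return ((D_true[iu] - D_low[iu]) ** 2).sum()

# Toy "true" distances computed from three 2D points
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(-1))

print(stress(D, X))      # → 0.0 (the embedding reproduces D exactly)
print(stress(D, 2 * X))  # positive: distances are all doubled
```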

17
Q

Explain a physical analogy of Metric MDS solutions

A

Imagine each pair of data points is connected by a spring.

Each spring has a natural length based on the true distance between the data points.

The MDS solution is like finding the configuration in which all springs are as relaxed as possible (minimally stretched or compressed), which corresponds to the best low-dimensional representation of the data.

18
Q

Optimization Process in Metric MDS solutions

A
  1. Initialization: start with random positions for the data points in the low-dimensional space
  2. Iterative improvement: adjust the positions of the points to reduce the stress function (using methods like gradient descent, or more advanced techniques for faster optimization)
  3. Best solution: the optimization may settle in a suboptimal configuration (a local minimum), so it is necessary to repeat the process multiple times with different starting points to find the best configuration (the one with the lowest stress)
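The restart-based process above can be sketched with scikit-learn's `MDS` (an assumption — the cards name no library); `n_init` is the number of random initializations, and the fit keeps the run with the lowest stress:

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
X_high = rng.normal(size=(20, 10))  # 20 synthetic instances in 10 dimensions

# n_init=4: repeat the optimization from 4 random starting configurations
# and keep the embedding with the lowest final stress.
mds = MDS(n_components=2, n_init=4, random_state=0)
X_low = mds.fit_transform(X_high)

print(X_low.shape)  # (20, 2)
print(mds.stress_)  # final (lowest) stress value found
```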
19
Q

Gradient Descent Algorithm Purpose

A

Find the minimum of a function (in the MDS case, the function is the stress function)

20
Q

How the Gradient Descent Algorithm Works

A
  1. Start with a random value for the parameter θ
  2. Update θ: move it a little in the direction that makes the cost J(θ) smaller (the step size is controlled by the learning-rate hyperparameter)
  3. Keep going until the cost J(θ) doesn’t change much anymore, or after a set number of steps

The cost J(θ) in the Multi-Dimensional Scaling process is the stress function (which we try to minimize): the sum of squared differences between the true distances and the corresponding low-dimensional distances.
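The three steps can be sketched on a toy cost J(θ) = (θ − 3)², whose gradient is 2(θ − 3) and whose minimum lies at θ = 3 (the cost and its values are made up; in MDS, J would be the stress function):

```python
def gradient_descent(lr=0.1, steps=200):
    theta = 10.0                  # 1. start from an (arbitrary) initial value
    for _ in range(steps):        # 3. stop after a set number of steps
        grad = 2 * (theta - 3)    # derivative of J at the current theta
        theta -= lr * grad        # 2. step against the gradient (lr = learning rate)
    return theta

print(gradient_descent())  # converges to ~3.0, the minimizer of J
```

Each update shrinks the distance to the minimum by a constant factor here, which is why a few hundred steps suffice.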

21
Q

Chain Rule

A

For a parameter of interest θ, its impact on the quantity to optimize, J(θ), can be measured across the layers in a path by calculating the derivative of each layer and multiplying them together.

(dZ/db) · (db/da) · (da/dθ) = dZ/dθ, the impact of θ on Z, where Z = J(θ). This is for a model with a chain of layers θ → a → b → Z.
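The chain θ → a → b → Z can be checked numerically with made-up layer functions (a = θ², b = sin(a), Z = 3b are arbitrary choices for illustration):

```python
import math

theta = 0.7
a = theta ** 2        # layer 1
b = math.sin(a)       # layer 2
Z = 3 * b             # layer 3 (Z plays the role of J(theta))

# Multiply the per-layer derivatives along the path:
dZ_db = 3.0
db_da = math.cos(a)
da_dtheta = 2 * theta
analytic = dZ_db * db_da * da_dtheta  # chain rule: dZ/dtheta

# Finite-difference check of dZ/dtheta
eps = 1e-6
Z_eps = 3 * math.sin((theta + eps) ** 2)
numeric = (Z_eps - Z) / eps

print(analytic, numeric)  # the two agree up to numerical error
```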

22
Q

Multivariate Chain Rule (multiple paths from θ to Z)

A

Apply the chain rule to each path:

For each path from θ to Z, multiply the derivatives of its layers (e.g., (dZ/db) · (db/da) · (da/dθ) for a path θ → a → b → Z) to get that path's contribution to dZ/dθ.

Then sum up all the products (the per-path contributions) to get the total multivariate impact dZ/dθ.
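A two-path example, with made-up functions: Z = a · b where a = θ² and b = sin(θ), so θ reaches Z through a and through b, and the two path contributions are summed:

```python
import math

theta = 0.9
a = theta ** 2          # path 1: theta -> a -> Z
b = math.sin(theta)     # path 2: theta -> b -> Z
Z = a * b

# Per-path products of layer derivatives (product rule falls out naturally):
dZ_da, da_dtheta = b, 2 * theta
dZ_db, db_dtheta = a, math.cos(theta)
total = dZ_da * da_dtheta + dZ_db * db_dtheta  # sum over both paths

# Finite-difference check of dZ/dtheta
eps = 1e-6
Z_eps = (theta + eps) ** 2 * math.sin(theta + eps)
numeric = (Z_eps - Z) / eps

print(total, numeric)  # the two agree up to numerical error
```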

23
Q

MDS

A

Multi-Dimensional Scaling is a technique used to visualize high-dimensional data in fewer dimensions (usually 2D or 3D). It helps us understand the relationships between data points based on their distances.

24
Q

Metric MDS vs Non Metric MDS

A

Non-metric MDS extends MDS by introducing a function f that allows more flexibility in how distances are treated. It focuses on preserving the order of the distances rather than their exact values.

Non-metric MDS is often better at representing the relationships between data points than metric MDS, because the function f lets it better reflect the structure of the distances in the higher-dimensional space.
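Both variants can be sketched with scikit-learn's `MDS` via the `metric` flag (library choice and data are assumptions; `metric=True` fits the distance values, `metric=False` fits only their rank order):

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(5)
X_high = rng.normal(size=(15, 8))  # 15 synthetic instances in 8 dimensions

# Metric MDS: preserve the distance values themselves
metric_mds = MDS(n_components=2, metric=True, n_init=2, random_state=0)
# Non-metric MDS: preserve only the ordering of the distances
nonmetric_mds = MDS(n_components=2, metric=False, n_init=2, random_state=0)

X_m = metric_mds.fit_transform(X_high)
X_nm = nonmetric_mds.fit_transform(X_high)
print(X_m.shape, X_nm.shape)  # (15, 2) (15, 2)
```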

25
Q

Problem with MDS

A

MDS seeks to accurately reconstruct all pairwise distances between points (the global structure), whereas we tend to only care about the local structure of the data distribution, i.e., the relationships between nearby points.

By focusing on all pairwise relationships, MDS can distort local relationships in favor of maintaining all pairwise distances. We would prefer it to just get the local ones right.

26
Q

t-SNE

A

t-SNE was developed to address these issues with MDS by focusing on similarities between data points rather than their exact distances.

t-SNE determines the similarity of points based on their distances in high-dimensional space and converts these distances into probabilities that reflect the similarity.
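The distance-to-probability conversion can be sketched in numpy with a Gaussian kernel (a simplification: here a single made-up bandwidth `sigma` is used, whereas t-SNE tunes a per-point bandwidth via its perplexity parameter):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(10, 5))  # 10 synthetic points in 5 dimensions

# Squared pairwise distances in the high-dimensional space
diff = X[:, None, :] - X[None, :, :]
D2 = (diff ** 2).sum(-1)

# Gaussian kernel turns distances into (unnormalized) similarities
sigma = 1.0  # fixed bandwidth, for illustration only
P = np.exp(-D2 / (2 * sigma ** 2))
np.fill_diagonal(P, 0.0)              # a point is not its own neighbor
P = P / P.sum(axis=1, keepdims=True)  # each row becomes a probability distribution

print(P.sum(axis=1))  # every row sums to 1
```

Nearby points get high probability, distant points nearly zero, which is how t-SNE emphasizes local structure.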

27
Q

Difference in performance between t-SNE and MDS

A

MDS better preserves the overall (global) distances

t-SNE is better at representing the local structure of the data (close relationships)

28
Q

Early Exaggeration

A

A method used in the first phase of t-SNE optimization to help the algorithm form clear, tight clusters. It temporarily boosts the similarities between points so that the algorithm can more easily spot points that are close together.
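In scikit-learn's `TSNE` (an assumption — the cards name no library), this boost is exposed as the `early_exaggeration` parameter, which multiplies the high-dimensional similarities during the first optimization phase:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 10))  # 60 synthetic points in 10 dimensions

# early_exaggeration (default 12.0) scales up the similarities early on,
# pulling nearby points into tighter, better-separated clusters before
# the normal optimization phase takes over.
tsne = TSNE(n_components=2, perplexity=10, early_exaggeration=12.0,
            random_state=0)
X_low = tsne.fit_transform(X)

print(X_low.shape)  # (60, 2)
```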