Lecture 4 Flashcards

Low-Dimensional Visualization

1
Q

What are the main reasons for using data visualization?

A

To explore data and uncover patterns or associations.
To effectively communicate findings.
To detect errors or outliers visually.

Example: Using a histogram to spot an unusually high data value (outlier) in height measurements.

2
Q

What are the 3 core principles of the Grammar of Graphics?

A

Separation of data and aesthetics: Define how data variables are mapped to visual properties (e.g., color, size).
Plot element definition: Specify visual components like points, lines, or bars.
Layer composition: Combine layers to build a plot.

Extra: This concept, developed by Leland Wilkinson, inspired the ggplot2 package in R.

3
Q

Name the 6 main layers in ggplot2.

A

Data: Input data.
Aesthetics (aes): Mapping variables to visual features like color, x/y positions.
Geometric objects (geom): Defines the plot type (e.g., geom_point, geom_bar).
Scales: Adjusts visual scaling (e.g., scale_x_log10).
Facets: Creates subplots for different subsets of data (e.g., facet_grid, facet_wrap).
Theme: Sets plot styles (e.g., axis labels, grid lines).
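
A minimal sketch of how the six components above can combine in one plot. It assumes ggplot2 is installed and uses its bundled mpg dataset; the choice of variables is only illustrative, not taken from the lecture.

```r
library(ggplot2)

# Data + aesthetics: map displacement and highway mileage to x/y,
# and vehicle class to color.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +        # geometric object: one point per car
  scale_x_log10() +     # scale: log-transform the x axis
  facet_grid(. ~ drv) + # facets: one panel per drive type
  theme_bw()            # theme: non-data styling

print(p)
```

Each `+` adds one component, so any of them can be swapped or dropped without touching the rest.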

4
Q

How do histograms visualize data?

A

They show the frequency of data values across intervals (bins).

Example:
Command: geom_histogram(bins=10)
Adjust bins to control the granularity of the plot.
Tip: Histograms are good for showing distributions, but the choice of bins can hide patterns such as multiple modes.
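
To see the effect of the bin count, here is a small sketch that draws the same variable with few and many bins. It assumes ggplot2 is installed and uses the faithful dataset that ships with R; the bin counts 5 and 30 are arbitrary choices for illustration.

```r
library(ggplot2)

# faithful$waiting: minutes between eruptions of Old Faithful.
p_coarse <- ggplot(faithful, aes(x = waiting)) + geom_histogram(bins = 5)
p_fine   <- ggplot(faithful, aes(x = waiting)) + geom_histogram(bins = 30)

# Each row of the built layer data corresponds to one bin.
n_coarse <- nrow(layer_data(p_coarse))
n_fine   <- nrow(layer_data(p_fine))

print(p_coarse)
print(p_fine)
```

With very few bins the two groups of waiting times may blur together, while the finer version shows them separately.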

5
Q

What are density plots, and how do you control their appearance?

A

Density plots smooth data distributions using kernel density estimation.

Command Example:
geom_density(bw=0.5) (controls the bandwidth for smoothness).
Tip: Choose the bandwidth with care, since it strongly affects how the distribution looks.
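
A hedged sketch of the bandwidth effect, again on the faithful dataset (an assumption; the lecture's data is not given). The value bw = 10 is an arbitrary "too smooth" setting chosen for contrast with the card's bw = 0.5.

```r
library(ggplot2)

# Same data, two bandwidths: a small bw follows the data closely,
# a large bw smooths away detail.
p_narrow <- ggplot(faithful, aes(x = waiting)) + geom_density(bw = 0.5)
p_wide   <- ggplot(faithful, aes(x = waiting)) + geom_density(bw = 10)

# Compare the highest estimated density under each bandwidth.
peak_narrow <- max(layer_data(p_narrow)$density)
peak_wide   <- max(layer_data(p_wide)$density)

print(p_narrow)
print(p_wide)
```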

6
Q

What are the key elements of a box plot?

A

Median: Center line of the box.
Quartiles (Q1 & Q3): Edges of the box (25th and 75th percentiles).
Whiskers: Extend from the box to the most extreme data point within 1.5 times the interquartile range (IQR) of Q1 and Q3.
Outliers: Points beyond whiskers.

Extra: Box plots are not ideal for multimodal or discrete data.
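
The elements above can be computed by hand as a check. The sketch below uses made-up values (not from the lecture) and the default quantile definition in R; the variable names are hypothetical.

```r
library(ggplot2)

x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)  # made-up values; 30 is far from the rest

q   <- unname(quantile(x, c(0.25, 0.5, 0.75)))  # Q1, median, Q3
iqr <- q[3] - q[1]

lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences.
whiskers <- range(x[x >= lower_fence & x <= upper_fence])
# Anything outside the fences is drawn as an individual point.
outliers <- x[x < lower_fence | x > upper_fence]

p <- ggplot(data.frame(x = x), aes(y = x)) + geom_boxplot()
print(p)
```

Here Q1 = 4, the median is 6, and Q3 = 8, so the fences sit at -2 and 14, the whiskers end at 2 and 9, and 30 is plotted as an outlier.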

7
Q

What do scatter plots show?

A

They show relationships between two continuous variables.

Example: Comparing life expectancy vs. GDP using:
geom_point(aes(x = gdpPercap, y = lifeExp)).
Enhancements:
Use color or size to add more dimensions (aes(color=continent, size=pop)).
Log scaling helps when values span several orders of magnitude (scale_x_log10()).
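
The variable names on this card (gdpPercap, lifeExp, continent, pop) look like the gapminder data, which is not bundled with ggplot2. To stay self-contained, the sketch below swaps in ggplot2's own midwest dataset and applies the same three ideas: x/y mapping, extra dimensions via color and size, and a log-scaled x axis. The variable choice is an assumption for illustration.

```r
library(ggplot2)

# midwest: one row per county in five US states.
p <- ggplot(midwest, aes(x = poptotal, y = percollege)) +
  # Extra dimensions: state as color, county area as point size.
  geom_point(aes(color = state, size = area), alpha = 0.6) +
  # County populations span several orders of magnitude.
  scale_x_log10()

print(p)
```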

8
Q

What is the purpose of a Q-Q plot?

A

To compare the distribution of data to a theoretical distribution (e.g., normal or uniform).

Command: geom_qq(distribution = stats::qunif)
Diagonal line (geom_abline) represents perfect alignment between distributions.
Tip: Use Q-Q plots to check normality before applying statistical tests that assume it.
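
A small sketch of both uses, on simulated samples (generated here purely for illustration): a comparison against the uniform distribution with the card's command, and a normality check with geom_qq() and geom_qq_line().

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  u = runif(200),  # simulated uniform sample
  z = rnorm(200)   # simulated normal sample
)

# Uniform Q-Q plot: points near the diagonal suggest a good match.
p_unif <- ggplot(df, aes(sample = u)) +
  geom_qq(distribution = stats::qunif) +
  geom_abline(intercept = 0, slope = 1)

# Normal Q-Q plot (the default distribution) with a reference line.
p_norm <- ggplot(df, aes(sample = z)) +
  geom_qq() +
  geom_qq_line()

print(p_unif)
print(p_norm)
```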

9
Q

When should line plots be used?

A

To show connections or trends over time (e.g., unemployment rate over years).

Command: geom_line(aes(x = date, y = unemploy/pop)).
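
As a complete call, the command above might look like the following sketch. It assumes the data is the economics dataset bundled with ggplot2, which has the date, unemploy, and pop columns used on this card.

```r
library(ggplot2)

# economics: monthly US time series; unemploy / pop is the
# unemployed share of the population at each date.
p <- ggplot(economics, aes(x = date, y = unemploy / pop)) +
  geom_line()

print(p)
```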

10
Q

What is a violin plot, and when is it useful?

A

A combination of a box plot and a density plot: a mirrored density curve per group that shows the full shape of the distribution.
Ideal for multimodal data.

Command: geom_violin().
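
A sketch of a violin plot on bimodal data, with a narrow box plot overlaid for comparison. The faithful dataset and the overlay are assumptions for illustration, not part of the lecture.

```r
library(ggplot2)

# A constant x value puts all observations into a single violin.
p <- ggplot(faithful, aes(x = "Old Faithful", y = waiting)) +
  geom_violin() +
  geom_boxplot(width = 0.1)  # the box alone cannot show the two modes

print(p)
```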

11
Q

What do bar plots show?

A

Quantitative values per category (e.g., number of countries per continent).

Command: geom_bar(stat = 'identity').
Tip: Add error bars with geom_errorbar() to show uncertainty.
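
With stat = 'identity' the bar heights come straight from the data, so the values are usually summarized first. A minimal sketch, assuming the mpg dataset and using the standard error of the mean as the uncertainty measure (both are illustrative choices; summary_df is a hypothetical name):

```r
library(ggplot2)

# One mean and one standard error per drive type.
groups <- split(mpg$hwy, mpg$drv)
summary_df <- data.frame(
  drv  = names(groups),
  mean = sapply(groups, mean),
  se   = sapply(groups, function(v) sd(v) / sqrt(length(v)))
)

p <- ggplot(summary_df, aes(x = drv, y = mean)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2)

print(p)
```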

12
Q

What is a scatterplot matrix?

A

A grid of scatter plots that shows relationships between several variables.

Command: ggpairs(mpg, columns = c('displ', 'cyl', 'cty', 'hwy')), from the GGally package.

13
Q

What is the purpose of a 2D density plot?

A

It shows point density across a 2D space, useful for large datasets.

Command: geom_hex().

14
Q

What are two important principles for data visualization?

A

Show raw data when possible: Avoid over-smoothing or hiding outliers.
Maximize the data-ink ratio: Present data with minimal visual clutter.
