DataViz 6 Flashcards

Graphically Supported Hypotheses

1
Q

What are descriptive plots used for in data analysis?

A

Descriptive plots visualize how variables are distributed to explore data in an unbiased way. Examples include histograms, boxplots, violin plots, and PCA projections.

2
Q

What type of plot is used to visualize the distribution of diamond carat weights?

A

A histogram can be used to show the distribution of diamond weights.
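As a rough illustration of what a histogram computes, the sketch below bins a handful of carat weights and prints a text histogram. The weights and the 0.25-carat bin width are invented for this example and do not come from the actual diamonds data; `histogram_counts` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical carat weights, invented for illustration only.
carats = [0.23, 0.31, 0.30, 0.41, 0.52, 0.70, 0.71, 0.90, 1.01, 1.02, 1.20, 1.51]

def histogram_counts(values, bin_width):
    """Count the values falling into each bin [k*bin_width, (k+1)*bin_width)."""
    counts = Counter(int(v // bin_width) for v in values)
    first, last = min(counts), max(counts)
    # Include empty bins between the first and last so gaps stay visible.
    return {k * bin_width: counts[k] for k in range(first, last + 1)}

counts = histogram_counts(carats, bin_width=0.25)

# Text rendering: one row per bin, bar length equals the count.
for left_edge, n in counts.items():
    print(f"{left_edge:4.2f}-{left_edge + 0.25:4.2f} | {'#' * n}")
```

Empty bins are kept on purpose: dropping them would hide gaps in the distribution.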

3
Q

What are associative plots, and what do they show?

A

Associative plots display how a response variable depends on one or more explanatory variables. Examples include scatterplots and boxplots showing relationships like price vs. weight.

4
Q

What hypothesis can be supported by plotting diamond prices against carat weights?

A

The hypothesis ‘the price of a diamond increases with weight’ can be supported by a scatterplot with price on the y-axis and carat on the x-axis.
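A scatterplot gives the visual evidence; a numeric summary can accompany it. The sketch below computes the Pearson correlation and the least-squares slope of price on carat. The (carat, price) pairs are made up for illustration, and `pearson_r` and `ols_slope` are hypothetical helper names, not part of any library.

```python
from statistics import mean

# Hypothetical (carat, price) pairs, invented for illustration only.
carat = [0.3, 0.5, 0.7, 1.0, 1.2, 1.5]
price = [400, 1200, 2500, 5000, 7000, 11000]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

r = pearson_r(carat, price)
slope = ols_slope(carat, price)
print(f"r = {r:.2f}, slope = {slope:.0f} price units per carat")
```

A positive `r` and a positive slope are consistent with the hypothesis, but they summarize an association only; they do not replace looking at the plot.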

5
Q

What is a common mistake when using descriptive plots?

A

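Using descriptive plots to support claims about how one variable depends on another is a common mistake. Associative plots should be used to support such claims, and even then an association alone does not establish causation.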

6
Q

Why does correlation not imply causation?

A

A correlation may arise by chance, the cause-effect direction may be the reverse of what is assumed, or both variables may be driven by a third variable.

7
Q

Why might parental help with homework correlate negatively with academic performance?

A

Children who struggle in school may need more parental help, so poor performance leads to more help rather than help leading to poor performance (reverse causality).

8
Q

In the scenario of sunburns and water intake, what third variable might explain the correlation?

A

Sunny days increase both water intake and sunburn risk, acting as the third variable (common cause).
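The common-cause pattern can be sketched with a small simulation. Here sunshine raises both water intake and a sunburn score, while the two have no direct link. All numbers (effect sizes, noise levels, sample size) are assumptions chosen for illustration, and `pearson_r` and `r_for` are hypothetical helpers.

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
days = []
for _ in range(2000):
    sunny = rng.random() < 0.5
    # Sunshine shifts both variables; neither one influences the other.
    water = 1.5 + (1.0 if sunny else 0.0) + rng.gauss(0, 0.3)
    sunburn = 0.5 + (2.0 if sunny else 0.0) + rng.gauss(0, 0.5)
    days.append((sunny, water, sunburn))

def r_for(rows):
    """Correlation between water intake and sunburn score for the given days."""
    return pearson_r([w for _, w, _ in rows], [s for _, _, s in rows])

r_all = r_for(days)
r_sunny = r_for([d for d in days if d[0]])
r_cloudy = r_for([d for d in days if not d[0]])
print(f"all days: {r_all:.2f}, sunny only: {r_sunny:.2f}, cloudy only: {r_cloudy:.2f}")
```

Across all days the two variables correlate clearly, but once the data are split by the third variable the correlation is close to zero, which is what a common cause predicts.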

9
Q

What is Simpson’s Paradox?

A

A correlation between two variables in the pooled data can flip direction (or vanish) when the data are stratified by a third variable, so conclusions drawn from the pooled data alone can be misleading.
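A tiny constructed dataset makes the reversal concrete. In the sketch below, x and y are negatively correlated inside each of two groups, yet positively correlated when the groups are pooled. The numbers are invented purely to show the effect, and `pearson_r` is a hypothetical helper.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Invented data: within each group, y falls as x rises.
group_a = {"x": [1, 2, 3, 4], "y": [8, 7, 6, 5]}
group_b = {"x": [6, 7, 8, 9], "y": [13, 12, 11, 10]}

r_a = pearson_r(group_a["x"], group_a["y"])
r_b = pearson_r(group_b["x"], group_b["y"])

# Pooling ignores the grouping variable; group B sits higher on both axes.
r_pooled = pearson_r(group_a["x"] + group_b["x"], group_a["y"] + group_b["y"])

print(f"group A: {r_a:.2f}, group B: {r_b:.2f}, pooled: {r_pooled:.2f}")
```

The pooled trend is positive only because group B is shifted up and to the right of group A; coloring a scatterplot by group would reveal this immediately.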

10
Q

What are the three main parts of a data presentation?

A

Introduction: Motivation, background, goals.
Central Part: Claims, hypotheses, results supported by plots.
Closure: Summary and key takeaways.

11
Q

What should a good plot title convey?

A

A good title should state the finding clearly (e.g., ‘Flight time increases with distance’) rather than describing the method (e.g., ‘Scatterplot of distance vs. time’).

12
Q

Why is labeling axes important?

A

Properly labeled axes ensure the audience understands what each axis represents. Use clear, legible fonts.

13
Q

Why should barplots usually start at 0?

A

Starting at 0 prevents small differences from being exaggerated and keeps bar lengths proportional to the values they represent.
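The distortion from a truncated axis is simple arithmetic. The sketch below compares the ratio of two drawn bar lengths for a baseline of 0 and for a truncated baseline; the values 50 and 55 and the baseline 45 are arbitrary examples, and `drawn_ratio` is a hypothetical helper.

```python
def drawn_ratio(small, large, baseline):
    """Ratio of the drawn bar lengths when the axis starts at `baseline`."""
    return (large - baseline) / (small - baseline)

small, large = 50, 55  # arbitrary example values; the true ratio is 1.1

ratio_from_zero = drawn_ratio(small, large, baseline=0)
ratio_truncated = drawn_ratio(small, large, baseline=45)

print(f"axis from 0:  second bar is {ratio_from_zero:.1f}x as long")
print(f"axis from 45: second bar is {ratio_truncated:.1f}x as long")
```

With the axis starting at 45, a 10% difference is drawn as a bar twice as long, which is the exaggeration the rule is meant to prevent.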

14
Q

What are key rules for using color in visualizations?

A

Use meaningful and necessary colors.
Ensure contrast for visibility.
Use soft colors for most elements and bright colors for highlights.
Use consistent backgrounds.

15
Q

What is ‘chart junk,’ and why should it be avoided?

A

Chart junk refers to unnecessary visual elements (e.g., excessive gridlines, double encoding, pseudo-3D plots) that distract from data. Simplifying plots improves clarity.

16
Q

What is an example of chart junk in visualizations?

A

Adding unnecessary decorations, pictures within graphs, or pseudo-3D effects that don’t add information.

17
Q

What does it mean to increase the data-ink ratio?

A

Maximizing the data-ink ratio means using ink primarily to display data, minimizing ink used for non-essential decorations.

18
Q

How can complex figures be presented for better understanding?

A

Start with a simplified version to introduce key elements, then gradually add more complexity to help the audience follow the full visualization.

19
Q

When should sequential color palettes be used?

A

Use sequential palettes for continuous variables to display quantitative differences effectively.

20
Q

When should qualitative palettes be used?

A

Use qualitative palettes to separate categorical variables into distinct groups.
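The difference between the two palette types can be sketched with the standard-library `colorsys` module: a sequential palette varies lightness within one hue, while a qualitative palette varies hue at constant lightness. This is only a rough illustration; the hue, lightness, and saturation values are assumptions, the function names are hypothetical, and carefully designed palettes (for example ColorBrewer or viridis) are preferable in practice.

```python
import colorsys

def to_hex(r, g, b):
    """Convert RGB components in [0, 1] to a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

def sequential_palette(n, hue=0.6, saturation=0.8):
    """n >= 2 colors of a single hue, from light to dark, for ordered values."""
    colors = []
    for i in range(n):
        lightness = 0.9 - 0.6 * i / (n - 1)  # equal steps from 0.9 down to 0.3
        colors.append(to_hex(*colorsys.hls_to_rgb(hue, lightness, saturation)))
    return colors

def qualitative_palette(n, lightness=0.55, saturation=0.7):
    """n colors with evenly spaced hues and equal lightness, for unordered categories."""
    return [to_hex(*colorsys.hls_to_rgb(i / n, lightness, saturation)) for i in range(n)]

seq = sequential_palette(5)
qual = qualitative_palette(5)
print("sequential: ", seq)
print("qualitative:", qual)
```

The sequential colors have a natural order a reader can map to "less" and "more"; the qualitative colors are distinct without implying any ranking among categories.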