DataViz 6 Flashcards
Graphically Supported Hypotheses
What are descriptive plots used for in data analysis?
Descriptive plots show how variables are distributed. They are used to explore the data without presupposing a particular hypothesis. Examples include histograms, boxplots, violin plots, and PCA projections.
What type of plot is used to visualize the distribution of diamond carat weights?
A histogram can be used to show the distribution of diamond weights.
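The counting step behind a histogram can be sketched in plain Python. Each value is assigned to a half-open bin [lower edge, lower edge + width), and the bar heights are the bin counts. The carat values and the 0.5-carat bin width below are hypothetical, and in practice a plotting library does this binning for you.

```python
def histogram_counts(values, bin_width, start=0.0):
    """Count values per half-open bin [lo, lo + bin_width), keyed by the bin's lower edge."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)  # which bin v falls into
        lower_edge = start + index * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical carat weights, for illustration only.
carats = [0.3, 0.31, 0.7, 0.9, 1.01, 1.2]
for lower_edge, n in histogram_counts(carats, bin_width=0.5).items():
    # Text-mode bar: one '#' per diamond in the bin.
    print(f"{lower_edge:.1f}-{lower_edge + 0.5:.1f} ct | {'#' * n}")
```

Empty bins are simply absent from the returned dictionary; a drawn histogram would show them as zero-height bars. The choice of bin width changes the shape you see, so it is worth trying a few.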
What are associative plots, and what do they show?
Associative plots display how a response variable depends on one or more explanatory variables. Examples include scatterplots and boxplots showing relationships like price vs. weight.
What hypothesis can be supported by plotting diamond prices against carat weights?
The hypothesis ‘the price of a diamond increases with weight’ can be supported by a scatterplot with price on the y-axis and carat on the x-axis.
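The scatterplot shows the relationship visually; a Pearson correlation coefficient can summarize the same association as a single number. Below is a minimal sketch in plain Python, using a handful of hypothetical carat/price pairs rather than real diamond data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical values: carat on the x-axis (explanatory), price on the y-axis (response).
carat = [0.3, 0.5, 0.7, 1.0, 1.5, 2.0]
price = [450, 1500, 2600, 4700, 9000, 14500]
r = pearson_r(carat, price)
print(f"r = {r:.2f}")
```

An r close to +1 is consistent with "price increases with weight", but it only measures linear association and, like the plot, it does not show why the two move together.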
What is a common mistake when using descriptive plots?
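A common mistake is using a descriptive plot to support a claim about how variables relate. Such claims call for an associative plot, which shows the relationship directly; even then, an association alone does not establish causation.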
Why does correlation not imply causation?
An observed correlation may arise by chance, the actual cause-effect direction may be the reverse of what is assumed, or a third variable (a common cause) may drive both variables.
Why might parental help with homework correlate negatively with academic performance?
Children who struggle in school may need more help from their parents. Poor performance then causes more help, rather than help causing poor performance (reverse causation).
In the scenario of sunburns and water intake, what third variable might explain the correlation?
Sunny days increase both water intake and sunburn risk, acting as the third variable (common cause).
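A small simulation can make the common-cause idea concrete. In the hypothetical model below (all coefficients and noise levels are invented for illustration), sunshine raises both water intake and sunburn severity, while water and sunburn have no effect on each other. Across all days the two should still correlate strongly; among days with roughly the same sunshine the correlation should largely vanish.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate_days(n_days, seed=0):
    """Hypothetical model: sunshine drives both outcomes; they never affect each other."""
    rng = random.Random(seed)
    days = []
    for _ in range(n_days):
        sunshine = rng.uniform(0, 12)                     # hours of sun: the common cause
        water = 1.0 + 0.2 * sunshine + rng.gauss(0, 0.5)  # litres of water drunk
        sunburn = 0.5 * sunshine + rng.gauss(0, 1.0)      # sunburn severity score
        days.append((sunshine, water, sunburn))
    return days


days = simulate_days(5000)
r_all = pearson_r([d[1] for d in days], [d[2] for d in days])

# Hold the third variable roughly fixed: keep only days with 5-6 hours of sun.
band = [d for d in days if 5 <= d[0] < 6]
r_band = pearson_r([d[1] for d in band], [d[2] for d in band])

print(f"all days:       r = {r_all:.2f}")
print(f"5-6 h sunshine: r = {r_band:.2f}")
```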
What is Simpson’s Paradox?
The direction of an association between two variables in the pooled data can reverse within every subgroup once the data are stratified by a third variable, so conclusions drawn from the pooled data alone can be misleading.
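A worked example with success rates shows how the reversal happens. The counts below are illustrative: two treatments, A and B, with patients split into mild and severe cases (the stratifying third variable).

```python
# Illustrative (successes, patients) per treatment and severity group.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def success_rate(successes, patients):
    return successes / patients


def pooled_rate(strata):
    """Success rate when the stratifying variable is ignored (counts summed over groups)."""
    successes = sum(s for s, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return successes / patients


for group in ("mild", "severe"):
    a = success_rate(*counts["A"][group])
    b = success_rate(*counts["B"][group])
    print(f"{group:>6}: A {a:.1%}  B {b:.1%}")

print(f"pooled: A {pooled_rate(counts['A']):.1%}  B {pooled_rate(counts['B']):.1%}")
```

A has the higher success rate in both groups, yet B looks better once the groups are pooled. The reason is the unequal group sizes: A was mostly given to severe cases (263 of 350), B mostly to mild ones (270 of 350), and severe cases succeed less often regardless of treatment.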
What are the three main parts of a data presentation?
Introduction: Motivation, background, goals.
Central Part: Claims, hypotheses, results supported by plots.
Closure: Summary and key takeaways.
What should a good plot title convey?
A good title should state the finding clearly (e.g., ‘Flight time increases with distance’) rather than describing the method (e.g., ‘Scatterplot of distance vs. time’).
Why is labeling axes important?
Properly labeled axes ensure the audience understands what each axis represents. Use clear, legible fonts.
Why should barplots usually start at 0?
A bar's length encodes its value, so the value axis must start at 0 for bar lengths to stay proportional to the values. A truncated axis exaggerates small differences.
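The distortion is simple arithmetic: with the axis starting at some value, each bar is drawn with length (value minus axis start). The sketch below compares two hypothetical values, 50 and 52, on an axis starting at 0 and on one starting at 48.

```python
def drawn_length_ratio(a, b, axis_start=0.0):
    """How many times longer bar b is drawn than bar a when the value axis begins at axis_start."""
    if min(a, b) <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (b - axis_start) / (a - axis_start)


honest = drawn_length_ratio(50, 52)                     # axis starts at 0
truncated = drawn_length_ratio(50, 52, axis_start=48)   # axis starts at 48
print(f"axis from 0: {honest:.2f}x, axis from 48: {truncated:.2f}x")
```

A 4% difference is drawn as a bar twice as long once the axis starts at 48.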
What are key rules for using color in visualizations?
Use meaningful and necessary colors.
Ensure contrast for visibility.
Use soft colors for most elements and bright colors for highlights.
Use consistent backgrounds.
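Two of these rules can be made concrete in code. One way to make "ensure contrast" measurable is the WCAG 2.x contrast ratio, computed from the relative luminance of two colors; "soft colors with bright highlights" can be expressed as a mapping that greys out every category except one. The hex values for the soft grey and the accent are hypothetical choices, not prescribed ones.

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color given as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    channels = []
    for i in (0, 2, 4):
        c = int(h[i:i + 2], 16) / 255
        # Undo the sRGB gamma curve to get linear light.
        channels.append(c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4)
    r, g, b = channels
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


SOFT = "#BBBBBB"       # hypothetical muted grey for context elements
HIGHLIGHT = "#D55E00"  # hypothetical saturated accent for the element of interest


def highlight_colors(categories, highlighted):
    """Soft grey for every category except the one to highlight."""
    return {c: HIGHLIGHT if c == highlighted else SOFT for c in categories}


print(highlight_colors(["Fair", "Good", "Ideal"], "Ideal"))
print(f"accent on white: {contrast_ratio(HIGHLIGHT, '#FFFFFF'):.1f}:1")
print(f"grey on white:   {contrast_ratio(SOFT, '#FFFFFF'):.1f}:1")
```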
What is ‘chart junk,’ and why should it be avoided?
Chart junk refers to unnecessary visual elements (e.g., excessive gridlines, double encoding, pseudo-3D plots) that distract from data. Simplifying plots improves clarity.
What is an example of chart junk in visualizations?
Adding unnecessary decorations, pictures within graphs, or pseudo-3D effects that don’t add information.
What does it mean to increase the data-ink ratio?
Maximizing the data-ink ratio means using ink primarily to display data, minimizing ink used for non-essential decorations.
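Edward Tufte, who introduced the term, defines the ratio as the share of a graphic's ink that presents data:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```

Erasing a heavy gridline or a pseudo-3D side face shrinks the denominator without touching the numerator, so the ratio moves toward 1.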
How can complex figures be presented for better understanding?
Start with a simplified version to introduce key elements, then gradually add more complexity to help the audience follow the full visualization.
When should sequential color palettes be used?
Use sequential palettes for continuous variables: a steady progression in lightness from one end of the palette to the other encodes the ordering from low to high values.
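A sequential palette can be sketched as evenly spaced steps between a light and a dark endpoint, with each continuous value mapped to the nearest step. This minimal version interpolates linearly in RGB; the blue endpoints are hypothetical.

```python
def hex_to_rgb(hex_color):
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def sequential_palette(low_color, high_color, n_steps):
    """n_steps colors interpolated linearly in RGB from low_color to high_color."""
    lo, hi = hex_to_rgb(low_color), hex_to_rgb(high_color)
    palette = []
    for i in range(n_steps):
        t = i / (n_steps - 1) if n_steps > 1 else 0.0
        palette.append(rgb_to_hex(tuple(round(a + (b - a) * t) for a, b in zip(lo, hi))))
    return palette


def color_for(value, vmin, vmax, palette):
    """Map a continuous value onto the nearest palette step; out-of-range values are clamped."""
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)
    return palette[round(t * (len(palette) - 1))]


# Hypothetical light-to-dark blue endpoints.
blues = sequential_palette("#DEEBF7", "#08519C", 5)
print(blues)
print(color_for(3.2, vmin=0.2, vmax=5.0, palette=blues))
```

Linear RGB interpolation is the simplest option but is not perceptually uniform; ready-made palettes such as viridis are designed so that equal steps in the data look like roughly equal steps in color.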
When should qualitative palettes be used?
Use qualitative palettes for categorical variables, giving each category a clearly distinct hue that implies no order.
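A qualitative palette assigns one distinct hue per category. The sketch below uses the Okabe-Ito palette, which was designed to stay distinguishable for common forms of color-vision deficiency; the category labels are only an example.

```python
# Okabe-Ito palette: distinct hues with no implied order.
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def qualitative_colors(values, palette=OKABE_ITO):
    """Give each distinct category its own color, in order of first appearance."""
    categories = list(dict.fromkeys(values))  # unique categories, order preserved
    if len(categories) > len(palette):
        raise ValueError(
            f"{len(categories)} categories but only {len(palette)} distinct colors; "
            "consider merging rare categories"
        )
    return dict(zip(categories, palette))


cuts = ["Ideal", "Premium", "Ideal", "Good", "Premium"]  # example category labels
print(qualitative_colors(cuts))
```

Raising an error instead of cycling through the palette again keeps two categories from ever sharing a color.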