8 - Data Visualization Flashcards
Why visualize?
Explorative Analysis
- visualization supports the analyst's intuition, e.g. finding outliers
- may help you look for connections you didn’t think of before
- manual data mining
- > requires both good visualization and flexible adaptation
- > you visualize for yourself
Why visualize?
Decision support & management information
- visualization can provide a quick overview of relevant trends and patterns, e.g. data from month to month
- > requires both good visualization and simple adaptation
- > you build visualization tools
Why visualize?
Presentation and argumentation
- Visualization can help you to underline your arguments with quantitative data in a way that is easy to communicate
- careful: ethics of visualization, e.g. What do I decide to show? How do I show it? Manipulation without lying
- > requires polished visualization, no adaptation
- > you visualize for others
Good visualization …
- shows all relevant data
- makes the audience think about the content rather than the representations
- does not distort the data
- makes large data sets understandable
- enables comparisons
- layers details from overview to finer points
- has a clear purpose
- is integrated with the context of the representation
= substance + data analysis + design
Three steps for effective visualization
- formulate the question
- gather (and analyze) the data
- apply a visual representation
Questions, concepts and visuals
What … is the best and the worst?
Concept:
- maximums and minimums
Visuals:
- bar graph
Questions, concepts and visuals
How has … changed over time?
Concept:
- temporal patterns (trend, seasonality)
Visuals:
- line graph
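A minimal sketch of how trend and seasonality can be separated before drawing a line graph. It assumes a moving average whose window equals one full season; the sales figures are made up and `moving_average` is a hypothetical helper, not part of the original notes.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Made-up quarterly sales over three years:
# the same seasonal shape repeats every four quarters while the level grows.
sales = [10, 14, 8, 12,
         12, 16, 10, 14,
         14, 18, 12, 16]

# A window of four quarters averages out the seasonal ups and downs,
# so what remains is the underlying trend.
trend = moving_average(sales, window=4)

print("raw:  ", sales)
print("trend:", trend)
```

Plotting both series in one line graph would show the seasonal zigzag and the smoothed trend side by side.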
Questions, concepts and visuals
What … stand out from the rest?
Concept:
- outliers
Visuals:
?
Questions, concepts and visuals
What makes … different from …?
Concept:
- Clustering
Visuals:
?
Questions, concepts and visuals
How are … and … related?
Concept:
- correlation
Visuals:
- scatter plot
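The strength of a linear relationship seen in a scatter plot can be summarized with a correlation coefficient. The sketch below assumes the Pearson coefficient is meant (the notes do not say which one); the data and the helper name `pearson` are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)

# Made-up data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]

r = pearson(hours, score)
print(f"r = {r:.2f}")
```

A value close to +1 or -1 corresponds to points lying near a straight line in the scatter plot; a value near 0 means no linear pattern is visible.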
Questions, concepts and visuals
What is the breakdown for …?
Concept:
- distribution
Visuals:
- stacked bar graph
Warning: Visualizations can distort data
Data can be distorted by …
- changing the scale of the y-axis between diagrams
- modifying the base line
- switching the aggregation level
- using areas to show one-dimensional data
- using advantageous visual effects (shadows, highlights, …)
- > distortion can be the consequence of errors, misleading decoration, or intentional deception
- > data visualization is also a matter of ethics
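How much a modified baseline exaggerates a difference can be worked out with a few lines of arithmetic. This is only a sketch: the values are made up and `visual_ratio` is a hypothetical helper.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights of b and a when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)

# Made-up example: a value grows from 100 to 110
old, new = 100.0, 110.0

true_ratio = new / old                            # what the data says
honest = visual_ratio(old, new, baseline=0.0)     # bars start at zero
truncated = visual_ratio(old, new, baseline=90.0) # axis starts at 90

print(f"ratio in the data:      {true_ratio:.2f}")
print(f"bar ratio, baseline 0:  {honest:.2f}")
print(f"bar ratio, baseline 90: {truncated:.2f}")
```

With the baseline at zero the bars keep the 10 % difference found in the data; with the baseline at 90 the second bar is drawn twice as tall as the first.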
Types of representation
- size: represent by area
- color: e.g. coloring values differently
- location: e.g. on a map
- network: e.g. identifying different groups
- time: e.g. line graph across different years
Different ways of visualizing distributions
Sorted
- you can show the median
Unsorted
- distribution according to the time of sample
- no median possible
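A small sketch of why the median can be read off a sorted chart but not off an unsorted one. The measurements are made up, and an odd number of samples is assumed so that the median is a single bar.

```python
from statistics import median

# Made-up measurements in the order they were sampled
samples = [7, 3, 9, 4, 8, 2, 6]

in_sample_order = samples          # x-axis = time of sample
in_sorted_order = sorted(samples)  # x-axis = rank

middle = len(samples) // 2         # index of the middle bar

print("middle bar, unsorted:", in_sample_order[middle])
print("middle bar, sorted:  ", in_sorted_order[middle])
print("median:              ", median(samples))
```

Only in the sorted version does the middle bar coincide with the median; the unsorted version instead preserves the order in which the samples were taken.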
Histograms: Beware the power of bins
- small bins show variations at higher granularity
- the larger the bins the less variation is visible
- the more we aggregate, the more the median becomes obvious (however, further aggregation leads to loss of information)
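The effect of the bin size can be tried out on a tiny data set. The following sketch uses made-up ages and a hypothetical `histogram` helper that labels each bin by its lower edge.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count values per bin; each bin is identified by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up sample: ages of 12 people
ages = [21, 22, 22, 25, 31, 33, 34, 34, 35, 47, 48, 62]

fine = histogram(ages, bin_width=5)     # small bins, high granularity
coarse = histogram(ages, bin_width=20)  # large bins, little variation left

# Text rendering of both histograms
for width, bins in ((5, fine), (20, coarse)):
    print(f"bin width {width}")
    for lower, count in bins.items():
        print(f"  {lower:>2}-{lower + width - 1:<2} {'#' * count}")
```

The narrow bins show two separate groups in the twenties and thirties; the wide bins merge them into a single dominant bar.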
How can proportions be shown?
Pie chart:
- need to add up to 100%
(Stacked) Bar graphs
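Before drawing proportions, absolute counts are usually converted into shares of the total. A minimal sketch with made-up survey answers; `shares` is a hypothetical helper.

```python
def shares(counts):
    """Convert absolute counts into percentages of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: 100 * value / total for name, value in counts.items()}

# Made-up survey answers (each person gave exactly one answer)
answers = {"yes": 45, "no": 30, "undecided": 15}

percent = shares(answers)
for name, value in percent.items():
    print(f"{name:<10} {value:5.1f} %")
print(f"{'total':<10} {sum(percent.values()):5.1f} %")
```

Because every person falls into exactly one category, the shares add up to 100 % and fit a pie chart. If respondents could pick several answers, the parts would no longer sum to 100 % and a bar graph would be the safer choice.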
Maximize Information Content
Pay attention to:
- adding labels providing information on direction, size
- scaling time by providing overall duration
- summarizing development in static overview
- choosing colors that are good for visualization
Use the smallest effective Difference
For elegant visualizations that do not distract, make visual distinctions as subtle as possible but as clear as necessary
- strongly mark only those lines that display the most important data
- make color differences visible
- avoid “chart junk”
Three approaches to using color
Sequential:
- same or similar hues are used and saturation varies for a single metric
Diverging:
- two hues are used to indicate a division, such as positive and negative values
Qualitative:
- when data is non-numeric, contrasting colors are used for each category
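The three approaches can be sketched as three small mapping functions using the standard-library `colorsys` module. The hue values, the function names, and the choice of blue and red are assumptions made for illustration only.

```python
import colorsys

def to_hex(h, s, v):
    """Convert HSV components (each in 0..1) to a '#rrggbb' string."""
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

BLUE, RED = 0.6, 0.0  # hues chosen for illustration

def sequential(t):
    """t in 0..1: a single hue, saturation grows with the value."""
    return to_hex(BLUE, t, 1.0)

def diverging(t):
    """t in -1..1: blue for negative, red for positive, white at zero."""
    hue = RED if t >= 0 else BLUE
    return to_hex(hue, abs(t), 1.0)

def qualitative(categories):
    """One evenly spaced, clearly distinct hue per category."""
    n = len(categories)
    return {c: to_hex(i / n, 0.8, 0.9) for i, c in enumerate(categories)}

print([sequential(t) for t in (0.0, 0.5, 1.0)])
print([diverging(t) for t in (-1.0, 0.0, 1.0)])
print(qualitative(["apples", "pears", "plums"]))
```

In practice a plotting library's ready-made palettes would normally be used; the sketch only shows what distinguishes the three schemes.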
Scaling
Scaling one graph vs. scaling several graphs
One Graph:
- deliberately set the axes’ minimum and maximum value
- choose the best aspect ratio to make information visible
Several Graphs:
- consider scaling all compared graphs similarly
- makes comparisons easier
- can make graphs hard to read
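The trade-off between individual and shared scaling can be made concrete by checking how much of the y-axis a series actually uses. The regional figures are made up; `axis_limits` and `used_fraction` are hypothetical helpers.

```python
def axis_limits(series, padding=0.05):
    """Return (low, high) covering all values, widened by a relative padding."""
    low, high = min(series), max(series)
    margin = (high - low) * padding
    return low - margin, high + margin

def used_fraction(series, limits):
    """Share of the axis range that the series' own variation occupies."""
    low, high = limits
    return (max(series) - min(series)) / (high - low)

# Made-up monthly figures for two regions
north = [120, 135, 150, 160]
south = [20, 22, 25, 24]

own_south = axis_limits(south)       # scaled for this graph alone
shared = axis_limits(north + south)  # same scale for both graphs

print(f"south, own scale:    {used_fraction(south, own_south):.0%} of the axis")
print(f"south, shared scale: {used_fraction(south, shared):.0%} of the axis")
```

On the shared scale the two regions can be compared directly, but the variation within the smaller series shrinks to a few percent of the axis and becomes hard to read.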
Aggregation
- can make data stand out
- can gloss over data
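Both effects of aggregation show up in a small example. The sketch below assumes aggregation by taking the mean of fixed-size groups; the visitor counts are made up and `aggregate` is a hypothetical helper.

```python
from statistics import mean

def aggregate(values, size):
    """Mean of consecutive, non-overlapping groups of `size` values."""
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]

# Made-up daily visitor counts for two weeks; one day in week 2 is a one-off spike
daily = [10, 12, 11, 13, 12, 11, 12,
         14, 15, 60, 14, 16, 15, 13]

weekly = aggregate(daily, 7)

print("daily max:   ", max(daily))
print("weekly means:", [round(w, 1) for w in weekly])
```

The weekly means make the rise from the first to the second week stand out, but they gloss over the fact that a single day is responsible for much of it.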
Multiples and parallelism
- can be used for comparisons and for transformations
- makes both similarities and differences easier to follow and understand
- reduces effort that can then be spent on interpreting relevant details - “Don’t make me think”
Infographics vs. Data Visualization
Data visualization creates a picture from a data set that efficiently communicates the main insights
Infographics tell a story through
- data visualization
- and text
- and images
- > frequently used in the context of PR, marketing and journalism
- > can also visualize a process
The art of (visual) storytelling
Introduction/Foundation:
- What is this about and why should you care?
Ah ha! The main event
- Some new, previously unknown piece of information - generate insight. This is the infographic's entire reason for existence
Conclusion/Call to action
- What is the follow-up we want? Addresses?
Tips for designing good infographics
Be accurate
- tell what you did and what you found out
Focus on a key message
- why should we care?
Visualize when possible
- show trends, ratios, patterns, but also
- processes, organizational hierarchies, examples …
Minimise text
- presentations should not be good at transferring knowledge without verbal explanation
Be data transparent
- where did the data come from
- what did you do to it?
Animated and Interactive Visualizations
- Interactive visualizations let you adapt diagrams to filter or aggregate data on the fly
- these are particularly relevant for web-based presentations of data sets
- a range of tools is available, e.g. based on Python; however, the aforementioned design principles still apply