W5 Flashcards
What are the relevant libraries to import for visualisation?
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd
import seaborn as sns
import numpy as np
import numpy.random as nr
from datetime import datetime
Set the default size of plots
import matplotlib
matplotlib.rcParams['figure.figsize'] = (4, 2.5)
How can you produce a bar chart and a pie chart of the survival counts from the Titanic EXAMPLE?
Different ways to visualise the same data (i.e. bar chart and pie chart)
# titanic is assumed to be a DataFrame with a 'Survived' column
survive_count = titanic.value_counts('Survived')
_, ax = plt.subplots(ncols=2, figsize=(8, 2))
survive_count.plot.bar(ylabel='count', rot=0, ax=ax[0], title='Bar chart')
survive_count.plot.pie(ylabel='', ax=ax[1], title='Pie chart');
What are the different ways to draw a scatter plot that shows a 3rd variable through colour in the auto EXAMPLE?
_, ax = plt.subplots(ncols=3, figsize=(10, 2), sharey=True)
sc_m = ax[0].scatter(x=auto.displacement, y=auto.mpg, c=auto.weight, s=5)
plt.subplots_adjust(wspace=0.3); plt.colorbar(sc_m)
auto.plot.scatter(x='displacement', y='mpg', c='weight', s=5, ax=ax[1], title='pandas default')
sc_s = sns.scatterplot(auto, x='displacement', y='mpg', hue='weight', s=5, ax=ax[2])
sns.move_legend(sc_s, "upper left", bbox_to_anchor=(1, 1))
ax[0].set_title('matplotlib default'); ax[2].set_title('seaborn default');
What are the different ways we can map data into visual properties when visualising data?
- Length or height
* i.e. whether the marks run vertically or horizontally
- Position
* i.e. where the data is placed along a scale
- Area
* i.e. how much of the graph space is used
- Angle/area
- Line weight
- Hue and shade
What do we want to achieve through visualisation and what are the considerations?
Take advantage of the human visual system to
* Understand data and extract information
* Communicate
Considerations
* Correctness
* Effectiveness: e.g. match human perception
Through visualisation, we encode the data into plots. Understanding the data through the plots then relies on the audience’s ability to decode the plots correctly.
What are marks and channels?
Marks and channels are building blocks for visual encoding.
- Marks: geometric primitives
- Channels: control the appearance of marks based on attributes
What do marks do, and what are some examples?
Marks represent items or links; for now, we only consider items.
They are basic geometric elements, classified according to their number of dimensions.
EXAMPLES:
* Points (zero-dimensional)
* Lines (one-dimensional)
* Areas (two-dimensional)
_, ax = plt.subplots(ncols=3, figsize=(11, 3)); plt.subplots_adjust(wspace=0.3)
auto.plot.scatter(x='displacement', y='mpg', ax=ax[0], s=5, title='points (0d)')
survive_count.plot.bar(ylabel='count', rot=0, ax=ax[1], width=0.1, title='lines (1d)')
survive_count.plot.pie(ylabel='', ax=ax[2], title='areas (2d)');
What do channels do, what are the 2 types, and what are some examples?
Channels (or visual variables) control the appearance of marks, proportional to / based on attributes
2 types of channels:
- Identity channels: what something is
* E.g. shape, hue of colours, spatial region
- Magnitude channels: ordered attributes
* E.g. position, length, area, angle (or tilt), lightness of colours
EXAMPLES:
1. Position (horizontal, vertical, both)
2. Shape (triangle, star, line, right angle)
3. Size (length, area, volume)
4. Colour
5. Tilt
6. Volume
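As a rough illustration of the two channel types, the sketch below encodes four attributes on one scatter plot: position (x and y) and area as magnitude channels, and marker shape as an identity channel. The data is synthetic and the variable names (`displacement`, `mpg`, `weight`, `origin`) are stand-ins chosen for this sketch, not the auto dataset used elsewhere in these notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
displacement = rng.uniform(70, 450, size=50)
mpg = 45 - 0.07 * displacement + rng.normal(0, 2, size=50)
weight = 1500 + 7 * displacement + rng.normal(0, 150, size=50)
origin = rng.integers(0, 3, size=50)  # categorical attribute coded 0, 1, 2

markers = ["o", "^", "s"]  # identity channel: shape
fig, ax = plt.subplots(figsize=(5, 3))
for code, marker in enumerate(markers):
    mask = origin == code
    ax.scatter(
        displacement[mask], mpg[mask],  # magnitude channel: position (x and y)
        s=weight[mask] / 50,            # magnitude channel: area of the marker
        marker=marker, alpha=0.6, label=f"origin {code}",
    )
ax.set_xlabel("displacement")
ax.set_ylabel("mpg")
ax.legend(title="shape = category")
```

Position carries the two attributes that matter most here, while the less accurate area channel carries a secondary one.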
What are the 2 principles of the use of visual channels?
- Expressiveness principle
* Visual encoding should express all of, and only, the information in the dataset attributes
- Effectiveness principle
* The importance of the attribute should match the salience of the channel
- i.e. the most important attributes should be encoded with the most effective channels
What is the expressiveness principle, counterexample and EXAMPLE?
- Expressiveness Principle: Visual encoding should express all of, and only, the information in the dataset attributes
* Magnitude channels: Quantitative and ordinal data
* Identity channels: Categorical attributes
Counterexample (using an identity channel, marker shape, for the quantitative variable cylinders):
EXAMPLE:
g = sns.scatterplot(auto, x='displacement', y='mpg', style='cylinders')
sns.move_legend(g, "upper left", bbox_to_anchor=(1, 1));
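For contrast with the counterexample, one possible fix is to move the ordered attribute onto a magnitude channel. The sketch below uses colour from a sequential colourmap instead of marker shape. It uses plain matplotlib and synthetic data whose column names only mimic the auto dataset, so treat it as an assumption-laden illustration rather than the course's own solution.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the auto data (hypothetical values)
rng = np.random.default_rng(1)
cylinders = rng.choice([4, 6, 8], size=60)  # ordered attribute
displacement = 40 * cylinders + rng.normal(0, 20, size=60)
mpg = 45 - 3 * cylinders + rng.normal(0, 2, size=60)

fig, ax = plt.subplots(figsize=(5, 3))
# Sequential colourmap: darker-to-lighter colour follows fewer-to-more cylinders
sc = ax.scatter(displacement, mpg, c=cylinders, cmap="viridis", s=15)
ax.set_xlabel("displacement")
ax.set_ylabel("mpg")
fig.colorbar(sc, ax=ax, ticks=[4, 6, 8], label="cylinders")
```

Because the colour order matches the data order, the reader can see "more cylinders" without consulting a legend for each shape.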
How are the channels of each expressiveness type ranked by effectiveness?
A. Magnitude Channels: Ordered Attributes
1. Position on common scale
2. Position on unaligned scale
3. Length (1D size)
4. Tilt/angle
5. Area (2D size)
6. Depth (3D position)
7. Colour luminance
8. Colour Saturation
9. Curvature
10. Volume (3D Size)
B. Identity Channels: Categorical Attributes
1. Spatial region
2. Colour hue
3. Motion
4. Shape
What factors contribute to the effectiveness of channels?
- Accuracy: capability to estimate the magnitude of data encoded
- Discriminability: the capability to distinguish items as intended (this quantifies the number of bins available for visual encoding)
* EXAMPLE: When different lightnesses of green are used to represent different categories, it is quite difficult to tell which is which, because the number of bins available when using lightness as a channel is limited.
- Separability: whether multiple channels can be combined without interfering with one another
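The discriminability point can be made concrete with a small calculation. The sketch below cuts matplotlib's `Greens` colourmap into a number of evenly spaced bins and reports the smallest brightness gap between neighbouring bins. Brightness is approximated with Rec. 601 luma weights, which is a simplifying assumption and not a full perceptual model; the helper names are made up for this sketch.

```python
import matplotlib.pyplot as plt
import numpy as np

def luma(rgba):
    """Approximate brightness (Rec. 601 weights) of RGB(A) rows in [0, 1]."""
    return np.asarray(rgba)[..., :3] @ np.array([0.299, 0.587, 0.114])

def min_lightness_gap(n_bins, cmap_name="Greens"):
    """Smallest brightness difference between neighbouring bins when a
    sequential colourmap is cut into n_bins evenly spaced colours."""
    colours = plt.get_cmap(cmap_name)(np.linspace(0, 1, n_bins))
    return float(np.abs(np.diff(luma(colours))).min())

for n in (3, 5, 9):
    print(n, "bins -> smallest gap", round(min_lightness_gap(n), 3))
```

The gap shrinks as the number of bins grows, which is why only a handful of lightness levels can be told apart reliably.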
What are the rankings of accuracy across different channels?
Ranking of enabling accurate estimates:
1. Position along common scales
2. Position along identical, nonaligned scales
3. Length
4. Direction/slope
5. Angle
6. Area
7. Volume
8. Shading and saturation
9. Colour hue
What are the different levels of separability between pairs of channels?
a) Fully Separable
- Position + hue (colour)
b) Some interference
- Size + hue (colour)
c) Some/significant interference
- Width + Height
What is proportional judgement and the ranked error (from least to most error-prone graphs)?
Proportional judgement is the ability to recognise and distinguish proportions in data/graphs.
Ranked from least to most error-prone for proportional judgements:
1. Positions
2. Angles
3. Circular Areas
4. Rectangular Areas
What is human capability and its limitations in interpreting graphical results?
The human perceptual system is fundamentally based on relative judgements, not absolute ones. Our perception of colour and luminance is contextual, based on the contrast with surrounding colours.
Our visual system evolved to provide colour constancy, so that the same surface is identifiable across a broad set of illumination conditions, even though a physical light meter would yield very different readings.
- Comparing lengths (unframed unaligned vs. framed unaligned vs. unframed aligned)
- Comparing Luminance
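The luminance comparison can be reproduced with a simple simultaneous-contrast image: two squares with exactly the same grey value placed on a dark and a light background. This is a minimal sketch; the grey levels and image size are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Background: dark left half, light right half (grey levels in [0, 1])
img = np.empty((60, 120))
img[:, :60] = 0.15
img[:, 60:] = 0.85

# Two patches with exactly the same grey value
img[20:40, 20:40] = 0.5
img[20:40, 80:100] = 0.5

fig, ax = plt.subplots(figsize=(4, 2))
ax.imshow(img, cmap="gray", vmin=0, vmax=1)
ax.set_axis_off()
ax.set_title("Both squares are the same grey")
```

Most viewers judge the left square as lighter than the right one, even though a "light meter" (the array values) says they are identical.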
What are the key takeaways for selecting channels?
- Choose the channels that suit your need, which are not necessarily the ones that enable accurate estimates
* For example, using channels at the bottom half of the scale can be appropriate if the goal is not to enable accurate judgements, but to reveal general patterns
* One can annotate the plot to help the audience to decode
- Multiple graphic forms may enable multiple tasks
* For example, if you want to show a general impression of the share of each grade and at the same time allow readers to compare the number of students in each grade easily and accurately, you may want to use separate charts
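A possible version of the grades example is sketched below: a pie chart for the overall impression of shares, next to an annotated bar chart for accurate comparison. The grade counts are invented for illustration, and `Axes.bar_label` assumes matplotlib 3.4 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical grade counts, made up for this sketch
grades = {"A": 12, "B": 30, "C": 25, "D": 8}
labels, counts = list(grades.keys()), list(grades.values())

fig, ax = plt.subplots(ncols=2, figsize=(8, 2.5))

# Task 1: general impression of each grade's share (angle/area channels)
ax[0].pie(counts, labels=labels)
ax[0].set_title("share of each grade")

# Task 2: accurate comparison (position on a common scale, plus annotation)
bars = ax[1].bar(labels, counts)
ax[1].bar_label(bars)  # write the count above each bar to help decoding
ax[1].set_ylabel("students")
ax[1].set_title("students per grade")
```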
Should colour be included as a channel?
- Colour can be a redundant channel, and therefore unnecessary
- Alternative channels can be more effective
- Colour luminance and colour saturation rank low among the magnitude channels for ordered attributes (7th and 8th), yet colour hue is the 2nd most effective identity channel for categorical attributes
What is a colourmap and what are its different types?
A colourmap (or colour palette) specifies a mapping between colours and data values.
- Taxonomy of colourmaps:
1. Categorical (or qualitative)
2. Continuous
-> Sequential
-> Diverging
- It is important to match colourmaps to data type characteristics.
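To see the three kinds of colourmap side by side, the sketch below draws one strip per type using matplotlib's built-in `tab10` (categorical), `viridis` (sequential) and `coolwarm` (diverging). The choice of these three maps is mine, for illustration; any map of the same type would do.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

examples = {
    "categorical: tab10": "tab10",
    "sequential: viridis": "viridis",
    "diverging: coolwarm": "coolwarm",
}

gradient = np.linspace(0, 1, 256).reshape(1, -1)  # one row of values from 0 to 1
fig, axes = plt.subplots(nrows=len(examples), figsize=(6, 2))
for ax, (title, name) in zip(axes, examples.items()):
    ax.imshow(gradient, aspect="auto", cmap=plt.get_cmap(name))
    ax.set_axis_off()
    ax.set_title(title, fontsize=8, loc="left")
fig.tight_layout()
```

The categorical strip shows distinct blocks with no implied order, the sequential strip runs from dark to light, and the diverging strip has two contrasting ends that meet at a neutral midpoint.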
What are the categorical colourmaps?
- Often made up of miscellaneous colours, e.g. Pastel1, Pastel2, Paired, Accent, Dark2, Set1, Set2, Set3, tab10, tab20,…
- Suitable for categorical nominal data
- Ideally, each colour should have the same lightness
- But this would restrict the number of discriminable colours
- Note that the number of discriminable colours is limited when they appear in small, noncontiguous regions.
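As a quick check of how few colours these palettes actually offer, the sketch below reads the number of entries in each of matplotlib's qualitative colourmaps. It assumes the matplotlib spelling of the names, which is case-sensitive.

```python
import matplotlib.pyplot as plt

names = ["Pastel1", "Pastel2", "Paired", "Accent", "Dark2",
         "Set1", "Set2", "Set3", "tab10", "tab20"]

# .N is the number of distinct colours stored in each colourmap
n_colours = {name: plt.get_cmap(name).N for name in names}
for name, n in n_colours.items():
    print(f"{name}: {n} colours")
```

Even the largest of these palettes has only 20 entries, so a categorical attribute with many levels usually needs grouping or a different channel.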
What are the (continuous) sequential colourmaps?
A sequential colourmap ranges from a minimum value to a maximum value.
- The colour changes incrementally in lightness, and possibly saturation, up to full lightness and saturation, often using a single hue
- Good for ordered data
What are the problems with rainbow colourmaps?
- Perceptually unordered:
* No clear “greater than” or “less than” logic to order the colour
* Hue, which represents the type of colour, may not be appropriate for representing order
- Perceptually nonlinear:
* Steps of the same size at different points in the colourmap range are not perceived equally by our eyes
* Humans are not very good at perceiving changes in hue
- Colour-blind readers may not be able to distinguish red from green
- Readers perceive sharp transitions in colour as sharp transitions in the data, even when this is not the case (misleading)
What is a Perceptually uniform sequential colourmap?
Perceptually uniform sequential colourmaps are colourmaps that may contain multiple hues, but in which equal steps in the data are perceived as equal steps in the colour space in terms of lightness.
- The default colourmaps of both matplotlib and seaborn for quantitative data are perceptually uniform sequential colourmaps.
_, ax = plt.subplots(ncols=2, figsize=(7, 2.5), sharey=True)
sc_m = ax[0].scatter(x=auto.displacement, y=auto.mpg, c=auto.weight)
ax[0].set_title('matplotlib default'); ax[0].set_ylabel('mpg'); plt.colorbar(sc_m)
sc_s = sns.scatterplot(auto, x='displacement', y='mpg', hue='weight', ax=ax[1])
sns.move_legend(sc_s, "upper left", bbox_to_anchor=(1, 1))
plt.subplots_adjust(wspace=0.3); ax[1].set_title('seaborn default');
What is a key difference between perceptually uniform sequential colourmaps and rainbow colourmaps?
Perceptually uniform sequential colourmaps have reasonable representations in grayscale, because their lightness changes steadily across the range, whereas rainbow colourmaps may not.
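The grayscale difference can be checked numerically. The sketch below samples a colourmap at a few evenly spaced points, converts each colour to an approximate grayscale brightness, and tests whether brightness moves in one direction only. It assumes `viridis` as the perceptually uniform map and `jet` as the rainbow map, and uses Rec. 601 luma weights as a rough stand-in for perceived lightness; the helper names are invented for this sketch.

```python
import matplotlib.pyplot as plt
import numpy as np

def luma_profile(cmap_name, n=11):
    """Approximate grayscale brightness (Rec. 601 weights) at n evenly
    spaced points along a colourmap."""
    rgb = plt.get_cmap(cmap_name)(np.linspace(0, 1, n))[:, :3]
    return rgb @ np.array([0.299, 0.587, 0.114])

def is_monotonic(values):
    """True if the values only increase or only decrease."""
    steps = np.diff(values)
    return bool(np.all(steps > 0) or np.all(steps < 0))

for name in ("viridis", "jet"):
    print(name, "brightness is monotonic:", is_monotonic(luma_profile(name)))
```

A map whose brightness rises and then falls, as the rainbow map does, assigns the same grey to different data values once printed in black and white.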