Data visualisation Flashcards
The story of Florence Nightingale
- Led a team of nurses to Istanbul in 1854 to assist in the care of British soldiers fighting in the Crimean War.
- Collected data on causes of death and invented the “polar area” (or “coxcomb”) plot.
- Linked deaths to poor sanitation.
The story of Dr John Snow
- Mapped cholera outbreaks in nineteenth-century London.
- Cholera appeared to cluster around one water pump, which turned out to be contaminated by sewage.
- A pub in the area was on a separate water source and reported no cases.
Why visualise data?
- It enables us to discover patterns.
- It enables us to think deeply about the data.
- It enables us to see differences between groups and individuals.
- It’s all about communication.
- It tells a story.
- To us, in terms of understanding.
- To the reader, in terms of presentation.
Principles of data visualisation
- Should be guided by scientific questions. Don’t go fishing.
- Aesthetics: choose colours carefully; be careful with yellow and similar colours. Allow for colour blindness (don’t use red and green together).
- Don’t just show a summary statistic (e.g. the mean); show the distribution (variance).
- Think about looking at individuals, to really understand your data and find the hidden stories.
- Order values from small to large, left to right, when writing in English.
- Make sure proportions are correct and include an absolute zero when necessary (e.g. height).
- Use an appropriate range/scale on each axis so the data are not distorted.
- Minimise clutter on the visualisation.
- Don’t cherry-pick time periods.
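The point about not relying on a summary statistic alone can be illustrated with a minimal sketch. The two groups below are hypothetical numbers, invented only so that they share a mean while differing in spread; they are not taken from these notes.

```python
from statistics import mean, stdev

# Hypothetical data: two groups chosen to share a mean but not a spread.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]


def summarise(values):
    """Return the mean, sample standard deviation, and range of a list of numbers."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "range": (min(values), max(values)),
    }


summary_a = summarise(group_a)
summary_b = summarise(group_b)

# The means match, so a chart of means alone would hide the difference in spread.
print(summary_a)
print(summary_b)
```

Plotting only the two means would make the groups look identical; a histogram, box plot, or plot of the individual points would reveal the difference.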
Common ways to visualise data
Histogram
Density plot (smoothed histogram)
Box plot (displays the median, the interquartile range, and the range of the data for one factor/variable; can also detect outliers)
Violin plot (similar to a box plot, but also displays the density of the data for one factor/variable)
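As a rough sketch of what a box plot summarises, the quantities it draws can be computed directly. The data are hypothetical, and two choices here are assumptions rather than anything stated in these notes: the "inclusive" quartile method, and the common 1.5 x IQR rule for flagging outliers.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the values a box plot displays, plus outliers flagged by the 1.5 x IQR rule."""
    data = sorted(values)
    # Quartiles: the cut points that split the sorted data into four parts.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 x IQR from the box count as outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "outliers": outliers,
    }


# Hypothetical sample with one unusually large value.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(sample)
print(summary)
```

Different software uses slightly different quartile definitions, so the exact box edges can vary between tools.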
Relative vs absolute risk
RELATIVE RISK = ratio of one risk to another = 0.73 / 0.64 ≈ 1.14, i.e. a 14% relative increase in risk
ABSOLUTE RISK DIFFERENCE = one risk subtracted from the other = 0.73 − 0.64 = 0.09, i.e. an increase of 9 percentage points
Relative risk compares the risk between groups. Absolute risk is the actual risk in one group.
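The arithmetic for the two risks above (0.73 and 0.64) can be checked with a short sketch. The function names are illustrative choices, not part of the notes.

```python
def relative_risk(risk_a, risk_b):
    """Ratio of one risk to another."""
    return risk_a / risk_b


def absolute_risk_difference(risk_a, risk_b):
    """One risk subtracted from the other."""
    return risk_a - risk_b


rr = relative_risk(0.73, 0.64)
ard = absolute_risk_difference(0.73, 0.64)

# The relative change is the part of the ratio above 1, expressed as a percentage.
print(f"Relative risk: {rr:.2f} ({(rr - 1) * 100:.0f}% relative increase)")
# The absolute difference is reported in percentage points, not percent.
print(f"Absolute risk difference: {ard:.2f} ({ard * 100:.0f} percentage points)")
```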
Differences Between Odds and Risk
Risk represents the likelihood or probability of an event, while odds represent the ratio of events to non-events.
For small probabilities, risk and odds are almost identical, but as the probability increases, the odds increase much more rapidly. For example:
If the risk is 0.1 (10%), the odds are 0.11 (1-to-9).
If the risk is 0.9 (90%), the odds are 9.0 (9-to-1).
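The relationship between risk and odds can be sketched as a pair of conversions, odds = risk / (1 − risk) and risk = odds / (1 + odds). The function names are illustrative, and the extra risks in the loop (0.01 and 0.5) are added only to show the trend.

```python
def risk_to_odds(risk):
    """Convert a probability (0 <= risk < 1) into odds of event to non-event."""
    if not 0 <= risk < 1:
        raise ValueError("risk must be at least 0 and less than 1")
    return risk / (1 - risk)


def odds_to_risk(odds):
    """Convert odds (>= 0) back into a probability."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return odds / (1 + odds)


# Odds stay close to risk when risk is small, then grow much faster.
for risk in (0.01, 0.1, 0.5, 0.9):
    print(f"risk = {risk:.2f} -> odds = {risk_to_odds(risk):.2f}")
```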
Risk
Risk is the probability of an event occurring. It’s simply the number of people who experience an event (like developing a disease) out of the total number of people.
Mathematically:
Risk = Number of people with the event / Total number of people in the population
Example: If 80 out of 100 people with diabetes have hypertension, the risk of having hypertension is: $\text{Risk} = \frac{80}{100} = 0.8$ (or 80%).
Range: Risk is a fraction that ranges from 0 (no chance of the event happening) to 1 (the event is certain to happen).
Odds
Odds compare the number of people who experience an event to the number of people who do not. It’s a ratio, not a probability.
Mathematically:
Odds = Number of people with the event / Number of people without the event
Example: Using the same scenario, if 80 out of 100 people with diabetes have hypertension, then 20 people do not. The odds of having hypertension are: $\text{Odds} = \frac{80}{20} = 4.0$ (or 4-to-1).
Range: Odds range from 0 to infinity, unlike risk which is always a fraction between 0 and 1.
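Both definitions can be computed from the same two counts. The sketch below uses the 80-out-of-100 example; the function names are illustrative, and treating "everyone has the event" as an error for odds is a design choice made here, since the division would otherwise be by zero.

```python
def risk_from_counts(events, total):
    """Risk = people with the event / total people."""
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")
    return events / total


def odds_from_counts(events, total):
    """Odds = people with the event / people without the event."""
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")
    non_events = total - events
    if non_events == 0:
        raise ValueError("odds are undefined when nobody is without the event")
    return events / non_events


risk = risk_from_counts(80, 100)
odds = odds_from_counts(80, 100)
print(f"risk = {risk} (a probability between 0 and 1)")
print(f"odds = {odds} (a ratio with no upper limit)")
```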
Absolute vs. Relative Measures (Risk and Odds)
Absolute measures (risk or odds) tell you how likely the event actually is within a group, which gives a clearer sense of the real-world impact.
Relative measures (risk or odds) express the proportional change between groups and can sound impressive, but may be misleading if the absolute risk is low. They’re better for comparing the magnitude of an effect between groups.
Absolute Risk (or Odds):
Absolute risk is the actual probability of an event happening in a group. It tells you the baseline risk or likelihood.
Absolute odds are the raw odds (event vs. non-event) within a group.
Example (Risk): If the risk of developing hypertension in a group is 80%, that is the absolute risk.
Example (Odds): If the odds of developing hypertension are 4-to-1 (80% have hypertension, 20% do not), those are the absolute odds.
Relative Risk (or Odds):
Relative risk compares the risk between two different groups (e.g., those receiving treatment vs. those receiving a placebo). It tells you the proportionate change in risk.
Relative odds compare the odds between two groups in the same way, providing a ratio of odds rather than probabilities.
Relative risk (or odds) = Risk (or odds) in treatment group / Risk (or odds) in control group
Example (Risk): If the risk of hypertension in the placebo group is 80% and the risk in the treatment group is 40%, the relative risk is:
$\frac{40\%}{80\%} = 0.5$, a 50% relative risk reduction ($1 - 0.5$).
Example (Odds): Using the same data, the odds in the placebo group are 4.0 (80% vs. 20%), and the odds in the treatment group are about 0.67 (40% vs. 60%). The relative odds (odds ratio) are:
$\frac{0.67}{4.0} \approx 0.17$, about an 83% reduction in odds.
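The treatment versus placebo comparison can be reproduced in a few lines. This is a sketch with illustrative variable names; it works from unrounded odds, which gives an odds ratio of about 0.167, whereas rounding the treatment odds to 0.67 first shifts the third decimal place.

```python
def odds(risk):
    """Odds of event to non-event for a given risk (0 <= risk < 1)."""
    return risk / (1 - risk)


risk_control = 0.80    # placebo group
risk_treatment = 0.40  # treatment group

# Relative measures: ratios between the two groups.
relative_risk = risk_treatment / risk_control
relative_risk_reduction = 1 - relative_risk
odds_ratio = odds(risk_treatment) / odds(risk_control)

# Absolute measure: the difference in risk, in percentage points.
absolute_risk_reduction = risk_control - risk_treatment

print(f"Relative risk: {relative_risk:.2f}")
print(f"Relative risk reduction: {relative_risk_reduction:.0%}")
print(f"Odds ratio: {odds_ratio:.3f}")
print(f"Absolute risk reduction: {absolute_risk_reduction * 100:.0f} percentage points")
```

The odds ratio (about 0.17) suggests a larger effect than the relative risk (0.5) for the same data, because the event is common in both groups and odds grow faster than risk at high probabilities.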