Problems with Describing Data Flashcards
Know what mean, median, mode are; whether they are parametric or nonparametric; and the problems with each
Mean: parametric
Problems: strongly influenced by extreme outliers and by skewed distributions
Median: non-parametric
Problems: not accurate when only a few different score values are involved (dogs leg example)
Mode: non-parametric
Problems: does not always reflect majority, only plurality
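A quick sketch of the three problems above, using Python's standard statistics module. The scores and votes are made up for illustration and are not from the course material.

```python
from statistics import mean, median, multimode

# Hypothetical scores with one extreme outlier (100).
scores = [2, 3, 3, 4, 5, 100]

print(mean(scores))       # 19.5, dragged upward by the single outlier
print(median(scores))     # 3.5, barely affected by the outlier
print(multimode(scores))  # [3], the most common value

# Mode reflects plurality, not majority: "a" wins with only 3 of 8 votes.
votes = ["a", "a", "a", "b", "b", "c", "c", "d"]
top_choice = multimode(votes)[0]
mode_share = votes.count(top_choice) / len(votes)
print(top_choice, mode_share)  # a 0.375
```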
Know the importance of variability statistics, and the definitions and problems with SD (parametric) and range statistics (non-parametric)
Variability statistics: tell us how spread out the scores are, i.e. how much a given value may differ from the others; how that spread is expressed depends on the measure used (SE, SD, range)
SD: parametric statistic based on the average distance of scores from the mean, can be distorted by extreme scores
Range: non-parametric statistic based on the highest score minus the lowest score. Can be a problem because it uses only two numbers, so it may not accurately reflect the rest of the scores
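A minimal sketch of both problems, with made-up scores. It assumes the population SD (pstdev), which is the root of the average squared distance from the mean; value_range is a hypothetical helper name.

```python
from statistics import pstdev

def value_range(xs):
    """Range: highest score minus lowest score."""
    return max(xs) - min(xs)

# One extreme score distorts both the SD and the range.
typical = [4, 5, 5, 6, 5]
with_outlier = [4, 5, 5, 6, 50]
print(pstdev(typical), value_range(typical))            # small spread, range 2
print(pstdev(with_outlier), value_range(with_outlier))  # much larger SD, range 46

# The range only looks at two numbers: these sets share a range of 10
# even though their scores are spread out differently.
clustered = [0, 5, 5, 5, 10]
polarised = [0, 0, 5, 10, 10]
print(value_range(clustered), value_range(polarised))
print(pstdev(clustered), pstdev(polarised))
```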
Know what’s needed to properly interpret error bars in bar charts
Need to know what the error bars are (SD, SE, CI) and the sample size
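A rough sketch of why the type of error bar and the sample size both matter: the same made-up sample gives three different bar lengths. The 1.96 multiplier is a normal approximation and an assumption here; for a sample this small a t critical value would be larger.

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample, made up for illustration.
sample = [10, 12, 14, 16, 18]
n = len(sample)

sd = stdev(sample)           # sample standard deviation
se = sd / sqrt(n)            # standard error shrinks as n grows
ci95_half_width = 1.96 * se  # approximate 95% CI half-width (normal approximation)

print(round(sd, 2), round(se, 2), round(ci95_half_width, 2))
```

With the same SD, a larger n gives a smaller SE and a narrower CI, so a bar chart cannot be read without knowing which of the three is drawn.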
How do absolute numbers and frequencies tell different stories in data? Understand how a full picture requires both.
Absolute numbers: total number, the raw data
Frequencies: percentages, fractions
Need to know both, as someone may claim one thing has increased by 100% while something else has increased by only 50%. But the 100% could be going from 1 to 2, while the 50% could be going from 1000 to 1500.
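The 1 to 2 versus 1000 to 1500 example can be worked through directly. percent_change is a hypothetical helper written for this sketch.

```python
def percent_change(old, new):
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100

small_old, small_new = 1, 2
large_old, large_new = 1000, 1500

small_pct = percent_change(small_old, small_new)  # 100.0
large_pct = percent_change(large_old, large_new)  # 50.0
small_abs = small_new - small_old                 # 1
large_abs = large_new - large_old                 # 500

print(small_pct, small_abs)
print(large_pct, large_abs)
```

The larger percentage belongs to the far smaller absolute change, which is why both numbers are needed.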
What is needed to understand a graph that represents rate of change, and what are some common mistakes in understanding them?
Need the full range of data and the population it comes from, over a long enough period of time. Check that the data are accurate and complete, and that all of the samples are the same (comparable)
Need the absolute values across time (or over the same time frames), and you need to know the denominator of the value we are looking at, e.g. how many people out of x
What is the law of large numbers?
Numbers will vary more when you have a small population (sample); as the sample gets larger, the observed average settles closer to the true value
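A small simulation sketch of this idea, assuming fair coin flips as the data. sample_means is a hypothetical helper, and the sample sizes, trial count, and seed are arbitrary choices.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(sample_size, trials=200):
    """Proportion of heads in each of `trials` samples of fair coin flips."""
    return [
        mean([random.randint(0, 1) for _ in range(sample_size)])
        for _ in range(trials)
    ]

# How much the observed proportion bounces around from sample to sample.
spread_small = pstdev(sample_means(10))
spread_large = pstdev(sample_means(1000))

print(spread_small)  # small samples: proportions vary a lot
print(spread_large)  # large samples: proportions stay close to 0.5
```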
Understand the rule of combining percentages with different bases (such as an amount going up and then down by the same %)
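The baseline value is different once the first change has been applied, so the second percentage is taken from a different base. Going up and then down by the same % does not bring you back to the starting amount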
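A worked sketch with made-up numbers: up 50% and then down 50% from a starting amount of 100. apply_percent is a hypothetical helper.

```python
def apply_percent(amount, percent):
    """Increase (or decrease, if negative) an amount by a percentage."""
    return amount * (1 + percent / 100)

start = 100
after_up = apply_percent(start, 50)       # 150.0, base was 100
after_down = apply_percent(after_up, -50) # 75.0, base is now 150

print(after_up, after_down)
```

The drop of 50% is taken from 150, not from 100, so the final amount is 75 rather than the original 100.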
Know the strengths and weaknesses of bar, line, and pie charts and which kinds of data each one is best for illustrating
Bar chart: good for categorical data and for showing the magnitude of each category
Downsides: bad with showing multiple dimensions of data
Line chart: Good for changes over time and multiple trends simultaneously
Downsides: Dual y-axes can make the chart difficult to read and misleading. Log scaling means the axis can be squashed unequally
Pie charts: good for proportions and percentages
Downsides: bad for data that changes over time, and harder to read than other charts
Know the definition and differences between data ducks and glass slipper graphics and why each one is bad
Data ducks: an attempt to make the chart look like something else; can be misleading and confusing to understand.
Glass slipper: using an incorrect data visualisation format for your data, forcing it into something it is not
They differ from one another in that data ducks may use the correct format but do it poorly, with cosmetics being favoured, whereas a glass slipper puts the data into a completely wrong format
Both are bad because graphics are supposed to convey information clearly and quickly, and each gets in the way of that
What is the principle of proportional ink and when and how is it usually violated?
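Principle of proportional ink: when a shaded region represents a numerical value, the amount of ink (area) used should be directly proportional to that value; in practice the graphic should only vary in one dimension
Usually violated by overlapping bar charts or by bar charts with a poorly chosen axis. 3D graphs are very bad here, as they make certain percentages look bigger because they use up more ink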
Know how the axis can be used misleadingly in displaying data, what log scales are, and how they can also be misinterpreted
An axis can be misleading without a baseline of 0, giving the appearance that something is bigger than it actually is. Axes can also be reversed or turned upside down, promoting misunderstandings of the data.
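A small sketch of the baseline problem for bars, with made-up values. drawn_height is a hypothetical helper that gives the length of a bar drawn from a chosen baseline.

```python
def drawn_height(value, baseline):
    """Length of a bar drawn from the axis baseline up to the value."""
    return value - baseline

a, b = 50, 55  # hypothetical values; b is 10% larger than a

ratio_zero_baseline = drawn_height(b, 0) / drawn_height(a, 0)    # about 1.1
ratio_truncated = drawn_height(b, 45) / drawn_height(a, 45)      # 2.0

print(ratio_zero_baseline, ratio_truncated)
```

With the axis starting at 45, the second bar is drawn twice as tall even though the value is only 10% bigger.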
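Log scales: equal distances on the axis stand for equal ratios (e.g. each step is 10 times the last) rather than equal amounts, so the axis is squashed with unequal intervals. This affects perceptions of change and how the data appear.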
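A short sketch of what a base-10 log axis does to made-up values: the gaps between the values grow tenfold each time, yet they are drawn as equal steps.

```python
from math import log10

values = [1, 10, 100, 1000]  # hypothetical data points

# Position of each value on a base-10 log axis.
positions = [log10(v) for v in values]

# Gaps between neighbours on a linear axis versus on the log axis.
linear_steps = [b - a for a, b in zip(values, values[1:])]
log_steps = [b - a for a, b in zip(positions, positions[1:])]

print(linear_steps)  # [9, 90, 900]
print(log_steps)     # equal steps of about 1 each
```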
How Venn diagrams, subway maps, and geographical maps risk being misused as “glass slippers”
Subway map: form of visualisation, takes large amounts of complex geographical info and compresses it. Can be misused by displaying content that has none of the features of a subway map
Venn diagram: overlapping ovals used to represent group membership for items that may belong to multiple groups. Inappropriate images that represent the ovals can be misleading
The difference between bar and line graphs in whether their y-axis should include zero and why this is
Bar charts need it, while line graphs do not. Bar charts emphasise absolute magnitude, whereas line graphs emphasise changes.
How the problems of 3D graphics and with donut bar charts relate to proportional ink
Donut bar chart and proportional ink: when the bands are ordered from smallest in the centre to the largest at the periphery the amount of ink used for each band exaggerates the difference in band size.
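A geometric sketch of why outer bands use more ink. It assumes each band is a ring segment of equal width sweeping the same angle; the radii and angle are made up, and band_area is a hypothetical helper.

```python
from math import pi

def band_area(inner_radius, outer_radius, angle):
    """Area of a ring segment sweeping `angle` radians."""
    return 0.5 * angle * (outer_radius ** 2 - inner_radius ** 2)

angle = pi  # every band sweeps the same half circle
bands = [(1, 2), (2, 3), (3, 4)]  # (inner, outer) radii, centre to periphery

areas = [band_area(r_in, r_out, angle) for r_in, r_out in bands]
outer_to_inner = areas[-1] / areas[0]

print(areas)
print(outer_to_inner)  # about 2.33: same angle, over twice the ink
```

A band further from the centre is drawn with more ink for the same angle, so placing the largest values at the periphery inflates them further.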
3D graphs and proportional ink: need two IVs; cannot use if you only have one! Perspective influences the amount of ink used.