Data Visualisation Flashcards
1
Q
Statistical visualization
A
- A fancy way of saying graphs?
- Sort of, but these visualisations are built on statistical calculations (e.g. the median and the interquartile range, IQR)
- So we are not just graphing raw data but visualising statistical summaries of the data
2
Q
Why might we use graphs rather than numeric summaries (tables, numbers in text)?
A
- Visually appealing
- Can convey a “story” without the reader needing to know statistics
- Quickly identify trends or patterns in data
- Descriptive statistics can hide patterns in data (e.g. two datasets with the same mean, median and SD can have very different shapes)
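To illustrate how descriptive statistics can hide patterns, here is a minimal Python sketch (Python is used only to show the arithmetic; both datasets are made up for illustration). The two lists share the same mean, median and standard deviation, yet one is two tight clusters and the other is spread around the centre, a difference a histogram would reveal immediately.

```python
from statistics import mean, median, stdev

# Two made-up datasets (hypothetical, for illustration only)
clustered = [2, 2, 2, 8, 8, 8]  # two tight clusters, nothing near the centre
spread = [0, 4, 4, 6, 6, 10]    # values scattered around the centre

for name, values in [("clustered", clustered), ("spread", spread)]:
    # Identical summary numbers for both lists, despite very different shapes
    print(name, mean(values), median(values), round(stdev(values), 3))
```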
3
Q
Histogram
A
- Used for continuous data
- Bar charts are used for categorical data
- Divide up all the possible values into “bins” and then count the number of observations in each bin
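The binning step can be sketched in Python as below. This is a minimal illustration assuming equal-width bins that run from the observed minimum to the observed maximum; the function name `histogram_counts` and the reaction times are hypothetical, and R plotting functions may choose bin edges differently.

```python
def histogram_counts(values, n_bins):
    """Split the observed range into n_bins equal-width bins and count observations per bin."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values to form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Bin index for v; the maximum itself is placed in the last bin
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    # Bin edges, from the minimum up to the maximum
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


# Hypothetical reaction times in seconds (made up for illustration)
times = [0.41, 0.45, 0.47, 0.52, 0.55, 0.58, 0.61, 0.66, 0.72, 0.95]
edges, counts = histogram_counts(times, n_bins=4)
print(counts)
```

The counts [4, 4, 1, 1] show most times bunched at the fast end, with a thin tail of slower responses.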
4
Q
Tukey Boxplots
A
- Suited to continuous data
- Shows 5 descriptive statistics in one plot
- Minimum bound, Q1, median, Q3, maximum bound
- Plus outliers
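- Let you say a lot about the distribution of your data in a single plot
- The plot has four sections (lower whisker, lower half of the box, upper half of the box, upper whisker), each containing roughly 25% of the data (outliers aside)
- The length of each section tells us how much variation there is in that quarter of the data
- A short lower “whisker” means the lowest values are quite similar to one another
- A long upper whisker means the highest values are more variable (the data are right-skewed)
- In histogram form, the same data would show a longer tail on the right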
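A minimal Python sketch of the four section lengths, using made-up times with a long upper tail. It assumes quartiles are computed by linear interpolation (`method="inclusive"`, which matches R's default `quantile()`, type 7); other software may use a slightly different quartile rule. This sample has no outliers, so the whiskers run to the true minimum and maximum.

```python
from statistics import quantiles

# Hypothetical response times in seconds (made up), sorted for readability
times = [10, 11, 12, 12, 13, 14, 16, 19, 23, 27]

# Q1, median, Q3 by linear interpolation (same rule as R's default quantile())
q1, med, q3 = quantiles(times, n=4, method="inclusive")

# No outliers in this sample, so the whiskers reach the true min and max
sections = {
    "lower whisker": q1 - min(times),
    "lower half of box": med - q1,
    "upper half of box": q3 - med,
    "upper whisker": max(times) - q3,
}
for name, length in sections.items():
    print(f"{name}: {length}")
```

Here the upper whisker (8.75) is much longer than the lower whisker (2.0), so the highest quarter of the data is far more spread out than the lowest quarter.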
5
Q
Minimum and Maximum Bounds
A
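- Not necessarily the true minimum and maximum
- The minimum bound is the smallest data point at or above the lower threshold; the maximum bound is the largest data point at or below the upper threshold
- These thresholds are Q1 − (1.5 × IQR) and Q3 + (1.5 × IQR)
- Observations below the lower threshold or above the upper threshold are referred to as outliers, and plotted as dots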
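A minimal Python sketch of how the bounds and outliers are found. It assumes quartiles computed by linear interpolation, as in R's default `quantile()` (type 7); the function name `tukey_bounds` and the example times are hypothetical.

```python
from statistics import quantiles


def tukey_bounds(values):
    """Return (minimum bound, maximum bound, outliers) for a Tukey boxplot."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_threshold = q1 - 1.5 * iqr
    upper_threshold = q3 + 1.5 * iqr
    inside = [v for v in values if lower_threshold <= v <= upper_threshold]
    outliers = [v for v in values if v < lower_threshold or v > upper_threshold]
    # Whiskers stop at the most extreme observations still inside the thresholds
    return min(inside), max(inside), outliers


# Hypothetical response times in seconds (made up), with one very slow response
times = [10, 11, 12, 12, 13, 14, 16, 19, 23, 45]
lower, upper, outliers = tukey_bounds(times)
print(lower, upper, outliers)
```

With these values Q1 = 12, Q3 = 18.25 and IQR = 6.25, so the upper threshold is 27.625: 45 is flagged as an outlier, and the maximum bound is 23 rather than the true maximum.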
6
Q
(RStudio) histogram
A
• Histogram(data, x = variable, bins = 16, by = group, position = 'dodge')
7
Q
(RStudio) boxplot
A
• Tukeyboxplot(data = data, y = variable, x = group) + labs(x = 'group', y = 'variable, in seconds')
8
Q
(RStudio) scatterplot
A
• scatterplot(data = data, x = variable1, y = variable2)