Topic 3 - Numerical Summaries Flashcards
LO
LO3 Produce, interpret and compare graphical and numerical summaries, using base R and ggplot.
Advantages of Numerical summaries
- Numerical summaries reduce all the data to one simple number/statistic
- Loses a lot of information, but is easy to communicate
Major features used to create a numeric summary
- Max
- Min
- Spread (stdev, range, IQR)
- Centre (mean, median)
Mean
The average of the data
= sum of data / size of data
Median
- The middle value when the data is ordered from smallest to largest (for an even number of values, the average of the two middle values)
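A minimal Python sketch of both measures of centre, written out by hand so each step is visible. The function names and the sample data are hypothetical, chosen only for illustration.

```python
def mean(data):
    # Sum of the data divided by the size of the data.
    return sum(data) / len(data)


def median(data):
    # Order the data from smallest to largest, then take the middle value.
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [7, 1, 5, 3, 9]   # hypothetical data, odd length
even_data = [4, 1, 3, 2]     # hypothetical data, even length

print(mean(odd_data), median(odd_data))
print(mean(even_data), median(even_data))
```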
Robustness
- The median is said to be robust and is a good summary for skewed data, as it is barely affected by outliers (the mean is pulled towards them)
Comparing mean and median
Symmetrical data:
- Expect mean and median to be about the same
Left skewed data:
- Expect mean to be smaller than the median
Right skewed data:
- Expect mean to be larger than the median
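A small Python check of these expectations, using the standard library `statistics` module. The three data sets are made up for illustration and are an assumption, not data from the course.

```python
import statistics

# Hypothetical data sets, one per shape.
symmetric = [1, 2, 3, 4, 5]
left_skewed = [1, 7, 8, 8, 9, 9]    # long tail of small values
right_skewed = [1, 1, 2, 2, 3, 9]   # long tail of large values

for name, data in [("symmetric", symmetric),
                   ("left skewed", left_skewed),
                   ("right skewed", right_skewed)]:
    m = statistics.mean(data)
    med = statistics.median(data)
    print(name, "mean =", m, "median =", med)
```

In each skewed case the mean is dragged towards the long tail, while the median stays near the bulk of the data.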
Limitations of mean and median
Both need to be paired with the spread of the data
Standard deviation
- First define the Root Mean Square (RMS)
- Measures the average size of a set of numbers, regardless of their signs
- Apply the name in reverse order:
1) Square the numbers
2) Take the mean of the squares
3) Take the square root of the result
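A minimal Python sketch of the RMS, computed as square, then mean, then root. The function name `rms` and the example numbers are hypothetical.

```python
import math


def rms(values):
    squares = [v ** 2 for v in values]         # 1) square each number
    mean_square = sum(squares) / len(squares)  # 2) mean of the squares
    return math.sqrt(mean_square)              # 3) square root of that mean


# Hypothetical data: the plain mean is 0 because the signs cancel,
# but the RMS still reports the typical size of the numbers.
values = [-1, 1, -7, 7]
print(sum(values) / len(values), rms(values))
```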
Stdev in terms of RMS
Stdev measures the spread of data
SDpop = RMS of gaps from the mean
Population vs sample SD
SDpop = SDsample x sqrt((n-1)/n)
- The sample SD divides by n-1 instead of n, so it is slightly larger than the population SD
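A short Python sketch that computes the population SD as the RMS of the gaps from the mean, then checks the conversion from the sample SD. The data set is hypothetical. Note that `statistics.stdev` divides by n-1 (as R's `sd()` does), while `statistics.pstdev` divides by n.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
n = len(data)
mean = sum(data) / n

# Population SD: RMS of the gaps (deviations) from the mean.
gaps = [x - mean for x in data]
sd_pop = math.sqrt(sum(g ** 2 for g in gaps) / n)

# Sample SD (divides by n-1), then converted to the population SD.
sd_sample = statistics.stdev(data)
sd_pop_from_sample = sd_sample * math.sqrt((n - 1) / n)

print(sd_pop, sd_sample, sd_pop_from_sample)
```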
SD rule of thumb
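For roughly bell-shaped (normal) data:
Within 1 SD of the mean = about 68% of the data
Within 2 SD of the mean = about 95% of the data
Within 3 SD of the mean = about 99.7% of the data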
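These percentages come from the normal curve. As a rough check, the Python sketch below uses `statistics.NormalDist` to compute the share of a standard normal distribution within k SDs of the mean. The helper name `share_within` is hypothetical, and real data will only follow the rule approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1


def share_within(k):
    # Proportion of a normal distribution within k SDs of the mean.
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, "SD:", round(share_within(k) * 100, 1), "%")
```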
Standard units
- Also called the "z" score
- How many SDs a data point is above or below the mean
Standard units = (Data Point - Mean) / SD
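A one-function Python sketch of the formula. The function name and the numbers (mean 70, SD 10) are hypothetical.

```python
def standard_units(data_point, mean, sd):
    # Number of SDs the data point sits above (+) or below (-) the mean.
    return (data_point - mean) / sd


# Hypothetical example: mean 70 and SD 10.
print(standard_units(85, 70, 10))  # 1.5 SDs above the mean
print(standard_units(55, 70, 10))  # 1.5 SDs below the mean
```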
IQR
- Another measure of spread
- Range of the middle 50% of the data
IQR = Q3 - Q1
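A minimal Python sketch of the IQR using `statistics.quantiles`. The data is hypothetical, and the choice of `method="inclusive"` is an assumption: tools use different quartile conventions, so results may differ slightly from other software.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # range of the middle 50% of the data

print(q1, q3, iqr)
```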
Coefficient of Variation
- Combines the mean and SD into one summary
CV = SD / Mean
- The higher the CV, the greater the spread relative to the mean
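A short Python sketch of the CV on two hypothetical data sets that share the same SD but have different means. Using the population SD (`statistics.pstdev`) here is an assumption, since the notes do not say which SD to use.

```python
import statistics


def coefficient_of_variation(data):
    # SD divided by the mean (population SD assumed here).
    return statistics.pstdev(data) / statistics.mean(data)


# Hypothetical data: same spread (SD of 2), very different means.
small_mean = [2, 4, 4, 4, 5, 5, 7, 9]
large_mean = [102, 104, 104, 104, 105, 105, 107, 109]

print(coefficient_of_variation(small_mean))
print(coefficient_of_variation(large_mean))
```

The first data set has the larger CV, because the same SD is a much bigger share of a mean of 5 than of a mean of 105.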