lecture 3 - visualising summary data Flashcards
what are the 3 main measures of central tendency/ average?
arithmetic mean, median and mode.
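As a minimal sketch of the three measures, assuming a small made-up set of scores (the data are illustrative only), Python's standard `statistics` module can compute each one:

```python
import statistics

# Made-up scores, for illustration only
scores = [2, 3, 3, 4, 5, 7, 11]

mean = statistics.mean(scores)      # sum of scores divided by number of scores
median = statistics.median(scores)  # middle score once the data are sorted
mode = statistics.mode(scores)      # most frequently occurring score

print(mean, median, mode)
```

The three values differ here because the single high score pulls the mean upwards while leaving the median and mode alone.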
how to choose which measure of central tendency?
whichever is most useful or valid - that depends on what you want the summary to convey
central tendency and measurement
- Remember – need interval or ratio data for mean.
- Because of link to inferential stats and normal distribution (more later in the course) mean is most common measure of central tendency for interval & ratio scales.
- But median & mode also fine depending on what information is being conveyed.
- With ordinal scales can’t use mean therefore median most common (but can use mode).
- Nominal scales can’t use median or mean therefore mode most common measure of central tendency.
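The link between scale type and usable average can be sketched as follows. The eye-colour categories and the 1-5 rating codes are hypothetical examples, not data from the lecture:

```python
import statistics

# Nominal: unordered categories, so only the mode is meaningful
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue"]
nominal_mode = statistics.mode(eye_colour)

# Ordinal: ordered ranks coded 1-5 (hypothetical agreement scale);
# the median and mode are meaningful, the mean is not
ratings = [1, 2, 2, 3, 4, 4, 4, 5, 5]
ordinal_median = statistics.median(ratings)
ordinal_mode = statistics.mode(ratings)

print(nominal_mode, ordinal_median, ordinal_mode)
```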
measures of variability
- Standard deviation - ‘average’ deviation around the mean - derived from the variance (it is the square root of the variance)
- NOTE: use the “sample” SD/variance (N-1 formula) when estimating a population value from a sample, which is almost all the time, because it gives an unbiased estimate; use the “population” SD/variance (N formula) only when you have the whole population (very rare).
- Inter-quartile range (IQR) - variability around the median: the difference between the 1st quartile (25% of the data below this point) and the 3rd quartile (75% of the data below this point). Thus, the IQR says how spread out the middle 50% of the data are.
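A rough sketch of these measures using Python's standard `statistics` module, with made-up data. Note that quartiles can be computed by several conventions; `statistics.quantiles` uses its default ("exclusive") method here, so other software may give slightly different quartile values:

```python
import statistics

# Made-up scores, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(data)       # divides by N-1 (estimate from a sample)
population_sd = statistics.pstdev(data)  # divides by N (whole population)

# Cut points that split the sorted data into four equal parts
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

print(sample_sd, population_sd, iqr)
```

The sample SD is always a little larger than the population SD for the same data, because it divides by the smaller number N-1.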
variability and measurement
- Remember – need interval or ratio data for SD.
- Because of link to inferential stats and normal distribution (more later in the course) SD is most common measure of variability for interval & ratio scales.
- But IQR also fine depending on what information is being conveyed.
- With ordinal scales can’t use SD therefore IQR most common.
- Nominal scales can’t use SD or IQR therefore no real measure of variability for nominal scales (other than possibly listing the number of different categories).
IQR can be used with ordinal data
properties of the 3 averages with respect to the shape of the distribution
normal - distribution is symmetric and unimodal, so mean, median and mode coincide - easy to summarise
positively skewed - the mean is pulled towards the long right tail, so it doesn’t properly summarise the data as most people are not clustered around it
bimodal - distribution may be symmetric but has more than one mode; scores are clustered in more than one place, so no single value can describe where they cluster.
shape of distribution tells you something critical about how you summarise it
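To illustrate the effect of skew, here is a small sketch with two made-up data sets that differ only in their highest score. The numbers are invented for illustration:

```python
import statistics

# Made-up data: identical except for the highest score
symmetric = [3, 4, 4, 5, 5, 5, 6, 6, 7]
skewed = [3, 4, 4, 5, 5, 5, 6, 6, 40]  # one extreme score gives a long right tail

sym_mean, sym_median = statistics.mean(symmetric), statistics.median(symmetric)
skew_mean, skew_median = statistics.mean(skewed), statistics.median(skewed)

# How many scores fall below the mean in the skewed data?
below_mean = sum(1 for x in skewed if x < skew_mean)

print(sym_mean, sym_median)
print(round(skew_mean, 2), skew_median, below_mean)
```

In the skewed set the median stays where most scores are, while the mean moves above all but one of them.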
data presentation - why graph data
- Exploring data
e.g. Histograms showing shape of distribution
- Summarising data
e.g. Plotting means and standard deviations - quick for unimodal data and good if there are lots of conditions, as it compresses the data to a manageable form that is easy to understand and present
- Presenting data to audience
e.g. To aid digestibility by focusing on key points
Or more cynically to mislead by distracting attention from “difficult” parts of data
how to draw a data-summary graph for interval data
- Want to produce a plot that allows comparison of the two groups
- So the frequency histogram is not best here.
- Want to plot some measure of central tendency as well as some measure of variability.
- So mean and standard deviation would be a good choice!
axis on data summary graph
X = independent variable - what you measured/ manipulated
Y = dependent variable - the data
data summary graph - error bars for mean
length is usually ±1 standard deviation
error bars can extend above and below the mean
mean is the height of the bars on the graph and then error bar goes on top
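The numbers behind such a graph can be sketched as below. The group names and scores are hypothetical, and the helper `summarise` is introduced here purely for illustration; the resulting values could be passed to any plotting tool:

```python
import statistics

# Hypothetical DV scores for two levels of an IV
groups = {
    "control": [10, 12, 14, 16, 18],
    "treatment": [15, 17, 20, 23, 25],
}

def summarise(scores):
    """Return the bar height (mean) and the ends of a +/- 1 SD error bar."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample (N-1) SD
    return {"mean": mean, "lower": mean - sd, "upper": mean + sd}

summary = {name: summarise(scores) for name, scores in groups.items()}

for name, s in summary.items():
    print(f"{name}: bar height {s['mean']:.2f}, "
          f"error bar from {s['lower']:.2f} to {s['upper']:.2f}")
```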
‘data-summary’ graphs for medians - ordinal data
appropriate summary stats are median and IQR
data summary graph - error bars for median
length is usually ±1 IQR
height of bar is median
summary of how to draw a graph of means with error bars
error bar - length relates to some multiple of e.g. SD
and bar or dot represents the mean
y axis = DV and units if applicable
x axis = IV and units if applicable
always define critical features, e.g. error bars and central tendency measure, in caption
graph usually called a ‘figure’ in text
caption usually underneath
graphs can distort
different scales and binning on graphs can make the data look distributed differently
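A minimal sketch of the binning point, assuming made-up scores with two clusters. The helper `histogram` is hypothetical and simply counts values per bin, labelling each bin by its lower edge:

```python
from collections import Counter

# Made-up scores with two clusters (low and high)
scores = [1, 2, 2, 3, 3, 3, 4, 8, 8, 9, 9, 9, 10, 10]

def histogram(data, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    return Counter((x // bin_width) * bin_width for x in data)

narrow = histogram(scores, 2)  # bins 0-1, 2-3, 4-5, ...
wide = histogram(scores, 6)    # bins 0-5, 6-11

print(sorted(narrow.items()))
print(sorted(wide.items()))
```

With narrow bins the two clusters and the empty gap between them are visible; with wide bins the same data look like two equal bars with no gap.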
Tufte (2001) points out that graphs should do the following among other things
✓ Show the data.
✓ Induce the reader to think about the data being presented (rather than some other aspect of the graph, like how pink it is).
✓ Avoid distorting the data.
✓ Present many numbers with minimum ink.
✓ Make large data sets (assuming you have one) coherent.
✓ Encourage the reader to compare different pieces of data.
✓ Reveal the underlying message of the data.
but graphs don’t often do these things