lecture 3 - visualising summary data Flashcards

1
Q

what are the 3 main measures of central tendency/ average?

A

arithmetic mean, median and mode.
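A minimal sketch of the three averages, using Python's standard `statistics` module. The scores are invented for illustration and are not from the lecture.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only.
scores = [2, 3, 3, 4, 5, 5, 5, 13]

average = mean(scores)     # sum of the scores divided by how many there are
middle = median(scores)    # middle value once the scores are sorted
modes = multimode(scores)  # most frequent value(s), returned as a list

print(average, middle, modes)
```

Note that the single large score (13) pulls the mean above the median.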

2
Q

how to choose which measure of central tendency?

A

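choose whichever is most useful or valid - this depends on what you want to convey (and, as later cards show, on the level of measurement and the shape of the distribution)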

3
Q

central tendency and measurement

A
  • Remember: the mean needs interval or ratio data.
  • Because of its link to inferential stats and the normal distribution (more later in the course), the mean is the most common measure of central tendency for interval & ratio scales.
  • But the median & mode are also fine, depending on what information is being conveyed.
  • With ordinal scales you can’t use the mean, so the median is most common (but you can use the mode).
  • With nominal scales you can’t use the median or mean, so the mode is the most common measure of central tendency.
4
Q

measures of variability

A
  • Standard deviation (SD): the ‘average’ deviation around the mean; it is the square root of the variance.
  • NOTE: use the “sample” SD/variance (N-1 formula) when estimating the population value from a sample, which is almost all the time; use the “population” SD/variance (N formula) only when you have the whole population (very rare).
  • Inter-quartile range (IQR): variability around the median; the difference between the 1st quartile (25% of the data below this point) and the 3rd quartile (75% of the data below this point). Thus, the IQR says how spread out the middle 50% of the data are.
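A rough sketch of the N-1 versus N distinction and of the IQR, using Python's standard `statistics` module. The scores are hypothetical, and the quartile values depend on the interpolation convention: `quantiles` defaults to its "exclusive" method, and other software may give slightly different quartiles.

```python
from statistics import stdev, pstdev, quantiles

# Hypothetical scores, invented for illustration only.
scores = [4, 8, 6, 5, 3, 7, 9, 6]

sample_sd = stdev(scores)       # "sample" formula: divides by N - 1
population_sd = pstdev(scores)  # "population" formula: divides by N

# Cut points that split the sorted data into four equal parts.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1                   # spread of the middle 50% of the data

print(sample_sd, population_sd, iqr)
```

The sample SD is always a little larger than the population SD for the same scores, because it divides by a smaller number.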
5
Q

variability and measurement

A
  • Remember: the SD needs interval or ratio data.
  • Because of its link to inferential stats and the normal distribution (more later in the course), the SD is the most common measure of variability for interval & ratio scales.
  • But the IQR is also fine, depending on what information is being conveyed.
  • With ordinal scales you can’t use the SD, so the IQR is most common (the IQR can be used with ordinal data).
  • With nominal scales you can’t use the SD or IQR, so there is no real measure of variability for nominal scales (other than possibly listing the number of different categories).
6
Q

properties of the 3 averages with respect to the shape of the distribution

A

normal - distribution is symmetric and unimodal - easy to summarise, as mean, median and mode coincide

positively skewed - the mean doesn’t properly summarise the data, because the long right tail pulls it away from where most people are clustered

bimodal - more than one mode (the distribution may still be symmetric); scores are clustered in more than one place, so no single average describes where they cluster

the shape of the distribution tells you something critical about how to summarise it
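To illustrate the skew point, here is a small sketch with two invented data sets that differ only in their largest score. The numbers are hypothetical.

```python
from statistics import mean, median

# Hypothetical scores: identical except for the largest value.
symmetric = [4, 5, 5, 6, 6, 6, 7, 7, 8]
skewed = [4, 5, 5, 6, 6, 6, 7, 7, 40]  # one extreme score creates a long right tail

sym_mean, sym_median = mean(symmetric), median(symmetric)
skew_mean, skew_median = mean(skewed), median(skewed)

print(sym_mean, sym_median)    # mean and median agree for the symmetric data
print(skew_mean, skew_median)  # the mean is dragged upward, the median is not
```

In the skewed set the median stays where most scores cluster, while the mean moves toward the tail.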

7
Q

data presentation - why graph data

A
  • Exploring data
    e.g. Histograms showing shape of distribution
  • Summarising data
    e.g. Plotting means and standard deviations - quick for unimodal data and good if lots of conditions as compresses data to manageable form and easy to understand and present
  • Presenting data to audience
    e.g. To aid digestibility by focusing on key points
    Or more cynically to mislead by distracting attention from “difficult” parts of data
8
Q

how to draw a data-summary graph for interval data

A
  • Want to produce a plot that allows comparison of the two groups
    • So the frequency histogram is not best here.
  • Want to plot some measure of central tendency as well as some measure of variability.
    • So mean and standard deviation would be a good choice!
9
Q

axis on data summary graph

A

X = independent variable (IV) - what you manipulated (or the grouping variable)

Y = dependent variable (DV) - what you measured (the data)

10
Q

data summary graph - error bars for mean

A

length is usually ± 1 × standard deviation

error bars can extend above and/or below the mean

the mean is the height of the bar on the graph, and the error bar is drawn from the top of the bar
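A sketch of the numbers behind such a graph: for each group, the bar height (mean) and the two ends of a ± 1 SD error bar. The group names, the scores and the helper `mean_with_sd_bars` are hypothetical, and the sample (N-1) SD is assumed.

```python
from statistics import mean, stdev

# Hypothetical DV scores for two levels of an IV.
groups = {
    "control": [10, 12, 14, 16, 18],
    "treatment": [15, 17, 19, 21, 23],
}

def mean_with_sd_bars(scores):
    """Return (bar height, lower end, upper end) for a mean +/- 1 SD error bar."""
    centre = mean(scores)
    sd = stdev(scores)  # sample SD (N - 1 formula), an assumption of this sketch
    return centre, centre - sd, centre + sd

summary = {name: mean_with_sd_bars(scores) for name, scores in groups.items()}

for name, (centre, lower, upper) in summary.items():
    print(f"{name}: mean = {centre:.2f}, error bar from {lower:.2f} to {upper:.2f}")
```

These three values per group are what a plotting package would need to draw the bars and their error bars.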

11
Q

‘data-summary’ graphs for medians - ordinal data

A

appropriate summary stats are median and IQR

12
Q

data summary graph - error bars for median

A

error bars usually show the IQR (running from the 1st quartile to the 3rd quartile)

height of bar is median
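A sketch of the summary values for an ordinal variable, assuming the error bar is drawn from the 1st to the 3rd quartile (one common convention; check which one your course expects). The ratings are invented, and the "inclusive" quartile method is just one of several conventions.

```python
from statistics import median, quantiles

# Hypothetical ratings on a 1-5 ordinal scale.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5]

bar_height = median(ratings)  # the bar (or dot) shows the median
q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
iqr = q3 - q1                 # spread of the middle 50% of ratings

print(f"median = {bar_height}, error bar from {q1} to {q3} (IQR = {iqr})")
```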

13
Q

summary of how to draw a graph of means with error bars

A

error bar - length relates to some multiple of e.g. the SD
bar or dot represents the mean
y axis = DV (with units if applicable)
x axis = IV (with units if applicable)

always define critical features (e.g. error bars and central tendency measure) in the caption

a graph is usually called a ‘figure’ in the text
the caption usually goes underneath

14
Q

graphs can distort

A

different axis scales and bin widths can make the same data look as if they are distributed differently
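A small sketch of the binning effect: the same invented scores counted with two different bin widths. The helper `histogram` is hypothetical and only counts frequencies; it draws nothing.

```python
from collections import Counter

# Hypothetical scores with two clusters (around 3 and around 8).
scores = [1, 2, 2, 3, 3, 3, 4, 6, 7, 7, 8, 8, 8, 9]

def histogram(data, bin_width):
    """Count how many scores fall in each bin [start, start + bin_width)."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    return dict(sorted(counts.items()))

narrow = histogram(scores, 1)  # one bin per score value
wide = histogram(scores, 5)    # only two bins: 0-4 and 5-9

print(narrow)
print(wide)
```

With narrow bins the two peaks and the gap between them are visible; with wide bins the same data look flat.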

15
Q

Tufte (2001) points out that graphs should do the following among other things

A
  • Show the data.
  • Induce the reader to think about the data being presented (rather than some other aspect of the graph, like how pink it is).
  • Avoid distorting the data.
  • Present many numbers with minimum ink.
  • Make large data sets (assuming you have one) coherent.
  • Encourage the reader to compare different pieces of data.
  • Reveal the underlying message of the data.

but graphs don’t often do these things

16
Q

different types of histograms

A

Simple histogram: Use this option to visualize frequencies of scores for a single variable.

Stacked histogram: If you have a grouping variable (e.g., whether people worked hard or wished upon a star) you can produce a histogram in which each bar is split by group. In this example, each bar would have two colours, one representing people who worked hard and the other those who wished upon a star. This option is a good way to compare the relative frequency of scores across groups (e.g., were those who worked hard more successful than those who wished upon a star?).

Frequency polygon: This option displays the same data as the simple histogram, except that it uses a line instead of bars to show the frequency, and the area below the line is shaded.

Population pyramid: Like a stacked histogram, this shows the relative frequency of scores in two populations. It plots the variable (e.g., success after 5 years) on the vertical axis and the frequencies for each population on the horizontal: the populations appear back to back on the graph. This option is useful for comparing distributions across groups.

17
Q

what is a box plot

A

One of the best ways to display your data. At the centre of the plot is the median. This statistic is surrounded by a box, the top and bottom of which are the limits within which the middle 50% of observations fall (the interquartile range, IQR). Sticking out of the top and bottom of the box are two whiskers, which show the top and bottom 25% of scores (approximately).
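A sketch of the values a boxplot is built from, assuming the whiskers run to the lowest and highest scores. Many packages instead stop the whiskers at 1.5 × IQR from the box and plot more extreme scores separately. The scores and the helper `box_summary` are hypothetical.

```python
from statistics import quantiles

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 12]

def box_summary(data):
    """Return the five values a simple boxplot displays."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "lower whisker": min(data),  # bottom 25% of scores lie between here and the box
        "box bottom": q1,
        "median": med,
        "box top": q3,
        "upper whisker": max(data),  # top 25% of scores lie between the box and here
    }

summary = box_summary(scores)
print(summary)
```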

18
Q

different types of box plot

A

1-D Boxplot: This option produces a single boxplot of all scores for the chosen outcome (e.g., level of success after 5 years).

Simple boxplot: This option produces multiple boxplots for the chosen outcome by splitting the data by a categorical variable. For example, we had two groups: wishers and workers. It would be useful to use this option to display different boxplots (on the same graph) for these groups (unlike the 1-D boxplot, which lumps the data from these groups together).

Clustered boxplot: This option is the same as the simple boxplot, except that it splits the data by a second categorical variable. Boxplots for this second variable are produced in different colours. For example, imagine we had also measured whether our participants believed in the power of wishing. We could produce boxplots not just for the wishers and workers, but within these groups we could also have different-coloured boxplots for those who believe in the power of wishing and those who do not.

19
Q

types of bar chart

A

Simple bar: Use this option to display the means of scores across different groups or categories of cases. Can be used for related means. For example, you might want to plot the mean ratings of two films.

Clustered bar: If you have a second grouping variable you can produce a simple bar chart (as above) but with different-coloured bars to represent levels of a second grouping variable. Use it for means from independent, related or ‘mixed’ designs. For example, you could have ratings of the two films, but for each film have a bar representing ratings of ‘excitement’ and another bar showing ratings of ‘enjoyment’.

Stacked bar: This is like the clustered bar, except that the different-coloured bars are stacked on top of each other rather than placed side by side.

Simple 3-D bar: This is also like the clustered bar, except that the second grouping variable is displayed not by different-coloured bars, but by an additional axis. Given what I said in Section 5.2 about 3-D effects obscuring the data, my advice is to stick to a clustered bar chart and not use this option.

Clustered 3-D bar: This is like the clustered bar chart above, except that you can add a third categorical variable on an extra axis. The means will almost certainly be impossible for anyone to read on this type of graph, so don’t use it.

Stacked 3-D bar: This graph is the same as the clustered 3-D graph, except the different-coloured bars are stacked on top of each other instead of standing side by side. Again, this is not a good type of graph for presenting data clearly.

Simple error bar: This is the same as the simple bar chart, except that, instead of bars, the mean is represented by a dot, and a line represents the precision of the estimate of the mean (usually, the 95% confidence interval is plotted, but you can plot the standard deviation or standard error of the mean instead). You can add these error bars to a bar chart anyway, so really the choice between this type of graph and a bar chart with error bars is largely down to personal preference. (Including the bar adds a lot of superfluous ink, so if you want to be Tuftian about it you’d probably use this option over a bar chart.)

Clustered error bar: This is the same as the clustered bar chart, except that the mean is displayed as a dot with an error bar around it. These error bars can also be added to a clustered bar chart.

20
Q

line charts

A

Like bar charts, but with lines instead of bars.

Simple line: Use this option to display the means of scores across different groups of cases.

Multiple line: This option is equivalent to the clustered bar chart: it will plot means of an outcome variable for different categories/groups of a predictor variable and also produce different-coloured lines for each category/group of a second predictor variable.

21
Q

what is a scatterplot

A

A scatterplot is a graph that plots each person’s score on one variable against their score on another. It visualizes the relationship between the variables, but also helps us to identify unusual cases that might bias that relationship.
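A sketch of how one unusual case can bias a relationship. The card does not name a statistic; Pearson's correlation is assumed here as one way to put a number on the relationship, and both the data and the helper `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for five people on two variables: a perfect straight line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson_r(x, y)

# Add one unusual case that sits far from the line.
r_with_outlier = pearson_r(x + [6], y + [0])

print(round(r_clean, 3), round(r_with_outlier, 3))
```

On a scatterplot the extra point would stand out immediately, which is why plotting the data first is useful.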

22
Q

types of scatterplot

A

Simple scatter: Use this option to plot values of one continuous variable against another. It looks at just two variables.

Grouped scatter: This is like a simple scatterplot, except that you can display points belonging to different groups in different colours (or symbols). It displays scores on two continuous variables but colours the data points by a third categorical variable.

Simple 3-D scatter: Use this option to plot values of one continuous variable against values of two others. It displays the relationship between 3 variables.

Grouped 3-D scatter: Use this option to plot values of one continuous variable against two others, but differentiating groups of cases with different-coloured dots. It displays the relationship between 3 variables.

Summary point plot: This graph is the same as a bar chart (see Section 5.6), except that a dot is used instead of a bar.

Simple dot plot: Otherwise known as a density plot, this graph is like a histogram (see Section 5.4), except that, rather than having a summary bar representing the frequency of scores, individual scores are displayed as dots. Like histograms, they are useful for looking at the shape of the distribution of scores. As in a histogram, the data are still placed into bins, but a dot is used to represent each data point.

Scatterplot matrix: This option produces a grid of scatterplots showing the relationships between multiple pairs of variables in each cell of the grid. It allows you to see the relationship between all combinations of many different pairs of variables.

Drop-line: This option produces a plot similar to a clustered bar chart (see, for example, Section 5.6.2) but with a dot representing a summary statistic (e.g., the mean) instead of a bar, and with a line connecting the ‘summary’ (e.g., mean) of each group. These graphs are useful for comparing statistics, such as the mean, across groups or categories.

23
Q

catterplot

A

The catterplot is a variation on the scatterplot that was designed by Herman Garfield to overcome the difficulty that sometimes emerges when plotting very unpredictable data. He named it the catterplot because of all the things he could think of that were unpredictable, cat behaviour topped his list.