Unwin Chapter 2 Flashcards
Good Graphics?
data graphics
used extensively in scientific publications, in newspapers, and in the media generally. many of them do not fully convey the information in the data they are supposed to present, and some even obscure it.
guides
may be drawn on a plot as a form of annotation and are useful for emphasising particular issues, e.g. which values are positive or negative.
sloping guides
highlight deviations from linearity.
fitted lines
may be superimposed on data to show the hypothesised overall structure and to highlight local variability and any lack of fit, e.g. polynomial regression fits or smoothers.
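A minimal sketch of the idea, using a straight least-squares line in place of the polynomial fits or smoothers the card mentions. The data values and the helper name `fit_line` are invented for illustration. The residuals are what the reader sees as local variability and lack of fit once the line is drawn over the points.

```python
# Hypothetical data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of x and cross-products of x and y about their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(xs, ys)

# Residuals: vertical distance of each point from the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

Large or systematically patterned residuals would suggest that a straight line is the wrong structure and that a more flexible fit is worth trying.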
overlaying information
e.g. guides or annotation, can lead to overlapping and cluttered displays. they may require individual adjustments depending on the shape of the data.
captions
ideally, should fully explain the graphic they accompany, including giving the source of the data. relying on explanation in surrounding text rarely works. however, long captions can put off the reader.
legends
describe which symbols and/or colours refer to which data groups. it is recommended that this information be directly on the plot so the reader’s eyes do not have to jump back and forth.
annotations
used to highlight particular features of a graphic. there should not be too many of them. they are useful for identifying events in a time series or drawing attention to particular points in scatterplots.
frames
may be drawn to surround graphics. they take up space and add to clutter, so they should only be used to separate a graphic from other graphics or text.
aspect ratios
have a surprisingly strong effect on the perception of graphics. this is especially true of a time series. if you want to show gradual change, you can grow the horizontal axis and shrink the vertical axis.
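The effect can be worked through numerically. The sketch below computes the angle at which the same data segment is drawn in panels of different shapes; the function name `apparent_angle` and all numbers are assumptions made for illustration.

```python
import math


def apparent_angle(dx, dy, x_range, y_range, width, height):
    """Angle in degrees at which a data segment is drawn on the page.

    dx, dy           change along the segment, in data units
    x_range, y_range span of each axis, in data units
    width, height    size of the plotting area, in any common unit
    """
    run = dx / x_range * width      # horizontal length on the page
    rise = dy / y_range * height    # vertical length on the page
    return math.degrees(math.atan2(rise, run))


# The same segment and the same axis ranges, drawn in two panel shapes.
square_panel = apparent_angle(1, 1, 10, 10, width=4, height=4)
wide_panel = apparent_angle(1, 1, 10, 10, width=8, height=2)
```

In the square panel the segment is drawn at 45 degrees; in the wide, short panel it is drawn at about 14 degrees, so the same change looks far more gradual.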
colour
one of the most effective ways of displaying data, but also one of the most difficult to get right. complicating factors include: some people are colour blind, colours carry particular associations, colours may not print as intended, and colour choices are a matter of personal taste.
scatterplot matrices (sploms)
plot each continuous variable against every other. they are effective for a small number of variables, giving an overview of possible bivariate results. it is important to cut down on scales and place variable names on the diagonal. histograms of the individual variables can also be included.
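The layout can be sketched as a grid with variable names on the diagonal and one scatterplot in every other cell. This only describes which panel goes where and draws nothing; the function name `splom_layout` and the variable names are hypothetical.

```python
def splom_layout(variables):
    """Describe each cell of a scatterplot matrix for the given variables.

    Diagonal cells carry the variable name; every other cell is a
    scatterplot with the column variable on x and the row variable on y.
    """
    grid = []
    for row_var in variables:
        row = []
        for col_var in variables:
            if row_var == col_var:
                row.append(("label", row_var))
            else:
                row.append(("scatter", col_var, row_var))
        grid.append(row)
    return grid


# Hypothetical variable names, invented for illustration.
layout = splom_layout(["height", "weight", "age"])
n_scatterplots = sum(cell[0] == "scatter" for row in layout for cell in row)
```

With k variables the grid holds k(k - 1) scatterplots, which is why the display stops being readable once k grows beyond a small number.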
parallel coordinates
valuable for displaying large numbers of continuous variables simultaneously. the variables of one dataset are drawn on parallel axes in a single plot.
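A rough sketch of how the coordinates for such a plot might be prepared. It assumes min-max rescaling of every variable to the interval from 0 to 1, which is one common choice among several rather than the only option; the data and the helper name `rescale` are invented for illustration.

```python
def rescale(values):
    """Min-max rescale a list of numbers to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; place it mid-axis.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical dataset: three cases measured on three variables.
data = {
    "height": [150, 160, 170],
    "weight": [50, 80, 65],
    "age": [20, 40, 30],
}

names = list(data)
scaled = {name: rescale(column) for name, column in data.items()}
n_cases = len(data[names[0]])

# One polyline per case: (axis position, rescaled value) for each variable.
polylines = [
    [(axis, scaled[name][i]) for axis, name in enumerate(names)]
    for i in range(n_cases)
]
```

Each polyline would then be drawn across the parallel axes, so a case that is low on every variable appears as a flat line near the bottom.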
mosaic plots
display the counts in multivariate contingency tables. there are various types. the width of a column is proportional to the number of cases with that combination of factor levels.
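The geometry can be sketched for a two-way table. This assumes one particular layout, in which columns are split by the first factor and each column is then divided vertically by the second; the counts and the function name `mosaic_tiles` are invented for illustration.

```python
# Hypothetical two-way table of counts: first factor (A, B) by second factor.
counts = {
    "A": {"yes": 30, "no": 10},
    "B": {"yes": 20, "no": 40},
}


def mosaic_tiles(counts):
    """Return one rectangle per cell, laid out in the unit square."""
    total = sum(sum(row.values()) for row in counts.values())
    tiles = []
    left = 0.0
    for col_level, row in counts.items():
        col_total = sum(row.values())
        width = col_total / total  # column width follows the column count
        bottom = 0.0
        for row_level, n in row.items():
            # Height is the share of this cell within its column.
            height = n / col_total if col_total else 0.0
            tiles.append({
                "levels": (col_level, row_level),
                "x": left, "y": bottom,
                "width": width, "height": height,
                "count": n,
            })
            bottom += height
        left += width
    return tiles


tiles = mosaic_tiles(counts)
```

In this layout the width of each column follows its total count, and the area of each tile works out to the cell's share of all cases.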
small multiples
can work well, but careful captioning is necessary to ensure that it is clear which smaller plot is which, and common scaling is essential.
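Common scaling amounts to computing the axis limits once, over all groups, and reusing them in every panel. A minimal sketch, with invented data and a hypothetical helper `common_limits`:

```python
# Hypothetical (x, y) points for three panels, invented for illustration.
groups = {
    "north": [(1, 3.0), (2, 4.5), (3, 5.0)],
    "south": [(1, 10.0), (2, 12.5), (3, 11.0)],
    "west": [(2, 0.5), (3, 1.5), (4, 2.0)],
}


def common_limits(groups):
    """Return ((x_min, x_max), (y_min, y_max)) over all groups together."""
    xs = [x for points in groups.values() for x, _ in points]
    ys = [y for points in groups.values() for _, y in points]
    return (min(xs), max(xs)), (min(ys), max(ys))


x_limits, y_limits = common_limits(groups)

# Every panel gets the same limits, plus a title saying which group it shows.
panels = [
    {"title": name, "x_limits": x_limits, "y_limits": y_limits}
    for name in groups
]
```

If each panel chose its own limits instead, the "west" and "south" panels would look alike even though their values differ by an order of magnitude.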