Displaying Data Flashcards
how to draw a good graph (4)
- show the data
- make patterns in the data easy to see (avoid unnecessary clutter)
- represent magnitudes honestly (have a baseline)
- draw graphical elements clearly (appropriate font and text size)
frequency and frequency distribution (2)
- frequency: number of observations having a particular measurement in a sample
- frequency distribution: number of occurrences of each value in the data
relative frequency
- proportion of observations having a given measurement, calculated as the frequency divided by the total number of observations
relative frequency distribution
- proportion/fraction of occurrences of each value in a data set
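The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names and the sample values are made up for the example.

```python
from collections import Counter


def frequency_distribution(values):
    """Count how many times each value occurs in the sample."""
    return dict(Counter(values))


def relative_frequency_distribution(values):
    """Divide each frequency by the total number of observations."""
    n = len(values)
    return {value: count / n for value, count in Counter(values).items()}


# Hypothetical sample, invented purely for illustration.
sample = ["red", "blue", "red", "green", "red", "blue", "red", "green"]
freq = frequency_distribution(sample)
rel_freq = relative_frequency_distribution(sample)
```

The relative frequencies always sum to 1, which is a quick sanity check on any relative frequency distribution.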
frequency table (2)
- text display of the number of occurrences of each category in a data set
- categorical data for one variable
bar graph (2)
- uses the height of rectangular bars to display frequency distribution (or relative frequency distribution)
- categorical data for one variable
how to make a good bar graph (6)
- the bars must have equal widths to represent magnitude correctly
- baseline of y-axis is at 0
- bars should stand apart, with spaces between them
- nominal data: order categories based on frequency of occurrence
- ordinal data: present values in natural order
- state the total number of observations (n) in the figure legend
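The two ordering rules for bar graphs can be sketched as follows. The function names and sample data are hypothetical, and sorting nominal categories from most to least frequent is an assumption, since the card only says to order by frequency.

```python
from collections import Counter


def bar_order_nominal(values):
    """Order nominal categories by frequency (assumed: most frequent first)."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def bar_order_ordinal(values, natural_order):
    """Keep ordinal categories in their natural order, whatever their counts."""
    counts = Counter(values)
    return [(level, counts[level]) for level in natural_order]


# Hypothetical data, invented for illustration.
pets = ["cat", "dog", "dog", "bird", "dog", "cat"]
doses = ["low", "high", "medium", "low", "high", "high"]

nominal_bars = bar_order_nominal(pets)
ordinal_bars = bar_order_ordinal(doses, ["low", "medium", "high"])
n = len(pets)  # total number of observations, to report in the legend
```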
bar graph vs pie chart (2)
- bar graph is usually better than a pie chart
- in a pie chart, frequencies are more difficult to compare and supplementary labelling is required
histogram (3)
- uses area of rectangular bars to display the frequency distribution (or relative frequency distribution)
- data values split into consecutive bins/intervals of equal width
- used for single numerical variable
mode
- interval corresponding to the highest peak in the frequency distribution
bimodal
- frequency distribution having two distinct peaks
symmetric
- frequency distribution in which the frequencies on the left half of the histogram mirror those on the right half
skewed
- frequency distribution of a numerical variable that is not symmetric
uniform
- frequency distribution that is level across values (all frequencies are roughly the same)
outliers
- observation well outside of the range of values of other observations in a data set
how to draw a good histogram (6)
- each bar must rise from baseline of 0
- no spaces between each bar
- “left closed” intervals: value 70 falls into the interval 70-72 rather than 68-70
- number of intervals should best show patterns and exceptions in the data
- use readable numbers for breakpoints (0.5 rather than 0.486)
- include total number of individuals in legend
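The "left closed" rule and equal-width bins can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the starting breakpoint of 68, the bin width of 2, and the sample heights are all hypothetical.

```python
import math


def left_closed_bin(value, start, width):
    """Return the bounds (lower, upper) of the bin [lower, upper) holding value."""
    index = math.floor((value - start) / width)
    lower = start + index * width
    return (lower, lower + width)


def histogram_counts(values, start, width):
    """Count observations per left-closed, equal-width bin."""
    counts = {}
    for value in values:
        bounds = left_closed_bin(value, start, width)
        counts[bounds] = counts.get(bounds, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical measurements, invented for illustration.
heights = [68, 69, 70, 71, 71.5, 73]
counts = histogram_counts(heights, start=68, width=2)
n = sum(counts.values())  # total number of individuals, for the legend
```

With these assumed breakpoints, the value 70 is counted in the interval 70-72 rather than 68-70, matching the rule on the card.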
contingency table (2)
- used for multiple associated categorical variables
- gives the frequency of occurrence of all combinations of 2+ categorical variables
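A contingency table for two categorical variables can be built by counting each combination of values. The sketch below is illustrative only: the function name, the variable labels, and the observations are hypothetical.

```python
from collections import Counter


def contingency_table(pairs):
    """Count every combination of (explanatory, response) categories.

    Combinations that never occur are reported with a frequency of 0.
    """
    counts = Counter(pairs)
    rows = sorted({explanatory for explanatory, _ in pairs})
    cols = sorted({response for _, response in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Hypothetical observations, invented for illustration.
observations = [
    ("control", "alive"),
    ("control", "dead"),
    ("treated", "alive"),
    ("treated", "alive"),
    ("control", "alive"),
]
table = contingency_table(observations)
```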
grouped bar graph (3)
- uses height of rectangular bars to display frequency distributions of 2+ categorical variables
- different categories of response variable are indicated by different colours
- bars are grouped by category of the explanatory variable (e.g., treatment)
mosaic plot (3)
- uses the area of rectangles to display the relative frequency of occurrence of all combinations of 2 categorical variables
- bar area and height indicate the relative frequencies of the responses
- width of each vertical stack is proportional to the number of observations in that group
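The geometry of a mosaic plot can be sketched numerically: each stack's width is the group's share of all observations, and each rectangle's height is the relative frequency of a response within that group. The function name and the counts below are hypothetical, and the sketch assumes every group has at least one observation.

```python
def mosaic_layout(table):
    """Compute stack widths and rectangle heights from a table of counts.

    `table` maps each group to a dict of response counts.
    Assumes every group contains at least one observation.
    """
    total = sum(sum(responses.values()) for responses in table.values())
    layout = {}
    for group, responses in table.items():
        group_n = sum(responses.values())
        layout[group] = {
            "width": group_n / total,
            "heights": {r: count / group_n for r, count in responses.items()},
        }
    return layout


# Hypothetical counts, invented for illustration.
counts = {
    "control": {"alive": 30, "dead": 10},
    "treated": {"alive": 45, "dead": 15},
}
layout = mosaic_layout(counts)
# Area of one rectangle = width * height = relative frequency of that combination.
control_alive_area = layout["control"]["width"] * layout["control"]["heights"]["alive"]
```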
scatter plot (3)
- graphical display of two numerical variables where each observation is represented as a point on a graph with two axes
- position on x-axis indicates measurement of explanatory variable
- position on y-axis indicates measurement of response variable
positive association
- points tend to run from lower left to upper right
negative association
- points tend to run from upper left to lower right
absent association
- no discernible pattern in points
strip chart
- graphical display of a numerical variable and a categorical variable in which each observation is represented as a dot
violin plot
- graph that shows an approximation of the frequency distribution of a numerical variable in each group, together with its mirror image; displays the association between a numerical and a categorical variable
line graph (2)
- uses dots connected by line segments to display trends over time in a summary measurement, such as mean, or other ordered series
- steepness of line segment reflects speed of change between values
map
- spatial equivalent of the line graph, using colour gradient to display a numerical response variable at multiple locations on a surface
- explanatory variable: location in space
how to make a good table (3)
- make patterns in the data easy to see (avoid clutter and arrange values to facilitate pattern detection)
- represent magnitudes honestly (intervals of equal width)
- draw table elements clearly (labels, units)
what graph do you use for categorical data?
- bar graph
what graph do you use for numerical data?
- histogram
what graph do you use for multiple numerical variables? (2)
- scatter plot
- line graph
what graph do you use for multiple categorical variables? (3)
- grouped bar graph
- mosaic plot
- contingency table
what graph do you use for one numerical variable and one categorical variable? (4)
- multiple histograms
- cumulative frequency diagrams
- violin plot/box plot (categorical explanatory, numerical response)
- strip chart (categorical explanatory, numerical response)
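The graph-selection cards above amount to a lookup from variable types to displays, which can be sketched as a small table. The function name and the type labels "categorical" and "numerical" are assumptions made for this example; the suggested displays are the ones listed on the cards.

```python
# Lookup built from the selection cards; keys are sorted tuples of variable types.
DISPLAYS = {
    ("categorical",): ["bar graph"],
    ("numerical",): ["histogram"],
    ("numerical", "numerical"): ["scatter plot", "line graph"],
    ("categorical", "categorical"): [
        "grouped bar graph",
        "mosaic plot",
        "contingency table",
    ],
    ("categorical", "numerical"): [
        "multiple histograms",
        "cumulative frequency diagrams",
        "violin plot/box plot",
        "strip chart",
    ],
}


def suggest_displays(variable_types):
    """Return the displays listed for this combination of variable types.

    Order of the inputs does not matter; unknown combinations give [].
    """
    key = tuple(sorted(variable_types))
    return list(DISPLAYS.get(key, []))


one_of_each = suggest_displays(["numerical", "categorical"])
```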