Interpreting Graphical Summaries Flashcards
which graphs suit quantitative data and which suit categorical data?
quantitative (numerical)
- stem-and-leaf plots
- dotplots
- histograms
- boxplots
- time series plots
qualitative (categorical)
- bar charts
- pie charts
important features of a graph
- what is the overall pattern?
- what is the shape and centre of the distribution?
- variability: how spread out is the distribution?
- are there any big deviations from this pattern?
- are there any gaps? are there any outliers (points that fall well outside the overall pattern)?
- number and location of peaks
histogram
shapes of histogram:
- unimodal histogram is one that rises to a single peak and then declines
- bimodal histogram has two different peaks (may occur when the data set consists of observations on two quite different kinds of individuals or objects, e.g. male vs female)
- histogram with more than 2 peaks is multimodal
- a histogram is symmetric if the left half is a mirror image of the right half
- a unimodal histogram is positively (right) skewed if the right tail is longer and flatter than the left tail and negatively (left) skewed if the left tail is longer and flatter
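a minimal sketch of how the bars of a histogram come from bin counts, assuming equal-width bins and values no smaller than `start`. the function name `histogram_counts`, the bin width and the data are made up for illustration.

```python
def histogram_counts(values, bin_width, start):
    """Count values per bin [start + i*w, start + (i+1)*w); assumes every value >= start."""
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts

# made-up data: one peak near the low end and a long right tail
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]
counts = histogram_counts(data, bin_width=3, start=0)
print(counts)  # [3, 6, 0, 1, 1]

# crude text histogram: one star per observation in the bin
for i, c in enumerate(counts):
    lo = i * 3
    print(f"{lo:>2}-{lo + 2:<2} | {'*' * c}")
```

the counts rise to a single peak and then trail off to the right with an empty bin, which is the unimodal, positively skewed shape described above.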
boxplot
- boxplots are most useful when comparing two or more distributions (whereas stem-and-leaf plots/histograms provide clearer displays of a single distribution)
- boxplots can also be used to identify the approximate shape of the distribution of a set of data
5 number summary (max, Q3, median, Q1, min)
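a short sketch of computing the 5 number summary with Python's standard library. textbooks and software use slightly different quartile conventions; the "inclusive" method here is an assumption, and `five_number_summary` is a made-up name.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

print(five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6]))
# (1, 3.0, 5.0, 7.0, 9)
```

the box of a boxplot runs from Q1 to Q3 with a line at the median.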
modified boxplots and outliers
outliers are values that are far removed from the rest of the distribution
common rule: a value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3 (where IQR = Q3 - Q1)
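the 1.5 × IQR rule can be sketched as follows. the quartile method ("inclusive") is an assumption, and the function name `iqr_outliers` and the data are made up for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# made-up data: Q1 = 4, Q3 = 8, IQR = 4, so the fences are -2 and 14
print(iqr_outliers([2, 3, 4, 5, 6, 7, 8, 9, 30]))  # [30]
```

in a modified boxplot the flagged values are drawn as separate points, with the whiskers stopping at the most extreme values that are not outliers.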
bar chart
a bar chart presents categorical data in the form of bars that provide a visual display of the relative sizes of each category
- bar charts are particularly useful for showing comparisons where the actual size of each category is important
pie chart
pie chart presents categorical data in the form of slices of a pie providing a visual display of what fraction of the whole each category takes up
- pie charts are an appropriate means of representing relative differences in data
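a small sketch of the numbers behind the two charts: the count per category gives the bar heights, and the fraction of the whole gives the pie slices. `category_summary` and the labels are made up for illustration.

```python
from collections import Counter

def category_summary(labels):
    """Map each category to (count, fraction of the whole)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# made-up categorical data
print(category_summary(["car", "bus", "car", "bike"]))
# {'car': (2, 0.5), 'bus': (1, 0.25), 'bike': (1, 0.25)}
```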
stem plot
an easy way to put a list of numbers in order while getting a picture of their shape
helps determine the shape of a data set, identify outliers and locate the centre
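a minimal sketch of building a stem plot, assuming non-negative whole numbers with the tens digit(s) as the stem and the units digit as the leaf. `stem_plot` and the data are made up for illustration.

```python
from collections import defaultdict

def stem_plot(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # include empty stems so that gaps in the data stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for line in stem_plot([23, 12, 35, 21, 51, 15, 28, 23]):
    print(line)
# 1 | 25
# 2 | 1338
# 3 | 5
# 4 |
# 5 | 1
```

sorting puts the numbers in order, the length of each row gives the shape, and the empty stem 4 shows a gap before the possible outlier 51.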
pictogram
pictogram is like a bar graph except it uses pictures related to the topic of the graph
it is easy to be misled by pictograms because the eye tends to focus on the area of the picture instead of just the height
line graph
example of a line graph of data displayed over time:
winning times for the men’s 500 meter speed skating, 1924-2010
patterns are easier to detect in a picture than by scanning a list of numbers
scatterplot
scatterplots are useful for displaying the relationship between two measurement variables
each dot represents one individual, unless two or more individuals have the same data, in which case only one point is plotted at that location (some software programs replace the point with a number showing how many individuals share it)
harder to read than a line graph, but displays more information: it shows outliers as well as the degree of variability that exists for one variable at each location of the other variable
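a small sketch of how software could find locations shared by several individuals so that a count can be shown in place of a single dot. `overlap_counts` and the (x, y) pairs are made up for illustration.

```python
from collections import Counter

def overlap_counts(points):
    """Map each (x, y) location to the number of individuals plotted there."""
    return Counter(points)

# made-up measurement pairs; two locations are shared
points = [(160, 55), (172, 68), (160, 55), (181, 80), (172, 68), (160, 55)]
for (x, y), n in sorted(overlap_counts(points).items()):
    print(x, y, n)
# 160 55 3
# 172 68 2
# 181 80 1
```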
time series plot
time series is simply a record of a variable across time, usually measured at equally spaced intervals
for instance, most economic data used by both governments and businesses are measured monthly
components of time series - long-term trend, seasonal components, irregular cycles and random fluctuations
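one common way to bring out the long-term trend is to smooth the series with a simple moving average. this technique is not named in the notes above and is offered only as an illustrative sketch. `moving_average`, the window length and the data are made up.

```python
def moving_average(series, window):
    """Return the mean of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# made-up series: an upward trend plus a pattern that repeats every 3 points
series = [10, 12, 14, 11, 13, 15, 12, 14, 16]
trend = moving_average(series, window=3)
print([round(t, 2) for t in trend])
# [12.0, 12.33, 12.67, 13.0, 13.33, 13.67, 14.0]
```

averaging over one full repeat of the pattern cancels the seasonal ups and downs in this toy series and leaves the steady rise; what remains after removing trend and seasonal parts is attributed to cycles and random fluctuations.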