Week 9: Interpreting graphs Flashcards
Data
Types of data:
- categorical data
- numerical data
Types of data
Categorical data
- each value represents a discrete category
- order does not matter (e.g. 1=tiger, 2=lion, 3=ape)
- represent with pie charts, bar graphs & stacked column charts
Types of data
Numerical data
- each value represents either a real number (e.g. age) or a place on a continuum (e.g. a rating scale)
- order matters (e.g. 1 = very unhappy, 7 = very happy)
- represent with histograms & scatterplots
- time series graphs if data are collected over time
- can be discrete or continuous
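As a rough illustration of why the two data types are summarized differently, here is a minimal Python sketch using made-up data. For categorical codes (1 = tiger, 2 = lion, 3 = ape) it counts categories, since averaging the codes would be meaningless; for a 1-7 rating scale, where order matters, an order-based summary such as the median makes sense. The variable names are hypothetical.

```python
from collections import Counter
from statistics import median

# Categorical: the numeric codes are only labels, so count categories
# instead of doing arithmetic on the codes.
ANIMALS = {1: "tiger", 2: "lion", 3: "ape"}
animal_codes = [1, 3, 2, 1, 1, 3]  # made-up responses
animal_counts = Counter(ANIMALS[code] for code in animal_codes)

# Numerical rating scale (1 = very unhappy, 7 = very happy): order matters,
# so order-based summaries such as the median are meaningful.
happiness_ratings = [2, 5, 7, 6, 4, 5, 3]  # made-up responses
median_rating = median(happiness_ratings)

print(animal_counts.most_common())  # [('tiger', 3), ('ape', 2), ('lion', 1)]
print(median_rating)                # 5
```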
numerical data
Discrete
- the variable has a finite number of distinct values
- e.g. binned volume (e.g. the day of the month on which you bought your last avocado… only 31 possibilities)
numerical data
Continuous
- the variable can take an infinite number of values within a range
- e.g. log-transformed volume, average number of avocados you buy each month
- often assumed to be normally distributed
What is a graph?
- a visual representation of data that can present complex information quickly and clearly and help the reader see patterns and trends in the data
- a graph is useful when precise numbers aren't required, when a trend or comparison needs to be shown, and when there are relationships between data values
Graphs need to include the following…
- titles
- axis labels
- legends
- footnotes
- representation of axes
- scale and error
- a visual style that is easy to interpret
Graphs need to include the following…
Title
- summarize what the graph is showing
- place it centred above or below the graph
- apply a numbering system to the titles of all graphs
- explain what the X and Y axes represent
Graphs need to include the following…
Labels & legend
- X-axis (horizontal) and Y-axis (vertical)
- keep labels brief
- explain exactly what each aspect of the graph is showing
- include units of measurement
- legend… a key to the various data plotted on the graph (e.g. colours or shading)
Graphs need to include the following…
Footnotes
- further explain the data
- e.g. in a sample survey, include a footnote describing the sample being represented and the number of respondents in the sample (n)
- include a base on the graph so the reader can see how many people answered the question and quickly assess the likely accuracy of the results based on the sample size
- should mention the source of the data
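To see why the base (n) helps the reader judge accuracy, the sketch below computes the approximate 95% margin of error for a survey percentage at different sample sizes. It assumes a simple random sample and uses the standard normal-approximation formula z * sqrt(p(1 - p) / n) with z = 1.96; the function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion p from n respondents.

    Normal-approximation formula: z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(p * (1 - p) / n)

# The same observed result (50% agree) reported with different bases (n).
for n in (50, 200, 1000):
    print(f"n={n:>4}: 50% ± {margin_of_error(0.5, n) * 100:.1f} percentage points")
```

With these inputs the margins come out at roughly ±13.9, ±6.9 and ±3.1 percentage points, so a footnoted n of 50 signals a much less precise estimate than an n of 1000.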
Graphs need to include the following…
Axes & scales
- the vertical axis starts at 0
- the only exception to this rule is when there are negative values, in which case the scale starts below 0
Indicating range of error
Confidence intervals
- gives an estimated range of values calculated from a set of sample data
- the range is likely to include the ‘true’ value that would be found if the entire population were surveyed
- when reporting exact known figures, confidence intervals are not necessary
- statistical results are often presented using the 95% confidence interval… a range constructed so that, across repeated samples, 95% of such intervals would contain the true population value (often described informally as a 95% chance that the true value lies within it)
- typically displayed using error bars… if the error bars for two response categories do not overlap, there is usually a statistically significant difference between their estimates
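A minimal sketch of where error bars come from, assuming a normal approximation (z ≈ 1.96 for 95%). With samples as small as the made-up ones below, a t critical value would normally be used and would give somewhat wider intervals. The function names are hypothetical, and the overlap check is only the rough visual rule from the card, not a formal significance test.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    se = stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    m = mean(sample)
    return m - z * se, m + z * se

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Made-up ratings from two response categories.
group_a = [6, 7, 5, 8, 7, 6, 7, 8, 6, 7]
group_b = [4, 5, 3, 5, 4, 6, 4, 5, 4, 5]
ci_a = confidence_interval(group_a)
ci_b = confidence_interval(group_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

Here the two intervals (about 6.1 to 7.3 and about 4.0 to 5.0) do not overlap, which on the card's rule of thumb suggests a real difference between the groups.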
Visual style
how to make the graph look its best
- reduce clutter
- highlight what is important
- data ink… the numbers (scale) and vital points representing the data; non-data ink (titles, headings, legends, etc.) should not be overused
- colours and patterns… do not overuse them, as they distract
- dimension… graphs should be 2D whenever possible
Graphs
Bar graphs
- compares a series of categories by representing each one as a bar
- used to evaluate categorical data
- simple & easy to interpret
- easy to include error bars
- popular in survey reporting
Bar graphs
Vertical/horizontal & clustered bar graph
- vertical… best for comparing estimates (means or percentages) between 2-7 groups
- horizontal… best for categorical data when comparing estimates across 8+ groups, or when category labels are too long to appear neatly on the x-axis
- clustered…
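The vertical/horizontal rules of thumb above can be written as a tiny decision helper. This is only a sketch: `suggest_bar_orientation` is a hypothetical name, and the 15-character cutoff for a label that is "too long" is an assumption you would tune to your own layout.

```python
def suggest_bar_orientation(labels, max_label_chars=15):
    """Suggest 'vertical' or 'horizontal' bars from the category labels.

    max_label_chars is an assumed cutoff for labels too long for the x-axis.
    """
    n_groups = len(labels)
    too_long = any(len(label) > max_label_chars for label in labels)
    # 8+ groups or long labels -> horizontal; otherwise (2-7 groups) vertical.
    if n_groups >= 8 or too_long:
        return "horizontal"
    return "vertical"

print(suggest_bar_orientation(["18-24", "25-34", "35-44"]))            # vertical
print(suggest_bar_orientation([f"Region {i}" for i in range(1, 10)]))  # horizontal
```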
Graphs
Stacked column graphs…
- compares the percentage that each value contributes to a total of 100% across categories
- best used for categorical data when each column is made up of no more than 3 components
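A stacked column graph plots shares rather than raw counts. The sketch below, using made-up survey counts, converts each column's counts into percentages that sum to 100% and warns when a column has more than 3 components. The function name `stacked_percentages` is hypothetical.

```python
import warnings

def stacked_percentages(counts_by_category):
    """Convert raw counts into the percentage each component contributes
    to its column, so every stacked column sums to 100%."""
    shares = {}
    for category, components in counts_by_category.items():
        if len(components) > 3:
            # Rule of thumb from the card: no more than 3 components per column.
            warnings.warn(f"{category!r} has {len(components)} components; "
                          "a stacked column may be hard to read")
        total = sum(components.values())
        shares[category] = {name: 100 * count / total
                            for name, count in components.items()}
    return shares

survey = {  # made-up counts
    "Year 1": {"Agree": 30, "Neutral": 10, "Disagree": 10},
    "Year 2": {"Agree": 45, "Neutral": 30, "Disagree": 25},
}
shares = stacked_percentages(survey)
print(shares["Year 1"])  # {'Agree': 60.0, 'Neutral': 20.0, 'Disagree': 20.0}
```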
Graphs
Line graphs
- used to illustrate trends over time for continuous data
- can also be used to compare 2 different variables over time
Graphs
Histograms
- shows the distribution (shape) of a numerical variable
- provides a visual, intuitive sense of the data (mean, range, skew, possible outliers)
- data are grouped into ranges, then plotted as adjoining bars
- each bar represents a range of data (bar width proportional to the width of each range, bar height proportional to the frequency/percentage in that range)
- bars are presented in order of their ranges (ascending or descending)
- used for data that are at least at the ordinal level of measurement, most often for plotting continuous data
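The grouping step behind a histogram can be sketched in a few lines: split the number line into equal-width ranges, count how many values fall in each, and draw the counts in order. The ages below are made up, and `histogram_counts` is a hypothetical helper, not a plotting-library function.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values falling in equal-width ranges [start + i*width, start + (i+1)*width).

    Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)  # which range (bin) the value falls into
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

ages = [19, 22, 24, 25, 31, 33, 38, 41, 45, 47, 52, 58]  # made-up sample
counts = histogram_counts(ages, start=15, width=10, n_bins=5)

# Crude text histogram: ranges in ascending order, bar length = frequency.
# The "low-high" labels assume whole-number ages.
for i, c in enumerate(counts):
    low = 15 + i * 10
    print(f"{low}-{low + 9}: {'#' * c} ({100 * c / len(ages):.0f}%)")
```

For this sample the counts per range are [3, 3, 2, 3, 1], so the bars sit side by side in ascending order of age range.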
Graphs
Scatter plots
- used to plot data points on a horizontal and a vertical axis to show the relationship between 2 variables
- used for plotting continuous data
- useful for comparing 2 variables when there are many data points
Graphs
Line graphs
- tells you how two variables are associated by drawing a line through a series of points, each plotting an X value against its Y value
- X-axis: discrete variable
- Y-axis: continuous variable of interest
Graphs
Box-and-whisker plot
- displays variation in a set of data
- used in exploratory data analysis
- shows the shape of the distribution, central value & variability
- shows the five-number summary (minimum, lower quartile, median, upper quartile, maximum)
- useful for indicating whether a distribution is skewed and whether there are unusual observations (outliers) in the data set
- ideal for comparing distributions because the centre, spread and overall range are immediately apparent
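The numbers a box plot draws can be computed directly. This sketch uses Python's `statistics.quantiles` with `method="inclusive"`; note that textbooks and software use several slightly different quartile conventions, so Q1 and Q3 may differ a little elsewhere. The outlier check uses the common 1.5 × IQR whisker convention. The helper names and data are hypothetical.

```python
from statistics import median, quantiles

def five_number_summary(data):
    """Minimum, lower quartile (Q1), median, upper quartile (Q3), maximum."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median(data), q3, max(data)

def iqr_outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (common box plot convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up, with one unusual value
print(five_number_summary(data))  # (2, 4.0, 6, 8.0, 30)
print(iqr_outliers(data))         # [30]
```

The long gap between Q3 (8) and the maximum (30) is what shows up as a long upper whisker or a separately plotted outlier, signalling right skew.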
Graphs
Pie charts
- great for displaying relative frequencies (parts of a whole) but do not really show much else
- not great for displaying absolute frequencies
- show parts or percentages of a whole
- limitations… differences between slices are hard to judge, error bars and confidence intervals are not shown, legends and labels are hard to read and align, and they do not work well when comparing data
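A pie chart only encodes shares: each slice's angle is its relative frequency times 360 degrees. The sketch below (hypothetical `pie_slices` helper, made-up counts) shows that two samples of very different size produce identical pies, which is why pies say little about absolute frequencies.

```python
def pie_slices(counts):
    """Map each label to (percentage of the whole, slice angle in degrees)."""
    total = sum(counts.values())
    return {label: (100 * c / total, 360 * c / total)
            for label, c in counts.items()}

small = pie_slices({"Yes": 6, "No": 4})      # 10 respondents
large = pie_slices({"Yes": 600, "No": 400})  # 1000 respondents
print(small)           # {'Yes': (60.0, 216.0), 'No': (40.0, 144.0)}
print(small == large)  # True: the absolute sample size is invisible
```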
Graphs
Time-series graphs
- special kind of line graph that shows how something changes over time
- X-axis: time, usually as a discrete variable
- Y-axis: continuous variable of interest
Importance of y-axis
- truncating the y-axis (e.g. not starting at the baseline) can exaggerate differences
- a more reasonable y-axis starts at 0 and shows the whole range
- but too broad a range can minimize differences
- there should only be one y-axis
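A bit of arithmetic shows how much the axis range changes what the reader sees. In this sketch with two made-up values that differ by 10%, `apparent_ratio` gives how many times taller one bar is drawn when the axis starts at a given baseline, and `share_of_axis` gives what fraction of the axis height the difference occupies. Both function names are hypothetical.

```python
def apparent_ratio(a, b, baseline):
    """Ratio of drawn bar heights when bars start at `baseline` instead of 0."""
    return (b - baseline) / (a - baseline)

def share_of_axis(a, b, axis_min, axis_max):
    """Fraction of the y-axis height spanned by the gap between a and b."""
    return (b - a) / (axis_max - axis_min)

a, b = 50, 55  # made-up values; b is 10% larger than a
print(apparent_ratio(a, b, baseline=0))   # 1.1: bar b drawn 10% taller (honest)
print(apparent_ratio(a, b, baseline=45))  # 2.0: truncated axis, b looks twice as tall
print(share_of_axis(a, b, 0, 60))         # about 0.083 of a sensible axis
print(share_of_axis(a, b, 0, 1000))       # 0.005: too broad a range hides the gap
```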
Importance of X-axis
- Truncating an Axis = Restricting range to maximize differences
- Expanding an Axis = Using too broad a range to minimize differences
- Ignoring conventions (E.g., values should go from small to large)
- Comparing non-equivalent data (E.g., two different Y axes on same graph)