Ch3: Visual Displays of Data Flashcards
The graph “Why does college have to cost so much?” has been called “the most misleading graph ever published”
4 lies:
3.1: The Power of Graphs
- covers differing time periods
- compares ordinal observations to a scale observation (university rank to tuition)
- starts the two lines at different points on the y-axis: Cornell begins lower, implying that an institution already failing to deliver what students are paying for became even worse
- reverses the implied meaning of up and down (low numbers in world rankings are a good thing)
Researchers categorize lies on graphs into 2 groups:
3.1: The Power of Graphs
- Lies that exaggerate
* Lead us to think a given difference is bigger or smaller than it actually is
- Lies that reverse the finding
* Lead us to think that the opposite finding is occurring
* EX: that Cornell is doing worse, when it’s actually doing better (RE: lie 3)
The best data visualizations avoid misleading tricks, including the following 5:
3.1: The Power of Graphs
- The biased-scale lie
- The sneaky sample lie
- The interpolation lie
- The extrapolation lie
- The inaccurate values lie
The biased-scale lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
EX: Restaurant rating scale including “almost perfect” and “good” - no option for bad
The sneaky sample lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
EX: students on ratemyprof are ones who strongly like, or strongly dislike, a professor (a self-selected sample)
The interpolation lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
- Involves assuming that some value between the data points lies on a straight line between those data points
- EX: a report that Canada had its lowest number of break-ins since the 1970s. You can’t assume a steady, gradual decline between those two points: there was a dramatic increase in 1991, which the report ignores. Make sure a reasonable number of in-between data points have been reported
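A minimal sketch of how interpolation can mislead. All numbers below are made up for illustration (they are not the Canadian figures), and `interpolate` is a hypothetical helper name:

```python
def interpolate(x0, y0, x1, y1, x):
    """Value at x on the straight line joining (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical break-in counts: only the two endpoints are "reported".
reported = {1975: 1000, 2005: 700}
actual_1991 = 1500  # hypothetical spike hidden between the endpoints

# What a reader would assume for 1991 if they drew a straight line.
guess_1991 = interpolate(1975, reported[1975], 2005, reported[2005], 1991)

# How far the straight-line guess is from the (hypothetical) real value.
error_1991 = actual_1991 - guess_1991
print(guess_1991, error_1991)
```

The more in-between points are reported, the less room there is for a hidden spike like this one.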
The extrapolation lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
Assumes that values beyond the data points will continue indefinitely (assuming that a pattern/trend in data will continue)
The inaccurate values lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
- Tells the truth in one part of data but visually distorts it in another place
- EX: 4 stick figures represent over 43,000 nurses. However, for only about 3,000 more nurses (over 46,500), far more stick figures are added than that increase justifies, so one stick figure no longer stands for a consistent value
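A quick arithmetic check of the nurse example, as a sketch only. The totals are rounded from the card ("over 43,000" and "over 46,500"), and the function names are hypothetical:

```python
def value_per_icon(total, icons):
    """How many units a single icon stands for."""
    return total / icons


def icons_needed(total, per_icon):
    """How many icons a total deserves if every icon has the same value."""
    return total / per_icon


# Rounded totals taken from the card; the exact counts are not given here.
per_icon = value_per_icon(43_000, 4)
proportionate = icons_needed(46_500, per_icon)

print(per_icon, round(proportionate, 2))
```

Under these rounded numbers an honest second row would hold only a little more than four figures, so anything much larger distorts the comparison.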
Types of graphs that have two scale variables:
3.2: Common Types of Graphs
- Scatterplots (+ time plots)
- Line graphs
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
3.2: Common Types of Graphs
- Bar graphs (+ pareto charts)
- Pictorial graphs
- Pie charts
Scatterplots:
Types of graphs that have two scale variables
a graph that depicts the relation between two scale variables
- The values of each variable are marked along the two axes, and a mark/dot is made to indicate the intersection of the two scores for each participant
- Mark/dot made above the participant’s score on the x-axis, and across from their score on the y-axis
How to organize a scatterplot:
Types of graphs that have two scale variables
- Organize data by participant; each participant will have two scores, one on each scale variable
* EX: athletic performance score, hours of practice score
- Label the horizontal x-axis with the name of the IV and its possible values, starting with 0 if practical
- Label the vertical y-axis with the name of the DV and its possible values, starting with 0 if possible
- Make a mark on the graph above each study participant’s score on the x-axis and next to his or her score on the y-axis
A scatterplot between two scale variables can tell 3 possible stories:
Types of graphs that have two scale variables
- There may be no relation at all
- A linear relation between variables: means that the relation between variables is best described by a STRAIGHT line
* Positive (upwards to right), negative (downwards to right)
- A nonlinear relation between variables means that the relation between variables is best described by a line that breaks or curves in some way
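One way to put a number on the "positive / negative / none" stories is the Pearson correlation, sketched below. This is a swapped-in technique, not something the cards define; the data, the `cutoff` value, and the function names are all assumptions for illustration:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r, cutoff=0.1):
    """Rough label for r; the cutoff is an arbitrary illustrative choice."""
    if r > cutoff:
        return "positive linear"
    if r < -cutoff:
        return "negative linear"
    return "no linear relation"


hours = [1, 2, 3, 4, 5]            # hypothetical hours of practice
rising = [2, 4, 5, 4, 5]           # hypothetical performance scores
falling = [5, 4, 5, 4, 2]
u_shaped = [4, 1, 0, 1, 4]         # clearly related to hours, but curved

for ys in (rising, falling, u_shaped):
    print(describe(pearson_r(hours, ys)))
```

The U-shaped data come out as "no linear relation" even though the variables are clearly related, which is why the scatterplot itself still has to be inspected for nonlinear patterns.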
Line graphs:
Types of graphs that have two scale variables
used to illustrate the relation between two scale variables
- One type is based on a scatterplot and allows us to construct a line of best fit that represents the predicted y score for each x value
What do line graphs allow us to do?
Types of graphs that have two scale variables
Allows us to use the x value to predict the y value and make predictions based on only one piece of information
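A minimal sketch of a line of best fit used for prediction, assuming an ordinary least-squares line (the cards do not say how the line is computed). The data and function names are hypothetical:

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y score for a given x value."""
    return intercept + slope * x


hours = [1, 2, 3, 4, 5]    # hypothetical x values (IV)
scores = [2, 4, 5, 4, 5]   # hypothetical y values (DV)

slope, intercept = fit_line(hours, scores)
predicted = predict(6, slope, intercept)
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Note that predicting for 6 hours goes beyond the observed x values, so it carries the same risk as the extrapolation lie.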
Line graphs
A second type of line graph allows us to visualize changes in the values on the y-axis over time - AKA a….
Types of graphs that have two scale variables
- Time plot, or time series plot: a graph that plots a scale variable on the y-axis as it changes over an increment of time labelled on x-axis
- As with a scatterplot, a mark is made above each value on the x-axis at the matching value on the y-axis; the marks are then joined with a line
Steps to making a time plot:
Types of graphs that have two scale variables
- Label the x-axis with the name of the IV and its possible values (should be an increment of time)
- Label the y-axis with the name of the DV and its possible values (starting with 0 if possible)
- Make a mark above each value on the x-axis at the value for that time on the y-axis
- Connect the dots
Bar graphs:
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
a visual depiction of data in which the IV is nominal or ordinal and the DV is scale; the height of each bar typically represents the average value of the DV for each category
Variations of bar graphs
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
- Pareto Chart: a type of bar graph in which the categories along the x-axis are ordered from highest bar on the left to lowest bar on the right
How to make a bar graph:
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
- Label the x-axis with the name and levels of the nominal or ordinal IV
- Label the y-axis with the name of the scale DV and its possible values, starting with 0 if possible
- For every level of the IV, draw a bar with the height of that level’s value on the DV
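The bar heights, and the left-to-right order a Pareto chart would use, can be sketched as follows. The levels, scores, and function names are hypothetical:

```python
from statistics import mean


def bar_heights(scores_by_level):
    """Average DV value for each level of the IV (the height of each bar)."""
    return {level: mean(scores) for level, scores in scores_by_level.items()}


def pareto_order(heights):
    """Levels ordered from the highest bar (left) to the lowest bar (right)."""
    return sorted(heights, key=heights.get, reverse=True)


# Hypothetical scale DV scores grouped by a nominal IV with three levels.
scores_by_level = {"A": [2, 4], "B": [10, 8], "C": [5, 7]}

heights = bar_heights(scores_by_level)
order = pareto_order(heights)
print(heights, order)
```

A regular bar graph would keep the levels in their original order; only the Pareto variation re-sorts them by height.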
Pictorial graphs:
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
a visual depiction of data typically used for an IV with very few levels (categories) and a scale DV. Each level uses a picture or symbol to represent its value on the scale DV
- EX: using drawings of people to indicate population size (short for small pop., tall for large)
- Should be used sparingly and only when carefully constructed
Pie charts:
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
a graph in the shape of a circle, with a slice for every level (category) of the IV. The size of each slice represents the proportion (or percentage) of each category
- SLICES SHOULD ALWAYS ADD UP TO 100%
- However, difficult to make comparisons with - data can almost always be presented more clearly in a table or bar graph
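A small sketch of turning category counts into slice percentages and checking that they total 100%. The categories and counts are made up, and `slice_percentages` is a hypothetical helper:

```python
def slice_percentages(counts):
    """Percentage of the whole for each level of the IV."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


# Hypothetical counts of students per major.
counts = {"psychology": 30, "biology": 50, "sociology": 20}

slices = slice_percentages(counts)
total_percent = sum(slices.values())
print(slices, total_percent)
```

The same dictionary of percentages can be printed as a table, which is usually the clearer presentation.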
1st step to choosing the appropriate type of graph
3.3: How to Build a Graph
First, examine variables:
* Determine IV, DV, what type (N.O.I.R.)
* Most of the time, IV = x-axis and DV = y-axis
2nd step to choosing the appropriate type of graph (5 considerations)
3.3: How to Build a Graph
Second, after assessing the types of variables that are in the study, use the following to select the appropriate graph:
- If there is one scale variable (with frequencies), use a histogram
- If there is one scale IV and one scale DV, use a scatterplot or a line graph
- If there is one nominal or ordinal IV and one scale DV, use a bar graph
- Consider a Pareto chart if the IV has many levels
- If there are two or more nominal or ordinal IVs and one scale DV, use a bar graph
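The selection rules above can be sketched as a small lookup function. The function name, its parameters, and the choice to model "one scale variable (with frequencies)" as an empty list of IVs are assumptions made for this illustration:

```python
def choose_graph(iv_types, dv_type="scale", many_levels=False):
    """Suggest a graph type from the variable types, following the card's rules.

    iv_types: list of "nominal", "ordinal", or "scale", one entry per IV;
    an empty list stands for a single scale variable shown with frequencies.
    """
    categorical = {"nominal", "ordinal"}
    if dv_type != "scale":
        raise ValueError("these rules only cover a scale DV")
    if not iv_types:
        return "histogram"
    if len(iv_types) == 1 and iv_types[0] == "scale":
        return "scatterplot or line graph"
    if all(t in categorical for t in iv_types):
        if len(iv_types) == 1 and many_levels:
            return "bar graph (consider a Pareto chart)"
        return "bar graph"
    raise ValueError("combination not covered by these rules")


print(choose_graph([]))
print(choose_graph(["scale"]))
print(choose_graph(["nominal"], many_levels=True))
print(choose_graph(["nominal", "ordinal"]))
```

Combinations the cards do not mention (for example, a mix of scale and nominal IVs) deliberately raise an error rather than guess.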
6 critical q’s to ask to understand a graph
How to read a graph:
3.3: How to Build a Graph
Clarify IV vs. DV:
* What variable are the researchers trying to predict? AKA, what is the DV?
* Is the DV nominal, ordinal, or scale?
* What are the units of measurement on the DV?
- What variables did the researchers use to predict this DV? That is, what are the IVs?
- Are these two IVs nominal, ordinal, or scale?
- What are the levels for each of these IVs?
7 guidelines for Creating a Graph
3.3: How to Build a Graph
- Does the graph have a clear, specific title?
- Are both axes labelled with the names of the variables? Do all labels read left to right - even the one on the y-axis?
- Are all terms on the graph the same terms that are used in the text that the graph is to accompany? Have all unnecessary abbreviations been eliminated?
- Are the units of measurement (minutes, percentages) included in the labels?
- Do the values on the axes either go down to 0 or have cut marks (double slashes) to indicate that they do not go down to 0?
- Are the colors used in a simple, clear way - ideally, shades of gray - instead of other colors?
- Has all the chartjunk been eliminated?
7 guidelines for Creating a Graph
Has all the chartjunk been eliminated?
3.3: How to Build a Graph
Chartjunk: any unnecessary information or feature in a graph that detracts from a viewer’s ability to understand the data
Includes forms of…
* Moiré vibrations
* Grid
* Ducks
Moiré vibrations
Chartjunk
3.3: How to Build a Graph
- Moiré vibrations: any visual patterns that create a distracting impression of vibration and movement
- Sometimes the default setting in statistical software
Grid
Chartjunk
3.3: How to Build a Graph
- Grid: background pattern, almost like graph paper, on which the data representations, such as bars, are superimposed
- Use only for hand-drawn graphs
Ducks
Chartjunk
3.3: How to Build a Graph
Ducks: features of data that have been dressed up to be something other than merely data (EX: fancy fonts, cutesy pictures)
The future of graphs:
3.3: How to Build a Graph
- GIS - Geographic Information Systems
- Word Clouds
- Multivariable Graphs
- Innovative Ways to Display Variability
GIS - Geographic Information Systems
The future of graphs:
3.3: How to Build a Graph
- Many companies have published software that enables computer programmers to link Internet-based data to Internet-based maps
- These visual tools are all variations on geographic information systems
- Behavioural scientists can use GIS to organize workflow, assess group dynamics, study the design of classrooms, and much more
Word Clouds
The future of graphs:
3.3: How to Build a Graph
- An increasingly common type of graph that provides information on the most popular words used in a specific text
- Size of the word usually indicates the frequency of the word
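A minimal sketch of the counting behind a word cloud. The sample text, the linear size rule (`base` and `step`), and the function names are assumptions; real word-cloud tools use their own scaling:

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def font_sizes(freqs, base=10, step=4):
    """Hypothetical size rule: each extra occurrence adds `step` points."""
    return {word: base + step * (count - 1) for word, count in freqs.items()}


text = "Graphs tell stories and graphs can mislead"  # made-up sample text

freqs = word_frequencies(text)
sizes = font_sizes(freqs)
print(freqs.most_common(1), sizes["graphs"], sizes["tell"])
```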
Multivariable Graphs
The future of graphs:
3.3: How to Build a Graph
- As graphing technologies become more advanced, there are increasingly elegant ways to depict multiple variables on a single graph
- Can make bubble graphs: graphs that resemble a scatterplot, but the dots are replaced by bubbles that can represent additional variables through their color and size
- EX: like the life-expectancy vs. country wealth example video shown in class
Innovative Ways to Display Variability:
The future of graphs:
3.3: How to Build a Graph
Violin plots: graphs that are shaped like a violin, and include information about a distribution’s middle score and overall variability