Ch3: Visual Displays of Data Flashcards
The graph “Why does college have to cost so much?” was called “the most misleading graph ever published”
4 lies:
3.1: The Power of Graphs
- covers differing time periods
- compares ordinal observations to a scale observation (university rank to tuition)
- Cornell’s ranking starts at a lower point on the y-axis than tuition, suggesting that an institution already failing to deliver what students are paying for became worse
- reverses the implied meaning of up and down (low numbers in world rankings are a good thing)
Researchers categorize lies on graphs into 2 groups:
3.1: The Power of Graphs
- Lies that exaggerate
* Lead us to think a given difference is bigger or smaller than it actually is
- Lies that reverse the finding
* Lead us to think that the opposite finding is occurring
* EX: that Cornell is doing worse, when it’s actually doing better (RE: the lie that reverses up and down)
The best data visualizations avoid misleading tricks, including the following 5:
3.1: The Power of Graphs
- The biased-scale lie
- The sneaky sample lie
- The interpolation lie
- The extrapolation lie
- The inaccurate values lie
The biased-scale lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
EX: Restaurant rating scale including “almost perfect” and “good” - no option for bad
The sneaky sample lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
EX: students on ratemyprof are ones who strongly like, or strongly dislike, a professor (a self-selected sample)
The interpolation lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
- Involves assuming that some value between the data points lies on a straight line between those data points
- EX: reporting that Canada had its lowest number of break-ins since the 1970s. However, you can’t assume a steady, gradual decline: there was a dramatic increase in 1991 that the claim ignores. Make sure a reasonable number of in-between data points have been reported
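A minimal sketch of why interpolation can mislead. The counts below are hypothetical placeholders, not real crime statistics, and the helper name `interpolate` is an assumption made for illustration.

```python
def interpolate(x0, y0, x1, y1, x):
    """Assume the value at x lies on a straight line between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical counts: only the two endpoints are "reported".
reported = {1975: 1000, 2005: 700}
# Hypothetical in-between value that was never shown (a spike).
actual_1991 = 1500

guess_1991 = interpolate(1975, reported[1975], 2005, reported[2005], 1991)
error = actual_1991 - guess_1991  # how far off the straight-line assumption is

print(f"interpolated 1991 value: {guess_1991:.0f}, hypothetical actual: {actual_1991}")
```

The straight-line guess hides the spike entirely, which is why in-between data points need to be reported.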
The extrapolation lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
Assumes that a pattern or trend in the data will continue indefinitely beyond the range of the data points
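A minimal sketch of extrapolation going wrong, using hypothetical heights for a growing child (the numbers and the helper name `extrapolate` are assumptions for illustration only).

```python
def extrapolate(x0, y0, x1, y1, x):
    """Assume the trend between two known points continues unchanged out to x."""
    rate = (y1 - y0) / (x1 - x0)
    return y1 + rate * (x - x1)

# Hypothetical data: 110 cm at age 5, 140 cm at age 10 (6 cm per year).
height_at_40 = extrapolate(5, 110, 10, 140, 40)

print(f"extrapolated height at age 40: {height_at_40:.0f} cm")
```

The trend is accurate inside the observed range (ages 5 to 10) but gives an impossible value once it is pushed far beyond it.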
The inaccurate values lie
Data visualizations to avoid misleading tricks
3.1: The Power of Graphs
- Tells the truth in one part of data but visually distorts it in another place
- EX: 4 stick figures represent over 43,000 nurses. However, when only about 3,000 more are added (for over 46,500), far more stick figures are drawn than the increase justifies, so the value of a single stick figure is no longer consistent
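A rough check of the stick-figure example, treating the "over 43,000" and "over 46,500" figures as exact for the sake of arithmetic (an assumption). The helper name `figures_needed` is hypothetical.

```python
def figures_needed(value, value_per_figure):
    """How many figures a value deserves if every figure stands for the same amount."""
    return value / value_per_figure

# If 4 figures stand for 43,000 nurses, each figure is worth 10,750 nurses.
value_per_figure = 43_000 / 4

# At that rate, 46,500 nurses deserve only a little more than 4 figures.
needed = figures_needed(46_500, value_per_figure)

print(f"{value_per_figure:.0f} nurses per figure, {needed:.2f} figures needed for 46,500")
```

Any picture that shows many more than about four and a third figures for the second value is distorting the data visually.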
Types of graphs that have two scale variables:
3.2: Common Types of Graphs
- Scatterplots (+ time plots)
- Line graphs
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
3.2: Common Types of Graphs
- Bar graphs (+ pareto charts)
- Pictorial graphs
- Pie charts
Scatterplots:
Types of graphs that have two scale variables
a graph that depicts the relation between two scale variables
- The values of each variable are marked along the two axes, and a mark/dot is made to indicate the intersection of the two scores for each participant
- The mark/dot is placed above the participant’s score on the x-axis and across from their score on the y-axis
How to organize a scatterplot:
Types of graphs that have two scale variables
- Organize data by participant; each participant will have two scores, one on each scale variable
* EX: athletic performance score, hours of practice score
- Label the horizontal x-axis with the name of the IV and its possible values, starting with 0 if practical
- Label the vertical y-axis with the name of the DV and its possible values, starting with 0 if possible
- Make a mark on the graph above each study participant’s score on the x-axis and next to his or her score on the y-axis
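The organizing steps above can be sketched as follows. The participants and scores are hypothetical, and the sketch only prepares the data and axis ranges rather than drawing anything.

```python
# Hypothetical data: one record per participant, two scale scores each.
participants = [
    {"id": "P1", "hours_of_practice": 2, "performance": 55},
    {"id": "P2", "hours_of_practice": 5, "performance": 70},
    {"id": "P3", "hours_of_practice": 8, "performance": 90},
]

# One (x, y) mark per participant: IV on the x-axis, DV on the y-axis.
points = [(p["hours_of_practice"], p["performance"]) for p in participants]

# Both axes start at 0 and run to the largest observed score.
x_axis = (0, max(x for x, _ in points))
y_axis = (0, max(y for _, y in points))

for x, y in points:
    print(f"mark at x={x}, y={y}")
```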
A scatterplot between two scale variables can tell 3 possible stories:
Types of graphs that have two scale variables
- There may be no relation at all
- A linear relation between variables: means that the relation between variables is best described by a STRAIGHT line
* Positive (upwards to right), negative (downwards to right)
- A nonlinear relation between variables means that the relation between variables is best described by a line that breaks or curves in some way
Line graphs:
Types of graphs that have two scale variables
used to illustrate the relation between two scale variables
- One type is based on a scatterplot and allows us to construct a line of best fit that represents the predicted y score for each x value
What do line graphs allow us to do?
Types of graphs that have two scale variables
Allows us to use the x value to predict the y value and make predictions based on only one piece of information
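A minimal sketch of a line of best fit, assuming the usual least-squares approach (the notes do not say how the line is computed). The data and the function names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y predicted from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predicted y score for a given x value."""
    return intercept + slope * x

# Hypothetical scores: x = hours of practice, y = performance rating.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(f"predicted y at x=6: {predict(slope, intercept, 6):.1f}")
```

Knowing only the x value is enough to get a predicted y value from the fitted line.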
Line graphs
A second type of line graph allows us to visualize changes in the values on the y-axis over time - AKA a….
Types of graphs that have two scale variables
- Time plot, or time series plot: a graph that plots a scale variable on the y-axis as it changes over an increment of time labelled on x-axis
- Marks are made as with a scatterplot, then connected with a line
Steps to making a time plot:
Types of graphs that have two scale variables
- Label the x-axis with the name of the IV and its possible values (should be an increment of time)
- Label the y-axis with the name of the DV and its possible values (starting with 0 if possible)
- Make a mark above each value on the x-axis at the value for that time on the y-axis
- Connect the dots
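The last two steps can be sketched without a plotting library: order the observations by time, then pair each point with the next one to get the line segments that connect the dots. The observations are hypothetical.

```python
# Hypothetical observations as (month, value) pairs, deliberately out of order.
observations = [(3, 12.0), (1, 10.0), (2, 15.0)]

# Marks go left to right along the time axis.
points = sorted(observations)

# Connecting the dots: each segment joins a point to the next point in time.
segments = list(zip(points, points[1:]))

for start, end in segments:
    print(f"line from {start} to {end}")
```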
Bar graphs:
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
a visual depiction of data in which the IV is nominal or ordinal and the DV is scale; the height of each bar typically represents the average value of the DV for each category
Variations of bar graphs
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
- Pareto Chart: a type of bar graph in which the categories along the x-axis are ordered from highest bar on the left to lowest bar on the right
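A minimal sketch of the Pareto ordering, using hypothetical category values and a hypothetical helper name `pareto_order`.

```python
def pareto_order(values):
    """Order categories from highest bar (left) to lowest bar (right)."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)

# Hypothetical bar heights for four categories.
counts = {"A": 12, "B": 30, "C": 7, "D": 21}

ordered = pareto_order(counts)
print([name for name, _ in ordered])
```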
How to make a bar graph:
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
- Label the x-axis with the name and levels of the nominal or ordinal IV
- Label the y-axis with the name of the scale DV and its possible values, starting with 0 if possible
- For every level of the IV, draw a bar with the height of that level’s value on the DV
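A minimal sketch of computing bar heights, assuming each bar shows the average DV score for its level of the IV. The rows and the function name `bar_heights` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def bar_heights(rows):
    """Average the scale DV within each level of the nominal or ordinal IV."""
    groups = defaultdict(list)
    for level, score in rows:
        groups[level].append(score)
    return {level: mean(scores) for level, scores in groups.items()}

# Hypothetical data: (level of IV, score on DV).
rows = [
    ("first-year", 70),
    ("first-year", 80),
    ("senior", 85),
    ("senior", 95),
]

heights = bar_heights(rows)
for level, height in heights.items():
    print(f"{level}: bar height {height}")
```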
Pictorial graphs:
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
a visual depiction of data typically used for an IV with very few levels (categories) and a scale DV. Each level uses a picture or symbol to represent its value on the scale DV
- EX: using drawings of people to indicate population size (short for small pop., tall for large)
- Should be used sparingly and carefully
Pie charts:
Types of graphs with one nominal (sometimes ordinal) IV and a scale DV
a graph in the shape of a circle, with a slice for every level (category) of the IV. The size of each slice represents the proportion (or percentage) of each category
- SLICES SHOULD ALWAYS ADD UP TO 100%
- However, difficult to make comparisons with - data can almost always be presented more clearly in a table or bar graph
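A minimal sketch of turning category counts into slice percentages and checking that they add up to 100%. The counts and the function name `slice_percentages` are hypothetical.

```python
def slice_percentages(counts):
    """Convert each category count into its percentage of the total."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}

# Hypothetical counts for three categories.
counts = {"walk": 10, "bus": 25, "car": 15}

pct = slice_percentages(counts)
for category, value in pct.items():
    print(f"{category}: {value:.0f}%")
```

Printing the same percentages as a small table or bar graph usually makes them easier to compare than slices.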
1st step to choosing the appropriate type of graph
3.3: How to Build a Graph
First, examine variables:
* Determine the IV, the DV, and each variable’s type (N.O.I.R.: nominal, ordinal, interval, ratio)
* Most of the time, IV = x-axis and DV = y-axis
2nd step to choosing the appropriate type of graph (5 considerations)
3.3: How to Build a Graph
Second, after assessing the types of variables that are in the study, use the following to select the appropriate graph:
- If there is one scale variable (with frequencies), use a histogram
- If there is one scale IV and one scale DV, use a scatterplot or a line graph
- If there is one nominal or ordinal IV and one scale DV, use a bar graph
- Consider a Pareto chart if the IV has many levels
- If there are two or more nominal or ordinal IVs and one scale DV, use a bar graph
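The selection rules above can be sketched as a small lookup function. The function name `choose_graph`, its parameters, and the idea of modeling the histogram case as "no IV" are assumptions made for illustration.

```python
def choose_graph(iv_types, many_levels=False):
    """Suggest a graph for a scale DV (or a single scale variable).

    iv_types: list of IV types, e.g. [], ["scale"], ["nominal"], ["nominal", "ordinal"].
    many_levels: True if a single nominal or ordinal IV has many levels.
    """
    categorical = {"nominal", "ordinal"}
    if not iv_types:
        # One scale variable with frequencies, no IV.
        return "histogram"
    if iv_types == ["scale"]:
        return "scatterplot or line graph"
    if all(t in categorical for t in iv_types):
        if len(iv_types) == 1 and many_levels:
            return "bar graph (consider a Pareto chart)"
        return "bar graph"
    raise ValueError(f"no rule in these notes covers IV types {iv_types}")

print(choose_graph([]))
print(choose_graph(["scale"]))
print(choose_graph(["nominal"], many_levels=True))
print(choose_graph(["nominal", "ordinal"]))
```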