Visualizing Data Flashcards
what are two purposes of graphs
- analyze data
- communicate/present data
what are four ways to draw a BAD graph
- have the graph hide data (not showing all data points)
- have patterns hard to see (having a 3D graph skews the data and makes it hard to read)
- magnitudes are distorted (not having y-axis starting at zero)
- graphical elements are not clear (text/figure elements are too SMALL to read)
what makes a good graph
- showing the data (showing individual data points, which lets the shape of the data be displayed and makes patterns easier to see)
- make patterns in the data easy to see (the right graph for the data will allow the main pattern to be seen right away)
- represent the magnitude honestly (always start at zero for baselines)
- draw graphical elements clearly (including labeled axes, units, graphical symbols for more than one data set…)
what must the graph axes always start at
0
why should graph axes start at zero
it keeps the graph honest: the reader compares the data against zero as the baseline, not against some arbitrary value
should graphs be 3d? why or why not?
NO - obscures the pattern in the data
how much data should be put in the graph
just enough to get the point across without overcrowding the graph
what graphs should be used for showing categorical data
frequency table and bar graph
frequency table
text display of the number of occurrences of each category in the data set
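A minimal sketch of building a frequency table in Python, assuming the data is a plain list of category labels. The `observations` list and the `frequency_table` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical observations of one categorical variable (illustrative only).
observations = ["oak", "pine", "oak", "birch", "oak", "pine"]

def frequency_table(values):
    """Return (category, count) pairs, most frequent category first."""
    return Counter(values).most_common()

table = frequency_table(observations)
for category, count in table:
    print(f"{category:<8}{count}")
# Also report the total number of observations.
print(f"{'total':<8}{len(observations)}")
```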
bar graph
uses the height of rectangular bars to visualize the frequency of occurrences of each category
does a bar graph show exact numbers of the data
NO but it does give a picture of how steeply the numbers change between categories
what makes a good bar graph
- baseline of y axis is zero
- bars are equal width
- nominal data is organized by frequency of occurrence (greatest to least)
- bars are not fused together
- total number of observations should be recorded in figure legend
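The checklist above can be sketched as a text-only bar graph. This is an illustration under assumptions, not a plotting recipe: `text_bar_graph` and the sample `responses` are hypothetical, and the input is assumed to be a non-empty list of labels.

```python
from collections import Counter

def text_bar_graph(values, symbol="#"):
    """Draw one text bar per category.

    Every bar starts at zero (an empty bar means a count of 0), each unit of
    frequency is one symbol wide, categories run from most to least frequent,
    and the last line reports the total number of observations.
    """
    counts = Counter(values).most_common()
    label_width = max(len(str(cat)) for cat, _ in counts)
    lines = [f"{str(cat):<{label_width}} | {symbol * n} {n}" for cat, n in counts]
    lines.append(f"n = {len(values)} observations")
    return lines

# Hypothetical nominal data (illustrative only).
responses = ["a", "b", "a", "c", "a", "b"]
graph_lines = text_bar_graph(responses)
print("\n".join(graph_lines))
```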
histogram
uses area of rectangular bars to display frequency
what is a histogram used for
showing data of a single numerical variable
what does a peak in a histogram refer to
an interval of the frequency distribution that is noticeably more frequent than surrounding intervals
what is an example of a histogram with a peak?
bell shaped
what does bimodal refer to with histograms
frequency distribution having TWO distinct peaks
Histogram
bar graph
frequency table
what does skew refer to in a histogram
when the frequency distribution is NOT symmetrical
what types of skew can a histogram have
negative (left-skewed: long tail toward smaller values) and positive (right-skewed: long tail toward larger values)
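One common way to put a number on skew is the moment coefficient of skewness (third central moment divided by the 1.5 power of the second). The cards do not name this formula, so treat it as an assumed illustration. The `skewness` helper and the sample lists are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.

    Positive result: long tail to the right. Negative: long tail to the left.
    Assumes at least two distinct values so that m2 is not zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical samples (illustrative only).
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-10, -3, -2, -2, -1, -1]
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)   # positive skew
print(skewness(left_tailed) < 0)    # negative skew
print(abs(skewness(symmetric)) < 1e-12)  # no skew
```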
what is an outlier
an extreme data point lying well away from the rest of the data
how can outliers occur in data
- Mistakes in recording the data
- Real phenomenon that CANNOT be dropped from the data
what makes a good histogram
- bars rise from zero
- bars are contiguous and NOT spaced out
- using readable numbers for the break point between data intervals (using 5 and not 4.998)
- total number of individuals in the legend
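A rough sketch of binning data for a histogram with contiguous intervals and readable break points (multiples of the bin width). The `histogram_counts` helper and the `measurements` list are hypothetical, and integer data with an integer bin width is assumed to avoid floating-point edge cases.

```python
import math

def histogram_counts(values, bin_width):
    """Count values in contiguous intervals [lower, upper).

    Break points are multiples of bin_width, so they stay readable
    (for example 0, 5, 10 rather than 4.998). Empty intervals are kept
    so the bars remain contiguous.
    """
    start = math.floor(min(values) / bin_width) * bin_width
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for x in values:
        counts[int((x - start) // bin_width)] += 1
    return [(start + i * bin_width, start + (i + 1) * bin_width, c)
            for i, c in enumerate(counts)]

# Hypothetical numerical data (illustrative only).
measurements = [3, 7, 8, 12, 14, 14, 21]
bins = histogram_counts(measurements, 5)
for lower, upper, count in bins:
    print(f"{lower:>2}-{upper:<2} {'#' * count}")
print(f"n = {len(measurements)} individuals")
```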
what graphs should be used for showing ASSOCIATIONS between categorical variables
contingency table
mosaic plot
grouped bar graph
contingency table
a frequency table for two or more categorical variables
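A minimal sketch of a contingency table for two categorical variables, assuming each observation is a (row category, column category) pair. The `contingency_table` helper and the `pairs` data are invented for illustration.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs.

    Returns the sorted row labels, the sorted column labels, and a grid of
    counts where grid[i][j] is the frequency of (rows[i], cols[j]).
    """
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    grid = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, grid

# Hypothetical observations of two categorical variables (illustrative only).
pairs = [
    ("treated", "alive"), ("treated", "alive"), ("treated", "dead"),
    ("control", "alive"), ("control", "dead"), ("control", "dead"),
]
rows, cols, grid = contingency_table(pairs)
print(" " * 8 + "".join(f"{c:>7}" for c in cols))
for r, row_counts in zip(rows, grid):
    print(f"{r:<8}" + "".join(f"{n:>7}" for n in row_counts))
```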