Visualizing Data Flashcards
what are two purposes of graphs
- analyze data
- communicate/present data
what are four ways to draw a BAD graph
- have the graph hide data (not showing all data points)
- have patterns hard to see (having a 3d graph skews the data and makes it hard to read)
- magnitudes are distorted (not having y-axis starting at zero)
- graphical elements are not clear (text/figure elements are too SMALL to read)
what makes a good graph
- show the data (showing individual data points allows the data shape to be displayed and makes patterns easier to see)
- make patterns in the data easy to see (the right graph for the data will allow the main pattern to be seen right away)
- represent the magnitude honestly (always start at zero for baselines)
- draw graphical elements clearly (including labeled axes, units, graphical symbols for more than one data set…)
what must the graph axes always start at
0
why should graph axes start at zero
it keeps the graph honest: the reader compares magnitudes against a baseline of zero rather than against an arbitrary starting value
should graphs be 3d? why or why not?
NO - obscures the pattern in the data
how much data should be put in the graph
just enough to get the point across without overcrowding the graph
what graphs should be used for showing categorical data
frequency table and bar graph
frequency table
text display of the number of occurrences of each category in the data set
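A frequency table can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method; the `sightings` data and the `frequency_table` helper are hypothetical names made up for the example.

```python
from collections import Counter

def frequency_table(observations):
    """Return (category, count) pairs ordered from most to least frequent."""
    counts = Counter(observations)
    # most_common() sorts by descending count, the order suggested for nominal data
    return counts.most_common()

# Hypothetical categorical data set
sightings = ["robin", "crow", "robin", "finch", "robin", "crow"]
table = frequency_table(sightings)

for category, count in table:
    print(f"{category:<8}{count}")
print(f"n = {len(sightings)}")  # total number of observations
```

The same counts, in the same order, are what the bar heights of a bar graph would show.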
bar graph
uses the height of rectangular bars to visualize the frequency of occurrences of each category
does a bar graph show exact numbers of the data
NO but it does give a picture of how steeply the numbers change between categories
what makes a good bar graph
- baseline of y axis is zero
- bars are equal width
- nominal data is organized by frequency of occurrence (greatest to least)
- bars are not fused together
- total number of observations should be recorded in figure legend
histogram
uses area of rectangular bars to display frequency
what is a histogram used for
showing data of a single numerical variable
what does a peak in a histogram refer to
an interval of the frequency distribution that is noticeably more frequent than surrounding intervals
what is an example of a histogram with a peak?
bell shaped
what does bimodal refer to with histograms
frequency distribution having TWO distinct peaks
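One rough way to count peaks is to look for intervals whose frequency is higher than both neighbouring intervals. The sketch below assumes the histogram has already been reduced to a list of bin counts; `find_peaks` and the example counts are hypothetical, and the strict comparison is a simplification, since "noticeably more frequent" is a judgement call and noisy data can produce spurious small peaks.

```python
def find_peaks(counts):
    """Return the indices of bins whose count exceeds both neighbouring bins."""
    peaks = []
    for i, c in enumerate(counts):
        # Treat the space beyond either end of the histogram as an empty bin
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if c > left and c > right:
            peaks.append(i)
    return peaks

# Hypothetical bin counts
bell_shaped = [1, 3, 7, 3, 1]
two_humped = [1, 4, 9, 5, 2, 6, 10, 3]

print(find_peaks(bell_shaped))  # [2]    -> one peak
print(find_peaks(two_humped))   # [2, 6] -> two peaks (bimodal)
```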
what does skew refer to in a histogram
when the frequency distribution is NOT symmetrical
what types of skew can a histogram have
negative (long tail to the left) and positive (long tail to the right)
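The direction of skew can also be checked numerically. The sketch below uses the moment-based skewness coefficient (third central moment divided by the 1.5 power of the second), which is one common choice and not something the cards themselves specify; the data sets are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: > 0 for a long right tail, < 0 for a long left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets
right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]    # a few large values stretch the right tail
left_tailed = [-v for v in right_tailed]    # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_tailed), ("left", left_tailed), ("sym", symmetric)]:
    print(name, round(skewness(data), 3))
```

A value near zero suggests a roughly symmetrical distribution; the sign gives the direction of the longer tail.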
what is an outlier
extreme data points lying well away from the rest of the data
how can outliers occur in data
- Mistakes in recording the data
- Real phenomenon that CANNOT be dropped from the data
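A common rule of thumb for flagging candidate outliers is the 1.5 × IQR fence. That rule is an assumption brought in for illustration and is not stated on the cards; `flag_outliers` and the data are hypothetical. Flagged points are only candidates for checking, since a real phenomenon should not be dropped.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one extreme value
measurements = [2, 3, 3, 4, 4, 5, 5, 6, 40]
candidates = flag_outliers(measurements)
print(candidates)  # [40]
```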
what makes a good histogram
- bars rise from zero
- bars are contiguous and NOT spaced out
- using readable numbers for the break point between data intervals (using 5 and not 4.998)
- total number of individuals in the legend
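The points above can be sketched as a small binning routine. It assumes an integer bin width so that the break points land on readable numbers, and it keeps empty intervals so that the bars stay contiguous; `histogram_counts` and the data are hypothetical.

```python
from collections import Counter

def histogram_counts(values, width):
    """Count values per interval [start, start + width), keyed by the lower break point."""
    counts = Counter(int(v // width) * width for v in values)
    lowest, highest = min(counts), max(counts)
    # Keep empty intervals so the bars remain contiguous
    return {start: counts.get(start, 0)
            for start in range(lowest, highest + width, width)}

# Hypothetical numerical data
ages = [3, 7, 8, 12, 14, 14, 26]
bins = histogram_counts(ages, width=5)

for start, count in bins.items():
    print(f"{start:>2} to {start + 5:<3}{'#' * count}")
print(f"n = {len(ages)}")  # total number of individuals, for the legend
```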
what graphs should be used for showing ASSOCIATIONS between categorical variables
contingency table
mosaic plot
grouped bar graph
contingency table
a frequency table for two or more categorical variables
why is a contingency table used
to show how frequencies of the categories in a response variable are contingent upon the value of the explanatory variable
what does a cell refer to in a contingency table
one combination of categories of the row and column variables in the table
what variable goes in the columns of a contingency table
explanatory variable
what variable goes in the row of a contingency table
response variable
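A contingency table laid out this way (explanatory variable in the columns, response variable in the rows) can be built by counting pairs. This is a minimal sketch; the `contingency_table` helper and the treatment data are hypothetical.

```python
from collections import Counter

def contingency_table(pairs):
    """Build {response: {explanatory: count}} from (explanatory, response) pairs."""
    counts = Counter(pairs)
    columns = sorted({explanatory for explanatory, _ in counts})
    rows = sorted({response for _, response in counts})
    # Each cell is one combination of a row category and a column category
    return {r: {c: counts[(c, r)] for c in columns} for r in rows}

# Hypothetical observations: (explanatory, response)
observations = [
    ("treated", "survived"), ("treated", "survived"), ("treated", "died"),
    ("control", "survived"), ("control", "died"), ("control", "died"),
]
table = contingency_table(observations)

columns = list(next(iter(table.values())))
print(f"{'':<10}" + "".join(f"{c:<10}" for c in columns))
for response, cells in table.items():
    print(f"{response:<10}" + "".join(f"{cells[c]:<10}" for c in columns))
```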
how does a mosaic plot differ from a grouped bar graph
the bars within treatment groups are stacked on top of one another
how to read a mosaic plot
the bar area and height relate to relative frequencies of the responses
how to read whether there are associations between treatment and response variables in a MOSAIC PLOT
Yes association: vertical position where the colours meet will differ between stacks
No association: the meeting point between colours will be at the same vertical position between stacks
do mosaic plots show absolute or relative frequencies in each combination of variables
RELATIVE
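The relative frequencies that a mosaic plot displays are the proportions of each response within each explanatory group. A rough sketch, with hypothetical counts and a hypothetical `relative_frequencies` helper:

```python
def relative_frequencies(counts_by_group):
    """Convert {group: {response: count}} into proportions within each group."""
    result = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        result[group] = {response: n / total for response, n in counts.items()}
    return result

# Hypothetical counts for two treatment groups
counts = {
    "control": {"died": 30, "survived": 10},
    "treated": {"died": 10, "survived": 30},
}
proportions = relative_frequencies(counts)

for group, props in proportions.items():
    # The 'died' proportion is where the two colours meet in that stack
    print(group, props)
```

Here the boundary sits at 0.75 in one stack and 0.25 in the other, which is the pattern described above for an association.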
grouped bar graph
uses the heights of rectangles to graph the frequency of occurrences of all combinations of two or more categorical variables
how are bars grouped in a grouped bar graph
by the categories of explanatory variables
what graph is used to show associations between numerical variables
scatter plot
what variable is found on each axis in a scatter plot
x - explanatory variable
y - response variable
what are possible associations in a scatter plot
Positive (points run from lower left to upper right)
Negative (points run from upper left to lower right)
Absent (no easily seen pattern)
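The direction of an association can be roughed out with the sign of the Pearson correlation coefficient. This is a sketch under stated assumptions: correlation only captures straight-line trends, and the `threshold` for calling an association "absent" is an arbitrary value chosen for the example, not a rule from the cards.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_association(xs, ys, threshold=0.3):
    """Label the association; threshold is an illustrative cut-off only."""
    r = pearson_r(xs, ys)
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "absent"

# Hypothetical data: x is the explanatory variable, y the response
x = [1, 2, 3, 4, 5]
print(describe_association(x, [2, 4, 5, 4, 6]))  # positive
print(describe_association(x, [6, 4, 5, 4, 2]))  # negative
print(describe_association(x, [3, 1, 4, 1, 3]))  # absent
```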
what graphs are used to show associations between numerical and categorical variables
strip chart
violin plot
multi-histogram method
strip chart
a graph in which each observation is represented as a dot
how are the axes of a strip chart labeled
x axis - categorical measurements
y axis - numerical measurements
how does a strip chart differ from a scatter plot
in a strip chart the explanatory variable is categorical, NOT numerical
when is a strip chart ideal
when there are only A FEW observations in each category, so the dots do not overcrowd the graph
violin plot
displays the frequency distribution of each group as a compact, symmetrical shape
how is a violin plot similar to and different from a histogram
like: approximates the frequency distribution of each group
differ: distribution is smoothed and shown with its mirror image
what does the dot in the center of each violin mean in a violin plot
mean of the data
when is a violin plot ideally used
when the goal of the graph is to show the most important features of the frequency distribution; suited to a large number of observations
how should multi-histograms be positioned
stacked on top of each other so the spread of data between them is easier to compare
when is the multi-histogram method best
only when there are a FEW categories, as the histograms can take up lots of room
what graphs are used to show trends in time and space
line graph and map
line graph
uses dots connected by line segments to display trends over time in a measurement, often a summary measurement such as the mean
what do the lines show in a line graph
the line segments connecting adjacent points show the temporal pattern
what is the spatial equivalent of a line graph
map
map
graph that uses colour gradients to display numerical response variables at different locations
two types of tables for displaying data
display tables and data tables
display tables
numerical detail is less important than the effective communication of results
data tables
purpose is to store raw data for reference purposes NOT for communicating general findings