MODULE 4 - VISUALIZING DATA Flashcards
what is a contingency table?
tables of data frequencies or proportions within different levels of categorical variables
what is frequency?
the number of sampling units that falls in each level
what are one-way contingency tables?
for data with a single categorical variable and are shown as a one-dimensional table of columns
what are two-way contingency tables?
for data with two categorical variables and are shown as a two-dimensional table of rows and columns
what are marginal distributions?
the row and column sums of a two-way contingency table
they can be shown as frequencies or proportions
how to calculate marginal distributions as frequencies
row: sum frequencies across all columns for each row
column: sum frequencies across all rows for each column
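example (a minimal sketch; the table values are made up for illustration and are not from the module): row and column marginal frequencies for a small two-way table stored as a list of rows.

```python
# Hypothetical two-way contingency table (illustrative counts only).
# Each inner list is one row; each position within a row is one column.
table = [
    [10, 20, 30],
    [5, 15, 20],
]

# Row marginals: sum frequencies across all columns for each row.
row_totals = [sum(row) for row in table]

# Column marginals: sum frequencies across all rows for each column.
# zip(*table) regroups the values by column.
col_totals = [sum(col) for col in zip(*table)]

print(row_totals)  # [60, 40]
print(col_totals)  # [15, 35, 50]
```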
how to calculate marginal distributions as proportions
table total: sum all frequencies in the table
row: sum frequencies across all columns for each row and divide by table total
column: sum frequencies across all rows for each column and divide by table total
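example (a minimal sketch with made-up counts): marginal distributions as proportions, following the three steps above. The variable names are illustrative assumptions.

```python
# Hypothetical two-way contingency table (illustrative counts only).
table = [
    [10, 20, 30],
    [5, 15, 20],
]

# Table total: sum all frequencies in the table.
table_total = sum(sum(row) for row in table)

# Row marginal proportions: each row sum divided by the table total.
row_props = [sum(row) / table_total for row in table]

# Column marginal proportions: each column sum divided by the table total.
col_props = [sum(col) / table_total for col in zip(*table)]

print(table_total)  # 100
print(row_props)    # [0.6, 0.4]
print(col_props)    # [0.15, 0.35, 0.5]
```

each set of marginal proportions should sum to 1, which is a quick check on the arithmetic.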
what are conditional distributions?
two-way tables that show the proportion of sampling units for one variable within each level of the second variable
how to calculate conditional distributions
- Identify the primary versus secondary variable. This determines whether you use the row or column marginal distribution
- For each cell in the new table, divide the value from the contingency table by the marginal distribution of the primary variable.
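example (a minimal sketch with made-up counts): the two steps above, assuming the row variable is the primary variable, so each cell is divided by its row's marginal frequency. If the column variable were primary, the column totals would be used instead.

```python
# Hypothetical two-way contingency table (illustrative counts only).
table = [
    [10, 20, 30],
    [5, 15, 20],
]

# Assumption: the row variable is primary, so use the row marginal frequencies.
row_totals = [sum(row) for row in table]

# Divide every cell by the marginal total of its own row.
conditional = [
    [cell / total for cell in row]
    for row, total in zip(table, row_totals)
]

print(conditional[1])  # [0.125, 0.375, 0.5]
```

with the row variable as primary, the proportions within each row sum to 1.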
advantages/disadvantages to a histogram
pros: provides a great way to visualize the pattern of relative abundance in your sampling units along the numerical variable
cons: it is cumbersome to display histograms when your dataset also has multiple levels of a categorical variable
what is a bin?
a small range of the numerical variable
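example (a minimal sketch; the values, the bin width of 2 and the starting point of 0 are all made up for illustration): counting how many sampling units fall in each bin, which is what the bar heights of a histogram show. Each bin here includes its lower edge and excludes its upper edge, which is one common convention and an assumption of this sketch.

```python
# Hypothetical measurements of a numerical variable.
values = [1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.5, 7.9]

start = 0.0      # lower edge of the first bin (assumed)
bin_width = 2.0  # size of each bin (assumed)
n_bins = 4       # bins: [0, 2), [2, 4), [4, 6), [6, 8)

counts = [0] * n_bins
for v in values:
    # Work out which bin the value falls in, then count it.
    index = int((v - start) // bin_width)
    counts[index] += 1

print(counts)  # [1, 3, 3, 1]
```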
4 parts to a box plot
- box
- solid line
- whiskers
- extreme values
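example (a minimal sketch with made-up data): the numbers behind the four parts, under a common convention that the card itself does not spell out. Assumptions: the box spans the first to third quartile, the solid line is the median, the whiskers reach the most distant values within 1.5 times the box length of the box, and extreme values are anything beyond that. Plotting software may compute quartiles slightly differently.

```python
import statistics

# Hypothetical measurements; 30 is deliberately far from the rest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Quartiles (the "inclusive" method treats the data as the full population).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # length of the box

# Assumed rule: values more than 1.5 * IQR beyond the box are extreme.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

inside = [x for x in data if low_fence <= x <= high_fence]
extreme_values = [x for x in data if x < low_fence or x > high_fence]

print(q1, q3)                    # box: 3.0 7.0
print(median)                    # solid line: 5.0
print(min(inside), max(inside))  # whisker ends: 1 8
print(extreme_values)            # extreme values: [30]
```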