module 4 Flashcards
contingency tables
- show the frequency (count) of sampling units
- tables of data frequencies within different levels of categorical variables
types of contingency tables
one-way and two-way tables
calculate marginal distributions as frequencies
- Row: sum frequencies across all columns for each row
- Column: sum frequencies across all rows for each column
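A minimal sketch of these row and column sums in Python, assuming the two-way table is stored as a list of rows. The table values and the function names `row_marginals` and `column_marginals` are hypothetical, for illustration only:

```python
# Hypothetical 2x3 contingency table: rows = levels of one categorical
# variable, columns = levels of the other; each cell counts sampling units.
table = [
    [12, 5, 8],
    [7, 10, 3],
]

def row_marginals(t):
    """Sum frequencies across all columns for each row."""
    return [sum(row) for row in t]

def column_marginals(t):
    """Sum frequencies across all rows for each column."""
    return [sum(col) for col in zip(*t)]  # zip(*t) walks the columns

print(row_marginals(table))     # [25, 20]
print(column_marginals(table))  # [19, 15, 11]
```

Both sets of marginals add up to the same table total (45 here), which is a quick check on the arithmetic.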
calculate marginal distributions as proportions
- Table total: sum all frequencies in the table
- Row: sum frequencies across all columns for each row and divide by table total
- Column: sum frequencies across all rows for each column and divide by table total
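The same steps as proportions, sketched in Python under the same assumption (table stored as a list of rows; the values and `marginal_proportions` are hypothetical):

```python
# Hypothetical 2x3 contingency table (counts of sampling units).
table = [
    [12, 5, 8],
    [7, 10, 3],
]

def marginal_proportions(t):
    """Return (row, column) marginal distributions as proportions."""
    total = sum(sum(row) for row in t)             # table total
    rows = [sum(row) / total for row in t]         # row sums / table total
    cols = [sum(col) / total for col in zip(*t)]   # column sums / table total
    return rows, cols

rows, cols = marginal_proportions(table)
print([round(p, 3) for p in rows])  # [0.556, 0.444]
print([round(p, 3) for p in cols])  # [0.422, 0.333, 0.244]
```

Each set of proportions sums to 1 (up to floating-point rounding).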
marginal distribution
- row and column sums of a two-way contingency table.
- can be shown as frequencies or proportions
what do marginal distributions show
- how many sampling units are in each level of one categorical variable, without taking the other categorical variable into account
- they describe overall patterns in the sample
conditional distributions
- two-way tables that show the proportion of sampling units for one variable within each level of the second variable
- shows the relationship between two variables
- shown as a separate table
how do you create a conditional distribution table
- calculated from contingency table and marginal distribution
- select one of the categorical variables to be primary and one to be secondary (aka conditional)
- divide each frequency in the contingency table by the marginal total for its level of the primary variable
- basically: take each value and divide it by the sum of its row (if the primary variable is in the rows) or its column (if the primary variable is in the columns)
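A minimal sketch of both directions in Python, assuming the table is a list of rows. The values and the names `conditional_by_row` / `conditional_by_column` are hypothetical; which one you use depends on whether the primary variable is in the rows or the columns:

```python
# Hypothetical 2x3 contingency table (counts of sampling units).
table = [
    [12, 5, 8],   # level 1 of the row variable
    [7, 10, 3],   # level 2 of the row variable
]

def conditional_by_row(t):
    """Primary = row variable: divide each cell by its row total, giving the
    distribution of the column variable within each row level."""
    return [[cell / sum(row) for cell in row] for row in t]

def conditional_by_column(t):
    """Primary = column variable: divide each cell by its column total, giving
    the distribution of the row variable within each column level."""
    col_totals = [sum(col) for col in zip(*t)]
    return [[cell / col_totals[j] for j, cell in enumerate(row)] for row in t]

by_row = conditional_by_row(table)
print([[round(p, 2) for p in row] for row in by_row])
# [[0.48, 0.2, 0.32], [0.35, 0.5, 0.15]]
by_col = conditional_by_column(table)
```

In `by_row` every row sums to 1; in `by_col` every column sums to 1.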
how do you choose primary and secondary variables in calculating a conditional distribution
- depends on the question being asked
- ex. are there more _____ (primary) than _____ (primary) in the _____ (secondary) category? or how many people like _____ (secondary) when doing _____ (primary)?
what do the primary and secondary variables in calculating conditional distributions determine
- whether you divide by the row or the column marginal totals
what do conditional distributions show
- relative frequency of each level of the secondary variable within each level of the primary variable
- shows how the secondary variable changes across the primary variable
t or f: bar graphs are only used to visualize single variable categorical data
false, they are used for single- and two-variable categorical data
t or f: bar graphs are good at visualizing numerical data
- false; acceptable in only one case, because a bar shows only a single (e.g. average) numerical value and hides how the data vary
- acceptable: when the data are not statistical in nature (statistical datasets have categorical information on many sampling units)
- not acceptable: when the data come from a statistical population with one numerical and one categorical variable
t or f: bar graphs can be horizontal or vertical
true; the choice depends on the focus of the research question, with the more relevant information on the horizontal axis
how do you display data w two categorical measurement variables in a bar graph
- designate one variable as the grouping variable (the base of the figure; the levels of the other variable are shown within each group), choosing whichever variable shows the information more clearly
- decide whether to create the figure as a grouped or stacked bar graph
what are the two types of two variable bar graphs
- grouped: within each level of the grouping variable on the x axis, a separate bar for each level of the other variable is shown side by side
- stacked: the levels of the other variable are stacked on top of each other, giving just one bar per level of the grouping variable on the x axis
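The difference between the two layouts can be sketched as the bar geometry each one needs. This is a hypothetical Python illustration (the counts, group names, `stacked_segments`, `grouped_positions`, and the 0.4 bar width are all assumptions); values like these are what you would pass as bar positions, `width`, and `bottom` offsets to a plotting function such as matplotlib's `bar`:

```python
# Hypothetical counts: outer keys = grouping variable (x axis),
# inner keys = levels of the other categorical variable.
counts = {
    "site A": {"yes": 12, "no": 5},
    "site B": {"yes": 7, "no": 10},
}

def stacked_segments(data):
    """Stacked: one bar per group; each level is a segment that starts
    where the previous segment ended."""
    out = {}
    for group, levels in data.items():
        bottom = 0
        segs = []
        for level, n in levels.items():
            segs.append((level, bottom, bottom + n))  # (level, start, end)
            bottom += n
        out[group] = segs
    return out

def grouped_positions(data, width=0.4):
    """Grouped: the bars for each level sit side by side, centred on the
    group's x position (0, 1, 2, ...). With two levels and width 0.4 each
    group spans 0.8 units, leaving a gap between groups."""
    out = {}
    for x, (group, levels) in enumerate(data.items()):
        k = len(levels)
        out[group] = [
            (level, x + (i - (k - 1) / 2) * width, n)  # (level, bar centre, height)
            for i, (level, n) in enumerate(levels.items())
        ]
    return out

print(stacked_segments(counts)["site A"])  # [('yes', 0, 12), ('no', 12, 17)]
```

In the stacked layout only the bottom segment ("yes") starts at 0, so the "no" segments of different bars do not share a common baseline.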
when should you not use stacked bar graphs
when the values you want to compare are the same or very similar, because stacked segments don't share a common baseline and are hard to compare by eye
each bar or group of bars in a bar graph should be separated by a ______
gap
steps in making a histogram
- divide the numerical variable into bins of equal width
- count how many sampling units fall within each bin (frequency)
- create a plot with one bar per bin, with a height equal to that bin's frequency
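A minimal Python sketch of the first two steps (binning and counting), assuming bins of the form [start + i·width, start + (i+1)·width). The data, `histogram_counts`, and the choice of start and width are hypothetical:

```python
import math

def histogram_counts(values, start, bin_width):
    """Count values in equal-width bins [start + i*w, start + (i+1)*w)."""
    if min(values) < start:
        raise ValueError("start must be <= the smallest value")
    # enough bins that the largest value still lands in the last one
    n_bins = math.floor((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / bin_width)] += 1  # index of v's bin
    return counts

# Hypothetical measurements (e.g. lengths in cm)
data = [2.1, 3.4, 3.9, 4.0, 4.5, 5.2, 5.8, 6.1, 7.7]
counts = histogram_counts(data, start=2, bin_width=2)
print(counts)  # [3, 4, 2]  -> bins [2, 4), [4, 6), [6, 8)
```

The final step draws one bar per count, with adjacent bars touching because the bins are adjacent ranges.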
t or f: histograms show the separation of variables through a gap between the bars
false; histogram bars touch because the bins are adjacent ranges of one numerical variable
histograms
- split numerical data into bins and display the number of sampling units in each bin
- for numerical data
what are the lines drawn in a box plot from the edge of the box to the last data point within the extreme-value threshold called
whiskers (which is why box plots are also called box-and-whisker plots)
box plot
- based on quartiles
- shows 5 descriptive stats (minimum, 1st quartile, median, 3rd quartile, and maximum)
- shows interquartile range
- shows how numerical variables change across multiple categorical groups
- categorical groups are equally spaced across the x axis, with a box drawn for each group
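A sketch of the numbers behind each box, using Python's standard `statistics.quantiles` (Python 3.8+). Note the assumption: there are several conventions for computing quartiles, and this uses that function's default `"exclusive"` method, so other software may give slightly different Q1/Q3. The groups, data, and `five_number_summary` are hypothetical:

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum, plus the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values), "iqr": q3 - q1}

# One box per categorical group (hypothetical data)
groups = {
    "group A": [1, 3, 4, 6, 7, 8, 9, 12, 15],
    "group B": [2, 5, 5, 6, 8, 10, 11],
}
summaries = {g: five_number_summary(v) for g, v in groups.items()}
print(summaries["group A"])
# {'min': 1, 'q1': 3.5, 'median': 7.0, 'q3': 10.5, 'max': 15, 'iqr': 7.0}
```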
t or f: box plots can sometimes show extreme values
true; values beyond the whisker thresholds are plotted as individual points
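These notes don't say which extreme-value threshold is used. A common convention (assumed here, not taken from the notes) is 1.5 × IQR beyond the quartiles: whiskers end at the most extreme data points inside those fences, and anything outside is drawn as an extreme value. A hypothetical Python sketch, with the same quartile caveat as for `statistics.quantiles` above:

```python
import statistics

def whiskers_and_outliers(values, k=1.5):
    """Assumes the 1.5 x IQR rule: points below Q1 - k*IQR or above
    Q3 + k*IQR are extreme; whiskers end at the last points inside."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return (min(inside), max(inside)), outliers

data = [1, 3, 4, 6, 7, 8, 9, 12, 30]
print(whiskers_and_outliers(data))  # ((1, 12), [30])
```

Here Q1 = 3.5 and Q3 = 10.5, so the fences are -7.0 and 21.0; 30 lies beyond the upper fence and the upper whisker stops at 12.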