Module 4 Flashcards
What is a contingency table?
A table that displays the frequencies or proportions of data within the different levels of a categorical variable.
What are one-way and two-way contingency tables?
- they just refer to the number of categorical variables you observe for each sampling unit (one variable for a one-way table, two for a two-way table)
What are marginal distributions?
- the row and column sums of a two-way contingency table. They can be shown as frequencies or proportions.
- one way to see overall patterns in the data
- found by calculating the row and column frequencies
How to find marginal distributions in rows vs columns?
rows: sum frequencies across all columns for each row
columns: sum frequencies across all rows for each column
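Worked example (a minimal sketch, not from the module): the table below uses made-up counts, and the names `table`, `row_marginals`, and `col_marginals` are hypothetical. It shows the row sums, the column sums, and the row marginals expressed as proportions.

```python
# Hypothetical two-way contingency table (made-up counts for illustration).
# Rows are levels of one categorical variable, columns are levels of the other.
table = {
    "forest": {"sparrow": 12, "robin": 8},
    "field": {"sparrow": 20, "robin": 10},
}

# Row marginals: sum frequencies across all columns for each row.
row_marginals = {row: sum(cols.values()) for row, cols in table.items()}

# Column marginals: sum frequencies across all rows for each column.
col_names = list(next(iter(table.values())))
col_marginals = {c: sum(table[r][c] for r in table) for c in col_names}

# Marginals can also be shown as proportions of the grand total.
total = sum(row_marginals.values())
row_proportions = {r: n / total for r, n in row_marginals.items()}

print(row_marginals)    # {'forest': 20, 'field': 30}
print(col_marginals)    # {'sparrow': 32, 'robin': 18}
print(row_proportions)  # {'forest': 0.4, 'field': 0.6}
```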
What are conditional distributions?
two-way tables that show the proportion of sampling units for one variable within each level of the second variable. They show the interaction between the categorical variables (shown as a separate table).
How do you create a conditional distribution?
select one of the categorical variables to be the primary variable and the other one to be the secondary (conditional) variable
How are conditional distributions calculated?
calculated as the frequency from the contingency table divided by the marginal distribution of the primary variable
- identify primary and secondary variable
- for each cell in the new table, divide the value from the contingency table by the marginal distribution of the primary variable
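Worked example (a sketch under assumptions): the counts are made up, the row variable is assumed to be the primary variable, and `conditional_distribution` is a hypothetical helper name. Each cell is divided by the marginal total of its primary-variable level, so every row of the new table sums to 1.

```python
# Hypothetical contingency table (made-up counts).
# Assumption: rows (habitat) are the primary variable, columns the secondary.
table = {
    "forest": {"sparrow": 12, "robin": 8},
    "field": {"sparrow": 20, "robin": 10},
}


def conditional_distribution(table):
    """Divide each cell by the marginal total of its primary-variable level."""
    result = {}
    for level, counts in table.items():
        marginal = sum(counts.values())  # marginal for this primary level
        result[level] = {k: v / marginal for k, v in counts.items()}
    return result


cond = conditional_distribution(table)
for level, proportions in cond.items():
    print(level, {k: round(p, 3) for k, p in proportions.items()})
```

Reading across a row shows how the secondary variable is distributed within that level of the primary variable.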
What do conditional distributions show us about the variables?
they allow us to see how the secondary variable changes across the levels of the primary variable
What is a bar graph?
used to visualize categorical data
vertical or horizontal orientation
What are two variable bar graphs?
- can be used to display data with two categorical measurement variables
- designate one variable as the grouping variable (forms the base of the figure, and levels of the other variable are shown within each level)
- next step: do we create it as a grouped bar chart or a stacked bar chart?
What type of variable is good for a grouping variable?
ordinal categorical variables
What are grouped bar charts?
- the levels of the second variable are shown beside each other within each level of the grouping variable
- levels of grouping variable are separated using a large gap
What is a stacked bar chart?
- levels of the second variable are stacked on top of one another within each level of the grouping variable
- just one bar for each level of the grouping variable (color is used to separate the levels of the second variable)
What are histograms?
- visualize numerical data
- split numerical data into bins of equal size and display the number of sampling units in each bin
What are the three steps for creating a histogram?
- divide the numerical variable into a number of bins of equal size
- count how many sampling units fit within each bin (frequency)
- create a plot where each bin has a bar with a height equal to the frequency of that bin, and make sure there are no gaps between the bars
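Worked example (a minimal sketch, assuming made-up values): `histogram_counts` is a hypothetical helper that follows the three steps above, and the text bars stand in for a real plot. It assumes the values are not all identical, so the bin width is non-zero.

```python
def histogram_counts(values, n_bins):
    """Split the range of values into equal-width bins and count each bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # step 1: bins of equal size
    counts = [0] * n_bins
    for v in values:
        # step 2: find the bin for this value; the maximum goes in the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


values = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]  # made-up numerical data
edges, counts = histogram_counts(values, 3)

# Step 3: one bar per bin, with height (here, length) equal to the frequency.
for i, count in enumerate(counts):
    print(f"{edges[i]:>4.1f} to {edges[i + 1]:>4.1f} | {'#' * count}")
```

With these values the bins are 1 to 4, 4 to 7, and 7 to 10, with frequencies 4, 3, and 3.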
Advantages of histograms?
- a good way to visualize the pattern of relative abundance in your sampling units along the numerical variable
Disadvantage of histogram?
it is complicated to display histograms when the dataset has many levels of a categorical variable
What is a bin?
a small range of the numerical variable. The numerical variable is divided into a number of bins of equal size forming the base of the figure.
What are box plots?
- visualize numerical data
- based on quartiles, and popular because they show five descriptive statistics in a relatively compact design
What do boxplots show?
- 1st quartile
- 3rd quartile
- minimum
- median
- maximum
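Worked example (a sketch, assuming made-up data): the five statistics can be computed with Python's standard `statistics` module. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is an assumption, and `five_number_summary` is a hypothetical helper name.

```python
import statistics


def five_number_summary(values):
    """Return the five descriptive statistics a box plot displays."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


data = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # made-up numerical data
summary = five_number_summary(data)
print(summary)
```

For this data the box would span 4 to 9 (1st to 3rd quartile) with the median line at 7 and whiskers reaching 2 and 12.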
What happens with grouped box plots for categorical groups?
- designate one categorical variable as the grouping variable
- the second as the secondary variable
- grouping variable forms the base of the figure
- the levels of the secondary variable are shown within each level of the grouping variable
When should we use histograms vs box plots?
If you have numerical data for a small number of categorical groups and want to showcase the shape of the data distribution, then histograms are the better choice.
If you have many categorical groups, or are not interested in showcasing the shape of the data distribution, then use box plots.
What is a scatter plot?
- used to visualize the relationship between two numerical variables
- each point on the scatterplot is a sampling unit
- both numerical variables are measured from the same sampling unit
What is the independent vs dependent variable?
independent: the experimental treatment that is manipulated
dependent: the measured response under those treatments
What are line plots?
used when you have data on two numerical variables, and where the researcher has taken repeated measures from the same sampling unit
the repeated measurements for each sampling unit are connected together by a line so that the viewer knows the data points are not independent of each other