Module 4: Visualizing Data Flashcards
Contingency table
-shows frequency or proportion of sampling units in each level of a categorical variable
-the frequency is simply the number of sampling units that fall in each level
Contingency table proportions
-help with visualizing the relative distribution of sampling units among the levels
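A minimal sketch of a one-way contingency table, assuming a made-up sample of eye colours (the data and variable names are hypothetical, chosen only for illustration):

```python
from collections import Counter

# Hypothetical sample: one categorical variable (eye colour) per sampling unit.
observations = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]

# Frequency: the number of sampling units that fall in each level.
frequencies = Counter(observations)

# Proportion: each frequency divided by the total number of sampling units.
total = sum(frequencies.values())
proportions = {level: count / total for level, count in frequencies.items()}

# Print the table: level, frequency, proportion.
for level in frequencies:
    print(f"{level:>6}  {frequencies[level]:>3}  {proportions[level]:.3f}")
```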
One-way and two-way categorical data
-just refer to the number of categorical variables you observe for each sampling unit
Marginal distributions
-show the totals for each level of one variable, summed over the levels of the other; a good way to see overall patterns in the data
Row marginal distribution
-shows the total counts for each row across all columns
Column marginal distribution
-shows the total counts for each column across all rows
To calculate marginal distributions as proportions
-sum all frequencies in the table
-row: sum frequencies across all columns for each row and divide by table total
-column: sum frequencies across all rows for each column and divide by table total
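The three steps above can be sketched as follows, assuming a hypothetical two-way table of bird counts by habitat (the table and all names are invented for illustration):

```python
# Hypothetical two-way table of frequencies: rows = habitat, columns = species.
table = {
    "forest": {"sparrow": 12, "robin": 8},
    "meadow": {"sparrow": 6, "robin": 14},
}
column_names = ["sparrow", "robin"]

# Step 1: sum all frequencies in the table.
table_total = sum(sum(row.values()) for row in table.values())

# Step 2 (row): sum across all columns for each row, divide by the table total.
row_marginals = {
    row: sum(cells.values()) / table_total for row, cells in table.items()
}

# Step 3 (column): sum across all rows for each column, divide by the table total.
column_marginals = {
    col: sum(table[row][col] for row in table) / table_total
    for col in column_names
}

print(row_marginals)
print(column_marginals)
```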
Conditional distributions
-shows the interaction between categorical variables (shown as a separate table)
-to create one, you first have to select one of the variables to be the primary variable and the other to be the secondary (conditional) variable
Seeing the pattern
-conditional distributions allow us to see how the secondary variable changes across the primary variable
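A minimal sketch of a conditional distribution, assuming the usual calculation (each cell divided by the total for its level of the primary variable) and a hypothetical table where habitat is the primary variable and species is the secondary variable:

```python
# Hypothetical two-way table: primary variable = habitat, secondary = species.
table = {
    "forest": {"sparrow": 12, "robin": 8},
    "meadow": {"sparrow": 6, "robin": 14},
}

# For each level of the primary variable, divide every cell by that level's
# total, so the proportions within each level sum to 1.
conditional = {}
for primary_level, cells in table.items():
    level_total = sum(cells.values())
    conditional[primary_level] = {
        secondary_level: count / level_total
        for secondary_level, count in cells.items()
    }

# Reading across levels shows how the secondary variable changes.
for primary_level, dist in conditional.items():
    print(primary_level, dist)
```

In this made-up example the proportion of sparrows drops from the forest row to the meadow row, which is the kind of pattern a conditional distribution is meant to reveal.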
Bar graphs
-can be used to visualize categorical data
-most relevant information should be on the horizontal axis
-two-variable bar graphs: first designate one variable as the grouping variable, then create a grouped bar chart or stacked bar chart
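A rough text-only sketch of the difference between grouped and stacked layouts, assuming hypothetical counts with habitat as the grouping variable (a plotting library would normally draw the bars; only the layout logic is shown here):

```python
# Hypothetical counts: grouping variable = habitat, second variable = species.
counts = {
    "forest": {"sparrow": 12, "robin": 8},
    "meadow": {"sparrow": 6, "robin": 14},
}

# Grouped bar chart: one bar per (group, level) pair, clustered by group.
for group, levels in counts.items():
    print(group)
    for level, count in levels.items():
        print(f"  {level:<8} {'#' * count} {count}")

# Stacked bar chart: one bar per group; each segment starts where the
# previous segment ended, so the bar's full height is the group total.
stacked = {}
for group, levels in counts.items():
    bottom = 0
    segments = []
    for level, count in levels.items():
        segments.append((level, bottom, bottom + count))
        bottom += count
    stacked[group] = segments

print(stacked)
```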
Histograms
-divide numerical data into a number of bins of equal size
-count how many sampling units fit within each bin (frequency)
-create a plot where each bin has a bar with a height equal to the frequency of that bin
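The three steps above can be sketched as follows, assuming made-up measurements and a hypothetical choice of four equal-width bins spanning 0 to 20:

```python
# Hypothetical numerical data, one value per sampling unit.
values = [1, 3, 4, 6, 7, 7, 8, 11, 12, 15, 18, 19]

# Step 1: divide the range into bins of equal size (assumed range and count).
lower, upper, n_bins = 0, 20, 4
bin_width = (upper - lower) / n_bins

# Step 2: count how many sampling units fall within each bin.
counts = [0] * n_bins
for v in values:
    # A value equal to the upper limit is placed in the last bin.
    index = min(int((v - lower) / bin_width), n_bins - 1)
    counts[index] += 1

# Step 3: draw one bar per bin with height equal to that bin's frequency.
for i, count in enumerate(counts):
    left = lower + i * bin_width
    print(f"{left:>4.0f}-{left + bin_width:<4.0f} {'#' * count} {count}")
```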
Box plots
-box: drawn from 1st quartile to 3rd quartile
-solid line: drawn at the median value
-extreme threshold: 1.5 x IQR above the 3rd quartile and below the 1st quartile (the edges of the box)
-whiskers: drawn from each edge of the box to the last data point within the threshold
-extreme values: symbols drawn over any data points outside the extreme threshold
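A minimal sketch of the numbers behind a box plot, assuming made-up data. Quartile conventions differ between textbooks and software; the "inclusive" method of Python's statistics.quantiles is used here as one assumption among several valid choices:

```python
import statistics

# Hypothetical numerical data, one value per sampling unit.
data = [1, 3, 4, 5, 6, 7, 8, 9, 20]

# Box: 1st quartile to 3rd quartile; solid line at the median.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Extreme threshold: 1.5 x IQR beyond each edge of the box.
lower_threshold = q1 - 1.5 * iqr
upper_threshold = q3 + 1.5 * iqr

# Whiskers: reach to the last data points still within the threshold.
within = [x for x in data if lower_threshold <= x <= upper_threshold]
whisker_low, whisker_high = min(within), max(within)

# Extreme values: data points outside the threshold, drawn as symbols.
extreme_values = [x for x in data if x < lower_threshold or x > upper_threshold]

print(q1, median, q3, iqr)
print(whisker_low, whisker_high, extreme_values)
```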
Scatter plot
-used to visualize the relationship between 2 numerical variables; each point on the scatter plot is a sampling unit
Line plot
-used when you have data on 2 numerical variables and the researcher has taken repeated measures from the same sampling unit (data points are not independent of each other)