Module 2 - Section 1 Flashcards
nature of the variable
categorical (qualitative) or numerical (quantitative)
graph types for a categorical variable
bar chart, pie chart
graph types for a numerical variable
dot plots, stem plots, histograms, time plots, box plots, scatter plots
distribution of a categorical variable
how the observations are distributed across the categories
frequency
number of individuals who fall into each category
AKA count
relative frequency
fraction of individuals who fall into each category, out of the total number of individuals
frequency / total number of observations
percentage of relative frequency
(frequency/total # of observations) x 100%
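The three quantities above (frequency, relative frequency, percentage) can be sketched in a few lines of Python. The observations and the helper name `frequency_table` are made up for illustration and are not part of the course material.

```python
from collections import Counter

# Hypothetical observations of one categorical variable (made-up data).
observations = ["red", "blue", "red", "green", "red", "blue", "red", "green"]

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percent)}."""
    counts = Counter(values)          # frequency (count) per category
    total = len(values)               # total number of observations
    return {
        category: (n, n / total, 100 * n / total)
        for category, n in counts.items()
    }

table = frequency_table(observations)
for category, (freq, rel, pct) in table.items():
    print(f"{category:6s} {freq:3d} {rel:.3f} {pct:.1f}%")
```

The relative frequencies always sum to 1 and the percentages to 100%, which is a quick check on any such table.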
frequency distribution table
categories in one column, frequency in next column
relative frequency table
columns: category, relative frequency, percents
purpose of bar chart
shows frequency or percent in the different categories
purpose of pie chart
shows relationship between the parts and the whole
bar chart (appearance)
Title at the top
y-axis (vertical): frequency or relative frequency
x-axis (horizontal): categories
one bar per category
frequency vs relative frequency bar charts
appear the same, just different scale on the y-axis
pie chart
circular chart; emphasizes each category's share of the whole
slice angle = relative frequency of the category x 360 degrees
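As a small worked example of the slice-size rule, the sketch below converts category counts into slice angles. The counts and the function name `slice_angles` are hypothetical.

```python
def slice_angles(frequencies):
    """Map each category to its pie-slice angle in degrees."""
    total = sum(frequencies.values())
    # relative frequency x 360 degrees
    return {category: n / total * 360 for category, n in frequencies.items()}

# Hypothetical counts (made-up data).
counts = {"red": 4, "blue": 2, "green": 2}
angles = slice_angles(counts)
print(angles)
```

With these counts, half of the observations are red, so the red slice covers half of the circle (180 degrees).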
contingency table
shows relationship between two categorical variables
appearance of contingency table
Title
                     smoker    non-smoker    total
heart disease: Y       #1          #2        #1+#2
heart disease: N       #3          #4        #3+#4
total                #1+#3       #2+#4       #1+#2+#3+#4
marginal distribution
the totals along the bottom and right margins of a contingency table; each set of totals gives the frequency distribution of its respective variable
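A minimal sketch of how the marginal totals could be computed from a two-way table. The counts are invented for illustration, and the nested-dictionary layout (rows = heart disease, columns = smoking status) is just one possible representation.

```python
# Hypothetical counts (made-up data): rows = heart disease, columns = smoking status.
table = {
    "yes": {"smoker": 30, "non-smoker": 20},
    "no":  {"smoker": 70, "non-smoker": 180},
}

def marginals(table):
    """Return (row totals, column totals, grand total) of a two-way table."""
    # Right margin: one total per row.
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    # Bottom margin: one total per column.
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return row_totals, col_totals, sum(row_totals.values())

row_totals, col_totals, grand_total = marginals(table)
print(row_totals, col_totals, grand_total)
```

The row totals are the marginal distribution of heart disease, and the column totals are the marginal distribution of smoking status.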
conditional distribution
shows the distribution of one variable for only the individuals who satisfy some condition on the other variable
Example: among smokers, the proportions who do and do not have heart disease.
Example: among those with heart disease, the proportions who do and do not smoke.
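The two examples on this card can be sketched as follows. The counts are made up, and the helper names `conditional_on_column` and `conditional_on_row` are hypothetical.

```python
# Hypothetical counts (made-up data): rows = heart disease, columns = smoking status.
table = {
    "yes": {"smoker": 30, "non-smoker": 20},
    "no":  {"smoker": 70, "non-smoker": 180},
}

def conditional_on_column(table, column):
    """Distribution of the row variable among individuals in one column."""
    column_total = sum(cells[column] for cells in table.values())
    return {row: cells[column] / column_total for row, cells in table.items()}

def conditional_on_row(table, row):
    """Distribution of the column variable among individuals in one row."""
    row_total = sum(table[row].values())
    return {col: n / row_total for col, n in table[row].items()}

# Heart disease status among smokers only.
among_smokers = conditional_on_column(table, "smoker")
# Smoking status among those with heart disease only.
among_diseased = conditional_on_row(table, "yes")
print(among_smokers, among_diseased)
```

Each conditional distribution divides by the total of the conditioning group, not by the grand total, so each one sums to 1 on its own.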
Associated
if the conditional distributions of one variable differ across the categories of the other variable, then the two variables are associated.
AKA dependent or not independent
Independent
if the conditional distribution of one variable is the same for every category of the other variable, then the variables are independent.
AKA no association or not dependent
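One way to check the definitions of associated and independent is to compare the conditional distributions column by column. This is only a sketch: both tables are made up, `looks_independent` is a hypothetical helper, and the tolerance is an arbitrary assumption, since conditional distributions from real sample data rarely match exactly.

```python
def conditional_on_column(table, column):
    """Distribution of the row variable among individuals in one column."""
    column_total = sum(cells[column] for cells in table.values())
    return {row: cells[column] / column_total for row, cells in table.items()}

def looks_independent(table, tolerance=1e-9):
    """True if every column has the same conditional distribution."""
    columns = list(next(iter(table.values())))
    reference = conditional_on_column(table, columns[0])
    return all(
        abs(conditional_on_column(table, col)[row] - reference[row]) <= tolerance
        for col in columns[1:]
        for row in table
    )

# Hypothetical counts (made-up data): rows = heart disease, columns = smoking status.
associated_table = {
    "yes": {"smoker": 30, "non-smoker": 20},
    "no":  {"smoker": 70, "non-smoker": 180},
}
independent_table = {
    "yes": {"smoker": 10, "non-smoker": 20},
    "no":  {"smoker": 40, "non-smoker": 80},
}

print(looks_independent(associated_table))
print(looks_independent(independent_table))
```

In the first table the share with heart disease differs between smokers and non-smokers, so the variables are associated. In the second table the share is the same in both groups, so the variables are independent.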