Descriptive Stats Flashcards
Descriptive stats vs. inferential stats
D: describe, organize, or summarize data
I: generalize from sample of data to larger groups of subjects using inductive reasoning
Population vs. sample
Pop: largest collection of entities about which an investigator wishes to draw conclusions
Sample: subset of population actually being studied
Probability sample
Investigator can specify chance of subject being selected
4 types of probability samples
simple random
stratified random
cluster
systematic
simple random sample
all members of pop have equal chance to be selected; “representative” if resembles source population
stratified random sample
population divided into groups with shared characteristics, random samples from each group; may be more representative of population
cluster samples
ex: randomly select 5 schools then randomly select equal # students from each
used when too expensive/ labor intensive to use other methods
systematic samples
systematic selection of subjects, e.g. every 5th pt admitted to hospital
may approximate a simple random sample without explicit randomization, but prone to selection bias if the ordering of subjects follows a systematic pattern
stratified vs. cluster sampling
strata are homogeneous; members are randomly selected from every stratum
clusters are heterogeneous “natural groupings”; whole clusters are selected at random
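A minimal sketch of the four probability samples, using only Python's standard library. The population, the "ward" grouping, and all sample sizes are hypothetical and chosen just for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100 patient IDs, each tagged with one of 5 wards.
population = [{"id": i, "ward": "ABCDE"[i % 5]} for i in range(100)]

# Simple random: every member has an equal chance of selection.
simple = random.sample(population, 10)

# Systematic: every 10th member, starting from a random offset.
start = random.randrange(10)
systematic = population[start::10]

# Group members by ward for the two group-based designs.
by_ward = {}
for person in population:
    by_ward.setdefault(person["ward"], []).append(person)

# Stratified: draw a random sample from EVERY group (stratum).
stratified = [p for group in by_ward.values() for p in random.sample(group, 2)]

# Cluster: randomly pick a few groups, then sample equally within only those.
chosen_wards = random.sample(sorted(by_ward), 2)
cluster = [p for w in chosen_wards for p in random.sample(by_ward[w], 5)]
```

Note the contrast: the stratified sample contains members of all 5 wards, while the cluster sample contains members of only the 2 wards that were drawn.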
probability addition rule
prob (Ex or Ey) = prob(Ex) + prob(Ey)
*assuming events are mutually exclusive; otherwise subtract prob(Ex and Ey)
probability multiplication rule
prob (Ex and Ey) = prob(Ex)*prob(Ey)
*assuming events are independent of each other
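A small worked example of both rules with a fair six-sided die (the die is an illustrative assumption, not part of the cards). The addition rule is applied to two mutually exclusive faces of one roll, and the multiplication rule to two independent rolls; both are checked by counting outcomes.

```python
from fractions import Fraction

# One fair six-sided die: each face has probability 1/6.
p_face = Fraction(1, 6)

# Addition rule (mutually exclusive events): P(roll a 1 or a 2) on one roll.
p_one_or_two = p_face + p_face

# Multiplication rule (independent events): P(first roll is 1 and second roll is 2).
p_one_then_two = p_face * p_face

# Cross-check both results by enumerating every possible outcome.
faces = range(1, 7)
counted_or = Fraction(sum(f in (1, 2) for f in faces), 6)
pairs = [(a, b) for a in faces for b in faces]
counted_and = Fraction(sum(pair == (1, 2) for pair in pairs), len(pairs))
```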
probability binomial distribution
probability of each possible number of occurrences of one of 2 mutually exclusive outcomes (e.g. success/ failure) across a fixed number of independent trials
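As a sketch, the binomial probability of exactly k occurrences in n independent trials is C(n, k) * p^k * (1 - p)^(n - k). The helper name `binomial_pmf` and the coin-flip example are illustrative choices.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: exactly 2 heads in 3 fair coin flips.
p_two_heads = binomial_pmf(2, 3, 0.5)

# The probabilities over all possible k (0..n) should sum to 1.
total = sum(binomial_pmf(k, 3, 0.5) for k in range(4))
```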
4 types of variables
nominal
ordinal
interval
ratio
nominal variables
“categorical”
names or labels with no inherent order, e.g. race and gender
*includes dichotomous/ binomial data
ordinal variables
“ranked”
natural order exists but not evenly spaced, e.g. cancer grade, pain score, scale from best to worst
interval variables
spaces between adjacent scale values are equal, but there is no absolute 0, e.g. temperature in °C; ratios are not meaningful (100 °C is not twice as hot as 50 °C)
ratio variables
true zero means absence of the quantity being measured, so ratios can be calculated; e.g. annual income, BP, Kelvin temperature
discrete variable
takes only certain limited values, no in-between; e.g. race, gender, # of anything
continuous variables
may take any value though may have limited range; e.g. weight, age, BP
dependent variable
outcome of interest in study that is expected to change based on intervention
independent variable
intervention, exposure, or factor that may influence dependent variable
3 ways to summarize data
frequency distribution
measures of central tendency
measures of dispersion
frequency distribution
frequency = number of data points within each category; relative frequency (percent) = frequency / total number of data points
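A minimal sketch of a frequency distribution using `collections.Counter`. The pain-score data are made up for illustration.

```python
from collections import Counter

# Hypothetical ordinal data: pain ratings from 8 patients.
pain_scores = ["mild", "severe", "mild", "moderate",
               "mild", "moderate", "severe", "mild"]

# Frequency: count of data points in each category.
freq = Counter(pain_scores)

# Relative frequency: each count divided by the total number of data points.
total = sum(freq.values())
relative = {category: count / total for category, count in freq.items()}
```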
measures of central tendency
mean, median, mode
measures of dispersion/ spread/ variation
range (max - min), interquartile range, variance, SD, coefficient of variation
variance equation
s^2 = (sum[{xi-xbar}^2]) / (n-1)
xi - xbar = deviation score
coefficient of variation equation
SD / mean
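The measures above can be sketched with Python's `statistics` module. The data set is invented for illustration, and quartile conventions differ between textbooks and software, so the interquartile range here reflects only the module's default method.

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

# Central tendency
mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)

# Dispersion
value_range = max(data) - min(data)
q1, q2, q3 = st.quantiles(data, n=4)   # quartile cut points (default method)
iqr = q3 - q1
variance = st.variance(data)           # sample variance, divides by n - 1
sd = st.stdev(data)                    # square root of the sample variance
cv = sd / mean                         # coefficient of variation

# The same variance computed directly from the deviation scores (xi - xbar).
manual_variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
```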
percent of population within 1, 2, and 3 SDs of mean
1: ~68%
2: ~95%
3: ~99.7%
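These percentages can be checked against the standard normal curve. The sketch below uses `statistics.NormalDist` (Python 3.8+) and assumes the data are normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Share of the distribution within k SDs of the mean:
# area to the left of +k minus area to the left of -k.
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
# within[1], within[2], within[3] come out at roughly 0.68, 0.95, 0.997
```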
left/negative skew and right/positive skew
normal curve: mode = mean = median
left: mean < median < mode (tail points left)
right: mean > median > mode (tail points right)
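A quick illustration of how a long tail pulls the mean away from the median and mode. Both data sets are invented; the second is simply the first with its signs flipped.

```python
import statistics as st

# One large value stretches the right tail (right/positive skew).
right_skewed = [1, 1, 1, 2, 3, 4, 5, 20]
# Mirror image: one very small value stretches the left tail (left/negative skew).
left_skewed = [-x for x in right_skewed]

right_stats = (st.mode(right_skewed), st.median(right_skewed), st.mean(right_skewed))
left_stats = (st.mode(left_skewed), st.median(left_skewed), st.mean(left_skewed))

# Right skew: mode < median < mean. Left skew: mean < median < mode.
```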
what is z-score
indicates how many SDs an element is from the mean in a normal distribution
z-score equation
z = (X - u) / sigma
X = sample statistic/ estimate
u = mean
sigma = SD
z-score and directionality
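a standard (cumulative) z-score table gives the area to the left of a given z-score; positive z = value above the mean, negative z = value below the mean
for the area to the right of the z-score, use 1 - table value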
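A short sketch of a z-score calculation and the matching table lookup. `NormalDist().cdf` stands in for a printed cumulative z-table; the mean, SD, and observed value are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 120, 10   # hypothetical mean and SD (e.g. systolic BP)
x = 135               # hypothetical observed value

# z-score: how many SDs the value lies from the mean.
z = (x - mu) / sigma

# Area to the left of z (what a cumulative z-table reports).
left_area = NormalDist().cdf(z)

# Area to the right of z is the complement.
right_area = 1 - left_area
```

Here z is positive, so the value lies above the mean and more than half of the distribution falls to its left.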