Biostatistics and Research Design Flashcards
what test to run?
comparing blood pressure in patients before and after taking their meds
paired t-test
DV: BP (continuous)
IV: before/after medication (2 paired observations)
what test to run?
comparing blood pressure values between patients on vs off their medication (2 separate groups)
2-sample t-test
DV: BP (continuous)
IV: on/off medication (2 samples, binary)
what test to run?
comparing length of hospital stay to age
correlation
DV: length of stay (continuous)
IV: age (continuous)
what test to run?
comparing length of hospital stay to age and also mobility
linear regression
DV: length of stay (continuous)
IV: age (continuous) and mobility (confounding factor)
what test to run?
comparing mortality of hospital patients by age and also mobility
logistic regression
DV: mortality (binary)
IV: age (continuous) and mobility (confounding variable)
what test to run?
comparing mortality of patients with high vs low blood glucose
chi-squared (2x2 table)
DV: mortality (binary)
IV: high vs low glucose (binary)
what test to run?
comparing patient Hgb A1c levels across 3 types of diet
ANOVA
DV: Hgb A1c (continuous)
IV: type of diet (more than 2 samples)
what test to run?
DV: binary
IV: binary
chi-squared (2x2 table)
what test to run?
DV: binary
IV: continuous or categorical/binary + confounding variables
logistic regression
what test to run?
DV: continuous
IV: 2 paired observations
paired t-test
what test to run?
DV: continuous
IV: 2 samples (binary)
2-sample t-test
what test to run?
DV: continuous
IV: continuous
correlation
what test to run?
DV: continuous
IV: continuous or categorical/binary + confounding variables
linear regression
what test to run?
DV: continuous
IV: more than 2 samples
ANOVA
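The decision cards above can be condensed into a single lookup. This is a minimal sketch, assuming only the two DV types and four IV shapes used on these cards; `choose_test` and its string labels are hypothetical names, not from any statistics library.

```python
def choose_test(dv: str, iv: str, confounders: bool = False) -> str:
    """Map the flashcard rules to a test name (hypothetical helper).

    dv: "continuous" or "binary"
    iv: "paired" (2 paired observations), "binary" (2 independent samples),
        "multi" (more than 2 groups), or "continuous"
    confounders: True when adjusting for additional (confounding) variables
    """
    if dv == "binary":
        # binary outcome + continuous IV or extra covariates -> logistic regression
        if confounders or iv == "continuous":
            return "logistic regression"
        if iv == "binary":
            return "chi-squared (2x2 table)"
    elif dv == "continuous":
        if confounders:
            return "linear regression"
        if iv == "paired":
            return "paired t-test"
        if iv == "binary":
            return "2-sample t-test"
        if iv == "continuous":
            return "correlation"
        if iv == "multi":
            return "ANOVA"
    raise ValueError(f"no flashcard rule for dv={dv!r}, iv={iv!r}")


print(choose_test("continuous", "paired"))
print(choose_test("binary", "continuous", confounders=True))
```

Combinations the cards do not cover (e.g. a binary DV across 3+ groups) raise an error rather than guessing.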
type 1 error
false positive - reject the null hypothesis (detect a difference) when the null hypothesis is true
alpha: probability of false positive
type 2 error
false negative - fail to reject null hypothesis when there is truly a difference
beta: probability of false negative
how to calculate power?
1 - beta
aka
1 - the probability of a false negative
what is the probability of a true negative in relation to alpha,
and the probability of a true positive in relation to beta?
probability of true negative = 1 - alpha (probability of false positive)
probability of true positive = 1 - beta (probability of false negative)
*false positive = type 1 error, false negative = type 2 error
what does the p value represent?
probability of obtaining a result at least as extreme as your findings (further out toward the tails of the bell curve), assuming the null hypothesis is true
study is “statistically significant” if p value is below a certain level of ____
alpha - probability of false positive
alpha cutoff is usually 0.05
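As a sketch of the p value definition, assuming the test statistic is approximately standard normal (a z-test), the two-sided p value is the area in both tails beyond the observed z. The z values below are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """P(|Z| >= |z|) when the null hypothesis is true (standard normal null)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # usual cutoff
for z in (1.0, 2.5):  # hypothetical observed test statistics
    p = two_sided_p(z)
    print(f"z = {z}: p = {p:.4f}, significant at alpha = {alpha}: {p < alpha}")
```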
what is meant by power? what is an adequate level of power? when is it especially important?
the probability of correctly rejecting the null hypothesis (obtaining a p value below alpha, usually 0.05) when a true difference exists
power should be at least 0.80 (80% chance of rejecting the null hypothesis if a difference truly exists)
power matters most when an experiment fails to reject the null hypothesis (a low-power study may have missed a meaningful difference that actually exists) … but an experiment with low power can still be statistically significant if p < 0.05
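A rough way to see how power depends on sample size: the sketch below uses the normal approximation for a two-sided, two-sample comparison of means, assuming equal group sizes and a known common SD. The difference (5) and SD (10) are hypothetical, and `approx_power` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided, two-sample z-test.

    Assumes equal group sizes and a known common SD; ignores the tiny
    chance of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    se = sd * sqrt(2 / n_per_group)               # SE of the difference in means
    return NormalDist().cdf(abs(delta) / se - z_crit)


for n in (20, 64, 100):
    print(n, round(approx_power(delta=5, sd=10, n_per_group=n), 2))
```

With these assumed numbers, about 64 patients per group gives roughly the 0.80 target.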
give an example of when negative studies might be used (looking for evidence to support null hypothesis)
testing side effects of a new drug compared to a conventionally used drug - hoping the new drug does not cause more adverse effects than the current drug
contrast nominal categorical data to ordinal categorical data
nominal data: categories with no hierarchy (ex - ethnicity)
ordinal data: categories with a rank or order but no meaningful numeric spacing between them (ex - level of schooling)
define:
Bernoulli distribution
log-normal distribution
binomial distribution
Bernoulli distribution: a single binary outcome with probability p of “success” (think pie chart with 2 options), ex: infected v uninfected
log-normal distribution: continuous, strictly positive data whose logarithm is normally distributed (think right-skewed bell curve), ex: income
binomial distribution: the number of “successes” in n independent binary (Bernoulli) trials (think bell curve made of bars), ex: number of positive tests per batch of 100 tests
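The binomial card can be made concrete by counting outcomes of repeated Bernoulli trials. A minimal sketch, assuming a hypothetical batch of 100 tests where each test is independently positive with probability 0.1; a single Bernoulli trial is the special case n = 1.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k positives in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.1  # hypothetical batch size and per-test positive rate
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
most_likely = max(range(n + 1), key=lambda k: probs[k])
print("most likely number of positives:", most_likely)
print("total probability:", round(sum(probs), 6))
```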
if mean > median, what direction is the data skewed?
RIGHT (positive)
if mean < median, what direction is the data skewed?
LEFT (negative - tail towards left)
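The mean vs median rule of thumb can be checked on a small hypothetical dataset: a few very long hospital stays pull the mean above the median (right skew), and the mirror-image data does the opposite. `skew_direction` is an illustrative helper, and the rule is a heuristic, not a formal skewness test.

```python
from statistics import mean, median

# Hypothetical lengths of stay (days): mostly short, a few very long
right_skewed = [2, 2, 3, 3, 3, 4, 4, 5, 21, 30]
# Mirror image: a few unusually low values drag the mean down
left_skewed = [32 - x for x in right_skewed]


def skew_direction(data) -> str:
    m, med = mean(data), median(data)
    if m > med:
        return "right (positive)"
    if m < med:
        return "left (negative)"
    return "no skew by this rule"


print(skew_direction(right_skewed), mean(right_skewed), median(right_skewed))
print(skew_direction(left_skewed), mean(left_skewed), median(left_skewed))
```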
what is the “golden rule” (empirical rule) of standard deviation for normally distributed data?
68 - 95 - 99.7
within ±1 SD of the mean: ~68% of data
within ±2 SD of the mean: ~95% of data
within ±3 SD of the mean: ~99.7% of data
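The 68 - 95 - 99.7 percentages fall directly out of the standard normal curve. A short check using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Share of the curve between -k and +k standard deviations
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within ±{k} SD: {share:.1%}")
```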
what does a Z distribution show?
what is the z cutoff for 95% CI?
the standard normal distribution (mean 0, SD 1): a bell curve where each point on the x-axis is the number of SDs away from the mean
z = 1.96 for cutoff of 95% CI
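The 1.96 cutoff is the z value that leaves 2.5% in each tail. A sketch computing the two-sided cutoff for any confidence level; `z_critical` is an illustrative name.

```python
from statistics import NormalDist


def z_critical(confidence: float = 0.95) -> float:
    """Two-sided z cutoff: (1 - confidence) / 2 of the curve lies beyond it in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: z = {z_critical(level):.3f}")
```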
how do sample size and variance affect the CI?
when n (sample size) increases, the CI gets narrower (more precise)
when variance (SD) increases, the CI gets wider (less precise)
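Both effects come from the half-width of a normal-approximation CI for a mean, z × SD / √n. A sketch with hypothetical SDs and sample sizes:

```python
from math import sqrt

Z_95 = 1.96  # z cutoff for a 95% CI


def ci_half_width(sd: float, n: int, z: float = Z_95) -> float:
    """Half-width of a normal-approximation CI for a mean: z * SD / sqrt(n)."""
    return z * sd / sqrt(n)


for sd, n in [(10, 25), (10, 100), (20, 100)]:  # hypothetical values
    print(f"SD={sd}, n={n}: mean ± {ci_half_width(sd, n):.2f}")
```

Quadrupling n halves the width; doubling the SD doubles it.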
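when comparing 2 binary variables (2x2 contingency table), when should you use the chi-squared statistic vs Fisher’s exact test?
chi-squared statistic - assumes n is large (all expected cell counts adequately large)
Fisher’s exact test - small samples, e.g. any expected cell count < 5 (some sources use < 10)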
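A sketch of the choice: compute expected counts under "no association", switch to Fisher's exact test when any is below a cutoff, and compute Fisher's two-sided p value from the hypergeometric distribution (summing all tables with the same margins that are no more likely than the observed one). The `cutoff` default of 5 is an assumed convention, and the table counts are hypothetical.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts for [[a, b], [c, d]] if rows and columns are unrelated."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):  # hypergeometric P(top-left cell = x)
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are not lost to rounding
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


def pick_test(table, cutoff=5):
    small = any(e < cutoff for row in expected_counts(table) for e in row)
    return "Fisher's exact test" if small else "chi-squared"


table = [[1, 9], [6, 4]]  # hypothetical small study
print(pick_test(table), round(fisher_exact_two_sided(table), 4))
```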
what is ordinary least squares (OLS) in linear regression?
the method that computes the line (slope and intercept) minimizing the sum of squared errors (residuals)
when comparing continuous data IV with continuous data DV
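The OLS slope and intercept have closed forms for one continuous IV: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept = ȳ - slope × x̄. A sketch with hypothetical ages and lengths of stay; `ols_fit` is an illustrative name.

```python
def ols_fit(x, y):
    """Slope and intercept that minimize the sum of squared residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


ages = [40, 50, 60, 70, 80]  # hypothetical IV
los = [3, 4, 4, 6, 8]        # hypothetical DV: length of stay (days)
slope, intercept = ols_fit(ages, los)
print(f"length of stay ≈ {intercept:.2f} + {slope:.3f} × age")
```

On Python 3.10+, `statistics.linear_regression` returns the same slope and intercept.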
contrast absolute risk difference with risk ratio (relative risk, rate ratio, RR)
absolute risk difference: probability 1 - probability 2
risk ratio: probability 1 / probability 2
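A worked example of both measures, assuming a hypothetical trial with 10/100 deaths on treatment and 20/100 on control:

```python
def absolute_risk_difference(p1: float, p2: float) -> float:
    return p1 - p2  # subtraction: change in risk, in percentage points


def risk_ratio(p1: float, p2: float) -> float:
    return p1 / p2  # division: risk in group 1 relative to group 2


p_treat = 10 / 100    # hypothetical risk on treatment
p_control = 20 / 100  # hypothetical risk on control
print("ARD:", absolute_risk_difference(p_treat, p_control))
print("RR:", risk_ratio(p_treat, p_control))
```

The same data read as "10 percentage points lower" (ARD) or "half the risk" (RR).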
what does a larger chi-squared (χ²) signify?
a larger χ² signifies a larger difference between expected and observed counts - i.e., the data fit the no-difference (null) assumption poorly
each χ² value (for a given number of degrees of freedom) corresponds to a p value; larger χ² → smaller p
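A sketch of the χ² statistic for a 2x2 table: sum (observed - expected)² / expected over the 4 cells. A 2x2 table has 1 degree of freedom, and χ² with 1 df is the square of a standard normal, so p = erfc(√(χ²/2)). No continuity (Yates) correction is applied here, and the counts are hypothetical.

```python
from math import erfc, sqrt


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p value (1 df) for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    p = erfc(sqrt(stat / 2))  # P(chi-squared with 1 df >= stat)
    return stat, p


# Hypothetical: deaths / survivors with high vs low glucose
stat, p = chi_squared_2x2([[30, 70], [50, 50]])
print(f"chi-squared = {stat:.2f}, p = {p:.4f}")
```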
describe the Hawthorne Effect
people change their behavior because they know they are being studied/observed; a form of subject bias
minimize via subject blinding (placebo)
give 4 points during an RCT at which blinding can occur
- treatment allocation (concealing allocation during randomization)
- patient blinding (placebo)
- clinician blinding (blind to what treatment being provided)
- outcome assessor blinding (investigator assessing outcome blinded to which group)
describe intention to treat vs explanatory/ “as treated” protocol and why this difference is important
intention to treat: analysis of outcome according to treatment assigned, regardless of drop out or lack of compliance —> preserves randomization
“as treated”: analysis of outcome according to the treatment actually received, irrespective of what group subjects were originally assigned to
describe the difference between “as treated” protocol and “per-protocol” in RCT
“as treated” protocol would not remove non-compliant subjects
“per-protocol” would remove non-compliant subjects from the results
in both cases, analysis of outcome is according to what treatment was actually received, regardless of original group assignment
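A sketch of how the three analyses group the same subjects differently, using a hypothetical 9-subject trial where some patients crossed over to the other arm. For simplicity it assumes "non-compliant" means the received arm differs from the assigned arm.

```python
from collections import defaultdict

# Hypothetical subjects: (assigned arm, arm actually received, had event 1/0)
subjects = [
    ("drug", "drug", 0), ("drug", "drug", 0), ("drug", "drug", 1),
    ("drug", "placebo", 1), ("drug", "placebo", 1),  # crossed over
    ("placebo", "placebo", 1), ("placebo", "placebo", 1), ("placebo", "placebo", 0),
    ("placebo", "drug", 0),  # crossed over
]


def event_rates(rows, arm_of):
    """Event rate per arm, where arm_of picks which arm a subject counts toward."""
    counts = defaultdict(lambda: [0, 0])  # arm -> [events, total]
    for row in rows:
        arm = arm_of(row)
        counts[arm][0] += row[2]
        counts[arm][1] += 1
    return {arm: events / total for arm, (events, total) in counts.items()}


itt = event_rates(subjects, lambda s: s[0])         # by assignment
as_treated = event_rates(subjects, lambda s: s[1])  # by treatment received
compliant = [s for s in subjects if s[0] == s[1]]
per_protocol = event_rates(compliant, lambda s: s[1])  # non-compliers removed
print(itt, as_treated, per_protocol, sep="\n")
```

Only the intention-to-treat grouping keeps the arms exactly as randomized.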
describe these variations on RCTs: parallel, stratified, crossover, cluster, non-inferiority
parallel: classic, intervention group v control
stratified: randomization into stratified groups if there is a variable that will likely have a large influence on outcome (such as stage of cancer, wealth, etc)
crossover: subjects undergo intervention, then control after wash-out period (can serve as their own controls - reduces variability, increases power) - doesn’t work when the order matters or intervention has lasting effect
cluster: randomize care systems rather than individual patients (ex - testing 2 different disinfectants in a number of different emergency rooms) - does not work if individual consent is required
non-inferiority: testing whether a new treatment is not worse than the current treatment by more than a prespecified margin; useful when the new treatment is more favorable in other ways (cost, convenience, availability, etc)
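A minimal sketch of stratified randomization, assuming all patients are known up front (real trials often use permuted blocks within each stratum as patients enroll). Within each stratum, a balanced list of arms is shuffled so the stratifying variable (here a hypothetical cancer stage) ends up evenly split between arms. `stratified_randomize` is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_randomize(patients, stratum_of, seed=0):
    """Randomize to 2 arms separately within each stratum (hypothetical helper)."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for p in patients:
        by_stratum[stratum_of(p)].append(p)
    assignment = {}
    for members in by_stratum.values():
        # as close to 50/50 as the stratum size allows, then shuffled
        arms = (["intervention", "control"] * (len(members) // 2 + 1))[: len(members)]
        rng.shuffle(arms)
        for p, arm in zip(members, arms):
            assignment[p] = arm
    return assignment


# Hypothetical patients: (id, stage)
patients = [(f"pt{i}", "late" if i % 3 == 0 else "early") for i in range(12)]
assignment = stratified_randomize(patients, stratum_of=lambda p: p[1])
for stage in ("early", "late"):
    arms = [assignment[p] for p in patients if p[1] == stage]
    print(stage, arms.count("intervention"), "vs", arms.count("control"))
```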
what is therapeutic equipoise?
refers to genuine uncertainty about which treatment is better in RCT
ethical prerequisite for an RCT (randomizing patients is only justified if neither treatment is known to be better)
describe secondary, composite, and surrogate outcomes
secondary: outcomes of interest other than the primary outcome, ideally also designated a priori; they require a more stringent p value to be considered significant (otherwise, with enough comparisons, you can find a difference somewhere by chance)
composite: combining multiple outcomes into one (ex: a OR b OR c - any one of these counts as the outcome event)
surrogate: a measurable marker, often a lab value, used in place of a clinical outcome; it doesn’t necessarily reflect the patient’s experience or quality of life
describe internal vs external validity
internal validity: how accurately the results reflect the truth for patients in the study (free of bias and confounding)
external validity = generalizability
describe the features of cohort studies, their outcome measures, and their key strength
observational; follows subjects from exposure to disease
exposures can be beneficial or harmful
can be prospective (“will exposure x cause disease?”) or retrospective (“did exposure x cause disease?”)
outcomes: absolute risk (aka incidence), absolute risk difference (subtraction), and relative risk (risk ratio)
best way to look at prognosis and incidence