Biostats/Epi Flashcards
Three ways to characterize the center of a distribution (measures of central tendency)
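Mean: arithmetic average of all values (sum of the values divided by the number of values)
Median: middle value of the data set when all values are lined up in order (average of the 2 middle values if n is even)
Mode: most frequently occurring value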
Skewness
positive or negative based on the location of the tail
if the tail points toward lower/negative #s, the skew is negative; if it points toward larger/positive #s, the skew is positive
Measure of central tendency least likely to be affected by an outlier
mode
adding one outlier changes the mean and can shift the median; it changes the mode only if it changes the most common value, which a single outlier is unlikely to do
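A small illustration of the outlier card, using made-up values (the data set is hypothetical): one extreme value is appended and the three measures are recomputed with Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical data set (made-up values, for illustration only)
data = [2, 3, 3, 3, 4, 5, 6]
with_outlier = data + [100]  # same data plus one extreme value

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(f"{label}: mean={mean(values):.2f}, "
          f"median={median(values)}, mode={mode(values)}")
```

The mean jumps from about 3.71 to 15.75, the median moves only from 3 to 3.5, and the mode stays at 3.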
Central Tendency key points
if the distribution is symmetric (ex: normal): mean = median = mode
mode is ALWAYS at the peak
In skewed data: the mean is pulled furthest from the mode, toward the tail; the median usually falls between them
Mode is the least likely to be affected by outliers
Z score
describes a single data point: how many standard deviations (SD) it lies from the mean; z = (x - mean)/SD
z score of 0 is the mean
z score of +1 is 1SD above mean
z score of -1 is 1SD below mean
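A minimal sketch of the z score formula; the function name `z_score` and the example numbers (mean 100, SD 15) are hypothetical.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical population: mean 100, SD 15
print(z_score(130, 100, 15))  # 2.0  (2 SD above the mean)
print(z_score(100, 100, 15))  # 0.0  (at the mean)
print(z_score(85, 100, 15))   # -1.0 (1 SD below the mean)
```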
Standard error of the mean (SEM)
estimates how far the data set (sample) mean is likely to be from the true population mean
SEM = SD/√n, where n = sample size
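A quick check of SEM = SD/√n with hypothetical numbers: keeping SD fixed at 10 and quadrupling the sample size halves the SEM.

```python
import math

def sem(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of the sample size."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)

# Hypothetical: same SD (10), two different sample sizes
print(sem(10, 25))   # 2.0
print(sem(10, 100))  # 1.0
```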
Confidence intervals
range expected to contain the true population mean; if the study were repeated many times, about 95% of the 95% CIs calculated would contain the true population mean
CI95% = mean +/- 1.96*(SEM)
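The CI formula above as a short sketch; the study values (mean 120 mmHg, SD 10, n = 25) are hypothetical. It uses 1.96 exactly as on the card; for small samples a t critical value is typically used instead.

```python
import math

def ci95(mean: float, sd: float, n: int) -> tuple[float, float]:
    """95% CI for a mean using the card's formula: mean +/- 1.96 * SEM."""
    sem = sd / math.sqrt(n)
    margin = 1.96 * sem
    return mean - margin, mean + margin

# Hypothetical study: mean SBP 120 mmHg, SD 10, n = 25
low, high = ci95(120, 10, 25)
print(f"95% CI: {low:.2f} to {high:.2f}")  # 95% CI: 116.08 to 123.92
```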
Null hypothesis
H0: there is no difference
type 1 (alpha) error
there is no difference in reality but our study finds a difference
type 2 (beta) error
there is a difference in reality but our study misses it
Power
probability of detecting a difference when one truly exists
power = 1 - beta
P-value
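probability of obtaining results at least as extreme as those observed if the null hypothesis were true (NOT the chance that the null hypothesis is correct); used to reject or fail to reject the null hypothesis
if p < 0.05 we usually reject the null hypothesis; the difference in means is “statistically significant”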
The 3 ways the power of a study is increased
increased sample size (the one thing you can control)
larger difference between means (effect size)
less scatter of data (smaller SD)
Power calculation
power = 1 - beta (beta = type II error rate); to increase power, increase the number of subjects; common power goal is 80%
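A sketch of a power (sample size) calculation, assuming a two-sided comparison of 2 means with equal group sizes, a common SD, and the normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)²·SD²/Δ². The function name and inputs are hypothetical. It shows the 3 levers on the card: a larger difference or less scatter needs fewer subjects, and a stricter alpha needs more.

```python
import math
from statistics import NormalDist

def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group to detect a difference `delta` between 2 means.

    Normal approximation, two-sided test, equal group sizes, common SD:
        n = 2 * (z_(1 - alpha/2) + z_(power))**2 * sd**2 / delta**2
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z(power)           # about 0.84 for power = 0.80 (beta = 0.20)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return math.ceil(n)         # round up to whole subjects

# Hypothetical inputs
print(n_per_group(delta=5, sd=10))              # 63
print(n_per_group(delta=10, sd=10))             # 16 (larger difference)
print(n_per_group(delta=5, sd=5))               # 16 (less scatter)
print(n_per_group(delta=5, sd=10, alpha=0.01))  # 94 (stricter alpha)
```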
Effect of new drugs that improve survival on incidence and prevalence
incidence is unchanged (the drug does not prevent new people from getting the disease)
prevalence increases (people are living longer with the disease)
Effect of vaccines on incidence and prevalence
both incidence and prevalence will fall
test that is good at ruling OUT disease
high sensitivity: TP/(TP + FN)
test that is good at ruling IN disease
high specificity: TN/(TN + FP)
A test that is negative in 80% of people who do not have the disease is telling you what?
the true negative rate of the test = specificity (here 80%)
A test that is positive in 50% of people who do have the disease is telling you what?
the true positive rate of the test = sensitivity (here 50%)
Positive predictive value (PPV)
of all the people who test positive, what percentage are true positives?
PPV = TP/(TP + FP)
Negative predictive value (NPV)
of all the people who test negative, what percentage are true negatives?
NPV = TN/(TN + FN)
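The four formulas above can be checked against one 2x2 table. A minimal sketch; the function name and the counts (100 pts with disease, 200 without) are hypothetical.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2x2 table of test results."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all WITH disease
        "specificity": tn / (tn + fp),  # true negatives among all WITHOUT disease
        "ppv": tp / (tp + fp),          # true positives among all who test positive
        "npv": tn / (tn + fn),          # true negatives among all who test negative
    }

# Hypothetical 2x2 table: 100 pts with disease, 200 without
metrics = diagnostic_metrics(tp=90, fp=40, fn=10, tn=160)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, sensitivity is 0.90 and specificity 0.80, while PPV is only about 0.69 and NPV about 0.94.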
Accuracy
quantified by the area under the ROC curve (AUC); the more accurate the test, the closer the AUC is to 1.0 (an AUC of 0.5 is no better than chance)
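The card describes AUC as an area; an equivalent way to compute it is the probability that a randomly chosen diseased pt has a higher test score than a randomly chosen healthy pt (ties count 1/2). This sketch uses that probabilistic form rather than drawing the ROC curve; the function name and scores are hypothetical.

```python
def auc(diseased_scores: list[float], healthy_scores: list[float]) -> float:
    """AUC = P(random diseased pt scores higher than random healthy pt); ties count 1/2."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased_scores) * len(healthy_scores))

# Hypothetical test scores (higher = more likely diseased)
print(auc([7, 8, 9], [1, 2, 3]))  # 1.0 (perfect separation)
print(auc([1, 2], [1, 2]))        # 0.5 (no better than chance)
print(auc([4, 6, 8], [3, 5, 7]))  # overlapping scores: 6 of 9 pairs, about 0.67
```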
Selection Bias
errors in selection or retention of study groups
subtypes: sampling bias, attrition bias, Berkson’s bias, nonresponse bias, prevalence bias
Sampling Bias
a subtype of selection bias; pts are not representative of those seen in actual practice
ex: avg age of pts in HF trials is 65 yrs vs 80+ yrs in actual HF pts
Attrition Bias
a subtype of selection bias that occurs in prospective studies when loss to follow-up is unequal between the groups
Berkson’s Bias
a subtype of selection bias that involves hospitalized pts; they may have more severe symptoms and better access to care, which may alter the results of the study
Nonresponse Bias
a subtype of selection bias that occurs with survey and questionnaire studies; the non-responders are not included, and the pts who do respond may represent a selected group
Prevalence Bias
aka Neyman bias; a subtype of selection bias that occurs in studies trying to associate an exposure with a disease; the exposure occurs long before the disease is assessed, and exposed pts who die quickly are not included
prevalence of disease is based on a select group of survivors
Length-time Bias
screening preferentially detects slowly progressing, more benign disease; pts with rapidly progressing, severe disease are missed because they die (or become symptomatic) before being screened, so survival appears longer
ex: an analysis of HIV+ pts shows the infection is asymptomatic; may overestimate how benign it is because severe cases (pts who died) may have been missed
Lead-time Bias
screening test identifies disease earlier, so survival appears longer even though the time of death is unchanged (survival is simply measured from an earlier starting point)
Measurement Bias
sloppy research technique / protocol not followed
ex: BP measured incorrectly in one arm; avoided by standardized data collection
Recall Bias
form of measurement bias; inaccurate recall of past events by study subjects; common in survey studies
Observer Bias
form of measurement bias where investigators know exposure status of pt; avoid by blinding
ex: pathologist reviewing specimens knowing the pt has cancer
Procedure bias
one group receives procedure (surgery) and the other does not; more care and attention given to the procedure pts; avoided by blinding and by using placebo (sham surgery performed)
Confounding bias
an unmeasured third factor, associated with both the exposure and the outcome, distorts the study results; addressed by stratified analysis
control for confounders by randomization (in clinical trials) or by matching (in case-control studies)
paired t-test
compares the means of 2 related groups; requires a quantitative dependent variable (outcome) evaluated in 2 related (ex: matched/paired) groups
comparing data against oneself (ex: pre/post surgery or tx)
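A minimal sketch of the paired t statistic: take one difference per subject, then divide the mean difference by its SEM, with df = n - 1. The function name and BP values are hypothetical. Python's standard library has no t distribution, so the p-value is not computed here; the statistic would be compared with a t table (the two-sided critical value at df = 4, alpha = 0.05 is 2.776).

```python
import math
import statistics

def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for pre/post data on the same subjects."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need 2 equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]  # one difference per subject
    n = len(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)  # SEM of the differences
    return statistics.mean(diffs) / se_diff, n - 1

# Hypothetical SBP (mmHg) in 5 pts before and after treatment
before = [150, 160, 145, 170, 155]
after = [140, 152, 140, 160, 150]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")  # t = -6.77, df = 4
```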
chi-square test
evaluates the association between 2 categorical variables as in a study evaluating the association between sex (ex: male v female) and MI (presence v absence)
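The chi-square statistic for the sex vs MI example can be computed by hand: the expected count in each cell is (row total × column total)/grand total, and the statistic sums (observed - expected)²/expected. This is a sketch of the plain Pearson statistic without a continuity correction; the function name and counts are hypothetical. For a 2x2 table (df = 1) the critical value at alpha = 0.05 is 3.84.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table:

                 MI     no MI
        male      a       b
        female    c       d
    """
    observed = [[a, b], [c, d]]
    total = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total  # count expected if no association
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical counts: MI in 30/100 men vs 15/100 women
stat = chi_square_2x2(a=30, b=70, c=15, d=85)
print(f"chi-square = {stat:.2f}")  # chi-square = 6.45 (> 3.84, so reject H0 at 0.05)
```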
Crossover study
subjects are randomly allocated to a sequence of 2 or more treatments given consecutively; a WASHOUT (no tx) period is often added between intervals to limit the confounding effects of prior tx
Accumulation effect
the concept of accumulation effect can be applied to disease pathogenesis and exposure to risk modifiers; cumulative exposure to a risk factor or risk reducer must sometimes occur for prolonged periods before a clinically significant effect is detected
Ecological study
unit of analysis in ecological studies is populations rather than individuals
Prevalence
equals the incidence rate multiplied by the average disease duration (P = I × D); in a steady-state population with a constant incidence rate, an increase in disease prevalence means an additional factor has prolonged disease duration (ex: improved quality of care allowing pts to survive longer)
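A worked version of P = I × D, assuming a steady state and a relatively rare disease (where the approximation holds); the numbers are hypothetical. It mirrors the survival-drug card: incidence stays the same, duration doubles, so prevalence doubles.

```python
def steady_state_prevalence(incidence_rate: float, mean_duration: float) -> float:
    """Prevalence ~= incidence rate x average disease duration (steady state, rare disease)."""
    return incidence_rate * mean_duration

# Hypothetical: 2 new cases per 1,000 person-years, average duration 5 years
before_drug = steady_state_prevalence(0.002, 5)
# A new drug doubles survival with the disease; incidence is unchanged
after_drug = steady_state_prevalence(0.002, 10)
print(f"{before_drug:.3f} -> {after_drug:.3f}")  # 0.010 -> 0.020
```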
Phase I of clinical trial
assesses the safety, tolerability, and pharmacokinetics (dosing) of a new treatment; conducted in a small number of *HEALTHY subjects
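Negative correlation coefficient
as one variable increases, the other tends to decrease; r lies between 0 and -1 (-1 = perfect inverse linear relationship)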
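Reducing the significance level (alpha) (ex: 0.05 to 0.01)
lowers the risk of a type I error, so researchers can report any significant findings with greater confidence; the trade-off is lower power (higher risk of a type II error) unless the sample size is increased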
Number needed to harm (NNH)
NNH = 1/Absolute risk increase
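A short worked NNH, with hypothetical trial counts. ARI is the risk in the exposed (drug) group minus the risk in the control group; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def nnh(events_exposed: int, n_exposed: int, events_control: int, n_control: int) -> Fraction:
    """Number needed to harm = 1 / absolute risk increase (ARI)."""
    risk_exposed = Fraction(events_exposed, n_exposed)
    risk_control = Fraction(events_control, n_control)
    ari = risk_exposed - risk_control  # absolute risk increase
    if ari <= 0:
        raise ValueError("no risk increase, so NNH is undefined")
    return 1 / ari

# Hypothetical trial: adverse event in 16/200 on drug vs 6/200 on placebo
print(nnh(16, 200, 6, 200))  # 20
```

Here ARI = 8% - 3% = 5%, so about 1 extra pt is harmed for every 20 treated.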
Cumulative incidence
number of new cases of a disease over a specific period divided by the total population at risk at the beginning of the study; exclude from the denominator the people who already have the disease
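A worked example with a hypothetical town, showing why existing cases are removed from the denominator.

```python
def cumulative_incidence(new_cases: int, population: int, existing_cases: int) -> float:
    """New cases over the period / people AT RISK at the start (existing cases excluded)."""
    at_risk = population - existing_cases
    return new_cases / at_risk

# Hypothetical town of 10,000; 500 already have the disease at the start of the year
incidence_1yr = cumulative_incidence(new_cases=190, population=10_000, existing_cases=500)
print(f"{incidence_1yr:.3f}")  # 0.020 (2% over 1 year)
```

Dividing by the whole population (10,000) instead would give 0.019 and understate the risk.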
Risk
probability of developing a disease over a certain period of time; divide the number of affected subjects by the total number of subjects in the corresponding exposure group
Relative risk reduction (RRR)
1-RR
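Risk, relative risk, and RRR = 1 - RR computed together from hypothetical trial counts; `Fraction` keeps the ratios exact.

```python
from fractions import Fraction

def risk(affected: int, total: int) -> Fraction:
    """Risk = affected subjects / all subjects in that exposure group."""
    return Fraction(affected, total)

# Hypothetical trial: MI in 10/200 treated pts vs 20/200 control pts
risk_treated = risk(10, 200)
risk_control = risk(20, 200)
rr = risk_treated / risk_control  # relative risk
rrr = 1 - rr                      # relative risk reduction
print(f"RR = {float(rr):.2f}, RRR = {float(rrr):.0%}")  # RR = 0.50, RRR = 50%
```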
Attack rate
number of people who contract an illness divided by the number of people at risk of contracting that illness
Misclassification bias
incorrect categorization of subjects regarding their exposure status, outcome status, or both; in case-control studies, recall bias usually leads to misclassification of exposure status
allele frequency
the frequency of an allele equals the number of copies of that specific allele divided by the total number of alleles in the population
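A worked allele frequency from genotype counts, assuming a diploid locus with 2 alleles (A and a); the function name and counts are hypothetical. Each person carries 2 alleles, so the denominator is 2 × the number of people.

```python
def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency of allele A = copies of A / total alleles (2 per diploid individual)."""
    copies_A = 2 * n_AA + n_Aa  # AA carries 2 copies, Aa carries 1
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    return copies_A / total_alleles

# Hypothetical population of 1,000 people
p = allele_frequency(n_AA=360, n_Aa=480, n_aa=160)
print(f"A: {p:.2f}, a: {1 - p:.2f}")  # A: 0.60, a: 0.40
```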