Biostats Flashcards
Prevalence
# of existing cases of a disease at a specified time / # of people in base population at that time
Incidence
# of new cases occurring in a specific time period / # of people initially at risk
Case fatality
# of people who die of a disease / total # of people with the disease
Mortality
# of people dying of a disease in a specified time period / # of people alive during that time period
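The four frequency measures above are all simple ratios. A minimal sketch, using made-up counts for a hypothetical town of 10,000 people (the function names and numbers are illustrative assumptions, not from the cards):

```python
def prevalence(existing_cases, population):
    """Existing cases at a point in time / base population at that time."""
    return existing_cases / population


def incidence(new_cases, initially_at_risk):
    """New cases in a time period / people initially at risk."""
    return new_cases / initially_at_risk


def case_fatality(deaths_from_disease, people_with_disease):
    """Deaths from the disease / total people with the disease."""
    return deaths_from_disease / people_with_disease


def mortality(deaths_from_disease, people_alive):
    """Deaths from the disease in a time period / people alive in that period."""
    return deaths_from_disease / people_alive


# Hypothetical counts: 200 existing cases among 10,000 people, 49 new cases
# among the 9,800 initially disease-free, and 25 deaths among 250 people with
# the disease over the period.
prev = prevalence(200, 10_000)
inc = incidence(49, 9_800)
cfr = case_fatality(25, 250)
mort = mortality(25, 10_000)

print(f"prevalence={prev:.4f} incidence={inc:.4f} "
      f"case fatality={cfr:.4f} mortality={mort:.4f}")
```

Note that case fatality and mortality share a numerator but differ in the denominator: people with the disease versus everyone alive.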
Years of Potential Life Lost (YPLL)
Calculated by multiplying the number of cause-specific deaths in each age group by the difference between a reference age (the average age at death, assumed to be 75) and the midpoint of that age group, then summing across age groups
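A minimal sketch of the YPLL calculation, assuming age groups are given as (lower bound, upper bound) pairs and that groups whose midpoint is at or above the reference age contribute nothing. The death counts are hypothetical.

```python
def ypll(deaths_by_age_group, reference_age=75):
    """Sum deaths * (reference age - age-group midpoint) over age groups.

    deaths_by_age_group maps (low, high) age bounds to a death count.
    Groups with a midpoint at or above the reference age are skipped
    (an assumption of this sketch).
    """
    total = 0.0
    for (low, high), deaths in deaths_by_age_group.items():
        midpoint = (low + high) / 2
        if midpoint < reference_age:
            total += deaths * (reference_age - midpoint)
    return total


# Hypothetical data: 10 deaths in the 20-30 group, 4 deaths in the 40-50 group.
example_deaths = {(20, 30): 10, (40, 50): 4}
total_ypll = ypll(example_deaths)  # 10 * (75 - 25) + 4 * (75 - 45)
print(total_ypll)
```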
Reasons for an association between a factor and a disease
- Bias in the sampling of subjects
- Bias in the measurement of the factor
- Confounding
- Chance
- Transposition of cause and effect
- Causal
Relative Risk
Incidence of disease in the exposed / incidence of disease in the unexposed
Ex: 3.5x more likely to have cancer if you smoke
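As a rough illustration of the relative risk formula, the sketch below uses a hypothetical cohort chosen so that the result matches the 3.5x example. The counts are assumptions, not real data.

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Incidence in the exposed divided by incidence in the unexposed."""
    incidence_exposed = cases_exposed / n_exposed
    incidence_unexposed = cases_unexposed / n_unexposed
    return incidence_exposed / incidence_unexposed


# Hypothetical cohort: 35 of 1,000 smokers and 10 of 1,000 non-smokers
# develop the disease during follow-up.
rr = relative_risk(35, 1_000, 10, 1_000)
print(round(rr, 2))
```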
Attributable Risk
The difference between incidence of the disease in individuals with a risk factor and in those without a risk factor
AR = risk in the exposed - risk in the unexposed
Ex: In population X, 500 per 1,000 cases are due to smoking
Number Needed to Treat (NNT)
NNT = 1 / attributable risk
If 10 cases out of 50 are due to risk factor X, then NNT = 1 / (10/50) = 5 needed to treat in order to gain 1 outcome
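The two formulas above can be sketched as follows. The first call reuses the 10-out-of-50 example from the card; the risks of 0.30 and 0.10 in the second are hypothetical.

```python
def attributable_risk(risk_exposed, risk_unexposed):
    """AR = risk in the exposed - risk in the unexposed."""
    return risk_exposed - risk_unexposed


def number_needed_to_treat(ar):
    """NNT = 1 / attributable risk, as defined on the card."""
    return 1 / ar


# Card example: 10 cases out of 50 are due to risk factor X.
nnt_card = number_needed_to_treat(10 / 50)

# Hypothetical risks: 30% in the exposed, 10% in the unexposed.
ar = attributable_risk(0.30, 0.10)
nnt = number_needed_to_treat(ar)
print(round(nnt_card, 2), round(ar, 2), round(nnt, 2))
```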
Population Attributable Risk (PAR)
The proportion of cases that would be prevented if the risk factor could be eliminated from the population
PAR = (total incidence - incidence in unexposed) / total incidence
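A small sketch of the PAR formula, with the subtraction done before the division. The incidences (4% overall, 1% in the unexposed) are hypothetical.

```python
def population_attributable_risk(total_incidence, incidence_unexposed):
    """(total incidence - incidence in unexposed) / total incidence."""
    return (total_incidence - incidence_unexposed) / total_incidence


# Hypothetical population: overall incidence 4%, incidence in the unexposed 1%.
par = population_attributable_risk(0.04, 0.01)
print(f"{par:.0%} of cases would be prevented if the risk factor were eliminated")
```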
Cross-Sectional study
Usually large surveys of a representative sample group; assesses both risk factors and disease status in the present in order to draw correlations; can give information about the relative risk for certain diseases given exposure to different risk factors
Case-control (retrospective study)
A sample of cases and controls is examined for past history of risk factors; an association between case status and the risk factor in the sample is taken to reflect an association between the risk factor and the disease
Susceptible to sampling bias and recall bias
Cohort Study
AKA Prospective, longitudinal study
A cohort (exposed and unexposed) is assembled and followed over time to determine who develops disease
Limited utility for very rare diseases or very long latencies; vulnerable to bias if loss to follow-up is unequal in exposed vs. unexposed groups
Randomized Controlled Trial (RCT)
Participants are randomized to trial arm (exposed) or control arm (unexposed) and followed to assess outcomes
High internal validity, lower external validity (groups were artificially designed and so are not representative)
Vulnerable to cross-over; must use an intent-to-treat analysis
Sensitivity
Describes how good a screening test is at detecting true positives.
Sensitivity = TP / (TP + FN)
Specificity
Describes how good a screening test is at identifying true negatives (people without the disease who test negative).
Specificity = TN / (TN + FP)
Positive Predictive Value (PPV)
Describes how often individuals with positive test results actually have the disease
PPV = TP / (TP + FP)
PPV increases with higher disease prevalence. Increased test specificity increases PPV.
Negative Predictive Value (NPV)
Describes how often individuals with negative test results are actually disease-free
NPV = TN / (TN + FN)
NPV increases with lower disease prevalence and increased sensitivity.
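The four screening measures above all come from the same 2x2 table. The sketch below computes them from hypothetical counts, then uses the standard identity obtained by rewriting TP, FP, TN and FN in terms of sensitivity, specificity and prevalence to show how PPV and NPV move with prevalence. The function names and numbers are illustrative assumptions.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV expressed through test characteristics and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv_at_prevalence(sensitivity, specificity, prevalence):
    """NPV expressed through test characteristics and disease prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical screening results for 1,000 people.
metrics = screening_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same test (90% sensitive, 90% specific) at 1% and 20% prevalence.
ppv_low = ppv_at_prevalence(0.9, 0.9, 0.01)
ppv_high = ppv_at_prevalence(0.9, 0.9, 0.20)
npv_low = npv_at_prevalence(0.9, 0.9, 0.01)
npv_high = npv_at_prevalence(0.9, 0.9, 0.20)
print(f"PPV: {ppv_low:.3f} -> {ppv_high:.3f}; NPV: {npv_low:.3f} -> {npv_high:.3f}")
```

With the test held fixed, PPV rises and NPV falls as prevalence goes up, which is the pattern the cards describe.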
Lead-time bias
If a new screening test diagnoses a disease one year earlier but treatment has no effect on survival, data will show a false 1-year improvement in survival; this is an artifact of lead-time bias
Length time bias
Screening tests are likely to pick up individuals with longer asymptomatic phases (because symptomatic patients have already been diagnosed, we assume); therefore, cases detected by screening are more likely to have slowly progressing disease and longer survival, regardless of whether the disease is detected by screening or not
2 methods for combining screening tests
- Parallel testing - The overall screening result is positive if any one test is positive; increases sensitivity at the expense of specificity
- Series testing - The overall screening result is positive only if all tests are positive; increases specificity at the expense of sensitivity
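The trade-off between the two strategies can be sketched numerically. This assumes the two tests err independently of each other given true disease status, which real tests often do not; the sensitivities and specificities are hypothetical.

```python
def parallel_testing(sens1, spec1, sens2, spec2):
    """Positive if either test is positive (assumes independent tests)."""
    sensitivity = 1 - (1 - sens1) * (1 - sens2)  # missed only if both miss
    specificity = spec1 * spec2                  # negative only if both negative
    return sensitivity, specificity


def series_testing(sens1, spec1, sens2, spec2):
    """Positive only if both tests are positive (assumes independent tests)."""
    sensitivity = sens1 * sens2                  # detected only if both detect
    specificity = 1 - (1 - spec1) * (1 - spec2)  # false positive only if both err
    return sensitivity, specificity


# Hypothetical tests: A is 80% sensitive / 90% specific,
# B is 90% sensitive / 95% specific.
par_sens, par_spec = parallel_testing(0.80, 0.90, 0.90, 0.95)
ser_sens, ser_spec = series_testing(0.80, 0.90, 0.90, 0.95)
print(f"parallel: sensitivity={par_sens:.3f}, specificity={par_spec:.3f}")
print(f"series:   sensitivity={ser_sens:.3f}, specificity={ser_spec:.3f}")
```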
2 types of quantitative variables
- Continuous, e.g. weight, BP, serum levels, etc.
Continuous data are compared with t-test
- Categorical (ordinal), e.g. no disease vs. disease
Categorical data are compared with Chi Square test
Mode
The most commonly observed value
Median
The middle value in a data set arranged lowest to highest
Mean
The arithmetic average
Mean = sum(x) / n
Range
The highest value minus the lowest value
Variance
Variance = sum((x - mean)^2) / (n - 1)
Standard deviation
The square root of the variance
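The descriptive statistics above can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the same (n - 1) denominator. The data set is made up for illustration.

```python
import statistics

# Hypothetical data set, already sorted lowest to highest.
data = [2, 4, 4, 5, 7, 8]

mode = statistics.mode(data)          # most commonly observed value
median = statistics.median(data)      # middle value (average of the two middle values here)
mean = statistics.mean(data)          # sum(x) / n
value_range = max(data) - min(data)   # highest value minus lowest value
variance = statistics.variance(data)  # sum((x - mean)^2) / (n - 1)
std_dev = statistics.stdev(data)      # square root of the variance

print(mode, median, mean, value_range, variance, round(std_dev, 3))
```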
Characteristics of a normal distribution
Mean = median
68% of values lie within 1 standard deviation of the mean
95% of values lie within 2 standard deviations of the mean
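These proportions can be verified with `statistics.NormalDist` from the standard library (Python 3.8+). The helper name `share_within` is an assumption of this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_1_sd = share_within(1)
within_2_sd = share_within(2)
print(f"within 1 SD: {within_1_sd:.1%}, within 2 SD: {within_2_sd:.1%}")
```

The exact figures are roughly 68.3% and 95.4%; the round numbers on the card are the usual shorthand.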
Type I error
False Positive
The investigator wrongly concludes that there is a difference, when the difference actually occurred as a result of chance or bias
Type II error
False Negative
The investigator concludes that there is no difference, when actually there is a true difference obscured by chance or bias
P value
The probability of observing an association at least as strong as the one seen if chance alone were at work (i.e., if there were no true association)
.05 is a conventional threshold
Confidence Interval
The range of values within which you are X% confident that the true value lies
A 95% CI corresponds to a threshold of P = .05: if the 95% CI excludes the null value, then P < .05
Effect Modification
Occurs when the effect of a risk factor on an outcome is different at different levels of a third factor; the third factor is known as the effect modifier
Ex: age, gender
Beta Value
Beta is the acceptable risk of making a type II (false negative) error, arbitrarily set at .2
Power = (1 - Beta) x 100%
Therefore, a study that accepts the null should be able to show a power of 80% or higher
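A one-line check of the power formula, using the conventional Beta of .2 from the card. The function name is an illustrative assumption.

```python
def power_percent(beta):
    """Power as a percentage: (1 - beta) x 100."""
    return (1 - beta) * 100


power = power_percent(0.2)
print(f"power = {power:.0f}%")
```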