Epidemiology and Statistics Flashcards
This is the study of illness outcomes by observing and comparing events in a group of individuals with shared characteristics.
Clinical epidemiology
What are 2 measures of disease occurrence?
Incidence and prevalence
What are 2 measures of disease outcomes?
Mortality and morbidity
What are some measures of validity or performance of diagnostic and screening tests?
Sensitivity
Specificity
Predictive values
Likelihood ratios
What are some measures of disease association?
Odds ratio (OR) and relative risk (RR)
Define odds ratio
(odds of disease in population A)/(odds of disease in population B)
This measure can give you the mortality of a certain disease, usually calculated as a/(a+b)
Proportion
This is the equation where (a = frequency of events during a certain time period)/(a+b)
Rate
Is prevalence a proportion or a rate?
Proportion
Incidence is a rate!
This is the (new cases)/(subjects x yrs of follow-up per subject)
Incidence rate
What is the prevalence rate?
PR = (# individuals with disease)/(total population at a specific time)
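As a rough illustration of the two occurrence measures above, the sketch below computes an incidence rate per person-year and a point prevalence. All counts are hypothetical, and the function names are made up for this example.

```python
def incidence_rate(new_cases, subjects, years_per_subject):
    """New cases divided by person-years of follow-up."""
    person_years = subjects * years_per_subject
    if person_years <= 0:
        raise ValueError("person-years must be positive")
    return new_cases / person_years


def prevalence(existing_cases, population):
    """Individuals with disease divided by total population at one point in time."""
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population


# Hypothetical cohort: 400 subjects followed for 5 years each, 12 new cases.
rate = incidence_rate(new_cases=12, subjects=400, years_per_subject=5)
print(f"Incidence rate: {rate * 1000:.1f} per 1000 person-years")

# Hypothetical survey: 80 of 2000 people have the disease on the survey date.
prev = prevalence(existing_cases=80, population=2000)
print(f"Prevalence: {prev:.1%}")
```

Note the different denominators: person-time for incidence, people for prevalence.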
In a population of 1000 people in which 50 are sick with H1N1 flu illness and 25 die from H1N1 in 1 year, what is the mortality rate?
Mortality rate from H1N1 in that year = 25/1000
= 0.025 or 2.5%; case fatality rate for H1N1 disease = 25/50 = 0.5 or 50%
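The arithmetic in this card can be checked with a few lines of Python. This is only a sketch; the helper name `proportion` is an assumption made for the example. The second figure (deaths among cases) is what is usually called the case fatality rate.

```python
def proportion(events, total):
    """Return events / total, the general a / (a + b) form."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total


population = 1000  # people in the population during the year
cases = 50         # people sick with H1N1
deaths = 25        # deaths from H1N1 in that year

mortality_rate = proportion(deaths, population)  # deaths over whole population
case_fatality = proportion(deaths, cases)        # deaths over people with disease

print(f"Mortality rate: {mortality_rate:.3f} ({mortality_rate:.1%})")
print(f"Case fatality rate: {case_fatality:.2f} ({case_fatality:.0%})")
```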
In a table of disease and testing, what is:
a/(a+c)
Sensitivity
In a table of disease and testing, what is:
d/(b+d)
Specificity
Which one (sensitivity or specificity) is used to rule in a disease?
Specificity
In a table of disease and testing, what is:
a/(a+b)
PPV
In a table of disease and testing, what is:
d/(c+d)
NPV
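The four table formulas above can be put side by side in a short sketch. It assumes the usual layout implied by the cards: a = true positives, b = false positives, c = false negatives, d = true negatives. The counts are hypothetical.

```python
def diagnostic_metrics(a, b, c, d):
    """Metrics from a 2x2 disease/test table.

    a = test+ and disease+ (true positives)
    b = test+ and disease- (false positives)
    c = test- and disease+ (false negatives)
    d = test- and disease- (true negatives)
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Hypothetical screening results for 1000 people.
metrics = diagnostic_metrics(a=90, b=30, c=10, d=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity read down the disease columns; PPV and NPV read across the test rows.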
What will happen to PPV if there is low disease prevalence?
Decrease
What will happen to NPV if there is low disease prevalence?
Increase
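To see why prevalence moves PPV and NPV in opposite directions, the sketch below holds sensitivity and specificity fixed and varies prevalence. The 90%/90% test and the prevalence values are assumptions chosen for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test (90% sensitive, 90% specific) at falling prevalence.
for prevalence in (0.50, 0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

As prevalence drops, false positives outnumber true positives, so PPV falls while NPV rises.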
This is the true positive rate/false positive rate
aka sensitivity/(1-specificity)
+LR
What is the equation for negative likelihood ratio?
false negative rate/true negative rate
aka
(1-sensitivity)/specificity
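Both likelihood ratios follow directly from sensitivity and specificity. A minimal sketch, with a hypothetical test that is 90% sensitive and 80% specific:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (+LR, -LR) for a test."""
    lr_pos = sensitivity / (1 - specificity)   # true positive rate / false positive rate
    lr_neg = (1 - sensitivity) / specificity   # false negative rate / true negative rate
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
print(f"+LR = {lr_pos:.2f}")
print(f"-LR = {lr_neg:.3f}")
```

Note the parentheses around (1 - sensitivity): without them the expression would be evaluated as 1 - (sensitivity/specificity).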
This is the ability of a test to detect which individuals have the disease and which do not have the disease
Validity
This is the consistency of results under repeated measurements by the same individuals under the same conditions
Reliability/repeatability
This is the number of individuals needed to be screened for a given duration to prevent or detect one outcome
Number needed to screen
This is the bias that reflects the observed lengthening of survival time due to earlier diagnosis by a screening test without any actual prolongation of survival
Lead time bias
This is the bias that reflects the increased likelihood of identification of indolent tumors by intermittent screening as compared to fast growing/aggressive tumors that can be missed due to their rapid progression
Length-time bias
This is the bias that reflects the identification by a screening test of disease that would not have affected the patient's life in the absence of screening (aka pseudodisease, e.g., benign lung nodules)
Overdiagnosis bias
This is the application of statistical tools and methods to address and analyze problems in health and medicine
Biostatistics
This type of study has the following advantages/disadvantages:
Advantages- quick and inexpensive, feasible for rare disorders, fewer subjects needed than cross-sectional studies, generates OR
Disadvantages- reliance on recall or records to determine exposure, selection bias, selection of control groups is difficult
Case-control studies
This type of study has the following advantages/disadvantages:
Advantages- ethically safe, easier and cheaper than RCT, matching is possible, can establish timing and directionality, eligibility criteria and outcome can be standardized, generates RR
Disadvantages- difficult to identify controls, blinding is difficult, randomization not present, needs large sample sizes, expensive to conduct
Cohort study
This type of study has the following advantages/disadvantages:
Advantages- quick, cheap, simple, and ethically safe, population-based, best for quantifying the prevalence of a disease or risk factor
Disadvantages- establishes association and not causality, recall bias, confounders may be unequally distributed
Cross-sectional study
This type of study has the following advantages/disadvantages:
Advantages- unbiased distribution of confounders, blinded, no bias, follow-up usually complete
Disadvantages- expensive, volunteer bias, ethically problematic, loss to follow-up
RCT
This type of RCT randomizes entire groups rather than individual subjects to treatment
Ex: A hospital randomizes antibiotic cycling and measures infection rates
Cluster RCT
This type of RCT randomizes 2 or more treatments/interventions in all possible combinations
Ex: effect of salmeterol + fluticasone vs tiotropium + salmeterol + fluticasone vs salmeterol alone on COPD exacerbations
Factorial randomized design
This type of RCT randomly allocates each patient to a sequence that includes each treatment so that each patient can act as their own control for treatment
Crossover design
This is an evidence-based resource produced after reviewing published and unpublished studies and combining the information of all relevant studies to address a particular clinical question.
Systematic review
This is a type of systematic review that uses systematic methods to combine qualitative and quantitative study data from several selected studies
Meta analysis
Put the following study designs in the hierarchy of evidence from lowest to highest:
Case report, observational cohort, case-control, meta-analysis, RCT, expert opinion, case series
Lowest: expert opinion < case report < case series < case-control < observational cohort < RCT < meta-analysis (highest)
This is the sum of all observations divided by count of total number of observations, used for continuous variables, and easily distorted in a skewed data set
Mean
When is it beneficial to use the median in data sets?
When distribution is not normal, as it’s not affected by extreme values
This is the most frequent observation in the data set, used for discrete variables
Mode
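The three measures of central tendency can be compared on a small skewed data set. This sketch uses Python's standard `statistics` module; the length-of-stay values are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical hospital length of stay in days, with one extreme value.
length_of_stay = [2, 3, 3, 4, 5, 40]

data_mean = mean(length_of_stay)      # pulled upward by the 40-day stay
data_median = median(length_of_stay)  # middle of the sorted values
data_mode = mode(length_of_stay)      # most frequent value

print(f"mean = {data_mean}, median = {data_median}, mode = {data_mode}")
```

The single outlier drags the mean well above most observations, while the median and mode stay near the bulk of the data.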
This is [a/(a+b)]/[c/(c+d)]
RR
basically (risk in exposed)/(risk in non-exposed)
What does it mean if the RR=1?
No association
What does it mean when the RR >1?
Positive association - risk in exposed is greater than nonexposed
Ex: smokers are at higher risk of developing lung cancer
What does it mean when the RR<1?
Negative association - risk in exposed is less than that in nonexposed
Ex: daily exercise protects against heart disease
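A short sketch of the RR calculation and its three interpretations. It assumes an exposure table where a = exposed with disease, b = exposed without disease, c = nonexposed with disease, d = nonexposed without disease. The counts and function names are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Risk in exposed divided by risk in nonexposed."""
    risk_exposed = a / (a + b)
    risk_nonexposed = c / (c + d)
    return risk_exposed / risk_nonexposed


def interpret_rr(rr, tolerance=1e-9):
    """Translate an RR value into the wording used on the cards."""
    if abs(rr - 1) <= tolerance:
        return "no association"
    if rr > 1:
        return "positive association (risk higher in exposed)"
    return "negative association (risk lower in exposed)"


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 nonexposed develop disease.
rr = relative_risk(a=30, b=70, c=10, d=90)
print(f"RR = {rr:.2f}: {interpret_rr(rr)}")
```

In practice the confidence interval around the RR matters as much as the point estimate; this sketch only classifies the point estimate.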
This is the ratio of the odds of disease in the exposed group to the odds of disease in the non-exposed group
OR
TRUE/FALSE: OR is used in both cohort and case-control studies
TRUE
Give me the following letter (abcd) ratios to define the following OR calculations:
Odds of disease in exposed
Odds of disease in nonexposed
OR of disease in exposed
OR of exposure in diseased
Odds of disease in exposed = a/b
Odds of disease in nonexposed = c/d
OR of disease in exposed = ad/bc
OR of exposure in diseased = ad/cb
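The letter ratios above can be verified numerically. The sketch below computes the OR as a ratio of odds and as the cross-product ad/bc, using the same table layout (a, b = exposed with/without disease; c, d = nonexposed with/without disease). The counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds of disease in exposed (a/b) over odds in nonexposed (c/d)."""
    odds_exposed = a / b
    odds_nonexposed = c / d
    return odds_exposed / odds_nonexposed


def cross_product_ratio(a, b, c, d):
    """Algebraically equivalent shortcut: ad / bc."""
    return (a * d) / (b * c)


# Hypothetical 2x2 table.
a, b, c, d = 30, 70, 10, 90
print(f"OR from odds:    {odds_ratio(a, b, c, d):.3f}")
print(f"OR from ad/bc:   {cross_product_ratio(a, b, c, d):.3f}")
```

Both forms give the same value, which is why the OR of disease in the exposed and the OR of exposure in the diseased coincide.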
What does it mean when the OR=1?
No association of exposure with disease or disease with exposure
What does it mean when the OR>1?
Exposure and disease are positively related
Ex: odds of getting lung cancer are higher in smokers, or odds of smoking exposure are higher in lung cancer patients
What does it mean when the OR<1?
Exposure and disease are negatively related
Ex: odds of developing heart disease are lower in individuals who perform daily exercise (protective)
This test is used to measure statistical difference and is used to compare the MEANS from 2 different samples in a NORMALLY distributed data set
t-test
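As a sketch of what a two-sample t-test computes, the code below calculates the t statistic and approximate degrees of freedom using Welch's variant, which does not assume equal variances (a choice made for this example, not something the card specifies). The measurements are hypothetical, and the p-value lookup is left out to keep the sketch to the standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements from two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.0, 4.8, 4.4]

t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The resulting t and df would then be compared against a t distribution to obtain the p-value.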
This test is used to determine if there are significant differences between the means of multiple samples by comparing the variance between groups to the variance within groups
ANOVA
This test is used to determine the association between 2 categoric variables from 2 or more independent groups in the form of a 2x2 table
Compares the frequency distribution from observed data to the frequency distribution that would be expected under null hypothesis of no association
Chi-squared test
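The observed-versus-expected comparison can be sketched for a 2x2 table as follows. Expected counts come from the row and column totals under the null hypothesis of no association. The counts are hypothetical, and no continuity correction is applied in this sketch.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical table: 30/100 exposed and 10/100 nonexposed have the outcome.
chi2 = chi_squared_2x2(a=30, b=70, c=10, d=90)
CRITICAL_VALUE_DF1 = 3.841  # chi-squared cutoff for 1 degree of freedom, alpha = 0.05
print(f"chi-squared = {chi2:.2f}")
print("reject null at 0.05" if chi2 > CRITICAL_VALUE_DF1 else "fail to reject null")
```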
This analysis is used to estimate the influence of one or more independent variables and predict the value of the dependent variable, allows good control of confounders
Regression analysis
Which is used for a continuous dependent variable: logistic or linear regression?
Linear regression
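For a continuous dependent variable, a simple linear regression fits a straight line by least squares. The sketch below does this by hand for one independent variable; the data points and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: independent variable x, continuous outcome y.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 3.9, 6.1, 8.0, 9.8]

slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(f"predicted y at x = 6: {intercept + slope * 6:.2f}")
```

Logistic regression would be the choice instead when the dependent variable is binary.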
This is a statement of no effect or no association.
Null hypothesis
This is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true (i.e., that the finding arose by chance alone).
P-value (statistical significance)
TRUE/FALSE: p-values indicate the strength and/or direction of the association
FALSE
They depend heavily on the sample size
and effect size
This is the range within which the true value of a parameter lies
Confidence interval (CI)
What does a 95% CI mean?
If the study were repeated many times, about 95% of the intervals calculated this way would contain the true value for the entire population; informally, 95% confidence that the interval contains the true value
True/False: Precision of CI depends on the sample size and variation
TRUE
larger sample sizes and less variation give narrower CIs, meaning more precision in results, and vice versa
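The effect of sample size on CI width can be shown with a rough sketch. It uses the normal approximation (z = 1.96) for a 95% CI around a mean, which is an assumption that suits large samples; the mean, standard deviation, and sample sizes are hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # normal-approximation multiplier for a 95% CI


def ci_mean(sample_mean, sample_sd, n):
    """Approximate 95% CI for a mean: mean +/- 1.96 * SD / sqrt(n)."""
    margin = Z_95 * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Same hypothetical mean and SD, two different sample sizes.
for n in (25, 100):
    low, high = ci_mean(sample_mean=120.0, sample_sd=15.0, n=n)
    print(f"n = {n}: 95% CI {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Quadrupling the sample size halves the width of the interval; lowering the standard deviation narrows it in the same way.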
This error is a rejection of the null hypothesis when the null hypothesis is true (false+)
Type I error
What is a type II error?
Failing to reject (accepting) the null hypothesis when it’s false (false-)
This is the probability of finding a true difference when the difference really exists or the probability of correctly rejecting the null hypothesis when it’s false
Power
Ex: if the probability of a type II error is 5%, the statistical power is 95%
What’s considered a good power level?
> 9000
or in research, 80-90% power
What is selection bias?
Errors in participant selection or bias in the assignment of participants
What is information bias?
Errors in data collection and poor categorization of exposure or outcome status
What is recall bias?
Poor recall or memory of exposure status in case-control studies
What is interviewer bias?
Results obtained differently in the disease vs control groups by unblinded interviewers
What is publication bias?
Preferential publication of striking results in small studies
AKA every COVID paper thus far.
This is a variable that is associated with the exposure, can independently cause or prevent the outcome of interest, and is not an intermediate variable
Confounder
What is essential for external validity (aka generalizability)?
Internal validity
Why do we care about intention to treat analysis?
Measures effectiveness (aka real world care) rather than efficacy
241 of 487 patients treated with salmeterol (49.5%) and 210 of 507 patients treated with salmeterol/fluticasone (41.4%) had at least one exacerbation over the 44-week trial. What is the NNT to prevent one exacerbation for the salmeterol/fluticasone group?
The absolute risk reduction (ARR) is 49.5%–41.4% = 8.1%. The NNT with salmeterol/fluticasone (rather than salmeterol alone) to prevent one additional patient from experiencing an exacerbation in 44 weeks = 1/0.081 ≈ 12.3, conventionally rounded up to 13.
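The NNT arithmetic can be reproduced from the raw counts in the question. This is a sketch with a made-up function name; rounding the result up to the next whole patient is a common convention and is applied here as an assumption.

```python
from math import ceil


def arr_and_nnt(events_control, n_control, events_treated, n_treated):
    """Return (absolute risk reduction, unrounded NNT = 1 / ARR)."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    arr = risk_control - risk_treated
    if arr <= 0:
        raise ValueError("no risk reduction, NNT is undefined")
    return arr, 1 / arr


# Counts from the card: salmeterol alone vs salmeterol/fluticasone.
arr, raw_nnt = arr_and_nnt(
    events_control=241, n_control=487,
    events_treated=210, n_treated=507,
)
print(f"ARR = {arr:.1%}")
print(f"NNT = {raw_nnt:.1f}, rounded up to {ceil(raw_nnt)}")
```

Using the unrounded risks gives a slightly different decimal than dividing by the rounded 8.1%, but the whole-number NNT is the same.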