Biostatistics Flashcards
Steps to journal publication
Research question
Design the study
Enroll subjects
Collect the data
Analyze the data
Publish
Types of data
Continuous data have a logical order, with values that increase or decrease by equal increments; for example, an HR of 120 is twice as fast as an HR of 60.
Continuous data include ratio data, where 0 means none, and interval data, where 0 does not mean none.
Ratio: age, height, weight, time, BP
Interval: temperature (Celsius or Fahrenheit)
Categorical (discrete) data are sorted into a limited number of categories; any numbers assigned to the categories are only labels.
Categorical data include nominal data, where the order of the categories does not matter, and ordinal data, where the categories are ranked in a logical order.
Nominal: gender, ethnicity, marital status, mortality
Ordinal: NYHA functional classes, pain scale
Bell curve, central tendency, spread and skewed data
The mean is the average, the median is the middle value, and the mode is the most frequent value. Mean is preferred for continuous, normally distributed data. Median is preferred for ordinal data or skewed continuous data. Mode is preferred for nominal data.
Range is the difference between the highest and lowest values. Standard deviation (SD) describes how spread out the values are around the mean.
A Gaussian distribution is the normal bell curve. The mean, median, and mode are the same value, the left and right halves are symmetric, and about 68% of values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD.
Outliers, or extreme values, can skew the data. Skew is named for the direction of the tail, so the hump sits on the side opposite the skew. If the data are skewed right, there are high outliers and the hump is on the left; if skewed left, there are low outliers and the hump is on the right. Right skew is also called positive skew because the outliers pull the mean toward higher (more positive) values, and left skew is called negative skew because they pull the mean toward lower (more negative) values.
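A quick way to see these measures side by side is to compute them on a small data set. The sketch below uses Python's standard statistics module; the heart-rate values are made up for illustration, with one high outlier so that the mean is pulled above the median (right skew).

```python
import statistics

# Hypothetical heart-rate readings (beats per minute); 120 is a high outlier.
heart_rates = [60, 62, 64, 64, 66, 68, 120]

mean_hr = statistics.mean(heart_rates)      # average
median_hr = statistics.median(heart_rates)  # middle value
mode_hr = statistics.mode(heart_rates)      # most frequent value
value_range = max(heart_rates) - min(heart_rates)
sd_hr = statistics.stdev(heart_rates)       # sample standard deviation

# Rule of thumb used here: a mean above the median suggests right (positive) skew.
if mean_hr > median_hr:
    skew_direction = "right"
elif mean_hr < median_hr:
    skew_direction = "left"
else:
    skew_direction = "none"

print("mean:", mean_hr, "median:", median_hr, "mode:", mode_hr)
print("range:", value_range, "SD:", round(sd_hr, 1), "skew:", skew_direction)
```

The single outlier moves the mean (72) well away from the median (64), which is why the median is preferred for skewed data.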
All about the hypothesis, alpha, p-value, CI
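The null hypothesis states that there is no difference between the groups (no treatment effect). The alternative hypothesis states that there is a difference, which may be in one direction (higher or lower) or in either direction. The alternative is usually what the researchers hope to show. In a noninferiority trial the hypotheses are framed around the margin instead: the null is that the new treatment is worse by more than the margin.
Alpha is the threshold for rejecting the null: the maximum accepted probability of a type I error. Alpha is generally set at 5% (0.05). A smaller alpha makes it harder to reject the null.
Alpha corresponds to the tail area of the normal bell curve: a two-sided alpha of 0.05 places the cutoff at about 2 SD (1.96) from the mean, and an alpha of 0.01 places it at about 2.6 SD.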
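The link between alpha and the tails of the normal curve can be checked numerically. This is a minimal sketch using NormalDist from Python's standard statistics module; the helper names share_within and two_sided_cutoff are made up for this example, and the cutoffs assume a two-sided test on a standard normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k_sd):
    """Fraction of a normal distribution within k_sd standard deviations of the mean."""
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)

def two_sided_cutoff(alpha):
    """Number of SDs from the mean that leaves alpha split evenly across both tails."""
    return standard_normal.inv_cdf(1 - alpha / 2)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

for alpha in (0.05, 0.01):
    print(f"alpha {alpha}: cutoff at {two_sided_cutoff(alpha):.2f} SD")
```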
The p-value is compared with alpha: if alpha is 0.05, a p-value less than 0.05 is statistically significant and the null is rejected.
Confidence intervals (CI) give the same information as the significance level and add precision. The CI is tied to alpha: CI=1-alpha, so if alpha is 0.05, the CI is 95%. The CI is reported as a range; for difference data (such as a difference in means), the result is NOT significant if the range crosses zero.
So a 95% CI of 18 to 58 mL is significant, but a 95% CI of -0.26 to 0.89 is not significant.
For ratio results (RR, OR, HR), the result is not significant if the CI includes 1.
A wide CI is less precise; a narrow CI is more precise.
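The CI rule can be written as a one-line check: a result is significant only if the interval excludes the no-effect value (0 for differences, 1 for ratios). The function name ci_is_significant is made up for this sketch, and the ratio interval in the last line is a hypothetical example.

```python
def ci_is_significant(lower, upper, null_value):
    """Return True when the confidence interval excludes the no-effect value."""
    return not (lower <= null_value <= upper)

# Difference data: the no-effect value is 0.
print(ci_is_significant(18, 58, null_value=0))       # True
print(ci_is_significant(-0.26, 0.89, null_value=0))  # False

# Ratio data (RR, OR, HR): the no-effect value is 1. Hypothetical interval.
print(ci_is_significant(0.85, 1.20, null_value=1))   # False
```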
Types of Errors
Type I: false positive; the null was actually true but was rejected. The probability of a type I error is alpha, so with alpha=0.05 there is a 5% chance of this error when the null is true. This error is due to chance.
Type II: false negative; the null was actually false but was not rejected. The probability of a type II error is beta, which is usually set at 0.1 or 0.2, meaning a 10-20% chance of a type II error. This risk increases with a smaller sample size.
Power is the probability of correctly rejecting a false null, that is, of avoiding a type II error. Power=1-beta. As power increases, the risk of a type II error decreases.
Types of risks: relative risk/risk ratio, relative risk reduction, absolute risk reduction.
Relative risk: the risk in the exposed or treatment group divided by the risk in the control group. If RR is 1 (100%), there is no difference in risk between the groups. If RR is >1, the risk is greater in the treatment group; if RR is <1, the risk is lower in the treatment group. An RR of 1.5 means a 50% higher risk of the event in the treatment group. An RR of 0.5 means a 50% lower risk in the treatment group.
Risk=number of people with the event/total number in the group
RR=risk in treatment/risk in control
Relative risk reduction is 1-RR and describes how much the treatment reduces the risk relative to the control group. So if RR is 0.57 (57%), then RRR is 43%, and the treatment group is 43% less likely to have the event.
Absolute risk reduction is the difference in event rates between the groups, so unlike RRR it reflects how common the event is. ARR=% risk in the control group minus % risk in the treatment group. If ARR is 12%, then 12 of every 100 patients treated benefit; that is, for every 100 patients treated, 12 avoid the event.
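The three risk measures can be worked through on one set of numbers. The counts below are hypothetical, chosen so the results line up with the RR of 57%, RRR of 43%, and ARR of 12% used on this card; the variable names are made up for this sketch.

```python
def risk(events, total):
    """Risk = number of people with the event / total number in the group."""
    return events / total

# Hypothetical trial: 16 of 100 treated patients and 28 of 100 controls have the event.
treatment_risk = risk(events=16, total=100)
control_risk = risk(events=28, total=100)

relative_risk = treatment_risk / control_risk            # RR
relative_risk_reduction = 1 - relative_risk              # RRR
absolute_risk_reduction = control_risk - treatment_risk  # ARR

print(f"RR  = {relative_risk:.2f}")
print(f"RRR = {relative_risk_reduction:.0%}")
print(f"ARR = {absolute_risk_reduction:.0%}")
```

The same trial gives an RRR of 43% but an ARR of only 12%, which is why both are worth reporting.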
Number needed to treat and harm
ARR is used to calculate the number needed to treat (NNT): the number of patients who need to be treated for 1 person to benefit.
NNT=1/ARR, with ARR written as a decimal. If ARR is 12% (0.12), then 1/0.12=8.3 and NNT is 9. For every 9 patients on the treatment, one will benefit. The number needed to harm (NNH) is the number of patients who need to be treated, for a specific time, for one person to experience harm. NNH is calculated the same way, as 1 divided by the absolute difference in risk of the harmful event; however, NNT is rounded up and NNH is rounded down. So with a 12% absolute difference, NNH is 8. One additional harmful event is expected for every 8 patients on treatment.
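The rounding rule is easy to get backwards, so here is a minimal sketch of both calculations. The function names are made up for this example, and the risk difference is passed as a decimal (0.12 for 12%).

```python
import math

def number_needed_to_treat(absolute_risk_reduction):
    """NNT = 1/ARR, always rounded up to the next whole patient."""
    return math.ceil(1 / absolute_risk_reduction)

def number_needed_to_harm(absolute_risk_difference):
    """NNH = 1/(absolute difference in risk of harm), always rounded down."""
    return math.floor(1 / absolute_risk_difference)

print(number_needed_to_treat(0.12))  # 9
print(number_needed_to_harm(0.12))   # 8
```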
Odds ratio and hazards ratio
Odds are the probability that an event will occur divided by the probability that it will not occur. In case-control studies, the odds ratio is used to estimate the risk of events.
OR=AD/BC: Exposure with event is A. Exposure without event is B. No exposure with event is C. No exposure without event is D. If the OR is 1.23, then the exposure is associated with 23% higher odds of the event.
The hazard ratio is used in survival analysis and is based on event rates over time. HR=hazard rate in treatment/hazard rate in control.
If OR or HR=1, then the event rate is the same in the exposed and nonexposed group.
If they are >1, then the event rate is higher in the exposed group.
If they are <1, then the event rate is lower in the exposed group.
If HR is 0.5, then at any given time the event rate in the exposed group is half that in the nonexposed group.
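The OR=AD/BC formula maps directly onto a 2x2 table. The counts below are hypothetical and the function name odds_ratio is made up for this sketch.

```python
def odds_ratio(a, b, c, d):
    """OR = (A*D)/(B*C) from a 2x2 table.

    a: exposed, with event       b: exposed, without event
    c: not exposed, with event   d: not exposed, without event
    """
    return (a * d) / (b * c)

# Hypothetical counts: 30 of 100 exposed and 20 of 100 unexposed have the event.
result = odds_ratio(a=30, b=70, c=20, d=80)
print(round(result, 2))  # 1.71
```

An OR of 1.71 is greater than 1, so the event is more common in the exposed group.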
Types of statistical tests
Tests for continuous data depend on the distribution. Parametric tests are used for normally distributed data, and nonparametric tests are used for data that are not normally distributed. Tests for categorical data do not make this distinction. The type of test also depends on the number of groups studied.
One group:
Parametric; one-sample t-test
Nonparametric; sign test
Categorical; chi-square
One group with before/after:
Parametric; paired t-test
Nonparametric; Wilcoxon signed-rank test
Categorical; Wilcoxon signed-rank test (for ordinal data)
Two groups:
Parametric; unpaired Student's t-test
Nonparametric; Mann-Whitney (Wilcoxon rank-sum)
Categorical; Chi-square, Fisher’s exact, or Mann-Whitney for ordinal data
Three or more groups:
Parametric; ANOVA (F-test)
Nonparametric; Kruskal-Wallis test
Categorical; Kruskal-Wallis test (for ordinal data)
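For drilling, the choices above can be stored as a lookup keyed by study design and data type. This is only a memory aid, not a substitute for checking a test's assumptions; the key strings and the name choose_test are hypothetical labels picked for this sketch.

```python
# Test choices from this card, keyed by (number of groups, data type).
TEST_BY_DESIGN = {
    ("one group", "parametric"): "one-sample t-test",
    ("one group", "nonparametric"): "sign test",
    ("one group", "categorical"): "chi-square",
    ("before/after", "parametric"): "paired t-test",
    ("before/after", "nonparametric"): "Wilcoxon signed-rank test",
    ("before/after", "categorical"): "Wilcoxon signed-rank test",
    ("two groups", "parametric"): "unpaired Student's t-test",
    ("two groups", "nonparametric"): "Mann-Whitney (Wilcoxon rank-sum)",
    ("two groups", "categorical"): "chi-square or Fisher's exact",
    ("three or more groups", "parametric"): "ANOVA (F-test)",
    ("three or more groups", "nonparametric"): "Kruskal-Wallis test",
    ("three or more groups", "categorical"): "Kruskal-Wallis test",
}

def choose_test(groups, data_type):
    """Look up the test listed on the card for a design and data type."""
    return TEST_BY_DESIGN[(groups, data_type)]

print(choose_test("two groups", "parametric"))
print(choose_test("three or more groups", "nonparametric"))
```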
Correlation and regression
Correlation describes whether two variables, such as an exposure and an event, change together. With a positive correlation, events increase as exposure increases; with a negative correlation, events decrease as exposure increases.
Spearman’s rank-order correlation (rho) is for ordinal, ranked data. Continuous data use Pearson’s correlation (r). The value of r shows the strength and direction of the correlation, where +1 is a perfect positive correlation, 0 is no correlation, and -1 is a perfect negative correlation.
Correlation does not prove causation.
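Pearson's r can be computed by hand from its definition: the summed products of the deviations from each mean, divided by the square root of the product of the summed squared deviations. The sketch below assumes two equal-length lists of continuous values; the data and the name pearson_r are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    squares_x = sum((x - mean_x) ** 2 for x in xs)
    squares_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(squares_x * squares_y)

doses = [1, 2, 3, 4, 5]          # hypothetical exposure levels
rising = [2, 4, 6, 8, 10]        # outcome rises with exposure
falling = [10, 8, 6, 4, 2]       # outcome falls with exposure

print(pearson_r(doses, rising))   # 1.0
print(pearson_r(doses, falling))  # -1.0
```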
Regression describes how much the dependent variable changes with the independent variable. It is commonly used with multiple independent variables. Three types: linear for continuous data, logistic for categorical data, and Cox regression for categorical data in a survival analysis.
Sensitivity and specificity
Sensitivity describes how well a test identifies people who have the disease. If sensitivity is 100%, all patients with the disease will test positive; however, the chance of false positives is generally higher.
If the person has the disease and the test is positive, it is A (true positive). If the person does not have the disease but the test is positive, it is B (false positive). If the person has the disease but the test is negative, it is C (false negative). If the person does not have the disease and the test is negative, it is D (true negative).
Sensitivity=A/(A+C) x 100
Specificity=D/(B+D) x 100
Specificity describes how well a test rules out the disease in people who do not have it. 100% specificity means all patients without the disease will have a negative result; however, the chance of false negatives is generally higher.
If a test has a sensitivity of 28%, only 28% of people who have the disease will test positive; the other 72% will test negative, so the disease is easily missed. If a test has a specificity of 87%, the test is negative in 87% of people without the disease and positive in the other 13%, who could receive a false diagnosis.
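The two formulas can be checked against the 28% and 87% example. The counts below are hypothetical (100 people with the disease, 100 without), and A, B, C, D are used as true positive, false positive, false negative, and true negative, which is what the formulas require.

```python
def sensitivity(a, c):
    """Sensitivity (%) = A/(A+C) x 100, using true positives and false negatives."""
    return 100 * a / (a + c)

def specificity(b, d):
    """Specificity (%) = D/(B+D) x 100, using false positives and true negatives."""
    return 100 * d / (b + d)

# Hypothetical counts.
true_positive, false_negative = 28, 72  # the 100 people with the disease
false_positive, true_negative = 13, 87  # the 100 people without the disease

print(sensitivity(a=true_positive, c=false_negative))  # 28.0
print(specificity(b=false_positive, d=true_negative))  # 87.0
```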
Trial populations and designs
Intention-to-treat analysis includes all patients originally enrolled, even if they did not complete the trial. This gives a conservative, real-world estimate.
Per-protocol analysis includes only those who completed the trial according to the protocol. This gives an optimistic estimate.
Noninferiority trials try to show that the drug is not worse than the current treatment by more than a set margin. The delta margin is the minimal difference in effect that is considered clinically acceptable. Because each new drug can be slightly less effective than its comparator and still pass, repeated noninferiority trials may lead to a gradual decrease in efficacy.
Equivalence trials try to show that the drug is as effective as the current treatment. They test in both directions, for higher or lower effectiveness. This does not lead to the same decrease in effectiveness, because the actual difference is assessed in both directions.
Types of graphs and comparing data
Forest plots are commonly used in meta-analyses. They resemble a box-and-whisker plot, showing the effect estimate and CI for each study. Each box is one study's effect estimate, and the diamond is the pooled effect. The horizontal lines show the width of each CI; if a line crosses the vertical line of no effect, that result is not significant.
Check the type of data to see whether the vertical line should be at zero or one. For difference data it is zero; for ratio data it is one.
Types of medical studies, benefits, and limitations, with examples.
Case-control: compares patients with a disease to those without the disease and looks back retrospectively for a relationship with an exposure. Benefits: data are easy to get, suitable when assigning the exposure would be unethical, less expensive than an RCT. Limitations: correlation does not equal causation. Example: predictors of surgical site infections after open lower extremity bypass revascularization.
Cohort study: compares outcomes between groups of patients with and without an exposure or treatment, followed prospectively or retrospectively. Benefits: suitable when assigning the exposure would be unethical. Limits: can be influenced by confounders, more expensive. Example: statin use and cognitive function.
Cross-sectional study: estimates the relationship between variables and outcomes at one specific point in time. Benefits: can find associations to study later. Limits: no causality. Example: SSRI and bone mineral density in elderly women.
Case report/case series: describes a unique adverse event in one patient (case report) or in multiple patients (case series). Benefits: can identify new diseases, side effects, or uses. Limits: conclusions cannot be drawn. Example: tardive oculogyric crisis with clozapine.
Randomized controlled trial: the gold standard, and an actual experiment. Benefits: preferred for showing cause and effect or superiority, less bias. Limits: expensive, may not reflect the real world. Example: Entresto versus enalapril in HF.
Crossover RCT: an RCT in which patients switch treatments halfway through. Benefits: each patient is their own control. Limits: needs a washout period. Example: timolol and latanoprost crossover for glaucoma.
Factorial design: randomizes patients to more than the usual two groups so that multiple interventions can be tested. Benefits: evaluates multiple interventions. Limits: each group needs more subjects for power. Example: prednisone or pentoxifylline in hepatitis.
Meta-analysis: combines the results of multiple trials retrospectively. Benefit: small studies can be pooled for a larger result. Limits: the studies are not uniform. Example: CKD with antioxidants.
Systematic review article: summary of the clinical literature for a specific question. Benefits: inexpensive. Limits: can be biased. Example: evolving treatment in RCC after VEGF inhibitors.
Pharmacoeconomics and definitions
Pharmacoeconomic research identifies, measures, and compares the costs and the consequences of pharmaceutical products and services. Outcomes research is the broader term, covering healthcare economics in general.
Pharmacoeconomic methods include cost-effectiveness, cost-minimization, cost-utility, and cost-benefit analysis.
The study perspective is critical for interpretation.
The ECHO (Economic, Clinical, and Humanistic Outcomes) model provides a broad framework for assessing outcomes. Economic outcomes include the direct, indirect, and intangible costs of the drug compared with another medical intervention. Clinical outcomes include medical events that occur due to the treatment or intervention. Humanistic outcomes include the consequences of the disease or its treatment as experienced by the patient.
Direct medical costs include preparation, administration, home infusion supplies, hospital bed, staff, procedures, office visits. Non-medical direct costs include travel, lodging, childcare, and home health aides. Indirect costs include lost work time, low productivity, morbidity, and mortality. Intangible includes pain, suffering, anxiety, and fatigue.