BioStat Flashcards
STEPS TO JOURNAL PUBLICATION
- Begin with research question: Write a null hypothesis
- Design the study: Is it randomized, placebo-controlled, case-control, or another design?
- Enroll the subjects
- Collect data: prospective (going into the future) or retrospective (back in time)
- Analyze data
- Publish!
What are the two main types of study data?
- Continuous Data
- Discrete (Categorical) Data
Continuous Data: what is it? What are its two types?
Discrete (Categorical) Data: what is it? What are its two types?
Measures of Central Tendency
Mean, Median, Mode
Mean: the average value; it is calculated by adding up the values and dividing the sum by the number of values. The mean is preferred for continuous data that is normally distributed
Median: the value in the middle when the values are arranged from lowest to highest. When there are two center values (as with an even number of values), take the average of the two center values. The median is preferred for ordinal data or continuous data that is skewed (not normally distributed).
Mode: the value that occurs most frequently. The mode is preferred for nominal data.
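As a quick illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The data set is hypothetical, chosen only so that the mean, median, and mode come out different.

```python
import statistics

# Hypothetical values (not from any study), sorted for readability
values = [2, 3, 3, 5, 7, 10]

# Mean: add up the values and divide by the number of values
mean_value = statistics.mean(values)

# Median: even number of values, so average the two center values (3 and 5)
median_value = statistics.median(values)

# Mode: the value that occurs most frequently
mode_value = statistics.mode(values)

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```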
Gaussian (normal) Distribution: characteristics
When the distribution of data is normal, the curve is symmetrical (even on both sides), with most of the values closer to the middle. Half of the values are on the left side of the curve, and half of the values are on the right side. When data is normally distributed:
■ The mean, median and mode are the same value, and are at the center point of the curve.
■ 68% of the values fall within 1 SD of the mean and 95% of the values fall within 2 SDs of the mean.
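The 68% and 95% figures can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `fraction_within` is made up for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k_sd):
    """Fraction of normally distributed values within k_sd SDs of the mean."""
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)

within_1_sd = fraction_within(1)  # roughly 68%
within_2_sd = fraction_within(2)  # roughly 95%

print(f"within 1 SD: {within_1_sd:.1%}")
print(f"within 2 SD: {within_2_sd:.1%}")
```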
The examples show how the curve of normally distributed data changes based on the spread (or range) of the data. The curve gets taller and skinnier as the range of data narrows. The curve gets shorter and wider as the range of data widens (or is more spread out).
SKEWED DISTRIBUTIONS: Data that are skewed do not have the characteristics of a normal distribution; the curve is not symmetrical. Skew refers to the direction of the tail, which is where the outliers (extreme values) lie.
An outlier is an extreme value, either very low or very high, compared to the norm. For example, if a study reports the mean weight of included adult patients as 90 kg, then a patient in the same study with a weight of 40 kg or 186 kg is an outlier. When there are a small number of values, an outlier has a large impact on the mean and the data becomes skewed. In this case, the median is a better measure of central tendency.
Skew:
Data is skewed towards outliers. When there are more low values in a data set and the outliers are the high values, data is skewed to the right (positive skew). When there are more high values in the data set and the outliers are the low values, the data is skewed to the left (negative skew).
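A rough way to see the direction of skew is to compare the mean with the median, since an outlier pulls the mean toward the tail much more than it pulls the median. The sketch below applies that rule of thumb; it is a heuristic, not a formal skewness statistic. The weights are hypothetical, using the 186 kg and 40 kg outliers from the weight example, and `skew_direction` is a name invented for this illustration.

```python
import statistics

def skew_direction(values):
    """Rule of thumb: the mean is pulled toward the tail, the median much less so."""
    mean_value = statistics.mean(values)
    median_value = statistics.median(values)
    if mean_value > median_value:
        return "right (positive) skew"
    if mean_value < median_value:
        return "left (negative) skew"
    return "no skew detected"

# Hypothetical weights in kg: mostly low values with one high outlier
high_outlier = [70, 72, 75, 78, 80, 186]
# Hypothetical weights in kg: mostly high values with one low outlier
low_outlier = [40, 85, 88, 90, 92, 95]

print(skew_direction(high_outlier))  # mean 93.5 vs median 76.5
print(skew_direction(low_outlier))   # mean about 81.7 vs median 89
```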
DEPENDENT AND INDEPENDENT VARIABLES
THE NULL HYPOTHESIS (H0) AND ALTERNATIVE HYPOTHESIS (HA)
The null hypothesis states that there is no statistically significant difference between groups. In a study comparing a drug to a placebo, the null hypothesis would assert that there is no difference in efficacy between them (drug efficacy = placebo efficacy). The researcher aims to disprove or reject this hypothesis.
The alternative hypothesis, on the other hand, posits that there is a statistically significant difference between the groups (drug efficacy ≠ placebo efficacy). This is what the researcher hopes to prove or accept.
ALPHA LEVEL: THE STANDARD FOR SIGNIFICANCE
When investigators design a study, they select a maximum permissible error margin, called alpha (α). Alpha is the threshold for rejecting the null hypothesis. In medical research, alpha is commonly set at 5% (or 0.05).
The p-value is compared to alpha. If alpha is set at 0.05 and the p-value is less than alpha (p < 0.05), the null hypothesis is rejected, and the result is termed statistically significant.
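The decision rule can be written as a one-line comparison. This is a minimal sketch; the function name and the two p-values are hypothetical and only illustrate the rule.

```python
def is_statistically_significant(p_value, alpha=0.05):
    """Return True when the null hypothesis would be rejected (p < alpha)."""
    return p_value < alpha

# Hypothetical p-values, not taken from any study
print(is_statistically_significant(0.03))  # p < 0.05: reject the null hypothesis
print(is_statistically_significant(0.20))  # p >= 0.05: fail to reject the null hypothesis
```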
Interpreting CI
The values in the CI range are used to determine whether significance has been reached
Interpreting CI
Comparing ratio data (relative risk, odds ratio, hazard ratio)
CI and estimation…narrow vs wide CI/ meaning
A narrow CI range implies high precision, while a wide CI range implies poor precision. For example, a study comparing metoprolol to placebo finds a 12% absolute risk reduction (ARR) in heart failure progression, with a 95% CI range of 6-35%. This can be written as ARR 12% (95% CI 6%-35%) or as ARR 0.12 (95% CI 0.06, 0.35). The CI indicates 95% confidence that the true ARR for the population lies between 6% and 35%. A wider range, such as 4%-68%, indicates less precision, making it unclear where within that range the true value lies.
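The two uses of a CI (precision and significance) can be sketched in a few lines. One assumption is added here that the cards do not spell out: by the usual convention, the no-effect value is 0 for difference data (such as ARR) and 1 for ratio data (RR, OR, HR), and a CI that contains the no-effect value is not statistically significant. The function names and the ratio example are hypothetical.

```python
def ci_width(lower, upper):
    """Width of the interval; a smaller width means higher precision."""
    return upper - lower

def ci_excludes_no_effect(lower, upper, no_effect_value):
    """True when the no-effect value lies outside the interval (significant)."""
    return not (lower <= no_effect_value <= upper)

# ARR example from the card: 95% CI 0.06 to 0.35, versus a wider 0.04 to 0.68
narrow_width = ci_width(0.06, 0.35)
wide_width = ci_width(0.04, 0.68)
print(f"narrow: {narrow_width:.2f}, wide: {wide_width:.2f}")

# Difference data (assumed no-effect value 0): the ARR interval excludes 0
print(ci_excludes_no_effect(0.06, 0.35, no_effect_value=0))

# Ratio data (assumed no-effect value 1): hypothetical RR interval 0.8 to 1.2 includes 1
print(ci_excludes_no_effect(0.8, 1.2, no_effect_value=1))
```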
Type I Errors: False-Positive
In the scenario described, a Type I error occurs when the alternative hypothesis is accepted and the null hypothesis is rejected in error. The probability of making a Type I error is determined by alpha, which is related to the confidence interval. When alpha is 0.05 and a study result reports p < 0.05, it is statistically significant, and the probability of a Type I error is less than 5%. This means you are 95% confident (0.95 = 1 - 0.05) that the result is correct and not due to chance.
CI = 1 - α (type I error)
Type II Errors: False-Negative
A Type II error occurs when the null hypothesis is accepted when it should have been rejected. The probability of a Type II error is denoted as beta (β). Beta is typically set at 0.1 or 0.2, indicating a 10% or 20% risk of a Type II error. This risk increases with a small sample size. To decrease this risk, a power analysis is performed to determine the necessary sample size to detect a true difference between groups.
False positive, false negative in H0 relationships
Risk and Relative Risk (Risk Ratio) calculation/ formula
A placebo-controlled study was performed to evaluate whether metoprolol reduces disease progression in patients with heart failure (HF). A total of 10,111 patients were enrolled and followed for 12 months. What is the relative risk of HF progression in the metoprolol-treated group versus the placebo group? Calculate the risk of HF progression in each group. Then calculate the RR and interpret it.
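A minimal sketch of the two formulas follows. The per-group patient counts are not reproduced on this card, so the RR is computed from the group risks quoted on the later cards (metoprolol 16%, placebo 28%). The counts passed to `risk` are hypothetical and only show how a risk is obtained from raw numbers; the function names are invented for this illustration.

```python
def risk(events, total):
    """Risk = number of subjects with the outcome / total subjects in the group."""
    return events / total

def relative_risk(risk_treatment, risk_control):
    """RR = risk in the treatment group / risk in the control group."""
    return risk_treatment / risk_control

# Hypothetical counts: 16 events among 100 subjects gives a risk of 16%
example_risk = risk(16, 100)

# Risks quoted for the metoprolol study: 16% (metoprolol) and 28% (placebo)
rr = relative_risk(0.16, 0.28)
print(f"RR = {rr:.2f}")  # RR = 0.57
```

An RR of 0.57 is below 1, so the metoprolol group had about 57% of the risk of HF progression seen in the placebo group.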
RELATIVE RISK REDUCTION (RRR): Interpretation and formula
The RR calculation determines whether there is less risk (RR < 1) or more risk (RR > 1). The relative risk reduction (RRR) is calculated after the RR and indicates how much the risk is reduced in the treatment group, compared to the control group.
Using the risks previously calculated for HF progression in the treatment and control groups (metoprolol: 16% and placebo: 28%), calculate the RRR of HF progression.
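One way to work this card, as a sketch: RRR is the difference between the control and treatment risks divided by the control risk, which is the same as 1 - RR. The function name is made up for this illustration.

```python
def relative_risk_reduction(risk_treatment, risk_control):
    """RRR = (risk in control - risk in treatment) / risk in control = 1 - RR."""
    return (risk_control - risk_treatment) / risk_control

# Risks from the card: metoprolol 16%, placebo 28%
rrr = relative_risk_reduction(0.16, 0.28)
print(f"RRR = {rrr:.0%}")  # RRR = 43%
```

An RRR of about 43% means metoprolol-treated patients were 43% less likely to have HF progression than placebo-treated patients.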
ABSOLUTE RISK REDUCTION: Interpretation and formula
Absolute risk reduction is more useful than relative risk reduction because it reflects both the reduction in risk and the incidence rate of the outcome.
ARR Calculation
Using the risks previously calculated for HF progression in the metoprolol study, calculate the ARR of HF progression.
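A sketch of the calculation, using the risks quoted for the metoprolol study (metoprolol 16%, placebo 28%): ARR is the control risk minus the treatment risk. The function name is made up for this illustration.

```python
def absolute_risk_reduction(risk_treatment, risk_control):
    """ARR = risk in control group - risk in treatment group."""
    return risk_control - risk_treatment

# Risks quoted for the metoprolol study: 16% (metoprolol) and 28% (placebo)
arr = absolute_risk_reduction(0.16, 0.28)
print(f"ARR = {arr:.0%}")  # ARR = 12%
```

An ARR of 12% means that 12 out of every 100 patients benefit from treatment.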
NUMBER NEEDED TO TREAT (NNT): interpretation and formula
NNT is the number of patients who need to be treated for a certain period of time (e.g., one year) in order for one patient to benefit (e.g., avoid HF progression).
NNT Calculation
The ARR in the metoprolol study was 12%. The duration of the study period was one year. Calculate the number of patients that need to be treated with metoprolol for one year in order to prevent one case of HF progression.
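A sketch of the calculation: NNT is 1 divided by the ARR, with the ARR expressed as a decimal. Rounding up to the next whole number is assumed here as the usual convention, since a fraction of a patient cannot be treated. The function name is made up for this illustration.

```python
import math

def number_needed_to_treat(arr):
    """NNT = 1 / ARR (ARR as a decimal), rounded up to the next whole patient."""
    return math.ceil(1 / arr)

# ARR from the card: 12%, expressed as 0.12
nnt = number_needed_to_treat(0.12)
print(f"NNT = {nnt}")  # 1 / 0.12 = 8.33, rounded up to 9
```

An NNT of 9 means that 9 patients need to be treated with metoprolol for one year to prevent one case of HF progression.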