Statistics and Study Design Flashcards
What are descriptive studies vs analytical studies?
Descriptive studies: describe characteristics of a population/phenomenon (descriptive surveys, case reports, cross-sectional studies)
Analytical studies: test hypotheses and determine associations; may be observational (survey, cohort, case-control) or experimental (RCT)
- Analytical studies employ inferential statistical tools
Data Types
- Nominal
- Ordinal
- Interval
- Ratio scale
Nominal - includes dichotomous/binary (pregnancy yes/no) and categorical
Ordinal - ranked (e.g. OHSS severity)
Interval - equal, linear spacing between units but NO absolute zero (e.g. temperature in Celsius or Fahrenheit)
Ratio scale - has a true (absolute) zero, so ratios are meaningful (e.g. Kelvin temperature scale, weight in lbs, age)
Line charts are useful for…
Bar charts are useful for…
Pie charts are useful for…
Scatter plots are useful for…
Line charts - showing trends over time
Bar charts - comparing categories
Pie charts - showing parts of a whole
Scatter plots - showing how paired data points are distributed across two variables (relationship/correlation)
What defines a normal distribution?
Symmetric, bell-shaped curve (mean = median = mode) with "thin" tails (few extreme values); observations are assumed to be independent
Central limit theorem
sums (or averages) of many independent random variables converge toward a normal distribution, even if their underlying distributions are not normal (provided the variance is finite)
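A quick simulation can make the theorem concrete. This is a minimal sketch using only the Python standard library; the choice of 12 Uniform(0, 1) draws per sum, 20,000 repetitions and the seed are arbitrary assumptions for illustration. Each uniform draw is flat rather than bell-shaped, yet the sums behave like a normal distribution (mean 0, SD 1 after centring, about 68% within 1 SD).

```python
import random
import statistics

def clt_sums(n_terms: int = 12, n_samples: int = 20_000, seed: int = 1) -> list[float]:
    """Return n_samples sums of n_terms Uniform(0, 1) draws, centred at 0.

    Each Uniform(0, 1) draw has mean 0.5 and variance 1/12, so a sum of 12
    draws has mean 6 and variance 1; subtracting n_terms * 0.5 centres it at 0.
    """
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) - n_terms * 0.5
            for _ in range(n_samples)]

sums = clt_sums()
mean = statistics.fmean(sums)
sd = statistics.pstdev(sums)
# Share of sums within 1 SD of the mean; for a normal distribution this is about 0.68
within_1sd = sum(abs(x - mean) <= sd for x in sums) / len(sums)
print(f"mean={mean:.3f}, sd={sd:.3f}, within 1 SD={within_1sd:.3f}")
```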
Proportion of a normal distribution within 1, 2 and 3 SD of the mean
±1 SD - 68.3%
±2 SD - 95.4%
±3 SD - 99.7%
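These percentages come straight from the normal cumulative distribution: for a standard normal Z, P(|Z| < k) = erf(k / √2). A minimal sketch in Python to reproduce them:

```python
import math

def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within k SD of the mean.

    For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```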
Scope of data collection
- Census
- Sample or “study population”
Census - data compiled about the total population (e.g. SART)
Sample - a subset of the population; because drawing a sample involves randomness, it may not truly represent the overall population (sampling error)
Law of large numbers
if the sample is large enough, the sample mean (and the distribution of the sample) approximates that of the population from which it was drawn
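A minimal sketch of the law of large numbers, assuming a fair six-sided die as the "population" (true mean 3.5); the seed and number of rolls are arbitrary. The running mean wanders for small samples and settles near 3.5 as rolls accumulate.

```python
import random

def running_means(n_rolls: int, seed: int = 7) -> list[float]:
    """Running mean of fair six-sided die rolls (population mean = 3.5)."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)   # one roll, uniform on 1..6
        means.append(total / i)      # mean of the first i rolls
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 100_000):
    print(n, round(means[n - 1], 3))
```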
Type 1 error vs Type 2 error
Type 1: null hypothesis is rejected when it is actually true (false positive); probability = alpha, "arrogant"
Type 2: null hypothesis is not rejected when it is false, so a real relationship between populations is missed (false negative); probability = beta, "bashful"
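The link between alpha and Type 1 error can be checked by simulation. This sketch assumes a simple two-sided z-test with known SD = 1 (a simplification of the t-test usually used) and simulates many hypothetical studies in which both groups come from the same population, so the null hypothesis is true and every rejection is a false positive. The rejection rate should land close to alpha.

```python
import random
from statistics import NormalDist, fmean

def type1_error_rate(n_studies: int = 5_000, n_per_group: int = 30,
                     alpha: float = 0.05, seed: int = 3) -> float:
    """Fraction of simulated studies that reject a TRUE null hypothesis.

    Both groups are drawn from the same N(0, 1) population, so every
    rejection is a Type 1 error. Uses a two-sided z-test with known SD = 1.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = (2 / n_per_group) ** 0.5                  # SE of a difference in means when SD = 1
    rejections = 0
    for _ in range(n_studies):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(a) - fmean(b)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_studies

rate = type1_error_rate()
print(rate)  # close to alpha = 0.05
```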
Statistical calculations:
- Sensitivity
- Specificity
- PPV
- NPV
- Accuracy
- Precision
- ROC curve
SENSITIVITY (true positive rate): test's ability to correctly identify those with the condition:
- True positive / (true positive + false negative), where TP + FN = all with the condition
SPECIFICITY (true negative rate): test's ability to correctly identify those without the condition = true negative / (true negative + false positive), where TN + FP = all without the condition
PPV: TP / (TP + FP), where TP + FP = all test positives = probability that a positive test accurately indicates presence of the condition
NPV: TN / (TN + FN), where TN + FN = all test negatives = probability that a negative test accurately indicates absence of the condition
ACCURACY = the proportion of all test results, positive and negative, that are correct. Represents the overall effectiveness of the test across all outcomes. Also, how close the measured value is to the true value/standard.
- (TP + TN) / total population
PRECISION = the proportion of positive identifications that were correct (i.e. equivalent to PPV). Also, the consistency of repeated measurements: how close the measurements are to each other, regardless of how close they are to the actual/real value.
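The formulas above can be collected into one small helper. A minimal sketch in Python; the `ConfusionMatrix` class and the counts in the example (100 people with the condition, 900 without) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class ConfusionMatrix:
    tp: int  # true positives: condition present, test positive
    fp: int  # false positives: condition absent, test positive
    tn: int  # true negatives: condition absent, test negative
    fn: int  # false negatives: condition present, test negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:  # same formula as precision
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

cm = ConfusionMatrix(tp=90, fp=30, tn=870, fn=10)
for name in ("sensitivity", "specificity", "ppv", "npv", "accuracy"):
    print(f"{name}={getattr(cm, name)():.3f}")
# sensitivity=0.900
# specificity=0.967
# ppv=0.750
# npv=0.989
# accuracy=0.960
```

In this example a test with 90% sensitivity and about 97% specificity still has a PPV of only 75%, because only 10% of those tested have the condition: PPV and NPV depend on prevalence, while sensitivity and specificity do not.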
ROC curve: plot of true positive rate (sensitivity) against false positive rate (1 - specificity)
- Random classifier is a straight diagonal line (area under the curve, AUC = 0.5)
- A better test bows up and to the left, toward the top-left corner (larger AUC)
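A minimal sketch of how an ROC curve is built from test scores: sort cases by score, lower the positivity threshold one case at a time, and record (FPR, TPR) at each step; the AUC is then the area under those points (trapezoidal rule). The scores and labels below are made-up values, and the sketch assumes no tied scores.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the positivity threshold.

    labels: 1 = condition present, 0 = absent. A case is called positive
    when its score is >= the threshold. Assumes no tied scores for simplicity.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one case at a time, from highest score to lowest
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
pts = roc_points(scores, labels)
print(auc(pts))  # 0.75
```

An AUC of 0.5 corresponds to the diagonal (random) line; values closer to 1.0 correspond to curves pushed toward the top-left corner.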
Measurement error types
- Random (noise)
- Systematic (bias)
- Missing data (either random or bias)
- Censoring (incomplete observation, e.g. loss to follow-up; how were these data excluded from analysis?)
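Random and systematic error behave differently when measurements are averaged, which also ties back to accuracy vs precision. A minimal sketch, assuming a hypothetical true value of 10 and normally distributed noise: random error alone scatters readings widely but averages out (accurate, imprecise), while a constant bias shifts every reading (precise, inaccurate).

```python
import random
from statistics import fmean, pstdev

TRUE_VALUE = 10.0  # hypothetical true value of the quantity being measured

def measure(n: int, noise_sd: float, bias: float, seed: int = 11) -> list[float]:
    """Simulate n measurements = true value + systematic bias + random noise."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0, noise_sd) for _ in range(n)]

noisy = measure(10_000, noise_sd=2.0, bias=0.0)    # random error only
biased = measure(10_000, noise_sd=0.2, bias=1.5)   # mostly systematic error

print(f"random only: mean={fmean(noisy):.2f}, spread={pstdev(noisy):.2f}")
print(f"biased:      mean={fmean(biased):.2f}, spread={pstdev(biased):.2f}")
```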
p-value
Probability of observing results at least as extreme as those obtained, assuming the null hypothesis is true
A low p-value (typically <0.05) means the observed data are unlikely under the null hypothesis, providing significant evidence against it
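For a test statistic that follows a standard normal distribution under the null (a z-test, assumed here for simplicity), the two-sided p-value is the probability of a result at least as extreme in either direction. A minimal sketch with Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal null.

    Probability of a result at least as extreme as |z| in either direction,
    assuming the null hypothesis is true.
    """
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p_value(1.96), 3))  # 0.05
print(round(two_sided_p_value(1.0), 3))   # 0.317
```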
Statistical power
- Factors influencing?
- Relationship to type 2 error?
Probability that a test will correctly reject a false null hypothesis (detect a true effect)
Factors influencing power:
- Significance level (alpha)
- Sample size
- Effect size
- Variance
HIGH power REDUCES risk of type 2 error
POWER (typically 0.8) = 1 - beta (beta = probability of type 2 error, typically 0.2)
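How alpha, sample size and effect size combine into power can be sketched with the normal approximation for a two-sided, two-sample comparison of means. This is an approximation (it ignores the t-distribution and the tiny chance of rejecting in the wrong direction), and the function names are hypothetical; effect size here means the difference in means divided by the SD.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power_two_sample(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized difference in means (difference / SD).
    The negligible chance of rejecting in the wrong direction is ignored.
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * math.sqrt(n_per_group / 2)
    return Z.cdf(noncentrality - z_crit)

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest n per group giving at least the target power (same approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)          # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_group(0.5))                     # 63
print(round(power_two_sample(0.5, 63), 3))  # just above 0.8
```

Increasing the sample size or the effect size raises power; lowering alpha or increasing variance (which shrinks the standardized effect size) lowers it.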
Power calculation considerations
- For non-inferiority trials
- For superiority trials
Must be done a priori (before the study), not post hoc (after the study)
Cannot do a power calculation without prior information (expected effect size, variance), e.g. from a pilot study
Non-inferiority vs superiority design will affect power calculations:
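- Non-inferiority trials are used in lieu of true "equivalence" trials, because proving two treatments are exactly equal would require infinite subjects; non-inferiority is concluded when the upper bound of the 95% CI for the difference in outcomes between groups (oriented so that positive values favor the standard treatment) falls below a prespecified non-inferiority margin
- Superiority trials are typically placebo trials; they usually target a larger difference than a non-inferiority margin and so require fewer subjects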
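A minimal sketch of how the margin is applied once results are in. It assumes a normal-approximation 95% CI and defines the difference as (standard - new) for an outcome where higher is better, so a positive difference means the new treatment performed worse; under that convention non-inferiority is shown when the CI's upper bound lies below the margin. The differences, standard errors and the 5-point margin are made-up values.

```python
from statistics import NormalDist

def ci_95(diff: float, se: float) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for a difference."""
    z = NormalDist().inv_cdf(0.975)   # about 1.96
    return diff - z * se, diff + z * se

def non_inferior(diff: float, se: float, margin: float) -> bool:
    """True if the upper 95% CI bound of (standard - new) is below the margin."""
    _, upper = ci_95(diff, se)
    return upper < margin

# Hypothetical: new arm 2 points worse on average, margin of 5 points
print(ci_95(2.0, 1.5))                      # roughly (-0.94, 4.94)
print(non_inferior(2.0, 1.5, margin=5.0))   # True
print(non_inferior(2.0, 2.0, margin=5.0))   # False: upper bound about 5.92
```

The same observed difference can pass or fail depending on the standard error, which is why non-inferiority trials with narrow margins usually need large samples.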
Standard deviation equation
Variance =
Standard error =
SD = square root [sum ((difference between each observation and the mean) ^2) / size of population (N)] (population SD; for a sample SD, divide by n - 1 instead)
Variance = SD ^2
Standard error (of the mean) = SD / square root (N)
*Appropriate for normally distributed data, but less so for skewed data
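A minimal sketch of these formulas in Python, using a small hypothetical data set. It computes the population SD exactly as written above (dividing by N), the sample SD (dividing by n - 1, which is what `statistics.stdev` does), the variance, and the standard error of the mean (here using the sample SD, as is usual in practice).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
n = len(data)
mean = statistics.fmean(data)

# Population SD: divide the sum of squared deviations by N
sd_pop = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
# Sample SD: divide by n - 1 (what statistics.stdev computes)
sd_sample = statistics.stdev(data)

variance_pop = sd_pop ** 2
se_mean = sd_sample / math.sqrt(n)  # standard error of the mean

print(sd_pop, variance_pop)                      # 2.0 4.0
print(round(sd_sample, 3), round(se_mean, 3))    # 2.138 0.756
```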