Biostats Flashcards
Types of study data
Continuous
Discrete (Categorical)
Continuous Data
Has a logical order with values that continuously increase by the same amount.
Includes interval data and ratio data
Interval data
Type of continuous data, has no meaningful zero
Example- Celsius and Fahrenheit temperature scales
Ratio data
Type of continuous data with a meaningful zero
Example- Age, height, weight, time, BP
Discrete (categorical data)
Includes nominal and ordinal data
Has categories
Nominal data
Type of discrete (categorical) data
Categories are in arbitrary order- the order does not matter.
Example- gender, ethnicity, marital status, mortality
Ordinal data
Type of discrete (categorical) data
Categories are ranked in a logical order, but the difference between the categories is not equal.
Example- NYHA class, 0-10 pain scale
Standard Deviation
How spread out the data is, and to what degree it is dispersed away from the mean.
Data that is highly dispersed has a larger SD
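A minimal sketch of the idea, using Python's standard `statistics` module. The two sets of blood pressure readings are made up for illustration; both have the same mean, but the more dispersed set has the larger SD.

```python
import statistics

# Hypothetical systolic BP readings (mmHg); values are invented for illustration.
tight = [118, 120, 122, 119, 121]
spread = [90, 150, 120, 105, 135]

# statistics.stdev computes the sample SD (n - 1 in the denominator).
sd_tight = statistics.stdev(tight)
sd_spread = statistics.stdev(spread)

print(f"tight:  mean={statistics.mean(tight)}, SD={sd_tight:.2f}")
print(f"spread: mean={statistics.mean(spread)}, SD={sd_spread:.2f}")
```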
Gaussian (normal) distribution
Symmetrical curve, with half of the values on each side of the mean
Mean, median, mode are equal
Large sample sets of continuous data tend to form
Gaussian or “normal” distribution
“bell-shaped curve”
In Gaussian distribution, __________of the values fall within 1 SD of the mean and ___________of the values fall within 2 SDs from the mean.
68%- 1 SD
95%- 2 SD
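The 68% and 95% figures are rounded. A quick check, assuming a standard normal distribution and using `statistics.NormalDist` from the Python standard library (the helper name `fraction_within` is just for this sketch):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of values within k SDs of the mean of a normal distribution."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1sd = fraction_within(1)
within_2sd = fraction_within(2)

print(f"within 1 SD: {within_1sd:.1%}")
print(f"within 2 SD: {within_2sd:.1%}")
```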
When does skewed distribution occur?
When the number of values (sample size) is small and/or there are outliers in the data
When there are small numbers of values, what measure of central tendency is the best?
Median
The distortion of central tendency caused by outliers is decreased by
collecting more values
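A small illustration of why the median is preferred for small samples with outliers. The length-of-stay values are hypothetical; adding a single extreme value moves the mean much more than the median.

```python
import statistics

# Hypothetical hospital length-of-stay values in days (invented for illustration).
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [40]

mean_before = statistics.mean(stays)
median_before = statistics.median(stays)
mean_after = statistics.mean(stays_with_outlier)
median_after = statistics.median(stays_with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after}, median={median_after}")
```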
Variable
any data point or characteristic that can be measured or counted.
Independent variable
Changed by the researcher in order to determine whether it has an effect on the dependent variable (outcome)
The outcome is the
dependent variable
HF progression is an example of
dependent variable
Comorbid conditions, doses are examples of
Independent variables
Null hypothesis
There is no statistically significant difference between groups.
The researcher is trying to disprove or reject the null hypothesis.
Alternative hypothesis
There is a statistically significant difference between groups. The researcher is trying to prove or accept the alternative hypothesis.
Error margin
Alpha
The alpha level is commonly set at
5% or 0.05
The p value is compared to
the alpha
How to compare the p value to the alpha
p-value < alpha- reject null hypothesis, alt hypothesis is accepted- statistically significant
p-value >= alpha- fail to reject the null hypothesis (often worded as accepting it); alt hypothesis is not supported- not statistically significant
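The comparison can be sketched as a one-line decision rule. The function name `interpret_p_value` is hypothetical, and the default alpha of 0.05 is just the common choice noted in the cards.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Compare a p-value to alpha and describe the decision."""
    if p_value < alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis (not statistically significant)"

print(interpret_p_value(0.03))
print(interpret_p_value(0.20))
```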
Confidence interval
Provides the same information about significance as the p value, plus the precision of the result
How to calculate CI
CI=1-alpha
An alpha of 0.05 represents a 95% CI
A CI of 95% indicates
you are 95% confident that the true value for the population lies somewhere within the range
A narrow CI indicates
high precision
A wide CI indicates
poor precision
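A rough sketch of how CI width reflects precision. It assumes a normal approximation for the CI of a mean (mean plus or minus z times SD divided by the square root of n), which is a simplification not stated in these cards; the mean, SD, and sample sizes are invented.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, alpha=0.05):
    """Approximate (1 - alpha) CI for a mean, assuming a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Same hypothetical mean and SD, different sample sizes.
small_study = mean_ci(mean=120, sd=15, n=25)
large_study = mean_ci(mean=120, sd=15, n=400)

print(f"n=25:  95% CI {small_study[0]:.2f} to {small_study[1]:.2f}")
print(f"n=400: 95% CI {large_study[0]:.2f} to {large_study[1]:.2f}")
```

Under these assumptions the larger study gives the narrower interval, i.e. the more precise estimate.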
Type I error
False positive
The null hypothesis was rejected in error
Probability of a type I error
Alpha (CI=1-alpha, where alpha is the type I error rate)
When alpha is 0.05 and a study result is reported with p<0.05, it is statistically significant and the probability of making a type I error is <5%
Type II error
False negative
The null hypothesis is accepted when it should have been rejected.
Power
The probability that a test will reject the null hypothesis correctly (the power to avoid a type II error)
Power=1-beta
Determined by the number of outcome values collected (sample size), the difference in outcome rates between groups, and the significance (alpha) level
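As a worked example of Power=1-beta, where beta is the probability of a type II error. The helper name `power_from_beta` and the beta value of 0.2 are illustrative assumptions.

```python
def power_from_beta(beta):
    """Power is 1 minus the probability of a type II error (beta)."""
    if not 0 <= beta <= 1:
        raise ValueError("beta must be a probability between 0 and 1")
    return 1 - beta

# Hypothetical study design with a 20% chance of a type II error.
power = power_from_beta(0.2)
print(f"power = {power:.0%}")
```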
Relative risk
Risk in exposed group (treatment) divided by the risk in the control group
Risk
Number of subjects with unfavorable event/total number of subjects in the group
RR=1
No difference in risk
RR>1
Greater risk of outcome in treatment group
RR<1
Lower risk of outcome in the treatment group
How to interpret a RR of 0.57
Patients treated were 57% AS LIKELY to have disease progression/event as placebo patients
RRR
Indicates how much the risk is reduced
1-RR (must use decimal form)
RRR interpretation
LESS likely (vs control)
RR+RRR=
100% (1.00 in decimal form)
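A short worked example of risk, RR, and RRR. The event counts are hypothetical, chosen so that the RR comes out near 0.57; the function names are just for this sketch.

```python
def risk(events, total):
    """Risk = subjects with the unfavorable event / total subjects in the group."""
    return events / total

def relative_risk(risk_tx, risk_control):
    return risk_tx / risk_control

def relative_risk_reduction(rr):
    """RRR = 1 - RR, with RR in decimal form."""
    return 1 - rr

# Hypothetical trial: 16 of 100 treated and 28 of 100 control patients had the event.
risk_tx = risk(16, 100)
risk_control = risk(28, 100)

rr = relative_risk(risk_tx, risk_control)
rrr = relative_risk_reduction(rr)

print(f"RR  = {rr:.2f}")   # treated patients were about 57% as likely to have the event
print(f"RRR = {rrr:.2f}")  # treated patients were about 43% less likely to have the event
```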
Absolute risk reduction
Indicates the reduction in risk AND the incidence rate of the outcome
ARR= (%risk control)-(%risk tx)
ARR of 12% indicates
12 out of every 100 patients benefit from the tx
NNT
Number of patients needed to be treated for a certain period of time in order for 1 patient to benefit.
1/ARR (in decimal)
NNH
Number of patients who need to be treated for a certain period of time in order for 1 patient to experience harm
1/ARI (absolute risk increase, in decimal); same calculation as NNT, using the absolute value of the ARR
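A sketch of the ARR, NNT, and NNH arithmetic. The risks are hypothetical. The rounding (NNT rounded up, NNH rounded down) is a common convention and an assumption here, not something stated in these cards.

```python
import math

def absolute_risk_difference(risk_control, risk_tx):
    """Absolute difference in risk between groups, in decimal form."""
    return abs(risk_control - risk_tx)

def number_needed_to_treat(arr):
    # Assumed convention: round NNT up to the next whole patient.
    return math.ceil(1 / arr)

def number_needed_to_harm(ari):
    # Assumed convention: round NNH down to the previous whole patient.
    return math.floor(1 / ari)

# Hypothetical benefit: event risk of 28% with control vs 16% with treatment.
arr = absolute_risk_difference(0.28, 0.16)
nnt = number_needed_to_treat(arr)

# Hypothetical harm: adverse event risk of 7% with control vs 10% with treatment.
ari = absolute_risk_difference(0.07, 0.10)
nnh = number_needed_to_harm(ari)

print(f"ARR = {arr:.0%}, NNT = {nnt}")
print(f"ARI = {ari:.0%}, NNH = {nnh}")
```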
Odds ratio
Used to estimate the risk of unfavorable events in case control studies
Odds ratio calculation
OR=AD/BC
A=Outcome present, treatment group
C=Outcome present, control group
B=Outcome absent, treatment
D= Outcome absent, control
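The AD/BC formula can be sketched directly from the 2x2 table. The counts below are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = AD/BC.

    a: outcome present, treatment    b: outcome absent, treatment
    c: outcome present, control      d: outcome absent, control
    """
    return (a * d) / (b * c)

# Hypothetical 2x2 table: 16/84 in the treatment group, 28/72 in the control group.
or_value = odds_ratio(a=16, b=84, c=28, d=72)
print(f"OR = {or_value:.2f}")
```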
Hazard ratio
The rate at which an unfavorable event occurs within a short period of time.
HR=Hazard rate tx/Hazard rate control
OR or HR=1
Event rate is the same
OR or HR >1
event rate in tx is higher
OR or HR <1
Event rate in tx is lower
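The three interpretation cards above reduce to a comparison against 1. A minimal sketch, with a hypothetical function name and a small tolerance for floating-point values:

```python
def interpret_ratio(value, tolerance=1e-9):
    """Interpret an OR or HR relative to 1 (treatment vs control)."""
    if value < 0:
        raise ValueError("a ratio cannot be negative")
    if abs(value - 1) <= tolerance:
        return "event rate is the same in both groups"
    if value > 1:
        return "event rate is higher in the treatment group"
    return "event rate is lower in the treatment group"

for ratio in (0.49, 1.0, 1.8):
    print(ratio, "->", interpret_ratio(ratio))
```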
Normally distributed continuous data
Use parametric tests
Not normally distributed continuous data
Use nonparametric tests
T-tests are used when
Continuous data is normally distributed
ANOVA is used for
continuous data with 3 or more samples
Chi-square test is used for
nominal or ordinal data
Selecting a test to analyze data:
Parametric tests with 1 group
One-sample T test
If you have before and after measurements, use dependent/paired T test
Selecting a test to analyze data:
Parametric tests with 2 groups (tx, control)
Independent (unpaired) Student's t-test
Selecting a test to analyze data:
>/=3 groups, parametric tests
ANOVA
Selecting a test to analyze data:
Discrete/categorical data with 1 group
Chi-square
Selecting a test to analyze data:
Discrete/categorical data with 2 groups (tx, control)
Chi-square or Fisher's exact
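The test-selection cards above can be summarized as a small lookup. This is only a sketch of the rules as listed in these cards; the function name and parameter names are hypothetical, and cases the cards do not cover raise an error rather than guess.

```python
def choose_test(data_type, groups, normal=True, paired=False):
    """Pick a test following the flashcard rules above."""
    if groups < 1:
        raise ValueError("groups must be at least 1")
    if data_type == "continuous":
        if not normal:
            return "nonparametric test"
        if groups == 1:
            # Before/after measurements on the same group use the paired test.
            return "dependent/paired t-test" if paired else "one-sample t-test"
        if groups == 2:
            return "independent (unpaired) Student's t-test"
        return "ANOVA"
    if data_type == "categorical":
        if groups == 1:
            return "chi-square"
        if groups == 2:
            return "chi-square or Fisher's exact"
        raise ValueError("3 or more categorical groups are not covered by these cards")
    raise ValueError("data_type must be 'continuous' or 'categorical'")

print(choose_test("continuous", groups=1, paired=True))
print(choose_test("continuous", groups=3))
print(choose_test("categorical", groups=2))
```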
Correlation
Statistical technique to determine if 1 variable changes or is related to another variable.
Does not mean causation
Regression
Used to describe the relationship between a dependent variable and one or more independent variables.
Common in observational studies where researchers need to assess multiple independent variables or need to control for confounding factors.
Sensitivity
True positive rate: how often the test is positive in patients who have the condition
100% sensitivity=will be positive in all pts with condition
Specificity
True negative rate: how often the test is negative in patients who do not have the condition
100% specificity=will be negative in all patients without the condition
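Sensitivity and specificity can be computed from the four cells of a diagnostic 2x2 table using the standard formulas TP/(TP+FN) and TN/(TN+FP). The counts are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of patients WITH the condition who test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of patients WITHOUT the condition who test negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical results: 100 patients with the condition, 100 without.
sens = sensitivity(true_pos=90, false_neg=10)
spec = specificity(true_neg=80, false_pos=20)

print(f"sensitivity = {sens:.0%}")
print(f"specificity = {spec:.0%}")
```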
Intention to treat analysis
Includes data for all patients originally allocated to each treatment group even if the patient did not complete the trial according to the study protocol
Per protocol analysis
Includes the subset of patients who completed the study according to the protocol
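The difference between the two analysis populations can be sketched as a simple filter. The patient records and the field name `completed_per_protocol` are invented for illustration.

```python
# Hypothetical randomized patients; field names are made up for this sketch.
patients = [
    {"id": 1, "arm": "tx", "completed_per_protocol": True},
    {"id": 2, "arm": "tx", "completed_per_protocol": False},
    {"id": 3, "arm": "control", "completed_per_protocol": True},
    {"id": 4, "arm": "control", "completed_per_protocol": True},
]

# Intention to treat: everyone, analyzed in the arm they were originally allocated to.
itt_population = list(patients)

# Per protocol: only those who completed the study according to the protocol.
per_protocol_population = [p for p in patients if p["completed_per_protocol"]]

print("ITT ids:         ", [p["id"] for p in itt_population])
print("Per protocol ids:", [p["id"] for p in per_protocol_population])
```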
Equivalence trials
Want to show that the treatments have the same effect
Non-inferiority trials
Want to show that the new treatment is no worse than the current standard of care
Forest plots provide CI for
difference data or ratio data
The boxes on a forest plot show
effect estimate
The diamonds on a forest plot show
pooled results from multiple studies
Horizontal lines on forest plots show
CI
Vertical solid line on forest plot
The line of no effect
0 for difference data
1 for ratio data
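When reading a forest plot, a result is not statistically significant if its CI crosses the line of no effect. A minimal sketch of that check, with hypothetical names and made-up intervals:

```python
# Line of no effect: 0 for difference data, 1 for ratio data.
NO_EFFECT = {"difference": 0.0, "ratio": 1.0}

def crosses_line_of_no_effect(ci_lower, ci_upper, data_kind):
    """True if the CI includes the no-effect value (not statistically significant)."""
    no_effect = NO_EFFECT[data_kind]
    return ci_lower <= no_effect <= ci_upper

# Hypothetical results.
print(crosses_line_of_no_effect(0.45, 0.72, "ratio"))      # CI lies entirely below 1
print(crosses_line_of_no_effect(-2.0, 3.5, "difference"))  # CI includes 0
```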
Case control studies
Retrospective comparisons of cases (pts with disease) and controls (pts without disease)
Cohort studies
Retrospective or prospective comparisons of pts with an exposure compared to those without the exposure
RCTs
Prospective comparison of patients who were randomly assigned to groups
ECHO model
Shows economic, clinical, and humanistic outcomes
Cost minimization analysis
Interventions have equal outcomes and just the costs are being compared
Cost benefit analysis
Compares benefits and costs of an intervention in terms of monetary units.
Converts benefits of tx into dollars
Cost effectiveness analysis
Compares the clinical effects of interventions to the costs
Cost utility analysis
Uses quality-adjusted life years (QALYs) and disability-adjusted life years (DALYs)