Stats Flashcards
Measures of central tendency?
Mean
Median
Mode
Mean?
Average value
Median?
Middle value
Mode?
Most frequent value
Best measure when distribution not skewed?
Mean
Best measure when distribution skewed?
Median
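A minimal sketch of the three measures, using Python's standard `statistics` module. The sample values are made up for illustration; the single extreme value shows why the median holds up better than the mean when the distribution is skewed.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: one extreme value pulls the mean upward.
values = [20, 22, 22, 25, 27, 30, 200]

sample_mean = mean(values)      # sum of values / number of values
sample_median = median(values)  # middle value once sorted -> 25
sample_mode = mode(values)      # most frequent value -> 22

print(sample_mean, sample_median, sample_mode)
```

Here the mean lands near 49, well above most of the data, while the median stays at 25.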
Standard deviation?
Square root of variance
NOT influenced by sample size
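A short sketch of the variance/SD relationship with Python's `statistics` module. The data set is hypothetical, and the population formulas (`pvariance`, `pstdev`) are used here only to keep the arithmetic clean.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

var = pvariance(data)  # mean of squared deviations from the mean -> 4
sd = pstdev(data)      # standard deviation

# The SD is the square root of the variance.
print(var, sd, sqrt(var))
```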
Empirical rule of standard deviation?
Need a normal distribution (not skewed).
Within 1 SD of the mean = 68% of data (34% on either side of mean)
Within 2 SD of the mean = 95% of data
Within 3 SD of the mean = 99.7% of data
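The 68 / 95 / 99.7 figures can be checked against the normal curve. This is a sketch using `statistics.NormalDist` from the standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The printed proportions are approximately 0.68, 0.95 and 0.997, which is where the rule of thumb comes from.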
Level 1 evidence?
Meta-analysis with small CI
At least TWO RCTs with a large sample size
RCTs can be in which level of evidence?
1, 2, 3 depending on sample size and how many done (need at least 2 for level 1)
Meta-analysis can be in which level of evidence?
1, 2 depending on the CI
Null hypothesis
What we are trying to REJECT
Need p < 0.05 to reject (chance occurrence less than 5%)
Type 1 error (ALPHA)
Null hypothesis rejected but was TRUE
Error of INTERNAL VALIDITY
Probability of it happening = alpha (the significance level, usually 0.05, against which the p value is compared)
Type 2 error (BETA)
Null hypothesis accepted (not rejected) but was FALSE
Lack of POWER
Internal validity
Are the results representing what we wanted to measure?
Reliability
Are the results consistent and reproducible?
Student “t” test
Compares means of TWO samples made up of CONTINUOUS VARIABLES
Small samples with N < 30
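A sketch of the t statistic for two independent samples, assuming equal variances (the classic pooled form). The function name and the sample values are hypothetical. Turning the statistic into a p value needs the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error

# Hypothetical continuous measurements in two small groups.
t = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t)  # about -1, with 5 + 5 - 2 = 8 degrees of freedom
```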
ANOVA (“f” test)
Compares means of MORE THAN TWO samples made up of continuous variables
One-tailed test
Reject null hypothesis in ONE direction (active treatment is better than placebo)
Two-tailed test
Reject null hypothesis in TWO directions (active treatment is different from placebo, either better or worse)
“Z” test
SAME AS “T” TEST BUT FOR LARGER SAMPLES N > 30
Chi-square
Evaluates ASSOCIATIONS between 2 samples of CATEGORICAL VARIABLES
(percentages, proportions)
Can compare 2 proportions
Can make a table of frequencies
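A sketch of the chi-square statistic computed from a table of frequencies. The function name and the counts are made up for illustration. For a 2 x 2 table (1 degree of freedom), a statistic above 3.84 corresponds to p < 0.05.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical frequencies: rows = treatment / placebo, columns = improved / not improved.
observed = [[20, 10], [10, 20]]
print(chi_square_statistic(observed))  # about 6.67
```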
Pearson test
Test of linear correlation between CONTINUOUS VARIABLES
-1 = perfect inverse (negative) association
0 = no association
1 = perfect direct association
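A sketch of the Pearson coefficient computed by hand, to show where the -1 / 0 / 1 values come from. The function name and data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfect direct association
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # perfect inverse association
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 1, 2]))  # no linear association
```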
Linear regression
PREDICTION of results once a correlation is demonstrated between CONTINUOUS VARIABLES
Multiple logistic regression
PREDICTION of a CATEGORICAL (usually binary) outcome from several predictor variables
Incidence
Number of new cases / number of people at risk over a certain period of time
(REMOVE KNOWN CASES)
Prevalence
Number of NEW AND OLD cases / entire population at a given point in time (or over a certain period of time)
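A worked sketch of the two formulas. The function names and the numbers are hypothetical; the point is that incidence removes known cases from the denominator, while prevalence keeps the whole population.

```python
def incidence(new_cases, population, known_cases):
    """New cases divided by people at risk (known cases removed)."""
    at_risk = population - known_cases
    return new_cases / at_risk

def prevalence(new_cases, known_cases, population):
    """New and old cases divided by the entire population."""
    return (new_cases + known_cases) / population

# Hypothetical town of 1000 people: 100 already ill, 45 new cases this year.
print(incidence(new_cases=45, population=1000, known_cases=100))   # 45 / 900
print(prevalence(new_cases=45, known_cases=100, population=1000))  # 145 / 1000
```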
Establishing causality
- High degree of correlation
- Consistency of the correlation
- Temporal association
- Coherence with contemporary scientific knowledge
- Dose-response relation
- Reversibility
- Biological plausibility
- Specificity
- Elimination of other explanations
Relationship between prevalence and illness duration?
Prevalence is proportional to incidence x illness duration.
SO, if illness duration increases (e.g., a new treatment comes out which prolongs life expectancy), prevalence will ALSO INCREASE.
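A sketch of the proportionality, assuming a stable situation and a fairly rare illness so that prevalence can be approximated as incidence x duration. The function name and figures are hypothetical.

```python
def approximate_prevalence(incidence_rate, mean_duration_years):
    """Steady-state approximation: prevalence is about incidence x illness duration."""
    return incidence_rate * mean_duration_years

# Hypothetical illness with 1 new case per 100 people per year.
before = approximate_prevalence(0.01, 5)   # illness lasts 5 years on average
after = approximate_prevalence(0.01, 10)   # new treatment doubles survival time
print(before, after)  # prevalence doubles even though incidence is unchanged
```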
Power
1 - Beta (error)
Probability that a difference will be picked up if that difference really exists.
Most important factor = SAMPLE SIZE
Cohort study
Follow 2 groups (exposed/non-exposed) PROSPECTIVELY to see if they develop an illness.
CAUSE to EFFECT
Useful when EXPOSURE is rare
Longer and more expensive than case-control studies.
Case-control
Look back at 2 groups (ill / not ill) and determine their level of exposure.
RETROSPECTIVE
EFFECT to CAUSE
Useful when ILLNESS is rare
Attrition bias
Loss of certain patients for analysis
Fix with LAST OBSERVATION CARRIED FORWARD
Association in studies can be due to what?
Chance
Bias
Reverse causality
Confounding
Qualities of a good screening test?
Inexpensive
Easy to administer
Little discomfort
Reliable
Valid
Comparable to gold-standard
PPV and NPV dependent on what?
PREVALENCE
Sensitivity
If I HAVE THE DISEASE, will the test pick it up?
True positive / all people with disease
Specificity
If I DON’T HAVE THE DISEASE, will the test come back negative?
True negative / all people without disease
Sensitivity good for?
RULING OUT
Negative result most helpful
Specificity good for?
RULING IN
Positive result most helpful
Positive predictive value
If I have a POSITIVE TEST, what are chances I have disease?
True positive / all positive test
Negative predictive value
If I have a NEGATIVE TEST, what are chances I don’t have disease?
True negative / all negative test
High sensitivity means?
Low false negatives.
High specificity means?
Low false positives.
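A sketch tying the four test measures together, including why PPV depends on prevalence. The function names and counts are hypothetical. TP / FP / FN / TN are true positives, false positives, false negatives and true negatives.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a test table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive / all people with disease
        "specificity": tn / (tn + fp),  # true negative / all people without disease
        "ppv": tp / (tp + fp),          # true positive / all positive tests
        "npv": tn / (tn + fn),          # true negative / all negative tests
    }

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV for a test with fixed sensitivity and specificity at a given prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical screening of 1000 people, 100 of whom have the disease.
metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)
print(metrics)

# Same test, rarer disease: the PPV drops sharply.
print(ppv_at_prevalence(0.90, 0.95, 0.10))
print(ppv_at_prevalence(0.90, 0.95, 0.01))
```

Sensitivity and specificity stay the same in both scenarios; only the predictive value shifts with prevalence.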
Ratio vs. proportion vs. odds vs. rate
Ratio is dividing one number by another:
- Proportion is part/whole so like M : population or F : population
- Odds is part/non-part like M : F (part and non-part of whole population)
Rate is ratio with time as an intrinsic part of the denominator
Odds
Part / non-part
Whole = 1
Non-part = 1-part
Odds = part / (1-part)
Odds = proportion / (1 - proportion)
Odds if probability of death is 20%?
Odds = 0.2 / (1 - 0.2) = 0.2 / 0.8 = 0.25
25%
aka for every person that dies, there are 4 people who live
Odds if probability of horse winning is 75%?
Odds = 0.75 / (1 - 0.75) = 0.75 / 0.25 = 3
300%
3 to 1 odds of your horse winning (3 x 100%)
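The two odds cards above can be reproduced with a small helper. This is a sketch; the function names are made up, and the reverse conversion is included only as a convenience.

```python
def odds_from_probability(p):
    """Odds = part / non-part = p / (1 - p)."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Reverse conversion: p = odds / (1 + odds)."""
    return odds / (1 + odds)

print(odds_from_probability(0.20))  # about 0.25, i.e. 1 death for every 4 survivors
print(odds_from_probability(0.75))  # 3.0, i.e. 3 to 1
print(probability_from_odds(3))     # 0.75
```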
Disease odds ratio
Odds of disease among exposed / odds of disease among unexposed
Exposure odds ratio
Odds of exposure among diseased / odds of exposure among non-diseased
Odds ratio
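AD/BC
(A = exposed with disease, B = unexposed with disease, C = exposed without disease, D = unexposed without disease)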
Odds ratio relevant for which type of study?
Case-control (RETROSPECTIVE)
Odds ratio interpretation?
OR > 1 means the exposure is associated with higher odds of the outcome
(1.2 = 20% increase in odds of an outcome with a given exposure)
OR = 1 means no association between outcome and exposure
OR < 1 means the exposure is associated with lower odds of the outcome
(0.2 = 80% decrease in the odds of an outcome with a given exposure)
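A sketch of the odds ratio on a 2 x 2 table, using the same cell layout as the risk ratio card in this deck (A = exposed with disease, B = unexposed with disease, C = exposed without disease, D = unexposed without disease). The function name and counts are hypothetical. It also shows that the disease odds ratio and the exposure odds ratio give the same number as AD/BC.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio AD / BC."""
    return (a * d) / (b * c)

# Hypothetical counts.
a, b, c, d = 20, 10, 80, 90

disease_or = (a / c) / (b / d)   # odds of disease: exposed vs. unexposed
exposure_or = (a / b) / (c / d)  # odds of exposure: diseased vs. non-diseased

print(odds_ratio(a, b, c, d), disease_or, exposure_or)  # all about 2.25
```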
Risk ratio
Proportion so now we look at wholes!
Risk of disease among exposed / risk of disease among unexposed
Risk of disease among exposed = A / (A + C)
Risk of disease among unexposed = B / ( B + D)
Risk ratio interpretation
RR > 1 means exposure is associated with a higher risk
RR = 1 means no association between exposure and disease
RR < 1 means exposure associated with lower risk (may be causally protective)
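A sketch of the risk ratio using the A / (A + C) and B / (B + D) formulas from the card above. The function name and counts are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """Risk of disease among exposed / risk of disease among unexposed."""
    risk_exposed = a / (a + c)    # A / (A + C)
    risk_unexposed = b / (b + d)  # B / (B + D)
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed develop the disease.
print(risk_ratio(a=20, b=10, c=80, d=90))  # 0.20 / 0.10, about 2
```

With these same counts the odds ratio AD/BC is 2.25, so the two measures are close but not identical.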
Risk ratios relevant for which type of study?
RCTs
COHORT studies (can be retrospective or prospective)
Absolute risk reduction?
Risk of disease among exposed - risk of disease among unexposed
Always positive because it is an absolute value
NNT/NNH
1 / ARR (absolute risk reduction)
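A sketch of ARR and NNT/NNH with made-up risks. The function names are hypothetical. In practice the NNT is usually reported rounded up to a whole patient; the code leaves it unrounded.

```python
def absolute_risk_reduction(risk_exposed, risk_unexposed):
    """Absolute difference between the two risks (always positive)."""
    return abs(risk_exposed - risk_unexposed)

def number_needed_to_treat(risk_exposed, risk_unexposed):
    """NNT (or NNH) = 1 / ARR."""
    return 1 / absolute_risk_reduction(risk_exposed, risk_unexposed)

# Hypothetical trial: 15% of treated patients relapse vs. 20% on placebo.
arr = absolute_risk_reduction(0.15, 0.20)
nnt = number_needed_to_treat(0.15, 0.20)
print(arr, nnt)  # about 0.05 and about 20
```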
Confidence interval
Mean +/- (1.96 x standard error of the mean)
Standard error of the mean = SD / (square root of sample size)
1.96 = z-value (standard normal critical value) for 95% CI (most commonly used)
If the CI of a ratio (OR, RR) crosses 1 = not statistically significant (for a difference, the value to watch is 0)
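A sketch of the 95% CI of a mean, following the formula on this card (mean +/- 1.96 x SEM). The function name and the sample are hypothetical, and 1.96 assumes a reasonably large sample or a known normal distribution.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval_95(sample):
    """95% CI of the mean: mean +/- 1.96 x standard error of the mean."""
    n = len(sample)
    sem = stdev(sample) / sqrt(n)  # SD / square root of sample size
    margin = 1.96 * sem
    centre = mean(sample)
    return centre - margin, centre + margin

# Hypothetical measurements with mean 5.
low, high = confidence_interval_95([2, 4, 4, 4, 5, 5, 7, 9])
print(low, high)  # roughly 3.5 to 6.5
```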
Effect sizes when comparing standardized mean differences (Cohen's d)?
Small 0.2
Medium 0.5
Large 0.8
Effect sizes when comparing correlation coefficients?
Small 0.1
Medium 0.25
Large 0.4
Effect sizes for relative risk?
Small 1.5
Medium 2.5
Large 4.3
Pareto principle?
80% of effects are the product of 20% of the causes
Pareto diagram highlights the importance of different causes to a phenomenon but does NOT allow for evaluation of the quality of practice of doctors with regards to application of guidelines/recommendations!