Statistics Flashcards
Confidence Intervals
A range of values so defined that there is a SPECIFIED PROBABILITY that the value of
a parameter lies within it.
Effect Size
- Magnitude of an intervention, reflected by an index value.
- Can be calculated from data in a clinical trial.
- It is mostly INDEPENDENT of sample size.
- Most interventions have small to moderate effect sizes.
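One common effect-size index is Cohen's d (standardized mean difference). A minimal sketch using only the standard library; the trial data below are invented for illustration:

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: difference in group means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (n - 1 denominator)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical outcome scores from a two-arm trial
treatment = [12, 14, 15, 13, 16, 14]
control = [10, 11, 12, 10, 13, 11]
d = cohens_d(treatment, control)
# Conventional labels: ~0.2 small, ~0.5 moderate, ~0.8 large
```

Note that d depends only on means and spreads, not on n, which is why it is useful alongside p values from under- or over-powered studies.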
Effectiveness
How well an intervention performs under “real-world” circumstances.
Efficacy
How well an intervention performs under IDEAL and CONTROLLED circumstances.
Fidelity
(1) Extent to which delivery of an intervention ADHERES to the protocol or program model originally developed and…
(2) How CLOSELY the intervention REFLECTS the appropriateness of the care that should be provided.
Minimally Clinically Important Difference (MCID)
Smallest difference in score in the domain of interest which patients perceive as BENEFICIAL and which would mandate (barring troublesome side
effects and excessive cost) a CHANGE in the patient's management
P value
The probability of obtaining a result EQUAL to or MORE EXTREME than what was actually observed, assuming no true difference between groups (the null hypothesis). The conventional significance threshold is p < 0.05 (5%).
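A permutation test makes this definition concrete: the p value is estimated as the fraction of random group relabelings that produce a mean difference as or more extreme than the one observed. A rough sketch with invented data (not the exact test you would report):

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical scores from two groups
p = permutation_p_value([12, 14, 15, 13, 16, 14], [10, 11, 12, 10, 13, 11])
```

Here p is literally "how often chance alone gives a result this extreme," which is what the flashcard definition describes.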
Personalized Medicine vs Precision Medicine
PERSONALIZED = study of tailoring of medical treatment to the individual CHARACTERISTICS of each patient
PRECISION = uses information about a person’s genes, proteins, &
environment to prevent, diagnose, and treat disease
Reliability
Degree to which a measurement, calculation, or specification yields CONSISTENT, REPEATABLE results (i.e., can be depended on to be PRECISE).
Statistical Significance
Claim that a result from data generated by testing or experimentation is NOT likely to occur RANDOMLY or by CHANCE, but is instead likely to be attributable to a
specific cause.
Validity
Extent to which the instrument measures what it was designed to measure.
(multiple types of validity, each representing a different construct)
Types of data (4)
Nominal, Ordinal, Interval, Ratio
Nominal Data
Categories with NO inherent order, e.g. yes/no; boy/girl; blood type
Ordinal Data
Has order, but the intervals between categories are NOT equal.
E.g. strongly agree, agree, disagree, and strongly disagree
Interval Data
Has order AND equal intervals between values, but NO true zero.
E.g. temperature in °C, calendar years
Ratio Data
Has order, equal intervals, AND a TRUE ZERO (so ratios of values are meaningful).
E.g. weight, height, age
Parametric vs Non-parametric Tests
Parametric tests: test group MEANS
- Used when data are normally distributed
- Data from multiple groups have the same variance
- Data have a linear relationship
Nonparametric tests: test group MEDIANS
- “distribution-free tests” - they don’t assume that data follow a specific distribution
- Can be used with smaller sample sizes, & when you want to be more conservative with your analyses.
Means are used with [parametric / non-parametric ] tests.
Medians are used with [parametric / non-parametric ] tests.
Means are used with PARAMETRIC tests.
Medians are used with NON-PARAMETRIC tests.
Parametric tests are used when data have…
- normal or non-normal distribution?
- groups have same or different variance?
- data are linearly or non-linearly related?
…so, you’ll be comparing [means / medians]
Parametric tests are used when data have…
- NORMAL distribution (though can also be used when assuming a particular [though non-normal] distribution). Typically requires a LARGE sample size for the normality assumption to hold.
- groups have SAME VARIANCE
- data are LINEARLY related
…so, you’ll be comparing MEANS
(otherwise, use non-parametric tests!)
You need a statistical test for 2 samples.
What are your options, depending on if your data are parametric vs non-parametric?
2-samples
Parametric = t-test
Non-parametric = Mann-Whitney U test
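The Mann-Whitney U statistic can be computed by hand from rank sums, which shows why it is "distribution-free": only the ordering of the data matters. A minimal sketch (in practice you would use a statistics library; ties get midranks):

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistics via rank sums (midranks for ties)."""
    combined = sorted((value, label)
                      for label, group in (("a", group_a), ("b", group_b))
                      for value in group)
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            if combined[k][1] == "a":
                rank_sum_a += midrank
        i = j
    n_a, n_b = len(group_a), len(group_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a  # the two U values always sum to n_a * n_b
```

A U of 0 for one group means every one of its values ranks below every value in the other group (maximal separation).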
When is it appropriate to use a Mann-Whitney U test?
What if you have >2 samples?
2 samples, NON-parametric data
> 2 samples = ANOVA (use the Kruskal-Wallis "ANOVA by ranks" with non-parametric data)
When is it appropriate to use a t-test?
What if you have >2 samples?
2 samples, parametric data
> 2 samples = ANOVA (aka F test, aka “ANOVA sum of squares” for use with parametric data)
You need a statistical test for 2 samples that are paired.
What are your options, depending on if your data are parametric vs non-parametric?
Paired 2-samples
Parametric = Paired t-test
Non-parametric = Wilcoxon
When is it appropriate to use a paired t-test?
Paired 2 samples, parametric data
When is it appropriate to use a Wilcoxon test?
Paired 2 samples, non-parametric data
You need a statistical test to analyze distribution.
What are your options, depending on if your data are parametric vs non-parametric?
Distribution
Parametric = Chi squared (for larger samples, roughly n > 20; use the Fisher exact test for smaller samples)
Non-parametric = Kolmogorov-Smirnov (the Fisher exact test also works for non-parametric data or small samples, where parametricity is hard to assume because small samples are rarely perfectly "normal")
When is it appropriate to use a Chi squared test?
What about a Fisher exact test?
What about a Kolmogorov-Smirnov test?
All used with distribution
Chi squared best with PARAMETRIC data, bigger samples (n>20)
Fisher exact test can be used with smaller samples (n < 20) of parametric data, or with non-parametric data
Kolmogorov-Smirnov can be used to look at distribution in non-parametric data
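For a 2×2 table, the chi-squared statistic compares observed counts to the counts expected if the two variables were independent. A minimal sketch (counts are invented; with 1 degree of freedom the tail probability equals erfc(√(statistic/2))):

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Chi-squared test of independence for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Expected count in each cell under independence: row total * column total / n
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    observed = [a, b, c, d]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-squared with 1 df: P(X >= stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: exposure status (rows) vs outcome (columns)
stat, p = chi_squared_2x2(20, 10, 10, 20)
```

When any expected cell count is small (a common rule of thumb is < 5), this approximation breaks down, which is exactly when the Fisher exact test is preferred.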
You need a statistical test that can deal with >2 samples, +/- several dependent variables.
What are your options, depending on if your data are parametric vs non-parametric?
> 2 samples
Parametric = ANOVA; with several dependent variables, use a MANOVA
Non-parametric = Kruskal-Wallis
When is it appropriate to use a Kruskal-Wallis test?
> 2 samples, non-parametric data
What is a test of association?
Association = is there a RELATIONSHIP between 2 or more variables
The type of test, of course, varies based on if data are parametric vs non-parametric
I want to look at the association between two groups. Yay!
What test do I use for parametric data?
For non-parametric data?
(Bonus: what if you have >2 groups??)
I want to look at the association between two groups. Yay!
Parametric data = Pearson product-moment correlation
Non-parametric data = Kendall tau (Spearman rank correlation is another common choice)
> 2 groups? Use the correlation matrix version of each of the above tests
I want to look at association between groups, with one dependent variable (and multiple independent variables).
What test do I use for parametric data?
For non-parametric data?
I want to look at the association between groups, with one dependent variable (and multiple independent variables).
Parametric data = linear regression (continuous outcome)
Non-parametric data = logistic regression (binary/categorical outcome)
When would it be appropriate to use a LINEAR regression?
- Number of dependent vs independent variables?
- Parametric vs non-parametric data
When would it be appropriate to use a linear regression?
- ONE dependent variable, MULTIPLE independent variables
- PARAMETRIC data
When would it be appropriate to use a LOGISTIC regression?
- Number of dependent vs independent variables?
- Parametric vs non-parametric data
When would it be appropriate to use a LOGISTIC regression?
- ONE dependent variable, MULTIPLE independent variables
- NON-PARAMETRIC data
I want to look at association between groups, with multiple dependent variables.
What test do I use for parametric data?
For non-parametric data?
I want to look at the association between groups, with multiple dependent variables.
Parametric data = Ologit (ordered logistic) regression (categories are ORDERED)
Non-parametric data = Discriminant Analysis (categories are NOT ORDERED), or use Multinomial regression
When would it be appropriate to use an OLOGIT regression?
- Number of dependent vs independent variables?
- Parametric vs non-parametric data
- Categories that are ordered vs non-ordered?
When would it be appropriate to use an OLOGIT regression?
- MULTIPLE dependent variables
- PARAMETRIC data
- Categories that are ORDERED
When would it be appropriate to use a DISCRIMINANT ANALYSIS or MULTINOMIAL regression?
- Number of dependent vs independent variables?
- Parametric vs non-parametric data
- Categories that are ordered vs non-ordered?
When would it be appropriate to use a DISCRIMINANT ANALYSIS or MULTINOMIAL regression?
- MULTIPLE dependent variables
- NON-PARAMETRIC data
- Categories that are NOT ORDERED
Sensitivity
Proportion of TRUE POSITIVES
(proportion by percentage of patients
who DO have the disease of interest who
register a POSITIVE test finding)
SnNOut (therefore, with high sensitivity, a NEGATIVE test helps you more confidently rule OUT the dx)
Specificity
Proportion of TRUE NEGATIVES
(proportion by percentage of patients
who do NOT have the disease of interest
who register a NEGATIVE test finding)
SpPIN (therefore, with high specificity, a POSITIVE test helps you more confidently rule IN the dx)
Positive Predictive Value
Probability that subjects with a POSITIVE TEST truly DO have the disease
Negative Predictive Value
Probability that subjects with a NEGATIVE TEST truly DO NOT have the disease
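All four of these metrics fall out of one 2×2 diagnostic table (test result vs true disease status). A minimal sketch with made-up counts:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV from a 2x2 diagnostic table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # P(disease | positive test)
        "npv": tn / (tn + fn),          # P(no disease | negative test)
    }

# Hypothetical counts: 90 true positives, 10 false negatives,
# 5 false positives, 95 true negatives
m = diagnostic_metrics(tp=90, fp=5, fn=10, tn=95)
```

Note the asymmetry: sensitivity and specificity are properties of the test, while PPV and NPV also depend on how common the disease is in the tested population.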
Positive Likelihood Ratio
LR+ Commonly used to rule in a condition
(probability of a true positive) / (probability of a false positive)
Positive LR = sensitivity / (100 – specificity), with sensitivity & specificity expressed as percentages.
AKA: (Probability of a patient WITH the disease and a POSITIVE test) divided by the (probability of a patient without the disease and a positive test).
Negative Likelihood Ratio
LR- Commonly used to rule out a condition
(probability of a false negative) / (probability of a true negative)
Negative LR = (100 – sensitivity) / specificity, with sensitivity & specificity expressed as percentages.
AKA: (Probability of a person who has the disease testing negative) divided by the (probability of a person who does not have the disease testing negative)
Effect size
Measures the strength of treatment effect (magnitude of the intervention)
INDEPENDENT of sample size - very useful when evaluating data from under- or over-powered studies.
In randomized trials (comparative studies), effect sizes are often reported as “trivial, small, moderate, or large”
Odds ratio
Odds that an outcome will occur given a particular exposure, compared to the odds of
the outcome occurring in the ABSENCE of that exposure
OR >1 = outcome is MORE likely with the exposure
OR <1 = outcome is LESS likely with the exposure
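For a 2×2 exposure/outcome table, the odds ratio reduces to the cross-product ad/bc. A minimal sketch with hypothetical case-control counts:

```python
def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls):
    """Odds ratio from a 2x2 exposure/outcome table: (a/b) / (c/d) = ad / bc."""
    return ((exposed_cases * unexposed_controls)
            / (exposed_controls * unexposed_cases))

# Hypothetical counts: exposure is much more common among cases
or_value = odds_ratio(exposed_cases=30, exposed_controls=10,
                      unexposed_cases=20, unexposed_controls=40)
# or_value > 1 suggests the outcome is more likely with the exposure
```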
What is the relationship between sensitivity, specificity & likelihood ratios?
Positive LR = sensitivity / (100 – specificity)
Negative LR = (100 – sensitivity) / specificity
(with sensitivity & specificity expressed as percentages; use 1 – … if working with proportions)
Recall:
- Sensitivity = TRUE POSITIVE rate
- Specificity = TRUE NEGATIVE rate
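The same formulas in proportion (0-1) form, as a quick sketch with invented test characteristics:

```python
def likelihood_ratios(sensitivity, specificity):
    """LR+ and LR- from sensitivity and specificity given as proportions (0-1)."""
    lr_pos = sensitivity / (1 - specificity)   # true positive rate / false positive rate
    lr_neg = (1 - sensitivity) / specificity   # false negative rate / true negative rate
    return lr_pos, lr_neg

# Hypothetical test: 90% sensitive, 95% specific
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.95)
```

A highly specific test drives LR+ up (SpPIN), while a highly sensitive test drives LR- toward 0 (SnNOut).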
How do you interpret positive likelihood ratios?
0-1 ?
1 ?
1 - infinity?
Tells you the likelihood of a disease/condition/result.
0-1: Decreased likelihood of disease.
+LR 1/2 (0.5) = 15% less likely
+LR 1/5 (0.2) = 30% less likely
+LR 1/10 (0.1) = 45% less likely
1: Null, no diagnostic value.
> 1: increased evidence for disease. High +LR helps you rule IN for a disease.
+LR 2 = 15% more likely
+LR 5 = 30% more likely
+LR 10 = 45% more likely
An LR over 10 is very strong evidence to rule in a disease.
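Likelihood ratios act on odds, not probabilities: convert the pre-test probability to odds, multiply by the LR, then convert back. A minimal sketch with invented numbers:

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Apply a likelihood ratio: probability -> odds -> multiply by LR -> probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical: 20% pre-test probability, positive test with +LR of 10
p_post = post_test_probability(0.20, 10)
```

This odds-based calculation is exact; the 15/30/45% figures on this card are the well-known bedside approximation for shifting probability without doing the odds conversion.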
Journal Impact Factor
“How often are this journal's recent articles (usually from the past 2 or 5 years) being cited?”
Calculated as the number of citations made in the current year to articles published in the previous two (or five) years, divided by the total number of citable articles published in those same years
Describe the 4 clinical phases in research trials
Phase I: assess the SAFETY of an intervention.
Phase II: test EFFICACY of the intervention in a tightly controlled environment.
Phase III: EFFECTIVENESS (randomized and blinded testing in a REAL-WORLD environment)
Phase IV: tests the impact of the intervention for costs, overall long-term care, etc.