Clinical Outcome Measures Flashcards
4 Psychometric properties of reliability
Internal consistency
Test-retest
Intra-rater
Inter-rater
Internal Consistency
Definition
Consistency of construct across individual items of outcome measure
Internal Consistency
Study Design
Conduct outcome measure on a group of people -> analyze the correlations among the individual items
Internal Consistency
Statistical Results
Cronbach's alpha
measures correlation
Ideal: 0.7-0.9
Internal Consistency
Appraisal Considerations
Sample size
Participants show wide diversity on the outcome measure
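The Cronbach's alpha statistic on this card can be sketched in code. This is a minimal illustration, not a validated implementation; the item scores and function names below are invented:

```python
# Sketch: Cronbach's alpha for item scores (rows = subjects, columns = items).
# alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def cronbach_alpha(scores):
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented 5 subjects x 4 items, all items tracking the same construct
scores = [
    [3, 4, 3, 3],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [4, 4, 5, 4],
    [2, 2, 1, 2],
]
print(round(cronbach_alpha(scores), 2))
```

Here the items move together across subjects, so alpha comes out high; values above the 0.7-0.9 band can also signal redundant items.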
Test-retest
Definition
Consistency of test when given to a person (unchanged in outcome) on two different occasions
Test-retest
Study Design
One rater gives the test to the same people on different days
Test-retest
Statistical Results
ICC or Kappa
Closer to 1 better
Test-retest
Appraisal considerations
Sample size
Wide diversity in outcome measure
Intra-rater
Definition
Consistency of raters when compared to themselves on two different occasions
Intra-rater
Study Designs
Each therapist gives the test to the same people at different times; each rater's scores are compared with their own earlier scores
Intra-rater
Statistical Results
ICC or Kappa
Closer to 1 better
Intra-rater
Appraisal considerations
Sample size
Diversity in outcome measure
Stable in characteristic of interest
Same circumstances for assessment each time
Time appropriate between assessments
Inter-rater
Definition
Consistency of raters compared to each other
Inter-rater
Study Design
Therapists measure same participants and their scores are compared
Inter-rater
Statistical Results
ICC or Kappa
Closer to 1 better
Inter-rater
Appraisal considerations
Sample size
Diversity in outcome measure
Stable in characteristic of interest
Same circumstances each assessment
Time appropriate between assessments
3 Main groups of Validity and their subcategories
- Content
  - face
- Criterion
  - concurrent
  - predictive
- Construct
  - convergent
  - discriminative
  - known groups
Content Validity Characteristics
Def: includes all the content required
Design: expert panel conducts extensive assessment
Statistics: none
Considerations: diversity of experts, several rounds of giving opinions, transparent process
Face Validity Characteristics
Def: makes sense to me
Design: read it, think, done
Stats: none
Consider: diversity of experts, transparent process
Criterion Validity Characteristics
Def: compare to established measure
Design: compare OM of interest to established OM
Stats: Spearman rho or Pearson Correlation Coeff
-1 to 1
Criterion (concurrent and predictive) Validity Appraisal Considerations
Gold standard credible?
Blinding?
Everyone does both assessments?
Time appropriate between assessments?
Concurrent Validity Characteristics
Def: two measures correlate at same time point
Design: comparison at same time
Stats: spearman rho
-1 to 1
Predictive Validity
Characteristics
Def: OM of interest is correlated with another OM at later point
Design: OM of interest is compared to later OM
Stats: Spearman Rho
-1 to 1
Construct Validity
Characteristics
Def: Does OM measure what intended to?
Design: compare OM to established gold standard
Stats: Spearman Rho
-1 to 1
Construct, Convergent, and Discriminative Validity Appraisal Considerations
Argument for gold standard?
Blinding of raters?
All participants complete both measures?
Time appropriate?
Convergent Validity
Characteristics
Def: does OM of interest correlate with another measure
Design: compare OM to established gold standard
Stats: Spearman Rho
-1 to 1
Discriminative Validity
Characteristics
Def: does OM of interest NOT correlate with measure known to measure a different construct
Design: compare OM to established OM
Stats: Spearman Rho
-1 to 1
Known Groups Validity
Characteristics
Def: Does OM produce different results for groups of people known to be diff. on construct the OM is supposed to measure?
Design: measure three distinct groups and look for score differences
Stats: Analysis of variance for linear trends
p value
Considerations: differences between groups sufficiently established?
Ceiling/Floor Characteristics
Def: common for individuals to get the highest/lowest score
Study Design: measure diverse sample and look at proportion that get the highest/lowest score
Stats: %
Considerations: sufficient diversity in sample? full range of population?
Minimal Detectible Change Characteristics
Def: min amount of change required on OM to exceed anticipated measurement error/variability
Design: test-retest reliability study, then calculate MDC
Stats: points on outcome measure scale
Considerations: all test-retest reliability considerations plus the MDC calculation
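The MDC calculation on this card is commonly computed as MDC95 = 1.96 x SEM x sqrt(2), where SEM = SD x sqrt(1 - ICC) from a test-retest study. A minimal sketch with invented numbers:

```python
import math

# Sketch of a common MDC95 calculation (the SD and ICC below are invented):
#   SEM   = SD * sqrt(1 - ICC)        standard error of measurement
#   MDC95 = 1.96 * SEM * sqrt(2)      smallest change beyond measurement error

def mdc95(sd, icc):
    sem = sd * math.sqrt(1 - icc)
    return 1.96 * sem * math.sqrt(2)

# e.g. baseline SD of 5 points on the scale, test-retest ICC of 0.90
print(round(mdc95(5.0, 0.90), 1))
```

A change smaller than this value cannot be distinguished from the measure's own variability.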
Responsiveness Characteristics
Def: OM ability to detect change over time
Design: measure population at two points over time when expect change
Stats: effect size
Considerations: reasonable to expect change in time between assessments?
Minimal Clinically Important Difference Definition
Minimal amount of change on OM patients are likely to perceive as beneficial
MCID Study Design
Measure group likely to experience change and a gold standard representing meaningful change.
Find cutoff score that detects meaningful change
MCID Validity Considerations
- Diverse sample
- Participants likely to experience change!
- Sufficient time for change!
- Reliable/valid measures
- Sample size
- Rater blinding!
- Sufficient follow-up!
- Gold standard actually measures meaningful change!
MCID Statistics
Cutoff scores and expected sensitivity, specificity, and likelihood ratios
ICC/Kappa Number Interpretation
> 0.8 excellent agreement
0.6 substantial
0.4 moderate
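For categorical ratings, the kappa on this card is typically Cohen's kappa: agreement corrected for chance. A minimal sketch with invented ratings:

```python
# Sketch: Cohen's kappa for two raters' categorical ratings (invented data).
#   kappa = (observed agreement - chance agreement) / (1 - chance agreement)

def cohen_kappa(r1, r2):
    n = len(r1)
    categories = set(r1) | set(r2)
    po = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    pe = sum((r1.count(c) / n) * (r2.count(c) / n)        # chance agreement
             for c in categories)
    return (po - pe) / (1 - pe)

rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(round(cohen_kappa(rater1, rater2), 2))
```

Here the raters agree on 6 of 8 cases (75%), but chance alone would give 50%, so kappa lands at 0.5, moderate on the scale above.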
Spearman Rho Number Interpretation
+1: positive correlation
- 1: negative correlation
0: no correlation
> 0.85 strong
0.6 moderate
Effect Size Number Interpretation
> 0.8 large
0.5-0.8 moderate
0.2-0.5 small
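The effect size used for responsiveness is often Cohen's d: the mean change divided by the pooled standard deviation. A minimal sketch with invented pre/post scores:

```python
import math

# Sketch: Cohen's d for two sets of scores (invented pre/post data).
#   d = (mean2 - mean1) / pooled SD

def cohens_d(group1, group2):
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    n1, n2 = len(group1), len(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var(group1) + (n2 - 1) * var(group2))
                          / (n1 + n2 - 2))
    return (mean(group2) - mean(group1)) / pooled_sd

pre = [40, 42, 38, 41, 39]
post = [46, 47, 44, 48, 45]
print(round(cohens_d(pre, post), 2))
```

With a 6-point mean improvement against a pooled SD of about 1.6, d lands well above 0.8, a large effect on the scale above.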
Positive LR Number Interpretation
> 10 large
5-10 moderate
2-5 small
Negative LR Number Interpretation
0-0.1 large
0.1-0.2 moderate
0.2-0.5 small
0.5-1 negligible
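Both likelihood ratios come directly from sensitivity and specificity. A minimal sketch with invented values:

```python
# Sketch: likelihood ratios from sensitivity and specificity (invented values).
#   LR+ = sens / (1 - spec)   large values help rule the condition IN
#   LR- = (1 - sens) / spec   values near 0 help rule the condition OUT

def likelihood_ratios(sensitivity, specificity):
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(0.90, 0.85)
print(round(lr_pos, 1), round(lr_neg, 2))
```

A test with sensitivity 0.90 and specificity 0.85 gives LR+ of 6 (moderate) and LR- of about 0.12 (moderate) on the scales above.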
8 Questions of Appraising Diagnostic Research
- Entire SPECTRUM of pts represented in study? My patient?
- COMPARISON with reference GOLD standard of diagnosis?
- Were diagnostic tests performed by 1 or more RELIABLE EXAMINERS who were MASKED to results of the reference standard?
- Did all subjects receive BOTH tests regardless of test outcomes?
- Was the diagnostic test interpreted INDEPENDENTLY of all other clinical info?
- Were CLINICALLY USEFUL STATS included in analysis/interpretation for clinical application?
- Is test ACCURATE and clinically RELEVANT to PT?
- Will resulting POST-TEST PROBABILITIES affect management and help the patient?
Factors affecting Appraisal Question: entire spectrum represented?
What were authors trying to achieve?
Are their goals different from yours?
–look in abstract or beginning of methods
Factors affecting Appraisal Question: comparison to gold standard?
Was there a reference test or gold standard?
Did they compare the two?
Factors affecting Appraisal Question: reliable examiners masked?
Were there reliable examiners?
Were they blinded to other test results?
Factors affecting Appraisal Question: were clinically useful stats used to interpret clinical application?
Likelihood ratios / sensitivity and specificity
Overall accuracy of the test
Factors affecting Appraisal Question: will post-test prob affect patient management?
Look at negative and positive LR
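The usual way an LR changes management is through post-test probability: convert the pre-test probability to odds, multiply by the LR, and convert back. A minimal sketch with invented numbers:

```python
# Sketch: post-test probability from pre-test probability and a likelihood
# ratio (the 30% pre-test probability and LR+ of 6 below are invented).

def post_test_probability(pre_test_prob, lr):
    pre_odds = pre_test_prob / (1 - pre_test_prob)   # probability -> odds
    post_odds = pre_odds * lr                        # apply likelihood ratio
    return post_odds / (1 + post_odds)               # odds -> probability

# A positive test with LR+ of 6 moves a 30% pre-test probability to 72%
print(round(post_test_probability(0.30, 6.0), 2))
```

If that shift crosses a treatment or referral threshold, the test changes management; if not, the result may not help the patient.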