5010: Reliability, Responsiveness Flashcards
Types of Measurement Error
Systematic
(consistent, unidirectional, biased, “constant”)
Random
(inconsistent, either direction equally likely, try to minimize, on average- will cancel out)
Sources of Measurement Error
Rater (stabilization, recording)
Meas. Instrument or Method (goni-faulty, consistency- interrater)
Subject (clothing, m. mass, gender, time of day, meds)
Types of Reliability
Intrarater (usu. MOST reliable)
Interrater
Test-retest (suggests no rater involvement, self-reported data)
Intraclass Correlation Coefficient (ICC)
True score variance b/w subjects
= ————————————————
Total variance
Reliability Coefficient
True score variance b/w subjects
= ————————————————
True score variance + error variance
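A minimal sketch of how the ratio on these cards could be estimated from repeated measurements. It assumes a one-way random-effects ICC (often written ICC(1,1)), which is only one of several ICC forms and is not necessarily the one used in the course. The function name and the sample ratings are hypothetical.

```python
from statistics import mean

def icc_oneway(scores):
    """One-way random-effects ICC for a list of per-subject rows.

    Each row holds the same number (k) of repeated measurements
    on one subject. This is an assumed ICC form, chosen for simplicity.
    """
    n = len(scores)          # number of subjects
    k = len(scores[0])       # measurements per subject
    grand = mean([x for row in scores for x in row])
    subject_means = [mean(row) for row in scores]

    # Mean squares from a one-way ANOVA with subjects as the grouping factor
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))

    var_between = (ms_between - ms_within) / k   # estimated true score variance b/w subjects
    var_error = ms_within                        # estimated error variance
    return var_between / (var_between + var_error)

# Hypothetical goniometry readings (degrees): 4 subjects, 2 measurements each
ratings = [[10, 12], [20, 19], [30, 33], [41, 40]]
print(round(icc_oneway(ratings), 3))
```

When the repeated measurements agree exactly, error variance is 0 and the ratio is 1; as error variance grows relative to between-subject variance, the value drops toward 0.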
Variance
Measure of avg. variability of sample data. Ideally for a clinical measure.
Interpretation of Reliability Stats
Range: 0-1
Statistical Measures of Reliability
ICC- for Continuous (sometimes Ordinal)
SEM for Continuous
Kappa for categorical
Cronbach’s alpha for Multiple items, one meas.
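A rough sketch of two of the statistics listed above. The formulas are the commonly used ones (Cohen's kappa for two raters, and SEM = sd x sqrt(1 - reliability)); they are not given on these cards, so treat them as assumptions. Function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cohen_kappa(rater_a, rater_b):
    """Agreement between two raters on categorical data, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

def standard_error_of_measurement(sd, reliability):
    """SEM in the units of the measure (assumed formula: sd * sqrt(1 - reliability))."""
    return sd * sqrt(1 - reliability)

# Hypothetical positive/negative special-test results from two raters
a = ["pos", "pos", "neg", "neg"]
b = ["pos", "neg", "neg", "neg"]
print(cohen_kappa(a, b))

# Hypothetical: sd of 5 degrees and ICC of 0.91
print(standard_error_of_measurement(5, 0.91))
```

Kappa suits categorical ratings because it discounts agreement expected by chance; SEM re-expresses a reliability coefficient as error in the measure's own units.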
Bland-Altman Plots
Plot difference between test-retest
Shows repeatability and any bias over time
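A small sketch of the numbers behind a Bland-Altman plot. It assumes the usual convention of plotting each subject's difference against the mean of the two sessions, with bias = mean difference and 95% limits of agreement = bias +/- 1.96 sd. That convention is not spelled out on the card, and the data are made up.

```python
from statistics import mean, stdev

def bland_altman(test, retest):
    """Return per-subject averages and differences, the bias, and the 95% limits of agreement."""
    diffs = [b - a for a, b in zip(test, retest)]        # retest minus test
    avgs = [(a + b) / 2 for a, b in zip(test, retest)]   # x-axis values for the plot
    bias = mean(diffs)                                   # systematic shift between sessions
    sd = stdev(diffs)                                    # spread of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return avgs, diffs, bias, limits

# Hypothetical test and retest scores for 4 subjects
test = [10, 20, 30, 40]
retest = [12, 21, 33, 42]
avgs, diffs, bias, limits = bland_altman(test, retest)
print(bias, limits)
```

A bias far from 0 suggests systematic error between sessions; wide limits suggest poor repeatability (random error).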
Validity
Are we really measuring what we think we are measuring?
4 Categories of Validity testing
Face Validity
Content Validity
Criterion-related validity
Construct Validity
Face Validity & How to Test
Does it appear to be valid for this measurement (subjective)
- Have clinicians look it over & give opinion
- Perform the test on patients & ask their opinion
- Clinician or Pt may reject it
Content Validity & How to Test
Does instrument address all aspects and only aspects of the attribute being measured?
- 1st author must give definition of what they intend to meas.
- Use a thorough, organized, comprehensive development process.
- May sample expert opinion
- Test for ceiling & floor effects
- Analyze data using factor analysis
What is Factor Analysis
Testing tool used to test CONTENT VALIDITY
- used for multi-item meas. tools
- complex statistical analysis based on correlation among items
- will identify # and type of underlying dimensions being meas
- may identify items that do not correlate/fit w/ other items
e.g.: balance test –> might also identify strength –> but that is not what we are trying to measure.
Criterion-Related Validity & Test
this test requires a “gold standard” to serve as criterion.
Goniometry meas- gold standard is w/ x-ray
We don’t always use gold standards because of cost, time, practicality, and they may be uncomfortable
2 Ways to Test:
1. Concurrent- gold standard used @ same time
2. Predictive- gold standard is some future event
(GRE to predict PT success)
To measure, calculate the correlation b/w the measurement and the gold standard.
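A minimal sketch of the correlation step, assuming Pearson's r is the correlation used (the card does not say which coefficient). The goniometer and x-ray values are invented for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between a measurement and its gold standard."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical knee flexion angles (degrees): goniometer vs. x-ray
goniometer = [88, 95, 110, 121, 130]
xray = [90, 97, 108, 124, 133]
print(round(pearson_r(goniometer, xray), 3))
```

For concurrent validity both lists would be collected at the same time; for predictive validity the second list would be the later outcome.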
Construct Validity
Used for abstract attributes that are difficult to define, where there is NO “gold standard”
Construct= logical argument/hypothesis about how a given measure should behave (if it is measuring what you think it is measuring)
Researcher constructs argument, then tests that argument via hypothesis testing.
NEVER QUITE FULLY PROVEN
Theoretical Models (Framework)
helps define a variable by stating its relationship to other variables and phenomena
provides a basis for construct validity testing
e.g.: personal/environmental factors
Fear-Avoidance Model of Pain
Construct Validity: Convergent vs. Discriminant
Convergent: correlation w/ other established balance measures (positive or negative)
Discriminant: little or no correlation w/ measures of attributes other than the one we are testing.
(e.g., if we are testing balance, the measure should not correlate strongly w/ a measure of another aspect like “cognition”)
Look at Self-Check on pg 168 of course pack
:)
Responsiveness
Ability of an instrument to detect clinically important change over time.
Essential for outcome measure when you expect to see progress in response to your treatment
Things to ask…
Time Frame?
Function Measure Has Adequate Responsiveness?
How do researchers meas/compare responsiveness?
I. Must have intervention &/or pt pop. where change is expected.
II. Must follow pts over sufficient time to see change
III. Calc. responsiveness for a # of outcome measures and compare.
Baseline———->Discharge Meas.s & Global Change Meas.s
Global Change Measure / External Criterion
Responsiveness involves detecting change in subjects who have actually changed.
In a sample, many pts improve, but some stay same or get worse.
Authors may use global change meas. to determine change status. (Improved/Same or stable/Worse)
GROC
Global Rating of Change
A transitional measure
Other Transitional Measures
How have your symptoms changed since you began treatment?
i. complete recovery
ii. much improved
iii. slightly improved
iv. no change
v. slightly worse
vi. much worse
vii. worse than ever
Responsiveness Statistics & Advantage
Effect Size
Standardized Response Mean (SRM)
Advantage: Both statistics are unit-less, making comparisons b/w outcome measures easier (even if scales are different)
Effect Size
Mean Change (Final - Initial)
= ------------------------------
sd of baseline scores
Standardized Response Mean (SRM)
Mean Change (Final - Initial)
= ------------------------------
sd of change scores
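A short sketch of both responsiveness statistics, following the card definitions: mean change divided by the sd of baseline scores (Effect Size) or by the sd of the change scores (SRM). The sample sd (n - 1) is assumed, and the scores are hypothetical.

```python
from statistics import mean, stdev

def effect_size(initial, final):
    """Mean change divided by the sd of the baseline (initial) scores."""
    changes = [f - i for i, f in zip(initial, final)]
    return mean(changes) / stdev(initial)

def standardized_response_mean(initial, final):
    """Mean change divided by the sd of the change scores."""
    changes = [f - i for i, f in zip(initial, final)]
    return mean(changes) / stdev(changes)

# Hypothetical function scores at baseline and discharge for 4 pts
initial = [10, 20, 30, 40]
final = [15, 24, 36, 47]
print(round(effect_size(initial, final), 3))
print(round(standardized_response_mean(initial, final), 3))
```

The two can differ a lot on the same data: when pts start far apart but all change by a similar amount, the sd of change scores is small, so SRM comes out much larger than Effect Size.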
Interpretation of Cohen’s Effect Sizes
Effect Size &/or SRM*
.8 Large
.5 Moderate
.2 Small
*Both normalized (in sd units)
Population
- remember reliability, validity, and responsiveness are specific to a given population
- can’t assume it will work in a diff. context
- try to reference a study where the population is similar to the pts in your clinical care
Practicality
Consider
- time
- expense
- setting
May depend on context of your measurement
(research study, clinical practice)