5010: Reliability, Responsiveness Flashcards by Angela Kasper

Types of Measurement Error

Systematic
(consistent, unidirectional, biased, “constant”)

Random
(inconsistent, either direction equally likely, try to minimize; on average, will cancel out)

Sources of Measurement Error

Rater (stabilization, recording)

Meas. Instrument or Method (e.g., faulty goniometer; inconsistent method b/w raters)

Subject (clothing, m. mass, gender, time of day, meds)

Types of Reliability

Intrarater (usu. MOST reliable)
Interrater
Test-retest (suggests no rater involvement, self-reported data)

Intraclass Correlation Coefficient
(ICC)

ICC = (true score variance b/w subjects) / (total variance)
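
The card gives only the conceptual ratio. As a minimal sketch, one common way to estimate it is the one-way ANOVA form, ICC(1,1); choosing that form is an assumption here, and the function name and the goniometry-style readings are hypothetical.

```python
from statistics import mean

def icc_oneway(scores):
    """ICC(1,1) from a one-way ANOVA; scores[i] holds the k ratings of subject i."""
    n = len(scores)       # number of subjects
    k = len(scores[0])    # ratings per subject (assumed equal for every subject)
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    # Between-subjects mean square: spread of subject means around the grand mean
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-subjects mean square: disagreement among ratings of the same subject
    msw = sum((x - m) ** 2
              for row, m in zip(scores, row_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical data: 4 subjects, 2 trials each (degrees)
trials = [[10, 12], [20, 18], [30, 31], [40, 39]]
print(round(icc_oneway(trials), 3))  # → 0.992
```

Subjects differ a lot from each other and the two trials barely differ, so almost all of the variance is b/w subjects and the ICC is close to 1.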

Reliability Coefficient

Reliability coefficient = (true score variance b/w subjects) / (true score variance + error variance)

(true score variance + error variance = total variance)
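
In classical test theory, total variance is the sum of true score variance and error variance. A small worked example of the ratio, using made-up variance values and a hypothetical function name:

```python
def reliability_coefficient(true_var, error_var):
    """True score variance as a share of total (true + error) variance."""
    total_var = true_var + error_var
    return true_var / total_var

# Hypothetical variances: same true score variance, growing error variance
print(reliability_coefficient(90, 10))  # → 0.9
print(reliability_coefficient(90, 90))  # → 0.5
```

With the true score variance held fixed, more measurement error pushes the coefficient toward 0.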

Variance

Measure of avg. variability of sample data. Ideally for a clinical measure.

Interpretation of Reliability Stats

Range: 0-1

Statistical Measures of Reliability

ICC- for Continuous (sometimes Ordinal)
SEM for Continuous
Kappa for categorical
Cronbach’s alpha for Multiple items, one meas.
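
The statistics in the list above can be sketched with their textbook formulas. The cards do not give these formulas, so the forms below are assumptions: SEM as sd x sqrt(1 - reliability), Cohen's kappa for two raters, and Cronbach's alpha from item and total-score variances. All function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import variance

def sem(sd, reliability):
    """Standard error of measurement, in the units of the measure."""
    return sd * sqrt(1 - reliability)

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement b/w two raters' categorical ratings."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's category frequencies
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)

def cronbach_alpha(items):
    """Internal consistency; items[j] holds every subject's score on item j."""
    k = len(items)
    totals = [sum(subject) for subject in zip(*items)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

print(round(sem(10, 0.91), 2))                                    # → 3.0
print(cohen_kappa(["+", "+", "-", "-"], ["+", "+", "-", "+"]))    # → 0.5
print(round(cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5]]), 2))     # → 1.0
```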

Bland-Altman Plots

Plot the difference b/w test and retest scores against their mean

Shows repeatability and any bias over time
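
A minimal sketch of the numbers behind such a plot, assuming the usual convention of a mean difference (bias) with 95% limits of agreement at bias +/- 1.96 sd of the differences. The card does not mention the limits, and the function name and scores are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(test, retest):
    """Return plot points (means, diffs), the bias, and 95% limits of agreement."""
    diffs = [b - a for a, b in zip(test, retest)]        # y-axis: retest - test
    means = [(a + b) / 2 for a, b in zip(test, retest)]  # x-axis: pair mean
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return means, diffs, bias, (bias - spread, bias + spread)

# Hypothetical test and retest scores for 4 subjects
means, diffs, bias, limits = bland_altman([10, 20, 30, 40], [12, 21, 33, 42])
print(bias)                             # → 2
print([round(x, 1) for x in limits])    # → [0.4, 3.6]
```

A bias away from 0 suggests a systematic shift b/w sessions; narrow limits suggest good repeatability.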

Validity

Are we really measuring what we think we are measuring?

4 Categories of Validity testing

Face Validity
Content Validity
Criterion-related validity
Construct Validity

Face Validity & How to Test

Does it appear to be valid for this measurement? (subjective)

Have clinicians look it over & give opinion
Perform the test on patients & ask their opinion
Clinician or Pt may reject it

Content Validity & How to Test

Does instrument address all aspects and only aspects of the attribute being measured?

1st, the author must define what they intend to meas.
Use a thorough, organized, comprehensive development process.
May sample expert opinion
Test for ceiling & floor effects
Analyze data using factor analysis
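
The ceiling & floor check in the list above can be sketched as the percentage of subjects scoring at the scale's minimum or maximum. This is one simple way to do it, not necessarily the course's method; the function name, the 0-56 scale, and the scores are hypothetical.

```python
def floor_ceiling_pct(scores, min_score, max_score):
    """Percent of subjects at the lowest and highest possible score."""
    n = len(scores)
    floor = 100 * sum(s == min_score for s in scores) / n
    ceiling = 100 * sum(s == max_score for s in scores) / n
    return floor, ceiling

# Hypothetical scores on a 0-56 scale
scores = [0, 10, 56, 56, 56, 40, 56, 30, 56, 12]
print(floor_ceiling_pct(scores, 0, 56))  # → (10.0, 50.0)
```

Here half the sample sits at the top score, so the instrument could not show further improvement in those subjects.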

What is Factor Analysis

Testing tool used to test CONTENT VALIDITY

used for multi-item meas. tools
complex statistical analysis based on correlation among items
will identify # and type of underlying dimensions being meas
may identify items that do not correlate/fit w/ other items

e.g.: a balance test might also pick up strength, but that is not what we are trying to measure.

Criterion-Related Validity & Test

this test requires a “gold standard” to serve as criterion.
e.g.: goniometry meas., where the gold standard is x-ray

We don't always use gold standards because of cost, time, and practicality, and they may be uncomfortable

2 Ways to Test:
1. Concurrent- gold standard used @ same time
2. Predictive- gold standard is some future event
(GRE to predict PT success)
To measure, calculate the correlation b/w the measurement and the gold standard.
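
The card does not say which correlation to use. As a sketch, assuming Pearson's r and hypothetical paired readings (goniometer vs. x-ray, in degrees):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation b/w paired measurements x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical concurrent readings on the same 5 joints
goniometer = [88, 95, 102, 110, 121]
xray = [90, 96, 100, 112, 120]
print(round(pearson_r(goniometer, xray), 2))  # → 0.99
```

An r near 1 would support concurrent validity; for predictive validity the second list would hold the later outcome instead.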

Construct Validity

Used for abstract attributes that are difficult to define, where there is NO “gold standard”
Construct = logical argument/hypothesis about how a given meas. should behave (if it is measuring what you think it is measuring)
Researcher constructs the argument, then tests it via hypothesis testing.
NEVER QUITE FULLY PROVEN

Theoretical Models (Framework)

helps define a variable by stating its relationship to other variables and phenomena

provides a basis for construct validity testing

e.g.: personal/environmental factors
Fear-Avoidance Model of Pain

Construct Validity: Convergent vs. Discriminant

Convergent: correlates w/ other established measures of the same construct, e.g. other balance measures (correlation may be positive or negative)

Discriminant: little or no correlation w/ measures of a different construct.
(e.g., if we are testing balance, the measure should not correlate strongly w/ a measure of another attribute like "cognition")

Look at Self-Check on pg 168 of course pack

Responsiveness

Ability of an instrument to detect clinically important change over time.

Essential for outcome measure when you expect to see progress in response to your treatment

Things to ask…

Time Frame?

Function Measure Has Adequate Responsiveness?

How do researchers meas/compare responsiveness?

I. Must have intervention &/or pt pop. where change is expected.
II. Must follow pts over sufficient time to see change
III. Calc. responsiveness for a # of outcome measures and compare.

Baseline ----> Discharge meas. & global change meas.

Global Change Measure / External Criterion

Responsiveness involves detecting change in subjects who have actually changed.

In a sample, many pts improve, but some stay the same or get worse.

Authors may use global change meas. to determine change status. (Improved/Same or stable/Worse)

GROC

Global Rating of Change

A transitional measure

Other Transitional Measures

How have your symptoms changed since you began treatment?
i. complete recovery
ii. much improved
iii. slightly improved
iv. no change
v. slightly worse
vi. much worse
vii. worse than ever

Responsiveness Statistics & Advantage

Effect Size
Standardized Response Mean (SRM)

Advantage: Both statistics are unit-less, making comparisons b/w outcome measures easier (even if scales are different)

Effect Size

Effect Size = (mean change) / (sd of baseline scores)
            = mean(Final - Initial) / (sd of baseline scores)

Standardized Response Mean (SRM)

SRM = (mean change) / (sd of change scores)
    = mean(Final - Initial) / (sd of change scores)

Interpretation of Cohen's Effect Sizes

Effect Size &/or SRM*
.8 Large
.5 Moderate
.2 Small

*Both normalized (in sd units)
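
A worked sketch of both statistics on the same hypothetical baseline and discharge scores. Treating .2/.5/.8 as lower cut-offs, and the "below small" label, are assumptions made for illustration; the function names are hypothetical.

```python
from statistics import mean, stdev

def effect_size(initial, final):
    """Mean change divided by the sd of the baseline scores."""
    change = [f - i for i, f in zip(initial, final)]
    return mean(change) / stdev(initial)

def srm(initial, final):
    """Standardized response mean: mean change divided by the sd of the change scores."""
    change = [f - i for i, f in zip(initial, final)]
    return mean(change) / stdev(change)

def cohen_label(value):
    """Label a value using .8 / .5 / .2 as lower cut-offs (an assumption)."""
    size = abs(value)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical baseline and discharge scores for 4 pts
initial = [20, 30, 40, 50]
final = [30, 35, 55, 60]
es, response = effect_size(initial, final), srm(initial, final)
print(round(es, 2), cohen_label(es))              # → 0.77 moderate
print(round(response, 2), cohen_label(response))  # → 2.45 large
```

The same data give different values because the denominators differ: baseline scores vary widely b/w pts here, while the change scores are fairly uniform.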

Population

* remember reliability, validity, and responsiveness are specific to a given population
* can't assume a measure will work in a diff. context
--> try to reference a study where the population is similar to the pts in your clinic

Practicality

Consider
- time
- expense
- setting

May depend on context of your measurement (research study, clinical practice)

(31 cards)