Week 6: Reliability, Validity, Epidemiologic Analysis and Dichotomizing Treatment Effect Flashcards
What is reliability?
Extent to which a measurement is consistent and free from error
All reliability scores have…
signal and noise
What is signal?
true score
What is noise?
error
Reliability is the ratio of…
signal to noise
relative reliability
ratio of the variability between subjects (true score differences) to the total variability in the scores
unitless coefficient
ICC and kappa
absolute reliability
indicates how much of a measured value is likely due to error
expressed in the original unit
SEM is commonly used
Statistic used for relative reliability
ICC (and kappa)
Statistic used for absolute reliability
SEM (standard error of measurement)
Most common types of reliability
test-retest, inter-rater, intra-rater, internal consistency
inter-rater
2 or more raters who measure the same group of people
intra-rater
the degree that the examiner agrees with himself or herself
2+ measurements on the same subjects
in measurement validity, the test should…
discriminate, evaluate, and predict
reliability is a __________ for validity
prerequisite
content validity
establishes that the multiple items that make up a questionnaire, inventory, or scale adequately sample the universe of content that defines the construct being measured
Criterion-related Validity
establishes the correspondence between a target test and a reference or ‘gold’ standard measure of the same construct
concurrent validity
the extent to which the target test correlates with a reference standard taken at relatively the same time
predictive validity
the extent to which the target test can predict a future reference standard
construct validity
establishes the ability of an instrument to measure the dimensions and theoretical foundation of an abstract construct
convergent validity
the extent to which a test correlates with other tests of closely related constructs
divergent validity
the extent to which a test is uncorrelated with tests of distinct or contrasting constructs
quantifying reliability: ‘old approach’
pearson’s r
assesses association (relationship) only, not agreement
only 2 raters can be compared
Quantifying reliability: ‘modern’ approach
intraclass correlation coefficients (ICC)
cohen’s kappa coefficients
both ICCs and kappa give single indicators of reliability that capture strength of relationship plus agreement in a single value
ICC
values from 0 - 1.0
measures degree of relationship and agreement
can be used for > 2 raters
interval/ratio data
ICC types
six types depending on purpose, design, and type of measurements
ICC type is defined by
two numbers in parentheses
ex: ICC (2,1), ICC (model, form)
model 1
raters are randomly chosen from a larger population; some subjects are assessed by different raters (rarely used)
model 2
each subject assessed by the same set of raters
when is model 2 used
for test-retest and inter-rater reliability
model 3
each subject is assessed by the same set of raters, but the raters represent the only raters of interest
when do you use model 3
used for intra-rater reliability or when you do not wish to generalize the scores to other raters
ICC forms
second number in parentheses represents number of observations used to obtain reliability estimate
form 1
scores represent a single measurement
form k
scores based on mean of several (k) measurements
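The ICC cards above can be made concrete with a short sketch. This computes ICC(2,1) — model 2, single measure — from the two-way ANOVA mean squares; the subjects and scores are hypothetical, not from the cards:

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: (n subjects x k raters) array of interval/ratio measurements.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-subject means
    col_means = scores.mean(axis=0)   # per-rater means
    # Sums of squares for a two-way ANOVA without replication
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)              # between-subjects
    ms_cols = ss_cols / (k - 1)              # between-raters
    ms_err = ss_err / ((n - 1) * (k - 1))    # error
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Two raters scoring five subjects (hypothetical data)
data = [[9, 8], [7, 7], [5, 6], [8, 8], [4, 5]]
print(round(icc_2_1(data), 3))  # 0.898 -> good reliability (> 0.75)
```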
ICC interpretation
no absolute standards
ICC > 0.90
best for clinical measurements
ICC > 0.75
good reliability
ICC < 0.75
poor to moderate reliability
cronbach’s alpha (α)
represents correlation among items and correlation of each individual item with the total score
acceptable values range from about 0.70 to 0.90
if cronbach’s alpha is too low it means
not measuring the same construct
if cronbach’s alpha is too high it means
redundancy
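Cronbach's alpha can be computed directly from item variances and total-score variance; the respondents and items below are hypothetical:

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n respondents x k items) array of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Four respondents answering three closely related items (hypothetical data)
answers = [[3, 3, 4], [4, 5, 5], [2, 2, 2], [5, 4, 5]]
print(round(cronbach_alpha(answers), 3))  # 0.955 -> above 0.90, flags possible redundancy
```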
in a contingency table for categorical scales, agreements fall…
on the diagonal
in a contingency table for categorical scales, disagreements fall…
in all off-diagonal cells
percent agreement
simply how often raters agree
range of 0% to 100%
kappa coefficient
proportion of agreement between raters after chance agreement has been removed
can be used on both nominal and ordinal
can be interpreted like ICC
weighted kappa
best for ordinal data
can choose to make ‘penalty’ worse for larger disagreements
weights can be arbitrary, symmetric or asymmetric
kappa = <.4
poor to fair
kappa = .4 - .6
moderate
kappa = .6 - .8
substantial
kappa = .8 - 1.0
excellent
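Unweighted kappa can be sketched from a 2×2 agreement table — chance agreement is removed exactly as the cards describe. The counts are hypothetical:

```python
import numpy as np

def cohens_kappa(table):
    """Unweighted kappa from a square contingency table (rater A rows x rater B columns)."""
    table = np.asarray(table, dtype=float)
    total = table.sum()
    p_obs = np.trace(table) / total  # diagonal cells = agreements
    # Chance agreement from the row and column marginal proportions
    p_chance = (table.sum(axis=1) * table.sum(axis=0)).sum() / total**2
    return (p_obs - p_chance) / (1 - p_chance)

# Two raters classifying 100 patients as positive/negative (hypothetical counts)
table = [[40, 10],
         [5, 45]]
print(round(cohens_kappa(table), 3))  # 0.7 -> substantial agreement
```

Note that percent agreement here is 85%, but kappa drops to 0.7 once the 50% expected chance agreement is removed.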
what does a diagnostic test do
focuses the examination
identify problems
assist in classification
diagnostic tests are all about
probabilities and limiting uncertainty
pre-test probability
before any testing takes place
post-test probability
outcome of the test
clinical prediction rules (CPR)
combinations of clinical findings
predictions
quantifies the contributions of a set of variables to diagnosis, prognosis, and likely response to treatment
concurrent validity is assessed with
sensitivity/specificity
correlation coefficients
predictive validity is assessed with
correlation coefficients
regression
what is sensitivity
proportion of people WITH the disease who have a positive test result
in the 2x2 table: LEFT COLUMN (disease present), TOP BOX — true positives ÷ (true positives + false negatives)
what is specificity
proportion of people WITHOUT the disease who have a negative test result
in the 2x2 table: RIGHT COLUMN (disease absent), BOTTOM BOX — true negatives ÷ (true negatives + false positives)
SpPin
test with HIGH specificity
Positive
helps rule a condition IN
SnNout
test with HIGH sensitivity
Negative
helps rule a condition OUT
PPV (positive predictive value)
patients with the disease who test positive (true positives) divided by all patients with positive test results
NPV (negative predictive value)
patients without the disease who test negative (true negatives) divided by all patients with negative test results
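The four 2x2 statistics above can be sketched as follows; the counts are hypothetical:

```python
def diagnostic_stats(tp, fp, fn, tn):
    """2x2 table counts: tp/fn = diseased who test +/-, fp/tn = disease-free who test +/-."""
    return {
        "sensitivity": tp / (tp + fn),  # with disease who test positive
        "specificity": tn / (tn + fp),  # without disease who test negative
        "ppv": tp / (tp + fp),          # positives who truly have the disease
        "npv": tn / (tn + fn),          # negatives who are truly disease-free
    }

# Hypothetical counts: 80 true +, 5 false +, 20 false -, 95 true -
result = diagnostic_stats(tp=80, fp=5, fn=20, tn=95)
print(result)  # sensitivity 0.8, specificity 0.95, ppv ~0.941, npv ~0.826
```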
Likelihood ratios
quantifies the test’s ability to provide persuasive information
NOT influenced by prevalence
ranges from 0 to infinity
LR < 1 (between 0 and 1)
decreased probability of disease/condition of interest
LR = 1
no diagnostic value; null value
LR > 1
increased probability of disease/condition of interest
farther from 1 = more likely
LR+ =
sensitivity/(1-specificity)
LR- =
(1-sensitivity)/specificity
diagnostic test is positive = what likelihood ratio
LR+
diagnostic test is negative = what likelihood ratio
LR-
LR+ : > 10
large and often conclusive shift
LR+ : 5 - 10
moderate shift
LR+ : 2 - 5
small: sometimes important
LR+ : 1 - 2
small: rarely important
LR- : < 0.1
large and often conclusive shift
LR- : 0.1 - 0.2
moderate shift
LR- : 0.2 - 0.5
small: sometimes important
LR- : 0.5 - 1
small: rarely important
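LR+ and LR- follow directly from sensitivity and specificity, and a likelihood ratio shifts a pre-test probability to a post-test probability via odds (a standard step, though the odds conversion is not spelled out in the cards above). The numbers below are hypothetical:

```python
def likelihood_ratios(sens, spec):
    """LR+ = sens/(1-spec); LR- = (1-sens)/spec."""
    return sens / (1 - spec), (1 - sens) / spec

def post_test_probability(pretest_prob, lr):
    """Convert probability -> odds, multiply by the LR, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

lr_pos, lr_neg = likelihood_ratios(sens=0.9, spec=0.8)
print(lr_pos)   # 4.5  -> small shift, sometimes important (2-5 band)
print(lr_neg)   # 0.125 -> moderate shift (0.1-0.2 band)
# A positive test moves a 30% pre-test probability up to about 66%
print(round(post_test_probability(0.30, lr_pos), 3))  # 0.659
```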
case-control and cohort studies are…
intended to study risk factors
association between disease and exposure
what is an example of an exposure?
cervical manipulation, smoking, running > 20 mi/wk
what is an example of disease or outcome?
cancer, stroke, knee OA
cohort study subjects are selected based on
exposure
cohort studies are usually
prospective but can be retrospective!
case-control study subjects are selected based on
whether or not they have a disorder
case-control studies are usually
retrospective
Relative Risk is in
cohort studies (two ‘o’s’)
Odds ratios are in
case-control studies (a and o, ‘at odds’)
RR or OR = 1
null value
no association between an exposure and a disease
if the confidence interval includes 1, the association is not statistically significant
RR or OR > 1
positive association between an exposure and a disease
exposure is considered to be harmful
RR or OR < 1
a negative association between an exposure and a disease
exposure is protective
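RR (cohort) and OR (case-control) can each be sketched from a 2x2 exposure/disease table; the counts below are hypothetical:

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Cohort study: risk of disease in exposed vs. unexposed groups."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def odds_ratio(a, b, c, d):
    """Case-control 2x2: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls (cross-product ratio)."""
    return (a * d) / (b * c)

# Hypothetical cohort: 30/100 exposed vs. 10/100 unexposed develop the disease
print(relative_risk(30, 100, 10, 100))      # 3.0 -> harmful association
# Hypothetical case-control counts
print(round(odds_ratio(30, 70, 10, 90), 3)) # 3.857 -> harmful association
```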
experimental event rate
% patients in experimental group with bad outcome
control event rate
% patients in control group with bad outcome
number needed to treat
how many patients you have to treat in order to prevent one bad outcome (the closer to 1, the better)
Number Needed to Treat (NNT)
NNT = 1/ARR
(ARR = CER - EER)
if NNT = 1.0 it means
need to treat 1 patient to avoid one adverse outcome
if NNT = 10 it means
need to treat 10 patients to avoid one adverse outcome
do we want a small or big NNT?
SMALL
Number needed to harm (NNH)
NNH = 1/ARI
(ARI = EER - CER)
if NNH is 1.0 it means
we need to treat 1 patient to cause an adverse outcome
if NNH is 10 it means
we would need to treat 10 patients to cause an adverse outcome
do we want a small or big NNH
BIG
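NNT and NNH follow directly from the event rates defined above, taking ARR = CER - EER as the counterpart to ARI = EER - CER. The rates below are hypothetical:

```python
def nnt(cer, eer):
    """Number needed to treat: 1 / ARR, where ARR = CER - EER."""
    return 1 / (cer - eer)

def nnh(eer, cer):
    """Number needed to harm: 1 / ARI, where ARI = EER - CER."""
    return 1 / (eer - cer)

# Hypothetical trial: bad outcome in 20% of controls vs. 12% of treated patients
print(round(nnt(cer=0.20, eer=0.12), 1))  # 12.5 -> treat ~13 to prevent one bad outcome
# Hypothetical adverse-event rates: 8% treated vs. 3% control
print(round(nnh(eer=0.08, cer=0.03), 1))  # 20.0 -> ~one harm per 20 patients treated
```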