Chapter 6: Properties of Good Measures Flashcards
Properties of Good Measures (2)
(1) Reliability
(2) Validity
Reliability is a necessary condition for …
validity
-> Past a certain point, the score is mostly made up of error, so it can’t validly measure what you want it to measure
Less reliable = more …
error
-> The less reliable the measure, the more measurement error there is: that error is not measuring real variability between people
Different types of reliability (5)
(1) Internal consistency
(2) Test-retest reliability
(3) Inter-rater reliability
(4) Parallel-form reliability
(5) Split-half reliability
Internal consistency
High correlation among the items of a scale; commonly quantified with Cronbach’s alpha.
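A minimal sketch of how Cronbach’s alpha can be computed from a respondents-by-items score matrix; the item responses and the helper name `cronbach_alpha` below are made up for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Made-up responses: 6 people answering 4 Likert-type items
scores = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")  # high alpha -> items hang together
```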
Test-retest reliability
Scores should be stable over a short interval: if ADHD symptoms are measured today and again a few weeks later, the two sets of scores should be highly correlated with each other (see the correlation sketch after this card)
-> Be careful! Some constructs should change over time, even over very short intervals
-> E.g. ADHD symptoms at age 8 vs. the early 20s: we wouldn’t expect these to be strongly correlated. Some symptoms go down, and some hyperactivity might just be ‘being a young kid’. Test-retest over that span wouldn’t make sense.
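In practice, test-retest reliability is usually just a correlation between the two administrations. A minimal sketch with made-up scores, assuming `scipy` is available:

```python
from scipy.stats import pearsonr

# Made-up total scores for the same 6 people, measured a few weeks apart
time1 = [22, 10, 31, 15, 27, 18]
time2 = [20, 12, 29, 16, 25, 19]

r, p = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")  # close to 1 -> scores are stable over the interval
```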
Inter-rater reliability
Agreement between two people judging whether something is present or occurring
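One common index of inter-rater agreement for present/absent judgments is Cohen’s kappa, which corrects raw agreement for chance. A minimal sketch with made-up ratings, assuming `scikit-learn` is installed:

```python
from sklearn.metrics import cohen_kappa_score

# Made-up present/absent codes from two raters observing the same 10 children
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Kappa = agreement corrected for the agreement expected by chance alone
print(f"Cohen's kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")
```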
Parallel-form reliability
The reliability coefficient obtained by correlating scores on two comparable forms of the same measure (sketch after this card)
-> Seen a lot more in educational, academic, and IQ testing settings
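Computationally this is again a correlation, this time between scores on the two comparable forms. A minimal sketch with made-up totals for the same eight people:

```python
from scipy.stats import pearsonr

# Made-up total scores from two comparable 10-item forms given to the same 8 people
form_a = [18, 7, 25, 12, 21, 15, 9, 23]
form_b = [17, 9, 24, 13, 20, 16, 8, 22]

r, _ = pearsonr(form_a, form_b)
print(f"parallel-form r = {r:.2f}")  # high r -> the two forms behave interchangeably
```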
Split-half reliability
Reflects the correlation between scores on two halves of an instrument
-> E.g. a very long measure (100 items)
-> Are the scores people get on the first 50 items associated with the scores they get on the second 50 items? (sketch after this card)
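A minimal sketch of a split-half check, using simulated responses to a hypothetical 100-item instrument; the Spearman-Brown step is a standard correction that estimates the reliability of the full-length test from the half-test correlation.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Simulate a hypothetical 100-item instrument: 200 people, items driven by one trait plus noise
trait = rng.normal(size=(200, 1))
items = trait + rng.normal(scale=1.5, size=(200, 100))

first_half = items[:, :50].sum(axis=1)    # total score on items 1-50
second_half = items[:, 50:].sum(axis=1)   # total score on items 51-100

r_half, _ = pearsonr(first_half, second_half)

# Spearman-Brown correction: estimate full-length reliability from the half-test correlation
full_length = 2 * r_half / (1 + r_half)
print(f"split-half r = {r_half:.2f}, Spearman-Brown corrected = {full_length:.2f}")
```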
We are creating new ADHD rating scales. We create two 10-item scales that are meant to be similar in the symptoms they measure. In a very large sample of adolescents, scores on these two measures are correlated at .92. What type of reliability is being measured?
Parallel form
Types of Validity (3)
(1) Convergent validity
(2) Discriminant validity
(3) Face validity
Convergent validity
Are scores on the measure related to other measures or indicators of the SAME construct? (see the correlation sketch after the discriminant-validity card)
Discriminant validity
Are scores on the measure DIFFERENT from scores on measures of OTHER constructs?
-> Our measure should be uncorrelated with unrelated/random constructs (sketch below)
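A minimal sketch checking convergent and discriminant validity together with a correlation matrix, using simulated scores; `adhd_established` and `shoe_size` are hypothetical stand-ins for a same-construct measure and an unrelated variable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Made-up scores: a new ADHD measure, an established ADHD measure, and an unrelated variable
adhd_new = rng.normal(size=300)
adhd_established = adhd_new + rng.normal(scale=0.5, size=300)  # same construct -> should converge
shoe_size = rng.normal(size=300)                               # unrelated -> should not

df = pd.DataFrame({"adhd_new": adhd_new,
                   "adhd_established": adhd_established,
                   "shoe_size": shoe_size})

# Convergent validity: high correlation with the same-construct measure
# Discriminant validity: near-zero correlation with the unrelated variable
print(df.corr().round(2))
```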
Face validity
Does the measure APPEAR to measure what it is supposed to measure?
-> Low face validity may even be desirable if you don’t want the person to know what you’re testing
Measurement Invariance
‘Fairness’ of a measure
-> Relates to the debates around potential cultural bias in different tests (e.g. GRE, LSAT, SAT…)
-> The concern is whether scales are INVARIANT (i.e. function similarly across different groups)
-> If there is a lot of measurement bias, certain groups of people will score systematically higher than others even when they are not higher on the underlying trait (rough sketch after this card)
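Formal invariance testing is usually done with multi-group confirmatory factor analysis (e.g. lavaan in R), which compares loadings and intercepts across groups. The sketch below is only a crude, made-up illustration, not that formal test: it compares item means across two simulated groups with identical trait distributions, where one item has been deliberately biased.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

def simulate_group(n, biased_item_shift=0.0):
    """Simulate a 5-item scale; optionally shift item 1 to mimic measurement bias."""
    trait = rng.normal(size=(n, 1))
    items = trait + rng.normal(scale=1.0, size=(n, 5))
    items[:, 0] += biased_item_shift            # only item 1 is affected
    return pd.DataFrame(items, columns=[f"item{i + 1}" for i in range(5)])

group_a = simulate_group(500)
group_b = simulate_group(500, biased_item_shift=0.8)  # item 1 functions differently here

# Both groups have the SAME trait distribution, so a large gap on one item's mean
# (but not the others) hints that the item, not the trait, differs across groups
print(pd.DataFrame({"group_a_mean": group_a.mean().round(2),
                    "group_b_mean": group_b.mean().round(2)}))
```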