Testing Flashcards
Reliability vs Validity
Consistency of measurement of a given score; Estimate of the degree to which a test is free from measurement error
Degree to which a test actually measures what it is intended to measure
Reliability evidence:
- Test-Retest
- Internal
- Alternate form
- Interrater
Extent to which a measure produces consistent scores across time; looks at the correlation between two scores from the same sample at different time points
- Pearson’s r: ranges from -1 to +1; 0.8+ is good
- Participant characteristics, practice effects, and time interval impact the estimate (see sketch below)
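A minimal sketch of the test-retest computation, assuming numpy and scipy are available; the score vectors are made-up illustration data:

```python
# Test-retest reliability as Pearson's r between two administrations
# of the same test to the same sample.
import numpy as np
from scipy.stats import pearsonr

time1 = np.array([12, 18, 25, 30, 22, 15, 28, 20])  # scores at first administration
time2 = np.array([14, 17, 27, 29, 21, 16, 26, 22])  # same people, retested later

r, p = pearsonr(time1, time2)
print(f"test-retest r = {r:.2f}")  # 0.8+ is conventionally considered good
```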
Extent to which individual items within a test measure same domain or construct
- Split-half reliability coefficient: Split test items into 2 halves and correlate scores between the halves
- Cronbach’s alpha: Estimate based on all possible ways of splitting test items
- # of domains (more = worse) and # of items (more = better) impact the estimate (see sketch below)
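A minimal sketch of both internal-consistency estimates on a made-up items matrix (rows = respondents, columns = items); note the Spearman-Brown correction applied to the split-half correlation is a standard step not stated on the card:

```python
import numpy as np

items = np.array([
    [3, 4, 3, 5, 4, 3],
    [2, 2, 3, 2, 3, 2],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
    [4, 3, 4, 4, 5, 4],
])

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores)
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Split-half: correlate odd- vs even-item halves, then apply the
# Spearman-Brown correction to estimate full-length reliability.
odd = items[:, 0::2].sum(axis=1)
even = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
split_half = 2 * r_half / (1 + r_half)

print(f"Cronbach's alpha = {alpha:.2f}, split-half = {split_half:.2f}")
```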
Consistency of test results between 2 diff forms of a test
- Administer 2 versions of test and calculate correlation between scores
Degree of consensus between diff raters in scoring test items
- Percent agreement: Nominal data (classifications/ratings); Calculate percentage of items raters agree on (75%+ good)
- Cohen’s kappa: Percent agreement corrected for agreement that occurs by chance
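A minimal sketch of both interrater statistics, assuming scikit-learn is available for kappa; the ratings are made-up illustration data:

```python
# Percent agreement and Cohen's kappa for two raters assigning nominal categories.
from sklearn.metrics import cohen_kappa_score

rater_a = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "pass", "fail", "fail", "pass", "pass", "pass"]

# Percent agreement: share of items on which the raters give the same rating
agree = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Cohen's kappa also subtracts the agreement expected by chance
kappa = cohen_kappa_score(rater_a, rater_b)

print(f"percent agreement = {agree:.0%}, kappa = {kappa:.2f}")
```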
What are the interpretations for a test w/
High internal, test-retest, interrater
Low internal, high test-retest + interrater
Low test-retest, high internal + interrater
An ideal test for most purposes
Scores reflect a test w/ heterogeneous item content; BUT scores are based on items that are measuring something other than the construct the test is designed to measure
Scores reflect a test measuring a fluctuating ability; BUT scores are too vulnerable to the effects of normal variability and time
Tripartite model of validity:
- Content-related evidence
- Construct-related evidence (Convergent vs divergent)
- Criterion-related evidence (Concurrent vs predictive)
Extent to which a test covers full range of construct
- Subject matter experts review items’ relevance w/ content validity ratio (-1 to +1)
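A minimal sketch, assuming the standard Lawshe formula for the content validity ratio (the card gives only the -1 to +1 range): CVR = (n_e - N/2) / (N/2), where n_e is the # of experts rating an item essential and N is the total # of experts.

```python
# Lawshe's content validity ratio for a single item.
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    return (n_essential - n_experts / 2) / (n_experts / 2)

# e.g. 9 of 10 experts rate the item essential -> CVR = 0.8 (near +1 is good)
print(content_validity_ratio(9, 10))
```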
Extent to which a test measures theoretical construct
- Convergent: Extent to which scores positively correlate w/ existing measures of the SAME construct
- Divergent: Extent to which a measure does not correlate with measures of DISSIMILAR constructs (Should not be higher than 0.7)
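A minimal sketch of convergent vs divergent evidence as two correlations; all scores and measure names here are made-up for illustration:

```python
import numpy as np

new_test    = np.array([10, 14, 9, 18, 12, 16, 11, 15])
same_constr = np.array([11, 15, 8, 17, 13, 15, 10, 16])  # existing measure of the SAME construct
diff_constr = np.array([22, 9, 14, 12, 20, 8, 15, 11])   # measure of a DISSIMILAR construct

convergent = np.corrcoef(new_test, same_constr)[0, 1]  # want this high
divergent = np.corrcoef(new_test, diff_constr)[0, 1]   # want this low (not above 0.7 per the card)

print(f"convergent r = {convergent:.2f}, divergent r = {divergent:.2f}")
```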
Extent to which a test accurately predicts/correlates w/ specific criterion/outcome
- Concurrent: Extent to which a test correlates w/ a criterion that is measured at THE SAME TIME
- Predictive: Extent to which a test correlates w/ a criterion that is measured at SOME POINT IN THE FUTURE