Study Guide 7: CTT Flashcards
True score
- A person’s average score on a test over an infinite number of repeated administrations (random error would average out to zero over infinitely many administrations). It describes the person’s theoretical performance on the test.
- In CTT, ‘T’ incorporates systematic error
Systematic error
- Error that affects the individual the same way each time he or she takes the test (e.g. reading ability, test wiseness).
- In CTT, systematic error is incorporated in the true score ‘T’
- Affects validity of scores, NOT reliability
Unsystematic error
- Random error that affects individuals differently each time a test is taken (e.g. noise, anxiety).
- In CTT, ‘E’ refers to unsystematic error.
Classical Test Theory (CTT)
X = T + E
where X is observed score, T is true score and E is error (unsystematic)
Index of Reliability
- The correlation between observed scores (X) and true scores (T) for a set of examinees; it quantifies how closely X tracks T.
- Its square is the reliability coefficient: the proportion of observed score variance that is true score variance.
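The link between the X-T relationship and the share of true score variance can be checked on a small example. This is only a sketch: the true scores and errors are made-up numbers (true scores are never observable in practice), and the errors were picked so that they average zero and are uncorrelated with T, as CTT assumes.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical true scores (T) and random errors (E) for four examinees.
# The errors average zero and are uncorrelated with T by construction.
true_scores = [40, 50, 60, 70]
errors = [2, -2, -2, 2]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)

var_t = pvariance(true_scores)  # true score variance
var_e = pvariance(errors)       # error variance
var_x = pvariance(observed)     # observed score variance

reliability = var_t / var_x          # proportion of observed variance that is true variance
r_xt = pearson(observed, true_scores)  # correlation between X and T

print(var_x, var_t + var_e)   # observed variance splits into true + error variance
print(reliability, r_xt ** 2)  # the squared X-T correlation matches the variance ratio
```

With uncorrelated error, the observed variance is the sum of the two parts, and squaring the X-T correlation gives the same number as the variance ratio.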
Reliability Estimates
1) Alternate forms reliability
2) Test-retest reliability
3) Split-half reliability
4) Cronbach’s coefficient alpha
5) Kuder-Richardson formula 20 (KR-20)
Sources of error
1) content sampling error: error due to the particular items selected (item heterogeneity)
2) time sampling error: error due to daily fluctuations that affect test performance
3) scorer error: error due to scorer variability in test-retest
Variables that affect reliability
1) test length: longer tests increase reliability
2) group heterogeneity: greater diversity increases reliability
3) item difficulty: items of medium difficulty increase reliability
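The effect of test length can be made concrete with the Spearman-Brown prophecy formula. The formula is not named in these notes, so treat this as an illustrative sketch: it assumes the added items are parallel to the existing ones, and the function name and the example values (0.70, doubling the length) are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`.

    Assumes the added items are parallel to the existing ones.
    length_factor = 2 means the test is doubled; 0.5 means it is halved.
    """
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)

original = 0.70
doubled = spearman_brown(original, 2)    # longer test, higher predicted reliability
halved = spearman_brown(original, 0.5)   # shorter test, lower predicted reliability

print(round(doubled, 3), round(halved, 3))
```

Under these assumptions, lengthening the test raises the predicted reliability and shortening it lowers it, with diminishing returns as reliability approaches 1.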
observed score
: True score + error (X = T + E); the result obtained from a single administration of the test
reliability
- The extent to which test scores remain consistent over repeated administrations of the same or parallel test
- The degree to which test scores are free from measurement error
- Increases with 1) greater test length, 2) greater group heterogeneity, and 3) item difficulty closer to medium.
Reliability coefficient:
: Indicates % of variability in observed scores due to individual (true score) differences (or variability); implies the remainder is due to random measurement error.
standard error of measurement
- standard deviation (typical size) of error scores
- helps to interpret accuracy of test scores
- SEM = s_o × √(1 − R_xx), where s_o = standard deviation of observed scores and R_xx = reliability
- if R_xx = 1, SEM = 0
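A short worked example of the SEM formula. The standard deviation (15), reliability (0.91), and observed score (100) are made-up values, and the function names are hypothetical. The ±1.96 SEM band is a common convention that assumes normally distributed errors; it is not stated in these notes.

```python
from math import sqrt

def standard_error_of_measurement(sd_observed, reliability):
    """SEM = SD of observed scores * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd_observed * sqrt(1.0 - reliability)

def score_band(observed_score, sem, z=1.96):
    """Band of +/- z SEMs around an observed score (assumes normal errors)."""
    return (observed_score - z * sem, observed_score + z * sem)

sem = standard_error_of_measurement(15.0, 0.91)  # about 4.5 score points
band = score_band(100.0, sem)                    # roughly 91.2 to 108.8

print(round(sem, 2), tuple(round(b, 2) for b in band))
print(standard_error_of_measurement(15.0, 1.0))  # perfect reliability gives SEM of 0
```

The lower the reliability, the wider the band around any single observed score.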
tau equivalence
tests are “parallel” if:
a) the tests measure the same psychological construct – the true scores on one test are equal to the true scores on the other test
b) the tests have the same level of error variance
if the items meet tau equivalence, then alpha, KR-20, and the split-half reliability will all give identical and accurate estimates of reliability
essential tau equivalence
this is less strict than tau equivalence
theoretically, this means that the true scores for two tests (or two versions of a test) are the same
this is estimated, practically speaking, by seeing if the observed scores on the two tests (or test versions) have the same (or nearly the same) mean
the requirement for equal error variances (as seen for tau equivalence) is not made
if the items only meet essential tau equivalence, then alpha and KR-20 will give identical and accurate estimates, but the split-half reliability estimate will not
what happens to alpha, KR-20, and split-half reliability if neither tau equivalence nor essential tau equivalence is met?
if the items meet neither tau equivalence nor essential tau equivalence, then alpha and KR-20 will underestimate the reliability (although it is not known by how much or how little) and the split-half reliability estimate will be inaccurate
split-half reliability
reliability as an estimate of internal consistency: you divide the test into two halves (odd- vs. even-numbered items, or items 1-10 vs. 11-20) and then correlate performance on the two halves.
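The odd/even procedure can be sketched in a few lines. The item scores are invented for illustration (5 examinees, 4 items scored 0/1), and the helper name is hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical item scores: one row per examinee, one column per item (1 = correct).
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)

# Score each half separately: items 1, 3, ... versus items 2, 4, ...
odd_half = [sum(row[0::2]) for row in item_scores]
even_half = [sum(row[1::2]) for row in item_scores]

r_halves = pearson(odd_half, even_half)  # correlation between the two half-test scores
print(odd_half, even_half, round(r_halves, 3))
```

Note that this correlation describes two half-length tests, and a different split (for example first half vs. second half) would generally give a different value.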
alternate forms reliability
an estimate of reliability as equivalence
cronbach’s alpha coefficient
this is THE most common estimate of reliability. Conceptually, it is the average of the split-half reliabilities from every possible way of splitting the test into two halves. KR-20 is incorporated in SPSS calculations of Cronbach’s alpha: for items scored dichotomously (0/1), alpha and KR-20 are the same value.
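In practice alpha is usually computed from item and total-score variances rather than by enumerating splits. The sketch below uses the standard variance-based formulas for alpha and KR-20. The data are made up (5 examinees, 4 items scored 0/1) and the function names are hypothetical; this is not the SPSS implementation.

```python
from statistics import mean, pvariance, variance

# Hypothetical item scores: one row per examinee, one column per item (1 = correct).
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    items = list(zip(*rows))              # transpose: one tuple per item
    totals = [sum(row) for row in rows]   # total score per examinee
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

def kr20(rows):
    """KR-20 for 0/1 items: item variance is p*q, with p = proportion correct."""
    k = len(rows[0])
    items = list(zip(*rows))
    totals = [sum(row) for row in rows]
    pq_sum = sum(mean(col) * (1 - mean(col)) for col in items)
    return k / (k - 1) * (1 - pq_sum / pvariance(totals))

alpha = cronbach_alpha(item_scores)
kr = kr20(item_scores)
print(round(alpha, 4), round(kr, 4))  # the two agree for dichotomous items
```

The two functions use sample and population variances respectively, but the ratio inside each formula is unaffected, so they return the same value for 0/1 data.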