L2: Classical test theory Flashcards
Ch 5, 6, 7 (70 cards)
what is the central statistic of classical test theory?
definition & synonyms
summed item score (sum of the scores on the items)
synonyms: sum score, test score, score on the test
what is the central idea behind classical test theory?
- every test taker has a true score on a test, which underlies the summed item score
- true score: score that you would get using a perfect measurement instrument
- observed score will generally not equal the true score due to measurement error
define measurement error
- other influences that cause random noise in the observed score
- goal is to minimize this error to improve reliability
what are the 2 core assumptions underlying classical test theory?
assumptions & what follows
- observed scores are true scores plus measurement error: Xo = Xt + Xe
- measurement error is random
what follows:
- mean of the error = 0, because a nonzero mean would make the measurement error systematic
- correlation between true score and error = 0 (Rte = 0), because random error is unrelated to the true score
- observed score variance = true score variance + error variance (So^2 = St^2 + Se^2)
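A minimal sketch of what follows from the two assumptions, using made-up scores for four test takers. The numbers are hypothetical and chosen so that the errors are exactly unrelated to the true scores; real data would only satisfy this approximately.

```python
from statistics import fmean, pvariance

# Hypothetical true scores and errors for 4 test takers (illustration only).
true_scores = [1.0, 1.0, 3.0, 3.0]
errors = [1.0, -1.0, 1.0, -1.0]
observed = [t + e for t, e in zip(true_scores, errors)]  # Xo = Xt + Xe


def covariance(x, y):
    """Population covariance of two equally long lists."""
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])


mean_error = fmean(errors)                # mean of the error
cov_te = covariance(true_scores, errors)  # zero covariance means Rte = 0
var_o = pvariance(observed)               # So^2
var_t = pvariance(true_scores)            # St^2
var_e = pvariance(errors)                 # Se^2

print("mean error:", mean_error)
print("cov(true, error):", cov_te)
print("So^2:", var_o, "St^2 + Se^2:", var_t + var_e)
```

With these toy numbers the error mean and the covariance are both 0, and So^2 equals St^2 + Se^2.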
what are the 4 ways of thinking about reliability?
as a proportion of variance:
- ratio of true score variance to observed score variance
- lack of error variance (reliable tests have minimal error variance)
as shared variance:
- correlation between observed scores & true scores (reliability is the squared correlation between these 2)
- lack of correlation between observed scores & error scores (a highly reliable test shows little correlation between observed scores & error)
how can you define reliability as a proportion of variance?
comes from So^2 = St^2 (signal) + Se^2 (noise) assumption
high reliability when most of So^2 is St^2
low reliability when most of So^2 is Se^2
reliability = signal / (signal + noise) = St^2 / (St^2 + Se^2) = St^2 / So^2
and
reliability = 1 - noise / (signal + noise) = 1 - Se^2 / (St^2 + Se^2) = 1 - Se^2 / So^2
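A small sketch showing that the two variance formulas give the same number. The function name and the variances (St^2 = 8, Se^2 = 2) are hypothetical, picked only for illustration.

```python
def reliability_from_variances(true_var, error_var):
    """Return reliability computed two ways from St^2 and Se^2."""
    observed_var = true_var + error_var            # So^2 = St^2 + Se^2
    as_signal_share = true_var / observed_var      # St^2 / So^2
    as_lack_of_noise = 1 - error_var / observed_var  # 1 - Se^2 / So^2
    return as_signal_share, as_lack_of_noise


# Hypothetical variances: most of the observed variance is signal.
signal_form, noise_form = reliability_from_variances(true_var=8.0, error_var=2.0)
print(round(signal_form, 3), round(noise_form, 3))
```

Both forms come out at 0.8 here, because the signal share and the noise share of So^2 always add up to 1.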
how can you define reliability as shared variance?
low reliability if Xt (true score) shares little variance with Xo (observed score)
high reliability if Xt shares a lot of variance with Xo
reliability = correlation (Xo, Xt)^2 = Rot^2 aka the amount of variance shared by observed score and true score
and
reliability = 1- correlation (Xo, Xe)^2 = 1- Roe^2 aka 1 - the amount of variance shared by observed score and error score
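A sketch of the shared variance view on made-up data. The four test takers and their scores are hypothetical, and the `pearson` helper is written out by hand so the example needs no extra libraries.

```python
# Hypothetical scores for 4 test takers (illustration only).
true_scores = [1.0, 1.0, 3.0, 3.0]
errors = [1.0, -1.0, 1.0, -1.0]
observed = [t + e for t, e in zip(true_scores, errors)]  # Xo = Xt + Xe


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


r_ot = pearson(observed, true_scores)  # Rot
r_oe = pearson(observed, errors)       # Roe

reliability_shared = r_ot ** 2         # variance shared with the true score
reliability_unshared = 1 - r_oe ** 2   # 1 - variance shared with the error

print(round(reliability_shared, 3), round(reliability_unshared, 3))
```

In this toy data St^2 = 1 and So^2 = 2, so both routes give 0.5, the same value as St^2 / So^2.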
what are the 4 models to test reliability from most restrictive to least restrictive?
- parallel test (most restrictive)
- tau equivalent test
- essentially tau equivalent test
- congeneric test (least restrictive)
what are the restrictions of parallel test?
restriction on Xt1 (first test's true score): needs to = Xt2
restriction on Se1^2 and Se2^2: need to be equal to each other (the variances of the measurement errors)
implication:
- mean of Xt1 and Xt2 need to be equal
- variance of Xt1 and Xt2 need to be equal
- mean of Xo1 and Xo2 need to be equal CAN BE TESTED
- correlation between Xt1 and Xt2 = 1 (Rt1t2 = 1)
- reliability of test 1 = reliability of test 2, so also Rt1o1 = Rt2o2 (correlation between observed and true score)
- variance of observed scores on test 1 and test 2 are equal (So1^2 = So2^2) CAN BE TESTED
what is the model of observed score of test 1 and 2 according to parallel test? and of the true score?
model observed score on test 1: Xo1 = Xt1 + Xe1 and Se1^2 = Se2^2
model observed score on test 2:
Xo2 = Xt1 + Xe2 and Se2^2 = Se1^2
model for the true score:
Xt2 = Xt1
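A simulation sketch of the parallel test model, assuming normally distributed true scores and errors (the model itself does not require normality). The sample size, seed, mean, and standard deviations are hypothetical. It illustrates why the correlation between two parallel tests can serve as a reliability estimate.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)       # fixed seed so the run is repeatable
n = 20000                     # number of simulated test takers
true_sd, error_sd = 2.0, 1.0  # hypothetical: St^2 = 4, Se1^2 = Se2^2 = 1

true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]  # Xt1 = Xt2
obs1 = [t + rng.gauss(0.0, error_sd) for t in true_scores]  # Xo1 = Xt1 + Xe1
obs2 = [t + rng.gauss(0.0, error_sd) for t in true_scores]  # Xo2 = Xt1 + Xe2

reliability = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)  # St^2 / So^2
r12 = pearson(obs1, obs2)

print("theoretical reliability:", reliability)
print("correlation between the two tests:", round(r12, 3))
```

The simulated correlation should land close to the theoretical reliability of 0.8, with small sampling noise.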
what are the 2 types of reliability based on the parallel test model?
- test retest reliability
- split halves reliability
what are the restrictions on the tau equivalent test?
restriction on Xt1 (first test's true score): needs to = Xt2
no restriction on measurement error variances
implications
- mean of Xt1 = mean of Xt2
- variance of Xt1 = variance of Xt2 (St1^2 = St2^2)
- mean of Xo1 = mean of Xo2 CAN BE TESTED
- correlation between true scores on test 1 and test 2 = 1 (Rt1t2 = 1)
what type of reliability is based on the essentially tau equivalent test model?
Cronbach's alpha
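A minimal sketch of computing Cronbach's alpha with the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of the summed item score), where k is the number of items. The item scores and the function name are hypothetical.

```python
from statistics import pvariance

# Hypothetical item scores: one row per test taker, one column per item.
scores = [
    [1.0, 2.0, 1.0],
    [2.0, 3.0, 2.0],
    [3.0, 3.0, 4.0],
    [4.0, 5.0, 4.0],
]


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of sum scores)."""
    k = len(rows[0])                                    # number of items
    item_vars = [pvariance(col) for col in zip(*rows)]  # variance of each item
    total_var = pvariance([sum(row) for row in rows])   # variance of summed item score
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

For this toy data the item variances sum to 4.125 and the sum score variance is 11.25, which gives alpha = 0.95.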
what is the model of observed score of test 1 and 2 according to the essentially tau equivalent test model? and of the true score?
model for true score:
Xt2 = a + Xt1
model observed score of test 1: Xo1 = Xt1 + Xe1
model observed score of test 2: Xo2 = a + Xt1 + Xe2
what is the model of observed score of test 1 and 2 according to tau equivalent test model? and of the true score?
model for true score: Xt2 = Xt1
model observed score of test 1: Xo1 = Xt1 + Xe1
model observed score of test 2: Xo2 = Xt1 + Xe2
what are the restrictions on the essentially tau equivalent test?
restriction on Xt2: Xt2 = a + Xt1 (true scores on the second test equal true scores on the first plus a constant)
no restriction on measurement error variances
implications:
- means of true scores can differ (by the constant a)
- variance of true scores are equal (St1^2 = St2^2)
- correlation between true scores on test 1 and test 2 is 1 (Rt1t2 = 1)
what is the model of observed score of test 1 and 2 according to congeneric test model? and of the true score?
model for true score:
Xt2 = a + bXt1
model observed score test 1:
Xo1 = Xt1 + Xe1
model observed score test 2:
Xo2 = a + bXt1 + Xe2
what are the restrictions on the congeneric test?
Xt2 = a + bXt1
no restriction on measurement error variances
implications:
- means of true scores can differ
- variances of true scores can differ (St2^2 = b^2 * St1^2)
- correlation between true scores on test 1 and 2 = 1 (Rt1t2 = 1)
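A sketch comparing the true score models, all written as Xt2 = a + b * Xt1. The true scores and the values of a and b are hypothetical; what matters is which of them is free in each model. Parallel and tau equivalent tests share the same true score model and differ only in the restriction on error variances, which this sketch does not cover.

```python
from statistics import fmean, pvariance


def second_true_scores(first, a=0.0, b=1.0):
    """Xt2 = a + b * Xt1 (a = 0 and b = 1 gives Xt2 = Xt1)."""
    return [a + b * t for t in first]


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / (sxx * syy) ** 0.5


xt1 = [1.0, 2.0, 3.0, 4.0]  # hypothetical true scores on test 1

# Made-up (a, b) values for each model.
models = {
    "parallel / tau equivalent": (0.0, 1.0),
    "essentially tau equivalent": (2.0, 1.0),
    "congeneric": (2.0, 0.5),
}

summary = {}
for name, (a, b) in models.items():
    xt2 = second_true_scores(xt1, a, b)
    # mean, variance, and correlation with the test 1 true scores
    summary[name] = (fmean(xt2), pvariance(xt2), pearson(xt1, xt2))
    print(name, summary[name])
```

In every model the correlation between the true scores stays 1. Adding a only shifts the mean, and b additionally rescales the variance by b^2.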
what reliability measure is based on congeneric model?
omega
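A rough sketch of omega for a single factor, using the common formula omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances). It assumes a factor variance of 1 and uncorrelated errors. The loadings play the role of b in the congeneric model; in practice they are estimated with a factor model, whereas here they are hypothetical numbers and the function name is made up for illustration.

```python
def omega_total(loadings, error_variances):
    """omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    common = sum(loadings) ** 2  # variance in the sum score due to the true score
    return common / (common + sum(error_variances))


# Hypothetical standardized loadings; each error variance is 1 - loading^2.
loadings = [0.7, 0.8, 0.6]
error_variances = [1 - lam ** 2 for lam in loadings]

omega = omega_total(loadings, error_variances)
print(round(omega, 3))
```

Because each item can have its own loading, omega does not require the items to be (essentially) tau equivalent.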
what are 3 methods of estimating reliability?
- alternate forms reliability
- test retest reliability
- internal consistency reliability
what is the alternate forms reliability estimation technique?
- assumes parallel test model (meaning the forms measure the same trait with the same amount of error variance)
- apply 2 versions of the same test
- correlation between the 2 forms is the reliability
what are the main challenges with the alternate forms reliability estimation technique?
- constructing the alternate forms of the same test is hard
- carry over effects (lack of motivation, fatigue etc)
what is the test retest reliability estimation technique?
- assumes parallel test model
- apply same test twice to same group but at different times
- correlation is the reliability
- assumes the trait being measured remains stable over time (which isn't always the case)
what are the main challenges with test retest reliability technique?
- carry over effects
- change in the true score: for constructs that fluctuate, like mood, the true score might change between the 2 tests
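A sketch of how a change in the true score affects the test retest correlation. The scores for eight test takers are hypothetical and hand-built so that true scores, errors, and change are mutually uncorrelated, which makes the results exact rather than noisy.

```python
def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for 8 test takers (illustration only).
true_t1 = [12.0, 12.0, 12.0, 12.0, 8.0, 8.0, 8.0, 8.0]    # true score, variance 4
error_t1 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]   # error at time 1, variance 1
error_t2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # error at time 2, variance 1
change = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0]     # true score change between sessions

obs_t1 = [t + e for t, e in zip(true_t1, error_t1)]
obs_t2_stable = [t + e for t, e in zip(true_t1, error_t2)]                   # trait stayed the same
obs_t2_changed = [t + c + e for t, c, e in zip(true_t1, change, error_t2)]  # trait fluctuated

r_stable = pearson(obs_t1, obs_t2_stable)
r_changed = pearson(obs_t1, obs_t2_changed)

print("stable trait:", round(r_stable, 3))
print("changing trait:", round(r_changed, 3))
```

With a stable trait the retest correlation equals the reliability (4 / 5 = 0.8). Once the true score changes between sessions the correlation drops to about 0.73, so it underestimates the reliability of the test.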