Lec2 - Ch5 Classical test theory models Flashcards
Classical Test Theory Models and Conceptual basis
Reliability
- what is it, in regards to tests and scores?
- how much noise is there in a psychological test?
- it is a property of test scores, not of the test itself
> a test might have different psychometric properties for different kinds of respondents (e.g. it could be reliable for one age range but not for another)
> therefore, each set of scores has its own level of reliability
what is the COTAN
- committee that evaluates psychological tests in the Netherlands
- it is part of the NIP (Netherlands Institute of Psychology)
how does the COTAN differentiate tests?
- test used for high-impact inferences at individual level
> very important; big consequences if mistake
> e.g. personnel selection, diagnosis of learning disabilities, …
- test used for lower-impact inferences at individual level
> descriptive use, less serious consequences
> e.g. study/therapy progress, career choice test, …
- test used at group level
> e.g. customer/team satisfaction, student evaluation, comparing groups
high-impact inference tests
- reliability rules
- good: 0.9 or larger
- sufficient: between 0.8 and 0.9
- insufficient: smaller than 0.8
lower-impact inference tests
- reliability rules
- good: 0.8 or larger
- sufficient: between 0.7 and 0.8
- insufficient: smaller than 0.7
group level tests
- reliability rules
- good: 0.7 or larger
- sufficient: between 0.6 and 0.7
- insufficient: smaller than 0.6
what is the aim of behavioural science?
- it strives to quantify the degree to which differences in one variable are associated with differences in other variables
- these differences have to be measured accurately, hence reliability
What are the assumptions that testing is based on?
- behavioural differences among people exist
- differences have important implications
- they can be measured with precision
what is Classical test theory?
- it is a measurement theory
- it explains reliability and it shows how to measure it
What is the central idea of classical test theory?
Every test taker has a true score on a test
what is reliability according to the classical test theory?
- Extent to which differences in respondent’s observed scores are consistent with differences in true scores
- it derives from observed scores, true scores and measurement error
What are the two main assumptions of Classical Test Theory?
- observed scores are true scores plus measurement error
- measurement error is random (affects everybody but it is not systematic)
> likely to increase or decrease any particular score at random
- Xo = Xt + Xe
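- a minimal simulation sketch (arbitrary numbers, not from the lecture) illustrating these two assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000                                  # many simulated respondents
true = rng.normal(loc=50, scale=10, size=n)  # hypothetical true scores, Xt
error = rng.normal(loc=0, scale=5, size=n)   # random measurement error, Xe
observed = true + error                      # Xo = Xt + Xe

print(error.mean())                              # ~0: random error cancels out on average
print(np.corrcoef(true, error)[0, 1])            # ~0: error is unrelated to true scores
print(observed.var(), true.var() + error.var())  # observed variance ~ true + error variance
```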
What are the implications of the assumptions of classical test theory? (consequences)
- mean of the measurement error is equal to zero
> because a non-zero mean would make the error systematic; random error cancels itself out across respondents
- the correlation between true scores and error scores is equal to zero
> because error is random, its mean is zero regardless of a respondent's true score
- observed score variance = true score variance + error variance
Observed scores
- value obtained from measuring a characteristic in a sample of individuals
- true score + measurement error
> (it can be seen as a composite score)
True score
- Score that you would get using a perfect measurement instrument
- “real amount” of the characteristic you are measuring
- average score that a participant would obtain if they completed the test an infinite number of times
Measurement Error
- Influences that create random noise in the observed score
- it creates inconsistencies between true and observed scores
> e.g. distraction, an imprecise measuring instrument, ..
- it is impossible to know all the sources of measurement error and noise
- we must determine to what extent differences in scores are attributable to real differences in the trait versus random external influences
what is the mean measurement error in a test?
- always 0
- it is independent of the individual’s true scores
- inflates or deflates respondents’ scores randomly, therefore it cancels itself out
- error scores are uncorrelated with true scores (r=0)
- see picture 1 for effects of measurement error
Variance of error scores
- how to calculate it
- what it represents
- see picture 2
- it represents the degree to which error affected different people in different ways
> high degree of error variance indicates the potential for poor measurement
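- a standard way to write it (e = error scores, N = number of respondents); because the mean error is zero, it reduces to the average squared error:

```latex
\sigma^2_e = \frac{\sum_{i=1}^{N} (e_i - \bar{e})^2}{N} = \frac{\sum_{i=1}^{N} e_i^2}{N} \qquad (\text{since } \bar{e} = 0)
```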
how do you calculate the variance of observed scores?
- see picture 3
- variance of observed scores = variance of true scores + variance of error scores
> variability in observed scores will be larger than variability in true scores
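- the decomposition follows from the assumptions above; the covariance term drops out because error is uncorrelated with true scores:

```latex
\sigma^2_o = \mathrm{Var}(X_t + X_e) = \sigma^2_t + \sigma^2_e + 2\,\mathrm{Cov}(X_t, X_e) = \sigma^2_t + \sigma^2_e
```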
What are signal and noise?
- Signal: true score variance
- Noise: measurement error variance
What are the ways to think about reliability?
IMPORTANT!
- see picture 6
- Proportion of variance
> ratio of true score variance to observed score variance
> lack of error variance
- Shared variance
> squared correlation between observed scores and true scores
> lack of correlation between observed scores and error scores
Proportion of Variance
1 - Ratio of true score variance to observed score variance
- see picture 4
- true score variance is the signal that we want to detect
- error variance is the noise obscuring the signal
- reliability = signal / (signal+noise)
> signal+noise = observed score variance
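- in symbols (σ²t = true score variance, σ²e = error variance, σ²o = observed score variance):

```latex
R_{xx} = \frac{\text{signal}}{\text{signal} + \text{noise}} = \frac{\sigma^2_t}{\sigma^2_t + \sigma^2_e} = \frac{\sigma^2_t}{\sigma^2_o}
```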
What does it mean to obtain a reliability of .48?
- 48% of the differences among people’s observed scores can be attributed to differences among their true levels
- reliability ranges from 0 to 1; if it is 0, it means that the true score variance is also 0, which is impossible in a real world situation
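- a worked example with hypothetical numbers:

```latex
\sigma^2_t = 48,\ \sigma^2_e = 52 \;\Rightarrow\; R_{xx} = \frac{48}{48 + 52} = .48
```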
Proportion of variance
2- reliability as lack of measurement error
- see picture 5 and 7
- reliability: degree to which error variance is minimal in comparison with the variance of observed scores
- reliability = 1- ( noise / (noise+signal) )
- the reliability is high when the error variance is small and the observed score variance is large
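- this is algebraically the same ratio as before, because σ²o = σ²t + σ²e:

```latex
R_{xx} = 1 - \frac{\sigma^2_e}{\sigma^2_o} = \frac{\sigma^2_o - \sigma^2_e}{\sigma^2_o} = \frac{\sigma^2_t}{\sigma^2_o}
```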
what would a small degree of error variance indicate?
- the respondents’ scores are being affected only slightly by measurement error
- the error affecting one person’s score is not very different than the error affecting another person’s score
what is the definition of reliability according to the shared variance?
reliability: proportion of shared variance between true score and observed score
Shared Variance
1- reliability as the squared correlation between observed scores and true scores
- see picture 8
- reliability = squared correlation between observed scores and the true scores
> squaring the correlation gives the amount of variance shared by two variables
> reliability of 1: the differences among respondents’ observed scores are perfectly consistent with the differences among their true scores
> reliability of 0: differences among respondents’ observed scores are totally inconsistent with the differences among their true scores
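- a sketch of why the squared correlation equals the variance ratio (uses Cov(Xt, Xe) = 0):

```latex
\mathrm{Cov}(X_o, X_t) = \mathrm{Cov}(X_t + X_e, X_t) = \sigma^2_t
\quad\Rightarrow\quad
r_{ot} = \frac{\sigma^2_t}{\sigma_o \sigma_t} = \frac{\sigma_t}{\sigma_o}
\quad\Rightarrow\quad
r_{ot}^2 = \frac{\sigma^2_t}{\sigma^2_o} = R_{xx}
```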
Index of reliability
unsquared correlation between observed scores and true scores
coefficient of reliability
- squared correlation between observed scores and true scores
- squared index of reliability
- when referring to reliability, we usually use this term
Shared variance
2- reliability as the lack of squared correlation between observed scores and error scores
- see picture 9
- reliability: 1 - squared correlation between observed scores and measurement error
- reliability: degree to which observed scores are uncorrelated with error scores
- as the correlation between observed and error scores increases, the reliability decreases
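- the parallel derivation for the error side (again using Cov(Xt, Xe) = 0):

```latex
\mathrm{Cov}(X_o, X_e) = \mathrm{Cov}(X_t + X_e, X_e) = \sigma^2_e
\quad\Rightarrow\quad
r_{oe}^2 = \frac{\sigma^2_e}{\sigma^2_o}
\quad\Rightarrow\quad
1 - r_{oe}^2 = \frac{\sigma^2_t}{\sigma^2_o} = R_{xx}
```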
what does it mean to have an error score standard deviation of 17.8?
it means that on average, the respondents’ observed scores deviated from their true scores by nearly 18 points
standard error of measurement
- what is it
- how to measure it
- see picture 10
- standard deviation of error scores
- the larger the standard error of measurement, the greater the average difference between observed scores and true scores, and the less reliable the test
! if reliability is 1, standard error is 0
! the standard error can never be larger than the standard deviation of the observed scores
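- in symbols, using the reliability coefficient Rxx (a standard formula, consistent with the two points above):

```latex
\sigma_e = \sigma_o \sqrt{1 - R_{xx}}
```

> if Rxx = 1 the square root is 0, so σe = 0; and because 0 ≤ Rxx ≤ 1, σe can never exceed σo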
What are the four models used to calculate reliability?
- parallel test
- tau-equivalent test
- essentially tau-equivalent test
- congeneric test
!! any particular way of estimating reliability is accurate only if the tests being examined actually fit a particular model
- see picture 11
what does it mean for a model to be restrictive?
- the more assumptions are required from a model, the more restrictive the model is
> e.g. the parallel test model has the most assumptions, thus it is the most restrictive
what method is the most restrictive? which one is the least?
- most restrictive: parallel test
- least restrictive: congeneric test
What are the assumptions that all four models have in common?
What are their implications?
IMPORTANT
- error scores are random (and thus uncorrelated with true scores)
> respondents’ error scores cancel out across respondents
> respondents’ error scores are uncorrelated with their true scores
- unidimensionality
- true scores on one test are linearly related to the true scores on the other test (see picture 12)
what does the linear relationship represent?
- see picture 12
- Xt1: true scores on test one
- a (intercept): the constant amount by which true scores on test 2 are higher or lower than true scores on test 1
- b (slope): the degree to which the magnitude and variability of true scores differ between the two tests
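- a compact summary of the linear relation and how each model constrains it (consistent with the model equations on the later cards):

```latex
X_{t2} = a + b\,X_{t1}
```

> parallel test: a = 0, b = 1, equal error variances
> tau-equivalent test: a = 0, b = 1, error variances may differ
> essentially tau-equivalent test: a free, b = 1
> congeneric test: a and b both free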
what are the implications of error measurement being random when comparing two tests?
- respondents’ error scores on test 1 are uncorrelated with respondents’ error scores on test 2
- respondents’ true scores on test 1 are uncorrelated with respondents’ error scores on test 2
what are the general differences among the tests?
IMPORTANT
- parallel test: everything is equal between test 1 and test 2
> measurement error variance is equal
- tau-equivalent test: error and observed score variances differ
> observed means remain the same
- essentially tau-equivalent test: true and observed score means differ
> observed means and variances differ
> true score variance remains the same
- congeneric test: everything differs
> true score variance also differs
!! see picture 11
in what models do the two tests have the same reliability?
- only in parallel tests model
> this is because the correlation between observed scores on the two tests = the ratio of true score variance to observed score variance
> this ratio is also the ratio defining reliability
Parallel test model
- Xt1 = Xt2
- reliability = correlation between the two tests
- test-retest and split-halves reliability are based on this model
> the slope linking the true scores on the two tests is 1 (b=1)
> the intercept linking the true scores on the two tests is 0 (a=0)
> the two tests have the same level of error variance
how can the correlation in parallel tests be calculated?
- see picture 13
- the correlation between the observed scores on the two tests is the covariance between their observed scores divided by the product of the standard deviations of their observed scores
-> the correlation between the scores on parallel tests is equal to the ratio of true score variance to observed score variance (= reliability)
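- sketch of the derivation: for parallel tests the covariance of the observed scores equals the true score variance (the error terms drop out because they are uncorrelated with the true scores and with each other), and both tests have the same observed score standard deviation:

```latex
r_{12} = \frac{\mathrm{Cov}(X_{o1}, X_{o2})}{\sigma_{o1}\,\sigma_{o2}}
       = \frac{\mathrm{Cov}(X_t + X_{e1},\, X_t + X_{e2})}{\sigma_o \, \sigma_o}
       = \frac{\sigma^2_t}{\sigma^2_o} = R_{xx}
```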
what are the implications of the parallel test model?
… between test 1 and test 2
- means of true scores are equal
- variance of true scores are equal
- mean of observed scores are equal
- correlation between the true scores is 1
- reliabilities are equal
- variance of observed scores are equal
Tau-equivalent test model - implications
- Xt2 = Xt1
!! variances of error scores and observed scores are not equal
- means of true scores on test1 and test2 are equal
- variance of true scores on test1 and test2 are equal
- mean of observed scores on test 1 and 2 are equal
- correlation between the true scores on test1 and test2 is 1
Essentially tau-equivalent test model - implications
- Xt2 = a + Xt1
- means of true scores and of observed scores differ (by the constant “a”)
- variance of true scores on test1 and test2 are equal
- variance of observed scores on test1 and test2 is not equal
- correlation between the true scores on test1 and test2 is 1
> Cronbach’s alpha is based on this model
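- a minimal sketch (not from the lecture; the toy data are made up) of how Cronbach’s alpha is computed from a respondents-by-items score matrix:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# toy example: 5 respondents, 3 hypothetical items
scores = np.array([
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
    [5, 4, 5],
])
print(round(cronbach_alpha(scores), 2))
```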
congeneric test model
- Xt2 = a + bXt1
- means of true scores differ (intercept “a”)
- variances of true scores differ (slope “b”)
- correlation between the true scores on test1 and test2 is 1
> omega is based on this model
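- a standard expression for omega under a one-factor (congeneric) model, where λi are the items’ factor loadings and θi their unique (error) variances (not given in the lecture, added for context):

```latex
\omega = \frac{\left(\sum_i \lambda_i\right)^2}{\left(\sum_i \lambda_i\right)^2 + \sum_i \theta_i}
```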
what are the advantages and disadvantages of the congeneric test model?
- advantage: less restrictive, therefore tests are more likely to fit this model
- disadvantage: options for estimating reliability are relatively limited
only in book
Domain sampling theory
- it assumes that items on any particular test represent a sample from a large indefinite number of potential items
- responses to each item are considered a function of the psychological attribute
> each item can be seen as a sample drawn from a population of similar items that are assumed to be equally good measures of the attribute
What is reliability according to the domain sampling theory?
reliability is the average size of the correlations among all possible pairs of tests with N items selected from a domain of test items