Lec2 - Ch5 Classical test theory models Flashcards

Classical Test Theory Models and Conceptual basis

1
Q

Reliability
- what is it, with regard to tests and scores?

A
  • how much noise is there in a psychological test?
  • it is a property of test scores, not of the test itself
    > a test might have different psychometric properties for different kinds of respondents (e.g. it could be reliable for one age range but not for another)
    > therefore, each set of scores has its own level of reliability
2
Q

what is the COTAN?

A
  • committee that evaluates psychological tests in the Netherlands
  • it is part of the NIP (Nederlands Instituut van Psychologen, the Dutch Association of Psychologists)
3
Q

how does the COTAN differentiate tests?

A
  • test used for high-impact inferences at individual level
    > very important; serious consequences if a mistake is made
    > e.g. personnel selection, diagnosis of learning disabilities…
  • test used for lower-impact inferences at individual level
    > descriptive use, fewer consequences
    > e.g. study/therapy progress, career choice test, …
  • test used at group level
    > e.g. customer or team satisfaction, student evaluation, comparing groups
4
Q

high-impact inferences tests
- reliability rules

A
  • good: 0.9 or larger
  • sufficient: between 0.8 and 0.9
  • insufficient: smaller than 0.8
5
Q

less impact inferences tests
- reliability rules

A
  • good: 0.8 or larger
  • sufficient: between 0.7 and 0.8
  • insufficient: smaller than 0.7
6
Q

group level tests
- reliability rules

A
  • good: 0.7 or larger
  • sufficient: between 0.6 and 0.7
  • insufficient: smaller than 0.6
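The three sets of reliability rules above can be collected in one lookup. This is only a minimal sketch: the names COTAN_THRESHOLDS and rate_reliability are made up for illustration, and it assumes that a value exactly on the lower boundary (e.g. 0.8 for a high-impact test) counts as "sufficient", since "insufficient" is defined as smaller than that boundary.

```python
# Thresholds copied from the three cards above: (good_from, sufficient_from).
# The dictionary keys are hypothetical labels, not official COTAN terms.
COTAN_THRESHOLDS = {
    "high_impact": (0.9, 0.8),
    "low_impact": (0.8, 0.7),
    "group": (0.7, 0.6),
}


def rate_reliability(reliability: float, use: str) -> str:
    """Return 'good', 'sufficient' or 'insufficient' for a reliability value."""
    good_from, sufficient_from = COTAN_THRESHOLDS[use]
    if reliability >= good_from:
        return "good"
    # Assumption: the lower boundary itself counts as sufficient.
    if reliability >= sufficient_from:
        return "sufficient"
    return "insufficient"


# The same reliability can be rated differently depending on the test's use.
for use in COTAN_THRESHOLDS:
    print(use, rate_reliability(0.75, use))
```

The loop at the end shows why the type of use matters: 0.75 is insufficient for a high-impact decision, sufficient for a lower-impact one, and good at group level.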
7
Q

what is the aim of behavioural science?

A
  • it strives to quantify the degree to which differences in one variable are associated with differences in other variables
  • these differences have to be measured accurately, hence reliability
8
Q

What are the assumptions that testing is based on?

A
  • behavioural differences among people exist
  • differences have important implications
  • they can be measured with precision
9
Q

what is Classical test theory?

A
  • it is a measurement theory
  • it explains reliability and it shows how to measure it
10
Q

What is the central idea of classical test theory?

A

Every test taker has a true score on a test

11
Q

what is reliability according to the classical test theory?

A
  • Extent to which differences in respondents’ observed scores are consistent with differences in their true scores
  • it derives from observed scores, true scores and measurement error
12
Q

What are the two main assumptions of Classical Test Theory?

A
  • observed scores are true scores plus measurement error
  • measurement error is random (affects everybody but it is not systematic)
    > likely to increase or decrease any particular score at random
  • Xo = Xt + Xe
13
Q

What are the implications of the assumptions of classical test theory? (consequences)

A
  • mean of the measurement error is equal to zero
    > because a non-zero mean would make the error systematic (random errors cancel out across respondents)
  • the correlation between true score and error is equal to zero
    > because random error is unrelated to a respondent’s true score level
  • observed score variance = true score variance + error variance
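The three implications above can be checked on simulated data. The sketch below is illustrative only: the sample size, the seed, and the normal distributions (true scores with mean 100 and SD 15, errors with mean 0 and SD 5) are invented assumptions, and in real testing true and error scores are never observed directly.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def correlation(xs, ys):
    """Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(variance(xs) * variance(ys))


def simulate_scores(n=10_000, seed=1):
    """Draw hypothetical true scores and random errors; Xo = Xt + Xe."""
    rng = random.Random(seed)
    true = [rng.gauss(100, 15) for _ in range(n)]   # assumed true score distribution
    error = [rng.gauss(0, 5) for _ in range(n)]     # random error, independent of true
    observed = [t + e for t, e in zip(true, error)]
    return true, error, observed


true, error, observed = simulate_scores()
print("mean error:", mean(error))                        # close to 0
print("r(true, error):", correlation(true, error))      # close to 0
print("var(observed):", variance(observed))
print("var(true) + var(error):", variance(true) + variance(error))
```

In a finite sample the mean error and the true-error correlation are only approximately zero, so the two variance figures agree approximately rather than exactly.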
14
Q

Observed scores

A
  • value obtained from measuring a characteristic in a sample of individuals
  • true score + measurement error
    > (it can be seen as a composite score)
15
Q

True score

A
  • Score that you would get using a perfect measurement instrument
  • “real amount” of the characteristic you are measuring
  • average score that a participant would obtain if they completed the test an infinite number of times
16
Q

Measurement Error

A
  • Influences that create random noise in the observed score
  • it creates inconsistencies between true and observed scores
    > e.g. distraction, an imprecise measuring instrument, …
  • it is impossible to know all the sources of measurement error and noise
  • we must determine to what extent differences in scores are attributable to real differences in the trait rather than to random external influences
17
Q

what is the mean measurement error in a test?

A
  • always 0
  • it is independent of the individual’s true scores
  • inflates or deflates respondents’ scores randomly, therefore it cancels itself out
  • error scores are uncorrelated with true scores (r=0)
  • see picture 1 for effects of measurement error
18
Q

Variance of error scores
- how to calculate it
- what it represents

A
  • see picture 2
  • it represents the degree to which error affected different people in different ways
    > high degree of error variance indicates the potential for poor measurement
19
Q

how do you calculate the variance of observed scores?

A
  • see picture 3
  • variance of observed scores = variance of true scores + variance of error scores
    > variability in observed scores will be larger than variability in true scores
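The variance rule on this card follows from Xo = Xt + Xe together with the zero correlation between true scores and error. A short derivation, written with Var and Cov as shorthand (this notation is an assumption and may differ from picture 3):

```latex
\begin{aligned}
\operatorname{Var}(X_o) &= \operatorname{Var}(X_t + X_e) \\
                        &= \operatorname{Var}(X_t) + \operatorname{Var}(X_e) + 2\operatorname{Cov}(X_t, X_e) \\
                        &= \operatorname{Var}(X_t) + \operatorname{Var}(X_e)
                           \qquad \text{because } \operatorname{Cov}(X_t, X_e) = 0
\end{aligned}
```

Since a variance cannot be negative, the observed score variance is at least as large as the true score variance.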
20
Q

What are signal and noise?

A
  • Signal: true score variance
  • Noise: measurement error variance
21
Q

What are the ways to think about reliability?
IMPORTANT!

A
  • see picture 6
  • Proportion of variance
    > ratio of true score variance to observed score variance
    > lack of error variance
  • Shared variance
    > squared correlation between observed scores and true scores
    > lack of correlation between observed scores and error scores
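The four views listed on this card give the same number when the classical test theory assumptions hold exactly. The sketch below checks this on a tiny invented data set in which the errors have a mean of exactly 0 and are exactly uncorrelated with the true scores. The scores and the function name reliability_views are hypothetical, and true and error scores are not observable in practice.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Population covariance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def variance(xs):
    return covariance(xs, xs)


def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))


def reliability_views(true, error):
    """Compute reliability in the four ways named on the card."""
    observed = [t + e for t, e in zip(true, error)]
    var_t, var_e, var_o = variance(true), variance(error), variance(observed)
    return {
        "true / observed variance": var_t / var_o,
        "1 - error / observed variance": 1 - var_e / var_o,
        "r(observed, true) squared": correlation(observed, true) ** 2,
        "1 - r(observed, error) squared": 1 - correlation(observed, error) ** 2,
    }


# Invented toy scores: mean error is 0 and Cov(true, error) is 0.
TRUE = [90, 90, 110, 110]
ERROR = [-5, 5, -5, 5]

views = reliability_views(TRUE, ERROR)
for name, value in views.items():
    print(f"{name}: {value:.3f}")
```

With these toy values the true score variance is 100, the error variance is 25 and the observed score variance is 125, so every view comes out at 0.8 (up to floating-point rounding).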
22
Q

Proportion of Variance
1 - Ratio of true score variance to observed score variance

A
  • see picture 4
  • true score variance is the signal that we want to detect
  • error variance is the noise obscuring the signal
  • reliability = signal / (signal+noise)
    > signal+noise = observed score variance
23
Q

What does it mean to obtain a reliability of .48?

A
  • 48% of the differences among people’s observed scores can be attributed to differences among their true levels
  • reliability ranges from 0 to 1; if it is 0, it means that the true score variance is also 0, which is impossible in a real world situation
24
Q

Proportion of variance
2- reliability as lack of measurement error

A
  • see picture 5 and 7
  • reliability: degree to which error variance is small in comparison with the variance of observed scores
  • reliability = 1- ( noise / (noise+signal) )
  • the reliability is high when the error variance is small relative to the observed score variance
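The "lack of error" formula is the signal/(signal+noise) ratio rewritten. A short sketch of the algebra, writing signal as \sigma_t^2, noise as \sigma_e^2, observed score variance as \sigma_o^2 and reliability as R (these symbols are assumptions and may differ from pictures 5 and 7):

```latex
\begin{aligned}
R &= \frac{\sigma_t^2}{\sigma_o^2}
   = \frac{\sigma_o^2 - \sigma_e^2}{\sigma_o^2}
   = 1 - \frac{\sigma_e^2}{\sigma_o^2},
\qquad \text{where } \sigma_o^2 = \sigma_t^2 + \sigma_e^2 .
\end{aligned}
```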
25
Q

what would a small degree of error variance indicate?

A
  • the respondents’ scores are being affected only slightly by measurement error
  • the error affecting one person’s score is not very different from the error affecting another person’s score
26
Q

what is the definition of reliability according to the shared variance?

A

reliability: proportion of shared variance between true score and observed score

27
Q

Shared Variance
1- reliability as the squared correlation between observed scores and true scores

A
  • see picture 8
  • reliability = squared correlation between observed scores and the true scores
    > squaring the correlation gives the amount of variance shared by two variables
    > reliability of 1: the differences among respondents’ observed scores are perfectly consistent with the differences among their true scores
    > reliability of 0: differences among respondents’ observed scores are totally inconsistent with the differences among their true scores
28
Q

Index of reliability

A

unsquared correlation between observed scores and true scores

29
Q

coefficient of reliability

A
  • squared correlation between observed scores and true scores
  • squared index of reliability
  • when referring to reliability, we usually use this term
30
Q

Shared variance
2- reliability as the lack of squared correlation between observed scores and error scores

A
  • see picture 9
  • reliability: 1 - squared correlation between observed scores and measurement error
  • reliability: degree to which observed scores are uncorrelated with error scores
  • as the correlation between observed and error scores increases, the reliability decreases
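Both shared-variance definitions reduce to the variance ratio once Xo = Xt + Xe and Cov(Xt, Xe) = 0 are used. A sketch of the steps, with r_{ot} and r_{oe} as assumed symbols for the observed-true and observed-error correlations and R for reliability:

```latex
\begin{aligned}
\operatorname{Cov}(X_o, X_t) &= \operatorname{Var}(X_t) = \sigma_t^2
  &&\Rightarrow\quad r_{ot}^2 = \frac{(\sigma_t^2)^2}{\sigma_o^2\,\sigma_t^2} = \frac{\sigma_t^2}{\sigma_o^2} = R \\
\operatorname{Cov}(X_o, X_e) &= \operatorname{Var}(X_e) = \sigma_e^2
  &&\Rightarrow\quad r_{oe}^2 = \frac{(\sigma_e^2)^2}{\sigma_o^2\,\sigma_e^2} = \frac{\sigma_e^2}{\sigma_o^2} = 1 - R
\end{aligned}
```

So reliability equals r_{ot}^2 and also 1 - r_{oe}^2, which is why a larger correlation between observed scores and error scores means lower reliability.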
31
Q

what does it mean to have an error score standard deviation of 17.8?

A

it means that on average, the respondents’ observed scores deviated from their true scores by nearly 18 points

32
Q

standard error of measurement
- what is it
- how to measure it

A

  • see picture 10
  • standard deviation of error scores
  • the larger the standard error of measurement, the greater the average difference between observed scores and true scores, and the less reliable the test
    !! if reliability is 1, the standard error is 0
    !! the standard error can never be larger than the standard deviation of the observed scores
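A minimal sketch of the calculation, assuming picture 10 shows the usual formula (standard error of measurement = SD of observed scores × √(1 − reliability)). The function name is hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd_observed: float, reliability: float) -> float:
    """SEM = SD of observed scores * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if sd_observed < 0:
        raise ValueError("a standard deviation cannot be negative")
    return sd_observed * sqrt(1.0 - reliability)


# Invented example values: observed SD of 15 at three reliability levels.
for rel in (1.0, 0.8, 0.0):
    print(rel, standard_error_of_measurement(15.0, rel))
```

The two extreme cases match the notes on the card: with a reliability of 1 the standard error is 0, and with a reliability of 0 it equals the standard deviation of the observed scores, which is its upper limit.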

33
Q

What are the four models used to calculate reliability?

A
  • parallel test
  • tau-equivalent test
  • essentially tau-equivalent test
  • congeneric test

!! any particular way of estimating reliability is accurate only if the tests being examined actually fit a particular model
- see picture 11

34
Q

what does it mean for a model to be restrictive?

A
  • the more assumptions are required from a model, the more restrictive the model is
    > e.g. the parallel test model has the most assumptions, thus it is the most restrictive
35
Q

what method is the most restrictive? which one is the least?

A
  • most restrictive: parallel test
  • least restrictive: congeneric test
36
Q

What are the assumptions that all four models have in common?
What are their implications?
IMPORTANT

A
  • error scores are random (and thus uncorrelated with true scores)
    > respondents’ error scores cancel out across respondents
    > respondents’ error scores are uncorrelated with their true scores
  • unidimensionality
    > true scores on one test are linearly related to the true scores on the other test (see picture 12)
37
Q

what does the linear relationship represent?

A
  • see picture 12
  • Xt1: true scores on test one
  • a (intercept): degree to which the true scores on test 1 are higher or lower than the true scores on test 2
  • b (slope): difference in scale, i.e. in the general magnitude and variability, between Xt1 and Xt2
38
Q

what are the implications of measurement error being random when comparing two tests?

A
  • respondents’ error scores on test 1 are uncorrelated with respondents’ error scores on test 2
  • respondents’ true scores on test 1 are uncorrelated with respondents’ error scores on test 2
39
Q

what are the general differences among the tests?
IMPORTANT

A
  • parallel test: everything is equal between test 1 and test 2
    > measurement error variance is equal
  • tau-equivalent test: error variances and observed score variances may differ
    > observed means remain the same
  • essentially tau-equivalent test: true and observed score means may also differ
    > observed means and variances may differ
    > true score variance remains the same
  • congeneric test: everything may differ
    > true score variances may also differ
    !! see picture 11
40
Q

in what models do the two tests have the same reliability?

A
  • only in the parallel test model
    > this is because the correlation between observed scores on the two tests = the ratio of true score variance to observed score variance
    > this ratio is also the ratio defining reliability
41
Q

Parallel test model

A
  • Xt1 = Xt2
  • reliability = correlation between the two tests
  • test-retest and split-half reliability are based on this model
    > the slope linking the true scores on the two tests is 1 (b=1)
    > the intercept linking the true scores on the two tests is 0 (a=0)
    > the two tests have the same level of error variance
42
Q

how can the correlation in parallel tests be calculated?

A
  • see picture 13
  • the correlation between the observed scores on the two tests is the covariance between their observed scores divided by the product of the standard deviations of their observed scores
    -> the correlation between the scores on parallel tests is equal to the ratio of true score variance to observed score variance (= reliability)
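A sketch of why the correlation between two parallel tests equals reliability. The subscripts 1 and 2 for the two tests and the symbols \sigma_t^2 and \sigma_o^2 are assumptions and may differ from picture 13. For parallel tests both tests share the same true score X_t, the error scores are uncorrelated with each other and with the true scores, and the observed score variances are equal.

```latex
\begin{aligned}
\operatorname{Cov}(X_{o1}, X_{o2})
  &= \operatorname{Cov}(X_t + X_{e1},\; X_t + X_{e2}) \\
  &= \operatorname{Var}(X_t) + \operatorname{Cov}(X_t, X_{e2}) + \operatorname{Cov}(X_{e1}, X_t) + \operatorname{Cov}(X_{e1}, X_{e2}) \\
  &= \sigma_t^2 \\[4pt]
r_{o1,o2}
  &= \frac{\operatorname{Cov}(X_{o1}, X_{o2})}{\sigma_{o1}\,\sigma_{o2}}
   = \frac{\sigma_t^2}{\sigma_o^2}
   \qquad \text{since } \sigma_{o1} = \sigma_{o2} = \sigma_o
\end{aligned}
```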
43
Q

what are the implications of the parallel test model?

A

… between test 1 and test 2
- means of true scores are equal
- variances of true scores are equal
- means of observed scores are equal
- correlation between the true scores is 1
- reliabilities are equal
- variances of observed scores are equal

44
Q

Tau-equivalent test model - implications

A
  • Xt2 = Xt1
    !! variances of error scores and of observed scores need not be equal
  • means of true scores on test 1 and test 2 are equal
  • variances of true scores on test 1 and test 2 are equal
  • means of observed scores on test 1 and test 2 are equal
  • correlation between the true scores on test1 and test2 is 1
45
Q

Essentially tau-equivalent test model - implications

A
  • Xt2 = a + Xt1
  • means of true and observed scores may differ (by the constant “a”)
  • variances of true scores on test 1 and test 2 are equal
  • variances of observed scores on test 1 and test 2 need not be equal
  • correlation between the true scores on test1 and test2 is 1

> Cronbach’s alpha is based on this model
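The cards do not give the formula for Cronbach's alpha, so the sketch below uses the standard textbook one: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), with k the number of items. The function name and the toy item scores are invented for illustration.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, all listing the same
    respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent = sum of that respondent's item scores.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / pvariance(totals))


# Two items that order respondents identically, then two unrelated items.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
print(cronbach_alpha([[1, 1, 2, 2], [1, 2, 1, 2]]))
```

Identical items give an alpha of 1 and uncorrelated items give an alpha of 0. Population variances are used here; sample variances would give the same result because the correction factor cancels in the ratio.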

46
Q

congeneric test model

A
  • Xt2 = a + bXt1
  • means of true scores may differ (“a”)
  • variances of true scores may differ (“b”)
  • correlation between the true scores on test1 and test2 is 1

> omega is based on this model
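The four models can be read as increasingly loose constraints on the line Xt2 = a + bXt1 and on the error variances. The sketch below returns the most restrictive model that a given pair of tests would fit. The function name is hypothetical, and it assumes the intercept, the slope and the equality of error variances are known, which in practice they are not (they have to be judged from data).

```python
def most_restrictive_model(a: float, b: float, equal_error_variance: bool) -> str:
    """Name the most restrictive model consistent with Xt2 = a + b * Xt1."""
    if b != 1:
        # Slope differs from 1: true score variances differ.
        return "congeneric"
    if a != 0:
        # Slope is 1 but true score means differ by the constant a.
        return "essentially tau-equivalent"
    if equal_error_variance:
        # Identical true scores and equal error variances.
        return "parallel"
    # Identical true scores, but error variances differ.
    return "tau-equivalent"


print(most_restrictive_model(a=0, b=1, equal_error_variance=True))
print(most_restrictive_model(a=0, b=1, equal_error_variance=False))
print(most_restrictive_model(a=2, b=1, equal_error_variance=False))
print(most_restrictive_model(a=2, b=0.5, equal_error_variance=False))
```

Any pair of tests that fits a more restrictive model also fits every less restrictive one, so parallel tests are also tau-equivalent, essentially tau-equivalent and congeneric.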

47
Q

what are the advantages and disadvantages of the congeneric test model?

A
  • advantage: less restrictive, therefore tests are more likely to fit this model
  • disadvantage: options for estimating reliability are relatively limited
48
Q

only in book

Domain sampling theory

A
  • it assumes that items on any particular test represent a sample from a large indefinite number of potential items
  • responses to each item are considered a function of the psychological attribute
    > each item can be seen as a sample drawn from a population of similar items, all of which are equally good measures
49
Q

What is reliability according to the domain sampling theory?

A

reliability is the average size of the correlations among all possible pairs of tests with N items selected from a domain of test items