L2: Classical test theory Flashcards

Ch 5, 6, 7

1
Q

what is the central statistic of classical test theory?

definition & synonyms

A

summed item score (sum of the scores on the items)
synonyms: sum score, test score, score on the test

2
Q

what is the central idea behind classical test theory?

A
  • every test taker has a true score on a test, which underlies the summed item score
  • true score: score that you would get using a perfect measurement instrument
  • observed score will generally not equal the true score due to measurement error
3
Q

define measurement error

A
  • influences other than the true score that add random noise to the observed score
  • goal is to minimize this error to improve reliability
4
Q

what are the 2 core assumptions underlying classical test theory?

assumptions & what follows

A
  1. observed scores are true scores plus measurement error: Xo = Xt + Xe
  2. measurement error is random
    what follows:
    - mean of the error = 0, because a nonzero mean would make the measurement error systematic
    - correlation between true score and error = 0 (Rte = 0), because random error is unrelated to the true score
    - observed score variance = true score variance + error variance (So^2 = St^2 + Se^2), because Rte = 0 leaves no covariance term
5
Q

what are the 4 ways of thinking about reliability?

A

as a proportion of variance:
- ratio of true score variance to observed score variance
- lack of error variance (reliable tests have minimal error variance)
as shared variance:
- correlation between observed scores & true scores (reliability is the squared correlation between these 2)
- lack of correlation between observed scores & error scores (a highly reliable test shows little correlation between observed scores & error)

6
Q

how can you define reliability as a proportion of variance?

A

comes from So^2 = St^2 (signal) + Se^2 (noise) assumption

high reliability when most of So^2 is St^2
low reliability when most of So^2 is Se^2

reliability = signal / (signal + noise) = St^2 / (St^2 + Se^2) = St^2 / So^2
and
reliability = 1 - noise / (signal + noise) = 1 - Se^2 / (St^2 + Se^2) = 1 - Se^2 / So^2

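The two variance formulas can be checked with a tiny helper. This is a sketch; the function name and the example variances (St^2 = 80, Se^2 = 20) are hypothetical.

```python
def reliability_from_variances(true_var: float, error_var: float) -> tuple[float, float]:
    """Reliability as a proportion of variance, assuming So^2 = St^2 + Se^2."""
    observed_var = true_var + error_var
    signal_ratio = true_var / observed_var          # St^2 / So^2
    one_minus_noise = 1 - error_var / observed_var  # 1 - Se^2 / So^2
    return signal_ratio, one_minus_noise

print(reliability_from_variances(true_var=80.0, error_var=20.0))  # both about 0.8
```

Both expressions give the same number because St^2 / So^2 and Se^2 / So^2 always add up to 1.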
7
Q

how can you define reliability as shared variance?

A

low reliability if Xt (true score) does not share a lot of variance with Xo (observed score)
high reliability if Xt shares a lot of variance with Xo

reliability = correlation (Xo, Xt)^2 = Rot^2 aka the amount of variance shared by observed score and true score
and
reliability = 1- correlation (Xo, Xe)^2 = 1- Roe^2 aka 1 - the amount of variance shared by observed score and error score

8
Q

what are the 4 test models underlying reliability estimation, from most restrictive to least restrictive?

A
  1. parallel test (most restrictive)
  2. tau equivalent test
  3. essentially tau equivalent test
  4. congeneric test (least restrictive)
9
Q

what are the restrictions of parallel test?

A

restriction on Xt1 (first test's true score): must equal Xt2
restriction on Se1^2 and Se2^2 (the variances of the measurement errors): must be equal to each other
implications:
- means of Xt1 and Xt2 are equal
- variances of Xt1 and Xt2 are equal
- means of Xo1 and Xo2 are equal CAN BE TESTED
- correlation between Xt1 and Xt2 = 1 (Rt1t2 = 1)
- reliability of test 1 = reliability of test 2, so also Rt1o1 = Rt2o2 (correlation between observed and true score)
- variances of observed scores on test 1 and test 2 are equal (So1^2 = So2^2) CAN BE TESTED

10
Q

what is the model of observed score of test 1 and 2 according to parallel test? and of the true score?

A

model observed score on test 1: Xo1 = Xt1 + Xe1 and Se1^2 = Se2^2
model observed score on test 2:
Xo2 = Xt1 + Xe2 and Se2^2 = Se1^2
model for the true score:
Xt2 = Xt1

11
Q

what are the 2 types of reliability based on the parallel test model?

A
  • test-retest reliability
  • split-half reliability
12
Q

what are the restrictions on the tau equivalent test?

A

restriction on Xt1 (first test's true score): must equal Xt2
no restriction on measurement error variances
implications:
- mean of Xt1 = mean of Xt2
- variance of Xt1 = variance of Xt2 (St1^2 = St2^2)
- mean of Xo1 = mean of Xo2 CAN BE TESTED
- correlation between true scores on test 1 and test 2 = 1 (Rt1t2 = 1)

13
Q

what type of reliability is based on the essentially tau equivalent test model?

A

Cronbach's alpha

14
Q

what is the model of the observed score of test 1 and 2 according to the essentially tau equivalent test model? and of the true score?

A

model for true score:
Xt2 = a + Xt1
model observed score of test 1: Xo1 = Xt1 + Xe1
model observed score of test 2: Xo2 = a + Xt1 + Xe2

15
Q

what is the model of observed score of test 1 and 2 according to tau equivalent test model? and of the true score?

A

model for true score: Xt2 = Xt1
model observed score of test 1: Xo1 = Xt1 + Xe1
model observed score of test 2: Xo2 = Xt1 + Xe2

16
Q

what are the restrictions on the essentially tau equivalent test?

A

restriction on Xt2: Xt2 = a + Xt1 (true scores on the second test equal true scores on the first plus a constant a)
no restriction on measurement error variances
implications:
- means of the true scores can differ (by a)
- variances of the true scores are equal (St1^2 = St2^2)
- correlation between true scores on test 1 and test 2 is 1 (Rt1t2 = 1)

17
Q

what is the model of observed score of test 1 and 2 according to congeneric test model? and of the true score?

A

model for true score:
Xt2 = a + bXt1
model observed score test 1:
Xo1 = Xt1 + Xe1
model observed score test 2:
Xo2 = a + bXt1 + Xe2

18
Q

what are the restrictions on the congeneric test?

A

Xt2 = a + bXt1
no restriction on measurement error variances
implications:
- means of the true scores can differ
- variances of the true scores can differ
- correlation between true scores on test 1 and 2 = 1 (Rt1t2 = 1)

19
Q

what reliability measure is based on congeneric model?

A

omega

20
Q

what are 3 methods of estimating reliability?

A
  1. alternate forms reliability
  2. test retest reliability
  3. internal consistency reliability
21
Q

what is the alternate forms reliability estimation technique?

A
  • assumes the parallel test model (meaning the forms measure the same trait with the same amount of error variance)
  • apply 2 versions of the same test
  • correlation between the 2 forms is the reliability
22
Q

what are the main challenges with the alternate forms reliability estimation technique?

A
  • constructing the alternate forms of the same test is hard
  • carry over effects (lack of motivation, fatigue etc)
23
Q

what is the test retest reliability estimation technique?

A
  • assumes parallel test model
  • apply same test twice to same group but at different times
  • the correlation between the two administrations is the reliability
  • assumes the trait being measured remains stable over time (which isn't always the case)
24
Q

what are the main challenges with test retest reliability technique?

A
  • carry over effects
  • change in the true score: for constructs that fluctuate, like mood, the true score might change between the 2 tests
25
Q

what is the internal consistency reliability estimation technique?

A
  • looks at how consistent the items within a single test are with each other: if the items on a test measure the same trait, the test should be internally consistent
  • assumes the parallel, essentially tau equivalent, or congeneric test model, depending on the technique (there are multiple internal consistency techniques)
  • considers (blocks of) items as separate tests
  • a formula then gives the reliability
26
Q

what are the main challenges with the internal consistency reliability technique?

A

carry over effects (by end of test might be way better at answering or more tired or whatever so different parts of test might not be as consistent w each other)

27
Q

what are the 3 types of internal consistency reliability techniques?

A
  1. split half
    - assumes parallel test model
    - split test in 2 parts
    - formula gives reliability (see table 6.2)
  2. Cronbach's alpha (or KR20 for binary items, or standardized alpha)
    - assumes essentially tau equivalent test model
    - each item considered a separate part
    - formula gives reliability (see table 6.2)
  3. omega
    - assumes congeneric test model (or stricter)
    - not yet widely applied in practice
    - estimate true score variance using unidimensional factor analysis
    - reliability is true score variance / observed score variance
28
Q

evaluate the 4 main ways of estimating reliability: alternate forms, test-retest, split half, Cronbach's alpha

A
  • alternate forms: hardly feasible in practice, only in specific situations
  • test retest: important to establish reliability for new test, in research only rarely used
  • split half: depends highly on split used (undesirable), still used frequently
  • Cronbach's alpha/KR20: very popular in research because it is easy to compute, but its assumption (essential tau equivalence) is hardly ever met; it is a lower bound of the reliability! so the actual reliability will be equal to or higher than Cronbach's alpha
29
Q

what are the COTAN guidelines about reliability for psych tests?

A
  1. tests used for high impact inferences at individual level (ex: personnel selection, diagnosis of learning disabilities etc)
    - good: 0.9 or larger
    - sufficient: 0.8-0.9
    - insufficient: smaller than 0.8
  2. tests used for less impact inference at individual level (descriptive use ex: study/therapy progress, career choice tests etc)
    - good: 0.8 or larger
    - sufficient: 0.7-0.8
    - insufficient: smaller than 0.7
  3. tests used at group level (ex: customer/team satisfaction, student evaluations, comparing groups etc)
    - good: 0.7 or larger
    - sufficient: 0.6-0.7
    - insufficient: smaller than 0.6
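The three COTAN tiers above can be encoded as a lookup table. This is a sketch: the dictionary keys are hypothetical labels for the three kinds of use, and the cut-offs are exactly the ones listed on this card (a value on a boundary counts toward the higher rating).

```python
# (good, sufficient) lower bounds per kind of use, as listed on this card.
COTAN_THRESHOLDS = {
    "high-impact individual": (0.9, 0.8),  # e.g. personnel selection, diagnosis
    "low-impact individual": (0.8, 0.7),   # e.g. progress monitoring, career choice
    "group": (0.7, 0.6),                   # e.g. satisfaction surveys, group comparisons
}

def cotan_rating(reliability: float, use: str) -> str:
    """Classify a reliability coefficient as good / sufficient / insufficient."""
    good, sufficient = COTAN_THRESHOLDS[use]
    if reliability >= good:
        return "good"
    if reliability >= sufficient:
        return "sufficient"
    return "insufficient"

print(cotan_rating(0.85, "high-impact individual"))  # sufficient
print(cotan_rating(0.85, "group"))                   # good
```

The same coefficient can be adequate for group-level research and still fall short for high-stakes decisions about individuals.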
30
Q

what is item discrimination?

A

differences in the item scores reflect differences in the construct (indicates how good the items are)
- item total correlation: correlation between item scores & sum scores (how well does an item predict the total score on the test?)
- corrected item total correlation: correlation between item scores & rest scores (how well does an item predict the other item scores, excluding the item you are looking at)

31
Q

what is the problem with the item-total correlation? what resolves this?

A

it's biased upward, because you are (partly) correlating the item with itself
-> this problem is solved with the corrected item total correlation, as it excludes the item itself

32
Q

what are the factors affecting reliability?

A
  1. test length
  2. sample heterogeneity
  3. the correlation between pretest and posttest scores (for the reliability of difference scores)
33
Q

how does a test's reliability get affected by test length?

A

lengthening a test will generally increase reliability

34
Q

what is the equation for a test's reliability if its length has been changed?

A

Rnew = n x Roriginal / (1 + (n - 1) x Roriginal)   (Spearman-Brown formula)
where n = new number of items / original number of items
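A direct translation of the length-change formula into a function, as a sketch (the function name and example reliabilities are hypothetical). Values of n below 1 model shortening the test.

```python
def spearman_brown(r_original: float, n: float) -> float:
    """Predicted reliability after changing test length by factor n.

    n = new number of items / original number of items (n > 0).
    """
    return n * r_original / (1 + (n - 1) * r_original)

print(f"{spearman_brown(0.6, 2):.3f}")    # doubling a test with R = 0.6 -> 0.750
print(f"{spearman_brown(0.8, 0.5):.3f}")  # halving a test with R = 0.8 -> 0.667
```

The formula assumes the added (or removed) items are parallel to the original ones.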

35
Q

how does a test's reliability get affected by sample heterogeneity?

A

in homogeneous samples, reliability will be smaller than in heterogeneous samples
R = St^2 / (St^2 + Se^2)
in homogeneous samples, St^2 will be smaller as ppl are relatively similar to each other, while Se^2 stays the same
while in heterogeneous samples St^2 will be larger as ppl are relatively dissimilar to each other
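A simulation sketch of the heterogeneity effect: the error variance is identical for everyone, but a "homogeneous" subsample keeps only people whose true scores lie between 95 and 105. The distribution (mean 100, SD 15, error SD 5) and the cut-offs are arbitrary assumptions.

```python
import random
import statistics

# Hypothetical data: St^2 about 225 in the full sample, Se^2 = 25 for everyone.
random.seed(4)
true = [random.gauss(100, 15) for _ in range(50_000)]
observed = [t + random.gauss(0, 5) for t in true]

def reliability(true_scores, observed_scores):
    """St^2 / So^2 within the given sample."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed_scores)

# Homogeneous subsample: restricted range of true scores.
pairs = [(t, o) for t, o in zip(true, observed) if 95 <= t <= 105]
sub_true = [t for t, _ in pairs]
sub_observed = [o for _, o in pairs]

print(f"heterogeneous sample:  R = {reliability(true, observed):.2f}")
print(f"homogeneous subsample: R = {reliability(sub_true, sub_observed):.2f}")
```

Only St^2 shrinks in the subsample, so the same test looks much less reliable there.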

36
Q

how is the reliability of difference scores affected by the correlation between pretest and posttest scores?

A

difference score: Di = Xi (posttest) - Yi (pretest): the difference between posttest and pretest scores
difference score reliability:
Rd = (sXo^2 RXX + sYo^2 RYY - 2 rXoYo sXo sYo) / (sXo^2 + sYo^2 - 2 rXoYo sXo sYo)
in words: (posttest variance x posttest reliability + pretest variance x pretest reliability - 2 x correlation between pre- and posttest x SD of posttest x SD of pretest) / (posttest variance + pretest variance - 2 x correlation between pre- and posttest x SD of posttest x SD of pretest)
important properties:
- if the correlation between pretest and posttest is large, the reliability of the difference score will be small
- difference score reliability also depends on the reliabilities of the pretest and posttest
- it is sensitive to differences in variance between Xi and Yi
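The difference-score formula as a function, as a sketch (the function name and example values are hypothetical). Keeping both reliabilities at 0.8 and only raising the pre-post correlation shows the first property above.

```python
def difference_score_reliability(s_x: float, s_y: float,
                                 r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of D = X - Y (X = posttest, Y = pretest)."""
    shared = 2 * r_xy * s_x * s_y                         # 2 rXoYo sXo sYo
    numerator = s_x**2 * r_xx + s_y**2 * r_yy - shared
    denominator = s_x**2 + s_y**2 - shared
    return numerator / denominator

# Both tests have reliability 0.8 and SD 1; only the pre-post correlation changes.
print(f"{difference_score_reliability(1, 1, 0.8, 0.8, 0.5):.3f}")  # 0.600
print(f"{difference_score_reliability(1, 1, 0.8, 0.8, 0.7):.3f}")  # 0.333
```

Two reliable tests can still yield unreliable difference scores when they are highly correlated.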

37
Q

how can you estimate true scores?

A
  1. true score estimate = summed item score
  2. true score estimate: Xest = mean of Xo (observed scores) + Reliability x (Xo (score of the person you're interested in) - mean of Xo)
    - based on regression to the mean
    - due to unreliability, high scoring persons will likely score lower on a next test
    -> the lower the reliability, the more the true score estimate is pulled toward the mean
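The regression-based estimate in a few lines, as a sketch; the function name, the mean of 100 and the observed score of 130 are hypothetical.

```python
def estimate_true_score(observed: float, mean_observed: float, reliability: float) -> float:
    """Regression-based true score estimate: Xest = mean + R * (Xo - mean)."""
    return mean_observed + reliability * (observed - mean_observed)

# Hypothetical scale with mean 100: the same observed score of 130 at two reliabilities.
for r in (0.9, 0.5):
    print(r, estimate_true_score(130, 100, r))  # about 127 and 115
```

With R = 0.5 the estimate is pulled halfway back to the mean; with R = 1 it would equal the observed score.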
38
Q

what is the standard error aka standard error of measurement?

A

amount of error present in an individual's score
SEM = So (SD of observed scores) x sqrt(1 - Reliability)
so higher reliability = smaller SEM
lower reliability = larger SEM
can be used to construct a 95% confidence interval around the true score estimate
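A sketch of the SEM and a 95% interval around a true score estimate. The interval uses the common construction Xest ± 1.96 x SEM, which assumes normally distributed errors; the function names, the SD of 15, the reliability of 0.84 and the estimate of 124 are hypothetical.

```python
import math

def standard_error_of_measurement(sd_observed: float, reliability: float) -> float:
    """SEM = So * sqrt(1 - R)."""
    return sd_observed * math.sqrt(1 - reliability)

def confidence_interval_95(true_score_estimate: float, sem: float) -> tuple[float, float]:
    """Xest +/- 1.96 * SEM (assumes normally distributed errors)."""
    margin = 1.96 * sem
    return true_score_estimate - margin, true_score_estimate + margin

sem = standard_error_of_measurement(sd_observed=15, reliability=0.84)  # about 6
low, high = confidence_interval_95(124.0, sem)
print(f"SEM = {sem:.2f}, 95% CI = [{low:.1f}, {high:.1f}]")
```

Raising the reliability shrinks the SEM and therefore narrows the interval.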

39
Q

what is attenuation?

A

observed effect sizes/correlations will be SMALLER than the effect sizes/correlations between the true scores (because the observed scores are diluted by error)
+ for a less reliable test, the observed correlation is smaller and therefore less likely to be significant (so always consider reliability)
aka when measurements aren't reliable (because there is measurement error), the relationships observed between variables are weakened

40
Q

what is wrong w corrections for attenuation?

A

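corrections for attenuation use reliability estimates to remove the error from observed correlations; if those reliability estimates are inaccurate, the corrected correlation can be wrong (it can even exceed 1)!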
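A sketch of attenuation and its correction, using the standard relation r_observed = r_true x sqrt(Rxx x Ryy) and its inverse. The function names and the numbers (true correlation 0.5, reliabilities 0.8) are hypothetical; the last line shows how an underestimated reliability inflates the "corrected" value.

```python
import math

def attenuated_correlation(r_true: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation: r_o = r_t * sqrt(Rxx * Ryy)."""
    return r_true * math.sqrt(rel_x * rel_y)

def disattenuated_correlation(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Correction for attenuation: r_t = r_o / sqrt(Rxx * Ryy)."""
    return r_observed / math.sqrt(rel_x * rel_y)

r_obs = attenuated_correlation(0.5, 0.8, 0.8)        # 0.5 * 0.8 = 0.4
print(disattenuated_correlation(r_obs, 0.8, 0.8))     # recovers about 0.5
# If both reliabilities are underestimated as 0.5, the "corrected" value is too large:
print(disattenuated_correlation(r_obs, 0.5, 0.5))     # about 0.8
```

The correction is only as good as the reliability estimates plugged into it.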

41
Q

When is reliability high?

A
  • when there is little error score variance relative to the true score variance
  • when the sum of the true score variance & error variance comes close to the true score variance
  • when the proportion of error variance in the observed variance is small
42
Q

what is the relationship between the standard error of measurement & reliability?

A

A smaller standard error of measurement means that there is less deviation of observed scores from true scores, so a more reliable test

43
Q

Consider two tests that purport to measure the same construct. In a pilot study, a researcher finds their observed test score means to be the same, but their test score variances to differ. Which of the test models do these data follow?

A

tau equivalent test

44
Q

Which test model does Cronbach’s alpha assume?

A

essentially tau equivalent test model or stricter

45
Q

which model does the test-retest reliability assume?

A

parallel test model

46
Q

which model does the split half reliability assume?

A

parallel test model

47
Q

In a hypothetical dataset that contains the test scores on two tests, the true score mean and true score variance differ across the two tests. Which test model does this dataset follow?

A

the congeneric test model

48
Q

If you find the reliability of two tests measuring the same construct to be the same, what test model do these tests follow?

A

parallel test model

49
Q

In a hypothetical dataset that contains the test scores on two tests, the true score mean and true score variance are equal across the two tests. Which test model does this dataset follow?

A

parallel test model & tau equivalent test model

50
Q

Say you want to assess the consistency between the observed scores of one test and those of another test. Which method for estimating reliability do you use?

A

alternate forms reliability

51
Q

which criteria do 2 test forms need to meet, in order to legitimately use the alternate forms method of estimating reliability?

A

tests need to have identical true scores & identical error variance

52
Q

Jimmy conducts a study into aggression, for which he uses the Aggression Questionnaire (AGQ; Buss & Perry, 1992). He wants to know how reliable the AGQ is. Therefore, he lets his respondents fill in the questionnaire again.

Which method of estimating reliability does Jimmy intend to use here?

A

test retest

53
Q

When someone calculates Cronbach’s alpha to estimate the reliability of a test, what general method of estimating reliability is that person using?

A

internal consistency

54
Q

For which reliability methods is it problematic if the true scores differ across two tests?

A
  • alternate forms
  • test retest
55
Q

for which reliability methods is it problematic if there are carry over effects?

A
  • test retest
  • internal consistency
  • alternate forms
56
Q

What is a problem that arises from using the internal consistency method for estimating reliability?

A

A correlation between the item’s error scores caused by carry-over effects

57
Q

for which items are the following reliability measures suitable? raw alpha, KR20, standardized alpha

A

raw alpha: Likert scale items that do not differ in variance too much
KR20: binary items
standardized alpha: Likert scale items that differ substantially in their item variance

58
Q

what is a difference between raw alpha, KR20, and standardized alpha? not concerning which items they are used on

A

Raw alpha and KR20 are based on the item covariances and item variances; standardized alpha only uses item correlations

59
Q

In what situation is it good to use standardized alpha?

A

When the item variances differ a lot from each other and thus the test score mostly reflects items with high variances

60
Q

What can we do to improve the reliability of a test?

A

Add more items to the test that are perfectly parallel to the original items

61
Q

is the reliability of the test smaller in a heterogeneous sample or a homogeneous sample?

A

homogeneous sample

62
Q

If the pretest and posttest are both reliable, the reliability of the difference scores can still be relatively small if the pretest and posttest are…

A

highly correlated

63
Q

where can you find the split-half reliability? what about the reliability of one half?

A

split-half reliability: Spearman-Brown coefficient
reliability of one half: correlation between forms

64
Q

Which statistic do we use when we want to know the consistency between one item and the other items of a test?

A

Corrected item-total correlation

65
Q

define reliability

A

consistency or stability of test scores across repeated applications. It’s a crucial aspect of psychometrics because it determines how much trust can be placed in test results.

66
Q

what is the main assumption of the parallel test?

A

that 2 tests measure the same trait w equal true scores and error variances

67
Q

what is the main assumption of the tau equivalent test?

A

that 2 tests have equal true scores (so equal true score means and variances) but can differ in error variances

68
Q

What is the main assumption of the congeneric tests?

A

that a linear relationship between the true scores of 2 tests exists, allowing for flexibility in error & true score variances

69
Q

what is the domain sampling theory?

A

treats test items as a sample from a larger domain (bucket) of possible items
- the reliability of a test is the average correlation between all possible pairs of tests drawn from that domain: basically how consistent the test results would be if you made different tests by pulling out different sets of items from the bucket of all possible questions
basically this theory is saying “we want to know if the questions we randomly picked give a reliable picture of the person's true ability, even if we swapped them out for different questions from the same big bucket”