Lecture 1: Classical Test Theory Flashcards

1
Q

When psychologists assess the quality of a test, what two metrics do they typically refer to?

A

Validity and reliability

2
Q

What is test variance and how do you calculate it? (2)

A

Item variance is the measure of dispersion of the scores on item i. Test variance is the measure of dispersion of the test (sum) scores. To calculate it, construct the covariance matrix of the items, with the variance of each item on the diagonal and the covariances between item pairs off the diagonal. The test variance is the sum of all elements of this matrix. This is the same value as the variance of the final test scores computed directly.
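
The two routes to the test variance can be checked numerically. This is a minimal sketch, assuming NumPy is available; the score matrix is made up for illustration and is not from the lecture.

```python
import numpy as np

# Hypothetical scores of 5 persons (rows) on 3 items (columns).
scores = np.array([
    [1, 2, 3],
    [2, 2, 4],
    [3, 4, 4],
    [4, 4, 5],
    [5, 3, 5],
], dtype=float)

# Item covariance matrix: item variances on the diagonal,
# covariances between item pairs off the diagonal.
cov_matrix = np.cov(scores, rowvar=False)

# Route 1: sum every element of the covariance matrix.
variance_from_matrix = cov_matrix.sum()

# Route 2: variance of the sum scores (ddof=1 matches np.cov's default).
sum_scores = scores.sum(axis=1)
variance_from_sums = sum_scores.var(ddof=1)

print(variance_from_matrix, variance_from_sums)
```

Both printed numbers should agree up to floating-point rounding.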

3
Q

What's the difference between covariance and correlation, if there is one?

A

Covariance is an unscaled measure of association between two variables. Correlation is a scaled measure of association: the covariance divided by the product of the two standard deviations, so that it always lies between -1 and 1.
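
A small numerical illustration of the scaling difference. This is a sketch assuming NumPy; the two variables are invented for the example.

```python
import numpy as np

# Two hypothetical variables measured on the same 5 persons.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Covariance: unscaled, depends on the units of x and y.
cov_xy = np.cov(x, y)[0, 1]

# Correlation: covariance divided by the product of the standard deviations.
corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# Rescaling x (e.g. metres to centimetres) changes the covariance
# by the same factor but leaves the correlation untouched.
cov_scaled = np.cov(100 * x, y)[0, 1]
corr_scaled = np.corrcoef(100 * x, y)[0, 1]

print(cov_xy, corr_xy, cov_scaled, corr_scaled)
```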

4
Q

What can be used to infer the dimensionality of a test in CTT?

A

Principal component analysis (PCA)

5
Q

What is meant by Principal component analysis (PCA)?

A

Principal component analysis (PCA) is a technique for reducing the dimensionality of datasets with many variables, increasing interpretability while minimising information loss. It does so by creating new uncorrelated variables that successively maximise variance. E.g. reducing the description of something (e.g. a tumour) from 30 dimensions (smoothness, volume, etc.) to two principal components.

6
Q

Summarise the main steps of how PCA is calculated

A

We calculate the covariance matrix of our data, then we calculate the eigenvectors and eigenvalues of that covariance matrix. The eigenvectors are our principal components. The eigenvector with the largest eigenvalue is the first principal component, and the eigenvector with the smallest eigenvalue is the last principal component.
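
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own code: it uses NumPy, simulated data, and `np.linalg.eigh` for the eigendecomposition of the symmetric covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations on 3 correlated variables.
latent = rng.normal(size=(200, 1))
data = latent @ np.array([[1.0, 0.8, 0.6]]) + 0.5 * rng.normal(size=(200, 3))

# Step 1: covariance matrix of the data.
cov_matrix = np.cov(data, rowvar=False)

# Step 2: eigenvalues and eigenvectors. eigh is meant for symmetric
# matrices and returns the eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Step 3: reorder so that the largest eigenvalue comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order]  # column k is principal component k + 1

# Component scores: project the centred data onto the eigenvectors.
pc_scores = (data - data.mean(axis=0)) @ components

# Proportion of the total variance carried by each component.
explained = eigenvalues / eigenvalues.sum()
print(explained)
```

The variance of each column of `pc_scores` equals the corresponding eigenvalue, which is why the eigenvalues rank the components.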

7
Q

What does Xgp represent in CTT?

A

𝑋𝑔𝑝 is a random variable denoting the repeatedly sampled measurements of test g on subject p.

8
Q

What two fundamental equations can be derived from CTT?

A
  1. $E(X_{gp}) = \tau_{gp}$
    The expected value of $X_{gp}$ is equal to the true value
  2. $E_{gp} = X_{gp} - \tau_{gp}$ (for a fixed subject)
    (error = observed score - true score)
9
Q

What three assumptions are there within CTT?

A

(a) the measurement is on an interval scale;
(b) the variance of observed scores $\sigma^2(X_g)$ is finite;
(c) the measurements are repeatedly sampled in a linearly experimentally independent way.

10
Q

What 8 properties are derived from CTT?

A
  1. The expected error score is zero;
  2. The correlation between true and error scores is zero;
  3. The correlation between the error score on one measurement and the true score on another measurement is zero;
  4. The correlation between errors on linearly experimentally independent measurements is zero;
  5. The expected value of 𝑋𝑔𝑝 over persons is equal to the expected value of the true score random variable over persons;
  6. The variance of $E_{gp}$ over persons is equal to the expected value, over persons, of $\sigma^2(X_{gp})$ (variance within persons);
  7. Sampling over persons with any $\tau_{gp}$, the expected value of the error score random variable is zero;
  8. The variance of observed scores is the sum of the variance of true scores and the variance of error scores.
11
Q

Give proof that the expected error score is 0

*Not required but gives an idea of how CTT is derived

A
  1. $E(X_{gp}) = \tau_{gp}$ (fundamental Eq. 1)
  2. $E_{gp} = X_{gp} - \tau_{gp}$ (fundamental Eq. 2)

$E(E_{gp}) = E(X_{gp} - \tau_{gp}) = E(X_{gp}) - E(\tau_{gp})$* $= \tau_{gp} - \tau_{gp} = 0$

*For one person $\tau_{gp}$ is fixed

12
Q

Give the proof for the following:

  1. The correlation between true and error scores is zero;
  2. The correlation between the error score on one measurement and the true score on another measurement is zero;
  3. The correlation between errors on linearly experimentally independent measurements is zero;

*Not needed to reproduce exact theorems

A
  1. The correlation between true and error scores is zero;
    $X_g = T_g + E_g$
    or $X = T + E$
    $E(E_{gp}) = 0$ (property 1) → $E(E_g \mid T_g = \tau_{gp}) = E(E_{gp}) = 0$ for all $\tau_{gp}$ → $\rho(E_g, T_g) = 0$

If the expected error is 0 for each person, then the expected error is also 0 at every value of the true score. Therefore there cannot be a correlation between the error and the true score.

  2. The correlation between the error score on one measurement and the true score on another measurement is zero;
    $E(E_g) = 0$ (property 1) → $E(E_g \mid T_h = \tau_{hp}) = 0$ for all $\tau_{hp}$ → $\rho(E_g, T_h) = 0$

Same logic: if the expected error is zero at every value of the other test's true score, it cannot be correlated with that true score.

  3. The correlation between errors on linearly experimentally independent measurements is zero;
    $E(E_g) = 0$ (property 1) → $E(E_g \mid E_h = E_{hp}) = 0$ for all $E_{hp}$ → $\rho(E_g, E_h) = 0$

Same logic: if the expected error is zero at every value of the other error, it cannot be correlated with other errors.

13
Q

Give the proof of property 8: The variance of observed scores is the sum of the variance of true scores and the variance of error scores;

A
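  1. $\rho(E_g, T_g) = 0$ (property 2)
  2. $X_g = T_g + E_g$ (population model)

$\sigma^2(X_g) = \sigma^2(T_g + E_g)$* $= \sigma^2(T_g) + \sigma^2(E_g) + 2\sigma(T_g, E_g)$
→ $\sigma^2(X_g) = \sigma^2(T_g) + \sigma^2(E_g)$
or $\sigma^2_X = \sigma^2_T + \sigma^2_E$

*Covariance matrix: the variance of a sum is the sum of all elements of the covariance matrix of its parts. The covariance term drops out because it is 0 by property 2.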
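
A quick simulation can make property 8 concrete. This is a sketch assuming NumPy and made-up population values (true-score variance 9, error variance 4); true scores are never observable in real data, so this only works in simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons = 100_000

# Simulated true scores T and independent errors E (hypothetical values).
true_scores = rng.normal(loc=50, scale=3, size=n_persons)  # variance 9
errors = rng.normal(loc=0, scale=2, size=n_persons)        # variance 4
observed = true_scores + errors                            # X = T + E

var_x = observed.var(ddof=1)
var_t = true_scores.var(ddof=1)
var_e = errors.var(ddof=1)
cov_te = np.cov(true_scores, errors)[0, 1]

# This identity holds exactly in any sample.
exact = var_t + var_e + 2 * cov_te

# Because T and E are uncorrelated, cov_te is close to 0 and
# var_x is close to var_t + var_e.
print(var_x, exact, var_t + var_e, cov_te)
```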

14
Q

How can reliability be defined according to these terms (conceptually, and proof shown)

A

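Conceptually: reliability is the squared correlation between the test scores and the true scores.

Using fundamental Equations 1 and 2, and property 2, reliability can be derived as follows:

$\rho(X_g, T_g)$
$= \frac{\sigma(X_g, T_g)}{\sigma(X_g)\,\sigma(T_g)}$
$= \frac{\sigma(T_g + E_g, T_g)}{\sigma(X_g)\,\sigma(T_g)}$
$= \frac{\sigma(T_g, T_g) + \sigma(E_g, T_g)}{\sigma(X_g)\,\sigma(T_g)}$
$= \frac{\sigma^2(T_g) + 0}{\sigma(X_g)\,\sigma(T_g)}$
$= \frac{\sigma(T_g)}{\sigma(X_g)}$

→ $\rho_X = \rho^2_{X,T}$
$= \rho(X_g, T_g)^2$
$= \left(\frac{\sigma(T_g)}{\sigma(X_g)}\right)^2$
$= \frac{\sigma^2(T_g)}{\sigma^2(X_g)}$

Line by line:
= the correlation between test score and true score
= the formula for a correlation
= substitute $X_g = T_g + E_g$
= the covariance of T + E with T can be written as the covariance of T with T plus the covariance of E with T (rule)
= the covariance of T with itself is its variance; the covariance between E and T is 0, as explained before
= one $\sigma(T_g)$ cancels, leaving one on top
= not there yet: the reliability of X is the squared correlation between X and T (the proportion of the total score variance that is due to the true score)
= the squared correlation, using what we derived before
= the squared ratio of the standard deviations
= the variance of T divided by the variance of X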
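
The equality between the squared correlation and the variance ratio can be checked by simulation. This is a sketch assuming NumPy and invented population values (true-score variance 9, error variance 4, so a theoretical reliability of 9/13).

```python
import numpy as np

rng = np.random.default_rng(2)
n_persons = 200_000

# Simulated true scores and observed scores X = T + E (hypothetical values).
true_scores = rng.normal(0, 3, n_persons)             # variance 9
observed = true_scores + rng.normal(0, 2, n_persons)  # error variance 4

# Reliability as the squared correlation between X and T.
squared_corr = np.corrcoef(observed, true_scores)[0, 1] ** 2

# Reliability as the ratio of true-score variance to observed variance.
variance_ratio = true_scores.var(ddof=1) / observed.var(ddof=1)

# Population value implied by the chosen variances.
theoretical = 9 / (9 + 4)

print(squared_corr, variance_ratio, theoretical)
```

The two sample estimates differ slightly from each other and from the population value only because of sampling error.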

15
Q

How insightful is this definition of reliability?

A

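Insightful, because $\sigma^2(X_g) = \sigma^2(T_g) + \sigma^2(E_g)$ (property 8): reliability is the proportion of the observed score variance that is due to the true score rather than to error.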

16
Q

These are theoretical equations; we cannot calculate them without the variance of the true scores. How do we try to do this?

A

By using the concept of parallel tests: a test h together with a parallel test form g.

17
Q

What are the assumptions of parallel tests?

A

You assume that the true scores are identical on the two tests and that the error variances are identical as well.

18
Q

How are parallel tests g and h defined mathematically?

A

πœβ„Žπ‘ =πœπ‘”π‘ β†’ π‘‡β„Ž =𝑇𝑔 =𝑇
The true score on one test is the same as the true score of another for one subject

𝜎2(πΈβ„Žπ‘)=𝜎2(𝐸𝑔𝑝) β†’πœŽ^2 (π‘‹β„Ž) =𝜎^2 (𝑋𝑔) =𝜎^2(𝑋)
If you have the same error variance, you have the same test score variance.

19
Q

How do these definitions help us calculate the reliability of a test?

A

If the two tests have the same true scores and the same error variance, and therefore the same test score variance, then the correlation between their test scores is equal to the reliability of each test.

20
Q

Prove mathematically that the correlation between the test scores is equal to the reliability of each test

A

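$\rho(X_h, X_g)$
$= \frac{\sigma(T_h + E_h, T_g + E_g)}{\sigma(X_h)\,\sigma(X_g)}$
$= \frac{\sigma(T_h, T_g) + \sigma(E_h, T_g) + \sigma(E_g, T_h) + \sigma(E_g, E_h)}{\sigma(X_h)\,\sigma(X_g)}$
$= \frac{\sigma(T_h, T_g) + 0 + 0 + 0}{\sigma(X_h)\,\sigma(X_g)}$

= plug X = T + E from CTT into the formula for a correlation (the covariance divided by the product of the standard deviations of the two X's)
= same trick as before: expand the covariance of the two sums into the four covariances between their parts
= any covariance involving an error score is 0 (properties 3 and 4), leaving the covariance of the true scores divided by the product of the standard deviations of the test scores

Since parallel tests also imply:
$T_h = T_g = T$
$\sigma^2(X_h) = \sigma^2(X_g) = \sigma^2(X)$

it follows that:
$\frac{\sigma(T_h, T_g)}{\sigma(X_h)\,\sigma(X_g)}$
$= \frac{\sigma^2(T)}{\sigma^2(X)}$
$= \rho^2_{X,T}$
$= \rho_X$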

21
Q

What is the next step towards really calculating reliability?

A

Cronbach's alpha: it treats the individual items as if they were parallel tests, so that the parallel-test idea can be used to calculate reliability. This is the most used index of reliability in psychology.

22
Q

How is Cronbach's alpha given mathematically and conceptually?

A

${}_{\alpha}\rho_X = \frac{n}{n-1} \cdot \frac{\sigma^2_X - \sum_i \sigma_i^2}{\sigma^2_X}$
$= \frac{n}{n-1} \cdot \frac{\sum\sum_{i \neq j} \sigma_{ij}}{\sigma^2_X}$
where n is the number of items

= n/(n-1) * (test score variance - sum of the item variances (diagonal of the covariance matrix)) / test score variance
= n/(n-1) * (sum of all elements of the covariance matrix - sum of its diagonal) / sum of all elements of the covariance matrix

So: n/(n-1) times the sum of the off-diagonal elements of the covariance matrix, divided by the sum of the whole matrix.
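
Both forms of the formula can be computed from an item covariance matrix. This is a minimal sketch assuming NumPy; the function names and the score matrix are hypothetical and only serve the illustration.

```python
import numpy as np


def cronbach_alpha(scores):
    """Alpha from test variance minus the sum of the item variances."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    cov = np.cov(scores, rowvar=False)
    test_variance = cov.sum()       # sum of all elements of the matrix
    item_variances = np.trace(cov)  # sum of the diagonal
    return n_items / (n_items - 1) * (test_variance - item_variances) / test_variance


def cronbach_alpha_offdiag(scores):
    """Alpha from the sum of the off-diagonal covariances."""
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    cov = np.cov(scores, rowvar=False)
    off_diagonal = cov[~np.eye(n_items, dtype=bool)].sum()
    return n_items / (n_items - 1) * off_diagonal / cov.sum()


# Hypothetical scores of 5 persons (rows) on 3 items (columns).
scores = np.array([
    [1, 2, 3],
    [2, 2, 4],
    [3, 4, 4],
    [4, 4, 5],
    [5, 3, 5],
])

alpha_a = cronbach_alpha(scores)
alpha_b = cronbach_alpha_offdiag(scores)
print(alpha_a, alpha_b)
```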

23
Q

When does Cronbach's alpha give the exact reliability? What consequences does this have?

A

Cronbach's alpha gives the reliability if the test is essentially tau-equivalent, i.e.,
$T_i = a_{ij} + T_j$ → $\sigma^2(T_i) = \sigma^2(T_j)$
(with $a_{ij}$ a constant)

Since this is rarely ever fulfilled, Cronbach's alpha usually underestimates the reliability. It is essentially a lower bound of the reliability.

24
Q

How is Cronbach's alpha a lower bound?

A

When Cronbach's alpha is derived, reliability can be rewritten as:
$\rho_X = A + {}_{\alpha}\rho_X - B$
reliability = A + Cronbach's alpha - B

→ If the items/parts are essentially tau-equivalent: $A = B$, so that $A - B = 0$
→ If not, A will always be larger than B: $A > B$. Thus, Cronbach's alpha is a lower bound!

Note: A and B stand for other parts of the derived equation that satisfy these points

25
Q

How was Cronbach's alpha first introduced, and how is that relevant to how it is treated today?

A

Cronbach's alpha was just one of six measures of reliability ($\lambda_1$ to $\lambda_6$) proposed together in a single paper, where it appears as $\lambda_3$. Cronbach reinvented it, but it existed before. Despite this, it is sometimes treated as the only measure of reliability.

26
Q

What measure of reliability was later proposed by Woodward and Bentler (1980)?

A

Greatest lower bound (GLB):

Under Classical Test Theory, the variance of the test scores is given by:
$\sigma^2(X) = \sigma^2(T) + \sigma^2(E)$

Then, the greatest lower bound (Woodward & Bentler, 1980) is given by:
${}_{GLB}\rho_X = 1 - \frac{\max\left(\sum_i \sigma^2(E_i)\right)}{\sigma^2(X)}$

The numerator is the maximum variance of E possible, given that $\sigma^2(X)$ should be positive

27
Q

How can GLB be estimated?

A

GLB can only be estimated using an algorithm (there is an R package for this)

28
Q

How do 4 of these reliability estimates compare to each other in regards to size? What could be inferred from this?

A

πœ†1-6 were proposed, πœ†3 is C.A

πœ†1 < π›ΌπœŒπ‘‹ β‰€πœ†2 ≀ πΊπΏπ΅πœŒπ‘‹
πœ†1 < π›ΌπœŒπ‘‹ =πœ†2 = πΊπΏπ΅πœŒπ‘‹ (for essentially tau-equivalent items)

This could indicate that GLB would be the safest to use since in the worst case it is equal to CA, in the best case its larger than CA. This shows that CA is valuable because it follows from parallel testing but there are indices out there, some of which are arguably better.
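
The first part of the ordering can be checked on a small covariance matrix. This sketch assumes the usual definitions of $\lambda_1$ and $\lambda_2$, which these cards do not state: $\lambda_1 = 1 - \sum_i \sigma_i^2 / \sigma^2_X$ and $\lambda_2 = \lambda_1 + \sqrt{\frac{n}{n-1} \sum\sum_{i \neq j} \sigma_{ij}^2} / \sigma^2_X$. The covariance matrix is invented, and GLB is left out because it needs a dedicated algorithm.

```python
import numpy as np

# Hypothetical covariance matrix of 3 items (not tau-equivalent).
cov = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])

n_items = cov.shape[0]
test_variance = cov.sum()
off_diagonal = cov[~np.eye(n_items, dtype=bool)]  # covariances, i != j

# lambda_1: one minus the share of item variances in the test variance.
lambda_1 = 1 - np.trace(cov) / test_variance

# lambda_3: Cronbach's alpha.
alpha = n_items / (n_items - 1) * lambda_1

# lambda_2: assumed definition, based on the squared covariances.
lambda_2 = lambda_1 + np.sqrt(
    n_items / (n_items - 1) * np.sum(off_diagonal ** 2)
) / test_variance

print(lambda_1, alpha, lambda_2)
```

When all covariances are equal, the same code gives `alpha` equal to `lambda_2`, matching the second line of the ordering.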

29
Q

Name 6 other practically useful statistics from CTT and name when they are useful

A

• Split-half reliability: ${}_{SB}\rho_X = \frac{2\rho_{X_1 X_2}}{1 + \rho_{X_1 X_2}}$
> $X_1$ and $X_2$ are the two halves
> If lower bounds are not meaningful, e.g., in randomized experimental trials

• Test-retest reliability: ${}_{retest}\rho_X = \rho_{X_1 X_2}$
> $X_1$ and $X_2$ are the two administrations
> If the underlying construct is stable enough, and there are no memory effects

• Standard Error of Measurement (SEM): $SEM = \sigma_E = \sigma_X \sqrt{1 - \rho_X}$
> To determine a confidence interval for $\tau_p$

• Correction for attenuation (attenuation: the weakening of an observed correlation by measurement error): $\rho_{T_g T_h} = \frac{\rho_{X_g X_h}}{\sqrt{\rho_{X_g}\,\rho_{X_h}}}$
> Where $X_g$ is from one test and $X_h$ from another

• Item mean
> As a measure of item difficulty

• Item-rest correlation
> As a measure of item discrimination
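
Three of these statistics are simple enough to compute by hand. The sketch below uses the standard forms of the formulas: the Spearman-Brown split-half formula, SEM as $\sigma_X \sqrt{1 - \rho_X}$, and the correction for attenuation with the square root of the product of the two reliabilities. Function names and input values are hypothetical.

```python
import math


def spearman_brown(r_halves):
    """Split-half reliability from the correlation between two test halves."""
    return 2 * r_halves / (1 + r_halves)


def standard_error_of_measurement(sd_x, reliability):
    """SEM: standard deviation of the error scores."""
    return sd_x * math.sqrt(1 - reliability)


def correct_for_attenuation(r_observed, reliability_g, reliability_h):
    """Estimated correlation between the true scores of tests g and h."""
    return r_observed / math.sqrt(reliability_g * reliability_h)


# Hypothetical inputs.
split_half = spearman_brown(0.6)
sem = standard_error_of_measurement(sd_x=15, reliability=0.91)
true_corr = correct_for_attenuation(0.4, reliability_g=0.8, reliability_h=0.5)

# Rough 95% interval around an observed score of 100, using the SEM.
interval = (100 - 1.96 * sem, 100 + 1.96 * sem)

print(split_half, sem, true_corr, interval)
```

The corrected correlation is always at least as large in absolute value as the observed one, because both reliabilities are at most 1.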

30
Q

What criticisms have there been for test theory?

A

The true score is nothing more than the expected value of X on test g:
$X_{gp} = \tau_{gp} + E_{gp}$
$X_{gp} = E(X_{gp}) + E_{gp}$

As $E(X_{gp})$ is just a statistical expectation about test g:

  1. The true score does not necessarily correspond to a unidimensional construct score
  2. Statistics from CTT depend on both the item properties and the properties of the subjects
  3. The true score contains irrelevant but systematic item-specific effects
31
Q

What is meant by saying that the true score does not necessarily correspond to a unidimensional construct score?

A

The true score is just the expected value on a test; it is a purely statistical quantity. Some people give it a kind of magical status and treat it as the construct, a dimension, or a latent variable, but it is just an expected value. If your test accurately measures a unidimensional construct, then your true score might represent that construct, but you still cannot be sure.

32
Q

What is meant by unidimensional constructs and why would we want to measure them?

A

Constructs with just one dimension, e.g. working memory, extraversion, or openness to experience, as opposed to higher-order constructs with multiple dimensions, such as intelligence or emotional intelligence. The benefit of measuring a unidimensional construct is that a high score can be taken to reflect a high level of one variable, rather than possibly high levels of several variables.

33
Q

Every test has a true score. You can therefore apply CTT to any test. You can calculate sum score or reliability and you have applied classical test theory.

What is wrong with this?

A

You may not have checked whether this is the right thing to do and whether CTT is appropriate for your data. E.g. with a questionnaire of three unrelated questions, you could likely get good test-retest reliability, a sum score, etc., despite the test measuring nothing.

Alternatively, a test could be measuring two constructs. This may be observed by looking at the correlation matrix and seeing that there are two groups of questions that correlate among themselves. Cronbach's alpha and GLB, however, sum over all items, so how do you interpret this sum score when it has two dimensions?

34
Q

What does it mean to say that "Statistics from CTT depend on both the item properties and the properties of the subjects"?

A

Each intelligence test ($X_g$, $X_h$, $X_f$, etc.) contains a different true score ($\tau_g$, $\tau_h$, $\tau_f$, etc.) with its own scale, depending on

  1. The number of items (10 items vs 1000 items means steps of 0.1 vs 0.001 on your scale)
  2. The difficulty/discrimination of the items
  3. The skill of the subjects that took the test (e.g. being tested within a high-ability vs a low-ability sample)

However, all of these tests should have the same true score, since they are supposed to be measuring the same thing.

As a result, all statistics from classical test theory depend on
1. the properties of the test
• Item difficulty and item discrimination
2. the properties of the sample
• Mean and variance of the true scores of the subjects

35
Q

How does variance affect reliability?

A

Say a group is measured on a construct on which its members do not differ much (e.g. UvA professors and intelligence). Then it will be hard to obtain a high reliability, even if you always get similar answers, because the variance of the true scores is factored into the reliability equation.
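
The effect follows directly from reliability being true-score variance divided by observed variance. Below is a minimal sketch with invented variances, assuming the same test (and so the same error variance) is given to a varied sample and to a homogeneous sample.

```python
def reliability(true_variance, error_variance):
    """Reliability as true-score variance over observed-score variance."""
    return true_variance / (true_variance + error_variance)


# Hypothetical variances: the error variance is a property of the test
# and stays the same; only the spread of true scores differs.
error_variance = 25.0
general_population = reliability(true_variance=225.0, error_variance=error_variance)
homogeneous_group = reliability(true_variance=25.0, error_variance=error_variance)

print(general_population, homogeneous_group)
```

The test is exactly as precise in both samples, yet the reliability drops from 0.9 to 0.5 in the homogeneous group.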

36
Q

What does it mean to say that the true score contains irrelevant -but systematic- item specific effects?

A

E.g. in the following example:

  1. At parties, I always talk to everybody
  2. I like giving talks for large audiences
  3. In business meetings, I am the centre of attention
  4. If someone hurts me, I will stand up for myself

All four items involve extraversion, and this plays a part in each response. However, each item also carries extra noise (item-specific error) in addition to the measurement error variance, for example because each item takes place in a different setting. Perhaps someone is into parties and has no experience with business. Perhaps someone is very involved in their work and does not go to parties. The sum score nevertheless counts these item-specific effects as part of the true score, because they are systematic and will therefore also contribute to reliability.

37
Q

What was proposed to deal with these criticisms?

A

Latent variable models

38
Q

How do Latent Variable models deal with these criticisms?

A

They specify an explicit measurement model: a statistical model that describes the relationship between the construct and the item. This differs from CTT, where the true score is not an explicit construct but an expected value. In CTT the expected value of a score on item i is equal to the true score; in LVM the expected value depends on the latent variable.

39
Q

What is meant by latent variables and item parameters?

A

Latent variable (person parameter):
Unobserved dimension of individual differences that underlies all items in a test

Item parameters:
Model the item properties (comparable to e.g. item-rest correlation, item means)

40
Q

Show mathematically how, in four important Latent Variable Models, the latent variable plays a part in the measurement model $E(X_{pi} \mid \theta_p)$

A

$\theta_p$ refers to latent variables/person parameters
$\tau_i$, $\mu_{1i}$, $\beta_i$, $\pi_{1i}$, $\lambda_i$ refer to item parameters

• Factor analysis:
e.g., $E(X_{pi} \mid \theta_p) = \tau_i + \lambda_i \theta_p$

• Item response theory:
e.g., $E(X_{pi} \mid \theta_p) = \frac{\exp(\alpha_i \theta_p + \beta_i)}{1 + \exp(\alpha_i \theta_p + \beta_i)}$

• Latent class analysis:
e.g., $E(X_{pi} \mid \theta_p) = \pi_{0i}^{\theta_p} \times \pi_{1i}^{1 - \theta_p}$

• Latent profile analysis:
e.g., $E(X_{pi} \mid \theta_p) = \mu_{0i}^{\theta_p} \times \mu_{1i}^{1 - \theta_p}$
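
To make the role of $\theta_p$ concrete, three of these measurement models can be written as small functions. This is a sketch with hypothetical function names and parameter values; it assumes the class indicator $\theta_p$ in latent class analysis takes the values 0 or 1.

```python
import math


def factor_model_expectation(theta, intercept, loading):
    """Factor analysis: expected item score is linear in theta."""
    return intercept + loading * theta


def irt_expectation(theta, alpha, beta):
    """Item response theory: expected item score is a logistic function of theta."""
    z = alpha * theta + beta
    return math.exp(z) / (1 + math.exp(z))


def latent_class_expectation(theta, pi_0, pi_1):
    """Latent class analysis: theta (0 or 1) selects one of two class probabilities."""
    return pi_0 ** theta * pi_1 ** (1 - theta)


# Hypothetical person and item parameters.
linear_score = factor_model_expectation(theta=2.0, intercept=1.0, loading=0.5)
p_average = irt_expectation(theta=0.0, alpha=1.0, beta=0.0)
p_high = irt_expectation(theta=2.0, alpha=1.0, beta=0.0)
p_class = latent_class_expectation(theta=1, pi_0=0.2, pi_1=0.9)

print(linear_score, p_average, p_high, p_class)
```

In each case the item parameters describe the item, while `theta` describes the person, which is the separation that CTT lacks.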

41
Q

What does a structural model, $E(\theta_p \mid B_p)$, represent?

A

Structural model, $E(\theta_p \mid B_p)$:
A statistical model describing the relation between the construct and other variables, $B_p$
• E.g., similar to a regression model, ANOVA, t-test, etc.

42
Q

What is the relationship between the structural model and the measurement model?

A

With the structural model you can take the latent variable for your construct and use it in a regression model, ANOVA, etc. The measurement model accounts for all the measurement properties of the items, so that the inferences you make about the latent variable with the structural model do not suffer from the problems that the true score suffers from.

43
Q

How do latent variable models address the following criticism of CTT?

The true score does not necessarily correspond to a construct score

A

Latent variable models are falsifiable
• There will be no latent variable in the data of unrelated questions, only item-specific effects, which inflate test-retest reliability
• You will be able to tell from the latent variable variance (it approaches 0)

A latent variable model with one latent variable will also not fit multidimensional data (model fit indices will indicate this). Instead, a latent variable model with two latent variables will fit these data (model fit indices will indicate this).

44
Q

How do latent variable models address the following criticism of CTT?

The true score depends on the scale of test g

A

In a latent variable model, test and sample properties are separated:
• Test properties will be captured by the item parameters
• Sample properties will be captured by the latent variable
Thus, all intelligence tests will be measuring the same latent variable

45
Q

How do latent variable models address the following criticism of CTT?

The true score contains irrelevant but systematic item-specific effects

A

Recall that a latent variable is defined as: "Unobserved dimension of individual differences that underlies all items in a test". Item-specific effects do not underlie all items, so they are not part of the latent variable.

46
Q

Thus summarise the advantages of latent variable models and give two disadvantages

A

• Latent variable models explicitly model the dimensionality of a test
• Latent variable models are falsifiable
• Latent variable models are not test and sample dependent
• Latent variable models explicitly account for item-specific error

But they:
• Require much larger sample sizes
• Are statistically more complex