Lecture 1: Classical Test Theory Flashcards
When psychologists assess the quality of a test, what two metrics do they typically refer to?
Validity and reliability
What is test variance and how do you calculate it? (2)
Item variance is the measure of dispersion of the scores on item i; test variance is the measure of dispersion of the total test scores. A covariance matrix is constructed in which the variance of each item lies along the diagonal and the covariance between each pair of items fills the off-diagonal cells. The test variance is the sum of all the values in this matrix, which equals the variance of the final test scores: it's the same value.
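A minimal sketch of this identity (with simulated, made-up scores for 3 items): the sum of all elements of the item covariance matrix equals the variance of the total test scores.

```python
import numpy as np

# Simulated data: 200 persons answering 3 items that share a common signal.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))          # shared signal across items
items = common + rng.normal(size=(200, 3))  # 200 persons, 3 items

cov = np.cov(items, rowvar=False)           # item variances on the diagonal,
                                            # covariances off the diagonal
test_scores = items.sum(axis=1)             # total test score per person

# Sum of the whole covariance matrix == variance of the test scores:
print(np.isclose(cov.sum(), np.var(test_scores, ddof=1)))  # → True
```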
What's the difference between covariance and correlation, if there is one?
Covariance is an unscaled measure of association between variables; correlation is covariance scaled by the standard deviations, so it is bounded between -1 and 1.
What can be used to infer the dimensionality of a test in CTT?
Principal component analysis (PCA)
What is meant by Principal component analysis (PCA)?
Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability while minimising information loss. It does so by creating new, uncorrelated variables that successively maximise variance. E.g., reducing a description of something (e.g., a tumour) with 30 dimensions (smoothness, volume) to two principal components.
Summarise the main steps of how PCA is calculated
We calculate the covariance matrix of our data, we calculate the eigenvectors of the covariance matrix, and this gives us our principal components. The eigenvector with the largest eigenvalue is the first principal component, and the eigenvector with the smallest eigenvalue is the last principal component.
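The steps above can be sketched as follows, on made-up 3-dimensional data whose axes have clearly different spreads:

```python
import numpy as np

# Simulated data: 500 observations, 3 variables with unequal variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.2])
X = X - X.mean(axis=0)                     # centre the data

cov = np.cov(X, rowvar=False)              # step 1: covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # step 2: eigendecomposition
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

components = X @ eigvecs                   # project onto the principal components
print(eigvals / eigvals.sum())             # proportion of variance per component
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted descending: the first column of `components` is then the first principal component.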
What does X_gp represent in CTT?
X_gp is a random variable denoting the repeatedly sampled measurements of test g on subject p.
What two fundamental equations can be derived from CTT?
- E(X_gp) = T_gp
The expected value of X_gp is equal to the true score.
- E_gp = X_gp - T_gp (for a fixed subject)
(error = observed score - true score)
What three assumptions are there within CTT?
(a) the measurement is on an interval scale;
(b) the variance of observed scores, σ²(X_gp), is finite;
(c) the measurements are repeatedly sampled in a linear, experimentally independent way.
What 8 properties are derived from CTT?
- The expected error score is zero;
- The correlation between true and error scores is zero;
- The correlation between the error score on one measurement and the true score on another measurement is zero;
- The correlation between errors on linearly experimentally independent measurements is zero;
- The expected value of X_gp over persons is equal to the expected value of the true score random variable over persons;
- The variance of E_g over persons is equal to the expected value, over persons, of σ²(X_gp) (the variance within persons);
- Sampling over persons with any T_gp, the expected value of the error score random variable is zero;
- The variance of observed scores is the sum of the variance of true scores and the variance of error scores;
Give proof that the expected error score is 0
*Not required but gives an idea of how CTT is derived
- E(X_gp) = T_gp (fundamental Eq. 1)
- E_gp = X_gp - T_gp (fundamental Eq. 2)
E(E_gp) = E(X_gp - T_gp) = E(X_gp) - E(T_gp)* = T_gp - T_gp = 0
*For one person, T_gp is fixed
Give the proof for the following:
- The correlation between true and error scores is zero;
- The correlation between the error score on one measurement and the true score on another measurement is zero;
- The correlation between errors on linearly experimentally independent measurements is zero;
*Not needed to reproduce exact theorems
- The correlation between true and error scores is zero;
X_g = T_g + E_g
or X = T + E
E(E_gp) = 0 (property 1) ⇒ E(E_g | T_g = T_gp) = E(E_gp) = 0 for all T_gp ⇒ ρ(E_g, T_g) = 0
If the expected error is zero for each person (i.e., at every level of the true score), the error cannot be correlated with the true score.
- The correlation between the error score on one measurement and the true score on another measurement is zero;
E(E_g) = 0 (property 1) ⇒ E(E_g | T_h = T_hp) = 0 for all T_hp ⇒ ρ(E_g, T_h) = 0
Same logic: if the expected error is zero at every level of the other test's true score, it cannot be correlated with that true score.
- The correlation between errors on linearly experimentally independent measurements is zero;
E(E_g) = 0 (property 1) ⇒ E(E_g | E_h = E_hp) = 0 for all E_hp ⇒ ρ(E_g, E_h) = 0
Same logic: if the expected error is zero at every level of the other error, the two errors cannot be correlated.
Give the proof of property 8: The variance of observed scores is the sum of the variance of true scores and the variance of error scores;
- ρ(E_g, T_g) = 0 (property 2)
- X_g = T_g + E_g (population model)
σ²(X_g) = σ²(T_g + E_g)* = σ²(T_g) + σ²(E_g) + 2σ(T_g, E_g)
⇒ σ²(X_g) = σ²(T_g) + σ²(E_g)
or σ²_X = σ²_T + σ²_E
*Variance-of-a-sum rule (expand via the covariance matrix); the covariance term drops out by property 2
How can reliability be defined in these terms (conceptually, with proof)?
Conceptually: reliability is the squared correlation, over persons, between the test score and the true score.
Using fundamental Equations 1 and 2, and property 2, reliability can be defined as:
ρ(X_g, T_g)
= σ(X_g, T_g) / (σ(X_g) σ(T_g))
= σ(T_g + E_g, T_g) / (σ(X_g) σ(T_g))
= (σ(T_g, T_g) + σ(E_g, T_g)) / (σ(X_g) σ(T_g))
= (σ²(T_g) + 0) / (σ(X_g) σ(T_g))
= σ(T_g) / σ(X_g)
⇒ ρ_g = ρ²(X_g, T_g) = (σ(T_g) / σ(X_g))² = σ²(T_g) / σ²(X_g)
Step by step:
= the correlation between test score and true score
= the formula for a correlation
= substitute X_g = T_g + E_g
= the covariance of T + E with T can be written as the covariance of T with T plus the covariance of E with T (rule)
= the covariance of T with itself is its variance, and the covariance between E and T is 0, as explained before
= one σ(T_g) cancels against the denominator, leaving σ(T_g) / σ(X_g)
= not there yet: the reliability of X is the squared correlation between X and T (how much, in %, of the total score variance is due to the true score)
= the correlation squared is what we derived before
= the variance of T divided by the variance of X
How insightful is this definition of reliability?
Quite insightful: since σ²(X_g) = σ²(T_g) + σ²(E_g) (property 8), reliability is the proportion of observed-score variance that is due to true-score variance.
These are theoretical equations, we cannot calculate them without the variance of true scores. How do we try to do this?
The concept of parallel tests: a test g together with a parallel test form h.
What are the assumptions of parallel tests
You assume the true scores are identical on the two tests, and that the error variances are equal.
How are parallel tests g and h defined mathematically?
T_hp = T_gp ⇒ T_h = T_g = T
The true score on one test is the same as the true score on the other for each subject.
σ²(E_hp) = σ²(E_gp) ⇒ σ²(X_h) = σ²(X_g) = σ²(X)
If you have the same error variance, you have the same test score variance.
How do these definitions help us calculate the reliability of a test?
If the two tests have the same true scores and the same error (and hence test score) variance, then the correlation between the test scores equals the reliability of each test.
Prove mathematically that the correlation between the test scores is equal to the reliability of each test
ρ(X_h, X_g)
= σ(T_h + E_h, T_g + E_g) / (σ(X_h) σ(X_g))
= (σ(T_h, T_g) + σ(E_h, T_g) + σ(E_g, T_h) + σ(E_g, E_h)) / (σ(X_h) σ(X_g))
= (σ(T_h, T_g) + 0 + 0 + 0) / (σ(X_h) σ(X_g))
Step by step: plug X = T + E from CTT into the correlation formula (covariance divided by the product of standard deviations); distribute the covariance over the sums (same trick as before, properties 3 + 4); every covariance involving an error term is 0, leaving the covariance of the true scores over the product of the test score standard deviations.
Since parallel tests also say:
T_h = T_g = T
σ²(X_h) = σ²(X_g) = σ²(X):
σ(T_h, T_g) / (σ(X_h) σ(X_g))
= σ²(T) / σ²(X)
= ρ²(X, T)
= ρ_g
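A quick simulation sketch of this result (all parameter values are made up): two parallel forms share the same true scores and have equal error variance, so the correlation between them should approximate the reliability σ²(T)/σ²(X).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_var, error_var = 4.0, 1.0

T = rng.normal(0.0, np.sqrt(true_var), n)         # shared true scores
X_g = T + rng.normal(0.0, np.sqrt(error_var), n)  # parallel form g
X_h = T + rng.normal(0.0, np.sqrt(error_var), n)  # parallel form h

reliability = true_var / (true_var + error_var)   # sigma^2(T) / sigma^2(X) = 0.8
observed = np.corrcoef(X_g, X_h)[0, 1]            # should be close to 0.8
print(reliability, round(observed, 3))
```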
What is the next step towards really calculating reliability?
Cronbach's alpha: it treats all the items as parallel tests/items, in order to use the ideas above to calculate reliability. It is the most used index of reliability in psychology.
How is Cronbach's alpha given mathematically and conceptually?
α_gg = (n / (n - 1)) × (σ²(X) - Σᵢ σ²(Xᵢ)) / σ²(X)
= (n / (n - 1)) × (ΣΣ_{i≠j} σ(Xᵢ, Xⱼ)) / σ²(X)
where n is the number of items
= n/(n-1) times (the test score variance minus the sum of the item variances, i.e., the diagonal of the covariance matrix), divided by the test variance
= n/(n-1) times (the sum of all elements of the covariance matrix minus its diagonal), divided by the sum of all elements of the covariance matrix
So: n/(n-1) times the sum of the off-diagonal elements of the covariance matrix, divided by the sum of the whole matrix.
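The formula above can be sketched directly from the item covariance matrix; the simulated data are made up for illustration (four items sharing one common true score).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2D array with rows = persons, columns = items."""
    cov = np.cov(items, rowvar=False)   # item covariance matrix
    n = cov.shape[0]                    # number of items
    test_var = cov.sum()                # sum of all elements = test variance
    item_vars = np.trace(cov)           # diagonal = sum of item variances
    return (n / (n - 1)) * (test_var - item_vars) / test_var

rng = np.random.default_rng(1)
theta = rng.normal(size=2000)                       # common true score
data = theta[:, None] + rng.normal(size=(2000, 4))  # 4 noisy items
print(round(cronbach_alpha(data), 2))               # population value is 0.8 here
```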
When does Cronbach's alpha give the exact reliability? What consequences does this have?
Cronbach's alpha gives the reliability if the test is essentially tau-equivalent, i.e.,
T_i = T_j + a_ij ⇒ σ²(T_i) = σ²(T_j)
Since this is rarely ever fulfilled, Cronbach's alpha underestimates the reliability: it is a lower bound for the reliability.
How is Cronbachβs alpha a lower bound?
When Cronbach's alpha is derived, the reliability can be rewritten as:
ρ_g = A + α_gg - B
reliability = A + Cronbach's alpha - B
→ If the items/parts are essentially tau-equivalent: A = B, so that A - B = 0
→ If not, A will always be larger than B: A > B. Thus, Cronbach's alpha is a lower bound!
Note: A and B signify other parts of the derived equation that satisfy these points
How was Cronbach's alpha first introduced, and how is that relevant to how it is treated today?
Cronbach's alpha was just one of six proposed measures of reliability in the same paper, although it is sometimes treated as the only measure. It is λ3 of the λ1-λ6 proposed in that paper. Cronbach reinvented it, but it existed before.
What measure of reliability was later proposed by Woodward and Bentler (1980)?
Greatest lower bound:
Under Classical Test Theory, the variance of the test scores is given by:
σ²(X) = σ²(T) + σ²(E)
Then, the greatest lower bound (Woodward & Bentler, 1980) is given by:
GLB_gg = 1 - max(Σᵢ σ²(Eᵢ)) / σ²(X)
i.e., the maximum total error variance possible, given that σ²(T) should remain positive.
How can GLB be estimated?
The GLB can only be estimated numerically, using an algorithm (there's an R package for it).
How do 4 of these reliability estimates compare to each other in regards to size? What could be inferred from this?
λ1-λ6 were proposed; λ3 is Cronbach's alpha (C.A.).
λ1 < α_gg ≤ λ2 ≤ GLB_gg
λ1 < α_gg = λ2 = GLB_gg (for essentially tau-equivalent items)
This suggests that the GLB would be the safest to use: in the worst case it equals C.A., and in the best case it is larger. C.A. remains valuable because it follows from parallel testing, but there are other indices out there, some of which are arguably better.
Name 6 other practically useful statistics from CTT and name when they are useful
• Split-half reliability: ρ_SB = 2ρ_{X1X2} / (1 + ρ_{X1X2})
> X1 and X2 are the two halves
> If lower bounds are not meaningful, e.g., in randomized experimental trials
• Test-retest reliability: ρ_test-retest = ρ_{X1X2}
> X1 and X2 are the two administrations
> If the underlying construct is stable enough, and there are no memory effects
• Standard Error of Measurement (SEM): SEM = σ_E = σ_X √(1 - ρ_g)
> To determine a confidence interval around T_p
• Correction for attenuation (attenuation = the weakening of an observed correlation due to measurement error): ρ_{T_g T_h} = ρ_{X_g X_h} / √(ρ_g ρ_h)
> Where X_g is from one test and X_h from another
β’ Item mean
> As a measure of item difficulty
β’ Item-rest correlation
> As a measure of item discrimination
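A sketch of the first of these formulas as small functions; all input numbers below are hypothetical examples, not values from the lecture.

```python
import math

def spearman_brown(r_halves: float) -> float:
    """Split-half reliability from the correlation between the two halves."""
    return 2 * r_halves / (1 + r_halves)

def standard_error_of_measurement(sd_x: float, reliability: float) -> float:
    """SEM = sigma_X * sqrt(1 - rho_g)."""
    return sd_x * math.sqrt(1 - reliability)

def disattenuate(r_gh: float, rel_g: float, rel_h: float) -> float:
    """Correction for attenuation: estimated true-score correlation."""
    return r_gh / math.sqrt(rel_g * rel_h)

print(round(spearman_brown(0.6), 2))                        # → 0.75
print(round(standard_error_of_measurement(15.0, 0.91), 2))  # → 4.5
print(round(disattenuate(0.30, 0.8, 0.8), 3))
```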
What criticisms have there been for test theory?
The true score is nothing more than the expected value of X on test g:
X_gp = T_gp + E_gp
X_gp = E(X_gp) + E_gp
As E(X_gp) is just a statistical expectation about test g:
- The true score does not necessarily correspond to a unidimensional construct score
- Statistics from CTT depend on both the item properties and the properties of the subjects
- The true score contains irrelevant -but systematic- item specific effects
What is meant by saying that the true score does not necessarily correspond to a unidimensional construct score?
The true score is just an expected value on a test; it is a statistical quantity. Some people seem to give it some kind of magical status and see it as the construct, a dimension, or a latent variable, but it is just an expected value. If your test accurately measures a unidimensional construct, then your true score might represent it, but even then you cannot be sure.
What is meant by unidimensional constructs and why would we want to measure them?
Constructs with just one dimension, e.g., working memory, extraversion, openness to experience, as opposed to higher-order constructs with multiple dimensions, such as intelligence or emotional intelligence. The benefit of trying to measure unidimensional constructs is that a high score can be interpreted as a high level of a single attribute, rather than a possible mixture of high scores on several attributes.
Every test has a true score; you can therefore apply CTT to any test. You can calculate a sum score or a reliability, and you have applied classical test theory.
What is wrong with this?
You may not have checked whether this is the right thing to do and whether CTT is appropriate for your data. E.g., with a questionnaire of three unrelated questions you could likely still get good test-retest reliability, a sum score, etc., despite the test measuring nothing.
Alternatively, a test could be measuring two constructs. This may be observed by looking at the correlation matrix and seeing two groups of questions that correlate with each other. Cronbach's alpha and the GLB, however, sum all items, so how do you interpret this sum score, given that it has two dimensions?
What does it mean to say that βStatistics from CTT depend on both the item properties and the properties of the subjectsβ?
Each intelligence test (X_g, X_h, X_i, etc.) contains a different true score (T_g, T_h, T_i, etc.) with its own scale, depending on
- the number of items (10 items vs 1000 items means a difference of steps of 0.1 or 0.001 on your scale)
- the difficulty/discrimination of the items
- the skill of the subjects that took the test (your score in a high-ability vs a low-ability sample)
However, all of these tests are supposed to be measuring the same thing, so intuitively they should share one true score; in CTT they do not.
As a result, all statistics from classical test theory depend on
1. the properties of the test
β’ Item difficulty and item discrimination
2. the properties of the sample
β’ Mean and variance of the true scores of the subjects
How does variance affect reliability?
Say a group that does not differ much on a construct is measured accurately (e.g., UvA professors and intelligence): it will then be hard to obtain high reliability, even if you always get similar answers, because the true-score variance is factored into the reliability equation (it is the numerator).
What does it mean to say that the true score contains irrelevant -but systematic- item specific effects?
E.g in the following example
- At parties, I always talk to everybody
- I like giving talks for large audiences
- In business meetings, I am the centre of attention
- If someone hurts me, I will stand up for myself
All involve extraversion, and extraversion plays a role in each answer, but each item also carries extra noise (item-specific error) on top of the measurement error variance, for example because each item takes place in a different setting. Perhaps someone loves parties but has no experience with business meetings; perhaps someone is very involved in their work and never goes to parties. The sum score nevertheless counts the answers to all these items as part of the true score, because these item-specific effects are systematic and therefore also carry reliability.
What was proposed to deal with these criticisms?
Latent variable models
How do Latent Variable models deal with these criticisms?
They specify an explicit measurement model: a statistical model which describes the relationship between the construct and the items. This differs from CTT, where the true score is not an explicit construct but an expected value. In CTT the expected value of the score on item i equals the true score; in latent variable models the expected value depends on the latent variable.
What is meant by latent variables and item parameters?
Latent variable (person parameter):
Unobserved dimension of individual differences
that underlies all items in a test
Item parameters:
Model the item properties (comparable to e.g.,
item-rest correlation, item means)
Show mathematically how, in four important latent variable models, the latent variable enters the measurement model E(X_ip | θ_p)
θ_p refers to the latent variable / person parameter
μ_i, λ_i, α_i, β_i, π_0i, π_1i refer to item parameters
• Factor analysis:
e.g., E(X_ip | θ_p) = μ_i + λ_i θ_p
• Item response theory:
e.g., E(X_ip | θ_p) = exp(α_i θ_p + β_i) / (1 + exp(α_i θ_p + β_i))
• Latent class analysis: e.g., E(X_ip | θ_p) = π_0i^θ_p × π_1i^(1-θ_p)
• Latent profile analysis: e.g., E(X_ip | θ_p) = π_0i^θ_p × π_1i^(1-θ_p)
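The IRT measurement model above can be sketched as a small function: the expected score on item i given latent trait θ_p. The α (discrimination) and β (location) values below are illustrative assumptions, not values from the lecture.

```python
import math

def irt_expected_score(theta: float, alpha: float, beta: float) -> float:
    """Logistic IRT model: E(X_ip | theta_p) = exp(a*theta+b) / (1 + exp(a*theta+b))."""
    z = alpha * theta + beta
    return math.exp(z) / (1 + math.exp(z))

# The expected score rises with theta, and a larger alpha makes the item
# discriminate more sharply between low and high trait levels:
print(round(irt_expected_score(0.0, 1.0, 0.0), 2))  # → 0.5
print(round(irt_expected_score(1.0, 2.0, 0.0), 2))  # → 0.88
```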
What does a structural model, E(θ_p | B_p), represent?
Structural model, E(θ_p | B_p):
A statistical model describing the relation between the construct and other variables, B_p
β’ E.g., similar to a regression model, ANOVA, t-test, etc
What is the relationship between the structural model and the measurement model?
With the structural model you can take your latent variable for the construct and enter it into a regression model, ANOVA, etc. The measurement model accounts for all the measurement properties of the items, so that the inferences you make with the structural model about the latent variable do not suffer from the problems the true score suffers from.
How do latent variable models address the following criticism of CTT?
The true score does not necessarily correspond to a construct score
Latent variable models are falsifiable
• There will be no latent variable in the data of unrelated questions
• only item-specific effects, which inflate test-retest reliability
• You will be able to tell from the latent variable variance (it approaches 0)
A latent variable model with one latent variable will also not fit multidimensional data (model fit indices will indicate this).
Instead, a latent variable model with two latent variables will fit these data (model fit indices will indicate this).
How do latent variable models address the following criticism of CTT?
The true score depends on the scale of test g
In a latent variable model, test and sample properties are separated:
β’ Test properties will be captured by the item parameters
β’ Sample properties will be captured by the latent variable
Thus, all intelligence tests will be measuring the same latent variable
How do latent variable models address the following criticism of CTT?
The true score contains irrelevant but systematic
item specific effects
Recall that a latent variable is defined as: "Unobserved dimension of individual differences that underlies all items in a test". Item-specific effects do not underlie all items, so they are not absorbed into the latent variable; they end up in the item-specific (residual) terms of the model instead.
Thus summarise the advantages of latent variable models and give two disadvantages
Latent variables explicitly model the dimensionality of a test
Latent variable models are falsifiable
Latent variable models are not test and sample dependent
Latent variable models explicitly account for item specific error
But:
Require much larger sample sizes
Statistically more complex