Lecture 4 Modern Test Theory Flashcards

1
Q

From classical test theory to modern test theory

What are classical test theory advantages?

A
  1. Allows for calculating reliability
  2. Intuitive and easy to apply
  3. It’s in SPSS and it’s easy to do in Excel
  4. No large sample sizes/many items needed
2
Q

What are the disadvantages of classical test theory?

A
  1. Focus on the test, not on the items
  2. Test properties depend on the population (e.g. reliability and difficulty of a test should be generalisable to different populations, but under CTT they are not)
  3. Person properties depend on the test (i.e. sum score is higher if the test is easy and lower if the test is difficult)

Modern test theory addresses these disadvantages

3
Q

What is Modern Test Theory? What assumption does it rely on?

A

Specify a measurement model in which we mathematically link the item scores to the construct (= latent variable/latent trait/factor)
- The idea of reflective measurement - the construct affects the item scores

Assumption:
- Unidimensionality = you only measure 1 construct

The book uses trait level, Dylan uses latent variable, but it’s the same thing

4
Q

What is Item Response Theory?

A

Specific form of modern test theory where there is a specific mathematical link between the latent variable and the item
- Individual’s response to a particular test item is influenced by qualities of the individual (trait level) and by qualities of the item (difficulty level)

5
Q

What does each variable mean on the graph demonstrating item response theory?

A

Picture 1
X-axis = the latent variable (trait level → level of the relevant psychological construct)
- Each subject has a position on the latent variable
- 0 on the x-axis is the average of the latent variable (a person at 0 has a 50% chance of answering an item of average difficulty correctly)

Y-axis = probability of a correct response (0 to 1, where 1 means a certain correct answer)
P(Xis = 1|…) → the probability that a correct response will be made by a particular individual when answering a particular item

6
Q

What is the name of the function? Why is it helpful?

A

The graph is a logistic (s-shaped) function
- Runs from 0 to 1 - exactly what we need, because we are modelling the probability of a correct response and thereby accounting for measurement error
- This is done through the item characteristic curve

7
Q

What is an item characteristic curve?

A

A graphical display linking respondents’ trait levels to the probability of correctly answering an item
- There is a curve like this for every item

8
Q

How does the position of the curve change with different difficulty?

A

All the way to the left
- very easy item because the person who is below average on the latent variable has a probability of answering correctly very close to 1

All the way to the right
- very difficult item because the person who is above average on the latent variable has probability close to 0

The position of the curve on the latent (x) axis depends on how difficult the item is

9
Q

What is the Rasch model? What is its function formula and what do the variables mean?

A

Picture 2
Function formula:
𝑃(𝑋𝑖𝑠 = 1|𝜃𝑠, 𝛽𝑖) = (𝑒^(𝜃𝑠−𝛽𝑖))/(1 + 𝑒^(𝜃𝑠−𝛽𝑖))
- 𝑃(𝑋𝑖𝑠 = 1|𝜃𝑠, 𝛽𝑖) → the probability that subject s responds correctly to item i
- The probability of a correct response only depends on the latent variable and item difficulty

10
Q

What do the variables in Rasch’s model function mean?

A
  • 𝑋𝑖𝑠 = 1 → a ‘correct’ (1) response (X) to item (i) by subject (s)
  • 𝛽 → the difficulty of an item; can be any positive or negative number (or zero)
  • The larger the 𝛽 value, the more difficult the item
  • 𝜃 → latent variable (tells us how well a certain subject scores on the variable)
  • e ≈ 2.72 (the base of the natural logarithm)
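The Rasch formula above can be sketched in a few lines of Python (the function name is mine, not from the lecture):

```python
import math

def rasch_probability(theta, beta):
    """P(X = 1 | theta, beta): probability of a correct response
    under the Rasch model, given trait level theta and item
    difficulty beta."""
    return math.exp(theta - beta) / (1 + math.exp(theta - beta))

# An average person (theta = 0) on an average item (beta = 0)
# has a 50% chance of answering correctly.
print(rasch_probability(0.0, 0.0))   # → 0.5
# A harder item (beta = 1) lowers that probability.
print(rasch_probability(0.0, 1.0))   # ≈ 0.27
```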
11
Q

What is the Two Parameter logistic model (2PL)? How does it differ from Rasch Model?

A

Similar to the Rasch model (s-shaped function, 𝛽 parameter)

Formula: 𝑃(𝑋𝑖𝑠 = 1|𝜃𝑠, 𝛽𝑖, 𝛼𝑖) = (𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖)))/(1 + 𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖)))
The probability of a respondent answering an item correctly is conditional on the respondent’s trait level (latent variable), the item’s difficulty, and the item’s discrimination
Now there is an additional parameter α𝑖 = item discrimination
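A minimal sketch of the 2PL formula, extending the Rasch function with the discrimination parameter (names and values are illustrative):

```python
import math

def p_2pl(theta, beta, alpha):
    """Two-parameter logistic model: adds item discrimination alpha
    to the Rasch model."""
    return math.exp(alpha * (theta - beta)) / (1 + math.exp(alpha * (theta - beta)))

# With alpha = 1 the 2PL reduces to the Rasch model;
# a larger alpha makes the curve steeper around beta.
print(p_2pl(0.5, 0.0, 1.0))  # ≈ 0.62
print(p_2pl(0.5, 0.0, 2.5))  # ≈ 0.78
```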

12
Q

What is an item discrimination?

A

The steepness of the ICC indicates the item’s ability to discriminate between individuals with different trait levels

  • It indicates the relevance of the item to the latent variable being measured by the test
13
Q

How can the item discrimination value be interpreted? Is it mostly positive or negative?

A
  • Mostly a positive number, but can be negative for contra-indicative items
  • The larger the number the better the test can detect differences = strong consistency between an item and the underlying latent variable

↪ The steeper the curve (the larger α𝑖), the more differently two subjects score on the item (bigger difference in probability of a correct response), even though they might be very close on the latent variable (picture 3)

14
Q

What is the Three Parameter logistic model?

A

Picture 4
𝑐𝑖 = guessing value → lower-bound probability of a correct answer purely on the basis of chance
𝑃(𝑋𝑖𝑠 = 1|𝜃𝑠, 𝛽𝑖, 𝛼𝑖, 𝑐𝑖) = 𝑐𝑖 + (1 − 𝑐𝑖) * (𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖)))/(1 + 𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖)))
So the curve doesn’t start at 0 because people are assumed to guess on the more difficult items
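The 3PL formula with its guessing floor can be sketched like this (the parameter values are made up for illustration):

```python
import math

def p_3pl(theta, beta, alpha, c):
    """Three-parameter logistic model: c is the guessing value,
    the lower bound of the curve."""
    p_2pl = math.exp(alpha * (theta - beta)) / (1 + math.exp(alpha * (theta - beta)))
    return c + (1 - c) * p_2pl

# Even a very low-ability respondent (theta = -4) still has
# roughly the guessing probability of answering correctly.
print(p_3pl(-4.0, 0.0, 1.5, 0.25))  # ≈ 0.25
```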

15
Q

What does the guessing value depend on?

A

Depends on the number of response options available (4 options → guessing will produce a correct answer 25% of the time so c = 0.25)

16
Q

What is the Graded Response model (GRM)?

A

Picture 5
A model for Likert-scale items (the other models are for binary items)

17
Q

What is the function formula of the GRM and how does it differ from the 2PL?

A
  • Separate item characteristic curve for each response option → 𝑃(𝑋𝑖𝑠 > 𝑗|𝜃𝑠, 𝛽𝑖𝑗, 𝛼𝑖) = (𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖𝑗)))/(1 + 𝑒^(𝛼𝑖(𝜃𝑠−𝛽𝑖𝑗)))
  • The function formula is the same as for 2PL but now the item difficulty (𝛽) is specific to each response option
18
Q

What does the graph of GRM show? Use Dom as an example

A

Picture 5

  • The characteristic curve of each response option is positioned on the latent variable based on each difficulty
  • The probability of each response option for each person (on the latent variable axis) is shown in this model
  • The smiley face (let’s call him Dom) has a probability of ~0 of choosing option 2, ~0.2 of choosing option 5, and ~0.5 of choosing option 4…
19
Q

How many difficulty parameters are there in GRM?

A

There are m-1 difficulty parameters (𝛽𝑖𝑗) for each item (m = number of response options)

  • Each difficulty value represents the trait level required to move from one response option to the next ‘‘higher’’ one on the scale → making each comparison dichotomous instead of polytomous like the Likert scale itself
  • That’s why the formula says Xis is j or higher: we always compare against the next-higher category
20
Q

How do we use the GRM formula and the difficulty parameters to conclude what is a person’s most likely response?

Long flashcard - don’t remember, just read it and try to understand

A
  1. We are given difficulty parameters 𝛽𝑖𝑗 for each distinction (if there are 5 options on the scale, we will have 4 distinctions → m-1)
  2. Let’s assume a person has an average level of extroversion (𝜃 = 0) and 𝛼𝑖 = 2.32
  3. With this data we can calculate the four probabilities of each response option (picture 11)
  4. The probabilities become smaller as the response options become more extremely positive (more difficult)
  5. With these four values we can estimate the probability that a person will choose a specific response option when responding to this item. We do this by computing the difference between two adjacent ‘‘range’’ probabilities
  6. j refers to one response option and j-1 refers to the immediately prior option (picture 12)
  7. Thus we see that the person is most likely to respond ‘‘neutral’’, since this option has the largest probability (0.53)

People with low trait levels will be relatively likely to respond with the lower response options
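The steps above can be sketched in Python. The discrimination α = 2.32 comes from the card; the four boundary difficulties are hypothetical values chosen for illustration, not the ones from the lecture slides:

```python
import math

def p_at_least(theta, beta_ij, alpha):
    """P(X >= j): probability of responding in category j or higher (2PL form)."""
    return math.exp(alpha * (theta - beta_ij)) / (1 + math.exp(alpha * (theta - beta_ij)))

alpha = 2.32                      # discrimination from the example
theta = 0.0                       # average trait level
betas = [-1.5, -0.5, 0.5, 1.5]    # hypothetical m-1 = 4 boundary difficulties

# Cumulative probabilities P(X >= j) for each boundary.
cum = [p_at_least(theta, b, alpha) for b in betas]

# Category probabilities are differences of adjacent cumulative values.
probs = []
prev = 1.0
for c in cum:
    probs.append(prev - c)
    prev = c
probs.append(prev)                # last category: P(X >= last boundary)

print([round(p, 2) for p in probs])  # → [0.03, 0.21, 0.52, 0.21, 0.03]
# The middle ("neutral") category gets the largest probability for theta = 0.
```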

21
Q

What is the model fit?

A

Whether the actual responses to a set of items are well represented by a given measurement model that the test user has chosen (e.g. 1PL or 2PL)

  • Once evidence of ‘‘good fit’’ is obtained, test users might proceed to examine item parameters
  • If the model turns out to be a poor fit, the test user should be cautious about interpreting the info
22
Q

What is the scale analysis as an application of the item response theory?

A

You look at items on the questionnaire and try grouping them together based on strength of the item - is the item about the latent variable?
- You then run an analysis in a computer programme which calculates the items’ difficulty levels and discrimination values
- The higher the discrimination value, the better the item, since it distinguishes well between respondents at different positions on the latent variable and adds important info to the results

23
Q

What is the item information?

A

Tells us how much information an item provides at a given position on the latent variable

24
Q

What is beneficial about the IRT approach when it comes to item information?

A

IRT approach allows for the possibility that a test might be better at reflecting differences at some trait levels rather than at other trait levels

25
Q

Why is item information important for the psychometric properties of a test?

A
  • If a test’s items have characteristics (e.g., item difficulty levels) that are more strongly represented at some trait levels than at others, then the test’s psychometric quality might differ across trait levels
  • A test provides ‘good info’ when it can accurately detect differences between individuals who have different trait levels
  • To reflect much smaller and more subtle differences between test takers, we need a test with stronger psychometric properties
26
Q

What do we need to compute the item’s information value I(θ) at a particular trait level?

A

We need the probability that a respondent with a particular trait level will answer the item correctly, Pi(θ)
I(θ) = Pi(θ)(1 − Pi(θ))

No need to remember the formula, just understand it
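A quick sketch of that information formula, using the Rasch probability for Pi(θ) (function names are mine):

```python
import math

def p_correct(theta, beta):
    """Rasch probability of a correct response."""
    return math.exp(theta - beta) / (1 + math.exp(theta - beta))

def item_information(theta, beta):
    """I(theta) = P(theta) * (1 - P(theta))."""
    p = p_correct(theta, beta)
    return p * (1 - p)

# Information peaks where theta equals the item difficulty (P = 0.5).
print(item_information(0.0, 0.0))   # → 0.25 (the maximum)
print(item_information(2.0, 0.0))   # smaller: trait far above difficulty
```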

27
Q

What is an item information curve(function)?

A

Quantifies item information by computing information values at many trait levels (picture 6)
- Tells us how much information an item is giving about the latent variable at certain difficulty levels

28
Q

Interpret the Item information function (IIF) (lower graph on picture 6)

A

The item provides the most information at trait levels around the item’s difficulty (β), where the peak of the bell curve is located

The amount of information decreases as θ moves away from β in either direction
↪ If the respondent’s trait level is much higher or lower than the difficulty, the item provides less information about their trait

The width of the curve relates to the discrimination parameter (α): a steeper ICC (higher α) leads to a narrower, taller IIF, meaning the item provides more precise information over a smaller range of trait levels

29
Q

What does the top graph of picture 7 represent?

A

Together, the curves show how different items target different levels of the latent trait, providing a range of information across the scale

30
Q

What is scale information function (bottom graph of picture 7)?

A

Evaluates the quality of the test as a whole

The sum of the individual item information functions, indicating that the scale provides the most information at trait levels where the individual items overlap

  • Useful for illustrating the degree to which a test provides different quality at different trait levels
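The summing can be sketched like this; the five item difficulties are hypothetical, and the Rasch information formula is assumed:

```python
import math

def item_information(theta, beta):
    """Rasch item information I(theta) = P(1 - P)."""
    p = math.exp(theta - beta) / (1 + math.exp(theta - beta))
    return p * (1 - p)

def scale_information(theta, betas):
    """Scale information: the sum of the item information functions."""
    return sum(item_information(theta, b) for b in betas)

# Hypothetical five-item test with difficulties spread across the trait range.
betas = [-2.0, -1.0, 0.0, 1.0, 2.0]
for theta in (-3.0, 0.0, 3.0):
    print(theta, round(scale_information(theta, betas), 2))
# Information is highest near the middle, where the item difficulties cluster,
# and drops off at the extremes.
```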
31
Q

Interpret the scale information function on picture 7 (bottom)

A

The peak - where the items provide the most combined information (likely near the middle of the trait range)
↪ indicates that the test is most reliable at these trait levels

The width - the range of θ values where the scale provides useful information
↪ The scale will provide less information at the extreme ends (very low or very high trait levels), where only one or two items contribute to measurement

32
Q

What does the scale information function illustrate in terms of the IRT perspective? How does this differ in the CTT perspective?

A

From IRT perspective, a test’s psychometric quality can vary across trait levels

  • Different from CTT, where a test has one reliability that can be estimated using an index such as a coefficient alpha
33
Q

What can we use the scale information for?

A
  1. Norm-referenced test
  2. Criterion referenced test

Picture 8

34
Q

What is the function of norm-referenced tests?

A

Compare score to population (e.g. intelligence test)

  • We need a broad range of information so that we have enough info for each individual in the population

Picture 8 top graph

35
Q

What is the function of criterion-referenced tests?

A

Determine whether someone passes a certain cut-off (e.g. a personnel selection test)

  • We need the most information at the cut-off point so that we don’t make mistakes in determining whether a person passes or not
  • Far above or below the cut-off we don’t need that much info, since we either take the person or not based on the selection test
36
Q

What is Differential Item Functioning?

A

Used to study whether an item is fair towards different groups
- The groups should be equal on the latent variable, so we would expect no mean difference in item scores between the two groups

37
Q

Why is it important to find differential item functioning? What if an item shows DIF?

A

So that we can meaningfully compare the groups on an item

  • If a test includes another variable (besides the latent one) that plays a role in how well one of the groups is able to answer, it displays differential item functioning (might be biased towards one group)
    ↪ the item functions differently in the different groups
38
Q

How does item differential functioning look on a graph?

A

Picture 9
After computing Item Characteristic Curve for both groups, we see that the item is more difficult (more to the right on the graph) for one group even though they have the same ability on the latent variable
- E.g. a Dutch student has a higher probability of answering correctly even though international and Dutch students have the same statistical knowledge
- A mean difference on the latent variable doesn’t matter, as we compare groups at each value of the variable - the students being compared all have the same ability on the latent variable
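A toy illustration of this kind of (uniform) DIF under the Rasch model: the same item effectively has a higher difficulty in one group, so at equal trait levels the probabilities differ (the β values are made up):

```python
import math

def p_correct(theta, beta):
    """Rasch probability of a correct response."""
    return math.exp(theta - beta) / (1 + math.exp(theta - beta))

# Hypothetical DIF: the item is harder for group B (beta = 1.0)
# than for group A (beta = 0.0), even at identical ability.
theta = 0.0  # two students with the same trait level
print(p_correct(theta, beta=0.0))  # group A: 0.5
print(p_correct(theta, beta=1.0))  # group B: ≈ 0.27
```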

39
Q

What else is differential item functioning useful for?

A
  • Detecting whether the item discriminates differently for one group
  • E.g. for international students it doesn’t discriminate that well (much more false positives and false negatives)
  • Guessing is not commonly considered as a source of DIF

Picture 10

40
Q

What is Person Fit?

A

An attempt to identify individuals whose response pattern doesn’t seem to fit any of the expected patterns of responses to a set of items
- E.g. finding a person who responds correctly to difficult items but incorrectly to easy items

41
Q

What does poor person fit indicate?

A
  • It could indicate cheating, random responding, low motivation, cultural bias of the test, intentional misrepresentation, or even a scoring or administration error
  • It might also reveal that a person’s personality is unique and that they don’t fit the ‘‘typically expected’’ pattern of responses
42
Q

What is a computerized adaptive testing?

A
  • Provides an accurate and very efficient assessment of an individual’s position on the latent variable
  • It uses an item bank (many items of which we know the difficulty and discrimination)
  • The computer selects the item which gives the most information for you (probability of a correct response of 50%)
    ↪ The selected items can be completely different for each subject, since subjects differ in their position on the latent variable
43
Q

How does the computerised adaptive testing work?

A
  • If you answer correctly, you get a more difficult item
  • If you answer incorrectly, you get an easier item

↪ Through this, the computerised adaptive test algorithm is trying to find your latent value
- It will start by assuming that you are at the average value (theta = 0), then it continues giving you items based on whether your responses are correct or not
- It continues until the estimate for the latent trait doesn’t change anymore
- That is your latent variable
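A toy sketch of the CAT loop described above. The fixed ±0.5 step and the idealised deterministic respondent are simplifications of my own; real CAT re-estimates θ by maximum likelihood after every response:

```python
import math

def p_correct(theta, beta):
    """Rasch probability of a correct response."""
    return math.exp(theta - beta) / (1 + math.exp(theta - beta))

def next_item(theta_hat, item_bank):
    """Select the unanswered item whose difficulty is closest to the
    current estimate: under the Rasch model that item has P closest
    to 0.5 and therefore gives the most information."""
    return min(item_bank, key=lambda beta: abs(beta - theta_hat))

# Hypothetical item bank with known difficulties from -3.0 to 3.0.
item_bank = [b / 2 for b in range(-6, 7)]
true_theta = 1.0   # the respondent's (unknown) trait level
theta_hat = 0.0    # start at the average, as the card describes

for _ in range(10):
    beta = next_item(theta_hat, item_bank)
    item_bank.remove(beta)
    # Idealised respondent: answers correctly whenever the chance of
    # success is at least 50% (a real respondent is stochastic).
    correct = p_correct(true_theta, beta) >= 0.5
    # Crude illustrative update: step up after a correct answer,
    # down after an incorrect one.
    theta_hat += 0.5 if correct else -0.5

print(theta_hat)  # → 1.0, the estimate homes in on the true trait level
```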

44
Q

What are the advantages of Item Response Theory?

A
  1. Population and test are independent - subject characteristics captured by the latent variable, test characteristics captured by the item discrimination and difficulty
  2. Focus on items, not the test
45
Q

What are the disadvantages of the Item Response Theory?

A
  1. Statistically more complex - need computer to calculate the latent variable and other characteristics
  2. Needs large samples - to be able to estimate all the parameters
46
Q

Contrast Classical Test Theory (CTT) and Item Response Theory (IRT)

A

Classical test theory

  • Sum score is taken as the construct score
  • Focus on test
  • Reliability of a test

Item Response Theory

  • Score on latent variable is taken as the construct score
  • Focus on items
    ↪ Difficulty / discrimination
  • Item/Scale information - instead of reliability of a test
  • Computerized Adaptive Testing (CAT)