Lecture 2: Item response theory Flashcards

1
Q

Describe a typical measurement model

A

It describes the relationship between a construct and the other variables measured by a test (Bp): the latent variable θp is measured by three items Xp1, Xp2 and Xp3, each of which carries its own error term, Ep1 to Ep3.

2
Q

Distinguish between the four types of latent variable models by how they’re used

A

Item response theory: continuous latent variable, categorical observed data

Factor analysis: continuous latent variable, continuous observed data

Latent class analysis: categorical latent variable, categorical observed data

Latent profile analysis: categorical latent variable, continuous observed data

3
Q

What could scoring on a unidimensional IRT model look like?

A

The scoring could be:
correct (1); incorrect (0)
yes (1); no (0)
agree (1); disagree (0)

i.e. dichotomously scored (0/1) categorical data

4
Q

Describe a unidimensional IRT model in regards to its function, expected value and when it is suitable.

A

The measurement model is the expectation of the item score as a function of the latent variable:
E(Xpi | θp) = P(Xpi = 1|θp)

We know that Xpi is categorical and θp is continuous. By definition, the expected value of a dichotomous variable (a variable that only takes the values 0 and 1) is the same as the probability of scoring a 1 on that variable.

Therefore the focus is on finding the probability of answering 1 to an item as a function of the latent trait. That probability, as a function of the latent trait, follows an S-shaped curve (see doc): if you are high on the latent variable (e.g. verbal ability), you will have a probability close to 1 of giving the 'correct' answer. Since it is a unidimensional model, it assumes that only one latent variable (e.g. verbal ability) is being measured.

5
Q

Is it common to model unidimensional IRT models?

A

Yes. It is uncommon to fit multidimensional IRT models, because IRT is typically used for strict tests that need to measure one thing (e.g. academic tests). Multidimensional IRT models do exist, but they are not discussed in this lecture; multidimensionality is covered more in the factor analysis (FA) lectures.

6
Q

Aside from correct/incorrect exams, when else are IRT models useful?

A

In measuring a trait or disorder (e.g. depression) or an opinion (e.g. attitudes toward the death penalty).

7
Q

Since IRT is based on an S-shaped relation between Xpi and θp, we need an S shaped function.

Name the two most popular options for this function.

A

Normal ogive function (cumulative normal):
f(x) = Φ(x) = 1/√(2π) ∫_{−∞}^{x} e^(−h²/2) dh

Logistic function:
f(x) = e^x / (1 + e^x)
f(x) = 1 / (1 + e^(−x))

Don’t have to know models by heart

8
Q

What does the normal ogive function do? Why is it suitable for IRT?

A

f(x) = Φ(x) = 1/√(2π) ∫_{−∞}^{x} e^(−h²/2) dh

It is the cumulative normal distribution: it gives the probability of finding a score x or smaller (cf. a probability table / p-value).

If you plot it, with x on the x-axis and the value of the function on the y-axis, you get an S-shaped curve: it approaches 0 for low x and 1 for high x. This makes it well suited to IRT.

9
Q

Explain the logistic function

A

f(x) = e^x / (1 + e^x)
f(x) = 1 / (1 + e^(−x))

e denotes Euler's number (approximately 2.72), so this is simply an exponential function; e^x is also written as exp(x).

10
Q

When do you use
f(x) = e^x / (1 + e^x)

and when do you use
f(x) = 1 / (1 + e^(−x))

A

You can use either form; they are algebraically identical (divide the numerator and denominator of e^x / (1 + e^x) by e^x and you get 1 / (1 + e^(−x))). Sometimes the first form is written, sometimes the second.
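
(Illustration, not lecture material: a minimal Python check that the two forms of the logistic function give the same values; the x values are arbitrary.)

import math

def logistic_a(x):
    return math.exp(x) / (1 + math.exp(x))

def logistic_b(x):
    return 1 / (1 + math.exp(-x))

for x in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    # both forms agree up to floating-point rounding
    print(x, logistic_a(x), logistic_b(x))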

11
Q

Traditionally, which function was used, and when did the other function come into play?

A

Traditionally IRT models have been developed using a normal ogive function. Later the logistic function was used as an approximation as it was easier to compute with.

12
Q

How was the logistic function adapted to make it closer to the normal ogive function?

A

If D ≈ 1.7, then
f(x) = e^(Dx) / (1 + e^(Dx))

is very close to the normal ogive, because the constant makes the logistic function steeper (see doc).
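
(Illustration, not lecture material: a small Python comparison of the D-scaled logistic function with the normal ogive, using the standard normal CDF via the error function; the x values are arbitrary.)

import math

def scaled_logistic(x, D=1.7):
    # logistic function with scaling constant D
    return math.exp(D * x) / (1 + math.exp(D * x))

def normal_ogive(x):
    # cumulative standard normal distribution
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, round(scaled_logistic(x), 3), round(normal_ogive(x), 3))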

13
Q

Why is the logistic function sometimes preferred to the normal ogive function?

A

Because the normal ogive function is quite complicated (it contains an integral), while the logistic function is less complex. This makes it easier to compute with and also easier to take derivatives of the model, which is used a lot in modelling.

14
Q

How useful is the addition of D to the logistic function, and when is it used?

A

It is not very useful; it is mostly used by people who want to stay close to the old normal-ogive framework. It does not really matter: your model will be effectively the same, conceptually the same, and your results will be the same. (It is explained here just in case you see it in some papers.)

15
Q

What was the first IRT model ever developed? What is it known as now?

A

The original 'Rasch model':
P(Xpi = 1 | θp) = e^(θp − bi) / (1 + e^(θp − bi))

Also known as the one-parameter model:
P(Xpi = 1 | θp) = e^(a(θp − bi)) / (1 + e^(a(θp − bi)))

16
Q

Explain the one parameter model

A

P(Xpi = 1 | θp) = e^(a(θp − bi)) / (1 + e^(a(θp − bi)))

e^(a(θp − bi)) / (1 + e^(a(θp − bi))) is the logistic function.

Item parameters:
b tunes the location of the S-shaped curve on the latent trait dimension (the x-axis): changing b shifts the curve to the left or right. b therefore denotes item difficulty: if the curve lies further to the right, the item is more difficult, because even people with a high level of the latent trait struggle to obtain a high probability of a correct answer.

a denotes the slope/steepness of the curve and is the same for all items; the original Rasch model does not include this slope parameter at all.
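
(Illustration, not lecture material: a minimal Python sketch of the one-parameter model with made-up item difficulties. It shows that b shifts the curve and that the probability is exactly 0.5 at θ = b.)

import math

def p_1pl(theta, b, a=1.0):
    # one-parameter logistic model: P(X = 1 | theta)
    return math.exp(a * (theta - b)) / (1 + math.exp(a * (theta - b)))

# an easy item (b = -1) versus a difficult item (b = +1)
for theta in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(theta, round(p_1pl(theta, b=-1.0), 2), round(p_1pl(theta, b=1.0), 2))

print(p_1pl(theta=1.0, b=1.0))  # 0.5: the curve passes through .5 at theta = b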

17
Q

How is the score on the latent trait scaled on these graphs?

A

The mean 𝜃 is 0 and the SD is 1

18
Q

How can you read probabilities from these graphs?

A

You draw a vertical line up from the θ score to the curve and then a horizontal line across to the y-axis: e.g. for bi = 1 and a latent θ score of 1, the probability of answering the item correctly is 0.5. In the one- and two-parameter models, bi is located at the value of θ where the probability of a correct answer is 0.5.

19
Q

What does it mean to say that a and bi are fixed effects while 𝜃 denotes random effect?

A

a and bi are fixed effects, while θ is a random effect. This means that the latent variable θ has a distribution (usually assumed normal), because persons are a sample from the population. The a and b parameters are estimated in the model but typically do not have a distribution, since b describes items and the items are considered fixed. In the one-parameter model a does not differ across items; it too is fixed.

20
Q

What name is given to the S-shaped curves that the normal ogive function and the logistic function are used to describe?

A

Item Characteristic Curve (‘ICC’)

21
Q

What changes in the 2 parameter model?

A

An item-specific discrimination parameter ai is introduced. This means that the steepness can differ between items:

P(Xpi = 1 | θp) = e^(ai(θp − bi)) / (1 + e^(ai(θp − bi)))

22
Q

What does changing the value of ai mean for the function, and how can it be interpreted?

A

A higher value of ai (e.g. 10) means a steeper slope; a smaller value (e.g. 0.1) means a flatter slope. A negative value of ai gives a reversed (decreasing) curve and would typically belong to a contra-indicative item. In the earlier graphs the standard value of a was 1.

ai tunes how well an item can distinguish between people who differ on the latent variable. A large ai means that small differences in the θ score have a large effect on the probability of answering the item correctly around the level of the latent trait denoted by bi. A small ai means that small differences in θ have little effect on the probability of answering correctly, so the item is poor at discriminating people on θ, although it spans a wider range of θ values.

ai is therefore known as the discrimination parameter.
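
(Illustration, not lecture material: a short Python sketch of the two-parameter model with made-up parameter values, showing how ai changes the steepness around b.)

import math

def p_2pl(theta, a, b):
    # two-parameter logistic model: P(X = 1 | theta)
    return math.exp(a * (theta - b)) / (1 + math.exp(a * (theta - b)))

# same difficulty (b = 0), different discrimination
for theta in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    steep = p_2pl(theta, a=3.0, b=0.0)   # highly discriminating item
    flat = p_2pl(theta, a=0.3, b=0.0)    # weakly discriminating item
    print(theta, round(steep, 2), round(flat, 2))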

23
Q

ai is the _______ parameter and bi is the ______ parameter

A

ai is the discrimination parameter and bi is the difficulty parameter

24
Q

Why does it make sense that a contra-indicative item would have a negative ai?

A

Because the probability of a ‘yes’/ correct response gets lower as you are higher on the latent trait

25
Q

Does the one-parameter model have a discrimination parameter?

A

Yes, but it is the same for all items

26
Q

How do you calculate the probability of someone scoring incorrectly on an item?

A

If the probability of scoring correctly under the two-parameter model is
P(Xpi = 1 | θp) = e^(ai(θp − bi)) / (1 + e^(ai(θp − bi)))

then the probability of scoring incorrectly is 1 − P(correct), i.e.:
1 − P(Xpi = 1 | θp) = 1 − e^(ai(θp − bi)) / (1 + e^(ai(θp − bi)))
= (1 + e^(ai(θp − bi))) / (1 + e^(ai(θp − bi))) − e^(ai(θp − bi)) / (1 + e^(ai(θp − bi)))
= 1 / (1 + e^(ai(θp − bi)))

This looks like the second form of the logistic function, but it is NOT the same: that form has a negative exponent, e^(−x), whereas here the exponent ai(θp − bi) is not negated.
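
(Illustration, not lecture material: a quick Python check of this derivation with made-up parameter values; P(correct) and the simplified P(incorrect) sum to 1.)

import math

def p_correct(theta, a, b):
    return math.exp(a * (theta - b)) / (1 + math.exp(a * (theta - b)))

def p_incorrect(theta, a, b):
    # the simplified form derived above
    return 1 / (1 + math.exp(a * (theta - b)))

theta, a, b = 0.5, 1.2, -0.3
print(p_correct(theta, a, b) + p_incorrect(theta, a, b))      # 1.0
print(1 - p_correct(theta, a, b), p_incorrect(theta, a, b))   # identical values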

27
Q

What changes in the three parameter model? What function does this serve?

A

The probability of guessing correctly is added: e.g. on a multiple-choice test you will answer some items correctly even if you have no knowledge of the topic. It can also represent a factor such as fraud or plagiarism, which would explain a high probability despite a low θ score. Therefore the guessing parameter ci is included in the model:

P(Xpi = 1 | θp) = ci + (1 − ci) · e^(ai(θp − bi)) / (1 + e^(ai(θp − bi)))

ai, bi and ci are fixed effects, while θ is still a random effect.
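
(Illustration, not lecture material: a minimal Python sketch of the three-parameter model with made-up values; with ci = 0.25, even very low θ gives a probability of about 0.25.)

import math

def p_3pl(theta, a, b, c):
    # three-parameter logistic model: c is the guessing (lower-bound) parameter
    logistic = math.exp(a * (theta - b)) / (1 + math.exp(a * (theta - b)))
    return c + (1 - c) * logistic

for theta in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    print(theta, round(p_3pl(theta, a=1.5, b=0.0, c=0.25), 2))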

28
Q

What does this guessing parameter mean for the function? (2)

A

It adds a lower bound (lower asymptote) to the item curve. If the item discrimination is negative, it acts as an upper bound instead.

It also means that bi is no longer the point where the probability equals 0.5, since ci raises the lower bound and therefore shifts where 0.5 lies on the curve.

29
Q

Does ci increase item difficulty?

A

There is a slight increase in item difficulty as the value of ci increases, but it is not really worth worrying about.

30
Q

What is meant by a graded response model?

A

A model where each item is scored on ordinal responses, e.g. Likert scales (1–5) for applicability, agreement, etc. It is designed for ordered categories.

31
Q

What model family are graded response models part of and why? Describe what changes in the formula

A

Part of the cumulative model family, because instead of modelling the probability of scoring 1 or 0, P(Xpi = 1 | θp), you model the probability of X being equal to c or higher, P(Xpi ≥ c | θp). This is known as a cumulative probability. This cumulative probability is then modelled with a two-parameter model:

P(Xpi ≥ c | θp) = e^(ai(θp − bic)) / (1 + e^(ai(θp − bic)))

Here bic is a difficulty parameter for a combination of item and category. There are C − 1 category parameters, where C is the number of categories, so a 5-point Likert scale has 4 category parameters. This is consistent with a dichotomous item (two categories, 0 and 1) having one difficulty parameter.

32
Q

How are probabilities calculated in the graded response model?

A

The model can be used to calculate the probability of scoring in a particular category or higher. The location of the curve changes as c changes (e.g. bi3 for P(Xpi ≥ 3 | θp)), and so does the question the function answers: for bi3 it is the probability of scoring 3 or higher on item i. The same holds for bi2 to bi4, but bi5 is different: since there is no score higher than 5 on a 5-point scale, its question becomes simply the probability of scoring 5.

The functions from bi2 (probability of scoring 2 or higher) up to bi5 (probability of scoring 5) form 4 S-shaped curves lying progressively further along the θ axis. This is not yet what we want, however: we want the probability of obtaining a particular score, not that score or higher.

Thus the category probabilities are calculated as:
P(Xpi = c | θp) = P(Xpi ≥ c | θp) − P(Xpi ≥ c + 1 | θp)
e.g.
P(Xpi = 3 | θp) = P(Xpi ≥ 3 | θp) − P(Xpi ≥ 4 | θp)
(because P(3, 4, 5) − P(4, 5) = P(3))
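
(Illustration, not lecture material: a small Python sketch of the graded response model for one hypothetical 5-category item; the a and bic values are made up. It computes the cumulative probabilities and then the category probabilities by subtraction.)

import math

def p_cum(theta, a, b_ic):
    # cumulative probability P(X >= c | theta)
    return math.exp(a * (theta - b_ic)) / (1 + math.exp(a * (theta - b_ic)))

def category_probs(theta, a, thresholds):
    # thresholds are the difficulty parameters b_i2 ... b_i5
    cum = [1.0] + [p_cum(theta, a, b) for b in thresholds] + [0.0]
    # P(X = c) = P(X >= c) - P(X >= c + 1)
    return [cum[c] - cum[c + 1] for c in range(len(cum) - 1)]

probs = category_probs(theta=0.0, a=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs], round(sum(probs), 3))  # five probabilities summing to 1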

33
Q

Describe the graphed functions of the graded response model

A

P(Xpi = 1 | θp) forms an S-shaped curve that increases toward lower (negative) θ values, and P(Xpi = 5 | θp) forms an S-shaped curve that increases toward higher (positive) θ values. In between, P(Xpi = 2 | θp), P(Xpi = 3 | θp) and P(Xpi = 4 | θp) form bell-shaped curves. P(Xpi = 5 | θp) and P(Xpi = 1 | θp) act as the upper and lower extremes of the scale respectively: if they were also bell-shaped, the model would predict that people very low or very high on θ respond with none of the categories.

34
Q

Name two more models belonging to other model families and the family it belongs to

A

Partial Credit Model: Adjacent category model family

Nominal Response Model: Baseline category model family

35
Q

What is characteristic of a partial credit model?

A

It's a model for ordered categories: e.g. subjects can obtain 0–3 points for a given question, like open questions in exams.

36
Q

How does the partial credit model differ to that of the graded response model in regards to its parameters?

A

The curves produced by the model look quite similar; however, the interpretation of the parameters is slightly different. Since it is an adjacent-category model, the b parameters are the points at which the curves of adjacent score categories cross. They still reflect item difficulty, because they still determine the placement of the curves on the x-axis (higher b, further to the right, more difficult).

37
Q

What is characteristic of a Nominal Response Model?

A

It belongs to the baseline-category model family: a model for unordered categories, e.g. an arithmetic test with multiple-choice answer options.

38
Q

Describe the nominal response model in terms of how it works and the subsequent function

A
Example question:
Which of these models does not have a bic parameter?
1. Three parameter model
2. Nominal response model
3. Two-parameter model
4. Graded response model

All the answer options except option 2 would be scored 0 (incorrect), yet the NRM models the probability of each answer option separately. The graphs look like those of the previous two models, with multiple curves, but there is no numeric order to the curves: the outer S-shaped curves may be x = 4 and x = 2, and those in between may be x = 1 and x = 3. E.g. in the example above, option 1 may be the S-shaped curve furthest to the left, because only those with very little knowledge of the subject would select it.

39
Q

There are a lot of models, so efforts have been made to reduce them to a common framework. All models discussed, except ________, are _________ models

A

All models discussed, except the three-parameter model, are generalised linear models.

40
Q

What is meant by a generalised linear model?

A

You can transform the probability of answering 1 (a correct response) to obtain a linear function: for some function f(.), f(P(Xpi = 1 | θp)) is a linear function of θp.

E.g. for the one- and two-parameter models, using the logit function:
f(x) = logit(x) = log(x / (1 − x))

One-parameter model:
logit(P(Xpi = 1 | θp)) = log(P / (1 − P)) = log(e^(a(θp − bi))) = a(θp − bi)

Two-parameter model:
logit(P(Xpi = 1 | θp)) = log(P / (1 − P)) = log(e^(ai(θp − bi))) = ai(θp − bi)
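
(Illustration, not lecture material: a quick Python check, with made-up a and b, that the logit of the two-parameter probability is indeed the linear form ai(θp − bi).)

import math

def p_2pl(theta, a, b):
    return math.exp(a * (theta - b)) / (1 + math.exp(a * (theta - b)))

def logit(p):
    return math.log(p / (1 - p))

a, b = 1.3, 0.4
for theta in [-1.0, 0.0, 1.0, 2.0]:
    # the logit transform recovers a * (theta - b)
    print(round(logit(p_2pl(theta, a, b)), 4), round(a * (theta - b), 4))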

41
Q

When is the probit function used?

A

Something similar can be shown using the probit function; however, there the cumulative normal distribution is used. Therefore the probit link is used for IRT models based on the cumulative normal ogive function.

42
Q

Name the link function and the form produced for each model

A

One-parameter model: Logit / Probit, a(θp − bi)
Two-parameter model: Logit / Probit, ai(θp − bi)
Three-parameter model: – (none; not a generalised linear model)
Graded response model: Cumulative Logit / Probit, ai(θp − bic)
Partial credit model: Adjacent Logit / Probit, ai(θp − bic)
Nominal response model: Logit / Probit, aic(θp − bic)

43
Q

What is the equivalent of reliability (CTT) in IRT? How is this different?

A

Item information: the amount of information that an item provides for a given θp. It is used for similar purposes as reliability, but the important difference is that it gives you, loosely speaking, the reliability at a specific point of θ. In CTT you got the reliability of the whole test, and it was mixed up with person properties and item properties; here you look at a single item rather than the whole test.

44
Q

What is the general item information formula for one and two parameter models?

A

Ii(θp) = ai² · Pi(θp) · Qi(θp)

Pi(θp) = e^(ai(θp − bi)) / (1 + e^(ai(θp − bi)))
Qi(θp) = 1 / (1 + e^(ai(θp − bi)))

where Qi(θp) is the probability of an incorrect response, and ai = a for the one-parameter model.
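
(Illustration, not lecture material: a minimal Python sketch of the item information function for a two-parameter item with made-up values; it also checks the fact, mentioned below, that the maximum at θ = bi equals ai²/4.)

import math

def p_2pl(theta, a, b):
    return math.exp(a * (theta - b)) / (1 + math.exp(a * (theta - b)))

def item_information(theta, a, b):
    # I_i(theta) = a^2 * P * Q for the one- and two-parameter models
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

a, b = 2.0, 0.5
for theta in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    print(theta, round(item_information(theta, a, b), 3))

print(item_information(b, a, b), a ** 2 / 4)  # both 1.0: maximum information at theta = b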

45
Q

What is the general item information formula for three parameter models?

A

Ii(θp) = ai² · (Qi(θp) / Pi(θp)) · (Pi(θp) − ci)² / (1 − ci)²

46
Q

Describe an important characteristic of the item information function of the two-parameter model. Briefly describe a second characteristic.

A

The item gives the most information at the item difficulty (θ = bi, the steepest point of the S-shaped curve; the inflection point). This is intuitive, because at this point small differences in θ make big changes in the probability of a correct answer, compared to the two ends of the distribution.

The amount of information the item gives at this maximum equals ai²/4. This is important because it means that items with a large discrimination give more information about the latent trait.

47
Q

What is different about the item information of three parameter models?

A

Maximum information is not at the item difficulty, because the lower bound changes the curve a bit. At the point θ = bi (the difficulty parameter) the probability is ci + ½(1 − ci). In the item information function the maximum information lies just above the item difficulty.

Therefore the maximum information of a three-parameter model is at bi + … (you don't need to know this exactly).

48
Q

How does the item information function change for contra-indicative items?

A

It doesn't change: the (bell-shaped) information curves look the same when mirrored, and because ai is squared the negative sign doesn't matter.

49
Q

What are the item information values used to calculate? How is this achieved?

A

They are used to calculate the test information: the amount of information that a test provides for a given value of θ. The general formula is:
I(θp) = Σ_{i=1}^{n} Ii(θp)
where n is the number of items, i.e. you just sum all the item information values.
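
(Illustration, not lecture material: a short Python sketch summing item informations into test information for a hypothetical four-item test; the (a, b) values are made up.)

import math

def item_information(theta, a, b):
    p = math.exp(a * (theta - b)) / (1 + math.exp(a * (theta - b)))
    return a ** 2 * p * (1 - p)

def test_information(theta, items):
    # test information = sum of the item informations at this value of theta
    return sum(item_information(theta, a, b) for a, b in items)

items = [(1.0, -1.0), (1.5, 0.0), (2.0, 0.5), (0.8, 1.5)]
for theta in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(theta, round(test_information(theta, items), 3))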

50
Q

What is meant by target information?

A

The information desired by the researcher. E.g. if you want an intelligence test that covers the whole range of the latent variable, you specify a target information function (the shape you want your test information function to have) and then fit item information functions so that together they make that shape (see docs). A target information function can also be used to create cut-off points, so that people at a particular value of θ can be selected.

51
Q

How do you get item information from multiple category models?

A

Each response category gives information:
Iic(θp) = ai² · Pic(θp) · Qic(θp)

where Pic(θp) is the probability of responding in category c,
and Qic(θp) is the probability of not responding in category c.

Item information: Ii(θp) = Σ_{c=1}^{C} Iic(θp)