L4: Modern test theory Flashcards

1
Q

what are the pros of classical test theory?

A
  • intuitive and easy to apply
  • available in SPSS and easy to do in Excel
  • no large sample sizes or large numbers of items needed
  • reliability is easy to calculate from it
2
Q

what are the cons of classical test theory?

A
  • focus is on the test as a whole, not on the individual items
  • test properties depend on the population (e.g. the reliability and difficulty of a test)
  • person properties depend on the test (e.g. the sum score is higher if the test is easy and lower if the test is difficult)
3
Q

what does modern test theory do?

A

It specifies a measurement model that mathematically links the item scores to the construct (also called the latent variable, trait, or factor).

4
Q

what are the assumptions of modern test theory?

A

unidimensionality: you only measure 1 construct

5
Q

what is item response theory?

A

A specific theory within modern test theory about how the measured construct is related to its items. It uses one of four models, each of which has an item characteristic curve (ICC).

6
Q

what are the 4 item response theory models?

A
  • 1-parameter logistic (1PL) / Rasch model: the item characteristic curve gives the probability of answering the item correctly as a function of your level on the construct, using only item difficulty (beta)
  • 2-parameter logistic (2PL) model: uses item discrimination (alpha, the steepness of the curve) as well as item difficulty
  • 3-parameter logistic (3PL) model: uses a guessing parameter (Ci), as well as item discrimination (alpha, the steepness of the curve) and item difficulty
  • graded response model (GRM) for Likert scales
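The three dichotomous models can be sketched as one logistic function with progressively more parameters. This is only an illustration in the common textbook parameterization: the names theta (trait level), b (difficulty, beta on the card), a (discrimination, alpha) and c (guessing, Ci) are notation chosen here, and fixing a = 1 for the 1PL is an assumption.

```python
import math


def icc_3pl(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: trait level, b: difficulty, a: discrimination, c: guessing.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def icc_2pl(theta, b, a):
    """2PL: the 3PL without guessing (c = 0)."""
    return icc_3pl(theta, b, a=a, c=0.0)


def icc_1pl(theta, b):
    """1PL / Rasch: only difficulty varies (a assumed to be 1, c = 0)."""
    return icc_3pl(theta, b, a=1.0, c=0.0)
```

With no guessing, the probability is exactly 0.5 when theta equals b; with a guessing parameter c, the curve never drops below c.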
7
Q

define item characteristic curve

A

A graphical representation of the probability of answering an item correctly across different levels of the trait.

8
Q

what can you do with the 4 models of item response theory?

A
  • scale analysis: judge from the curves how good the items are in terms of difficulty (ideally medium), discrimination (ideally high), guessing, etc.
  • test construction and improvement: obtain item information from the item characteristic curve and select the items that provide the most information
  • person fit: identifies people whose response patterns do not fit the expected model, which could indicate guessing or inconsistent responding
  • study item fairness (differential item functioning, DIF)
  • computerized adaptive testing (CAT)
9
Q

what is the item information function?

A

A function that shows at which point along the item characteristic curve (that is, at which trait level) the item provides the most information.
Adding up the information functions of all items gives the scale information function.
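As a rough illustration, the standard 2PL item information is I(theta) = a^2 * P(theta) * (1 - P(theta)), and the scale information is the sum over items. The function names and the (a, b) tuple layout below are assumptions made for this sketch, not something taken from the cards.

```python
import math


def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Sum of item information over a list of (a, b) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)
```

Because P * (1 - P) is largest at P = 0.5, a 2PL item is most informative at the trait level equal to its difficulty, and a steeper item (larger a) gives more information there.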

10
Q

what is the scale information function? what is it useful for?

A

It shows the total information you get from all item information functions combined.

Useful when making a
- norm-referenced test (when you compare a score to the population, e.g. an intelligence test), because then you need to make sure that the scale information covers the whole range of the construct in the population, so the function should be fairly flat across that range
- criterion-referenced test (when you determine whether someone passes a certain cut-off, e.g. an exam), because then you want the information concentrated around the cut-off

11
Q

how can you study item fairness with item response theory?

A

Check whether there is differential item functioning (DIF): the item works differently for different groups. Normally, people from different groups who have the same trait level should not differ in their expected item score.
Example: a question in a statistics exam that references the Dutch education system is harder for non-Dutch people to answer.
A mean difference in the trait between groups does not matter, because you compare two people with the same trait level and see how likely they are to respond correctly.
- guessing is not commonly considered a source of DIF
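A minimal sketch of the idea of comparing two people at the same trait level: here DIF is modelled simply as a group-specific item difficulty in a logistic model. The difficulty values and the names b_reference and b_focal are hypothetical, chosen only to illustrate the comparison.

```python
import math


def p_correct(theta, b, a=1.0):
    """Logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def dif_gap(theta, b_reference, b_focal, a=1.0):
    """Difference in P(correct) between two groups at the SAME trait level."""
    return p_correct(theta, b_reference, a) - p_correct(theta, b_focal, a)


# Hypothetical item that is harder for the focal group (b = 0.8 vs b = 0.0).
gap = dif_gap(theta=0.0, b_reference=0.0, b_focal=0.8)

# Same difficulty in both groups: no DIF, whatever the groups' trait means are.
no_gap = dif_gap(theta=0.0, b_reference=0.0, b_focal=0.0)
```

A positive gap means the reference group is more likely to answer correctly than equally able members of the focal group.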

12
Q

what is computerized adaptive testing?

A

It is an algorithm that selects from an item bank the items that are most relevant to you (those where your probability of a correct response is about 50%). In other words, it creates a test that is adapted to someone's ability on the latent variable.
(Often there are many items that are either far too easy or far too hard, so answering them gives no extra information.)
correct? -> more difficult item
incorrect? -> easier item
Continue until the estimate of the latent trait no longer changes.
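The loop above can be sketched in a heavily simplified form. This is not how operational CATs estimate the trait (they typically use maximum-likelihood or Bayesian estimation and information-based item selection); the shrinking up/down step, the stopping rule, and every name below are assumptions made for illustration.

```python
def run_cat(bank, answers_correctly, start=0.0, step=1.0, min_step=0.1):
    """Toy adaptive test over a bank of item difficulties.

    bank: list of item difficulties.
    answers_correctly: function taking a difficulty, returning True/False.
    Returns the final trait estimate and the difficulties administered.
    """
    theta = start
    remaining = list(bank)
    administered = []
    # Stop once the estimate would change by less than min_step.
    while remaining and step >= min_step:
        # Pick the unused item whose difficulty is closest to the current
        # estimate (under the 1PL this is where P(correct) is about 50%).
        item = min(remaining, key=lambda b: abs(b - theta))
        remaining.remove(item)
        administered.append(item)
        if answers_correctly(item):
            theta += step  # correct -> move towards harder items
        else:
            theta -= step  # incorrect -> move towards easier items
        step /= 2.0
    return theta, administered


bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
# Hypothetical respondent who solves every item up to difficulty 1.2.
estimate, used = run_cat(bank, lambda b: b <= 1.2)
```

In this toy run only a few of the nine items are administered, which is the point of adaptive testing: items far from the respondent's level are skipped.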

13
Q

what are the pros of item response theory?

A
  • population (latent trait) and test (item difficulty, discrimination) are independent
  • focus on individual items
14
Q

what are the cons of item response theory?

A
  • statistically complex
  • needs large samples
15
Q

compare classical test theory to modern/item response theory

A

Classical test theory
* Sum score is taken as the construct score
* Focus on test
* Reliability of a test
Item Response Theory
* Score on latent variable is construct score
* Focus on items
* Difficulty / discrimination
* Item/Scale information
* Computerized Adaptive Testing (CAT)

16
Q

what are the 4 key factors affecting responses to test items?

A
  1. Respondent Trait Level: e.g. people with higher levels of the trait are more likely to answer difficult items correctly.
  2. Item Difficulty: how challenging an item is. In IRT, difficulty refers to the point on the trait continuum at which an individual has a 50% chance of answering the item correctly.
  3. Item Discrimination: how well an item distinguishes between individuals with different levels of the trait.
  4. Guessing: particularly important in multiple-choice tests, where respondents might guess the correct answer without knowing it
17
Q

In an IRT analysis the item difficulty of item 9 is found to be 0.78. What can you conclude?

A

People with a trait level of 0.78 have a 50% chance of answering item 9 correctly

18
Q

What are differences between the GRM on the one hand and the 1PL, 2PL and 3PL on the other hand?

A
  • The GRM is designed for polytomous items, the 1PL, 2PL and 3PL model for dichotomous items
  • The GRM estimates multiple difficulty parameters per item, the 1PL, 2PL and 3PL model only one
19
Q

If you have 16 items scored on a 5-point Likert scale, how many difficulty parameters does a GRM include for each item? Give a whole number

20
Q

In a 2PL with 15 items, how many parameters are there in total? Give a whole number

21
Q

In a 1PL with 20 items, how many parameters are there in total? Give a whole number

22
Q

In a 3PL with 25 items, how many parameters are there in total? Give a whole number

23
Q

In a GRM with 10 items that use a 5-point Likert scale, how many parameters are there in total? Give a whole number
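The counting cards above come without answers in this deck. A possible way to work them out is sketched below, under these assumptions: the 1PL has one difficulty per item (a shared, fixed discrimination is not counted), the 2PL adds a discrimination per item, the 3PL adds a guessing parameter per item, and the GRM has one discrimination plus (number of categories - 1) difficulty parameters per item. The function name and model labels are made up for this sketch.

```python
def n_item_parameters(model, n_items, n_categories=2):
    """Total number of item parameters under the stated counting assumptions."""
    if model == "1PL":
        per_item = 1  # difficulty
    elif model == "2PL":
        per_item = 2  # difficulty + discrimination
    elif model == "3PL":
        per_item = 3  # difficulty + discrimination + guessing
    elif model == "GRM":
        # (categories - 1) difficulty parameters + 1 discrimination
        per_item = (n_categories - 1) + 1
    else:
        raise ValueError(f"unknown model: {model}")
    return per_item * n_items


grm_difficulties_per_item = 5 - 1             # 5-point Likert item
total_2pl = n_item_parameters("2PL", 15)
total_1pl = n_item_parameters("1PL", 20)
total_3pl = n_item_parameters("3PL", 25)
total_grm = n_item_parameters("GRM", 10, n_categories=5)
```

Under a different convention (for example, counting a common discrimination in the 1PL as one extra parameter) the totals would shift accordingly.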

24
Q

What is the difference between the ICCs of the GRM and the ICCs of the 1PL model?

A
  • The ICCs of the 1PL model are based on the difficulty parameters, while the ICCs of the GRM are based on difficulty and discrimination parameters
  • In the 1PL model there is only one curve per item, in the GRM there are multiple curves per item
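To see why the GRM has multiple curves per item, here is a minimal sketch of the usual GRM construction: each difficulty (threshold) parameter defines a cumulative logistic curve P(X >= k), and the category probabilities are differences between neighbouring cumulative curves. The function name and the example thresholds are assumptions for illustration; thresholds are assumed to be in increasing order.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one GRM item.

    thresholds: increasing difficulty parameters, one fewer than the
    number of response categories.
    """
    # Cumulative curves P(X >= k), padded with 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Probability of each category = difference of adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical 5-point Likert item: 4 thresholds, 1 discrimination.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -1.0, 1.0, 2.0])
```

Each of the five categories gets its own curve over theta, whereas a 1PL item has a single curve for "correct".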
25
Q

What is the problem with a test that only has easy items, with regard to test information?

A

The test provides little information at high trait levels, since the items do not discriminate well among people with high trait levels

26
Q

In a 2PL, what can you conclude about the item information if an item has a difficulty of 0.65?

A

At a trait level of 0.65, the item provides the most information

27
Q

For a norm-referenced test, which of the following statements about the applications of IRT are correct?
1. Creating tests with good psychometric quality involves selecting items with average difficulty and high discrimination
2. Creating tests with good psychometric quality involves checking whether there are items that function differently in different groups of people

A

Only statement 2 is correct

28
Q

what are the steps of CAT?

A
  1. item properties are obtained for a large number of items
  2. respondents are first presented with items of average difficulty
  3. respondents are then presented with items adapted to their trait level
  4. the computer arrives at a solid estimate of the respondent's trait level
29
Q

What is the main benefit of using a CAT instead of a conventional test?

A

A CAT requires administration of fewer items