Lecture 6 Flashcards
What item characteristics impact the reliability, validity, and even utility of a test?
-item difficulty [only in maximal performance tests]
-item discrimination [applies to all tests]
-distractor power: plausibility of the distractors [only in maximal performance tests]
What is Item Analysis?
-various methods can be used to analyze item properties. The modern advanced approach is called “Item Response Theory”
-item analysis tells us WHY scores on a specific test are more or less reliable or valid, and suggests ways to improve the reliability or validity of these scores
Why is it important to have items of increasing levels of difficulty in a maximal performance test?
-puts the participants at ease
-saves time
-items that are too hard, or too easy, bring no added value (no added information) to the results, and are thus useless
-items presenting an increasing level of difficulty help the test scores spread out (ideally approximating a normal distribution) and provide adequate discrimination between the participants
What are the formulas used to assess item difficulty index?
-p = # of participants with the correct response/N (total # of participants)
– p = proportion passing (right response; high p = easy item) [always varies between 0 and 1]
-q = # of participants with an erroneous response/N (total # of participants)
– q = proportion failing
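A minimal sketch of the p and q formulas above, assuming each response has already been scored as 1 (correct) or 0 (erroneous). The function name `item_difficulty` and the sample data are hypothetical, used only for illustration.

```python
def item_difficulty(responses):
    """Return (p, q) for one item.

    responses: list of 0/1 scores, one per participant (1 = correct).
    This 0/1 coding is an assumption of the sketch.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one participant")
    n_correct = sum(responses)
    p = n_correct / n          # proportion passing
    q = (n - n_correct) / n    # proportion failing
    return p, q


# Hypothetical data: 10 participants, 7 of whom answered correctly.
scores = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
p, q = item_difficulty(scores)
print(f"p = {p:.2f}, q = {q:.2f}")
```

Since every participant either passes or fails the item, p + q = 1.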
What is the ideal p in an item difficulty index?
-having an average p around .5, or 50% -> this allows for a maximal level of discrimination between participants, based on a series of easy, moderate, and hard items
-if p = 0 or 1 the item should be deleted; it does not help to discriminate among participants
What should we keep in mind for item difficulty index?
-the objectives of the testing situation and the number of response choices (need to be adjusted for guessing)
How do we calculate guessing and find an adjusted optimal p?
-with 4 multiple-choice options: 25% chance of a correct answer by guessing alone
-the optimal level of difficulty is the midpoint of the interval that remains above chance: (1 - guessing probability)/2 = .75/2 = .375
-so the optimal average p, corrected for guessing, is: adjusted midpoint + guessing probability = .375 + .250 = .625
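The same arithmetic can be sketched for any number of response choices, assuming that a pure guess is equally likely to land on each option. The function name `optimal_p` is hypothetical.

```python
def optimal_p(n_options):
    """Optimal average p corrected for guessing.

    Assumes blind guessing picks each of the n_options with equal
    probability, so the guessing probability is 1 / n_options.
    """
    if n_options < 2:
        raise ValueError("need at least two response options")
    guessing = 1 / n_options
    adjusted_midpoint = (1 - guessing) / 2
    return adjusted_midpoint + guessing


# Four-option multiple choice, as on the card: .375 + .250
print(optimal_p(4))  # 0.625
```

With fewer options the guessing probability rises, so the corrected optimum moves further above .5 (for true-false items it comes out at .75).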
What is item discrimination?
-assesses the degree to which different types of persons respond differently to the items
-whether the item does a good job at assessing what the test itself is supposed to assess
-also known as item validity
What does item discrimination look like in maximal performance tests and how do we measure it?
-a good item is correctly answered by the strongest respondents and incorrectly answered by the weakest respondents
-measured by: item-discrimination index
What does item discrimination look like in typical performance tests and how do we measure it?
-a good item is not answered in the same manner by respondents who present the characteristic versus respondents who do not present the characteristic
-measured by item-total correlation
How do we separate participants into different types for item discrimination index?
-split the group in two: 50% strongest and 50% weakest [lacks precision]
-better: split the group in three or four and retain only the two extreme groups (the strongest and weakest 33% or 25%)
-the groups can be split on the basis of their scores on the test itself or, better, their scores on a valid criterion measure.
What are the formulas used for item discrimination index?
-D = Item Discrimination Index
-D = PT – PB.
-PT: Proportion of the Top Group who answered correctly = number of participants with the correct response in the Top Group / number of participants in the Top Group
-PB: Proportion of the Bottom Group who answered correctly = number of participants with the correct response in the Bottom Group / number of participants in the Bottom Group
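A minimal sketch of D = PT - PB, assuming items are scored 0/1 and that the groups are formed from total test scores by keeping the top and bottom 25%. The function name, the `fraction` parameter, and the data are hypothetical; tied total scores are ordered arbitrarily here, which a real analysis would need to handle deliberately.

```python
def discrimination_index(item_scores, total_scores, fraction=0.25):
    """Return D = PT - PB for one item.

    item_scores:  0/1 score on the item, one per participant.
    total_scores: score used to rank participants (the test total,
                  or a criterion measure).
    fraction:     share of participants kept in each extreme group.
    """
    n = len(item_scores)
    if n != len(total_scores):
        raise ValueError("score lists must have the same length")
    k = int(n * fraction)  # size of each extreme group
    if k < 1:
        raise ValueError("extreme groups would be empty")
    # Participant indices ordered from weakest to strongest.
    order = sorted(range(n), key=lambda i: total_scores[i])
    bottom, top = order[:k], order[-k:]
    p_top = sum(item_scores[i] for i in top) / k
    p_bottom = sum(item_scores[i] for i in bottom) / k
    return p_top - p_bottom


# Hypothetical data for 8 participants (2 per extreme group).
totals = [2, 9, 5, 7, 3, 8, 4, 6]
item = [1, 1, 1, 1, 0, 1, 0, 0]
d = discrimination_index(item, totals)
print(d)  # 0.5: both top scorers passed, one of two bottom scorers did
```

D ranges from -1 to +1; a negative value means the weakest group outperformed the strongest group on that item.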
When should item-total correlation be used?
-it is the ideal procedure to assess item discrimination for typical performance tests using a reasonably continuous response scale; it can also be applied to binary (true-false) items, using a point-biserial correlation
What is item-total correlation?
-assesses the degree to which each item is related to the total score on the test.
-should be neither too high nor too low.
-ideal: should be higher than 0.3.
-in very short tests, this estimate will be inflated because the item itself makes up a large part of the total. This is why it is typical to calculate a “corrected” item-total correlation, where the item itself is excluded from the total.
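A minimal sketch comparing the uncorrected and corrected item-total correlations, assuming a Pearson correlation and a data layout of one row per respondent and one column per item. The function names and the Likert-style data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def item_total(data, item, corrected=True):
    """Correlate one item with the test total.

    data: list of rows, one per respondent, each a list of item scores.
    corrected=True removes the item from the total before correlating.
    """
    item_scores = [row[item] for row in data]
    totals = [sum(row) - (row[item] if corrected else 0) for row in data]
    return pearson(item_scores, totals)


# Hypothetical 4-item test answered by 5 respondents on a 1-5 scale.
data = [
    [4, 5, 4, 3],
    [2, 1, 2, 3],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
uncorrected = item_total(data, 0, corrected=False)
corrected = item_total(data, 0, corrected=True)
print(f"uncorrected = {uncorrected:.3f}, corrected = {corrected:.3f}")
```

In this made-up data set the corrected value is lower than the uncorrected one, which illustrates the inflation that occurs when the item is counted in its own total.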
What is distractor power?
-in a Maximal Performance test relying on multiple choice questions, the erroneous responses are named Distractors.
-a distractor that is grossly erroneous reduces the quality of the test by increasing the probability of success due to guessing.
-a distractor that is too plausible also reduces the quality of the test by increasing the likelihood that respondents who know more will select it.