Study Guide 10: Latent Ability, Item Characteristics, & Responses: Item Response Theory Flashcards
Binary item
an item with only two possible response options
Differential item functioning (DIF)
refers to the case when the ICCs are not the same for the two groups. Specifically, DIF occurs when examinees from different groups show differing probabilities of responding correctly to (or endorsing) an item after being matched on the underlying ability (latent variable).
Item response theory (IRT)
- a measurement theory that relates characteristics of items (called item parameters) and characteristics of individuals (called latent traits or latent ability, denoted ‘theta’) to the probability of giving a positive or correct response. IRT is popular because it provides a theoretical justification for many procedures that CTT does not.
Inflection point
An inflection point is a point on a curve at which the sign of the curvature (i.e., the concavity) changes.
Item Analysis:
A set of statistical techniques to examine the performance of individual items. This is important when developing a test or when adopting a known measure.
Item bias
an item is biased when knowledge or skills that are not relevant to the construct of interest are needed to either endorse the item or get it right
Item bias occurs when examinees of one group are less likely to answer an item correctly (or endorse an item) than examinees of another group because of some characteristic of the test item or testing situation that is not relevant to the test purpose. DIF is required, but not sufficient, for item bias.
Item characteristic curve (ICC)
line showing the relationship between the characteristics of individuals (i.e., the latent trait) and the probability of giving a positive or correct response.
Item impact
describes the situation in which examinees from different groups (e.g., male, female) have differing probabilities of responding correctly to (or endorsing) an item because there are true differences between the groups in their underlying ability (on the latent trait).
Item impact and item bias differ
in terms of whether group differences are based on relevant or irrelevant characteristics (respectively) of the test. DIF requires that members of the two groups be matched on the relevant underlying ability before determining whether members of the two groups differ in their probability for success.
Item information function (IIF)
The three item parameters can be combined into an item information function (IIF). The IIF describes how well, or precisely, an item measures at each level of the latent ability (i.e., theta). Higher information scores are better.
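As a sketch (the parameter values below are illustrative assumptions, not from the guide), the 2PL item information function I(theta) = a² · P(theta) · (1 − P(theta)) can be computed directly, showing that information peaks where theta equals the item's difficulty b:

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """2PL item information: I(theta) = a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Information is highest near theta = b and falls off on either side
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(info_2pl(theta, a=1.5, b=0.0), 3))
```

Because I(theta) depends on a² and on P(1 − P), highly discriminating items measure precisely, but only in a narrow band of ability around b.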
Measurement model
expresses the mathematical links between an outcome (e.g., a respondent’s score on a particular item) and the components that affect the outcome (e.g., qualities of the respondent and/or qualities of the item)
The 1PL, 2PL, and 3PL models are all examples of measurement models.
They reflect the idea that an individual’s response to an item is determined by the individual’s trait level and by item properties (e.g., difficulty, discrimination).
Non-uniform DIF
In non-uniform DIF, the ICCs for the two groups cross, so which group performs better changes across the range of the latent variable.
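A minimal sketch of non-uniform DIF, using hypothetical 2PL parameters: when the two groups' ICCs share a difficulty but differ in discrimination, the curves cross, and the group with the higher probability of success flips on either side of the crossing point:

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters: same difficulty b, different discrimination a,
# so the two groups' ICCs cross at theta = b = 0
group_ref = dict(a=2.0, b=0.0)
group_focal = dict(a=0.8, b=0.0)

for theta in (-1.5, 0.0, 1.5):
    p_ref = icc_2pl(theta, **group_ref)
    p_foc = icc_2pl(theta, **group_focal)
    higher = "reference" if p_ref > p_foc else ("focal" if p_foc > p_ref else "tie")
    print(f"theta={theta:+.1f}  ref={p_ref:.2f}  focal={p_foc:.2f}  higher: {higher}")
```

Below theta = 0 the focal group has the higher probability; above it, the reference group does. In uniform DIF, by contrast, one group's curve stays above the other's across the whole theta range.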
parameter
The variables a (discrimination), b (difficulty), and c (guessing, or the lower asymptote) are the parameters of the curve. They vary from item to item, and they define the specific shape of the ICC.
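As a sketch (the parameter values here are illustrative assumptions), the three parameters combine in the standard 3PL formula for the ICC:

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve:
    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item: discrimination a=1.5, difficulty b=0.0, guessing c=0.2
# At theta = b, probability is exactly halfway between c and 1: c + (1 - c)/2
print(round(icc_3pl(0.0, a=1.5, b=0.0, c=0.2), 3))
```

Setting c = 0 reduces this to the 2PL model, and additionally fixing a to a common value across items gives the 1PL (Rasch) model.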
Person fit
IRT is used to estimate item characteristics and then identify individuals whose responses to items do not adhere to those parameters. The analysis of person fit is an attempt to identify individuals whose response pattern does not seem to fit any of the expected patterns of responses to a set of items. Ex. It would be odd to find an individual who endorses a difficult item but doesn’t endorse an easy item.
Implications: poor person fit could indicate cheating, random responding, low motivation, cultural bias of the test, intentional misrepresentation, scoring/admin errors.
In personality assessment: poor person fit may instead reveal that the respondent’s personality profile is genuinely unusual.
Polytomous item
has more than two response options. Polytomous items require different IRT models (e.g., the graded response model or the partial credit model).