IRT Flashcards
IRT’s desirable objectives (2)
(1) Administer SHORTER measures
(2) Compare scores across: DIFF measures of the SAME constructs in DISTINCT groups
Why is there a problem in administering shorter measures according to CTT?
Problem bc relationship between LENGTH of test & RELIABILITY of test
-> Shorter tests don’t have as high reliability as longer ones
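In CTT this length/reliability link is usually expressed with the Spearman-Brown prophecy formula. The card above does not name it, so treat the sketch below as a standard supplement, not part of the original notes. The formula assumes the items added or removed are parallel to the existing ones, and the 0.80 reliability and the length factors are made-up values.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added/removed items are parallel to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.80 (illustrative value only).
original = 0.80
halved = spearman_brown(original, 0.5)   # keep half of the items
doubled = spearman_brown(original, 2.0)  # double the number of items

print(f"half length:   {halved:.3f}")
print(f"original:      {original:.3f}")
print(f"double length: {doubled:.3f}")
```

Halving the test pulls predicted reliability below the original value; doubling pushes it above. That is the trade-off IRT tries to get around.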
Limitations of CTT (3)
(1) Adding/deleting items changes true score (because the true score is TEST-DEPENDENT, so comparison not possible across diff test forms)
(2) True score is interpretable ONLY in reference to NORM sample’s distribution of scores: SAMPLE-DEPENDENT
(3) Reliability of true score is a function of the items used: CTT assumes all items are EQUALLY reliable, measure the SAME RANGE of scores, and that reliability is CONSTANT across scores
What’s the problem with CTT assumption that “Reliability of true score is function of the items used”?
In practice, some items are better than others
Item Response Theory (IRT) Assumptions (4)
(1) True score defined on the LATENT trait dimension rather than observed score
(2) Knowing the PROPERTIES OF THE ITEMS a person endorses tells us the TRAIT LEVEL the person possesses
(3) Properties of an item do NOT change if we were to administer the item using different samples
(4) True score of the person does NOT change regardless of which sets of items we administer.
In IRT, we place both _______ and _______ on the same scale to be able to compare those two.
item characteristics; person characteristics
IRT is a family of mathematical models that describe the probability of a given response to an item as a function of _______________ and ____________. It models the _______________.
certain item characteristics; respondent true score; likelihood of you endorsing an item
IRT: What’s the chance you’re gonna answer YES to an item assessing HIGH attachment levels?
KNOWING what’s the level of attachment of an ITEM
+
Underlying level of INDIVIDUAL attachment
= likelihood of you saying yes.
Item Response Function
Representation of the probability of item endorsement across the range of true scores
=> models the likelihood of item endorsement across the entire range of underlying traits
IRT: TRUE SCORE =
PROB OF ENDORSING ITEMS WITH SPECIFIC CHARACTERISTICS given the trait level set.
Item Characteristic Curve (ICC)
Function that models the likelihood of endorsement => plot of the Item response function
Item Response Function
Probability that a person with a given ability level will answer CORRECTLY.
=> EQUATION that relates true score (theta) defined in latent dimension to the probability of endorsing an item.
=> DIFF CURVES FOR DIFF ITEMS!!!
Variables in Item Response Function
Y = Probability of item endorsement (“yes”)
X = Theta (latent trait) = HOW MUCH OF THE TRAIT YOU POSSESS - e.g. entire range of math level
Theta is a CONTINUUM (from -infinity to +infinity)
Theta def + values
Entire range of latent trait.
=> CONTINUUM (from -infinity to +infinity)
=> Negative values = LOW levels
=> Positive values = HIGH levels
How does a typical ICC look for items that are dichotomous (yes-no)?
S shape
What are item characteristics?
Item DIFFICULTY & Item DISCRIMINATION
What’s the “nature” of the ICC function?
MONOTONIC: Probability of item endorsement increases as theta increases.
ICC: In the middle of the curve, ____ changes in theta correspond with ___ changes in probability
small; large
ICC limited by 0 and 1, why?
Bounds of probability: a probability cannot go below ZERO or above ONE; the curve gets closer and closer to those two values but never reaches them
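The ICC cards above (S shape, monotonic, bounded by 0 and 1) can be checked numerically. The notes do not name a specific model, so this sketch assumes the two-parameter logistic (2PL) form, which uses exactly the two item characteristics listed: discrimination `a` and difficulty `b`. The parameter values are made up.

```python
import math


def icc(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """2PL item response function: probability of endorsing the item at theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Evaluate the curve on a grid of theta values from -4.0 to 4.0 (step 0.5).
thetas = [x / 2 for x in range(-8, 9)]
probs = [icc(t) for t in thetas]

for t, p in zip(thetas, probs):
    print(f"theta={t:+.1f}  P(endorse)={p:.3f}")

# Same 0.5-wide step in theta, very different change in probability:
middle_change = icc(0.25) - icc(-0.25)  # around the middle of the curve
tail_change = icc(4.0) - icc(3.5)       # out in the upper tail
print(f"change in the middle: {middle_change:.3f}")
print(f"change in the tail:   {tail_change:.3f}")
```

The printed probabilities rise steadily and stay strictly between 0 and 1, and the step taken in the middle of the curve moves the probability far more than the same step in the tail.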
Item Difficulty def
b
The point in theta (X axis) where probability of endorsing an item is 50%.
=> To find it, start by checking 0.5 in the Y axis
=> Then you find what’s the level of theta (X) that correspond to item difficulty
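The "start at 0.5 on the Y axis, then read off theta" procedure can be done in code with a simple bisection search. This is a sketch only: the item below is a hypothetical 2PL item with made-up parameters, and `find_difficulty` is an illustrative helper, not a function from any IRT package.

```python
import math


def icc(theta: float, a: float = 1.2, b: float = 0.8) -> float:
    """Hypothetical 2PL item; a and b are made-up values for illustration."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def find_difficulty(curve, lo: float = -6.0, hi: float = 6.0, tol: float = 1e-8) -> float:
    """Bisection: locate the theta where an increasing curve crosses 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if curve(mid) < 0.5:
            lo = mid  # the 50% point lies to the right of mid
        else:
            hi = mid  # the 50% point is at mid or to its left
    return (lo + hi) / 2


b_recovered = find_difficulty(icc)
print(f"theta where P = 0.5: {b_recovered:.4f}")
```

The search only relies on the curve being monotonic, so it recovers the difficulty without looking at the `b` value stored inside `icc`.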
Item difficulty typically ranges between ______
-2 and +2
(-/+ 2 = Arbitrary z-score)
Item difficulty:
=> NEGATIVE difficulties = _____
=> POSITIVE difficulties = ______
Items are “EASIER”, more frequently endorsed (doesn’t take much of the trait level to endorse);
Items are more “DIFFICULT”, less frequently endorsed
Item difficulty: What does it mean if Theta > b
Items more likely to be endorsed
=> When theta level is HIGHER than difficulty of the item
Item difficulty: What does it mean if Theta < b
Items less likely to be endorsed
=> When level of underlying trait LOWER than item difficulty
Theta = b
= 50%; item difficulty
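The three cases (theta below, at, and above b) in one short check. Again this assumes a 2PL curve, and the item parameters are invented for the example.

```python
import math


def icc(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model, not stated in the notes)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


a, b = 1.0, 0.5  # hypothetical item
results = {}
for theta in (-0.5, 0.5, 1.5):
    relation = "<" if theta < b else ">" if theta > b else "="
    results[relation] = icc(theta, a, b)
    print(f"theta {relation} b: P(endorse) = {results[relation]:.2f}")
```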
Item Discrimination
a
Value of the slope at the STEEPEST point of the curve, i.e., at theta = b, where probability = 50%
-> Point in the curve where the increases in Y are the highest.
To find it: find theta for difficulty -> this is the point where the slope is the steepest
=> The steeper the line, the closer it is to VERTICAL.
Item Discrimination is related to ______
Item difficulty
Item Discrimination tells us ________
at which levels of theta the item differentiates best
=> Discriminates levels of theta
Discrimination typically ranges between _____
.5 and 1.5
[Item discrimination] What does it mean when…
=> Steeper slopes
=> Smaller slopes
Highly discriminating items;
Poorly discriminating items
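To see "steeper slope = more discriminating" numerically, the sketch below estimates the slope of the ICC at theta = b with a finite difference. It assumes the 2PL logistic form without a scaling constant; under that form the slope at theta = b works out to a/4, so `a` is proportional to the steepest slope, not literally equal to it. All values are illustrative.

```python
import math


def icc(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def slope_at(theta: float, a: float, b: float, h: float = 1e-5) -> float:
    """Central-difference estimate of the ICC slope at theta."""
    return (icc(theta + h, a, b) - icc(theta - h, a, b)) / (2 * h)


b = 0.0
for a in (0.5, 1.0, 1.5):
    at_b = slope_at(b, a, b)           # slope at the 50% point
    away = slope_at(b + 2.0, a, b)     # slope two theta units above b
    print(f"a={a}: slope at b = {at_b:.4f} (a/4 = {a / 4:.4f}), slope at b+2 = {away:.4f}")
```

Higher `a` gives a steeper curve at the 50% point, and for each item the slope at theta = b is larger than the slope farther out.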
Items would be most effective in measuring underlying trait at the level that correspond with _______.
item difficulty
→ Hard questions are more effective at measuring high levels of the trait.
Item Information Curve indicates _________
How well an item is working for EACH LEVEL of the trait.
= How well an item differentiates among respondents who are at different levels of the latent variable
= Depends on both item difficulty and item discrimination
Item parameters determine the amount of information at what range of the latent trait. (1) What are the parameters (2) What info do they give
Item difficulty → Location on the latent trait where information is MAXIMIZED
Item discrimination → HOW MUCH INFO an item provides
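A sketch of the two roles of the parameters, assuming the 2PL model, where item information is a^2 * P * (1 - P). The three items are made up: two share a difficulty but differ in discrimination, and two share a discrimination but differ in difficulty.

```python
import math


def icc(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information at theta: a^2 * P * (1 - P)."""
    p = icc(theta, a, b)
    return a ** 2 * p * (1.0 - p)


# Grid of theta values from -4.0 to 4.0 in steps of 0.1.
thetas = [x / 10 for x in range(-40, 41)]

# Hypothetical items as (discrimination a, difficulty b).
for a, b in [(0.5, 1.0), (1.5, 1.0), (1.5, -1.0)]:
    info = [item_information(t, a, b) for t in thetas]
    peak_theta = thetas[info.index(max(info))]
    print(f"a={a}, b={b:+.1f}: peak at theta={peak_theta:+.1f}, peak height={max(info):.3f}")
```

Changing `b` moves where the peak sits on theta; changing `a` changes how tall the peak is.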
Test Information Curve indicates _______
How much information the TEST provides: how precisely does the test measure the trait?
=> The relative precision of the test in measuring diff levels of theta.
When talking about Test Information Curve (TIC), we’re talking about Validity or Reliability? Why?
We’re talking about RELIABILITY (NOT VALIDITY)
Bc it focuses on how precisely a test measures the latent trait ACROSS DIFF LEVELS OF THAT TRAIT.
=> THE HIGHER THE CURVE, THE BETTER YOUR ASSESSMENT OF THE TRAIT (mountain)
The height of the TIC is (inversely) proportional to the ____________
Standard error of measurement (SEM)
=> TIC and SEM are inversely related.
-> Reliability = how much test scores are free of measurement error
SEM is ____ in regions of latent trait continuum where test information is the _____.
lowest; highest
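The TIC/SEM relationship can be sketched as follows, again assuming 2PL items. In that setting test information is the sum of the item information values, and the SEM at a given theta is 1 / sqrt(test information). The four-item test is hypothetical.

```python
import math


def icc(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """2PL item information at theta: a^2 * P * (1 - P)."""
    p = icc(theta, a, b)
    return a ** 2 * p * (1.0 - p)


# Hypothetical 4-item test: (discrimination a, difficulty b) per item.
items = [(1.2, -1.0), (1.5, 0.0), (1.0, 0.5), (0.8, 1.5)]


def total_information(theta: float) -> float:
    """Test information = sum of the item information values at theta."""
    return sum(item_information(theta, a, b) for a, b in items)


def sem(theta: float) -> float:
    """Standard error of measurement at theta: 1 / sqrt(test information)."""
    return 1.0 / math.sqrt(total_information(theta))


for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    print(f"theta={theta:+.1f}  information={total_information(theta):.3f}  SEM={sem(theta):.3f}")
```

Where the information column is largest the SEM column is smallest, and the SEM grows toward the extremes of theta where this small test has few well-targeted items.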
In IRT, SEM is different for different latent trait values; how is that different from CTT?
CTT: 1 score of reliability for entire set of items
IRT: each item has its own information curve (instead of 1 coefficient for the whole set); measurement error is NOT equal across the entire range of theta
How does IRT Help us Improve Psychological Tests? (4)
(1) IDENTIFY item characteristics (i.e., difficulty, discrimination)
(2) CHOOSE items with higher discrimination covering the entire range of the latent continuum
(3) INCREASE RELIABILITY with fewer items
(4) COMPARE items across DIFF MEASURES of SAME CONSTRUCT + Compare group differences
IRT Applications (2)
(1) Improving existing measures
(2) Detecting differential item functioning
Differential Item Functioning (DIF) examines ______
Whether scales and items function differently across different discrete groups.
-> Occurs when groups (such as defined by gender, ethnicity, age, or education) have different probabilities of endorsing a given item (controlling for overall score)
Differential Item Functioning (DIF) occurs when _________________
individuals from diff groups who have EQUAL levels of the UNDERLYING TRAIT, have diff probabilities of endorsing or agreeing with an item.
DIF analysis helps determine if items are ____ by _____________.
fair; examining group differences in responses while controlling for the trait level
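What DIF looks like in the simplest case can be sketched as below. This is an illustration of the definition, not a DIF detection procedure: it assumes a 2PL item whose difficulty has been estimated separately in two groups, and the group labels and parameter values are invented.

```python
import math


def icc(theta: float, a: float, b: float) -> float:
    """2PL item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical per-group parameters (a, b) for ONE item.
# Same discrimination, different difficulty: the item is "harder" for group_B.
params = {"group_A": (1.2, 0.0), "group_B": (1.2, 0.7)}

theta = 0.5  # both respondents have the SAME underlying trait level

probs = {group: icc(theta, a, b) for group, (a, b) in params.items()}
for group, p in probs.items():
    print(f"{group}: P(endorse | theta={theta}) = {p:.3f}")

gap = probs["group_A"] - probs["group_B"]
print(f"difference at equal theta: {gap:.3f}")
```

A nonzero gap at equal theta is the signature of DIF. If both groups shared the same item parameters, the two probabilities would be identical no matter how the groups differ in average trait level.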