Week 8 - Item Response Theory Flashcards
What is Item Response Theory?
A newer theory that focuses on test items. It adds more tools for solving measurement problems in psychology, such as test bias, adaptive testing, and item selection.
It is concerned with the relationship between observed responses to items and the underlying dimension or construct.
What is the most important thing for a clinical psychologist?
Measurement
What is the difference between classical test theory and IRT?
• CTT focuses more on the total score of a scale or subscale.
• IRT focuses on the relationship between items and the total score or latent dimension underlying the test.
What is central in IRT?
The relationship between the item and the overall construct.
What is a negative of IRT?
Because the scoring is more technical, it takes longer.
What does IRT assume?
That there is a relationship between responses to items and the underlying or latent dimension being assessed by the scale.
What is the item characteristic curve?
The curve relating the probability of a correct response on a true/false item to the underlying dimension. This relationship can be assumed to take the form of a cumulative normal distribution.
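A minimal sketch of this curve, assuming a normal-ogive form. The parameter names a (discrimination) and b (difficulty) and their default values are illustrative assumptions, not taken from the lecture.

```python
import math

def normal_cdf(z):
    """Cumulative standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def icc_normal_ogive(theta, a=1.0, b=0.0):
    """Probability of a correct response at latent level theta.

    a (discrimination) and b (difficulty) are illustrative assumptions.
    """
    return normal_cdf(a * (theta - b))

# The probability rises smoothly from near 0 to near 1 as theta increases.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={icc_normal_ogive(theta):.3f}")
```

With these assumed values, a person whose level equals the item's difficulty has a 50% chance of answering correctly.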
What does sample invariant mean?
Something (e.g. an item's parameters) that doesn't depend on the sample it is drawn from.
What does the slope in the curve represent?
The estimate of discrimination
What does the point on the x axis represent?
The difficulty or threshold
How do we measure difficulty on the item characteristic curve?
Find the point on the curve where the probability of responding is 50%, then drop down to the x-axis. The value there is the item's difficulty (threshold).
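The "find 50% and drop down" reading can be sketched numerically. This example assumes a two-parameter logistic curve (a common stand-in for the cumulative normal); the values a = 1.5 and b = 0.8 are made up for illustration.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve (assumed form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def theta_at_probability(p_target, a, b, lo=-6.0, hi=6.0, tol=1e-8):
    """Bisection search for the theta at which the curve reaches p_target."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if icc_2pl(mid, a, b) < p_target:
            lo = mid   # curve is still below the target, so move right
        else:
            hi = mid   # curve is at or above the target, so move left
    return (lo + hi) / 2.0

# Hypothetical item: discrimination 1.5, difficulty 0.8.
threshold = theta_at_probability(0.5, a=1.5, b=0.8)

# Slope of the curve at the threshold, estimated by a central difference.
h = 1e-5
slope = (icc_2pl(threshold + h, 1.5, 0.8) - icc_2pl(threshold - h, 1.5, 0.8)) / (2 * h)

print(f"threshold (difficulty) = {threshold:.3f}")
print(f"slope at threshold     = {slope:.3f}")
```

Under this logistic form the slope at the threshold works out to a / 4, so a steeper curve means a more discriminating item.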
What is a pseudo-guessing parameter?
A parameter that can be used to estimate the probability of a response for people with very low levels of the underlying dimension (the lower asymptote of the curve).
Rasch model
Items have the same discrimination but differ in difficulty.
National Survey of Mental Health and Wellbeing
• Larger, Cutdown & Tolerance discriminate at close to the estimated diagnostic threshold
• Other criteria are markers of more severe states
• Estimated diagnostic threshold shown as vertical line
What are the two types of IRT models?
Parametric or Non-Parametric.
How many parameters are estimated?
1-parameter or Rasch models: assume all items have the same slope (discrimination) and differ only in difficulty (threshold). Infrequently used because real-life data rarely meet this assumption.
2-parameter models: like those seen in the slides (the two parameters are discrimination and threshold).
3-parameter models: add a parameter for pseudo-guessing (more widely used in ability testing).
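The three models can be written side by side. This is a sketch assuming the usual logistic forms; the symbols a (discrimination), b (difficulty/threshold) and c (pseudo-guessing) and the example values are assumptions for illustration.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_1pl(theta, b):
    """1-parameter (Rasch-type): every item shares the same slope."""
    return logistic(theta - b)

def p_2pl(theta, a, b):
    """2-parameter: items differ in discrimination a and threshold b."""
    return logistic(a * (theta - b))

def p_3pl(theta, a, b, c):
    """3-parameter: adds a pseudo-guessing floor c to the 2PL curve."""
    return c + (1.0 - c) * logistic(a * (theta - b))

# Hypothetical 4-option multiple-choice item, so guessing gives about 0.25.
very_low = -8.0
print(f"2PL at very low theta: {p_2pl(very_low, 1.2, 0.0):.4f}")
print(f"3PL at very low theta: {p_3pl(very_low, 1.2, 0.0, 0.25):.4f}")
```

Setting c = 0 turns the 3-parameter curve into the 2-parameter one, and setting a = 1 turns that into the 1-parameter one, which is why each model is a special case of the next.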
What is item bias?
Asking whether a question is as fair as it could be, i.e. whether it works the same way for different groups of people.
What is the test for item bias?
Does the item behave differently for people from two groups who are at the same level of the underlying dimension or trait?
How do you select test items?
On the basis of achieving the desired test information function, e.g. are you testing whether someone has a severe alcohol problem, or whether someone may develop an alcohol problem in the future?
What is item banking and adaptive testing?
Choosing items that provide the best estimate of a person's level on the latent dimension with known precision.
How do you start item banking?
You pick an item with high discrimination and average (mean) difficulty.
How do you choose the next item?
It is chosen to maximise the information (minimise the error of measurement).
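A sketch of "choose the item with the most information", assuming a two-parameter logistic model where item information is a² · P · (1 − P). The three-item bank and its parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Item information under the assumed 2PL model: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical bank: name -> (discrimination a, difficulty b).
bank = {"easy": (1.0, -1.5), "medium": (1.2, 0.0), "hard": (1.0, 1.5)}

def next_item(theta_hat, bank, administered):
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda name: item_information(theta_hat, *bank[name]))

first = next_item(0.0, bank, administered=set())        # start at the mean
second = next_item(1.4, bank, administered={"medium"})  # estimate moved upward
print(first, second)
```

An item is most informative when its difficulty is close to the person's current estimated level, so the starting item is the one nearest the mean.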
What is CAT item selection?
For example, a 4-item bank for a maths test:
– Item A “567 + 235 = ?” difficulty = 0
– Item B “456 / 56 = ?” difficulty = 1
– Item C “24 + 78 = ?” difficulty = -1
– Item D “10 + 15 = ?” difficulty = -1.5
• If Item A is correct, then the next choice would be Item B, and if that is correct, maths ability would be at or above 1 (in standardised scores, mean = 0, SD = 1).
• If Item A is incorrect, then the next choice is Item C; if C is incorrect, then Item D; if D is incorrect, then ability would be less than -1.5 (standardised).
• More items would give greater allowance for getting a single item wrong (increase reliability and reduce standard error).
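The branching in this example can be traced in code. This is a deliberately simplified sketch: it treats each single response as a hard bound on ability (real adaptive tests update a statistical estimate instead), and the function name run_cat and the answers dictionary are hypothetical.

```python
import math

# The 4-item bank from the example: item name -> difficulty (standardised).
BANK = {"A": 0.0, "B": 1.0, "C": -1.0, "D": -1.5}

def run_cat(answers, start="A"):
    """Follow the branching rule; answers maps item name -> True/False.

    Returns the items administered and the (lower, upper) bounds on ability.
    """
    lower, upper = -math.inf, math.inf
    path, used = [], set()
    item = start
    while item is not None:
        path.append(item)
        used.add(item)
        correct = answers[item]
        if correct:
            lower = max(lower, BANK[item])   # ability is at or above this item
        else:
            upper = min(upper, BANK[item])   # ability is below this item
        # Only unused items strictly between the current bounds are worth asking.
        candidates = [i for i in BANK if i not in used and lower < BANK[i] < upper]
        if not candidates:
            item = None
        elif correct:
            item = min(candidates, key=BANK.get)  # nearest harder item
        else:
            item = max(candidates, key=BANK.get)  # nearest easier item
    return path, lower, upper

all_correct = dict.fromkeys(BANK, True)
all_wrong = dict.fromkeys(BANK, False)
print(run_cat(all_correct))  # path A then B, ability at or above 1
print(run_cat(all_wrong))    # path A, C, D, ability below -1.5
```

With only four items a single slip changes the whole path, which is the point of the last bullet: a larger bank makes the estimate less sensitive to one wrong answer.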
What are the important rules in Embretson’s new rules of measurement?
Standard error of measurement, assessment of item properties, mixing item formats
What is standard error of measurement?
New Rule: The standard error of estimation differs across scores but generalises across populations. SE(θ) = 1 / √I(θ).
The standard error of estimation is the inverse of the square root of the test information function.
The more information a test provides at a given score, the less error there is.
What is the assessment of item properties?
• Old: Unbiased assessment of item properties depends on having representative samples
• New: Unbiased assessment of item properties may be obtained from unrepresentative samples
What is mixing item formats?
• Old: Mixed item formats lead to unbalanced impact on test total scores
• New: Mixed item formats can yield optimal test scores