Ch 8.2 Test Development Flashcards
Item Difficulty
Item difficulty index (p)
- Perhaps should be named “Item Easiness”
- Represents the proportion of test-takers who got the item right
- Ranges from 0 (no one gets the item right) to 1 (everyone gets the item right)
- What is a reasonable range for difficulty?
Item Analysis (Item Difficulty)
- Item difficulty index
- p = (# of people who answered the item correctly) / (total # of people who answered the item)
- Larger values = easier items
- Good rule of thumb is p = .5 (acceptable range roughly .3 to .8)
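A minimal sketch of the computation, assuming dichotomously scored responses coded 1 = correct, 0 = incorrect (hypothetical data):

```python
# Item difficulty index p for one dichotomously scored item.
# Responses are hypothetical, coded 1 = correct, 0 = incorrect.
responses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

p = sum(responses) / len(responses)    # proportion who answered correctly
print(f"Item difficulty p = {p:.2f}")  # larger p = easier item
```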
Item Analysis (Discrimination)
Item Discrimination (D)
- Like convergent or predictive validity, but on the item level
- To discriminate between high and low scorers
- Items must discriminate between levels of an external construct / ability / overall score
– Does each item predict the outcome?
– Ex: anxiety, depression
Item Analysis (Discrimination) Characteristics
- Ranges from -1 to 1
- Larger values = the item more adequately discriminates between high and low scorers
- An item discriminates adequately when D > .35 (see the sketch after this list)
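One common way to compute D is the extreme-groups method: the proportion of high scorers who got the item right minus the proportion of low scorers who got it right. A minimal sketch with hypothetical data (splitting at the median rather than the top/bottom 27% for brevity):

```python
# Item discrimination index D via the upper/lower (extreme-groups) method:
# D = p_upper - p_lower for a single item. All data are hypothetical.
people = [
    {"item": 1, "total": 48},
    {"item": 1, "total": 45},
    {"item": 1, "total": 40},
    {"item": 0, "total": 25},
    {"item": 0, "total": 22},
    {"item": 1, "total": 19},
]

# Sort by total test score and split into upper and lower halves.
people.sort(key=lambda person: person["total"], reverse=True)
half = len(people) // 2
upper, lower = people[:half], people[half:]

p_upper = sum(person["item"] for person in upper) / len(upper)
p_lower = sum(person["item"] for person in lower) / len(lower)

D = p_upper - p_lower  # ranges from -1 to 1; larger = better discrimination
print(f"D = {D:.2f}")  # here: 1.00 - 0.33 = 0.67
```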
Item Characteristic Curve (ICC)
A graphic representation of item difficulty and discrimination
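For intuition only, an ICC can be sketched with the two-parameter logistic model from item response theory (a modeling choice, not something the card specifies): b places the curve on the ability axis (difficulty) and a controls its steepness (discrimination).

```python
import math

# Sketch: probability of a correct response under a 2-parameter logistic ICC.
# a = discrimination (slope), b = difficulty (location); values are hypothetical.
def icc(theta, a=1.2, b=0.0):
    """P(correct answer | ability theta) for one item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):         # ability levels
    print(theta, round(icc(theta), 2))  # steeper curve = better discrimination
```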
Item Analysis (Other Considerations)
What about guessing?
- Some have tried to formulate item analysis “corrections for guessing”
- Problematic because guessing is not completely random (test-takers can often rule out some alternatives)
- In the test manual, developers should provide:
– Specific instructions that examiner gives the examinee
– Specific instructions for scoring & interpreting omitted items
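A classic example of such a correction (shown only to illustrate the idea, not as a recommendation) is corrected score = R − W / (k − 1), where R = number right, W = number wrong, and k = number of answer choices:

```python
# Classic correction-for-guessing formula (illustrative only):
# corrected = R - W / (k - 1)
# R = number right, W = number wrong (omitted items not counted), k = answer choices
def corrected_score(num_right, num_wrong, num_choices):
    return num_right - num_wrong / (num_choices - 1)

print(corrected_score(num_right=30, num_wrong=8, num_choices=5))  # -> 28.0
```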
What about speed tests?
Items near the end of the test may appear more difficult than they really are (test-takers may not get to those items)
- Items at the end may also look like they discriminate better than they actually do (test-takers who know the material may work faster and are more likely to reach later items)
Test developers should:
- Administer the test with generous time limits, conduct the item analysis, and then norm the test under the intended speed conditions
Item Analysis (Qualitative Factors)
Qualitative item analysis
- Nonstatistical procedures used to explore how individual test items work
- Rely on verbal procedures
- Think-aloud technique
– Describe thought processes related to answering test items
Test Revision
Can occur as a stage in…
New test development
- Use quantitative & qualitative item analysis data to create a new form
- Administer the new form to a standardization sample
Life cycle of existing test, if…
- Dated stimulus materials
- Dated vocabulary
- Outdated norms
Cross validation
- Revalidation of a test on a sample other than those on whom test performance was originally found to be a valid predictor of some criterion
- Different from a standardization sample
- Ex: validate a depression screening measure in sample A, then revalidate it in sample B
Validity Shrinkage
- Expected decrease in item validities observed during cross-validation
- Why? The operation of chance: item validities that were inflated by chance in the original sample do not hold up in a new sample
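A small simulation can make the shrinkage concrete: weights fit in sample A capitalize on chance, so the same weights typically look less valid when applied to sample B. Everything below is simulated, purely illustrative data:

```python
import numpy as np

# Validity shrinkage sketch: regression weights fit in sample A capitalize on
# chance, so they usually predict the criterion less well in sample B.
rng = np.random.default_rng(0)

def make_sample(n=100, n_items=10):
    items = rng.normal(size=(n, n_items))
    criterion = 0.4 * items[:, 0] + rng.normal(size=n)  # only item 0 is truly valid
    return items, criterion

items_a, crit_a = make_sample()  # original validation sample
items_b, crit_b = make_sample()  # cross-validation sample

# Fit least-squares weights in sample A, then apply them unchanged in sample B.
weights, *_ = np.linalg.lstsq(items_a, crit_a, rcond=None)
r_a = np.corrcoef(items_a @ weights, crit_a)[0, 1]  # validity in sample A
r_b = np.corrcoef(items_b @ weights, crit_b)[0, 1]  # cross-validated validity
print(f"validity in A = {r_a:.2f}, in B = {r_b:.2f}")  # typically r_b < r_a
```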