Chapter 6 - Writing Test Items Flashcards
DEVELLIS
Provided guidelines for writing test items:
- define what you want to measure
- generate an item pool
- avoid long items
- avoid double-barrelled items that convey two ideas
- mix positively and negatively worded items to guard against an acquiescence response set
Acquiescence response set
People tend to agree with test items
- so word some items in the opposite direction
Dichotomous Format
- true/false
- items are easy to construct from a text
- simple, quick, easy to score
- but encourages memorization
- truth often comes in shades of grey, which a true/false choice cannot capture
- often used in personality tests
Distractors
- the wrong answers on a polytomous (multiple-choice) test
- it is rare for more than 4 to work efficiently
- ineffective or joke distractors hurt reliability: they are time-consuming to read and let poorly prepared test takers guess correctly
- best to have 3-4 good ones
Likert Format
- test format that requires a person to indicate degree of agreement with a statement
- scoring requires negatively worded items to be reverse scored, then responses are summed
- popular for measuring attitudes
- can be subjected to factor analysis, so developers can find groups of items that go together
- the format used to create Likert scales
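The reverse-scoring rule above can be sketched in a few lines. This is only an illustration: the function name `score_likert`, the 5-point default, and the 0-based item positions are assumptions made for the example, not part of the original notes.

```python
def score_likert(responses, reverse_items, points=5):
    """Sum Likert responses after reverse-scoring negatively worded items.

    responses: list of ints in 1..points, one per item
    reverse_items: set of 0-based positions of negatively worded items
    points: number of scale points (5 is assumed here for illustration)
    """
    total = 0
    for i, r in enumerate(responses):
        if not 1 <= r <= points:
            raise ValueError(f"response {r} is outside 1..{points}")
        # Reverse scoring maps 1 -> points, 2 -> points - 1, and so on.
        total += (points + 1 - r) if i in reverse_items else r
    return total


# Items at positions 1 and 3 are negatively worded in this made-up example.
example_total = score_likert([5, 1, 4, 2], reverse_items={1, 3})
```

Here the responses 1 and 2 on the negatively worded items become 5 and 4 before summing, so strong disagreement with a negative statement counts the same as strong agreement with a positive one.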
Category Format
- 10-point rating scale
- controversy: people change their ratings depending on context AND subjects tend to spread responses evenly across the 10 points
- more reliable if all options are clearly labelled, rather than only the extremes
Visual Analogue Scale
- respondent is given a line and asked to mark a place between two defined end points
- popular for measuring self-rated health
- not used for multi-item scales because scoring is time-consuming
Checklist
- test format common in personality measurement, e.g. the adjective checklist
- respondent must either endorse each adjective or not
Q Sort
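- test format used to describe oneself or provide ratings of others
- respondent is given statements and asked to sort them into 9 piles
- statements that hit home go in pile 9; those that do not apply at all go in pile 1
- most statements are placed around pile 5, producing a bell-shaped curve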
Item Difficulty
- proportion of people who get the item correct
To find the optimal difficulty level:
1. Determine the probability that the item could be answered correctly by chance (.50 for T/F, .25 for four-option MC)
2. Add 1.00 to the chance performance and divide by 2.0
Optimal Item Difficulty Level
Halfway between 100% of people getting the item correct and the level of success expected by chance alone
- e.g. ODL for a four-option multiple-choice item is (1.00 + .25) / 2 = .625
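The chance-correction arithmetic can be checked with a short sketch. The function names `item_difficulty` and `optimal_difficulty` are made up for this example, and chance performance is assumed to be 1 divided by the number of answer options.

```python
def item_difficulty(item_responses):
    """Proportion of test takers who got the item correct (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def optimal_difficulty(n_options):
    """Halfway between 100% correct and chance-level success.

    Assumes chance performance is 1 / n_options (e.g. .50 for T/F, .25 for 4-option MC).
    """
    chance = 1.0 / n_options
    return (1.0 + chance) / 2.0


odl_multiple_choice = optimal_difficulty(4)  # four-option multiple choice
odl_true_false = optimal_difficulty(2)       # true/false
observed = item_difficulty([1, 1, 1, 0])     # 3 of 4 people correct
```

Comparing `observed` with the optimal level for the item's format gives a quick indication of whether the item is too easy or too hard.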
Extreme Group Method
Compares people who have done well on the test with those who have done poorly
- the difference between these groups in proportion correct is the discrimination index
Discrimination Index
Difference between the proportion of high scorers who got the item right and the proportion of low scorers who got it right
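A minimal sketch of the extreme group calculation follows. The choice of the top and bottom thirds as the extreme groups is an assumption made for this example (the notes do not say how the groups are formed), and `discrimination_index` is a hypothetical helper name.

```python
def discrimination_index(records):
    """Extreme group discrimination index for one item.

    records: list of (total_score, item_correct) pairs, item_correct in {0, 1}.
    Assumption: the extreme groups are the top and bottom thirds by total score.
    """
    ranked = sorted(records, key=lambda rec: rec[0])  # lowest total score first
    k = max(1, len(ranked) // 3)                      # size of each extreme group
    bottom, top = ranked[:k], ranked[-k:]
    p_top = sum(correct for _, correct in top) / k
    p_bottom = sum(correct for _, correct in bottom) / k
    return p_top - p_bottom


records = [(10, 1), (9, 1), (8, 1), (6, 1), (5, 0), (5, 1), (3, 0), (2, 1), (1, 0)]
d_index = discrimination_index(records)
```

A value near 1 means high scorers get the item right far more often than low scorers; a value near 0 or below suggests the item does not separate the groups.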
Point Biserial Method
Point Biserial correlation
Correlation between a dichotomous variable (e.g. item right or wrong) and a continuous variable (e.g. total test score)
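One way to compute this correlation by hand is sketched below. It assumes the item is scored 1 or 0, the continuous variable is the total test score, and the standard deviation is the population form; `point_biserial` is a name chosen for the example.

```python
import math


def point_biserial(item_correct, total_scores):
    """Point biserial correlation between a 0/1 item and total test scores."""
    n = len(item_correct)
    n_correct = sum(item_correct)
    p = n_correct / n  # proportion who got the item right
    if p in (0.0, 1.0):
        raise ValueError("item must have both right and wrong answers")
    mean_all = sum(total_scores) / n
    # Population standard deviation of the total scores (assumed convention).
    sd_all = math.sqrt(sum((y - mean_all) ** 2 for y in total_scores) / n)
    if sd_all == 0:
        raise ValueError("total scores have no variance")
    # Mean total score of those who answered the item correctly.
    mean_correct = sum(
        y for x, y in zip(item_correct, total_scores) if x == 1
    ) / n_correct
    return (mean_correct - mean_all) / sd_all * math.sqrt(p / (1 - p))


r_pb = point_biserial([1, 1, 0, 0], [4, 3, 2, 1])
```

With the population standard deviation, this gives the same value as an ordinary Pearson correlation computed on the 0/1 item scores and the totals.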
Item Characteristic Curve
Can prepare a graph for each test item
- total test score is plotted on the horizontal (x) axis
- proportion of examinees who get the item correct is plotted on the vertical (y) axis
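The points for such a graph can be tabulated as in the rough sketch below; plotting is left out. The helper name `item_characteristic_points` and the sample data are invented for illustration.

```python
from collections import defaultdict


def item_characteristic_points(total_scores, item_correct):
    """Return (total_score, proportion_correct) pairs, sorted by total score.

    total_scores: total test score for each examinee (x axis)
    item_correct: 1 or 0 for the same examinees on one item
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [number correct, number of examinees]
    for score, correct in zip(total_scores, item_correct):
        counts[score][0] += correct
        counts[score][1] += 1
    return [(score, c / n) for score, (c, n) in sorted(counts.items())]


curve = item_characteristic_points([1, 1, 2, 2, 3, 3], [0, 0, 0, 1, 1, 1])
```

For a good item, the proportion correct should rise as the total score rises, as it does in this small example.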
Item Response Theory
Newer approaches to testing, based on item analysis, consider the chances of getting particular items right or wrong
- each item on a test has its own item characteristic curve that describes the probability of getting the item right given ability level
- some say it’s the most important development in testing
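As an illustration of a curve that gives the probability of a correct answer at each ability level, the sketch below uses a logistic function with a difficulty and a discrimination parameter. This particular form is an assumption made for the example; the notes do not name a specific model, and `prob_correct` and its parameters are hypothetical.

```python
import math


def prob_correct(ability, difficulty, discrimination=1.0):
    """Probability of a correct answer under an assumed logistic item curve.

    ability: examinee's ability level
    difficulty: ability level at which the probability is .50
    discrimination: steepness of the curve (assumed parameter)
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Probabilities for one item across a range of ability levels.
item_curve = [(theta, prob_correct(theta, difficulty=0.0)) for theta in (-2, -1, 0, 1, 2)]
```

Under this assumed form, an examinee whose ability equals the item's difficulty has a .50 chance of answering correctly, and the chance rises with ability.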
PROMIS
Patient-Reported Outcomes Measurement Information System
- to develop precise measures that describe how patients report their health status
- designed to provide practicing doctors with highly valid measures that can be used in clinical practice and research
- large item bank created
Criterion-Referenced Test
Compares performance with some clearly defined criterion for learning
- popular in individualized instruction programs
- many regard CRTs as diagnostic instruments too
- if a student does poorly on some items, the teacher knows that student’s instruction needs more focus in that area
Antimode
Least frequent score on a frequency polygon
- the polygon is made when evaluating test items
- the item is given to two groups of students (one that’s been taught the material, one that hasn’t)
- a V-shaped polygon results: scores on the left come from the inexperienced group, scores on the right from the experienced group
- BOTTOM OF V IS ANTIMODE: least frequent score!
- this point divides those who’ve been exposed from those who haven’t
- taken as CUTTING SCORE
- when people score higher, they’ve met the objective
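The search for the bottom of the V can be sketched as follows. This is a simplified illustration under several assumptions: scores are integers, each group has a single clear peak, and the antimode is taken as the least frequent score strictly between the two peaks. The name `antimode_cutting_score` and the sample scores are made up.

```python
from collections import Counter


def antimode_cutting_score(unexposed_scores, exposed_scores):
    """Least frequent combined score between the two groups' peaks.

    Assumes integer scores and that the unexposed group peaks lower
    than the exposed group. Ties go to the lowest tied score.
    """
    unexposed = Counter(unexposed_scores)
    exposed = Counter(exposed_scores)
    combined = unexposed + exposed                 # the pooled frequency polygon
    low_peak = unexposed.most_common(1)[0][0]      # peak of the untaught group
    high_peak = exposed.most_common(1)[0][0]       # peak of the taught group
    candidates = range(low_peak + 1, high_peak)
    if len(candidates) == 0:
        raise ValueError("no scores lie between the two peaks")
    # Scores nobody obtained count as frequency 0.
    return min(candidates, key=lambda s: combined[s])


unexposed = [1, 2, 2, 2, 3, 3, 4, 5]
exposed = [5, 6, 6, 7, 7, 7, 8, 8]
cutting_score = antimode_cutting_score(unexposed, exposed)
```

In this made-up data the peaks fall at 2 and 7, and the score 4 is the least frequent value between them, so it would serve as the cutting score.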