Chapter 6: Test Development Flashcards
6 steps of test construction
- Define the test (what are we testing and why)
- Select item format
- Construct test items
- Test the items (determine reliability and validity)
- Revise the test
- Publish the test
Answer choice formats: selected vs. constructed items
Selected items: pick from a number of answers (multiple choice, true/false, matching)
Constructed items: generate your own answers (short answer, essay)
Strengths of selected-response items
Can include more items (each question takes less time to answer)
Increased content sampling as well as reliability and validity
Reduction of construct-irrelevant factors
Scoring is efficient and reliable
Weaknesses of selected-response items
Developing good items is time consuming (constructed-response items are easier to write)
Unable to assess all abilities
Subject to random guessing (can make it appear the examinee knows more than they actually do)
Strengths of constructed-response items
Questions are relatively easy to write
Can assess higher-order cognitive abilities (have to show reasoning)
No random guessing
Weaknesses of constructed-response items
Test can include relatively few items (takes longer to answer each one)
Difficult to score reliably (even with good rubric, still hard)
Subject to misinterpretation (examinee might misconstrue question)
Construct-irrelevant factors can sneak in (ex- bad handwriting makes answers hard to read)
Which three things on a test should be clear?
Clear directions (examinee should know how to answer each question)
Clear questions (each question should ask only one thing and be answerable decisively)
Clear print (text should be easy to read)
5 things that should not be included on a test
Cues to answers (ex- including answer in a different question)
Items that cross pages (increases likelihood of examinee error)
Construct-irrelevant factors
Exact phrasing from materials (encourages rote memorization over understanding of concept)
Biased language and content
2 things to consider when placing items on a test
Item arrangement: placement should make sense
Number of items: for a power test, examinees should be able to complete all items within the time limit
Who the items on the test should be tailored to
The target population (example: wouldn’t give a college level test to 4th graders)
Multiple choice tests: factors to consider pertaining to item stem
Stem should clearly state question
Negatively stated stems should not be used
Multiple choice tests: factors to consider pertaining to alternatives
Alternatives should be brief
3-5 alternatives should be included
Alternatives should be grammatically consistent with the stem
Alternatives should be plausible (otherwise, question becomes a dead giveaway)
Multiple choice tests: factors to consider pertaining to questions themselves
Items should be clear and easy to read
Only 1 correct/best answer should be included
Placement of correct answer should be random (otherwise, examinees can detect pattern)
Minimize “none/all of the above” and “always/never” questions (becomes a dead giveaway)
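The guideline on random answer placement can be sketched mechanically. Below is a minimal, hypothetical Python helper (the stem, alternatives, and function name are illustrative, not from the source) that shuffles an item's alternatives while tracking where the correct answer lands:

```python
import random

def shuffle_alternatives(stem, alternatives, correct_index):
    """Shuffle the alternatives of one multiple-choice item and
    return (stem, shuffled alternatives, new index of correct answer)."""
    order = list(range(len(alternatives)))
    random.shuffle(order)  # random placement: examinees cannot detect a pattern
    shuffled = [alternatives[i] for i in order]
    new_correct = order.index(correct_index)
    return stem, shuffled, new_correct

stem, opts, key = shuffle_alternatives(
    "Which item format is scored most reliably?",
    ["Essay", "Short answer", "Multiple choice", "Oral interview"],
    correct_index=2,
)
```

Applying this per item (rather than fixing the key at position C, for example) keeps the answer key's position random across the whole test.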
True/false tests: factors to consider
Include only 1 idea in each item
Avoid specific determiners (never, always) and qualifiers (usually)
Keep items about the same length
Approximately equal number of true and false items
Matching tests: factors to consider pertaining to directions
State basis for matching
Indicate responses may be used once, more than once, or not at all
Matching tests: factors to consider pertaining to item design
Include more responses than stems (prevents answering the last pairing by simple elimination)
Keep lists relatively short
Keep responses brief
Consider order of responses
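The matching-item guidelines above can be illustrated with a small sketch. The stems and responses here are made-up examples; the point is the structure: more responses than stems, short lists, brief responses, and responses in a considered (alphabetical) order:

```python
stems = ["Norm group", "Reliability", "Validity"]

# One extra distractor so the final pairing cannot be found by elimination;
# responses sorted alphabetically so their order gives no clues.
responses = sorted([
    "Consistency of scores",
    "Degree to which a test measures what it claims to measure",
    "Reference sample used to interpret scores",
    "Tendency to respond in a fixed pattern",  # distractor
])
```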
What kind of material should be used in a matching test?
Homogeneous material (all items should relate to a common theme)
Essay tests: factors to consider
Clearly specify the task
Develop a comprehensive scoring rubric
Limit essay items to objectives that can’t be easily measured with selected-response items
Grade blindly
Short-answer tests: factors to consider
Items should only require short answers
Only 1 correct response
Quantitative items: indicate level of precision desired
Use direct questions rather than incomplete sentences
Factors to consider if using incomplete sentence short-answer questions
Include only one blank space (multiple blanks make the item's intent ambiguous)
Add blanks near end of sentence
Don’t give clues with blank size
Give enough space to answer
Short-answer tests: what should be created for each item?
Scoring rubric
Typical response tests: factors to consider pertaining to items
Focus items on experiences (thoughts, feelings, behaviors)
Limit items to a single experience
Avoid items that will be answered universally the same
Avoid leading questions
Limit use of “never” and “always”
Typical response tests: factors to consider pertaining to responses
Don’t always have high numbers represent the same thing (vary scoring direction across items)
Odd or even number of options: an odd number allows a neutral midpoint; an even number forces a choice
Label options for Likert-type scales (rating from 0-7, etc.)
Consider structuring scale as an interview
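Varying scoring direction implies that some items must be reverse-scored before totals are computed. A minimal sketch, assuming a 1-5 Likert scale (the function name and scale bounds are illustrative): the mirror value of a response is `low + high - response`.

```python
def reverse_score(response, low=1, high=5):
    """Map a response on a low..high scale to its mirror value,
    e.g. on a 1-5 scale: 1 -> 5, 2 -> 4, 3 -> 3."""
    return low + high - response

print(reverse_score(2))  # → 4
```

After reversing the negatively worded items, all item scores point the same direction and can be summed into a scale total.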