Study Guide: Test Construction Flashcards
6 Steps of Test Construction
- Define Test’s Purpose
- Preliminary Design Issues
- Item Preparation
- Item Analysis
- Standardization and Ancillary Research
- Preparation of Final Materials and Publication
Test Purpose
- What will be measured?
- Who is the target audience, and does the construct match the group?
Preliminary Design Issues definition
Anything that could introduce error. The design must strike a balance between efficiency and accuracy while covering the construct in both breadth and depth.
Examples of Preliminary Design Issues
- Mode of administration
- Length: longer tests are more reliable (up to ~15 min)
- Item format (T/F, multiple choice, essay)
- Number of scores (for example, depression is multi-faceted and has many scales)
- Training
- Background research
What is the most important Preliminary Design Issue?
Background research!
4 parts of Item Preparation
- Stimulus
- Response
- Conditions governing responses
- Scoring procedures
Stimulus
The question itself is a stimulus.
*We are trying to provoke a specific response, correlated with the construct, that is driven ONLY by the stimulus
Response
The behavior you are looking for, which is correlated with the construct
Conditions governing responses
What are your rules? Is there a time limit? Are they able to ask questions?
Scoring procedures
Formula or rubric used to compute final scores
*Make sure each facet is represented (weighted)
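A minimal sketch of a weighted scoring formula, assuming each facet already has its own sub-score. The facet names, scores, and weights below are hypothetical and only illustrate how every facet can be represented in the final score.

```python
def weighted_score(facet_scores, weights):
    """Combine facet sub-scores into one final score using facet weights."""
    if set(facet_scores) != set(weights):
        raise ValueError("every facet needs exactly one weight")
    return sum(facet_scores[facet] * weights[facet] for facet in weights)


# Hypothetical facets of a depression scale (illustrative values only).
facet_scores = {"mood": 12, "sleep": 8, "appetite": 5}
weights = {"mood": 0.5, "sleep": 0.3, "appetite": 0.2}

total = weighted_score(facet_scores, weights)
print(f"final score = {total:.1f}")
```

The check at the top fails loudly if a facet is left out, which is one way to enforce that each facet is represented.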
Types of Test Items
- Selected-Response Items
- Constructed-Response Items
Selected-Response Items
Items where all possible responses are known in advance and can be scored without bias
*T/F, multiple choice, Likert scale, etc.
Constructed-Response Items
Items where the possible responses are not known in advance and are more nebulous
*Essays, oral responses, performance assessment
Benefits of Selected-Response Items
- One clear answer
- Scoring reliability and efficiency
Benefits of Constructed-Response Items
- No agreed-upon answer
- Behavior (Bx) can give further context
- Goes deeper into construct
Item Analysis
- Item Tryout
- Statistical Analysis
- Item Selection
Item Tryout
aka Pilot Test
- Get subjects similar to the target population; they cannot be the same people used in the actual survey
- Try out 2-3x as many items as you think you will need
Statistical Analysis
- Difficulty
- Discrimination
- Distractor Analysis
Item Difficulty
Proportion (or %) of subjects taking the test who answered the item correctly
Difficulty formula
p = (# people who answered correctly) / (total # people)
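The difficulty formula can be sketched in a few lines, assuming each response to the item has already been scored right/wrong. The function name and the sample data are hypothetical.

```python
def item_difficulty(responses):
    """Return p, the proportion of test takers who answered the item correctly.

    responses: list of booleans (True = correct), one per test taker.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(1 for r in responses if r) / len(responses)


# Hypothetical data: 10 test takers, 6 of whom answered correctly.
answers = [True, True, False, True, False, True, True, False, True, False]
print(f"p = {item_difficulty(answers):.2f}")  # prints "p = 0.60"
```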
What shows good variability for Difficulty?
p = .5 (half of the subjects answer correctly)
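As a rough illustration of why mid-range difficulty shows the most variability, assuming right/wrong (0/1) scoring: the variance of such an item is p(1 - p), which peaks at p = .5 and shrinks toward zero at the extremes. The function name is hypothetical.

```python
def item_variance(p):
    """Variance of a right/wrong (0/1) item whose difficulty is p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return p * (1 - p)


# Compare easy, moderate, and hard items.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  variance = {item_variance(p):.2f}")
```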
Difficulty considerations
- Behavioral measure
- Characteristic of the item and the sample
- Extreme p values restrict variability
- More comparative than a ‘cut-off’
Why is Difficulty a behavioral measure?
It taps into individual differences in holding the construct
Item Discrimination
- Assumes that a single item and the test as a whole measure the same thing; each item is compared to the other items within the test
- Looks at how well any single item discerns who does or does not have the trait
- You want a high value!
2 Indices of Discrimination
- Index D
- Discrimination Coefficients
Index D(iscrimination) formula
- Score the test and rank subjects by total score
- Take the top and bottom 27%
D = (# correct in upper group - # correct in lower group) / (# people in the larger group)
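The Index D steps can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and data are hypothetical, group size is taken as 27% of the sample rounded to the nearest person, and tied total scores are left in their original order.

```python
def index_d(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index D for a single item.

    total_scores: total test score for each person.
    item_correct: True/False for each person on the item of interest.
    """
    if not total_scores or len(total_scores) != len(item_correct):
        raise ValueError("inputs must be non-empty and the same length")
    # Score and rank: order people from highest to lowest total score.
    ranked = sorted(zip(total_scores, item_correct),
                    key=lambda pair: pair[0], reverse=True)
    # Take the top and bottom 27% (at least one person per group).
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    correct_upper = sum(1 for _, ok in upper if ok)
    correct_lower = sum(1 for _, ok in lower if ok)
    # Both groups have n people here, so "the larger group" is simply n.
    return (correct_upper - correct_lower) / n


# Hypothetical pilot data for 11 people.
scores = [95, 90, 88, 80, 75, 70, 65, 60, 55, 50, 40]
item = [True, True, True, False, True, True, False, False, True, False, False]
print(f"D = {index_d(scores, item):.2f}")  # prints "D = 0.67"
```

With 11 people, each group has 3 members: all 3 in the upper group answered correctly and 1 in the lower group did, so D = (3 - 1) / 3.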
Why do we generally focus on just the high/low 27%?
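To look up. The commonly cited rationale is that, for roughly normally distributed scores, 27% balances making the two groups as different as possible against keeping each group large enough to give stable results.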
Ranges of D(iscrimination)
.40 and up = good
.30 to .39 = okay
.20 to .29 = marginal
.19 and below = poor
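The ranges above can be turned into a small lookup, sketched here with a hypothetical function name. One assumption: values that fall between the listed bands (for example .395) are assigned by the lower cut-off of each band.

```python
def rate_discrimination(d):
    """Label a D value using the ranges from the flashcard."""
    if d >= 0.40:
        return "good"
    if d >= 0.30:
        return "okay"
    if d >= 0.20:
        return "marginal"
    return "poor"


for d in (0.45, 0.35, 0.25, 0.10, -0.20):
    print(f"D = {d:+.2f} -> {rate_discrimination(d)}")
```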
Distractor Guidelines
- Plausible
- Parallel in structure and grammar
- Keep everything short
- Mutually exclusive
- Alternate placement
- Limit ‘all of the above’ and similar options
D values for a distractor
- You want a low value, preferably negative (this means more people in the low group chose it)
- Zero: it might not be an equally plausible answer
- Be cautious of large positive D values as well (more people in the high group chose the distractor)
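A sketch of how D could be computed for every answer option, distractors included, assuming the upper and lower groups have already been formed. The function name, option numbers, and responses are hypothetical.

```python
from collections import Counter


def option_d_values(upper_choices, lower_choices, options):
    """D for each answer option, from the options each group picked."""
    n = max(len(upper_choices), len(lower_choices))  # larger group
    upper, lower = Counter(upper_choices), Counter(lower_choices)
    # Counter returns 0 for an option nobody chose.
    return {opt: (upper[opt] - lower[opt]) / n for opt in options}


# Hypothetical item with options 1-4; option 2 is the keyed answer.
upper_group = [2, 2, 2, 2, 1, 2, 2, 3, 2, 2]
lower_group = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]

for option, d in option_d_values(upper_group, lower_group, [1, 2, 3, 4]).items():
    print(f"option {option}: D = {d:+.2f}")
```

In this made-up data, distractors 1 and 3 have negative D values (the low group chose them more often), the keyed answer has a positive D, and option 4 has D = 0 because nobody chose it.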
Why do we want consistency between distractors?
Moving away from random guessing helps determine a truer measure of the construct.
Standardization and Ancillary Research
- Norming
- Reliability Studies
- Equating Programs
Test Norming
Two steps:
- Define target population
- Select sample
Sampling Methods
*Probability and Non-Probability
Probability Sampling Methods
Every member of the population has a known, non-zero chance of being selected
- Random
- Systematic
- Stratified
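The three probability methods can be sketched with Python's standard `random` module. This is an illustrative sketch, not a norming procedure: the function names, the population of 100 ID numbers, and the urban/rural strata are all hypothetical, and the stratified version assumes proportional allocation.

```python
import random


def simple_random_sample(population, n, rng):
    """Random: every member has the same chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Systematic: every k-th member after a random starting point."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, stratum_of, n, rng):
    """Stratified: sample each stratum in proportion to its size."""
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from n.
        share = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, share))
    return sample


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(100))  # hypothetical ID numbers


def region(person):
    """Hypothetical strata: IDs below 60 are urban, the rest rural."""
    return "urban" if person < 60 else "rural"


print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(population, region, 10, rng))
```

With 60 urban and 40 rural members, the stratified sample of 10 contains 6 urban and 4 rural IDs, mirroring the population.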
Non-Probability Sampling Methods
Not every member of the population has a known chance of being selected, and some have a zero chance of being selected
- Convenience Sampling
- Judgement
- Quota
- Snowball
What is more important: original conceptualization or the technical/statistical work?
Original concept!
What should you be thinking about, even at the original design stage?
Final Score Reports!
Does the norming group need to be large?
Not if it is properly selected!