Study Guide: Test Construction Flashcards by Caryn Zaner

6 Steps of Test Construction

Define Test’s Purpose
Preliminary Design Issues
Item Preparation
Item Analysis
Standardization and Ancillary Research
Preparation of Final Materials and Publication

Test Purpose

What will be measured?

* Who is the target audience and does the construct match the group?

Preliminary Design Issues definition

Any design decision that can introduce error. You must strike a balance between efficiency and accuracy while covering both the breadth and the depth of the construct.

Examples of Preliminary Design Issues

Mode of administration
Length: longer is more reliable (up to ~15 min)
Item format (T/F, multiple choice, essay)
Number of scores (for example, depression is multifaceted and has many scales)
Training
Background research
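One standard way to see why a longer test tends to be more reliable is the Spearman-Brown prophecy formula, which these cards do not name. The sketch below assumes the added items are parallel to the existing ones, and the .60 starting reliability is invented for illustration. The formula does not model fatigue or boredom, so it cannot by itself say where a practical length limit falls.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a test by `length_factor`,
    assuming the new items are parallel to the original ones."""
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)

# Hypothetical test with reliability .60, doubled in length
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```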

What is the most important Preliminary Design Issue?

Background research!

4 parts of Item Preparation

Stimulus
Response
Conditions governing responses
Scoring procedures

Stimulus

The question itself is a stimulus.

*We are trying to elicit a specific response, correlated with the construct, that is driven ONLY by the stimulus

Response

The behavior you are looking for that is correlated with the construct

Conditions governing responses

What are your rules? Is there a time limit? Are they able to ask questions?

Scoring procedures

Formula or rubric used to compute final scores

*Make sure each facet is represented (weighted)
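A weighted scoring rubric can be sketched as a weighted sum of facet scores. The facet names, weights, and scores below are hypothetical, chosen only to show that every facet must carry an explicit weight.

```python
def weighted_total(facet_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine facet scores into one final score using rubric weights."""
    if set(facet_scores) != set(weights):
        raise ValueError("every facet needs exactly one weight")
    return sum(weights[f] * facet_scores[f] for f in facet_scores)

# Hypothetical depression-scale facets and weights (illustrative only)
weights = {"mood": 0.5, "sleep": 0.25, "appetite": 0.25}
scores = {"mood": 8, "sleep": 4, "appetite": 6}
print(weighted_total(scores, weights))  # 6.5
```

Raising an error when a facet lacks a weight is one way to enforce "make sure each facet is represented."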

Types of Test Items

Selected-Response Items
Constructed-Response Items

Selected-Response Items

All possible responses are known in advance, so they can be scored without bias

*T/F, multiple choice, Likert scale, etc.

Constructed-Response Items

Possible responses are not fixed in advance and are more open-ended

*Essays, oral responses, performance assessment

Benefits of Selected-Response Items

One clear answer
Scoring reliability and efficiency

Benefits of Constructed-Response Items

No single agreed-upon answer is required
The respondent's behavior (Bx) can give further context
Goes deeper into the construct

Item Analysis

Item Tryout
Statistical Analysis
Item Selection

Item Tryout

aka Pilot Test

Get subjects similar to the target population; they cannot be the same people used in the actual survey
Write 2-3x as many items as you think you will need

Statistical Analysis

Difficulty
Discrimination
Distractor Analysis

Item Difficulty

Proportion (%) of test takers who answered the item correctly

Difficulty formula

p = (# of people who answered correctly) / (total # of people)
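A minimal Python sketch of the difficulty formula, assuming each response is scored 1 (correct) or 0 (incorrect) and stored in a hypothetical people-by-items list. The data are invented for illustration.

```python
def item_difficulty(responses: list[list[int]]) -> list[float]:
    """p for each item: proportion of test takers who answered it correctly.
    responses[person][item] is 1 for correct, 0 for incorrect."""
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(person[i] for person in responses) / n_people for i in range(n_items)]

# Hypothetical pilot data: 4 people x 3 items
data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(item_difficulty(data))  # [0.75, 0.75, 0.25]
```

Note that a higher p means an easier item.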

What shows good variability for Difficulty?

Moderate difficulty, with p values near .50

Difficulty considerations

Behavioral measure
Characteristic of the item and the sample
Extreme p values restrict variability
More comparative than a ‘cut-off’
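Why extreme p values restrict variability can be shown with the variance of a right/wrong (0/1) item, which is p(1 - p). This short sketch prints that variance for a few illustrative p values; it peaks at p = .50 and shrinks toward zero as p approaches 0 or 1.

```python
def item_variance(p: float) -> float:
    """Variance of a dichotomous (0/1) item with difficulty p."""
    return p * (1 - p)

for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"p = {p:.2f}  variance = {item_variance(p):.4f}")
```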

Why is Difficulty a behavioral measure?

It taps into individual differences in holding the construct

Item Discrimination

Assumes that a single item and the whole test measure the same thing, so each item is compared with the other items in the test (via the total score)
Shows how well a single item discerns who does and does not have the trait
You want a high value!

2 Indices of Discrimination

* Index D
* Discrimination Coefficients
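The cards name discrimination coefficients without detail. One common choice is the point-biserial correlation between a right/wrong item and a total score. The sketch below computes a "corrected" version that correlates the item with the sum of the other items, so the item is not correlated with itself. Function names and data are hypothetical.

```python
from statistics import mean, pstdev

def point_biserial(item: list[int], scores: list[float]) -> float:
    """Point-biserial r between a 0/1 item and a continuous score."""
    if len(set(item)) < 2:
        raise ValueError("item needs both correct and incorrect responses")
    p = mean(item)  # item difficulty
    mean_right = mean(s for i, s in zip(item, scores) if i == 1)
    mean_wrong = mean(s for i, s in zip(item, scores) if i == 0)
    return (mean_right - mean_wrong) / pstdev(scores) * (p * (1 - p)) ** 0.5

def corrected_item_total(responses: list[list[int]], item_index: int) -> float:
    """Correlate one item with the total of the remaining items."""
    item = [person[item_index] for person in responses]
    rest = [sum(person) - person[item_index] for person in responses]
    return point_biserial(item, rest)

# Hypothetical right/wrong data: 4 people x 3 items
data = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(round(corrected_item_total(data, 0), 3))  # 0.707
```

As with Index D, higher positive values mean the item better separates high and low scorers.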

Index D(iscrimination) formula

* Score the test and rank the test takers
* Take the top and bottom 27%
D = (# correct in upper group - # correct in lower group) / # people in the larger group
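The Index D steps above can be sketched in Python as follows, assuming 0/1 scored responses in a hypothetical people-by-items list. Because both tail groups here have the same size, that size is also the "larger group" in the formula. Ties at the 27% cut are broken by input order (Python's sort is stable), which is an arbitrary assumption of this sketch.

```python
def index_d(responses: list[list[int]], item_index: int, fraction: float = 0.27) -> float:
    """Index D for one item from upper and lower groups formed by total score."""
    ranked = sorted(responses, key=sum, reverse=True)   # score and rank
    n_group = max(1, round(len(ranked) * fraction))     # people per tail group
    upper, lower = ranked[:n_group], ranked[-n_group:]
    correct_upper = sum(person[item_index] for person in upper)
    correct_lower = sum(person[item_index] for person in lower)
    return (correct_upper - correct_lower) / n_group

# Hypothetical data: 10 people x 5 items (27% of 10 rounds to 3 per group)
data = [
    [1, 1, 1, 1, 1], [1, 1, 1, 1, 0], [0, 1, 1, 1, 0], [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0], [0, 1, 1, 0, 0], [1, 0, 0, 1, 0], [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0], [0, 0, 0, 0, 0],
]
print(round(index_d(data, 0), 2))  # 0.33
```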

Why do we generally focus on just the high/low 27%?

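For normally distributed scores, Truman Kelley showed that splitting at the top and bottom 27% best balances two competing goals: groups extreme enough to differ clearly, yet large enough to give stable estimates.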

Ranges of D(iscrimination)

.40 and up = good
.30 to .39 = okay
.20 to .29 = marginal
.19 and below = poor
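The ranges above can be applied mechanically, as in this small sketch. The card lists ".19 and below" as poor; treating everything under .20 as poor is an assumption this sketch makes for values such as .195 that fall between the card's bands.

```python
def rate_discrimination(d: float) -> str:
    """Label an Index D value using the ranges on this card."""
    if d >= 0.40:
        return "good"
    if d >= 0.30:
        return "okay"
    if d >= 0.20:
        return "marginal"
    return "poor"  # below .20

print(rate_discrimination(0.33))  # okay
```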

Distractor Guidelines

* Plausible
* Parallel in structure and grammar
* Keep everything short
* Mutually exclusive
* Alternate placement
* Limit 'all of the above' type options

D values for a distractor

* You want low, preferably negative, values (negative means more of the low group chose it)
* Zero: the distractor might not be an equally plausible answer
* Be cautious of large positive D values as well (more of the high group chose the distractor)
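Distractor D values can be computed per answer option as the proportion of the upper group choosing it minus the proportion of the lower group choosing it. The option letters, answer key, and group responses below are hypothetical.

```python
def option_d(choices_upper: list[str], choices_lower: list[str], option: str) -> float:
    """D for one answer option: p(upper chose it) - p(lower chose it)."""
    p_upper = choices_upper.count(option) / len(choices_upper)
    p_lower = choices_lower.count(option) / len(choices_lower)
    return p_upper - p_lower

# Hypothetical choices; "A" is the keyed (correct) answer
upper = ["A", "A", "A", "B", "C"]
lower = ["A", "B", "B", "C", "D"]
for opt in "ABCD":
    print(f"{opt}: {option_d(upper, lower, opt):+.2f}")
```

With this data the key (A) comes out positive, distractors B and D negative, and C at zero, the case the card flags as possibly implausible.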

Why do we want consistency between distractors?

Moving away from randomness helps determine the true measure of the construct.

Standardization and Ancillary Research

* Norming
* Reliability Studies
* Equating Programs

Test Norming

Two steps:
* Define target population
* Select sample

Sampling Methods

* Probability
* Non-Probability

Probability Sampling Methods

Every member of the population has a known, non-zero chance of being selected
* Random
* Systematic
* Stratified
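The three probability methods can be sketched with Python's standard `random` module. The population of 100 IDs and the even/odd strata are hypothetical, and the stratified version assumes proportional allocation (the same sampling fraction within each stratum), which is one common choice among several.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Every k-th member after a random start within the first interval."""
    k = len(population) // n            # sampling interval
    start = rng.randrange(k)            # random start in [0, k)
    return population[start::k][:n]

def stratified_sample(population, fraction, stratum_of, rng):
    """Random sample of the same fraction from within each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

# Hypothetical population of 100 IDs, stratified by even/odd
population = list(range(100))
rng = random.Random(42)  # fixed seed so the draw is reproducible
print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))
print(stratified_sample(population, 0.1, lambda x: x % 2, rng))
```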

Non-Probability Sampling Methods

Not every member of the population has an equal chance of being selected, and some have a zero chance of being selected
* Convenience Sampling
* Judgement
* Quota
* Snowball
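Quota sampling shows why these methods are non-probability: quotas are filled from whoever arrives first, so people who never show up have zero chance of selection. The arrival order, group labels, and quotas below are hypothetical.

```python
def quota_sample(arrivals, quotas, group_of):
    """Fill a fixed quota per group in arrival order, skipping anyone
    whose group's quota is already full."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # stop once every quota is met
            break
    return sample

# Hypothetical arrivals labeled by group (first letter)
arrivals = ["F1", "M1", "F2", "F3", "M2", "M3"]
print(quota_sample(arrivals, {"F": 2, "M": 2}, lambda p: p[0]))
```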

What is more important: original conceptualization or the technical/statistical work?

Original concept!

What should you be thinking about, even at the original design stage?

Final Score Reports!

Does the norming group need to be large?

Not if it is properly selected!
