Chapter 6: Test Development Flashcards

1
Q

6 steps of test construction

A
  1. Define the test (what are we testing and why)
  2. Select item format
  3. Construct test items
  4. Test the items (determine reliability and validity)
  5. Revise the test
  6. Publish the test
2
Q

Answer choice formats: selected vs. constructed items

A

Selected items: pick from a number of answers (multiple choice, true/false, matching)
Constructed items: generate your own answers (short answer, essay)

3
Q

Strengths of selected-response items

A

Can include more items (each question takes less time to answer)
Broader content sampling, which supports reliability and validity
Reduction of construct-irrelevant factors
Scoring is efficient and reliable

4
Q

Weaknesses of selected-response items

A

Developing items is time-consuming (constructed-response items are easier to write)
Cannot assess all abilities
Subject to random guessing (makes it look like the examinee knows more than they actually do)
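
The effect of random guessing can be put into numbers. The sketch below is illustrative only: it assumes every item has the same number of options and that the examinee guesses blindly on all of them. The function name `expected_guessing_score` is made up for this example.

```python
from fractions import Fraction


def expected_guessing_score(n_items: int, n_options: int) -> Fraction:
    """Expected number of correct answers from blind guessing alone.

    Assumes each item has exactly one correct answer among `n_options`
    equally likely choices (an assumption for this sketch).
    """
    if n_items < 0 or n_options < 2:
        raise ValueError("need n_items >= 0 and n_options >= 2")
    return Fraction(n_items, n_options)


if __name__ == "__main__":
    # 40 four-option multiple-choice items
    print(expected_guessing_score(40, 4))  # 10
    # 40 true/false items
    print(expected_guessing_score(40, 2))  # 20
```

Under these assumptions, an examinee who knows nothing would still be expected to get a quarter of four-option items right, and half of true/false items.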

5
Q

Strengths of constructed-response items

A

Questions are relatively easy to write
Can assess higher-order cognitive abilities (have to show reasoning)
No random guessing

6
Q

Weaknesses of constructed-response items

A

Test can include relatively few items (each one takes longer to answer)
Difficult to score reliably (even with a good rubric)
Subject to misinterpretation (examinee might misconstrue the question)
Construct-irrelevant factors can sneak in (e.g., bad handwriting makes answers hard to read)

7
Q

Which three things on a test should be clear?

A
Clear directions (examinee should know how to answer each question)
Clear questions (each question should ask only 1 thing and have a decisive answer)
Clear print (should be easy to read)
8
Q

5 things that should not be included on a test

A

Cues to answers (e.g., including the answer to one question in a different question)
Items that cross pages (increases likelihood of examinee error)
Construct-irrelevant factors
Exact phrasing from study materials (encourages rote memorization over understanding of the concept)
Biased language and content

9
Q

2 things to consider when placing items on a test

A

Item arrangement: the order of items should make sense

Number of items: on a power test, examinees should be able to complete all items within the given time limit

10
Q

Who the items on the test should be tailored to

A

The target population (e.g., you wouldn’t give a college-level test to 4th graders)

11
Q

Multiple choice tests: factors to consider pertaining to item stem

A

Stem should clearly state the question

Negatively stated stems should be avoided

12
Q

Multiple choice tests: factors to consider pertaining to alternatives

A

Alternatives should be brief
3-5 alternatives should be included
Alternatives should be grammatically consistent with the stem
Alternatives should be plausible (otherwise, question becomes a dead giveaway)

13
Q

Multiple choice tests: factors to consider pertaining to questions themselves

A

Items should be clear and easy to read
Only 1 correct/best answer should be included
Placement of correct answer should be random (otherwise, examinees can detect pattern)
Minimize “none/all of the above” and “always/never” options (they tend to be dead giveaways)
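
Random placement of the correct answer is easy to automate. The following is a minimal sketch, not a prescribed procedure: the helper `shuffle_alternatives` and the sample item are hypothetical, and it assumes the correct answer does not also appear among the distractors.

```python
import random


def shuffle_alternatives(correct, distractors, rng):
    """Return the alternatives in random order plus the index of the key.

    `rng` is a random.Random instance so the order can be reproduced
    when the same seed is reused.
    """
    options = [correct, *distractors]
    rng.shuffle(options)  # in-place random reordering
    return options, options.index(correct)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed chosen only for reproducibility
    # Hypothetical item: which format is a constructed-response format?
    options, key = shuffle_alternatives(
        "Essay", ["Matching", "True/false", "Multiple choice"], rng
    )
    for letter, text in zip("ABCD", options):
        print(f"{letter}. {text}")
    print("Key:", "ABCD"[key])
```

Shuffling each item independently keeps the key position from following a pattern that examinees could detect.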

14
Q

True/false tests: factors to consider

A

Include only 1 idea in each item
Avoid specific determiners (never, always) and qualifiers (usually)
Keep items about the same length
Approximately equal number of true and false items

15
Q

Matching tests: factors to consider pertaining to directions

A

State basis for matching

Indicate whether responses may be used once, more than once, or not at all

16
Q

Matching tests: factors to consider pertaining to item design

A

Include more responses than stems (makes it possible to get only 1 wrong; with equal numbers, one wrong match forces a second)
Keep lists relatively short
Keep responses brief
Consider order of responses

17
Q

What kind of material should be used in a matching test?

A

Homogeneous material (all items should relate to a common theme)

18
Q

Essay tests: factors to consider

A

Clearly specify the task
Develop a comprehensive scoring rubric
Limit essay items to objectives that can’t be easily measured with selected-response items
Grade blindly

19
Q

Short-answer tests: factors to consider

A

Items should only require short answers
Only 1 correct response
Quantitative items: indicate level of precision desired
Use direct questions rather than incomplete sentences

20
Q

Factors to consider if using incomplete sentence short-answer questions

A

Include only one blank space (otherwise the item becomes a dead giveaway)
Place the blank near the end of the sentence
Don’t give clues with blank size
Give enough space to answer

21
Q

Short-answer tests: what should be created for each item?

A

Scoring rubric

22
Q

Typical response tests: factors to consider pertaining to items

A

Focus items on experiences (thoughts, feelings, behaviors)
Limit items to a single experience
Avoid items that will be answered universally the same
Avoid leading questions
Limit use of “never” and “always”

23
Q

Typical response tests: factors to consider pertaining to responses

A

Don’t let high numbers always represent the same thing (vary the direction in which items are keyed)
Odd or even number of options: neutral option requires odd number
Label options for Likert-type scales (rating from 0-7, etc.)
Consider structuring scale as an interview
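
One common way to keep high numbers from always meaning the same thing is to reverse-key some items and flip them back at scoring time. The sketch below assumes that reading of the guideline. The function names, the 1-5 scale, and the sample ratings are all made up for illustration.

```python
def reverse_score(rating: int, low: int, high: int) -> int:
    """Flip a rating on a low..high scale (e.g., 1 <-> 5, 2 <-> 4)."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside {low}..{high}")
    return low + high - rating


def total_score(ratings, reverse_keyed, low=1, high=5):
    """Sum ratings, flipping the items whose positions are in `reverse_keyed`."""
    return sum(
        reverse_score(r, low, high) if i in reverse_keyed else r
        for i, r in enumerate(ratings)
    )


if __name__ == "__main__":
    ratings = [5, 1, 4, 2]   # hypothetical responses on a 1-5 scale
    reverse_keyed = {1, 3}   # positions of the reverse-keyed items
    print(total_score(ratings, reverse_keyed))  # 18
```

After reverse scoring, a higher total points in one consistent direction even though individual items were worded in both directions.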