Module 7: Test Development Flashcards

1
Q

Test Development

A

an umbrella term for all that goes into the process of creating a test

2
Q

Test Conceptualization

A

+ brainstorming of ideas about what kind of test a developer wants to publish
+ entails literature reviews and experimentation, and the creation, revision, and deletion of preliminary items

3
Q

What kind of information is determined in test conceptualization?

A
  1. Construct
  2. Goal
  3. User
  4. Taker
  5. Administration
  6. Format
  7. Response
  8. Benefits
  9. Costs
  10. Interpretation
  11. Whether the test is norm-referenced or criterion-referenced
  12. How best to measure a targeted construct
4
Q

Pilot work/study/research

A

preliminary research surrounding the creation of a prototype of the test

6
Q

Test Construction

A

stage in the process that entails writing test items, revising them, formatting them, and setting scoring rules

7
Q

What kind of item should not be made?

A

It is not good to create an item that contains more than one idea

8
Q

Item Pool

A

reservoir or well from which the items will or will not be drawn for the final version of the test

9
Q

Item Banks

A

relatively large and easily accessible collection of test questions

10
Q

Computer Adaptive Testing

A

+ refers to an interactive, computer-administered test-taking process wherein items presented to the testtaker are based in part on the testtaker’s performance on previous items
+ the test administered may be different for each testtaker, depending on the testtaker’s performance on the items presented

11
Q

What does Computer Adaptive Testing reduce?

A

Reduces floor and ceiling effects

12
Q

Floor Effects

A

occurs when there is some lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit (testtakers have low scores)

13
Q

Ceiling Effects

A

occurs when there is some upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit (testtakers have high scores)

14
Q

Item Branching

A

ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items
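
As a rough illustration (the difficulty levels and the up/down rule are hypothetical, not any specific published algorithm), item branching can be sketched as a rule that moves the testtaker between difficulty levels based on the previous response:

```python
def next_level(current_level, answered_correctly, n_levels=5):
    """Pick the difficulty level of the next item from the response to the
    previous item: move up after a correct answer, down after an incorrect one.
    Levels run from 0 (easiest) to n_levels - 1 (hardest)."""
    if answered_correctly:
        return min(n_levels - 1, current_level + 1)
    return max(0, current_level - 1)
```

Because every response changes the branch taken, two testtakers may see entirely different sequences of items.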

15
Q

Item Format

A

form, plan, structure, arrangement, and layout of individual test items

16
Q

What are the different kinds of item formats?

A
  1. Dichotomous format
  2. Polychotomous format
  3. Category format
17
Q

Dichotomous Format

A

offers two alternatives for each item

18
Q

Polychotomous Format

A

each item has more than two alternatives

19
Q

Category Format

A

a format where respondents are asked to rate a construct

20
Q

Checklist

A

subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself

21
Q

Guttman Scale

A

items are arranged from weaker to stronger expressions of attitude, belief, or feeling

22
Q

Selected-Response Format

A

requires testtakers to select a response from a set of alternative responses

23
Q

What are the three elements of a multiple choice format?

A
  1. stem (question),
  2. a correct option, and
  3. several incorrect alternatives (distractors or foils)
24
Q

Multiple Choice Format

A

Should have only one correct answer; alternatives should be grammatically parallel, of similar length, and fit grammatically with the stem; avoid ridiculous distractors, excessively long items, and “all of the above”/“none of the above” options (with four options, the chance of guessing correctly is 25%)

25
Q

What are the different kinds of distractors?

A
  1. Effective distractors
  2. Ineffective distractors
  3. Cute distractors
26
Q

Effective Distractors

A

a distractor that is chosen equally by both high- and low-performing groups; enhances the consistency of test results

27
Q

Ineffective Distractors

A

may hurt the reliability of the test because they are time-consuming to read and can limit the number of good items

28
Q

Cute Distractors

A

less likely to be chosen; may affect the reliability of the test because the testtakers may guess from the remaining options

29
Q

Who are most likely to choose good distractors?

A

Low scorers; a good distractor is chosen more frequently by low scorers than by high scorers

30
Q

Matching Item Format

A

Test taker is presented with two columns: Premises and Responses

31
Q

Binary Choice Format

A

Usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact (chance of guessing correctly: 50%)

32
Q

Constructed-Response Format

A

requires testtakers to supply or to create the correct answer rather than merely selecting it

33
Q

Completion Item

A

requires the examinee to provide a word or phrase that completes a sentence

34
Q

Short-Answer Format

A

Should be written clearly enough that the testtaker can respond succinctly, with a short answer

35
Q

Essay

A

allows creative integration and expression of the material

36
Q

Scaling

A

process of setting rules for assigning numbers in measurement

37
Q

Types of Selected-Response Format

A
  1. Multiple Choice
  2. Matching Items
  3. Binary Choice
39
Q

Types of Constructed-Response Format

A
  1. Completion Item
  2. Short-Answer
  3. Essay
40
Q

What are the primary scales of measurement?

A
  1. Nominal
  2. Ordinal
  3. Interval
  4. Ratio
41
Q

Nominal

A

+ involve classification or categorization based on one or more distinguishing characteristics
+ label and categorize observations but do not make any quantitative distinctions between observations

42
Q

Ordinal

A

+ rank ordering on some characteristic is permissible
+ the median is a meaningful measure of central tendency

43
Q

Ratio

A

+ contains equal intervals and a true zero point (a score of zero means a complete absence of the attribute being measured)
+ easiest to manipulate mathematically, since all arithmetic operations are meaningful

44
Q

Interval

A

+ contains equal intervals but no absolute zero point (a score of zero does not mean none; even negative values have an interpretation)

45
Q

What are the comparative scales of measurement?

A
  1. Paired Comparison
  2. Rank Order
  3. Constant Sum
  4. Q-Sort Technique
47
Q

Paired Comparison

A

+ produces ordinal data by presenting respondents with pairs of stimuli that they are asked to compare
+ the respondent is presented with two objects at a time and asked to select one object according to some criterion

48
Q

Rank Order

A

respondents are presented with several items simultaneously and asked to rank them in order of priority

49
Q

Constant Sum

A

respondents are asked to allocate a constant sum of units, such as points, among a set of stimulus objects with respect to some criterion

50
Q

Q-Sort Technique

A

respondents sort objects based on similarity with respect to some criterion

51
Q

Continuous Rating

A

rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other

e.g., Rating Guardians of the Galaxy as the best Marvel Movie of Phase 4

52
Q

Itemized Rating

A

having numbers or brief descriptions associated with each category

e.g., 1 if you like the item the most, 2 if so-so, 3 if you hate it

53
Q

Likert Scale

A

+ respondents indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative toward the attitudinal object
+ based on the principle of measuring attitudes by asking people to respond to a series of statements about a topic in terms of the extent to which they agree with them

54
Q

Visual Analogue Scale

A

a 100-mm line that allows subjects to express the magnitude of an experience or belief

55
Q

Semantic Differential Scale

A

derives the respondent’s attitude toward the given object by asking him or her to select an appropriate position on a scale between two bipolar opposites

56
Q

Stapel Scale

A

developed to measure the direction and intensity of an attitude simultaneously

57
Q

Summative Scale

A

final score is obtained by summing the ratings across all the items
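
The summative principle can be sketched in a line of code (the item ratings below are hypothetical 5-point Likert responses; the function name is mine):

```python
def summative_score(ratings):
    """Summative scale: the final score is the sum of the ratings across all items."""
    return sum(ratings)

# summative_score([4, 5, 3, 4]) -> 16
```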

58
Q

Thurstone Scale

A

+ involves the collection of a variety of different statements about a phenomenon which are ranked by an expert panel in order to develop the questionnaire
+ allows multiple answers

59
Q

Ipsative Scale

A

the respondent must choose between two or more equally socially acceptable options

60
Q

Test Tryout

A

the test should be tried out on people who are similar in critical respects to the people for whom the test was designed

61
Q

What is an informal rule of thumb for test tryouts?

A

An informal rule of thumb: no fewer than 5 subjects and preferably as many as 10 for each item (the more, the better)

62
Q

What happens if there is a risk of using few subjects in a test?

A

Using too few subjects risks the emergence of phantom factors (factors that are mere artifacts of the small sample)

63
Q

What kind of conditions should test tryouts be executed under?

A

Test tryouts should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered

64
Q

What makes a good test item?

A

A good test item is one that is answered correctly by high scorers on the test as a whole

65
Q

Empirical Criterion Keying

A

administering a large pool of test items to a sample of individuals who are known to differ on the construct being measured

66
Q

Item Analysis

A

statistical procedure used to analyze and evaluate test items

67
Q

Discriminability Analysis

A

employed to examine the correlation between each item and the total score of the test

68
Q

Table of Specification

A

a blueprint of the test in terms of number of items per difficulty, topic importance, or taxonomy

69
Q

Guidelines for Item Writing

A

  1. Define clearly what you want to measure
  2. Generate an item pool
  3. Avoid exceptionally long items
  4. Keep the level of reading difficulty appropriate for those who will complete the test
  5. Avoid double-barreled items
  6. Consider mixing positively and negatively worded items

70
Q

Double-Barreled Items

A

items that convey more than one idea at the same time

71
Q

Item Difficulty

A

defined by the number of people who get a particular item correct

72
Q

Item-Difficulty Index

A

calculated as the proportion of the total number of testtakers who answered the item correctly; the larger the proportion, the easier the item
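
The computation is a single division; a minimal sketch (the function name is mine, not a standard library call):

```python
def item_difficulty_index(n_correct, n_total):
    """Proportion of all testtakers who answered the item correctly;
    the larger the value, the easier the item."""
    return n_correct / n_total

# 30 of 40 testtakers answered correctly: an easy item.
# item_difficulty_index(30, 40) -> 0.75
```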

73
Q

Item-Endorsement Index

A

for personality testing; the percentage of individuals who endorse an item in a personality test

74
Q

What is the optimal average item difficulty?

A

The optimal average item difficulty is approx. 50% with items on the testing ranging in difficulty from about 30% to 80%

75
Q

What is the level of difficulty if the item difficulty range is 0.20-0.39?

A

Difficult

75
Q

What is the level of difficulty if the item difficulty range is 0.0-0.19?

A

Very difficult

76
Q

What is the level of difficulty if the item difficulty range is 0.40-0.60?

A

Average/moderately difficult

76
Q

What is the level of difficulty if the item difficulty range is 0.80-1.0?

A

Very easy

77
Q

What is the level of difficulty if the item difficulty range is 0.61-0.79?

A

Easy
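
The range-to-label cards above can be collected into one lookup; the labels and cutoffs follow the ranges given here (the function name is mine):

```python
def difficulty_level(p):
    """Map an item-difficulty index (0.0 to 1.0) to its level-of-difficulty label."""
    if p < 0.20:
        return "very difficult"
    if p < 0.40:
        return "difficult"
    if p <= 0.60:
        return "average/moderately difficult"
    if p < 0.80:
        return "easy"
    return "very easy"
```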

78
Q

Omnibus Spiral Format

A

items in an ability test are arranged in order of increasing difficulty

79
Q

Item-Reliability Index

A

provides an indication of the internal consistency of a test

80
Q

What does it mean when the item-reliability index is high?

A

The higher the item-reliability index, the greater the test’s internal consistency

81
Q

Item-Validity Index

A

designed to provide an indication of the degree to which a test measures what it purports to measure

82
Q

What does it mean when the item-validity index is high?

A

The higher the item-validity index, the greater the test’s criterion-related validity

83
Q

Item-Discrimination Index

A

measure of item discrimination; measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly
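
A sketch of the computation (names are mine; responses are coded 1 = correct, 0 = incorrect):

```python
def item_discrimination_index(upper_group, lower_group):
    """d = proportion of the high-scoring group answering the item correctly
    minus the proportion of the low-scoring group answering it correctly."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# 3 of 4 high scorers correct, 1 of 4 low scorers correct: d = 0.5
```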

84
Q

Extreme Group Method

A

compares people who have done well with those who have done poorly

85
Q

Discrimination Index

A

the difference between the proportion of high scorers and the proportion of low scorers answering an item correctly

86
Q

Point-Biserial Method

A

correlation between a dichotomous variable and a continuous variable
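
Since the point-biserial is a Pearson correlation in which one variable happens to be dichotomous, it can be sketched directly from the definition (the function name and data are mine):

```python
import math

def point_biserial(item_scores, total_scores):
    """Correlation between a dichotomous variable (item score, 0 or 1)
    and a continuous variable (e.g., total test score)."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(item_scores, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_scores)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    return cov / math.sqrt(var_x * var_y)
```

Higher-scoring testtakers answering the item correctly yields a positive correlation, which is what makes the item discriminate well.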

87
Q

What does it mean when the correlation of an item is 0.40 and above?

A

Very good item

88
Q

What does it mean when the correlation of an item is 0.30-0.39?

A

Good item

89
Q

What does it mean when the correlation of an item is 0.20-0.29?

A

Fair item

90
Q

What does it mean when the correlation of an item is 0.09-0.19?

A

Poor item

91
Q

Item-Characteristic Curve

A

graphic representation of item difficulty and discrimination

92
Q

Guessing

A

a problem in test development that has eluded any universally accepted solution

93
Q

What will happen if an item analysis is done for a speed test?

A

Item analyses taken under speed conditions yield misleading or uninterpretable results

94
Q

How should item analysis be handled if it is for a speed test?

A

Restrict item analysis on a speed test only to the items completed by the testtaker

95
Q

What should the test developer do when they are analyzing a speed test?

A

Test developer ideally should administer the test to be item-analyzed with generous time limits to complete the test

96
Q

What are the types of scoring models?

A
  1. Cumulative Model
  2. Class Scoring/Category Scoring
  3. Ipsative Scoring
97
Q

Cumulative Model

A

testtaker obtains a measure of the level of the trait; thus, a high score suggests a high level of the trait being measured

98
Q

Class Scoring/Category Scoring

A

testtaker’s responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is similar in some way

99
Q

Ipsative Scoring

A

compares a testtaker’s score on one scale within a test to another scale within that same test (typically two unrelated constructs)
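
The cumulative and ipsative models above can be contrasted in a short sketch (hypothetical data; the names are mine):

```python
def cumulative_score(item_scores):
    """Cumulative model: the total score is a direct measure of the trait level."""
    return sum(item_scores)

def ipsative_comparison(scale_a, scale_b):
    """Ipsative scoring: one scale's score is interpreted relative to another
    scale's score from the same test, not against other testtakers."""
    return cumulative_score(scale_a) - cumulative_score(scale_b)
```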

100
Q

What is the process of test revision?

A

+ Characterize each item according to its strengths and weaknesses
+ As revision proceeds, the advantage of writing a large item pool becomes more apparent, because removed items must be replaced by items from the pool
+ Administer the revised test under standardized conditions to a second appropriate sample of examinees

101
Q

Cross-Validation

A

revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion; often results in validity shrinkage

102
Q

Validity Shrinkage

A

decrease in item validities that inevitably occurs after cross-validation

103
Q

Co-validation

A

test validation conducted on two or more tests using the same sample of testtakers

104
Q

Co-norming

A

co-validation conducted in conjunction with the creation of norms or the revision of existing norms

105
Q

Anchor Protocol

A

a test protocol scored by a highly authoritative scorer that is designed as a model for scoring and as a mechanism for resolving scoring discrepancies

106
Q

Scoring Drift

A

discrepancy between scoring in an anchor protocol and the scoring of another protocol

107
Q

Differential Item Functioning

A

an item functions differently in one group of testtakers as compared with another group known to have the same level of the underlying trait

108
Q

DIF Analysis

A

test developers scrutinize, group by group, the item-response curves, looking for DIF items

109
Q

DIF Items

A

items for which respondents from different groups, at the same level of the underlying trait, have different probabilities of endorsement as a function of their group membership

110
Q

Routing Test

A

subtest used to direct or route the testtaker to a suitable level of items

111
Q

Item-Mapping Method

A

setting cut scores that entails a histographic representation of items and expert judgments regarding item effectiveness

112
Q

Basal Level

A

the level at which the minimum criterion number of correct responses is obtained

113
Q

Computer Assisted Psychological Assessment

A

+ standardized test administration is assured for testtakers, and variation is kept to a minimum
+ test content and length are tailored according to the testtaker’s ability