Test Development Flashcards
An umbrella term for all that goes into the process of creating a test.
Test Development
The thought that “there ought to be a test for…” is the impetus for
developing a new test.
Test Conceptualization
The stimulus could be knowledge of psychometric problems with other
tests, a new social phenomenon, or any number of things.
Test Conceptualization
The process of setting rules for
assigning numbers in measurement.
Scaling
Instruments used to measure some trait, state, or ability; they may be categorized in many ways.
Scales
Was very influential in the
development of sound scaling methods.
L. L. Thurstone
A grouping of words, statements, or symbols on which
judgments of the strength of a particular trait, attitude, or emotion are
indicated by the test taker.
Rating Scale
Developed to be “a practical means of assessing
what people believe, the strength of their convictions, as well as individual differences in moral
tolerance” (p
Morally Debatable Behaviors Scale-Revised (MDBS-R)
Each item presents the test taker with five alternative responses (sometimes seven), usually on an agree–disagree or
approve–disapprove continuum.
Likert Scale
Offers a continuum of responses that allow for measurements of
attitudes on various topics
Likert Scale
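A minimal sketch of how responses on a five-point Likert scale might be summed into a total score. The function name `score_likert`, the 1-to-5 coding, and the handling of reverse-keyed (negatively worded) items are illustrative assumptions, not part of the card above.

```python
def score_likert(responses, reverse_keyed=(), points=5):
    """Sum ratings coded 1..points, flipping any reverse-keyed items."""
    total = 0
    for index, rating in enumerate(responses):
        if not 1 <= rating <= points:
            raise ValueError(f"rating {rating} is outside 1..{points}")
        # Assumption: a reverse-keyed item maps 1 -> points, 2 -> points - 1, etc.
        total += (points + 1 - rating) if index in reverse_keyed else rating
    return total


# Four items; the last two are assumed to be negatively worded.
print(score_likert([5, 4, 2, 1], reverse_keyed={2, 3}))  # 18
```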
Test takers must choose between two alternatives according to some rule.
Method of Paired Comparisons
For each pair of options, test takers receive a higher score for
selecting the option deemed more justifiable by the majority of a group
of judges.
Method of Paired Comparisons
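The scoring rule described above could be sketched as follows. The card only says a "higher score" is earned for matching the judges' majority choice; awarding 1 point per match and 0 otherwise is an assumption, as are the names `score_paired_comparisons` and `judge_key`.

```python
def score_paired_comparisons(choices, judge_key):
    """Count the pairs where the test taker picked the judges' majority option."""
    if len(choices) != len(judge_key):
        raise ValueError("choices and judge_key must cover the same pairs")
    # Assumption: 1 point for agreeing with the judges, 0 points otherwise.
    return sum(1 for chosen, keyed in zip(choices, judge_key) if chosen == keyed)


# Three pairs; the test taker agrees with the judges on the first and third.
print(score_paired_comparisons(["a", "b", "a"], ["a", "a", "a"]))  # 2
```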
Entails judgments of a stimulus in
comparison with every other stimulus on the scale
Comparative Scaling (Sorting Task)
Stimuli are placed into one of two or more
alternative categories that differ quantitatively with respect to some
continuum.
Categorical Scaling
Items range sequentially from weaker to stronger
expressions of the attitude, belief, or feeling being measured.
Guttman Scale
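Because the items are ordered from weaker to stronger, a respondent who endorses a stronger item would be expected to endorse every weaker one as well. A rough check of that pattern, with the hypothetical helper name `is_guttman_consistent` and responses coded as True/False in weak-to-strong order:

```python
def is_guttman_consistent(endorsements):
    """True when no endorsed item follows a rejected one (items ordered weak to strong)."""
    seen_rejection = False
    for endorsed in endorsements:
        if endorsed and seen_rejection:
            # A stronger item was endorsed after a weaker one was rejected.
            return False
        if not endorsed:
            seen_rejection = True
    return True


print(is_guttman_consistent([True, True, False, False]))  # True
print(is_guttman_consistent([True, False, True, False]))  # False
```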
Provides a list of terms from which the individual selects those most characteristic of herself or himself.
Adjective Checklist
Provides a list of adjectives that must be sorted into nine piles of increasing similarity to the target person.
Q-Sorts
Guide for Item Writing
1. Define clearly what you wish to measure
2. Generate a pool of items
3. Avoid items that are exceptionally long
4. Be aware of the reading level of those taking the scale and the
reading level of the items
5. Avoid items that convey two or more ideas at the same time
6. Consider using questions that mix positive and negative
wording
The reservoir or well from which items will or will not be
drawn for the final version of the test.
Item Pool
Includes variables such as the form, plan, structure, arrangement, and layout of individual test items.
Item Format
Items require test takers to select a
response from a set of alternative responses
Selected-response format
Items require test takers to supply or to create the correct answer, not merely to select it.
Constructed-response format
The multiple-choice format has three elements:
(1) a stem, (2) a correct
alternative or option, and (3) several incorrect alternatives or options
variously referred to as distractors or foils.
Distractors
b: standardized behavioral
samples; c: reliable assessment instruments; and d: theory-linked measures
A relatively large and easily accessible collection of test
questions.
Item Bank
An interactive, computer-administered test-taking process wherein items
presented to the test taker are based in part on the test taker’s
performance on previous items.
Computerized Adaptive Testing
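A toy illustration of the adaptive idea, in which the next item depends on the previous response. The one-step-up, one-step-down rule and the name `next_item_index` are assumptions made for illustration only; operational adaptive tests generally rely on more sophisticated item-selection models.

```python
def next_item_index(current_index, was_correct, n_items):
    """Pick the next item from a bank assumed to be sorted from easiest to hardest."""
    # Assumption: move one step harder after a correct answer, one step easier otherwise.
    step = 1 if was_correct else -1
    # Clamp so the index stays inside the item bank.
    return max(0, min(n_items - 1, current_index + step))


print(next_item_index(2, True, 5))   # 3
print(next_item_index(0, False, 5))  # 0
```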
A discrepancy between the scoring in an anchor protocol and the scoring
of another protocol.
Scoring Drift
The revalidation of a test on a sample of
test takers other than those on whom test performance was originally
found to be a valid predictor of some criterion.
Cross-Validation
A test validation process conducted on two or more
tests using the same sample of test takers.
Co-Validation
Allows test developers to evaluate the validity of
items in relation to a criterion measure.
Item-Validity Index
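The card does not state a formula. One common textbook formulation multiplies the item-score standard deviation by the correlation between item scores and criterion scores; the sketch below assumes that formulation, uses population standard deviations, and uses the hypothetical name `item_validity_index`.

```python
from math import sqrt


def item_validity_index(item_scores, criterion_scores):
    """Item-score SD times the item-criterion correlation (assumed formulation)."""
    n = len(item_scores)
    if n == 0 or n != len(criterion_scores):
        raise ValueError("need two equally long, non-empty score lists")
    mean_item = sum(item_scores) / n
    mean_crit = sum(criterion_scores) / n
    # Population variances and covariance (divide by n, not n - 1).
    var_item = sum((x - mean_item) ** 2 for x in item_scores) / n
    var_crit = sum((y - mean_crit) ** 2 for y in criterion_scores) / n
    if var_item == 0 or var_crit == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    cov = sum((x - mean_item) * (y - mean_crit)
              for x, y in zip(item_scores, criterion_scores)) / n
    correlation = cov / sqrt(var_item * var_crit)
    return sqrt(var_item) * correlation


# Item scored 1 (pass) or 0 (fail) against a made-up criterion rating.
print(round(item_validity_index([1, 1, 0, 0], [4, 3, 2, 1]), 3))  # 0.447
```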
Indicates how adequately an item separates
or discriminates between high scorers and low scorers on an entire test.
Item-Discrimination Index
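The card gives no formula. A commonly used version compares how many high scorers and how many low scorers answered the item correctly, d = (U - L) / n, where n is the size of each group. The sketch below assumes that version and equal-sized groups; the function and parameter names are hypothetical.

```python
def item_discrimination_index(upper_correct, lower_correct, group_size):
    """d = (U - L) / n for equal-sized upper and lower scoring groups."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not (0 <= upper_correct <= group_size and 0 <= lower_correct <= group_size):
        raise ValueError("correct counts must lie between 0 and group_size")
    return (upper_correct - lower_correct) / group_size


# 20 of 32 high scorers and 16 of 32 low scorers answered correctly.
print(item_discrimination_index(20, 16, 32))  # 0.125
```

Under this formulation, values near +1 suggest that mostly high scorers pass the item, values near 0 suggest little discrimination, and negative values flag an item that low scorers pass more often than high scorers.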
The quality of each alternative within a
multiple-choice item can be readily assessed with reference to the
comparative performance of upper and lower scorers.
Analysis of Item Alternatives
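One way to lay out the comparison described above is to tally, for each alternative, how many upper scorers and how many lower scorers chose it. The helper name `tally_alternatives` and the sample responses are made up for illustration.

```python
from collections import Counter


def tally_alternatives(upper_choices, lower_choices, alternatives):
    """Map each alternative to (count among upper scorers, count among lower scorers)."""
    upper = Counter(upper_choices)
    lower = Counter(lower_choices)
    # Counter returns 0 for alternatives nobody chose.
    return {alt: (upper[alt], lower[alt]) for alt in alternatives}


table = tally_alternatives(["a", "a", "b"], ["b", "c", "c"], ["a", "b", "c"])
for alternative, (n_upper, n_lower) in table.items():
    print(alternative, n_upper, n_lower)
```

If "a" is the keyed answer here, a distractor such as "c" that attracts only lower scorers would be working as intended, while a distractor favored by upper scorers would deserve a second look.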
An item that favors one particular group of
examinees in relation to another when differences in group ability are
controlled.
Biased Test Item