Test Development Flashcards

1
Q

A method of qualitative item analysis requiring examinees to verbalize their thoughts as they take a test; useful in understanding how individual items function in a test and how testtakers interpret or misinterpret the meaning of individual items

A

“Think aloud” test administration

2
Q

A system of scaling in which stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum

A

Categorical scaling

3
Q

Also referred to as category scoring, a method of evaluation in which test responses earn credit toward placement in a particular class or category with other testtakers. Sometimes testtakers must provide a set number of responses that correspond to a particular criterion in order to be placed in a specific category or class; contrast with cumulative scoring and ipsative scoring

A

Class scoring

4
Q

In test development, a method of developing ordinal scales through the use of a sorting task that entails judging a stimulus in comparison with every other stimulus used on the test

A

Comparative scaling
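
A minimal sketch of one comparative-scaling procedure, the method of paired comparisons: every stimulus is judged against every other stimulus, and the tally of "wins" yields an ordinal ordering. The stimuli and judgments below are hypothetical, and the code is an illustration rather than a prescribed procedure.

```python
from itertools import combinations

# Hypothetical stimuli to be ordered (e.g., statements to be ranked on some continuum).
stimuli = ["A", "B", "C", "D"]

# Hypothetical judgments: for each pair, the stimulus the judge rated higher.
judgments = {("A", "B"): "B", ("A", "C"): "C", ("A", "D"): "A",
             ("B", "C"): "C", ("B", "D"): "B", ("C", "D"): "C"}

# Tally how often each stimulus "wins" its pairwise comparisons.
wins = {s: 0 for s in stimuli}
for pair in combinations(stimuli, 2):
    wins[judgments[pair]] += 1

# Sort by win count to obtain an ordinal scale (highest to lowest).
ordinal_scale = sorted(stimuli, key=lambda s: wins[s], reverse=True)
print(ordinal_scale)  # ['C', 'B', 'A', 'D'] for these hypothetical judgments
```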

5
Q

A form of test item requiring the testtaker to construct or create a response, as opposed to simply selecting a response. Items on essay examinations, fill-in-the-blank, and short-answer tests are examples of items in a constructed-response format; contrast with selected-response format

A

Constructed-response format

6
Q

The test validation process conducted on two or more tests using the same sample of testtakers; when used in conjunction with the creation of norms or the revision of existing norms, this process may also be referred to as co-norming

A

Co-validation

7
Q

A revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion

A

Cross-validation

8
Q

A method of scoring whereby points or scores accumulated on individual items or subtests are tallied, and the higher the total sum, the higher the individual is presumed to be on the ability, trait, or other characteristic being measured; contrast with class scoring and ipsative scoring

A

Cumulative scoring
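
A brief sketch of cumulative scoring, assuming a hypothetical answer key: each keyed-correct response earns one point, and the points are summed so that a higher total is taken to indicate more of the measured ability or trait.

```python
# Hypothetical answer key and one testtaker's responses (multiple-choice letters).
answer_key = ["b", "d", "a", "c", "b"]
responses  = ["b", "d", "c", "c", "b"]

# Cumulative scoring: one point per keyed-correct response, summed across items.
total_score = sum(1 for resp, key in zip(responses, answer_key) if resp == key)
print(total_score)  # 4 of a possible 5
```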

9
Q

In the test development process, a group of people knowledgeable about the subject matter being tested and/or the population for whom the test was designed, who can provide input to improve the test’s content, fairness, and other related aspects of the test

A

Expert panel

10
Q

Named for its developer, a scale wherein items range sequentially from weaker to stronger expressions of the attitude or belief being measured

A

Guttman scale

11
Q

A general term to describe various procedures, usually statistical, designed to explore how individual test items work as compared to other items in the test and in the context of the whole test; item analyses may be conducted, for example, to explore the level of difficulty of individual items on an achievement test or the reliability of a personality test; contrast with qualitative item analysis

A

Item analysis

12
Q

A graphic representation of item difficulty and discrimination

A

Item-characteristic curve (ICC)
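
A sketch of how points on an item-characteristic curve can be generated, assuming the commonly used two-parameter logistic model from item response theory: b shifts the curve along the ability axis (difficulty) and a controls its slope (discrimination). The parameter values below are hypothetical.

```python
import math

def two_pl(theta, a, b):
    """Probability of a keyed-correct response at ability level theta under a
    two-parameter logistic model (a = discrimination, b = difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderate difficulty (b = 0.5), fairly discriminating (a = 1.5).
for theta in [-3, -2, -1, 0, 1, 2, 3]:
    p = two_pl(theta, a=1.5, b=0.5)
    # Each row is one point on the item-characteristic curve.
    print(f"theta = {theta:+d}  P(correct) = {p:.2f}")
```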

13
Q

In achievement or ability testing and other contexts in which responses are keyed correct, a statistic indicating the proportion of testtakers who responded correctly to an item. In theory, this index may range from 0 (no testtaker responded with the answer keyed correct) to 1 (every testtaker responded with the answer keyed correct); in contexts where the nature of the test is such that responses are not keyed correct, this same statistic may be referred to as an item-endorsement index

A

Item-difficulty index
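
A minimal sketch of computing the item-difficulty index (often symbolized p) for a single item, using hypothetical scored responses: the proportion of testtakers who answered the item correctly. In a test whose responses are not keyed correct, the same computation yields an item-endorsement index.

```python
# Hypothetical item scores for ten testtakers: 1 = keyed-correct response, 0 = otherwise.
item_scores = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

# Item-difficulty index: proportion of testtakers answering the item correctly.
p = sum(item_scores) / len(item_scores)
print(p)  # 0.7
```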

14
Q

A statistic designed to indicate how adequately a test item separates or discriminates between high and low scorers

A

Item-discrimination index
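
One common way of computing an item-discrimination index (d), sketched with hypothetical data: compare the proportion of keyed-correct responses to the item in the highest-scoring and lowest-scoring groups of testtakers (here simply the top and bottom halves; the extreme 27% of scorers is another frequently used convention).

```python
# Hypothetical records: (total test score, score on the item of interest: 1 correct, 0 incorrect).
records = [(48, 1), (45, 1), (44, 1), (41, 0), (39, 1),
           (30, 1), (28, 0), (25, 0), (22, 0), (18, 0)]

# Split into high- and low-scoring groups on the basis of total test score.
records.sort(key=lambda r: r[0], reverse=True)
half = len(records) // 2
high, low = records[:half], records[half:]

# d = proportion correct in the high group minus proportion correct in the low group.
p_high = sum(item for _, item in high) / len(high)
p_low = sum(item for _, item in low) / len(low)
d = p_high - p_low
print(round(d, 2))  # 0.8 - 0.2 = 0.6 for these hypothetical data
```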

15
Q

In personality assessment and other contexts in which the nature of the test is such that responses are not keyed correct or incorrect, a statistic indicating the proportion of testtakers who responded to an item in a particular direction. In theory, this index may range from 0 (no testtaker responded in that direction) to 1 (every testtaker responded in that direction). In achievement tests, which have responses that are keyed correct, this statistic is referred to as an item-difficulty index

A

Item-endorsement index

16
Q

A reference to the form, plan, structure, arrangement, or layout of individual test items, including whether the items require testtakers to select a response from existing alternative responses or to construct a response

A

Item format

17
Q

The reservoir or well from which items on the final version of a test will be drawn (or discarded)

A

Item pool

18
Q

A statistic designed to provide an indication of a test’s internal consistency; the higher the item-reliability index, the greater the test’s internal consistency

A

Item-reliability index
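
A sketch of one common formulation of the item-reliability index, using hypothetical data: the item-score standard deviation multiplied by the correlation between the item score and the total test score. Python's standard statistics module stands in here for whatever software would actually be used.

```python
from statistics import correlation, pstdev  # correlation requires Python 3.10+

# Hypothetical data: each testtaker's score on one item (0/1) and total test score.
item_scores  = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
total_scores = [42, 25, 38, 45, 30, 40, 22, 36, 44, 28]

# Item-reliability index: item standard deviation times the item-total correlation.
s_item = pstdev(item_scores)
r_item_total = correlation(item_scores, total_scores)
item_reliability_index = s_item * r_item_total
print(round(item_reliability_index, 3))
```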

19
Q

A statistic indicating the degree to which a test measures what it purports to measure; the higher the item-validity index, the greater the test’s criterion-related validity

A

Item-validity index
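
In the same spirit, a sketch of one common formulation of the item-validity index, using hypothetical data: the item-score standard deviation multiplied by the correlation between the item score and scores on an external criterion measure.

```python
from statistics import correlation, pstdev  # correlation requires Python 3.10+

# Hypothetical data: item scores (0/1) and each testtaker's score on an external criterion.
item_scores      = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
criterion_scores = [88, 60, 75, 90, 55, 82, 58, 70, 85, 62]

# Item-validity index: item standard deviation times the item-criterion correlation.
item_validity_index = pstdev(item_scores) * correlation(item_scores, criterion_scores)
print(round(item_validity_index, 3))
```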

20
Q

A system of assumptions about measurement, including the assumption that the trait being measured by a test is unidimensional, and about the extent to which each test item measures that trait

A

Latent-trait model

21
Q

Named for its developer, a summative rating scale with five alternative responses, most typically on a continuum ranging, for example, from “strongly agree” to “strongly disagree”

A

Likert scale
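
A brief sketch of scoring a Likert scale, assuming hypothetical items and the usual summative conventions: each response is converted to a number from 1 ("strongly disagree") to 5 ("strongly agree"), negatively worded items are reverse-keyed, and the item scores are summed.

```python
# Response options on a five-point Likert continuum.
OPTIONS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Hypothetical responses; True marks a negatively worded (reverse-keyed) item.
responses = [("agree", False), ("strongly agree", False),
             ("disagree", True), ("neutral", False)]

total = 0
for answer, reverse_keyed in responses:
    score = OPTIONS.index(answer) + 1  # 1 (strongly disagree) ... 5 (strongly agree)
    if reverse_keyed:
        score = 6 - score              # reverse-key negatively worded items
    total += score                     # summative (Likert) scoring

print(total)  # 4 + 5 + 4 + 3 = 16 for these hypothetical responses
```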

22
Q

In the process of test conceptualization, the preliminary research surrounding the creation of a prototype test, also referred to as pilot study and pilot research; a general objective of pilot work is to determine how best to measure, gauge, assess, or evaluate the targeted construct(s)

A

Pilot work

23
Q

A general term for various nonstatistical procedures designed to explore how individual test items work, both compared to other items in the test and in the context of the whole test; in contrast to statistically based procedures, qualitative methods involve exploration of the issues by verbal means such as interviews and group discussions conducted with test takers and other relevant parties

A

Qualitative item analysis

24
Q

A system of ordered numerical or verbal descriptors on which judgments about the presence/absence or magnitude of a particular trait, attitude, emotion, or other variable are indicated by raters, judges, or examiners or, when the rating scale reflects self-report, the assessee

A

Rating scale

25
Q

(1) A system of ordered numerical or verbal descriptors, usually occurring at fixed intervals, used as a reference standard in measurement; (2) a set of numbers or other symbols whose properties model empirical properties of the objects or traits to which numbers or other symbols are assigned

A

Scale

26
Q

(1) In test construction, the process of setting rules for assigning numbers in measurement; (2) the process by which a measuring device is designed and calibrated and the way numbers (or other indices that are scale values) are assigned to different amounts of the trait, attribute, or characteristic measured; (3) assigning numbers in accordance with empirical properties of objects or traits

A

Scaling

27
Q

A form of test item requiring test takers to select a response, as opposed to constructing or creating a response; for example, true-false, multiple-choice, and matching items; contrast with constructed-response format

A

Selected-response format

28
Q

A study of test items, usually during test development, in which items are examined for fairness to all prospective test takers and for the presence of offensive language, stereotypes, or situations

A

Sensitivity review

29
Q

An index derived from the summing of selected scores on a test or subtest

A

Summative scale

30
Q

The decrease in item validities that inevitably occurs after cross-validation

A

Validity shrinkage