Test Development Flashcards
A method of qualitative item analysis requiring examinees to verbalize their thoughts as they take a test; useful in understanding how individual items function in a test and how testtakers interpret or misinterpret the meaning of individual items
“Think aloud” test administration
A system of scaling in which stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum
Categorical scaling
Also referred to as category scoring, a method of evaluation in which test responses earn credit toward placement in a particular class or category with other testtakers. Sometimes testtakers must give a set number of responses corresponding to a particular criterion to be placed in a specific category or class; contrast with cumulative scoring and ipsative scoring
Class scoring
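A minimal Python sketch of class scoring, assuming each item has one criterion-keyed answer and that placement requires a fixed number of matching responses. The function name `class_score`, the threshold, and the category label are hypothetical, chosen only for illustration.

```python
def class_score(responses, criterion_keys, threshold, label, default="not classified"):
    """Place a testtaker in `label` if enough responses match the criterion.

    responses:      dict mapping item id -> the testtaker's answer
    criterion_keys: dict mapping item id -> the answer that counts toward the category
    threshold:      hypothetical minimum number of criterion-keyed responses
    """
    matches = sum(
        1 for item, answer in responses.items()
        if item in criterion_keys and criterion_keys[item] == answer
    )
    # Placement is all-or-nothing: meeting the threshold earns the category,
    # and the exact count beyond it does not change the classification.
    return label if matches >= threshold else default


# Usage with hypothetical items: 3 of 4 responses match the criterion.
keys = {"q1": True, "q2": True, "q3": True, "q4": True}
answers = {"q1": True, "q2": False, "q3": True, "q4": True}
print(class_score(answers, keys, threshold=3, label="category A"))
```

Unlike cumulative scoring, the output is a category membership rather than a running total.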
In test development, a method of developing ordinal scales through the use of a sorting task that entails judging a stimulus in comparison with every other stimulus used on the test
Comparative scaling
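Judging every stimulus against every other can be sketched as a paired-comparison procedure: each pair is judged once, and stimuli are ordered by how many comparisons they win, which yields an ordinal scale. This is one possible implementation, not the only form comparative scaling takes; `prefer` is a hypothetical stand-in for a judge's decision.

```python
from itertools import combinations


def comparative_rank(stimuli, prefer):
    """Order stimuli by the number of pairwise comparisons each one wins.

    prefer(a, b): hypothetical judgment returning whichever of a or b
    the judge places higher on the continuum.
    """
    wins = {s: 0 for s in stimuli}
    for a, b in combinations(stimuli, 2):  # every stimulus vs. every other, once
        wins[prefer(a, b)] += 1
    # Most wins first; sorted() is stable, so tied stimuli keep their input order.
    return sorted(stimuli, key=lambda s: wins[s], reverse=True)


# Usage: a hypothetical judge who always prefers the more intense statement.
intensity = {"mild": 1, "moderate": 2, "strong": 3}
judge = lambda a, b: a if intensity[a] > intensity[b] else b
print(comparative_rank(["mild", "strong", "moderate"], judge))
```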
A form of test item requiring the testtaker to construct or create a response, as opposed to simply selecting a response. Items on essay examinations, fill-in-the-blank, and short-answer tests are examples of items in a constructed-response format; contrast with selected-response format
Constructed-response format
The test validation process conducted on two or more tests using the same sample of testtakers; when used in conjunction with the creation of norms or the revision of existing norms, this process may also be referred to as co-norming
Co-validation
A revalidation on a sample of testtakers other than the testtakers on whom test performance was originally found to be a valid predictor of some criterion
Cross-validation
A method of scoring whereby points or scores accumulated on individual items or subtests are tallied, and the higher the total sum, the higher the individual is presumed to be on the ability, trait, or other characteristic being measured; contrast with class scoring and ipsative scoring
Cumulative scoring
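A minimal sketch of cumulative scoring for dichotomous items, assuming one point per response that matches a scoring key. The function name and the four-item key are hypothetical.

```python
def cumulative_score(responses, key):
    """Tally one point for each response that matches the scoring key.

    Assumes dichotomous items; a higher total is taken to indicate more of
    the ability or trait being measured.
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must cover the same items")
    return sum(1 for given, keyed in zip(responses, key) if given == keyed)


# Two hypothetical testtakers on a four-item test.
key = ["a", "c", "b", "d"]
print(cumulative_score(["a", "c", "b", "a"], key))  # 3 points
print(cumulative_score(["b", "c", "a", "a"], key))  # 1 point
```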
In the test development process, a group of people knowledgeable about the subject matter being tested and/or the population for whom the test was designed, who can provide input to improve the test’s content, fairness, and other related aspects
Expert panel
Named for its developer, a scale wherein items range sequentially from weaker to stronger expressions of the attitude or belief being measured
Guttman scale
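Because Guttman items run from weaker to stronger expressions, a respondent who endorses a stronger item should also endorse every weaker one. The sketch below checks that property for one respondent, assuming 1/0 agree/disagree coding and items already ordered from weakest to strongest; the function names are hypothetical.

```python
def is_guttman_pattern(endorsements):
    """Check whether one respondent's answers fit a perfect Guttman pattern.

    endorsements: 1 (agree) or 0 (disagree) for each item, ordered from the
    weakest to the strongest expression of the attitude. A fitting pattern
    is a run of 1s followed only by 0s.
    """
    first_zero = endorsements.index(0) if 0 in endorsements else len(endorsements)
    return all(e == 0 for e in endorsements[first_zero:])


def guttman_score(endorsements):
    """Scale position: the number of items endorsed.

    For a fitting pattern this also tells exactly which items were endorsed
    (the first k), which is what makes the scale cumulative.
    """
    return sum(endorsements)


print(is_guttman_pattern([1, 1, 1, 0, 0]), guttman_score([1, 1, 1, 0, 0]))  # True 3
print(is_guttman_pattern([1, 0, 1, 0, 0]))  # False: a stronger item endorsed after a weaker one was rejected
```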
A general term to describe various procedures, usually statistical, designed to explore how individual test items work as compared to other items in the test and in the context of the whole test; item analyses may be conducted, for example, to explore the level of difficulty of individual items on an achievement test or the reliability of a personality test; contrast with qualitative item analysis
Item analysis
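A graphic representation of item difficulty and discrimination, typically plotting the probability of a keyed response against the level of the ability or trait being measured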
Item-characteristic curve (ICC)
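The definition does not commit to a particular model, so the sketch below assumes the two-parameter logistic (2PL) model from item response theory, one common way to draw an ICC. Here `b` plays the role of item difficulty and `a` of item discrimination. Some presentations add a scaling constant of about 1.7 inside the exponent; it is omitted here.

```python
import math


def icc_2pl(theta, a, b):
    """Two-parameter logistic item-characteristic curve.

    theta: level of the ability or trait
    a:     discrimination (how steeply the curve rises around b)
    b:     difficulty (the theta at which P(keyed response) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Trace a hypothetical item (a = 1.5, b = 0.0) across ability levels;
# plotting these points would give the S-shaped ICC.
for theta in range(-3, 4):
    print(f"theta={theta:+d}  P={icc_2pl(theta, a=1.5, b=0.0):.3f}")
```

Raising `b` shifts the curve to the right (a harder item), while raising `a` makes it steeper (an item that separates nearby ability levels more sharply).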
In achievement or ability testing and other contexts in which responses are keyed correct, a statistic indicating the proportion of testtakers who responded correctly to an item. In theory, this index may range from 0 (no testtaker responded with the answer keyed correct) to 1 (every testtaker responded with the answer keyed correct); in contexts where the nature of the test is such that responses are not keyed correct, this same statistic may be referred to as an item-endorsement index
Item-difficulty index
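The item-difficulty index is commonly computed as a proportion, often written p: the number of testtakers who answered the item correctly divided by the total number of testtakers. A minimal sketch, assuming responses are already scored 1 (keyed correct) or 0; the function name is hypothetical.

```python
def item_difficulty(item_responses):
    """Item-difficulty index p for one item.

    item_responses: 1 if a testtaker's answer was keyed correct, else 0.
    Returns the proportion answering correctly, from 0.0 (nobody) to 1.0
    (everybody); a higher p therefore means an easier item.
    """
    if not item_responses:
        raise ValueError("need at least one response")
    return sum(item_responses) / len(item_responses)


# Hypothetical item answered correctly by 7 of 10 testtakers.
print(item_difficulty([1, 1, 0, 1, 0, 1, 1, 0, 1, 1]))  # 0.7
```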
A statistic designed to indicate how adequately a test item separates or discriminates between high and low scorers
Item-discrimination index
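One common way to compute the item-discrimination index compares extreme groups: d = (U - L) / n, where U and L are the numbers of testtakers in the upper and lower scoring groups who answered the item correctly and n is the size of each group. The sketch assumes the frequently used upper and lower 27% split, which is a convention rather than a requirement. The resulting d ranges from -1 to +1, and a negative value flags an item that low scorers answer correctly more often than high scorers. The function name and data are hypothetical.

```python
def item_discrimination(total_scores, item_correct, fraction=0.27):
    """Item-discrimination index d = (U - L) / n.

    total_scores: each testtaker's total test score
    item_correct: 1 if that testtaker answered this item correctly, else 0
    fraction:     share of testtakers placed in each extreme group
                  (0.27 is a common convention, not a requirement)
    """
    if len(total_scores) != len(item_correct):
        raise ValueError("one item response is needed per testtaker")
    n = max(1, round(len(total_scores) * fraction))
    # Rank testtakers from lowest to highest total score (ties keep input order).
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = ranked[:n], ranked[-n:]
    U = sum(item_correct[i] for i in upper)  # correct answers in the high group
    L = sum(item_correct[i] for i in lower)  # correct answers in the low group
    return (U - L) / n


# Hypothetical data for 10 testtakers (n = round(10 * 0.27) = 3 per group).
scores = [5, 9, 3, 8, 7, 2, 10, 4, 6, 1]
item = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(item_discrimination(scores, item))  # U = 3, L = 1, so d = 2/3
```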
In personality assessment and other contexts in which the nature of the test is such that responses are not keyed correct or incorrect, a statistic indicating the proportion of testtakers who responded to an item in a particular direction. In theory, this index may range from 0 (no testtaker responded with such an answer) to 1 (every testtaker responded with such an answer). In achievement tests, which have responses that are keyed correct, this statistic is referred to as an item-difficulty index
Item-endorsement index