Chapter 8: Test Development Flashcards
Stages in the Process of Developing a Test
Test Conceptualization → Test Construction → Test Tryout → Item Analysis → Test Revision
Test Construction
Drafting of items for the test
Test Tryout
First draft of the test is then tried out on a group of sample testtakers
Item Analysis
When statistical procedures are employed to assist in making judgments about which items are good as they are, which items need to be revised, and which items should be discarded
Analyses of the Test’s Items Include
Analyses of item reliability
Item Validity
Item Discrimination
Test Conceptualization
There ought to be a test designed to measure (____) in a (____) way; stimulus could be anything; review of related literature on existing tests
Preliminary Questions to Ask During Test Conceptualization
What is the test designed to Measure?
What is the objective of the test?
Is there a need for this test?
What is the test designed to Measure?
Closely linked to how the test developer defines the construct being measured and how that definition is the same as or different from the definitions used by other tests purporting to measure the same construct
What is the objective of the test?
In service of what goal will the test be employed? In what way or ways is the objective of this test the same as or different from other tests with similar goals? What real-world behaviors would be anticipated to correlate with testtaker responses?
Is there a need for this test?
Are there any other tests purporting to measure the same thing? In what ways will the new test be better than or different from existing ones? Will there be more compelling evidence for its reliability or validity? Will it be more comprehensive? Will it take less time to administer? In what ways would this test not be better than existing tests?
Preliminary Questions to be Addressed
Who will use this test?
Who will take this test?
What content will the test cover?
How will the test be administered?
What is the ideal format of the test?
Should more than one form of the test be developed?
What special training will be required of test users for administering and interpreting the test?
What types of responses will be required of testtakers?
Who benefits from an administration of this test?
Is there any potential for harm as a result of an administration of this test?
How will meaning be attributed to scores on this test?
Who will use this test?
Clinicians? Educators? Others? For what purpose or purposes would this test be used?
Who will take this test?
Who is this test for? Who needs to take it? Who would find it desirable to take it? For what age range of testtakers is the test designed? What reading level is required of a testtaker? What cultural factors might affect the testtaker response?
What content will the test cover?
Why should it cover this content? Is this coverage different from the content coverage of existing tests with the same or similar objectives? How and why is the content area different? To what extent is this content culture-specific?
How will the test be administered?
Individually or in groups? Is it amenable to both group and individual administration? What differences will exist between individual and group administrations of this test? Will the test be designed for or amenable to computer administration? How might differences between versions of the test be reflected in test scores?
What is the ideal format of the test?
Should it be true-false, essay, multiple-choice, or in some other format? Why is the format selected for this test the best format?
Should more than one form of the test be developed?
On the basis of a cost-benefit analysis, should alternate or parallel forms of this test be created?
What special training will be required of test users for administering or interpreting the test?
What background and qualifications will a prospective user of data derived from an administration of this test need to have? What restrictions, if any, should be placed on distributors of the test and on the test’s usage?
What types of responses will be required of testtakers?
What kind of disability might preclude someone from being able to take this test? What adaptations or accommodations are recommended for persons with disabilities?
Who benefits from an administration of this test?
What would the testtaker learn, or how might the testtaker benefit, from an administration of this test? What would the test user learn, or how might the test user benefit? What social benefit, if any, derives from an administration of this test?
Is there any potential for harm as the result of an administration of this test?
What safeguards are built into the recommended testing procedure to prevent any sort of harm to any of the parties involved in the use of this test?
How will meaning be attributed to scores on this test?
Will a testtaker’s score be compared to others taking the test at the same time? To others in a criterion group? Will the test evaluate mastery of a particular content area?
Good item on a Norm-referenced Test
An item for which high scorers on the test respond correctly; low scorers on the test tend to respond to that same item incorrectly
Good item on a Criterion-Oriented Test
High scorers on the test get a particular item right whereas low scorers on the test get that same item wrong; each item should address the issue of whether the testtaker has met certain criteria
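A minimal sketch (the data and the upper-lower split are illustrative assumptions, not from the text) of how a good item on a norm-referenced test can be checked: compare the proportion of high scorers who answered the item correctly with the proportion of low scorers who did.

```python
# Sketch: upper-lower group item discrimination index (d).
# Assumes one record per testtaker: (total test score, 1 if this item was correct else 0).
def discrimination_index(records, fraction=0.27):
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    n = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(correct for _, correct in upper) / n
    p_lower = sum(correct for _, correct in lower) / n
    return p_upper - p_lower   # near +1 = good item; near 0 or negative = revise or discard

# Invented example: high scorers mostly pass the item, low scorers mostly miss it.
data = [(95, 1), (90, 1), (88, 1), (60, 0), (55, 0), (50, 1)]
print(round(discrimination_index(data, fraction=0.5), 2))  # 0.67
```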
Pilot Work/Pilot Study/Pilot Research
Refers to the preliminary research surrounding the creation of a prototype of the test; test items may be piloted to evaluate whether they should be included in the final form of the instrument; may involve open-ended interviews with research subjects selected for some reason (perhaps on the basis of an existing test); the developer attempts to determine how best to measure a targeted construct
Pilot Work Process
Entails the creation, revision, and deletion of many test items; literature reviews; experimentation; and related activities
Scaling
Assignment of numbers according to rules; defined as the process of setting rules for assigning numbers in measurement; the process by which a measuring device is designed and calibrated and by which numbers (or other indices), called scale values, are assigned to different amounts of the trait, attribute, or characteristic being measured
Age-Based Scale
If the Testtaker’s test performance as a function of age is of critical interest
Grade-Based Scale
If the testtaker’s test performance as a function of grade is of critical interest
Stanine Scale
If all raw scores on the test are to be transformed into scores that can range from 1 to 9
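A small sketch (the percentile-rank shortcut and the example scores are assumptions) of a stanine transformation, using the standard bands that place roughly 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores into stanines 1 through 9.

```python
# Sketch: convert raw scores to stanines (1-9) via percentile rank.
CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]   # cumulative percentage cutoffs for stanine bands

def to_stanines(scores):
    ranked = sorted(scores)
    result = []
    for s in scores:
        pct = 100 * sum(x < s for x in ranked) / len(ranked)  # percentile rank of this score
        result.append(1 + sum(pct >= c for c in CUTOFFS))     # count the bands passed
    return result

print(to_stanines([52, 67, 75, 80, 88, 93, 99]))  # [1, 3, 4, 5, 5, 6, 7]
```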
Categorization of a Test Scale
Unidimensional vs. Multidimensional
Comparative vs. Categorical
Rating Scale
Defined as a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker; can be used to record judgments of oneself, others, experiences, or objects, and can take several forms
Summative Scale
When final test score is obtained by summing the ratings across all the items
Likert Scale
used extensively in psychology, usually to scale attitudes; relatively easy to construct
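A brief sketch (items, point range, and reverse-keying are invented for illustration) of summative scoring of a Likert-type scale: item ratings are summed, with negatively worded items reverse-keyed first.

```python
# Sketch: summative scoring of a 5-point Likert-type scale.
# Ratings run 1 (strongly disagree) to 5 (strongly agree);
# reverse-keyed (negatively worded) items are flipped before summing.
def likert_total(ratings, reverse_keyed, points=5):
    total = 0
    for i, r in enumerate(ratings):
        total += (points + 1 - r) if i in reverse_keyed else r
    return total

# Example: four items; the third item (index 2) is negatively worded.
print(likert_total([5, 4, 1, 3], reverse_keyed={2}))  # 5 + 4 + 5 + 3 = 17
```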
Method of Paired Comparisons
Testtakers are presented with pairs of stimuli that they are asked to compare and must select one of the stimuli according to some rule; the rule may be that they agree more with one statement than the other, or that they find one stimulus more appealing than the other
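A rough sketch (stimuli and choices invented) of one simple way to scale paired-comparison data: score each stimulus by the proportion of its comparisons in which it was selected.

```python
# Sketch: simple scale values from paired comparisons.
from collections import Counter

stimuli = ["A", "B", "C"]
# Each record: (pair of stimuli presented, the stimulus the testtaker selected).
choices = [(("A", "B"), "A"), (("A", "C"), "A"), (("B", "C"), "C"),
           (("A", "B"), "A"), (("A", "C"), "C"), (("B", "C"), "C")]

wins = Counter(selected for _, selected in choices)
appearances = Counter()
for pair, _ in choices:
    appearances.update(pair)

for s in stimuli:                                    # proportion of comparisons won
    print(s, round(wins[s] / appearances[s], 2))     # A 0.75, B 0.0, C 0.75
```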
Comparative Scaling
Entails judgments of a stimulus in comparison with every other stimulus on the scale
Categorical Scaling
Scaling system that relies on sorting; stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum
Guttman Scale
Another scaling method that yields ordinal-level measures; items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured; all respondents who agree with the stronger statements of the attitude will also agree with milder statements
Scalogram Analysis
Item-analysis procedure and approach to test development that involves a graphic mapping of a testtaker’s responses; Objective for the developer of a measure of attitudes is to obtain an arrangement of items wherein endorsement of one item automatically connotes endorsement of less extreme positions
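A small sketch (hypothetical four-item scale) of the cumulative pattern a Guttman scale assumes: endorsement of a stronger item should imply endorsement of every milder item, which is what scalogram analysis maps out.

```python
# Sketch: check whether a response pattern fits a Guttman (cumulative) scale.
# Items are ordered mildest -> strongest; 1 = endorsed, 0 = not endorsed.
def fits_guttman(pattern):
    seen_zero = False
    for response in pattern:
        if response == 0:
            seen_zero = True
        elif seen_zero:        # a stronger item endorsed after a milder one was not
            return False
    return True

print(fits_guttman([1, 1, 1, 0]))  # True: perfect cumulative pattern
print(fits_guttman([1, 0, 1, 0]))  # False: counts as an error in scalogram analysis
```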
How to Create a Scale using Thurstone’s equal-appearing interval method
A reasonably large number of statements reflecting positive and negative attitudes toward a topic are collected
Judges or experts evaluate each statement in terms of how strongly it indicates that the topic is justified. Each judge is instructed to rate each statement on a scale as if the scale were interval in nature
A mean and a standard deviation of the judges’ ratings are calculated for each statement
Items are selected for inclusion in the final scale based on several criteria, including (a) the degree to which the item contributes to a comprehensive measurement of the variable in question (b) the test developer’s degree of confidence that the items have indeed been sorted into equal intervals
Scale is now ready for administration; the way the scale is used depends on the objectives of a test situation
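A short sketch (ratings invented; the 11-point scale and the SD cutoff are assumptions) of the middle steps: each statement’s scale value is the mean of the judges’ ratings, and statements the judges rated inconsistently (high standard deviation) are candidates for exclusion.

```python
# Sketch: Thurstone-style screening of statements from judges' ratings.
import statistics

# Invented ratings on an 11-point scale from five judges per statement.
ratings = {
    "Statement 1": [9, 10, 9, 10, 9],
    "Statement 2": [2, 3, 2, 2, 3],
    "Statement 3": [1, 6, 10, 4, 8],   # judges disagree -> ambiguous statement
}

for stmt, rs in ratings.items():
    mean = statistics.mean(rs)          # tentative scale value
    sd = statistics.stdev(rs)           # spread of the judges' ratings
    verdict = "keep" if sd < 2.0 else "drop (ambiguous)"
    print(f"{stmt}: scale value = {mean:.1f}, SD = {sd:.1f} -> {verdict}")
```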
Scaling Method Employed Depends on
Variables being measured
Group for whom the test is intended
Preferences of the test developer
Questions to Ask for the Test Blueprint
What range of content should the items cover?
Which of the many different types of item formats should be employed?
How many items should be written in total and for each content area covered?
Item Pool
Reservoir or well from which test items will or will not be drawn for the final version of the test
Item Format
Include variables such as the form, plan, structure, arrangement, and layout of individual test items
Types of Response Formats
Selected-Response Format
Constructed-Response Format
Selected-Response Format
Require testtakers to select a response from a set of alternative responses
Constructed-Response Format
Require the testtakers to supply or to create the correct answer, not merely to select it
Types of Selected-Response Item Formats
Multiple Choice
Matching
True or False
Elements of Multiple-Choice Format
Stem
Correct Alternative or option
Several incorrect alternatives or options variously referred to as distractors or foils
Characteristics of a good multiple-choice item in an achievement test
Has one correct alternative
Has grammatically parallel alternatives
Has alternatives of similar length
Has alternatives that fit grammatically with the stem
Includes as much of the item as possible in the stem to avoid unnecessary repetition
Avoids ridiculous distractors
Matching Item
Testtaker is presented with two columns: premises on the left and responses on the right
Binary Choice Item
Multiple-choice item that contains only two possible responses
True-False Item
The most familiar binary-choice item; type of selected-response item which takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact
Good Binary Choice Item
Contains a single idea, is not excessively long, and is not subject to debate; the correct response must undoubtedly be one of the two choices
Completion Item
Requires the examinee to provide a word or phrase that completes a sentence; also known as Short-Answer Item
Good Completion Item
Should be worded so that the correct answer is specific; Should be written clearly enough that the testtaker can respond succinctly (with a short answer)
Essay Item
Useful when the test developer wants the examinee to demonstrate a depth of knowledge about a single topic; permits restating of learned material and allows for the creative integration and expression of the material in the testtaker’s own words; scoring is subjective and prone to inter-scorer differences
Item Bank
Relatively large and accessible collection of test questions; advantage is accessibility to a large number of test items conveniently classified by subject area, item statistics, or other variables
Item Branching
Technique with the ability to individualize testing; ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items
Computerized Adaptive Testing (CAT)
Refers to an interactive, computer-administered testtaking procedure wherein items presented to the testtaker are based in part on the testtaker’s performance on previous items; tends to reduce floor effects and ceiling effects
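A toy sketch (item bank and branching rule entirely invented) of the item-branching idea behind CAT: after each response, present a harder item if the answer was correct and an easier item if it was not.

```python
# Sketch: naive adaptive item selection (not a real CAT engine).
item_bank = {1: "very easy item", 2: "easy item", 3: "medium item",
             4: "hard item", 5: "very hard item"}

def next_level(level, last_answer_correct):
    level = level + 1 if last_answer_correct else level - 1
    return max(1, min(5, level))        # stay within the bank's difficulty range

level = 3                               # start with a medium item
for correct in [True, True, False]:     # simulated responses
    level = next_level(level, correct)
    print(item_bank[level])             # hard item, very hard item, hard item
```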
Floor effect
Refers to the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured
Ceiling Effect
Refers to the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured
Class or Category Scoring
Employs testtaker responses that earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way; used by some diagnostic systems wherein individuals must exhibit a certain number of symptoms to qualify for a specific diagnosis
Ipsative Scoring
Comparing a testtaker’s score on one scale within a test to another scale within that same test
Edwards Personal Preference Schedule
The EPPS is designed to measure the relative strength of different psychological needs
Formal Item-Analysis
Cross Validation
Co-Validation
Quality Assurance During Test Revision
Tests Due For Revision When The Following Conditions Exist
Stimulus materials look dated and current testtakers cannot relate to them.
Verbal content of the test, including the administration instructions and the test items, contains dated vocabulary that is not readily understood by current testtakers
As popular culture changes and words take on new meanings, certain words or expressions in the test items or directions may be perceived as inappropriate or even offensive to a particular group and must therefore be changed.
Test norms are no longer adequate as a result of age-related shifts in the abilities measured over time, and so an age extension of the norms (upward, downward, or in both directions) is necessary
The reliability or the validity of the test, as well as the effectiveness of individual test items, can be significantly improved by a revision
The theory on which the test was originally based has been improved significantly, and these changes should be reflected in the design and content of the test.
Cross-Validation
Refers to the revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion
Validity Shrinkage
The decrease in item validities that inevitably occurs after cross-validation of findings; expected and viewed as integral to the test development process; infinitely preferable to a scenario wherein high item validities are published in a test manual as a result of inappropriately using the identical sample of testtakers for test standardization and cross-validation of findings
Test Manual
Should outline the test development procedures used
Reliability information, including test-retest reliability and internal consistency estimates
Co-Validation
Defined as a test validation process conducted on two or more tests using the same sample of testtakers
Co-Norming
Process that occurs when co-validation is used in conjunction with the creation of norms or the revision of existing norms
Anchor Protocol
Test protocol scored by a highly authoritative scorer that is designed to serve as a model for scoring and a mechanism for resolving scoring discrepancies
Scoring Drift
A discrepancy between scoring in an anchor protocol and the scoring of another protocol
Roles of IRT in Test Construction
Evaluating existing tests for the purpose of mapping test revisions
Determining measurement equivalence across testtaker populations
Developing item banks
IRT Information Curves
Help test developers evaluate how well an individual item (or entire test) is working to measure different levels of the underlying construct
Can be used to weed out uninformative questions
Eliminate redundant items to tailor an instrument to provide high information (Precision)
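A compact sketch (parameter values assumed) of one common way an item information curve is computed, using the two-parameter logistic model where I(θ) = a² × P(θ) × (1 − P(θ)).

```python
# Sketch: item information for a 2PL IRT item, I(theta) = a^2 * P * (1 - P).
import math

def p_correct(theta, a, b):
    """2PL probability of a keyed/correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1 - p)

# An item with discrimination a = 1.5 and difficulty b = 0.0 is most informative
# near theta = 0 and contributes little precision for extreme testtakers.
for theta in [-2, -1, 0, 1, 2]:
    print(theta, round(item_information(theta, a=1.5, b=0.0), 3))
```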
Differential Item Functioning (DIF)
Phenomenon wherein an item functions differently in one group of testtakers as compared to another group of testtakers known to have the same (or similar) level of the underlying trait
DIF Analysis
A process by which test developers scrutinize group-by-group item response curves, looking for DIF Items; used to evaluate the effect of different test administration procedures and item ordering effects
DIF Items
Items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership
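A rough sketch (groups, score levels, and responses invented) of the core idea behind a DIF analysis: match testtakers from two groups on total score and compare each group’s endorsement rate for the item within each matched level; large gaps flag a potential DIF item.

```python
# Sketch: compare item endorsement rates for two groups matched on total score.
from collections import defaultdict

# Each record: (group, total test score, 1 if the item was endorsed/correct else 0).
records = [("ref", 10, 1), ("ref", 10, 1), ("focal", 10, 0), ("focal", 10, 1),
           ("ref", 20, 1), ("ref", 20, 1), ("focal", 20, 1), ("focal", 20, 0)]

by_level = defaultdict(lambda: {"ref": [], "focal": []})
for group, score, endorsed in records:
    by_level[score][group].append(endorsed)

for score, groups in sorted(by_level.items()):
    p_ref = sum(groups["ref"]) / len(groups["ref"])
    p_focal = sum(groups["focal"]) / len(groups["focal"])
    print(f"score {score}: ref = {p_ref:.2f}, focal = {p_focal:.2f}, gap = {p_ref - p_focal:.2f}")
```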