Chapter 8: Test Development Flashcards by John Gandeza

The process of developing a test occurs in five stages:

test conceptualization
test construction
test tryout
item analysis
test revision

The process of developing a test occurs in five stages:

The beginnings of any published test can probably be traced to thoughts—self-talk, in behavioral terms.

A review of the available literature on existing tests designed to measure a particular construct might indicate that such tests leave much to be desired in psychometric soundness.

An emerging social phenomenon or pattern of behavior might serve as the stimulus for the
development of a new test

Test Conceptualization

The process of developing a test occurs in five stages:

Some Preliminary Questions

1. What is the test designed to measure?
What will the test developer measure, and how is the construct defined? Is that definition the same as or different from the one used by other tests?
2. What is the objective of the test?
What is the goal? Is it the same as or different from that of other tests? What will the test be correlated with?
3. Is there a need for this test?
Does a test like this already exist? How does it differ from other tests? Supporting r&a? Does it take less time to administer? What is its edge?
4. Who will use this test?
Clinicians? Educators? Others? For what purpose or purposes would this test be used?
5. Who will take this test?
Whom is it for? Consider culture, reading comprehension, and whether the test is attractive to testtakers.
6. What content will the test cover?
Why this content? Is the coverage the same as or different from that of other tests? Culture?
7. How will the test be administered?
Is it for computer administration? Can it be given individually, in groups, or both?
8. What is the ideal format of the test?
Type of test
9. Should more than one form of the test be developed?
On the basis of a cost–benefit analysis, should alternate or parallel forms of this test be created?
10. What special training will be required of test users for administering or interpreting the test?
Are there restrictions? Who will administer it?
11. What types of responses will be required of testtakers?
Disability?
12. Who benefits from an administration of this test?
What is learned? What is the benefit?
13. Is there any potential for harm as the result of an administration of this test?
Is any harm anticipated?
14. How will meaning be attributed to scores on this test?
What will be measured?

Norm-referenced versus criterion-referenced tests: Item development issues

Both are alike in that scores can be high or low.
In a CRT, ranking first in the group means nothing in itself; what counts is meeting the criterion. CRTs are mostly used in licensing, educational, and mastery contexts.
An NRT is a poor fit when knowledge and mastery are what is required.
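
The contrast between the two interpretations can be sketched in a few lines of Python. This is only an illustration: the function names, the norm-group scores, and the cut score of 85 are all hypothetical and do not come from the flashcards.

```python
from bisect import bisect_left

def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # count of scores strictly below
    return 100.0 * below / len(ordered)

def meets_criterion(score, cut_score):
    """Criterion-referenced reading: did the testtaker reach the cut score?"""
    return score >= cut_score

# Hypothetical norm group and cut score, for illustration only.
norm_group = [52, 60, 61, 65, 70, 72, 75, 80, 88, 95]
score = 80

print(percentile_rank(score, norm_group))    # standing relative to others
print(meets_criterion(score, cut_score=85))  # mastery decision
```

The same raw score can look strong against the norm group and still fall short of the criterion, which is why rank alone says little in a mastery context.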

In general, _____ refer to the preliminary research surrounding the creation of a prototype of the test. Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument.

Once _____ work has been completed, the process of test construction begins. Keep in mind, however, that depending on the nature of the test—particularly its need for updates and revisions—the need for further _____ research is always a possibility.

pilot work, pilot study, and pilot research

The process of developing a test occurs in five stages:

Test Construction

1) Scaling
2) Writing Items
3) Scoring Items

The process of developing a test occurs in five stages: 2. Test Construction

_____ may be defined as the process of setting rules for assigning numbers in measurement. Stated another way, _____ is the process by which a measuring device is designed and calibrated and by which numbers (or other indices)—scale values—are assigned to different amounts of the trait, attribute, or characteristic being measured.

Scaling

The process of developing a test occurs in five stages: 2. Test Construction

Historically, the prolific L. L. _____ is credited for being at the forefront of efforts to develop methodologically sound scaling methods. He adapted psychophysical scaling methods to the study of psychological variables such as attitudes and values.

Thurstone

The process of developing a test occurs in five stages: 2. Test Construction

Types of scales (3)

There is no best type of scale. Test developers scale a test in the manner they believe is optimally suited to their conception of the measurement of the trait (or whatever) that is being measured.

Age-based Scale - test performance as a function of age
Grade-based Scale - test performance as a function of grade
Stanine Scale - raw scores are transformed into scores ranging from 1 to 9
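
A stanine transformation can be sketched as follows. This is a minimal illustration that assumes the common convention of a stanine scale with a mean of 5 and a standard deviation of about 2, so each stanine spans half a standard deviation; operational tests often assign stanines from percentile bands instead. The function name `to_stanines` is hypothetical.

```python
import math
from statistics import mean, pstdev

def to_stanines(raw_scores):
    """Map raw scores to stanines (1 to 9) through z scores.

    Assumption: stanine = 2z + 5, rounded to the nearest whole
    number and clipped to the range 1..9.
    """
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        # No variability: every score sits at the middle stanine.
        return [5] * len(raw_scores)
    stanines = []
    for x in raw_scores:
        z = (x - m) / sd
        s = math.floor(2 * z + 5.5)  # round half up
        stanines.append(max(1, min(9, s)))
    return stanines

print(to_stanines([40, 50, 60]))
```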

The process of developing a test occurs in five stages: 2. Test Construction

Scaling methods (7) Ordinal

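Rating Scale - a grouping of words, statements, or symbols on which the testtaker indicates a judgment
Summative Scale - the final score is obtained by summing the ratings across all items
Likert Scale - usually 5 or 7 response alternatives on an agree-disagree continuum
Paired Comparison - the testtaker chooses one stimulus of each pair according to some rule
Comparative Scaling - a sorting method; each stimulus is judged against every other stimulus on the scale
Categorical Scale - stimuli are placed into one of two or more categories
Guttman Scale or Scalogram Analysis - items are ranked from weaker to stronger expressions of what is being measured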
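
Summative scoring of Likert-type items can be sketched as below. The function name, the 5-point default, and the option to reverse-key some items are assumptions made for illustration; the flashcards state only that ratings are summed.

```python
def summative_score(ratings, reverse_keyed=(), points=5):
    """Sum Likert ratings into one score.

    ratings       -- one rating per item, each from 1 to `points`
    reverse_keyed -- positions of items worded in the opposite
                     direction (hypothetical feature), scored as
                     points + 1 - rating
    """
    total = 0
    for position, rating in enumerate(ratings):
        if not 1 <= rating <= points:
            raise ValueError(f"rating {rating} is outside 1..{points}")
        if position in reverse_keyed:
            total += points + 1 - rating
        else:
            total += rating
    return total

print(summative_score([4, 5, 2, 3]))
print(summative_score([4, 5, 2, 3], reverse_keyed={2}))
```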

Guttman scales are developed through the administration of a number of items to a target group. The resulting data are then analyzed by means of _____ analysis, an item-analysis procedure and approach to test development that involves a graphic mapping of a testtaker’s responses.

scalogram

The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items

An _____ is the reservoir or well from which items will or will not be drawn for the final version of the test.

item pool

The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items

Item format (2)

1. Selected-Response Format - select a response from a set of alternatives
2. Constructed-Response Format - supply or create the correct answer

The process of developing a test occurs in five stages: 2. Test Construction > 2. Writing Items > 1. selected-response format

Multiple-Choice Format (3)

(1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives, called distractors or foils

The process of developing a test occurs in five stages: 2. Test Construction > 2. Writing Items

In a _____, the testtaker is presented with two columns: premises on the left and responses on the right. The testtaker’s task is to determine which response is best associated with which premise.

matching item

The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items

A multiple-choice item that contains only two possible responses is called a _____ item. Perhaps the most familiar _____ item is the true–false item. As you know, this type of selected-response item usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact. Other varieties of _____ items include sentences to which the testtaker responds with one of two responses, such as agree or disagree, yes or no, right or wrong, or fact or opinion

binary-choice

The process of developing a test occurs in five stages: 2. Test Construction > 2. Writing Items

Constructed-Response Format (3)

Completion item - the testtaker supplies a word or phrase that completes a sentence (fill in the blank)
Short-answer item
Essay item - a composition demonstrating recall of facts, understanding, analysis, and/or interpretation

The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items

Writing items for computer administration (5)

Item Bank - a large collection of test questions (items)
Computer Adaptive Testing (CAT) - the next item presented is based on the testtaker's responses to previous items
a. Floor effect
b. Ceiling effect
c. Item branching

The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items

Writing items for computer administration (5)

_____ tends to reduce floor effects and ceiling effects

CAT

The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items

Writing items for computer administration (5)

A _____ refers to the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured. Testtakers who have not yet achieved such ability might fail all of the items; because of the _____, the test would not provide any guidance as to the relative mathematical ability of testtakers in this group.

Example: a test whose items are too difficult for those taking it would show a _____ because most people would obtain or be close to the lowest possible score of 0.

floor effect

The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items

Writing items for computer administration (5)

As you might expect, a _____ refers to the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured. Returning to our example of the ninth-grade mathematics test, what would happen if all of the testtakers answered all of the items correctly? It is likely that the test user would conclude that the test was too easy for this group of testtakers and so discrimination was impaired by a _____.

Ex: a test whose items are too easy for those taking it would show a _____ because most people would achieve or be close to the highest possible score.

ceiling effect

The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items

Writing items for computer administration (5)

The ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items is referred to as _____. A computer that has stored a bank of achievement test items of different difficulty levels can be programmed to present items according to an algorithm or rule.

tailor fit

item branching
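
One possible branching rule can be sketched as follows. The rule used here (move up one difficulty level after a correct response, down one after an incorrect response) is an assumption chosen for simplicity; the card says only that items are presented according to an algorithm or rule. The function names and the five difficulty levels are hypothetical.

```python
def next_difficulty(current, was_correct, lowest=1, highest=5):
    """Pick the difficulty level of the next item.

    Assumed rule: one level harder after a correct answer, one level
    easier after an incorrect answer, kept within the bank's range.
    """
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current + step))

def branching_path(responses, start=3):
    """Return the difficulty levels visited for a sequence of
    correct/incorrect responses, ending with the level of the item
    that would be presented next."""
    levels = [start]
    for was_correct in responses:
        levels.append(next_difficulty(levels[-1], was_correct))
    return levels

print(branching_path([True, True, False, True]))
```

Because the level tracks the testtaker's responses, few testtakers are left facing only items that are far too hard or far too easy, which is the sense in which CAT tends to reduce floor and ceiling effects.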

The process of developing a test occurs in five stages: 2. Test Construction: Scoring Items (3)

Many different test scoring models have been devised. Perhaps the most commonly used model—owing, in part, to its simplicity and logic—is the _____. Typically, the rule in a _____ scored test is that the higher the score on the test, the higher the testtaker is on the ability, trait, or other characteristic that the test purports to measure.

Ex. A student took three classes and earned the grades “A,” “A,” and “B.” Those grades correspond to 4, 4, and 3 on the numeric scale. Adding up the numeric grades gives 4 + 4 + 3 = 11.

cumulative model

The process of developing a test occurs in five stages: 2. Test Construction: Scoring Items (3)

In tests that employ _____, testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way. This approach is used by some diagnostic systems wherein individuals must exhibit a certain number of symptoms to qualify for a specific diagnosis

class or category scoring

The process of developing a test occurs in five stages: 2. Test Construction: Scoring Items (3)

A third scoring model, _____, departs radically in rationale from either cumulative or class models. A typical objective in ipsative scoring is comparing a testtaker’s score on one scale within a test to another scale within that same test.

Ex. A supervisor using an ipsative scale to indicate an employee's strength in different areas initially might assign 20 points for communication, 30 for timeliness, and 50 for work quality but a few months later assign 30 points for communication, 30 for timeliness, and 40 for quality of work.

ipsative scoring AKA forced-choice measurement
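
The three scoring models can be contrasted in a short sketch that reuses the examples from the cards. All function names are hypothetical. Only A = 4 and B = 3 come from the cards; the remaining grade points and the symptom checklist are assumptions for illustration.

```python
# Only A=4 and B=3 appear on the card; the other values are assumed.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def cumulative_score(grades):
    """Cumulative model: a higher total means more of the attribute."""
    return sum(GRADE_POINTS[g] for g in grades)

def class_score(symptoms_present, required):
    """Class or category model: enough qualifying responses place the
    testtaker in the category."""
    return sum(symptoms_present) >= required

def ipsative_change(before, after):
    """Ipsative model: points are divided among a person's own scales,
    so the comparison is within the person, not between people."""
    if sum(before.values()) != sum(after.values()):
        raise ValueError("ipsative ratings must share the same total")
    return {scale: after[scale] - before[scale] for scale in before}

print(cumulative_score(["A", "A", "B"]))
print(class_score([True, True, False, True, False], required=3))
print(ipsative_change(
    {"communication": 20, "timeliness": 30, "quality": 50},
    {"communication": 30, "timeliness": 30, "quality": 40},
))
```

In the ipsative example the total stays at 100 points, so a gain on one scale has to be matched by a loss on another.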

The process of developing a test occurs in five stages: 3. Test Tryout

Equally important are questions about the number of people on whom the test should be tried out. An informal rule of thumb is that there should be no fewer than _____ subjects and preferably as many as ten for each item on the test. In general, the more subjects in the tryout the better.

five
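
The rule of thumb is simple arithmetic, sketched here with a hypothetical helper. The 30-item test in the usage line is an invented example.

```python
def tryout_sample_range(n_items, per_item_min=5, per_item_preferred=10):
    """Return (minimum, preferred) tryout sample sizes under the
    informal rule of five to ten subjects per item."""
    return n_items * per_item_min, n_items * per_item_preferred

# A hypothetical 30-item test.
print(tryout_sample_range(30))
```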

The process of developing a test occurs in five stages: 3. Test Tryout

The thinking here is that the more subjects employed, the weaker the role of chance in subsequent data analysis. A definite risk in using too few subjects during test tryout comes during factor analysis of the findings, when what we might call _____ (factors that actually are just artifacts of the small sample size) may emerge.

phantom factors

The process of developing a test occurs in five stages: 3. Test Tryout

What Is a Good Item?

1. Reliable and valid
2. Answered correctly by high scorers

The process of developing a test occurs in five stages: 4. Item Analysis (8)

1) Item-Difficulty Index
2) Item-Reliability Index
3) Item-Validity Index
4) Item-Discrimination Index
5) Analysis of item alternatives
6) Item-Characteristic Curves
7) Other Considerations in Item Analysis
8) Qualitative Item Analysis

The process of developing a test occurs in five stages: 4. Item Analysis

Note that the larger the item-difficulty index, the easier the item. Because p refers to the percent of people passing an item, the higher the p for an item, the easier the item. An exception here may be a giveaway item. Such an item might be inserted near the beginning of an achievement test to spur motivation and a positive testtaking attitude and to lessen testtakers’ test-related anxiety.

Item-difficulty index in the context of achievement testing; item-endorsement index in other contexts, such as personality testing.
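
Computing p for a single item can be sketched as follows, expressed here as a proportion from 0 to 1. The function name and the response data are hypothetical.

```python
def item_difficulty(item_responses):
    """Item-difficulty index p: the proportion of testtakers who
    answered the item correctly (1 = correct, 0 = incorrect).
    A larger p means an easier item."""
    return sum(item_responses) / len(item_responses)

# Hypothetical responses of four testtakers to two items.
easy_item = [1, 1, 1, 0]
hard_item = [0, 0, 1, 0]

print(item_difficulty(easy_item))
print(item_difficulty(hard_item))
```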

The process of developing a test occurs in five stages: 4. Item Analysis

The _____ provides an indication of the internal consistency of a test; the higher this index, the greater the test’s internal consistency.

item-reliability index

The process of developing a test occurs in five stages: 4. Item Analysis

A statistical tool useful in determining whether items on a test appear to be measuring the same thing(s) is _____.

Ex. People who get a high score on a test of verbal ability are also good on other tests that require verbal abilities.

factor analysis

The process of developing a test occurs in five stages: 4. Item Analysis

The _____ is a statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure. The higher the item-validity index, the greater the test’s criterion-related validity.

item-validity index

The process of developing a test occurs in five stages: 4. Item Analysis

_____ indicate how adequately an item separates or discriminates between high scorers and low scorers on an entire test. The _____ (d) is a measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly; the higher the value of d, the greater the number of high scorers answering the item correctly.

Measures of item discrimination; item-discrimination index

The process of developing a test occurs in five stages: 4. Item Analysis

The _____ for what we refer to as the “upper” and “lower” areas of a distribution of scores will demarcate the upper and lower 27% of the distribution of scores, provided the distribution is normal.

optimal boundary lines
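
The index d and the 27% boundaries can be combined in one sketch. It assumes equal-sized upper and lower groups, so that d = (U - L) / n, where U and L are the numbers answering correctly in each group and n is the group size. The function name and data are hypothetical, and ties in total score are broken by list order.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Item-discrimination index d from upper and lower scoring groups.

    item_correct -- 1/0 per testtaker for the item being analyzed
    total_scores -- total test score per testtaker (same order)
    fraction     -- share of testtakers in each extreme group
    """
    n_takers = len(total_scores)
    group_size = max(1, round(n_takers * fraction))
    # Indices of testtakers ordered from lowest to highest total score.
    order = sorted(range(n_takers), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    u_correct = sum(item_correct[i] for i in upper)
    l_correct = sum(item_correct[i] for i in lower)
    return (u_correct - l_correct) / group_size

# Hypothetical data: ten testtakers with total scores 1 to 10.
totals = list(range(1, 11))
item = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
print(discrimination_index(item, totals))
```

A positive d means more high scorers than low scorers answered the item correctly; a negative d would flag an item that low scorers pass more often than high scorers.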

The process of developing a test occurs in five stages: 4. Item Analysis

The quality of each alternative within a multiple-choice item can be readily assessed with reference to the comparative performance of upper and lower scorers. No formulas or statistics are necessary here.

Analysis of item alternatives

The process of developing a test occurs in five stages: 4. Item Analysis

By charting the number of testtakers in the U and L groups who chose each alternative, the test developer can get an idea of the effectiveness of a distractor by means of a simple _____.

eyeball test
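
The chart behind the eyeball test is just a tally of choices per alternative in each group. A minimal sketch, with hypothetical names and invented response data:

```python
from collections import Counter

def tabulate_alternatives(upper_choices, lower_choices, options="abcde"):
    """Count how many U-group and L-group testtakers chose each
    alternative. Returns {option: (upper_count, lower_count)}."""
    upper = Counter(upper_choices)
    lower = Counter(lower_choices)
    return {opt: (upper[opt], lower[opt]) for opt in options}

# Hypothetical choices for one item whose correct alternative is "a".
upper_group = list("aaaab")
lower_group = list("abcde")

table = tabulate_alternatives(upper_group, lower_group)
for option, (u, l) in table.items():
    print(option, u, l)
```

In this invented data the keyed answer draws mostly upper-group testtakers while the distractors draw mostly lower-group testtakers, which is the pattern a working item would be expected to show.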

The process of developing a test occurs in five stages: 4. Item Analysis

Recall that an _____ is a graphic representation of item difficulty and discrimination. Note that the extent to which an item discriminates high- from low-scoring examinees is apparent from the slope of the curve. The steeper the slope, the greater the item discrimination.

item-characteristic curve
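
One way to see how slope relates to discrimination is to compute points on a curve. The card does not name a model, so this sketch assumes a two-parameter logistic curve, in which `a` sets the slope (discrimination) and `b` the location (difficulty). The function name and parameter values are hypothetical.

```python
import math

def icc(theta, a, b):
    """Probability of a correct response at ability level `theta`
    under an assumed two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items of equal difficulty, different discrimination.
for theta in (-1.0, 0.0, 1.0):
    flat = icc(theta, a=0.5, b=0.0)
    steep = icc(theta, a=2.0, b=0.0)
    print(theta, round(flat, 3), round(steep, 3))
```

The steeper item separates testtakers just below and just above its difficulty level much more sharply than the flatter one.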

The process of developing a test occurs in five stages: 4. Item Analysis

Other Considerations in Item Analysis (3)

1) Guessing
2) Item fairness
3) Speed tests

The process of developing a test occurs in five stages: 4. Item Analysis

Qualitative Item Analysis (2)

1. “Think aloud” test administration (Ex. Users are asked to say whatever they are looking at, thinking, doing, and feeling at each moment.)
2. Expert panels

The process of developing a test occurs in five stages: 4. Item Analysis

In addition to interviewing testtakers individually or in groups, _____ may also provide qualitative analyses of test items.

expert panels

The process of developing a test occurs in five stages: 4. Item Analysis

A _____ is a study of test items, typically conducted during the test development process, in which items are examined for fairness to all prospective testtakers and for the presence of offensive language, stereotypes, or situations.

sensitivity review

The process of developing a test occurs in five stages: 5. Test Revision (3)

1) Test Revision as a Stage in New Test Development
2) Test Revision in the Life Cycle of an Existing Test
3) The Use of IRT in Building and Revising Tests

The process of developing a test occurs in five stages: 5. Test Revision (3)

_____ information curves can help test developers evaluate how well an individual item (or entire test) is working to measure different levels of the underlying construct. Developers can use these information curves to weed out uninformative questions or to eliminate redundant items that provide duplicate levels of information.

IRT

The process of developing a test occurs in five stages: 5. Test Revision (3)

This phenomenon, wherein an item functions differently in one group of testtakers as compared to another group of testtakers known to have the same (or similar) level of the underlying trait, is referred to as _____. _____ items are those items that respondents from different groups at the same level of the underlying trait have different probabilities of endorsing as a function of their group membership.

differential item functioning (DIF)

The process of developing a test occurs in five stages: 5. Test Revision (3)

The _____ is then evaluated by content experts, potential respondents, and survey experts using a variety of qualitative and quantitative methods.

item pool
