Chapter 8: Test Development Flashcards
The process of developing a test occurs in five stages:
- test conceptualization
- test construction
- test tryout
- item analysis
- test revision
The beginnings of any published test can probably be traced to thoughts—self-talk, in behavioral terms.
A review of the available literature on existing tests designed to measure a particular construct might indicate that such tests leave much to be desired in psychometric soundness.
An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test.
- Test Conceptualization
Some Preliminary Questions
- What is the test designed to measure?
What will the test developer measure, and how is the construct defined? Is that definition the same as or different from those used by other tests?
- What is the objective of the test?
What is the goal? Is it the same as or different from the goals of other tests? What will it be correlated with?
- Is there a need for this test?
Does a test like this already exist? How does it differ from other tests? Supporting r&a? Does it take less time to administer? What is its edge?
- Who will use this test?
Clinicians? Educators? Others? For what purpose or purposes would this test be used?
- Who will take this test?
Who is it for? Consider culture, reading comprehension, and whether the test is appealing to testtakers.
- What content will the test cover?
Why this content? Is the coverage the same as or different from that of other tests? Are there cultural considerations?
- How will the test be administered?
Is it for computer administration? Can it be given individually, in groups, or both?
- What is the ideal format of the test?
Type of test.
- Should more than one form of the test be developed?
On the basis of a cost–benefit analysis, should alternate or parallel forms of this test be created?
- What special training will be required of test users for administering or interpreting the test?
Are there restrictions? Who will administer it?
- What types of responses will be required of testtakers?
Consider disabilities.
- Who benefits from an administration of this test?
What is learned? What is the benefit?
- Is there any potential for harm as the result of an administration of this test?
Is any harm anticipated?
- How will meaning be attributed to scores on this test?
What will be measured?
Norm-referenced versus criterion-referenced tests: Item development issues
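- The two are alike when it comes to high or low scores.
- In a criterion-referenced test (CRT), even ranking first means nothing, because standing relative to other testtakers is irrelevant. CRTs are mostly used in licensing, educational, and mastery contexts.
- A norm-referenced test (NRT) is not suitable when knowledge and mastery are what is required.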
In general, the terms _____ work, _____ study, and _____ research refer to the preliminary research surrounding the creation of a prototype of the test. Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument.
Once _____ work has been completed, the process of test construction begins. Keep in mind, however, that depending on the nature of the test—particularly its need for updates and revisions—the need for further _____ research is always a possibility.
pilot work, pilot study, and pilot research
- Test Construction
1) Scaling
2) Writing Items
3) Scoring Items
The process of developing a test occurs in five stages: 2. Test Construction
_____ may be defined as the process of setting rules for assigning numbers in measurement. Stated another way, _____ is the process by which a measuring device is designed and calibrated and by which numbers (or other indices)—scale values—are assigned to different amounts of the trait, attribute, or characteristic being measured.
Scaling
The process of developing a test occurs in five stages: 2. Test Construction
Historically, the prolific L. L. _____ is credited with being at the forefront of efforts to develop methodologically sound scaling methods. He adapted psychophysical scaling methods to the study of psychological variables such as attitudes and values.
Thurstone
The process of developing a test occurs in five stages: 2. Test Construction
Types of scales (3)
There is no best type of scale. Test developers scale a test in the manner they believe is optimally suited to their conception of the measurement of the trait (or whatever) that is being measured.
- Age-based Scale - performance as a function of age
- Grade-based Scale - performance as a function of grade
- Stanine Scale - raw scores are transformed into scores ranging from 1 to 9
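As a rough illustration of the stanine transformation, the sketch below maps a percentile rank to a stanine. It assumes the conventional stanine bands (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores), which are not stated in these notes; the function name is hypothetical, and the handling of scores that fall exactly on a cutoff is a convention chosen here.

```python
from bisect import bisect_right

# Cumulative percentile cutoffs separating stanines 1..9.
# Assumption: the conventional bands of 4, 7, 12, 17, 20, 17, 12, 7, 4 percent.
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank: float) -> int:
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    # Count how many cutoffs the rank has reached, then shift to the 1..9 range.
    return bisect_right(STANINE_CUTOFFS, percentile_rank) + 1
```

For example, a percentile rank of 50 falls in the middle band and maps to stanine 5.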
The process of developing a test occurs in five stages: 2. Test Construction
Scaling methods (7) Ordinal
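- Rating Scale - a grouping of words, statements, or symbols on which the testtaker indicates a judgment
- Summative Scale - the final score is obtained by summing the ratings across all items
- Likert Scale - usually 5 or 7 response options ranging from agree to disagree
- Paired Comparison - choose one stimulus of each pair according to a rule
- Comparative Scaling - a sorting method; each stimulus is judged in comparison with every other stimulus on the scale
- Categorical Scale - stimuli are placed into one of two or more categories
- Guttman Scale or Scalogram Analysis - items range from weaker to stronger expressions of what is being measured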
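A summative (Likert-type) score can be sketched as a simple sum of item ratings. The reverse-keying step below is an assumption added for illustration and is not mentioned in these notes; the function and parameter names are hypothetical.

```python
def likert_total(responses, reverse_keyed=(), points=5):
    """Sum Likert ratings (1..points), flipping reverse-keyed items first.

    responses: list of integer ratings, one per item.
    reverse_keyed: zero-based positions of items worded in the opposite
        direction (an assumption for this sketch).
    """
    total = 0
    for index, value in enumerate(responses):
        if not 1 <= value <= points:
            raise ValueError(f"response {value} is outside 1..{points}")
        # A reverse-keyed rating becomes (points + 1 - value),
        # so a 5 counts as a 1 on a 5-point scale.
        total += (points + 1 - value) if index in reverse_keyed else value
    return total
```

With ratings of 5, 4, and 2, where the third item is reverse keyed, the total is 5 + 4 + 4 = 13.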
Guttman scales are developed through the administration of a number of items to a target group. The resulting data are then analyzed by means of _____ analysis, an item-analysis procedure and approach to test development that involves a graphic mapping of a testtaker’s responses.
scalogram
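One way to picture a Guttman-type response pattern: when items are ordered from the weakest to the strongest statement, a testtaker who endorses a stronger item is expected to have endorsed every weaker one. The sketch below only checks a single response pattern for that property. It is not a full scalogram analysis, and the function name is hypothetical.

```python
def is_guttman_pattern(responses):
    """Check one testtaker's pattern against the ideal Guttman ordering.

    responses: endorsements (1/True) or non-endorsements (0/False) for
        items already ordered from weakest to strongest statement.
    Returns True when no endorsement follows a non-endorsement.
    """
    seen_rejection = False
    for endorsed in responses:
        if not endorsed:
            seen_rejection = True
        elif seen_rejection:
            # A stronger item was endorsed after a weaker one was rejected.
            return False
    return True
```

A pattern such as 1, 1, 0, 0 fits the ideal ordering, whereas 1, 0, 1, 0 does not.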
The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items
An _____ is the reservoir or well from which items will or will not be drawn for the final version of the test.
item pool
The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items
Item format (2)
1. Selected-Response Format - select
2. Constructed-Response Format - supply or create
The process of developing a test occurs in five stages: 2. Test Construction > 2. Writing Items > 1. selected-response format
Multiple-Choice Format (3)
Stem, correct alternative (option), and distractors (foils)
The process of developing a test occurs in five stages: 2. Test Construction > 2. Writing Items
In a _____, the testtaker is presented with two columns: premises on the left and responses on the right. The testtaker’s task is to determine which response is best associated with which premise.
matching item
The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items
A multiple-choice item that contains only two possible responses is called a _____ item. Perhaps the most familiar _____ item is the true–false item. As you know, this type of selected-response item usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact. Other varieties of _____ items include sentences to which the testtaker responds with one of two responses, such as agree or disagree, yes or no, right or wrong, or fact or opinion
binary-choice
The process of developing a test occurs in five stages: 2. Test Construction > 2. Writing Items
- Constructed-Response Format (3)
- Completion item - the testtaker supplies a word or phrase to fill in the blank
- Short-answer item
- Essay item - a composition demonstrating recall of facts, understanding, analysis, and/or interpretation
The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items
Writing items for computer administration (5)
- Item Bank - a collection of test items (questions)
- Computerized Adaptive Testing (CAT) - the next item presented depends on the testtaker’s responses to previous items
a. Floor effect
b. Ceiling effect
c. Item branching
The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items
Writing items for computer administration (5)
_____ tends to reduce floor effects and ceiling effects
CAT
The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items
Writing items for computer administration (5)
A _____ refers to the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured. On a ninth-grade mathematics test, for example, testtakers who have not yet achieved such ability might fail all of the items; because of the _____, the test would not provide any guidance as to the relative mathematical ability of testtakers in this group.
Example: a test whose items are too difficult for those taking it would show a _____ because most people would obtain or be close to the lowest possible score of 0.
floor effect
The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items
Writing items for computer administration (5)
As you might expect, a _____ refers to the diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, or other attribute being measured. Returning to our example of the ninth-grade mathematics test, what would happen if all of the testtakers answered all of the items correctly? It is likely that the test user would conclude that the test was too easy for this group of testtakers and so discrimination was impaired by a _____.
Ex: a test whose items are too easy for those taking it would show a _____ because most people would achieve or be close to the highest possible score.
ceiling effect
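A quick screen for floor and ceiling effects is to look at how many testtakers sit at the lowest and highest possible scores. This is a minimal sketch with a hypothetical function name; how large a share counts as a problem is a judgment call that these notes do not specify.

```python
def floor_ceiling_proportions(scores, min_score, max_score):
    """Return the share of testtakers at the minimum and at the maximum score."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score) / n
    at_ceiling = sum(1 for s in scores if s == max_score) / n
    return at_floor, at_ceiling
```

If most of a group lands at the minimum, the test is probably too hard to separate them (floor effect); if most land at the maximum, it is probably too easy (ceiling effect).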
The process of developing a test occurs in five stages: 2. Test Construction: 2. Writing Items
Writing items for computer administration (5)
The ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items is referred to as _____. A computer that has stored a bank of achievement test items of different difficulty levels can be programmed to present items according to an algorithm or rule.
tailor fit
item branching
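Item branching can be sketched with a very simple rule: present a harder item after a correct answer and an easier one after an incorrect answer. The rule, the five difficulty levels, and the function names below are assumptions for illustration only; operational CAT systems generally rely on more elaborate selection rules.

```python
def next_difficulty(current, was_correct, lowest=1, highest=5):
    """Step up one difficulty level after a correct answer, down after an error."""
    step = 1 if was_correct else -1
    # Clamp so the level never leaves the range covered by the item bank.
    return max(lowest, min(highest, current + step))


def presented_levels(answers, start=3, lowest=1, highest=5):
    """Return the difficulty level presented for each item in a session.

    answers: sequence of booleans, True when that item was answered correctly.
    """
    levels = []
    level = start
    for was_correct in answers:
        levels.append(level)
        level = next_difficulty(level, was_correct, lowest, highest)
    return levels
```

Starting at level 3, three correct answers followed by an error would present items at levels 3, 4, 5, and 5, since the level cannot rise above the hardest items in the bank.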
The process of developing a test occurs in five stages: 2. Test Construction: Scoring Items (3)
Many different test scoring models have been devised. Perhaps the most commonly used model—owing, in part, to its simplicity and logic—is the _____. Typically, the rule in a _____ scored test is that the higher the score on the test, the higher the testtaker is on the ability, trait, or other characteristic that the test purports to measure.
Ex. A student took three classes and earned the grades “A,” “A,” and “B.” Those grades correspond to 4, 4, and 3 on the numeric scale. Adding up all the numeric grades gives 4 + 4 + 3 = 11.
cumulative model
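The grade example can be written out as a short cumulative-scoring sketch. Only the values for A (4) and B (3) come from the example; the values for C, D, and F are assumed here to complete the table, and the names are hypothetical.

```python
# A = 4 and B = 3 come from the example; C, D, and F are assumed values.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def cumulative_score(grades):
    """Add up the numeric value of every grade; a higher total means a higher standing."""
    return sum(GRADE_POINTS[grade] for grade in grades)
```

Calling cumulative_score(["A", "A", "B"]) reproduces the sum of 11 from the example.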
The process of developing a test occurs in five stages: 2. Test Construction: Scoring Items (3)
In tests that employ _____, testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way. This approach is used by some diagnostic systems wherein individuals must exhibit a certain number of symptoms to qualify for a specific diagnosis.
class or category scoring
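Class or category scoring can be sketched as counting how many of a category's criteria are present and comparing that count with a required minimum. The function name, the symptom labels in the usage note, and the threshold are hypothetical and do not come from any actual diagnostic system.

```python
def qualifies_for_category(observed, criteria, required_count):
    """Return True when enough of the category's criteria are present.

    observed: symptoms or responses recorded for one individual.
    criteria: the full list of criteria that define the category.
    required_count: minimum number of criteria that must be met.
    """
    # Only observations that appear in the criteria list earn credit.
    matched = set(observed) & set(criteria)
    return len(matched) >= required_count
```

For instance, with four made-up criteria and a threshold of three, an individual showing only two of them would not be placed in the category.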