Chapter 8: Test Development Flashcards
Stages in the Process of Developing a Test
Test Conceptualization, Test Construction, Test Tryout, Item Analysis, Test Revision (in that order)
Test Construction
Drafting of items for the test
Test Tryout
The first draft of the test is tried out on a group of sample testtakers
Item Analysis
The stage in which statistical procedures are employed to assist in judging which items are good as they are, which items need to be revised, and which items should be discarded
Analyses of the Test’s Items Include
Analyses of item reliability
Item Validity
Item Discrimination
Test Conceptualization
Often begins with the thought "There ought to be a test designed to measure (____) in (____) way"; the stimulus for that thought could be almost anything; includes a review of the related literature on existing tests
Preliminary Questions to Ask During Test Conceptualization
What is the test designed to measure?
What is the objective of the test?
Is there a need for this test?
What is the test designed to measure?
Closely linked to how the test developer defines the construct being measured and how that definition is the same as or different from other tests purporting to measure the same construct
What is the objective of the test?
In service of what goal will the test be employed? In what way or ways is the objective of this test the same as or different from other tests with similar goals? What real-world behaviors would be anticipated to correlate with testtaker responses?
Is there a need for this test?
Are there any other tests purporting to measure the same thing? In what ways will the new test be better than or different from existing ones? Will there be more compelling evidence for its reliability or validity? Will it be more comprehensive? Will it take less time to administer? In what ways would this test not be better than existing tests?
Preliminary Questions to be Addressed
Who will use this test?
Who will take this test?
What content will the test cover?
How will the test be administered?
What is the ideal format of the test?
Should more than one form of the test be developed?
What special training will be required of test users for administering and interpreting the test?
What types of responses will be required of testtakers?
Who benefits from an administration of this test?
Is there any potential for harm as a result of an administration of this test?
How will meaning be attributed to scores on this test?
Who will use this test?
Clinicians? Educators? Others? For what purpose or purposes would this test be used?
Who will take this test?
Who is this test for? Who needs to take it? Who would find it desirable to take it? For what age range of testtakers is the test designed? What reading level is required of a testtaker? What cultural factors might affect testtaker responses?
What content will the test cover?
Why should it cover this content? Is this coverage different from the content coverage of existing tests with the same or similar objectives? How and why is the content area different? To what extent is this content culture-specific?
How will the test be administered?
Individually or in groups? Is it amenable to both group and individual administration? What differences will exist between individual and group administrations of this test? Will the test be designed for or amenable to computer administration? How might differences between versions of the test be reflected in test scores?
What is the ideal format of the test?
Should it be true-false, essay, multiple-choice, or in some other format? Why is the format selected for this test the best format?
Should more than one form of the test be developed?
On the basis of a cost-benefit analysis, should alternate or parallel forms of this test be created?
What special training will be required of test users for administering and interpreting the test?
What background and qualifications will a prospective user of data derived from an administration of this test need to have? What restrictions, if any, should be placed on distributors of the test and on the test’s usage?
What types of responses will be required of testtakers?
What kind of disability might preclude someone from being able to take this test? What adaptations or accommodations are recommended for persons with disabilities?
Who benefits from an administration of this test?
What would the testtaker learn, or how might the testtaker benefit, from an administration of this test? What would the test user learn, or how might the test user benefit? What social benefit, if any, derives from an administration of this test?
Is there any potential for harm as a result of an administration of this test?
What safeguards are built into the recommended testing procedure to prevent any sort of harm to any of the parties involved in the use of this test?
How will meaning be attributed to scores on this test?
Will a testtaker’s score be compared to others taking the test at the same time? To others in a criterion group? Will the test evaluate mastery of a particular content area?
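The questions above contrast two ways of giving a score meaning: comparing it with other people (norm-referenced) or with a fixed standard of mastery (criterion-referenced). The sketch below shows both for one raw score. It assumes a percentile rank defined as the percentage of the norm group scoring below the score (some definitions also count half of the tied scores), and the cut score of 26 is a hypothetical value chosen only for illustration.

```python
# Hypothetical helpers illustrating two ways to attribute meaning to a score.

def percentile_rank(score, norm_scores):
    """Norm-referenced: percentage of the norm group scoring below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)

def meets_criterion(score, cut_score):
    """Criterion-referenced: does the score reach a fixed mastery standard?"""
    return score >= cut_score

norm_group = [12, 15, 18, 20, 22, 25, 27, 30]  # toy norm-group raw scores
print(percentile_rank(24, norm_group))         # 62.5
print(meets_criterion(24, cut_score=26))       # False (26 is a hypothetical cut)
```

The same raw score of 24 looks fairly strong relative to this toy norm group yet falls short of the hypothetical mastery standard, which is why the intended interpretation has to be settled during test conceptualization.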
Good item on a Norm-referenced Test
An item that high scorers on the test tend to answer correctly and that low scorers on the test tend to answer incorrectly
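The definition above is what the item-discrimination index d quantifies. The sketch below assumes 0/1 item scores, ranks testtakers by total score, and compares equal-sized upper and lower groups (27% is a commonly cited proportion; here it is a parameter), computing d = (U - L) / n, where U and L are the numbers of correct answers in the upper and lower groups and n is the size of each group. The function name is hypothetical.

```python
# Hypothetical helper; assumes 0/1 item scores and equal-sized extreme groups.
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """d = (U - L) / n for one item, using the top and bottom `fraction` of testtakers."""
    if len(item_correct) != len(total_scores):
        raise ValueError("need one item response per testtaker")
    if len(total_scores) < 2:
        raise ValueError("need at least two testtakers")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the groups do not overlap")
    # Rank testtakers from highest to lowest total test score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    n = max(1, int(len(order) * fraction))     # size of each extreme group
    upper, lower = order[:n], order[-n:]
    u = sum(item_correct[i] for i in upper)    # correct answers among high scorers
    low = sum(item_correct[i] for i in lower)  # correct answers among low scorers
    return (u - low) / n

totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
good_item = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
reversed_item = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(discrimination_index(good_item, totals))      # 1.0
print(discrimination_index(reversed_item, totals))  # -1.0
```

A d near +1 describes a good item in the sense of this card; a d near 0 or below 0 (low scorers doing as well as or better than high scorers) flags the item for revision or deletion during item analysis.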
Good item on a Criterion-Oriented Test
High scorers on the test get a particular item right whereas low scorers on the test get that same item wrong; each item should address the issue of whether the testtaker has met certain criteria
Pilot Work/Pilot Study/Pilot Research
Refers to the preliminary research surrounding the creation of a prototype of the test. Test items may be piloted to evaluate whether they should be included in the final form of the instrument. Pilot work may involve open-ended interviews with research subjects believed for some reason (perhaps on the basis of an existing test) to be high or low on the trait of interest. In pilot work, the developer attempts to determine how best to measure a targeted construct
Pilot Work Process
Entails the creation, revision, and deletion of many test items; literature reviews; experimentation; and related activities
Scaling
Assignment of numbers according to rules; defined as the process of setting rules for assigning numbers in measurement; the process by which a measuring device is designed and calibrated and by which numbers (or other indices), called scale values, are assigned to different amounts of the trait, attribute, or characteristic being measured
Age-Based Scale
If the testtaker’s test performance as a function of age is of critical interest
Grade-Based Scale
If the testtaker’s test performance as a function of grade is of critical interest
Stanine Scale
If all raw scores on the test are to be transformed into scores that can range from 1 to 9
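A stanine ("standard nine") transformation maps every raw score onto the whole numbers 1 through 9. The sketch below assumes the z-score method: stanines have a mean of 5 and a standard deviation of 2, each band is half a standard deviation wide, and the two outer bands are open-ended. Another common method assigns stanines from percentile bands of 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent. The function name is hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical helper; assumes the z-score method (stanine mean 5, SD 2).
def to_stanines(raw_scores):
    """Transform raw scores into stanines that range from 1 to 9."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:                       # no variability: everyone lands in the middle band
        return [5] * len(raw_scores)
    result = []
    for x in raw_scores:
        z = (x - m) / sd
        stanine = math.floor(2 * z + 5.5)       # band 5 covers -0.25 <= z < 0.25
        result.append(min(9, max(1, stanine)))  # clip the open-ended outer bands
    return result

print(to_stanines([10, 20, 30, 40, 50]))  # [2, 4, 5, 6, 8]
```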
Categorization of a Test Scale
Unidimensional vs. Multidimensional
Comparative vs. Categorical