Test Development Flashcards
PROCESS OF TEST DEVELOPMENT
● TEST CONCEPTUALIZATION
● TEST CONSTRUCTION
● TEST TRYOUT
● ITEM ANALYSIS
● TEST REVISION
the preliminary research surrounding the creation of a prototype of the test.
PILOT WORK
the process by which a measuring device is designed and calibrated and by which numbers (scale values) are assigned to different amounts of the trait, attribute, or characteristic being measured.
SCALING
a grouping of words, statements, or symbols on which judgements of the strength of a particular trait, attitude, or emotion are indicated by the testtaker
Rating scale
a type of rating scale wherein the final test score is obtained by summing the ratings across all items (e.g., Likert scale)
SUMMATIVE SCALE
Developed by Rensis Likert, a type of summative rating scale in which each item presents the testtaker with 5 alternative responses (sometimes 7), usually on an agree-disagree or approve-disapprove continuum.
LIKERT SCALE
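The summing described on the summative and Likert scale cards can be sketched as follows. This is a minimal illustration, not a standard scoring routine: the function name `summative_score`, the 1-5 default range, and the optional reverse-keying of negatively worded items are assumptions added for the example.

```python
def summative_score(ratings, scale_min=1, scale_max=5, reverse_keyed=()):
    """Sum Likert ratings across all items.

    reverse_keyed holds the positions of items whose rating should be
    flipped before summing (an assumption; the cards do not mention it).
    """
    total = 0
    for index, rating in enumerate(ratings):
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside {scale_min}-{scale_max}")
        if index in reverse_keyed:
            rating = scale_min + scale_max - rating  # e.g. 2 becomes 4 on a 1-5 scale
        total += rating
    return total


# Hypothetical responses to a four-item, 5-point agree-disagree scale.
print(summative_score([4, 5, 2, 3]))                     # 14
print(summative_score([4, 5, 2, 3], reverse_keyed={2}))  # 16
```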
What is this scale?
Select the behavior that you think best describes you:
a. I enjoy spending time with others
b. I enjoy spending time alone
METHOD OF PAIRED COMPARISON
a type of comparative scaling wherein respondents are presented with several items simultaneously and asked to rank them in order of priority
RANK ORDER SCALE
What is this type of scale?
CHARACTERISTICS
( ) friendly
( ) jolly
( ) reserved
( ) withdrawn
( ) shy
( ) cheerful
( ) uneasy
( ) hospitable
( ) talkative
( ) different
CHECKLIST
stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum
● sequence of numbers that identifies items as belonging to mutually exclusive categories.
CATEGORICAL SCALING
combination of the checklist format and the category format; the subject is given statements and asked to sort them into 9 piles
● statements that are least descriptive of the person are placed on Pile 1 while those that are most descriptive are placed on Pile 9
Q SORT
items on this scale range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.
GUTTMAN SCALE
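Because Guttman items are ordered from weaker to stronger, a testtaker who endorses a stronger item is expected to have endorsed every weaker one as well. The sketch below checks a single response pattern for that cumulative property. The function name `is_cumulative` and the list-of-booleans input are assumptions made for the example.

```python
def is_cumulative(endorsements):
    """Return True if the pattern fits a Guttman (cumulative) ordering.

    endorsements lists True/False per item, ordered from the weakest to
    the strongest expression of the attitude. The pattern fails as soon
    as an endorsed item follows a weaker item that was not endorsed.
    """
    seen_rejection = False
    for endorsed in endorsements:
        if not endorsed:
            seen_rejection = True
        elif seen_rejection:
            return False
    return True


print(is_cumulative([True, True, False, False]))  # True
print(is_cumulative([True, False, True, False]))  # False
```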
reservoir from which items will or will not be drawn for the final version of the test
ITEM POOL
refers to the form, plan, structure, arrangement and layout of individual test items.
ITEM FORMAT
TYPES OF ITEM FORMAT
SELECTED-RESPONSE FORMAT
CONSTRUCTED-RESPONSE FORMAT
require testtakers to select a response from a set of alternative responses
SELECTED-RESPONSE FORMAT
CONSTRUCTED-RESPONSE FORMAT
require testtakers to supply or to create the correct answer, not merely to select it
presented with two columns: premises (left) and responses (right)
● task is to determine which response is best associated with which premise
MATCHING ITEM
a multiple choice item that contains only two possible responses
BINARY CHOICE ITEM
usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact
True-False item
requires the examinee to provide a word or phrase that completes a sentence
COMPLETION ITEM
A test item wherein the testtaker responds to the question by writing a composition that demonstrates recall of facts, understanding, analysis, and/or interpretation
ESSAY ITEM
should be written clearly enough that the testtaker can respond with a short answer
SHORT ANSWER ITEM
the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items
ITEM BRANCHING
also referred to as category scoring, wherein testtaker responses earn credit toward placement in a particular category with other testtakers whose pattern of responses is presumably similar in some way
CLASS SCORING
a descriptor used in psychology to indicate a specific type of measure in which respondents compare two or more desirable options and pick the one that is most preferred (sometimes called “forced choice” scale)
IPSATIVE SCORING
Comparing a testtaker’s score on one scale within a test to another scale within the same test
IPSATIVE SCORING
having created a pool of items from which the final version of the test will be developed, the test developer will try out the test.
TEST TRYOUT
It serves as a prototype of the test.
TEST TRYOUT
should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered; all instructions, and everything from the time limits allotted for completing the test to the atmosphere at the test site, should be as similar as possible.
TEST TRYOUT
a set of methods used to evaluate test items in order to come up with a cluster of valid and reliable test items
ITEM ANALYSIS
METHODS IN ITEM ANALYSIS
ITEM-DIFFICULTY INDEX
ITEM-DISCRIMINATION INDEX
ITEM-RELIABILITY INDEX
ITEM-VALIDITY INDEX
an index of an item’s difficulty is obtained by calculating the proportion of the total number of testtakers who answered the item correctly
ITEM DIFFICULTY INDEX
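The proportion described on this card can be computed directly. A minimal sketch, assuming each testtaker's response to the item has already been scored as correct (True) or incorrect (False); the function name `item_difficulty` is hypothetical.

```python
def item_difficulty(responses):
    """p = number of testtakers answering correctly / total number of testtakers."""
    if not responses:
        raise ValueError("at least one response is required")
    correct = sum(1 for answered_correctly in responses if answered_correctly)
    return correct / len(responses)


# Hypothetical item: 30 of 50 testtakers answered correctly.
responses = [True] * 30 + [False] * 20
print(item_difficulty(responses))  # 0.6
```

Note that a higher p means an easier item, since more testtakers answered it correctly.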
An item that might be inserted near the beginning of an achievement test to spur motivation and a positive test-taking attitude and to lessen test-related anxiety.
GIVEAWAY ITEM
for maximum discrimination among testtakers, the optimal average item difficulty is approximately 0.5, with individual items ranging in difficulty from about 0.3 to 0.8
ITEM-DIFFICULTY INDEX (INTERPRETATION)
0.86 and above
VERY EASY
ITEM-DIFFICULTY INDEX (INTERPRETATION)
0.71 - 0.85
EASY
ITEM-DIFFICULTY INDEX (INTERPRETATION)
0.40 - 0.70
DESIRABLE ITEM
ITEM-DIFFICULTY INDEX (INTERPRETATION)
0.15 - 0.39
DIFFICULT ITEM
ITEM-DIFFICULTY INDEX (INTERPRETATION)
0.14 and below
VERY DIFFICULT
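The five interpretation bands above can be collected into one lookup. A sketch only: the function name `interpret_difficulty` is hypothetical, the labels are shortened (DESIRABLE for DESIRABLE ITEM, DIFFICULT for DIFFICULT ITEM), and rounding p to two decimals before comparing is an assumption made because the bands are stated to two decimals.

```python
def interpret_difficulty(p):
    """Map an item-difficulty index p (0.0-1.0) to its interpretation band."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    p = round(p, 2)  # assumption: bands are defined at two-decimal precision
    if p >= 0.86:
        return "VERY EASY"
    if p >= 0.71:
        return "EASY"
    if p >= 0.40:
        return "DESIRABLE"
    if p >= 0.15:
        return "DIFFICULT"
    return "VERY DIFFICULT"


print(interpret_difficulty(0.60))  # DESIRABLE
print(interpret_difficulty(0.10))  # VERY DIFFICULT
```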
indicates how adequately an item separates or discriminates high scorers and low scorers on an entire test.
ITEM DISCRIMINATION INDEX
symbolized by a lowercase italic “d” (d)
ITEM DISCRIMINATION INDEX
it compares people who have done well with those who have done poorly on a test.
● difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly
● Upper group = High Scorers
● Lower group = Low Scorers
EXTREME GROUP METHOD
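In the extreme group method, d is the proportion of the upper group answering the item correctly minus the proportion of the lower group answering it correctly. The sketch below assumes the two groups have already been formed from total test scores (how they are cut is left to the caller) and that each response is scored True or False; the function name `discrimination_index` is hypothetical.

```python
def discrimination_index(upper_group, lower_group):
    """d = p(correct | upper group) - p(correct | lower group).

    Each argument lists True/False item responses for one group.
    """
    if not upper_group or not lower_group:
        raise ValueError("both groups need at least one testtaker")
    p_upper = sum(1 for r in upper_group if r) / len(upper_group)
    p_lower = sum(1 for r in lower_group if r) / len(lower_group)
    return p_upper - p_lower


# Hypothetical item: 20 of 25 high scorers and 10 of 25 low scorers answered correctly.
upper = [True] * 20 + [False] * 5
lower = [True] * 10 + [False] * 15
print(round(discrimination_index(upper, lower), 2))  # 0.4
```

A negative d means more low scorers than high scorers answered the item correctly, which usually signals a flawed item.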
ITEM DISCRIMINATION INDEX (INTERPRETATION)
0.40 and above
VERY GOOD ITEM
ITEM DISCRIMINATION INDEX (INTERPRETATION)
0.30-0.39
GOOD ITEM
ITEM DISCRIMINATION INDEX (INTERPRETATION)
0.20-0.29
MARGINAL
ITEM DISCRIMINATION INDEX (INTERPRETATION)
0.10-0.19
POOR ITEM
ITEM DISCRIMINATION INDEX (INTERPRETATION)
0 and below
Discard
p = Very Easy/ Easy/Very Difficult
d = Discarded/Poor/Marginal
REJECT
p = Easy
d = Good/ Very Good
REVISE
p = Desirable
d = Discarded/ Poor
REJECT
p = Desirable
d = Marginal
REVISE
p = Desirable
d = Good/ Very Good
ACCEPT
p = Difficult
d = Marginal
REVISE
p = Difficult
d = Good/ Very Good
ACCEPT
p = Very Difficult
d = Good/ Very Good
REVISE
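The accept/revise/reject cards above combine the two interpretation labels into one decision, which can be sketched as a lookup. The function name `item_decision` is hypothetical, and "DISCARD" stands in for both "Discard" and "Discarded". Combinations the cards do not cover (for example, a very easy item with good discrimination) return None instead of a guessed decision.

```python
GOOD = {"GOOD", "VERY GOOD"}
WEAK = {"DISCARD", "POOR", "MARGINAL"}


def item_decision(p_label, d_label):
    """Return ACCEPT, REVISE, or REJECT from the difficulty and discrimination labels."""
    p_label = p_label.upper()
    d_label = d_label.upper()
    if p_label in {"VERY EASY", "EASY", "VERY DIFFICULT"} and d_label in WEAK:
        return "REJECT"
    if p_label == "EASY" and d_label in GOOD:
        return "REVISE"
    if p_label == "DESIRABLE":
        if d_label in {"DISCARD", "POOR"}:
            return "REJECT"
        if d_label == "MARGINAL":
            return "REVISE"
        if d_label in GOOD:
            return "ACCEPT"
    if p_label == "DIFFICULT":
        if d_label == "MARGINAL":
            return "REVISE"
        if d_label in GOOD:
            return "ACCEPT"
    if p_label == "VERY DIFFICULT" and d_label in GOOD:
        return "REVISE"
    return None  # combination not covered by the cards


print(item_decision("Desirable", "Very Good"))  # ACCEPT
print(item_decision("Easy", "Poor"))            # REJECT
```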
assessing the quality of each alternative within a multiple choice item by comparing the performance of upper and lower scorers
ANALYSIS OF ITEM ALTERNATIVES
by charting the number of testtakers in the U and L groups who chose each alternative, the test developer can get an idea of the effectiveness of a distractor by means of a simple _________
eyeball test
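The chart used for the eyeball test is a count, per alternative, of how many testtakers in the upper (U) and lower (L) groups chose it. A minimal sketch with made-up data; the function name `chart_alternatives` and the sample item keyed "b" are assumptions for illustration.

```python
from collections import Counter


def chart_alternatives(upper_choices, lower_choices, alternatives):
    """Return {alternative: (count in U group, count in L group)}."""
    upper = Counter(upper_choices)
    lower = Counter(lower_choices)
    return {alt: (upper[alt], lower[alt]) for alt in alternatives}


# Hypothetical multiple-choice item keyed "b", ten testtakers per group.
upper_choices = list("bbbbbbbacd")
lower_choices = list("bbbaaccddd")
for alt, (u, l) in chart_alternatives(upper_choices, lower_choices, "abcd").items():
    print(f"{alt}: U={u} L={l}")
# a: U=1 L=2
# b: U=7 L=3
# c: U=1 L=2
# d: U=1 L=3
```

In this made-up chart the keyed answer draws more of the U group, and each distractor draws more of the L group, which is the pattern one would usually hope to see.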
a graphic representation of item difficulty and discrimination
ITEM CHARACTERISTIC CURVE (ICC)
techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures
● Various nonstatistical procedures designed to explore how individual test items work
QUALITATIVE ITEM ANALYSIS
a qualitative research tool designed to shed light on the testtaker’s thought processes during the administration of a test
THINK ALOUD ADMINISTRATION
on a one-on-one basis, the examinee will be asked to take a test, thinking aloud as they respond to each item
THINK ALOUD ADMINISTRATION
study of the test items, typically conducted during the test development process
● Items are examined for fairness to all prospective testtakers and for the presence of offensive language, stereotypes, or situations.
SENSITIVITY REVIEW
Once a test has been made available, it subsequently undergoes refinement.
TEST REVISION
One type of revision is the development of ________ of the original test
short forms
For a test that has low internal consistency, you can try ________
Factor Analysis
helps increase the reliability of a multivariate/heterogeneous test by identifying the underlying factors
FACTOR ANALYSIS
____________ requires weighing each item’s content validity, item difficulty and discrimination, inter-item correlation, and bias
Choosing the final items