Test Development Flashcards by Aerish Sison

PROCESS OF TEST DEVELOPMENT

● TEST CONCEPTUALIZATION
● TEST CONSTRUCTION
● TEST TRYOUT
● ITEM ANALYSIS
● TEST REVISION

the preliminary research surrounding the creation of a prototype of the test.

PILOT WORK

the process by which a measuring device is designed and calibrated and by which numbers (scale values) are assigned to different amounts of the trait, attribute, or characteristic being measured.

SCALING

a grouping of words, statements, or symbols on which judgements of the strength of a particular trait, attitude, or emotion are indicated by the testtaker

Rating scale

a type of rating scale wherein the final test score is obtained by summing the ratings across all items (e.g., Likert scale)

SUMMATIVE SCALE

Developed by Rensis Likert; a type of summative rating scale in which each item presents the testtaker with five alternative responses (sometimes seven), usually on an agree-disagree or approve-disapprove continuum.

LIKERT SCALE
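
Summative scoring of a Likert-type scale can be sketched in a few lines. This is a minimal illustration. It assumes ratings coded 1 to 5 (1 = strongly disagree, 5 = strongly agree). It also assumes a hypothetical set of reverse-keyed items, meaning statements worded in the opposite direction. Reverse keying is a common practice but is not stated on the card. The function name likert_total is illustrative only.

```python
def likert_total(responses, reverse_keyed=(), points=5):
    """Summative scoring: add the item ratings into one total score.

    responses: integer ratings (1..points), one per item, in item order.
    reverse_keyed: positions of items worded in the opposite direction
        (hypothetical; which items are reversed depends on the test).
    points: number of response alternatives (5 here; some scales use 7).
    """
    total = 0
    for i, rating in enumerate(responses):
        if not 1 <= rating <= points:
            raise ValueError(f"item {i}: rating {rating} is outside 1..{points}")
        # Flip reverse-keyed items so a higher number always means
        # more of the attitude being measured.
        total += (points + 1 - rating) if i in reverse_keyed else rating
    return total


# Four items, the third (position 2) reverse-keyed: 5 + 4 + (6 - 2) + 3 = 16
print(likert_total([5, 4, 2, 3], reverse_keyed={2}))  # 16
```

Without reverse-keyed items, the total is simply the sum of the ratings, which is what makes the scale "summative."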

What is this scale?

Select the behavior that you think best describes you:
a. I enjoy spending time with others
b. I enjoy spending time alone

METHOD OF PAIRED COMPARISON
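
The card shows a single pair. With more stimuli, each one is paired with every other one, giving n(n - 1)/2 pairs, and the respondent picks one member of each pair. The sketch below tallies how often each stimulus is preferred. The stimuli are hypothetical. The respondent's judgment is simulated by a function, which is an assumption made only for illustration.

```python
from itertools import combinations


def paired_comparison_scores(stimuli, choose):
    """Present every pair of stimuli once and count how often each is chosen.

    choose: a function (a, b) -> the preferred member of the pair;
            it stands in for the respondent's judgment.
    """
    wins = {s: 0 for s in stimuli}
    for a, b in combinations(stimuli, 2):
        winner = choose(a, b)
        if winner not in (a, b):
            raise ValueError(f"choice {winner!r} is not in pair ({a!r}, {b!r})")
        wins[winner] += 1
    return wins


# Hypothetical respondent who always prefers whichever option comes
# earlier in their personal preference order.
preference = ["alone", "small group", "large party"]
scores = paired_comparison_scores(
    preference, lambda a, b: min(a, b, key=preference.index)
)
print(scores)  # {'alone': 2, 'small group': 1, 'large party': 0}
```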

a type of comparative scaling wherein respondents are presented with several items simultaneously and asked to rank them in order of priority

RANK ORDER SCALE

What type of scale is this?

CHARACTERISTICS

( ) friendly
( ) jolly
( ) reserved
( ) withdrawn
( ) shy
( ) cheerful
( ) uneasy
( ) hospitable
( ) talkative
( ) different

CHECKLIST

stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum
● sequence of numbers that identifies items as belonging to mutually exclusive categories.

CATEGORICAL SCALING

combination of the checklist format and the category format; the subject is given statements and asked to sort them into 9 piles
● statements that are least descriptive of the person are placed on Pile 1 while those that are most descriptive are placed on Pile 9

Q SORT

items on this scale range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.

GUTTMAN SCALE
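
Because Guttman items are ordered from weaker to stronger, a perfectly cumulative response pattern has a specific form. Anyone who endorses a stronger item should also endorse every weaker one. The hypothetical check below tests a single respondent's pattern against that rule. It assumes the items are already listed from weakest to strongest.

```python
def is_guttman_consistent(endorsed):
    """Check one response pattern against the Guttman (cumulative) model.

    endorsed: booleans for items ordered from weakest to strongest
    expression of the attitude. A perfect pattern is a run of True
    values followed only by False values.
    """
    seen_false = False
    for agrees in endorsed:
        if agrees and seen_false:
            return False  # a stronger item endorsed after a weaker one was rejected
        if not agrees:
            seen_false = True
    return True


print(is_guttman_consistent([True, True, True, False]))   # True
print(is_guttman_consistent([True, False, True, False]))  # False
```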

reservoir from which items will or will not be drawn for the final version of the test

ITEM POOL

refers to the form, plan, structure, arrangement and layout of individual test items.

ITEM FORMAT

TYPES OF ITEM FORMAT

SELECTED-RESPONSE FORMAT

CONSTRUCTED-RESPONSE FORMAT

require testtakers to select a response from a set of alternative responses

SELECTED-RESPONSE FORMAT

require testtakers to supply or to create the correct answer, not merely to select it

CONSTRUCTED-RESPONSE FORMAT

presented with two columns: premises (left) and responses (right)
● task is to determine which response is best associated with which premise

MATCHING ITEM

a multiple choice item that contains only two possible responses

BINARY CHOICE ITEM

usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact

True-False item

requires the examinee to provide a word or phrase that completes a sentence

COMPLETION ITEM

A test item wherein the testtaker responds to the question by writing a composition that demonstrates recall of facts, understanding, analysis, and/or interpretation

ESSAY ITEM

should be written clearly enough that the testtaker can respond with a short answer

SHORT ANSWER ITEM

the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items

ITEM BRANCHING
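
Item branching can be illustrated with a deliberately simple rule. After a correct response the program moves to a harder item, and after an incorrect one to an easier item. The sketch assumes a hypothetical item bank already ordered from easiest to hardest and a function standing in for the testtaker. This up/down rule is a simplification chosen for illustration. Operational computerized adaptive tests typically select items with more elaborate, model-based methods.

```python
def administer_with_branching(bank, answers_correctly, start, n_items):
    """Present up to n_items from a difficulty-ordered bank, branching on each response.

    bank: item IDs ordered from easiest (index 0) to hardest.
    answers_correctly: function item_id -> bool, standing in for the testtaker.
    start: index of the first item presented.
    Hypothetical rule: correct -> nearest unused harder item,
    incorrect -> nearest unused easier item.
    """
    used = set()
    order = []
    pos = start
    for _ in range(n_items):
        used.add(pos)
        order.append(bank[pos])
        step = 1 if answers_correctly(bank[pos]) else -1
        # Skip items already presented in the chosen direction.
        nxt = pos + step
        while 0 <= nxt < len(bank) and nxt in used:
            nxt += step
        if not 0 <= nxt < len(bank):
            break  # no unused item left in that direction
        pos = nxt
    return order


bank = [f"item{k}" for k in range(1, 8)]          # item1 easiest ... item7 hardest
knows = {"item1", "item2", "item3", "item4"}      # hypothetical testtaker
print(administer_with_branching(bank, lambda it: it in knows, start=3, n_items=4))
# ['item4', 'item5', 'item3', 'item6']
```

The sequence settles around the boundary between the items this testtaker can and cannot answer, which is the point of tailoring the order of presentation.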

also referred to as category scoring, wherein testtaker responses earn credit toward placement in a particular category with other testtakers whose pattern of responses is presumably similar in some way

CLASS SCORING

a descriptor used in psychology to indicate a specific type of measure in which respondents compare two or more desirable options and pick the one that is most preferred (sometimes called a “forced-choice” scale)

IPSATIVE SCORING

Comparing a testtaker’s score on one scale within a test to another scale within the same test

IPSATIVE SCORING
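
One simple way to express an ipsative comparison is to show each scale score relative to the same testtaker's average across all scales. This is one of several possible ipsatization methods and is assumed here for illustration; it is not taken from the card. The scale names and scores are hypothetical.

```python
def ipsative_profile(scale_scores):
    """Express each scale score relative to the same person's own average.

    scale_scores: dict of scale name -> raw score for ONE testtaker.
    Positive values mark the person's relatively stronger scales and negative
    values the relatively weaker ones; nothing here compares the person with
    other testtakers.
    """
    mean = sum(scale_scores.values()) / len(scale_scores)
    return {name: score - mean for name, score in scale_scores.items()}


profile = ipsative_profile({"achievement": 18, "affiliation": 12, "autonomy": 15})
print(profile)  # {'achievement': 3.0, 'affiliation': -3.0, 'autonomy': 0.0}
```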

having created a pool of items from which the final version of the test will be developed, the test developer will try out the test.

TEST TRYOUT

It serves as a prototype of the test.

TEST TRYOUT

should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered; everything from the instructions and the time limits allotted for completing the test to the atmosphere at the test site should be as similar as possible.

TEST TRYOUT

a set of methods used to evaluate test items in order to come up with a cluster of valid and reliable test items

ITEM ANALYSIS

METHODS IN ITEM ANALYSIS

● ITEM-DIFFICULTY INDEX
● ITEM-DISCRIMINATION INDEX
● ITEM-RELIABILITY INDEX
● ITEM-VALIDITY INDEX

an index of an item’s difficulty is obtained by calculating the proportion of the total number of testtakers who answered the item correctly

ITEM DIFFICULTY INDEX
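
The item-difficulty index p is just a proportion: the number of testtakers who answered the item correctly divided by the total number of testtakers. The sketch below assumes responses scored 1 (correct) or 0 (incorrect); the function name and the data are hypothetical. Note that a higher p means an easier item.

```python
def item_difficulty(item_responses):
    """Item-difficulty index p: proportion of testtakers answering correctly.

    item_responses: list of 1 (correct) / 0 (incorrect), one per testtaker.
    """
    if not item_responses:
        raise ValueError("no responses")
    return sum(item_responses) / len(item_responses)


# 30 of 50 testtakers answered correctly, so p = 30 / 50 = 0.6
print(item_difficulty([1] * 30 + [0] * 20))  # 0.6
```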

an item that might be inserted near the beginning of an achievement test to spur motivation and a positive test-taking attitude and to lessen test-related anxiety.

GIVEAWAY ITEM

for maximum discrimination among testtakers, the optimal average item difficulty is approximately 0.5, with individual items ranging from 0.3 to 0.8

ITEM-DIFFICULTY INDEX (INTERPRETATION) 0.86 and above

VERY EASY

ITEM-DIFFICULTY INDEX (INTERPRETATION) 0.71 - 0.85

EASY

ITEM-DIFFICULTY INDEX (INTERPRETATION) 0.40 - 0.70

DESIRABLE ITEM

ITEM-DIFFICULTY INDEX (INTERPRETATION) 0.15 - 0.39

DIFFICULT ITEM

ITEM-DIFFICULTY INDEX (INTERPRETATION) 0.14 and below

VERY DIFFICULT
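
The five difficulty bands on these cards can be written as a chain of thresholds. In this sketch p is rounded to two decimals first, on the assumption that p is reported the way the card ranges are written (e.g., 0.71 - 0.85). The function name is hypothetical.

```python
def interpret_difficulty(p):
    """Return the card's label for an item-difficulty index p (0 to 1).

    p is rounded to two decimals first, because the card ranges are
    written to two decimals.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    p = round(p, 2)
    if p >= 0.86:
        return "VERY EASY"
    if p >= 0.71:
        return "EASY"
    if p >= 0.40:
        return "DESIRABLE ITEM"
    if p >= 0.15:
        return "DIFFICULT ITEM"
    return "VERY DIFFICULT"


print(interpret_difficulty(0.6))  # DESIRABLE ITEM
```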

indicates how adequately an item separates or discriminates high scorers and low scorers on an entire test.

ITEM DISCRIMINATION INDEX

symbolized by a lowercase italic “d”

ITEM DISCRIMINATION INDEX

it compares people who have done well with those who have done poorly on a test
● difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly
● Upper group = High Scorers
● Lower group = Low Scorers

EXTREME GROUP METHOD
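
With the extreme group method, d is the proportion of the upper group answering the item correctly minus the proportion of the lower group answering it correctly. The card does not say how large the groups are, so this sketch takes the counts as given. The function name and the numbers are hypothetical.

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """Item-discrimination index d by the extreme group method.

    upper_correct, upper_n: correct answers and size of the upper (high-scoring) group
    lower_correct, lower_n: correct answers and size of the lower (low-scoring) group
    d = proportion correct in upper group - proportion correct in lower group
    """
    if upper_n <= 0 or lower_n <= 0:
        raise ValueError("both groups need at least one testtaker")
    return upper_correct / upper_n - lower_correct / lower_n


# Hypothetical item: 15 of 20 high scorers and 5 of 20 low scorers got it right.
print(discrimination_index(15, 20, 5, 20))  # 0.5
```

Since both terms are proportions, d runs from -1 to +1. A negative d means low scorers did better on the item than high scorers.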

ITEM DISCRIMINATION INDEX (INTERPRETATION) 0.40 and above

VERY GOOD ITEM

ITEM DISCRIMINATION INDEX (INTERPRETATION) 0.30-0.39

GOOD ITEM

ITEM DISCRIMINATION INDEX (INTERPRETATION) 0.20-0.29

MARGINAL

ITEM DISCRIMINATION INDEX (INTERPRETATION) 0.10-0.19

POOR ITEM

ITEM DISCRIMINATION INDEX (INTERPRETATION) 0 and below

Discard
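
The discrimination bands on these cards can also be written as thresholds. The cards give no band for values between 0 and 0.10 (e.g., d = 0.05), so in this sketch that case returns None rather than a guessed label. Rounding to two decimals is an assumption matching the card ranges, and the function name is hypothetical.

```python
def interpret_discrimination(d):
    """Return the card's label for an item-discrimination index d, or None.

    d is rounded to two decimals to match the card ranges. Values above 0
    but below 0.10 have no band on the cards, so they return None.
    """
    d = round(d, 2)
    if d >= 0.40:
        return "VERY GOOD ITEM"
    if d >= 0.30:
        return "GOOD ITEM"
    if d >= 0.20:
        return "MARGINAL"
    if d >= 0.10:
        return "POOR ITEM"
    if d <= 0:
        return "DISCARD"
    return None


print(interpret_discrimination(0.35))  # GOOD ITEM
```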

p = Very Easy/Easy/Very Difficult, d = Discarded/Poor/Marginal

REJECT

p = Easy, d = Good/Very Good

REVISE

p = Desirable, d = Discarded/Poor

REJECT

p = Desirable, d = Marginal

REVISE

p = Desirable, d = Good/Very Good

p = Difficult, d = Marginal

REVISE

p = Difficult, d = Good/Very Good

p = Very Difficult, d = Good/Very Good

REVISE
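
The p and d decision cards can be held as a lookup table keyed by (difficulty label, discrimination label). The labels follow the interpretation cards, with "Discarded" written as DISCARD. This sketch encodes only the combinations the cards actually answer. Desirable or Difficult p with Good/Very Good d appears without a decision, and some combinations are not listed at all. For all of these, the lookup returns None instead of a guessed decision. The names DECISIONS and item_decision are hypothetical.

```python
# Item decisions copied from the p/d cards.
# Keys: (item-difficulty label, item-discrimination label).
DECISIONS = {}

# p = Very Easy / Easy / Very Difficult with d = Discarded / Poor / Marginal -> REJECT
for p_band in ("VERY EASY", "EASY", "VERY DIFFICULT"):
    for d_band in ("DISCARD", "POOR ITEM", "MARGINAL"):
        DECISIONS[(p_band, d_band)] = "REJECT"

# p = Easy or Very Difficult with d = Good / Very Good -> REVISE
for p_band in ("EASY", "VERY DIFFICULT"):
    for d_band in ("GOOD ITEM", "VERY GOOD ITEM"):
        DECISIONS[(p_band, d_band)] = "REVISE"

# p = Desirable with d = Discarded / Poor -> REJECT
for d_band in ("DISCARD", "POOR ITEM"):
    DECISIONS[("DESIRABLE ITEM", d_band)] = "REJECT"

# p = Desirable or Difficult with d = Marginal -> REVISE
DECISIONS[("DESIRABLE ITEM", "MARGINAL")] = "REVISE"
DECISIONS[("DIFFICULT ITEM", "MARGINAL")] = "REVISE"


def item_decision(p_band, d_band):
    """Look up the card decision; None means the cards give no answer."""
    return DECISIONS.get((p_band, d_band))


print(item_decision("EASY", "GOOD ITEM"))            # REVISE
print(item_decision("DESIRABLE ITEM", "GOOD ITEM"))  # None
```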

assessing the quality of each alternative within a multiple choice item by comparing the performance of upper and lower scorers

ANALYSIS OF ITEM ALTERNATIVES

by charting the number of testtakers in the U and L groups who chose each alternative, the test developer can get an idea of the effectiveness of a distractor by means of a simple _________

eyeball test
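
The "eyeball test" starts from a small table of how many U-group and L-group testtakers chose each alternative. The sketch builds that table and flags patterns a test developer would typically look for. These flags are common rules of thumb, assumed here rather than taken from the cards: the keyed answer should draw more high scorers than low scorers, a distractor nobody chose is not working, and a distractor that attracts more high scorers than low scorers may be ambiguous. The function name and the response data are hypothetical.

```python
from collections import Counter


def eyeball_alternatives(upper_choices, lower_choices, options, key):
    """Tabulate U- and L-group choices per alternative and flag likely problems.

    upper_choices / lower_choices: the option each testtaker in that group chose.
    key: the correct option. Flags are rules of thumb, not fixed cutoffs.
    """
    u_counts, l_counts = Counter(upper_choices), Counter(lower_choices)
    report = {}
    for opt in options:
        u, l = u_counts[opt], l_counts[opt]
        flag = ""
        if opt == key and u <= l:
            flag = "key not favored by the upper group"
        elif opt != key and u + l == 0:
            flag = "distractor chosen by no one"
        elif opt != key and u > l:
            flag = "distractor attracts more high scorers"
        report[opt] = (u, l, flag)
    return report


report = eyeball_alternatives(list("bbbbbbacbb"), list("abcaacbbca"), "abcd", key="b")
for opt, (u, l, flag) in report.items():
    print(f"{opt}: U={u} L={l} {flag}")
```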

a graphic representation of item difficulty and discrimination

ITEM CHARACTERISTIC CURVE (ICC)
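
The card does not say which model the curve comes from. ICCs are often drawn from a logistic item response theory model, and this sketch assumes the two-parameter logistic (2PL) form P(theta) = 1 / (1 + exp(-a(theta - b))). In that form, b is the item's difficulty (the ability level at which the probability of a correct response is 0.5) and a is its discrimination (how steep the curve is). The two items are hypothetical, and the sketch prints points of each curve instead of drawing them.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response at ability theta under a 2PL model.

    a: discrimination (a steeper curve separates nearby ability levels more sharply)
    b: difficulty (the theta at which the probability is exactly 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Tabulate two hypothetical items across a range of ability levels.
for theta in (-2, -1, 0, 1, 2):
    easy_flat = icc_2pl(theta, a=0.8, b=-1.0)
    hard_steep = icc_2pl(theta, a=2.0, b=1.0)
    print(f"theta={theta:+d}  easy/flat={easy_flat:.2f}  hard/steep={hard_steep:.2f}")
```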

techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures
● various nonstatistical procedures designed to explore how individual test items work

QUALITATIVE ITEM ANALYSIS

a qualitative research tool designed to shed light on the testtaker’s thought processes during the administration of a test

THINK ALOUD ADMINISTRATION

on a one-on-one basis, the examinee will be asked to take a test, thinking aloud as they respond to each item

THINK ALOUD ADMINISTRATION

study of the test items, typically conducted during the test development process
● Items are examined for fairness to all prospective testtakers and for the presence of offensive language, stereotypes, or situations.

SENSITIVITY REVIEW

Once a test has been made available, it subsequently undergoes refinement.

TEST REVISION

One type of revision is the development of ________ of the original test

short forms

For a test that has low internal consistency, you can try ________

Factor Analysis

helpful in increasing the reliability of a multivariate (heterogeneous) test by identifying the underlying factors

FACTOR ANALYSIS

____________ requires weighing each item’s content validity, item difficulty and discrimination, inter-item correlation, and bias

Choosing the final items
