W3 - Chapter 8 - Test Development - DN Flashcards

1
Q

anchor protocol

A
  • a test answer sheet
  • developed by a test publisher
  • to test the accuracy of examiners’ scoring

p.280

2
Q

biased test item

A
  • an item that favours one group in relation to another
  • when differences in group ability are controlled

p.271

3
Q

binary-choice item

A
  • a multiple-choice item
  • contains only two possible responses (e.g., true-false)

p.254

4
Q

categorical scaling

A
  • system of scaling
  • stimuli placed in one of two or more alternative categories that differ quantitatively with respect to some continuum

p.249

5
Q

categorical scoring

A
  • a method of evaluation
  • where test responses earn credit toward placement in a particular class/category
  • sometimes testtakers must meet a set number of responses corresponding to a particular criterion to be placed in a specific category
  • also called class scoring
  • contrast with cumulative scoring & ipsative scoring

p.260

6
Q

ceiling effect

A
  • diminished utility of a tool of assessment in distinguishing testtakers at the high end of the ability, trait, or other attribute being measured

p.259, 307

7
Q

class scoring

A
  • a method of evaluation
  • where test responses earn credit toward placement in a particular class/category
  • sometimes testtakers must meet a set number of responses corresponding to a particular criterion to be placed in a specific category
  • contrast with cumulative scoring & ipsative scoring

p.260

8
Q

comparative scaling

A
  • in test development
  • a method of developing ordinal scales
  • through the use of a **sorting task**
  • entails judging a stimulus in comparison with every other stimulus used on the test

p.249

9
Q

completion item

A
  • requires an examinee to provide a word or phrase that completes a sentence

p.254

10
Q

computerized adaptive testing (CAT)

A
  • an interactive, computer-administered testtaking process
  • items are presented to the testtaker based in part on the testtaker’s performance on previous items

p.15, 255-256

11
Q

co-norming

A
  • the test norming process conducted on two or more tests
  • using the same sample of testtakers
  • when used to validate all of the tests being normed, this process may also be referred to as co-validation

p.138n4, 278

12
Q

constructed-response format

A
  • a form of test item requiring a testtaker to construct or create a response
  • as opposed to simply selecting a response
  • contrast with selected-response format

p.252

13
Q

co-validation

A
  • a test validation process conducted on two or more tests
  • using the same sample of testtakers
  • when used in conjunction with the creation or revision of norms, the process may also be referred to as co-norming

p.278

14
Q

cross-validation

A
  • a revalidation on a sample of testtakers
  • other than the testtakers on whom test performance was originally found to be a valid predictor of some criterion

p.278

15
Q

essay item

A
  • a test item that requires a testtaker to write a composition
  • typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation

p.255

16
Q

expert panel

A
  • in the test development process
  • a group of people knowledgeable about the subject matter being tested and/or the population for whom the test is being designed
  • they can provide input to improve a test’s content, fairness, etc.

p.274-275

17
Q

floor effect

A
  • a phenomenon arising from the diminished utility of a tool of assessment in distinguishing testtakers at the low end of the ability, trait, or other attribute being measured

p.256-259

18
Q

giveaway item

A
  • a test item, usually near the beginning of a test of ability or achievement
  • designed to be relatively easy
  • usually for the purpose of building the testtaker’s confidence or reducing test-related anxiety

p.263n4

19
Q

What three criteria must be met when correcting for the impact of guessing?

A
  1. must recognize that guesses are seldom totally random
  2. must deal with the problem of omitted items
  3. must deal with the fact that some testtakers are luckier than others when guessing

p.269-271

20
Q

Guttman scale

A
  • a scale in which items range sequentially from weaker to stronger expressions of the attitude or belief being measured
  • constructed so that endorsement of a stronger item implies endorsement of all the milder items that precede it
  • named after its developer, Louis Guttman

p.249

21
Q

ipsative scoring

A
  • approach to scoring & interpretation
  • responses & presumed strength of measured trait are interpreted relative to the measured strength of other traits for that testtaker
  • contrast with class scoring & cumulative scoring

p.260

22
Q

item analysis

A
  • general term used to describe various procedures
  • usually statistical, designed to explore how individual items work compared to others in the test & in the context of the whole test
    • e.g., to explore the level of difficulty of individual items on an achievement test
    • e.g., to explore the reliability of a personality test
  • contrast with qualitative item analysis

p.262-275

23
Q

item bank

A
  • a collection of questions to be used in the construction of a test

p.255, 257-259, 282-284

24
Q

item branching

A
  • in computerised adaptive testing (CAT)
  • the individualised presentation of test items drawn from an item bank, based on the testtaker’s previous responses

p.260
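
Not from the text: a minimal Python sketch of the branching idea, in which the next item is drawn from the bank at a level adjusted up or down after each response. The item bank, difficulty scale, and step size are all hypothetical.

```python
# A toy illustration of item branching: after each response, the next item is
# drawn from the item bank at a difficulty level adjusted upward (after a
# correct answer) or downward (after an incorrect one).

def next_item(item_bank, current_difficulty, last_response_correct, step=0.1):
    """Return the bank item whose difficulty is closest to the new target level."""
    target = current_difficulty + step if last_response_correct else current_difficulty - step
    return min(item_bank, key=lambda item: abs(item["difficulty"] - target))

# Difficulty here is on an arbitrary scale where higher means harder.
bank = [{"id": i, "difficulty": d} for i, d in enumerate([0.3, 0.4, 0.5, 0.6, 0.7])]
print(next_item(bank, current_difficulty=0.5, last_response_correct=True))   # branches to a harder item
print(next_item(bank, current_difficulty=0.5, last_response_correct=False))  # branches to an easier item
```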

25
Q

item-characteristic curve (ICC)

A
  • a graphic representation of the probabilistic relationship between a person’s level of the trait (ability, characteristic) being measured and the probability of responding to the item in a predicted way
  • also known as a category response curve or an item trace line

p.177, 268, 281

26
Q

item-difficulty index

A
  • items cannot be too easy or too hard if they are to differentiate among testtakers’ knowledge of the subject matter
  • a statistic obtained by calculating the proportion of the total number of testtakers who answered an item correctly
    • p is used to denote item difficulty
    • a subscript identifies the item number (e.g., p1 for item 1)
  • can range from 0 to 1
    • the larger the item-difficulty index, the easier the item
    • (i.e., the higher the p, the easier the item, because p represents the proportion of testtakers passing the item; see the worked sketch after this card)

p.263-264
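
A brief worked sketch (in Python) of the index as defined above; the response data are made up for illustration.

```python
# Item-difficulty index p: the proportion of testtakers who answered the item correctly.
# Hypothetical responses to one item from 10 testtakers (1 = correct, 0 = incorrect).
responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

p = sum(responses) / len(responses)
print(f"p = {p:.2f}")  # p = 0.80, a relatively easy item (the higher the p, the easier the item)
```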

27
Q

item-discrimination index

A
  • measure of item discrimination
  • symbolised by d

p.264-268
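
The card gives only the symbol. As a hedged illustration, the sketch below uses the common extreme-groups computation of d (proportion of an upper-scoring group passing the item minus the proportion of a lower-scoring group passing it); the group size and counts are invented.

```python
# Item-discrimination index d, extreme-groups approach:
# compare item performance of testtakers in the upper and lower ranges of total test score.
group_size = 27       # testtakers in each extreme group (hypothetical)
upper_correct = 24    # upper-group members who answered the item correctly
lower_correct = 10    # lower-group members who answered the item correctly

d = (upper_correct - lower_correct) / group_size
print(f"d = {d:.2f}")  # d = 0.52: the item separates high from low scorers reasonably well
```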

28
Q

item-endorsement index

A
  • the name given to an item-difficulty index (used in achievement testing) when it is used in other contexts (e.g., personality testing)

p.263

29
Q

item fairness

A
  • a reference to the degree of bias, if any, in a test item

p.271-272

30
Q

item format

A
  • a reference to the form, plan, structure, arrangement, or layout of individual test items
  • including whether the test items require testtakers to select or create a response

p.252-255

31
Q

item pool

A
  • the reservoir or well from which items will or will not be drawn for the final version of the test
  • the collection of items to be further evaluated for possible selection for use in an item bank

p.251

32
Q

item-reliability index

A
  • provides an indication of the internal consistency of a test
  • the higher the index, the greater the internal consistency
  • index is equal to
    • the product of the item-score standard deviation (s) and
    • the correlation (r) between the item score and the total test score

p.264
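
A small numeric sketch of the index as defined above; p and the item-total correlation are hypothetical, and s is taken as √[p(1 - p)] on the assumption of a dichotomously scored item.

```python
import math

# Item-reliability index = item-score standard deviation (s) x item-total correlation (r).
p = 0.6               # proportion of testtakers answering the item correctly (hypothetical)
r_item_total = 0.45   # correlation between the item score and the total test score (hypothetical)

s = math.sqrt(p * (1 - p))        # item-score standard deviation for a dichotomous item, ~0.49
item_reliability_index = s * r_item_total
print(f"item-reliability index = {item_reliability_index:.2f}")  # ~0.22
```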

33
Q

item-validity index

A
  • a statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure
  • important when a test developer’s goal is to maximise the criterion-related validity of a test
  • the higher the item-validity index, the greater the test’s criterion-related validity
  • to calculate we must first know
    • the item-score standard deviation (symbolised as s1, s2, s3 etc.)
    • and the correlation between the item score and the criterion score
  • then we use the item-difficulty index p1 in the following formula
    • s1 = √[ p1 (1 - p1) ]
  • the correlation between the score on item 1 and the score on a criterion measure (r1c) is multiplied by item 1’s item-score standard deviation (s1)
    • the product is an index of the item’s validity (s1 r1c); see the worked sketch after this card

p.264
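
A worked sketch of the calculation described on the card, with hypothetical values for p1 and r1c.

```python
import math

# Item-validity index for item 1 = s1 * r1c, where
#   s1  = item-score standard deviation = sqrt(p1 * (1 - p1))
#   r1c = correlation between the item 1 score and the criterion score
p1 = 0.7      # item-difficulty index for item 1 (hypothetical)
r1c = 0.30    # item-criterion correlation (hypothetical)

s1 = math.sqrt(p1 * (1 - p1))     # ~0.46
item_validity_index = s1 * r1c
print(f"item-validity index = {item_validity_index:.2f}")  # ~0.14
```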

34
Q

Likert scale

A
  • summative rating scale with 5 alternative responses
    • ranging on a continuum from e.g., “strongly agree” to “strongly disagree”

p.247

35
Q

matching item

A
  • the testtaker is presented with two columns
  • premises on the left & responses on the right
  • task is to determine which response is best matched to which premise
    • young testtakers (draw a line)
    • others typically asked to write a letter/number as a response

p.253

36
Q

method of paired comparisons

A
  • a scaling method
  • the testtaker is presented with pairs of stimuli (e.g., photos) and asked to select one of them according to a rule
    • (e.g., “select the one that is more appealing”)

p.248

37
Q

multiple-choice format

A
  • one of the three types of selected-response item formats
  • three elements
    1. a stem
    2. a correct alternative or option
    3. and several incorrect alternatives (referred to as distractors or foils)

p.252

38
Q

pilot work

A
  • also referred to as pilot study & pilot research
  • preliminary research surrounding the creation of a prototype test
  • general objective is to determine how best to
    • gauge
    • assess, or
    • evaluate the targeted construct(s)

p.243-244

39
Q

qualitative item analysis

A
  • non-statistical procedures designed to explore how individual test items work
  • both compared to other items in the test & in the context of the whole test
  • unlike statistical measures, they involve exploration of the issues by verbal means
    • (e.g., interviews & group discussions with testtakers & other relevant parties)

p.272-275

40
Q

qualitative methods

A
  • techniques of data generation & analysis
  • rely primarily on verbal rather than mathematical or statistical procedures

p.272

41
Q

rating scale

A
  • a system of ordered numerical or verbal descriptors
  • used to make judgements about the presence, absence, or magnitude of a particular trait, attitude, emotion, or other variable

p.205, 247, 371

42
Q

scaling

A
  • 1) in test construction
    • the process of setting rules for assigning numbers in measurement
  • 2) the process by which a measuring device
    • is designed and calibrated &
    • the way numbers (or other indices) are assigned to different amounts of a trait, attribute, or characteristic being measured

p.244-251

43
Q

scalogram analysis

A
  • an item-analysis procedure
  • entails graphic mapping of a testtaker’s responses

p.250

44
Q

scoring drift

A
  • a discrepancy between the scoring in an anchor protocol and the scoring of another protocol

p.280

45
Q

selected-response format

A
  • a form of test item
  • requiring testtakers to select a response
    • (e.g., true/false, multiple choice, and matching items)
  • as opposed to creating one
  • contrast with constructed-response format

p.252

46
Q

sensitivity review

A
  • a study of test items
  • usually during test development
  • items are examined for fairness to all prospective testtakers
    • for the presence of offensive language, stereotypes, or situations

p.274

47
Q

short-answer item

A
  • may also be referred to as a completion item
  • a word, term, sentence or a paragraph may qualify
    • anything beyond this is an essay item

p.254

48
Q

summative scale

A
  • an index derived from the summing of selected scores on a test or sub-test

p.247

49
Q

test conceptualization

A
  • an early stage of the test development process
  • when an idea for a particular test or test revision is conceived

p.240, 241-244

50
Q

test construction

A
  • a stage in the process of test development
  • entails writing test items (or rewriting/revising existing items)
  • as well as formatting items, setting scoring rules, and otherwise designing and building a test

p.240

51
Q

test development

A
  • an umbrella term for all that goes into the process of creating a test

p.240-284

52
Q

test revision

A
  • action taken to modify a test’s content or format
  • for the purpose of improving the test’s effectiveness as a tool of measurement

p.240

53
Q

test tryout

A
  • a stage in the process of test development that entails administering a preliminary version of a test to a representative sample of testtakers
  • under conditions that simulate the conditions under which the final version of the test will be administered

p.240, 261-262

54
Q

“think aloud” test administration

A
  • a method of qualitative item analysis
  • examinees verbalize their thoughts as they take the test
  • useful in understanding how
    • individual items function in a test
    • testtakers interpret or misinterpret the meaning of the individual items

p.274

55
Q

true-false item

A
  • a binary-choice item
    • i.e., an item with only two possible responses
  • requires the testtaker to indicate whether a statement is or is not a fact

p.254

56
Q

validity shrinkage

A
  • the decrease in item validities that inevitably occurs after cross-validation

p.278

57
Q

What is the optimal item difficulty?

A
  • usually the midpoint between 1.00 and the probability of answering correctly by guessing
    • that probability is called the chance success proportion
      • e.g., a true-false item has a .50 chance of being answered correctly by guessing, so the optimal difficulty is (.50 + 1.00) / 2 = .75
      • e.g., a five-option multiple-choice item has a .20 chance success proportion, so the optimal difficulty is (.20 + 1.00) / 2 = .60 (see the sketch after this card)

p.263
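
A one-line check of the midpoint rule stated above, applied to the two examples on the card.

```python
def optimal_difficulty(chance_success_proportion):
    """Midpoint between 1.00 and the probability of answering correctly by guessing."""
    return (chance_success_proportion + 1.0) / 2

print(optimal_difficulty(0.50))  # true-false item -> 0.75
print(optimal_difficulty(0.20))  # five-option multiple-choice item -> 0.6
```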

58
Q

How can you create a visual representation of the best items on a test

(i.e., if the objective is to maximise criterion-related validity)?

A
  • this can be achieved by plotting each item’s
    • item-validity index and
    • item-reliability index

p.265

Fig 8-5
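
A minimal matplotlib sketch of the kind of plot the card (and Fig 8-5) describes; the five items and their index values are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical item-reliability and item-validity indices for five items.
item_reliability = [0.20, 0.35, 0.15, 0.40, 0.28]
item_validity    = [0.10, 0.30, 0.05, 0.33, 0.22]

fig, ax = plt.subplots()
ax.scatter(item_reliability, item_validity)
for i, (x, y) in enumerate(zip(item_reliability, item_validity), start=1):
    ax.annotate(f"item {i}", (x, y))

ax.set_xlabel("item-reliability index")
ax.set_ylabel("item-validity index")
ax.set_title("Items toward the upper right are the strongest candidates to retain")
plt.show()
```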