W3 - Chapter 8 - Test Development - DN Flashcards

1
Q

anchor protocol

A
  • a test answer sheet
  • developed by a test publisher
  • to test the accuracy of examiners’ scoring

p.280

2
Q

biased test item

A
  • an item that favours one group in relation to another
  • when differences in group ability are controlled

p.271

3
Q

binary-choice item

A
  • a multiple-choice item
  • contains only two possible responses (e.g., true-false)

p.254

4
Q

categorical scaling

A
  • system of scaling
  • stimuli placed in one of two or more alternative categories that differ quantitatively with respect to some continuum

p.249

5
Q

categorical scoring

A
  • a method of evaluation
  • where test responses earn credit toward placement in a particular class/category
  • sometimes testtakers must meet a set number of responses corresponding to a particular criterion to be placed in a specific category
  • also called class scoring
  • contrast with cumulative scoring & ipsative scoring

p.260

6
Q

ceiling effect

A
  • diminished utility of a tool of assessment in distinguishing testtakers at the high end of the ability, trait, or other attribute being measured

p.259, 307

7
Q

class scoring

A
  • a method of evaluation
  • where test responses earn credit toward placement in a particular class/category
  • sometimes testtakers must meet a set number of responses corresponding to a particular criterion to be placed in a specific category
  • contrast with cumulative scoring & ipsative scoring

p.260

8
Q

comparative scaling

A
  • in test development
  • a method of developing ordinal scales
  • through the use of a **sorting task**
  • entails judging a stimulus in comparison with every other stimulus used on the test

p.249

9
Q

completion item

A
  • requires an examinee to provide a word or phrase that completes a sentence

p.254

10
Q

computerized adaptive testing (CAT)

A
  • an interactive, computer-administered testtaking process
  • items are presented to the testtaker based in part on the testtaker’s performance on previous items

p.15, 255-256

11
Q

co-norming

A
  • the test norming process conducted on two or more tests
  • using the same sample of testtakers
  • when used to validate all of the tests being normed, this process may also be referred to as co-validation

p.138n4, 278

12
Q

constructed-response format

A
  • a form of test item requiring a testtaker to construct or create a response
  • as opposed to simply selecting a response
  • contrast with selected-response format

p.252

13
Q

co-validation

A
  • a test validation process conducted on two or more tests using the same sample of testtakers
  • when used in conjunction with the creation of norms, the process may also be referred to as co-norming

p.278

14
Q

cross-validation

A
  • a revalidation on a sample of testtakers
  • other than the testtakers on whom test performance was originally found to be a valid predictor of some criterion

p.278

15
Q

essay item

A
  • a test item that requires a testtaker to write a composition
  • typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation

p.255

16
Q

expert panel

A
  • in the test development process
  • a group of people knowledgeable about the subject matter being tested and/or the population for whom the test is being designed
  • they can provide input to improve the test’s content, fairness, etc.

p.274-275

17
Q

floor effect

A
  • a phenomenon arising from the diminished utility of a tool of assessment in distinguishing testtakers at the low end of the ability, trait, or other attribute being measured

p.256-259

18
Q

giveaway item

A
  • a test item, usually near the beginning of a test of ability or achievement
  • designed to be relatively easy
  • usually for the purpose of building the testtaker’s confidence or reducing test-related anxiety

p.263n4

19
Q

What three criteria must be met when correcting for the impact of guessing?

A
  1. it must recognize that guesses are not always totally random
  2. it must deal with the problem of omitted items
  3. it must account for the fact that some testtakers are luckier than others when guessing

p.269-271
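
For context, one traditional correction-for-guessing formula (not quoted on this card, so treat the exact form as an assumption about which correction is meant) deducts a fraction of the wrong answers from the raw score:

$$\text{corrected score} = R - \frac{W}{n - 1}$$

where *R* is the number of items answered correctly, *W* the number answered incorrectly, and *n* the number of response options per item. Omitted items are counted in neither *R* nor *W*, which is precisely the kind of ambiguity the three criteria above must address.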

20
Q

Guttman scale

A
  • a scale on which items range sequentially from weaker to stronger expressions of the attitude or belief being measured
  • constructed so that endorsement of a stronger statement implies endorsement of all of the milder statements that precede it
  • named after its developer, Louis Guttman

p.249

21
Q

ipsative scoring

A
  • approach to scoring & interpretation
  • responses & presumed strength of measured trait are interpreted relative to the measured strength of other traits for that testtaker
  • contrast with class scoring & cumulative scoring

p.260

22
Q

item analysis

A
  • general term used to describe various procedures
  • usually statistical, designed to explore how individual items work compared to others in the test & in the context of the whole test
    • e.g., to explore the level of difficulty of individual items on an achievement test
    • e.g., to explore the reliability of a personality test
  • contrast with qualitative item analysis

p.262-275

23
Q

item bank

A
  • a collection of questions to be used in the construction of a test

p.255, 257-259, 282-284

24
Q

item branching

A
  • in computerised adaptive testing (CAT)
  • the individualised presentation of test items drawn from an item bank based on the testtaker’s previous responses

p.260
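
The branching idea can be pictured with a small sketch. Everything below (the item bank, the difficulty values, and the harder/easier rule) is invented for illustration; real CAT systems typically use IRT-based ability estimates rather than a simple up/down rule.

```python
# Toy sketch of item branching: after each response, the next item is drawn
# from the bank at a higher difficulty if the answer was correct, or a lower
# difficulty if it was not. Item bank and p values are hypothetical.
item_bank = {
    0.9: "2 + 2 = ?",                       # easiest (highest p)
    0.7: "12 x 12 = ?",
    0.5: "Solve: 3x + 5 = 20",
    0.3: "Differentiate x**2 * sin(x)",
    0.1: "Prove the fundamental theorem of calculus",  # hardest (lowest p)
}

def next_item(current_p, was_correct):
    """Move one step harder (lower p) after a correct answer,
    one step easier (higher p) after an incorrect one."""
    levels = sorted(item_bank)              # ascending p: hardest -> easiest
    i = levels.index(current_p)
    i = max(i - 1, 0) if was_correct else min(i + 1, len(levels) - 1)
    return levels[i], item_bank[levels[i]]

# Example: a correct answer at p = .5 branches to the harder p = .3 item.
print(next_item(0.5, was_correct=True))
```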

25
Q

item-characteristic curve (ICC)

A
  • a **graphic** representation of the **probabilistic relationship** between a person’s **level of the trait** (ability, characteristic) being measured and the **probability** of **responding** to an item in a **predicted** way
  • also known as a category response curve or an item trace line

p.177, 268, 281

26
Q

item-difficulty index

A
  • items cannot be too easy or too hard if they are to differentiate among testtakers in their knowledge of the subject matter
  • a statistic obtained by calculating the **proportion** of the **total number** of **testtakers** who answered an item **correctly**
  • *p* is used to denote item difficulty; a subscript refers to the item number (e.g., *p*1 for item 1)
  • can **range from 0 to 1**
  • the larger the item-difficulty index, the easier the item (i.e., the higher the *p*, the easier the item, because *p* represents the **proportion of people passing** the item)

p.263-264

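A minimal computational sketch of the index, using a made-up 0/1 response matrix (rows = testtakers, columns = items); the data are purely illustrative.

```python
# Hypothetical scored responses: 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 0, 1],   # testtaker 1
    [1, 0, 0, 1],   # testtaker 2
    [1, 1, 1, 0],   # testtaker 3
    [0, 1, 0, 1],   # testtaker 4
]

n_testtakers = len(responses)

# Item-difficulty index p_j: proportion of testtakers answering item j correctly.
p = [sum(row[j] for row in responses) / n_testtakers
     for j in range(len(responses[0]))]

print(p)  # [0.75, 0.75, 0.25, 0.75] -- the higher the p, the easier the item
```
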
27
Q

item-discrimination index

A
  • a measure of item discrimination
  • symbolised by *d*

p.264-268

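As a hedged reminder of how *d* is usually computed (the extreme-groups comparison; the 27% split is a common convention rather than a fixed rule):

$$d = \frac{U - L}{n}$$

where *U* is the number of testtakers in the upper-scoring group who answered the item correctly, *L* the corresponding number in the lower-scoring group, and *n* the number of testtakers in each group (often the top and bottom 27% by total score). A higher positive *d* means the item better separates high from low scorers; a negative *d* flags a problem item.
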
28
Q

item-endorsement index

A
  • the name given to the item-difficulty index (used in achievement testing) when it is used in **other contexts** (e.g., personality testing)

p.263

29
Q

item fairness

A
  • a reference to the **degree of bias**, if any, in a test item

p.271-272

30
Q

item format

A
  • a reference to the **form, plan, structure, arrangement,** or **layout** of individual test items
  • including whether the test items require testtakers to **select or create** a response

p.252-255

31
Q

item pool

A
  • the reservoir or well from which items will or will not be **drawn** for the final version of the test
  • the **collection of items** to be further **evaluated** for **possible selection** for use in an **item bank**

p.251

32
Q

item-reliability index

A
  • provides an indication of the **internal consistency** of a test
  • the **higher the index**, the greater the internal consistency
  • the index is equal to the product of
    • the item-score standard deviation (*s*) and
    • the correlation (*r*) between the item score and the total test score

p.264

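A quick worked example with made-up numbers, using the dichotomous-item shortcut for the item-score standard deviation given on the item-validity card below:

$$s = \sqrt{p(1 - p)} = \sqrt{.60(1 - .60)} \approx .49, \qquad s \times r = .49 \times .30 \approx .15$$

i.e., an item with difficulty *p* = .60 and an item-total correlation of .30 would have an item-reliability index of about .15.
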
33
Q

item-validity index

A
  • a statistic designed to provide an indication of the **degree** to which a **test is measuring** what it **purports to measure**
  • **important** when a test developer’s **goal** is to maximise the **criterion-related validity** of a test
  • the higher the item-validity index, the greater the test’s criterion-related validity
  • to calculate it we must first know
    • the item-score standard deviation (symbolised as *s*1, *s*2, *s*3, etc.)
    • and the correlation between the item score and the criterion score
  • the item-score standard deviation can be calculated from the item-difficulty index: *s*1 = the square root of *p*1(1 − *p*1)
  • the correlation between the score on item 1 and the score on a criterion measure (*r*1c) is multiplied by item 1’s item-score standard deviation (*s*1)
  • the product is an **index of an item’s validity (*s*1*r*1c)**

p.264

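Putting the card’s own formula to work with made-up numbers:

$$s_1 = \sqrt{p_1(1 - p_1)} = \sqrt{.50 \times .50} = .50, \qquad s_1 r_{1c} = .50 \times .40 = .20$$

i.e., an item answered correctly by half the sample (*p*1 = .50) whose score correlates .40 with the criterion has an item-validity index of .20.
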
34
Q

Likert scale

A
  • a **summative rating scale** with **5 alternative responses**
  • the responses range along a continuum, e.g., from "strongly agree" to "strongly disagree"

p.247

35
Q

matching item

A
  • the testtaker is presented with two columns: *premises* on the left & *responses* on the right
  • the task is to determine which response is best matched to which premise
  • young testtakers may draw a line to match them
  • others are typically asked to write a letter/number as a response

p.253

36
Q

method of paired comparisons

A
  • a **scaling** method
  • the testtaker is presented with a **pair of stimuli** (e.g., photos) and asked to select one **according to a rule**
  • (e.g., "select the one that is more appealing")

p.248

37
Q

multiple-choice format

A
  • one of the three types of **selected-response** item formats
  • has three elements:
    1. a stem
    2. a correct alternative or option
    3. several incorrect alternatives (referred to as distractors or foils)

p.252

38
Q

pilot work

A
  • also referred to as pilot study & pilot research
  • **preliminary research** surrounding the creation of a prototype test
  • the general objective is to determine how best to **gauge**, **assess**, or **evaluate** the **targeted construct**(s)

p.243-244

39
Q

qualitative item analysis

A
  • **non-statistical** procedures designed to explore how individual test items work
  • both compared to **other items** in the test & in the **context** of the **whole test**
  • unlike statistical measures, they involve **exploration** of the issues by **verbal means**
  • (e.g., interviews & group discussions with testtakers & other relevant parties)

p.272-275

40
Q

qualitative methods

A
  • techniques of **data generation & analysis**
  • rely primarily on **verbal** rather than mathematical or statistical procedures

p.272

41
Q

rating scale

A
  • a system of **ordered numerical** or **verbal descriptors**
  • used to make **judgements** about the **presence, absence, or magnitude** of a particular trait, attitude, emotion, or other variable

p.205, 247, 371

42
Q

scaling

A
  • 1) in **test construction**: the process of **setting rules** for **assigning numbers** in measurement
  • 2) the process by which a measuring device is designed and calibrated & the way numbers (or other indices) are assigned to different amounts of a trait, attribute, or characteristic being measured

p.244-251

43
Q

scalogram analysis

A
  • an **item-analysis** procedure
  • entails **graphic mapping** of a testtaker’s **responses**

p.250

44
Q

scoring drift

A
  • a **discrepancy** between the scoring in an **anchor protocol** and the scoring of **another protocol**

p.280

45
Q

selected-response format

A
  • a form of test item requiring testtakers to **select a response**
  • (e.g., true-false, multiple-choice, and matching items)
  • as opposed to creating one
  • contrast with constructed-response format

p.252

46
Q

sensitivity review

A
  • a **study of test items**, usually during test development
  • items are examined for **fairness** to all prospective testtakers
  • and for the presence of offensive language, stereotypes, or situations

p.274

47
Q

short-answer item

A
  • may also be referred to as a completion item
  • a word, term, sentence, or paragraph may qualify as a short answer
  • anything beyond this is an essay item

p.254

48
Q

summative scale

A
  • an index derived from the **summing of selected scores** on a test or sub-test

p.247

49
Q

test conceptualization

A
  • an early stage of the test development process
  • when an **idea** for a particular test or test revision is **conceived**

p.240, 241-244

50
Q

test construction

A
  • a stage in the process of test development
  • entails **writing test items** (or **rewriting/revising** existing items)
  • as well as **formatting items, setting scoring rules**, and otherwise **designing** and **building** a **test**

p.240

51
Q

test development

A
  • an umbrella term for all that goes into the process of creating a test

p.240-284

52
Q

test revision

A
  • action taken to **modify** a test’s **content** or **format**
  • for the purpose of **improving** the test’s **effectiveness** as a tool of **measurement**

p.240

53
Q

test tryout

A
  • a stage in the process of test development
  • entails **administering a preliminary version** of a test to a **representative sample** of testtakers
  • under **conditions** that **simulate** the **conditions** under which the **final version** of the test will be administered

p.240, 261-262

54
Q

"think aloud" test administration

A
  • a method of **qualitative** item analysis
  • examinees **verbalize** their **thoughts** as they take the test
  • useful in understanding how **individual items function** in a test and how testtakers **interpret or misinterpret** the **meaning** of individual items

p.274

55
Q

true-false item

A
  • a **binary-choice** item
  • i.e., offers only two possible responses
  • requires the testtaker to indicate whether a statement **is or is not a fact**

p.254

56
Q

validity shrinkage

A
  • the **decrease** in item validities that inevitably occurs **after cross-validation**

p.278

57
Q

What is the optimal item difficulty?

A
  • usually the **midpoint** between **1.00** and the **probability** of answering **correctly** by **guessing**
  • the latter is called the **chance success proportion**
  • e.g., a binary-choice item (50% chance of guessing correctly): (.50 + 1.00) / 2 = .75
  • e.g., a five-option multiple-choice item (20% chance of guessing correctly): (.20 + 1.00) / 2 = .60

p.263

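Stated as a general formula (matching the arithmetic on this card):

$$p_{\text{optimal}} = \frac{\text{chance success proportion} + 1.00}{2}$$

so item formats with a higher chance success proportion (e.g., true-false) call for a higher optimal *p* than formats with more response options.
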
58
Q

How can you create a **visual representation** of the **best items** on a test (i.e., if the objective is to **maximise criterion-related validity**)?

A
  • this can be achieved by **plotting** each item’s
    • item-validity index and
    • item-reliability index

p.265, Fig. 8-5

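A minimal plotting sketch of that figure’s idea; the index values below are invented, and the general shape (best items toward the upper right) is the point, not the numbers.

```python
# Scatter each item's item-reliability index (x) against its item-validity
# index (y). Items toward the upper right are the strongest candidates when
# the goal is to maximise criterion-related validity. Values are hypothetical.
import matplotlib.pyplot as plt

item_reliability = [0.12, 0.30, 0.22, 0.05, 0.41]   # s * r(item, total score)
item_validity    = [0.10, 0.25, 0.28, 0.02, 0.35]   # s * r(item, criterion)
labels = [f"item {i}" for i in range(1, 6)]

fig, ax = plt.subplots()
ax.scatter(item_reliability, item_validity)
for x, y, name in zip(item_reliability, item_validity, labels):
    ax.annotate(name, (x, y))
ax.set_xlabel("item-reliability index")
ax.set_ylabel("item-validity index")
ax.set_title("Candidate items: best items fall toward the upper right")
plt.show()
```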