Chapter 8 Test Development Flashcards

0
Q

Biased test item:

A

A biased test item is an item that favours one particular group of examinees in relation to another when differences in group ability are controlled.
p.264

1
Q

Anchor Protocol?

A

A test answer sheet developed by a test publisher to check the accuracy of examiners' scoring and to resolve scoring discrepancies.

2
Q

How to detect a biased test item?

A

Through item analysis using item characteristic curves.
Specific items are identified as biased if they exhibit differential item functioning (DIF):
the item characteristic curves (ICCs) for the different groups should not be statistically different.
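
A minimal sketch of the idea in Python, assuming a two-parameter logistic (2PL) ICC and hypothetical item parameters estimated separately for each group (an illustration, not the textbook's exact procedure):

```python
import numpy as np

def icc_2pl(theta, a, b):
    """P(correct response | ability theta) under a 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 61)            # ability range
p_ref = icc_2pl(theta, a=1.2, b=0.0)      # reference group
p_focal = icc_2pl(theta, a=1.2, b=0.6)    # focal group: item harder at the same ability

# A large area between the two curves flags potential DIF.
dif_area = np.trapz(np.abs(p_ref - p_focal), theta)
print(f"area between ICCs: {dif_area:.2f}")
```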

3
Q

What is the order of Test Development from conceptualization?

A
Test conceptualization
Test construction
Test tryout
Item analysis
Test revision (then back to test tryout again)
p.234
4
Q

What is a good item on a norm referenced achievement test?

A

An item that high scorers on the test as a whole tend to answer correctly,
and that low scorers on the test tend to answer incorrectly.

5
Q

What pattern should occur on a criterion referenced test?

A

On a criterion-referenced test, the pattern of results may be the same as on a norm-referenced test:
high scorers get a particular item right whereas low scorers get it wrong.
p.235

6
Q

Criterion-referenced test: difference …

A

Ideally, each item on a criterion-referenced test addresses whether the test taker has met a certain criterion (e.g., readiness to pilot an aircraft).
A norm-referenced interpretation is insufficient when knowledge of mastery is needed.
p.236

7
Q

Pilot work

A

Refers to the preliminary research surrounding the creation of a prototype of the test.
The test developer typically attempts to determine how best to measure a targeted construct.

8
Q

What is scaling?

A

Scaling is the process of setting rules for assigning numbers in measurement.
It is the process by which a measuring device is designed and calibrated and by which numbers (scale values) are assigned to different amounts of the trait, attribute, or characteristic being measured.

9
Q

Stanine scale?

A

A scale on which raw scores are transformed to values that range from 1 to 9 ("standard nine").
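
A rough sketch of one conventional conversion, assuming the usual 4-7-12-17-20-17-12-7-4 percent stanine bands and toy data:

```python
import numpy as np

def to_stanines(raw_scores):
    cumulative_pct = [4, 11, 23, 40, 60, 77, 89, 96]   # cumulative band boundaries
    cuts = np.percentile(raw_scores, cumulative_pct)
    return np.digitize(raw_scores, cuts) + 1           # bins 0..8 -> stanines 1..9

scores = np.random.default_rng(0).normal(50, 10, 500)  # toy raw scores
print(np.bincount(to_stanines(scores), minlength=10)[1:])  # counts per stanine
```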

10
Q

What is the MDBS?

A

The MDBS (Morally Debatable Behaviours Scale) is an example of a rating scale.
30 items, each rated on a 10-point scale from "never justified" to "always justified".
Rating scales are:
A grouping of words, statements, or symbols on which judgements of the strength of a particular trait, attitude, or emotion are indicated by the test taker.
p.239

11
Q

What is a rating scale?

A

Rating scales are:
A grouping of words, statements, or symbols on which judgements of the strength of a particular trait, attitude, or emotion are indicated by the test taker.
They are used to record judgements of oneself, others, experiences, or objects, and they can take several forms.
p.239

12
Q

What is a summative scale?

A

A scale on which the final test score is obtained by summing the ratings across all the items.
p.240
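
A toy illustration of summative scoring; the six 5-point ratings below are hypothetical:

```python
# Summative scoring: the test score is simply the sum of the item ratings.
ratings = [4, 5, 3, 4, 2, 5]   # hypothetical 5-point ratings on six items
score = sum(ratings)
print(score)                   # 23 out of a possible 30
```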

13
Q

What is the Likert Scale?

A

A summative scale used to scale attitudes.
Five alternative responses (sometimes seven),
usually on an agree - disagree or
approve - disapprove continuum.

Use of Likert scales results in ordinal-level data.

14
Q

Unidimensional rating scale?

A

Only one dimension is underlying the ratings.

15
Q

Multidimensional rating scales.

A

More than one dimension is thought to guide the test taker's responses,
i.e., more than one dimension is tapped by an item. p.241

16
Q

Method of paired comparisons?

A

A scaling method that produces ordinal data.
Test takers are presented with pairs of stimuli (two photos, two statements, two objects, etc.).
They must select one of the two according to some rule.
p.241
An advantage is that it forces test takers to choose between items.
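
A toy sketch of turning paired-comparison choices into an ordinal ranking (stimuli and choices hypothetical):

```python
from collections import Counter

# Each tuple records (pair shown, stimulus chosen) for one judgement.
choices = [(("A", "B"), "A"), (("A", "C"), "C"), (("B", "C"), "C")]

wins = Counter({s: 0 for pair, _ in choices for s in pair})
for _, chosen in choices:
    wins[chosen] += 1

ranking = sorted(wins, key=wins.get, reverse=True)  # ordinal scale
print(ranking)                                      # ['C', 'A', 'B']
```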

17
Q

Categorical scaling

A
Relies on sorting:
stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum.
E.g., the MDBS-R:
sorting 30 cards into 3 piles:
behaviours never justified
sometimes justified
always justified
18
Q

Guttman scale:

A

A scaling method that yields ordinal-level measures.
Items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured.
Its defining feature is that all respondents who agree with the stronger statements will also agree with the milder statements.
Assessed by scalogram analysis.
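
A minimal sketch of checking that cumulative property, assuming responses are coded 1 = agree, 0 = disagree and items are ordered from mildest to strongest:

```python
def guttman_reversals(pattern):
    """Count cases of agreeing with a stronger item while rejecting a milder one."""
    return sum(1 for milder, stronger in zip(pattern, pattern[1:])
               if stronger and not milder)

print(guttman_reversals([1, 1, 1, 0, 0]))  # 0 -> perfectly cumulative pattern
print(guttman_reversals([1, 0, 1, 0, 0]))  # 1 -> one reversal
```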

19
Q

Scalogram analysis.

A

An item-analysis procedure and approach to test development that involves a graphic mapping of a test taker's responses.
p.242
Used to evaluate Guttman scales.

20
Q

Item pool

A

An item pool is the reservoir from which items will or will not be drawn for the final version of a test.

21
Q

Item format

A

Variables such as the form, plan, structure, arrangement, and layout of individual test items are collectively referred to as item format.
Two broad types:
Selected-response format
Constructed-response format

22
Q

Selected response format

A

Requires test takers to select a response from a set of alternative responses.
E.g., multiple-choice
matching
true/false

23
Q

Constructed response format.

A

Requires test takers to supply or to create the correct answer, not merely to select it.
Eg essay
short answer

24
Q

Multiple choice format.

A

3 elements:

  1. a stem
  2. a correct alternative or option
  3. several incorrect alternatives or options (distractors or foils)
25
Q

What sort of item is a matching item?

A

In a matching item, the test taker is presented with two columns:
premises on the left and responses on the right.
The test taker's task is to determine which response is best associated with which premise.
p.246

26
Q

Binary choice item.

A
A multiple-choice item that contains only two possible responses.
E.g., true - false
agree - disagree
yes - no
fact - opinion
right - wrong
27
Q

Constructed response format:

A

Completion item
Short answer
Essay

28
Q

Computer administration items:

A

Advantages:
The ability to store items in an item bank (a large collection of test questions).
The ability to individualize testing through item branching.

29
Q

Computerized adaptive testing.

A

CAT refers to an interactive, computer-administered test-taking process wherein items presented to the test taker are based in part on the test taker's performance on previous items.
p.248
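
A toy sketch of one simple adaptive rule (an assumption for illustration, not any published CAT algorithm): present the unanswered item whose difficulty is closest to the current ability estimate.

```python
def next_item(unanswered, theta):
    """Pick the item whose difficulty b is closest to the ability estimate theta."""
    return min(unanswered, key=lambda item: abs(item["b"] - theta))

item_bank = [{"id": 1, "b": -1.0}, {"id": 2, "b": 0.0}, {"id": 3, "b": 1.2}]
theta_estimate = 0.4                         # hypothetical current ability estimate
print(next_item(item_bank, theta_estimate))  # -> {'id': 2, 'b': 0.0}
```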

30
Q

Floor effects

A

A floor effect refers to the diminished utility of an assessment tool for distinguishing test takers at the low end of the ability, trait, or other attribute being measured.
The solution is to add some less difficult items.

31
Q

Ceiling effect

A

A ceiling effect refers to the diminished utility of an assessment tool for distinguishing test takers at the high end of the ability, trait, or other attribute being measured.
I.e., the test is too easy.
The solution is to add some harder items.

32
Q

Item branching

A

The ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.
E.g., branching patterns based on consecutive correct responses.
p. 252

33
Q

Class or category scoring.

A

Test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is similar.

34
Q

Ipsative Scoring

A

A scoring model that compares a test taker's score on one scale within a test with that same test taker's score on another scale within the same test.
p. 253.

35
Q

Item fairness.

Biased item

A

A biased item is one that favours one particular group of examinees in relation to another when differences in group ability are controlled.

36
Q

What do Item Characteristic Curves do?

A

They can be used to identify biased items.
Specific items are identified as biased in a statistical sense if they exhibit differential item functioning…different shapes of item-characteristic curves for different groups.

37
Q

Qualitative Item Analysis

A

A general term for various nonstatistical procedures designed to explore how individual test items work.
These procedures compare individual test items to each other and to the test as a whole.
Qualitative methods include:
interviews
group discussions

38
Q

Think aloud test administration

A

A cognitive assessment approach in which respondents verbalize their thoughts as they occur.
p.266 table

39
Q

Qualitative Analysis

Expert panels

A

E.g., a sensitivity review: a study of test items, conducted during the test development process, in which items are examined for fairness to all prospective test takers and for the presence of offensive language, stereotypes, etc.

40
Q

Test Revision

A

Some items from the original item pool will be eliminated and others will be rewritten.

Items are examined for being too difficult, too easy, biased, etc.

41
Q

Cross-validation

A

Cross-validation refers to the revalidation of a test on a sample of test takers other than those on whom test performance was originally found to be a predictor of some criterion.

42
Q

Validity Shrinkage

A

Validity shrinkage is the decrease in item validities that occurs after cross-validation of findings.
Such shrinkage is expected and is integral to the test development process.

43
Q

Co-validation

A

Co-validation is a test validation process conducted on two or more tests using the same sample of test takers.

44
Q

Co-norming

A

When used in conjunction with the creation of norms or the revision of existing norms, co-validation may also be referred to as co-norming.

A current trend among test publishers who publish more than one test designed for use with the same population is to co-validate and/or co-norm tests, because it is economical.

45
Q

Anchor protocol

A

A mechanism for ensuring consistency in scoring:
a test protocol scored by an authoritative scorer that is designed as a model for scoring and as a mechanism for resolving scoring discrepancies.

46
Q

Scoring drift

A

Scoring drift is a discrepancy between the scoring on an anchor protocol and the scoring of another protocol.

Once protocols are scored, the data from them must be entered into a database.

47
Q

Item banks

A

Each of the items assembled as part of an item bank has undergone rigorous qualitative and quantitative evaluation.

Many items come from existing instruments.
New items may be written.
All items constitute the item pool.
p.274

48
Q

What scales of measurement are there?

A

  • Likert scales (e.g., 1 = strongly disagree to 7 = strongly agree)
  • Binary-choice scales (true/false; like/dislike)
  • Forced choice (e.g., "I am happy most of the time" OR "I am sad most of the time")
  • Semantic differential scales (e.g., strong …… weak)

49
Q

Writing test items

What’s the first step?

A

To create an item pool.

Two general item format options:

  1. selected response items
  2. constructed response items
50
Q

What are the 4 analytic tools that test developers use to analyze and select items?

A
  • Item difficulty index
  • Item discrimination index
  • Item validity index
  • Item reliability index
51
Q

Item difficulty.

How is it calculated?

A

The item difficulty index (p) is calculated as the proportion of test takers who answered the item correctly.

p values range from 0 to 1.

Each item has a corresponding p value: e.g.,
p1 is read "item difficulty index for item 1".
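
A small sketch of the computation on a toy response matrix:

```python
import numpy as np

# Rows = test takers, columns = items; 1 = correct, 0 = incorrect (toy data).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
])
p = responses.mean(axis=0)   # item difficulty index for each item
print(p)                     # [0.75 0.75 0.25 0.75] -> p1 = .75
print(p.mean())              # average item difficulty for the test as a whole
```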

52
Q

What is the ideal level of item difficulty for a test as a whole?

A

It is calculated as the average of all the p values for the test's items.

The optimal average item difficulty is 0.5,
i.e., individual items should range in difficulty from about 0.3 (somewhat difficult) to 0.8 (somewhat easy).
The effect of guessing must be taken into account.

53
Q

Which items do not discriminate between test takers?

A

Items that everyone answers correctly (p = 1),
or that no one answers correctly (p = 0),
do not discriminate between test takers.

54
Q

What is the Item Discrimination Index?

A

The item discrimination index (d) is the degree to which an item differentiates test takers on the behaviour the test is designed to measure.

I.e., an item is working well if most of the high scorers on the test overall answer the item correctly
and most of the low scorers on the test answer the item incorrectly.
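
One common way to quantify this is the extreme-groups formula d = (U - L) / n, where U and L are the numbers of correct answers to the item in the upper and lower scoring groups; a sketch assuming upper and lower 27% groups and toy data:

```python
import numpy as np

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """d = (U - L) / n using the upper and lower fractions of total scorers."""
    order = np.argsort(total_scores)
    n = max(1, int(round(fraction * len(total_scores))))
    lower = item_scores[order[:n]]
    upper = item_scores[order[-n:]]
    return (upper.sum() - lower.sum()) / n

rng = np.random.default_rng(1)
total = rng.integers(10, 50, size=100)             # toy total test scores
item = (rng.random(100) < total / 50).astype(int)  # item correctness tracks ability
print(round(discrimination_index(item, total), 2)) # positive d expected
```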

55
Q

Item difficulty

Formula

A

Optimal item difficulty, taking guessing into account, is the midpoint between the chance success proportion and 1:

(1 + chance success proportion) / 2

e.g., for a four-option multiple-choice item (chance = .25):

(1 + .25) / 2 = .625 ≈ .63 optimal
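
A quick check of the arithmetic for a few hypothetical numbers of answer options:

```python
# Optimal difficulty is the midpoint between chance-level success and 1.0.
for options in (2, 4, 5):
    chance = 1 / options              # probability of a lucky guess
    print(options, (1 + chance) / 2)  # 2 -> 0.75, 4 -> 0.625, 5 -> 0.6
```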