Chapter 8 Flashcards

1
Q

What is an ideal test item on a norm-referenced test?

What about on a criterion-referenced test?

A

Top scorers should get it correct, while low scorers should get it wrong.

That distinction doesn't matter for a criterion-referenced test: there, an ideal item is judged by how well it assesses mastery of the criterion.

2
Q

Scaling definition

A

The process of setting rules for assigning numbers in measurement.

3
Q

Stanine scale

A

Raw scores are transformed into scores ranging from 1 to 9.

4
Q

Rating scale

A

Records judgements of oneself, others, experiences, or objects

5
Q

Summative scale

A

The final test score is the sum of the scores on all items.

6
Q

Method of paired comparisons

A

The testtaker is presented with two options and asked to choose one according to some rule (e.g., the one they agree with more).

7
Q

Comparative scaling

A

Stimuli are sorted relative to one another according to some judgement (e.g., ranking a set of cards).

8
Q

Categorical scaling

A

Sort objects into categories (e.g., sorting cards into "justified," "sometimes justified," and "always justified" piles).

9
Q

Guttman scale

A

Items range from weaker to stronger expressions of the attitude or trait being measured.

A testtaker who agrees with the stronger statements will also agree with the milder ones.

10
Q

Direct vs Indirect estimation

A

Direct estimation (like the method of equal-appearing intervals) requires no transformation of the testtaker's responses into another scale.

Indirect estimation requires transforming responses into another scale.

11
Q

Selected-response vs. Constructed-response formats

A

Two item formats: selected-response items require choosing from options provided; constructed-response items require the testtaker to supply or create the answer.

12
Q

3 types of selected-response item formats

A

Multiple choice, matching, and true/false (binary choice).

13
Q

What are the names of the two columns in a matching item?

A

Premises and responses

14
Q

Completion item

A

A fill-in-the-blank item.

15
Q

Computerized adaptive testing

What are the advantages of CAT?

A

The items presented are selected based on performance on previous items.

CAT reduces the number of items needed and reduces measurement error (each by roughly 50%).

16
Q

Floor vs. Ceiling effects

A

Floor effect: the instrument cannot distinguish among testtakers at the low end of what's measured (all items are too hard).

Ceiling effect: the same problem at the high end (all items are too easy).

17
Q

Item Branching

A

The ability to customize the content and order of presented items on the basis of responses to previous items.
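
A minimal sketch of branching logic in Python, assuming a hypothetical item pool tagged by difficulty (all names are illustrative, not from the chapter):

```python
# Item branching: choose the next item based on the response to the previous one.
# Hypothetical pool grouped by difficulty level.
pool = {"easy": ["E1", "E2"], "medium": ["M1", "M2"], "hard": ["H1", "H2"]}
levels = ["easy", "medium", "hard"]

def next_item(previous_correct, current_level):
    """Step up a level after a correct response, down after an incorrect one."""
    i = levels.index(current_level)
    i = min(i + 1, len(levels) - 1) if previous_correct else max(i - 1, 0)
    return pool[levels[i]].pop(0), levels[i]

print(next_item(True, "medium"))   # ('H1', 'hard')
print(next_item(False, "medium"))  # ('E1', 'easy')
```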

18
Q

Class Scoring or Category Scoring

A

Responders are placed in a class or category with other responders whose patterns of responses are similar.

19
Q

Ipsative scoring

What conclusions can be drawn from it?

A

A testtaker's score on one scale is compared with their score on another scale within the same test.

Conclusions are only appropriate for intraindividual comparisons, not interindividual ones.

20
Q

What makes a good test item?

A

It discriminates among testtakers.

If all high scorers get a particular item wrong, that's a bad sign; so is the opposite (all low scorers getting it right).

21
Q

Item analysis

A

Statistical procedures to analyze and identify good items for a test.

22
Q

4 possible analyses for test items

A

Item difficulty, item reliability, item validity, and item discrimination.

23
Q

How is the index of an item's difficulty calculated?

(the item-difficulty index, or the item-endorsement index for non-ability tests)

A

It's a proportion: the number of testtakers who answered the item correctly divided by the total number of testtakers.
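
As a quick sketch with made-up item scores (1 = correct, 0 = incorrect):

```python
# Item-difficulty index p: proportion of testtakers who answered the item correctly.
scores = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical data

p = sum(scores) / len(scores)
print(p)  # 0.7 -> a relatively easy item
```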

24
Q

What’s the ideal item difficulty? What should the range be?

A

.3 to .8 is the usual range; .5 is ideal for maximum discrimination.

Adjusted for guessing, the optimal difficulty is the midpoint between chance success and 1: (chance success proportion + 1) / 2.
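
For instance, on a four-option multiple-choice item the chance success proportion is .25, so:

```python
# Optimal difficulty adjusted for guessing: midpoint between chance success and 1.0.
chance = 1 / 4            # four-option multiple-choice item
optimal = (chance + 1) / 2
print(optimal)            # 0.625
```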

25
Q

Item-reliability index

A

The item's standard deviation multiplied by the correlation between the item score and the total test score. An indication of the test's internal consistency.
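
A minimal sketch of that computation with hypothetical scores (statistics.correlation requires Python 3.10+; the card doesn't specify population vs. sample SD, so population SD is assumed here):

```python
import statistics

# Hypothetical data: 0/1 scores on one item and total test scores for the same testtakers.
item = [1, 0, 1, 1, 0, 1, 0, 1]
total = [42, 30, 45, 40, 28, 44, 33, 39]

s_item = statistics.pstdev(item)            # item standard deviation
r_it = statistics.correlation(item, total)  # item-total correlation
print(s_item * r_it)                        # item-reliability index
```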

26
Q

Item-validity index

A

An indication of the degree to which a test measures what it purports to measure.

27
Q

Item-discrimination index

A

Symbolized by a lowercase “d”: the proportion of high scorers (upper 25–33%) answering an item correctly minus the proportion of low scorers (lower 25–33%) answering it correctly.

A negative d is a bad sign: it means low scorers answered the item correctly more often than high scorers did.
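
A worked example with hypothetical counts:

```python
# d = proportion correct among high scorers (upper 25-33%) minus proportion
# correct among low scorers (lower 25-33%). Counts below are made up.
upper_correct, upper_n = 24, 30
lower_correct, lower_n = 9, 30

d = upper_correct / upper_n - lower_correct / lower_n
print(round(d, 2))  # 0.5; a negative d would flag a flawed item
```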

28
Q

Item-characteristic curve

A

A graphic representation of item difficulty and discrimination.

Probability of a correct response (y-axis) is plotted against ability (x-axis).
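
The card doesn't fix a functional form; as an illustration only (an assumption, not the chapter's own formula), the logistic curve used in item response theory plots this relationship, with b shifting the curve along the ability axis (difficulty) and a controlling its steepness (discrimination):

```python
import math

def icc(theta, a=1.0, b=0.0):
    """Probability of a correct response at ability theta (logistic form, assumed)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta, a=1.5, b=0.5), 2))
```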

29
Q

What are biased test items?

A

Items that compromise fairness by favoring one group of testtakers over another.

30
Q

How do ICCs help identify bias?

A

An item may be biased if it yields different ICCs for different groups even when their total scores are the same.

31
Q

How should item analysts deal with speed tests? What's the problem with analyzing speed tests?

A

Items at the end may be rushed or never reached, so their statistics reflect position rather than quality, leading to wrong interpretations of the analyses.

For item-analysis purposes, administer the test with a generous time limit.

32
Q

“think aloud” test administration

A

The testtaker verbalizes his or her thought process out loud while working through the test. A qualitative research tool, used to see whether testtakers are using the intended line of thought.

33
Q

Sensitivity review

A

Examining the fairness of a test: looking for stereotypes, offensive language, and the like.

34
Q

Cross-validation

A

Revalidation of a test on a new sample of testtakers.

35
Q

Validity shrinkage

A

The decrease in item validities that typically occurs when the final version of a test is cross-validated on a new sample.

36
Q

Co-validation

Co-norming

A

Co-validation: validating two or more tests using the same sample of testtakers.

Co-norming: creating norms (or revising existing norms) for two or more tests using the same sample.

37
Q

Anchor Protocol

A

A test protocol scored by a highly authoritative scorer, used as a model for scoring and for resolving scoring discrepancies.
