Chapter 8 Flashcards
What is an ideal test item for a norm-referenced test?
What about criterion?
Top scorers should get it correct, while low scorers should get it wrong.
This does not matter for a criterion-referenced test: there, an ideal item is judged by how well it assesses mastery.
Scaling definition
The process of setting rules for assigning numbers in measurement.
Stanine scale
Raw scores are transformed into scores ranging from 1 to 9 (mean 5, standard deviation 2).
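A minimal Python sketch of a linear stanine conversion (function name is mine, not from the chapter): standardize each raw score, rescale to mean 5 / SD 2, then round and clip to 1-9.

```python
import statistics

def to_stanines(raw_scores):
    """Convert raw scores to stanines via z-scores rescaled to
    mean 5, SD 2, rounded and clipped to the 1-9 range."""
    mean = statistics.mean(raw_scores)
    sd = statistics.stdev(raw_scores)
    stanines = []
    for x in raw_scores:
        z = (x - mean) / sd
        s = round(5 + 2 * z)
        stanines.append(min(9, max(1, s)))
    return stanines

print(to_stanines([40, 50, 55, 60, 70, 85, 30]))
```

In practice stanines are often assigned by fixed percentile bands (4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, 4%) rather than this simple linear rescaling; the sketch only illustrates the 1-9, mean-5, SD-2 idea.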
Rating scale
Records judgements of oneself, others, experiences, or objects
Summative scale
The final test score is the sum of the scores on the individual items.
Method of paired comparisons
Testtakers are asked to choose between two options presented as a pair.
Comparative Scaling
Options are sorted or ranked relative to one another on the basis of judgements (e.g., ranking cards).
Categorical scaling
Sort objects into categories (e.g., sorting cards into "never justified," "sometimes justified," and "always justified").
Guttman scale
Items range from weaker to stronger expressions of the attitude or trait measured.
Testtakers who agree with the stronger statements will also agree with the milder ones.
Direct vs Indirect estimation
Direct estimation requires no transformation of testtakers' responses onto another scale.
Indirect estimation (e.g., the method of equal-appearing intervals) does require such a transformation.
Selected-response vs. Constructed-response formats
Both are item formats: selected-response items have the testtaker choose among options provided, while constructed-response items have the testtaker generate their own answer.
3 types of selected-response item formats
Multiple-choice, matching, and true/false.
What are the names of the two columns in matching
Premises and responses
Completion item
Fill in the blank item
Computerized adaptive testing (CAT)
Testing in which the items presented are based on performance on previous items.
What are the advantages of CAT?
It can reduce both the number of items needed and measurement error (each by around 50%).
Floor vs. Ceiling effects
Floor effect: the assessment tool fails to distinguish testtakers at the low end of what is measured (all items are too difficult).
Ceiling effect: the tool fails to distinguish testtakers at the high end (all items are too easy).
Item Branching
Ability to customize content and order on the basis of previous responses.
Class Scoring or Category Scoring
Respondents are placed in a class or category with other respondents on the basis of their responses.
Ipsative scoring
What conclusions can be drawn from it?
A score on one scale is compared with a score on another scale within the same test.
Appropriate only for intraindividual comparisons, not interindividual ones.
What makes a good test item?
Can discriminate testtakers.
If all high scorers get a particular item wrong, that is a bad sign; likewise if all low scorers get it right.
Item analysis
Statistical procedures to analyze and identify good items for a test.
4 possible analyses for test items
Item difficulty, item reliability, item validity, and item discrimination.
How to calculate index of item’s difficulty?
The item-difficulty index (called the item-endorsement index for tests, such as personality tests, with no "correct" answer).
It is simply a proportion: the number of testtakers answering correctly divided by the total number of testtakers.
What’s the ideal item difficulty? What should the range be?
The range should be about .3 to .8; .5 is ideal for discrimination.
When guessing is possible, the optimal difficulty is (chance success proportion + 1) / 2, e.g., (.5 + 1) / 2 = .75 for true/false items.
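The two calculations above can be sketched in Python (function names are mine, not from the chapter):

```python
def item_difficulty(responses):
    """Item-difficulty index p: proportion of testtakers answering correctly.
    `responses` is a list of 0/1 scores for one item."""
    return sum(responses) / len(responses)

def optimal_difficulty(chance):
    """Optimal difficulty when guessing is possible: the midpoint
    between chance-level success and a perfect 1.0."""
    return (chance + 1) / 2

print(item_difficulty([1, 1, 0, 1, 0]))  # 3 of 5 correct -> 0.6
print(optimal_difficulty(0.25))          # 4-option multiple choice -> 0.625
print(optimal_difficulty(0.5))           # true/false -> 0.75
```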
Item-reliability index
The item's standard deviation multiplied by the correlation between the item score and the total test score; an indication of internal consistency.
Item-validity index
The degree to which an item measures what the test purports to measure.
Item-discrimination index
Lowercase "d": the difference between the proportion of high scorers (upper 25-33%) answering an item correctly and the proportion of low scorers (lower 25-33%) answering it correctly.
A negative d is a bad sign: it means low scorers answer the item correctly more often than high scorers do.
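A minimal Python sketch of d (function name and data are mine; 27% is used as the cutoff, a common choice within the 25-33% range):

```python
def discrimination_index(results, fraction=0.27):
    """Item-discrimination index d: proportion of the upper group answering
    the item correctly minus the proportion of the lower group doing so.
    `results` is a list of (total_test_score, item_correct) pairs,
    where item_correct is 1 or 0."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(correct for _, correct in upper) / n
    p_lower = sum(correct for _, correct in lower) / n
    return p_upper - p_lower

# 10 testtakers: the high scorers all get the item right,
# the low scorers all miss it -- a perfectly discriminating item.
results = [(95, 1), (90, 1), (88, 1), (80, 1), (75, 0),
           (70, 1), (65, 0), (60, 0), (55, 0), (50, 0)]
print(discrimination_index(results, fraction=0.3))  # d = 1.0
```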
Item-characteristic curve
A graphic representation of item difficulty and discrimination.
Plots the probability of a correct response (y-axis) against ability (x-axis).
What are biased test items?
Items that compromise fairness by favoring one group of testtakers over another.
How do ICCs help identify bias?
An item may be biased if its ICC differs across groups even when the groups' total scores are the same.
How should item analysis be handled for speed tests? What is the problem with analyzing speed tests?
Items at the end may be rushed or left unreached, leading to misleading interpretations of the analyses.
One solution: administer the test with generous time limits when collecting data for item analysis.
“think aloud” test administration
The testtaker verbalizes their thought process aloud while responding to items. A qualitative research tool used to see whether testtakers are using the intended line of reasoning.
Sensitivity review
A review of test items for fairness, looking for stereotypes, offensive language, etc.
Cross-validation
Revalidation of a test on a sample other than the one used for the original validation.
Validity shrinkage
The items retained for the final version of a test tend to show lower item validities when cross-validated on a new sample.
Co-validation
Conducting a validation study of two or more tests using the same sample of testtakers.
Co-norming
Creating or revising norms for two or more tests using the same sample.
Anchor Protocol
A test protocol scored by a highly authoritative scorer, used as a model for resolving scoring discrepancies.