Ch. 8 - Test Development Flashcards
5 stages of test development
1. conceptualization 2. construction 3. tryout 4. item analysis 5. test revision
test construction
process of writing possible test items
test tryout
administering a test to a representative sample of testtakers under conditions that simulate those of the final version of the test
some questions to ask when developing a new test
What is the test designed to measure? (what construct)
Is there a need for this test?
Who will use and take the test?
How will the test be administered?
Is there any potential for harm?
How will meaning be attributed to the scores on the test?
on a norm-referenced test, a good item is one that…
high scorers on the whole test get right
on a criterion referenced test, you need to do exploratory/pilot work with…
a group known to have mastered the skill
pilot work/study
Why is it done?
work done surrounding the creation of the prototype of a test
done to determine how to best measure a targeted construct
scaling
setting rules for assigning numbers in measurement; the process by which a measuring device is designed and calibrated and by which numbers (or other indices), AKA scale values, are assigned to different amounts of the thing being measured
stanine scale
all raw scores on the test can be transformed into scores that range from 1 to 9
age and grade-based scales
used when testtakers’ performance as a function of age or grade is of critical interest
Likert scale
very reliable; each item offers a 1-5 or 1-7 response scale (e.g., strongly disagree to strongly agree)
rating scales
provide what kind of data?
grouping of words, statements, or symbols on which judgments of the strength of a particular thing are indicated by the testtaker
ALL rating scales provide ordinal data
method of paired comparisons
testtakers are presented with a pair of stimuli and must choose between them.
provide ordinal data
comparative scaling
testtaker must judge a stimulus in comparison with every other stimulus on the scale
categorical scaling
stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum. For ex: sort into “never justified” “sometimes justified” “always justified”
Guttman scale
items on it range sequentially from weaker to stronger expressions of the thing. (everyone who agrees with the stronger statement agrees with the weaker ones). used in consumer research
AKA scalogram analysis
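The “everyone who agrees with the stronger statement agrees with the weaker ones” property can be sketched as a quick check (a minimal illustration; the function name is hypothetical, and it assumes each response vector is already ordered weakest → strongest):

```python
# Scalogram sketch: a response vector fits a Guttman pattern if it is
# 1s followed by 0s (endorsing a stronger item implies endorsing all
# weaker ones). Assumes items ordered weakest -> strongest.
def fits_guttman_pattern(responses):
    return all(a >= b for a, b in zip(responses, responses[1:]))
```

E.g., `[1, 1, 0, 0]` fits the pattern, while `[1, 0, 1, 0]` (endorsing a stronger item but not a weaker one) does not.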
direct estimation vs indirect estimation
in direct estimation, you don’t need to transform a testtaker’s responses into some other scale. in indirect, you do need to transform those responses.
equal-appearing intervals method
the only rating scale described that has items that are interval in nature (ex: suicide scale) - there are presumed to be equal distances between the values on the scale (interval scale)
How many test items should an item pool contain for a multiple-choice test?
twice the number of items planned for the final version of the test
item pool
assembly of many items (from brainstorming all possibilities or many possibilities of test items)
selected-response format (item types)
multiple choice
matching
true-false (binary-choice item)
constructed-response format (item types)
completion item
short answer
essay
looking for synthesis of info
item bank
collection of GOOD test items that will continue to be selected, used, or rotated; the finalized version of the item pool
CAT
computerized adaptive testing - test-taking process wherein items presented to the testtaker are based on performance on previous items. may be displayed according to rules (e.g. only after you get 2 hard ones right, show the next level).
floor effect vs. ceiling effect reduced by?
CAT tends to reduce both. floor effect - failure to distinguish among testtakers at the low end of ability
ceiling effect - failure to distinguish among testtakers at the high end of ability
item branching
the ability of a computer to tailor the content and order of presentation of items on the basis of responses to previous items
CAT found to reduce…
the # of test items needed (by about 50%) and measurement error (by about 50%)
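The branching idea can be shown with a toy rule (a deliberate simplification, not an actual CAT algorithm; the function name is hypothetical and it assumes an item bank pre-sorted by difficulty):

```python
# Toy item-branching rule: move to a harder item after a correct
# response, an easier item after an error, staying inside the bank.
def next_item(current_index, was_correct, n_items):
    step = 1 if was_correct else -1
    return max(0, min(n_items - 1, current_index + step))
```

Real CAT systems select items to maximize information about the testtaker's ability estimate, but the tailoring principle is the same.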
class scoring AKA
category scoring; testtakers are placed in a certain group/class with other testtakers whose pattern of responses is similar in some way (e.g., diagnosis)
cumulative scoring model
higher the score, the higher the testtaker is on the thing being measured
ipsative scoring
compares a testtaker’s score on one thing within a test to another thing on the same test (thing = scale). comparing yourself with yourself.
e.g. Jon is cooler than he is smart BUT you can’t say Jon is cooler than Jenny
test tryout - how many people?
no fewer than 5 testtakers for EACH item on the test; the more the better (preferably 10 per item)
How do you tell whether a test item is good?
item analysis
generally, a good test item is answered correctly by high scorers on the test as a whole
item analysis
the statistical scrutiny of test data
what do you scrutinize in item analysis?
item’s: difficulty, validity, reliability, item discrimination (IDDRV)
item-difficulty index
expected values
denoted by lowercase italicized p
p = # of testtakers who answered the item correctly / total # of testtakers
greater the p, the easier the item
if >.90 - is it really needed? other than as a giveaway/warmup for those with test anxiety
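The p formula above is a one-liner in code (a minimal sketch; the function name is hypothetical):

```python
# Item-difficulty index: p = number answering correctly / total testtakers.
# item_scores is a list of 0/1 values, one per testtaker.
def item_difficulty(item_scores):
    return sum(item_scores) / len(item_scores)

# e.g. item_difficulty([1, 1, 1, 0]) -> 0.75 (a fairly easy item)
```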
what is an item-difficulty index called on a personality test?
item-endorsement index
item-reliability index
shows the internal consistency of a test
higher the value, the greater the test’s internal consistency
= s * r (item standard deviation * correlation between item score and total test score)
item-validity index
measures the degree to which a test is measuring what it’s supposed to measure
item-discrimination index
measures how adequately an item separates/discriminates between high and low scorers on the entire test
yields a lowercase italicized d
d compares performance on a particular item with performance on the upper and lower regions of a distribution of continuous test scores
the higher the d, the more high scorers (relative to low scorers) are answering the item correctly
what does a high d value mean
the higher the d, the more high scorers (relative to low scorers) are answering the test item correctly
Bonus: if it’s a -d, that means that more low-scorers answer it correctly than high scorers
Bonus: if d = 0, same # of high and low scorers get it right
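The upper/lower comparison behind d can be sketched as follows (the 27% grouping fraction is a common convention but an assumption here, and the function name is hypothetical):

```python
# Item-discrimination index: d = (correct in upper group - correct in
# lower group) / group size, where groups are the top and bottom
# fraction of testtakers ranked by total test score.
def item_discrimination(item_scores, total_scores, fraction=0.27):
    n = max(1, int(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    u = sum(item_scores[i] for i in upper)
    low = sum(item_scores[i] for i in lower)
    return (u - low) / n
```

d ranges from -1 (only low scorers get it right) through 0 (no discrimination) to +1 (only high scorers get it right).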
what does a high p mean
the test item is easy
analysis of item alternatives
for multiple-choice items, see how many people answered the distractors and evaluate them appropriately (e.g., maybe too distracting/wording needs to be changed)
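Tallying the alternatives and flagging suspect distractors can be sketched like this (an illustration only; the function name and flagging rule are hypothetical):

```python
from collections import Counter

# Analysis of item alternatives: count how often each alternative was
# chosen and flag any distractor picked MORE often than the keyed
# (correct) answer -- a sign it may be too distracting or miskeyed.
def analyze_alternatives(responses, keyed_answer):
    counts = Counter(responses)
    flagged = sorted(alt for alt, n in counts.items()
                     if alt != keyed_answer and n > counts[keyed_answer])
    return counts, flagged
```

Distractors chosen by almost no one also merit review, since they contribute nothing to the item.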
item-characteristic curve
ICC - a graphic representation of item difficulty and discrimination
steep slope in ICC means?
greater item discrimination
“good” item looks like what in ICC
straight line with a slope
“good” item for a cutoff-score test or criterion-based test
looks like the top of an F
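An ICC can be illustrated numerically with a two-parameter logistic model (an assumption for illustration; the chapter describes ICCs graphically, not by this formula):

```python
import math

# Two-parameter logistic ICC: probability of a correct response at
# ability level theta, where a = slope (discrimination) and
# b = difficulty (location of the curve's midpoint).
def icc(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

A larger a gives a steeper slope (better discrimination), and at theta = b the probability of a correct response is exactly .5.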
guessing - what issues does it present in item analysis?
- guesses are not made totally randomly
- how do we deal with omitted items?
- some people are luckier guessers than others
item fairness
the degree (if any) to which a test item is biased
biased test item
favors one particular group of examinees even when differences in group ability are controlled for
if an item is fair, its ICC should…
not be significantly different for different groups regardless of ability
item analysis in speed tests
yield misleading or uninterpretable results because items that are closer to the end appear more difficult just because few people were able to finish them
what are methods of qualitative item analysis
interviews, group discussions, “think aloud” test administration (sheds light on thought patterns), and sensitivity reviews
sensitivity reviews
conducted by an expert panel; items on a test are examined for fairness to all prospective testtakers and flagged for offensive language and stereotypes
test revision (as a stage in test development) - strategy
characterize each item according to its strengths and weaknesses
consider the purpose of the test - if for hiring and firing, eliminate biased items
if for culling most skilled performers - get items with the best item discrimination to ID the best of the best
standardization
process used to introduce objectivity and uniformity into test administration, scoring, and interpretation
What do you need to do after item analysis?
administer the revised test under standardized conditions, then cross-validation
When should you revise a test?
- stimulus materials look dated
- dated vocabulary or offensive language
- test norms no longer adequate (group membership changes)
- age-related shifts in abilities over time
- to improve the reliability or validity of the test
- the theory on which the test was based has improved
steps to revising an existing test
- all the steps to make a new one (conceptualization, construction, tryout, item analysis, revision) + determining whether there is equivalence between the old and new versions of the test. scores likely will not mean the same thing (use item analysis to evaluate the stability of items between revisions of the same test)
cross-validation
what is inevitable?
re-validation of a test on a sample of testtakers other than the original group the test was found to be valid on. (aCROSS groups)
validity shrinkage is inevitable
co-validation
test validation process conducted on two or more tests using the same sample of testtakers (economical: test subjects are identified once, saving personnel costs)
co-norming
benefits?
co-validation on two tests and creating norms or revising existing norms
good for test users if the tests are often used together because they are normed on the same population (differences due to sampling error are essentially eliminated)
like co-validation, saves money
quality assurance in test revision
confirming that a test is administered and scored the same way across examiners
anchor protocol
test protocol scored by a highly trained scorer, designed as a model for scoring and mechanism for resolving scoring discrepancies
protocol drift
the discrepancy between an anchor protocol and another scorer’s protocol
differential item functioning
(DIF) - when an item functions differently in one group of testtakers as compared to another group of testtakers known to have the same/similar level of the underlying trait. This means that for some reason respondents from different groups have different probabilities of endorsing an item as a function of their group membership (ex: Asian women fear cultural shame around feeling depressed)