Test construction Flashcards
item difficulty equation
P = (number of examinees passing the item) / (total number of examinees)
a p value of 0.50 means that 50% of examinees answered the item correctly
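As a minimal sketch with hypothetical item responses (1 = correct, 0 = incorrect), the p value is just the proportion correct:

```python
# Hypothetical responses to one item: 1 = correct, 0 = incorrect
responses = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]

# Item difficulty: proportion of examinees passing the item
p = sum(responses) / len(responses)  # 6 of 10 correct -> 0.6
```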
item discrimination equation
D = (percent of upper-scoring examinees answering the item correctly) minus (percent of lower-scoring examinees answering the item correctly)
D value of +1.0 means that all examinees in the upper-scoring group got the item right, and none of the examinees in the lower-scoring group got it right
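The same subtraction, sketched with hypothetical upper- and lower-group responses:

```python
# Hypothetical item responses for the upper- and lower-scoring groups
upper = [1, 1, 1, 1, 0]   # 80% of upper group answered correctly
lower = [0, 1, 0, 0, 0]   # 20% of lower group answered correctly

# Item discrimination: proportion correct in upper minus lower group
D = sum(upper) / len(upper) - sum(lower) / len(lower)
```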
item response theory
uses the item characteristic curve
ability to discriminate between high and low achievers is indicated by the slope of the curve
probability of answering correctly by guessing is indicated by the lower asymptote of the curve
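A sketch of one common item characteristic curve, the three-parameter logistic (parameter values here are hypothetical): a is the discrimination (slope), b the difficulty, and c the guessing parameter (lower asymptote).

```python
import math

def icc(theta, a=1.2, b=0.0, c=0.25):
    """3PL item characteristic curve: probability of a correct
    response at ability level theta (hypothetical parameters)."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))
```

At theta = b the probability is halfway between c and 1.0; as theta falls, the curve flattens toward the guessing level c.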
reliability coefficient
consistency of test scores
when a test has a reliability coefficient of 0.89, 89% of the variability in obtained scores is true variability
Spearman-Brown prophecy formula
provides an estimate of what the reliability coefficient would have been if it had been based on the full length of the test instead of just half the items
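The prophecy formula can be sketched as follows, with n as the factor by which the test is lengthened (n = 2 projects a split-half coefficient to the full-length test):

```python
def spearman_brown(r_half, n=2):
    """Estimate full-length reliability from the reliability of a
    shortened test; n is the lengthening factor (2 = doubling)."""
    return n * r_half / (1 + (n - 1) * r_half)

# Split-half reliability of 0.80 projects to 1.6 / 1.8 for the full test
r_full = spearman_brown(0.80)
```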
Cronbach’s coefficient alpha
a measure of internal consistency reliability; equals the average of all possible split-half coefficients
administering a test to a single group of examinees, and using a formula to determine inter-item consistency
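A minimal sketch of the alpha computation from a single administration (item scores below are hypothetical; population variances are used throughout):

```python
def cronbach_alpha(items):
    """Coefficient alpha. items = one list of scores per item,
    each inner list ordered by examinee."""
    k = len(items)                      # number of items
    n = len(items[0])                   # number of examinees

    def var(xs):                        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))
```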
Kuder-Richardson Formula 20
a special case of coefficient alpha; like alpha, equals the average reliability obtained from all possible splits of the test
when test items are scored dichotomously (right or wrong), used to determine internal consistency
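Sketched with hypothetical dichotomous item data, where pq is summed across items (p = proportion passing, q = 1 − p):

```python
def kr20(items):
    """KR-20 internal consistency for dichotomously scored items.
    items = one list of 0/1 scores per item, ordered by examinee."""
    k = len(items)
    n = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n   # total-score variance
    pq = sum((sum(it) / n) * (1 - sum(it) / n) for it in items)
    return k / (k - 1) * (1 - pq / var_t)

# Hypothetical 3-item, 4-examinee data set
r = kr20([[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]])
```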
kappa statistic
used to assess inter-rater reliability
nominal or ordinal scale of measurement
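Cohen's kappa for two raters can be sketched as observed agreement corrected for chance agreement (ratings below are hypothetical):

```python
def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters' nominal ratings."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    cats = set(r1) | set(r2)
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n)        # chance agreement
                for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)

k = cohen_kappa(["A", "A", "B", "B"], ["A", "A", "B", "A"])
```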
alternate forms reliability
generally considered the most thorough method for estimating reliability
internal consistency reliability
not appropriate for speed tests
standard error of measurement (SEM)
used to construct a confidence interval around an obtained score
an index of the amount of error that can be expected in obtained scores due to the unreliability of the test
standard error of estimate (SEE)
used to construct a confidence interval around an examinee’s predicted criterion scores
an index of error when predicting criterion scores from predictor scores
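The parallel formula for the SEE uses the criterion's standard deviation and the squared validity coefficient (hypothetical values below):

```python
import math

def see(sd_y, rxy):
    """Standard error of estimate: sd_y * sqrt(1 - validity^2)."""
    return sd_y * math.sqrt(1 - rxy ** 2)

# Criterion SD = 10, validity coefficient = 0.60 -> SEE = 8.0
e = see(10, 0.60)
```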
content validity
to obtain information about an examinee’s familiarity with a particular content or behavior domain
construct validity
to determine the extent to which an examinee possesses a particular hypothetical trait
criterion-related validity
to estimate or predict an examinee’s standing or performance on an external criterion