Test Construction Flashcards
Item difficulty
- Measured using an item difficulty index (p) that ranges from 0 to 1
- Equation: p = number of examinees answering the item correctly divided by the total number of examinees
- Optimal difficulty level depends on likelihood of answering correctly by chance, the goal of the testing, etc.
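A minimal sketch of the difficulty index in Python, assuming dichotomously scored (0/1) responses; the function and variable names are illustrative:

```python
# Item difficulty index p: proportion of examinees answering the item correctly.
def item_difficulty(responses):  # responses: 0/1 per examinee, 1 = correct
    return sum(responses) / len(responses)

print(item_difficulty([1, 1, 0, 1, 1, 0, 1, 1, 0, 1]))  # 7/10 correct -> p = 0.7
```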
Item discrimination
- The extent to which an item differentiates between examinees who obtain high versus low scores on the entire test
- D = U - L, where U and L are the proportions of the upper- and lower-scoring groups who answered the item correctly
- Ranges from -1 to +1
- Items with a discrimination index of .35 or higher are typically considered acceptable
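A sketch of the discrimination index using the common top-and-bottom-27% convention; the cutoff fraction and all names here are illustrative:

```python
# D = proportion correct in the upper group minus proportion correct in the
# lower group, with groups defined by total test score.
def item_discrimination(item_scores, total_scores, fraction=0.27):
    n = len(total_scores)
    k = max(1, round(n * fraction))                          # group size
    order = sorted(range(n), key=lambda i: total_scores[i])  # rank by total score
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower  # ranges from -1 to +1
```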
Item characteristic curve
- Constructed for each item
- Plots the proportion of examinees in the sample who answered the item correctly against total test score, performance on an external criterion, or an estimate of the latent ability or trait measured by the item
Item response theory
Item and ability parameters are sample invariant
Classical test theory
Uses 2 methods of item analysis: item difficulty and item discrimination
Limitations of CTT
- Item and test parameters are sample dependent
- Difficult to equate scores obtained on different tests or test forms
Item’s level of difficulty
Ability level at which 50% of the examinees provide a correct response
Item’s ability to discriminate
Indicated by the slope of the curve
The steeper the slope, the greater the discrimination
Probability of guessing correctly
Indicated by the point at which the ICC intercepts the vertical axis
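The three curve features above map onto the parameters of a three-parameter logistic (3PL) IRT model. A sketch with illustrative parameter values (note: with a nonzero guessing parameter c, the difficulty b marks the ability where the probability is halfway between c and 1; the plain 50% reading holds when c = 0):

```python
import math

# 3PL item characteristic curve: P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
# a = discrimination (slope), b = difficulty (location), c = guessing (lower asymptote)
def icc(theta, a=1.0, b=0.0, c=0.20):
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

for theta in (-3, -1, 0, 1, 3):          # ability levels on a z-score-like scale
    print(theta, round(icc(theta), 3))   # probability of a correct response
```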
Test score in Classical Test Theory
X = T + E
T = True score component
E = Error component (measurement error)
Reliability coefficient
Ranges from 0 to 1
Correlation coefficient
Unlike most correlation coefficients, the reliability coefficient is interpreted directly and is never squared
Ex. a reliability coefficient of .89 means that 89% of variability in obtained scores is true score variability
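A quick simulation sketch of these two ideas, X = T + E and reliability as the proportion of observed-score variance that is true-score variability; all numbers are illustrative:

```python
import random

# Simulate X = T + E and check that true-score variance / observed-score
# variance matches the intended reliability.
random.seed(0)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

n = 100_000
true_scores = [random.gauss(100, 15) for _ in range(n)]     # T: variance = 225
errors = [random.gauss(0, 5) for _ in range(n)]             # E: variance = 25
observed = [t + e for t, e in zip(true_scores, errors)]     # X = T + E

# Expected reliability: 225 / (225 + 25) = .90
print(variance(true_scores) / variance(observed))  # ~0.90
```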
Test-retest reliability
Same test to same group of examinees on two different occasions
Yields a coefficient of stability, indicating the consistency of scores over time
May be impacted by a PRACTICE EFFECT
Alternate forms reliability
Two forms of the same test are administered at the same time point
Coefficient of equivalence
May be impacted by CONTENT SAMPLING
Good for speeded tests
Split-half reliability
Scores on two halves of the test are correlated (e.g., odd versus even numbered items)
Usually underestimates a test’s true reliability because each half is only half the full test’s length, and reliability increases with test length
Spearman-Brown Prophecy Formula
Used to correct the split-half reliability coefficient
Estimates the reliability coefficient for a test of a given length (e.g., the full-length test from a split-half coefficient)
Tends to overestimate a test’s true reliability
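A sketch of the prophecy formula, r_new = (n * r) / (1 + (n - 1) * r), where n is the factor by which test length changes; n = 2 corrects a split-half coefficient up to full length:

```python
# Spearman-Brown prophecy formula. n is the length factor: n = 2 doubles
# the (half) test; n = 1.5 lengthens a test by 50%.
def spearman_brown(r, n=2.0):
    return (n * r) / (1 + (n - 1) * r)

print(spearman_brown(0.75))         # split-half r of .75 -> full-length ~.857
print(spearman_brown(0.70, n=1.5))  # reliability if the test is lengthened by 50% -> ~.778
```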
Cronbach’s coefficient Alpha
Calculates the average reliability from all possible splits of the test
This is a conservative measurement of reliability
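A sketch of the usual computational form, alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance), assuming an examinee-by-item score matrix; names and data are illustrative:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(scores):  # scores[examinee][item]
    k = len(scores[0])       # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Example: 4 examinees x 3 items
print(cronbach_alpha([[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4]]))
```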
Kuder-Richardson Formula 20
Used when test items are scored dichotomously
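KR-20 is the special case of coefficient alpha for 0/1 items, where each item’s variance reduces to p * q (p = proportion correct, q = 1 - p). A sketch with illustrative data:

```python
def kr20(scores):  # scores[examinee][item], each entry 0 or 1
    n, k = len(scores), len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]  # item difficulties
    pq_sum = sum(pi * (1 - pi) for pi in p)                    # sum of item variances
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    total_var = sum((t - mean) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq_sum / total_var)

print(kr20([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]))  # 0.75
```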
Kappa statistic/Cohen’s Kappa
Used when scores or ratings represent a nominal or ordinal scale of measurement
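A sketch of kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater’s marginal proportions; data are illustrative:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n  # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    # chance agreement from the raters' marginal category proportions
    p_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(c1) | set(c2))
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa(list("AABBC"), list("AABCC")))  # ~0.71
```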
Test length
The longer the test, the smaller the relative effect of measurement error and the larger the reliability coefficient
The Spearman-Brown prophecy formula can also be used to estimate the impact of lengthening (or shortening) a test; see the sketch above
Range of scores
The reliability coefficient is maximized when the range of scores is unrestricted
Standard error of measurement
Index of the amount of error that can be expected in obtained scores due to the unreliability of the test
SEM = SD × sqrt(1 - rxx), where SD is the standard deviation of test scores and rxx is the reliability coefficient
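A sketch of the SEM and its typical use, building a confidence interval around an obtained score (roughly 68% confidence within ±1 SEM, roughly 95% within ±1.96 SEM, assuming normally distributed error); values are illustrative:

```python
import math

# SEM = SD * sqrt(1 - reliability)
def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

s = sem(15, 0.91)                      # SD = 15, rxx = .91 -> SEM = 4.5
print(s)
print(100 - 1.96 * s, 100 + 1.96 * s)  # ~95% CI around an obtained score of 100
```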
Internal consistency reliability
How well items within a test correlate with other items on the same test
This includes split-half reliability and Cronbach’s coefficient alpha
Content sampling
Impacts split-half reliability and coefficient alpha
Inter-rater reliability
Measured using the kappa statistic or percent agreement (see the kappa sketch above)