Test Construction Flashcards
Standard Error of Measurement/CI
The SEM is used to construct a CI around a specific obtained test score. Its magnitude depends on the test's SD and reliability coefficient.
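The standard formula is SEM = SD * sqrt(1 - r_xx), where r_xx is the reliability coefficient. A minimal Python sketch (all numbers hypothetical):

    import math

    sd = 15.0      # hypothetical test SD
    r_xx = 0.91    # hypothetical reliability coefficient
    score = 110    # hypothetical obtained score

    sem = sd * math.sqrt(1 - r_xx)                   # standard error of measurement
    lo, hi = score - 1.96 * sem, score + 1.96 * sem  # 95% CI around the obtained score
    print(f"SEM = {sem:.2f}; 95% CI = [{lo:.1f}, {hi:.1f}]")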
Criterion Contamination
Bias introduced into a criterion score when the person rating the criterion knows how the examinee performed on the predictor.
Item Difficulty
Determined by dividing the number of examinees who answered the item correctly by the total number of examinees. A p value of 0 means the item is very difficult; 1.0 means it is very easy. An average difficulty of .50 is usually preferred because it maximizes differentiation among examinees.
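A minimal sketch of the calculation, with hypothetical item responses scored 1 = correct, 0 = incorrect:

    responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # hypothetical responses to one item
    p = sum(responses) / len(responses)         # item difficulty (p value)
    print(f"p = {p:.2f}")                       # 0.60: moderately easy item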
Reliability
Consistency of test scores over time, across forms, or across items. Methods include test-retest, coefficient alpha, interrater, split-half, and alternate forms. A reliability coefficient of .80 means that 80% of observed score variability is TRUE score variability (and 20% is measurement error).
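Test-retest reliability, for example, is just the Pearson correlation between two administrations; a sketch with hypothetical scores:

    import numpy as np

    time1 = np.array([98, 105, 110, 87, 120, 101, 93, 115])  # hypothetical scores, first administration
    time2 = np.array([101, 103, 112, 90, 118, 99, 95, 117])  # same examinees, retest

    r_xx = np.corrcoef(time1, time2)[0, 1]  # test-retest reliability coefficient
    print(f"r_xx = {r_xx:.2f}; about {r_xx:.0%} of observed variance is true score variance")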
Cross-validation and shrinkage
Re-assessing criterion-related validity in a new sample to see how generalizable the validity coefficient is. The coefficient typically shrinks because the chance factors that inflated it in the original sample are not present in the new one.
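A minimal simulation of the idea (all data hypothetical): regression weights derived in one sample capitalize on chance, so the validity coefficient drops when those weights are applied to a new sample.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 50, 5                          # small sample, several predictors: prime conditions for shrinkage
    X1, X2 = rng.normal(size=(n, k)), rng.normal(size=(n, k))
    y1 = X1[:, 0] + rng.normal(size=n)    # criterion actually depends only on the first predictor
    y2 = X2[:, 0] + rng.normal(size=n)

    b, *_ = np.linalg.lstsq(X1, y1, rcond=None)  # regression weights from the original sample
    r_orig = np.corrcoef(X1 @ b, y1)[0, 1]       # validity in the derivation sample
    r_cross = np.corrcoef(X2 @ b, y2)[0, 1]      # validity in the cross-validation sample
    print(f"original r = {r_orig:.2f}, cross-validated r = {r_cross:.2f}")  # r_cross is typically smaller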
Standard Error of Estimate/CI
Index of error when predicting criterion scores from predictor scores. Used to construct a CI around a predicted criterion score. Its magnitude depends on the criterion's SD and the validity coefficient.
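The formula is SEE = SD_y * sqrt(1 - r_xy^2), where SD_y is the criterion SD and r_xy the validity coefficient. A quick hypothetical sketch:

    import math

    sd_y = 10.0   # hypothetical criterion SD
    r_xy = 0.60   # hypothetical validity coefficient
    y_hat = 75.0  # hypothetical predicted criterion score

    see = sd_y * math.sqrt(1 - r_xy ** 2)            # standard error of estimate
    lo, hi = y_hat - 1.96 * see, y_hat + 1.96 * see  # 95% CI around the predicted score
    print(f"SEE = {see:.2f}; 95% CI = [{lo:.1f}, {hi:.1f}]")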
Classical Test Theory
Observed variability in test scores reflects: 1) true differences between examinees on the attribute, and 2) effects of random error.
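In symbols, X = T + E (observed score = true score + random error), so the observed variance decomposes as var(X) = var(T) + var(E). A tiny simulation with hypothetical values:

    import numpy as np

    rng = np.random.default_rng(1)
    true = rng.normal(100, 12, size=1000)  # hypothetical true scores
    error = rng.normal(0, 6, size=1000)    # random measurement error
    observed = true + error                # classical test theory: X = T + E

    reliability = true.var() / observed.var()  # proportion of observed variance that is true variance
    print(f"reliability = {reliability:.2f}")  # close to 144 / (144 + 36) = .80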
Factor Analysis
Statistical technique used to determine how many factors are needed to account for the intercorrelations among a set of tests, subtests, or test items.
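A minimal sketch (assumes numpy) using the Kaiser eigenvalue-greater-than-one rule on a correlation matrix; the data are simulated so roughly two factors should emerge:

    import numpy as np

    rng = np.random.default_rng(2)
    f1, f2 = rng.normal(size=(2, 500))                   # two hypothetical latent factors
    noise = rng.normal(scale=0.5, size=(6, 500))
    items = np.vstack([f1, f1, f1, f2, f2, f2]) + noise  # six test scores loading on the factors

    corr = np.corrcoef(items)                      # intercorrelation matrix of the six tests
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted largest first
    n_factors = (eigenvalues > 1).sum()            # Kaiser criterion: retain eigenvalues > 1
    print(f"eigenvalues: {np.round(eigenvalues, 2)}; retain {n_factors} factors")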
Construct Validity
Extent to which a test measures the hypothetical trait (construct) it is intended to measure.
Incremental Validity
Extent to which a predictor increases decision-making accuracy. Calculated by subtracting the base rate from the positive hit rate (which links it to true and false positives and negatives and to the predictor and criterion cutoff scores).
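A minimal sketch with hypothetical decision counts (TP = predicted to succeed and did; FP = predicted to succeed but failed):

    tp, fp, fn, tn = 40, 10, 15, 35  # hypothetical outcomes after applying a predictor cutoff

    base_rate = (tp + fn) / (tp + fp + fn + tn)  # proportion successful without the predictor
    positive_hit_rate = tp / (tp + fp)           # proportion of selected people who succeed
    incremental_validity = positive_hit_rate - base_rate
    print(f"{positive_hit_rate:.2f} - {base_rate:.2f} = {incremental_validity:.2f}")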
Kappa Statistic
Chance-corrected index of agreement used to assess interrater reliability for categorical (nominal) ratings.
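Kappa corrects observed agreement for chance agreement: kappa = (p_o - p_e) / (1 - p_e). A sketch from a hypothetical 2x2 agreement table for two raters:

    import numpy as np

    table = np.array([[20, 5],    # hypothetical counts: rows = rater A, columns = rater B
                      [10, 15]])
    n = table.sum()

    p_o = np.trace(table) / n                      # observed agreement (diagonal cells)
    p_e = (table.sum(0) / n) @ (table.sum(1) / n)  # agreement expected by chance, from the marginals
    kappa = (p_o - p_e) / (1 - p_e)
    print(f"kappa = {kappa:.2f}")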
Split-Half/Spearman-Brown
Split-half: split the test in half and correlate the two halves. Tends to underestimate reliability because each half is shorter than the full test.
Spearman-Brown: corrects the split-half coefficient by estimating what the reliability would be if the test were full length.
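The Spearman-Brown formula for a test n times as long is r_new = n*r / (1 + (n - 1)*r); the split-half correction is the n = 2 case. A quick hypothetical check:

    def spearman_brown(r, n=2):
        """Project reliability for a test n times the current length."""
        return n * r / (1 + (n - 1) * r)

    r_half = 0.70  # hypothetical correlation between the two halves
    print(f"full-length reliability = {spearman_brown(r_half):.2f}")  # 2(.70)/(1 + .70) = .82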
Item discrimination
The extent to which a test item differentiates between examinees who obtain high versus low scores on the test as a whole or on an external criterion. The index ranges from -1.0 to +1.0. If everyone in the upper-scoring group and no one in the lower-scoring group answers the item correctly, the item's discrimination index is +1.0.
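One common index is D = p_upper - p_lower, the difference in the proportion passing the item between the upper- and lower-scoring groups. A hypothetical sketch:

    upper = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]  # hypothetical item responses, top scorers
    lower = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # hypothetical item responses, bottom scorers

    d = sum(upper) / len(upper) - sum(lower) / len(lower)  # discrimination index D
    print(f"D = {d:.2f}")  # would be +1.0 if all upper and no lower examinees passed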
Coefficient Alpha/KR-20
Both are used to assess internal consistency reliability (inter-item consistency). KR-20 is used when test items are scored dichotomously; coefficient alpha can also be used when items have more than two possible scores.
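KR-20 is the dichotomous-item case of alpha: KR-20 = (k / (k - 1)) * (1 - sum(p*q) / var_total). A sketch on a hypothetical response matrix:

    import numpy as np

    X = np.array([[1, 1, 0, 1],  # hypothetical 0/1 responses: rows = examinees, columns = items
                  [1, 0, 0, 1],
                  [1, 1, 1, 1],
                  [0, 0, 0, 1],
                  [1, 1, 0, 0]])
    k = X.shape[1]

    p = X.mean(axis=0)              # proportion passing each item
    item_var = (p * (1 - p)).sum()  # sum of p*q across dichotomous items
    total_var = X.sum(axis=1).var() # variance of examinees' total scores
    kr20 = (k / (k - 1)) * (1 - item_var / total_var)
    print(f"KR-20 = {kr20:.2f}")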
Sensitivity and Specificity
Sensitivity: percentage of people in the sample who have the disorder and were accurately identified by the test as having it.
Specificity: percentage of people who do not have the disorder and were accurately identified as NOT having it.
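Both come straight from the classification counts: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP). A hypothetical sketch:

    tp, fn = 45, 5   # hypothetical: people with the disorder, correctly flagged vs. missed
    tn, fp = 80, 20  # hypothetical: people without the disorder, correctly cleared vs. falsely flagged

    sensitivity = tp / (tp + fn)  # % of true cases correctly identified
    specificity = tn / (tn + fp)  # % of non-cases correctly identified
    print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")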