Test Construction Flashcards
Standard Error of Measurement/CI
The SEM is used to construct a CI around a specific obtained test score. Its magnitude depends on the test's SD and reliability coefficient: SEM = SD x sqrt(1 - reliability).
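A minimal Python sketch of the SEM and a 95% CI. The SD of 15 and reliability of .89 are illustrative assumptions (IQ-style test), not values from the cards:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def ci95(obtained_score, sd, reliability):
    """95% confidence interval: obtained score +/- 1.96 * SEM."""
    e = sem(sd, reliability)
    return (obtained_score - 1.96 * e, obtained_score + 1.96 * e)

# Hypothetical test: SD = 15, reliability = .89
low, high = ci95(100, 15, 0.89)
```

Note that as reliability approaches 1.0 the SEM shrinks toward 0, so a more reliable test yields a narrower CI around the same score.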
Criterion Contamination
Bias introduced into a criterion score when the rater's evaluation is influenced by knowledge of the person's performance on the predictor
Item Difficulty
Determined by dividing the # of people who answered the item correctly by the total #. Ranges from 0 (very difficult) to 1.0 (very easy); a difficulty (p) of .50 is generally preferred.
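The calculation is a single ratio; a sketch with hypothetical counts:

```python
def item_difficulty(num_correct, num_examinees):
    """p = proportion of examinees answering the item correctly."""
    return num_correct / num_examinees

# Hypothetical item: 25 of 50 examinees answered correctly
p = item_difficulty(25, 50)  # p = .5, the preferred difficulty level
```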
Reliability
Consistency of test scores over time, across forms, or across items. Methods include test-retest, coefficient alpha, interrater, split-half, and alternate forms. A reliability of .80 means 80% of observed score variability is TRUE score variability.
Cross-validation and shrinkage
Re-assess criterion-related validity on a new sample to see how generalizable the validity coefficient is. The coefficient shrinks because the "chance factors" operating in the original sample aren't present in the new one.
Standard Error of Estimate/CI
Index of error when predicting criterion scores. Used to make a CI around a predicted score. Magnitude depends on criterion’s SD and validity coefficient.
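The SEE parallels the SEM but uses the criterion's SD and the validity coefficient. A sketch with assumed illustrative values:

```python
import math

def see(sd_criterion, validity):
    """Standard error of estimate: SDy * sqrt(1 - validity^2)."""
    return sd_criterion * math.sqrt(1 - validity ** 2)

# Hypothetical criterion SD = 10, validity coefficient = .60
error = see(10, 0.6)  # magnitude of error around a predicted criterion score
```

When validity is 0, the SEE equals the criterion SD (prediction is no better than guessing the mean); when validity is 1.0, the SEE is 0.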
Classical Test Theory
Observed variability in test scores reflects: 1) true differences between examinees on the attribute, and 2) effects of random error.
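The decomposition can be shown with hypothetical variance components (the numbers below are illustrative assumptions):

```python
# Classical test theory: observed-score variance is the sum of
# true-score variance and random-error variance.
var_true = 80.0   # hypothetical true-score variance
var_error = 20.0  # hypothetical random-error variance
var_observed = var_true + var_error

# Reliability = proportion of observed variance that is true variance
reliability = var_true / var_observed  # .80 here
```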
Factor Analysis
Stat technique used to determine how many factors are needed to account for intercorrelations among a set of tests, subtests, or test items.
Construct Validity
Extent to which a test measures a hypothetical trait it is intended to measure.
Incremental Validity
Extent to which a predictor increases decision-making accuracy. Calculated by subtracting the base rate from the positive hit rate. (Linked to true and false positives/negatives and criterion cutoff scores.)
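The subtraction can be sketched directly; the counts below are hypothetical:

```python
def incremental_validity(true_pos, false_pos, base_rate):
    """Positive hit rate minus the base rate.
    Positive hit rate = true positives / all positive predictions."""
    positive_hit_rate = true_pos / (true_pos + false_pos)
    return positive_hit_rate - base_rate

# Hypothetical: 40 true positives, 10 false positives, base rate .50
gain = incremental_validity(40, 10, 0.5)  # predictor adds .30 accuracy
```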
Kappa stat
Chance-corrected index of agreement used to assess interrater reliability for categorical (nominal) ratings
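A sketch of Cohen's kappa for two raters making yes/no judgments, computed from the four cells of an agreement table (cell counts below are hypothetical):

```python
def cohens_kappa(both_yes, a_only, b_only, both_no):
    """Cohen's kappa: observed agreement corrected for chance agreement.
    a_only = rater A yes / rater B no; b_only = the reverse."""
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    # Chance agreement from each rater's marginal proportions
    expected = ((both_yes + a_only) * (both_yes + b_only)
                + (b_only + both_no) * (a_only + both_no)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical table: 20 yes/yes, 5 + 10 disagreements, 15 no/no
k = cohens_kappa(20, 5, 10, 15)
```

Kappa of 1.0 means perfect agreement; 0 means agreement no better than chance.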
Split half/ Spearman-Brown
Split-half: split the test in half and correlate the two halves. Tends to underestimate reliability because each half is shorter than the full test.
Spearman-Brown: corrects the split-half coefficient by estimating what the reliability would be if the test were full length.
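The general prophecy formula, sketched below; factor = 2 converts a half-test correlation to an estimated full-length reliability:

```python
def spearman_brown(r_half, factor=2):
    """Projected reliability when test length is multiplied by `factor`.
    With factor=2, corrects a split-half correlation to full length."""
    return factor * r_half / (1 + (factor - 1) * r_half)

# Hypothetical split-half correlation of .60
r_full = spearman_brown(0.6)  # corrected (higher) full-length estimate
```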
Item discrimination
The extent to which an item differentiates between examinees who obtain high versus low scores on the test or on an external criterion. Ranges from -1.0 to +1.0; if everyone in the upper-scoring group and no one in the lower-scoring group answers the item correctly, the index is +1.0.
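The index (often called D) is the difference between the two groups' proportions correct; counts below are hypothetical:

```python
def item_discrimination(correct_upper, n_upper, correct_lower, n_lower):
    """D = proportion correct in upper group minus proportion in lower group."""
    return correct_upper / n_upper - correct_lower / n_lower

# Hypothetical item: all 10 upper-group and 0 of 10 lower-group correct
d = item_discrimination(10, 10, 0, 10)  # D = +1.0, maximum discrimination
```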
Coefficient Alpha/KR-20
Both are used to assess internal consistency reliability (inter-item consistency). KR-20 used for test items that are scored dichotomously.
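A hand-rolled KR-20 sketch (uses population variance and assumes every examinee answered every dichotomous item; the sample data in the usage note is hypothetical):

```python
def kr20(scores):
    """KR-20 for dichotomously scored items.
    scores: one list of 0/1 item scores per examinee."""
    n = len(scores)        # number of examinees
    k = len(scores[0])     # number of items
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of item-level p*q, where p = proportion correct on each item
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)
```

Usage with a tiny hypothetical data set: `kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])` returns the internal consistency estimate for those 3 items and 4 examinees.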
Sensitivity and Specificity
Sensitivity = % of people in the sample who have the disorder and were accurately identified as having it
Specificity = % of people who do not have the disorder and were accurately identified as NOT having it
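Both are ratios from a classification table; a sketch with hypothetical counts:

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the disorder correctly identified."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the disorder correctly identified."""
    return true_neg / (true_neg + false_pos)

# Hypothetical sample: 45 of 50 cases flagged; 40 of 50 non-cases cleared
sens = sensitivity(45, 5)   # .90
spec = specificity(40, 10)  # .80
```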
Factor loadings and communality
Factor loading = correlation between a test (or other variable) and a factor; when squared, it gives the amount of variability in the test accounted for by that factor.
Communality = total variability in a test's scores accounted for by all of the identified factors (the sum of its squared loadings).
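The squaring-and-summing can be sketched directly (loadings below are hypothetical):

```python
def communality(loadings):
    """Sum of a variable's squared loadings across all factors."""
    return sum(l ** 2 for l in loadings)

# Hypothetical test loading .60 on Factor I and .80 on Factor II:
# Factor I alone explains .36 of the test's variance (.60 squared),
# and the two factors together explain the full communality.
h2 = communality([0.6, 0.8])
```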
Oblique and Orthogonal Rotation
Oblique= rotation produces correlated factors
Orthogonal= rotation produces uncorrelated factors
Rotation is done to simplify interpretation of factors
Content validity
Extent to which test adequately samples the domain of info or skill (expert judgment)
Test length/ range of scores
Increasing test length by adding items of similar content and quality increases reliability. Alternatively, increase the heterogeneity of the sample in terms of the attribute measured, which increases the range of scores (and an unrestricted range raises the reliability coefficient).
Relationship between reliability and validity
Reliability is a necessary but insufficient condition for validity!
Item characteristic curve
Constructed in item response theory for each item. Provides info on the relationship between an examinee's level on the ability or trait measured and the probability of responding correctly.