Test Construction - Flash Cards
Incremental Validity/True Positives, False Positives, True Negatives, False Negatives
The extent to which a predictor increases decision-making accuracy. Calculated by subtracting the base rate from the positive hit rate. Terms to associate with incremental validity are predictor and criterion cutoff scores and the four decision outcomes: true positives scored high on both the predictor and the criterion; false positives scored high on the predictor but low on the criterion; true negatives scored low on both; and false negatives scored low on the predictor but high on the criterion.
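As a minimal sketch (a hypothetical selection scenario; all counts are invented for illustration), the calculation follows directly from the four decision categories:

```python
# Hypothetical 2x2 decision table; all counts are invented for illustration.
tp, fp = 40, 10  # scored high on predictor: high / low on criterion
tn, fn = 35, 15  # scored low on predictor: low / high on criterion

total = tp + fp + tn + fn
base_rate = (tp + fn) / total        # proportion successful without the predictor
positive_hit_rate = tp / (tp + fp)   # proportion of selectees who succeed

incremental_validity = positive_hit_rate - base_rate
print(round(base_rate, 2), round(positive_hit_rate, 2),
      round(incremental_validity, 2))  # 0.55 0.8 0.25
```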
Split-Half Reliability/ Spearman-Brown Formula
Split-half reliability is a method for assessing internal consistency reliability and involves “splitting” the test in half (e.g., odd- versus even-numbered items) and correlating examinees’ scores on the two halves of the test. The split-half reliability coefficient tends to underestimate a test’s actual reliability and is usually corrected with the Spearman-Brown formula, which estimates what the test’s reliability would be if it were based on the full length of the test.
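A minimal sketch of the correction (the classic doubling case of the Spearman-Brown formula); the half-test correlation of .70 is invented for illustration:

```python
def spearman_brown_split_half(r_half: float) -> float:
    """Estimate full-length reliability from the correlation between two halves:
    corrected r = 2 * r_half / (1 + r_half)."""
    return (2 * r_half) / (1 + r_half)

print(round(spearman_brown_split_half(0.70), 2))  # 0.82, up from the .70 half-test value
```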
Factor Analysis
A multivariate statistical technique used to determine how many factors (constructs) are needed to account for the intercorrelations among a set of tests, subtests, or test items. Factor analysis can be used to assess a test’s construct validity by indicating the extent to which the test correlates with factors that it would and would not be expected to correlate with. From the perspective of factor analysis, true score variability consists of communality and specificity. Factors identified in a factor analysis can be either orthogonal or oblique.
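A minimal sketch using scikit-learn's FactorAnalysis on simulated scores; the single latent construct and its loadings are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
g = rng.normal(size=(500, 1))                       # one latent construct
X = g @ np.array([[0.8, 0.7, 0.6]]) + rng.normal(scale=0.5, size=(500, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize the variables

fa = FactorAnalysis(n_components=1).fit(X)
loadings = fa.components_.T                         # variable-by-factor loadings
communality = (loadings ** 2).sum(axis=1)           # shared (common) variance
print(loadings.round(2), communality.round(2))
# 1 - communality = uniqueness, i.e., specificity plus error variance
```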
Test-Retest Reliability
A method for assessing reliability that involves administering the same test to the same group of examinees on two different occasions and correlating the two sets of scores. Yields a coefficient of stability.
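A minimal sketch of the computation, with invented scores for six examinees tested on two occasions:

```python
import numpy as np

time1 = np.array([12, 15, 11, 18, 14, 16])   # scores at first administration
time2 = np.array([13, 14, 12, 17, 15, 15])   # same examinees, second administration
stability = np.corrcoef(time1, time2)[0, 1]  # coefficient of stability
print(round(stability, 2))
```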
Criterion-Referenced Interpretation
Interpretation of a test score in terms of a prespecified standard, e.g., the percentage of content answered correctly (percentage score) or predicted performance on an external criterion (via a regression equation or expectancy table).
Item Difficulty
An item’s difficulty level is calculated by dividing the number of individuals who answered the item correctly by the total number of individuals; ranges in value from 0 (very difficult item) to 1.0 (very easy item). In general, an item difficulty index of .50 is preferred because it maximizes differentiation between individuals with high and low ability and helps ensure a high reliability coefficient.
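As a worked sketch (counts invented for illustration):

```python
def item_difficulty(num_correct: int, num_examinees: int) -> float:
    """p = proportion of examinees answering the item correctly;
    0 = very difficult, 1.0 = very easy."""
    return num_correct / num_examinees

print(item_difficulty(30, 60))   # 0.5: the generally preferred difficulty level
print(item_difficulty(55, 60))   # ~0.92: a very easy item
```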
Test Length/Range Of Scores
A test’s reliability can be increased in several ways. One way is to increase the test length by adding items of similar content and quality. Another is to increase the heterogeneity of the sample in terms of the attribute(s) measured by the test, which will increase the range of scores.
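The effect of lengthening can be projected with the general Spearman-Brown prophecy formula, r_new = n * r / (1 + (n - 1) * r), where n is the factor by which test length is multiplied; the values below are invented:

```python
def spearman_brown(r_xx: float, n: float) -> float:
    """Projected reliability when the test is lengthened by a factor of n
    with items of similar content and quality."""
    return (n * r_xx) / (1 + (n - 1) * r_xx)

print(round(spearman_brown(0.60, 3), 2))  # 0.82: tripling a test with r = .60
```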
Standard Error Of Estimate/Confidence Interval
An index of error when predicting criterion scores from predictor scores. Used to construct a confidence interval around an examinee’s predicted criterion score. Its magnitude depends on two factors: the criterion’s standard deviation and the predictor’s validity coefficient.
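As a minimal sketch (the standard deviation, validity coefficient, and predicted score are all invented), using the usual formula SEest = SDy * sqrt(1 - rxy^2):

```python
import math

def standard_error_of_estimate(sd_criterion: float, r_xy: float) -> float:
    return sd_criterion * math.sqrt(1 - r_xy ** 2)

see = standard_error_of_estimate(sd_criterion=10.0, r_xy=0.60)  # 8.0
predicted = 50.0                          # examinee's predicted criterion score
ci_95 = (predicted - 1.96 * see, predicted + 1.96 * see)
print(round(see, 1), [round(b, 1) for b in ci_95])  # 8.0 [34.3, 65.7]
```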
Orthogonal And Oblique Rotation
In factor analysis, an orthogonal rotation of the identified factors produces uncorrelated factors, while an oblique rotation produces correlated factors. Rotation is done to simplify the interpretation of the identified factors.
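A minimal sketch of an orthogonal (varimax) rotation using recent versions of scikit-learn; the data are random placeholders, and oblique rotations such as promax are not in scikit-learn (packages such as factor_analyzer provide them, an assumption about your environment):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))     # placeholder data; substitute real test scores

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)
print(fa.components_.round(2))    # varimax-rotated loadings; factors stay uncorrelated
# An oblique rotation (e.g., promax) would allow the factors to correlate.
```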
Standard Error of Measurement/Confidence Interval
An index of measurement error. Used to construct a confidence interval around an examinee’s obtained test score. Its magnitude depends on two factors: the test’s standard deviation and reliability coefficient.
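As a minimal sketch (the SD of 15 and reliability of .91 echo common IQ-scale values; the obtained score is invented), using SEM = SD * sqrt(1 - rxx):

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    return sd * math.sqrt(1 - reliability)

sem = standard_error_of_measurement(sd=15.0, reliability=0.91)  # 4.5
obtained = 100.0                          # examinee's obtained test score
ci_95 = (obtained - 1.96 * sem, obtained + 1.96 * sem)
print(round(sem, 1), [round(b, 1) for b in ci_95])  # 4.5 [91.2, 108.8]
```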
Cross-Validation And Shrinkage
Process of re-assessing a test’s criterion-related validity on a new sample to check the generalizability of the original validity coefficient. Ordinarily, the validity coefficient “shrinks” (becomes smaller) on cross-validation because the chance factors operating in the original sample are not all present in the cross-validation sample.
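A minimal simulation of shrinkage (all data are synthetic): regression weights fit to a small derivation sample capitalize on chance, so the validity coefficient is typically smaller when those same weights are applied to a new sample:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_sample(n: int, k: int = 5):
    X = rng.normal(size=(n, k))              # k predictors, mostly irrelevant
    y = 0.4 * X[:, 0] + rng.normal(size=n)   # only one predictor truly matters
    return X, y

X1, y1 = draw_sample(40)                     # original (derivation) sample
X2, y2 = draw_sample(40)                     # cross-validation sample

w, *_ = np.linalg.lstsq(X1, y1, rcond=None)  # weights fit to sample 1 only
r_original = np.corrcoef(y1, X1 @ w)[0, 1]   # inflated by chance factors
r_crossval = np.corrcoef(y2, X2 @ w)[0, 1]   # typically smaller: shrinkage
print(round(r_original, 2), round(r_crossval, 2))
```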
Criterion-Related Validity/Concurrent And Predictive
The type of validity that involves determining the relationship (correlation) between the predictor and the criterion. The correlation coefficient is referred to as the criterion-related validity coefficient. Criterion-related validity can be either concurrent (predictor and criterion scores obtained at about the same time) or predictive (predictor scores obtained before criterion scores).
Multitrait-Multimethod Matrix
A systematic way to organize the correlation coefficients obtained when assessing a measure’s convergent and discriminant validity (which, in turn, provides evidence of construct validity). Requires measuring at least two different traits using at least two different methods for each trait. Terms to associate with the multitrait-multimethod matrix are monotrait-monomethod, monotrait-heteromethod, heterotrait-monomethod, and heterotrait-heteromethod coefficients.
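As a small illustration (traits, methods, and coefficients are all invented), the four kinds of coefficients for one row of a 2-trait x 2-method matrix might look like this:

```python
# Two traits (anxiety, depression) x two methods (self-report, observer rating).
# All values are invented; convergent coefficients should be high and
# discriminant coefficients low for evidence of construct validity.
mtmm_row = {
    ("anxiety/self", "anxiety/self"):        (0.90, "monotrait-monomethod (reliability)"),
    ("anxiety/self", "anxiety/observer"):    (0.65, "monotrait-heteromethod (convergent)"),
    ("anxiety/self", "depression/self"):     (0.30, "heterotrait-monomethod (discriminant)"),
    ("anxiety/self", "depression/observer"): (0.10, "heterotrait-heteromethod (discriminant)"),
}
for (m1, m2), (r, label) in mtmm_row.items():
    print(f"{m1} x {m2}: r = {r:.2f}  [{label}]")
```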
Reliability/Reliability Coefficient
Reliability refers to the consistency of test scores; i.e., the extent to which a test measures an attribute without being affected by the random fluctuations (measurement error) that produce inconsistencies over time, across items, or across different forms. Methods for establishing reliability include test-retest, alternate forms, split-half, coefficient alpha, and inter-rater. Most produce a reliability coefficient, which is interpreted directly as a measure of true score variability; e.g., a reliability coefficient of .80 indicates that 80% of variability in test scores is true score variability.
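As a worked sketch of the interpretation (variance components invented):

```python
# Reliability = true score variance / total observed variance.
true_var, error_var = 80.0, 20.0
reliability = true_var / (true_var + error_var)
print(reliability)  # 0.8: 80% of test score variability is true score variability
```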
Item Characteristic Curve
When using item response theory, an item characteristic curve (ICC) is constructed for each item by plotting the proportion of examinees in the tryout sample who answered the item correctly against total test score, performance on an external criterion, or a mathematically derived estimate of a latent ability or trait. The curve provides information about the relationship between an examinee’s level on the ability or trait measured by the test and the probability that he or she will respond to the item correctly.
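As a minimal sketch, one common form of the curve is the three-parameter logistic model (one of several IRT models; the parameter values below are invented):

```python
import math

def icc_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic ICC: a = discrimination (slope), b = difficulty
    (location), c = probability of a correct guess (lower asymptote)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

for theta in (-2.0, 0.0, 2.0):  # low, average, and high ability levels
    print(theta, round(icc_3pl(theta, a=1.0, b=0.0, c=0.20), 2))  # 0.3, 0.6, 0.9
```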