Assessment: Principles of Test Construction Flashcards
Validity
How accurately an instrument measures a given construct. Validity is concerned with what an instrument measures, how well it does so, and the extent to which meaningful inferences can be made from the instrument's results. The three main types of validity are (a) content validity, the extent to which an instrument's content seems appropriate to its intended purpose; (b) criterion-related validity, the effectiveness of an instrument in predicting an individual's performance on a specific criterion (either predictive or concurrent); and (c) construct validity, the extent to which an instrument measures a theoretical construct (i.e., idea or concept).
six types of reliability:
- test-retest,
- alternate form,
- internal consistency,
- split-half reliability,
- inter-item consistency,
- inter-rater
face validity
A superficial measure that is concerned with whether an instrument looks valid or credible. Face validity is not a true type of validity.
validity coefficient
Often used to report validity; a correlation between a test score and the criterion measure.
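Since the validity coefficient is simply a correlation between test scores and criterion scores, it can be computed with the Pearson product-moment formula. A minimal sketch in pure Python; the score lists below are hypothetical, for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

test_scores = [10, 12, 14, 16, 18]   # hypothetical test scores
criterion = [20, 23, 27, 30, 35]     # hypothetical criterion measure
validity_coefficient = pearson_r(test_scores, criterion)
```

The closer the coefficient is to 1.00 (or -1.00), the more strongly the test score predicts the criterion.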
factor analysis
A statistical technique used to reduce a larger number of variables (often items on an assessment) to a smaller number of underlying factors. The two forms of factor analysis are (a) exploratory factor analysis (EFA), which involves an initial examination of potential models (or factor structures) that best categorize the variables, and (b) confirmatory factor analysis (CFA), which tests whether a hypothesized factor structure (often one identified through EFA) fits the data.
standard error of estimate
A statistic that indicates the expected margin of error in a predicted criterion score due to the imperfect validity of the test.
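The standard error of estimate is commonly computed from the criterion's standard deviation and the validity coefficient as SE_est = SD_y * sqrt(1 - r_xy^2). A minimal sketch, with hypothetical values:

```python
from math import sqrt

def standard_error_of_estimate(sd_criterion, validity_r):
    """SE_est = SD_y * sqrt(1 - r_xy^2): expected margin of error
    in a predicted criterion score due to imperfect validity."""
    return sd_criterion * sqrt(1 - validity_r ** 2)

# Hypothetical example: criterion SD of 15, validity coefficient of .60
see = standard_error_of_estimate(15, 0.60)  # 12.0
```

Note the limiting cases: perfect validity (r = 1.0) gives an error of 0, while zero validity leaves the full criterion standard deviation as the margin of error.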
sensitivity
the instrument’s ability to accurately identify the presence of a phenomenon
specificity
the instrument’s ability to accurately identify the absence of a phenomenon
false positive
an instrument inaccurately identifying the presence of a phenomenon
false negative
an instrument inaccurately identifying the absence of a phenomenon
efficiency
the ratio of correct decisions to the total number of decisions
incremental validity
the extent to which an instrument enhances the accuracy of prediction of a specific criterion
decision accuracy
The accuracy of an instrument in supporting counselor decisions. Decision accuracy often assesses sensitivity (the instrument's ability to accurately identify the presence of a phenomenon); specificity (the instrument's ability to accurately identify the absence of a phenomenon); false positive error (an instrument inaccurately identifying the presence of a phenomenon); false negative error (an instrument inaccurately identifying the absence of a phenomenon); efficiency (the ratio of correct decisions to the total number of decisions); and incremental validity (the extent to which an instrument enhances the accuracy of prediction of a specific criterion).
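The decision-accuracy statistics above fall out of a 2x2 decision table. A minimal sketch, using hypothetical counts:

```python
def decision_accuracy(tp, fp, tn, fn):
    """Return sensitivity, specificity, and efficiency.

    tp: true positives  (phenomenon present, instrument says present)
    fp: false positives (phenomenon absent, instrument says present)
    tn: true negatives  (phenomenon absent, instrument says absent)
    fn: false negatives (phenomenon present, instrument says absent)
    """
    sensitivity = tp / (tp + fn)                   # hits among actual presence
    specificity = tn / (tn + fp)                   # hits among actual absence
    efficiency = (tp + tn) / (tp + fp + tn + fn)   # correct / total decisions
    return sensitivity, specificity, efficiency

# Hypothetical screening results for 100 clients
sens, spec, eff = decision_accuracy(tp=40, fp=10, tn=45, fn=5)
```

Note the trade-off implied by these formulas: lowering a cut score raises sensitivity at the cost of more false positives (lower specificity), and vice versa.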
Reliability
Consistency of scores attained by the same person on different administrations of the same test. Reliability is concerned with measuring the difference (error) between an individual's observed test score and true test score: X = T + e, where X is the observed score, T the true score, and e the error. There are several different types: (a) test-retest reliability (sometimes called temporal stability) determines the correlation between the scores obtained from two different administrations of the same test, thus evaluating the consistency of scores across time; (b) alternate form reliability (sometimes called equivalent form reliability or parallel form reliability) compares the consistency of scores from two alternative, but equivalent, forms of the same test; (c) internal consistency measures the consistency of responses within a single administration of the instrument (two common types of internal consistency are split-half reliability and interitem reliability, e.g., KR-20 and coefficient alpha); and (d) interscorer reliability, sometimes called inter-rater reliability, is used to calculate the degree of consistency of ratings between two or more persons observing the same behavior or assessing an individual through observational or interview methods.
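Coefficient alpha, one of the interitem reliability statistics named above, can be computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch in pure Python, with hypothetical item responses:

```python
def cronbach_alpha(item_scores):
    """Coefficient (Cronbach's) alpha for interitem reliability.

    item_scores: one list per item, each holding every examinee's
    response to that item (all lists the same length).
    """
    k = len(item_scores)          # number of items
    n = len(item_scores[0])       # number of examinees

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each examinee's total score across all items
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical responses: 2 items, 4 examinees
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 4, 6, 8]])
```

The same 0-to-1 interpretation applies as for any reliability coefficient: values closer to 1.00 indicate more internally consistent responses.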
reliability coefficient
A measure of the reliability of a set of scores on a test. It ranges from 0 to 1.00; the closer the coefficient is to 1.00, the more reliable the scores.