Test Construction Flashcards
Ways to increase test reliability
- increase test length by adding items of similar content and quality
- increase the heterogeneity of the sample with respect to the attribute(s) measured by the test
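The effect of lengthening a test can be estimated with the Spearman-Brown prophecy formula. A minimal sketch in Python (the function name and the .60 reliability are illustrative):

```python
def spearman_brown(r_xx, k):
    """Projected reliability when test length is multiplied by factor k,
    assuming the added items are of similar content and quality."""
    return (k * r_xx) / (1 + (k - 1) * r_xx)

# Doubling a test whose reliability is .60:
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```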
Item discrimination range and usage
The item discrimination index (D) ranges from -1.0 to +1.0. If all examinees in the upper group and none in the lower group answered the item correctly, D is +1.0; if none of the examinees in the upper group and all examinees in the lower group answered the item correctly, D equals -1.0.
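The definition above amounts to subtracting the lower group's proportion correct from the upper group's. A sketch (function name and group sizes are illustrative):

```python
def discrimination_index(n_correct_upper, n_correct_lower, group_size):
    """D = proportion correct in the upper group minus proportion
    correct in the lower group; ranges from -1.0 to +1.0."""
    return n_correct_upper / group_size - n_correct_lower / group_size

# All 20 upper-group examinees correct, none of the 20 lower-group examinees:
print(discrimination_index(20, 0, 20))  # 1.0
# The reverse pattern yields the minimum value:
print(discrimination_index(0, 20, 20))  # -1.0
```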
Item Discrimination definition
Item discrimination refers to the extent to which a test item discriminates (differentiates) between examinees who obtain high versus low scores on the entire test or on an external criterion.
Factors identified in a factor analysis can be either ________ or _________.
orthogonal or oblique
From the perspective of factor analysis, true score variability consists of _________ and __________
communality and specificity
Factor Analysis
A multivariate statistical technique used to determine how many factors (constructs) are needed to account for the intercorrelations among a set of tests, subtests, or test items. Factor analysis can be used to assess a test’s construct validity by indicating the extent to which the test correlates with factors that it would and would not be expected to correlate with.
Item Characteristic Curve
When using item response theory, an item characteristic curve (ICC) is constructed for each item by plotting the proportion of examinees in the tryout sample who answered the item correctly against either the total test score, performance on an external criterion, or a mathematically derived estimate of a latent ability or trait.
The curve provides information on the relationship between an examinee’s level on the ability or trait measured by the test and the probability that he/she will respond to the item correctly.
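The card does not name a specific model, but one common way to describe this relationship is the two-parameter logistic ICC, where a is the item's discrimination and b its difficulty (parameter values below are illustrative):

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: probability that an examinee at
    ability level theta answers the item correctly, given item
    discrimination a and difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# An examinee whose ability equals the item's difficulty has a .50
# probability of a correct response:
print(icc_2pl(theta=0.0, a=1.2, b=0.0))  # 0.5
```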
Content Validity
The extent to which a test adequately samples the domain of information, knowledge, or skill that it purports to measure. Determined primarily by “expert judgment.” Most important for achievement and job sample tests.
Reliability and Validity
Reliability is a necessary but not sufficient condition for validity.
Criterion-Related Validity/Concurrent And Predictive
The type of validity that involves determining the relationship (correlation) between the predictor and the criterion. The correlation coefficient is referred to as the criterion-related validity coefficient. Criterion-related validity can be either concurrent (predictor and criterion scores obtained at about the same time) or predictive (predictor scores obtained before criterion scores).
Construct Validity
Construct validity refers to the extent to which a test measures the hypothetical trait (construct) it is intended to measure.
Methods for establishing construct validity include
* correlating test scores with scores on measures that do and do not measure the same trait (convergent and discriminant validity),
* conducting a factor analysis to assess the test’s factorial validity,
* determining if changes in test scores reflect expected developmental changes, and
* seeing if experimental manipulations have the expected impact on test scores.
Relevance
Test Construction
In test construction, relevance refers to the extent to which test items contribute to achieving the stated goals of testing.
Factor Loadings and Communality
In a factor matrix, a factor loading is the correlation between a test (or other variable included in the analysis) and a factor and can be squared to determine the amount of variability in the test that is accounted for by the factor. The communality is the total amount of variability in scores on the test that is accounted for by the factor analysis - i.e., by all of the identified factors.
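The squaring-and-summing described above can be shown directly (the loadings are illustrative values for one test on three extracted factors):

```python
# Loadings of one test on three extracted factors (illustrative values):
loadings = [0.70, 0.40, 0.10]

# Squaring a loading gives the variance in the test accounted for
# by that factor:
variance_by_factor = [round(l ** 2, 2) for l in loadings]  # [0.49, 0.16, 0.01]

# The communality is the sum of the squared loadings across all
# identified factors:
communality = sum(l ** 2 for l in loadings)
print(round(communality, 2))  # 0.66
```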
Reliability & Reliability Coefficient
Reliability refers to the consistency of test scores; i.e., the extent to which a test measures an attribute without being affected by random fluctuations (measurement error) that produce inconsistencies over time, across items, or over different forms.
Methods for establishing reliability include
- test-retest,
- alternative forms,
- split-half,
- coefficient alpha,
- inter-rater.
Most produce a reliability coefficient, which is interpreted directly as a measure of true score variability - e.g., a reliability of .80 indicates that 80% of variability in test scores is true score variability.
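One of the listed methods, coefficient alpha, can be sketched from its standard formula, alpha = (k/(k-1)) * (1 - sum of item variances / total-score variance); the function name and the tiny 4-examinee, 3-item data set are illustrative:

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Cronbach's coefficient alpha from a list of examinees'
    item-score rows (one row per examinee, one column per item)."""
    k = len(item_scores[0])                 # number of items
    items = list(zip(*item_scores))         # one tuple per item
    item_vars = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Illustrative scores for 4 examinees on 3 dichotomous items:
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(coefficient_alpha(scores))  # 0.75
```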
Orthogonal And Oblique Rotation
In factor analysis, an orthogonal rotation of the identified factors produces uncorrelated factors, while an oblique rotation produces correlated factors. Rotation is done to simplify the interpretation of the identified factors.
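A useful property of orthogonal rotation is that it changes the loadings without changing each test's communality; a minimal two-factor sketch (loadings and angle are illustrative):

```python
import math

def rotate_orthogonal(loadings, angle):
    """Apply an orthogonal (rigid) rotation to two-factor loadings.
    The rotated factors remain uncorrelated, and each test's
    communality (sum of squared loadings) is unchanged."""
    c, s = math.cos(angle), math.sin(angle)
    return [(l1 * c + l2 * s, -l1 * s + l2 * c) for l1, l2 in loadings]

original = [(0.8, 0.3), (0.2, 0.7)]
rotated = rotate_orthogonal(original, math.pi / 6)

# Communalities are preserved under the rotation:
for (a, b), (x, y) in zip(original, rotated):
    print(round(a**2 + b**2, 4) == round(x**2 + y**2, 4))  # True
```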