Chapter 6 Flashcards
- As applied to a test, is a judgment or estimate of how well a test measures what it purports to measure in a particular context.
- More specifically, it is a judgment based on evidence about the appropriateness of inferences drawn from test scores.
Validity
Is a logical result or deduction.
Inference
Is the process of gathering and evaluating evidence about validity.
Validation
Are absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.
- Require professional time and know-how, and they may be costly.
- May yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual.
Local validation studies
One way measurement specialists have traditionally conceptualized validity is according to three categories:
Content validity
Criterion-related validity
Construct validity
This is a measure of validity based on an evaluation of the subjects, topics, or content covered by the items in the test.
Content validity
This is a measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.
Criterion-related validity
This is a measure of validity that is arrived at by executing a comprehensive analysis of
a. how scores on the test relate to other test scores and measures, and
b. how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure.
Construct validity
Refers to a judgment regarding how well a test measures what it purports to measure at the time and place that the variable being measured (typically a behavior, cognition, or emotion) is actually emitted.
Ecological validity
Relates more to what a test appears to measure to the person being tested than to what the test actually measures.
- Is a judgment concerning how relevant the test items appear to be.
Face validity
Describes judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.
Content validity
A plan for the “structure” of the evaluation: the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth.
Test blueprint
Is a judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest—the measure of interest being the criterion.
Criterion-related validity
Two types of validity evidence are subsumed under the heading of criterion-related validity:
Concurrent validity
Predictive validity
Is an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently).
Concurrent validity
Is an index of the degree to which a test score predicts some criterion measure.
Predictive validity
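The cards above describe concurrent and predictive validity as an "index of the degree" of relationship between test scores and a criterion, without naming the index. One common choice is a correlation coefficient between the two sets of scores. The sketch below assumes a Pearson correlation; the function name `pearson_r` and the sample scores are hypothetical and only illustrate the idea.

```python
from math import sqrt


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores.

    Assumption: the 'index' of criterion-related validity is taken to be
    a Pearson correlation; the flashcards do not specify a statistic.
    """
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 scores")

    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: five testtakers' test scores and criterion scores.
test_scores = [10, 12, 15, 18, 20]
criterion_scores = [2.1, 2.4, 3.0, 3.3, 3.9]
print(round(pearson_r(test_scores, criterion_scores), 3))
```

If the criterion scores are collected at the same time as the test scores, the coefficient speaks to concurrent validity; if they are collected later, it speaks to predictive validity. The computation is the same in both cases.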
The standard against which a test or a test score is evaluated.
Criterion
Characteristics of a criterion:
- An adequate criterion is relevant.
- An adequate criterion measure must also be valid for the purpose for which it is being used.
Is the term applied to a criterion measure that has been based, at least in part, on predictor measures.
Criterion contamination
Is the extent to which a particular trait, behavior, characteristic, or attribute exists in the population (expressed as a proportion).
Base rate
May be defined as the proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.
Hit rate
May be defined as the proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.
Miss rate
Is a miss wherein the test predicted that the testtaker did possess the particular characteristic or attribute being measured when in fact the testtaker did not.
False positive
Is a miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured when the testtaker actually did.
False negative
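The base rate, hit rate, miss rate, false positive, and false negative cards can be tied together with a small tally of test predictions against actual status. This is a minimal sketch: the function name `classification_rates` and the ten-person data set are hypothetical. It assumes a hit is any correct classification (attribute correctly identified as present or as absent), so that hit rate and miss rate sum to 1; under a narrower reading that counts only correctly identified possessors, the hit rate would be smaller.

```python
def classification_rates(predicted, actual):
    """Tally hits and misses for a test's yes/no predictions.

    predicted, actual: equal-length sequences of booleans, where True
    means 'possesses the attribute'.
    Assumption: a hit is any correct classification (present or absent).
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two equal-length, non-empty sequences")

    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    true_neg = sum(1 for p, a in pairs if not p and not a)
    false_pos = sum(1 for p, a in pairs if p and not a)   # predicted yes, actually no
    false_neg = sum(1 for p, a in pairs if not p and a)   # predicted no, actually yes

    return {
        "base_rate": (true_pos + false_neg) / n,   # proportion who actually have it
        "hit_rate": (true_pos + true_neg) / n,     # proportion classified correctly
        "miss_rate": (false_pos + false_neg) / n,  # proportion classified incorrectly
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Hypothetical sample of ten testtakers; three actually have the attribute.
actual = [True, True, True, False, False, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False, False, False]

for name, value in classification_rates(predicted, actual).items():
    print(name, value)
```

In this sample the test misses one person who has the attribute (a false negative) and flags one person who does not (a false positive), so two of the ten classifications are misses.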