Chapter 5: Validity Flashcards
Validity def
The extent to which a test measures the attribute/construct it is designed to measure
Does the test measure what it was designed to measure?
-> Not a yes or no question - question of degree
One of the most important characteristics of a test.
Guidelines regarding validity (3)
(1) Do NOT accept a test’s name as an indicator of what the test measures.
(2) Validity is NOT a yes/no decision
(3) Validity evidence tells how well the test measures what it is intended to measure.
-> Different types of evidence can be generated for different types of validity
What do we mean when we say that “Validity is NOT a yes/no decision”?
It comes in degrees and applies to a particular use and a particular population
It is a process: An ongoing, dynamic effort to accumulate evidence for a sound scientific basis for proposed test score interpretations
3 Types of Validity
Content, Criterion, Construct
Subtypes of Criterion validity
Concurrent, Predictive
Subtypes of Construct validity
Convergent, Divergent validity
Face Validity
Whether a test appears to measure what it is supposed to measure (does it look valid?).
Mere appearance that a measure has validity.
=> Not sufficient evidence of validity
A test with high face validity may: (3)
(1) Induce cooperation and positive motivation before and during test administration
(2) Reduce dissatisfaction and feelings of injustice among low scorers
(3) Convince policymakers, employers, and administrators to implement the test
Test designers sometimes deliberately build a test with low face validity. Why?
Sometimes a test with low face validity can elicit more honest responses
Content Validity def
Degree to which ELEMENTS OF A TEST are representative of the domain/construct of interest.
-> Evaluate how adequately the test samples the domain or content of the construct.
-> More Qualitative than Quantitative
Establishing content validity (3)
(1) Describe the content domain: Identify the boundaries of the content domain + Determine the structure of the content domain.
(2) Inspect test - Expert judgment
(3) Form judgment that the test measures what it is supposed to measure… without gathering any external evidence
+ Content of the items must be carefully evaluated (wording appropriateness…).
When is content validity high?
When test content is a representative sample of the tasks that define the content domain + When the items do not measure something else
However, content validity is not enough to determine that the test is valid. Why?
No information about relation of test to external constructs or external variables
Criterion Validity def
Effectiveness of the test in predicting narrowly and specifically identified variables that are thought to be direct measures of the construct.
-> How well a test corresponds with a particular criterion.
Criterion def
A standard that researchers use to measure outcomes such as performance or attitude.
-> Standard against which the test is compared
Objective criterion def
Observable and measurable
E.g., Number of accidents, days of absence
Types of criterion
Objective & Subjective criterion
Subjective criterion
Based on a person’s judgement
E.g., Supervisor ratings, peer ratings
Concurrent Validity
Comes from assessments of the simultaneous relationship between the test and the criterion.
-> Criterion is available at the same time as the test
-> Can also be used when a person does not know how he or she will respond to the criterion measure.
Predictive validity
The forecasting function of tests.
Degree to which test scores accurately predict scores on a criterion measure.
-> Criterion measure available in the future
What happens if the criterion measures fewer dimensions than those measured by the test?
This weakens the content-based evidence of validity, because the criterion underrepresents some important characteristics that the test measures.
-> The criterion is underrepresentative
Criterion contamination def
Occurs when the criterion measures more dimensions than those measured by the test
Validity coefficient def
Correlation between scores on a test and scores on a criterion.
-> Tells the extent to which the test is valid for making statements about the criterion.
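Worked example (a minimal sketch, not part of the original cards): a validity coefficient is commonly computed as the Pearson correlation between test scores and criterion scores. The function name validity_coefficient and the scores below are hypothetical, made up only for illustration.

```python
from math import sqrt


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores."""
    if len(test_scores) != len(criterion_scores):
        raise ValueError("Each test score needs a matching criterion score.")
    n = len(test_scores)
    if n < 2:
        raise ValueError("At least two paired scores are required.")

    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_scores) / n

    # Sum of cross-products and sums of squared deviations from the means
    cross = sum((t - mean_t) * (c - mean_c)
                for t, c in zip(test_scores, criterion_scores))
    ss_t = sum((t - mean_t) ** 2 for t in test_scores)
    ss_c = sum((c - mean_c) ** 2 for c in criterion_scores)

    if ss_t == 0 or ss_c == 0:
        raise ValueError("Scores must vary; correlation is undefined otherwise.")
    return cross / sqrt(ss_t * ss_c)


# Hypothetical data: five test takers, with supervisor ratings as the
# (subjective) criterion. These numbers are invented for illustration.
test_scores = [10, 12, 15, 18, 20]
supervisor_ratings = [2, 3, 3, 5, 5]

r = validity_coefficient(test_scores, supervisor_ratings)
print(f"Validity coefficient r = {r:.2f}")
```

If the ratings are collected at the same time as the test, r is evidence of concurrent validity; if they are collected later, the same calculation gives evidence of predictive validity.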