Validity Flashcards
Why is it not strictly accurate to talk about the validity of a test?
A particular test might be valid in one context but not in another; validity is a matter of degree, and the question is the extent to which the test is valid for a particular use, in a particular context
What are constructs?
Unobservable, underlying hypothetical traits or characteristics that we can try to measure indirectly using tests (e.g. intelligence, anxiety, speeding propensity)
What is construct under-representation?
The part of the intended construct that the measure fails to capture (e.g. behaviour a participant won’t admit to in a self-report)
What is construct-irrelevant variance?
Variation in scores that is unrelated to the construct being measured (e.g. misinterpretation of question wording, dishonest or biased responding)
How should the test and construct overlap in order to get a measure with high validity?
They should overlap as much as possible, so that the validly measured portion is large and both construct under-representation and construct-irrelevant variance are minimal
What’s the difference between content and face validity?
Face validity is how valid a test appears to be (usually from the test-taker’s perspective); content validity is how well the test’s content actually samples the domain of the construct, typically judged by experts (though still insufficient without empirical validity)
Why is face validity not directly important from a psychometric perspective, and when can it be useful?
Because a test can have good face validity without any actual validity, or poor face validity while still being genuinely valid; it can be useful from a public-relations point of view: laypeople are more likely to accept or choose the test, and to take it seriously, if they think it looks appropriate
How could you go about evaluating content validity?
Getting experts to rate each question in a university examination for its relevance to the course content, and to judge whether particular lectures were over- or under-represented
How could I create an exam that had great empirical validity but poor content validity?
By building an exam that predicts students’ understanding of the course (e.g. via GPA, hours spent studying, number of lectures attended) with a high degree of accuracy, while including no actual content from the course (although this wouldn’t be appropriate)
What is the general process involved in testing empirical validity?
Create hypotheses regarding how your measure ought to perform if it is valid, then design and run studies to test these hypotheses
What are the questions we might ask when designing an empirical validation study?
Does it map onto criteria as expected?; are there group differences as expected?; does it correlate with other things?; does it not correlate with dissimilar things?; are developmental changes, experimental effects, and internal structure as expected?
Give two examples of things that might restrict the range of scores in a test, and indicate what influence this could have on the validity coefficient.
Non-random attrition (certain types of people dropping out of a longitudinal study) and self-selection (only certain people are in the sample or volunteer in the first place); both restrict the range of scores and therefore attenuate (shrink) the validity correlation coefficient
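The attenuation from range restriction can be demonstrated with a small simulation (a minimal sketch; the sample size, noise level, and 50% cut-off are arbitrary choices for illustration): we generate test and criterion scores with a known full-range correlation, then recompute the correlation using only the top half of test scorers, as if only they had stayed in (or been admitted to) the study.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)

# Simulate test scores and a criterion that truly correlates ~.5 with them.
test = [random.gauss(0, 1) for _ in range(10_000)]
criterion = [t + random.gauss(0, 1.7) for t in test]

r_full = pearson_r(test, criterion)

# Range restriction: suppose only the top half of test scorers appear in
# the validation sample (e.g. only admitted students, or only the
# volunteers who stayed in a longitudinal study).
cutoff = sorted(test)[len(test) // 2]
kept = [(t, c) for t, c in zip(test, criterion) if t >= cutoff]
r_restricted = pearson_r([t for t, _ in kept], [c for _, c in kept])

print(f"full-range r = {r_full:.2f}, restricted-range r = {r_restricted:.2f}")
```

The restricted-sample correlation comes out noticeably smaller than the full-range one, even though the underlying relationship is unchanged: cutting off the low scorers removes much of the predictor’s variance.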
How big does the validity coefficient have to be?
Magnitude depends entirely on the context: if validating a driving-behaviour measure against accident involvement, even a low coefficient (e.g. .10) may be acceptable, since accidents are rare and have many causes; if comparing a new intelligence test against an established one, we’d expect a much larger coefficient (at least .80)
What is criterion validity?
A judgment regarding how adequately a score on a test can predict an individual’s most probable standing on some measure of interest (the criterion); the standard against which the test is evaluated
Give 3 examples of criterion validity
- University admissions test > GPA at end of the 1st year
- Depression inventory > clinicians’ ratings of severity of depression
- Clerical aptitude test > supervisor’s ratings of job performance
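For the first example above, the validity coefficient is simply the Pearson correlation between admissions-test scores and later GPA. A minimal sketch with invented data for eight hypothetical applicants (all names and values are made up; real admissions validities are typically far lower than this tidy example suggests):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for eight applicants (all values invented):
admissions_test = [55, 62, 70, 48, 81, 66, 73, 59]          # predictor
first_year_gpa = [2.8, 3.1, 3.4, 2.5, 3.9, 3.0, 3.6, 2.9]  # criterion

validity = pearson_r(admissions_test, first_year_gpa)
print(f"validity coefficient r = {validity:.2f}")
```

The higher the resulting coefficient, the better the test predicts an individual’s most probable standing on the criterion.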