validity part 1 Flashcards
validity vs. reliability
- Reliability = is the test measuring anything other than error?
- Validity = is the test measuring what it specifically intends to be measuring?
- Reliability is a prerequisite for validity
- Low reliability will limit validity
- However, high reliability does not guarantee validity
what is validity
- pertains to interpretations of scores, not to the test itself
- is related to HOW the scores will be interpreted and used
- is not “all or none.” A test might be valid for some purposes and not others.
- rests on both empirical evidence (criterion validity) and theory that links the evidence to the construct being measured (construct validity)
types of validity
- Face
- Content
- Criterion
  - Concurrent
  - Predictive
- Construct
what is face validity
- Pertains to the test’s appearance
- Do the items look like they “belong”?
- Can a test-taker tell what is being assessed by looking at the test items?
- Low FV = both answers are “no”
- High FV = both answers are “yes”
advantages of high and low face validity
High: Safeguards the privacy of the test-taker
Low: More difficult for test-takers to distort their answers
disadvantages of high and low face validity
High: Easier for test-takers to distort their answers
Low: Could be considered deceptive or misleading to test-takers;
Could decrease motivation;
Ambiguity of items could make test-takers wary and thus introduce error
what is content validity
- Pertains to coverage
- Do the test items adequately cover the domain the test intends to assess?
- Match between the actual content of the test and the content that should be included
how can content validity be compromised
- Two ways that content validity might be compromised:
1) Construct-irrelevant content (including items that don’t belong)
- Example: a test on Chapters 1-4 includes items from Chapter 5
2) Construct underrepresentation (failing to include items that should be included)
- Example: a test on Chapters 1-4 includes items only on material from Chapter 3
content validity vs. face validity
- Face Validity depends on the judgment of non-experts (test-takers)
- Content Validity rests on the judgment of experts.
- Face Validity is not usually considered an important psychometric facet of validity
determining content validity
- Usually addressed in the process of test development rather than after the fact
- Two common methods for ensuring content validity:
- Item specifications: a committee is assembled to decide what topics should be addressed on the instrument
- Expert panels: experts in the field read a group of items that have already been developed for the test, relying on their expertise to tell us which items are good and which are not
what is criterion validity
- Criterion = an alternate way of measuring the construct
- Goal = demonstrate that the test correlates with other criteria that are important elements of the construct being measured
examples of criteria
- Another test of the same construct
- Psychiatric diagnosis
- Therapist ratings
- Grade Point Average
- Job performance evaluations
concurrent vs. predictive
Concurrent:
- When the test is administered: Now
- When the criterion is assessed: Now
Predictive:
- When the test is administered: Now
- When the criterion is assessed: Later
validity coefficient
- Expresses the correlation between the test and the criterion (computed like any Pearson r; see the sketch below)
- All the factors that can affect a correlation coefficient can also affect the validity coefficient:
- Restriction of range
- Heteroscedasticity
- Non-linear relationship
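A minimal sketch of the point above: the validity coefficient is nothing more than the Pearson correlation between test scores and criterion scores. All numbers here are invented for illustration.

```python
# Sketch: a validity coefficient is the Pearson correlation between
# test scores and criterion scores. Data are invented for illustration.
import numpy as np

test_scores = np.array([52, 61, 70, 75, 80, 84, 90, 95])        # hypothetical test
criterion = np.array([2.1, 2.4, 2.9, 2.8, 3.2, 3.3, 3.6, 3.8])  # hypothetical GPA

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r.
validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]
print(f"validity coefficient r = {validity_coefficient:.2f}")
```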
restriction of range revisited
- Correlation between SAT and college GPA
- Unrestricted (hypothetical) correlation based on all students who take the SAT = .70
- Restricted correlation based only on those students who are admitted to college = .50
- This also excludes those who were admitted but chose not to attend for outside reasons (e.g., a pandemic or a family business)
- A restricted range will always result in a LOWER correlation between test and criterion than would be obtained if the range were unrestricted (see the simulation below)
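A quick simulation of the idea above, assuming a bivariate normal relationship between test and criterion; the true correlation, sample size, and admission cutoff are all invented for illustration.

```python
# Sketch: restriction of range lowers the observed test-criterion correlation.
# Simulate "SAT" and "GPA" with a true correlation of .70, then recompute r
# using only the top scorers (the "admitted" students).
import numpy as np

rng = np.random.default_rng(0)
rho = 0.70
cov = [[1.0, rho], [rho, 1.0]]
sat, gpa = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T

unrestricted_r = np.corrcoef(sat, gpa)[0, 1]

# Keep only applicants above the 60th percentile of the test (hypothetical cutoff)
admitted = sat > np.quantile(sat, 0.60)
restricted_r = np.corrcoef(sat[admitted], gpa[admitted])[0, 1]

print(f"unrestricted r = {unrestricted_r:.2f}")  # close to .70
print(f"restricted r   = {restricted_r:.2f}")    # noticeably lower
```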
other factors affecting the validity coefficient
- Criterion Contamination
- Criterion Unreliability
- Differential Validity
criterion contamination
- When the assessment of the criterion is not independent of the test results
- Example: Those who assign criterion ratings have knowledge of the test scores
- Could either inflate or deflate the validity coefficient
criterion unreliability
- Measurement error sets an upper limit (below 1.00) on the magnitude of the correlation between two variables (see the bound below)
- A test and a criterion might show a lower-than-expected validity coefficient because of measurement error that affects BOTH the test and the criterion
- Many studies do not provide information on the reliability of the criterion, and in many cases the criteria employed have lower reliability than the test whose validity is being examined
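The standard psychometric bound behind this card: the observed correlation between a test X and a criterion Y cannot exceed the square root of the product of their reliabilities.

```latex
% Upper bound imposed by unreliability:
%   r_XY = observed test-criterion correlation
%   r_XX', r_YY' = reliabilities of the test and the criterion
r_{XY} \le \sqrt{r_{XX'} \, r_{YY'}}
% Example: if the test's reliability is .90 and the criterion's is .50,
% the validity coefficient can be at most sqrt(.90 * .50) = sqrt(.45), about .67
```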
main point for criterion unreliability
If the criterion measures we use are poor (in terms of reliability), then we are unlikely to find strong evidence for validity, even when the test is actually a good measure of the intended construct.
correction for attenuation
- Formula that allows us to “correct” for the presence of measurement error in two variables whose correlation is being evaluated (shown below)
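The formula referred to here is Spearman’s classic correction for attenuation; in the notation above:

```latex
% Correction for attenuation (Spearman): estimated correlation between the
% error-free constructs, given the observed r and the two reliabilities.
\hat{r}_{XY} = \frac{r_{XY}}{\sqrt{r_{XX'} \, r_{YY'}}}
% Example: observed r = .40, test reliability = .80, criterion reliability = .50
% => corrected r = .40 / sqrt(.80 * .50) = .40 / .632, about .63
```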
when might correction for attenuation be useful
- If we are interested in comparing the correlations obtained between a single test and several different criteria, all of which differ in criterion reliability
- Purpose might be to show that the validity of the test remains relatively stable even when compared against criteria whose reliability varies
- If our interest is in describing the “theoretical” correlation between the constructs measured by the test and the criterion
differential validity
- The magnitude of the validity coefficient depends on the composition of the sample on which it is calculated
- A test might exhibit higher validity when used for one subgroup of individuals than when used for another subgroup
- If so, the test is said to show Differential Validity (a per-subgroup check is sketched below)
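A minimal sketch of how differential validity might be checked: compute the validity coefficient separately within each subgroup. All data and group labels are invented for illustration.

```python
# Sketch: per-subgroup validity coefficients. Markedly different r values
# across groups would suggest differential validity.
import numpy as np

def validity_by_group(test, criterion, groups):
    """Return {group label: Pearson r between test and criterion in that group}."""
    return {
        g: np.corrcoef(test[groups == g], criterion[groups == g])[0, 1]
        for g in np.unique(groups)
    }

# Invented illustration data
test = np.array([50, 55, 60, 65, 70, 75, 80, 85, 90, 95])
criterion = np.array([2.0, 2.6, 2.2, 3.0, 2.9, 3.1, 3.0, 3.6, 3.2, 3.9])
groups = np.array(["A"] * 5 + ["B"] * 5)

print(validity_by_group(test, criterion, groups))
```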