Lecture 9: Validity Flashcards
Reliability
degree to which differences in test scores reflect differences in the psychological attribute that affects test scores (does the test's score reflect something with precision?)
Validity
“the degree to which evidence and theory support the interpretations of test scores for proposed uses”
3 Points about validity
1. Validity is about the interpretation of test scores in terms of a specific psychological construct; it is not about the test itself 2. Validity is a matter of degree; it is not all or none 3. Validity is based on solid empirical evidence and theory (use information from research, not just opinion)
What is the importance of theory for validity
questionnaires and tests require a theory that defines what the construct is; otherwise it is not possible to understand what the test scores mean (if you can't define it, you don't know whether you're measuring it)
Why is it important to measure validity? (what could be detrimental if you don’t)
if the psychological meaning of test scores is misinterpreted, then 1. Psychological conclusions based on those scores may be mistaken 2. Decisions made about people (based even partly on those scores) may be misguided and potentially unfair or dangerous
What are the dimensions of construct validity?
1. Test content (what are we measuring?) 2. Internal structure (factor analysis; what are the dimensions and how do they relate?) 3. Response processes (how someone thinks about an item, what they are thinking about, and how that relates to what we're trying to ask) 4. Consequences of use (the impact, wanted or unwanted) 5. Associations with other variables (how the construct has been thought about in the past; empirical evidence)
What are the three types of validity? (from book reading)
- Content
- Criterion: the degree to which test scores can predict a specific criterion; de-emphasizes the conceptual meaning or interpretation of test scores
a. Concurrent and predictive validity (the criterion is now thought to fall under the construct)
- Construct (now the central focus)
a. Test content: does the content of the test match what should be on the test? (expert ratings)
i. Construct-irrelevant content and construct underrepresentation
ii. Face validity is close to content validity (but is judged by laypeople rather than experts)
b. Internal structure: does the internal structure of the test match what it should be? (factor analysis)
c. Response processes: do the psychological processes that respondents use when completing a measure match what they should? (interviewing and eye tracking)
d. Associations with other variables: do the associations with other variables match what they should be? (convergent, discriminant, concurrent, and predictive correlations)
e. Consequences of use: do the actual consequences of using a measure match the consequences that should be seen? (evaluation of consequences, differential impact, and systematic changes [MCAT content reflected how premeds were taught])
What are the three types of validity from Cronbach and Meehl?
- Criterion
a. Predictive: criterion obtained after the test is given
b. Concurrent: test score and criterion obtained at the same time
- Content: showing that the test items are a sample of a universe in which the investigator is interested; deductive
a. Acceptance of the universe of content as defining the variable to be measured is essential
- Construct: invoked when a test is interpreted as a measure of some attribute or quality that is not "operationally defined"
What is content validity?
"the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured"; the relationship between the content of a test and some well-defined domain of knowledge or behavior (i.e., between what is on the test and what should be on it according to the theory); a good match between content and domain = high content validity; does the actual content of a test match the content that should be included on the test?
Steps for content validity
1. Reflect the full scope of the construct (vs. construct underrepresentation) 2. Systematically exclude anything besides the construct (vs. construct-irrelevant variance); evidence comes from expert ratings
Construct underrepresentation
The degree to which a test fails to capture important aspects of the given construct
Construct-irrelevant variance
the degree to which test scores are affected by processes extraneous to the intended construct
Steps to assess content validity
1. Describe the content domain (in terms of boundaries/limits and structure) 2. Determine the areas of the content domain measured by each item (go item by item: what am I measuring?) 3. Compare the structure of the test with the structure of the content domain (is a two-domain structure even wanted?)
Concerns with content validity
short forms of a test (e.g., a shortened depression screening tool with only two items may miss people or over-diagnose); content overlap (e.g., agitation or tension: if you're measuring depression but include items like "tense" or "agitated," that is more anxiety; overlap between questionnaires for two different constructs)
Final thoughts on content validity
content validation is a process rather than a statistical analysis (a process of going through each item and domain); content validity differs from internal consistency reliability (internal consistency asks whether the items hang together and measure the same thing; content validity is related but concerns the content of the construct, not just whether the items measure the same thing)
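The contrast can be made concrete: internal consistency is a statistic you compute, while content validation is a judgmental process with no single formula. A minimal sketch of Cronbach's alpha in plain Python (the item scores below are made-up data for five respondents on a hypothetical 3-item scale):

```python
# Hypothetical item scores: 5 respondents x 3 items (rows = respondents).
items = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [5, 4, 5],
]

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])                                        # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha(items), 3))  # -> 0.929 for this toy data
```

A high alpha here would show the items hang together statistically, yet says nothing about whether they cover the right content, which is exactly the distinction the card draws.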
Face validity
whether a test looks like it measures its target construct ("How sad are you?" for depression); don't confuse this with the empirical approach (face validity does not mean the test actually measures what you're interested in); concern: malingering (all questions are subject to response bias, but high face validity makes items easier to fake; low face validity might be better when malingering is a concern)
Internal Structure (structural validity)
does the actual internal structure of a test match the structure the test should possess?; dimensionality and factor analysis (if a test has multiple dimensions, how does it measure them and how do they relate to one another?)
Internal structure validity issues evaluated through factor analysis
1. Number of factors 2. Meaning of factors 3. Associations between factors (if more than one) 4. Order of factors (do they relate to the construct overall or are they separate? e.g., top level: depression; second level: 1. cognitive 2. somatic; third level: items that relate to those)
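A rough sketch of how the "number of factors" question is often approached: examine the eigenvalues of the inter-item correlation matrix (the Kaiser criterion, one common heuristic, retains factors with eigenvalue > 1). The data below are simulated, assuming NumPy is available: items 0-2 share one latent factor and items 3-5 share another, so two factors should emerge.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
f1 = rng.normal(size=n)                    # latent factor 1 (e.g., "cognitive")
f2 = rng.normal(size=n)                    # latent factor 2 (e.g., "somatic")
noise = rng.normal(scale=0.5, size=(n, 6))

# Six items: the first three load on f1, the last three on f2.
data = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

# Eigenvalues of the item correlation matrix, largest first.
eigvals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
n_factors = int((eigvals > 1).sum())       # Kaiser criterion: eigenvalue > 1
print(n_factors)                           # -> 2 for this simulated data
```

Real factor-analytic practice also weighs scree plots, fit indices, and interpretability of the factors, not just this one cutoff.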
Response Process Validity
do the psychological processes that actually shape people's responses to the test match the processes that should shape their responses? (understand what the question means to the individual taking the test; people have different processes; interviewing after a questionnaire helps get a general sense of an individual's response process)
Two types of evidence for response process validity
Evidence: 1. Direct evidence: interviews with respondents, "think alouds" 2. Indirect evidence: eye tracking, response times, statistical analysis, and experimental studies of process (used when you can't conduct an interview or suspect respondents are not being truthful)
Consequences of Testing Validity
do the actual consequences of using a test match the consequences that should be seen?; Controversial as a facet of validity
Evidence for consequences of testing validity
evidence: 1. Evidence of intended effects 2. Evidence regarding unintended differential impact on groups (an unbiased test could still have biased results; women might score higher on emotional intelligence than men, so using it as a screening tool for grad school would select more women) 3. Evidence regarding unintended systemic effects (selecting more women for your program over time means more women professors and faculty members; long-term effects on the school and even the profession)
Associations with other measures validity
do the actual associations with other measures match the associations the test should have with those measures? (are the scores obtained a good measure of the construct? the extent to which scores on the measure are consistent with the construct and with measures related to that construct)
Evidence for associations with other measures validity
evidence: 1. Convergent validity (concurrent/predictive validity; we expect significant correlations between our test and measures of behaviors related to our construct; different measures of the same construct should also be correlated) 2. Discriminant (divergent) validity (behaviors not associated with our construct of interest should be uncorrelated with our test)
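The convergent/discriminant logic can be sketched with plain-Python correlations. All scores below are made-up: a hypothetical new depression test, an established depression measure (convergent: should correlate strongly), and shoe size (discriminant: should be near zero).

```python
import math

# Hypothetical scores for ten people (all data invented for illustration).
new_test    = [10, 14, 9, 20, 7, 15, 18, 5, 12, 16]   # new depression test
established = [11, 13, 8, 19, 9, 14, 17, 6, 11, 18]   # established measure
shoe_size   = [42, 38, 44, 40, 39, 43, 41, 38, 45, 40]  # unrelated variable

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"convergent r:   {pearson(new_test, established):.2f}")  # high
print(f"discriminant r: {pearson(new_test, shoe_size):.2f}")    # near zero
```

The validity argument rests on the pattern across both correlations, not either one alone: a strong convergent r with a near-zero discriminant r supports interpreting the scores as measuring the intended construct.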