L3: Validity Flashcards
define validity (casual & APA definition)
casual: the degree to which a psych test measures what it purports to measure
APA: the degree to which test scores are interpreted and used in ways consistent with empirical evidence and theory
aka construct validity
what does validity depend on?
how a researcher interprets the test scores
ex: the Raven test of logical reasoning is an invalid measure of neuroticism, since there's no theory or evidence for such an interpretation
What are the types of evidence you can evaluate to see how valid a test is?
- test content (content validity)
- response process
- internal structure of the test
- associations with other variables
- consequences of use
define content validity
the degree to which the content of a measure truly reflects the full domain of the construct for which it is being used -> assessed via expert judgment
define face validity & how it differs from content validity
validity in the eyes of the test user
different from content validity, because content validity is based on expert judgment rather than the test user's impression
what do you need to watch out for in content validity?
- construct underrepresentation (ex: a personality test with no (or only a few) neuroticism questions)
- construct-irrelevant content (ex: a personality test with questions about mood)
How does the response process indicate validity? (with examples of the desired process for self-report & ability/achievement questions)
for the test score to have a valid interpretation, respondents should use the intended psychological response process to answer the items
ex:
- self-report questions: "if I leave the house, I often double-check whether I took my keys with me" (1 = never, 2 = hardly ever, 3 = sometimes, 4 = often, 5 = always): the desired response process is often based on memory retrieval (reading the item -> memory retrieval -> matching with response options -> respond)
- ability/achievement questions: the desired response process depends on the (cognitive) ability (ex: read item -> logical reasoning -> match with response options -> respond)
How can you find out if the respondent used the desired process?
- direct evidence: think-aloud protocols (respondents say whatever they're thinking out loud), interviewing respondents
- indirect evidence: process data (response times, mouse movements, eye movements), statistical analysis of responses (like item-total correlations, reliability; a sketch follows below), experimentally manipulating the response process
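A minimal sketch of one such indirect check, corrected item-total correlations, assuming a respondents-by-items matrix of Likert responses (the data and the 1-5 scale below are simulated, purely for illustration):

```python
# Corrected item-total correlations: each item vs. the sum of the other items.
import numpy as np

rng = np.random.default_rng(0)
trait = rng.normal(size=200)                     # latent trait, 200 respondents
# 5 Likert items (1-5) all driven by the same trait plus noise.
X = np.clip(np.round(3 + trait[:, None] + rng.normal(scale=0.8, size=(200, 5))), 1, 5)

for j in range(X.shape[1]):
    rest = X.sum(axis=1) - X[:, j]               # total score excluding item j
    r = np.corrcoef(X[:, j], rest)[0, 1]
    print(f"item {j + 1}: corrected item-total r = {r:.2f}")
# Low or negative values flag items that respondents may not be answering
# via the intended response process.
```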
what are some threats to the response process validity?
- poorly designed items (misinterpretation of the item, an unintended correct solution, multiple correct solutions, etc.)
- respondent reasons (lack of motivation, social desirability, guessing, etc.)
how does the internal structure of the test affect validity?
does the theoretical structure match the structure you find in practice?
every psych test has a theoretical internal structure (unidimensional if your test only measures a single construct, multidimensional if your test measures multiple constructs)
how can you see if the internal structure of the test is valid?
factor analysis should show (see the sketch below):
- the number of factors matches the theoretical structure of the construct
- the rotated factor loadings display the theoretical structure (the right items correlate with the right factor)
- the correlations between the factors are as expected based on theory (so if the test is multidimensional and the factors are theoretically moderately correlated, this should be reflected)
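A minimal sketch of these checks, assuming the third-party factor_analyzer package (its loadings_ and phi_ attributes hold the rotated loadings and the factor correlations); the two-factor data are simulated:

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed third-party package

rng = np.random.default_rng(0)
# Simulate 500 respondents on 10 items: two moderately correlated factors,
# items 1-5 load on factor 1, items 6-10 on factor 2.
f = rng.multivariate_normal([0, 0], [[1, .4], [.4, 1]], size=500)
true_loadings = np.vstack([np.repeat([[.7, 0]], 5, axis=0),
                           np.repeat([[0, .7]], 5, axis=0)])
X = f @ true_loadings.T + rng.normal(scale=.5, size=(500, 10))

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")  # oblique rotation
fa.fit(X)
print(fa.loadings_.round(2))  # do the right items load on the right factor?
print(fa.phi_.round(2))       # factor correlations: as expected from theory?
```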
how does the test's association with other variables affect validity?
key question: do the test scores relate to other tests and variables in a theoretically meaningful way?
ex: if you invent a new scale for weight, you should check whether the measured weight correlates moderately with height (as would be theoretically expected)
check with: nomological network, criterion validity, concurrent validity, predictive validity
what are the 4 types of validity evidence based on the relationships between the test score & other variables?
- convergent evidence: when test scores correlate highly with other measures of the same construct
- discriminant evidence: when test scores do not correlate with measures of unrelated constructs
- criterion evidence: when test scores correlate with specific outcomes or behaviours (predictive or concurrent)
predictive evidence: when the test scores predict future performance or outcomes
concurrent evidence: when test scores correlate with other measures taken at the same time
what is a nomological network?
summarizes all theoretical relations between the construct of interest and other constructs and variables
what is a nomological network used for?
used when establishing the validity of a new test: you look at the correlations/relations between the construct you want to study and the other things in the network. the correlations shown are from literature & theory
each of the other constructs in the nomological network can be operationalized in another test (ex: the construct of depression via the items of a depression test)
apply these tests to a sample of subjects and see how well the actual results fit the theoretical relations; based on that you can judge how valid your test is (this is what convergent & discriminant evidence are used for)
- can also include observed variables next to the constructs (age, high school grades, educational attainment, etc.) and their relations to the constructs (like intelligence)
what is discriminant vs convergent evidence in a nomological network?
convergent evidence is when the observed positive correlations between 2 constructs (i.e., once you've applied them to a sample of subjects) fit the theoretical positive correlation
discriminant evidence is when the constructs are unrelated in practice (when applied to a sample) and also in theory
-> both are good for validity
define validity coefficients
individual correlations between operationalized constructs (ex: the correlation between an IQ test and a critical thinking test)
define validity generalization
establishing whether the pattern of correlations found in the nomological network also holds for other tests (a different type of IQ test, a different critical thinking test, etc.)
how do you quantify construct validity?
by checking how well the predicted and observed correlations match (see the sketch below)
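A minimal sketch of this idea, with made-up numbers: compare the vector of theoretically predicted correlations (from the nomological network) with the vector observed in your validation sample. Correlating the two patterns is one common way to summarize the match:

```python
import numpy as np

predicted = np.array([ .60,  .40,  .00, -.30])  # from literature & theory
observed  = np.array([ .55,  .35,  .05, -.20])  # from your validation sample

fit = np.corrcoef(predicted, observed)[0, 1]    # agreement between the patterns
print(f"pattern agreement r = {fit:.2f}")       # near 1 = good construct validity
```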
define criterion validity
when you include observed variables (like age, high school grades, etc.) in the nomological network next to the constructs
you can then calculate the criterion validity: the association between the construct and an observed variable (the criterion) it should theoretically be related to
define concurrent validity
type of criterion validity
the association between the construct & an observed variable measured at the same time (ex: the correlation between an intelligence test & age)
define predictive validity
type of criterion validity
the association between the construct & an observed variable measured in the future (ex: the correlation between a primary school education test & salary in one's first job)
what are the 4 methods for evaluating convergent & discriminant validity?
- focused associations: look at the correlations between a test score & a few key variables (high correlations with related constructs indicate good convergent validity; low correlations with unrelated constructs indicate good discriminant validity)
- sets of correlations: evaluate a broader set of correlations to provide a comprehensive picture of a test's convergent & discriminant validity
- multitrait-multimethod matrices: correlations of the same trait measured by different methods (convergent validity) should be high; correlations of different traits measured by the same or different methods (discriminant validity) should be low
- quantifying construct validity: statistical techniques using the multitrait-multimethod data
what are multitrait-multimethod matrices?
they help us interpret validity coefficients
ex: you find that the correlation between 2 self-report social skill tests is .62
-> the correlation may be due to trait variance (shared variance in the test scores due to the same trait)
-> the correlation may be due to method variance (shared variance in the test scores due to the same method)
the matrix shows all methods & all constructs and their correlations
what should a multitrait-multimethod matrix look like for the validity to be good?
- monotrait-heteromethod correlations should be large (convergent validity)
- heterotrait-heteromethod correlations should be smaller (discriminant validity)
- heterotrait-monomethod correlations should also be smaller (discriminant validity) (see the sketch below)
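A minimal sketch of reading an MTMM matrix, assuming two traits (social skill, anxiety) each measured by two methods (self-report, observer rating); all correlations are made up:

```python
import numpy as np

# Rows/columns: skill/self, anxiety/self, skill/observer, anxiety/observer
mtmm = np.array([
    [1.00,  .30,  .62,  .15],   # skill / self-report
    [ .30, 1.00,  .10,  .58],   # anxiety / self-report
    [ .62,  .10, 1.00,  .25],   # skill / observer rating
    [ .15,  .58,  .25, 1.00],   # anxiety / observer rating
])

# Convergent validity: monotrait-heteromethod correlations should be large.
print("skill, self vs observer:", mtmm[0, 2])    # .62
print("anxiety, self vs observer:", mtmm[1, 3])  # .58
# Discriminant validity: heterotrait correlations should be clearly smaller,
# e.g. heterotrait-monomethod mtmm[0, 1] = .30 and
# heterotrait-heteromethod mtmm[0, 3] = .15.
```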
how do the test's consequences of use affect validity? what are the 3 types of evidence we can gather to check this?
does the use of the test scores have its intended consequences?
1. evidence of intended effects: does the test help with what it was needed for? (ex: a screening test in personnel selection: does it save time (more efficient selection)?)
2. evidence of unintended differential impact on groups (ex: a selection test involves writing an essay about the effects of social media on self-esteem: students who use social media are at an advantage)
3. evidence of unintended systematic effects on organizational systems (ex: high-stakes tests in education alter the curriculum because teachers focus more on the topics that will be on the exam)
what are the factors affecting validity coefficients?
- the actual associations between constructs
- random measurement error & reliability (low reliability weakens the observed validity coefficient; see the attenuation formula after this list)
- restricted range (if the sample studied has a restricted range of scores, it can distort the validity coefficient)
- skew and relative proportions (skewed distributions & unequal group sizes)
- method variance (shared method variance (when multiple traits are measured by the same method) can artificially inflate the correlations between those traits)
- time (the longer the time between measurements, the weaker the expected correlation, due to changes in the traits or behaviours)
- prediction of single events (predicting single instances of behaviour or outcomes tends to produce weaker correlations than predicting aggregates)
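The reliability point above can be made precise with the standard attenuation formula, where r_{XY} is the true-score correlation, r_{X'Y'} the observed correlation, and r_{XX'}, r_{YY'} the reliabilities of the two measures:

```latex
r_{X'Y'} = r_{XY}\,\sqrt{r_{XX'}\, r_{YY'}}
```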
what is the difference between reliability & validity?
reliability refers to the consistency of test scores
while validity addresses whether the scores serve their intended purpose
how can you interpret a validity coefficient?
as a squared correlation (r^2): the proportion of variance in one variable that can be accounted for by the other
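A worked example with a made-up coefficient:

```latex
r = .30 \;\Rightarrow\; r^2 = .09
```

so the test accounts for 9% of the variance in the criterion.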
what 4 methods can you use to estimate practical effects of a validity coefficient?
- Binomial Effect Size Display: converts correlations into more easily interpretable figures showing the practical impact of a validity coefficient
- Taylor-Russell Tables: help evaluate the effectiveness of a test in predicting success, based on the selection ratio and base rate
- Utility Analysis: assesses the practical value of a test by considering factors like the cost of testing and the benefits of correct predictions
- Sensitivity/Specificity: often used in diagnostic testing to evaluate how well a test identifies true positives and true negatives (a sketch of the BESD and sensitivity/specificity follows below)
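A minimal sketch of two of these methods, with made-up numbers. The BESD re-expresses a correlation r as a difference in "success rates" between two equal-sized groups (.50 ± r/2); sensitivity and specificity come from a 2x2 table of predictions vs. outcomes:

```python
# 1) Binomial Effect Size Display (BESD)
r = .30
success_selected = .50 + r / 2   # "success rate" among high scorers
success_rejected = .50 - r / 2   # "success rate" among low scorers
print(f"BESD: {success_selected:.0%} vs {success_rejected:.0%}")  # 65% vs 35%

# 2) Sensitivity/specificity from hypothetical diagnostic counts
tp, fn, tn, fp = 80, 20, 90, 10
sensitivity = tp / (tp + fn)     # share of true positives identified
specificity = tn / (tn + fp)     # share of true negatives identified
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```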
is face validity a facet of construct validity?
no
what is construct underrepresentation?
When a test does not include the entire range of content of the construct that it intends to measure
Rachel analyzes an anxiety test and finds that the items form two factors. To what type of validity evidence does this finding contribute?
internal structure
what are the different types of validity?
construct validity: umbrella term for the other types! in short: does the test measure what it's supposed to measure?
includes:
- convergent validity: matching what it should; your test should correlate with other tests that measure the same thing! if you make an anxiety test, it should correlate with other well-known anxiety tests
- discriminant validity: not matching what it shouldn't; it should not correlate with tests that measure different things; your anxiety test should not correlate with a happiness test
criterion validity: how well does the test correlate with real-world outcomes?
- concurrent validity: correlates with current, real-time outcomes (a driving test correlates with actual driving performance)
- predictive validity: predicts future outcomes
content validity: does the test cover all aspects of the construct
what is consequential validity about?
the consequences of using the test scores: do they have their intended effects?
in studying response process validity, what is considered “direct evidence” and what is considered “indirect evidence”?
direct:
- interview
- think-aloud protocol
indirect:
- response times
- mouse movements
- experiments
Why is it so important to use a reliable criterion measure when evaluating a validity correlation?
because the validity coefficient will be attenuated when the criterion measure is not reliable
what is the relationship between reliability, validity, and measurement error?
high measurement error -> poor reliability -> attenuated validity coefficient
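Rearranging the attenuation formula from earlier gives the standard correction for attenuation, an estimate of what the validity coefficient would be if both measures were perfectly reliable:

```latex
r_{XY} = \frac{r_{X'Y'}}{\sqrt{r_{XX'}\, r_{YY'}}}
```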
what happens to a validity coefficient when one of the tests is skewed?
it's reduced (mismatched distribution shapes limit the maximum attainable correlation)
What is a risk of using predictive validity correlations when evaluating validity?
underestimation of validity coefficients, because the variables are measured at different points in time (the traits or behaviours can change in between)