Week 5-6 Validity Flashcards
validity
the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of a test
four important implications of this definition of validity
1 Validity concerns interpretations and uses of scores
2 Validity is not a property of the test itself
3 Validity is a matter of degree
4 Validity is based on theory and evidence
Validity is a crucial basis for:
1 the meaningful interpretation of behavioural research
2 making sound societal decisions based on such research
3 making informed test-based decisions about individuals
Content validity:
The degree to which the content of a test is representative of the domain it’s supposed to cover.
criterion-related validity
a measure obtained by evaluating the relationship of scores obtained on the test with scores on other tests
construct validity
a measure obtained by performing an analysis of:
a) how scores on the test relate to other test scores and measures, and
b) how scores on the test can be understood within some theoretical framework
Test Content
This is the match between the content of a test and the content that should be included in the test
two types of validity relevant to test content:
1 Content validity
2 Face validity
two key threats to content validity:
Construct-irrelevant content
Construct under-representation
Face validity
what a test appears to measure to the person being tested, rather than what the test actually measures
Content validity can be evaluated only by ________ in a field, whereas face validity must be assessable by __________
Experts, non-experts
Internal structure
the way the parts of a test are related to each other:
• some tests include items that are highly correlated with each other, forming a single cluster
• other tests include items that fall into two or more clusters
Factorial validity concerns the match between the ________ internal structure of a test and the structure the test ________ possess
actual, should
Response processes
there should be a close match between the psychological processes that the respondents actually use when completing a measure, and the process that they should use
methods for obtaining validity evidence of the response processes include:
- “think-aloud” procedures
- cognitive interviews
- focus groups
- response times
- eye movements
Associations with other variables
the way in which the construct is connected to other relevant psychological variables
- Our theoretical understanding of the construct we are trying to measure should lead us to expect a particular pattern of associations with other variables
- This type of validity evidence emphasises the match between a measure's predicted and observed associations with other measures
Convergent evidence (convergent validity)
the degree to which test scores are correlated with tests of related constructs
Discriminant evidence (discriminant validity)
the degree to which test scores are uncorrelated with tests of unrelated constructs
Concurrent validity evidence
the degree to which test scores are correlated with other relevant variables that are measured at the same time as the test undergoing validation
Predictive validity evidence
the degree to which scores on the test undergoing validation are correlated with relevant variables that are measured at a future point in time
consequential validity
refers to the social and personal consequences associated with using a particular test
what kind of test is associated with greater consequential validity
a non-biased test
three other types of validity that arguably do not fit as strongly within this construct/theory framework:
1 Criterion Validity
2 Induction-Construct Development Interplay
3 Measurement as Theory
A criterion
the standard against which a test or test score is evaluated
2 examples of criterion validity
Concurrent validity and predictive validity
Induction-Construct Development Interplay
bottom-up theory development, e.g., the Five-Factor Model (FFM)
Measurement as theory
This approach rejects much of the unitary view except the importance attached to constructs and the theoretically based examination of response processes
Internal consistency reliability is typically estimated with
Cronbach’s α
Cronbach’s α assumes
a) unidimensionality
b) multidimensionality
unidimensionality
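A minimal sketch of how Cronbach’s α can be computed from raw item scores, using the standard formula α = (k / (k − 1)) × (1 − sum of item variances / variance of total scores). The response data and the function name cronbach_alpha are hypothetical, chosen only for illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 3 items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The value is only interpretable as internal consistency if the items are unidimensional, as the card above states.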
Factorial validity concerns the ___________ ___________ of test scores
internal structure
The test items on a unidimensional test would also have the property of ___________ ____________
conceptual homogeneity
Is a total test score calculated for a multidimensional test with uncorrelated dimensions?
no, a score is obtained for each dimension, but the dimension scores are not combined to compute a total test score
The communality for a given variable can be interpreted as
the percentage of variation in that variable explained by the extracted components
As a general statement, you want to see communalities that are at least
.04 for items or .09 for subscales
Items are less reliable than subscales, so they have _______ communality expectations
lower
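A small sketch of where the communality thresholds come from, assuming orthogonal (uncorrelated) components, in which case a variable's communality is the sum of its squared loadings. The loadings and the function name communality are hypothetical.

```python
def communality(loadings):
    """Sum of squared loadings of one variable on the extracted components."""
    return sum(loading ** 2 for loading in loadings)


# Hypothetical item loading on two orthogonal components.
two_component = communality([0.60, 0.30])
print(round(two_component, 2))

# With a single component, loadings of .20 and .30 correspond to
# communalities of .04 and .09.
item_floor = communality([0.20])
subscale_floor = communality([0.30])
print(round(item_floor, 2), round(subscale_floor, 2))
```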
The sum of the “Initial Eigenvalues” is equal to the number of
variables included in the analysis
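A small check of this property for the simplest case, two variables. The correlation value r = 0.5 is hypothetical. Each standardized variable contributes a 1 to the diagonal of the correlation matrix, and the eigenvalues sum to that diagonal total, so the same result holds for any number of variables.

```python
import math


def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]]."""
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return centre + radius, centre - radius


r = 0.5  # hypothetical correlation between the two variables
first, second = eigenvalues_2x2_symmetric(1.0, r, 1.0)
print(first, second, first + second)
```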
Component loadings refer to the associations between ______ and ____________
items, components
Component loadings vary between
-1 and +1
The size of the loading indicates the degree of association between an _______ and a ______________
item, component
Component loadings further from 0 indicate ____________ associations
stronger
A positive loading indicates that people who respond with a high score on an item have a ________ level of the underlying component
high
A negative loading indicates that people who respond with a high score on an item have a _____ level of the underlying component
low
As a general statement, useful component loadings are either _____ or ____ or greater
.20 (Items) or .30 (subscales)
Simple structure occurs when each item is strongly linked to
one and only one component
One way to help achieve simple structure is to
rotate the solution
Should you always rotate the solution?
Yes (provided two or more components were extracted)
How many components do you need to rotate the solution?
two or more
If you extract only one component, then there will only be the
“Component Matrix”
If you have a minimum of 5 variables per factor and the communalities all exceed _____, then a sample size of about _____ should be sufficient
.20, 150
There are no simple guidelines to follow, because the size of the sample required will be dependent upon two main factors:
- the amount of communality associated with the variables (higher communality means less sample size required)
- the number of variables per factor (higher number of variables per factor means less sample size required)
4 Methods of evaluating construct validity
1 Focussed associations
2 Sets of correlations
3 Multitrait-Multimethod Matrices
4 Quantifying Construct Validity
3 Factors affecting validity coefficients
1 True associations
2 Measurement error
3 Restricted range
The interconnections between a construct and other related constructs are known collectively as a
nomological network
validity coefficient
correlation coefficient between a test score (predictor) and a performance measure (criterion)
Example of a focused association
SAT and first year university marks
Validity generalization studies are intended to evaluate the predictive utility of a test’s scores across
a range of settings, times, situations, etc
Sets of correlations
large nomological networks incorporating a wider variety of other constructs, with differing levels of association with the main construct
validity generalization
is a process of evaluating a test’s validity coefficients across a large set of studies, comparable to a meta-analysis
Is the judgement about the degree to which the pattern of coefficients matches the expectation
a) objective
b) subjective
b) subjective
Validity generalization studies can essentially address three questions:
1 estimate the average level of predictive validity across studies
2 estimate the degree of variability associated with the validity coefficients
3 identify sources of systematic variability in the validity coefficients
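A bare-bones sketch of the first two questions, assuming each study reports a sample size and a validity coefficient and that studies are weighted by sample size. The study values and the function name are hypothetical. The third question would need a moderator analysis, which is not shown.

```python
def weighted_mean_and_variance(studies):
    """studies: list of (sample_size, validity_coefficient) pairs."""
    total_n = sum(n for n, _ in studies)
    # Question 1: average validity, weighting each study by its sample size.
    mean_r = sum(n * r for n, r in studies) / total_n
    # Question 2: sample-size-weighted variability of the coefficients.
    var_r = sum(n * (r - mean_r) ** 2 for n, r in studies) / total_n
    return mean_r, var_r


# Hypothetical studies: (sample size, validity coefficient).
studies = [(100, 0.20), (200, 0.30), (100, 0.40)]
mean_r, var_r = weighted_mean_and_variance(studies)
print(round(mean_r, 3), round(var_r, 4))
```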
a correlation between two scores may conflate two sources of variance
- trait variance (the good stuff)
- method variance (the bad stuff)
large correlations between different traits using the same measurement method suggest that the correlations are simply due to
a response style (i.e., method variance)
using an MTMMM, we want a lot of shared trait variance, particularly for identical traits measured using
different methods
The multitrait component of the study refers to
administering the questionnaire measuring the trait of interest in addition to measures of other traits, such as impulsivity, conscientiousness, and emotional stability
the multimethod component refers to
measuring the trait of interest and the additional traits using multiple methods, e.g., self-report and acquaintance report
Sources of variance for Heterotrait-Heteromethod correlations
Nonshared trait variance, nonshared method variance
The most stringent test in an MTMM analysis is to determine whether the ________-________ correlations are “meaningfully” larger than the ________-________ correlations
monotrait-heteromethod, heterotrait-monomethod
one of the limitations associated with MTMMM
There are no clear guidelines to evaluate the differences in the mean correlations
At the very least, the ________-________ correlations need to be larger than the ________-________ correlations
monotrait-heteromethod, heterotrait-heteromethod
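A minimal sketch of how the correlations in a multitrait-multimethod matrix can be sorted into blocks and averaged, so that the mean monotrait-heteromethod correlation can be compared with the means of the two heterotrait blocks. The trait labels, method labels, and correlation values are all hypothetical.

```python
from statistics import mean


def classify(trait_a, method_a, trait_b, method_b):
    """Label a correlation by whether the traits and methods match."""
    trait = "monotrait" if trait_a == trait_b else "heterotrait"
    method = "monomethod" if method_a == method_b else "heteromethod"
    return f"{trait}-{method}"


# Hypothetical correlations between pairs of (trait, method) measures.
correlations = [
    (("trait_A", "self"), ("trait_A", "acquaintance"), 0.55),
    (("trait_B", "self"), ("trait_B", "acquaintance"), 0.50),
    (("trait_A", "self"), ("trait_B", "self"), 0.30),
    (("trait_A", "acquaintance"), ("trait_B", "acquaintance"), 0.25),
    (("trait_A", "self"), ("trait_B", "acquaintance"), 0.10),
    (("trait_A", "acquaintance"), ("trait_B", "self"), 0.12),
]

# Group the correlations by block type, then average each block.
blocks = {}
for (t1, m1), (t2, m2), r in correlations:
    blocks.setdefault(classify(t1, m1, t2, m2), []).append(r)

means = {name: mean(values) for name, values in blocks.items()}
for name, value in means.items():
    print(name, round(value, 3))
```

Whether a difference between block means is "meaningfully" large remains a judgement call; the code only organises the numbers.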
Quantifying construct validity (QCV)
this procedure requires researchers to predict the magnitude of the correlation between their measure of interest and their selected criteria
3 advantages of the QCV approach
1 It forces researchers to consider carefully the expected pattern of convergent and discriminant associations that would make theoretical sense
2 It forces researchers to make explicit quantitative predictions about the pattern of associations
3 It provides a single value reflecting the overall “goodness-of-fit” between the predicted and actual associations
using the QCV, why would a low correlation between the predicted and actual associations not necessarily reflect poor validity?
because the predicted associations may simply be a poor reflection of a construct’s nomological network
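A simplified sketch of the single "goodness-of-fit" value, computed here as the Pearson correlation between predicted and observed correlations. The criteria, the predicted values, and the observed values are hypothetical, and published QCV procedures may include further steps not shown here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical correlations of the measure with four criterion variables.
predicted = [0.60, 0.40, 0.10, -0.20]  # stated before data collection
observed = [0.50, 0.45, 0.05, -0.10]   # obtained from the sample

fit = pearson(predicted, observed)
print(round(fit, 2))
```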
If a test’s or a criterion’s reliability is much lower than .70, there are two options:
1 disregard—or reduce the weight given to—a validity coefficient based on poor reliability
2 adjust the validity coefficient to account for measurement error
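A minimal sketch of option 2, using the classical correction for attenuation: the observed validity coefficient divided by the square root of the product of the two reliabilities. The input values are hypothetical.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores.

    r_xy: observed validity coefficient
    r_xx: reliability of the test
    r_yy: reliability of the criterion
    """
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed r = .30, reliabilities of .60 and .70.
corrected = correct_for_attenuation(0.30, 0.60, 0.70)
print(round(corrected, 2))
```

The corrected value is larger than the observed one because unreliability in either measure pulls the observed correlation toward zero.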
a correlation between two variables can be reduced if the range of scores in one or both variables is
artificially limited or restricted
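A small illustration of range restriction with made-up data: the same test-criterion relationship yields a smaller correlation once only high scorers are kept. The scores and the cut-off of 6 are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical test scores 1..10; criterion = score plus alternating noise.
test_scores = list(range(1, 11))
criterion = [x + (2 if x % 2 else -2) for x in test_scores]

r_full = pearson(test_scores, criterion)

# Keep only people scoring 6 or higher, e.g. those selected on the test.
selected = [(x, y) for x, y in zip(test_scores, criterion) if x >= 6]
r_restricted = pearson([x for x, _ in selected], [y for _, y in selected])

print(round(r_full, 2), round(r_restricted, 2))
```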