Chapter 4 Flashcards
describe early definitions/studies of test validity
aimed to determine the degree to which tests measured what they were supposed to measure
involved correlating scores of the test with an external criterion (benchmark)
aimed to produce validity coefficients
what were validity coefficients + what was their main limitation
an empirical index used to evaluate the degree to which a test measured what it was supposed to measure
limitation = demonstrating relevance to the test's purpose was difficult + open to interpretation
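To make the computation concrete, here is a minimal sketch (not from the chapter) of how a validity coefficient could be obtained: it is simply the Pearson correlation between scores on the test and scores on the external criterion. The variable names and score values below are hypothetical, for illustration only.

# Hypothetical sketch: a validity coefficient is the correlation between
# test scores and an external criterion (benchmark). Data below are made up.
from scipy.stats import pearsonr

test_scores = [24, 31, 18, 27, 35, 22, 29, 33]        # scores on the test
criterion_scores = [50, 62, 41, 55, 70, 44, 60, 66]   # external criterion (benchmark)

r, p = pearsonr(test_scores, criterion_scores)   # Pearson correlation coefficient
print(f"validity coefficient r = {r:.2f}")       # closer to 1.0 = stronger evidence

Note the limitation above still applies: a high r is only meaningful if the criterion itself can be shown to be relevant to the test's purpose.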
describe Rulon’s validity scale (4 elements)
an instrument cannot be labelled valid/invalid w/o respect to a given purpose
an assessment of an instrument's validity must include an assessment of its content and its relation to the measurement
different kinds of validity evidence are required for different types of instruments
some instruments are obviously valid and need no further study
what was APA’s definition of validity?
the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses
what were the four implications of APA’s definition of validity?
validity is about test scores and what they mean, not about the test itself (it depends on the purpose of the test + the meaning assigned to scores)
validity is a measure of degree (no specific point at which tests become valid)
validity involves using the scale for its intended purpose
concerns the importance of evidence and theory (scores can be viewed as valid if there is sufficient evidence and a viable theory)
why are theories important?
if theory is not supported empirically, it is not possible to know what test scores mean
every set of questions on a test refers to a theory
w/o an understanding of the theory, the scores are just responses to questions on a test with no meaning
what was Cronbach and Meehl’s definition of construct validity?
construct validity is composed of several subelements of validity
all of which should be considered when evaluating the validity of a construct
what is content validity and how is it measured?
determining whether a test covers a representative sample of the behaviour being measured
evaluated through domain representativeness and domain relevance
what does a measure with poor content validity indicate?
test does not measure all the content that it should measure
what are two concerns associated with domain representativeness as an evaluation method of content validity?
the extent to which the questions on a test adequately measure the entire domain they are supposed to measure (shortened tests involve a loss of information)
the issue of content overlap between two tests
what is the main concern associated with domain relevance as an evaluation method for content validity?
the extent to which the questions on a test are relevant to assessing the construct
some questions may assess a construct more than others and should be emphasized more
what are inclusionary and exclusionary criteria in terms of content validity?
inclusionary criteria = items/questions that count you towards having a specific disorder
exclusionary criteria = items/questions that eliminate you from having a specific disorder
what type of validity is face validity?
content validity