Chapter 5: Validity Flashcards
Validity def
The extent to which a test measures the attribute/construct it is designed to measure
Does the test measure what it was designed to measure?
-> Not a yes or no question - question of DEGREE
-> One of the most important characteristics of the test.
Guidelines regarding validity (3)
(1) Do NOT accept a test’s name as an indicator of what the test measures.
(2) Validity is NOT a yes/no decision
(3) Validity evidence tells how well the test measures what it is intended to measure.
-> Diff types of evidence can be generated for diff types of validity
What do we mean when we say that “Validity is NOT a yes/no decision”
- It comes in degrees and applies to a particular USE and a particular POPULATION
- It is a process: An ongoing, dynamic effort to accumulate evidence for a sound scientific basis for proposed test score interpretations
3 Types of Validity
Content, Criterion, Construct
Subtypes of Criterion validity
Concurrent, Predictive
Subtypes of Construct validity
Convergent, Divergent
Face Validity
Whether a test appears to measure what it is supposed to measure (does it appear valid).
Mere appearance that a measure has validity.
=> Not sufficient evidence of validity
A test with high face validity may: (3)
(1) Induce cooperation and positive motivation before and during test administration
(2) Reduce dissatisfaction and feelings of injustice among low scorers
(3) Convince policymakers, employers, and administrators to implement the test
There are situations where test designers DELIBERATELY build a test with low face validity. Why?
Sometimes a test with low face validity can elicit more honest responses
Content Validity def
Degree to which ELEMENTS OF A TEST are representative of the domain/construct of interest.
-> Evaluate how adequately the test samples the domain or content of the construct.
-> More QUALITATIVE than Quantitative
Establishing content validity (3)
(1) Describe the content domain: Identify the boundaries of the content domain + Determine the structure of the content domain.
(2) Inspect test - Expert judgment
(3) Form judgment that the test measures what it is supposed to measure… without gathering any external evidence
+ Content of the items must be carefully evaluated (wording appropriateness…).
When is content validity high?
When test content is a representative sample of the tasks that define the content domain
+
When the items do not measure something else
However, content validity is not enough to determine that the test is valid. Why?
No information about relation of test to external constructs or external variables
Criterion Validity def
Effectiveness of the test in predicting narrowly/specifically identified variables that are thought to be DIRECT measures of the construct.
-> How well a test corresponds with a particular criterion.
Criterion def
A standard that researchers use to measure outcomes such as performance or attitude.
-> Standard against which the test is compared
Objective criterion characteristics (2)
Observable and Measurable
E.g., Number of accidents, days of absence
Types of criterion
Objective & Subjective criterion
Subjective criterion
Based on a person’s judgement
E.g., Supervisor ratings, peer ratings
Concurrent Validity
Comes from assessments of the simultaneous relationship between the test and the criterion.
Criterion available at the SAME time as the test
-> Can also be used when a person does NOT know how they will respond to the criterion measure.
Predictive validity
The forecasting function of tests.
Degree to which test scores accurately predict scores on a criterion measure.
-> Criterion measure available in the future
What happens if the criterion measures FEWER dimensions than those measured by the test?
This decreases the evidence of validity because important characteristics of the construct are underrepresented by the criterion
-> Underrepresentation
Criterion contamination def
If the criterion measures MORE dimensions than those measured by the test
Validity coefficient def
Relationship between a test and a criterion.
-> Tells the extent to which the test is valid for making statements about the criterion.
What’s the range of validity coefficients?
Validity coefficients: Correlation between test and criterion
-> Rarely greater than r = .60
-> If much higher than that, the test may simply be an alternative measure of the criterion
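As a concrete sketch (with made-up scores, not data from the text), a validity coefficient is just the Pearson correlation between test scores and criterion scores:

```python
import math

# Hypothetical data: test scores (predictor) and job-performance ratings (criterion)
test = [52, 61, 47, 70, 58, 65, 49, 73]
criterion = [3.1, 3.8, 2.9, 4.2, 3.4, 3.9, 2.7, 4.5]

def pearson_r(x, y):
    """Pearson correlation: the validity coefficient when y is a criterion."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(test, criterion)
print(round(r, 2))
```

A real validity study would use far larger samples; this only shows what quantity the coefficient is.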
Comparison of validity coefficient in psychology and medicine
Psychological tests can provide information that is AS VALID as common medical tests
Factors Limiting Validity Coefficients (3)
(1) Range of Scores
(2) Unreliability of Test Scores
(3) Unreliability in Criterion
[Factors Limiting Validity Coefficients] Explain (1) Range of Scores
A restricted range of scores decreases validity coefficients because it attenuates the correlation between test scores and criterion scores
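The range-restriction effect can be simulated. This sketch (assumed simulation setup, not from the text) builds a criterion that correlates about .60 with the test in the full group, then keeps only above-average scorers, as when only admitted applicants are observed:

```python
import math
import random

random.seed(0)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Simulate test scores and a criterion with a true correlation of .60
n = 10_000
test = [random.gauss(0, 1) for _ in range(n)]
criterion = [0.6 * t + 0.8 * random.gauss(0, 1) for t in test]

r_full = pearson_r(test, criterion)

# Range restriction: keep only examinees above the mean
kept = [(t, c) for t, c in zip(test, criterion) if t > 0]
r_restricted = pearson_r([t for t, _ in kept], [c for _, c in kept])

print(round(r_full, 2), round(r_restricted, 2))
```

The restricted-group coefficient comes out noticeably smaller than the full-group one, even though the underlying test-criterion relationship is unchanged.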
Explain (2) Unreliability of Test Scores and how we deal with it
Low reliability decreases validity coefficients.
Solution: Correction for attenuation - the validity coefficient we would obtain with perfectly reliable test scores
Explain (3) Unreliability in Criterion and how we deal with it
Low reliability decreases validity coefficients
Solution: Correction for attenuation - Correcting for unreliability in test (predictor) & criterion
What’s the correction for attenuation - Validity coeff
Corrected validity coeff = observed validity coeff / sqrt(r_xx * r_yy), where r_xx and r_yy are the reliability coefficients of the test and the criterion
-> See what would the validity coeff be if test and/or criterion were more reliable
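The division above can be sketched in a few lines (the .40/.80/.50 numbers below are hypothetical, chosen only to illustrate the arithmetic):

```python
import math

def corrected_validity(r_xy, r_xx, r_yy=1.0):
    """Correction for attenuation.

    r_xy : observed validity coefficient (test-criterion correlation)
    r_xx : reliability of the test (predictor)
    r_yy : reliability of the criterion (leave at 1.0 to correct for the test only)
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed validity .40, test reliability .80, criterion reliability .50
print(round(corrected_validity(0.40, 0.80, 0.50), 2))  # 0.63
```

Setting r_yy = 1.0 corrects for unreliability in the test alone; passing both reliabilities corrects for unreliability in test and criterion together.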
Take aways of Criterion Validity (2)
(1) Importance of choosing appropriate criterion
(2) Small validity coeff can have practical utility
Construct Validity def
The “all-encompassing” type of validity, established through the process of construct validation.
-> In construct validity evidence, no single variable can serve as the criterion.
-> Extent to which your test or measure accurately assesses what it’s supposed to
Construct validation def
Assembling evidence about what a test means. Done by showing the relationship between a test and other tests and measures.
-> Over a series of studies, the meaning of the test gradually begins to take shape.
‘Limitations’ of criterion validity
For some constructs it is easy to find a criterion; for others it is difficult.
-> E.g. criterion for trust, chronic pain (…)? Hard to identify appropriate criterion.
-> Thus, other ways of establishing validity of test scores → construct validity
Psychological Constructs def (4)
(1) Abstract concepts used to refer to psychological attributes (e.g. intelligence, love, beauty)
(2) Exist in theory; generally LATENT; not directly observable (EXCEPTIONS: e.g. reaction time, absenteeism)
(3) Important in describing or understanding human behavior -> Its existence EXPLAINS why/how something is happening
(4) Can observe and measure the behaviors that show evidence of these constructs
Construct Explication
How a particular construct is manifested and how such manifestations can be measured.
How to gather Evidence of Construct Validity (2)
(1) Gathering Theoretical evidence
(2) Gather Psychometric evidence
Explain how we gather THEORETICAL evidence of construct validity (2)
(1) Establish nomological network - identifying all possible relationships
(2) Based on this theoretical work → Propose experimental hypotheses
-> If what we think is true, what would be the evidence to support this relationship
Nomological Network (3)
Consists of:
(1) Constructs (e.g. job satisfaction)
(2) Their observable manifestations (e.g. smiles, productivity, positive feedback)
(3) The relations within and between constructs and their observable manifestations (e.g. positive feedback related to productivity)
Explain how we gather PSYCHOMETRIC evidence (6)
(1) Content validity
(2) Criterion validity
(3) Reliability of the test
(4) Experimental interventions
(5) Convergent evidence of validity
(6) Discriminant evidence of validity
[Gathering psychometric evidence] Evidence of validity based on content (2)
(1) No construct underrepresentation: Does the test sample adequately from the construct domain?
(2) No irrelevant construct representation: Does the test properly exclude content that is unrelated to the construct?
[Gathering psychometric evidence] Evidence of validity based on relations with criteria
Are the relations of the test with external criteria as would be expected based on theory?
[Gathering psychometric evidence] Evidence of validity based on reliability of the test
E.g. test-retest/internal consistency not too low or too high given the construct
[Gathering psychometric evidence] Evidence of validity based on experimental interventions
Provide evidence of situational changes that should influence test scores based on theory
-> E.g. education influencing scores on an achievement test
-> Medication influencing scores on an anxiety test
[Gathering psychometric evidence] Convergent Validity (2)
Extent to which two measures that are supposed to be related are actually correlated.
When test scores correlate with:
(1) Other measures of the SAME construct, or
(2) Measures of constructs to which the test should be related based on theory (think nomologic net)
[Gathering psychometric evidence] Discriminant (Divergent) Validity
Test scores are uncorrelated with:
Measures of constructs to which the construct should NOT be related based on theory (think nomologic net)
Problems with Content validity (3)
(1) Educational settings: content validity has been of greatest concern in educational testing (a score should represent comprehension of the subject) BUT many other factors can limit performance on a test
(2) Unclear boundaries: hard to separate types of validities
-> It’s often hard to separate “content coverage” (content validity) from whether the test actually measures the underlying concept (construct validity), leading to blurred boundaries.
(3) Doesn’t consider the relationship of the construct with external variables/constructs
[Problems with Content validity] How can we answer unclear boundaries: hard to separate types of validities?
Content validity evidence offers some unique features. Logical rather than statistical.
Construct underrepresentation
CONTENT validity. Failure to capture important components of a construct.
Construct-irrelevant variance
CONTENT validity. Occurs when scores are influenced by factors irrelevant to the construct.
-> E.g., a test of intelligence might be influenced by reading comprehension, test anxiety, or illness.
Would a validity coeff of .40 always be considered good?
NO. Not all validity coefficients have the same meaning.
Several issues of concern when interpreting validity coefficients (9)
(1) Not all validity coefficients have the same meaning
(2) The conditions of a validity study are never exactly reproduced. E.g. If you take the GRE to gain admission to graduate school, the conditions under which you take the test may not be exactly the same as those in the studies that established the validity of the GRE.
(3) Criterion-related validity studies mean nothing UNLESS the criterion is valid and reliable.
(4) Validity study might have been done on a population that does not represent the group to which inferences will be made.
(5) Be sure the sample size was adequate
(6) Never Confuse the Criterion with the Predictor (GRE & success in grad school example)
(7) Check for Restricted Range on Both Predictor and Criterion: Correlation requires that there be variability in both the predictor and the criterion.
(8) Review Evidence for Validity Generalization (may not be generalized to other similar situations)
(9) Consider Differential Prediction: Predictive relationships may not be the same for all demographic groups.
Differential Prediction
Predictive relationships may NOT be the same for all demographic groups.
-> The validity for men could differ in some circumstances from the validity for women.
-> Under these circumstances, separate validity studies for different groups may be necessary.
Criterion-Referenced Tests
Have items that are designed to match certain specific instructional objectives. Designed to measure student performance against a fixed set of predetermined criteria.
Validity studies for the criterion-referenced tests
Would compare scores on the test to scores on other measures that are believed to be related to the test.
Validity & Reliability relationship
A measurement procedure MUST BE RELIABLE (consistent) in order to be VALID.
-> A measurement procedure can be reliable, but not necessarily valid.