Validity Flashcards
Validity
Validity can be defined as the agreement between a test score or measure and the quality it is believed to measure.
Does this test measure what it is supposed to be measuring?
Do the results mean what we think they mean?
Face validity -
Does the test look like it measures what it claims to measure?
Least important - a good test doesn't actually have to look like a good test
When items are transparent, it is easy for test takers to pick the answers that produce a desired outcome
These appearances can help motivate test takers because they can see that the test is relevant
Construct-related evidence
Construct validity evidence is established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it
Involves assembling evidence about what a test means,
chiefly by showing the relationships between the test and other tests and measures.
Each time a relationship is demonstrated, one additional bit of meaning can be attached to the test
Convergent evidence
Obtained in 2 ways:
1- show that a test measures the same things as other tests used for the same purpose
2 - demonstrate specific relationships that we can expect if the test is really doing its job.
Discriminant evidence
Show that the test measures something unique - low correlations with measures of unrelated constructs
Reliability and Validity
Maximum validity coefficient
A reliable test may nevertheless fail to be demonstrated valid -
we can have reliability without validity
An unreliable test cannot be demonstrated to be valid
Validity coefficients are not usually expected to be exceptionally high; a modest correlation between the true scores on two traits may be missed if the test for each trait is not highly reliable.
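The ceiling that reliability places on validity can be sketched numerically; the function name and the reliability values below are illustrative, not from the cards:

```python
import math

def max_validity(r_xx: float, r_yy: float) -> float:
    # Classical test theory bound: the observed validity coefficient
    # cannot exceed the square root of the product of the two reliabilities.
    return math.sqrt(r_xx * r_yy)

# A test with reliability .81 and a criterion measure with reliability .64
# can show an observed validity coefficient of at most:
print(round(max_validity(0.81, 0.64), 2))  # 0.72
```

Even a perfect relationship between the underlying true scores would appear no stronger than this bound in the observed data.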
Criterion-related evidence
how well a test corresponds with a particular criterion
high correlations between a test and a well-defined criterion measure
Criterion-related evidence
predictive validity evidence -
Trying to make a prediction - forecasting a future criterion from a present test score
Predictor variable and criterion
High-school GPA predicting university GPA: r = .36 - this is about as good as validity coefficients get
Criterion-related evidence
Concurrent Validity Evidence
See if someone is doing their job well
Concurrent-related evidence for validity comes from assessments of the simultaneous relationship between the test and the criterion—such as between a learning disability test and school performance
The test may give diagnostic information that can help guide the development of individualized learning programs. Concurrent evidence for validity applies when the test and the criterion can be measured at the same time.
Content-Related Evidence
Considers the adequacy of representation of the conceptual domain the test is designed to cover;
an attempt to determine whether the test has been constructed adequately
multiple judges rate each item in terms of its match or relevance to the content
Educational settings especially
The score on your history test should represent your comprehension of the history you are expected to know
Does test performance rely on the wrong things,
such as knowledge only some test takers happen to have?
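The judge-rating procedure described above is often quantified with Lawshe's content validity ratio; treating that index as the one intended here is my assumption, since the cards do not name one:

```python
def content_validity_ratio(n_essential: int, n_judges: int) -> float:
    # Lawshe's CVR: +1 when every judge rates the item essential,
    # 0 when exactly half do, -1 when none do.
    return (n_essential - n_judges / 2) / (n_judges / 2)

# 9 of 10 judges rate an item as essential to the domain:
print(content_validity_ratio(9, 10))  # 0.8
```

Items with low or negative CVR values are candidates for removal from the test.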
CONSTRUCT UNDERREPRESENTATION
Not testing all the material -
a failure to capture important components of a construct.
For example, if a test of mathematical knowledge included algebra but not geometry, the validity of the test would be threatened
CONSTRUCT-IRRELEVANT VARIANCE
Scores are influenced by factors irrelevant to the construct. For example, a test of intelligence might be influenced by reading comprehension, test anxiety, or illness.
How can you tell whether a test is testing the right thing?
Expert judgement - is this test covering all the things it is meant to cover?
Factor analysis - examine which items go together
Validity Coefficient
Correlation between test and criterion
tells the extent to which the test is valid for making statements about the criterion
These do not tend to be that high
Good is about .30 to .40
.60 is very high
The coefficient of determination is r², the proportion of variance in the criterion accounted for by the test
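For the r = .36 example above, squaring the validity coefficient shows how little criterion variance even a "good" coefficient explains:

```python
r = 0.36                    # high-school GPA predicting university GPA
r_squared = r ** 2          # coefficient of determination
print(round(r_squared, 4))  # 0.1296 - about 13% of the criterion variance
```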
How high is high enough?
Type I errors (predicting success for someone who fails - a false positive) vs. Type II errors (predicting failure for someone who would have succeeded - a false negative)
What should the criterion be?
Criteria have their own measurement problems - these measures must themselves be valid and reliable and based on an adequate sample size
Population in the validity study
Is the group representative
Are there attrition problems? People who would have informed your prediction equation may have dropped out of the study
Restriction of range?
Generalizability?
Don't take generalizability for granted - the relationship between predictor and criterion may not be the same in all situations
The validity may not be the same across all groups
Evaluating Validity Coefficients
Look for Changes in the Cause of Relationships
Conditions of a validity study are never exactly reproduced
if you take the GRE to gain admission to graduate school, the conditions under which you take the test may not be exactly the same as those in the studies that established the validity of the GRE.
The logic of criterion validation presumes that the causes of the relationship between the test and the criterion will still exist when the test is in use. Though this presumption is true for the most part, there may be circumstances under which the relationship changes
Evaluating Validity Coefficients
What does the criterion mean?
Criterion-related validity studies mean nothing at all unless the criterion is valid and reliable
criterion should relate specifically to the use of the test.
Because the SAT attempts to predict performance in college, the appropriate criterion is GPA, a measure of college performance
Evaluating Validity Coefficients
Review the Subject Population in the Validity Study
validity study might have been done on a population that does not represent the group to which inferences will be made.
In industrial settings, attrition can seriously jeopardize validity studies. Those who do poorly on the job either drop out or get fired and thus cannot be studied when it comes time to do the job assessment.
Evaluating Validity Coefficients
Be Sure the Sample Size Was Adequate
Be wary of a validity coefficient based on a small number of cases:
the smaller the sample, the more likely chance variation in the data will affect the correlation.
Thus, a validity coefficient based on a small sample tends to be artificially inflated.
A cross-validation study checks how well the relationship holds for an independent group of subjects
Evaluating Validity Coefficients
Never Confuse the Criterion with the Predictor
GRE is the predictor, and success in graduate school is the criterion.
The goal is to select students who have the highest probability of success in the program. By completing the program, the students had already succeeded on the criterion (success in the program). Yet before the university would acknowledge that the students had indeed succeeded, they had to go back and demonstrate, via the predictor, that they would have been predicted to do well. This reflects a clear confusion between predictor and criterion. Further, most of the students provisionally admitted because of low GRE scores succeeded by completing the program.
Evaluating Validity Coefficients
Check for Restricted Range on Both Predictor and Criterion
variable has a “restricted range” if all scores for that variable fall very close together
If all the people in your class have a GPA of 4.0, then you cannot predict variability in graduate-school GPA. Correlation requires that there be variability in both the predictor and the criterion.
There are at least three explanations for the modest performance of the GRE in predicting graduate-school performance:
1- GRE may not be a valid test for selecting graduate students.
2- those students who are admitted to graduate school represent such a restricted range of ability that it is not possible to find significant correlations
3 -grades in graduate school often represent a restricted range. Once admitted, students in graduate programs usually receive As and Bs.
Evaluating Validity Coefficients
Review Evidence for Validity Generalization
Criterion-related validity evidence obtained in one situation may not be generalized to other similar situations.
Generalizability refers to the evidence that the findings obtained in one situation can be generalized—that is, applied to other situations.
Generalizations from the original validity studies to these other situations should be made only on the basis of new evidence
Evaluating Validity Coefficients
Consider Differential Prediction
Predictive relationships may not be the same for all demographic groups. The validity for men could differ in some circumstances from the validity for women