Validity Flashcards
Validity
Validity can be defined as the agreement between a test score or measure and the quality it is believed to measure.
Does the test measure what it is supposed to be measuring?
Do the results mean what we think they mean?
Face validity -
does it look like it measures what it claims to measure?
Least important - a good test doesn't actually have to look like a good test
Downside: when items are transparent, it is easy for test takers to pick the answers that give them the outcome they want
These appearances can help motivate test takers because they can see that the test is relevant
Construct-related evidence
Construct validity evidence is established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it
involves assembling evidence about what a test means.
showing the relationship between a test and other tests and measures.
Each time a relationship is demonstrated, one additional bit of meaning can be attached to the test
Convergent evidence
Obtained in 2 ways:
1 - show that a test measures the same things as other tests used for the same purpose
2 - demonstrate specific relationships that we can expect if the test is really doing its job.
Discriminant evidence
Show that the test measures something unique - low correlations with measures of unrelated constructs
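A minimal sketch of how convergent and discriminant evidence show up as correlations - the scores below are made up purely for illustration (a hypothetical new anxiety scale, an established anxiety scale, and an unrelated measure such as height):

```python
import numpy as np

# Hypothetical scores for 8 people (illustrative numbers only)
new_scale   = np.array([12, 18, 9, 22, 15, 20, 11, 17])            # new anxiety scale
established = np.array([14, 19, 10, 24, 16, 21, 12, 18])           # established anxiety scale
unrelated   = np.array([175, 176, 170, 171, 177, 169, 168, 170])   # height in cm

# Convergent evidence: high correlation with a measure of the same construct
print("convergent r:  ", round(np.corrcoef(new_scale, established)[0, 1], 2))  # high

# Discriminant evidence: near-zero correlation with an unrelated construct
print("discriminant r:", round(np.corrcoef(new_scale, unrelated)[0, 1], 2))    # near zero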
Reliability and Validity
Maximum validity coefficient
A reliable test is not necessarily valid
we can have reliability without validity
An unreliable test cannot be demonstrated to be valid
validity coefficients are not usually expected to be exceptionally high; a modest correlation between the true scores on two traits may be missed if the test for each trait is not highly reliable.
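One way to put a number on this (a standard psychometric result, not from the cards): the observed validity coefficient can be no larger than the square root of the product of the two reliabilities. A minimal sketch with made-up reliability values:

```python
import math

# r_xx = reliability of the test, r_yy = reliability of the criterion
# (illustrative values, not from the flashcards)
r_xx, r_yy = 0.70, 0.80

# Maximum validity coefficient: the observed correlation between test and
# criterion cannot exceed sqrt(r_xx * r_yy)
max_validity = math.sqrt(r_xx * r_yy)
print(f"maximum possible validity coefficient = {max_validity:.2f}")  # ~0.75
```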
Criterion-related evidence
how well a test corresponds with a particular criterion
high correlations between a test and a well-defined criterion measure
Criterion-related evidence
predictive validity evidence -
Trying to make some prediction - forecasting a future criterion from current test scores
The test score is the predictor variable; the outcome it forecasts is the criterion
Example: high school GPA predicting university GPA - r is about .36, which is about as good as validity coefficients get
Criterion-related evidence
Concurrent Validity Evidence
Example: seeing whether someone is doing their job well right now
Concurrent evidence for validity comes from assessments of the simultaneous relationship between the test and the criterion - such as between a learning disability test and school performance
The test may give diagnostic information that can help guide the development of individualized learning programs. Concurrent evidence for validity applies when the test and the criterion can be measured at the same time.
Content-Related Evidence
Considers the adequacy with which the test represents the conceptual domain it is designed to cover.
An attempt to determine whether a test has been constructed adequately
multiple judges rate each item in terms of its match or relevance to the content
Especially important in educational settings
The score on your history test should represent your comprehension of the history you are expected to know
Does test performance rely on some of the wrong things?
e.g., relying on background knowledge that only some test takers happen to have
CONSTRUCT UNDERREPRESENTATION
Not testing all the material
failure to capture important components of a construct.
For example, if a test of mathematical knowledge included algebra but not geometry, the validity of the test would be threatened
CONSTRUCT-IRRELEVANT VARIANCE
scores are influenced by factors irrelevant to the construct. For example, a test of intelligence might be influenced by reading comprehension, test anxiety, or illness.
How can you tell whether a test is testing the right thing or not?
Expert judgement - is this test covering all the things it's meant to cover?
Factor analysis - think about which items go together
Validity Coefficient
Correlation between test and criterion
tells the extent to which the test is valid for making statements about the criterion
These do not tend to be that high
Good is about .30 to .40
.60 is very high
Coefficient of determination is r² - the proportion of variance in the criterion accounted for by the test
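A minimal sketch of computing a validity coefficient and r² - the test and criterion scores below are made up purely for illustration:

```python
import numpy as np

# Hypothetical test scores and criterion scores for 8 people (made-up data)
test      = np.array([57, 59, 64, 56, 62, 58, 63, 61])
criterion = np.array([2.7, 3.2, 3.0, 2.6, 3.4, 3.1, 2.8, 3.2])

r = np.corrcoef(test, criterion)[0, 1]   # validity coefficient (~.44 for this toy data)
print(f"validity coefficient r = {r:.2f}")
print(f"coefficient of determination r^2 = {r**2:.2f}")  # share of criterion variance explained
```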
How high is high enough?
Type 1 errors (test says yes, but the person does not succeed) vs. Type 2 errors (test says no, but the person would have succeeded)
What should the criterion be?
Criteria have their own measurement problems - these measures need to be valid and reliable as well, and based on an adequate sample size
Population in the validity study
Is the group representative?
Are there attrition problems? - people who would have informed your prediction equation have dropped out of the study
Restriction of range?
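A minimal simulation of restriction of range (made-up model: predictor and criterion correlate about .50 in the full pool, but the validity study only sees people above a selection cutoff):

```python
import numpy as np

rng = np.random.default_rng(0)

# Full applicant pool: predictor and criterion built to correlate ~.50
n = 5_000
predictor = rng.normal(size=n)
criterion = 0.5 * predictor + np.sqrt(1 - 0.5**2) * rng.normal(size=n)
print("full-range r      :", round(np.corrcoef(predictor, criterion)[0, 1], 2))

# Validity study only sees the selected group (top ~30% on the predictor),
# so the range of predictor scores is restricted
keep = predictor > np.quantile(predictor, 0.70)
print("restricted-range r:", round(np.corrcoef(predictor[keep], criterion[keep])[0, 1], 2))
# the restricted correlation comes out noticeably smaller than the full-range one
```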