2-6 tb Flashcards
traditional views of validity
a test is valid if it measures what it is designed to measure
a test can measure exactly what it was designed to measure and yet not be valid if the inferences that are made from the test scores are not supportable
e.g. making a hiring decision based on a personality test
current views of validity
view validity as a unitary or single concept
importance of evaluating the interpretation of test scores and the degree to which the accumulated evidence supports the intended interpretation of test scores for the proposed use
What are the five sources of evidence of validity currently recognized by the Standards?
- evidence based on test content
- evidence based on response processes
- evidence based on internal structure
- evidence based on relations with other variables
- evidence based on the consequences of testing
- evidence based on test content
Previously referred to as content validity, this source of validity evidence involves logically examining and evaluating the content of a test (including the test questions, format, wording, and tasks required of test takers) to determine the extent to which the content is representative of the concepts that the test is designed to measure without either underrepresenting those concepts or including elements that are irrelevant to their measurement.
- evidence based on response processes
observing test takers as they respond to the test and/or interviewing them when they complete the test
- evidence based on internal structure
focuses on whether the conceptual framework used in test development could be demonstrated using appropriate analytical techniques
e.g. if a test was designed to measure a single concept (such as anxiety), we would analyze the test results to find out how many underlying concepts account for the variations in test taker scores.
- evidence based on relations with other variables
criterion-related validity
correlating test scores with other measures to determine whether those scores are related to other measures to which we would expect them to relate
- evidence based on the consequences of testing
intended and unintended consequences may occur
if the test is biased, an unintended consequence might be that test scores appear to favor one group over another. However, it is also important to understand that just because different groups score differently on a test does not automatically mean that the test is biased
What are the three main categories and strategies for gathering evidence of validity?
- evidence based on test content/content validity
- evidence based on relations with other variables/criterion-related validity
- evidence based on relations with other variables/construct validity
- evidence based on test content/content validity
Definition of content validity
how well the test represents the material it’s covering
- evidence based on test content/content validity
How do we know when test items show evidence of content validity?
observable and measurable behaviors
e.g. play an instrument
????
- evidence based on relations with other variables/criterion-related validity
What are the similarities and differences between predictive validity and concurrent validity?
Sim: both assess how well test scores relate to other measures (e.g. scores on a similar test or supervisor performance ratings)
Diff: predictive validity shows how well test scores predict a criterion measure (e.g. performance rating) obtained at a later time, whereas concurrent validity shows how test scores relate to scores on another measure with established validity obtained at about the same time
- evidence based on relations with other variables/criterion-related validity
What is predictive validity? Predictive method?
Predictive Validity: evidence based on a test’s relationships with other variables that shows a relationship between test scores obtained at one point in time and a criterion measured at a later point in time (e.g. supervisor ratings)
Predictive Method: want to show a relationship between test scores and a future behavior
- evidence based on relations with other variables/criterion-related validity
What is concurrent validity? Concurrent method?
Concurrent Validity: a test’s relationships with other variables in which test administration and criterion measurement happen at roughly the same time
Concurrent Method: administering two measures, the test and a second measure of the attribute, to the same group of individuals at as close to the same point in time as possible (e.g. take an American Lit test and the second measure is the grade in the class)
- evidence based on relations with other variables/construct validity
What is meant by convergent evidence of validity? Example?
If the test is measuring a particular construct, we expect the scores on the test to correlate strongly with scores on other tests that measure the same or similar constructs.
example: researchers have developed a number of tests to measure general self-efficacy as well as self-efficacy related to a specific task. We would expect two measures of general self-efficacy to yield strong, positive, and statistically significant correlations.
- evidence based on relations with other variables/construct validity
What is meant by discriminant evidence of validity? Example?
When the test scores do not correlate with unrelated constructs, we can say that the test is demonstrating discriminant evidence of validity
example: a test that measures skill at performing numerical calculations would not be expected to correlate with a test that measures reading comprehension
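As a rough illustration of convergent and discriminant evidence, the sketch below computes Pearson correlations in plain Python. All scores are made up for illustration, and the names pearson_r, self_efficacy_a, self_efficacy_b, and unrelated_measure are hypothetical, not taken from the course material.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return sxy / sqrt(sxx * syy)


# Made-up scores for the same six test takers (illustration only).
self_efficacy_a = [10, 12, 15, 18, 20, 23]   # test being validated
self_efficacy_b = [11, 13, 14, 19, 21, 22]   # second measure of the same construct
unrelated_measure = [7, 4, 9, 5, 8, 6]       # measure of an unrelated construct

convergent_r = pearson_r(self_efficacy_a, self_efficacy_b)      # strong, positive
discriminant_r = pearson_r(self_efficacy_a, unrelated_measure)  # close to zero

print(f"convergent r   = {convergent_r:.2f}")
print(f"discriminant r = {discriminant_r:.2f}")
```

A strong positive correlation with the same-construct measure is the convergent pattern, and a near-zero correlation with the unrelated measure is the discriminant pattern. With real data, statistical significance would also need to be checked.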
Appropriate use of various validation strategies
Evidence based on content (content validity): for tests that measure concrete attributes (observable and measurable behaviors, e.g. play an instrument)
Evidence based on relations with external criteria (criterion-related validity): for tests that predict outcomes (job performance, success in college)
Evidence based on relations with other constructs (construct validity): for tests that measure abstract constructs (like anxiety, depression, our course project)
evidence of validity based on content
two methods:
During test development
After test development
during test development
evidence of validity based on content
the first method for obtaining evidence of validity based on the content of a test involves performing a series of systematic steps as a test is being developed
define the testing universe, develop the test specifications, establish a test format, construct test questions
After Test Development
evidence of validity based on content
experts rate how essential each item is to the attribute being measured
content validity ratio is calculated, providing a measure of agreement among the judges.
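The card does not give the formula. A commonly used version is Lawshe's content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts who rate an item "essential" and N is the total number of experts. Assuming that is the ratio meant here, a minimal sketch follows. The function name, item names, and ratings are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2); ranges from -1.0 to 1.0."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Made-up panel of 10 experts: how many rated each item "essential".
essential_counts = {"item_1": 10, "item_2": 7, "item_3": 5}
cvr_by_item = {
    item: content_validity_ratio(count, 10)
    for item, count in essential_counts.items()
}

for item, cvr in cvr_by_item.items():
    print(f"{item}: CVR = {cvr:.2f}")
```

Under this formula, a CVR of 1.0 means every expert rated the item essential, 0.0 means exactly half did, and negative values mean fewer than half did.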
evidence of validity based on test content: summary
When purchasing a test, users should not make any assumptions about whether the test can show evidence of validity based on its content
compare the specifications of the purchased test with the content domain of the test
face validity
the perception of the test taker that the test measures what it is supposed to measure