Chapter 6: Validity Flashcards
_____, as applied to a test, is a judgment or estimate of how well a test measures what it purports to measure in a particular context. More specifically, it is a judgment based on evidence about the appropriateness of inferences drawn from test scores.
Validation is the process of gathering and evaluating evidence about validity. Both the test developer and the test user may play a role in the validation of a test for a specific purpose.
Validity
One way measurement specialists have traditionally conceptualized validity is according to three categories (trinitarian view):
1) content validity
2) criterion-related validity
3) construct validity (the "umbrella validity" under which the other varieties fall)
Three approaches to assessing validity—associated, respectively, with content validity, criterion-related validity, and construct validity—are:
- scrutinizing the test’s content
- relating scores obtained on the test to other test scores or other measures
- executing a comprehensive analysis of:
a. how scores on the test relate to other test scores and measures
b. how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure
Another term you may come across in the literature is _____. This variety of validity has been described as the “Rodney Dangerfield of psychometric variables” because it has “received little attention—and even less respect—from researchers examining the construct validity of psychological tests and measures”.
face validity
_____ relates more to what a test appears to measure to the person being tested than to what the test actually measures. Face validity is a judgment concerning how relevant the test items appear to be.
Note: this judgment is made from the perspective of the testtaker, not the test user.
A test’s lack of _____ could contribute to a lack of confidence in the perceived effectiveness of the test—with a consequential decrease in the testtaker’s cooperation or motivation to do his or her best.
Face validity
_____ describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.
Content validity
The quantification of content validity
The measurement of _____ is important in employment settings, where tests used to hire and promote people are carefully scrutinized for their relevance to the job, among other factors.
content validity
One method of measuring content validity, developed by C. H. Lawshe, is essentially a method for gauging agreement among raters or judges regarding how essential a particular item is.
Note: the resulting index is the content validity ratio (CVR).
Lawshe
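The CVR described on this card can be worked through numerically. The sketch below assumes the commonly cited form of Lawshe's formula, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating the item "essential" and N is the total number of panelists. The function name and the panel of 10 judges are hypothetical, chosen only for illustration.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Return Lawshe's CVR = (n_e - N/2) / (N/2) for a single item."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 judges rating one item.
for votes in (10, 8, 5, 2):
    print(votes, "of 10 rate the item essential -> CVR =",
          content_validity_ratio(votes, 10))
```

Under this formula the CVR runs from -1 to +1: it is 0 when exactly half the panel rates the item essential, positive when more than half do, and negative when fewer than half do.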
Criterion-Related Validity (3)
1) Criterion-related
2) Concurrent
3) Predictive
Criterion-Related Validity:
is a judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest—the measure of interest being the criterion.
Ex. a company could administer a sales personality test to its sales staff to see if there is an overall correlation between their test scores and a measure of their productivity.
Criterion-Related Validity
_____ is an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently).
Ex. Suppose a group of nursing students takes two final exams to assess their knowledge: a practical test and a paper test. If the students who score well on the practical test also score well on the paper test, then _____ has been demonstrated.
Concurrent validity
_____ is an index of the degree to which a test score predicts some criterion measure.
Ex. The SAT is taken by high school students to predict their future performance in college (namely, their college GPA).
Predictive validity
Characteristics of a criterion (3)
1) An adequate criterion is relevant
2) An adequate criterion measure must also be valid
3) Ideally, a criterion is also uncontaminated.
Judgments of criterion-related validity, whether concurrent or predictive, are based on two types of statistical evidence: (2)
Validity Coefficient and Expectancy Data.
Type of statistical evidence:
The _____ is a correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
Typically, the Pearson correlation coefficient is used to quantify the relationship between the two measures.
validity coefficient
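A validity coefficient of this kind can be computed directly from paired scores. The sketch below implements the Pearson correlation by hand so that each step is visible. The function name and the score lists are made up for illustration and do not come from any real test.

```python
from math import sqrt


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores."""
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: sales-aptitude test scores and monthly sales (criterion).
test = [52, 61, 70, 48, 85, 77]
sales = [14, 18, 21, 12, 30, 22]
print("validity coefficient:", round(pearson_r(test, sales), 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship between test and criterion; a coefficient near 0 indicates that the test score tells us little about standing on the criterion.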
Let’s say a medical practitioner is more likely to correctly diagnose a kidney infection if a urine test is ordered, rather than relying on a physical examination and discussion of symptoms alone. We can say that the urine test has _____.
Incremental validity
Type of statistical evidence:
_____ provide information that can be used in evaluating the criterion-related validity of a test. Using a score obtained on some test(s) or measure(s), expectancy tables illustrate the likelihood that the testtaker will score within some interval of scores on a criterion measure—an interval that may be seen as “passing”, “acceptable”, and so on.
Expectancy data
An _____ shows the percentage of people within specified test-score intervals who subsequently were placed in various categories of the criterion (for example, placed in “passed” category or “failed” category).
expectancy table
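An expectancy table of this kind can be built from raw records in a few lines. The sketch below is a minimal illustration, assuming each record is a (test score, criterion category) pair and each interval is an inclusive (low, high) range. The function name, the intervals, and the data are all hypothetical.

```python
from collections import Counter


def expectancy_table(records, intervals):
    """Map each score interval to the percentage of people in each
    criterion category (e.g. "passed" / "failed")."""
    table = {}
    for low, high in intervals:
        outcomes = [outcome for score, outcome in records
                    if low <= score <= high]
        counts = Counter(outcomes)
        total = len(outcomes)
        # An interval with no testtakers gets an empty row.
        table[(low, high)] = {category: 100.0 * n / total
                              for category, n in counts.items()}
    return table


# Hypothetical records: (test score, later criterion category).
records = [(55, "failed"), (58, "passed"), (72, "passed"), (75, "passed"),
           (78, "failed"), (79, "passed"), (91, "passed")]
intervals = [(50, 69), (70, 89), (90, 100)]

for interval, row in expectancy_table(records, intervals).items():
    print(interval, row)
```

Reading across a row answers the practical question the card describes: of the people who scored in this interval, what percentage later fell into each criterion category?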
_____ provide an estimate of the extent to which inclusion of a particular test in the selection system will actually improve selection. More specifically, the tables provide an estimate of the percentage of employees hired by the use of a particular test who will be successful at their jobs, given different combinations of three variables: the test’s validity, the selection ratio used, and the base rate.
Taylor-Russell tables
_____ limitations:
- Relationship between the predictor (the test) and the criterion (rating of performance on the job) must be linear
- Potential difficulty of identifying a criterion score that separates “successful” from “unsuccessful” employees.
Taylor-Russell tables
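The way validity, selection ratio, and base rate combine can be approximated numerically. The sketch below does not reproduce the published tables. It assumes that test and criterion scores follow a bivariate normal distribution (which implies the linear relationship noted above) and integrates numerically to estimate the proportion of selected applicants who succeed. The function name, the step count, and the upper integration limit are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def success_rate_among_selected(validity, selection_ratio, base_rate,
                                steps=20000):
    """Estimate P(successful | selected), assuming bivariate normal scores."""
    if not -1 < validity < 1:
        raise ValueError("validity must be strictly between -1 and 1")
    if not (0 < selection_ratio < 1 and 0 < base_rate < 1):
        raise ValueError("selection_ratio and base_rate must be in (0, 1)")
    z = NormalDist()  # standard normal
    x_cut = z.inv_cdf(1 - selection_ratio)  # test cutoff (top SR selected)
    y_cut = z.inv_cdf(1 - base_rate)        # criterion cutoff for "success"
    resid_sd = sqrt(1 - validity ** 2)
    upper = 8.0                             # practical stand-in for infinity
    width = (upper - x_cut) / steps
    joint = 0.0                             # P(selected and successful)
    for i in range(steps):
        x = x_cut + (i + 0.5) * width       # midpoint rule
        p_success = 1 - z.cdf((y_cut - validity * x) / resid_sd)
        joint += z.pdf(x) * p_success * width
    return joint / selection_ratio


# With zero validity the test adds nothing: the result equals the base rate.
print(round(success_rate_among_selected(0.0, 0.3, 0.6), 3))
# Higher validity raises the expected success rate among those selected.
print(round(success_rate_among_selected(0.5, 0.3, 0.6), 3))
```

Comparing the returned value with the base rate gives a rough sense of how much the test improves selection under these assumptions.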
The _____ entail obtaining the difference between the means of the selected and unselected groups to derive an index of what the test (or some other tool of assessment) is adding to already established procedures.
Naylor-Shine tables
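The mean difference mentioned on this card is simple arithmetic. The sketch below shows only that step: the mean criterion score of the selected group minus that of the unselected group. It does not reproduce the published tables, and the function name and ratings are hypothetical.

```python
from statistics import mean


def mean_difference(selected_scores, unselected_scores):
    """Mean criterion score of the selected group minus the unselected group."""
    if not selected_scores or not unselected_scores:
        raise ValueError("both groups must contain at least one score")
    return mean(selected_scores) - mean(unselected_scores)


# Hypothetical job-performance ratings (criterion) for the two groups.
selected = [4.1, 3.8, 4.5, 4.0]
unselected = [3.2, 3.6, 3.0, 3.4]
print("difference between group means:",
      round(mean_difference(selected, unselected), 2))
```

A larger positive difference suggests that the test is adding more to the selection procedures already in place.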