Ch.6 for UNIT 4 Flashcards
Validity
A judgment based on evidence about the appropriateness of inferences drawn from test scores. An inference is a logical result or deduction.
reasonable boundaries
No test or measurement technique is “universally valid” for all time, for all uses, with all types of testtaker populations. Rather, tests may be shown to be valid within what we would characterize as reasonable boundaries of a contemplated usage. If those boundaries are exceeded, the validity of the test may be called into question.
Validation
process of gathering and evaluating evidence about validity. Both the test developer and the test user may play a role in the validation of a test for a specific purpose.
validation studies and local validation studies; when are they required?
It may sometimes be appropriate for test users to conduct their own validation studies with their own groups of testtakers. Such local validation studies may yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual. Local validation studies are absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test. For example, a local validation study would be necessary if the test user sought to transform a nationally standardized test into Braille for administration to blind and visually impaired testtakers. Local validation studies would also be necessary if a test user sought to use a test with a population of testtakers that differed in some significant way from the population on which the test was standardized.
trinitarian view of validity: criterion validity, content validity, construct validity
Content validity. This measure of validity is based on an evaluation of the subjects, topics, or content covered by the items in the test.
Criterion-related validity. This measure of validity is obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.
Construct validity. This measure of validity is arrived at by executing a comprehensive analysis of how scores on the test relate to other test scores and measures, and
how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure.
“umbrella validity”
Construct validity is referred to as “umbrella validity” because every other variety of validity falls under it.
ecological validity
a judgment regarding how well a test measures what it purports to measure at the time and place that the variable being measured (typically a behavior, cognition, or emotion) is actually emitted. In essence, the greater the ecological validity of a test or other measurement procedure, the greater the generalizability of the measurement results to particular real-life circumstances.
Face validity
relates more to what a test appears to measure to the person being tested than to what the test actually measures. Face validity is a judgment concerning how relevant the test items appear to be. Stated another way, if a test definitely appears to measure what it purports to measure “on the face of it,” then it could be said to be high in face validity.
On the one hand, a paper-and-pencil personality test labeled The Introversion/Extraversion Test, with items that ask respondents whether they have acted in an introverted or an extraverted way in particular situations, may be perceived by respondents as a highly face-valid test. On the other hand, a personality test in which respondents are asked to report what they see in inkblots may be perceived as a test with low face validity.
problems with lack of face validity
A test’s lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test, with a consequent decrease in the testtaker’s cooperation or motivation to do their best.
Ultimately, face validity may be more a matter of public relations than psychometric soundness.
Content validity
describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.
test blueprint
A test blueprint is a plan for the “structure” of the evaluation—that is, a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth. In many instances the test blueprint represents the culmination of efforts to adequately sample the universe of content areas that conceivably could be sampled in such a test.
Criterion-related validity
judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest—the measure of interest being the criterion.
concurrent and predictive validity (UNDER CRITERION-RELATED VALIDITY)
The extent to which one measurement is backed up by a related measurement obtained at about the same point in time. Concurrent validity is an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently). Predictive validity is an index of the degree to which a test score predicts some criterion measure.
criterion
as the standard against which a test or a test score is evaluated.
Characteristics of a criterion
-An adequate criterion is relevant.
-An adequate criterion measure must also be valid for the purpose for which it is being used
-An adequate criterion is also uncontaminated. Criterion contamination is the term applied to a criterion measure that has been based, at least in part, on predictor measures.
As an example, consider a hypothetical “Inmate Violence Potential Test” (IVPT) designed to predict a prisoner’s potential for violence in the cell block. In part, this evaluation entails ratings from fellow inmates, guards, and other staff in order to come up with a number that represents each inmate’s violence potential. After all of the inmates in the study have been given scores on this test, the study authors then attempt to validate the test by asking guards to rate each inmate on their violence potential. Because the guards’ opinions were used to formulate each inmate’s test score in the first place (the predictor variable), the guards’ opinions cannot be used as a criterion against which to judge the soundness of the test. If the guards’ opinions were used both as a predictor and as a criterion, then we would say that criterion contamination had occurred.
Can’t reuse the predictor as the criterion.
Concurrent Validity
If test scores are obtained at about the same time as the criterion measures are obtained, measures of the relationship between the test scores and the criterion provide evidence of concurrent validity. Statements of concurrent validity indicate the extent to which test scores may be used to estimate an individual’s present standing on a criterion. If, for example, scores (or classifications) made on the basis of a psychodiagnostic test were to be validated against a criterion of already diagnosed psychiatric patients, then the process would be one of concurrent validation. Once the validity of the inference from the test scores is established, the test may provide a faster, less expensive way to arrive at a diagnosis or a classification decision.
COMPARING ONE THING TO SOMETHING ELSE
Predictive Validity
Measures of the relationship between the test scores and a criterion measure obtained at a future time provide an indication of the predictive validity of the test; that is, how accurately scores on the test predict some criterion measure.
base rate vs. hit rate vs. miss rate (predictive validity)
A base rate is the extent to which a particular trait, behavior, characteristic, or attribute exists in the population (expressed as a proportion). In psychometric parlance, a hit rate may be defined as the proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute. A miss rate may be defined as the proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute. Here, a miss amounts to an inaccurate prediction.
false positive vs. false negative (types of misses)
A false positive is a miss wherein the test predicted that the testtaker did possess the particular characteristic or attribute being measured when in fact the testtaker did not. A false negative is a miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured when the testtaker actually did.
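The relationships among these rates can be made concrete with a short sketch. The data and function name below are invented for illustration; a hit is counted as any accurate prediction (the test correctly says "has it" or "doesn't have it"), and a miss is any inaccurate one.

```python
# Hypothetical sketch: base rate, hit rate, and miss rate for a predictor.
# True = "has the characteristic", False = "does not".
def classification_rates(predicted, actual):
    n = len(actual)
    hits = sum(p == a for p, a in zip(predicted, actual))           # accurate predictions
    false_pos = sum(p and not a for p, a in zip(predicted, actual)) # predicted yes, actually no
    false_neg = sum(a and not p for p, a in zip(predicted, actual)) # predicted no, actually yes
    return {
        "base_rate": sum(actual) / n,              # prevalence of the attribute in the group
        "hit_rate": hits / n,                      # proportion accurately identified
        "miss_rate": (false_pos + false_neg) / n,  # proportion of inaccurate predictions
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }

# Made-up results for ten testtakers
predicted = [True, True, False, False, True, False, True, False, False, True]
actual    = [True, False, False, False, True, True, True, False, False, False]
rates = classification_rates(predicted, actual)
# Here: base_rate 0.4, hit_rate 0.7, miss_rate 0.3 (2 false positives, 1 false negative)
```

Note that hit rate and miss rate sum to 1, and that the miss rate decomposes into false positives plus false negatives.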
The validity coefficient
A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
How high should a validity coefficient be for a user or a test developer to infer that the test is valid?
There are no rules for determining the minimum acceptable size of a validity coefficient. In fact, Cronbach and Gleser (1965) cautioned against the establishment of such rules. They argued that validity coefficients need to be large enough to enable the test user to make accurate decisions within the unique context in which a test is being used. Essentially, the validity coefficient should be high enough to result in the identification and differentiation of testtakers with respect to the target attribute(s).
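Since the validity coefficient is simply a Pearson correlation between test scores and criterion scores, it can be computed directly. The scores below are invented for the sketch:

```python
import math

def validity_coefficient(test_scores, criterion_scores):
    """Pearson r between scores on the test and scores on the criterion."""
    n = len(test_scores)
    mx = sum(test_scores) / n
    my = sum(criterion_scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(test_scores, criterion_scores))
    sx = math.sqrt(sum((x - mx) ** 2 for x in test_scores))
    sy = math.sqrt(sum((y - my) ** 2 for y in criterion_scores))
    return cov / (sx * sy)

# Invented example: five testtakers' test scores and later criterion ratings
test = [10, 12, 14, 16, 18]
criterion = [20, 23, 27, 30, 35]
r = validity_coefficient(test, criterion)  # close to +1: strong criterion-related evidence
```

Whether a given r is "high enough" still depends on the decision context, per Cronbach and Gleser's point above.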
Incremental validity
incremental validity, defined here as the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
Incremental validity assesses whether a new assessment adds predictive value beyond what’s already provided by existing methods, essentially determining if the new tool offers unique information.
How is it assessed? Hierarchical regression.
First we estimate how well a criterion can be predicted with existing predictors, and then we evaluate how much the prediction improves when the new predictor is added to the prediction equation. Incremental validity is highest when a predictor is strongly correlated with the criterion and minimally correlated with other predictors.
To the degree that a predictor is strongly correlated with other predictors, it gives us redundant information. There is little point in going to the trouble of measuring a variable that gives us information we already had.
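For the two-predictor case there is a closed-form expression for the squared multiple correlation, which makes the logic above concrete: we compare the criterion variance explained by the existing predictor alone with the variance explained once the new predictor is added. The data and helper names below are invented for the sketch.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r2_two_predictors(y, x1, x2):
    """Squared multiple correlation for two predictors:
    R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2)."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

# Invented data: criterion y, existing predictor x1, candidate new predictor x2
y  = [1, 2, 3, 4, 5]
x1 = [2, 1, 4, 3, 5]
x2 = [1, 3, 2, 5, 4]

r2_old = pearson(y, x1) ** 2            # criterion variance explained by x1 alone
r2_new = r2_two_predictors(y, x1, x2)   # variance explained by x1 and x2 together
incremental = r2_new - r2_old           # what x2 adds beyond x1
```

Because x2 here correlates strongly with the criterion but only weakly with x1, the increment is large; if x1 and x2 were highly correlated, r2_new would barely exceed r2_old.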
Construct validity
The degree to which a test or instrument is capable of measuring a concept, trait, or other theoretical entity; a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct. A construct is an informed, scientific idea developed or hypothesized to describe or explain behavior. Constructs are unobservable, presupposed (underlying) traits that a test developer may invoke to describe test behavior or criterion performance.