Ch.6 for UNIT 4 Flashcards

1
Q

Validity

A

it is a judgment based on evidence about the appropriateness of inferences drawn from test scores. An inference is a logical result or deduction.

2
Q

reasonable boundaries

A

No test or measurement technique is “universally valid” for all time, for all uses, with all types of testtaker populations. Rather, tests may be shown to be valid within what we would characterize as reasonable boundaries of a contemplated usage. If those boundaries are exceeded, the validity of the test may be called into question.

3
Q

Validation

A

process of gathering and evaluating evidence about validity. Both the test developer and the test user may play a role in the validation of a test for a specific purpose.

4
Q

validation studies vs. local validation studies, and when the latter are required

A

It may sometimes be appropriate for test users to conduct their own validation studies with their own groups of testtakers. Such local validation studies may yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual. Local validation studies are absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test. For example, a local validation study would be necessary if the test user sought to transform a nationally standardized test into Braille for administration to blind and visually impaired testtakers. Local validation studies would also be necessary if a test user sought to use a test with a population of testtakers that differed in some significant way from the population on which the test was standardized.

5
Q

trinitarian view of validity: criterion validity, content validity, construct validity

A

Content validity. This measure of validity is based on an evaluation of the subjects, topics, or content covered by the items in the test.

Criterion-related validity. This measure of validity is obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.

Construct validity. This measure of validity is arrived at by executing a comprehensive analysis of how scores on the test relate to other test scores and measures, and how scores on the test can be understood within some theoretical framework for understanding the construct that the test was designed to measure.

6
Q

“umbrella validity”

A

construct validity has been described as “umbrella validity” because every other variety of validity falls under it.

7
Q

ecological validity

A

a judgment regarding how well a test measures what it purports to measure at the time and place that the variable being measured (typically a behavior, cognition, or emotion) is actually emitted. In essence, the greater the ecological validity of a test or other measurement procedure, the greater the generalizability of the measurement results to particular real-life circumstances.

8
Q

Face validity

A

relates more to what a test appears to measure to the person being tested than to what the test actually measures. Face validity is a judgment concerning how relevant the test items appear to be. Stated another way, if a test definitely appears to measure what it purports to measure “on the face of it,” then it could be said to be high in face validity.
On the one hand, a paper-and-pencil personality test labeled The Introversion/Extraversion Test, with items that ask respondents whether they have acted in an introverted or an extraverted way in particular situations, may be perceived by respondents as a highly face-valid test. On the other hand, a personality test in which respondents are asked to report what they see in inkblots may be perceived as a test with low face validity.

9
Q

problems with lack of face validity

A

A test’s lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test, with a consequential decrease in the testtaker’s cooperation or motivation to do their best.
Ultimately, face validity may be more a matter of public relations than psychometric soundness.

10
Q

Content validity

A

describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.

11
Q

test blueprint

A

A test blueprint is a plan for the “structure” of the evaluation, that is, a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth. In many instances the test blueprint represents the culmination of efforts to adequately sample the universe of content areas that conceivably could be sampled in such a test.

12
Q

Criterion-related validity

A

judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest—the measure of interest being the criterion.

13
Q

concurrent and predictive validity (UNDER CRITERION-RELATED VALIDITY)

A

the extent to which one measurement is backed up by a related measurement obtained at about the same point in time. Concurrent validity is an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently). Predictive validity is an index of the degree to which a test score predicts some criterion measure.

14
Q

criterion

A

as the standard against which a test or a test score is evaluated.

15
Q

Characteristics of a criterion

A

-An adequate criterion is relevant.
-An adequate criterion measure must also be valid for the purpose for which it is being used
-An adequate criterion is also uncontaminated. Criterion contamination is the term applied to a criterion measure that has been based, at least in part, on predictor measures.

=As an example, consider a hypothetical “Inmate Violence Potential Test” (IVPT) designed to predict a prisoner’s potential for violence in the cell block. In part, this evaluation entails ratings from fellow inmates, guards, and other staff in order to come up with a number that represents each inmate’s violence potential. After all of the inmates in the study have been given scores on this test, the study authors then attempt to validate the test by asking guards to rate each inmate on their violence potential. Because the guards’ opinions were used to formulate the inmate’s test score in the first place (the predictor variable), the guards’ opinions cannot be used as a criterion against which to judge the soundness of the test. If the guards’ opinions were used both as a predictor and as a criterion, then we would say that criterion contamination had occurred.

Can’t reuse the predictor in the criterion.

16
Q

Concurrent Validity

A

If test scores are obtained at about the same time as the criterion measures are obtained, measures of the relationship between the test scores and the criterion provide evidence of concurrent validity: Statements of concurrent validity indicate the extent to which test scores may be used to estimate an individual’s present standing on a criterion. If, for example, scores (or classifications) made on the basis of a psychodiagnostic test were to be validated against a criterion of already diagnosed psychiatric patients, then the process would be one of concurrent validation. = once the validity of the inference from the test scores is established, the test may provide a faster, less expensive way to offer a diagnosis or a classification decision
COMPARING ONE THING TO SOMETHING ELSE

17
Q

Predictive Validity

A

Measures of the relationship between the test scores and a criterion measure obtained at a future time provide an indication of the predictive validity of the test; that is, how accurately scores on the test predict some criterion measure.

18
Q

base rate vs hit rate vs miss rate (predictive validity)

A

a base rate is the extent to which a particular trait, behavior, characteristic, or attribute exists in the population (expressed as a proportion). In psychometric parlance, a hit rate may be defined as the proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute. A miss rate may be defined as the proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute. Here, a miss amounts to an inaccurate prediction.

19
Q

false positive vs false negative in miss rates

A

false positive is a miss wherein the test predicted that the testtaker did possess the particular characteristic or attribute being measured when in fact the testtaker did not. A false negative is a miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured when the testtaker actually did.
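A minimal sketch in Python (hypothetical data, not from the text; here any accurate prediction counts as a hit) showing how base, hit, and miss rates and the two kinds of misses could be tallied for a dichotomous test decision:

```python
import numpy as np

# 1 = has the attribute, 0 = does not (hypothetical records)
actual    = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])  # true standing on the criterion
predicted = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])  # what the test predicted

base_rate = actual.mean()                  # proportion of the group with the attribute
hit_rate  = (predicted == actual).mean()   # proportion of accurate predictions
miss_rate = 1 - hit_rate                   # proportion of inaccurate predictions

false_positives = ((predicted == 1) & (actual == 0)).sum()  # predicted yes, actually no
false_negatives = ((predicted == 0) & (actual == 1)).sum()  # predicted no, actually yes

print(f"base rate {base_rate:.1f}, hit rate {hit_rate:.1f}, miss rate {miss_rate:.1f}")
print(f"false positives: {false_positives}, false negatives: {false_negatives}")
```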

20
Q

The validity coefficient

A

correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
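A minimal sketch (illustrative numbers, not from the text): the validity coefficient is simply a correlation, such as Pearson’s r, between test scores and criterion scores:

```python
import numpy as np

test_scores      = np.array([10, 12, 15, 18, 20, 22, 25])
criterion_scores = np.array([ 3,  4,  4,  6,  7,  8,  9])  # e.g., later supervisor ratings

# validity coefficient = correlation between test and criterion
r = np.corrcoef(test_scores, criterion_scores)[0, 1]
print(f"validity coefficient r = {r:.2f}")
```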

21
Q

How high should a validity coefficient be for a user or a test developer to infer that the test is valid?

A

There are no rules for determining the minimum acceptable size of a validity coefficient. In fact, Cronbach and Gleser (1965) cautioned against the establishment of such rules. They argued that validity coefficients need to be large enough to enable the test user to make accurate decisions within the unique context in which a test is being used. Essentially, the validity coefficient should be high enough to result in the identification and differentiation of testtakers with respect to target attribute(s).

22
Q

Incremental validity

A

incremental validity, defined here as the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
Incremental validity assesses whether a new assessment adds predictive value beyond what’s already provided by existing methods, essentially determining if the new tool offers unique information.

23
Q

hierarchical regression.

A

First we estimate how well a criterion can be predicted with existing predictors, and then we evaluate how much the prediction improves when the new predictor is added to the prediction equation. Incremental validity is highest when a predictor is strongly correlated with the criterion and minimally correlated with other predictors.

=To the degree that a predictor is strongly correlated with other predictors, it gives us redundant information. There is little point in going to the trouble of measuring a variable that gives us information we already had.
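A minimal sketch of this two-step logic on synthetic data, using plain least squares (all names and coefficients are hypothetical; a real study would use a regression package and a significance test for the R-squared change):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
existing  = rng.normal(size=(n, 2))   # predictors already in use
new_pred  = rng.normal(size=n)        # candidate predictor
criterion = existing @ [0.5, 0.3] + 0.4 * new_pred + rng.normal(size=n)

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_step1 = r_squared(existing, criterion)                               # existing predictors only
r2_step2 = r_squared(np.column_stack([existing, new_pred]), criterion)  # new predictor added
print(f"R^2 gain from the new predictor: {r2_step2 - r2_step1:.3f}")    # incremental validity
```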

24
Q

Construct validity

A

the degree to which a test or instrument is capable of measuring a concept, trait, or other theoretical entity; a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct. A construct is an informed, scientific idea developed or hypothesized to describe or explain behavior. Constructs are unobservable, presupposed (underlying) traits that a test developer may invoke to describe test behavior or criterion performance.

25
Q

Evidence of homogeneity in construct validity (reasons why findings would be contrary to prediction / how to improve the homogeneity of a dichotomous test)

A

homogeneity refers to how uniform a test is in measuring a single concept. =One way a test developer can improve the homogeneity of a test containing items that are scored dichotomously (such as a true–false test) is by eliminating items that do not show significant correlation coefficients with total test scores. If all test items show significant, positive correlations with total test scores and if high scorers on the test tend to pass each item more than low scorers do, then each item is probably measuring the same construct as the total test. Each item is contributing to test homogeneity.
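A minimal sketch of that screening step on hypothetical true–false response data (the 0.20 cutoff is only illustrative; in practice a significance test, and a total score that excludes the item under review, would be used):

```python
import numpy as np

rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(50, 10))  # 50 testtakers x 10 dichotomous items
totals = responses.sum(axis=1)                 # total test scores

for item in range(responses.shape[1]):
    r = np.corrcoef(responses[:, item], totals)[0, 1]  # item-total correlation
    flag = "  <- candidate for elimination" if r < 0.20 else ""
    print(f"item {item}: item-total r = {r:.2f}{flag}")
```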
26
Q

Evidence of homogeneity in construct validity (reasons why findings would be contrary to prediction / how to improve the homogeneity of a multipoint scale test, e.g., Likert)

A

Each response is assigned a numerical score, and items that do not show significant Spearman rank-order correlation coefficients are eliminated. If all test items show significant, positive correlations with total test scores, then each item is most likely measuring the same construct that the test as a whole is measuring (and is thereby contributing to the test’s homogeneity).
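The same screening logic sketched for multipoint items, using scipy.stats.spearmanr on hypothetical Likert-type data:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
responses = rng.integers(1, 6, size=(50, 8))  # 50 testtakers x 8 items scored 1-5
totals = responses.sum(axis=1)

for item in range(responses.shape[1]):
    rho, p = spearmanr(responses[:, item], totals)  # rank-order item-total correlation
    keep = (p < 0.05) and (rho > 0)                 # retain significant, positive items
    print(f"item {item}: rho = {rho:.2f}, p = {p:.3f}, keep = {keep}")
```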
27
Q

Evidence of changes with age and why this should be high for good construct validity

A

If a test score purports to be a measure of a construct that could be expected to change over time, then the test score, too, should show the same progressive changes with age to be considered a valid measure of the construct.
28
Q

Evidence of pretest–posttest changes and why this should be high for good construct validity

A

Evidence that test scores change as a result of some experience between a pretest and a posttest can be evidence of construct validity.
29
Q

Evidence from distinct groups and why this should be high for good construct validity

A

Also referred to as the method of contrasted groups, one way of providing evidence for the validity of a test is to demonstrate that scores on the test vary in a predictable way as a function of membership in some group.
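A minimal sketch (synthetic scores) of the method of contrasted groups: checking that two groups expected to differ on the construct also differ on the test, here with scipy’s independent-samples t-test:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
clinical  = rng.normal(loc=60, scale=10, size=40)  # group expected to score high
community = rng.normal(loc=50, scale=10, size=40)  # group expected to score low

t, p = ttest_ind(clinical, community)
print(f"t = {t:.2f}, p = {p:.4f}")  # a difference in the predicted direction supports validity
```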
30
Q

Convergent evidence

A

when scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct, this is convergent evidence of construct validity.
31
Q

discriminant evidence and why this should be high in construct validity

A

A validity coefficient showing little (a statistically insignificant) relationship between test scores and/or other variables with which scores on the test being construct-validated should not theoretically be correlated provides discriminant evidence of construct validity (also known as discriminant validity).
32
Q

multitrait-multimethod matrix

A

the matrix or table that results from correlating variables (traits) within and between methods. Values for any number of traits (such as aggressiveness or extraversion) as obtained by various methods (such as behavioral observation or a personality test) are inserted into the table, and the resulting matrix of correlations provides insight with respect to both the convergent and the discriminant validity of the methods used.
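A minimal sketch (hypothetical traits, methods, and data) of assembling such a matrix by correlating every trait-method combination with every other:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 100
aggression   = rng.normal(size=n)  # latent trait scores (simulated)
extraversion = rng.normal(size=n)

data = pd.DataFrame({
    "aggression_observation":   aggression   + rng.normal(scale=0.5, size=n),
    "aggression_selfreport":    aggression   + rng.normal(scale=0.5, size=n),
    "extraversion_observation": extraversion + rng.normal(scale=0.5, size=n),
    "extraversion_selfreport":  extraversion + rng.normal(scale=0.5, size=n),
})

mtmm = data.corr()  # 4 x 4 multitrait-multimethod matrix
print(mtmm.round(2))
# Same trait / different method should correlate highly (convergent evidence);
# different traits should correlate weakly (discriminant evidence).
```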
33
Q

convergent validity

A

the correlation between measures of the same trait but different methods.
34
Q

method variance

A

the similarity in scores due to the use of the same method.
35
Q

Factor analysis

A

shorthand term for a class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ. In psychometric research, factor analysis is frequently employed as a data reduction method in which several sets of scores and the correlations between them are analyzed.
36
Q

Exploratory factor analysis vs confirmatory factor analysis

A

Exploratory factor analysis typically entails “estimating, or extracting factors; deciding how many factors to retain; and rotating factors to an interpretable orientation.” In confirmatory factor analysis, researchers test the degree to which a hypothetical model (which includes factors) fits the actual data.
37
Q

factor loading

A

“a sort of metaphor. Each test is thought of as a vehicle carrying a certain amount of one or more abilities.” Factor loading in a test conveys information about the extent to which the factor determines the test score or scores. A new test purporting to measure bulimia, for example, can be factor-analyzed with other known measures of bulimia, as well as with other kinds of measures (such as measures of intelligence, self-esteem, general anxiety, anorexia, or perfectionism). High factor loadings by the new test on a “bulimia factor” would provide convergent evidence of construct validity.
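A minimal sketch of extracting loadings with scikit-learn’s FactorAnalysis on synthetic scores built from a single latent factor (the one-factor setup is illustrative, and estimated loadings are recovered only up to sign):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
n = 300
latent = rng.normal(size=n)                        # one underlying ability
true_loadings = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.1])
X = np.outer(latent, true_loadings) + rng.normal(scale=0.5, size=(n, 6))

fa = FactorAnalysis(n_components=1).fit(X)
print(np.round(fa.components_, 2))  # estimated loadings: large for the first
                                    # three measures, small for the rest
```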
38
Q

bias

A

a factor inherent in a test that systematically prevents accurate, impartial measurement.
39
Q

intercept bias vs slope bias

A

Intercept bias occurs when the use of a predictor results in consistent underprediction or overprediction of a specific group’s performance or outcomes. Slope bias occurs when a predictor has a weaker correlation with an outcome for specific groups. For example, on high-stakes educational tests, some individuals with math disabilities are allowed to use calculators as a part of their testing accommodations.
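A minimal sketch (synthetic data) of how the two patterns could be checked by fitting a separate regression line per group and comparing intercepts and slopes:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 150
# group B is simulated with the same slope but a lower intercept than group A,
# the intercept-bias pattern; differing fitted slopes would suggest slope bias
for group, (intercept, slope) in {"A": (0.0, 1.0), "B": (-0.5, 1.0)}.items():
    predictor = rng.normal(size=n)
    outcome = intercept + slope * predictor + rng.normal(scale=0.5, size=n)
    b, a = np.polyfit(predictor, outcome, 1)  # returns [slope, intercept]
    print(f"group {group}: slope = {b:.2f}, intercept = {a:.2f}")
# A single common regression line fit to both groups would then systematically
# overpredict group B's outcomes.
```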
40
Q

rating / rating scale / rating error

A

A rating is a numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a rating scale. Simply stated, a rating error is a judgment resulting from the intentional or unintentional misuse of a rating scale.
41
Q

leniency error (also known as a generosity error)

A

as its name implies, an error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading.
42
Q

central tendency error vs severity error

A

A severity error is the opposite of a leniency error. Movie critics who pan just about everything they review may be guilty of severity errors; of course, that is only true if they review a wide range of movies that might consensually be viewed as good and bad. In a central tendency error, the rater, for whatever reason, exhibits a general and systematic reluctance to give ratings at either the positive or the negative extreme. Consequently, all of this rater’s ratings would tend to cluster in the middle of the rating continuum.
43
Q

restriction-of-range rating errors and how to overcome them

A

Central tendency, leniency, and severity errors all restrict the range of ratings used. One way to overcome them is rankings, a procedure that requires the rater to measure individuals against one another instead of against an absolute scale. By using rankings instead of ratings, the rater (now the “ranker”) is forced to select first, second, third choices, and so forth.
44
Q

Halo effect

A

describes the fact that, for some raters, some ratees can do no wrong. More specifically, a halo effect may also be defined as a tendency to give a particular ratee a higher rating than the ratee objectively deserves because of the rater’s failure to discriminate among conceptually distinct and potentially independent aspects of a ratee’s behavior. Men have been shown to receive more favorable evaluations than women in traditionally masculine occupations. Except in highly integrated situations, ratees tend to receive higher ratings from raters of the same race (Landy & Farr, 1980). It is also possible that a particular rater may have had particularly great (or particularly distressing) prior experiences that lead them to provide extraordinarily high (or low) ratings on that irrational basis.
45
Q

fairness

A

the extent to which a test is used in an impartial, just, and equitable way.
46
Q

common misunderstandings about test fairness

A

-Some tests have been labeled “unfair” because they discriminate among groups of people. We would all like to believe that people are equal in every way and that all people are capable of rising to the same heights given equal opportunity; a more realistic view would appear to be that each person is capable of fulfilling a personal potential.
-Another misunderstanding is that it is unfair to administer to a particular population a standardized test that did not include members of that population in the standardization sample. In fact, the test may well be biased, but that must be determined by statistical or other means. The sheer fact that no members of a particular group were included in the standardization sample does not in itself invalidate the test for use with that group.
47
Q

group-related test-score adjustment (arguments in favor)

A

-Arguments in favor of group-related test-score adjustment have been made on philosophical as well as technical grounds. From a philosophical perspective, increased minority representation is socially valued to the point that minority preference in test scoring is warranted.
-In the same vein, minority preference is viewed both as a remedy for past societal wrongs and as a contemporary guarantee of proportional workplace representation.
-It is argued that some tests require adjustment in scores because (1) the tests are biased, and a given score on them does not necessarily carry the same meaning for all testtakers; and/or (2) “a particular way of using a test is at odds with an espoused position as to what constitutes fair use.”
48
Q

group-related test-score adjustment (arguments against)

A

-Some view such adjustments as part of a social agenda for preferential treatment of certain groups. These opponents of test-score adjustment reject the subordination of individual effort and ability to group membership as criteria in the assignment of test scores.
-There is also concern for “minority applicants who are selected under a quota system but who also would have been selected under unqualified individualism and must therefore pay the price, in lowered prestige and self-esteem.”
49
Q

techniques for Preventing or Remedying Adverse Impact and/or Instituting an Affirmative Action Program