Chapter 6 Flashcards

1
Q
  • As applied to a test, is a judgment or estimate of how well a test measures what it purports to measure in a particular context.
  • More specifically, it is a judgment based on
    evidence about the appropriateness of inferences drawn from test scores
A

Validity

2
Q

Is a logical result or deduction.

A

Inference

3
Q

Is the process of gathering and evaluating evidence about validity.

A

Validation

4
Q

Are absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.
- Require professional time and know-how, and they may be costly.
- May yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual.

A

Local validation studies

5
Q

One way measurement specialists have traditionally conceptualized validity is according to three categories:

A

Content validity
Criterion-related validity
Construct validity

6
Q

This is a measure of validity based on an evaluation of the subjects, topics, or content covered by the items in the test.

A

Content validity

7
Q

This is a measure of validity obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.

A

Criterion-related validity

8
Q

This is a measure of validity that is arrived at by executing a comprehensive analysis of

a. how scores on the test relate to other test scores and measures, and
b. how scores on the test can be understood within some theoretical framework for
understanding the construct that the test was designed to measure.

A

Construct validity

9
Q

Refers to a judgment regarding how well a test measures what it purports to measure at the time and place that the variable being measured (typically a behavior, cognition, or emotion) is actually emitted.

A

Ecological validity

10
Q

Relates more to what a test appears to measure to the person being tested than to what the test actually measures.
- Is a judgment concerning how relevant the test items appear to be.

A

Face validity

11
Q

Describes judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.

A

Content validity

12
Q

The “structure” of the evaluation—that is, a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth.

A

Test blueprint

13
Q

Is a judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest—the measure of interest being the criterion.

A

Criterion-related validity

14
Q

Two types of validity evidence are subsumed under the heading of criterion-related validity:

A

Concurrent validity
Predictive validity

15
Q

Is an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently).

A

Concurrent validity

16
Q

Is an index of the degree to which a test score predicts some criterion measure.

A

Predictive validity

17
Q

The standard against which a test score is evaluated.

A

Criterion

18
Q

Characteristics of a criterion:

A
  • An adequate criterion is relevant.
  • An adequate criterion measure must also be valid for the purpose for which it is being used.
19
Q

Is the term applied to a criterion measure that has been based, at least in part, on predictor measures.

A

Criterion contamination

20
Q

Is the extent to which a particular trait, behavior, characteristic, or attribute exists in the population (expressed as a proportion).

A

Base rate

21
Q

May be defined as the proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.

A

Hit rate

22
Q

May be defined as the proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.

A

Miss rate

23
Q

Is a miss wherein the test predicted that the testtaker did possess the particular characteristic or attribute being measured when in fact the testtaker did not.

A

False positive

24
Q

Is a miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured when the testtaker actually did.

A

False negative

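
The base rate, hit rate, miss rate, and the two kinds of misses defined in the cards above can be illustrated with a small tally. The criterion and predicted values below are hypothetical, made up purely for illustration:

```python
# Hypothetical illustration of base rate, hit rate, miss rate, and misses.
# "1" means the trait is present (per the criterion) or predicted present (per the test).
criterion = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]   # actual standing on the trait
predicted = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]   # what the test predicted

n = len(criterion)
base_rate = sum(criterion) / n               # proportion who actually have the trait

hits = sum(p == c for p, c in zip(predicted, criterion))
false_positives = sum(p == 1 and c == 0 for p, c in zip(predicted, criterion))
false_negatives = sum(p == 0 and c == 1 for p, c in zip(predicted, criterion))

hit_rate = hits / n                          # proportion accurately identified
miss_rate = (false_positives + false_negatives) / n

print(base_rate)   # 0.4
print(hit_rate)    # 0.8
print(miss_rate)   # 0.2
```

Note that hit rate and miss rate sum to 1, and the miss rate itself splits into the false-positive and false-negative counts.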
25
Q

Judgments of criterion-related validity, whether concurrent or predictive, are based on two types of statistical evidence:

A

Validity coefficient
Expectancy data

26
Q

Is a correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.

A

Validity coefficient
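
Since a validity coefficient is just a correlation coefficient between test scores and criterion scores, it can be computed as an ordinary Pearson r. The scores below are hypothetical:

```python
from statistics import mean, pstdev

# Hypothetical scores on the test being validated and on a criterion measure.
test_scores = [10, 12, 14, 16, 18]
criterion_scores = [20, 23, 27, 30, 35]

# Pearson r between test and criterion = the validity coefficient.
mx, my = mean(test_scores), mean(criterion_scores)
cov = sum((x - mx) * (y - my)
          for x, y in zip(test_scores, criterion_scores)) / len(test_scores)
r = cov / (pstdev(test_scores) * pstdev(criterion_scores))
print(round(r, 3))  # 0.996
```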

27
Q

The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.

A

Incremental validity
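
Incremental validity can be illustrated as the gain in explained criterion variance (R²) when a new predictor is added to a regression that already contains an existing predictor. The data below are simulated solely for illustration:

```python
import numpy as np

# Simulated data: does predictor p2 explain criterion variance beyond p1?
rng = np.random.default_rng(0)
n = 200
p1 = rng.normal(size=n)
p2 = rng.normal(size=n)
criterion = 0.6 * p1 + 0.4 * p2 + rng.normal(scale=0.5, size=n)

def r_squared(predictors, y):
    # Ordinary least squares with an intercept; R^2 = 1 - residual variance ratio.
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_old = r_squared([p1], criterion)        # predictor already in use
r2_both = r_squared([p1, p2], criterion)   # after adding the new predictor
incremental = r2_both - r2_old             # the incremental validity
print(incremental > 0)  # True: p2 explains something p1 does not
```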

28
Q

Is a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.

A

Construct validity

29
Q

Is an informed, scientific idea developed or hypothesized to describe or explain behavior.
- Are unobservable, presupposed (underlying) traits that a test developer may invoke to describe test behavior or criterion performance.

A

Construct

30
Q

Evidence of Construct Validity:

A
  1. Evidence of homogeneity
  2. Evidence of changes with age
  3. Evidence of pretest–posttest changes
  4. Evidence from distinct groups
  5. Convergent evidence
  6. Discriminant evidence
31
Q

Refers to how uniform a test is in measuring a single concept.

A

Homogeneity

32
Q

Some constructs are expected to change over time.
- If a test score purports to be a measure of a construct that could be expected to change over time, then the test score, too, should show the same progressive changes with age to be considered a valid measure of the construct.

A

Evidence of changes with age

33
Q

Evidence that test scores change as a result of some experience between a pretest and a posttest can be evidence of construct validity.

A

Evidence of pretest–posttest changes

34
Q

One way of providing evidence for the validity of a test is to demonstrate that scores on the test vary in a predictable way as a function of membership in some group.

A

Method of contrasted groups

35
Q

Evidence for the construct validity of a particular test may converge from a number of sources, such as other tests or measures designed to assess the same (or a similar) construct. Thus, if scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct, this would be an example of ___________

A

Convergent evidence

36
Q

A validity coefficient showing little (a statistically insignificant) relationship between test scores and/or other variables with which scores on the test being construct-validated should not theoretically be correlated provides __________ of construct validity (also known as ___________)

A

Discriminant evidence / discriminant validity

37
Q

In 1959, an experimental technique useful for examining both convergent and discriminant validity evidence was presented in Psychological Bulletin. This rather technical procedure was called the __________.

A

Multitrait-multimethod matrix

38
Q

Data indicating that a test measures the same construct as other tests purporting to measure the same construct are also referred to as evidence of ___________.

A

Convergent validity

39
Q

Both convergent and discriminant evidence of construct validity can be obtained by the use of __________.
- Is a shorthand term for a class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.

A

Factor analysis
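
The extraction and retention steps can be sketched numerically. This is only one extraction method (principal components from a correlation matrix, with the Kaiser "eigenvalue greater than 1" retention rule); a full exploratory factor analysis would typically also rotate the retained factors. The correlation matrix below is hypothetical:

```python
import numpy as np

# Hypothetical correlations among five test items: items 1-3 hang together,
# as do items 4-5, suggesting two underlying factors.
R = np.array([
    [1.0, 0.7, 0.6, 0.1, 0.1],
    [0.7, 1.0, 0.6, 0.1, 0.1],
    [0.6, 0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.1, 0.6, 1.0],
])

# Extract factors: eigendecomposition, largest eigenvalues first.
eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Decide how many factors to retain (eigenvalue > 1 rule).
n_retained = int(np.sum(eigvals > 1))

# Factor loadings: each item's weight on each retained factor.
loadings = eigvecs[:, :n_retained] * np.sqrt(eigvals[:n_retained])
print(n_retained)  # 2
```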

40
Q

Typically entails “estimating, or extracting factors; deciding how many factors to retain; and rotating factors to an interpretable orientation”

A

Exploratory factor analysis

41
Q

Researchers test the degree to which a hypothetical model (which includes factors) fits the actual data.

A

Confirmatory factor analysis

42
Q

“A sort of metaphor. Each test is thought of as a vehicle carrying a certain amount of one or more abilities”
- conveys information about the extent to which the factor determines the test score or scores.

A

Factor loading

43
Q
  • May conjure up many meanings having to do with prejudice and preferential treatment.
  • For psychometricians, it is a factor inherent in a test that systematically prevents accurate, impartial measurement.

A

Bias
44
Q

Is a numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a _________.

A

Rating / Rating scale

45
Q

Is a judgment resulting from the intentional or
unintentional misuse of a rating scale.

A

Rating error

46
Q

(Also known as a generosity error) is, as its name implies, an error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading.

A

Leniency error

47
Q

Here the rater, for whatever reason, exhibits a general and systematic reluctance to giving ratings at either the positive or the negative extreme.

A

Central tendency error

48
Q

A procedure that requires the rater to measure individuals against one another instead of against an absolute scale.

A

Rankings

49
Q

Describes the fact that, for some raters, some ratees can do no wrong.

A

Halo effect

50
Q

Defined in a psychometric context as the extent to which a test is used in an impartial, just, and equitable way.

A

Fairness

51
Q

To show a statistically significant difference between individuals or groups with respect to measurement.

A

To discriminate

52
Q

Psychometric Techniques for Preventing or Remedying Adverse Impact and/or Instituting an Affirmative Action Program:

A
  1. Addition of Points
  2. Differential Scoring of Items
  3. Elimination of Items Based on Differential Item Functioning
  4. Differential Cutoffs
  5. Separate Lists
  6. Within-Group Norming
  7. Banding
  8. Preference Policies
53
Q

A constant number of points is added to the test score of members of a particular group. The purpose of the point addition is to reduce or eliminate observed differences between groups.

A

Addition of Points

54
Q

This technique incorporates group membership information, not in adjusting a raw score on a test but in deriving the score in the first place. The application of the technique may involve the scoring of some test items for members of one group but not scoring the same test items for members of another group. This technique is also known as empirical keying by group.

A

Differential Scoring of Items

55
Q

This procedure entails removing from a test any items found to inappropriately favor one group’s test performance over another’s. Ideally, the intent of the elimination of certain test items is not to make the test easier for any group but simply to make the test fairer. Sackett and Wilk (1994) put it this way: “Conceptually, rather than asking ‘Is this item harder for members of Group X than it is for Group Y?’ these approaches ask ‘Is this item harder for members of Group X with true score Z than it is for members of Group Y with true score Z?’”

A

Elimination of Items Based on Differential Item Functioning

56
Q

Different cutoffs are set for members of different groups. For example, a passing score for members of one group is 65, whereas a passing score for members of another group is 70. As with the addition of points, the purpose of ____________ is to reduce or eliminate observed differences between groups.

A

Differential Cutoffs

57
Q

Different lists of testtaker scores are established by group membership. For each list, test performance of testtakers is ranked in top-down fashion. Users of the test scores for selection purposes may alternate selections from the different lists. Depending on factors such as the allocation rules in effect and the equivalency of the standard deviation within the groups, the ____________ technique may yield effects similar to those of other techniques, such as the addition of points and differential cutoffs. In practice, the _______ is popular in affirmative action programs where the intent is to overselect from previously excluded groups.

A

Separate Lists

58
Q

Used as a remedy for adverse impact if members of different groups tend to perform differentially on a particular test, __________ entails the conversion of all raw scores into percentile scores or standard scores based on the test performance of one’s own group.

A

Within-Group Norming
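
Within-group norming can be sketched as converting a raw score into a percentile computed only against the test-taker's own group. The group names and score distributions below are hypothetical:

```python
from bisect import bisect_right

# Hypothetical score distributions for two groups.
group_scores = {
    "group_a": sorted([48, 52, 55, 60, 63, 67, 70, 74]),
    "group_b": sorted([55, 60, 66, 70, 75, 80, 84, 90]),
}

def within_group_percentile(raw, group):
    """Percentile of a raw score relative to the test-taker's own group."""
    scores = group_scores[group]
    at_or_below = bisect_right(scores, raw)   # count of scores <= raw
    return 100 * at_or_below / len(scores)

# The same raw score of 70 converts to different percentiles in each group.
print(within_group_percentile(70, "group_a"))  # 87.5
print(within_group_percentile(70, "group_b"))  # 50.0
```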

59
Q

When race is the primary criterion of group membership and separate norms are established by race, this technique is known as

A

Race norming

60
Q

The effect of _________ of test scores is to make equivalent all scores that fall within a particular range or band. For example, thousands of raw scores on a test may be transformed to a stanine having a value of 1 to 9. All scores that fall within each of the stanine boundaries will be treated by the test user as either equivalent or subject to some additional selection criteria.

A

Banding

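Banding by stanine can be sketched as mapping a percentile rank onto the nine standard stanine bands, which cover 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores; every score in the same band is then treated as equivalent:

```python
def to_stanine(percentile):
    """Map a percentile rank (0-100) onto stanine bands 1-9."""
    # Cumulative percentage boundaries for stanines 1 through 8;
    # anything at or above the 96th percentile falls in stanine 9.
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]
    for stanine, cutoff in enumerate(cutoffs, start=1):
        if percentile < cutoff:
            return stanine
    return 9

print(to_stanine(3))    # 1
print(to_stanine(50))   # 5
print(to_stanine(98))   # 9
```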
61
Q

Is a modified banding procedure wherein a band is adjusted (“slid”) to permit the selection of more members of some group than would otherwise be selected.

A

Sliding band

62
Q

In the interest of affirmative action, reverse discrimination, or some other policy deemed to be in the interest of society at large, a test user might establish a policy of preference based on group membership. For example, if a municipal fire department sought to increase the representation of female personnel in its ranks, it might institute a test-related policy designed to do just that. A key provision in this policy might be that when a male and a female earn equal scores on the test used for hiring, the female will be hired.

A

Preference Policies