Chapter 6: Validity Flashcards

1
Q

Judgment or estimate of how well a test measures what it purports to measure in a particular context.

Judgment based on evidence about the appropriateness of inferences drawn from test scores

A

Validity

2
Q

Logical result or deduction

A

Inference

3
Q

May yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual.

Absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.

Ex: A local validation study would be necessary if the test user sought to transform a nationally standardized test into Braille for administration to blind and visually impaired testtakers.

A

Local validation studies

4
Q

3 categories of validity

A

Content validity
Criterion-related validity
Construct validity

5
Q

Based on an evaluation of the subjects, topics, or content covered by the items in the test.

Ex: Driver’s license exam

A test that covers all relevant topics about traffic rules and excludes irrelevant topics

A

Content validity

6
Q

Obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures

Ex: Compare a student’s college entrance exam score to their first-semester GPA

A

Criterion-related validity

7
Q

Arrived at by executing a comprehensive analysis of:

a. how scores on the test relate to other test scores and measures, and
b. how scores on the test can be understood within some theoretical framework

Ex: An interviewer can use a structured interview to assess an applicant’s competencies and ensure they are a good fit for the role.

A

Construct validity

8
Q

Classic conception of validity which visualizes construct validity as being “umbrella validity” because every other variety of validity falls under it.

A

Trinitarian view

9
Q

Different ways of approaching the process of test validation

A

Strategies

10
Q

Concept which states that all validation is singly focused on providing evidence to support the interpretation or the inference.

A

Unitary view of validity

11
Q

Refers to the in-the-moment and in-the-place evaluation of targeted variables (such as behaviors, cognitions, and emotions) in a natural, naturalistic, or real-life context.

Ex: Assessing physical activity, mood, and stress, or assessing alcohol use

A

Ecological momentary assessment (EMA)

12
Q

How well findings from a test or study can be generalized to real-life situations and settings outside the controlled environment

Ex: Assessing driving skills using a real driving course rather than a simulated computer game.

A

Ecological validity

13
Q

Relates more to what a test appears to measure to the person being tested than to what the test actually measures.

Ex: A test measuring anxiety that includes questions about sweating, trembling, and feeling worried, demonstrates high face validity because these items are directly related to the construct of anxiety.

A

Face validity

14
Q

“structure” of the evaluation

Plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth

A

Test blueprint

15
Q

2 types of criterion-related validity

A

Concurrent validity
Predictive validity

16
Q

Index of the degree to which a test score is related to some criterion measure obtained at the same time.

Ex: A therapist may use two separate depression scales with a patient to confirm a diagnosis

A

Concurrent validity

17
Q

Index of the degree to which a test score predicts some criterion measure.

Ex: College entrance tests are often used to predict college success

A

Predictive validity

18
Q

3 characteristics of criterion

A

Relevant
Valid
Uncontaminated

19
Q

Occurs when the criterion measure includes aspects unrelated to the intended construct, leading to inaccurate or misleading data

Ex: A manager’s evaluation of an employee’s work performance is influenced by how much they like or dislike that employee personally, leading to unfair assessments.

A

Criterion contamination

20
Q

Extent to which a particular trait, behavior, characteristic, or attribute exists in the population

Ex: base rate of depression in a college student population

A

Base rate

21
Q

Proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.

Ex: Proportion of neurological patients accurately identified as having a brain tumor

A

Hit rate

22
Q

Proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.

A

Miss rate

23
Q

2 subtypes of miss rate

A

False positive
False negative

24
Q

A miss wherein the test predicted that the testtaker did possess the particular characteristic or attribute being measured when in fact the testtaker did not.

Ex: A psychological test designed to screen for depression incorrectly flags a person as depressed when they are not experiencing symptoms of depression.

A

False positive

25
Q

A miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured when the testtaker actually did.

Ex: A pregnancy test that indicates a woman is not pregnant, while she is, is a false negative.

A

False negative
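
The quantities on cards 20 to 25 can be tallied from a list of test decisions and a list of true statuses. The sketch below is illustrative only. The function name `classification_rates` and the sample data are hypothetical, and two choices are assumptions rather than definitions from the chapter: every correct classification is counted as a hit, and all rates are expressed as proportions of all testtakers.

```python
def classification_rates(predicted, actual):
    """Tally hits and misses from two parallel lists of booleans.

    predicted[i]: the test says testtaker i has the attribute.
    actual[i]:    testtaker i really has the attribute.
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")

    pairs = list(zip(predicted, actual))
    n = len(pairs)
    true_pos = sum(1 for p, a in pairs if p and a)
    true_neg = sum(1 for p, a in pairs if not p and not a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)

    return {
        # Proportion of the sample that truly has the attribute.
        "base_rate": (true_pos + false_neg) / n,
        # Assumption: all correct classifications count as hits.
        "hit_rate": (true_pos + true_neg) / n,
        # Misses are the two kinds of wrong classification combined.
        "miss_rate": (false_pos + false_neg) / n,
        "false_positive": false_pos / n,
        "false_negative": false_neg / n,
    }


# Hypothetical screening results for ten testtakers.
predicted = [True, True, False, False, True, False, False, False, True, False]
actual = [True, False, False, True, True, False, False, False, True, False]
rates = classification_rates(predicted, actual)
print(rates)
```

With these definitions the hit rate and miss rate always sum to 1, and the miss rate is the sum of its two subtypes.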

26
Q

Judgments of criterion-related validity are based on 2 types of statistical evidence

A

Validity coefficient
Expectancy data

27
Q

Correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.

A

Validity coefficient
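
As a rough illustration of the card above, a validity coefficient can be computed as a Pearson correlation between test scores and criterion scores. This is a minimal sketch: the function name `validity_coefficient` and the data are hypothetical, and Pearson r is assumed here even though other correlation coefficients may suit other kinds of data.

```python
import math


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores."""
    n = len(test_scores)
    if n < 2 or n != len(criterion_scores):
        raise ValueError("need two lists of equal length with at least 2 scores")

    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    dev_x = [x - mean_x for x in test_scores]
    dev_y = [y - mean_y for y in criterion_scores]

    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("scores must vary for a correlation to be defined")

    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data: five testtakers' test scores and criterion ratings.
test_scores = [1, 2, 3, 4, 5]
criterion_scores = [2, 1, 4, 3, 5]
r = validity_coefficient(test_scores, criterion_scores)
print(round(r, 2))  # 0.8
```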

28
Q

Illustrate the relationship between predictor scores (like test scores) and criterion scores (like job performance), showing the likelihood of a specific outcome based on a testtaker’s performance

A

Expectancy data
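
One simple form of expectancy data is an expectancy table: testtakers are grouped into score bands, and the proportion who reached the criterion is reported for each band. The sketch below assumes a pass/fail criterion; the function name `expectancy_table`, the score bands, and the data are all hypothetical.

```python
def expectancy_table(test_scores, succeeded, bands):
    """Proportion of testtakers who met the criterion within each score band.

    bands is a list of (low, high) inclusive score ranges.
    A band with no testtakers maps to None.
    """
    if len(test_scores) != len(succeeded):
        raise ValueError("test_scores and succeeded must have equal length")

    table = {}
    for low, high in bands:
        outcomes = [ok for score, ok in zip(test_scores, succeeded)
                    if low <= score <= high]
        # True counts as 1 and False as 0, so the sum is the number of successes.
        table[(low, high)] = sum(outcomes) / len(outcomes) if outcomes else None
    return table


# Hypothetical data: ten applicants' test scores and later job success.
scores = [55, 62, 68, 71, 75, 79, 83, 88, 91, 97]
success = [False, False, True, False, True, True, True, False, True, True]
table = expectancy_table(scores, success, [(50, 69), (70, 84), (85, 100)])
for band, proportion in table.items():
    print(band, proportion)
```

Each row reads as "of the testtakers who scored in this band, this proportion succeeded on the criterion."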

29
Q

Degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use

Ex: a medical practitioner is more likely to correctly diagnose a kidney infection if a urine test is ordered, rather than relying on a physical examination and discussion of symptoms alone

A

Incremental validity
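
Incremental validity is often quantified as the gain in explained criterion variance (R squared) when a new predictor is added to one already in use. That choice of index is an assumption here, not something the card states. The sketch uses the standard two-predictor formula for R squared from the three pairwise correlations; the function name `incremental_r_squared` and the example values are hypothetical.

```python
def incremental_r_squared(r_y1, r_y2, r_12):
    """Gain in R squared from adding predictor 2 to predictor 1.

    r_y1: correlation of the criterion with the predictor already in use
    r_y2: correlation of the criterion with the additional predictor
    r_12: correlation between the two predictors
    """
    if abs(r_12) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")

    r2_first_only = r_y1 ** 2
    # Squared multiple correlation for two predictors.
    r2_both = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return r2_both - r2_first_only


# Uncorrelated predictors: the new one adds all of its own explained variance.
print(incremental_r_squared(0.5, 0.4, 0.0))

# Redundant predictor: its link to the criterion runs entirely through
# predictor 1, so the gain is approximately zero.
print(incremental_r_squared(0.5, 0.3, 0.6))
```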

30
Q

Informed, scientific idea developed or hypothesized to describe or explain behavior.

Ex: Intelligence, personality

A

Construct

31
Q

Types of construct validity evidence and the procedures used to obtain them (5)

A

■ the test is homogeneous, measuring a single construct

■ test scores change with age

■ test scores show pretest-posttest changes

■ test scores obtained by people from distinct groups vary

■ test scores correlate with scores on other tests

32
Q

Refers to how uniform a test is in measuring a single concept.

Ex: If the test is one of marital satisfaction, and if individuals who score high on the test as a whole respond to a particular item in a way that would indicate that they are not satisfied, whereas people who tend not to be satisfied respond to the item in a way that would indicate that they are satisfied, then the item should probably be eliminated or at least reexamined for clarity.

A

Homogeneity

33
Q

Some constructs are expected to change over time.

Ex: If children in grades 6, 7, 8, and 9 took a test of eighth-grade vocabulary, then we would expect that the total number of items scored as correct from all the test protocols would increase as a function of the higher grade level of the testtakers.

A

Changes with age

34
Q

Also referred to as the method of contrasted groups

Demonstrate that scores on the test vary in a predictable way as a function of membership in some group.

Ex: Roach and colleagues (1981) proceeded by identifying two groups of married couples, one relatively satisfied in their marriage, the other not so satisfied. The groups were identified by ratings by peers and professional marriage counselors. The group of couples rated by peers and counselors to be happily married rated themselves on the MSS as significantly more satisfied with their marriage than did couples rated as less happily married. This is evidence to support the notion that the MSS is indeed a valid measure of the construct marital satisfaction.

A

Distinct groups

35
Q

Scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct

Evidence may also come from correlations with measures purporting to measure related constructs.

A

Convergent evidence 

36
Q

Used to determine if two constructs that shouldn’t be related are, in fact, unrelated.

Ex: Scores on a test measuring vocabulary recognition shouldn’t be correlated with scores on a test measuring fine motor skills.

A

Discriminant validity

37
Q

Matrix or table that results from correlating variables (traits) within and between methods, aiming to show convergent and discriminant validity

A

Multitrait-multimethod matrix

38
Q

Correlation between measures of the same trait obtained by different methods.

Ex: Scores on a self-report self-esteem questionnaire should correlate highly with peer ratings of self-esteem.

A

Convergent validity

39
Q

Class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ

A

Factor analysis

40
Q

2 types of factor analysis

A

Exploratory factor analysis
Confirmatory factor analysis

41
Q

Method used to identify underlying factors that explain correlations between variables.

Ex: Researchers might use EFA to understand what factors influence student satisfaction with their university experience.

A

Exploratory factor analysis

42
Q

Researchers test the degree to which a hypothetical model (which includes factors) fits the actual data.

Ex: Researchers might test whether responses to a student satisfaction questionnaire fit a hypothesized set of factors.

A

Confirmatory factor analysis

43
Q

Sort of metaphor

Each test is thought of as a vehicle carrying a certain amount of one or more abilities

Ex: A new test purporting to measure bulimia can be factor-analyzed with other known measures of bulimia, as well as with other kinds of measures (such as measures of intelligence, self-esteem, general anxiety, anorexia, or perfectionism)

A

Factor loading

44
Q

Factor inherent in a test that systematically prevents accurate, impartial measurement.

A

Bias

45
Q

Occurs when the use of a predictor results in consistent underprediction or overprediction of a specific group’s performance or outcomes

Ex: A verbal reasoning test uses vocabulary more familiar to high-SES (socioeconomic status) individuals, leading to lower scores for equally capable low-SES applicants

A

Intercept bias

46
Q

Occurs when a predictor has a weaker correlation with an outcome for specific groups

Ex: If a depression scale predicts future mental health outcomes more accurately for one cultural group than another, it may have slope bias. For example, if the scale underestimates the severity of depression in one group compared to another, leading to different treatment recommendations, this reflects bias in prediction.

A

Slope bias
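
One common way to look for the two kinds of bias on cards 45 and 46 is to fit a separate regression line of criterion on test score for each group and compare the lines. This is a sketch under that assumption, not a procedure taken from the cards. The function name `fit_line` and the data are hypothetical, and a regression slope is related to, but not the same as, the correlation the slope bias card mentions.

```python
def fit_line(test_scores, criterion_scores):
    """Least-squares line criterion = intercept + slope * test score."""
    n = len(test_scores)
    if n < 2 or n != len(criterion_scores):
        raise ValueError("need two lists of equal length with at least 2 scores")

    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    sum_xx = sum((x - mean_x) ** 2 for x in test_scores)
    if sum_xx == 0:
        raise ValueError("test scores must vary")
    sum_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(test_scores, criterion_scores))

    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: the same test scores, with criterion outcomes per group.
scores = [1, 2, 3, 4]
group_a_outcomes = [3, 5, 7, 9]
group_b_outcomes = [2, 3, 4, 5]

slope_a, intercept_a = fit_line(scores, group_a_outcomes)
slope_b, intercept_b = fit_line(scores, group_b_outcomes)

# Different slopes suggest the test relates to the criterion differently
# across groups; equal slopes with different intercepts would instead point
# to consistent over- or underprediction for one group.
print("group A:", slope_a, intercept_a)
print("group B:", slope_b, intercept_b)
```

In this made-up data the two groups share an intercept but differ in slope, so a single common regression line would mispredict increasingly as scores rise.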

47
Q

Numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a rating scale

A

Rating

48
Q

Scale of numerical or word descriptors

A

Rating scale

49
Q

Judgment resulting from the intentional or unintentional misuse of a rating scale.

A

Rating error

50
Q

Also known as a generosity error

Error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading

Ex: A section of a particular course will quickly be filled if it is taught by a professor with a reputation for leniency errors in end-of-term grading.

A

Leniency error

51
Q

Rating error that results in consistently low scores. It can occur when a rater is too strict or negative.

Ex: A professor consistently gives students lower grades than their actual performance merits, believing that only exceptional work deserves top marks.

A

Severity error

52
Q

Rater exhibits a general and systematic reluctance to give ratings at either the positive or the negative extreme.

Consequently, all of this rater’s ratings would tend to cluster in the middle of the rating continuum.

Ex: A reviewer consistently rates products between 3 and 4 stars, even when some deserve 1 star and others deserve 5.

A

Central tendency error

53
Q

Procedure that requires the rater to measure individuals against one another instead of against an absolute scale.

A

Ranking

54
Q

Describes the fact that, for some raters, some ratees can do no wrong.

Ex: Assuming that a coworker is more skilled than they actually are because they went to a prestigious university

A

Halo effect

55
Q

Extent to which a test is used in an impartial, just, and equitable way

A

Fairness or Test fairness