Chapter 6: Validity Flashcards

1
Q

Judgment or estimate of how well a test measures what it
purports to measure in a particular context.

Judgment based on evidence about the appropriateness of inferences drawn from test scores

A

Validity

2
Q

Logical result or deduction

A

Inference

3
Q

May yield insights
regarding a particular population of testtakers as compared to the norming sample described in a test manual.

Absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.

Ex: a local
validation study would be necessary if the test user sought
to transform a nationally standardized test into Braille for
administration to blind and visually impaired testtakers.

A

Local validation studies

4
Q

3 categories of validity

A

Content validity
Criterion-related validity
Construct validity

5
Q

Based on an evaluation of the subjects, topics, or content covered by the items in the test.

Ex: Driver’s license exam

A test that covers all relevant topics about traffic rules and excludes irrelevant topics

A

Content validity

6
Q

Obtained by evaluating the
relationship of scores obtained on the test to scores on other tests or measures

Ex: Compare a student’s college entrance exam score to their first-semester GPA

A

Criterion-related validity

7
Q

Arrived at by executing a comprehensive
analysis of:

a. how scores on the test relate to other test scores and measures, and
b. how scores on the test can be understood within some theoretical framework

Ex: An interviewer can use a structured interview to assess an applicant’s competencies and ensure they are a good fit for the role.

A

Construct validity

8
Q

Classic conception of validity which visualizes construct validity as being “umbrella validity” because every other variety of validity falls under it.

A

Trinitarian view

9
Q

Different ways of approaching the process of test validation

A

Strategies

10
Q

Concept which states that all validation is singly focused on providing evidence to support the interpretation or the inference.

A

Unitary view of validity

11
Q

Refers to the in-the-moment and in-the-place evaluation of targeted variables (such as behaviors, cognitions, and emotions) in a natural, naturalistic, or real-life context.

Ex: Assessing physical activity, mood, and stress, or assessing alcohol use

A

Ecological momentary assessment (EMA)

12
Q

How well findings from a test or study can be generalized to real-life situations and settings outside the controlled environment

Ex: Assessing driving skills using a real driving course rather than a simulated computer game.

A

Ecological validity

13
Q

Relates more to what a test appears to measure to the person being tested than to what the test actually measures.

Ex: A test measuring anxiety that includes questions about sweating, trembling, and feeling worried demonstrates high face validity because these items are directly related to the construct of anxiety.

A

Face validity

14
Q

“structure” of the evaluation

Plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth

A

Test blueprint

15
Q

2 types of criterion-related validity

A

Concurrent validity
Predictive validity

16
Q

Index of the degree to which a test score is related to some criterion measure obtained at the same time.

Ex: A therapist may use two separate depression scales with a patient to confirm a diagnosis

A

Concurrent validity

17
Q

Index of the degree to which a test score predicts some criterion measure.

Ex: College entrance tests are often used to predict college success

A

Predictive validity

18
Q

3 characteristics of criterion

A

Relevant
Valid
Uncontaminated

19
Q

Occurs when the criterion measure includes aspects unrelated to the intended construct, leading to inaccurate or misleading data

Ex: A manager’s evaluation of an employee’s work performance is influenced by how much they like or dislike that employee personally, leading to unfair assessments.

A

Criterion contamination

20
Q

Extent to which a particular trait, behavior, characteristic, or attribute exists in the population

Ex: base rate of depression in a college student population

A

Base rate

21
Q

Proportion of people a test accurately identifies as
possessing or exhibiting a particular trait, behavior, characteristic, or attribute.

Ex: proportion of neurological patients accurately identified
as having a brain tumor

A

Hit rate

22
Q

Proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.

A

Miss rate

23
Q

2 subtypes of miss rate

A

False positive
False negative

24
Q

A miss wherein the test predicted that the testtaker did possess the particular
characteristic or attribute being measured when in fact the testtaker did not.

Ex: A psychological test designed to screen for depression incorrectly flags a person as depressed when they are not experiencing symptoms of depression.

A

False positive

25
Q

A miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured when the testtaker actually did.

Ex: A pregnancy test that indicates a woman is not pregnant when she is.

A

False negative
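The classification outcomes in the cards above (hits, false positives, false negatives) can be illustrated with a toy screening example. All counts below are invented for illustration; "hit rate" is computed here as the proportion of all testtakers the test classifies correctly:

```python
# Hypothetical outcomes for 1,000 testtakers screened for depression
# (all counts invented for illustration).
true_positive = 80    # test says "depressed"; person is depressed
false_positive = 40   # test says "depressed"; person is not
false_negative = 20   # test says "not depressed"; person is depressed
true_negative = 860   # test says "not depressed"; person is not

total = true_positive + false_positive + false_negative + true_negative

# Base rate: proportion of the population that actually has the attribute.
base_rate = (true_positive + false_negative) / total   # 100 / 1000 = 0.10

# Hit rate: proportion of testtakers the test classifies correctly.
hit_rate = (true_positive + true_negative) / total     # 940 / 1000 = 0.94

# Miss rate: proportion the test classifies incorrectly
# (false positives plus false negatives).
miss_rate = (false_positive + false_negative) / total  # 60 / 1000 = 0.06

print(base_rate, hit_rate, miss_rate)
```

Note that a useful test must beat the base rate: here, always predicting "not depressed" would already be correct 90% of the time.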
26
Q

Judgments of criterion-related validity are based on 2 types of statistical evidence

A

Validity coefficient
Expectancy data
27
Correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
Validity coefficient
28
Illustrate the relationship between predictor scores (like test scores) and criterion scores (like job performance), showing the likelihood of a specific outcome based on a testtaker's performance
Expectancy data
29
Degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use Ex: a medical practitioner is more likely to correctly diagnose a kidney infection if a urine test is ordered, rather than relying on a physical examination and discussion of symptoms alone
Incremental validity
30
Informed, scientific idea developed or hypothesized to describe or explain behavior. Ex: Intelligence, personality
Construct
31
Type of construct validity evidence and the procedures used to obtain it (5)
■ the test is homogeneous, measuring a single construct; ■ test scores changes with age ■ test scores has pretest-posttest changes ■ test scores obtained by people from distinct groups vary ■ test scores correlate with scores on other tests
32
Refers to how uniform a test is in measuring a single concept. Ex: If the test is one of marital satisfaction, and if individuals who score high on the test as a whole respond to a particular item in a way that would indicate that they are not satisfied whereas people who tend not to be satisfied respond to the item in a way that would indicate that they are satisfied, then again the item should probably be eliminated or at least reexamined for clarity.
Homogeneity
33
Some constructs are expected to change over time. Ex: For example, if children in grades 6, 7, 8, and 9 took a test of eighth-grade vocabulary, then we would expect that the total number of items scored as correct from all the test protocols would increase as a function of the higher grade level of the testtakers.
Changes with age
34
Also referred to as the method of contrasted groups Demonstrate that scores on the test vary in a predictable way as a function of membership in some group. Ex: Roach and colleagues (1981) proceeded by identifying two groups of married couples, one relatively satisfied in their marriage, the other not so satisfied. The groups were identified by ratings by peers and professional marriage counselors. The group of couples rated by peers and counselors to be happily married rated themselves on the MSS as significantly more satisfied with their marriage than did couples rated as less happily married evidence to support the notion that the MSS is indeed a valid measure of the construct marital satisfaction.
Distinct groups
35
Scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct Also from correlations with measures purporting to measure related constructs.
Convergent evidence 
36
Used to determine if two constructs that shouldn't be related are, in fact, unrelated. Ex: Scores on a test measuring vocabulary recognition shouldn't be correlated with scores on a test measuring fine motor skills.
Discriminant validity
37
Matrix or table that results from correlating variables (traits) within and between methods, aiming to show convergent and discriminant validity
Multitrait-multimethod matrix
38
Correlation between measures of the same trait but different methods. Ex: People who score highly on self-esteem tests are likely to score highly on extroversion tests.
Convergent validity
39
Class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ
Factor analysis
40
2 types of factor analysis
Exploratory factor analysis Confirmatory factor analysis
41
Method used to identify underlying factors that explain correlations between variables. Ex: Researchers might use EFA to understand what factors influence student satisfaction with their university experience.
Exploratory factor analysis
42
Researchers test the degree to which a hypothetical model (which includes factors) fits the actual data. Ex: Researchers might use a questionnaire to understand factors that influence student satisfaction with their university.
Confirmatory factor analysis
43
Sort of metaphor Each test is thought of as a vehicle carrying a certain amount of one or more abilities Ex: A new test purporting to measure bulimia can be factor-analyzed with other known measures of bulimia, as well as with other kinds of measures (such as measures of intelligence, self-esteem, general anxiety, anorexia, or perfectionism)
Factor loading
44
Factor inherent in a test that systematically prevents accurate, impartial measurement.
Bias
45
Occurs when the use of a predictor results in consistent underprediction or overprediction of a specific group’s performance or outcomes Ex: A verbal reasoning test uses vocabulary more familiar to high-SES (socioeconomic status) individuals, leading to lower scores for equally capable low-SES applicants
Intercept bias
46
Occurs when a predictor has a weaker correlation with an outcome for specific groups Ex: If a depression scale predicts future mental health outcomes more accurately for one cultural group than another, it may have slope bias. For example, if the scale underestimates the severity of depression in one group compared to another, leading to different treatment recommendations, this reflects bias in prediction.
Slope bias
47
Numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a rating scale
Rating
48
Scale of numerical or word descriptors
Rating scale
49
Judgment resulting from the intentional or unintentional misuse of a rating scale.
Rating error
50
Also known as a generosity error Error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading Ex: Section of a particular course will quickly be filled if it is being taught by a professor with a reputation for leniency errors in end-of-term grading.
Leniency error
51
Rating error that results in consistently low scores. It can occur when a rater is too strict or negative. Ex: A professor consistently gives students lower grades than their actual performance merits, believing that only exceptional work deserves top marks.
Severity error
52
Rater exhibits a general and systematic reluctance to give ratings at either the positive or the negative extreme. Consequently, all of this rater’s ratings would tend to cluster in the middle of the rating continuum. Ex: A reviewer consistently rates products between 3 and 4 stars, even when some deserve 1 star and others deserve 5.
Central tendency error
53
Procedure that requires the rater to measure individuals against one another instead of against an absolute scale.
Rankings
54
Describes the fact that, for some raters, some ratees can do no wrong. Ex: Assuming that a coworker is more skilled than they actually are because they went to a prestigious university
Halo effect
55
Extent to which a test is used in an impartial, just, and equitable way
Fairness or Test fairness