Chapter 6: Validity Flashcards
Judgment or estimate of how well a test measures what it purports to measure in a particular context.
Judgment based on evidence about the appropriateness of inferences drawn from test scores
Validity
Logical result or deduction
Inference
May yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual.
Absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.
Ex: A local validation study would be necessary if the test user sought to transform a nationally standardized test into Braille for administration to blind and visually impaired testtakers.
Local validation studies
3 categories of validity
Content validity
Criterion-related validity
Construct validity
Based on an evaluation of the subjects, topics, or content covered by the items in the test.
Ex: Driver’s license exam
A test that covers all relevant topics about traffic rules and excludes irrelevant topics
Content validity
Obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.
Ex: Compare a student’s college entrance exam score to their first-semester GPA
Criterion-related validity
Arrived at by executing a comprehensive analysis of:
a. how scores on the test relate to other test scores and measures, and
b. how scores on the test can be understood within some theoretical framework
Ex: An interviewer can use a structured interview to assess an applicant’s competencies and ensure they are a good fit for the role.
Construct validity
Classic conception of validity which visualizes construct validity as being “umbrella validity” because every other variety of validity falls under it.
Trinitarian view
Different ways of approaching the process of test validation
Strategies
Concept which states that all validation is singly focused on providing evidence to support the interpretation or the inference.
Unitary view of validity
Refers to the in-the-moment and in-the-place evaluation of targeted variables (such as behaviors, cognitions, and emotions) in a natural, naturalistic, or real-life context.
Ex: Assessing physical activity, mood, and stress, or assessing alcohol use
Ecological momentary assessment (EMA)
How well findings from a test or study can be generalized to real-life situations and settings outside the controlled environment
Ex: Assessing driving skills using a real driving course rather than a simulated computer game.
Ecological validity
Relates more to what a test appears to measure to the person being tested than to what the test actually measures.
Ex: A test measuring anxiety that includes questions about sweating, trembling, and feeling worried, demonstrates high face validity because these items are directly related to the construct of anxiety.
Face validity
The “structure” of the evaluation; a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth.
Test blueprint
2 types of criterion-related validity
Concurrent validity
Predictive validity
Index of the degree to which a test score is related to some criterion measure obtained at the same time.
Ex: A therapist may use two separate depression scales with a patient to confirm a diagnosis.
Concurrent validity
Index of the degree to which a test score predicts some criterion measure.
Ex: College entrance tests are often used to predict college success.
Predictive validity
3 characteristics of a criterion
Relevant
Valid
Uncontaminated
Occurs when the criterion measure includes aspects unrelated to the intended construct, leading to inaccurate or misleading data
Ex: A manager’s evaluation of an employee’s work performance is influenced by how much they like or dislike that employee personally, leading to unfair assessments.
Criterion contamination
Extent to which a particular trait, behavior, characteristic, or attribute exists in the population
Ex: base rate of depression in a college student population
Base rate
Proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.
Ex: proportion of neurological patients accurately identified as having a brain tumor
Hit rate
Proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.
Miss rate
2 subtypes of miss rate
False positive
False Negative
A miss wherein the test predicted that the testtaker did possess the particular characteristic or attribute being measured when in fact the testtaker did not.
Ex: A psychological test designed to screen for depression incorrectly flags a person as depressed when they are not experiencing symptoms of depression.
False positive
A miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured when the testtaker actually did.
Ex: A pregnancy test that indicates a woman is not pregnant, while she is, is a false negative.
False Negative
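The relationships among base rate, hit rate, and miss rate can be sketched with simple arithmetic on a hypothetical screening sample (all counts below are invented for illustration):

```python
# Hypothetical screening-test results for 200 testtakers (invented numbers).
true_positives = 30   # test says "has trait" and the testtaker does (hit)
false_positives = 10  # test says "has trait" but the testtaker does not (miss)
false_negatives = 10  # test says "no trait" but the testtaker has it (miss)
true_negatives = 150  # test says "no trait" and the testtaker does not (hit)

total = true_positives + false_positives + false_negatives + true_negatives

# Base rate: proportion of the population that actually has the trait.
base_rate = (true_positives + false_negatives) / total

# Hit rate: proportion of people the test classifies correctly, either way.
hit_rate = (true_positives + true_negatives) / total

# Miss rate: proportion of people the test classifies incorrectly.
miss_rate = (false_positives + false_negatives) / total

print(base_rate)  # 0.2
print(hit_rate)   # 0.9
print(miss_rate)  # 0.1
```

Note that hit rate and miss rate sum to 1, since every testtaker is either classified correctly or not.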
Judgments of criterion-related validity are based on 2 types of statistical evidence
Validity coefficient
Expectancy data
Correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
Validity coefficient
Illustrate the relationship between predictor scores (like test scores) and criterion scores (like job performance), showing the likelihood of a specific outcome based on a testtaker’s performance
Expectancy data
Degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
Ex: a medical practitioner is more likely to correctly diagnose a kidney infection if a urine test is ordered, rather than relying on a physical examination and discussion of symptoms alone
Incremental validity
Informed, scientific idea developed or hypothesized to describe or explain behavior.
Ex: Intelligence, personality
Construct
Types of construct validity evidence and the procedures used to obtain them (5)
■ the test is homogeneous, measuring a single construct
■ test scores change with age
■ test scores show pretest-posttest changes
■ test scores obtained by people from distinct groups vary
■ test scores correlate with scores on other tests
Refers to how uniform a test is in measuring a single concept.
Ex: If the test is one of marital satisfaction, and if individuals who score high on the test as a whole respond to a particular item in a way that would indicate that they are not satisfied, whereas people who tend not to be satisfied respond to the item in a way that would indicate that they are satisfied, then the item should probably be eliminated or at least reexamined for clarity.
Homogeneity
Some constructs are expected to change over time.
Ex: If children in grades 6, 7, 8, and 9 took a test of eighth-grade vocabulary, we would expect the total number of items scored as correct to increase as a function of the testtakers’ grade level.
Changes with age
Also referred to as the method of contrasted groups
Demonstrate that scores on the test vary in a predictable way as a function of membership in some group.
Ex: Roach and colleagues (1981) identified two groups of married couples, one relatively satisfied in their marriage, the other not so satisfied. The groups were identified by ratings from peers and professional marriage counselors. The couples rated by peers and counselors as happily married rated themselves on the MSS as significantly more satisfied with their marriage than did couples rated as less happily married, providing evidence that the MSS is indeed a valid measure of the construct of marital satisfaction.
Distinct groups
Scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct. Evidence also comes from correlations with measures purporting to measure related constructs.
Convergent evidence
Used to determine if two constructs that shouldn’t be related are, in fact, unrelated.
Ex: Scores on a test measuring vocabulary recognition shouldn’t be correlated with scores on a test measuring fine motor skills.
Discriminant validity
Matrix or table that results from correlating variables (traits) within and between methods, aiming to show convergent and discriminant validity
Multitrait-multimethod matrix
Correlation between measures of the same trait but different methods.
Ex: People who score highly on self-esteem tests are likely to score highly on extroversion tests.
Convergent validity
Class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.
Factor analysis
2 types of factor analysis
Exploratory factor analysis
Confirmatory factor analysis
Method used to identify underlying factors that explain correlations between variables.
Ex: Researchers might use EFA to understand what factors influence student satisfaction with their university experience.
Exploratory factor analysis
Researchers test the degree to which a hypothetical model (which includes factors) fits the actual data.
Ex: Researchers might use a questionnaire to understand factors that influence student satisfaction with their university.
Confirmatory factor analysis
A sort of metaphor: each test is thought of as a vehicle carrying a certain amount of one or more abilities.
Ex: A new test purporting to measure bulimia can be factor-analyzed with other known measures of bulimia, as well as with other kinds of measures (such as measures of intelligence, self-esteem, general anxiety, anorexia, or perfectionism)
Factor loading
Factor inherent in a test that systematically prevents accurate, impartial measurement.
Bias
Occurs when the use of a predictor results in consistent underprediction or overprediction of a specific group’s performance or outcomes
Ex: A verbal reasoning test uses vocabulary more familiar to high-SES (socioeconomic status) individuals, leading to lower scores for equally capable low-SES applicants
Intercept bias
Occurs when a predictor has a weaker correlation with an outcome for specific groups.
Ex: If a depression scale predicts future mental health outcomes more accurately for one cultural group than another, it may have slope bias. For example, if the scale underestimates the severity of depression in one group compared to another, leading to different treatment recommendations, this reflects bias in prediction.
Slope bias
Numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a rating scale.
Rating
Scale of numerical or word descriptors
Rating scale
Judgment resulting from the intentional or unintentional misuse of a rating scale.
Rating error
Also known as a generosity error
Error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading
Ex: A section of a particular course will quickly fill if it is taught by a professor with a reputation for leniency in end-of-term grading.
Leniency error
Rating error that results in consistently low scores. It can occur when a rater is too strict or negative.
Ex: A professor consistently gives students lower grades than their actual performance merits, believing that only exceptional work deserves top marks.
Severity error
Rater exhibits a general and systematic reluctance to give ratings at either the positive or the negative extreme; consequently, all of this rater’s ratings tend to cluster in the middle of the rating continuum.
Ex: A reviewer consistently rates products between 3 and 4 stars, even when some deserve 1 star and others deserve 5.
Central tendency error
Procedure that requires the rater to measure individuals against one another instead of against an absolute scale.
Rankings
Describes the fact that, for some raters, some ratees can do no wrong.
Ex: Assuming that a coworker is more skilled than they actually are because they went to a prestigious university
Halo effect
Extent to which a test is used in an impartial, just, and equitable way
Fairness or Test fairness