Chapter 6: Validity Flashcards
Judgment or estimate of how well a test measures what it purports to measure in a particular context.
Judgment based on evidence about the appropriateness of inferences drawn from test scores
Validity
Logical result or deduction
Inference
May yield insights regarding a particular population of testtakers as compared to the norming sample described in a test manual.
Absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.
Ex: A local validation study would be necessary if the test user sought to transform a nationally standardized test into Braille for administration to blind and visually impaired testtakers.
Local validation studies
3 categories of validity
Content validity
Criterion-related validity
Construct validity
Based on an evaluation of the subjects, topics, or content covered by the items in the test.
Ex: Driver’s license exam
A test that covers all relevant topics about traffic rules and excludes irrelevant topics
Content validity
Obtained by evaluating the relationship of scores obtained on the test to scores on other tests or measures.
Ex: Compare a student’s college entrance exam score to their first-semester GPA
Criterion-related validity
Arrived at by executing a comprehensive analysis of:
a. how scores on the test relate to other test scores and measures, and
b. how scores on the test can be understood within some theoretical framework
Ex: An interviewer can use a structured interview to assess an applicant’s competencies and ensure they are a good fit for the role.
Construct validity
Classic conception of validity which visualizes construct validity as being “umbrella validity” because every other variety of validity falls under it.
Trinitarian view
Different ways of approaching the process of test validation
Strategies
Concept which states that all validation is singly focused on providing evidence to support the interpretation or the inference.
Unitary view of validity
Refers to the in-the-moment and in-the-place evaluation of targeted variables (such as behaviors, cognitions, and emotions) in a natural, naturalistic, or real-life context.
Ex: Assessing physical activity, mood, and stress, or assessing alcohol use
Ecological momentary assessment (EMA)
How well findings from a test or study can be generalized to real-life situations and settings outside the controlled environment
Ex: Assessing driving skills using a real driving course rather than a simulated computer game.
Ecological validity
Relates more to what a test appears to measure to the person being tested than to what the test actually measures.
Ex: A test measuring anxiety that includes questions about sweating, trembling, and feeling worried, demonstrates high face validity because these items are directly related to the construct of anxiety.
Face validity
The “structure” of the evaluation; a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth.
Test blueprint
2 types of criterion-related validity
Concurrent validity
Predictive validity
Index of the degree to which a test score is related to some criterion measure obtained at the same time.
Ex: A therapist may use two separate depression scales with a patient to confirm a diagnosis.
Concurrent validity
Index of the degree to which a test score predicts some criterion measure.
Ex: College entrance tests are often used to predict college success.
Predictive validity
3 characteristics of a criterion
Relevant
Valid
Uncontaminated
Occurs when the criterion measure includes aspects unrelated to the intended construct, leading to inaccurate or misleading data
Ex: A manager’s evaluation of an employee’s work performance is influenced by how much they like or dislike that employee personally, leading to unfair assessments.
Criterion contamination
Extent to which a particular trait, behavior, characteristic, or attribute exists in the population
Ex: base rate of depression in a college student population
Base rate
Proportion of people a test accurately identifies as possessing or exhibiting a particular trait, behavior, characteristic, or attribute.
Ex: proportion of neurological patients accurately identified as having a brain tumor
Hit rate
Proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.
Miss rate
2 subtypes of miss rate
False positive
False Negative
A miss wherein the test predicted that the testtaker did possess the particular characteristic or attribute being measured when in fact the testtaker did not.
Ex: A psychological test designed to screen for depression incorrectly flags a person as depressed when they are not experiencing symptoms of depression.
False positive
A miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured when the testtaker actually did.
Ex: A pregnancy test that indicates a woman is not pregnant, while she is, is a false negative.
False Negative
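The relationships among base rate, hit rate, and miss rate can be sketched with simple arithmetic on a hypothetical screening sample (all counts below are invented for illustration):

```python
# Hypothetical screening-test results for 200 testtakers (invented numbers).
true_positives = 30   # test says "has trait" and the testtaker does (hit)
false_positives = 10  # test says "has trait" but the testtaker does not (miss)
false_negatives = 10  # test says "no trait" but the testtaker has it (miss)
true_negatives = 150  # test says "no trait" and the testtaker does not (hit)

total = true_positives + false_positives + false_negatives + true_negatives

# Base rate: proportion of the population that actually has the trait.
base_rate = (true_positives + false_negatives) / total

# Hit rate: proportion of people the test classifies correctly, either way.
hit_rate = (true_positives + true_negatives) / total

# Miss rate: proportion of people the test classifies incorrectly.
miss_rate = (false_positives + false_negatives) / total

print(base_rate)  # 0.2
print(hit_rate)   # 0.9
print(miss_rate)  # 0.1
```

Note that hit rate and miss rate sum to 1, since every testtaker is either classified correctly or not.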
Judgments of criterion-related validity are based on 2 types of statistical evidence
Validity coefficient
Expectancy data
Correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
Validity coefficient
Illustrate the relationship between predictor scores (like test scores) and criterion scores (like job performance), showing the likelihood of a specific outcome based on a testtaker’s performance
Expectancy data
Degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
Ex: a medical practitioner is more likely to correctly diagnose a kidney infection if a urine test is ordered, rather than relying on a physical examination and discussion of symptoms alone
Incremental validity
Informed, scientific idea developed or hypothesized to describe or explain behavior.
Ex: Intelligence, personality
Construct
Types of construct validity evidence and the procedures used to obtain them (5)
■ the test is homogeneous, measuring a single construct
■ test scores change with age
■ test scores show pretest-posttest changes
■ test scores obtained by people from distinct groups vary
■ test scores correlate with scores on other tests
Refers to how uniform a test is in measuring a single concept.
Ex: If the test is one of marital satisfaction, and if individuals who score high on the test as a whole respond to a particular item in a way that would indicate that they are not satisfied, whereas people who tend not to be satisfied respond to the item in a way that would indicate that they are satisfied, then the item should probably be eliminated or at least reexamined for clarity.
Homogeneity
Some constructs are expected to change over time.
Ex: If children in grades 6, 7, 8, and 9 took a test of eighth-grade vocabulary, we would expect the total number of items scored as correct to increase as a function of the testtakers’ grade level.
Changes with age
Also referred to as the method of contrasted groups
Demonstrate that scores on the test vary in a predictable way as a function of membership in some group.
Ex: Roach and colleagues (1981) identified two groups of married couples, one relatively satisfied in their marriage, the other not so satisfied. The groups were identified by ratings from peers and professional marriage counselors. The couples rated by peers and counselors as happily married rated themselves on the MSS as significantly more satisfied with their marriage than did couples rated as less happily married, providing evidence that the MSS is indeed a valid measure of the construct of marital satisfaction.
Distinct groups
Scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct. Evidence also comes from correlations with measures purporting to measure related constructs.
Convergent evidence
Used to determine if two constructs that shouldn’t be related are, in fact, unrelated.
Ex: Scores on a test measuring vocabulary recognition shouldn’t be correlated with scores on a test measuring fine motor skills.
Discriminant validity
Matrix or table that results from correlating variables (traits) within and between methods, aiming to show convergent and discriminant validity
Multitrait-multimethod matrix
Correlation between measures of the same trait but different methods.
Ex: People who score highly on self-esteem tests are likely to score highly on extroversion tests.
Convergent validity
Class of mathematical procedures designed to identify factors or specific variables that are typically attributes, characteristics, or dimensions on which people may differ.
Factor analysis
2 types of factor analysis
Exploratory factor analysis
Confirmatory factor analysis
Method used to identify underlying factors that explain correlations between variables.
Ex: Researchers might use EFA to understand what factors influence student satisfaction with their university experience.
Exploratory factor analysis
Researchers test the degree to which a hypothetical model (which includes factors) fits the actual data.
Ex: Researchers might use a questionnaire to understand factors that influence student satisfaction with their university.
Confirmatory factor analysis
A sort of metaphor: each test is thought of as a vehicle carrying a certain amount of one or more abilities.
Ex: A new test purporting to measure bulimia can be factor-analyzed with other known measures of bulimia, as well as with other kinds of measures (such as measures of intelligence, self-esteem, general anxiety, anorexia, or perfectionism)
Factor loading
Factor inherent in a test that systematically prevents accurate, impartial measurement.
Bias
Occurs when the use of a predictor results in consistent underprediction or overprediction of a specific group’s performance or outcomes
Ex: A verbal reasoning test uses vocabulary more familiar to high-SES (socioeconomic status) individuals, leading to lower scores for equally capable low-SES applicants
Intercept bias
Occurs when a predictor has a weaker correlation with an outcome for specific groups.
Ex: If a depression scale predicts future mental health outcomes more accurately for one cultural group than another, it may have slope bias. For example, if the scale underestimates the severity of depression in one group compared to another, leading to different treatment recommendations, this reflects bias in prediction.
Slope bias
Numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a rating scale.
Rating
Scale of numerical or word descriptors
Rating scale
Judgment resulting from the intentional or unintentional misuse of a rating scale.
Rating error
Also known as a generosity error
Error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading
Ex: A section of a particular course will quickly fill if it is taught by a professor with a reputation for leniency in end-of-term grading.
Leniency error
Rating error that results in consistently low scores. It can occur when a rater is too strict or negative.
Ex: A professor consistently gives students lower grades than their actual performance merits, believing that only exceptional work deserves top marks.
Severity error
Rater exhibits a general and systematic reluctance to give ratings at either the positive or the negative extreme; consequently, all of this rater’s ratings tend to cluster in the middle of the rating continuum.
Ex: A reviewer consistently rates products between 3 and 4 stars, even when some deserve 1 star and others deserve 5.
Central tendency error
Procedure that requires the rater to measure individuals against one another instead of against an absolute scale.
Rankings
Describes the fact that, for some raters, some ratees can do no wrong.
Ex: Assuming that a coworker is more skilled than they actually are because they went to a prestigious university
Halo effect
Extent to which a test is used in an impartial, just, and equitable way
Fairness or Test fairness