Ch 6 Validity Flashcards
Validity
- Extent to which a test measures what it claims to measure in a specific context
- Are the inferences appropriate?
Tests are not universally valid; a test is valid for…
- a particular purpose
- a particular population of people
- a particular time
A test is only valid under…
particular circumstances
Validation
Gathering and evaluating validity evidence
Whose responsibility to check for validity?
- Test developers
- Test users
- Local validation studies (necessary when the test is altered in some way or used with a different population)
Face Validity (FV)
How valid a test looks to those involved
- Judgement concerning item relevance
- The judgment comes from the test taker, not the test user
- Lack of FV may lead to lack of confidence in the test’s effectiveness
- FV is not correlated with psychometric soundness
Content
The actual items that make up the test (specific questions)
Content Validity
- How adequately a test samples behavior it is designed to sample
- Like FV except you are attempting to cover every possible aspect of what you are trying to measure
- Content validity heavily stresses comprehensiveness
Ex: Does a comprehensive final cover topics presented during a course?
Content Validity
- Expert opinion
- Test developer creates many potential items
- Experts rate each item as
– essential
– useful, but not necessary
– not necessary
- Remaining items are ranked and ordered
Content Validity Ratio
- Computed per item from the proportion of judges who agree the item is essential
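Lawshe's CVR formula makes this concrete: for each item, CVR = (nₑ − N/2) / (N/2), where nₑ is the number of judges rating the item essential and N is the total number of judges. A minimal sketch with made-up panel numbers:

```python
def content_validity_ratio(n_essential, n_judges):
    """Lawshe's CVR for a single item: (n_e - N/2) / (N/2).

    Ranges from -1 (no judge rates the item essential) to +1
    (every judge does); 0 means exactly half the panel agrees.
    """
    half = n_judges / 2
    return (n_essential - half) / half

# Hypothetical panel: 8 of 10 judges rate an item "essential"
print(content_validity_ratio(8, 10))  # 0.6
```

Items with low or negative CVR are candidates for removal before the remaining items are ranked and ordered.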
Relativity of content validity
- A measure that is content valid in one culture may not have content validity in another culture
Ex: A content-valid measure of depression for Western cultures may need different content than one for Eastern cultures
Criterion
Standard against which a test or score is evaluated
Ex: Mental disorder diagnosis of depression, anxiety, etc.
Criterion-Related Validity
- How adequately a score can be used to infer an individual’s standing on some criterion
- Measured with validity coefficient
– Correlation between test score and score on criterion measure
Ex: Correlation between depression score and DSM-5 depression diagnosis
Criterion-Related Validity: Two Types
- Concurrent
- Predictive
Concurrent
Test scores & criterion measure at same time
Ex:
- New diagnostic tool vs. existing diagnosis
- Test A vs. Test B
Predictive
- Test scores, then criterion measure taken in future
- Ex: ACT score to predict freshman GPA (the ACT was taken in the past; GPA is the criterion)
- A test with high predictive validity is useful
Incremental Validity (A type of predictive validity)
- Degree to which an additional predictor explains variance in the criterion beyond the original predictor
Ex: High School GPA added to the above example
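Incremental validity can be illustrated as the gain in explained variance (R²) when a second predictor enters a regression. A sketch with made-up ACT, high-school GPA, and freshman GPA numbers:

```python
import numpy as np

# Hypothetical data: ACT scores, high-school GPA, and freshman GPA
act    = np.array([20, 24, 28, 30, 33, 25, 22, 29])
hs_gpa = np.array([2.8, 3.1, 3.6, 3.5, 3.9, 3.0, 3.2, 3.7])
fr_gpa = np.array([2.5, 2.9, 3.4, 3.3, 3.8, 2.8, 3.0, 3.6])  # criterion

def r_squared(X, y):
    """Proportion of criterion variance explained by a linear model."""
    X = np.column_stack([np.ones(len(y)), X])   # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_act  = r_squared(act.reshape(-1, 1), fr_gpa)
r2_both = r_squared(np.column_stack([act, hs_gpa]), fr_gpa)

# Incremental validity of HS GPA = gain in explained variance
print(f"ACT alone: {r2_act:.3f}, ACT + HS GPA: {r2_both:.3f}")
```

If `r2_both` is meaningfully larger than `r2_act`, high-school GPA adds incremental validity over the ACT alone.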
Criterion Characteristics
- Uncontaminated
- Relevant
- Valid (ratings/tests used as criteria must themselves be valid)
Criterion Contamination
- The criterion is based in part on the predictor (i.e., the test itself influences the criterion)
- Ex: Develop a depression measure based on the BDI (Beck Depression Inventory) and then validate it against patient diagnoses that were also based on the BDI
Standard Error of Estimate (SEE)
- Margin of error expected in the predicted criterion score
- Decreases as the correlation between the test score and criterion increases
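The standard textbook formula is SEE = SDy · √(1 − r²), where SDy is the criterion's standard deviation and r is the validity coefficient. A minimal sketch with hypothetical values:

```python
import math

def standard_error_of_estimate(sd_criterion, r):
    """SEE = SD_y * sqrt(1 - r^2): expected spread of actual criterion
    scores around the predicted score. Higher validity -> smaller SEE."""
    return sd_criterion * math.sqrt(1 - r ** 2)

# With a perfectly valid test (r = 1) there is no prediction error
print(standard_error_of_estimate(15, 1.0))  # 0.0
# With r = 0 the error equals the criterion's full SD
print(standard_error_of_estimate(15, 0.0))  # 15.0
# A moderately valid test shrinks, but does not remove, the error
print(standard_error_of_estimate(15, 0.6))  # 12.0
```

This is why the SEE belongs to validity: it quantifies how far off predictions of the criterion are likely to be.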
Standard Error of Measurement (SEM)
For reliability
Standard Error of Estimate (SEE)
For Validity
- When the measure tries to predict the criterion
Construct
An informed, scientific idea hypothesized to describe or explain behavior
Ex: intelligence, leadership
Construct Validity
- Appropriateness of inferences drawn from test scores regarding standing on a construct
- Referred to as “Umbrella Validity”
Evidence of Construct Validity
- Evidence of homogeneity - How uniform a test is in measuring a single concept
- Evidence of changes with age - Some constructs are expected to change over time (e.g., reading rate), while others are not (e.g., marital satisfaction)
- Evidence of pretest/posttest changes - test scores change as a result of some experience between a pretest and a posttest (therapy)
- Evidence from distinct groups - Scores on a test vary in a predictable way as a function of membership in some group (scores on the psychopathy checklist for prisoners vs. civilians)
To test for construct validity, look to the theory
- No one way to test, solely based on your construct and the theory behind your construct
- Does it theorize that your construct is a single construct?
- Does it theorize it should look different between people who have been treated and those who have not?
- Does it theorize it should change over time?
Other Forms of Evidence
Your test scores should correlate with other “tried and true” tests of the same thing
Convergent Validity
A test that correlates highly with other tests of the same construct
Should your tests correlate with scores of other non-related variables?
No
Discriminant (Divergent) Validity
A test does not correlate with other tests of different constructs
Factor
Characteristics, dimensions, or attributes that people differ on
Factor Analysis
- A family of methods that classifies several items into groups of related factors or latent variables
- Done by finding similarities among the items
- Most often used in survey research to see if a long series of questions can be grouped into smaller sets of questions
Two Types of Factor Analysis
- Exploratory
- Confirmatory
Exploratory
You do not have a hypothetical model and let data guide the factors
Confirmatory
You have a hypothetical model and test that
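Real factor analysis uses dedicated estimation methods, but the core idea can be sketched with a PCA-style eigendecomposition of a made-up correlation matrix: items that correlate strongly with each other cluster onto the same factor, and large eigenvalues flag how many factors the data support.

```python
import numpy as np

# Hypothetical correlation matrix for 4 survey items: items 1-2
# correlate strongly with each other, as do items 3-4, hinting at
# two underlying factors (latent variables)
R = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

# PCA-style extraction (a simplification of exploratory factor
# analysis): eigenvalues show how much variance each candidate
# factor accounts for; eigenvectors show which items load on it
eigenvalues, eigenvectors = np.linalg.eigh(R)
eigenvalues = eigenvalues[::-1]  # eigh returns ascending; sort descending
print(np.round(eigenvalues, 2))  # [2.  1.6 0.2 0.2]
```

By the common "eigenvalue greater than 1" rule of thumb, two factors would be retained here, matching the two item clusters built into the matrix.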
Reliability - Validity Relationship
Without strong reliability a test cannot be valid; however, a test can be reliable without strong validity
- Attenuation is the weakening of validity due to poor reliability
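Attenuation can be quantified with Spearman's correction formula, r_true = r_xy / √(r_xx · r_yy), which estimates what the validity coefficient would be if both the test and the criterion were perfectly reliable. A sketch with hypothetical reliabilities:

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Spearman's correction: estimated true-score correlation, given
    the observed validity coefficient r_xy and the reliabilities of
    the test (rel_x) and the criterion (rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical: observed validity .42 with test reliability .70 and
# criterion reliability .80 - unreliability hid a stronger relation
print(round(correct_for_attenuation(0.42, 0.70, 0.80), 2))  # 0.56
```

The same formula explains the reliability ceiling: since r_true cannot exceed 1, the observed validity coefficient can never exceed √(r_xx · r_yy).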