Chapter 6 Flashcards
What is validity in psychological measurement?
The meaningfulness of a test score. What the test score actually means.
How well a test captures what it purports to capture
Inference
A logical result or deduction
Is there such a thing as a universally valid test?
No. Validity has boundaries: a test is valid only for particular purposes, populations, and settings
Local validation studies
Test users conduct their own validation studies for their specific purpose (e.g., after altering the test, or when using it with a different population)
Trinitarian view of validity
Content, Criterion-related, and Construct Validity
Ecological Validity
How well findings generalize to real-world settings
Face validity
What a test appears to measure to the person being tested
Content Validity
How well a test samples behavior representative of the universe of behavior that it’s designed to sample
How to ensure content validity in educational assessments?
Make sure the test items approximate the proportions of material covered in the course
Test blueprint
A plan specifying what information the items should cover, how much of each content area, how items are organized on the test, etc.
How might (content) validity be relative?
Different cultures, religions, and political parties have different historical interpretations, so a test can be valid in one region or group but not in another
Criterion-Related Validity
How well a test score can be used to infer an individual's most probable standing on some measure of interest (the criterion)
Concurrent validity
How strongly a test score relates to a criterion measure obtained at the same time (concurrently)
Predictive validity
How well a test score predicts some criterion measure
Criterion (definition in testing evaluation)
The standard against which a test or test score is evaluated
An adequate criterion must be:
1) Valid itself
2) Uncontaminated (the criterion measure must not be based, even in part, on the predictor test)
How do you statistically find criterion contamination or correct it?
You can't. Criterion contamination cannot be detected or corrected statistically; it must be prevented when the study is designed
Base rate vs. hit rate vs. miss rate
base rate: the proportion of the population that actually has the attribute
hit rate: the proportion that a test accurately identifies as having the attribute
miss rate: the proportion that a test fails to identify as having the attribute
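A minimal sketch of these rates with made-up screening data (note: textbooks differ on whether hit/miss rates use the whole sample or only those who have the attribute as the denominator; this sketch uses only those who have it):

```python
# Hypothetical screening data: 1 = has the condition, 0 = does not.
actual  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
flagged = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # test says "has it"

n = len(actual)
base_rate = sum(actual) / n  # proportion that truly has the condition

hits   = sum(1 for a, f in zip(actual, flagged) if a == 1 and f == 1)
misses = sum(1 for a, f in zip(actual, flagged) if a == 1 and f == 0)

hit_rate  = hits / sum(actual)    # of those who have it, proportion the test catches
miss_rate = misses / sum(actual)  # of those who have it, proportion the test misses

print(base_rate, hit_rate, miss_rate)  # 0.4 0.75 0.25
```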
2 statistics used for criterion-related validity
1) validity coefficient:
- relationship between test scores and scores on the criterion measure (eg. Pearson, Spearman)
2) expectancy data: the likelihood that a test taker will score within some interval of scores on the criterion measure (often presented as an expectancy table or chart)
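A validity coefficient of the Pearson kind can be sketched in a few lines; the test and criterion scores below are made up for illustration:

```python
# Hypothetical data: test scores and scores on a criterion measure.
test      = [10, 12, 14, 15, 18, 20]
criterion = [ 3,  4,  4,  5,  7,  8]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(test, criterion)  # the validity coefficient
print(round(r, 3))  # 0.977
```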
What happens to validity coefficient when attrition of subjects happens over time?
It tends to decrease: attrition restricts the range of scores, which attenuates the correlation (an adverse effect on the validity coefficient)
Incremental validity
How much an additional predictor explains about the criterion measure beyond what is already explained by the existing predictors
Do time spent studying (1) and time spent in the library (2) likely have high or low incremental validity with respect to each other?
Low, because they are probably highly correlated with each other
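Incremental validity can be quantified as the gain in R² from adding a predictor. A sketch using the standard two-predictor R² formula with made-up correlations (the 0.50/0.45/0.90 values are illustrative assumptions, not data):

```python
# Hypothetical correlations (made-up numbers for illustration):
r_y1 = 0.50  # study time with grades
r_y2 = 0.45  # library time with grades
r_12 = 0.90  # study time with library time (highly correlated predictors)

# R^2 for regressing the criterion on both predictors (two-predictor formula):
# R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2)
r2_both = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Incremental validity of predictor 2 over predictor 1 alone
incremental = r2_both - r_y1**2

# With these numbers, adding library time contributes essentially nothing
# beyond study time alone.
print(round(r2_both, 3), round(abs(incremental), 3))
```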
Incremental validity of emotional intelligence
It has been doubted because emotional intelligence measures correlate strongly with intelligence and personality measures
Construct Validity
A judgment about the appropriateness of inferences drawn from test scores regarding standing on the construct
Construct (definition)
An unobservable, underlying scientific idea that describes or explains behavior (e.g., intelligence, anxiety)
Homogeneity of a test
Whether a test measures a single concept (uniformity of the items in measuring one construct)
If a construct's definition says scores should improve with age (e.g., reading ability), measuring scores across age groups would provide evidence for what type of validity?
construct validity
A pre-test and post-test of a marriage-related measure before and after marriage therapy is evidence for which validity?
construct validity
Method of contrasted groups
Show that scores differ between groups that should differ (one expected to score higher vs. one that shouldn't)
Convergent evidence
Evidence that a test correlates with measures it should correlate with (e.g., a new anxiety test correlating with an established anxiety test)
Divergent evidence
Evidence that a test does not correlate with measures it is not supposed to correlate with (also called discriminant evidence)
multitrait-multimethod matrix
what is it used for?
A matrix/table resulting from correlating multiple traits measured by multiple methods
used for convergent/divergent validity evidence
Method variance
Similarity in scores due to the same method
Factor analysis
A statistical procedure used to identify the factors (underlying attributes, characteristics, or dimensions) on which test items or scores cluster
Factor loading
high or low for convergent/discriminant validity?
The extent to which a factor determines the test score.
High loadings on the intended factor support convergent validity; low loadings on unrelated factors support discriminant validity
Intercept bias vs. Slope bias
intercept bias: consistent under- or overprediction of scores for a specific group
slope bias: a predictor correlates more weakly with the outcome for specific groups
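A sketch of intercept bias with made-up data: both groups follow lines with the same slope, but group B's intercept is higher, so one common regression line overpredicts group A and underpredicts group B:

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

x_a, y_a = [1, 2, 3, 4], [3, 5, 7, 9]    # group A: y = 2x + 1
x_b, y_b = [1, 2, 3, 4], [7, 9, 11, 13]  # group B: y = 2x + 5

# Fit one line to the pooled data, as if the groups were interchangeable.
slope, intercept = fit_line(x_a + x_b, y_a + y_b)

# Mean residual (actual - predicted) per group: nonzero in opposite
# directions, the signature of intercept bias.
mean_resid_a = sum(y - (slope * x + intercept) for x, y in zip(x_a, y_a)) / len(x_a)
mean_resid_b = sum(y - (slope * x + intercept) for x, y in zip(x_b, y_b)) / len(x_b)

print(mean_resid_a, mean_resid_b)  # A overpredicted, B underpredicted
```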
Rating error
Misuse of a rating scale (e.g., leniency error = rating too leniently)
Severity error
Rating too harshly (the opposite of the leniency error)
Central tendency error
The rater tends to give middle-of-the-scale ratings rather than extreme positive or negative ones
What might be a way to overcome rating errors?
Use rankings rather than ratings; this forces the rater to order individuals relative to one another
Fairness in psychometric context
How equitable, just, impartial a test is
If no members of a specific group were part of a test's development sample, does that by itself make the test invalid for a test user from that group?
Not in itself.
According to Borsboom article, what is validity? (when is a test valid?)
When the attribute exists and variations in the attribute cause variations in test outcomes
Ontology vs. Epistemology (Borsboom)
Ontology: Existence of attributes
Epistemology: Process of knowing or finding out about the attributes
Importance of causality in validity (Borsboom)
Validity is tied to causal relationships (changes in attribute –> changes in test score).
On this view, prevailing correlational conceptions of validity are misconceptions: tests are not valid merely because their scores correlate with something.
What does Borsboom propose in test construction research?
To focus on the causal processes linking attributes to test outcomes rather than on optimizing predictive properties (which runs into issues such as multicollinearity)
How do correlational approaches fail to provide good evidence for validity?
They can lead to absurd conclusions: thunder and lightning may correlate perfectly, but you can't conclude that they are the same thing