TM T3 Flashcards
Validity
The degree to which empirical evidence and theoretical rationales support an assessment conclusion.
Not something a test HAS; a justification
Criterion, construct, content
Construct validity
Based on conceptual variables underlying a test.
Content validity
Based on subject matter of a test
Avoid saying
“the validity of the test”
When is validity wrong?
Wrong population -> different groups require special tools
Wrong task -> using the wrong test can lead to invalid results
Wrong context -> wrong testing environment
Wrong context example
Using a personality test for a hiring decision
Face validity
The APPARENT soundness of a test or measure, regardless of whether it is actually valid. Intuitive. A test can lack face validity but still be valid - that can even be BETTER, since test-takers cannot tell what is being measured
5 Sources of Evidence of Validity
- Test content
- Response processes
- Internal structure
- Relations with other variables
- Consequences of testing
Content validity details
Evidence based on a test’s content
Extent to which a test measures subject matter or behavior under investigation
Example: test for 3rd grade math… how well does it represent what we want kids to know?
Validity based on response processes
How do people actually respond to a test - is it measuring what it’s supposed to measure?
“Can you repeat the question in your own words?”
“What, to you, is ___”
“How sure are you of your answer?”
Revise based on answers
Evidence based on internal structure
Does the test align with the theoretical framework or construct it’s intended to measure? Is the construct represented in the patterns of responses that people give?
Factor analysis
Tool used to see which variables correlate with each other and whether they cluster around one common factor: "energy loss, appetite changes, difficulty concentrating" might cluster around somatic traits in the BDI-II
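The clustering idea behind factor analysis can be sketched with simulated data. This is a hypothetical illustration (the item names echo the BDI-II example above, but the numbers are made up): items driven by the same latent factor end up correlating strongly with each other and weakly with items driven by a different factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated latent factors (illustrative only, not real BDI-II data)
somatic = rng.normal(size=n)
mood = rng.normal(size=n)

# Each observed item = its latent factor plus measurement noise
items = np.column_stack([
    somatic + rng.normal(scale=0.5, size=n),  # "energy loss"
    somatic + rng.normal(scale=0.5, size=n),  # "appetite changes"
    somatic + rng.normal(scale=0.5, size=n),  # "difficulty concentrating"
    mood + rng.normal(scale=0.5, size=n),     # "sadness"
    mood + rng.normal(scale=0.5, size=n),     # "pessimism"
])

r = np.corrcoef(items, rowvar=False)

# Items sharing a factor correlate highly; items from different
# factors correlate near zero - that pattern is the "cluster"
print(f"within-cluster r:  {r[0, 1]:.2f}")
print(f"between-cluster r: {r[0, 3]:.2f}")
```

A real factor analysis (e.g., with statistical software) extracts the factors from such a correlation matrix rather than simulating them, but the correlation pattern it looks for is the same.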
Criterion validity
How well a test correlates with a specific standard
4 methods
Predictive, concurrent, retrospective and incremental
If a measure of criminal behavior is valid, we should be able to tell if:
1. They will be arrested in the future
2. They are currently breaking the law
3. They have a previous criminal record
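In practice, each of these methods comes down to a correlation between test scores and a criterion measured at some point in time. A minimal sketch with simulated data (the variable names and numbers are hypothetical, chosen only to illustrate a predictive-validity check):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical screening-test scores and a later criterion
# (e.g., a count-based index of future offenses) - simulated
test_scores = rng.normal(50, 10, size=n)
criterion = 0.8 * test_scores + rng.normal(0, 8, size=n)

# The validity coefficient is just the Pearson correlation
# between the test and its criterion
r = np.corrcoef(test_scores, criterion)[0, 1]
print(f"criterion validity coefficient: r = {r:.2f}")
```

The same calculation serves all three timeframes; only when the criterion is measured changes - after the test (predictive), at the same time (concurrent), or before it (retrospective).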
2 ways of thinking of CONSTRUCT validity
Convergent - how does my test correlate to other similar tests?
Discriminant - can a test prove you’re measuring your construct and not something else?
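Both checks can be read off a correlation matrix. A hypothetical sketch (all measure names and data are invented): two tests of the same construct should correlate highly (convergent), while a test of an unrelated construct should correlate near zero (discriminant).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Simulated latent construct and three hypothetical measures
aggression = rng.normal(size=n)
aggr_test_a = aggression + rng.normal(scale=0.4, size=n)  # my test
aggr_test_b = aggression + rng.normal(scale=0.4, size=n)  # similar test
vocab_test = rng.normal(size=n)                           # unrelated construct

# Convergent: high correlation with a similar measure
convergent = np.corrcoef(aggr_test_a, aggr_test_b)[0, 1]
# Discriminant: low correlation with an unrelated measure
discriminant = np.corrcoef(aggr_test_a, vocab_test)[0, 1]

print(f"convergent r:   {convergent:.2f}")
print(f"discriminant r: {discriminant:.2f}")
```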
Evidence based on consequences of testing
“Does this test produce situations for people that are not OK?”
If so -> not as strong as a test with no negative outcomes
Does this test promote fair outcomes?
Operational definitions
How you’re going to try to measure something - as an external, observable behavior
Example: aggression
How many times did they hit somebody
Construct
Abstract idea
Can’t directly measure it
Characteristics or attributes
EX: aggression, intelligence
Demonstrating content validity
Define the test universe
Develop the test specifications
Establish the test format
Construct the test questions
Construct validity: defining the test universe
- What relevant research is there to help develop the constructs?
- Who are the key experts? Can they evaluate items? Also - cite them
- Main construct aspects/dimensions
- Other validated instruments
Construct validity: Test specifications, format, questions
- Specify content areas
- Construct test questions
- Purpose and intended use of test
- Format and length
- Et cetera
IVR example
Constructs: minimization, violence recognition, partner blaming, distal blaming
IVR lacks some test specifications
Criterion-related validity
Correlation with an established CRITERION (standard of comparison) - how well a test correlates with its established standard
Examples of criterion validity
SAT performance correlating with academic success
Accidents on a job correlating with supervisor ratings