Validity Flashcards
Why is it not strictly accurate to talk about the validity of a test?
A particular test might be valid in one context but not in another; validity is a matter of degree, and the question is the extent to which the test is valid for a particular use, in a particular context
What are constructs?
Unobservable, underlying hypothetical traits or characteristics that we can try to measure indirectly using tests (e.g. intelligence, anxiety, speeding propensity)
What is construct under-representation?
The part of the intended construct that the measure fails to capture (e.g. behaviour a participant won’t admit to in a self-report)
What is construct-irrelevant variance?
Variation in scores that is unrelated to the construct being measured (e.g. misinterpretation of question wording, dishonest or biased responding)
How should the test and construct overlap in order to get a measure with high validity?
They should overlap as much as possible, so that the validly measured portion is large and both construct under-representation and construct-irrelevant variance are minimal
What’s the difference between content and face validity?
Face validity is how valid a test appears to be (usually from the test-taker’s perspective); content validity is how well the test’s content actually samples the domain of the construct, typically judged by experts (though still insufficient without empirical validity)
Why is face validity not directly important from a psychometric perspective, and when can it be useful?
Because a test can have good face validity without any actual validity, or poor face validity while still being genuinely valid; it can be useful from a public-relations point of view: laypeople are more likely to accept or choose the test, and to take it seriously, if they think it looks appropriate
How could you go about evaluating content validity?
Getting experts to rate each question in a university examination for its relevance to the course content, and to judge whether particular lectures were over- or under-represented
How could I create an exam that had great empirical validity but poor content validity?
By building an exam that predicts students’ understanding of the course (e.g. via GPA, hours spent studying, number of lectures attended) with a high degree of accuracy, while including no actual content from the course (although this wouldn’t be appropriate)
What is the general process involved in testing empirical validity?
Create hypotheses regarding how your measure ought to perform if it is valid, then design and run studies to test these hypotheses
What are the questions we might ask when designing an empirical validation study?
Does it map onto criteria as expected?; are there group differences as expected?; does it correlate with other things?; does it not correlate with dissimilar things?; are developmental changes, experimental effects, and internal structure as expected?
Give two examples of things that might restrict the range of scores in a test, and indicate what influence this could have on the validity coefficient.
Non-random attrition (certain types of people dropping out of a longitudinal study) and self-selection (only certain people are in the sample or volunteer in the first place); both restrict the range of scores and therefore attenuate (shrink) the validity correlation coefficient
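The attenuation from range restriction can be demonstrated with a small simulation (a minimal sketch; the sample size, noise level, and 50% cut-off are arbitrary choices for illustration): we generate test and criterion scores with a known full-range correlation, then recompute the correlation using only the top half of test scorers, as if only they had stayed in (or been admitted to) the study.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)

# Simulate test scores and a criterion that truly correlates ~.5 with them.
test = [random.gauss(0, 1) for _ in range(10_000)]
criterion = [t + random.gauss(0, 1.7) for t in test]

r_full = pearson_r(test, criterion)

# Range restriction: suppose only the top half of test scorers appear in
# the validation sample (e.g. only admitted students, or only the
# volunteers who stayed in a longitudinal study).
cutoff = sorted(test)[len(test) // 2]
kept = [(t, c) for t, c in zip(test, criterion) if t >= cutoff]
r_restricted = pearson_r([t for t, _ in kept], [c for _, c in kept])

print(f"full-range r = {r_full:.2f}, restricted-range r = {r_restricted:.2f}")
```

The restricted-sample correlation comes out noticeably smaller than the full-range one, even though the underlying relationship is unchanged: cutting off the low scorers removes much of the predictor’s variance.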
How big does the validity coefficient have to be?
Magnitude depends entirely on the context: if validating a driving-behaviour measure against accident involvement, even a low coefficient (e.g. .10) may be acceptable, since accidents are rare and have many causes; if comparing a new intelligence test against an established one, we’d expect a much larger coefficient (at least .80)
What is criterion validity?
A judgment regarding how adequately a score on a test can predict an individual’s most probable standing on some measure of interest (the criterion); the standard against which the test is evaluated
Give 3 examples of criterion validity
- University admissions test > GPA at end of the 1st year
- Depression inventory > clinicians’ ratings of severity of depression
- Clerical aptitude test > supervisor’s ratings of job performance
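For the first example above, the validity coefficient is simply the Pearson correlation between admissions-test scores and later GPA. A minimal sketch with invented data for eight hypothetical applicants (all names and values are made up; real admissions validities are typically far lower than this tidy example suggests):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for eight applicants (all values invented):
admissions_test = [55, 62, 70, 48, 81, 66, 73, 59]          # predictor
first_year_gpa = [2.8, 3.1, 3.4, 2.5, 3.9, 3.0, 3.6, 2.9]  # criterion

validity = pearson_r(admissions_test, first_year_gpa)
print(f"validity coefficient r = {validity:.2f}")
```

The higher the resulting coefficient, the better the test predicts an individual’s most probable standing on the criterion.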