Wk 5 - Validity Flashcards

1
Q

If the validity coefficient between a test and its criterion measure (where a high test score should predict a high criterion score) is -.97 (minus point nine seven) and is statistically significant, then this probably indicates… (x1)
Because… (x4)

A

The test could be reliable but is definitely not valid
The negative correlation indicates that the test has an inverse relationship with the criterion -
When it ought to have a positive correlation if it was valid.
However, the high correlation suggests the test is probably reliable (if it were unreliable, the correlation would be much closer to zero)

2
Q

Factors that may affect a predictive validity coefficient do NOT include… (x1)
Because… (x1)

A

The mean score on the test (assuming no ceiling or floor effects)
As calculating the correlation coefficient involves standardizing the variables anyway
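A minimal numpy sketch (with made-up data) of why the mean doesn't matter: Pearson's r standardizes both variables, so shifting every test score by a constant leaves the validity coefficient unchanged.

```python
import numpy as np

# Simulated test and criterion scores (assumed, illustrative numbers)
rng = np.random.default_rng(0)
test = rng.normal(50, 10, 200)
criterion = test + rng.normal(0, 5, 200)

r_original = np.corrcoef(test, criterion)[0, 1]
r_shifted = np.corrcoef(test + 30, criterion)[0, 1]  # raise the mean by 30

print(round(r_original, 6) == round(r_shifted, 6))  # True: the mean shift changes nothing
```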

3
Q

True or false, and why? (x2)

Construct irrelevant variance refers to the variance in a CONSTRUCT that does not covary with TEST scores

A

False
Construct irrelevant variance refers to the variance in the TEST that does not covary with the CONSTRUCT
(not the other way around, as stated in the question)

4
Q

True or false, and why? (x2)
If the variances of a test and the construct it is attempting to measure only overlap by a small degree then the test is likely to have low reliability.

A

False
If variances of test and the construct it’s measuring only overlap by a small degree then the test is likely to have low VALIDITY
(not RELIABILITY as it says in the question)

5
Q

True or false, and why? (x4)
Non-random attrition between two time points in a longitudinal validation study is one of the factors that could potentially compromise the evaluation of the CONCURRENT validity of a test (assuming the test is administered during the initial time point).

A

False
CONCURRENT validity involves administering both test and criterion measures AT SAME TIME
So if people drop out after the initial time point it won’t matter -
We already have all the data we needed
(though it may affect any evaluation of the test’s PREDICTIVE validity)

6
Q

True or false, and why? (x2)
In a validation study for a behavioural measure, you discover that self-selection biases in your sample are influencing the spread of scores for the measure. This could compromise the evaluation of the CONCURRENT validity of the test.

A

True
Anything that affects the spread of scores in a test may affect its correlations with other variables
(which is what we’re analysing when we evaluate the concurrent validity of a test)

7
Q

True or false, and why? (x1)
Evaluating a test by seeing if it does not correlate highly with a construct it is not supposed to be measuring is an example of deviating validity.

A

False.

It’s an example of discriminant or divergent validity

8
Q

True or false, and why? (x2)

A factor analysis involves mathematically grouping items according to the similarity of their content.

A

False
Factor analysis involves mathematically grouping items according to their inter-correlations
Not similarity of content (there’s no way the factor analysis can “know” what the content of the items is)

9
Q

True or false, and why? (x1)

If a test has poor face validity then this may have implications for the data that the test yields.

A

True

Poor face validity can lead to things like missing data

10
Q

True or false, and why? (x2)
Content validity is not important for a university examination as long as that examination is supported by empirically-based validity evidence

A

False.
Even if we could create a test that discriminated between good and poor students in the course (i.e. it had empirically-based validity),
It would still be a problem if it did not do this by measuring knowledge of course content directly.

11
Q

True or false, and why? (x1)
You can test the incremental validity of a test by seeing whether it can predict some relevant criterion measure in isolation from other measures.

A

False
Incremental validity is about whether a test contributes to predicting some outcome IN ADDITION TO the effect of other measures

12
Q

True or false, and why? (x2)
If we had an established intervention known to reduce state anxiety then we potentially could use this to test the validity of a new measure of state anxiety.

A

True
We can use the intervention as part of an experiment to see if it reduces scores on the new test in ways we'd predict if the test is valid
(compared with some placebo intervention)

13
Q

True or false, and why? (x2)

It is possible for a test to have excellent reliability but poor validity.

A

True
You don’t need validity to have good reliability
(your test can be consistent in the scores it produces without measuring what you want it to)

14
Q

True or false, and why? (x3)

It is possible for a test to have excellent validity but poor reliability.

A

False
You need good reliability for any chance of the test being valid
(because the level of reliability places a ceiling on how high your validity coefficient can be).
If your measure is producing wildly inconsistent scores then it’s probably not measuring anything

15
Q

True or false, and why? (x3)
If the reliability of both a test and a criterion measure are high then this means the correlation between them should also be high.

A

False
If reliability of both a test and a criterion measure is high then the correlation between them is not restricted –
However, this doesn’t mean it can’t be small
(the correlation can be high or low, depending on the validity of the test)

16
Q

True or false, and why? (x2)

Content validity is empirical data supporting the hypothesis that the content of a test is appropriate

A

False
Content validity involves opinions
And is not generally based on empirical data.

17
Q

When students complain that, in a course examination, a lecturer did not ask any questions from a particular lecture, they are effectively complaining about the examination having… (x1)

A

Potentially poor content validity

18
Q

Why is it not strictly accurate to talk about the validity of a test (hint: one test could be used in more than one context)? (x3)

A

It’s interpretations of test scores required by proposed uses that are evaluated, not the test itself.
When test scores are used or interpreted in more than one way, each intended interpretation must be validated
Because test can be used in different contexts – validity can change

19
Q

What are constructs? (x2)

Plus egs x4

A

Unobservable, underlying hypothetical traits or characteristics
That we can try and measure indirectly using tests
Intelligence, anxiety, plumbing skill, speeding propensity

20
Q

What is construct underrepresentation? (x1 plus e.g. x1)

A

The portion of variance in the construct that is not captured by our test
E.g. self-report assumes insight (into being a slow or fast driver, say), but the chances of perfect insight are very slim – so there are things you don't capture in the test

21
Q

What is construct irrelevant variance? (x1 plus e.g. x 1)

A

Stuff that’s captured by the measurement, but not part of the construct
Eg speed questionnaire – variance that's to do with interpreting the wording, or social desirability issues, is not a reflection of speed of driving

22
Q

What are the similarities (x2) and difference between content and face validity? (x2)

A

Both are opinion-based, not empirical, but…
Face is how valid the test appears to be, from the perspective of the test taker (usually), while
Content is a judgment (usually by experts) regarding how adequately a measure samples behaviour representative of the universe of behaviour it was designed to sample

23
Q

Can we say that ‘test is valid’? (x2)

A

No

Only that validity hypotheses are supported - not all or nothing

24
Q

What is ‘valid measurement’? (x2)

A

That aspect of the characteristic/ability/trait which IS captured by both the construct and the test
Variance that is due purely to the trait, not construct irrelevant variance or construct underrepresentation

25
Q

Variance in test measurement is made up of what two components?

A

Valid measurement

Construct irrelevant variance

26
Q

Variance in a construct is made up of which two components?

A

Construct under-representation

Valid measurement

27
Q

How can face validity be useful (x2), or not (x2)?

A

Good for PR - people can see the point
They accept test and take it seriously, less missing values
But may have terrible actual validity -
Just because it feels like driving a car, doesn’t mean it tells us about RL driving behaviour

28
Q

How could you go about evaluating content validity? (x2)

A

Getting experts to rate each item for its relevance to the construct - e.g. exam questions for course content
Getting those experts to make judgements on whether particular lectures were over- or under-represented.

29
Q

How could I create an exam that had great empirical validity but poor content validity? (x7)

A

Ask about known correlates of academic success:
hours spent studying, assignment marks, motivation, GPA, questions based on other prerequisite courses, number of lectures/tutorials attended

30
Q

What is the general process involved in testing empirical validity? (x2)

A

Create hypotheses regarding how your measure ought to perform if it is valid
Then design and run studies to test these hypotheses

31
Q

Give two examples of things that might restrict the range of scores in a test and indicate what influence this could have on the validity coefficient.

A

Non-random attrition: certain types of people dropping out of your longitudinal study - e.g. those too sick or not very sick dropping out of study
Self-selection: only certain people in the sample in the first place - we can’t test randomly in RL

32
Q

What is criterion validity? Give 5 examples.

A
Judgment regarding how adequately a score on a test can be used to infer an individual's most probable standing on some measure of interest (the criterion)
Method of contrasted groups
Concurrent
Predictive
Incremental
Convergent
33
Q

What is the method of contrasted groups? (x2 plus egs x 2)

A

Establish criterion validity by determining whether test scores of groups of people vary as expected
Eg groups we think will score high or low – does the test tell them apart?
Clinical group vs non-clinical controls (e.g. dental phobia patients vs controls for dental phobia questionnaire)
Experts vs novices for a skill or ability test
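A toy sketch of the contrasted-groups logic, using hypothetical (made-up) dental-phobia questionnaire scores: if the test is valid, the phobia group should score clearly higher than controls.

```python
import numpy as np

# Hypothetical questionnaire scores for the two groups (invented numbers)
phobia = np.array([38, 41, 35, 44, 39, 42, 37, 40])
controls = np.array([18, 22, 15, 20, 24, 17, 21, 19])

diff = phobia.mean() - controls.mean()
pooled_sd = np.sqrt((phobia.var(ddof=1) + controls.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd  # a large effect size supports criterion validity

print(f"mean difference = {diff:.1f}, d = {cohens_d:.1f}")
```

In practice you would back this up with a significance test, but the core question is simply whether the test tells the groups apart.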

34
Q

Give two examples of criterion contamination, explaining what is contaminated in each case. (x2 and x3)

A

Validate schizophrenia test by finding it can tell apart those with/out
BUT then discover that people with schizophrenia were originally diagnosed using my test (validity test is circular)
Zuckerman sensation seeking scale: validated by comparing scores with a risk-taking behaviour scale (the criterion),
But virtually the same items appeared in both test and criterion
ie many questions directly ask if you like to take risks

35
Q

What are the effects of restriction of range of continuous variables? (x1)

A

It gives us a much smaller correlation than we would get from the full range
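A simulated sketch of this effect: restrict the test-score range (as self-selection or attrition might) and the validity coefficient shrinks, even though the true test–criterion relationship is unchanged. All numbers here are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
test = rng.normal(100, 15, 5000)
criterion = 0.6 * test + rng.normal(0, 12, 5000)  # true relationship built in

r_full = np.corrcoef(test, criterion)[0, 1]

keep = test > 100                     # only the top half of scorers remain
r_restricted = np.corrcoef(test[keep], criterion[keep])[0, 1]

print(r_restricted < r_full)  # True: the restricted sample yields a smaller r
```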

36
Q

How big does a validity coefficient have to be? (x1 plus egs x 2)

A

Depends entirely on the context
.2 is awesome for crash risk – most people only crash every ten years, and only a small portion of any sample will have had a crash in e.g. the past year
But, if correlating a new test of intelligence with an older one, might expect more like .80 (no excuse for it to be smaller if it’s measuring the same thing)

37
Q

What is concurrent validity? (x2)

A

Subset of criterion validity

Where test and criterion are measured for each person at the same time/in same session

38
Q

What is predictive validity? (x2 plus e.g. x1)

A

Subset of criterion validity
Where the test is trying to predict what the criterion will be at some future time
Speed test predicts crashes over next year

39
Q

What is incremental validity? (x3)

A

Type of criterion validity
Can be predictive or concurrent
How much each individual predictor ADDS to predicting the criterion in addition to the effect of other predictors

40
Q

Discuss two examples of incremental validity (x3 each)

A

Predict bungee jump:
Sensation-seeking, Past risk-taking behaviour, Susceptibility to peer pressure, Fear of heights
Alone might have poor criterion validity, using together (e.g. combined using Multiple Regression) improves it
Eg crash risk:
Speeding propensity; hazard perception skill; fatigue; km driven/wk; drunk driving; tailgating propensity; traffic violations; distraction; years of experience; age
(last two correlate highly, so using both doesn't add much value – doesn't change your validity coefficient)
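The idea of "adding to" prediction can be sketched with simulated data: compare the R-squared of a one-predictor regression with the R-squared after adding a second predictor. The variable names and effect sizes below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
speeding = rng.normal(0, 1, n)
hazard = rng.normal(0, 1, n)
# Simulated criterion: both predictors genuinely contribute
crashes = 0.5 * speeding - 0.4 * hazard + rng.normal(0, 1, n)

def r_squared(X, y):
    """Proportion of criterion variance explained by the predictors."""
    X = np.column_stack([np.ones(len(y)), X])        # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_one = r_squared(speeding.reshape(-1, 1), crashes)
r2_two = r_squared(np.column_stack([speeding, hazard]), crashes)

print(f"R2 gain from adding hazard perception: {r2_two - r2_one:.3f}")
```

A highly redundant predictor (like age alongside years of experience) would produce almost no R-squared gain in the same comparison.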

41
Q

What is convergent validity? (x2 plus e.g. x1)

A

Does it correlate with similar things?
Convergent and criterion are not mutually exclusive – many things might be equally described by either term
eg if anxiety measure is valid, we can expect it to correlate with established measures of anxiety

42
Q

What is discriminant/divergent validity? (x1 plus e.g. x2)

A

Does it not correlate with dissimilar things?
e.g. new measure of depression does NOT correlate highly with validated measures of anxiety
(shows it isn’t just a general measure of maladjustment and is specifically measuring depression, because depression and anxiety are different things)

43
Q

What is the criterion validity coefficient?

A

Correlation between the test scores and the criterion scores

44
Q

Why do both the test and its criterion have to have decent reliability? (x2 plus explain x4)

A

Reliability of each limits size of validity coefficient
So, noise/measurement error in the data makes the correlation smaller, due to scores being more spread out
Eg, imagine a perfect match of test onto criterion (correlation of 1)
• Then introduce uncertainty/inaccuracy/unreliability into the tests scores…
• Perfect correlation is spoiled by smearing the scatterplot
• Same would happen if the criterion scores were unreliable
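The "smearing" step above can be simulated directly: start from perfectly related scores, then add measurement error (unreliability) to each side and watch the observed validity coefficient shrink. The error magnitudes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
true_score = rng.normal(0, 1, 5000)

# Perfect match of test onto criterion: correlation of 1
r_perfect = np.corrcoef(true_score, true_score)[0, 1]

# Now introduce unreliability into both test and criterion scores
noisy_test = true_score + rng.normal(0, 0.7, 5000)
noisy_criterion = true_score + rng.normal(0, 0.7, 5000)
r_observed = np.corrcoef(noisy_test, noisy_criterion)[0, 1]

print(r_observed < r_perfect)  # True: error attenuates the correlation
```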

45
Q

What questions does factor analysis answer? (x4)

A

Is the internal structure as expected? Does it map onto theory?
Multi-faceted (multiple subscales; heterogeneous)?
Or no subscales, homogeneous?

46
Q

How does factor analysis work? (x2)

A

Uses mathematical techniques to group the items into clusters/factors/components based on how well they correlate with each other
Picks out the distinct clusters of items in our data based on their inter-correlations - tells us to what extent the items tie together into constructs
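A toy sketch of the inter-correlation structure factor analysis works from: two hypothetical latent factors each drive three items, and the item correlation matrix alone reveals the clusters (no item "content" is involved). The factors and loadings here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
anxiety = rng.normal(0, 1, n)      # hypothetical latent factor 1
depression = rng.normal(0, 1, n)   # hypothetical latent factor 2

# Six items: three driven by each factor, plus item-specific noise
items = np.column_stack(
    [anxiety + rng.normal(0, 0.6, n) for _ in range(3)]
    + [depression + rng.normal(0, 0.6, n) for _ in range(3)]
)

R = np.corrcoef(items, rowvar=False)   # 6 x 6 item inter-correlation matrix
within = R[0, 1]        # two items from the same cluster
between = R[0, 3]       # items from different clusters
print(within > 0.4 > abs(between))  # True: within-cluster r high, between ~0
```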

47
Q

What is construct validity? (x3)

A

Umbrella term covering whether we are measuring the right thing…
Argued to cover all validity methods
How well the scores on your test reflect the construct (i.e. the trait or characteristic) that your test is supposed to be measuring.

48
Q

Define criterion (x1)
Plus eg x1
And four contingencies for deciding on a criterion measure

A

The standard against which the test is evaluated (e.g. actual driving speed was one of the criteria used to validate the speed questionnaire).
Needs to be reliable, valid, relevant, and not subject to criterion contamination

49
Q

Give 6 egs of criterion validity checks

A

Surgery competence simulator test: correlation between test score and patient outcome rating
University admissions test, with GPA at end first year;
Depression inventory with clinician rating of severity;
Salesperson personality scale with amount of worthless crap sold to gullible people;
Clerical aptitude with supervisor’s job performance rating;
Creative thinking test with panel rating of product

50
Q

Are face and content validity necessary for actual validity? (x3)

A

No
Measure with terrible face validity could still be great predictor, e.g. risk-taking and driving behaviours
Could have psychometric validity sans content - e.g. GPA as predictor of course performance, rather than exam

51
Q

What size correlation is generally necessary to say that an item belongs in a particular cluster (in factor analysis)?

A

Greater than .4, with larger being better

52
Q

How do we use experimental effects to check validity? (x2 plus egs x2)

A

Are they as expected?
ie it’s been shown in a bunch of studies what the effect is supposed to be
Eg new test of state anxiety:
• Give them all the test, split into two groups (treatment/placebo)
• If valid, intervention group should differ from placebo
Eg hazard perception: subject 25 people to a validated training course and see if scores improve on the test compared with control group/ pre-training scores

53
Q

How might we use developmental changes to assess validity? (x1 Plus e.g. x1)

A

Diffs in test scores as you might predict if your measure was valid?
Eg everyone would agree that language skills improve with age

54
Q

What are the components of the WISC-IV Wechsler Intelligence Scale for Children

A
10 core subtests arranged into 4 groups:
•	Verbal comprehension index
•	Perceptual reasoning index
•	Working memory index
•	Processing speed index
Plus 5 supplementary subtests that can be used as additional or replacement tests
55
Q

Describe the properties of the WISC-IV (x6)

A

Large, representative normative sample across 11 age groups (6 - 16 yrs)
Mean = 100, SD = 15
Excellent internal consistency, test-retest and inter-rater reliability
Need a difference of 7.58 IQ points (2 × SEdiff, where SEdiff = 3.79) for confidence that a difference is not due to chance
Good content validity
And empirical: predicts academic achievement, diffs between normal/special groups, confirmatory factor analysis

56
Q

Factors that may affect a predictive validity coefficient do include… (x3)

A

Internal consistency of the criterion
Certain types of people dropping out of the sample between the original test and when the criterion is measured
Certain types of people agreeing to be in the sample.