Test Psychometrics Overview Flashcards

1
Q

Validity

A

the extent to which a test measures what it is intended to measure

2
Q

Types of Validity

A
  • Face
  • Content
  • Criterion
  • Construct
3
Q

Face validity

A

does the test appear to be measuring something meaningful?

4
Q

Three MAIN types of validity

A
  • Content
  • Criterion
  • Construct
5
Q

Content validity

A
  • What do experts believe is being measured? This is the least quantitative form. Does the content fit with the construct?
  • Involves some inter-rater reliability (kappa)
  • Important for intelligence/achievement tests
6
Q

Criterion validity

A

does the measure appropriately predict outcomes that it should? 3 kinds:

(a) concurrent
(b) predictive
(c) known groups
7
Q

Types of criterion validity

A
  • concurrent
  • predictive
  • known groups
8
Q

Construct validity

A

– includes all other forms; the extent to which the intended construct is actually being measured. 3 types:

(a) convergent (and divergent)
(b) discriminant
(c) internal structure
9
Q

Concurrent validity

A
  • does it correlate with other measures given at the same time (aka, predict complementary performance)?
10
Q

Predictive validity

A

– does it predict future performance? (e.g., GRE score)

11
Q

known groups validity

A

– using groups with expected, different outcomes (e.g., giving intelligence tests to individuals with MR and giftedness)
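One way to sketch the known-groups idea is to compare the two groups' means and express the difference as an effect size. A minimal Python sketch, using made-up intelligence-test scores (the group data and the choice of Cohen's d are illustrative, not from the source):

```python
from statistics import mean, stdev

# Hypothetical intelligence-test scores for two groups with expected,
# different outcomes (all data are made up for illustration)
group_high = [128, 135, 122, 140, 131]
group_low = [68, 74, 71, 65, 70]

def cohens_d(a, b):
    """Pooled-SD effect size for the difference between two group means."""
    na, nb = len(a), len(b)
    pooled_sd = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                 / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

# A large group difference in the theoretically expected direction
# supports known-groups validity.
print(round(cohens_d(group_high, group_low), 1))
```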

12
Q

convergent validity

A

– is the target test correlated with other tests it is theoretically related to? Expects a positive correlation.

e.g., a measure of depression should be positively correlated with other depression measures
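The convergent check above reduces to computing a correlation between two measures. A minimal Python sketch with made-up scores on two hypothetical depression measures:

```python
# Hypothetical scores on two depression measures (all data are made up)
measure_a = [10, 14, 8, 22, 17, 5, 30, 12]
measure_b = [9, 13, 7, 20, 18, 4, 27, 11]

def pearson_r(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# A strong positive r between the two depression measures supports
# convergent validity; a strong negative r with, say, a happiness
# measure would support divergent validity.
print(round(pearson_r(measure_a, measure_b), 2))
```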

13
Q

divergent validity

A

– does the test relate NEGATIVELY to other tests that it SHOULD NOT be positively related to? Expects a negative correlation (e.g., a happiness measure should be negatively correlated with a depression measure)

14
Q

discriminant validity

A

– the relation to a theoretically unrelated construct; the measure should be uncorrelated with it.

15
Q

internal structure validity

A

(aka factor validity) – looks at the factors within the construct. Most tests have poor internal structure. Why? They are not theory driven! Only three intelligence tests have good internal structure (according to Dr. MacDonald):
  • Stanford-Binet (CHC)
  • Woodcock-Johnson (CHC)
  • KABC-II (5/4-factor CHC)

16
Q

incremental validity

A

whether the measure increases the predictive ability of an existing method of assessment. In other words, incremental validity asks whether the new test adds information beyond what can be obtained with simpler, already existing methods. Example: some have argued that the Rorschach has poor incremental validity, since other, more easily administered personality tests gather the same data in a less tedious way.

17
Q

ecological validity

A

whether the measure appropriately simulates real-world phenomena. This should not be confused with external validity, which refers to the generalizability of findings to the real world. In other words, an ecologically valid measure should appropriately capture the feel of the corresponding real-world scenario. Example: mock juries may produce externally valid findings; however, most mock juries do not observe actual court proceedings, relying instead on court transcripts of a trial. Thus, mock juries could be said to have poor ecological validity.

18
Q

reliability

A

the consistency of a measure

19
Q

components of reliability

A

CASTI

  • Cronbach’s alpha (internal consistency measure in statistics)
  • Alternate forms (e.g. Blue and Green forms of WRAT-5)
  • Split-half (splitting the test in half and correlating one half with the other)
  • Test-retest (temporal stability of scores)
  • Inter-rater
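The internal-consistency component above can be sketched numerically. A minimal Python sketch of Cronbach's alpha, using made-up item responses (data and scale are illustrative, not from the source):

```python
from statistics import variance

# Hypothetical responses: rows = examinees, columns = items (made-up data)
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

def cronbach_alpha(data):
    """Alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(data[0])                        # number of items
    items = list(zip(*data))                # column-wise item scores
    item_vars = sum(variance(col) for col in items)
    totals = [sum(row) for row in data]     # each examinee's total score
    return k / (k - 1) * (1 - item_vars / variance(totals))

# Higher alpha indicates the items hang together (internal consistency).
print(round(cronbach_alpha(scores), 2))
```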
20
Q

standard error of measurement

A

the amount an observed score is expected to vary because of measurement error. We want the SEM to be low so that reliability is high and the observed score closely reflects the person's true score on the characteristic(s) in question.
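The reliability-SEM relationship can be sketched with the standard formula SEM = SD * sqrt(1 - r_xx); the scale SD and reliability values below are illustrative:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

# IQ-style scale (SD = 15) with an illustrative reliability of .90
print(round(sem(15, 0.90), 2))               # prints 4.74
# 95% confidence band around an observed score of 100
half_width = 1.96 * sem(15, 0.90)
print(round(100 - half_width, 1), round(100 + half_width, 1))  # prints 90.7 109.3
```

Note how the band shrinks to zero as reliability approaches 1.0: with perfect reliability there is no measurement error.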

21
Q

norms

A
  • To accurately interpret test data and ascertain a person's exact position with reference to a standardization sample, we must have a normative reference group, because otherwise a raw score has no meaning. We need to see where the person falls relative to the sample.
  • The raw score is converted into a derived score (a relative measure), which indicates the person's relative standing.
  • There is a need for cultural/ethnic normative groups, which can be accomplished through stratified random sampling.
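The raw-to-derived conversion above can be sketched as a z score and an IQ-style standard score; the normative sample below is made up for illustration:

```python
from statistics import mean, stdev

# Hypothetical normative sample of raw scores (made-up data)
norm_sample = [12, 15, 9, 20, 14, 17, 11, 18, 13, 16]

def to_z(raw, sample):
    """Convert a raw score to a z score relative to the normative sample."""
    return (raw - mean(sample)) / stdev(sample)

def to_standard(raw, sample, m=100, sd=15):
    """Derived standard score on an IQ-style metric (mean 100, SD 15)."""
    return m + sd * to_z(raw, sample)

# A raw score of 18 only has meaning relative to the norm group
print(round(to_z(18, norm_sample), 2))
print(round(to_standard(18, norm_sample)))
```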
22
Q

Objective test validity

A

  • High face validity: the intent is easy to discern, and hence participants can fake their responses.
  • The tests require the person to be introspective and answer truthfully, often resulting in false positives.
  • In addition, a person's defensiveness may prevent them from responding accurately.

23
Q

projective test validity

A

  • Projective tests are better predictors of long-term behavioral patterns.
  • Self-report measures work best when both test items and criterion behaviors are assessed at or near the same time and are matched for specificity; the longer the time interval, the less predictive the test.
  • Objective measures are best at predicting short-term behavior patterns.
  • It is best to use a combination of objective and projective measures.

24
Q

Reliability Rules of thumb (cut offs)

A

.90 or above for decision-making tasks
.80 or above for clinical and psychoeducational tasks (moderate)
.70-.79: subtests are relatively reliable
.60-.69: subtests are marginally reliable
less than .60: unreliable
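The cutoffs above can be expressed as a small lookup function (a sketch; the labels paraphrase the rules of thumb on this card):

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the rule-of-thumb label."""
    if r >= 0.90:
        return "adequate for decision-making tasks"
    if r >= 0.80:
        return "adequate for clinical/psychoeducational tasks"
    if r >= 0.70:
        return "relatively reliable (subtest)"
    if r >= 0.60:
        return "marginally reliable (subtest)"
    return "unreliable"

print(interpret_reliability(0.85))  # prints adequate for clinical/psychoeducational tasks
```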

25
Q

Validity rules of thumb

A

.50-.70 is an acceptable criterion validity coefficient