Test Construction Flashcards

1
Q

Item difficulty

A
  • Measured using an item difficulty index (p) ranging from 0 to 1
  • p = number of examinees answering the item correctly divided by the total number of examinees
  • Optimal difficulty level depends on likelihood of answering correctly by chance, the goal of the testing, etc.
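As a minimal sketch (Python, with hypothetical item scores; the function name is illustrative), the difficulty index is just the proportion answering correctly:

```python
def item_difficulty(responses):
    """Item difficulty index p: proportion of examinees answering correctly (0 to 1)."""
    return sum(responses) / len(responses)

# Hypothetical item scores for 10 examinees: 1 = correct, 0 = incorrect
scores = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
p = item_difficulty(scores)  # 7 of 10 correct -> p = 0.7
```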
2
Q

Item discrimination

A
  • The extent to which an item differentiates between examinees who obtain high versus low scores on the entire test
  • D = U - L, where U and L are the proportions of examinees in the upper- and lower-scoring groups who answered the item correctly
  • Ranges from -1 to +1
  • Items with a discrimination index of .35 or higher are typically considered acceptable
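A minimal Python sketch of D = U - L, using hypothetical counts for the upper- and lower-scoring groups:

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = U - L: proportion correct in the upper-scoring group minus
    proportion correct in the lower-scoring group. Ranges from -1 to +1."""
    return upper_correct / upper_n - lower_correct / lower_n

# Hypothetical item: 18 of 20 top scorers and 8 of 20 bottom scorers answer correctly
d = discrimination_index(18, 20, 8, 20)  # 0.90 - 0.40 = 0.50
```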
3
Q

Item characteristic curve

A
  • Constructed for each item
  • Plot the proportion of examinees in the sample who answered correctly against the total test score, performance on an external criterion, or an estimate of the latent ability or trait measured by the item
4
Q

Item response theory

A

Item parameters (e.g., difficulty, discrimination) are sample invariant: estimates do not depend on the particular sample of examinees used to calibrate them

5
Q

Classical test theory

A

Uses 2 methods of item analysis: item difficulty and item discrimination

6
Q

Limitations of CTT

A
  • Item and test parameters are sample dependent
  • Difficult to equate scores obtained on different tests or test forms
7
Q

Item’s level of difficulty

A

Ability level at which 50% of the examinees provide a correct response

8
Q

Item’s ability to discriminate

A

Indicated by the slope of the curve
The steeper the slope, the greater the discrimination

9
Q

Probability of guessing correctly

A

Indicated by the point at which the ICC intercepts the vertical axis

10
Q

Test score in Classical Test Theory

A

X = T + E
X = Obtained score
T = True score component
E = Error component (measurement error)

11
Q

Reliability coefficient

A

Ranges from 0 to 1
Correlation coefficient
Unlike most correlations, the r is never squared
Ex. a reliability coefficient of .89 means that 89% of variability in obtained scores is true score variability

12
Q

Test-retest reliability

A

Same test to same group of examinees on two different occasions
Coefficient indicates stability/consistency
May be impacted by a PRACTICE EFFECT

13
Q

Alternate forms reliability

A

Two forms of the same test are administered at the same time point
Coefficient of equivalence
May be impacted by CONTENT SAMPLING
Good for speeded tests

14
Q

Split-half reliability

A

Scores on two halves of the test are correlated (e.g., odd versus even numbered items)
Usually underestimates a test’s true reliability

15
Q

Spearman-Brown Prophecy Formula

A

Used to correct the split-half reliability coefficient
Estimates the reliability of the full-length test from the correlation between its two halves
Tends to overestimate a test's true reliability
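The general formula can be sketched in Python (n is the factor by which test length changes; n = 2 corrects a split-half coefficient):

```python
def spearman_brown(r, n):
    """Estimated reliability when test length is changed by a factor of n,
    given reliability r of the original (e.g., half-length) test:
    r_new = n*r / (1 + (n - 1)*r)."""
    return n * r / (1 + (n - 1) * r)

# Correcting a split-half correlation of .60 to full-length reliability (n = 2)
full = spearman_brown(0.60, 2)  # 1.2 / 1.6 = 0.75
```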

16
Q

Cronbach’s coefficient Alpha

A

Calculates the average reliability from all possible splits of the test
This is a conservative measurement of reliability
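A minimal Python sketch of the usual computational formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), with hypothetical item data:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item (same examinees in the same order).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # total score per examinee
    item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical 3-item test taken by 4 examinees
items = [[1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 0]]
alpha = cronbach_alpha(items)
```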

17
Q

Kuder-Richardson Formula 20

A

Used when test items are scored dichotomously
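Sketched in Python with hypothetical dichotomous data; note that p*q is each binary item's variance, which is why KR-20 is the special case of coefficient alpha for items scored 0/1:

```python
def kr20(items):
    """KR-20 for dichotomously scored items (one list of 0/1 scores per item):
    k/(k-1) * (1 - sum(p*q) / variance of total scores)."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    mean_total = sum(totals) / len(totals)
    var_total = sum((t - mean_total) ** 2 for t in totals) / len(totals)
    pq = sum((sum(it) / len(it)) * (1 - sum(it) / len(it)) for it in items)
    return k / (k - 1) * (1 - pq / var_total)

# Hypothetical 3-item test taken by 4 examinees
binary_items = [[1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 0]]
r_kr20 = kr20(binary_items)
```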

18
Q

Kappa statistic/Cohen’s Kappa

A

Used when scores or ratings represent a nominal or ordinal scale of measurement
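A minimal Python sketch for two raters and nominal categories (the ratings are hypothetical): kappa = (observed agreement - chance agreement) / (1 - chance agreement).

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters assigning nominal categories."""
    n = len(rater1)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Expected agreement by chance, from each rater's marginal category rates
    p_exp = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical diagnostic ratings from two clinicians
r1 = ["dx", "dx", "no", "no", "dx", "no"]
r2 = ["dx", "no", "no", "no", "dx", "no"]
kappa = cohens_kappa(r1, r2)
```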

19
Q

Test length

A

The longer the test, the smaller the relative effects of measurement error and the larger the reliability coefficient
The Spearman-Brown prophecy formula can also be used to estimate the impact of lengthening a test

20
Q

Range of scores

A

The reliability coefficient is maximized when the range of scores is unrestricted

21
Q

Standard error of measurement

A

Index of the amount of error that can be expected in obtained scores due to the unreliability of the test
SEM = SD * sqrt(1 - reliability coefficient)
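The formula, and its typical use for building a confidence interval, sketched in Python with hypothetical values:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical IQ-style scale: SD = 15, reliability = .91
e = sem(15, 0.91)  # 15 * sqrt(.09) = 4.5
# 95% confidence interval around an obtained score of 100
lo, hi = 100 - 1.96 * e, 100 + 1.96 * e
```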

22
Q

Internal consistency reliability

A

How well items within a test correlate with other items on the same test
This includes split-half reliability and Cronbach’s coefficient alpha

23
Q

Content sampling

A

Impacts split-half reliability and coefficient alpha

24
Q

Inter-rater reliability

A

Measured using a kappa statistic or percent agreement

25
Q

Kendall’s coefficient of concordance

A

Used to assess inter-rater reliability when there are three or more raters and ratings are reported as ranks

26
Q

Consensual observer drift

A

When two or more observers working together influence each other’s ratings and both assign ratings in a similarly idiosyncratic way

27
Q

Impact of guessing on reliability

A

As the probability of guessing correctly increases, the reliability coefficient decreases

28
Q

Content validity

A

Items on the test adequately represent the domain being measured

29
Q

Construct validity

A

The test has expected relationships with other variables

30
Q

Convergent and Discriminant Validity

A

Convergent validity = high correlations with measures of the same and related traits
Discriminant validity = low correlations with measures of unrelated characteristics

31
Q

Multitrait-Multimethod Matrix

A

Systematically organizes data collected when assessing a test’s convergent and discriminant validity
Includes coefficients that are: monotrait-monomethod, monotrait-heteromethod, heterotrait-monomethod, heterotrait-heteromethod

32
Q

Criterion-related Validity

A

Test scores correlate with or predict an examinee's performance on some external criterion

33
Q

Standard error of estimate

A

SEE = standard deviation of criterion scores * sqrt(1 - validity coefficient squared)
Used to construct a confidence interval around an estimated score
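Sketched in Python with hypothetical values for the criterion SD and the validity coefficient:

```python
import math

def see(sd_criterion, validity):
    """Standard error of estimate: SD_y * sqrt(1 - r_xy**2)."""
    return sd_criterion * math.sqrt(1 - validity ** 2)

# Hypothetical: criterion SD = 10, validity coefficient = .60
err = see(10, 0.60)  # 10 * sqrt(1 - .36) = 8.0
```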

34
Q

Incremental validity

A

The increase in correct decisions that can be expected if the predictor is used as a decision making tool
Positive hit rate - base rate

35
Q

Specificity and sensitivity

A

Provide information about a predictor's accuracy when administered to individuals who are known to either have or not have the disorder of interest
Sensitivity = % of people who have the disorder and were accurately identified: true positives / (true positives + false negatives)
Specificity = % of people who do not have the disorder and were accurately identified: true negatives / (true negatives + false positives)
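A minimal Python sketch using hypothetical confusion-matrix counts:

```python
def sensitivity(tp, fn):
    """Proportion of disordered people the test correctly flags: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of non-disordered people the test correctly clears: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical screening results: 40 true positives, 10 false negatives,
# 90 true negatives, 10 false positives
sens = sensitivity(40, 10)   # 40/50 = 0.80
spec = specificity(90, 10)   # 90/100 = 0.90
```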

36
Q

Criterion Contamination

A

Occurs when a rater's knowledge of examinees' predictor scores biases their criterion ratings, artificially inflating the criterion-related validity coefficient

37
Q

Cross-validation

A

Re-assessing a predictor's criterion-related validity on a new sample drawn from the same population

38
Q

Shrinkage

A

The decrease in the validity coefficient that typically occurs on cross-validation, because chance factors that inflated the original coefficient are absent in the new sample

39
Q

Concurrent validity

A

A form of criterion-related validity.
When criterion data are collected prior to or at the same time as data on the predictor

40
Q

Predictive validity

A

When the criterion is measured at some point after the predictor has been administered

41
Q

Criterion related validity coefficient

A

Ranges from -1 to 1

42
Q

Positive predictive value

A

Probability that people who test positive actually have the disorder

43
Q

Positive likelihood ratio

A

The extent to which a positive test result increases the probability that the person has the disorder
A useful predictor has an LR+ greater than 1.0 (an LR+ of exactly 1.0 means a positive result provides no diagnostic information)
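The ratio is computed from sensitivity and specificity; a minimal Python sketch with hypothetical values:

```python
def positive_likelihood_ratio(sens, spec):
    """LR+ = sensitivity / (1 - specificity): how much more likely a positive
    result is in people with the disorder than in people without it."""
    return sens / (1 - spec)

# Hypothetical test with sensitivity .80 and specificity .90
lr_plus = positive_likelihood_ratio(0.80, 0.90)  # .80 / .10 = 8.0
```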

44
Q

Relationship between reliability and validity

A

A test’s reliability always places a ceiling on its validity
High reliability, however, does not guarantee validity