Test Construction Flashcards

1
Q

Standard Error of Measurement/CI

A

The SEM is used to construct a confidence interval (CI) around an examinee’s obtained test score. Its magnitude depends on the test’s SD and reliability coefficient: SEM = SD × √(1 − reliability).
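
A minimal sketch of the calculation, assuming the usual formula SEM = SD × √(1 − reliability) and a normal-curve multiplier (z = 1.96 for a 95% CI). The function names and the example values are illustrative only.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1.0 - reliability)

def score_confidence_interval(score, sd, reliability, z=1.96):
    """CI around an obtained score: score +/- z * SEM (z = 1.96 for about 95%)."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return score - margin, score + margin

# Hypothetical test: SD = 15, reliability = .91, obtained score = 100
sem = standard_error_of_measurement(15, 0.91)
low, high = score_confidence_interval(100, 15, 0.91)
print(round(sem, 2), round(low, 2), round(high, 2))
```

Note that a higher reliability coefficient gives a smaller SEM and therefore a narrower interval.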

2
Q

Criterion Contamination

A

Bias introduced into a criterion score when the person rating the criterion knows how the ratee performed on the predictor. It tends to artificially inflate the validity coefficient.

3
Q

Item Difficulty

A

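Determined by dividing the # of examinees who answered the item correctly by the total # of examinees (p). Ranges from 0 (very difficult; no one answered correctly) to 1.0 (very easy; everyone answered correctly). A difficulty of .5 is generally preferred.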
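
As a small illustration, the difficulty index can be computed from a list of scored responses. The function name and the 0/1 coding (1 = correct) are assumptions made for this sketch.

```python
def item_difficulty(responses):
    """Proportion of examinees answering correctly (p); responses are 0/1."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)

# Hypothetical item: 5 of 8 examinees answered correctly
p = item_difficulty([1, 1, 0, 1, 0, 0, 1, 1])
print(p)
```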

4
Q

Reliability

A

Consistency of test scores over time, across forms, or across items. Estimated with test-retest, alternate forms, split-half, coefficient alpha, or interrater methods. A reliability coefficient of .80 means 80% of the variability in obtained scores is TRUE score variability (the other 20% is measurement error).

5
Q

Cross-validation and shrinkage

A

Re-assessing criterion-related validity on a new sample to see how well the validity coefficient generalizes. The coefficient usually shrinks because the “chance factors” that operated in the original sample are not present in the new one.

6
Q

Standard Error of Estimate/CI

A

Index of error when predicting criterion scores from predictor scores. Used to construct a CI around a predicted criterion score. Its magnitude depends on the criterion’s SD and the validity coefficient.
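
A minimal sketch, assuming the standard formula SEest = SDy × √(1 − r²), where SDy is the criterion’s SD and r is the validity coefficient. The function names and example values are hypothetical.

```python
import math

def standard_error_of_estimate(criterion_sd, validity):
    """SEest = SD of criterion * sqrt(1 - validity coefficient squared)."""
    return criterion_sd * math.sqrt(1.0 - validity ** 2)

def predicted_score_interval(predicted, criterion_sd, validity, z=1.96):
    """CI around a predicted criterion score: predicted +/- z * SEest."""
    margin = z * standard_error_of_estimate(criterion_sd, validity)
    return predicted - margin, predicted + margin

# Hypothetical criterion: SD = 10, validity coefficient = .60
see = standard_error_of_estimate(10, 0.60)
print(round(see, 2))
```

When the validity coefficient is 0, the standard error of estimate equals the criterion’s SD; when it is ±1.0, the error is 0.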

7
Q

Classical Test Theory

A

Observed variability in test scores reflects: 1) true differences between examinees on the attribute, and 2) effects of random error.

8
Q

Factor Analysis

A

Statistical technique used to determine how many factors are needed to account for the intercorrelations among a set of tests, subtests, or test items.

9
Q

Construct Validity

A

Extent to which a test measures the hypothetical trait (construct) it is intended to measure.

10
Q

Incremental Validity

A

Extent to which a predictor increases decision-making accuracy. Calculated by subtracting the base rate from the positive hit rate. Both values come from classifying examinees as true or false positives and negatives using the predictor and criterion cutoff scores.
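
The subtraction can be sketched from the four decision outcomes, assuming the common definitions base rate = (TP + FN) / total and positive hit rate = TP / (TP + FP). The function name and the counts are made up for illustration.

```python
def incremental_validity(tp, fp, fn, tn):
    """Positive hit rate minus base rate, from decision-outcome counts."""
    total = tp + fp + fn + tn
    base_rate = (tp + fn) / total      # proportion successful on the criterion
    positive_hit_rate = tp / (tp + fp)  # proportion of those selected who succeed
    return positive_hit_rate - base_rate

# Hypothetical counts: 30 true positives, 10 false positives,
# 20 false negatives, 40 true negatives
print(incremental_validity(tp=30, fp=10, fn=20, tn=40))
```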

11
Q

Kappa stat

A

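Statistic used to assess interrater reliability when ratings are on a nominal (categorical) scale. It indexes agreement between raters corrected for chance agreement.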
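
A minimal sketch of Cohen’s kappa for two raters, assuming the standard formula kappa = (observed agreement − chance agreement) / (1 − chance agreement). The function name and the ratings are hypothetical, and the sketch does not handle the degenerate case where chance agreement equals 1.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical ratings for 50 cases: 35 agreements, 15 disagreements
a = ["y"] * 20 + ["y"] * 5 + ["n"] * 10 + ["n"] * 15
b = ["y"] * 20 + ["n"] * 5 + ["y"] * 10 + ["n"] * 15
print(round(cohens_kappa(a, b), 2))
```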

12
Q

Split half/ Spearman-Brown

A

Split-half: split the test into two halves and correlate scores on the halves. Tends to underestimate reliability because each half is shorter than the full test.
Spearman-Brown: corrects the split-half coefficient by estimating what the reliability would be for the full-length test.
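
The correction can be sketched with the general Spearman-Brown prophecy formula, r_new = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened (n = 2 for correcting a split-half coefficient). The function name and the example value are illustrative.

```python
def spearman_brown(reliability, length_factor=2.0):
    """Estimated reliability when test length is multiplied by length_factor."""
    n = length_factor
    return (n * reliability) / (1.0 + (n - 1.0) * reliability)

# Hypothetical split-half correlation of .60, corrected to full length
print(round(spearman_brown(0.60), 2))
```

The same formula shows why adding comparable items raises reliability: any length factor above 1 gives an estimate higher than the starting coefficient, as long as that coefficient is between 0 and 1.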

13
Q

Item discrimination

A

The extent to which an item differentiates between examinees who obtain high versus low scores on the test or on an external criterion. The index (D) ranges from -1.0 to +1.0. If everyone in the upper group and no one in the lower group answers the item correctly, D is +1.0.
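
A short sketch of the discrimination index, assuming the usual definition D = (proportion correct in the upper group) − (proportion correct in the lower group). The function name, the 0/1 coding, and the group data are hypothetical.

```python
def item_discrimination(upper_group, lower_group):
    """D = proportion correct in upper group minus proportion in lower group."""
    if not upper_group or not lower_group:
        raise ValueError("both groups need at least one response")
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# Hypothetical item: 4 of 5 high scorers and 1 of 5 low scorers answer correctly
print(round(item_discrimination([1, 1, 1, 1, 0], [1, 0, 0, 0, 0]), 2))
```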

14
Q

Coefficient Alpha/KR-20

A

Both are used to assess internal consistency reliability (inter-item consistency). KR-20 is used when test items are scored dichotomously.
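
A minimal sketch of coefficient alpha, assuming the standard formula alpha = (k / (k − 1)) × (1 − Σ item variances / total-score variance). Population variances are used here, which is an assumption of this sketch; with 0/1 items the result then equals KR-20, since each item variance is p × q. The function name and data are made up.

```python
from statistics import pvariance

def coefficient_alpha(scores):
    """scores: one row per examinee, one column per item."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two examinees")
    item_variances = [pvariance(column) for column in zip(*scores)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)

# Hypothetical data: 4 examinees, 3 dichotomously scored items
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(coefficient_alpha(data))
```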

15
Q

Sensitivity and Specificity

A

Sensitivity = % of people in the sample who have the disorder and were accurately identified as having it

Specificity = % of people who do not have the disorder and were accurately identified as NOT having it
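
Both proportions can be sketched from decision-outcome counts, assuming sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function names and the counts are hypothetical.

```python
def sensitivity(tp, fn):
    """Proportion of people with the disorder who are correctly identified."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of people without the disorder who are correctly identified."""
    return tn / (tn + fp)

# Hypothetical sample: 50 people with the disorder (40 detected),
# 50 without it (45 correctly cleared)
print(sensitivity(tp=40, fn=10), specificity(tn=45, fp=5))
```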

16
Q

Factor loadings and communality

A

Factor loading = correlation between a test (or other variable) and a factor. When squared, it gives the proportion of variability in the test accounted for by that factor.
Communality = total proportion of variability in a test’s scores accounted for by all of the identified factors.
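
As a brief illustration, a test’s communality can be computed by summing its squared factor loadings. This sketch assumes the factors are orthogonal (uncorrelated); the function name and the loadings are made up.

```python
def communality(loadings):
    """Sum of squared loadings for one test across orthogonal factors."""
    return sum(loading ** 2 for loading in loadings)

# Hypothetical test with loadings of .60 on Factor I and .50 on Factor II:
# Factor I explains 36% of its variance, Factor II explains 25%
print(round(communality([0.60, 0.50]), 2))
```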

17
Q

Oblique and Orthogonal Rotation

A

Oblique= rotation produces correlated factors

Orthogonal= rotation produces uncorrelated factors

Rotation is done to simplify interpretation of factors

18
Q

Content validity

A

Extent to which test adequately samples the domain of info or skill (expert judgment)

19
Q

Test length/ range of scores

A

Increasing test length by adding items of similar content and quality increases reliability. Increasing the heterogeneity of the sample with respect to the attribute measured also increases reliability, because it widens the range of scores.

20
Q

Relationship between reliability and validity

A

Reliability is a necessary but not sufficient condition for validity.

21
Q

Item characteristic curve

A

Constructed for each item in item response theory. Shows the relationship between an examinee’s level on the ability or trait being measured and the probability of responding correctly to the item.
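
One way to picture such a curve is with a two-parameter logistic model, P(θ) = 1 / (1 + e^(−a(θ − b))). The card does not name a specific model, so the 2PL form, the function name, and the parameter values below are assumptions for illustration.

```python
import math

def icc_probability(theta, difficulty, discrimination=1.0):
    """Probability of a correct response under a 2PL logistic model.

    theta: examinee's trait level
    difficulty (b): trait level at which the probability is .50
    discrimination (a): steepness of the curve at that point
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical item with b = 0.0 and a = 1.5, evaluated at several trait levels
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(icc_probability(theta, 0.0, 1.5), 3))
```

In this model the probability rises with trait level, and an examinee whose trait level equals the item’s difficulty has a .50 chance of responding correctly.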