Test construction Flashcards

1
Q

Classical Test Theory is based on the assumption that

A

obtained test scores (X) are due to a combination of true score variability (T) and measurement error (E): i.e., X = T + E.
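
A minimal sketch of this decomposition in Python, using hypothetical score distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true scores (T) and random measurement error (E)
true_scores = rng.normal(100, 15, size=1000)
error = rng.normal(0, 5, size=1000)

# Obtained scores: X = T + E
obtained = true_scores + error

# Reliability is the share of obtained-score variance due to true scores;
# with these values it comes out near 225 / (225 + 25) = .90
print(true_scores.var() / obtained.var())
```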

2
Q

measurement error is due to

A

random factors that affect the test performance of examinees in unpredictable ways

3
Q

Test reliability refers to

A

the extent to which a test provides consistent information

4
Q

interpretation of the reliability coefficient

A

A reliability coefficient is always interpreted directly as the proportion of variability in obtained test scores that’s due to true score variability (it is never squared). For instance, if a test has a reliability coefficient of .80, 80% of the variability in obtained test scores is due to true score variability and the remaining 20% is due to measurement error.
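
A worked example of this interpretation in Python, assuming a hypothetical coefficient of .80:

```python
reliability = .80               # interpreted directly; never squared

true_share = reliability        # share of score variability due to true scores
error_share = 1 - reliability   # share due to measurement error

print(true_share, round(error_share, 2))  # 0.8 0.2 -> 80% true score, 20% error
```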

5
Q

4 Methods for Estimating Reliability:

A

There are four main methods for assessing a test’s reliability: test-retest, alternate forms, internal consistency, and inter-rater.

6
Q

Test-retest reliability provides information about

A

the consistency of scores over time.

7
Q

test-retest reliability involves

A

administering the test to a sample of examinees, re-administering the test to the same examinees at a later time, and correlating the two sets of scores.
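
A sketch of this procedure in Python, with hypothetical scores for five examinees:

```python
import numpy as np

time1 = np.array([88, 92, 75, 81, 95])  # first administration
time2 = np.array([85, 94, 72, 84, 96])  # same examinees, later administration

# Test-retest reliability = Pearson correlation between the two score sets
r_tt = np.corrcoef(time1, time2)[0, 1]
print(round(r_tt, 2))  # 0.96 for these made-up scores
```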

8
Q

Test-retest reliability is useful for tests that are designed to measure a characteristic that’s ___ over time.

A

stable

9
Q

Alternate forms reliability provides information about

A

the consistency of scores over different forms of the test and, when the second form is administered at a later time, the consistency of scores over time.

10
Q

alternate forms reliability involves

A

administering one form of the test to a sample of examinees, administering the other form to the same examinees, and correlating the two sets of scores. Alternate forms reliability is important whenever a test has more than one form.

11
Q

Internal consistency reliability provides information

A

on the consistency of scores over different test items and is useful for tests that are designed to measure a single content domain or aspect of behavior.

12
Q

best reliability methods for speed tests

A

Internal consistency reliability is NOT a good measure for speed tests because it tends to overestimate their reliability. For speed tests, test-retest and alternate forms reliability are appropriate.

13
Q

3 key methods for INTERNAL CONSISTENCY reliability

A

Coefficient Alpha (Cronbach’s Alpha)

Kuder-Richardson 20 (KR-20)

Split-Half Reliability
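
A sketch of the first of these in Python, using hypothetical item ratings; with dichotomous 0/1 items the same formula reduces to KR-20:

```python
import numpy as np

# Rows = examinees, columns = items (hypothetical 5-point ratings)
scores = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores

# Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 2))  # 0.92 for these made-up ratings
```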

14
Q

Coefficient alpha (Cronbach’s alpha) is used for tests with ___

A

multiple response formats (e.g., Likert scale items)

15
Q

Kuder-Richardson 20 (KR-20) is used for tests with ___

A

dichotomous (e.g. right/wrong) scoring

16
Q

issue with split-half reliability is

A

a split-half reliability coefficient underestimates a test’s reliability because each half is shorter than the full test. It’s usually corrected with the Spearman-Brown prophecy formula, which is used to estimate the effects of lengthening or shortening a test on its reliability coefficient.
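
A sketch of the Spearman-Brown correction in Python; the half-test correlation of .70 is hypothetical:

```python
def spearman_brown(r, k):
    """Estimated reliability when test length is multiplied by factor k."""
    return k * r / (1 + (k - 1) * r)

# Correcting a split-half coefficient of .70 (k = 2: the full test is
# twice as long as each half)
print(round(spearman_brown(.70, 2), 2))  # 0.82
```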

17
Q

Inter-rater reliability is important for measures that are ____ scored and provides information on the consistency of scores or ratings assigned by different raters.

A

subjectively

18
Q

methods used to evaluate inter-rater reliability:

A

Percent agreement

Cohen’s kappa coefficient (also known as the kappa statistic)

19
Q

issue with percent agreement method

A

Percent agreement can be calculated for two or more raters. A problem with this method is that it does not take chance agreement into account, which can result in an overestimate of reliability.
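
A sketch of percent agreement for two raters in Python, with hypothetical ratings:

```python
rater1 = ["yes", "no", "yes", "yes", "no"]
rater2 = ["yes", "no", "no", "yes", "no"]

# Proportion of cases on which the raters assign the same rating
agreements = sum(a == b for a, b in zip(rater1, rater2))
print(agreements / len(rater1))  # 0.8, but part of this is chance agreement
```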

20
Q

when ratings represent a nominal scale, the best inter-rater reliability method is

A

Cohen’s kappa coefficient (also known as the kappa statistic) is one of several inter-rater reliability coefficients that are corrected for chance agreement between raters.
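
A from-scratch sketch of Cohen’s kappa in Python for hypothetical nominal ratings by two raters:

```python
from collections import Counter

rater1 = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]
n = len(rater1)

# Observed agreement
p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement: summed products of the raters' marginal proportions
c1, c2 = Counter(rater1), Counter(rater2)
p_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(rater1 + rater2))

# Kappa: observed agreement corrected for chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.47
```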

21
Q

The reliability of subjective ratings can be affected by _____. It occurs when two or more raters communicate with each other while assigning ratings, which results in increased consistency (but often decreased accuracy) in ratings and an overestimate of inter-rater reliability.

A

consensual observer drift.

22
Q

3 Factors that Affect the Reliability Coefficient

A

content homogeneity
range of scores
guessing

23
Q

when a test’s reliability coefficient is .81, the reliability index is the ____

A

square root of .81, which is .90.
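
The same arithmetic in Python:

```python
import math

reliability_coefficient = .81
print(round(math.sqrt(reliability_coefficient), 2))  # reliability index = 0.9
```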

24
Q

For dichotomously scored items, an item’s difficulty level (p) indicates the percentage of examinees who

A

answered the item correctly.

25
Q

when 50 of 100 examinees answered an item correctly, the item’s p value is

A

50/100, or .50.

26
Q

For mastery tests (tests used to identify examinees who have mastered a certain level of knowledge or skill), ___ p values are preferred.

A

higher

27
Q

The item discrimination index (D) ranges from ___and indicates the difference between the percentage of examinees with ___ and the percentage of examinees with __

A

-1.0 to +1.0 ; high total test scores (often the top 27%) who answered the item correctly; low total test scores (the bottom 27%) who answered the item correctly.
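
A sketch in Python of p and D for a single item, using hypothetical high- and low-scoring groups (the full top/bottom 27% selection is omitted for brevity):

```python
# 1 = answered the item correctly, 0 = answered incorrectly
upper_group = [1, 1, 1, 0, 1]   # examinees with high total test scores
lower_group = [0, 1, 0, 0, 1]   # examinees with low total test scores

# Difficulty: proportion of all examinees answering correctly
p = (sum(upper_group) + sum(lower_group)) / (len(upper_group) + len(lower_group))

# Discrimination: upper-group proportion minus lower-group proportion
D = sum(upper_group) / len(upper_group) - sum(lower_group) / len(lower_group)

print(p)            # 0.6
print(round(D, 2))  # 0.4
```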

28
Q

what is the optimal difficulty level for a four-answer multiple choice question

A

the chance of choosing the correct answer to a four-answer multiple-choice question by guessing is .25, and the optimal difficulty level for this type of item is calculated by adding 1.0 to .25 and dividing the result by 2: (1.0 + .25)/2 = 1.25/2 = .625.
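
The same calculation in Python, generalized to any number of answer options:

```python
def optimal_p(n_options):
    """Optimal difficulty when guessing is possible: halfway between
    chance performance and 1.0."""
    return (1.0 + 1 / n_options) / 2

print(optimal_p(4))  # 0.625 for a four-option multiple-choice item
```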

29
Q

when 90% of examinees in the high-scoring group and 20% of examinees in the low-scoring group answered an item correctly, the item’s D value is

A

.90 minus .20, which is .70.

30
Q

an item’s difficulty level affects its ability to discriminate, with items of ___ having higher levels of discrimination.

A

moderate difficulty

31
Q

if a test has a standard deviation of 5 and a reliability coefficient of .84, its standard error of measurement equals

A

5 times the square root of 1 minus .84:
1 minus .84 is .16,
the square root of .16 is .4,
5 times .4 is 2.

In other words, when a test’s standard deviation is 5 and its reliability coefficient is .84, its standard error of measurement is 2.
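
The same computation in Python, with a hypothetical obtained score of 100 to show the SEM’s usual role in building confidence intervals around scores:

```python
import math

sd, reliability = 5, .84

# Standard error of measurement: SD * sqrt(1 - reliability)
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 2))  # 2.0

# A 95% confidence interval is roughly the obtained score +/- 1.96 SEM
score = 100
print(round(score - 1.96 * sem, 2), round(score + 1.96 * sem, 2))  # 96.08 103.92
```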