Test Construction Flashcards

1
Q

What is Classical Test Theory?

A

A theory of measurement used for developing and evaluating tests, also known as true score test theory

2
Q

What is the formula representing the relationship between obtained test scores, true score variability, and measurement error?

A

X = T + E
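As an illustrative sketch (not from the source), the X = T + E decomposition can be simulated; the true-score and error standard deviations below are arbitrary assumed values:

```python
import random

random.seed(0)  # reproducible illustration

# CTT: each obtained score X = true score T + random error E.
# SDs of 10 (true scores) and 4 (error) are assumed example values.
true_scores = [random.gauss(100, 10) for _ in range(1000)]
errors = [random.gauss(0, 4) for _ in range(1000)]
obtained = [t + e for t, e in zip(true_scores, errors)]

# Because E is random, it averages out to roughly zero across examinees.
mean_error = sum(errors) / len(errors)
```

Because error is random and unsystematic, `mean_error` lands near zero, so obtained scores cluster around true scores on average.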

3
Q

What does true score variability (T) represent?

A

Actual differences among examinees regarding what the test measures

4
Q

What is measurement error (E)?

A

Random factors affecting test performance in unpredictable ways

5
Q

What are some examples of measurement error?

A
  • Distractions during testing
  • Ambiguously worded test items
  • Examinee fatigue
6
Q

What does test reliability refer to?

A

The extent to which a test provides consistent information

7
Q

What is a reliability coefficient?

A

A type of correlation coefficient that ranges from 0 to 1.0

8
Q

How is a reliability coefficient interpreted?

A

As the amount of variability in obtained test scores due to true score variability
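A minimal numeric sketch of this interpretation, assuming example variance values not taken from the source:

```python
# Reliability = true score variance / obtained score variance.
# The variance values below are assumed for illustration only.
var_true = 100.0   # variance of true scores (T)
var_error = 16.0   # variance of measurement error (E)
var_obtained = var_true + var_error  # T and E are uncorrelated in CTT

reliability = var_true / var_obtained  # ≈ .86
```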

9
Q

What reliability coefficient is considered minimally acceptable for many tests?

A

0.70 or higher

10
Q

What reliability coefficient is usually required for high-stakes tests?

A

0.90 or higher

11
Q

What are the four main methods for assessing a test’s reliability?

A
  • Test-retest
  • Alternate forms
  • Internal consistency
  • Inter-rater
12
Q

What does test-retest reliability measure?

A

The consistency of scores over time

13
Q

How is alternate forms reliability assessed?

A

By correlating scores from different forms of the test administered to the same examinees

14
Q

What does internal consistency reliability measure?

A

The consistency of scores over different test items

15
Q

Why is internal consistency reliability not useful for speed tests?

A

It tends to overestimate their reliability

16
Q

What is coefficient alpha also known as?

A

Cronbach’s alpha

17
Q

What is Kuder-Richardson 20 (KR-20) used for?

A

Evaluating internal consistency reliability for dichotomously scored items
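A minimal sketch of the KR-20 computation for dichotomous items (illustrative implementation using population variance; the function name is my own):

```python
def kr20(responses):
    """KR-20 for dichotomously (0/1) scored items.

    `responses` is a list of examinees, each a list of 0/1 item scores.
    """
    n = len(responses)           # number of examinees
    k = len(responses[0])        # number of items
    totals = [sum(r) for r in responses]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # total-score variance
    # Sum of item variances p(1 - p) across items.
    pq = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)
```

When every examinee answers all items consistently (all correct or all incorrect), KR-20 reaches its maximum of 1.0.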

18
Q

What is the split-half reliability method?

A

Correlating scores from two halves of a test

19
Q

What is a drawback of split-half reliability?

A

It underestimates a test’s reliability

20
Q

What formula is used to correct split-half reliability?

A

Spearman-Brown prophecy formula
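The correction can be sketched as follows (the general form projects reliability for a test lengthened by factor n; n = 2 corrects a split-half correlation back to full length):

```python
def spearman_brown(r_half, n=2):
    """Spearman-Brown prophecy: projected reliability when a test is
    lengthened by factor n (n=2 corrects a split-half correlation)."""
    return n * r_half / (1 + (n - 1) * r_half)
```

For example, a split-half correlation of .60 projects to 2(.60) / (1 + .60) = .75 for the full-length test.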

21
Q

What does inter-rater reliability assess?

A

The consistency of scores or ratings assigned by different raters

22
Q

What methods are used to evaluate inter-rater reliability?

A
  • Percent agreement
  • Cohen’s kappa coefficient
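Both methods can be sketched for two raters assigning categorical ratings (illustrative implementations; function names are my own):

```python
def percent_agreement(a, b):
    """Proportion of cases where raters a and b assign the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    po = percent_agreement(a, b)              # observed agreement
    categories = set(a) | set(b)
    # Expected chance agreement from each rater's marginal proportions.
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe)
```

For ratings a = [1, 1, 1, 0] and b = [1, 1, 0, 0], percent agreement is .75, but kappa is only .50 once chance agreement is removed.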
23
Q

What is a limitation of percent agreement in inter-rater reliability?

A

It does not account for chance agreement

24
Q

What is consensual observer drift?

A

Increased consistency (but often decreased accuracy) in ratings due to raters communicating

25
Q

How can consensual observer drift be reduced?

A
  • Not having raters work together
  • Providing adequate training
  • Regularly monitoring accuracy
26
Q

What factor affects the size of the reliability coefficient related to content?

A

Content homogeneity

Tests that are homogeneous regarding content tend to have larger reliability coefficients than heterogeneous tests, especially for internal consistency reliability.

27
Q

How does the range of scores influence reliability coefficients?

A

Larger reliability coefficients occur when test scores are unrestricted in range

This happens when the sample includes examinees with high, moderate, and low levels of the characteristics measured.

28
Q

What impact does guessing have on reliability coefficients?

A

The easier it is to answer items correctly by guessing, the lower the reliability coefficient

True/false tests are likely less reliable than multiple-choice tests with three or more answer choices.

29
Q

What is the reliability index?

A

Theoretical correlation between observed test scores and true test scores

Calculated by taking the square root of the reliability coefficient.
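A one-line sketch of the calculation, assuming an example reliability coefficient of .84:

```python
# Reliability index = square root of the reliability coefficient.
reliability_coefficient = 0.84  # assumed example value
reliability_index = reliability_coefficient ** 0.5  # ≈ .92
```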

30
Q

What does an item analysis determine in test development?

A

Which items to include based on difficulty level and discrimination ability

It is a process used in classical test theory.

31
Q

How is item difficulty (p) calculated?

A

p = number of correct answers / total number of examinees

Ranges from 0 to 1.0, with smaller values indicating more difficult items.
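The calculation can be sketched as (item scores coded 0 = incorrect, 1 = correct; function name is my own):

```python
def item_difficulty(item_scores):
    """p = number of correct answers / total number of examinees."""
    return sum(item_scores) / len(item_scores)
```

For example, if 3 of 5 examinees answer an item correctly, p = .60.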

32
Q

What is the preferred range of item difficulty for most tests?

A

p = .30 to .70

Moderately difficult items are preferred, but optimal values may vary based on the test purpose.

33
Q

What is the optimal item difficulty level when a test is used to select a certain percentage of examinees?

A

A p value equal to the proportion being selected

For example, an optimal average item difficulty of .20 might be used to identify the top 20% of examinees.

34
Q

How is the optimal difficulty level for guessing calculated?

A

Optimal p = (1.0 + probability of guessing) / 2

For a four-answer multiple-choice question, this would be (1.0 + .25) / 2 = .625.
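A sketch of the formula, taking the probability of guessing as 1 divided by the number of answer choices:

```python
def optimal_difficulty(num_choices):
    """Optimal p corrected for guessing: (1.0 + probability of guessing) / 2,
    where the probability of guessing is 1 / number of answer choices."""
    return (1.0 + 1.0 / num_choices) / 2
```

This gives .625 for four-choice items and .75 for true/false items, where guessing is easier.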

35
Q

What does the item discrimination index (D) measure?

A

Difference in correct responses between high and low total test score groups

Ranges from -1.0 to +1.0, with higher D values indicating better discrimination.
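A sketch of computing D from upper and lower total-score groups (the upper/lower 27% cutoff is a common convention, assumed here rather than taken from the source):

```python
def discrimination_index(item_scores, total_scores, frac=0.27):
    """D = proportion correct in the high total-score group minus the
    proportion correct in the low total-score group."""
    n = len(total_scores)
    k = max(1, int(n * frac))  # group size (upper/lower 27% by convention)
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_scores[i] for i in high) / k
    p_low = sum(item_scores[i] for i in low) / k
    return p_high - p_low
```

An item answered correctly by everyone in the high group and no one in the low group yields D = +1.0, perfect discrimination.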

36
Q

What is an acceptable D value for most tests?

A

D value of .30 or higher

Items of moderate difficulty typically have higher discrimination levels.

37
Q

What does a reliability coefficient less than 1.0 indicate about a test score?

A

An examinee’s obtained test score may or may not be their true score.

38
Q

What is a confidence interval in the context of test scores?

A

It indicates the range within which an examinee’s true score is likely to be based on their obtained score.

39
Q

How is the standard error of measurement calculated?

A

It is calculated by multiplying the test’s standard deviation by the square root of 1 minus the reliability coefficient.
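The calculation can be sketched as (function name is my own):

```python
def standard_error_of_measurement(sd, reliability):
    """SEM = SD x sqrt(1 - reliability coefficient)."""
    return sd * (1.0 - reliability) ** 0.5
```

With SD = 5 and a reliability coefficient of .84, the SEM is 5 × √.16 = 2, matching the worked example on the next card.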

40
Q

What is the standard error of measurement if the standard deviation is 5 and the reliability coefficient is .84?

A

2 (SEM = 5 × √(1 − .84) = 5 × .40 = 2).

41
Q

How do you construct a 68% confidence interval around an obtained test score?

A

Add and subtract one standard error of measurement to and from the obtained score.

42
Q

How do you construct a 95% confidence interval around an obtained test score?

A

Add and subtract two standard errors of measurement to and from the obtained score.

43
Q

How do you construct a 99% confidence interval around an obtained test score?

A

Add and subtract three standard errors of measurement to and from the obtained score.

44
Q

What is the 95% confidence interval for an examinee who scored 90 with a standard error of measurement of 5?

A

80 to 100.
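The three confidence intervals above follow one pattern, sketched here (function name is my own):

```python
def confidence_interval(obtained_score, sem, num_sems):
    """Adding/subtracting 1, 2, or 3 SEMs gives roughly the 68%, 95%,
    and 99% confidence intervals around an obtained score."""
    return (obtained_score - num_sems * sem,
            obtained_score + num_sems * sem)
```

For an obtained score of 90 with an SEM of 5, two SEMs give the 95% interval of 80 to 100.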

45
Q

What does Item Response Theory (IRT) focus on?

A

Examinees’ responses to individual test items.

46
Q

How does IRT differ from Classical Test Theory (CTT)?

A

CTT is test-based and focuses on total test scores, while IRT is item-based.

47
Q

What advantage does IRT have over CTT regarding item parameters?

A

IRT uses mathematical techniques and large samples to derive item parameters that are sample-invariant (i.e., they do not depend on the particular group of examinees tested).

48
Q

What is a computerized adaptive test?

A

A test that tailors items to each examinee by presenting items appropriate for their level of the trait.

49
Q

What is another name for Item Response Theory?

A

Latent trait theory.

50
Q

What does the item characteristic curve (ICC) represent?

A

The relationship between each item and the latent trait measured by the test.

51
Q

What are the two axes of the ICC graph?

A

The level of the latent trait, often estimated from total test scores (horizontal/x-axis), and the probability of endorsing or answering the item correctly (vertical/y-axis).

52
Q

What does the difficulty parameter in IRT indicate?

A

The level of the trait required for a 50% probability of endorsing or answering the item correctly.

53
Q

What does the discrimination parameter in IRT indicate?

A

How well the item can discriminate between individuals with high and low levels of the trait.

54
Q

What does the slope of the ICC indicate?

A

The steeper the slope, the better the discrimination of the item.

55
Q

What does the y-axis crossing point of the ICC represent?

A

The probability of guessing correctly.

56
Q

Fill in the blank: When the y-axis crossing point of the ICC is closer to 0, it indicates that _______.

A

it is more difficult for examinees to choose the correct answer by guessing.