Item Analysis and Test Reliability Flashcards

1
Q

Three factors that affect the reliability coefficient

A
  1. Content homogeneity
  2. Range of scores
  3. Guessing (true/false tests tend to have lower reliability because items are easier to answer correctly by guessing)
2
Q

What are the four main methods of assessing a test’s reliability?

A
  1. Test-retest
  2. Alternate Forms
  3. Internal Consistency
  4. Inter-rater
3
Q

_____ is due to random factors that affect the test performance of examinees in unpredictable ways and include distractions during testing, ambiguously worded test items, and examinee fatigue.

A

Measurement error

4
Q

_____ is the result of actual differences among examinees with regard to whatever the test is measuring. It’s assumed to be consistent, which means that an examinee’s true score will be the same regardless of which form of the test he or she takes or who scores the test.

A

True score variability

5
Q

_____ provides information about the consistency of scores over time.

A

Test-retest reliability

6
Q

_____ provides information about the consistency of scores over different forms of the test and, when the second form is administered at a later time, the consistency of scores over time.

A

Alternate forms reliability

7
Q

_____ provides information about the consistency of scores over different test items and is useful for tests that are designed to measure a single content domain or aspect of behavior. It’s not useful for speed tests because it tends to overestimate their reliability.

A

Internal consistency reliability

8
Q

What are three methods of evaluating a test’s internal consistency reliability?

A
  1. Coefficient alpha (Cronbach’s alpha)
  2. Kuder-Richardson 20 (KR-20)
  3. Split-half reliability
9
Q

This method of evaluating internal consistency reliability involves administering the test to a sample of examinees and calculating the average inter-item consistency.

A

Coefficient alpha (Cronbach’s Alpha)
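
A minimal sketch of the calculation, assuming scores are arranged in an examinees-by-items matrix (the data below are hypothetical):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 examinees x 4 items
data = [[3, 4, 3, 4],
        [2, 2, 3, 2],
        [4, 5, 4, 5],
        [1, 2, 1, 2],
        [3, 3, 4, 3]]
print(cronbach_alpha(data))
```

Applied to dichotomously (0/1) scored items, the same formula yields KR-20 (next card).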

10
Q

This method of evaluating internal consistency reliability is an alternative to coefficient alpha that can be used when test items are dichotomously scored (e.g., as correct or incorrect).

A

Kuder-Richardson 20 (KR-20)

11
Q

This method of evaluating internal consistency reliability involves administering the test to a sample of examinees, splitting the test in half (often in terms of even- and odd-numbered items), and correlating the scores on the two halves.

A

Split-half reliability

12
Q

Because split-half reliability coefficients tend to underestimate a test’s reliability (each half is, in effect, a shorter test), they are usually corrected with the _____, which estimates the effect of lengthening or shortening a test on its reliability coefficient.

A

Spearman-Brown prophecy formula
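
A minimal sketch of the split-half calculation with the Spearman-Brown correction, where r is the reliability of the shorter form and n is the factor by which the test is lengthened (n = 2 projects each half back to full length; data are hypothetical):

```python
import numpy as np

def spearman_brown(r, n):
    """Projected reliability when a test is lengthened by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical (examinees x items) score matrix
scores = np.array([[3, 4, 3, 4],
                   [2, 2, 3, 2],
                   [4, 5, 4, 5],
                   [1, 2, 1, 2],
                   [3, 3, 4, 3]])

# Correlate total scores on the odd- and even-numbered halves
odd = scores[:, 0::2].sum(axis=1)
even = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]

# Each half is only half the test's length, so correct with n = 2
print(spearman_brown(r_half, n=2))
```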

13
Q

_____ provides information on the consistency of scores over different raters and is important for tests that are subjectively scored.

A

Inter-rater reliability

14
Q

What are two methods of evaluating inter-rater reliability?

A
  1. Cohen’s kappa coefficient - used to assess the consistency of ratings assigned by two raters when ratings represent a nominal scale
  2. Kendall’s coefficient of concordance - used to assess the consistency of ratings assigned by three or more raters when ratings represent ranks
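
A minimal sketch of Cohen’s kappa for two raters (the ratings below are hypothetical nominal categories):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater1)
    # Observed agreement: proportion of cases the raters scored identically
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement if the raters' category usage were independent
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[cat] * c2[cat] for cat in set(rater1) | set(rater2)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical diagnoses assigned by two raters to the same eight cases
r1 = ["anx", "dep", "anx", "anx", "dep", "anx", "dep", "anx"]
r2 = ["anx", "dep", "dep", "anx", "dep", "anx", "anx", "anx"]
print(cohens_kappa(r1, r2))
```
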
15
Q

This occurs when two or more raters communicate with each other while assigning ratings, which results in increased consistency (but often decreased accuracy) in ratings and an overestimate of inter-rater reliability.

A

Consensual observer drift

16
Q

Tests that are _____ with regard to content tend to have larger reliability coefficients than tests that are _____, and this is especially true for internal consistency reliability.

A

homogeneous; heterogeneous

17
Q

Reliability coefficients are _____ when test scores are unrestricted in terms of range. An unrestricted range occurs when the examinees included in the sample are heterogeneous with regard to the characteristic(s) measured by the test – i.e., when the sample includes examinees who have high, moderate, and low levels of the characteristic(s).

A

Larger

18
Q

For dichotomously scored items, an item’s _____ indicates the percentage of examinees who answered the item correctly.

A

difficulty level (p)

Obtained by dividing the number of examinees who answered the item correctly by the total number of examinees.

p = .30 to .70 (moderate difficulty) is preferred on most tests.
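
A minimal sketch, assuming items are scored 1 (correct) or 0 (incorrect); the responses below are hypothetical:

```python
def difficulty(item_responses):
    """p = proportion of examinees who answered the item correctly."""
    return sum(item_responses) / len(item_responses)

# Hypothetical item answered correctly by 6 of 10 examinees
print(difficulty([1, 0, 1, 1, 0, 1, 0, 1, 1, 0]))  # 0.6 -> moderate difficulty
```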

19
Q

With regard to _____, the optimal p value lies halfway between 1.0 and the probability that the item can be answered correctly by guessing.

A

guessing
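
For example, on a four-option multiple-choice item the probability of answering correctly by guessing is .25, so the optimal p is (1.0 + .25) / 2 ≈ .63; on a true/false item it is (1.0 + .50) / 2 = .75.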

20
Q

The _____ ranges from -1.0 to +1.0 and indicates the difference between the percentage of examinees with high total test scores (often the top 27%) who answered the item correctly and the percentage of examinees with low total test scores (the bottom 27%) who answered the item correctly.

A

discrimination index (D)

For most tests, a D value of .30 or higher is acceptable. Note that an item’s difficulty level affects its ability to discriminate, with items of moderate difficulty having higher levels of discrimination.
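
A minimal sketch, assuming 0/1 item scoring and the conventional 27% upper and lower groups (data are hypothetical):

```python
import numpy as np

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """D = p(upper group) - p(lower group) for one dichotomous item."""
    item_correct = np.asarray(item_correct, dtype=float)
    order = np.argsort(total_scores)               # lowest to highest scorers
    n = max(1, int(round(fraction * len(order))))  # group size (27% by default)
    lower, upper = order[:n], order[-n:]
    return item_correct[upper].mean() - item_correct[lower].mean()

# Hypothetical: 10 examinees' responses to one item, plus their total scores
item = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
totals = [88, 52, 90, 75, 48, 95, 60, 82, 70, 55]
print(discrimination_index(item, totals))  # 1.0: top scorers got it, bottom didn't
```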

21
Q

When a test’s reliability coefficient is less than 1.0, this means that an examinee’s obtained test score may or may not be his or her true score. Consequently, obtained test scores are often interpreted in terms of a _____, which indicates the range within which an examinee’s true score is likely to fall, given his or her obtained score.

A

confidence interval

22
Q

The _____ is used to construct a confidence interval. It is calculated by multiplying the test’s standard deviation by the square root of one minus its reliability coefficient: SD × √(1 − reliability coefficient)

A

standard error of measurement

For a 68% confidence interval, you add and subtract one standard error of measurement to and from the obtained score; for a 95% confidence interval, you add and subtract two standard errors of measurement; and for a 99% confidence interval, you add and subtract three standard errors of measurement.
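
A minimal sketch of the calculation and the resulting intervals (the SD, reliability coefficient, and obtained score below are hypothetical):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15, reliability = .91 -> SEM = 4.5
s = sem(15, 0.91)
obtained = 110
print(f"68% CI: {obtained - s:.1f} to {obtained + s:.1f}")      # +/- 1 SEM
print(f"95% CI: {obtained - 2*s:.1f} to {obtained + 2*s:.1f}")  # +/- 2 SEM
print(f"99% CI: {obtained - 3*s:.1f} to {obtained + 3*s:.1f}")  # +/- 3 SEM
```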

23
Q

In Item Response Theory, _____ is indicated by the slope of the item characteristic curve (ICC).

A

Discrimination

24
Q

In Item Response Theory, _____ is indicated by the percent of low-, average-, and high-ability examinees who answered the item correctly.

A

Difficulty level

25
Q

In Item Response Theory, _____ is indicated by the point at which the ICC crosses the y-axis: The closer this point is to 0, the more difficult it is for examinees to choose the correct answer to the item just by guessing.

A

Probability of guessing correctly
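
The three cards above map onto the a (discrimination), b (difficulty), and c (guessing) parameters of the three-parameter logistic (3PL) model. A minimal sketch of the item characteristic curve it traces, with hypothetical item parameters:

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL ICC: probability of a correct response at ability level theta.

    a = discrimination (slope), b = difficulty (location),
    c = pseudo-guessing parameter (the curve's lower asymptote).
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Hypothetical item: moderately steep slope, average difficulty,
# and a guessing floor of .25 (four-option multiple choice)
for theta in (-3, -1, 0, 1, 3):
    print(theta, round(icc_3pl(theta, a=1.5, b=0.0, c=0.25), 2))
```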