Item Analysis and Test Reliability Flashcards

1
Q

Three factors that affect the reliability coefficient

A
  1. Content homogeneity
  2. Range of scores
  3. Guessing (true/false tests tend to have lower reliability because items are easier to answer correctly by guessing)
2
Q

What are the four main methods of assessing a test’s reliability?

A
  1. Test-retest
  2. Alternate Forms
  3. Internal Consistency
  4. Inter-rater
3
Q

_____ is due to random factors that affect the test performance of examinees in unpredictable ways and include distractions during testing, ambiguously worded test items, and examinee fatigue.

A

Measurement error

4
Q

_____ is the result of actual differences among examinees with regard to whatever the test is measuring. It’s assumed to be consistent, which means that an examinee’s true score will be the same regardless of which form of the test he or she takes or who scores the test.

A

True score variability

5
Q

_____ provides information about the consistency of scores over time.

A

Test-retest reliability

6
Q

_____ provides information about the consistency of scores over different forms of the test and, when the second form is administered at a later time, the consistency of scores over time.

A

Alternate forms reliability

7
Q

_____ provides information about the consistency of scores over different test items and is useful for tests that are designed to measure a single content domain or aspect of behavior. It’s not useful for speed tests because it tends to overestimate their reliability.

A

Internal consistency reliability

8
Q

What are three methods of evaluating a test’s internal consistency reliability?

A
  1. Coefficient alpha (Cronbach’s alpha)
  2. Kuder-Richardson 20 (KR-20)
  3. Split-half reliability
9
Q

This method of evaluating internal consistency reliability involves administering the test to a sample of examinees and calculating the average inter-item consistency.

A

Coefficient alpha (Cronbach’s Alpha)
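
A minimal sketch of the calculation, assuming scores are arranged in an examinees-by-items matrix (the data below are hypothetical):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 examinees x 4 items
data = [[3, 4, 3, 4],
        [2, 2, 3, 2],
        [4, 5, 4, 5],
        [1, 2, 1, 2],
        [3, 3, 4, 3]]
print(cronbach_alpha(data))
```

Applied to dichotomously (0/1) scored items, the same formula yields KR-20 (next card).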

10
Q

This method of evaluating internal consistency reliability is an alternative to coefficient alpha that can be used when test items are dichotomously scored (e.g., as correct or incorrect).

A

Kuder-Richardson 20 (KR-20)

11
Q

This method of evaluating internal consistency reliability involves administering the test to a sample of examinees, splitting the test in half (often in terms of even- and odd-numbered items), and correlating the scores on the two halves.

A

Split-half reliability

12
Q

Because split-half reliability coefficients tend to underestimate a test’s reliability (each half is, in effect, a shorter test), they are usually corrected with the _____, which estimates the effect of lengthening or shortening a test on its reliability coefficient.

A

Spearman-Brown prophecy formula
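
A minimal sketch of the split-half calculation with the Spearman-Brown correction, where r is the reliability of the shorter form and n is the factor by which the test is lengthened (n = 2 projects each half back to full length; data are hypothetical):

```python
import numpy as np

def spearman_brown(r, n):
    """Projected reliability when a test is lengthened by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical (examinees x items) score matrix
scores = np.array([[3, 4, 3, 4],
                   [2, 2, 3, 2],
                   [4, 5, 4, 5],
                   [1, 2, 1, 2],
                   [3, 3, 4, 3]])

# Correlate total scores on the odd- and even-numbered halves
odd = scores[:, 0::2].sum(axis=1)
even = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]

# Each half is only half the test's length, so correct with n = 2
print(spearman_brown(r_half, n=2))
```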

13
Q

_____ provides information on the consistency of scores over different raters and is important for tests that are subjectively scored.

A

Inter-rater reliability

14
Q

What are two methods of evaluating inter-rater reliability?

A
  1. Cohen’s kappa coefficient - used to assess the consistency of ratings assigned by two raters when ratings represent a nominal scale
  2. Kendall’s coefficient of concordance - used to assess the consistency of ratings assigned by three or more raters when ratings represent ranks
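
A minimal sketch of Cohen’s kappa for two raters (the ratings below are hypothetical nominal categories):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater1)
    # Observed agreement: proportion of cases the raters scored identically
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement if the raters' category usage were independent
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[cat] * c2[cat] for cat in set(rater1) | set(rater2)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical diagnoses assigned by two raters to the same eight cases
r1 = ["anx", "dep", "anx", "anx", "dep", "anx", "dep", "anx"]
r2 = ["anx", "dep", "dep", "anx", "dep", "anx", "anx", "anx"]
print(cohens_kappa(r1, r2))
```
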
15
Q

This occurs when two or more raters communicate with each other while assigning ratings, which results in increased consistency (but often decreased accuracy) in ratings and an overestimate of inter-rater reliability.

A

Consensual observer drift

16
Q

Tests that are _____ with regard to content tend to have larger reliability coefficients than tests that are _____, and this is especially true for internal consistency reliability.

A

homogeneous; heterogeneous

17
Q

Reliability coefficients are _____ when test scores are unrestricted in terms of range. An unrestricted range occurs when the examinees included in the sample are heterogeneous with regard to the characteristic(s) measured by the test – i.e., when the sample includes examinees who have high, moderate, and low levels of the characteristic(s).

A

Larger

18
Q

For dichotomously scored items, an item’s _____ indicates the percentage of examinees who answered the item correctly.

A

difficulty level (p)

Obtained by dividing the number of examinees who answered the item correctly by the total number of examinees.

p = .30 to .70 (moderate difficulty) is preferred on most tests.
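
A minimal sketch, assuming items are scored 1 (correct) or 0 (incorrect); the responses below are hypothetical:

```python
def difficulty(item_responses):
    """p = proportion of examinees who answered the item correctly."""
    return sum(item_responses) / len(item_responses)

# Hypothetical item answered correctly by 6 of 10 examinees
print(difficulty([1, 0, 1, 1, 0, 1, 0, 1, 1, 0]))  # 0.6 -> moderate difficulty
```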

19
Q

With regard to _____, the optimal p value lies halfway between 1.0 and the probability that the item can be answered correctly by guessing.

A

guessing
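
For example, on a four-option multiple-choice item the probability of answering correctly by guessing is .25, so the optimal p is (1.0 + .25) / 2 ≈ .63; on a true/false item it is (1.0 + .50) / 2 = .75.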

20
Q

The _____ ranges from -1.0 to +1.0 and indicates the difference between the percentage of examinees with high total test scores (often the top 27%) who answered the item correctly and the percentage of examinees with low total test scores (the bottom 27%) who answered the item correctly.

A

discrimination index (D)

For most tests, a D value of .30 or higher is acceptable. Note that an item’s difficulty level affects its ability to discriminate, with items of moderate difficulty having higher levels of discrimination.
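
A minimal sketch, assuming 0/1 item scoring and the conventional 27% upper and lower groups (data are hypothetical):

```python
import numpy as np

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """D = p(upper group) - p(lower group) for one dichotomous item."""
    item_correct = np.asarray(item_correct, dtype=float)
    order = np.argsort(total_scores)               # lowest to highest scorers
    n = max(1, int(round(fraction * len(order))))  # group size (27% by default)
    lower, upper = order[:n], order[-n:]
    return item_correct[upper].mean() - item_correct[lower].mean()

# Hypothetical: 10 examinees' responses to one item, plus their total scores
item = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
totals = [88, 52, 90, 75, 48, 95, 60, 82, 70, 55]
print(discrimination_index(item, totals))  # 1.0: top scorers got it, bottom didn't
```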

21
Q

When a test’s reliability coefficient is less than 1.0, this means that an examinee’s obtained test score may or may not be his or her true score. Consequently, obtained test scores are often interpreted in terms of a _____, which indicates the range within which an examinee’s true score is likely to fall, given his or her obtained score.

A

confidence interval

22
Q

The _____ is used to construct a confidence interval. It is calculated by multiplying the test’s standard deviation by the square root of one minus its reliability coefficient: SD × √(1 − reliability coefficient)

A

standard error of measurement

For a 68% confidence interval, you add and subtract one standard error of measurement to and from the obtained score; for a 95% confidence interval, you add and subtract two standard errors of measurement; and for a 99% confidence interval, you add and subtract three standard errors of measurement.
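
A minimal sketch of the calculation and the resulting intervals (the SD, reliability coefficient, and obtained score below are hypothetical):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15, reliability = .91 -> SEM = 4.5
s = sem(15, 0.91)
obtained = 110
print(f"68% CI: {obtained - s:.1f} to {obtained + s:.1f}")      # +/- 1 SEM
print(f"95% CI: {obtained - 2*s:.1f} to {obtained + 2*s:.1f}")  # +/- 2 SEM
print(f"99% CI: {obtained - 3*s:.1f} to {obtained + 3*s:.1f}")  # +/- 3 SEM
```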

23
Q

In Item Response Theory, _____ is indicated by the slope of the item characteristic curve (ICC).

A

Discrimination

24
Q

In Item Response Theory, _____ is indicated by the percent of low-, average-, and high-ability examinees who answered the item correctly.

A

Difficulty level

25
Q

In Item Response Theory, _____ is indicated by the point at which the ICC crosses the y-axis: The closer this point is to 0, the more difficult it is for examinees to choose the correct answer to the item just by guessing.

A

Probability of guessing correctly
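
The three cards above map onto the a (discrimination), b (difficulty), and c (guessing) parameters of the three-parameter logistic (3PL) model. A minimal sketch of the item characteristic curve it traces, with hypothetical item parameters:

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL ICC: probability of a correct response at ability level theta.

    a = discrimination (slope), b = difficulty (location),
    c = pseudo-guessing parameter (the curve's lower asymptote).
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Hypothetical item: moderately steep slope, average difficulty,
# and a guessing floor of .25 (four-option multiple choice)
for theta in (-3, -1, 0, 1, 3):
    print(theta, round(icc_3pl(theta, a=1.5, b=0.0, c=0.25), 2))
```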