Item Analysis and Test Reliability Flashcards
After using the split-half method to estimate a test’s reliability, you would use which of the following to correct the split-half reliability coefficient?
A. coefficient of determination
B. correction for attenuation
C. Kuder-Richardson formula
D. Spearman-Brown formula
Answer D is correct. Test length is one of the factors that affects the size of the reliability coefficient, and the Spearman-Brown formula is often used to estimate the effect of lengthening or shortening a test on its reliability coefficient. This formula is especially useful for correcting the split-half reliability coefficient because assessing split-half reliability involves splitting the test in half and correlating scores on the two halves, which yields a reliability estimate for a test only half the actual length. Because shorter tests tend to be less reliable, split-half reliability tends to underestimate a test's actual reliability, and the Spearman-Brown formula is used to estimate the reliability coefficient for the full length of the test.
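As a quick illustration, the Spearman-Brown correction described above can be sketched in Python (the function name is illustrative, not from any standard library):

```python
def spearman_brown_split_half(r_half: float) -> float:
    """Estimate full-test reliability from the correlation between
    scores on two half-tests (Spearman-Brown with doubling, n = 2)."""
    return (2 * r_half) / (1 + r_half)

# A half-test correlation of .60 corresponds to an estimated
# full-length reliability of .75.
print(round(spearman_brown_split_half(0.60), 2))  # 0.75
```

Note that the corrected coefficient is always at least as large as the half-test correlation, consistent with split-half reliability underestimating full-test reliability.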
Classical test theory predicts that total variability in test scores is the result of true score variability:
A. plus random error.
B. plus systematic error.
C. minus systematic error.
D. minus systematic and random error.
Answer A is correct. Classical test theory is also known as true score theory and predicts that an obtained test score (X) is the sum of a true score (T) and measurement error (E), so total variability in obtained scores equals true score variability plus error variability. Measurement error refers to random factors that affect test performance in unpredictable ways.
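A small simulation can make the X = T + E decomposition concrete (the distributions and parameter values below are arbitrary choices for illustration):

```python
import random

random.seed(0)

# Classical test theory: observed score X = true score T + random error E,
# with E uncorrelated with T. Variances are then (approximately) additive.
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
errors = [random.gauss(0, 5) for _ in range(10_000)]
observed = [t + e for t, e in zip(true_scores, errors)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# var(X) should come out close to var(T) + var(E), i.e., roughly 225 + 25.
print(var(observed), var(true_scores) + var(errors))
```

With random (rather than systematic) error, the two printed values agree up to sampling noise, which is exactly the "true score variability plus random error" claim in option A.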
The slope of an item characteristic curve provides information on which of the following?
A. the probability of guessing correctly
B. the degree of relevance
C. item difficulty
D. item discrimination
Answer D is correct. An item’s ability to discriminate between examinees with high and low levels of the latent ability assessed by the test is indicated by the slope (steepness) of the item characteristic curve: the steeper the slope, the better the discrimination.
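The slope-discrimination link can be sketched with a two-parameter logistic item characteristic curve, a standard IRT model in which a is the discrimination (slope) parameter and b is the difficulty parameter (the example values are made up):

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC: probability of a correct response
    at ability level theta, with discrimination a and difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# A steeper slope (larger a) separates examinees just below and just
# above the item's difficulty point more sharply.
for a in (0.5, 2.0):
    p_low = icc_2pl(-0.5, a, 0.0)
    p_high = icc_2pl(+0.5, a, 0.0)
    print(f"a={a}: P(theta=-0.5)={p_low:.2f}, P(theta=+0.5)={p_high:.2f}")
```

With a = 0.5 the two probabilities differ only modestly (about .44 vs. .56), while with a = 2.0 they differ substantially (about .27 vs. .73): the steeper curve discriminates better between lower- and higher-ability examinees.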
You would use which of the following to evaluate the internal consistency of a test when all of its items are scored as correct or incorrect?
A. Spearman rho
B. Cohen’s kappa coefficient
C. Kuder-Richardson 20
D. Spearman-Brown
Answer C is correct. Kuder-Richardson 20 (KR-20) can be used to assess a test’s internal consistency reliability when test items are dichotomously scored (e.g., as correct or incorrect).
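A minimal sketch of the KR-20 computation, assuming a matrix of 0/1 item scores (the tiny data set below is invented solely for illustration):

```python
def kr20(item_matrix):
    """KR-20 for dichotomously scored items.
    item_matrix: list of examinee rows, each a list of 0/1 item scores."""
    n = len(item_matrix)          # number of examinees
    k = len(item_matrix[0])       # number of items
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of p*q over items, where p = proportion correct on the item.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(scores), 2))  # 0.75
```

KR-20 is equivalent to Cronbach's alpha computed on dichotomous items, which is why alpha is the choice when items are not scored simply right/wrong.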
Which of the following would be used to assess the inter-rater reliability of a rating scale designed to help clinicians distinguish between children who either do or do not meet the DSM criteria for a diagnosis of ADHD?
A. Kuder-Richardson 20
B. Spearman-Brown
C. Cronbach’s coefficient alpha
D. Cohen’s kappa coefficient
Answer D is correct. The kappa coefficient is used to assess the consistency of ratings assigned by two raters when the ratings represent a nominal scale (e.g., when a rating scale classifies children as either meeting or not meeting the diagnostic criteria for ADHD).
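A sketch of the kappa computation for two raters assigning nominal categories (the ratings below are fabricated to mirror the ADHD example):

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters.
    kappa = (p_observed - p_expected) / (1 - p_expected)."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement from each rater's marginal proportions.
    p_expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n)
        for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

rater_a = ["ADHD", "ADHD", "ADHD", "none", "none", "none", "ADHD", "none"]
rater_b = ["ADHD", "ADHD", "none", "none", "none", "none", "ADHD", "ADHD"]
print(cohens_kappa(rater_a, rater_b))  # 0.5
```

Here the raters agree on 6 of 8 children (75%), but chance agreement is 50%, so kappa credits only the agreement beyond chance.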
An achievement test has a test-retest reliability coefficient of .90. This means that ____ of variability in test scores is due to true score variability.
A. 90%
B. 84%
C. 16%
D. 10%
Answer A is correct. A reliability coefficient is interpreted directly as the amount of variability in test scores that’s due to true score variability. When a test’s reliability coefficient is .90, this means that 90% of variability in test scores is due to true score variability and the remaining 10% is due to measurement error.
An educational psychologist administers her newly developed aptitude test to a sample of examinees and is disappointed that the test’s reliability coefficient is only .45. Which of the following is likely to be most helpful for increasing the test’s reliability coefficient?
A. adding more items to the test that are similar to the original items in terms of content and administering the test to a new sample that is more homogeneous with regard to level of aptitude
B. adding more items to the test that differ from the original items in terms of content and administering the test to a new sample that is more homogeneous with regard to level of aptitude
C. adding more items to the test that are similar to the original items in terms of content and administering the test to a new sample that is more heterogeneous with regard to level of aptitude
D. adding more items to the test that differ from the original items in terms of content and administering the test to a new sample that is more heterogeneous with regard to level of aptitude
Answer C is correct. Reliability coefficients tend to be larger for longer tests than for shorter ones, as long as the added items cover content similar to that of the original items. Reliability coefficients also tend to be larger when the tryout sample is heterogeneous with regard to the attribute measured by the test, because heterogeneity produces an unrestricted range of scores.
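The effect of lengthening can be projected with the general Spearman-Brown prophecy formula, sketched here with the .45 coefficient from the question (the lengthening factors are illustrative):

```python
def spearman_brown(r_original: float, n: float) -> float:
    """Projected reliability when a test is lengthened by a factor of n,
    assuming the added items are comparable to the originals
    (Spearman-Brown prophecy formula)."""
    return (n * r_original) / (1 + (n - 1) * r_original)

# Starting from r = .45: doubling the test projects about .62,
# and quadrupling it projects about .77.
print(round(spearman_brown(0.45, 2), 2))  # 0.62
print(round(spearman_brown(0.45, 4), 2))  # 0.77
```

The formula assumes the new items behave like the old ones, which is why option C specifies items that are similar in content to the originals.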
When a test item has an item discrimination index (D) of _____, this means that the same percentage of examinees in the high-scoring and low-scoring groups answered the item correctly.
A. 0
B. .50
C. -1.0
D. +1.0
Answer A is correct. The item discrimination index (D) indicates the difference between the percentage of examinees with high total test scores who answered the item correctly and the percentage of examinees with low total test scores who answered the item correctly. When the same percentage of examinees in the two groups answered the item correctly, D equals 0.
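The computation of D is a simple difference of proportions, sketched below with invented group counts:

```python
def discrimination_index(high_correct, high_n, low_correct, low_n):
    """Item discrimination index:
    D = proportion correct in the high-scoring group
      - proportion correct in the low-scoring group."""
    return high_correct / high_n - low_correct / low_n

# Same proportion correct in both groups (15 of 20 each) -> D = 0.
print(discrimination_index(15, 20, 15, 20))  # 0.0

# All high scorers correct, no low scorers correct -> D = +1.0.
print(discrimination_index(20, 20, 0, 20))  # 1.0
```

D ranges from -1.0 (only low scorers answer correctly) through 0 (no discrimination) to +1.0 (only high scorers answer correctly).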
A cognitive ability test has a mean of 100, standard deviation of 15, and standard error of measurement of 4. The 95% confidence interval for an obtained score of 110 on this test is:
A. 92 to 108.
B. 106 to 116.
C. 102 to 118.
D. 98 to 122.
Answer C is correct. The 95% confidence interval for an obtained test score is calculated by adding and subtracting two standard errors of measurement to and from the score. For the situation described in this question, the 95% confidence interval is 110 ± 2(4) = 110 ± 8, which is 102 to 118.
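The arithmetic can be sketched as a short helper (the function name is illustrative; 2 is used as a convenient approximation of the exact z value of 1.96):

```python
def confidence_interval_95(score: float, sem: float):
    """Approximate 95% confidence interval for an obtained score:
    score plus or minus two standard errors of measurement."""
    return score - 2 * sem, score + 2 * sem

print(confidence_interval_95(110, 4))  # (102, 118)
```

Note that the test's standard deviation (15) is not used here; the confidence interval around an obtained score depends only on the standard error of measurement.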