Test Construction Project Quizzes Flashcards
You would use which of the following to evaluate the internal consistency of a test when all of its items are scored as correct or incorrect? A. Kendall’s coefficient of concordance B. Cohen’s kappa coefficient C. Kuder-Richardson 20 D. Spearman-Brown
Answer C is correct. Kuder-Richardson 20 (KR-20) can be used to assess a test’s internal consistency reliability when test items are dichotomously scored (e.g., as correct or incorrect).
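For illustration, here's a minimal Python sketch of the KR-20 computation, KR-20 = (k / (k - 1)) * (1 - sum(pq) / total-score variance), where p and q are the proportions answering each item correctly and incorrectly. The response matrix below is hypothetical:

```python
import numpy as np

# Hypothetical dichotomous item responses: rows are examinees,
# columns are items (1 = correct, 0 = incorrect).
scores = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
])

k = scores.shape[1]                   # number of items
p = scores.mean(axis=0)               # proportion correct per item
q = 1 - p                             # proportion incorrect per item
total_var = scores.sum(axis=1).var()  # variance of total test scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 = {kr20:.2f}")
```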
An educational psychologist administers her newly developed aptitude test to a sample of examinees and is disappointed that the test’s reliability coefficient is only .45. Which of the following is likely to be most helpful for increasing the test’s reliability coefficient?
A. adding more items to the test that are similar to the original items in terms of content and administering the test to a new sample that is more homogeneous with regard to level of aptitude
B. adding more items to the test that differ from the original items in terms of content and administering the test to a new sample that is more homogeneous with regard to level of aptitude
C. adding more items to the test that are similar to the original items in terms of content and administering the test to a new sample that is more heterogeneous with regard to level of aptitude
D. adding more items to the test that differ from the original items in terms of content and administering the test to a new sample that is more heterogeneous with regard to level of aptitude
Answer C is correct. Reliability coefficients tend to be larger for longer tests than for shorter ones, as long as the added items address the same content as the original items. They also tend to be larger when the tryout sample is heterogeneous with regard to the attribute measured by the test, because heterogeneity ensures an unrestricted range of scores.
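The Spearman-Brown prophecy formula quantifies the effect of lengthening: predicted r = kr / (1 + (k - 1)r), where k is the factor by which the test is lengthened. A quick sketch using the .45 coefficient from this question (the numbers follow from the question, not from real data):

```python
def spearman_brown(r, k):
    """Predicted reliability when test length is multiplied by k."""
    return (k * r) / (1 + (k - 1) * r)

# Starting from the reliability of .45 described above:
for k in (2, 3):
    print(f"{k}x length -> predicted r = {spearman_brown(0.45, k):.2f}")
# 2x -> .62, 3x -> .71
```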
The slope of an item characteristic curve provides information on which of the following? A. the probability of guessing correctly B. the degree of relevance C. item difficulty D. item discrimination
Answer D is correct. An item’s ability to discriminate between examinees with high and low levels of the latent ability assessed by the test is indicated by the slope (steepness) of the item characteristic curve: the steeper the slope, the better the discrimination.
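In item response theory, the item characteristic curve is often modeled with the three-parameter logistic function, in which a sets the slope (discrimination), b the difficulty, and c the probability of guessing correctly, which ties the question's answer options together. A sketch with hypothetical parameter values:

```python
import numpy as np

def icc(theta, a, b, c=0.0):
    """Three-parameter logistic ICC: a = discrimination (slope),
    b = difficulty, c = lower asymptote (guessing)."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
print(np.round(icc(theta, a=2.0, b=0.0), 2))  # steep curve: high discrimination
print(np.round(icc(theta, a=0.5, b=0.0), 2))  # flat curve: low discrimination
```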
When a test item has an item discrimination index (D) of \_\_\_\_\_, this means that the same percentage of examinees in the high-scoring and low-scoring groups answered the item correctly. A. 0 B. .50 C. -1.0 D. +1.0
Answer A is correct. The item discrimination index (D) indicates the difference between the percentage of examinees with high total test scores who answered the item correctly and the percentage of examinees with low total test scores who answered the item correctly. When the same percentage of examinees in the two groups answered the item correctly, D equals 0.
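A minimal sketch of the calculation, assuming examinees have already been sorted into high- and low-scoring groups on the total test (the item responses are hypothetical):

```python
# 1 = answered the item correctly, 0 = answered it incorrectly.
high_group = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # 80% correct
low_group  = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]   # 40% correct

p_high = sum(high_group) / len(high_group)
p_low = sum(low_group) / len(low_group)
D = p_high - p_low
print(f"D = {D:.2f}")  # .40; equal percentages in both groups would give D = 0
```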
Which of the following would be used to assess the inter-rater reliability of a rating scale designed to help clinicians distinguish between children who either do or do not meet the DSM criteria for a diagnosis of ADHD? A. Kuder-Richardson 20 B. Spearman-Brown C. Cronbach’s coefficient alpha D. Cohen’s kappa coefficient
Answer D is correct. The kappa coefficient is used to assess the consistency of ratings assigned by two raters when the ratings represent a nominal scale (e.g., when a rating scale classifies children as either meeting or not meeting the diagnostic criteria for ADHD).
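Kappa can be computed by hand as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal frequencies. A sketch with hypothetical classifications from two clinicians:

```python
from collections import Counter

rater1 = ["yes", "yes", "no", "no", "yes", "no", "no", "yes", "no", "no"]
rater2 = ["yes", "no",  "no", "no", "yes", "no", "yes", "yes", "no", "no"]

n = len(rater1)
p_o = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement

c1, c2 = Counter(rater1), Counter(rater2)
p_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in c1)  # chance agreement

kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.2f}")  # .58 for these hypothetical ratings
```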
A cognitive ability test has a mean of 100, standard deviation of 15, and standard error of measurement of 4. The 95% confidence interval for an obtained score of 110 on this test is: A. 92 to 108. B. 106 to 116. C. 102 to 118. D. 98 to 122.
Answer C is correct. The 95% confidence interval for an obtained test score is calculated by adding and subtracting two standard errors of measurement to and from the score. For the situation described in this question, the 95% confidence interval is 110 plus and minus 8 points (two standard errors), which is 102 to 118.
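The arithmetic, using the conventional two-SEM approximation of the 1.96 multiplier:

```python
obtained = 110
sem = 4                                   # standard error of measurement
lower, upper = obtained - 2 * sem, obtained + 2 * sem
print(f"95% CI: {lower} to {upper}")      # 102 to 118
```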
After using the split-half method to estimate a test’s reliability, you would use which of the following to correct the split-half reliability coefficient? A. coefficient of determination B. correction for attenuation C. Kuder-Richardson formula D. Spearman-Brown formula
Answer D is correct. Test length is one of the factors that affects the size of the reliability coefficient, and the Spearman-Brown formula is often used to estimate the effect of lengthening or shortening a test on its reliability coefficient. The formula is especially useful for correcting the split-half reliability coefficient: assessing split-half reliability involves splitting the test in half and correlating the two halves, and because each half is only half the length of the full test, the split-half coefficient tends to underestimate the test's actual reliability. The Spearman-Brown formula is therefore used to estimate the reliability coefficient for the full-length test. [The correction for attenuation formula (answer B) isn't mentioned in the content summary; it is used to estimate the effect of increasing the reliability of the predictor and/or criterion on the predictor's criterion-related validity coefficient.]
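A sketch of the whole procedure with hypothetical half-test scores: correlate the two halves, then apply the Spearman-Brown formula with length factor k = 2 to estimate the full-length test's reliability:

```python
import numpy as np

# Hypothetical total scores on the two halves of a split test:
half1 = np.array([10, 12, 9, 15, 11, 8, 14, 13])
half2 = np.array([11, 11, 8, 14, 12, 9, 13, 14])

r_half = np.corrcoef(half1, half2)[0, 1]  # half-test reliability
r_full = (2 * r_half) / (1 + r_half)      # Spearman-Brown with k = 2
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```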
Classical test theory predicts that total variability in test scores is the result of true score variability: A. plus random error. B. plus systematic error. C. minus systematic error. D. minus systematic and random error.
Answer A is correct. Classical test theory is also known as true score theory and predicts that obtained test scores (X) are due to a combination of true score variability (T) and measurement error (E), with measurement error referring to random factors that affect test performance in unpredictable ways.
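A quick simulation illustrates the model X = T + E: when true scores and random error are independent, the observed-score variance is (approximately, in a finite sample) the sum of the true-score and error variances, and reliability is the true-score share of that total. The distribution parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true = rng.normal(100, 12, n)    # true scores (T)
error = rng.normal(0, 6, n)      # random measurement error (E)
observed = true + error          # obtained scores: X = T + E

print(observed.var())                 # ~180, i.e., ~144 + 36
print(true.var() / observed.var())    # reliability, ~144/180 = .80
```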
An achievement test has a test-retest reliability coefficient of .90. This means that \_\_\_\_ of variability in test scores is due to true score variability. A. 90% B. 84% C. 16% D. 10%
Answer A is correct. A reliability coefficient is interpreted directly as the amount of variability in test scores that’s due to true score variability. When a test’s reliability coefficient is .90, this means that 90% of variability in test scores is due to true score variability and the remaining 10% is due to measurement error.
A factor analysis yields three factors, and one of the tests included in the analysis, Test A, has factor loadings of .60, .30, and .10 on Factors I, II, and III, respectively. Assuming that the rotation was orthogonal, the communality for Test A is: A. .91. B. .54. C. .46. D. .09.
Answer C is correct. When factors are orthogonal (uncorrelated), you can add the squared factor loadings to determine a test’s communality: .60 squared is .36, .30 squared is .09, and .10 squared is .01; .36 plus .09 plus .01 is .46.
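The same arithmetic in code, using the loadings from the question:

```python
loadings = [0.60, 0.30, 0.10]   # Test A's loadings on Factors I, II, III
communality = sum(l ** 2 for l in loadings)
print(f"communality = {communality:.2f}")  # .36 + .09 + .01 = .46
```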
The results of a factor analysis indicate that the factor loading for Test A and Factor I is .70. This means that:
A. 70% of variability in Test A scores is explained by the factor analysis.
B. 49% of variability in Test A scores is explained by the factor analysis.
C. 70% of variability in Test A scores is explained by Factor I.
D. 49% of variability in Test A scores is explained by Factor I.
Answer D is correct. A factor loading indicates the correlation between a test and an identified factor. It can be squared to determine the amount of variability in test scores that’s explained by the identified factor: .70 squared is .49 (49%).
In the context of factor analysis, “orthogonal” means: A. correlated. B. uncorrelated. C. statistically significant. D. statistically insignificant.
Answer B is correct. In factor analysis, factors that are uncorrelated are referred to as orthogonal while factors that are correlated are referred to as oblique.
A test has \_\_\_\_\_\_\_\_\_\_ validity when it has low correlations with measures of unrelated characteristics. A. discriminant B. concurrent C. convergent D. differential
Answer A is correct. For the exam, you want to know that discriminant validity is another name for divergent validity and that a test has discriminant (divergent) validity when scores on the test have low correlations with scores on measures of unrelated traits.
When using the multitrait-multimethod matrix, a large \_\_\_\_\_\_\_\_\_\_\_\_ coefficient provides evidence of a test’s convergent validity. A. heterotrait-heteromethod B. heterotrait-monomethod C. monotrait-heteromethod D. monotrait-monomethod
Answer C is correct. The monotrait-heteromethod coefficient indicates the correlation between the test that’s being validated and a measure of the same trait (monotrait) obtained with a different method (heteromethod). It provides evidence of the test’s convergent validity when it’s large.
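One way to keep the four coefficient types straight: each measure in the matrix is a (trait, method) pair, and the label simply records whether two measures share the trait, the method, both, or neither. A sketch with hypothetical measures:

```python
def mtmm_label(m1, m2):
    """Label the MTMM cell for two (trait, method) measures."""
    trait = "monotrait" if m1[0] == m2[0] else "heterotrait"
    method = "monomethod" if m1[1] == m2[1] else "heteromethod"
    return f"{trait}-{method}"

anxiety_self = ("anxiety", "self-report")
anxiety_obs = ("anxiety", "observer rating")
depression_self = ("depression", "self-report")

print(mtmm_label(anxiety_self, anxiety_obs))      # monotrait-heteromethod (convergent)
print(mtmm_label(anxiety_self, depression_self))  # heterotrait-monomethod (discriminant)
```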
To ensure that job applicants believe that questions in a job selection test are relevant to job performance so they do their best when answering them, the test developer will want to make sure the test has adequate \_\_\_\_\_\_\_\_ validity. A. content B. face C. convergent D. divergent
Answer B is correct. Face validity refers to the extent to which test items “look valid” to examinees. It’s not an actual type of test validity but is often important for ensuring that examinees are motivated to do their best when answering test items. (Because adequate content validity does not necessarily guarantee adequate face validity, answer B is better than answer A.)