Test Construction Flashcards
When using % agreement for inter-rater reliability, one problem could be that
it overestimates reliability due to chance agreement
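A minimal sketch of why percent agreement can mislead. It compares percent agreement with Cohen's kappa, which subtracts the agreement expected by chance. The ratings are hypothetical, and the function names are illustrative rather than from any particular library.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters assign the same category."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for chance (undefined if chance agreement is 1)."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters independently pick the same category.
    categories = set(counts_a) | set(counts_b)
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings: both raters say "yes" 90% of the time,
# so a lot of their agreement is expected by chance alone.
rater_a = ["yes"] * 9 + ["no"]
rater_b = ["yes"] * 8 + ["no", "yes"]

pa = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"percent agreement = {pa:.2f}, kappa = {kappa:.2f}")  # 0.80, -0.11
```

Here the raters agree on 80% of cases, yet kappa is slightly negative because 82% agreement was expected by chance.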
When a test is cross-validated by administering it to another sample, ____ may occur: the validity coefficient is likely to be smaller with the second sample than with the first.
shrinkage
The multitrait-multimethod matrix is important for evaluating a test’s
construct validity
The Kuder-Richardson Formula 20 (KR-20) can be used to estimate a test’s ____________ reliability when test items are scored dichotomously.
internal consistency
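A minimal sketch of KR-20, KR-20 = k/(k − 1) × (1 − Σpq / σ²), where k is the number of items, p is the proportion passing each item, q = 1 − p, and σ² is the variance of total scores. It assumes population variances (dividing by n); some texts use the sample variance instead. The score matrix is hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for 0/1 item scores (rows = examinees, columns = items)."""
    n_items = len(responses[0])
    n_people = len(responses)
    total_var = pvariance([sum(row) for row in responses])
    # Sum of p * q across items.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_var)

# Hypothetical scores for 5 examinees on 4 dichotomously scored items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(scores):.2f}")  # 0.80
```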
A test’s reliability coefficient can range from
0 to 1
Rule of thumb: .90 or higher is often required for high-stakes tests; .70 or higher is often required for many other tests.
Formula for standard error of measurement
SEM = SD × √(1 − reliability coefficient)
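A short worked example of the SEM formula, and of its usual use: building a confidence interval around an examinee's obtained score. The SD, reliability, and obtained score are hypothetical; z = 1.96 corresponds to roughly 95% confidence.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(obtained_score, sem, z=1.96):
    """Interval around an obtained score; z = 1.96 gives roughly 95% confidence."""
    return obtained_score - z * sem, obtained_score + z * sem

# Hypothetical test: SD = 15, reliability = .91, obtained score = 110.
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(110, sem)
print(f"SEM = {sem:.1f}; 95% CI = {low:.1f} to {high:.1f}")  # SEM = 4.5; 95% CI = 101.2 to 118.8
```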
What would you use to construct a confidence interval around an examinee’s predicted criterion score?
Standard error of estimate
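A minimal sketch, assuming simple linear regression with one predictor: the predicted criterion score is Y' = mean_Y + r × (SD_Y / SD_X) × (X − mean_X), and the standard error of estimate is SD_Y × √(1 − r²). All numbers are hypothetical.

```python
import math

def predicted_criterion(x, mean_x, sd_x, mean_y, sd_y, r):
    """Regression prediction of a criterion score from one predictor score."""
    return mean_y + r * (sd_y / sd_x) * (x - mean_x)

def standard_error_of_estimate(sd_y, r):
    """SEest = SD of criterion * sqrt(1 - validity coefficient squared)."""
    return sd_y * math.sqrt(1 - r ** 2)

# Hypothetical: predictor mean 50 (SD 10), criterion mean 64 (SD 10), validity r = .60.
y_hat = predicted_criterion(60, mean_x=50, sd_x=10, mean_y=64, sd_y=10, r=0.6)
se_est = standard_error_of_estimate(10, 0.6)
low, high = y_hat - 1.96 * se_est, y_hat + 1.96 * se_est
print(f"predicted = {y_hat:.1f}, SEest = {se_est:.1f}, 95% CI = {low:.2f} to {high:.2f}")
```

Unlike the SEM, which brackets an obtained test score, the standard error of estimate brackets a score predicted on the criterion.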
Murray’s theory of personality, which describes personality as the result of internal and external forces, led to the development of which personality test?
Thematic Apperception Test (TAT)
construct versus content validity
construct: does the test measure the hypothetical trait (construct) it is intended to measure?
content: do the test’s items adequately sample the content or behavior domain it is intended to cover?
the proportion of variability in obtained test scores that is due to true score variability
reliability coefficient
Spearman-Brown prophecy formula is used to
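correct the split-half reliability coefficient, which underestimates the whole test’s reliability because each half is only half as long as the full test (more generally, it estimates how lengthening or shortening a test changes its reliability)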
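A worked example of the Spearman-Brown formula, r_new = n·r / (1 + (n − 1)·r), where n is the factor by which test length changes. For the split-half correction, n = 2. The half-test correlation of .70 is hypothetical.

```python
def spearman_brown(r, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# The correlation between the two halves describes a test half as long,
# so it is stepped up to full length with a factor of 2.
half_r = 0.70
full_r = spearman_brown(half_r, 2)
print(f"corrected split-half reliability = {full_r:.2f}")  # 0.82
```

A factor below 1 (for example, 0.5) estimates the effect of shortening a test instead.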
Cronbach’s alpha and KR-20 measure: ____
whereas Cohen’s kappa and Kendall’s coefficient of concordance measure: ____
Cronbach’s alpha and KR-20 measure: internal consistency
Cohen’s kappa and Kendall’s coefficient of concordance measure: inter-rater reliability
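A minimal sketch of Cronbach's alpha, alpha = k/(k − 1) × (1 − Σ item variances / total-score variance), assuming population variances. For 0/1 items, an item's variance is p × q, so alpha reduces to KR-20. The response matrices are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha (rows = examinees, columns = items), using population variances."""
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 rating-scale responses: 4 examinees, 3 items.
likert = [[4, 5, 4], [3, 4, 3], [2, 2, 3], [1, 2, 1]]
# Hypothetical dichotomous (0/1) responses: 5 examinees, 4 items.
dichotomous = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]

print(f"alpha (rating scale) = {cronbach_alpha(likert):.2f}")  # 0.95
print(f"alpha (0/1 items) = {cronbach_alpha(dichotomous):.2f}")  # 0.80, same as KR-20 here
```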
Percentage scores (40/80 correct answers = 50%) and expectancy tables are examples of
criterion referenced scores
Another name for the percentage of variability in one variable that is accounted for by another
coefficient of determination
a method for developing personality inventories in which the items (presumed to measure one or more traits) are created and then administered to a criterion group of people known to possess a certain characteristic (e.g., antisocial behavior, significant anxiety, exaggerated concern about physical health) and to a control group of people without the characteristic. Only those items that demonstrate an ability to distinguish between the two groups are chosen for inclusion in the final inventory.
empirical criterion keying
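A simplified sketch of the item-selection step in empirical criterion keying: compare how often the criterion group and the control group answer "true" to each item, and keep only the items that separate the groups, whatever their content. The fixed cutoff `min_difference` and the function names are hypothetical; a real developer would more likely use a significance test on each item. Responses are invented, and item indices are zero-based.

```python
def endorsement_rates(group):
    """Proportion of a group answering "true" (coded 1) to each item."""
    n = len(group)
    return [sum(person[j] for person in group) / n for j in range(len(group[0]))]

def select_items(criterion_group, control_group, min_difference=0.25):
    """Return {item index: keyed direction} for items that distinguish the groups."""
    crit = endorsement_rates(criterion_group)
    ctrl = endorsement_rates(control_group)
    keyed = {}
    for j, (p_crit, p_ctrl) in enumerate(zip(crit, ctrl)):
        diff = p_crit - p_ctrl
        if abs(diff) >= min_difference:
            # Key the item in whichever direction the criterion group tends to answer.
            keyed[j] = "true" if diff > 0 else "false"
    return keyed

# Hypothetical true/false responses (1 = "true") to 4 items.
criterion_group = [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0], [1, 1, 1, 0]]
control_group = [[0, 1, 0, 1], [0, 1, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1]]

print(select_items(criterion_group, control_group))  # {0: 'true', 3: 'false'}
```

Items 1 and 2 are dropped because both groups endorse them equally often.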