Test Construction Flashcards
true score variability
variability due to real differences in ability or knowledge in the test-takers
error variability
variability caused by chance or random factors
classical test theory
observed score = true score + error (X = T + E); equivalently, total score variance = true score variance + error variance
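A minimal simulation of this decomposition, assuming numpy and hypothetical score distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(100, 15, size=10_000)  # real differences in ability
error = rng.normal(0, 5, size=10_000)           # chance/random factors
observed = true_scores + error                  # X = T + E

# True scores and error are uncorrelated, so their variances add:
print(observed.var())                   # ~ 250
print(true_scores.var() + error.var())  # ~ 250 (15^2 + 5^2)
```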
reliability
the amount of consistency, repeatability, and dependability in scores obtained on a test
reliability coefficient
- represented as ‘r’
- ranges from 0.00 to 1.00
- 0.80 is generally considered the minimum acceptable value
- two factors that affect the size are: range of test scores and the homogeneity of test content
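In classical test theory terms, the reliability coefficient is the proportion of observed score variance that is true score variance (a standard identity consistent with the cards above):

```latex
r_{xx} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{observed}}}
       = 1 - \frac{\sigma^2_{\text{error}}}{\sigma^2_{\text{observed}}}
```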
sources of errors in tests
- content sampling
- time sampling (ex - forgetting over time)
- test heterogeneity
factors that affect reliability
- number of items (the more the better)
- homogeneity (the more similar the items are, the better)
- range of scores (the greater the range, the better)
- ability to guess (true/false items are easiest to guess, making them the least reliable)
test-retest reliability
(or coefficient of stability)
- correlating pairs of scores from the same sample of people who are administered the identical test at two points in time
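A sketch of computing a coefficient of stability, using hypothetical scores and scipy’s pearsonr:

```python
from scipy.stats import pearsonr

# Hypothetical scores for the same five people at two points in time
time1 = [82, 90, 75, 88, 95]
time2 = [80, 92, 78, 85, 96]

r, _ = pearsonr(time1, time2)
print(f"test-retest reliability: r = {r:.2f}")
```

The same correlation approach applies to parallel forms reliability, with scores on form A and form B in place of the two administrations.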
parallel forms reliability
(or coefficient of equivalence)
- correlating the scores obtained by the same group of people on two roughly equivalent but not identical forms of the same test administered at two different points in time
internal consistency reliability
- looks at the consistency of the scores within the test
- 2 ways: Kuder-Richardson or Cronbach’s coefficient alpha
split half reliability
- splitting the test in half (ex - odd vs even numbered questions) and then correlating the scores on the two halves
- because each half contains only half the items, the result underestimates full-test reliability and is usually corrected with the Spearman-Brown formula (see the sketch after the next card)
Spearman-Brown prophecy formula
- a correction formula used with split-half reliability
- estimates how much more reliable the test would be if it were lengthened (e.g., restored to full length after being split in half)
*inappropriate for speeded tests
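A sketch of a split-half estimate with the Spearman-Brown correction; the item matrix below is hypothetical and numpy is assumed:

```python
import numpy as np

# Hypothetical responses: 8 people x 6 dichotomously scored items (1 = correct)
items = np.array([
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 1],
])

odd = items[:, 0::2].sum(axis=1)   # total score on odd-numbered items
even = items[:, 1::2].sum(axis=1)  # total score on even-numbered items
r_half = np.corrcoef(odd, even)[0, 1]

# Spearman-Brown: full-length reliability estimated from the half-test r
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```

The general form, r_new = n·r / (1 + (n − 1)·r), predicts reliability when a test is lengthened by a factor of n; n = 2 recovers the split-half correction above.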
Kuder-Richardson and Cronbach’s coefficient alpha
- involve analysis of the correlation of each item with every other item in the test (reliability/internal consistency)
- KR-20 is used when items are scored dichotomously (ex - right or wrong)
- Cronbach’s is used when items are scored non-dichotomously (ex - Likert scale)
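A sketch of Cronbach’s coefficient alpha from its standard formula; variance conventions (ddof) vary across sources, so treat the exact values as an assumption. Applied to dichotomous (0/1) items, the same computation serves as a KR-20 estimate:

```python
import numpy as np

def cronbach_alpha(items):
    """items: people x items score matrix (rows = people)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert-style responses: 5 people x 4 items
likert = [[4, 5, 4, 5],
          [2, 3, 2, 3],
          [5, 5, 4, 4],
          [3, 2, 3, 3],
          [1, 2, 1, 2]]
print(f"alpha = {cronbach_alpha(likert):.2f}")
```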
interrater reliability
- the degree of agreement between two or more scorers when a test is subjectively scored
- best way to improve = provide opportunity for group discussion, practice exercises, and feedback
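One common chance-corrected index of agreement is Cohen’s kappa (not named on the card, but a standard choice); the ratings below are hypothetical and scikit-learn is assumed:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings of the same ten essays by two scorers (1-5 scale)
rater_a = [3, 4, 2, 5, 3, 4, 1, 2, 5, 3]
rater_b = [3, 4, 2, 4, 3, 4, 2, 2, 5, 3]

print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
```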
validity
- the meaningfulness, usefulness, or accuracy of a measure
3 basic types of validity
- content
- criterion
- construct