Test Construction Flashcards
Classical Test Theory
*Assumes that obtained test scores reflect two sources of variability: 1. true score variability and 2. measurement error.
Reliability Coefficient
Ranges from 0 to 1.0.
Indicates the amount of variability in obtained test scores that is due to true score variability.
r = .80 (80% of score variability is due to true score variability and 20% to measurement error).
.70 or higher is considered minimally acceptable for most tests.
Test-retest reliability
Provides info on the consistency of scores over time.
Administer the test at baseline and then again at a later time, then correlate the two sets of scores.
*Useful for tests that measure characteristics that are stable over time.
Alternate Forms Reliability
Provides info on the consistency of scores across different forms of the test, including when the second form is administered at a later time.
*Used when a measure has more than one form.
Internal Consistency Reliability
Provides info on the consistency of scores across different test items.
*Useful for tests that measure a single content domain.
Methods
-Coefficient alpha (administer the test to a group of examinees once and average the inter-item consistencies)
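As a rough illustration (not part of the original card), coefficient alpha can be computed from a matrix of item scores; the function name and sample data below are hypothetical:

```python
import numpy as np

def coefficient_alpha(item_scores):
    """Cronbach's coefficient alpha for a matrix of examinees x items."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                          # number of items
    item_variances = item_scores.var(axis=0, ddof=1)  # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Example: 5 examinees answering 4 items (1 = correct, 0 = incorrect)
scores = [[1, 1, 1, 0],
          [1, 0, 1, 1],
          [0, 0, 1, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
print(round(coefficient_alpha(scores), 2))
```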
Inter-Rater Reliability
*Used for subjectively scored tests
Provides info on the consistency of scores assigned by different raters.
Methods
-Percent agreement
-Cohen's kappa coefficient
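A minimal sketch of Cohen's kappa, which corrects percent agreement for chance agreement (the ratings below are made-up illustration data):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n  # percent agreement
    counts1 = Counter(rater1)
    counts2 = Counter(rater2)
    expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

# Example: two raters classifying 10 responses as "pass" or "fail"
r1 = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]
r2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "fail", "fail"]
print(round(cohens_kappa(r1, r2), 2))
```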
Content Homogeneity
*Factor affecting the size of the reliability coefficient
-Tests whose content is homogeneous have larger reliability coefficients than tests whose content is heterogeneous.
Range of scores
*Factor affecting the size of the reliability coefficient
-Reliability coefficients are larger when test scores are unrestricted in range.
-An unrestricted range occurs when the examinees included in the sample are heterogeneous with regard to the characteristic measured by the test (i.e., include high, moderate, and low scorers).
Guessing
*Factor affecting the size of the reliability coefficient
-The greater the likelihood that test items can be answered correctly by guessing, the lower the reliability coefficient.
Reliability index
*Calculated from the reliability coefficient
-The theoretical correlation between true test scores and obtained test scores.
*When the reliability coefficient is .81, the reliability index is the square root of .81, which is .90.
Item Analysis
Used to determine which items to include in the test.
Used to determine each item's difficulty level and its ability to discriminate between examinees who obtain high and low total test scores.
Item Difficulty
P = the proportion of examinees who answer the item correctly
-Calculated by dividing the number of correct responses by the total number of responses.
Values of P range from 0 to 1.0.
*Moderately difficult items (P = .30 to .70) are preferred for most tests.
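A small sketch of the item difficulty calculation (the response vector is illustrative):

```python
def item_difficulty(responses):
    """Item difficulty index P: proportion of examinees answering correctly.
    `responses` is a list of 1 (correct) / 0 (incorrect) for one item."""
    return sum(responses) / len(responses)

# Example: 7 of 10 examinees answered the item correctly -> P = .70
print(item_difficulty([1, 1, 1, 0, 1, 0, 1, 1, 0, 1]))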
Item Discrimination
D ranges from -1.0 to 1.0.
D is the difference between the percentage of examinees with high total test scores and the percentage of examinees with low total test scores who answered the item correctly.
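A small sketch of the item discrimination calculation, assuming examinees have already been split into high-scoring and low-scoring groups (the data are illustrative):

```python
def item_discrimination(upper_group, lower_group):
    """Item discrimination index D: proportion correct in the high-scoring
    group minus proportion correct in the low-scoring group.
    Each argument is a list of 1 (correct) / 0 (incorrect) for one item."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# Example: 80% of high scorers vs. 30% of low scorers answered correctly -> D = .50
upper = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1]   # 8 of 10 correct
lower = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]   # 3 of 10 correct
print(item_discrimination(upper, lower))
```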
Standard error of measurement & confidence intervals
Standard error of measurement is used to obtain a confidence interval.
-Calculated by multiplying the test's standard deviation by the square root of 1 minus the reliability coefficient.
68% confidence interval = add and subtract 1 standard error of measurement to and from the obtained score.
95% confidence interval = add and subtract 2 standard errors of measurement.
99% confidence interval = add and subtract 3 standard errors of measurement.
(Example: an examinee obtains a score of 90 on a test that has a standard error of measurement of 5, and you're asked to identify the 95% confidence interval for this score. To do so, add and subtract 10 (two standard errors) to and from 90, which gives a 95% confidence interval of 80 to 100.)
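A sketch of both calculations; the SD of 10 and reliability of .75 are illustrative values chosen so the SEM works out to 5, matching the example above:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = test SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sem, n_sems):
    """Add and subtract n_sems standard errors to/from the obtained score."""
    return score - n_sems * sem, score + n_sems * sem

# Example: SD = 10, reliability = .75 -> SEM = 10 * sqrt(.25) = 5
sem = standard_error_of_measurement(sd=10, reliability=0.75)
print(confidence_interval(score=90, sem=sem, n_sems=2))   # (80.0, 100.0) = 95% CI
```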
Item Response Theory
Item based
Focuses on responses to individual test items.
*Used to determine the probability of answering a test item correctly.
*Better suited than classical test theory for developing computerized tests.
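As an illustration only: the flashcard doesn't name a specific model, but the one-parameter (Rasch) logistic model is one common IRT model for the probability of a correct response as a function of examinee ability and item difficulty:

```python
import math

def rasch_probability(ability, difficulty):
    """One-parameter (Rasch) IRT model: probability of a correct response,
    with ability and item difficulty both on the same logit scale."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# An examinee whose ability equals the item's difficulty has a 50% chance of success
print(rasch_probability(ability=0.0, difficulty=0.0))   # 0.5
print(rasch_probability(ability=1.5, difficulty=0.0))   # ~0.82
```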