Test construction Flashcards
Classical Test Theory, which underlies reliability, is based on the assumption that
an obtained test score (X) is the sum of a true score (T) and measurement error (E): i.e., X = T + E.
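In standard CTT notation (the symbols and variance decomposition below are the usual textbook ones, added here as a sketch rather than quoted from the cards):

```latex
X = T + E                               % obtained score = true score + random error
\sigma^2_X = \sigma^2_T + \sigma^2_E    % T and E are assumed uncorrelated, so variances add
r_{xx} = \frac{\sigma^2_T}{\sigma^2_X}  % reliability coefficient: proportion of true score variance
```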
measurement error is due to
random factors that affect the test performance of examinees in unpredictable ways
Test reliability refers to
the extent to which a test provides consistent information
interpretation of reliability coefficient
A reliability coefficient is interpreted directly as the proportion of variability in obtained test scores that is due to true score variability. For instance, if a test has a reliability coefficient of .80, this means that 80% of the variability in obtained test scores is due to true score variability and the remaining 20% is due to measurement error.
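A rough numerical illustration of this interpretation, as a sketch: the NumPy simulation, variances, and sample size below are all assumptions, chosen so that about 80% of obtained-score variance is true-score variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: true scores with variance ~80 and random error with variance ~20,
# so obtained scores X = T + E have variance ~100.
true_scores = rng.normal(loc=100, scale=np.sqrt(80), size=100_000)  # T
errors = rng.normal(loc=0, scale=np.sqrt(20), size=100_000)         # E
obtained = true_scores + errors                                     # X = T + E

# Proportion of obtained-score variance that is true-score variance
reliability = true_scores.var() / obtained.var()
print(round(reliability, 2))  # ~0.80
```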
4 Methods for Estimating Reliability:
There are four main methods for assessing a test’s reliability: test-retest, alternate forms, internal consistency, and inter-rater.
Test-retest reliability provides information about
the consistency of scores over time.
test-retest reliability involves
administering the test to a sample of examinees, re-administering the test to the same examinees at a later time, and correlating the two sets of scores.
Test-retest reliability is useful for tests that are designed to measure a characteristic that’s ___ over time.
stable
Alternate forms reliability provides information about
the consistency of scores over different forms of the test and, when the second form is administered at a later time, the consistency of scores over time.
alternate forms reliability involves
administering one form of the test to a sample of examinees, administering the other form to the same examinees, and correlating the two sets of scores. Alternate forms reliability is important whenever a test has more than one form.
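Computationally, both test-retest and alternate forms coefficients are the Pearson correlation between two sets of scores from the same examinees. A minimal sketch (the scores and the NumPy usage are illustrative assumptions):

```python
import numpy as np

# Hypothetical scores for the same 8 examinees on two administrations
# (time 1 vs. time 2, or Form A vs. Form B)
scores_1 = np.array([85, 92, 78, 88, 95, 70, 82, 90])
scores_2 = np.array([83, 94, 75, 90, 93, 74, 80, 91])

# Reliability coefficient = Pearson correlation between the two score sets
r = np.corrcoef(scores_1, scores_2)[0, 1]
print(round(r, 2))
```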
Internal consistency reliability provides information
on the consistency of scores over different test items and is useful for tests that are designed to measure a single content domain or aspect of behavior.
best reliability measures for speed tests
internal consistency reliability is NOT a good measure for speed tests because it tends to overestimate their reliability. For speed tests, test-retest and alternate forms reliability are appropriate.
3 key methods for INTERNAL CONSISTENCY reliability
Coefficient Alpha (Cronbach’s Alpha)
Kuder-Richardson 20 (KR-20)
Split-Half Reliability
coefficient alpha (Cronbach’s alpha) is used for tests with ___
multiple response formats
kuder-richardson 20 (KR-20) is used for tests with ___
dichotomous (e.g. right/wrong) scoring
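A minimal sketch of coefficient alpha (the function name and response matrix are hypothetical); with dichotomous 0/1 items the same computation is equivalent to KR-20:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an examinees-by-items score matrix.

    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)
    For 0/1 (dichotomous) items this reduces to KR-20.
    """
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical data: 6 examinees x 4 dichotomously scored items
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
])
print(round(cronbach_alpha(responses), 2))
```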
issue with split-half reliability is
a split-half reliability coefficient underestimates a test’s reliability and is usually corrected with the Spearman-Brown prophecy formula, which is used to determine the effects of lengthening or shortening a test on its reliability coefficient.
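A sketch of the correction (the .70 half-test coefficient is made up): the Spearman-Brown prophecy formula predicts reliability when test length is multiplied by a factor n, and the split-half correction uses n = 2.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when test length is multiplied by n:
    r_new = (n * r) / (1 + (n - 1) * r)
    """
    return (n * r) / (1 + (n - 1) * r)

# Correcting a hypothetical split-half coefficient of .70 to full test length (n = 2)
print(round(spearman_brown(0.70, 2), 2))  # 0.82
```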
Inter-rater reliability is important for measures that are ____ scored and provides information on the consistency of scores or ratings assigned by different raters.
subjectively
methods used to evaluate inter-rater reliability:
Percent agreement
Cohen’s kappa coefficient (also known as the kappa statistic)
issue with percent agreement method
Percent agreement can be calculated for two or more raters. A problem with this method is that it does not take chance agreement into account, which can result in an overestimate of reliability.
when ratings represent a nominal scale, the best inter-rater reliability method is
Cohen’s kappa coefficient (the kappa statistic), which is one of several inter-rater reliability coefficients that are corrected for chance agreement between raters.
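A sketch comparing the two methods with hypothetical nominal ratings: kappa subtracts the agreement expected by chance (from each rater's marginal category proportions), so it comes out lower than raw percent agreement.

```python
from collections import Counter

# Hypothetical nominal ratings from two raters on 10 cases
rater1 = ["A", "A", "B", "B", "A", "C", "B", "A", "C", "B"]
rater2 = ["A", "B", "B", "B", "A", "C", "A", "A", "C", "B"]
n = len(rater1)

# Percent agreement: proportion of cases on which the raters agree
p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement expected from each rater's marginal category proportions
c1, c2 = Counter(rater1), Counter(rater2)
p_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(rater1) | set(rater2))

# Cohen's kappa: observed agreement corrected for chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 2), round(kappa, 2))  # 0.8 0.69 -- kappa is lower after the chance correction
```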
The reliability of subjective ratings can be affected by _____. It occurs when two or more raters communicate with each other while assigning ratings, which results in increased consistency (but often decreased accuracy) in ratings and an overestimate of inter-rater reliability.
consensual observer drift.
3 Factors that Affect the Reliability Coefficient
content homogeneity
range of scores
guessing
when a test’s reliability coefficient is .81, the reliability index is the ____
square root of .81, which is .90.
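Written out in standard notation (an added note, not from the cards): the reliability index estimates the correlation between obtained scores and true scores, which is why it equals the square root of the reliability coefficient.

```latex
r_{XT} = \sqrt{r_{XX}} = \sqrt{.81} = .90
```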
For dichotomously scored items, an item’s difficulty level (p) indicates the percentage of examinees who
answered the item correctly.
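As a small sketch (the 0/1 response vector is made up), p is the proportion of examinees scoring 1 (correct) on the item:

```python
# Hypothetical dichotomous responses to a single item (1 = correct, 0 = incorrect)
item_responses = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]

p = sum(item_responses) / len(item_responses)
print(p)  # 0.7 -> 70% of examinees answered the item correctly
```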