Test construction Flashcards
Classical test theory is based on the assumption that
obtained test scores (X) are due to a combination of true score variability (T) and measurement error (E): i.e., X = T + E.
measurement error is due to
random factors that affect the test performance of examinees in unpredictable ways
Test reliability refers to
the extent to which a test provides consistent information
interpretation of reliability coefficient
They’re always interpreted directly as the amount of variability in obtained test scores that’s due to true score variability. For instance, if a test has a reliability coefficient of .80, this means that 80% of variability in obtained test scores is due to true score variability and the remaining 20% is due to measurement error.
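Why the coefficient can be read directly this way (a sketch of the standard CTT reasoning, not part of the original card): T and E are assumed to be uncorrelated, so their variances add, and the reliability coefficient is defined as the ratio of true score variance to obtained score variance:

σ²(X) = σ²(T) + σ²(E), and reliability = σ²(T) / σ²(X)

A coefficient of .80 therefore means true score variance accounts for 80% of obtained score variance, leaving the remaining 20% as error variance.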
4 Methods for Estimating Reliability:
There are four main methods for assessing a test’s reliability: test-retest, alternate forms, internal consistency, and inter-rater.
Test-retest reliability provides information about
the consistency of scores over time.
test-retest reliability involves
administering the test to a sample of examinees, re-administering the test to the same examinees at a later time, and correlating the two sets of scores.
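As a concrete sketch of the correlation step (the scores and sample size below are made up for illustration), the test-retest coefficient is simply the Pearson correlation between the two administrations; the same correlation step applies to alternate forms reliability, covered next:

```python
import numpy as np

# Hypothetical scores for the same six examinees at two administrations
time1 = np.array([85, 72, 90, 66, 78, 81])
time2 = np.array([83, 75, 88, 70, 76, 84])

# Test-retest reliability = Pearson correlation between the two score sets
r_tt = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability: {r_tt:.2f}")
```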
Test-retest reliability is useful for tests that are designed to measure a characteristic that’s ___ over time.
stable
Alternate forms reliability provides information about
the consistency of scores over different forms of the test and, when the second form is administered at a later time, the consistency of scores over time.
alternate forms reliability involves
administering one form of the test to a sample of examinees, administering the other form to the same examinees, and correlating the two sets of scores. Alternate forms reliability is important whenever a test has more than one form.
Internal consistency reliability provides information
on the consistency of scores over different test items and is useful for tests that are designed to measure a single content domain or aspect of behavior.
best reliability methods for speed tests
Internal consistency reliability is NOT a good measure for speed tests because it tends to overestimate their reliability. For speed tests, test-retest and alternate forms reliability are appropriate.
3 key methods for INTERNAL CONSISTENCY reliability
Coefficient Alpha (Cronbach’s Alpha)
Kuder-Richardson 20 (KR-20)
Split-Half Reliability
coefficient/cronbach’s alpha used for tests with ___
multiply scored items (e.g., Likert-type ratings) or other multiple response formats
kuder-richardson 20 (KR-20) is used for tests with ___
dichotomous (e.g. right/wrong) scoring
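A minimal sketch of the computation (the function and the response matrix below are mine, for illustration): coefficient alpha is [k/(k − 1)] × (1 − Σ item variances / total score variance), and applying it to dichotomously scored 0/1 items yields KR-20:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: examinees x items score matrix. For 0/1 items this equals KR-20."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical dichotomous (right/wrong) responses: 5 examinees x 4 items
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
print(f"KR-20 / coefficient alpha: {cronbach_alpha(responses):.2f}")
```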
issue with split-half reliability is
Because each half contains only half of the test's items (and, all else being equal, a shorter test is less reliable), a split-half reliability coefficient underestimates a test's reliability. It's usually corrected with the Spearman-Brown prophecy formula, which is used to estimate the effects of lengthening or shortening a test on its reliability coefficient; a sketch follows.
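A small sketch of the Spearman-Brown prophecy formula (function name and sample values are mine): predicted reliability = (k × r) / (1 + (k − 1) × r), where k is the factor by which the test is lengthened. Correcting a split-half coefficient uses k = 2, since each half is only half the full test's length:

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a test is lengthened by a factor of k."""
    return (k * r) / (1 + (k - 1) * r)

# Correct a split-half coefficient of .70 to full test length (k = 2)
print(f"corrected split-half: {spearman_brown(0.70, 2):.2f}")  # ~.82
# Predicted reliability if the same test were cut in half (k = 0.5)
print(f"halved test: {spearman_brown(0.70, 0.5):.2f}")         # ~.54
```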
Inter-rater reliability is important for measures that are ____scored and provides information on the consistency of scores or ratings assigned by different raters.
subjectively
methods used to evaluate inter-rater reliability:
Percent agreement
Cohen’s kappa coefficient (also known as the kappa statistic)
issue with percent agreement method
Percent agreement can be calculated for two or more raters. A problem with this method is that it does not take chance agreement into account, which can result in an overestimate of reliability.
when ratings represent a nominal scale, the best inter-rater reliability method is
Cohen’s kappa coefficient (the kappa statistic), which is one of several inter-rater reliability coefficients that are corrected for chance agreement between raters.
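A minimal sketch contrasting the two methods on made-up ratings from two raters: kappa subtracts the agreement expected by chance (p_e, computed from each rater's marginal category frequencies), so it runs lower than raw percent agreement:

```python
from collections import Counter

# Hypothetical nominal ratings (categories A/B) from two raters for 10 cases
rater1 = ["A", "A", "B", "A", "B", "A", "A", "B", "A", "A"]
rater2 = ["A", "B", "B", "A", "B", "A", "A", "A", "A", "A"]

n = len(rater1)
p_o = sum(a == b for a, b in zip(rater1, rater2)) / n  # percent agreement

# Chance agreement from the raters' marginal proportions
c1, c2 = Counter(rater1), Counter(rater2)
p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(rater1) | set(rater2))

kappa = (p_o - p_e) / (1 - p_e)
print(f"percent agreement: {p_o:.2f}, kappa: {kappa:.2f}")  # .80 vs ~.47
```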
The reliability of subjective ratings can be affected by _____, which occurs when two or more raters communicate with each other while assigning ratings, resulting in increased consistency (but often decreased accuracy) in ratings and an overestimate of inter-rater reliability.
consensual observer drift.
3 Factors that Affect the Reliability Coefficient
content homogeneity (the more homogeneous the test content, the larger the reliability coefficient)
range of scores (an unrestricted range of scores yields a larger reliability coefficient)
guessing (the easier it is to answer items correctly by guessing, the smaller the reliability coefficient)
when a test’s reliability coefficient is .81, the reliability index is the ____
square root of .81, which is .90. (The reliability index estimates the correlation between obtained test scores and true scores.)
For dichotomously scored items, an item’s difficulty level (p) indicates the percentage of examinees who
answered the item correctly.
when 50 of 100 examinees answered an item correctly, the item’s p value is
50/100, or .50.
For mastery tests (tests used to identify examinees who have mastered a certain level of knowledge or skill), ___ p values are preferred.
higher (most examinees who have mastered the material should answer each item correctly, so p values of .80 or .90 may be preferred)
The item discrimination index (D) ranges from ___and indicates the difference between the percentage of examinees with ___ and the percentage of examinees with __
-1.0 to +1.0; high total test scores (often the top 27%) who answered the item correctly; low total test scores (often the bottom 27%) who answered the item correctly.
what is the optimal difficulty level for a four-answer multiple choice question
the chance of choosing the correct answer to a four-answer multiple-choice question by guessing is .25, and the optimal difficulty level for this type of item is halfway between the chance-level p value and 1.0: (1.0 + .25)/2 = 1.25/2 = .625.
when 90% of examinees in the high-scoring group and 20% of examinees in the low-scoring group answered an item correctly, the item’s D value is
.90 minus .20, which is .70.
an item’s difficulty level affects its ability to discriminate, with items of ___ having higher levels of discrimination.
moderate difficulty
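A small sketch pulling the item analysis cards together (function names are mine; the values come from the cards above):

```python
def p_value(n_correct: int, n_total: int) -> float:
    """Item difficulty: proportion of examinees who answered the item correctly."""
    return n_correct / n_total

def optimal_p(n_options: int) -> float:
    """Optimal difficulty corrected for guessing: midpoint between chance and 1.0."""
    chance = 1 / n_options
    return (1.0 + chance) / 2

def discrimination(p_high: float, p_low: float) -> float:
    """Item discrimination index D: proportion correct in the high-scoring group
    minus proportion correct in the low-scoring group."""
    return p_high - p_low

print(f"p: {p_value(50, 100):.2f}")                  # .50
print(f"optimal p (4 options): {optimal_p(4):.3f}")  # .625
print(f"D: {discrimination(0.90, 0.20):.2f}")        # .70
```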
if a test has a standard deviation of 5 and a reliability coefficient of .84, its standard error of measurement equals
5 times the square root of 1 minus .84 (SEM = SD × √(1 − r)):
1 minus .84 is .16,
the square root of .16 is .4,
5 times .4 is 2.
In other words, when a test’s standard deviation is 5 and its reliability coefficient is .84, its standard error of measurement is 2.
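A small sketch tying these computations together (function names are mine): SEM = SD × √(1 − r), and the reliability index from the earlier card is √r. The SEM is commonly used to build a confidence band around an obtained score:

```python
import math

def sem(sd: float, r: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - r)

def reliability_index(r: float) -> float:
    """Estimated correlation between obtained scores and true scores: sqrt(r)."""
    return math.sqrt(r)

print(f"SEM: {sem(5, 0.84):.1f}")                           # 2.0
print(f"reliability index: {reliability_index(0.81):.2f}")  # .90
# e.g., a 68% confidence band around an obtained score of 100 is 100 +/- 2
```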