Reliability Flashcards
Different versions of the same test or measure; contrast with parallel forms
Alternate forms
A statistic widely employed in test construction and used to assist in deriving an estimate of reliability; more technically, it is equal to the mean of all possible split-half reliabilities
Coefficient alpha, Cronbach’s alpha, or alpha
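A minimal sketch of how coefficient alpha is commonly computed, using the standard variance form alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the row-per-test-taker layout are assumptions made for illustration; they do not come from the flashcards.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a score matrix.

    scores: list of rows, one row per test taker, one column per item
    (hypothetical layout chosen for this sketch).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across test takers.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each test taker's total score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Items that rank test takers identically yield alpha = 1.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(consistent), 3))
```

Population variance is used throughout; sample variance would give the same alpha, provided the same choice is applied to both the item variances and the total-score variance.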
An estimate of parallel-forms reliability or alternate-forms reliability
Coefficient of equivalence
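A hedged illustration: a coefficient of equivalence is typically reported as the correlation between the same test takers' scores on two forms. The sketch below computes a Pearson r by hand; the function name `pearson_r` and the sample scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of four test takers on Form A and Form B.
form_a = [10, 12, 14, 16]
form_b = [11, 13, 15, 17]
print(round(pearson_r(form_a, form_b), 3))
```

The same computation applied to two administrations of one form, rather than two forms, would give a test-retest estimate.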
In generalizability theory, an index of the influence that particular facets have on a test score
Coefficient of generalizability
An estimate of test-retest reliability obtained when the interval between test administrations is six months or longer
Coefficient of stability
A range or band of test scores that is likely to contain the “true score”
Confidence interval
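A rough sketch of one common way to build such a band: take the observed score plus or minus a multiple of the standard error of measurement, SEM = SD * sqrt(1 - reliability). The SEM formula is standard in the true score model but is not stated on these cards, and the function name `true_score_interval` and its parameters are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def true_score_interval(observed, sd, reliability, level=0.95):
    """Band around an observed score that is likely to contain the true score.

    Assumes normally distributed measurement error and
    SEM = sd * sqrt(1 - reliability).
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    if not 0 < level < 1:
        raise ValueError("level must lie strictly between 0 and 1")
    sem = sd * sqrt(1 - reliability)
    # Two-sided critical value, e.g. about 1.96 for a 95% band.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return observed - z * sem, observed + z * sem


# Hypothetical test: SD = 15, reliability = .91, observed score = 100.
low, high = true_score_interval(100, 15, 0.91)
print(round(low, 1), round(high, 1))
```

A more reliable test gives a smaller SEM and therefore a narrower band around the same observed score.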
The variety of the subject matter contained in the items; frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests
Content sampling or item sampling
In the true score model, the component of variance in an observed score or distribution of scores that is attributable to random sources irrelevant to the trait or ability the test purports to measure. Common sources include those related to test construction (including item or content sampling), test administration, and test scoring and interpretation
Error variance
Also referred to as domain sampling theory, a system of assumptions about measurement that includes the notion that a test score, and even a response to an individual item, is composed of a relatively stable component that actually is what the test or individual item is designed to measure, and relatively unstable components that collectively can be accounted for as error
Generalizability theory
A reference to a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedure used and the resulting correlation coefficient tends to be higher as a consequence; contrast with restriction of range
Inflation of range or inflation of variance
An estimate of how consistently the items of a test measure a single construct, obtained from a single administration of a single form of the test by measuring the degree of correlation among all of the test items
Internal consistency or inter-item consistency
An estimate of the degree of agreement or consistency between two or more scorers (or judges or raters or observers)
Inter-scorer reliability, inter-rater reliability, observer reliability, judge reliability, or scorer reliability
A system of assumptions about measurement (including the assumption that the trait being measured by a test is unidimensional) and about the extent to which each test item measures the trait
Item response theory (IRT) or latent-trait theory or the latent-trait model
The variety of the subject matter contained in the items; frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests
Item sampling or content sampling
A measure of inter-scorer reliability originally designed for use when scorers make ratings using nominal scales of measurement
Kappa statistic
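A minimal sketch of one common form of the statistic, Cohen's kappa for two raters assigning nominal categories: kappa = (observed agreement - chance agreement) / (1 - chance agreement). The function name `cohen_kappa` and the sample ratings are hypothetical; designs with more than two raters need a different variant.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' nominal ratings of the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of ratings")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no ratings of four cases by two scorers.
scorer_1 = ["y", "y", "n", "n"]
scorer_2 = ["y", "n", "n", "n"]
print(cohen_kappa(scorer_1, scorer_2))
```

Here the scorers agree on 3 of 4 cases (0.75) while chance agreement is 0.5, so kappa comes to 0.5, noticeably lower than the raw percentage of agreement.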