Ch. 5 - Reliability Flashcards
alternate forms
different versions of the same test or measure;
contrast with parallel forms
alternate-forms reliability
estimate of the extent to which item sampling and other errors have affected scores on two versions of the same test;
contrast with parallel-forms reliability
bias
a factor inherent within a test that systematically prevents accurate, impartial measurement
classical test theory (CTT)
aka ‘true score theory / model’ …
system of assumptions about measurement that includes the notion that a test score (and even a response to an individual item) is composed of a relatively stable component that actually is what the test or individual item is designed to measure, as well as a component that is error.
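As a compact way to remember this card, the true score model is often written as below. The symbols are assumptions added for illustration and do not appear on the card: X is the observed score, T the stable (true) component, E the error component, and r_xx the reliability coefficient. The variance decomposition further assumes that T and E are uncorrelated.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2}
```

Read this way, reliability is the share of observed-score variance attributable to true-score variance, and error variance is whatever remains.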
coefficient alpha
aka ‘Cronbach’s alpha’ and alpha…
a statistic widely employed in test construction and used to assist in deriving an estimate of reliability
more technically, equal to the mean of all possible split-half reliabilities
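A minimal sketch of how coefficient alpha might be computed, assuming the common variance-based formula (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the small score matrix are hypothetical, made up for illustration only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a score matrix.

    scores: list of rows, one row per test taker, one column per item.
    Uses sample variances (n - 1 in the denominator) throughout.
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical data: 5 test takers, 3 items.
data = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))  # 0.933
```

In this made-up data the items rise and fall together across test takers, so the total-score variance is large relative to the summed item variances and alpha comes out high.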
coefficient of equivalence
an estimate of parallel-forms reliability or alternate-forms reliability
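One way to picture a coefficient of equivalence is as the correlation between scores on two forms taken by the same people. The sketch below assumes a Pearson correlation is used; the function name `pearson_r` and the scores on the two forms are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five test takers on two forms.
form_a = [10, 12, 14, 16, 18]
form_b = [11, 13, 13, 17, 19]
r = pearson_r(form_a, form_b)
print(round(r, 3))  # 0.962
```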
coefficient of generalizability
index of the influence that particular facets have on a test score
coefficient of inter-scorer reliability
estimate of the degree of consistency among scorers in the scoring of a test
coefficient of stability
estimate of test-retest reliability obtained during time intervals of six months or longer
confidence interval
range or band of test scores that is likely to contain the “true score”
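A rough sketch of how such a band might be built, assuming the standard error of measurement is computed as SD * sqrt(1 - reliability) and the interval is the observed score plus or minus z times that value. The formula is not stated on the card, and the function name `confidence_interval` and the numbers are hypothetical.

```python
from math import sqrt


def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score, based on the standard error of measurement.

    z=1.96 corresponds to roughly a 95% interval under a normal error model.
    """
    sem = sd * sqrt(1 - reliability)   # standard error of measurement
    margin = z * sem
    return observed - margin, observed + margin


# Hypothetical test: SD = 15, reliability = .91, observed score = 100.
low, high = confidence_interval(100, 15, 0.91)
print(round(low, 2), round(high, 2))  # 91.18 108.82
```

Higher reliability shrinks the standard error of measurement, so the band around the observed score narrows.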
content sampling
variety of the subject matter contained in the items;
aka item sampling, in context of variation between individual test items in a test or between test items in two or more tests
criterion-referenced test
aka ‘domain-referenced testing’ and ‘content-referenced testing’
method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard (or criterion)
contrast with norm-referenced testing and assessment
decision study
conducted at the conclusion of a generalizability study, this research is designed to explore the utility and value of test scores in making decisions
dichotomous test item
test item or question that can be answered with only one of two response options (true/false, yes/no)
discrimination
in IRT, degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured by a test
domain sampling theory
a system of assumptions about measurement that includes the notion that a test score (and even response to an individual item) consists of a relatively stable component that actually is what the test or individual item is designed to measure as well as relatively unstable components that collectively can be accounted for as error
dynamic characteristic
a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences
contrast with static characteristic
error variance
in true score model…
component of variance attributable to random sources irrelevant to the trait or ability the test purports to measure in an observed score or distribution of scores
common sources of error variance include those related to test construction (including item or content sampling), test administration, and test scoring and interpretation
estimate of inter-item consistency
an estimate of the reliability of a test obtained from a measure of inter-item consistency
facet
in generalizability theory…
variables of interest in the universe, including the number of items in the test, the amount of training the test scorers have had, the purpose of the test administration, etc.
generalizability theory
aka domain sampling theory
system of assumptions about measurement that includes the notion that a test score (and even a response to an individual item) consists of a relatively stable component that actually is what the test or individual item is designed to measure as well as relatively unstable components that collectively can be accounted for as error
generalizability study
in context of generalizability theory…
research conducted to explore the impact of different facets of the universe on a test score
heterogeneity
more generally, having diverse contents
a heterogeneous test measures multiple factors