Ch. 5 - Reliability Flashcards
alternate forms
different versions of the same test or measure;
contrast with parallel forms
alternate-forms reliability
estimate of the extent to which item sampling and other errors have affected scores on two versions of the same test;
contrast with parallel-forms reliability
bias
a factor inherent within a test that systematically prevents accurate, impartial measurement
classical test theory (CTT)
aka ‘true score theory / model’ …
system of assumptions about measurement that includes the notion that a test score (and even a response to an individual item) is composed of a relatively stable component that actually is what the test or individual item is designed to measure, as well as a component that is error.
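in symbols (standard notation, not part of the original card), the assumption is usually sketched as:

```latex
X = T + E
```

where X is the observed score, T the stable "true score" component, and E the error component.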
coefficient alpha
aka ‘Cronbach’s alpha’ and alpha…
a statistic widely employed in test construction and used to assist in deriving an estimate of reliability
more technically, equal to the mean of all possible split-half reliabilities
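a minimal sketch of how alpha is commonly computed, assuming the usual variance-based formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the function name and data layout are illustrative, not from the card:

```python
from statistics import variance


def cronbach_alpha(scores):
    """Estimate coefficient alpha.

    scores: list of rows, one row per test-taker, one column per item.
    """
    k = len(scores[0])  # number of items
    # Variance of each item across test-takers
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each test-taker's total score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

items that rise and fall together across test-takers push alpha toward 1; unrelated items push it toward 0.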
coefficient of equivalence
an estimate of parallel-forms reliability or alternate-forms reliability
coefficient of generalizability
index of the influence that particular facets have on a test score
coefficient of inter-scorer reliability
determines the degree of consistency among scorers in the scoring of a test
coefficient of stability
estimate of test-retest reliability obtained during time intervals of six months or longer
confidence interval
range or band of test scores that is likely to contain the “true score”
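one common way to build such a band (an assumption here, not stated on the card) is observed score plus or minus z times the standard error of measurement; the function and parameter names below are hypothetical:

```python
def confidence_interval(observed, sem, z=1.96):
    """Return (lower, upper) as observed +/- z * SEM.

    z=1.96 corresponds to roughly 95% confidence if errors are
    assumed to be normally distributed.
    """
    margin = z * sem
    return (observed - margin, observed + margin)
```

for example, an observed score of 100 with an SEM of 5 and z = 2 gives a band from 90 to 110.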
content sampling
variety of the subject matter contained in the items;
aka item sampling, in context of variation between individual test items in a test or between test items in two or more tests
criterion-referenced test
aka ‘domain-referenced testing’ and ‘content-referenced testing’
method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard (or criterion)
contrast with norm-referenced testing and assessment
decision study
conducted at the conclusion of a generalizability study, this research is designed to explore the utility and value of test scores in making decisions
dichotomous test item
test item or question that can be answered with only one of two response options (true/false, yes/no)
discrimination
in IRT, degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured by a test
domain sampling theory
a system of assumptions about measurement that includes the notion that a test score (and even response to an individual item) consists of a relatively stable component that actually is what the test or individual item is designed to measure as well as relatively unstable components that collectively can be accounted for as error
dynamic characteristic
a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences
contrast with static characteristic
error variance
in true score model…
component of variance attributable to random sources irrelevant to the trait or ability the test purports to measure in an observed score or distribution of scores
common sources of error variance include those related to test construction (including item or content sampling), test administration, and test scoring and interpretation
estimate of inter-item consistency
an estimate of the reliability of a test obtained from a measure of inter-item consistency
facet
in generalizability theory…
variables of interest in the universe including number of items in the test, amount of training the test scorers have had, purpose of the test administration, etc
generalizability theory
aka domain sampling theory
system of assumptions about measurement that includes the notion that a test score (and response) consists of a relatively stable component that actually is what the test or individual item is designed to measure as well as relatively unstable components that collectively can be accounted for as error
generalizability study
in context of generalizability theory…
research conducted to explore the impact of different facets of the universe on a test score
heterogeneity
more generally, having diverse contents
heterogeneous test measures multiple factors
homogeneity
describes degree to which a test measures a single trait
inflation of range/variance
a reference to a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedure used and so the resulting correlation coefficient tends to be higher
contrast with restriction of range
information function
inter-item consistency
the consistency or homogeneity of the items of a test, estimated by techniques such as the split-half method
internal consistency estimate of reliability
an estimate of the reliability of a test obtained from a measure of inter-item consistency
inter-scorer reliability
aka inter-rater reliability, observer reliability, judge reliability, and scorer reliability
an estimate of the degree of agreement or consistency between two or more scorers (or judges, raters, observers)
item response theory (IRT)
aka latent-trait theory / model
system of assumptions about measurement (including assumption that a trait being measured by a test is unidimensional) and the extent to which each test item measures the trait
item sampling
aka content sampling
variety of the subject matter contained in the items
frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests
latent-trait theory
aka latent-trait model
system of assumptions about measurement, including the assumption that a trait being measured by a test is unidimensional, and the extent to which each test item measures the trait
measurement error
refers to the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes
odd-even reliability
estimate of split-half reliability of a test, obtained by assigning odd-numbered items to one half of the test and even-numbered items to the other half
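a rough sketch of the procedure, assuming items are numbered from 1, halves are scored by summing, and the half-test correlation is stepped up to full length with the Spearman-Brown formula (a common practice, not stated on this card); all names are illustrative:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def odd_even_reliability(scores):
    """scores: one row per test-taker, one column per item."""
    # Index 0 holds item 1, so even indices are the odd-numbered items
    odd_totals = [sum(row[0::2]) for row in scores]
    even_totals = [sum(row[1::2]) for row in scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown adjustment from half length to full length
    return 2 * r_half / (1 + r_half)
```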
parallel forms
two or more versions or forms of the same test where, for each form, the means and variances of observed test scores are equal
contrast with alternate forms
parallel-forms reliability
an estimate of the extent to which item sampling and other errors have affected test scores on two versions of the same test when, for each form of the test, the means and variances of observed test scores are equal
contrast with alternate-forms reliability
polytomous test item
a test item or question with three or more alternative responses, where only one alternative is scored correct or scored as being consistent with a targeted trait or other construct
power test
a test, usually of achievement or ability, with 1) either no time limit or such a long time limit that all test-takers can attempt all items and 2) some items so difficult that no test-taker can obtain a perfect score
contrast with speed test
random error
a source of error in measuring a targeted variable, caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
contrast with systematic error
Rasch model
reference to an IRT model with very specific assumptions about the underlying distribution
reliability
the extent to which measurements are consistent or repeatable
also extent to which measurements differ from occasion to occasion as a function of measurement error
reliability coefficient
general term for an index of reliability or the ratio of true score variance on a test to the total variance
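the ratio can be written out as follows; the symbols are conventional notation assumed here, not taken from the card:

```latex
r_{xx} = \frac{\sigma^2_{T}}{\sigma^2_{X}},
\qquad
\sigma^2_{X} = \sigma^2_{T} + \sigma^2_{E}
```

where \sigma^2_{X} is total (observed) variance, \sigma^2_{T} is true variance, and \sigma^2_{E} is error variance, so the coefficient runs from 0 (all error) to 1 (no error).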
replicability crisis
low replication rates commonly found in psychological research
restriction of range/variance
aka restriction of variance
phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is restricted by the sampling procedure used and so the resulting correlation coefficient tends to be lower
contrast with inflation of range
Spearman-Brown formula
equation used to estimate internal consistency reliability from a correlation of two halves of a test that has been lengthened or shortened
inappropriate for use with heterogeneous tests or speed tests
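the general form of the formula is r_SB = n*r / (1 + (n - 1)*r), where r is the existing reliability (or half-test correlation) and n is the factor by which test length changes; a small sketch with illustrative names:

```python
def spearman_brown(r, n=2.0):
    """Estimate reliability after changing test length by a factor of n.

    r: reliability of the current test (or correlation between halves)
    n: new length divided by current length (2 = doubled, 0.5 = halved)
    """
    return n * r / (1 + (n - 1) * r)
```

with n = 2 this is the usual split-half correction: a half-test correlation of 0.5 gives an estimate of about 0.67 for the full-length test.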
speed test
test usually of achievement or ability, with a time limit
speed tests usually contain items of uniform difficulty level
split-half reliability
estimate of the internal consistency of a test obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
standard error of a score
in true score theory, aka SEM
a statistic designed to estimate the extent to which an observed score deviates from a true score
standard error of measurement
(SEM, aka std err of score)
in true score theory
a statistic designed to estimate the extent to which an observed score deviates from a true score
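the SEM is conventionally computed as SD * sqrt(1 - r), where SD is the standard deviation of test scores and r is the reliability coefficient; a sketch with assumed names:

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)
```

for example, SD = 15 and reliability = 0.91 give an SEM of about 4.5; higher reliability shrinks the SEM.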
standard error of the difference
a statistic designed to aid in determining how large a difference between two scores should be before it is considered statistically significant
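a sketch of the usual computation, assuming both scores are on the same scale with the same standard deviation, so that the result equals sqrt(SEM1^2 + SEM2^2) = SD * sqrt(2 - r1 - r2); names are illustrative:

```python
from math import sqrt


def standard_error_of_difference(sd, r1, r2):
    """Standard error of the difference between two scores.

    sd: standard deviation shared by both score scales (assumed equal)
    r1, r2: reliability coefficients of the two tests
    """
    return sd * sqrt(2 - r1 - r2)
```

a difference between two scores is then typically compared against a multiple of this value (for instance about 1.96 times it for the .05 level).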
static characteristic
a trait, state, or ability presumed to be relatively unchanging over time
contrast with dynamic characteristic
systematic error
a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
contrast with random error
test-retest reliability
estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
transient error
source of error attributable to variations in the test-taker’s feelings, moods, or mental state over time
true score
a value that, according to classical test theory, genuinely reflects an individual’s ability (or trait) level as measured by a particular test
true variance
in the true score model
component of variance attributable to true differences in the ability or trait being measured that are inherent in an observed score or distribution of scores
universe
in generalizability theory
the total context of a particular test situation, including all the factors that lead to an individual test-taker’s score
universe score
in generalizability theory
a test score corresponding to the particular universe being assessed or evaluated
variance
a measurement of variability equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
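the definition translates directly into a few lines; this sketch divides by N (the population form described above), whereas sample variance would divide by N - 1:

```python
def variance(scores):
    """Mean of the squared deviations of scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)
```

for the scores 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the variance is 4.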