Unit 5: Reliability Flashcards
measurement processes that alter what is measured
Carryover effects
Portion of variability in test scores that is due to factors unrelated to the construct being measured.
error variance
allows us to estimate, with a specific level of confidence, the range in which the true score is likely to exist
Standard Error of Measurement (SEM)
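The formula behind this card: SEM = SD * sqrt(1 - r), where r is the test's reliability coefficient. A minimal Python sketch; the standard deviation, reliability, and observed score are illustrative assumptions, not values from the cards:

    import math

    sd = 15.0       # standard deviation of test scores (illustrative)
    r_xx = 0.91     # reliability coefficient (illustrative)
    observed = 100  # a testtaker's observed score (illustrative)

    # SEM = SD * sqrt(1 - reliability)
    sem = sd * math.sqrt(1 - r_xx)

    # 95% confidence band around the observed score (z = 1.96)
    lower, upper = observed - 1.96 * sem, observed + 1.96 * sem
    print(f"SEM = {sem:.2f}; 95% CI = [{lower:.2f}, {upper:.2f}]")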
the degree to which a measure predictably overestimates or underestimates a quantity; the degree to which systematic error influences the measurement
bias
an estimate of the extent to which different forms of the same test have been affected by item sampling error or other error
alternate forms reliability
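In practice, this estimate is the Pearson correlation between scores on the two forms. A minimal Python sketch with made-up scores for five examinees (all values illustrative; statistics.correlation requires Python 3.10+):

    import statistics

    # Scores of the same five examinees on Form A and Form B (illustrative)
    form_a = [82, 75, 91, 68, 77]
    form_b = [80, 78, 88, 70, 74]

    # Alternate-forms reliability estimate = Pearson r between the two forms
    r_ab = statistics.correlation(form_a, form_b)
    print(f"alternate-forms reliability estimate = {r_ab:.2f}")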
refers to the portion of variability in test scores that reflects actual differences in the trait, ability, or characteristic the test is designed to measure
true variance
An estimate of test-retest reliability may be most appropriate in gauging the reliability of tests that employ outcome measures such as r___
reaction time or perceptual judgments
If interested in the truth independent of measurement, psychologists look for the __
construct score
a person’s standing on a theoretical variable independent of any particular measurement
construct score
In ability tests, __ are carryover effects in which the test itself provides an opportunity to learn and practice the ability being measured
practice effects
a statistic that quantifies reliability, ranging from 0 (not at all reliable) to 1 (perfectly reliable)
reliability coefficient
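In classical test theory terms, this coefficient is the ratio of true variance to total variance (total = true + error, per the error-variance and true-variance cards above). A minimal sketch with illustrative variance components:

    # Classical test theory: total variance = true variance + error variance
    true_var = 81.0   # variance reflecting real trait differences (illustrative)
    error_var = 19.0  # variance from construct-irrelevant factors (illustrative)

    # reliability coefficient = true variance / total variance
    r_xx = true_var / (true_var + error_var)
    print(f"reliability coefficient = {r_xx:.2f}")  # 0.81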
It provides an estimate of the amount of error inherent in an observed score or measurement
Standard Error of Measurement (SEM)
based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation
generalizability theory
purpose: to evaluate the relationship between different forms of a measure
alternate forms
an estimate of the reliability of a test can be obtained without developing an alternate form of the test and without having to administer the test twice to the same people
Internal consistency estimate of reliability or estimate of inter-item consistency
provides a measure of the precision of an observed test score
Standard Error of Measurement (SEM)
if the test is __ in items, an estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability
heterogeneous
terms that refer to variation among items within a test as well as to variation among items between tests
item sampling or content sampling
If two scores each contain error such that in each case the true score could be higher or lower, then we would want the two scores to be further apart before we conclude that there is a significant difference between them
standard error of difference
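One standard formula for this statistic pools the error in both scores: SE_diff = SD * sqrt(2 - r1 - r2), where r1 and r2 are the two tests' reliability coefficients. A minimal sketch; the numbers are illustrative:

    import math

    sd = 15.0  # standard deviation of the score scale (illustrative)
    r1 = 0.90  # reliability of the first test (illustrative)
    r2 = 0.84  # reliability of the second test (illustrative)

    # standard error of the difference between two scores
    se_diff = sd * math.sqrt(2 - r1 - r2)

    # two scores must differ by more than z * se_diff to be considered
    # significantly different (z = 1.96 at the .05 level)
    print(f"SE of difference = {se_diff:.2f}")
    print(f"minimum significant difference = {1.96 * se_diff:.2f}")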
designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective
criterion-referenced test
The tool used to estimate or infer the extent to which an observed score deviates from a true score
Standard Error of Measurement (SEM)
Scores on criterion-referenced tests tend to be interpreted in __
pass–fail (or, perhaps more accurately, “master–failed-to-master”) terms
A universe is described in terms of its facets, which include considerations such as:
the number of items in the test,
the amount of training the test scorers have had, and
the purpose of the test administration
In general, a primary objective in splitting a test in half for the purpose of obtaining a split-half reliability estimate is to create what might be called __
mini-parallel-forms
Considerations related to the nature of the test:
(1) The test items are homogeneous or heterogeneous in nature;
(2) The characteristic, ability, or trait being measured is presumed to be dynamic or static;
(3) The range of test scores is or is not restricted;
(4) The test is a speed or a power test; and
(5) The test is or is not criterion-referenced
Refers to the degree of correlation among all the items on a scale
inter-item consistency
often used when coding nonverbal behavior
inter-scorer reliability
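One common coefficient for inter-scorer reliability with categorical codes is Cohen's kappa, which corrects observed agreement for chance agreement. A minimal sketch; the two scorers' behavior codes are made up for illustration:

    from collections import Counter

    # Codes two scorers assigned to the same ten behavior samples (illustrative)
    scorer_a = ["smile", "frown", "smile", "neutral", "smile",
                "frown", "neutral", "smile", "frown", "smile"]
    scorer_b = ["smile", "frown", "neutral", "neutral", "smile",
                "frown", "neutral", "smile", "smile", "smile"]

    n = len(scorer_a)
    p_observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n

    # Chance agreement from each scorer's marginal code frequencies
    freq_a, freq_b = Counter(scorer_a), Counter(scorer_b)
    p_chance = sum(freq_a[code] * freq_b[code] for code in freq_a) / n ** 2

    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(f"observed agreement = {p_observed:.2f}, kappa = {kappa:.2f}")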
It is a specific application of a more general formula used to estimate the reliability of a test that is lengthened or shortened by any number of items
Spearman-Brown formula
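The general form of the formula: r_predicted = (n * r) / (1 + (n - 1) * r), where n is the factor by which the test's length is changed and r is the existing reliability. A minimal sketch with illustrative values:

    def spearman_brown(r: float, n: float) -> float:
        """Predicted reliability when a test's length changes by a factor of n."""
        return (n * r) / (1 + (n - 1) * r)

    # Doubling a test whose current reliability is .70 (illustrative values)
    print(f"doubled: {spearman_brown(0.70, 2):.2f}")    # 0.82
    # Halving the same test (n = 0.5)
    print(f"halved:  {spearman_brown(0.70, 0.5):.2f}")  # 0.54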
The extent to which a testtaker’s score is affected by the content sampled on a test and by the way the content is sampled (i.e., the way in which the item is constructed) is a source of error variance
test construction
A reliability estimate of a __ test should be based on performance from two independent testing periods
speed
purpose: to evaluate the stability of a measure
test-retest reliability
In many tests, the advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer differences
test scoring and interpretation
it accurately measures internal consistency under highly specific conditions that are rarely met in real measures
Cronbach’s alpha
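The computation behind this card: alpha = (k / (k - 1)) * (1 - sum of item variances / total score variance), where k is the number of items. A minimal sketch on a small made-up item-score matrix (illustrative data):

    import statistics

    # Rows = examinees, columns = items (illustrative 5-person, 4-item data)
    scores = [
        [4, 3, 4, 5],
        [2, 2, 3, 2],
        [5, 4, 4, 5],
        [3, 3, 2, 3],
        [1, 2, 2, 1],
    ]
    k = len(scores[0])

    item_vars = [statistics.variance(col) for col in zip(*scores)]
    total_var = statistics.variance([sum(row) for row in scores])

    alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
    print(f"Cronbach's alpha = {alpha:.2f}")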
The relationship between the SEM and the reliability of a test is _
inverse
the higher the reliability of a test (or individual subtest within a test), the lower the SEM
its use is typically to evaluate the homogeneity of a measure (i.e., whether all items tap a single construct)
internal consistency
exist when, for each form of the test, the means and the variances of observed test scores are equal
parallel forms
By determining the reliability of one half of a test, a test developer can use the __ to estimate the reliability of a whole test
Spearman-Brown formula
Computation of a coefficient of split-half reliability:
Step 1: Divide the test into equivalent halves.
Step 2: Calculate a Pearson r between scores on the two halves of the test.
Step 3: Adjust the half-test reliability using the Spearman–Brown formula.
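A minimal Python sketch of the three steps, using odd-even halves and made-up item scores (all data illustrative; statistics.correlation requires Python 3.10+):

    import statistics

    # Step 1: divide the test into equivalent halves (odd vs. even items here)
    item_scores = [  # rows = examinees, columns = 6 dichotomous items (illustrative)
        [1, 0, 1, 1, 0, 1],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 1],
        [1, 1, 0, 1, 1, 0],
        [0, 0, 0, 1, 0, 0],
    ]
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]

    # Step 2: Pearson r between scores on the two halves
    r_half = statistics.correlation(odd, even)

    # Step 3: Spearman-Brown adjustment to estimate whole-test reliability
    r_whole = (2 * r_half) / (1 + r_half)
    print(f"half-test r = {r_half:.2f}, adjusted whole-test r = {r_whole:.2f}")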
Tests designed to measure one factor, such as one ability or one trait, are said to be __
homogeneous
If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be __
lower
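A quick simulation of this card's claim: correlate two variables across the full range, then keep only the top half of one variable (as when only selected applicants are studied) and re-correlate. A minimal sketch; assumes numpy is available, and all numbers are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(size=n)
    y = 0.7 * x + rng.normal(scale=0.7, size=n)  # population r is about .71

    full_r = np.corrcoef(x, y)[0, 1]

    # Restrict the range: keep only cases above the median of x
    keep = x > np.median(x)
    restricted_r = np.corrcoef(x[keep], y[keep])[0, 1]

    print(f"full-range r = {full_r:.2f}, restricted-range r = {restricted_r:.2f}")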