UNIT 5 NEW Flashcards
Can also be used to determine the number of items needed to attain a desired level of reliability
Spearman–Brown formula
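A minimal sketch of this use, assuming the usual Spearman-Brown relation solved for n (the factor by which test length must change) and assuming any added items are comparable to the existing ones. The function name and the example values are illustrative, not from the flashcards.

```python
import math

def items_needed(current_items, current_reliability, desired_reliability):
    """Estimate the number of items needed to reach a desired reliability.

    Solves the Spearman-Brown formula for n, the factor by which the
    current test length must be multiplied, then rounds up to whole items.
    """
    r = current_reliability
    r_desired = desired_reliability
    n = (r_desired * (1 - r)) / (r * (1 - r_desired))
    return math.ceil(n * current_items)

# Hypothetical example: a 20-item test with reliability .60, target .80
print(items_needed(20, 0.60, 0.80))
```

Rounding up is a design choice here: a fractional item count cannot be administered, and rounding down would fall short of the target.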
sources of error variance of alternate forms
test construction or administration
When interested in the truth independent of measurement, psychologists look for the
construct score
It provides an estimate of the amount of error inherent in an observed score or measurement
Standard Error of Measurement (SEM)
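As a rough illustration, the SEM is commonly computed as the standard deviation of test scores times the square root of one minus the reliability coefficient. The sketch below assumes that formula; the scale values (SD of 15, reliability of .91, observed score of 100) are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

sem = standard_error_of_measurement(sd=15, reliability=0.91)

# Approximate 95% band around a hypothetical observed score of 100
observed = 100
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(round(sem, 2), round(low, 2), round(high, 2))
```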
sources of error variance of test-retest
administration
A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant
standard error of difference
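A small sketch of how this statistic might be computed, assuming both scores are on the same scale (same standard deviation), in which case it equals the square root of the sum of the two squared SEMs. The function names, the 1.96 cutoff, and the example numbers are assumptions for illustration.

```python
import math

def standard_error_of_difference(sd, r1, r2):
    """SED = SD * sqrt(2 - r1 - r2) for two scores on the same scale.

    r1 and r2 are the reliability coefficients of the two tests.
    """
    return sd * math.sqrt(2 - r1 - r2)

def is_significant_difference(score_a, score_b, sed, z=1.96):
    """True if the scores differ by more than z standard errors of difference."""
    return abs(score_a - score_b) > z * sed

sed = standard_error_of_difference(sd=10, r1=0.90, r2=0.85)
print(round(sed, 2), is_significant_difference(60, 52, sed))
```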
a reference to an IRT model with specific assumptions about the underlying distribution
Rasch model
Its procedures provide a way to model the probability that a person with X ability will be able to perform at a level of Y
Item Response Theory (IRT)
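The flashcard does not name a specific model. As one illustration, the sketch below assumes the one-parameter logistic (Rasch) form, in which the probability of a correct response depends only on the gap between person ability and item difficulty, both expressed on the same logit scale. The function name and values are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under a one-parameter logistic model.

    ability and difficulty are assumed to be on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person whose ability equals the item's difficulty has a 50% chance
print(rasch_probability(0.0, 0.0))
# Higher ability relative to difficulty raises the probability
print(round(rasch_probability(1.0, 0.0), 3))
```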
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
split-half reliability
typically used to evaluate the homogeneity of a measure (that is, whether all items tap a single construct)
internal consistency
If test developers or users wish to shorten a test, the __ may be used to estimate the effect of the shortening on the test’s reliability
Spearman–Brown formula
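A minimal sketch of the general Spearman-Brown prediction, where the length factor is the ratio of new test length to original length (0.5 for a test cut in half). It assumes the removed or added items are comparable to the rest; the function name and numbers are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    r_predicted = n * r / (1 + (n - 1) * r), where n is length_factor.
    """
    n = length_factor
    return (n * reliability) / (1 + (n - 1) * reliability)

# Hypothetical example: halving a test whose reliability is .90
print(round(spearman_brown(0.90, 0.5), 3))
```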
It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice (because of factors such as time or expense)
split-half reliability
A __ of behavior, or the universe of items that could conceivably measure that behavior, can be thought of as a hypothetical construct
domain
exist when, for each form of the test, the means and the variances of observed test scores are equal
parallel forms
consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process; sometimes referred to as “noise”
random error
trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences
dynamic characteristic
a statistic that quantifies reliability, ranging from 0 (not at all reliable) to 1 (perfectly reliable)
reliability coefficient
In general, a primary objective in splitting a test in half for the purpose of obtaining a split-half reliability estimate is to create what might be called __
mini-parallel-forms
tied to the measurement instrument used
true score
often used when coding nonverbal behavior
inter-scorer reliability
The influence of particular facets on the test score is represented by __
coefficients of generalizability
to evaluate the relationship between different forms of a measure
alternate forms
terms that refer to variation among items within a test as well as to variation among items between tests
item sampling or content sampling
the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes
measurement error
to evaluate the extent to which items on a scale relate to one another
internal consistency
This source of error fluctuates from one testing situation to another with no discernible pattern that would systematically raise or lower scores; it increases or decreases test scores unpredictably
random error
Computation of a coefficient of split-half reliability:
Step 1: Divide the test into equivalent halves.
Step 2: Calculate a Pearson r between scores on the two halves of the test.
Step 3: Adjust the half-test reliability using the
Spearman–Brown formula
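The three steps above can be sketched as follows. This assumes an odd-even split as the way of forming the halves (one common choice, not specified in the flashcard) and a data layout of one list of item scores per testtaker. All names and the sample data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def split_half_reliability(item_scores):
    """item_scores: one list of item scores per testtaker."""
    # Step 1: divide the test into halves (odd- vs. even-numbered items)
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    # Step 2: correlate scores on the two halves
    r_half = pearson_r(odd_half, even_half)
    # Step 3: Spearman-Brown adjustment for a test twice as long as each half
    return (2 * r_half) / (1 + r_half)

# Hypothetical data: 4 testtakers, 4 dichotomous items
scores = [[1, 1, 1, 1], [1, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
print(round(split_half_reliability(scores), 3))
```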
The extent to which a testtaker’s score is affected by the content sampled on a test and by the way the content is sampled (i.e., the way in which the item is constructed) is a source of error variance
test construction
Widely used as a measure of reliability, in part because it requires only one administration of the test
coefficient alpha
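A short sketch of how coefficient alpha might be computed from a single administration, assuming the standard formula based on the number of items, the sum of item variances, and the variance of total scores. The data layout (one list of item scores per testtaker) and the sample data are hypothetical.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Coefficient alpha from a single test administration.

    alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
    item_scores: one list of item scores per testtaker.
    """
    k = len(item_scores[0])
    item_variances = [
        pvariance([row[i] for row in item_scores]) for i in range(k)
    ]
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 4 testtakers, 4 dichotomous items
scores = [[1, 1, 1, 1], [1, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
print(round(coefficient_alpha(scores), 3))
```

Population variance is used for both the item and total-score variances; using sample variance for both would give the same alpha, since the correction factor cancels in the ratio.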
If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be __
lower
test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions
dichotomous test items
If two scores each contain error such that in each case the true score could be higher or lower, then we would want the two scores to be further apart before we conclude that there is a significant difference between them
standard error of difference
It is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by any number of items
Spearman–Brown formula
generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly
speed test
In many tests, the advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer differences
test scoring and interpretation
trait, state, or ability presumed to be relatively unchanging, such as intelligence
static characteristic
approach to reliability evaluation
test-retest method
Portion of variability in test scores that is due to factors unrelated to the construct being measured.
error variance
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
test-retest reliability