Chapter 5 Flashcards
a proportion that indicates the ratio between the true score variance on a test and the total
variance
reliability coefficient
A statistic useful in describing sources of test score variability is the _______
variance
Variance from true differences is _______
true variance
variance from irrelevant, random sources
is ________________
error variance
refers to the
proportion of the total variance attributed to true variance.
reliability
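In symbols, the variance cards above can be summarized as follows. The notation is an assumption chosen for illustration and does not come from the cards: \sigma^2 is total variance, \sigma^2_{tr} is true variance, \sigma^2_{e} is error variance, and r_{xx} is the reliability coefficient.

```latex
\sigma^2 = \sigma^2_{tr} + \sigma^2_{e},
\qquad
r_{xx} = \frac{\sigma^2_{tr}}{\sigma^2}
```

Under this reading, a test with no error variance would have r_{xx} = 1, and reliability falls as error variance takes up a larger share of the total.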
refers to collectively all of the factors associated
with the process of measuring some variable, other than the variable being measured.
measurement error
is a source of error in measuring a targeted variable caused by
unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Random error
refers to a source of error in measuring a
variable that is typically constant or proportionate to what is presumed to be the true value of
the variable being measured.
systematic error
terms that refer to variation among items within a test as well as to
variation among items between tests
item sampling or
content sampling
is an estimate of reliability obtained by correlating pairs of scores
from the same people on two different administrations of the same test
Test-retest reliability
the estimate of test-retest reliability is often referred to as the ______
coefficient of
stability.
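A minimal sketch of how a coefficient of stability could be computed, assuming the usual Pearson correlation is used to correlate the two administrations. The function name `pearson_r` and the score lists are hypothetical, made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people on two administrations.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]

coefficient_of_stability = pearson_r(time1, time2)
```

The same function would give a coefficient of equivalence if the two lists held scores on two forms of a test rather than two administrations of one form.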
The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms
coefficient of reliability, which is often termed the ________
coefficient of equivalence.
refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
parallel forms
reliability
are simply different versions of a test that
have been constructed so as to be parallel.
Alternate forms
refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error.
alternate forms reliability
is obtained by correlating two pairs of scores obtained
from equivalent halves of a single test administered once.
split-half reliability
Assigning odd-numbered items to one half of the test and even-numbered items to the other half yields an estimate of split-half
reliability that is also referred to as _________
odd-even reliability.
refers to the degree of correlation among all the
items on a scale.
Inter-item consistency
allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
Spearman–Brown formula
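A short sketch of the Spearman-Brown correction in its general form, r_SB = n * r / (1 + (n - 1) * r), where r is the observed correlation and n is the factor by which test length changes (n = 2 for a split-half estimate). The function and parameter names are hypothetical.

```python
def spearman_brown(r, n=2):
    """Estimate reliability for a test n times as long as the one that produced r.

    With the default n=2, r is the correlation between two half-tests and the
    result is the estimated reliability of the full-length test.
    """
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical half-test correlation of .80, corrected to full length.
full_length_estimate = spearman_brown(0.80)
```

Because each half is only half as long as the full test, the uncorrected half-test correlation tends to understate full-test reliability, which is why the corrected value is larger than r whenever 0 < r < 1.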
is the degree to which
a test measures a single factor.
homogeneity
may be thought of as the mean of all possible split-half correlations, corrected by the Spearman–Brown formula.
coefficient alpha
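A minimal sketch of coefficient alpha, assuming the standard variance-based computing formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), rather than literally averaging split-half correlations. The function name and the data are hypothetical.

```python
from statistics import variance

def coefficient_alpha(scores):
    """Coefficient alpha for a list of rows, one row of item scores per person."""
    k = len(scores[0])                    # number of items
    items = list(zip(*scores))            # regroup scores item by item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical responses: five people, three items.
responses = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
]

alpha = coefficient_alpha(responses)
```

Sample variances are used for both the items and the totals; what matters is that the same variance definition is applied to both.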
describes the degree to which a test
measures different factors.
heterogeneity
is a measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
average proportional distance method
(APD)
Homogeneity VS heterogeneity of test items (essay)
Recall that a test is said to be homogeneous
in items if it is functionally uniform throughout. Tests designed to measure one factor, such as one ability or one trait, are expected to be homogeneous in items. For such tests, it is reasonable to expect a high degree of internal consistency. By contrast, if the test is heterogeneous in items, an estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability.
is the degree of agreement or consistency between two or
more scorers (or judges or raters) with regard to a particular measure.
inter-scorer reliability
is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
dynamic characteristic
an ability presumed to be relatively unchanging, such as
intelligence, is a ______
static characteristic
if some items are so difficult that no test-taker is able to obtain a perfect score, then the test is a _____
power test
generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all test-takers should be able to complete all the test items correctly
speed test
is designed to provide an indication of where a test-taker stands with respect to some variable or criterion, such as an educational
or a vocational objective
criterion-referenced test
a value that according to classical test theory genuinely reflects an individual’s ability (or trait) level as measured by a particular test.
true score
also referred to as the true score (or classical) model of measurement. _________ is the most widely used and accepted model in the psychometric literature today
classical test theory (CTT)
seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score.
domain sampling theory
is based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation.
generalizability theory
Cronbach encouraged test developers and researchers to describe the details of the particular test situation or ______ leading to a specific test score.
Universe
examines how generalizable scores from a particular test are if
the test is administered in different situations.
generalizability study
include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.
Facets
These coefficients are similar to reliability coefficients
in the true score model.
coefficients of generalizability.
In a ______, developers examine the usefulness of test
scores in helping the test user make decisions.
decision study
Another alternative to the true score model is ________
Item response theory (IRT)
a synonym for Item response theory (IRT) in the academic literature is _____
latent-trait theory.
is a categorical variable with two possible response values (Yes/No, Agree/Disagree, Success/Fail).
dichotomous item
is a categorical variable (ordinal or nominal) with more than two possible values (e.g., strongly disagree, disagree, agree, strongly agree).
polytomous item
is a reference to an IRT model with very specific assumptions about the underlying distribution
Rasch model
is the tool used to estimate or infer the extent to
which an observed score deviates from a true score.
standard error of measurement
a range or band of test scores that is likely to contain the true score.
confidence interval
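A brief sketch tying the last two cards together, assuming the common textbook formula SEM = SD * sqrt(1 - reliability) and a band of observed score plus or minus z * SEM. The function names, the z value of 1.96 (roughly a 95% band), and the numbers are illustrative assumptions.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and its reliability coefficient."""
    return sd * sqrt(1 - reliability)

def confidence_interval(observed, sem, z=1.96):
    """Band around an observed score that is likely to contain the true score."""
    return (observed - z * sem, observed + z * sem)

# Hypothetical test: SD of 15, reliability of .91, observed score of 100.
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(observed=100, sem=sem)
```

With these made-up values the SEM is about 4.5 points, so the band runs from roughly 91 to 109. A more reliable test gives a smaller SEM and a narrower band.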
Comparisons between scores are made using the _________
standard error of the difference
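A small sketch of the standard error of the difference, assuming the usual formula SD * sqrt(2 - r1 - r2). This is equivalent to sqrt(SEM1^2 + SEM2^2) when both scores are on the same scale with the same standard deviation. The function name and values are hypothetical.

```python
from math import sqrt

def standard_error_of_difference(sd, r1, r2):
    """SE of the difference between two scores on the same scale.

    sd is the shared standard deviation; r1 and r2 are the reliability
    coefficients of the two tests (or two administrations) being compared.
    """
    return sd * sqrt(2 - r1 - r2)

# Hypothetical comparison: both tests have SD 15 and reliability .91.
sed = standard_error_of_difference(sd=15, r1=0.91, r2=0.91)
```

The result is larger than either test's standard error of measurement because the difference between two scores carries the measurement error of both.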
refers to a group of personality tests.
Personality test battery