Chapter 5 Flashcards
Refers to consistency in measurement. Not an all-or-none matter; a test may be reliable in one context and unreliable in another.
RELIABILITY
Index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
Reliability coefficient
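As a formula (standard classical test theory notation, not part of the card itself): r_xx = σ²_true / σ²_total, where σ²_true is the true score variance and σ²_total is the total observed score variance.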
Score on an ability test is presumed to reflect not only the testtaker's true score on the ability being measured but also error.
X = T + E
Classical test theory
A statistic useful in describing sources of test score variability
Variance
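Under classical test theory, total test score variance is the sum of true score variance and error variance: σ² = σ²_true + σ²_error.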
2 types of measurement error
Systematic
Random
Source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Random error
Source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured. Because it is constant, it does not affect score consistency.
Systematic error
Source of variance which refers to variation among items within a test as well as to variation among items between tests. Differences are sure to be found in the way the items are worded and in the exact content sampled.
Item sampling or Content sampling
The testtaker's reactions to influences surrounding the administration of a test are the source of one kind of error variance.
Sources of error in test administration:
Test environment - the physical setting where a test is administered
Testtaker variables - factors related to the individual taking the test
Examiner-related variables - factors related to the person administering the test
2 possible sources of error variance under test scoring and interpretation
Scorers - the subjectivity involved in human scoring
Scoring systems - technical glitches when scoring is done by computer
3 potential sources of nonsystematic error in an assessment situation
Forgetting
Failing to notice abusive behavior
Misunderstanding instructions regarding reporting
Estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Test-retest reliability
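A minimal sketch of this estimate in Python, assuming two score arrays from the same testtakers (the data and variable names are illustrative):

```python
import numpy as np

# Scores for the same five testtakers on two administrations (illustrative data)
time1 = np.array([12, 15, 11, 18, 14])
time2 = np.array([13, 14, 12, 17, 15])

# Test-retest reliability is the Pearson r between the two administrations
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: {r_test_retest:.2f}")
```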
The estimate of test-retest reliability when the interval between testing is greater than six months.
Coefficient of stability
The degree of the relationship between various forms of a test can be evaluated by means of _______________; often referred to as the coefficient of equivalence.
Alternate-forms or parallel-forms coefficient of reliability
Exist when, for each form of the test, the means and the variances of observed test scores are equal.
Parallel forms
Simply different versions of a test that have been constructed so as to be parallel.
Alternate forms
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
Internal consistency estimate of reliability or Estimate of interitem consistency
Step 1. Divide the test into two equivalent halves.
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Step 3. Adjust the half-test reliability using the Spearman-Brown formula.
3 steps in the computation of the coefficient of split-half reliability
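A minimal sketch of the three steps in Python, assuming a testtaker-by-item score matrix and an odd-even split (the data are illustrative):

```python
import numpy as np

# Item scores: rows are testtakers, columns are items (illustrative data)
items = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1, 0],
])

# Step 1: divide the test into equivalent halves (here, odd-even)
half1 = items[:, 0::2].sum(axis=1)  # 1st, 3rd, 5th, ... items
half2 = items[:, 1::2].sum(axis=1)  # 2nd, 4th, 6th, ... items

# Step 2: Pearson r between scores on the two halves
r_hh = np.corrcoef(half1, half2)[0, 1]

# Step 3: adjust the half-test r with the Spearman-Brown formula
r_sb = (2 * r_hh) / (1 + r_hh)
print(f"Half-test r = {r_hh:.2f}, Spearman-Brown corrected = {r_sb:.2f}")
```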
3 acceptable ways to split a test
Randomly assign
Odd-even reliability
By content
Assign odd-numbered items to one half of the test and even-numbered items to the other half.
Odd-even reliability
Each half contains items equivalent with respect to content and difficulty.
By content
Randomly assign items to one or the other half of the test.
Randomly assign
Formula that allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
Spearman-Brown formula
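In general form (standard notation): r_SB = n * r / (1 + (n - 1) * r), where n is the factor by which test length is changed; for two halves of one test, n = 2, giving r_SB = 2r_hh / (1 + r_hh).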
Degree of correlation among all the items on a scale.
Inter-item consistency
Degree to which a test measures a single factor.
Homogeneity
Degree to which a test measures different factors.
Heterogeneity
Statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong.
Kuder-Richardson formula 20 (KR-20)
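The formula (standard notation): KR-20 = (k / (k - 1)) * (1 - Σpq / σ²), where k is the number of items, p is the proportion of testtakers answering an item correctly, q = 1 - p, and σ² is the variance of total test scores.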
May be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula. Typically ranges in value from 0 to 1.
COEFFICIENT ALPHA
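A minimal sketch of coefficient alpha in Python, computed directly from the textbook formula α = (k / (k - 1)) * (1 - Σσ²_item / σ²_total) on a testtaker-by-item matrix (the function name is illustrative):

```python
import numpy as np

def coefficient_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a testtakers-by-items matrix of item scores."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```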
A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
AVERAGE PROPORTIONAL DISTANCE (APD)
Step 1: Calculate the absolute difference between scores for all of the items.
Step 2: Average the differences between scores.
Step 3: Obtain the APD by dividing the average difference between scores by the number of response options on the test, minus one.
Steps in getting the AVERAGE PROPORTIONAL DISTANCE (APD)
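A minimal sketch of the three steps in Python, under the assumption that the item differences are taken over all pairs of items (the function name is illustrative):

```python
import numpy as np
from itertools import combinations

def average_proportional_distance(items: np.ndarray, n_options: int) -> float:
    """APD for a testtakers-by-items matrix of ratings on an n_options-point scale."""
    # Step 1: absolute differences between scores for every pair of items
    diffs = [np.abs(items[:, i] - items[:, j])
             for i, j in combinations(range(items.shape[1]), 2)]
    # Step 2: average the differences between scores
    mean_diff = np.mean(diffs)
    # Step 3: divide by the number of response options minus one
    return mean_diff / (n_options - 1)
```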
Degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
Inter-scorer reliability
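For categorical ratings, one common chance-corrected index of inter-scorer agreement is Cohen's kappa; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def cohens_kappa(rater1: np.ndarray, rater2: np.ndarray) -> float:
    """Chance-corrected agreement between two raters over categorical codes."""
    p_o = np.mean(rater1 == rater2)  # observed proportion of agreement
    # Expected chance agreement, from each rater's marginal proportions
    categories = np.union1d(rater1, rater2)
    p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)
    return (p_o - p_e) / (1 - p_e)
```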
- the test items are homogeneous or heterogeneous in nature
- the characteristic, ability, or trait being measured is presumed to be dynamic or static
- the range of test scores is or is not restricted
- the test is a speed or a power test
- the test is or is not criterion-referenced
5 aspects of the nature of a test
Homogeneity vs heterogeneity of a test
Homogeneity of test items - all the items on a test are very similar in content, difficulty, and format, essentially measuring the same construct
Heterogeneity of test items - items vary significantly in content, difficulty, and format, potentially assessing multiple related but distinct aspects of a concept
Trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
Dynamic characteristic
Trait, state, or ability presumed to be relatively unchanging, such as intelligence.
Static characteristic
When the time limit is long enough to allow testtakers to attempt all items, and some items are so difficult that no testtaker is able to obtain a perfect score.
Power test
Generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly.
Speed test
A reliability estimate of a speed test should be based on performance from two independent testing periods, using one of the following:
Test-retest reliability
Alternate-forms reliability
Split-half reliability from two separately timed half tests
Designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective.
Criterion-referenced test
Any theoretical and statistical framework describing how respondents generate their answers to items on a scale or instrument and explaining associated sources of error.
PSYCHOMETRIC MODELS
3 major approaches among psychometric models:
Classical test theory
Generalizability theory
Item response theory
A framework of principles and assumptions about how to determine the reliability of a set of data.
GENERALIZABILITY THEORY
Include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.
Facets
Examines how generalizable scores from a particular test are if the test is administered in different situations.
Generalizability study
Represents the influence of particular facets on the test score; similar to the reliability coefficient in the true score model.
Coefficients of generalizability
Involves the application of information from the generalizability study. Developers examine the usefulness of test scores in helping the test user make decisions.
Decision study
States that the probability that an item will be answered correctly is a function of an underlying trait or ability that is not directly observable, that is, a latent trait; also referred to as latent-trait theory.
ITEM RESPONSE THEORY
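A minimal sketch of the idea in Python, assuming a one-parameter (Rasch-type) logistic model in which theta is the latent trait and b is the item difficulty (standard IRT notation, not from this card):

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Probability of a correct response under a one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# The further the latent trait exceeds the item difficulty, the higher the probability
print(p_correct(theta=1.0, b=0.0))   # ~0.73
print(p_correct(theta=-1.0, b=0.0))  # ~0.27
```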
Test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions.
Dichotomous test items
Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.
Polytomous test items
Provides a measure of the precision of an observed test score.
Standard error of measurement
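As a formula (standard notation): SEM = σ * sqrt(1 - r_xx), where σ is the standard deviation of the test scores and r_xx is the reliability coefficient of the test.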
A range or band of test scores constructed around an observed score that is likely to contain the true score; can also be used to determine whether a score is significantly different from a criterion.
Confidence interval
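A worked example under standard assumptions: with an observed score of 50 and SEM = 2, a 95% confidence interval for the true score is 50 ± 1.96 × 2, or roughly 46 to 54.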
Statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.
Standard error of the difference
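As a formula (standard notation): σ_diff = σ * sqrt(2 - r_1 - r_2), where r_1 and r_2 are the reliability coefficients of the two scores and σ is the standard deviation (assumed equal for both measures); equivalently, σ_diff = sqrt(SEM_1² + SEM_2²).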
4 sources of error variance
Test construction
Test administration
Test scoring and interpretation
Other sources of error
2 other sources of error variance
Sampling error - when the sample is not an actual representative of the population of data
Methodological error - a mistake or flaw in the design or execution of a research study