Test Construction Flashcards
An examiner administers and scores the same test numerous times without deviating from the procedure in order to reduce the possibility of measurement error. This exemplifies what?
Standardization
The scores of a representative sample of the population on a test, to which an examiner compares an individual's scores, are referred to as \_\_\_\_\_\_\_\_; while they allow for comparisons of a person's performance on different tests, they do not provide the ultimate standard of performance.
Norms
A psychological test that is regarded as \_\_\_\_\_\_\_\_ is administered, scored, and interpreted independent of the subjective judgment of the examiner.
Objective
The SAT and GRE are examples of \_\_\_\_\_\_\_\_ tests, as they provide information about a person's best possible performance, while the MMPI-2 and PAI are \_\_\_\_\_\_\_\_ tests, providing information about a person's usual experience.
Maximum performance; typical performance
\_\_\_\_\_\_\_\_ tests assess the difficulty level an examinee can attain (e.g., Information from WAIS), \_\_\_\_\_\_\_\_ tests assess the person’s response rate (e.g., Digit Symbol from WAIS), and \_\_\_\_\_\_\_\_ tests help determine whether an individual can attain a certain level of acceptable performance (e.g., test of reading skills).
Power; speed; mastery
A \_\_\_\_\_\_\_\_ occurs when an instrument cannot take on a value higher than some limit due to the measure not including enough difficult items, resulting in all high-achieving examinees getting similar scores (test is too easy); conversely, a \_\_\_\_\_\_\_\_ occurs when an instrument cannot take on a lower value and thus all low-achieving examinees get similar scores (test is too hard).
Ceiling effect; floor effect
In contrast to normative measures, these types of measures require individuals to use their own frame of reference to compare 2 or more desirable options and choose the one that is most preferred.
Ipsative measures
\_\_\_\_\_\_\_\_ is the consistency of a test, or the degree to which a test provides the same results under the same conditions; \_\_\_\_\_\_\_\_ refers to the degree that a test measures what it claims to be measuring.
Reliability; validity
A perfectly reliable test would yield each examinee’s \_\_\_\_\_\_\_\_ every time it was administered, as this would indicate the examinee’s actual ability on whatever the test is measuring; however, a test is never perfectly reliable due to \_\_\_\_\_\_\_\_, which is random and can be caused by environmental noise, the examinee’s mood on testing day, and any number of other factors.
True score; measurement error
The most commonly used methods of estimating the reliability of a test use a correlation coefficient, referred to as the \_\_\_\_\_\_\_\_, ranging in value from 0.0 to +1.0, where coefficients closer to 0.0 indicate less reliability and values closer to +1.0 indicate greater reliability; unlike other correlation coefficients, it is not squared to determine the proportion of variability but is interpreted directly (e.g., a coefficient of .80 means that 80% of the variability in obtained scores reflects true score variability).
Reliability coefficient
A researcher administers the same instrument to the same group of college students on 2 separate occasions; following the second administration, the researcher correlates the scores from the first and second administrations. What type of reliability is the researcher attempting to obtain?
Test-retest reliability (or “coefficient of stability”)
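As a rough illustration of the card above, the test-retest coefficient is a Pearson correlation between the two sets of scores. The sketch below uses only the Python standard library; the function name `pearson_r` and the score lists are hypothetical, made up purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five examinees on two occasions.
time1 = [10, 12, 14, 16, 18]
time2 = [11, 12, 15, 15, 19]
test_retest_r = pearson_r(time1, time2)  # about 0.96
```

The same computation applies to alternate-forms reliability if the second list holds scores from the equivalent form rather than a repeat administration.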
TRUE or FALSE: It is not recommended to use the test-retest coefficient when attempting to obtain reliability for a test that measures attributes that are unstable (e.g., mood).
TRUE: Low coefficients, in such cases, would more likely reflect the attribute's instability than the test's unreliability
A researcher administers one form of a test on one day, then administers an equivalent form to the same group of people at a later date/time. What type of reliability is being sought in this example?
Alternate forms reliability (or “coefficient of equivalence”; parallel-forms reliability)
When correlations are obtained among individual test items, \_\_\_\_\_\_\_\_ reliability is being assessed; the 3 methods for obtaining this reliability include \_\_\_\_\_\_\_\_ (involves dividing the test into 2 parts, then correlating responses from the 2 parts), \_\_\_\_\_\_\_\_ (used when test items are dichotomously scored, e.g., “true/false”), and \_\_\_\_\_\_\_\_ (used for tests with multiple-scored items, e.g., “never/rarely/sometimes/always”).
Internal consistency (or "coefficient of internal consistency"); split-half; Kuder-Richardson Formula 20; Cronbach's coefficient alpha
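To make the relationship between the last two methods concrete, here is a minimal sketch of Cronbach's alpha and KR-20 using the textbook formulas, alpha = k/(k−1) × (1 − Σ item variances / total-score variance), with KR-20 replacing each item variance by p×q. The function names and the response matrix are hypothetical, and population variances are assumed throughout so that the two formulas agree on dichotomous items.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """rows: one list of item scores per examinee (all the same length)."""
    k = len(rows[0])
    item_var_sum = sum(pvariance(col) for col in zip(*rows))
    total_var = pvariance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def kr20(rows):
    """KR-20 for items scored 0/1; p is the proportion passing each item."""
    k = len(rows[0])
    n = len(rows)
    pq_sum = 0.0
    for col in zip(*rows):
        p = sum(col) / n
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Hypothetical true/false responses: 5 examinees x 4 items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
kr = kr20(responses)  # equals alpha here because every item is scored 0/1
```

On dichotomous data the two functions return the same value, which is why KR-20 is often described as a special case of coefficient alpha.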
While the split-half method usually lowers the reliability coefficient artificially (each half is only half as long as the full test), the \_\_\_\_\_\_\_\_ can be used to correct for the effects of shortening the measure.
Spearman-Brown prediction formula
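A short sketch of the Spearman-Brown correction may help here. The standard formula is r_new = n × r / (1 + (n − 1) × r), where n is the factor by which the test is lengthened (n = 2 when stepping a half-test correlation up to full length). The function name and the example correlation are hypothetical.

```python
def spearman_brown(r, length_factor=2.0):
    """Predicted reliability after changing test length by length_factor.

    r is the reliability of the current (e.g., half-length) test.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


# Hypothetical split-half correlation of .60, stepped up to full length.
full_length_r = spearman_brown(0.60)  # about 0.75
```

With `length_factor` below 1 the same formula predicts the drop in reliability from shortening a test.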
Measures of internal consistency are not good at assessing reliability for \_\_\_\_\_\_\_\_ tests.
Speed tests, as the correlation would be spuriously inflated
Instruments that rely on rater judgments would be best to have high \_\_\_\_\_\_\_\_ reliability, which is increased when scoring categories are \_\_\_\_\_\_\_\_ and \_\_\_\_\_\_\_\_.
Inter-rater (interscorer); mutually exclusive (a particular behavior belongs to a single category); exhaustive (categories cover all possible responses/behaviors)
The \_\_\_\_\_\_\_\_ estimates the amount of error to be expected in an individual test score and is used to determine a range, referred to as a/an \_\_\_\_\_\_\_\_, within which an examinee's true score will likely fall.
Standard error of measurement; confidence interval
What is the formula for the standard error of measurement?
SEM = SDx × √(1 − rxx) (SDx = standard deviation of test scores; rxx = reliability coefficient)
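The formula on this card, SEM = SDx × √(1 − rxx), can be sketched in a few lines of Python. The function name and the example values (SD of 15, reliability of .91) are hypothetical and chosen only to give round numbers.

```python
from math import sqrt


def standard_error_of_measurement(sd_x, r_xx):
    """SEM = SDx * sqrt(1 - rxx)."""
    if sd_x < 0:
        raise ValueError("sd_x must be non-negative")
    if not 0.0 <= r_xx <= 1.0:
        raise ValueError("r_xx must be between 0.0 and 1.0")
    return sd_x * sqrt(1.0 - r_xx)


# Hypothetical test: standard deviation 15, reliability coefficient .91.
sem = standard_error_of_measurement(15.0, 0.91)  # about 4.5
```

Raising `r_xx` toward 1.0 shrinks the result toward 0, while a larger `sd_x` enlarges it.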
What is the probability that a person's true score lies within a range of plus or minus 1 standard error of measurement (SEM) of their obtained score? How about plus or minus 1.96 (2) SEM? And finally, plus or minus 2.58 (2.5) SEM?
68% of the time; 95% of the time; 99% of the time
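As a worked illustration, the three intervals can be computed by adding and subtracting the chosen multiple of the SEM from the obtained score. The function name, the obtained score of 100, and the SEM of 4.5 below are hypothetical values used only for the example.

```python
def true_score_interval(obtained, sem, z=1.96):
    """Return (lower, upper) bounds: obtained score plus or minus z * SEM."""
    if sem < 0:
        raise ValueError("sem must be non-negative")
    margin = z * sem
    return (obtained - margin, obtained + margin)


# Hypothetical examinee: obtained score 100, SEM 4.5.
obtained, sem = 100.0, 4.5
ci_68 = true_score_interval(obtained, sem, z=1.0)   # roughly 95.5 to 104.5
ci_95 = true_score_interval(obtained, sem, z=1.96)  # roughly 91.2 to 108.8
ci_99 = true_score_interval(obtained, sem, z=2.58)  # roughly 88.4 to 111.6
```

The interval widens as the confidence level rises, so greater certainty comes at the cost of a less precise estimate.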
TRUE or FALSE: Hypothetically, a test with a reliability coefficient of +1.0 would have a standard error of measurement of 0.0.
TRUE: A test with perfect reliability will have no error
The standard error of measurement is \_\_\_\_\_\_\_\_ related to the reliability coefficient (rxx) and \_\_\_\_\_\_\_\_ related to the standard deviation of test scores (SDx).
Inversely; positively
What reliability coefficient, when practical, is the best to use?
Alternate-forms
Classical test theory states that an observed score reflects \_\_\_\_\_\_\_\_ plus \_\_\_\_\_\_\_\_.
True score variance; random error variance
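The classical test theory decomposition can be written out as a short derivation that ties it to the standard error of measurement. The symbols are notation chosen for this sketch rather than taken from the cards (X = observed score, T = true score, E = random error), and the derivation assumes, as classical test theory does, that true scores and errors are uncorrelated. Here σ_X corresponds to SDx and σ_E to the SEM.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2}
\;\Longrightarrow\;
\sigma_E^2 = \sigma_X^2 \,(1 - r_{xx})
\;\Longrightarrow\;
\sigma_E = \sigma_X \sqrt{1 - r_{xx}}
```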