Test Construction Flashcards
An examiner administers and scores the same test numerous times without deviating from the procedure in order to reduce the possibility of measurement error. This exemplifies what?
Standardization
The scores of a representative population sample on a test, against which an examiner compares an individual's scores, are referred to as \_\_\_\_\_\_\_\_; while they allow for comparisons of a person's performance across different tests, they do not provide the ultimate standard of performance.
Norms
A psychological test that is regarded as \_\_\_\_\_\_\_\_ is administered, scored, and interpreted independent of the subjective judgment of the examiner.
Objective
The SAT and GRE are examples of \_\_\_\_\_\_\_\_ tests, as they provide information about a person's best possible performance, while the MMPI-2 and PAI are \_\_\_\_\_\_\_\_ tests, providing information about a person's usual experience.
Maximum performance; typical performance
\_\_\_\_\_\_\_\_ tests assess the difficulty level an examinee can attain (e.g., Information from WAIS), \_\_\_\_\_\_\_\_ tests assess the person's response rate (e.g., Digit Symbol from WAIS), and \_\_\_\_\_\_\_\_ tests help determine whether an individual can attain a certain level of acceptable performance (e.g., test of reading skills).
Power; speed; mastery
A \_\_\_\_\_\_\_\_ occurs when an instrument cannot take on a value higher than some limit due to the measure not including enough difficult items, resulting in all high-achieving examinees getting similar scores (test is too easy); conversely, a \_\_\_\_\_\_\_\_ occurs when an instrument cannot take on a lower value and thus all low-achieving examinees get similar scores (test is too hard).
Ceiling effect; floor effect
In contrast to normative measures, these types of measures require individuals to use their own frame of reference to compare 2 or more desirable options and choose the one that is most preferred.
Ipsative measures
\_\_\_\_\_\_\_\_ is the consistency of a test, or the degree to which a test provides the same results under the same conditions; \_\_\_\_\_\_\_\_ refers to the degree that a test measures what it claims to be measuring.
Reliability; validity
A perfectly reliable test would yield every examinee's \_\_\_\_\_\_\_\_ every time it was administered, as this would indicate the examinee's actual ability on whatever the test is measuring; however, a test is never perfectly reliable due to \_\_\_\_\_\_\_\_, which is random and can be caused by environmental noise, the examinee's mood on testing day, and any number of other factors.
True score; measurement error
The most commonly used methods of estimating the reliability of a test use a correlation coefficient, referred to as the \_\_\_\_\_\_\_\_, ranging in value from 0.0 to +1.0, where coefficients closer to 0.0 indicate less reliability and values closer to +1.0 indicate greater reliability; unlike other correlation coefficients, it is not squared to determine the proportion of variability but is interpreted directly.
Reliability coefficient
A researcher administers the same instrument to the same group of college students on 2 separate occasions; following the second administration, the researcher correlates the scores from the first and second administrations. What type of reliability is the researcher attempting to obtain?
Test-retest reliability (or "coefficient of stability")
TRUE or FALSE: It is not recommended to use the test-retest coefficient when attempting to obtain reliability for a test that measures attributes that are unstable (e.g., mood).
TRUE: In such cases, low coefficients would more likely reflect the attribute's instability than the test's unreliability
A researcher administers one form of a test on one day, then administers an equivalent form to the same group of people at a later date/time. What type of reliability is being sought in this example?
Alternate forms reliability (or "coefficient of equivalence"; parallel-forms reliability)
When correlations are obtained among individual test items, \_\_\_\_\_\_\_\_ reliability is being assessed; the 3 methods for obtaining this reliability include \_\_\_\_\_\_\_\_ (involves dividing the test into 2 parts, then correlating responses from the 2 parts), \_\_\_\_\_\_\_\_ (used when test items are dichotomously scored, e.g., "true/false"), and \_\_\_\_\_\_\_\_ (used for tests with multiple-scored items, e.g., "never/rarely/sometimes/always").
Internal consistency (or "coefficient of internal consistency"); split-half; Kuder-Richardson Formula 20; Cronbach's coefficient alpha
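As an illustration of the internal-consistency coefficients named above, here is a minimal sketch using the textbook formulas for Kuder-Richardson Formula 20 and Cronbach's coefficient alpha. The function names and the tiny data set are hypothetical, and population variances are assumed throughout; under that assumption the two coefficients coincide for dichotomous items.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one row per examinee, one column per item (hypothetical layout)."""
    k = len(item_scores[0])                                   # number of items
    sum_item_var = sum(pvariance(col) for col in zip(*item_scores))
    total_var = pvariance([sum(row) for row in item_scores])  # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

def kr20(item_scores):
    """KR-20 for items scored 0/1; p is the proportion passing each item."""
    k = len(item_scores[0])
    n = len(item_scores)
    sum_pq = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n
        sum_pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Made-up responses: 4 examinees, 3 true/false items scored 1 or 0.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(responses), kr20(responses))
```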
Because splitting a test in half shortens it, the split-half method usually yields an artificially low reliability coefficient; the \_\_\_\_\_\_\_\_ can be used to correct for the effects of shortening the measure.
Spearman-Brown prediction formula
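A short sketch of the Spearman-Brown correction may help here. It uses the standard form of the formula; the function name and the example value of 0.60 are hypothetical.

```python
def spearman_brown(r, length_factor=2.0):
    """Estimate reliability after changing test length by length_factor.

    r is the reliability of the current (e.g., half-length) test;
    length_factor=2 corrects a split-half coefficient to full length.
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# A made-up split-half correlation of 0.60 corrected to full test length.
corrected = spearman_brown(0.60)
print(round(corrected, 2))
```

With a half-test correlation of 0.60, the estimated full-length reliability is about 0.75, which shows how much the shortened halves understate reliability.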
Measures of internal consistency are not good at assessing reliability for \_\_\_\_\_\_\_\_ tests.
Speed tests, as the correlation would be spuriously inflated
Instruments that rely on rater judgments would be best to have high \_\_\_\_\_\_\_\_ reliability, which is increased when scoring categories are \_\_\_\_\_\_\_\_ and \_\_\_\_\_\_\_\_.
Inter-rater (interscorer); mutually exclusive (a particular behavior belongs to a single category); exhaustive (categories cover all possible responses/behaviors)
The \_\_\_\_\_\_\_\_ estimates the amount of error to be expected in an individual test score and is used to determine a range, referred to as a/an \_\_\_\_\_\_\_\_, within which an examinee's true score will likely fall.
Standard Error of Measurement; confidence interval
What is the formula for the standard error of measurement?
SEM = SDx × √(1 − rxx) (SDx = standard deviation of test scores; rxx = reliability coefficient)
What is the probability that a person's true score lies within a range of plus or minus 1 standard error of measurement (SEM) of their obtained score? How about plus or minus 1.96 (2) SEM? And finally, plus or minus 2.58 (2.5) SEM?
68% of the time; 95% of the time; 99% of the time
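The SEM formula and the confidence intervals above can be worked through in a few lines. This is a minimal sketch; the function names and the example values (SDx = 15, rxx = 0.91, obtained score = 100) are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd_x, r_xx):
    """SEM = SDx * sqrt(1 - rxx)."""
    return sd_x * sqrt(1 - r_xx)

def confidence_interval(obtained, sem, z=1.96):
    """Range around the obtained score; z = 1, 1.96, 2.58 for 68%, 95%, 99%."""
    return obtained - z * sem, obtained + z * sem

sem = standard_error_of_measurement(15, 0.91)   # approximately 4.5
low, high = confidence_interval(100, sem)       # 95% interval
print(round(sem, 2), round(low, 2), round(high, 2))
```

Under these made-up values the 95% interval runs from roughly 91.18 to 108.82. Setting rxx to 1.0 gives an SEM of 0, and a larger SDx gives a larger SEM.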
TRUE or FALSE: Hypothetically, a test with a reliability coefficient of +1.0 would have a standard error of measurement of 0.0.
TRUE: A test with perfect reliability will have no error
The standard error of measurement is \_\_\_\_\_\_\_\_ related to the reliability coefficient (rxx) and \_\_\_\_\_\_\_\_ related to the standard deviation of test scores (SDx).
Inversely; positively
What reliability coefficient, when practical, is the best to use?
Alternate-forms
Classical test theory states that variability in observed scores reflects \_\_\_\_\_\_\_\_ plus \_\_\_\_\_\_\_\_.
True score variance; random error variance
Methods of recording behaviors include \_\_\_\_\_\_\_\_ recording (elapsed time that behavior occurs is recorded), \_\_\_\_\_\_\_\_ recording (number of times behavior occurs is recorded), \_\_\_\_\_\_\_\_ recording (rater notes whether subject engages in behavior during given time period), and \_\_\_\_\_\_\_\_ recording (all behavior during an observation session is recorded).
Duration; frequency; interval; continuous
Simply put, \_\_\_\_\_\_\_\_ refers to the degree a test measures what it purports to measure.
Validity
A depression scale that only assesses the affective aspects of depression but fails to account for the behavioral aspects would be lacking what type of validity?
Content validity, which refers to the extent to which test items represent all facets of the content area being measured (e.g., EPPP)
TRUE or FALSE: Content validity assessment requires a degree of agreement between experts in the subject matter, thus it includes an element of subjectivity.
TRUE: Tests should also correlate highly with other tests that measure the same content domain
In contrast to content validity, \_\_\_\_\_\_\_\_ occurs when a test appears valid to examinees, administrators, and other untrained observers; it is not technically a type of test validity.
Face validity
A personality test that effectively predicts the future behavior of an examinee has what type of validity?
Criterion-related validity, which is obtained by correlating scores on a predictor test with some external criterion (e.g., academic achievement, job performance)
Criterion-related validity is assessed using a/an \_\_\_\_\_\_\_\_ to determine the relationship between the predictor and the criterion; for interpretation this value can be squared, producing the "\_\_\_\_\_\_\_\_," which indicates the proportion of variability in the criterion that is explained by variability in the predictor.
Correlation coefficient; coefficient of determination
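To make the squaring step concrete, the sketch below computes a Pearson correlation between predictor and criterion scores and then squares it. The function name and the three score pairs are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

predictor = [1, 2, 3]   # hypothetical predictor scores
criterion = [1, 3, 2]   # hypothetical criterion scores

r_xy = pearson_r(predictor, criterion)   # validity coefficient
r_squared = r_xy ** 2                    # coefficient of determination
print(round(r_xy, 2), round(r_squared, 2))
```

Here the validity coefficient is about 0.50, so roughly 25% of the variability in the criterion is explained by variability in the predictor.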
The process of \_\_\_\_\_\_\_\_ validation involves the predictor and the criterion being collected at the same time, providing information regarding a test's usefulness for predicting a given current behavior; \_\_\_\_\_\_\_\_ validation involves a waiting period between collection of predictor scores and criterion data, providing information regarding a test's usefulness for predicting future behavior.
Concurrent; predictive
When interpreting a person's predicted score on a given criterion measure, the \_\_\_\_\_\_\_\_ will determine within what range of scores their actual score will likely fall.
Standard Error of Estimate
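The cards do not give a formula for the standard error of estimate; the sketch below assumes the standard textbook form, SEest = SDy × √(1 − rxy²), where SDy is the standard deviation of criterion scores and rxy is the validity coefficient. The function names and example values are hypothetical.

```python
from math import sqrt

def standard_error_of_estimate(sd_y, r_xy):
    """SEest = SDy * sqrt(1 - rxy^2); assumed textbook formula, not stated in the cards."""
    return sd_y * sqrt(1 - r_xy ** 2)

def prediction_interval(predicted, se_est, z=1.96):
    """Range within which the actual criterion score will likely fall."""
    return predicted - z * se_est, predicted + z * se_est

se_est = standard_error_of_estimate(10, 0.6)   # approximately 8.0
low_y, high_y = prediction_interval(50, se_est)
print(round(se_est, 2), round(low_y, 2), round(high_y, 2))
```

With these made-up values, a predicted criterion score of 50 carries a 95% range of roughly 34.32 to 65.68. A validity coefficient of 1.0 would shrink that range to zero, and a coefficient of 0 would make SEest equal to SDy.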
The standard error of measurement constructs a confidence interval around an examinee's \_\_\_\_\_\_\_\_ score (using a reliability coefficient), while the standard error of estimate does the same for an examinee's \_\_\_\_\_\_\_\_ score (using a validity coefficient).
Obtained; predicted
Interviewees are given an aptitude test (predictor) to predict work success (criterion), with hiring contingent on achieving a certain minimum score, called a/an \_\_\_\_\_\_\_\_ score. The manager then rates performance on work tasks, an indication of success, and only those who score above a certain \_\_\_\_\_\_\_\_ are deemed successful.
Predictor cutoff; criterion cutoff
Scoring above both the predictor and criterion cutoff points produces \_\_\_\_\_\_\_\_; scoring above the predictor cutoff point but below the criterion cutoff point produces \_\_\_\_\_\_\_\_; scoring below the predictor cutoff point but above the criterion cutoff point produces \_\_\_\_\_\_\_\_; and scoring below both the predictor and criterion cutoff points produces \_\_\_\_\_\_\_\_.
True positives (valid acceptances); false positives (false acceptances); false negatives (invalid rejections); true negatives (valid rejections)
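The four outcomes can be expressed as a small decision rule. In this sketch the function name, the cutoff values, and the scores are hypothetical, and a score exactly equal to a cutoff is treated as meeting it, which is an assumption the cards do not address.

```python
def classify_outcome(predictor, criterion, predictor_cutoff, criterion_cutoff):
    """Label one applicant by position relative to both cutoffs."""
    selected = predictor >= predictor_cutoff      # assumption: ties count as passing
    successful = criterion >= criterion_cutoff    # assumption: ties count as passing
    if selected and successful:
        return "true positive"
    if selected and not successful:
        return "false positive"
    if not selected and successful:
        return "false negative"
    return "true negative"

# Made-up (aptitude score, performance rating) pairs; cutoffs of 70 and 3.
applicants = [(85, 4), (78, 2), (60, 5), (55, 1)]
labels = [classify_outcome(p, c, 70, 3) for p, c in applicants]
print(labels)
```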