Test Construction Flashcards
A psychologist develops a diagnostic test to identify people who have injection phobia. In this situation, the test’s ________ refers to how good the test is at identifying people who have injection phobia from the pool of people who actually have injection phobia.
Select one:
A.
specificity
B.
sensitivity
C.
positive predictive value
D.
negative predictive value
B.
sensitivity
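The four answer options above are all computed from the same 2x2 table of test results against true status. The sketch below shows how they differ; the function name and the counts are hypothetical and chosen only for illustration.

```python
def diagnostic_indices(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts."""
    return {
        # Proportion of people who HAVE the disorder that the test identifies
        "sensitivity": tp / (tp + fn),
        # Proportion of people who do NOT have the disorder that the test clears
        "specificity": tn / (tn + fp),
        # Proportion of positive test results that are correct
        "ppv": tp / (tp + fp),
        # Proportion of negative test results that are correct
        "npv": tn / (tn + fn),
    }

# Hypothetical sample: 50 people with injection phobia, 150 without.
indices = diagnostic_indices(tp=40, fn=10, fp=30, tn=120)
print(indices)
```

With these made-up counts, sensitivity is 40 / 50 = .80: the denominator is only the people who actually have the phobia.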
Which of the following is NOT an example of a standard score?
Select one:
A.
WAIS IQ score
B.
percentage score
C.
z-score
D.
T-score
B.
percentage score
__________ refers to the extent to which individual test items contribute to the overall purpose of the test.
Select one:
A.
Validity
B.
Reliability
C.
Discrimination
D.
Relevance
D.
Relevance
A first-year college student obtains a score of 150 on her final English exam, a score of 100 on her math exam, a score of 55 on her chemistry exam, and a score of 30 on her history exam. The means and standard deviations for these tests are, respectively, 125 and 20 for the English exam, 90 and 10 for the math exam, 45 and 5 for the chemistry exam, and 30 and 5 for the history exam. Based on this information, you can conclude that the student’s test performance was best on which exam?
Select one:
A.
English
B.
Math
C.
Chemistry
D.
History
C.
Chemistry
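The comparison behind this answer can be checked by converting each raw score to a z-score, z = (score - mean) / SD, using the values given in the question. This is a minimal sketch; the variable names are illustrative.

```python
# (raw score, mean, standard deviation) for each exam, from the question
exams = {
    "English": (150, 125, 20),
    "Math": (100, 90, 10),
    "Chemistry": (55, 45, 5),
    "History": (30, 30, 5),
}

# z-score: distance from the mean in standard deviation units
z_scores = {name: (score - mean) / sd for name, (score, mean, sd) in exams.items()}
best = max(z_scores, key=z_scores.get)

print(z_scores)
print(best)
```

The z-scores are 1.25 (English), 1.0 (math), 2.0 (chemistry), and 0.0 (history), so chemistry is the best relative performance.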
To maximize the ability of a test to discriminate among test takers, a test developer will want to include test items that vary in terms of difficulty. If the test developer wants to add more difficult items to her test, she will include items that have an item difficulty index of:
Select one:
A.
.90
B.
.50
C.
.10
D.
0
C.
.10
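The item difficulty index (p) is the proportion of examinees who answer the item correctly, so lower values indicate harder items. A minimal sketch, assuming items are scored 1 (correct) or 0 (incorrect); the response lists are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of examinees answering the item correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

# Hypothetical responses from 10 examinees
hard_item = [1] + [0] * 9   # only 1 of 10 answers correctly
easy_item = [1] * 9 + [0]   # 9 of 10 answer correctly

p_hard = item_difficulty(hard_item)
p_easy = item_difficulty(easy_item)
print(p_hard, p_easy)
```

Here the hard item has p = .10 and the easy item has p = .90.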
A final exam is developed to evaluate students’ comprehension of information presented in a high school history class. When the exam is administered to three classes of students at the end of the semester, all students obtain failing scores. This suggests that the exam may have poor ________ validity.
Select one:
A.
concurrent
B.
incremental
C.
content
D.
divergent
C.
content
The correction for attenuation formula is used to measure the impact of increasing:
Select one:
A.
a test’s reliability on its validity
B.
a test’s validity on its reliability
C.
the number of test items on the test’s validity
D.
the number of test items on the test’s reliability
A.
a test’s reliability on its validity
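The usual form of the correction for attenuation divides the observed validity coefficient by the square root of the product of the predictor and criterion reliabilities. The sketch below applies that formula; the function name and the coefficients are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the validity coefficient if predictor and criterion were perfectly reliable.

    r_xy: observed validity coefficient
    r_xx: reliability of the predictor
    r_yy: reliability of the criterion
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: observed validity .40, reliabilities .64 and .81
corrected = correct_for_attenuation(r_xy=0.40, r_xx=0.64, r_yy=0.81)
print(round(corrected, 2))
```

With these made-up values the estimated validity rises from .40 to about .56, which illustrates how measurement error lowers an observed validity coefficient.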
When a test has been constructed on the basis of item response theory, an examinee’s total test score provides information about his/her:
Select one:
A.
status on a latent trait or ability
B.
predicted performance on an external criterion
C.
performance relative to other examinees included in the standardization sample
D.
current developmental level
A.
status on a latent trait or ability
The primary advantage in using a percentile rank, z-score, or T-score is that these scores:
Select one:
A.
are easy to interpret because they reference an individual’s test performance to an established standard of performance
B.
are easy to interpret because they reference an individual’s test performance to the performance of other examinees
C.
are easy to interpret because they make it possible to predict which criterion group an examinee is likely to belong to
D.
normalize the raw score distribution so that parametric tests can be used to analyze test scores
B.
are easy to interpret because they reference an individual’s test performance to the performance of other examinees
You would use a multitrait-multimethod matrix in order to:
Select one:
A.
compare a test’s predictive and concurrent validity
B.
determine if a test has adequate convergent and discriminant validity
C.
identify the common factors underlying a set of related constructs
D.
test hypotheses about the causal relationships among variables
B.
determine if a test has adequate convergent and discriminant validity
An advantage of using the kappa statistic rather than percent agreement when assessing a test’s inter-rater reliability is that the former:
Select one:
A.
is easier to calculate
B.
corrects for chance agreement
C.
corrects for small sample size
D.
takes into account the effects of multicollinearity
B.
corrects for chance agreement
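To see what "corrects for chance agreement" means, the sketch below computes Cohen's kappa from a square agreement table as (observed agreement - chance agreement) / (1 - chance agreement). The table of ratings is hypothetical.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square table: rows = rater 1's categories, columns = rater 2's."""
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: expected from each rater's marginal totals
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings of 100 cases by two raters (yes/no)
ratings = [
    [40, 10],  # rater 1 "yes": rater 2 "yes", rater 2 "no"
    [10, 40],  # rater 1 "no":  rater 2 "yes", rater 2 "no"
]
kappa = cohens_kappa(ratings)
print(round(kappa, 2))
```

Percent agreement for this table is 80%, but half of that agreement would be expected by chance, so kappa is about .60.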
Which type of reliability would be most appropriate for estimating the reliability of a multiple-choice speed test?
Select one:
A.
Split-half
B.
Coefficient of concordance
C.
Alternate forms
D.
Coefficient alpha
C.
Alternate forms
When using principal component analysis:
Select one:
A.
the first principal component represents the largest share of the total variance.
B.
the first principal component represents the smallest share of the total variance.
C.
each component represents an equal share of the total variance.
D.
the order of the components is not related to the share of total variance they represent.
A.
the first principal component represents the largest share of the total variance.
Which of the following scores does not “belong with” the other three?
Select one:
A.
Stanine scores
B.
z-scores
C.
Percentile ranks
D.
Percentage scores
D.
Percentage scores
Stella obtains a score of 50 on a test that has a standard deviation of 10 and a standard error of measurement of 5. The 95% confidence interval for Stella’s score is approximately:
Select one:
A.
45 to 55
B.
40 to 60
C.
35 to 65
D.
30 to 70
B.
40 to 60
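The interval is built from the standard error of measurement, not the standard deviation: obtained score plus or minus about 2 SEM for 95% confidence. A minimal sketch using the values from the question; the function name is illustrative, and the multiplier of 2.0 is the usual rounding of 1.96.

```python
def confidence_interval(score, sem, z=2.0):
    """Confidence interval around an obtained score: score +/- z * SEM."""
    return (score - z * sem, score + z * sem)

# Stella's obtained score is 50 and the SEM is 5
low, high = confidence_interval(score=50, sem=5)
print(low, high)
```

This gives 40 to 60. Using the standard deviation of 10 instead of the SEM would produce the incorrect 30 to 70.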
In a normal distribution, which of the following represents the lowest score?
Select one:
A.
Percentile rank of 20
B.
z-score of -1.0
C.
T-score of 25
D.
Wechsler IQ score of 70
C.
T-score of 25
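One way to compare the four options is to put them all on the z-score scale. The sketch below assumes the conventional scale parameters (T-scores: mean 50, SD 10; Wechsler IQ: mean 100, SD 15) and uses Python's `statistics.NormalDist` to convert the percentile rank.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

z_equivalents = {
    # z-score below which 20% of a normal distribution falls
    "Percentile rank of 20": std_normal.inv_cdf(0.20),
    "z-score of -1.0": -1.0,
    # T-scores: mean 50, SD 10
    "T-score of 25": (25 - 50) / 10,
    # Wechsler IQ scores: mean 100, SD 15
    "Wechsler IQ score of 70": (70 - 100) / 15,
}

lowest = min(z_equivalents, key=z_equivalents.get)
print(z_equivalents)
print(lowest)
```

The T-score of 25 is 2.5 standard deviations below the mean, which is lower than the IQ score of 70 (2.0 below), the z-score of -1.0, and the percentile rank of 20 (about 0.84 below).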
A personnel director uses a mechanical aptitude test to hire machine shop workers. Several of the people hired using the test turn out to be less than adequate performers. These individuals are:
Select one:
A.
true positives
B.
true negatives
C.
false positives
D.
false negatives
C.
false positives
In terms of magnitude, the standard error of measurement can be:
Select one:
A.
no greater than 1.0
B.
no less than 1.0
C.
no greater than the standard deviation of the test scores
D.
no less than the standard deviation of the test scores
C.
no greater than the standard deviation of the test scores
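This limit follows from the standard formula SEM = SD * sqrt(1 - r), where r is the reliability coefficient and ranges from 0 to 1. A short sketch with a hypothetical standard deviation of 10:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

sd = 10  # hypothetical standard deviation of the test scores
sem_unreliable = standard_error_of_measurement(sd, 0.0)   # reliability of 0
sem_moderate = standard_error_of_measurement(sd, 0.75)
sem_perfect = standard_error_of_measurement(sd, 1.0)      # reliability of 1
print(sem_unreliable, sem_moderate, sem_perfect)
```

The SEM equals the standard deviation (10) when reliability is 0, drops to 5 when reliability is .75, and reaches 0 when reliability is 1.0, so it can never exceed the standard deviation.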
A 200-item test that has been administered to 100 college students has a normal distribution, a mean of 145, and a standard deviation of 12. When the students’ raw scores have been converted to percentile ranks, Alex obtains a percentile rank of 49, while his twin sister Alicia obtains a percentile rank of 90. The teacher realizes that she made a mistake in scoring Alex’s and Alicia’s tests: Both should have received a raw score that was five points higher. In terms of their percentile ranks, when the teacher adds the five points to Alex’s and Alicia’s scores, she can expect that:
Select one:
A.
Alicia’s percentile rank will increase more than Alex’s.
B.
Alex’s percentile rank will increase more than Alicia’s.
C.
Alicia’s and Alex’s percentile ranks will increase by the same amount.
D.
Alicia’s and Alex’s percentile ranks will not change.
B.
Alex’s percentile rank will increase more than Alicia’s.
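The reasoning is that scores are packed most densely near the middle of a normal distribution, so a five-point gain passes more examinees there than in the tail. The sketch below approximates this with a normal model using the mean and standard deviation from the question; it assumes everyone else's scores stay the same, and the helper name is hypothetical.

```python
from statistics import NormalDist

scores = NormalDist(mu=145, sigma=12)  # distribution described in the question

def percentile_gain(percentile_rank, added_points=5):
    """Approximate change in percentile rank after adding points to the raw score."""
    raw = scores.inv_cdf(percentile_rank / 100)       # raw score at this percentile rank
    new_rank = 100 * scores.cdf(raw + added_points)   # percentile rank after the correction
    return new_rank - percentile_rank

alex_gain = percentile_gain(49)
alicia_gain = percentile_gain(90)
print(round(alex_gain, 1), round(alicia_gain, 1))
```

Under this model Alex gains roughly 16 percentile points and Alicia roughly 5 to 6.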
In a distribution of percentile ranks, the number of examinees receiving percentile ranks between 20 and 30 is:
Select one:
A.
equal to the number of examinees receiving percentile ranks between 50 and 60
B.
greater than the number of examinees receiving percentile ranks between 50 and 60
C.
about one-half the number of examinees receiving percentile ranks between 50 and 60
D.
about one-fourth the number of examinees receiving percentile ranks between 50 and 60
A.
equal to the number of examinees receiving percentile ranks between 50 and 60
To obtain a “coefficient of stability,” you would:
Select one:
A.
administer the same test twice to the same group of examinees on two separate occasions and correlate the two sets of scores.
B.
administer a test to a group of examinees and determine the average inter-item correlation.
C.
administer a test to two different random samples of examinees on two occasions and correlate the two sets of scores.
D.
administer parallel forms of a test to the same group of examinees and correlate the two sets of scores.
A.
administer the same test twice to the same group of examinees on two separate occasions and correlate the two sets of scores.
The best way to control consensual observer drift is to:
Select one:
A.
use the correction for attenuation formula
B.
use a true experimental research design
C.
videotape the observers
D.
alternate raters
D.
alternate raters
A reliability coefficient is best defined as a measure of:
Select one:
A.
relevance
B.
consistency
C.
interpretability
D.
generalizability
B.
consistency
A test designed to measure knowledge of clinical psychology is likely to have the highest reliability coefficient when:
Select one:
A.
the test consists of 30 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology
B.
the test consists of 30 items and the tryout sample consisted of individuals who are homogeneous in terms of knowledge of clinical psychology
C.
the test consists of 80 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology
D.
the test consists of 80 items and the tryout sample consisted of individuals who are homogeneous in terms of knowledge of clinical psychology
C.
the test consists of 80 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology
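The effect of test length can be illustrated with the Spearman-Brown prophecy formula, which estimates reliability after a test is lengthened with comparable items. The sketch below covers only the length part of the answer, not sample heterogeneity, and the starting reliability of .70 for the 30-item version is a hypothetical value.

```python
def spearman_brown(reliability, length_factor):
    """Estimated reliability when test length is multiplied by length_factor.

    Assumes the added items are comparable to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

r_30_items = 0.70  # hypothetical reliability of the 30-item version
r_80_items = spearman_brown(r_30_items, length_factor=80 / 30)
print(round(r_80_items, 2))
```

With these assumptions the estimated reliability rises from .70 to about .86 when the test grows from 30 to 80 items.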
Assuming no constraints in terms of time, money, or other resources, the best way to demonstrate that a test has adequate reliability is by using which of the following techniques?
Select one:
A.
equivalent (alternate) forms
B.
test-retest
C.
Cronbach’s alpha
D.
Cohen’s kappa
A.
equivalent (alternate) forms
In a normal distribution, a T-score of ___ is equivalent to a percentile rank of 16.
Select one:
A.
10
B.
20
C.
30
D.
40
D.
40
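The conversion can be checked by turning the T-score into a z-score (T-scores have a mean of 50 and a standard deviation of 10) and then finding the area below that z-score in a normal distribution. A minimal sketch; the function name is illustrative.

```python
from statistics import NormalDist

def t_score_to_percentile(t):
    """Percentile rank of a T-score in a normal distribution (T: mean 50, SD 10)."""
    z = (t - 50) / 10
    return 100 * NormalDist().cdf(z)

percentile = t_score_to_percentile(40)
print(round(percentile))
```

A T-score of 40 is one standard deviation below the mean, and about 16% of a normal distribution falls below that point.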