Test Construction Quiz Questions Flashcards
A screening test for a disorder that has a very low base rate in the population is known to have an overall accuracy rate of 98%. When using this test to identify individuals in the general population who have the disorder, it’s important to keep in mind that the test will produce:
a larger number of false positives than false negatives
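A rough numeric sketch of why this happens. The card only gives an overall accuracy of 98%; the sketch assumes (hypothetically) that this means sensitivity and specificity are both .98, and it uses an invented base rate of 1% in a population of 100,000.

```python
def screening_errors(population, base_rate, sensitivity, specificity):
    """Return (false_positives, false_negatives) as expected counts."""
    with_disorder = population * base_rate
    without_disorder = population - with_disorder
    # People with the disorder whom the test misses
    false_negatives = with_disorder * (1 - sensitivity)
    # People without the disorder whom the test flags
    false_positives = without_disorder * (1 - specificity)
    return false_positives, false_negatives


# Assumed values: 1% base rate, sensitivity = specificity = .98
fp, fn = screening_errors(100_000, 0.01, 0.98, 0.98)
```

Under these assumptions there are roughly 1,980 false positives against about 20 false negatives, because the 2% error rate is applied to a far larger group of people who do not have the disorder.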
In a factor matrix, the factor loading for Test A and Factor II is .70. This means that:
49% of variability in Test A is accounted for by Factor II
When using the multitrait-multimethod matrix to evaluate the construct validity of a newly developed test, a __________ coefficient provides evidence of the test’s divergent (discriminant) validity.
small heterotrait-monomethod
When item response theory has been used as the basis for test construction, an examinee’s score on the test provides information about his/her:
status on a latent trait or ability
To construct the 68% confidence interval for an examinee’s obtained test score, you would need the examinee’s score and:
the standard error of measurement
When a test’s reliability coefficient is equal to 0, the standard error of measurement for the test is:
equal to the test’s standard deviation.
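This follows from the standard formula SEM = SD × √(1 − r). A minimal sketch; the standard deviation of 15 is just an illustrative value.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


sem_no_reliability = standard_error_of_measurement(15, 0.0)       # equals the SD
sem_perfect_reliability = standard_error_of_measurement(15, 1.0)  # equals 0
```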
A test developer uses a multitrait-multimethod matrix to organize the data she has collected in a validation study of her newly developed self-report measure of self-esteem. The matrix indicates that the correlation between her self-report measure of self-esteem and an established (previously validated) teacher rating of self-esteem is .91. This correlation coefficient suggests that the self-report measure of self-esteem has:
adequate convergent validity.
A measure of test anxiety is administered to a sample of 50 psychologists who are studying for the licensing exam, and a split-half reliability coefficient of .80 is calculated from their scores. The test is then administered to another group of 50 psychologists who are more heterogeneous with regard to level of test anxiety. The split-half reliability coefficient for the second group is most likely to be:
Larger than .80
You would use which of the following to estimate what a predictor’s criterion-related validity coefficient would be if the predictor and/or criterion had a reliability coefficient of 1.0?
Correction for attenuation formula
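The usual form of the correction divides the obtained validity coefficient by the square root of the product of the two reliability coefficients. A small sketch with made-up numbers (obtained validity .40, predictor reliability .64, criterion treated as perfectly reliable):

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Estimated validity if predictor and criterion were perfectly reliable.

    r_xy: obtained validity coefficient
    r_xx: predictor reliability, r_yy: criterion reliability
    """
    return r_xy / math.sqrt(r_xx * r_yy)


corrected = correct_for_attenuation(0.40, 0.64)  # about .50
```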
Scores on a predictor that will be used to estimate job performance ratings range from 0 to 200. If the predictor’s cutoff score is raised from 130 to 150, this will have which of the following effects?
Decrease the number of false positives
To assess the internal consistency reliability of a test that contains 50 items that are each scored as either “correct” or “incorrect,” you would use which of the following?
KR-20
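For reference, KR-20 = (k / (k − 1)) × (1 − Σpq / σ²), where k is the number of items, p is the proportion passing each item, q = 1 − p, and σ² is the variance of total scores. The sketch below assumes the population (divide-by-n) variance, which is one common convention; the tiny data set is invented.

```python
def kr20(item_scores):
    """KR-20 for dichotomous items.

    item_scores: one row per examinee, each row a list of 0/1 item scores.
    """
    n = len(item_scores)
    k = len(item_scores[0])
    # Proportion of examinees answering each item correctly
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # Population variance of total scores (assumed convention)
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Four hypothetical examinees, three items
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
reliability = kr20(data)  # 0.75 for this data set
```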
The kappa statistic for a test is .95. This means that the test has:
Adequate inter-rater reliability
For a newly developed test of cognitive flexibility, coefficient alpha is .55. Which of the following would be useful for increasing the size of this coefficient?
Adding more items that are similar in terms of content and quality
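The Spearman-Brown prophecy formula gives a rough idea of how much lengthening helps. It assumes the added items are comparable to the existing ones, and treating coefficient alpha as the starting reliability is itself an approximation.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Doubling a test whose current coefficient is .55
doubled = spearman_brown(0.55, 2)  # about .71
```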
A student receives a score of 450 on a college aptitude test that has a mean of 500 and standard error of measurement of 50. The 68% confidence interval for the student’s score is:
400 to 500
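The arithmetic: a 68% interval is the obtained score plus and minus one standard error of measurement. The test's mean of 500 is a distractor and is not used. A small sketch; the z values of about 1.96 for 95% and 2.58 for 99% are the usual normal-curve multipliers.

```python
def confidence_interval(obtained_score, sem, z=1.0):
    """Interval around an obtained score; z=1.0 gives the 68% interval."""
    return (obtained_score - z * sem, obtained_score + z * sem)


interval_68 = confidence_interval(450, 50)  # (400.0, 500.0)
```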
Consensual observer drift tends to:
produce an overestimate of a test’s inter-rater reliability.
To determine a test’s internal consistency reliability by calculating coefficient alpha, you would:
administer the test to a single sample of examinees one time.
A problem with using percent agreement as a measure of inter-rater reliability is that it doesn’t take into account the effects of:
chance agreement among raters.
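Cohen's kappa corrects for this: kappa = (observed agreement − chance agreement) / (1 − chance agreement). A sketch for two raters with invented ratings, showing that high percent agreement can coexist with a kappa at or below zero when one category dominates:

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters agree."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal proportions
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: both raters say "yes" almost every time
a = ["yes"] * 9 + ["no"]
b = ["yes"] * 8 + ["no", "yes"]
agreement = percent_agreement(a, b)  # 0.8
kappa = cohens_kappa(a, b)           # slightly below 0
```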
According to classical test theory, total variability in obtained test scores is composed of:
true score variability plus random error.
Which of the following methods for evaluating reliability is most appropriate for speed tests?
Coefficient of equivalence
Your newly developed measure of integrity correlates highly with a well-known and widely used measure of integrity. This correlation provides evidence of your measure’s ________ validity.
Convergent
In a multitrait-multimethod matrix, a test’s construct validity would be confirmed when:
monotrait-heteromethod coefficients are high and heterotrait-monomethod coefficients are low.
Mnemonic: High Monotrait, Low Heterotrait (HMLH, “ham l’hell”)
Which of the following best defines the relationship between a predictor’s reliability coefficient and its criterion-related validity coefficient?
Validity is no greater than the square root of reliability
Mnemonic: V before R (“VR”): validity is no greater than the square root of reliability
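A quick illustration of the ceiling; the reliability of .81 is an arbitrary example value.

```python
import math


def max_validity(reliability):
    """Upper limit on a predictor's validity coefficient: sqrt(reliability)."""
    return math.sqrt(reliability)


ceiling = max_validity(0.81)  # about .90
```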
The results of a factor analysis indicate that Test A has a factor loading of .70 for Factor I and a factor loading of .20 for Factor II. Assuming that only two factors were extracted and that the factors are orthogonal, you can conclude that the communality for Test A scores is:
0.53
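With orthogonal factors, the communality is the sum of the squared factor loadings: .70² + .20² = .49 + .04 = .53. The same squaring gives the proportion of variance explained by a single factor. A minimal sketch:

```python
def communality(loadings):
    """Sum of squared loadings; valid when the factors are orthogonal."""
    return sum(loading ** 2 for loading in loadings)


test_a = communality([0.70, 0.20])  # about .53
factor_one_only = communality([0.70])  # about .49
```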
When conducting a factor analysis, you would choose an oblique rotation of the factors if:
you believe the constructs measured by the tests included in the analysis are correlated.
The standard error of estimate is used to:
estimate the difference between an examinee’s predicted criterion score and his or her true criterion score.
In a scatterplot constructed from data collected in a concurrent validity study, the number of “false negatives” is likely to increase if:
the predictor cutoff score is raised and/or the criterion cutoff score is lowered.
Validity is best described as:
Accuracy
A test developer uses a sample of 50 current employees to identify items for and then validate a new selection test (predictor). When she correlates scores on the test with scores on a measure of job performance (criterion) for this sample, she obtains a criterion-related validity coefficient of .63. When the test developer administers the test and the measure of job performance to a new sample of 50 employees, she will most likely obtain a validity coefficient that is:
Less than .63 (shrinkage)
When determining a predictor’s incremental validity, the positive hit rate is calculated by:
dividing the number of true positives by the total number of positives.
Test’s sensitivity measures
True positives
Test’s specificity measures
True negatives
“How much no”
specific = pacific = bad (bc California)
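The three rates are easier to keep apart when computed from the same 2×2 table. A sketch with invented counts (40 true positives, 10 false negatives, 30 true negatives, 20 false positives):

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the condition whom the test identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the condition whom the test clears."""
    return true_neg / (true_neg + false_pos)


def positive_hit_rate(true_pos, false_pos):
    """Proportion of all test positives that are true positives."""
    return true_pos / (true_pos + false_pos)


sens = sensitivity(40, 10)        # 0.8
spec = specificity(30, 20)        # 0.6
hit = positive_hit_rate(40, 20)   # about 0.67
```

Sensitivity and specificity are divided by the number of people who actually have (or do not have) the condition, while the positive hit rate is divided by the number of people the test flags.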
A test’s content validity is established primarily by which of the following?
Having subject matter experts systematically review the test’s items
To ascertain if the test you have developed is valid as a screening test for determining whether a person has an anxiety or affective disorder, you would be most interested in evaluating the test’s:
Concurrent validity (type of criterion-related validity)
To evaluate the concurrent validity of a new selection test for computer programmers, you would:
administer the test to current computer programmers and correlate their test scores with recently assigned job performance ratings.
____________ refers to the percent of examinees who have the condition being assessed by a predictor who are identified by the predictor as having the condition.
Sensitivity
“How much yes”