Test Construction Flashcards
When using the multitrait-multimethod matrix, a large ____________ coefficient provides evidence of a test’s convergent validity.
monotrait-heteromethod
Regardless of the distribution of raw scores, when the raw scores are converted to percentile ranks, the resulting distribution will be:
rectangular.
The percentile rank distribution is:
rectangular regardless of the shape of the raw score distribution.
The item discrimination index (D) ranges from:
-1.00 to +1.00.
In a normal distribution, a T-score of ___ is equivalent to a percentile rank of 84.
60
When using the multitrait-multimethod matrix, a small heterotrait-monomethod coefficient provides evidence of a test’s:
divergent validity.
The test manual for an academic achievement test indicates that it has an alternate forms reliability coefficient of .80. This means that _____ of variability in test scores is true score variability.
80%
A factor analysis yields three factors, and one of the tests included in the analysis (Test A) has factor loadings of .60, .30, and .10 on Factors I, II, and III, respectively. Assuming that the rotation was orthogonal, the communality for Test A is:
.46.
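Worked example: with an orthogonal rotation, the communality is the sum of the squared factor loadings. The sketch below reproduces the .46 answer; the function name is illustrative only.

```python
def communality(loadings):
    """Sum of squared factor loadings for one test (orthogonal factors only)."""
    return sum(loading ** 2 for loading in loadings)

# Loadings from the card: .60, .30, .10 -> .36 + .09 + .01
print(round(communality([0.60, 0.30, 0.10]), 2))  # 0.46
```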
On cross-validation, a multiple correlation coefficient (R) is most likely to “shrink” when:
the original validation sample was small and the number of predictors is large.
A psychologist was hired to develop a new selection test that the company will add to its current selection procedure. The psychologist determines that the test has adequate test-retest reliability and concurrent validity. However, before the company begins using the test to hire applicants, the psychologist will want to make sure that its use also produces an increase in decision-making accuracy. In other words, the psychologist will want to make sure that the test has adequate:
incremental validity.
In the context of factor analysis, “oblique” means:
correlated.
A newly developed aptitude test that will be used to help make college admissions decisions was administered to 100 high school seniors whose grade point averages ranged from 3.5 to 4.0, and a split-half reliability coefficient of .75 was calculated from their scores. The aptitude test was then administered to another sample of 100 high school seniors whose grade point averages ranged from 2.0 to 4.0. The split-half reliability coefficient for the second sample of students will most likely be:
larger than .75.
The Kuder-Richardson Formula 20 (KR-20) can be used to estimate a test’s ____________ reliability when test items are scored dichotomously.
internal consistency
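Worked example: a minimal sketch of KR-20, assuming the usual formula k/(k-1) * (1 - sum(p*q) / total-score variance) with the population variance of total scores. The response matrix is made-up illustration data, not from any real test.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for dichotomous (0/1) items; rows are examinees, columns are items."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n  # proportion answering correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))

# Hypothetical data: 4 examinees, 3 items
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(data))  # 0.75
```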
A student receives a score of 85 on a test that has a mean of 75 and standard deviation of 5. What is the z-score equivalent of the student’s score?
+2.0
The item difficulty index ranges from __________, with 0 indicating a __________.
0 to +1.0; very difficult item
To estimate the effect of shortening or lengthening a test on the test’s reliability coefficient, you would use which of the following?
Spearman-Brown formula
Which aspect of an item characteristic curve (ICC) indicates the probability of choosing the correct answer to the item by guessing alone?
the point at which the curve intercepts the y-axis
When using the multitrait-multimethod matrix to assess a test’s construct validity, a large heterotrait-monomethod coefficient indicates which of the following?
inadequate divergent validity
A job applicant’s score on a selection test is used to predict what her future score on a measure of job performance will be if she’s hired. If the applicant’s predicted job performance score is 80 and the measure of job performance has a standard deviation of 7 and standard error of estimate of 3, the 99% confidence interval for the applicant’s predicted score of 80 is:
71 to 89.
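Worked example: the interval answers in this deck appear to use rounded multipliers (2 standard errors for 95%, 3 for 99%) rather than the exact normal values of 1.96 and 2.58. That reading is inferred from the answers, not stated on the cards. Under that assumption, the same arithmetic applies to the standard error of estimate and the standard error of measurement:

```python
def confidence_interval(score, standard_error, multiplier):
    """Return (lower, upper) as score +/- multiplier * standard_error."""
    margin = multiplier * standard_error
    return score - margin, score + margin

# 99% interval: predicted score 80, standard error of estimate 3
print(confidence_interval(80, 3, 3))  # (71, 89)
# 95% interval: predicted score 70, standard error of estimate 5
print(confidence_interval(70, 5, 2))  # (60, 80)
```

The criterion's standard deviation of 7 is a distractor here; only the standard error of estimate enters the interval.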
When you assess the validity of a test, you are assessing its:
accuracy.
When using a selection test to predict the job performance scores of job applicants, you would use which of the following to construct a confidence interval around each applicant’s predicted job performance score?
standard error of estimate
In a normal distribution, a T-score of 60 is equivalent to a percentile rank of:
84.
An expectancy table is useful for interpreting an examinee’s score in terms of:
likely criterion performance.
The point at which an item characteristic curve intercepts the Y (vertical) axis provides information about which of the following?
the probability of answering the item correctly by guessing
Which of the following is likely to produce the largest reliability coefficient for a newly developed achievement test?
unrestricted range of scores and homogeneous content of test items
A factor matrix indicates that one of the tests included in the factor analysis has a factor loading of .30 for Factor I. This means that ____ of variability in test scores is explained by Factor I.
9%
When using the multitrait-multimethod matrix to assess a test’s construct validity, a small heterotrait-monomethod coefficient suggests that the test has:
adequate divergent validity.
The results of a factor analysis indicate that a test has a correlation coefficient of .20 with Factor I, .35 with Factor II, and .60 with Factor III. The correlation of .60 indicates that ____% of variability in test scores is explained by Factor III.
36
When a test item has an item discrimination index (D) of _____, this means that the same percentage of examinees in the high-scoring and low-scoring groups answered the item correctly.
0
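Worked example: D is the proportion of the high-scoring group answering the item correctly minus the proportion of the low-scoring group doing so. The group sizes and counts below are hypothetical.

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = proportion correct in upper group - proportion correct in lower group."""
    return upper_correct / upper_n - lower_correct / lower_n

print(discrimination_index(18, 20, 18, 20))  # 0.0  (same percentage in both groups)
print(discrimination_index(20, 20, 0, 20))   # 1.0  (only the upper group is correct)
print(discrimination_index(0, 20, 20, 20))   # -1.0 (only the lower group is correct)
```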
Job applicants who are hired on the basis of their scores on a job selection test but then obtain unsatisfactory scores on a measure of job performance six months later are:
false positives.
Based on his score on a selection test, a job applicant obtains a predicted job performance score of 70. The job performance measure’s standard error of estimate is 5, which means that there’s a 95% chance that his true job performance score is between:
60 and 80.
An organizational psychologist conducts a predictive validity study to obtain the data she needs to evaluate a new selection test’s incremental validity. The data she collects indicates that 45 of the employees in her study are “true negatives,” which means that they obtained:
low scores on both the predictor and criterion.
An achievement test has a test-retest reliability coefficient of .90. This means that ____ of variability in test scores is due to true score variability.
90%
A test developer is calculating a test’s __________ when she divides the number of true positives identified by the test by the number of true positives plus false negatives.
sensitivity
A normal distribution of raw scores has a mean of 75 and standard deviation of 5. In this distribution, a raw score of 65 is equivalent to a:
z-score of -2.0.
A test’s __________ is its ability to accurately identify individuals who don’t have the condition that is assessed by the test.
specificity
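Worked example: sensitivity and specificity from the four cells of a classification table. The counts are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people with the condition whom the test identifies."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people without the condition whom the test identifies."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts: 50 people with the condition, 50 without
print(sensitivity(true_pos=40, false_neg=10))  # 0.8
print(specificity(true_neg=45, false_pos=5))   # 0.9
```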
A test developer would use the multitrait-multimethod matrix to evaluate a test’s:
construct validity.
To evaluate the test-retest reliability of a newly developed measure of intelligence, a test developer administers the test to the same sample of examinees on two separate occasions. When he correlates the two sets of scores, he obtains a reliability coefficient of .60. To increase this reliability coefficient, the test developer should:
increase the number of test items and make sure the new sample of examinees is heterogeneous with regard to level of intelligence.
An assumption of classical test theory is that measurement error:
is random.
Classical test theory predicts that total variability in test scores is the result of true score variability:
plus random error.
For a test that consists of 50 true/false questions, the optimal average item difficulty level (p) is:
.75
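Worked example: a common rule of thumb puts the optimal average difficulty halfway between 1.0 and the probability of answering correctly by chance. A sketch under that assumption:

```python
def optimal_difficulty(n_options):
    """Midpoint between perfect performance (1.0) and the chance success rate."""
    chance = 1 / n_options
    return (1.0 + chance) / 2

print(optimal_difficulty(2))  # 0.75  (true/false)
print(optimal_difficulty(4))  # 0.625 (four-option multiple choice)
```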
In a normal distribution, a percentile rank of ____ is one standard deviation above the mean of the distribution.
84
You would use which of the following to construct a confidence interval around an examinee’s predicted criterion score?
standard error of estimate
In the context of factor analysis, “orthogonal” means:
uncorrelated.
When a predictor’s criterion-related validity coefficient is .40, this means that:
16% of variability in criterion scores is explained by variability in predictor scores.
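Worked example: squaring a validity coefficient or a factor loading gives the proportion of variability explained. The loop covers the coefficients used on several cards in this deck.

```python
def variance_explained(r):
    """Proportion of variability explained by a correlation or factor loading."""
    return r ** 2

for r in (0.30, 0.40, 0.60, 0.70, 0.80):
    print(f"r = {r:.2f} -> {variance_explained(r):.0%} of variability explained")
```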
The correction for attenuation formula is used to estimate the effects of increasing:
the reliability of a predictor and/or criterion on the criterion-related validity coefficient.
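Worked example: a sketch of the standard correction for attenuation, which divides the observed validity coefficient by the square root of the product of the predictor and criterion reliabilities. All input values are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Estimated validity if predictor (r_xx) and criterion (r_yy) were perfectly reliable.

    Leave r_yy at 1.0 to correct for predictor unreliability only.
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical: observed validity .30, predictor reliability .64, criterion reliability .81
print(round(correct_for_attenuation(0.30, 0.64, 0.81), 2))  # 0.42
```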
The results of a factor analysis indicate that the factor loading for Test A and Factor I is .70. This means that:
49% of variability in Test A scores is explained by Factor I.
The standard error of measurement is used to:
construct a confidence interval around an examinee’s obtained score.
A middle school student receives a full-scale IQ score of 105 on an intelligence test that has a mean of 100, standard deviation of 15, and standard error of measurement of 3. The 95% confidence interval for this student’s score is:
99 to 111.
A T-score distribution has a mean of _____ and standard deviation of _____.
50; 10
In the context of diagnostic efficiency, prevalence refers to how common a disorder is in a particular population at a particular point in time, and its magnitude affects a test’s positive and negative predictive values. When the prevalence increases:
the positive predictive value increases and the negative predictive value decreases.
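Worked example: holding sensitivity and specificity fixed, the sketch below shows how predictive values shift as prevalence rises. The sensitivity, specificity, and prevalence values are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from cell proportions of the full population."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

for prevalence in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

Each step up in prevalence raises the PPV and lowers the NPV, which matches the card's answer.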
Dr. Haar is concerned that the statistics tests she uses for her introductory statistics class are too difficult since so few students pass them. To make her tests a little easier, she will want to remove some items that have an item difficulty index (p) of ________ and add some items that have an item difficulty index of ________.
.15 and lower; .85 and higher
When a predictor has a criterion-related validity coefficient of _____, this means that 64% of variability in scores on the criterion is explained by variability in scores on the predictor.
.80
In a normal distribution of scores, a T-score of _____ is equivalent to a z-score of _____ and a percentile rank of 84.
60; 1.0
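Worked example: converting a T-score (mean 50, standard deviation 10) to a z-score and then to a percentile rank under a normal distribution. The helper names are illustrative.

```python
from statistics import NormalDist

def t_to_z(t_score):
    """T-scores have a mean of 50 and a standard deviation of 10."""
    return (t_score - 50) / 10

def z_to_percentile(z):
    """Percentage of a standard normal distribution falling below z."""
    return NormalDist().cdf(z) * 100

z = t_to_z(60)
print(z, round(z_to_percentile(z)))        # 1.0 84
print(round(z_to_percentile(t_to_z(40))))  # 16
```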
A problem with using percent agreement as a measure of inter-rater reliability is that it may:
overestimate reliability because it’s affected by chance agreement.
Consensual observer drift __________ a measure’s inter-rater reliability.
tends to artificially increase
In a normal distribution, which of the following represents the highest score?
T score = 65
Before using a newly developed 10-item screening test to identify people who are depressed, you administer the test to a sample of clinic patients along with an established (validated) 50-item measure of depression and correlate the two sets of scores. In this situation, you are evaluating the screening test’s:
concurrent validity.
The item discrimination index (D) provides information on how well a test item discriminates between examinees who obtain a high or low score on the entire test. It ranges in value from:
-1.0 to +1.0.
The sensitivity of a diagnostic screening test is:
the ability of the test to identify clients who have the disorder.
Job applicants complain that the items included in a selection test “don’t look like they have anything to do with job performance.” As described by these applicants, this test lacks ________ validity.
face
To evaluate the inter-rater reliability of a test when scores or ratings on the test represent a nominal scale of measurement, you would use which of the following?
kappa coefficient
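Worked example: a minimal sketch of Cohen's kappa for two raters, computed as (observed agreement - chance agreement) / (1 - chance agreement). The ratings are made up. Comparing the result with raw percent agreement shows why percent agreement can overstate reliability.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on nominal categories.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings: R = ready, N = not ready
rater_a = list("RRRRRNNNNN")
rater_b = list("RRRRNNNNNR")
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(agreement)                                 # 0.8 percent agreement
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6 after removing chance agreement
```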
An educational psychologist administers her newly developed aptitude test to a sample of examinees and is disappointed that the test’s reliability coefficient is only .45. Which of the following is likely to be most helpful for increasing the test’s reliability coefficient?
adding more items to the test that are similar to the original items in terms of content and administering the test to a new sample that is more heterogeneous with regard to level of aptitude
The manual for a test of fluid intelligence reports that, for the standardization sample, Cronbach’s alpha was .93. This suggests that the test has adequate:
internal consistency reliability.
In a factor matrix, a communality indicates the proportion of variability:
in a single test that’s accounted for by all of the identified factors.
When a predictor’s reliability coefficient is .49, its criterion-related validity coefficient can be no greater than:
.70.
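Worked example: the ceiling on a validity coefficient is the square root of the predictor's reliability coefficient, as sketched below.

```python
import math

def max_validity(reliability):
    """Upper bound on criterion-related validity given predictor reliability."""
    return math.sqrt(reliability)

print(round(max_validity(0.49), 2))  # 0.7
```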
Which of the following is used to estimate the effect of increasing a predictor’s reliability on its criterion-related validity coefficient?
correction for attenuation formula
After using the split-half method to estimate a test’s reliability, you would use which of the following to correct the split-half reliability coefficient?
Spearman-Brown formula
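Worked example: a sketch of the Spearman-Brown formula, where length_factor is the ratio of the new test length to the current one (2 when stepping up from a half-test to the full test). The .60 half-test correlation is hypothetical.

```python
def spearman_brown(reliability, length_factor=2):
    """Predicted reliability after changing test length by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Hypothetical split-half correlation of .60, corrected to full length
print(round(spearman_brown(0.60), 2))  # 0.75
```

A length_factor below 1 estimates the effect of shortening a test.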
An organizational psychologist is hired by a large software company to develop a new selection test for hiring entry level software developers. To do so, she obtains a sample of 30 software developers who were recently hired by the company using its existing selection procedure and uses their responses to the proposed selection test items to determine which items to include in its final version. When she administers the final version of the test to this sample and correlates their test scores with scores on a measure of job performance, she obtains a criterion-related validity coefficient of .45. When the psychologist cross-validates the test on another sample of 40 recently hired software developers, the validity coefficient for this sample will most likely be:
smaller than .45.
The use of banding to assist with hiring decisions is based on the assumption that:
small differences in selection test scores are not necessarily associated with meaningful differences in job performance.
A test has __________ validity when it has low correlations with measures of unrelated characteristics.
discriminant
Which of the following is not a norm-referenced score?
percentage scores
A z-score of 0 is equivalent to a T-score of _____ and a stanine of _____.
50; 5
A cognitive ability test has a mean of 100, standard deviation of 15, and standard error of measurement of 4. The 95% confidence interval for an obtained score of 110 on this test is:
102 to 118.
Which of the following is used to determine a test’s internal consistency reliability?
coefficient alpha
When conducting a factor analysis, a researcher would rotate the initial factor matrix to:
obtain a factor matrix that is easier to interpret.
To evaluate the validity of a new aptitude test that will be used to facilitate the college admissions process, a test developer administers the test to a sample of high school juniors and seniors who are admitted to college without use of their aptitude test scores. She then obtains the GPAs of the same students at the end of their second year of college and calculates a correlation coefficient for the students’ aptitude test scores and GPAs, which provides information about the test’s ________ validity.
predictive
In a normal distribution, a T-score of 40 is equivalent to a percentile rank of:
16
Before adding a new selection test to the procedure that’s currently being used to make hiring decisions, you would want to make sure that adding the test will increase decision-making accuracy. In other words, you’d want to make sure the new selection test has adequate:
incremental validity.
In the context of test construction, “shrinkage” is associated with:
cross-validation.
Which of the following would be used to assess the inter-rater reliability of a rating scale designed to help clinicians distinguish between children who either do or do not meet the DSM criteria for a diagnosis of ADHD?
Cohen’s kappa coefficient
The reliability index is an estimate of the correlation between actual observed scores and theoretical true scores and is calculated by:
taking the square root of the reliability coefficient.
In a normal distribution, which of the following represents the highest score?
T score = 70
When using the multitrait-multimethod matrix to evaluate a test’s validity, the matrix provides evidence of the test’s __________ validity when scores on the test have low correlations with scores on tests that measure unrelated constructs.
divergent
All other things being equal, which of the following tests is likely to have the lowest reliability coefficient?
a true/false test
A manager and assistant manager were asked to rate 30 employees in terms of readiness for promotion. After reviewing each employee’s file, the manager and assistant manager independently categorized employees as being ready or not ready for promotion. Which of the following is the appropriate technique for determining the inter-rater reliability of the ratings made by the manager and assistant manager?
Cohen’s kappa coefficient
When using the multitrait-multimethod matrix to evaluate a test’s validity, the matrix provides evidence of the test’s __________ validity when scores on the test have high correlations with scores on other tests that measure the same or a related construct.
convergent
According to classical test theory, variability in test scores is due to a combination of:
true score variability and random error.
A test developer administers her newly developed job knowledge test to 50 applicants for sales positions at a large insurance company. All 50 applicants are hired regardless of their scores on the test and their scores are correlated with the scores they receive on a job performance measure six months later. This correlation coefficient provides information about the job knowledge test’s:
predictive validity.
To ensure that job applicants believe that questions in a job selection test are relevant to job performance so they do their best when answering them, the test developer will want to make sure the test has adequate ________ validity.
face
In a multitrait-multimethod matrix, a large monotrait-heteromethod coefficient provides evidence of a test’s:
convergent validity.
Which of the following describes the relationship between a test’s reliability coefficient and its criterion-related validity coefficient?
A test’s criterion-related validity coefficient can be no greater than the square root of its reliability coefficient.
A test’s __________ is the proportion of people who have a disorder and are correctly identified by the test as having the disorder.
sensitivity
The incremental validity of a new selection test (predictor) is calculated by subtracting:
the base rate from the positive hit rate.
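Worked example: a sketch assuming the usual definitions, which the card does not spell out. The base rate is taken as the proportion of all employees who are successful on the criterion, and the positive hit rate as the proportion of those selected by the test who are successful. The counts are hypothetical.

```python
def incremental_validity(true_pos, false_pos, false_neg, true_neg):
    """Positive hit rate minus base rate, from a 2x2 predictor-by-criterion table."""
    total = true_pos + false_pos + false_neg + true_neg
    base_rate = (true_pos + false_neg) / total             # successful regardless of test score
    positive_hit_rate = true_pos / (true_pos + false_pos)  # successful among those who passed
    return positive_hit_rate - base_rate

# Hypothetical table: 100 employees in the validation study
print(incremental_validity(true_pos=30, false_pos=10, false_neg=20, true_neg=40))  # 0.25
```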
Before using a selection test to estimate how well job applicants will do on a measure of job performance on their first few days of work, you would want to make sure the selection test has adequate:
predictive validity.
Before using a selection test to estimate how well job applicants will do on a measure of job performance one year after they’re hired, you would want to make sure the selection test has adequate:
predictive validity.
You have developed a new selection test for your client, the Acme Company, to help the company make better hiring decisions. After administering the test to samples of job applicants and current employees, you decide to raise the test’s cutoff score. Doing so will have which of the following effects?
It will decrease the number of false positives and increase the number of true negatives.
When a test has a standard deviation of 10, the test’s standard error of measurement will fall between:
0 and 10
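Worked example: the standard error of measurement equals the standard deviation times the square root of 1 minus the reliability coefficient. It therefore runs from 0 (perfect reliability) up to the standard deviation (zero reliability).

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

for reliability in (0.0, 0.75, 1.0):
    print(reliability, standard_error_of_measurement(10, reliability))
# 0.0 10.0
# 0.75 5.0
# 1.0 0.0
```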
Using __________ to select job applicants takes into account the fact that tests used to make selection decisions are not always entirely reliable.
banding
You would use which of the following to evaluate the internal consistency of a test when all of its items are scored as correct or incorrect?
Kuder-Richardson 20
Which of the following best describes classical test theory (CTT) and item response theory (IRT)?
CTT is test based and IRT is item based.
A test’s __________ refers to its ability to correctly identify individuals who are true negatives.
specificity
The slope of an item characteristic curve provides information on which of the following?
item discrimination
When using percent agreement to assess the inter-rater reliability of a behavior observation scale, it’s important to keep in mind that doing so may:
overestimate reliability because it’s affected by chance agreement.