Test Construction Project Quizzes Flashcards

1
Q
You would use which of the following to evaluate the internal consistency of a test when all of its items are scored as correct or incorrect?
A. Kendall’s coefficient of concordance
B. Cohen’s kappa coefficient
C. Kuder-Richardson 20
D. Spearman-Brown
A

Answer C is correct. Kuder-Richardson 20 (KR-20) can be used to assess a test’s internal consistency reliability when test items are dichotomously scored (e.g., as correct or incorrect).
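For concreteness, KR-20 can be computed directly from a matrix of 0/1 item scores. This is an illustrative sketch (the function name and data layout are my own, not from any particular library):

```python
def kr20(scores):
    """KR-20 internal consistency for dichotomously scored items.

    scores: list of examinee rows, each a list of 0/1 item scores.
    Formula: (k / (k - 1)) * (1 - sum(p*q) / total-score variance).
    """
    n = len(scores)            # number of examinees
    k = len(scores[0])         # number of items
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # p = proportion passing each item; q = 1 - p
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Four examinees, three items; perfectly consistent response patterns
print(kr20([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]))  # close to 1.0
```

With maximally consistent response patterns (every examinee passes all items or none), KR-20 reaches its ceiling of 1.0.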

2
Q

An educational psychologist administers her newly developed aptitude test to a sample of examinees and is disappointed that the test’s reliability coefficient is only .45. Which of the following is likely to be most helpful for increasing the test’s reliability coefficient?
A. adding more items to the test that are similar to the original items in terms of content and administering the test to a new sample that is more homogeneous with regard to level of aptitude
B. adding more items to the test that differ from the original items in terms of content and administering the test to a new sample that is more homogeneous with regard to level of aptitude
C. adding more items to the test that are similar to the original items in terms of content and administering the test to a new sample that is more heterogeneous with regard to level of aptitude
D. adding more items to the test that differ from the original items in terms of content and administering the test to a new sample that is more heterogeneous with regard to level of aptitude

A

Answer C is correct. Reliability coefficients tend to be larger for longer tests than for shorter tests, as long as the added items address content similar to that of the original items, and larger when the tryout sample is heterogeneous with regard to the characteristic measured by the test, so that there is an unrestricted range of scores.

3
Q
The slope of an item characteristic curve provides information on which of the following?
A. the probability of guessing correctly
B. the degree of relevance
C. item difficulty
D. item discrimination
A

Answer D is correct. An item’s ability to discriminate between examinees with high and low levels of the latent ability assessed by the test is indicated by the slope (steepness) of the item characteristic curve: the steeper the slope, the better the discrimination.

4
Q
When a test item has an item discrimination index (D) of \_\_\_\_\_, this means that the same percentage of examinees in the high-scoring and low-scoring groups answered the item correctly.
A. 0
B. .50
C. -1.0
D. +1.0
A

Answer A is correct. The item discrimination index (D) indicates the difference between the percentage of examinees with high total test scores who answered the item correctly and the percentage of examinees with low total test scores who answered the item correctly. When the same percentage of examinees in the two groups answered the item correctly, D equals 0.
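As a minimal sketch (names are illustrative), D is simply the difference between the proportions of high- and low-scoring examinees who answer the item correctly:

```python
def discrimination_index(p_high, p_low):
    """Item discrimination index D.

    p_high: proportion of the high-scoring group answering correctly.
    p_low:  proportion of the low-scoring group answering correctly.
    D ranges from -1.0 to +1.0; D = 0 when the proportions are equal.
    """
    return p_high - p_low

# Same proportion (75%) correct in both groups, so the item does not
# discriminate between high and low scorers:
print(discrimination_index(0.75, 0.75))  # 0.0
```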

5
Q
Which of the following would be used to assess the inter-rater reliability of a rating scale designed to help clinicians distinguish between children who either do or do not meet the DSM criteria for a diagnosis of ADHD?
A. Kuder-Richardson 20
B. Spearman-Brown
C. Cronbach’s coefficient alpha
D. Cohen’s kappa coefficient
A

Answer D is correct. The kappa coefficient is used to assess the consistency of ratings assigned by two raters when the ratings represent a nominal scale (e.g., when a rating scale classifies children as either meeting or not meeting the diagnostic criteria for ADHD).

6
Q
A cognitive ability test has a mean of 100, standard deviation of 15, and standard error of measurement of 4. The 95% confidence interval for an obtained score of 110 on this test is:
A. 92 to 108.
B. 106 to 116.
C. 102 to 118.
D. 98 to 122.
A

Answer C is correct. The 95% confidence interval for an obtained test score is calculated by adding and subtracting two standard errors of measurement to and from the score. For the situation described in this question, the 95% confidence interval is 110 minus and plus 8 (two standard errors), which is 102 to 118.
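As a quick sketch (the function name is my own), the interval is just the obtained score minus and plus two standard errors of measurement:

```python
def confidence_interval_95(obtained_score, sem):
    """95% CI: obtained score plus/minus two standard errors of
    measurement (the rounded factor of 2 used on this card; the exact
    multiplier is 1.96)."""
    return (obtained_score - 2 * sem, obtained_score + 2 * sem)

print(confidence_interval_95(110, 4))  # (102, 118)
```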

7
Q
After using the split-half method to estimate a test’s reliability, you would use which of the following to correct the split-half reliability coefficient?
A. coefficient of determination
B. correction for attenuation
C. Kuder-Richardson formula
D. Spearman-Brown formula
A

Answer D is correct. Test length is one of the factors that affects the size of the reliability coefficient, and the Spearman-Brown formula is often used to estimate the effect of lengthening or shortening a test on its reliability coefficient. This formula is especially useful for correcting the split-half reliability coefficient because assessing split-half reliability involves splitting the test in half and calculating a reliability coefficient for the two halves. Because it is based on half-length tests, the split-half method tends to underestimate a test’s actual reliability, and the Spearman-Brown formula is used to estimate the reliability coefficient for the full length of the test. [The correction for attenuation formula (answer B) is used to estimate the effect of increasing the reliability of the predictor and/or criterion on the predictor’s criterion-related validity coefficient.]
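The Spearman-Brown prophecy formula itself is short enough to sketch directly (function name is illustrative):

```python
def spearman_brown(r, n):
    """Spearman-Brown prophecy formula: r_new = n*r / (1 + (n - 1)*r).

    r: reliability at the test's current length (e.g., a half-test).
    n: factor by which the length changes (n = 2 corrects a split-half
       coefficient up to full-test length).
    """
    return (n * r) / (1 + (n - 1) * r)

# A split-half coefficient of .60 corrects to .75 for the full-length test
print(spearman_brown(0.60, 2))
```

Note that the corrected coefficient (.75) is larger than the half-test coefficient (.60), which is why the uncorrected split-half value underestimates full-test reliability.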

8
Q
Classical test theory predicts that total variability in test scores is the result of true score variability:
A. plus random error.
B. plus systematic error.
C. minus systematic error.
D. minus systematic and random error.
A

Answer A is correct. Classical test theory is also known as true score theory and predicts that obtained test scores (X) are due to a combination of true score variability (T) and measurement error (E), with measurement error referring to random factors that affect test performance in unpredictable ways.

9
Q
An achievement test has a test-retest reliability coefficient of .90. This means that \_\_\_\_ of variability in test scores is due to true score variability.
A. 90%
B. 84%
C. 16%
D. 10%
A

Answer A is correct. A reliability coefficient is interpreted directly as the amount of variability in test scores that’s due to true score variability. When a test’s reliability coefficient is .90, this means that 90% of variability in test scores is due to true score variability and the remaining 10% is due to measurement error.

10
Q
A factor analysis yields three factors, and one of the tests included in the analysis has factor loadings of .60, .30, and .10 on Factors I, II, and III, respectively. Assuming that the rotation was orthogonal, the communality for Test A is:
A. .91.
B. .54.
C. .46.
D. .09.
A

Answer C is correct. When factors are orthogonal (uncorrelated), you can add the squared factor loadings to determine a test’s communality: .60 squared is .36, .30 squared is .09, and .10 squared is .01; .36 plus .09 plus .01 is .46.
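The same arithmetic as a one-line sketch (function name is my own):

```python
def communality(loadings):
    """Communality of a test under an orthogonal rotation:
    the sum of its squared factor loadings."""
    return sum(l ** 2 for l in loadings)

print(communality([0.60, 0.30, 0.10]))  # .36 + .09 + .01 = .46
```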

11
Q

The results of a factor analysis indicate that the factor loading for Test A and Factor I is .70. This means that:
A. 70% of variability in Test A scores is explained by the factor analysis.
B. 49% of variability in Test A scores is explained by the factor analysis.
C. 70% of variability in Test A scores is explained by Factor I.
D. 49% of variability in Test A scores is explained by Factor I.

A

Answer D is correct. A factor loading indicates the correlation between a test and an identified factor. It can be squared to determine the amount of variability in test scores that’s explained by the identified factor: .70 squared is .49 (49%).

12
Q
In the context of factor analysis, “orthogonal” means:
A. correlated.
B. uncorrelated.
C. statistically significant.
D. statistically insignificant.
A

Answer B is correct. In factor analysis, factors that are uncorrelated are referred to as orthogonal while factors that are correlated are referred to as oblique.

13
Q
A test has \_\_\_\_\_\_\_\_\_\_ validity when it has low correlations with measures of unrelated characteristics.
A. discriminant
B. concurrent
C. convergent
D. differential
A

Answer A is correct. For the exam, you want to know that discriminant validity is another name for divergent validity and that a test has discriminant (divergent) validity when scores on the test have low correlations with scores on measures of unrelated traits.

14
Q
When using the multitrait-multimethod matrix, a large \_\_\_\_\_\_\_\_\_\_\_\_ coefficient provides evidence of a test’s convergent validity.
A. heterotrait-heteromethod
B. heterotrait-monomethod
C. monotrait-heteromethod
D. monotrait-monomethod
A

Answer C is correct. The monotrait-heteromethod coefficient indicates the correlation between the test that’s being validated and a measure of the same trait (monotrait) obtained with a different method (heteromethod). It provides evidence of the test’s convergent validity when it’s large.

15
Q
To ensure that job applicants believe that questions in a job selection test are relevant to job performance so they do their best when answering them, the test developer will want to make sure the test has adequate \_\_\_\_\_\_\_\_ validity.
A. content
B. face
C. convergent
D. divergent
A

Answer B is correct. Face validity refers to the extent to which test items “look valid” to examinees. It’s not an actual type of test validity but is often important for ensuring that examinees are motivated to do their best when answering test items. (Because adequate content validity does not necessarily guarantee adequate face validity, answer B is better than answer A.)

16
Q
Based on his score on a selection test, a job applicant obtains a predicted job performance score of 70. The job performance measure’s standard error of estimate is 5, which means that there’s a 95% chance that his true job performance score is between:
A. 65 and 75.
B. 60 and 80.
C. 55 and 85.
D. 50 and 90.
A

Answer B is correct. The 95% confidence interval for a predicted criterion score is calculated by adding and subtracting two standard errors of estimate to and from the predicted score: 70 ± 2(5) = 70 ± 10, which is 60 to 80.

17
Q
A test developer administers her newly developed job knowledge test to 50 applicants for sales positions at a large insurance company. All 50 applicants are hired regardless of their scores on the test and their scores are correlated with the scores they receive on a job performance measure six months later. This correlation coefficient provides information about the job knowledge test’s:
A. concurrent validity.
B. predictive validity.
C. convergent validity.
D. discriminant validity.
A

Answer B is correct. Predictive validity is a type of criterion-related validity that’s evaluated to determine how well predictor scores predict future criterion scores. It’s assessed by correlating scores on the predictor (e.g., job knowledge test) with scores obtained on the criterion (e.g., job performance measure) at a later time.

18
Q

On cross-validation, a multiple correlation coefficient (R) is most likely to “shrink” when:
A. the original validation sample was small and the number of predictors is small.
B. the original validation sample was small and the number of predictors is large.
C. the original validation sample was large and the number of predictors is small.
D. the original validation sample was large and the number of predictors is large.

A

Answer B is correct. Shrinkage refers to the reduction in the size of a criterion-related correlation coefficient on cross-validation and is greatest when the initial sample is small and, for the multiple correlation coefficient, the number of predictors is large.

19
Q
A psychologist was hired to develop a new selection test that the company will add to its current selection procedure. The psychologist determines that the test has adequate test-retest reliability and concurrent validity. However, before the company begins using the test to hire applicants, the psychologist will want to make sure that its use also produces an increase in decision-making accuracy. In other words, the psychologist will want to make sure that the test has adequate:
A. predictive validity.
B. differential validity.
C. incremental validity.
D. construct validity.
A

Answer C is correct. A predictor has incremental validity when its use produces an adequate increase in decision-making accuracy.

20
Q

An organizational psychologist conducts a predictive validity study to obtain the data she needs to evaluate a new selection test’s incremental validity. The data she collects indicates that 45 of the employees in her study are “true negatives,” which means that they obtained:
A. low scores on both the predictor and criterion.
B. high scores on both the predictor and criterion.
C. high scores on the predictor and low scores on the criterion.
D. low scores on the predictor and high scores on the criterion

A

Answer A is correct. To identify the correct answer to this question, you need to know that a person’s score on the predictor determines if he/she is a “positive” or “negative”: If the person received a high score on the predictor, that person is a positive; if the person received a low score on the predictor, he/she is a negative. Knowing that narrows the choices to answers A and D since the question is asking about true negatives. To choose between these two answers, you need to know that a person’s score on the criterion determines if he or she is a true or false positive or negative: When the person obtains low scores on the predictor and the criterion, he/she is a true negative (answer A); when the person obtains a low score on the predictor but a high score on the criterion, he/she is a false negative (answer D).

21
Q

The sensitivity of a diagnostic screening test is:
A. the ability of the test to identify clients who do not have the disorder.
B. the ability of the test to identify clients who have the disorder.
C. the probability that a client who tests negative on the test does not actually have the disorder.
D. the probability that a client who tests positive on the test actually has the disorder.

A

Answer B is correct. Sensitivity is the proportion of people with the disorder who are identified by the test as having the disorder. It’s calculated by dividing the true positives by the true positives plus the false negatives: TP/(TP + FN).
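A small sketch of the two companion indices (function names and counts are illustrative, not from a specific source):

```python
def sensitivity(tp, fn):
    """Proportion of people WITH the disorder whom the test correctly
    flags. Note the parentheses: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of people WITHOUT the disorder whom the test
    correctly clears: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical screening results: 40 true positives, 10 false negatives
print(sensitivity(40, 10))  # 0.8
```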

22
Q
When a predictor’s reliability coefficient is .49, its criterion-related validity coefficient can be no greater than:
A. .70.
B. .51.
C. .49.
D. .30.
A

Answer A is correct. The maximum criterion-related validity coefficient for a predictor is equal to the square root of its reliability coefficient: The square root of .49 is .70.

23
Q

When a predictor’s criterion-related validity coefficient is .40, this means that:
A. 40% of variability in predictor scores is true score variability.
B. 16% of variability in predictor scores is true score variability.
C. 40% of variability in criterion scores is explained by variability in predictor scores.
D. 16% of variability in criterion scores is explained by variability in predictor scores.

A

Answer D is correct. Like other correlation coefficients for two different measures, the criterion-related validity coefficient can be squared to determine the amount of variability in one measure that’s explained by or shared with the other measure: .40 squared is .16. When a predictor’s criterion-related validity coefficient is .40, this means that 16% of variability in criterion scores is explained by variability in predictor scores.

24
Q
A z-score of 0 is equivalent to a T-score of \_\_\_\_\_ and a stanine of \_\_\_\_\_.
A. 10; 5
B. 100; 10
C. 50; 10
D. 50; 5
A

Answer D is correct. A z-score of 0, a T-score of 50, and a stanine of 5 are all equivalent to the mean score obtained by the norm group.

25
Q
In a normal distribution, a T-score of 60 is equivalent to a percentile rank of:
A. 16.
B. 50.
C. 84.
D. 98.
A

Answer C is correct. A T-score of 60 is one standard deviation above the mean and, in a normal distribution, a score that’s one standard deviation above the mean is equivalent to a percentile rank of 84.
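A sketch of the conversion using Python’s standard-library `statistics.NormalDist` (the function name is my own):

```python
from statistics import NormalDist

def t_score_to_percentile(t):
    """Convert a T-score (mean 50, SD 10) to a percentile rank,
    assuming a normal distribution of scores."""
    z = (t - 50) / 10
    return NormalDist().cdf(z) * 100

print(round(t_score_to_percentile(60)))  # one SD above the mean -> 84
```

The same conversion gives the other landmark values from these cards: T = 50 (the mean) falls at the 50th percentile, and T = 40 at about the 16th.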

26
Q
A student receives a score of 85 on a test that has a mean of 75 and standard deviation of 5. What is the z-score equivalent of the student’s score?
A. +1.0
B. +2.0
C. +2.5
D. +10
A

Answer B is correct. You could have calculated the z-score using the formula z = (X – M)/SD = (85 – 75)/5 = 10/5 = +2.0. Alternatively, you could have noticed that the student’s score is two standard deviations above the mean, which means that the z-score is +2.0.

27
Q
Using \_\_\_\_\_\_\_\_\_\_ to select job applicants takes into account the fact that tests used to make selection decisions are not always entirely reliable.
A. the top-down method
B. banding
C. expectancy tables
D. ranking
A

Answer B is correct. The use of banding is based on the assumption that small differences in test scores are often due to the unreliability of the test. Note that ranking (answer D) is also known as the top-down method (answer A).

28
Q

The percentile rank distribution is:
A. normal regardless of the shape of the raw score distribution.
B. normal only when the raw score distribution is normal.
C. rectangular regardless of the shape of the raw score distribution.
D. the same as the distribution of raw scores.

A

Answer C is correct. A percentile rank distribution is always rectangular (flat) regardless of the shape of the distribution of raw scores.

29
Q
An expectancy table is useful for interpreting an examinee’s score in terms of:
A. likely criterion performance.
B. a confidence interval.
C. mastery of test content.
D. norms.
A

Answer A is correct. Expectancy tables provide information on an examinee’s expected score on a criterion based on his/her obtained predictor (test) score.