Test Construction Quiz Questions Flashcards

1
Q

A screening test for a disorder that has a very low base rate in the population is known to have an overall accuracy rate of 98%. When using this test to identify individuals in the general population who have the disorder, it’s important to keep in mind that the test will produce:

A

a larger number of false positives than false negatives
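
A rough worked illustration of why this happens. The population size and the 1% base rate are hypothetical, and the sketch assumes the 98% accuracy rate applies equally to people with and without the disorder:

```python
# Hypothetical figures: population size and base rate are assumptions,
# and accuracy is assumed to be the same for cases and non-cases.
population = 10_000
base_rate = 0.01
accuracy = 0.98

with_disorder = round(population * base_rate)      # 100 people
without_disorder = population - with_disorder      # 9,900 people

# Errors among people who have the disorder are false negatives;
# errors among people who do not have it are false positives.
false_negatives = round(with_disorder * (1 - accuracy))
false_positives = round(without_disorder * (1 - accuracy))

print(false_negatives, false_positives)
```

Because nearly everyone tested is a non-case, even a small error rate applied to that group yields far more false positives than false negatives.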

2
Q

In a factor matrix, the factor loading for Test A and Factor II is .70. This means that:

A

49% of variability in Test A is accounted for by Factor II

3
Q

When using the multitrait-multimethod matrix to evaluate the construct validity of a newly developed test, a __________ coefficient provides evidence of the test’s divergent (discriminant) validity.

A

small heterotrait-monomethod

4
Q

When item response theory has been used as the basis for test construction, an examinee’s score on the test provides information about his/her:

A

status on a latent trait or ability

5
Q

To construct the 68% confidence interval for an examinee’s obtained test score, you would need the examinee’s score and:

A

the standard error of measurement

6
Q

When a test’s reliability coefficient is equal to 0, the standard error of measurement for the test is:

A

equal to the test’s standard deviation.
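
A minimal sketch of the usual formula, SEM = SD × sqrt(1 − reliability). The function name is illustrative only:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# Reliability of 0: the SEM equals the test's standard deviation.
print(standard_error_of_measurement(10, 0.0))
# Perfect reliability: the SEM is 0.
print(standard_error_of_measurement(10, 1.0))
```

The SEM therefore ranges from 0 up to the standard deviation of the test scores.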

7
Q

A test developer uses a multitrait-multimethod matrix to organize the data she has collected in a validation study of her newly developed self-report measure of self-esteem. The matrix indicates that the correlation between her self-report measure of self-esteem and an established (previously validated) teacher rating of self-esteem is .91. This correlation coefficient suggests that the self-report measure of self-esteem has:

A

adequate convergent validity.

8
Q

A measure of test anxiety is administered to a sample of 50 psychologists who are studying for the licensing exam, and a split-half reliability coefficient of .80 is calculated from their scores. The test is then administered to another group of 50 psychologists who are more heterogeneous with regard to level of test anxiety. The split-half reliability coefficient for the second group is most likely to be:

A

Larger than .80

9
Q

You would use which of the following to estimate what a predictor’s criterion-related validity coefficient would be if the predictor and/or criterion had a reliability coefficient of 1.0?

A

Correction for attenuation formula
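
A sketch of the standard correction, which divides the observed validity coefficient by the square root of the product of the two reliability coefficients. The function and parameter names are illustrative, and the example values are hypothetical:

```python
import math

def correct_for_attenuation(r_xy, r_xx=1.0, r_yy=1.0):
    """Estimate validity if predictor and/or criterion were perfectly reliable.

    r_xy: observed validity coefficient
    r_xx: observed predictor reliability (leave at 1.0 to skip this correction)
    r_yy: observed criterion reliability (leave at 1.0 to skip this correction)
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: observed validity .40, predictor reliability .64.
print(correct_for_attenuation(0.40, r_xx=0.64))
```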

10
Q

Scores on a predictor that will be used to estimate job performance ratings range from 0 to 200. If the predictor’s cutoff score is raised from 130 to 150, this will have which of the following effects?

A

Decrease the number of false positives

11
Q

To assess the internal consistency reliability of a test that contains 50 items that are each scored as either “correct” or “incorrect,” you would use which of the following?

A

KR-20
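
A minimal sketch of KR-20 for dichotomously scored items, using the common form k/(k − 1) × (1 − Σpq / variance of total scores). The data are made up, and the population variance is used here as an assumption (some texts use the sample variance):

```python
from statistics import pvariance

def kr20(responses):
    """responses: one list of 0/1 item scores per examinee."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    # p = proportion answering each item correctly; q = 1 - p.
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(person[item] for person in responses) / n_examinees
        sum_pq += p * (1 - p)
    total_variance = pvariance([sum(person) for person in responses])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)

# Made-up data: 4 examinees, 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))
```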

12
Q

The kappa statistic for a test is .95. This means that the test has:

A

Adequate inter-rater reliability

13
Q

For a newly developed test of cognitive flexibility, coefficient alpha is .55. Which of the following would be useful for increasing the size of this coefficient?

A

Adding more items that are similar in terms of content and quality

14
Q

A student receives a score of 450 on a college aptitude test that has a mean of 500 and standard error of measurement of 50. The 68% confidence interval for the student’s score is:

A

400 to 500
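
The arithmetic can be sketched as obtained score ± z × SEM, with z = 1.0 for the 68% interval and z = 1.96 for the 95% interval. The function name is illustrative:

```python
def confidence_interval(obtained_score, sem, z=1.0):
    """Return (lower, upper) bounds: obtained score plus or minus z * SEM."""
    margin = z * sem
    return (obtained_score - margin, obtained_score + margin)

# 68% interval for a score of 450 with an SEM of 50.
print(confidence_interval(450, 50))
```

Note that the test's mean (500) is not used; the interval is built around the obtained score.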

15
Q

Consensual observer drift tends to:

A

produce an overestimate of a test’s inter-rater reliability.

16
Q

To determine a test’s internal consistency reliability by calculating coefficient alpha, you would:

A

administer the test to a single sample of examinees one time.

17
Q

A problem with using percent agreement as a measure of inter-rater reliability is that it doesn’t take into account the effects of:

A

chance agreement among raters.
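
A small sketch contrasting percent agreement with Cohen's kappa for two raters, which subtracts the agreement expected by chance. The ratings are made up and the function names are illustrative:

```python
from collections import Counter

def percent_agreement(a, b):
    """Proportion of observations on which the two raters agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Made-up ratings: both raters code "1" for almost every observation.
rater_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
rater_b = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
print(percent_agreement(rater_a, rater_b), cohens_kappa(rater_a, rater_b))
```

With these ratings, percent agreement looks high while kappa is below zero, because most of the agreement would be expected by chance alone.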

18
Q

According to classical test theory, total variability in obtained test scores is composed of:

A

true score variability plus random error.

19
Q

Which of the following methods for evaluating reliability is most appropriate for speed tests?

A

Coefficient of equivalence

20
Q

Your newly developed measure of integrity correlates highly with a well-known and widely used measure of integrity. This correlation provides evidence of your measure’s ________ validity.

A

Convergent

21
Q

In a multitrait-multimethod matrix, a test’s construct validity would be confirmed when:

A

monotrait-heteromethod coefficients are high and heterotrait-monomethod coefficients are low.

High Mono-trait, Low Hetero-trait HMLH
ham l’hell

22
Q

Which of the following best defines the relationship between a predictor’s reliability coefficient and its criterion-related validity coefficient?

A

Validity is no greater than the square root of reliability

Mnemonic: V comes before R (VR), so validity is no greater than the square root of reliability

23
Q

The results of a factor analysis indicate that Test A has a factor loading of .70 for Factor I and a factor loading of .20 for Factor II. Assuming that only two factors were extracted and that the factors are orthogonal, you can conclude that the communality for Test A scores is:

A

0.53
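
With orthogonal factors, the communality is the sum of the squared factor loadings. A minimal sketch (the function name is illustrative):

```python
def communality(loadings):
    """Sum of squared loadings across orthogonal factors."""
    return sum(loading ** 2 for loading in loadings)

# Test A: loadings of .70 on Factor I and .20 on Factor II.
print(round(communality([0.70, 0.20]), 2))
```

The same squaring step gives the share of variance explained by a single factor: a loading of .70 corresponds to .49, or 49%.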

24
Q

When conducting a factor analysis, you would choose an oblique rotation of the factors if:

A

you believe the constructs measured by the tests included in the analysis are correlated.

25
Q

The standard error of estimate is used to:

A

estimate the difference between an examinee’s predicted criterion score and his or her true criterion score.

26
Q

In a scatterplot constructed from data collected in a concurrent validity study, the number of “false negatives” is likely to increase if:

A

the predictor cutoff score is raised and/or the criterion cutoff score is lowered.

27
Q

Validity is best described as:

A

Accuracy

28
Q

A test developer uses a sample of 50 current employees to identify items for and then validate a new selection test (predictor). When she correlates scores on the test with scores on a measure of job performance (criterion) for this sample, she obtains a criterion-related validity coefficient of .63. When the test developer administers the test and the measure of job performance to a new sample of 50 employees, she will most likely obtain a validity coefficient that is:

A

Less than .63 (shrinkage)

29
Q

When determining a predictor’s incremental validity, the positive hit rate is calculated by:

A

dividing the number of true positives by the total number of positives.

30
Q

Test’s sensitivity measures

A

True positives

31
Q

Test’s specificity measures

A

True negatives

“How much no”

specific = pacific = bad (bc California)

32
Q

A test’s content validity is established primarily by which of the following?

A

Having subject matter experts systematically review the test’s items

33
Q

To ascertain if the test you have developed is valid as a screening test for determining whether a person has an anxiety or affective disorder, you would be most interested in evaluating the test’s:

A

Concurrent validity (type of criterion-related validity)

34
Q

To evaluate the concurrent validity of a new selection test for computer programmers, you would:

A

administer the test to current computer programmers and correlate their test scores with recently assigned job performance ratings.

35
Q

____________ refers to the percent of examinees who have the condition being assessed by a predictor who are identified by the predictor as having the condition.

A

Sensitivity

“How much yes”
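
A sketch of how sensitivity, specificity, and the positive hit rate are computed from the four decision outcomes. The counts are hypothetical and the function name is illustrative:

```python
def classification_rates(tp, fp, fn, tn):
    """Rates computed from true/false positive and negative counts."""
    return {
        # Of those who have the condition, the proportion identified as positive.
        "sensitivity": tp / (tp + fn),
        # Of those who do not have the condition, the proportion identified as negative.
        "specificity": tn / (tn + fp),
        # Of those identified as positive, the proportion who are true positives.
        "positive_hit_rate": tp / (tp + fp),
    }

# Hypothetical counts.
rates = classification_rates(tp=40, fp=10, fn=20, tn=130)
print(rates)
```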

36
Q

Assuming a normal distribution, which of the following represents the highest score?
- z-score of 1.5
- T score of 70
- WAIS FSIQ of 120
- Percentile rank of 88

A

T score of 70
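
One way to check this is to put every option on the z-score scale. The sketch assumes the conventional scales (T scores: mean 50, SD 10; FSIQ: mean 100, SD 15) and uses the standard library's normal distribution for the percentile rank:

```python
from statistics import NormalDist

# Convert each option to a z-score under a normal distribution.
z_scores = {
    "z-score of 1.5": 1.5,
    "T score of 70": (70 - 50) / 10,           # T: mean 50, SD 10
    "WAIS FSIQ of 120": (120 - 100) / 15,      # FSIQ: mean 100, SD 15
    "Percentile rank of 88": NormalDist().inv_cdf(0.88),
}

highest = max(z_scores, key=z_scores.get)
print(highest, z_scores)
```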

37
Q

Eigenvalues are associated with:

A

principal component analysis.

38
Q

Which of the following scores is NOT a norm-referenced score?
- Percentile Rank
- T-Score
- Pass or Fail
- Grade Equivalent Score

A

Pass or fail

39
Q

A psychologist develops a diagnostic test to identify people who have injection phobia. In this situation, the test’s ________ refers to how good the test is at identifying people who have injection phobia from the pool of people who actually have injection phobia.

A

sensitivity

40
Q

To maximize the ability of a test to discriminate among test takers, a test developer will want to include test items that vary in terms of difficulty. If the test developer wants to add more difficult items to her test, she will include items that have an item difficulty index of:

A

.10 (the index ranges from 0 to 1, with lower values indicating harder items; .10 means 10% of examinees answered the item correctly)

41
Q

The correction for attenuation formula is used to measure the impact of increasing:

A

a test’s reliability on its validity

42
Q

When a test has been constructed on the basis of item response theory, an examinee’s total test score provides information about his/her:

A

status on a latent trait or ability

43
Q

The primary advantage in using a percentile rank, z-score, or T-score is that these scores:

A

are easy to interpret because they reference an individual’s test performance to the performance of other examinees

44
Q

You would use a multitrait-multimethod matrix in order to:

A

determine if a test has adequate convergent and discriminant validity

45
Q

An advantage of using the kappa statistic rather than percent agreement when assessing a test’s inter-rater reliability is that the former:

A

corrects for chance agreement

46
Q

Which type of reliability would be most appropriate for estimating the reliability of a multiple-choice speed test?

A

Alternate forms

47
Q

When using principal component analysis:

A

the first principal component represents the largest share of the total variance.

48
Q

Stella obtains a score of 50 on a test that has a standard deviation of 10 and a standard error of measurement of 5. The 95% confidence interval for Stella’s score is approximately:

A

40 to 60 (1.96 × 5 ≈ 10 points on either side of 50)

49
Q

In a normal distribution, which of the following represents the lowest score?
- Percentile rank of 20
- z-score of -1.0
- T-score of 25
- FSIQ of 70

A

T-score of 25 (2.5 SD below mean)

50
Q

In terms of magnitude, the standard error of measurement can be:

A

no greater than the standard deviation of the test scores

51
Q

To obtain a “coefficient of stability,” you would:

A

administer the same test twice to the same group of examinees on two separate occasions and correlate the two sets of scores.

52
Q

The best way to control consensual observer drift is to:

A

Alternate raters

53
Q

A reliability coefficient is best defined as a measure of:

A

Consistency

54
Q

A test designed to measure knowledge of clinical psychology is likely to have the highest reliability coefficient when:

A

the test contains a large number of items and the sample of examinees is heterogeneous

55
Q

In factor analysis, a factor loading indicates the correlation between:

A

a test and an identified factor

56
Q

In factor analysis, when two factors are “orthogonal,” this means that:

A

the factors are uncorrelated.

57
Q

Which of the following types of validity would you be most interested in when designing a selection test that will be used to predict the future job performance ratings of job applicants?

A

Criterion-related

58
Q

Which of the following best describes the relationship between validity and reliability?

A

A valid test is also a reliable test.

59
Q

In a multitrait-multimethod matrix, the coefficient that indicates a test’s reliability is the _____________ coefficient.

A

monotrait-monomethod

60
Q

The heterotrait-monomethod coefficient is

A

a measure of divergent validity.

61
Q

The monotrait-heteromethod coefficient is

A

a measure of convergent validity.

62
Q

The heterotrait-heteromethod coefficient is

A

a measure of divergent validity.

63
Q

Cronbach’s alpha is an appropriate method for evaluating reliability when:

A

all test items are designed to measure the same underlying characteristic.

64
Q

Incremental validity is a measure of:

A

decision-making accuracy

65
Q

In factor analysis, communality refers to:

A

the proportion of variance accounted for in a single variable by all of the identified factors.

66
Q

Criterion contamination has which effect?

A

It artificially increases the predictor’s criterion-related validity coefficient.

67
Q

To evaluate the concurrent validity of a new selection test for clerical workers, you would:

A

administer the test to a sample of current clerical workers and correlate their scores on the test with their recently assigned performance ratings

68
Q

A test developer would use the Kuder-Richardson Formula (KR-20) in order to:

A

evaluate a test’s internal consistency reliability

69
Q

To maximize the inter-rater reliability of a behavioral observation scale, you should make sure that coding categories:

A

are mutually exclusive

70
Q

The minimum and maximum values of the standard error of estimate are:

A

0 and the standard deviation of the criterion
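
These limits follow from the usual formula, SEE = SD of the criterion × sqrt(1 − r²), where r is the validity coefficient. A minimal sketch (the function name is illustrative):

```python
import math

def standard_error_of_estimate(sd_criterion, validity):
    """SEE = SD_y * sqrt(1 - r_xy ** 2)."""
    return sd_criterion * math.sqrt(1 - validity ** 2)

# Validity of 0: SEE equals the criterion's standard deviation.
print(standard_error_of_estimate(10, 0.0))
# Validity of 1 (or -1): SEE is 0.
print(standard_error_of_estimate(10, 1.0))
```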

71
Q

When the kappa statistic for a measure is .90, this indicates that the measure:

A

has adequate inter-rater reliability

72
Q

A personnel director hires all job applicants who obtain a high score on a job selection test but, after using the test for six months, realizes that many of the new employees are obtaining low-performance ratings. Assuming that the selection test has adequate criterion-related validity, the personnel director can reduce the number of unsatisfactory workers that she hires using the test by:

A

raising the selection test cutoff score

73
Q

All other things being equal, which of the following tests is likely to have the largest reliability coefficient?

A

A multiple-choice test that consists of items that each have five answer options

74
Q

According to classical test theory, total variability in test scores is due to:

A

true score variability plus random error

75
Q

The assumption underlying convergent validity is that:

A

a measure of a characteristic should correlate highly with a different type of measure that is already known to assess the same characteristic.

76
Q

Content sampling is not a potential source of measurement error for which of the following methods for evaluating a test’s reliability?

A

Test-retest only

77
Q

After reviewing the data collected on a new selection test during the course of a criterion-related validity study on new hires, a psychologist decides to lower the selection test cutoff score. Apparently the psychologist is hoping to do which of the following?

A

Increase the number of true positives (it will also increase the number of false positives, but that is not what he wants)

78
Q

The standard error of measurement is used to:

A

calculate the range within which an examinee’s true test score is likely to fall given her obtained score

79
Q

The distribution of percentile ranks is always:

A

rectangular (flat) regardless of the shape of the distribution of raw scores

80
Q

In factor analysis, the original factor matrix is usually rotated in order to:

A

facilitate interpretation of the identified factors

81
Q

When a test user uses a correction for guessing formula that involves subtracting points from each examinee’s scores, the resulting distribution of scores will have a ____________________ than the original (non-corrected) distribution.

A

lower mean and larger standard deviation
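
A small illustration using the common correction, corrected score = right − wrong / (options − 1). The scores are made up, and the sketch assumes five-option items with no omitted answers:

```python
from statistics import mean, pstdev

def corrected_score(right, wrong, options):
    """Subtract a fraction of the wrong answers to correct for guessing."""
    return right - wrong / (options - 1)

# Made-up (right, wrong) counts on a 50-item test with 5 options per item.
answers = [(40, 10), (30, 20), (20, 30)]
raw = [right for right, _ in answers]
corrected = [corrected_score(right, wrong, options=5) for right, wrong in answers]

print(mean(raw), pstdev(raw))
print(mean(corrected), pstdev(corrected))
```

Lower scorers have more wrong answers and so lose more points, which pulls the mean down and spreads the scores further apart.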

82
Q

The item discrimination index (D) ranges in value from:

A

-1.0 to +1.0
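
The range follows from how D is usually computed: the proportion of the upper-scoring group answering the item correctly minus the proportion of the lower-scoring group doing so. A sketch with illustrative names and hypothetical group sizes:

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = proportion correct in upper group - proportion correct in lower group."""
    return upper_correct / upper_n - lower_correct / lower_n

# Everyone in the upper group and no one in the lower group is correct: D = +1.0.
print(discrimination_index(25, 25, 0, 25))
# The reverse pattern: D = -1.0.
print(discrimination_index(0, 25, 25, 25))
```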

83
Q

Which of the following is used to estimate the effects of shortening or lengthening a test on the test’s reliability coefficient?

A

Spearman-Brown formula
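
A sketch of the Spearman-Brown prophecy formula, new reliability = n × r / (1 + (n − 1) × r), where n is the factor by which the test length changes. The function name and example values are illustrative:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Doubling the length of a test with a reliability of .55.
print(round(spearman_brown(0.55, 2), 2))
# Halving the length of a test with a reliability of .80.
print(round(spearman_brown(0.80, 0.5), 2))
```

The formula assumes the added items are comparable in content and quality to the existing ones.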

84
Q

A test developer would construct an expectancy table to:

A

facilitate criterion-referenced interpretation of test scores

85
Q

To evaluate the validity of a newly developed selection test for clerical workers, a test developer will correlate scores obtained on the test by newly hired clerical workers with the job performance ratings they receive after being on-the-job for six months. The resulting correlation coefficient will provide information on the test’s:

A

predictive validity

86
Q

The point at which an item characteristic curve intercepts the vertical (Y) axis provides information on which of the following?

A

The probability of answering the item correctly by guessing

87
Q

When the heterotrait-monomethod coefficient is large, this indicates:

A

Lack of discriminant validity

88
Q

It would be most important to assess the test-retest reliability of a measure that:

A

Measures a stable trait

89
Q

In terms of item response theory, the slope (steepness) of the item characteristic curve (ICC) indicates the item’s:

A

ability to discriminate between examinees