Test Construction Flashcards

1
Q

A psychologist develops a diagnostic test to identify people who have injection phobia. In this situation, the test’s ________ refers to how good the test is at identifying people who have injection phobia from the pool of people who actually have injection phobia.
Select one:

A.
specificity

B.
sensitivity

C.
positive predictive value

D.
negative predictive value

A

B.

sensitivity
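
A minimal sketch of how the four answer options differ, computed from a 2x2 decision table. The counts are hypothetical and only illustrate the definitions; they do not come from the card.

```python
def diagnostic_rates(tp, fp, fn, tn):
    """Return the four accuracy indices for a 2x2 decision table."""
    return {
        "sensitivity": tp / (tp + fn),  # hits among people who have the disorder
        "specificity": tn / (tn + fp),  # correct rejections among people who do not
        "ppv": tp / (tp + fp),          # test positives who truly have the disorder
        "npv": tn / (tn + fn),          # test negatives who truly do not
    }

# Hypothetical counts: 50 people with injection phobia, 150 without.
rates = diagnostic_rates(tp=40, fp=15, fn=10, tn=135)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity uses only the people who actually have the phobia (true positives plus false negatives) in its denominator, which is what the question stem describes.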

2
Q

Which of the following is NOT an example of a standard score?
Select one:

A.
WAIS IQ score

B.
percentage score

C.
z-score

D.
T-score

A

B.

percentage score

3
Q

__________ refers to the extent to which individual test items contribute to the overall purpose of the test.
Select one:

A.
Validity

B.
Reliability

C.
Discrimination

D.
Relevance

A

D.

Relevance

4
Q

A first-year college student obtains a score of 150 on her final English exam, a score of 100 on her math exam, a score of 55 on her chemistry exam, and a score of 30 on her history exam. The means and standard deviations for these tests are, respectively, 125 and 20 for the English exam, 90 and 10 for the math exam, 45 and 5 for the chemistry exam, and 30 and 5 for the history exam. Based on this information, you can conclude that the student’s test performance was best on which exam?
Select one:

A.
English

B.
Math

C.
Chemistry

D.
History

A

C.

Chemistry
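
The comparison can be checked by converting each raw score to a z-score, z = (score - mean) / SD. The sketch below uses the numbers given in the question; the variable names are only illustrative.

```python
# (raw score, mean, standard deviation) for each exam, taken from the question
exams = {
    "English": (150, 125, 20),
    "Math": (100, 90, 10),
    "Chemistry": (55, 45, 5),
    "History": (30, 30, 5),
}

# z-score: distance from the mean in standard deviation units
z_scores = {name: (score - mean) / sd for name, (score, mean, sd) in exams.items()}
best = max(z_scores, key=z_scores.get)

print(z_scores)
print(best)
```

The chemistry score is two standard deviations above its mean, which is the highest relative standing of the four exams.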

5
Q

To maximize the ability of a test to discriminate among test takers, a test developer will want to include test items that vary in terms of difficulty. If the test developer wants to add more difficult items to her test, she will include items that have an item difficulty index of:
Select one:

A.
.90

B.
.50

C.
.10

D.
0

A

C.

.10

6
Q

A final exam is developed to evaluate students’ comprehension of information presented in a high school history class. When the exam is administered to three classes of students at the end of the semester, all students obtain failing scores. This suggests that the exam may have poor ________ validity.
Select one:

A.
concurrent

B.
incremental

C.
content

D.
divergent

A

C.

content

7
Q

The correction for attenuation formula is used to measure the impact of increasing:
Select one:

A.
a test’s reliability on its validity

B.
a test’s validity on its reliability

C.
the number of test items on the test’s validity

D.
the number of test items on the test’s reliability

A

A.

a test’s reliability on its validity
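
A minimal sketch of the classical correction for attenuation, which estimates the validity coefficient that would be obtained if both the predictor and the criterion were perfectly reliable. The coefficients below are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated validity if predictor (r_xx) and criterion (r_yy) were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: observed validity .30, predictor reliability .64,
# criterion reliability .81.
corrected = correct_for_attenuation(r_xy=0.30, r_xx=0.64, r_yy=0.81)
print(round(corrected, 3))
```

Because the denominator is at most 1, the corrected coefficient is never smaller than the observed one.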

8
Q

When a test has been constructed on the basis of item response theory, an examinee’s total test score provides information about his/her:
Select one:

A.
status on a latent trait or ability

B.
predicted performance on an external criterion

C.
performance relative to other examinees included in the standardization sample

D.
current developmental level

A

A.

status on a latent trait or ability

9
Q

The primary advantage in using a percentile rank, z-score, or T-score is that these scores:
Select one:

A.
are easy to interpret because they reference an individual’s test performance to an established standard of performance

B.
are easy to interpret because they reference an individual’s test performance to the performance of other examinees

C.
are easy to interpret because they make it possible to predict which criterion group an examinee is likely to belong to

D.
normalize the raw score distribution so that parametric tests can be used to analyze test scores

A

B.

are easy to interpret because they reference an individual’s test performance to the performance of other examinees

10
Q

You would use a multitrait-multimethod matrix in order to:
Select one:

A.
compare a test’s predictive and concurrent validity

B.
determine if a test has adequate convergent and discriminant validity

C.
identify the common factors underlying a set of related constructs

D.
test hypotheses about the causal relationships among variables

A

B.

determine if a test has adequate convergent and discriminant validity

11
Q

An advantage of using the kappa statistic rather than percent agreement when assessing a test’s inter-rater reliability is that the former:
Select one:

A.
is easier to calculate

B.
corrects for chance agreement

C.
corrects for small sample size

D.
takes into account the effects of multicollinearity

A

B.

corrects for chance agreement
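
To illustrate the chance correction, the sketch below computes percent agreement and Cohen's kappa for two raters. The ratings are hypothetical, and the function is an illustrative implementation rather than a library API.

```python
from collections import Counter

def agreement_and_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two raters' categorical codes."""
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return p_observed, (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings of 10 cases by two raters
rater_a = ["yes"] * 6 + ["no"] * 4
rater_b = ["yes"] * 5 + ["no"] * 4 + ["yes"]

p_observed, kappa = agreement_and_kappa(rater_a, rater_b)
print(round(p_observed, 2), round(kappa, 2))
```

With these ratings kappa comes out lower than percent agreement, because part of the observed agreement would be expected by chance alone.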

12
Q

Which type of reliability would be most appropriate for estimating the reliability of a multiple-choice speed test?
Select one:

A.
Split-half

B.
Coefficient of concordance

C.
Alternate forms

D.
Coefficient alpha

A

C.

Alternate forms

13
Q

When using principal component analysis:
Select one:

A.
the first principal component represents the largest share of the total variance.

B.
the first principal component represents the smallest share of the total variance.

C.
each component represents an equal share of the total variance.

D.
the order of the components is not related to the share of total variance they represent.

A

A.

the first principal component represents the largest share of the total variance.

14
Q

Which of the following scores does not “belong with” the other three?
Select one:

A.
Stanine scores

B.
z-scores

C.
Percentile ranks

D.
Percentage scores

A

D.

Percentage scores

15
Q

Stella obtains a score of 50 on a test that has a standard deviation of 10 and a standard error of measurement of 5. The 95% confidence interval for Stella’s score is approximately:
Select one:

A.
45 to 55

B.
40 to 60

C.
35 to 65

D.
30 to 70

A

B.

40 to 60
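
The interval can be worked out as the obtained score plus and minus about two standard errors of measurement (1.96 rounded to 2 for the 95% level). The helper below is an illustrative sketch, not a standard function.

```python
def confidence_interval(obtained_score, sem, n_sem=2):
    """Band around an obtained score: n_sem = 1 (68%), 2 (about 95%), 3 (about 99%)."""
    return obtained_score - n_sem * sem, obtained_score + n_sem * sem

# Stella's obtained score is 50 and the standard error of measurement is 5.
low, high = confidence_interval(50, 5)
print(low, high)
```

The test's standard deviation of 10 is not needed here, because the standard error of measurement is already given.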

16
Q

In a normal distribution, which of the following represents the lowest score?
Select one:

A.
Percentile rank of 20

B.
z-score of -1.0

C.
T-score of 25

D.
Wechsler IQ score of 70

A

C.

T-score of 25
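
One way to check the answer is to put all four options on the z-score scale. The sketch assumes the usual conventions: T-scores have a mean of 50 and SD of 10, and Wechsler IQ scores have a mean of 100 and SD of 15.

```python
from statistics import NormalDist

# Convert each option to a z-score in a standard normal distribution
z_equivalents = {
    "Percentile rank of 20": NormalDist().inv_cdf(0.20),
    "z-score of -1.0": -1.0,
    "T-score of 25": (25 - 50) / 10,
    "Wechsler IQ score of 70": (70 - 100) / 15,
}

lowest = min(z_equivalents, key=z_equivalents.get)
for name, z in z_equivalents.items():
    print(f"{name}: z = {z:.2f}")
print(lowest)
```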

17
Q

A personnel director uses a mechanical aptitude test to hire machine shop workers. Several of the people hired using the test turn out to be less than adequate performers. These individuals are:
Select one:

A.
true positives

B.
true negatives

C.
false positives

D.
false negatives

A

C.

false positives

18
Q

In terms of magnitude, the standard error of measurement can be:
Select one:

A.
no greater than 1.0

B.
no less than 1.0

C.
no greater than the standard deviation of the test scores

D.
no less than the standard deviation of the test scores

A

C.

no greater than the standard deviation of the test scores
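
The range follows from the formula SEM = SD × sqrt(1 − r_xx), where r_xx is the reliability coefficient. The sketch below evaluates it at the two extremes and one value in between; the SD of 10 is a hypothetical example.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

sd = 10  # hypothetical standard deviation of test scores
sem_values = [standard_error_of_measurement(sd, r) for r in (0.0, 0.75, 1.0)]
print(sem_values)
```

When reliability is 0 the SEM equals the standard deviation, and when reliability is 1.0 it is 0, so the SEM always falls between those limits.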

19
Q

A 200-item test that has been administered to 100 college students has a normal distribution, a mean of 145, and a standard deviation of 12. When the students’ raw scores have been converted to percentile ranks, Alex obtains a percentile rank of 49, while his twin sister Alicia obtains a percentile rank of 90. The teacher realizes that she made a mistake in scoring Alex’s and Alicia’s tests: Both should have received a raw score that was five points higher. In terms of their percentile ranks, when the teacher adds the five points to Alex’s and Alicia’s scores, she can expect that:
Select one:

A.
Alicia’s percentile rank will increase more than Alex’s.

B.
Alex’s percentile rank will increase more than Alicia’s.

C.
Alicia’s and Alex’s percentile ranks will increase by the same amount.

D.
Alicia’s and Alex’s percentile ranks will not change.

A

B.

Alex’s percentile rank will increase more than Alicia’s.
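
A rough numerical check, assuming the scores follow the normal distribution described in the question (mean 145, SD 12). The helper name is illustrative only.

```python
from statistics import NormalDist

scores = NormalDist(mu=145, sigma=12)

def percentile_rank_gain(old_pr, added_points=5):
    """Change in percentile rank after adding raw-score points."""
    old_raw = scores.inv_cdf(old_pr / 100)           # raw score at the old rank
    new_pr = 100 * scores.cdf(old_raw + added_points)
    return new_pr - old_pr

alex_gain = percentile_rank_gain(49)
alicia_gain = percentile_rank_gain(90)
print(round(alex_gain, 1), round(alicia_gain, 1))
```

Scores are densely packed near the middle of a normal distribution, so the same five raw-score points move Alex past many more examinees than they move Alicia.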

20
Q

In a distribution of percentile ranks, the number of examinees receiving percentile ranks between 20 and 30 is:
Select one:

A.
equal to the number of examinees receiving percentile ranks between 50 and 60

B.
greater than the number of examinees receiving percentile ranks between 50 and 60

C.
about one-half the number of examinees receiving percentile ranks between 50 and 60

D.
about one-fourth the number of examinees receiving percentile ranks between 50 and 60

A

A.

equal to the number of examinees receiving percentile ranks between 50 and 60

21
Q

To obtain a “coefficient of stability,” you would:
Select one:

A.
administer the same test twice to the same group of examinees on two separate occasions and correlate the two sets of scores.

B.
administer a test to a group of examinees and determine the average inter-item correlation.

C.
administer a test to two different random samples of examinees on two occasions and correlate the two sets of scores.

D.
administer parallel forms of a test to the same group of examinees and correlate the two sets of scores.

A

A.
administer the same test twice to the same group of examinees on two separate occasions and correlate the two sets of scores.

22
Q

The best way to control consensual observer drift is to:
Select one:

A.
use the correction for attenuation formula

B.
use a true experimental research design

C.
videotape the observers

D.
alternate raters

A

D.

alternate raters

23
Q

A reliability coefficient is best defined as a measure of:
Select one:

A.
relevance

B.
consistency

C.
interpretability

D.
generalizability

A

B.

consistency

24
Q

A test designed to measure knowledge of clinical psychology is likely to have the highest reliability coefficient when:
Select one:

A.
the test consists of 30 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology

B.
the test consists of 30 items and the tryout sample consisted of individuals who are homogeneous in terms of knowledge of clinical psychology

C.
the test consists of 80 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology

D.
the test consists of 80 items and the tryout sample consisted of individuals who are homogeneous in terms of knowledge of clinical psychology

A

C.
the test consists of 80 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology

25
Q

Assuming no constraints in terms of time, money, or other resources, the best way to demonstrate that a test has adequate reliability is by using which of the following techniques?
Select one:

A.
equivalent (alternate) forms

B.
test-retest

C.
Cronbach’s alpha

D.
Cohen’s kappa

A

A.

equivalent (alternate) forms

26
Q

In a normal distribution, a T-score of ___ is equivalent to a percentile rank of 16.
Select one:

A.
10

B.
20

C.
30

D.
40

A

D.

40

27
Q

In factor analysis, a factor loading indicates the correlation between:
Select one:

A.
a test and an identified factor

B.
two different tests

C.
two factors measured by the same test

D.
two factors measured by different tests

A

A.

a test and an identified factor

28
Q

In factor analysis, when two factors are “orthogonal,” this means that:
Select one:

A.
the factors are correlated.

B.
the factors are uncorrelated.

C.
the factors explain a statistically significant amount of variability in test scores.

D.
the factors do not explain a statistically significant amount of variability in test scores.

A

B.

the factors are uncorrelated.

29
Q

Which of the following types of validity would you be most interested in when designing a selection test that will be used to predict the future job performance ratings of job applicants?
Select one:

A.
Discriminant

B.
Content

C.
Construct

D.
Criterion-related

A

D.

Criterion-related

30
Q

Which of the following best describes the relationship between validity and reliability?
Select one:

A.
A valid test is also a reliable test.

B.
A valid test may or may not be a reliable test.

C.
A reliable test is also a valid test.

D.
An invalid test is not a reliable test.

A

A.

A valid test is also a reliable test.

31
Q

In a multitrait-multimethod matrix, the coefficient that indicates a test’s reliability is the _____________ coefficient.
Select one:

A.
heterotrait-heteromethod

B.
heterotrait-monomethod

C.
monotrait-heteromethod

D.
monotrait-monomethod

A

D.

monotrait-monomethod

32
Q

Cronbach’s alpha is an appropriate method for evaluating reliability when:
Select one:

A.
all test items are designed to measure the same underlying characteristic.

B.
test items are subjectively scored.

C.
the test will be administered to examinees at regular intervals over time.

D.
there is a restriction in the range of scores.

A

A.

all test items are designed to measure the same underlying characteristic.

33
Q

Incremental validity is a measure of:
Select one:

A.
decision-making accuracy

B.
shrinkage

C.
the generalizability of research results

D.
the costs involved in using a predictor

A

A.

decision-making accuracy

34
Q

In factor analysis, communality refers to:
Select one:

A.
the proportion of variance accounted for in a single variable by a single factor.

B.
the proportion of variance accounted for in multiple variables by a single factor.

C.
the proportion of variance accounted for in a single variable by all of the identified factors.

D.
the total proportion of variance not explained by the factor analysis.

A

C.

the proportion of variance accounted for in a single variable by all of the identified factors.

35
Q

Criterion contamination has which of the following effects?
Select one:

A.
It artificially increases scores on the criterion.

B.
It artificially reduces the criterion’s reliability coefficient.

C.
It artificially increases the predictor’s criterion-related validity coefficient.

D.
It artificially attenuates scores on the predictor and the criterion.

A

C.

It artificially increases the predictor’s criterion-related validity coefficient.

36
Q

To evaluate the concurrent validity of a new selection test for clerical workers, you would:
Select one:

A.
conduct a factor analysis to confirm that the test measures the attributes it was designed to measure

B.
have supervisors and others familiar with the job rate test items for relevance to success as a clerical worker

C.
administer the test to a sample of current clerical workers and correlate their scores on the test with their recently assigned performance ratings

D.
administer the test to clerical workers when they are initially hired and six months after they are hired and then correlate the two sets of scores

A

C.
administer the test to a sample of current clerical workers and correlate their scores on the test with their recently assigned performance ratings

37
Q

When using criterion-referenced interpretation of scores obtained on a job knowledge test, you would most likely be interested in which of the following?
Select one:

A.
The total number of test items answered correctly by an examinee

B.
An examinee’s performance relative to that of other examinees

C.
An examinee’s standing on two or more measures designed to assess the same characteristic

D.
Ensuring that test items are based on a systematic job evaluation

A

A.

The total number of test items answered correctly by an examinee

38
Q

A test developer would use the Kuder-Richardson Formula (KR-20) in order to:
Select one:

A.
evaluate a test’s internal consistency reliability

B.
evaluate a test’s test-retest reliability

C.
determine the impact of increasing a test’s reliability on its validity

D.
determine the impact of lengthening or shortening the test on its reliability

A

A.

evaluate a test’s internal consistency reliability
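
A minimal sketch of KR-20 for dichotomously scored (0/1) items, using KR-20 = [k / (k − 1)] × [1 − Σpq / σ²], where k is the number of items, p and q are the proportions passing and failing each item, and σ² is the variance of total scores. The response matrix is hypothetical.

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_people = len(responses)
    n_items = len(responses[0])
    # Sum of p * q across items (p = proportion answering the item correctly)
    sum_pq = 0.0
    for item in range(n_items):
        p = sum(row[item] for row in responses) / n_people
        sum_pq += p * (1 - p)
    # Population variance of examinees' total scores
    total_variance = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)

# Hypothetical data: 5 examinees, 4 items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(responses)
print(round(reliability, 2))
```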

39
Q

A test developer asks a group of experienced salespeople to review the test items she has developed for a test to help select new sales applicants. This demonstrates the developer is interested in determining the test’s ______ validity.
Select one:

A.
incremental

B.
content

C.
concurrent

D.
differential

A

B.

content

40
Q

To maximize the inter-rater reliability of a behavioral observation scale, you should make sure that coding categories:
Select one:

A.
are mutually exclusive

B.
are measured on an interval or ratio scale

C.
produce criterion-referenced scores

D.
produce scores that are normally distributed

A

A.

are mutually exclusive

41
Q

The minimum and maximum values of the standard error of estimate are:
Select one:

A.
-1 and +1

B.
0 and 1

C.
0 and the standard deviation of the predictor

D.
0 and the standard deviation of the criterion

A

D.

0 and the standard deviation of the criterion
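
The limits follow from the formula SEE = SD_y × sqrt(1 − r_xy²), where SD_y is the standard deviation of criterion scores and r_xy is the validity coefficient. The SD of 10 below is a hypothetical example.

```python
import math

def standard_error_of_estimate(sd_criterion, validity):
    """SEE = SD of the criterion * sqrt(1 - validity coefficient squared)."""
    return sd_criterion * math.sqrt(1 - validity**2)

sd_criterion = 10  # hypothetical standard deviation of criterion scores
see_values = [standard_error_of_estimate(sd_criterion, r) for r in (0.0, 0.6, 1.0)]
print([round(v, 2) for v in see_values])
```

A validity coefficient of 0 gives an SEE equal to the criterion's standard deviation, and a coefficient of 1.0 (or −1.0) gives an SEE of 0.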

42
Q

When the kappa statistic for a measure is .90, this indicates that the measure:
Select one:

A.
has adequate inter-rater reliability

B.
has adequate internal consistency reliability

C.
has low criterion-related validity

D.
has low incremental validity

A

A.

has adequate inter-rater reliability

43
Q

A personnel director hires all job applicants who obtain a high score on a job selection test but, after using the test for six months, realizes that many of the new employees are obtaining low performance ratings. Assuming that the selection test has adequate criterion-related validity, the personnel director can reduce the number of unsatisfactory workers that she hires using the test by:
Select one:

A.
lowering the selection test cutoff score and the job performance rating cutoff score

B.
raising the selection test cutoff score and the job performance rating cutoff score

C.
lowering the selection test cutoff score

D.
raising the selection test cutoff score

A

D.

raising the selection test cutoff score

44
Q

All other things being equal, which of the following tests is likely to have the largest reliability coefficient?
Select one:

A.
A multiple-choice test that consists of items that each have five answer options

B.
A multiple-choice test that consists of items that each have four answer options

C.
A multiple-choice test that consists of items that each have three answer options

D.
A true-false test

A

A.

A multiple-choice test that consists of items that each have five answer options

45
Q

The applicants for sales positions at the Acme Company complain that the selection test they are required to take is unfair because it doesn’t “look like” it measures the knowledge and skills that are important for successful job performance. Their complaint suggests that the selection test is lacking which of the following?
Select one:

A.
Incremental validity

B.
Differential validity

C.
Construct validity

D.
Face validity

A

D.

Face validity

46
Q

The optimal item difficulty level (p) for a true/false test is:
Select one:

A.
.50

B.
.75

C.
+1.0

D.
-1.0

A

B.

.75
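
The answer reflects the common rule of thumb that the optimal difficulty level lies halfway between the probability of a correct guess and 1.0. The sketch below applies that rule; the function name is illustrative.

```python
def optimal_difficulty(n_options):
    """Midpoint between the chance-level p and 1.0 for an item with n_options choices."""
    chance = 1 / n_options
    return (1.0 + chance) / 2

print(optimal_difficulty(2))  # true/false item
print(optimal_difficulty(4))  # four-option multiple-choice item
```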

47
Q

According to classical test theory, total variability in test scores is due to:
Select one:

A.
true score variability plus systematic error

B.
true score variability plus random error

C.
relevant variability plus irrelevant variability

D.
relevant variability plus confounding variability

A

B.

true score variability plus random error

48
Q

The assumption underlying convergent validity is that:
Select one:

A.
a measure of a characteristic should correlate highly with a different type of measure that is already known to assess the same characteristic.

B.
a measure of a construct should have little correlation to a measure known to assess an unrelated characteristic.

C.
a measure of a construct should correlate more highly with itself than with another measure of the same construct.

D.
to be valid, a measure of a characteristic should correlate highly with the measure of the behavior it is designed to predict.

A

A.
a measure of a characteristic should correlate highly with a different type of measure that is already known to assess the same characteristic.

49
Q

Content sampling is not a potential source of measurement error for which of the following methods for evaluating a test’s reliability?
Select one:

A.
Coefficient alpha and alternate forms

B.
Alternate forms and test-retest

C.
Split-half only

D.
Test-retest only

A

D.

Test-retest only

50
Q

After reviewing the data collected on a new selection test during the course of a criterion-related validity study, a psychologist decides to lower the selection test cutoff score. Apparently the psychologist is hoping to do which of the following?
Select one:

A.
Decrease the number of false negatives

B.
Increase the number of true positives

C.
Increase the number of false positives

D.
Increase the number of false negatives

A

B.

Increase the number of true positives

51
Q

Which of the following item difficulty (p) levels maximizes the differentiation of examinees into high- and low-performing groups?
Select one:

A.
0.5

B.
0.9

C.
1.5

D.
0.75

A

A.

0.5

52
Q

The standard error of measurement is used to:
Select one:

A.
estimate a test’s “true” reliability coefficient

B.
estimate a test’s “true” criterion-related validity coefficient

C.
calculate the range within which an examinee’s true test score is likely to fall given her obtained score

D.
calculate the range within which an examinee’s true criterion score is likely to fall given her predicted criterion score

A

C.

calculate the range within which an examinee’s true test score is likely to fall given her obtained score

53
Q

The distribution of percentile ranks is always:
Select one:

A.
the same as the shape of the distribution of raw scores

B.
normal regardless of the shape of the distribution of raw scores

C.
rectangular (flat) regardless of the shape of the distribution of raw scores

D.
bimodal regardless of the shape of the distribution of raw scores

A

C.

rectangular (flat) regardless of the shape of the distribution of raw scores

54
Q

In factor analysis, the original factor matrix is usually rotated in order to:
Select one:

A.
facilitate interpretation of the identified factors

B.
determine how many factors to extract

C.
cross-validate the factor analysis

D.
verify the causal relationships among the identified factors

A

A.

facilitate interpretation of the identified factors

55
Q

In the multitrait-multimethod matrix, which of the following coefficients provides information about a test’s convergent validity?
Select one:

A.
Heterotrait-heteromethod

B.
Heterotrait-monomethod

C.
Monotrait-heteromethod

D.
Monotrait-monomethod

A

C.

Monotrait-heteromethod

56
Q

When a test user uses a correction for guessing formula that involves subtracting points from each examinee’s scores, the resulting distribution of scores will have a ____________________ than the original (non-corrected) distribution.
Select one:

A.
higher mean and larger standard deviation

B.
higher mean and smaller standard deviation

C.
lower mean and larger standard deviation

D.
lower mean and smaller standard deviation

A

C.

lower mean and larger standard deviation
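
A small demonstration using the common correction formula, corrected score = R − W / (k − 1), where R is the number right, W the number wrong, and k the number of answer options. The scores are hypothetical, and the sketch assumes every examinee attempts all 40 four-option items.

```python
from statistics import mean, pstdev

N_ITEMS = 40    # hypothetical test length
N_OPTIONS = 4   # hypothetical number of answer options per item

def corrected_score(n_right, n_wrong, n_options=N_OPTIONS):
    """Correction for guessing: right minus wrong / (options - 1)."""
    return n_right - n_wrong / (n_options - 1)

raw = [36, 30, 24, 18, 12]  # number right; the rest are wrong
corrected = [corrected_score(r, N_ITEMS - r) for r in raw]

print(mean(raw), round(pstdev(raw), 2))
print(round(mean(corrected), 2), round(pstdev(corrected), 2))
```

When all items are attempted, the corrected score is a linear function of the raw score with a slope of k / (k − 1), which is greater than 1. That stretches the distribution while the subtraction lowers every score.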

57
Q

The item discrimination index (D) ranges in value from:
Select one:

A.
0 to 10

B.
0 to 50

C.
-1.0 to +1.0

D.
-50 to +50

A

C.

-1.0 to +1.0
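
The range follows from how D is calculated: the proportion of the upper-scoring group answering the item correctly minus the proportion of the lower-scoring group doing so. The group sizes and counts below are hypothetical.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """D = proportion correct in the upper group minus proportion correct in the lower group."""
    return upper_correct / group_size - lower_correct / group_size

print(discrimination_index(20, 0, 20))   # everyone in the upper group, no one in the lower
print(discrimination_index(15, 5, 20))   # a moderately discriminating item
print(discrimination_index(0, 20, 20))   # only the lower group answers correctly
```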

58
Q

Which of the following is used to estimate the effects of shortening or lengthening a test on the test’s reliability coefficient?
Select one:

A.
Cohen’s kappa statistic

B.
Kuder-Richardson Formula 20

C.
Cronbach’s coefficient alpha

D.
Spearman-Brown formula

A

D.

Spearman-Brown formula
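
A minimal sketch of the Spearman-Brown prophecy formula, r_new = n × r / (1 + (n − 1) × r), where n is the factor by which test length changes. The starting reliability of .60 is a hypothetical example.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

doubled = spearman_brown(0.60, 2)    # test made twice as long
halved = spearman_brown(0.60, 0.5)   # test cut to half its length
print(round(doubled, 2), round(halved, 2))
```

The formula assumes the added or removed items are comparable to the existing ones in content and quality.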

59
Q

A reliability coefficient of .60 indicates that ___ of variability in test scores is true score variability.
Select one:

A.
0.6

B.
0.4

C.
0.36

D.
0.16

A

A.

0.6

60
Q

In the context of test construction, cross-validation is associated with which of the following?
Select one:

A.
Shrinkage

B.
Criterion deficiency

C.
Criterion contamination

D.
Banding

A

A.

Shrinkage

61
Q

A test developer would construct an expectancy table to:
Select one:

A.
facilitate norm-referenced interpretation of test scores

B.
facilitate criterion-referenced interpretation of test scores

C.
correct obtained scores for the effects of guessing

D.
correct obtained test scores for the effects of measurement error

A

B.

facilitate criterion-referenced interpretation of test scores

62
Q

To evaluate the validity of a newly developed selection test for clerical workers, a test developer will correlate scores obtained on the test by newly hired clerical workers with the job performance ratings they receive after being on the job for six months. The resulting correlation coefficient will provide information on the test’s:
Select one:

A.
discriminant validity

B.
predictive validity

C.
construct validity

D.
concurrent validity

A

B.

predictive validity

63
Q

The point at which an item characteristic curve intercepts the vertical (Y) axis provides information on which of the following?
Select one:

A.
The item’s difficulty level

B.
The item’s ability to discriminate between low and high scorers

C.
The probability of answering the item correctly by guessing

D.
The item’s ability to predict performance on an external criterion

A

C.

The probability of answering the item correctly by guessing

64
Q

Assuming that each score is from a normal distribution of scores, which of the following accurately lists the scores in order of magnitude from lowest to highest?
Select one:

A.
T = 40; z = 0; PR = 82; WAIS-IV IQ = 126

B.
PR = 16; T = 30; z = 1.5; WAIS-IV IQ = 116

C.
z = -2; T = 25; WAIS-IV IQ = 115; PR = 75

D.
PR = 50; T = 50; WAIS-IV IQ = 132; z = 1.5

A

A.

T = 40; z = 0; PR = 82; WAIS-IV IQ = 126

65
Q

When the heterotrait-monomethod coefficient is large, this indicates:
Select one:

A.
a lack of differential validity

B.
a lack of discriminant validity

C.
adequate convergent validity

D.
adequate concurrent validity

A

B.

a lack of discriminant validity

66
Q

It would be most important to assess the test-retest reliability of a measure that:
Select one:

A.
is subjectively scored

B.
assesses examinees’ speed of responding

C.
measures a stable trait

D.
measures a characteristic that fluctuates over time

A

C.

measures a stable trait

67
Q

In terms of item response theory, the slope (steepness) of the item characteristic curve (ICC) indicates the item’s:
Select one:

A.
level of difficulty

B.
ability to discriminate between examinees

C.
internal consistency reliability

D.
criterion-related validity

A

B.

ability to discriminate between examinees
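
One common form of the item characteristic curve is the three-parameter logistic model, sketched below as an illustration. In this model a is the discrimination (slope) parameter, b the difficulty parameter, and c the lower asymptote, which reflects the probability of answering correctly by guessing. All parameter values are hypothetical.

```python
import math

def icc(theta, a, b, c):
    """Three-parameter logistic ICC: probability of a correct response at ability theta."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Very low ability: the curve approaches the guessing parameter c
floor = icc(theta=-10.0, a=1.5, b=0.0, c=0.2)

# Change in probability across the same ability interval for a flat and a steep item
flat_change = icc(0.5, a=0.5, b=0.0, c=0.2) - icc(-0.5, a=0.5, b=0.0, c=0.2)
steep_change = icc(0.5, a=2.0, b=0.0, c=0.2) - icc(-0.5, a=2.0, b=0.0, c=0.2)

print(round(floor, 3), round(flat_change, 3), round(steep_change, 3))
```

The steeper item separates examinees just below and just above the difficulty level more sharply, which is why the slope is read as the item's discrimination.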