test construction 2 Flashcards

1
Q

What is the range of p values (Item difficulty index)?

a. -1.0 to 1.0
b. 0 to 2.0
c. 0 to 1.5
d. 0 to 1.0

A

d

2
Q

How to calculate the item difficulty index

A

p = (number of examinees passing the item) / (total number of examinees)

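As a minimal sketch of the formula on the card above (function name and example numbers are illustrative):

```python
def item_difficulty(num_passing: int, num_examinees: int) -> float:
    """Item difficulty index p: proportion of examinees answering the item correctly."""
    return num_passing / num_examinees

# e.g., 30 of 40 examinees pass the item
p = item_difficulty(30, 40)  # p = 0.75, a relatively easy item
```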
3
Q

What does a larger p value indicate?

a. better reliability
b. easier items
c. better item discrimination
d. harder items

A

b

4
Q

In most situations, a p value of _____ is optimal. One exception is the case of a true/false test, for which the optimal p value is ____.

A

.50; .75

5
Q

This refers to the extent to which a test item is able to differentiate between examinees who obtain high versus low scores on the entire test or on an external criterion.

A

Item discrimination

6
Q

The item discrimination index ranges from:

a. -1.0 to 1.0
b. 0 to 2.0
c. 0 to 1.5
d. 0 to 1.0

A

a

7
Q

For most tests, an item with a discrimination index of ____ or higher is considered acceptable.

A

.35

8
Q

If all examinees in the upper group and none in the lower group answered the item correctly, D is equal to _____.

A

1.0

9
Q

If none of the examinees in the upper group and all examinees in the lower group answered the item correctly, D equals _____.

A

-1.0

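The two extreme cases above follow from the standard formula D = (proportion correct in the upper group) minus (proportion correct in the lower group). A minimal sketch (function name illustrative):

```python
def discrimination_index(p_upper: float, p_lower: float) -> float:
    """Item discrimination index D: proportion correct in the upper-scoring
    group minus proportion correct in the lower-scoring group."""
    return p_upper - p_lower

discrimination_index(1.0, 0.0)  # 1.0: all upper, no lower examinees answered correctly
discrimination_index(0.0, 1.0)  # -1.0: the reverse pattern
```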
10
Q

Test construction is usually based on one of two theories:

A

classical test theory

item response theory

11
Q

Advantages of item response theory are that item parameters are sample invariant and performance on different sets of items or tests can be easily __________. Use of IRT involves deriving an ____________________________ for each item.

A

equated; item characteristic curve

12
Q

Whenever we administer a test to examinees, we would like to know how much of their scores reflects “truth” and how much reflects error. It is a measure of ________ that provides us with an estimate of the proportion of variability in examinees’ obtained scores that is due to true differences among examinees on the attributes measured by the test. When a test is ________, it provides dependable, consistent results.

A

Reliability; reliable

13
Q

Most methods for estimating reliability produce a reliability coefficient, which is a correlation coefficient that ranges in value from:

a. -1.0 to 1.0
b. 0 to 2.0
c. 0 to 1.5
d. 0 to 1.0

A

d

14
Q

When a test’s reliability coefficient is 0.0, this means that all variability in obtained test scores is due to __________________.

A

measurement error

15
Q

When a test’s reliability coefficient is +1.0, this indicates that all variability in scores ______________.

A

reflects true score variability

16
Q

A reliability coefficient of .84 indicates that ____% of variability in scores is due to true score differences among examinees, while the remaining _____% is due to measurement error.

A

84; 16

17
Q

This method for estimating reliability involves administering the same test to the same group of examinees on two different occasions and then correlating the two sets of scores.

a. Alternate forms reliability
b. Test-retest reliability
c. Split-half reliability
d. Inter-rater reliability

A

b

18
Q

An ______________ coefficient is calculated by administering two equivalent forms of a test to the same group of examinees and correlating the two sets of scores.

a. Alternate forms reliability
b. Test-retest reliability
c. Split-half reliability
d. Inter-rater reliability

A

a

19
Q

The test-retest reliability coefficient is also known as the coefficient of ____________.

A

stability

20
Q

The alternate forms reliability coefficient is also referred to as the coefficient of ______________.

A

equivalence (and stability)

21
Q

To assess ____________________, a test is administered once to a single group of examinees.

a. Alternate forms reliability
b. Test-retest reliability
c. Split-half reliability
d. Internal consistency reliability

A

d

22
Q

A _________________ coefficient is calculated by splitting the test in half and correlating examinees’ scores on the two halves. Because the size of a reliability coefficient is affected by the test length, the split-half method tends to __________ a test’s true reliability. Consequently, the ____________ formula is often used in conjunction with split-half reliability to obtain an estimate of what the test’s true reliability is.

A

split-half; underestimate; Spearman-Brown
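The Spearman-Brown correction can be sketched with the standard formula r_new = n·r / (1 + (n − 1)·r), where n is the factor by which test length changes; for the split-half case each half is half the full test, so n = 2 (function name illustrative):

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when test length is multiplied by a factor of n."""
    return n * r / (1 + (n - 1) * r)

# Correct a split-half coefficient of .70 up to full-length reliability (n = 2):
spearman_brown(0.70, 2)  # ≈ .82, larger than the uncorrected half-test estimate
```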

23
Q

____________, another method used to assess internal consistency reliability, indicates the average inter-item consistency rather than the consistency between two halves of the test. The _____________ can be used as a substitute for it when test items are scored dichotomously (right or wrong).

A

Coefficient alpha; Kuder-Richardson Formula 20
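A sketch of KR-20 for dichotomously scored (0/1) items, using the standard formula (k/(k − 1))·(1 − Σpq/σ²) over total-score variance; the function name and example data are illustrative, and population variance is used here as a convention:

```python
def kr20(scores):
    """Kuder-Richardson Formula 20 for 0/1-scored items.

    `scores` is a list of examinee response vectors, e.g. [[1, 0, 1], [1, 1, 0], ...].
    """
    n_items = len(scores[0])
    n_people = len(scores)
    # Proportion passing (p) and failing (q = 1 - p) for each item
    p = [sum(row[i] for row in scores) / n_people for i in range(n_items)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    # Variance of examinees' total scores (population variance)
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n_people
    var_total = sum((t - mean) ** 2 for t in totals) / n_people
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)
```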

24
Q

____________ should be assessed when a test is subjectively scored. The scores assigned by different raters can be used to calculate a ___________________ or to determine the percent agreement between raters.

A

Inter-rater reliability; correlation coefficient (kappa statistic)

25
Q

The ___________ could be used to estimate the effects of increasing or reducing the number of items on a test.

A

Spearman-Brown prophecy formula

26
Q

While different types of tests can be expected to have different levels of reliability, for most tests, reliability coefficients of ______ or larger are considered acceptable.

A

.80

27
Q

The magnitude of a reliability coefficient is affected by several factors. In general, the longer a test, the ____________ its reliability coefficient.

A

larger

28
Q

The __________________ is useful for indicating how closely an individual examinee’s obtained score can be expected to reflect his or her true score. It is calculated by multiplying the standard deviation of the test scores by the ________ of one minus the reliability coefficient.

A

standard error of measurement; square root
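The calculation described above, as a minimal sketch (function name and example numbers illustrative):

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = standard deviation of test scores * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# A test with SD = 10 and a reliability coefficient of .84:
standard_error_of_measurement(10, 0.84)  # ≈ 4.0
```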

29
Q

The standard error of measurement is used to construct a ______________ around an examinee’s obtained score.

A

confidence interval
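A minimal sketch of such a confidence interval, assuming a 95% interval (z = 1.96) and illustrative numbers:

```python
import math

def confidence_interval(obtained: float, sd: float, reliability: float, z: float = 1.96):
    """Confidence interval around an obtained score, built from the SEM."""
    sem = sd * math.sqrt(1 - reliability)
    return (obtained - z * sem, obtained + z * sem)

# Obtained score 100, SD = 10, reliability .84, so SEM = 4.0:
confidence_interval(100, 10, 0.84)  # ≈ (92.16, 107.84)
```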

30
Q

________ refers to a test’s accuracy: A test is ______ when it measures what it is intended to measure.

A

Validity; valid

31
Q

There are three main forms of validity: _______ is of concern whenever a test has been designed to measure one or more content or behavior domains. _________ is important when a test will be used to measure a hypothetical construct such as achievement, motivation, intelligence, or mechanical aptitude. __________ is of interest when a test has been designed to estimate or predict performance on another measure.

A

Content validity; construct validity; criterion-related

32
Q

When scores on the test (X) are important because they provide information on how much each examinee knows about a content domain or on each examinee’s status with regard to the trait being measured, then ___________ or ________ validity, respectively, are of interest. However, when the test (X) scores will be used to predict scores on some other measure (Y) and it is the scores on Y that are of most interest, then ___________ validity is of greatest concern.

A

content; construct; criterion-related

33
Q

High correlations with measures of the same trait provide evidence of the test’s ___________ validity, while low correlations with measures of unrelated characteristics provide evidence of the test’s ___________ validity.

A

convergent; discriminant (divergent)

34
Q

The ________________ is used to systematically organize the data collected when assessing a test’s convergent and discriminant validity. It indicates that a test has convergent validity when the ________________ coefficients are large and discriminant validity when the _______________ and the ___________ coefficients are small.

A

multitrait-multimethod matrix; monotrait-heteromethod; heterotrait-heteromethod; heterotrait-monomethod

35
Q

Factor analysis is used to identify the factors (dimensions) that underlie the ___________ among a set of tests. One use of the data obtained in a factor analysis is to determine if a test has ______________.

A

intercorrelations; construct validity

36
Q

Using factor analysis, a test is shown to have construct validity when it has ______________ correlations with the factor(s) it is expected to correlate with and ______ correlations with the factor(s) it is not expected to correlate with.

A

high; low

37
Q

In a factor matrix, the correlation between a test and a factor is referred to as ___________. This correlation can be interpreted in terms of shared variability. For example, if a test has a correlation of .50 with Factor I, this means that _____ percent of variability in test scores is explained by Factor I.

A

factor loading; 25

38
Q

In factor analysis, when the identified factors are ______________ (uncorrelated), a test’s communality can be calculated by summing the __________________. If a test has a correlation of .50 with Factor I and a correlation of .20 with Factor II and the factors are uncorrelated, the test’s communality is equal to _____. This means that ____% of the variability in test scores is explained by the identified factors, while the remaining variability is due to some combination of specificity and measurement error.

A

orthogonal; .29; 29
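The communality calculation for orthogonal factors, sketched with the numbers from the card above (function name illustrative):

```python
def communality(loadings):
    """Communality for orthogonal (uncorrelated) factors: sum of squared factor loadings."""
    return sum(l ** 2 for l in loadings)

communality([0.50, 0.20])  # ≈ .29, i.e., 29% of score variability explained by the factors
```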

39
Q

When the purpose of testing is to draw conclusions about performance on another measure, the test is referred to as the _____________ and the other measure is called the _____________.

A

predictor; criterion

40
Q

There are two types of criterion-related validity: When establishing __________ validity, the predictor is administered to a sample of examinees prior to the criterion. It is the appropriate type of validity when the goal of testing is to predict ________ status on the criterion.

When evaluating ______________ validity, the predictor and criterion are administered at about the same time. It is the preferred method for assessing validity when the purpose of testing is to estimate ________ status on the criterion.

A

predictive; future; concurrent; current

41
Q

This is used to construct a confidence interval around an individual’s predicted criterion score.

A

standard error of estimate
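A sketch using the standard formula SEE = SD of the criterion times the square root of one minus the squared validity coefficient (function name and example numbers illustrative):

```python
import math

def standard_error_of_estimate(sd_y: float, r_xy: float) -> float:
    """SEE = criterion SD * sqrt(1 - validity coefficient squared)."""
    return sd_y * math.sqrt(1 - r_xy ** 2)

# Criterion SD = 10, predictor-criterion validity coefficient r = .60:
standard_error_of_estimate(10, 0.60)  # ≈ 8.0
```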

42
Q

The data collected in a concurrent or predictive validity study can be used to assess a predictor’s _________________, or the increase in correct decisions that can be expected if the predictor is used as a decision-making tool.

A

incremental validity

43
Q

Study tip: Remember that it is the ___________ that determines if a person is a positive or a negative, and the ________ that determines if he/she is a “true” or “false”

A

predictor; criterion

44
Q

The optimal item difficulty level for a true/false test is:

a. .25
b. .50
c. .75
d. 1.00

A

c

45
Q

For a test item that has an item discrimination index of +1.0, you would expect:

a. high achievers to be more likely to answer the item correctly than low achievers
b. low achievers to be more likely to answer the item correctly than high achievers
c. low and high achievers to be equally likely to answer the item correctly
d. low and high achievers to be equally likely to answer the item incorrectly

A

a

46
Q

In terms of item response theory, the slope (steepness) of the item response curve indicates the item’s:

a. difficulty
b. discriminability
c. reliability
d. validity

A

b. When using an item characteristic curve, an item’s ability to discriminate between high and low achievers is indicated by the slope of the curve: the steeper the slope, the greater the discrimination.

47
Q

A researcher correlates scores on two alternate forms of an achievement test and obtains a correlation coefficient of .80. This means that ____% of observed test score variability reflects true score variability:

a. 80
b. 64
c. 36
d. 20

A

a

48
Q

To estimate the effects of lengthening a 50-item test to 100 items on the test’s reliability, you would use which of the following:

a. Pearson r
b. Kuder-Richardson Formula 20
c. kappa coefficient
d. Spearman-Brown Formula

A

d

49
Q

To assess the internal consistency of a test that contains 50 items which are each scored as “right” or “wrong,” you would use which of the following:

a. KR-20
b. Spearman-Brown
c. kappa statistic
d. coefficient of concordance

A

a

50
Q

You administer a test to a group of examinees on April 1 and then re-administer the test to the same group of examinees on May 1. When you correlate the two sets of scores, you will have obtained:

a. coefficient of consistency
b. coefficient of determination
c. coefficient of equivalence
d. coefficient of stability

A

d

51
Q

The kappa statistic for a test is .90. This means that the test has:

a. adequate inter-rater reliability
b. adequate internal consistency reliability
c. inadequate intra-rater reliability
d. inadequate alternate forms reliability

A

a

52
Q

Refers to the extent to which test items contribute to achieving the stated goals of testing

A

relevance