Test Construction 2 Flashcards
What is the range of p values (Item difficulty index)?
a. -1.0 to 1.0
b. 0 to 2.0
c. 0 to 1.5
d. 0 to 1.0
d
How to calculate the item difficulty index
p = number of examinees passing the item / total number of examinees
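A minimal Python sketch of this calculation (the examinee counts are hypothetical):

```python
def item_difficulty(num_passing, num_examinees):
    """Item difficulty index p: the proportion of examinees who pass the item."""
    return num_passing / num_examinees

# Hypothetical example: 30 of 40 examinees answered the item correctly.
p = item_difficulty(30, 40)
print(p)  # 0.75
```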
What does a larger p value indicate?
a. better reliability
b. easier items
c. better item discrimination
d. harder items
b
In most situations, a p value of _____ is optimal. One exception is the case of a true/false test, for which the optimal p value is ____.
.50; .75
This refers to the extent to which a test item is able to differentiate between examinees who obtain high versus low scores on the entire test or on an external criterion.
Item discrimination
The item discrimination index ranges from:
a. -1.0 to 1.0
b. 0 to 2.0
c. 0 to 1.5
d. 0 to 1.0
a
For most tests, an item with a discrimination index of ____ or higher is considered acceptable.
.35
If all examinees in the upper group and none in the lower group answered the item correctly, D is equal to _____.
1.0
If none of the examinees in the upper group and all examinees in the lower group answered the item correctly, D equals _____.
-1.0
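One common way to compute D is the proportion of the upper group answering the item correctly minus the proportion of the lower group answering it correctly; a Python sketch with hypothetical group sizes:

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D: proportion correct in the upper group minus proportion correct in the lower group."""
    return upper_correct / upper_n - lower_correct / lower_n

# All 20 upper-group examinees and none of the 20 lower-group examinees correct:
print(discrimination_index(20, 20, 0, 20))  # 1.0
# None of the upper group and all of the lower group correct:
print(discrimination_index(0, 20, 20, 20))  # -1.0
```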
Test construction is usually based on one of two theories:
classical test theory
item response theory
Advantages of item response theory are that item parameters are sample invariant and performance on different sets of items or tests can be easily __________. Use of IRT involves deriving an ____________________________ for each item.
equated; item characteristic curve
Whenever we administer a test to examinees, we would like to know how much of their scores reflects “truth” and how much reflects error. It is a measure of ________ that provides us with an estimate of the proportion of variability in examinees’ obtained scores that is due to true differences among examinees on the attributes measured by the test. When a test is ________, it provides dependable, consistent results.
Reliability; reliable
Most methods for estimating reliability produce a reliability coefficient, which is a correlation coefficient that ranges in value from:
a. -1.0 to 1.0
b. 0 to 2.0
c. 0 to 1.5
d. 0 to 1.0
d
When a test’s reliability coefficient is 0.0, this means that all variability in obtained test scores is due to __________________.
measurement error
When a test’s reliability coefficient is +1.0, this indicates that all variability in scores ______________.
reflects true score variability
A reliability coefficient of .84 indicates that ____% of variability in scores is due to true score differences among examinees, while the remaining _____% is due to measurement error.
84; 16
This method for estimating reliability involves administering the same test to the same group of examinees on two different occasions and then correlating the two sets of scores.
a. Alternate forms reliability
b. Test-retest reliability
c. Split-half reliability
d. Inter-rater reliability
b
An ______________ coefficient is calculated by administering two equivalent forms of a test to the same group of examinees and correlating the two sets of scores.
a. Alternate forms reliability
b. Test-retest reliability
c. Split-half reliability
d. Inter-rater reliability
a
The test-retest reliability coefficient is also known as the coefficient of ____________.
stability
The alternate forms reliability coefficient is also referred to as the coefficient of ______________.
equivalence (and stability)
To assess ____________________, a test is administered once to a single group of examinees.
a. Alternate forms reliability
b. Test-retest reliability
c. Split-half reliability
d. Internal consistency reliability
d
A _________________ coefficient is calculated by splitting the test in half and correlating examinees’ scores on the two halves. Because the size of a reliability coefficient is affected by the test length, the split-half method tends to __________ a test’s true reliability. Consequently, the ____________ formula is often used in conjunction with split-half reliability to obtain an estimate of what the test’s true reliability is.
split-half; underestimate; Spearman-Brown
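The general Spearman-Brown formula is r_new = nr / (1 + (n - 1)r), where n is the factor by which test length is multiplied. A Python sketch with hypothetical coefficients (n = 2 corrects a split-half coefficient, since each half is only half as long as the full test):

```python
def spearman_brown(r, n):
    """Predicted reliability when test length is multiplied by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical split-half coefficient of .60, corrected to full test length:
print(round(spearman_brown(0.60, 2), 2))  # 0.75
# Hypothetical estimate of the effect of doubling a test's length, starting from r = .80:
print(round(spearman_brown(0.80, 2), 2))  # 0.89
```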
____________, another method used to assess internal consistency reliability, indicates the average inter-item consistency rather than the consistency between two halves of the test. The _____________ can be used as a substitute for it when test items are scored dichotomously (right or wrong).
Coefficient alpha; Kuder-Richardson Formula 20
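A worked Python sketch of KR-20 on a small set of hypothetical dichotomous item responses (rows are examinees, columns are items):

```python
# scores[i][j] = 1 if examinee i answered item j correctly, else 0 (hypothetical data).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

k = len(scores[0])                      # number of items
n = len(scores)                         # number of examinees
totals = [sum(row) for row in scores]   # each examinee's total score
mean_total = sum(totals) / n
variance = sum((t - mean_total) ** 2 for t in totals) / n  # variance of total scores

# Sum of p * q over items, where p is each item's difficulty index and q = 1 - p.
sum_pq = sum(
    (sum(row[j] for row in scores) / n) * (1 - sum(row[j] for row in scores) / n)
    for j in range(k)
)

kr20 = (k / (k - 1)) * (1 - sum_pq / variance)
print(round(kr20, 2))  # 0.8
```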
____________ should be assessed when a test is subjectively scored. The scores assigned by different raters can be used to calculate a ___________________ or to determine the percent agreement between raters.
Inter-rater reliability; correlation coefficient (kappa statistic)
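A Python sketch of both statistics for two hypothetical raters making pass/fail judgments (kappa corrects percent agreement for the agreement expected by chance):

```python
# Hypothetical ratings from two raters on ten examinees.
rater1 = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater2 = ["pass", "pass", "fail", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

n = len(rater1)
percent_agreement = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement is estimated from each rater's marginal proportions.
categories = set(rater1) | set(rater2)
p_chance = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
kappa = (percent_agreement - p_chance) / (1 - p_chance)

print(percent_agreement)  # 0.8
print(round(kappa, 2))    # 0.58
```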
The ___________ could be used to estimate the effects of increasing or reducing the number of items on a test.
Spearman-Brown prophecy formula
While different types of tests can be expected to have different levels of reliability, for most tests, reliability coefficients of ______ or larger are considered acceptable.
.80
The magnitude of a reliability coefficient is affected by several factors. In general, the longer a test, the ____________ its reliability coefficient.
larger
The __________________ is useful for indicating how closely an individual examinee’s obtained score is likely to approximate his or her true score. It is calculated by multiplying the standard deviation of the test scores by the ________ of one minus the reliability coefficient.
standard error of measurement; square root
The standard error of measurement is used to construct a ______________ around an examinee’s obtained score.
confidence interval
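A Python sketch using a hypothetical standard deviation, reliability coefficient, and obtained score:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = standard deviation of test scores * square root of (1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 10, reliability coefficient = .84
sem = standard_error_of_measurement(10, 0.84)
print(round(sem, 2))  # 4.0

# 95% confidence interval around a hypothetical obtained score of 100 (+/- 1.96 SEM):
obtained = 100
print(round(obtained - 1.96 * sem, 2), round(obtained + 1.96 * sem, 2))  # 92.16 107.84
```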
________ refers to a test’s accuracy: A test is ______ when it measures what it is intended to measure.
Validity; valid
There are three main forms of validity: _______ is of concern whenever a test has been designed to measure one or more content or behavior domains. _________ is important when a test will be used to measure a hypothetical construct such as achievement, motivation, intelligence, or mechanical aptitude. __________ is of interest when a test has been designed to estimate or predict performance on another measure.
Content validity; construct validity; criterion-related
When scores on the test (X) are important because they provide information on how much each examinee knows about a content domain or on each examinee’s status with regard to the trait being measured, then ___________ or ________ validity, respectively, is of interest. However, when the test (X) scores will be used to predict scores on some other measure (Y) and it is the scores on Y that are of most interest, then ___________ validity is of greatest concern.
content; construct; criterion-related
High correlations with measures of the same trait provide evidence of the test’s ___________ validity, while low correlations with measures of unrelated characteristics provide evidence of the test’s ___________ validity.
convergent; discriminant (divergent)
The ________________ is used to systematically organize the data collected when assessing a test’s convergent and discriminant validity. It indicates that a test has convergent validity when the ________________ coefficients are large and discriminant validity when the _______________ and the ___________ coefficients are small.
multitrait-multimethod matrix; monotrait-heteromethod; heterotrait-heteromethod; heterotrait-monomethod
Factor analysis is used to identify the factors (dimensions) that underlie the ___________ among a set of tests. One use of the data obtained in a factor analysis is to determine if a test has ______________.
intercorrelations; construct validity
Using factor analysis, a test is shown to have construct validity when it has ______________ correlations with the factor(s) it is expected to correlate with and ______ correlations with the factor(s) it is not expected to correlate with.
high; low
In a factor matrix, the correlation between a test and a factor is referred to as ___________. This correlation can be interpreted in terms of shared variability. For example, if a test has a correlation of .50 with Factor I, this means that _____ percent of variability in test scores is explained by Factor I.
factor loading; 25
In factor analysis, when the identified factors are ______________ (uncorrelated), a test’s communality can be calculated by summing the __________________. If a test has a correlation of .50 with Factor I and a correlation of .20 with Factor II and the factors are uncorrelated, the test’s communality is equal to _____. This means that ____% of the variability in test scores is explained by the identified factors, while the remaining variability is due to some combination of specificity and measurement error.
orthogonal; .29; 29
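With orthogonal factors, the communality is simply the sum of the squared factor loadings; a Python sketch using the loadings from the card above:

```python
# Factor loadings of .50 on Factor I and .20 on Factor II (orthogonal factors).
loadings = [0.50, 0.20]

# Communality: the sum of the squared loadings.
communality = sum(loading ** 2 for loading in loadings)
print(round(communality, 2))  # 0.29
```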
When the purpose of testing is to draw conclusions about performance on another measure, the test is referred to as the _____________ and the other measure is called the _____________.
predictor; criterion
There are two types of criterion-related validity: When establishing __________ validity, the predictor is administered to a sample of examinees prior to the criterion. It is the appropriate type of validity when the goal of testing is to predict ________ status on the criterion.
When evaluating ______________ validity, the predictor and criterion are administered at about the same time. It is the preferred method for assessing validity when the purpose of testing is to estimate ________ status on the criterion.
predictive; future; concurrent; current
This is used to construct a confidence interval around an individual’s predicted criterion score.
standard error of estimate
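The standard error of estimate equals the criterion’s standard deviation times the square root of one minus the squared validity coefficient; a Python sketch with hypothetical values:

```python
import math

def standard_error_of_estimate(sd_criterion, validity):
    """SEE = criterion SD * square root of (1 - squared validity coefficient)."""
    return sd_criterion * math.sqrt(1 - validity ** 2)

# Hypothetical values: criterion SD = 15, validity coefficient = .60
see = standard_error_of_estimate(15, 0.60)
print(round(see, 2))  # 12.0

# 95% confidence interval around a hypothetical predicted criterion score of 80:
predicted = 80
print(round(predicted - 1.96 * see, 2), round(predicted + 1.96 * see, 2))  # 56.48 103.52
```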
The data collected in a concurrent or predictive validity study can be used to assess a predictor’s _________________, or the increase in correct decisions that can be expected if the predictor is used as a decision-making tool.
incremental validity
Study tip: Remember that it is the ___________ that determines whether a person is a positive or a negative, and the ________ that determines whether he/she is a “true” or a “false.”
predictor; criterion
The optimal item difficulty level for a true/false test is:
a. .25
b. .50
c. .75
d. 1.00
c
For a test item that has an item discrimination index of +1.0, you would expect:
a. high achievers to be more likely to answer the item correctly than low achievers
b. low achievers to be more likely to answer the item correctly than high achievers
c. low and high achievers to be equally likely to answer the item correctly
d. low and high achievers to be equally likely to answer the item incorrectly
a
In terms of item response theory, the slope (steepness) of the item characteristic curve indicates the item’s:
a. difficulty
b. discriminability
c. reliability
d. validity
b. When using an item characteristic curve, an item’s ability to discriminate between high and low achievers is indicated by the slope of the curve - the steeper the slope, the greater the discrimination.
A researcher correlates scores on two alternate forms of an achievement test and obtains a correlation coefficient of .80. This means that ____% of observed test score variability reflects true score variability:
a. 80
b. 64
c. 36
d. 20
a
To estimate the effects of lengthening a 50-item test to 100 items on the test’s reliability, you would use which of the following:
a. Pearson r
b. Kuder-Richardson Formula 20
c. kappa coefficient
d. Spearman-Brown Formula
d
To assess the internal consistency of a test that contains 50 items which are each scored as “right” or “wrong,” you would use which of the following:
a. KR-20
b. Spearman-Brown
c. kappa statistic
d. coefficient of concordance
a
You administer a test to a group of examinees on April 1 and then re-administer the test to the same group of examinees on May 1. When you correlate the two sets of scores, you will have obtained:
a. coefficient of consistency
b. coefficient of determination
c. coefficient of equivalence
d. coefficient of stability
d
The kappa statistic for a test is .90. This means that the test has:
a. adequate inter-rater reliability
b. adequate internal consistency reliability
c. inadequate intra-rater reliability
d. inadequate alternate forms reliability
a
Refers to the extent to which test items contribute to achieving the stated goals of testing
relevance