Test Construction Flashcards

Question 1

Q

Ways to increase test reliability

Answer

A

Increase the test’s length by adding items of similar content and quality.
Increase the heterogeneity of the sample with respect to the attribute(s) measured by the test, so that the range of scores is wider.

Question 2

Q

Item discrimination range and usage

Answer

A

The item discrimination index (D) ranges from -1.0 to +1.0. If all examinees in the upper group and none in the lower group answered the item correctly, D is +1.0; if none of the examinees in the upper group and all examinees in the lower group answered the item correctly, D equals -1.0.
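The card gives the range of D but not the formula. A minimal sketch, assuming the common definition in which D is the proportion of the upper-scoring group answering the item correctly minus the proportion of the lower-scoring group doing so. The function name and group counts are hypothetical.

```python
def discrimination_index(upper_correct, upper_total, lower_correct, lower_total):
    """Return D = p_upper - p_lower for a single item (assumed formula)."""
    if upper_total <= 0 or lower_total <= 0:
        raise ValueError("each group needs at least one examinee")
    return upper_correct / upper_total - lower_correct / lower_total


# Hypothetical groups of 20 examinees each.
print(discrimination_index(20, 20, 0, 20))   # all upper, no lower correct: 1.0
print(discrimination_index(0, 20, 20, 20))   # no upper, all lower correct: -1.0
print(discrimination_index(15, 20, 5, 20))   # 0.75 - 0.25 = 0.5
```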

Question 3

Q

Item Discrimination definition

Answer

A

Item discrimination refers to the extent to which a test item discriminates (differentiates) between examinees who obtain high versus low scores on the entire test or on an external criterion.

Question 4

Q

Factors identified in a factor analysis can be either ________ or _________.

Answer

A

orthogonal or oblique

Question 5

Q

From the perspective of factor analysis, true score variability consists of _________ and __________

Answer

A

communality and specificity

Question 6

Q

Factor Analysis

Answer

A

A multivariate statistical technique used to determine how many factors (constructs) are needed to account for the intercorrelations among a set of tests, subtests, or test items. Factor analysis can be used to assess a test’s construct validity by indicating the extent to which the test correlates with factors that it would and would not be expected to correlate with.

Question 7

Q

Item Characteristic Curve

Answer

A

When using item response theory, an item characteristic curve (ICC) is constructed for each item by plotting the proportion of examinees in the tryout sample who answered the item correctly against either the total test score, performance on an external criterion, or a mathematically-derived estimate of a latent ability or trait.

The curve provides information on the relationship between an examinee’s level on the ability or trait measured by the test and the probability that he/she will respond to the item correctly.

Question 8

Q

Content Validity

Answer

A

The extent to which a test adequately samples the domain of information, knowledge, or skill that it purports to measure. Determined primarily by “expert judgment.” Most important for achievement and job sample tests.

Question 9

Q

Reliability and Validity

Answer

A

Reliability is a necessary but not sufficient condition for validity.

Question 10

Q

Criterion-Related Validity/Concurrent And Predictive

Answer

A

The type of validity that involves determining the relationship (correlation) between the predictor and the criterion. The correlation coefficient is referred to as the criterion-related validity coefficient. Criterion-related validity can be either concurrent (predictor and criterion scores obtained at about the same time) or predictive (predictor scores obtained before criterion scores).

Question 11

Q

Construct Validity

Answer

A

Construct validity refers to the extent to which a test measures the hypothetical trait (construct) it is intended to measure.
Methods for establishing construct validity include
* correlating test scores with scores on measures that do and do not measure the same trait (convergent and discriminant validity),
* conducting a factor analysis to assess the test’s factorial validity,
* determining if changes in test scores reflect expected developmental changes, and
* seeing if experimental manipulations have the expected impact on test scores.

Question 12

Q

Relevance (Test Construction)

Answer

A

In test construction, relevance refers to the extent to which test items contribute to achieving the stated goals of testing.

Question 13

Q

Factor Loadings and Communality

Answer

A

In a factor matrix, a factor loading is the correlation between a test (or other variable included in the analysis) and a factor and can be squared to determine the amount of variability in the test that is accounted for by the factor. The communality is the total amount of variability in scores on the test that is accounted for by the factor analysis - i.e., by all of the identified factors.
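The arithmetic can be sketched as follows, assuming the factors are orthogonal (uncorrelated), in which case the communality is the sum of the squared loadings. The loadings and function names are hypothetical.

```python
def variance_explained_by_factor(loading):
    """Proportion of a test's variability accounted for by one factor."""
    return loading ** 2


def communality(loadings):
    """Sum of squared loadings; valid only for orthogonal factors (assumption)."""
    return sum(loading ** 2 for loading in loadings)


# Hypothetical loadings of one test on Factor I and Factor II.
test_loadings = [0.6, 0.5]
print(round(variance_explained_by_factor(test_loadings[0]), 2))  # 0.36
print(round(communality(test_loadings), 2))                      # 0.61
```

With these made-up loadings, Factor I accounts for 36% of the test’s variability and the two factors together account for 61%.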

Question 14

Q

Reliability & Reliability Coefficient

Answer

A

Reliability refers to the consistency of test scores; i.e., the extent to which a test measures an attribute without being affected by random fluctuations (measurement error) that produce inconsistencies over time, across items, or over different forms.

Methods for establishing reliability include
* test-retest,
* alternate forms,
* split-half,
* coefficient alpha, and
* inter-rater.

Most produce a reliability coefficient, which is interpreted directly as a measure of true score variability - e.g., a reliability of .80 indicates that 80% of variability in test scores is true score variability.

Question 15

Q

Orthogonal And Oblique Rotation

Answer

A

In factor analysis, an orthogonal rotation of the identified factors produces uncorrelated factors, while an oblique rotation produces correlated factors. Rotation is done to simplify the interpretation of the identified factors.

Question 16

Q

Norm-Referenced Interpretation

Answer

A

Interpretation of an examinee’s test performance by comparing it to the performance of others in a standardization (norm) sample. Norm-referenced scores include percentile ranks and standard scores (e.g., z-scores and T scores).
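The two standard scores named on this card can be sketched as below, assuming the usual conventions: a z-score expresses a raw score in standard deviation units from the norm-group mean, and a T score rescales z to a mean of 50 and a standard deviation of 10. The norm-group values are hypothetical.

```python
def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - norm_mean) / norm_sd


def t_score(raw, norm_mean, norm_sd):
    """T score: z rescaled to mean 50 and SD 10 (assumed convention)."""
    return 50 + 10 * z_score(raw, norm_mean, norm_sd)


# Hypothetical norm group with mean 100 and SD 15.
print(z_score(115, 100, 15))  # 1.0
print(t_score(115, 100, 15))  # 60.0
```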

Question 17

Q

Multitrait-Multimethod Matrix

Answer

A

A systematic way to organize the correlation coefficients obtained when assessing a measure’s convergent and discriminant validity (which, in turn, provides evidence of construct validity). Requires measuring at least two different traits using at least two different methods for each trait. Terms to have linked with multitrait-multimethod matrix are monotrait-monomethod, monotrait-heteromethod, heterotrait-monomethod, and heterotrait-heteromethod coefficients.

Question 18

Q

Standard Error Of Estimate/Confidence Interval

Answer

A

An index of error when predicting criterion scores from predictor scores. Used to construct a confidence interval around an examinee’s predicted criterion score. Its magnitude depends on two factors: the criterion’s standard deviation and the predictor’s validity coefficient.
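A minimal sketch of how the two factors combine, assuming the standard formula (criterion standard deviation times the square root of one minus the squared validity coefficient) and a normal-curve multiplier of 1.96 for a 95% interval. All values and names are hypothetical.

```python
import math


def standard_error_of_estimate(criterion_sd, validity):
    """SE of estimate = SD_y * sqrt(1 - r_xy^2) (assumed standard formula)."""
    if not -1.0 <= validity <= 1.0:
        raise ValueError("validity coefficient must be between -1 and 1")
    return criterion_sd * math.sqrt(1 - validity ** 2)


def confidence_interval(predicted_score, standard_error, z=1.96):
    """Interval around a predicted criterion score; z=1.96 gives about 95%."""
    margin = z * standard_error
    return predicted_score - margin, predicted_score + margin


# Hypothetical criterion SD of 10 and validity coefficient of .60.
se_est = standard_error_of_estimate(10, 0.60)
low, high = confidence_interval(50, se_est)
print(round(se_est, 2))                # 8.0
print(round(low, 2), round(high, 2))   # 34.32 65.68
```

The sketch shows the two limiting cases implied by the card: a validity coefficient of 1.0 gives an error of zero, and a validity coefficient of 0 gives an error equal to the criterion’s standard deviation.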

Question 19

Q

Incremental Validity

Answer

A

The extent to which a predictor increases decision-making accuracy. Calculated by subtracting the base rate from the positive hit rate.
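The subtraction can be sketched from the four decision outcomes, assuming the usual definitions: the base rate is the proportion of all examinees who are successful on the criterion (true positives plus false negatives, divided by the total), and the positive hit rate is the proportion of those selected by the predictor who are successful (true positives divided by all positives). The counts are hypothetical.

```python
def base_rate(tp, fp, tn, fn):
    """Proportion successful on the criterion without using the predictor."""
    return (tp + fn) / (tp + fp + tn + fn)


def positive_hit_rate(tp, fp):
    """Proportion of examinees selected by the predictor who are successful."""
    return tp / (tp + fp)


def incremental_validity(tp, fp, tn, fn):
    """Positive hit rate minus base rate."""
    return positive_hit_rate(tp, fp) - base_rate(tp, fp, tn, fn)


# Hypothetical counts: 30 true positives, 10 false positives,
# 40 true negatives, 20 false negatives.
print(base_rate(30, 10, 40, 20))             # 0.5
print(positive_hit_rate(30, 10))             # 0.75
print(incremental_validity(30, 10, 40, 20))  # 0.25
```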

Question 20

Q

True Negatives vs False Negatives

Answer

A

True negatives scored low on both the predictor and the criterion; false negatives scored low on the predictor but high on the criterion.

Question 21

Q

True Positives vs False Positives

Answer

A

True positives scored high on both the predictor and the criterion; false positives scored high on the predictor but low on the criterion.

Question 22

Q

Sensitivity

Answer

A

Sensitivity is the percent of people in the tryout sample who have the disorder and were accurately identified by the predictor as having the disorder.

Question 23

Q

Specificity

Answer

A

Specificity is the percent of people in the tryout sample who do not have the disorder and were accurately identified by the predictor as not having the disorder.

Question 24

Q

Sensitivity vs Specificity

Answer

A

Sensitivity and specificity provide information about a predictor’s accuracy when administered to a group of individuals who are known to have or not have the disorder (or other characteristic) of interest.
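Both indices can be computed from the same four counts. A minimal sketch, in which true positives and false negatives are people who have the disorder, and true negatives and false positives are people who do not. The counts are hypothetical.

```python
def sensitivity(tp, fn):
    """Percent of people with the disorder whom the predictor identified."""
    return 100 * tp / (tp + fn)


def specificity(tn, fp):
    """Percent of people without the disorder whom the predictor cleared."""
    return 100 * tn / (tn + fp)


# Hypothetical tryout sample: 50 people with the disorder, 100 without.
print(sensitivity(tp=45, fn=5))    # 90.0
print(specificity(tn=80, fp=20))   # 80.0
```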

Question 25

Q

Test-Retest Reliability

Answer

A

A method for assessing reliability that involves administering the same test to the same group of examinees on two different occasions and correlating the two sets of scores. Yields a coefficient of stability.
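A minimal sketch of the correlation step, assuming the coefficient of stability is a Pearson correlation between the two administrations. The scores are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, n >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for the same five examinees on two occasions.
first_administration = [10, 12, 15, 18, 20]
second_administration = [11, 13, 14, 19, 21]
stability = pearson_r(first_administration, second_administration)
print(round(stability, 2))
```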

Question 26

Q

Split-Half Reliability/ Spearman-Brown Formula

Answer

A

Split-half reliability is a method for assessing internal consistency reliability and involves “splitting” the test in half (e.g., odd- versus even-numbered items) and correlating examinees’ scores on the two halves of the test. The split-half reliability coefficient tends to underestimate a test’s actual reliability and is usually corrected with the Spearman-Brown formula, which estimates what the test’s reliability would be if it were based on the full length of the test.
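The correction can be sketched with the general form of the Spearman-Brown formula, in which the test is lengthened by a factor n; the split-half correction is the case n = 2. The general form is a standard generalization rather than something stated on the card, and the half-test correlation below is hypothetical.

```python
def spearman_brown(reliability, length_factor=2.0):
    """Estimated reliability after multiplying test length by length_factor."""
    if length_factor <= 0:
        raise ValueError("length factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical correlation of .60 between the two halves of a test.
print(round(spearman_brown(0.60), 2))      # 0.75
# Tripling the length of a test with reliability .60.
print(round(spearman_brown(0.60, 3), 2))   # 0.82
```

The corrected value is higher than the half-test correlation, which is consistent with the card’s point that the uncorrected coefficient underestimates reliability.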

Question 27

Q

Standard Error of Measurement/Confidence Interval

Answer

A

An index of measurement error. Used to construct a confidence interval around an examinee’s obtained test score. Its magnitude depends on two factors: the test’s standard deviation and reliability coefficient.
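A minimal sketch of how the two factors combine, assuming the standard formula (test standard deviation times the square root of one minus the reliability coefficient) and a normal-curve multiplier of 1.96 for a 95% interval. The values and names are hypothetical.

```python
import math


def standard_error_of_measurement(test_sd, reliability):
    """SEM = SD_x * sqrt(1 - r_xx) (assumed standard formula)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return test_sd * math.sqrt(1 - reliability)


def score_interval(obtained_score, sem, z=1.96):
    """Interval around an obtained score; z=1.96 gives about 95%."""
    return obtained_score - z * sem, obtained_score + z * sem


# Hypothetical test with SD 15 and reliability .91.
sem = standard_error_of_measurement(15, 0.91)
low, high = score_interval(100, sem)
print(round(sem, 2))                  # 4.5
print(round(low, 2), round(high, 2))  # 91.18 108.82
```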

Question 28

Q

Criterion-Referenced Interpretation

Answer

A

Interpretation of a test score in terms of a prespecified standard; i.e., in terms of percent of content correct (percentage score) or of predicted performance on an external criterion (e.g., regression equation, expectancy table).

Question 29

Q

Item Difficulty

Answer

A

An item’s difficulty level is calculated by dividing the number of individuals who answered the item correctly by the total number of individuals; ranges in value from 0 (very difficult item) to 1.0 (very easy item). In general, an item difficulty index of .50 is preferred because it maximizes differentiation between individuals with high and low ability and helps ensure a high reliability coefficient.
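The calculation described above can be sketched as follows. The response data are hypothetical, with 1 marking a correct answer and 0 an incorrect one.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly (p)."""
    responses = list(responses)
    if not responses:
        raise ValueError("at least one response is required")
    return sum(1 for r in responses if r) / len(responses)


# Hypothetical responses of eight examinees to one item.
print(item_difficulty([1, 1, 0, 1, 0, 0, 1, 0]))  # 0.5
print(item_difficulty([1, 1, 1, 1]))              # 1.0 (very easy item)
print(item_difficulty([0, 0, 0, 0]))              # 0.0 (very difficult item)
```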

Question 30

Q

Cross-Validation And Shrinkage

Answer

A

Process of re-assessing a test’s criterion-related validity on a new sample to check the generalizability of the original validity coefficient. Ordinarily, the validity coefficient “shrinks” (becomes smaller) on cross-validation because the chance factors operating in the original sample are not all present in the cross-validation sample.

Question 31

Q

Kappa Statistic

Answer

A

A coefficient of chance-corrected agreement used to assess inter-rater reliability when ratings are nominal (categorical).
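The card does not give a formula. A minimal sketch, assuming it refers to Cohen’s kappa for two raters, which compares the observed proportion of agreement with the proportion expected by chance from each rater’s category frequencies. The ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty rating lists of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses (Y = disorder present, N = absent) for ten cases.
rater_1 = list("YYYYYNNNNN")
rater_2 = list("YYYYNNNNNY")
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.6
```

Here the raters agree on 8 of 10 cases (.80) and chance agreement is .50, so kappa is (.80 - .50) / (1 - .50) = .60.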

Question 32

Q

Criterion Contamination

Answer

A

Refers to bias introduced into a person’s criterion score as a result of the knowledge of the scorer about his/her performance on the predictor. Tends to artificially inflate the relationship between the predictor and criterion.

Question 34

Q

Validity equation

Answer

A

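A predictor’s validity coefficient can be no greater than the square root of the product of the predictor’s reliability and the criterion’s reliability:

$r_{xy} \le \sqrt{r_{xx} \cdot r_{yy}}$

where $r_{xy}$ is the validity coefficient, $r_{xx}$ is the reliability of the predictor, and $r_{yy}$ is the reliability of the criterion.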
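The ceiling can be checked numerically with a short sketch. The function name and the reliability values are hypothetical.

```python
import math


def max_validity(predictor_reliability, criterion_reliability):
    """Upper limit on the validity coefficient given the two reliabilities."""
    for value in (predictor_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must be between 0 and 1")
    return math.sqrt(predictor_reliability * criterion_reliability)


# Hypothetical predictor reliability of .81 and criterion reliability of .64.
print(round(max_validity(0.81, 0.64), 2))  # 0.72
# With a perfectly reliable criterion, the ceiling is the square root
# of the predictor's reliability alone.
print(round(max_validity(0.81, 1.0), 2))   # 0.9
```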