Test Construction Flashcards
What is Classical Test Theory?
A theory of measurement used for developing and evaluating tests, also known as true score test theory
What is the formula representing the relationship between obtained test scores, true score variability, and measurement error?
X = T + E
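A minimal simulation of this model (the distributions, means, and standard deviations below are illustrative assumptions, not part of classical test theory itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values: true scores differ across examinees;
# error is random noise with mean 0.
true_scores = rng.normal(loc=100, scale=15, size=10_000)  # T
error = rng.normal(loc=0, scale=5, size=10_000)           # E
obtained = true_scores + error                            # X = T + E

# Reliability is the proportion of obtained score variance that is
# true score variance: 15^2 / (15^2 + 5^2) = .90.
print(f"Reliability ~ {true_scores.var() / obtained.var():.2f}")
```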
What does true score variability (T) represent?
Actual differences among examinees regarding what the test measures
What is measurement error (E)?
Random factors affecting test performance in unpredictable ways
What are some examples of measurement error?
- Distractions during testing
- Ambiguously worded test items
- Examinee fatigue
What does test reliability refer to?
The extent to which a test provides consistent information
What is a reliability coefficient?
A type of correlation coefficient that ranges from 0 to 1.0
How is a reliability coefficient interpreted?
As the proportion of variability in obtained test scores that is due to true score variability
What reliability coefficient is considered minimally acceptable for many tests?
0.70 or higher
What reliability coefficient is usually required for high-stakes tests?
0.90 or higher
What are the four main methods for assessing a test’s reliability?
- Test-retest
- Alternate forms
- Internal consistency
- Inter-rater
What does test-retest reliability measure?
The consistency of scores over time
How is alternate forms reliability assessed?
By correlating scores from different forms of the test administered to the same examinees
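Both test-retest and alternate forms reliability are ordinary Pearson correlations between two sets of scores from the same examinees. A minimal sketch with hypothetical scores:

```python
import numpy as np

# Hypothetical scores for the same five examinees on two occasions
# (test-retest) or on two forms of the test (alternate forms).
scores_1 = np.array([85, 92, 78, 60, 71])
scores_2 = np.array([88, 90, 75, 65, 70])

# The Pearson r between the two score sets is the reliability coefficient.
r = np.corrcoef(scores_1, scores_2)[0, 1]
print(f"Reliability coefficient: {r:.2f}")
```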
What does internal consistency reliability measure?
The consistency of scores over different test items
Why is internal consistency reliability not useful for speed tests?
It tends to overestimate their reliability
What is coefficient alpha also known as?
Cronbach’s alpha
What is Kuder-Richardson 20 (KR-20) used for?
Evaluating internal consistency reliability for dichotomously scored items
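A sketch of the KR-20 calculation, assuming a small hypothetical matrix of dichotomously scored (0/1) responses:

```python
import numpy as np

# Rows = examinees, columns = items; 1 = correct, 0 = incorrect (hypothetical data).
responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
])

k = responses.shape[1]                   # number of items
p = responses.mean(axis=0)               # proportion passing each item
q = 1 - p                                # proportion failing each item
total_var = responses.sum(axis=1).var()  # variance of total test scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20: {kr20:.2f}")
```

Coefficient alpha generalizes this formula to items scored on any scale by replacing the sum of p × q with the sum of the item variances.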
What is the split-half reliability method?
Correlating scores from two halves of a test
What is a drawback of split-half reliability?
It underestimates the full test's reliability because each half is only half the length of the full test, and reliability decreases as test length decreases
What formula is used to correct split-half reliability?
Spearman-Brown prophecy formula
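A sketch of the correction, assuming a hypothetical correlation of .60 between the two halves:

```python
def spearman_brown(r: float, n: float = 2.0) -> float:
    """Predicted reliability of a test n times as long as the test
    that produced reliability coefficient r. For split-half
    reliability, n = 2, since the full test is twice each half."""
    return n * r / (1 + (n - 1) * r)

# A half-test correlation of .60 corrected to full-test length:
print(f"{spearman_brown(0.60):.2f}")  # 0.75
```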
What does inter-rater reliability assess?
The consistency of scores or ratings assigned by different raters
What methods are used to evaluate inter-rater reliability?
- Percent agreement
- Cohen’s kappa coefficient
What is a limitation of percent agreement in inter-rater reliability?
It does not account for chance agreement
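Cohen's kappa corrects for this by subtracting the agreement expected by chance alone. A sketch with hypothetical ratings from two raters:

```python
import numpy as np

# Hypothetical categorical ratings assigned by two raters to eight cases.
rater_a = np.array(["yes", "yes", "no", "no", "yes", "no", "yes", "no"])
rater_b = np.array(["yes", "no", "no", "no", "yes", "yes", "yes", "no"])

p_observed = (rater_a == rater_b).mean()  # percent agreement

# Chance agreement: probability that both raters independently
# assign the same category, summed over categories.
categories = np.union1d(rater_a, rater_b)
p_chance = sum((rater_a == c).mean() * (rater_b == c).mean() for c in categories)

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"Percent agreement: {p_observed:.2f}, kappa: {kappa:.2f}")  # .75, .50
```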
What is consensual observer drift?
Increased consistency (but often decreased accuracy) in ratings due to raters communicating
How can consensual observer drift be reduced?
- Not having raters work together
- Providing adequate training
- Regularly monitoring accuracy
What factor affects the size of the reliability coefficient related to content?
Content homogeneity
Tests that are homogeneous regarding content tend to have larger reliability coefficients than heterogeneous tests, especially for internal consistency reliability.
How does the range of scores influence reliability coefficients?
Larger reliability coefficients occur when test scores are unrestricted in range
This happens when the sample includes examinees with high, moderate, and low levels of the characteristics measured.
What impact does guessing have on reliability coefficients?
The easier it is for examinees to answer items correctly by guessing, the lower the reliability coefficient
True/false tests are likely less reliable than multiple-choice tests with three or more answer choices.
What is the reliability index?
Theoretical correlation between observed test scores and true test scores
Calculated by taking the square root of the reliability coefficient.
What does an item analysis determine in test development?
Which items to include based on difficulty level and discrimination ability
It is a process used in classical test theory.
How is item difficulty (p) calculated?
p = number of correct answers / total number of examinees
Ranges from 0 to 1.0, with smaller values indicating more difficult items.
What is the preferred range of item difficulty for most tests?
p = .30 to .70
Moderately difficult items are preferred, but optimal values may vary based on the test purpose.
When are lower p values (more difficult items) optimal?
When a test is used to select a limited proportion of examinees
For example, if only the top 20% of examinees will be selected, an optimal average item difficulty of .20 might be used.
How is the optimal difficulty level for guessing calculated?
Optimal p = (1.0 + probability of guessing) / 2
For a four-answer multiple-choice question, this would be (1.0 + .25) / 2 = .625.
What does the item discrimination index (D) measure?
Difference in correct responses between high and low total test score groups
Ranges from -1.0 to +1.0, with higher D values indicating better discrimination.
What is an acceptable D value for most tests?
D value of .30 or higher
Items of moderate difficulty typically have higher discrimination levels.
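A sketch computing p and D from a hypothetical dichotomous response matrix; splitting examinees into top and bottom halves by total score is one common convention (others use the top and bottom 27%):

```python
import numpy as np

# Rows = examinees, columns = items; 1 = correct (hypothetical data).
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
])

# Item difficulty: proportion of examinees answering each item correctly.
p = responses.mean(axis=0)

# Split examinees into low- and high-scoring groups by total score.
order = responses.sum(axis=1).argsort()
low, high = order[:3], order[-3:]

# D: proportion correct in the high group minus the low group, per item.
D = responses[high].mean(axis=0) - responses[low].mean(axis=0)
print("p:", p.round(2))
print("D:", D.round(2))
```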
What does a reliability coefficient less than 1.0 indicate about a test score?
An examinee’s obtained test score may or may not be their true score.
What is a confidence interval in the context of test scores?
It indicates the range within which an examinee's true score is likely to fall, given their obtained score.
How is the standard error of measurement calculated?
SEM = SD × √(1 − r), i.e., the test's standard deviation multiplied by the square root of 1 minus the reliability coefficient.
What is the standard error of measurement if the standard deviation is 5 and the reliability coefficient is .84?
2 (SEM = 5 × √(1 − .84) = 5 × .4 = 2).
How do you construct a 68% confidence interval around an obtained test score?
Add and subtract one standard error of measurement to and from the obtained score.
How do you construct a 95% confidence interval around an obtained test score?
Add and subtract two standard errors of measurement to and from the obtained score.
How do you construct a 99% confidence interval around an obtained test score?
Add and subtract three standard errors of measurement to and from the obtained score.
What is the 95% confidence interval for an examinee who scored 90 with a standard error of measurement of 5?
80 to 100 (90 ± 2 × 5 = 90 ± 10).
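A sketch tying the SEM and confidence interval calculations together, using the values from the examples above:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score: float, sem_value: float, n_sems: int) -> tuple[float, float]:
    """n_sems = 1, 2, or 3 for a ~68%, ~95%, or ~99% interval."""
    return (score - n_sems * sem_value, score + n_sems * sem_value)

print(sem(5, 0.84))                   # 2.0
print(confidence_interval(90, 5, 2))  # (80, 100)
```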
What does Item Response Theory (IRT) focus on?
Examinees’ responses to individual test items.
How does IRT differ from Classical Test Theory (CTT)?
CTT is test-based and focuses on total test scores, while IRT is item-based.
What advantage does IRT have over CTT regarding item parameters?
IRT derives sample-invariant item parameters using mathematical techniques and a large sample, so parameter estimates do not depend on the particular group of examinees tested.
What is a computerized adaptive test?
A test that tailors items to each examinee by presenting items appropriate for their level of the trait.
What is another name for Item Response Theory?
Latent trait theory.
What does the item characteristic curve (ICC) represent?
The relationship between each item and the latent trait measured by the test.
What are the two axes of the ICC graph?
Levels of the latent trait, often estimated from total test scores (horizontal/x-axis), and probabilities of endorsing or answering the item correctly (vertical/y-axis).
What does the difficulty parameter in IRT indicate?
The level of the trait required for a 50% probability of endorsing or answering the item correctly.
What does the discrimination parameter in IRT indicate?
How well the item can discriminate between individuals with high and low levels of the trait.
What does the slope of the ICC indicate?
The steeper the slope, the better the discrimination of the item.
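A sketch of a two-parameter logistic ICC; the parameter values are illustrative:

```python
import numpy as np

def icc_2pl(theta: np.ndarray, a: float, b: float) -> np.ndarray:
    """Two-parameter logistic model: probability of endorsing or
    answering an item correctly as a function of trait level theta.
    a = discrimination (slope of the curve), b = difficulty
    (the trait level at which the probability is exactly .50)."""
    return 1 / (1 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
# A discriminating item (a = 1.5) of moderate difficulty (b = 0):
print(icc_2pl(theta, a=1.5, b=0.0).round(2))
# At theta = b, the probability is exactly .50:
print(icc_2pl(np.array([0.0]), a=1.5, b=0.0))  # [0.5]
```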