Chapter 4: Reliability Flashcards

1
Q

Classical Test Theory (CTT): Assumptions (4)

A

(1) Each person has a true score that would be obtained if there were no errors in measurement. Observed test score (X) = True test score (T) + Error (E)
(2) Measurement errors are random
(3) Measurement error is normally distributed
(4) Variance of OBSERVED scores = Variance of true scores + Error variance
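In symbols (standard CTT notation, supplied here as a sketch rather than quoted from the chapter):
X = T + E
Var(X) = Var(T) + Var(E)
E.g., if true-score variance is 80 and error variance is 20, observed-score variance is 100.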

2
Q

Reliable test

A

One we can trust to give us the same score for a person every time it is used.

3
Q

Can a measurement instrument be perfectly reliable?

A

No. No measurement instrument is perfectly reliable

4
Q

A person’s true score def

A

The hypothetical or ideal measure of a person’s attribute we aim to capture with a psychological test.
=> FREE FROM ERROR
Expected score over an INFINITE number of independent administrations of the test

5
Q

Independent administration def

A

Each time the test is taken is UNRELATED to PREVIOUS or FUTURE administrations
-> The person’s performance on one occasion doesn’t influence their performance on another.

6
Q

Mean error of measurement = ____
Errors are _______ with each other
True scores and errors are _______

A

0; UNcorrelated; UNcorrelated

7
Q

Two tests are parallel if: (3)

A

(1) EQUAL observed score MEANS
-> Comes from the assumption that True scores would be the same
(2) EQUAL ERROR VARIANCE
(3) SAME CORRELATIONS with other tests

8
Q

Random error characteristics (3)

A

(1) Random
(2) Cancels itself out
(3) Lowers reliability of the test

9
Q

Systematic error characteristic

A

Occurs when a source of error consistently increases or decreases scores by the same amount
-> DOESN’T LOWER the RELIABILITY of a test, since the test is RELIABLY INACCURATE by the same amount each time

10
Q

Sources of Measurement Error (3)

A

(1) CONTENT Sampling Error
(2) TIME Sampling Error
(3) Other Sources of Error (e.g. observer differences)

11
Q

Content Sampling Error characteristics (3)

A

(1) Results from differences between the SAMPLE of items (i.e., the test) and the DOMAIN of items (i.e., all the possible items)
(2) When test items may not be representative of the domain from which they are drawn
(3) Low when test items are representative of the domain

12
Q

Time Sampling Error characteristics (2)

A

(1) Results from the choice of a particular time to administer the test
(2) Random fluctuations in performance from one situation or time to another

13
Q

Other Sources of Error characteristics (2)

A

(1) Scoring or administrative error
E.g., mistakes in adding up item scores
(2) Tests scored or graded by different scorers

14
Q

Reliability Coefficient

A

Proportion of the variability in OBSERVED test scores accounted for by variability in TRUE scores.
=> Ratio of the variance of the true scores on a test to the variance of the observed scores
=> Measure of the accuracy of a test or measuring instrument obtained by measuring the same individuals twice and computing the correlation of the two sets of measures
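As a formula (standard CTT notation, supplied here for illustration):
r_xx = Var(T) / Var(X) = Var(T) / (Var(T) + Var(E))
E.g., Var(T) = 80 and Var(E) = 20 gives r_xx = 80/100 = .80, i.e., 80% of observed-score variance reflects true scores.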

15
Q

Standard Error of Measurement (SEM) def

A

Indicates the amount of uncertainty/error expected in an individual’s observed test score.
=> Corresponds to the SD of the distribution of scores one would obtain by repeatedly testing a person.
=> SD of the distribution of random errors around the true score!!
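A standard computing formula (assuming the test’s reliability r_xx and the observed-score SD are known):
SEM = SD × √(1 − r_xx)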

16
Q

Standard Error of Measurement (SEM) allows us to quantify the _______.

A

Amount of variation in a person’s observed score that measurement error would most likely cause

17
Q

High Reliability = ___ SEM
Low Reliability = ___ SEM

A

High reliability = Low SEM
Low reliability = High SEM
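Illustrative numbers (my own, assuming an IQ-type scale with SD = 15):
r_xx = .96 → SEM = 15 × √.04 = 3
r_xx = .75 → SEM = 15 × √.25 = 7.5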

18
Q

Confidence Interval (CI) def

A

Confidence interval (CI) is a range of scores that we feel confident will include the true score.
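A common way to build it from the SEM (using normal-curve z values; a sketch, not necessarily the chapter’s exact wording):
68% CI ≈ X ± 1 × SEM
95% CI ≈ X ± 1.96 × SEM
E.g., X = 100 and SEM = 3 → 95% CI ≈ 94 to 106.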

19
Q

CI is used to compare scores to avoid ______________.

A

over-emphasizing differences

20
Q

Reliability of a test can be increased by _______.

A

adding items

21
Q

Spearman-Brown formula def

A

Predicts the effect of lengthening or shortening a test on reliability.
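The usual form of the formula (n = factor by which the test length changes, r = current reliability; notation supplied here):
r_new = (n × r) / (1 + (n − 1) × r)
E.g., doubling (n = 2) a test with r = .60 → (2 × .60) / (1 + .60) = .75.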

22
Q

Test reliability is usually estimated with what methods? (4)

A

(1) Test-retest
(2) Alternate (Parallel) Forms
(3) Internal consistency
(4) Interrater/Raters

23
Q

Test-Retest method

A

TIME SAMPLING.
Administer the same test to the same group of examinees on 2 different occasions.
Correlate the first set of scores with the second.
-> The expected level of reliability should match the construct
-> Higher when the construct being measured is expected to be STABLE than when it is expected to CHANGE

24
Q

Alternate (Parallel) Forms method

A

ITEM SAMPLING.
Evaluates the test across different FORMS of the test.
Construct two similar forms of a test; administer both forms to the same group of examinees within a very short period of time.
-> Correlate the 2 sets of scores
-> The correlation coefficient serves as an index of reliability of either one of the forms.

25
Q

Internal Consistency method

A

We examine how people perform on similar subsets of items.
-> Selected from the same form of the measure.
One Test administration: A single form of a test is administered only once to a group of examinees.
=> How consistently the examinees performed across items or subsets of items on this single test form.

26
Q

Internal Consistency - If scores are consistent across items on the same test form, then (2)

A

(1) Items came from the SAME CONTENT DOMAIN and constructed the same way
(2) Performance would GENERALIZE to other items from the same content domain

27
Q

How High Should INTERNAL CONSISTENCY Coefficients Be? (*do not confuse with other coefficients)

A

Higher for “narrow” constructs
Lower for “broader” constructs
-> Very high may indicate insufficient sampling in the domain
E.g. Medium internal consistency is bad for a narrow construct (panic disorder), but not so bad for a broad construct (Neuroticism)

28
Q

What’s the older approach used to estimate the internal consistency of a test?

A

SPLIT-HALF: Correlate scores on the first half of the items with scores on the second half.
Or correlate scores on the odd-numbered items with scores on the even-numbered items.
=> If the items get progressively more difficult, you might be better advised to use the odd-even system, whereby one subscore is obtained for the odd-numbered items in the test and another for the even-numbered items.
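Because each half is only half as long as the full test, the half-test correlation is typically stepped up with the Spearman-Brown formula for double length (a standard correction, noted here for completeness):
corrected r = (2 × r_half) / (1 + r_half)
E.g., r_half = .70 → 1.40 / 1.70 ≈ .82.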

29
Q

What’s the contemporary approach used to estimate the internal consistency of a test?

A

CRONBACH’S ALPHA = AVERAGE OF ALL POSSIBLE SPLIT-HALF RELIABILITIES
Unaffected by how items are arranged in the test
Contemporary approach to estimate internal consistency.
-> Most general method of finding estimates of reliability through internal consistency.
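The usual formula (k = number of items, σ²_i = variance of item i, σ²_total = variance of total scores; notation supplied here):
α = [k / (k − 1)] × [1 − (Σ σ²_i) / σ²_total]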

30
Q

Kappa formula

A

Interrater Agreement
Proportion of the potential agreement following CORRECTION FOR CHANCE.
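In formula form (p_o = observed proportion of agreement, p_e = agreement expected by chance; standard notation):
κ = (p_o − p_e) / (1 − p_e)
E.g., p_o = .80 and p_e = .50 → κ = .30 / .50 = .60.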

31
Q

Interrater reliability

A

Degree of agreement among independent observers who rate/assess the same phenomenon.
Two or more people rate/score the same tests

32
Q

Domain Sampling Model

A

Model that holds that the TRUE score of a characteristic is obtained when ALL of the ITEMS in the domain are used to capture it.
-> Considers the problems created by using a limited number of items to represent a larger and more complicated construct.
-> Conceptualizes reliability as the ratio of the variance of the observed score on the SHORTER test and the variance of the LONG-run true score.

33
Q

Difficulties with applications of IRT

A

Requires a bank of items that have been systematically evaluated for level of difficulty.
-> Considerable effort must go into test development, and complex computer software is required.

34
Q

Test-Retest Method: Problems

A

CARRYOVER EFFECTS: Occurs when the first testing session influences scores from the second session.

35
Q

When there are carryover effects, the test-retest correlation usually ________ the true reliability.

A

OVERESTIMATES
-> This can happen because the participant REMEMBERS items or patterns from the first test, so their performance on the second test is less independent than it should be.

36
Q

In cases where the changes are ___, carryover effects do NOT harm the reliability.

A

SYSTEMATIC
E.g. When everyone’s score improves exactly 5 points. In this case, no new variability occurs.

37
Q

Important type of carryover effect

A

PRACTICE effects: improvement with practice
-> Because of these problems, the time interval between testing sessions must be selected and evaluated carefully.

38
Q

Because of practice effects, ______________ must be selected and evaluated carefully.

A

the time interval between testing sessions

39
Q

What method provides one of the most rigorous assessments of reliability commonly in use?

A

Parallel Forms Method
-> However: Test developers find it burdensome to develop two forms of the same test, and practical constraints make it difficult to retest the same group of individuals.
-> Many test developers prefer to base their estimate of reliability on a single form of a test.

40
Q

Problems with Split-Half method (2)

A

(1) The two halves may have different variances.
(2) The split-half method also requires that each half be scored separately, possibly creating additional work.

41
Q

What technique avoids the problems of split-half method and how?

A

The Kuder-Richardson technique avoids these problems because it simultaneously considers all possible ways of splitting the items. (Or Cronbach’s alpha)

42
Q

KR20 Formula

A

Simultaneously considers all possible ways of splitting the items. Used when test items are dichotomous (scored right or wrong).
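The usual form (k = number of items, p_i = proportion passing item i, q_i = 1 − p_i, σ²_total = variance of total scores; notation supplied here):
KR20 = [k / (k − 1)] × [1 − (Σ p_i q_i) / σ²_total]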

43
Q

Factor Analysis

A

Factor analysis is one popular method for dealing with the situation in which a test apparently measures several different characteristics.
-> Used to divide the items into subgroups, each internally consistent;
E.g. test that has submeasures for several different traits.

44
Q

Sources of measurement error: (3)

A

(1) Time sampling: The same test given at different points in time may produce different scores, even if given to the same test takers.
(2) Item sampling: The same construct or attribute may be assessed using a wide pool of items.
(3) When different observers record the same behavior: Different judges observing the same event may record different numbers.

45
Q

How do we assess measurement error associated with time sampling?

A

Test-retest method (coeff of stability)

46
Q

How do we assess measurement error associated with item sampling?

A

Parallel forms reliability

47
Q

How do we assess measurement error associated with “diff people judge same behavior”?

A

Adjusted index of agreement such as the kappa statistic.

48
Q

What to Do about Low Reliability? (3)

A

(1) Increase the # of Items
(2) Throw out items that run down the reliability (by running a factor/discriminability analysis)
(3) Estimate what the true correlation would have been (CORRECTION FOR ATTENUATION)
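The standard correction-for-attenuation formula (r_xy = observed correlation between tests x and y; r_xx and r_yy = their reliabilities; notation supplied here):
corrected r = r_xy / √(r_xx × r_yy)
E.g., r_xy = .30, r_xx = .60, r_yy = .70 → .30 / √.42 ≈ .46.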

49
Q

How to interpret the Kappa stat?

A

Kappa = 0 is considered poor -> means the agreement is basically by chance.
Kappa = 1 represents perfect, complete agreement.

50
Q

When random error is HIGH on both tests, the correlation between the scores will be _____ compared to when the random error is ___.

A

lower; small

51
Q

Formulas for diff methods of reliability:
- Test-retest (time sampling error)
- Alternate forms (item sampling error)
- Internal consistency (item sampling)
- Interrater (observer difference)

A

Test-retest = Pearson’s (coeff stability)
Alternate forms = Pearson’s (coeff equivalence)
Internal consistency = Spearman-Brown formula, KR20 formula, or Cronbach’s alpha
Interrater = Kappa formula

52
Q

Analysis for diff methods of reliability (4)

A

(1) Test-retest = Scores on first and second administration correlated
(2) Alternate forms = Scores on both tests are correlated
(3) Internal consistency = Scores on both halves are correlated OR Average correlation on all split halves
(4) Interrater = Scores by both observers are correlated

53
Q

Difference Score def

A

Obtained by subtracting one test score from another
-> often scores on two different attributes

54
Q

Why are difference scores unreliable?

A

Difference scores are unreliable because the random error from both scores is compounded, while the true-score variance shared by the two measures is cancelled out.
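A standard formula for the reliability of a difference score (assuming the two scores have equal variances; r_11 and r_22 = reliabilities of the two tests, r_12 = their correlation; notation supplied here):
r_diff = [ (r_11 + r_22)/2 − r_12 ] / (1 − r_12)
E.g., r_11 = r_22 = .80 and r_12 = .60 → (.80 − .60) / (1 − .60) = .50, well below the reliability of either test alone.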