Ch. 5 - Reliability Flashcards

1
Q

reliability

A

consistency in measurement (not good or bad, right or wrong, just consistent); the proportion of the total variance attributed to true variance

2
Q

reliability coefficient

A

a proportion that indicates the ratio between the true score variance on a test and the total variance
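In symbols, this is the standard classical test theory ratio:

$$r_{xx} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{total}}}$$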

3
Q

concept of reliability - equation

A

Observed Score = True Score + Error
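In the usual shorthand, with the variance decomposition it implies:

$$X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E$$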

4
Q

we use ____ to describe test score variability / reliability

A

variance

5
Q

the proportion of the total variance attributed to true variance is

A

reliability

6
Q

the greater the reliability…

A

the more true variance you are capturing relative to “noise” (error variance)

7
Q

measurement error

A

all of the factors associated with the process of measuring some variable, other than the variable being measured

8
Q

error variance

A

variance from irrelevant, random sources

9
Q

sources of error variance

A
test construction (the content sampled, the way items are worded)
test administration (environment: lighting, temperature; testtaker variables: illness, bad mood; examiner-related variables: "giving away" answers with tone of voice)
10
Q

more sources of error variance

A

test scoring (computer glitches or errors in hand-scoring); testtakers may over- or under-report
sampling error (e.g., only contacting voters with landlines)

11
Q

test-retest reliability

A

a method of estimating reliability, obtained by correlating pairs of scores from the same people on two different administrations of the same test; use when measuring something that's stable over time (a trait)

12
Q

as the time between test administrations increases, the correlation usually…

A

decreases

13
Q

coefficient of stability

A

the estimate of test-retest reliability, when the interval between testing is greater than six months

14
Q

coefficient of equivalence

A

the degree of the relationship between various forms of a test

15
Q

parallel forms (reliability)

A

for each form of the test, the means and variances of observed test scores are equal

16
Q

alternate forms (reliability)

A

these don't necessarily meet the requirements of parallel forms (same means and variances) but are equivalent in terms of content, level of difficulty, etc.

17
Q

parallel or alternate forms reliability

A

the extent to which item sampling and other errors have affected test scores on versions of the same test

18
Q

how do you obtain parallel or alternate forms reliability estimates?

A

administer the two forms to the same group of testtakers (like test-retest, but no need to wait between administrations)
same problems: scores affected by item sampling, testtaker variables, etc.
time-consuming and expensive

19
Q

estimate of inter-item consistency

A

degree of correlation among all items on a scale

20
Q

how do you do a split-half reliability estimate?

A

(1) divide test into equivalent halves
(2) find Pearson r between the scores on each half
(3) adjust the half-test reliability with Spearman-Brown formula
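A minimal sketch of these three steps in Python (the function name and the numpy-based item matrix are hypothetical illustrations, not from the chapter):

```python
import numpy as np

def split_half_reliability(item_scores):
    """Estimate whole-test reliability from an odd-even split.

    item_scores: 2-D array, rows = testtakers, columns = items.
    """
    # (1) divide the test into equivalent halves (odd-even by item position)
    odd_half = item_scores[:, 0::2].sum(axis=1)
    even_half = item_scores[:, 1::2].sum(axis=1)

    # (2) Pearson r between scores on the two halves
    r_hh = np.corrcoef(odd_half, even_half)[0, 1]

    # (3) Spearman-Brown adjustment: project half-test r to full length
    return (2 * r_hh) / (1 + r_hh)
```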

21
Q

what is a split-half reliability estimate?

A

a reliability estimate obtained by evaluating the internal consistency of the test (no need for two forms or for time to elapse).

22
Q

how should you split the test for a split-half reliability estimate?

A

not down the middle
randomly assign items
split odd-even
divide by content and difficulty

i.e. make mini parallel forms!

23
Q

Spearman-Brown Adjustment

A

estimates the reliability of a whole test from the reliability of a shortened version (e.g., one half)
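Its general form (r = the correlation between the halves; n = the factor by which the test is lengthened, so n = 2 when projecting from a half test to the whole test):

$$r_{SB} = \frac{n\,r}{1 + (n - 1)\,r}$$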

24
Q

don’t use split-half reliability with what kind of test?

A

heterogeneous (measures more than one trait)

25
Q

reliability usually increases as…

A

test length increases

26
Q

alternatives to the Spearman-Brown reliability estimate (for split-half)

A

Kuder-Richardson (KR-20; for tests with dichotomous items)
Average Proportional Distance
Cronbach's alpha - “mean of all possible split-half correlations” (see formula below)
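For reference, the textbook formula for coefficient alpha (k = number of items; the summed item variances are compared with the variance of total scores):

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)$$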

27
Q

reliability coefficients range from

A

0 to 1; it's possible to get a negative value, but that usually indicates a mistake in data entry

28
Q

measures of reliability are subject to

A

error; they are estimates

29
Q

a reliability coefficient may not be acceptable if

A

it was obtained with the same test but on a very different set of testtakers

30
Q

what’s a good reliability?

A

like grades! .90 is an A, .80 is a B

31
Q

if reliability is really high on a split-half estimate, what is likely the cause?

A

redundancy in test items

32
Q

the more homogeneous a test is…

A

the more inter-item consistency it can be expected to have (duh)

33
Q

split-half reliability, odd-even, Spearman-Brown formula, Kuder-Richardson (KR-20), alpha, and Average Proportional Distance are all methods of evaluating…

A

the internal consistency of a test

34
Q

inter-scorer reliability

A

the degree of agreement or consistency between two or more scorers/judges/raters

35
Q

if inter-scorer reliability is high,…

A

test scores can be derived in a systematic, consistent way by trained scorers

36
Q

what are the three approaches for estimating reliability?

A

test-retest, alternate or parallel forms, internal or inter-item consistency

37
Q

what aspects of the nature of a test might influence reliability? (5)

A
homogeneous vs heterogeneous test
dynamic vs static characteristics
restriction or inflation of range
speed vs power test
criterion-referenced vs norm-referenced tests
38
Q

heterogeneous vs homogeneous test

A

measures different factors; measures one factor/trait

39
Q

traditional ways of estimating reliability are often not appropriate for what kind of test?

A

criterion-referenced

40
Q

what kind of reliability estimate is best for a heterogeneous test?

A

test-retest (not inter-item consistency, because that will be low)

41
Q

what kind of reliability estimate is best for a measurement of dynamic characteristics?

A

inter-item consistency (not test-retest)

42
Q

power test

A

has a long time limit, but some items are so hard that no testtaker will get a perfect score

43
Q

speed test

A

must be completed in a certain amount of time; items are easy, but it's tough to get them all done (e.g., a typing test)

44
Q

classical test theory believes that…

A

everyone has a “true score” on a test; that score is very test-dependent, though

45
Q

what are alternatives to classical test theory?

A

domain sampling theory
generalizability theory
Item Response Theory (IRT)

46
Q

domain sampling theory

A

a test's reliability is an objective measure of how precisely the test measures its “domain” (e.g., a domain of behavior); takes issue with the true score + error = observed score model

47
Q

generalizability theory

A

a person's test scores vary from testing to testing because of variables in the testing situation; takes issue with the true score + error = observed score model

48
Q

Item Response Theory (IRT)

A

hundreds of varieties; models items as varying in many different ways, including difficulty and discrimination

49
Q

what tells us how much error could be in single test score?

A

Standard Error of Measurement (SEM)

50
Q

Standard Error of Measurement

A

estimates the extent to which an observed score deviates from a “true” score
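The standard formula (σ = the standard deviation of test scores, r_xx = the reliability coefficient):

$$SEM = \sigma\sqrt{1 - r_{xx}}$$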

51
Q

the higher the reliability of a test, the ____ the SEM

A

lower

52
Q

if a person were to take a bunch of equivalent tests, scores would be…

A

normally distributed with their true score at the mean

53
Q

confidence interval

A

the range or band of scores that is likely to contain the true score

54
Q

95% confidence interval - what does it mean?

A

we are 95% confident that the true score is within ±2 standard errors of measurement of the observed score; 95% of this testtaker's scores are expected to fall within this range of the distribution
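A worked example with made-up numbers: observed score = 100, SEM = 5, so

$$100 \pm 2(5) \;\Rightarrow\; 90 \text{ to } 110$$

i.e., we would be about 95% confident that the true score lies between 90 and 110.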

55
Q

score changes from one testing to another might come from a source besides measurement error. what might that be?

A

an actual change (a true difference) in the characteristic being measured; that might be exactly what you're looking for in psychotherapy outcome research

56
Q

standard error of the difference helps you determine

A

whether a difference between two scores is statistically significant (a real difference, not just measurement error)

57
Q

the standard error of the difference will always be ___ compared to the standard error of measurement for a score.

A

larger, because it incorporates the error in both scores.
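In formula form (assuming, as textbooks usually do, that both scores come from tests with the same standard deviation σ; r_1 and r_2 are the two reliability coefficients):

$$\sigma_{\text{diff}} = \sqrt{\sigma^2_{\text{meas}_1} + \sigma^2_{\text{meas}_2}} = \sigma\sqrt{2 - r_1 - r_2}$$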