Test Construction: Reliability Flashcards

1
Q

Reliability Coefficient:

Range and Reliability

A

Range: 0.0 to +1.0

Acceptable Test Reliability: r = .80 or higher

2
Q

Reliability is also known as…

A

Consistency, Precision

*a reliable test yields consistent, dependable results

3
Q

Test-Retest Reliability: Overview

A

Same group
Same exam repeated

r = coefficient of stability
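*Not part of the original card: a minimal Python sketch (made-up scores) showing that the test-retest coefficient is just the Pearson r between the two administrations:

```python
import numpy as np

# Hypothetical scores for the same 5 examinees, tested twice
first = np.array([85, 90, 78, 92, 70])
retest = np.array([88, 87, 80, 95, 72])

# Coefficient of stability = Pearson r between the two administrations
r = np.corrcoef(first, retest)[0, 1]
print(round(r, 2))  # ~0.96 for these made-up numbers
```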

4
Q

Test-Retest Reliability: Source of error

A

Time Sampling Factors

5
Q

Alternate Forms Reliability

aka Equivalent Forms/Parallel Forms

A

Same group

Compare scores on alternate/equivalent form of exam

r = coefficient of equivalence

6
Q

Alternate Forms Reliability: Sources of Error

A

Content Sampling

Time Sampling

7
Q

Test-retest reliability and alternate forms reliability should NOT be used when there is a large probability of these 2 types of errors

A

Time Sampling

Practice Effects

8
Q

Internal Consistency Reliability: Two Methods

A

Split-Half Reliability

Cronbach’s Alpha

9
Q

Internal Consistency Reliability: Split-Half Reliability

A

One test
One group

Test is split into equal halves so that each examinee has two scores

r = correlation between two halves/scores
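*Not on the original card: a minimal Python sketch of an odd-even split with hypothetical 0/1 item data, including the Spearman-Brown correction covered on the next cards:

```python
import numpy as np

# Hypothetical item responses: rows = examinees, columns = items
scores = np.array([
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1],
])

# Each examinee gets two scores: odd items vs. even items
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)

# r = correlation between the two half-test scores
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction estimates full-length reliability
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))
```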

10
Q

Internal Consistency Reliability: Split-Half reliability underestimation

A

Splitting yields two half-length tests; shorter tests = underestimation of reliability (smaller sample of items)

*Corrected with Spearman-Brown prophecy formula

11
Q

Internal Consistency Reliability: Spearman-Brown formula

A

r = split-half reliability if it were based on full test

“Long Spear cuts you in half”
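*The formula itself isn't on the card; the standard correction for a half-length split is

\[
r_{\text{full}} = \frac{2\,r_{\text{half}}}{1 + r_{\text{half}}}
\]

e.g., r_half = .70 corrects to 1.40/1.70 ≈ .82 for the full-length test.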

12
Q

Internal Consistency Reliability: Cronbach’s Alpha

A

Like split-half,
One test
One group

Looks at all items for inter-item consistency
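*Not on the original card: a minimal Python sketch of the standard alpha computation (made-up data):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Standard Cronbach's alpha; rows = examinees, columns = items."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Tiny hypothetical 4-examinee x 3-item matrix
data = np.array([[3, 4, 3], [2, 2, 1], [5, 4, 5], [1, 2, 2]])
print(round(cronbach_alpha(data), 2))  # ~0.93 for this made-up data
```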

13
Q

Internal Consistency Reliability:

Cronbach’s Alpha: characteristics

A

Conservative

Considered the lower boundary of the test’s reliability

14
Q

Internal Consistency Reliability: Kuder-Richardson Formula 20 (KR-20)

A

Cronbach's Alpha variation for dichotomously scored items (yes/no, right/wrong)

“KRonbach 2.0”
two point oh = 2 for another version of Cronbach and 2 for dichotomous
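*For reference (not on the card), the standard KR-20 formula is

\[
KR\text{-}20 = \frac{k}{k - 1}\left(1 - \frac{\sum p_i q_i}{\sigma_X^2}\right)
\]

where k = number of items, p_i = proportion passing item i, q_i = 1 − p_i, and σ_X² = variance of total scores.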

15
Q

Internal consistency reliability:

Sources of error

A

Content sampling

Heterogeneity of content

16
Q

Internal Consistency Reliability:

Most Effective Application

A

When test measures single characteristic

When characteristic fluctuates over time

When practice effects are likely

17
Q

Internal consistency reliability: when not to use

A

Speed test

*fixed time prevents completion of all items

18
Q

Best Reliability Approach for Speed Test

A

Alternate Forms Reliability

19
Q

Inter-Rater Reliability: Three Statistics

a.k.a. Inter-Scorer, Inter-Observer

A

Used when scores depend on a rater’s judgment

Three statistics:

  • Correlation coefficient
  • kappa (κ) statistic
  • coefficient of concordance (Kendall's W)

*Percent agreement is also used

20
Q

Inter-Rater Reliability: When to Use Kappa Statistic (k)

A

Nominal or Ordinal scales

*Mnemonic: kNOw the other raters
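*For reference (not on the card), Cohen's kappa corrects observed agreement for chance agreement:

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]

e.g., observed agreement p_o = .80 with chance agreement p_e = .50 gives κ = .30/.50 = .60.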

21
Q

Inter-Rater Reliability: Coefficient of Concordance

aka Kendall’s Coefficient

A

3 or more raters
Ranked Ratings

“Everyone ranked Kendall Jenner Pepsi Ad as horrible”
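*For reference (not on the card), the standard form of Kendall's W is

\[
W = \frac{12\,S}{m^2\,(n^3 - n)}
\]

where m = number of raters, n = number of objects ranked, and S = sum of squared deviations of each object's rank total from the mean rank total.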

22
Q

Inter-Rater Reliability: Sources of Error

A
  • motivation
  • biases
  • characteristics of measuring device
  • observer drift
23
Q

Inter-rater reliability: Observer Drift

A

Error introduced when raters who work together influence each other's ratings, making them more similar

24
Q

Factors that affect Reliability: Test Length

A

Shorter test = less data and more error
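*Not on the card: the general Spearman-Brown formula quantifies this; lengthening a test by a factor of n predicts

\[
r_{\text{new}} = \frac{n\,r}{1 + (n - 1)\,r}
\]

e.g., doubling (n = 2) a test with r = .60 predicts 2(.60)/1.60 = .75.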

25
Q

Factors that affect Reliability: How to Increase range

A

heterogeneous examinees

varied degree of difficulty of items

(r is maximized when range is unrestricted)

26
Q

Lowest to Highest Test Reliability

A

True/false

Multiple-choice

Free recall (least amount of guessing)

27
Q

Interpreting Reliability: the reliability coefficient

A

Proportion of variability that is attributable to true score variability

e.g., r = .84:
84% of score variability = true score differences
16% = measurement error

28
Q

Interpreting Reliability: Standard Error of Measurement

A

Similar to standard deviation

Useful to interpret an individual examinee’s test score

SEM factors in reliability to create a Confidence interval
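*The formula isn't on the card; the standard SEM combines the test's SD and reliability:

\[
SEM = SD\,\sqrt{1 - r_{xx}}
\]

e.g., SD = 15 and r_xx = .91 give SEM = 15 × √.09 = 4.5.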

29
Q

Standard Error and Confidence Intervals

A

Add the SEM to and subtract it from the test score to create a CI
[Same percentages as SD and same math as z-scores]

E.g., test score = 100, SEM = 10:
68% CI = ±1 SEM [90-110]
95% CI = ±2 SEM [80-120]
99% CI = ±3 SEM [70-130]

30
Q

Standard Error of the Difference

A

Compares ONE examinee’s performance on:

two different tests

(or same test at different times)
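*The formula isn't on the card; the standard error of the difference is typically built from the two tests' SEMs:

\[
SE_{\text{diff}} = \sqrt{SEM_1^2 + SEM_2^2}
\]

A difference larger than about 2 × SE_diff is unlikely (at roughly 95% confidence) to be due to measurement error alone.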

31
Q

Factor that decreases Cronbach’s Alpha

A

Heterogeneity of content

32
Q

Error: Content Sampling

A

Interaction between:

Examinee's knowledge and the particular items sampled (e.g., different content on an alternate form)

33
Q

Error: Time Sampling Factors

A

Random factors related to passage of time

E.g.

  • Difference in examinees’ anxiety, motivation, etc.
  • Random variations in testing environment
34
Q

Error: Carryover Effects

A

When scores are affected by multiple tests:

order of test administration (order effects)

repeated measurement (practice effects)