Test Construction: Reliability Flashcards
Reliability Coefficient: Range and Acceptable Value
Range: 0.0 to +1.0
Acceptable Test Reliability: r = .80 or higher
Reliability is also known as…
Consistency, Precision
*a reliable test yields consistent, dependable results
Test-Retest Reliability: Overview
Same group
Same exam repeated
r = coefficient of stability/consistency
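A minimal sketch (not part of the original flashcards; scores are hypothetical): the coefficient of stability is simply the correlation between two administrations of the same exam to the same group.
    import numpy as np

    time1 = [82, 75, 90, 68, 88]   # hypothetical scores, first administration
    time2 = [80, 78, 92, 70, 85]   # same examinees, same exam, later administration
    r_stability = np.corrcoef(time1, time2)[0, 1]   # test-retest reliability coefficient
    print(round(r_stability, 2))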
Test-Retest Reliability: Source of error
Time Sampling Factors
Alternate Forms Reliability
aka Equivalent Forms/Parallel Forms
Same group
Compare scores on alternate/equivalent form of exam
r = coefficient of equivalence
Alternate Forms Reliability: Sources of Error
Content Sampling
Time Sampling
Test-retest reliability and alternate forms reliability should NOT be used when there is a high likelihood of these 2 types of error:
Time Sampling
Practice Effects
Internal Consistency Reliability: Two Methods
Split-Half Reliability
Cronbach’s Alpha
Internal Consistency Reliability: Split-Half Reliability
One test
One group
Test is split into equal halves so that each examinee has two scores
r = correlation between two halves/scores
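A sketch of an odd-even split, assuming a hypothetical 0/1 examinee-by-item matrix; this is an illustration of the idea, not a prescribed procedure.
    import numpy as np

    items = np.array([[1, 0, 1, 1, 0, 1],
                      [1, 1, 1, 0, 1, 1],
                      [0, 0, 1, 0, 0, 1],
                      [1, 1, 1, 1, 1, 1],
                      [0, 1, 0, 0, 1, 0]])   # hypothetical item responses
    odd_score = items[:, 0::2].sum(axis=1)    # each examinee's score on odd-numbered items
    even_score = items[:, 1::2].sum(axis=1)   # each examinee's score on even-numbered items
    r_half = np.corrcoef(odd_score, even_score)[0, 1]   # uncorrected split-half reliability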
Internal Consistency Reliability: Split-Half Reliability Underestimation
Splitting the test halves the number of items; shorter tests are less reliable, so the half-test correlation underestimates the full test's reliability
*Corrected with the Spearman-Brown prophecy formula
Internal Consistency Reliability: Spearman-Brown formula
r = estimated reliability of the full-length test (corrects the half-test correlation)
“Long Spear cuts you in half”
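A worked sketch of the standard two-half form of the formula, using a hypothetical half-test correlation of .70:
    r_half = 0.70                          # correlation between the two halves (hypothetical)
    r_full = (2 * r_half) / (1 + r_half)   # Spearman-Brown estimate for the full-length test
    print(round(r_full, 2))                # 0.82 -- the correction raises the half-test estimate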
Internal Consistency Reliability: Cronbach’s Alpha
Like split-half:
One test
One group
Looks at all items for inter-item consistency
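A minimal sketch of the usual alpha computation, assuming a hypothetical examinee-by-item score matrix; cronbach_alpha is an illustrative helper, not a library function.
    import numpy as np

    def cronbach_alpha(items):
        # items: examinees x items score matrix (illustrative helper, hypothetical data)
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                          # number of items
        item_vars = items.var(axis=0, ddof=1)       # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)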
Internal Consistency Reliability: Cronbach's Alpha Characteristics
Conservative
Considered the lower boundary of the test’s reliability
Internal Consistency Reliability: Kuder-Richardson Formula 20 (KR-20)
Variation of Cronbach's alpha for dichotomous scores (e.g., yes/no, right/wrong)
“KRonbach 2.0”
2.0 = "2" for a second version of Cronbach's alpha and "2" for dichotomous (two-option) scoring
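A sketch of the textbook KR-20 computation for 0/1 items; kr20 is an illustrative helper, and the examinee-by-item data layout is assumed.
    import numpy as np

    def kr20(items):
        # items: examinees x items matrix of 0/1 scores (illustrative helper)
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                         # number of items
        p = items.mean(axis=0)                     # proportion passing each item
        q = 1 - p                                  # proportion failing each item
        total_var = items.sum(axis=1).var(ddof=0)  # variance of total test scores
        return (k / (k - 1)) * (1 - (p * q).sum() / total_var)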
Internal Consistency Reliability: Sources of Error
Content sampling
Heterogeneity of content
Internal Consistency Reliability: Most Effective Application
When test measures single characteristic
When characteristic fluctuates over time
When practice effects are likely
Internal Consistency Reliability: When Not to Use
Speed test
*fixed time prevents completion of all items
Best Reliability Approach for Speed Test
Alternate Forms Reliability
Inter-Rater Reliability: Two Methods, Three Statistics
a.k.a. Inter-Scorer, Inter-Observer
Used when scores depend on a rater’s judgment
Two methods:
- Correlation coefficient: kappa (k) statistic or coefficient of concordance (Kendall)
- Percent Agreement
Inter-Rater Reliability: When to Use Kappa Statistic (k)
Nominal or Ordinal scales
*Mnemonic: kNOw the other raters
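A worked sketch with hypothetical agreement figures, using the usual chance-corrected form kappa = (p_observed - p_chance) / (1 - p_chance):
    p_observed = 0.80   # proportion of cases on which the raters agreed (hypothetical)
    p_chance = 0.50     # agreement expected by chance alone (hypothetical)
    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(round(kappa, 2))   # 0.6 -- lower than raw agreement once chance is removed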
Inter-Rater Reliability: Coefficient of Concordance
aka Kendall’s Coefficient
3 or more raters
Ranked Ratings
“Everyone ranked Kendall Jenner Pepsi Ad as horrible”
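A sketch of the no-ties form of Kendall's W, assuming a hypothetical raters-by-items matrix of rank values; kendalls_w is an illustrative helper.
    import numpy as np

    def kendalls_w(ranks):
        # ranks: raters x items matrix of rankings, no ties (illustrative helper)
        ranks = np.asarray(ranks, dtype=float)
        m, n = ranks.shape                                # m raters, n items ranked
        rank_sums = ranks.sum(axis=0)                     # total rank given to each item
        s = ((rank_sums - rank_sums.mean()) ** 2).sum()   # spread of the rank totals
        return 12 * s / (m ** 2 * (n ** 3 - n))           # 0 = no agreement, 1 = perfect agreement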
Inter-Rater Reliability: Sources of Error
- motivation
- biases
- characteristics of measuring device
- observer drift
Inter-Rater Reliability: Observer Drift
Error introduced when raters who work together influence each other's ratings, making them increasingly similar over time
Factors that affect Reliability: Test Length
Shorter test = fewer items (less data) and more measurement error; longer tests are generally more reliable (see the sketch below)
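A sketch of the general Spearman-Brown form, which shows the test-length effect directly; the starting reliability and lengthening factor n are hypothetical.
    def lengthened_reliability(r, n):
        # general Spearman-Brown: estimated reliability after changing test length by a factor n
        return (n * r) / (1 + (n - 1) * r)

    print(round(lengthened_reliability(0.70, 2), 2))    # doubling the test: .70 -> 0.82
    print(round(lengthened_reliability(0.70, 0.5), 2))  # halving the test:  .70 -> 0.54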