Test Construction: Reliability Flashcards
Reliability Coefficient:
Range and Reliability
Range: 0.0 to +1.0
Acceptable Test Reliability: r = .80 or higher
Reliability is also known as…
Consistency, Precision
*a reliable test yields consistent, dependable results
Test-Retest Reliability: Overview
Same group
Same exam repeated
r = coefficient of stability/consistency
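A minimal sketch of computing the coefficient of stability, assuming Python with NumPy; the score lists are hypothetical:

    import numpy as np
    time1 = [82, 75, 91, 68, 88]   # first administration (hypothetical scores)
    time2 = [85, 72, 93, 70, 84]   # retest of the same examinees
    r_stability = np.corrcoef(time1, time2)[0, 1]   # coefficient of stability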
Test-Retest Reliability: Source of error
Time Sampling Factors
Alternate Forms Reliability
aka Equivalent Forms/Parallel Forms
Same group
Compare scores on alternate/equivalent form of exam
r = coefficient of equivalence
Alternate Forms Reliability: Sources of Error
Content Sampling
Time Sampling
Test-retest reliability and alternate forms reliability should NOT be used when there is a large probability of these 2 types of error:
Time Sampling
Practice Effects
Internal Consistency Reliability: Two Methods
Split-Half Reliability
Cronbach’s Alpha
Internal Consistency Reliability: Split-Half Reliability
One test
One group
Test is split into equal halves so that each examinee has two scores
r = correlation between two halves/scores
Internal Consistency Reliability: Split-Half reliability underestimation
Splitting the test shortens it, and shorter tests underestimate reliability (fewer items per half)
*Corrected with Spearman-Brown prophecy formula
Internal Consistency Reliability: Spearman-Brown formula
r = estimate of the reliability the full-length test would have
“Long Spear cuts you in half”
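The prophecy formula itself: corrected r = n*r / (1 + (n - 1)*r), where n is the factor by which test length changes (n = 2 for a split-half correction). A short Python sketch; the r_half value is hypothetical:

    def spearman_brown(r_half, n=2):
        # Estimate full-length reliability from the half-test correlation
        return n * r_half / (1 + (n - 1) * r_half)

    spearman_brown(0.70)   # r_half = .70 -> corrected r of about .82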
Internal Consistency Reliability: Cronbach’s Alpha
Like split-half,
One test
One group
Looks at all items for inter-item consistency
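The usual formula is alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). A minimal Python/NumPy sketch, assuming a hypothetical examinee-by-item score matrix:

    import numpy as np

    def cronbach_alpha(items):
        # items: 2-D array, rows = examinees, columns = test items
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
        total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
        return (k / (k - 1)) * (1 - item_vars / total_var)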
Internal Consistency Reliability:
Cronbach’s Alpha: characteristics
Conservative
Considered the lower boundary of the test’s reliability
Internal Consistency Reliability: Kuder-Richardson Formula 20 (KR-20)
Variation of Cronbach's alpha for dichotomously scored items (e.g., yes/no, right/wrong)
“KRonbach 2.0”
two point oh = 2 for another version of Cronbach and 2 for dichotomous
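KR-20 is the same formula with each item's variance computed as p*q (p = proportion passing the item, q = 1 - p). A minimal sketch for a hypothetical 0/1 score matrix:

    import numpy as np

    def kr20(items):
        # items: 2-D array of 0/1 scores, rows = examinees, columns = items
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        p = items.mean(axis=0)                        # proportion passing each item
        pq = (p * (1 - p)).sum()                      # sum of p*q across items
        total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
        return (k / (k - 1)) * (1 - pq / total_var)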
Internal consistency reliability:
Sources of error
Content sampling
Heterogeneity of content
Internal Consistency Reliability:
Most Effective Application
When test measures single characteristic
When characteristic fluctuates over time
When practice effects are likely
Internal consistency reliability: when not to use
Speed test
*fixed time prevents completion of all items
Best Reliability Approach for Speed Test
Alternate Forms Reliability
Inter-Rater Reliability: Two Methods, three statistics
a.k.a. Inter-Scorer, Inter-Observer
Used when scores depend on a rater’s judgment
Method 1 - a statistical index (three statistics):
- Correlation Coefficient
- kappa (k) statistic
- coefficient of concordance (Kendall)
Method 2 - Percent Agreement
Inter-Rater Reliability: When to Use Kappa Statistic (k)
Nominal or Ordinal scales
*Mnemonic: kNOw the other raters
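Kappa corrects percent agreement for chance: kappa = (p_observed - p_chance) / (1 - p_chance). A numeric sketch with hypothetical agreement rates:

    p_observed = 0.80   # raters agreed on 80% of cases
    p_chance   = 0.50   # agreement expected by chance alone
    kappa = (p_observed - p_chance) / (1 - p_chance)   # -> 0.60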
Inter-Rater Reliability: Coefficient of Concordance
aka Kendall’s Coefficient
3 or more raters
Ranked Ratings
“Everyone ranked Kendall Jenner Pepsi Ad as horrible”
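A sketch using the common no-ties formula W = 12*S / (m^2 * (n^3 - n)), where S is the sum of squared deviations of the items' rank sums; the ranking matrix is hypothetical:

    import numpy as np

    def kendalls_w(ranks):
        # ranks: 2-D array, rows = raters, columns = items, entries = ranks assigned
        ranks = np.asarray(ranks, dtype=float)
        m, n = ranks.shape                    # m raters, n ranked items
        rank_sums = ranks.sum(axis=0)         # total rank each item received
        s = ((rank_sums - rank_sums.mean()) ** 2).sum()
        return 12 * s / (m ** 2 * (n ** 3 - n))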
Inter-Rater Reliability: Sources of Error
- motivation
- biases
- characteristics of measuring device
- observer drift
Inter-rater reliability: Observer Drift
Error introduced when raters who work together influence each other's ratings, making them more similar over time
Factors that affect Reliability: Test Length
Shorter test = less data and more measurement error (longer tests are more reliable)
Factors that affect Reliability: How to Increase range
heterogeneous examinees
varied degree of difficulty of items
(r is maximized when range is unrestricted)
Lowest to Highest Test Reliability
True/false
Multiple-choice
Free recall (least amount of guessing)
Interpreting Reliability: the reliability coefficient
Proportion of observed score variability attributable to true score variability
e.g. r = .84
84% = variability due to true score differences
16% = measurement error
Interpreting Reliability: Standard Error of Measurement
Similar to a standard deviation (it is the standard deviation of measurement error)
Useful for interpreting an individual examinee's test score
SEM factors in reliability to create a confidence interval around the obtained score
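The usual formula is SEM = SD * sqrt(1 - r), where SD is the test's standard deviation and r its reliability coefficient. A hypothetical sketch that happens to yield the SEM of 10 used in the next card:

    # SEM = SD * sqrt(1 - reliability)  (SD and r are hypothetical)
    sem = 20 * (1 - 0.75) ** 0.5   # -> 10.0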
Standard Error and Confidence Intervals
Add and subtract the SEM to/from the test score to create the CI
[Same percentages as SD units under the normal curve, and the same math as a z-score]
E.g. Test score = 100, SEM = 10:
68% CI = +1 & - 1 [90-110]
95% CI = +2 & - 2 [80-120]
99.7% CI = +3 & - 3 [70-130]
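The same arithmetic as a tiny sketch, using the score and SEM from the example above:

    score, sem = 100, 10
    ci_68 = (score - sem, score + sem)             # (90, 110)
    ci_95 = (score - 2 * sem, score + 2 * sem)     # (80, 120)
    ci_99_7 = (score - 3 * sem, score + 3 * sem)   # (70, 130)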
Standard Error of the Difference
Compares ONE examinee’s performance on:
two different tests
(or same test at different times)
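A common formula combines the two tests' SEMs: SE_diff = sqrt(SEM1^2 + SEM2^2). A hypothetical sketch:

    # Standard error of the difference between two obtained scores (hypothetical SEMs)
    sem1, sem2 = 10, 8
    se_diff = (sem1 ** 2 + sem2 ** 2) ** 0.5   # about 12.8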
Factor that decreases Cronbach’s Alpha
Heterogeneity of content
Error: Content Sampling
Interaction between:
the examinee's knowledge and the specific content sampled by the items (e.g., different content on an alternate form)
Error: Time Sampling Factors
Random factors related to passage of time
E.g.
- Difference in examinees’ anxiety, motivation, etc.
- Random variations in testing environment
Error: Carryover Effects
When scores on one test are affected by earlier testing:
order of test administration (order effects)
repeated measurement (practice effects)