Chapter 4: Reliability Flashcards
Reliability
Consistency or stability of test scores
Factors that impact reliability
When the test is administered
Items selected for inclusion
External distractions (e.g., noise)
Internal distractions (e.g., fatigue)
Person administering the test
Person scoring the test
Two components of score
True score (representative of true knowledge or ability)
Error score
Systematic error
Error that affects scores in a consistent direction rather than randomly (e.g., an examinee receiving a different set of instructions for the test)
Classical test theory equation
Xi = T + E
Xi = obtained (observed) score
T = true score
E = error
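A minimal simulation of this decomposition (Python/NumPy; the means and variances are made-up numbers for illustration). Because T and E are independent, the variance of the obtained scores is approximately the sum of the true and error variances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate Xi = T + E for 10,000 hypothetical examinees.
true_scores = rng.normal(loc=50, scale=9, size=10_000)  # T
errors = rng.normal(loc=0, scale=3, size=10_000)        # E: random, mean zero
obtained = true_scores + errors                         # Xi

# With T and E independent, var(X) ~= var(T) + var(E).
print(round(obtained.var(), 1), round(true_scores.var() + errors.var(), 1))
```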
What measurement error reduces
Usefulness of measurement
Generalizability of test results
Confidence in test results
Content sampling error
Difference between sample of items on test and total domain of items
How good sampling affects error
Reduces it
Largest source of measurement error
Content sampling error
Time sampling error
Random fluctuations in performance over time
Can be due to examinee (fatigue, illness, anxiety, maturation) or due to environment (distractions, temperature)
Inter-rater differences
When scoring is subjective, different scorers may score answers differently
Clerical errors
Adding up points incorrectly
Reliability (mathematical definition)
Symbol: rxx
Ratio of true score variance to total score variance (number from 0 to 1, where 0 is total error and 1 is no error)
Reliability equation
rxx = σ²T / σ²X
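Continuing the simulation above, the ratio can be computed directly (a sketch with assumed variances: σT = 9 and σE = 3, so the expected rxx is 81 / 90 = 0.90):

```python
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(50, 9, 10_000)            # sigma_T = 9
obtained = true_scores + rng.normal(0, 3, 10_000)  # sigma_E = 3

rxx = true_scores.var() / obtained.var()  # sigma^2_T / sigma^2_X
print(round(rxx, 2))  # ~0.90
```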
Reliability’s relation to error
The greater the reliability, the smaller the error
What reliability coefficients mean
An rxx of 0.90 means 90% of the observed score variance is attributable to true score variance
Test-retest reliability
Administer the same test on 2 occasions
Correlate the scores from both administrations
Sensitive to time sampling error
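A sketch of the procedure on simulated data (hypothetical values): each examinee keeps the same true score across the two administrations but gets a fresh random error each time, and the Pearson correlation between the two sets of scores estimates rxx:

```python
import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(50, 9, 10_000)

# Two administrations: same true scores, independent error each time.
time1 = true_scores + rng.normal(0, 3, 10_000)
time2 = true_scores + rng.normal(0, 3, 10_000)

# Correlation between administrations estimates reliability (~0.90 here).
print(round(np.corrcoef(time1, time2)[0, 1], 2))
```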
Things to consider surrounding test-retest reliability
Length of interval between testing
Activities during interval (distraction or not)
Carry-over effects from one administration to the next
Alternate-form reliability
Develop two parallel forms of test
Administer both forms (simultaneously or delayed)
Correlate the scores of the different forms
Sensitive to content sampling error (simultaneous and delayed) and time sampling error (delayed only)
Things to consider surrounding alternate-form reliability
Few tests have alternate forms
Carry-over effects are reduced compared with test-retest
Split-half reliability
Administer the test
Divide it into 2 equivalent halves
Correlate the scores for the half tests
Sensitive to content sampling error
Things to consider surrounding split-half reliability
Only 1 administration (no time sampling error)
How to split the test into halves (e.g., odd vs. even items)
Short tests are less reliable, and each half is only half the full test's length, so the Spearman-Brown formula is used to correct the half-test correlation to full length (see the sketch below)
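A sketch of an odd-even split on made-up item responses (0/1 scoring, hypothetical data), with the standard Spearman-Brown correction applied to the half-test correlation:

```python
import numpy as np

# Hypothetical 0/1 item responses: 6 examinees x 8 items.
items = np.array([
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)   # score on items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)  # score on items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown: estimate full-length reliability from the half-test r.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))
```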
Kuder-Richardson and coefficient (Cronbach’s) alpha
Administer test
Compare each item to all other items
Use KR-20 for dichotomous (right/wrong) items and Cronbach's alpha for any scoring format
Sensitive to content sampling error and item heterogeneity
Measures internal consistency
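A small implementation of coefficient alpha from its standard formula, α = (k / (k − 1)) × (1 − Σσ²item / σ²total); for 0/1 items this is equivalent to KR-20 (the response data are made up):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an examinee-by-item score matrix.
    For dichotomous (0/1) items this equals KR-20."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical 0/1 responses: 6 examinees x 4 items.
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
])
print(round(cronbach_alpha(responses), 2))
```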
Inter-rater reliability
Administer test
2 individuals score test
Calculate agreement between scores
Sensitive to differences between raters
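Agreement can be summarized in several ways; a minimal sketch (hypothetical essay scores) computing percent exact agreement and the correlation between two raters. Cohen's kappa, which corrects agreement for chance, is a common alternative:

```python
import numpy as np

# Hypothetical scores two raters assigned the same 10 essays (0-4 scale).
rater_a = np.array([3, 2, 4, 1, 3, 0, 2, 4, 3, 1])
rater_b = np.array([3, 2, 3, 1, 3, 1, 2, 4, 4, 1])

agreement = (rater_a == rater_b).mean()  # percent exact agreement (0.7 here)
r = np.corrcoef(rater_a, rater_b)[0, 1]  # inter-rater correlation
print(agreement, round(r, 2))
```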
Composite scores
Several scores combined into a single overall score
Reliability of these is usually higher than that of the individual component scores
Difference scores
Calculated difference between 2 scores
Reliability of these is usually lower than that of the individual scores (the true score variance shared by the two measures cancels out of the difference, while both error components remain)
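A worked example using the standard textbook formula for the reliability of a difference score, r_dd = (½(r_xx + r_yy) − r_xy) / (1 − r_xy), which assumes the two scores have equal standard deviations (the numbers are made up):

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference X - Y, assuming equal standard deviations."""
    return (0.5 * (r_xx + r_yy) - r_xy) / (1 - r_xy)

# Two tests that are each quite reliable (0.90) but correlate 0.70:
print(round(difference_score_reliability(0.90, 0.90, 0.70), 2))  # ~0.67
```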
Choosing a reliability test to use
Multiple administrations: test-retest reliability
One administration: coefficient alpha for homogeneous content; split-half coefficient for heterogeneous content
Factors to consider when evaluating reliability coefficients
Construct being measured
Time available for testing
How the scores will be used
Method of estimating reliability
High-stakes decision tests: reliability coefficient used
Greater than 0.9 or 0.95
General clinical use: reliability coefficient used
Greater than 0.8
Class tests and screening tests: reliability coefficient used
Greater than 0.7
How to improve reliability
Increase the number of test items (see the Spearman-Brown sketch below)
Use composite scores
Develop better items
Standardize administration
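The effect of adding items can be estimated with the Spearman-Brown prophecy formula, r_new = (n × r) / (1 + (n − 1) × r); a sketch with made-up values, including the inverted form that gives the lengthening factor needed to reach a target reliability:

```python
def spearman_brown(r_current: float, n: float) -> float:
    """Predicted reliability after lengthening a test by factor n."""
    return (n * r_current) / (1 + (n - 1) * r_current)

def length_factor_needed(r_current: float, r_target: float) -> float:
    """Lengthening factor required to reach a target reliability."""
    return (r_target * (1 - r_current)) / (r_current * (1 - r_target))

print(round(spearman_brown(0.75, 2), 2))           # doubling: ~0.86
print(round(length_factor_needed(0.75, 0.90), 1))  # need ~3x the items
```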
Standard error of measurement (SEM)
The standard deviation of the scores that would result if the same test were administered to the same individual an infinite number of times (SEM = SD × √(1 − rxx))
Useful when interpreting test scores
When reliability increases, this decreases
How to calculate confidence intervals
Obtained score ± z × SEM (e.g., X ± 1.96 × SEM for 95% confidence; the SEM itself comes from the SD and reliability)
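A sketch using the standard formula SEM = SD × √(1 − rxx) and a normal-theory 95% interval of X ± 1.96 × SEM (the SD and reliability values are made up):

```python
import math

sd, rxx = 15.0, 0.91           # hypothetical IQ-style scale
sem = sd * math.sqrt(1 - rxx)  # = 4.5

x = 100                        # obtained score
low, high = x - 1.96 * sem, x + 1.96 * sem
print(round(sem, 1), (round(low, 1), round(high, 1)))  # 4.5 (91.2, 108.8)
```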
Relationship between reliability and confidence interval
Reliability increases, confidence interval decreases
Information that test manuals and researchers should report
Internal consistency
Test-retest
Standard error of measurement (SEM)
Information on confidence intervals
Generalizability theory
Shows how much variance is associated with different sources of error