Chapter 4: Reliability Flashcards
What is reliability?
How consistently you get the same result on repeated measurement (think of darts clustering on a dart board). Reliability = MEASURING CONSISTENCY.
What contributes to measurement error?
Situational
- Test environment
- Test-taker
- Examiner
Test Creation
- Item sampling
- Test scoring: criteria, administrator bias
- Test interpretation: arbitrary, low inter-rater agreement
What components make up Classical Test Score Theory?
Classical test theory assumes that each person has a true score, T, that would be obtained if there were no errors in measurement.
X (observed score) = T (true score) + E (error)
In what ways can error impact the observed score?
Random error can push the observed score above or below the true score; across repeated measurements the errors are assumed to average to zero, so the mean of many observed scores estimates the true score.
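A minimal sketch of X = T + E with made-up numbers: a fixed true score plus random, mean-zero error produces observed scores that scatter around T and average back to it. Variable names are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = 80.0                          # T: the score with no measurement error (made-up)
errors = rng.normal(loc=0.0, scale=5.0,    # E: random error with mean 0 and SD 5 (made-up)
                    size=10_000)
observed = true_score + errors             # X = T + E for many hypothetical re-tests

# Errors average out: the mean of the observed scores estimates the true score,
# and their spread reflects measurement error alone.
print(f"mean observed score: {observed.mean():.2f}")
print(f"SD of observed scores: {observed.std(ddof=1):.2f}")
```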
Test reliability is usually estimated in one of what three ways? Know the major concepts in each way.
- Test-retest: do the scores stay consistent when the same test is administered to the same people again?
- Internal consistency: do the items within a single test all measure the same thing (do they correlate with one another)?
- Parallel forms: two different tests that measure the same thing are correlated (a correlation sketch, which covers both test-retest and parallel forms, follows this list)
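A rough sketch of how test-retest (and, equally, parallel/alternate forms) reliability is usually estimated: correlate the two sets of scores. The scores below are invented for illustration.

```python
import numpy as np

# Made-up scores for 8 test-takers on two administrations (or two parallel forms).
time_1 = np.array([12, 15, 9, 20, 17, 11, 14, 18], dtype=float)
time_2 = np.array([13, 14, 10, 19, 18, 10, 15, 17], dtype=float)

# Test-retest (or parallel-forms) reliability = Pearson correlation between the two score sets.
r = np.corrcoef(time_1, time_2)[0, 1]
print(f"reliability estimate r = {r:.2f}")
```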
What is a carryover effect?
When the test-taker remembers items from a previous administration of the test and may therefore score differently.
When the first session influences the second
Define parallel/alternate forms reliability. What are its advantages and disadvantages?
Parallel forms: two different versions of a test constructed to measure the same thing; scores on the two forms are correlated.
Disadvantage: extremely hard to construct truly equivalent forms
Advantages: reduces memory/carryover bias, allows a test to be re-administered, decreases cheating
Define split half reliability. How is this measured?
- Give the test once and divide it into two separately scored halves
- Correlate the halves to estimate reliability
- Advantage: only requires one administration
- Disadvantage: deflated reliability, because each half is only half as long as the full test (corrected with the Spearman-Brown formula)
- Disadvantage: difficult to split the test into equivalent halves
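A sketch of the split-half procedure assuming an odd/even item split, with the Spearman-Brown correction applied to undo the deflation caused by correlating half-length tests. The item responses are invented.

```python
import numpy as np

# Made-up item scores: rows = 6 test-takers, columns = 8 items (1 = correct, 0 = incorrect).
items = np.array([
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 0],
])

# Split into halves (odd vs. even items) and score each half per person.
half_a = items[:, 0::2].sum(axis=1)
half_b = items[:, 1::2].sum(axis=1)

# Correlating the halves gives the reliability of a HALF-length test (deflated).
r_half = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown correction estimates the full-length test's reliability.
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```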
How do the different aspects of internal consistency differ?
Measures whether all the items are measuring the same thing
Split-half: give the test once, split it into two halves, and correlate the separately scored halves
Spearman-Brown: corrects the split-half correlation to estimate the reliability of the full-length test (more generally, estimates how reliability changes with test length)
KR20: internal-consistency estimate for tests whose items are scored dichotomously (right/wrong)
Cronbach's alpha: a generalization of KR20 that also works for non-dichotomous items (e.g., Likert-type ratings)
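A minimal sketch of Cronbach's alpha from its standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores); with 0/1 items this reduces to KR20. The response matrix below is made up.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_people x k_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item across people
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Made-up Likert-style responses: rows = 5 respondents, columns = 4 items.
responses = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
], dtype=float)

print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```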
Understand the major components of inter-rater reliability.
- Measured using Kappa coefficient
- Different raters may score the same responses differently, so agreement between raters must be checked
What is the Kappa statistic and how does it relate to reliability?
- Measures agreement between raters, corrected for the agreement expected by chance; higher values indicate stronger inter-rater reliability
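A sketch of Cohen's kappa computed by hand as (observed agreement - chance agreement) / (1 - chance agreement). The ratings and grader names are made up; in practice a library routine (e.g., scikit-learn's cohen_kappa_score) gives the same value.

```python
import numpy as np

def cohen_kappa(rater_1, rater_2):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    rater_1, rater_2 = np.asarray(rater_1), np.asarray(rater_2)
    categories = np.union1d(rater_1, rater_2)

    p_observed = np.mean(rater_1 == rater_2)  # proportion of exact agreements
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    p_chance = sum(np.mean(rater_1 == c) * np.mean(rater_2 == c) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Made-up pass/fail ratings of 10 essays by two graders.
grader_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]
grader_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "fail"]

print(f"kappa = {cohen_kappa(grader_a, grader_b):.2f}")
```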
Summary of Reliability
Error and Reliability
Difference between observed and true score
Classic test score theory
X = T + E
Evaluating sources of error
Test-retest, alternative/parallel form, split-half, internal consistency, inter-rater
Fixing reliability
Increase the number of items, drop unhelpful items, correct for measurement error (one such correction is sketched below)
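One common way to "correct for measurement error" is the correction for attenuation, which estimates what the correlation between two measures would be if both were perfectly reliable; assuming that is the correction meant here, a sketch with made-up numbers:

```python
import math

# Made-up values: observed correlation between two tests and their reliabilities.
r_observed = 0.40   # correlation between test X and test Y as actually measured
r_xx = 0.70         # reliability of test X
r_yy = 0.80         # reliability of test Y

# Correction for attenuation: the correlation expected if both tests were error-free.
r_corrected = r_observed / math.sqrt(r_xx * r_yy)
print(f"corrected correlation = {r_corrected:.2f}")   # 0.40 / sqrt(0.56) ~= 0.53
```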
Know the Summary of Reliability Table from lecture
https://docs.google.com/presentation/d/1tvNLDV2q4N0Rn0JajLM5sBhzXSxXrbh7/edit#slide=id.p29
What does the standard error of measurement do?
If error changes randomly, the distribution of a person's observed scores over repeated testing should be normal
The mean of those observed scores will be an estimate of the true score
The standard deviation of that distribution is the standard error of measurement (how much observed scores vary around the true score)
The standard error is assumed to be the same for everyone
True variance is the ideal: the actual differences in ability before error is added
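A sketch using the usual formula SEM = SD x sqrt(1 - reliability), plus a rough 95% band around one observed score; all values are made up for illustration.

```python
import math

sd = 10.0           # standard deviation of test scores (made-up)
reliability = 0.84  # reliability coefficient of the test (made-up)

# Standard error of measurement: how much observed scores vary around the true score.
sem = sd * math.sqrt(1 - reliability)   # 10 * sqrt(0.16) = 4.0

# Rough 95% confidence band around one person's observed score.
observed = 72.0
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(f"SEM = {sem:.1f}; 95% band around {observed:.0f}: {low:.1f} to {high:.1f}")
```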
What factors should be considered when choosing a reliability coefficient?
Depends on the nature of the test.
For example, if raters are scoring essays, calculate inter-rater reliability; if you want to know whether the test items consistently measure the same thing across test-takers (low and high ability alike), use internal consistency measures.