Reliability and coefficient alpha Flashcards
What is reliability?
The desired consistency or reproducibility of test scores
- does my test give me the same, accurate measurement each time?
Test score theory
Every person has a true score, but no test is free from error, so we only ever observe
X = T + e (X = observed score, T = true score, e = error)
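The equation X = T + e can be illustrated with a small simulation. This is a hypothetical sketch: the true score of 100 and the error distribution N(0, 5) are assumptions chosen for illustration, not values from the cards.

```python
import random

random.seed(0)

# Hypothetical: one person's fixed true score T (unobservable in practice),
# measured five times; each observed score X = T + e, with random error e.
T = 100                                               # assumed true score
observed = [T + random.gauss(0, 5) for _ in range(5)] # e ~ N(0, 5) is an assumption

print(observed)                        # each X differs, scattered around T
print(sum(observed) / len(observed))   # the mean of repeated X's approaches T
```

This mirrors the assumptions above: T stays fixed across administrations while the observed score X varies, because the random error e differs each time.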
Classical test theory: 4 assumptions
- each person has a true score we could obtain if there was no measurement error
- there is measurement error but this error is random
- the true score of an individual doesn’t change with repeated applications of the same test even though their observed score does
- the distribution of random errors will be the same for all people
Classical test theory:
The domain sampling model
If we construct a test on something, we can’t ask all possible questions
-> So we only use a sample of test items
- Using fewer test items can lead to the introduction of error
The domain sampling model
formula
reliability = variance of observed scores on the short test / variance of true scores
As the sample of items gets larger, the reliability estimate becomes more accurate
Other things can affect performance…
- might be tired on day taking the test (different scores for different days)
Types of reliability
- test-retest reliability
- parallel forms reliability
- internal consistency reliability (split-half, Kuder-Richardson 20, Cronbach's alpha)
- inter-rater reliability
Test-retest reliability
- Give someone the same test at two different points in time.
- If the scores are highly correlated, we have good test-retest reliability
- Correlation between the 2 scores is also known as the coefficient of stability
Source of error in test-retest reliability
time sampling
Issues with test-retest reliability
Can we use it when measuring things like mood, stress, etc.?
Won’t the person’s score increase the 2nd time because of practice effect?
What if we want to measure changes between 1st and 2nd administration?
Can the actual experience of being tested change the thing being tested?
What if some event happens in between the 1st and 2nd administration to change the thing being tested?
Parallel forms reliability
- Two different forms of the same test (i.e., measuring the same construct)
- Correlation between the two forms is known as the coefficient of equivalence
Parallel forms reliability- source of error
item sampling
Parallel forms reliability- Ways to change the form of test
- response alternatives are reworded
- item order is changed (to reduce practice effects)
- question wording is changed
Parallel forms reliability: issues
What if we give the different forms to people at two different times?
Do we give the different forms to the same people, or different people?
What if people work out how to answer the one form from doing the other form?
Difficult to generate a big enough item pool
Internal consistency reliability
Do the different items within one test all measure the same thing to the same extent?
I.e., Are items within a single test highly correlated?
Split-half reliability
Coefficient alpha
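Coefficient (Cronbach's) alpha can be computed by hand from the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch, assuming a hypothetical 4-item test taken by 5 respondents:

```python
# Hypothetical data: rows = respondents, columns = items on one test.
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [4, 3, 4, 4],
    [1, 2, 1, 2],
]

def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(rows[0])                              # number of items
    items = list(zip(*rows))                      # transpose to per-item columns
    item_var = sum(variance(list(col)) for col in items)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var / total_var)

print(round(cronbach_alpha(scores), 3))  # high -> items measure the same thing
```

Here alpha comes out near 1 because the items are highly correlated, which is exactly the internal-consistency question on the card above: do the items within one test all measure the same thing?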