Psychometrics - Reliability + Coefficient Alpha Flashcards
What is reliability?
The desired consistency or reproducibility of test scores
4 assumptions of classical test theory
- Each person has a true score that we could obtain if it weren’t for measurement error
- There is measurement error, but it’s random
- The true score of a person doesn’t change upon repeating tests, even though the observed score does
- The distribution of random errors will be the same for all people
Domain Sampling Model
The idea that we can’t construct a test that asks every possible question within the domain being tested, so we have to select only a sample of items. Using fewer items introduces error
Reliability analysis’s aim:
To establish how much error is made by using the score from the shorter test as an estimate of a person’s true ability. Error comes from multiple sources, and the different ways of measuring reliability are sensitive to different measurement errors
Four types of reliability
- Test-retest
- Parallel forms
- Internal consistency
- Inter-rater reliability
What kind of error is the test-retest method designed for?
Time sampling
How does test-retest work?
You give someone the same test at two different points in time and assess how much difference there is in performance between the first administration and the second
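A minimal sketch in Python (the scores are hypothetical), computing test-retest reliability as the Pearson correlation between the two administrations:

```python
import numpy as np

# Hypothetical scores for the same five people at two time points
time1 = np.array([12, 18, 9, 15, 21])
time2 = np.array([14, 17, 10, 16, 20])

# Test-retest reliability = Pearson correlation between administrations
r_tt = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r_tt:.2f}")
```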
Problems with test-retest reliability
Practice effect, testing effects, maturation, history
Also not ideal when you want to assess something that is expected to change over time
What source of error is parallel forms reliability designed to account for?
Item sampling
How does parallel forms reliability work?
You compose two different forms of the same test and get participants to do both
Problems with parallel forms reliability
It’s hard to administer both forms without running into time-sampling problems, you need a much bigger item pool, and testing effects still apply
What error does internal consistency reliability account for?
Item sampling error, using only a single test administered on one occasion
What does internal consistency measure and what three methods are used?
Do the different items within the test all measure the same thing to the same extent?
- Split half reliability
- Coefficient alpha
- KR-20
When do we use KR-20?
It’s used to find alpha when the items have a dichotomous format (e.g., scored right/wrong)
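A minimal sketch of the KR-20 computation, assuming a hypothetical persons × items matrix of 0/1 scores (population variances used throughout):

```python
import numpy as np

def kr20(X):
    """KR-20 for a persons x items matrix of dichotomous (0/1) scores."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)               # proportion answering each item correctly
    pq = (p * (1 - p)).sum()         # sum of item variances (p times q)
    total_var = X.sum(axis=1).var()  # variance of total test scores
    return (k / (k - 1)) * (1 - pq / total_var)

# Hypothetical data: 4 people, 3 right/wrong items
print(kr20([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]))
```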
How does split half reliability work?
A test is split in half, each half is scored, and the two half-scores are correlated to see if the test is consistent.
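A minimal sketch, assuming a hypothetical persons × items score matrix, that splits the test into odd and even items and correlates the two half-scores (the odd/even split is only one of many possible splits, which is a weakness noted below):

```python
import numpy as np

def split_half(X):
    """Correlation between odd-item and even-item half scores."""
    X = np.asarray(X, dtype=float)
    odd_half = X[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
    even_half = X[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    return np.corrcoef(odd_half, even_half)[0, 1]
```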
Pros and cons of split half
+ You only need one test
- It’s difficult to decide how to split the test
- Halving the length of the test lowers reliability, so the half-test correlation underestimates the full test’s reliability
- The split-half estimate changes depending on which items end up in which half
How do we account for the decrease in reliability that splitting the test will have?
Spearman-Brown correction!
Formula for Spearman-Brown
Predicted reliability = (2 × correlation between halves) / (1 + correlation between halves)
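For example, if the two halves correlate at r = .70, the predicted full-test reliability is (2 × .70) / (1 + .70) = 1.4 / 1.7 ≈ .82.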
How do we account for the differences in Split Half depending on which items are where?
Cronbach’s Alpha!
What does Cronbach’s alpha do?
It takes the average of all possible split half correlations for a test
Formula for Cronbach’s alpha
A = (number of indicators)(average inter-item correlation) / (1 + (number of indicators − 1)(average inter-item correlation))
= kr / (1 + (k − 1)r), where k = number of indicators and r = average inter-item correlation
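A minimal sketch of the raw-score (variance) form of alpha in Python, assuming a hypothetical persons × items matrix; the kr formula above is the standardized version:

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha from a persons x items matrix of raw item scores."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()  # sum of individual item variances
    total_var = X.sum(axis=1).var(ddof=1)    # variance of total test scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical data: 5 people, 4 Likert-style items
X = [[3, 4, 3, 4], [2, 2, 3, 2], [5, 4, 4, 5], [1, 2, 1, 2], [4, 5, 4, 4]]
print(f"alpha = {cronbach_alpha(X):.2f}")
```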
Alpha: relationship between number of items and reliability
Positive and non-linear: alpha rises rapidly from 2 to 10 items, increases steadily from 11 to 39, and plateaus around 40 items.
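For example, holding the average inter-item correlation at r = .25, the formula above gives alpha = .40 at k = 2, about .77 at k = 10, and about .93 at k = 40, by which point the curve has largely flattened.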
What levels of Cronbach’s alpha are we aiming for?
- .70 for exploratory research
- .80 for basic research
- .90 for applied scenarios
What affects Cronbach’s alpha?
- Multidimensionality
- Bad test items
- Number of items
What source of error does inter-rater reliability help with?
Observer differences
What does inter-rater reliability do?
It measures how consistently two or more raters agree when rating something, based on the premise that using multiple raters can improve measurement reliability
Which inter-rater reliability statistic do we use when?
2 raters = Cohen’s kappa
3+ raters = Fleiss’ kappa
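A minimal sketch of Cohen’s kappa for two raters, assuming hypothetical category labels; kappa = (observed agreement − chance agreement) / (1 − chance agreement):

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters assigning categorical labels."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    p_o = np.mean(r1 == r2)  # observed agreement
    # Chance agreement: product of each rater's marginal category proportions
    p_e = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in np.union1d(r1, r2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of six cases into two categories
r1 = ["yes", "no", "yes", "yes", "no", "yes"]
r2 = ["yes", "no", "no", "yes", "no", "yes"]
print(f"kappa = {cohens_kappa(r1, r2):.2f}")
```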
When do we use Cronbach’s alpha?
When we’re using raw scores
When do we use standardized item alpha?
When scores have been standardized to account for things like age, gender etc.
What is inter-item variance?
The variance between two items