Reliability Flashcards
What is reliability?
The extent to which a measurement tool gives you consistent measures; also refers to the degree to which test scores are free from errors of measurement
How can we measure reliability?
By checking whether someone gets roughly the same score when they complete the same questionnaire several times, or whether people give similar responses to a series of questions that are supposed to measure the same thing
What is classical test theory?
The traditional conceptual basis of psychometrics; it’s the idea that every actual/observed score we measure can be decomposed into two parts: the true score and measurement error
What is the true score, according to classical test theory?
The part of the observed score that reflects the attribute we are actually trying to measure; it is assumed to be constant for an individual
What is measurement error, according to classical test theory?
What we don’t want to measure; it’s random, and unrelated to the true score
What is the equation for classical test theory?
X (observed score) = T (true score) + E (errors of measurement)
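The same equation in LaTeX, together with the standard classical test theory assumptions that error averages to zero and is uncorrelated with the true score (the symbols here are an assumed notation, not from the flashcards):

X = T + E, \qquad \mathbb{E}(E) = 0, \qquad \operatorname{Cov}(T, E) = 0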
What is true score theory?
Another term for classical test theory
What is reliability in terms of the relationship between true and total variance?
Reliability (r) is the ratio of true variance (the variation in test scores in a sample that is free of measurement error) to total variance (the actual variation in the data, including error): reliability = true variance / total variance
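In LaTeX notation (the sigma-squared symbols for the variances are an assumed convention, not from the flashcards):

r = \frac{\sigma^2_T}{\sigma^2_X}

where \sigma^2_T is the true variance and \sigma^2_X is the total (observed) variance.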
What does lower measurement error tell us about reliability?
It will be higher
Why do we describe classical test theory in terms of variance rather than standard deviations?
Because variance is additive and can be broken up into components (true variance plus error variance), whereas standard deviations are not additive
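In LaTeX, the additivity being relied on (it follows from error being uncorrelated with the true score):

\sigma^2_X = \sigma^2_T + \sigma^2_E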
If a person took the same test multiple times and we ended up with a lower reliability, what would we expect in regards to the spread of their scores?
We’d expect them to be more spread out due to measurement error (and less spread out if reliability were higher)
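A minimal Python sketch of this idea, simulating one person's repeated scores with small versus large random error (all numbers here are hypothetical illustrations):

import numpy as np

rng = np.random.default_rng(1)
true_score = 50.0                                           # fixed true score for one person
reliable   = true_score + rng.normal(scale=2.0, size=10)    # small measurement error
unreliable = true_score + rng.normal(scale=10.0, size=10)   # large measurement error
print(reliable.std(ddof=1))    # small spread, consistent with higher reliability
print(unreliable.std(ddof=1))  # large spread, consistent with lower reliability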
Describe the various sources of measurement error
Test construction (item sampling/content sampling); Test administration (e.g. distractions during the test, fatigue, etc.); Test scoring (e.g. biased examiners, ambiguous scoring guidelines, technical errors); Other influences (e.g. self-efficacy, motivational factors, etc.)
What is item sampling/content sampling?
Only certain items or content can be included in a test, so scores may vary depending on which items were sampled; e.g. in an exam, not everything is covered, so people may be advantaged or disadvantaged depending on what they focused on when revising
Why can we only estimate the reliability of a test and not measure it directly?
Because true variance is hypothetical/theoretical; instead we estimate reliability via different methods
What are four methods available to us to help estimate the reliability of a test?
Internal consistency; Test-retest reliability; Alternate-forms reliability; Inter-rater reliability
What is internal consistency (aka inter-item consistency or internal coherence)?
It’s how much the item scores in a test correlate with one another on average; are responses consistent across items? (e.g. Cronbach’s alpha, KR-20)
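A minimal Python sketch of the "average inter-item correlation" idea, assuming a hypothetical NumPy score matrix called items with one row per person and one column per item:

import numpy as np

def mean_inter_item_correlation(items: np.ndarray) -> float:
    r = np.corrcoef(items, rowvar=False)        # item-by-item correlation matrix
    upper = r[np.triu_indices_from(r, k=1)]     # each item pair counted once
    return float(upper.mean())                  # average correlation across item pairs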
What is test-retest reliability?
The correlation between scores on the same test by the same people done at two different times
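A minimal Python sketch: test-retest reliability is the Pearson correlation between the two sets of scores (the example numbers are made up):

import numpy as np

time1 = np.array([12, 15, 9, 20, 17, 11], dtype=float)   # scores at the first administration
time2 = np.array([13, 14, 10, 19, 18, 12], dtype=float)  # same people, second administration
test_retest_r = np.corrcoef(time1, time2)[0, 1]          # Pearson correlation
print(test_retest_r)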
Why might test-retest reliability not always be appropriate?
People might remember their answers from the first attempt (memory/practice effects), but counterbalancing alternate forms of the test can get around this
What is Cronbach’s alpha, and when should it be used?
A measure of internal consistency; it is used when questions have more than two possible response options
Describe the steps involved in calculating Cronbach’s alpha by hand.
- Split the questionnaire in half
- Calculate the total score for each half
- Work out the correlation between the total scores for each half
- Repeat steps 1-3 for all possible two-way splits of the questionnaire
- Work out the average of all possible split-half correlations
- Adjust the correlation to account for shortening the test by applying a special version of the Spearman-Brown formula
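A minimal Python sketch of those steps (split, total each half, correlate, repeat over all splits, average, apply Spearman-Brown), alongside the direct formula for Cronbach's alpha for comparison. The data and variable names are hypothetical; when item variances are roughly equal the two results agree closely.

from itertools import combinations
import numpy as np

def split_half_average(items: np.ndarray) -> float:
    # Average of Spearman-Brown-corrected correlations over all two-way splits.
    # items: (n_people, n_items) score matrix; n_items is assumed even here so
    # both halves contain the same number of items.
    n_items = items.shape[1]
    half = n_items // 2
    corrected = []
    for rest in combinations(range(1, n_items), half - 1):
        first = (0,) + rest                                     # fix item 0 in the first half
        second = [i for i in range(n_items) if i not in first]  # so each split is counted once
        a = items[:, list(first)].sum(axis=1)   # total score for the first half
        b = items[:, second].sum(axis=1)        # total score for the second half
        r = np.corrcoef(a, b)[0, 1]             # split-half correlation
        corrected.append(2 * r / (1 + r))       # Spearman-Brown correction for halving
    return float(np.mean(corrected))

def cronbach_alpha(items: np.ndarray) -> float:
    # Direct formula: alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))                      # common "true" component
items = shared + rng.normal(scale=1.0, size=(200, 4))   # four noisy items built from it
print(split_half_average(items), cronbach_alpha(items))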
What is KR-20 and what is it used for?
Kuder-Richardson 20 formula, used for estimating internal consistency for questions with only two possible outcomes (e.g. true/false); like Cronbach’s alpha, it gives an estimate of the mean of the correlations between all possible halves of the questionnaire (then corrected for halving)
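A minimal Python sketch of the KR-20 formula for a hypothetical 0/1-scored item matrix (population-style variances are used throughout so the terms stay consistent):

import numpy as np

def kr20(items: np.ndarray) -> float:
    # KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / variance of total scores),
    # where p_i is the proportion answering item i correctly and q_i = 1 - p_i.
    k = items.shape[1]
    p = items.mean(axis=0)                     # proportion correct per item
    q = 1 - p
    total_var = items.sum(axis=1).var(ddof=0)  # p*q is a population-style item variance,
    return k / (k - 1) * (1 - (p * q).sum() / total_var)  # so use ddof=0 here too

scores = np.array([[1, 0, 1, 1],
                   [1, 1, 1, 0],
                   [0, 0, 1, 0],
                   [1, 1, 1, 1],
                   [0, 1, 0, 0]])  # made-up true/false scores: rows = people, columns = items
print(kr20(scores))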