Wk 4 - Reliability Flashcards
What is classical test theory?
What equation does it give us for the relationship between scores? (x3)
It’s the conceptual basis for psychometrics
The observed score = True score + Error of measurement
X = T + E
What is true score theory? (x1)
Another name for Classical Test theory
What is reliability in terms of the relationship between true and total variance? (x4)
According to Classical Test theory
r = true variance (hypothetical variation of test scores in a sample if no measurement error) divided by total variance (actual variation in data - including error)
r = σ²(T) / σ²(X)
Therefore measurement error is inversely related to reliability – lower measurement error = higher reliability
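The variance ratio above can be sketched with simulated data. The true-score SD of 15 and error SD of 5 below are made-up values for illustration only:

```python
import random
import statistics

random.seed(42)

# Simulate classical test theory: X = T + E (illustrative values, not real data)
true_scores = [random.gauss(100, 15) for _ in range(10_000)]  # hypothetical true scores
errors = [random.gauss(0, 5) for _ in range(10_000)]          # measurement error, mean 0
observed = [t + e for t, e in zip(true_scores, errors)]

true_var = statistics.pvariance(true_scores)
total_var = statistics.pvariance(observed)
reliability = true_var / total_var  # r = sigma^2(T) / sigma^2(X)

print(round(reliability, 2))  # close to 15**2 / (15**2 + 5**2) = 0.9
```

Shrinking the error SD toward zero pushes the ratio toward 1, which is the "lower measurement error = higher reliability" point.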
Why do we describe classical test theory in terms of variance rather than standard deviations? (x2)
Because variance is additive and can be broken up into its components
Whereas SD can’t
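A quick simulation of why variances add for independent components while SDs don't (the component SDs of 3 and 4 are arbitrary illustrative values):

```python
import random
import statistics

random.seed(0)

t = [random.gauss(0, 3) for _ in range(100_000)]  # independent component 1 (var 9)
e = [random.gauss(0, 4) for _ in range(100_000)]  # independent component 2 (var 16)
x = [a + b for a, b in zip(t, e)]

# Variances add for independent components: 9 + 16 = 25
print(statistics.pvariance(t) + statistics.pvariance(e))  # ~25
print(statistics.pvariance(x))                            # ~25

# Standard deviations do NOT add: 3 + 4 = 7, but SD(x) is sqrt(25) = ~5
print(statistics.pstdev(x))  # ~5, not 7
```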
What are four sources of measurement error?
Test construction
Test administration
Test scoring
Other influences
What is item sampling/content sampling? (x2 plus egs)
Content sampling would be testing every aspect of the trait/skill
eg aaaaaalll the content of the course over 24 hours
Item sampling is testing a representative proportion of it
eg the 2 hour exam
Why can we only estimate the reliability of a test and not measure it directly? (x2)
Because true variance is a hypothetical/theoretical construct -
We can’t measure everyone on the planet and work it out
Name and describe four methods available to us to help estimate the reliability of a test.
Test-retest – how do scores correlate if people sit the same test twice?
Alternate-forms – how do scores correlate if people do two different versions of same test
Internal consistency – how much do the items in a test correlate with each other, on average? (Cronbach's alpha, KR-20)
Inter-rater reliability – check the correlation on two/more different examiner ratings
Describe the steps involved in calculating Cronbach’s alpha by hand.
Split questionnaire in half
Calculate total score from items in each half
Work out correlation between those totals (the two halves)
Repeat steps 1-3 for all possible two-way splits of the total number of items
Work out the average of all the possible split-half correlations
Adjust the correlation to account for the fact that you’ve shortened (halved) the test – using a special version of the Spearman-Brown formula (the fewer items, the lower the reliability correlation. So, when you cut the test in half you’re artificially lowering the reliability.)
How is the KR-20 calculated?
As with Cronbach’s alpha, this formula gives you the estimate you’d get if you worked out the mean of the correlations between all possible halves of your questionnaire (then corrected for halving) – but for dichotomous (right/wrong) items
What’s the difference between parallel forms and alternate forms? (x3)
Both give correlation between scores on 2 versions of same test by same people at same time, but
Alternate forms just needs high Coefficient of Equivalence, whereas
Parallel also requires that Mean, SD and correlations with other tests must be the same
What is the coefficient of equivalence in the context of a test with parallel forms? (x3)
The correlation between two versions of the same test
Applies to parallel and alternate
Also another term for the reliability coefficient used in these methods
List five considerations that might affect which reliability estimate you can use
- Homogeneity/heterogeneity of the test
- Static vs dynamic characteristics
- Restriction of range/variance
- Speed tests versus power tests
- Criterion-referenced tests
What is a homogeneous test? (x1)
If the test items all measure the same thing.
What is a heterogeneous test? (x2)
If more than one independent thing is being measured
i.e. there are subscales that don’t intercorrelate highly
Describe in detail exactly what the standard error of measurement is supposed to represent (x4)
The reliability of a test will never be 100%
So knowing the margin of error is critical in interpreting the meaning of an individual’s scores
Imagine one person sitting the same test many times: assuming their scores are normally distributed, with their true score at the centre, the SEM is the SD of that distribution
Mostly hypothetical – in practice we use the one time they actually did the test to estimate what that spread (one person, one test, many times) would be
What is the CI? (x1)
Why do we have to add and subtract DOUBLE the SEM from an individual’s score in order to get the 95% confidence interval? (x4)
Plus eg calculation
The range of scores that is likely to contain a person’s true score
Because under a normal distribution (assumed to be the case)
68% of scores are +/- 1 SD/SEM from mean, while
95% are within +/- 1.96
(99.7% are within 3)
WAIS IQ score of 105, SD of all IQ tests is 15, reliability is .98
SEM = 15 x sqrt(1-.98) = 2.12
CI = 105 +/- (2 x 2.12) = 100.76 to 109.24, i.e. roughly 101 to 109
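The worked example above, as a small Python check:

```python
import math

# Worked example from the card: WAIS IQ = 105, SD = 15, reliability = .98
score, sd, reliability = 105, 15, 0.98

sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
lower = score - 2 * sem                # ~95% CI uses +/- 2 SEM (1.96 rounded up)
upper = score + 2 * sem

print(round(sem, 2))               # 2.12
print(round(lower), round(upper))  # 101 109
```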
What is the reliable change index? (x1)
How do you calculate? (x1)
And how do you apply? (x1)
In clinical practice, a way of using the SEdiff (standard error of the difference) to test whether an individual’s change in scores is meaningful
Work out the diff between two scores (eg change during intervention), and divide by the SEdiff
If the RCI is greater than 1.96 (ie 2 standard errors of the difference) you have statistically significant change
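A minimal sketch of the RCI calculation; the pre/post scores of 85 and 100, test SD of 15, and reliability of .90 are assumed example numbers, not from the card:

```python
import math

# Reliable change index sketch (assumed example values)
sd, reliability = 15, 0.90
sem = sd * math.sqrt(1 - reliability)
se_diff = math.sqrt(2) * sem  # standard error of the difference between two scores

score_before, score_after = 85, 100
rci = (score_after - score_before) / se_diff

print(rci > 1.96)  # True -> the change is statistically significant at p < .05
```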
True or false?
When calculating the Cronbach’s alpha, you have to multiply the correlations derived from all the possible split-half correlations of the items
And why? (x1)
False, you average them
True or false?
As part of the process of calculating Cronbach’s alpha, you need to adjust for the halving of the number of items by applying a special version of the Spearman Brown formula.
And why? (x2)
True, because the reliability correlation will be lower, the fewer items you have
So halving the test artificially lowers it
True or false?
Imagine we revised a test and as a result the true variance became a greater proportion of the total variance. This would mean that the standard deviation of scores of a single person taking the test multiple times would become smaller
And why? (x2)
True
Because:
Total variation = True (systematic) variation + Error (unsystematic) variation
So if True variance becomes a greater proportion of the Total, Error variance must be shrinking – and so must its SD (the root of the variance), which is what governs the spread of one person’s repeated scores
True or false?
The fact that I cannot ask students everything about the course in the exam will decrease the proportion of the observed score that can be accounted for by the true score (assuming the exam mark is supposed to reflect students’ PSYC3020 knowledge)
And why? (x2)
True
Because:
Score obtained = True measurement + Error
So, the inability to ask about ‘everything’ increases the amount of error in the test; therefore a smaller proportion of the observed score is accounted for by the true score.
How can we use variance to explain the relationship between true scores and measurement error, according to Classical Test theory? (x2)
How would this relate to SD of scores? (x1)
X = T + E
Total Variance = True variance + Error variance
SD is the root of variance, so the SD of score obtained would be influenced by SD of True and Error too
How does test construction affect measurement error? (x2)
Give an example (x1)
Whether you can access all of the trait, or only some of it (content vs item sampling)
ie error increases the fewer items you have that tap into what you’re measuring
eg a two-hour exam vs a 24-hour one asking about absolutely everything