Week 2 - Reliability Flashcards
Most common misconception about reliability and validity
They are not properties of a test itself, but of the test used in a particular situation or for a particular purpose.
Define reliability
The consistency with which a test measures what it purports to measure in any given set of circumstances. If something is reliable, it can be depended on.
Does the test produce consistent responses?
What is social desirability bias?
A form of method variance, common in psychological tests of personality, that arises when people answer questions in a way that presents them in a favourable light (or avoids an unfavourable one).
The domain-sampling model
The test or assessment device draws from a larger set of items to give a score; the score is therefore an estimate.
If all possible questions had been asked, we would have the true position.
Thus, reliability becomes a question of sampling test items from a domain of all possible items.
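A minimal simulation of the idea (assumptions: Python/NumPy, an invented 1,000-item domain): the more items a test samples from the domain, the closer its score sits to the "true position".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical domain: a person's "true" probability of answering
# each of 1,000 possible items correctly.
domain = rng.uniform(0.2, 0.9, size=1000)
true_position = domain.mean()          # score if every item were asked

# A real test samples only some items, so its score is an estimate.
for n_items in (10, 50, 200):
    sample = rng.choice(domain, size=n_items, replace=False)
    print(f"{n_items:>3} items: estimate = {sample.mean():.3f} "
          f"(true position = {true_position:.3f})")
```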
Standard error of measurement
an expression of the precision of an individual test score as an estimate of the trait it purports to measure
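The card doesn't give the formula, but the standard one is SEM = SD × √(1 − reliability). A tiny sketch with illustrative numbers:

```python
import math

sd_x = 15.0       # standard deviation of test scores (e.g. an IQ-style scale)
r_xx = 0.90       # reliability coefficient of the test

# Standard error of measurement: SEM = SD * sqrt(1 - reliability)
sem = sd_x * math.sqrt(1 - r_xx)

# Rough 95% band around an observed score of 100
score = 100
print(f"SEM = {sem:.2f}")
print(f"95% band: {score - 1.96 * sem:.1f} to {score + 1.96 * sem:.1f}")
```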
Reliability coefficient
an index - usually Pearson's r - of the ratio of true-score variance to observed-score variance in a test given in a set of circumstances.
The proportion of observed-score variance that is due to true-score variance. A reliability of 0.5 is about the minimum acceptable level (see Nunnally's rule of thumb below).
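A quick simulation (assumptions: NumPy, made-up variances) showing the two readings agree: the true/observed variance ratio matches the Pearson r between two parallel forms of the test.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

true = rng.normal(0, 1, n)            # true scores, variance 1
err1 = rng.normal(0, 0.5, n)          # independent error on form 1
err2 = rng.normal(0, 0.5, n)          # independent error on form 2
obs1, obs2 = true + err1, true + err2

ratio = true.var() / obs1.var()        # true-score / observed-score variance
r = np.corrcoef(obs1, obs2)[0, 1]      # parallel-forms correlation

print(f"true/observed variance ratio: {ratio:.3f}")   # ~0.80
print(f"Pearson r between forms:      {r:.3f}")       # ~0.80
```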
What’s the oldest way to calculate reliability of a test?
Comparing two forms of the same test: see if the different items agree in the scores they yield.
- Minimises practice effects.
- However, if the forms lead to different scores, then one (but we don't know which one) cannot be depended on.
What is Split-half reliability?
Split the test in half and compare scores, e.g. scores on odd-numbered items compared with scores on even-numbered items. Correlating the scores from the two halves (with a high enough sample of participants) gives an estimate of reliability.
(When you have larger samples, just use the whole test.)
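A minimal sketch (assumptions: NumPy, simulated item responses). One point not on the card: each half is only half as long as the real test, so the raw half-test correlation understates full-test reliability; the Spearman-Brown correction is the standard adjustment.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 200 participants x 20 items (1 = correct, 0 = wrong)
ability = rng.normal(0, 1, 200)
items = ((ability[:, None] + rng.normal(0, 1, (200, 20))) > 0).astype(float)

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction for the full-length test
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.3f}, Spearman-Brown corrected = {r_full:.3f}")
```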
Pros and cons of split half reliability?
- With the odd-even method, fatigue effects are the same for both halves of the test.
- Not recommended for speeded tests (those with a time limit).
- The odd-even split is also arbitrary, and different ways of splitting have different pros and cons.
How to work out Cronbach’s alpha?
Split the test into subtests of one item each, correlate every subtest with every other subtest, and base the reliability estimate on the average inter-item correlation (adjusted for the number of items).
i.e. ‘internal consistency’ of a test.
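The card's description is the intuition; the usual computational formula (standard psychometrics, not on the card) works from item and total-score variances: alpha = k/(k−1) × (1 − sum of item variances / total-score variance). A minimal NumPy sketch with simulated data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: participants x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(3)
ability = rng.normal(0, 1, 300)
# 10 items that all partly reflect the same trait, plus noise
items = ability[:, None] + rng.normal(0, 1, (300, 10))

print(f"alpha = {cronbach_alpha(items):.3f}")
```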
Cons of Cronbach’s alpha? (4)
- Tests with high internal consistency can just have items with similar content.
- Although faithfully sampling a domain, the domain might be trivial.
- High internal consistency does not mean the test is measuring the intended thing. The items might be interrelated but not homogeneous/unidimensional.
- If there are multiple factors (traits) underlying performance on a test, alpha can overestimate the reliability of the factor thought to underlie the test.
So Confirmatory Factor Analysis might be better.
What is test-retest reliability?
the estimate of reliability obtained by correlating scores on the test obtained on two or more occasions of testing. The stronger the correlation, the more reliable the test.
Important when retesting patients to see whether they are getting worse, etc. If scores on the test drift over time, a more reliable test is needed.
What is Generalisability theory?
Cronbach (1972): in obtaining scores from a test, the user seeks to generalise beyond the particular score to some wider universe of behaviour. Users must SPECIFY the desired range of conditions over which this is to hold.
What is inter-rater reliability? What is the best method of obtaining it?
Correlate scores across different judges or raters; reliability here is consistency between raters.
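The card doesn't name a best method; the simplest sketch is a plain Pearson correlation between two raters (made-up ratings below), though for more than two raters or categorical judgements an intraclass correlation or Cohen's kappa is often preferred.

```python
import numpy as np

# Hypothetical: two raters score the same 8 essays out of 10
rater_a = np.array([7, 5, 9, 4, 6, 8, 3, 7])
rater_b = np.array([6, 5, 9, 5, 7, 8, 2, 6])

# Inter-rater reliability as the correlation between the raters' scores
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"inter-rater r = {r:.3f}")
```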
How reliable does a test need to be?
VERY, if the test has serious consequences for the individual. But if it’s still being developed, a lower level of reliability will suffice.
Nunnally's (1967) rule of thumb: 0.5 or better for a test developer, 0.7 or better for research, and better than 0.9 for individual assessment.