Chapter 3: Reliability and Validity Flashcards
What is reliability?
The stability or consistency of a test
Tells us if the test provides good measurement
Tests are used in decision-making
Why is consistency in a test important?
Because an inconsistent test means:
Our test doesn’t provide a good measure of stable traits or attributes.
In practice, we could end up making bad decisions.
What is classical test theory?
A psychometric theory of measurement
Most commonly used approach to measurement in psychology.
x = T + e
x is observed score on the test
T is the true score
e is the error of measurement
Classical Test Theory's equation for error
e = x - T
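A minimal numeric sketch of these two equations (the true score and error values below are made up purely for illustration):

```python
# Classical test theory: observed score = true score + error
T = 50           # hypothetical true score
e = 3            # hypothetical error of measurement
x = T + e        # observed score on the test -> 53

# Rearranged to solve for error: e = x - T
error = x - T    # 3
print(x, error)
```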
Assumptions of Classical Test Theory
- The mean error of measurement is 0
- True scores and errors are uncorrelated
- Errors on different measures are uncorrelated
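A small simulation sketch illustrating the first two assumptions, using hypothetical normally distributed true scores and errors (all numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(50, 10, size=10_000)   # hypothetical true scores
errors = rng.normal(0, 5, size=10_000)          # random measurement error
observed = true_scores + errors                 # x = T + e

print(round(errors.mean(), 2))                           # ~0: mean error of measurement is 0
print(round(np.corrcoef(true_scores, errors)[0, 1], 2))  # ~0: true scores and errors are uncorrelated
```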
Methods of Estimating Reliability
- Test-retest method
- Parallel forms
- Split-half methods
- Internal consistency methods
Test-Retest Reliability
Give the same group of people the same test at two different points in time, then correlate the two sets of scores by computing a correlation coefficient. (The reliability coefficient here is thought of as a stability coefficient.)
Measures the stability of scores over time.
Pearson Product Moment Correlation (r)
The most common correlation coefficient.
It is used when two sets of scores are continuous and normally distributed.
Correlation Coefficients can vary from 0 (no relationship) to +1 or –1 (perfect positive or negative relationship)
A reliability coefficient of .70 or above is generally considered acceptable.
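A minimal sketch of computing a test-retest (Pearson) correlation with NumPy, using hypothetical scores for six people at two time points:

```python
import numpy as np

# Hypothetical scores for the same 6 people at two points in time
time1 = np.array([12, 18, 25, 9, 30, 22])
time2 = np.array([14, 17, 27, 11, 28, 21])

# Pearson product-moment correlation = test-retest (stability) coefficient
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 3))   # .70 or above is generally considered acceptable
```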
Test-retest methods: Error & Issues
Any inconsistency between the two administrations is treated as measurement error
Some issues:
– Carryover effects (depend on the interval between tests)
– Memory (people may remember their earlier answers)
– Stability of the construct being measured
– Fatigue
– Reactivity (people may learn about the topic between tests)
– Motivation (people may not be motivated when taking the test a second time)
– It is difficult to determine a suitable interval between tests: wait too long and the person may have genuinely changed, but retest too soon and there will be carryover effects
Problems with method:
– Time-consuming
– Expensive
Alternate Forms Reliability (aka equivalent forms)
Give a test to a group of people, then after a suitable amount of time give them a different form of the test, then correlate the scores.
The two forms must be administered either at different times or in immediate succession.
To control for order effects, half the group takes Form A then Form B, and the other half takes Form B then Form A.
Alternate forms methods: Error & Issues
Error due to test content and perhaps the passage of time (if the forms are not given back to back)
Some issues:
– Need same number and type of items on each test
– Item difficulty must be the same on each test
– Variability of scores must be the same on each test
– Item sampling
– Temporal aspects
Developing an equivalent alternative test can be extremely time consuming and sometimes impossible.
Example: it is easy to construct equivalent tests of math knowledge, since there is an essentially unlimited pool of math questions, but it is nearly impossible to construct two equal tests of depression, since only a limited number of items relate to depression.
Alternate forms methods: Bonuses
Bonuses
– Shorter interval
– Carryover effects are lessened
– Reactivity is partially controlled
Split Half Methods
Give the test to a group of people, split the items in half (usually odd- versus even-numbered items), then correlate the scores on the two halves.
Concerned with internal consistency
Determines to what extent the test is composed of homogeneous items.
Some psychologists think tests should be homogeneous; others do not care whether a test is homogeneous or heterogeneous, only how well it works.
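A minimal sketch of the odd-even split with hypothetical item-level data (the 0/1 responses below are made up):

```python
import numpy as np

# Hypothetical item-level data: rows = 5 people, columns = 8 test items scored 0/1
items = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
])

odd_half  = items[:, 0::2].sum(axis=1)   # score on items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)   # score on items 2, 4, 6, 8

half_r = np.corrcoef(odd_half, even_half)[0, 1]   # reliability of half the test
print(round(half_r, 3))
```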
The reliability of the split half method
From the viewpoint of item sampling (not temporal stability), the longer the test, the higher its reliability will be.
The Spearman-Brown formula allows us to estimate the reliability of the entire test from a split-half administration:
estimated r = [ k (obtained r) ] / [ 1 + (k − 1)(obtained r) ]
k is the number of times the test is lengthened or shortened
For split-half tests, k = 2
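A small sketch of the Spearman-Brown step-up, assuming a hypothetical half-test correlation of .60:

```python
def spearman_brown(obtained_r, k):
    """Estimate the reliability of a test lengthened (or shortened) by a factor of k."""
    return (k * obtained_r) / (1 + (k - 1) * obtained_r)

# Split-half case: the half-test correlation is stepped up to full-test length (k = 2)
print(round(spearman_brown(0.60, k=2), 3))   # 0.75
```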
Split-half methods: Error & Issues
Error due to differences in item content between the halves of the test
Some issues:
– Deciding which split-half reliability estimate to use
Split-half methods: bonuses
Bonus:
– Carryover, reactivity, and time are minimized
The Rulon Formula
Alternative to Spearman-Brown formula
estimated r = 1 − (variance of difference scores / variance of total scores)
Four scores are generated for each person: Odd items, Even items, Difference (odd − even), Total (odd + even)
If scores were perfectly consistent, the odd and even halves would agree for every person, so the variance of the differences would be 0 and r would equal 1.
The ratio of the two variances reflects the proportion of error variance; subtracting it from 1 gives the proportion of "true" variance, i.e., the reliability.
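A minimal sketch of the Rulon formula using hypothetical odd-half and even-half scores for five people:

```python
import numpy as np

# Hypothetical odd-half and even-half scores for 5 people
odd  = np.array([4, 7, 2, 6, 3])
even = np.array([3, 7, 1, 6, 4])

diff  = odd - even          # difference scores
total = odd + even          # total scores

# Rulon formula: r = 1 - (variance of differences / variance of totals)
r = 1 - diff.var(ddof=1) / total.var(ddof=1)
print(round(r, 3))
```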
Why do we want Variability and how do we increase it?
Variability of scores among individuals, that is, individual differences, makes statistical calculations such as the correlation coefficient possible.
For greater variability, increase the range of responses and create a test that is not too easy or too difficult.
Increase the number of items: a 10-item true-false scale can theoretically yield scores from 0 to 10, but a 25-item scale can yield scores from 0 to 25, and that, of course, is precisely the message of the Spearman-Brown formula.
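A quick sketch of that last point: lengthening a 10-item test to 25 items means k = 2.5 in the Spearman-Brown formula (the obtained reliability of .60 is made up):

```python
# Lengthening a 10-item test to 25 items: k = 25 / 10 = 2.5
obtained_r = 0.60                # hypothetical reliability of the 10-item test
k = 25 / 10
estimated_r = (k * obtained_r) / (1 + (k - 1) * obtained_r)
print(round(estimated_r, 3))     # 0.789: longer test, higher estimated reliability
```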
Internal Consistency Methods
Examines the items.
Give the test to a group, compute the correlations among all items, average these intercorrelations, and use a formula such as coefficient alpha (Cronbach's alpha) to estimate reliability.
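A minimal sketch of coefficient alpha computed directly from its standard definition, using hypothetical item-level data:

```python
import numpy as np

# Hypothetical item-level data: rows = 6 people, columns = 4 items
items = np.array([
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
], dtype=float)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)        # variance of each item
total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores

# Coefficient (Cronbach's) alpha = k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))
```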