Reliability Flashcards
TEST-RETEST
Considers the consistency of test results when the same test is administered on different occasions
Only applies to stable traits
Sources of difference between test and retest?
Systematic carryover - everyone's score improves by the same amount of points - does not harm reliability
Random carryover - changes are not predictable from earlier scores, or something affects some but not all test takers - harms reliability
Practice effects - skills improve with practice
e.g., taking the same midterm exam twice - you would be expected to do better the second time
Time before re-administration must be carefully evaluated
Short time: carryover and practice effects
Long time: poor reliability, change in the characteristic with age, combination
A well-evaluated test reports many retest correlations for different time intervals between testing sessions - also consider events occurring in between
PARALLEL FORMS
Evaluates consistency across different forms of the test
The forms use different items; however, the rules used to select items of a particular difficulty level are the same
Give two different forms to the same person (same day), calculate the correlation
Reduces learning effect
CON: not always practical - hard to come up with two forms that you expect to behave identically
SPLIT HALF/Internal Consistency
Administer the whole test - split it in half and calculate the correlation between halves
If items are progressively more difficult, use an odd-even split
CON: how do you decide which halves? On a midterm, you don't expect all questions to behave the same
SPLIT HALF: Spearman-Brown Correction
Allows you to estimate what the correlation between the two halves would have been if each half had been the length of the whole test:
Corrected r = 2r / (1 + r)
Corrected r = the estimated correlation between the two halves of the test if each had the total number of items
increases the estimate of reliability
r = the correlation between the two halves of the test
Assumes the variances of the two halves are similar
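The correction is a one-line computation; a minimal sketch, using .70 as an arbitrary example half-test correlation:

```python
def spearman_brown(r_half):
    """Spearman-Brown correction: estimates the full-length reliability
    from the correlation r between the two half-tests, R = 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

# A half-test correlation of .70 corrects upward for the full-length test.
print(round(spearman_brown(0.70), 2))  # → 0.82
```

Note the corrected value is always at least as large as the input, matching the note that the correction increases the reliability estimate.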
SPLIT HALF: Cronbach’s Alpha
The coefficient alpha for estimating split-half reliability
LOWEST boundary for reliability
Appropriate when the two halves have unequal variances
α = 2[σ²x − (σ²y1 + σ²y2)] / σ²x
α = the coefficient alpha for estimating split-half reliability
σ²x = the variance for scores on the whole test
σ²y1, σ²y2 = the variances for the two separate halves of the test
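The split-half alpha formula can be sketched directly from the variance definitions; the half-scores below are invented illustration data:

```python
from statistics import pvariance

def split_half_alpha(half1, half2):
    """alpha = 2[var(total) - (var(half1) + var(half2))] / var(total),
    where total is each person's whole-test score."""
    total = [a + b for a, b in zip(half1, half2)]
    var_total = pvariance(total)
    return 2 * (var_total - (pvariance(half1) + pvariance(half2))) / var_total

half1 = [5, 7, 9, 4, 8]   # made-up scores on one half (e.g., odd items)
half2 = [6, 7, 10, 3, 9]  # made-up scores on the other half (even items)
print(round(split_half_alpha(half1, half2), 2))
```

Population variance (`pvariance`) is used here as an assumption; the formula only requires that the same variance convention be applied to the halves and the total.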
SPLIT HALF: KR20 formula
Reliability estimate - uses math as a way of solving the problem for all possible split halves
KR20 = [N/(N−1)] × (S² − Σpq) / S²
N = the number of items on the test
S² = the variance of the total test score
p = the proportion of people getting each item correct (this is found separately for each item)
q = the proportion of people getting each item incorrect; for each item, q = 1 − p
Σpq = the sum of the products p × q for each item on the test
to have nonzero reliability, the variance for the total test score must be greater than the sum of the variances for the individual items.
This will happen only when the items are measuring the same trait.
The total test score variance is the sum of the item variances and the covariances between items
The only situation that makes the sum of the item variances less than the total test score variance is when there is covariance between the items
The greater the covariance, the smaller the Σpq term will be relative to the total test score variance
When the items covary, they can be assumed to measure the same general trait, and the reliability for the test will be high.
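The KR20 computation can be sketched as follows; the 0/1 response matrix is invented for illustration (5 people, 4 items):

```python
from statistics import pvariance

def kr20(responses):
    """KR20 = [N/(N-1)] * (S^2 - sum(pq)) / S^2
    responses: one list of 0/1 item scores per person."""
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    s2 = pvariance(totals)  # S^2: variance of total test scores
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_people  # item difficulty
        sum_pq += p * (1 - p)                                  # q = 1 - p
    return (n_items / (n_items - 1)) * (s2 - sum_pq) / s2

responses = [  # made-up right/wrong answers
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(kr20(responses))  # → 0.8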
PRO: KR20 solves the split-half CON - it reflects all possible split halves, so no arbitrary choice of halves is needed
SPLIT HALF: KR21 Formula
Similar to KR20, but a different version
Does not require the calculation of the p's and q's for every item; instead, KR21 approximates the sum of the pq products from the mean test score
Assumptions need to be met:
most important is that all the items are of equal difficulty, or that the average difficulty level is 50%.
Difficulty is defined as the percentage of test takers who pass the item. In practice, these assumptions are rarely met, and it is usually found that the KR21 formula underestimates the split-half reliability
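A sketch of the approximation, assuming the usual form KR21 = [N/(N−1)] × [1 − M(N−M)/(N·S²)], where M is the mean total score (the totals below are invented):

```python
from statistics import mean, pvariance

def kr21(totals, n_items):
    """Approximates KR20 from only the mean and variance of total scores,
    assuming all items have equal difficulty."""
    m = mean(totals)
    s2 = pvariance(totals)
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * s2))

totals = [3, 2, 1, 4, 0]  # made-up total scores on a 4-item test
print(round(kr21(totals, 4), 2))
```

When item difficulties actually vary, this value typically comes out lower than the full KR20 on the same data, consistent with the note that KR21 underestimates split-half reliability.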
SPLIT HALF: Coefficient Alpha
Compares the variance of the individual items with the variance of the total test score
Used for tests where there is no single correct answer - e.g., Likert scales
Similar to KR20 - the Σpq term is replaced by Σs²i, the sum of the variances of the individual items
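Substituting the item variances for Σpq gives the general coefficient alpha, α = [N/(N−1)] × (1 − Σs²i/S²). A sketch with made-up 1-5 Likert ratings (5 people, 3 items):

```python
from statistics import pvariance

def coefficient_alpha(responses):
    """General coefficient alpha: like KR20, but the sum(pq) term is
    replaced by the sum of the individual item variances, so items need
    not be scored right/wrong. responses: one list of item scores per person."""
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    sum_item_vars = sum(
        pvariance([person[j] for person in responses]) for j in range(n_items)
    )
    return (n_items / (n_items - 1)) * (1 - sum_item_vars / pvariance(totals))

responses = [  # invented Likert ratings
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
print(round(coefficient_alpha(responses), 2))
```

With 0/1 data, `coefficient_alpha` reduces to KR20, since the variance of a 0/1 item is exactly p(1 − p).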
Factor Analysis
Can be used to divide the items into subgroups, each internally consistent - subgroups of items will not be related to one another
Helps a test constructor build a test that has submeasures for several different traits
Classical test theory - why the field is turning away from it:
- Requires that exactly the same test be administered to each person
- Some items are too easy and some are too hard, so few items concentrate on a person's exact ability level
- Assumes behavioral dispositions are constant over time
Item Response Theory
Basis of computer adaptive tests
Focuses on the range of item difficulty that helps assess an individual's ability level
In IRT-based adaptive testing, the computer selects items near the test taker's level - for example, if the person gets several easy items correct, the computer might quickly move to more difficult items
A more reliable estimate of ability is obtained using a shorter test with fewer items
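The adaptive idea can be illustrated with a toy staircase procedure. This is a deliberate simplification, not a real IRT ability estimator (real computer adaptive tests use item response curves and maximum-likelihood estimates):

```python
def adaptive_difficulty(answers, start=0.0, step=1.0):
    """Toy adaptive rule: move to a harder item after a correct answer,
    an easier item after a miss, halving the step each time to home in
    on the test taker's level. Illustrative sketch only."""
    difficulty = start
    for correct in answers:
        difficulty += step if correct else -step
        step /= 2
    return difficulty

# Two correct answers then a miss: the presented difficulty converges
# upward, then backs off slightly.
print(adaptive_difficulty([True, True, False]))  # → 1.25
```

The shrinking step is why a short adaptive test can estimate ability reliably: each item is chosen to be maximally informative about the current estimate.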