Chapter 4 reliability Flashcards
Reliability
- the consistency with which a test measures what it purports to measure in any given set of circumstances
psychological tests have both systematic and unsystematic sources of unreliability
Systematic Errors
Systematic errors produce predictably incorrect results from a measuring process.
Unsystematic errors
constitute random variations
If you weigh a package with your hand resting on it, the pressure of your hand inflates the measured weight. Each time you weigh the package, the result varies depending on how hard you press on the scale.
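A minimal sketch of the two kinds of error, using a made-up scale. The function `weigh`, the 0.5 kg bias and the 0.2 kg noise level are illustrative assumptions, not values from the chapter. A constant bias shifts every reading by the same amount, whereas random noise scatters readings around the true weight and averages out over many weighings.

```python
import random
import statistics

def weigh(true_weight, bias=0.0, noise_sd=0.0, rng=None):
    """Return one reading from a hypothetical scale.

    bias     -- constant offset added to every reading (systematic error)
    noise_sd -- spread of the random fluctuation (unsystematic error)
    """
    rng = rng or random.Random()
    return true_weight + bias + rng.gauss(0.0, noise_sd)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_weight = 2.0       # assumed true weight in kg

# Systematic error only: every reading is wrong by the same predictable amount.
systematic = [weigh(true_weight, bias=0.5) for _ in range(5)]

# Unsystematic error only: readings vary at random around the true weight.
unsystematic = [weigh(true_weight, noise_sd=0.2, rng=rng) for _ in range(1000)]

print(systematic)
print(statistics.mean(unsystematic))
```

Averaging many readings reduces the unsystematic error but leaves the systematic error untouched.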
domain-sampling model
sees a test as a representative sample of the larger domain of possible items that could be included in the test
o test reliability becomes a problem of sampling: what is sampled is items from the domain of all possible items, not people
true position
The mean of the scores from all possible samples indicates the true position
o this is the person’s ‘true score’, as it is called in classical test theory
The standard deviation
of the distribution of scores from all possible samples about the true score tells us how likely we are to obtain any particular sample score
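The domain-sampling idea can be sketched with a small simulation. Everything here is an assumption made for illustration: a domain of 2000 items, a test taker who gets roughly 70 per cent of them right, 20-item tests, and 5000 random samples standing in for "all possible samples".

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical domain: 1 = this person answers the item correctly, 0 = incorrectly.
domain = [1 if rng.random() < 0.7 else 0 for _ in range(2000)]

# The true score is the person's proportion correct over the whole domain.
true_score = statistics.mean(domain)

# Each "test" is a random sample of 20 items drawn from the domain.
sample_scores = [statistics.mean(rng.sample(domain, 20)) for _ in range(5000)]

mean_of_samples = statistics.mean(sample_scores)  # approximates the true score
sd_of_samples = statistics.pstdev(sample_scores)  # spread of sample scores about it

print(true_score, mean_of_samples, sd_of_samples)
```

The mean of the sample scores sits close to the true score, and the standard deviation shows how far any single 20-item test is likely to stray from it.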
standard error of measurement:
an expression of the precision of an individual test score as an estimate of the trait it purports to measure
o we use a sample to make estimates of the likely true score for an individual
o it lets us state the interval in which the true score lies, with a stated degree of confidence
If the interval is large, we have a great deal of imprecision in the measurement process and we cannot depend on any score we obtain with this sample of items.
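A hedged sketch of how such an interval is commonly computed, assuming the standard classical-test-theory formula SEM = SD × √(1 − r) and an interval centred on the obtained score (some texts centre it on an estimated true score instead). The function names and the example numbers (SD = 15, r = 0.91, obtained score = 110) are illustrative assumptions.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), with r the reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Interval around an obtained score; z = 1.96 gives roughly 95% confidence."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Assumed example: a scale with SD 15 and reliability 0.91, obtained score 110.
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(110, 15, 0.91)
print(sem, low, high)
```

With a lower reliability the SEM grows and the interval widens, which is the imprecision described above.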
reliability coefficient:
an index (often a Pearson product-moment correlation coefficient) of the ratio of true score variance to observed score variance in a test as used in a given set of circumstances
- The reliability coefficient is used in forming judgments about the overall value of a particular test (e.g. is this a better test for some given purpose than another test?).
- It quantifies the degree of consistency. A test may be inconsistent for many reasons, such as a testing environment that influences how participants perform, or problems in how the test is designed. Calculating the reliability coefficient helps us understand such errors in testing.
z scores
describes the position of a raw score in terms of its distance from the mean, when measured in standard deviation units.
The z-score is positive if the value lies above the mean, and negative if it lies below the mean.
• scores expressed this way are called standard normal deviates (z scores).
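A short sketch of the z-score calculation. The helper name `z_scores` and the raw scores are made up for illustration, and the population standard deviation is assumed (a sample standard deviation would give slightly different values).

```python
import statistics

def z_scores(raw_scores):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population SD, an assumption of this sketch
    return [(x - mean) / sd for x in raw_scores]

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores: mean 5, SD 2
z = z_scores(raw)
print(z)
```

Scores below the mean of 5 come out negative, scores above it positive, and a raw score equal to the mean has a z score of 0.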
error score variability
drawing samples repeatedly from a domain gives rise to variation in obtained scores, which can be thought of as a mixture of the true score variability and the error score variability that make up the observed score variability.
variance
- a measure of the spread, or dispersion, of scores within a sample
o small variance indicates highly similar scores, all close to the sample mean
o large variance indicates more scores at a greater distance from the mean and possibly spread over a larger range
• Variance: the mean of the squared deviations of each score from the mean of the scores (the sum of the squared distances from the mean, divided by the number of scores)
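The calculation can be sketched as follows. The function name, the `sample` switch and the scores are illustrative assumptions: dividing by N gives the variance of the scores in hand, while dividing by N − 1 is the usual estimate of a population variance from a sample.

```python
def variance(scores, sample=False):
    """Mean squared deviation from the mean.

    sample=False divides by N; sample=True divides by N - 1.
    """
    n = len(scores)
    mean = sum(scores) / n
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return sum_of_squares / (n - 1 if sample else n)

tight = [5, 5, 5, 5, 5, 5, 5, 6]   # hypothetical scores, all close to the mean
spread = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, more dispersed

print(variance(tight), variance(spread))
```

The tightly clustered scores give a small variance and the dispersed scores a larger one.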
observed score variance
Observed score variance = true score variance + error score variance.
variance and reliability
- we use the reliability coefficient to find the proportion of observed score variance that is true score variance
- this proportion will be less than 1.0, and in some cases a good deal less.
- r = 0.5 means that 50 per cent of the variance in the scores obtained on the test is due to variance in true scores and the other 50 per cent to errors of measurement.
- If r = 1.0 (perfect reliability), the SEM is zero; that is, there is no error in estimating the true score.
- If the proportion of true score variance is zero (r = 0), then for z scores the SEM = 1, which is the standard deviation of a standard normal distribution.
- In that case the obtained score gives us no more information about the true score than any other score we might have obtained at random.
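A simulation sketch of the variance decomposition, under assumptions chosen for illustration: 20,000 simulated test takers, true scores and errors both drawn from a normal distribution with SD 1, and errors independent of true scores. Equal true and error variance should give a reliability near 0.5, and for z scores SEM = √(1 − r) is assumed.

```python
import math
import random

def pvar(values):
    """Population variance: mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 20000

true_scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
errors = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of the true scores
observed = [t + e for t, e in zip(true_scores, errors)]

true_var = pvar(true_scores)
error_var = pvar(errors)
observed_var = pvar(observed)

# Reliability as the proportion of observed variance that is true variance.
reliability = true_var / observed_var

# SEM for scores expressed as z scores (SD = 1).
sem_z = math.sqrt(1.0 - reliability)

print(true_var + error_var, observed_var, reliability, sem_z)
```

Observed variance comes out close to the sum of the true and error variances, and the reliability is close to 0.5, matching the r = 0.5 case above.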
The reliability coefficient is determined in three main ways:
1) equivalent forms reliability
2) split-half reliability
3) the formula for estimating the reliability of a test that is longer
o the reliability of a test that is longer than the original test by some factor is given by the Spearman-Brown formula (also called the Spearman-Brown prophecy formula, because it tells us about an otherwise unknown state of affairs).
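A sketch of the Spearman-Brown formula in its standard form, r_new = k·r / (1 + (k − 1)·r), where r is the reliability of the original test and k is the factor by which the test is lengthened. The function name and the example values are assumptions for illustration.

```python
def spearman_brown(r, k):
    """Predicted reliability of a test lengthened by a factor k.

    r -- reliability of the original test
    k -- length of the new test divided by length of the original
    """
    return (k * r) / (1 + (k - 1) * r)

# Assumed example: an original reliability of 0.6, with the test doubled and tripled.
doubled = spearman_brown(0.6, 2)  # approximately 0.75
tripled = spearman_brown(0.6, 3)
print(doubled, tripled)
```

With k = 1 the formula returns the original reliability unchanged, and the predicted gain shrinks with each further increase in length.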
split-half reliability
the estimate of reliability obtained by correlating scores on the two halves of a test formed in some systematic way (e.g. odd versus even items)
o This method is not recommended for speeded tests (those that must be completed within a time limit).
o The odd-even split is arbitrary, and splitting the same test in different ways can give different reliability estimates.
o Useful in overcoming the logistical difficulties of test-retest reliability.
o Split-half estimates are smaller than the reliability of the full test, because each half contains only half the items; the Spearman-Brown formula is used to correct for this.
o Longer tests tend to have higher reliability.
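A minimal sketch of an odd-even split-half estimate with the Spearman-Brown correction for doubling the length. The response matrix is invented for illustration (six test takers, six items, 1 = correct), and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def split_half_reliability(responses):
    """Return (half-test correlation, Spearman-Brown corrected estimate)."""
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson(odd_totals, even_totals)
    r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown with k = 2
    return r_half, r_full

# Hypothetical data: one row per test taker, one column per item.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(r_half, r_full)
```

The corrected value is higher than the raw correlation between halves. Splitting the same items differently (for example first half versus second half) would generally give a different estimate.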