Chapter 5: Reliability Flashcards
_____ is a synonym for dependability or consistency.
Reliability
A _____ is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
reliability coefficient
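In symbols, this is the standard classical-test-theory ratio (a general identity following directly from the definition above, not tied to any particular test):

```latex
r_{xx} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{total}}}
```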
Recall from our discussion of _____ that a score on an ability test is presumed to reflect not only the testtaker’s true score on the ability being measured but also error.
classical test theory
A statistic useful in describing sources of test score variability is the _____ (σ²), the standard deviation squared.
variance
Variance from true differences is true variance, and variance from irrelevant, random sources is _____.
error variance
The term _____ refers to the proportion of the total variance attributed to true variance. The greater the proportion of the total variance attributed to true variance, the more _____ the test.
reliability/reliable
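As an equation, the decomposition of total variance and the resulting definition of reliability (standard classical-test-theory notation) are:

```latex
\sigma^2_{\text{total}} = \sigma^2_{\text{true}} + \sigma^2_{\text{error}}
\qquad\Longrightarrow\qquad
r_{xx} = 1 - \frac{\sigma^2_{\text{error}}}{\sigma^2_{\text{total}}}
```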
1) Test construction
2) Test administration
3) Test scoring and interpretation
4) Other sources of error: underreporting and overreporting
Sources of Error Variance (4)
Sources of Error Variance:
One source of variance during test construction is item sampling or content sampling, terms that refer to variation among items within a test as well as to variation among items between tests.
Test construction
Sources of Error Variance:
Test environment: the room temperature, the level of lighting, and the amount of ventilation and noise, for instance.
Testtaker variables: pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication can all be sources of error variance.
Examiner-related variables: the examiner's physical appearance and demeanor, the presence or absence of an examiner, inadvertent emphasis on key words when administering an oral exam, and nonverbal cues that signal whether a response is correct.
Test administration
Sources of Error Variance:
The advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer differences in many tests. If subjectivity is involved in scoring, however, the scorer (or rater) can be a source of error variance.
Test scoring and interpretation
Reliability Estimates (4)
1) Test-Retest Reliability Estimates
2) Parallel-Forms and Alternate-Forms Reliability Estimates
3) Split-Half Reliability Estimates
4) Other Methods of Estimating Internal Consistency:
a) Inter-item consistency
b) The Kuder-Richardson formulas
c) Coefficient alpha
Reliability Estimates:
An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test; that is, using the same instrument to measure the same thing at two points in time.
**The passage of time can be a source of error variance. The longer the time that passes, the greater the likelihood that the reliability coefficient will be lower.
**Even when the time period between the two administrations of the test is relatively brief, various factors (such as experience, practice, memory, fatigue, and motivation) may intervene and confound an obtained measure of reliability.
Test-Retest Reliability Estimates
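A minimal sketch of the computation, using made-up scores (the same pairs-of-scores Pearson r also underlies alternate-forms and inter-scorer estimates):

```python
import numpy as np

# Hypothetical scores for the same ten testtakers at two points in time.
time1 = np.array([12, 15, 9, 20, 14, 11, 18, 16, 13, 17])
time2 = np.array([13, 14, 10, 19, 15, 10, 18, 17, 12, 16])

# The test-retest reliability estimate is the Pearson r between the
# two administrations; np.corrcoef returns a 2x2 correlation matrix.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r_test_retest:.2f}")
```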
Reliability Estimates:
The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, often termed the coefficient of equivalence.
Ex.: Both groups take both forms: group A takes form A first, and group B takes form B first. The results of the two forms are compared, and nearly identical results indicate high parallel-forms reliability.
Put simply, you're trying to find out whether form A measures the same thing as form B.
Source of error variance: item sampling.
Cons: developing alternate forms is time-consuming and expensive.
Parallel-Forms and Alternate-Forms Reliability Estimates
Reliability Estimates:
An estimate of reliability obtained by correlating pairs of scores from equivalent halves of a single test administered once.
One acceptable way to split a test is to randomly assign items to one or the other half; another is to assign odd-numbered items to one half and even-numbered items to the other, which yields an estimate also referred to as _____.
odd-even reliability
**The Spearman-Brown formula is used to estimate the reliability of the full-length test from the correlation between its two halves.
Split-Half Reliability Estimates
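A minimal sketch of an odd-even split followed by the Spearman-Brown correction, assuming a hypothetical people-by-items matrix of dichotomously scored responses:

```python
import numpy as np

# Hypothetical item-response matrix: 6 people x 8 dichotomous items.
X = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
])

# Odd-even split: odd-numbered items (column indices 0, 2, ...) in one
# half, even-numbered items (column indices 1, 3, ...) in the other.
odd_half = X[:, 0::2].sum(axis=1)
even_half = X[:, 1::2].sum(axis=1)

# Correlation between the two half-test scores...
r_hh = np.corrcoef(odd_half, even_half)[0, 1]

# ...then the Spearman-Brown correction estimates full-length
# reliability from the half-test correlation: r_SB = 2r / (1 + r).
r_sb = 2 * r_hh / (1 + r_hh)
print(f"half-test r = {r_hh:.2f}, Spearman-Brown r = {r_sb:.2f}")
```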
Reliability Estimates: Other Methods of Estimating Internal Consistency
Refers to the degree of correlation among all the items on a scale. A measure of inter-item consistency is calculated from a single administration of a single form of a test. An index of inter-item consistency, in turn, is useful in assessing the homogeneity of the test.
Tests are said to be homogeneous if they contain items that measure a single trait.
The more homogeneous a test is, the more _____ it can be expected to have.
Inter-item consistency
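One common index of inter-item consistency is the average inter-item correlation, sketched here with hypothetical data:

```python
import numpy as np

def average_interitem_r(X):
    """Mean correlation among all pairs of items (columns of X)."""
    R = np.corrcoef(X, rowvar=False)               # item-by-item correlations
    off_diag = R[~np.eye(R.shape[0], dtype=bool)]  # drop the 1s on the diagonal
    return off_diag.mean()

# Hypothetical scored responses: 5 people x 4 items.
X = np.array([
    [4, 5, 4, 3],
    [2, 3, 3, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
print(f"average inter-item r = {average_interitem_r(X):.2f}")
```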
Reliability Estimates: Other Methods of Estimating Internal Consistency
Dissatisfaction with existing split-half methods of estimating reliability compelled G. Frederic Kuder and M. W. Richardson to develop their own measures for estimating reliability.
a measure of internal consistency reliability for measures with dichotomous choices
**Coefficient alpha may be thought of as a generalization of KR-20 to tests with nondichotomous items.
The Kuder-Richardson formulas (Kuder-Richardson formula 20, or KR-20)
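The KR-20 formula in its standard form, where k is the number of items, p_j is the proportion of testtakers answering item j correctly, q_j = 1 − p_j, and σ² is the variance of total test scores:

```latex
r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j\, q_j}{\sigma^2}\right)
```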
Reliability Estimates: Other Methods of Estimating Internal Consistency
In contrast to KR-20, which is appropriately used only on tests with dichotomous items, _____ is appropriate for use on tests containing “nondichotomous items”.
is the preferred statistic for obtaining an estimate of internal consistency reliability. Coefficient alpha is widely used as a measure of reliability, in part because it requires only one administration of the test.
Coefficient alpha
Reliability Estimates: Other Methods of Estimating Internal Consistency
Unlike a Pearson r, which may range in value from -1 to +1, coefficient alpha typically ranges in value from _____.
0 to 1
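A minimal sketch of coefficient alpha, assuming hypothetical Likert-type data; alpha replaces the Σp_j q_j term of KR-20 with the sum of item variances, which is what makes it suitable for nondichotomous items:

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha for a people x items score matrix X."""
    k = X.shape[1]                         # number of items
    item_vars = X.var(axis=0, ddof=1)      # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert-type responses: 5 people x 4 items.
X = np.array([
    [4, 5, 4, 3],
    [2, 3, 3, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
print(f"coefficient alpha = {cronbach_alpha(X):.2f}")
```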
Variously referred to as scorer reliability, judge reliability, observer reliability, and inter-rater reliability, inter-scorer reliability is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
A common problem is a lack of clarity in scoring criteria. Solutions include rewriting the scoring criteria section of the manual to include clearly written scoring rules, group discussion, practice exercises, and information on rater accuracy.
Measures of Inter-Scorer Reliability
Perhaps the simplest way of determining the degree of consistency among scorers in the scoring of a test is to calculate a coefficient of correlation. This correlation coefficient is referred to as a _____.
coefficient of inter-scorer reliability
Using and Interpreting a Coefficient of Reliability
Three approaches to the estimation of reliability: (3)
1) test-retest
2) alternate or parallel forms, and
3) internal or inter-item consistency
Another question that is linked in no trivial way to the purpose of the test is, “How high should the coefficient of reliability be?” Perhaps the best “short answer” to this question is: “On a continuum relative to the purpose and importance of the decisions to be made on the basis of _____ on the test”.
scores
The Nature of the Test
Closely related to considerations concerning the purpose and use of a reliability coefficient are those concerning the nature of the test itself, including whether: (5)
1) test items are homogeneous or heterogeneous in nature;
2) the characteristic, ability, or trait being measured is presumed to be dynamic or static;
3) the range of test scores is or is not restricted;
4) the test is a speed or a power test; and
5) the test is or is not criterion-referenced
| Type of reliability | # of testing sessions | # of test forms | Source of error variance | Statistical procedure |
|---|---|---|---|---|
| Test-retest | 2 | 1 | Administration | Pearson r or Spearman rho |
| Alternate-forms | 1 or 2 | 2 | Test construction or administration | Pearson r or Spearman rho |
| Internal consistency | 1 | 1 | Test construction | Pearson r |
| Inter-scorer | 1 | 1 | Scoring and interpretation | Pearson r or Spearman rho |