Study Guide 9: Reliability: Estimation, Interpretation, & Impact Flashcards
Adjusted true score estimate
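Takes measurement error into account and adjusts the point estimate toward the group mean: $X_{est} = \bar{X} + R_{xx}(X_o - \bar{X})$, where $X_o$ is the observed score, $\bar{X}$ is the group mean, and $R_{xx}$ is the reliability.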
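As a rough illustration of the formula above, the sketch below computes an adjusted true score estimate. The function name and the numbers are hypothetical, chosen only to show the arithmetic.

```python
def adjusted_true_score(observed, group_mean, reliability):
    """Adjusted true score estimate: X_est = mean + R_xx * (X_o - mean)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return group_mean + reliability * (observed - group_mean)


# Hypothetical values: observed score 130, group mean 100, reliability 0.80.
estimate = adjusted_true_score(130, 100, 0.80)
print(round(estimate, 2))  # the estimate is pulled from 130 toward the mean of 100
```

The lower the reliability, the further the estimate is pulled toward the mean; with perfect reliability it equals the observed score.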
Alternate forms reliability
method for estimating reliability of test scores by obtaining scores from two different forms of a test and computing the correlation between them.
(Cronbach’s) coefficient alpha
Most widely used method for estimating reliability. Useful for determining the extent to which responses to a test's items (or the ratings from multiple raters) are internally consistent.
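A minimal sketch of raw coefficient alpha, assuming the common variance-based formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name and the data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Raw coefficient alpha.

    item_scores: one row per respondent, each row a list of k item scores.
    """
    k = len(item_scores[0])
    # Variance of each item (column) across respondents.
    item_variance_sum = sum(pvariance(col) for col in zip(*item_scores))
    # Variance of the composite (total) scores.
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents, 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 3))
```

Population variances are used throughout; using sample variances for both the items and the total gives the same ratio.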
Cohen’s Kappa
Measures the agreement between two raters (or two measurement methods) who each classify items into mutually exclusive categories. It is a more robust measure of agreement than simple percentage agreement because it corrects for the agreement that would be expected by chance alone. Thus kappa = 0 indicates agreement at chance level only, and kappa = 1.0 indicates perfect chance-corrected agreement. Common benchmarks: 0-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, 0.81-1.0 = almost perfect.
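The chance correction can be sketched as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category frequencies. The function name and ratings below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: the raters agree on 3 of 4 items.
rater_a = ["yes", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "no"]
kappa = cohens_kappa(rater_a, rater_b)
print(kappa)  # 0.5
```

Here raw agreement is 75%, but half of that would be expected by chance, so kappa falls in the "moderate" band.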
Composite score
If a test includes multiple items and the overall test score is computed from the responses to those items, that overall score is the composite score.
Confidence interval or error band
Reflects the accuracy or precision of the point estimate as an indication of an individual's true score. The greater the SEM (standard error of measurement), the greater the average difference between observed scores and true scores.
95% confidence interval = $X_o \pm (1.96)(SEM)$
68% ($\pm 1$ SEM), 95% ($\pm 1.96$ SEM), 99% ($\pm 2.58$ SEM)
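A small sketch of the interval calculation, using the three multipliers listed above. The function name and the example score and SEM are hypothetical.

```python
# z multipliers for the confidence levels listed in the card.
Z_VALUES = {68: 1.00, 95: 1.96, 99: 2.58}


def confidence_interval(observed, sem, level=95):
    """Return (lower, upper) bounds: X_o +/- z * SEM."""
    half_width = Z_VALUES[level] * sem
    return observed - half_width, observed + half_width


# Hypothetical observed score of 100 with an SEM of 5.
low, high = confidence_interval(100, 5, level=95)
print(round(low, 1), round(high, 1))  # 90.2 109.8
```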
Correction for attenuation
A statistical procedure, due to Spearman (1904), to “rid a correlation coefficient from the weakening effect of measurement error.”
Essential tau equivalence
Holds when two tests measure the same psychological construct. It rests on more liberal assumptions than the parallel tests model; i.e., the assumption of equal error variances is not required. Because alpha rests on this model, estimates from alpha are likely to be accurate more often than those from methods such as the split-half approach.
Internal consistency reliability
Practical alternative to the alternate forms and test-retest procedures.
Inter-rater reliability
How repeatable the scores are when two or more different people are scoring or observing the same behavior.
Kuder-Richardson formula 20 or KR-20
measure of internal consistency reliability for measures with binary items. It is analogous to Cronbach’s alpha, but for dichotomous choices.
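A minimal sketch of KR-20, assuming the usual formula KR-20 = (k / (k - 1)) * (1 - sum of p*q / variance of total scores), where p is the proportion answering an item correctly and q = 1 - p. The function name and the 0/1 data are hypothetical.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for dichotomous items.

    responses: one row per respondent, each row a list of 0/1 item scores.
    """
    n = len(responses)
    k = len(responses[0])
    pq_sum = 0.0
    for item in zip(*responses):
        p = sum(item) / n          # proportion scoring 1 on this item
        pq_sum += p * (1 - p)      # variance of a binary item
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_variance)


# Hypothetical data: 4 respondents, 3 right/wrong items.
answers = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(kr20(answers))  # 0.75
```

Because p*q is the variance of a 0/1 item, this is the same calculation as coefficient alpha applied to binary data.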
Point estimate
single best estimate of the quantity of an underlying psychological attribute at the moment the individual took the test
Random (unsystematic) error
Caused by any factors that randomly affect measurement of the variable across the sample. It does not have any consistent effect across the entire sample. Instead, it pushes observed scores up or down randomly. This means that if we could see all of the random errors in a distribution, they would have to sum to 0: the negative errors would cancel out the positive ones. The important property of random error is that it adds variability to the data but does not affect average performance for the group.
Regression to the mean
Tendency for an individual's score on a second testing to be closer to the group mean than his or her first score was.
Spearman-Brown correction
Formula that allows you to calculate the reliability of a revised test (i.e., a test that has been lengthened or shortened).
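The Spearman-Brown prophecy formula can be sketched as R_new = n * R / (1 + (n - 1) * R), where n is the factor by which the test length changes. The function name and example values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    length_factor = 2 doubles the test; 0.5 halves it.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.60.
doubled = spearman_brown(0.60, 2)    # lengthened to twice as many items
halved = spearman_brown(0.60, 0.5)   # shortened to half as many items
print(round(doubled, 2), round(halved, 2))  # 0.75 0.43
```

The formula assumes the added or removed items are comparable to the existing ones.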
Split-half estimate of reliability
Splitting a test into two parallel halves of equal size and correlating performance on those halves.
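A rough sketch of a split-half estimate. It assumes an odd/even item split (one of several possible splits) and steps the half-test correlation up to full length with the Spearman-Brown formula, 2r / (1 + r). Function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Odd/even split-half estimate, stepped up to full test length."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_halves = pearson_r(odd_half, even_half)
    return 2 * r_halves / (1 + r_halves)


# Hypothetical responses: 5 respondents, 4 items.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(split_half_reliability(responses), 3))
```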
Standardized coefficient alpha
Relies only on the pairwise correlations between items. First, take the average of all the interitem correlations; this reflects the degree to which responses to all of the items are generally consistent with each other. Then estimate reliability by entering the average interitem correlation into the Spearman-Brown formula.
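The two steps above can be sketched as follows: average the interitem correlations, then apply alpha = k * r_mean / (1 + (k - 1) * r_mean). The function name and the correlations are hypothetical.

```python
def standardized_alpha(interitem_correlations, n_items):
    """Standardized alpha from the pairwise interitem correlations."""
    mean_r = sum(interitem_correlations) / len(interitem_correlations)
    return (n_items * mean_r) / (1 + (n_items - 1) * mean_r)


# Hypothetical 4-item test: the 6 pairwise correlations average 0.40.
correlations = [0.3, 0.4, 0.5, 0.3, 0.4, 0.5]
alpha_std = standardized_alpha(correlations, n_items=4)
print(round(alpha_std, 3))  # 0.727
```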
Describe the relationship between (a) reliability and SEM, and (b) reliability and confidence intervals
(a) A larger SEM means lower reliability.
(b) More reliable tests produce narrower confidence intervals.
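Both relationships follow from the standard formula SEM = s_o * sqrt(1 - R_xx), where s_o is the standard deviation of observed scores. The sketch below compares two hypothetical reliability levels; the function names and values are illustrative only.

```python
import math


def standard_error_of_measurement(sd_observed, reliability):
    """SEM = s_o * sqrt(1 - R_xx)."""
    return sd_observed * math.sqrt(1 - reliability)


def ci_width(sd_observed, reliability, z=1.96):
    """Full width of the confidence interval around an observed score."""
    return 2 * z * standard_error_of_measurement(sd_observed, reliability)


# Hypothetical test with an observed-score SD of 15 at two reliability levels.
for r_xx in (0.60, 0.90):
    sem = standard_error_of_measurement(15, r_xx)
    width = ci_width(15, r_xx)
    print(f"R_xx={r_xx:.2f}  SEM={sem:.2f}  95% CI width={width:.2f}")
```

The more reliable version of the test has the smaller SEM and therefore the narrower interval.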
Describe the implications of reliable measures for research
1) Observed correlations (i.e., between measures) will always be weaker than true correlations (i.e., between psychological constructs) when the measures are less than perfectly reliable.
2) The degree of attenuation is determined by the reliabilities of the measures: the poorer the measure, the greater the attenuation.
3) Error constrains the maximum correlation that could be found between two measures.
4) It is possible to estimate the true correlation between a pair of constructs. By knowing the observed correlation between the measures and their estimated reliabilities, researchers can solve for the true correlation.
a. The equation used for this is called the “correction for attenuation” because it allows researchers to estimate the correlation that would be obtained if it were not affected by attenuation.
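Points 3 and 4 can be sketched with the standard correction for attenuation, r_true = r_observed / sqrt(R_xx * R_yy). The function names and the numbers are hypothetical.

```python
import math


def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Estimate the true-score correlation from an observed correlation."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


def max_observed_correlation(reliability_x, reliability_y):
    """Ceiling on the observed correlation when the true correlation is 1."""
    return math.sqrt(reliability_x * reliability_y)


# Hypothetical study: observed r = .36, reliabilities of .81 and .64.
r_true = correct_for_attenuation(0.36, 0.81, 0.64)
ceiling = max_observed_correlation(0.81, 0.64)
print(round(r_true, 2), round(ceiling, 2))  # 0.5 0.72
```

With these reliabilities, even perfectly correlated constructs could produce an observed correlation of at most about .72.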
Understand what elements contribute to the correlation between two (observed) scores
a) the correlation between the true scores of the two psychological constructs being assessed
b) the reliabilities of the two measures
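In classical test theory these two elements combine multiplicatively. The notation below is an assumption for illustration: $r_{x_o y_o}$ is the observed correlation, $r_{x_t y_t}$ is the true-score correlation, and $R_{xx}$ and $R_{yy}$ are the reliabilities of the two measures.

```latex
r_{x_o y_o} = r_{x_t y_t} \sqrt{R_{xx} R_{yy}}
```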
Change scores
The change in a score on one test from one point in time to another; a type of difference score. Concern variability.
Discrepancy scores
Difference scores computed by subtracting scores on one type of test (e.g., an achievement test) from scores on a different type of test (e.g., an IQ test).
To create discrepancy scores, the test scores used in the calculation should be on similar metrics. Thus, if scores on two tests are in different metrics, the scores must be standardized before difference scores and a discrepancy are calculated.
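One way to put two tests on a common metric is to convert both to z-scores before subtracting. The sketch below assumes that approach; the function names and scores are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize scores to mean 0 and SD 1 within the group."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]


def discrepancy_scores(test_a, test_b):
    """Standardize both tests, then subtract test_b from test_a per person."""
    return [a - b for a, b in zip(z_scores(test_a), z_scores(test_b))]


# Hypothetical scores for three people on tests with different metrics.
iq = [85, 100, 115]
achievement = [60, 50, 40]
discrepancies = discrepancy_scores(iq, achievement)
print([round(d, 2) for d in discrepancies])
```

A discrepancy near 0 means the person holds the same relative standing on both tests; large positive or negative values flag a mismatch.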
Understand and describe the difference and relationship between internal consistency and dimensionality
Internal consistency doesn't necessarily mean a test is unidimensional, although this is a tempting conclusion. An internal consistency estimate could be high (e.g., alpha = .75) even if a test is multidimensional, because the test might contain sets of items that correlate highly with each other within each set while items from different sets correlate only weakly.
Factor analysis is a more appropriate method for evaluating dimensionality.
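To see how a multidimensional test can still show high internal consistency, the sketch below builds a hypothetical 12-item test made of two 6-item sets. Items correlate .50 within a set and .00 across sets, and the standardized alpha formula is applied to the average interitem correlation. All names and values are assumptions for illustration.

```python
from itertools import combinations


def standardized_alpha_from_clusters(cluster_of_item, within_r, between_r):
    """Standardized alpha for items grouped into clusters.

    cluster_of_item: cluster label for each item.
    within_r / between_r: correlation for item pairs in the same / different cluster.
    """
    k = len(cluster_of_item)
    correlations = [
        within_r if cluster_of_item[i] == cluster_of_item[j] else between_r
        for i, j in combinations(range(k), 2)
    ]
    mean_r = sum(correlations) / len(correlations)
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Two completely uncorrelated dimensions, six items each.
clusters = [0] * 6 + [1] * 6
alpha = standardized_alpha_from_clusters(clusters, within_r=0.5, between_r=0.0)
print(round(alpha, 2))  # 0.78
```

Alpha comes out near .78 even though the test clearly measures two unrelated dimensions, which is why a high alpha alone is not evidence of unidimensionality.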