Chapter 9 Flashcards
The ability of a test to provide consistent results when repeated
Reliability
(either by the same examiner or by more than one examiner testing the same attribute on the same group of subjects)
The degree to which a test truly measures what it was intended to measure
Validity
-In valid tests, when the characteristic being measured changes, corresponding changes occur in the test measurement
(Contrast: tests with low validity do not reflect patient changes very well)
Theoretical concept involving a measurement derived from a perfect instrument in an ideal environment
-What is the equation?
True Score
-Observed score = True score + Error
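-A minimal Python sketch of this model (the score and error distributions below are assumed, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical group of 100 subjects: each observed score is the
# (unknowable) true score plus a random measurement error.
true_scores = rng.normal(loc=50, scale=10, size=100)  # assumed trait distribution
error = rng.normal(loc=0, scale=3, size=100)          # assumed random error
observed_scores = true_scores + error                 # Observed = True + Error
```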
In a group of subjects, variation of observed test scores occurs because of:
1) Individual differences of the subjects
2) Plus an error component
Errors that are attributable to the examiner, the subject, or the measuring instrument
Random Errors
-Have little effect on the group’s mean score because the errors are just as likely to be high as they are low
Errors that cause scores to move in only one direction in response to a factor that has a constant effect on the measurement system
Systematic Errors
- Considered a form of bias
- Ex: blood pressure cuff out of calibration will always generate abnormal BP readings
The proportion of observed score variance that is true score variance (i.e., true score variance divided by observed score variance).
Reliability
What is the difference between true score variance and observed score variance?
True score variance = real differences between subjects’ scores because the subjects are biologically different people
Observed score variance = true score variance plus the portion of variability that is due to faults in measurement (error variance)
Equal to the true score variance divided by the sum of the true score variance plus the error variance. Becomes larger as the error variance gets smaller
Reliability Coefficient
Ex: If error variance = 0.0, then the reliability coefficient = 1.0
-Becomes smaller (decreased reliability) as error variance gets larger
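-A short Python sketch (simulated data; the variance values are assumptions for illustration) of the reliability coefficient as true-score variance over true-plus-error variance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical variance components
true_scores = rng.normal(50, 10, size=1000)  # true-score variance ~ 100
error = rng.normal(0, 5, size=1000)          # error variance ~ 25

true_var = true_scores.var(ddof=1)
error_var = error.var(ddof=1)

# Reliability coefficient = true variance / (true variance + error variance)
reliability = true_var / (true_var + error_var)
print(f"Reliability coefficient = {reliability:.2f}")  # about 100/125 = 0.80
```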
What does a reliability coefficient of 0.75 imply?
Implies that 75% of the variance in the scores is due to the true variance of the trait being measured and 25% is due to error variance.
T/F A reliability coefficient of 0.0 implies no reliability
True
- 0.0 = no reliability
- 1.0 = perfect reliability
- >0.75 = good reliability
- 0.5-0.75 = moderate reliability
- <0.5 = poor reliability
Means that when 2 or more examiners test the same subjects for the same characteristic using the same measure, scores should match.
Inter-examiner reliability
Means that the scores should match when the same examiner tests the same subjects on two or more occasions. This is the degree to which the examiner agrees with himself or herself
Intra-examiner reliability
There should be a high degree of this between the scores of 2 examiners testing the same group of subjects, or of 1 examiner testing the same group on 2 occasions. However, it is possible to have good correlation with concurrently poor agreement.
Correlation
-High correlation with concurrently poor agreement occurs when 1 examiner consistently scores subjects higher or lower than the other examiner.
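-A tiny Python illustration (hypothetical scores) of this situation: examiner B scores every subject exactly 10 points higher than examiner A, so the correlation is perfect even though no paired scores agree:

```python
import numpy as np

examiner_a = np.array([40, 45, 50, 55, 60], dtype=float)
examiner_b = examiner_a + 10  # B consistently scores higher than A

r = np.corrcoef(examiner_a, examiner_b)[0, 1]
exact_agreement = np.mean(examiner_a == examiner_b)

print(f"Pearson r = {r:.2f}")                      # 1.00 (perfect correlation)
print(f"Exact agreement = {exact_agreement:.0%}")  # 0% (no paired scores match)
```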
Used to assess self-administered questionnaires, which are not directly controlled by the examiner. The test is administered to the same group of subjects on more than one occasion.
Test-retest reliability
-Test scores should be consistent when repeated and should correlate as well.
T/F Conditions like pain and disability status are effective parameters to use for test-retest reliability
FALSE.
-For test-retest reliability it is assumed that the condition being considered has not changed between the tests, therefore pain and disability status would not be good parameters.
Type of reliability in which 2 versions of a questionnaire or test that measure the same construct are compared. Both versions are administered to the same subjects and the scores are compared to determine the level of correlation.
Parallel forms reliability (Alternate forms reliability)
The degree to which each of the items in a questionnaire measures the targeted construct. All questions should measure various characteristics of the construct and nothing else
Internal consistency reliability
A questionnaire is administered to 1 group of subjects on 1 occasion. The results are examined to see how well the questions correlate. If reliable, each question contributes in a similar way to the questionnaire’s overall score. What type of reliability is this?
Internal consistency reliability
A measure of internal consistency that evaluates items in a questionnaire to determine the degree to which they measure the same construct. It is essentially the mean correlation among the set of items
Cronbach’s coefficient alpha
T/F
A Cronbach’s alpha rating of 0 implies perfect internal consistency while 1 represents a questionnaire that includes too many negatively correlating items. Alpha values <0.30 are generally considered acceptable.
FALSE.
- 1 = perfect internal consistency
- 0 = questionnaire includes many negatively correlating items
- >0.70 = generally considered to be acceptable
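-A minimal Python sketch of Cronbach's alpha (the item scores below are made up for illustration), computed as k/(k-1) times (1 minus the sum of item variances over the variance of total scores):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: subjects x items matrix of questionnaire scores."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of subjects' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 subjects answering 4 items on the same construct
scores = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")  # >0.70 suggests acceptable consistency
```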
Useful to visualize the results of two examiners who are evaluating the same group of patients. Inter-examiner reliability articles often present their findings in this form.
2x2 Contingency Table
-If not utilized, they are fairly easy to create from the data presented in the article
The agreement between examiners evaluating the same patients can be represented by the percentage of agreement of paired ratings. However, percentage of agreement does not account for agreement that would be expected to occur by chance.
Kappa statistic
-Even using unreliable measures, a few agreements are expected to occur just by chance. Kappa is therefore appropriate for use with dichotomous or nominal data because it accounts for chance agreement; only agreement beyond chance levels represents true agreement.
= Observed agreement minus chance agreement, divided by 1 minus chance agreement
Kappa
= the number of exact agreements divided by the number of possible agreements
Observed agreement (Po)
a.k.a. Po = (a+d)/(a+b+c+d)
- Use the Po to determine the Kappa
= the number of expected agreements divided by the number of possible agreements
Chance agreements (Pc)
a.k.a. Pc = (a(exp) + d(exp))/(a+b+c+d)
- a(exp) and d(exp) can be found using the same procedure used to calculate expected cell values in the chi-square test (multiply the row total by the column total and then divide by the grand total for cells a and d)
Kappa = __ minus __ / 1 minus __
Kappa = (Po - Pc)/(1 - Pc)
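-A short Python sketch (hypothetical cell counts) computing Po, Pc, and Kappa from a 2x2 contingency table, with the expected cells obtained chi-square style (row total times column total divided by the grand total):

```python
import numpy as np

# Hypothetical 2x2 contingency table of paired ratings
#                  Examiner 2: +   Examiner 2: -
table = np.array([[40, 10],        # Examiner 1: +   (cells a, b)
                  [ 5, 45]])       # Examiner 1: -   (cells c, d)

n = table.sum()                    # a + b + c + d
po = np.trace(table) / n           # observed agreement: (a + d) / n

# Expected count for each cell: row total * column total / grand total
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
pc = np.trace(expected) / n        # chance agreement: (a_exp + d_exp) / n

kappa = (po - pc) / (1 - pc)       # Kappa = (Po - Pc) / (1 - Pc)
print(f"Po = {po:.2f}, Pc = {pc:.2f}, Kappa = {kappa:.2f}")  # 0.85, 0.50, 0.70
```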
When the amount of observed agreement exceeds the chance agreement, what effect does this have on Kappa?
Makes Kappa positive
- Strengthens the agreement if Kappa is more positive
- If negative, the agreements are less than chance.
- Kappa of 0 = no agreement beyond chance
- Kappa of 0.8-1.0 = almost perfect agreement
Another measure of inter-examiner reliability that is for use with continuous variables. Can be used to evaluate 2 or more raters.
Intraclass Correlation Coefficient
-Can use Pearson’s r, but ICC is preferred when sample size is small (<15) or more than 2 tests are involved
How many possible types of ICC models may be utilized?
6
-The type used should always be presented in the paper (the 1st number represents the model while the 2nd number represents the form used)
Index of reliability that ranges from below 0.0 (weak reliability) to +1.0 (strong reliability).
ICC
The ratio of between groups variance to total variance.
ICC
- Between-groups variance is due to different subjects having test scores that truly differ
- Total variance adds to this the score differences resulting from inter-rater unreliability of two or more examiners rating the same person
What test is used to calculate ICC?
Two-way ANOVA
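-A hand-rolled Python sketch of one common form, ICC(2,1) (two-way random effects, single rater, per Shrout & Fleiss); the rating matrix and the choice of form are assumptions for illustration:

```python
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) from a subjects x raters matrix via two-way ANOVA mean squares."""
    n, k = ratings.shape
    grand = ratings.mean()

    # Two-way ANOVA sums of squares
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_error = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # mean square, subjects
    msc = ss_cols / (k - 1)               # mean square, raters
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical ratings: 6 subjects each scored by 3 examiners
ratings = np.array([
    [9, 10, 9],
    [6, 7, 6],
    [8, 8, 9],
    [4, 5, 5],
    [7, 8, 7],
    [10, 10, 9],
], dtype=float)
print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
```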
The ability of tests and measurements to in fact evaluate the traits that they were intended to evaluate. Vital in research as well as in clinical practice.
Validity
-The extent of a test’s validity depends on the degree to which systematic error has been controlled for.
The greater the validity, the ____ likely the test results will reflect true differences between scores
More likely
-Validity is a matter of degree and not simply “black and white,” i.e., it is better to say a test has moderate or high validity as opposed to saying the test is valid.
T/F The validity of a test is dependent on the tests intended purpose.
True
-A hand-grip dynamometer is valid for measuring grip strength but not for assessing the quality of a hand tremor.