Chapter 9 Flashcards

1
Q

The ability of a test to provide consistent results when repeated

A

Reliability

(either by the same examiner or by more than one examiner testing the same attribute on the same group of subjects)

2
Q

The degree to which a test truly measures what it was intended to measure

A

Validity

-In valid tests, when the characteristic being measured changes, corresponding changes occur in the test measurement

(Contrast: tests with low validity do not reflect patient changes very well)

3
Q

Theoretical concept involving a measurement derived from a perfect instrument in an ideal environment

-What is the equation?

A

True Score

-Observed score = True score + Error
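The equation can be illustrated with a small simulation (a hypothetical sketch; the trait value and error spread are invented for illustration):

```python
import random

random.seed(0)

def observed_score(true_score, error_sd=2.0):
    # Observed score = True score + Error (random measurement error)
    return true_score + random.gauss(0, error_sd)

# A subject's true score is fixed; repeated measurements scatter around it.
true = 50.0
measurements = [observed_score(true) for _ in range(1000)]
mean_observed = sum(measurements) / len(measurements)
# Random errors are as likely high as low, so the mean stays near the true score.
```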

4
Q

In a group of subjects, variation of true scores occurs because of:

A

1) Individual differences of the subjects
2) Plus an error component

5
Q

Errors that are attributable to the examiner, the subject, or the measuring instrument

A

Random Errors

-Have little effect on the group’s mean score because the errors are just as likely to be high as they are low

6
Q

Errors that cause scores to move in only one direction in response to a factor that has a constant effect on the measurement system

A

Systematic Errors

  • Considered a form of bias
  • Ex: blood pressure cuff out of calibration will always generate abnormal BP readings
7
Q

The ratio of true score variance to observed score variance.

A

Reliability

8
Q

What is the difference between true score variance and observed score variance?

A

True score variance = real differences between subjects’ scores due to subjects being biologically different people

Observed score variance = the true score variance plus the error variance (the portion of variability due to faults in measurement)

9
Q

Equal to the true score variance divided by the sum of the true score variance plus the error variance. Becomes larger as the error variance gets smaller

A

Reliability Coefficient

Ex: If error variance = 0.0, then the reliability coefficient = 1.0

-Becomes smaller (decreased reliability) as error variance gets larger
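The definition on this card can be expressed directly (a minimal sketch; the variance values are made up for illustration):

```python
def reliability_coefficient(true_var, error_var):
    # Reliability = true score variance / (true score variance + error variance)
    return true_var / (true_var + error_var)

# Error variance of 0.0 -> perfect reliability of 1.0
perfect = reliability_coefficient(true_var=10.0, error_var=0.0)

# As error variance grows, the coefficient shrinks (decreased reliability)
noisy = reliability_coefficient(true_var=10.0, error_var=10.0)
```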

10
Q

What does a reliability coefficient of 0.75 imply?

A

Implies that 75% of the variance in the scores is due to the true variance of the trait being measured and 25% is due to the error variance.

11
Q

T/F A reliability coefficient of 0.0 implies no reliability

A

True

  • 0.0 = no reliability
  • 1.0 = perfect reliability
  • >0.75 = good reliability
  • 0.5-0.75 = moderate reliability
  • <0.5 = poor reliability

12
Q

Means that when 2 or more examiners test the same subjects for the same characteristic using the same measure, scores should match.

A

Inter-examiner reliability

13
Q

Means that the scores should match when the same examiner tests the same subjects on two or more occasions. This is the degree that the examiner agrees with himself or herself

A

Intra-examiner reliability

14
Q

There should be a high degree of this between scores of 2 examiners testing the same group of subjects or 1 examiner testing the same group on 2 occasions. However, it is possible to have good this and concurrent poor agreement.

A

Correlation

-High correlation and concurrent poor agreement occurs when 1 examiner consistently scores subjects higher or lower than the other examiner.

15
Q

Used to assess self-administered questionnaires, which are not directly controlled by the examiner. The test is administered to the same group of subjects on more than one occasion.

A

Test-retest reliability

-Test scores should be consistent when repeated and should correlate as well.

16
Q

T/F Conditions like pain and disability status are effective parameters to use for test-retest reliability

A

FALSE.

-For test-retest reliability it is assumed that the condition being considered has not changed between the tests, therefore pain and disability status would not be good parameters.

17
Q

Type of reliability in which 2 versions of a questionnaire or test that measure the same construct are compared. Both versions are administered to the same subjects and the scores are compared to determine the level of correlation.

A

Parallel forms reliability (Alternate forms reliability)

18
Q

The degree to which each of the items in a questionnaire measures the targeted construct. All questions should measure various characteristics of the construct and nothing else

A

Internal consistency reliability

19
Q

A questionnaire is administered to 1 group of subjects on 1 occasion. The results are examined to see how well the questions correlate. If reliable, each question contributes in a similar way to the questionnaire’s overall score. What type of reliability is this?

A

Internal consistency reliability

20
Q

A measure of internal consistency that evaluates items in a questionnaire to determine the degree that they measure the same construct. Is essentially the mean correlation between each of a set of items

A

Cronbach’s coefficient alpha
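One standard computational form (an assumption; the card itself describes alpha only as a mean inter-item correlation) is alpha = k/(k-1) × (1 − Σ item variances / variance of total scores). A sketch with made-up item scores:

```python
from statistics import pvariance

def cronbach_alpha(items):
    # items: one list of scores per question, each covering the same subjects
    k = len(items)
    item_vars = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # each subject's total score
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

# Three items that move together across four subjects -> high alpha
items = [[2, 3, 4, 5],
         [2, 3, 4, 5],
         [1, 3, 4, 5]]
alpha = cronbach_alpha(items)
```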

21
Q

T/F

A Cronbach’s alpha rating of 0 implies perfect internal consistency while 1 represents a questionnaire that includes too many negatively correlating items. Alpha values <0.30 are generally considered acceptable.

A

FALSE.

  • 1 = perfect internal consistency
  • 0 = questionnaire includes many negatively correlating items
  • >0.70 = generally considered to be acceptable
22
Q

Useful to visualize the results of two examiners who are evaluating the same group of patients. Inter-examiner reliability articles often present their findings in this form.

A

2x2 Contingency Table

-If not utilized, they are fairly easy to create from the data presented in the article

23
Q

The agreement between examiners evaluating the same patients can be represented by the percentage of agreement of paired ratings. However, percentage of agreement does not account for agreement that would be expected to occur by chance.

A

Kappa statistic

-Even using unreliable measures, a few agreements are expected to occur just by chance. Kappa is therefore appropriate for use with dichotomous or nominal data because it accounts for chance, so that agreement occurring beyond chance levels represents true agreement.

24
Q

= Observed agreement minus chance agreement, divided by 1 minus chance agreement

A

Kappa

25
Q

= the number of exact agreements divided by the number of possible agreements

A

Observed agreement (Po)

a.k.a. Po = (a+d)/(a+b+c+d)
- Use the Po to determine the Kappa

26
Q

= the number of expected agreements divided by the number of possible agreements

A

Chance agreements (Pc)

a.k.a. Pc = (a(exp) + d(exp))/(a+b+c+d)
- a(exp) and d(exp) can be found using the same procedure used to calculate expected cell values in the chi-square test (multiply the row total by the column total and then divide by the grand total for cells a and d)

27
Q

Kappa = __ minus __ / 1 minus __

A

Kappa = (Po - Pc)/(1 - Pc)
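Putting the Po, Pc, and Kappa cards together, the whole calculation can be run from a 2x2 contingency table (the cell counts a-d are invented for illustration):

```python
def kappa(a, b, c, d):
    # a, d = agreement cells; b, c = disagreement cells of a 2x2 table
    n = a + b + c + d
    po = (a + d) / n                    # observed agreement
    a_exp = (a + b) * (a + c) / n       # expected count for cell a
    d_exp = (c + d) * (b + d) / n       # expected count for cell d
    pc = (a_exp + d_exp) / n            # chance agreement
    return (po - pc) / (1 - pc)

k = kappa(a=40, b=5, c=5, d=50)   # two examiners, mostly agreeing
```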

28
Q

When the amount of observed agreement exceeds the chance agreement, what effect does this have on Kappa?

A

Makes Kappa positive

  • Strengthens the agreement if Kappa is more positive
  • If negative, the agreements are less than chance.

0 Kappa = no agreement

0.8-1.0 = almost perfect agreement

29
Q

Another measure of inter-examiner reliability that is for use with continuous variables. Can be used to evaluate 2 or more raters.

A

Intraclass Correlation Coefficient

-Can use Pearson’s r, but ICC is preferred when sample size is small (<15) or more than 2 tests are involved

30
Q

How many possible types of ICC models may be utilized?

A

6

-The type used should always be presented in the paper (the 1st number represents the model while the 2nd number represents the form used)

31
Q

Index of reliability that ranges from below 0.0 (weak reliability) to +1.0 (strong reliability).

A

ICC

32
Q

The ratio of between groups variance to total variance.

A

ICC

  • Between-groups variance is due to different subjects having test scores that truly differ
  • Total variance also includes error variance, such as score differences resulting from inter-rater unreliability of two or more examiners rating the same person.
33
Q

What test is used to calculate ICC?

A

Two-way ANOVA

34
Q

The ability of tests and measurements to in fact evaluate the traits that they were intended to evaluate. Vital in research as well as in clinical practice.

A

Validity

-The extent of a test’s validity depends on the degree to which systematic error has been controlled for.

35
Q

The greater the validity, the ____ likely the test results will reflect true differences between scores

A

More likely

-Validity is a matter of degree and not simply “black and white,” i.e. it is better to say a test has moderate or high validity as opposed to saying the test is valid.

36
Q

T/F The validity of a test is dependent on the test’s intended purpose.

A

True

-a hand-grip dynamometer is valid to measure grip strength but not for the quality of a hand tremor.

37
Q

T/F An invalid test can never be reliable.

A

FALSE

-Ex: a test used skull circumference to predict intelligence. Poor validity, but the reliability could still be very high.

38
Q

Method to test validity. Asks if the test appears to measure what it is supposed to measure.

A

Self-evident

39
Q

Method to test validity. Asks if the test actually works as hypothesized.

A

Pragmatic

40
Q

Method to test validity. Asks if the test adequately measures the theoretical construct involved.

A

Construct validity.

41
Q

Method to test validity. Simply deciding whether a test appears to have merit based on face value. It is the lowest level of test validation and is often assessed when researchers are first exploring a topic.

A

Face validity

42
Q

The ability of a test to include or represent all of the content of a construct.

A

Content validity

-a.k.a. the content of a test is compared to the literature that is already available on the topic (the test is said to have good content validity if it accurately reflects what is in the literature)

43
Q

Pragmatic method. The degree a test corresponds with an external criterion that is an independent measure of the characteristic being tested.

A

Criterion related validity

  • A valid test should correlate well with or predict some relevant criterion.
  • Concurrent and predictive validity are subgroups
44
Q

Pragmatic method. The results of a new test are compared with an established test to see if they are well correlated. Both tests are given at the same time.

A

Concurrent validity.

45
Q

A.k.a. a reference standard. A test that is generally acknowledged to be the best available. The value of a concurrent validity trial depends greatly on the quality of the gold standard that is used.

A

Gold standard test

46
Q

Pragmatic method. The extent to which a test effectively measures a theoretical construct (like pain or disability). The characteristic is not observed directly; rather, an abstraction of the characteristic that corresponds to the construct under consideration is observed (pain scale or disability questionnaire).

A

Construct validity

47
Q

Pragmatic method. Has to do with the degree of correlation that exists between a new test and another measure of the same or similar constructs. A test good at this correlates well with another measure of the same construct.

A

Convergent validity

48
Q

Pragmatic method. Opposite of convergent validity, where the new test is weakly related or unrelated to another measure that it should in fact be different from. A test good at this should be able to separate patients into different groups.

A

Discriminant validity

49
Q

What are some causes for validity and reliability scores to be systematically off center?

A
  • Results from bias
  • Test environment is faulty, causing all scores to be inaccurate.
  • Scores miss the bull’s eye in one direction.
50
Q

T/F

Accurate tests are free from random error while precise tests are free from bias.

A

FALSE

  • Accurate tests are free from bias
  • Precise tests are free from random error
51
Q

A specified value above which scores are considered positive and below which scores are considered negative.

A

Cutoff points

52
Q

The ideal diagnostic test would always correctly what?

A

Discriminate between those with and those without the condition

  • Always positive for those with the condition
  • Always negative for those without it
53
Q

The ability of a test to correctly identify people who have the target disorder.

A

Sensitivity

54
Q

The ability of a test to correctly identify people who do not have the target disorder.

A

Specificity

55
Q

Commonly used to assess the validity of tests, expressed as a percentage.

  • 0% = none
  • 100% = perfect
A

Sensitivity and specificity

-Can use a 2x2 contingency table.

56
Q

= a/(a+c) × 100 = TP/(TP+FN) × 100

A

Sensitivity

57
Q

= d/(b+d) × 100 = TN/(TN+FP) × 100

A

Specificity
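The two formulas above translate directly into code (a minimal sketch; the cell counts are invented for illustration):

```python
def sensitivity(tp, fn):
    # Proportion of people WITH the disorder whom the test correctly flags
    return tp / (tp + fn) * 100   # a/(a+c) x 100

def specificity(tn, fp):
    # Proportion of people WITHOUT the disorder whom the test correctly clears
    return tn / (tn + fp) * 100   # d/(b+d) x 100

# In 2x2 terms: a = TP, b = FP, c = FN, d = TN
sens = sensitivity(tp=90, fn=10)
spec = specificity(tn=80, fp=20)
```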

58
Q

In tests that have very high sensitivity, a negative test will ____ the condition under consideration.

A

Rule out

  • This is because there are very few false negatives in tests with very high sensitivity.
  • If a test with very high sensitivity is negative, it is very likely a true negative.
59
Q

In tests that have a very high specificity, a positive test will ______ condition under consideration

A

Rule in

-This is because there are very few false positives in tests with very high specificity.

60
Q

T/F

If a cutoff point is lowered, sensitivity is increased, but there are also more false positives.

A

True

-Specificity would therefore diminish.

61
Q

In tests with low sensitivity _____

A

People with the target disorder will be missed (false negatives)

62
Q

In tests with low specificity _________

A

People who do not actually have the target disorder will be identified as having it (false positive)

63
Q

Tests with ____ sensitivity may be suitable when the consequences of reporting false positive findings to a patient are minor

A

High

-Incorrectly reporting to a patient that their triglycerides are elevated, which results in them shifting to a healthier lifestyle.

64
Q

Tests with ____ specificity are better when false positive findings lead to painful or expensive treatment.

A

High

-A test that leads to surgical intervention, so false positives must be minimized.

65
Q

What are some of the implications for sensitivity and specificity?

A
  • When screening for rare conditions, many false positives may result since very few true cases are present to be detected, even when highly specific tests are used.
  • This is not a serious problem if positive screening leads to confirmatory testing.
  • When screening for common conditions, many cases may be overlooked, even when a highly sensitive test is used.
66
Q

The probability that a positive test will correctly identify people who have the target disorder.

A

Positive predictive value

a/(a+b)

67
Q

The probability that a negative test will correctly identify people who do not have the target disorder

A

Negative predictive value

d/(c+d)
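The two predictive values can be sketched from the same 2x2 cells (counts invented for illustration):

```python
def ppv(tp, fp):
    # Of all positive tests, the fraction that are true positives: a/(a+b)
    return tp / (tp + fp)

def npv(tn, fn):
    # Of all negative tests, the fraction that are true negatives: d/(c+d)
    return tn / (tn + fn)

p = ppv(tp=90, fp=20)   # 90 of 110 positives are real
n = npv(tn=80, fn=10)   # 80 of 90 negatives are real
```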

68
Q

The probability that the results of a diagnostic test would be expected in a patient with the condition of interest (________), compared to the expected results of the same test in a patient without the condition (_________)

A

With condition = sensitivity

Without condition = specificity

69
Q

A ratio of the probability of a positive test in a person with the condition compared to the probability of a positive test in a person without the condition.

A

Likelihood Ratio of a positive test.

-{a/(a+c)} / {1 - d/(b+d)}, a.k.a. sensitivity/(1 - specificity).

70
Q

T/F

LR >1 implies the probability that the condition is present is increased.

A

LR > 1 = probability is increased

LR < 1 = probability is decreased

LR = 1 = probability the condition is present vs. not present is the same.

71
Q

A ratio of the probability of a negative test in a person with the condition compared to the probability of a negative test in a person without the condition.

A

Likelihood ratio of a negative test

Eq: (1 - sensitivity)/specificity
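Both likelihood ratios follow from sensitivity and specificity (a minimal sketch; the example values are invented):

```python
def lr_positive(sens, spec):
    # LR+ = sensitivity / (1 - specificity); inputs as proportions (0-1)
    return sens / (1 - spec)

def lr_negative(sens, spec):
    # LR- = (1 - sensitivity) / specificity
    return (1 - sens) / spec

lr_pos = lr_positive(sens=0.90, spec=0.95)   # 0.90 / 0.05 = 18
lr_neg = lr_negative(sens=0.90, spec=0.95)   # 0.10 / 0.95, roughly 0.105
```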

72
Q

T/F Likelihood ratios are the most useful single indicator of a test’s diagnostic strength

A

True

-Can be used to help make decisions about the need for further testing.

73
Q

T/F

LR of >10 or <0.1 indicates that it changes the probability of a given diagnosis to a small and rarely important degree

A

FALSE.

  • LR >10 or <0.1 = large and conclusive changes in the probability of a given diagnosis
  • LR in the range of 1-2 or 0.5-1 changes the probability of a given diagnosis to a small and rarely important degree.
74
Q

T/F

LR > 10 indicate that the test can be used to rule the condition in

A

True

LR ~1 provide no useful information

LR < 0.1 = indicates that the test can be used to rule the condition out.

75
Q

The probability that a patient has a condition before the test is carried out.

A

Pre-test probability

  • Based on clinician experience, prevalence of the condition, and published literature.
  • May be modified up or down if the patient has risk factors.
76
Q

Generated by combining a patient’s pre-test probability of having the condition with the test’s LR

A

Post-test probability

  • High pre-test + high LR = very high post-test probability
  • Low pre-test + low LR = very low post-test probability
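Combining a pre-test probability with an LR is conventionally done through odds (the standard odds-conversion approach; the example numbers are invented):

```python
def post_test_probability(pre_test_prob, lr):
    # Convert probability -> odds, multiply by the LR, convert back
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# 30% pre-test probability plus a strongly positive test (LR = 10)
post = post_test_probability(pre_test_prob=0.30, lr=10)
```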
77
Q

What are the 3 sources of clinical disagreement?

A
  • The examiner (Dr.)
  • The examined (patient)
  • The examination
78
Q

What are examples of clinical disagreement due to the examiner?

A

1) Biological variations of senses (loss of hearing, vision, smell, etc.)
2) Tendency to record inferences rather than evidence (“pre-diagnose” patients)
3) Ensnarement by diagnostic classification schemes
4) Entrapment by prior expectation (Dr. finds what he hopes to find)
5) Examiner incompetence

79
Q

What are some causes of clinical disagreement due to the examined?

A

1) Biological variation
2) Effects of illness and medications (meds may mask symptoms, patients in extreme pain are difficult to examine)
3) Memory and rumination (Recall bias)
4) Toss-ups (deals with conflicting ways to manage patients’ conditions)

80
Q

Clinical disagreement due to the examination

A

1) Disruptive environment (child crying)
2) Disruptive interactions between Dr. and patient (no trust)
3) Dysfunctional or incorrectly used diagnostic tools

81
Q

What is the first thing to decide when appraising reliability and validity articles?

A

Determine whether the purpose of the study is to assess the test’s reliability or validity (or both)

  • Reliability studies assess the consistency of tests within or between examiners
  • Validity studies compare test results with established tests, or assess how accurately the test predicts a future outcome.
82
Q

How to determine if an article’s tests were adequately described?

A
  • Article should mention how patients prepared for the test, what they had to endure, and how their results were analyzed or interpreted
  • Patient inconvenience, cost, and harm must be weighed against the need for information
83
Q

How to determine if a study included a full range of subjects with and without the condition?

A
  • All types of patients should be included, as one would see in everyday clinical practice
  • If too many sick are included, there is a greater chance that those with the disease will test positive
84
Q

How to determine if a study used an acceptable gold standard for comparison?

A
  • Credibility of a validity study depends on the soundness of the gold standard
  • Often difficult to find an ideal gold standard since few tests have both high sensitivity and high specificity
  • Especially complex for spinal function tests
85
Q

How to determine if the test results and the gold standard were assessed independently in a blinded fashion?

A

-Raters need to be unaware of results of previous testing, because this knowledge can greatly affect the interpretation of tests

86
Q

When raters are influenced by knowledge of certain features of the case, this is known as:

A

Expectation bias

87
Q

When the decision to carry out the gold standard test is influenced by the results of the test that is being evaluated, this is known as:

A

Verification bias

88
Q

T/F Be wary of studies that use more than one gold standard test

A

True

-Ex: some patients were biopsied while others waited to see if the condition developed.

89
Q

T/F An examiner must consider the consequences of not performing a certain test such as a test to detect potentially very harmful conditions if undiagnosed

A

True

90
Q

T/F

As a general rule, the risk associated with the test should be proportional to the importance of the information to be gained.

A

True