Classification in Diagnostic Tests Flashcards

1
Q

What is sensitivity?

A

Sensitivity is the true positive rate of a test or measure: the percentage of participants who are known to have a certain disorder/pathology and are correctly classified as having it.

2
Q

What is specificity?

A

Specificity is the true negative rate of a test or measure: the percentage of participants who belong to the healthy control group and are correctly classified as having no disorder/pathology.
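
The two definitions can be sketched with a toy confusion matrix; all counts below are made up for illustration:

```python
# Hypothetical counts from a validation study (illustrative numbers)
TP, FN = 80, 20   # participants with the disorder: detected vs. missed
TN, FP = 90, 10   # healthy controls: correctly cleared vs. falsely flagged

def sensitivity(tp, fn):
    """True positive rate: share of affected participants classified as affected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: share of healthy controls classified as healthy."""
    return tn / (tn + fp)

print(sensitivity(TP, FN), specificity(TN, FP))  # 0.8 0.9
```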

3
Q

What is the positive predictive power of a test?

A

This is the confidence you can have (as a percentage) that a positive test outcome actually reflects what the measure should be predicting: the proportion of positive results that are true positives.

4
Q

What is negative predictive power?

A

Related to specificity; the confidence you can have that a test outcome saying someone has no pathology/disorder is correct in reality: the proportion of negative results that are true negatives.
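
A minimal sketch of both predictive values, assuming hypothetical counts from a validation study (the numbers are made up):

```python
# Made-up 2x2 validation counts, grouped by test result
TP, FP = 80, 10   # positive test results: truly affected vs. false alarms
TN, FN = 90, 20   # negative test results: truly healthy vs. missed cases

def ppv(tp, fp):
    """Positive predictive power: share of positive results that are true positives."""
    return tp / (tp + fp)

def npv(tn, fn):
    """Negative predictive power: share of negative results that are true negatives."""
    return tn / (tn + fn)
```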

5
Q

Which of these TOC values are the most important for clinical practice?

A

PPP and NPP

6
Q

Which factor influences PPP and NPP? And what implication does this have for clinical practice?

A

The prevalence (base rate) of the disorder: the higher the prevalence, the higher the PPP; the lower the prevalence, the higher the NPP. This means you have to be careful that the base rate of the disorder in your sample matches the base rate in the target population; otherwise you will over- or underestimate PPP and NPP.
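
This dependence on prevalence can be made concrete with Bayes' theorem; the sketch below holds sensitivity and specificity fixed (illustrative values of 0.80 and 0.90) and varies only the base rate:

```python
def ppv_npv(sens, spec, prevalence):
    """Predictive values from sensitivity, specificity and base rate (Bayes' theorem)."""
    p = prevalence
    ppv = sens * p / (sens * p + (1 - spec) * (1 - p))
    npv = spec * (1 - p) / (spec * (1 - p) + (1 - sens) * p)
    return ppv, npv

# PPP collapses at low base rates even for a decent test
for p in (0.01, 0.10, 0.50):
    print(p, ppv_npv(0.80, 0.90, p))
```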

7
Q

What is the current problem with TOC values in NP-research?

A

Most researchers only report sensitivity and specificity and do not consider PPP and NPP, thereby omitting important information. This means that clinicians (with only sensitivity and specificity) have no idea how confident they can be that a certain measure correctly rules a disorder in or out.

8
Q

Clinicians need information on true positives (1), true negatives (2), but also false positives (3) and false negatives (4) when using a test or measure for clinical decision-making.

A
  1. Sensitivity
  2. Specificity
  3. PPP
  4. NPP
9
Q

What are the main recommendations for TOC-values according to Lange & Lippa?

A

(1) Always calculate sensitivity, specificity, PPP, and NPP.
(2) Calculate PPP and NPP values based on a range of hypothetical base rates of the
condition in the target population, not the actual base rate of the condition in the
experimental sample.
(3) Always evaluate the clinical utility of a test/measure based on the interpretation of
sensitivity, specificity, PPP, and NPP together. Never interpret sensitivity and specificity
in isolation. Never interpret positive predictive power and negative predictive power in isolation.

10
Q

What is the main difference in studying test results in NP-research and medical tests?

A

In NP-research, we mainly focus on finding significant associations and causality (this is often sufficient), whereas for medical tests this is far from sufficient. To know whether results have clinical implications, researchers also need information on sensitivity, specificity, CIs and likelihood ratios (more generally speaking, descriptive statistics). P-values and odds ratios are, so to speak, secondary statistics for medical tests.

11
Q

Which two forms of variability are there in test results?

A
  1. Intra-observer variability = the lack of reproducibility in results when the same observer/laboratory performs the test on the same specimen at different times.
  2. Inter-observer variability = the lack of reproducibility among two or more observers.
12
Q

Studies of reproducibility address (1)….., not (2)…. or (3)…..

A

(1) precision
(2) accuracy
(3) validity
→ All observers can agree with each other and still be wrong.

13
Q

What is the basic design to assess reproducibility?

A

Comparing test results from more than one observer (inter-observer) or from tests performed on more than one occasion (intra-observer).

14
Q

How can the inter-observer agreement be measured in categorical variables?

A
  1. The percentage of observations on which the observers agree exactly. Cons: hard to interpret, and it treats partial agreement the same as complete disagreement.
  2. Kappa measures the extent of agreement beyond what would be expected from observers’ knowledge of the prevalence of abnormality, and can give credit for partial agreement.
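A minimal sketch of (unweighted) Cohen's kappa; note that it is the weighted variant that gives credit for partial agreement, and the 2x2 table below uses made-up counts:

```python
def cohens_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table;
    table[i][j] = count where observer 1 chose category i and observer 2 chose j."""
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    row_marg = [sum(row) / n for row in table]
    col_marg = [sum(table[i][j] for i in range(len(table))) / n
                for j in range(len(table[0]))]
    p_chance = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_observed - p_chance) / (1 - p_chance)

# made-up 2x2 table: two observers rate 100 scans as abnormal/normal
kappa = cohens_kappa([[40, 10], [5, 45]])
```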
15
Q

How can the inter-observer agreement be measured in continuous variables?

A

This depends on the design of the study; it can be done with mean differences between the measurements, SDs, etc. Another possibility is the coefficient of variation (CV): the standard deviation of all the results obtained from a single specimen divided by the mean value.
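
The CV definition above can be sketched in a few lines; the repeated measurements below are made up:

```python
def coefficient_of_variation(values):
    """CV: sample standard deviation of repeated results on one specimen / their mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return variance ** 0.5 / mean

# three made-up repeated measurements of the same specimen
cv = coefficient_of_variation([9.8, 10.0, 10.2])
```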

16
Q

What do studies of the accuracy of tests measure?

A

They measure (assuming that there is a gold standard) to what extent a test gives the right answer.

17
Q

How can accuracy studies of diagnostic tests be designed?

A

Studies of diagnostic test accuracy resemble a case-control or cross-sectional design. Case-control-style designs CANNOT be used to estimate predictive values, because the researchers know the participants' health status before the measurement (participants must be assigned to the healthy-control or patient group), which causes bias.

Consecutive sampling: more valid/interpretable results and predictive values

Tandem testing: compare two (presumably imperfect) tests with each other to determine which one is more accurate; after that, the gold standard test is selectively applied to the positive results plus a random sample of the negative results, to make sure they really don't have the disease.

Cohort designs: necessary for prognostic tests (intervention + follow-ups to see who develops the outcome of interest)

18
Q

What is the outcome variable in diagnostic tests?

A

The presence or absence of a disease, determined by the gold standard.

19
Q

What is the outcome variable in a prognostic test study?

A

What happens to patients with the disease (how long they live, complications, additional treatments that they require).

20
Q

What is the ROC curve and how is it used?

A

Receiver operating characteristic curve: a visual representation of the trade-off between the sensitivity and specificity of a diagnostic test.

The area under the ROC curve (0.5 = useless, 1 = perfect test) summarizes the overall accuracy of the test and is useful for comparing the accuracy of two or more tests.
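
The AUC can be computed non-parametrically as the probability that a randomly chosen diseased subject scores higher than a randomly chosen healthy one (the Mann-Whitney interpretation); the scores below are made up:

```python
def auc(scores_diseased, scores_healthy):
    """Area under the ROC curve via the Mann-Whitney interpretation:
    probability a random diseased score beats a random healthy score (ties count half)."""
    wins = 0.0
    for d in scores_diseased:
        for h in scores_healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))

# made-up test scores: 0.5 would be useless, 1.0 a perfect test
overlap_auc = auc([3, 5, 7], [1, 2, 6])
perfect_auc = auc([10, 11], [1, 2])
```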

21
Q

What are likelihood ratios and how are they used?

A

Also used for continuous/ordinal diagnostic tests to describe their accuracy; better than sensitivity/specificity and ROC curves because they use all the information in a test result.

The ratio gives the probability of a given result in people with the disease divided by the probability of that result in people without it. The higher the ratio (above 1), the better the test result is for ruling in a disease; the closer the ratio is to 0, the better the test result is for ruling out a disease.
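
For a dichotomized test, the likelihood ratios follow directly from sensitivity and specificity; the sketch below (with illustrative values) also shows how a ratio converts a pre-test probability into a post-test probability via odds:

```python
def lr_positive(sens, spec):
    """LR+ = sensitivity / (1 - specificity): how much a positive result raises the odds."""
    return sens / (1 - spec)

def lr_negative(sens, spec):
    """LR- = (1 - sensitivity) / specificity: how much a negative result lowers the odds."""
    return (1 - sens) / spec

def post_test_prob(pre_test_prob, lr):
    """Convert probability to odds, multiply by the likelihood ratio, convert back."""
    odds = pre_test_prob / (1 - pre_test_prob) * lr
    return odds / (1 + odds)

# illustrative: sensitivity 0.80, specificity 0.90, pre-test probability 10%
p_pos = post_test_prob(0.10, lr_positive(0.80, 0.90))
```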

22
Q

What is net reclassification improvement?

A

A more direct approach than ROC curves to quantify what new tests add to existing prediction models.

More specifically, it examines how often a model/clinical prediction rule that includes the new test changes the classification of patients from one risk category to another, compared to the old model.
→ when the new test improves the prediction, more subjects who develop the disorder move up to a higher risk category (cases) than move down to a lower risk category (controls)
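
The usual formula sums the net proportion of cases moving up and the net proportion of controls moving down; a sketch with made-up reclassification counts:

```python
def nri(up_cases, down_cases, n_cases, up_controls, down_controls, n_controls):
    """Net reclassification improvement: net share of cases reclassified upward
    plus net share of controls reclassified downward by the new model."""
    return (up_cases - down_cases) / n_cases + (down_controls - up_controls) / n_controls

# e.g. 20 of 100 cases move up and 5 move down; 30 of 200 controls move down and 10 move up
improvement = nri(20, 5, 100, 10, 30, 200)
```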

23
Q

What is the difference between studies to assess accuracy and studies to create clinical prediction rules?

A

The goal in creating clinical prediction rules is to improve clinical decisions by using mathematical methods to develop a new test, rather than to evaluate one that already exists.

24
Q

How can studies for clinical prediction rules be designed?

A

Using mathematical models to create prediction rules: multivariate techniques (combining predictor variables to predict the outcome) such as logistic regression or the Cox (proportional hazards) model, which can quantify the independent contribution of candidate predictor variables to predicting the outcome.

A non-modeling technique is recursive partitioning/Classification and Regression Tree (CART) analysis, which can achieve very high levels of sensitivity.
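
A fitted logistic prediction rule is applied by plugging a patient's predictor values into the logistic function; the intercept and coefficients below are made up for illustration:

```python
import math

def predicted_risk(intercept, coefs, x):
    """Apply a (hypothetical) fitted logistic regression rule:
    P(outcome) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# made-up rule: intercept -2.0, coefficients for age and a binary risk factor
risk = predicted_risk(-2.0, [0.05, 1.2], [40, 1])
```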

25
Q

Why should you validate the clinical prediction rule that you developed?

A

To avoid overfitting; address this by dividing the cohort into derivation and validation data sets and testing the rule derived from the derivation cohort on the data from the validation cohort (this only addresses internal validation). To address external validation, you need to test the rule in different populations; otherwise the rule lacks generalizability.

26
Q

What do diagnostic yield studies assess?

A

They estimate the proportion of positive tests among patients with a particular indication for the test. This is not very informative by itself, but when a study shows that a test is almost always negative, you can conclude that the test is not very useful for that particular indication.

27
Q

What do before/after studies of clinical decision-making assess?

A

They address the effect of a test result on clinical decisions. They generally compare what clinicians do before and after obtaining the results of a diagnostic test.

28
Q

What do studies use to address the feasibility, accuracy and validity of tests?

A

Use descriptive statistics (such as CIs, means, SDs, ranges, etc.)

29
Q

Which design is best for studying the utility of a diagnostic test? And which variables should be among the outcomes?

A

The best design is a clinical trial, in which subjects are randomly assigned to receive the test or not.

Clinical trials minimize confounding and selection bias and allow measurement of all relevant outcomes (mortality, morbidity, cost and QoL). BUT there are practical and ethical issues (withholding potentially valuable tests).

30
Q

Which design can you use to study the utility of a diagnostic test when a clinical trial is not ethical/feasible? Of which two problems should you be aware?

A

Then you can use observational studies (looking at the benefits, harms and costs of the test), with special attention to possible biases (people who volunteer to participate often differ from those who don't) and confounding (because there is no randomisation).

31
Q

What are possible pitfalls in the design/analysis of diagnostic test studies?

A
  1. Inadequate sample size; rare diseases need very large sample sizes
  2. Inappropriate exclusion: it is inappropriate to exclude subjects from the numerator without excluding similar subjects from the denominator
  3. Dropping borderline/uninterpretable results: look into why this has happened
  4. Verification bias: selective application of a single gold standard, this causes a problem if the test being studied is also used to decide who gets the gold standard.
  5. Differential verification bias: different gold standards for those testing positive and negative. This bias can occur any time the gold standard differs among those with positive and negative test results. Can be solved by using the same gold standard for all subjects.
32
Q

How can studies of the value of prognostic tests (which focus on specific variables/risk factors that influence the study outcome) be summarized?

A

Risk ratios, hazard ratios and reclassification improvements

33
Q

What is spectrum bias?

A

When the spectrum of disease (or non-disease) in the sample differs from that of patients to whom the investigator wants to generalize.

34
Q

Regarding the evaluation of a test, what is often a good first step?

A

Measuring the reproducibility of the test, including intra- and inter-observer variability.

35
Q

What are three ways to perform internal validation of a clinical prediction model?

A
  1. Split sample: use half of the sample to develop model, test on other half
  2. Cross-validation: repeatedly develop the model on a randomly drawn half and test it on the other half
  3. Bootstrapping: drawing many samples from original sample of same sample size (drawing with replacement)
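Bootstrapping (option 3) can be sketched with the standard library's `random` module; the data and sample count below are arbitrary, and each resample would serve as a derivation set for re-fitting the prediction rule:

```python
import random

def bootstrap_samples(data, n_samples, seed=0):
    """Draw n_samples resamples of the same size as data, with replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [[rng.choice(data) for _ in data] for _ in range(n_samples)]

samples = bootstrap_samples([1, 2, 3, 4, 5], n_samples=200)
```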
36
Q

What do you use logistic regression/discriminant analyses for in clinical prediction models?

A

Using these, you can test whether the number of predictor variables can be reduced (checking whether the model reaches significance and the added predictive value of the independent variables). This can save a lot of time, effort and costs.

37
Q

Regarding logistic regression and discriminant function analysis; which one is preferred and why?

A

Logistic regression is often preferred because it has fewer assumptions than discriminant function analysis; otherwise, the two do essentially the same thing.