EBM Flashcards
What are the four components needed to estimate sample size?
Population - the group that fits your demographic (the P in PICO). Its size can be unknown or estimated.
Margin of error (confidence interval) - how much error you are willing to allow; this defines how far above or below the population mean the sample mean may fall. Usually +/- 5%.
Confidence level - how confident you are that the actual mean falls within the confidence interval (90%, 95% and 99% are the most common and correspond to Z-scores of 1.645, 1.96 and 2.576 respectively).
Standard deviation - how much variance to expect, expressed as a proportion; the standard choice is 0.5, which is the most conservative and gives the largest sample size.
Necessary Sample Size = (Z-score)² × StdDev × (1 − StdDev) / (margin of error)²
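A minimal Python sketch of the sample size calculation, assuming the usual form n = Z² × p × (1 − p) / e² for a large (effectively unlimited) population. The function names and default values are illustrative choices, not part of the original notes.

```python
import math
from statistics import NormalDist


def z_score(confidence_level: float) -> float:
    """Two-sided Z-score for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def sample_size(confidence_level: float = 0.95,
                std_dev: float = 0.5,
                margin_of_error: float = 0.05) -> int:
    """Sample size for an unlimited population, rounded up to a whole person."""
    z = z_score(confidence_level)
    n = (z ** 2) * std_dev * (1 - std_dev) / margin_of_error ** 2
    return math.ceil(n)


# Assumed inputs: 95% confidence, StdDev 0.5, margin of error +/- 5%.
print(sample_size(0.95, 0.5, 0.05))
```

Tightening the margin of error or raising the confidence level both increase the required sample size, which is easy to see by changing the arguments.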
What is an appropriate drop-out rate for a trial?
If less than 80% are followed up it is generally recommended that the result is ignored.
If the drop-out rates are high, how confident can you be in the final results? What if all the drop-outs had a bad outcome?
What do control, experimental, and patient expected event rates mean?
Control Event Rate (CER)
The rate at which events occur in the control group, e.g. in an RCT of aspirin v placebo to prevent MI, a CER of 10% means that 10% of the placebo group had an MI. It is sometimes represented as a proportion (10% = 10/100 = 0.1).
Experimental Event Rate (EER)
The rate at which events occur in the experimental group, e.g. in the CER example above, an EER of 9% (or 0.09) means that 9% of the aspirin group had an MI.
Patient expected event rate
The patient expected event rate (PEER) refers to the rate of events we’d expect in a patient who received no treatment or conventional treatment.
Define Type 1 and Type 2 error in stats.
Type 1 error - a positive result when there is no real difference. This is a false positive
Type 2 error - no significant difference is found when there is actually a real treatment difference. This is a false negative
Small studies with a wide CI are prone to these errors.
Out of interest…
If you see an unexpected positive result (e.g. a small trial shows willow bark extract is effective for back pain) think: could this be a type 1 error? After all, even when there is no real effect, every RCT tested at the 0.05 significance level has a 1 in 20 chance of a positive result, and a lot of RCTs are published…
If a trial shows a non-significant result, when perhaps you might not have expected it, think could this be a type 2 error? Is the study under-powered to show a positive result?
Systematic reviews, which increase power and narrow the CI, are therefore very useful for reducing Type 1 and Type 2 errors.
What is the null hypothesis?
States there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
In a clinical trial presenting two survival curves, how is the absolute benefit of treatment best described? What is the significance of a plateau?
The median ‘increase’ in survival time (when comparing treatment to placebo or other).
The median survival is the time at which the percentage surviving is 50%. If more than half the patients are cured, there is no such point on the survival curve and the median is undefined (and often described as greater than the longest time on the curve). I like undefined medians!
Curves which flatten to a level plateau, suggest that patients are being cured, and curves which descend all the way to zero, imply that no one (or almost no one) is cured.
A median survival or survival percentage at x years won’t give you the full story. The drop-off may continue past 5 years but then flatten out at 6, which means no further deaths. People fortunate enough to make it out to six or seven years may well be cured. You can’t tell that from the median or 5-year survival or from any other single point. It’s the shape of the survival curve that tells this story.
Read: http://cancerguide.org/scurve_basic.html for a really good summary of survival curves.
What does relative risk reduction mean?
The relative risk (RR), or risk ratio, is the ratio of the risk of an event in the experimental group to the risk in the control group, i.e. RR = EER/CER. The relative risk reduction (RRR) is the proportional reduction in the event rate between the control and experimental groups, i.e. RRR = (CER − EER)/CER = 1 − RR. For example, if a drug reduces your risk of an MI from 6% to 3%, it halves your risk of an MI, i.e. the RRR is 50%. But note that the ARR is only 3%.
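The MI example can be worked through in a short Python sketch. The function name and the dictionary keys are illustrative; the formulas assume RR = EER/CER, ARR = CER − EER and RRR = ARR/CER.

```python
def risk_measures(cer: float, eer: float) -> dict:
    """Relative and absolute effect measures from two event rates (as proportions)."""
    rr = eer / cer    # relative risk (risk ratio)
    arr = cer - eer   # absolute risk reduction
    rrr = arr / cer   # relative risk reduction, equal to 1 - RR
    return {"RR": rr, "RRR": rrr, "ARR": arr}


# Drug reduces the risk of an MI from 6% (control) to 3% (experimental).
measures = risk_measures(cer=0.06, eer=0.03)
for name, value in measures.items():
    print(f"{name}: {value:.0%}")
```

The same drug gives an RRR of 50% and an ARR of 3%, which is why the two figures sound so different.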
Out of interest…
Relative risks and odds ratios are used in meta-analyses as they are more stable across trials of different duration and in individuals with different baseline risks. They remain constant across a range of absolute risks. Crucially, if not understood, they can create an illusion of a much more dramatic effect. Saying that this drug reduces your risk of an MI by 50% sounds great; but if your absolute risk was only 6%, this is the same as reducing it by 3 percentage points. So, saying this drug reduces your risk by 50% or by 3% are both true statements but sound very different, so guess which one drug companies tend to prefer!
Define a hazard ratio
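A way of expressing the relative risk of an event over time: the ratio of the rate at which events occur in the intervention group to the rate in the control group. If an adverse event is twice as likely to happen at any point in time with a particular intervention, the HR is 2.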
Define number needed to treat (NNT) and how to calculate it.
A clinically useful measure of the absolute benefit or harm of an intervention expressed in terms of the number of patients who have to be treated for one of them to benefit or be harmed.
Calculated as 1/ARR.
Example: The ARR of a stroke with warfarin is 2% (= 2/100 = 0.02), so the NNT is 1/0.02 = 50.
e.g. Drug A reduces the risk of an MI from 10% to 5%. What is the NNT?
The ARR is 5% (0.05), so the NNT is 1/0.05 = 20.
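Both NNT examples can be checked with a one-line function. This is a sketch only; the function name is illustrative, and in practice the NNT is usually rounded up to a whole patient.

```python
def number_needed_to_treat(arr: float) -> float:
    """NNT = 1 / ARR, with the ARR given as a proportion (e.g. 0.02 for 2%)."""
    if arr <= 0:
        raise ValueError("ARR must be positive for an NNT to be meaningful")
    return 1 / arr


print(number_needed_to_treat(0.02))         # warfarin and stroke, ARR 2%
print(number_needed_to_treat(0.10 - 0.05))  # Drug A and MI, ARR 5%
```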
Define Absolute Risk Reduction (ARR)
CER - EER
- The difference in the absolute risk of an event between two groups (also called the risk difference).
- Usually expressed as a percentage.
- Looks at the difference between two event rates, i.e. if the absolute risk of death from MI with placebo is 5% and with a drug it’s 3%, the ARR is 2%.
- Important to determine clinical relevance.
- (Absolute Risk Increase calculates an absolute difference in bad events happening in a trial, i.e. when the experimental treatment harms more than the control.)
Explain the concept of a likelihood ratio. How do you apply them as a bedside test?
The sensitivity and specificity of a test can be combined into one measure called the likelihood ratio. The likelihood ratio for a test result is defined as the ratio between the probability of observing that result in patients with the disease in question, and the probability of that result in patients without the disease.
LR = probability of the result in patients with the disease / probability of the result in patients without the disease.
For example, among patients with abdominal distension who undergo ultrasonography, the physical sign “bulging flanks” is present in 80% of patients with confirmed ascites and in 40% without ascites (i.e., the distension is from fat or gas). The LR for “bulging flanks” in detecting ascites, therefore, is 2.0 (i.e., 80% divided by 40%). Similarly, if the finding of “flank tympany” is present in 10% of patients with ascites but in 30% with distension from other causes, the LR for “flank tympany” in detecting ascites is 0.3 (i.e., 10% divided by 30%).
Easy recall
LR of 2 increases probability by 15%
LR of 5 by 30%
LR of 10 by 45%
For LRs between 0 and 1, use the inverse
1/2 = 0.5 - decreases probability by 15%
1/5 = 0.2 - decreases probability by 30%
1/10 = 0.1 - decreases probability by 45%
What does a p-value mean? What are the main influencing factors?
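A measure of how likely it is that a result arose by chance alone: the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true, e.g. p = 0.05 means there is a 5% chance of seeing such a result when there is no real difference. For entirely arbitrary reasons p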
The size of a P value depends on two factors:
- The magnitude of the treatment effect (relative risk, hazard ratio, mean difference, etc)
- The size of the standard error (which is influenced by the study size, and either the number of events or standard deviation, depending on the type of outcome measure used).
Very small P values (the easiest to interpret) arise when the effect size is large and the standard error is small.
Borderline P values can occur when there is a clinically meaningful treatment effect but a large or moderate standard error—often because of an insufficient number of participants or events (the trial is referred to as being underpowered).
This is perhaps the most common cause of borderline results. Borderline P values can also occur when the treatment effect is smaller than expected, which with hindsight would have required a larger trial to produce a P value
Define positive and negative predictive value.
How do they differ from sensitivity and specificity?
The PPV is the percentage of patients who test positive for a disease who really do have it, and the NPV is the percentage of patients who test negative who really do not have it.
PPV = a/(a + b), where a = true positives and b = false positives
Depends on the background prevalence of the disorder in the population.
If a disease is rare, the PPV will be lower (but sensitivity and specificity remain constant). Often with tests the PPV is higher in a secondary care or sieved population than it is in primary care.
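Likelihood ratios, unlike predictive values, do not change with prevalence; combined with the pre-test probability, they give the most useful information on what a test result means for an individual patient.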
In an example using HIV with a 10% population prevalence, we had 9900 ‘true positive’ test results (infected persons who tested positive) and 9000 false positive results. The positive predictive value in this case is 9900/(9900 + 9000), or 52.4%; that is, nearly half of the positive results were false. In a subpopulation with higher HIV prevalence, the positive predictive value would be higher, as there would be more truly HIV-positive findings compared to the constant rate of false positive results.
The negative predictive value is defined as the proportion of persons with negative test results who are correctly diagnosed.
NPV = d/(c + d), where d = true negatives and c = false negatives
This value, too, depends on HIV prevalence. The negative predictive value is the number of persons correctly diagnosed as HIV-negative, divided by the total number of HIV-negative findings. The 81,000 ‘true negative’ and 100 false negative results in our example yield a negative predictive value of 81,000/81,100, or about 99.9%, a very high likelihood that a negative result indicates a truly HIV-uninfected person.
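The HIV figures above can be checked with a small Python sketch working directly from the four cells of the 2×2 table. The function and argument names are illustrative.

```python
def predictive_values_from_counts(true_pos: int, false_pos: int,
                                  false_neg: int, true_neg: int) -> tuple:
    """Return (PPV, NPV) from the four cells of a 2x2 table."""
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Counts from the HIV example: 9900 TP, 9000 FP, 100 FN, 81,000 TN.
ppv, npv = predictive_values_from_counts(9900, 9000, 100, 81000)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```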
What is the point of an ROC curve? How is it used?
The ROC curve plots sensitivity (the true positive rate) against the false positive rate (1 − specificity) across the range of possible test cut-offs.
The AUC looks at the overall ability of the test to discriminate between those individuals with the disease and those without the disease.
A truly useless test (one no better at identifying true positives than flipping a coin) has an area of 0.5; its curve is the diagonal line of chance (the red line). A perfect test has an area of 1, with a curve that passes through the top left corner. Remember that the AUC is the whole area under the curve, not just the area above the red line.
If patients have higher test values than controls, then:
The area represents the probability that a randomly selected patient will have a higher test result than a randomly selected control.
If patients tend to have lower test results than controls:
The area represents the probability that a randomly selected patient will have a lower test result than a randomly selected control.
For example: If the area equals 0.80, on average, a patient will have a more abnormal test result than 80% of the controls.
If the test were perfect, every patient would have a more abnormal test result than every control and the area would equal 1.00.
If the test were worthless, half the controls would have a higher value than a patient with the disease and half would have a lower value, so the AUC would be 0.5.
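The probability interpretation of the AUC can be illustrated by comparing every patient with every control. This is a sketch for the case where patients tend to have higher test values; the function name and the test values are invented for illustration, and ties are counted as half.

```python
def auc_by_pairs(patient_values, control_values) -> float:
    """Probability that a randomly chosen patient scores higher than a
    randomly chosen control (ties count as half a win)."""
    wins = 0.0
    for p in patient_values:
        for c in control_values:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_values) * len(control_values))


# Hypothetical test results, not taken from any real study.
patients = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.5]
print(auc_by_pairs(patients, controls))  # 8 of the 9 pairs favour the patient
```

With complete separation of the two groups the function returns 1.0, and with identical groups it returns 0.5, matching the perfect and worthless tests described above.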
Define cumulative incidence. How does it differ from regular incidence?
Incidence is the number of new cases of a disease over time.
- Units include time
- Range is 0 to infinity
- Denominator is person-time
Cumulative incidence is a proportion.
- No units
- Range is 0 to 1
- Denominator is all at-risk in population
The cumulative incidence increases each year as the cases continue to accumulate, but the denominator for cumulative incidence – the initial population at risk – remains fixed.
- Incidence rate applies to a broader range of questions
- Kaplan-Meier provides a means to estimate cumulative incidence – censors those with incomplete follow-up
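The difference in denominators can be shown with a short Python sketch. The cohort figures are hypothetical and the function names are illustrative; the sketch assumes complete follow-up, so it does not attempt the Kaplan-Meier handling of censored participants.

```python
def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases per unit of person-time (e.g. per person-year)."""
    return new_cases / person_time


def cumulative_incidence(new_cases: int, initial_at_risk: int) -> float:
    """Proportion of the initial at-risk population that became cases."""
    return new_cases / initial_at_risk


# Hypothetical cohort: 1,000 people at risk, 50 new cases,
# 4,800 person-years of follow-up in total.
rate = incidence_rate(50, 4800)
risk = cumulative_incidence(50, 1000)
print(f"{rate * 1000:.1f} cases per 1,000 person-years")
print(f"{risk:.0%} cumulative incidence")
```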
What does the term probability mean?
Probability of an event happening = Number of ways it can happen / Total number of outcomes
Probability can only ever be between 0 and 1.
For example, there are two ways a coin can land: heads or tails. There is a 1 in 2 (1/2) chance of landing heads and a 1/2 chance of landing tails.
The probability of a six-sided die landing on a 4 is 1 in 6 or 1/6. There is only one way it can happen (there is only one 4 on the die), out of 6 sides.
What does absolute risk reduction, or risk reduction mean?
Control event rate minus the experimental event rate (CER - EER)
The absolute risk is the actual, arithmetic risk of an event happening. The ARR (sometimes also called the Risk Difference) is the difference between 2 event rates e.g. AR of a MI with placebo over 5 years is 5% and with drug A is 3%, the ARR is simply 2%. This is the difference between the CER (control event rate) and the EER (experimental event rate).
e.g. Drug B reduces the chance of a stroke from 20% (CER) to 17% (EER). What is the ARR? Answer 3%.
Absolute risk increase (ARI) similarly calculates an absolute difference in bad events happening in a trial e.g. when the experimental treatment harms more patients than the control.
Knowing the absolute risk is essential when deciding how clinically relevant a study is.
Define subgroup analysis
What are the inherent problems with this?
What are the benefits?
- Participant data are split into subgroups to make comparisons between them, e.g. by gender or by geographical location.
- Used to investigate heterogeneous results or to answer specific questions about patient groups or types of intervention.
- May be misleading – they are observational by nature, not randomised
- The more subgroup analyses there are, the higher the likelihood of false positives and negatives.
- Unexpected results from a subgroup analysis can be useful as a potential starting place for a subsequent clinical trial.
Does prespecifying a subgroup analysis help reduce the false positive/negative rate?
Why?
How can you address this?
- Prespecifying subgroup analyses does not prevent this, particularly if there are a large number of them (referred to as multiplicity). If 20 subgroup analyses are prespecified and each is tested at P = 0.05, about one of them is expected to show a false positive result by chance alone. For example, if the null hypothesis is true for each of 10 independent tests for interaction at the 0.05 significance level, the chance of at least one false positive result exceeds 40%.
- Multiplicity can be addressed by using criteria for statistical analysis that are more stringent than P=
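The multiplicity figures can be reproduced with a few lines of Python, assuming the tests are independent and the null hypothesis is true for all of them. The function names are illustrative, and the Bonferroni correction shown is just one common example of a more stringent criterion; the notes do not name a specific method.

```python
def chance_of_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of false positives expected across the tests."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction
    (an assumed choice of method, used here for illustration)."""
    return alpha / n_tests


print(f"{chance_of_any_false_positive(10):.1%}")  # 10 independent tests
print(expected_false_positives(20))               # 20 prespecified subgroups
print(bonferroni_threshold(10))
```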
What is the difference between prespecified subgroup analysis vs post-hoc analysis?
Is one better than the other?
Prespecified
- planned and documented before data examination.
- preferably included in study protocol
- includes endpoint, baseline characteristic, statistical method used.
Post-hoc
- hypotheses tested not specified prior to data examination
- unclear how many were undertaken
- unclear if motivated by post-hoc inspection of the data
However, both prespecified and post hoc subgroup analyses are subject to inflated false positive rates arising from multiple testing. Investigators should avoid the tendency to prespecify many subgroup analyses in the mistaken belief that these analyses are free of the multiplicity problem.
Define specificity.
Specificity is the proportion of people without the disease who test negative. A very specific test will have few false positives and so is good at ruling a disease in. SpPIN means that if a test is highly Specific (Sp), a Positive result rules the diagnosis IN.
Specificity = true negatives / (false positives + true negatives), i.e. d/(b + d)
In other terms, if the test result for a highly specific test is positive you can be nearly certain that they actually have the disease.
Therefore, a test with 100% specificity correctly identifies all patients without the disease. A test with 80% specificity correctly reports 80% of patients without the disease as test negative (true negatives) but 20% patients without the disease are incorrectly identified as test positive (false positives).
A test with a high sensitivity but low specificity results in many patients who are disease free being told of the possibility that they have the disease and are then subject to further investigation. A good example is the D-Dimer which is sensitive but not specific - i.e about half of people who don’t have the disease will test positive.
Define sensitivity
The sensitivity of a clinical test refers to the ability of the test to correctly identify those patients with the disease.
Sensitivity = true positives / (true positives + false negatives), i.e. a/(a + c)
A test with 100% sensitivity correctly identifies all patients with the disease. A test with 80% sensitivity detects 80% of patients with the disease (true positives) but 20% with the disease go undetected (false negatives). A high sensitivity is clearly important where the test is used to identify a serious but treatable disease (e.g. cervical cancer). Cervical smear testing, used to screen the female population, is a sensitive test. However, it is not very specific, and a high proportion of women with a positive cervical smear who go on to have a colposcopy are ultimately found to have no underlying pathology.
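Using the a, b, c, d cell labels from the formulas in these cards (a = true positives, b = false positives, c = false negatives, d = true negatives), sensitivity and specificity can be sketched in Python as follows. The counts are hypothetical and chosen to reproduce the 80% examples.

```python
def sensitivity(a: int, c: int) -> float:
    """True positives / (true positives + false negatives)."""
    return a / (a + c)


def specificity(b: int, d: int) -> float:
    """True negatives / (false positives + true negatives)."""
    return d / (b + d)


# Hypothetical 2x2 table: 100 people with the disease, 100 without.
a, b, c, d = 80, 20, 20, 80
print(f"Sensitivity = {sensitivity(a, c):.0%}")  # 20% of cases are missed
print(f"Specificity = {specificity(b, d):.0%}")  # 20% of non-cases test positive
```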
How do you calculate and interpret a positive likelihood ratio?
LR+ = The probability of an individual with disease having a positive test / The probability of an individual without disease having a positive test
You will notice that the numerator in this equation is exactly the same as the sensitivity of the test, and the denominator is the converse of specificity (1 − specificity). Thus the LR+ of a test can simply be calculated by dividing the sensitivity of the test by 1 − specificity, i.e. sensitivity / (1 − specificity).
LR+s greater than 1 mean that a positive test is more likely to occur in people with the disease than in people without the disease. LR+s less than 1 mean that a positive test is less likely to occur in people with the disease compared to people without the disease. Generally speaking, for patients who have a positive result, LR+s of more than 10 significantly increase the probability of disease (‘rule in’ disease) whilst very low LR+s (below 0.1) virtually rule out the chance that a person has the disease
How do you calculate and interpret a negative likelihood ratio?
LR− =The probability of an individual with the disease having a negative test / the probability of an individual without the disease having a negative test
The numerator in this equation is the converse of sensitivity (1 − sensitivity), and the denominator is equivalent to specificity. Thus the LR− of a test can be calculated by dividing 1 − sensitivity by specificity, i.e. (1 − sensitivity) / specificity.
LR−s greater than 1 mean that a negative test is more likely to occur in people with the disease than in people without the disease.
LR−s less than 1 mean that a negative test is less likely to occur in people with the disease compared to people without the disease.
Generally speaking, for patients who have a negative test, LR−s of more than 10 significantly increase the probability of disease (rule in disease) whilst a very low LR− (below 0.1) virtually rules out the chance that a person has the disease.
LR+ = sensitivity / (1 − specificity)
LR− = (1 − sensitivity) / specificity
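Both likelihood ratios follow directly from sensitivity and specificity, as this short Python sketch shows. The function name and the input values (90% sensitivity, 80% specificity) are hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Return (LR+, LR-) for a test, with inputs given as proportions."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test: sensitivity 90%, specificity 80%.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

Note that a specificity of exactly 1 makes the LR+ undefined (division by zero), which this sketch does not handle.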
What is pre and post-test probability?
What else do you need to estimate post-test probability, how do you do it, and what is it called?
The estimated probability of disease before the test result is known, is referred to as the pre-test probability, which is usually estimated on the basis of the clinician’s personal experience, local prevalence data and published reports.
The patient’s probability or chance of having the disease after the test results is known is referred to as the post-test probability. The post-test probability of disease is what clinicians and patients are most interested in as this can help in deciding whether to confirm a diagnosis, rule out a diagnosis or perform further tests.
According to Bayes’ theorem, the post-test odds that a patient has a disease are obtained by multiplying the pre-test odds by the likelihood ratio of the test:
Post-test odds = pre-test odds × likelihood ratio
Post-test odds are different to probability but can be converted: odds = probability / (1 − probability), and probability = odds / (1 + odds).
What is Fagan’s nomogram?
How is it used?
The Fagan’s nomogram is a graphical tool which, in routine clinical practice, allows one to use the results of a diagnostic test to estimate a patient’s probability of having disease. In this nomogram, a straight line drawn from a patient’s pre-test probability of disease (left axis) through the likelihood ratio of the test (middle axis) will intersect with the post-test probability of disease (right axis).
Hypothetical example
In a hypothetical population, the prevalence of Disease A was 10%, which means that when we randomly select a person from this population, his or her chance of having Disease A (pre-test probability) is 10%. The LR+ of Test A was earlier calculated to be about 13. As shown in Figure 2, when we draw a straight line from the pre-test probability of 10% through the likelihood ratio of 13, the line intersects with the post-test probability of about 60%.
This means that the probability of Disease A for a person in this hypothetical population increases from 10% to 60% when he or she has had a positive result for Test A.
In the same way, we can also estimate the post-test probability of a person in this population who has a negative result. You will recall that the LR− of Test A was earlier calculated to be 0.21. Joining the pre-test probability of 10% to the likelihood ratio of 0.21 on the Fagan’s nomogram, we read off a post-test probability of about 2% (Fig. 3). This means that after a negative test, a person in this population’s chance of having Disease A reduces from 10% to 2%.
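The nomogram readings for Test A can be checked arithmetically by converting probability to odds, multiplying by the likelihood ratio, and converting back. This is a minimal Python sketch; the function names are illustrative.

```python
def probability_to_odds(probability: float) -> float:
    return probability / (1 - probability)


def odds_to_probability(odds: float) -> float:
    return odds / (1 + odds)


def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Post-test odds = pre-test odds x likelihood ratio, returned as a probability."""
    post_test_odds = probability_to_odds(pre_test_probability) * likelihood_ratio
    return odds_to_probability(post_test_odds)


# Pre-test probability 10%, LR+ of 13 and LR- of 0.21, as in the example.
print(f"Positive test: {post_test_probability(0.10, 13):.0%}")
print(f"Negative test: {post_test_probability(0.10, 0.21):.0%}")
```

The results, roughly 59% and 2%, agree with the values of about 60% and 2% read off the nomogram.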
A certain autosomal recessive disorder affects 1 in 1600 people; the carrier frequency is 5%. A DNA assay can identify the mutation in 80% of carriers; the false-positive rate of this assay is zero.
What is the best estimate of the positive predictive value (PPV) and negative predictive value (NPV) of this assay in screening the population for carriers?
PPV NPV
A 20% 100%
B 80% 80%
C 100% 80%
D 100% 99%
E 100% 100%
Answer: D
Question 47 AMP2007a
A test has a sensitivity of 95% and a specificity of 80%. It is used to screen for a condition with a prevalence of 1 in 100.
What will the positive predictive value be nearest to?
A. 0.2%.
B. 0.5%.
C. 1%.
D. 2%.
E. 5%.
Answer: E. 5%.
2006a Q33
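Both practice questions can be worked through from prevalence, sensitivity and specificity. The Python sketch below is one way to check the answers; the function name is illustrative, and the carrier question is treated as a test with 80% sensitivity and 100% specificity applied at a 5% carrier frequency.

```python
def predictive_values(prevalence: float, sensitivity: float,
                      specificity: float) -> tuple:
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = prevalence * sensitivity
    false_neg = prevalence * (1 - sensitivity)
    false_pos = (1 - prevalence) * (1 - specificity)
    true_neg = (1 - prevalence) * specificity
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Carrier screening: carrier frequency 5%, 80% detected, no false positives.
ppv, npv = predictive_values(0.05, 0.80, 1.00)
print(f"Carrier assay: PPV = {ppv:.0%}, NPV = {npv:.0%}")

# Screening test: sensitivity 95%, specificity 80%, prevalence 1 in 100.
ppv, npv = predictive_values(0.01, 0.95, 0.80)
print(f"Screening test: PPV = {ppv:.1%}")
```

The first case gives a PPV of 100% and an NPV of about 99% (answer D); the second gives a PPV of about 4.6%, nearest to 5% (answer E).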