Stats: evidence appraisal practice questions Flashcards
You are reading an abstract summarizing the results of a clinical study in which
blood pressure was measured on 100 men with hypertension before and after treatment with a
new antihypertensive drug. The conclusion (no data given) was that the drop in mean blood
pressure following treatment was highly significant (p
(D) From the abstract, it is clear that the investigators were interested in reducing blood pressure by using the new drug. A one-sided test is appropriate if, prior to the conduct of the trial, we are interested in rejecting the null hypothesis of no change in favor of an alternative hypothesis
that the change is in a particular direction.
You are reading an abstract summarizing the results of a clinical study in which
blood pressure was measured on 100 men with hypertension before and after treatment with a
new antihypertensive drug. The conclusion (no data given) was that the drop in mean blood
pressure following treatment was highly significant (p
(B) The comparison is between means (pre and post treatment) and the data are paired (two
measurements on each patient) so the appropriate test is the paired t test.
You are reading an abstract summarizing the results of a clinical study in which
blood pressure was measured on 100 men with hypertension before and after treatment with a
new antihypertensive drug. The conclusion (no data given) was that the drop in mean blood
pressure following treatment was highly significant (p
(D) “Highly significant, p < .0005” implies that the probability is less than .0005 of observing a drop
in BP as large as or larger than the one seen when the drug actually had no effect. The fact that
significance tests do not directly relate to clinical importance or cause-and-effect rules out
A, B, and C.
Cortisol levels were measured in two independent groups of women at
childbirth. Group 1 underwent emergency Caesarean section following induced labor. Group 2
delivered by either Caesarean section or the vaginal route following spontaneous labor.
The number of women (n), mean levels, and standard deviations were as follows:
Group n Mean Std. dev.
1 10 535 60
2 10 645 70
To compare the mean cortisol levels for statistical significance you would use
A. unpaired t-test
B. paired t-test
C. chi square test
D. Fisher’s exact test
E. Doesn’t matter since sample sizes are equal
(A) Since the object is to compare means (not proportions) and the groups are independent (not
paired or matched), the unpaired t-test should be used.
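As a rough check of the arithmetic behind this choice, the sketch below computes the pooled-variance (unpaired) t statistic directly from the summary table. It assumes equal variances in the two groups, which the card does not state, and the function name is purely illustrative.

```python
import math


def pooled_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Return the unpaired (pooled-variance) t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between two independent means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean2 - mean1) / se, df


# Summary statistics from the table: group 1, then group 2.
t_stat, df = pooled_t_from_summary(10, 535, 60, 10, 645, 70)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

With these inputs the statistic comes out at roughly 3.77 on 18 degrees of freedom. Converting it to a p-value needs the t distribution, for example from a statistics library or a printed table.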
Cortisol levels were measured in two independent groups of women at
childbirth. Group 1 underwent emergency Caesarean section following induced labor. Group 2
delivered by either Caesarean section or the vaginal route following spontaneous labor.
The number of women (n), mean levels, and standard deviations were as follows:
Group n Mean Std. dev.
1 10 535 60
2 10 645 70
The researchers reported that the p-value for the comparison of mean cortisol levels between
groups was 0.0014. Which of the following conclusions can NOT be drawn from this
information?
A. The difference is statistically significant at the 5% level
B. The difference is not statistically significant at the 0.1% level (α = 0.001)
C. If there is truly no difference between the two groups, the probability of observing a difference
at least as large as (645 – 535) is less than 1%.
D. Inducing labor causes reduced cortisol
E. A 95% confidence interval for the difference in cortisol between the two groups would not
include zero.
(D) Statistical significance alone does not imply causation. The p-value (.0014) is less than .05 so
the difference is significant at the 5% level and A is true. The p-value is greater than .001 so the
difference is not significant at the .1% level and B is true. C is the definition of a p-value. E is
true because of the relationship between p-values and confidence intervals: a 95% confidence
interval for a difference will not include the null value (no difference) if the difference is
statistically significant at the 5% level.
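The link between the p-value and the confidence interval can be checked numerically. The sketch below builds a 95% confidence interval for the difference in means from the summary table, assuming a pooled-variance t interval (equal variances are an assumption, not something the card states). The critical value 2.101 for 18 degrees of freedom is hard-coded from standard t tables.

```python
import math

# Summary statistics from the table.
n1, mean1, sd1 = 10, 535, 60
n2, mean2, sd2 = 10, 645, 70

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sided 5% critical value of t with 18 df, taken from standard tables.
T_CRIT_DF18 = 2.101

diff = mean2 - mean1
ci_low = diff - T_CRIT_DF18 * se
ci_high = diff + T_CRIT_DF18 * se
excludes_zero = ci_low > 0 or ci_high < 0

print(f"difference = {diff}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
print("interval excludes zero:", excludes_zero)
```

Under these assumptions the interval runs from about 49 to about 171, so it lies entirely above zero, consistent with significance at the 5% level.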
A clinical trial is being planned in which a new drug (A) is to be compared to the drug in current
use (B). Patients will be randomly allocated into two groups: one group to receive drug A, the
other group drug B. Patients in each group will have systolic blood pressure (SBP)
measurements taken during a baseline period and after a prescribed period on the drug
therapy. It is planned to determine the effectiveness of the new drug by comparing the
difference in mean SBP changes (mean drop with drug A compared to mean drop with drug B)
with a t-test to determine whether the new drug (A) is better than the current drug (B) in
reducing blood pressure on average. As implied by the discussion, the investigators would use
the:
A. unpaired t-test (one-sided)
B. paired t-test (one-sided)
C. unpaired t-test (two-sided)
D. paired t-test (two-sided)
E. unpaired t-test (three-sided)
(A) The fact that the patients are to be "randomly allocated into two groups" implies an unpaired design. The last sentence implies that they are interested in detecting a difference in one direction only (new drug better than old), so a one-sided test should be done.
A summary of a randomized clinical trial of two treatments states that “no significant difference
(p > .05)” was found in treatment outcomes. Based on this, you should conclude that the
difference in treatment outcomes
A. is due to chance.
B. is due to the treatment
C. is not of clinical interest.
D. could be of clinical interest, if the sample sizes are large enough so that there is little likelihood
of missing an important difference.
E. could be clinically important, if the observed difference were large enough and if the sample
sizes are too small to yield much power to detect such a difference.
(E) A non-significant p-value alone is not sufficient information to rule out either chance or
treatment effect as explanations for the observed difference. Conclusions about clinical
importance are made based on the observed difference and confidence interval, which are not
reported here. If the observed difference is large enough to be clinically important, then the
sample size was too small to detect this difference as statistically significant. If the observed
difference is of little clinical importance, then one needs to ensure that the trial had sample
sizes large enough to detect any difference of clinical importance.
The following data are results from a comparative study of two diagnostic tests (A and B) for a given condition. Sixty patients known to have the condition were tested with both diagnostic tests.
              Test B
              (+)   (-)   Total
Test A (+)    36    11    47
Test A (-)     3    10    13
Total         39    21    60
The estimated sensitivity of test A is
A. 36/39
B. 39/60
C. 36/47
D. 47/60
E. none of the above
(D) Of 60 patients known to have the condition, 47 tested positive with test A; this implies the
sensitivity of test A = 47/60.
The following data are results from a comparative study of two diagnostic tests (A and B) for a given condition. Sixty patients known to have the condition were tested with both diagnostic tests.
              Test B
              (+)   (-)   Total
Test A (+)    36    11    47
Test A (-)     3    10    13
Total         39    21    60
You wish to use a significance test to compare the sensitivities of tests A and B. An appropriate
test would be
A. unpaired t-test
B. McNemar’s test
C. paired t-test
D. chi-square test for independent proportions
E. Fisher’s exact test.
(B) Sensitivities are calculated as proportions. Since each patient had both tests administered, we
have paired data, and to compare paired proportions, McNemar’s test is used.
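To make the paired comparison concrete, here is a minimal sketch that computes both sensitivities and McNemar's statistic from the 2x2 table. It shows the uncorrected chi-square form and an exact binomial version. Which variant the original question intends is not stated, and the variable names are illustrative.

```python
from math import comb

# Paired results for the 60 patients (rows: test A, columns: test B).
both_pos, a_only = 36, 11   # A(+)B(+), A(+)B(-)
b_only, both_neg = 3, 10    # A(-)B(+), A(-)B(-)
n = both_pos + a_only + b_only + both_neg

sens_a = (both_pos + a_only) / n   # 47/60
sens_b = (both_pos + b_only) / n   # 39/60

# McNemar's test uses only the discordant pairs.
n_disc = a_only + b_only
chi2 = (a_only - b_only) ** 2 / n_disc  # uncorrected statistic, 1 df

# Exact two-sided p-value: under the null hypothesis, the discordant pairs
# split as Binomial(n_disc, 0.5); double the smaller tail and cap at 1.
k = min(a_only, b_only)
tail = sum(comb(n_disc, i) for i in range(k + 1)) / 2 ** n_disc
p_exact = min(1.0, 2 * tail)

print(f"sensitivity A = {sens_a:.3f}, sensitivity B = {sens_b:.3f}")
print(f"McNemar chi-square = {chi2:.2f}, exact p = {p_exact:.3f}")
```

The 46 concordant pairs do not enter the test statistic at all. Only the 11 versus 3 split among the discordant pairs carries information about which test is more sensitive.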
The following abstract describes a trial of the use of antibiotic prophylaxis
against gonorrhea. The study subjects were volunteers from a crew of a large naval vessel
operating in the western Pacific in 1974 who were then randomly assigned to receive either
antibiotic (100 mg minocycline) or placebo before taking liberty.
Abstract: In a prospective evaluation of antibiotic prophylaxis against gonorrhea, 1080 men
were given 200 mg of oral minocycline or placebo after sexual intercourse with prostitutes in a
Far Eastern port. Later at sea, gonococcal infection was detected in 57 of 565 men given
placebo and 24 of 515 men given minocycline (P
(C) The null hypothesis is always one of no difference; here the interest is to compare the
infection rates of placebo and treatment groups.
The following abstract describes a trial of the use of antibiotic prophylaxis
against gonorrhea. The study subjects were volunteers from a crew of a large naval vessel
operating in the western Pacific in 1974 who were then randomly assigned to receive either
antibiotic (100 mg minocycline) or placebo before taking liberty.
Abstract: In a prospective evaluation of antibiotic prophylaxis against gonorrhea, 1080 men
were given 200 mg of oral minocycline or placebo after sexual intercourse with prostitutes in a
Far Eastern port. Later at sea, gonococcal infection was detected in 57 of 565 men given
placebo and 24 of 515 men given minocycline (P
(B) Since subjects are assigned to one of two independent groups and we are comparing
proportions, the Chi-square test is appropriate.
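The counts quoted in the abstract are enough to sketch the calculation. The example below computes a Pearson chi-square statistic for the 2x2 table without a continuity correction (an assumption, since the abstract does not say which form was used). It gets the 1-df p-value from the identity P(chi-square > x) = erfc(sqrt(x/2)).

```python
import math

# Counts quoted in the abstract.
infected = {"placebo": 57, "minocycline": 24}
total = {"placebo": 565, "minocycline": 515}

n = sum(total.values())
n_infected = sum(infected.values())

# Pearson chi-square: sum of (observed - expected)^2 / expected over all 4 cells.
chi2 = 0.0
for group in total:
    cells = (
        (infected[group], n_infected),                     # infected cell
        (total[group] - infected[group], n - n_infected),  # uninfected cell
    )
    for observed, column_total in cells:
        expected = total[group] * column_total / n
        chi2 += (observed - expected) ** 2 / expected

# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

# Ratio of infection proportions, placebo relative to minocycline.
risk_ratio = (infected["placebo"] / total["placebo"]) / (
    infected["minocycline"] / total["minocycline"]
)

print(f"chi-square = {chi2:.2f}, p = {p_value:.5f}, risk ratio = {risk_ratio:.2f}")
```

The infection proportions are about 10.1% on placebo and 4.7% on minocycline, a ratio of roughly 2.16.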
The reference to “(p
(D) “P
The reference to “(p
(A) The corresponding 95% confidence interval must include the estimated rate ratio (2.16).
Because p
You are reading a report of a clinical trial of two treatments in which a large number of
treatment outcomes (variables) were compared for “statistical significance”. Such “multiple
testing” causes difficulty in interpreting reported statistically significant differences because of
A. decreased power.
B. increased probability of a Type II error.
C. decreased positive predictive value.
D. increased probability of a Type I error.
E. decreased negative predictive value.
(D) Even if there are no real differences between the treatments for any of the variables tested, if
the tests were performed at the 5% alpha level, we expect 5% of the conclusions from the tests
to be “false positives”, i.e., claiming a difference exists when in fact none exists. The more tests
done, the higher the chance of at least one Type I error.
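How quickly the chance of at least one false positive grows can be sketched with the formula 1 - (1 - alpha)^k. This holds only under the simplifying assumption that the k tests are independent; outcomes measured in the same trial are often correlated, so treat the numbers as an illustration.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one Type I error across n_tests independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    rate = familywise_error_rate(k)
    expected_false_positives = 0.05 * k
    print(f"{k:2d} tests: P(at least one false positive) = {rate:.3f}, "
          f"expected false positives = {expected_false_positives:.2f}")
```

With 20 independent tests at the 5% level, the chance of at least one spurious "significant" result is about 64%, even though each individual test still has only a 5% false-positive rate.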
Prior to 1982, several large outbreaks of leptospirosis occurred in troops deployed to Panama
for jungle warfare training. In a field trial of the efficacy of doxycycline to prevent leptospirosis,
doxycycline (200 mg) or placebo was administered by tablet on a weekly basis and at the
completion of training to 940 volunteers from 2 U.S. Army units deployed to Panama for
training. Twenty cases of leptospirosis occurred in the placebo group (attack rate of 4.2%)
compared to only 1 case in the doxycycline group (attack rate of 0.2%, P
(A) The only two tests listed that are used to compare unpaired proportions are Fisher’s exact
test and the z-test. Fisher’s exact test is often recommended when small proportions are to be
compared.