Basic statistics (MRCP) Flashcards
A study is evaluating the effect of agomelatine on postnatal depression at
a mother and baby unit. Which one of the following should be considered
when assessing the internal validity of this study?
A. Benefits of agomelatine in major depression outside the postpartum period
B. The degree to which the subjects adhered to the study protocol
C. The cost of using agomelatine compared with standard care
D. Consistency of the reported outcome in comparison with previous studies
E. Benefits of agomelatine in postpartum depression when used at an outpatient service
B. Internal validity is the degree to which a study establishes the cause-and-effect relationship
between the treatment and the observed outcome. External validity is the degree to which
the results of a study become applicable outside the experimental setting in which the study
was conducted. In other words, external validity refers to the generalizability of study results, while
internal validity refers to the rigour of the research method. The benefit of agomelatine
in different populations (choices A and E) refers to external validity; the cost of the drug
and consistency of results obtained from different studies are related to applicability of the
intervention in a clinical setting. Assessment of adherence to study protocol is one of many ways
of analysing the quality of an intervention trial.
A new clinician-administered test for assessing suicidal risk is studied in a
prison population in Canada, where a high suicide rate of 1 in 25 has been
recorded. Which of the following indicates that this test is NOT suitable for
your clinical population?
A. The positive predictive value is 80%
B. The likelihood ratio for a positive test is 14
C. The prevalence of suicide in your clinical sample is 1 in 890
D. The inter-rater reliability (kappa) of the test is 0.8
E. The literacy rate of the prison population is very low but comparable with your clinical
sample
C. Having a high positive predictive value, a likelihood ratio of more than 10, and good inter-rater
reliability as measured by kappa are desirable properties of an instrument. But when the
same instrument is applied to a population with a much lower prevalence of suicide (the studied
phenomenon), the post-test probability decreases substantially. Post-test probability is a measure
of positive predictive value in the target population; it depends on the pretest probability (i.e. the
prevalence) and on the likelihood ratio.
A new rating scale being evaluated for anxiety has a sensitivity of 80% and
specificity of 90% against the standard ICD-10 diagnosis. The likelihood ratio
of a positive result is
A. Nearly 2
B. Nearly 0.2
C. 0.08
D. 8
E. 0.5
D. The likelihood ratio of a positive test (LR+) is the ratio between the probability of a
positive test in a person with disease and the probability of a positive test in a person without
disease. It can also be expressed as
LR+ = sensitivity/(1 – specificity)
Here, sensitivity = 0.8; specificity = 0.9.
Hence LR+ = 0.8/(1 – 0.9) = 0.8/0.1 = 8.
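The calculation above can be sketched in Python. The function names are illustrative only, and the negative likelihood ratio, LR– = (1 – sensitivity)/specificity, is included for completeness; it is not needed to answer this question.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = (1 - sensitivity) / specificity."""
    return (1 - sensitivity) / specificity


# Values from the question: sensitivity 80%, specificity 90%.
lr_pos = positive_likelihood_ratio(0.8, 0.9)
lr_neg = negative_likelihood_ratio(0.8, 0.9)
print(round(lr_pos, 2), round(lr_neg, 2))  # 8.0 0.22
```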
A pharmaceutical company developed a new antidepressant ‘X’. They
conducted a randomized double-blind placebo controlled trial of the drug.
The study had two arms: an active medication arm and a placebo arm.
Each arm had 100 subjects. Over a 4-week period, a 50% drop in Hamilton
depression scale (HAMD) scores was seen in 40 subjects in the active
medication arm, while a similar drop was seen only in 20 subjects in the
placebo arm. What is the number needed to treat (NNT) from this trial for
the new antidepressant?
A. 1
B. 2
C. 3
D. 4
E. 5
E. The absolute benefit increase is 40/100 – 20/100 = 20%, so NNT = 100/20 = 5.
During the placebo-controlled trial described in the previous question, 20% of
people on X developed active suicidal ideas, while only 10% of patients on
placebo developed the same side-effect. What is the number needed to
harm (NNH) associated with the suicidal ideas from the trial data?
A. 5
B. 10
C. 15
D. 20
E. 25
B. The absolute risk increase is 20% – 10% = 10%, so NNH = 100/10 = 10.
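NNT and NNH for the antidepressant ‘X’ trial are both reciprocals of an absolute difference in event rates. A minimal sketch follows; the helper name number_needed is hypothetical, not a standard library function.

```python
def number_needed(event_rate_treated: float, event_rate_control: float) -> float:
    """Reciprocal of the absolute difference in event rates (NNT or NNH)."""
    absolute_difference = abs(event_rate_treated - event_rate_control)
    if absolute_difference == 0:
        raise ValueError("no difference between arms; NNT/NNH is undefined")
    return 1 / absolute_difference


# Response: 40 of 100 on drug X versus 20 of 100 on placebo.
nnt = number_needed(40 / 100, 20 / 100)
# Suicidal ideas: 20% on drug X versus 10% on placebo.
nnh = number_needed(0.20, 0.10)
print(round(nnt), round(nnh))  # 5 10
```

When the reciprocal is not a whole number, NNT is conventionally rounded up to the next whole patient.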
The prevalence of depression in patients with mild cognitive impairment
is 10%. On applying a depression rating scale with the likelihood ratio of a
positive test (LR+) equal to 10, a patient with mild cognitive impairment
becomes test positive. The probability that this patient is depressed is equal
to
A. 15%
B. 32%
C. 52%
D. 85%
E. 100%
C. This question tests one’s ability to calculate post-test probability from likelihood ratios.
The probability of having a disease after testing positive with a diagnostic test depends on
two factors: (a) the prevalence of the disease, (b) the likelihood of a positive test result using
the instrument. It is important to remember that baseline prevalence of a disease for which a
diagnostic instrument is being tested is taken as the pretest probability.
So pretest probability = 10%
Now, post-test odds = likelihood ratio × pretest odds
From a given probability odds can be calculated using the formula
odds = (probability)/(1 – probability)
Here pretest odds = (10%)/(1 – 10%) = 10/90 = 1/9.
Now post-test odds = likelihood ratio × pretest odds
= 10 × 1/9 = 10/9
Using the formula probability = odds/(1 + odds)
post-test probability = (10/9)/[1 + (10/9)] = 10/19 = 52.6%
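The probability-to-odds-to-probability steps above can be written as a small Python function. This is a sketch; the function name is illustrative. The second example borrows the prevalences (1 in 25 and 1 in 890) and the LR+ of 14 from the suicide-risk question earlier in this deck, purely to show how strongly prevalence drives the result.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = likelihood_ratio * pretest_odds
    return post_test_odds / (1 + post_test_odds)


# Depression in mild cognitive impairment: prevalence 10%, LR+ = 10.
p_mci = post_test_probability(0.10, 10)
print(f"{p_mci:.1%}")  # 52.6%

# Same LR+ (14) applied at two very different prevalences.
p_prison = post_test_probability(1 / 25, 14)   # roughly 37%
p_clinic = post_test_probability(1 / 890, 14)  # roughly 1.6%
```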
A multi-centre double blind pragmatic randomized controlled trial (RCT)
reported remission rates for depression of 65% for fluoxetine and 60% for
dosulepin. The number of patients that must receive fluoxetine for one
patient to achieve the demonstrated beneficial effect is
A. 60
B. 20
C. 15
D. 10
E. 5
B. This question tests one’s knowledge of the NNT (number needed to treat) concept. NNT
is given by the reciprocal of the absolute benefit increase (ABI) in therapeutic trials. ABI is
the difference between benefit due to experimental intervention and the compared standard/
placebo. Here it is given by 65% – 60% = 5%. If ABI = 5%, NNT = 100/5 = 20.
In a randomized double-blind trial two groups of hospitalized depressed
patients treated with selective serotonin reuptake inhibitors (SSRIs) are
evaluated for beneficial effects on insomnia of trazodone vs temazepam.
Which of the following is NOT an important factor when evaluating the
internal validity of results obtained from the above study?
A. Baseline differences in antidepressant therapy between the two groups
B. The method used to randomize the sample
C. Setting in which the study takes place
D. Sensitivity of the insomnia scale to pick up changes in severity
E. Inclusion in the final analysis of data from patients who have dropped out
C. Threats to internal validity of an experimental study include confounding, selection bias,
differential attrition, and quality of measurement. Having a significant difference in baseline SSRI
therapy could explain differential outcomes in the trazodone vs temazepam groups. Similarly,
poor randomization may lead to selection bias and influence the differences in outcome. Failure
to account for differential drop-out rates may spuriously inflate or deflate the difference in
outcome. Using a scale with poor sensitivity to change will reduce the magnitude of differences
that could be observed. Given that both groups are recruited from the same setting (hospital), the setting
should not influence internal validity; on the other hand, it might well influence generalizability of results
to the non-hospitalized population (external validity).
While adapting the results of an RCT into clinical practice, a clinician wants
to calculate the new NNT values for his own clinical population using the
results of the RCT. Apart from the reported RCT which of the following is
needed to carry out the calculation of the new NNT?
A. The expected rate of spontaneous resolution of the treated condition in the clinical
population
B. The size of the clinical population
C. The case fatality rate for the treated condition in the clinical population
D. Lifetime prevalence of the disease in the clinical population
E. All of the above
A. Published RCTs may quote impressive outcomes in terms of NNT. Applying principles of
evidence-based medicine, one must check for the internal validity of a study and the degree of
generalizability before adapting the results to clinical practice. One must also be aware of the
fact that though clinically more meaningful, NNTs quoted in RCTs may not translate to the same
extent in actual clinical practice. One way of appreciating the usefulness of a newly introduced
drug is to calculate the NNT for one’s own clinical population (target population). To enable
this one may estimate the patient expected event rate (PEER), which is given by the expected
spontaneous resolution rate or the response rate for an existing standard treatment. This can
be obtained from the local audit data or clinical experience. The product of PEER and relative
benefit increase from the published RCT gives the new absolute benefit increase (new ABI)
value for the target population. The inverse of the new ABI gives the new NNT for the target
population. The disease prevalence rate or absolute size of the target population has no effect on
the new NNT.
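The PEER adjustment described above can be sketched as follows. The trial rates are borrowed from the fluoxetine versus dosulepin question in this deck; the PEER of 30% is an invented figure for illustration, and relative benefit increase is taken as (EER – CER)/CER.

```python
def nnt_for_target_population(trial_eer: float, trial_cer: float, peer: float) -> float:
    """New NNT = 1 / (PEER x relative benefit increase from the trial)."""
    relative_benefit_increase = (trial_eer - trial_cer) / trial_cer
    new_abi = peer * relative_benefit_increase
    return 1 / new_abi


# Trial: 65% remission on the experimental drug, 60% on the comparator.
# Hypothetical local clinic: only 30% remit on the standard treatment (PEER).
nnt_local = nnt_for_target_population(trial_eer=0.65, trial_cer=0.60, peer=0.30)
print(round(nnt_local))  # 40
```

With a lower baseline response in the local population, the same relative benefit translates into a smaller absolute benefit, so the NNT rises from 20 in the trial to about 40.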
In an attempt to ensure equivalent distribution of potential effect-modifying
factors in treating refractory depression, a researcher weighs the imbalance
that might be caused whenever an individual patient enters one of the two
arms of the study. Every patient is assigned to the group where the least
amount of imbalance will be caused. This method is called
A. Stratification
B. Matching
C. Minimization
D. Randomization
E. Systematic sampling
C. In most treatment trials interventions are allocated by randomization. Block
randomization and stratified randomization can be used to ensure the balance between groups
in size and patient characteristics. But it is very difficult to stratify using several variables in a
small sample. A widely acceptable alternative approach is minimization. This method can be used
to ensure very good balance between groups for several confounding factors irrespective of the
size of the sample. With minimization the treatment allocated to the next participant enrolled in
the trial depends (wholly or partly) on the characteristics of those participants already enrolled.
This is achieved by a simple mathematical computation of magnitude of imbalance during each
allocation.
The effectiveness of an intervention is measured by using pragmatic trials.
Which trial design is normally employed when carrying out a pragmatic trial?
A. RCT
B. Meta-analysis
C. Systematic review
D. Cohort study
E. Case series
A. RCTs provide high-quality evidence for or against proposed interventions. But RCTs
have a major limitation in terms of generalizability. This is because the trials are conducted in a
somewhat artificial experimental setting that is different from clinical practice. So RCTs have
high internal validity due to rigorous methodology but poor external validity. Pragmatic RCTs are
a type of RCTs introduced with the intention of increasing external validity, i.e. generalizability
of RCT results. But this takes place at the expense of internal validity. In pragmatic RCTs the
trial takes place in a setting as close as possible to natural clinical practice, i.e. the inclusion and
exclusion criteria are less fastidious, ‘treatment as usual’ is often employed for comparisons
instead of placebos, and real-world, functionally significant outcomes are considered.
The probability of detecting the magnitude of a treatment effect from a
study when such an effect actually exists is called
A. Validity
B. Precision
C. Accuracy
D. Power
E. Yield
D. The power of a study refers to the ability of the study to show the difference in outcome
between studied groups if such a difference actually exists. The term power calculation is often
used while referring to sample size estimation before a study is undertaken. In order to carry out
power calculation one has to know the expected precision and variance of measurements within
the study sample (obtained from a literature search or pilot studies), the magnitude of a clinically
significant difference, the certainty of avoiding type 1 error as reflected by the chosen
p value, and the type of statistical test one will be performing. There is no point in calculating the
statistical power once the results of a study are known. On completion of trials, measures such as
confidence intervals indicate the power of a study and the precision of results.
Power is the ability of a study to detect an effect that truly exists. Power can
also be defined as
A. Probability of avoiding type 1 error
B. Probability of committing type 1 error
C. Probability of committing type 2 error
D. Probability of detecting a type 2 error
E. Probability of avoiding type 2 error
E. Power refers to the probability of avoiding a type 2 error. To calculate power, one needs
to know four variables.
1. sample size
2. magnitude of a clinically significant difference
3. probability of type 1 error (significance level from which p value is derived)
4. variance of the measure in the study sample.
Underpowered trials are those that enrol too few participants to identify differences between
interventions (arbitrarily taken as at least 80% of the time) when such differences truly exist.
Underpowered RCTs are prone to false-negative conclusions (type 2 errors). Somewhat
controversially, underpowered trials are considered to be unethical, as they expose participants
to the ordeals of research without providing an adequate contribution to clinical development.
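The way the four variables combine can be illustrated with a rough power calculation. The sketch below assumes a two-sided comparison of two equal-sized groups with a common standard deviation and uses a normal approximation; it ignores the tiny chance of a significant result in the wrong direction. The numbers (a 5-point difference, SD of 10, 64 per group) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(n_per_group: int, difference: float, sd: float,
                      alpha: float = 0.05) -> float:
    """Normal-approximation power for comparing two group means."""
    z = NormalDist()  # standard normal distribution
    standard_error = sd * sqrt(2 / n_per_group)
    z_critical = z.inv_cdf(1 - alpha / 2)
    return z.cdf(abs(difference) / standard_error - z_critical)


power_64 = approximate_power(n_per_group=64, difference=5, sd=10)
power_20 = approximate_power(n_per_group=20, difference=5, sd=10)
print(f"{power_64:.2f} {power_20:.2f}")
```

Under these assumptions about 64 patients per group gives roughly 80% power, while 20 per group falls well short, which is the sense in which a small trial is underpowered.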
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
The positive predictive value of this test is
A. 50%
B. 60%
C. 40%
D. 100%
E. 0%
D. It is useful to construct a 2 × 2 table for calculating properties of reported diagnostic
tests. From the given information we can draw the following:
                 Schizophrenia    Controls    Total
Test positive    60               0           60
Test negative    40               100         140
Total            100              100         200
Now, positive predictive value = true positive/total positive = 60/60 = 100%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
How sensitive is this test in detecting schizophrenia?
A. 60%
B. 40%
C. 100%
D. 90%
E. 0%
A. Sensitivity = true positive/total diseased (schizophrenia subjects) = 60/100 = 60%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. How
accurate is this test in detecting schizophrenia?
A. 100%
B. 80%
C. 60%
D. 40%
E. 70%
B. Accuracy = all true observations/total population studied = (100 + 60)/200 = 160/200 = 80%.
A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. What
are the chances that the test will turn negative in your next patient with
schizophrenia?
A. 100%
B. 70%
C. 60%
D. 40%
E. 30%
D. This question asks the candidate to calculate the probability of a negative test in
someone with the disorder – false-negative rate (FNR)
This is given by FNR = false negative/total diseased = 40/100 = 40%
FNR is the same as (1 – sensitivity); similarly, the false-positive rate (FPR) is the same as (1 – specificity).
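All of the measures asked about in this series of schizophrenia test questions follow from the four cells of the 2 × 2 table. A short sketch, with cell counts taken from the question (60 detected out of 100 patients, no false positives among 100 controls):

```python
# Cells of the 2 x 2 table.
true_pos, false_neg = 60, 40    # 100 schizophrenia patients
false_pos, true_neg = 0, 100    # 100 controls

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)
ppv = true_pos / (true_pos + false_pos)
npv = true_neg / (true_neg + false_neg)
accuracy = (true_pos + true_neg) / (true_pos + false_neg + false_pos + true_neg)
false_negative_rate = false_neg / (true_pos + false_neg)  # equals 1 - sensitivity

print(f"sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")
print(f"PPV {ppv:.0%}, NPV {npv:.0%}")
print(f"accuracy {accuracy:.0%}, FNR {false_negative_rate:.0%}")
```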
Which of the following properties of a screening test increases with
increasing disease prevalence in the population?
A. Negative predictive value
B. Sensitivity
C. Specificity
D. Accuracy
E. Positive predictive value
E. Sensitivity, specificity, and accuracy are measures that reflect the characteristics of the
test instrument. These measures do not vary with changes in the disease prevalence. Positive
predictive value increases while negative predictive value decreases with rising population
prevalence of the disease studied. The prevalence can be interpreted as the probability before the
test is carried out that the subject has the disease, known as the prior probability of disease. The
positive and negative predictive values are the revised estimates of the same probability for those
subjects who are positive and negative on the test, and are known as posterior probabilities.
Thus the difference between the prior and posterior probabilities is one way of assessing the
usefulness of the test.
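The dependence of predictive values on prevalence can be checked numerically. The sketch below holds sensitivity and specificity fixed at 80% and 90% (figures borrowed from the anxiety rating scale question in this deck) and varies only the prevalence; the three prevalences are arbitrary.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


results = [predictive_values(0.8, 0.9, p) for p in (0.01, 0.10, 0.50)]
for prevalence, (ppv, npv) in zip((0.01, 0.10, 0.50), results):
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

As prevalence rises from 1% to 50%, the PPV climbs and the NPV falls, even though the test itself is unchanged.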
Two observers are rating MRI scans for the presence or absence of white
matter hyperintensities. On a particular day from the records, they are
observed to have an agreement of 78%. If they could be expected to agree
50% of the time, even if the process of detecting hyperintensities is by pure
chance, then the value of kappa statistics is given by
A. 50%
B. 44%
C. 56%
D. 78%
E. 22%
C. Agreement between different observers can be measured using the kappa (κ) statistic
for categorical measures such as the one highlighted in this question (presence or absence of
MRI hyperintensities). Kappa is a measure of the level of agreement in excess of that which would
be expected by chance. It is calculated as the observed agreement in excess of chance, expressed
as a proportion of the maximum possible agreement in excess of chance. In other words
kappa = the difference between observed and expected agreement/(1 – expected agreement).
In this example, the observed agreement is 78%. The expected agreement is 50%. Hence
kappa = (0.78 – 0.50)/(1 – 0.50) = 0.28/0.50 = 56%.
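The kappa formula is easy to verify in code. A minimal sketch, with an illustrative function name:

```python
def kappa(observed_agreement: float, expected_agreement: float) -> float:
    """Agreement beyond chance as a share of the maximum possible beyond chance."""
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)


k = kappa(observed_agreement=0.78, expected_agreement=0.50)
print(round(k, 2))  # 0.56
```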
The number of days that a series of five patients had to wait before starting
cognitive behavioural therapy (CBT) at a psychotherapy unit is as follows:
12, 12, 14, 16, and 21. The median waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days
C. The median is calculated by placing observations in rank order (either ascending
or descending) and picking the central value. If the number of observations is even,
the median is taken as the arithmetic mean of the two middle values.
The number of days that a series of five patients had to wait before starting
CBT at a psychotherapy unit is as follows: 12, 12, 14, 16, and 21. The mean
waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days
A. The arithmetic mean is calculated from the sum of all individual observations divided
by the number of observations. Here the number of observations = 5. The sum of individual
observations = 12 + 12 + 14 + 16 + 21 = 75. The average = 75/5 = 15.
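Both summary measures for the CBT waiting times can be checked with Python's standard statistics module. The four-value list is an added example to show the even-count rule for the median.

```python
from statistics import mean, median

waiting_days = [12, 12, 14, 16, 21]

mean_wait = mean(waiting_days)      # 75 / 5
median_wait = median(waiting_days)  # middle value of the ranked list
print(mean_wait, median_wait)  # 15 14

# With an even number of observations the two middle values are averaged.
median_even = median([12, 12, 14, 16])  # (12 + 14) / 2
```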
The most clinically useful measure that helps to inform the likelihood of
having a disease in a patient with positive results from a diagnostic test is
A. Accuracy
B. Positive predictive value
C. Sensitivity
D. Specificity
E. Reliability
B. The probability that a test will provide a correct diagnosis is not given by the sensitivity
or specificity of the test. Sensitivity and specificity are properties of the test instrument – they
are not functions of the target population/clinical sample. On the other hand, positive and
negative predictive values are functions of the population studied; they provide much more
clinically useful information. Predictive values observed in one study do not apply universally.
Positive predictive value increases with increasing prevalence of the disease; negative predictive
value decreases with increasing prevalence. Sensitivity and specificity, being properties of the
instrument used, do not vary with prevalence.
Zarkin et al., 2008 reported the cost-effectiveness comparison of naltrexone
and placebo in alcohol abstinence. The mean effectiveness measured as
percentage days of abstinence was nearly 80% for naltrexone group while
it was 73% for the placebo group. The mean cost incurred for the placebo
group was $400 per patient. The naltrexone group incurred a cost of
$680 per patient. How much additional cost needs to be spent per patient
for each percentage point increase in total days of abstinence when using
naltrexone compared with placebo?
A. $40
B. $50
C. $7
D. $500
E. $2
A. The incremental cost-effectiveness ratio (ICER) of intervention A over intervention B can be
defined as the difference in cost (C) of the two interventions divided by the difference in mean
effectiveness (E), i.e. ICER = (C_A – C_B)/(E_A – E_B), where intervention B is usually the placebo
or standard intervention that is compared with intervention A. In this example, the difference in
costs = $680 – $400 = $280. The difference in effectiveness as measured by percentage days of
abstinence is 80% – 73% = 7 percentage points. Hence
ICER = 280/7 = $40 per patient per percentage point of days of abstinence.
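The same ratio as a small Python function (the name icer is illustrative):

```python
def icer(cost_a: float, cost_b: float, effect_a: float, effect_b: float) -> float:
    """Extra cost per extra unit of effect for intervention A versus B."""
    return (cost_a - cost_b) / (effect_a - effect_b)


# Naltrexone: $680 and 80% days abstinent; placebo: $400 and 73% days abstinent.
cost_per_point = icer(cost_a=680, cost_b=400, effect_a=80, effect_b=73)
print(cost_per_point)  # 40.0
```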
Two continuous variables A and B are found to be correlated in a nonlinear
fashion. All of the following can be considered as suitable statistical
techniques for examining this relationship except
A. Curvilinear regression
B. Logistic regression
C. Multiple linear regression
D. Polynomial regression
E. Exponential regression
C. When the relationship between two continuous variables is plotted in a graph,
the resulting distribution may be a straight line or a curve. If the relationship between the
independent (X) variable and dependent (Y) variable appears to follow a straight line, then a linear
regression model can be constructed to predict the dependent variable from the independent variable.
Otherwise, one can resort to one of the following methods:
1. Attempting to transform the available data to straighten the curved relationship.
2. One can try curvilinear regression, e.g. logarithmic regression, exponential regression, and
trigonometric regression.
3. Unless there is a theoretical reason for supposing that a particular form of the equation,
such as logarithmic or exponential, is needed, we usually test for
non-linearity by using a polynomial regression equation.
4. Multiple linear regression is often used to examine the linear relationships when there is more
than one independent variable influencing a dependent variable.
A drug representative presents data on a new trial. The data show that
drug A prevents annual hospitalization in 20% more dementia patients than
placebo. You are very impressed but your consultant wants to know how
many patients you need to treat to prevent one hospitalization. The correct
answer is
A. 20
B. 5
C. 80
D. 1
E. 100
B. The answer to this question can be found by calculating the number needed to treat
(NNT). The absolute increase in benefit (ABI) is given by the difference in outcome between two
groups. This is 20% as quoted by the drug representative. Hence NNT = 100/20 = 5. You need to
treat five patients with the new drug to prevent one annual hospitalization. How small must the
NNT be to be clinically impressive? This depends on the availability of other interventions and
their NNTs, incremental cost of the proposed intervention, and tolerability of the intervention.
The last one can be partly deciphered by calculating the number needed to harm for a notable
side-effect of the intervention.
A new study attempts to evaluate the benefits of regular exercise in
preventing depression compared with unmodified lifestyle in a sample of
80 healthy elderly men. Which of the following is not possible in such a
study design?
A. Randomized trial
B. Allocation concealed trial
C. Prospective trial
D. Double-blinded trial
E. Controlled trial
D. Blinding reduces differential assessment of outcomes of interest (ascertainment bias,
information bias, or observer bias) that can occur if the investigator or participant is aware of
the group assignment. Blinding can also improve compliance and retention of trial participants
and reduce unaccounted supplemental care or treatment that may be sought by the participants.
Single blinding refers to either the investigator or the patient being blind to group assignment.
Double blinding refers to both the patient and the investigator remaining unaware of the group
assignment after randomization. This is desirable but not always possible in RCTs. In the example
above, the subjects who undertake the exercise schedule cannot be kept unaware of exercising!
A single-blind trial is possible in such cases.
When searching medical databases, the term MeSH refers to
A. Software that distributes all indexed articles
B. A keyword that will retrieve all published articles by an author
C. A thesaurus of medical subject headings
D. A keyword that stops ongoing search process
E. A database of mental health and social care topics
C. MeSH stands for medical subject headings. It is a thesaurus embedded in the
PubMed–MEDLINE interface and can be used to search literature more effectively using
recognized key words.
Which of the following is strictly correct about a single-blind study design?
A. Only the patients, but not the researchers, do not know whether placebo or active drug
is being administered
B. Only the researchers, but not the patients, do not know whether placebo or active drug
is being administered
C. Both the patient and researchers do not know the treatment given
D. Only one group of the trial subjects is kept unaware of the treatment status
E. Either the patients or the researchers do not know whether placebo or active drug is
administered
E. Single blind: either the patient or the clinician remains unaware of the intervention given.
Double blind: both the patient and investigator are unaware of the given intervention.
Open label: both researchers and the participants are aware of treatment being given after
randomisation.
Triple blind: apart from the patient and the researcher, those who measure the study outcomes
(the assessors) are also unaware of the given intervention.
Which one of the following correctly describes a crossover trial?
A. Halfway through the treatment phase, the subjects from both arms interchange
randomly
B. Each subject receives both intervention and control with a washout period in between
C. Controls from one trial are shared with another trial where a different drug is evaluated
simultaneously
D. The trial permits investigation of the effect of more than one independent variable on
the clinical outcome
E. None of the above
B. If random interchange between treatment and placebo groups occurs halfway through
the study, this will lead to chaos and failed randomization. This is termed contamination.
This can occur when participants or their care givers discover they are ‘controls’, and obtain
the experimental treatment outside the trial, thus effectively becoming the active treatment
group. Choice C is practically impossible; to share controls of one RCT with another means the
trial is open label. When each subject in the trial receives both intervention and placebo with a
washout period in between while remaining blind to the intervention, this is called a crossover
RCT. Crossover trials are possible only if short-term outcomes are evaluated in chronic
diseases. This is because the disease process must be sufficiently long for the subject to receive
both interventions across its course. Any intervention applied in a crossover setting must not
permanently alter the disease process.
A study evaluates the effect of various psychological interventions on
bulimia. This study could be termed as a factorial design if
A. Halfway through the treatment phase, the subjects from two arms interchange randomly
B. Each subject receives both intervention and control with a washout period in between
C. Controls from one trial are shared with another trial where a totally different
psychotherapy is evaluated simultaneously
D. The trial permits investigation of the effect of more than one psychotherapy, both
separately and combined, on the clinical outcome.
E. None of the above
D. If one wishes to compare the effect of more than one intervention against placebo either
a multi-arm RCT or a factorial design can be chosen. A multi-arm RCT is a simple extension of
the usual RCTs where an extra arm of subjects is generated through randomization to allocate
the second intervention in addition to placebo and the first intervention groups. A factorial RCT
evaluates the effect of more than one intervention, independently and also in combination. In the
above example the effect of two different psychotherapies independently and in combination
could be studied using a factorial design.
A 2 × 2 contingency table is constructed to analyse the primary outcome
data of a trial. The degrees of freedom to use chi-square statistics is
A. 1
B. 2
C. 3
D. 4
E. –4
A. ‘Degree of freedom’ is defined as the number of values in the final calculation of statistics
that are free to vary. In a two-way chi-square test, this is given by
Degrees of freedom (d.f.) = (number of rows – 1) × (number of columns – 1)
In this question, for a 2 × 2 table, there are 2 rows and 2 columns. Hence
d.f. = (2 – 1) × (2 – 1) = 1 × 1 = 1.
Degrees of freedom cannot take negative values.
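The rule generalizes to any contingency table. A brief sketch (the function name is illustrative):

```python
def chi_square_degrees_of_freedom(rows: int, columns: int) -> int:
    """(rows - 1) x (columns - 1) for a two-way contingency table."""
    if rows < 2 or columns < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (rows - 1) * (columns - 1)


df_2x2 = chi_square_degrees_of_freedom(2, 2)
df_3x4 = chi_square_degrees_of_freedom(3, 4)
print(df_2x2, df_3x4)  # 1 6
```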
Which one of the following is correctly matched with the most suitable study
method?
A. Diagnostic test: case–control study
B. Prognosis: prospective cohort study
C. Therapy: cross-sectional survey
D. Aetiology: case series
E. Epidemiology: RCT
B. No single study design is sufficient in itself to answer various clinical questions. For
evaluation of a diagnostic test, a survey design that allows comparison with the gold standard is
often used. For prognostic studies a prospective cohort design is useful. Therapeutic interventions
are best evaluated using RCTs. Aetiological studies are often cohort or case–control studies;
although the RCT is ideal it may not be always possible to conduct one. Epidemiological studies
are often cross-sectional surveys.
Which of the following characteristics of a pragmatic RCT distinguishes it from
an explanatory RCT?
A. Pseudo-randomization is practised in pragmatic trials
B. Type 1 error level is set to be higher in pragmatic trials
C. Descriptive rather than inferential statistics are used to report the outcome of
pragmatic trials
D. Higher generalizability is achieved in pragmatic trials
E. Strict exclusion of patients with comorbid conditions is seen in pragmatic trials
D. The RCT has traditionally been considered as a study design that can yield results with
a high degree of internal validity. But the major drawback of RCTs is that the process takes
place under highly experimental conditions, which are not seen in clinical practice. So any results
achieved from such RCTs, though valid, may not be reproducible in everyday practice. In order
to circumvent this issue, more naturalistic trials that retain core principles of RCT such as
randomization, longitudinal follow-up, and controlled intervention are being increasingly used.
Such real-world RCTs are called pragmatic trials or effectiveness trials. Such trials can be useful
to find out if an intervention will be effective in clinical practice, although they may not be suitable
to study the biological efficacy of a drug. A pragmatic RCT may reject various practices seen in an
explanatory RCT, such as strict exclusion criteria, blinding, placebo use, fixed dose intervention,
high follow-up care, per-protocol analysis, etc. But basic principles such as randomization and use
of probability theory (hypothesis testing and p values) are retained.
Which one of the following statements with respect to bias is false?
A. Bias is a systematic error
B. Bias cannot be controlled for during the analysis stage of a trial
C. The presence of bias always overestimates the final effect
D. Blinding reduces measurement bias
E. Randomization reduces selection bias
C. Bias is defined as any trend in the collection, analysis, interpretation, publication, or
review of data that can lead to conclusions that are systematically different from the truth. It can
also be termed a systematic error that influences the result in either direction. Hence a biased
study could either overestimate or underestimate the true effect, depending on the direction of
the trend. Bias may be introduced by poor study design or poor data collection. Bias cannot be
‘controlled for’ at the analysis stage. In RCTs, randomization ensures a reduction in selection bias
if the process is carried out in a strictly concealed manner. Blinding can reduce the measurement
bias if properly executed.
Which one of the following is NOT a major disadvantage of a double blind,
well-concealed RCT design?
A. Very expensive to carry out
B. May become time consuming
C. Experimental results may not translate to clinical samples
D. Randomization may be unethical and not possible in certain cases
E. Introduction of recall bias
E. Recall bias refers to the systematic error produced by the tendency of subjects to recall
an exposure differently when they are diseased compared with when they are not. Recall bias
most often occurs in case–control studies. The remaining choices refer to genuine disadvantages
of a well-conducted RCT.
The last observation carried forward (LOCF) method is not suitable for
processing the data for which of the following RCTs with intention to treat
analysis?
A. Benzodiazepines for anxiety
B. SSRIs for depression
C. Venlafaxine for generalized anxiety disorder
D. Memantine for Alzheimer’s disease
E. Risperidone for bipolar disorder
D. In most drug trials, patients drop out because of non-efficacy or adverse events. If we
think that a number of participants drop out because of non-efficacy, dropping them from the
analysis would project a favourable outcome for the drug in question. Hence the LOCF method
takes the last observation and utilizes it in the analysis. For illustration, we take two subjects in a
trial of antidepressants.
Subject 1 improves significantly over the 4 weeks; his MADRS score has dropped to 1 from a
baseline of 30, while Subject 2 dropped out of the study in the second week due to non-efficacy.
If we remove Subject 2 from the analysis, the mean score at the end would be 1 (a whopping
improvement of 29 points on the MADRS), while if we carry forward his last observation score
(week 2) of 30 to the end and take the mean of the two scores (15.5), the drop is only
14.5 points from the mean baseline score of 30.
Trials of Alzheimer’s disease interventions are different, since we do not expect (although we
most definitely would like to see) improvement in the cognitive score, but rather a slow decline
in scores over time, in spite of the medications, due to the progressive nature of the illness.
If a patient drops out early because of adverse effects, carrying forward his
score to the endpoint analysis will falsely project a favourable outcome. Again to illustrate, let us
consider a trial of cholinesterase inhibitors. Subject 1 experienced a decline of 19 points over
4 weeks, while Subject 2 dropped out in the first week, when his MMSE had not declined.
If we carry forward his last observation of 20, it
will look like there was no deterioration at all, and the mean decline over time
would be diluted to about 10 points (9.5), rather than a drop of 19.
As a corollary, the reason for drop-out is another important issue. In trials of Alzheimer’s disease
interventions, early drop-outs are most probably due to adverse effects, while late drop-outs are
due to non-efficacy. This can again project a favourable outcome for the drug.
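The antidepressant illustration above can be sketched in a few lines of Python. This is only a minimal sketch: the baseline of 30 and the endpoint scores of 1 and 30 come from the example, while the intermediate weekly scores and the helper name `locf` are assumptions made for illustration. With these numbers the mean of 1 and 30 works out to 15.5.

```python
from statistics import mean

def locf(scores):
    """Fill missed visits (None) with the most recent observed score."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled

# Weekly MADRS scores: baseline, then weeks 1 to 4. None marks a missed visit.
# Intermediate values are invented for illustration only.
subject_1 = [30, 22, 14, 6, 1]
subject_2 = [30, 30, 30, None, None]  # drops out after week 2

# Completers-only analysis: ignore anyone without a final score.
completer_endpoints = [s[-1] for s in (subject_1, subject_2) if s[-1] is not None]
completers_mean = mean(completer_endpoints)

# LOCF analysis: carry each drop-out's last score forward to the endpoint.
locf_mean = mean(locf(s)[-1] for s in (subject_1, subject_2))

print("Completers-only endpoint mean:", completers_mean)
print("LOCF endpoint mean:", locf_mean)
print("Mean improvement under LOCF:", 30 - locf_mean)
```

The same function applied to a declining MMSE series would freeze an early drop-out at a flattering score, which is the problem described for Alzheimer’s disease trials.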
All of the following measures can be used to decrease the heterogeneity in
a meta-analysis except
A. Transformation of the outcome variable in question
B. Employing meta regression analysis
C. Using a random effects model
D. Doing a subgroup analysis
E. Including data from smaller unpublished studies
E. There are a number of ways to manage heterogeneity. The easiest way would be to avoid
it. This includes using strict inclusion criteria to include studies that are as similar as possible.
In case of continuous variables, one of the ways would be to transform the data so that all data
look similar and are less heterogeneous. Meta regression is a collection of statistical procedures
to assess heterogeneity, in which the effect size of each study is regressed on one or several covariates,
with a value defined for each study. The fixed-effect model of meta-analysis
considers the variability between the studies as exclusively due to random variation.
The random-effects model assumes a different underlying effect for each study and takes this
into consideration as an additional source of variation. The effects of the studies are assumed to
be randomly distributed and the central point of this distribution is the focus of the combined
(pooled) effect estimate. If there were some types of studies that were likely to be quite
different from the others, a subgroup analysis may be done. And finally, one could exclude the
studies that contribute a great deal to the heterogeneity. Locating unpublished studies may help
reduce publication bias but will not have any predictable and constant effect on the degree of
heterogeneity.
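To make the contrast between the fixed-effect and random-effects models concrete, here is a minimal sketch using inverse-variance weighting and the DerSimonian-Laird estimate of between-study variance. The choice of DerSimonian-Laird is an assumption (the text does not name an estimator), and the effect sizes and variances are hypothetical.

```python
import math

def pooled_estimates(effects, variances):
    """Return fixed-effect and DerSimonian-Laird random-effects summaries."""
    k = len(effects)
    w = [1 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    se_fixed = math.sqrt(1 / sum(w))

    # Cochran's Q and the DerSimonian-Laird between-study variance (tau squared)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau squared to each study's own variance
    w_re = [1 / (v + tau2) for v in variances]
    random_ = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_random = math.sqrt(1 / sum(w_re))
    return fixed, se_fixed, random_, se_random, tau2

# Hypothetical study effects (e.g. standardized mean differences) and variances
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.01, 0.02, 0.015, 0.01]

fixed, se_fixed, random_, se_random, tau2 = pooled_estimates(effects, variances)
print(f"fixed: {fixed:.3f} (SE {se_fixed:.3f})")
print(f"random: {random_:.3f} (SE {se_random:.3f}), tau^2 = {tau2:.3f}")
```

When the studies disagree more than chance would predict, tau squared is positive and the random-effects standard error is wider: the model accommodates heterogeneity in the pooled estimate rather than removing it.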
Both odds ratios and relative risk are often used as outcome measures in
published studies. Which of the following statements is true regarding these
measures?
A. The odds ratio cannot be calculated in cohort studies
B. Incidence rate is required to calculate the odds ratio
C. Relative risk cannot be calculated for case–control studies
D. If the outcome of interest is very common, the odds ratio approximates relative risk
E. The odds ratio cannot be used to study dichotomous outcomes
C. Odds are the probability of an event occurring divided by the probability of the event
not occurring. An odds ratio is the odds of the event in one group (e.g. intervention group)
divided by the odds in another group (e.g. control group). Odds ratios tend to exaggerate the
true relative risk to some degree. But this exaggeration is kept minimal and even negligible if
the probability of the studied outcome is low (empirically, less than 10%); in such cases the odds
ratio approximates the true relative risk. As the event becomes more common the odds ratio
no longer remains a useful proxy for the relative risk. It is suggested that the use of odds ratios
should probably be limited to case-control studies and logistic regression examining dichotomous
variables. Risk refers to the probability of a new event occurring over a given time; in other words,
it corresponds to the incidence. The inherent cross-sectional nature of a case–control study
(where ‘existing cases’ are recruited) does not allow one to study ‘new’ incidences. Hence we
cannot measure risk, and so relative risk, from case–control designs.
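The claim that the odds ratio approximates the relative risk only when the outcome is rare can be checked with a short calculation. The 2×2 counts below are hypothetical, chosen so that the relative risk is 2 in both scenarios.

```python
def relative_risk_and_odds_ratio(a, b, c, d):
    """2x2 table: a = exposed with event, b = exposed without event,
    c = unexposed with event, d = unexposed without event."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio

# Rare outcome: 1% vs 0.5% (hypothetical counts)
rare_rr, rare_or = relative_risk_and_odds_ratio(10, 990, 5, 995)

# Common outcome: 60% vs 30% (hypothetical counts)
common_rr, common_or = relative_risk_and_odds_ratio(600, 400, 300, 700)

print(f"rare outcome:   RR = {rare_rr:.2f}, OR = {rare_or:.2f}")
print(f"common outcome: RR = {common_rr:.2f}, OR = {common_or:.2f}")
```

With the rare outcome the two measures almost coincide; with the common outcome the odds ratio (3.5) clearly exaggerates the relative risk (2.0).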
Which one of the following clinical questions can be correctly addressed by a
case–control design?
A. Is it effective to use hyoscine patches in treating clozapine-induced hypersalivation?
B. How many inpatients in wards for elderly people suffer from untreated
hypercholesterolaemia at any given time?
C. How rapidly will lithium discontinuation produce relapse of schizoaffective disorder?
D. Are we, at the local community team, compliant with the NICE guidelines for prescribing
antipsychotics?
E. Do patients with depression have more academic examination failures than their healthy
siblings?
E. Choice A refers to a clinical question related to therapeutic intervention – RCTs are
best suited to answer this. Choice B is an epidemiological question – ‘how many in a population
have a particular condition?’ A cross-sectional survey could answer this question. Choice C
refers to a prognostic question – how long will it take for schizoaffective relapse following
lithium discontinuation? A prospective cohort (or an RCT if ethically approved) is the most
appropriate design for this question. Choice D requires a clinical audit, which is often closer to a
cross-sectional survey in design. Choice E refers to defined cases and controls being compared
for a possible exposure or risk factor that might have occurred in the past. Hence the case–
control design is best suited to answer this question. Please note that it is possible to design a
prospective cohort study by observing for a long time those with academic failure to detect
development of depression.
A 50-year-old man sustained significant memory loss following near-fatal
carbon monoxide poisoning. Following discussion he agreed to take
part in a double-blinded trial of donepezil vs placebo administered in
six separate 4-week modules with a 2-week washout period in between.
Neuropsychological measures were obtained at regular pre-planned
intervals to monitor changes. He was the sole subject on the trial and the
randomization sequence was generated and maintained by the pharmacy.
This study design could be best described as
A. Uncontrolled trial
B. N-of-1 trial
C. Crossover RCT
D. Pragmatic RCT
E. Naturalistic observational study
B. N-of-1 trials are randomized double-blind multiple crossover comparisons of an active
drug against placebo in a single patient. The design uses a series of pairs of treatment periods
called modules. Within each module the patient receives active treatment during one period and
either an accepted standard treatment or placebo in the other. Random allocation determines
the order of the two treatment periods within each pair and both clinician and patient are
blinded for the intervention. This design is mostly suited for chronic recurrent conditions for
which long-term interventions exist that are not curative. Interventions with rapid onset and
offset of effects are best suited for n-of-1 trials. This allows shorter treatment periods wherein
multiple modules of intervention and placebo/standard treatment can be compared, increasing
the chance of achieving a statistically significant result. It is also necessary that the interventions
tested must be cleared from the patient’s system within a finite washout period.
While conducting a systematic review, publication bias could be determined
using which of the following methods?
A. Funnel plot
B. Galbraith plot
C. Failsafe N
D. Soliciting and comparing published vs. unpublished data
E. All of the above
E. Publication bias refers to the tendency of journals to accept and publish certain types
of studies more often than the others. In general, studies with results that are impressively
significant or of higher quality by virtue of larger sample size are more successful in getting
published. Publication bias can be considered as a form of selection bias when one attempts a
systematic review or meta-analysis. Publication bias can be detected using a funnel plot – visual
inspection of a graph drawn by plotting a measure of precision (often sample size) against
treatment effect will reveal asymmetry of the two arms of the funnel-shaped graph if publication
bias is present. Galbraith plot refers to a graph obtained by plotting a measure of precision
such as (1/standard error) against standard normal deviate (log of odds ratio/standard error).
The coordinates obtained from such a plot can be used to determine the extent of publication
bias using linear regression. Failsafe N is another way of estimating publication bias. Consider a
meta-analysis yielding a statistically significant difference in outcome between two interventions,
despite suspected publication bias. Then failsafe N answers the question ‘How many missing
studies are needed to reduce the effect to statistical non-significance?’ The higher the failsafe
N, the lower the publication bias. If one could solicit and compare all unpublished data with
published data, then publication bias would become obvious.
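As a rough sketch of the failsafe N idea, the snippet below uses Rosenthal's formulation, which asks how many unpublished studies averaging a null result (Z = 0) would pull the combined Z below the one-tailed significance threshold. The choice of Rosenthal's version is an assumption, since the text does not say which variant is meant, and the Z scores are hypothetical.

```python
from statistics import NormalDist

def failsafe_n(z_scores, alpha=0.05):
    """Rosenthal's fail-safe N: number of null studies needed to make the
    combined (Stouffer) Z non-significant at a one-tailed alpha."""
    k = len(z_scores)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return sum(z_scores) ** 2 / z_crit ** 2 - k

# Hypothetical Z scores from three published trials
example_z = [2.0, 2.5, 3.0]
n_fs = failsafe_n(example_z)
print(f"Failsafe N is about {n_fs:.1f} missing null studies")
```

A large failsafe N relative to the number of published studies suggests the pooled result is unlikely to be explained away by unpublished null findings.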
In an RCT the randomization sequence is protected before and until the
randomization is completed. This is known as
A. Concealment
B. Double blinding
C. Matching
D. Masking
E. Trial independence
A. Allocation concealment refers to the process used to prevent foreknowledge of the
assignment before allocation is complete. So the investigator who recruits subjects for a trial will
not know the nature of assignment of subsequent subjects that enter randomization. Allocation
concealment seeks to prevent selection bias, protects the allocation sequence before and until
assignment, and can almost always be successfully implemented in an RCT. It is often confused with
blinding, which seeks to prevent ascertainment bias, protects the sequence after allocation,
and cannot always be implemented.
Data collected for a study on antidepressant efficacy show the outcome
as observations of the number of days needed to achieve remission.
The standard deviation for such observations will be measured in which of
the following units?
A. No units
B. Days
C. Square root of days
D. Days squared
E. Person-years
B. The standard deviation has the same units as the primary variable. This is an advantage of
standard deviation compared with variance, which is also a measure of dispersion.
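A quick numerical check of the units point, using hypothetical remission times in days. The variance comes out in days squared, and taking its square root returns the standard deviation to days.

```python
from statistics import mean, stdev, variance

# Hypothetical days to remission for five patients
days_to_remission = [14, 21, 28, 35, 42]

avg = mean(days_to_remission)        # in days
var = variance(days_to_remission)    # sample variance, in days squared
sd = stdev(days_to_remission)        # sample SD, back in days

print(f"mean = {avg} days, variance = {var} days^2, SD = {sd:.2f} days")
```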
In a study presenting outcome in terms of median days of hospital
admission, the collected data show many observations substantially higher
than the median. Which one of the following is correct regarding the above
study?
A. The results are negatively skewed
B. Mean = median = mode
C. The results are not skewed
D. Mean > median
E. Mode = median
D. If many observations are substantially higher than the median we can assume that the
mean of the distribution might be greater than the median. This translates to a positively skewed
distribution. No comments can be made on mode using the available information.
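A small illustration with hypothetical lengths of stay shows how a few very long admissions pull the mean above the median.

```python
from statistics import mean, median

# Hypothetical days of hospital admission; two very long stays form the right tail
length_of_stay = [2, 3, 3, 4, 5, 6, 30, 45]

stay_mean = mean(length_of_stay)
stay_median = median(length_of_stay)

print(f"mean = {stay_mean}, median = {stay_median}")
print("positively skewed" if stay_mean > stay_median else "not positively skewed")
```

This is why skewed outcomes such as length of stay are usually summarized with the median rather than the mean.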
A trial is conducted to evaluate the efficacy of lamotrigine in patients with
symptoms of recurrent depersonalization. While calculating the number of
patients needed in the trial to demonstrate a meaningful effect, α level is set
at 0.05. Which of the following is true regarding alpha (α)?
A. It is the probability of a type 2 error
B. It is the threshold for defining clinical significance
C. If α = 0.05, there is a 5% chance that the null hypothesis is rejected wrongly
D. If α = 0.05, then 5% of treated subjects will show absence of treatment effect.
E. None of the above
C. α is the probability of type 1 error. It is used to set the threshold for statistical (not
clinical) significance, often arbitrarily set as p = 0.01–0.05 (α = 1–5%). If α = 0.05, there is a 1 in
20 or 5% chance that the null hypothesis is rejected wrongly.
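The meaning of α can be illustrated by simulation: when the null hypothesis is true, a test run at α = 0.05 should wrongly reject it in roughly 5% of repeated trials. The sketch below assumes a simple two-sided z test with a known standard deviation of 1; the sample size, number of trials, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, mean

def simulate_type1_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Proportion of simulated null trials in which H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # Data are drawn with a true mean of 0, so the null hypothesis holds
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = mean(sample) / (1 / n ** 0.5)  # known SD of 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

type1_rate = simulate_type1_rate()
print(f"Observed type 1 error rate: {type1_rate:.3f}")
```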
Which of the following is an agreed method of assessing the quality of
conducting and reporting systematic reviews and meta-analyses?
A. ASSERT
B. CONSORT
C. QUOROM
D. SIGN
E. NICE
C. Despite the increasing importance and abundance of systematic reviews and meta-analyses
in the scientific literature, the reporting quality of systematic reviews varies widely.
To address the issue of suboptimal reporting of meta-analyses, an international group in 1996
developed a guidance called the QUOROM Statement (QUality Of Reporting Of Meta-analyses).
QUOROM focused on the standards of reporting meta-analyses of RCTs. A revision
of these guidelines, renamed PRISMA (Preferred Reporting Items for Systematic reviews and
Meta-Analyses), includes several conceptual advances in the methodology of systematic reviews.
All of the following methods are used to assess heterogeneity in a
meta-analysis except
A. Q statistic
B. I squared statistic
C. Galbraith plot
D. L’Abbé plot
E. Paired t statistics
E. Meta-analysis is generally done to combine the results of different trials, as individual
clinical trials are often too small and hence underpowered to detect treatment effects reliably.
Meta-analysis increases the power of statistical analyses by pooling the results of all available
trials. But this comes at a small cost. Although similar studies are taken to be included in the
meta-analysis, it is likely that the trials differ from one another just by chance. Sometimes
the difference can occur due to foreseeable situations, e.g. the dosage of medication tested,
the mean age of the population tested, or the scales used may differ among
studies. To measure if this heterogeneity is more than the random heterogeneity we expect,
statisticians resort to certain tests of heterogeneity. Some are statistical, such as the chi-square test
(or Q statistic), which tests the ‘null hypothesis’ of homogeneity, and the I-squared statistic (which
measures the proportion of variability due to heterogeneity). The Galbraith plot and L’Abbé plot are
pictorial representations of heterogeneity. A paired t test is generally not used to calculate the
heterogeneity.
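The two statistical measures mentioned above can be computed directly from study effect sizes and their variances. This is a minimal sketch with hypothetical data. It assumes inverse-variance weights and the usual definition of I squared as the share of Q in excess of its degrees of freedom.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q and I squared (as a percentage)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I squared is truncated at zero when Q does not exceed its degrees of freedom
    i_squared = 100 * (q - df) / q if q > df else 0.0
    return q, i_squared

# Hypothetical effect sizes from three trials with equal variances
q_stat, i2 = heterogeneity([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(f"Q = {q_stat:.2f} on 2 df, I^2 = {i2:.1f}%")
```

Under the null hypothesis of homogeneity, Q follows a chi-square distribution with k − 1 degrees of freedom, where k is the number of studies.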
Which one of the following types of data can have potentially infinite
number of values?
A. Continuous
B. Categorical
C. Nominal
D. Ordinal
E. Binary
A. Data can be qualitative or quantitative. Quantitative data refers to measures that often
have a meaningful unit of expression. This can be either discrete or continuous. A discrete
measure has no other observable value between two contiguous potentially observable values,
i.e. there are ‘gaps’ between values. A continuous variable, on the other hand, can take potentially
infinite values. The other choices in the question refer to qualitative measures whose value can
only be described and counted but cannot be expressed in meaningful units.
A multi-centre RCT was conducted with strict inclusion criteria. Which one
of the following properties of the study is most likely to be affected by the
stringent inclusion criteria?
A. Generalizability of results
B. Precision of results
C. Accuracy of the results
D. Statistical significance of the results
E. All of the above
A. A major disadvantage with RCTs is the poor generalizability of experimental findings
to a clinical setting. Having strict inclusion and exclusion criteria may help choose a highly
homogeneous population, increasing the internal validity of the study but at the expense of
generalizability.
A researcher is interested in studying whether maternal smoking increases
the risk of school refusal in children. Which one of the following is the
correct null hypothesis for the above research question?
A. School refusal increases the risk of maternal smoking
B. Maternal smoking decreases the risk of school refusal
C. Maternal smoking does not increase the risk of school refusal
D. Maternal smoking increases the risk of school refusal
E. None of the above
C. In scientific research, nothing can be proven; we can only disprove presumed facts.
If one wants to prove maternal smoking causes school refusal, it is best to assume that maternal
smoking does not cause school refusal to start with and then proceed to disprove this statement.
Such statements waiting to be disproved during the course of a research study are called the null
hypotheses. The converse of the null hypothesis is called the alternative hypothesis.
Research question: Does maternal smoking increase risk of school refusal?
Null hypothesis: Maternal smoking does not increase risk of school refusal
Alternative hypothesis: Maternal smoking increases the risk of school refusal
Of the following, the most important methodological challenge
while conducting a cohort study is
A. Statistical analysis of the results
B. Randomization of the cohorts
C. Identifying those who develop the outcome
D. Identifying a suitable comparison group
E. Concealment of cohort allocation
D. Subjects do not get randomized in a simple cohort study. Hence there is no question of
allocation concealment. When valid instruments and a reasonable follow-up schedule are used,
identification of those who develop the ‘event’ of interest/outcome is often not difficult in a
cohort design. Often the most difficult part is to identify a reasonable control cohort that lacks
the ‘exposure’ of interest. Internal controls refer to those who are ‘non-exposed’ but derived
from the same study population as the ‘exposed’. External control refers to an independently
recruited cohort without the exposure.
In a study investigating the mean cholesterol levels in 36 patients taking
olanzapine, the mean was found to be 262 mg/dL. The standard deviation
of this observation was 15 mg/dL. The 95% confidence interval for this
observation is
A. 232–292 mg/dL
B. 247–277 mg/dL
C. 259.5–264.5 mg/dL
D. 257–267 mg/dL
E. 226–298 mg/dL
D. The 95% confidence limits of a sample mean are simply the range between a value
approximately two standard error units below the mean and a value approximately
two standard error units above the mean (more precisely, 1.96 standard error units). Expressed mathematically,
95% confidence limits = mean ± (2 × standard error of mean).
Standard error of mean is calculated as SE = standard deviation/√sample size.
SE = 15/√36 = 15/6 = 2.5 in this question.
Hence 95% confidence limits are
262 ± (2 × 2.5) = 262 ± 5 = 257, 267.
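The arithmetic above can be reproduced with a short function. The helper name `mean_ci` is hypothetical; the multiplier of 2 matches the worked answer, and 1.96 is shown alongside it as the more exact normal value.

```python
import math

def mean_ci(sample_mean, sd, n, z=1.96):
    """Confidence limits for a mean: mean ± z × (SD / √n)."""
    se = sd / math.sqrt(n)
    return sample_mean - z * se, sample_mean + z * se

# Values from the question: mean 262 mg/dL, SD 15 mg/dL, n = 36
approx_low, approx_high = mean_ci(262, 15, 36, z=2)   # rule-of-thumb multiplier
exact_low, exact_high = mean_ci(262, 15, 36)          # z = 1.96

print(f"z = 2:    {approx_low:.1f} to {approx_high:.1f} mg/dL")
print(f"z = 1.96: {exact_low:.1f} to {exact_high:.1f} mg/dL")
```

Either multiplier points to the same answer choice, since the difference between them is only 0.1 mg/dL at each limit.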
In a normal distribution curve, 99% of observations will fall within which of
the following values of standard deviation (SD)?
A. –2 SD to +2 SD
B. –3 SD to +3 SD
C. –1 SD to +2 SD
D. –1 SD to +1 SD
E. +1 SD to +3SD
B. An important property of the normal distribution curve is the relationship between
the SD of normally distributed observations and probability. Normal distribution curves are
symmetric and bell-shaped. Nearly 68.3% of the sampled population will lie within 1 SD of the
mean on either side of the curve, 95.4% within 2 SDs, and more than 99% (99.7%) within 3 SDs. In other words,
there is less than a 1% chance that an observation will fall outside +3 SD to –3 SD; about a 5% chance that it will
fall outside +2 SD to –2 SD; and a nearly 32% chance that it will fall outside +1 SD to –1 SD.
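The coverage figures for a normal distribution can be checked from the error function, since the probability of falling within k SDs of the mean is erf(k/√2). The sketch below also reports the multiplier that captures exactly 99% of observations, which lies between 2 and 3 SDs.

```python
import math
from statistics import NormalDist

def coverage_within(k):
    """Probability that a normal observation lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {coverage_within(k) * 100:.1f}%")

# Number of SDs either side of the mean that contains exactly 99%
z_99 = NormalDist().inv_cdf(0.995)
print(f"99% of observations lie within ±{z_99:.2f} SD")
```

Of the options offered, only –3 SD to +3 SD is wide enough to contain 99% of observations.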
Confidence intervals are used to describe the range of uncertainty around
the estimated value of an outcome from the sample studied. Which of the
following statements about confidence intervals is incorrect?
A. Sample size is used in calculating confidence intervals
B. It includes a range of values above and below the point estimate
C. If the confidence interval includes a null treatment effect, the null hypothesis can be
rejected
D. 95% confidence interval is often used in clinical studies
E. When the estimated outcome is a ratio, a positive treatment effect is shown by
confidence intervals remaining above one.
C. If the confidence interval includes a null treatment effect, the null hypothesis cannot be
rejected within the set levels of confidence limits. Confidence intervals provide a measure of
dispersion of the point estimate within stipulated confidence limits (arbitrarily 95% corresponds
to a p value of 5%). In other words, confidence intervals provide the assured range within which
the true value may lie. Confidence intervals are a measure of precision of the results obtained
from a study. The larger the sample studied, the narrower the intervals. If the confidence
intervals cross the value ‘0’ for the difference between means then the results are statistically not
significant. If it crosses the value ‘1’ for ratio measures such as the odds ratio, it is not significant.
If it crosses infinity for inverse ratios such as NNT then it is not significant.
A clinical researcher is examining the incidence of akathisia in two groups
of patients. One group (n = 35) has been prescribed benzodiazepine for use
as required while the other group (n = 35) is free from any benzodiazepine
exposure. The outcome is measured as proportion of patients who develop
akathisia in a dichotomous scale. Akathisia develops in 10 patients without
benzodiazepines and in 20 patients with benzodiazepines. Which of the
following statistical tests is best suited to analyse the statistical significance
of the difference between the two groups?
A. Chi square test
B. Paired t test
C. Multiple regression analysis
D. Wilcoxon rank sum test
E. Pearson coefficient test
A. In this study, the dependent variable is treated as a categorical outcome. In other words,
the population has been categorized into ‘akathisia present’ or ‘akathisia absent’. This type of
outcome yields frequency counts or proportions that can be analysed for significance using
the chi square test. The t test is used for comparing means. The Wilcoxon rank sum test is a
non-parametric equivalent of the t test. Pearson coefficients are used to analyse correlation.
Regression analyses are used to predict one variable from another when they are correlated.
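For the akathisia data in this question (20 of 35 with benzodiazepines, 10 of 35 without), the chi square statistic can be worked out by hand. The sketch below is a plain Pearson chi square for a 2×2 table without continuity correction, which is an assumption, as the text does not specify one. The p value uses the fact that a chi square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

def chi_square_2x2(table):
    """Pearson chi square (no continuity correction) and p value, 1 df."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: benzodiazepine, no benzodiazepine; columns: akathisia, no akathisia
akathisia_table = [[20, 15], [10, 25]]
chi2, p_value = chi_square_2x2(akathisia_table)
print(f"chi square = {chi2:.2f}, p = {p_value:.3f}")
```

With expected counts of 15 and 20 in each row, every cell comfortably exceeds the usual minimum of 5, so the chi square approximation is reasonable here.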
Considering normal distribution, which one of the following statements is
incorrect?
A. It is a continuous distribution
B. It is symmetrical in shape
C. The mean, median, and mode are identical
D. The shape of the distribution depends on the number of observations made
E. Both tails of the distribution extend to infinity
D. Irrespective of the number of observations made, the shape of a normally distributed
curve is symmetric and bell shaped. The exact shape of the normal distribution is defined by a
function that has only two parameters: mean and standard deviation. For a given range of scores,
when the standard deviation is small, the curve becomes tall and narrow (loosely described as
leptokurtic) but remains symmetric.
When the standard deviation is larger, it becomes flatter and wider (loosely, platykurtic).