Basic statistics (MRCP) Flashcards

1
Q

A study is evaluating the effect of agomelatine on postnatal depression at
a mother and baby unit. Which one of the following should be considered
when assessing the internal validity of this study?
A. Benefits of agomelatine in major depression outside the postpartum period
B. The degree to which the subjects adhered to the study protocol
C. The cost of using agomelatine compared with standard care
D. Consistency of the reported outcome in comparison with previous studies
E. Benefits of agomelatine in postpartum depression when used at an outpatient service

A

B. Internal validity is the degree to which a study establishes the cause-and-effect relationship
between the treatment and the observed outcome. External validity is the degree to which
the results of a study become applicable outside the experimental setting in which the study
was conducted. In other words, external validity refers to the generalizability of study results, while
internal validity refers to the rigour of the research method. The benefit of agomelatine
in different populations (choices A and E) refers to external validity; the cost of the drug
and the consistency of results obtained from different studies relate to the applicability of the
intervention in a clinical setting. Assessment of adherence to the study protocol is one of many ways
of analysing the quality of an intervention trial.

2
Q

A new clinician-administered test for assessing suicidal risk is studied in a
prison population in Canada, where a high suicide rate of 1 in 25 has been
recorded. Which of the following indicate that this test is NOT suitable for
your clinical population?
A. The positive predictive value is 80%
B. The likelihood ratio for a positive test is 14
C. The prevalence of suicide in your clinical sample is 1 in 890
D. The inter-rater reliability (kappa) of the test is 0.8
E. The literacy rate of the prison population is very low but comparable with your clinical
sample

A

C. A high positive predictive value, a likelihood ratio of more than 10, and good inter-rater
reliability as measured by kappa are all desirable properties of an instrument. But when the
same instrument is applied to a population with a much lower prevalence of suicide (the studied
phenomenon), the post-test probability decreases substantially. Post-test probability is the
equivalent of the positive predictive value in the target population; it depends on the pretest
probability (i.e. the prevalence) and the likelihood ratio.

3
Q
A new rating scale being evaluated for anxiety has a sensitivity of 80% and
specificity of 90% against the standard ICD-10 diagnosis. The likelihood ratio
of a positive result is
A. Nearly 2
B. Nearly 0.2
C. 0.08
D. 8
E. 0.5
A

D. The likelihood ratio of a positive test (LR+) is the ratio of the probability of a
positive test in a person with the disease to the probability of a positive test in a person without
the disease. It can also be expressed as
LR+ = sensitivity/(1 – specificity)
Here, sensitivity = 0.8 and specificity = 0.9.
Hence LR+ = 0.8/(1 – 0.9) = 8.
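The calculation can be made explicit in a short Python sketch (the function names are mine, not from any statistics library):

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = probability of a positive test in the diseased
    divided by the probability of a positive test in the non-diseased."""
    return sensitivity / (1 - specificity)

def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = probability of a negative test in the diseased
    divided by the probability of a negative test in the non-diseased."""
    return (1 - sensitivity) / specificity

print(positive_likelihood_ratio(0.8, 0.9))  # ≈ 8 (option D)
print(negative_likelihood_ratio(0.8, 0.9))  # ≈ 0.22
```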

4
Q

A pharmaceutical company developed a new antidepressant ‘X’. They
conducted a randomized double-blind placebo controlled trial of the drug.
The study had two arms: an active medication arm and a placebo arm.
Each arm had 100 subjects. Over a 4-week period, a 50% drop in Hamilton
depression scale (HAMD) scores was seen in 40 subjects in the active
medication arm, while a similar drop was seen in only 20 subjects in the
placebo arm. What is the number needed to treat (NNT) from this trial for
the new antidepressant?
A. 1
B. 2
C. 3
D. 4
E. 5

A

E. The response rate was 40% with the new antidepressant and 20% with placebo, so the absolute benefit increase (ABI) = 40% – 20% = 20%. NNT = 100/20 = 5.

5
Q

During the same placebo controlled trial described in question 4, 20% of
people on X developed active suicidal ideas, while only 10% of patients on
placebo developed the same side-effect. What is the number needed to
harm (NNH) associated with the suicidal ideas from the trial data?
A. 5
B. 10
C. 15
D. 20
E. 25

A

B. Active suicidal ideas developed in 20% of patients on X and in 10% of patients on placebo, so the absolute risk increase = 20% – 10% = 10%. NNH = 100/10 = 10.
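The calculations for questions 4 and 5 follow the same pattern; a short Python sketch with an illustrative helper name:

```python
def number_needed(event_rate_active: float, event_rate_control: float) -> float:
    """NNT (for benefits) or NNH (for harms): the reciprocal of the
    absolute difference in event rates between the two arms."""
    return 1 / abs(event_rate_active - event_rate_control)

# Question 4: response (50% HAMD drop) in 40/100 on drug X vs 20/100 on placebo
print(number_needed(0.40, 0.20))  # ≈ 5  -> NNT = 5

# Question 5: suicidal ideation in 20% on drug X vs 10% on placebo
print(number_needed(0.20, 0.10))  # ≈ 10 -> NNH = 10
```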

6
Q

The prevalence of depression in patients with mild cognitive impairment
is 10%. On applying a depression rating scale with the likelihood ratio of a
positive test (LR+) equal to 10, a patient with mild cognitive impairment
becomes test positive. The probability that this patient is depressed is equal
to
A. 15%
B. 32%
C. 52%
D. 85%
E. 100%

A

C. This question tests one’s ability to calculate post-test probability from likelihood ratios.
The probability of having a disease after testing positive with a diagnostic test depends on
two factors: (a) the prevalence of the disease, (b) the likelihood of a positive test result using
the instrument. It is important to remember that baseline prevalence of a disease for which a
diagnostic instrument is being tested is taken as the pretest probability.
So pretest probability = 10%
Now, post-test odds = likelihood ratio × pretest odds
From a given probability odds can be calculated using the formula
odds = (probability)/(1 – probability)
Here pretest odds = (10%)/(1 – 10%) = 10/90 = 1/9.
Now post-test odds = likelihood ratio × pretest odds
= 10 × 1/9 = 10/9
Using the formula probability = odds/(1 + odds),
post-test probability = (10/9)/[1 + (10/9)] = 10/19 ≈ 52.6%, i.e. approximately 52%.
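The same steps as a short Python sketch (the function name is mine):

```python
def post_test_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Convert the pretest probability to odds, multiply by the likelihood
    ratio, then convert the post-test odds back to a probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

print(post_test_probability(0.10, 10))  # ≈ 0.526, i.e. about 52%
```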

7
Q

A multi-centre double blind pragmatic randomized controlled trial (RCT)
reported remission rates for depression of 65% for fluoxetine and 60% for
dosulepin. The number of patients that must receive fluoxetine for one
patient to achieve the demonstrated beneficial effect is
A. 60
B. 20
C. 15
D. 10
E. 5

A

B. This question tests one’s knowledge of the NNT (number needed to treat) concept. NNT
is the inverse of the absolute benefit increase (ABI) in therapeutic trials. The ABI is
the difference in benefit between the experimental intervention and the standard treatment or
placebo with which it is compared. Here it is 65% – 60% = 5%. If ABI = 5%, NNT = 100/5 = 20.

8
Q

In a randomized double-blind trial two groups of hospitalized depressed
patients treated with selective serotonin reuptake inhibitors (SSRIs) are
evaluated for beneficial effects on insomnia of trazodone vs temazepam.
Which of the following is NOT an important factor when evaluating the
internal validity of results obtained from the above study?
A. Baseline differences in antidepressant therapy between the two groups
B. The method used to randomize the sample
C. Setting in which the study takes place
D. Sensitivity of the insomnia scale to pick up changes in severity
E. Inclusion of data from patients who have dropped out in the final analysis

A

C. Threats to the internal validity of an experimental study include confounding, selection bias,
differential attrition, and the quality of measurement. A significant difference in baseline SSRI
therapy could explain differential outcomes in the trazodone vs temazepam groups. Similarly,
poor randomization may lead to selection bias and influence the differences in outcome. Failure
to account for differential drop-out rates may spuriously inflate or deflate the difference in
outcome. Using a scale with poor sensitivity to change will reduce the magnitude of differences
that could be observed. Given that both groups are recruited from the same setting (hospital), this
does not influence internal validity; on the other hand, it might well influence the generalizability
of results to the non-hospitalized population (external validity).

9
Q

While adapting the results of an RCT into clinical practice, a clinician wants
to calculate the new NNT values for his own clinical population using the
results of the RCT. Apart from the reported RCT which of the following is
needed to carry out the calculation of the new NNT?
A. The expected rate of spontaneous resolution of the treated condition in the clinical
population
B. The size of the clinical population
C. The case fatality rate for the treated condition in the clinical population
D. Lifetime prevalence of the disease in the clinical population
E. All of the above

A

A. Published RCTs may quote impressive outcomes in terms of NNT. Applying principles of
evidence-based medicine, one must check for the internal validity of a study and the degree of
generalizability before adapting the results to clinical practice. One must also be aware of the
fact that though clinically more meaningful, NNTs quoted in RCTs may not translate to the same
extent in actual clinical practice. One way of appreciating the usefulness of a newly introduced
drug is to calculate the NNT for one’s own clinical population (target population). To enable
this one may estimate the patient expected event rate (PEER), which is given by the expected
spontaneous resolution rate or the response rate for an existing standard treatment. This can
be obtained from the local audit data or clinical experience. The product of PEER and relative
benefit increase from the published RCT gives the new absolute benefit increase (ABI new)
value for the target population. The inverse of the new ABI gives the new NNT for the target
population. The disease prevalence rate or absolute size of the target population has no effect on
the new NNT.
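A minimal sketch of that recalculation, with illustrative figures for the PEER and the relative benefit increase (neither value comes from the question):

```python
def local_nnt(peer: float, relative_benefit_increase: float) -> float:
    """New NNT for one's own population: the new ABI is PEER x relative
    benefit increase, and the new NNT is its reciprocal."""
    abi_new = peer * relative_benefit_increase
    return 1 / abi_new

# e.g. if local audit data suggest a 20% response with standard care (PEER)
# and the published RCT reports a 25% relative benefit increase
print(local_nnt(0.20, 0.25))  # ≈ 20 -> treat about 20 patients for one extra responder
```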

10
Q

In an attempt to ensure equivalent distribution of potential effect-modifying
factors in treating refractory depression, a researcher weighs the imbalance
that might be caused whenever an individual patient enters one of the two
arms of the study. Every patient is assigned to the group where the least
amount of imbalance will be caused. This method is called
A. Stratification
B. Matching
C. Minimization
D. Randomization
E. Systematic sampling

A

C. In most treatment trials interventions are allocated by randomization. Block
randomization and stratified randomization can be used to ensure the balance between groups
in size and patient characteristics. But it is very difficult to stratify using several variables in a
small sample. A widely acceptable alternative approach is minimization. This method can be used
to ensure very good balance between groups for several confounding factors irrespective of the
size of the sample. With minimization the treatment allocated to the next participant enrolled in
the trial depends (wholly or partly) on the characteristics of those participants already enrolled.
This is achieved by a simple mathematical computation of magnitude of imbalance during each
allocation.

11
Q
The effectiveness of an intervention is measured by using pragmatic trials.
Which trial design is normally employed when carrying out a pragmatic
trial?
A. RCT
B. Meta analysis
C. Systematic review
D. Cohort study
E. Case series
A

A. RCTs provide high-quality evidence for or against proposed interventions. But RCTs
have a major limitation in terms of generalizability. This is because the trials are conducted in a
somewhat artificial experimental setting that is different from clinical practice. So RCTs have
high internal validity due to rigorous methodology but poor external validity. Pragmatic RCTs are
a type of RCT introduced with the intention of increasing external validity, i.e. the generalizability
of RCT results, but this takes place at the expense of internal validity. In pragmatic RCTs the
trial takes place in a setting as close as possible to natural clinical practice: the inclusion and
exclusion criteria are less fastidious, ‘treatment as usual’ is often employed for comparison
instead of placebo, and real-world, functionally significant outcomes are considered.

12
Q

The probability of detecting the magnitude of a treatment effect from a
study when such an effect actually exists is called
A. Validity
B. Precision
C. Accuracy
D. Power
E. Yield

A

D. The power of a study refers to the ability of the study to show the difference in outcome
between studied groups if such a difference actually exists. The term power calculation is often
used while referring to sample size estimation before a study is undertaken. In order to carry out
power calculation one has to know the expected precision and variance of measurements within
the study sample (obtained from a literature search or pilot studies), the magnitude of a clinically
significant difference, the certainty of avoiding type 1 error as reflected by the chosen
p value, and the type of statistical test one will be performing. There is no point in calculating the
statistical power once the results of a study are known. On completion of trials, measures such as
confidence intervals indicate the power of a study and the precision of its results.

13
Q

Power is the ability of a study to detect an effect that truly exists. Power can
also be defined as
A. Probability of avoiding type 1 error
B. Probability of committing type 1 error
C. Probability of committing type 2 error
D. Probability of detecting a type 2 error
E. Probability of avoiding type 2 error

A

E. Power refers to the probability of avoiding a type 2 error. To calculate power, one needs
to know four variables.
1. sample size
2. magnitude of a clinically significant difference
3. probability of type 1 error (the significance level from which the p value is derived)
4. variance of the measure in the study sample.
Underpowered trials are those that enrol too few participants to detect differences between
interventions (conventionally, with at least 80% probability) when such differences truly exist.
Underpowered RCTs are prone to false-negative conclusions (type 2 errors). Somewhat
controversially, underpowered trials are considered to be unethical, as they expose participants
to the ordeals of research without providing an adequate contribution to clinical development.
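As an illustration of how these quantities fit together, here is a minimal Python sketch of the usual normal-approximation formula for the sample size per arm when comparing two means; it is a simplified textbook formula, not a substitute for a proper power calculation, and the function name and example figures are mine:

```python
from statistics import NormalDist

def n_per_arm(sd: float, difference: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate sample size per arm for a two-sided, two-sample comparison
    of means: n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sd / difference)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * (sd / difference) ** 2

# e.g. to detect a 3-point difference on a rating scale with an SD of 6
print(round(n_per_arm(sd=6, difference=3)))  # about 63 participants per arm
```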

14
Q

A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
The positive predictive value of this test is
A. 50%
B. 60%
C. 40%
D. 100%
E. 0%

A

D. It is useful to construct a 2 × 2 table for calculating the properties of reported diagnostic
tests. From the given information we can draw the following:

                Schizophrenia   Controls
Test positive        60             0
Test negative        40           100

Now, positive predictive value = true positives/all test positives = 60/(60 + 0) = 100%.
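All of the test properties asked about in questions 14–17 follow from the same 2 × 2 counts; a short Python sketch (the variable names are mine):

```python
# 2 x 2 table from the stem: 100 schizophrenia patients, 100 controls
tp, fn = 60, 40   # patients detected / missed
fp, tn = 0, 100   # controls wrongly diagnosed / correctly cleared

sensitivity = tp / (tp + fn)                  # 0.60
specificity = tn / (tn + fp)                  # 1.00
ppv = tp / (tp + fp)                          # 1.00 (question 14)
npv = tn / (tn + fn)                          # ≈ 0.71
accuracy = (tp + tn) / (tp + tn + fp + fn)    # 0.80 (question 16)
false_negative_rate = fn / (tp + fn)          # 0.40 (question 17) = 1 - sensitivity

print(sensitivity, specificity, ppv, npv, accuracy, false_negative_rate)
```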

15
Q

A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls.
How sensitive is this test in detecting schizophrenia?
A. 60%
B. 40%
C. 100%
D. 90%
E. 0%

A

A. Sensitivity = true positives/all with the disease (schizophrenia subjects) = 60/100 = 60%.

16
Q

A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. How
accurate is this test in detecting schizophrenia?
A. 100%
B. 80%
C. 60%
D. 40%
E. 70%

A

B. Accuracy = all correct results/total population studied = (100 + 60)/200 = 160/200 = 80%.

17
Q

A new diagnostic test detects 60 out of 100 schizophrenia patients correctly.
It does not wrongly diagnose anyone in a sample of 100 controls. What
are the chances that the test will be negative in your next patient with
schizophrenia?
A. 100%
B. 70%
C. 60%
D. 40%
E. 30%

A

D. This question asks the candidate to calculate the probability of a negative test in
someone with the disorder – the false-negative rate (FNR).
This is given by FNR = false negatives/all with the disease = 40/100 = 40%.
The FNR is the same as (1 – sensitivity); similarly, the false-positive rate (FPR) is the same as (1 – specificity).

18
Q

Which of the following properties of a screening test increases with
increasing disease prevalence in the population?
A. Negative predictive value
B. Sensitivity
C. Specificity
D. Accuracy
E. Positive predictive value

A

E. Sensitivity, specificity, and accuracy are measures that reflect the characteristics of the
test instrument. These measures do not vary with changes in the disease prevalence. Positive
predictive value increases while negative predictive value decreases with rising population
prevalence of the disease studied. The prevalence can be interpreted as the probability before the
test is carried out that the subject has the disease, known as the prior probability of disease. The
positive and negative predictive values are the revised estimates of the same probability for those
subjects who are positive and negative on the test, and are known as posterior probabilities.
Thus the difference between the prior and posterior probabilities is one way of assessing the
usefulness of the test.

19
Q

Two observers are rating MRI scans for the presence or absence of white
matter hyperintensities. On a particular day from the records, they are
observed to have an agreement of 78%. If they could be expected to agree
50% of the time, even if the process of detecting hyperintensities is by pure
chance, then the value of the kappa statistic is given by
A. 50%
B. 44%
C. 56%
D. 78%
E. 22%

A

C. Agreement between different observers can be measured using the kappa (κ) statistic
for categorical measures such as the one highlighted in this question (presence or absence of
MRI hyperintensities). Kappa is a measure of the level of agreement in excess of that which would
be expected by chance. It is calculated as the observed agreement in excess of chance, expressed
as a proportion of the maximum possible agreement in excess of chance. In other words
kappa = (observed agreement – expected agreement)/(1 – expected agreement).
In this example, the observed agreement is 78%. The expected agreement is 50%. Hence
kappa = (0.78 – 0.50)/(1 – 0.50) = 0.28/0.50 = 56%.
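The same calculation as a one-line Python helper (the function name is mine):

```python
def cohen_kappa(observed_agreement: float, expected_agreement: float) -> float:
    """Kappa: agreement in excess of chance, as a proportion of the
    maximum possible agreement in excess of chance."""
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(cohen_kappa(0.78, 0.50))  # ≈ 0.56, i.e. 56%
```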

20
Q

The number of days that a series of five patients had to wait before starting
cognitive behavioural therapy (CBT) at a psychotherapy unit is as follows:
12, 12, 14, 16, and 21. The median waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days

A

C. The median is calculated by placing the observations in rank order (either ascending
or descending) and picking the middle value. If the number of observations is even,
the median is taken as the arithmetic mean of the two middle values.

21
Q
The number of days that a series of five patients had to wait before starting
CBT at a psychotherapy unit is as follows: 12, 12, 14, 16, and 21. The mean
waiting time to get CBT is
A. 15 days
B. 12 days
C. 14 days
D. 21 days
E. 13 days
A

A. The arithmetic mean is calculated from the sum of all individual observations divided
by the number of observations. Here the number of observations = 5. The sum of individual
observations = 12 + 12 + 14 + 16 + 21 = 75. The average = 75/5 = 15.
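Both of these summary statistics (questions 20 and 21) can be checked directly with the Python standard library:

```python
from statistics import mean, median

waits = [12, 12, 14, 16, 21]   # days to start CBT
print(median(waits))  # 14 (question 20)
print(mean(waits))    # 15 (question 21)
```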

22
Q

The most clinically useful measure that helps to inform the likelihood of
having a disease in a patient with positive results from a diagnostic test is
A. Accuracy
B. Positive predictive value
C. Sensitivity
D. Specificity
E. Reliability

A

B. The probability that a test will provide a correct diagnosis is not given by the sensitivity
or specificity of the test. Sensitivity and specificity are properties of the test instrument – they
are not functions of the target population/clinical sample. On the other hand, positive and
negative predictive values are functions of the population studied; they provide much more
clinically useful information. Predictive values observed in one study do not apply universally.
Positive predictive value increases with increasing prevalence of the disease; negative predictive
value decreases with increasing prevalence. Sensitivity and specificity, being properties of the
instrument used, do not vary with prevalence.

23
Q

Zarkin et al. (2008) reported a cost-effectiveness comparison of naltrexone
and placebo in alcohol abstinence. The mean effectiveness, measured as the
percentage of days abstinent, was nearly 80% for the naltrexone group while
it was 73% for the placebo group. The mean cost incurred for the placebo
group was $400 per patient. The naltrexone group incurred a cost of
$680 per patient. How much additional cost needs to be spent per patient
for each percentage point increase in total days of abstinence when using
naltrexone compared with placebo?
A. $40
B. $50
C. $7
D. $500
E. $2

A

A. The incremental cost-effectiveness ratio (ICER) can be defined as the difference in
cost (C) of interventions A and B divided by the difference in mean effectiveness (E):
ICER = (C_A – C_B)/(E_A – E_B), where intervention B is usually the placebo or standard intervention
with which intervention A is compared. In this example, the difference in costs = $680 – $400 = $280.
The difference in effectiveness, measured in percentage days of abstinence, is 80% – 73% = 7%. Hence
ICER = 280/7 = $40 per patient per percentage point of days of abstinence.
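A minimal sketch of the ICER arithmetic (the function name is mine):

```python
def icer(cost_a: float, cost_b: float, effect_a: float, effect_b: float) -> float:
    """Incremental cost-effectiveness ratio: (C_A - C_B) / (E_A - E_B)."""
    return (cost_a - cost_b) / (effect_a - effect_b)

# naltrexone vs placebo: $680 vs $400; 80% vs 73% of days abstinent
print(icer(680, 400, 80, 73))  # 40.0 dollars per patient per percentage point
```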

24
Q

Two continuous variables A and B are found to be correlated in a nonlinear
fashion. All of the following can be considered as suitable statistical
techniques for examining this relationship except
A. Curvilinear regression
B. Logistic regression
C. Multiple linear regression
D. Polynomial regression
E. Exponential regression

A

C. When the relationship between two continuous variables is plotted in a graph,
the resulting distribution may be a straight line or a curve. If the relationship between the
independent (X) variable and the dependent (Y) variable appears to follow a straight line, then linear
regression can be constructed to predict the dependent variable from the independent variable.
Otherwise, one can resort to one of the following methods:
1. Attempting to transform the available data to straighten the curved relationship.
2. One can try curvilinear regression, e.g. logarithmic regression, exponential regression, and
trigonometric regression.
3. Unless there is a theoretical reason to suppose that a particular form of equation, such as
logarithmic or exponential, is needed, we usually test for non-linearity by using a polynomial
regression equation.
4. Multiple linear regression is often used to examine the linear relationships when there is more
than one independent variable influencing a dependent variable.

25
Q

A drug representative presents data on a new trial. The data show that
drug A prevents annual hospitalization in 20% more dementia patients than
placebo. You are very impressed but your consultant wants to know how
many patients you need to treat to prevent one hospitalization. The correct
answer is
A. 20
B. 5
C. 80
D. 1
E. 100

A

B. The answer to this question can be found by calculating the number needed to treat
(NNT). The absolute increase in benefit (ABI) is given by the difference in outcome between two
groups. This is 20% as quoted by the drug representative. Hence NNT = 100/20 = 5. You need to
treat five patients with the new drug to prevent one annual hospitalization. How small must the
NNT be to be clinically impressive? This depends on the availability of other interventions and
their NNTs, the incremental cost of the proposed intervention, and the tolerability of the intervention.
The last can be partly deciphered by calculating the number needed to harm for a notable
side-effect of the intervention.

26
Q

A new study attempts to evaluate the benefits of regular exercise in
preventing depression compared with an unmodified lifestyle in a sample of
80 healthy elderly men. Which of the following is not possible in such a
study design?
A. Randomized trial
B. Allocation concealed trial
C. Prospective trial
D. Double-blinded trial
E. Controlled trial

A

D. Blinding reduces differential assessment of outcomes of interest (ascertainment bias,
information bias, or observer bias) that can occur if the investigator or participant is aware of
the group assignment. Blinding can also improve compliance and retention of trial participants
and reduce unaccounted supplemental care or treatment that may be sought by the participants.
Single blinding refers to either the investigator or the patient being blind to group assignment.
Double blinding refers to both the patient and the investigator remaining unaware of the group
assignment after randomization. This is desirable but not always possible in RCTs. In the example
above, the subjects who undertake the exercise schedule cannot be kept unaware of exercising!
A single-blind trial is possible in such cases.

27
Q

When searching medical databases, the term MeSH refers to
A. Software that distributes all indexed articles
B. A keyword that will retrieve all published articles by an author
C. A thesaurus of medical subject headings
D. A keyword that stops ongoing search process
E. A database of mental health and social care topics

A

C. MeSH stands for Medical Subject Headings. It is a thesaurus embedded in the
PubMed/MEDLINE interface and can be used to search the literature more effectively using
recognized keywords.

28
Q

Which of the following is strictly correct about a single-blind study design?
A. Only the patients, but not the researchers, do not know whether placebo or active drug
is being administered
B. Only the researchers, but not the patients, do not know whether placebo or active drug
is being administered
C. Both the patient and researchers do not know the treatment given
D. Only one group of the trial subjects is kept unaware of the treatment status
E. Either the patients or the researchers do not know whether placebo or active drug is
administered

A

E. Single blind: either the patient or the clinician remains unaware of the intervention given.
Double blind: both the patient and investigator are unaware of the given intervention.
Open label: both the researchers and the participants are aware of the treatment being given after
randomization.
Triple blind: apart from the patient and the researcher, those who measure the study outcomes
(the assessors) are also unaware of the given intervention.

29
Q

Which one of the following correctly describes a crossover trial?
A. Halfway through the treatment phase, the subjects from both arms interchange
randomly
B. Each subject receives both intervention and control with a washout period in between
C. Controls from one trial are shared with another trial where a different drug is evaluated
simultaneously
D. The trial permits investigation of the effect of more than one independent variable on
the clinical outcome
E. None of the above

A

B. If random interchange between treatment and placebo groups occurs halfway through
the study, this will lead to chaos and failed randomization. This is termed contamination,
which can occur when participants or their caregivers discover they are ‘controls’ and obtain
the experimental treatment outside the trial, thus effectively becoming an active treatment
group. Choice C is practically impossible; sharing the controls of one RCT with another would mean the
trial is open label. When each subject in the trial receives both the intervention and the placebo, with a
washout period in between, while remaining blind to the intervention, this is called a crossover
RCT. Crossover trials are possible only if short-term outcomes are evaluated in chronic
diseases, because the disease process must last sufficiently long for the subject to receive
both interventions across its course. Any intervention applied in a crossover setting must not
permanently alter the disease process.

30
Q

A study evaluates the effect of various psychological interventions on
bulimia. This study could be termed as a factorial design if
A. Halfway through the treatment phase, the subjects from two arms interchange randomly
B. Each subject receives both intervention and control with a washout period in between
C. Controls from one trial are shared with another trial where a totally different
psychotherapy is evaluated simultaneously
D. The trial permits investigation of the effect of more than one psychotherapy, both
separately and combined, on the clinical outcome.
E. None of the above

A

D. If one wishes to compare the effect of more than one intervention against placebo, either
a multi-arm RCT or a factorial design can be chosen. A multi-arm RCT is a simple extension of
the usual RCT in which an extra arm of subjects is generated through randomization to allocate
the second intervention, in addition to the placebo and first intervention groups. A factorial RCT
evaluates the effect of more than one intervention, independently and also in combination. In the
above example the effect of two different psychotherapies, independently and in combination,
could be studied using a factorial design.

31
Q

A 2 × 2 contingency table is constructed to analyse the primary outcome
data of a trial. The degrees of freedom to use chi-square statistics is
A. 1
B. 2
C. 3
D. 4
E. –4

A

A. ‘Degrees of freedom’ is defined as the number of values in the final calculation of a statistic
that are free to vary. In a two-way chi-square test, this is given by
Degrees of freedom (d.f.) = (number of rows – 1) × (number of columns – 1)
In this question, for a 2 × 2 table, there are 2 rows and 2 columns. Hence
d.f. = (2 – 1) × (2 – 1) = 1 × 1 = 1.
Degrees of freedom cannot take negative values.

32
Q
Which one of the following is correctly matched with the most suitable
study method?
A. Diagnostic test: case–control study
B. Prognosis: prospective cohort study
C. Therapy: cross-sectional survey
D. Aetiology: case–series study
E. Epidemiology: RCT.
A

B. No single study design is sufficient in itself to answer various clinical questions. For
evaluation of a diagnostic test, a survey design that allows comparison with the gold standard is
often used. For prognostic studies a prospective cohort design is useful. Therapeutic interventions
are best evaluated using RCTs. Aetiological studies are often cohort or case–control studies;
although the RCT is ideal, it may not always be possible to conduct one. Epidemiological studies
are often cross-sectional surveys.

33
Q

Which of the following characters of a pragmatic RCT distinguishes it from
an explanatory RCT?
A. Pseudo-randomization is practised in pragmatic trials
B. Type 1 error level is set to be higher in pragmatic trials
C. Descriptive rather than inferential statistics are used to report the outcome of
pragmatic trials
D. Higher generalizability is achieved in pragmatic trials
E. Strict exclusion of patients with comorbid conditions is seen in pragmatic trials

A

D. The RCT has traditionally been considered as a study design that can yield results with
a high degree of internal validity. But the major drawback of RCTs is that the process takes
place under highly experimental conditions, which are not seen in clinical practice. So any results
achieved from such RCTs, though valid, may not be reproducible in everyday practice. In order
to circumvent this issue, more naturalistic trials that retain core principles of RCT such as
randomization, longitudinal follow-up, and controlled intervention are being increasingly used.
Such real-world RCTs are called pragmatic trials or effectiveness trials. They can be useful
for finding out whether an intervention will be effective in clinical practice, although they may not be
suitable for studying the biological efficacy of a drug. A pragmatic RCT may reject various practices seen in an
explanatory RCT, such as strict exclusion criteria, blinding, placebo use, fixed-dose intervention,
high follow-up care, per-protocol analysis, etc. But basic principles such as randomization and use
of probability theory (hypothesis testing and p values) are retained.

34
Q

Which one of the following statements with respect to bias is false?
A. Bias is a systematic error
B. Bias cannot be controlled for during the analysis stage of a trial
C. The presence of bias always overestimates the final effect
D. Blinding reduces measurement bias
E. Randomization reduces selection bias

A

C. Bias is defined as any trend in the collection, analysis, interpretation, publication, or
review of data that can lead to conclusions that are systematically different from the truth. It can
also be termed a systematic error that influences the result in either direction. Hence a biased
study could either overestimate or underestimate the true effect, depending on the direction of
the trend. Bias may be introduced by poor study design or poor data collection. Bias cannot be
‘controlled for‘ at the analysis stage. In RCTs, randomization ensures a reduction in selection bias
if the process is carried out in a strictly concealed manner. Blinding can reduce the measurement
bias if properly executed.

35
Q

Which one of the following is NOT a major disadvantage of a double blind,
well-concealed RCT design?
A. Very expensive to carry out
B. May become time consuming
C. Experimental results may not translate to clinical samples
D. Randomization may be unethical and not possible in certain cases
E. Introduction of recall bias

A

E. Recall bias refers to the systematic error produced by the tendency of subjects to recall
an exposure differently when they are diseased compared with when they are not. Recall bias
most often occurs in case–control studies. The remaining choices refer to genuine disadvantages
of a well-conducted RCT.

36
Q

The last observation carried forward (LOCF) method is not suitable for
processing the data for which of the following RCTs with intention to treat
analysis?
A. Benzodiazepines for anxiety
B. SSRIs for depression
C. Venlafaxine for generalized anxiety disorder
D. Memantine for Alzheimer’s disease
E. Risperidone for bipolar disorder

A

D. In most drug trials, patients drop out because of non-efficacy or adverse events. If a
number of participants drop out because of non-efficacy, excluding them from the
analysis would project a falsely favourable outcome for the drug in question. Hence the LOCF method
takes the last observation and uses it in the analysis. For illustration, take two subjects in a
trial of antidepressants.
Subject 1 improves significantly over the 4 weeks: his MADRS score drops to 1 from a
baseline of 30. Subject 2 drops out of the study in the second week due to non-efficacy.
If we remove subject 2 from the analysis, the mean score at the end would be 1 (a whopping
improvement of 29 points on the MADRS); if instead we carry forward his last observed score
(week 2) of 30 to the endpoint and take the mean of the two scores (15.5), the apparent drop is only
about 14.5 points from the mean baseline score of 30.
Trials of Alzheimer’s disease interventions are different, since we do not expect (although we
would most definitely like to see) improvement in the cognitive score, but rather a slow decline
in scores over time, in spite of the medication, owing to the progressive nature of the illness.
If a patient drops out early because of adverse effects, carrying forward his
score to the endpoint analysis will falsely project a favourable outcome. Again to illustrate, let us
consider a trial of cholinesterase inhibitors. Subject 1 experiences a decline of 19 points over 4 weeks, while subject 2 drops out in
the first week, when his MMSE score of 20 has not yet declined. If we carry forward his last observation of 20, it
will look as though there was no deterioration at all, and the mean decline over time
would be roughly halved (to about 9.5 points) rather than 19.
As a corollary, the reason for drop-out is another important issue. In trials of Alzheimer’s disease
interventions, early drop-outs are most probably due to adverse effects, while late drop-outs are
due to non-efficacy. This can again project a falsely favourable outcome for the drug.
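A minimal sketch of the LOCF idea itself, using hypothetical scores rather than real trial data: each subject's last recorded value is carried forward to the endpoint before the endpoint mean is taken.

```python
def locf_endpoint(scores_by_subject):
    """Return each subject's endpoint score under last observation carried
    forward: simply the last value that subject contributed."""
    return [scores[-1] for scores in scores_by_subject]

# hypothetical MADRS scores: subject 1 completes, subject 2 drops out at week 2
subject1 = [30, 22, 14, 7, 1]
subject2 = [30, 30]            # no improvement before dropping out
endpoints = locf_endpoint([subject1, subject2])
print(endpoints)                         # [1, 30]
print(sum(endpoints) / len(endpoints))   # 15.5, the diluted endpoint mean
```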

37
Q

All of the following measures can be used to decrease the heterogeneity in
a meta-analysis except
A. Transformation of the outcome variable in question
B. Employing meta regression analysis
C. Using a random effects model
D. Doing a subgroup analysis
E. Including data from smaller unpublished studies

A

E. There are a number of ways to manage heterogeneity. The easiest way would be to avoid
it. This includes using strict inclusion criteria to include studies that are as similar as possible.
In case of continuous variables, one of the ways would be to transform the data so that all data
look similar and are less heterogeneous. Meta-regression is a collection of statistical procedures
to assess heterogeneity, in which the effect size of each study is regressed on one or several covariates,
with a value defined for each study. The fixed-effect model of meta-analysis considers
the variability between the studies as exclusively due to random variation.
The random-effects model assumes a different underlying effect for each study and takes this
into consideration as an additional source of variation. The effects of the studies are assumed to
be randomly distributed and the central point of this distribution is the focus of the combined
(pooled) effect estimate. If there were some types of studies that were likely to be quite
different from the others, a subgroup analysis may be done. Finally, one could exclude the
studies that contribute a great deal to the heterogeneity. Locating unpublished studies may help
reduce publication bias but will not have any predictable and constant effect on the degree of
heterogeneity.

38
Q

Both odds ratios and relative risks are often used as outcome measures in
published studies. Which of the following statements is true regarding these
measures?
A. The odds ratio cannot be calculated in cohort studies
B. Incidence rate is required to calculate the odds ratio
C. Relative risk cannot be calculated for case–control studies
D. If the outcome of interest is very common, the odds ratio approximates relative risk
E. The odds ratio cannot be used to study dichotomous outcomes

A

C. Odds are the probability of an event occurring divided by the probability of the event
not occurring. An odds ratio is the odds of the event in one group (e.g. intervention group)
divided by the odds in another group (e.g. control group). Odds ratios tend to exaggerate the
true relative risk to some degree. But this exaggeration is kept minimal and even negligible if
the probability of the studied outcome is low (empirically, less than 10%); in such cases the odds
ratio approximates the true relative risk. As the event becomes more common the odds ratio
no longer remains a useful proxy for the relative risk. It is suggested that the use of odds ratios
should probably be limited to case-control studies and logistic regression examining dichotomous
variables. Risk refers to the probability of an event occurring over a defined period; in other words,
it corresponds to the incidence rate. The inherent cross-sectional nature of a case–control study
(where ‘existing cases’ are recruited) does not allow one to study ‘new’ incident cases. Hence we
cannot measure risk, and therefore relative risk, from case–control designs.
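A small sketch contrasting the two measures on hypothetical 2 × 2 tables (the counts are illustrative, not from any study):

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """2 x 2 table: a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    return (a / b) / (c / d)

def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed divided by risk in the unexposed (needs incidence
    data, so it suits cohort studies or RCTs, not case-control designs)."""
    return (a / (a + b)) / (c / (c + d))

# rare outcome: the odds ratio approximates the relative risk
print(odds_ratio(5, 95, 2, 98), relative_risk(5, 95, 2, 98))      # ≈ 2.58 vs 2.5
# common outcome: the odds ratio exaggerates the relative risk
print(odds_ratio(60, 40, 40, 60), relative_risk(60, 40, 40, 60))  # 2.25 vs 1.5
```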

39
Q

Which one of the following clinical questions can be correctly addressed by a
case–control design?
A. Is it effective to use hyoscine patches in treating clozapine-induced hypersalivation?
B. How many inpatients in wards for elderly people suffer from untreated
hypercholesterolaemia at any given time?
C. How rapidly will lithium discontinuation produce relapse of schizoaffective disorder?
D. Is our local community team compliant with the NICE guidelines for prescribing
antipsychotics?
E. Do patients with depression have more academic examination failures than their healthy
siblings?

A

E. Choice A refers to a clinical question related to therapeutic intervention – RCTs are
best suited to answer this. Choice B is an epidemiological question – ‘how many in a population
have a particular condition?’ A cross-sectional survey could answer this question. Choice C
refers to a prognostic question – how long will it take for schizoaffective relapse following
lithium discontinuation? A prospective cohort study (or an RCT if ethically approved) is the most
appropriate design for this question. Choice D requires a clinical audit, which is often closer to a
cross-sectional survey in design. Choice E refers to defined cases and controls being compared
for a possible exposure or risk factor that might have occurred in the past. Hence the case–
control design is best suited to answer this question. Please note that it is possible to design a
prospective cohort study by observing for a long time those with academic failure to detect
development of depression.

40
Q

A 50-year-old man sustained significant memory loss following near-fatal
carbon monoxide poisoning. Following discussion he agreed to take
part in a double-blinded trial of donepezil vs placebo administered in
six separate 4-week modules with a 2-week washout period in between.
Neuropsychological measures were obtained at regular pre-planned
intervals to monitor changes. He was the sole subject on the trial and the
randomization sequence was generated and maintained by the pharmacy.
This study design could be best described as
A. Uncontrolled trial
B. N-of-1 trial
C. Crossover RCT
D. Pragmatic RCT
E. Naturalistic observational study

A

B. N-of-1 trials are randomized double-blind multiple crossover comparisons of an active
drug against placebo in a single patient. The design uses a series of pairs of treatment periods
called modules. Within each module the patient receives active treatment during one period and
either an accepted standard treatment or placebo in the other. Random allocation determines
the order of the two treatment periods within each pair and both clinician and patient are
blinded for the intervention. This design is mostly suited for chronic recurrent conditions for
which long-term interventions exist that are not curative. Interventions with rapid onset and
offset of effects are best suited for n-of-1 trials. This allows shorter treatment periods wherein
multiple modules of intervention and placebo/standard treatment can be compared, increasing
the chance of achieving a statistically significant result. It is also necessary that the interventions
tested must be cleared from the patient’s system within a finite washout period.

41
Q

While conducting a systematic review, publication bias could be determined
using which of the following methods?
A. Funnel plot
B. Galbraith plot
C. Failsafe N
D. Soliciting and comparing published vs. unpublished data
E. All of the above

A

E. Publication bias refers to the tendency of journals to accept and publish certain types
of studies more often than the others. In general, studies with results that are impressively
significant or of higher quality by virtue of larger sample size are more successful in getting
published. Publication bias can be considered as a form of selection bias when one attempts a
systematic review or meta-analysis. Publication bias can be detected using a funnel plot – visual
inspection of a graph drawn by plotting a measure of precision (often sample size) against
treatment effect will reveal asymmetry of the two arms of the funnel-shaped graph if publication
bias is present. Galbraith plot refers to a graph obtained by plotting a measure of precision
such as (1/standard error) against standard normal deviate (log of odds ratio/standard error).
The coordinates obtained from such a plot can be used to determine the extent of publication
bias using linear regression. Failsafe N is another way of estimating publication bias. Consider a
meta-analysis yielding a statistically significant difference in outcome between two interventions,
despite suspected publication bias. Then failsafe N answers the question ‘How many missing
studies are needed to reduce the effect to statistical non-significance?’ The higher the failsafe
N, the lower the publication bias. If one could solicit and compare all unpublished data with
published data, then publication bias would become obvious.

42
Q

In a RCT the randomization sequence is protected before and until the
randomization is completed. This is known as
A. Concealment
B. Double blinding
C. Matching
D. Masking
E. Trial independence

A

A. Allocation concealment refers to the process used to prevent foreknowledge of the
assignment before allocation is complete, so that the investigator who recruits subjects for a trial does
not know the allocation of subsequent subjects entering randomization. Allocation
concealment seeks to prevent selection bias, protects the allocation sequence before and until
assignment, and can almost always be successfully implemented in an RCT. It is often confused with
blinding, which seeks to prevent ascertainment bias, protects the sequence after allocation,
and cannot always be implemented.

43
Q

Data collected for a study on antidepressant efficacy show the outcome
as observations of the number of days needed to achieve remission.
The standard deviation for such observations will be measured in which of
the following units?
A. No units
B. Days
C. Square root of days
D. Days square
E. Person-years

A

B. The standard deviation has the same units as the primary variable. This is an advantage of
standard deviation compared with variance, which is also a measure of dispersion.

44
Q

In a study presenting outcome in terms of median days of hospital
admission, the collected data show many observations substantially higher
than the median. Which one of the following is correct regarding the above
study?
A. The results are negatively skewed
B. Mean = median = mode
C. The results are not skewed
D. Mean > median
E. Mode = median

A

D. If many observations are substantially higher than the median we can assume that the
mean of the distribution might be greater than the median. This translates to a positively skewed
distribution. No comment can be made about the mode from the available information.

45
Q

A trial is conducted to evaluate the efficacy of lamotrigine in patients with
symptoms of recurrent depersonalization. While calculating the number of
patients needed in the trial to demonstrate a meaningful effect, α level is set
at 0.05. Which of the following is true regarding alpha (α)?
A. It is the probability of a type 2 error
B. It is the threshold for defining clinical significance
C. If α = 0.05, there is a 5% chance that the null hypothesis is rejected wrongly
D. If α = 0.05, then 5% of treated subjects will show absence of treatment effect.
E. None of the above

A

C. α is the probability of a type 1 error. It is used to set the threshold for statistical (not
clinical) significance, often arbitrarily set at p = 0.01–0.05 (α = 1–5%). If α = 0.05, there is a 1 in
20, or 5%, chance that the null hypothesis is rejected wrongly.

46
Q

Which of the following is an agreed method of assessing the quality of
conducting and reporting systematic reviews and meta-analyses?
A. ASSERT
B. CONSORT
C. QUOROM
D. SIGN
E. NICE

A

C. Despite the increasing importance and abundance of systematic reviews and meta-analyses
in the scientific literature, the reporting quality of systematic reviews varies widely.
To address the issue of suboptimal reporting of meta-analyses, an international group in 1996
developed a guidance called the QUOROM statement (QUality Of Reporting Of Meta-analyses).
QUOROM focused on the standards of reporting of meta-analyses of RCTs. A revision
of these guidelines, renamed PRISMA (Preferred Reporting Items for Systematic reviews and
Meta-Analyses), includes several conceptual advances in the methodology of systematic reviews.

47
Q
All of the following methods are used to assess heterogeneity in a
meta-analysis except
A. Q statistic
B. I squared statistic
C. Galbraith plot
D. L’Abbé plot
E. Paired t statistics
A

E. Meta-analysis is generally done to combine the results of different trials, as individual
clinical trials are often too small and hence underpowered to detect treatment effects reliably.
Meta-analysis increases the power of statistical analyses by pooling the results of all available
trials. But this comes at a small cost. Although similar studies are selected for inclusion in the
meta-analysis, it is likely that the trials differ from one another just by chance. Sometimes
the difference arises from foreseeable factors: the dosage of medication tested,
the mean age of the population studied, and the scales used may differ among
studies. To measure whether this heterogeneity is more than the random heterogeneity we expect,
statisticians resort to certain tests of heterogeneity. These are statistical, as in the chi-square test
(or Q statistic), which tests the null hypothesis of homogeneity, and the I-squared statistic (which
measures the amount of variability due to heterogeneity). The Galbraith plot and L’Abbé plot are
pictorial representations of heterogeneity. A paired t test is generally not used to calculate the
heterogeneity.

48
Q
Which one of the following types of data can have a potentially infinite
number of values?
A. Continuous
B. Categorical
C. Nominal
D. Ordinal
E. Binary
A

A. Data can be qualitative or quantitative. Quantitative data refers to measures that often
have a meaningful unit of expression. This can be either discrete or continuous. A discrete
measure has no other observable value between two contiguous potentially observable values,
i.e. there are ‘gaps’ between values. A continuous variable, on the other hand, can take potentially
infinite values. The other choices in the question refer to qualitative measures whose value can
only be described and counted but cannot be expressed in meaningful units.

49
Q

A multi-centre RCT was conducted with strict inclusion criteria. Which one
of the following properties of the study is most likely to be affected by the
stringent inclusion criteria?
A. Generalizability of results
B. Precision of results
C. Accuracy of the results
D. Statistical significance of the results
E. All of the above

A

A. A major disadvantage of RCTs is the poor generalizability of experimental findings
to a clinical setting. Having strict inclusion and exclusion criteria may help to choose a highly
homogeneous population, increasing the internal validity of the study but at the expense of
generalizability.

50
Q

A researcher is interested in studying whether maternal smoking increases
the risk of school refusal in children. Which one of the following is the
correct null hypothesis for the above research question?
A. School refusal increases the risk of maternal smoking
B. Maternal smoking decreases the risk of school refusal
C. Maternal smoking does not increase the risk of school refusal
D. Maternal smoking increases the risk of school refusal
E. None of the above

A

C. In scientific research, nothing can be proven; we can only disprove presumed facts.
If one wants to prove maternal smoking causes school refusal, it is best to assume that maternal
smoking does not cause school refusal to start with and then proceed to disprove this statement.
Such statements waiting to be disproved during the course of a research study are called the null
hypotheses. The converse of the null hypothesis is called the alternative hypothesis.
Research question: Does maternal smoking increase risk of school refusal?
Null hypothesis: Maternal smoking does not increase risk of school refusal
Alternative hypothesis: Maternal smoking increases the risk of school refusal

51
Q

Of the following, the most important methodological challenge
while conducting a cohort study is
A. Statistical analysis of the results
B. Randomization of the cohorts
C. Identifying those who develop the outcome
D. Identifying a suitable comparison group
E. Concealment of cohort allocation

A

D. Subjects are not randomized in a simple cohort study, hence there is no question of
allocation concealment. When valid instruments and a reasonable follow-up schedule are used,
identification of those who develop the ‘event’ of interest (the outcome) is often not difficult in a
cohort design. Often the most difficult part is to identify a reasonable control cohort that lacks
the ‘exposure’ of interest. Internal controls are those who are ‘non-exposed’ but derived
from the same study population as the ‘exposed’. An external control is an independently
recruited cohort without the exposure.

52
Q

In a study investigating the mean cholesterol levels in 36 patients taking
olanzapine, the mean was found to be 262 mg/dL. The standard deviation
of this observation was 15 mg/dL. The 95% confidence interval for this
observation is
A. 232–292 mg/dL
B. 247–277 mg/dL
C. 259.5–264.5 mg/dL
D. 257–267 mg/dL
E. 226–298 mg/dL

A

D. The 95% confidence limits for a sample mean span the range from approximately two
standard errors below the mean to approximately two standard errors above it. Expressed
mathematically,
95% confidence limits = mean ± (2 × standard error of the mean).
The standard error of the mean is calculated as SE = standard deviation/√(sample size).
Here, SE = 15/√36 = 15/6 = 2.5.
Hence the 95% confidence limits are
262 ± (2 × 2.5) = 262 ± 5 = 257, 267 mg/dL.
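The same arithmetic as a quick Python check (the helper name is mine; the ±2 SE rule is the usual approximation to ±1.96 SE):

```python
from math import sqrt

def ci95(mean: float, sd: float, n: int):
    """Approximate 95% confidence interval for a mean:
    mean +/- 2 x standard error, where SE = sd / sqrt(n)."""
    se = sd / sqrt(n)
    return mean - 2 * se, mean + 2 * se

print(ci95(262, 15, 36))  # (257.0, 267.0) mg/dL
```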

53
Q

In a normal distribution curve, 99% of observations will fall within which of
the following values of standard deviation (SD)?
A. –2 SD to +2 SD
B. –3 SD to +3 SD
C. –1 SD to +2 SD
D. –1 SD to +1 SD
E. +1 SD to +3SD

A

B. An important property of the normal distribution curve is the relationship between
the SD of normally distributed observations and probability. Normal distribution curves are
symmetric and bell-shaped. Roughly 68% of the sampled population lies within 1 SD of the
mean on either side of the curve, about 95% within 2 SDs, and over 99% (about 99.7%) within 3 SDs.
In other words, there is less than a 1% chance that an observation will fall outside –3 SD to +3 SD,
about a 5% chance that it will fall outside –2 SD to +2 SD, and roughly a 32% chance that it will
fall outside –1 SD to +1 SD.
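These coverage figures can be verified with the standard normal cumulative distribution function; a quick check using the Python standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
for k in (1, 2, 3):
    coverage = z.cdf(k) - z.cdf(-k)
    print(f"within ±{k} SD: {coverage:.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```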

54
Q

Confidence intervals are used to describe the range of uncertainty around
the estimated value of an outcome from the sample studied. Which of the
following statements about confidence intervals is incorrect?
A. Sample size is used in calculating confidence intervals
B. It includes a range of values above and below the point estimate
C. If the confidence interval includes a null treatment effect, the null hypothesis can be rejected
D. 95% confidence interval is often used in clinical studies
E. When the estimated outcome is a ratio, a positive treatment effect is shown by confidence intervals remaining above one.

A

C. If the confidence interval includes a null treatment effect, the null hypothesis cannot be
rejected within the set confidence limits. Confidence intervals provide a measure of dispersion
of the point estimate within stipulated confidence limits (conventionally 95%, corresponding to
a p value of 5%). In other words, confidence intervals give the range within which the true value
is likely to lie. Confidence intervals are a measure of the precision of the results obtained from
a study. The larger the sample studied, the narrower the intervals. If the confidence interval
crosses the value '0' for a difference between means, the result is not statistically significant.
If it crosses the value '1' for ratio measures such as the odds ratio, it is not significant. If it
crosses infinity for inverse ratios such as the NNT, it is not significant.

55
Q

A clinical researcher is examining the incidence of akathisia in two groups
of patients. One group (n = 35) has been prescribed benzodiazepine for use
as required while the other group (n = 35) is free from any benzodiazepine
exposure. The outcome is measured as the proportion of patients who develop
akathisia, on a dichotomous scale. Akathisia develops in 10 patients without
benzodiazepines and in 20 patients with benzodiazepines. Which of the
following statistical tests is best suited to analyse the statistical significance
of the difference between the two groups?
A. Chi square test
B. Paired t test
C. Multiple regression analysis
D. Wilcoxon rank sum test
E. Pearson coefficient test

A

A. In this study, the dependent variable is treated as a categorical outcome. In other words,
the population has been categorized into 'akathisia present' or 'akathisia absent'. This type of
outcome yields frequency counts or proportions that can be analysed for significance using
the chi square test. The t test is used for comparing means. The Wilcoxon rank sum test is a
non-parametric equivalent of the t test. Pearson coefficients are used to analyse correlation.
Regression analyses are used to predict one variable from another when they are correlated.
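
For illustration, here is a minimal Python sketch of this chi square test on the akathisia data, assuming SciPy is available (for a 2 × 2 table, chi2_contingency applies Yates' continuity correction by default):

```python
from scipy.stats import chi2_contingency  # assumes SciPy is installed

# 2 x 2 frequency table from Question 55:
# rows = benzodiazepine exposure (yes/no), columns = akathisia (present/absent)
table = [[20, 15],   # benzodiazepine group (n = 35): 20 akathisia, 15 none
         [10, 25]]   # no benzodiazepine (n = 35): 10 akathisia, 25 none

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```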

56
Q

Considering normal distribution, which one of the following statements is
incorrect?
A. It is a continuous distribution
B. It is symmetrical in shape
C. The mean, median, and mode are identical
D. The shape of the distribution depends on the number of observations made
E. Both tails of the distribution extend to infinity

A

D. Irrespective of the number of observations made, the shape of a normally distributed
curve is symmetric and bell shaped. The exact shape of the normal distribution is defined by a
function that has only two parameters: the mean and the standard deviation. For a given range of
scores, when the standard deviation is small the curve is tall and narrow (often described as
leptokurtic, i.e. thin but still symmetric); when the standard deviation is larger, the curve is
flatter and wider (platykurtic).

57
Q

In descriptive statistics, which of the following is the most widely used
measure of dispersion of a frequency distribution?
A. Range
B. Median
C. Standard deviation
D. Variance
E. p Value

A

C. Standard deviation is a widely used measure of dispersion of data in descriptive statistics.
Other measures include range, interquartile range (usually accompanies median values), and
variance. Standard deviation is obtained by the root mean square of differences between
individual observations and the mean value. Note that the standard error is often preferred as the
measure of dispersion when making inferences from a sample to the population. The standard
error is a measure of the precision of the sample estimate in comparison with the population value.

58
Q

In qualitative research, which of the following refers to modifying the
research methods and hypothesis as one conducts the research?
A. Triangulation
B. Iterative approach
C. Theoretical sampling
D. Content analysis
E. Deductive approach

A

B. The iterative approach in qualitative studies refers to the process of altering the research
methods and building the hypothesis as the study progresses, in response to new information
gained while conducting the research. This flexibility allows qualitative studies to follow an
inductive rather than the deductive approach seen in quantitative research. Data come before
theory is generated in inductive methods; a stated theory is tested using generated data in
deductive methods.

59
Q

Systolic blood pressure is known to be normally distributed across the
population with a mean of 120 mmHg and standard deviation of 10 mmHg.
How many out of 100 patients in a population will have systolic blood
pressure between 120 and 130 mmHg?
A. 68
B. 97
C. 48
D. 17
E. 34

A

E. In the above question, the mean is given as 120 mmHg. Assuming normal distribution
with a standard deviation of 10 mmHg, we can find out the proportion of the population that
will fall between two observed values. For values between –1 and +1 standard deviation from
the mean, this will be nearly 68%. Nearly 34% will have values between the mean and 1 standard
deviation. In other words 34% will have systolic blood pressure between 120 mmHg and
130 mmHg.

60
Q

In which of the following situations is intention to treat analysis deliberately
not attempted even if there are significant numbers of drop-outs?
A. A study that analyses the efficacy of an intervention itself
B. A study that analyses the effectiveness of providing an intervention
C. A study that compares two interventions for economic efficiency
D. A study that compares an established standard treatment against a new treatment with
the view of replacing the standard
E. None of the above

A

A. Significant numbers of subjects recruited for trials often do not complete the trial as
per protocol. The data generated from such drop-outs cannot be ignored, as doing so would
potentially introduce an attrition bias, generally in favour of the intervention. Therefore, it is
standard practice to analyse the results of trials on an 'intention to treat' basis, i.e. data from
subjects are analysed as per the initial allocation irrespective of trial completion. In a few
situations, such as 'efficacy studies', intention to treat analysis is not used; instead, a
'per-protocol analysis' is carried out. An efficacy study is designed to explain the effects of the
intervention itself. This is in contrast to effectiveness studies, which are designed to study the
usefulness of making an intervention available (choices B, C and D).

61
Q

A 24-week RCT of memantine in moderate–severe Alzheimer’s dementia
was reported. The investigators recruited 126 subjects for the memantine
arm and 126 for the placebo arm, out of which 100 in the memantine group
and 100 in the placebo group completed the study. Using a categorical
measure of treatment response it was shown that 40% of the patients in
the memantine group responded while only 20% in placebo group showed a
response. Calculate the relative risk reduction of using memantine
A. 20
B. 5
C. 2
D. 1
E. 10

A

D. Relative risk reduction (or relative benefit increase) is calculated using the following
expression: relative risk reduction = absolute risk reduction/control event rate (RRR = ARR/CER).
The control event rate is 20%; the experimental event rate is 40%.
The absolute risk reduction is the difference between the two event rates, i.e. 40 – 20 = 20%.
RRR = 20/20 = 1

62
Q

Using the above study results calculate the number needed to treat (NNT)
for patients receiving memantine compared with placebo
A. 20
B. 5
C. 2
D. 10
E. 7

A

B. The NNT can be calculated from the absolute risk reduction (ARR):
NNT = 1/ARR
NNT = 1/0.2 = 5
Five subjects must be treated with memantine to obtain one additional response.

63
Q
If the above study used a per protocol analysis of primary outcome, the
odds ratio of having a response is
A. 2.7
B. 7.2
C. 0.16
D. 6
E. 0.37
A

A. To calculate the odds ratio, it is useful to construct a 2 × 2 table. As a per-protocol
analysis is used, only those who completed the trial are included: on memantine, 40 responders (a)
and 60 non-responders (b); on placebo, 20 responders (c) and 80 non-responders (d).
The odds ratio is obtained using the cross-product ratio ad/bc
= (40 × 80)/(60 × 20) = 8/3 ≈ 2.7
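
The calculations for Questions 61–63 can be reproduced with a short Python sketch (a minimal example using the per-protocol completers quoted in the question):

```python
# Memantine RCT, per-protocol completers (100 per arm).
resp_mem, n_mem = 40, 100      # 40% response on memantine
resp_pla, n_pla = 20, 100      # 20% response on placebo

eer = resp_mem / n_mem         # experimental event rate = 0.40
cer = resp_pla / n_pla         # control event rate = 0.20
arr = eer - cer                # absolute risk (benefit) difference = 0.20
rrr = arr / cer                # relative risk reduction/benefit increase = 1.0
nnt = 1 / arr                  # number needed to treat = 5
odds_ratio = (resp_mem * (n_pla - resp_pla)) / ((n_mem - resp_mem) * resp_pla)  # ad/bc

print(round(arr, 2), round(rrr, 2), round(nnt, 1), round(odds_ratio, 2))
# -> 0.2 1.0 5.0 2.67
```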

64
Q
Which one of the following measures is used in correlation analysis for
non-parametric data?
A. Kappa statistics
B. Pearson’s correlation
C. Spearman’s correlation
D. Cohen’s d
E. Cronbach’s alpha
A

C. Spearman’s correlation is used for non-parametric correlation analysis. It is also called
the rank correlation test. It can be used when one or both variables to be correlated consist of
ranks (ordinal) or if they exist as quantitative data but do not have normal distribution. Pearson’s
correlation is used for parametric correlation. Kappa is a measure of agreement not correlation.
Cohen’s d is used to calculate effect size. The internal consistency of an instrument is tested using
Cronbach’s alpha.

65
Q

Parametric statistical methods make assumptions which, when satisfied,
make the final estimate precise and accurate. Which of the following is one
such parametric assumption?
A. The distribution of observations in the population is not known
B. The variances of the compared samples are homogeneous
C. The analysed variables are categorical measures
D. Outliers are unequally distributed
E. The sample size is at least 2% of the size of target population

A

B. To enable the use of inferential statistics, standard sampling assumptions such as (1) the
randomness of the sampled data and (2) the independence of the observations must be met.
In addition, when parametric statistics are employed, the following assumptions must also be met:
1. homogeneity of variance across the samples
2. observations obtained from continuous (interval/ratio) scales
3. normal distribution of the observed variable.
There is no set proportion of the population size that must constitute the sample size in order
to use parametric statistics, but in samples that are too small the distribution may not be normal
and the central limit theorem may not be applicable. Where such assumptions are not met,
non-parametric statistics are used; these are generally considered less powerful.

66
Q

In a study comparing drug A and a placebo control, 20 out of 200 patients
taking drug A die after 3 years. Twenty-five out of 225 patients taking the
placebo die after 3 years. If death is the outcome of interest, the control
event rate is given by
A. 25/225
B. 20/200
C. (25 – 20)/200
D. (25 – 20)/225
E. 25

A

A. Drawing a 2 × 2 table helps to answer this question.

The control event rate is the rate of death (the 'event' of interest) in the control group = 25/225.

67
Q

In an RCT comparing the effect of exposure therapy versus cognitive
restructuring, follow-up was carried out at 6, 11, 24, and 36 weeks. At weeks
6 and 11, after rating the patient, the outcome assessors tried to guess the
treatment condition. Correctness of guesses did not differ significantly from
that expected by chance. This was an attempt to demonstrate which of the
following?
A. Adequacy of randomization
B. Concealment of allocation
C. Blindness of assessor
D. Blindness of patient
E. Matching of two groups

A

C. Adequacy of blinding can be tested during or after completing a trial by asking the
blinded parties to guess the allocation. Guess rates that are significantly higher than expected
by chance indicate failure of blinding. Testing for 'blindness' may not generate valid answers all
the time. This is because, as participants begin to experience treatment response or outcomes
of interest, they begin to generate 'hunches' about the efficacy of the treatments being tested.
Hence tests for blinding can show spurious failure of blinding when in fact they are testing the
'efficacy hunches' that develop late in the course of a trial.

68
Q
If the sample size is sufficiently large, mean values of repeated observations
follow normal distribution irrespective of the distribution of original data in
the population. This is known as
A. Bayesian theorem
B. Central limit theorem
C. Bonferroni correction
D. Transformation theorem
E. Independent observations theorem
A

B. The central limit theorem explains why normal distributions are so frequent when
considering most biological parameters. Consider repeated sampling from a population in which
the distribution of the observed variable is unknown. You intend to plot the distribution of the
individual means of each sample drawn from the population. As the sample size increases, the
sample means approach a normal distribution whose mean is the same as the population mean
and whose standard deviation equals the standard deviation of the population divided by the
square root of the sample size. Usually 10 or more observations are sufficient to give an
approximately normal distribution.

69
Q

The validity of a new instrument is compared with an external criterion.
A conceptually related external criterion is identified to occur sometime
in the future. If the correlation between current scores obtained using the
instrument and the future expected outcome is studied, this is called
A. Concurrent validity
B. Incremental validity
C. Predictive validity
D. Inter-rater reliability
E. Internal consistency

A

C. The term validity refers to the strength of our conclusions, or in the case of statistics,
the strength of our inferences. It refers to applicability. The term reliability refers to the
consistency of our measurements, or the reproducibility. An important subtype of validity is
called criterion validity. If an instrument provides a result that withstands the test of an external
criterion then the instrument is said to have high criterion validity. The external criterion may
be a measurement that can be obtained more or less at the same time (concurrent validity) or
it may be an outcome that is predicted to occur in the future (predictive validity). If a test offers
something over and above what is offered by an existing instrument, then incremental validity
can be established. Internal consistency of a test refers to looking at how consistent the results
are for different items (measuring the same construct) within the instrument studied. This can
be measured by undertaking item–item correlation, item–total score correlation or split half
reliability (Cronbach’s alpha; see elsewhere in this chapter).

70
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

The sensitivity of the overall screen using both items is approximately
A. 100%
B. 25%
C. 67%
D. 76%
E. 91%
A

E. Questions similar to this are very common in the MRCPsych exam. Most such
questions provide some data and require the candidate to do a series of calculations from the
data. It is always advisable to redraw the presented data as soon as possible in a format that
fits the purpose. From the given table we can create a 2 × 2 table, with the gold standard result
on the top. One should be careful while constructing the 2 × 2 table; it is advisable to stick to
one style of using columns and rows to indicate a particular group of data. Here, the 2 × 2 table
is drawn with the gold standard results across the two data columns and the screening test
results across the two rows.
Sensitivity is defined as the test's ability to identify people who, according to the diagnostic (gold)
standard, actually have the disorder (true positives). Sensitivity = A/(A + C) = 39/43 = 90.69%, i.e.
90.69% of subjects who really have depression according to DSM-IV criteria have a positive test
result on the screening test. In other words, sensitivity is the proportion of true positives (cases)
correctly identified by the test.
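
A minimal Python sketch of the 2 × 2 calculations used in Questions 70–76, with A–D labelled as in the answers (true positives, false positives, false negatives, true negatives):

```python
# Palliative care screening study: screen result vs DSM-IV gold standard.
A, B = 39, 40   # A = true positives, B = false positives
C, D = 4, 84    # C = false negatives, D = true negatives

sensitivity = A / (A + C)                 # 39/43  ~ 0.91
specificity = D / (B + D)                 # 84/124 ~ 0.68
ppv = A / (A + B)                         # 39/79  ~ 0.49
npv = D / (C + D)                         # 84/88  ~ 0.95
prevalence = (A + C) / (A + B + C + D)    # 43/167 ~ 0.26 (pretest probability)
lr_pos = sensitivity / (1 - specificity)  # ~ 2.8
lr_neg = (1 - sensitivity) / specificity  # ~ 0.14

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("prevalence", prevalence),
                    ("LR+", lr_pos), ("LR-", lr_neg)]:
    print(f"{name}: {value:.2f}")
```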

71
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

The specificity of the overall screen is approximately
A. 67%
B. 95%
C. 38%
D. 25%
E. 91%
A

A. Specificity is defined as the test's ability to exclude people who, according to the
diagnostic (gold) standard, do not actually have the disorder (true negatives). Specificity =
D/(B + D) = 84/124 = 67.74%, i.e. 67.74% of the people who do not have depression will have a
negative result on the two-question screen. Thus specificity is the proportion of true negatives
among all non-diseased individuals. In other words, it is the ability of a test to rule out the
disorder among people who do not have it.

72
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

The predictive power of a positive test using the overall screen is
A. 49%
B. 91%
C. 67%
D. 25%
E. 95%
A

A. Not all of the people who test 'positive' will actually have the disorder. The positive
predictive value (PPV) gives the proportion of true positives among the test positives. It is
calculated using the formula PPV = A/(A + B) = 39/79 = 49.36%, i.e. 49.36% of people identified
as depressed by the screening test actually have the illness.

73
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

The predictive power of a negative test using the overall two-item screen is
given by
A. 49%
B. 91%
C. 67%
D. 25%
E. 95%
A

E. Not all of the people who test 'negative' will actually be disease free. The negative
predictive value (NPV) answers the question 'Of those people who have been found to be
disease negative on the test, how many actually do not have the disorder?' It is calculated using
the formula NPV = D/(C + D) = 84/88 = 95.45%, i.e. 95.45% of people classed as 'normal' by the
test do not have the disorder.

74
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

The pretest probability of the overall two-item screen is
A. 49%
B. 91%
C. 67%
D. 25%
E. 95%
A

D. The prevalence, also known as the pretest probability or base rate, refers to the
proportion of people who have the disorder = (A + C)/N, i.e. 43/167 = 25.74%.

75
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

The likelihood ratio of a positive test for the overall two-item screen is
A. 2.8
B. 4.8
C. 6.8
D. 8.8
E. 10.8
A

A. PPV and NPV depend on the prevalence of the illness, and the prevalence of an illness
can vary according to the population tested. For example, the prevalence of depression is likely
to be higher among patients in a palliative care unit. Since the prevalence, and hence the PPV
and NPV, change with the population, one way of summarizing the findings of a diagnostic test
study so that they can be applied at different prevalences is to use the likelihood ratio.
The likelihood ratio for a positive test (LR+) is the likelihood that a positive result comes from
a person with the disorder rather than one without the disorder. LR+ is calculated using the
formula LR+ve = [A/(A + C)]/[B/(B + D)]
or simply
LR+ve = sensitivity/(1 – specificity).
So, (39/43)/(40/124) = 0.90/0.32 = 2.8. Since the sensitivity and specificity of a test are
considered to be constant for that test, the LR is also constant irrespective of
prevalence rates.

76
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

The likelihood ratio of a negative test (LR–) for the overall two-item
screen is
A. 0.14
B. 0.34
C. 0.54
D. 0.74
E. 0.94
A

A. The LR– represents the likelihood that a negative test comes from a person with the
disorder rather than one without the disorder. LR– is calculated using the formula LR–ve =
[C/(A + C)]/[D/(B + D)], or simply LR–ve = (1 – sensitivity)/specificity.
So, (4/43)/(84/124) = 0.10/0.67 = 0.14
Similar to LR+ve, LR–ve is also constant irrespective of prevalence rates.

77
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

Using the nomogram below, calculate the post-test probability of a positive
test when using the two-item depression screening test in the palliative care
unit using the figures indicated at the beginning of Question 70

A. 1
B. 2
C. 4
D. 10
E. 50
A

E. The post-test probability is the probability that a patient scoring positive on the test
actually has the disorder (the PPV in that population). It can be calculated using the nomogram
provided. Since we know the pre-test probability (prevalence) and the likelihood ratio, we can
find the post-test probability from the chart. A straight line drawn through the pre-test
probability (25) and the positive likelihood ratio (2.8) yields a post-test probability of about 50.
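
The arithmetic behind the nomogram can also be done directly; a minimal sketch converting the pretest probability to odds, applying the likelihood ratio, and converting back:

```python
# Post-test probability from pretest probability and LR+ (the Fagan nomogram arithmetic).
pretest_prob = 0.25      # prevalence in the palliative care sample
lr_positive = 2.8        # LR+ of the two-item screen

pretest_odds = pretest_prob / (1 - pretest_prob)      # 0.33
posttest_odds = pretest_odds * lr_positive            # 0.93
posttest_prob = posttest_odds / (1 + posttest_odds)   # ~0.48, i.e. about 50%

print(round(posttest_prob, 2))
```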

78
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

Using the nomogram in Question 77, calculate the post-test probability of
a negative test when using the two-item depression screening test in the
palliative care unit
A. 1
B. 4
C. 10
D. 50
E. 80
A

B. In this case, since the question concerns the post-test probability of a negative test, the
negative likelihood ratio (0.14) is used; a line through the pre-test probability (25) and
LR– (0.14) passes through a post-test probability of about 4.

79
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

What is the false positive rate for the overall two-item screening test?
A. 32%
B. 9%
C. 90%
D. 67%
E. 25%
A

A. A false positive (FP) is a person identified as having the condition by the new test who
does not actually have the condition according to the gold standard. Here, the false positive rate
is the percentage of non-depressed people falsely identified by the test as depressed. Using the
2 × 2 table, the false positive rate = B/(B + D) = 40/124 = 32%.

80
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

What is the false negative rate for the overall two-item screening test?
A. 32%
B. 9%
C. 90%
D. 67%
E. 25%
A

B. A false negative (FN) is a person not identified as having the condition by the new test
who actually has the condition according to the gold standard. Here, the false negative rate is
the percentage of people in the depressed group falsely identified by the test as not depressed,
i.e. C/(A + C) = 4/43 = 9.3%.

81
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

Taking into consideration the above screening test, we randomly pick
1000 people from the general population. Considering the prevalence of
major depressive disorder (DSM-IV) in the general population to be 10%,
calculate the positive predictive value of the two-item screening test in this
population.
A. 49%
B. 91%
C. 67%
D. 31%
E. 95%

A
D. In Question 75, we discussed how the prevalence of a condition can vary according to
the population tested. Using the same screening test for depression in a general population
sample of 1000 subjects (N), we are asked to calculate the positive predictive value. The
prevalence rate, or pre-test probability, is 10%, i.e. (A + C)/N. We need to construct a fresh
2 × 2 table, remembering that the sensitivity and specificity remain constant for the test. From
the given data:
Prevalence = (A + C)/N = 10%; as N = 1000, A + C = 100.
Sensitivity = A/(A + C) = A/100 = 0.91, so A = 91 (and C = 9).
Specificity = D/(B + D) = 67.74%; D = 0.677 × 900 ≈ 610 (and B = 900 – 610 = 290).

Using the formula for positive predictive value, PPV = A/(A + B) = 91/(91 + 290) = 91/381 ≈ 24%.
(The keyed answer, option D at 31%, corresponds to 91/290, which is the post-test odds rather
than the post-test probability; D remains the closest of the listed options.)
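
A short sketch of the same rescaling to 10% prevalence; note that, as discussed above, it gives a PPV of roughly 24%:

```python
# Rebuilding the 2 x 2 table at 10% prevalence (N = 1000), keeping sensitivity
# and specificity fixed at the values from the palliative care study.
N, prevalence = 1000, 0.10
sensitivity, specificity = 39 / 43, 84 / 124

diseased = N * prevalence                      # 100
non_diseased = N - diseased                    # 900
A = sensitivity * diseased                     # true positives  ~ 91
C = diseased - A                               # false negatives ~ 9
D = specificity * non_diseased                 # true negatives  ~ 610
B = non_diseased - D                           # false positives ~ 290

ppv = A / (A + B)                              # ~0.24
npv = D / (C + D)                              # ~0.985
print(round(ppv, 2), round(npv, 3))
```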

82
Q

A recent study conducted in a palliative care unit assessed the use of a
two-item questionnaire to screen for the presence of depression. Given below
is the table which compares the result of the screen to the gold standard
(DSM-IV) diagnosis. In relation to this table, answer questions 70–82

Depressed (DSM-IV): screen +ve 39, screen -ve 4 (total 43)
Not depressed: screen +ve 40, screen -ve 84 (total 124)

Taking into consideration the above screening test, we randomly pick
1000 people from the general population. Considering the prevalence of
major depressive disorder (DSM-IV) in the general population to be 10%,
calculate the new negative predictive value of the two-item screening test in
this population.
A. 49%
B. 91%
C. 67%
D. 30%
E. 98%

A

E. See the table constructed in Answer 81. Using the formula for negative predictive value,
NPV = D/(C + D) = 610/(9 + 610) ≈ 98%. Note that the same answer can be derived using
pretest odds and likelihood ratios; see Question 6.

83
Q

The table below shows the adverse events reported during an RCT on
sertraline for the prevention of relapse in detoxified alcohol-dependent
patients with a comorbid depressive disorder. Answer Questions 83–86
based on the data presented in the table

Dyspepsia: sertraline 6, placebo 2
No dyspepsia: sertraline 38, placebo 37
(Totals: sertraline 44, placebo 39)

What proportion of patients develops dyspepsia after exposure to
sertraline?
A. 13.6%
B. 5%
C. 8.6%
D. 63.2%
E. 90.2%
A

A. This question looks at the chance of developing dyspepsia with sertraline, otherwise called
the 'experimental event rate' (EER). This is calculated as A/(A + B); that is, 6/44 = 0.136 or 13.6%.
Similarly, the chance of developing dyspepsia with placebo, or the 'control event rate' (CER), is
C/(C + D), or 2/39 = 0.05 or 5%.

84
Q

The table below shows the adverse events reported during an RCT on
sertraline for the prevention of relapse in detoxified alcohol-dependent
patients with a co-morbid depressive disorder. Answer Questions 83–86
based on the data presented in the table

Dyspepsia: sertraline 6, placebo 2
No dyspepsia: sertraline 38, placebo 37
(Totals: sertraline 44, placebo 39)

What proportion of dyspepsia would be eliminated if sertraline were not
administered?
A. 13.6%
B. 5%
C. 8.6%
D. 63.2%
E. 90.2%
A

C. This is otherwise called the 'attributable risk', the 'risk difference', or the 'absolute risk
reduction' (ARR). It is calculated as the difference in the absolute risks of developing dyspepsia
between sertraline and placebo, that is 13.6 – 5 = 8.6%.

85
Q

The table below shows the adverse events reported during an RCT on
sertraline for the prevention of relapse in detoxified alcohol-dependent
patients with a comorbid depressive disorder. Answer Questions 83–86
based on the data presented in the table

Dyspepsia: sertraline 6, placebo 2
No dyspepsia: sertraline 38, placebo 37
(Totals: sertraline 44, placebo 39)

How many times is a person on sertraline more likely to develop dyspepsia
than a person on placebo?
A. 1.7
B. 2.7
C. 3.7
D. 4.7
E. 5.7
A

B. This question asks for the 'relative risk' or 'risk ratio' of dyspepsia with sertraline. It is
an estimate of how much greater the risk of developing dyspepsia is with sertraline than with
placebo. It is the ratio of the absolute risks, or the ratio of event rates, i.e. EER/CER = 13.6/5 = 2.7.
This means that the risk of dyspepsia with sertraline is 2.7 times that with placebo. If there were
no difference between sertraline and placebo, the relative risk would be 1. Expressed otherwise,
relative risk values greater than 1.0 represent increases in risk and relative risk values less than
1.0 represent decreases in risk. If 95% confidence intervals are given and the range includes the
value 1, the change in risk can be considered statistically non-significant. The relative risk is
used as a primary summary measure in RCTs and cohort studies.

Remember RR is from exposure->outcome

86
Q

The table below shows the adverse events reported during an RCT on
sertraline for the prevention of relapse in detoxified alcohol-dependent
patients with a comorbid depressive disorder. Answer Questions 83–86
based on the data presented in the table

Dyspepsia: sertraline 6, placebo 2
No dyspepsia: sertraline 38, placebo 37
(Totals: sertraline 44, placebo 39)

How many times are the odds of being dyspeptic on sertraline higher than
the odds of being dyspeptic on placebo?
A. 1.9
B. 2.9
C. 3.9
D. 4.9
E. 5.9
A

B. This question looks at the odds ratio. It is an estimate of how many times more likely it
was that a person who experienced the problem (dyspepsia) had been exposed to the supposed
cause (risk factor) than a control subject (someone without the problem). Let us consider the
data in the table in a different way: the number of people who developed dyspepsia is 8 and the
number who did not develop dyspepsia is 75. The 'odds' of an event happening is the ratio of
the probability of its occurrence to the probability of its non-occurrence. So in patients with
dyspepsia, the probability of being on sertraline is A/(A + C) = 6/8 = 0.75 and the probability of
being on placebo is C/(A + C) = 2/8 = 0.25. Therefore the odds of a person with dyspepsia being
on sertraline are 0.75/0.25 = 3, or simply A/C. Similarly, we can calculate the odds of a person
without dyspepsia being on sertraline: 38/37 (B/D) = 1.02. The ratio of these two odds is the
odds ratio: (A/C)/(B/D), or AD/BC, that is, 3/1.02, or (6 × 37)/(2 × 38) = 222/76 = 2.92.
The odds ratio is interpreted in a manner more or less similar to the relative risk, and confidence
intervals are provided and interpreted in the same way. Odds ratios are usually used as primary
summary measures in case-control studies and in meta-analyses.

Remember OR is from outcome->exposure
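
A minimal Python sketch of the calculations for Questions 83–86, using the 2 × 2 counts reconstructed in the question stems above:

```python
# Dyspepsia in the sertraline RCT.
A, B = 6, 38    # sertraline: dyspepsia / no dyspepsia (n = 44)
C, D = 2, 37    # placebo:    dyspepsia / no dyspepsia (n = 39)

eer = A / (A + B)                # experimental event rate ~ 0.136
cer = C / (C + D)                # control event rate      ~ 0.051
arr = eer - cer                  # attributable risk / risk difference ~ 0.085
rr = eer / cer                   # relative risk ~ 2.66 (2.7 when the rounded rates are used)
odds_ratio = (A * D) / (B * C)   # ~ 2.92

print(round(eer, 3), round(cer, 3), round(arr, 3), round(rr, 2), round(odds_ratio, 2))
```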

87
Q

The findings of a hypothetical cost-effectiveness analysis of a new model of
psychotherapy in depression are shown in the table below

Antidepressant (AD): cost £5000, effect 45 depression-free weeks
Psychotherapy (new): cost £10,000, effect 50 depression-free weeks

Calculate the average cost-effectiveness ratio (ACER) for the new
treatment?
A. £200/week
B. £100/week
C. £50/week
D. £111/week
E. £20/week
A

A. As cost-effectiveness analysis has been applied to healthcare, researchers have used
predominantly two summary measures: the average cost-effectiveness ratio (ACER) and the
incremental cost-effectiveness ratio (ICER). The ACER captures the average cost per unit of
effect, i.e. cost of treatment/effect of treatment. In this case, the cost of the new psychotherapy
is £10,000 and the effect is 50 depression-free weeks, so the ACER for the new treatment
(psychotherapy) is C/E = 10,000/50 = £200 per depression-free week. The ACER for
antidepressants is 5000/45 = £111 per depression-free week.

88
Q

The findings of a hypothetical cost-effectiveness analysis of a new model of
psychotherapy in depression are shown in the table below

Antidepressant (AD): cost £5000, effect 45 depression-free weeks
Psychotherapy (new): cost £10,000, effect 50 depression-free weeks

Calculate the incremental cost-effectiveness ratio (ICER) for the new
treatment
A. £1000 per additional depression-free week
B. £200 per additional depression-free week
C. £111 per additional depression-free week
D. £89 per additional depression-free week
E. £600 per additional depression-free week

A

A. In contrast to the ACER, the ICER reports the ratio of the change in cost to the change
in effect, i.e. the extra cost per extra unit of effect, ΔC/ΔE. From the question, ΔC = 10,000 –
5000 = 5000 and ΔE = 50 – 45 = 5 weeks, so ΔC/ΔE = 5000/5 = £1000. In plain language, this
means that, compared with antidepressants, the new treatment would cost an additional £1000
per added depression-free week. In many economic evaluations the ICER indicates that a new
treatment is relatively more costly (ΔC > 0) and relatively more effective (ΔE > 0) than usual
care, as in this question. It is then for the decision makers to decide whether this additional
money is worth spending.

89
Q

The findings of a hypothetical cost-effectiveness analysis of a new model of
psychotherapy in depression are shown in the table below

Antidepressant (AD): cost £5000, effect 45 depression-free weeks
Psychotherapy (new): cost £10,000, effect 50 depression-free weeks

What is the incremental net benefit (INB) if the health commissioners are
willing to pay around £1500 per additional depression-free week?
A. £500
B. £1000
C. £2500
D. 5 weeks
E. 1 week

A

C. An INB calculation determines whether the net benefit of a new treatment exceeds that
of usual care; in our case, whether the net benefit of psychotherapy surpasses that of using
antidepressants. In general, the INB is calculated by valuing ΔE in pounds and then subtracting
the associated ΔC. This is where society's willingness to pay for an additional depression-free
week comes into play. The INB is calculated using the formula (ΔE × λ) – ΔC, where λ is
society's willingness to pay for a one-unit gain in effect. In our question, ΔE = 5 weeks, the
commissioners are willing to pay around £1500 per depression-free week (λ), and ΔC is £5000.
So INB = (5 × 1500) – 5000 = 7500 – 5000 = £2500. The INB equation computes the net value,
in pounds, of the patient outcome gained. When the INB is positive, the value of the new
treatment's extra benefits (ΔE × λ) outweighs its extra costs (ΔC); in short, society values the
extra effect more than the extra cost (ΔE × λ > ΔC). Conversely, when the INB is less than 0,
society (or your health service management) does not consider the extra benefit worth the
extra cost.
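
The three summary measures from Questions 87–89 in a short Python sketch (costs in pounds, effects in depression-free weeks):

```python
# Hypothetical cost-effectiveness data from the question.
cost_ad, effect_ad = 5_000, 45          # antidepressant
cost_pt, effect_pt = 10_000, 50         # new psychotherapy

acer_pt = cost_pt / effect_pt           # average cost-effectiveness ratio = 200 GBP/week
acer_ad = cost_ad / effect_ad           # ~111 GBP/week

delta_c = cost_pt - cost_ad             # 5000
delta_e = effect_pt - effect_ad         # 5
icer = delta_c / delta_e                # 1000 GBP per additional depression-free week

wtp = 1_500                             # willingness to pay (lambda) per extra week
inb = delta_e * wtp - delta_c           # incremental net benefit = 2500 GBP

print(acer_pt, round(acer_ad), icer, inb)
```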

90
Q

The findings of a hypothetical cost-effectiveness analysis of a new model of
psychotherapy in depression are shown in the table below

Antidepressant (AD): cost £5000, effect 45 depression-free weeks
Psychotherapy (new): cost £10,000, effect 50 depression-free weeks

After critically appraising the above cost-effectiveness analysis paper,
managers of an NHS foundation trust decide to choose psychotherapy over
antidepressants as the first-line management for depression. Which of the
following statements best defines the opportunity costs?
A. The original cost incurred while providing psychotherapy as the first-choice treatment
B. The cost of providing psychotherapy instead of prescribing antidepressant drugs for
depression
C. The apparent cost of not providing antidepressants as the first choice of treatment.
D. The cost of using antidepressants in the absence of psychotherapy for depression.
E. The cost of conducting this trial in order to make treatment recommendations

A

C. Resources are scarce relative to needs. The use of resources in one way prevents
their use in other ways. For example, if a city council decides to build a hospital on a large piece
of vacant land in the middle of the city, the city forgoes the opportunity to benefit from the
next best alternative, such as selling the land to reduce the current debt or building a shopping
mall that would generate additional income for the council. Opportunity cost is assessed not
just in monetary or material terms, but in terms of anything that is of value. The opportunity
cost of investing in a healthcare intervention is best measured by the health benefits that could
have been achieved had the money been spent on the next best alternative intervention. In
this example the cost of not providing the 'next best alternative', antidepressant therapy, is the
opportunity cost of providing psychotherapy as the first-choice treatment.

91
Q

The findings of a hypothetical cost-effectiveness analysis of a new model of
psychotherapy in depression are shown in the table below

Antidepressant (AD): cost £5000, effect 45 depression-free weeks
Psychotherapy (new): cost £10,000, effect 50 depression-free weeks

The given cost-effectiveness acceptability curve (CEAC) is drawn using the
data from the hypothetical study on treatment of depression. What is the
probability of cost-effectiveness if society is willing to pay £150 for every
depression-free day?

A. >90%
B. 75%
C. 50%
D. 25%
E. <10%
A

A. How does a decision maker decide on the willingness to pay (λ)?
The net benefit approach forces decision makers to consider directly the issue of valuing
additional patient outcomes. The INB can be computed for various values of λ and analysed
using multiple regression techniques. How sensitive the results are to the assumed value of λ
can be gauged using a cost-effectiveness acceptability curve (CEAC). The CEAC shows the
probability that a new treatment is cost-effective for different values of λ. So in the given
question, if λ is £150 the probability of the treatment being cost-effective is >90%, but if λ is
£10 the probability is less than 25%. At the same time, the probability of cost-effectiveness is
also >90% if λ is £100. So it would be sensible for the decision maker to pay £100, rather than
£150, for every depression-free day.

92
Q

A new rating scale, scored from 1 to 12 (with 12 being the highest degree of depression), was
developed to screen for depression in a population of patients with dementia. The scale was
tested against the gold standard of DSM-IV in a small study. The neurologists using the test
wanted a score that would distinguish a depressed person from a non-depressed person based
on this instrument. A statistician involved in the development of this instrument mailed the
following graph to the neurologists. Answer Questions 92–96 based on the graph below

What is the above graph called?
A. Scatter plot
B. Funnel plot
C. Receiver operating characteristic (ROC) curve
D. Galbraith plot
E. Forest plot
A

C. This is a receiver operating characteristic (ROC) curve. Scores on scales are usually considered to
be continuous variables. Although dichotomizing continuous data leads to loss of information,
in clinical practice, it makes sense to deal with dichotomous variables. For instance, with the
new scale in the question, it would make sense if we can differentiate a depressed patient from
a non-depressed patient, rather than just saying patient A had a greater score than patient B.
In this situation, we should know where the ideal cut-off for the scale is. However, because the
distributions of the scores in these two groups most often overlap, any cut-off point that is
chosen will result in two types of errors: false negatives (that is, depressed cases judged to be
normal) and false positives (that is, normal cases judged to be depressed). Changing the cut-off
point will change the numbers of wrong judgements but will not eliminate the problem. The
cut-off point also depends on whether we want the test to be more sensitive (as in a screening
test) or more specific (as in a diagnostic test). The ROC curve helps us to determine the ability
of a test to discriminate between groups and to choose the optimal cut-off point.

93
Q
What does 1 – specificity represent?
A. False-positive rate
B. False-negative rate
C. True-positive rate
D. True-negative rate
E. None of the above
A

A. The test in question is a 12-item scale that has a potential score ranging from 1 to 12.
The sensitivity and specificity of each cut-off score (in this case there are 11 possible cut-off
scores, as shown in the figure) are calculated with reference to the gold standard used to diagnose
depression (in this case, DSM-IV). These pairs of values are plotted, with (1 – specificity) on the
x-axis and sensitivity on the y-axis, yielding the curve in the figure in question. Note that the
true positive rate is synonymous with sensitivity, the true negative rate is the same as
specificity, and the false positive rate means the same as (1 – specificity); they are simply
alternative terms for the same parameters.

94
Q

What does the dotted line represent? (ROC curve)
A. It is the curve of the test that best discriminates depressed from non-depressed people
B. It is the curve of a test that partially discriminates depressed from non-depressed
people
C. It is the curve of a test that does not discriminate depressed from non-depressed
people
D. It is the curve representing the application of the current screening instrument to the
whole population
E. It is the curve of a test with maximum sensitivity but minimum specificity

A

C. The dotted line represents a test that is useless in discriminating a depressed from a
non-depressed person. A perfect test would run straight up the y-axis until the top and then run
horizontally to the right. The more the ROC deviates from the dotted line and tends towards the
upper left-hand corner, the better the sensitivity and specificity of the test.

95
Q
Which cut-off point provides the best acceptable combination of sensitivity
and specificity?
A. 1/2
B. 8/9
C. 3/4
D. 5/6
E. 6/7
A

E. From the graph, we can see that the more the ROC curve deviates from the dotted line
and tends toward the upper left-hand corner, the better the sensitivity and specificity of the test.
Hence it is generally considered that the cut-off point closest to this corner is the one that
minimizes the overall number of errors ('the best trade-off'); in this case, it is 6/7. Since the
scale in our question is a screening test for depression, we would want it to be more sensitive
rather than specific. As we can see from the figure, a cut-off score of 11/12 would give excellent
specificity but very poor sensitivity, thus increasing the false negative rate.

96
Q

If the area under the curve (AUC) for the new test was found to be 0.5,
what does it mean?
A. The test can discriminate a depressed from a non-depressed person with high accuracy
B. The test can discriminate a depressed from a non-depressed person with moderate
accuracy
C. The test cannot discriminate a depressed from a non-depressed person
D. The test is half as good as the gold standard test
E. The test can identify 50% of depressed patients correctly

A

C. The primary statistical measure obtained from the ROC is the AUC. The AUC of the test
can be compared with the AUC of a curve corresponding to the null hypothesis, i.e. the curve
that would be obtained if the test had no usefulness in discriminating those with the diagnosis
from those without. This hypothetical curve has an AUC of 0.50, which corresponds to the area
of the graph that falls below the dotted line. The difference between the two AUCs is the area
of the graph between the dotted line and the curve. The AUC can also be interpreted in another
very useful way: it is the probability that the test will show a higher value for a randomly chosen
individual with depression than for a randomly chosen individual without depression. That means
that if the AUC for a particular test were 0.9 and we took two individuals at random, one with
and one without depression, the probability that the first would have a higher score than the
second is nearly 90%. Fortunately, the AUC, the sensitivities and specificities, and the whole ROC
are calculated by statistical software, sparing us the burden.
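
A minimal sketch of how ROC coordinates and the AUC can be computed; the scale scores below are invented purely for illustration and are not taken from the study in the question:

```python
# Hypothetical scale scores (1-12) in the two diagnostic groups.
depressed     = [9, 7, 10, 8, 6, 11, 7, 9]
not_depressed = [3, 5, 2, 6, 4, 7, 3, 5]

def roc_point(cutoff):
    """Sensitivity and false positive rate when 'score > cutoff' is called positive."""
    tp = sum(s > cutoff for s in depressed)
    fp = sum(s > cutoff for s in not_depressed)
    return tp / len(depressed), fp / len(not_depressed)

for cutoff in range(1, 12):                  # the 11 possible cut-offs on a 1-12 scale
    sens, fpr = roc_point(cutoff)
    print(f"cut-off {cutoff}/{cutoff + 1}: sensitivity {sens:.2f}, 1 - specificity {fpr:.2f}")

# AUC = probability that a random depressed patient scores higher than a random
# non-depressed patient (ties count as half).
pairs = [(d, n) for d in depressed for n in not_depressed]
auc = sum((d > n) + 0.5 * (d == n) for d, n in pairs) / len(pairs)
print(f"AUC = {auc:.2f}")
```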

97
Q
What is the name of the graph shown above? (used in systematic reviews)
A. Funnel plot
B. Galbraith plot
C. L’Abbé plot
D. Scatter plot
E. Forest plot
A

E. Meta-analyses are usually displayed in graphical form using forest plots, which present
the findings of all the studies plus (usually) the combined result. This allows the reader to
visualize how much uncertainty there is around the results. The graph in question is a forest
plot, sometimes called a 'blobbogram'.

98
Q
How many studies in the meta-analysis show a statistically significant
advantage for the new antidepressant?
A. 1
B. 2
C. 4
D. 6
E. 7
A
C. As shown in the diagram above, the horizontal lines along with the 'blobs' show the 95%
confidence intervals of the effect size for each study. If a confidence interval crosses the line of
no effect (at 0 in this case), the effect is not statistically significant. Of the seven studies, the
confidence intervals of three trials (1, 2 and 5) cross the line of no effect, and four (trials 3, 4, 6
and 7) do not. When the summary measure is dichotomous it is usually an odds ratio, and the
line of no effect then corresponds to 1.

99
Q
Which of the trials has the greatest weight on the overall analysis?
A. Trial 1
B. Trial 3
C. Trial 4
D. Trial 6
E. Trial 7
A

D. The size of the blobs (lozenges) in the blobbogram usually represents the size of the
study or, more exactly, the proportion of the weight that the study contributes to the combined
effect. In this case, the largest blob is that of trial 6.

100
Q

In which of the following situations is sensitivity analysis especially
recommended while conducting a meta-analysis?
A. Presence of a high degree of homogeneity
B. Any meta-analysis of continuous data
C. Any meta-analysis of economic data
D. Presence of significant publication bias
E. Pooled outcome showing a large effect of intervention

A

D. A systematic exploration of the uncertainty in the data is known as sensitivity analysis.
It is carried out to measure the effects of varying study variables such as individual sample size,
number of positive trials, number of negative trials, etc., on expected summary outcome measure
of a study (often a meta-analysis or economic study). Sensitivity analysis can be undertaken to
answer the question, ‘Is the conclusion generated by a meta-analysis affected by the uncertainties
in the methods used?’ One such uncertainty is publication bias. So, we can use sensitivity analyses
to find out the impact of having missed unpublished studies.

101
Q

The number of independent values or quantities which can be assigned to a statistical distribution

A

degrees of freedom

102
Q

An estimate of the between-study variance

A

Tau² (tau-squared); tau itself is the estimated between-study standard deviation

103
Q

broad analysis of continuous, ratio and interval data

A

generally normally distributed, and therefore parametric tests based on the mean and SD can be used

104
Q

analysis of ordinal/ranked data (categories with an inherent order but no quantifiable spacing)

A

non-parametric

105
Q

binary/nominal analysis

A

compared in terms of modal values and frequency counts; however, such data can easily be transformed into a single comparative measure (e.g. an odds ratio)

106
Q

ratio

A

the relationship between a numerator and a denominator, i.e. instances of an observation relative to a reference group; can take any value from 0 to infinity

107
Q

proportion

A

a type of ratio in which the numerator is included in the denominator; it can therefore be expressed as a percentage

108
Q

rate

A

a ratio that is, or should be, quoted with reference to a time frame

109
Q

point prevalence vs period prevalence, rate vs ratio

A

point prevalence is a ratio (a proportion at a single point in time); period prevalence refers to a time frame and so behaves as a rate

110
Q

variance

A

the sum of the squared differences between each value and the mean, divided by the degrees of freedom (n – 1)

111
Q

when data are skewed or bimodal, how are they described?

A

by median and interquartile range

112
Q

what does the standard error reflect?

A

it reflects how much the sample mean (and SD) would be likely to vary from the true population values, i.e. the precision of the sample estimate

113
Q

CI for the population mean

A

CI = mean ± 1.96 × SE, where SE = SD/√n

114
Q

probability

A

likelihood of an event occurring relative to the total number of possibilities

115
Q

Type 1 error

A

when null hypothesis is falsely rejected (false positive)

116
Q

Type 2 error

A

when null hypothesis is falsely accepted (false negative)

117
Q

Power

A

the probability of correctly rejecting the null hypothesis when a true difference exists; power = 1 – β (β is conventionally set at 0.2, giving 80% power)

118
Q

Effect sizes

A

difference between 2 group means, divided by the SD in the controls = Cohen's d

or by the average SD of the 2 patient groups = the standardised difference

numerically equivalent to z scores.
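
A minimal sketch of an effect-size calculation with invented values, showing both the pooled-SD form of the standardised difference and the control-SD form (strictly known as Glass's delta):

```python
import math

# Hypothetical group summaries (values invented for illustration).
mean_treated, mean_control = 14.0, 18.0     # e.g. rating scale scores after treatment
sd_treated, sd_control = 5.0, 6.0
n_treated, n_control = 50, 50

# Pooled SD across the two groups.
pooled_sd = math.sqrt(((n_treated - 1) * sd_treated**2 +
                       (n_control - 1) * sd_control**2) /
                      (n_treated + n_control - 2))

d_pooled = (mean_control - mean_treated) / pooled_sd        # ~0.72, a medium-to-large effect
d_control_sd = (mean_control - mean_treated) / sd_control   # Glass's delta, ~0.67
print(round(d_pooled, 2), round(d_control_sd, 2))
```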

119
Q

identifying the truth?

A

in research, philosophically, one cannot prove anything from empirical observation, but one can disprove falsities.
Identifying the truth is achieved by moving further away from error, rather than by discovering truth directly.

120
Q

Differences in means- the t-test

A

when data are approximately normally distributed and there are two groups, use a t test: the independent-samples (Student's) t test if the subjects are different, and the paired t test for observations on the same group at different time points

121
Q

F statistic

A

with normally distributed data in 3+ groups, use ANOVA (analysis of variance).
The F statistic compares the variability around the mean between groups with the variability around the mean within groups.
ANOVA only tells you whether there are differences between the groups; it does not tell you where the differences are (post hoc tests are needed for that).
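
A minimal one-way ANOVA sketch with invented data, assuming SciPy is available:

```python
from scipy.stats import f_oneway   # assumes SciPy is installed

# Hypothetical scores in three independent groups (values invented for illustration).
group_a = [12, 15, 14, 10, 13]
group_b = [22, 25, 24, 20, 23]
group_c = [18, 17, 19, 21, 16]

f_statistic, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_statistic:.2f}, p = {p_value:.4f}")
# A significant F only says the group means differ somewhere; post hoc tests
# (e.g. with a Bonferroni correction) are needed to locate the difference.
```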

122
Q

Bonferroni correction

A

divides the significance level by the number of comparisons made, so with 5 comparisons the threshold becomes 0.05/5 = 0.01 (i.e. p < 0.01 is required),
minimising the type 1 error when doing multiple significance testing
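
A minimal sketch of applying a Bonferroni-corrected threshold to a set of hypothetical p values:

```python
# Bonferroni correction: divide alpha by the number of comparisons.
alpha = 0.05
p_values = [0.001, 0.020, 0.030, 0.040, 0.300]   # hypothetical p values from 5 tests

corrected_alpha = alpha / len(p_values)           # 0.05 / 5 = 0.01
for i, p in enumerate(p_values, start=1):
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"test {i}: p = {p:.3f} -> {verdict} at corrected alpha = {corrected_alpha}")
```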

123
Q

Wilcoxon rank

A

For paired data
Non-normally distributed
Non-parametric

124
Q

Mann Whitney U

A

Non-parametric

For two independent groups

125
Q

benefits of parametric

A

more powerful

calculation of confidence intervals is easier, and parametric methods are more flexible when examining interrelationships between >2 variables

126
Q

when to use the chi square test

A

when comparing two proportions of dichotomous data

127
Q

downside of the flexibility of the chi square test

A

contingency tables (2 × 2 and larger) can be used to test the significance of associations between variables, i.e. to see if there is any difference; this is the basis of stratified analysis and much multivariate statistics. However, the test is very sensitive to sample size

128
Q

measures of association

A

Odds ratio
Relative risk
Correlation
Regression

129
Q

measure of agreement/concordance

A

Cohen's kappa: a measure of the reliability of research assessments that allows for chance agreement by comparing the actual agreement beyond chance with the potential agreement beyond chance; expressed as a fraction between 0 and 1
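
A minimal sketch of Cohen's kappa for two raters, using an invented 2 × 2 agreement table:

```python
# Hypothetical agreement counts for two raters making a yes/no judgement.
#                Rater B: yes   Rater B: no
a, b = 40, 10        # Rater A: yes
c, d = 5, 45         # Rater A: no
n = a + b + c + d

p_observed = (a + d) / n                               # actual agreement = 0.85
p_yes = ((a + b) / n) * ((a + c) / n)                  # chance agreement on 'yes'
p_no = ((c + d) / n) * ((b + d) / n)                   # chance agreement on 'no'
p_expected = p_yes + p_no                              # total chance agreement = 0.50

kappa = (p_observed - p_expected) / (1 - p_expected)   # agreement beyond chance
print(round(kappa, 2))                                 # -> 0.7
```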

130
Q

reliability between and within raters

A

stability between raters = inter-rater reliability
and within a rater over time = intra-rater reliability

Assessed with a correlation coefficient or, if 2+ raters rate at the same time, the intraclass correlation coefficient

131
Q

validity

A

measuring what it is actually supposed to be measuring

132
Q

difference between odds and probability

A

in probability- the denominator includes the numerator, whereas in odds it does not

P = O/(O + 1)
O = P/(1 – P)
(where P = probability and O = odds)

133
Q

odds ratio

A

the odds of exposure in cases divided by the odds of exposure in controls = (a/c)/(b/d) = ad/bc

Where

a = Number of exposed cases

b = Number of exposed non-cases

c = Number of unexposed cases

d = Number of unexposed non-cases

odds of person with the outcome, having the exposure
odds of cases having risk factor

134
Q

relative risk

A

in cohort studies
risk of outcome from exposure

OR approximates to RR when outcome is rare

RR = [a/(a + b)] / [c/(c + d)]

135
Q

attributable risk

A

difference between the disease outcome in exposed vs non exposed

136
Q

Correlation coefficient measures

A

in parametric= Pearson’s

in non-parametric= Spearman’s

137
Q

when to use linear (multiple) regression versus logistic regression

A

linear (multiple) regression if the outcome is continuous

logistic regression if the outcome is binary

138
Q

A systematic review differs from a literature review in that eligibility criteria are developed based on

A

Population and outcomes of interest, interventions and comparisons (the PICO framework).

139
Q

A systematic review of qualitative studies can be undertaken by a

A

Meta-synthesis

140
Q

Qualitative data reports would NOT include:
A. Analysis by synthesis.
B. Discourse analysis.
C. Interpretive phenomenological analysis (IPA).
D. Thematic analysis.

A

analysis by synthesis