Topic 8: Multiplicity Flashcards
What is multiplicity
Multiplicity is the term used to describe the increased risk of false positive (type 1) conclusions that arises when multiple statistical tests are carried out on a data set.
Why does multiplicity occur?
Doing multiple statistical tests for multiple hypotheses gives you multiple effect estimates and multiple p-values. Each test carries a small chance of a false positive, and when you look across the tests to reach a conclusion these chances accumulate, increasing the overall type 1 error rate.
What is the family-wise error rate
The probability of making at least one type 1 error (false positive) across all the tests contributing to your conclusion; it increases as more tests contribute.
What is the family-wise type 1 error
The chance of a false positive conclusion in the trial as a whole
What is required of the family-wise type 1 error rate in order for trial outcomes to influence practice
It needs to be controlled.
What is the p-value
The p-value arising from a statistical test is the probability of observing a difference at least as large as the one seen in the data if there were truly no difference (i.e. if the null hypothesis were true).
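For two independent tests, each with a 95% probability of no type 1 error, what is the overall chance of a type 1 error
0.95 * 0.95 = 0.9025 is the probability of no error on either test, so the overall chance of at least one error is 1 - 0.9025 = 0.0975, i.e. 9.75% (increased from 5%).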
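The same calculation extends to any number of tests: with k independent tests, each run at level alpha with every null hypothesis true, the chance of at least one false positive is 1 - (1 - alpha)^k. A minimal Python sketch, assuming independent tests; the function name is hypothetical.

```python
def family_wise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    # P(no false positive on one test) = 1 - alpha; independence lets us multiply.
    return 1 - (1 - alpha) ** k

for k in (1, 2, 5, 10):
    print(k, round(family_wise_error_rate(0.05, k), 4))
# prints: 1 0.05 / 2 0.0975 / 5 0.2262 / 10 0.4013
```

With ten independent tests at 5%, the chance of at least one false positive is already about 40%.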
For multiplicity, is it the number of p-values contributing to the conclusion or the number of p-values calculated
p-values contributing to your conclusion
Is multiplicity an issue when you have multiple outcomes
Not if there is still only one hypothesis test contributing to the conclusion.
What is data-dredging
Carrying out a large number of tests in order to try to find something important (statistically significant).
How can we avoid accusations of data dredging
Don’t do any unplanned statistical tests. Do no more and no fewer tests than those that have been planned in the trial protocol.
Which tests in a trial need to be reported
All of them, regardless of whether they have positive outcomes or not.
Do secondary analysis tests need to be reported too, even if they don’t feed into the main conclusion
Yes
What can only reporting positive outcomes lead to
Reporting bias
How does reporting bias relate to multiplicity
It may not be clear whether the reported results relate to all the tests conducted, so it is unclear how much of what has been reported has occurred by chance or simply through excessive testing.
When is multiplicity an issue in a trial with multiple outcomes
If only one or the other outcome is required to be significant, as opposed to both needing to be significant.
When is multiplicity NOT an issue in a trial with multiple outcomes
When both are required to be significant, or if they are ordered so that the second is only tested if the first is significant. This means that the conclusion is only based on one result: effectiveness on both outcomes. Multiplicity is not an issue since the overall chance of error is not inflated.
What is a hierarchical drug trial
When you’re comparing two doses of the same drug, but the lower dose would only be considered if the higher dose was found to be effective.
What are two ways to overcome multiplicity when there are multiple outcomes
Use a hierarchical strategy, where conclusions are made in a set, pre-specified order rather than by conducting statistical tests for all hypotheses. Alternatively, use a composite outcome.
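The simplest form of a hierarchical strategy is fixed-sequence testing: hypotheses are tested in a pre-specified order, each at the full alpha, and testing stops at the first non-significant result. A minimal Python sketch under that assumption; the function name is hypothetical and the p-values are made up for illustration.

```python
def fixed_sequence_test(ordered_p_values, alpha=0.05):
    """Hierarchical (fixed-sequence) testing: hypotheses are tested in a
    pre-specified order, each at the full alpha. Testing stops at the first
    non-significant result, and all later hypotheses are not rejected."""
    rejected = []
    for p in ordered_p_values:
        if p <= alpha:
            rejected.append(True)
        else:
            break  # first failure: stop testing
    # The failing hypothesis and everything after it are not rejected.
    rejected += [False] * (len(ordered_p_values) - len(rejected))
    return rejected

print(fixed_sequence_test([0.01, 0.03, 0.20, 0.001]))  # [True, True, False, False]
```

Note that the last hypothesis is not rejected despite its small p-value, because the hypothesis before it in the sequence failed. This is what keeps the overall type 1 error at alpha.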
What is a composite outcome
Multiple end-points combined
What is the issue with using composite outcomes to avoid multiplicity
They may be hard to interpret in terms of what the treatment differences are, what it relates to, and what the main driver of the treatment difference is.
Why is using multiple treatment arms beneficial
You can answer two questions with only 50% more patients by having two interventions and a shared control arm, which is efficient.
When is multiplicity an issue in trials with multiple treatment arms
When the conclusions simultaneously make reference to more than one treatment comparison - the chance of false positive claim is increased
In what kind of multiple treatment arm trial is multiplicity particularly an issue
When arms are added and removed during the trial, so you don’t know at the start which treatment arms will be used.
How can you get around multiplicity in a multiple arm trial
Use a hierarchical design
Give an example of a time where you might repeat analysis at different time points
A trial that has an interim analysis and then a final analysis.
Give 3 reasons you might stop a trial early
Overwhelming evidence of improvement in efficacy. A lack of efficacy. Futility - a small chance the trial will show efficacy if you continue.
What is an adaptive trial
A trial in which changes can be made as it progresses, using interim looks at the data to make changes and to check the original assumptions that were used to power the trial.
In what phases are adaptive trials more common
Earlier phase trials
Why is multiplicity an issue with repeating analysis at different time points/interim analyses.
The more looks at the data, the more tests, and the greater the chance of making a type 1 error.
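The inflation from repeated looks can be seen in a small simulation. A minimal Python sketch, assuming (for simplicity) that each patient contributes one treatment-minus-control difference with known SD 1, that the null hypothesis is true, and that the trial stops to claim efficacy at the first look with an unadjusted two-sided p < 0.05. The function name and defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_repeated_looks(n_looks=5, n_per_look=20, alpha=0.05, n_sims=20000, seed=1):
    """Estimate the overall type 1 error when the accumulating data are tested
    at every look, each at the unadjusted level alpha, and the trial stops
    (claiming efficacy) at the first significant result. Data are simulated
    under the null hypothesis: true mean difference 0, known SD 1."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    false_positives = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # The sum of n_per_look independent N(0, 1) differences is N(0, n_per_look).
            total += rng.gauss(0.0, math.sqrt(n_per_look))
            n += n_per_look
            z = total / math.sqrt(n)  # z-statistic for the cumulative mean
            if abs(z) > z_crit:
                false_positives += 1
                break
    return false_positives / n_sims

print(simulate_repeated_looks(n_looks=1))  # close to the nominal 0.05
print(simulate_repeated_looks(n_looks=5))  # noticeably above 0.05
```

With five equally spaced looks the estimate should come out at roughly 0.14, nearly three times the planned 5%.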
What does testing for interactions do to the power of the test
Decreases the power of each test - need more participants for equivalent power.
Why do subgroup treatment estimates have wider confidence intervals than overall treatment effect
Each subgroup contains only a fraction of the trial’s patients, and the study has been powered for the overall treatment effect. Detecting an interaction term normally needs even more patients than a standard trial would.
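The arithmetic behind this can be sketched for the simplest case. A minimal Python sketch, assuming a 1:1 two-arm trial with a continuous outcome of common SD sigma, split into two equal-sized subgroups; the function name is hypothetical.

```python
import math

def standard_errors(sigma, n_total):
    """SEs of the treatment-effect estimates in a 1:1 two-arm trial with a
    continuous outcome (common SD sigma), split into two equal subgroups."""
    # Overall effect: difference of two means, each based on n_total/2 patients.
    se_overall = math.sqrt(sigma**2 / (n_total / 2) + sigma**2 / (n_total / 2))
    # Subgroup effect: same calculation with only n_total/4 patients per arm.
    se_subgroup = math.sqrt(sigma**2 / (n_total / 4) + sigma**2 / (n_total / 4))
    # Interaction: difference between the two independent subgroup effects.
    se_interaction = math.sqrt(2 * se_subgroup**2)
    return se_overall, se_subgroup, se_interaction

overall, subgroup, interaction = standard_errors(sigma=1.0, n_total=400)
print(subgroup / overall)     # about 1.41: subgroup CIs are about 41% wider
print(interaction / overall)  # about 2: the interaction SE is twice the overall SE
```

Because the interaction's standard error is twice that of the overall effect, detecting an interaction of the same size as the overall effect with the same power needs about four times as many patients under these assumptions.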
What did subgroup analyses use before treatment interactions were used for analysis
A separate analysis in each subgroup, comparing the p-values from the same test carried out on the different populations.
How do you avoid claims of data dredging when doing subgroup analyses
Pre-specify the groups used in subgroup analyses and have them be based on clinical rationale - avoids suspicion that you have done lots of tests and selected the strongest one.
What does it mean that subgroup analyses are considered hypothesis generating
May be used to direct subsequent research
How is multiplicity corrected
Splitting the type 1 error rate between analyses so the overall type 1 error doesn’t exceed the planned level.
What are the two ways of splitting type 1 error to control for multiplicity
Equally so that each test has an equal chance of significance, or so that less of the error is used on some analyses than others.
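Both ways of splitting can be expressed as one weighted split of the overall alpha. A minimal Python sketch, assuming a simple weighted Bonferroni allocation (one possible method, not the only one); the function name and weights are hypothetical.

```python
def split_alpha(total_alpha, weights):
    """Split the planned overall type 1 error between analyses in proportion
    to the given weights (a weighted Bonferroni split). By the union bound,
    the family-wise error rate cannot exceed total_alpha."""
    total_weight = sum(weights)
    return [total_alpha * w / total_weight for w in weights]

# Equal split, e.g. three endpoints: about 0.0167 each.
print(split_alpha(0.05, [1, 1, 1]))
# Unequal split, e.g. interim vs final analysis: about 0.01 and 0.04.
print(split_alpha(0.05, [1, 4]))
```

For interim and final analyses this simple split ignores the correlation between the analyses, so it is conservative; formal group-sequential methods use that correlation to spend alpha more efficiently.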
What kind of trials usually split the alpha equally
Multiple endpoints, multiple treatments and subgroup analyses
Which kind of trials usually split alpha unequally
Multiple time points. It is usually useful to use less error on interim analysis than final analysis - also means that only extremely certain results at interim analysis will stop a trial for efficacy.
True/False: Correcting multiplicity has no effect on the power
False
Give 3 corrections to control for type 1 error, when the hypotheses are of equal importance
Bonferroni, Holm, Hochberg
What does the bonferroni correction do
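Splits the type 1 error equally over the n tests: each p-value is compared with alpha/n. This controls the family-wise error whether or not the tests are independent.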
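A minimal Python sketch of the Bonferroni correction; the function name is hypothetical and the p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: reject a null hypothesis only if its p-value is
    at most alpha / n, where n is the number of tests."""
    n = len(p_values)
    return [p <= alpha / n for p in p_values]

print(bonferroni([0.010, 0.015, 0.030, 0.200]))  # [True, False, False, False]
```

With four tests each threshold is 0.05/4 = 0.0125, so only the first hypothesis is rejected.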
What is the disadvantage of the Bonferroni correction
It is overly conservative, especially when there are many tests or the tests are correlated, which reduces power.
What does the Holm correction do
A sequentially rejective multiple test procedure with a stepwise (step-down) nature, based on the Bonferroni correction. Order the p-values from smallest to largest. Compare the most significant (smallest) p-value with alpha/n, as used in Bonferroni; if it is smaller, reject that null hypothesis and compare the second most significant with alpha/(n-1), the third with alpha/(n-2), and so on. As soon as we fail to reject a null hypothesis, stop and do not reject any of the remaining null hypotheses.
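The Holm steps above can be sketched in Python. A minimal sketch, assuming p-values are passed in any order and the result is reported in that same order; the function name is hypothetical and the p-values are illustrative.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure: compare the smallest p-value with alpha/n,
    the next with alpha/(n-1), and so on, stopping at the first failure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    rejected = [False] * n
    for rank, i in enumerate(order):            # rank = 0, 1, 2, ...
        if p_values[i] <= alpha / (n - rank):   # alpha/n, alpha/(n-1), ...
            rejected[i] = True
        else:
            break  # stop: this and all remaining null hypotheses are retained
    return rejected

print(holm([0.010, 0.015, 0.030, 0.200]))  # [True, True, False, False]
```

Here 0.015 is compared with 0.05/3 (about 0.0167) and rejected, whereas Bonferroni's fixed threshold of 0.0125 would not reject it. This illustrates why Holm is at least as powerful as Bonferroni.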
True/False: The Holm correction is uniformly more powerful than the bonferroni correction
True
Which of the Holm and Hochberg corrections is a step-up procedure and which is step-down
Holm is step-down, Hochberg is step-up.
How does the Hochberg Correction work
Order the p-values from smallest to largest. Take the least significant (largest) p-value and compare it with alpha; then compare the 2nd largest with alpha/2 and, in general, the i-th smallest p-value with alpha/(n-i+1). As soon as a p-value is less than its specific alpha value, the comparison stops: reject the null hypothesis for that p-value and all other null hypotheses with p-values smaller than it.
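The Hochberg step-up procedure can be sketched the same way. A minimal Python sketch; the function name and p-values are hypothetical. Unlike Holm, Hochberg's procedure relies on an assumption about how the tests depend on each other (it is valid for independent or positively dependent tests).

```python
def hochberg(p_values, alpha=0.05):
    """Hochberg step-up procedure: start from the largest p-value, compared
    with alpha/1, then the next largest with alpha/2, and so on."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    rejected = [False] * n
    # k is the 0-based rank, so the threshold alpha/(n - k) equals alpha/(n - i + 1)
    # for the 1-based rank i = k + 1.
    for k in range(n - 1, -1, -1):  # largest p-value first
        if p_values[order[k]] <= alpha / (n - k):
            for j in order[: k + 1]:  # this hypothesis and all with smaller p
                rejected[j] = True
            break
    return rejected

print(hochberg([0.010, 0.015, 0.030, 0.200]))  # [True, True, False, False]
print(hochberg([0.04, 0.03]))                  # [True, True]
```

In the second example Holm would reject neither hypothesis (0.03 is above 0.05/2), while Hochberg rejects both because the largest p-value, 0.04, is already below alpha.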