Final 1 Flashcards

1
Q

What is the goal of a Phase I Trial?

A
  • Determine which dose of a drug is safe and most likely to show benefit
  • Estimate the largest dose before unacceptable toxicity is experienced by patients (maximally tolerated dose, MTD)
  • Start with a low dose and escalate until a prespecified level of toxicity is reached
2
Q

Give a brief overview of the 3+3 (Step Up/Step Down) Design

A
  • Rule based
    • Treat 3 participants at dose level k
    – If no DLTs, escalate to dose level k+1
    – If 2+ DLTs, de-escalate to dose level k-1
    – If 1 DLT, treat 3 additional participants at dose level k
    — If 1 in 6 DLTs, escalate to dose level k+1
    — If 2+ in 6 DLTs, de-escalate to dose level k-1
    – MTD is the highest dose at which 0 or 1 DLT is observed (repeat as needed)
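The escalation rule above can be sketched in code. This is a minimal, simplified simulation (it declares the next-lower dose the MTD on de-escalation without re-expanding that cohort); the DLT probabilities passed in are hypothetical.

```python
import random

def run_3plus3(dlt_probs, seed=0):
    """Simulate one run of a simplified 3+3 design.

    dlt_probs: assumed true probability of a dose-limiting toxicity
    (DLT) at each dose level -- hypothetical values.
    Returns the index of the declared MTD, or -1 if even the lowest
    dose is too toxic.
    """
    rng = random.Random(seed)
    k = 0
    while True:
        # Treat a cohort of 3 at dose level k and count DLTs.
        dlts = sum(rng.random() < dlt_probs[k] for _ in range(3))
        if dlts == 0:
            if k == len(dlt_probs) - 1:
                return k          # highest dose is tolerated
            k += 1                # escalate
        elif dlts == 1:
            # Expand to 6: treat 3 more at the same level.
            dlts += sum(rng.random() < dlt_probs[k] for _ in range(3))
            if dlts == 1 and k < len(dlt_probs) - 1:
                k += 1            # 1/6 DLTs: escalate
            else:
                return k - 1 if dlts >= 2 else k  # 2+/6: MTD is k-1
        else:
            return k - 1          # 2+/3 DLTs: de-escalate and stop

mtd = run_3plus3([0.05, 0.15, 0.35, 0.60])
print("declared MTD level:", mtd)
```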
3
Q

What are some issues with the Step Up/Step Down (3+3) Design?

A

o Ignores dose history other than the previous 3 patients
o Imprecise and inaccurate MTD estimation
o Low probability of selecting the true MTD
o High variability in MTD estimates
o Can lead to dangerous outcomes

4
Q

What is the goal of a Phase II Study?

A

Identify agents with POTENTIAL efficacy (does not test efficacy).

  • Discard agents without promise
  • OR support continuing the experiment
5
Q

What is a pilot study? (And what are the two types)?

A
  • Pilot Study: A non-randomized clinical trial to determine if a treatment should be tested in a large RCT
  • Phase IIa: Pilot studies designed to demonstrate clinical efficacy or biological activity
  • Phase IIb: Studies to determine the optimal dose for biological activity with minimal side effects (for some conditions this is phase I)
6
Q

How do we determine the sample size of a Pilot/Phase II study?

A

Want a sample large enough to have a high probability of detecting any common complications of treatment
- p = 1 - (1 - r)^N
• p = probability of observing at least one complication
• r = complication rate
• N = sample size
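The formula above can be inverted to solve for the sample size. A short sketch with hypothetical numbers:

```python
import math

def prob_at_least_one(r, n):
    """P(observing at least one complication) when the true
    complication rate is r and n subjects are enrolled:
    p = 1 - (1 - r)^n."""
    return 1 - (1 - r) ** n

def n_needed(r, p_target):
    """Smallest n with P(at least one complication) >= p_target,
    from n >= log(1 - p_target) / log(1 - r)."""
    return math.ceil(math.log(1 - p_target) / math.log(1 - r))

# Hypothetical: to have a 95% chance of observing at least one
# complication that occurs in 10% of patients:
n = n_needed(0.10, 0.95)
print(n, prob_at_least_one(0.10, n))
```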

7
Q

Which type of error is a more serious concern in pilot studies?

A

Type II error is a more serious concern

  • Don’t want to reject treatments that offer large patient benefits on the basis of small pilot studies – don’t want to conclude that a new treatment is ineffective when it might be effective
  • Type I errors will be caught in Phase III (theoretically)
8
Q

What is the formulation of null hypotheses in Futility studies and how does it compare to conventional studies?

A

o The formulation of the null and alternative hypotheses is reversed, with a higher alpha (flip alpha and beta, so alpha ≈ 0.2 and beta ≈ 0.05)
o Often there is no comparison arm (single-arm study), or sometimes a historical control
o Smaller sample size
o One-sided hypothesis – the result is compared to a pre-specified fixed value

9
Q

What are the hypotheses in a continuous futility study and how do they compare to a conventional study?

A
  • Control mean (Cx) + clinically meaningful increase (delta) = target threshold. If the new treatment mean is less than this, don't move on to Phase III.
  • (Comparison table omitted)
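The threshold rule above can be sketched as a one-sided z-statistic against the fixed target; all numbers here are hypothetical.

```python
import math

def futility_z(xbar, sd, n, control_mean, delta):
    """One-sided z-statistic against the fixed target threshold
    control_mean + delta. A large negative z means the observed
    treatment mean falls well short of the threshold, favoring a
    futility declaration."""
    threshold = control_mean + delta
    return (xbar - threshold) / (sd / math.sqrt(n))

# Hypothetical: historical control mean 10, clinically meaningful
# increase delta = 2, observed treatment mean 10.5 (sd 4), n = 50.
z = futility_z(10.5, 4.0, 50, 10.0, 2.0)
print(round(z, 2))  # negative: observed mean falls short of threshold
```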
10
Q

What is the interpretation of errors in a futility study and how does it compare to conventional studies?

A
  • Choose Type I error/beta by flipping alpha and beta: one-sided alpha is the percent we can tolerate of rejecting an acceptable treatment (0.10 or 0.15)
    • Beta (10–15%) – accepting a greater chance of carrying an ineffective treatment forward (since it will be tested in a larger study)

(Comparison table omitted)

11
Q

What are the hypotheses in a binary futility study and how does it compare to a conventional study?

A
  • Proportion expected to fail in control (P_cx) – reduction in failures considered clinically meaningful (delta) = target threshold
    • If the new treatment's failure proportion is greater than P_cx - delta, stop

(Comparison table omitted)

12
Q

Describe the different types of control groups for futility studies.

A

o Historical controls – potential for bias from changes in background treatment over time
o Calibration control group – compare a small calibration group to the historical control to test for bias in the historical controls
o Concurrent controls – if good information on controls is available, may not need any controls, or may only need a few (for randomization/masking)

13
Q

Describe a Simple Selection Design

A
  • Phase II
  • Selecting the best among K treatments to take forward to Phase III, based on statistical ranking and selection theory
  • Sample size is estimated to ensure that if the best treatment is superior by at least D, it will be selected with high probability (e.g., 90%)
  • Somewhat arbitrary, and even the "best" might not carry through
14
Q

Describe the three types of Selection and Futility/Superiority Designs

A
  • Phase II
  • Selection + Superiority (A)
      • Multiple steps:
      • – Stage I: randomize to k treatments + control; calculate the test; if the maximum difference is at least as large as a cutoff, proceed; if not, stop
      • – Stage II: randomize more patients to the treatment chosen in Stage I and to control; calculate a weighted test statistic combining Stages I and II; if at least as large as a cutoff, reject the null; if not, stop
      • If the null is rejected, proceed to Phase III
      • Poor specificity (often terminates early) but rarely chooses a suboptimal treatment
      • Lower sample sizes than some alternatives
  • Selection + Superiority (B)
      • Similar to A but no control in Stage I
  • Selection + Futility
      • Includes a control and a concurrently controlled futility study; requires simulation
15
Q

Why do we do Phase II/Pilot Studies?

A
  • General notes on selection designs:
      • An efficient way to identify the treatment with the most potential for effectiveness
      • Designed to select the one best treatment of many in the pilot phase
      • Too limiting if the goal is to determine which treatments (plural) to move forward with
  • Why do we do this?
      • Good for dose-finding and determining whether a drug is worth studying further
      • Too many drugs and combinations to test all in Phase III – long and expensive
        • – A long-term outcome requires at least 5 years to the first interim analysis, so futility can't be assessed in the short term
      • Alternative: starting with patients and going from Phase II to Phase III – seamless Phase II/III
        • – Hard to recruit and retain because of the long commitment (especially for placebo)
        • – Doesn't allow for changes over time in standard of care, etc.
16
Q

Define Intraclass correlation and cluster randomization

A
  • People within a cluster are more alike than people across clusters
  • POSITIVE INTRACLASS CORRELATION REDUCES VARIATION AMONG MEMBERS OF THE SAME GROUP
  • Total variance = variance within groups + variance between groups
    - Variance within groups = var_y(1 - ICC)
    - Variance between groups = var_y(ICC)
    - If you don't take the ICC into account, within-group variance = overall variance
17
Q

Define the Variance Inflation Factor.

A
  • Design inflation factor: 1 + (n - 1)ICC
  • Design effect (DEFF) = design inflation factor. The variance of a group mean in a cluster randomized trial is greater than in an individually randomized trial by a factor of DEFF = 1 + (n - 1)ICC.
  • Type I error will increase if we ignore the DEFF, since the variance estimate would be biased downward
    - Example: in Z = (x̄ - mu_0)/SE, if the variance (and hence SE) is smaller than it should be, then Z is larger than it should be, the p-value is smaller than it should be, and the Type I error is increased
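The DEFF formula above is easy to sketch; the cluster sizes and ICC below are hypothetical.

```python
def deff(n_per_cluster, icc):
    """Design effect (variance inflation factor) for a cluster
    randomized trial: 1 + (n - 1) * ICC."""
    return 1 + (n_per_cluster - 1) * icc

def effective_n(total_n, n_per_cluster, icc):
    """Effective sample size after inflating the variance by DEFF."""
    return total_n / deff(n_per_cluster, icc)

# Hypothetical trial: 20 clusters of 30 patients, ICC = 0.05.
print(deff(30, 0.05))            # design effect ~= 2.45
print(effective_n(600, 30, 0.05))  # 600 patients act like ~245
```

Even a small ICC inflates the variance substantially when clusters are large, which is why ignoring it biases the Type I error upward.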
18
Q

What impact would this DEFF have on the computed alpha level of the test if we ignored the DEFF?

A

The variance estimate will be biased downward. Variance is in the denominator of the Z statistic, so Z will be larger than it should be, the p-value will be smaller than it should be, and the Type I error is therefore increased.

19
Q

Do we perform matched or unmatched analysis for cluster randomized trials and why?

A

Even though randomization units are matched, we perform an unmatched analysis to preserve the Type I error rate and improve power (matching could be an issue with a small number of pairs and low matching correlation)

20
Q

What is a non-inferiority/equivalence trial and when would we use one?

A
  • A trial with the primary objective of showing that the response to the investigational product is NOT CLINICALLY INFERIOR to the control
  • Non-inferiority studies are good for treatments that are significantly cheaper, easier, less risky, etc.
  • Non-inferiority can be for efficacy or for safety
  • The new intervention might have other benefits, such as fewer side effects, being simpler or less invasive, or lower cost – we are willing to accept these advantages, but at how great a cost to efficacy?
  • Controls: current gold standard – historical data, placebo, active controls
21
Q

What is an active control trial?

A
  • A trial in which the experimental intervention is compared with an accepted standard intervention (active control – proven better than placebo). The goal is to show efficacy of the new treatment by showing that it is:
    • Superior to the active control, OR
    • As good as the active control
    • Based on clinical and statistical judgment (e.g., cost/benefit)
  • The active control is a widely used treatment whose efficacy was established in a well-documented superiority trial, and the new treatment should show similar efficacy
22
Q

What is the margin of non-inferiority delta? What is a key concern of it?

A
  • Margin of non-inferiority (delta): specified in the protocol; the maximum difference in responses between the two interventions that is considered clinically acceptable (e.g., 1/2 or 1/3 of the established superiority) – retain a certain proportion of the active control's efficacy
    • Placebo – 40% mortality
    • Active control – 30% mortality
    • With a margin of 1/2 the effect, we will accept up to 5 percentage points worse, i.e., 35% mortality
  • Non-inferiority compares to an active control, but we need to consider the effectiveness of the active control relative to placebo in choosing the margin
    – Placebo 40% mortality; active control 30% mortality; margin 35% - 30% = 5%, or 37% - 30% = 7%? Definitely cannot go to a 10% margin, because then you're at placebo. Need to establish what is acceptable.
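The margin arithmetic above can be sketched as code. This is a minimal illustration of the "preserve a fraction of the active control's effect" rule and a CI-based non-inferiority conclusion; the function names and numbers are hypothetical.

```python
def ni_margin(p_placebo, p_active, fraction=0.5):
    """Margin as a fraction of the active control's effect over
    placebo. With 40% placebo and 30% active-control mortality and
    fraction 1/2, the margin is 5 percentage points."""
    return fraction * (p_placebo - p_active)

def non_inferior(upper_ci_diff, margin):
    """Declare non-inferiority when the upper confidence limit of
    the (new - control) event-rate difference stays below margin."""
    return upper_ci_diff < margin

m = ni_margin(0.40, 0.30)     # ~= 0.05
print(non_inferior(0.03, m))  # True: worst plausible excess is 3 points
print(non_inferior(0.06, m))  # False: could be worse than the margin
```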
23
Q

What are some concerns with the margin of inferiority delta?

A
  • The trial must be very valid: a clear protocol, carefully and rigorously conducted, with minimal dropout, non-compliance, and missing data. These problems bias results toward the null in a non-inferiority study (patients not taking treatment look more like "placebo"), whereas they bias away from the null in a superiority study. THIS IS ESPECIALLY IMPORTANT FOR NON-INFERIORITY because it makes us more likely to wrongly carry a treatment forward – IT INCREASES THE PROBABILITY OF A TYPE II ERROR.
  • Watch for anything that might bias toward the null (dropout, non-compliance, missing data), even more so than usual, since we don't want to miss a real difference in this setting
    - The new treatment might actually be doing worse – it might cause more deaths – but we might not be able to detect that because of the above (increases the probability of a Type II error)
24
Q

What is assay sensitivity and what assumptions are necessary?

A

The ability of a trial to distinguish an effective treatment from a less effective or ineffective treatment.
- C > placebo is based on previous studies
- A > B > C comes from the non-inferiority study
- Is the trial internally valid?
- Assumptions:
• The active control is truly effective – if not, a less effective intervention may be approved based on the active-control comparison
• Constancy of the control effect – the control effect does not change over time (if it does, a placebo arm is needed, NOT a non-inferiority test)

25
Q

What are some factors that can reduce assay sensitivity?

A
  • Poor compliance
  • Poor responsiveness of the study population
  • Poor diagnostic criteria
  • Biased assessment of the endpoint
  • Treatment effect
26
Q

When is a placebo needed for assay sensitivity?

A
  • The magnitude of the benefit of the active control is minimal relative to placebo
  • Not serious/life threatening (placebo wouldn’t hurt)
  • No standard of care established
  • Placebo effect varies among populations
  • Efficacy of active control may not be consistent
27
Q

Can we test non-inferiority and superiority in the same study, and if so, how?

A
  • Reasonable to plan to test non-inferiority first and then superiority – must be in that order because of power
  • CANNOT test superiority first and then non-inferiority
      • Non-inferiority will have a smaller sample size
        • – Looking for a smaller effect size for non-inferiority than for superiority (superiority is non-inferiority + effectiveness)
          • — The superiority margin comes from intervention vs. placebo (larger)
          • — The non-inferiority margin comes from intervention vs. active control (smaller)
        • – Non-inferiority is one-sided, while a standard superiority test is two-sided
      • Test order also has to do with Type I error: you can only move on to the next sequential test if the current test succeeds
        • – If we do non-inferiority before superiority, we can detect both: non-inferiority acts as a gatekeeper, then we test for superiority
        • – If we do superiority first and it fails, we cannot move on to non-inferiority
      • The power issue has to do with the margin/effect size being smaller (i.e., harder to detect) for superiority, so the sample size must be higher for superiority than for non-inferiority. THE STUDY WILL ALSO BE UNDERPOWERED FOR ONE OF THE TESTS IF THE DELTAS ARE NOT EQUAL.
28
Q

If a study has significant drop out, how do we handle analysis?

A

Analyze based on randomization

  • All randomized subjects are included in the analysis according to their assigned treatment group, regardless of adherence to the assigned treatment
  • All outcomes must be ascertained and included regardless of their purported relationship to treatment
  • AS RANDOMIZED, SO ANALYZED
29
Q

What is intent to treat analysis?

A

Analyze everyone as they’re randomized.

  • Loss to follow-up is a big issue – ITT does not minimize this bias, and it can limit the ability to use ITT. Dropout should be minimized in general for any study
  • Randomization is the most important thing
  • Deviation from the original randomization can impact the treatment comparison
  • TRIES TO KEEP THE RANDOMIZED GROUPS AND ADDRESS REALISTIC HYPOTHESES ABOUT THE CLINICAL UTILITY OF THE TREATMENT
  • A TEST OF THE RANDOMIZATION PROCESS AS MUCH AS A TEST OF THE INTERVENTION
  • The primary analysis for clinical trials
30
Q

What are the variations of ITT?

A

o Strict ITT: includes dropouts before first treatment, crossovers, and LTFU
o Modified ITT: includes crossovers and LTFU, but not dropouts before first treatment
o Exclusion-of-missing ITT: includes crossovers and dropouts before first treatment (although technically LTFU), but not LTFU

31
Q

Why do we use survival analysis methods and what is the basic overview of them?

A
  • Sometimes outcome/mortality at a single time point doesn't tell the whole picture (e.g., 5-year survival doesn't encompass the whole survival curve). The hazard can change over time, so the survival function is a function of the hazard, which can be a function of time.
  • Comparison at multiple time points is not recommended (increases Type I error). Compare median survival time (rare) or compare the whole curve (standard)
  • Median (50%) survival is when half the group has died
  • Survival probability = percent that has survived to that point
  • S(t) = exp(-lambda*t)
    - Larger lambda -> survival curve decreases faster
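The exponential survival function above is easy to check numerically; the hazard rate below is hypothetical.

```python
import math

def surv_exp(t, lam):
    """Exponential survival function S(t) = exp(-lambda * t)."""
    return math.exp(-lam * t)

def median_survival(lam):
    """Solve S(t) = 0.5 for t: median = ln(2) / lambda."""
    return math.log(2) / lam

# Hypothetical hazard of 0.1 events per year:
print(round(surv_exp(5, 0.1), 3))      # S(5) = exp(-0.5) ~= 0.607
print(round(median_survival(0.1), 2))  # ~= 6.93 years
```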
32
Q

Give a brief overview of the Kaplan Meier method.

A
  • The Kaplan-Meier method uses a conditional probability approach. It also handles LTFU (censoring)
  • A constant hazard rate is needed for the parametric (exponential) approach, and that assumption is usually unreasonable. For unadjusted analyses, the log-rank test is standard
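The conditional-probability product can be sketched with a hand-rolled estimator; the toy data below are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate via the conditional-probability
    product. times: follow-up times; events: 1 = death observed,
    0 = censored (lost to follow-up). Returns (time, S(t)) steps."""
    s, steps = 1.0, []
    # Walk through distinct times in order, multiplying in the
    # conditional survival (1 - d/n) at each event time.
    for t in sorted(set(times)):
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n = sum(1 for ti in times if ti >= t)
        if d > 0:
            s *= 1 - d / n
            steps.append((t, s))
    return steps

# Toy data: deaths at times 2 and 5, censoring at 3 and 6.
print(kaplan_meier([2, 3, 5, 6], [1, 0, 1, 0]))
# -> [(2, 0.75), (5, 0.375)]
```

Note how the censored subject at time 3 still counts in the risk set at time 2 but not at time 5; this is how KM handles LTFU.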
33
Q

Give a brief overview of the Log Rank MH Approach

A
  • Log-rank (Mantel-Haenszel) test – compares the whole curve
    o Compares the proportion of events between the two arms at each time point and combines that information across all times, considering those at "risk" of having the event at each time
    o Nonparametric
    o All 2x2 tables are weighted equally – later time points are effectively given more weight even though fewer subjects are still at risk (insensitive to early differences)
34
Q

Give a brief overview of the rank approached to survival analysis.

A

Gehan and Breslow tests are rank tests for comparing survival curves. Gehan assumes the censoring pattern is equal in the two groups; Breslow allows for variation in censoring patterns.

35
Q

Compare the survival analysis approaches.

A
  • The MH (log-rank) test gives more weight to later times, while Gehan gives more weight to earlier times; they can lead to different results. Log-rank is typically the standard. The test must be prespecified in the analysis plan.
  • Parametric comparison – exponential – just need the hazard rates to compare the two curves
  • Semi-parametric: Cox PH regression analysis (incorporating PRE-SPECIFIED covariates)
    o Compares survival curves taking covariates into account
    — Models the hazard as a function of time and covariates – the product of the unadjusted (baseline) hazard and an adjustment for a linear combination of covariates
    o Similar to logistic regression but takes time-to-event into account
    o NEED TO TEST THE ASSUMPTION THAT HAZARDS ARE PROPORTIONAL
  • Strata: can be included in both MH and Cox
36
Q

Give an overview of subgroup analyses

A
  • Who does the treatment work best for?
  • Pre-specify hypotheses and clearly justify the rationale – make sure it's not fishing
      • Otherwise it will increase Type I error – we want to protect against Type I error
      • NIH mandates comparison of sex and racial/ethnic groups
      • Usually done to see if the treatment is effective in specific groups (often for marginally unsuccessful results)
      • ALWAYS test for interaction with treatment
        • – Don't interpret the main effects when you're looking at the interaction
      • Still considered post hoc even when pre-specified
      • Only do subgroup analysis if an interaction effect is present (the p-value will usually be higher because of the lower power of the interaction test, so increase alpha to maintain power)
37
Q

How do you graphically interpret subgroup analyses, interaction, and effect modification?

A
  • LOOK AT GRAPHS AND LEARN
  • Parallel lines indicate a lack of interaction. Different results for men vs. women on placebo vs. treatment is not, by itself, an interaction. An interaction would mean different slopes of the lines (see ppt). Without interaction, you just need to adjust for the variable (e.g., sex).
38
Q

How do you handle continuous interaction in survival analysis?

A
  • Can use the continuous form, but it is hard to interpret
  • Restricted cubic spline – visualize for interpretation – curvy lines
      • Uses 3-5 terms to represent the continuous variable
  • Best for interpretation/simplicity: divide into binary groups or quartiles
      • Divide the range into groups by equal-width intervals or by percentiles
        • – Equal-width intervals – good for interpretation but might be unbalanced (unequal numbers within each group). If one or more groups are too small, there might be modeling issues
        • – Percentiles – equal numbers within each group, but the group intervals will be different widths (e.g., 25-54, 55-57, 58-59, 60-65, with 50 people each). Harder to interpret, easier to analyze/model
        • – Best would be some type of clinical significance (equal-width is usually closer to clinically meaningful). Ideally, divide into ranges that have clinical meaning (pediatric, adolescent, adult, 65+). Most interpretable.
39
Q

Why do we do interim analysis?

A
  • Ethical:
    o Detect benefit, safety concerns, harm, or anything strongly indicating early that one of the treatments might be inferior or ineffective
  • Administrative:
    o Be sure the study is being executed as planned; assess the adequacy of enrollment; find unanticipated problems (e.g., non-compliance) that could be corrected; check on assumptions (e.g., sample size)
  • Economic:
    o Early stopping for negative results that are wasting resources; allocation of R&D funds
40
Q

What are some things to consider when stopping a study early?

A
  • Terminate early for:
      • A clear early effect – continuing is unnecessary or even unethical
      • Futility – sufficient evidence that the treatments are not different, so continuing is unnecessary or unethical
  • The impact of early termination on the credibility of the results and their acceptability to the clinical community needs to be taken into account (mostly for positive results)
  • Have to weigh the pros and cons of stopping early for either positive or negative results
  • Issues: Type I error inflation from repeated looks; what to analyze; preliminary information might impact objectivity about the treatment and enrollment (clinicians, patients, etc.) if anything gets disseminated
41
Q

Briefly describe - Fully Sequential Method – Sequential probability ratio test (SPRT)

A

o Assess the treatment effect after each pair of subjects is enrolled, treated, and evaluated
o Not feasible for clinical outcomes (since they won't be instantaneous – a logistical issue)
o Logistically prohibitive in clinical settings where subject accrual and outcome evaluation aren't instantaneous
o Cumbersome to monitor for large trials
o No prespecified sample size or time frame, so hard to plan

42
Q

Briefly describe - Group Sequential Method

A
  • More popular than the fully sequential method
    o A modification of the SPRT where interim analyses are conducted at pre-specified times
    o Assumes a standard analysis will be performed at the end of the trial once a fixed sample size is obtained, and then considers adjustments for the number of tests conducted along the way
    o Uses large critical values for all interim tests so that their effect on the final test (overall alpha) is negligible, and the conventional critical value can be used for the final test
    — Need a small p-value/big effect to stop early
    o Limitations: need to specify the number of looks (restrictive monitoring times); need to specify when to check the data (equal increments of information/patients); administrative difficulties with respect to timing DSMB reviews
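One well-known instance of the "large interim critical values, conventional final value" idea is the Haybittle-Peto rule. A minimal sketch (the Z statistics passed in are hypothetical):

```python
from statistics import NormalDist

def haybittle_peto(z_stats, interim_z=3.0, alpha=0.05):
    """Haybittle-Peto-style monitoring: require a very large |Z|
    (here 3.0) at every interim look, so the final look can still
    use roughly the conventional two-sided critical value.
    Returns ('stop early', look) or a final decision."""
    final_z = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    for i, z in enumerate(z_stats[:-1], start=1):
        if abs(z) > interim_z:
            return ("stop early", i)
    if abs(z_stats[-1]) > final_z:
        return ("reject H0", len(z_stats))
    return ("fail to reject", len(z_stats))

# Two interim looks, then a final look:
print(haybittle_peto([1.2, 2.1, 2.3]))  # -> ('reject H0', 3)
```

Because the interim threshold is so extreme, the alpha spent at interim looks is negligible, which is exactly the property the card describes.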
43
Q

Describe the cost on alpha of interim analysis.

A
  • There is a cost to looking at the data too often
  • At calendar time t, a certain fraction (t*) of the total information is observed, with 0 < t* <= 1

44
Q

Describe Missing at Random

A

MAR: The probability of a measurement not being observed does not depend on what the value would have been. The probability of missingness does not depend on the value of Y, after controlling for other variables X
• P(Y missing | X, Y) = P(Y missing | X)
• Much weaker than MCAR
• Can test whether missingness on Y depends on X
• No need to model the missingness mechanism
- The standard assumption

45
Q

Describe Missing Completely at Random

A

MCAR: Assume some data are missing on Y. If the probability of being missing is unrelated both to the value of Y that would have been observed and to the other variables X, these data are MCAR
• The ideal case when data are missing – the analysis of only those units with complete data gives valid inference, but loses power
• P(Y missing | X, Y) = P(Y missing)

46
Q

Describe Missing Not at Random

A

MNAR: The probability of missing data depends on the value of the missing data itself
• Non-ignorable – the missing-data mechanism must be modeled to get reliable parameter estimates. Requires good prior knowledge of the cause of missingness – there is no way to test the goodness of fit of this model, and results may be sensitive to its choice

47
Q

What is sensitivity analysis for missing data?

A

o Naive analysis, best-case analysis, worst-case analysis
o If the p-values in all scenarios are above alpha (two-sample test of equal proportions), report no difference; if all are below, there is a difference; the issue is when some are above and some are below (as in the example)
o Sensitivity analysis asks: how do different scenarios change the conclusion – do we need something more nuanced?
- Initial analysis: patterns, percentages, correlation, etc.
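The naive/best/worst-case idea can be sketched with a hand-rolled two-proportion z-test; the response counts and missingness below are hypothetical.

```python
import math
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """Two-sample z test of equal proportions (pooled variance);
    returns the two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def sensitivity(x_t, n_t, miss_t, x_c, n_c, miss_c):
    """Best case: all missing in treatment respond, none in control.
    Worst case: the reverse. Naive: drop the missing."""
    return {
        "naive": two_prop_z(x_t, n_t, x_c, n_c),
        "best": two_prop_z(x_t + miss_t, n_t + miss_t, x_c, n_c + miss_c),
        "worst": two_prop_z(x_t, n_t + miss_t, x_c + miss_c, n_c + miss_c),
    }

# Hypothetical: 40/100 vs 30/100 responders, 10 missing per arm.
for name, p in sensitivity(40, 100, 10, 30, 100, 10).items():
    print(name, round(p, 3))
```

In this made-up example the scenarios straddle alpha = 0.05, which is exactly the ambiguous situation the card warns about.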

48
Q

Describe Listwise Deletion

A
  • Listwise deletion (procedure based)
  • Delete all subjects with any missing values and analyze the remainder. Often the software default.
  • Discards important information
  • Biased if data are not MCAR
  • Also called "complete case" analysis
  • Fine for MCAR – otherwise bad
49
Q

Describe Hot Deck and Last Observation Carried Forward (imputation)

A
  • Hot Deck and Last Observation Carried Forward (imputation)
  • Hot deck: the missing value is imputed from a randomly selected similar record – using a different observation's value for imputation (e.g., randomly shuffle and use the preceding record)
  • LOCF: for longitudinally collected outcomes, use the subject's previous outcome for the next missing outcome
  • Ignores trends over time
  • Bad
50
Q

What are the three least common imputation methods?

A
  • Mean Imputation
      • Bad
  • Regression Imputation
      • Build the model for the missing value(s) and impute them
      • Ignores random components and can underestimate standard errors and variances
      • Better
  • Single Random Imputation
      • Like regression imputation but takes into account random error
      • Still underestimates SEs; treats imputed like observed; ignores imputation variation
51
Q

What is multiple imputation?

A
  • A method of averaging results across multiple imputed datasets to account for the uncertainty of imputation
      • Missing values are imputed m times to create m complete datasets; m is usually 5-10
      • The analysis is conducted on each of the m datasets, leading to m sets of results
      • Pooling – consolidate the m results into one by combining the means, variances, and CIs of the parameter estimates for the variables of concern
  • Most popular: multiple imputation by chained equations (MICE)
  • Takes into account the uncertainty of the imputation process
  • Best
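The pooling step is usually done with Rubin's rules: the total variance combines the average within-imputation variance with the between-imputation variance. A minimal sketch with hypothetical estimates from m = 5 imputed datasets:

```python
import statistics

def rubin_pool(estimates, variances):
    """Rubin's rules: pool m point estimates and their within-
    imputation variances into one pooled estimate and a total
    variance = within + (1 + 1/m) * between."""
    m = len(estimates)
    qbar = statistics.fmean(estimates)        # pooled estimate
    within = statistics.fmean(variances)      # avg within-imputation var
    between = statistics.variance(estimates)  # var across imputations
    total = within + (1 + 1 / m) * between
    return qbar, total

# Hypothetical: 5 imputed datasets give slightly different
# mean estimates and variances.
est = [10.1, 9.8, 10.4, 10.0, 9.9]
var = [0.25, 0.24, 0.26, 0.25, 0.25]
qbar, total = rubin_pool(est, var)
print(round(qbar, 2), round(total, 3))
```

The extra (1 + 1/m) * between term is what makes MI honest about imputation uncertainty, unlike single imputation.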