Stats, Trial Design, Interpretation Flashcards
Internal validity
How is the study structured?
Is it a “good” study?
Study design issues
External Validity
Does the study apply to my situation?
Is it applicable to the patients I see?
Is it practical?
Generalizability
Types of Study Design -5
Descriptive
Observational
Case control
Follow-up
Cross-sectional
Experimental
Types of Study Design - descriptive -3
No comparative group; no intervention
ex. case study, case series, survey, educational intervention with no comparator
Can still be large (e.g., a case series of 30,000 patients on high-dose theophylline)
Types of Study Design - observational (epidemiological) (think watchful scientist)
Comparative group; no intervention
- case control: based on outcome
- follow-up: based on risk factors
- cross-sectional: hybrid of the two
Types of Study Design - experimental -7
Comparative group
Patients selected
Consent
Investigator allocates
Intervention
Measurements
Assess outcome
Types of Study Design - experimental - Parallel vs Crossover - PARALLEL DEF-4
Each patient receives one therapy
Two concurrent groups
Interpatient variability
NEED MORE PTS
Types of Study Design - experimental - Parallel vs Crossover - CROSSOVER DEF-5
Each patient receives one therapy then another
Randomized to sequence (everyone gets both drugs)
Tx A -> outcome -> washout (5 half-lives) -> Tx B -> outcome
Position (period) effect must be handled statistically
FEWER PATIENTS NEEDED
Types of Study Design - experimental - ADVANTAGES -4
Control more variables, can blind
Decrease sources of bias
Ascertain cause and effect (cannot claim that in other trial types)
“Cadillac” of study designs
Analyzing Methods Section - FLUFF
Types of bias
Often use flowcharting to follow a patient through the study
Selection Bias -5
use Table 1
Was bias introduced in how the patients were selected?
Is the study population adequately defined?
Inclusion and exclusion criteria
Treatment groups comparable
See “Table 1” of study
Classification Bias def -2
Refers to how classifications are made (bias can be introduced in how patients are classified at recruitment; definitions can be extensive and are often in supplementary material)
ex. postmenopausal, receptor positive, outcomes such as a disease-free survival event
Preventing Classification Bias -3
Use structured definitions
Use “reliable,” “complete” sources of information
-Is the EHR a good source of information?
Allocation Bias def -2
use Table 1
Was bias introduced when patients were assigned to their groups? (Very hard to assess because studies often just say "pts were randomized")
Was it truly random? Use Table 1 to see if the groups came out comparable
Randomization method -1
Permuted blocks (e.g., for every 4 patients, assign 2 to each group; this allows the study to stop mid-way if needed)
Keeps the numbers of patients in the groups even throughout the conduct of the study (allows better stats)
Stratified according to participating center
and whether chemotherapy was planned to be given before, during, or not at all
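Not from the slides, but a minimal sketch of what permuted-block randomization looks like in code (block size of 4 and group labels "A"/"B" are arbitrary choices here):

```python
import random

def permuted_block_randomization(n_patients, block_size=4, groups=("A", "B")):
    """Assign patients to groups in permuted blocks so group sizes stay
    balanced throughout enrollment (e.g., 2 of every 4 patients per group)."""
    assignments = []
    while len(assignments) < n_patients:
        # Each block contains an equal number of each group, shuffled.
        block = list(groups) * (block_size // len(groups))
        random.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]

print(permuted_block_randomization(10))  # e.g. ['B', 'A', 'A', 'B', ...]
```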
Compliance Bias def -3
How was compliance assessed?
Not ALWAYS specifically addressed in study - this makes it HARD to assess
ie. Semiannual visits for first 5 years
Attrition Bias def -3
Drop-outs and why (acct for all pts)
- ie. may just drop out (withdraw consent) or be ineligible following medical review
If more patients drop out of one group vs another, does this introduce bias or influence the results?
Interventions def -3
Comparable
Blinding
Double-blind: Neither investigator nor patient knows patient allocation
Single-blind: Either patient or investigator does not know
Competing interventions (that would influence results)
Observer and Measurement bias -5
prevent with blinding
How are outcomes measured?
Is it appropriate?
Patient or observer influences
Sufficient observation (challenging, need many yrs for oncology)
Is it clinically meaningful?
Confounding Bias -4
all studies susceptible
Wrongly attributing the outcome to the studied factor when another variable (related to both exposure and outcome) is the real driver
Can control for many variables in the analysis
Often difficult to prevent
Look at exclusion criteria
Preventing Other Problems - Is the study powered to be meaningful? -3
Study enough patients
Discuss with statistical power
Usually discussed when sample size calculations presented in methods
Analyzing Results
Add numbers to flow chart (assess attrition)
Follow the numbers
Attrition
Present results for EVERYTHING mentioned in methods
Statistics
What Data To Include - Intention to treat
All patients randomized included in analysis
Considered the most conservative analysis
What Data To Include - Modified intention to treat
all patients randomized AND received at least one dose of therapy
What Data To Include - Per protocol
Only those patients who completed the study per protocol (e.g., patients are dropped if they had an ADR and stopped treatment)
For many studies useful to have both an intention to treat method and a per protocol
Statistical Analysis -Descriptive -3
measure of central tendency
mean, median, mode, etc.
spread of the data
Statistical Analysis -Inferential 1
Null hypothesis =No difference exists
Non-inferiority Trials -3 &&&
new in last 10 years
hard
Use a different null hypothesis: the new treatment is inferior ("not non-inferior") to the comparator; tough because of the double negatives (think: trying to show the treatments are about the same)
P values mean something different here: if p < 0.05, the inferiority null is rejected and the treatments are non-inferior
Cannot claim superiority from the non-inferiority analysis itself, but a trial can pre-specify a non-inferiority analysis followed by a superiority analysis
Non-inferiority Trials -Why? -4
Unethical to do a placebo controlled trial
Treatment expected to be similar to standard treatment (Therapeutic non-inferiority to active control)
Treatment assumed to be better than placebo
Treatment likely to have other advantages (safety, cost, convenience . . .)
Non-inferiority Trials -set up -3
Set a "margin" (the largest difference that would still count as "no worse")
Uses the alternative hypothesis that the new treatment is non-inferior
Set a confidence interval threshold: the CI of the difference must stay within the margin
-Actually need MORE patients for this type of trial
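A minimal sketch of the confidence-interval logic, with made-up event rates, sample sizes, and margin (none of these numbers come from a real trial): the new treatment is declared non-inferior only if the "bad" end of the CI stays inside the margin.

```python
import math
from scipy import stats

# Hypothetical event rates (assumed numbers, not from any trial).
p_new, n_new = 0.12, 500      # event rate and sample size, new treatment
p_std, n_std = 0.10, 500      # event rate and sample size, standard treatment
margin = 0.05                 # pre-specified non-inferiority margin (risk difference)

diff = p_new - p_std          # positive = more events on the new treatment
se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
z = stats.norm.ppf(0.975)     # 95% two-sided confidence interval
ci = (diff - z * se, diff + z * se)

# Non-inferior if the upper bound of the CI stays below the margin.
print(f"Risk difference {diff:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
print("Non-inferior" if ci[1] < margin else "Non-inferiority NOT shown")
```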
study types - superiority vs equivalence vs non-inferior def -3
Superiority trials (Is new therapy significantly better or worse?)
Equivalence trials “neither any better or any worse” (Establish equivalence range. Is it in the range to be similar?) (to see if 2 drugs are pharmaceutically equivalent)
Non-inferiority “not much worse than the active comparator” (Is new therapy no worse than control?)
Statistical Tests - Nominal: yes or no - def and examples
Categorical
Response rate (patients responded or not)
Adverse events or not
Alive or dead
Pregnant or not
Race
Statistical Tests - Nominal Data “Traps” -2
Percentages: seem like they are on a continuous scale, but think of the data origin (did the patient have a response or not? Response is yes-no, just presented as % of patients with a response)
Multiple groups or categories: still assess yes/no whether patients belong to each group; there is no ranking
Statistical Tests - Nominal Data Tests -4
Chi-Square (lots of rules): N > 40; for N 20-40, use only if the expected frequency in each cell is > 5
Fisher's exact: use if N < 30 (but can be used for all nominal data)
Related samples (cross-over): McNemar
3 or more independent groups: Chi-Square for independent groups
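A minimal scipy sketch for these tests, using a made-up 2x2 response table (the counts are illustrative only):

```python
from scipy.stats import chi2_contingency, fisher_exact

# Rows = treatment groups, columns = responded yes / no (hypothetical counts).
table = [[30, 20],   # group A: 30 responders, 20 non-responders
         [18, 32]]   # group B: 18 responders, 32 non-responders

chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)   # exact test, fine for small N

print(f"Chi-square p = {p_chi2:.4f}, Fisher's exact p = {p_fisher:.4f}")
```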
Statistical Tests - Ordinal Data def -6
Ranked
Likert scales (strongly agree to strongly disagree)
Hierarchy
Responses not mathematically equal
ie. Years of HRT (none, 0-5 years, 5-10 years, greater than 10 years), Age of diagnosis (less than 50, 50-55, 55-65, older)
Statistical Tests - Ordinal Data “Traps” -4
Likert scale (1-5, strongly disagree to strongly agree): the trap is calculating means
Behaves like continuous data and is often presented as continuous; remember it is still ordinal data!!
Useful to present the median, the mode, or "top box" = positive responses only (e.g., Likert 4 and 5)
-USE MEDIAN, NOT MEAN
Statistical Tests - Ordinal Data Tests -4
Two independent groups: Mann-Whitney U test (based on MEDIANS), also called the Wilcoxon rank sum test
Related samples (cross-over):
Sign test
Wilcoxon signed rank test
3 or more groups: Kruskal-Wallis ANOVA
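A minimal scipy sketch of these ordinal tests on made-up Likert-style scores (the values are illustrative only):

```python
from scipy.stats import mannwhitneyu, wilcoxon, kruskal

group_a = [4, 5, 3, 4, 2, 5, 4]   # hypothetical Likert responses
group_b = [2, 3, 1, 3, 2, 4, 3]
group_c = [3, 3, 4, 2, 3, 4, 3]

# Two independent groups (parallel design).
print(mannwhitneyu(group_a, group_b))

# Related samples (e.g., crossover: the same patients rated A then B).
print(wilcoxon(group_a, group_b))

# Three or more independent groups.
print(kruskal(group_a, group_b, group_c))
```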
Statistical Tests - Continuous Data -Continuous “Traps” -2
Data presented as % is probably not continuous
Are composite scales, etc., really continuous? Many times yes!
A battery of ordinal scales: the total score (all items combined) behaves as continuous
Statistical Tests - Continuous Data -def and examples
Interval, ratio data
Time to disease progression
WBC, platelet count
Serum creatinine
age, weight
Statistical Tests - Continuous Data: VAS -2
Visual analog scale (VAS): scale of 0-10, 0 being no pain, 10 being the worst pain imaginable
Only anchors the ends (DOES NOT DEFINE THE POINTS IN BETWEEN)
Administered verbally or in writing
Handled as continuous data
Other pain scales: 0=no pain, 1=mild, 2=moderate, 3=severe
Defines all points
Handled as ordinal data
Statistical Tests - Continuous Data Tests -5
Parametric vs Non-parametric
Student's t-test (2 groups; compares MEANS): assumes normal distribution (mean and median similar) and equal variance (similar standard deviations)
Non-parametric alternative: Mann-Whitney U (medians)
Related data (cross-over): paired t-test
ANOVA (3 or more groups)
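A minimal scipy sketch of the continuous-data tests, on hypothetical measurements (the numbers are made up):

```python
from scipy.stats import ttest_ind, ttest_rel, f_oneway

a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]   # hypothetical continuous outcomes
b = [4.2, 4.5, 4.1, 4.7, 4.4, 4.3]
c = [4.9, 5.2, 4.8, 5.0, 5.1, 4.7]

print(ttest_ind(a, b))                    # 2 independent groups (parallel)
print(ttest_ind(a, b, equal_var=False))   # Welch's t-test if variances differ
print(ttest_rel(a, b))                    # related / paired data (crossover)
print(f_oneway(a, b, c))                  # 3 or more groups (one-way ANOVA)
```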
Hypothesis Testing -4
Start with null hypothesis
Superiority trial: There is no difference
Equivalence: The groups are not equivalent
Non-inferiority: The therapy is not non-inferior (i.e., it is inferior) to the other therapy
Types of Error—Superiority
Columns across are the truths; rows are what the experiment shows
- Type I or alpha error (alpha, p value), TOP RIGHT OF BOX: experiment shows "difference exists" when in truth there is no difference. Alpha is set up front as 0.05; the P VALUE IS DETERMINED AFTER THE STUDY, ACTS AS THE OBSERVED ALPHA, AND TELLS IF SIGNIFICANT
- Type II or beta error (beta), BOTTOM LEFT OF BOX: experiment shows "no difference exists" when in truth there is a difference. Beta is set up front when doing the sample size calculation; 0.2 is good, some do lower. UNFORTUNATELY THERE IS NO EQUIVALENT P VALUE TO FIGURE OUT WHERE WE REALLY FELL
Power and Sample Size -5
Picked at the beginning of the study
Power = 1 - beta
Determined by alpha (p value) and beta values desired
Estimated response rate
Difference believed to be valuable
Front-end concept!!
Sample Size Calculations using power
slide 80 &&&
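Slide 80 itself is not reproduced here, but a rough sketch of the standard two-proportion sample-size formula looks like this (the response rates are assumed; alpha = 0.05 and power = 0.80 are the usual front-end choices):

```python
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect p1 vs p2
    with a two-sided alpha and the given power (1 - beta)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

# Hypothetical: expect 40% response on control, hope for 55% on treatment.
print(round(n_per_group(0.40, 0.55)))   # roughly 170 patients per group
```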
Expressing Risk - three types and data used -3
Expressed as odds ratio, relative risk or hazard ratio
Used for nominal data ONLY!!!
Use a 2x2 table (helpful for organizing the data in the study)
Odds Ratio def and trial use -3
ESTIMATE OF RISK
Based on prevalence
No denominator, making assumptions
Case Control, cross-sectional
Relative Risk def and trial use -4
Based on incidence
Denominator
Association between exposure and disease over time
Follow-up, experimental
Incidence -2
PREVALENCE IS WITHOUT UNIT OF TIME
(Number of persons developing dx/ total at risk) per unit of time
Direct estimate of probability or risk
Relative Risk rationale and calculation -5
Expression of risk for follow-up studies (also experimental trials)
Accounts for denominator information
Calculation: RR = [a / (a + b)] / [c / (c + d)]
It is the ratio of the two proportions (incidences)!!
Usually presented with a confidence interval
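A minimal sketch of both calculations from a 2x2 table (a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without); the counts are made up:

```python
# Hypothetical 2x2 table counts.
a, b = 40, 160   # exposed:   with outcome, without outcome
c, d = 20, 180   # unexposed: with outcome, without outcome

relative_risk = (a / (a + b)) / (c / (c + d))   # uses denominators (incidence)
odds_ratio = (a * d) / (b * c)                  # estimate of risk (case-control)

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
```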
Interpreting Risk (by outcome) -4
1 = no difference between the groups
2-5 = mild association
5-10 = moderate association
> 10 = strong association
Relative Risk Reduction -3
RELATIVE BENEFIT INCREASE
The most “optimistic” way to present risk
e.g., calculated as 1 - RR = 1 - 0.82 = 0.18 = 18%
Letrozole decreased the risk of a disease-free survival event by 18%
Absolute Risk Reduction -3
ABSOLUTE BENEFIT INCREASE
Takes into account the actual values of the numbers rather than just the proportion
Are we talking events that occur 1 in 10 or 1 in 1000?!!
e.g., calculated as [A/(A+B)] - [C/(C+D)] = |8.8% - 10.7%| = 1.9% (absolute value); note this is the difference in incidences
Number Needed to Treat (NNT) calculation and use-3
VERY IMPT -KNOW
Inverse of ARR
Be sure to convert percentages to decimals
Way to make numbers more practical and meaningful
Concept is “Number Needed to Harm” (NNH) for adverse events
-ALWAYS ROUND TO NEAREST WHOLE PERSON
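Using the event rates from the example above (8.8% vs 10.7%), a minimal sketch of the three calculations chained together:

```python
import math

rate_letrozole = 0.088   # event rate with letrozole (from the example above)
rate_tamoxifen = 0.107   # event rate with tamoxifen

rr = rate_letrozole / rate_tamoxifen
rrr = 1 - rr                                 # relative risk reduction
arr = abs(rate_letrozole - rate_tamoxifen)   # absolute risk reduction
nnt = math.ceil(1 / arr)                     # round up to a whole person

print(f"RR {rr:.2f}, RRR {rrr:.0%}, ARR {arr:.1%}, NNT {nnt}")
```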
Survival Analysis -5 &&&
BASICALLY RELATIVE RISK AMPED UP
EX. BREAST CANCER AND HORMONE EVENTS JUMPED AFTER 5 YRS - THINGS CHANGE OVER TIME
Takes into account the timing of events
Weighted relative risk over the entire study
Result is Hazard Ratio (HR)
Data presented in Kaplan—Meier curves
Cox proportional hazards regression the most common for multivariate analyses; log rank test for differences in survival
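A minimal sketch of those three pieces (Kaplan-Meier, log-rank, Cox), assuming the third-party lifelines and pandas packages; the tiny dataset is made up purely to show the calls, not taken from any study:

```python
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

# Hypothetical data: time in months, event = 1 if it occurred, 0 if censored.
df = pd.DataFrame({
    "time":  [6, 12, 18, 24, 30, 8, 10, 20, 28, 36],
    "event": [1, 1, 0, 1, 0, 1, 1, 1, 0, 0],
    "group": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],   # 1 = treatment, 0 = control
})

# Kaplan-Meier curve for one arm.
kmf = KaplanMeierFitter()
kmf.fit(df.loc[df.group == 1, "time"], df.loc[df.group == 1, "event"], label="treatment")

# Log-rank test for a difference in survival between arms.
trt, ctl = df[df.group == 1], df[df.group == 0]
result = logrank_test(trt["time"], ctl["time"],
                      event_observed_A=trt["event"], event_observed_B=ctl["event"])
print(result.p_value)

# Cox proportional hazards model; exp(coef) for "group" is the hazard ratio.
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()
```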
example need for survival analysis - 2
Takes into account that you had many patients for the first two years, and not as many in the last 3 years
Same number of events in end, but different denominators over time
Censoring def -1 and examples -4 -slide 101&&&when is arm favored??
Accounting for missing or incomplete data
Study ends before patient has an event
Patient is lost to follow-up
Patient withdraws due to an adverse event
Patients voluntarily crossed over to the other tx (e.g., to letrozole)
IPCW: Inverse Probability of Censoring Weighted analysis DEF -1
Modeling technique to account for bias introduced in censoring
Censoring vs ITT issues -3
ITT: Tamoxifen looks better than it may be (patients who crossed over to letrozole are still counted as tamoxifen, and those patients likely had better outcomes)
Censored: Only disease-free patients were allowed to cross over, so the higher-risk patients are left in the tamoxifen group (those remaining likely had worse outcomes)
-Use an IPCW analysis to adjust for these biases
RR vs HR &&&slide 103 clarify
Relative risk can easily be calculated from numbers presented in the study
Hazard ratio is the same concept but is the weighted relative risk over time
Adjusts for change over time
Adjusts for “repeated measures”
Adjusts for different “slopes” of the line
P-Value def -3
Probability results due to chance alone
Determine the level of significance (alpha value) prior to conducting the study
By custom, p < 0.05 is considered “statistically significant”
Statistical Pearls as related to p value -4
The size of the p value has nothing to do with the importance of the result (ONLY YOU DETERMINE THIS; e.g., p = 0.001 for a BP med that only lowered BP by 2 mmHg)
Do not confuse statistical significance with clinical significance
Results that are not statistically significant MAY still be important
Statistics do not determine what is important, statistics determine how certain we are.
Confidence Interval -5
95% CI (If study was repeated 100 times, 95% of the time the result would likely fall in this range)
Provides a “range” to result (Inferences on the population) (DO NOT CONFUSE CI WITH STD DEV, which only tell you about variation in study)
Calculation based on the Standard Error of the Mean (SEM)
CI can be applied to any type of data
IMPORTANT: determine the value that represents "no difference"
When used with OR, RR or HR:
No-difference value = 1 (if the CI does not include 1, the result is statistically significant)
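A minimal sketch of the "does the CI include 1?" check for a relative risk, using the log-scale standard error and the same made-up 2x2 counts as in the risk sketch above:

```python
import math
from scipy.stats import norm

# Hypothetical 2x2 counts: exposed (a events, b none), unexposed (c events, d none).
a, b, c, d = 40, 160, 20, 180

rr = (a / (a + b)) / (c / (c + d))
se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
z = norm.ppf(0.975)
low = math.exp(math.log(rr) - z * se_log_rr)
high = math.exp(math.log(rr) + z * se_log_rr)

# Statistically significant if the CI does not include the no-difference value of 1.
print(f"RR {rr:.2f}, 95% CI ({low:.2f}, {high:.2f}), significant: {not (low <= 1 <= high)}")
```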
Other Statistical Issues: Repeated Measures - Cox model - 3
If made multiple measurements over time, then need to correct for it using a statistical test that takes into account repeated measures
ie. Evaluated every 6 months
Cox model accounts for this
Other Statistical Issues: Bonferroni Effect -3 &&&clarify slide 113
If you look at enough things, something will be statistically significant just by chance alone
If the authors did not make a correction and should have, multiply each p value by the number of comparisons
2009 states adverse drug reaction (ADR) analysis not adjusted for multiple comparisons
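A minimal sketch of the reader-side Bonferroni correction described above (the p values are made up):

```python
raw_p_values = [0.04, 0.01, 0.20, 0.03]   # hypothetical p values from 4 comparisons
n_comparisons = len(raw_p_values)

# Bonferroni: multiply each p value by the number of comparisons (cap at 1.0).
adjusted = [min(p * n_comparisons, 1.0) for p in raw_p_values]
print(adjusted)   # only 0.01 * 4 = 0.04 stays below 0.05
```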
WHEN IS IT POSSIBLE TO MAKE A BETA ERROR?
When the p value is > 0.05, because you are concluding there is NO DIFFERENCE BETWEEN THE GROUPS (you may have missed a real difference).
Duration of Studies
Were patients studied for sufficient duration?
Do the results change over time?
Are the same things being compared at each time point? &&&
Subgroup Analysis key points -4
Randomized allocation no longer applies (subgroups are NOT randomized)
Sample size calculations don't hold for subgroups (the study is not powered for them)
As more subgroups evaluated, more opportunity for finding a significant result when one does not exist
Results can be overstated and misleading
Interpreting Forest Plots when used? -2 what does bar mean? -1 what does box mean? -2 do shorter bars usually have larger boxes?
Used with subgroup analysis and meta-analyses
Bar = confidence interval
Box
Location = HR
Size = number of people in analyses
Usually shorter bars have larger boxes: a larger sample gives a smaller confidence interval, so we are more confident of the result
Meta-Analysis
Combine results from many studies
Reanalyze
Decrease beta
Specific criteria for selection and classification of studies (selection bias refers to how studies selected!!)
Studies should have similar methodologies
Compounds problems observed in the individual studies
Effect Size slide128-130&&&never heard of this
NOT COVERED IN VIDEO
Way to standardize the effect
Used for continuous data with a normal distribution
Calculated by dividing the difference of means by standard deviation
Use table from website to interpret and make more practical
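Since this was not covered in the video, here is only a minimal sketch of one common version of the calculation (Cohen's d with a pooled standard deviation); the two groups are hypothetical:

```python
import statistics

group_1 = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]   # hypothetical continuous outcomes
group_2 = [4.2, 4.5, 4.1, 4.7, 4.4, 4.3]

mean_diff = statistics.mean(group_1) - statistics.mean(group_2)

# Pooled standard deviation of the two groups.
n1, n2 = len(group_1), len(group_2)
s1, s2 = statistics.stdev(group_1), statistics.stdev(group_2)
pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5

effect_size = mean_diff / pooled_sd
print(f"Effect size (Cohen's d) = {effect_size:.2f}")
```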
Reporting Data key concepts &&&clarify
How reported affects significance placed on data
Watch graphs!! (CAN BE MISLEADING)
Changing numbers to %
Collapsing data in categories
% change from baseline
Case-control Studies def -3
Identify cases with the disease of interest (outcome)
Identify controls without the outcome
Look back in time (from present to past) to assess the risk factors
Retrospective study. . . .key concepts -3
Often confusing terminology
Study design: sometimes used as another name for case-control
Other times it simply refers to the time frame of the study (data collected looking backward)
Application of Case- Control -4
what to apply to?
“rare” yes or no?
expensive?
Applied to new diseases or outbreaks
Can study “rare” diseases
Evaluate multiple risk factors
Relatively easy and less expensive
Weakness of Case - Control, aka types of bias -4
Selection bias
Classification bias
Information bias&&&
Confounding bias&&&
Cross - Sectional study def -4
Identify a study population (from present going forward)
First classify based on outcome
Second, within each outcome group, classify based on risk factor
Predict prevalence
Cross - Sectional Problems / weaknesses -4
Chicken and the egg
Confounding bias &&&
Selection bias
Classification bias
Follow-up Studies def -5
Identify a study population (from present to future)
Exclude individuals with the outcome of interest
THEN those without outcome of interest -> Classify based on risk factor
Follow over time
Assess outcome
Follow-up study features -4
why best?
Strongest study design
Strongest causal link
Denominator; predict incidence
Can usually address information bias &&&
Follow-up bias / weakness -4
Hawthorne effect &&&
Surveillance bias
Change over time
Attrition bias &&&
Issues in Oncology Studies -6
Duration of therapy and evaluation
Results represented
Endpoints selected
Combination therapy
Doses, regimens, routes
Balancing cost and clinical outcomes
basics of statistical tests -3
type of data (nominal, ordinal, continuous)
number of groups
independent (parallel) OR related (cross-over) groups
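A toy sketch that strings these three questions together for the tests named in the cards above (simplified; the function name and labels are my own, and no assumption checks are included):

```python
def choose_test(data_type, n_groups, related):
    """Pick a statistical test using the three questions in these cards.
    Simplified sketch: only the tests named above."""
    if data_type == "nominal":
        if related:
            return "McNemar"
        return "Chi-square (Fisher's exact for small N)"
    if data_type == "ordinal":
        if related:
            return "Sign test / Wilcoxon signed-rank"
        return "Kruskal-Wallis ANOVA" if n_groups >= 3 else "Mann-Whitney U"
    if data_type == "continuous":
        if related:
            return "Paired t-test"
        return "ANOVA" if n_groups >= 3 else "Student's t-test"
    raise ValueError("data_type must be nominal, ordinal, or continuous")

print(choose_test("ordinal", 2, related=False))   # -> Mann-Whitney U
```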