Study Design and Statistics for Stratified Medicine Flashcards
What is statistics?
Methods to:
- understand variability between different units under study (e.g. people, genes, drugs)
- formally weigh up quantitative evidence
What do descriptive statistics do?
Summarise a set of observations
What do inferential statistics methods make inferences about?
Some characteristic of a population, based on information contained in a sample
1. A simple linear regression...
2. What can a hypothesis test be used for?
- Assumes that a linear relationship between X and Y exists.
Y is the dependent variable (or response)
X is the independent variable (or predictor)
Y ~ β0 + β1X
For every one unit change in X we expect Y to change by β1 units.
- β1 only estimates the true relationship between X and Y.
A hypothesis test can be used to decide whether the true value of β1 is significantly different from zero. If so, X and Y are statistically associated
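The fit and test described in this card can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the data are invented, the function name `simple_linear_regression` is hypothetical, and the p-value uses a normal approximation to the t distribution (an assumption made to stay within the standard library; with small samples a t distribution with n - 2 degrees of freedom would be the usual choice).

```python
import math
from statistics import NormalDist, mean


def simple_linear_regression(x, y):
    """Least-squares fit of Y ~ b0 + b1*X, plus a test of b1 against zero."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # estimated slope
    b0 = y_bar - b1 * x_bar             # estimated intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)   # residual variance
    se_b1 = math.sqrt(sigma2 / sxx)     # standard error of the slope
    t_stat = b1 / se_b1                 # test statistic for H0: b1 = 0
    # ASSUMPTION: normal approximation to the t distribution
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return b0, b1, se_b1, t_stat, p_approx


# Invented data in which Y rises by roughly 2 units per unit of X
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

b0, b1, se_b1, t_stat, p_approx = simple_linear_regression(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SE(b1) = {se_b1:.3f}")
print(f"t = {t_stat:.1f}, approximate p = {p_approx:.3g}")
```

A small p-value here is evidence that the true β1 differs from zero, i.e. that X and Y are statistically associated.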
What does a multiple regression model?
Why is this harder to represent visually?
The joint effect of multiple different independent variables (Xs) on a dependent variable (Y)
A one unit increase in X1 (calorie intake) leads to an expected increase of β1 units of Y (BMI)
A one unit increase in X2 (physical activity) leads to an expected increase of β2 units of Y – β2 could be negative…
What can hypothesis tests determine in a multiple regression?
Determine whether each β is significantly different from zero, i.e. whether each of the Xs is associated with Y
Accounting for the other variables in the model!
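As a rough sketch of how the βs in a multiple regression are estimated, the following solves the least-squares normal equations with the standard library only. The function name `fit_multiple_regression` and the data are hypothetical, and the outcome is generated without noise so that the known coefficients are recovered exactly. The per-coefficient hypothesis tests mentioned above need standard errors as well, which this sketch does not compute.

```python
def fit_multiple_regression(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + b2*x2 + ... via the normal equations."""
    rows = [[1.0] + list(r) for r in X]          # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# HYPOTHETICAL data: x1 = calorie intake (in units of 100 kcal), x2 = physical activity
X = [(18, 5), (20, 2), (22, 6), (25, 1), (28, 4), (30, 3)]
# Noise-free outcome (BMI) built from known coefficients, for illustration only
y = [20 + 0.3 * x1 - 0.5 * x2 for x1, x2 in X]

b0, b1, b2 = fit_multiple_regression(X, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Here β1 is positive and β2 is negative, matching the card: each coefficient describes the effect of its own X while the other X is held fixed.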
What does a descriptive study describe?
A cohort without making comparisons of exposure and outcome
What does an analytical study involve?
Comparisons between groups. Analytical designs differ in when exposure and outcome are measured.
If an investigator does not assign exposures in a study, what kind of study is this? What are different types of studies relating to this and how can they differ?
An observational study. Observational studies can be analytical or descriptive.
If the study includes a comparison group it is an analytical study; if not, it is a descriptive study.
What are the 3 different forms of analytical study?
Cohort study
Case control
Cross-sectional study
What are the two types of experimental study?
How do they differ?
Randomised controlled trial
Non-randomised controlled trial
A randomised controlled trial allocates participants to groups at random; a non-randomised controlled trial does not
- What does a randomised controlled trial involve?
- What are advantages of this study design?
- What is a limitation?
Experimental: investigator randomly assigns exposures (e.g. drug vs no drug)
- Randomisation helps to eliminate bias
Blinding also minimises information bias
Gold standard in clinical research; high internal validity
- Expensive, not always appropriate; external validity?
- What does a cross sectional study ascertain?
- What is an advantage?
- What is a limitation?
1. Exposures and outcomes at the same time
e.g. capture rheumatoid arthritis status and weight
2. Can associate presence or absence of disease (RA) with presence or absence of exposure (weight)
3. Can’t infer causal direction (RA → weight or weight → RA)
Important example: population-based biobank data
- What does a Cohort study involve?
- What are two advantages of this?
- What is a limitation?
- Proceeds “forwards in time”
Select groups based on exposure and monitor for outcomes
- Good for estimating risk of incident disease (new diagnosis)
Direction of causality can be inferred from design (e.g. weight → RA)
- Can be expensive and not effective for rare diseases
- What does a Case-control study involve?
- What is an advantage?
- What are the limitations?
- Works “backwards in time” – harder to correctly interpret
Identify groups based on outcome and look back for exposures
- Good for rare diseases
- Big problem: control group must be similar to cases in all important respects except for not having the outcome in question
Other differences could be picked up as incorrect associations
Also: recall bias = better recollection of exposures among cases
What research method has the highest risk of bias and confounding variables?
What do these variables affect?
Observational research
Internal validity of study
What are the three main classes of bias and confounding variables?
Selection bias
Information bias
Confounding
What does selection bias refer to/ask?
What are three examples?
Are the groups similar in all important respects?
Examples:
Membership bias
Jogging → heart disease risk? Joggers also differ in diet, smoking, …
Non-respondent bias
Smokers less likely to return questionnaires
Neyman bias
Hospital-based sample of myocardial infarction cases won’t
What does information bias refer to/ask?
What are three examples?
Was information gathered in the same way?
Examples:
Diagnostic suspicion bias
More intensive search for disease among people exposed to “suspected” cause
Family history bias
Controls less likely to know about family history of a disease
Recall bias
Abortion → cancer? Abortion underreported in controls
Investigating confounding variables focuses upon what?
What is an example?
Is an extraneous factor blurring the effect?
The effect being observed is not due to the exposure, but a third factor with which it is associated
Example:
Salpingitis appears to be associated with using an intrauterine device (IUD)
But actually salpingitis is associated with having >1 sexual partner
Women with >1 sexual partner more likely to have an IUD
How can confounding variables be dealt with before a study is conducted?
Matching:
For each case, find a control that matches on potential confounders
e.g. find control with same smoker status
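A minimal sketch of the matching idea, assuming simple greedy 1:1 matching on a single confounder. The function name `match_controls`, the record layout and the participant IDs are all hypothetical; real studies often match on several variables at once.

```python
def match_controls(cases, controls, key):
    """Greedy 1:1 matching: pair each case with an unused control sharing `key`."""
    available = list(controls)
    pairs = []
    for case in cases:
        for control in available:
            if control[key] == case[key]:
                pairs.append((case["id"], control["id"]))
                available.remove(control)   # each control is used at most once
                break
    return pairs


# HYPOTHETICAL participants
cases = [
    {"id": "case1", "smoker": True},
    {"id": "case2", "smoker": False},
    {"id": "case3", "smoker": True},
]
controls = [
    {"id": "ctrl1", "smoker": False},
    {"id": "ctrl2", "smoker": True},
    {"id": "ctrl3", "smoker": False},
    {"id": "ctrl4", "smoker": True},
]

pairs = match_controls(cases, controls, key="smoker")
print(pairs)
```

Cases with no eligible control are simply left unmatched in this sketch.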
How can confounding variables be dealt with after a study is conducted?
Stratification:
Split data into strata and analyse separately
e.g. analyse smokers and non-smokers separately
Multivariate techniques:
Include potential confounders in a joint model with the exposure
e.g. use multiple regression and include a term for smoker status
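The stratification approach can be illustrated with invented counts in which smoking confounds the exposure and outcome relationship. The helper `odds_ratio` and every number below are hypothetical; the counts are constructed so that the pooled (crude) analysis shows an association that disappears within each stratum.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from a 2x2 table of exposure by outcome."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# HYPOTHETICAL counts: (exposed cases, exposed non-cases,
#                       unexposed cases, unexposed non-cases)
smokers = (40, 40, 10, 10)
non_smokers = (2, 18, 8, 72)

# Crude table: pool the two strata, ignoring smoker status
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

crude_or = odds_ratio(*crude)
smoker_or = odds_ratio(*smokers)
non_smoker_or = odds_ratio(*non_smokers)

print(f"Crude OR:       {crude_or:.2f}")
print(f"Smokers OR:     {smoker_or:.2f}")
print(f"Non-smokers OR: {non_smoker_or:.2f}")
```

In these made-up data the exposure is more common among smokers and smokers have a higher risk of the outcome, so the crude odds ratio is well above 1 even though the odds ratio is 1 within both strata.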
What must a biomarker study design capture?
Sufficient information to examine the hypothesised effect
Sufficient numbers of:
biomarker + (positive) and biomarker – (negative) participants
outcome + (positive) and outcome – (negative) participants
Why is there higher potential for biases and confounding in a biomarker study?
Treatment effect study: exposure/outcome
Biomarker study: exposure/biomarker/outcome, so there are more relationships in which bias and confounding can arise
What is needed for a biomarker study?
All need replication via repeated demonstration in independent studies
What does a prognostic biomarker need to demonstrate?
Need to demonstrate that biomarker + group has better (or worse) outcome than biomarker – group regardless of treatment
What does a predictive biomarker need to demonstrate?
A significant treatment by biomarker interaction
What terms are used in a multiple regression to model a treatment-by-biomarker interaction design for a predictive biomarker?
How outcome changes due to treatment in all patients (regardless of biomarker status)
How outcome changes due to presence of biomarker (regardless of treatment)
Additional impact of treatment in biomarker positive patients
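The three terms above correspond to a model of the form outcome = b0 + b1·T + b2·B + b3·T·B, where T and B are 0/1 indicators for treatment and biomarker status. With two binary predictors, the least-squares estimates can be read directly off the four group means, as in this sketch. The function name `interaction_terms`, the record layout and the outcome values are hypothetical, and the hypothesis test on b3 (the step that actually establishes a predictive biomarker) is not shown.

```python
from statistics import mean


def interaction_terms(records):
    """Estimate b0..b3 in outcome = b0 + b1*T + b2*B + b3*T*B from group means."""
    def group_mean(t, b):
        return mean(
            r["outcome"] for r in records
            if r["treated"] == t and r["biomarker"] == b
        )

    b0 = group_mean(0, 0)                  # untreated, biomarker negative
    b1 = group_mean(1, 0) - b0             # treatment term
    b2 = group_mean(0, 1) - b0             # biomarker term
    b3 = group_mean(1, 1) - b0 - b1 - b2   # treatment-by-biomarker interaction
    return b0, b1, b2, b3


# HYPOTHETICAL outcomes, two patients per treatment/biomarker combination
records = [
    {"treated": 0, "biomarker": 0, "outcome": 10},
    {"treated": 0, "biomarker": 0, "outcome": 12},
    {"treated": 1, "biomarker": 0, "outcome": 12},
    {"treated": 1, "biomarker": 0, "outcome": 14},
    {"treated": 0, "biomarker": 1, "outcome": 11},
    {"treated": 0, "biomarker": 1, "outcome": 13},
    {"treated": 1, "biomarker": 1, "outcome": 19},
    {"treated": 1, "biomarker": 1, "outcome": 21},
]

b0, b1, b2, b3 = interaction_terms(records)
print(f"b0 = {b0}, treatment = {b1}, biomarker = {b2}, interaction = {b3}")
```

In these invented data every treated patient gains b1, and biomarker-positive treated patients gain a further b3 on top of that.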
What is biomarker sensitivity / true positive rate?
The ability of a test to correctly identify those with the specified characteristic (e.g. disease, drug response)
“How well does the biomarker do at picking out the positive cases?”
What is biomarker specificity / true negative rate?
The ability of a test to correctly identify those without the specified characteristic
“How well does the biomarker do at picking out only the positive cases?”
Sensitivity and specificity are intrinsic characteristics of the test, but do not take into account what?
The proportion of actual positives and negatives in the population being tested
What is the Positive Predictive Value (PPV)?
The fraction of the group that test positive that are actually positive
What is the Negative Predictive Value (NPV)?
The fraction of the group that test negative that are actually negative
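The four measures on these cards can all be computed from the counts of true and false positives and negatives. The sketch below uses invented counts and hypothetical function names; `ppv_from_prevalence` applies Bayes' rule to show how PPV changes with the proportion of actual positives even when sensitivity and specificity stay fixed.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table of test results."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # of those testing positive, fraction truly positive
        "npv": tn / (tn + fn),           # of those testing negative, fraction truly negative
    }


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by a test's sensitivity and specificity at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# HYPOTHETICAL study: 1000 participants, 100 of whom are truly positive
metrics = diagnostic_metrics(tp=90, fp=90, fn=10, tn=810)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same test characteristics, different proportions of actual positives
for prevalence in (0.10, 0.01):
    ppv = ppv_from_prevalence(0.9, 0.9, prevalence)
    print(f"prevalence {prevalence:.2f} -> PPV {ppv:.3f}")
```

With sensitivity and specificity both at 0.9, the PPV is 0.5 when one in ten participants is truly positive and falls much lower when only one in a hundred is.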
What do Receiver Operating Characteristic (ROC) curves provide?
What does the area under the curve (AUC) provide, and what is it useful for?
A graphical representation of the trade-off between sensitivity and specificity: for each possible cut point of a diagnostic test, sensitivity (TPR) is plotted against 1 – specificity (FPR).
An indication of the utility of the predictor (0.5 = random chance, 1 = perfect separation)
Is useful in assigning the best cutoffs for clinical use
Provide a means of comparing two or more predictive tests
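A minimal sketch of how a ROC curve and its AUC can be built by hand, assuming that higher biomarker scores indicate a positive case. The function names `roc_points` and `auc` and the scores are hypothetical; the area is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs for every cut point, from strictest to most lenient."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]                      # cut point above every score
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= cut and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= cut and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# HYPOTHETICAL biomarker scores and true status (1 = positive, 0 = negative)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
print(f"AUC = {area:.3f}")
```

Each printed point corresponds to one candidate cutoff, so the list can be scanned for a cut point with an acceptable balance of sensitivity and specificity. Computing the AUC for two tests on the same participants gives a simple way to compare them.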