Pre-midterm Flashcards
What distinguishes a ratio?
- A/B
- May be dimensionless or have dimensions
What distinguishes a proportion?
- A/(A+B)
- Ranges from 0 to 1
- Dimensionless
What distinguishes a rate?
- N/T (instantaneous change over time)
- Has dimensions
- E.g., speed
Two categories of incidence measures?
- Cumulative incidence/risk/incidence proportion/attack rate
- Incidence density/incidence rate
Synonyms of incidence proportion
- Cumulative incidence
- Risk
- Attack rate
Formula of incidence proportion
new cases during follow-up/#people followed up
What are the assumptions of incidence proportion?
- Conditional on no competing risks
- Timing is everything for the interpretation (over a year? 2 years?)
- Fixed cohort with no exits
Interpretation of incidence proportion?
Risk/probability of developing a disease
Formula of incidence density
new cases during follow-up/total person-time at risk during follow-up
Unit of incidence density
Time^(-1)
Incidence density is a…
Rate (like speed)
What is incidence density useful for?
- Dynamic cohorts
- Dealing with competing risks
Problem with incidence density?
Person-years are considered equivalent, but they are not, making it hard to compare studies
Interpretation of incidence density?
If in a steady population:
- Put over 1 year
- Take reciprocal (1/xx)
- Interpret as average time until event occurs
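A minimal sketch of the two incidence measures on a tiny made-up cohort. The data, function names, and follow-up times are illustrative assumptions, not course material.

```python
# Hypothetical fixed cohort: (years at risk, developed disease?)
follow_up = [(2.0, False), (1.5, True), (2.0, False), (0.5, True), (2.0, False)]

def incidence_proportion(records):
    """New cases / people followed (fixed cohort, no competing risks)."""
    cases = sum(1 for _, is_case in records if is_case)
    return cases / len(records)

def incidence_density(records):
    """New cases / total person-time at risk; the unit is 1/time."""
    cases = sum(1 for _, is_case in records if is_case)
    person_time = sum(years for years, _ in records)
    return cases / person_time

risk = incidence_proportion(follow_up)   # 2 cases / 5 people
rate = incidence_density(follow_up)      # 2 cases / 8 person-years
mean_time_to_event = 1 / rate            # steady-state reading, in years

print(risk, rate, mean_time_to_event)
```

The reciprocal only reads as an average time to event under the steady-state condition stated on the card.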
Relation between risk and incidence rate?
Risk = (Incidence Rate)*Time
- Good if risk is < 20%
- Becomes less good as we go through years, it becomes an overestimate
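A quick check of how the linear approximation drifts. The comparison formula 1 - exp(-rate x time) is the standard constant-rate, no-competing-risk result; it is brought in here as an assumption and is not on the card. The rate of 0.05 per person-year is made up.

```python
import math

def risk_linear(rate, time):
    """Approximation from the card: risk = rate * time."""
    return rate * time

def risk_exponential(rate, time):
    """Risk under a constant rate and no competing risks (assumed model)."""
    return 1 - math.exp(-rate * time)

rate = 0.05  # hypothetical events per person-year
for years in (1, 5, 10, 20):
    approx = risk_linear(rate, years)
    exact = risk_exponential(rate, years)
    # The gap is positive and widens with time: the linear form overestimates.
    print(f"{years:>2} y  linear={approx:.3f}  exponential={exact:.3f}  gap={approx - exact:.3f}")
```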
When are survival analyses good?
- Dynamic cohort
- Changing incidence rates
Assumptions of survival analyses?
- No changes in survivorship over calendar time
- Uniform withdrawal/events over interval
Formula for prevalence proportion?
existing cases/#total population
Relation between incidence and prevalence?
If steady state:
- P/(1-P) = (Incidence rate) x (Duration of disease)
If rare disease (P < 0.10):
- P = (Incidence rate) x (Duration of disease)
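A small sketch comparing the steady-state formula with the rare-disease shortcut. The function names and the inputs (0.01 cases per person-year, 5-year duration) are hypothetical.

```python
def prevalence_steady_state(incidence_rate, duration):
    """Solve P / (1 - P) = rate * duration for P."""
    odds = incidence_rate * duration
    return odds / (1 + odds)

def prevalence_rare(incidence_rate, duration):
    """Rare-disease shortcut: P = rate * duration."""
    return incidence_rate * duration

p_full = prevalence_steady_state(0.01, 5)
p_rare = prevalence_rare(0.01, 5)
print(f"steady state: {p_full:.4f}, rare-disease shortcut: {p_rare:.4f}")
```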
Direct standardization
Apply rates from population of interest to the standard population
Standardized mortality ratio formula
= observed events/expected events x 100
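A sketch of both standardization cards. All strata, rates, and counts below are invented for illustration; the expected count passed to the SMR would normally come from applying reference rates to the study population, which is assumed to have been done already.

```python
# Hypothetical stratum-specific rates in the population of interest
study_rates = {"young": 0.001, "old": 0.010}      # deaths per person-year
# Hypothetical standard population
standard_pop = {"young": 60_000, "old": 40_000}

def directly_standardized_rate(rates, standard):
    """Apply the study population's rates to the standard population."""
    expected_events = sum(rates[s] * standard[s] for s in standard)
    return expected_events / sum(standard.values())

def smr(observed, expected):
    """Standardized mortality ratio, expressed per 100 as on the card."""
    return observed / expected * 100

dsr = directly_standardized_rate(study_rates, standard_pop)
print(f"directly standardized rate: {dsr:.4f}")
print(f"SMR: {smr(150, 120):.0f}")
```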
7 steps of natural history of disease?
- Biological onset
- Pathological evidence
- Symptoms
- Medical care sought
- Diagnosis
- Treatment
- Outcome (cure, control, disability, death)
According to the natural history of disease, what is the preclinical phase? and the clinical phase?
Preclinical: from biological onset to symptoms
Clinical: from symptoms to outcome
According to the natural history of disease, when does primary prevention take place and what is it?
- Before biological onset
- Attempt to prevent development (e.g., vaccination)
According to the natural history of disease, when does secondary prevention take place and what is it?
- Between biological onset and symptoms appearance
- Screen to improve prognosis
According to the natural history of disease, when does tertiary prevention take place and what is it?
- Between appearance of symptoms and outcome
- Management to diminish impact of disease
What is sensitivity and what does it result in?
- P(T+ | D+)
- Higher sensitivity means fewer false negatives
Formula for sensitivity?
= True positives / all D+
What is specificity and what does it result in?
- P(T- | D-)
- Higher specificity means fewer false positives
Formula for specificity?
= True negatives / all D-
What happens when you require T+ on two tests to be considered D+, such as in sequential testing?
Lower sensitivity: net sensitivity = Sensitivity1 x Sensitivity2 (you end up with more false negatives)
Higher specificity: net specificity = Specificity1 + Specificity2 - (Spec1 x Spec2) (you end up with fewer false positives)
What happens when you require T+ on one of two tests to be considered D+, such as in simultaneous testing?
Higher sensitivity: net sensitivity = Sensitivity1 + Sensitivity2 - (Sens1 x Sens2) (you end up with fewer false negatives)
Lower specificity: net specificity = Specificity1 x Specificity2 (you end up with more false positives)
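The two combination rules can be sketched as follows. The formulas are the ones on the cards; they implicitly assume the two tests err independently given disease status. The test characteristics used are made up.

```python
def sequential(sens1, spec1, sens2, spec2):
    """Both tests must be positive to call D+."""
    net_sens = sens1 * sens2
    net_spec = spec1 + spec2 - spec1 * spec2
    return net_sens, net_spec

def simultaneous(sens1, spec1, sens2, spec2):
    """A positive on either test is enough to call D+."""
    net_sens = sens1 + sens2 - sens1 * sens2
    net_spec = spec1 * spec2
    return net_sens, net_spec

# Hypothetical tests: test 1 (sens 0.90, spec 0.80), test 2 (sens 0.80, spec 0.90)
seq_sens, seq_spec = sequential(0.90, 0.80, 0.80, 0.90)
sim_sens, sim_spec = simultaneous(0.90, 0.80, 0.80, 0.90)
print(f"sequential:   sens={seq_sens:.2f} spec={seq_spec:.2f}")
print(f"simultaneous: sens={sim_sens:.2f} spec={sim_spec:.2f}")
```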
Positive predictive value formula
P (D+ | T+) = True positives/all T+
Negative predictive value formula
P (D- | T-) = True negatives/all T-
Relation between predictive value and prevalence?
Increased prevalence leads to increased PPV
Relation between Sensitivity/specificity and predictive value?
Specificity impacts PPV
Sensitivity impacts NPV
Formula for PPV with sensitivity and specificity?
PPV = (sens x prev)/[(sens x prev) + (1-spec)*(1-prev)]
Formula for NPV with sensitivity and specificity?
NPV = (spec x (1-prev))/[(spec x (1-prev)) + (1-sens)*prev]
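A sketch of both predictive values as functions of sensitivity, specificity, and prevalence. The function names and the example test (sens 0.90, spec 0.90) are hypothetical.

```python
def ppv(sens, spec, prev):
    """P(D+ | T+) = true positives / all positives."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)

def npv(sens, spec, prev):
    """P(D- | T-) = true negatives / all negatives."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)

# Same test, increasing prevalence: PPV rises while NPV falls.
for prev in (0.01, 0.10, 0.50):
    print(f"prev={prev:.2f}  PPV={ppv(0.9, 0.9, prev):.3f}  NPV={npv(0.9, 0.9, prev):.3f}")
```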
What is a likelihood ratio?
Likelihood that a test result would be expected in a person D+
Compared to
The likelihood that the same result would be expected in a person D-
How is LR+ calculated?
P(true positives)/P(false positives) = sens/(1-spec)
How is LR+ interpreted?
The higher the LR+, the more likely the test is positive in D+ person compared to D- person (rules in the disease)
How is LR- calculated?
P(false negatives)/P(true negatives) = (1-sens)/spec
How is LR- interpreted?
The lower the LR-, the less likely the test is negative in a D+ person compared to a D- person (rules out the disease)
What are significant numbers for LR?
LR = 1, NS
LR+ > 1 –> associated with disease (good from > 10)
LR- < 1 –> associated with no disease (good from < 0.1)
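A short sketch of the two likelihood ratios, checked against the rule-of-thumb cut-offs above. The test characteristics (sens 0.90, spec 0.95) are made up.

```python
def lr_positive(sens, spec):
    """LR+ = sens / (1 - spec)."""
    return sens / (1 - spec)

def lr_negative(sens, spec):
    """LR- = (1 - sens) / spec."""
    return (1 - sens) / spec

lr_pos = lr_positive(0.90, 0.95)
lr_neg = lr_negative(0.90, 0.95)
print(f"LR+ = {lr_pos:.1f} (good for ruling in if > 10)")
print(f"LR- = {lr_neg:.3f} (good for ruling out if < 0.1)")
```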
Overall percent agreement formula?
agreed/#total ratings
Corrected percent agreement formula?
yes-yes/(#total ratings - #no-no)
Kappa statistics formula?
(% agreement observed - % agreement expected)/(100 - % agreement expected)
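The three agreement formulas can be sketched on a 2x2 table of two raters. The counts are invented, and the chance-expected agreement is computed from the raters' marginal totals, which is the usual convention but is an assumption here since the cards do not spell it out.

```python
def agreement(yes_yes, yes_no, no_yes, no_no):
    """Return (overall %, corrected %, expected %, kappa) for two raters."""
    total = yes_yes + yes_no + no_yes + no_no
    observed = 100 * (yes_yes + no_no) / total
    corrected = 100 * yes_yes / (total - no_no)
    a_yes = yes_yes + yes_no          # rater A says yes
    b_yes = yes_yes + no_yes          # rater B says yes
    # Agreement expected by chance, from the marginals (assumed convention)
    expected = 100 * (a_yes * b_yes + (total - a_yes) * (total - b_yes)) / total ** 2
    kappa = (observed - expected) / (100 - expected)
    return observed, corrected, expected, kappa

observed, corrected, expected, kappa = agreement(40, 10, 5, 45)
print(f"overall={observed:.1f}%  corrected={corrected:.1f}%  expected={expected:.1f}%  kappa={kappa:.2f}")
```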
Prognosis is susceptible to which bias?
Lead time bias
- Survival is usually measured from diagnosis (ideally from onset), so earlier detection through screening may inflate apparent survival
What is a causal mechanism?
Joint action of multiple component causes
- Causal pie
- Is a sufficient cause
What is a sufficient cause?
The smallest set of conditions and events that inevitably produces the disease (a complete pie)
What is a component cause?
A piece of the pie
When is a component cause necessary?
When it appears in all causal pies
What do we need to know about the strength of a component cause?
- Cannot be determined for individual cases: within a given case, every component cause is equally necessary
- At the population level, a component cause is stronger when it takes part in a larger proportion of disease cases (e.g., smoking) - depends on the prevalence
What is the sum of all component causes?
No upper limit (can exceed 100%), due to interactions
What is induction time?
The time between the action of a component cause and the completion of a sufficient cause (pie).
- For the last component cause, induction time is always 0.
What is the latent period?
Period between the action of the last component cause/disease initiation and the clinical manifestation
- The latent period is reduced with screening
What is the empirical induction period?
It is used when the specific causal mechanisms are unknown and the induction and latent periods cannot be distinguished
- Latent + induction = empirical induction period
What are 4 of Hill’s causal guidelines, and which one is necessary(*)?
*Temporality
Strength of association (strong associations are less likely to be confounded, but a causal association can be weak)
Specificity (rare and unnecessary)
Biological gradient (dose response relation, linear or not)
What is the counterfactual when talking about causality?
Causality can be established when the outcome on the same person, treated and untreated, is different. But only one of these events may be observed. The other is counterfactual.
- Counterfactual refers to treatment/no treatment simultaneously, and in the same population
What is the principle of exchangeability?
- When the risk of the outcome in the tx group is equal to the risk the control group would have had if they had been treated
What is the goal of a RCT?
Simulate the counterfactual comparison by having exchangeable groups thanks to randomization. Protects against the confounding, even if not measured or unknown, without having to collect tons of info.
3 prerequisites for RCT?
- Clinical equipoise ($/ethics)
- Modifiable exposure, and modifiable by the investigator
- Common and early outcomes ($)
What is efficacy and effectiveness?
Efficacy: how well an intervention works under ideal conditions
Effectiveness: how well an intervention works in field conditions
What is the Hawthorne effect?
When people change their behaviour because they are being watched
What is the goal of the control group?
- Control for Hawthorne effect
- Control for Secular trends
4 types of control?
- Nothing
- Placebo
- Active Alternative
- TAU
When do you use nothing as a control, and what problems does it cause?
- When you have no accepted competitor to the treatment
- Participants are not blind, and no causal inference is possible
What is a placebo, and how useful is it for control groups?
- Sensorily similar to experimental intervention, without the active ingredients
- Good for blinding, allows straightforward conclusions
- Good to assess side effects of tx
What is an active alternative, and how useful is it for control groups?
- Another treatment (good for ethics and relevance)
- Often used for equivalence or noninferiority RCT, but conclusions drawn may be ambiguous
What is TAU, and why is it used, what are the problems?
- Treatment as usual
- For when the current practice varies and is hard to standardize
- You get little control over what happens in this group
How can you protect internal validity? At what cost?
With tighter exclusion criteria - it decreases generalizability
You’ll need a bigger sample size if you want…(4 items)
- smaller alpha (more certainty)
- bigger 1-beta (more power)
- be able to detect a smaller difference
- you have more variation in your sample
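The four drivers above can be illustrated with the standard normal-approximation formula for comparing two means, n per group = 2 x ((z_alpha/2 + z_beta) x SD / difference)^2. This formula is not on the card; it is used here only as an assumed, textbook example, and all inputs are made up.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power, delta, sd):
    """Sample size per arm for a two-sided comparison of two means
    (normal approximation; an assumed illustrative formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

base = n_per_group(alpha=0.05, power=0.80, delta=5, sd=10)
print("baseline:", base)
print("smaller alpha:", n_per_group(0.01, 0.80, 5, 10))
print("more power:", n_per_group(0.05, 0.90, 5, 10))
print("smaller difference:", n_per_group(0.05, 0.80, 2.5, 10))
print("more variation:", n_per_group(0.05, 0.80, 5, 15))
```

Each of the four changes pushes the required sample size above the baseline.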
4 RCT designs?
- Two-arm (parallel) RCT
- Crossover RCT
- Factorial RCT
- Cluster-randomized RCT
What is a crossover RCT? Assumptions?
- Each participant is their own control: they get tx/placebo and get assessed, then there is a wash-out period, then they switch
- Assumption: no residual carryover
What is a factorial RCT? Assumptions?
- When you want to look at the outcome of 2 distinct treatments, so you do a 2x2 design (possibility to test effect modification)
- Assumptions: 1) outcomes for both tx are different; 2) modes of action are independent
What is a cluster-randomized RCT? Concerns? What does it need?
- When you randomize groups (towns, hospitals, schools) instead of individuals
- Concerns: contamination (groups talking with each other)
- Requires larger samples and fancy stats to handle the clustering of individual units within groups
When is simple randomization good? What is the problem?
Good: for large samples, simple and easy
Problem:
- If small sample, equal chance criteria unmet
- Could lead to unequal numbers of participants in each arm, decreasing the power and increasing confounding
When is block randomization good? What is the problem?
- Good: to ensure balance between treatment arms, especially in smaller samples
- Problems: increases predictability, which leads the way to selection bias (but you can vary the block size at random to palliate)
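A minimal sketch of permuted-block randomization with randomly varying block sizes. The function name, arm labels, and block sizes are hypothetical choices, and this is a teaching toy rather than a trial-grade allocation system.

```python
import random

def block_randomize(n_participants, block_sizes=(4, 6), arms=("A", "B"), seed=None):
    """Allocate participants in shuffled blocks that are balanced across arms.
    Block sizes must be multiples of the number of arms."""
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        size = rng.choice(block_sizes)             # vary block size at random
        block = list(arms) * (size // len(arms))   # equal count of each arm
        rng.shuffle(block)
        allocation.extend(block)
    # The last block may be cut short, so a small imbalance can remain.
    return allocation[:n_participants]

alloc = block_randomize(25, seed=1)
print("".join(alloc), alloc.count("A"), alloc.count("B"))
```

Varying the block size makes the next assignment harder to guess, while the arms still cannot drift apart by more than half of the largest block.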
When is stratified randomization good? What are the problems?
Good: to control for a covariate and ensure balance between arms (prevents type I error and improves power)
Problem:
- all subjects must be identified before the assignment starts
- many covariates may get complicated, so should use only those known to have a strong effect
When is covariate adaptive randomization good? What are the problems?
Good: to balance covariates, when you don’t want to wait for all participants to be recruited before assigning
- Especially good for small samples with many covariates
Problems:
- only the first participant is truly randomly assigned
- predictability
- control only for known confounders
What is allocation concealment?
- Not knowing what the next assignment is going to be
- ALWAYS feasible (both for subjects and investigators)
- Prevents selection bias (sicker patients getting tx, for example)
What is the problem with stopping a RCT early for benefit?
Effect sizes are often overestimated
What are the advantages/problems of the intention-to-treat analysis compared to the as-treated analysis?
- Advantage: Preserves exchangeability
- Problem: Diluted treatment effect due to non-compliance and unplanned crossover
Bottom line:
- Good for effectiveness, as it is a good reflection of the real world
- Conservative estimate of efficacy
2 problems of subgroup analyses
- Multiple comparison problem (more Type I error, need more stringent alpha)
- Lack of power (more Type II error, due to smaller n)
Bottom line: subgroup analyses should ideally be planned in advance, but it is not always possible
Major differences between RCT and observational study?
Observational study:
- Tx assignment no longer under control of investigator
- Absence of random assignment, so cannot assume exchangeability of groups
What is a cohort?
A study population, drawn from a source population, with goal to make inference to a reference population
What is the main difference between prospective and retrospective cohort studies?
Prospective
- Exposure assessed at beginning of study or as it occurs during study
- Outcomes assessed in future
Retrospective
- Exposure assessed from historical records
- Outcomes assessed when study is begun
What are ambispective cohorts?
Mixture of prospective and retrospective cohorts
What is the difference b/w prospective and retrospective cohort studies in terms of non-differential exposure?
Prospective: nature forces non-differential exposure measurement
Retrospective: mask to induce non-differential exposure measurement
Closed/stationary cohort?
Fixed membership - no one can enter, but people can leave by loss to follow-up and competing risks
Open/dynamic cohort?
People can come in and come out
If we say that the study design should follow the etiology - what do we mean?
Baseline should take place before the biological onset to assess exposure of interest
What are fixed (vs time-varying) exposures?
Fixed: Stable regardless of the disease process (e.g., genotype, race, ever smoked)
Time-varying: varies within individuals (e.g., health behaviour, meds use, biomarker levels, SES)
What is immortal time bias?
“Period of follow-up during which, by design, death or outcome cannot occur”
- When entering into one group (e.g., oscar winners) is contingent upon survival
- Time during which the participant is “immortal” should be considered as non-exposed (e.g., non oscar winner)
- Excluding this time would also introduce bias (selection bias)
Internal vs external validity in terms of population?
Internal validity: result from study population is equal to the true effect in the source population
External validity: result from study population is equal to true effect in reference population
Formula for risk difference (RD)? Range?
= Risk1 - Risk0
*range = [-1, 1] (0 = null)
Reminder that risk = cumulative incidence = (#who develop disease during period)/(#at risk during period)
Formula for relative risk/risk ratio (RR)? Range?
= Risk 1/Risk 0
*range = [0, infinity] (1 = null)
Reminder that risk = cumulative incidence = (#who develop disease during period)/(#at risk during period)
Formula for incidence rate difference (IRD)? Range?
= rate1 - rate0
*range = [-infinity, infinity] (0 = null)
Reminder that rate = (#who develop disease during period)/(total person time at risk during period)
Formula for incidence rate ratio (IRR)? Range?
= rate1/rate0
*range = [0, infinity] (1 = null)
Reminder that rate = (#who develop disease during period)/(total person time at risk during period)
Absolute vs relative measures?
Absolute: difference measures
Relative: ratio measures (RR, IRR) - may be very large even for exposures with little public health consequences
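The four measures of association can be sketched on one made-up cohort. All counts and person-time figures are hypothetical.

```python
def risk(cases, n_at_risk):
    """Cumulative incidence over the period."""
    return cases / n_at_risk

def rate(cases, person_time):
    """Incidence rate over the period."""
    return cases / person_time

# Hypothetical exposed (1) and unexposed (0) groups
risk1, risk0 = risk(30, 1000), risk(10, 1000)
rate1, rate0 = rate(30, 3000), rate(10, 4000)   # person-years

rd = risk1 - risk0     # absolute, null = 0
rr = risk1 / risk0     # relative, null = 1
ird = rate1 - rate0    # absolute, null = 0, unit 1/time
irr = rate1 / rate0    # relative, null = 1

print(f"RD={rd:.3f}  RR={rr:.1f}  IRD={ird:.4f} per person-year  IRR={irr:.1f}")
```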
What is incidence hazard?
Instantaneous failure (hazard) rate at time t given that individual has survived up to time t (may be derived from cumulative survival)
What is a cumulative hazard? How is it measured?
Accumulation of hazard from time zero to time t
By Cox proportional hazards models
As N increases, what happens to random error, precision, and bias/systematic error?
Random error: decreases
Precision: increases
Systematic error: stable
What is selection bias?
Bias regarding the way in which the study participants have been selected
- Differential selection of exposed vs non-exposed
- may be immigrative (selection into study)
- may be emigrative (selection out of study)
What is information bias?
Bias regarding the way the study variables are measured
- Collecting different quality and extent of information from exposed and non-exposed groups
What is confounding bias?
Incompletely controlled factors
What happens if losses to follow-up are non-differential with respect to disease? And differential?
Non-differential: unbiased estimate
Differential: biased estimate
4 ways to limit selection bias in cohort studies?
- Random draw from source population (for immigrative selection bias; by design)
- Limit loss to follow-up (for emigrative selection bias; by design)
- Contact sample of non-respondents to assess if differential and according to which characteristics (for emigrative selection bias; by design)
- Sensitivity analyses
Synonym for information bias?
Measurement error
Possible consequence from information bias?
Misclassification bias (erroneous info results in individual being placed in a different category)
When is misclassification non-differential, vs differential?
Non-differential: errors on variable in question do not depend on value of other variables (e.g., disease status)
Differential: error on variable depend on value of other variables
What would happen if there is information bias (non-differential/differential) without confounding and selection bias? With?
Without:
- Non differential: toward the null
- Differential: either direction
With:
- Anything goes
7 ways to minimize information bias in cohort studies?
- Improve measurement properties of tools
- Repeat assessments
- Ensure parallel assessments in exposed vs unexposed groups
- Blind assessors
- Validate sub-samples
- Correct for known measurement error
- Sensitivity analyses for unknown measurement error
3 properties of a confounder?
- Must be associated with exposure
- Must be associated with outcome
- Must not be an effect of the outcome or of the exposure
Can the level of confounder vary over time?
Yes - you adjust with regular regressions
What is confounding by indication?
When association between confounder and exposure is expected to be in a predictable direction (e.g., sicker individuals have more radical tx)
5 ways to minimize confounding bias?
- Randomize
- Restrict study population
- Adjust (when confounders are not affected by prior exposure)
- Causal methods (when time-varying confounders are affected by prior levels of exposure)
- Sensitivity analyses
What are the two classification criteria of Pearce (2012) classification scheme?
1) outcome under study (incidence vs prevalence)
2) sampling based on the outcome or not
What is a study base?
Either aggregate of total population-time in which cases occur OR members of source population during time period when cases are identified
4 principles of the study base?
- Sample controls from study base in which cases arose
- Controls are proxy/sample for complete study base
- Controls should be representative of study base’s exposure distribution
- Controls should be selected independently of the exposure
Can incidence case-control studies be conducted in a defined cohort or in a dynamic population?
Both.
- In a defined/closed cohort, cases accrue over the course of follow-up in the exposed and unexposed subcohorts
- In a dynamic population, in a steady state, the exposed and unexposed person-time is assumed to be relatively constant over time
3 types of case-control studies conducted within a fixed cohort?
- Density/risk-set sampling (controls selected longitudinally from risk set available each time case arises)
- Cumulative incidence/exclusive sampling (controls from those who never experience outcome)
- Case cohort/inclusive sampling (controls from entire source population - those at risk at beginning of follow-up)
Control sampling in density (nested) case-control studies?
Density sampling or risk-set sampling: Select controls longitudinally based on risk set available each time a new case arises
Goal: to represent the distribution of exposed person-time vs unexposed person-time in the source population
Problem with density sampling?
Often infeasible because it relies on having info about everybody’s disease status, updated regularly
In density sampling, is the risk set static or dynamic?
Dynamic - changes from one case to the next
In density sampling, what influences the probability that an individual control will be selected?
- Contribution in person-time at risk (the more time contributed, the more likely to be selected)
- Eligible to be selected multiple times, as long as at risk (i.e., may serve as control for multiple cases)
In density sampling, what happens if a control becomes a case?
Will be included in the study both as control and as a case
Primary measure of association in case-control study?
Odds ratio
Formula for odds ratio?
= [(E+D+)(E-D-)]/[(E-D+)(E+D-)]
What formula should be true if controls are selected independently of the exposure?
E+D-/E-D- = PT1/PT0
In incidence density case-control studies, the odds ratio is equal to…?
OR = Incidence Density Ratio = Rate1/Rate0
- If controls are selected independently of exposure
Sampling method in Cumulative incidence (“exclusive”) case-control studies?
Controls are selected from those who do not experience the outcome at any point during the follow-up (survivors)
Cumulative case control studies correspond to which cohort studies?
Those that follow a closed cohort and measure risks
What is the main problem with cumulative case control studies?
Controls selected from the survivors do not represent the experience of the entire source population, because it ignores the contribution of cases
In cumulative case control studies, the Odds ratio estimates…
OR approximates Risk Ratio, and ONLY FOR RARE DISEASES (< .05)
Why is the rare disease assumption necessary for cumulative case control studies?
If the disease is rare, the experience of cases will be a very small part of the overall experience of the source population
What happens if you don’t satisfy the rare disease assumption in cumulative case control studies?
The odds ratio will be an overestimate of the risk ratio
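A sketch of why the rare disease assumption matters. For simplicity the odds ratio is computed on full hypothetical cohort tables, with non-cases standing in for the survivors from whom controls would be drawn; all counts are invented.

```python
def odds_ratio(a, b, c, d):
    """a = E+D+, b = E+D-, c = E-D+, d = E-D-."""
    return (a * d) / (c * b)

def risk_ratio(a, b, c, d):
    """Risk in exposed / risk in unexposed, from the same table."""
    return (a / (a + b)) / (c / (c + d))

rare = (20, 980, 10, 990)       # risks of 2% and 1%
common = (400, 600, 200, 800)   # risks of 40% and 20%

for label, table in (("rare", rare), ("common", common)):
    print(f"{label:>6}: RR={risk_ratio(*table):.2f}  OR={odds_ratio(*table):.2f}")
```

Both tables have the same risk ratio, but the odds ratio stays close to it only when the disease is rare.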
What is the sampling method in case-cohort (or inclusive sampling) studies?
Controls are selected from the entire source population (those at risk - disease free - at beginning of follow-up)
What happens to the probability that a participant will be included as control in case-cohort studies?
- Every participant has an equal chance of being included as a control
Why is there no need for the rare disease assumption in case-cohort studies?
- Controls are selected from the entire source population, so the distribution of exposure is equal to the source population
In case-cohort studies, odds ratio is equal to…
Risk ratio
How are controls to be selected in dynamic case-control studies?
If steady state with constant level of exposure over study period: estimate distribution of PT1 and PT0 in the source population from a representative sample of controls without the outcome
If no steady state: controls sampled at midpoint of the study period will represent average distribution of PT0 and PT1, and OR=IDR
2 types of study bases?
- Primary study base = fixed cohort
- Base defined by experience to be investigated, usually well-defined
- Cases are subjects within the study base who develop disease, and all cases are identifiable but not necessarily used
- Secondary study base = dynamic population
- Cases are defined before study base is identified
- Study base is defined as source of the cases (controls are those that would have been recognized as cases had they developed the disease)
Pros/cons of primary vs secondary study base
- Primary study bases are easier to sample for controls (bc base is well defined), but otherwise limited
- Hard to sample controls for secondary study bases (who would have become cases had they developed the disease?)
Overall, secondary study bases are more practical and more common
What is a population roster/sampling frame?
- Census lists, birth certificates, electoral rolls, etc.
- Makes sampling easier
3 other approaches when no roster is available?
- Random digit dialing
- Neighborhood controls
- Hospital controls
Optimal number of controls per case?
- Usually 1:1, but you gain precision (not validity) for up to 1:4
- Beyond that, marginal added value
What happens to exchangeability in case-crossover design?
Since each case serves as their own control, the confounding by stable and slow-varying characteristics is eliminated (measured or not), thus increasing exchangeability
Where is the control period in the unidirectional case-crossover?
Before
When are bidirectional case-crossovers acceptable?
When outcome cannot influence subsequent exposure
What happens in case-crossovers to cases with identical exposures (concordant pairs) during analysis?
They do not contribute
What is the measure used in case-crossovers and what is estimated?
- Exposure odds ratio
- Unbiased estimate of the IRR
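A sketch of why concordant pairs drop out, assuming one control period per case and the usual matched-pair estimator (discordant pairs exposed in the case period only, divided by those exposed in the control period only). The estimator choice and the data are assumptions for illustration.

```python
def case_crossover_or(pairs):
    """pairs: (exposed in case period, exposed in control period) per case.
    Only discordant pairs enter the estimate."""
    case_only = sum(1 for case_p, ctrl_p in pairs if case_p and not ctrl_p)
    control_only = sum(1 for case_p, ctrl_p in pairs if ctrl_p and not case_p)
    return case_only / control_only   # undefined if no control-only pairs

discordant = [(True, False)] * 12 + [(False, True)] * 4
concordant = [(True, True)] * 7 + [(False, False)] * 20

or_discordant_only = case_crossover_or(discordant)
or_all_pairs = case_crossover_or(discordant + concordant)
print(or_discordant_only, or_all_pairs)
```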
What are cross-sectional studies? Synonym?
Synonym: prevalence studies
Snapshot of the distributions of exposure and outcome in a population
When are cross-sectional studies practical?
- Disease onset is hard to measure
- Lengthy follow-up
- Not enough resources
What are the uses of cross-sectional studies?
Public health, burden of disease
Most common measure used for cross-sectional studies?
Odds ratio, unless the outcome is common (>10%) then best to estimate prevalence ratio
Formula of Prevalence odds ratio for cross-sectional studies
= [(E+D+)(E-D-)]/[(E-D+)(E+D-)]
Formula for prevalence ratio in cross-sectional studies?
= [E+D+/all E+]/[E-D+/all E-]
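The two cross-sectional measures can be compared on one invented table with a common outcome (prevalences of 30% and 15%). Function names and counts are hypothetical.

```python
def prevalence_odds_ratio(a, b, c, d):
    """a = E+D+, b = E+D-, c = E-D+, d = E-D-."""
    return (a * d) / (c * b)

def prevalence_ratio(a, b, c, d):
    """Prevalence in exposed / prevalence in unexposed."""
    return (a / (a + b)) / (c / (c + d))

table = (300, 700, 150, 850)
por = prevalence_odds_ratio(*table)
pr = prevalence_ratio(*table)
print(f"POR={por:.2f}  PR={pr:.2f}")
```

With an outcome this common, the odds ratio sits noticeably further from 1 than the prevalence ratio.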
Prevalence ratio is an approximation of
IRR
What is the ecological fallacy?
Apply population data to draw conclusions on the individual level
What are multilevel studies?
Any study with multilevel or nested data (e.g., individuals clustered within areas or groups)
Common example of a multilevel study?
The effect of ecological exposure on individual-level outcome
What is cross-level interaction?
When the effects of individual-level exposures vary depending on group-level exposure
What is a measure of potential impact?
An estimate of the effect of removing the exposure
What are the two measures of potential impact at the individual level?
- Attributable risk (risk difference)
- Attributable fraction
What are the two measures of potential impact at the population level?
- Population attributable risk
- Population attributable fraction
What is background risk?
Incidence not due to exposure, or risk in the unexposed group.
- explained by the fact that there can be several sufficient causes for a given disease (i.e. consistent with multicausality)
Formula for attributable risk?
AR = risk1 - risk0 (= to risk difference)
Assumptions to attributable risk:
- Effect of exposure is causal
- Sum does not add up to 100% due to interaction
Interpretation of attributable risk, e.g. of 0.7%:
Among 1000 babies who regularly sleep prone, there are 7 excess cases of SIDS attributable to prone sleeping
What is the attributable fraction?
Proportion by which the incidence rate of the outcome among those exposed would be reduced if the exposure were eliminated
Formula for attributable fraction?
= (risk1 - risk0)/risk1
= AttribRisk/risk1
= (RR-1)/RR
What is the interpretation of attributable fraction, e.g. of 0.67:
Among the prone sleeping babies, 67% of the cases of SIDS are attributable to the prone sleeping posture
What is the population attributable risk?
How much of the burden of disease is due to the exposure in the entire population?
Formula for population attributable risk?
= risktotal - risk0
= AR*probability of exposure
Interpretation for population attributable risk, if 2.4 per 1000?
Among every 1000 babies in a population, there are about 2.4 excess cases of SIDS attributable to prone sleeping (if all babies were made to sleep on their backs, about 2 SIDS cases could be averted for every 1000 babies)
Goal and assumptions of population attributable risk?
Goal: provide measure of public health impact of exposure on entire population (answers to question about value of prevention)
Assumptions
- Effect of exposure on outcome is causal
- Depends on prevalence of exposure and strength of the effect
What is the population attributable fraction?
The proportion by which the incidence rate of the outcome in the entire population would be reduced if the exposure were eliminated (proportion of disease in pop attributable to exposure)
Formula for population attributable fraction?
= PAR/risk total
= [ExpPrev(RR-1)]/[ExpPrev(RR-1) + 1]
Interpretation of population attributable fraction, e.g., 41%
Making all babies sleep on their back would eliminate 41% of all cases of SIDS in the population
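The four measures of potential impact can be sketched together. The inputs (risk of 1.05% in exposed, 0.35% in unexposed, 34% exposure prevalence) are hypothetical values picked so the outputs land near the figures quoted in the cards; they are not taken from a real study.

```python
def attributable_measures(risk_exposed, risk_unexposed, exposure_prevalence):
    """Individual- and population-level measures of potential impact."""
    rr = risk_exposed / risk_unexposed
    ar = risk_exposed - risk_unexposed       # attributable risk (risk difference)
    af = ar / risk_exposed                   # attributable fraction = (RR-1)/RR
    par = ar * exposure_prevalence           # population attributable risk
    excess = exposure_prevalence * (rr - 1)
    paf = excess / (excess + 1)              # population attributable fraction
    return {"RR": rr, "AR": ar, "AF": af, "PAR": par, "PAF": paf}

m = attributable_measures(0.0105, 0.0035, 0.34)
for name, value in m.items():
    print(f"{name}: {value:.4f}")

# Cross-check with the other PAF formula: PAR / total risk
risk_total = 0.34 * 0.0105 + 0.66 * 0.0035
print(f"PAR / risk_total: {m['PAR'] / risk_total:.4f}")
```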