Study Designs Flashcards
How to make a cohort study based on cross-sectional study
Exclude prevalent cases, implement follow-up. Hypothesis changes to: Is there a difference in disease incidence between exposed and non-exposed?
Measurements and interpretations for cohort studies
Risk Ratio (2.78): The risk of the outcome during the 2-year follow-up is 2.78 times as high in the exposed as in the unexposed;
Attributable Risk (22.61 cases per 1,000 population at risk): 22.61 cases per 1,000 at risk are due to the exposure, assuming there’s a causal relationship between exposure and outcome.
Incidence Rate (person-year)
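A minimal sketch of how these three cohort measures are computed. All counts and the person-year total are hypothetical, chosen only for illustration; they are not the data behind the 2.78 and 22.61 figures above.

```python
# Hypothetical 2-year cohort counts (illustrative only, not from the card).
exposed_cases, exposed_total = 30, 1000
unexposed_cases, unexposed_total = 10, 1000

# Risk (cumulative incidence) in each group.
risk_exposed = exposed_cases / exposed_total
risk_unexposed = unexposed_cases / unexposed_total

# Risk ratio: how many times as high the risk is in the exposed.
risk_ratio = risk_exposed / risk_unexposed

# Attributable risk: excess cases per 1,000 exposed persons at risk.
attributable_risk_per_1000 = (risk_exposed - risk_unexposed) * 1000

# Incidence rate: person-time (not persons) in the denominator.
# Hypothetical: the exposed group contributed 1,950 person-years.
person_years_exposed = 1950
incidence_rate_per_1000_py = exposed_cases / person_years_exposed * 1000

print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Attributable risk: {attributable_risk_per_1000:.1f} per 1,000")
print(f"Incidence rate (exposed): {incidence_rate_per_1000_py:.1f} per 1,000 person-years")
```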
2 types of exposure misclassification (measurement error) and their impacts
- Nondifferential: The degree of exposure misclassification is similar in cases and in controls. The bias is toward the null. Underestimate the association.
- Differential: the bias is away from the null. Overestimate the association.
Note: Whether bias results, and the direction of the bias, depends on the actual pattern of misclassification in cases and in controls. Need to examine the data!
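A small worked example of the two patterns, using expected counts instead of a simulation. The true 2x2 table and the sensitivity/specificity values are hypothetical assumptions; with other values the differential case can also move toward the null, which is the point of the note above.

```python
def odds_ratio(a, b, c, d):
    """OR = AD/BC; a, b = exposed cases, controls; c, d = unexposed cases, controls."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts under imperfect exposure measurement."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed

# Hypothetical true exposure counts.
true_cases = (60, 40)      # exposed, unexposed
true_controls = (30, 70)   # exposed, unexposed
true_or = odds_ratio(true_cases[0], true_controls[0], true_cases[1], true_controls[1])

# Nondifferential: same measurement quality in cases and controls.
a, c = misclassify(*true_cases, sensitivity=0.80, specificity=0.90)
b, d = misclassify(*true_controls, sensitivity=0.80, specificity=0.90)
nondifferential_or = odds_ratio(a, b, c, d)

# Differential: cases report exposure more completely than controls.
a, c = misclassify(*true_cases, sensitivity=0.95, specificity=0.90)
b, d = misclassify(*true_controls, sensitivity=0.70, specificity=0.90)
differential_or = odds_ratio(a, b, c, d)

print(f"True OR: {true_or:.2f}")
print(f"Nondifferential OR: {nondifferential_or:.2f}")
print(f"Differential OR (this pattern): {differential_or:.2f}")
```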
In an RCT, what is considered unplanned cross-over?
- Controls intentionally or unintentionally exposed to intervention
- Treatment group stop or refuse further intervention
In an RCT, what is a commonly used strategy to allocate participants to study groups? Steps? (3)
Stratified Randomization (Blocking):
- Eligible & willing participants characterized (e.g., sex, age)
- Grouped into strata of each characteristic
- THEN randomized to study group
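The three steps above can be sketched as follows. The function name, the participant fields, and the alternating assignment within a shuffled stratum are illustrative assumptions, not a prescribed trial procedure.

```python
import random
from collections import defaultdict

def stratified_randomize(participants, stratum_key,
                         arms=("intervention", "control"), seed=0):
    """Group participants into strata, then randomize within each stratum."""
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    strata = defaultdict(list)
    for person in participants:
        strata[stratum_key(person)].append(person)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        for index, person in enumerate(members):
            # Alternate arms down the shuffled list to keep the stratum balanced.
            assignment[person["id"]] = arms[index % len(arms)]
    return assignment

# Hypothetical participants: 4 per sex-by-age stratum.
profiles = [(s, a) for s in ("F", "M") for a in ("<65", "65+") for _ in range(4)]
participants = [
    {"id": i, "sex": sex, "age_group": age}
    for i, (sex, age) in enumerate(profiles)
]

assignment = stratified_randomize(
    participants, stratum_key=lambda p: (p["sex"], p["age_group"])
)
for person in participants:
    print(person["id"], person["sex"], person["age_group"], assignment[person["id"]])
```

Each stratum ends up with the same number of participants in each arm, which is what makes this approach useful for small samples.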
T/F Randomization can affect statistical power.
False
The process of randomization does not affect power.
Special Groups controls ( Example, Pro 1, Con 1)
Example: Relatives, friends, co-worker
Pro: Likely high level of comparability; come from same residential area as cases; similar SES, lifestyles
Con: Not good for lifestyle exposures: similarities may result in small differences in proportion exposed in cases and controls
2 Major Assumptions in Life Table Survival Analysis
Censored are similar to those followed up
Stable Population Survivorship
Survival Curve (come from Kaplan-Meier Method)
RCT: Major Methodological Components (7)
- Selection of study sample
- Allocation to study groups
- Compliance
- Ascertainment of outcomes
- Monitoring adverse effects
- Stopping rules
- Measures of effect
Interpretation of Number needed to treat
To prevent one case of xxx during a period of xx years, xx persons would have to receive the intervention.
Population Surveillance
1) Systematic collection, recording, analysis, interpretation, and dissemination of data reflecting the current health status of a community or population;
2) Monitor changes in disease frequency and prevalence of risk factors within overall population or subgroups;
3) Fundamental role in disease prevention and control, health promotion, recommendations and policy
Measures in case-control studies
OR: AD/BC; proportion of exposed cases: A/(A+C); proportion of exposed controls: B/(B+D)
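A short worked example of these case-control measures, assuming the usual 2x2 layout (A = exposed cases, B = exposed controls, C = unexposed cases, D = unexposed controls). The counts are hypothetical.

```python
# Hypothetical case-control counts.
A, B = 40, 20   # exposed: cases, controls
C, D = 60, 80   # unexposed: cases, controls

odds_ratio = (A * D) / (B * C)
proportion_exposed_cases = A / (A + C)
proportion_exposed_controls = B / (B + D)

print(f"OR = {odds_ratio:.2f}")
print(f"Exposed among cases: {proportion_exposed_cases:.0%}")
print(f"Exposed among controls: {proportion_exposed_controls:.0%}")
```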
How to adjust for effects of other factors in prognosis
Use a multivariable regression model (Logistic; Poisson)
Frequency matching definition
Controls selected to match proportion of cases with certain characteristic. More relaxed than individual matching.
Cross-sectional Studies cons
1) Prevalence only; 2) Temporality between exposure and disease cannot be established; 3) Sicker cases of disease may have died (Survivor bias); 4) Severe exposures may be hospitalized (not counted)
Case-control study pro (4) and con (4)
pro:
- Efficient in regards to time, costs, effort
- Diseases with long latency
- Good for rare diseases
- Multiple exposures with single outcome
Con:
- Inefficient for rare exposures
- No direct measure of incidence
- Temporality may be hard to establish
- Particularly prone to selection bias and recall bias
Case-Control Null hypo
OR=1.0; The odds of exposure to […] are not different between cases and controls.
How to assess exposure (3) and their pro/con
1. Pre-existing records
- Advantage: inexpensive, large samples
- Disadvantage: data quality (missing data; diagnostic criteria)
2. Interviews & Questionnaires
- Advantage: inexpensive, low burden, large samples
- Disadvantage: Response Bias (social desirability), transportability across population subgroups
3. Clinical examinations
- Advantage: accuracy & reliability; things that can’t be assessed by questionnaire (e.g., blood, ECG, fitness)
- Disadvantage: costs, burden, risks, standard interpretation
Cohort studies pro (6) and con (4)
Pro:
- Rare exposures
- Multiple exposures with multiple outcomes
- Minimizes bias in exposure assessment (prospective)
- Temporality between exposure and disease occurrence
- Direct measure of risk
- Nested studies
Con:
- Inefficient for rare disease;
- Feasibility issues: sample size, time, costs
- Data quality (retrospective)
- Loss to follow-up bias
Ecological Study [data, exposure, outcome, traits]
Data: Aggregated group data are the units of observation(no individual measures)
Exposure: Average exposure in group
Outcome: Average disease experience in group
Traits:
1) Cost- and time-efficient relative to a de novo study;
2) Preliminary evidence for confirmation by individual-level study;
3) Correlation between exposure and disease (by correlation coefficient: <0.3 weak; 0.3-0.7 moderate; >0.7 strong)
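A minimal sketch of the correlation trait: a Pearson coefficient computed on group-level averages and labeled with the cut-offs from this card. The five "regions" and their values are hypothetical, and treating exactly 0.3 and 0.7 as moderate is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

def strength(r):
    """Label |r| using the card's cut-offs (<0.3 weak, 0.3-0.7 moderate, >0.7 strong)."""
    size = abs(r)
    if size < 0.3:
        return "weak"
    if size <= 0.7:
        return "moderate"
    return "strong"

# Hypothetical aggregate data: one value per region, no individual measures.
average_exposure = [1, 2, 3, 4, 5]
disease_rate = [2, 4, 5, 4, 5]

r = pearson_r(average_exposure, disease_rate)
print(f"r = {r:.2f} ({strength(r)})")
```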
RCT: Study population
- Sample of base population who meet eligibility, inclusion and exclusion criteria and are willing to participate in study protocol
- Large enough for sufficient number of events
- High likelihood of compliance & completion
Cross-sectional Studies: Research question
Is exposure status related to the presence of disease when measured at the same time point?
Ecological Study Shortcomings
Because Aggregate data are being analyzed: 1) Lack temporal information for associations; 2) Lack adjustment for differences in other characteristics (confounding); 3) Measurement differences (diagnosis, recording, treatment); 4) Composite ecological measures are less variable than individual measures, which could mask relationships
Difference between Observational and Experimental studies?
exposures not assigned/assigned by researcher
RCT: how to ascertain outcomes?
- Questionnaires, repeat clinic examinations, NDI
- Standardized case definition, and case verification
- Data gatherers blinded to study group status
3 types of information bias in exposure assessment
- recall bias - accuracy of recall differs between case and control;
- observation bias – data gathering differs between case and control;
- social desirability bias – under/over report behaviors
National Health Objectives: two overarching goals
- Increase quality and years of healthy life
- Eliminate health disparities
Effective surveillance requires
reliable flow of information
Major bias in cohort study (6)
- Losses to Follow-up (the major bias)
- Non-response (non-participation) bias
- Selection bias (less concerning as Exposure assessment precedes occurrence of Outcome; mostly threatens external validity)
- Exposure misclassification
- Outcomes ascertainment bias (blinding of gatherers)
- Outcomes misclassification
Control selection issues (2)
- Selection must be independent of exposure
- Four sources of controls, each with its own pros and cons (Hospital-based; Community-based; Neighborhood; Special Groups)
Cohort study: design layout
- define base population
- select study sample and exclude prevalent cases
- assess exposures
- follow-up to ascertain disease cases
How to assess exposure in case-control studies?
- Use questionnaire, interview, medical records, proxy
- Exposure assessment should be comparable for cases and controls (method, place, and circumstances)
- Data gatherers blinded to case-control status
Analytic studies
determinants of disease distribution and whether exposures are causally associated with disease
RCT: allocation to study groups (After? Pro? How?)
After:
- Determining eligible & willing; consented
- Baseline assessments (prognostic profile at entry)
Pro:
- Characteristic that makes experimental designs useful.
- Maximize comparability between intervention & control groups.
- Balance distribution of extraneous factors (confounders), measured and unmeasured, that could affect intervention.
How to randomize to study groups
- Each individual has same chance of group assignment.
- Random numbers table; draw from hat; computer program
Broad categories of RCT and their definitions (matching questions are possible)
- Primary prevention: Examine whether an intervention reduces the risk of first disease occurrence. Study sample often has increased risk of outcome (elevated risk factors; preclinical disease) .
- Secondary prevention: Examine whether an intervention reduces symptoms, risk of recurrence or death in those with existing disease (prognostic trials).
- Efficacy trial: Examine the extent to which an intervention produces benefit under ideal circumstances.
- Effectiveness trial: Examine whether the benefit of an intervention is obtained by a reasonable percentage of participants in a more pragmatic setting.
What factors can influence prognosis (4)?
- type and stage of disease
- timing of diagnosis or intervention
- initial health status
- age, sex, etc
Matching (Advantages 2; Types 2)
Pro: Enhances study efficiency ; Can provide control for confounding. Types: Individual matching; Frequency (“group”) matching
2 Methods of Population Surveillance
Passive and Active surveillance
Cross-sectional Studies pros
1) Feasible (sample size, time, costs);
2) Multiple exposures and multiple outcomes;
3) Surveillance of exposure and outcome distributions(monitor person, place, time trends)
4) Hypothesis generation for etiologic studies
5) Potential for Prospective cohort (if exclude prevalent cases and implement follow-up)
Median Survival Time (measure prognosis)
- Survival time for half of the study population
- Less affected than mean by skewed survival distribution (e.g., extreme low or high survival)
- Only need deaths on half study sample
- Relatively broad view of survival differences between groups
Hospital-based controls ( Con 4)
Con:
- Difficult to define source population of hospital cases
- By definition are ill; thus, differ from base population in ways that could affect exposure prevalence
- Factors that influence hospital selection may be DIFFERENT than for cases
- Exposures (esp. lifestyle factors) may be related to the disease requiring hospital visit
What is the beginning of prognosis?
Diagnosis or Intervention
Analytic studies include?
Case-Control and Cohort
Case-Control design layout
- define base population
- select on disease status
- assess past exposure
What's the research question for cohort studies?
In a defined population that is initially without the outcome of interest, is exposure to a specified factor associated with future development of the outcome?
Prognosis: 3 common measures
- Case-Fatality
- Observed Survival Rates
- Median Survival Time
T/F: To maximize external validity, we should choose an RCT.
False!
Don’t choose an RCT, because its sample may not represent the general public well. External validity is generalizability!
Examples of Passive surveillance
National Registry of Birth Records (NRBR); National Death Index (NDI); Surveillance, Epidemiology, and End Results (SEER) Program
How to assess outcome (3) and their pro/con
- Pre-existing records
- Advantage: passive follow-up; standardized case criteria
- Disadvantage: may be incomplete; quality of data reporting
- Interviews & Questionnaires
- Advantage: inexpensive, low burden, large samples
- Disadvantage: participation; verification of events
- Clinical examinations
- Advantage: accuracy & reliability, things that can’t be assessed by questionnaire (e.g., blood pressure; subclinical)
- Disadvantage: costs, burden, risks, interpretation
Neighborhood controls ( Pro 1, Con 2)
Pro: Likely high level of comparability;
Con:
- Logistics, costs, time intensive
- Quality of exposure information may differ between cases and controls: neighborhood non-cases may be healthier and less accurately recall past exposures
pro/con in choosing prevalent or incident cases
Prevalent:
- Pro: easier case acquisition; potential larger sample size;
- con: may reflect selective survivors (severe cases underrepresented); temporal sequence harder to determine.
Incident:
- Pro: temporal sequence between exposure and outcome;
- Con: have to wait for cases; severe cases may die before enrollment
Population Surveillance issues
1) Response rates are declining; 2) With current techniques, the sample may no longer be representative
Basic questions in Epidemiology
What? (disease) Who? (person) When?(time) Where?(place) How?(exposures) Why?(mechanisms)
2 common approaches of “Life Table” Analysis
1) Actuarial Life Table
2) Kaplan-Meier
Survival Curve (come from actuarial method)
Matching (Why? Definition.)
Why? Cases and controls likely will be different on factors related to the exposure distribution and to the outcome (e.g., age, sex, smoking, etc).
Def: process of selecting controls so that they are similar to cases on specific characteristics.
How to enhance compliance and rate of completion in an RCT?
Enhance active monitoring and facilitation by researcher:
- Select population that is interested & reliable
- Pilot study or run-in phase to detect noncompliers
- Frequent contact, calendar packs for drugs, incentives
- Monitor percentage of completed activities within groups
2 types of cohort study
- Prospective
- Retrospective
Note: In both, exposure will come first and can give a direct measure of disease incidence!
Examples of Active surveillance
National Health and Nutrition Examination Survey (NHANES); National Health Interview Survey (NHIS); Behavioral Risk Factor Surveillance System (BRFSS)
Design Layout for RCT
- Define base population
- Screen interested/ willing
- Obtain consent among eligible
- Randomize to study groups
- Intervention and follow-up
Median Survival time Figure
For 1984, median survival (the 50th percentile) is 3 years; for 1994, it is 6 years.
Prevalence Odds Ratio
In cross-sectional study.
Hypothesis: Is there an association between disease and exposure? Prevalence OR = odds of having disease in the exposed / odds of having disease in the non-exposed = AD/BC
Research question of randomized trials?
Is the incidence of outcome lower among those randomized to an intervention group compared with those randomized to a control group?
RCT: validity of study findings – systematic errors
External validity
Internal validity
The measure for individual matching
Matched odds ratio = B/C. There are 2 types of concordant pairs and 2 types of discordant pairs, but only the latter 2 are of interest. Interpretation: (OR=1.68) Cases have 68% higher odds of exposure than controls after accounting for age, sex…
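A minimal worked example of the matched odds ratio, assuming the usual pair table in which B counts pairs where only the case is exposed and C counts pairs where only the control is exposed. The pair counts are hypothetical, chosen so that the ratio reproduces the 1.68 in the interpretation above.

```python
# Hypothetical counts of matched case-control pairs.
both_exposed = 30             # concordant: ignored
neither_exposed = 53          # concordant: ignored
case_only_exposed = 42        # discordant (cell B)
control_only_exposed = 25     # discordant (cell C)

# Only discordant pairs carry information about the association.
matched_odds_ratio = case_only_exposed / control_only_exposed

percent_higher = (matched_odds_ratio - 1) * 100
print(f"Matched OR = {matched_odds_ratio:.2f}")
print(f"Cases have {percent_higher:.0f}% higher odds of exposure than controls")
```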
RCT: Base population
- To whom results will generalize (infer)
- Focused by restriction on sex, age, prevalent risk factors, prevalent disease, etc
RCT: Monitoring adverse effects (why? who?)
- Treatment benefits must be considered against risks of adverse effects (do no harm)
- Data Safety Monitoring Board (DSMB)
Progression of Clinical Trial Phases
- Phase I: test new biomedical intervention in a small scale study (e.g., N = 20-80) for the first time to evaluate safety (e.g., safe dosage range; side effects).
- Phase II: study biomedical/behavioral intervention in large scale (e.g., several hundred) to determine efficacy and further evaluate safety.
- Phase III: further evaluate efficacy and safety in large scale(e.g., several thousand) comparing intervention to control condition.
- Phase IV: post-marketing studies of approved efficacious and safe intervention to evaluate effectiveness in general population
Cross-sectional Studies: Design Layout
1) define base population; 2) select sample (selection independent of disease and exposure); 3) assess exposures and disease, simultaneously.
Null hypothesis in RCT?
RR = 1.0
There will be no difference in incidence of outcome between intervention and control groups
Descriptive studies include?
Ecological and Cross-sectional
Problems with matching (4)
- Matching on several variables increases difficulty finding controls;
- Cannot examine the association of outcome with matching variables.
- Unplanned matching: particularly lifestyle factors and neighbors, family as controls, because they are too similar.
- Matching may make cases and controls more similar on exposure of interest. Thus accidentally wipe out the potential association.
1) Residual confounding (incomplete control of confounding in frequency matching);
2) Matching on factor weakly associated with exposure or outcome can reduce statistical power.
Cross-sectional Studies: Design type
1) Observational study at a single time point in a defined population; 2) snapshot of the population; 3) Exposure and disease assessed simultaneously; 4) Prevalence study; 5) Generate hypothesis for etiologic studies; 6) Monitor Person, Place, Time trends (surveillance)
Active surveillance
Regular contact with health care facilities (clinics, hospitals) or study participants to ascertain cases; May involve interviewing physicians and patients, reviewing medical records, clinical examinations, mailed questionnaires, telephone interviews
Hospital-based controls ( Pro 3)
- Same hospital as cases, so easily identified; low costs and effort
- Need for healthcare, so may be more aware of past exposures (lower recall bias) and more willing to cooperate
- Factors that influence hospital selection may be similar as for cases
How to calculate the interval p and cumulative p in an actuarial life table?
Community-based controls ( Definition, Pro 1, Con 2)
Def: Random probability sample of non-cases within the defined community that gave rise to cases.
Pro: Likely a high level of comparability;
Con:
- Logistics, costs, time intensive;
- Quality of exposure information may differ between cases and controls : controls may be healthier and less accurately recall past exposures
Comparison between Passive and Active surveillance
Passive:
- Pros: 1) Relatively inexpensive and easy to develop; 2) Provides a way to identify areas that need assistance
- Cons: 1) Quality of data may be low; 2) Underreporting of disease
Active:
- Pros: Allows for collection of more complete data
- Cons: Expensive and time consuming
RCT: Measures of effect
- Person-years exposure
- Incidence density (rates)
- Rate differences: (Placebo – Intervention)
- Rate ratio (relative risk): Intervention / Placebo
- Number Needed to Treat (NNT): 1/(Placebo rate – Intervention rate)
- Time to event analysis:
- Survival analysis (Kaplan-Meier curves – up stair like)
- Multivariable Regression (Cox regression)
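The rate-based measures in this list can be sketched as below. Event counts and person-years are hypothetical. Because the rates here are per person-year, the NNT is read as persons treated for one year, which is an assumption of this example.

```python
# Hypothetical trial results.
placebo_events, placebo_person_years = 50, 2000
intervention_events, intervention_person_years = 30, 2000

# Incidence density (rate) = events / person-years of follow-up.
placebo_rate = placebo_events / placebo_person_years
intervention_rate = intervention_events / intervention_person_years

rate_difference = placebo_rate - intervention_rate   # Placebo - Intervention
rate_ratio = intervention_rate / placebo_rate        # Intervention / Placebo
number_needed_to_treat = 1 / rate_difference         # 1 / (Placebo rate - Intervention rate)

print(f"Rate difference: {rate_difference * 1000:.1f} per 1,000 person-years")
print(f"Rate ratio: {rate_ratio:.2f}")
print(f"NNT: about {number_needed_to_treat:.0f} persons treated for 1 year to prevent one case")
```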
what to consider when choosing a study design?
- research question
- nature of outcome and exposure
- available resources/ time
- study logistics
- previous study findings/gaps to be filled
In an RCT: What do you need to know to estimate sample size?
- desired level of power*
- expected number of endpoints (expected rate)
- effect size (expected rate difference; number of cases)
- expected compliance (compute on low, mod, high)
- alpha level
- 1-sided or 2-sided test
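To show how these inputs combine, here is a sketch using a common normal-approximation formula for comparing two proportions. The formula choice, the function name, and the example risks are assumptions and not taken from the card; expected compliance is not modeled, and real trials typically rely on dedicated software.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, p_intervention,
                          alpha=0.05, power=0.80, two_sided=True):
    """Approximate n per group for detecting a difference between two proportions."""
    standard_normal = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = standard_normal.inv_cdf(1 - tail)   # critical value for the alpha level
    z_beta = standard_normal.inv_cdf(power)       # value for the desired power
    variance = (p_control * (1 - p_control)
                + p_intervention * (1 - p_intervention))
    effect_size = p_control - p_intervention      # expected risk difference
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect_size ** 2)

# Hypothetical expected event risks: 10% in control, 5% in intervention.
n_two_sided = sample_size_per_group(0.10, 0.05)
n_one_sided = sample_size_per_group(0.10, 0.05, two_sided=False)
n_high_power = sample_size_per_group(0.10, 0.05, power=0.90)

print("Two-sided, 80% power:", n_two_sided)
print("One-sided, 80% power:", n_one_sided)
print("Two-sided, 90% power:", n_high_power)
```

A one-sided test needs fewer participants, and higher power or a smaller expected difference needs more.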
case-control 2 major issues
Selection of cases & controls; Obtaining exposure information
In an RCT, when should we use blocking?
- Useful when a characteristic is particularly related with outcome (e.g., survival is worse among elderly women)
- Also useful when experimental sample is small (e.g., because of exclusions, limited funding, etc)
Observational studies include?
Descriptive and Analytic. Hypotheses often derive from descriptive studies.
What is a nested case-control study, how to do one, and what are the pros
Def: Case-control study conducted within a cohort study. Cohort serves as sampling frame for incident cases.
How?
- Baseline exposure assessment before disease occurs
- Follow-up for disease occurrence: Controls selected from cohort members who are without the disease at time of case occurrence; Multiple controls typically matched to a case
Pros:
- Efficient design for using stored samples and costly bioassays or other measures that would be cost-prohibitive to complete in entire cohort.
- Temporal sequence known.
- Density sampling (at-risk time is the same).
- Low likelihood of information bias (Exposure precedes Disease)
What is Observed Survival Rates
Analysis of time-related patterns of event
What should control look like?
Controls do not need to resemble cases in every respect other than disease status; they should resemble the population from which the cases derive (e.g., in age distribution).
In cohort studies, what are the ways to select a sample, with examples?
- Select on a characteristic not related to exposure
example: 1) Community of residence; 2) Occupation (e.g., Nurses Health Study).
- Select on Exposure status
example: Occupational exposures: Agricultural Health Study; Study of London Busmen
What are the two types of blinding?
- Single blinded – only participants blinded
- Double blinded – participants & investigators blinded
RCT: what is blinding used for? What is the most commonly used strategy?
- Used for concealment of group assignment.
- Placebo: Important for establishing incidence of adverse effects as well as for evaluating intervention effects.
Passive surveillance
Responsibility for reporting is on the healthcare providers and/or the district health officers
Case selection issues (3)
- Case definition (standardized diagnostic criteria, case classification, homogenous disease entity);
- Sources of cases (Hospital; Physician clinics: differences in case diagnostic criteria; Registries: recording patterns; General population would be preferred but it’s logistically difficult and expensive);
- Choose prevalent or incident cases
Why is prognosis important? (4)
- Better understanding of natural history of disease;
- Effectiveness of screening;
- Effectiveness of intervention;
- Identify high-risk subgroups
What’s the research question for case-control studies?
Is previous exposure to a specified factor different in persons that have a defined outcome (cases) compared to persons that do not have the outcome (controls)?
Individual matching definition and result
Def: Each control is matched to a case on specific factor(s).
Result: case-control pairs. Include matching variables as covariates in the statistical model to prevent introducing a bias.
Kaplan-Meier Method to measure prognosis
- Exact intervals span from one event to the next
- What is probability of surviving each time interval, given being alive at its start? (conditional P)
- What is the overall probability of surviving the entire observation period? (cumulative P)
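A from-scratch sketch of the Kaplan-Meier calculation described above: a conditional probability at each event time and their running product as the cumulative probability. The follow-up data and function names are hypothetical; in practice a survival analysis library would normally be used.

```python
def kaplan_meier(times, events):
    """Return (time, conditional_p, cumulative_p) at each event time.

    times:  follow-up time for each person
    events: 1 if the event (e.g., death) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    cumulative_p = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for time, event in data if time == t and event == 1)
        leaving = sum(1 for time, _ in data if time == t)  # deaths + censored at t
        if deaths:
            conditional_p = (at_risk - deaths) / at_risk   # survive t, given alive at t
            cumulative_p *= conditional_p
            curve.append((t, conditional_p, cumulative_p))
        at_risk -= leaving
        i += leaving
    return curve

def median_survival(curve):
    """First time at which cumulative survival drops to 50% or below."""
    for t, _, cumulative_p in curve:
        if cumulative_p <= 0.5:
            return t
    return None  # median not reached during follow-up

# Hypothetical follow-up in years; 0 marks a censored observation.
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, conditional_p, cumulative_p in curve:
    print(f"t={t}: conditional P={conditional_p:.3f}, cumulative P={cumulative_p:.3f}")
print("Median survival time:", median_survival(curve))
```

Plotting cumulative P against time gives the step-shaped survival curve, since the estimate changes only at event times.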
How to interpret the interval p and cumulative p in an actuarial life table?
The probability to survive year x provided they survived year x-1.
The probability of surviving xx years after being diagnosed with the disease.
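The two probabilities can be illustrated with a small actuarial life table. This sketch assumes the usual actuarial convention that people censored during an interval are counted as at risk for half of it; the interval counts are hypothetical.

```python
def actuarial_life_table(intervals):
    """intervals: list of (alive_at_start, deaths, withdrawn) per fixed interval.

    Returns a list of (interval_p, cumulative_p).
    """
    cumulative_p = 1.0
    rows = []
    for alive_at_start, deaths, withdrawn in intervals:
        # Assumption: withdrawals (censored) contribute half an interval at risk.
        effective_at_risk = alive_at_start - withdrawn / 2
        interval_p = 1 - deaths / effective_at_risk   # survive this interval, given alive at its start
        cumulative_p *= interval_p                    # survive from diagnosis through this interval
        rows.append((interval_p, cumulative_p))
    return rows

# Hypothetical yearly intervals after diagnosis.
table = actuarial_life_table([
    (100, 20, 10),   # year 1
    (70, 10, 10),    # year 2
    (50, 5, 5),      # year 3
])

for year, (interval_p, cumulative_p) in enumerate(table, start=1):
    print(f"Year {year}: interval P={interval_p:.3f}, cumulative P={cumulative_p:.3f}")
```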
Definition of prognosis?
How an individual is expected to progress after:
disease onset (or diagnosis);
intervention
RCT: Sources of potential bias?
- Inconsistent case definitions
- Unblinded data gatherers
- Large losses to follow-up
- Differential losses to follow-up between study groups
Experimental studies include?
Clinical Trial and Field Trial
Descriptive studies
distributions of exposures and outcomes in defined populations (Person, Place, Time)
What do we need to know to measure prognosis? (3)
Initiation of follow-up date (diagnosis, intervention)
Events (death, relapse, remission; lost to follow-up)
Follow-up time to event (hours, days, years)
Why case-control ratio and what should it be?
Case-control ratio is a major determinant of statistical power! Optimal ratio is 1:1 ; 4:1 generally is ceiling for balancing gains in statistical power against study efficiency.
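One way to see why about 4 controls per case is treated as a ceiling is a commonly cited approximation: with r controls per case, efficiency relative to an unlimited number of controls is roughly r / (r + 1). This approximation is an assumption brought in for illustration (it holds best near the null) and is not stated on the card.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency relative to unlimited controls: r / (r + 1)."""
    r = controls_per_case
    return r / (r + 1)

for r in (1, 2, 3, 4, 5, 10):
    print(f"{r} control(s) per case: relative efficiency {relative_efficiency(r):.2f}")
```

Going from 1 to 4 controls per case raises the approximate efficiency from 0.50 to 0.80, while each additional control beyond that adds only a few percentage points.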
2 design layouts for RCT
- Simple, non cross-over design
- Cross-over design
Ecological Fallacy
Inappropriate conclusions about individual-level relationships based on aggregate data