Midterm Flashcards

1
Q

Association vs. Causality

A

Causality requires meeting assumptions such as temporal relationship, strength of association, and dose-response relationship. Experimental studies tend to look at causality.

Association is when there is limited knowledge and you cannot say for sure that the exposure causes the outcome. Observational studies tend to look at association.

A study of association states its hypothesis as “is associated with”, while a study of causality states “increases/decreases the risk”.

2
Q

Descriptive Study

A

A study that describes the distribution of disease (e.g. person, place, or time).

The hypothesis is often implicit, such as “the distribution of disease varies by person, place or time”, but it can also be explicit.

3
Q

Analytic Study

A

Motivation is often to identify a causal determinant and find an association between exposure and outcome.

4
Q

Relative Risk

A

RR can mean incidence rate ratio, risk ratio (cumulative incidence ratio), hazard ratio, and odds ratio

5
Q

Bias

A

Systematic error in the design or conduct of a study that results in a measure of association among study participants that is meaningfully different from the true measure of association (e.g. that in the source population)

6
Q

Information Bias

A

Error due to collection of incorrect information about study participants.

Participants are classified into incorrect exposure or disease categories (misclassification)

7
Q

Selection Bias

A

Error arising from 1) criteria or procedures used to select study participants or 2) nonparticipation (occurring at initial enrollment or due to losses to follow-up)

8
Q

Direction of bias for RR

A

Axis 1: Upward vs. downward (this does not indicate whether the strength of association is being over- or underestimated)

Axis 2: Toward the null vs. away from the null

When assessing direction of bias the reference point is always the true RR. (e.g. if the True OR is .8 and the Obs OR is .2, then the bias is downward and away from the null).
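
The two axes can be checked mechanically. The sketch below is only an illustration: the function name `bias_direction` and the label "past the null" (for an observed RR that lands on the opposite side of 1 from the true RR) are assumptions, not course terminology. Distances from the null are compared on the log scale, since RR = 0.5 and RR = 2 are equally far from 1.

```python
import math


def bias_direction(true_rr: float, observed_rr: float) -> tuple[str, str]:
    """Classify bias on both axes, with the true RR as the reference point."""
    if observed_rr == true_rr:
        return ("none", "none")
    # Axis 1: is the observed RR numerically above or below the true RR?
    axis1 = "upward" if observed_rr > true_rr else "downward"
    # Axis 2: compare distances from the null (RR = 1) on the log scale.
    log_true, log_obs = math.log(true_rr), math.log(observed_rr)
    if log_true * log_obs < 0:
        axis2 = "past the null"  # hypothetical label: observed RR crossed 1
    elif abs(log_obs) > abs(log_true):
        axis2 = "away from the null"
    else:
        axis2 = "toward the null"
    return (axis1, axis2)


# The example from the card: true OR = 0.8, observed OR = 0.2
print(bias_direction(0.8, 0.2))  # ('downward', 'away from the null')
```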

9
Q

Strength of Association

A

The further from the null, the stronger the association.

Bias away from null overestimates the strength of association

Bias towards the null underestimates the strength of association

10
Q

Source population in a cohort study

A

The population that gave rise to the study sample. (should always include calendar time)

11
Q

General cohort

A

Defined by a factor unrelated to any particular exposure

Typically a convenience sample based on logistical advantages (e.g. willingness to participate, ease of recruitment, and/or follow-up)

Use of an internal comparison group

Uses RR

12
Q

Specific-exposure cohort

A

Defined by a specific exposure

Use of an external comparison group (e.g. general population).

Method to analyze is indirect standardization

Uses RR

Susceptible to selection bias (such as the healthy worker effect). The main issue is that the exposed cohort and the nonexposed external comparison group are not selected in the same fashion from the same source population. Selection from different source populations may result in different disease risk for reasons other than the exposure under study.

13
Q

Sources of selection bias

A
  1. Different criteria are used to select exposed and unexposed participants
  2. Selection of exposed or nonexposed participants is related to the development of the outcome of interest
  3. Loss to follow-up is related to both the exposure and the outcome of interest (differential losses to follow-up)
14
Q

Susceptibility to selection bias

A

Cohorts with internal comparison groups are less prone to selection bias than specific-exposure cohorts. Study participants are selected before the development of the disease and it is unlikely that future events will bias selection process. Cohorts using internal groups could have selection bias due to differential losses to follow-up.

Cohort using an external comparison group - healthy worker effect - RR is biased downward

specific-exposure cohorts are extremely prone to selection bias

15
Q

Differential losses to follow-up

A

A situation in which participants who drop out of a study differ from those who stay in the study. It produces selection bias when the losses are related to both the exposure and the outcome of interest.

16
Q

Source Population in case-control

A

The population that gave rise to the cases. Essentially, the population of persons who would have been identified as cases if they had developed the condition of interest during the time period in which the cases were identified.

Calendar time should be included

17
Q

Types of Source populations

A

Primary source population - well-defined (e.g. residence, calendar period), and specified a priori. Determines case ascertainment
Examples include:
- residents of a defined geographic area
- members of a health plan
- members of a general cohort

Secondary source population (more prone to selection bias than primary) - theoretically defined and inferred based on the method of case ascertainment. Case ascertainment method is defined a priori. “Would/if criterion” is employed.
Examples include:
- cases ascertained through a hospital “person who would attend the hospital if..”
- cases recruited through advertisements “person who would answer the ad if they were…”

18
Q

Case-control studies

A

a method of sampling controls from the source population such that the controls reflect exposure distribution in the source population that gave rise to the cases. Controls should be randomly sampled and representative of source population.

Uses odds ratio.

Case selection: ideally includes all cases that arise in the source population. In practice usually only a sample of cases is included, and that sample needs to be representative of all cases.

19
Q

Selection bias in case-control

A

Occurs if the distribution of the exposure under study among study cases differs from that among all cases that arose in the source population.

Occurs if the distribution of the exposure under study among study controls differs from that in the source population.

Prone to selection bias. Cases and controls are often selected through fundamentally different processes.

Cases:
- imperfect method of case ascertainment
- case non-participation
- case refusal, inability to locate cases, case too sick, case died

Controls:
- non-participation, control refusal, inability to locate
- random sampling from a primary source population is hard
- a secondary source population is difficult to operationally define

Partial non-participation among cases and/or controls

20
Q

Timeline of case and control recruitment

A

ascertain and recruit incident cases

accumulate controls during the study period at same rate that cases are being accumulated

source population is restricted to persons at risk of becoming a case

a control who later becomes a case serves as both a control and a case

21
Q

2 x 2 table

A

               cases    controls
exposed          a         b
non-exposed      c         d

odds ratio = ad/bc
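
A minimal sketch of the calculation, using the cell layout above (a = exposed cases, b = exposed controls, c = non-exposed cases, d = non-exposed controls). The function name and the counts are made up for illustration.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2 x 2 table: (a * d) / (b * c)."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is 0")
    return (a * d) / (b * c)


# Hypothetical counts: 40 exposed cases, 20 exposed controls,
# 60 non-exposed cases, 80 non-exposed controls.
print(round(odds_ratio(40, 20, 60, 80), 2))  # 2.67
```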

22
Q

Types of Case-control Studies

A

Population-based
- primary source population
- cases: all new cases of disease x that arise
- controls: representative sample of the source pop with respect to exposure

Hospital-based
- secondary source population
- cases: same as above but in a hospital
- controls: same as above, but it’s hard to achieve in a secondary source pop

Source pop can come from place of residence, insurance, access to a regular physician, etc.

One exception: if most residents of a defined geographic area would attend hospital A and no other hospitals if they contracted a disease, then cases could be considered population-based and population-based controls can be used.

Nested
- primary source population
- cases and controls same as above
- typically conducted when the exposures of interest are measured by assay of stored biologic specimens

23
Q

Pros and Cons of hospital-based case-control studies

A

Pro
- easily accessible and high participation rate
- protect against recall bias

Con
- nonrandom sample of the source pop, most of whom are healthy
- some may not even be members of source population

24
Q

Strategies for selection of hospital controls

A

only include patients admitted for diseases for which there is no suspicion of an association with the exposure under study

include controls with a variety of diseases

include diseases thought to have a comparable source population as the disease under study

base exclusions on diagnosis at the current hospitalization, not on past medical history

25
Q

pros and cons of nested

A

pro
- exposure measured at baseline before development of disease
- selection of controls by random sampling from a well-defined, primary source population

sources of selection bias
- incomplete case ascertainment
- cohort losses to follow-up
- selection bias associated with participant selection in the entire cohort itself

26
Q

Confounding

A

Occurs when a variable is associated with both the exposure and the outcome and is not a mediator on the causal pathway.

Can be caused by an imbalance b/w exposed and nonexposed groups in another, extraneous exposure (confounder)

If there is confounding and the variable is identified and measured, then can adjust as long as there was no bias in selection of cases or controls within each stratum of the covariate.

For example - if SES is only associated with exposure, but there is over selection of high SES controls, then there is an artifactual inverse association with the outcome leading to confounding. Can be addressed through stratification (Mantel-Haenszel method)

27
Q

Key test of validity of a case control study

A

Controls and the source pop should be alike with respect to the exposure under study

28
Q

The best indication of the presence of confounding is

A

A meaningful difference between the unadjusted RR and the adjusted RR

Calculate and inspect the RR for each stratum of the potential confounder. If the stratum-specific RRs are similar to one another but differ from the unadjusted RR, there is potential confounding. If they differ from one another, it may be effect modification.

29
Q

When and how to address confounding

A

Design phase - identify potential confounders by consulting the lit

data collection - measure potential confounders accurately

analysis - check theoretical confounders and other study variables. determine if there is confounding.

30
Q

Methods used to adjust for confounding in analysis stage

A

methods based on stratification

multivariable statistical models

standardization (direct or indirect)

31
Q

Magnitude of confounding

A

$$\text{magnitude of confounding (\%)} = \frac{RR_{\text{unadj}} - RR_{\text{adj}}}{RR_{\text{adj}}} \times 100$$

A difference of more than 10% is taken to indicate meaningful confounding. No need to look at the p-value here.
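
As a quick sketch of the arithmetic: the function names and the example RRs are hypothetical. Taking the absolute value before comparing with 10% is an assumption made here so that confounding in either direction is flagged.

```python
def confounding_magnitude(rr_unadj: float, rr_adj: float) -> float:
    """Percent difference between the unadjusted and adjusted RR."""
    return (rr_unadj - rr_adj) / rr_adj * 100


def is_meaningful_confounding(rr_unadj: float, rr_adj: float,
                              threshold_pct: float = 10.0) -> bool:
    # Assumption: compare the absolute percent change with the 10% rule.
    return abs(confounding_magnitude(rr_unadj, rr_adj)) > threshold_pct


# Hypothetical example: unadjusted RR = 2.0, adjusted RR = 1.6
print(round(confounding_magnitude(2.0, 1.6), 1))  # 25.0
print(is_meaningful_confounding(2.0, 1.6))        # True
```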

32
Q

Mantel-Haenszel summary RRs

A

To calculate:

Set up i 2x2 tables (where i is the # of strata or categories of a potential confounding variable)

Compute the weighted average of the stratum-specific RRs (weighted by the # of subjects or person-time experience in each stratum)

For cohort - risk ratio or rate ratio

For case-control studies - odds ratio

$$OR_{MH} = \frac{\sum_i a_i d_i / N_i}{\sum_i b_i c_i / N_i}$$
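
The Mantel-Haenszel odds ratio can be sketched in a few lines. Each stratum is given as a tuple (a, b, c, d) in the same cell layout as the 2 x 2 table card, with N = a + b + c + d. The function name and the two strata below are made up for illustration.

```python
def mantel_haenszel_or(strata: list[tuple[int, int, int, int]]) -> float:
    """Summary OR across strata: sum(a*d/N) / sum(b*c/N)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    if denominator == 0:
        raise ValueError("Mantel-Haenszel OR is undefined: sum of b*c/N is 0")
    return numerator / denominator


# Two hypothetical strata of a categorical confounder, as (a, b, c, d)
strata = [(10, 20, 30, 40), (15, 5, 25, 55)]
print(round(mantel_haenszel_or(strata), 2))  # 1.69
```

With a single stratum the result reduces to the ordinary ad/bc.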

33
Q

When to use MH

A

adjustment for a single confounder that is a categorical variable

simult adj for 2 or 3 confounders, as long as the number of strata for each confounder is relatively small

MH only is to be used for categorical variables

cannot use for a large number of strata - too cumbersome

34
Q

Generalized linear models

A

Linear - no RR estimated
Y = b0 + b1 *X1
To find the change in Y per 10 years (with X1 in years), just use b1 * 10
Null value is 0

The following models are all log-transformed:

Logistic - odds ratio
- used in case-control studies
-other studies with binary dependent variables
-risk prediction
-ignores time

ln(odds of Y) = b0 + b1*X…

Poisson (log-linear) - IRR
- cohort studies with person-time data
- incidence rate studies that use aggregate level data

ln(incidence rate of Y) = b0 + b1*X…

Cox Proportional Hazards - HRR
- studies with binary outcome and person-time data
- cohort studies
- RCTs
- Survival analysis

ln[h(t)] = ln[h0(t)] + b1*X…

35
Q

Unconditional logistic regression vs conditional logistic regression

A

unconditional - used in unmatched case control (can also use stratification such as the traditional Mantel-Haenszel method with stratification by the matching factors for unmatched case control)

conditional - used in some matched case-control studies

36
Q

Deriving RR (per N-year increase of age) in a log-transformed model

A

Age is just an example; it could be any continuous variable.

ex: N = 10

beta(per year) = .049
RR(per year) = e^.049 = 1.05

beta(per year)*10 = .49 then e^.49 = 1.63

The result does not depend on the reference level: the starting age could be 10 or could be 20, but the RR for an N=10 increase will always be the same.
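
The same calculation as a short sketch. The function name is hypothetical; the per-year RR of 1.05 is the one from the card, and beta is recovered from it as ln(1.05).

```python
import math


def rr_per_n_units(beta_per_unit: float, n: float) -> float:
    """RR for an n-unit increase in a log-transformed model: exp(beta * n)."""
    return math.exp(beta_per_unit * n)


beta_per_year = math.log(1.05)  # coefficient that gives RR = 1.05 per year
print(round(rr_per_n_units(beta_per_year, 1), 2))   # 1.05
print(round(rr_per_n_units(beta_per_year, 10), 2))  # 1.63
```

Multiplying beta by N and then exponentiating is the same as raising the per-year RR to the Nth power (1.05^10), which is why the starting age does not matter.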

37
Q

Categorical or continuous in model for a natural order variable? (test for trend)

A

Can model it as a single term; if the relationship is linear, it can be used as continuous.

Test for trend can be used to assess evidence of an exponential trend (linear on a log scale). only applied to exposures with a natural order.

To do this:
1) model variable as categorical variable to capture shape of the dose-response relationship

2) model variable as a single term in a separate regression model to test for trend

p-value for trend = p-value for b1 in the single term model. if p<.05 then it is significant

38
Q

Standardization

A

Stratification-based method of comparing rates of an outcome between two populations that have different distributions of one or more confounders

The aim is to make the comparisons fair (i.e. to remove confounding) by forcing the two populations to have the same covariate distribution

39
Q

Indirect Standardization

A

Used for retrospective (historical) cohorts with an external comparison group, such as specific-exposure cohorts.

Standardization covariates MUST be categorical

40
Q

Standardized incidence/mortality ratio

A

incidence ratio (SIR):

$$\text{SIR} = \frac{\text{total observed cases}}{\text{total expected cases}}$$

mortality ratio (SMR):

$$\text{SMR} = \frac{\text{total observed deaths}}{\text{total expected deaths}}$$
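
A sketch of how the ratio might be computed under indirect standardization. It assumes the usual approach in which expected events are obtained by applying stratum-specific rates from the external comparison population to the cohort's person-time in each stratum. The function names, age strata, rates, and counts are all made up.

```python
def expected_events(person_time: list[float], reference_rates: list[float]) -> float:
    """Expected events = sum over strata of cohort person-time x reference rate."""
    if len(person_time) != len(reference_rates):
        raise ValueError("need one reference rate per stratum")
    return sum(pt * rate for pt, rate in zip(person_time, reference_rates))


def standardized_ratio(observed: float, person_time: list[float],
                       reference_rates: list[float]) -> float:
    """SIR or SMR: total observed events / total expected events."""
    return observed / expected_events(person_time, reference_rates)


# Hypothetical cohort: person-years in three age strata, and the
# general-population rates (per person-year) for the same strata.
person_years = [1000, 2000, 500]
population_rates = [0.001, 0.005, 0.02]
print(round(expected_events(person_years, population_rates), 1))         # 21.0
print(round(standardized_ratio(30, person_years, population_rates), 2))  # 1.43
```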

41
Q

Residual confounding

A

Confounding that remains after your study adjusts for a variable or set of related variables, because the adjustment does not completely remove the confounding by that/those variables.

Coarse categorization: This may be because the categories used are too broad, so that there are heterogeneous groups of people within each stratum. This is problematic because these heterogeneous groups of people could also differ with respect to their exposure prevalence and risk of the outcome.

Suboptimal modeling of the confounder in a multivariable model (e.g. modeling a covariate as continuous when the true dose-response curve is U-shaped)

Inadequate adjustment for complex, multidimensional confounders, such as smoking, SES, and health status

Inadequate measurement of the confounder (measurement error - unvalidated data collection instrument), collection of insufficiently detailed information

**If confounding remains due to not adjusting at all for a particular confounder this is NOT considered residual confounding.

42
Q

Health status as confounder

A

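Healthy vaccinee effect - seniors who are unvaccinated tend to be at high short-term risk of death (i.e. in poorer health), so health status confounds the association between vaccination and mortality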

43
Q

Addressing residual confounding

A

Measurement - measure potential confounders as carefully as the exposure under study. Especially if multidimensional

Data analysis - Use sufficiently fine covariate categorization, optimize modeling of covariates in multivariable models, strive to capture full dimensionality of multidimensional confounders in multivariable models

*however need to take into account statistical imperative of model parsimony - ratio of # of outcomes to # of covariates should be more than 10.

interpretation - Be transparent about the residual confounding in interpretation and how it could be better accounted for.

44
Q

Matching in Cohort Studies

A

Adjust for one or more potential confounders in the design phase of your study

Select non-exposed participants who are similar to the exposed participants with respect to the distribution of one or more potential confounders.

The potential confounders used for matching are called matching factors. Once matched, there is no need to account for them in the analysis phase, but only if there’s complete follow-up

45
Q

Matching in case-control studies

A

to adjust for one or more potential confounders in the design phase of the study

selection of controls who are similar to cases with respect to their distribution of one or more potential confounders

However matching in the design phase alone does not completely remove confounding and so will need to still adjust in the analysis phase

matching intentionally introduces selection bias and creates a new, superimposed confounding toward the null

Matching on a true confounder increases statistical efficiency by optimizing precision

46
Q

Frequency matching

A

Selection of controls such that the distribution(s) of one or more potential confounders is/are similar in cases and controls

Often used when matching factors are demographic variables (e.g. age, sex, race)

For example, if some strata have 0 individuals, you risk not being able to use the data from all subjects in the study, leading to reduced statistical efficiency

47
Q

Individual Matching

A

Selection of one or more controls that are identical to a given case with respect to one or more potential confounders

Useful for controlling for a confounder using “fine stratification” (mini stratum)

matching factors that are multidimensional confounders

using risk-set sampling of controls in nested case-control studies

The matched set is the stratum

Cannot do twin studies with unmatched case-control

Must use conditional logistic regression (no need to include the matching factors as covariates) OR stratification with a Mantel-Haenszel matched analysis (McNemar test). This gives the matched OR

48
Q

Nested Case-Control Studies with Matching

A

For each case, N matched controls can be randomly sampled from the case’s risk set
- can restrict the risk set by matching factors

Enables selection of control with the same risk set as case
- same concurrent time at risk for development of outcome

49
Q

Simplest Mantel-Haenszel matched analysis

A

four possible combinations of matched pairs

two concordant and two discordant. Only need to look at the discordant pairs.

q r s t

q and t are the concordant pairs; r and s are the discordant pairs

r/s = Matched Odds Ratio
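
A sketch of the matched-pair calculation. The card does not say which discordant cell is which, so this example assumes r = pairs where the case is exposed and the control is not, and s = pairs where the control is exposed and the case is not. The function names and the pair data are hypothetical.

```python
from collections import Counter


def matched_pair_counts(pairs: list[tuple[bool, bool]]) -> tuple[int, int, int, int]:
    """Count pairs given as (case_exposed, control_exposed) into q, r, s, t."""
    counts = Counter(pairs)
    q = counts[(True, True)]    # concordant: both exposed
    r = counts[(True, False)]   # discordant: only the case exposed (assumed)
    s = counts[(False, True)]   # discordant: only the control exposed (assumed)
    t = counts[(False, False)]  # concordant: neither exposed
    return q, r, s, t


def matched_odds_ratio(r: int, s: int) -> float:
    """Matched OR uses only the discordant pairs: r / s."""
    if s == 0:
        raise ValueError("matched OR is undefined when s is 0")
    return r / s


# Hypothetical data: 30 matched case-control pairs
pairs = ([(True, True)] * 5 + [(True, False)] * 12
         + [(False, True)] * 4 + [(False, False)] * 9)
q, r, s, t = matched_pair_counts(pairs)
print(q, r, s, t)                # 5 12 4 9
print(matched_odds_ratio(r, s))  # 3.0
```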

50
Q

Can association between matching factors and disease be studied?

A

No, because matching forces controls to be the same as cases with respect to the matching factor therefore, there is no way to find the association.

51
Q

Overmatching

A

Overmatching generally refers to matching that is counter productive, by either causing bias or reducing efficiency. This causes a new superimposed confounding toward the null and leads to loss of statistical efficiency.

Overmatching must be corrected in analysis phase

Matching on a mediator in a causal pathway between exposure and disease will bias the effect estimate towards the null

Matching on a non-confounder that is associated with exposure, but not a risk factor for disease

52
Q

Survival analysis

A

Study of the distribution of time elapsed from a baseline time to an outcome(event)

Study of how exposures (including treatments) affect the distribution of time to event

Used for two study designs: cohort studies and RCTs

Baseline date examples: date of entry into a cohort, birth date, etc.

outcome examples: death, incident disease, disease cure, etc.

It is better to experience a beneficial outcome earlier than later

It is better to experience an adverse outcome later than earlier

53
Q

Cumulative incidence vs. cumulative survival

A

CI (0 to 1) is the proportion of a specified population at risk that experienced the outcome under study during a specified time period

Probability (risk) of experiencing the outcome under study in the specified time period

CS (0 to 1) is the proportion of a specified population at risk that does NOT experience the outcome under study (i.e. “survives”) during a specified time period

Probability (risk) of NOT experiencing the outcome under study in the specified time period

Can both be calculated directly if closed cohort

54
Q

CS + CI =?

A

1

55
Q

CI curve vs CS curve

A

CI curve is the proportion of subjects who have experienced the event as a function of time since baseline

CS curve is the proportion of subjects who have NOT experienced the event as a function of time since baseline

56
Q

Median survival time

A

where CI = CS = .5

57
Q

How to plot cumulative incidence/survival as a step function

A
  1. rank survival times from lowest to highest
  2. create intervals that start when one or more events occur
  3. calculate cumulative incidence during interval
  4. cumulative survival is calculated the same way, except that events are subtracted from the total (survivors/total) instead of being added up (events/total)
58
Q

Cumulative Incidence in open cohort

A

Cumulative incidence will be underestimated because the calculation assumes that those who withdrew, were lost to follow-up, or died did not experience the outcome.

59
Q

Kaplan Meier method

A
  1. Rank survival times from lowest to highest
  2. divide survival time into intervals that start when one or more events occur (ei and ci) and calculate # at risk at start of each interval (ni)
  3. calculate probability of surviving each interval (pi = (ni-ei)/ni)
  4. calculate cumulative survival during each interval (Si = Si-1 x pi) - before the first event, cumulative survival is always 1
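
The four steps above can be sketched in plain Python. This is a minimal illustration, not a replacement for a survival-analysis library. The function name and the follow-up data are made up, and it assumes the common convention that a subject censored at time t is still counted as at risk at t.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (event time, cumulative survival) pairs.

    events[i] is 1 if subject i had the event at times[i], 0 if censored.
    Cumulative survival is 1.0 before the first event time.
    """
    data = sorted(zip(times, events))                 # step 1: rank times
    event_times = sorted({t for t, e in data if e})   # step 2: intervals start at events
    survival = 1.0
    curve = []
    for t in event_times:
        n_i = sum(1 for u, _ in data if u >= t)       # number at risk at start of interval
        e_i = sum(1 for u, e in data if u == t and e)  # events at this time
        p_i = (n_i - e_i) / n_i                       # step 3: probability of surviving interval
        survival *= p_i                               # step 4: S_i = S_(i-1) x p_i
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times; 0 marks a censored subject
times = [2, 3, 3, 5, 8, 9]
events = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored subjects leave the risk set without counting as events, which is how the method takes censoring into account.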
60
Q

Censoring

A

Termination of follow-up for a subject on a specified date, so that it is unknown whether the outcome occurred or would have occurred after that date.

Kaplan-Meier survival estimates calculate cumulative incidence/survival taking censoring into account, but assume that censoring is unbiased

61
Q

Log-rank test

A

Compares K-M curves for 2 or more groups.

62
Q

Stratified log rank test

A

Compares K-M curves for 2 or more groups using stratification to control for confounding

limitation: method breaks down if data becomes too sparse

63
Q

Interpreting and presenting K-M curves

A

The further to the right, the fewer subjects at risk and the more uncertainty

Good practice to end the plot at a follow-up time when only 10-20% of subjects are still at risk.

64
Q

Two main survival analysis methods

A

KM survival curves (descriptive and cannot readily calculate RR or adjust for multiple covariates) and cox proportional hazards regression

65
Q

Cox proportional hazards regression

A

allows baseline hazard to vary over time

assumes the hazard ratio is constant over time which is equivalent to stating that the exposure-outcome relationship is NOT modified by follow-up time (therefore not an effect modifier) - if PH assumption is violated then follow-up time is a modifier and stratification by follow-up time would be needed

allows adjustment of multiple covariates and provides an RR

66
Q

Hazard

A

the instantaneous incidence rate at a point in time (change in number of new cases at time point) - basically the slope between two points on the curve.

Incidence rate could change with time

67
Q

When is Proportional Hazards assumption not met

A

When the survival curves of the groups being compared cross one another.

68
Q

Cause-specific mortality is always ___ overall mortality

A

less than or equal to

69
Q

cause-specific survival is always _____ overall survival

A

more than or equal to

70
Q

Measuring cause-specific mortality is ____ logistically challenging than measuring overall mortality

A

more

71
Q

Methods for assessing cause-specific mortality

A

direct methods (gold standard)- determine the cause of death for each decedent. This can be done by review of medical records or death certificates but medical records is better.

indirect methods - take overall-mortality estimates and apply a correction to them, in order to estimate the number of deaths due to a specific cause - through relative survival

72
Q

Relative survival

A

Provides an estimate of cause-specific survival in a cohort. corrects for deaths from causes other than the disease under study

RS = observed OS/expected OS

If expected OS = 1, RS = observed OS

If expected OS<1, RS > or equal to observed OS
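
A small sketch of the ratio and the two special cases listed above. The function name and the survival figures are hypothetical.

```python
def relative_survival(observed_os: float, expected_os: float) -> float:
    """Relative survival = observed overall survival / expected overall survival."""
    if not 0 < expected_os <= 1:
        raise ValueError("expected OS must be in (0, 1]")
    return observed_os / expected_os


# Hypothetical 5-year figures: 60% observed OS in the diseased cohort,
# 80% expected OS in a demographically similar general population.
print(round(relative_survival(0.60, 0.80), 2))  # 0.75

# If expected OS = 1, RS equals observed OS
print(relative_survival(0.60, 1.0))  # 0.6
```

Because expected OS is at most 1, dividing by it can only leave observed OS unchanged or raise it.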

73
Q

Expected OS

A

Usually the expected OS of persons of the same demographics and calendar period, taken from publicly available vital statistics data

Key assumption: OS in the diseased cohort would be the same as the OS of the comparison population, if the cohort members did not have the disease (assuming that the only difference between the two cohorts is the disease).

74
Q

Effect modification

A

Variation in the magnitude of the association between an exposure and an outcome across strata of a second exposure (the effect modifier)

Has an underlying public health, clinical, biologic, or psychosocial basis. Not merely a statistical phenomenon.

Can be assessed through stratified analysis and multivariable models

Effect modification is reciprocal since there is an interaction

75
Q

Effect modification via stratified analysis

A

If stratify and RR for each stratum is not similar then there is a potential for effect modification. If this is the case you can then calculate a p-value for heterogeneity (interaction) (this is a likelihood ratio test).

If p-value for heterogeneity is significant, then effect modification/interaction, if not then no effect modification/interaction.

In a multivariable model, calculate the p-value of the interaction term or, if there are multiple interaction terms, of the interaction terms in aggregate