Midterm Flashcards

Question 1

Q

Association vs. Causality

Answer

A

Causality requires meeting criteria such as a temporal relationship, strength of association, and a dose-response relationship. Experimental studies tend to look at causality.

Association is when there is limited knowledge and you cannot say for sure that the exposure causes the outcome. Observational studies tend to look at association.

A study of association states its hypothesis as “is associated with”, while a study of causality states “increases/decreases the risk”.

Question 2

Q

Descriptive Study

Answer

A

A study that describes the distribution of disease (e.g. by person, place, or time).

The hypothesis is often implicit, such as “the distribution of disease varies by person, place, or time”, but it can also be explicit.

Question 3

Q

Analytic Study

Answer

A

Motivation is often to identify a causal determinant and find an association between exposure and outcome.

Question 4

Q

Relative Risk

Answer

A

RR can mean incidence rate ratio, risk ratio (cumulative incidence ratio), hazard ratio, and odds ratio

Question 5

Q

Bias

Answer

A

Systematic error in the design or conduct of a study that results in a measure of association among study participants that is meaningfully different from the true measure of association (e.g. such as that in the source population)

Question 6

Q

Information Bias

Answer

A

Error due to collection of incorrect information about study participants.

Participants are classified into incorrect exposure or disease categories (misclassification)

Question 7

Q

Selection Bias

Answer

A

Error arising from 1) criteria or procedures used to select study participants or 2) nonparticipation (occurring at initial enrollment or due to losses to follow-up)

Question 8

Q

Direction of bias for RR

Answer

A

Axis 1: Upward vs. downward (this alone does not tell you whether the strength of association is being over- or underestimated)

Axis 2: Toward the null vs. away from the null

When assessing direction of bias the reference point is always the true RR. (e.g. if the True OR is .8 and the Obs OR is .2, then the bias is downward and away from the null).
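
The two axes can be checked mechanically. The sketch below is only an illustration: the function name is hypothetical, distance from the null (RR = 1) is compared on the log scale, and the label "across the null" for an observed RR that lands on the opposite side of 1 is an assumption that the card itself does not cover.

```python
import math

def bias_direction(true_rr, observed_rr):
    """Classify bias on both axes, using the true RR as the reference point."""
    if observed_rr == true_rr:
        return ("no bias", "no bias")
    # Axis 1: is the observed RR numerically above or below the true RR?
    axis1 = "upward" if observed_rr > true_rr else "downward"
    # Axis 2: compare distances from the null (RR = 1) on the log scale.
    log_true, log_obs = math.log(true_rr), math.log(observed_rr)
    if log_true * log_obs < 0:
        axis2 = "across the null"  # assumed label, not from the card
    elif abs(log_obs) > abs(log_true):
        axis2 = "away from the null"
    else:
        axis2 = "toward the null"
    return (axis1, axis2)

# The example from the card: true OR = 0.8, observed OR = 0.2
print(bias_direction(0.8, 0.2))  # ('downward', 'away from the null')
```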

Question 9

Q

Strength of Association

Answer

A

The further from the null, the stronger the association.

Bias away from null overestimates the strength of association

Bias towards the null underestimates the strength of association

Question 10

Q

Source population in a cohort study

Answer

A

The population that gave rise to the study sample. (should always include calendar time)

Question 11

Q

General cohort

Answer

A

Defined by a factor unrelated to any particular exposure

Typically a convenience sample based on logistical advantages (e.g. willingness to participate, ease of recruitment, and/or follow-up)

Use of an internal comparison group

Uses RR

Question 12

Q

Specific-exposure cohort

Answer

A

Defined by a specific exposure

Use of an external comparison group (e.g. general population).

Method to analyze is indirect standardization

Uses RR

Susceptible to selection bias (such as healthy worker effect) - The main issue is that the exposed cohort and nonexposed external comparison group are not selected in the same fashion from the same source population. Selection from different source populations may result in different disease risk for reasons other than the exposure under study

Question 13

Q

Sources of selection bias

Answer

A

different criteria are used to select exposed and unexposed participants
Selection of exposed or nonexposed participants is related to the development of the outcome of interest
Loss to follow-up is related to both the exposure and the outcome of interest (differential losses to follow-up)

Question 14

Q

Susceptibility to selection bias

Answer

A

Cohorts with internal comparison groups are less prone to selection bias than specific-exposure cohorts. Study participants are selected before the development of the disease and it is unlikely that future events will bias selection process. Cohorts using internal groups could have selection bias due to differential losses to follow-up.

Cohort using an external comparison group - healthy worker effect - RR is biased downward

specific-exposure cohorts are extremely prone to selection bias

Question 15

Q

Differential losses to follow-up

Answer

A

Losses to follow-up in which the participants who drop out of a study differ from those who stay with respect to both the exposure and the outcome of interest

Question 16

Q

Source Population in case-control

Answer

A

The population that gave rise to the cases. Essentially, the population of persons who would have been identified as cases if they had developed the condition of interest during the time period in which the cases were identified.

Calendar time should be included

Question 17

Q

Types of Source populations

Answer

A

Primary source population - well-defined (e.g. residence, calendar period) and specified a priori. Determines case ascertainment.
Examples include:
- residents of a defined geographic area
- members of a health plan
- members of a general cohort

Secondary source population (more prone to selection bias than primary) - theoretically defined and inferred based on the method of case ascertainment. The case ascertainment method is defined a priori. “Would/if criterion” is employed.
Examples include:
- cases ascertained through a hospital: “person who would attend the hospital if..”
- cases recruited through advertisements: “person who would answer the ad if they were…”

Question 18

Q

Case-control studies

Answer

A

a method of sampling controls from the source population such that the controls reflect exposure distribution in the source population that gave rise to the cases. Controls should be randomly sampled and representative of source population.

Uses odds ratio.

Case selection: ideally includes all cases that arise in the source population. In reality, usually only a sample of cases is included, and that sample needs to be representative of all cases.

Question 19

Q

Selection bias in case-control

Answer

A

Occurs if the exposure distribution among study cases differs from that among all cases that arose in the source population.

Occurs if the exposure distribution among study controls differs from that in the source population.

Case-control studies are prone to selection bias because cases and controls are often selected through fundamentally different processes.

Cases:
- imperfect method of case ascertainment
- case non-participation
- case refusal, inability to locate cases, case too sick, case died

Controls:
- non-participation, control refusal, inability to locate
- random sampling from a primary source population is hard
- a secondary source population is difficult to operationally define

Partial non-participation among cases and/or controls

Question 20

Q

Timeline of case and control recruitment

Answer

A

ascertain and recruit incident cases

accumulate controls during the study period at same rate that cases are being accumulated

source population is restricted to persons at risk of becoming a case

a control who later becomes a case serves as both a control and a case

Question 21

Q

2 x 2 table

Answer

A

              cases   controls
exposed         a        b
non-exposed     c        d

odds ratio = ad/bc
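
As a quick check of the formula, here is a minimal sketch using the cell labels from the table above (a = exposed cases, b = exposed controls, c = non-exposed cases, d = non-exposed controls). The function name and the counts are made up for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table: cross-product ad / bc."""
    return (a * d) / (b * c)

# Hypothetical counts: 40 exposed cases, 20 exposed controls,
# 60 non-exposed cases, 80 non-exposed controls.
print(round(odds_ratio(40, 20, 60, 80), 2))  # 2.67
```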

Question 22

Q

Types of Case-control Studies

Answer

A

Population-based
- primary source population
- cases: all new cases of disease x that arise
- control: rep sample of the source pop with respect to exposure

Hospital-based
- secondary source population
- cases: same as above but in a hospital
- control: same as above, but it’s hard to achieve in a secondary source pop

Source pop can come from place of residence, insurance, access to a regular physician, etc.

One exception: if most residents of a defined geographic area would attend hospital A and no other hospitals if they contracted a disease then cases could be considered population-based and population-based controls can be used.

Nested
- primary source population
- case and control same as above
- typically conducted when the exposures of interest are measured by assay of stored biologic specimens

Question 23

Q

Pros and Cons of hospital-based case-control studies

Answer

A

Pro
- easily accessible and high participation rate
- protect against recall bias

Con
-nonrandom sample of the source pop, most of whom are healthy
-some may not even be members of source population

Question 24

Q

Strategies for selection of hospital controls

Answer

A

only include patients admitted for diseases for which there is no suspicion of an association with the exposure under study

include controls with a variety of diseases

include diseases thought to have a comparable source population as the disease under study

base exclusions on diagnosis at the current hospitalization, not on past medical history

Question 25

Q

pros and cons of nested

Answer

A

pro
- exposure measured at baseline before development of disease
-selection of controls by random sampling from a well-defined, primary source population

sources of selection bias
- incomplete case ascertainment
-cohort losses to follow-up
-selection bias associated with participant selection in the entire cohort itself

Question 26

Q

Confounding

Answer

A

Occurs when a variable is associated with both the exposure and the outcome and is not a mediator on the causal pathway.

Can be caused by an imbalance between exposed and nonexposed groups in another, extraneous exposure (the confounder)

If there is confounding and the variable is identified and measured, then you can adjust for it, as long as there was no bias in the selection of cases or controls within each stratum of the covariate.

For example - if SES is only associated with exposure, but there is over-selection of high-SES controls, then there is an artifactual inverse association with the outcome, leading to confounding. Can be addressed through stratification (Mantel-Haenszel method)

Question 27

Q

Key test of validity of a case control study

Answer

A

Controls and the source pop should be alike with respect to the exposure under study

Question 28

Q

The best indication of the presence of confounding is

Answer

A

A meaningful difference between the unadjusted RR and the adjusted RR

Calculate and inspect the RR for each stratum of the potential confounder. If the stratum-specific RRs are similar to each other but differ from the unadjusted RR, there is potential confounding. If they differ from each other, it may be effect modification

Question 29

Q

When and how to address confounding

Answer

A

Design phase - identify potential confounders by consulting the lit

data collection - measure potential confounders accurately

analysis - check theoretical confounders and other study variables. determine if there is confounding.

Question 30

Q

Methods used to adjust for confounding in analysis stage

Answer

A

methods based on stratification

multivariable statistical models

standardization (direct or indirect)

Question 31

Q

Magnitude of confounding

Answer

A

Magnitude of confounding (%) = (RRunadj - RRadj) / RRadj x 100

A difference of more than 10% is taken to indicate confounding. There is no need to look at a p-value here.
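
The calculation can be sketched as below. The function names and the example RRs are hypothetical, and comparing the absolute value against the 10% threshold (so that negative differences also count) is an assumption, not something the card states.

```python
def confounding_pct(rr_unadj, rr_adj):
    """Percent difference between unadjusted and adjusted RR, relative to the adjusted RR."""
    return (rr_unadj - rr_adj) / rr_adj * 100

def suggests_confounding(rr_unadj, rr_adj, threshold_pct=10.0):
    # Assumption: the sign is ignored when comparing against the threshold.
    return abs(confounding_pct(rr_unadj, rr_adj)) > threshold_pct

# Hypothetical values: unadjusted RR = 2.0, adjusted RR = 1.6
print(round(confounding_pct(2.0, 1.6), 1))  # 25.0
print(suggests_confounding(2.0, 1.6))       # True
```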

Question 32

Q

Mantel-Haenszel summary RRs

Answer

A

To calculate:

Set up i 2x2 tables (where i is the # of strata or categories of a potential confounding variable)

Compute the weighted average of the stratum-specific RRs (weights are based on the # of subjects or person-time experience in each stratum)

For cohort - risk ratio or rate ratio

For case-control studies - odds ratio

ORmh = sum(ai*di/Ni) / sum(bi*ci/Ni), summed over the i strata (Ni = total # of subjects in stratum i)
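
A minimal sketch of the ORmh formula, assuming each stratum is given as a tuple (a, b, c, d) = (exposed cases, exposed controls, non-exposed cases, non-exposed controls) and that Ni is the total of the four cells. The function name and the counts are made up for illustration.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) tables."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d           # Ni: total subjects in this stratum
        numerator += a * d / n      # sum of ai*di/Ni
        denominator += b * c / n    # sum of bi*ci/Ni
    return numerator / denominator

# Two hypothetical strata of a categorical confounder
strata = [(10, 20, 30, 40), (15, 5, 25, 55)]
print(round(mantel_haenszel_or(strata), 2))  # 1.69
```

With a single stratum the result reduces to the ordinary ad/bc odds ratio.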

Question 33

Q

When to use MH

Answer

A

Adjustment for a single confounder that is a categorical variable

Simultaneous adjustment for 2 or 3 confounders, as long as the number of strata for each confounder is relatively small

MH is only to be used for categorical variables

Cannot use for a large number of strata - too cumbersome

Question 34

Q

Generalized linear models

Answer

A

Linear - no RR estimated
Y = b0 + b1 *X1
To find the change in Y per 10-unit increase in X1 (e.g. per 10 years), just multiply: b1 * 10
Null value is 0

The following models are all log-transformed:

Logistic - odds ratio
- used in case-control studies
- other studies with binary dependent variables
- risk prediction
- ignores time

ln(odds of Y) = b0 + b1*X…

Poisson (log-linear) - IRR
- cohort studies with person-time data
- incidence rate studies that use aggregate level data

ln(incidence rate of Y) = b0 + b1*X…

Cox Proportional Hazards - HRR
- studies with binary outcome and person-time data
- cohort studies
- RCTs
- Survival analysis

ln[h(t)] = ln[h0(t)] + b1*X…

Question 35

Q

Unconditional logistic regression vs conditional logistic regression

Answer

A

unconditional - used in unmatched case control (can also use stratification such as the traditional Mantel-Haenszel method with stratification by the matching factors for unmatched case control)

conditional - used in some matched case-control studies

Question 36

Q

Deriving RR (per N-year increase of age) in a log-transformed model

Answer

A

Age is just an example; the same approach applies to any continuous variable.

ex: N = 10

beta(per year) = .049
RR(per year) = e^.049 = 1.05

beta(per year)*10 = .49 then e^.49 = 1.63

The RR is relative to a reference level that is N years lower (e.g. 20 vs 10, or 30 vs 20). For a given N (here N = 10) the RR is always the same, whatever the reference level.
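
The rule "multiply beta by N, then exponentiate" can be sketched as follows. The function name and the coefficient value are purely illustrative and not taken from any fitted model.

```python
import math

def rr_per_increase(beta_per_unit, n_units):
    """RR for an n-unit increase in a continuous covariate in a log-transformed model."""
    return math.exp(beta_per_unit * n_units)

beta = 0.05  # hypothetical coefficient per 1-year increase in age
print(round(rr_per_increase(beta, 1), 2))   # 1.05
print(round(rr_per_increase(beta, 10), 2))  # 1.65
```

Because the model is linear on the log scale, the 10-year RR equals the 1-year RR raised to the 10th power, regardless of the starting age.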

Question 37

Q

Categorical or continuous in model for a natural order variable? (test for trend)

Answer

A

Can model it as a single term; if the relationship is linear (on the log scale), the variable can be used as continuous.

Test for trend can be used to assess evidence of an exponential trend (linear on a log scale). only applied to exposures with a natural order.

To do this:
1) model variable as categorical variable to capture shape of the dose-response relationship

2) model variable as a single term in a separate regression model to test for trend

p-value for trend = p-value for b1 in the single term model. if p<.05 then it is significant

Question 38

Q

Standardization

Answer

A

Stratification-based method of comparing rates of an outcome between two populations that have different distributions of one or more confounders

Makes the comparisons fair (i.e. removes confounding) by forcing the two populations to have the same covariate distribution

Question 39

Q

Indirect Standardization

Answer

A

Used for retrospective (historical) cohorts with an external comparison group, such as specific-exposure cohorts.

Standardization covariates MUST be categorical

Question 40

Q

Standardized incidence/mortality ratio

Answer

A

Standardized incidence ratio (SIR) = total observed cases / total expected cases

Standardized mortality ratio (SMR) = total observed deaths / total expected deaths
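
The card gives only the final ratio. In indirect standardization the expected count is obtained by applying the external comparison population's stratum-specific rates to the cohort's person-time in each stratum. The sketch below assumes that setup; the function names, strata, rates, and counts are all hypothetical.

```python
def expected_count(person_time, reference_rates):
    """Expected events: sum over strata of cohort person-time x reference rate."""
    return sum(pt * rate for pt, rate in zip(person_time, reference_rates))

def standardized_ratio(observed, person_time, reference_rates):
    """SIR (or SMR if the events are deaths): observed / expected."""
    return observed / expected_count(person_time, reference_rates)

# Hypothetical cohort with two age strata
person_years = [1000, 2000]       # cohort person-years per stratum
reference_rates = [0.001, 0.005]  # external population rates per person-year
observed_cases = 22

print(round(expected_count(person_years, reference_rates), 2))                      # 11.0
print(round(standardized_ratio(observed_cases, person_years, reference_rates), 2))  # 2.0
```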

Question 41

Q

Residual confounding

Answer

A

When your study adjusts for a variable or set of related variables that do not completely remove the confounding by that/those variables.

Coarse categorization: This may be because you use categories that are too broad, so that there are heterogeneous groups of people within each stratum. This is problematic because these heterogeneous groups of people could also differ with respect to their exposure prevalence and risk of the outcome.

Suboptimal modeling of the confounder in a multivariable model (e.g. modeling a covariate as continuous when the true dose-response curve is U-shaped)

Inadequate adjustment for complex, multidimensional confounders, such as smoking, SES, and health status

Inadequate measurement of the confounder (measurement error - unvalidated data collection instrument), collection of insufficiently detailed information

**If confounding remains due to not adjusting at all for a particular confounder this is NOT considered residual confounding.

Question 42

Q

Health status as confounder

Answer

A

Healthy vaccinee effect - seniors who are at high short-term risk of death are more likely to be unvaccinated, so health status confounds the association between vaccination and mortality

Question 43

Q

Addressing residual confounding

Answer

A

Measurement - measure potential confounders as carefully as the exposure under study. Especially if multidimensional

Data analysis - Use sufficiently fine covariate categorization, optimize modeling of covariates in multivariable models, strive to capture full dimensionality of multidimensional confounders in multivariable models

*however need to take into account statistical imperative of model parsimony - ratio of # of outcomes to # of covariates should be more than 10.

interpretation - Be transparent about the residual confounding in interpretation and how it could be better accounted for.

Question 44

Q

Matching in Cohort Studies

Answer

A

Adjust for one or more potential confounders in the design phase of your study

Select non-exposed participants who are similar to the exposed participants with respect to the distribution of one or more potential confounders.

These potential confounders are called matching factors. When matched, there is no need to account for them in the analysis phase, but only if there’s complete follow-up

Question 45

Q

Matching in case-control studies

Answer

A

to adjust for one or more potential confounders in the design phase of the study

selection of controls who are similar to cases with respect to their distribution of one or more potential confounders

However matching in the design phase alone does not completely remove confounding and so will need to still adjust in the analysis phase

matching intentionally introduces selection bias and creates a new, superimposed confounding toward the null

Matching on a true confounder increases statistical efficiency by optimizing precision

Question 46

Q

Frequency matching

Answer

A

Selection of controls such that the distribution(s) of one or more potential confounders is/are similar in cases and controls

Often used when matching factors are demographic variables (e.g. age, sex, race)

Caveat: if some strata have 0 individuals, you risk not being able to use the data from all subjects in the study, leading to reduced statistical efficiency

Question 47

Q

Individual Matching

Answer

A

Selection of one or more controls that are identical to a given case with respect to one or more potential confounders

Useful for:
- controlling for a confounder using “fine stratification” (mini strata)
- matching factors that are multidimensional confounders
- risk-set sampling of controls in nested case-control studies

The matched set is the stratum

Cannot do twin studies with unmatched case-control

Analysis must use either conditional logistic regression (no need to include the matching factors as covariates) or stratification, i.e. Mantel-Haenszel matched analysis (McNemar test), which gives the matched OR

Question 48

Q

Nested Case-Control Studies with Matching

Answer

A

For each case, N matched controls can be randomly sampled from the case’s risk set
- can restrict the risk set by matching factors

Enables selection of controls from the same risk set as the case
- same concurrent time at risk for development of outcome

Question 49

Q

Simplest Mantel-Haenszel matched analysis

Answer

A

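Four possible combinations of matched pairs:

q = case exposed, control exposed (concordant)
r = case exposed, control unexposed (discordant)
s = case unexposed, control exposed (discordant)
t = case unexposed, control unexposed (concordant)

Only the discordant pairs (r and s) are needed.

Matched Odds Ratio = r/s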
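
A small sketch of the pair counting for 1:1 matched data. It assumes that r counts pairs in which the case is exposed and the control is not, and s counts the reverse. The function name and the pair data are hypothetical.

```python
def matched_odds_ratio(pairs):
    """Matched OR from 1:1 pairs given as (case_exposed, control_exposed) booleans."""
    # r: discordant pairs with the case exposed and the control unexposed
    r = sum(1 for case_exp, ctrl_exp in pairs if case_exp and not ctrl_exp)
    # s: discordant pairs with the case unexposed and the control exposed
    s = sum(1 for case_exp, ctrl_exp in pairs if not case_exp and ctrl_exp)
    # Concordant pairs do not enter the calculation.
    return r / s

pairs = [
    (True, True),                                  # concordant, ignored
    (True, False), (True, False), (True, False),   # three r-type pairs
    (False, True),                                 # one s-type pair
    (False, False),                                # concordant, ignored
]
print(matched_odds_ratio(pairs))  # 3.0
```

The sketch does not handle the case of zero s-type pairs, where the ratio is undefined.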

Question 50

Q

Can association between matching factors and disease be studied?

Answer

A

No. Matching forces controls to be the same as cases with respect to the matching factor, so there is no way to estimate the association.

Question 51

Q

Overmatching

Answer

A

Overmatching generally refers to matching that is counter productive, by either causing bias or reducing efficiency. This causes a new superimposed confounding toward the null and leads to loss of statistical efficiency.

Overmatching must be corrected in analysis phase

Matching on a mediator in a causal pathway between exposure and disease will bias the effect estimate towards the null

Matching on a non-confounder that is associated with exposure, but not a risk factor for disease, reduces statistical efficiency

Question 52

Q

Survival analysis

Answer

A

Study of the distribution of time elapsed from a baseline time to an outcome(event)

Study of how exposures (including treatments) affect the distribution of time to event

Used for two study designs: cohort studies and RCTs

Baseline time examples: date of entry into a cohort, birth date, etc.

outcome examples: death, incident disease, disease cure, etc.

It is better to experience a beneficial outcome earlier than later

It is better to experience an adverse outcome later than earlier

Question 53

Q

Cumulative incidence vs. cumulative survival

Answer

A

CI (0 to 1) is the proportion of a specified population at risk that experienced the outcome under study during a specified time period

Probability (risk) of experiencing the outcome under study in the specified time period

CS (0 to 1) is the proportion of a specified population at risk that does NOT experience the outcome under study (i.e. “survives”) during a specified time period

Probability (risk) of NOT experiencing the outcome under study in the specified time period

Can both be calculated directly if closed cohort

Question 54

Q

CS + CI =?

Answer

A

CS + CI = 1, so CS = 1 - CI

Question 55

Q

CI curve vs CS curve

Answer

A

CI curve is the proportion of subjects who have experienced the event as a function of time since baseline

CS curve is the proportion of subjects who have NOT experienced the event as a function of time since baseline

Question 56

Q

Median survival time

Answer

A

The follow-up time at which CI = CS = .5

Question 57

Q

How to plot cumulative incidence/survival as a step function

Answer

A

1) rank survival times from lowest to highest
2) create intervals that start when one or more events occur
3) calculate cumulative incidence during each interval (cumulative # of events / total)
4) cumulative survival is calculated the same way, but by subtracting events from the total instead of adding them up ((total - cumulative # of events) / total)
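
For a closed cohort (no censoring), the steps above can be sketched as follows. The function name and the event times are hypothetical. Each tuple returned is one step of the curve: the time at which the step occurs, the cumulative incidence from that time on, and the corresponding cumulative survival.

```python
from collections import Counter

def cumulative_curves(event_times, cohort_size):
    """Step values (time, CI, CS) for a closed cohort with complete follow-up."""
    events_at = Counter(event_times)   # number of events at each distinct time
    cumulative_events = 0
    steps = []
    for t in sorted(events_at):        # rank times from lowest to highest
        cumulative_events += events_at[t]
        ci = cumulative_events / cohort_size
        steps.append((t, ci, 1 - ci))  # CS is the complement of CI
    return steps

# Hypothetical closed cohort of 10 people with events at times 1, 3, 3 and 7
for t, ci, cs in cumulative_curves([1, 3, 3, 7], 10):
    print(t, round(ci, 2), round(cs, 2))
# 1 0.1 0.9
# 3 0.3 0.7
# 7 0.4 0.6
```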

Question 58

Q

Cumulative Incidence in open cohort

Answer

A

Calculating cumulative incidence directly will underestimate it, because this assumes that those who withdrew, were lost to follow-up, or died did not experience the outcome.

Question 59

Q

Kaplan Meier method

Answer

A

1) Rank survival times from lowest to highest
2) Divide survival time into intervals that start when one or more events occur; for each interval i, count the # of events (ei) and # censored (ci), and calculate the # at risk at the start of the interval (ni)
3) Calculate the probability of surviving each interval (pi = (ni-ei)/ni)
4) Calculate cumulative survival during each interval (Si = Si-1 x pi), where cumulative survival at baseline is always 1
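
The Kaplan-Meier steps can be sketched in plain Python as below. This is a minimal illustration rather than a full implementation: the function name and the data are hypothetical, an event flag of 1 means the outcome occurred and 0 means the subject was censored, and subjects censored at the same time as an event are counted as still at risk at that time (a common convention, assumed here).

```python
def kaplan_meier(times, events):
    """Return (time, n_at_risk, n_events, cumulative_survival) at each event time."""
    data = sorted(zip(times, events))  # rank survival times from lowest to highest
    n_at_risk = len(data)
    survival = 1.0                     # cumulative survival at baseline is 1
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_censored = 0
        # Count events and censorings tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events > 0:
            p = (n_at_risk - n_events) / n_at_risk  # pi = (ni - ei) / ni
            survival *= p                           # Si = Si-1 x pi
            rows.append((t, n_at_risk, n_events, survival))
        n_at_risk -= n_events + n_censored          # both leave the risk set
    return rows

# Hypothetical data: 5 subjects, 1 = event, 0 = censored
for t, n, e, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, n, e, round(s, 2))
# 2 5 1 0.8
# 3 4 1 0.6
# 5 2 1 0.3
```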

Question 60

Q

Censoring

Answer

A

Termination of follow-up for a subject on a specified date because it is unknown whether the outcome occurred or would have occurred after that date.

Kaplan-Meier survival estimates calculate cumulative incidence/survival taking censoring into account, but assume that censoring is unbiased

Question 61

Q

Log-rank test

Answer

A

Compares K-M curves for 2 or more groups.

Question 62

Q

Stratified log rank test

Answer

A

Compares K-M curves for 2 or more groups using stratification to control for confounding

limitation: method breaks down if data becomes too sparse

Question 63

Q

Interpreting and presenting K-M curves

Answer

A

The further to the right, the fewer subjects at risk and the more uncertainty

Good practice to end the plot at a follow-up time when only 10-20% of subjects are still at risk.

Question 64

Q

Two main survival analysis methods

Answer

A

KM survival curves (descriptive and cannot readily calculate RR or adjust for multiple covariates) and cox proportional hazards regression

Answer 64

A

allows baseline hazard to vary over time

assumes the hazard ratio is constant over time which is equivalent to stating that the exposure-outcome relationship is NOT modified by follow-up time (therefore not an effect modifier) - if PH assumption is violated then follow-up time is a modifier and stratification by follow-up time would be needed

allows adjustment of multiple covariates and provides an RR

Answer 65

A

the instantaneous incidence rate at a point in time (change in number of new cases at time point) - basically the slope between two points on the curve.

Incidence rate could change with time

Answer 66

A

When the proportional hazards curves cross one another.

Answer 67

A

less than or equal to

Answer 68

A

more than or equal to

Answer 69

A

direct methods (gold standard) - determine the cause of death for each decedent. This can be done by review of medical records or death certificates, but medical records are better.

indirect methods - take overall-mortality estimates and apply a correction to them, in order to estimate the number of deaths due to a specific cause - through relative survival

Answer 70

A

Provides an estimate of cause-specific survival in a cohort. corrects for deaths from causes other than the disease under study

RS = observed OS/expected OS

If expected OS = 1, RS = observed OS

If expected OS<1, RS > or equal to observed OS
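
The ratio above is simple enough to verify numerically. In this sketch the function name and the survival proportions are hypothetical.

```python
def relative_survival(observed_os, expected_os):
    """Relative survival: observed overall survival / expected overall survival."""
    return observed_os / expected_os

# Hypothetical 5-year values: 60% observed OS in the diseased cohort,
# 80% expected OS in the comparison population
print(round(relative_survival(0.60, 0.80), 2))  # 0.75
# When expected OS is 1, RS equals observed OS
print(relative_survival(0.60, 1.0))  # 0.6
```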

Answer 71

A

Usually the expected OS of persons with the same demographics and calendar period, taken from publicly available vital statistics data

Key assumption: OS in the diseased cohort would be the same as the OS of the comparison population, if the cohort members did not have the disease (assuming that the only difference between the two cohorts is the disease).

Answer 72

A

Variation in the magnitude of the association between an exposure and an outcome across strata of a second exposure (the effect modifier)

Has an underlying public health, clinical, biologic, or psychosocial basis. Not merely a statistical phenomenon.

Can be assessed through stratified analysis and multivariable models

Effect modification is reciprocal since there is an interaction

Answer 73

A

If you stratify and the stratum-specific RRs are not similar, there is potential for effect modification. If this is the case, you can then calculate a p-value for heterogeneity (interaction) (this is a likelihood ratio test).

If the p-value for heterogeneity is significant, there is effect modification/interaction; if not, there is no effect modification/interaction.

In a multivariable model, calculate the p-value of the interaction term or, if there are multiple interaction terms, of the interaction terms in aggregate