SURVIVAL ANALYSIS Flashcards

1
Q

Right censored

A

When the true survival time is only partly known: the event has not been observed by the end of the subject's follow-up
- e.g., loss to follow-up, withdrawals, no event by end of study

true survival time >= observed
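A minimal sketch of how right-censored data are usually recorded, assuming each subject has a (normally unknown) true survival time and a time at which follow-up ends. The function name and the example values are hypothetical, chosen only for illustration.

```python
def observe(true_time, censor_time):
    """Return (observed_time, event) for one subject under right censoring.

    event = 1 if the event was seen, 0 if the subject was censored first.
    """
    observed = min(true_time, censor_time)
    event = 1 if true_time <= censor_time else 0
    return observed, event


# Hypothetical subjects: (true survival time, time at which follow-up ends)
subjects = [(5.0, 10.0), (12.0, 10.0), (7.5, 3.0)]
for true_time, censor_time in subjects:
    observed, event = observe(true_time, censor_time)
    # Right censoring: the true survival time is never below what we observed
    assert true_time >= observed
    print(observed, event)
```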

2
Q

Left censored

A

When the event occurs before we had the chance to record it

true survival time <= observed

3
Q

Interval censored

A

Event occurs between t1 and t2, but we are not sure when

4
Q

Main differences between S(t) and h(t)

A

S(t) is the survivor function. It gives the probability that a person survives longer than a specified time t. (focused on not failing)

h(t) is the hazard function (conditional failure rate). It gives the instantaneous potential, per unit time, for the event to occur, given that the individual has survived up to time t. (focused on failing)

5
Q

What is the use of h(t)?

A

Identifying specific models (exponential, Weibull, lognormal)

6
Q

3 assumptions to censoring

A
  1. Random: random overall, i.e., subjects censored at time t are representative of all study subjects who remain at risk with respect to survival experience (censored failure rate = observed failure rate; much stronger assumption)
  2. Independent: random within subgroups (if we have only one group, then random = independent; useful for validity)
  3. Non-informative: the distribution of time to event provides no information about the time to censoring, and vice versa
7
Q

Non parametric survival analysis?

A
  • Life tables
  • Kaplan Meier

Make no assumption about the distribution

8
Q

Semi parametric survival analysis?

A
  • Cox PH model

Assumes that the hazard functions are proportional
But leaves the baseline hazard unspecified (there is no intercept; it is absorbed into the baseline hazard)

9
Q

Parametric survival analysis?

A

Makes strong assumptions: survival time follows a specified distribution (Weibull, exponential, Gompertz, lognormal)

10
Q

When is the Cox PH a no go?

A

When the effect of the exposure varies with time (i.e., an exposure-by-time interaction) - it violates the proportional hazards assumption

11
Q

What is there to know about the baseline hazard in Cox PH?

A

It’s only a function of time (not of covariates) and it’s not directly estimated

12
Q

What are the main characteristics of the hazard function?

A
  • it’s a rate
  • it’s never negative (h(t) >= 0)
  • infinite upper bound
  • can decrease, increase or stay constant
13
Q

How do we go from S(t) to h(t)?

A

If h(t) is constant (= lambda), then S(t) = exp(-lambda*t)

If not, then S(t) = exp(-integral of h(u) du from 0 to t)

In the other direction, h(t) = -d/dt ln S(t)
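A minimal sketch of the relationship above, assuming the hazard is supplied as an ordinary Python function and the integral is approximated with the trapezoidal rule. The function names and the example hazards are illustrative, not taken from any library.

```python
import math


def survival_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-integral of h(u) du from 0 to t), via the trapezoidal rule."""
    if t == 0:
        return 1.0
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return math.exp(-total * width)


def constant_hazard(u):
    # Exponential model: h(u) = lambda = 0.2 at every time point
    return 0.2


def increasing_hazard(u):
    # Hypothetical linearly increasing hazard; cumulative hazard is 0.05 * t**2
    return 0.1 * u


# With a constant hazard the numerical result matches exp(-lambda * t)
print(survival_from_hazard(constant_hazard, 3.0), math.exp(-0.2 * 3.0))
print(survival_from_hazard(increasing_hazard, 2.0), math.exp(-0.05 * 2.0 ** 2))
```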

14
Q

Assumptions that come with life tables?

A
  • Censoring occurs uniformly (therefore we halve the time at risk of censored subjects)
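A minimal sketch of the actuarial adjustment described above, assuming each interval is summarized by the number entering, the number of deaths, and the number censored. The counts are made up for illustration.

```python
def life_table(intervals):
    """Cumulative survival at the end of each interval.

    intervals: list of (n_entering, deaths, censored) tuples.
    Censored subjects are assumed to be at risk for half the interval,
    so the effective number at risk is n_entering - censored / 2.
    """
    survival = 1.0
    curve = []
    for n_entering, deaths, censored in intervals:
        effective_at_risk = n_entering - censored / 2.0
        survival *= 1.0 - deaths / effective_at_risk
        curve.append(survival)
    return curve


# Hypothetical cohort of 100 subjects followed over three intervals
rows = [(100, 10, 4), (86, 8, 6), (72, 5, 10)]
for row, s in zip(rows, life_table(rows)):
    print(row, round(s, 4))
```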
15
Q

With Kaplan-Meier, what happens with ties?

A

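When a death and a censoring are tied at the same time, the censoring is assumed to occur right after the death (so the censored subject still counts as at risk at that time), which leads to slightly overestimating the survivor function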
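A minimal sketch of the Kaplan-Meier product-limit estimate, written to show the tie convention: subjects censored at time t are still counted as at risk for deaths at time t. The function name and the data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, S(time))] at each distinct death time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect everyone whose observed time equals t (ties)
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Subjects censored at t are still in at_risk here:
            # censoring is treated as happening just after the deaths
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


# Hypothetical data with a death and a censoring tied at t = 3
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```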

16
Q

What do we use to test if Kaplan-Meier curves are different?

A
  • Log-rank test
  • Wilcoxon test
  • Likelihood ratio test
17
Q

What about CI and Kaplan-Meier?

A

We have a confidence interval for each estimate of S(t)

18
Q

Log-rank test for Kaplan-Meier

A
  • A chi-square test, with equal weight on every failure
  • Good for: testing differences that fit the proportional hazards model (so before using Cox); RCT with no confounders
  • Bad for: crossing hazard functions; needing to control for confounding
19
Q

Wilcoxon test for Kaplan-Meier

A
  • A weighted log-rank test (therefore, a chi-square test) that weights each failure time by the number of subjects still at risk, giving more weight to earlier time points
  • More powerful if we don’t have proportional hazards
20
Q

Likelihood ratio test for Kaplan-Meier

A

Assumes exponential distribution (constant hazard)

21
Q

Which test should I use for a Kaplan-Meier if I suspect I have:

a) proportional hazards?
b) non proportional hazards?

A

a) Log-rank test

b) Wilcoxon test
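A minimal sketch of how the two tests differ, assuming two groups and the usual observed-minus-expected form of the statistic. The only difference in this sketch is the weight at each failure time: 1 for log-rank, the number at risk for the Wilcoxon (Gehan-Breslow) version. The function name and data are hypothetical, and the result is a 1-degree-of-freedom chi-square statistic, not a p-value.

```python
def weighted_logrank(times_a, events_a, times_b, events_b, wilcoxon=False):
    """Chi-square statistic (1 df) comparing two groups.

    times_* and events_* are lists; events: 1 = failure, 0 = censored.
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    failure_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    numerator = 0.0
    variance = 0.0
    for t in failure_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        weight = n if wilcoxon else 1.0
        expected_a = d * n_a / n
        numerator += weight * (d_a - expected_a)
        if n > 1:
            variance += weight ** 2 * d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return numerator ** 2 / variance


# Hypothetical groups: A fails early, B fails late, no censoring
a_times, a_events = [1, 2, 3], [1, 1, 1]
b_times, b_events = [4, 5, 6], [1, 1, 1]
print(weighted_logrank(a_times, a_events, b_times, b_events))
print(weighted_logrank(a_times, a_events, b_times, b_events, wilcoxon=True))
```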

22
Q

What’s the main use of Cox PH regression?

A

Studying the dependency of survival time on predictor variables

23
Q

What are the assumptions of the Cox PH?

A
  • Unknown shape of baseline hazard (when comparing groups, it cancels out - so we don’t care)
  • Proportionality
  • Not appropriate if the hazard functions cross
24
Q

What is the interpretation of the coefficient and of the exp(coef) of a Cox PH?

A

Coef:
change in the expected log of the relative hazard for each one-unit increase in the predictor, holding other predictors constant (null = 0)
OR
difference in the expected log of the relative hazard for subjects in group A compared with subjects in group B, holding other predictors constant

exp(coef):
Hazard ratio: the factor by which the expected hazard is multiplied for each one-unit increase in the predictor, holding…
OR
expected hazard is x times higher in A compared to B
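A minimal sketch of turning a Cox coefficient into a hazard ratio, assuming the coefficient and its standard error have already been read off a fitted model. The Wald-type 95% interval is an added assumption (it is not described in the card), and the numbers are made up.

```python
import math


def hazard_ratio(coef, se, z=1.96):
    """Return (HR, lower, upper): exp(coef) with a Wald-type interval.

    The interval is built on the log-hazard scale and then exponentiated.
    """
    hr = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return hr, lower, upper


# Hypothetical coefficient for group A vs group B
hr, lower, upper = hazard_ratio(coef=0.693, se=0.25)
print(round(hr, 2), round(lower, 2), round(upper, 2))
```

A coefficient of 0 corresponds to a hazard ratio of 1, which is why the null value is 0 on the coefficient scale and 1 on the exp(coef) scale.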

25
Q

How do we check the assumption of proportional hazards in Cox regressions?

A
  • Graphically: the log-log survival curves should be parallel (i.e., the distance between them is independent of time)
  • Schoenfeld residuals: the null hypothesis being that the correlation between Schoenfeld residuals and ranked failure time is 0; also plot them against time
  • We can also divide the data into strata
26
Q

How do we check that the relationship between the log hazard and the covariates is linear in Cox regressions?

A

By plotting martingale residuals against each covariate and adding a smooth line (produced by a local regression - loess) to detect deviations from 0

27
Q

What are martingale residuals?

A

In Cox, the martingale residual for individual i at time ti (end of follow-up for that individual)
= event indicator - estimated cumulative hazard for individual i at ti
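A minimal sketch of the definition above, assuming the estimated cumulative hazards at each subject's end of follow-up are already available from a fitted model (they are simply typed in here as hypothetical values).

```python
def martingale_residuals(events, cumulative_hazards):
    """Residual_i = event indicator_i - estimated cumulative hazard_i."""
    return [d - h for d, h in zip(events, cumulative_hazards)]


# Hypothetical subjects: two events and two censored observations
events = [1, 0, 1, 0]
cumulative_hazards = [0.25, 0.5, 1.5, 0.125]
residuals = martingale_residuals(events, cumulative_hazards)
print(residuals)
```

Because the cumulative hazard is never negative, each residual is at most 1, and censored subjects always have a residual of 0 or below.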

28
Q

3 types of model selection for survival analyses? (most likely Cox)

A
  1. Purposeful selection (univariate analysis, keep all variables that pass a set significance threshold, put them in the multivariate analysis)
  2. Stepwise selection (using AIC; cons: considers only a small number of all possible models)
  3. Best subset selection (all possible models, using Mallows’ C)
29
Q

What is AIC and how do we use it?

A

Used when we want to choose the best model
AIC = 2k - 2*max(log-likelihood)
where k = number of parameters in the model

The smaller the better
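A minimal sketch of comparing candidate models by AIC, assuming the maximized log-likelihood and parameter count of each model are already known. The model names and numbers are hypothetical.

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2 * (maximized log-likelihood)."""
    return 2 * k - 2 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters)
candidates = {
    "age only": (-120.5, 1),
    "age + stage": (-112.0, 2),
    "age + stage + sex": (-111.8, 3),
}

for name, (loglik, k) in candidates.items():
    print(name, aic(loglik, k))

# The smallest AIC wins: the extra parameter for sex does not pay for itself here
best = min(candidates, key=lambda name: aic(*candidates[name]))
print(best)
```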

30
Q

What is Mallows’ C and how do we use it?

A

Usually when we use best subset selection

C = W + (p - 2q)
where W = W(p) - W(p-q)*
and p = number of variables considered
and q = number of variables excluded

*(Wald statistic for the full model) - (Wald statistic for the subset model)

  • The smaller the better
  • Large models have small W, but are penalized by (p-2q)
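A minimal sketch of the formula above, assuming the Wald statistic difference W has already been computed from the full and subset models. The values are hypothetical.

```python
def mallows_c(wald_difference, p, q):
    """C = W + (p - 2q).

    wald_difference: W, the Wald statistic of the full model minus that of the subset
    p: number of variables considered
    q: number of variables excluded from the subset model
    """
    return wald_difference + (p - 2 * q)


# Hypothetical subsets drawn from p = 5 candidate variables
print(mallows_c(1.3, p=5, q=2))   # drops 2 variables at little cost in W
print(mallows_c(0.0, p=5, q=0))   # full model: W = 0, so C = p
```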
31
Q

What is the probability density function?

A

f(t) = dF(t)/dt, the rate at which failures accumulate over time

The density of failure at time t: f(t)*dt approximates the probability of failing in a short interval from t to t + dt

32
Q

Formula with h(t), f(t), and S(t)

A

h(t) = f(t)/S(t)
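A minimal numeric check of this formula, assuming a Weibull model written as S(t) = exp(-lam * t^p). This parameterization and the names lam and p are chosen for the sketch only.

```python
import math


def weibull_survival(t, lam, p):
    """S(t) = exp(-lam * t**p)."""
    return math.exp(-lam * t ** p)


def weibull_pdf(t, lam, p):
    """f(t) = lam * p * t**(p - 1) * exp(-lam * t**p)."""
    return lam * p * t ** (p - 1) * math.exp(-lam * t ** p)


def hazard(t, lam, p):
    """h(t) = f(t) / S(t)."""
    return weibull_pdf(t, lam, p) / weibull_survival(t, lam, p)


# p = 1 is the exponential model: the hazard equals lam at every time
print(hazard(1.0, 0.5, 1.0), hazard(4.0, 0.5, 1.0))
# p = 2 gives a hazard that increases with time (lam * p * t)
print(hazard(1.0, 0.5, 2.0), hazard(4.0, 0.5, 2.0))
```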

33
Q

What is F(t)?

A

Cumulative distribution function - the integral of the probability density function f(t) from 0 to t, i.e., the probability of failing by time t; F(t) = 1 - S(t)

34
Q

Which function goes from 0 to 1, always increasing?

A

F(t) - cumulative distribution function

35
Q

Which function goes from 1 to 0, always decreasing?

A

S(t) - survival function

36
Q

Which function is never negative, can both decrease and increase, and integrates to 1?

A

f(t) - Probability density function

37
Q

Which function goes from 0 to infinity, decreasing and increasing?

A

h(t) - hazard function

38
Q

Constant hazard

A
  • Exponential model

- h(t) is always equal to the same value, lambda, regardless of time

39
Q
What is the shape of the
a) hazard function
b) cumulative distribution function
c) probability density function
in the case of an exponential distribution?
A

a) constant (horizontal line)
b) always increasing, but stabilizes after a bit near one (think good ROC curve or logarithm function)
c) always decreasing (think exponential function)
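A minimal sketch that evaluates the three functions for an exponential distribution at a few time points, assuming an arbitrary rate lambda = 0.5 chosen only for illustration.

```python
import math

LAM = 0.5  # hypothetical constant hazard rate


def hazard(t):
    """a) constant: h(t) = lambda, whatever the time."""
    return LAM


def cdf(t):
    """b) F(t) = 1 - exp(-lambda * t): rises quickly, then levels off near 1."""
    return 1.0 - math.exp(-LAM * t)


def pdf(t):
    """c) f(t) = lambda * exp(-lambda * t): highest at t = 0, then decays."""
    return LAM * math.exp(-LAM * t)


for t in [0.0, 1.0, 2.0, 5.0, 10.0]:
    print(t, hazard(t), round(cdf(t), 4), round(pdf(t), 4))
```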

40
Q

What are the properties of parametric multivariate regression techniques?

A
  1. Model the underlying hazard/survival function
  2. Assume that the dependent variable (time-to-event) takes a known distribution (Weibull, exponential, lognormal)
  3. Estimate parameters of these distributions, such as the baseline hazard function
  4. Estimate the covariate-adjusted hazard ratio
41
Q

What does a Weibull look like?

A

Increasing over time OR decreasing over time

e.g., cancer patients not responding to treatment; death after surgery

42
Q

What does a lognormal look like?

A

First increasing, then decreasing (e.g., tuberculosis patients - increases first, then decreases)

43
Q

Which models are proportional hazards models?

A

Weibull + exponential

44
Q

What is it that we can calculate for Weibull and exponential models only?

A
Hazard ratio (ratio of incidence rates)
- we can't do that for other parametric models, because the hazards are not necessarily proportional over time
45
Q

Formula hazard ratio for exponential model:

A

HR = e^(-B)

46
Q

Formula hazard ratio for Weibull model:

A

HR = e^(-B/scale)

47
Q

What’s up with the Weibull model and the scale?

A
  • It makes it more flexible (due to 2 parameters instead of one)
  • A scale of 1.0 makes it exponential
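A minimal sketch of the two hazard ratio formulas, assuming B is a coefficient reported on the log-time (accelerated failure time) scale together with a scale parameter, which is the convention the formulas HR = e^(-B) and HR = e^(-B/scale) appear to follow. The function name and numbers are hypothetical.

```python
import math


def hazard_ratio_from_aft(beta, scale=1.0):
    """HR = exp(-beta / scale).

    scale = 1.0 is the exponential model, where this reduces to exp(-beta).
    """
    return math.exp(-beta / scale)


beta = 0.5  # hypothetical coefficient: a positive value means longer survival
print(hazard_ratio_from_aft(beta))              # exponential: exp(-0.5)
print(hazard_ratio_from_aft(beta, scale=0.5))   # Weibull with scale 0.5: exp(-1.0)
```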
48
Q

What are the two components of parametric models?

A
  1. A baseline hazard function
  2. A linear function of fixed covariates that, when exponentiated, gives the relative risk

49
Q

Exponential vs Weibull model in terms of baseline hazard?

A

Exponential: assumes fixed baseline hazard that we can estimate

Weibull: models baseline hazard as function of time. 2 parameters (shape/scale) must be estimated to describe the underlying h(t) over time