Epidemiology Chapter 5 Flashcards

1
Q

What is survival data?

A

Time to event data
- the time from a defined origin until the occurrence of a predetermined event is measured for each subject
For example:
- time until death following a heart transplant
- time until death following an AIDS diagnosis
- length of time in remission
- time until rejection of a transplanted organ

2
Q

Why can’t we use standard techniques?

A
  1. Survival times are non-negative
  2. Distribution of survival times is usually skewed and often highly skewed
  3. Censored observations, when we don’t know the exact survival time of a subject
3
Q

Censoring
Right

Left

Interval

A

Right
We have a lower bound: we only know that the subject survived to a certain point. Denoted by *
- e.g. a patient withdraws from a study, or dies from other causes

Left
We have an upper bound, we know that the subject was dead by a certain time t but not the exact time

Interval
We know that the time of death occurred during a particular time interval but we don’t know the exact time

4
Q

The cumulative distribution function

A

The probability of dying at or before time t
F(t)=Pr(T<=t) = integral from 0 to t f(u) du
F(0) =0 and F(infinity) = 1
It is impossible to die before time t=0 and certain that death will take place before t=infinity

5
Q

The probability density function

A

f(t)=F’(t) = dF(t)/dt

alternatively
f(t) = lim (delta t -> 0) Pr(t<=T< t+delta t)/delta t
6
Q

The survival function

A

The probability of dying after time t, i.e. that the event of interest has not occurred by time t
S(t) = Pr(T>t)=1-F(t)
S(0)=1 and S(infinity)=0
It is certain you will survive beyond time 0 and you will not survive beyond t=infinity

7
Q

The Hazard function

A

Expresses the risk of death at some time point t, i.e. the instantaneous death rate at time t
h(t) = lim (delta t -> 0) Pr(t<=T< t+delta t given T>=t)/delta t

using the rule Pr(A | B) = Pr(A n B)/Pr(B)
Pr(t<=T< t + delta t given T>=t) = Pr(t<=T< t+delta t n T>=t)/ Pr(T>=t)

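Since the event t<=T< t+delta t already implies T>=t,
Pr(t<=T< t+delta t n T>=t) = Pr(t<=T< t+delta t)
so h(t) = f(t)/S(t)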
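
The conditional-probability rule above can be followed through to the usual closed form for the hazard. This is a sketch that assumes T is continuous, so that Pr(T >= t) = S(t):

```latex
\begin{aligned}
h(t) &= \lim_{\delta t \to 0} \frac{\Pr(t \le T < t + \delta t \mid T \ge t)}{\delta t} \\
     &= \lim_{\delta t \to 0} \frac{\Pr(t \le T < t + \delta t)}{\delta t} \cdot \frac{1}{\Pr(T \ge t)} \\
     &= \frac{f(t)}{S(t)}
\end{aligned}
```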

8
Q

The cumulative hazard function

A

H(t) = integral from 0 to t h(u) du

9
Q

The exponential distribution

A
pdf f(t)=λe^(-λt), λ>0, t>=0
A skewed distribution, so it is often used in survival analysis

C.d.f
F(t)=integral from 0 to t f(x)dx
= integral from 0 to t of λe^(-λx) dx
= 1-e^(-λt)

Survival function
S(t)=1-F(t) = 1-(1-e^(-λt))=e^(-λt)

Hazard function
h(t) = f(t)/S(t) = λe^(-λt)/e^(-λt) = λ
The constant hazard is not sensible in practice, as the instantaneous death rate will be the same no matter what the time
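
A short numerical check of the exponential formulas may help fix them in memory. The rate lam = 0.5 and the function names are illustrative assumptions, not part of the card:

```python
import math

def exp_pdf(t, lam):
    """f(t) = lam * exp(-lam * t)"""
    return lam * math.exp(-lam * t)

def exp_cdf(t, lam):
    """F(t) = 1 - exp(-lam * t)"""
    return 1.0 - math.exp(-lam * t)

def exp_survival(t, lam):
    """S(t) = exp(-lam * t)"""
    return math.exp(-lam * t)

def exp_hazard(t, lam):
    """h(t) = f(t) / S(t), which simplifies to lam."""
    return exp_pdf(t, lam) / exp_survival(t, lam)

lam = 0.5  # illustrative rate, chosen only for this example
for t in (0.5, 1.0, 5.0):
    # The hazard column stays at lam whatever the value of t.
    print(t, exp_survival(t, lam), exp_hazard(t, lam))
```

The hazard printed on every row equals lam, which is the constant-hazard property stated on the card.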

10
Q

The Weibull distribution

A

pdf f(t)=λαt^(α-1) e^(-λt^α), t,α,λ>0
if we set α=1 we get the exponential distribution
α is the shape parameter and λ is the scale parameter

C.d.f
F(t)=integral from 0 to t f(x)dx
=integral from 0 to t λαx^(α-1) e^(-λx^α) dx
= 1-e^(-λt^α)

Survival function
S(t)=1-F(t) = e^(-λt^α)

Hazard function
h(t) = f(t)/S(t) = λαt^(α-1) e^(-λt^α)/e^(-λt^α) = λαt^(α-1)
depends on time, affording more flexibility
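
To see how the shape parameter changes the hazard, here is a minimal sketch assuming the parameterisation S(t) = exp(-λ t^α). The parameter values and function names are illustrative only:

```python
import math

def weibull_survival(t, lam, alpha):
    """S(t) = exp(-lam * t**alpha)"""
    return math.exp(-lam * t ** alpha)

def weibull_pdf(t, lam, alpha):
    """f(t) = lam * alpha * t**(alpha - 1) * exp(-lam * t**alpha)"""
    return lam * alpha * t ** (alpha - 1) * math.exp(-lam * t ** alpha)

def weibull_hazard(t, lam, alpha):
    """h(t) = lam * alpha * t**(alpha - 1)"""
    return lam * alpha * t ** (alpha - 1)

lam = 0.5  # illustrative scale value
for alpha in (0.5, 1.0, 2.0):
    # alpha < 1: hazard falls with t; alpha = 1: constant; alpha > 1: rises.
    print(alpha, [round(weibull_hazard(t, lam, alpha), 4) for t in (0.5, 1.0, 2.0)])
```

With alpha = 1 the hazard is constant at lam, matching the exponential case.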

11
Q

Kaplan-Meier method

A

Non-parametric, doesn’t require specific assumptions about the distributions
Used to estimate the survival function
t_i is the time of the ith event
n_i is the number of people at risk just before t_i, i.e. those yet to experience the event and still uncensored
d_i is the number of deaths during [t_i, t_(i+1))
c_i is the number of censored observations in the interval [t_i, t_(i+1))

S_hat(t) = product over t_i <= t of (n_i-d_i)/n_i

SE(S_hat(t)) = S_hat(t) sqrt( sum over t_i <= t of d_i/(n_i(n_i-d_i)) )
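
The product and standard error formulas can be sketched in a few lines of Python. The six observations below are made up for illustration, and the function name is hypothetical. Censored subjects tied with an event time are treated as still at risk at that time:

```python
import math

def kaplan_meier(times, events):
    """Return (t_i, n_i, d_i, S_hat, SE) for each distinct event time.

    times  -- observed time for each subject
    events -- 1 if the death was observed, 0 if right-censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    s_hat = 1.0
    running_sum = 0.0  # sum of d_i / (n_i * (n_i - d_i)) so far
    rows = []
    for t_i in event_times:
        n_i = sum(1 for t in times if t >= t_i)  # at risk just before t_i
        d_i = sum(1 for t, e in zip(times, events) if t == t_i and e == 1)
        s_hat *= (n_i - d_i) / n_i
        if n_i > d_i:
            running_sum += d_i / (n_i * (n_i - d_i))
            se = s_hat * math.sqrt(running_sum)
        else:
            se = float("nan")  # the SE formula is undefined when n_i == d_i
        rows.append((t_i, n_i, d_i, s_hat, se))
    return rows

# Hypothetical data: times in months, 1 = death, 0 = censored.
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
rows = kaplan_meier(times, events)
for row in rows:
    print(row)
```

Each printed row gives the event time, the number at risk, the number of deaths, the running survival estimate and its standard error.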

12
Q

The log-rank test

A

Non-parametric (distribution-free) test which allows us to compare the survival times of two groups.
Wish to test
H_0 : survival rates in the two underlying populations are the same
H_1: Not H_0

Test statistic
χ^2 = (Obs_A-Exp_A)^2/Exp_A + (Obs_B-Exp_B)^2/Exp_B
Obs_A - number of observed deaths in group A
Exp_A - number of expected deaths in group A
Under the assumption that H_0 is true the test statistic will follow a chi-squared distribution with 1 degree of freedom

Expected number of deaths in treatment group A at event time t_i
d_i x n_Ai/(n_Ai+n_Bi) = d_i x n_Ai/n_i
Exp_A is the sum of these over all event times
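
As a rough illustration of how the observed and expected counts build up, the sketch below computes the test statistic for two small made-up groups. The data and function name are hypothetical. The p-value line uses the fact that, for 1 degree of freedom, the chi-squared tail probability equals erfc(sqrt(x/2)):

```python
import math

def log_rank(times_a, events_a, times_b, events_b):
    """Return (obs_a, exp_a, obs_b, exp_b, chi2, p_value) for two groups."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    obs_a, obs_b = sum(events_a), sum(events_b)
    exp_a = exp_b = 0.0
    for t_i in event_times:
        n_a = sum(1 for t in times_a if t >= t_i)  # at risk in A just before t_i
        n_b = sum(1 for t in times_b if t >= t_i)  # at risk in B just before t_i
        d_a = sum(1 for t, e in zip(times_a, events_a) if t == t_i and e == 1)
        d_b = sum(1 for t, e in zip(times_b, events_b) if t == t_i and e == 1)
        d_i = d_a + d_b
        exp_a += d_i * n_a / (n_a + n_b)
        exp_b += d_i * n_b / (n_a + n_b)
    chi2 = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-squared tail, 1 df
    return obs_a, exp_a, obs_b, exp_b, chi2, p_value

# Hypothetical data: 1 = death, 0 = censored.
result = log_rank([1, 2, 3, 4], [1, 1, 1, 0], [2, 4, 6, 8], [1, 1, 0, 1])
print(result)
```

A useful sanity check is that the expected deaths across both groups add up to the total observed deaths.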

13
Q

What is the hazard function in Cox’s proportional hazards model?

A

h(t, x) = h_0(t) e^(beta_1 x_1 + beta_2 x_2 + … + beta_p x_p)

The intercept term is incorporated in h_0(t)

h_0(t) is known as the baseline hazard function: the hazard when all x = 0

The hazards for two patients are proportional: their ratio is constant and does not depend on time

The model assumes that patients have the same ‘shape’ of hazard function but that it is shifted multiplicatively according to the covariates so that they can never cross

14
Q

Interpretation of Cox’s proportional hazards model parameters for a single covariate

A

Suppose we have two individuals, one a smoker and one a non-smoker
The hazard ratio is given by exp(beta)
Beta is known as the log-hazard ratio

For a continuous covariate x, e.g. weight
A 1 unit increase in weight multiplies the hazard by exp(beta)
for example beta = 0.1 gives exp(beta)=1.105
A 1kg increase in weight leads to a 10.5% increase in the hazard
This figure is constant no matter what t is
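
The arithmetic in the weight example can be reproduced directly. The helper name and the 5 kg case are illustrative additions, while beta = 0.1 is the value from the card:

```python
import math

def hazard_ratio_for_change(beta, delta_x):
    """Hazard ratio for an increase of delta_x units in the covariate."""
    return math.exp(beta * delta_x)

beta = 0.1  # log-hazard ratio per kg, as on the card
hazard_ratio = hazard_ratio_for_change(beta, 1)
percent_increase = (hazard_ratio - 1) * 100

print(round(hazard_ratio, 3))      # hazard ratio for a 1 kg increase
print(round(percent_increase, 1))  # percentage increase in the hazard
print(round(hazard_ratio_for_change(beta, 5), 3))  # a 5 kg increase compounds multiplicatively
```

Note that the effect of a larger change is multiplicative: a 5 kg increase gives exp(5 x beta), not 5 x exp(beta).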

15
Q

Cox’s proportional hazards model

Summary of methods

A

Likelihood function (partial likelihood)
product from i=1 to n (L_i)^d_i
where L_i = (hazard for person with time t_i) / (sum of hazards for people with times >= t_i)
and d_i = 1 if the event was observed for person i, 0 if censored

log-likelihood function
sum from i=1 to n d_i log(L_i)

Obtain the maximum likelihood estimates using R or Solver in Excel
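
For a single covariate the log-likelihood can be written out by hand, because the baseline hazard h_0(t) cancels in each ratio L_i. The sketch below assumes no tied event times. The data are made up, and a crude grid search stands in for R or Solver:

```python
import math

def cox_partial_log_likelihood(beta, times, events, x):
    """Sum of d_i * log(L_i) for a single covariate x."""
    total = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i == 0:
            continue  # censored subjects contribute only through risk sets
        # Sum of exp(beta * x_j) over everyone still at risk at t_i.
        risk_sum = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        total += beta * x_i - math.log(risk_sum)
    return total

# Hypothetical data: x = 1 for smokers, 0 for non-smokers.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
x = [1, 0, 1, 0, 0, 1]

# Crude grid search over beta in [-3, 3]; a real fit would use an optimiser.
grid = [k / 100 for k in range(-300, 301)]
beta_hat = max(grid, key=lambda b: cox_partial_log_likelihood(b, times, events, x))
print(beta_hat, cox_partial_log_likelihood(beta_hat, times, events, x))
```

At beta = 0 every subject in a risk set has the same hazard, so each L_i is simply 1 divided by the number at risk.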

16
Q

Cox’s proportional hazards model

Hypotheses for z test

A

Null: beta = 0
Alternative: beta ≠ 0, for each beta
Get the associated p-value from R

Confidence interval
Reject H0 if the confidence interval for e^beta doesn’t contain 1, i.e. conclude e^beta ≠ 1
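
The z test and the confidence interval can be reproduced from an estimate and its standard error. The values beta_hat = 0.1 and se = 0.04 are hypothetical, as is the function name. The sketch assumes the usual normal approximation and a 95% interval:

```python
import math

def wald_summary(beta_hat, se):
    """Return z, the two-sided p-value and a 95% CI for exp(beta)."""
    z = beta_hat / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    lower = math.exp(beta_hat - 1.96 * se)
    upper = math.exp(beta_hat + 1.96 * se)
    return z, p_value, (lower, upper)

# Hypothetical estimate and standard error.
z, p_value, (lower, upper) = wald_summary(0.1, 0.04)
print(z, p_value, lower, upper)
```

The two approaches agree here: the p-value is below 0.05 and the interval for exp(beta) excludes 1.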

17
Q

Cox’s proportional hazards model

Likelihood ratio test

A

Null model
h(t, x_1, x_2, …, x_p) = h_0(t)

Test statistic: 2 x (log-likelihood of our fitted model - log-likelihood of the null model)
Associated p-value from the R output
Null hypothesis: all betas = 0
Alternative: at least one beta ≠ 0
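
As an illustration of the calculation, the sketch below handles the special case of a single covariate, where the statistic is compared with a chi-squared distribution on 1 degree of freedom. With p covariates the reference distribution has p degrees of freedom, which this sketch does not cover. The log-likelihood values and function name are hypothetical:

```python
import math

def likelihood_ratio_test_1df(loglik_null, loglik_fitted):
    """Return the test statistic and p-value for a one-covariate model."""
    stat = 2 * (loglik_fitted - loglik_null)
    # For 1 df, the chi-squared tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical log-likelihoods for the null and fitted models.
stat, p_value = likelihood_ratio_test_1df(-50.0, -47.0)
print(stat, p_value)
```

The fitted model can never have a lower maximised log-likelihood than the null model nested inside it, so the statistic is non-negative.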

18
Q

Cox’s proportional hazards model

Alternative hypotheses tests

A

The Wald and the score (log-rank) tests

Given in the R output

19
Q

How do we test the proportional hazards assumption?

A

Produce a plot based on the Kaplan-Meier survival functions by treatment
If S_hat(t) represents the estimated survival function, we plot log(-log(S_hat(t))) against t and hope to see that the curves are fairly parallel across treatment levels

Or use a formal test in R, where model is the fitted Cox model:
cox.zph(model)
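
Why parallel curves indicate proportional hazards can be checked numerically. Under proportional hazards with hazard ratio HR, the second group's survival is S_0(t)^HR, so the two log(-log) curves differ by the constant log(HR). The survival values and HR = 2 below are made up for illustration:

```python
import math

def log_minus_log(s_hat):
    """Transform a survival probability (strictly between 0 and 1)."""
    return math.log(-math.log(s_hat))

hazard_ratio = 2.0  # hypothetical treatment effect
baseline_survival = [0.9, 0.7, 0.5, 0.3]  # hypothetical S_0(t) at four times
treated_survival = [s ** hazard_ratio for s in baseline_survival]

# Vertical gap between the two transformed curves at each time point.
gaps = [log_minus_log(s1) - log_minus_log(s0)
        for s0, s1 in zip(baseline_survival, treated_survival)]
print(gaps)
```

Every gap equals log(2), so the transformed curves are exactly parallel. With real Kaplan-Meier estimates the curves will only be roughly parallel.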

20
Q

Links between the pdf, cdf, survival and cumulative hazard functions

A

The pdf, cdf, survival and cumulative hazard functions are related: if we know one, we can derive the others

key link:
(log{S(t)})’=S’(t)/S(t)={1-F(t)}’/S(t)=-f(t)/S(t)=-h(t)

integrating gives 
log{S(t)}=-H(t)
so 
S(t)=exp{-H(t)}

Using the hazard function we find
H(t)=integral from 0 to t h(u)du
S(t)=exp{-H(t)}
F(t)=1-S(t) = 1-exp{-H(t)}

f(t) = F’(t) = {1-exp{-H(t)}}’=exp{-H(t)}H’(t)
= h(t)exp{-H(t)}
= h(t)S(t)
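
These links can be checked numerically by starting from a hazard function alone. The sketch below integrates a Weibull hazard with the trapezium rule and recovers S(t), F(t) and f(t). The parameter values and function names are illustrative assumptions:

```python
import math

def cumulative_hazard(hazard, t, steps=1000):
    """Approximate H(t) = integral from 0 to t of h(u) du (trapezium rule)."""
    width = t / steps
    total = 0.0
    for k in range(steps):
        left = k * width
        right = left + width
        total += 0.5 * (hazard(left) + hazard(right)) * width
    return total

lam, alpha = 0.5, 2.0  # illustrative Weibull parameters

def hazard(u):
    return lam * alpha * u ** (alpha - 1)

t = 2.0
H = cumulative_hazard(hazard, t)
S = math.exp(-H)   # S(t) = exp{-H(t)}
F = 1.0 - S        # F(t) = 1 - S(t)
f = hazard(t) * S  # f(t) = h(t) S(t)

print(H, S, F, f)
print(math.exp(-lam * t ** alpha))  # closed-form S(t) for comparison
```

The survival probability recovered from the hazard matches the closed-form Weibull survival function.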