Epidemiology Chapter 5 Flashcards
What is survival data?
Time to event data
- the time from a defined origin until the occurrence of a predetermined event, measured for each subject
for example
- time until death following a heart transplant
- time until death following an AIDS diagnosis
- length of time in remission
- time until rejection of a transplanted organ
Why can’t we use standard techniques?
- Survival times are non-negative
- Distribution of survival times is usually skewed and often highly skewed
- Censored observations, when we don’t know the exact survival time of a subject
Censoring
Right
Left
Interval
Right
We have a lower bound: we only know that the subject survived to a certain point. This is denoted by *
- e.g. a patient withdraws from a study, or dies from other causes
Left
We have an upper bound, we know that the subject was dead by a certain time t but not the exact time
Interval
We know that the time of death occurred during a particular time interval but we don’t know the exact time
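A minimal Python sketch of how the three censoring types could be told apart, assuming each subject's survival time is stored as a hypothetical (lower, upper) pair of bounds. The function name and the encoding are illustrative assumptions, not part of the chapter.

```python
import math

def censoring_type(lower, upper):
    """Classify an observation from the bounds known for its survival time T.

    Hypothetical encoding for this sketch: we know lower <= T <= upper.
    """
    if lower == upper:
        return "exact"      # event time observed exactly
    if math.isinf(upper):
        return "right"      # only a lower bound: survived at least `lower`
    if lower == 0:
        return "left"       # only an upper bound: event happened by `upper`
    return "interval"       # event happened somewhere between the bounds

# Made-up observations: exact, right-, left- and interval-censored
observations = [(5.0, 5.0), (8.0, math.inf), (0.0, 3.0), (2.0, 4.0)]
labels = [censoring_type(lo, hi) for lo, hi in observations]
```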
The cumulative distribution function
The probability of dying at or before time t
F(t)=Pr(T<=t) = integral from 0 to t f(u) du
F(0) =0 and F(infinity) = 1
It is impossible to die before time t=0 and certain that death will take place before t=infinity
The probability density function
f(t)=F’(t) = dF(t)/dt
alternatively f(t) = lim (delta t -> 0) Pr(t<=T< t+delta t)/delta t
The survival function
The probability of dying after time t, i.e. that the event of interest has not occurred by time t
S(t) = Pr(T>t)=1-F(t)
S(0)=1 and S(infinity)=0
It is certain you will survive beyond time 0 and you will not survive beyond t=infinity
The Hazard function
Expresses the risk of death at some time point t - the instantaneous death rate at time t
h(t) = lim (deltat->0) Pr(t<=T< t+delta t given T>=t)/delta t
using the rule Pr(A | B) = Pr(A n B)/Pr(B)
Pr(t<=T< t + delta t given T>=t) = Pr(t<=T< t+delta t n T>=t)/ Pr(T>=t)
Pr(t<=T< t+delta t n T>=t) = Pr(t<=T< t+delta t), and Pr(T>=t) = S(t), so h(t) = f(t)/S(t)
The cumulative hazard function
H(t) = integral from 0 to t h(u) du
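A short Python sketch of the integral above, approximating H(t) with the trapezoid rule. The linear hazard h(u) = 0.5u is a made-up example, and the last line uses the standard identity S(t) = exp(-H(t)), which is not stated on the cards.

```python
import math

def cumulative_hazard(h, t, steps=1000):
    """Approximate H(t) = integral from 0 to t of h(u) du (trapezoid rule)."""
    width = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for k in range(1, steps):
        total += h(k * width)
    return total * width

def hazard(u):
    # Hypothetical hazard rising linearly with time (assumption for illustration)
    return 0.5 * u

H2 = cumulative_hazard(hazard, 2.0)   # exact value is 0.25 * 2**2 = 1.0
S2 = math.exp(-H2)                    # survival probability implied by H(2)
```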
The exponential distribution
pdf f(t)=λe^(-λt), λ>0, t>=0
Skewed distribution, so it is often used in survival analysis
C.d.f
F(t)=integral from 0 to t f(x)dx
= integral from 0 to t of λe^(-λx) dx
= 1-e^(-λt)
Survival function
S(t)=1-F(t) = 1-(1-e^(-λt))=e^(-λt)
Hazard function
h(t) = f(t)/S(t) = λe^(-λt)/e^(-λt) = λ - often not realistic in practice, as the instantaneous death rate is the same at every time
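The four exponential functions can be checked numerically. This is a minimal Python sketch; the rate 0.2 is a hypothetical value chosen only for illustration.

```python
import math

def exp_pdf(t, lam):
    return lam * math.exp(-lam * t)

def exp_cdf(t, lam):
    return 1.0 - math.exp(-lam * t)

def exp_survival(t, lam):
    return math.exp(-lam * t)

def exp_hazard(t, lam):
    # h(t) = f(t) / S(t); the exponentials cancel, leaving lam
    return exp_pdf(t, lam) / exp_survival(t, lam)

lam = 0.2  # hypothetical rate
hazards = [exp_hazard(t, lam) for t in (0.5, 1.0, 5.0, 20.0)]
```

Every entry of `hazards` equals the rate up to rounding error, which is the constant-hazard property described on the card.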
The Weibull distribution
pdf f(t)=λαt^(α-1) e^(-λt^α), t,α,λ>0
if we set α=1 we get the exponential distribution
α is the shape parameter and λ is the scale parameter
C.d.f
F(t)=integral from 0 to t f(x)dx
= integral from 0 to t λαx^(α-1) e^(-λx^α) dx
= 1-e^(-λt^α)
Survival function
S(t)=1-F(t) = e^(-λt^α)
Hazard function
h(t) = f(t)/S(t) = λαt^(α-1) e^(-λt^α)/e^(-λt^α) = λαt^(α-1)
depends on time, affording more flexibility
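A small Python sketch of how the shape parameter changes the Weibull hazard. The parameter values are hypothetical; the hazard is constant for α=1 (the exponential case), rises with time for α>1 and falls for α<1.

```python
import math

def weibull_survival(t, lam, alpha):
    # S(t) = exp(-lam * t**alpha)
    return math.exp(-lam * t ** alpha)

def weibull_hazard(t, lam, alpha):
    # h(t) = lam * alpha * t**(alpha - 1)
    return lam * alpha * t ** (alpha - 1)

lam = 0.2  # hypothetical value
times = (0.5, 1.0, 2.0, 4.0)
constant = [weibull_hazard(t, lam, 1.0) for t in times]  # alpha = 1: exponential
rising = [weibull_hazard(t, lam, 2.0) for t in times]    # alpha > 1
falling = [weibull_hazard(t, lam, 0.5) for t in times]   # alpha < 1
```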
Kaplan-Meier method
Non-parametric, doesn’t require specific assumptions about the distributions
Used to estimate the survival function
t_i is the time of the ith event
n_i is the number of people yet to experience the event and uncensored just before t_i
d_i is the number of deaths during [t_i, t_(i+1))
c_i is the number of censored observations in the interval [t_i, t_(i+1))
S_hat(t) = product over t_i <= t of (n_i-d_i)/n_i
SE(S_hat(t)) = S_hat(t) sqrt( sum over t_i <= t of d_i/(n_i(n_i-d_i)) )
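The estimator and its standard error can be sketched in a few lines of Python. The six-subject data set is made up, and the (time, event) encoding with event = 1 for a death and 0 for a censored observation is an assumption of this sketch.

```python
import math

def kaplan_meier(times, events):
    """Return (t_i, S_hat, SE) for each distinct death time.

    times:  observed time for each subject
    events: 1 if the subject died at that time, 0 if censored
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    s_hat = 1.0
    running_sum = 0.0   # running sum of d_i / (n_i * (n_i - d_i))
    rows = []
    for t_i in death_times:
        n_i = sum(1 for t in times if t >= t_i)  # at risk just before t_i
        d_i = sum(1 for t, e in zip(times, events) if t == t_i and e == 1)
        s_hat *= (n_i - d_i) / n_i
        if n_i > d_i:
            running_sum += d_i / (n_i * (n_i - d_i))
            se = s_hat * math.sqrt(running_sum)
        else:
            se = 0.0    # S_hat is 0; the formula is undefined, so this sketch reports 0
        rows.append((t_i, s_hat, se))
    return rows

# Hypothetical data: 6 subjects, two of them censored (times 3 and 5)
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 0, 1]
table = kaplan_meier(times, events)
```

With these data the estimate drops only at the death times 1, 2, 4 and 6. The censored subjects at times 3 and 5 lower the number at risk without causing a drop.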
The log-rank test
Non-parametric or distribution free test which allows us to compare the survival times of two groups.
Wish to test
H_0 : survival rates in the two underlying populations are the same
H_1: Not H_0
Test statistic
χ^2=(obs_A-Exp_A )^2/(Exp_A )+(obs_B-Exp_B )^2/(Exp_B )
obs_A - number of observed deaths in group A
Exp_A - number of expected deaths in group A
Under the assumption that H_0 is true the test statistic will follow a chi-squared distribution with 1 degree of freedom
Expected number of deaths in treatment group A
At each event time t_i: d_i x n_Ai/(n_Ai+n_Bi) = d_i x n_Ai/n_i
Exp_A is the sum of these terms over all event times
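A minimal Python sketch of the observed/expected form of the test statistic. The two small groups are made-up data, and the (time, event) encoding with event = 1 for a death and 0 for a censored observation is an assumption of this sketch.

```python
def count_deaths(times, events, t_i):
    return sum(1 for t, e in zip(times, events) if t == t_i and e == 1)

def log_rank(times_a, events_a, times_b, events_b):
    """Return (obs_A, exp_A, obs_B, exp_B, chi2) for two groups."""
    death_times = sorted(
        {t for t, e in zip(times_a, events_a) if e == 1}
        | {t for t, e in zip(times_b, events_b) if e == 1}
    )
    obs_a, obs_b = sum(events_a), sum(events_b)
    exp_a = exp_b = 0.0
    for t_i in death_times:
        n_a = sum(1 for t in times_a if t >= t_i)  # at risk in A just before t_i
        n_b = sum(1 for t in times_b if t >= t_i)  # at risk in B just before t_i
        d_i = (count_deaths(times_a, events_a, t_i)
               + count_deaths(times_b, events_b, t_i))
        exp_a += d_i * n_a / (n_a + n_b)
        exp_b += d_i * n_b / (n_a + n_b)
    chi2 = (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return obs_a, exp_a, obs_b, exp_b, chi2

# Hypothetical data: group A dies early, group B dies late, no censoring
obs_a, exp_a, obs_b, exp_b, chi2 = log_rank([1, 2, 3], [1, 1, 1],
                                            [4, 5, 6], [1, 1, 1])
```

The expected counts always add up to the total number of observed deaths, which is a quick check on a hand calculation.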
What is the Cox proportional hazards model function?
h(t , x) = h_0(t)e^(beta_1x_1 + beta_2x_2 + …+ beta_px_p)
The intercept term is incorporated in h_0(t)
h_0(t) is known as the baseline hazard function: the hazard when all x = 0
The ratio of the hazards for two patients is constant: the hazards are proportional and the ratio does not depend on time
The model assumes that patients have the same ‘shape’ of hazard function but that it is shifted multiplicatively according to the covariates so that they can never cross
Interpretation of the Cox’s proportional hazards model parameters for single covariate
Suppose we have two individuals, one a smoker and one a non-smoker
Hazard ratio is given by exp(beta)
Beta is known as the log-hazard ratio
For a continuous covariate x, e.g. weight
A 1 unit increase in weight multiplies the hazard by exp(beta)
for example, beta = 0.1 gives exp(beta) = 1.105
A 1kg increase in weight leads to a 10.5% increase in the hazard
Figure is constant no matter what t is
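The arithmetic on this card can be reproduced in Python. The helper name `hazard_ratio` and the 10kg comparison are illustrative assumptions; beta = 0.1 is the value from the card.

```python
import math

def hazard_ratio(beta, delta_x=1.0):
    """Hazard ratio for an increase of delta_x in a covariate with coefficient beta."""
    return math.exp(beta * delta_x)

beta = 0.1                          # value used on the card
hr_1kg = hazard_ratio(beta)         # about 1.105
hr_10kg = hazard_ratio(beta, 10)    # exp(1.0): effects multiply, they do not add
percent_increase = (hr_1kg - 1.0) * 100
```

Note that a 10kg increase gives exp(10 x 0.1), not ten times the 10.5% increase, because the covariate acts multiplicatively on the hazard.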
Cox’s proportional hazards model
Summary of methods
Likelihood function
product from i=1 to n (L_i)^d_i
where d_i = 1 if person i experienced the event and 0 if censored
and L_i = hazard for the person with time t_i / sum of hazards for people with times >= t_i
log-likelihood function
sum from i=1 to n d_i log(L_i)
Obtain the maximum likelihood estimates using R or Solver in Excel
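A minimal Python sketch of the same calculation for a single covariate, with a golden-section search standing in for R or Solver. The data are made up, the sketch assumes no tied event times, and the baseline hazard cancels in each L_i, so only exp(beta x) appears.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Sum over subjects of d_i * log(L_i) for one covariate x."""
    total = 0.0
    for i in range(len(times)):
        if events[i] == 0:
            continue  # censored subjects only appear in other subjects' risk sets
        risk_sum = sum(math.exp(beta * x[j])
                       for j in range(len(times)) if times[j] >= times[i])
        total += beta * x[i] - math.log(risk_sum)
    return total

def maximise(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if f(a) < f(b):
            lo = a   # maximum lies to the right of a
        else:
            hi = b   # maximum lies to the left of b
    return (lo + hi) / 2

# Hypothetical data: x = 1 for smokers, 0 for non-smokers
times = [2, 3, 5, 7, 8, 11]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0]
beta_hat = maximise(lambda b: cox_partial_loglik(b, times, events, x), -5.0, 5.0)
hazard_ratio = math.exp(beta_hat)
```

The search range of -5 to 5 is an arbitrary choice for this sketch. A real analysis would use a fitted model from statistical software, which also handles ties and reports standard errors.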