Survival analysis Flashcards
Life tables and survival probability
“An insurance company's life table, showing its number of clients (that is, life insurance policy holders) by age, and the number of deaths during the past year in each age group, for example five deaths among the 312 clients aged 59.
Survival probability: the table estimates a probability of 0.893 that a person aged 30 (the beginning of the table) survives past age 59, etc. S is calculated according to an ancient but ingenious algorithm.
The probability of dying at age i: f_i = Pr{X = i}, where X denotes a typical lifetime.
”
Hazard rate
“The hazard rate h_i at age i is by definition the probability of dying at age i given survival past age i − 1.
A crucial observation is that the probability S_ij of surviving past age j given survival past age i − 1 is the product of the probabilities of surviving each intermediate year: S_ij = (1 − h_i)(1 − h_{i+1}) ··· (1 − h_j).
First, each h_i was estimated as the binomial proportion of the number of deaths y_i among the n_i clients: h_i = y_i / n_i.
The insurance company doesn’t have to wait 50 years to learn the probability of a 30-year-old living past 80 (estimated to be 0.506 in the table).
In the continuous case, a constant hazard rate corresponds to the exponential distribution, which is memoryless.”
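The life-table calculation described in this card can be sketched in a few lines of Python. This is only an illustration: the function name `life_table_survival` and the counts are made up for the example (only the 5 deaths among 312 clients echoes the text), and the survival values are cumulative from the first age group in the list.

```python
def life_table_survival(deaths, at_risk):
    """Return (hazards, survival) for consecutive age groups.

    hazards[i]  = deaths[i] / at_risk[i]   (binomial proportion)
    survival[i] = product of (1 - hazards[k]) for all k <= i
    """
    hazards = [y / n for y, n in zip(deaths, at_risk)]
    survival = []
    running = 1.0
    for h in hazards:
        running *= 1.0 - h  # survive one more year
        survival.append(running)
    return hazards, survival


# Hypothetical counts for three consecutive ages (not the table from the text).
deaths = [1, 2, 5]
at_risk = [100, 200, 312]
hazards, survival = life_table_survival(deaths, at_risk)
```

Because each hazard is estimated from a single year of data, the whole survival curve is available at once, which is why no 50-year wait is needed.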
Censored data
“The response for each patient is survival time in days. The +-sign following some entries indicates censored data, that is, survival times known only to exceed the reported value. These are patients “lost to followup,” mostly because the NCOG experiment ended with some of the patients still alive.
This is what the experimenters hoped to see of course, but it complicates the comparison. Notice that there is more censoring in Arm B.
Solution: Kaplan–Meier.”
Kaplan-Meier
“Kaplan–Meier curves provide a graphical comparison that takes proper account of censoring. Kaplan–Meier curves have become familiar friends to medical researchers, a lingua franca for reporting clinical trial results.
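Assuming ordered survival times z_(1) < z_(2) < ... < z_(n), with indicator d_(k) equal to 1 for an observed death and 0 for a censored time, the Kaplan–Meier estimate of the survival probability is the life table estimate: S_(j) = prod_{k ≤ j} [(n − k) / (n − k + 1)]^{d_(k)}.
The survival probability calculation takes into account whether each observation was censored: only observed deaths lower the estimate, while censored observations simply leave the risk set.
Accuracy: Greenwood's formula gives the standard error of the estimate.”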
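A minimal sketch of the Kaplan–Meier estimate with Greenwood standard errors, assuming the usual convention that observations censored at a death time still count as at risk at that time. The function name `kaplan_meier` and the five data points are invented for illustration and are not taken from the NCOG study.

```python
import math


def kaplan_meier(times, observed):
    """Kaplan-Meier curve with Greenwood standard errors.

    times    -- survival times
    observed -- True if the death was observed, False if censored
    Returns a list of (time, survival, std_error), one per distinct death time.
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all observations that share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
                std_error = survival * math.sqrt(greenwood_sum)
            else:
                std_error = float("nan")  # Greenwood term undefined once S = 0
            curve.append((t, survival, std_error))
        n_at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


# Hypothetical data: False marks a censored ("+") entry.
times = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(times, observed)
```

Censored entries never produce a step in the curve; they only reduce the number at risk for later death times.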
Hazard ratio
“The hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions characterised by two distinct levels of a treatment variable of interest.
For example, if in a clinical study of a drug the treated population dies at twice the rate per unit time of the control population, the hazard ratio is 2.
”
Parametric hazard rates
“Life table curves are nonparametric, in the sense that no particular relationship is assumed between the hazard rates h_i.
A parametric approach can greatly improve the curves’ accuracy.
We assume that the death counts y_k are independent binomials, y_k ~ Bi(n_k, h_k), and fit a regression model to the logits of the hazard rates h_k.
Comparison in terms of hazard rates is more informative than the survival curves of Figure 9.1. Both arms show high initial hazards, peaking at five months, and then a long slow decline.”
The log-rank test
“The log-rank test employs an ingenious extension of life tables for the nonparametric two-sample comparison of censored survival data.
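The idea here is simple but clever. Each month we test the null hypothesis of equal hazard rates in the two arms.
Under the null hypothesis, and conditional on the margins of that month's 2 × 2 table, the number of deaths in one arm is hypergeometrically distributed.”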
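The log-rank idea can be sketched as follows. This version is an assumption-laden illustration: it builds one 2 × 2 table per distinct death time rather than per month, uses the hypergeometric mean and variance for the deaths in arm A, and returns an approximately standard normal statistic. The function name `log_rank_z` and the toy data are hypothetical.

```python
import math


def log_rank_z(times_a, observed_a, times_b, observed_b):
    """Log-rank z statistic; positive values mean more deaths than expected in arm A."""
    records = [(t, d, 0) for t, d in zip(times_a, observed_a)]
    records += [(t, d, 1) for t, d in zip(times_b, observed_b)]
    death_times = sorted({t for t, d, _ in records if d})
    diff_sum = 0.0
    var_sum = 0.0
    for t in death_times:
        # Numbers at risk just before time t in each arm.
        n_a = sum(1 for s, _, g in records if s >= t and g == 0)
        n_b = sum(1 for s, _, g in records if s >= t and g == 1)
        # Observed deaths at time t in each arm.
        d_a = sum(1 for s, d, g in records if s == t and d and g == 0)
        d_b = sum(1 for s, d, g in records if s == t and d and g == 1)
        n, d = n_a + n_b, d_a + d_b
        # Hypergeometric mean of arm A deaths given the table margins.
        diff_sum += d_a - n_a * d / n
        if n > 1:
            # Hypergeometric variance given the table margins.
            var_sum += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if var_sum == 0.0:
        raise ValueError("no informative death times")
    return diff_sum / math.sqrt(var_sum)


# Hypothetical, fully observed toy data: arm A dies earlier than arm B.
z_ab = log_rank_z([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3)
z_ba = log_rank_z([4, 5, 6], [True] * 3, [1, 2, 3], [True] * 3)
z_same = log_rank_z([1, 2, 3], [True] * 3, [1, 2, 3], [True] * 3)
```

Swapping the arms flips the sign of the statistic, and identical arms give zero, as expected for a test of equal hazard rates.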
Proportional hazards model
“The Kaplan–Meier estimator is a one-sample device, dealing with data coming from a single distribution. The log-rank test makes two-sample comparisons. Proportional hazards ups the ante to allow for a full regression analysis of censored data (including multiple variables).
Now each data point includes a 1 × p vector of covariates whose effect on survival we wish to assess.
Estimation uses the partial likelihood.”
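As a rough illustration of the partial likelihood, the sketch below evaluates the log partial likelihood of a proportional hazards model for a given coefficient vector. It assumes no tied death times, and the function name `cox_log_partial_likelihood`, the covariate layout, and the toy data are all hypothetical choices made for this example.

```python
import math


def cox_log_partial_likelihood(beta, times, observed, covariates):
    """Log partial likelihood: sum over observed deaths of
    x_i . beta - log( sum over the risk set of exp(x_j . beta) ).
    """
    def linear_score(x):
        return sum(b * v for b, v in zip(beta, x))

    total = 0.0
    for t_i, d_i, x_i in zip(times, observed, covariates):
        if not d_i:
            continue  # censored cases contribute only through the risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_scores = [linear_score(x_j)
                       for t_j, x_j in zip(times, covariates) if t_j >= t_i]
        total += linear_score(x_i) - math.log(sum(math.exp(s) for s in risk_scores))
    return total


# Hypothetical data: one covariate, and the subject with x = 1 dies first.
times = [1, 2, 3]
observed = [True, True, True]
covariates = [[1.0], [0.0], [0.0]]
ll_zero = cox_log_partial_likelihood([0.0], times, observed, covariates)
ll_one = cox_log_partial_likelihood([1.0], times, observed, covariates)
```

In this toy example a positive coefficient fits better than zero, since the subject with the larger covariate value has the earliest death. In practice the coefficients would be found by maximizing this function numerically.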
Missing data and the EM-algorithm
“The EM algorithm is an iterative technique for solving missing-data inferential problems using only standard methods.
In other words, maximum likelihood estimation is self-consistent: generating artificial data from the MLE density doesn’t change the MLE. (Starting from the MLE, an EM step leaves the estimate unchanged.)
The Kaplan–Meier survival curve is self-consistent, leading to its identification as the “nonparametric MLE” of a survival function.
”