Survival Analysis I Flashcards
What is survival analysis?
A statistical method to analyse time-to-event data, focusing on the time until an event occurs (e.g., death, disease onset)
What are some common applications of survival analysis?
Medical research (time to death), clinical trials, engineering (failure time of components), and social sciences (time to job acquisition)
What is censoring in survival analysis?
A censored observation is one with incomplete information.
A situation where we do not observe the event of interest for some subjects during the study period
What is right censoring?
When the event has not occurred by the end of the study period or the subject is lost to follow-up, so we only know that the true event time is later than the last time observed
What are the assumptions of survival analysis?
- Non-informative censoring
- Survival probabilities are the same for subjects recruited early and late in the study
- Events happen at the times recorded
What are the two key variables in survival data?
- Time variable
- Failure/censoring indicator (1 = event occurred, 0 = censored)
What is an example row of data in a survival dataset for someone who never had an event?
ID: 1
Entry date: 01 Jan 2023
Event date: .
Censor date: 01 Jan 2025
Time: 2.0
Event: 0
What is an example row of data in a survival dataset for someone who had an event?
ID: 2
Entry date: 01 Jan 2023
Event date: 01 May 2024
Censor date: .
Time: 1.3
Event: 1
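A minimal sketch of how the Time and Event columns can be derived from the dates in the two example rows. It assumes time is measured in years as days / 365.25; the function name survival_record is hypothetical.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumption: average year length, allowing for leap years


def survival_record(entry, event_date=None, censor_date=None):
    """Return (time_in_years, event_indicator) for one subject.

    If an event date is recorded, follow-up ends at the event (event = 1);
    otherwise it ends at the censor date (event = 0).
    """
    if event_date is not None:
        end, event = event_date, 1
    else:
        end, event = censor_date, 0
    time = (end - entry).days / DAYS_PER_YEAR
    return round(time, 1), event


# The two example rows above: ID 1 was censored, ID 2 had the event
print(survival_record(date(2023, 1, 1), censor_date=date(2025, 1, 1)))  # (2.0, 0)
print(survival_record(date(2023, 1, 1), event_date=date(2024, 5, 1)))   # (1.3, 1)
```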
What is the Kaplan-Meier method?
Also known as the “product-limit estimator”. A non-parametric estimator of the survival probability over time that accounts for censored data. It does not assume the event rate remains constant over time, so the result is not easily summarised by a single number. It estimates the probability of remaining event-free beyond each time point; 1 minus this is the cumulative probability of having experienced the event by that time
How is the Kaplan-Meier survival function estimated?
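By starting at 1 and multiplying by the conditional probability of surviving each successive event time: $\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - p(t_j)\right)$, where $p(t_j) = d_j / n_j$ is the proportion of the $n_j$ subjects at risk just before $t_j$ who have the event ($d_j$ events) at $t_j$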
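A minimal sketch of the product-limit calculation above, on hypothetical data. It follows the usual convention that a subject censored at the same time as an event is still counted as at risk at that time; the function name kaplan_meier is illustrative only.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event occurred at that time, 0 if censored
    Returns a list of (t_j, S(t_j)) pairs in increasing time order.
    """
    survival = 1.0  # S(t) starts at 1 before any events
    curve = []
    for t in sorted({x for x, e in zip(times, events) if e == 1}):
        # n_j: subjects still under follow-up just before t_j
        # (censored at t_j still counts as at risk)
        n_at_risk = sum(1 for x in times if x >= t)
        # d_j: events occurring exactly at t_j
        n_events = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        # Multiply by the conditional survival probability 1 - p(t_j)
        survival *= 1 - n_events / n_at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: six subjects, two of them censored (event = 0)
for t, s in kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]):
    print(t, round(s, 3))  # prints 2 0.833, 3 0.667, 5 0.444, 8 0.0 (one pair per line)
```

Censored subjects never cause a step down; they only shrink the number at risk for later event times.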
What does a Kaplan-Meier curve show?
The probability of remaining event-free over time
What does a step-down in the Kaplan-Meier curve indicate?
The occurrence of an event at that time
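How do we interpret the median survival time from a Kaplan-Meier curve?
The time at which the estimated survival probability first falls to 50% or below, i.e., the time by which half of the subjects are estimated to have experienced the event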
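A minimal sketch of reading the median off a Kaplan-Meier curve, assuming the curve is given as (time, survival) pairs and using the common convention "first time the estimate is at or below 0.5". The curve values and the function name median_survival are hypothetical.

```python
def median_survival(curve):
    """Median survival time from a Kaplan-Meier curve.

    curve: list of (t_j, S(t_j)) pairs in increasing time order.
    Returns the first time at which S(t) <= 0.5, or None if the curve
    never falls that low (median not reached during follow-up).
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical curve: survival first drops below 0.5 (to about 0.444) at t = 5
curve = [(2, 5 / 6), (3, 4 / 6), (5, 4 / 9), (8, 0.0)]
print(median_survival(curve))  # 5
```

Returning None covers studies where fewer than half of the subjects have the event, in which case the median cannot be estimated.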
How is the incidence rate calculated?
Incidence rate = (number of events) / (total person-years at risk)
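A minimal sketch of the formula above, using the two example rows from earlier (1 event over 2.0 + 1.3 person-years) and expressing the rate per 100 person-years; the function name incidence_rate is hypothetical.

```python
def incidence_rate(n_events, person_years, per=100):
    """Events per `per` person-years at risk."""
    return n_events / person_years * per


# The two example rows above: 1 event over 2.0 + 1.3 = 3.3 person-years
rate = incidence_rate(n_events=1, person_years=2.0 + 1.3)
print(round(rate, 1))  # 30.3 (events per 100 PY)
```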
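How do we interpret an incidence rate ratio (IRR)?
IRR = (incidence rate in group 1) / (incidence rate in group 2, the reference group)
IRR > 1: Higher event rate in group 1 compared to group 2
IRR < 1: Lower event rate in group 1 compared to group 2
IRR = 1: No difference in event rates between the groups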
Example calculation: If men have an IR of 12 per 100 PY and women 18 per 100 PY, what is the IRR?
IRR (women vs men) = 18/12 = 1.5 (women have a 50% higher event rate than men)
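A minimal sketch of the worked example, treating men as the reference group (group 2); the function name incidence_rate_ratio is hypothetical.

```python
def incidence_rate_ratio(rate_group1, rate_group2):
    """IRR comparing group 1 with group 2 (the reference group)."""
    return rate_group1 / rate_group2


# Women 18 per 100 PY vs men (reference) 12 per 100 PY
irr = incidence_rate_ratio(18, 12)
print(irr)  # 1.5
percent_higher = (irr - 1) * 100
print(f"{percent_higher:.0f}% higher event rate in women")  # 50% higher event rate in women
```

Both rates must use the same person-time units (here per 100 PY); the units cancel in the ratio.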
What test is commonly used to compare equality of survival curves?
Log-rank test (similar to Mann-Whitney U/Wilcoxon rank sum test)
Used to compare curves on a Kaplan-Meier plot. It uses the same form of statistic as the chi-squared test: at each event time it compares the observed number of events in each group with the number expected if there were in reality no difference between the groups, and combines these over all event times
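A minimal sketch of a two-group log-rank test, assuming each group is a list of (time, event) pairs. It sums observed minus expected events in group 1 over all event times, divides the square by the usual hypergeometric variance, and converts the statistic to a p-value on 1 degree of freedom. The function name and the data are hypothetical; in practice Stata's sts test does this for you.

```python
import math


def log_rank_test(group1, group2):
    """Two-group log-rank test.

    Each group is a list of (time, event) pairs: event = 1 if the event
    occurred at that time, 0 if the subject was censored.
    Returns (chi_squared_statistic, p_value) on 1 degree of freedom.
    """
    # Distinct times at which at least one event occurred in either group
    event_times = sorted({t for t, e in group1 + group2 if e == 1})
    observed1 = expected1 = variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t
        n1 = sum(1 for time, _ in group1 if time >= t)
        n2 = sum(1 for time, _ in group2 if time >= t)
        # Numbers of events at exactly time t
        d1 = sum(1 for time, e in group1 if time == t and e == 1)
        d2 = sum(1 for time, e in group2 if time == t and e == 1)
        n, d = n1 + n2, d1 + d2
        observed1 += d1
        # Events expected in group 1 if there were no difference between groups
        expected1 += d * n1 / n
        # Variance contribution (zero when only one subject remains at risk)
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    statistic = (observed1 - expected1) ** 2 / variance
    # Upper-tail probability of a chi-squared distribution with 1 df
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical data: every subject in group A has the event before anyone in group B
group_a = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
group_b = [(6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
stat, p = log_rank_test(group_a, group_b)
print(round(stat, 2), round(p, 4))
```

Identical groups give observed = expected at every event time, so the statistic is 0 and the p-value is 1.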
What is the H0 of the log-rank test?
There is no difference in survival probabilities between groups at any time point
What does a significant log-rank test indicate?
Evidence that the survival distributions differ between the groups (at one or more time points); the test alone does not show when or by how much they differ
What are the limitations of the log-rank test?
It assumes proportional hazards and may fail to detect differences when survival curves cross (e.g., when comparing a medicine with a surgical intervention)
What Stata command is used to set survival data?
stset <time>, failure(<event>)
<time> = time from the beginning of the study to the event (e.g., death) or the end of follow-up
<event> = failure/censoring indicator (1 = event occurred, 0 = censored)
How do you generate Kaplan-Meier survival estimates?
sts graph (plots the Kaplan-Meier curve; sts list lists the estimates)
By group: sts graph, by(<group>)
How can you test the equality of survival curves in Stata?
sts test <group> (log-rank test by default)
What command provides summary statistics for survival data?
stdescribe
This provides basic information such as the number of subjects and failures and the average and shortest follow-up per person, and is used to check for obvious errors in the dataset. It also gives the total time at risk (person-years, if time is measured in years), which is used to calculate the overall incidence rate