Duration Models Flashcards
Why duration models?
Because they take timing into account. The question is often not “if” an event will happen, but “when” it will happen.
Two related questions of interest:
1) Analysis and prediction of when an event will happen, or whether it happens at all.
2) Analysis of the effects of covariates on whether and when the event happens.
Duration Data
Time it takes for the event of interest to happen
Hazard rate
The probability that the event occurs in time period t, conditional on it not having happened before t.
When is an observation censored?
An observation is censored if the event did not take place within the observed time period.
What do we know about censored observations:
- DO KNOW: the event did not happen within the observation period
- DON’T KNOW: if and when the event will happen
Why use censored data?
Models should use as much information as possible. Censored observations are not missing at random, and they contain essential information: the event had not happened by the end of the observation period.
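A minimal sketch of how a censored observation can be kept in the data instead of dropped. The function name `to_duration` and the example numbers are hypothetical; time is assumed to be measured in whole periods.

```python
def to_duration(start, event_time, window_end):
    """Return (duration, observed) for one subject.

    event_time is None when no event was recorded for the subject.
    """
    if event_time is not None and event_time <= window_end:
        # Event seen inside the observation period: exact duration is known.
        return event_time - start, True
    # Right-censored: we only know the duration is at least this long.
    return window_end - start, False


records = [
    to_duration(start=0, event_time=5, window_end=12),     # event observed
    to_duration(start=3, event_time=None, window_end=12),  # no event recorded
    to_duration(start=2, event_time=15, window_end=12),    # event after the window
]
print(records)
```

Each record keeps the "did not happen yet" information as an observed = False flag next to the duration.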
Hazard model
A model for the hazard rate, which is the probability that an event happens in a time interval given that it has not happened yet.
Building blocks of the hazard model:
- Probability density function
- Cumulative distribution function
- Survival function
- Hazard rate
Probability density function
Prob. that the event happens in time interval t.
Cumulative distribution function
Prob. that the event takes place AT or BEFORE time t.
Survival function
Prob. that the event has not happened by time t, i.e. that it happens AFTER time t (1 minus the cumulative distribution function).
Hazard rate
Conditional prob. that the event occurs at t, given that it has not occurred before t.
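How the four building blocks relate can be sketched in discrete time. The function name `building_blocks` and the probabilities are hypothetical; the sketch assumes time runs in periods 1, 2, 3, ... and that `pmf[i]` is the probability that the event happens in period i+1.

```python
def building_blocks(pmf):
    """Return (F, S, h): CDF, survival function and hazard rate per period."""
    F, S, h = [], [], []
    cum = 0.0
    for p in pmf:
        at_risk = 1.0 - cum          # prob. of no event before this period
        cum += p
        F.append(cum)                # event at or before this period
        S.append(1.0 - cum)          # event after this period
        # Hazard: prob. of the event now, given no event before.
        h.append(p / at_risk if at_risk > 0 else float("nan"))
    return F, S, h


pmf = [0.2, 0.3, 0.1]                # hypothetical per-period probabilities
F, S, h = building_blocks(pmf)
print(F, S, h)
```

In period 2 the unconditional probability is 0.3, but the hazard is 0.3 / 0.8 = 0.375, because only 80% of cases are still at risk.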
Kaplan Meier Survival Function
Uses the survival function S(t): the probability that the event has not happened up to and including time period t.
Kaplan-Meier estimator
Non-parametric estimator computed directly from the observed proportions of surviving cases over successive time periods.
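A minimal pure-Python sketch of the estimator: at each event time, the survival estimate is multiplied by the proportion of at-risk cases that survive. The function name `kaplan_meier` and the data are hypothetical; censored cases count as at risk up to their censoring time.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each distinct time at which an event occurred."""
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    surv, curve = 1.0, []
    for t in event_times:
        # Cases still under observation just before t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events (not censorings) that happen exactly at t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve


durations = [2, 3, 3, 5, 8]                   # hypothetical durations
observed = [True, True, False, True, False]   # False = censored
curve = kaplan_meier(durations, observed)
print(curve)  # roughly S(2) = 0.8, S(3) = 0.6, S(5) = 0.3
```

The censored case at t = 3 lowers the at-risk count for t = 5 but never causes a drop in the curve itself.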
Cox Proportional Hazard model
Model for the hazard rate that allows for including:
- fixed covariates
- time-varying covariates
Cox Proportional Hazard model, formula:
h(t) = h0(t) * exp(B1*X1 + ... + Bk*Xk)
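The formula can be evaluated directly once a baseline hazard value is given. A small sketch; the function name `cox_hazard` and all numbers are hypothetical, and `baseline_hazard` stands for the value of h0(t) at one particular t.

```python
import math


def cox_hazard(baseline_hazard, betas, xs):
    """h(t) = h0(t) * exp(B1*X1 + ... + Bk*Xk) for one subject at one time t."""
    linear_predictor = sum(b * x for b, x in zip(betas, xs))
    return baseline_hazard * math.exp(linear_predictor)


# Hypothetical values: h0(t) = 0.05, two covariates.
print(cox_hazard(0.05, betas=[0.5, -0.2], xs=[1, 2]))
```

With all parameters equal to zero the hazard equals the baseline hazard, since exp(0) = 1.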
Why is the Cox Proportional Hazard model popular?
1) Partial likelihood approach: no need to specify the baseline hazard function
2) No need to specify the probability distribution of the durations
3) Very easy & fast to estimate
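Why the baseline hazard drops out can be seen in a sketch of the log partial likelihood: each observed event contributes its own exp(B*X) relative to the sum over everyone still at risk, so h0(t) never appears. The function name `log_partial_likelihood` and the data are hypothetical; the sketch assumes a single covariate and no tied event times.

```python
import math


def log_partial_likelihood(beta, durations, observed, x):
    """Cox log partial likelihood for one covariate (no tied event times)."""
    total = 0.0
    for i, (t_i, event_i) in enumerate(zip(durations, observed)):
        if not event_i:
            continue  # censored cases only enter through the risk sets
        # Risk set: everyone whose duration is at least t_i.
        risk_set = [j for j, t_j in enumerate(durations) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        total += beta * x[i] - math.log(denom)
    return total


durations = [2, 3, 5]            # hypothetical data
observed = [True, True, False]
x = [1, 0, 1]
print(log_partial_likelihood(0.0, durations, observed, x))
```

Estimation then amounts to finding the B that maximizes this function.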
Parameter interpretation via…
The hazard ratio, which is the ratio between two hazard rates.
Example of a hazard ratio, in words and formula
A model for churn in which X is a dummy variable for gender (1 = female, 0 = male):
Hazard ratio = hfemale(t) / hmale(t) = exp(Bgender)
If a covariate has parameter B, then a one-unit increase in that covariate changes the hazard rate by:
100*(exp(B)-1) %
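A quick numeric check of this interpretation. The function names and the parameter values are hypothetical, chosen only for illustration.

```python
import math


def hazard_ratio(beta):
    """Ratio of hazards for a one-unit increase in the covariate."""
    return math.exp(beta)


def percent_change(beta):
    """Percentage change in the hazard rate: 100 * (exp(B) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)


for beta in (0.25, 0.0, -0.25):   # hypothetical parameter estimates
    print(beta, round(hazard_ratio(beta), 3), round(percent_change(beta), 1))
```

A positive B raises the hazard, B = 0 leaves it unchanged, and a negative B lowers it (the percentage change is then negative).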
Key assumption of the proportional hazard model
Proportionality assumption:
- assumption that the ratio of the hazards for different levels of a covariate is constant over time.
- in other words, the hazard ratio of males vs. females is constant over time.
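Why the ratio is constant under the Cox formula: the baseline hazard h0(t) is the only part that depends on t, and it cancels. A short derivation for a gender dummy (1 = female, 0 = male), assuming both groups share the same baseline hazard and the parameter does not vary with time:

```latex
\frac{h_{\text{female}}(t)}{h_{\text{male}}(t)}
  = \frac{h_0(t)\,\exp(B_{\text{gender}} \cdot 1)}{h_0(t)\,\exp(B_{\text{gender}} \cdot 0)}
  = \exp(B_{\text{gender}})
```

The right-hand side contains no t, so the ratio is the same at every point in time.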
Model fit and selection, via:
- Likelihood ratio test
- Wald-test
- Score test
- Pseudo R2
Index of Concordance
The fraction of pairs in your data in which the observation with the longer survival time also has the higher survival probability predicted by your model.
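A minimal sketch of the computation. The function name `concordance_index` and the data are hypothetical. It assumes a pair is only comparable when the shorter duration ends in an observed event, counts tied predictions as half concordant, and skips pairs with tied durations.

```python
def concordance_index(durations, observed, predicted_survival):
    """Fraction of comparable pairs ranked correctly by the model."""
    concordant, comparable = 0.0, 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # Comparable: i has the shorter duration and its event was observed.
            if durations[i] < durations[j] and observed[i]:
                comparable += 1
                if predicted_survival[i] < predicted_survival[j]:
                    concordant += 1.0      # longer survivor predicted higher
                elif predicted_survival[i] == predicted_survival[j]:
                    concordant += 0.5      # tied prediction
    return concordant / comparable if comparable else float("nan")


durations = [2, 4, 6, 8]                   # hypothetical data
observed = [True, True, False, True]       # False = censored
predicted = [0.3, 0.5, 0.4, 0.9]           # predicted survival probabilities
c_index = concordance_index(durations, observed, predicted)
print(c_index)  # 4 of the 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.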
Pseudo R2
Percentage improvement in the log-likelihood of the full model relative to the null model.
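One common way to compute this is McFadden's form, 1 - LL_full / LL_null. That this is the intended variant is an assumption, and the function name `pseudo_r2` and the log-likelihood values are hypothetical.

```python
def pseudo_r2(ll_full, ll_null):
    """Relative improvement in log-likelihood: (LL_null - LL_full) / LL_null."""
    return 1.0 - ll_full / ll_null


# Hypothetical log-likelihoods (both negative, full model closer to zero).
r2 = pseudo_r2(ll_full=-90.0, ll_null=-120.0)
print(r2)  # 0.25, i.e. a 25% improvement over the null model
```

A value of 0 means the covariates add nothing over the null model; larger values mean a larger improvement.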