An introduction to relative survival with merlin

Relative survival models are used widely, particularly in population based cancer epidemiology (Dickman et al. 2004). Generally they are concerned with modelling the excess mortality in a population with a particular disease, compared to a reference population, usually matched on things like age and gender. One of the benefits of the approach is that they do not rely on accurate cause of death information.

Concentrating on applications of relative survival to cancer settings, the data generally comes from population based registries. Such data often exhibits a hierarchical structure, with patients nested within geographical regions such as counties. Patients living in the same area may share unobserved characteristics, such as environmental aspects or medical care access. Charvat et al. 2016 recently described a flexible relative survival model allowing a random intercept, with the baseline log hazard function modelled with B-splines, or restricted cubic splines.

I therefore define the total hazard at the time since diagnosis, $t$, for the $i^{th}$ patient, to be $h_{i}(t)$, with

$$ h_{i}(t) = h_{i}^{*}(t) + \lambda_{i}(t) $$


  • $h_{i}^{*}(t)$ is the expected mortality for the $i^{th}$ patient
  • $\lambda_{i}(t)$ is the excess mortality for the $i^{th}$ patient

and we model

$$ \lambda_{i}(t) = \lambda_{0}(t) \exp (X_{i}^{T} \beta) $$

where $\lambda_{0}(t)$ is the baseline hazard function.

Alternatively, we can model on the (log) cumulative excess hazard scale, using the flexible parametric model of Royston and Parmar 2002, where I define the total cumulative hazard at the time since diagnosis, $t$, for the $i^{th}$ patient to be $H_{i}(t)$, with

$$ H_{i}(t) = H_{i}^{*}(t) + \Lambda_{i}(t) $$


  • $H_{i}^{*}(t)$ is the expected cumulative mortality for the $i^{th}$ patient
  • $\Lambda_{i}(t)$ is the excess cumulative mortality for the $i^{th}$ patient

and we model

$$ \Lambda_{i}(t) = \Lambda_{0}(t) \exp (X_{i}^{T} \beta) $$

where $\Lambda_{0}(t)$ is the baseline cumulative hazard function, modeled with restricted cubic splines.

In terms of modelling relative survival, it really is quite simple in terms of implementation. Our expected mortality rate is simply another variable in our dataset, which is then included in the calculation of the overall hazard function.

To turn a survival model, fitted with merlin, into a relative survival model, we simply pass the name of the variable through the bhazard() option, as follows,

merlin (depvar1 ... , family(..., failure(depvar2) bhazard(varname1))) ...

where depvar1 is our survival time, depvar2 is our event indicator, and varname1 is our variable which contains the expected mortality at each patient’s event time.

This extends relative survival models to any combination of; any number of levels, any number of random effects at each level, time-dependent effects, non-linear covariates, multivariate models i.e. joint models, joint frailty models…the list goes on!

I’ll cover some special cases in more detail in future tutorials.

This example uses merlin version 2.0.2.


comments powered by Disqus