The do file with all the code from this tutorial is available to download here

An area of intense research in recent years is in the field of joint frailty survival models, for the analysis of joint recurrent event and terminal event data. Here I’ll describe and implement the two most popular approaches, originally proposed by Liu et al. (2004) and Mazroui et al. (2012). We have a survival model for the recurrent event process, and a survival model for the terminal event process, linked through shared random effects.

I’ll illustrate these models using the `readmission`

dataset that comes with the extensive `frailtypack`

in `R`

(Król et al. (2017)), developed by Virginie Rondeau and her group.
The dataset has information on rehospitalisation times after surgery in patients diagnosed with colorectal cancer.
Covariates of interest include gender, Dukes’ tumour stage and comorbidity Charlson index. I’ll load the dataset
and generate binary indicator variables for use later.

```
. use "https://www.mjcrowther.co.uk/data/jointfrailty_example",clear
. tab sex, gen(gender)
sex | Freq. Percent Cum.
------------+-----------------------------------
Female | 312 36.24 36.24
Male | 549 63.76 100.00
------------+-----------------------------------
Total | 861 100.00
. tab dukes, gen(dk)
dukes | Freq. Percent Cum.
------------+-----------------------------------
A-B | 324 37.63 37.63
C | 331 38.44 76.07
D | 206 23.93 100.00
------------+-----------------------------------
Total | 861 100.00
. tab charlson, gen(ch)
charlson | Freq. Percent Cum.
------------+-----------------------------------
0 | 577 67.02 67.02
1-2 | 46 5.34 72.36
3 | 238 27.64 100.00
------------+-----------------------------------
Total | 861 100.00
```

The time of rehospitalisation is stored in `time`

, with corresponding event indicator `event`

. This is in clock-reset
formulation, i.e. each time a patient has a cancer recurrence the clock is reset to zero, so it’s a semi-Markov model.
Our overall survival time is stored in `stime`

, with corresponding event indicator stored in `death`

.

In the first specification, we have one random effect which accounts for the correlation between recurrent events, and which is then included in the linear predictor for the terminal event model, with an accompanying estimated coefficient, which estimates the strength of the relationship between the two processes. So for example,

$$ h_{ij}(t) = h_{0}(t) \exp (X_{1ij} \beta_{1} + b_{i}) $$ and $$ \lambda_{i}(u) = \lambda_{0}(u) \exp (X_{1i} \beta_{2} + \alpha b_{i}) $$

where $h_{ij}(t)$ is the hazard function for the $j$th event of the $i$th patient, $\lambda_{i}(u)$ is the hazard function for the terminal event, and $b_{i} \sim N(0,\sigma^2)$.

We can fit such a model with `merlin`

, adjusting for all available covariates in each outcome model,

```
. merlin (time gender1 dk2 dk3 ch2 ch3 M1[id]@1 , family(rp, df(3) failure(event))) ///
> (stime gender1 dk2 dk3 ch2 ch3 M1[id] , family(rp, df(1) failure(death)))
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_1
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -1567.3654
Iteration 1: log likelihood = -1553.5228 (not concave)
Iteration 2: log likelihood = -1524.3864
Iteration 3: log likelihood = -1522.1226
Iteration 4: log likelihood = -1522.0638
Iteration 5: log likelihood = -1522.0637
Mixed effects regression model Number of obs = 861
Log likelihood = -1522.0637
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time: |
gender1 | -.4603471 .1407838 -3.27 0.001 -.7362784 -.1844159
dk2 | .4080398 .157101 2.60 0.009 .1001276 .7159521
dk3 | 1.395737 .2068642 6.75 0.000 .9902907 1.801183
ch2 | .3244111 .2581746 1.26 0.209 -.1816018 .830424
ch3 | .3478035 .1353953 2.57 0.010 .0824336 .6131733
M1[id] | 1 . . . . .
_cons | -2.126217 .1482473 -14.34 0.000 -2.416777 -1.835658
-------------+----------------------------------------------------------------
stime: |
gender1 | -.3533351 .225481 -1.57 0.117 -.7952696 .0885995
dk2 | .9249668 .3356399 2.76 0.006 .2671247 1.582809
dk3 | 2.960991 .4042631 7.32 0.000 2.16865 3.753332
ch2 | .7341131 .6293604 1.17 0.243 -.4994107 1.967637
ch3 | 1.058568 .2494583 4.24 0.000 .5696388 1.547497
M1[id] | .910748 .238329 3.82 0.000 .4436317 1.377864
_cons | -3.240611 .3218213 -10.07 0.000 -3.871369 -2.609853
-------------+----------------------------------------------------------------
id: |
sd(M1) | .8002138 .082537 .6537474 .9794948
------------------------------------------------------------------------------
```

We now have two model specifications, each of which can be completely different. I’ve used Royston-Parmar survival models. We define a random effect with the same name in each linear predictor, which means only one random effect has been defined, but it is included in both outcome models. In the first model, I’ve simply constrained the coefficient of the random effect to be 1. This provides a highly convenient way of linking random effects between outcome models. Our results show a highly positive estimate of associtaion of 0.910 (95% CI: 0.444, 1.378) showing substantial association between the two event processes, i.e. a higher rate of rehospitalisation also increases the mortality rate.

The second approach consists of two random effects included in the recurrent event model, one of which is also included in the linear predictor of the terminal event model, as follows,

$$ h_{ij}(t) = h_{0}(t) \exp (X_{1ij} \beta_{1} + b_{1i} + b_{2i}) $$ and $$ \lambda_{i}(u) = \lambda_{0}(u) \exp (X_{1i} \beta_{2} + b_{2i}) $$

where $b_{1i} \sim N(0,\sigma_{1}^2)$ and $b_{2i} \sim N(0,\sigma_{2}^2)$. Our $\sigma_{2}$ parameter then
indicates the association between the two processes. We can fit this model with `merlin`

, this time illustrating
how to use different distributions for the recurrent event and terminal event processes,

```
. merlin (time gender1 dk2 dk3 ch2 ch3 M1[id]@1 M2[id]@1 , family(rp, df(3) failure(event))) ///
> (stime gender1 dk2 dk3 ch2 ch3 M2[id]@1 , family(rp, df(1) failure(death)))
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_1
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -1590.3742
Iteration 1: log likelihood = -1528.1177
Iteration 2: log likelihood = -1521.2043
Iteration 3: log likelihood = -1520.979
Iteration 4: log likelihood = -1520.9745
Iteration 5: log likelihood = -1520.9745
Mixed effects regression model Number of obs = 861
Log likelihood = -1520.9745
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
time: |
gender1 | -.4617122 .1439662 -3.21 0.001 -.7438808 -.1795436
dk2 | .4038236 .1597784 2.53 0.011 .0906638 .7169835
dk3 | 1.370281 .2084709 6.57 0.000 .9616859 1.778877
ch2 | .3259105 .2628614 1.24 0.215 -.1892884 .8411094
ch3 | .3437377 .1366298 2.52 0.012 .0759482 .6115272
M1[id] | 1 . . . . .
M2[id] | 1 . . . . .
_cons | -2.153943 .1511438 -14.25 0.000 -2.45018 -1.857707
-------------+----------------------------------------------------------------
stime: |
gender1 | -.3586175 .2282707 -1.57 0.116 -.8060198 .0887848
dk2 | .9218092 .3362921 2.74 0.006 .2626887 1.58093
dk3 | 2.992597 .371161 8.06 0.000 2.265135 3.720059
ch2 | .7382485 .6340328 1.16 0.244 -.504433 1.98093
ch3 | 1.130697 .2543349 4.45 0.000 .6322093 1.629184
M2[id] | 1 . . . . .
_cons | -3.267833 .2988508 -10.93 0.000 -3.85357 -2.682096
-------------+----------------------------------------------------------------
id: |
sd(M1) | .3860664 .1480004 .1821168 .818416
sd(M2) | .7341325 .0978282 .5653872 .9532415
------------------------------------------------------------------------------
```

This defines two random effects at the `id`

level, which by default have an independent covariance structure.
Our estimate of $\sigma_{1}$ is 0.386 (95% CI: 0.182, 0.818),
and similarly we obtain an estimate for $\sigma_{2}$ of 0.734 (95% CI: 0.565, 0.953).

From both a theoretical and interpretation perspective, I prefer the first of these two joint frailty models. If there was no association, then $\sigma_{1} = 0$, which is at the boundary of its parameter space, and hence unlikely to converge, compared to $\alpha = 0$ from model 1.

Extensions including higher levels, time-dependent effects, non-linear covariate effects and the relative survival extension can be included very simply.