Joint frailty models for recurrent and terminal events

The do file with all the code from this tutorial is available to download here

An area of intense research in recent years is in the field of joint frailty survival models, for the analysis of joint recurrent event and terminal event data. Here I’ll describe and implement the two most popular approaches, originally proposed by Liu et al. (2004) and Mazroui et al. (2012). We have a survival model for the recurrent event process, and a survival model for the terminal event process, linked through shared random effects.

I’ll illustrate these models using the readmission dataset that comes with the extensive frailtypack in R (Król et al. (2017)), developed by Virginie Rondeau and her group. The dataset has information on rehospitalisation times after surgery in patients diagnosed with colorectal cancer. Covariates of interest include gender, Dukes’ tumour stage and comorbidity Charlson index. I’ll load the dataset and generate binary indicator variables for use later.

. use "https://www.mjcrowther.co.uk/data/jointfrailty_example",clear

. tab sex, gen(gender)

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
     Female |        312       36.24       36.24
       Male |        549       63.76      100.00
------------+-----------------------------------
      Total |        861      100.00

. tab dukes, gen(dk)

      dukes |      Freq.     Percent        Cum.
------------+-----------------------------------
        A-B |        324       37.63       37.63
          C |        331       38.44       76.07
          D |        206       23.93      100.00
------------+-----------------------------------
      Total |        861      100.00

. tab charlson, gen(ch)

   charlson |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        577       67.02       67.02
        1-2 |         46        5.34       72.36
          3 |        238       27.64      100.00
------------+-----------------------------------
      Total |        861      100.00

The time of rehospitalisation is stored in time, with corresponding event indicator event. This is in clock-reset formulation, i.e. each time a patient has a cancer recurrence the clock is reset to zero, so it’s a semi-Markov model. Our overall survival time is stored in stime, with corresponding event indicator stored in death.

In the first specification, we have one random effect which accounts for the correlation between recurrent events, and which is then included in the linear predictor for the terminal event model, with an accompanying estimated coefficient, which estimates the strength of the relationship between the two processes. So for example,

$$ h_{ij}(t) = h_{0}(t) \exp (X_{1ij} \beta_{1} + b_{i}) $$ and $$ \lambda_{i}(u) = \lambda_{0}(u) \exp (X_{1i} \beta_{2} + \alpha b_{i}) $$

where $h_{ij}(t)$ is the hazard function for the $j$th event of the $i$th patient, $\lambda_{i}(u)$ is the hazard function for the terminal event, and $b_{i} \sim N(0,\sigma^2)$.

We can fit such a model with merlin, adjusting for all available covariates in each outcome model,

. merlin (time  gender1 dk2 dk3 ch2 ch3 M1[id]@1           ///
>               , family(rp, df(3) failure(event)))        ///
>        (stime gender1 dk2 dk3 ch2 ch3 M1[id]             ///
>               , family(rp, df(1) failure(death)))
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_1

Fitting fixed effects model:

Fitting full model:

Iteration 0:   log likelihood = -4188.9255  
Iteration 1:   log likelihood = -4175.3708  (not concave)
Iteration 2:   log likelihood = -4148.1729  
Iteration 3:   log likelihood = -4143.6946  
Iteration 4:   log likelihood = -4143.6239  
Iteration 5:   log likelihood = -4143.6239  

Mixed effects regression model                  Number of obs     =        861
Log likelihood = -4143.6239
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
time:        |            
     gender1 |  -.4603478   .1407838    -3.27   0.001    -.7362791   -.1844166
         dk2 |   .4080402    .157101     2.60   0.009      .100128    .7159525
         dk3 |   1.395737   .2068641     6.75   0.000     .9902903    1.801183
         ch2 |   .3244107   .2581746     1.26   0.209    -.1816021    .8304235
         ch3 |   .3478034   .1353953     2.57   0.010     .0824335    .6131732
      M1[id] |          1          .        .       .            .           .
       _cons |  -2.126217   .1482472   -14.34   0.000    -2.416776   -1.835657
-------------+----------------------------------------------------------------
stime:       |            
     gender1 |  -.3533375   .2254812    -1.57   0.117    -.7952725    .0885975
         dk2 |   .9249671   .3356416     2.76   0.006     .2671216    1.582813
         dk3 |    2.96099   .4042653     7.32   0.000     2.168644    3.753335
         ch2 |   .7341089   .6293614     1.17   0.243    -.4994168    1.967635
         ch3 |   1.058569   .2494584     4.24   0.000     .5696393    1.547498
      M1[id] |   .9107484   .2383294     3.82   0.000     .4436314    1.377865
       _cons |   -3.24061   .3218232   -10.07   0.000    -3.871372   -2.609849
-------------+----------------------------------------------------------------
id:          |            
      sd(M1) |   .8002135    .082537                      .6537471    .9794943
------------------------------------------------------------------------------
    Warning: Baseline spline coefficients not shown - use ml display

We now have two model specifications, each of which can be completely different. I’ve used Royston-Parmar survival models. We define a random effect with the same name in each linear predictor, which means only one random effect has been defined, but it is included in both outcome models. In the first model, I’ve simply constrained the coefficient of the random effect to be 1. This provides a highly convenient way of linking random effects between outcome models. Our results show a highly positive estimate of associtaion of 0.910 (95% CI: 0.444, 1.378) showing substantial association between the two event processes, i.e. a higher rate of rehospitalisation also increases the mortality rate.

The second approach consists of two random effects included in the recurrent event model, one of which is also included in the linear predictor of the terminal event model, as follows,

$$ h_{ij}(t) = h_{0}(t) \exp (X_{1ij} \beta_{1} + b_{1i} + b_{2i}) $$ and $$ \lambda_{i}(u) = \lambda_{0}(u) \exp (X_{1i} \beta_{2} + b_{2i}) $$

where $b_{1i} \sim N(0,\sigma_{1}^2)$ and $b_{2i} \sim N(0,\sigma_{2}^2)$. Our $\sigma_{2}$ parameter then indicates the association between the two processes. We can fit this model with merlin, this time illustrating how to use different distributions for the recurrent event and terminal event processes,

. merlin (time  gender1 dk2 dk3 ch2 ch3 M1[id]@1 M2[id]@1         ///
>               , family(rp, df(3) failure(event)))               ///
>        (stime gender1 dk2 dk3 ch2 ch3 M2[id]@1                  ///
>               , family(rp, df(1) failure(death)))
variables created: _rcs1_1 to _rcs1_3
variables created: _rcs2_1 to _rcs2_1

Fitting fixed effects model:

Fitting full model:

Iteration 0:   log likelihood = -4211.9343  
Iteration 1:   log likelihood = -4149.7324  
Iteration 2:   log likelihood = -4142.7635  
Iteration 3:   log likelihood = -4142.5391  
Iteration 4:   log likelihood = -4142.5347  
Iteration 5:   log likelihood = -4142.5346  

Mixed effects regression model                  Number of obs     =        861
Log likelihood = -4142.5346
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
time:        |            
     gender1 |   -.461712   .1439655    -3.21   0.001    -.7438791   -.1795448
         dk2 |   .4038239   .1597776     2.53   0.011     .0906655    .7169823
         dk3 |   1.370284   .2084699     6.57   0.000     .9616907    1.778878
         ch2 |   .3259101   .2628604     1.24   0.215    -.1892868    .8411069
         ch3 |   .3437384   .1366295     2.52   0.012     .0759494    .6115273
      M1[id] |          1          .        .       .            .           .
      M2[id] |          1          .        .       .            .           .
       _cons |  -2.153936   .1511434   -14.25   0.000    -2.450172   -1.857701
-------------+----------------------------------------------------------------
stime:       |            
     gender1 |  -.3586172    .228271    -1.57   0.116    -.8060201    .0887856
         dk2 |   .9218063   .3362933     2.74   0.006     .2626836    1.580929
         dk3 |   2.992598    .371162     8.06   0.000     2.265133    3.720062
         ch2 |   .7382451   .6340333     1.16   0.244    -.5044374    1.980928
         ch3 |   1.130689   .2543359     4.45   0.000     .6321992    1.629178
      M2[id] |          1          .        .       .            .           .
       _cons |  -3.267832   .2988515   -10.93   0.000     -3.85357   -2.682094
-------------+----------------------------------------------------------------
id:          |            
      sd(M1) |   .3860378   .1480128                      .1820816    .8184524
      sd(M2) |   .7341392   .0978276                      .5653946    .9532463
------------------------------------------------------------------------------
    Warning: Baseline spline coefficients not shown - use ml display

This defines two random effects at the idlevel, which by default have an independent covariance structure. Our estimate of $\sigma_{1}$ is 0.386 (95% CI: 0.182, 0.818), and similarly we obtain an estimate for $\sigma_{2}$ of 0.734 (95% CI: 0.565, 0.953).

From both a theoretical and interpretation perspective, I prefer the first of these two joint frailty models. If there was no association, then $\sigma_{1} = 0$, which is at the boundary of its parameter space, and hence unlikely to converge, compared to $\alpha = 0$ from model 1.

Extensions including higher levels, time-dependent effects, non-linear covariate effects and the relative survival extension can be included very simply.

This example used merlin version 2.0.2.

Related

comments powered by Disqus