The do file with all the code from this tutorial is available to download here

In this example I’ll look at the analysis of clustered survival data with three levels. Such data arises in the meta-analysis of recurrent event data, where we have events, $k$, nested within patients, $j$, nested within trials, $i$.

### Random intercepts

The first example will consider a model with a random intercept at level 1 and a random intercept at level 2,

$$ h_{ijk}(t) = h_{0}(t) \exp (X_{ij}\beta + b_{1i} + b_{2ij}) $$

where

$$ b_{1i} \sim N(0,\sigma_{1}^{2}) \text{ and } b_{2ij} \sim N(0,\sigma^{2}_{2})$$

First we must be clear about what assumptions this model has. By assuming a random intercept at the trial level 1, we are assuming that each trial comes from a distribution of trials, which arguably is a strong and perhaps implausible assumption. We may prefer to stratify by trial, but for the sake of illustration, here I’ll proceed with a random trial effect. A second important note is that our covariate effects, $\beta$, are interpreted as conditonal on the random effects.

I’ll start by showing how to simulate data from such a model. Let’s assume we have data from 15 trials,
so we generate a `trial`

variable which will indicate which trial the observations come from, and also our trial level
random effect

```
. clear
. set seed 498093
. set obs 15
number of observations (_N) was 0, now 15
. gen trial = _n
. gen b1 = rnormal(0,1)
```

Note that I set the simulation seed for reproducibility!

Now let’s assume we recruit 100 patients within each trial. We can use `expand`

for this,

```
. expand 100
(1,485 observations created)
. sort trial
```

and then generate a unique patient `id`

variable, and our patient level random effect

```
. gen id = _n
. gen b2 = rnormal(0,1)
```

We also generate a binary treatment variable, at the patient level,

```
. gen trt = runiform()>0.5
```

Finally, we allow each patient to experience up to two events,

```
. expand 2
(1,500 observations created)
. sort trial id
```

Now we can simulate our observation level survival times, by using `survsim`

. I’ll assume
a Weibull baseline hazard function with $\lambda=0.1$ and $\gamma=1.2$, and a log hazard ratio
of $\beta=-0.5$ for the treatment effect,

```
. survsim stime died, dist(weibull) lambda(0.1) gamma(1.2) maxtime(5) ///
> covariates(trt -0.5 b1 1 b2 1)
```

Note that I’m simulating the recurrent events under a clock-reset approach, i.e. after each event the timescale is reset to 0. In a further example I will look at the clock-forward approach, i.e. one continuous timescale.

We can fit our data-generating model, i.e. the true model, with `merlin`

as follows,

```
. merlin (stime trt M1[trial]@1 M2[trial>id]@1, family(weibull, failure(died)))
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -4250.8981 (not concave)
Iteration 1: log likelihood = -4183.9241
Iteration 2: log likelihood = -4176.4586
Iteration 3: log likelihood = -4175.4024
Iteration 4: log likelihood = -4175.3996
Iteration 5: log likelihood = -4175.3996
Mixed effects regression model Number of obs = 3,000
Log likelihood = -4175.3996
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
stime: |
trt | -.5449902 .0757995 -7.19 0.000 -.6935545 -.396426
M1[trial] | 1 . . . . .
M2[trial>id] | 1 . . . . .
_cons | -1.99857 .2224338 -8.99 0.000 -2.434532 -1.562608
log(gamma) | .192227 .0259834 7.40 0.000 .1413005 .2431534
-------------+----------------------------------------------------------------
trial: |
sd(M1) | .8217195 .156671 .5654985 1.194031
-------------+----------------------------------------------------------------
id: |
sd(M2) | .971307 .0566649 .8663601 1.088967
------------------------------------------------------------------------------
```

### Random trial and random treatment effect

Now we consider a second example, but still sticking within a meta-analysis context. In many situations, we would investigate the presence of heterogeneity, i.e. unobserved variability, in any treatment effects. We can do this by modelling a random treatment effect, with the following model.

$$ h_{ijk}(t) = h_{0}(t) \exp (b_{12i} + (\beta + b_{11i}) X_{ij} + b_{2ij}) $$

where

$$ b_{11i} \sim N(0,\sigma_{11}^{2}), b_{12i} \sim N(0,\sigma_{12}^{2}) \text{ and } b_{2ij} \sim N(0,\sigma^{2}_{2})$$

Once more let’s assume we have data from 15 trials, so we generate a `trial`

variable which
will indicate which trial the observations come from, but this time we have a random intercept
and also a random treatment effect, which we assume are independent (we could use `drawnorm`

if we
wanted to impose correlated random effects),

```
. clear
. set seed 498093
. set obs 15
number of observations (_N) was 0, now 15
. gen trial = _n
. gen b11 = rnormal(0,1)
. gen b12 = rnormal(0,1)
```

Again let’s assume we recruit 100 patients within each trial,

```
. expand 100
(1,485 observations created)
. sort trial
```

and then generate our unique patient `id`

variable, and our patient level random effect

```
. gen id = _n
. gen b2 = rnormal(0,1)
```

We also generate a binary treatment variable, at the patient level,

```
. gen trt = runiform()>0.5
```

Finally, we allow each patient to experience up to two events,

```
. expand 2
(1,500 observations created)
. sort trial id
```

Now to impose our random treatment effect (varying at the trial level), we can first generate its overall effect using

```
. . gen trtb12 = (-0.5 + b12) * trt
```

and then simulate our survival times as follows,

```
. survsim stime died, dist(weibull) lambda(0.1) gamma(1.2) maxtime(5) ///
> covariates(trtb12 1 b11 1 b2 1)
```

We can fit our data-generating model, i.e. the true model, with `merlin`

as follows,

```
. merlin (stime trt trt#M1[trial]@1 M2[trial]@1 M3[trial>id]@1, family(weibull, failure(d
> ied)))
Fitting fixed effects model:
Fitting full model:
Iteration 0: log likelihood = -4116.4692 (not concave)
Iteration 1: log likelihood = -4016.0111
Iteration 2: log likelihood = -4009.416
Iteration 3: log likelihood = -4009.1759
Iteration 4: log likelihood = -4009.1752
Iteration 5: log likelihood = -4009.1752
Mixed effects regression model Number of obs = 3,000
Log likelihood = -4009.1752
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
stime: |
trt | -.5156967 .3010874 -1.71 0.087 -1.105817 .0744238
trt#M1[tri~] | 1 . . . . .
M2[trial] | 1 . . . . .
M3[trial>id] | 1 . . . . .
_cons | -2.027292 .2255663 -8.99 0.000 -2.469394 -1.585191
log(gamma) | .2010559 .0258223 7.79 0.000 .1504451 .2516668
-------------+----------------------------------------------------------------
trial: |
sd(M1) | 1.120972 .2208733 .7618621 1.649351
sd(M2) | .8307332 .1644114 .5636366 1.224402
-------------+----------------------------------------------------------------
id: |
sd(M3) | 1.03136 .0559169 .9273872 1.14699
------------------------------------------------------------------------------
```

This example used `survsim`

version 4.0.4 and `merlin`

version 2.0.2.