A Markov illness-death model using merlin and multistate

An illness-death model with transition-specific distributions

This example comes from the paper Parametric multi-state survival models: flexible modelling allowing transition-specific distributions with application to estimating clinically useful measures of effect differences by myself and Paul Lambert, published in Statistics in Medicine. The data comes from a widely used study in breast cancer, which is included in the multistate package. The multistate package now requires the use of my merlin command to fit the transition hazard models, so the syntax will differ slightly from the paper. We load the data,

. use http://fmwww.bc.edu/repec/bocode/m/multistate_example,clear
(Rotterdam breast cancer data, truncated at 10 years)

which comes in wide format,

. list pid rf rfi os osi if pid==1 | pid==1371, sepby(pid) noobs

  +-------------------------------------+
  |  pid     rf   rfi     os        osi |
  |-------------------------------------|
  |    1   59.1     0   59.1      alive |
  |-------------------------------------|
  | 1371   16.6     1   24.3   deceased |
  +-------------------------------------+

where pid is our patient identifier, rf contains the time of recurrence, rfi the recurrence event indicator, os overall survival time and osi our survival indicator. We can use msset to reshape our wide dataset into the stacked format, with a row for each transition of which a patient is at risk for.

. msset, id(pid) states(rfi osi) times(rf os)

msset creates internal variables for use in subsequent analyses,

. list pid _start _stop _from _to _status _trans if pid==1 | pid==1371, noobs

  +---------------------------------------------------------------+
  |  pid      _start       _stop   _from   _to   _status   _trans |
  |---------------------------------------------------------------|
  |    1           0   59.104721       1     2         0        1 |
  |    1           0   59.104721       1     3         0        2 |
  | 1371           0   16.558521       1     2         1        1 |
  | 1371           0   16.558521       1     3         0        2 |
  | 1371   16.558521   24.344969       2     3         1        3 |
  +---------------------------------------------------------------+

and also returns a default transition matrix, if one was not provided. We need to store this for later. In this case it’s the illness-death transition matrix, as it will assume an upper triangular transition matrix, with a common initial state.

. mat tmat = r(transmatrix)

. mat list tmat

tmat[3,3]
               to:    to:    to:
            start    rfi    osi
from:start      .      1      2
  from:rfi      .      .      3
  from:osi      .      .      .

We can now stset our dataset, using the msset created variables,

. stset _stop, enter(_start) failure(_status==1) scale(12) 

     failure event:  _status == 1
obs. time interval:  (0, _stop]
 enter on or after:  time _start
 exit on or before:  failure
    t for analysis:  time/12

------------------------------------------------------------------------------
      7,482  total observations
          0  exclusions
------------------------------------------------------------------------------
      7,482  observations remaining, representing
      2,790  failures in single-record/single-failure data
 38,474.539  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =  19.28268

We also changed the timescale from months into years by using scale(12). Tumour size at diagnosis, size, is a three-level factor variable, which we now create dummy indicator variables by,

. tab size, gen(sz)

     Tumour |
    size, 3 |
classes (t) |      Freq.     Percent        Cum.
------------+-----------------------------------
    <=20 mm |      3,339       44.63       44.63
  >20-50mmm |      3,327       44.47       89.09
     >50 mm |        816       10.91      100.00
------------+-----------------------------------
      Total |      7,482      100.00

Neither merlin or predictms support factor variables so you must create your own dummies. We can now fit our multi-state model, in this case fitting transition-specific distributions. There’s more details in the original paper, but for transition 1 we fit a Royston-Parmar model with three degrees of freedom for the baseline, and time-dependent effects for tumour size (both groups, compared to the reference) and progesterone level,

. stmerlin age sz2 sz3 nodes pr_1 hormon if _trans==1, dist(rp) df(3)     ///
>         tvc(sz2 sz3 pr_1) dftvc(1)
Obtaining initial values
variables created for model 1, component 7: _cmp_1_7_1 to _cmp_1_7_1
variables created for model 1, component 8: _cmp_1_8_1 to _cmp_1_8_1
variables created for model 1, component 9: _cmp_1_9_1 to _cmp_1_9_1
variables created: _rcs1_1 to _rcs1_3

Fitting full model:

Iteration 0:   log likelihood = -5067.1092  
Iteration 1:   log likelihood = -4837.9868  
Iteration 2:   log likelihood = -4793.0266  
Iteration 3:   log likelihood = -4790.4667  
Iteration 4:   log likelihood = -4790.4569  
Iteration 5:   log likelihood = -4790.4569  

Survival model                                  Number of obs     =      2,982
Log likelihood = -4790.4569
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_t:          |            
         age |  -.0062709   .0021004    -2.99   0.003    -.0103875   -.0021542
         sz2 |   .4777278   .0634814     7.53   0.000     .3533064    .6021491
         sz3 |   .7445511   .0904352     8.23   0.000     .5673013    .9218009
       nodes |   .0784041   .0045454    17.25   0.000     .0694953    .0873129
        pr_1 |  -.0783095   .0122404    -6.40   0.000    -.1023003   -.0543188
      hormon |  -.0797627   .0824503    -0.97   0.333    -.2413623     .081837
   sz2#rcs() |  -.1740496   .0446886    -3.89   0.000    -.2616378   -.0864615
   sz3#rcs() |  -.2669214   .0616144    -4.33   0.000    -.3876835   -.1461593
  pr_1#rcs() |    .072824   .0086398     8.43   0.000     .0558904    .0897577
       _cons |  -.9480286   .1266086    -7.49   0.000    -1.196177   -.6998803
------------------------------------------------------------------------------
    Warning: Baseline spline coefficients not shown - use ml display

. est store m1

Note I use stmerlin, which is a wrapper for the more complex merlin command, but it uses the st variables created by stset so is much more convenient for use when fitting standard survival models. For transition 2 it’s a Weibull model with just main effects,

. stmerlin age sz2 sz3 nodes pr_1 hormon if _trans==2, dist(weib)

Fitting full model:

Iteration 0:   log likelihood =  -18993.64  
Iteration 1:   log likelihood = -1055.4788  
Iteration 2:   log likelihood =   -1014.44  
Iteration 3:   log likelihood = -867.44017  
Iteration 4:   log likelihood = -860.32011  
Iteration 5:   log likelihood = -859.53193  
Iteration 6:   log likelihood =  -859.5294  
Iteration 7:   log likelihood =  -859.5294  

Survival model                                  Number of obs     =      2,982
Log likelihood =  -859.5294
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_t:          |            
         age |   .1250736   .0079699    15.69   0.000     .1094529    .1406942
         sz2 |   .1615513   .1614483     1.00   0.317    -.1548816    .4779842
         sz3 |   .4153084   .2332724     1.78   0.075    -.0418971     .872514
       nodes |   .0439416   .0182545     2.41   0.016     .0081633    .0797198
        pr_1 |   .0223507   .0334238     0.67   0.504    -.0431587    .0878601
      hormon |  -.1399099   .2291894    -0.61   0.542    -.5891128     .309293
       _cons |  -14.02238   .6218293   -22.55   0.000    -15.24114   -12.80362
  log(gamma) |   .5106518   .0572511     8.92   0.000     .3984416    .6228619
------------------------------------------------------------------------------

. est store m2

and for transition 3 a Royston-Parmar model again,

. stmerlin age sz2 sz3 nodes pr_1 hormon if _trans==3, dist(rp) df(3)     ///
>         tvc(pr_1) dftvc(1)
note; a delayed entry model is being fitted
Obtaining initial values
variables created for model 1, component 7: _cmp_1_7_1 to _cmp_1_7_1
variables created: _rcs1_1 to _rcs1_3

Fitting full model:

Iteration 0:   log likelihood = -2914.0745  
Iteration 1:   log likelihood = -2384.9027  
Iteration 2:   log likelihood = -2369.5917  
Iteration 3:   log likelihood = -2369.3088  
Iteration 4:   log likelihood = -2369.3081  
Iteration 5:   log likelihood = -2369.3081  

Survival model                                  Number of obs     =      1,518
Log likelihood = -2369.3081
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_t:          |            
         age |    .004944   .0024217     2.04   0.041     .0001976    .0096905
         sz2 |   .1653556   .0712326     2.32   0.020     .0257423    .3049689
         sz3 |   .3243007   .0992351     3.27   0.001     .1298035    .5187979
       nodes |   .0297038   .0057735     5.14   0.000     .0183879    .0410196
        pr_1 |  -.1843937   .0211401    -8.72   0.000    -.2258276   -.1429599
      hormon |   .0315519   .0976381     0.32   0.747    -.1598152    .2229191
  pr_1#rcs() |   .0636262   .0121506     5.24   0.000     .0398114     .087441
       _cons |   .3912493   .1659889     2.36   0.018     .0659171    .7165815
------------------------------------------------------------------------------
    Warning: Baseline spline coefficients not shown - use ml display

. est store m3

Each time we store the model results using estimates store. We can then pass the model objects to predictms to obtain a huge range of predictions. It’s that simple.

First we create a time variable, at which to calculate predictions,

. range tvar 0 15 16
(7,466 missing values generated)

We request transition probabilities with the probability option, which by default assume you start in state 1 at time 0. I’m also predicting for a patient with age 50 through use of the at1() option (variables not specified in the at#() statement(s) will be set to 0),

. predictms, transmatrix(tmat) models(m1 m2 m3) probability at1(age 50) timevar(tvar)

which gives us some new variables,

. list _prob_* tvar in 1/5, noobs ab(12)

  +---------------------------------------------------+
  | _prob_at1_~1   _prob_at1_~2   _prob_at1_~3   tvar |
  |---------------------------------------------------|
  |            1              0              0      0 |
  |    .93257775      .05513152      .01229073      1 |
  |    .82898007      .10983523       .0611847      2 |
  |     .7569528      .12875291       .1142943      3 |
  |    .70569409      .13640839      .15789752      4 |
  +---------------------------------------------------+

We can use the graphms command to obtain a stacked plot of the transition probabilities,

. graphms

Alt text

If we want confidence intervals (by default it will use the delta method), we simply add the ci option to predictms,

. predictms, transmatrix(tmat) models(m1 m2 m3) probability at1(age 50)    ///
>            timevar(tvar) ci

Calc. CIs using the delta method for at1() (32)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
................................

for example,

. list _prob_at1_1_1* tvar in 1/5, ab(15)

     +----------------------------------------------------------+
     | _prob_at1_1_1   _prob_at1~1_lci   _prob_at1~1_uci   tvar |
     |----------------------------------------------------------|
  1. |             1                 1                 1      0 |
  2. |     .93257775         .91671807         .94559645      1 |
  3. |     .82898007         .80160108         .85327236      2 |
  4. |      .7569528          .7259451         .78548884      3 |
  5. |     .70569409         .67286081         .73652243      4 |
     +----------------------------------------------------------+

As well as transition probabilities, we can calculate the mean length of time spent in each state, as a function of follow-up time, by adding the los option,

. predictms, transmatrix(tmat) models(m1 m2 m3) probability at1(age 50)    ///
>            timevar(tvar) los

. list _los_* tvar in 1/5

     +------------------------------------------+
     | _los_at~1   _los_at~2   _los_at~3   tvar |
     |------------------------------------------|
  1. |         0           0           0      0 |
  2. | .97673802   .02017742   .00308456      1 |
  3. |  1.855171   .10680299   .03802601      2 |
  4. | 2.6458503   .22771998   .12642975      3 |
  5. | 3.3759215   .36079804   .26328042      4 |
     +------------------------------------------+

We can get useful contrasts between covariate patterns through use of the at2() option. Here I’ll calculate the difference in transition probabilities for a patient aged 50, compared to a patient aged 60,

. predictms, transmatrix(tmat) models(m1 m2 m3) probability at1(age 50)    ///
>            timevar(tvar) at2(age 60) difference

. list _diff_prob_* tvar in 1/5, ab(15)

     +------------------------------------------------------------+
     | _diff_prob_at~1   _diff_prob_at~2   _diff_prob_at~3   tvar |
     |------------------------------------------------------------|
  1. |               0                 0                 0      0 |
  2. |       .00295461        -.00375377         .00079916      1 |
  3. |       .00663634        -.00825508         .00161874      2 |
  4. |        .0077531        -.01033321         .00258011      3 |
  5. |       .00731331        -.01138523         .00407192      4 |
     +------------------------------------------------------------+

Instead of the difference, we could get the ratio,

. predictms, transmatrix(tmat) models(m1 m2 m3) probability at1(age 50)   ///
>            timevar(tvar) at2(age 60) ratio

We can put our options together and get the ratio of length of stays across the different ages, with confidence intervals! Exciting, I know.

. predictms, transmatrix(tmat) models(m1 m2 m3) probability  at1(age 50)    ///
>            timevar(tvar) at2(age 60) los ratio ci

Calc. CIs using the delta method for at1() (32)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
................................
Calc. CIs using the delta method for at2() (32)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
................................

I admit, this is a rather brief overview of the main features of the multistate package - I will add more examples when I can.

This example used merlin version 2.0.2 and multistate version 4.3.0.

Related

comments powered by Disqus