A Markov illness-death model

An illness death model with transition-specific distributions

This example comes from my recent paper on multi-state modelling in Stats in Med. The data comes from a widely used study in breast cancer, which is included in the multistate package.

. use http://fmwww.bc.edu/repec/bocode/m/multistate_example,clear
(Rotterdam breast cancer data, truncated at 10 years)

The data comes in wide format,

. list pid rf rfi os osi if pid==1 | pid==1371, sepby(pid) noobs

  +-------------------------------------+
  |  pid     rf   rfi     os        osi |
  |-------------------------------------|
  |    1   59.1     0   59.1      alive |
  |-------------------------------------|
  | 1371   16.6     1   24.3   deceased |
  +-------------------------------------+

where pid is our patient identifier, rf contains the time of recurrence, rfi the recurrence event indicator, os overall survival time and osi our survival indicator. We can use msset to reshape our wide dataset into the stacked format, with a row for each transition of which a patient is at risk for.

. msset, id(pid) states(rfi osi) times(rf os)

Let’s see what the data looks like now,

. list pid rf rfi os osi if pid==1 | pid==1371, sepby(pid) noobs

  +-------------------------------------+
  |  pid     rf   rfi     os        osi |
  |-------------------------------------|
  |    1   59.1     0   59.1      alive |
  |    1   59.1     0   59.1      alive |
  |-------------------------------------|
  | 1371   16.6     1   24.3   deceased |
  | 1371   16.6     1   24.3   deceased |
  | 1371   16.6     1   24.3   deceased |
  +-------------------------------------+

msset creates internal variables for use in subsequent analyses,

. list pid _start _stop _from _to _status _trans if pid==1 | pid==1371, noobs

  +---------------------------------------------------------------+
  |  pid      _start       _stop   _from   _to   _status   _trans |
  |---------------------------------------------------------------|
  |    1           0   59.104721       1     2         0        1 |
  |    1           0   59.104721       1     3         0        2 |
  | 1371           0   16.558521       1     2         1        1 |
  | 1371           0   16.558521       1     3         0        2 |
  | 1371   16.558521   24.344969       2     3         1        3 |
  +---------------------------------------------------------------+

and also returns a default transition matrix, if one was not provided. We need to store this for later.

. mat tmat = r(transmatrix)

We can now stset our dataset, using the msset created variables,

. stset _stop, enter(_start) failure(_status==1) scale(12) 

     failure event:  _status == 1
obs. time interval:  (0, _stop]
 enter on or after:  time _start
 exit on or before:  failure
    t for analysis:  time/12

------------------------------------------------------------------------------
      7,482  total observations
          0  exclusions
------------------------------------------------------------------------------
      7,482  observations remaining, representing
      2,790  failures in single-record/single-failure data
 38,474.539  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =  19.28268

We also changed the timescale from months into years by using scale(12). An important variable, size is a factor variable, which we now create dummy indicator variables by,

. tab size, gen(sz)

     Tumour |
    size, 3 |
classes (t) |      Freq.     Percent        Cum.
------------+-----------------------------------
    <=20 mm |      3,339       44.63       44.63
  >20-50mmm |      3,327       44.47       89.09
     >50 mm |        816       10.91      100.00
------------+-----------------------------------
      Total |      7,482      100.00

We can now fit our multi-state model, in this case fitting transition-specific distributions. For transition 1,

. stpm2 age sz2 sz3 nodes pr_1 hormon if _trans==1, df(3) scale(h)        ///
>         tvc(sz2 sz3 pr_1) dftvc(1)

Iteration 0:   log likelihood = -3522.9535  
Iteration 1:   log likelihood = -3479.1473  
Iteration 2:   log likelihood = -3476.6534  
Iteration 3:   log likelihood = -3476.6455  
Iteration 4:   log likelihood = -3476.6455  

Log likelihood = -3476.6455                     Number of obs     =      2,982

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
         age |  -.0062709   .0021004    -2.99   0.003    -.0103875   -.0021543
         sz2 |   .4777289   .0634816     7.53   0.000     .3533073    .6021505
         sz3 |    .744544   .0904352     8.23   0.000     .5672943    .9217937
       nodes |   .0784025   .0045454    17.25   0.000     .0694937    .0873113
        pr_1 |  -.0783066   .0122404    -6.40   0.000    -.1022973   -.0543159
      hormon |  -.0797426   .0824504    -0.97   0.333    -.2413424    .0818572
       _rcs1 |   .9703563   .0472652    20.53   0.000     .8777182    1.062994
       _rcs2 |   .3104222   .0218912    14.18   0.000     .2675162    .3533282
       _rcs3 |  -.0176099   .0114839    -1.53   0.125    -.0401179    .0048982
   _rcs_sz21 |  -.1740546   .0446893    -3.89   0.000     -.261644   -.0864652
   _rcs_sz31 |  -.2669255   .0616161    -4.33   0.000    -.3876909   -.1461601
  _rcs_pr_11 |    .072824   .0086399     8.43   0.000     .0558901    .0897578
       _cons |  -.9480559   .1266088    -7.49   0.000    -1.196205   -.6999071
------------------------------------------------------------------------------

. est store m1

for transition 2,

. streg age sz2 sz3 nodes pr_1 hormon if _trans==2, dist(weib) nohr

         failure _d:  _status == 1
   analysis time _t:  _stop/12
  enter on or after:  time _start

Fitting constant-only model:

Iteration 0:   log likelihood = -789.49708
Iteration 1:   log likelihood = -772.22637
Iteration 2:   log likelihood = -771.90383
Iteration 3:   log likelihood = -771.90373
Iteration 4:   log likelihood = -771.90373

Fitting full model:

Iteration 0:   log likelihood = -771.90373  
Iteration 1:   log likelihood = -759.32799  
Iteration 2:   log likelihood = -585.02458  
Iteration 3:   log likelihood = -580.52674  
Iteration 4:   log likelihood = -580.44872  
Iteration 5:   log likelihood =  -580.4487  

Weibull PH regression

No. of subjects =        2,982                  Number of obs    =       2,982
No. of failures =          195
Time at risk    =  17203.80014
                                                LR chi2(6)       =      382.91
Log likelihood  =    -580.4487                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .1250736   .0079699    15.69   0.000     .1094529    .1406942
         sz2 |   .1615513   .1614483     1.00   0.317    -.1548816    .4779841
         sz3 |   .4153083   .2332724     1.78   0.075    -.0418973    .8725138
       nodes |   .0439416   .0182545     2.41   0.016     .0081633    .0797198
        pr_1 |   .0223507   .0334238     0.67   0.504    -.0431587    .0878601
      hormon |  -.1399098   .2291893    -0.61   0.542    -.5891126     .309293
       _cons |  -14.02238   .6218293   -22.55   0.000    -15.24114   -12.80362
-------------+----------------------------------------------------------------
       /ln_p |   .5106518   .0572511     8.92   0.000     .3984416     .622862
-------------+----------------------------------------------------------------
           p |   1.666377    .095402                      1.489502    1.864256
         1/p |   .6001043   .0343567                      .5364071    .6713655
------------------------------------------------------------------------------

. est store m2

and for transition 3,

. stpm2 age sz2 sz3 nodes pr_1 hormon if _trans==3, df(3) scale(h)        ///
>         tvc(pr_1) dftvc(1)
note: delayed entry models are being fitted

Iteration 0:   log likelihood = -943.78807  
Iteration 1:   log likelihood = -930.00714  
Iteration 2:   log likelihood =  -929.1202  
Iteration 3:   log likelihood = -929.11658  
Iteration 4:   log likelihood = -929.11658  

Log likelihood = -929.11658                     Number of obs     =      1,518

-------------------------------------------------------------------------------
              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
xb            |
          age |   .0049441   .0024217     2.04   0.041     .0001977    .0096906
          sz2 |   .1653563   .0712326     2.32   0.020      .025743    .3049696
          sz3 |   .3243048   .0992351     3.27   0.001     .1298075    .5188021
        nodes |   .0297031   .0057735     5.14   0.000     .0183873    .0410189
         pr_1 |  -.1843876   .0211383    -8.72   0.000     -.225818   -.1429572
       hormon |   .0315634   .0976384     0.32   0.746    -.1598045    .2229312
        _rcs1 |   .5057489   .0581187     8.70   0.000     .3918383    .6196595
        _rcs2 |   .1035699     .03143     3.30   0.001     .0419681    .1651716
        _rcs3 |  -.0100584   .0117741    -0.85   0.393    -.0331352    .0130185
   _rcs_pr_11 |   .0636225   .0121503     5.24   0.000     .0398085    .0874366
        _cons |    .391217   .1659763     2.36   0.018     .0659094    .7165246
-------------------------------------------------------------------------------

. est store m3

Each time we store the model results using estimates store. We can then pass the model objects to predictms to obatin any number of predictions. It’s that simple.

The default predictions give us transition probabilities, assuming you start in state 1 at time 0. I’m also predicting for a patient with age 50 through use of the at1() option,

. predictms, transmatrix(tmat) models(m1 m2 m3) at1(age 50) seed(52545)

which gives us some new variables,

. list _prob_* _time in 1/5, noobs

  +--------------------------------------------+
  | _prob_~1   _prob_~2   _prob_~3       _time |
  |--------------------------------------------|
  |        1          0          0           0 |
  |    .9298     .05656     .01364   1.0148781 |
  |   .82649      .1093     .06421   2.0297561 |
  |   .75502      .1271     .11788   3.0446342 |
  |   .70297     .13589     .16114   4.0595122 |
  +--------------------------------------------+

We can use the graph option to obtain a stacked plot of the transition probabilities,

. predictms, transmatrix(tmat) models(m1 m2 m3) at1(age 50) graph seed(52545)

Alt text

If we want confidence intervals, we simply add the ci option,

. predictms, transmatrix(tmat) models(m1 m2 m3) at1(age 50) obs(100) n(10000) m(200) ci seed(52545)

for example,

. list _prob_at1_1_1* _time in 1/5

     +----------------------------------------------+
     | _prob_~1   _pr~1_lci   _pr~1_uci       _time |
     |----------------------------------------------|
  1. |        1           1           1           0 |
  2. | .9980385   .99669364   .99938336   .19477458 |
  3. | .9896805   .98554902   .99381198   .38954915 |
  4. | .9751085   .96738264   .98283436   .58432373 |
  5. |  .956239   .94484323   .96763477   .77909831 |
     +----------------------------------------------+

As well as transition probabilities, we can calculate the mean length of time spent in each state, as a function of follow-up time, by adding the los option,

. predictms, transmatrix(tmat) models(m1 m2 m3) at1(age 50) los seed(52545)

. list _los_* _time in 1/5

     +-----------------------------------------------+
     | _los_at~1   _los_at~2   _los_at~3       _time |
     |-----------------------------------------------|
  1. |         0           0           0           0 |
  2. | .99001918   .02140007   .00345881   1.0148781 |
  3. |  1.878816   .10930209   .04163803   2.0297561 |
  4. | 2.6792285   .23098121   .13442443   3.0446342 |
  5. | 3.4179435   .36479615    .2767726   4.0595122 |
     +-----------------------------------------------+

We can get useful contrasts between covariate patterns through use of the at2() option. Here I’ll calculate the difference in transition probabilities for a patient aged 50, compared to a patient aged 60,

. predictms, transmatrix(tmat) models(m1 m2 m3) at1(age 50) at2(age 60) difference seed(52545)

. list _diff_prob_* _time in 1/5

     +--------------------------------------------+
     | _diff_~1   _diff_~2   _diff_~3       _time |
     |--------------------------------------------|
  1. |        0          0          0           0 |
  2. |   .00449    -.00476     .00027   1.0148781 |
  3. |   .00714    -.00689    -.00025   2.0297561 |
  4. |   .00768    -.00866     .00098   3.0446342 |
  5. |   .00737    -.00968     .00231   4.0595122 |
     +--------------------------------------------+

Instead of the difference, we could get the ratio,

. predictms, transmatrix(tmat) models(m1 m2 m3) at1(age 50) at2(age 60) ratio seed(52545)

We can put our options together and get the ratio of length of stays across the different ages, with confidence intervals! Exciting, I know.

. predictms, transmatrix(tmat) models(m1 m2 m3) at1(age 50) at2(age 60) los ratio ci seed(52545)

This last code chunk uses stms to fit transition-specific models, but jointly, meaning we can constrain covariate effects to be equal across transitions, such as the effect of age being the same for transitions 1 and 3.

stms (age sz2 sz3 nodes pr_1 hormon, model(fpm) df(3) scale(h))  ///
     (age sz2 sz3 nodes pr_1 hormon, model(weib))                ///
     (age sz2 sz3 nodes pr_1 hormon, model(fpm) df(3) scale(h))  ///
     , transvar(_trans) constraints(age 1 3)  

Related

comments powered by Disqus