survsim 4.0 - even more multi-state models

survsim version 4.0.0


I may have got carried away this week, but survsim version 4.0.0 is now released. It’s version 4 because this breaks some things introduced in 3 and 3.1, but, provides a whole lot more.

Key developments:

  • computation time reduced by about 70% for multi-state model simulation, as the root-finding procedure is now vectorised
  • reset added to hazard#() to allow simulation from semi-Markov multi-state models
  • syntax changed for user-defined functions, time #t must now specified with {t}, for more robust parsing
  • support for general multiple timescales, including:
    • {t0} can be used in user() and tdefunction() to denote the time of entry into the initial state of the associated transition
  • the competing risks syntax has now been removed and is now part of the more general msm syntax
  • bug fix; variables used directly in user() with a multi-state model were not passed to Mata -> now fixed
  • bug fix; if the same variable was used in multiple hazard#()s, parsing would only pick up the first -> now fixed

You can install the latest version of survsim using:

net install survsim, from("https://www.mjcrowther.co.uk/code/survsim/")

or use adoupdate if you already have version 3.0.0 or 3.1.0. It will be up on the SSC archive soon.

I’ll now go through the examples I posted earlier this week, that are affected by the changes, showing how to use the new syntax etc.

$~$

Simulating survival times from a user-defined (log) (cumulative) hazard function


I’ll simulate 300 observations, and generate a binary treatment group,

. set obs 300
number of observations (_N) was 0, now 300

. set seed 134987

. gen trt = runiform()>0.5

The most flexible form of simulating survival data with survsim is by specifying a custom hazard or cumulative hazard function, such as:

$$ h(t) = h_{0}(t) \exp (\text{trt} \beta_{1}) $$ where $$ h_{0}(t) = \exp (-1 + 0.02 t - 0.03 t^2 + 0.005 t^3) $$

which can be done, on the loghazard() scale for simplicity, using

. survsim stime1 died1, loghazard(-1 :+ 0.02:*{t} :- 0.03:*{t}:^2 :+ 0.005:*{t}:^3)   ///
>                       maxtime(1.5) covariates(trt -0.5)
Warning: 199 survival times were above the upper limit of maxtime()
         They have been set to maxtime()
         You can identify them by _survsim_rc = 3

with a common right censoring time of 1.5 years specified using maxtime(1.5). We could make the treatment effect diminish over time by,

. survsim stime2 died2, loghazard(-1 :+ 0.02:*{t} :- 0.03:*{t}:^2 :+ 0.005:*{t}:^3)    ///
>                       covariates(trt -0.5) tde(trt 0.03) maxtime(1.5)
Warning: 181 survival times were above the upper limit of maxtime()
         They have been set to maxtime()
         You can identify them by _survsim_rc = 3

which will form an interaction between trt, its coefficient 0.03 and linear time, which is the default. We could instead simulate from a model on the cumulative hazard scale, using the logchazard() option instead.

$~$

Simulating competing risks data from specified cause-specific hazard functions


survsim simulates competing risk data using the cause-specific hazard setting described by Beyersmann et al. (2009), utitlising the general simulation algorithm described by Crowther and Lambert (2013). Let’s simulate from a competing risk model with 2 competing events. The first cause-specific hazard has a Weibull distribution, with no covariates. The second cause-specific hazard model has an exponential distribution, with a beneficial treatment effect. Right censoring is applied at 10 years.

. clear

. set obs 1000
number of observations (_N) was 0, now 1,000

. gen trt = runiform()>0.5

. survsim time state event , hazard1(dist(weibull) lambda(0.1) gamma(1.2))                ///
>                            hazard2(dist(exponential) lambda(0.01) covariates(trt -0.5)) ///
>                            maxtime(10)
variables time0 to time1 created
variables state0 to state1 created
variables event1 to event1 created

We can see which events occurred with

. tabulate state1 event1

           |        event1
    state1 |         0          1 |     Total
-----------+----------------------+----------
         1 |       208          0 |       208 
         2 |         0        750 |       750 
         3 |         0         42 |        42 
-----------+----------------------+----------
     Total |       208        792 |     1,000 

Now let’s simulate from a competing risk model with 3 competing events. The first cause-specific hazard has a user defined baseline hazard function, with a harmful treatment effect. The second cause-specific hazard model has a Weibull distribition, with a beneficial treatment effect. The third cause-specific hazard has a user-defined baseline hazard function, with an initially beneficial treatment effect that reduces linearly with respect to log time. Right censoring is applied at 3 years. Phew.

. cap drop time* state* event*

. survsim time state event , hazard1(user(exp(-2 :+ 0.2:* log({t}) :+ 0.1:*{t})) covariates(trt 0.1))  ///
>                            hazard2(dist(weibull) lambda(0.01) gamma(1.3) covariates(trt -0.5))       ///
>                            hazard3(user(0.1 :* {t} :^ 1.5) covariates(trt -0.5) tde(trt 0.1)         ///
>                                    tdefunction(log({t})))                                            ///
>                            maxtime(3)
variables time0 to time1 created
variables state0 to state1 created
variables event1 to event1 created

. tabulate state1 event1

           |        event1
    state1 |         0          1 |     Total
-----------+----------------------+----------
         1 |       362          0 |       362 
         2 |         0        338 |       338 
         3 |         0         28 |        28 
         4 |         0        272 |       272 
-----------+----------------------+----------
     Total |       362        638 |     1,000 

You can see that this can get as complex as necessary. I currently let you use up to 50 cause-specific hazards, just in case you’re feeling particularly adventurous.

$~$

Simulating from an illness-death model


We first define the transition matrix for an illness-death model. It has three states:

  • State 1 - A “healthy” state. Observations can move from state 1 to state 2 or 3.
  • State 2 - An intermediate “illness” state. Observations can come from state 1, and move on to state 3.
  • State 3 - An absorbing “death” state. Observations can come from state 1 or 2, but not leave.

This gives us three potential transitions between states:

  • Transition 1 - State 1 -> State 2
  • Transition 2 - State 1 -> State 3
  • Transition 3 - State 2 -> State 3

which is defined by the following matrix:

. matrix tmat = (.,1,2\.,.,3\.,.,.)

The key is to think of the column/row numbers as the states, and the elements of the matrix as the transition numbers. Any transitions indexed with a missing value . means that the transition between the row state and the column state is not possible. Let’s make it obvious, sticking with our “healthy”, “ill” and “dead” names for the states:

. mat colnames tmat = "healthy" "ill" "dead"

. mat rownames tmat = "healthy" "ill" "dead"

. mat list tmat

tmat[3,3]
         healthy      ill     dead
healthy        .        1        2
    ill        .        .        3
   dead        .        .        .

Now we’ve defined the transition matrix, we can use survsim to simulate some data. We’ll simulate 1000 observations, and generate a binary treatment group indicator, remembering to set seed first (I bashed the keyboard to pick my seed).

. clear

. set obs 1000
number of observations (_N) was 0, now 1,000

. set seed 9865

. gen trt = runiform()>0.5

The first transition-specific hazard has a user defined baseline hazard function, with a harmful treatment effect. The second transition-specific hazard model has a Weibull distribition, with a beneficial treatment effect. The third transition-specific hazard has a user-defined baseline hazard function, with an initially beneficial treatment effect that reduces linearly with respect to log time. Right censoring is applied at 3 years.

. survsim time state event, transmatrix(tmat)                                                       ///
>                           hazard1(user(exp(-2 :+ 0.2:* log({t}) :+ 0.1:*{t})) covariates(trt 0.1))  ///
>                           hazard2(dist(weibull) lambda(0.01) gamma(1.3) covariates(trt -0.5))     ///
>                           hazard3(user(0.1 :* {t} :^ 1.5) covariates(trt -0.5) tde(trt 0.1)        ///
>                                   tdefunction(log({t})))                                           ///
> maxtime(3)
variables time0 to time2 created
variables state0 to state2 created
variables event1 to event2 created

The hazard number # in each hazard#(), represents the transition number in the transiton matrix. Simple as that. survsim has created variables storing the times at which states were entered, with the associated state number and the associated event indicator. It begins by creating the 0 variables, which represents the time at which observatations entered the inital state, time0, and the associated state number, state0. As ltruncated() and startstate() were not specified, all observations are assumed to start in state 1 at time 0. Subsequent transitions are simulated until all observations have either entered an absorbing state, or are right-censored at their maxtime(). For simplicity, I will assume time is measured in years. We can see what survsim has created:

. list if inlist(_n,1,4,16,112)

      +----------------------------------------------------------------------------------+
      | trt   time0   state0       time1   state1   event1       time2   state2   event2 |
      |----------------------------------------------------------------------------------|
   1. |   0       0        1           3        1        0           .        .        . |
   4. |   1       0        1   .95636156        2        1           3        2        0 |
  16. |   0       0        1   1.0755764        2        1   2.4401409        3        1 |
 112. |   1       0        1   2.3290322        3        1           .        .        . |
      +----------------------------------------------------------------------------------+

All observations start initially in state 1 at time 0, which are stored in state0 and time0, respectively. Then,

  • Observation 1 is right-censored at 3 years, remaining in state 1
  • Observation 4 moves to state 2 at 0.956 years, and is subsequently right-censored at 3 years, still in state 2
  • Observation 16 moves to state 2 at 1.076 years, and then moves to state 3 at 2.440 years. Since state 3 is an absorbing state, there are no further transitions
  • Observation 112 moves to state 3 at 2.329 years. Again, since state 3 is absorbing, there are no further transitions

Related

comments powered by Disqus