survsim 3.1 - multi-state models

survsim version 3.1.0


I’ve stuck with survsim this week, adding the ability to simulate survival times from a Markov multi-state model defined by a collection of transition-specific hazard (intensity) functions. Key developments:

  • ability to simulate from a Markov multi-state model, assuming a clock-forward timescale
  • observations can start in the same, or different, states using the startstate() option, and at different times using the ltruncated() option

You can install the latest version of survsim using:

net install survsim, from("https://www.mjcrowther.co.uk/code/survsim/")

or use adoupdate if you already have version 3.0.0. It will be up on the SSC archive soon.

With the introduction of version 3.1.0, survsim has five core settings for the simulation of survival data. There is now a top-level help file, found using help survsim, with nested help files covering each of the five areas.


Simulating from an illness-death model


We first define the transition matrix for an illness-death model. It has three states:

  • State 1 - A “healthy” state. Observations can move from state 1 to state 2 or 3.
  • State 2 - An intermediate “illness” state. Observations can come from state 1, and move on to state 3.
  • State 3 - An absorbing “death” state. Observations can come from state 1 or 2, but not leave.

This gives us three potential transitions between states:

  • Transition 1 - State 1 -> State 2
  • Transition 2 - State 1 -> State 3
  • Transition 3 - State 2 -> State 3

which are encoded in the following matrix:

. matrix tmat = (.,1,2\.,.,3\.,.,.)

The key is to think of the row and column indices as the states, and the elements of the matrix as the transition numbers: element (i, j) is the number of the transition from state i to state j. A missing value (.) means that a direct transition from the row state to the column state is not possible. Let’s make this explicit by labelling the rows and columns with our “healthy”, “ill” and “dead” state names:

. mat colnames tmat = "healthy" "ill" "dead"

. mat rownames tmat = "healthy" "ill" "dead"

. mat list tmat

tmat[3,3]
         healthy      ill     dead
healthy        .        1        2
    ill        .        .        3
   dead        .        .        .
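To make the row/column reading concrete, here is a small Python sketch (not part of survsim) that walks the same matrix and lists each transition with its from and to states, flagging a state as absorbing when its row contains no transition numbers. Stata's missing value (.) is represented as None, and the names TMAT, decode_transitions and absorbing_states are purely illustrative.

```python
# The illness-death transition matrix from the Stata example above,
# with Stata's missing value (.) written as None.
TMAT = [
    [None, 1, 2],        # from state 1 ("healthy")
    [None, None, 3],     # from state 2 ("ill")
    [None, None, None],  # from state 3 ("dead")
]


def decode_transitions(tmat):
    """Return (transition, from_state, to_state) triples, 1-indexed as in Stata."""
    triples = []
    for i, row in enumerate(tmat, start=1):
        for j, k in enumerate(row, start=1):
            if k is not None:
                triples.append((k, i, j))
    return sorted(triples)


def absorbing_states(tmat):
    """A state is absorbing when its row contains no transition numbers."""
    return [i for i, row in enumerate(tmat, start=1) if all(k is None for k in row)]


print(decode_transitions(TMAT))  # [(1, 1, 2), (2, 1, 3), (3, 2, 3)]
print(absorbing_states(TMAT))    # [3]
```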

Now that we’ve defined the transition matrix, we can use survsim to simulate some data. We’ll simulate 1000 observations and generate a binary treatment group indicator, remembering to set the seed first (I bashed the keyboard to pick my seed).

. set obs 1000
number of observations (_N) was 0, now 1,000

. set seed 9865

. gen trt = runiform()>0.5

The first transition-specific hazard has a user-defined baseline hazard function, with a harmful treatment effect (log hazard ratio 0.1). The second transition-specific hazard has a Weibull baseline hazard, with a beneficial treatment effect (log hazard ratio -0.5). The third transition-specific hazard has a user-defined baseline hazard function, with an initially beneficial treatment effect that diminishes linearly with respect to log time. Right censoring is applied at 3 years.
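Written out, the three transition-specific hazards specified in the command below are as follows. This assumes that survsim applies covariates and time-dependent effects multiplicatively on the hazard scale, and that the Weibull is parameterised as h_0(t) = λγt^(γ-1); under the clock-forward timescale, t is time since time 0 for every transition, including transition 3.

```latex
\begin{aligned}
h_1(t) &= \exp\{-2 + 0.2\log(t) + 0.1t\}\,\exp(0.1\,\mathrm{trt}) \\
h_2(t) &= 0.01 \times 1.3\,t^{0.3}\,\exp(-0.5\,\mathrm{trt}) \\
h_3(t) &= 0.1\,t^{1.5}\,\exp\{(-0.5 + 0.1\log t)\,\mathrm{trt}\}
\end{aligned}
```

For transition 3 the treatment hazard ratio is therefore exp(-0.5 + 0.1 log t): about exp(-0.5) ≈ 0.61 at 1 year and exp(-0.5 + 0.1 log 3) ≈ 0.68 at 3 years, so the benefit shrinks as time goes on.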

. survsim time state event, transmatrix(tmat)                                                       ///
>                           hazard1(user(exp(-2 :+ 0.2:* log(#t) :+ 0.1:*#t)) covariates(trt 0.1))  ///
>                           hazard2(dist(weibull) lambda(0.01) gamma(1.3) covariates(trt -0.5))     ///
>                           hazard3(user(0.1 :* #t :^ 1.5) covariates(trt -0.5) tde(trt 0.1)        ///
>                                   tdefunction(log(#t)))                                           ///
> maxtime(3)
variables time0 to time2 created
variables state0 to state2 created
variables event1 to event2 created
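To illustrate what such a simulation involves, here is a Python sketch of one standard approach to simulating from a Markov multi-state model on a clock-forward timescale. It is not necessarily how survsim works internally. From the current state, entered at time t0, draw the next event time by inverting the total hazard out of that state conditional on being event-free at t0; if no event occurs before maxtime, right-censor there; otherwise pick the destination with probability proportional to each transition's hazard at the event time. The hazards mirror the survsim call above (assuming the Weibull form λγt^(γ-1)). The cumulative hazard uses a simple trapezoid rule and the inversion uses bisection, both illustrative numerical choices. All function names are hypothetical.

```python
import math
import random

LAMBDA, GAMMA = 0.01, 1.3  # from hazard2(), assuming h0(t) = lambda*gamma*t^(gamma-1)

# Transitions out of each state, read off tmat: {from_state: {transition: to_state}}
OUT = {1: {1: 2, 2: 3}, 2: {3: 3}, 3: {}}


def hazards(t, trt):
    """Transition-specific hazards at time t; clock-forward, so t is time since time 0."""
    h1 = math.exp(-2 + 0.1 * t) * t ** 0.2 * math.exp(0.1 * trt)  # exp(-2 + 0.2log(t) + 0.1t), avoiding log(0)
    h2 = LAMBDA * GAMMA * t ** (GAMMA - 1) * math.exp(-0.5 * trt)
    h3 = 0.1 * t ** 1.5 * math.exp(-0.5 * trt) * t ** (0.1 * trt)  # tde adds 0.1*trt*log(t) to the log hazard
    return {1: h1, 2: h2, 3: h3}


def total_hazard(state, t, trt):
    h = hazards(t, trt)
    return sum(h[k] for k in OUT[state])


def cum_hazard(state, t0, t1, trt, steps=100):
    """Trapezoid-rule approximation to the integral of the total hazard out of `state` over [t0, t1]."""
    if t1 <= t0:
        return 0.0
    dt = (t1 - t0) / steps
    total = 0.5 * (total_hazard(state, t0, trt) + total_hazard(state, t1, trt))
    total += sum(total_hazard(state, t0 + i * dt, trt) for i in range(1, steps))
    return total * dt


def simulate_path(trt, rng, maxtime=3.0, start_state=1, start_time=0.0):
    """Simulate one observation as (time, state, event) rows; the first row is (start_time, start_state, None)."""
    path = [(start_time, start_state, None)]
    state, t0 = start_state, start_time
    while OUT[state]:  # stop once an absorbing state (no outgoing transitions) is reached
        target = -math.log(1.0 - rng.random())  # Exp(1) draw; solve H(t0, t) = target for t
        if cum_hazard(state, t0, maxtime, trt) < target:
            path.append((maxtime, state, 0))  # no event before maxtime: right-censor
            break
        lo, hi = t0, maxtime
        while hi - lo > 1e-6:  # bisection works because the cumulative hazard increases in t
            mid = 0.5 * (lo + hi)
            if cum_hazard(state, t0, mid, trt) < target:
                lo = mid
            else:
                hi = mid
        t = hi
        h = hazards(t, trt)
        trans = list(OUT[state])
        k = rng.choices(trans, weights=[h[j] for j in trans])[0]  # probability proportional to hazard at t
        state, t0 = OUT[state][k], t
        path.append((t, state, 1))
    return path


rng = random.Random(9865)  # Python's generator: this will not reproduce Stata's draws
paths = [simulate_path(trt=int(rng.random() > 0.5), rng=rng) for _ in range(100)]
for p in paths[:4]:
    print(p)
```

Each returned row plays the same role as a (time#, state#, event#) triple in survsim's output, with the first row corresponding to time0 and state0.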

The number # in each hazard#() option is the transition number from the transition matrix. Simple as that. survsim has created variables storing the time of each transition (or of right-censoring), the state entered at that time (or the state occupied at censoring), and the associated event indicator. It starts with the variables suffixed 0: time0, the time at which each observation entered its initial state, and state0, the number of that state. As neither ltruncated() nor startstate() was specified, all observations are assumed to start in state 1 at time 0. Subsequent transitions are simulated until every observation has either entered an absorbing state or been right-censored at its maxtime(). For simplicity, I will assume time is measured in years. We can see what survsim has created:

. list if inlist(_n,1,4,16,112)

      +----------------------------------------------------------------------------------+
      | trt   time0   state0       time1   state1   event1       time2   state2   event2 |
      |----------------------------------------------------------------------------------|
   1. |   0       0        1           3        1        0           .        .        . |
   4. |   1       0        1   .95636156        2        1           3        2        0 |
  16. |   0       0        1   1.0755764        2        1   2.4401409        3        1 |
 112. |   1       0        1   2.3290322        3        1           .        .        . |
      +----------------------------------------------------------------------------------+

All observations start in state 1 at time 0, as recorded in state0 and time0, respectively. Then,

  • Observation 1 is right-censored at 3 years, remaining in state 1
  • Observation 4 moves to state 2 at 0.956 years, and is subsequently right-censored at 3 years, still in state 2
  • Observation 16 moves to state 2 at 1.076 years, and then moves to state 3 at 2.440 years. Since state 3 is an absorbing state, there are no further transitions
  • Observation 112 moves to state 3 at 2.329 years. Again, since state 3 is absorbing, there are no further transitions
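The same reading can be applied mechanically to any row: an event indicator of 1 marks a transition into the recorded state, 0 marks right-censoring in that state, and a missing time means no further transitions. The Python sketch below applies that rule to the four rows from the listing above, with Stata's missing value (.) written as None; ROWS and describe are hypothetical names used only for illustration.

```python
# Rows copied from the listing above:
# (trt, time0, state0, time1, state1, event1, time2, state2, event2)
ROWS = {
    1:   (0, 0, 1, 3.0,        1, 0, None,      None, None),
    4:   (1, 0, 1, 0.95636156, 2, 1, 3.0,       2,    0),
    16:  (0, 0, 1, 1.0755764,  2, 1, 2.4401409, 3,    1),
    112: (1, 0, 1, 2.3290322,  3, 1, None,      None, None),
}


def describe(row):
    """Turn one wide row of survsim output into a list of readable steps."""
    _, time0, state0, *rest = row
    steps = [f"starts in state {state0} at time {time0}"]
    prev = state0
    for j in range(0, len(rest), 3):
        t, s, e = rest[j:j + 3]
        if t is None:  # missing time: no further transitions recorded
            break
        if e == 1:
            steps.append(f"moves {prev} -> {s} at {t:.3f}")
        else:
            steps.append(f"right-censored in state {s} at {t:.3f}")
        prev = s
    return steps


for obs, row in ROWS.items():
    print(obs, "; ".join(describe(row)))
```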

More soon.
