`survsim` version 3.1.0

I’ve stuck with `survsim` this week, now adding the ability to simulate survival times from a Markov multi-state model, defined by a collection of transition-specific hazard/intensity functions. Key developments:

- ability to simulate from a Markov multi-state model, assuming a clock-forward timescale
- observations can start in the same, or different, states using the `startstate()` option, and at different times using the `ltruncated()` option

You can install the latest version of `survsim` using:

```
net install survsim, from("https://www.mjcrowther.co.uk/code/survsim/")
```

or use `adoupdate` if you already have version 3.0.0. It will be up on the SSC archive soon.

With the introduction of version 3.1.0, `survsim` has five core settings for the simulation of survival data. There is now a top-level help file, found using `help survsim`, with further nested help files covering each of the five areas.


#### Simulating from an illness-death model

We first define the transition matrix for an illness-death model. It has three states:

- State 1 - A “healthy” state. Observations can move from state 1 to state 2 or 3.
- State 2 - An intermediate “illness” state. Observations can come from state 1, and move on to state 3.
- State 3 - An absorbing “death” state. Observations can come from state 1 or 2, but not leave.

This gives us three potential transitions between states:

- Transition 1 - State 1 -> State 2
- Transition 2 - State 1 -> State 3
- Transition 3 - State 2 -> State 3

which are defined by the following matrix:

```
. matrix tmat = (.,1,2\.,.,3\.,.,.)
```

The key is to think of the column/row numbers as the states, and the elements of the matrix as the transition numbers. Any element set to a missing value, `.`, means that the transition between the row state and the column state is not possible. Let’s make it obvious, sticking with our “healthy”, “ill” and “dead” names for the states:

```
. mat colnames tmat = "healthy" "ill" "dead"
. mat rownames tmat = "healthy" "ill" "dead"
. mat list tmat

tmat[3,3]
         healthy      ill     dead
healthy        .        1        2
    ill        .        .        3
   dead        .        .        .
```
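To make the encoding concrete, here is a minimal Python sketch (not part of `survsim`) that reads the same matrix and lists the permitted transitions. Representing Stata's missing value `.` as `None`, and the helper names `allowed_transitions` and `absorbing_states`, are assumptions made purely for illustration.

```python
# Illustrative only: decode a transition matrix into (transition, from, to) triples.
# None stands in for Stata's missing value "." (an assumption for this sketch).
STATES = ["healthy", "ill", "dead"]

TMAT = [
    [None, 1, 2],        # from healthy: transition 1 -> ill, transition 2 -> dead
    [None, None, 3],     # from ill: transition 3 -> dead
    [None, None, None],  # dead is absorbing: no transitions out
]


def allowed_transitions(tmat, states):
    """Return (transition number, from state, to state), sorted by transition number."""
    found = []
    for i, row in enumerate(tmat):
        for j, trans in enumerate(row):
            if trans is not None:
                found.append((trans, states[i], states[j]))
    return sorted(found)


def absorbing_states(tmat, states):
    """A state is absorbing when its row contains no transition numbers."""
    return [states[i] for i, row in enumerate(tmat) if all(t is None for t in row)]


for trans, origin, dest in allowed_transitions(TMAT, STATES):
    print(f"Transition {trans}: {origin} -> {dest}")
print("Absorbing:", absorbing_states(TMAT, STATES))
```

Rows are the "from" states and columns the "to" states, so an all-missing row is exactly what marks a state as absorbing.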

Now we’ve defined the transition matrix, we can use `survsim` to simulate some data. We’ll simulate 1000 observations, and generate a binary treatment group indicator, remembering to `set seed` first (I bashed the keyboard to pick my seed).

```
. set obs 1000
number of observations (_N) was 0, now 1,000
. set seed 9865
. gen trt = runiform()>0.5
```

The first transition-specific hazard has a user-defined baseline hazard function, with a harmful treatment effect. The second transition-specific hazard model has a Weibull distribution, with a beneficial treatment effect. The third transition-specific hazard has a user-defined baseline hazard function, with an initially beneficial treatment effect that reduces linearly with respect to log time. Right censoring is applied at 3 years.

```
. survsim time state event, transmatrix(tmat) ///
> hazard1(user(exp(-2 :+ 0.2:* log(#t) :+ 0.1:*#t)) covariates(trt 0.1)) ///
> hazard2(dist(weibull) lambda(0.01) gamma(1.3) covariates(trt -0.5)) ///
> hazard3(user(0.1 :* #t :^ 1.5) covariates(trt -0.5) tde(trt 0.1) ///
> tdefunction(log(#t))) ///
> maxtime(3)
variables time0 to time2 created
variables state0 to state2 created
variables event1 to event2 created
```
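As a rough check on the third transition, the sketch below evaluates the treatment effect over follow-up. It assumes that the `tde()` coefficient multiplies `tdefunction()` and is added to the `covariates()` coefficient on the log hazard scale, giving a log hazard ratio of `-0.5 + 0.1*log(t)`. That reading matches the description above, but it is an interpretation rather than something quoted from the `survsim` help file. The code is Python rather than Stata, and the function names are hypothetical.

```python
import math

BETA_TRT = -0.5   # covariates(trt -0.5) in hazard3()
BETA_TDE = 0.1    # tde(trt 0.1), assumed to be interacted with log(t)


def log_hazard_ratio(t):
    """Assumed treatment log hazard ratio for transition 3 at time t (years)."""
    return BETA_TRT + BETA_TDE * math.log(t)


def hazard3(t, trt):
    """Assumed transition 3 hazard: baseline 0.1 * t^1.5 times the treatment effect."""
    return 0.1 * t ** 1.5 * math.exp(trt * log_hazard_ratio(t))


# Time at which the assumed log hazard ratio would reach zero
crossover = math.exp(-BETA_TRT / BETA_TDE)

for t in (0.5, 1.0, 2.0, 3.0):
    lhr = log_hazard_ratio(t)
    print(f"t = {t}: log HR = {lhr:.3f}, HR = {math.exp(lhr):.3f}")
print(f"log HR reaches 0 at t = {crossover:.1f} years")
```

Under that assumption the effect attenuates but stays protective throughout the 3 years of follow-up, because the log hazard ratio only reaches zero at exp(5), roughly 148 years.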

The hazard number `#` in each `hazard#()` represents the transition number in the transition matrix. Simple as that. `survsim` has created variables storing the times at which states were entered, with the associated state number and the associated event indicator. It begins by creating the `0` variables, which represent the time at which observations entered the initial state, `time0`, and the associated state number, `state0`. As `ltruncated()` and `startstate()` were not specified, all observations are assumed to start in state 1 at time 0. Subsequent transitions are simulated until all observations have either entered an absorbing state, or are right-censored at their `maxtime()`. For simplicity, I will assume time is measured in years. We can see what `survsim` has created:

```
. list if inlist(_n,1,4,16,112)

     +----------------------------------------------------------------------------------+
     | trt   time0   state0       time1   state1   event1       time2   state2   event2 |
     |----------------------------------------------------------------------------------|
  1. |   0       0        1           3        1        0           .        .        . |
  4. |   1       0        1   .95636156        2        1           3        2        0 |
 16. |   0       0        1   1.0755764        2        1   2.4401409        3        1 |
112. |   1       0        1   2.3290322        3        1           .        .        . |
     +----------------------------------------------------------------------------------+
```

All observations start in state 1 at time 0, which are stored in `state0` and `time0`, respectively. Then,

- Observation 1 is right-censored at 3 years, remaining in state 1
- Observation 4 moves to state 2 at 0.956 years, and is subsequently right-censored at 3 years, still in state 2
- Observation 16 moves to state 2 at 1.076 years, and then moves to state 3 at 2.440 years. Since state 3 is an absorbing state, there are no further transitions
- Observation 112 moves to state 3 at 2.329 years. Again, since state 3 is absorbing, there are no further transitions
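
The same reading can be done mechanically. The Python sketch below (an illustration, not `survsim` output) walks the four listed rows and reports each path. The dictionary layout and the `describe_path` helper are assumptions made for the example, with `None` standing in for Stata's missing value.

```python
# Rows copied from the listing above; None stands in for Stata's missing ".".
ROWS = {
    1:   {"time1": 3.0,        "state1": 1, "event1": 0, "time2": None,      "state2": None, "event2": None},
    4:   {"time1": 0.95636156, "state1": 2, "event1": 1, "time2": 3.0,       "state2": 2,    "event2": 0},
    16:  {"time1": 1.0755764,  "state1": 2, "event1": 1, "time2": 2.4401409, "state2": 3,    "event2": 1},
    112: {"time1": 2.3290322,  "state1": 3, "event1": 1, "time2": None,      "state2": None, "event2": None},
}


def describe_path(row, max_steps=2):
    """Return (transitions, censoring time or None); everyone starts in state 1 at time 0."""
    moves = []
    censored_at = None
    for k in range(1, max_steps + 1):
        if row[f"time{k}"] is None:      # no further transition was simulated
            break
        if row[f"event{k}"] == 1:        # a transition into state{k} happened at time{k}
            moves.append((row[f"time{k}"], row[f"state{k}"]))
        else:                            # event indicator 0: right-censored at time{k}
            censored_at = row[f"time{k}"]
            break
    return moves, censored_at


for obs, row in ROWS.items():
    moves, censored_at = describe_path(row)
    steps = ", ".join(f"state {s} at {t:.3f}" for t, s in moves) or "no transitions"
    # In this illness-death model, an uncensored path always ends in absorbing state 3
    ending = f"censored at {censored_at:g}" if censored_at is not None else "absorbed"
    print(f"Observation {obs}: {steps}; {ending}")
```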

More soon.